What is Database Concurrency ?
Modern relational database systems potentially allow hundreds, if not thousands, of simultaneous connections to the data. The architecture of the database system itself determines the various ways that concurrent access to the same data can be managed with as little interference between users or connections as possible. Most database systems provide features that an application developer can use to exert some control over how concurrent access is managed, allowing the developer to find a balance between concurrency and data consistency.
Microsoft® SQL Server™ 2005 includes a new technology called Row Level Versioning (RLV) that allows concurrent access to be handled in new ways. Many features of SQL Server 2005 are designed around RLV and no additional application control is necessary in order to take advantage of this new capability. For other features, such as new isolation levels, a Database Administrator must specifically allow RLV on a database by database basis. This allows backward compatibility to be maintained for those applications that depend on the locking behavior in previous SQL Server versions.
This white paper focuses on concurrency enhancements in SQL Server 2005. On the server side, it covers all the features of SQL Server that leverage RLV technology. These include the new features: Snapshot Isolation, Multiple Active Result Sets (MARS) and Online Index Rebuild. RLV is also used in SQL Server 2005 to support database triggers, so the differences in trigger behavior between SQL Server 2000 and 2005 are also discussed. On the client side, concurrency enhancements covered include concurrency in CLR objects, transaction control from the new SQL Native Client, Windows Enterprise Services and queued components, and concurrency using Service Broker enabled applications.
One of the main benefits of RLV and the client-side enhancements is that SQL Server can now provide higher levels of database concurrency with equivalent or better data consistency. This paper will describe the pre-existing concurrency features only in order to compare the new features with the existing ones.
Database Concurrency Definition
Concurrency can be defined as the ability for multiple processes to access or change shared data at the same time. The greater the number of concurrent user processes that can execute without blocking each other, the greater the concurrency of the database system.
Concurrency is impacted when a process that is changing data prevents other processes from reading the data being changed or when a process that is reading data prevents other processes from changing that data. Concurrency is also impacted when multiples processes are attempting to change the same data concurrently and they cannot all succeed without sacrificing data consistency.
How a database system addresses situations that decrease concurrency is partly determined by whether the system is using optimistic or pessimistic concurrency control. Pessimistic concurrency control works on the assumption that there are enough data modification operations in the system to make it likely that any given read operation will be affected impacted by a data modification made by another user. In other words, the system is being pessimistic and assuming a conflict will occur. The default behavior when using pessimistic concurrency control is to use locks to block access to data that another process is using. Optimistic concurrency control works on the assumption that there are few enough data modification operations to make it unlikely (although possible) that any process will modify data that another process is reading. The default behavior when using optimistic concurrency control is to use row versioning to allow data readers to see the state of the data before the modification took place.
Historical Behavior
Historically, the concurrency control model in SQL Server at the server level has been pessimistic and based on locking. While locking is still the best concurrency control choice for most applications, it can introduce significant blocking problems for a small set of applications.
The biggest problem arises when locking causes the writer-block-reader or reader-block-writer problem. If a transaction changes a row, it holds exclusive locks on the changed data. The default behavior in SQL Server is that no other transactions can read the row until the writer commits. Alternatively, SQL Server supports ‘read uncommitted isolation’, which can be requested by the application either by setting the isolation level for the connection or by specifying the NOLOCK table hint. This nonlocking scan should always be carefully considered prior to use, because it is not guaranteed to return transactional consistent results.
Prior to SQL Server 2005, the tradeoff in concurrency solutions has been that we can avoid having writers block readers if we are willing to risk inconsistent data. If our results must always be based on committed data, we needed to be willing to wait for changes to be committed.
Overview of Row Level Versioning
Even if the application requires that results must always be based on committed data, there are still two possibilities. If the reader absolutely must have the latest committed value of the data it is correct for readers to wait (on a lock) for writers to complete their transactions and commit their changes. In other situations, it might be sufficient just to have committed data values, even if they are not the most recent versions. In this case, a reader might be fine if SQL Server could provide it with a previously committed value of the row, that is, an older version.
SQL Server 2005 introduces a new isolation level called ‘snapshot isolation’ (SI) and a new non-locking flavor of read-committed isolation, call ‘read committed snapshot isolation’ (RCSI). These row-versioning based Isolations levels allow the reader to get to a previously committed value of the row without blocking, so concurrency is increased in the system. For this to work, SQL Server must keep old versions of a row when it is updated. Because multiple older versions of the same row may need to be maintained, this new behavior is also called multi-version concurrency control or row level versioning.
To support storing multiple older versions of rows, additional disk space is used from the tempdb database. The disk space for version store needs to be monitored and managed appropriately.
Versioning works by making any transaction that changes data keep the old versions of the data around so that a ‘snapshot’ of the database (or a part of the database) can be constructed from these old versions.
When a record in a table or index is updated, the new record is stamped with the transaction sequence_number of the transaction that is doing the update. The previous version of the record is stored in the version store, and the new record contains a pointer to the old record in the version store. Old records in the version store may contain pointers to even older versions. All the old versions of a particular record are chained in a linked list, and SQL Server may need to follow several pointers in a list to reach the right version. Version records need to be kept in the version store only as long as there are there are operations that might require them.
In the following figure, the current version of the record is generated by transaction T3, and it is stored in the normal data page. The previous versions of the record, generated by transaction T2 and transaction Tx are stored in pages in the version store (in tempdb).
Figure 1: Versions of a Record
Figure 1: versions of a record
Row level versioning gives SQL Server an optimistic concurrency model to work with when the needs of an application require it or when the concurrency reduction of using the default pessimistic model is unacceptable. To switch to the row-versioning based isolation levels, the tradeoffs of using this new concurrency model need to be carefully considered. In addition to extra management requirements to monitor the increased usage of tempdb for the version store, using versioning will slow the performance of update operations, due to the extra work involved in maintaining old versions. Update operations will bear this cost, even if there are no current readers of the data. If there are readers using row level versioning, they will have the extra cost of traversing the link pointers to find the appropriate version of the requested row.
In addition, because the optimistic concurrency model of snapshot isolation assumes (optimistically) that there will not be many update conflicts occurring, you should not choose the SI isolation level if you are expecting contention for updating the same data concurrently. Snapshot isolation works well to enable readers not to be blocked by writers, but simultaneous writers are still not allowed. In the default pessimistic model, the first writer will block all subsequent writers, but using SI, subsequent writers could actually receive error messages and the application would need to resubmit the original request. Note that these update conflicts will only occur with SI, and not with the enhanced read-committed isolation level, RCSI. In addition, there are guidelines you can follow to reduce update conflicts when using Snapshot Isolation.
For more details about using Row Level Versioning to support Snapshot Isolation, please see Kimberly Tripp’s white paper: SQL Server 2005 Snapshot Isolation, http://msdn2.microsoft.com/en-us/library/ms345124.aspx
Additional SQL Server 2005 Features utilizing Row Level Versioning
Although the motivation behind adding row level versioning to SQL Server 2005 was to maintain a version store for optimistic concurrency, and to support row-versioning based isolation levels to solve the problem of data writers blocking all readers, there are other SQL Server 2005 features that take advantage of this new data management mechanism. Two of these features, multiple active result sets (MARS) and online index rebuilds, are new to the product; the third is a new way of managing triggers, which are an existing feature. This section provides an overview description of the use of row level versioning for these SQL Server features.
Triggers and Row Level Versioning
Triggers have been a part of SQL Server since the earliest version and they were the only feature of the product prior to SQL Server 2005 that gave us any type of historical (or versioned) data. One of the special features of triggers is the ability to access a pseudo-table called ‘deleted’. If the trigger is a DELETE trigger, the ‘deleted’ table contains all the rows which were deleted by the operation that caused the trigger to fire. If the trigger is an UPDATE trigger, the ‘deleted’ table contains the old version of the data in all the rows changed by the update statement that caused the trigger to fire, in other words, the data before the update took place. Previous versions of SQL Server would populate the deleted table by scanning the transaction log looking for all the log records in the current transaction that changed the table to which the trigger was tied. Scanning log records can be very expensive, because the transaction log is optimized for writing, not reading. For a high volume OLTP system in which log records for the current transaction may have already been written to disk, this could incur actual physical I/O operations. The new mechanism can then help the performance of your existing triggers.
In SQL Server 2005, the deleted table is materialized using row level versioning. When updates or deletes are performed on a table that has a relevant trigger defined, the changes to the table are versioned, regardless of whether or not row-versioning based isolation levels have been enabled. When the trigger needs to access the ‘deleted’ table, it retrieves the data from the version store. New data, whether from an update or an insert, is accessible through the ‘inserted’ table. When a SQL Server 2005 trigger scans inserted, it looks for the most recent versions of the rows.
Because of the fact that tempdb is used for the version store, applications that make heavy use of triggers in SQL Server 2000 need to be aware that there may be increased demands on tempdb after upgrading to SQL Server 2005.
Online Index Creation and Row Level Versioning
Index creation and rebuilding are obviously not new features of SQL Server, but in SQL Server 2005, we can now build, or rebuild, an index without taking the table or index offline. In previous versions, building or rebuilding a clustered index would exclusively lock the entire table so that all the data was completely inaccessible. Building or rebuilding a nonclustered index would place a shared lock on the table, so that data could be read but not modified. In addition, while rebuilding a nonclustered index, the index itself was completely unusable and queries that might have used the index would exhibit degraded performance.
With Row Level Versioning, SQL Server 2005 allows indexes to be built or rebuilt completely online. As the index is being built, SQL Server scans the existing table for the version of the data at the time the index building began. Any modifications to the table will be versioned, regardless of whether snapshot isolation has been enabled. Requests to read data from the table will access the versioned data.
Multiple Active Result Sets and Row Level Versioning
Although Multiple Active Result Sets (MARS) is a client-side feature of SQL Server 2005, its implementation relies on the version store which is very much a server-side feature. For this reason, discussion of MARS is included along with other row level versioning features, and not in the section on client-side concurrency considerations.
Microsoft SQL Server 2005 extends the support for MARS in applications accessing the database engine. In earlier versions of SQL Server, database applications could not maintain multiple active statements on a connection when using default result sets. The application had to process or cancel all result sets from one batch before it could execute any other batch on that connection. SQL Server 2005 introduces new attributes that allow applications to have more than one pending request per connection; in particular, to have more than one active default result set per connection.
Only SELECT and BULK INSERT statements are allowed to execute non-atomically and interleave with other statements executing on other batches under a MARS enabled connection. DDL and data modification statements all execute atomically. If there are any other statements waiting to execute, they will block until atomic statements have completed execution.
If two batches are submitted under a MARS connection, one of them containing a SELECT statement and the other containing an UPDATE statement, the UPDATE can begin execution before the SELECT has processed its entire result set. However, the UPDATE statement must run to completion before the SELECT statement can make progress and all changes made by the UPDATE will be versioned. If both statements are running under the same transaction, any changes made by the UPDATE statement after the SELECT statement has started execution are not visible to the SELECT, because the SELECT will access the older version of the required data rows.
Client-side Issues Involving Concurrency
The Storage Engine in SQL Server 2005 provides concurrency control, managing transactions and locks. However, users access SQL Server through client applications, components, and services, and all these programming components are affected by the way SQL Server manages concurrency. It is also important to understand how connection settings affect concurrency, and how to manage transactions effectively from client applications.
In previous versions of SQL Server, any command actually executed by SQL Server had to be a Transact-SQL command, regardless of which application layer started a particular transaction.
With the new SQL CLR capabilities in SQL Server 2005, the new server side programming infrastructure, including such features as Service Broker, and the new data access provider, concurrency and transaction management becomes far more powerful. However, with this added power comes added complexity.
Concurrency control in SQL CLR objects
Actions performed from inside a SQL CLR procedure do not need to start a DTC-managed transaction, and operations that update, insert, or delete SQL Server data follow the same transactional principles as standard Transact-SQL procedures.
SQL CLR objects can use SQL Server data by using the built-in Data Access Provider, through the SqlContext object, and these SQL CLR objects can send results to the calling connection using the Pipe property of the SqlContext object, as in the following example:
<SQLProcedure()> _
Public Shared Sub GetServerVersion ()
Dim cmdGetVersion as SqlCommand = SqlContext.CreateCommand()
cmdGetVersion.Commandtext = “SELECT @@VERSION”
cmdGetVersion.CommandType = CommandType.Text
SqlContext.Pipe.Send(cmdGetVersion.ExecuteScalar().ToString())
End Sub
It looks very simple, but what is actually happening behind the scenes? Setting up a trace to watch what actually happens when the above command is executed, we see the following events:
SQL:BatchStartingEXEC cbo.GetServerVersion
SQL:StmtStartingEXEC cbo.GetServerVersion
SP:StartingEXEC cbo.GetServerVersion
SP:StmtStarting SELECT @@VERSION
Note that these are exactly the same events we would have seen if we had defined and executed the following Transact-SQL stored procedure:
CREATE PROCEDURE dbo.GetServerVersion
AS
SELECT @@VERSION
GO
Now let’s create a CLR stored procedure which updates data in Production.Product table in the AdventureWorks database:
<SqlProcedure()> _
Public Shared Sub UpdateListPriceByProductID(ByVal ProductID As SqlInt32)
Try
Dim cmdUpdate As SqlCommand = SqlContext.CreateCommand()
Dim parProductID As SqlParameter = _
cmdUpdate.Parameters.Add("@ProductID", SqlDbType.Int)
parProductID.Direction = ParameterDirection.Input
cmdUpdate.CommandText = "UPDATE Production.Product " _
+ "SET ListPrice = ListPrice * 1.1 " _
+ "WHERE ProductID = @ProductID"
parProductID.Value = ProductID
cmdUpdate.ExecuteNonQuery()
Catch e As Exception
SqlContext.Pipe.Send(e.Message)
End Try
End Sub
Again, we can use a trace to see what activity actually takes place in SQL Server when we execute this stored procedure. The following table lists the events that took place on the server, and includes an event number so that we can refer to the specific events in the following discussion.
1
SQL:BatchStarting
EXECUTE dbo.UpdateListPriceByProductID 444;
The batch we execute in Management Studio begins execution.
2
SQL:StmtStarting
EXECUTE dbo.UpdateListPriceByProductID 444;
The only statement in this batch begins execution.
3
SP:Starting
EXECUTE dbo.UpdateListPriceByProductID 444;
This statement calls the dbo.UpdateListPriceByProductID CLR stored procedure, which begins its execution.
4
SP:StmtStarting
UPDATE Production.Product
SET ListPrice = ListPrice * 1.1
WHERE ProductID = @ProductID
This is the only statement in this stored procedure that executes any action on the database.
Note that there is no connection event, because this procedure uses the SQLContext object to get a reference to the current connection context.
5
SQLTransaction
75691 UPDATE 0 – Begin
Because the statement to be executed is a data modification operation, SQL Server starts an implicit transaction automatically.
6
SP:Starting
UPDATE Production.Product
SET ListPrice = ListPrice * 1.1
WHERE ProductID = @ProductID
There is an AFTER UPDATE trigger defined on the Production.Product table called uProduct. This trigger is managed internally exactly like a stored procedure, so we see the SP:Starting event.
The execution of this trigger still takes place under the control of the transaction number 75691. This is the code that creates this trigger:
CREATE TRIGGER [Production].[uProduct]
ON [Production].[Product]
AFTER UPDATE NOT FOR REPLICATION AS
BEGIN
SET NOCOUNT ON;
UPDATE [Production].[Product]
SET [Production].[Product].[ModifiedDate]
= GETDATE()
FROM inserted
WHERE inserted.[ProductID] =
[Production].[Product].[ProductID];
END;
7
SP:StmtStarting
SET NOCOUNT ON;
The first statement inside the trigger begins.
8
SP:StmtCompleted
SET NOCOUNT ON;
The first statement inside the trigger completes.
9
SP:StmtStarting
UPDATE [Production].[Product]
SET [Production].[Product].[ModifiedDate]
= GETDATE()
FROM inserted
WHERE inserted.[ProductID]
= [Production].[Product].[ProductID];
The second statement inside the trigger begins.
10
SP:StmtCompleted
UPDATE [Production].[Product]
SET [Production].[Product].[ModifiedDate]
= GETDATE()
FROM inserted
WHERE inserted.[ProductID]
= [Production].[Product].[ProductID];
The second statement inside the trigger completes.
11
SP:Completed
UPDATE Production.Product
SET ListPrice = ListPrice * 1.1
WHERE ProductID = @ProductID
The uProduct trigger completes execution.
12
SP:StmtCompleted
UPDATE Production.Product
SET ListPrice = ListPrice * 1.1
WHERE ProductID = @ProductID
The original UPDATE statement started on step 4 completes its execution.
13
SQLTransaction
75691 UPDATE 1 – Commit
Transaction 75691 commits automatically when the statement completes, because this transaction has been opened automatically as this statement required writing to the database.
14
SP:Completed
EXECUTE [dbo].[UpdateListPriceByProductID] 444;
The stored procedure started on step 3 completes.
15
15 SQL:StmtCompleted
EXECUTE [dbo].[UpdateListPriceByProductID] 444;
The statement started on step 2 completes.
16
SQL:BatchCompleted
EXECUTE [dbo].[UpdateListPriceByProductID] 444;
The batch started on step 1 completes.
The trace events give no clue that what was actually executed was CLR code. The fact that a CLR stored procedure forced the execution of a Transact-SQL trigger should not cause any concern from a transactional point of view.
The above trace information shows us that Execution of code sent from inside a SQL CLR stored procedure runs under the same SPID as the connection which calls this procedure, as long as the SqlContext object is used.
When SQL CLR objects access any external database system using any of the .NET Data Providers, they will behave as if they had connected from a standard .NET application. There will be no differences from a transactional point of view. We cover concurrency management in ADO.NET 2.0 in the next section.
SQL CLR objects should always connect to the current instance of SQL Server through the in-process data provider.
Concurrency management from ADO.NET 2.0
ADO.NET broke the former client data access paradigm by providing two different programming models to create database applications and components:
The disconnected model based on optimistic concurrency built around the DataSet and DataAdapter types, and
The Connected model based on pessimistic concurrency, built around the SQLCommand and SQLDataReader types
Using the disconnected model, the application uses a SqlDataAdapter or a TableAdapter to read the requested data from the database, using enough shared locks during this operation to make sure that it can read consistently data rows, and then disconnects from the database, releasing any locks that this reading operation might had. In this sense, transaction behavior is almost the same as in previous releases of ADO.NET, and you can refer to the comprehensive public information available on this topic in MSDN.
ADO.NET 2.0 provides an enhanced SQL Server .NET Data Provider that exposes the new functionality available in the new edition of SQL Server. Some of the new features exposed by this new provider have been covered previously in this paper, such as MARS and the new Snapshot Isolation level.
Another ADO.NET enhancement that might change transaction behavior is the possibility of executing commands asynchronously, which uses the new MARS feature of SQL Server behind the scenes. However, there isn’t anything special on this issue related to ADO.NET and the behavior is exactly the same as it has been described in earlier sections of this paper.
The .NET Framework Version 2.0 provides two new Transaction Managers: the Lightweight Transaction Manager (LTM) and the OleTx Transaction Manager. Access to these two transactions managers is encapsulated through the System.Transactions namespace.
You can read a complete description of System.Transaction in the white paper “Introducing System.Transactions” from Juval Lowy (http://www.microsoft.com/downloads/details.aspx?FamilyId=AAC3D722-444C-4E27-8B2E-C6157ED16B15&displaylang=en) ADO.NET 2.0 has been redesigned to take advantage of System.Transactions automatically, which means that any code using Enterprise Services in ADO.NET 2.0 will use LTM or OleTx behind the scenes when necessary. In this way, the application would be using this transaction model in a declarative way.
To illustrate this new technique, we are going to execute the same sample application with and without using System.Transactions.
This first example executes a SQL statement through a SQLCommand object, without explicit transaction control:
Private Sub TestSystemTransactions()
Dim conAW As New SqlConnection(sConnString)
Dim sQuery1 As String, count1 As Integer
Dim cmd1 As SqlCommand = conAW.CreateCommand()
sQuery1 = "UPDATE Production.Product " _
+ "SET ListPrice = ListPrice * 1.1 " _
+ "WHERE ProductNumber LIKE 'EC%'"
cmd1.CommandText = sQuery1
Try
conAW.Open()
Dim result1 As IAsyncResult = cmd1.BeginExecuteNonQuery()
While result1.IsCompleted = False
Console.WriteLine("Waiting ({0})", count1)
' Wait for 1/10 second, so the counter
' doesn't consume all available resources
' on the main thread.
Threading.Thread.Sleep(100)
If result1.IsCompleted = False Then count1 += 1
End While
Console.WriteLine("Command complete. Affected {0} rows.", _
cmd1.EndExecuteNonQuery(result1))
Catch ex As SqlException
Console.WriteLine("Error ({0}): {1}", ex.Number, ex.Message)
Catch ex As InvalidOperationException
Console.WriteLine("Error: {0}", ex.Message)
Catch ex As Exception
Console.WriteLine("Error: {0}", ex.Message)
Finally
conAW.Close()
Console.ReadLine()
End Try
End Sub
Again, we can use a trace to see what activity actually takes place in SQL Server when we execute this stored procedure.
Download Database Concurrency Presentation
Post a Comment