What is the ER Model in a Database Management System?


The ER model defines the conceptual view of a database. It works around real-world entities and the associations among them. At the view level, the ER model is considered a good option for designing databases.


Entity
An entity is a real-world object, animate or inanimate, that is easily identifiable. For example, in a school database, students, teachers, classes, and courses offered can be considered entities. All these entities have attributes or properties that give them their identity.

An entity set is a collection of similar types of entities. An entity set may contain entities whose attributes share similar values. For example, a Students set may contain all the students of a school; likewise, a Teachers set may contain all the teachers of a school, from all faculties. Entity sets need not be disjoint.

Attributes

Entities are represented by means of their properties, called attributes. All attributes have values. For example, a student entity may have name, class, and age as attributes.

There exists a domain or range of values that can be assigned to attributes. For example, a student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be negative, etc.

Types of Attributes

Simple attribute − Simple attributes are atomic values, which cannot be divided further. For example, a student's phone number is an atomic value of 10 digits.


Composite Attribute 

 Composite attributes are made of more than one simple attribute. For example, a student's complete name may have first_name and last_name.


Derived Attribute 

 Derived attributes do not exist in the physical database; their values are derived from other attributes that are stored. For example, average_salary in a department should not be saved directly in the database; instead, it can be derived. As another example, age can be derived from date_of_birth.
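To make the derived-attribute idea concrete, here is a minimal Python sketch: the row stores only date_of_birth, and age is computed on demand (the dates below are made up for illustration):

```python
from datetime import date

def derive_age(date_of_birth: date, today: date) -> int:
    """Derived attribute: age is computed from date_of_birth, never stored."""
    # Subtract one year if this year's birthday has not yet occurred.
    before_birthday = (today.month, today.day) < (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - before_birthday

# A student row stores only date_of_birth; age is derived when needed.
print(derive_age(date(2000, 6, 15), date(2024, 6, 14)))  # 23
print(derive_age(date(2000, 6, 15), date(2024, 6, 15)))  # 24
```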


Single-value Attribute 

Single-value attributes contain a single value. For example − Social_Security_Number.


Multi-value Attribute 

 Multi-value attributes may contain more than one value. For example, a person can have more than one phone number, email address, etc.

These attribute types can combine in four ways −

simple single-valued attributes
simple multi-valued attributes
composite single-valued attributes
composite multi-valued attributes
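These distinctions show up when an ER design is translated to tables: a composite attribute is flattened into several columns, while a multi-valued attribute becomes a separate table with one row per value. A sketch using Python's built-in sqlite3 module (the table and column names are illustrative, not from any particular schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite attribute `name` is flattened into first_name / last_name columns.
conn.execute("""CREATE TABLE student (
    roll_number INTEGER PRIMARY KEY,
    first_name  TEXT,
    last_name   TEXT)""")
# Multi-valued attribute `phone` becomes its own table, one row per value.
conn.execute("""CREATE TABLE student_phone (
    roll_number INTEGER REFERENCES student(roll_number),
    phone       TEXT)""")

conn.execute("INSERT INTO student VALUES (1, 'Ada', 'Lovelace')")
conn.executemany("INSERT INTO student_phone VALUES (?, ?)",
                 [(1, '555-0100'), (1, '555-0101')])  # two phone numbers
phones = conn.execute(
    "SELECT phone FROM student_phone WHERE roll_number = 1").fetchall()
print(len(phones))  # 2
```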
Entity-Set and Keys
A key is an attribute or a collection of attributes that uniquely identifies an entity within an entity set.

For example, the roll_number of a student makes him/her identifiable among students.


Super Key 

A set of attributes (one or more) that collectively identifies an entity in an entity set.


Candidate Key

 A minimal super key is called a candidate key. An entity set may have more than one candidate key.


Primary Key

A primary key is one of the candidate keys chosen by the database designer to uniquely identify the entity set.
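In SQL terms, the chosen candidate key becomes the PRIMARY KEY, and any remaining candidate keys can still be enforced with UNIQUE constraints. A sketch with Python's sqlite3 module (the columns are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# roll_number is the primary key; email is another candidate key,
# enforced as UNIQUE so it also identifies a student uniquely.
conn.execute("""CREATE TABLE student (
    roll_number INTEGER PRIMARY KEY,
    email       TEXT UNIQUE,
    name        TEXT)""")

conn.execute("INSERT INTO student VALUES (1, 'a@school.edu', 'Ann')")
try:
    # Violates the candidate key: duplicate email.
    conn.execute("INSERT INTO student VALUES (2, 'a@school.edu', 'Bob')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```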

Relationship

The association among entities is called a relationship. For example, an employee works_at a department, a student enrolls in a course. Here, Works_at and Enrolls are called relationships.

Relationship Set

A set of relationships of similar type is called a relationship set. Like entities, a relationship too can have attributes. These attributes are called descriptive attributes.


Degree of Relationship

The number of participating entities in a relationship defines the degree of the relationship.

Binary = degree 2
Ternary = degree 3
n-ary = degree n
Mapping Cardinalities
Cardinality defines the number of entities in one entity set that can be associated with entities of another set via a relationship set.

One-to-one − One entity from entity set A can be associated with at most one entity of entity set B and vice versa.

One-to-one relation

One-to-many − One entity from entity set A can be associated with more than one entity of entity set B; however, an entity from entity set B can be associated with at most one entity from entity set A.

One-to-many relation

Many-to-one − More than one entity from entity set A can be associated with at most one entity of entity set B; however, an entity from entity set B can be associated with more than one entity from entity set A.


Many-to-one relation

Many-to-many − One entity from A can be associated with more than one entity from B and vice versa.
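When such a design is implemented, a many-to-many relationship like Enrolls is usually realized as a separate table holding the keys of both entity sets. A sqlite3 sketch (all names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);
-- The Enrolls relationship set: one row per (student, course) pair.
CREATE TABLE enrolls (
    student_id INTEGER REFERENCES student(student_id),
    course_id  INTEGER REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id));
INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO course  VALUES (10, 'Databases'), (11, 'Networks');
INSERT INTO enrolls VALUES (1, 10), (1, 11), (2, 10);
""")
# Ann takes two courses, and Databases has two students: many-to-many.
rows = conn.execute(
    "SELECT COUNT(*) FROM enrolls WHERE student_id = 1").fetchone()
print(rows[0])  # 2
```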


Download ER Model Presentation



What is Database Concurrency?



Modern relational database systems potentially allow hundreds, if not thousands, of simultaneous connections to the data. The architecture of the database system itself determines the various ways that concurrent access to the same data can be managed with as little interference between users or connections as possible. Most database systems provide features that an application developer can use to exert some control over how concurrent access is managed, allowing the developer to find a balance between concurrency and data consistency.
Microsoft® SQL Server™ 2005 includes a new technology called Row Level Versioning (RLV) that allows concurrent access to be handled in new ways. Many features of SQL Server 2005 are designed around RLV, and no additional application control is necessary to take advantage of this new capability. For other features, such as the new isolation levels, a database administrator must specifically enable RLV on a database-by-database basis. This preserves backward compatibility for applications that depend on the locking behavior of previous SQL Server versions.
This white paper focuses on concurrency enhancements in SQL Server 2005. On the server side, it covers all the features of SQL Server that leverage RLV technology. These include the new features: Snapshot Isolation, Multiple Active Result Sets (MARS) and Online Index Rebuild. RLV is also used in SQL Server 2005 to support database triggers, so the differences in trigger behavior between SQL Server 2000 and 2005 are also discussed. On the client side, concurrency enhancements covered include concurrency in CLR objects, transaction control from the new SQL Native Client, Windows Enterprise Services and queued components, and concurrency using Service Broker enabled applications.
One of the main benefits of RLV and the client-side enhancements is that SQL Server can now provide higher levels of database concurrency with equivalent or better data consistency. This paper will describe the pre-existing concurrency features only in order to compare the new features with the existing ones.
Database Concurrency Definition
Concurrency can be defined as the ability for multiple processes to access or change shared data at the same time. The greater the number of concurrent user processes that can execute without blocking each other, the greater the concurrency of the database system.
Concurrency is impacted when a process that is changing data prevents other processes from reading the data being changed, or when a process that is reading data prevents other processes from changing that data. Concurrency is also impacted when multiple processes attempt to change the same data concurrently and they cannot all succeed without sacrificing data consistency.
How a database system addresses situations that decrease concurrency is partly determined by whether the system is using optimistic or pessimistic concurrency control. Pessimistic concurrency control works on the assumption that there are enough data modification operations in the system to make it likely that any given read operation will be affected by a data modification made by another user. In other words, the system is being pessimistic and assuming a conflict will occur. The default behavior when using pessimistic concurrency control is to use locks to block access to data that another process is using. Optimistic concurrency control works on the assumption that there are few enough data modification operations to make it unlikely (although possible) that any process will modify data that another process is reading. The default behavior when using optimistic concurrency control is to use row versioning to allow data readers to see the state of the data before the modification took place.
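Applications frequently implement optimistic control themselves with a version column: an update succeeds only if the row still carries the version number the writer originally read. A simplified Python/sqlite3 sketch of that pattern (the schema and helper function are hypothetical and are not SQL Server's internal mechanism):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100, 0)")

def optimistic_update(conn, acct_id, new_balance, read_version):
    """Succeeds only if no one changed the row since we read it."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, acct_id, read_version))
    return cur.rowcount == 1  # 0 rows touched means a conflict

# Two writers both read version 0; only the first update wins.
print(optimistic_update(conn, 1, 150, 0))  # True
print(optimistic_update(conn, 1, 175, 0))  # False: version is now 1
```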
Historical Behavior
Historically, the concurrency control model in SQL Server at the server level has been pessimistic and based on locking. While locking is still the best concurrency control choice for most applications, it can introduce significant blocking problems for a small set of applications.
The biggest problem arises when locking causes the writer-block-reader or reader-block-writer problem. If a transaction changes a row, it holds exclusive locks on the changed data. The default behavior in SQL Server is that no other transactions can read the row until the writer commits. Alternatively, SQL Server supports ‘read uncommitted isolation’, which can be requested by the application either by setting the isolation level for the connection or by specifying the NOLOCK table hint. This nonlocking scan should always be carefully considered prior to use, because it is not guaranteed to return transactionally consistent results.
Prior to SQL Server 2005, the tradeoff in concurrency solutions was that we could avoid having writers block readers only if we were willing to risk inconsistent data. If our results always had to be based on committed data, we had to be willing to wait for changes to be committed.
Overview of Row Level Versioning
Even if the application requires that results always be based on committed data, there are still two possibilities. If the reader absolutely must have the latest committed value of the data, it is correct for readers to wait (on a lock) for writers to complete their transactions and commit their changes. In other situations, it might be sufficient just to have committed data values, even if they are not the most recent versions. In this case, a reader might be fine if SQL Server could provide it with a previously committed value of the row, that is, an older version.
SQL Server 2005 introduces a new isolation level called ‘snapshot isolation’ (SI) and a new non-locking flavor of read-committed isolation, called ‘read committed snapshot isolation’ (RCSI). These row-versioning based isolation levels allow the reader to get to a previously committed value of the row without blocking, so concurrency is increased in the system. For this to work, SQL Server must keep old versions of a row when it is updated. Because multiple older versions of the same row may need to be maintained, this new behavior is also called multi-version concurrency control or row level versioning.
To support storing multiple older versions of rows, additional disk space is used from the tempdb database. The disk space used by the version store needs to be monitored and managed appropriately.
Versioning works by making any transaction that changes data keep the old versions of the data around so that a ‘snapshot’ of the database (or a part of the database) can be constructed from these old versions.
When a record in a table or index is updated, the new record is stamped with the transaction sequence number of the transaction that is doing the update. The previous version of the record is stored in the version store, and the new record contains a pointer to the old record in the version store. Old records in the version store may contain pointers to even older versions. All the old versions of a particular record are chained in a linked list, and SQL Server may need to follow several pointers in a list to reach the right version. Version records need to be kept in the version store only as long as there are operations that might require them.
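This version chain can be pictured as a small linked structure: the current record points at its predecessor in the version store, and a reader walks the chain until it finds the newest version stamped no later than its snapshot. A toy Python model of that traversal (this is an illustration, not SQL Server's actual on-disk format):

```python
class RowVersion:
    """One version of a row, stamped with the transaction sequence number
    that produced it and linked to the previous version."""
    def __init__(self, value, xsn, older=None):
        self.value = value
        self.xsn = xsn        # transaction sequence number
        self.older = older    # pointer into the "version store"

def read_as_of(current, snapshot_xsn):
    """Follow pointers until we reach a version visible to the snapshot."""
    v = current
    while v is not None and v.xsn > snapshot_xsn:
        v = v.older
    return None if v is None else v.value

# Three transactions updated the row in turn; the data page holds only
# the newest version, older ones live in the chain.
v1 = RowVersion("alpha", xsn=1)
v2 = RowVersion("beta", xsn=2, older=v1)
current = RowVersion("gamma", xsn=3, older=v2)

print(read_as_of(current, snapshot_xsn=3))  # gamma
print(read_as_of(current, snapshot_xsn=2))  # beta
```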
In the following figure, the current version of the record is generated by transaction T3, and it is stored in the normal data page. The previous versions of the record, generated by transaction T2 and transaction Tx are stored in pages in the version store (in tempdb).
Figure 1: Versions of a Record
Row level versioning gives SQL Server an optimistic concurrency model to work with when the needs of an application require it or when the concurrency reduction of using the default pessimistic model is unacceptable. To switch to the row-versioning based isolation levels, the tradeoffs of using this new concurrency model need to be carefully considered. In addition to extra management requirements to monitor the increased usage of tempdb for the version store, using versioning will slow the performance of update operations, due to the extra work involved in maintaining old versions. Update operations will bear this cost, even if there are no current readers of the data. If there are readers using row level versioning, they will have the extra cost of traversing the link pointers to find the appropriate version of the requested row.
In addition, because the optimistic concurrency model of snapshot isolation assumes (optimistically) that there will not be many update conflicts occurring, you should not choose the SI isolation level if you are expecting contention for updating the same data concurrently. Snapshot isolation works well to enable readers not to be blocked by writers, but simultaneous writers are still not allowed. In the default pessimistic model, the first writer will block all subsequent writers, but using SI, subsequent writers could actually receive error messages and the application would need to resubmit the original request. Note that these update conflicts will only occur with SI, and not with the enhanced read-committed isolation level, RCSI. In addition, there are guidelines you can follow to reduce update conflicts when using Snapshot Isolation.
For more details about using Row Level Versioning to support Snapshot Isolation, please see Kimberly Tripp’s white paper: SQL Server 2005 Snapshot Isolation, http://msdn2.microsoft.com/en-us/library/ms345124.aspx
Additional SQL Server 2005 Features utilizing Row Level Versioning
Although the motivation behind adding row level versioning to SQL Server 2005 was to maintain a version store for optimistic concurrency, and to support row-versioning based isolation levels to solve the problem of data writers blocking all readers, there are other SQL Server 2005 features that take advantage of this new data management mechanism. Two of these features, multiple active result sets (MARS) and online index rebuilds, are new to the product; the third is a new way of managing triggers, which are an existing feature. This section provides an overview description of the use of row level versioning for these SQL Server features.
Triggers and Row Level Versioning
Triggers have been a part of SQL Server since the earliest version and they were the only feature of the product prior to SQL Server 2005 that gave us any type of historical (or versioned) data. One of the special features of triggers is the ability to access a pseudo-table called ‘deleted’. If the trigger is a DELETE trigger, the ‘deleted’ table contains all the rows which were deleted by the operation that caused the trigger to fire. If the trigger is an UPDATE trigger, the ‘deleted’ table contains the old version of the data in all the rows changed by the update statement that caused the trigger to fire, in other words, the data before the update took place. Previous versions of SQL Server would populate the deleted table by scanning the transaction log looking for all the log records in the current transaction that changed the table to which the trigger was tied. Scanning log records can be very expensive, because the transaction log is optimized for writing, not reading. For a high volume OLTP system in which log records for the current transaction may have already been written to disk, this could incur actual physical I/O operations. The new mechanism can then help the performance of your existing triggers.
In SQL Server 2005, the deleted table is materialized using row level versioning. When updates or deletes are performed on a table that has a relevant trigger defined, the changes to the table are versioned, regardless of whether or not row-versioning based isolation levels have been enabled. When the trigger needs to access the ‘deleted’ table, it retrieves the data from the version store. New data, whether from an update or an insert, is accessible through the ‘inserted’ table. When a SQL Server 2005 trigger scans inserted, it looks for the most recent versions of the rows.
Because tempdb is used for the version store, applications that make heavy use of triggers in SQL Server 2000 should be aware that there may be increased demands on tempdb after upgrading to SQL Server 2005.
Online Index Creation and Row Level Versioning
Index creation and rebuilding are obviously not new features of SQL Server, but in SQL Server 2005, we can now build, or rebuild, an index without taking the table or index offline. In previous versions, building or rebuilding a clustered index would exclusively lock the entire table so that all the data was completely inaccessible. Building or rebuilding a nonclustered index would place a shared lock on the table, so that data could be read but not modified. In addition, while rebuilding a nonclustered index, the index itself was completely unusable and queries that might have used the index would exhibit degraded performance.
With Row Level Versioning, SQL Server 2005 allows indexes to be built or rebuilt completely online. As the index is being built, SQL Server scans the existing table for the version of the data at the time the index building began. Any modifications to the table will be versioned, regardless of whether snapshot isolation has been enabled. Requests to read data from the table will access the versioned data.
Multiple Active Result Sets and Row Level Versioning
Although Multiple Active Result Sets (MARS) is a client-side feature of SQL Server 2005, its implementation relies on the version store which is very much a server-side feature. For this reason, discussion of MARS is included along with other row level versioning features, and not in the section on client-side concurrency considerations.
Microsoft SQL Server 2005 extends the support for MARS in applications accessing the database engine. In earlier versions of SQL Server, database applications could not maintain multiple active statements on a connection when using default result sets. The application had to process or cancel all result sets from one batch before it could execute any other batch on that connection. SQL Server 2005 introduces new attributes that allow applications to have more than one pending request per connection; in particular, to have more than one active default result set per connection.
Only SELECT and BULK INSERT statements are allowed to execute non-atomically and interleave with other statements executing on other batches under a MARS enabled connection. DDL and data modification statements all execute atomically. If there are any other statements waiting to execute, they will block until atomic statements have completed execution.
If two batches are submitted under a MARS connection, one of them containing a SELECT statement and the other containing an UPDATE statement, the UPDATE can begin execution before the SELECT has processed its entire result set. However, the UPDATE statement must run to completion before the SELECT statement can make progress and all changes made by the UPDATE will be versioned. If both statements are running under the same transaction, any changes made by the UPDATE statement after the SELECT statement has started execution are not visible to the SELECT, because the SELECT will access the older version of the required data rows.
Client-side Issues Involving Concurrency
The Storage Engine in SQL Server 2005 provides concurrency control, managing transactions and locks. However, users access SQL Server through client applications, components, and services, and all these programming components are affected by the way SQL Server manages concurrency. It is also important to understand how connection settings affect concurrency, and how to manage transactions effectively from client applications.
In previous versions of SQL Server, any command actually executed by SQL Server had to be a Transact-SQL command, regardless of which application layer started a particular transaction.
With the new SQL CLR capabilities in SQL Server 2005, the new server-side programming infrastructure (including features such as Service Broker), and the new data access provider, concurrency and transaction management become far more powerful. However, with this added power comes added complexity.
Concurrency control in SQL CLR objects
Actions performed from inside a SQL CLR procedure do not need to start a DTC-managed transaction, and operations that update, insert, or delete SQL Server data follow the same transactional principles as standard Transact-SQL procedures.
SQL CLR objects can use SQL Server data by using the built-in Data Access Provider, through the SqlContext object, and these SQL CLR objects can send results to the calling connection using the Pipe property of the SqlContext object, as in the following example:
<SqlProcedure()> _
Public Shared Sub GetServerVersion()
    Dim cmdGetVersion As SqlCommand = SqlContext.CreateCommand()

    cmdGetVersion.CommandText = "SELECT @@VERSION"
    cmdGetVersion.CommandType = CommandType.Text

    SqlContext.Pipe.Send(cmdGetVersion.ExecuteScalar().ToString())
End Sub
It looks very simple, but what is actually happening behind the scenes? Setting up a trace to watch what actually happens when the above command is executed, we see the following events:
SQL:BatchStarting    EXEC dbo.GetServerVersion
SQL:StmtStarting     EXEC dbo.GetServerVersion
SP:Starting          EXEC dbo.GetServerVersion
SP:StmtStarting      SELECT @@VERSION
Note that these are exactly the same events we would have seen if we had defined and executed the following Transact-SQL stored procedure:
CREATE PROCEDURE dbo.GetServerVersion
AS
SELECT @@VERSION
GO
Now let’s create a CLR stored procedure that updates data in the Production.Product table of the AdventureWorks database:
<SqlProcedure()> _
Public Shared Sub UpdateListPriceByProductID(ByVal ProductID As SqlInt32)
    Try
        Dim cmdUpdate As SqlCommand = SqlContext.CreateCommand()
        Dim parProductID As SqlParameter = _
            cmdUpdate.Parameters.Add("@ProductID", SqlDbType.Int)

        parProductID.Direction = ParameterDirection.Input
        cmdUpdate.CommandText = "UPDATE Production.Product " _
            + "SET ListPrice = ListPrice * 1.1 " _
            + "WHERE ProductID = @ProductID"

        parProductID.Value = ProductID

        cmdUpdate.ExecuteNonQuery()

    Catch e As Exception
        SqlContext.Pipe.Send(e.Message)
    End Try
End Sub
Again, we can use a trace to see what activity actually takes place in SQL Server when we execute this stored procedure. The following table lists the events that took place on the server, and includes an event number so that we can refer to the specific events in the following discussion.
1
SQL:BatchStarting
EXECUTE dbo.UpdateListPriceByProductID 444;
 
The batch we execute in Management Studio begins execution.
2
SQL:StmtStarting
EXECUTE dbo.UpdateListPriceByProductID 444;
 
The only statement in this batch begins execution.
3
SP:Starting
EXECUTE dbo.UpdateListPriceByProductID 444;
 
This statement calls the dbo.UpdateListPriceByProductID CLR stored procedure, which begins its execution.
4
SP:StmtStarting
UPDATE Production.Product
SET ListPrice = ListPrice * 1.1
WHERE ProductID = @ProductID
 
This is the only statement in this stored procedure that executes any action on the database.
Note that there is no connection event, because this procedure uses the SQLContext object to get a reference to the current connection context.
5
SQLTransaction
75691 UPDATE 0 – Begin
 
Because the statement to be executed is a data modification operation, SQL Server starts an implicit transaction automatically.
6
SP:Starting
UPDATE Production.Product
SET ListPrice = ListPrice * 1.1
WHERE ProductID = @ProductID
 
There is an AFTER UPDATE trigger defined on the Production.Product table called uProduct. This trigger is managed internally exactly like a stored procedure, so we see the SP:Starting event.
The execution of this trigger still takes place under the control of the transaction number 75691. This is the code that creates this trigger:
CREATE TRIGGER [Production].[uProduct] 
ON [Production].[Product] 
AFTER UPDATE NOT FOR REPLICATION AS 
BEGIN
    SET NOCOUNT ON;

    UPDATE [Production].[Product]
    SET [Production].[Product].[ModifiedDate] = GETDATE()
    FROM inserted
    WHERE inserted.[ProductID] = [Production].[Product].[ProductID];
END;
7
SP:StmtStarting
SET NOCOUNT ON;
 
The first statement inside the trigger begins.
8
SP:StmtCompleted
SET NOCOUNT ON;
 
The first statement inside the trigger completes.
9
SP:StmtStarting
UPDATE [Production].[Product] 
SET [Production].[Product].[ModifiedDate] 
= GETDATE() 
FROM inserted 
WHERE inserted.[ProductID] 
= [Production].[Product].[ProductID];
 
The second statement inside the trigger begins.
10
SP:StmtCompleted
UPDATE [Production].[Product] 
SET [Production].[Product].[ModifiedDate] 
= GETDATE()
FROM inserted 
WHERE inserted.[ProductID] 
= [Production].[Product].[ProductID];
 
The second statement inside the trigger completes.
11
SP:Completed
UPDATE Production.Product
SET ListPrice = ListPrice * 1.1
WHERE ProductID = @ProductID
 
The uProduct trigger completes execution.
12
SP:StmtCompleted
UPDATE Production.Product
SET ListPrice = ListPrice * 1.1
WHERE ProductID = @ProductID
 
The original UPDATE statement started on step 4 completes its execution.
13
SQLTransaction
75691 UPDATE 1 – Commit
 
Transaction 75691 commits automatically when the statement completes, because the transaction was opened automatically when this statement required writing to the database.
14
SP:Completed
EXECUTE [dbo].[UpdateListPriceByProductID] 444;
 
The stored procedure started on step 3 completes.
15
SQL:StmtCompleted
EXECUTE [dbo].[UpdateListPriceByProductID] 444;
 
The statement started on step 2 completes.
16
SQL:BatchCompleted
EXECUTE [dbo].[UpdateListPriceByProductID] 444;
 
The batch started on step 1 completes.
The trace events give no clue that what was actually executed was CLR code. The fact that a CLR stored procedure forced the execution of a Transact-SQL trigger should not cause any concern from a transactional point of view.
The above trace information shows that code executed from inside a SQL CLR stored procedure runs under the same SPID as the connection that calls the procedure, as long as the SqlContext object is used.
When SQL CLR objects access any external database system using any of the .NET Data Providers, they will behave as if they had connected from a standard .NET application. There will be no differences from a transactional point of view. We cover concurrency management in ADO.NET 2.0 in the next section.
SQL CLR objects should always connect to the current instance of SQL Server through the in-process data provider.
Concurrency management from ADO.NET 2.0
ADO.NET broke the former client data access paradigm by providing two different programming models to create database applications and components:
The disconnected model, based on optimistic concurrency, built around the DataSet and DataAdapter types, and
The connected model, based on pessimistic concurrency, built around the SqlCommand and SqlDataReader types
Using the disconnected model, the application uses a SqlDataAdapter or a TableAdapter to read the requested data from the database, holding enough shared locks during this operation to make sure that it can read data rows consistently, and then disconnects from the database, releasing any locks that the reading operation might have held. In this sense, transaction behavior is almost the same as in previous releases of ADO.NET, and you can refer to the comprehensive public information available on this topic on MSDN.
ADO.NET 2.0 provides an enhanced SQL Server .NET Data Provider that exposes the new functionality available in the new edition of SQL Server. Some of the new features exposed by this new provider have been covered previously in this paper, such as MARS and the new Snapshot Isolation level.
Another ADO.NET enhancement that might change transaction behavior is the possibility of executing commands asynchronously, which uses the new MARS feature of SQL Server behind the scenes. However, there isn’t anything special on this issue related to ADO.NET and the behavior is exactly the same as it has been described in earlier sections of this paper.
The .NET Framework version 2.0 provides two new transaction managers: the Lightweight Transaction Manager (LTM) and the OleTx Transaction Manager. Access to these two transaction managers is encapsulated in the System.Transactions namespace.
You can read a complete description of System.Transactions in the white paper “Introducing System.Transactions” by Juval Lowy (http://www.microsoft.com/downloads/details.aspx?FamilyId=AAC3D722-444C-4E27-8B2E-C6157ED16B15&displaylang=en). ADO.NET 2.0 has been redesigned to take advantage of System.Transactions automatically, which means that any code using Enterprise Services with ADO.NET 2.0 will use LTM or OleTx behind the scenes when necessary. In this way, the application uses this transaction model declaratively.
To illustrate this new technique, we are going to execute the same sample application with and without using System.Transactions.
This first example executes a SQL statement through a SqlCommand object, without explicit transaction control:
Private Sub TestSystemTransactions()
    ' Note: BeginExecuteNonQuery requires "Asynchronous Processing=true"
    ' in the connection string.
    Dim conAW As New SqlConnection(sConnString)
    Dim sQuery1 As String, count1 As Integer
    Dim cmd1 As SqlCommand = conAW.CreateCommand()

    sQuery1 = "UPDATE Production.Product " _
        + "SET ListPrice = ListPrice * 1.1 " _
        + "WHERE ProductNumber LIKE 'EC%'"

    cmd1.CommandText = sQuery1

    Try
        conAW.Open()
        Dim result1 As IAsyncResult = cmd1.BeginExecuteNonQuery()

        While result1.IsCompleted = False
            Console.WriteLine("Waiting ({0})", count1)
            ' Wait for 1/10 second, so the counter
            ' doesn't consume all available resources
            ' on the main thread.
            Threading.Thread.Sleep(100)
            If result1.IsCompleted = False Then count1 += 1
        End While

        Console.WriteLine("Command complete. Affected {0} rows.", _
            cmd1.EndExecuteNonQuery(result1))

    Catch ex As SqlException
        Console.WriteLine("Error ({0}): {1}", ex.Number, ex.Message)
    Catch ex As InvalidOperationException
        Console.WriteLine("Error: {0}", ex.Message)
    Catch ex As Exception
        Console.WriteLine("Error: {0}", ex.Message)
    Finally
        conAW.Close()
        Console.ReadLine()
    End Try
End Sub
Again, we can use a trace to see what activity actually takes place in SQL Server when we execute this code.

Download Database Concurrency Presentation



What is a Transaction in a Database Management System?


A transaction can be defined as a group of tasks. A single task is the minimum processing unit, which cannot be divided further.

Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's account to B's account. This very simple and small transaction involves several low-level tasks.

A’s Account


Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)

B’s Account


Open_Account(B)

Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
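In a real DBMS, both groups of low-level tasks above would run as a single transaction, so that either both accounts change or neither does. A minimal sketch using Python's sqlite3 module (the accounts table, names, and starting balances are invented for illustration) shows the two updates committing or rolling back together:

```python
import sqlite3

# In-memory database with two hypothetical accounts.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO accounts VALUES (?, ?)",
                [("A", 1000), ("B", 200)])
con.commit()

def transfer(con, src, dst, amount):
    """Run both low-level updates as one atomic unit of work."""
    try:
        cur = con.cursor()
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                    (amount, src))
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                    (amount, dst))
        con.commit()      # both updates become durable together
    except Exception:
        con.rollback()    # on failure, neither update survives
        raise

transfer(con, "A", "B", 500)
print(dict(con.execute("SELECT name, balance FROM accounts")))
# → {'A': 500, 'B': 700}
```

If either UPDATE raised an exception, the rollback would restore both balances, which is exactly the atomicity property discussed below.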

ACID Properties

A transaction is a very small unit of a program and it may contain several low-level tasks. A transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known as ACID properties − in order to ensure accuracy, completeness, and data integrity.


Atomicity 

This property states that a transaction must be treated as an atomic unit: either all of its operations are executed or none of them is. There must be no state in the database where a transaction is left partially completed. Database states are defined either before the execution of the transaction or after its execution, abortion, or failure.


Consistency 

The database must remain in a consistent state after any transaction. No transaction should have any adverse effect on the data residing in the database. If the database was in a consistent state before the execution of a transaction, it must remain consistent after the execution of the transaction as well.


Durability 

The database should be durable enough to hold all its latest updates even if the system fails or restarts. If a transaction updates a chunk of data in a database and commits, then the database will hold the modified data. If a transaction commits but the system fails before the data could be written to disk, then that data will be updated once the system springs back into action.


Isolation 

In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that each transaction will be carried out as if it were the only transaction in the system. No transaction will affect the existence of any other transaction.

Serializability

When multiple transactions are executed by the operating system in a multiprogramming environment, there is a possibility that the instructions of one transaction are interleaved with those of another.


Schedule

A chronological execution sequence of transactions is called a schedule. A schedule can have many transactions in it, each comprising a number of instructions/tasks.


Serial Schedule 

It is a schedule in which transactions are aligned in such a way that one transaction is executed first. When the first transaction completes its cycle, then the next transaction is executed. Transactions are ordered one after the other. This type of schedule is called a serial schedule, as transactions are executed in a serial manner.

In a multi-transaction environment, serial schedules are considered a benchmark. The execution sequence of instructions within a transaction cannot be changed, but two transactions can have their instructions interleaved in a random fashion. This interleaving does no harm if the two transactions are mutually independent and working on different segments of data; but if the two transactions are working on the same data, then the results may vary. This ever-varying result may bring the database to an inconsistent state.

To resolve this problem, we allow parallel execution of a transaction schedule, if its transactions are either serializable or have some equivalence relation among them.
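The danger of interleaving can be made concrete with a small, deterministic sketch. Two hypothetical transactions each read a shared value and then write it back (T1 adds 50, T2 subtracts 20); the names and values are invented for illustration:

```python
# A schedule is a list of (local_store, op) steps over a shared db.
def run(schedule):
    db = {"x": 100}
    for local, op in schedule:
        op(db, local)
    return db["x"]

def read(db, local):  local["v"] = db["x"]   # read x into local memory
def add50(db, local): db["x"] = local["v"] + 50
def sub20(db, local): db["x"] = local["v"] - 20

t1, t2 = {}, {}

# Serial schedule: T1 completes before T2 starts.
serial = [(t1, read), (t1, add50), (t2, read), (t2, sub20)]

# Interleaved schedule: both transactions read before either writes.
interleaved = [(t1, read), (t2, read), (t1, add50), (t2, sub20)]

print(run(serial))       # 130  (100 + 50 - 20)
print(run(interleaved))  # 80   (T1's +50 is lost)
```

The serial schedule gives the expected 130; in the interleaved schedule T2 overwrites T1's update (a "lost update"), which is the inconsistency that serializability theory is designed to rule out.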

Equivalence Schedules

Equivalence between schedules can be of the following types −

Result Equivalence

If two schedules produce the same result after execution, they are said to be result equivalent. They may yield the same result for some value and different results for another set of values. That's why this equivalence is not generally considered significant.

View Equivalence

Two schedules are said to be view equivalent if the transactions in both schedules perform similar actions in a similar manner.

For example −


If T reads the initial data in S1, then it also reads the initial data in S2.

If T reads the value written by J in S1, then it also reads the value written by J in S2.

If T performs the final write on the data value in S1, then it also performs the final write on the data value in S2.

Conflict Equivalence

Two operations are said to be in conflict if they have the following properties −

They belong to different transactions.
They access the same data item.
At least one of them is a write operation.

Two schedules having multiple transactions with conflicting operations are said to be conflict equivalent if and only if −

Both schedules contain the same set of transactions.
The order of conflicting pairs of operations is maintained in both schedules.
Note − View equivalent schedules are view serializable and conflict equivalent schedules are conflict serializable. All conflict serializable schedules are view serializable too.
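One standard test for conflict serializability (not spelled out above) is to build a precedence graph: draw an edge Ti → Tj for each conflicting pair where Ti's operation comes first, and check for cycles. A minimal Python sketch, with the schedule encoded as (transaction, action, item) triples of my own devising:

```python
from collections import defaultdict

def conflict_serializable(schedule):
    """schedule: list of (txn, action, item), action in {'R', 'W'}."""
    edges = defaultdict(set)
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            # Conflicting pair: different txns, same item, at least one write.
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges[ti].add(tj)   # Ti must precede Tj

    # Serializable iff the precedence graph has no cycle (DFS check).
    def cyclic(node, stack):
        if node in stack:
            return True
        return any(cyclic(nxt, stack | {node}) for nxt in edges[node])

    return not any(cyclic(t, set()) for t in list(edges))

# T1 fully precedes T2 on x: conflict serializable.
print(conflict_serializable([("T1", "R", "x"), ("T1", "W", "x"),
                             ("T2", "R", "x"), ("T2", "W", "x")]))  # True

# T2's write falls between T1's read and write: the graph has a cycle.
print(conflict_serializable([("T1", "R", "x"), ("T2", "W", "x"),
                             ("T1", "W", "x")]))                    # False
```

An acyclic precedence graph means the schedule is conflict equivalent to some serial schedule, namely any topological order of the graph.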

States of Transactions

A transaction in a database can be in one of the following states −

Transaction States
Active − In this state, the transaction is being executed. This is the initial state of every transaction.

Partially Committed − When a transaction executes its final operation, it is said to be in a partially committed state.

Failed − A transaction is said to be in a failed state if any of the checks made by the database recovery system fails. A failed transaction can no longer proceed further.

Aborted − If any of the checks fails and the transaction has reached a failed state, then the recovery manager rolls back all its write operations on the database to bring the database back to its original state, where it was prior to the execution of the transaction. Transactions in this state are called aborted. The database recovery module can select one of two operations after a transaction aborts −

Re-start the transaction

Kill the transaction
Committed − If a transaction executes all its operations successfully, it is said to be committed. All its effects are now permanently established on the database system.
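The states and the legal moves between them can be sketched as a small state machine; the dictionary encoding below is illustrative, not a DBMS API:

```python
# Legal transitions between transaction states, as described above.
TRANSITIONS = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "committed":           set(),   # terminal
    "aborted":             set(),   # terminal (may be restarted as a new txn)
}

def advance(state, nxt):
    """Move to the next state, rejecting any illegal transition."""
    if nxt not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {nxt}")
    return nxt

s = "active"
s = advance(s, "partially committed")
s = advance(s, "committed")
print(s)  # committed
```

Note that "committed" and "aborted" have no outgoing edges: once a transaction reaches either state, its outcome is final.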




What is No Silver Bullet?


Of all the monsters that fill the nightmares of our folklore, none terrify more than werewolves, because they transform unexpectedly from the familiar into horrors. For these, one seeks bullets of silver that can magically lay them to rest.
The familiar software project, at least as seen by the nontechnical manager, has something of this character; it is usually innocent and straightforward, but is capable of becoming a monster of missed schedules, blown budgets, and flawed products. So we hear desperate cries for a silver bullet--something to make software costs drop as rapidly as computer hardware costs do.

But, as we look to the horizon of a decade hence, we see no silver bullet. There is no single development, in either technology or in management technique, that by itself promises even one order-of-magnitude improvement in productivity, in reliability, in simplicity. In this article, I shall try to show why, by examining both the nature of the software problem and the properties of the bullets proposed.

Skepticism is not pessimism, however. Although we see no startling breakthroughs--and indeed, I believe such to be inconsistent with the nature of software--many encouraging innovations are under way. A disciplined, consistent effort to develop, propagate, and exploit these innovations should indeed yield an order-of-magnitude improvement. There is no royal road, but there is a road.

The first step toward the management of disease was replacement of demon theories and humours theories by the germ theory. That very step, the beginning of hope, in itself dashed all hopes of magical solutions. It told workers that progress would be made stepwise, at great effort, and that a persistent, unremitting care would have to be paid to a discipline of cleanliness. So it is with software engineering today.


Does It Have to Be Hard?--Essential Difficulties


Not only are there no silver bullets now in view, the very nature of software makes it unlikely that there will be any--no inventions that will do for software productivity, reliability, and simplicity what electronics, transistors, and large-scale integration did for computer hardware. We cannot expect ever to see twofold gains every two years.
First, one must observe that the anomaly is not that software progress is so slow, but that computer hardware progress is so fast. No other technology since civilization began has seen six orders of magnitude in performance price gain in 30 years. In no other technology can one choose to take the gain in either improved performance or in reduced costs. These gains flow from the transformation of computer manufacture from an assembly industry into a process industry.

Second, to see what rate of progress one can expect in software technology, let us examine the difficulties of that technology. Following Aristotle, I divide them into essence, the difficulties inherent in the nature of software, and accidents, those difficulties that today attend its production but are not inherent.

The essence of a software entity is a construct of interlocking concepts: data sets, relationships among data items, algorithms, and invocations of functions. This essence is abstract in that such a conceptual construct is the same under many different representations. It is nonetheless highly precise and richly detailed.

I believe the hard part of building software to be the specification, design, and testing of this conceptual construct, not the labor of representing it and testing the fidelity of the representation. We still make syntax errors, to be sure; but they are fuzz compared with the conceptual errors in most systems.

If this is true, building software will always be hard. There is inherently no silver bullet.

Let us consider the inherent properties of this irreducible essence of modern software systems: complexity, conformity, changeability, and invisibility.

Complexity. Software entities are more complex for their size than perhaps any other human construct because no two parts are alike (at least above the statement level). If they are, we make the two similar parts into a subroutine--open or closed. In this respect, software systems differ profoundly from computers, buildings, or automobiles, where repeated elements abound.

Digital computers are themselves more complex than most things people build: They have very large numbers of states. This makes conceiving, describing, and testing them hard. Software systems have orders-of-magnitude more states than computers do.

Likewise, a scaling-up of a software entity is not merely a repetition of the same elements in larger sizes, it is necessarily an increase in the number of different elements. In most cases, the elements interact with each other in some nonlinear fashion, and the complexity of the whole increases much more than linearly.

The complexity of software is an essential property, not an accidental one. Hence, descriptions of a software entity that abstract away its complexity often abstract away its essence. For three centuries, mathematics and the physical sciences made great strides by constructing simplified models of complex phenomena, deriving properties from the models, and verifying those properties by experiment. This paradigm worked because the complexities ignored in the models were not the essential properties of the phenomena. It does not work when the complexities are the essence.

Many of the classic problems of developing software products derive from this essential complexity and its nonlinear increases with size. From the complexity comes the difficulty of communication among team members, which leads to product flaws, cost overruns, schedule delays. From the complexity comes the difficulty of enumerating, much less understanding, all the possible states of the program, and from that comes the unreliability. From complexity of function comes the difficulty of invoking function, which makes programs hard to use. From complexity of structure comes the difficulty of extending programs to new functions without creating side effects. From complexity of structure come the unvisualized states that constitute security trapdoors.

Not only technical problems, but management problems as well come from the complexity. It makes overview hard, thus impeding conceptual integrity. It makes it hard to find and control all the loose ends. It creates the tremendous learning and understanding burden that makes personnel turnover a disaster.

Conformity. Software people are not alone in facing complexity. Physics deals with terribly complex objects even at the "fundamental" particle level. The physicist labors on, however, in a firm faith that there are unifying principles to be found, whether in quarks or in unified-field theories. Einstein argued that there must be simplified explanations of nature, because God is not capricious or arbitrary.

No such faith comforts the software engineer. Much of the complexity that he must master is arbitrary complexity, forced without rhyme or reason by the many human institutions and systems to which his interfaces must conform. These differ from interface to interface, and from time to time, not because of necessity but only because they were designed by different people, rather than by God.

In many cases, the software must conform because it is the most recent arrival on the scene. In others, it must conform because it is perceived as the most conformable. But in all cases, much complexity comes from conformation to other interfaces; this complexity cannot be simplified out by any redesign of the software alone.

Changeability. The software entity is constantly subject to pressures for change. Of course, so are buildings, cars, computers. But manufactured things are infrequently changed after manufacture; they are superseded by later models, or essential changes are incorporated into later-serial-number copies of the same basic design. Call-backs of automobiles are really quite infrequent; field changes of computers somewhat less so. Both are much less frequent than modifications to fielded software.

In part, this is so because the software of a system embodies its function, and the function is the part that most feels the pressures of change. In part it is because software can be changed more easily--it is pure thought-stuff, infinitely malleable. Buildings do in fact get changed, but the high costs of change, understood by all, serve to dampen the whims of the changers.

All successful software gets changed. Two processes are at work. First, as a software product is found to be useful, people try it in new cases at the edge of or beyond the original domain. The pressures for extended function come chiefly from users who like the basic function and invent new uses for it.

Second, successful software survives beyond the normal life of the machine vehicle for which it is first written. If not new computers, then at least new disks, new displays, new printers come along; and the software must be conformed to its new vehicles of opportunity.

In short, the software product is embedded in a cultural matrix of applications, users, laws, and machine vehicles. These all change continually, and their changes inexorably force change upon the software product.

Invisibility. Software is invisible and unvisualizable. Geometric abstractions are powerful tools. The floor plan of a building helps both architect and client evaluate spaces, traffic flows, views. Contradictions and omissions become obvious. Scale drawings of mechanical parts and stick-figure models of molecules, although abstractions, serve the same purpose. A geometric reality is captured in a geometric abstraction.

The reality of software is not inherently embedded in space. Hence, it has no ready geometric representation in the way that land has maps, silicon chips have diagrams, computers have connectivity schematics. As soon as we attempt to diagram software structure, we find it to constitute not one, but several, general directed graphs superimposed one upon another. The several graphs may represent the flow of control, the flow of data, patterns of dependency, time sequence, name-space relationships. These graphs are usually not even planar, much less hierarchical. Indeed, one of the ways of establishing conceptual control over such structure is to enforce link cutting until one or more of the graphs becomes hierarchical.

In spite of progress in restricting and simplifying the structures of software, they remain inherently unvisualizable, and thus do not permit the mind to use some of its most powerful conceptual tools. This lack not only impedes the process of design within one mind, it severely hinders communication among minds.


Past Breakthroughs Solved Accidental Difficulties


If we examine the three steps in software technology development that have been most fruitful in the past, we discover that each attacked a different major difficulty in building software, but that those difficulties have been accidental, not essential, difficulties. We can also see the natural limits to the extrapolation of each such attack.
High-level languages. Surely the most powerful stroke for software productivity, reliability, and simplicity has been the progressive use of high-level languages for programming. Most observers credit that development with at least a factor of five in productivity, and with concomitant gains in reliability, simplicity, and comprehensibility.

What does a high-level language accomplish? It frees a program from much of its accidental complexity. An abstract program consists of conceptual constructs: operations, data types, sequences, and communication. The concrete machine program is concerned with bits, registers, conditions, branches, channels, disks, and such. To the extent that the high-level language embodies the constructs one wants in the abstract program and avoids all lower ones, it eliminates a whole level of complexity that was never inherent in the program at all.

The most a high-level language can do is to furnish all the constructs that the programmer imagines in the abstract program. To be sure, the level of our thinking about data structures, data types, and operations is steadily rising, but at an ever decreasing rate. And language development approaches closer and closer to the sophistication of users.

Moreover, at some point the elaboration of a high-level language creates a tool-mastery burden that increases, not reduces, the intellectual task of the user who rarely uses the esoteric constructs.

Time-sharing. Time-sharing brought a major improvement in the productivity of programmers and in the quality of their product, although not so large as that brought by high-level languages.

Time-sharing attacks a quite different difficulty. Time-sharing preserves immediacy, and hence enables one to maintain an overview of complexity. The slow turnaround of batch programming means that one inevitably forgets the minutiae, if not the very thrust, of what one was thinking when he stopped programming and called for compilation and execution. This interruption is costly in time, for one must refresh one's memory. The most serious effect may well be the decay of the grasp of all that is going on in a complex system.

Slow turnaround, like machine-language complexities, is an accidental rather than an essential difficulty of the software process. The limits of the potential contribution of time-sharing derive directly. The principal effect of time-sharing is to shorten system response time. As this response time goes to zero, at some point it passes the human threshold of noticeability, about 100 milliseconds. Beyond that threshold, no benefits are to be expected.

Unified programming environments. Unix and Interlisp, the first integrated programming environments to come into widespread use, seem to have improved productivity by integral factors. Why?

They attack the accidental difficulties that result from using individual programs together, by providing integrated libraries, unified file formats, and pipes and filters. As a result, conceptual structures that in principle could always call, feed, and use one another can indeed easily do so in practice.

This breakthrough in turn stimulated the development of whole toolbenches, since each new tool could be applied to any programs that used the standard formats.

Because of these successes, environments are the subject of much of today's software-engineering research. We look at their promise and limitations in the next section.


Hopes for the Silver


Now let us consider the technical developments that are most often advanced as potential silver bullets. What problems do they address--the problems of essence, or the remaining accidental difficulties? Do they offer revolutionary advances, or incremental ones?
Ada and other high-level language advances. One of the most touted recent developments is Ada, a general-purpose high-level language of the 1980's. Ada not only reflects evolutionary improvements in language concepts, but indeed embodies features to encourage modern design and modularization. Perhaps the Ada philosophy is more of an advance than the Ada language, for it is the philosophy of modularization, of abstract data types, of hierarchical structuring. Ada is over-rich, a natural result of the process by which requirements were laid on its design. That is not fatal, for subsetted working vocabularies can solve the learning problem, and hardware advances will give us the cheap MIPS to pay for the compiling costs. Advancing the structuring of software systems is indeed a very good use for the increased MIPS our dollars will buy. Operating systems, loudly decried in the 1960's for their memory and cycle costs, have proved to be an excellent form in which to use some of the MIPS and cheap memory bytes of the past hardware surge.

Nevertheless, Ada will not prove to be the silver bullet that slays the software productivity monster. It is, after all, just another high-level language, and the biggest payoff from such languages came from the first transition -- the transition up from the accidental complexities of the machine into the more abstract statement of step-by-step solutions. Once those accidents have been removed, the remaining ones will be smaller, and the payoff from their removal will surely be less.

I predict that a decade from now, when the effectiveness of Ada is assessed, it will be seen to have made a substantial difference, but not because of any particular language feature, nor indeed because of all of them combined. Neither will the new Ada environments prove to be the cause of the improvements. Ada's greatest contribution will be that switching to it occasioned training programmers in modern software-design techniques.


Object-Oriented Programming

 Many students of the art hold out more hope for object-oriented programming than for any of the other technical fads of the day.  I am among them. Mark Sherman of Dartmouth notes on CSnet News that one must be careful to distinguish two separate ideas that go under that name: abstract data types and hierarchical types. The concept of the abstract data type is that an object's type should be defined by a name, a set of proper values, and a set of proper operations rather than by its storage structure, which should be hidden. Examples are Ada packages (with private types) and Modula's modules.

Hierarchical types, such as Simula-67's classes, allow one to define general interfaces that can be further refined by providing subordinate types. The two concepts are orthogonal: one may have hierarchies without hiding and hiding without hierarchies. Both concepts represent real advances in the art of building software.

Each removes yet another accidental difficulty from the process, allowing the designer to express the essence of the design without having to express large amounts of syntactic material that add no information content. For both abstract types and hierarchical types, the result is to remove a higher-order kind of accidental difficulty and allow a higher-order expression of design.
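To make the two ideas concrete, here is a minimal sketch (Python stands in for Ada and Simula-67, and the Stack and Shape types are invented for illustration): an abstract data type hides its storage structure behind a name and a set of operations, while a hierarchical type refines a general interface through subordinate types.

```python
# Abstract data type: users see only the name and the proper operations;
# the storage structure is hidden (conventionally private in Python).
class Stack:
    def __init__(self):
        self._items = []              # hidden representation
    def push(self, v):
        self._items.append(v)
    def pop(self):
        return self._items.pop()

# Hierarchical type: a general interface refined by a subordinate type,
# independent of whether anything is hidden.
class Shape:
    def area(self):
        raise NotImplementedError

class Square(Shape):
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side * self.side

s = Stack(); s.push(1); s.push(2)
print(s.pop())            # 2
print(Square(3).area())   # 9
```

Stack shows hiding without hierarchy; Shape/Square shows hierarchy without hiding, which is exactly the orthogonality the text describes.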

Nevertheless, such advances can do no more than to remove all the accidental difficulties from the expression of the design. The complexity of the design itself is essential, and such attacks make no change whatever in that. An order-of-magnitude gain can be made by object-oriented programming only if the unnecessary type-specification underbrush still in our programming language is itself nine-tenths of the work involved in designing a program product. I doubt it.

Artificial intelligence. Many people expect advances in artificial intelligence to provide the revolutionary breakthrough that will give order-of-magnitude gains in software productivity and quality. [3] I do not. To see why, we must dissect what is meant by "artificial intelligence."

D.L. Parnas has clarified the terminological chaos: 

Two quite different definitions of AI are in common use today. AI-1: The use of computers to solve problems that previously could only be solved by applying human intelligence. AI-2: The use of a specific set of programming techniques known as heuristic or rule-based programming. In this approach human experts are studied to determine what heuristics or rules of thumb they use in solving problems.... The program is designed to solve a problem the way that humans seem to solve it.

The first definition has a sliding meaning.... Something can fit the definition of AI-1 today but, once we see how the program works and understand the problem, we will not think of it as AI any more.... Unfortunately I cannot identify a body of technology that is unique to this field.... Most of the work is problem-specific, and some abstraction or creativity is required to see how to transfer it.

I agree completely with this critique. The techniques used for speech recognition seem to have little in common with those used for image recognition, and both are different from those used in expert systems. I have a hard time seeing how image recognition, for example, will make any appreciable difference in programming practice. The same problem is true of speech recognition. The hard thing about building software is deciding what one wants to say, not saying it. No facilitation of expression can give more than marginal gains.

Expert-systems technology, AI-2, deserves a section of its own.

Expert systems. The most advanced part of the artificial intelligence art, and the most widely applied, is the technology for building expert systems. Many software scientists are hard at work applying this technology to the software-building environment.  What is the concept, and what are the prospects?

An expert system is a program that contains a generalized inference engine and a rule base, takes input data and assumptions, explores the inferences derivable from the rule base, yields conclusions and advice, and offers to explain its results by retracing its reasoning for the user. The inference engines typically can deal with fuzzy or probabilistic data and rules, in addition to purely deterministic logic.

Such systems offer some clear advantages over programmed algorithms designed for arriving at the same solutions to the same problems:

Inference-engine technology is developed in an application-independent way, and then applied to many uses. One can justify much effort on the inference engines. Indeed, that technology is well advanced.
The changeable parts of the application-peculiar materials are encoded in the rule base in a uniform fashion, and tools are provided for developing, changing, testing, and documenting the rule base. This regularizes much of the complexity of the application itself.
The power of such systems does not come from ever-fancier inference mechanisms but rather from ever-richer knowledge bases that reflect the real world more accurately. I believe that the most important advance offered by the technology is the separation of the application complexity from the program itself.
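The division of labor between a generic inference engine and an application-specific rule base can be sketched with a toy forward-chaining engine; the rules about testing advice below are invented for illustration, in the spirit of the testing advisor discussed next.

```python
# Rules: (antecedents, consequent). The generic engine repeatedly fires any
# rule whose antecedents are all known facts, until nothing new is derived.
RULES = [
    ({"crash on write", "disk full"}, "suspect: storage layer"),
    ({"test fails", "recent change to module"}, "suspect: recent change"),
    ({"suspect: recent change"}, "advice: bisect the change history"),
]

def infer(facts):
    """Forward-chain over RULES from an initial set of facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in RULES:
            if antecedents <= facts and consequent not in facts:
                facts.add(consequent)
                changed = True
    return facts

result = infer({"test fails", "recent change to module"})
print("advice: bisect the change history" in result)  # True
```

Note that the engine (infer) knows nothing about testing: all application knowledge lives in the rule base, which is the separation of application complexity from the program that the text identifies as the technology's chief advantage.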

How can this technology be applied to the software-engineering task? In many ways: Such systems can suggest interface rules, advise on testing strategies, remember bug-type frequencies, and offer optimization hints.

Consider an imaginary testing advisor, for example. In its most rudimentary form, the diagnostic expert system is very like a pilot's checklist, just enumerating suggestions as to possible causes of difficulty. As more and more system structure is embodied in the rule base, and as the rule base takes more sophisticated account of the trouble symptoms reported, the testing advisor becomes more and more particular in the hypotheses it generates and the tests it recommends. Such an expert system may depart most radically from the conventional ones in that its rule base should probably be hierarchically modularized in the same way the corresponding software product is, so that as the product is modularly modified, the diagnostic rule base can be modularly modified as well.

The work required to generate the diagnostic rules is work that would have to be done anyway in generating the set of test cases for the modules and for the system. If it is done in a suitably general manner, with both a uniform structure for rules and a good inference engine available, it may actually reduce the total labor of generating bring-up test cases, and help as well with lifelong maintenance and modification testing. In the same way, one can postulate other advisors, probably many and probably simple, for the other parts of the software-construction task.

Many difficulties stand in the way of the early realization of useful expert-system advisors to the program developer. A crucial part of our imaginary scenario is the development of easy ways to get from program-structure specification to the automatic or semiautomatic generation of diagnostic rules. Even more difficult and important is the twofold task of knowledge acquisition: finding articulate, self-analytical experts who know why they do things, and developing efficient techniques for extracting what they know and distilling it into rule bases. The essential prerequisite for building an expert system is to have an expert.

The most powerful contribution by expert systems will surely be to put at the service of the inexperienced programmer the experience and accumulated wisdom of the best programmers. This is no small contribution. The gap between the best software engineering practice and the average practice is very wide, perhaps wider than in any other engineering discipline. A tool that disseminates good practice would be important.

"Automatic" programming. For almost 40 years, people have been anticipating and writing about "automatic programming," or the generation of a program for solving a problem from a statement of the problem specifications. Some today write as if they expect this technology to provide the next breakthrough. 

Parnas  implies that the term is used for glamour, not for semantic content, asserting,

In short, automatic programming always has been a euphemism for programming with a higher-level language than was presently available to the programmer.

He argues, in essence, that in most cases it is the solution method, not the problem, whose specification has to be given.

One can find exceptions. The technique of building generators is very powerful, and it is routinely used to good advantage in programs for sorting. Some systems for integrating differential equations have also permitted direct specification of the problem, and the systems have assessed the parameters, chosen from a library of methods of solution, and generated the programs.

These applications have very favorable properties:

The problems are readily characterized by relatively few parameters.
There are many known methods of solution to provide a library of alternatives.
Extensive analysis has led to explicit rules for selecting solution techniques, given problem parameters.

It is hard to see how such techniques generalize to the wider world of the ordinary software system, where cases with such neat properties are the exception. It is hard even to imagine how this breakthrough in generalization could occur.
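The sort-generator style of automatic programming described above can be sketched in a few lines: problem parameters go in, an explicit selection rule consults a library of known methods, and a program comes out. The parameters, thresholds, and method names below are all hypothetical, chosen only to illustrate the shape of the technique, not any real generator.

```python
# Sketch of a tiny "program generator" in the sort-generator style:
# the user states problem parameters; explicit rules pick a method
# from a library of known solutions; a program is then emitted.
# All names and thresholds are illustrative assumptions.

def choose_sort_method(n_items, nearly_sorted, stable_required):
    """Explicit selection rules mapping problem parameters to a method."""
    if n_items < 32:
        return "insertion_sort"      # small inputs: low overhead wins
    if nearly_sorted:
        return "natural_merge_sort"  # exploits existing sorted runs
    if stable_required:
        return "merge_sort"          # stability guaranteed
    return "quicksort"               # good general-purpose default

def generate_sort_program(n_items, nearly_sorted=False, stable_required=False):
    """Emit source text for a sort routine built around the chosen method."""
    method = choose_sort_method(n_items, nearly_sorted, stable_required)
    return (f"def sort(records):\n"
            f"    # generated: method = {method}\n"
            f"    return {method}(records)\n")

print(generate_sort_program(10_000, stable_required=True))
```

The favorable properties listed above are exactly what makes this work: a handful of parameters, a known library of alternatives, and explicit selection rules.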

Graphical programming. A favorite subject for PhD dissertations in software engineering is graphical, or visual, programming--the application of computer graphics to software design. Sometimes the promise held out by such an approach is postulated by analogy with VLSI chip design, in which computer graphics plays so fruitful a role. Sometimes the theorist justifies the approach by considering flowcharts as the ideal program-design medium and by providing powerful facilities for constructing them.

Nothing even convincing, much less exciting, has yet emerged from such efforts. I am persuaded that nothing will.

In the first place, as I have argued elsewhere [8], the flowchart is a very poor abstraction of software structure. Indeed, it is best viewed as Burks, von Neumann, and Goldstine's attempt to provide a desperately needed high-level control language for their proposed computer. In the pitiful, multipage, connection-boxed form to which the flowchart has today been elaborated, it has proved to be useless as a design tool--programmers draw flowcharts after, not before, writing the programs they describe.

Second, the screens of today are too small, in pixels, to show both the scope and the resolution of any seriously detailed software diagram. The so-called "desktop metaphor" of today's workstation is instead an "airplane-seat" metaphor. Anyone who has shuffled a lap full of papers while seated between two portly passengers will recognize the difference--one can see only a very few things at once. The true desktop provides overview of, and random access to, a score of pages. Moreover, when fits of creativity run strong, more than one programmer or writer has been known to abandon the desktop for the more spacious floor. The hardware technology will have to advance quite substantially before the scope of our scopes is sufficient for the software-design task.

More fundamentally, as I have argued above, software is very difficult to visualize. Whether one diagrams control flow, variable-scope nesting, variable cross references, dataflow, hierarchical data structures, or whatever, one feels only one dimension of the intricately interlocked software elephant. If one superimposes all the diagrams generated by the many relevant views, it is difficult to extract any global overview. The VLSI analogy is fundamentally misleading--a chip design is a layered two-dimensional description whose geometry reflects its realization in 3-space. A software system is not.

Program verification. Much of the effort in modern programming goes into testing and the repair of bugs. Is there perhaps a silver bullet to be found by eliminating the errors at the source, in the system-design phase? Can both productivity and product reliability be radically enhanced by following the profoundly different strategy of proving designs correct before the immense effort is poured into implementing and testing them?

I do not believe we will find productivity magic here. Program verification is a very powerful concept, and it will be very important for such things as secure operating-system kernels. The technology does not promise, however, to save labor. Verifications are so much work that only a few substantial programs have ever been verified.

Program verification does not mean error-proof programs. There is no magic here, either. Mathematical proofs also can be faulty. So whereas verification might reduce the program-testing load, it cannot eliminate it.

More seriously, even perfect program verification can only establish that a program meets its specification. The hardest part of the software task is arriving at a complete and consistent specification, and much of the essence of building a program is in fact the debugging of the specification.

Environments and tools. How much more gain can be expected from the exploding researches into better programming environments? One's instinctive reaction is that the big-payoff problems--hierarchical file systems, uniform file formats to make possible uniform program interfaces, and generalized tools--were the first attacked, and have been solved. Language-specific smart editors are developments not yet widely used in practice, but the most they promise is freedom from syntactic errors and simple semantic errors.

Perhaps the biggest gain yet to be realized from programming environments is the use of integrated database systems to keep track of the myriad details that must be recalled accurately by the individual programmer and kept current for a group of collaborators on a single system.

Surely this work is worthwhile, and surely it will bear some fruit in both productivity and reliability. But by its very nature, the return from now on must be marginal.

Workstations. What gains are to be expected for the software art from the certain and rapid increase in the power and memory capacity of the individual workstation? Well, how many MIPS can one use fruitfully? The composition and editing of programs and documents is fully supported by today's speeds. Compiling could stand a boost, but a factor of 10 in machine speed would surely leave think time the dominant activity in the programmer's day. Indeed, it appears to be so now.

More powerful workstations we surely welcome. Magical enhancements from them we cannot expect.

Promising Attacks on the Conceptual Essence

Even though no technological breakthrough promises to give the sort of magical results with which we are so familiar in the hardware area, there is both an abundance of good work going on now, and the promise of steady, if unspectacular, progress.

All of the technological attacks on the accidents of the software process are fundamentally limited by the productivity equation:

    time of task = Σᵢ (frequency)ᵢ × (time)ᵢ

If, as I believe, the conceptual components of the task are now taking most of the time, then no amount of activity on the task components that are merely the expression of the concepts can give large productivity gains.
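A small, invented breakdown makes the point of the productivity equation (time of task = the sum, over all task components, of frequency × time) concrete: if the conceptual component dominates, even the total elimination of an accidental component buys little. The figures below are hypothetical illustrations, not measurements.

```python
# Hypothetical illustration of the productivity equation:
# total time = sum over task components of frequency_i * time_i.
# All figures are invented for illustration only.

components = {
    # component: (frequency, hours per occurrence)
    "conceptual design":   (1.0, 60.0),  # essence
    "coding/expression":   (1.0, 20.0),  # accident
    "mechanical overhead": (1.0, 20.0),  # accident
}

total = sum(freq * hours for freq, hours in components.values())

# Even a magical tool that removes ALL mechanical overhead...
freq, hours = components["mechanical overhead"]
without_overhead = total - freq * hours
speedup = total / without_overhead

print(f"total = {total} h; speedup from removing overhead = {speedup:.2f}x")
```

With conceptual work at 60 of 100 hours, abolishing one accidental component entirely yields only a 1.25x gain, which is the argument of the paragraph above in arithmetic form.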

Hence we must consider those attacks that address the essence of the software problem, the formulation of these complex conceptual structures. Fortunately, some of these attacks are very promising.

Buy versus build. The most radical possible solution for constructing software is not to construct it at all.

Every day this becomes easier, as more and more vendors offer more and better software products for a dizzying variety of applications. While we software engineers have labored on production methodology, the personal-computer revolution has created not one, but many, mass markets for software. Every newsstand carries monthly magazines which, sorted by machine type, advertise and review dozens of products at prices from a few dollars to a few hundred dollars. More specialized sources offer very powerful products for the workstation and other Unix markets. Even software tools and environments can be bought off-the-shelf. I have elsewhere proposed a marketplace for individual modules.

Any such product is cheaper to buy than to build afresh. Even at a cost of one hundred thousand dollars, a purchased piece of software is costing only about as much as one programmer-year. And delivery is immediate! Immediate at least for products that really exist, products whose developer can refer prospects to a happy user. Moreover, such products tend to be much better documented and somewhat better maintained than home-grown software.

The development of the mass market is, I believe, the most profound long-run trend in software engineering. The cost of software has always been development cost, not replication cost. Sharing that cost among even a few users radically cuts the per-user cost. Another way of looking at it is that the use of n copies of a software system effectively multiplies the productivity of its developers by n. That is an enhancement of the productivity of the discipline and of the nation.

The key issue, of course, is applicability. Can I use an available off-the-shelf package to perform my task? A surprising thing has happened here. During the 1950's and 1960's, study after study showed that users would not use off-the-shelf packages for payroll, inventory control, accounts receivable, and so on. The requirements were too specialized, the case-to-case variation too high. During the 1980's, we find such packages in high demand and widespread use. What has changed?

Not the packages, really. They may be somewhat more generalized and somewhat more customizable than formerly, but not much. Not the applications, either. If anything, the business and scientific needs of today are more diverse and complicated than those of 20 years ago.

The big change has been in the hardware/software cost ratio. In 1960, the buyer of a two-million-dollar machine felt that he could afford $250,000 more for a customized payroll program, one that slipped easily and nondisruptively into the computer-hostile social environment. Today, the buyer of a $50,000 office machine cannot conceivably afford a customized payroll program, so he adapts the payroll procedure to the packages available. Computers are now so commonplace, if not yet so beloved, that the adaptations are accepted as a matter of course.

There are dramatic exceptions to my argument that the generalization of software packages has changed little over the years: electronic spreadsheets and simple database systems. These powerful tools, so obvious in retrospect and yet so late in appearing, lend themselves to myriad uses, some quite unorthodox. Articles and even books now abound on how to tackle unexpected tasks with the spreadsheet. Large numbers of applications that would formerly have been written as custom programs in Cobol or Report Program Generator are now routinely done with these tools.

Many users now operate their own computers day in and day out on various applications without ever writing a program. Indeed, many of these users cannot write new programs for their machines, but they are nevertheless adept at solving new problems with them.

I believe the single most powerful software-productivity strategy for many organizations today is to equip the computer-naive intellectual workers who are on the firing line with personal computers and good generalized writing, drawing, file, and spreadsheet programs and then to turn them loose. The same strategy, carried out with generalized mathematical and statistical packages and some simple programming capabilities, will also work for hundreds of laboratory scientists.

Requirements refinement and rapid prototyping. The hardest single part of building a software system is deciding precisely what to build. No other part of the conceptual work is as difficult as establishing the detailed technical requirements, including all the interfaces to people, to machines, and to other software systems. No other part of the work so cripples the resulting system if done wrong. No other part is more difficult to rectify later.

Therefore, the most important function that the software builder performs for the client is the iterative extraction and refinement of the product requirements. For the truth is, the client does not know what he wants. The client usually does not know what questions must be answered, and he has almost never thought of the problem in the detail necessary for specification. Even the simple answer--"Make the new software system work like our old manual information-processing system"--is in fact too simple. One never wants exactly that. Complex software systems are, moreover, things that act, that move, that work. The dynamics of that action are hard to imagine. So in planning any software-design activity, it is necessary to allow for an extensive iteration between the client and the designer as part of the system definition.

I would go a step further and assert that it is really impossible for a client, even working with a software engineer, to specify completely, precisely, and correctly the exact requirements of a modern software product before trying some versions of the product.

Therefore, one of the most promising of the current technological efforts, and one that attacks the essence, not the accidents, of the software problem, is the development of approaches and tools for rapid prototyping of systems as part of the iterative specification of requirements.

A prototype software system is one that simulates the important interfaces and performs the main functions of the intended system, while not necessarily being bound by the same hardware speed, size, or cost constraints. Prototypes typically perform the mainline tasks of the application, but make no attempt to handle the exceptional tasks, respond correctly to invalid inputs, or abort cleanly. The purpose of the prototype is to make real the conceptual structure specified, so that the client can test it for consistency and usability.
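A minimal sketch of a prototype in the sense just described: it presents the important interface and performs the mainline function, while deliberately ignoring invalid input, exceptional cases, and clean aborts. The inventory commands and canned data below are invented for illustration.

```python
# Sketch of a prototype: it simulates the important interface and
# performs the mainline tasks, but makes no attempt to handle
# exceptional cases, invalid input, or clean aborts.
# Commands and data are hypothetical.

inventory = {"widget": 12, "gadget": 3}  # canned data stands in for a real database

def handle(command):
    """Mainline paths only: 'check <item>' and 'ship <item> <qty>'."""
    words = command.split()              # no validation -- this is a prototype
    if words[0] == "check":
        return f"{words[1]}: {inventory[words[1]]} on hand"
    if words[0] == "ship":
        inventory[words[1]] -= int(words[2])  # no stock check, no abort
        return f"shipped {words[2]} {words[1]}"

print(handle("check widget"))   # -> widget: 12 on hand
print(handle("ship widget 5"))  # -> shipped 5 widget
print(handle("check widget"))   # -> widget: 7 on hand
```

Real enough for the client to test the conceptual structure for consistency and usability; nowhere near robust enough to ship.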

Much of present-day software-acquisition procedure rests upon the assumption that one can specify a satisfactory system in advance, get bids for its construction, have it built, and install it. I think this assumption is fundamentally wrong, and that many software-acquisition problems spring from that fallacy. Hence, they cannot be fixed without fundamental revision--revision that provides for iterative development and specification of prototypes and products.

Incremental development--grow, don't build, software. I still remember the jolt I felt in 1958 when I first heard a friend talk about building a program, as opposed to writing one. In a flash he broadened my whole view of the software process. The metaphor shift was powerful, and accurate. Today we understand how like other building processes the construction of software is, and we freely use other elements of the metaphor, such as specifications, assembly of components, and scaffolding.

The building metaphor has outlived its usefulness. It is time to change again. If, as I believe, the conceptual structures we construct today are too complicated to be specified accurately in advance, and too complex to be built faultlessly, then we must take a radically different approach.

Let us turn to nature and study complexity in living things, instead of just the dead works of man. Here we find constructs whose complexities thrill us with awe. The brain alone is intricate beyond mapping, powerful beyond imitation, rich in diversity, self-protecting, and self-renewing. The secret is that it is grown, not built.

So it must be with our software systems. Some years ago Harlan Mills proposed [10] that any software system should be grown by incremental development. That is, the system should first be made to run, even if it does nothing useful except call the proper set of dummy subprograms. Then, bit by bit, it should be fleshed out, with the subprograms in turn being developed into actions or calls to empty stubs in the level below.
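Mills's suggestion can be sketched as follows: a top level that runs correctly from day one by calling dummy subprograms, which later iterations flesh out into real actions or into calls to stubs one level further down. The function names and canned data are hypothetical.

```python
# Sketch of Mills-style incremental development: the system runs from
# the very first day, calling dummy stubs; each stub is later fleshed
# out without disturbing the top-level structure. Names are illustrative.

def read_input():        # stub: canned data until real input handling is grown
    return [3, 1, 2]

def process(data):       # stub: identity until real processing is grown
    return data

def write_output(data):  # real enough to show a living system at every stage
    print("result:", data)

def main():              # the top level is complete and correct on day one
    write_output(process(read_input()))

main()  # -> result: [3, 1, 2]
```

Each growth step replaces one stub body, so there is a working system at every stage, which is precisely the morale effect described below.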

I have seen most dramatic results since I began urging this technique on the project builders in my Software Engineering Laboratory class. Nothing in the past decade has so radically changed my own practice, or its effectiveness. The approach necessitates top-down design, for it is a top-down growing of the software. It allows easy backtracking. It lends itself to early prototypes. Each added function and new provision for more complex data or circumstances grows organically out of what is already there.

The morale effects are startling. Enthusiasm jumps when there is a running system, even a simple one. Efforts redouble when the first picture from a new graphics software system appears on the screen, even if it is only a rectangle. One always has, at every stage in the process, a working system. I find that teams can grow much more complex entities in four months than they can build.

The same benefits can be realized on large projects as on my small ones.

Great designers. The central question in how to improve the software art centers, as it always has, on people.

We can get good designs by following good practices instead of poor ones. Good design practices can be taught. Programmers are among the most intelligent part of the population, so they can learn good practice. Hence, a major thrust in the United States is to promulgate good modern practice. New curricula, new literature, new organizations such as the Software Engineering Institute, all have come into being in order to raise the level of our practice from poor to good. This is entirely proper.

Nevertheless, I do not believe we can make the next step upward in the same way. Whereas the difference between poor conceptual designs and good ones may lie in the soundness of design method, the difference between good designs and great ones surely does not. Great designs come from great designers. Software construction is a creative process. Sound methodology can empower and liberate the creative mind; it cannot inflame or inspire the drudge.

The differences are not minor--they are rather like the differences between Salieri and Mozart. Study after study shows that the very best designers produce structures that are faster, smaller, simpler, cleaner, and produced with less effort. The differences between the great and the average approach an order of magnitude.

A little retrospection shows that although many fine, useful software systems have been designed by committees and built as part of multipart projects, those software systems that have excited passionate fans are those that are the products of one or a few designing minds, great designers. Consider Unix, APL, Pascal, Modula, the Smalltalk interface, even Fortran; and contrast them with Cobol, PL/I, Algol, MVS/370, and MS-DOS.

Hence, although I strongly support the technology-transfer and curriculum-development efforts now under way, I think the most important single effort we can mount is to develop ways to grow great designers.

No software organization can ignore this challenge. Good managers, scarce though they be, are no scarcer than good designers. Great designers and great managers are both very rare. Most organizations spend considerable effort in finding and cultivating the management prospects; I know of none that spends equal effort in finding and developing the great designers upon whom the technical excellence of the products will ultimately depend.

Table 1. Exciting vs. Useful but Unexciting Software Products.

Exciting Products?
Yes            No
Unix           Cobol
APL            PL/I
Pascal         Algol
Modula         MVS/370
Smalltalk      MS-DOS
Fortran
My first proposal is that each software organization must determine and proclaim that great designers are as important to its success as great managers are, and that they can be expected to be similarly nurtured and rewarded. Not only salary, but the perquisites of recognition--office size, furnishings, personal technical equipment, travel funds, staff support--must be fully equivalent.

How to grow great designers? Space does not permit a lengthy discussion, but some steps are obvious:

Systematically identify top designers as early as possible. The best are often not the most experienced.
Assign a career mentor to be responsible for the development of the prospect, and carefully keep a career file.
Devise and maintain a career-development plan for each prospect, including carefully selected apprenticeships with top designers, episodes of advanced formal education, and short courses, all interspersed with solo-design and technical-leadership assignments.
Provide opportunities for growing designers to interact with and stimulate each other.
Acknowledgments

I thank Gordon Bell, Bruce Buchanan, Rick Hayes-Roth, Robert Patrick, and, most especially, David Parnas for their insights and stimulating ideas, and Rebekah Bierly for the technical production of this article.
REFERENCES

[1]
D.L. Parnas, "Designing Software for Ease of Extension and Contraction," IEEE Transactions on Software Engineering, Vol. 5, No. 2, March 1979, pp. 128-38.
[2]
G. Booch, "Object-Oriented Design," Software Engineering with Ada, 1983, Menlo Park, Calif.: Benjamin/ Cummings.
[3]
IEEE Transactions on Software Engineering (special issue on artificial intelligence and software engineering), J. Mostow, guest ed., Vol. 11, No. 11, November 1985.
[4]
D.L. Parnas, "Software Aspects of Strategic Defense Systems," American Scientist, November 1985.
[5]
R. Balzer, "A 15-year Perspective on Automatic Programming," IEEE Transactions on Software Engineering (special issue on artificial intelligence and software engineering), J. Mostow, guest ed., Vol. 11, No. 11 (November 1985), pp. 1257-67.
[6]
Computer (special issue on visual programming), R.B. Grafton and T. Ichikawa, guest eds., Vol. 18, No. 8, August 1985.
[7]
G. Raeder, "A Survey of Current Graphical Programming Techniques," Computer (special issue on visual programming), R.B. Grafton and T. Ichikawa, guest eds., Vol. 18, No. 8, August 1985, pp. 11-25.
[8]
F.P. Brooks, The Mythical Man Month, Reading, Mass.: Addison-Wesley, 1975, Chapter 14.
[9]
Defense Science Board, Report of the Task Force on Military Software, in press.
[10]
H.D. Mills, "Top-Down Programming in Large Systems," in Debugging Techniques in Large Systems, R. Ruskin, ed., Englewood Cliffs, N.J.: Prentice-Hall, 1971.
[11]
B.W. Boehm, "A Spiral Model of Software Development and Enhancement," 1985, TRW Technical Report 21-371-85, TRW, Inc., 1 Space Park, Redondo Beach, Calif. 90278.
[12]
H. Sackman, W.J. Erikson, and E.E. Grant, "Exploratory Experimental Studies Comparing Online and Offline Programming Performance," Communications of the ACM, Vol. 11, No. 1 (January 1968), pp. 3-11.

