1. Introduction to DBMS 1.1 INTRODUCTION

1. Introduction to DBMS 1.1 INTRODUCTION DataBase Management System has evolved from a specialized computer application to a central component of com...

Author: Scott Greene

5 downloads 1 Views 463KB Size

Report

Download PDF

Recommend Documents

CHAPTER 1 INTRODUCTION TO DBMS AND RDBMS

UNIT I INTRODUCTION TO DBMS

Introduction to PLCs. Ch 1 Introduction 1

1. INTRODUCTION DONORPERFECT BASICS...11

CS352 Lecture - Course Introduction; Basic DBMS Concepts

1. INTRODUCTION 1. INTRODUCTION 1. INTRODUCTION

Introduction To ACLS,1 1 Sep 2006 INTRODUCTION TO ACLS

1 Introduction to Sonar

Introduction to SANS 1

MODULE 1 INTRODUCTION TO

1 Introduction to Ferrite

1. Introduction to XQuery

1 Introduction to Internetworking

unit 1 introduction to

1 Introduction to SCILAB

1. Introduction to Multimedia

Introduction to Colossians 1

Introduction to Ephesians 1

Introduction to Philemon 1

Introduction to Romans 1

Introduction to 1 Corinthians 1

Chapter 1. Introduction. 1. Introduction

Introduction. Introduction. Page 1

Virginia. Introduction. Introduction to

1. Introduction to DBMS 1.1

INTRODUCTION DataBase Management System has evolved from a specialized computer application to a central component of computing environment. Database system plays a vital role in organizing data about a particular enterprise. Consider an example of a company which stores data about following : Employees (Employee No., Name, Address, Salary). Departments (Department No., Name, Location) Project (Name, Project No., Department No., Location) Which may have following relations : An Employee works in a Departments. An Employee works on many Projects. A Department handles many Project. Therefore, a system is needed which can effectively organize the data and also use it to analyze and guide operations of the company. Now–a–days, the amount of information to be stored is increasing tremendously and thus the need of flexible and powerful system is also increasing day–by–day which has the ability not only to effectively organize or maintain large collection of data but also provides easy access to data.

1.2

1.2.1

CONCEPT OF DBMS This section covers basic definitions related to DBMS and also explain various components of DBMS. Data Base Management System consists of two components : (i) Database i.e. collection of data (ii) System or set of programs which are used to access and manage the database. By incorporating these two components, DBMS organize the information, maintain it and retrieve it efficiently as and when required. So to use and understand the system, as well maintained database and set of programs are needed. Concept of Data and Database The word Data is derived from a Latin word which means ‘to give’ thus data are given facts from which additional facts can be inferred. Data, are facts or undoubted information used for different computations or calculations. For example – the facts related to an employee in a company like his Employee No., Name, Salary, Designation etc. are data but when these data are retrieved or processed to find answers of questions like : What is Employee No.of employee whose salary is more than 10,000 ? What is name of employee whose Employee No. is 24 ? Then it becomes information. Thus information is a processed form of data and database is a logicially coherent collection of data, not the information, with same meaning. A database, is a collection of interrelated data which represents some aspects of real world. Database has some inherent meaning and is related to a particular group of users or applications. For example – Database of a college, may contain data about students, faculties, courses etc. which are related to each other with certain relations like – faculty

1.2.2

1.3

teaches students, students are enrolled in courses etc. Thus, we can say that database contains the data, related to a real world enterprise, and is designed, built and populated with data for specific applications related to the enterprise. Definition of DBMS A database management system (DBMS) is essentially a collection of interrelated data and a set of programs to access this data. This collection of data is usually called the database. Database systems are designed to maintain large volumes of data. Management of data involves : • Defining the structures for the storage of data. • Providing the mechanisms for the security of data against unauthorized access. The primary objective of a DBMS is to provide an environment that is convenient and efficient to use, in retrieving information from and storing information into the database. The user of the DBMS is provided the following facilities among others: • Adding empty files to the database. • Inserting new data into the existing files. • Retrieving data from the files. • Updating data in the files. • Deleting data from the files. • Removing existing files from the database. Therefore, DBMS can be used for different purposes besides data storage which are as follows : (i) Efficient access to data. (ii) For avoiding data redundancy and inconsistency. (iii) For providing security of data. (iv) For enforcing different integrity constraints. (v) For providing access for data to multiple users concurrently.

HISTORY OF DBMS Most of the software applications focus on the manipulation of the data from the starting days of computer. So, there is a need arises for a system that helps in storing and manipulating the data. The first general purpose DBMS was designed by Charles Bachman in early 1960’s and called as Integrated Data Store. It founded the basis for the network data model and influenced database system through 1960’s. In 1966, IBM released the first commercially available DBMS called IMS (Information Management System) which based on the hierarchial data model and assumes all data relationship to be structured as hierarchies. Conference On Data Systems Languages (CODASYL) set standards for network database product in 1969. Dr. E. F. Codd, an IBM researcher, proposed relational data model in theoritical paper in 1970. The publication of Codd’s paper in early seventies set off a flurry of activities in both research and commercial system developments communites and they worked to bring out a relational DBMS. IBM developed a relational model prototype in 1976. In 1980’s, the relational model was developed as a standard approach for DBMS. SQL is developed as a part of IBM’s system R project which becomes a standard query language. So IBM released first commercially available Database product based on

relational model SQL/DS for interactive operating system in 1981. IBM produced DB2 for its mairyraness with batch operating system in 1982. SQL was standardized and was adopted as a query language by ANSI and ISO. Many developments were being done in 1980’s and 1990’s in the area of database system, which include the release of Paradox, DBase, Foxpro and Access. Different researches worked out to develop a more powerful and rich data model which can support complex data types. Later on, Enterprise Resource Planning (ERP) and Management Resource Planning (MRP) evolved. Both of these packeges identify a set of common tasks e.g. human resource planning, inventory management etc. of a large organization and provide a general application layer to carry out these tasks. Thenafter, DBMS get into the revolutionary age of internet. Data stored in the DBMS can now be accessed with the help of web browsers from any where and at any time. Stored data is being provided on the web in the form of HTML and XML documents. So in 2000, the fashionable area for innovation is XML database. XML databases aim to remove the traditional division between documents and data, allowing an organization’s information resource to be held in one place, whether they are highly structured or not. All the database vendors try to develop more advanced DBMS which can support complex data like video, streaming data, digital libraries on the web. Thus, the database system evolved from sequential file access to the object oriented database system used in present scenerio.

1.4

FILE SYSTEM V/S DBMS Initially, a computer system used by an enterprise mainly performs data processing tasks i.e. to insert the information about employees, retrieve information about employees of particular department, accounting functions on salary of employees etc. Since these systems performed normal record keeping functions, they were called data processing system. Thus data processing system is an automated system for processing data of an organization. The conventional data processing approach is to develop a program (or many programs) for each application. This results in one or more data files for each application (fig. 1.1). Some of the data may be common between files. However, one application may require the file to be arranged on a particular field, e.g. amount. A major drawback of conventional method is that the storage and access techniques are built into the program. Therefore, though the same data may be required by two applications, the data will have to be stored in to different places because each application depends on the way that the data is stored. There are various drawbacks of the conventional data file processing environment. Some of them are listed below. (i) Data Redundancy Some data elements like name, address, identification code, are used in various applications. Since data is required by multiple applications, it is stored in multiple data files. In most cases, there is a repetition of data files. This is referred to as data redundancy, and it leads to various other problems. (ii) Data Integrity Problem Data redundancy is one reason for problems of data integrity. Since, the same data is stored in different places, it is inevitable that some inconsistency will creep in. For example if an instructor of Microprocessor Technology begins to take course in Computer Architecture this needs to be reflected in more than one place (figure 1.1). If the change is not made in all the places, the university will have different information in different places about the same instructor. (iii)

Data Availability Constraints

When data is scattered in different files, the availability of information from combination of files is constrained to some extent.

Figure 1.1 One to one correspondence between applications and data files 1.4.1

Advantage of DBMS over File System File system stores data in the form of records and data which are files managed by operating system and uses application program to extract information from the file. A major advantage the database approach has over the conventional approach is that a database system provides centralized control of data. (i) Reduced Redundancy Unlike conventional approach each application does not have to maintain its own data files. Data can be integrated and used by multiple applications at the same time. (ii) Ensure Consistency It becomes very difficult to maintain consistent format of files in file system. Different programmers can use different programming languages, which may cause duplication of information in several files. This duplication results in higher storage and access cost. In addition, it may lead to data inconsistency i.e. various copies of same information may not agree. For example, consider an employee management system, if address of an employee which is stored at two places and is updated at only one place then the system will give conflicting information and become inconsistent. The DBMS can guarantee that the database is never inconsistent, by providing a fix format of data and by ensuring that a change made to any entry, automatically applies to the other entries as well. This process is known as propagating updates. (iii)

Data Manipulation Capabilities

File system requires an application program for processing the data stored in files according to needs of user. If the user needs get changed then a different application program is required. For example consider the employes management system. Suppose we want to find name of employees in “Jaipur” then either new application program is developed or we have to find out the name of employee having city as Jaipur manually in the case of files system. This method is not an efficient process as developing a new application program takes a lot of time and it is possible that after development of program, our needs changes from finding employees in Jaipur to find employees in ‘Malviya Nagar, Jaipur’. Database system can solve such problems by simply firing queries to the database as needed and retreive answer in response. (iv) Data Independence (Reduced Programming Efforts) In non-database systems, the requirements of the application dictate the way in which the data is stored, and the access techniques. Besides, the knowledge of the organization of the data and the access techniques are built into the logic and code of the application. These systems are data dependent. Consider this example, suppose the university (mentioned previously) has an application that processes the student file. For performance reasons, the file is indexed on the roll number. The application would be aware of the existing index, and the internal structure of the application would be built around this knowledge. Now, consider that for some reason, the file is to be indexed on the registration date. In this case, it is impossible to change the structure of the stored data without affecting the application too. Such an application is a data dependent one. It is desirable to have data independent applications. Suppose two application X and Y need to access the same file. However, both application require a particular field to be stored in different formats. Application X requires the field “customer-balance” to be stored in decimal format, while application Y requires it be stored in binary format. This would pose a problem in the old systems. In a DBMS, differences may exist in the way that data is actually stored, and the way that it is seen and used by a given application. To conform to the changing requirements of the enterprise, the DataBase Administrator (DBA) may need to change the storage structure or access techniques. The DBA should be able to do this without having to modify the existing applications. If applications are data dependent, programmer effort, that could otherwise be available for the creation of new applications, would be necessary to modify existing applications to match the changes made. (v) Atomicity and Transaction Management File system does not ensure completion of transaction and it may cause problem of data inconsistency. For example, consider employee management system where company wants to shift an employee from sales department to finance department. The procedure for this transaction is to perform two operations, reduction in number of employees in sales department and increament in the number of employee in finance department, but in file system may combine of both operations can not be guaranteed as we can not make a single unit of these two operations and if only one operation is performed and system crashes then the database will become inconsistent. This problem can easily be solved by database management system. It ensures completion of whole transaction which combinses more than one operation or no operation will be performed on behalf of the transaction. This property is called ‘atomicity’. (vi) Security File system does not provide any security to the data stored, as there are no authentication rights provided to user for the file. Complete file is at expose of user. The DBA has to guarantee that only authorized persons have access to the database. The DBA defines the

security checks to be carried out. Different checks can be applied to different operation on the same data. For instance, a person may have the access rights to query on a file, but may not have the rights to delete or update that file. The DBMS allows such security checks to be established for each piece of data in the database. (vii) Integrity Inconsistency between two entries can lead to integrity problems. However, even if there is no redundancy, the database can still be inconsistent. For example, a student may be enrolled in 10 courses in a semester when the maximum number of courses, one can enroll is 7. Another example could be that of a student enrolling in a course that is not being offered that semester. Such problems can be avoided in a DBMS by establishing certain integrity checks to be carried out whenever any update operation is done.

1.5

DISADVANTAGES OF DBMS Inspite of many advantages, DBMS does not proves to be powerful or advantageous system in certain scenarios due to following : (i) Overhead for providing security, integration of data, transaction management, concurrency control etc. (ii) More investment is required for hardware and software. (iii) Special training is required to use DBMS. (iv) Its performance may not be adequate for certain specialized applications (v) Many applications may need to manipulate the data in ways not supported by the query language. So, it is quite advantageous to use file system in certain situations, which are : (i) Database and application are simple and not expected to change . (ii) Concurrent access is not required. (iii) Real time applications as time constraints are not easy to maintain with DBMS.

1.6

DESCRIBING AND STORING DATA IN DBMS DBMS is always concerned with some real world enterprise. Data stored in DBMS describe real world entities and represent relationships between these entities. For example, there are employees, departments and projects in a company and data in the company database describe these entities, in terms of their attributes and relationship to other entities. Data can be described through different data model and at different levels of abstraction.

1.6.1

Data Abstraction Data abstraction is one of the fundamental characteristic of any database management system, which helps in making data more accurate and easy to use. Abstraction refers to the act of representing essential features without including background details or explanations. So, data abstraction refers to the act of representing data without giving details that how data are stored or maintained. Data abstraction prevents irrelevant information at a particular level. Complexity of data is hide through several levels of abstraction so as to simplify user interaction with the system. Different levels of abstraction are : (i) Physical Level or Internal Level It is the lowest level of abstraction which specifies storage details that how data are actually stored on disks or on tapes. It specifies in the manner in which records are stored

either as the collection of pages or as the collection of records. Complex low level data structures are described in detail at this level. The design of data structure described at this level is called physical schema. The data structure at this level may include B trees,. B+ trees, hashing etc. (ii) Logical Level or Conceptual View The next higher level of abstraction describes what data are stored in the database, and what relationship exists among those data. There is only one conceptual schema per database. This schema also contains the method of deriving the objects in the conceptual view from the internal views. The description of data at this level is in a format independent of its physical representation. It also includes features that specify the checks to retain data consistency and integrity. The logical level of abstraction is used by database administrators, who decide what information is to be kept in the database

Cust_ID Loan_No Amount_in_Dollars

TYPE = BYTE (4), OFFSET = 0 TYPE = BYTE (4), OFFSET = 4 TYPE = BYTE (7), OFFSET = 8

Customer_Loan Cust_ID : 101 Loan_No : 1011 Amount_in_Dollars : 8755.00 CREATE TABLE Customer_Loan ( Cust_ID NUMBER(4) Loan_No NUMBER(4) Amount_in_Dollars NUMBER(7,2))

External

Conceptual Internal

Figure 1.2 Level of Data Abstraction (iii) View Level It is the highest level of abstraction which describes different views of the entire database. These views are designed according to the requirements of user who wants to access only a part of the database. A database may have several views, according to the demand of individual user or the group of users. The data in these views are not exactly stored in DBMS but they are computed using specification of view described by user. An analogy to the concept of data types in programming language may clarify the distinction among levels of abstraction. Most high-level programming languages support the notion of a record type. At physical level, a customer, account, employee record can be described as a block of consecutive storage locations for example words or bytes. The language compiler hides this level of details from the programmers. Similarly, the database system hides many of the lowest level storage details from database programmers.

At logical level, each such record is described by a type definition and the interrelationship among these record type is defined. Programmers using a programming language work at this level of abstraction. Similarly, database administrators usually work at this level of abstraction. Finally, at the view level, computer users see a set of application programs that hide details of the data type. Similarly, at the view level, several views of the database are defined, and database users see these views. In addition to hiding details of the logical level of the database, the views also provide a security mechanism to prevent users from accessing parts of the database. Diagram: Example of three levels

1.6.2

1.7

1.7.1

1.7.2

1.8

Figure 1.3 Database Abstraction Instances and Schemas The collection of information stored in the database at a particular moment is called an instance of the database.The overall design of the database is called the database schema. Schemas are changed infrequently. Database systems have several schemas, partitioned according to the levels of abstraction. At the lowest level is the physical schema, at the intermediate level is the logical schema and at the highest level is a subschema. In general database system supports one physical schema, one logical schema, and several subschemas.

DATA INDEPENDENCE Three levels of abstraction, along with the mappings from internal to conceptual and from conceptual to external, provides two distinct levels of data independence: Logical and physical data independence. Logical Data Independence Indicates that conceptual/logical schema can be changed without affecting the existing external(view) schemas. The change would be absorbed by the mapping between the external and conceptual(logical) levels. Consider a change in the conceptual view such as merging two records into one or adding fields to an existing record. This would require a change in the mapping from the external view to the conceptual view so as to leave the external view unchanged. Some changes such as the deletion of a conceptual view field or record, may require changes in the external view and application program. Physical Data Independence Indicates that the physical storage structures or devices used for storing the data could be changed without necessitating a change in the conceptual view or any of the external views. The change would be absorbed by the mapping between the conceptual and internal levels. Modifications at physical level are occasionally to improve performance

DATABASE LANGUAGES Most of the database management systems provide specialized languages called database languages, to interact with database or to get some job done from the database. Commands of these languages provides facility to the user to operate and manage the database efficiently. Importance of database to the user is dependent on the ease with which information can be obtained from it and one of the biggest reason of popularity of relational database management system is that it allows a rich class of questions to ask

from the database in an easy manner. Consider our sample Employee database, a user may ask : (i) Who is getting highest salary ? (ii). How many employees are working under Mr. Ramesh ? (iii) In which department employee’s strength is highest ? Such questions which involve the data stored in a database are called queries and database management systems provides a specialized language called query language to ask queries from database. A DBMS provides facility of data manipulation like retrieval of data, insertion of data, modification of existing data and deletion of data etc. Such facilities are given by Data Manipulation Language (DML) commands. Query language is only one part of DML. DBMS also supports some commands that can make changes in the structure of the database i.e. schema, such commands are known as Data Definition Language (DDL) commands. The DML and DDL are collectively known as data sublanguages, when embedded within a host language. DDL make changes in meta data (data about data) stored in data dictionary and DML retrieve data from a database (query) or make changes in an instance of the database. Relational model supports a powerful mathematical logic based language, called relational calculus which is a nonprocedural language for defining query solutions. Queries in this language have precise meaning. There are two types of relational calculus. (i) Tuple Relational Calculus (ii) Domain Relational Calculus Relation algebra is another formal, procedural query language, based on a collection of operators for manipulating relations.

1.9

TRANSACTION MANAGEMENT: A transaction is a sequence of one or more SQL statements that together forms a logical unit of work. Each statement in the transaction performs a part of the task, but all of them are required to complete the task. All the statements forming a transaction must be executed for the database to be in a consistent state. A transaction occurs when the database is modified. Here is an example of a typical transaction. A customer orders a product. The order processing program will: (i) Query the table containing product details to see if the product is in stock. (ii) Insert the order details into a table that holds the order details. (iii) Update the table containing product details to reduce the quantity on hand by the quantity order. These three actions, in the sequence shown above, form a single logical transaction. The concept of transaction processing is critical for programs that update a database because it ensures the integrity of the database. As a rule, the statements in a transaction are executed as a single unit of work in the database. Either all the statement will be executed successfully, or none of the statements will be executed. The DBMS is responsible for ensuring the consistent state of the database even in the case of the application program aborting in the middle of the transaction, or a hardware failure in the middle of a transaction. Consider the above example of a transaction. What

do you suppose will happen if the order processing program aborted after step(ii). The database would reflect a partial transaction. It would be in an inconsistent state. To maintain consistency, the DBMS undoes all the changes made if a transaction is not completed. Thus, a DBMS guarantees that if a transaction executes some updates, and then for whatever reasons, a failure occurs before the transaction reaches its normal termination, then those updates will be undone. To ensure integrity of the data, we require that the database system maintains the following properties of the transactions: (i) Atomicity Either operations of the transactions are reflected properly in the database or none are. The basic idea behind ensuring atomicity is as follows. The database system keeps track of the old values of any data on which a transaction performs a write, and, if the transaction does not complete its execution the old values are restored to make it appear as though the transaction never executed. It is handled by a component called the Transaction Manager. (ii) Consistency If the database is consistent before an execution of the transaction, the database remains consistent after the execution of the transaction. Execution of a transaction in isolation preserves the consistency of the database. Ensuring consistency for an individual transaction is the responsibility of the application programmer who codes the transaction. (iii) Durability The durability property guarantees that, once a transaction completes successfully, all the updates that it carried out on the database persist, even if there is a system failure after the transaction completes execution. We can guarantee durability by ensuring that either (a) The updates carried out by the transaction have been written to disk before the transaction completes (b) Information about the updates carried out by the transaction and written to disk is sufficient to enable the database to reconstruct the updates when the database system is restarted after the failure. Ensuring durability is the responsibility of a component of the database system called the Recovery Manager. (iv) Isolation Even if the consistency and atomicity properties are ensured for each transaction, if several transactions are executed concurrently, their operations may interleave in some undesirable way, resulting in an inconsistent state. The isolation property of a transaction ensures that the concurrent execution of transaction results in a system state that is equivalent to a state that could have been obtained had these transactions executed one at a time in some order.

1.10

STORAGE MANAGEMENT Main memory of a computer can not be used to store a medium size database which may be in gigabytes or even in terabytes therefore data are stored in disks and moved between disk and main memory. Database management system should structure the data in such a way that movement of data between main memory and disk may be reduced up to the lowest level possible. Minimizing data movement will improve data access speed because data movement to and from the disk is slow relative to the speed of central processing unit.

The goal of database management system is to simplify and facilitate access to data. The information that how the data are stored on physical media is hidden from the user. The performance of system regarding data access speed can be measured in terms of response time i.e. time elapsed between submission of command and data to the system and getting the result of computation. Response time depends on the efficiency of the data structures used to represent these data in the database and on how efficiently the system is able to operate on these data structures. The storage manager component of database management system is responsible for storing, retrieving and updating data in the database. It provides an interface between the low–level data stored in the database and the application programs and queries submitted to the system, it also translates the various DML statements into low–level file system commands.

1.11

DATABASE USERS

Different people can use the database in a different manner. The interaction between user and the database may be of several types according to the user. We can have four types of users differentiated by the way they expect to interact with the DBMS : 1.11.1 Database Administrator Databases of an enterprise are typically important enough and complex enough that the task of designing and maintaining it requires a professional, called the DataBase Administrator (DBA). DBA has central control over the system. The database administrator is responsible for following functions : (i) Schema Design and Maintenance The DBA creates the database schema after interacting with the users of the system and analyzing what data are to be stored in the database. He or she writes a set of definitions that are translated by DDL compiler into a set of tables and are stored the in data dictionary. (ii) Physical Schema and Organisation Modification DBA is responsible for designing physical schema and deciding how the data will be stored on the physical media. DBA defines storage structure and access methods by writing a set of definitions. (iii) Authorization and Security DBA is responsible for ensuring that unauthorized data access is not permitted. The DBA also decides which parts of the database, various users can access and in which mode (read, write or both). Database administrator grants different types of authorizations to different users. For example a clerk may be authorized to view salaries of different employees but he may not be authorized to update salaries. This authority can be given to the accounts officer only. So different access rights are given to different usess according to requirement. (iv) Integrity Constraint Specification Data stored in the database must satisfy certain constraints for example in an employee database, Employee No. must be unique for each employee and Address value must not be left blank. These constraints are called integrity constraints and it is the responsibility of DBA to identify all such constraints and apply them to the database. (v) Recovery From Failure The DBA is responsible for taking necessary steps required for restoring the database if the system fails DBA should keep backups of the database time to time and maintains logs of system activities so that recovery may become possible.

(vi) Database Upgradation DBA always tries to know, understand and analyze, changing requirements of user and make upgradations in the database accordingly. 1.11.2 Application Programmers They are computer professionals and write application programs. They embedded DML call in the host language program. These DML calls are converted in the host language normal procedure call by DML precompiler. The resulting program is then run through the host language compiler, which generates appropriate object code. 1.11.3 Sophisticated Users They work like an analyst and submit queries to a giving process or directly to the DBMS, which breaks down the query into instructions that the storage manager understands. 1.11.4 Specialized Users Speciallized users write specialized database applications like computer aided design systems, knowledgebase and expert systems etc. Such systems are different from traditional data processing framework and uses complex data types. 1.11.5 Naive Users They use the DBMS only by interacting through application programs written previously. For example the clerk at ticket booking window, he uses an application program to do his job of making reservations for a passenger.

1.12

DATABASE STRUCTURE

A database system is divided in different components. The exact architecture of a DBMS depends on the operating system on which it has to work because the operating system will provide the basic services, which database system uses base for accomplishing it’s tasks. Database system can be divided in to two broad parts. (i) Query Processor Components (ii) Storage Manager Components 1.12.1 Query Processor Components These components are used in evaluating DDL and DML queries and includes following components. (i) DML Compiler It first attempts to transform user’s request into an equivalent but more efficient form and then translates that into a set of low level instructions that can be used by query evaluation engine. (ii) Embedded DML Pre–Compiler It converts DML statements embedded in an application program to normal procedure calls in host language. This pre compiler consults DML complier to generate the appropriate code (iii) DDL Interpreter It makes data dictionary, which contains metadata. (iv) Query Evaluation Engine It executes low level instructions generated by the DML complier.

1.12.2 Storage Manager Components It provides the interface between the low level data stored in the database and the application programs and queries submitted to the system. These components include : (i) File Manager It manages disk space allocation and the data structures used to store the data (ii) Buffer Manager It is responsible for fetching data from disk to main memory and decides the caching strategy suitable for the application. (iii) Transaction Manager It ensures consistency of the database after any transaction performed on it. (iv)

Authorization and Integrity Manager

It tests for satisfaction of integrity constraints and checks authority of users to access data. It uses all the integrity constraints and authorization rules specified by the DBA. 1.12.3 Data Structures Used by DBMS A DBMS can use several kind of data structures as a part of physical system implementation. Each structure has it’s own importance. Following are some common data structures. (i)

Data Files

It stores the database itself on the disk. (ii)

Data Dictionary

Information pertaining to the structure and usage of data contained in the database, the metadata, is maintained in a data dictionary. The data dictionary is a database itself, documents the data. Each database user can consult the data dictionary to learn what each piece of data and the various synonyms of the data fields mean. In a system where the data dictionary is part of the DBMS(Integrated system) the data dictionary stores information concerning the source of each data-field value, the frequency of its use, and an audit trail concerning updates, including the who and when of each update. Currently data dictionary systems are available as add-ons to the DBMS. The data dictionary stores: •

Names of relations

•

Names of the attributes of each relation

•

Domains and lengths of attributes

•

Names of views defined on the database, and definitions of those views

•

Names of authorized users

•

Accounting information about users

•

Number of tuples in each relation

•

Method of storage used for each relation

•

Name of the index

•

Name of the relation being indexed

•

Attributes on which the index is defined

•

Type of index formed

(iii) Indices These are used to provide fast access to data items that hold particular values. (iv) Statistical Data It stores statistical information about the data stored in the database, like number of records, blocks etc. in a table. This information can be used to execute a query efficiently.

Figure 1.4 Diagram of Database Structure

SUMMARY • • •

• • •

Data are raw facts or figures. Database is a collection of interrelated, shared, and controlled data, typically describing activities and functions of an organization. DataBase Management System is a software designed to manage all the operations of a database like insertion, modification, deletion, sorting of data and creation, alteration, removal of database structure etc. Data model is a conceptual method of structuring data. Data in the database is abstracted in three levels physical, logical and view level. Schema is the overall design of the database.

• • • • • • • • • • • •

• • • • •

Instance is the snapshot of the database or the data stored in the database at a particular point of time. Logical data independence means no need of changes in the logical structure of data when there are changes at view level. Physical data independence means no need of changes in the physical storage details when there are changes at logical level. Data manipulation language (DML) is used to retrieve and manipulate data. Data definition language (DDL) is used to specify database schema. Transaction is a program unit or set of operations that represents a logical unit of tasks. Atomicity, consistency, isolation and durability are four properties that all the transactions should hold. These properties are known as ACID properties. Atomicity ensures that all actions associated with a transaction are executed to completion, or none is performed. Transaction must not leave the database in inconsistent state. Concurrent transactions must be executed in serializable manner. Isolation property of transaction ensures it. Durability ensures persistence of successfully completed transactions. Storage manager is a program module that provides the interface between the low level data stored in the database and the application programs and queries submitted to the system. Transaction manager ensures consistency of the database even after system failure. Database administrator is a manager whose responsibilities are focused on technical aspects of the system. Query processor components and storage manager components are two broad parts of database system. Query processor components are used in evaluating DDL and DML queries. Storage manager components provide the interface between low level data stored in database and application program and queries submitted to the system.

REVIEW QUESTIONS Q.1 Q.2

Q.3 Q.4 Q.5 Q.6

Why would you choose a database system instead of simply storing data in operating system files ? When would it make sense not to use database system ? Explain the differences between internal, external and conceptual schemas. How are these different schema layers related to the concept of logical and physical data independence ? Explain the role of different users of database ? What is a transaction ? Explain ACID properties of transaction ? What are log files ? How they are helpful in recovery. Explain roles of following for representing information about real world in database ? (a) Data Definition Language (b) Data Manipulation Language (c) Buffer Manager (d) Data Model

Q.7 Q.8 Q.9 Q.10 Q.11 Q.12 Q.13 Q.14 Q.15

Q.16

Explain different database languages available. Explain transaction management in DBMS in detail. Discuss layered structure of DBMS. Describe main characteristics of the database approach and contrast it with file based approach. [Main/Back Exam 2004] Describe components of DBMS environment and discuss how they are related to each other. [Main/Back Exam 2004] What is DBMS ? How are data stored in DBMS ? Explain. [Main/Back Exam 2003] What is transaction ? How is transaction managed in DBMS ? [Main/Back Exam 2003] What are the advantages and disadvantages of using database system ? What is the difference between logical data independence and physical data independence ? Which is easier to accomplish ? and Why ? [Main/Back Exam 2002] Give overall system structure of DBMS and explain function of each component. [1995]



