5. DATABASE MANAGEMENT SYSTEMS

5. DATABASE MANAGEMENT SYSTEMS A database is a collection of data organized in a manner so as to allow for access, retrieval and use of that data. Com...
1 downloads 1 Views 326KB Size
5. DATABASE MANAGEMENT SYSTEMS A database is a collection of data organized in a manner so as to allow for access, retrieval and use of that data. Computers process the data in a database into information. A database at a video rental club, for example, contains data about its members – personal data, rental data etc. When someone wishes to join the club, a video rental employee enters data about the new member into the database. When the member borrows a video, the computer processes the data and calculates the Due date for the video. When the member returns a video, the computer processes the data and calculates an overdue fine, if applicable. With database software, often called a database management system (DBMS), users can create a computerized database; insert, modify and delete data in the database; sort and retrieve data from the database; and create reports from the data in the database. Databases are stored on disk to allow for quick, direct access to information. The Hierarchy of Data Computer data is organized in layers, or hierarchy. Each higher level of data consists of one or more items from the lower level. A database contains files (sometimes called tables in database terminology), a file contains records, a record contains fields, and a field is made up of one or more characters. For example, a university database may consist of 3 files – a Students file, an Instructors file, and a Course offerings file. The Students file contains records about enrolled students; the Instructors file contains records about current instructors; and the Course offerings file contains records about courses offered in a particular semester. Fields - A field is a combination of one or more related characters or bytes and is the smallest unit of data a user accesses. A field name uniquely identifies each field. Field names for the data in the Students file are Student ID, First name, Last name, Courses registered for, Year and semester registered for, Grade received etc. Records - A record is a group of related fields. For example, a student record includes a set of fields about one student. A primary key is a field or a combination of multiple fields that uniquely identifies each record in a file. The data in a primary key is unique to a specific record. For example, the Student ID field uniquely identifies each student because no two students at a university can have the same Student ID. Files - A data file is a collection of related records stored on a direct access medium such as a hard disk. The Student file at a university might contain several thousand individual student records. Each student record in the file contains the same fields. Each field, however, contains different data. Database - A database contains a group of related data files. 1

Sample Student File Field Names

Records

Student ID

First Name

Last Name

Tel. No.

10/0835/1234

Leah

Johnson

645-1234

11/0801/3456

Jason

Smith

220-6734

10/0809/8765

Marge

Penn

613-9876

Fields

File Maintenance File maintenance refers to the procedures that keep data current. File maintenance procedures include adding records to, modifying records in, and deleting records from a file. Users add new records to a file when they obtain new data. When a new student is enrolled at the university, this student’s record needs to be added to the Student file. If a student has a change of address, this information must be modified in the student’s file. Similarly, if the student’s GPA is being calculated and stored in the database, this information must be modified every time new grades for the student are added to the database. Unwanted records can be deleted from the file.

File-based systems (Flat Files) versus Databases Almost all application programs use the flat file approach, the database approach, or a combination of both to store and manage data. File-Based Systems In the past, most organizations exclusively used file-based systems to store and manage data. In a typical file-based system, each department or area within an organization has its own set of files. File-based systems have 2 major weaknesses: data redundancy and isolated data. Data Redundancy – Each department or area in a company has its own files in a file-based system. Thus, the same fields may be shared in multiple files. If a file-based system is used at the university, 2

for example, the Students file and the Fees Payments file share the same students’ Personal information, Dip/degree and Program information. Duplicating data in this manner wastes resources such as storage space and people’s time to enter the data. When new students are added or student data is changed, file maintenance tasks consume additional time because people must update multiple files that contain the same data. Data redundancy also can increase the chances of errors. If a student changes his or her last name or address, for example, clerical staff must update the student’s personal information wherever it appears. If the information is not changed in all the files where it is stored, then discrepancies in the files exist. This results in a loss of data and information integrity. Isolated data – Often it is difficult to access data stored in separate files in different departments. Assume that the Course Grades file containing student grades for students in each department is stored by each department. The Student file containing personal information about the students is stored at Admissions. If the department wanted to send an e-mail to a particular student regarding an incomplete grade, they would need both the student’s course information and the student’s e-mail. They would therefore need access to 2 files – the CourseGrade file stored within the department, and the student file containing personal information, in this case, contacting Admissions to get the student’s e-mail address from their student file. Even if the department has the personal information for the student in another file, getting the email address for the student would mean accessing another file. It is also difficult for new applications to retrieve the appropriate data, which might be stored in various locations.

The Database Approach When a company uses a database approach, many programs and users share the data in the database. A university student database would most likely contain data about students, course offerings, instructors, courses registered for by students, grades earned, and their fees payments. Various sections of the university would share and interact with the data in this database – Exams, Admissions, Departments, Bursary. The data in the database is kept secure, however, so that only authorized users can access the data they are eligible to access. Again, through the use of various access rights, only authorized users can insert/modify/delete data in the database, according to their roles and responsibilities..

3

Shared Database

Departments

Reports

Admissions

Reports

Exams division

Reports

Bursary

Reports

Admissions Data Courses and Grades Data

Database Management System

Bursary Data

Exams Data

Other Data

The database approach addresses many of the weaknesses associated with file processing systems. Advantages of the database approach: Reduced data redundancy – Most data items such as name, address, tel.no, e-mail are stored in only one file, which greatly reduces duplicate data. If a change is to be made to the address, for example, the change needs to be made in only one place – in our example, in the Student file stored by Admissions. All users of the database will now have access to the new address. (Note that a database can reduce data redundancy. It will not eliminate data redundancy completely. Key fields link data together in a database. For example, a Student ID field will exist in any database file that requires access to student data. Thus, the Student ID field is duplicated (exists in more than one database file) in the database.) Improved data integrity – Data Integrity refers to the consistency and accuracy of data in a database. When users modify data in the database, they make changes to one file instead of multiple files. Thus, the database approach increases the data integrity by reducing the possibility of introducing errors and inconsistencies. Data integrity follows from the control of data redundancy. Reduced data redundancy leads to increased data integrity. Keeping a student’s address in only one file eliminates the possibility that the student will have two different addresses stored in different locations. The efficient operation of a business requires a high degree of data integrity

4

When data is corrupted (changed) either accidentally or intentionally, the integrity is lost. Integrity of data can be threatened by hardware problems, power outages, disk crashes, and viruses. The following list outlines some of the ways in which integrity of data in an organization may be compromised.      

Human errors when data is entered Errors that occur when data is transmitted from one computer to another Software bugs and viruses Hardware malfunctions, such as disk crashes 2 or more users updating a record at the same time Intruders and disgruntled employees who break into your computer system and cause subtle, undetectable changes which can cause the most damage  Accidental deletion of data or files  Restoring wrong backups  Natural disasters, such as fires and floods There are many ways to minimize these threats to data integrity. These include:  Backing up data regularly  Controlling access to data via security mechanisms such as access rights (Read Only, Read and Write) to files, and passwords  Designing user interfaces that prevent the input of invalid data  Using error detection and correction software when transmitting data Shared data – The data in a database belongs to and is shared, usually over a network, by the entire organization. Companies that use databases have security settings to define who can access, add, change and delete data in the database. Easier access – The database approach allows a non-technical user to access and maintain data, providing they have the necessary access privileges.

Disadvantages of the database approach: A database can be more complex than a file processing system to develop, and requires persons with special training, especially to develop large databases. A company with a large database would need to employ a fulltime database administrator to maintain the database. The database administrator (DBA) is also responsible for managing database security, monitoring database performance, backup and recovery procedures (if the database is hosted on the company’s server). Web hosting services are responsible for backup and recovery if the database is hosted on their server. Databases also require more memory, storage and processing power than file processing systems. Data in a database can be more vulnerable than data in a file processing system. A database can store a lot of data in its related files. Many users and programs share and depend on this data. If the database 5

is not operating properly or is corrupted, users may not be able to access the required data to perform their jobs. Furthermore, unauthorized users could gain access to a database, and have access to personal and confidential data. To protect their valuable database resources, companies should establish and follow strict security procedures. Despite these limitations, most organizations work with databases because of their tremendous advantages. Although the hardware and software costs to set up a database may seem expensive, longterm benefits exceed the initial costs.

Databases are available for many sizes and types of computers. Whether designed for a small or large computer, most DBMSs offer several tools to enter, maintain, retrieve and report data. These are tables, queries, forms and reports. Tables - In the relational databases that are commonly used today, data in the database is stored in tables (files). Each row in the table is referred to as a record, and each column in the table is referred to as a field. In addition to storing the data, relationships between the data are also stored. This is done by having one or more columns in common with other tables. These relationships are what make it possible to link data in various tables. In the below example, the field Movie ID and Member ID are common to the Borrowers table. Movies Table Movie ID 1 2

Members Table

Movie Name Titanic Gone With the Wind

3

The Bounty Hunter

4

National Treasure

5

The Wedding Singer

6

Raiders of the Lost Ark

Member ID 1 2 3 4

Member FName Kathy Mia Abe Moto

Borrowers Table Member ID

Movie ID

2 1 2 4 1

1 2 3 5 4

Date Borrowed 18/03/2012 04/04/2012 01/04/2012 02/04/2012 06/04/2012

Date Due 25/03/2012 11/04/2012 08/04/2012 09/04/2012 13/04/2012 6

But before data can be entered and retrieved from a database, it first needs to be set up, or organized. One of the tasks involved in this process is that of data definition, the process of describing data to the DBMS prior to entering it. This is done by setting up a data dictionary that contains data about each table (file) in the database and each field in those files – for instance, field width, data type (numeric, text, date/time etc.), validation rule, default value in the field, etc. Queries - To retrieve data in a database, you query it. A query is a request for specific data from the database. A DBMS uses a query language such as SQL (Structured Query Language) to retrieve and maintain data. A query language consists of simple, English-like statements that need to be entered through the keyboard. Each query language has its own language and syntax. The query language can be used to retrieve, add, modify and delete data in the database. To simplify the query process, many DBMSs provide query wizards to guide users through the steps of creating a query. Queries can be created to retrieve data from a single table, or more importantly, to retrieve data that spans several tables that have been linked through one or more common fields. This is where the real power of databases lies. Unlike file processing systems where data can only be retrieved from one file at a time, with the database approach, information from several files can be retrieved at one go. Forms - A form is a window on the screen that provides areas for entering or changing the data in the database. Forms can be used to view and maintain the data in the database. Forms are usually made attractive and easy-to-use by using various formatting features such as font color, style, shading etc. Reports - A Report generator allows users to design a report for output from the database, and then display or print the report. Report generators allow you to format your output; set titles and column headings; subtotals and totals. Some popular database management systems in use currently are: DBMS

Computer type

Proprietary/Open source

MS Access

PC, server, mobile devices

Proprietary

Oracle

PC, server, mainframe, mobile devices

Proprietary

SQL server

PC, server

Proprietary

MySQL

PC, server

Open source

Oracle currently has the largest market share in terms of revenue.

7

Data Warehousing and Data Mining Companies have long gathered and stored what data they could, but these data were typically stored in separate systems that were unable to share data and easily produce meaningful reports of useful information. The modern meaning of data warehousing is the use of a huge database that combines all of a company’s data and allows business users to access the data directly, create reports, and obtain answers to what-if questions. An organization with a data warehouse may offer managers access to large volumes of data that can be overwhelming. Data mining software uses sophisticated decision-making processes to search raw data for patterns and relationships that might be significant. For eg., trends in stock movements or in customer purchase habits at different times of the year.

To use a database or a spreadsheet? Databases are ideal when you have a large amount of related data to store and quickly need to be able to add to, update, query and create reports about the data. Use a spreadsheet when you have less data to store, but frequently need to perform calculations with the data, create charts, and review statistics.

8