Data, Information, and Databases

1/27/2014 Data, Information, and Databases BDIS 6.1 Topics Covered Information types: transactional –vsanalytical  Five characteristics of informat...
Author: Gabriel Rogers
1 downloads 2 Views 901KB Size
1/27/2014

Data, Information, and Databases BDIS 6.1

Topics Covered Information types: transactional –vsanalytical  Five characteristics of information quality  Database versus a DBMS  RDBMS: advantages and terminology  Multi-user issues 

BSAD 141 Dave Novak

The Need for High-Quality Information 

Data are everywhere 

Which data are important?



Which data should the organization store?



Which data need to be further manipulated?



Which data are required to make different types of decisions?



How does the organization convert various data into information that is needed?

The Need for High-Quality Information 

The need to obtain and analyze the many different levels, formats, and granularities of organizational information to make decisions

The Need for High-Quality Information 

Recall difference between data and information from Lecture #1

The Need for High-Quality Information 

CRITICAL TO REMEMBER! 

Decisions are only as good as the quality of the data and information that are used to make the decisions…



Crap in  Crap out • Using technology to help you make a decision using poor quality data doesn’t help

1

1/27/2014

Data Quality Problems Example of Low Quality Data

Characteristics of High Quality Information 1) Accurate 2) Complete  3) Consistent  4) Unique  5) Timely  

1) Accurate Are the data (is the information) correct, precise, and exact?  For example: 

2) Complete Are the data whole (complete) and do they have all the necessary parts?  For example 

Are the data factual? Are values error-free?  Have data been verified? 





• • • •

• Correct spelling • Precise numbers

3) Consistent Are the data are in agreement with themselves and with known facts?  For example 



Does summary information agree with detailed information? Can you reconcile the data? • Do mathematical manipulations yield correct results? • Are data manipulations performed consistently for the entire data set?

Are there missing values or pieces of data? Full street address Area code along with phone number Empty fields Full Names

4) Unique 

Are the data unique (one of a kind) or are there redundant, repetitious or unnecessary data stored in the same database? 

For example: • Are there duplicate records for the same “event”? • Are there different versions of “the same” file or event (which is the latest or most accurate?)

2

1/27/2014

5) Timely Are the data current with respect to decision-making needs?  Timeliness depends on the situation 

Real-time information – Immediate, up-to-date information  Real-time system – Provides realtime information in response to requests  “Real-time” is a relative description that depends on the use or need 

What is a Database? 



Database – a collection of information organized in a way that provides efficient retrieval

What is a Database Management System (DBMS)? 

A database (the physical collection of data) is typically not portable across different DBMS 

Database management systems (DBMS) – A set of computer programs / software that allow users to store, modify, query, and retrieve data in a systematic and controlled manner

A database can be a very simple collection of data such as alphabetically arranging names in an address book

Database Management System (DBMS)



Customers intentionally enter inaccurate information to protect their privacy or because they are irritated  Different data entry standards and formats  Operators enter abbreviated or erroneous information by accident or to save time  Third party and external information contains inconsistencies, inaccuracies, and errors 

There are electronic and physical databases (paper/print) 



How can data be of “poor” quality?

Like application software, different DBMS are generally designed to work with specific system software and specific database schema

A database is typically something inside the DBMS, although in the case of a MS Excel workbook the database is a standalone object

Database Management System (DBMS) A very popular and common DBMS is the relational DBMS (RDBMS)  A standard program and user interface is the Structure Query Language (SQL) 

A programming language used to create, modify, and retrieve information from a database  Different databases use different (proprietary) variations to standard SQL 

3

1/27/2014

Database Management System (DBMS) 



According to the following source (which I did not verify with the Gartner report) the top five commercial RDBMS vendors in 2011 were: http://itknowledgeexchange.techtarget.com/eye-on-oracle/oracle-the-clear-leader-in-24-billionrdbms-market/

Oracle (≈ 50% market share) IBM (≈ 20% market share)  Microsoft (≈ 17% market share)  SAP  Teradata  

Single File Data Management 

MS Excel is a database, but it is not a DBMS! 

Each worksheet is a single large twodimensional matrix

A database is simply an organized collection of data that can be accessed  A DBMS is software that is used to manage the database and provides a set of tools used to manipulate and query data

Database Management System (DBMS) Oracle: Oracle Database and MySQL IBM: DB2 and Informix  Microsoft: SQL Server  SAP: Sybase Enterprise and Sybase IQ  Teradata  

Relational Database Management System (RDBMS) 

Data are organized as a set of formal tables



Data can be accessed and combined in different ways without reorganizing the data within the tables



RDBMS Terminology 

Data model – A picture of logical data structures that detail the relationships among data elements



Metadata – Formal description of data structures (like tables and fields) and any constraints of the table or values within the table 





Data can be manipulated in different ways and combined with data in other tables without altering the original data in the table

RDBMS can be easily extended / scaled – new data and new categories of data can be added without changing existing data

RDBMS Terminology 

Data dictionary – Compiles all of the metadata about the elements in the data model

Data about the containers of data

4

1/27/2014

Entity Sets (Tables)

Entity / Record / Row

Relational table or entity set – Each table consists of columns (fields/attributes) and rows (records/entities)  The table has a name that describes the group of related entities within the table







For example, a table labeled “Student” would contain a group of student entities

Attributes / Field / Column 

The data elements that describe the characteristics of a specific entity 



A person, place, thing, transaction, or event about which data are being collected and stored The individual rows in a table contain entities  Each row is also referred to as a record 



Example?

What is a Relationship? 

The columns in each table contain the attributes

1) When designing a relational DB, data need to be separated into tables that contain related data elements 

Example?

For example we would not store data related to customer (name, address, phone, etc.) and data related to the customer’s particular order (orderID, date, shipping method, etc.) in the same table

What is a Relationship?

What is a Relationship?

All information specific to a customer would go into a “Customer” table  All information specific to the specific orders would go into an “Order” table







We would then create a relationship between the tables to match a particular customer with a particular order

A relationship in an RDB is an association between the entities within the different tables 

There are THREE (3) types of relationships: • One-to-One (1:1) • One-to-Many (1:M) • Many-to-Many (M:M)

5

1/27/2014

Creating Relationships Through Keys 

KEYS are used to create relationships between the entities in different tables in the RDB 

Primary key



Foreign key

Advantages of RDBMs 

RDBMS advantages from a business perspective include 1) Flexibility 2) Scalability and performance  3) Improved information integrity (quality) 

Creating Relationships Through Keys 

Every table in a RDB MUST have a primary key



The foreign key is not required in every table and will only appear on the “many” side of the relationship

Handle changes quickly and easily  Provide users with different views of the data 

Arranging data items in different ways depending on the user need  Showing a particular user only some of the available fields while not showing them other fields 

• Reduced information redundancy

4) Information security

1) Flexibility: Schema



1) Flexibility





For our purposes:

2) Scalability and Performance 

A DBMS must expand to meet increased demand, while maintaining acceptable performance levels Scalability – Refers to how well a system can adapt to increased demands  Performance – Measures how quickly a system performs a certain process or transaction 

Different database schema can be “owned” by or associated with different users The schema is a user personalized set of tables, views, and indexes

6

1/27/2014

3) Information Integrity 

Information integrity – a measure of information quality 





Redundant data are ok if they serve a specific purpose such as being used as backup directly linked to the source



Unintentional redundancy is not good

Know that data have not been entered incorrectly or altered in an unauthorized manner

Integrity constraint – rules that help ensure the quality of information 

3) Information Integrity: Controlling Redundancy

We will discuss entity integrity and referential integrity (there is also domain integrity)

4) Information Security Information is an organizational asset and must be protected  RDBMS offer several security features 



Access level – Determines the level of access each individual user has



Access control – Determines the types of things each group can do



Backup systems promote fault tolerance,

Wasted storage Difficult to modify  Possible inconsistencies  

Multiuser Issues DBMS serve many different users with different needs  Many users may require concurrent access to the same data 

• Who can access the DBMS 

• Types of access, such as power to create, modify, delete, and/or read • Which types of SQL statements can be executed

Multiuser Issues 

Problem: if multiple users (say tens or even hundreds of users) access the same data concurrently, how does the DBMS allow one user to change data without being overwritten by another user? This is typically referred to as the Lost-update problem

Must preserve integrity of data and the performance of the system

Multiuser Issues 

Concurrent transactions are addressed through the use of transactions and locks Transactions – single indivisible action that affects some data  Once a transaction is committed, it is permanent and changes are visible to all users  If transaction is not committed, changes are “rolled back” or reversed 

7

1/27/2014

Multiuser Issues 

Enterprise DBMS

Locks – literally “locks” the data so that changes cannot be made on the data while another transaction is in process

Learning Outcomes 

Five characteristics of quality information



Define database, DBMS, RDBMS, and supporting components and terminology



Advantages of RDBMS



What is SQL?



Describe the lost-update problem and how it is addressed

8

Suggest Documents