Information Science 1

Author: Malcolm Todd
Data Files
Week 14

College of Information Science and Engineering
Ritsumeikan University

Agenda
• Terms and concepts from Week 13
• Data files
  – file structures and data storage
  – the concept of files
  – file access methods
  – main operations
  – file types and organization
• Test 2

Recall Week 13
Structured data types
• Arrays
• Declaration, Definition (Initialization)
• Index (indices), subscript (subscripts)
• Out of bounds exception
• Multi-dimensional arrays
• Aggregate operations, Nested loops
• Strings, Substrings, “End-of-string” nonprintable character

Class objectives
• Learn about data files
  – Access (sequential and random)
  – Types (ASCII and binary)
  – Organization
  – Storage

Data in Computer Science and IT

• We live in an era when the overwhelming majority of computers are used for data storage and processing (e-documents, blogs, “cloud computing”, …)
• When discussing the concept of data (or dealing with data) in computer science, one should always consider at least the following aspects:
  – Storage of data
  – Organization / representation of data
  – Access to data
  – Processing of data

Data structures vs. File structures
• Working with data structures and file structures involves, in both cases, issues related to data representation and data access
• Data structures, however, deal with data in the main memory (RAM), while file structures deal with data in secondary storage (HDD, SSD, CD/DVD, etc.)

File structures in Computer Science and IT
Layers, from top to bottom:
• Applications
• Data Base Management Systems (DBMS)
• File system
• Operating Systems (OS)
• Hardware

Recall Computer Structures (Architecture)
• Main Memory (RAM): data is manipulated here
  – Type: Semiconductors
  – Properties: Fast, expensive, volatile, miniature
• Data transfer takes place between main memory and secondary storage
• Secondary Storage: data is stored here
  – Type: Magnetic tapes/disks, optical disks, semiconductors
  – Properties: Slower than RAM, cheaper than RAM, stable, bigger than RAM

Physical files vs. Logical files
• A physical file is a collection of bytes stored on a secondary storage I/O device
• A logical file is an “interface” that allows application programs to access the physical file on the secondary storage (SS)
• The operating system is responsible for associating a logical file in an (application) program with a physical file on the SS. Writing to or reading from a file in a program is usually done through the operating system
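
A minimal Python sketch of this distinction, assuming a throwaway temporary directory; the file name `grades.txt` and the function name are hypothetical. The path names the physical file, the object returned by `open()` plays the role of the logical file, and every read and write goes through the operating system.

```python
import os
import tempfile


def write_then_read(directory):
    # The path identifies the physical file on secondary storage.
    path = os.path.join(directory, "grades.txt")  # hypothetical file name

    # open() asks the operating system to associate a logical file
    # (the file object) with the physical file named by the path.
    logical_file = open(path, "w", encoding="ascii")
    logical_file.write("Alice 90\n")
    logical_file.close()  # buffered data is flushed and the association ends

    # A second logical file can later be attached to the same physical file.
    with open(path, "r", encoding="ascii") as another_logical_file:
        return another_logical_file.read()


with tempfile.TemporaryDirectory() as tmp:
    content = write_then_read(tmp)
    print(content, end="")
```

The data is still there after the first logical file is closed, because it lives in the physical file rather than in the program's variables.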

File structures vs. DBMS
• A Database Management System (DBMS) is software designed to make data maintenance easier, safer, and more reliable
• A DBMS manages data at a logical level, while file structures store data at a physical level
• File structures are, therefore, a prerequisite to DBMS
• Small applications often cannot justify the overhead incurred by a DBMS

Properties of files
• Persistence: Data written into a file persists after the program stops, so the data can be used any time later
• Shareability (the extent to which data is shareable): Data stored in files can be used by many programs (and people) simultaneously
• Size: Data files can be very large. In many cases, they cannot fit into RAM
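
One practical consequence of the size property is that a program often processes a file piece by piece instead of loading it whole. The sketch below is only an illustration: the function name, the file name `big.txt` and the chunk size are assumptions, and the sample file is small so the example runs quickly.

```python
import os
import tempfile


def count_lines_in_chunks(path, chunk_size=4096):
    """Count newline bytes while holding at most chunk_size bytes in RAM."""
    lines = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object signals the end of the file
                break
            lines += chunk.count(b"\n")
    return lines


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"record\n" * 1000)
    total = count_lines_in_chunks(path, chunk_size=64)
    print(total)
```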

File access methods
• When we design a file, the important issue is how we will retrieve data components (or specific records) from the file
• Sometimes we need to process records one after another, but sometimes we need to access a specific data component quickly without retrieving the preceding records
• The file access method determines how records can be retrieved: sequentially or randomly

Sequential access
• A sequential access data file:
  – works like an audio cassette tape: data in a sequential access data file is placed one item after another, like songs on a cassette tape
  – retrieves specific data by searching for the needed records while moving through the file, always starting at the file’s beginning
  – Typical applications that use sequential access: text processors, e-mail clients, streaming media, sorting algorithms for big data, etc.
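
The search described above can be sketched in Python as follows. The record format (one `key,value` pair per line), the file name `songs.csv` and the function name are assumptions made for the illustration.

```python
import os
import tempfile


def find_sequentially(path, wanted_key):
    """Scan the file from its beginning; return (value, records_read)."""
    records_read = 0
    with open(path, "r", encoding="ascii") as f:
        for line in f:  # records arrive strictly one after another
            records_read += 1
            key, value = line.rstrip("\n").split(",", 1)
            if key == wanted_key:
                return value, records_read
    return None, records_read


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "songs.csv")  # hypothetical file name
    with open(path, "w", encoding="ascii") as f:
        f.write("s1,Intro\ns2,Ballad\ns3,Finale\n")
    found = find_sequentially(path, "s3")
    missing = find_sequentially(path, "s9")
```

Reaching the third record requires reading the first two, and a key that is absent costs a pass over the whole file.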

Random access
• A random access data file:
  – works like an audio CD: with one touch of a button you can immediately access any song on the CD
  – allows one to move directly to the specific data (or record) in the file
  – This is achieved by allocating equal space for each record, so the operating system can predict exactly how many data blocks to skip to get to the desired data
  – Random access files thus typically require more disk space than sequential access files, because shorter records are padded to the fixed record size
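
A minimal sketch of equal-sized records in Python, using the standard `struct` module. The record layout (a 4-byte integer id followed by a 12-byte name), the file name `people.dat` and the function names are assumptions chosen for the illustration.

```python
import os
import struct
import tempfile

# Assumed layout: 4-byte little-endian integer id plus a 12-byte name field.
RECORD = struct.Struct("<i12s")


def write_records(path, rows):
    with open(path, "wb") as f:
        for record_id, name in rows:
            # struct pads names shorter than 12 bytes with zero bytes.
            f.write(RECORD.pack(record_id, name.encode("ascii")))


def read_record(path, n):
    """Jump straight to record number n (counting from 0)."""
    with open(path, "rb") as f:
        f.seek(n * RECORD.size)  # offset = record number * record size
        record_id, raw_name = RECORD.unpack(f.read(RECORD.size))
    return record_id, raw_name.rstrip(b"\x00").decode("ascii")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.dat")  # hypothetical file name
    write_records(path, [(1, "Ann"), (2, "Bob"), (3, "Carmen")])
    third = read_record(path, 2)
    file_size = os.path.getsize(path)
```

Every record occupies `RECORD.size` bytes regardless of how short the name is, which is the space cost that buys the direct jump.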

Sequential files
• A sequential file is one in which records can only be accessed one after another, from the beginning to the end
• There is an EOF (end-of-file) marker after the last record
• The operating system has no information about the record addresses; it only knows where the whole file is stored
• The only thing known to the operating system is that the records are sequential

Indexed files
• To access a record in a file randomly, we need to know the address of the record
• Indexed files provide random access by mapping “keys” (record/data “names”) to the physical addresses of the records
• The mapping is done using an index file
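
The idea can be sketched in Python with a data file plus a separate index file. Everything specific here is an assumption for the illustration: the file names, the `key,value` record format, the use of byte offsets as addresses, and JSON as the index format.

```python
import json
import os
import tempfile


def build_files(data_path, index_path, records):
    """Write variable-length records and an index of key -> byte offset."""
    index = {}
    with open(data_path, "wb") as data:
        for key, value in records:
            index[key] = data.tell()  # address of the record about to be written
            data.write(f"{key},{value}\n".encode("ascii"))
    with open(index_path, "w", encoding="ascii") as idx:
        json.dump(index, idx)


def lookup(data_path, index_path, key):
    with open(index_path, "r", encoding="ascii") as idx:
        index = json.load(idx)
    with open(data_path, "rb") as data:
        data.seek(index[key])  # jump directly to the record
        line = data.readline().decode("ascii").rstrip("\n")
    return line.split(",", 1)[1]


with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "students.dat")   # hypothetical names
    index_path = os.path.join(tmp, "students.idx")
    build_files(data_path, index_path,
                [("k1", "Ann"), ("k2", "Bartholomew"), ("k3", "Cy")])
    value = lookup(data_path, index_path, "k3")
```

Unlike equal-sized records, this approach lets records vary in length, at the price of maintaining the extra index file.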

Hashed files
• Instead of an index file, a hashed file uses a mathematical function (a hash function) to accomplish this mapping
• The user (application) gives the key, the function maps the key to the address, and then passes the calculated address to the operating system, which retrieves the corresponding record
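
A toy Python sketch of a hashed file follows. The hash function (sum of the key's byte values modulo the number of slots), the slot count, the record layout and the file name are all assumptions, and collision handling, which the slides do not cover, is left out: two keys that hash to the same slot would overwrite each other.

```python
import os
import struct
import tempfile

SLOTS = 8                        # assumed fixed number of record slots
RECORD = struct.Struct("<8s8s")  # assumed layout: 8-byte key, 8-byte value


def hash_function(key):
    """Toy hash: sum of the key's byte values modulo the slot count."""
    return sum(key.encode("ascii")) % SLOTS


def create(path):
    with open(path, "wb") as f:
        f.write(b"\x00" * (SLOTS * RECORD.size))  # all slots start empty


def store(path, key, value):
    with open(path, "r+b") as f:
        f.seek(hash_function(key) * RECORD.size)  # key -> address
        f.write(RECORD.pack(key.encode("ascii"), value.encode("ascii")))


def fetch(path, key):
    with open(path, "rb") as f:
        f.seek(hash_function(key) * RECORD.size)
        raw_key, raw_value = RECORD.unpack(f.read(RECORD.size))
    if raw_key.rstrip(b"\x00").decode("ascii") != key:
        return None  # empty slot, or the slot holds a different key
    return raw_value.rstrip(b"\x00").decode("ascii")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "grades.hash")  # hypothetical file name
    create(path)
    for k, v in [("ann", "A+"), ("bob", "B"), ("cy", "C-")]:
        store(path, k, v)
    bob_grade = fetch(path, "bob")
    absent = fetch(path, "eve")
```

No index is consulted: the address is computed from the key alone, so a lookup costs one calculation and one seek.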

Basic file processing operations
• Opening: Makes the file ready for use by the program
  – open an existing file or create a new file
• Closing: Makes the logical file name available for another program
  – ensures that everything has been written to the file
• Reading: Transfers data from the file to RAM
• Writing: Writes data from RAM to the file
• Seeking: Locates where in a file certain data is stored
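
The five operations map onto Python's built-in file objects roughly as sketched below. The function name and the file name `demo.bin` are assumptions; the mode `"w+b"` creates a new binary file that can be both written and read.

```python
import os
import tempfile


def demo_operations(path):
    f = open(path, "w+b")    # opening: create a new file for writing and reading
    f.write(b"HELLO WORLD")  # writing: data goes from RAM to the file
    f.seek(6)                # seeking: move to byte offset 6
    word = f.read(5)         # reading: data goes from the file to RAM
    position = f.tell()      # current offset after the read
    f.close()                # closing: flush buffers and release the file
    return word, position, f.closed


with tempfile.TemporaryDirectory() as tmp:
    result = demo_operations(os.path.join(tmp, "demo.bin"))  # hypothetical name
```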

Text files and Binary files
• Any file stored on a storage device is a sequence of bits that can be interpreted by an application program as either a text file or a binary file
  – the same file can thus be used as a text file and as a binary file, depending on the application

Text files
• A text file is a file of characters. It cannot contain integers, floating-point numbers, or any other data structures in their internal memory format (numerical data can, however, be stored in a text file when the numbers are converted to their character equivalents)
• Some files can only use character data types. Most notable are file streams (input/output objects in some object-oriented languages like Java and C++) for I/O devices. This is why we need special functions to format data that is input from or output to these devices

Binary files
• A binary file is a collection of data stored in the internal format of the computer
• In a binary file, data can be an integer (including other data types represented as unsigned integers, such as image or video data), a floating-point number (IEEE 754), or any other structured data
• Unlike text files, binary files contain data that is meaningful only if it is properly interpreted by a program
  – In a text file, one byte is used to represent one data element (a character in ASCII encoding). In a binary file, the size of a data element depends on the file format
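
The difference between the two representations can be seen by encoding the same number both ways in Python. The sample value and the choice of a 4-byte little-endian integer are assumptions for the illustration.

```python
import struct

number = 1234567  # sample value chosen for the illustration

# Text representation: seven ASCII characters, one byte each.
as_text = str(number).encode("ascii")

# Binary representation: a 4-byte little-endian signed integer.
as_binary = struct.pack("<i", number)

# The binary bytes are meaningful only under the right interpretation.
round_trip = struct.unpack("<i", as_binary)[0]  # read as an integer
as_float = struct.unpack("<f", as_binary)[0]    # same bytes read as IEEE 754

print(len(as_text), len(as_binary), round_trip)
```

Reading the four bytes as an integer recovers the original value, while reading the very same bytes as a floating-point number yields an unrelated tiny value.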

File directories
• To organize files, most operating systems provide directories
• A directory performs the same function as a folder in a filing cabinet. However, a directory in most operating systems is represented as a special type of file that holds information about other files
• A directory not only serves as a kind of index that tells the operating system where files are located on an auxiliary storage device, but can also contain other information about the files it contains, such as who has access to each file, or the date when each file was created, accessed or modified
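
Some of this per-file information can be inspected from Python with `os.scandir`. The sketch below is illustrative only: the function name, the entry names `notes.txt` and `staff`, and the choice of which fields to report are assumptions.

```python
import os
import tempfile


def list_directory(directory):
    """Return {name: (size_in_bytes, is_directory, modified_time)}."""
    listing = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            info = entry.stat()  # metadata kept by the file system
            listing[entry.name] = (info.st_size, entry.is_dir(), info.st_mtime)
    return listing


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "notes.txt"), "wb") as f:  # hypothetical file
        f.write(b"hello")
    os.mkdir(os.path.join(tmp, "staff"))                   # hypothetical subdirectory
    listing = list_directory(tmp)
```

The result also includes the subdirectory itself as an entry, reflecting that a directory is recorded much like any other file.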

Hierarchy of file directories


Paths and pathnames
• A file’s path is specified by its absolute pathname: a hierarchically ordered list of all the directories that contain the file, separated by slash characters (/)
• Most operating systems also provide a shorter pathname, known as a relative pathname, which is the path relative to the working directory, i.e. the directory where the active application works with the data
  – absolute pathname: /usr/staff/tran/file1
  – relative pathname: tran/file1
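
The relation between the two pathnames can be checked with Python's `posixpath` module, which handles slash-separated paths on any platform. The working directory `/usr/staff` is an assumption inferred from the two example pathnames.

```python
import posixpath

working_directory = "/usr/staff"  # assumed; inferred from the examples
relative = "tran/file1"

# Relative pathname + working directory = absolute pathname.
absolute = posixpath.join(working_directory, relative)

# Going the other way: express an absolute pathname relative to a directory.
back_to_relative = posixpath.relpath("/usr/staff/tran/file1",
                                     start=working_directory)

# The hierarchically ordered list of directories that contain the file.
directories = posixpath.dirname(absolute).strip("/").split("/")

print(absolute, back_to_relative, directories)
```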

Files: Concepts you need to know
• Physical and logical files
• File access (sequential and random)
• Types of files (ASCII and binary)
• File organization and storage

Homework
• Read these slides again
• Read through all the self-preparation assignments you previously got
• Learn the vocabulary
• Consult, whenever necessary, the textbook materials

Next class
• IS-1 overview
  – Important terms and concepts of the course. Preparation for the exam

Test 05
