RELATIONAL DATABASE SCHEMA DESIGN AND APPLICATION GENERATING USING IIS*CASE TOOL

RELATIONAL DATABASE SCHEMA DESIGN AND APPLICATION GENERATING USING IIS*CASE TOOL Pavle MOGIN†, Ivan LUKOVIC†, @arko KARADZIC† † Faculty of Technical ...
Author: Daniela Wade
2 downloads 0 Views 73KB Size
RELATIONAL DATABASE SCHEMA DESIGN AND APPLICATION GENERATING USING IIS*CASE TOOL Pavle MOGIN†, Ivan LUKOVIC†, @arko KARADZIC† †

Faculty of Technical Sciences, Institute of Industrial Systems, Trg D. Obradoviæa 7, 21000 Novi Sad, Yugoslavia

Abstract. IIS*CASE is a software product, aimed at relational database schema design and generation of application prototypes. Database schema design is based on the concept of form type and on automatic inference of database constraints from the set of form types. Form types, database schema and menus serve as the input specification to the application generator. A methodology of using IIS*CASE in the information system design has been described in the paper. Theoretical foundations of IIS*CASE, main concepts and the way of their implementation by means of RDBMS "Ingres" are described, as well.

1.

INTRODUCTION

Database (db) schema design is an important task of the information system (IS) development process. The quality of the db schema design has a considerable influence on the expenses of programming and on the exploitation performances of IS, as well. Probably, the main problems of a large db schema design are: • defining those constraints, which will faithfully represent business rules in the real world, and • applying the complex techniques of db schema structuring. At the other hand, development of many applications, each of them containing a certain number of programs, is tedious and time consuming task. The aim of IIS*CASE is to solve or at least to temper described problems. IIS*CASE is a software product, aimed at automated db schema design and at generation of application prototypes. It covers a part of the tasks of Upper and Lower CASE products. One of the important motifs of IIS*CASE development was to overcome the problem of db constraints defining. The basic idea was that designer has to specify only the rules of data usage in the real world, appropriately. Inference of constraints from that specification and db schema design should be the task of the CASE tool, with only really necessary interaction with the designer. In IIS*CASE, the db schema attributes and constraints are specified by means of a concept called form type. This concept is derived by generalization and introducing certain

structuring rules into screen and print forms. By means of these forms, the users communicate with the IS. The idea of form usage for db schema design in a different context and meaning, appears in the work of [1] and in some others, as well. Beside the Introduction, the paper contains six sections and the Conclusion. In the section number two, the role of IIS*CASE in the IS development is described. Section three is devoted to the concept of form type. Mechanisms for inference of data constraints and constraint collision detection principles are described in section four. The specifics of IIS*CASE synthesis algorithm are described in section five. Subschema generation procedure is given in section six. Finally, the principles of IIS*CASE implementation by means of RDBMS Ingres are given in section seven. It is assumed that the reader is familiar with the concepts of relational data model at the level of [2] and [3], and that the section containing these concepts can be omitted.

2.

THE ROLE OF IIS*CASE IN INFORMATION SYSTEM DESIGN

The application of IIS*CASE in an information system design is fully justified if IS should encompass a larger number of business functions (e.g. store control, sales, purchasing, bookkeeping, salaries, production planing, production management, etc.). The information system design methodology is influenced by IIS*CASE, and has to be appropriately adopted. One of the motives for this adoption is to allow parallel and relatively independent work of a number of designers. Preceding a short description of the methodology, there are characteristic concepts of IIS*CASE informally introduced in the next paragraph.

2.1. Characteristic concepts Characteristic concepts of IIS*CASE are: form type, db module schema, db schema, form type subschema and application prototype. The cited sequence of concepts corresponds to the sequence of appropriate design activities of IIS*CASE application. The form type is the starting concept of IIS*CASE application in db schema design process. It encompasses information concerning a subset of db schema attributes and a subset of schema constraints. The db module schema is a named pair Nm(Sm, Im), where Nm is the name of db module schema, Sm a set of relation schemes, and Im a set of constraints. The set of constraints Im contains inclusion dependencies, embedded join dependencies and domain constraints. Often, the name of db module schema corresponds to the name of a business function. Database schema is a named pair N(S, I ), where N is a schema name, S set of third normal form relation schemes, and I is a set of constraints. The set S is produced by merging relation schemes from various db module schemas. To every form type corresponds one subschema. The subschema is a part of db schema necessary to implement any program over the given form type.

The application prototype is generated on the basis of an application menu. The application menu is a special kind of a form type. Each item of that menu is a program. A form type with a corresponding subschema is associated to each program.

2.2.

Methodological aspects of IIS*CASE application

The concepts of: common business entity, common document, common document/business function matrix and software subsystem are crucial for perceiving the role of IIS∗ CASE in the information system development. Common business entity is an business object, whose data are used in more than one business function, and whose data lifetime is relatively long (over one year). Examples of common business entities are: materials, spare parts, machines, employees, organizational units, customers etc. Common document is an information carrier, whose contents is produced in one (source) business function and used in some others. Possible examples of common documents are: purchase requisition, receipt of materials, purchase order, shop order. Common document/business function matrix simply bears information concerning the source business function and business functions, that use or possible, update a certain common document. Final results of IIS∗CASE application are: a relational db schema and a set of software subsystems. Every software subsystem is aimed at the execution of tasks belonging to one business function. Software subsystem contains a number of applications each of them containing programs. All programs of all software subsystems are using the same db, or the replications of the same db. One software subsystem, named the core of IS, contains append, update and report programs for common business entities. The application of IIS∗CASE starts when the following activities of system analysis are finished: • identification of output information, the users expect to get from the new IS, • identification of documents, which are used to keep track of business system sates changes, and which cover, by their contents, the expected outputs of IS, • common document/business function matrix and • identification of common business entities together with the sets of their basic attributes. The first step of IIS∗CASE application is design of the form types aimed at appending and updating data concerning common business entities. The second is design of common document form types. Both kinds of form types are used by designers of software subsystems according to the common document/business function matrix. The first two steps of IIS∗ CASE application have only one goal. It is to allow relatively independent design of db module schemas and software subsystems. Parallel and independent work of designers, each developing its own subsystem, can lead to incompatible results. This incompatibility is mirrored, above all, in homonym and synonym attributes. According to our approach, every designer is obligated to use already defined attributes of common entities and documents, and he or she is allowed to introduce only the attributes with a role local to one function, as the new ones. In this way the problem of homonyms and synonyms disappears or, at least, tempers, allowing an easier db module schemas into db schema integration.

By using the results of the first two steps, designers interactively build updating and reporting form types of every business function software subsystem. After that, constraint collision analysis is performed. Providing that collisions, caused by different form types are resolved, the db module schema Nm(Sm, Im) is generated. The set Sm of relation schemes is generated by applying a modified synthesis algorithm on the union of attribute sets and the union of constraints sets of a software subsystem update form types. Inclusion dependencies of Im are produced by automatic analysis of foreign keys, whereas existential constraints are interactively defined. There are two goals of db module schema design. One is that the designer, by means of produced relation schemes analysis, proves whether he has defined the form types correctly. The analysis may require some improvements of form types. The other goal is to provide conditions for db schema design. Design of db schema N(S, I ) is done by progressive integration of db module schemas. For all that, core db module schema is initially included, either partly, or as a whole, in db module schemas of every software subsystem. The order of integration is defined by priorities of software subsystem implementation. In principle, business functions, generating data used in the other functions, have higher priority. In the process of integration IIS*CASE detects: (i) collision of constraints defined in various module schemas, (ii) transitive dependencies, (iii) primary key and naming collision of relation schemes having equivalent keys, and stemming form various db module schemas. Collisions are remedied in the interaction with designers. Thereafter, normalization is applied again. For every form type of a software subsystem, IIS*CASE generates one subschema. Subschema is generated according to current db schema version. Providing that application menus are designed, after db module schema integration and generation of subschemas, IIS*CASE produces application prototypes of software subsystem.

3.

THE FORM TYPE

A form type is a tree. The nodes of the tree are object types. Let W(F) be the attribute set of a form type named F. Every object type of the form type F is a named pair N(Q, K), where N is the name, Q ={A1,...,Ak} ⊆ W(F) set of attributes, and K ={X |X ⊆ Q} set of keys. Example 3.2. Figure 3.1. contains a geometric representation of a form type, named INVOICE. It has two object types, named INV_HEADER and INV_ITEM, represented as rectangles. The object type attributes are included in the rectangles. Keys are underlined. Further generalization of the form type from Figure 3.1. is given on Figure 3.2. INVOICE

INV_HEADER Invid,

Indat, Cksid, Clisnm, Cusadr, Subto, Tax, Total

INVOICE INVOICE HEADER

INV_ITEM Iteno, Artid, Artnm, Unpri, Qtity, Netam Figure 3.1.

INVOICE ITEM Figure 3.2.

An occurrence of a form type is a tree, whose root contains one occurrence of the root object type, and whose nodes at every lower level of hierarchy are occurrences of the corresponding nonroot object types. Let O = {Ni (Qi , Ki ) | i = 1,...,m} be the set of a form type F object types, and let N1(Q1, K1 ) be the root object type. An occurrence p(N1) of the object type N1 is a function p(N1): Q1 →dom(A11) × ... × dom(Ak1), where (∀Ai ∈ Q1)(p(N1)(Ai ) ∈ dom(Ai )) holds. Let Nj be the object type, directly subordinated to the object type Ni and let P(Ni ) = {pl(Ni ) | l = 1,...,n} denote the set of occurrences of the object type Ni, within one occurrence of the form type F. An occurrence of the object type Nj corresponding to pl(Ni) ∈ P(Ni ), is a function p(Nj , pl(Ni )) : Qj→dom(A1j )×...×dom(Akj ), where (∀Ar ∈ Qj )(p(Nj , pl(Ni ))(Ar) ∈ dom(Ar)) holds. Formally, a form type is a fourtuple F(O, ϕ, C, AF(x)) over the set W(F), where: • O = {Ni (Qi , Ki ) | i = 1,...,m} is the set of object types. • ϕ ⊆ O × O is the relation over the set O, defining the tree structure. (Ni , Nj ) ∈ ϕ, if and only if there exists at least one occurrence of the form type F with more than one occurrence of the object type Nj subordinated to one occurrence of object type Ni . • C is a set of constraints, defined as follows: m

UQi = W ( F ),

i =1

( ∀Ni , N j ∈ O)(i ≠ j ⇔ Qi ∩ Q j = ∅),

(∀Ni , Nj ∈ O)(∀A, B ∈ W(F))((A, B ∈ Ni ) ⇔ Ibp(F, A, B)), (∀Ni , Nj ∈ O)(((Ni , Nj) ∈ ϕ ∧ X ∈ Kj) ⇒ Joo(Ni , Nj , X )) and (∀X ∈ K1)(Jof(N1, X ))), where the predicate "Ibp" denotes that the number of A and B values on every occurrence of the form type F is the same. Thus, "Ibp" suggests the way a designer distributes the form type attributes to object types. The predicate "Joo" denotes that the key X uniquely determines p(Nj , pl(Ni)), and there is no proper subset of X with the same property, whereas "Jof" denotes that X uniquely determines p(N1) (i.e. an occurrence of the form type F) and there is no proper subset of X with the same property. • AF(x) is a predicate with dom(x) = {u, r}. By means of predicate AF(x) the form type usage is defined. The interpretation AF(u) means that the form type F can be used for db updates and for reporting. The form type F belongs to the class AF(u), if it is aimed at changing the values of at least one db schema attribute. The interpretation AF(r) means that the form type F can be used for reporting only. Data entry is done by means of update forms. Therefore, update form types should contain all necessary information concerning db schema attribute and constraint sets. Only these form types ought to be used in db schema design. The set of update form types belonging to a software subsystem will be denoted as SFu, and set of report form types as SFr. The union of all form type attribute sets represent the universal set of attributes U. The db schema attribute set is UBP ⊆ U. The attributes, whose values are inferred from the values of other attributes can belong to U \ UBP .

4.

IDENTIFICATION OF DB SCHEMA CONSTRAINTS

Db schema is designed by using attribute and constraint sets, implicitly contained in form types. Some constraints of future db schema are implied by domain and attribute specifications, which are primitive concepts of IIS*CASE.

4.1. Domain and attribute specification Every attribute from db schema attribute set is based on exactly one domain. By domain specification, the designer specifies type, length and allowed or not allowed values (or ranges of values). By attribute specification the designer defines: (i) whether it is elementary or derived from other attributes, (ii) whether it belongs to UBP and (iii) a recursive function, that is used to infer values of an derived attribute.

4.2. Constraint types inferred from form types and constraint collisions On the basis of designed form types, functional dependencies (fds), embedded join dependencies (jds) and nonfunctional dependencies (nfds) can be inferred. Object type keys, and form type tree structure bear information of a set of fds F(F), defined by form type F. All attributes of any object type Ni , functionally depend on so called one-key-union. One-key-union is the union of keys, each of them was selected from exactly one object type on the path form the root object type to Ni . Let FD(W) be the set of all fds over W. Let X(Ni ), where Ni ∈ O(F), be the set of all one-key-unions for object type Ni . Set of fds, defined by form type F, is the set: F(F) = { X→ A ∈ FD(W) | (∃Nj ∈ O(F))(A ∈ Qj ∧ X ∈ X(Nj ))}. Example 4.1. Let a form type F as in Figure 4.1 be given. Set of nontrivial fds, defined by F is F(F) = {A→B, AC→D, AC→E, AD→C, AD→E }.

F N ( AB,{A }) 1

N 2 ( CDE, {C, D }) N 3 ( FG, {FG }) Appropriate form type tree structure carries the information of an embedded jd, that Figure 4.1. holds in the real world. The term "embedded" is strictly used, because a form type generally formalizes only the relationships between data belonging to a part of the real world. Let the set of leaves of a form type F be given by LO(F) = {N1,...Nk} ⊆ O(F). Then, form type F expresses an embedded jd J(F) = > 1}, where θi ∈ Θ = {θ1,...,θn} represents a noninterpreted, marked attribute. If two nfds are defined by different form types, their right hand sides will get differently marked noninterpreted attributes. Nfd X→θi denotes that there exists a nonfunctional relationship between at least two nonempty proper subsets Y and Z of X, where Y ⊄ Z, Z ⊄ Y holds. Y and Z are the keys of two different object types. Nfd X→θi points out that the information concerning relationship between y ∈ dom(Y) and z ∈ dom(Z) is important for the business managing. Example 4.3. The set of nfds expressed by form type F, shown on Figure 4.1, is: NF(F) = {AC→θ1 , AD→θ2 , AFG→θ3}.

Collision of constraints can arise during the process of real world modeling by form types. Appearance of a simultaneous existence of a: fd and nfd, or nfd and embedded jd, or fd and embedded jd, over the same attribute set causes collision. At least one of the form types, generating colliding constraints, has to be redesigned. In IIS*CASE only form types F ∈ SFu are subjected to constraint collision analysis. Let a fd and a nfd over the same attribute set be defined as follows: ( ∃F ∈ SFu)( ∃f ∈ NF ( F ))( ∃X → Y ∈ ( U F ( Fi )) + )( XY ⊆ lhs( f ) ∧ Y ⊆/ X ). Fi ∈SFu

Form types defining f and X→Y, can not be used simultaneously in the db schema design process. In other words, some of them does not express the real world semantic. Simultaneous existence of an embedded jd J(F1) = > 1, then can arise one of the following two situations: • every θi ∈ Θ stems from the same form type as an attribute A ∈ Q (θi is covered by A), and • there exist a θi ∈ Θ, which is not covered. Namely, let X = YZ, where Y ≠ ∅, Z ≠ ∅, and Y ≠ Z. Every θi ∈ Θ' can bear information concerning another role of Y or Z in their nonfunctional relationship. These roles can, but have not, be covered by some attributes in Q. IIS*CASE keeps track of the origin of θi , and checks whether every θi ∈ Θ' is covered by any A ∈ Q. If the answer is yes, then •

⎛ ⎞ R = Q U⎜ U X ⎟. ⎝ X ∈K ⎠ Otherwise, IIS*CASE asks designer to resolve the ambiguity. Often, the designer enters new attributes and changes the involved form types. Domains of the new attributes should bear information concerning the roles of Y and Z in X.

Example 5.1. Let M be machine, W worker, and MW→θ1 a nfd, stemming from a form type bearing information which workers produce parts on a machine. Let MW→θ2 be another nfd, whose semantics is "worker w maintains machine m". The later nfd stems from another form type. Suppose that the dependency set at the input of the synthesis algorithm step 6o contains MW→{θ1, θ2}. Designer will, probably, define WORKER'S_ROLE with dom(WORKER'S_ROLE) = {works, maintains, works_and_maintains}, as a new attribute.

Widening the synthesis algorithm by two new steps is motivated by the fact that it puts one of a relation scheme equivalent keys, as foreign key in the others in an unpredictable way. Such propagation of keys can cause: unnecessary long SQL query statements, problems in defining inclusion dependencies, and violation of db schema independence (as defined by [8]). To remedy the problem, it is necessary to transform the set of relation schemes S = {(Ri , Ki ) | i = 1, 2,...,n}, produced in the step 6o, into S' = {(Ri', Ki')| i = 1, 2,..., n}, such that only a proper or unproper subset of a relation scheme primary key has been propagated to the other relation schemes. Details of the mentioned transform can be found in [9] and [10].

6.

SUBSCHEMA DESIGN

Subschema design process has the following two main steps: (i) form type applicability analysis and (ii) subschema design. A form type F is applicable, if there exists a set of relation schemes {(Ri , Ki ) | i = 1,...,f } ⊆ S , such that the following: f

W ( F ) ∩ U BP ⊆ U Ri i =1

and Γ ∪ NF ∪ JF = >< ( R1 ,..., R f )

hold. Only the form types belonging to SFr are subjected to applicability analysis in IIS*CASE. It is proved in [5], that every F ∈ SFu is applicable. IIS*CASE generates a subschema F(Sp , Ip), for every applicable form type F. Sets Sp and Ip are inferred from corresponding sets of db schema (S, I ). F(Sp , Ip) has the following characteristics: • Sp is a necessary and sufficient relation scheme set, that provides lossless join and possibility to generate responses on queries, issued by form type F and • Each relation scheme from Sp is a projection of a db relation scheme having the same primary key, onto the set of necessary attributes, according to form type F. If the condition W ( F ) \ U Ri ≠ ∅ holds, IIS*CASE reports, which form type attributes are Ri ∈S p

not covered by designed subschema. If F ∈ SFu, IIS*CASE adds to Sp relation schemes, necessary for inclusion dependencies and business rules checking. Also, it specifies (object type, relation scheme) pairs that can be used to perform tuple deletion from db, (object type, relation scheme) pairs that can be used to perform tuple insertion into db and attributes from

W(F) ∩UBP , whose values can be modified by F. Details of automated subschema design are fully described in [5].

7.

IIS*CASE COUPLING TO RDBMS "INGRES"

IIS*CASE dictionary contains all necessary data, providing automated generation of Ingres/SQL specifications for: db schema tables and local constraints, rules and procedures, triggered by that rules, Ingres/VIFRED-forms and Ingres/VISION-applications. For each relation scheme, the first specification contains appropriate Ingres/SQL CREATE TABLE statement. Information whether a foreign key can obtain null values is derived from existential constraint specification, produced during the db module and db schema design. For each relation scheme key, a CREATE UNIQUE INDEX statement is generated. Domain constraints are implemented by CREATE INTEGRITY statement. Inclusion dependencies and business rules are implemented by Ingres/Knowledge Manager, i.e. by CREATE RULE and CREATE PROCEDURE statements, according to corresponding contents of IIS*CASE data dictionary. IIS*CASE form type design is closely connected to Ingres/VIFRED. Default Ingres/VIFRED form is automatically generated on the basis of form type tree structure and object type attribute sets. It is performed by insertion of appropriate data into Ingres/Forms System Catalog tables. The designer can edit the default VIFRED form, latter. After VIFRED form has been edited, IIS*CASE analyzes it, proves whether it is consistent regarding the form type, and reports eventual inconsistencies. Specification of Ingres/VISION application contains data for Ingres Vision System Catalog. Each Ingres/VISION frame corresponds to an IIS*CASE form type. The designer defines application structure, as a tree structure over the set of form types. Then, it is automatically converted to the tree structure of Ingres/VISION application frames.

8.

CONCLUSION

The first version of IIS*CASE was developed in 1988. on a VAX/VMS platform. It was programmed in Cobol and Assembler. Second version was written in Pascal and implemented on a PC MS_DOS platform. Many improvements written in Prolog ware added to the second version during next three years. During last year, experiments devoted to coupling of MS_DOS version to DBMS Ingres are successfully finished. The implementation of a version, fully integrated into Ingres, is in progress. The first version of IIS*CASE has been already applied to a few projects, whereas the MS_DOS version is in an intensive use. Through the application of this version, the specific methodological approach to information system design, together with an extensive set of standards, has been developed, as well.

REFERENCES

Diet J, Lochovsky F, Interactive Specification and Integration of User Views Using Forms, in: Proceedings of the Eighth International Conference on Entity-Relationship Approach, Toronto, Canada, 18-20. October, 1989, pp.171-185. [2] Maier D, The Theory of Relational Databases (Computer Science Press, Inc. rockville, Maryland, 1983.) [3] Paredaens J, Bra D. P, Gyssens M, Gucht V. D, The Structure of the Relational Database Model (Springer-Verlag, Berlin Heidelberg, 1989.) [4] Mogin P, Lukovi} I, Karad`i} @, Towards the Automatic Inference of Database Constraints, in: Proceedings of the XXXVI Yugoslav Conference ETAN, Kopaonik, 08-11. 06. 1992, Vol. IX, pp.19-26. [5] Lukovi} I, Automated Relational Database Subschema Design by Form Types (MSc thesis, Faculty of Electrical Sciences, Belgrade, 18. 06. 1993.) [6] Beeri C, Bernstein P.A, Computational Problems Related the Design of Normal Form Relational Schemas, in: ACM TODS, Vol.4, No.1, March, 1979, pp 30-59. [7] Diedrich I, Milton J, New Methods and Fast Algorithms for Database Normalization, in: ACM TODS, Vol 13, No 3, Sept. 1988, pp. 339-365. [8] Sagiv Y, Characterization of Globally Consistent Databases and their Correct Access Path, in: TR University of Illinois, July, 1981. [9] Risti} S, Investigation of the Primary Key Propagation Problem in the Relational Database Schema (MSc thesis, Faculty of Economic Sciences, Subotica, April 1994.) [10] Mogin P, Risti} S, Candidates for Primary Key of the Relational Scheme, in: Proceedings of the XXXVIII Yugoslav Conference ETRAN, Ni{, 07-09. 06. 1994. [1]

Suggest Documents