Integrating Object Persistence to Relational Databases

HELSINKI UNIVERSITY OF TECHNOLOGY Department of Computer Science and Engineering Laboratory of Information Processing Science

Sampo Nurmentaus

Integrating Object Persistence to Relational Databases

Master’s Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Technology.

Espoo, May 01, 2004

Supervisor: Professor Eljas Soisalon-Soininen

Instructor: Professor Eljas Soisalon-Soininen

HELSINKI UNIVERSITY OF TECHNOLOGY

ABSTRACT OF THE MASTER’S THESIS

Author: Sampo Nurmentaus

Name of the thesis: Integrating Object Persistence to Relational Databases

Date: May 01, 2004

Number of pages: 60

Department: Department of Computer Science and Engineering

Professorship: T-79

Supervisor: Prof. Eljas Soisalon-Soininen

Instructor: Prof. Eljas Soisalon-Soininen

Both object-oriented development and relational databases are here to stay. They are both mature technologies that are used in a wide range of software projects. We have used both successfully in an embedded environment. Each technology is good in its own field: object-oriented methods are good at modelling real-world problems, and relational databases are a practical way to store and retrieve data efficiently. Integrating these technologies is, however, not trivial. There are many mismatches, ranging from the process level to the implementation level. Still, relational databases are often available for persistent object storage, and they may contain data that the application under development must be able to access. There are many cases where a combination of these technologies is required despite the mismatches between them.

In this thesis we have examined the possibility of integrating these technologies using a persistence layer, a class library that provides reusable tools for storing and restoring C++ objects to and from different relational databases. The goal of our solution was to provide extra flexibility to application development by reducing the coupling between the application and the database solution used, and to provide a reusable solution for persistence questions. We developed a design for a persistence layer together with some implementation tests to answer questions such as whether it is worth the effort to develop one and what kind of process changes are required.

It was discovered that the development of a persistence layer is a large project that requires a lot of effort. It would have to be reused in several projects to be worth implementing, but in the long run it might be worth the investment in terms of reduced application development and maintenance effort.

Keywords: object-orientation, persistence, relational databases, object-relational mapping, agile data, embedded systems, C++, serialization

TEKNILLINEN KORKEAKOULU

ABSTRACT OF THE MASTER’S THESIS (DIPLOMITYÖN TIIVISTELMÄ)

Author: Sampo Nurmentaus

Name of the thesis: Olioiden tallennus relaatiotietokantaan (Storing Objects in a Relational Database)

Date: 01.05.2004

Number of pages: 60

Department: Department of Computer Science and Engineering (Tietotekniikan osasto)

Professorship: T-79

Supervisor: Prof. Eljas Soisalon-Soininen

Instructor: Prof. Eljas Soisalon-Soininen

Relational databases and object-oriented programming are widely used, established technologies. They are used in almost all kinds of software systems. We use both technologies in an embedded environment. Both technologies are good for their intended purpose: relational databases are used to manage large amounts of data, and object-oriented programming is used to model complex real-world problems. Combining these technologies is nevertheless not straightforward; problems arise at both the implementation level and the process level. Still, there is often a need to use object-oriented programming and relational databases together. Relational databases are widely used in organisations, and they contain data that an object-based application must be able to access. These technologies are often used together even though their combined use has clear problems.

This thesis studies the combined use of object and relational technologies by means of a persistence layer. This means a class library or software framework whose purpose is to offer an object storage service to the application. The thesis concentrates specifically on implementing a persistence layer between a C++ object application and relational databases. One of the main goals was to add flexibility to application development by reducing the coupling between the application and the database, and to develop a reusable solution for storing objects. Within the thesis a design for implementing a persistence layer was developed, together with some tests of the implementation. The aim was to estimate the amount of work required and whether the benefits gained are worth the investment. It was also assessed what kind of process changes the adoption of a persistence layer would cause.

The main finding was that implementing a persistence layer is a laborious project with many details to take into account. The persistence layer would have to be reusable in several projects for its implementation to pay off, but in the longer run it would be a worthwhile investment thanks to easier application development and maintenance.

Keywords: object-oriented programming, persistence, relational database, object-relational mapping, agile data model

1 Acknowledgements

Hard to believe, but it is done. It has required some hard work, but finally I am here, typing the last part of my thesis, and I do feel great relief. But there is also some melancholy in the air. Typing this makes me look over my shoulder at my past studies at Helsinki University of Technology. This has surely been the most interesting era in my life so far. I have studied several interesting subjects and got to know many new people. Spending the next 40 years in the nine-to-five scene sounds a bit frightening to me, so it might very well be that I will continue my studies some day.

I wish to thank my supervisor and director, Professor Eljas Soisalon-Soininen, for all the help and advice during the writing process. I would also like to thank Kaj Björklund, Baris Boyvat, Ilkka Pelkonen and Markku Rontu for comments on my thesis and interesting discussions about persistence questions, and Cristoffer Von Bundstorf for the help with the language. My gratitude also goes to my parents for the financial support that made it possible for me to fully concentrate on my work. Finally, I would like to thank my lovely fiancée for tolerating this stress-suffering geek on her sofa.

On a sunny spring day
Nurmijärvi, May 19, 2004

Sampo Nurmentaus

Contents

1 Acknowledgements
2 Introduction
    2.1 The Structure of This Document
3 Relational Databases
    3.1 Operations on Relational Databases
    3.2 Views
    3.3 Referential Integrity
    3.4 Transactions
    3.5 Locking
    3.6 Cursors
4 Object Oriented Development
    4.1 Relationships Between Classes
    4.2 Full Encapsulation of Persistence Mechanisms
    4.3 Extensibility
    4.4 Object Oriented Frameworks
5 The Object-Relational Impedance Mismatch
    5.1 Design Time of Relations
    5.2 Representing Objects as Tables
    5.3 Object Identifier
    5.4 Representing Collections in a Relational Database
    5.5 Representing Object Relationships
    5.6 Representing Inheritance Hierarchies
        5.6.1 Whole hierarchy in one table
        5.6.2 Each concrete class to a table of its own
        5.6.3 Each class to its own table
        5.6.4 Map inheritance hierarchies to a generic structure
    5.7 Comparison of Different Mapping Strategies
    5.8 Abstraction of Queries
    5.9 Mapping Query Results to Objects
6 Our Embedded System
    6.1 Proxy Objects
    6.2 Cache
    6.3 Abstraction of Cursors
    6.4 Multi-object Actions
7 Implementation Language
    7.1 Exceptions for Error Handling
    7.2 Lack of Reflectivity
8 Issues with Legacy Data and Applications
    8.1 Several Persistence Mechanisms
    8.2 Multi-object Actions
    8.3 Multiple Connections
9 Requirements for a Persistence Layer
    9.1 Questions we are looking answers for
    9.2 Goals for Our Implementation
10 Our Solution
11 Logical View
    11.1 Representing Queries as Objects
12 Development View
13 Process View
14 Data View
15 Scenarios
16 Using Persistence Layer from an Application
17 Analyzing Results
18 Alternative Solutions
    18.1 Alternatives To A Persistence Layer
    18.2 Object Oriented Data Bases
    18.3 Alternatives to Persistence Interface to The Application
19 Future Developments
    19.1 Management Utility
    19.2 Security
    19.3 Database Schema Versioning
    19.4 Fine grained Versioning of Data
    19.5 Storing Temporary State Between Sessions
20 Summary
References
A Glossary

2 Introduction

Object oriented development is one of the most popular programming paradigms today, influencing programming tools and methods as well as development processes. In a modern object oriented development process, the data structures in the program tend to reflect the problem domain in order to fulfill user requirements as well as possible. Relational databases, on the other hand, are designed to store data that has a relatively static structure and to provide fast operations on this data. Traditionally, the structure of a relational database used in an application is designed very early in the development process, and an effort is made to keep it fixed during development. Relational databases and object oriented development are both used in many different applications, ranging from embedded systems to large scale business solutions.

Both object oriented development and relational databases are good in their own fields, but they have completely different design goals and principles. Relational databases have their roots in relational algebra, whereas object oriented development has grown out of years of experience in software engineering. Relational databases emphasize a good design decided beforehand, whereas object oriented development promotes flexibility. This makes combining these technologies a non-trivial problem.

One key advantage of object oriented development is flexibility: it is relatively easy to adapt an object oriented design¹ to the changing requirements of the problem domain. In practice any application must change to remain useful [4]. Often this flexibility is limited by existing relational database structures that contain huge amounts of data [6, 2].

Combining these techniques is still an interesting question. Relational databases often already exist in organisations, and they are very good at handling huge amounts of data. Object oriented software development, on the other hand, is one of the most promising programming paradigms today. An effective way to combine these two techniques would therefore give real added value.

Many systems that use both technologies do this manually: queries against relational databases are stored in the classes describing the problem domain. This both requires extra development effort and is hard to maintain [1, 6, 2]. A better solution would be to use a persistence layer that reduces the coupling between the database and the class structures of the application [1, 6]. This layer should provide services to retrieve objects from, and store them to, the persistence mechanism in question. The actual techniques used to store the data of an object should be hidden from the application, so that developers can concentrate on fulfilling user requirements. In addition, this layer can provide tools for exporting and importing data to and from both XML and other databases, as well as data versioning to support existing databases when the program is updated. Undo functionality and object access control can also be integrated into the persistence framework. The possibilities are almost unlimited.

In the field of embedded systems, application development is faced with many problems unfamiliar in normal desktop and server environments. Memory, storage space and computing power are limited, so extra care must be taken in programming. Embedded systems are often expected to be more reliable than desktop computers: people are used to rebooting their PCs, but not their dishwashers. Still, the software components of an embedded system are expected to provide flexibility. It is thought to be easier to adapt a software system to changing requirements than a completely hardware based solution, and this is often the motivation for implementing application logic in software. All this makes an embedded system a very interesting and challenging platform for an application developer. Integrating databases into embedded systems is a challenge, so reuse of existing solutions is more than welcome: a great opportunity for our persistence layer.

¹ More specifically: a good object oriented design.

2.1 The Structure of This Document

First, in Sections 3 and 4, we describe both relational databases and object oriented development in further detail, concentrating on the features related to this project. Next, in Section 5, some fundamental differences between object orientation and relational databases are examined, and possible solutions to these problems are discussed. Embedded systems are discussed in Section 6, both in general and in our particular case. The limitations that the embedded environment sets on the persistence layer are also described there, and we present a few solutions to them found in the literature. After this, in Section 7, we briefly discuss the implementation language and why we have chosen C++. Section 8 gives a short discussion of pre-existing (legacy) data, together with ideas about how a persistence layer can help. The requirements raised in the earlier Sections are then summarised, together with a few new ones, in Section 9. In Section 10 we introduce the solution we have designed, and in Section 17 we describe how our solution tries to solve the different problems encountered. Some alternatives to the persistence layer are then described in Section 18. In Section 19 we make a few remarks about how the development of our persistence layer will continue. Finally, the summary in Section 20 gives a brief description of what we have achieved and lists the things that still remain unsolved.

3 Relational Databases

Relational databases are a 30-year-old concept for abstracting the actual physical layout of a database into relations that can then be managed with a query language. For a more exact definition of a relation, see for example [7] and [16]. Assume that the sets $S_1, S_2, \ldots, S_n$ are given. An n-tuple with one element from each set forms a record in the database, and a set of these n-tuples is called a relation. More formally, $R$ is a relation on the sets $S_1, S_2, \ldots, S_n$ if and only if $R \subseteq S_1 \times S_2 \times \cdots \times S_n$. The sets $S_1, S_2, \ldots, S_n$ are called the domains of the relation. For example, if $S_1$ were a set of person names and $S_2$ a set of phone numbers, the relation $R(S_1, S_2)$ would associate a phone number with a person. Each 2-tuple in the relation $R$ states that the person with this name has this phone number.

The primary key of a relation is a subset of its domains whose values uniquely identify a tuple in the relation. In our phone number example a person name would work as a primary key². If the primary key of a relation is present in another relation, it is called a foreign key there. In this way relations can refer to each other to form more complicated data structures.

Relations thus form a data model that can be accessed and modified with a query language. The operations are defined by relational algebra, which gives a mathematical foundation to the database languages used in the real world [7]. The most often used language for data operations on relational databases is called Structured Query Language, or SQL for short [8].

In practical database terminology relations are often referred to as tables and domains as attributes or columns. Tuples are called rows, and the set of elements from a single domain in a relation forms a column. This convention is not a part of the relational model, but it provides a more practical viewpoint on the mathematical world of relations. In the following Sections I describe some of the common features of relational databases that should be taken into account in our project.

² As can clearly be seen, this would never work in the real world, but it serves well as an example here.

3.1 Operations on Relational Databases

Different operations can be performed against a relational database to retrieve and manipulate the data stored in relations. Some of the operations are based on relational algebra; some are included for purely practical reasons.

Projection of a relation means that we pick some of its columns into a result set [33]. Assume a relation $R$ with the columns name and phonenumber. The operation $\pi_{name}(R)$ then produces a relation that has only the one column, name. Whereas projection is used to limit the number of columns in the result set, selection is used to limit the number of rows [33]. Selection returns only the rows that fulfill given conditions. For example, $\sigma_{name = \text{'Ringo Star'}}(R)$ produces only the rows with the name 'Ringo Star'.

The cartesian product of two relations means that the relations are combined in all possible ways [33]. If we have the relations $A$ and $B$, each with one attribute, $a$ and $b$ respectively, their cartesian product is a relation with both attributes $a$ and $b$, in which every value of $A$ is combined with every value of $B$. So if the number of rows in the relation $A$ is $|A|$, and $|B|$ respectively for $B$, then $|A \times B| = |A||B|$, where $A \times B$ denotes the cartesian product of $A$ and $B$.

By itself the cartesian product of relations is rarely useful. Often a more limited operation, a join of two relations, is used instead [33]. In a join, rows of the relations are combined if they fulfill given conditions. For example, if we have the phone number relation above and another one, say $S$, that contains names and street addresses, a natural join of these would combine the records with matching names. In addition to the natural join, a theta join can be used to join two relations on an arbitrary condition. Query languages also have the notion of an outer join, which includes rows from one relation in the result even if there is no matching row in the other relation. All these operations can be performed with the SQL commands select and join [8].

Other commands, not based on formal relational algebra, include commands to insert, delete and update rows in a relation [33]. These are represented by the SQL commands logically named insert, delete and update [8]. In addition there are many other commands in SQL related to views, indexes, defining relations and so on. The ones with particular influence on our system are discussed in the following Sections.
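The operations described above can be illustrated with a minimal in-memory sketch. It is not part of the thesis design: the type names Row and Relation and the function names projectColumns, selectRows, product and naturalJoin are hypothetical, every value is assumed to be a string, and the cartesian product assumes that the two relations share no column names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// A row maps column names to values. A relation is a set of rows,
// so duplicate rows collapse automatically, as in relational algebra.
typedef std::map<std::string, std::string> Row;
typedef std::set<Row> Relation;

// Projection: keep only the listed columns of every row.
Relation projectColumns(const Relation& r, const std::vector<std::string>& columns) {
    Relation result;
    for (const Row& row : r) {
        Row projected;
        for (const std::string& c : columns) {
            Row::const_iterator it = row.find(c);
            if (it != row.end()) projected.insert(*it);
        }
        result.insert(projected);
    }
    return result;
}

// Selection: keep only the rows whose column equals the given value.
Relation selectRows(const Relation& r, const std::string& column,
                    const std::string& value) {
    Relation result;
    for (const Row& row : r) {
        Row::const_iterator it = row.find(column);
        if (it != row.end() && it->second == value) result.insert(row);
    }
    return result;
}

// Cartesian product: combine every row of a with every row of b.
Relation product(const Relation& a, const Relation& b) {
    Relation result;
    for (const Row& ra : a) {
        for (const Row& rb : b) {
            Row combined = ra;
            combined.insert(rb.begin(), rb.end());
            // Precondition of this sketch: no shared column names.
            assert(combined.size() == ra.size() + rb.size());
            result.insert(combined);
        }
    }
    return result;
}

// Natural join: combine the rows that agree on every shared column.
Relation naturalJoin(const Relation& a, const Relation& b) {
    Relation result;
    for (const Row& ra : a) {
        for (const Row& rb : b) {
            bool match = true;
            for (const Row::value_type& cell : rb) {
                Row::const_iterator it = ra.find(cell.first);
                if (it != ra.end() && it->second != cell.second) {
                    match = false;
                    break;
                }
            }
            if (!match) continue;
            Row combined = ra;
            combined.insert(rb.begin(), rb.end());
            result.insert(combined);
        }
    }
    return result;
}
```

With a phone number relation and an address relation that both have a name column, naturalJoin yields one combined row per matching name, while product yields every pairing regardless of content. A real database management system would of course evaluate these operations with indexes and query planning instead of nested loops.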

3.2 Views

Data stored in a relational database is organised according to a database schema that describes the structure of the relations. This provides a well organised way to store data, but no single model of the data is the right one for all purposes. We can use relational algebra to produce a new relational model from an existing one, providing an additional view of the data [33]. This new model can then be queried with the familiar operations. In this way a view acts as a kind of stored subquery for queries to come.

Almost any operation applicable to relations can also be applied to views. However, there are certain limitations when it comes to updating views [33]. In many cases it is impossible for the database management system to figure out how to handle an update to a view. Support for updatable views tends to vary between database vendors. These issues make views an interesting way to provide easy, read-only access to the database for third party applications, for example. But when updates are required, the advantage of views is quite small, since an application accessing the database through views still has to be aware of the underlying database schema. One possible solution is to use triggers or stored procedures for updates. This provides more flexibility and allows some changes in the database schema without changes to the applications accessing the database.

3.3 Referential Integrity

Let A and B be relations such that A refers to B by storing the key of B as a foreign key in one of its columns. They are said to maintain referential integrity if it is guaranteed that the row in B that is referred to by a row in A actually exists [33]. In other words, referential integrity means that references between relations are in order. Relational database management systems include techniques to guarantee referential integrity in the database schema, for example foreign key constraints and triggers.

Referential integrity is also paid attention to at the application level. Applications do not allow the user to perform operations that would harm referential integrity, and operations that temporarily break the integrity of the database are performed inside transactions. These are discussed further in Section 3.4. Persistence layers also have to take referential integrity among objects into account, and thus perform yet another set of checks. One view, presented in [2], is that multiple layers of these checks only cause an unnecessary performance penalty. Database integrity checks should therefore be used only as a safety net during development and could be turned off in production use. Of course this does not hold if there are also other applications accessing the database. When using a persistence layer this approach might seem interesting.
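As an illustration of the definition above, the following sketch checks referential integrity between two in-memory relations. The types Person and PhoneNumber and the function danglingReferences are hypothetical names chosen for this example only; in a real system the check would typically be left to a foreign key constraint in the database.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Relation B: persons, keyed by name (hypothetical example type).
struct Person {
    std::string name;
};

// Relation A: phone numbers, where owner is a foreign key to Person::name.
struct PhoneNumber {
    std::string owner;
    std::string number;
};

// Returns the rows of A whose foreign key matches no row of B.
// Referential integrity holds exactly when the result is empty.
std::vector<PhoneNumber> danglingReferences(const std::vector<PhoneNumber>& phones,
                                            const std::vector<Person>& persons) {
    std::set<std::string> keys;
    for (const Person& p : persons) keys.insert(p.name);

    std::vector<PhoneNumber> dangling;
    for (const PhoneNumber& n : phones) {
        if (keys.count(n.owner) == 0) dangling.push_back(n);
    }
    return dangling;
}
```

A persistence layer could run a check of this kind before storing a group of related objects, which is the extra layer of checking whose cost is questioned in [2].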

3.4 Transactions

If database operations are executed one by one, concurrent access and failures in the applications and the database system may easily leave the database in an inconsistent state. To solve this, database operations are grouped into logical sets called transactions. A transaction is basically a set of basic database operations executed on the database [33]. Database management systems are designed to execute transactions with the following properties.

Atomicity: either the whole transaction is executed or none of it is.

Consistency: if the database is in a consistent state before a transaction begins, it is still in a consistent state when the transaction finishes.

Isolation: other processes accessing the database simultaneously do not see the changes made by the transaction until it is finished.

Durability: after the transaction is finished, the changes that it has made are permanently stored in the database.

When a transaction is cancelled, either by a database error or by the application, it is said to be rolled back. If the transaction finishes successfully, it is committed. When an operation in the middle of the transaction fails, the operations already executed in the transaction are rolled back, leaving the database in the state it was in when the transaction began. If all the operations in the transaction succeed, the transaction is committed, and after that the changes become permanent and visible to the other processes.

Transactions are a way to guarantee that a database is not corrupted by database or application failures and that processes accessing the database simultaneously always see it in a consistent state. A persistence layer should also have some kind of support for transactions [6, 15, 2, 1]. These can be created implicitly when objects are stored through the persistence layer, but application developers should also be able to create them explicitly when needed. In a persistence layer there is more than only data involved in transactions [2]; there are behavioural aspects as well. For example, undo/redo functionality is an example of an object transaction that can be reversed. These are not usually stored in the database; instead they are transactions inside business objects. Integrating undo/redo functionality is discussed further in Section ??. Error handling also often involves some kind of functionality, such as the collision handling in the optimistic locking scheme described in Section 3.5.
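The commit and rollback behaviour described above can be sketched in C++ with a scope guard that rolls back unless commit is called explicitly. This is only an illustration under simplifying assumptions: Store is a toy in-memory stand-in for a database connection, the names Store, Transaction and transfer are hypothetical, and only atomicity is demonstrated, not isolation or durability.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy in-memory store standing in for a database connection.
class Store {
public:
    void begin() { snapshot_ = data_; }      // remember the state at start
    void commit() { snapshot_.clear(); }     // changes become permanent
    void rollback() { data_ = snapshot_; }   // restore the state at start
    void put(const std::string& key, int value) { data_[key] = value; }
    int get(const std::string& key) const {
        std::map<std::string, int>::const_iterator it = data_.find(key);
        return it == data_.end() ? 0 : it->second;
    }
private:
    std::map<std::string, int> data_;
    std::map<std::string, int> snapshot_;
};

// Scope guard: the transaction is rolled back on destruction
// unless commit() has been called.
class Transaction {
public:
    explicit Transaction(Store& store) : store_(store), done_(false) {
        store_.begin();
    }
    ~Transaction() {
        if (!done_) store_.rollback();
    }
    void commit() {
        store_.commit();
        done_ = true;
    }
    Transaction(const Transaction&) = delete;
    Transaction& operator=(const Transaction&) = delete;
private:
    Store& store_;
    bool done_;
};

// Moves an amount between two accounts: both updates happen or neither.
bool transfer(Store& store, const std::string& from, const std::string& to,
              int amount) {
    Transaction tx(store);
    store.put(from, store.get(from) - amount);
    if (store.get(from) < 0) return false;   // tx leaves scope: rolled back
    store.put(to, store.get(to) + amount);
    tx.commit();
    return true;
}
```

Tying the rollback to the destructor means that an early return or an exception in the middle of the operation leaves the store as it was when the transaction began, which is one way an explicit transaction could be offered to application developers.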

3.5 Locking

Most relational database management systems provide the ability to lock rows or entire relations while manipulating them. This prevents other processes from accessing them while they are being modified by the running process. Depending on the application in question, different locking strategies supported by the database may be applied. If there is a persistence layer in between, it should also allow this kind of flexibility [2, 6].

In [2] a few different approaches are described. The first one is to ignore locking totally. Here no locking is involved when objects are fetched from the database, modified and stored back. In this scenario, if someone modifies the objects in between, only the later update will be visible in the database. The opposite approach is to lock the records when an object is fetched from the database and keep them locked until the object is stored back. This can also be achieved by keeping a transaction active during the whole operation. This prevents collisions, but it makes it impossible for anyone else to access the objects simultaneously. It does not work well, for example, in interactive applications, where records may remain locked for a long time while the user is editing the data in them.

In between these two extreme solutions is one where a logical time stamp is stored with the object. This time stamp is advanced whenever the object is modified in the database. When an object is stored back, it is checked whether the time stamp has changed since the last retrieval. If so, someone has updated the object in between. How these situations are then resolved is up to the application. The different solutions from [2] for locking are summarised in Table 1.

In embedded systems, however, the database often has only one application accessing it at a time, so overly sophisticated locking would only introduce a performance penalty. The need for locking thus often differs between applications, and a general purpose persistence layer should support multiple different approaches.

Approach             Description
Overly Optimistic    The whole locking issue is ignored. If concurrent access is allowed this will cause collisions, but when only one application accesses the database this strategy works fine.
Optimistic Locking   Data is read in and stored inside separate transactions. While the application manipulates the data it remains unlocked, but some approach is used to detect collisions, and they are then resolved.
Pessimistic Locking  Data is kept locked while the application is manipulating it. This prevents any sort of collision, but reduces concurrency, especially when there is no way to distinguish read access from write access to the objects.

Table 1: Different approaches for locking at the application level
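The optimistic approach with a logical time stamp can be sketched as follows. The names Record and Table are hypothetical and the table lives in memory; a real implementation would compare the time stamp inside the update statement sent to the database. Collision resolution is left to the caller, as the text suggests.

```cpp
#include <cassert>
#include <map>
#include <string>

// A stored object together with its logical time stamp.
struct Record {
    std::string data;
    unsigned long version;   // advanced on every successful update
};

// Toy in-memory table keyed by an integer identifier.
class Table {
public:
    void insert(int id, const std::string& data) {
        Record r;
        r.data = data;
        r.version = 0;
        rows_[id] = r;
    }

    // Returns a copy; the row is not locked while the caller edits it.
    Record fetch(int id) const { return rows_.at(id); }

    // Stores the record only if nobody has updated it since it was fetched.
    // Returns false on a collision; resolving it is up to the application.
    bool store(int id, const Record& edited) {
        Record& current = rows_.at(id);
        if (current.version != edited.version) return false;
        current.data = edited.data;
        current.version = edited.version + 1;
        return true;
    }
private:
    std::map<int, Record> rows_;
};
```

If two clients fetch the same record and both try to store it, the first store succeeds and advances the time stamp, so the second store is rejected instead of silently overwriting the first update.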

3.6 Cursors

To improve the efficiency of database applications, relational databases often provide a feature called cursors [11]. These are a technique for retrieving data from a database little by little instead of in huge amounts at a time. For example, when a query against a database returns a large number of records, fetching them all to the client at once is not an option, since it requires a lot of memory, and transferring all the records at once makes the query execution take too long. To solve this, databases introduce the cursor, a handle to the result set of a query that is kept in the database management system. When a cursor is advanced, a new record of the result set is fetched to the client application. In this way the application can iterate through a large result set without wasting too many resources. The same kind of behaviour should be preserved by a persistence layer [6]. Integration of cursors into the object oriented persistence layer is discussed in Section ??.
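The idea of fetching records one at a time can be sketched with a small cursor class. ResultSet here is a hypothetical stand-in for a result set held by the database management system, and it counts the fetches so that the saving is visible; the names Cursor, next and findFirst are likewise assumptions of this example and not the interface designed later in the thesis.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a result set kept on the database side.
// In a real system each fetchRow call would be a round trip to the server.
class ResultSet {
public:
    explicit ResultSet(const std::vector<std::string>& rows)
        : rows_(rows), fetched_(0) {}
    std::size_t size() const { return rows_.size(); }
    const std::string& fetchRow(std::size_t index) {
        ++fetched_;
        return rows_.at(index);
    }
    std::size_t fetchedCount() const { return fetched_; }
private:
    std::vector<std::string> rows_;
    std::size_t fetched_;
};

// A cursor is a handle to the result set; advancing it transfers one record.
class Cursor {
public:
    explicit Cursor(ResultSet& results) : results_(results), position_(0) {}

    // Copies the next record into out and returns true,
    // or returns false when the result set is exhausted.
    bool next(std::string& out) {
        if (position_ >= results_.size()) return false;
        out = results_.fetchRow(position_++);
        return true;
    }
private:
    ResultSet& results_;
    std::size_t position_;
};

// Example client: stops at the first match without fetching the rest.
bool findFirst(Cursor& cursor, const std::string& wanted) {
    std::string row;
    while (cursor.next(row)) {
        if (row == wanted) return true;
    }
    return false;
}
```

A client that stops early never pays for the records it does not look at, which is the property a persistence layer would need to preserve when it turns rows into objects.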

4 Object Oriented Development

Today object oriented development is the most promising programming paradigm, and it has been adopted in many organisations worldwide [24]. At the same time as object oriented programming tools and methods are spreading, software processes are developing. Instead of a serial, waterfall-style [24] software process, more and more organisations are adopting some kind of flexible process, for example the Unified Process [15].

Object oriented development models real world subjects, both concrete and abstract ones, as objects [5]. Objects are instantiations of classes, where a class defines the attributes and the behaviour of the objects it represents. All we know about an object is described by its class. The relation is like that between concrete subjects in the real world and Plato’s ideas of them [22]. In a class, attributes are used to describe what the object knows, the information bound to it, and operations are used to describe what the object can do, its functionality. These are common to all objects of the class, but each object has its own identity and its own values for the attributes. Classes, and thus objects, are related to each other in various ways, discussed in more detail in Section 4.1.

Object orientation allows software processes to be more iterative and incremental in nature than traditional ones, taking full advantage of the flexibility of this modern paradigm [15, 24, 2]. In practice this means that software is specified, designed and developed little by little, by adding a few new features in each iteration. The force driving the development is the user requirements, which are accepted to be vague; a moderate amount of change is natural to these processes. The data models in a program model the subjects of the problem domain and provide functionality whose major goal is to fulfill some of the user requirements.

The nature of object oriented development requires completely new thinking when it comes to data models. Data models are no longer something completely static. Instead they must be flexible enough to allow some change during development [1]. The impact of this fact on the persistence layer is discussed further in the following sections.

4.1

Relationships Between Classes

In the design of an object oriented program the classes are related to each other in many different ways to form larger entities. A class describing a single book is related to a class describing its author, the book class may contain several chapter classes, and the book may be a subtype of a more general class describing common properties of books, CDs etc. More exactly, these relationships can be categorised as follows [2].

Inheritance models an is-a relationship between two classes. For example lorry, car and ship can be modelled as subtypes of a vehicle class, which practically means that they all are vehicles. There are also other types of inheritance, like private inheritance in C++ or implementation of an interface. Often inheritance is not even described as a relationship between classes; instead it is taken to mean that one class fully represents another. But when it comes to persistence of objects it is a relationship. This subject is discussed in further detail in Section 5.6.

Composition means a relation where a class is a component of another, an is-a-part-of relationship. For example engine, doors and wheels are components of a car.

Aggregation means that one class may be made up of others. It is a weaker form of composition. Both aggregation and composition tend to be asymmetric, as an is-a-part-of relationship is in the real world. It is also worth noticing that relationships as strong as aggregation and composition also affect the lifetime of the objects. When an object representing a car is destroyed, the objects describing its components are destroyed as well.

Two classes are said to be associated when they have access to each other through a pointer or a similar reference. This is a weaker form of class relation than the ones listed above. It has no influence on the lifetime of the objects, and the object at the other end may be completely missing.

There are also other types of relationships among classes. If an object or a class is passed as a parameter to a method of another class, or as a template parameter to a class, the classes in question depend on each other. A class can also implement an interface, which forms a realisation relationship between the class and the interface. Still, these kinds of relationships are not interesting from the point of view of persistence, since they are more about the behaviour of objects than about data.

The number of objects in a relationship is called the multiplicity of the relationship. Both ends of the relationship have multiplicities of their own, which are usually defined during the design of the class model [15]. Multiplicity can be defined to be any range of natural numbers, but a few important special cases can be noted. In the case of composition and aggregation, the ’whole’ end of the relationship has the multiplicity of exactly one. For example an engine is a part of exactly one car. The multiplicity of the other end can vary. A car can have any number of doors from zero to six. The multiplicity of exactly one defines a referential integrity constraint for the relationship, since it forces the object at the other end to exist. Association does not have any limitations like this for multiplicities. These multiplicities are not explicitly expressed by the object oriented program, but they are an essential part of the class model and thus of the data model derived from it. If a data model is derived from a class model, the constraints for referential integrity are based on the multiplicities of the class model [2]. This makes it necessary for the persistence layer to be able to define and manage multiplicities to some extent.

The dependencies of classes have a great impact on the behaviour of the persistence subsystem when it comes to the lifetime of objects [2]. When an object representing a car is deleted, the objects representing its parts are deleted as well. So when an object has an aggregation or composition relation to another object, their lifetimes in persistent storage are related.

Chained operations of this kind, set off by one operation, are called cascading operations in [2]. Cascading deletes are suggested at least for objects with a composition relationship and possibly also for aggregated objects. They do not work for associations in general, since the multiplicities might differ from the one-to-many model of composition and aggregation. Other possible cascading operations include cascading reads and saves. For example, when an object representing a car is read into memory, the objects representing its parts could be read as well. The same applies to saves. But these may still vary from one application to another. Cascading creation of objects is not an interesting question since it is traditionally taken care of by the constructors of objects.

Associations do not generate cascading operations on objects, but relations to other classes may have to be updated. For example, if a car object has a relation to the object representing its owner and the car object is destroyed, the reference to the car object must be removed from the owner object. When objects have an association relationship between them, the object at the other end of the relationship is automatically restored from the database when the association is followed by the application.
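The difference between a cascading delete for composition and a plain reference update for association can be sketched with the car example. This is a minimal illustration, assuming Python's sqlite3 module and letting the database's own `ON DELETE` rules stand in for what a persistence layer would otherwise do itself; the tables `car`, `wheel` and `owner` are hypothetical. The `NOT NULL` foreign key expresses the multiplicity of exactly one at the 'whole' end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked to
conn.executescript("""
CREATE TABLE car (oid INTEGER PRIMARY KEY, model TEXT);

-- Composition: a wheel belongs to exactly one car (NOT NULL) and is deleted with it.
CREATE TABLE wheel (
    oid     INTEGER PRIMARY KEY,
    car_oid INTEGER NOT NULL REFERENCES car(oid) ON DELETE CASCADE
);

-- Association: an owner may refer to a car; the reference is cleared, the owner stays.
CREATE TABLE owner (
    oid     INTEGER PRIMARY KEY,
    name    TEXT,
    car_oid INTEGER REFERENCES car(oid) ON DELETE SET NULL
);

INSERT INTO car   VALUES (1, 'estate');
INSERT INTO wheel VALUES (10, 1), (11, 1), (12, 1), (13, 1);
INSERT INTO owner VALUES (100, 'Ann', 1);
""")

# Deleting the car cascades to its parts and clears the owner's reference.
conn.execute("DELETE FROM car WHERE oid = ?", (1,))

wheels_left = conn.execute("SELECT COUNT(*) FROM wheel").fetchone()[0]
owner_ref = conn.execute("SELECT car_oid FROM owner WHERE oid = 100").fetchone()[0]
```

After the delete no wheels remain, while the owner row survives with an empty car reference. A database without such rules would need the persistence layer to issue the corresponding statements explicitly.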


4.2

Full Encapsulation of Persistence Mechanisms

One of the key aspects of object oriented design is that classes have clear responsibilities assigned to them [15]. No class should have responsibilities from several different domains of the application. The responsibilities of the problem domain classes lie in that domain, and no other responsibilities should be assigned to them. Problems related to databases and persistence come from a completely different domain, which is why problem domain classes should know nothing about the persistence subsystem. Persistence mechanisms like databases should be fully encapsulated from the objects that are stored in them. Classes in the persistence layer should be orthogonal to the classes in the problem domain. In practice this may not be perfectly achieved, but at least the coupling between the persistence mechanism and the application should be minimised [6]. In [6] it is stated that business domain classes intended to be persistent should not inherit from a common superclass implementing persistence, since this generates too much coupling between the persistence layer and the application classes. On the other hand, [1] suggests a design where all classes stored through the persistence system inherit from a class PersistentObject, which increases coupling between the persistence subsystem and the application classes but gives all persistent objects a similar interface to work with. The latter is easier to implement, but the former probably leads to a design that is easier to maintain. When programming in C++, the lack of reflexivity described in Section 7.2 leads to a solution where domain classes still have to be aware of the persistence layer, so the system can never be fully orthogonal to the domain classes.
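The design without a common persistent superclass can be sketched as follows. This is an illustration only, assuming Python and its sqlite3 module; `Book` and `BookMapper` are hypothetical names. The domain class carries no persistence code, and a separate class in the persistence layer knows both the class and its table.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Book:
    """Problem domain class: no persistence code and no persistence base class."""
    title: str
    author: str

class BookMapper:
    """Hypothetical persistence-layer class; only this class knows about the table."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS book "
                     "(oid INTEGER PRIMARY KEY, title TEXT, author TEXT)")

    def save(self, book):
        cur = self.conn.execute("INSERT INTO book (title, author) VALUES (?, ?)",
                                (book.title, book.author))
        return cur.lastrowid  # the generated key serves as the object identifier

    def load(self, oid):
        row = self.conn.execute("SELECT title, author FROM book WHERE oid = ?",
                                (oid,)).fetchone()
        return Book(*row) if row else None
```

A mapper like this has to be written, or generated, for each class. In a language with run-time type information it could be derived from the class definition itself, which is the step the text says C++ does not support directly.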

4.3

Extensibility

As mentioned above, object oriented programs can be developed little by little. As the program grows, more and more classes are added to it. Also, when a program is maintained, its class structures often change. These actions should be allowed by the persistence layer [1, 2]. The database schema should be flexible enough to adapt to new user requirements. These changes can not generally be performed fully automatically, but the persistence layer should provide tools to ease changes in database schemas. This viewpoint is taken into account when comparing different mapping strategies in Section 5.

4.4

Object Oriented Frameworks

One key idea of object oriented development is reuse. In addition to the reuse of program code, general ideas of different solutions are reused in terms of design patterns. Between the reuse of fully implemented class libraries and abstract design patterns lie object oriented frameworks. Different applications require different characteristics from class libraries, so the libraries should be extremely flexible. Frameworks try to provide a more general solution in terms of ’incomplete’ class libraries where some of the classes are abstract [18]. These classes have only descriptions of their required functionality, which is then implemented by an application programmer in the way that is most suitable for the application at hand. These kinds of extension points in the class structure of a framework, called hot spots, make a well designed framework an extremely flexible means of reuse. A persistence layer could also be implemented in terms of frameworks. This would allow the application programmer to customise the behaviour of the persistence layer to optimise it for the application at hand.

5

The Object-Relational Impedance Mismatch

Object oriented development and relational databases, described in Sections 3 and 4, have quite different backgrounds. Object orientation is a practice based on experience in software development, whereas relational databases have a sound mathematical background [1, 6]. When an application is developed, two separate data models are designed [2]: an object oriented one that is used in the application to represent objects of the problem domain, and a relational one that is used to store data describing the problem domain persistently. This both adds extra modelling effort and may generate two models that are partly unrelated and may conflict with each other. Objects are designed to have responsibilities in terms of both data and behaviour, but relational databases are all about data, which easily leads to very different models of the same problem. Also, people working with either of these tend to think about the development process differently [2]. Where data administrators want to start with a data model of the system, developers following an object oriented process start with user requirements and class models. In iterative and incremental development, models are subject to change, whereas in the data oriented world data models are something rock solid. In the following Sections we will describe some issues that these differences raise when storing objects in a relational database. We will also describe some of the common solutions found in the literature.

5.1

Design Time of Relations

In traditional database related software development, data models for the application under development are often specified very early in the process [2, 6]. Now that software is developed more and more in an iterative and incremental manner, this kind of predefined database structure tends to be inflexible [15, 6, 2]. The data model influences the structure of the software instead of the user requirements. In modern software development it should be the user requirements that form the basis for the software design, and the database is only a tool to make the software remember things between separate runs.


This is a political, process related question more than a technical one, so the solutions are usually not technical, but technology must support these new solutions [2]. The solution discussed in both [2] and [6] is to design the database at the beginning of the implementation of the software, when the core functionality and the data structures required are already clear. In our work we will also examine the possibility for the persistence layer to generate a simple table structure by itself to minimise the effort spent by the application developer.

5.2

Representing Objects as Tables

To be stored in a relational database, objects have to be mapped to tables. Because of the different natures of these domains, this is not always trivial. Objects may be composed of other non-trivial objects, they do not have keys and they may inherit properties from other objects. Table rows do not have the same identity properties as objects do. One possible solution is to map a single class to a single table and its atomic attributes, such as integers and strings, to attributes of the table. Complex attributes are handled as objects of their own, with separate tables and foreign key references to the original table, and attributes containing collections are stored in the table describing relationships between objects [6, 1]. A simple object structure and the relational table structure representing it are shown in figure 1. Here one class, Class1, is composed of a few attributes and it aggregates another class, Class2. These are mapped to tables of their own, and the table for Class1 contains a foreign key from the table for Class2. Object identifiers are also shown in the mapping. These are discussed further in Section 5.3. The key idea of this mapping is to be simple and to provide easy access to the database also for third party applications. Still, different questions about flexibility must be solved, since changes in the class model generate changes in the database schema.

Figure 1: One-to-one mapping of objects to tables
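The one-class-one-table mapping can be sketched in a few lines. The sketch assumes Python's sqlite3 module; the attribute names `number`, `name`, `part` and `label` are hypothetical, since only the class names Class1 and Class2 are given in the text. Atomic attributes become columns and the aggregated object becomes a row in its own table, referenced through a foreign key.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Class2:
    label: str

@dataclass
class Class1:
    number: int
    name: str
    part: Class2        # aggregated object, stored in its own table

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class2 (oid INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE class1 (oid INTEGER PRIMARY KEY, number INTEGER, name TEXT,
                     part_oid INTEGER REFERENCES class2(oid));
""")

def save(obj):
    """Store the aggregated Class2 row first, then the Class1 row that refers to it."""
    part_oid = conn.execute("INSERT INTO class2 (label) VALUES (?)",
                            (obj.part.label,)).lastrowid
    return conn.execute(
        "INSERT INTO class1 (number, name, part_oid) VALUES (?, ?, ?)",
        (obj.number, obj.name, part_oid)).lastrowid

def load(oid):
    """Rebuild a Class1 object and its Class2 part with a single join."""
    number, name, label = conn.execute(
        "SELECT c1.number, c1.name, c2.label "
        "FROM class1 c1 JOIN class2 c2 ON c2.oid = c1.part_oid "
        "WHERE c1.oid = ?", (oid,)).fetchone()
    return Class1(number, name, Class2(label))
```

Because the table layout mirrors the classes directly, a third party application can read the same data with ordinary queries, at the cost of a schema change whenever a class changes.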


Another approach to the mapping, suggested in [2], is to map objects into a general structure where classes, objects, attributes and their values are modelled with tables. This allows the database structure to remain the same whenever classes are added or their attributes are modified. This is discussed in greater detail in Section 5.6.4 and illustrated in figure 7. The advantage of a general structure like this is the improved flexibility of the data model, but as a drawback third party access to the database becomes more complicated and there might be performance penalties.

5.3

Object Identifier

Objects have identity whereas rows in a relational database lack this feature [6, 1, 2]. Two objects with exactly the same values are still separately accessible by the system, but in the case of rows in database systems, the identity of a row is defined only by its attribute values. This identity feature of objects should be somehow simulated by the persistence layer, which is usually done by adding an object identifier to the data of each object as so called shadow information [2]. These identifiers are then stored in the persistence system and used as primary keys to retrieve objects and as foreign keys to refer to other objects, and they are generated to be unique across all the tables [6]. Object identifiers, from here on referred to as OIDs, can also carry type information to make it easier to access objects based on their OIDs. One technique related to OIDs that is used in persistence systems is so called pointer swizzling [13]. This means that a strategy is developed to map OIDs in persistent storage to main memory pointers to minimise the overhead introduced by the persistence layer. There are several strategies for this, which are discussed in [13].
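One way to obtain OIDs that are unique across all tables and carry type information is sketched below. The bit layout is an assumption made for this example only and is not taken from the cited sources: the low 16 bits hold a class number and the remaining bits hold a running counter shared by all classes. `OidGenerator` is a hypothetical name.

```python
TYPE_BITS = 16                      # assumed width of the class number field
TYPE_MASK = (1 << TYPE_BITS) - 1

class OidGenerator:
    """Hands out OIDs that are unique across all classes and encode the class."""

    def __init__(self, class_names):
        self._names = list(class_names)   # position in this list + 1 is the class number
        self._next = 1                    # one counter shared by every class

    def new_oid(self, class_name):
        type_id = self._names.index(class_name) + 1
        oid = (self._next << TYPE_BITS) | type_id
        self._next += 1
        return oid

    def class_of(self, oid):
        """Recover the class name from the OID alone, without touching the database."""
        return self._names[(oid & TYPE_MASK) - 1]
```

Two objects with identical attribute values still receive different OIDs, which is the identity property the text asks the persistence layer to simulate. In a real system the counter would have to be stored persistently as well.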

5.4

Representing Collections in a Relational Database

In object oriented programs different collections are used to represent collections of objects belonging to a class or sharing the same base class [12]. There is no direct counterpart for these collections in the relational world, so they must be mapped to some structure. Collections define many-to-one or many-to-many relationships between objects, which are further discussed in Section 5.5. So by solving the mapping of collections we also solve the questions raised by these object relations. Objects in a collection may have a specified order or they might be keyed in some special way [2]. These features must be preserved by the mapping to the relational world. If there is an ordering in the collection, the order numbers of the objects must be stored in the database, and there should be a clear policy on how to handle cases like inserting new objects in the middle of the collection, which may require renumbering of objects. When mapping a single class to a single table, an additional table is used to represent collections [2]. It is often the same table that represents object relationships. This table then has one column that includes the ordering or keying information of the collection. In the case of a generic mapping, like the one described in figure 7, collections must be somehow represented by attributes. The keying information can be added to the attribute table. This is not the most beautiful solution since all attributes carry the keying information, but it does add flexibility since any attribute, and thus any class relationship, can change its multiplicity at any time.
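The renumbering question can be made concrete with a small sketch of an ordered collection stored in an additional table. It assumes Python's sqlite3 module; the table `book_chapter` and the helpers `insert_at` and `members` are hypothetical. The policy chosen here, shifting every later member by one position on insert, is only one of several possible ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE book_chapter (
    book_oid    INTEGER,
    position    INTEGER,      -- ordering information of the collection
    chapter_oid INTEGER,
    PRIMARY KEY (book_oid, chapter_oid)
)""")

def insert_at(book_oid, position, chapter_oid):
    """Insert a member at the given position, renumbering the members after it."""
    conn.execute("UPDATE book_chapter SET position = position + 1 "
                 "WHERE book_oid = ? AND position >= ?", (book_oid, position))
    conn.execute("INSERT INTO book_chapter VALUES (?, ?, ?)",
                 (book_oid, position, chapter_oid))

def members(book_oid):
    """Return the member OIDs of the collection in their stored order."""
    rows = conn.execute("SELECT chapter_oid FROM book_chapter "
                        "WHERE book_oid = ? ORDER BY position", (book_oid,))
    return [row[0] for row in rows]
```

Inserting in the middle touches every later row of the collection, so for long collections a sparser numbering scheme might be preferable.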

5.5

Representing Object Relationships

In object oriented development objects are related to each other in various ways, as discussed in Section 4.1 and [1]. Objects can be constructed from other non-trivial objects, they may refer to other objects in one-to-one, one-to-many or many-to-many fashion, or they may inherit properties from each other. Some relationships also have properties of their own; for example, they can be indexed with a special key. All these relationships must be expressed somehow by the relational database structure.

When classes are all mapped to tables of their own, representing these relationships requires additional tables. As mentioned in Section 5.2, objects inside other objects can be stored in tables of their own with a foreign key reference stored in the original table. The same applies to one-to-one relationships. Using foreign keys is also the key idea in more complicated relationships. To remain flexible, one-to-many and many-to-one relationships must be modelled using a relationship table. This is also the case with relationships having special properties [1, 6]. As mentioned in Section 5.4, collections are used in the object oriented world to represent these relationships, so mapping collections and mapping complex relationships reduce to the same problem. In [2] it is also noted that simple relationships between classes could be implemented in tables of their own. This introduces a moderate performance penalty but provides extra flexibility. In a relational database all relationships between relations are always bidirectional: they can be queried in both directions. This is not always the case in the object oriented world. The relationship between objects can also be unidirectional, a difference that must be taken into account by the mapping. An example of a complex relationship is provided in figure 2. In the figure Class1 includes a map of objects of Class2, the map being keyed with string objects.

If objects are mapped to a generic structure, relationships are modelled at class level in a table of their own. This is a flexible solution but may cause some performance penalty, since several tables may have to be accessed to fetch objects. A clear advantage is that when the properties of a relationship change or new relationships are added, only one table is updated. Attributes that carry foreign keys for relations can be stored as attributes in the database structure described in figure 7. Complex relationships that are keyed in some special way still require a clever policy to handle them. Inheritance is yet another relationship among classes, which will be discussed in greater detail in Section 5.6.


Figure 2: Object-Relational Mapping from a Complex Relationship
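A relationship table for a map keyed with strings might look like the sketch below. It assumes Python's sqlite3 module; the table `class1_class2`, its column names and the attribute `label` are hypothetical. The second helper shows that the same table can be queried in the opposite direction, even if the object model only navigates from Class1 to Class2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class1 (oid INTEGER PRIMARY KEY);
CREATE TABLE class2 (oid INTEGER PRIMARY KEY, label TEXT);

-- Relationship table: one row per map entry, keyed by owner and map key.
CREATE TABLE class1_class2 (
    class1_oid INTEGER REFERENCES class1(oid),
    map_key    TEXT,
    class2_oid INTEGER REFERENCES class2(oid),
    PRIMARY KEY (class1_oid, map_key)
);

INSERT INTO class1 VALUES (1);
INSERT INTO class2 VALUES (10, 'first'), (11, 'second');
INSERT INTO class1_class2 VALUES (1, 'a', 10), (1, 'b', 11);
""")

def load_map(class1_oid):
    """Rebuild the string-keyed map of one Class1 object (values shown as labels)."""
    rows = conn.execute(
        "SELECT r.map_key, c.label FROM class1_class2 r "
        "JOIN class2 c ON c.oid = r.class2_oid "
        "WHERE r.class1_oid = ?", (class1_oid,))
    return dict(rows)

def holders_of(class2_oid):
    """Query the relationship in the other direction: which Class1 objects hold it."""
    rows = conn.execute("SELECT class1_oid FROM class1_class2 "
                        "WHERE class2_oid = ?", (class2_oid,))
    return [row[0] for row in rows]
```

The composite primary key guarantees that a key occurs at most once per map, which matches the behaviour of a map in the object model.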

5.6

Representing Inheritance Hierarchies

Inheritance is purely an issue of the object oriented domain with no counterpart in the relational world. As the persistence of objects concerns the data stored in the objects, we are not interested in types of inheritance where only interfaces of classes are implemented. What we really are interested in is the kind of inheritance where the base class has some data members to be stored in the relational database. This could be public inheritance modelling an ’is-a’ relationship between classes or private inheritance modelling an ’is-implemented-in-terms-of’ relationship [19]. There are different approaches to mapping an inheritance hierarchy to tables, discussed in the following Sections. First we describe a few solutions that are applicable when mapping every class to a table of its own, and then we describe how inheritance is represented in a generic structure for classes. In each Section we will present a mapping of the example class hierarchy shown in figure 3. This is a simple library example where the classes Book and AudioCD inherit a common base class Item and all the classes have attributes to be stored in the database. The base class is an abstract one, so it is never instantiated as is.

5.6.1

Whole hierarchy in one table

A table with columns for all the attributes in the hierarchy and one column to define the correct subtype of the object is used to store all the objects of one class hierarchy [2, 6]. The solution is illustrated in figure 4. This is quite easy to implement and efficient to use. Every object in the hierarchy can be fetched with a single query. On the other hand, the table containing all the possible columns for the attributes of all the classes in the hierarchy becomes very wide and contains a lot of empty fields. This makes it inefficient in terms of storage space. It is also very inflexible when the application changes [1]. Nor is this solution very elegant in terms of database design.
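The single-table mapping of the library example can be sketched as follows, assuming Python's sqlite3 module. The attributes `title`, `author` and `artist` and the column `kind` are hypothetical, since the actual attributes of the example hierarchy are given only in the figure.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Item:
    title: str

@dataclass
class Book(Item):
    author: str

@dataclass
class AudioCD(Item):
    artist: str

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE item (
    oid    INTEGER PRIMARY KEY,
    kind   TEXT NOT NULL,      -- names the concrete subtype of the row
    title  TEXT,               -- attribute of Item
    author TEXT,               -- attribute of Book, empty for other subtypes
    artist TEXT                -- attribute of AudioCD, empty for other subtypes
)""")

def save(obj):
    conn.execute(
        "INSERT INTO item (kind, title, author, artist) VALUES (?, ?, ?, ?)",
        (type(obj).__name__, obj.title,
         getattr(obj, "author", None), getattr(obj, "artist", None)))

def load_all():
    """Fetch every object of the hierarchy with one query and rebuild it by subtype."""
    items = []
    for kind, title, author, artist in conn.execute(
            "SELECT kind, title, author, artist FROM item ORDER BY oid"):
        items.append(Book(title, author) if kind == "Book" else AudioCD(title, artist))
    return items
```

Every row leaves the columns of the other subtypes empty, which is the storage overhead mentioned above, and adding a subtype means adding columns to the shared table.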

Figure 3: Example class hierarchy

Figure 4: Mapping example hierarchy to one table

5.6.2

Each concrete class to a table of its own

In this solution each concrete class of the hierarchy is mapped to a table of its own. This means that attributes belonging to the base class have columns in each of these tables, as shown in figure 5 [1, 2]. This is an efficient solution both in terms of execution speed and storage space. No extra space is required and every object can be fetched and updated with a single query. Since the attributes of the base class are stored in different tables, changes in the base class structure generate big updates to the database schema. Another problem is that queries over all the objects of the base class are hard to implement.

5.6.3

Each class to its own table

In this solution each class of the hierarchy is mapped to a table of its own, as demonstrated in figure 6 [1, 6, 2]. This approach is somewhat harder to implement than the previous ones and it also has a performance penalty, as fetching and updating a single object requires operations on multiple tables. Here each table has the OID as its primary key, and the same OID appears in all tables representing the class of the object in question. As OIDs connect the rows in different tables together, they are also foreign keys in every table. As mentioned in Section 5.3, OIDs can also carry type information. If this is the case and the same OID is present in multiple tables, it should somehow be indicated that this OID actually refers to a class hierarchy. This is the recommended mapping in [6, 1].

Figure 5: Mapping each concrete class of the hierarchy to a table

Figure 6: Mapping each concrete class of the hierarchy to a table
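The trade-off between the mappings of Sections 5.6.2 and 5.6.3 can be seen by placing both schemas side by side. The sketch assumes Python's sqlite3 module; the table names with the prefixes `c_` and `t_` and the attributes `title`, `author` and `artist` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Section 5.6.2 style: one table per concrete class, base class columns repeated.
CREATE TABLE c_book     (oid INTEGER PRIMARY KEY, title TEXT, author TEXT);
CREATE TABLE c_audio_cd (oid INTEGER PRIMARY KEY, title TEXT, artist TEXT);
INSERT INTO c_book     VALUES (1, 'Republic', 'Plato');
INSERT INTO c_audio_cd VALUES (2, 'Kind of Blue', 'Miles Davis');

-- Section 5.6.3 style: one table per class, rows tied together by a shared OID.
CREATE TABLE t_item     (oid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE t_book     (oid INTEGER PRIMARY KEY REFERENCES t_item(oid), author TEXT);
CREATE TABLE t_audio_cd (oid INTEGER PRIMARY KEY REFERENCES t_item(oid), artist TEXT);
INSERT INTO t_item     VALUES (1, 'Republic'), (2, 'Kind of Blue');
INSERT INTO t_book     VALUES (1, 'Plato');
INSERT INTO t_audio_cd VALUES (2, 'Miles Davis');
""")

def fetch(sql, *params):
    return conn.execute(sql, params).fetchall()

# One Book: a single table in the first schema, a join of two tables in the second.
book_concrete = fetch("SELECT title, author FROM c_book WHERE oid = ?", 1)
book_per_class = fetch("SELECT i.title, b.author FROM t_item i "
                       "JOIN t_book b ON b.oid = i.oid WHERE i.oid = ?", 1)

# All Items: a UNION over every concrete table in the first schema,
# a single table in the second.
titles_concrete = fetch("SELECT title FROM c_book "
                        "UNION ALL SELECT title FROM c_audio_cd ORDER BY title")
titles_per_class = fetch("SELECT title FROM t_item ORDER BY title")
```

Both schemas return the same data. The first needs its UNION extended for every new concrete class, while the second pays a join for every object it fetches.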

5.6.4

Map inheritance hierarchies to a generic structure

When mapping objects to a generic structure, mapping inheritance hierarchies becomes quite simple [2]. Inheritance is just modelled as another kind of relationship between classes. As mentioned before, this structure can act as a solution to all mapping related questions, but its drawbacks lie in performance and in the fact that third party access to the database becomes more complicated. A structure like the one presented in figure 7 is highly flexible. No changes in classes generate any changes to the database schema, including changes in inheritance hierarchies. Still, data structures must be versioned since the existing data may represent a previous class structure. The problem with this kind of approach is that operations on objects require accessing several tables. Fetching one object would require accessing four tables. If the persistence layer supports changing the mapping approach for an existing application, this kind of generic structure can be used when developing and prototyping the application, when the rate of change in data structures is high. When the mapping becomes a bottleneck the approach can be changed.

Figure 7: Mapping classes to a generic structure
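A generic structure of this kind can be sketched with four tables. The layout below is an assumption made for illustration and is not necessarily the one shown in the figure; the table names `class_def`, `attribute_def`, `object_inst` and `attribute_val` and all helper functions are hypothetical. The sketch uses Python's sqlite3 module and stores every value as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class_def     (class_id INTEGER PRIMARY KEY, name TEXT UNIQUE,
                            parent_id INTEGER REFERENCES class_def(class_id));
CREATE TABLE attribute_def (attr_id INTEGER PRIMARY KEY,
                            class_id INTEGER REFERENCES class_def(class_id), name TEXT);
CREATE TABLE object_inst   (oid INTEGER PRIMARY KEY,
                            class_id INTEGER REFERENCES class_def(class_id));
CREATE TABLE attribute_val (oid INTEGER REFERENCES object_inst(oid),
                            attr_id INTEGER REFERENCES attribute_def(attr_id),
                            value TEXT);
""")

def class_id_of(name):
    return conn.execute("SELECT class_id FROM class_def WHERE name = ?",
                        (name,)).fetchone()[0]

def define_class(name, attributes, parent=None):
    """Adding a class only inserts rows; inheritance is a parent reference."""
    parent_id = class_id_of(parent) if parent is not None else None
    class_id = conn.execute("INSERT INTO class_def (name, parent_id) VALUES (?, ?)",
                            (name, parent_id)).lastrowid
    conn.executemany("INSERT INTO attribute_def (class_id, name) VALUES (?, ?)",
                     [(class_id, attr) for attr in attributes])
    return class_id

def attribute_ids(class_id):
    """Collect the attributes of a class and of all its ancestors."""
    ids = {}
    while class_id is not None:
        for attr_id, name in conn.execute(
                "SELECT attr_id, name FROM attribute_def WHERE class_id = ?", (class_id,)):
            ids[name] = attr_id
        class_id = conn.execute("SELECT parent_id FROM class_def WHERE class_id = ?",
                                (class_id,)).fetchone()[0]
    return ids

def create_object(class_name, **values):
    class_id = class_id_of(class_name)
    oid = conn.execute("INSERT INTO object_inst (class_id) VALUES (?)",
                       (class_id,)).lastrowid
    ids = attribute_ids(class_id)
    conn.executemany("INSERT INTO attribute_val VALUES (?, ?, ?)",
                     [(oid, ids[name], str(value)) for name, value in values.items()])
    return oid

def load_object(oid):
    """Fetching one object touches all four tables."""
    rows = conn.execute("""
        SELECT c.name, a.name, v.value
        FROM object_inst o
        JOIN class_def c     ON c.class_id = o.class_id
        JOIN attribute_val v ON v.oid = o.oid
        JOIN attribute_def a ON a.attr_id = v.attr_id
        WHERE o.oid = ?""", (oid,)).fetchall()
    return rows[0][0], {attr: value for _, attr, value in rows}
```

New classes and changed hierarchies never alter the schema, but the value table receives one row per attribute of every object, which is where the performance concern comes from.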

5.7

Comparison of Different Mapping Strategies

As described in the Sections above, there are at least two different approaches to mapping classes to a relational database: to map each class to a table or to map classes to a generic structure. The mapping of each class to a table is the recommended solution in [6] and [1]. It is very simple in the basic case, gives a simple database structure and is quite efficient. Special cases like complicated relationships between classes and inheritance structures require special handling. The flexibility also has its limitations, since the schema must be changed whenever the class model changes. A general structure to represent classes is described in [2]. It gives a clear advantage in terms of flexibility. The same database structure can be used to represent any class model. The main disadvantage mentioned in [2] is the performance penalty. The table containing all the attributes for every object, seen in figure ??, easily grows very large. Another problem is that the database schema is quite complicated to access for any third party application.


5.8

Abstraction of Queries

An application should not be coupled with an underlying database [1, 6]. If an application is tightly coupled with a database, it is very hard and expensive to port it to different database systems. The same rule applies to the persistence layer [1]. The persistence layer should use some kind of abstraction of the database access that is both natural in an object oriented environment and independent of the underlying database. This makes it easier to run the application on different databases or even on completely different persistence mechanisms like XML files. Even when database access is abstracted, the application should be able to take full advantage of the features provided by different databases [1]. This abstraction can be achieved in many ways. One way is to use a higher level query language that is then compiled to actual queries against the persistence mechanism in question. Another approach is to use a subset of a database language that is common to most database management systems, like the SQL92 standard. This has the drawback that the software can not be ported to other than SQL databases, and it can not take advantage of the latest features of those either. In addition, many database management systems that are thin enough to be run on an embedded system like the one described in Section ?? support only a subset of SQL92. An even more sophisticated method would be to represent queries as objects, from which the actual query is then generated [1]. One approach of this kind for Java is described in [21]. These query objects can then be turned into SQL suitable for the database in question, or even into the XML queries described in [29]. This problem is also present in object oriented visual query systems, where the visual presentation of queries is represented by objects which are then converted to database queries. Existing solutions are discussed in [23], where an implementation in Java is also described.
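The idea of representing queries as objects can be sketched as a small tree of condition objects from which a parameterised SQL string is generated. This is an illustration in Python, not the design of [1] or [21]; the classes `Compare`, `And` and `Query` are hypothetical. Table and column names are inserted into the SQL text as given, so in this sketch they must come from trusted mapping data.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Compare:
    """A single condition such as  year > 1990."""
    column: str
    op: str          # one of "=", "<", ">"
    value: Any

    def to_sql(self):
        if self.op not in ("=", "<", ">"):
            raise ValueError(f"unsupported operator: {self.op}")
        return f"{self.column} {self.op} ?", [self.value]

@dataclass
class And:
    """Conjunction of two conditions."""
    left: Any
    right: Any

    def to_sql(self):
        left_sql, left_params = self.left.to_sql()
        right_sql, right_params = self.right.to_sql()
        return f"({left_sql} AND {right_sql})", left_params + right_params

@dataclass
class Query:
    """A query for all rows of one table, optionally restricted by a condition."""
    table: str
    where: Optional[Any] = None

    def to_sql(self):
        sql = f"SELECT * FROM {self.table}"
        if self.where is None:
            return sql, []
        where_sql, params = self.where.to_sql()
        return f"{sql} WHERE {where_sql}", params

query = Query("book", And(Compare("author", "=", "Plato"), Compare("year", ">", 1990)))
sql, params = query.to_sql()
```

Because the application only builds the object tree, a different generator could walk the same tree and emit another SQL dialect or a query for a non-SQL persistence mechanism.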
We will not focus on this question in our project, but it is clear that a fully capable persistence layer that totally abstracts the underlying persistence mechanism must introduce a query abstraction that is able to take full advantage of the database instead of treating it as a simple storage mechanism.

5.9

Mapping Query Results to Objects

When a programmer queries a database, the results returned should be presented to the application in a form that is natural to the object oriented paradigm. When the result set contains only objects from a single class this is easy, but query results may often span several classes. As the result sets provided by queries may be bigger than the available memory, some abstraction of the cursors described in Section 3.6 should be available. Usually the database is asked for objects of a certain type. In this kind of query it is easy for the persistence layer to generate objects. But in [1] it is stated that the system should also allow arbitrary queries. For example, suppose we want to query the class structure described in figure 8 to display a list of music pieces including the names of the categories of each piece. If this is done by fetching all the objects into memory, both the pieces of music and their categories, it introduces a large overhead compared to a direct query that simply joins the music piece and the category.

Figure 8: Class Structure Example

In our work we will concentrate on the pure object approach and on minimising the overhead involved. The mapping of arbitrary query results to objects is also discussed in connection with visual query systems.

6

Our Embedded System

In general, an embedded system refers to a computer system integrated into another device [17]. In practice these range from tiny watches and smart cards to high end firewalls and routers. Our system is integrated into an active loudspeaker, forming a device that can be programmed to play different audio files or streams. The whole system is managed and updated over the Internet [25]. The device runs the Linux operating system [27] on sixteen megabytes of RAM and uses flash memory or a hard disk as permanent storage. The main CPU of the system is a 25 MHz Etrax 100LX RISC processor, which produces 100 MIPS of computing power [30]. The data managed by the system is stored in a custom relational database management system, the data files of which are located on the hard disk. If the system is run without the hard disk, the database is accessed over a TCP/IP network.

This kind of environment sets some limitations on the software running on it. As an embedded system this is quite a powerful one, but compared to the usual database and web servers handling audio data its resources are very limited. First of all, the amount of processing power and memory is limited. Also, dependencies on external libraries should be minimal, since all the libraries used must be ported to the target system. These limitations also affect the persistence layer to be used on the device. The overhead introduced by another layer of indirection should be moderate. Even though the persistence layer should support different persistence mechanisms, it should be possible to compile it with only one mechanism included. This is because accessing different databases usually requires different libraries, and not all of these should have to be ported to the embedded system. There are different ways to take limited resources into account when designing a persistence layer. These are discussed in the following Sections.


6.1 Proxy Objects

A proxy is an object that is only partially in memory [6]. Only a minimal set of its attributes have values and the others are omitted. This way objects consume less memory, for example when listing objects in a database. When detailed information about an object is needed, the rest of it is fetched. In C++ proxies can be implemented effectively using smart pointers [15].

The open problem is deciding when to prefetch proxies. This must either be expressed by the application using the persistence layer or be set up in the mapping data with a management application. In [2] the use of a graphical management system for persistence services is suggested. In [21], on the other hand, an implementation is described where the path of objects to prefetch from the database is given in the application program. The first approach allows the performance of the application to be tuned after the development phase, and only when performance really becomes a bottleneck, whereas specifying the behaviour in the application binds the system tightly to the persistence framework and promotes optimisation during development. On the other hand, the application programmer often has the best insight into the program and knows where prefetching proxy objects will have the highest added value.
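The smart pointer idea can be illustrated with a minimal sketch. It assumes modern C++ and a hypothetical domain class AudioFile; the names Proxy, Loader and loaded are illustrative and are not taken from the design described in this thesis. The proxy keeps only the identifier and one light attribute, and fetches the full object the first time it is dereferenced.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical domain class; in a real system the full state would be
// read from the database.
struct AudioFile {
    long id;
    std::string title;     // light attribute, also kept in the proxy
    std::string metadata;  // heavier attribute, loaded on demand
};

// Smart-pointer-like proxy: holds the identifier and one light attribute
// and fetches the complete object on first dereference.
template <typename T>
class Proxy {
public:
    // The loader stands in for a database fetch by identifier (assumption).
    using Loader = std::function<std::unique_ptr<T>(long)>;

    Proxy(long id, std::string title, Loader loader)
        : id_(id), title_(std::move(title)), loader_(std::move(loader)) {}

    long id() const { return id_; }
    const std::string& title() const { return title_; }
    bool loaded() const { return static_cast<bool>(object_); }

    T* operator->() { return &load(); }
    T& operator*() { return load(); }

private:
    T& load() {
        if (!object_) object_ = loader_(id_);  // fetch only once
        return *object_;
    }

    long id_;
    std::string title_;
    Loader loader_;
    std::unique_ptr<T> object_;
};
```

Listing objects only touches id() and title(), so no full object is instantiated until the application actually dereferences a proxy.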

6.2 Cache

One way to optimise database access speed is to cache data retrieved from the database [32]. When the same data is accessed again, it can be fetched from the cache instead of the database. This improves the speed of access, especially when the database is located behind a network connection. Cached data, of course, consumes memory, so caching is a compromise between speed of access and memory requirements.

Whether caching brings real advantages depends on the application. If the application runs with tiny memory requirements and objects are known to be used only once, caching has no advantages. On the other hand, an interactive application that requires moderate access times to a database, and that may access the same data entries many times in a way that is hard for the application programmer to predict, can benefit considerably from caching.

Caching of data is a simple idea, but it raises complicated issues when concurrent access to the data behind the cache is allowed [9]. The problem is very much the same as in a shared memory multiprocessor system with private caches. One trivial solution to cache coherency problems is to keep cached records locked, but this practically prevents all concurrency. No simple solution exists.
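The trade-off between access speed and memory can be made explicit by bounding the cache. The following is a minimal sketch, assuming a single-threaded application, records represented as strings and a least-recently-used eviction policy; the class name RecordCache and the policy are illustrative assumptions, and the coherency problem discussed above is not addressed.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Fixed-capacity cache keyed by object identifier. When the capacity is
// exceeded, the least recently used entry is evicted.
class RecordCache {
public:
    explicit RecordCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns a pointer to the cached record, or nullptr on a miss.
    const std::string* find(long id) {
        auto it = index_.find(id);
        if (it == index_.end()) return nullptr;
        // Move the entry to the front: it is now the most recently used.
        entries_.splice(entries_.begin(), entries_, it->second);
        return &it->second->second;
    }

    // Inserts or replaces a record, evicting the oldest one if needed.
    void put(long id, std::string record) {
        auto it = index_.find(id);
        if (it != index_.end()) {
            it->second->second = std::move(record);
            entries_.splice(entries_.begin(), entries_, it->second);
            return;
        }
        entries_.emplace_front(id, std::move(record));
        index_[id] = entries_.begin();
        if (entries_.size() > capacity_) {
            index_.erase(entries_.back().first);
            entries_.pop_back();
        }
    }

    std::size_t size() const { return entries_.size(); }

private:
    using Entry = std::pair<long, std::string>;
    std::size_t capacity_;
    std::list<Entry> entries_;  // front = most recently used
    std::unordered_map<long, std::list<Entry>::iterator> index_;
};
```

The capacity parameter is the tuning knob: a small value keeps memory usage low on the embedded device, while a larger one saves more round trips to the database.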

6.3 Abstraction of Cursors

As described in Section 3.6, relational databases use a technique called cursors to minimize the memory requirements of query results. This is essential in embedded systems, where query results are likely not to fit in the memory available. The persistence layer should provide an abstraction for cursors so that the application can easily access a large set of objects without having them all in memory at once. As stated in [1], a persistence layer should always return a collection of objects, whereas the principle of cursors is that results are accessed one by one. For this kind of purpose there is a design pattern called iterator [10]. An iterator is an object that represents a cursor to a set of objects. A persistence layer can return an iterator as the result of a query. This iterator object then holds the database cursor. When the application iterates through the result set, the cursor fetches objects from the database when needed. This way the application can access result sets using cursors just like the native collection objects of standard libraries.
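A minimal sketch of this idea follows. The Cursor interface and the in-memory VectorCursor are hypothetical stand-ins for a real database cursor, and rows are simplified to strings; the point is only that a single-pass input iterator can pull one row at a time from whatever cursor it holds.

```cpp
#include <cstddef>
#include <iterator>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cursor interface: delivers one row per call.
class Cursor {
public:
    virtual ~Cursor() = default;
    virtual bool fetch(std::string& row) = 0;  // false when exhausted
};

// Stand-in for a real database cursor, backed by an in-memory vector.
class VectorCursor : public Cursor {
public:
    explicit VectorCursor(std::vector<std::string> rows) : rows_(std::move(rows)) {}
    bool fetch(std::string& row) override {
        if (pos_ >= rows_.size()) return false;
        row = rows_[pos_++];
        return true;
    }
private:
    std::vector<std::string> rows_;
    std::size_t pos_ = 0;
};

// Single-pass input iterator that fetches rows from the cursor on demand.
class CursorIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::string;
    using difference_type = std::ptrdiff_t;
    using pointer = const std::string*;
    using reference = const std::string&;

    CursorIterator() = default;  // end-of-results sentinel
    explicit CursorIterator(std::shared_ptr<Cursor> cursor)
        : cursor_(std::move(cursor)) { advance(); }

    reference operator*() const { return current_; }
    pointer operator->() const { return &current_; }
    CursorIterator& operator++() { advance(); return *this; }
    CursorIterator operator++(int) { CursorIterator old = *this; advance(); return old; }
    bool operator==(const CursorIterator& o) const { return cursor_ == o.cursor_; }
    bool operator!=(const CursorIterator& o) const { return !(*this == o); }

private:
    void advance() {
        // When the cursor is exhausted, drop it and become the end iterator.
        if (cursor_ && !cursor_->fetch(current_)) cursor_.reset();
    }
    std::shared_ptr<Cursor> cursor_;
    std::string current_;
};
```

Only the current row is held in memory, so a loop from the query's iterator to a default-constructed CursorIterator can walk a result set of any size.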

6.4 Multi-object Actions

Many actions on data span several objects [1]. For example, operations that fetch or store multiple objects access a large set of objects in the database. These operations should be optimized to take place inside a single transaction, and queries should be combined when possible. For example, when objects are fetched automatically as the application accesses them through a reference from another object, it may be that several objects are to be fetched. The access speed could be increased by allowing the persistence layer to prefetch objects in one large query. The problem is that the persistence layer cannot know when to do this kind of prefetching. One solution is described in [21], where the application programmer gives the persistence system hints about the paths along which to prefetch. In [1] a solution is described where the metadata used by the mapping includes hints for prefetching. The latter has the advantage that the behaviour can be tuned when needed simply by changing the metadata.
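As a small illustration of combining queries, the sketch below builds a single SELECT for a set of object identifiers instead of issuing one query per object. The function name buildBatchFetch and the column name id are assumptions made for the example, and a real implementation would use the query generator of the persistence mechanism in use rather than string concatenation.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Builds one SELECT that fetches all requested objects at once.
// The table name and the "id" column are hypothetical.
std::string buildBatchFetch(const std::string& table, const std::vector<long>& ids) {
    std::ostringstream sql;
    sql << "SELECT * FROM " << table;
    if (ids.empty()) {
        // An empty IN list is not valid SQL; select nothing instead.
        sql << " WHERE 1 = 0";
        return sql.str();
    }
    sql << " WHERE id IN (";
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (i != 0) sql << ", ";
        sql << ids[i];
    }
    sql << ")";
    return sql.str();
}
```

With prefetch hints in the metadata, the persistence layer could collect the identifiers reachable along a hinted path and pass them to such a function in one step.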

7 Implementation Language

The implementation language used is C++, an object oriented programming language derived from C [31]. C++ is designed to be compatible with C, to fully support the object oriented paradigm and to be efficient [31]. This requirement for efficiency is also inherited by libraries written in C++, and it is often the reason for selecting C++ as an implementation language. It is also the reason why we have selected it as the application level language of our embedded system (Section 6). It does not introduce much overhead compared to C, but it has a rich standard library and is object oriented, making it more effective for us in terms of implementation time. Efficiency, both in terms of execution speed and memory requirements, is in addition to object oriented flexibility a major goal for our persistence system. The ways to achieve it are discussed in Section 6. There are also some limitations set by using C++ as an implementation language when it comes to object persistence.


7.1 Exceptions for Error Handling

Object oriented applications use an error handling technique called exceptions [12]. When a routine fails, it creates an exception object that describes the error that has taken place. It then throws the exception, and some routine in the chain of callers of the failed routine catches it. Exceptions are superior to traditional error codes returned by routines, since the program does not have to check them after every call. The persistence layer should also provide an abstraction for database errors in the form of exceptions.
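One way to provide such an abstraction is a small exception hierarchy onto which the native error codes of each persistence mechanism are translated. The sketch below is illustrative only: the class names, the NativeStatus codes and the checkStatus helper are assumptions and do not describe any particular client library.

```cpp
#include <stdexcept>
#include <string>

// Base class for all errors reported by the persistence layer.
class PersistenceError : public std::runtime_error {
public:
    explicit PersistenceError(const std::string& what) : std::runtime_error(what) {}
};

class ConnectionError : public PersistenceError {
public:
    using PersistenceError::PersistenceError;
};

class QueryError : public PersistenceError {
public:
    using PersistenceError::PersistenceError;
};

// Hypothetical status codes of some C-style client library.
enum NativeStatus { NATIVE_OK = 0, NATIVE_NO_CONNECTION = 1, NATIVE_BAD_QUERY = 2 };

// Translates a native status code into an exception; returns on success.
void checkStatus(int status, const std::string& context) {
    switch (status) {
        case NATIVE_OK:
            return;
        case NATIVE_NO_CONNECTION:
            throw ConnectionError(context + ": connection lost");
        case NATIVE_BAD_QUERY:
            throw QueryError(context + ": query rejected");
        default:
            throw PersistenceError(context + ": unknown error " + std::to_string(status));
    }
}
```

The application can then catch PersistenceError in one place, or a more specific subclass where it can react meaningfully, without knowing which persistence mechanism produced the error.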

7.2 Lack of Reflectivity

Reflectivity is a feature of some object oriented programming languages that allows the program to access its type information at run time. In Java, for example, this feature can be used to determine the attributes of a class to be stored. C++ has only very limited reflective programming capabilities. This means that there must be some way to tell the persistence system which attributes of each class it should store [1, 15]. This couples the classes of the business domain more tightly to the persistence system, which is undesirable as discussed in Section 4.2. The solution we have used is described together with the example code in Section 16.
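To make the problem concrete, the following sketch shows one generic way a C++ class can describe its own attributes by registering pointers to members. It is not the solution of Section 16; the names AttributeInfo, ClassDescription and describe, and the conversion of values to text, are assumptions made for this example.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// One persistent attribute of class T: its name and accessors that
// convert the value to and from text.
template <typename T>
struct AttributeInfo {
    std::string name;
    std::function<std::string(const T&)> get;
    std::function<void(T&, const std::string&)> set;
};

template <typename T>
class ClassDescription {
public:
    explicit ClassDescription(std::string table) : table_(std::move(table)) {}

    // Registers a std::string member through a pointer to member.
    ClassDescription& add(const std::string& name, std::string T::*member) {
        attributes_.push_back({name,
            [member](const T& o) { return o.*member; },
            [member](T& o, const std::string& v) { o.*member = v; }});
        return *this;
    }

    // Registers an int member.
    ClassDescription& add(const std::string& name, int T::*member) {
        attributes_.push_back({name,
            [member](const T& o) { return std::to_string(o.*member); },
            [member](T& o, const std::string& v) { o.*member = std::stoi(v); }});
        return *this;
    }

    const std::string& table() const { return table_; }
    const std::vector<AttributeInfo<T>>& attributes() const { return attributes_; }

private:
    std::string table_;
    std::vector<AttributeInfo<T>> attributes_;
};

// Hypothetical domain class that lists its attributes explicitly.
struct Track {
    std::string title;
    int lengthSeconds = 0;

    static const ClassDescription<Track>& describe() {
        static const ClassDescription<Track> d =
            ClassDescription<Track>("track")
                .add("title", &Track::title)
                .add("length", &Track::lengthSeconds);
        return d;
    }
};
```

The cost named in the text is visible here: Track has to know about ClassDescription, which is exactly the coupling that reflection would avoid.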

8 Issues with Legacy Data and Applications

It is often the case that applications must be developed to co-operate with existing systems: existing databases, applications, web services and so on. This places a significant constraint on the application design. If the application must access a legacy database that is used by other applications, the database schemas are not easily changed.

The data obtained from previously existing systems is called legacy data [2]. Data can be exported from existing applications or it can exist in databases. It may be enough to import the legacy data into the new application once, but often the old and new systems must coexist. In this case the new application may also have to be able to update the existing data.

Usually the existing data and applications do not reflect the requirements of the new application. If they did, there would probably be no need for a new application. It might be that information is missing, or that the database includes extra information that must be handled in case of updates. It is also common that a database used for a long time in an organisation has 'grown out of its schema', meaning that as requirements have changed the database structures have become outdated. New information is added to existing fields that are then parsed in applications, or some fields have become unnecessary. All this makes accessing legacy data a complicated issue.

Even though legacy data issues must be taken into account in application design, they should not be the driving force of the design [2]. One should still focus on the requirements set for the application, design the application from scratch and then consider what kind of database structures are needed and how well existing databases support them [6]. In these questions a capable persistence layer may come in handy. A persistence layer can support connections to different databases and data exchange formats to import and export data to and from the application. A persistence layer can also provide flexibility to the development of the application, so that the user requirements can be satisfied easily, including the requirement of being able to access pre-existing data.

8.1 Several Persistence Mechanisms

By a persistence mechanism we mean a storage system in which objects can be stored between program runs. There are several different persistence mechanisms available, ranging from flat files to object oriented databases and web services. All of these technologies have multiple vendors with their own solutions. Often an application is written to support the one persistence mechanism that is currently in use and, if required, ported later to other mechanisms. A full-featured persistence layer should support a large set of different persistence mechanisms [1]. This allows the application to be ported to different persistence mechanisms with minimal effort. When all the details of the storage systems used are completely hidden from the application programmer, the application becomes independent of the persistence mechanism used and is coupled only to the persistence layer. If the persistence layer supports plug-ins for different persistence mechanisms that can be linked dynamically with the application, a large set of mechanisms can be supported while still keeping the library small enough to be ported to embedded systems. One important requirement for the support of multiple persistence mechanisms is that the system should still be able to take full advantage of each of them. This makes the design of the persistence layer more complicated. One possible design is discussed in [1].

8.2 Multi-object Actions

Many actions on data span several objects, and these should be executed efficiently. Each object should not generate a query of its own. Instead, queries concerning different objects should be grouped together.

8.3 Multiple Connections

A persistence layer should be able to handle multiple connections to different persistence mechanisms simultaneously [1]. This gives it the ability to transfer data from one persistence mechanism to another, or from one version of a database schema to another, which adds considerable flexibility and extensibility (Section 4.3) to the system. If a persistence layer supports several different persistence mechanisms, as described in Section 8.1, together with multiple connections, it can be used as a powerful tool to export and import data between different persistence systems. For example, legacy data or data generated by legacy applications can be imported from XML and stored to a relational database, or data from relational databases can be queried and stored to XML to be used with third party applications. All this requires only minimal additions to the application code.

9 Requirements for a Persistence Layer

A general-purpose layer like the persistence layer has several stakeholders, each setting different requirements for the layer. The stakeholders are summarised in Table 2.

ID | Stakeholder | Description
S1 | Application Developer | Programmer that uses the persistence layer in his application.
S2 | User of the Application | User that uses an application taking advantage of the persistence layer.
S3 | Database Administrator | Administrator responsible for database management.
S4 | Third party Application Developer | Programmer developing third party applications interacting with the one using the persistence layer.
S5 | Maintenance Developer | Programmer responsible for further development of the application after it has been taken into production use.

Table 2: Stakeholders of the persistence layer

The main user of the persistence layer is the application developer, who uses the persistence layer in his application. The developer wants to use persistence services with as little effort as possible without losing the flexibility of object oriented programming. He also has to deal with existing legacy data.

Whether the application is interactive or not, it always has a user in some sense. For an application user, the usage of a persistence layer should be invisible. One important aspect of this is that the persistence layer handles different error conditions properly. The user should also be unable to notice other users accessing the database simultaneously, and no extra latency should be generated by the persistence layer.

The database administrator should be able to take advantage of the database and version independence provided by the persistence layer. He can switch from one database to another when needed, and data can be easily transferred between database management systems, program versions and other applications.

A third party developer wants to be able to exchange data with the application in a standard format that can be easily integrated into her application. She does not want to pay any attention to the internal behaviour of the application she interacts with.

The maintenance developer must handle issues with existing data that previous versions of the application have produced. Still, she must be able to produce new features effectively and to refactor the old software while maintaining its quality.

An overview of the architecture of a system taking full advantage of a capable persistence layer is shown in Figure 9, modelled after [1] and [6]. The figure indicates the dependencies between the components of the system. The persistence layer provides transparent storage services for the classes of the problem domain, hiding the database totally from the application. In addition, the persistence layer can provide services to export data in different formats; in Figure 9, for example, the persistence layer transparently exports some data to XML that can then be used by other applications. The persistence layer can also help in dealing with legacy data, a problem that most database developers have to face. The database itself does not depend on the persistence layer, so it can also be accessed without it.

Figure 9: Role of the persistence layer in a software system

From the literature we have found a number of problems involved with persistence layers that should be addressed. They are covered in the previous sections and are summarized in Table 3. We list the properties that a persistence layer should have based on [1], [2] and [6]. The features are described in the form of statements that can also be used to validate the design and implementation of a persistence layer. The stakeholders concerned and a priority for each feature are also included in the table.

ID | Feature | Description | Priority | Stakeholder
F1 | Minimize modelling effort | The persistence layer should be able to generate the initial database schema and mapping metadata. Section 5.1. | medium | S1, S3
F2 | Store single objects | The system should be able to save and restore objects to and from a relational database. Section 5.2. | high | S1, S2
F3 | Manage object relationships | The system should be able to maintain the relationships between stored objects and maintain referential integrity. Section 5.5. | high | S1, S2
F4 | Lazy initialisation | The system restores objects from the database when they are referenced through another object. Section 4.1. | medium | S1
F5 | Store inheritance hierarchies | The system should be able to store inheritance hierarchies to the database. Section 5.6. | high | S1
F6 | Object identity | The persistence layer must preserve the identity properties of objects. Section 5.3. | high | S1
F7 | Store collections | The persistence layer is able to store collections of objects preserving their ordering properties. Section 5.4. | high | S1
F8 | Represent query results | The persistence layer is able to map arbitrary query results to objects. Section 5.9. | low | S1
F9 | Several persistence mechanisms | The system should be able to save objects to different persistence mechanisms, such as different databases and XML files. Section 8.1. | medium | S1, S5
F10 | Full encapsulation of persistence mechanism | It should be possible to change the persistence mechanism without modification to the application code. Section 4.2. | medium | S1, S3, S5
F11 | Support transactions | The persistence system should use transactions so that concurrent access to the objects in the mechanism does not corrupt the data. Section 3.4. | high | S1, S3
F12 | Extensibility | The persistence layer should support addition of classes to existing class models and updates to the database schema. Section 4.3. | medium | S1, S3
F13 | Locking | The persistence layer should provide locking for objects in a persistence mechanism. Section 3.5. | medium | S1, S3
F14 | Access control | The persistence layer can control the access to objects in a persistence mechanism. Section 19.2. | low | S1, S2, S4
F15 | Cursors | When multiple objects are fetched at once, the persistence layer should be able to use a database cursor to fetch them one at a time. Section 3.6. | medium | S1
F16 | Proxy Objects | When only a few of the attributes of an object are needed, a proxy object with only the needed attributes is fetched. The others are fetched when needed. Section 6.1. | low | S1
F17 | Cache | The persistence layer uses a cache to speed up database operations. Section 6.2. | low | S1
F18 | Multiple connections | The persistence layer should be able to handle multiple simultaneous connections to persistence mechanisms so that it can transfer objects from one persistence mechanism to another. Section 8.3. | medium | S1, S2, S4
F19 | Application interface | The persistence layer should provide the application programmer with a simple and clean interface to the persistence services that is compatible with the C++ standard template library and takes the limitations of the language into account. Section 7. | high | S1, S5
F20 | Error handling | The system should provide an object oriented abstraction, in terms of exceptions, for the different error handling systems of different persistence mechanisms. Section 7.1. | high | S1, S5

Table 3: Common Problems With Persistence Layers

9.1 Questions We Are Seeking Answers To

The questions that we are seeking answers to in this thesis are listed in Table 4. The answers to these questions are discussed in Section 17.

ID | Question | Description
Q1 | Is it worth the effort to build a persistence layer between applications and relational databases? | It might be that a general purpose persistence layer turns out to be such a complicated system that implementing one would not be worth the effort.
Q2 | Does the usage of a persistence layer require extra effort from the application programmer? | If the persistence layer sets a lot of limitations on the application, its usage may become a burden. Does the effort spent by the application programmer pay back both in the short term and in the long term?
Q3 | Is it possible to provide extra flexibility for iterative development? | A real advantage of a persistence layer would be that the application could be developed iteratively without paying too much attention to the database structures in a relational database.
Q4 | Is it possible to provide flexible access to legacy data? | It would have real added value if a persistence layer could provide access to different formats of pre-existing data.
Q5 | Is it possible to provide easy transition between different databases? | If a persistence layer could encapsulate all the database related operations, the application would become truly independent of the database, which would make transition from one database to another fairly easy.
Q6 | Is it possible to achieve both flexibility and efficiency? | Many questions related to a persistence layer involve a compromise between flexibility and performance. We are looking for solutions that try to achieve both to some degree. When this is not possible, we examine whether the persistence layer can be tuned per application to meet the requirements of the task at hand.
Q7 | Is it possible to use a persistence layer in our embedded systems? | If a persistence layer can be implemented efficiently enough in terms of execution speed, memory usage and code size, and if it does not depend on too many libraries, it could be used on our embedded Linux system.
Q8 | What kind of process changes does the introduction of a persistence layer require? | When integrating a persistence solution at the process level, some changes may have to be made to the traditional way of doing things. What kind of changes are required, and what are their possible advantages and disadvantages?

Table 4: Question set for this thesis

9.2 Goals for Our Implementation

In our solution the main focus is on rapid, flexible development, without forgetting the performance limitations set by an embedded system. Within the scope of this thesis no full-featured implementation is given; instead we present a design that we have tested with a simple test implementation. The main goals are to reduce the effort spent on data modelling, to make it easy to adopt an iterative, flexible process while using relational databases, and to promote the reuse of persistence related code.


Although a persistence layer probably introduces some performance penalty, we are trying to keep it moderate enough to make the solution usable on our embedded system. In addition to flexibility in development, we are trying to provide flexibility at the data access level. We will examine the possibility of changing the mapping of the class data model to a relational model for an existing application, for example for database optimisation. Though we concentrate on relational databases, which we are trying to abstract away from the application developer, we do not exclude the possibility that the underlying persistence mechanism could be something completely different, for example XML files. For legacy data access, we provide multiple simultaneous connections to exchange data between persistence mechanisms.

10 Our Solution

In the following, the architecture of our solution is described. We use the 4 + 1 viewpoints described in [14], except that instead of the physical viewpoint we use a data viewpoint. The use of different viewpoints is also promoted by IEEE standard 1471 [20]. We also provide references to the features listed in Table 3 to provide traceability for the different design solutions.

We begin with the logical view in Section 11, the static structure of the system described by class diagrams. Then we discuss the development view of the persistence layer in Section 12. Here questions like the organisation of code into libraries and linking to applications are discussed. Solutions to issues related to performance, concurrency and other dynamic characteristics of the system are described in Section 13. In Section 14 we describe the database structures used and generated by the persistence layer. Finally, in Section 15 we provide scenarios that interconnect these different views. In each scenario a typical request from the application to the persistence layer is described, together with illustrations of how the components of the persistence layer interact to fulfil the request.

11 Logical View

Here the static structure of the persistence layer is described. We start from the high-level architecture and then descend to its parts. Figure 10 illustrates the overall architecture of the persistence layer; only the major classes are displayed.

Class PersistentObject defines a templatized interface to be implemented by the classes that are stored through the system. It requires them to implement the method GetClassData, which is used to overcome the lack of reflectivity in C++. PersistentObject is also used to manage the state of the objects using the state design pattern described in [10]. The issues related to object states are discussed further in Section 13. The GetClassData method returns an object of type ClassData, an interface used to describe the structure of a class. It is implemented by the ClassDataObject template class as illustrated in Figure 12.

The singleton class ClassManager is used to manage the class information and to form an object oriented representation of the metadata describing the data structures to be stored through the persistence system. All the attributes of the classes to be stored are listed in this structure. Templates are used to generalize over different attribute types. This representation of the data model is stored to the persistence mechanism in question. It can then be tuned for performance and to map classes to a different database structure.

The namespace PersistenceMechanism describes the interface that a persistence mechanism has to implement. The idea is to fully abstract the persistence mechanism both from the application and from the rest of the persistence layer. This allows the application to change the persistence mechanism when needed, as stated by feature F9 in Table 3. The abstract class PersistenceConnection hides the details of a persistence mechanism from the application. All the operations on a single persistence mechanism are passed through this interface. When the application operates on multiple persistence mechanisms, it uses one object of this type for each of them to fulfil feature F18.

The abstract class Generator is used to generate queries from their object representations, as stated by features F19 and F10. The interface does not define any operations, since not all persistence mechanisms provide all the possible functionality. When queries are generated into the representation specific to the persistence mechanism, C++ templates are used to provide compile time checks for the syntax of the queries and to verify that the mechanism used supports the features requested.

The Transaction interface is used to abstract the transactions of the persistence mechanisms, as required by feature F11. The interface class automatically rolls back uncommitted transactions in its destructor. This way transactions can be used in a natural object oriented manner, where an exception thrown in the middle of a transaction automatically reverses the changes made. The transaction objects are created using a factory method in class PersistenceConnection, which is a friend class of the Transaction class, which in turn has a private constructor. This way transactions are always bound to a connection to the persistence system.

Just like transactions, database cursors are abstracted in an object oriented manner. The abstract class PersistenceIterator is a subclass of the standard library iterator that is to be implemented by the persistence mechanisms. As concepts, iterators and cursors are quite similar, so this abstraction gives object oriented applications very easy access to cursors.

The ObjectManager class is responsible for all the operations performed on the objects. It adds an object oriented layer over the abstraction of the persistence mechanisms. It converts the persistence related object operations to a form understandable by the PersistenceConnection interface. The routines used to fetch objects return iterators. The iterator class used can be extended by a persistence mechanism to provide an abstraction for cursors, as stated by feature F15. Proxy objects and lazy initialisation, features F4 and F16, would also be responsibilities of the object manager if implemented. When the persistence layer is extended to support different mappings, the object manager can be divided into a set of classes that encapsulate the mapping operations behind a common interface to be called from the ObjectManager. This way there could be a few different classes fulfilling this interface and a factory method to create the proper mapping object for each class to be stored.

The singleton class PersistenceFacade is used to manage the different operations on the persistence layer. Classes are registered through it, and it can be used to perform operations on the main persistence mechanism. The main persistence mechanism is the one that is used to generate object identifiers and that serves as the primary storage for objects. The application does not have to take care of the management of this connection. The application can also create additional connections to save objects to different persistence mechanisms.
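The transaction abstraction described above can be sketched as follows. Only the class names PersistenceConnection and Transaction come from the text; the method names begin and commit, and the log that stands in for a real persistence mechanism, are assumptions made for the example. The essential points are the private constructor, the friend relationship and the rollback in the destructor.

```cpp
#include <string>
#include <vector>

class Transaction;

// Simplified stand-in for the connection abstraction. It only records the
// commands that a real persistence mechanism would receive.
class PersistenceConnection {
public:
    Transaction begin();           // factory method for transactions
    std::vector<std::string> log;  // illustration only
};

class Transaction {
public:
    // Movable so that the factory method can return it by value.
    Transaction(Transaction&& other) noexcept
        : connection_(other.connection_), open_(other.open_) {
        other.open_ = false;
    }
    Transaction(const Transaction&) = delete;
    Transaction& operator=(const Transaction&) = delete;

    // An uncommitted transaction is rolled back when it goes out of scope,
    // for example while an exception unwinds the stack.
    ~Transaction() {
        if (open_) connection_.log.push_back("ROLLBACK");
    }

    void commit() {
        if (open_) {
            connection_.log.push_back("COMMIT");
            open_ = false;
        }
    }

private:
    friend class PersistenceConnection;  // only a connection may create one
    explicit Transaction(PersistenceConnection& c) : connection_(c), open_(true) {
        connection_.log.push_back("BEGIN");
    }

    PersistenceConnection& connection_;
    bool open_;
};

inline Transaction PersistenceConnection::begin() { return Transaction(*this); }
```

Application code obtains a transaction with conn.begin(), performs its store operations and calls commit() at the end; if any operation throws before that, the destructor issues the rollback without further code in the application.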

Figure 10: The Architecture of The Persistence Layer


Figure 11: Abstraction of Persistence Mechanism

Figure 12: Class structures used to describe the data model of the application

11.1 Representing Queries as Objects

As mentioned in Section 5.8, queries should be generated by the application in a way that completely decouples it from the persistence mechanism. This design is intended to fulfil requirement F10 in Table 3. Figure 13 describes one way to present queries as objects. It is partly modelled after [1], but modified to take the implementation language into account. In C++, templates provide a way to check the syntax of queries against a database at compile time. This class structure also abstracts the connectivity to the database by encapsulating data access behind the interface class PersistenceMechanism, as described in Figure 11.

A query can be one that fetches objects from a database, one that updates existing objects or one that inserts a new one. Queries that update or fetch objects are given the conditions that an object should meet to be affected by the query. The complete class structure describing a where clause is shown in Figure 14. This object representation is then turned into an actual query by an object representing the current persistence mechanism, inherited from the abstract base class PersistenceMechanism. This way the abstract representation of a query can be translated into one that can be run effectively on the database currently in use, without coupling the application to the database [1]. Our abstraction, like the one presented in [1], is not nearly as powerful as SQL, but it is designed to be good enough for our persistence layer. Full-featured object oriented query languages are far beyond our scope.
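A much reduced sketch of a query represented as objects is given below. It is not the class structure of Figures 13 and 14: the names Condition, Equals, And, FetchQuery and generateSql are assumptions, the conditions render themselves to SQL instead of being visited by a mechanism specific generator, and values are embedded without escaping, which a real implementation would replace with bound parameters.

```cpp
#include <memory>
#include <string>
#include <utility>

// Node of a where-clause tree.
class Condition {
public:
    virtual ~Condition() = default;
    virtual std::string toSql() const = 0;
};

// Leaf condition: attribute equals a literal value (no escaping; sketch only).
class Equals : public Condition {
public:
    Equals(std::string attribute, std::string value)
        : attribute_(std::move(attribute)), value_(std::move(value)) {}
    std::string toSql() const override {
        return attribute_ + " = '" + value_ + "'";
    }
private:
    std::string attribute_;
    std::string value_;
};

// Composite condition: both sub-conditions must hold.
class And : public Condition {
public:
    And(std::unique_ptr<Condition> left, std::unique_ptr<Condition> right)
        : left_(std::move(left)), right_(std::move(right)) {}
    std::string toSql() const override {
        return "(" + left_->toSql() + " AND " + right_->toSql() + ")";
    }
private:
    std::unique_ptr<Condition> left_;
    std::unique_ptr<Condition> right_;
};

// Fetch query for the objects of one class.
struct FetchQuery {
    std::string className;
    std::unique_ptr<Condition> where;  // null means "fetch all"
};

// Generator for an SQL based mechanism; another mechanism would supply
// its own function for the same FetchQuery object.
std::string generateSql(const FetchQuery& query) {
    std::string sql = "SELECT * FROM " + query.className;
    if (query.where) sql += " WHERE " + query.where->toSql();
    return sql;
}
```

The application only builds the FetchQuery tree; which text, if any, is produced from it is decided by the persistence mechanism in use.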

Figure 13: Class Structure Representing a Query

12 Development View

Here we describe the actual software module organisation of the persistence layer. This is a non-trivial question, since as a library the persistence layer must be able to adapt itself to different software configurations. We must also briefly cover the application that uses our persistence layer, since the library is of little use by itself.

Since one of the goals of our system is that it runs on different persistence mechanisms, it is itself a client of many libraries. There are different ODBC client libraries, native client libraries for different databases and XML parser libraries, just to mention a few. The persistence layer cannot require all of these libraries to be installed. Especially in embedded systems it is extremely important to be as independent of third party libraries as possible.

The persistence solution designed by us is divided into several libraries to reduce dependencies. The idea is to split the abstraction of each persistence mechanism into a library of its own. These libraries depend on the third party libraries necessary to access the persistence mechanism in question and on the main persistence layer library. This way the application can be linked against only the persistence mechanisms really needed. This is especially important on platforms where dynamic linking is not supported. The organisation of code into libraries is described in Figure 15. The arrows in the figure describe the dependencies among the different software components.


Figure 14: Class Structure Representing a Where Clause

Figure 15: Organisation of the software components of the persistence layer

13 Process View

Here we discuss the dynamic nature of the persistence layer. Questions such as efficiency, in terms of both execution speed and memory usage, and concurrency issues are addressed. References to the features listed in Table 3 are given where appropriate.

Since our goal is to use the persistence layer in an embedded environment, efficiency is an issue for us. Our design is kept as simple as possible to keep the library small. This simplicity also aims at faster execution. Since the database operations are likely to take most of the time, they should be optimized as well as possible. The persistence layer should not generate too many queries, to avoid extra overhead. This is done by grouping the queries together as much as possible, using prefetch information stored together with the metadata described in Section 14.

Since most of the applications in our environment do not have to keep all the objects fetched from the database in memory at once, the persistence layer should not require this either. This is why the abstraction of database cursors is so vital to our solution: it makes it possible to access large sets of objects in the database without instantiating all of them simultaneously, which makes the memory usage of the application independent of the number of objects processed.

As can be seen in Figure 10, the state of an object is held in the PersistentObject abstract class. The state is modelled with classes that share a common interface, as described in the State design pattern in [10]. The state-dependent functionality is placed in the class representing the state in question. In our design this state class is used as a visitor class in ObjectManager when persistence operations are executed. The main difference in behaviour is, for example, how an object is stored. If the object is new, it first has to be created in the persistence mechanism. If it is in the dirty state, it is simply updated. If it is clean, it does not have to be stored at all. A conservative approach is taken to the dirtiness of objects: it cannot be fully detected whether an object has changed or not, so it is assumed to be dirty whenever unsure.
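The state-dependent store behaviour described above can be sketched as follows. This is a simplified illustration rather than the design of Figure 10: the state objects here act directly on a log of operations instead of visiting an ObjectManager, and the names ObjectState, OperationLog, MarkDirty and the three state classes are assumptions.

```cpp
#include <string>
#include <utility>
#include <vector>

// Stand-in for the persistence mechanism: records the operations issued.
struct OperationLog {
    std::vector<std::string> operations;
};

// Common interface of all state classes (State pattern).
class ObjectState {
public:
    virtual ~ObjectState() = default;
    // Stores the object identified by id and returns the resulting state.
    virtual const ObjectState& Store(const std::string& id,
                                     OperationLog& log) const = 0;
};

const ObjectState& NewState();
const ObjectState& DirtyState();
const ObjectState& CleanState();

// A new object must first be created in the persistence mechanism.
class NewStateImpl : public ObjectState {
public:
    const ObjectState& Store(const std::string& id, OperationLog& log) const override {
        log.operations.push_back("INSERT " + id);
        return CleanState();
    }
};

// A dirty object already exists and is simply updated.
class DirtyStateImpl : public ObjectState {
public:
    const ObjectState& Store(const std::string& id, OperationLog& log) const override {
        log.operations.push_back("UPDATE " + id);
        return CleanState();
    }
};

// A clean object does not have to be stored at all.
class CleanStateImpl : public ObjectState {
public:
    const ObjectState& Store(const std::string&, OperationLog&) const override {
        return *this;
    }
};

// The states carry no data, so one shared instance of each is enough.
const ObjectState& NewState()   { static NewStateImpl s;   return s; }
const ObjectState& DirtyState() { static DirtyStateImpl s; return s; }
const ObjectState& CleanState() { static CleanStateImpl s; return s; }

class PersistentObject {
public:
    explicit PersistentObject(std::string id)
        : id_(std::move(id)), state_(&NewState()) {}

    // Conservative rule: any possible change turns a clean object dirty.
    void MarkDirty() {
        if (state_ == &CleanState()) state_ = &DirtyState();
    }

    void Store(OperationLog& log) { state_ = &state_->Store(id_, log); }

private:
    std::string id_;
    const ObjectState* state_;
};
```

Storing a fresh object twice in a row therefore issues one insert and nothing else; an update is issued only after MarkDirty has been called.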
In a simple implementation, objects are practically always dirty.

The abstraction for transactions described in Section 11 is used to keep the database in a consistent state. The persistence layer generates transactions for database operations, but it should also be possible for the application to generate transactions. The PersistenceConnection class keeps track of the ongoing transactions on the persistence mechanism in question, so that the persistence layer knows whether a persistence operation is performed in the middle of an application-initiated transaction or whether it has to generate one implicitly. As mentioned in Section 11, the transaction interface class automatically rolls back unfinished transactions in its destructor, which allows transactions to interact with exception-based error handling.

In addition to transactions, a higher-level concurrency solution should also be developed. The option of an optimistic locking scheme is taken into account in the database design, as can be seen in Section 14. The timestamp of each object is stored as an additional shadow attribute; if a collision is detected, as described in Section 3.5 and in [2], an exception is thrown to the application. For a pessimistic locking scheme, application-initiated transactions can be used.

As discussed above, all database errors are delivered to the application in the form of exceptions to support native object-oriented error handling, as stated in feature F20. Combined with proper object-oriented transaction support, this reduces the effort the application programmer has to spend on handling database-related error conditions.
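The interplay of transactions and exceptions can be illustrated with a small sketch. It assumes a transaction class that rolls back in its destructor unless Commit was called, and a connection that counts open transactions so that a store operation can tell whether it runs inside an application-initiated transaction. The class PersistenceConnection is named in the text; its members here, the Transaction class interface, and the functions DoStore and StoreObject are hypothetical, and the connection only records what it would send to the database.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for PersistenceConnection: logs commands instead of executing them.
class PersistenceConnection {
public:
    void Begin()    { log.push_back("BEGIN");    ++open_; }
    void Commit()   { log.push_back("COMMIT");   --open_; }
    void Rollback() { log.push_back("ROLLBACK"); --open_; }
    bool InTransaction() const { return open_ > 0; }

    std::vector<std::string> log;

private:
    int open_ = 0;
};

// Rolls back automatically if the scope is left without Commit(),
// for example because an exception propagates through it.
class Transaction {
public:
    explicit Transaction(PersistenceConnection& conn)
        : conn_(conn), finished_(false) {
        conn_.Begin();
    }
    ~Transaction() {
        if (!finished_) conn_.Rollback();
    }
    void Commit() {
        conn_.Commit();
        finished_ = true;
    }

    Transaction(const Transaction&) = delete;
    Transaction& operator=(const Transaction&) = delete;

private:
    PersistenceConnection& conn_;
    bool finished_;
};

// Hypothetical database operation; 'fail' simulates a database error.
void DoStore(PersistenceConnection& conn, bool fail) {
    conn.log.push_back("INSERT");
    if (fail) throw std::runtime_error("database error");
}

// Joins an application-initiated transaction if one is open,
// otherwise wraps the operation in an implicit transaction.
void StoreObject(PersistenceConnection& conn, bool fail = false) {
    if (conn.InTransaction()) {
        DoStore(conn, fail);
        return;
    }
    Transaction tx(conn);
    DoStore(conn, fail);
    tx.Commit();
}
```

In a real implementation the rollback itself could fail; a destructor should not let such an error escape, so it would have to be caught or logged there.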

14 Data View

In this section we describe the data model that the persistence layer uses to store data in a persistence mechanism. The main focus is on mapping a class structure to a relational database, but the same data model may be applied to other persistence mechanisms.

Figure 16: The meta data representing the class structure of an application

The persistence layer can generate a data model from the class structures in the application. This data model forms the basis of the metadata that describes the mapping of the classes to the relational database. The data structure used to store the metadata is illustrated in Figure 16.³ The database structures were first defined in normal form and then optimized for performance by denormalisation, as described in [2]. The idea of metadata-based mapping is to reduce the coupling between the database and the application [1]. The metadata can be customized to adapt the mapping to changes in the database structure and to optimize application performance.

³ UML is used for data modelling here. There is no standard for this yet, but we use the syntax used in [2].

As described in Figure 16, the metadata describes classes, their attributes, and the relationships between classes. A class carries information about how it is mapped to tables. It can use a general mapping or it can be mapped to a table of its own, as stated in Section 5.2. If the mapping of a single class to a single table is used, the table and columns are specified in the metadata. If the metadata model is compared to the generic structure presented in Figure 7, it can be seen that the metadata is actually a subset of this general structure. This way a general structure representing objects can be converted into a one-table-to-one-class mapping without changes in the application, which allows the application development team to change the mapping when needed.

The relationship table in Figure 16 is used to capture the versatile nature of inter-object relationships. When the persistence layer generates the metadata for storing objects, it gives default values for the relationship table based on the relationships between classes. As discussed in Section 4.1, the type of the relationship between objects gives hints about the cascading operations on objects. For an inheritance relationship all cascading operations are used and the multiplicities are fixed, but for an association no cascading operations are used by default. These attributes can be customized to optimize the memory requirements and execution speed of the application.

The version table in Figure 16 stores a running version number for the data model. This issue is not fully addressed in our work, but it is discussed further in Section 19.3.
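To make the role of the metadata more concrete, the sketch below shows metadata records for a class mapped to a table of its own, the default cascading values for the two relationship types, and a statement generated purely from the metadata. The struct layout, field names and the function names DefaultRelationship and InsertStatement are assumptions made for illustration; they do not reproduce the tables of Figure 16.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class RelationKind { Inheritance, Association };

// One attribute of a class and the column it is mapped to.
struct AttributeMeta {
    std::string name;
    std::string column;
};

// One class and the table it is mapped to (one-class-to-one-table mapping).
struct ClassMeta {
    std::string name;
    std::string table;
    std::vector<AttributeMeta> attributes;
};

// A relationship between two classes with its cascading settings.
struct RelationshipMeta {
    std::string fromClass;
    std::string toClass;
    RelationKind kind;
    bool cascadeStore;
    bool cascadeRestore;
    bool cascadeDelete;
};

// Defaults as described in the text: inheritance cascades every operation,
// an association cascades none. The values may be customized afterwards.
RelationshipMeta DefaultRelationship(const std::string& from,
                                     const std::string& to,
                                     RelationKind kind) {
    const bool cascade = (kind == RelationKind::Inheritance);
    return RelationshipMeta{from, to, kind, cascade, cascade, cascade};
}

// Builds a parameterised INSERT from the metadata alone, so renaming a
// table or column only requires a metadata change.
std::string InsertStatement(const ClassMeta& meta) {
    std::string columns;
    std::string placeholders;
    for (std::size_t i = 0; i < meta.attributes.size(); ++i) {
        if (i > 0) {
            columns += ", ";
            placeholders += ", ";
        }
        columns += meta.attributes[i].column;
        placeholders += "?";
    }
    return "INSERT INTO " + meta.table + " (" + columns + ") VALUES (" +
           placeholders + ")";
}
```

For example, a Music class whose attribute name is mapped to a column called title would yield an INSERT naming the title column, while the application continues to refer to the attribute as name.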

15 Scenarios

In this section we describe a couple of scenarios showing how our persistence layer design performs its basic tasks, storing and restoring objects.

When the application first creates a persistent object, it is like any other object in the application. It is created and initialized by its constructor. Its superclass PersistentObject sets the state of the object to New. When the object is stored through the persistence layer, as on line 18 of the code example in Figure 19, the persistence facade is called. It calls the object manager with the default persistence connection of the application, which in this case would be a connection to a relational database. The object manager then uses the methods defined in the PersistentObject interface class to read the attribute values of the object to be stored. It uses the information stored in ClassManager to map these attributes to a set of query objects. After this it uses the persistence connection to start a transaction and passes the query objects to the connection through the interface class PersistenceConnection, which generates the proper queries for the persistence mechanism in question. The object manager commits the transaction and returns, and the object is now stored in the persistence mechanism.

In the code example in Figure 19 an object is also restored from the persistence mechanism using a simple query that defines a condition on the attributes of the object. As when storing an object, the PersistenceFacade calls the ObjectManager to restore the objects from the default persistence mechanism. The object manager creates query objects to read the objects and combines the condition provided by the application with this query. Again the mapping information is obtained from the ClassManager. The query is then passed to the PersistenceConnection interface, which returns an abstraction of a database cursor for the persistence mechanism in question. The information returned by this cursor is then used to generate objects in the iterator that the ObjectManager returns to the application through the persistence facade.
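The restore scenario relies on an iterator that turns cursor rows into objects one at a time. A minimal sketch of that idea follows, assuming a row is a list of strings and using an in-memory cursor in place of a real database. The names Cursor, VectorCursor and MusicIterator, the AtEnd/Current/Advance interface, and the simplified Music struct are hypothetical and stand in for the classes of the actual design.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Row = std::vector<std::string>;

// Abstraction of a database cursor: hands out one row at a time.
class Cursor {
public:
    virtual ~Cursor() = default;
    // Copies the next row into 'row'; returns false when no rows are left.
    virtual bool Next(Row& row) = 0;
};

// In-memory stand-in for a cursor returned by a persistence connection.
class VectorCursor : public Cursor {
public:
    explicit VectorCursor(std::vector<Row> rows)
        : rows_(std::move(rows)), pos_(0) {}

    bool Next(Row& row) override {
        if (pos_ >= rows_.size()) return false;
        row = rows_[pos_++];
        return true;
    }

private:
    std::vector<Row> rows_;
    std::size_t pos_;
};

// Simplified application object for this sketch.
struct Music {
    std::string name;
    std::string artist;
};

// Holds only the current object, so memory use does not grow with the
// number of rows the query matches.
class MusicIterator {
public:
    explicit MusicIterator(std::unique_ptr<Cursor> cursor)
        : cursor_(std::move(cursor)) {
        Advance();
    }

    bool AtEnd() const { return !valid_; }
    const Music& Current() const { return current_; }

    // Fetches the next row and builds the object for it.
    void Advance() {
        Row row;
        valid_ = cursor_->Next(row) && row.size() >= 2;
        if (valid_) current_ = Music{row[0], row[1]};
    }

private:
    std::unique_ptr<Cursor> cursor_;
    Music current_;
    bool valid_ = false;
};
```

An application would loop with AtEnd and Advance and read each object through Current, never holding more than one restored object unless it chooses to copy them.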

16 Using the Persistence Layer from an Application

In this section I briefly describe how an application uses the persistence layer at code level, using code examples and class diagrams as illustration. Since C++ lacks reflective programming capabilities, the application classes must be coupled to the persistence framework. Figure 17 describes the organisation of classes that use the persistence framework. The classes in the application domain that are to be stored using the persistence layer inherit this capability from the class PersistentObject provided by the persistence layer. When storing and retrieving objects, the application calls services provided by PersistenceFacade, a singleton class acting as a portal to all the services provided by the persistence layer. The actual C++ code to define the Music class is shown in Figure 18.⁴

Figure 17: Example of Persistence Layer Usage

As seen on line 3 of the code example, the class Music inherits its persistence capabilities from the PersistentObject template class, to which the type of the Music class itself is passed as a template argument. On line 9 the default constructor of the Music class calls the constructor of PersistentObject, and on lines 15 to 20 an abstract method of PersistentObject is implemented to return the set of attributes to be stored in the persistence mechanism by the persistence layer. The reference to the Category class is defined in the same way.

⁴ Storing the artist as a string is not good object design, but it serves well in this simplified example.


 1  /** Class that inherits persistence
 2      capabilities from PersistentObject */
 3  class Music : public Persistence::PersistentObject<Music> {
 4  public:
 5
 6      /**
 7         Constructor calling the constructor of the PersistentObject
 8      */
 9      Music() : PersistentObject() {}
10
11      /** Method to get attributes of this class. Used by the
12          persistence layer to set and get attribute values of the
13          object
14      */
15      virtual Persistence::Attributes GetAttributes() {
16          return Persistence::Attributes()
17              .AddAttribute("name", name)
18              .AddAttribute("artist", artist)
19              .AddReference(category);
20      }
21
22      void SetName(std::string name);
23      void SetArtist(std::string artist);
24
25  private:
26
27      std::string name;
28      std::string artist;
29
30      /** Reference to a category object that will
31          be retrieved only when referenced.
32      */
33      Persistence::LazyReference<Category> category;
34  };

Figure 18: Code to Define a Class of Persistent Objects


The code to store and retrieve persistent objects through the persistence layer is shown in Figure 19.

 1  int main() {
 2      PersistenceFacade* facade =
 3          PersistenceFacade::GetInstance();
 4
 5      /* Connection object to connect persistence
 6         layer to a PostgreSQL database */
 7      facade->connect("username",
 8                      "password",
 9                      "127.0.0.1");
10
11      /* Create new object using facade */
12      Music* music = new Music;
13
14      music->SetName("All You Need Is Love");
15      music->SetArtist("Beatles, The");
16
17      /* Store the object */
18      facade->StoreObject(music);
19
20      delete music;
21
22      /* Get all music objects where name equals
23         'All You Need Is Love' */
24      for (Persistence::iterator i = facade->RestoreObjects(Attr("name") == Value("All You Need Is Love")); i != i.end(); ++i) cout