Data Modeling with the UML
Michael R. Blaha, DSc. Modelsoft Consulting Corporation E-mail:
[email protected] Web: www.modelsoftcorp.com
Chicago DAMA August 17, 2011
2011 Michael Blaha
Data Modeling with the UML
1
Section 1: Introduction
What is the UML? The UML (Unified Modeling Language) is a graphical language for modeling software. • The UML arose from the many object-oriented software development approaches of the 1990s. • The purpose of the UML was to standardize object-oriented concepts and notation. • The UML has been sponsored by the OMG (Object Management Group). This talk will explain UML constructs through IDEF1X and examples.
2011 Michael Blaha
Data Modeling with the UML
2
Various UML Diagrams The UML has a variety of diagrams including the following: • Class diagram. Involves classes and relationships. Specifies data structure. • Object diagram. Documents individual objects and links among objects. Shows examples of data structure. • Use case diagram. Specifies high-level software functionality from the perspective of an end-user. • State diagram. Concerns states and events that cause transitions between states. Describes the discrete, temporal behavior of objects. • Activity diagram. Shows the workflow for an individual piece of functionality. • Sequence diagram. Shows how processes interact, with whom and what, and in what order. The remainder of this lecture covers the class model, the model that is most relevant for databases. 2011 Michael Blaha
Data Modeling with the UML
3
Why Use the UML? What Is It Good For? The UML class model is suitable for conceptual data models. • Communicating with business experts. – The UML avoids confusing database details. ♦
Can show attributes without addressing keys.
♦
Avoid distinction between independent and dependent entities.
♦
Can show role names without showing attributes.
♦
Can show selected relationships by package / subject area.
• Communicating with programmers. – Most programmers are familiar with the UML and like it. • Abstraction. – It is helpful to suppress database details when thinking about deep concepts such as data modeling patterns. • Behavior. – UML operations can summarize stored procedures and SOA services. 2011 Michael Blaha
Data Modeling with the UML
4
What Are Weaknesses of the UML? The UML class model does not address database details. • Does not address database design. – The UML notation does not cover database design. – Most UML tools also lack database design support. ♦
Enterprise Architect has some database support. (Need to investigate further.)
• Overemphasizes programming jargon. – The UML creators focused on programming and ignored databases. – Ironically the programming jargon is superficial and the UML has much to offer for database applications. • Has little use by database practitioners. – Many database practitioners know about the UML. – There is a chasm between programming and database cultures.
2011 Michael Blaha
Data Modeling with the UML
5
My Development Process • Construct a model of conceptual intent (using the UML). – Use a UML tool such as Enterprise Architect or Magic Draw. – Build a UML class model in conjunction with business experts. – Write an explanation of the business intent. • Construct a database design model (using IDEF1X or another database notation). – Use a database design tool such as ERwin or Dezign. – Often must manually re-key the model. – Observe the organization’s database design conventions (naming, foreign key enforcement, triggers, etc.) – Generate SQL code. • Maintain both models and explanation as revisions occur. • Use agile development.
2011 Michael Blaha
Data Modeling with the UML
6
Any Comments? • Does anyone here use the UML? – If so, what UML tools? • Have I missed any major UML strengths and weaknesses? • Does anyone construct data models interactively with business customers? Note that this seminar focuses on transactional (OLTP) applications. There is no compelling reason to use the UML for data warehouse (OLAP) applications.
2011 Michael Blaha
Data Modeling with the UML
7
Section 2: A Sample Data Model
A UML Data Model AirlinePartnership
0..1
partnershipName
FrequentFlyerAccount
accountStartDate balanceCurrentAmount balanceCurrentDate 1
*
flightNumber 1
*
* Company companyName
2011 Michael Blaha
*
Flight
partner 1
activityDate mileageActualAmount mileageBonusAmount
0..1
mileageMonthlyAmount 0..1
*
* Activity
accountNumber airlineName
customerName addressString
1
MonthlyStatement
Airline
Customer
statementClosingDate memberStartDate memberEndDate
1
1
*
FlightActivity
OtherActivity
description serviceClass
otherActivityType
0..1
Data Modeling with the UML
*
8
An IDEF1X Data Model FrequentFlyerAccount
Customer
frequentFlyerAccountID
MonthlyStatement frequentFlyerAccountID (FK, AK1) statementClosingDate (AK1) mileageMonthlyAmount
Lists
Airline airlineID Participates
Posts activityID
monthlyStatementID
memberStartDate memberEndDate
customerName addressString
Activity
Has
partnerID (FK) partnershipID (FK)
customerID
airlineID (FK, AK1) accountNumber (AK1) accountStartDate balanceCurrentAmount balanceCurrentDate customerID (FK)
Offers
Airline_AirlinePartnership
Owns
activityDate mileageActualAmount mileageBonusAmount activityDiscrim frequentFlyerAccountID (FK) myonthlyStatementID (FK) Company
Flight
airlineName (AK1)
Schedules
flightID
companyID
airlineID (FK, AK1) flightNumber (AK1)
companyName (AK1) activityDiscrim
Involves AirlinePartnership
Sponsors IsFor
partnershipID
FlightActivity
OtherActivity
flightActivityID (FK) description serviceClass flightID (FK)
partnershipName (AK1)
2011 Michael Blaha
otherActivityID (FK) otherActivityType companyID (FK)
Data Modeling with the UML
9
Section 3: UML Class
UML Class FrequentFlyerAccount FrequentFlyerAccount
frequentFlyerAccountID
accountStartDate balanceCurrentAmount balanceCurrentDate
accountStartDate balanceCurrentAmount balanceCurrentDate
UML
IDEF1X
• Object — a concept, abstraction, or thing with identity that has meaning for an application. • Class — a description of a group of objects with the same attributes, operations, kinds of relationships, and semantic intent. – UML notation. A box with the name of the class in the top portion. – Examples. Airline, FrequentFlyerAccount, and Activity. (Section 6 discusses identity.) 2011 Michael Blaha
Data Modeling with the UML
10
UML Attribute • Value — a piece of data. Values have no identity. • Attribute — a named property of a class that describes a value held by each object of the class. An attribute is a “slot” for data. – Analogy — class::object as attribute::value. – UML notation. Listed in a second portion beneath the class name. – Example. FrequentFlyerAccount has three attributes. • Operation — a function or procedure that may be applied to or by objects in a class. – UML notation. Listed in the third portion of class box. (Not shown.) – Example. A stored procedure could update balanceCurrentAmount each time an Activity posts.
2011 Michael Blaha
Data Modeling with the UML
11
Section 4: UML Association
UML Association FrequentFlyerAccount accountStartDate balanceCurrentAmount balanceCurrentDate
* 1 Customer customerName addressString
FrequentFlyerAccount frequentFlyerAccountID
Owns
accountStartDate balanceCurrentAmount balanceCurrentDate customerID (FK)
UML
2011 Michael Blaha
Customer customerID customerName addressString
IDEF1X
Data Modeling with the UML
12
UML Association (continued) • Link — a physical or conceptual connection among objects. • Association — a description of a group of links with common structure and semantics. – The links of an association connect objects from the same classes. – An association describes a set of potential links in the same way that a class describes a set of potential objects. – UML notation. A line (possibly with multiple line segments) connects the related classes. – Example. The line between FrequentFlyerAccount and Customer.
2011 Michael Blaha
Data Modeling with the UML
13
UML Association End • A binary association has two ends. – Each end can have a name and multiplicity. • End name — an alias for a class in an association. – UML notation. A legend next to the class–association intersection. – Example. An Airline is a partner in an AirlinePartnership. • Multiplicity — the number of instances of one class that may relate to a single instance of an associated class. – Multiplicity constrains the number of related objects. – Often called “cardinality” (though mathematically incorrect). – UML notation. Usually “1”, “0..1”, and “*” (“many” — zero or more). – Example. A FrequentFlyerAccount has one Customer. A Customer can have many FrequentFlyerAccounts.
2011 Michael Blaha
Data Modeling with the UML
14
UML Association Class AirlinePartnership partnershipName
* memberStartDate memberEndDate
* partner Airline airlineName
Airline_AirlinePartnership partnerID (FK) partnershipID (FK) memberStartDate memberEndDate AirlinePartnership
Airline airlineID
Participates
airlineName (AK1)
Involves
partnershipID partnershipName (AK1)
UML
IDEF1X
• Association class — an association that is also a class. – Like an association, the occurrences of an association class derive identity from objects of the constituent classes. – Like a class, an association class can have attributes and operations and participate in associations. 2011 Michael Blaha
Data Modeling with the UML
15
UML Association Class (continued) – UML notation. A box connected to the association with a dotted line. – Example. The association class between AirlinePartnership and Airline has two attributes.
2011 Michael Blaha
Data Modeling with the UML
16
Problem 1 — Association Class Compare the following models. Which one is better? The left model represents Subscription as an association class; the right model treats Subscription as an ordinary class. subscriber Person *
* Magazine
name Subscription address expirationDate phone 1
name
* Payment
(a)
Person subscriber 1 name address phone
* Subscription *
expirationDate 1
* Magazine name
Payment
1
(b)
date amount
date amount
A person may have multiple magazine subscriptions. A magazine has multiple subscribers. For each subscription, it is important to track the date and amount of each payment as well as the current expiration date.
2011 Michael Blaha
Data Modeling with the UML
17
UML Qualified Association Airline airlineID Airline
airlineName (AK1)
airlineName accountNumber 1 0..1
Offers FrequentFlyerAccount frequentFlyerAccountID
accountStartDate balanceCurrentAmount balanceCurrentDate
airlineID (FK, AK1) accountNumber (AK1) accountStartDate balanceCurrentAmount balanceCurrentDate
UML
IDEF1X
FrequentFlyerAccount
• Qualified association — an association in which an attribute called the qualifier partially or fully disambiguates the objects for a “many” association end. 2011 Michael Blaha
Data Modeling with the UML
18
UML Qualified Association (continued) – The qualifier selects among the target objects, reducing the effective multiplicity, often from “many” to “one”. – Names are often qualifiers. – UML notation. A small box on the association line by the source class. The source class plus the qualifier yields the target class. – Example. The accountNumber for a FrequentFlyerAccount is unique within the context of the issuing Airline. – Example. The statementClosingDate for a MonthlyStatement is unique within the context of a FrequentFlyerAccount.
2011 Michael Blaha
Data Modeling with the UML
19
Problem 2 — Qualified Association Restate the following model to use qualifiers. Index name
* isUnique * *
1 Database 1 name
* Table 1 name
* {ordered}
1
1
Column
*{ordered} *
name {ordered} isNull dataType * *
*
*
UniqueKey
{ordered} source
target {ordered}
ForeignKey name OnDeleteOption onUpdateOption
*
*
name primaryOrAlternate
2011 Michael Blaha
Data Modeling with the UML
20
Section 5: UML Generalization
UML Generalization Activity activityID activityDate mileageActualAmount mileageBonusAmount activityDiscrim
Activity activityDate mileageActualAmount mileageBonusAmount
activityDiscrim
FlightActivity FlightActivity
OtherActivity
description serviceClass
otherActivityType
description serviceClass
UML
2011 Michael Blaha
OtherActivity
flightActivityID (FK)
otherActivityID (FK) otherActivityType
IDEF1X
Data Modeling with the UML
21
UML Generalization (continued) • Generalization — the relationship between a class (the superclass) and one or more variations of the class (the subclasses). – Generalization structures the description of objects. – The superclass has general data. – The subclasses have specific data. – UML notation. A large hollow arrowhead points to the superclass. Lines fan out towards the subclasses. – Example. An Activity can be a FlightActivity or an OtherActivity.
2011 Michael Blaha
Data Modeling with the UML
22
Abridged Class Metamodel Attribute name
Qualifier
*
InstanceAttribute
{ordered}
*
Discriminator 1 1
{ordered}
Generalization abstractOrConcrete
*
1
*
Element name 1
* End name 2..* multiplicity
2011 Michael Blaha
1 subclass 1
Association
*
Class
1 superclass
Data Modeling with the UML
23
Section 6: Identity Identity — the property that distinguishes an entity from all others. • Existence-based identity — a system-generated identifier. – Each object primary key is a single field, small, and uniform in size. – The database has a consistent approach to identity. – However... The database can be more complex to inspect and debug. • Value-based identity — a unique combination of real-world attributes. – Primary keys have intrinsic meaning. – However... A primary key change must propagate to foreign keys. – However... Some objects lack value-based identifiers. – However... Multi-field primary keys can be unwieldy. • Mix of existence-based and value-based. – However... An irregular approach.
2011 Michael Blaha
Data Modeling with the UML
24
Approaches to Identity (continued) • Surrogate identity — identification via related entities. – Example. Identifying a person via a passport, driver’s license, or identity card. – However... Violates normal forms. I normally use existence-based identity. (As a minor variation, it is OK to use value-based acronyms for enumeration tables.)
2011 Michael Blaha
Data Modeling with the UML
25
UML Qualifier • UML qualifier — an attribute that distinguishes among “many” objects. – Example. An Airline has many FrequentFlyerAccounts. An Airline plus an accountNumber yield at most one FrequentFlyerAccount. FrequentFlyerAccount Airline
UML
airlineName
accountNumber
FrequentFlyerAccount
Airline
IDEF1X
2011 Michael Blaha
accountStartDate 1 0..1 balanceCurrentAmount balanceCurrentDate
airlineID
frequentFlyerAccountID
airlineName (AK1.1)
airlineID (FK, AK1) accountNumber (AK1) accountStartDate balanceCurrentAmount balanceCurrentDate
Data Modeling with the UML
26
Candidate Key (Alternate Key) • Candidate key — a combination of one or more fields that uniquely identify the records in a table. – The set of fields in a candidate key must be minimal. – No field in a candidate key can be null. – The DBMS can guarantee the uniqueness of candidate keys. – UML notation. None... – Examples. AirlineName is a candidate key. Of course, a primary key is an arbitrary choice of candidate key.
2011 Michael Blaha
Data Modeling with the UML
27
Name Names are prominent in models. A name is a word or phrase that designates a person or thing. There are four scenarios for how names can be used. – Unique names. Some names are unique and resolve to a single entity. Airline name is globally unique. – Unique names within a context. Other names are not unique on their own but are unique when combined with a parent entity (UML qualifiers). The accountNumber provides the unique name for a FrequentFlyerAccount within the context of an Airline. – Non-unique names. Still other names provide important description but alone cannot find an entity. For example, person names are important, but insufficient for finding an individual person. – Multiple unique names. Some entities have multiple names. For example, propylene is known as propylene and C3H6.
2011 Michael Blaha
Data Modeling with the UML
28
Structured Field • Structured field — a field that is composed from constituent pieces with a specified grammar. – Structured fields are synthetic but the pieces have meaning. – Many structured fields have standard protocols. – Examples. Mechanical parts and items for commerce often have structured fields that provide identity. – Example. The UPC (Universal Product Code) has twelve digits. ♦
The first digit indicates the kind of item—general merchandise, random-weight item, health item, in-house item, and coupon.
♦
The next five digits denote the vendor.
♦
Digits seven through eleven identify an item within the context of a vendor.
♦
The last digit is a check digit.
2011 Michael Blaha
Data Modeling with the UML
29
IDEF1X Notation — An Example FrequentFlyerAccount
Customer
frequentFlyerAccountID
Owns
airlineID (FK, AK1) accountNumber (AK1) accountStartDate balanceCurrentAmount balanceCurrentDate customerID (FK)
Offers
customerID customerName addressString
Posts Activity
Has MonthlyStatement
activityID
monthlyStatementID frequentFlyerAccountID (FK, AK1) statementClosingDate (AK1) mileageMonthlyAmount Airline_AirlinePartnership airlineID (FK) partnershipID (FK) memberStartDate memberEndDate
Airline Flight
airlineID Participates
Lists
airlineName (AK1)
activityDate mileageActualAmount mileageBonusAmount activityDiscrim frequentFlyerAccountID (FK) monthlyStatementID (FK) Company
flightID
companyID
airlineID (FK, AK1) flightNumber (AK1)
companyName (AK1)
Schedules
activityDiscrim
Involves AirlinePartnership partnershipID partnershipName (AK1)
2011 Michael Blaha
IsFor
FlightActivity flightActivityID (FK) description serviceClass flightID (FK)
Data Modeling with the UML
Sponsors
OtherActivity otherActivityID (FK) otherActivityType companyID (FK)
30
IDEF1X Notation — Another Design FrequentFlyerAccount
Customer
airlineID (FK) accountNumber
Offers
Owns
customerName addressString
accountStartDate balanceCurrentAmount balanceCurrentDate customerID (FK)
Has
customerID
Posts Activity
MonthlyStatement airlineID (FK) accountNumber (FK) statementClosingDate
activityID Lists
mileageMonthlyAmount Airline_AirlinePartnership airlineID (FK) partnershipID (FK)
Airline
Participates
airlineID airlineName (AK1)
memberStartDate memberEndDate
Flight airlineID (FK) flightNumber
Schedules Involves
activityDate mileageActualAmount mileageBonusAmount activityDiscrim airlineID1 (FK) accountNumber1 (FK) airlineID2 (FK) accountNumber2 (FK) statementClosingDate (FK) Company activityDiscrim companyID companyName (AK1)
FlightActivity
AirlinePartnership
IsFor
partnershipID
flightActivityID (FK) description serviceClass airlineID (FK) flightNumber (FK)
partnershipName (AK1)
2011 Michael Blaha
OtherActivity otherActivityID (FK)
Sponsors
otherActivityType companyID (FK)
Data Modeling with the UML
31
Section 7: Database Design
Database Design Topics • Indexes • Referential integrity • Enumerations • Views
2011 Michael Blaha
Data Modeling with the UML
32
Index Indexes provide the primary means for tuning a relational database. • Normally you should define a unique index for each primary and candidate key. – Most relational DBMSs create unique indexes as a side effect of SQL primary key and unique constraints. • You should also build an index for each foreign key that is not subsumed by a primary key or candidate key constraint. Foreign keys implement associations and generalizations. – The indexes on foreign keys and primary keys enable fast traversal of a class model. • You should deliver your initial schema with indexes and referential integrity. You will frustrate users if you deliver a slow implementation. – It is straightforward to incorporate them into your initial schema. • The DBA may define additional indexes to further speed frequently asked queries. 2011 Michael Blaha
Data Modeling with the UML
33
Referential Integrity • Many relational DBMSs now support referential integrity. Use it! • Define referential integrity actions for deletes. You can partially infer them from the class model. • Given existence-based identity, you do not need referential integrity actions for updates. • Don’t use triggers to implement referential integrity.
2011 Michael Blaha
Data Modeling with the UML
34
Implementing Referential Integrity for Deletion • Generalization. – Normally cascade the effect of a deletion. – The RDBMS can propagate deletion downwards from the superclass to the subclasses. However, a RDBMS cannot propagate deletion upwards from the subclass towards the superclass. • Buried association, minimum multiplicity of zero. – Normally set the foreign key to null. – Sometimes forbid deletion. • Buried association, minimum multiplicity of one. – Cascade the effect of a deletion. – Or forbid deletion. • Association table. – Normally cascade deletions to the records in an association table. – Sometimes forbid a deletion. 2011 Michael Blaha
Data Modeling with the UML
35
Enumeration • An enumeration is a data type that has a finite set of values. – ServiceClass is first, business, or coach. • You should carefully document enumerations. • Do not use a generalization to capture the values of an enumerated attribute. – Use generalization only when at least one subclass has attributes, operations, or associations that do not apply to the superclass. • Options for implementing enumerations. – One or more dedicated enumeration tables. Involves some work defining mechanisms, but a good approach. – Check constraints. A good approach for constraints that seldom change. – Enforcing enumerations with programming code. A poor option that many developers use anyway.
2011 Michael Blaha
Data Modeling with the UML
36
View • A view is a database table that is computed as needed from a SQL query. • Views can be used to consolidate the generalization levels that describe different aspects of an object. • Many database engines support updatable views when they are theoretically possible. Often you can not only read through views, but also write through views.
2011 Michael Blaha
Data Modeling with the UML
37
Section 8: Conclusion
Agile UML Data Modeling • See my YouTube videos for an example of data modeling using the UML. – Develop software for tracking library loan records. – http://www.youtube.com/view_play_list?p=EE77921A75E846EB – I routinely perform agile UML data modeling. I do live, interactive sessions with business customers and it works well.
2011 Michael Blaha
Data Modeling with the UML
38
Seeking Your Advice... I may write a book explaining the UML class model in terms of IDEF1X — along the lines of this talk. • Good idea? • Bad idea? • Suggestions?
2011 Michael Blaha
Data Modeling with the UML
39
Answer 1 • Model (a) states that a subscription has derived identity. Model (b) gives subscriptions more prominence and promotes subscription to a class. • The (b) model is better. – Most copies of magazines have subscription codes on their mailing labels; this could be stored as an attribute. – The subscription code is intended to identify subscriptions; subscriptions are not identified by the combination of a person and a magazine so we should promote Subscription to a class. – Furthermore a person might have multiple subscriptions to a magazine; only the (b) model can readily accommodate this.
2011 Michael Blaha
Data Modeling with the UML
40
Answer 2 — Qualified Association Restate the following model to use qualifiers. 0..1
Index isUnique
1
*
1
name name Database
0..1
name 1
name 1
*
1
Table
name 1
* Column 1 0..1 name {ordered} isNull dataType 1
*
*
UniqueKey 0..1
{ordered} source
*{ordered} * {ordered}
*
target {ordered}
*
ForeignKey OnDeleteOption onUpdateOption
*
0..1
*
primaryOrAlternate
Without qualifiers, the scope of uniqueness is not obvious.
2011 Michael Blaha
Data Modeling with the UML
41