Czech Technical University in Prague
Faculty of Electrical Engineering

Diploma Thesis

Genealogic Origin Locator Portal

Bc. Tomáš Fidler

Supervisor: Ing. Michal Valenta, Ph.D.

Master Study Program: Electrical Engineering and Information Technology
Specialization: Computer Science and Engineering

May 2007


Acknowledgements

I would like to thank the people who helped make this project a reality. I was fortunate to work with Ing. Michal Valenta, Ph.D., my advisor, and Mr. Tomáš Zahn, the originator of this work. I would also like to thank my colleague Ondřej Chaloupka for his helpful suggestions, which made this work better and more useful. Finally, I would like to thank my family for supporting me and my fiancée Klára Zachariášová for her help with English grammar and the final format.

Declaration

I declare that I have written my diploma thesis independently and that I have used only the sources listed in the enclosed bibliography. I have no serious reason to object to the use of this school work within the meaning of §60 of Act No. 121/2000 Sb., on copyright, on rights related to copyright and on amendments to certain acts (the Copyright Act).

Prague, 25 May 2007

.............................................................


Abstract

This work analyzes the current state of genealogical programs and websites and, together with an analysis of the usable sources of genealogical data, designs a portal for genealogical use. The designed system is a unique combination of a web-based application and an administered database of genealogical data. Its main features are searching the database, adding data to it, and exporting data to a GEDCOM file. Both the prototype and the design of the website of this application are implemented using Oracle Application Express.

Abstract (Czech version)

This work analyzes the genealogical sources listed in the chapter Sources, and its main part is the design of a genealogical portal. The introduction analyzes the current state of genealogical programs and of websites devoted to genealogy. The uniqueness of the designed system lies in combining a web application with an administered database of genealogical data. The main functions that were analyzed include searching for data in the database, inserting data into it, and exporting data to a file in the GEDCOM format. Both the prototype and the design of this application were created with Oracle Application Express.


Table of contents

List of figures
List of tables

1 Introduction
  1.1 Motivation
  1.2 The aim of the work

2 Sources
  2.1 Specification of the sources
    2.1.1 Soupis poddaných podle víry z roku 1651
    2.1.2 Berní rula
    2.1.3 Soupis židovských rodin v Čechách z roku 1793
    2.1.4 Stabilní katastr
    2.1.5 Poštovní adresáře
    2.1.6 Legionáři
    2.1.7 Sčítání lidu
    2.1.8 Prior genealogical research
    2.1.9 Seznam žadatelů o povolení k (vy)cestování
    2.1.10 České katolické osady v USA (1865-1890)
    2.1.11 Die Juden und Judengemeinden Böhmens
  2.2 Conclusion for sources part

3 State of the art
  3.1 GEDCOM format
    3.1.1 ANSEL
    3.1.2 XML GEDCOM
    3.1.3 GEDCOM specification
    3.1.4 GEDCOM example
  3.2 Programs
    3.2.1 Textbox interface
      3.2.1.1 Brother's Keeper
      3.2.1.2 Family Historian
    3.2.2 Half-graphic interface
      3.2.2.1 Personal Ancestral File
      3.2.2.2 Legacy Family Tree 6.0
      3.2.2.3 Family Tree Maker
    3.2.3 Fully graphic interface
      3.2.3.1 GenoPro 2007
    3.2.4 PhpGedView project
  3.3 Web pages
    3.3.1 Informative pages
      3.3.1.1 CGSI.org
      3.3.1.2 Genea.cz
    3.3.2 Database pages
      3.3.2.1 Genealogy.com
      3.3.2.2 FamilySearch.com
    3.3.3 Portals
      3.3.3.1 Ancestry.com
      3.3.3.2 Geni.com
    3.3.4 Supplemental pages
      3.3.4.1 Surnamedb.com

4 Analysis and design
  4.1 Prerequisites
  4.2 Database design
    4.2.1 The database model
    4.2.2 Description of entities
      4.2.2.1 The main entities
      4.2.2.2 The type entities
      4.2.2.3 The translation entities
      4.2.2.4 The temporary entities
    4.2.3 Data types
      4.2.3.1 The date
      4.2.3.2 TIMESTAMP
      4.2.3.3 CHAR(1)
      4.2.3.4 VARCHAR2 and NVARCHAR2
      4.2.3.5 NUMBER
  4.3 UML diagrams
    4.3.1 Use Case diagram
    4.3.2 State diagram
    4.3.3 Sequence diagram
  4.4 Business oriented models
    4.4.1 Interview
      4.4.1.1 Sketches
    4.4.2 Business
      4.4.2.1 Participants
      4.4.2.2 Scenarios
      4.4.2.3 Functions
      4.4.2.4 Data flows
      4.4.2.5 Business diagrams
      4.4.2.6 Business architecture
    4.4.3 Hierarchy
      4.4.3.1 Business Hierarchy
  4.5 GEDCOM export
    4.5.1 Supported tags
    4.5.2 Analysis of the export process
    4.5.3 Logic of the export process

5 Prototype
  5.1 Installation of the application
  5.2 Internal logic of the application
  5.3 Screenshots

6 Environment

7 Conclusion
  7.1 Summary
  7.2 Future work

8 Bibliography

A Examples of sources
B Full list of GEDCOM 5.5 tags
C Database model
D Tables
E UML diagrams
F Business diagrams
G Simulation of Basic search
H Content of enclosed CD

List of figures

1.1 Example of the record from the registry

3.1 Mapping GEDCOM 5.5 to GEDCOM XML
3.2 Interface of Brother's Keeper
3.3 Ahnentafel numbering system example
3.4 Interface of Family Tree Maker
3.5 Interface of GenoPro 2007
3.6 Example of GenoPro file
3.7 Interface of PhpGedView
3.8 Example of the information available on Ancestry.com
3.9 Examples of the application on Geni.com

4.1 Logical structure of the database
4.2 Differences in versions of database model - logical structure
4.3 Hierarchy of users

5.1 Error report
5.2 Basic Search page
5.3 Result report

A.1 Soupis poddaných
A.2 Berní rula
A.3 Soupis židovských rodin v Čechách z roku 1793
A.4 Stabilní katastr
A.5 Legionáři
A.6 Sčítání lidu z roku 1921
A.7 Prior genealogical research
A.8 Seznam žadatelů o povolení k (vy)cestování
A.9 České katolické osady v USA 1865-1890
A.10 Die Juden und Judengemeinden Böhmens

C.1 Database model - logical structure - version 1
C.2 Database model - logical structure - version 2
C.3 Database model - physical structure - version 1
C.4 Database model - physical structure - version 2

E.1 Portal - use case diagram
E.2 Portal - state diagram
E.3 Basic search - sequence diagram
E.4 User creation - sequence diagram
E.5 GEDCOM export - activity diagram

F.1 Basic search - sketch
F.2 GEDCOM export - sketch
F.3 Insert data - sketch
F.4 Basic search - scenario
F.5 Basic search - diagram
F.6 Extended search - scenario
F.7 Extended search - diagram
F.8 Insert data by admin - scenario
F.9 Insert data by admin - diagram
F.10 Creation of the user - scenario
F.11 Creation of the user - business diagram
F.12 Export data to the GEDCOM file - scenario
F.13 Export data to the GEDCOM file - diagram
F.14 Insert data by user - scenario
F.15 Insert data by user - diagram
F.16 Business architecture

G.1 Simulation of Basic select - initial phase
G.2 to G.26 Simulation of Basic select - step 2 to step 26 (one figure per step)

H.1 Content of enclosed CD

List of tables

4.1 Design of the actual name table
4.2 Design of the main table
4.3 Participants

D.1 The individuals table
D.2 The events table
D.3 The unions table
D.4 The sources table
D.5 The locations table
D.6 The file management table
D.7 The files table
D.8 The type of file table
D.9 The type of union table
D.10 The type of event table
D.11 The actual names table
D.12 The actual surnames table
D.13 The actual places table
D.14 The temp individuals table
D.15 The temp events table

CHAPTER 1. INTRODUCTION


1 Introduction

1.1 Motivation

I chose this task because of my own experience with genealogical research. I have spent hours in archives and find the work very interesting, not least because a really good researcher must be highly skilled. For example, such a researcher must be able to read several languages: in the Czech Republic these are German, Polish or Hungarian (depending on how close to the borders the family used to live), Latin (at least some abbreviations), and Czech or Slovak. There are also many transcripts and abbreviations with which the researcher has to be familiar.

Figure 1.1: Example of the record from the registry

This portal is intended to combine a data-providing portal for genealogical searches (usable by genealogists as well as casual users) with website logic that lets users create their own databases and generate lists (charts, boxes). Export to the GEDCOM format (described in section 4.5) will also be supported. This combination distinguishes the project from all the existing programs, tools and web pages mentioned in chapter 3.

1.2 The aim of the work

The main purpose of this project is to create a database that helps locate the historical place(s) of origin of a family when only its surname is known. It is intended mainly as a source for American and Canadian citizens. The database is primarily a tool to assist genealogical researchers with the difficult task of establishing the location(s) where families lived prior to emigration, or to help those who do not know the place of origin of an ancestor to narrow the search to places where the family name is known to have existed. This is particularly necessary in cases where the descendants have only general data concerning the place of origin (e.g. Bohemia, Austria, Prague), and no further research can be done without more specific data. The goal of this work is to build a suitable portal based on this database and to place it on the internet.

The sources described later have been chosen because they contain a large number of surnames accompanied by dates and locations. We are going to use these sources to map the overall incidence of specific family names during a particular historic period (17th to early 20th century) to the geographic locations where they occurred. The result is not intended to be a complete source for genealogical research, but only a reference that helps provide direction. Nevertheless, the structure of the database will be designed to be suitable for full genealogical research in the future.

The originator of the idea to build such a database and portal is Mr. Thomas Zahn (owner of the Czech-American company P.A.T.H. Finders Intl. [1]). He has spent more than 11 years collecting information about families and about genealogical research itself, and he chose the sources.

CHAPTER 2. SOURCES


2 Sources

As mentioned before, the sources were chosen by Thomas Zahn on the basis of his long experience as a researcher. The very first part of this project must be the full specification of these sources and of the ways to implement them.

2.1 Specification of the sources

The sources mentioned below cover the period from the 17th century, through the important 19th century, to the early 20th century. Neither the sources nor the archives provide more recent records because of the law on the protection of personal data (see footnote 1); nevertheless, this period is adequate. Other sources (such as telephone directories) are not used either, because they could raise legal concerns about personal data protection. The specification of all given sources is as follows:

2.1.1 Soupis poddaných podle víry z roku 1651

This source lists serfs by village, panství (dominion), house number, family name, occupation, age, and religion. It should be noted that the spelling of family names was not stable, so some variations need to be taken into account. Place names also changed over the years, but the names and order of the villages follow the statistical lexicon of villages from 1992. [3]

2.1.2 Berní rula

The Roll of Assessments from the 17th century lists farm owners from roughly the same period as the first source; however, it is a more complete list, covering a larger area, with fewer orthographical errors. Data is organized by kraj (region), panství (dominion), village, occupation, and family name. [3]

2.1.3 Soupis židovských rodin v Čechách z roku 1793

This is a list specifically of Jewish families in the Czech lands, organized by kraj (region), village, family name and origin. [3]

2.1.4 Stabilní katastr

The Stable Cadastre is a list of all possessors of real property (land and/or house) that accompanied the Indikační Skicy (Indication Sketches) made in the first half of the 19th century. The Cadastre includes an alphabetic listing of all possessors, giving both first name and family name, as well as the date when the record was made and their place of domicile. Its significance to this project is that these lists are perhaps among the most complete and accessible lists of surnames from a very significant period (immediately before the emigration of the second half of the 19th century). [3]

Footnote 1: The full list of state interventions in the registry is given in §29 of the registry law (matriční zákon) No. 268/1949 Sb. [2]

2.1.5 Poštovní adresáře

A Postal Directory lists inhabitants by address (mostly limited to towns). The directories were prepared by the Postal Service in the late 19th century and again in the early 20th century. They pertain mostly to large towns and list only the names of persons connected with particular commercial trades. [3]

2.1.6 Legionáři

Separate books containing lists of the soldiers who died during the First World War. These books are organized by okres (district) and include brief biographies of commanders, as well as lists, by place of birth, surname and date of birth, of all conscripts who fell or were taken prisoner. [3]

2.1.7 Sčítání lidu

Census records from the end of the 19th century and from 1921 contain lists of family members and other non-related residents, organized by house number (each house has its own list). The information includes the specific location (town or village), family name, individual name, and dates (date of the census and dates of birth of individuals). [3]

2.1.8 Prior genealogical research

Existing research, compiled over the previous 11 years, provides more than 2,800 different family surnames combined with specific places of residence (house number/village/region) and dates of particular events (births, marriages, deaths). Although the numbers are not as large as in the other sources, the detail is far more precise, and all cases pertain to families that did in fact emigrate. [1]

2.1.9 Seznam žadatelů o povolení k (vy)cestování

Emigration documents of Bohemian people, held in the archival fund České místodržitelství (Bohemian Government). This archival fund is deposited at the State Central Archives in Prague (work on this source is still in progress). Most of these documents were discarded (including the major part of the requests); however, the books listing incoming mail were preserved. Their concise entries mention the name, the place of origin, the date, and sometimes the destination of the applicant. In some cases, the applications have also survived. These books of incoming mail cover the entire period from 1856 to 1910. After the research of these documents is finished, we will be able to create a database of those people who got permission from the České gubernium to emigrate from Bohemia to America or Canada. [4]

2.1.10 České katolické osady v USA (1865-1890)

A list of persons and families organized by their settlements in the US, with references to their places of origin. It is not as extensive a list of surnames as the others, but it is useful for making connections to distinct regions in the Czech lands. It contains dates and events from everyday life. [3]

2.1.11 Die Juden und Judengemeinden Böhmens

A list of names by place of residence, with chronicle-like notes about these places. All records concern Jewish families only. [3]

2.2 Conclusion for sources part

All of these sources are suitable for this work. We have to admit that some are hard to load into the database but contain much more information for further research; for example, the existing research has to be reviewed and key words must be specified, but it contains a lot of additional and complete information for the future "customer". All of the sources nevertheless contain three important items of data (in order of importance): surname (family name), place (place of birth, place of residence at the time of the record, or any other place), and date (date of birth or date of the record, or age together with the date the list was taken). The date is less important but will help in further research. The first name is also important, but data will be selected on the basis of the family name.

The main part of the presented records covers a specific time frame (the 19th and early 20th century). This suits the project best, since families that emigrated did so mostly during this time frame. The sources covering the remaining periods are used mainly as a source of family names and of changes in them; the places they record are also useful.

By combining the sources we should be able to create a sampling of Czech/German surnames (this requires a translation table for names, surnames and also names of villages and cities, discussed in section 4.2) and to provide links to particular places at about the time when the emigrants with these names would still have been present. Upon completion, we will be able to locate places of origin for many US (and Canadian) descendants of Bohemian immigrants whom we have previously been unable to assist. Samples of the mentioned sources are enclosed in Appendix A.
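The common core of the sources described in this chapter (surname, place, date) can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the database design presented in chapter 4: the class `SourceRecord`, the function `places_by_surname` and the `variants` dictionary (a stand-in for the translation table of surnames) are hypothetical names, and the sample rows are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRecord:
    """One occurrence of a surname in a source (hypothetical structure)."""
    surname: str          # family name as written in the source
    place: str            # place of birth, residence or record
    year: Optional[int]   # year of birth or of the record, if known
    source: str           # name of the source the record comes from


def places_by_surname(records, variants=None):
    """Group (place, year, source) triples by surname.

    `variants` maps a spelling variant to its canonical surname; it stands
    in for the translation table mentioned in the text (an assumption).
    """
    variants = variants or {}
    index = defaultdict(set)
    for rec in records:
        canonical = variants.get(rec.surname, rec.surname)
        index[canonical].add((rec.place, rec.year, rec.source))
    return index


# Made-up illustrative rows, not data taken from any of the sources.
records = [
    SourceRecord("Novak", "Village A", 1651, "Soupis poddaných"),
    SourceRecord("Nowak", "Village B", 1840, "Stabilní katastr"),
    SourceRecord("Dvorak", "Village A", 1921, "Sčítání lidu"),
]
index = places_by_surname(records, variants={"Nowak": "Novak"})
for place, year, source in sorted(index["Novak"], key=lambda t: (t[0], t[1] or 0)):
    print(place, year, source)
```

Keeping the surname as the only lookup key mirrors the statement above that data is selected on the basis of the family name, while place and date are carried along as the answer.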


CHAPTER 3. STATE OF THE ART

3 State of the art

A survey of the existing tools is a necessary part of this type of work. Modern genealogists have to be familiar with languages, and they equally have to be able to work with programs and internet websites. This chapter covers useful programs and websites.

There are many programs that allow researchers to store the data they find and to print out family trees and other lists. Many of them are not free, and only a trial version (14 days or more) can be obtained. Some are free, but they are not as sophisticated and graphically complete as the paid ones. The choice of program depends on the type of work that is planned. The data can be stored in a large database with a specially developed application on top of it, or it can be stored as a text file and viewed only as a simple web page.

There are also many web pages with genealogical subject matter: pages providing links to other websites, pages with information on how to do research, database pages containing "paid" collections of information, and data-storing websites where users can store their own data and display it in different views.

There are hundreds of genealogical programs and websites, and the choice among them depends on what work is needed and how much money can be invested. I will mention only a few of them, mostly the ones with which I am well acquainted and which are positively reviewed.

3.1 GEDCOM format

The GEDCOM (an acronym for GEnealogical Data COMmunication) format is used as an import/export format in many programs. Additionally, many tools exist to convert GEDCOM files to linked HTML pages. [5] GEDCOM is a specification for exchanging genealogical data between different genealogical programs. It was developed by the Family History Department of the Church of Jesus Christ of Latter-day Saints (hereinafter LDS) as an aid in their extensive genealogical research. [6]

The LDS has an enormous collection of data from the entire world. LDS researchers are the originators of the Ancestral File Number (hereinafter AFN), which is a unique identifier for everyone who has a record in the Ancestral File format. This number consists of 4 capital letters or numbers, a dash, and 2 or 3 more capital letters or numbers (it does not contain any vowels - A, E, I, Y, O, U). 2CT1-7A2 is an example. An AFN can be searched online at the LDS website, FamilySearch (mentioned in section 3.3.2.2). [7] Other numbering systems exist, but there is no need to mention them, because none of them is used in this project.

The current version of the specification is GEDCOM 5.5 (released in 1996). [8] In 1999, the draft of GEDCOM 5.5.1 was released; it adds a few new tags (e.g. WWW, EMAIL, FACT) and approves the use of the UTF-8 encoding. [9] This draft has not yet been officially approved, but a number of programs already support this version of GEDCOM.
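The AFN shape described above can be checked with a short regular expression. The following Python sketch is an illustration only; the function name `is_afn` and the `forbid_vowels` parameter are hypothetical. Because the example given in the text, 2CT1-7A2, itself contains an A, the vowel restriction is kept as an optional check that is switched off by default.

```python
import re

# 4 capital letters or digits, a dash, then 2 or 3 more capital letters or digits.
AFN_PATTERN = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{2,3}")
VOWELS = set("AEIOUY")


def is_afn(text, forbid_vowels=False):
    """Return True if `text` has the AFN shape described in the text."""
    if AFN_PATTERN.fullmatch(text) is None:
        return False
    # Optional stricter check: reject identifiers containing A, E, I, O, U or Y.
    if forbid_vowels and VOWELS & set(text):
        return False
    return True


print(is_afn("2CT1-7A2"))   # True
print(is_afn("2ct1-7a2"))   # False, lower-case letters are not accepted
```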

3.1.1 ANSEL

The 8-bit ANSEL (American National Standard for Extended Latin alphabet coded character set for bibliographic use) is the preferred character set for GEDCOM. [9] It is used for all transfers of data unless a different encoding is set. The encoding can be changed in the header section of the GEDCOM file; the exact place is explained below. ANSEL is a superset of ASCII and is also known as ANSI Z39.47-1985 (under copyright) or the American Library Association character set. [10] Because ANSEL does not cover all characters (the gaps are few, but they matter especially for East Asian names, since the original CJK characters cannot be represented), Unicode will be used as the preferred character set from now on.

3.1.2 XML GEDCOM

In 2002, the beta version of GEDCOM 6.0 was released. [11] This version no longer prefers the ANSEL encoding; instead, Unicode is used and files are stored in XML format. For now, GEDCOM files are still stored as plain text. Although the draft was released in 2002, the XML version is still not used as a standard. Some believe this is because GEDCOM is developed and controlled only by the LDS. Figure 3.1 shows the mapping of GEDCOM 5.5 to version 6.0. [12]

Figure 3.1: Mapping GEDCOM 5.5 to GEDCOM XML

Because of this gap between needs and reality, several projects were started to produce a new standard more quickly than the LDS. One of them is the Text Encoding Initiative (hereinafter TEI). "TEI is an international project to develop guidelines for the preparation and interchange of electronic texts for scholarly research, and to satisfy a broad range of uses by the language industries more generally." [13] Another project is GEDML, an XML version of GEDCOM published by Michael Kay. [14] The GENTECH Lexicon Working Group developed LexML, which is a combination of TEI and the GENTECH Genealogical Data Model. [15] In spite of all these projects, GEDCOM remains the leading genealogical exchange format.

3.1.3 GEDCOM specification

A GEDCOM file contains information about individuals, families, events, places, and much more. The file itself consists of sections. Each file has to contain a header section, a section of records and a trailer (the end of the file). [5] Each line of a GEDCOM file begins with a number that represents the level of the record. Level 0 is the top level; it can be HEAD, TRLR, SUBN, INDI, FAM, OBJE, NOTE, REPO, SOUR, SUBM or another tag (mostly program-specific: definitions of the environment, colors, fonts and other miscellaneous or graphic information). Level 1 specifies the top-level record more closely, level 2 gives details for level 1, etc. Only non-negative integers can be used. The level number is followed by a descriptive tag, which refers to the type of data contained in that line. Tags can also be accompanied by pointers or identifiers (e.g. @I0001@), which indicate a related individual, family, place, source, etc. within the same GEDCOM file. [16] The entire list of tags supported in the GEDCOM 5.5 standard, with explanations, is placed in Appendix B. [17]

3.1.4 GEDCOM example

An example of a GEDCOM file with a more detailed explanation follows:

    0 HEAD
    1 SOUR GenoPro
    2 VERS 2.0.0.3                  level 2 refers to the source of the file in level 1
    2 CORP GenoPro
    2 ADDR http://www.genopro.com
    1 DATE 5 APR 2007               date of creation of the file
    1 CHAR UTF-8                    character encoding (ANSEL, UTF-8, ASCII, etc.)
    1 GEDC
    2 VERS 5.5                      specification of the version of GEDCOM

The following part contains global information for the GenoPro program which defines the graphic interface. This example was shortened.

    0 GLOBAL
    1 NAME
    2 FULL
    3 FORMAT %T %F (%N) %M %L (%L2) %S
    2 DISPLAY
    3 FORMAT %F %M %L (%L2)
    3 LINES 3
    1 FONT Arial

After the global information of the program, the record section begins. The following is an example of individual records.

    0 @ind00001@ INDI               the tag with an identifier of a person
    1 NAME Tomáš /Fidler/
    2 DISPLAY Tomáš Fidler
    2 GIVN Tomáš
    2 SURN Fidler
    1 POSITION -380,80              position used by GenoPro in maps
    1 SEX M                         level 1 denotes data about the person from level 0
    1 BIRT                          birth information follows
    2 DATE 15 SEP 1980              specification of the date of the birth
    2 PLAC Jilemnice (Jilemnice)    specification of the place of the birth
    3 _XREF @place00001@            reference (pointer) to the place from level 2
    1 EDUCATIONS @edu00001@         reference to the education information
    1 FAMC @fam00001@               family where the person is a child
    1 CHAN                          some programs keep the last change
    2 DATE 9 APR 2007               last change date
    3 TIME 13:12:43                 last change time

    0 @ind00002@ INDI
    1 NAME Jaroslav /Fidler/
    2 DISPLAY Jaroslav Fidler
    2 GIVN Jaroslav
    2 SURN Fidler
    1 SEX M
    1 FAMS @fam00001@

    0 @ind00003@ INDI
    1 NAME Marie /Fidlerová / Šídová/
    2 DISPLAY Marie Fidlerová
    3 FORMAT Custom                 custom format of the display name
    2 GIVN Marie
    2 SURN Fidlerová / Šídová       there are some problems with the maiden name
    1 SEX F
    1 FAMS @fam00001@               family where the individual is a spouse

Different programs solve the maiden name problem in different ways. GenoPro has one long field for Family Name / Maiden Name / Surname. It uses slashes, and the display name can be defined according to the user's convention. The following example contains the information about the family structure.

    0 @fam00001@ FAM                identification (pointer) of family
    1 UNIONS @marr00001@            identification of union (marriage)
    1 HUSB @ind00002@               identification of husband
    1 WIFE @ind00003@               identification of wife
    1 CHIL @ind00001@               identification of child

This part contains information about the marriage.

    0 @marr00001@ Marriage          reference number of the marriage
    1 TYPE Civil                    type of the marriage
    1 DATE 22 MAR 1980
    1 PLAC Semily
    2 _XREF @place00003@            reference to the place of the marriage

The following tag specifies the place mentioned above in the file.

    0 @place00001@ PLAC
    1 NAME Jilemnice                name of the place
    1 CITY Jilemnice
    1 COUNTRY Czech Republic

This tag contains the specification of the education of the person for whom the reference was given.

    0 @edu00001@ Education
    1 PROGRAM Computer science
    1 INSTITUTION CTU
    1 PLAC Prague
    2 _XREF @place00002@
    1 LEVEL Undergraduate / Bachelors

Other tags can specify the job position (occupation), sources of information, contact details of the person, pictures, etc. The following part shows the types of links in a pedigree. It defines whether the person in the specified family is a biological, an adopted or a foster child or parent.

    0 PEDIGREELINK
    1 PEDIGREELINK Biological       identifies the child relationship
    1 FAMILY @fam00001@
    1 INDIVIDUAL @ind00001@

    0 PEDIGREELINK
    1 PEDIGREELINK Parent           identifies the parent link
    1 FAMILY @fam00001@
    1 INDIVIDUAL @ind00002@

Relationships can be of different types and are recorded together with references to the other people involved.

    0 EMOTIONALRELATIONSHIP
    1 EMOTIONALLINK Friendship      this type identifies friendship
    1 ENTITY1 @ind00008@            between this individual
    1 ENTITY2 @ind00001@            and this individual

In GenoPro there exist more than thirty different types of relations, e.g.: jealous, friendship, in love, distant, hate, violence, controlling, manipulative, fan, etc. The GenoPro map can contain a text label with parameters. The following is an example:

    0 LABEL
    1 POSITION -520,80
    2 WIDTH 160
    2 HEIGHT 40
    1 TEXT Friends
    2 PADDING 10
    1 DISPLAY
    2 COLOR
    3 TEXT #000000
    3 FILL Transparent

The following tag represents the end of the GEDCOM transmission.

    0 TRLR                          end of file

GEDCOM is not designed to be user-friendly, although it is possible to write a simple file by hand. Validators are available to check the structure of such a file, e.g. PhpGedView (the validator is only one part of this large GEDCOM project). The file bush.ged is enclosed as an example of a big family tree. It traces the family of George Bush back to Edward, King of England, Joan of Acre and Vratislav, King of Bohemia.¹ The analysis of the GEDCOM format for use in this project and the specification of supported tags can be found in section 4.5.
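The line structure described in section 3.1.3 (a level number, an optional identifier, a tag and an optional value) is simple enough to check mechanically. The following Python fragment is only a minimal sketch of such a check and not the validator used by PhpGedView; the names `parse_line` and `check_levels` are hypothetical, and the only rule tested is that a level never increases by more than one from one line to the next.

```python
import re
from typing import List, NamedTuple, Optional

# level, optional @identifier@, tag, optional value (assumed line layout)
LINE_RE = re.compile(r"^(\d+)\s+(?:(@[^@\s]+@)\s+)?(\S+)(?:\s(.*))?$")


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]   # identifier such as @ind00001@, if the line defines one
    tag: str
    value: Optional[str]


def parse_line(line: str) -> GedcomLine:
    """Split one GEDCOM line into level, identifier, tag and value."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value)


def check_levels(lines: List[str]) -> List[int]:
    """Return the 1-based numbers of lines whose level skips a level."""
    problems = []
    previous = -1  # so that the first line must be at level 0
    for number, line in enumerate(lines, start=1):
        level = parse_line(line).level
        if level > previous + 1:
            problems.append(number)
        previous = level
    return problems
```

For example, `parse_line("0 @ind00001@ INDI")` yields level 0, the identifier `@ind00001@` and the tag `INDI`, while `check_levels(["0 HEAD", "2 VERS 5.5"])` reports the second line because level 1 is missing.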

3.2 Programs

We can divide the programs that handle genealogical data into the following three groups:

3.2.1 Textbox interface

In this type of program, the information is entered in text boxes, and connections between individuals are created and shown later. Only one child in the family pedigree can be shown at a time. Figure 3.2 shows an example of the environment of this type of program (Brother's Keeper).

Figure 3.2: Interface of Brother's Keeper

¹ The source of this file is no longer available.

3.2.1.1 Brother's Keeper

Brother's Keeper (BK) is a popular program for collecting and handling genealogical data. The records can be processed in many ways. Export and import of GEDCOM files is supported. BK is able to create many different charts and lists (e.g. a descendant chart, a tree chart, an ancestor chart, a list of birthdays, a list of names, a missing information report, etc.). Adding data is not very user-friendly (it takes a long time to learn how to work with the program), but the lists and charts are worth the effort. [18]

3.2.1.2 Family Historian

Family Historian (FH) supports the GEDCOM format natively, which makes export/import easier. The demo version has limitations (it does not allow saving data and is limited to 100 individuals). Data entry is again based on text boxes, but the level of detail is much higher. Data can also be entered in the diagram view, but the relationship links are not easy to manipulate. FH offers views (a family tree, a pedigree, ancestors, etc.) and many different reports (an ancestors report, a descendants report, a report by type of data, etc.). [19] One special view deserves a mention: the family tree can be viewed together with Ahnentafel numbers (the Ahnentafel numbering system is also known as the Sosa-Stradonitz system). [20] In short, each person has a number (not necessarily unique), and there are only two rules for establishing these numbers:

• the number of a person's father is double the person's own number
• the number of a person's mother is double the person's own number plus one

    1  self
    2  father
    3  mother
    4  father's father
    5  father's mother
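The two rules above translate directly into arithmetic. The following Python sketch (the function names are illustrative and not taken from any of the reviewed programs) computes the numbers of a person's parents and spells out the relationship that a given Ahnentafel number encodes.

```python
def father(number: int) -> int:
    """Ahnentafel number of the father of the person with this number."""
    return 2 * number


def mother(number: int) -> int:
    """Ahnentafel number of the mother of the person with this number."""
    return 2 * number + 1


def child(number: int) -> int:
    """Ahnentafel number of the child through whom this ancestor is reached."""
    if number < 2:
        raise ValueError("number 1 is the starting person and has no child here")
    return number // 2


def describe(number: int) -> str:
    """Spell out the relationship encoded by an Ahnentafel number."""
    if number < 1:
        raise ValueError("Ahnentafel numbers start at 1")
    steps = []
    while number > 1:
        # even numbers are fathers, odd numbers are mothers
        steps.append("father" if number % 2 == 0 else "mother")
        number //= 2
    # steps run from the ancestor back to self, so reverse them for reading
    return "'s ".join(reversed(steps)) if steps else "self"
```

Calling `describe` for the numbers 1 to 5 reproduces the list above, and `describe(6)` gives "mother's father".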

Figure 3.3 shows an example of the Ahnentafel numbering system applied to the George Bush family.

3.2.2 Half-graphic interface

In this type of program the information can be entered in graphic form (a family tree), but still only one child in the pedigree can be shown. The difference from the previous type is that the links between individuals are created during data entry through text boxes (interactive environment). Figure 3.4 represents this type of program (Family Tree Maker).

3.2.2.1 Personal Ancestral File

This program was developed by the developers from the Church of Jesus Christ of Latter-day Saints. The GEDCOM 5.5 standard is fully supported. Data entry is based on the family tree, but only one child can be shown in the pedigree at a time. The range of report types is rather small, but the ease of use of the program makes up for it. [21]

Figure 3.3: Ahnentafel numbering system example

Figure 3.4: Interface of Family Tree Maker

3.2.2.2 Legacy Family Tree 6.0

This program is rated as the best of the widely used genealogical programs today. [22] The ease of working with data and the possibility of generating a large number of different views, reports, lists, etc. make this program very popular. Other positive features are the export/import functionality, the handling of data sources and comprehensive help for researchers. [23] For export to the GEDCOM format the user can choose from three character sets (ANSEL, ANSI and UTF-8). The user can also choose which GEDCOM tags will be exported (GEDCOM 5.5 standard, PAF supported tags, basic, or a custom selection of tags). Import is possible from different programs and formats (GEDCOM, PAF, etc.). The biggest advantage of this program is a large database of suggested sources. Suggestions are made according to the data actually entered, and many different types of sources (or helpful information) are proposed (the database on familysearch.com, other related web pages, information about places and names, local historical centers (for many places in the US), telephone numbers and addresses of archives, libraries and other helpful institutions, etc.). The last useful quality we would like to mention is that the researcher has many options in the sources part (source name, text, location, pictures, reliability, etc.).

3.2.2.3 Family Tree Maker

This program has the most user-friendly environment of all programs mentioned in this part. The data are entered in tables (see figure 3.4) or in a family tree. Again, only one child per family can be shown at a time. An advantage of this program is a large collection of different outputs: trees, charts, a book or a map. The map shows the family members' places of birth (death, current residence, etc.).² Sophisticated online help is available for researchers; it contains tutorials, genealogy how-tos, suggestions and a search function. Export to the GEDCOM format and to older versions of FTM is supported. Import in this program is less obvious: the function is called append/merge and supports the GEDCOM, AFT or PAF format. We can move the icons of individuals (only in two specific types of trees), but we are not allowed to add new ones or change the details of any individual. This function serves only to change the data layout for printing. [24]

3.2.3 Fully graphic interface

This type of program lets users express the pedigree graphically in their own way. The information about an individual is entered in a text box, but the individual's icon is created first and can be placed anywhere on the map. Figure 3.5 represents this type of program (GenoPro 2007).

Figure 3.5: Interface of GenoPro 2007

² A basic version contains only the map of North America.

3.2.3.1 GenoPro 2007

This program is a user-friendly "drawing" tool for genealogists. The GEDCOM import/export function is of good quality and offers standard handling. GenoPro offers table layouts (outside the "genomap") of many views (families, unions, individuals, contacts, occupations, etc.). Its special features are emotional relationships between individuals (more than thirty different types, e.g. hate, jealous, love) and social entities (organization, school, etc.). The last feature we would like to mention is the generation of a complex web page with selected information (hyperlinks are created between related individuals and pieces of information, a color scheme can be chosen, etc.). [25] Figure 3.6 illustrates the complexity of the graphic potential that GenoPro offers.

Figure 3.6: Example of GenoPro file

All non-free programs were tested in a free trial version (the functions of the full version can differ); only for GenoPro 2007 did we obtain the full academic license.

Not all genealogical programs are designed for Windows users. A few were originally developed for Linux; GRAMPS (Genealogical Research and Analysis Management Programming System) is one example. [26] It is an open source software package, and some parts are under the GNU General Public License. We mention this program because it offers interesting features. Export is possible to many different formats (GEDCOM 5.5 standard, Legacy, BK, FTM, Geneweb, PAF, etc.) and can be limited to non-living persons only. A copyright notice of different levels (no copyright, GNU Free Documentation License, or standard copyright) can be added to the exported file. Individuals are created first, and the relations are handled later in a graphic interface in the pedigree part. The main parts of GRAMPS are as follows: people (list of individuals), relationships (list of relationships: parents' information, family information), family list (list of families), pedigree (standard pedigree tree), events (list of events), sources (list of sources), places (list of places), media (pictures, videos, etc.) and repositories (list of repositories).

3.2.4 PhpGedView project

At the end of this section we have to mention PhpGedView (hereinafter PGV), which is not really a program but a web application written in PHP. The PGV is labeled, among other things, an online genealogy viewer. It is primarily used for online publication of genealogy data by individuals. [27] The latest version of this application is built on PHP, the MySQL database and the Apache web server. The PGV is able to create many different reports (e.g. an Ahnentafel number report (mentioned in section 3.2.1.2), a pedigree report, a classic tree, etc.). These reports can be exported to PDF. It is fully compatible with the GEDCOM 5.5 format, and import/export is done via a GEDCOM file. It primarily uses the raw GEDCOM to store data, and only some parts (names, surnames, dates, places, etc.) are imported into the appropriate tables in the database. [28] [29] Figure 3.7 shows a family tree in the PGV environment. This tree is actually a set of pages connected by hyperlinks, which makes it possible to walk through the whole family tree. There are more than 1000 registered websites using PhpGedView.

Figure 3.7: Interface of PhpGedView

3.3 Web pages

Many websites provide help or valuable information for genealogists. We can divide the pages into the following four groups:

3.3.1 Informative pages

The pages of this group contain information on how to do genealogical work and research. They usually include contacts for experienced researchers who are able to help with the research.

3.3.1.1 CGSI.org

CGSI stands for Czechoslovak Genealogical Society International. It is focused on resources and help with Czech, Slovak, Moravian, Bohemian, Rusyn and German-Bohemian genealogy. [30] The website contains information focused on Bohemia as well as useful documents and how-tos for beginner and intermediate genealogists specializing in Bohemia. It also provides help from professional researchers. The most remarkable part of this organization is its library section, which contains more than 6000 volumes.³

3.3.1.2 Genea.cz

This website contains a lot of information about genealogy. It is written mostly in Czech because it is administered by Czech users, but one part is focused on researchers from all over the world and offers a large number of links to other websites.⁴ It covers the area of the former Czechoslovakia (including Bohemia, Moravia, Slovakia and Subcarpathian Ruthenia).

3.3.2 Database pages

The websites in this group contain genealogical data (stored in a large database) and offer these data for further use, although only some virtual tour through the system is free.

3.3.2.1 Genealogy.com

This website provides a large amount of information about genealogy. It primarily offers data from a database and enables searching in it. The basic search for non-registered users is very limited and offers only a little information. More information can be accessed with a paid account. [34] The site also contains the FTM Web Edition, which makes it possible to create a family tree online. It offers only a small part of the features of FTM. [24]

3.3.2.2 FamilySearch.com

This website is maintained by the LDS. Originally it was www.familysearch.org.⁵ The FamilySearch website allows users to search for information in the following sources:

1. Ancestral File is a large collection of names (individuals and families). Genealogical organizations from all over the world participate in this project. [36]
2. The International Genealogical Index (hereinafter IGI) is an index of millions of names of deceased people. This database has existed since 1969 and is maintained by the LDS. It best covers the period before 1876. [37] [38]
3. The Family History Library Catalog describes the books, microfilms, and microfiche in Salt Lake City's Family History Library. This is a part of the LDS library. [39]
4. The Internet Index to the Pedigree Resource File (hereinafter PRF) is a list of all family trees submitted by people worldwide to the LDS. The collection of CDs or DVDs with these trees is offered to other researchers. It contains more than 5 million names. The PRF itself contains more than 100 million names. [40]

³ The list of the books can be found here: [31]
⁴ Due to changes in the website [32], the mentioned page is no longer available. Instead, we can recommend the genealogical handbook (written in Czech). [33]
⁵ The website of the church itself is www.lds.org. [35]

3.3.3 Portals

This group contains websites that allow users to insert their own data and handle them in some way, for example by showing these data in some type of pedigree.

3.3.3.1 Ancestry.com

This portal is mainly focused on the creation of a family pedigree online. Its simple logic makes it possible to add individuals to the family tree and to add details in the list of individuals. We noted an error in the logic of this site: it checks the surnames of relatives and did not permit entering a woman's surname different from her father's (e.g. the daughter of Josef Šída has to be named Marie Šídová and not Marie Šída, as the system enters it). The family tree can be supplemented with photographs, different pictures and views can be printed, and events and stories can be added to the project. [41] The portal also provides search functionality, and the data found can be added to the created family tree. The sources of information were provided by the LDS [35] and are partially managed by Ancestry.com. An example of the information that can be found for free is in figure 3.8.

3.3.3.2 Geni.com

This website provides simple creation of a pedigree. It is possible to create an unlimited genealogical map (all siblings are shown, unlike in a standard pedigree). The application is based on Macromedia Flash technology. A list of individuals is also provided; see figure 3.9. A reasonable level of detail can be entered for individuals. [42]

3.3.4 Supplemental pages

The pages in this category provide some supplementary information.

3.3.4.1 Surnamedb.com

This website provides interesting information about a large number of surnames: the origin of the surname, its different versions, historical facts, etc. The following paragraph is taken from this website for the surname Fidler.

“This interesting surname is of Anglo-Saxon origin, and is an occupational name for a professional player of the fiddle, or a nickname for a skilled amateur. The derivation is from the Old English pre 7th Century "fithelere", fiddler. The creation of surnames from nicknames was a common practice in the Middle Ages, and many modern-day surnames derive from medieval nicknames referring to personal characteristics, as in this instance the "fiddler". It may also be a nickname from the Anglo-Norman French phrase "vis de leu", composed of the elements "vis", face, "de", of, and "leu", wolf; hence "wolf-face". Hunfridus Uis de Leuu is listed in the Domesday Book of Berkshire (1086). The surname is first recorded in the mid 12th Century (see below) and can also be found as Fiddler and Vidler. John le Fithelard is noted in the Subsidy Rolls of Worcestershire (1275), and John Fydeler is listed in the Poll Tax Returns of Yorkshire (1379). Recordings of the surname from London Church Registers include; Robert Fidler who married Joan Shereman on April 21st 1567 at St. Dunstan's in the East, and William, son of William and Sara Fidler, who was christened on October 4th 1663 at St. Dunstan's, Stepney. The Coat of Arms most associated with the family is a gold shield with three black bars wavy, the Crest being out of a gold ducal coronet, a demi griffin proper. The first recorded spelling of the family name is shown to be that of William Visdelou, which was dated 1160, in the "Pipe Rolls of Suffolk", during the reign of King Henry 11, known as "The Builder of Churches", 1154-1189. Surnames became necessary when governments introduced personal taxation. In England this was known as Poll Tax. Throughout the centuries, surnames in every country have continued to "develop" often leading to astonishing variants of the original spelling.” [43]

Figure 3.8: Example of the information available on Ancestry.com


Figure 3.9: Examples of the application on Geni.com


4 Analysis and design

This chapter gives a detailed analysis of the core of the system. It deals with the design of the database and with the users' specifications and rights, describes processes such as the GEDCOM format export, basic selection and creation of a new user, and defines the remaining logic of the whole portal. The first section covers the prerequisites and offers the first thoughts about the project; a simple example of the database design is also a part of it. The second section presents the whole database concept and explains the database structure with all its data types, entities and relations. The third section deals with the UML diagrams, determines the roles of the users in the system and explains the rights of these users within the application. The last but one section contains the business diagrams, establishes the processes and presents simulations of these processes. The last section deals with the GEDCOM format export: we establish the logic of the export and determine the supported tags.

4.1 Prerequisites

The basic ideas mentioned here are described more precisely in the following sections. This section outlines the logic of the system. We have designed the structure of the database so that it can offer more functions in the future. This design covers the common functions mentioned in section 3.2.

We consider three different search processes in this project. The first one, which finds the location on the basis of a family surname, was mentioned before. The second one is fully compatible with other genealogical programs: it browses the whole database, finds relatives and creates family trees. However, it is not a primary requirement, and it will be accessible only to registered users. The last one searches for detailed information about a chosen individual or group of individuals. This type of search will be followed by the GEDCOM export functionality.

Requests and results have to be unified: all the data have to be in the same "format". If somebody is looking for John Schwartz, the record can also be stored as Honza Švarc. First names and family names have changed significantly over the years, and this fact has to be taken into consideration. The basic data tables are the tables with names, family names (surnames) and cities (places). All of them must contain all known transliterations. The newest one carries a flag marking it as current. In the case of a place, there must also be an attribute indicating whether the place still exists. The structure of these tables should be similar to table 4.1.

    ID   name    actual   actual_id
    1    John    0        3
    2    Honza   0        3
    3    Jan     1        3

Table 4.1: Design of the actual name table
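The unification of requests can be illustrated on the rows of table 4.1. The following Python sketch is only an assumption about how such a lookup might work (in the portal itself this logic would live in the database); the names `NameRow`, `build_index` and `unify` are hypothetical, and the case-insensitive matching is an added convenience not stated in the design.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class NameRow(NamedTuple):
    id: int
    name: str
    actual: int      # 1 if this row holds the current form of the name
    actual_id: int   # ID of the row that holds the current form


# The three rows of table 4.1
NAMES = [
    NameRow(1, "John", 0, 3),
    NameRow(2, "Honza", 0, 3),
    NameRow(3, "Jan", 1, 3),
]


def build_index(rows: Iterable[NameRow]) -> Dict[str, str]:
    """Map every known variant (case-insensitively) to the current form."""
    rows = list(rows)
    by_id = {row.id: row for row in rows}
    return {row.name.casefold(): by_id[row.actual_id].name for row in rows}


def unify(name: str, index: Dict[str, str]) -> Optional[str]:
    """Return the current form of a name, or None if the variant is unknown."""
    return index.get(name.strip().casefold())
```

With this index, a request for "John" or "Honza" is unified to "Jan" before the search, while an unknown variant returns `None` and would have to be added to the translation table first.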


These tables with names, surnames and cities have to be filled in first and must be as comprehensive as possible. To fit all types of records, the date is split into day, month and year (19 May 1981 is represented by three columns as 19, 5 and 1981). The main table contains the records about individuals, and for further research it must be filled in with as much information as possible. The basic structure should be similar to table 4.2.

    ID   ins_n   ins_fam    ins_city   act_n   act_fam   act_city   day   mo   year
    1    John    Schwartz   Praha      Jan     Švarc     Prague     15    11   1855
    2    Jirka   Samek      Turnau     Jiří    Samek     Turnov     11    11   1910
    3    Marek   Scholz     Josefov    Marek   Šolc      none       14    12   1847

Table 4.2: Design of the main table

Further columns contain records about the mother, the father, the address and other information useful to genealogical researchers. This design influences the way records are inserted into the main database from the sources. There are two possibilities: manual entry of all information, or an automated way with the help of triggers (with confirmation). When a new record is entered into the main table and the current form of the name, surname or city is missing, or there is no record of it at all, the appropriate translation table must be edited (e.g. Gottwaldov is now called Zlín). The trigger will raise a message with the appropriate instruction on what to do. There could be two web forms: one for fully manual handling of the records and one for automated usage of triggers.

The system also has to support uploading and presenting existing text files of completed research. Each of these documents has to be commented and indexed in the database. A direct link to the document will be placed in the appropriate row of the main table. The keywords that unambiguously identify this source are family name(s) and place(s).
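The three-column date representation used in the main table can be produced from a GEDCOM-style date such as "15 SEP 1980". The following Python sketch is an illustration only; the function name `split_date` is hypothetical, and allowing the day or the month to be missing is an assumption about why separate columns are convenient, not something the design states.

```python
from typing import Optional, Tuple

MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def split_date(text: str) -> Tuple[Optional[int], Optional[int], int]:
    """Split 'DD MON YYYY', 'MON YYYY' or 'YYYY' into (day, month, year)."""
    parts = text.upper().split()
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported date: {text!r}")
    year = int(parts[-1])
    month = None
    day = None
    if len(parts) >= 2:
        if parts[-2] not in MONTHS:
            raise ValueError(f"unknown month in date: {text!r}")
        month = MONTHS[parts[-2]]
    if len(parts) == 3:
        day = int(parts[0])
    return day, month, year
```

For instance, `split_date("19 MAY 1981")` gives the values 19, 5 and 1981 from the example above, and a record that only states the year keeps the day and month columns empty.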

4.2 Database design

4.2.1 The database model

There were two different ideas before modeling began. The first model was designed with regard to the database functions: Oracle supports searching in depth and recursive queries through the START WITH and CONNECT BY clauses. [44] The second model was designed with regard to the GEDCOM format export. We used Toad Data Modeler [45] and ER Studio [46] as database modeling tools. The main difference between the versions is in the entities related to the family and in the connections between parents and their children. The first version specifies the family on the basis of the unions entity, where the man_id and woman_id are stored. The is_mother and is_father relationships represent the connection between parents and their children.
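The idea behind the first model can be shown outside the database as well. In Oracle the walk from a person to all descendants would be written with the START WITH and CONNECT BY clauses; the following Python sketch performs a comparable depth-first walk in memory. It is not the query used in the portal, and the field names `father_id` and `mother_id` are assumptions standing in for the is_father and is_mother relationships.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Individual(NamedTuple):
    main_id: int
    father_id: Optional[int]   # assumed column for the is_father relationship
    mother_id: Optional[int]   # assumed column for the is_mother relationship


def descendants(people: Iterable[Individual], root_id: int) -> List[Tuple[int, int]]:
    """Return (generation, main_id) pairs for all descendants of root_id."""
    # Index children by parent once, so the walk does not rescan all rows.
    children: Dict[int, List[int]] = {}
    for person in people:
        for parent_id in (person.father_id, person.mother_id):
            if parent_id is not None:
                children.setdefault(parent_id, []).append(person.main_id)

    result: List[Tuple[int, int]] = []
    seen = {root_id}

    def visit(person_id: int, generation: int) -> None:
        for child_id in children.get(person_id, []):
            if child_id in seen:
                continue  # reached already through the other parent
            seen.add(child_id)
            result.append((generation, child_id))
            visit(child_id, generation + 1)

    visit(root_id, 1)
    return result
```

The generation number returned with each descendant plays a role similar to the LEVEL pseudocolumn of an Oracle hierarchical query.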


The logical structure of this version of a database model is in figure 4.1.

Figure 4.1: Logical structure of the database

The second version contains the families entity (with the husband_id and wife_id attributes) and the children entity (with the main_id and family_id attributes). The following listing shows the data in a GEDCOM file together with the corresponding database attributes.

    0 @fam00001@ FAM                family_id in the families entity
    1 UNIONS @marr00001@            union_id in the unions entity
    1 HUSB @ind00002@               husband_id in the families entity
    1 WIFE @ind00003@               wife_id in the families entity
    1 CHIL @ind00001@               main_id in the children entity
    1 CHIL @ind00004@               main_id in the children entity
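The mapping above can be read in the opposite direction as a recipe for export. The following Python sketch builds the lines of one family record from rows shaped like the families and children entities of the second model. It is a simplified illustration, not the export designed in section 4.5: the pointer format with a prefix and five digits is copied from the GenoPro example, UNIONS is a GenoPro-specific tag, and keeping `union_id` directly on the family row is an assumption made for brevity.

```python
from typing import Iterable, List, NamedTuple, Optional


class Family(NamedTuple):
    family_id: int
    union_id: Optional[int]     # assumed to be reachable from the family row
    husband_id: Optional[int]
    wife_id: Optional[int]


class Child(NamedTuple):
    main_id: int
    family_id: int


def pointer(prefix: str, number: int) -> str:
    """Format an identifier in the style of the example, e.g. @ind00001@."""
    return f"@{prefix}{number:05d}@"


def family_record(family: Family, children: Iterable[Child]) -> List[str]:
    """Build the GEDCOM lines of one family record."""
    lines = [f"0 {pointer('fam', family.family_id)} FAM"]
    if family.union_id is not None:
        lines.append(f"1 UNIONS {pointer('marr', family.union_id)}")
    if family.husband_id is not None:
        lines.append(f"1 HUSB {pointer('ind', family.husband_id)}")
    if family.wife_id is not None:
        lines.append(f"1 WIFE {pointer('ind', family.wife_id)}")
    for child in children:
        if child.family_id == family.family_id:
            lines.append(f"1 CHIL {pointer('ind', child.main_id)}")
    return lines
```

Called with `Family(1, 1, 2, 3)` and the children 1 and 4 of family 1, the function returns the six lines of the listing above.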

The IDs of the child, the husband and the wife all refer to the same column (the main_id from the individuals entity). Figure 4.2 shows the differences in the second version. Physical and logical views of both versions are presented in Appendix C. Both versions were worked out, and in the end the version designed with regard to the database functions was chosen for this project. The reasons for this decision are as follows. Search is the primary function of this project, and the Oracle functions are well suited to searching and make it possible to create descendant charts. The GEDCOM format export functionality will be designed for this version of the model (section 4.5). Note that the database model does not contain any information about the users' accounts, because these are stored in the APEX system, and users are not intended to be handled specially in this database model.

Figure 4.2: Differences in versions of database model - logical structure

4.2.2 Description of entities

The description of the columns of all tables is placed in Appendix D.

4.2.2.1 The main entities

The individuals entity (table D.1) contains the records of all individuals together with the information about their parents, the date of insertion of the record and the date of its last change. The NOS and NOD attributes stand for the number of sons and the number of daughters. This information helps to shorten the search process, which goes through the whole individuals entity when we are looking for the descendants of a chosen person. These attributes are generated automatically by a trigger upon the insert action.

The events entity (table D.2) contains all events for all individuals. The only exception is the wedding, which is not stored here: a wedding concerns two people, so the information would be duplicated, which is not desired.

The unions entity (table D.3) stores the records of all marriages (and of other kinds of unions).

The sources entity (table D.4) contains the information about the source of any piece of information.

The locations entity (table D.5) stores the addresses for the events, unions and sources.

The file management entity (table D.6) represents the relations between the individuals and the files in the system.

The files entity (table D.7) contains the links to the imported files and the ID of the type of each file (picture, text, etc.).

4.2.2.2

The type entities

The type of file entity (table D.8), the type of union entity (table D.9) and the type of event entity (table D.10) contain the types of files, unions and events. The types implemented at this moment are as follows:
• files: personal picture, personal document, family document and family picture
• unions: civil marriage, civil and religious marriage, religious marriage and other
• events: birth, baptism, death, burial and emigration
These lists can be updated by the administrator of the system.

4.2.2.3

The translation entities

The actual_names entity (table D.11), the actual_surnames entity (table D.12) and the actual_places entity (table D.13) contain all known transliterations and transcriptions of names, surnames and places. The recency attribute in these entities indicates whether the record is current or not. The current records are referenced by the actual_name_id, actual_place_id, actual_m_surname_id and actual_f_surname_id attributes. The separate attributes actual_m_surname_id and actual_f_surname_id are needed because male and female surnames differ in Bohemia: an American (or Canadian) user would enter Jane Dvorak, but in Bohemia this person is Jana Dvořáková. Because American and Canadian users usually cannot type Czech letters, the data must also be handled without diacritics. This has to be taken into consideration for all the translation entities (on top of all transliterations). A special function that would convert names, surnames and cities with diacritics to a form without diacritics is not very useful in this case, and for that reason the variants are stored as records in the database.

4.2.2.4

The temporary entities

The temp_individuals entity (table D.14) and the temp_events entity (table D.15) store the records inserted by the users of the system. These records are kept until the reviewer approves or denies the update of the individuals entity.

4.2.3

Data types

4.2.3.1

The date

We have to take into consideration that not all sources are complete (a source can be damaged, unreadable or incomplete). For this reason the date is split into three parts: Day, Month and Year.

4.2.3.2

TIMESTAMP

Unlike the incomplete dates, the date of the creation or change of a record in the database is always complete, so the TIMESTAMP data type is used instead. The actual format depends on the installation of the database and can be changed later.

4.2.3.3

CHAR(1)

This data type is used where only two possible values are needed, namely in the recency, divorced and existing columns (Y or N value) and in the gender column (F or M value).

4.2.3.4

VARCHAR2 and NVARCHAR2

The VARCHAR2 data type is used for the records that are not supported in the GEDCOM format (we consider GEDCOM 5.5). The NVARCHAR2 data type is used for the records which are going to be exported. The reason for using NVARCHAR2 is that the character set of this data type can be either AL16UTF16 or UTF8; it is specified at database creation time as the national character set. The latter is one of the character sets supported by the GEDCOM format and it will be used as the default for the export. A further advantage of VARCHAR2 and NVARCHAR2 is that they save table space, because both store character strings of variable length. The number in the declaration defines the maximum string length (in bytes or characters); the maximum is 4000 bytes. [47]

4.2.3.5

NUMBER

The NUMBER data type is used in the ID columns and in the reliability column. An ID is the unique identifier of each record of an entity; the reliability column indicates how reliable the source is.
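For the string types of section 4.2.3.4 it matters that a length limit given in bytes is not the same as a limit given in characters once Czech letters are stored in UTF-8. The short Python sketch below only compares the two counts for one surname; it does not model how Oracle itself counts the declared length, which depends on the column definition and the character set.

```python
def utf8_size(text):
    """Return (number of characters, number of bytes in UTF-8)."""
    return len(text), len(text.encode("utf-8"))


for surname in ("Dvorak", "Dvořáková"):
    chars, size = utf8_size(surname)
    print(f"{surname}: {chars} characters, {size} bytes")
```

Each letter with a diacritic mark in this example takes two bytes, so the form with diacritics needs more bytes than it has characters.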

4.3

UML diagrams

The UML diagrams mentioned below (and the one presented in section 4.5) were designed with MagicDraw UML Personal Edition. [48] The academic license for this tool was provided within the class X36RSF. [49]

4.3.1

Use Case diagram

Figure E.1 represents the APEX system and the use cases for the users in the system and in the Genealogical portal application. This use case diagram defines the roles and possibilities for each type of user in the whole system. There are four types of users - Admin, Reviewer, Registered User and Non-registered User. The Admin is the administrator of the APEX system and the creator of the application and the database. The Reviewer is a user specialized in reviewing the entries from other users. A Registered User has the rights to add data to the system, perform the basic and the extended search, and export the selected records to a file in the GEDCOM format. A Non-registered User is able only to perform the basic search and ask for the creation of a new account.
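The roles described above can be written down as a simple permission table. The following Python sketch is one possible encoding, not part of the APEX application; the action names are hypothetical labels for the use cases, and only rights that the text states explicitly are listed (whether the Reviewer and the Admin also inherit the rights of a Registered User is not modeled here).

```python
from enum import Enum


class Role(Enum):
    ADMIN = "Admin"
    REVIEWER = "Reviewer"
    REGISTERED = "Registered User"
    NON_REGISTERED = "Non-registered User"


# Hypothetical action names; each set holds only what the text states.
PERMISSIONS = {
    Role.NON_REGISTERED: {"basic_search", "request_account"},
    Role.REGISTERED: {"basic_search", "extended_search", "add_data", "export_gedcom"},
    Role.REVIEWER: {"review_entries"},
    Role.ADMIN: {"administer_system"},
}


def is_allowed(role, action):
    """Return True when the role may perform the given action."""
    return action in PERMISSIONS[role]


print(is_allowed(Role.NON_REGISTERED, "export_gedcom"))
print(is_allowed(Role.REGISTERED, "export_gedcom"))
```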

4.3.2

State diagram

The state diagram shown in figure E.2 corresponds to the states of the portal (mostly web pages). A user who is not logged in can visit the Basic search, Information and Registration pages. After login, access is terminated for an unauthorized user, while for an authorized user it is divided according to the three groups of users. Only two pages, the User administration page and the DB administration page, are not part of the application itself but part of APEX. The purpose of all other pages in the diagram is evident.

4.3.3

Sequence diagram

We have designed the following two sequence diagrams to show how the portal should work. The Basic search process shown in figure E.3 is the key functionality of the whole project. Among other things, the figure shows the time assumed to pass between the moment when the user runs the basic search and the moment when he or she asks for an account. This function is implemented and described in chapter 5. The User creation process, represented by the sequence diagram in figure E.4, shows how a new account is created. It also displays the estimated time between the user's request for the account and its actual creation and confirmation. The process of user creation is not automated, for two main reasons: the slow growth in the number of new users and the need to supervise the accounts. This process can be managed later using either internal or external processing (e.g. another entity in the database handled by PHP code).

4.4

Business oriented models

This section was partially worked out within a school project. [49] For the business oriented models we used the Craft.CASE tool, which is developed by e-Fractal Ltd. for the Deloitte company. [50] All models designed in Craft.CASE use the BORM (Business and Object Relation Modeling) method.

BORM is an object-oriented system development method used for analysis, design and development of systems. It stresses process modeling as the main technique of capture. In each phase, BORM uses only a limited set of modeling concepts and rules. It has been developed as a method for pure object-oriented software development (e.g. the Smalltalk programming language, object database systems like Gemstone, ArtBase, ObjectStore, etc.). BORM is applicable in areas where the modeled problem can be interviewed, analyzed and designed as a collection of processes with mutually collaborating objects. [51]

The BORM Object Behavioral Analysis [52] (hereinafter BBA; also BOBA or OBA) is the main process of BORM and consists of the following steps:
• interview, creation of the list of functions and scenarios
• creation of the list of participants
• classification of objects
• creation of the object relations and interactions (ORD)
• simulations of scenarios

ORD (Object Relation Diagram) serves for a detailed description of the processes in the system in the first phases of analysis. Each object that participates (i.e. has a role) in the process is displayed as an automaton via a sequence of states and transitions. In ORD, the modeled process is displayed as mutual communication between objects. ORD can be understood as a combination of the activity, sequence, interaction and state-transition UML diagrams. [51]

The modeling process in Craft.CASE is divided into three parts - Interview, Business and Hierarchy. We have followed these parts and the results are shown below.

4.4.1

Interview

4.4.1.1

Sketches

The sketch function is the only part of the Interview. It provides a free drawing option. We created three sketches in the first phase of modeling; they are shown in figures F.1, F.2 and F.3.

4.4.2

Business

4.4.2.1

Participants

A participant is a component of the system participating in the processes which are described in scenarios. Each participant has to be assigned to at least one scenario. [53] Participants are, for example, technology equipment, software components and roles of individuals. The participants designed for our project are listed in table 4.3.

Name             Description
Admin            Administrator of the system
Casual User      Non-registered user
Database         Database engine
Reviewer         Special user with the review role
User             Registered user
User's computer  Computer of user
Web page         The logic of APEX application

Table 4.3: Participants

4.4.2.2

Scenarios

The scenario is a description of one particular process. It contains the definitions of the Initiation, Action and Result phases. [53] The tables F.4, F.6, F.8, F.10, F.12 and F.14 represent the scenarios in the designed system.

4.4.2.3

Functions

A function is a specific part of the system relevant to a particular area or group. [53] Each function is consequently included in one or more scenarios. The functions relevant for our project are shown in the business architecture (figure F.16).

4.4.2.4

Data flows

A data flow represents how data are moved between activities. [53] The data flows used in our project can be seen in the appropriate business diagrams, so we do not list them separately.

4.4.2.5

Business diagrams

A business diagram is based on functions, scenarios, data flows and participants. It represents the process which was previously designed in the appropriate scenario. [53]

Basic search
This process is the main one in this project and it is the only one permitted for a user who is not logged in to the system. Compared with the description given earlier, we have added the name element to improve the basic select operation. The process can be described as follows: the Casual User enters the requested surname and name into the web-based form (the surname is a required field). These data are then checked within the logic of APEX and sent to the database engine. In the database, the surname (and the name, if entered) is checked to determine whether it is in the actual state or not. If the data are not marked as actual, they are transformed to the actual state. After that, the main select (with the transformed values) is performed on the individuals table. When the result is returned, the user is informed, which is the required final state. Figure F.5 represents this process. The simulation of this diagram is placed in Appendix G. 1

1 With respect to the amount of space needed for the presentation of the simulation, the simulations of all other diagrams are placed on the enclosed CD.

Extended search
This process uses the Basic search as the input of data and allows the User to find more specific information about the chosen person(s). The main difference from all other procedures is the feedback from the Admin. When the requested information is not present in the database, the administrator is informed and the user has the possibility to request a search for the missing information. The information (if found) is then inserted into the database (this process is described below). Figure F.7 represents this process.

Data insert by admin
This process is closely associated with the Extended search procedure. If the data requested by the User are not present in the database, the Admin is the only user who can insert new data into the database directly. The administrator (or his team of researchers) has to find the requested data. Triggers and database procedures check the inserted data for recency and completeness of the record. Figure F.9 represents this process.

User creation
As mentioned before, the user accounts are not part of the designed database. User accounts are created in the APEX management environment by the administrator of the system. After a non-registered user fills in the form, the Admin is informed and the account can be created. After the account is created, the user is informed via email and can log in for the first time to confirm the creation. Figure F.11 represents this process.

GEDCOM export
In this process the requested data are exported into a GEDCOM file. The chosen data are processed, and the file is created and stored on the hard disk of the user's computer. The full specification of the logic of the export is given in section 4.5. Figure F.13 represents the business diagram of this process.

Data insert by user
A registered user is allowed to insert his or her own records into the database. In the first phase the information is inserted into the temp_individuals table. These records then have to be reviewed by the Reviewer, and the valuable information is inserted into the individuals table. When the information is not valuable, the individuals table is not updated and the records in the temp_individuals table are marked as not valuable. In both cases, the data in the temp_individuals table are kept and the user is informed about the performed action. Figure F.15 represents this process.

4.4.2.6

Business architecture

The business architecture represents the used scenarios and functions, shows the relations between scenarios and links them to the appropriate business diagrams. There are three types of relations between scenarios: [53]
• Sequence of scenarios (transition) - the end of one scenario is the beginning of another
• Composition of scenarios (uses) - one scenario is a partial activity of another
• Specialization of scenarios (extends) - one scenario is a particular case of another
The business architecture for our project is represented by the figure F.16.

4.4.3

Hierarchy

4.4.3.1

Business Hierarchy

The business hierarchy describes the hierarchy relations between the components of the system (users, buildings, work positions, etc.). [53] We have created the hierarchy of the users in the system (shown in the figure 4.3).


Figure 4.3: Hierarchy of users

4.5

GEDCOM export

The GEDCOM format was explained in section 3.1.4; this section describes in detail the logic of the export from the database to a file in the GEDCOM format. As mentioned before, GEDCOM 5.5 is supported in this project. [8]

4.5.1

Supported tags

As stated in section 3.1.3, a GEDCOM file consists of three parts. First, we have to create the header for each exported file. The template for it is as follows:

0 HEAD
1 SOUR Genealogic Origin Locator Portal
2 VERS 1.0
1 DATE actual_date
1 CHAR UTF-8
1 GEDC
2 VERS 5.5
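The header template can be filled in mechanically. The Python sketch below is only an illustration: it substitutes the actual_date placeholder with the current date, written in the day, month abbreviation and year form that GEDCOM uses for dates; the function name is hypothetical.

```python
from datetime import date

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def gedcom_header(today):
    """Return the header lines of an exported file for the given date."""
    actual_date = f"{today.day} {MONTHS[today.month - 1]} {today.year}"
    return [
        "0 HEAD",
        "1 SOUR Genealogic Origin Locator Portal",
        "2 VERS 1.0",
        f"1 DATE {actual_date}",
        "1 CHAR UTF-8",
        "1 GEDC",
        "2 VERS 5.5",
    ]


# An otherwise empty file: the header followed directly by the trailer.
print("\n".join(gedcom_header(date(2007, 5, 25)) + ["0 TRLR"]))
```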

The supported tags from the record part are divided into three sections - individual records, family records and place records.

@I1@ INDI
  NAME, DISPLAY, GIVN, SURN, SEX
  BIRT, BAPM, DEAT, BURI, EMIG (each contains DATE, PLAC and _XREF)
  FAMS, FAMC
  CHAN, DATE, TIME

@FAM1@ FAM
  DIV (Y or nothing)
  HUSB, WIFE, CHIL
  MARR, DATE, TYPE
  PLAC
  _XREF

@PLAC1@
  NAME
  CITY
  COUNTRY

The last part is the trailer section:

0 TRLR

The mentioned tags are described in Appendix B. The only one not covered in this appendix is the cross reference tag _XREF. This tag is taken from the GenoPro program [25]; other programs use other tags for cross references (e.g. UID in PAF or Legacy as the User ID, PLA DEFN as the definition of the place in Legacy, etc.).

4.5.2

Analysis of the export process

As mentioned before, anybody can read the GEDCOM format directly, but it is not primarily intended for that, and many tools exist for working with it. There is an exact specification of how a GEDCOM file is built [54] [55], but there is no exact explanation of how to build this file from a data source such as a database. Existing programs convert GEDCOM to HTML, CSV or, more recently, XML, rather than text, CSV or XML to GEDCOM. They also import GEDCOM into a specially created database. There is one article which describes how to convert text to GEDCOM manually. [56]

The only existing solution close to the portal that we built is the PGV project (mentioned in section 3.2.4). [27] This program combines storing one's own data in a sophisticated system, graphic presentation of these data and export to different formats (this solution is fully compatible with GEDCOM 5.5). There are, however, two big differences between our project and PGV. First, PGV has a lot of features, but it is primarily intended for personal use. No genealogical company with real researched data is using it, and this is what makes our project exceptional: the connection between a company with genealogical data (based on many years of experience and a large collection of data) and a user who is able to update the database. The second difference is in the way of handling the data: PGV uses GEDCOM as an input format and stores the data (other than the processed ones) in special text fields as they were inserted, i.e. in the raw GEDCOM format (like 0 @I1@ INDI, 1 NAME John /Chalupa/, etc.). [28]

The GEDCOM file consists of sections and other parts, which were described above. We have to take into consideration repeating sequences, such as individuals, families, places, etc. These sequences have to be processed and stored for temporary use. In our project we have established four main parts - individuals, families, places and marriages (unions). All these parts must be processed, and the connections between them must be specified and recorded.

There are many possibilities for storing the information temporarily. One possibility is to create special temporary views in the database, then read the information and save it into a file. Another possibility is to create special arrays (within specially created code, e.g. in C++) and then fill the file from these arrays. The advantage of the first option is that a complex view can be created that is more suitable for the export than the present structure of the database. In both cases the information is stored only temporarily and is deleted at the end (after the user logs off from the application or after the export is done). Other solutions using other structures are possible, but the logic of the export is essentially the same for all of them.

There must be some limitation on the actual export of the data, because the company has put a lot of effort and work into collecting this information. The originator of each record is identified unambiguously by the inserted_by column in the individuals table; the data of the company are inserted by the Admin. Limits will be set on the number of records which can be downloaded from the company collection. The number of the user's own records is not limited. Another user's data can be downloaded if that user allows it.

4.5.3

Logic of the export process

This section contains the whole logic of the export. The designed process is shown as a UML activity diagram in figure E.5. All steps of this process are described below. The initial state is the choice of the individuals for the export and the actual start of the process. A special state is the Cancel request, by which the export can be stopped at any time during the process; the file is then not generated.
1. The system checks the number of persons for the export. If it is 0, go to step 14, else go to step 2.
2. The system finds the "oldest" not yet handled person in the choice, creates the ID @I1@ and creates the connection between @I1@ and the main_id from the individuals entity.
3. The system checks whether the parents of @I1@ are present in the choice. If they are, go to step 2, else go to step 4. This step verifies the "oldest" person: there may be no record of the birth of the parents, and therefore the system would not find them as the "oldest" (although they are certainly older than @I1@).
4. The system checks for the wife/husband of @I1@ and, if present, creates @I2@ and the connection (as in step 2). It also creates the FAMS tag and the FAM record, in which the HUSB and/or WIFE tags are created. If there is no husband/wife present, go to step 7.
5. The MARR record is processed for @I1@ and, if present, for @I2@.
6. The system checks the places for the union record and, if present, creates the corresponding cross reference and the PLAC record.
7. The system checks for the children of person @I1@ (and of @I2@, if present). If there are any, it creates @I3@, etc. and updates the FAM record with the CHIL tag(s). This step is repeated until there is no further child of @I1@ (and of @I2@, if present). The FAMC tag is also created in this step.
8. The system checks for the personal data of @I1@ and creates the INDI record.2 If there is no record (birth, death, etc.) for this person, the export ends and the errors are checked.3
9. The system checks for the places in the personal data and creates the corresponding cross references and the PLAC records.
10. The system checks for the wife or husband of @I1@ and, if present, goes to step 8 with the corresponding identifier, else to step 11.
11. The system checks for the children of @I1@ and, if present, goes to step 8 with the corresponding identifier, else to step 12. (This step is repeated until there is no further child of @I1@ and, if present, of @I2@.)
12. (= step 1.) In this step the @I1@ person is declared handled and another "oldest" person is chosen. In practice it is the @I2@ person, and therefore the FAM record (if no other husband or wife exists) is kept as it was (the INDI record is also created) and the process ends soon.
13. After all persons in the choice are handled and no error has occurred, the file is saved.
14. The system checks for other errors and, if any occurred, they are shown on the result web page.
15. The system closes the export and releases all temporary resources (view, array, etc.).

This process is able to handle all types of relations: father only with 0, 1 or more children; mother only with 0, 1 or more children; both parents with 0, 1 or more children; and a separate person without any relatives. The logic of the export can be adjusted for future changes, because not all tags are supported at this moment (e.g. BARM for INDI).
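The core of the steps above can be sketched in Python under strong simplifications. This is not the implementation of the portal: events, places, the MARR record, error handling, the header and the trailer are left out, and the in-memory record layout (name, sex, birth_year, father, mother, spouse) is an assumption made for this example. The sketch only shows the ordering of persons ("oldest" first, a selected parent before the child even without a birth record) and the linking of INDI and FAM records.

```python
def export_order(people, selected):
    """Order the selected ids: oldest person first, then spouse and children."""
    def birth_key(pid):
        year = people[pid].get("birth_year")
        return (year is None, year or 0, pid)  # unknown birth years sort last

    order = []

    def handle(pid):
        if pid in order:
            return
        person = people[pid]
        # Step 3: a selected parent goes first, even without a birth record.
        for parent in (person.get("father"), person.get("mother")):
            if parent in selected and parent not in order:
                handle(parent)
        if pid in order:  # already added as a child of that parent
            return
        order.append(pid)
        couple = [pid]
        spouse = person.get("spouse")
        if spouse in selected:  # step 4
            couple.append(spouse)
            if spouse not in order:
                order.append(spouse)
        for child in sorted(selected, key=birth_key):  # step 7
            parents = (people[child].get("father"), people[child].get("mother"))
            if any(parent in couple for parent in parents):
                handle(child)

    for pid in sorted(selected, key=birth_key):  # step 2
        handle(pid)
    return order


def export_gedcom(people, selected):
    """Return the INDI and FAM lines for the selected persons."""
    if not selected:  # step 1: nothing to export
        return []
    order = export_order(people, selected)
    xref = {pid: f"@I{n}@" for n, pid in enumerate(order, start=1)}

    def couple_key(pid):
        spouse = people[pid].get("spouse")
        if spouse not in xref:
            return None
        return (pid, spouse) if people[pid]["sex"] == "M" else (spouse, pid)

    families = {}  # (husband id, wife id) -> list of child ids
    for pid in order:
        key = couple_key(pid)
        if key:
            families.setdefault(key, [])
        father, mother = people[pid].get("father"), people[pid].get("mother")
        key = (father if father in xref else None,
               mother if mother in xref else None)
        if key != (None, None):
            families.setdefault(key, []).append(pid)
    fam_ref = {key: f"@FAM{n}@" for n, key in enumerate(families, start=1)}

    lines = []
    for pid in order:  # step 8, reduced to NAME and SEX
        lines += [f"0 {xref[pid]} INDI",
                  f"1 NAME {people[pid]['name']}",
                  f"1 SEX {people[pid]['sex']}"]
        for key, kids in families.items():
            if pid in key:
                lines.append(f"1 FAMS {fam_ref[key]}")
            if pid in kids:
                lines.append(f"1 FAMC {fam_ref[key]}")
    for (husband, wife), kids in families.items():
        lines.append(f"0 {fam_ref[(husband, wife)]} FAM")
        if husband is not None:
            lines.append(f"1 HUSB {xref[husband]}")
        if wife is not None:
            lines.append(f"1 WIFE {xref[wife]}")
        lines += [f"1 CHIL {xref[kid]}" for kid in kids]
    return lines


# Hypothetical sample data; the mother has no birth record.
PEOPLE = {
    1: {"name": "Jan /Novak/", "sex": "M", "birth_year": 1850, "spouse": 2},
    2: {"name": "Marie /Novakova/", "sex": "F", "birth_year": None, "spouse": 1},
    3: {"name": "Josef /Novak/", "sex": "M", "birth_year": 1875,
        "father": 1, "mother": 2},
}
print("\n".join(export_gedcom(PEOPLE, {1, 2, 3})))
```

The sketch assumes that the parent links contain no cycles. When only persons 2 and 3 are selected, the mother is still placed before her son, which is the situation that step 3 is meant to cover.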

2 The date structure described in section 4.2.3 has to be rebuilt into the format used in GEDCOM.
3 The application is built so that it does not allow individuals to be stored without any information in the events table.
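The rebuilding of the date mentioned in footnote 2 can be sketched as follows. The Python function below joins the separately stored Day, Month and Year parts into a GEDCOM date value and copes with missing parts; the function name and the use of None for a missing part are assumptions, and approximated dates or date ranges are not covered.

```python
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def gedcom_date(day=None, month=None, year=None):
    """Join the stored Day, Month and Year parts into a GEDCOM date value.

    Missing parts are passed as None. An empty string is returned when the
    year is unknown, and a day without a month is ignored, because these
    cases are not handled in this sketch.
    """
    if year is None:
        return ""
    parts = [str(year)]
    if month is not None:
        parts.insert(0, MONTHS[month - 1])
        if day is not None:
            parts.insert(0, str(day))
    return " ".join(parts)


print(gedcom_date(12, 5, 1850))
print(gedcom_date(None, 5, 1850))
print(gedcom_date(None, None, 1850))
```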

CHAPTER 5. PROTOTYPE


5 Prototype

The prototype of the application is built on Oracle Application Express [57] (hereinafter APEX), a tool for building other applications. It should be noted that the APEX bundled with Oracle Database 10g Express Edition (hereinafter XE) differs slightly from the separate release from Oracle: the version of the tool in Oracle XE is 2.1, whereas the newest available version is 3.0.

5.1

Installation of the application

The minimum prerequisite for running the application is Oracle Database 10g Express Edition. [58] It is recommended to have Oracle SQL Developer installed. [59] Note that this edition has hardware limitations: 1 GB of RAM and 1 CPU at most. The following steps describe how to run the application on another computer:
1. Open the Database Home Page login window:
   • On Windows, from the Start menu, select Programs, then Oracle Database 10g Express Edition, and then Go To Database Home Page.
   • On Linux, click the Application menu (on Gnome) or the K menu (on KDE), then point to Oracle Database 10g Express Edition, and then Go To Database Home Page.
2. At the Database Home Page login window, enter the following information:
   • Username: Enter system for the user name.
   • Password: Enter the password specified when Oracle Database XE was installed.
3. Click Login.
4. Go to Administration > Database Users.
5. Create a new user with all privileges.
6. Log in as the user created in the previous step.
7. Go to Application Builder > Import.
8. Browse for the file application.sql, then click Next and Install.
9. Choose the appropriate schema from the Parse As Schema field (there is only one schema to choose from at this moment).
10. Keep Build Status on "Run and Build Application".
11. Keep the field Auto Assign New Application ID.


12. Click Install Application.
13. Click Run Application.

At this moment the application is imported into APEX. The following steps describe how to create the tables and insert the sample data into the database:
1. Create the tables by running the script create_tables.sql (use of Oracle SQL Developer is recommended).
2. Check the created tables in Home > Object Browser.
3. Fill in the sample data by running the script gen_sample_data.sql.

We have created a new theme for this application. The theme consists of pictures, CSS files and templates. All these parts are exportable/importable in APEX. The following steps explain how to import the theme we have created for this application:
1. Go to Shared Components > Themes > Create > From Export.
2. Browse for the file theme.sql and click Next, Install and Install Theme.
3. Go back to Themes and click Switch Theme.
4. Choose the Portal theme and click Next, Next and Switch Theme.
5. Run the application (the pages are now without any graphics).
6. Create the folder theme_111 in the http://127.0.0.1:8080/i/themes/ location (this can be done through WebDAV or FTP access [60]).
7. Copy all files from the theme_files folder to the newly created one.

Afterwards the sample data can be removed using the rem_sample_data.sql script. All mentioned SQL scripts are placed on the enclosed CD. The content of the CD is listed in Appendix H.

5.2

Internal logic of the application

Two report pages were created. The first report returns the data based on the surname query. The other returns the data based on the surname and name query:

select actual_surnames."surname" as "Surname",
       actual_names."name" as "Name",
       actual_places."place" as "Place of birth",
       events."year" as "Year of birth"
from actual_names, actual_places, locations, actual_surnames, individuals, events
where events."main_id" = individuals."main_id"
  and individuals."actual_surname_id" = actual_surnames."surname_id"
  and events."location_id" = locations."location_id"
  and actual_places."place_id" = locations."actual_place_id"
  and individuals."actual_name_id" = actual_names."name_id"
  and events."event_type_id" = '1'
  and actual_surnames."surname_id" =
      (select actual_surnames."actual_m_surname_id" from actual_surnames
       where LOWER(actual_surnames."surname") like LOWER(:P2_surname))
  and actual_names."name_id" =
      (select actual_names."actual_name_id" from actual_names
       where LOWER(actual_names."name") like LOWER(:P2_name))
   or events."main_id" = individuals."main_id"
  and individuals."actual_surname_id" = actual_surnames."surname_id"
  and events."location_id" = locations."location_id"
  and actual_places."place_id" = locations."actual_place_id"
  and individuals."actual_name_id" = actual_names."name_id"
  and events."event_type_id" = '1'
  and actual_surnames."surname_id" =
      (select actual_surnames."actual_f_surname_id" from actual_surnames
       where LOWER(actual_surnames."surname") like LOWER(:P2_surname))
  and actual_names."name_id" =
      (select actual_names."actual_name_id" from actual_names
       where LOWER(actual_names."name") like LOWER(:P2_name))
order by "Year of birth" ASC

The two branches joined by or differ only in the surname column that is looked up: actual_m_surname_id in the first and actual_f_surname_id in the second. The script is built to return only individuals having a birth record in the database. LOWER is the Oracle function that converts all characters to lowercase. It makes the search insensitive to Caps Lock and similar influences, so that whether the user enters Fidler, fidler, FIDLER, FiDlEr, etc., the right result is returned. Routing between the Basic Select page and the actual report pages is based on the following script:

select actual_names."actual_name_id"
from actual_names
where LOWER(actual_names."name") like LOWER(:P2_name)

When this query returns no rows, the user is redirected to the Surname page; otherwise the user is redirected to the Surname and Name page.
This means that if the user filled in the Name field, he or she is looking for a person with a specific name, so the system checks whether the specified name exists in the actual_names table. If the specified name is not found, the Surname report is returned. The error report (shown in figure 5.1) is returned when the Surname field is left blank (even if the Name is filled in).
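The routing rule just described can be summarized in a few lines of Python. This is only a sketch of the decision and not APEX code: the page labels are hypothetical, the set known_names stands in for the content of the actual_names table, and an exact lowercase comparison replaces the like operator of the original query.

```python
def choose_page(surname, name, known_names):
    """Return the page that the Basic Search form should lead to.

    known_names -- lowercase names, standing in for the actual_names table
    """
    if not surname.strip():
        return "error"  # the surname is required, even if a name is given
    if name.strip().lower() in known_names:
        return "surname_and_name"
    return "surname"  # an empty or unknown name falls back to the surname


KNOWN_NAMES = {"jan", "marie"}
print(choose_page("Fidler", "JAN", KNOWN_NAMES))
print(choose_page("Fidler", "Xyz", KNOWN_NAMES))
print(choose_page("", "Jan", KNOWN_NAMES))
```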

Figure 5.1: Error report

5.3

Screenshots

The figure 5.2 represents the Basic Search page and the created design.

Figure 5.2: Basic Search page

The figure 5.3 represents the Report page.

Figure 5.3: Result report

CHAPTER 6. ENVIRONMENT


6 Environment

All mentioned programs or products were tested on hardware and software with the following configuration:
• Intel Centrino Duo 1.83 GHz
• 1 GB RAM
• Microsoft Windows XP SP2
• Oracle Database 10g Express Edition Release 10.2.0.1.0 [58]
• Oracle Application Express version 2.1.0.00.39 [57]

The tested programs were as follows:
• GenoPro 2007 [25]
• GRAMPS 2.2.6-1 [26], which uses the following libraries:
  – Python 2.5 [61]
  – GTK+ 2.8.20 [62]
  – pygtk 2.8.6 [63]
  – pycairo 1.2.2 [63]
• Brother's Keeper 6 [18]
• Family Historian 3.0 [19]
• Personal Ancestral File 5 [21]
• Legacy Family Tree 6.0 [23]
• Family Tree Maker 2005 [24]

For the development the following programs were used:
• Toad Data Modeler [45]
• ER Studio [46]
• MagicDraw UML Personal Edition [48]
• Craft.CASE 2.0 RC5.1 - RC6.1 [49]
• Oracle SQL Developer [59]

7 Conclusion

7.1 Summary

The work starts with the necessary analysis of the provided sources and then surveys the current state of genealogical programs and websites. The main part of the work provides the analysis and design of the genealogical portal. First, we designed the database model containing all the necessary information, and we added further entities so that the portal can later be extended into an application fully comparable with other products. The most important part was the analysis and design of all the processes required for such a portal. These processes were described using business and UML diagrams: Basic and Advanced Search, new user creation, adding data by a user or an administrator, and exporting data to a GEDCOM file. The logic of the export was specified in more detail in a separate section. All of these processes were verified by simulating the business diagrams in the Craft.CASE modeling tool. Finally, a prototype was created that provides the basic logic of the search scenario, which is the main functionality of the portal. The prototype was built with Oracle Application Express.

7.2 Future work

There are three main areas for future work:
• The theme needs to be rebuilt for use in the company environment, using the company's graphics, styles, etc.
• The existing prototype meets the minimum required functionality of the basic search, but other designed functions remain to be implemented. There are no web-based forms for entering data into the database (by the administrator or by users). These should be implemented first, so that the administrator can fill the database with the appropriate data. The next step is to implement the extended search and the other functions mentioned.
• The last area is, inevitably, testing the system by means of usability tests and functional tests.

8 Bibliography

[1] P.A.T.H. FINDERS INTL. website. http://www.pathfinders.cz.

[2] Changes in the registry. http://www.genea.cz/informace/rady-do-zacatku/matriky-v-prubehu-staleti/.

[3] T. Zahn - Locate geographic places in the Czech according to the family names. http://www.pathfinders.cz.

[4] T. Zahn - Emigration Project. http://www.pathfinders.cz.

[5] GEDCOM information. http://en.wikipedia.org/wiki/GEDCOM.

[6] Family History Department homepage. http://www.familysearch.org/Eng/Library/FHL/frameset_library.asp?PAGE=library_history.asp.

[7] Ancestral File Number information. http://en.wikipedia.org/wiki/Ancestral_File_Number.

[8] GEDCOM 5.5 specification. http://homepages.rootsweb.com/~pmcbride/gedcom/55gctoc.htm.

[9] GEDCOM 5.5.1 specification. http://www.phpgedview.net/ged551-5.pdf.

[10] ANSEL specification. http://homepages.rootsweb.com/~pmcbride/gedcom/55gcch3.htm.

[11] GEDCOM 6.0 draft. http://www.familysearch.org/GEDCOM/GedXML60.pdf.

[12] Mapping GEDCOM to XML. http://msdn.microsoft.com/msdnmag/issues/04/05/XMLFiles/default.aspx.

[13] TEI homepage. http://www.tei-c.org/.

[14] GEDML homepage. http://users.breathe.com/mhkay/gedml/.

[15] LexML information. http://www.ancestry.com/learn/library/article.aspx?article=3438.

[16] GEDCOM structure. http://genealogy.about.com/library/weekly/aa110100a.htm.

[17] Tags of the GEDCOM 5.5. http://genealogy.about.com/library/weekly/aa110100d.htm.

[18] Brother’s Keeper homepage. http://www.bkwin.net/.

[19] Family Historian page. http://www.family-historian.co.uk/downloads/index.htm.

[20] Ahnentafel specification. http://en.wikipedia.org/wiki/Ahnentafel.

[21] Personal Ancestral File download page. http://www.ldscatalog.com/webapp/wcs/stores/servlet/ProductDisplay?catalogId=10151&storeId=10151&productId=47670

[22] Review of the Legacy program. http://genealogy-software-review.toptenreviews.com/legacy-review.html.

[23] Legacy homepage. http://www.legacyfamilytree.com/.

[24] Family Tree Maker homepage. http://www.familytreemaker.com/.

[25] Website of the GenoPro. http://www.genopro.com/family-tree-software/.

[26] GRAMPS homepage. http://www.gramps-project.org/wiki/index.php?title=Main_Page.

[27] PhpGedView project homepage. http://www.phpgedview.net/.

[28] PhpGedView description. http://www.phpgedview.net/devdocs/backend.php.

[29] PhpGedView wikipedia information. http://wiki.phpgedview.net/en/index.php?title=Installation_Guide.

[30] Main page of Czechoslovak Genealogy Society International. http://www.cgsi.org/.

[31] Library of CGSI. http://www.cgsi.org/research.asp?i=46.

[32] Website of GENEA. http://www.genea.cz/ruzne/svet.htm.

[33] Genealogical handbook. http://www.genea.cz/informace/stara-genea/genealogicka-prirucka/.

[34] Homepage of Genealogy.com. http://www.genealogy.com/index_r.html.

[35] Main page of the Church of Jesus Christ of Latter-day Saints. http://www.lds.org.

[36] Ancestral File information. http://www.eogen.com/AncestralFile.

[37] Specification of International Genealogical Index. http://en.wikipedia.org/wiki/International_Genealogical_Index.

[38] Information about International Genealogical Index. http://freespace.virgin.net/owston.tj/index.htm.

[39] Family History Library Catalog website. http://www.familysearch.org/Eng/Library/FHLC/frameset_fhlc.asp.

[40] Information about PRF. http://www.findyourfamilytree.com/whatprf.html.

[41] Homepage of Ancestry.com. http://www.ancestry.com/.

[42] Starting page of GENI.com. http://www.geni.com/tree/start.

[43] SurnameDB website. http://www.surnamedb.com/.

[44] START WITH and CONNECT BY clauses explanation. http://www.adp-gmbh.ch/ora/sql/connect_by.html.

[45] Website of the TOAD. http://www.quest.com/Toad_Data_Modeler/.

[46] Website of the ER Studio. http://www.embarcadero.com/products/erstudio/index.html.

[47] Website of Oracle document about Native Datatypes. http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96524/c13datyp.htm.

[48] Website of the MagicDraw UML. http://www.magicdraw.com/.

[49] Class of X36RSF. http://ocw.cvut.cz/moodle/course/view.php?id=40.

[50] Craft.CASE information. http://kii.pef.czu.cz/~merunka/documents/publications/Objekty_2005_Ostrava.pdf.

[51] BORM information. http://martin.feld.cvut.cz/~molhanec/Vyuka/X36SSP/files/BORM.ppt.

[52] OBA explanation. http://www.grada.cz/dokums_raw/usn/borm.html.

[53] V. Merunka - Craft.CASE 2.0 - český návod (Czech manual). http://ocw.cvut.cz/moodle/course/view.php?id=40.

[54] Model of GEDCOM - part 1. http://homepages.rootsweb.com/~pmcbride/gedcom/55model1.gif.

[55] Model of GEDCOM - part 2. http://homepages.rootsweb.com/~pmcbride/gedcom/55model2.gif.

[56] Text to GEDCOM transformation. http://www.tedpack.org/text2ged.html.

[57] Homepage of the Oracle Application Express. http://www.oracle.com/technology/products/database/application_express/index.html.

[58] Homepage of the Oracle Database 10g Express Edition. http://www.oracle.com/technology/products/database/xe/index.html.

[59] Homepage of the Oracle SQL Developer. http://www.oracle.com/technology/products/database/sql_developer/index.html.

[60] Explanation of images in APEX. http://daust.blogspot.com/2006/03/where-are-images-of-application.html.

[61] Python homepage. http://python.org.

[62] GLADE for Windows website. http://gladewin32.sf.net/.

[63] Libraries for GRAMPS on Windows. http://www.acc.umu.se.

A Examples of sources

Figure A.1: Soupis poddaných (Register of subjects)

Figure A.2: Berní rula (Tax roll)

Figure A.3: Soupis židovských rodin v Čechách z roku 1793 (Register of Jewish families in Bohemia from 1793)

Figure A.4: Stabilní katastr (Stable cadastre)

Figure A.5: Legionáři (Legionnaires)

Figure A.6: Sčítání lidu z roku 1921 (Census of 1921)

Figure A.7: Prior genealogical research

Figure A.8: Seznam žadatelů o povolení k (vy)cestování (List of applicants for permission to travel or emigrate)

Figure A.9: České katolické osady v USA 1865-1890 (Czech Catholic settlements in the USA 1865-1890)

Figure A.10: Die Juden und Judengemeinden Böhmens

B Full list of GEDCOM 5.5 tags

ABBR {ABBREVIATION} A short name of a title, description, or name.
ADDR {ADDRESS} The contemporary place, usually required for postal purposes, of an individual, a submitter of information, a repository, a business, a school, or a company.
ADR1 {ADDRESS1} The first line of an address.
ADR2 {ADDRESS2} The second line of an address.
ADOP {ADOPTION} Pertaining to creation of a child-parent relationship that does not exist biologically.
AFN {AFN} A unique permanent record file number of an individual record stored in Ancestral File.
AGE {AGE} The age of the individual at the time an event occurred, or the age listed in the document.
AGNC {AGENCY} The institution or individual having authority and/or responsibility to manage or govern.
ALIA {ALIAS} An indicator to link different record descriptions of a person who may be the same person.
ANCE {ANCESTORS} Pertaining to forbearers of an individual.
ANCI {ANCES INTEREST} Indicates an interest in additional research for ancestors of this individual. (See also DESI)
ANUL {ANNULMENT} Declaring a marriage void from the beginning (never existed).
ASSO {ASSOCIATES} An indicator to link friends, neighbors, relatives, or associates of an individual.
AUTH {AUTHOR} The name of the individual who created or compiled information.
BAPL {BAPTISM-LDS} The event of baptism performed at age eight or later by priesthood authority of the LDS Church. (See also BAPM)
BAPM {BAPTISM} The event of baptism (not LDS), performed in infancy or later. (See also BAPL, above, and CHR, below)
BARM {BAR MITZVAH} The ceremonial event held when a Jewish boy reaches age 13.
BASM {BAS MITZVAH} The ceremonial event held when a Jewish girl reaches age 13, also known as "Bat Mitzvah."
BIRT {BIRTH} The event of entering into life.
BLES {BLESSING} A religious event of bestowing divine care or intercession. Sometimes given in connection with a naming ceremony.
BLOB {BINARY OBJECT} A grouping of data used as input to a multimedia system that processes binary data to represent images, sound, and video.
BURI {BURIAL} The event of the proper disposing of the mortal remains of a deceased person.
CALN {CALL NUMBER} The number used by a repository to identify the specific items in its collections.
CAST {CASTE} The name of an individual's rank or status in society, based on racial or religious differences, or differences in wealth, inherited rank, profession, occupation, etc.
CAUS {CAUSE} A description of the cause of the associated event or fact, such as the cause of death.

CENS {CENSUS} The event of the periodic count of the population for a designated locality, such as a national or state Census.
CHAN {CHANGE} Indicates a change, correction, or modification. Typically used in connection with a DATE to specify when a change in information occurred.
CHAR {CHARACTER} An indicator of the character set used in writing this automated information.
CHIL {CHILD} The natural, adopted, or sealed (LDS) child of a father and a mother.
CHR {CHRISTENING} The religious event (not LDS) of baptizing and/or naming a child.
CHRA {ADULT CHRISTENING} The religious event (not LDS) of baptizing and/or naming an adult person.
CITY {CITY} A lower level jurisdictional unit. Normally an incorporated municipal unit.
CONC {CONCATENATION} An indicator that additional data belongs to the superior value. The information from the CONC value is to be connected to the value of the superior preceding line without a space and without a carriage return and/or new line character. Values that are split for a CONC tag must always be split at a non-space. If the value is split on a space the space will be lost when concatenation takes place. This is because of the treatment that spaces get as a GEDCOM delimiter, many GEDCOM values are trimmed of trailing spaces and some systems look for the first non-space starting after the tag to determine the beginning of the value.
CONF {CONFIRMATION} The religious event (not LDS) of conferring the gift of the Holy Ghost and, among protestants, full church membership.
CONL {CONFIRMATION L} The religious event by which a person receives membership in the LDS Church.
CONT {CONTINUED} An indicator that additional data belongs to the superior value. The information from the CONT value is to be connected to the value of the superior preceding line with a carriage return and/or new line character. Leading spaces could be important to the formatting of the resultant text. When importing values from CONT lines the reader should assume only one delimiter character following the CONT tag. Assume that the rest of the leading spaces are to be a part of the value.
COPR {COPYRIGHT} A statement that accompanies data to protect it from unlawful duplication and distribution.
CORP {CORPORATE} A name of an institution, agency, corporation, or company.
CREM {CREMATION} Disposal of the remains of a person's body by fire.
CTRY {COUNTRY} The name or code of the country.
DATA {DATA} Pertaining to stored automated information.
DATE {DATE} The time of an event in a calendar format.
DEAT {DEATH} The event when mortal life terminates.
DESC {DESCENDANTS} Pertaining to offspring of an individual.
DESI {DESCENDANT INT} Indicates an interest in research to identify additional descendants of this individual. (See also ANCI)
DEST {DESTINATION} A system receiving data.
DIV {DIVORCE} An event of dissolving a marriage through civil action.
DIVF {DIVORCE FILED} An event of filing for a divorce by a spouse.
DSCR {PHY DESCRIPTION} The physical characteristics of a person, place, or thing.

EDUC {EDUCATION} Indicator of a level of education attained.
EMIG {EMIGRATION} An event of leaving one's homeland with the intent of residing elsewhere.
ENDL {ENDOWMENT} A religious event where an endowment ordinance for an individual was performed by priesthood authority in an LDS temple.
ENGA {ENGAGEMENT} An event of recording or announcing an agreement between two people to become married.
EVEN {EVENT} A noteworthy happening related to an individual, a group, or an organization.
FAM {FAMILY} Identifies a legal, common law, or other customary relationship of man and woman and their children, if any, or a family created by virtue of the birth of a child to its biological father and mother.
FAMC {FAMILY CHILD} Identifies the family in which an individual appears as a child.
FAMF {FAMILY FILE} Pertaining to, or the name of, a family file. Names stored in a file that are assigned to a family for doing temple ordinance work.
FAMS {FAMILY SPOUSE} Identifies the family in which an individual appears as a spouse.
FCOM {FIRST COMMUNION} A religious rite, the first act of sharing in the Lord's supper as part of church worship.
FILE {FILE} An information storage place that is ordered and arranged for preservation and reference.
FORM {FORMAT} An assigned name given to a consistent format in which information can be conveyed.
GEDC {GEDCOM} Information about the use of GEDCOM in a transmission.
GIVN {GIVEN NAME} A given or earned name used for official identification of a person.
GRAD {GRADUATION} An event of awarding educational diplomas or degrees to individuals.
HEAD {HEADER} Identifies information pertaining to an entire GEDCOM transmission.
HUSB {HUSBAND} An individual in the family role of a married man or father.
IDNO {IDENT NUMBER} A number assigned to identify a person within some significant external system.
IMMI {IMMIGRATION} An event of entering into a new locality with the intent of residing there.
INDI {INDIVIDUAL} A person.
INFL {TempleReady} Indicates if an INFANT - data is "Y" (or "N").
LANG {LANGUAGE} The name of the language used in a communication or transmission of information.
LEGA {LEGATEE} A role of an individual acting as a person receiving a bequest or legal devise.
MARB {MARRIAGE BANN} An event of an official public notice given that two people intend to marry.
MARC {MARR CONTRACT} An event of recording a formal agreement of marriage, including the prenuptial agreement in which marriage partners reach agreement about the property rights of one or both, securing property to their children.

MARL {MARR LICENSE} An event of obtaining a legal license to marry.
MARR {MARRIAGE} A legal, common-law, or customary event of creating a family unit of a man and a woman as husband and wife.
MARS {MARR SETTLEMENT} An event of creating an agreement between two people contemplating marriage, at which time they agree to release or modify property rights that would otherwise arise from the marriage.
MEDI {MEDIA} Identifies information about the media or having to do with the medium in which information is stored.
NAME {NAME} A word or combination of words used to help identify an individual, title, or other item. More than one NAME line should be used for people who were known by multiple names.
NATI {NATIONALITY} The national heritage of an individual.
NATU {NATURALIZATION} The event of obtaining citizenship.
NCHI {CHILDREN COUNT} The number of children that this person is known to be the parent of (all marriages) when subordinate to an individual, or that belong to this family when subordinate to a FAM RECORD.
NICK {NICKNAME} A descriptive or familiar name that is used instead of, or in addition to, one's proper name.
NMR {MARRIAGE COUNT} The number of times this person has participated in a family as a spouse or parent.
NOTE {NOTE} Additional information provided by the submitter for understanding the enclosing data.
NPFX {NAME PREFIX} Text which appears on a name line before the given and surname parts of a name, i.e. (Lt. Cmndr.) Joseph /Allen/ jr.
NSFX {NAME SUFFIX} Text which appears on a name line after or behind the given and surname parts of a name, i.e. Lt. Cmndr. Joseph /Allen/ (jr.). In this example jr. is considered as the name suffix portion.
OBJE {OBJECT} Pertaining to a grouping of attributes used in describing something. Usually referring to the data required to represent a multimedia object, such as an audio recording, a photograph of a person, or an image of a document.
OCCU {OCCUPATION} The type of work or profession of an individual.
ORDI {ORDINANCE} Pertaining to a religious ordinance in general.
ORDN {ORDINATION} A religious event of receiving authority to act in religious matters.
PAGE {PAGE} A number or description to identify where information can be found in a referenced work.
PEDI {PEDIGREE} Information pertaining to an individual to parent lineage chart.
PHON {PHONE} A unique number assigned to access a specific telephone.
PLAC {PLACE} A jurisdictional name to identify the place or location of an event.
POST {POSTAL CODE} A code used by a postal service to identify an area to facilitate mail handling.
PROB {PROBATE} An event of judicial determination of the validity of a will. May indicate several related court activities over several dates.
PROP {PROPERTY} Pertaining to possessions such as real estate or other property of interest.
PUBL {PUBLICATION} Refers to when and/or where a work was published or created.
QUAY {QUALITY OF DATA} An assessment of the certainty of the evidence to support the conclusion drawn from evidence. Values: [0|1|2|3]
REFN {REFERENCE} A description or number used to identify an item for filing, storage, or other reference purposes.
RELA {RELATIONSHIP} A relationship value between the indicated contexts.
RELI {RELIGION} A religious denomination to which a person is affiliated or for which a record applies.
REPO {REPOSITORY} An institution or person that has the specified item as part of their collection(s).
RESI {RESIDENCE} The act of dwelling at an address for a period of time.
RESN {RESTRICTION} A processing indicator signifying access to information has been denied or otherwise restricted.
RETI {RETIREMENT} An event of exiting an occupational relationship with an employer after a qualifying time period.
RFN {REC FILE NUMBER} A number assigned to a record that uniquely identifies it within a known file.
RIN {REC ID NUMBER} A number assigned to a record by an originating automated system that can be used by a receiving system to report results pertaining to that record.
ROLE {ROLE} A name given to a role played by an individual in connection with an event.
SEX {SEX} Indicates the sex of an individual: male or female.
SLGC {SEALING CHILD} A religious event pertaining to the sealing of a child to his or her parents in an LDS temple ceremony.
SLGS {SEALING SPOUSE} A religious event pertaining to the sealing of a husband and wife in an LDS temple ceremony.
SOUR {SOURCE} The initial or original material from which information was obtained.
SPFX {SURN PREFIX} A name piece used as a non-indexing pre-part of a surname.
SSN {SOC SEC NUMBER} A number assigned by the United States Social Security Administration. Used for tax identification purposes.
STAE {STATE} A geographical division of a larger jurisdictional area, such as a State within the United States of America.
STAT {STATUS} An assessment of the state or condition of something.
SUBM {SUBMITTER} An individual or organization who contributes genealogical data to a file or transfers it to someone else.
SUBN {SUBMISSION} Pertains to a collection of data issued for processing.
SURN {SURNAME} A family name passed on or used by members of a family.
TEMP {TEMPLE} The name or code that represents the name of a temple of the LDS Church.
TEXT {TEXT} The exact wording found in an original source document.
TIME {TIME} A time value in a 24-hour clock format, including hours, minutes, and optional seconds, separated by a colon (:). Fractions of seconds are shown in decimal notation.
TITL {TITLE} A description of a specific writing or other work, such as the title of a book when used in a source context, or a formal designation used by an individual in connection with positions of royalty or other social status, such as Grand Duke.
TRLR {TRAILER} At level 0, specifies the end of a GEDCOM transmission.
TYPE {TYPE} A further qualification to the meaning of the associated superior tag. The value does not have any computer processing reliability. It is more in the form of a short one or two word note that should be displayed any time the associated data is displayed.
VERS {VERSION} Indicates which version of a product, item, or publication is being used or referenced.
WIFE {WIFE} An individual in the role as a mother and/or married woman.
WILL {WILL} A legal document treated as an event, by which a person disposes of his or her estate, to take effect after death. The event date is the date the will was signed while the person was alive. (See also PROBate)
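The CONC and CONT rules in the list above are easier to follow with a small worked example. The following Python sketch folds continuation lines back into the value they extend. It is an illustration only, not part of the portal: the function name join_continuations and the sample lines are hypothetical, and the sketch deliberately ignores cross-reference identifiers (lines such as "0 @I1@ INDI") and character-set handling.

```python
def join_continuations(lines: list[str]) -> list[tuple[int, str, str]]:
    """Fold CONC/CONT lines into the value of the preceding record line.

    Returns (level, tag, value) tuples. Cross-reference identifiers are
    not handled in this simplified sketch.
    """
    records: list[list] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty lines
        level_text, _, rest = line.lstrip().partition(" ")
        # Only ONE delimiter is consumed after the tag, so any further
        # leading spaces stay part of the value (important for CONT).
        tag, _, value = rest.partition(" ")
        if tag == "CONC" and records:
            # Concatenate without a space and without a line break.
            records[-1][2] += value
        elif tag == "CONT" and records:
            # Continue on a new line of the same value.
            records[-1][2] += "\n" + value
        else:
            records.append([int(level_text), tag, value])
    return [(level, tag, value) for level, tag, value in records]
```

For example, the lines "1 NOTE This is a long no" and "2 CONC te about a person." are joined into a single NOTE value, which is why a value must never be split at a space when CONC is used.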

C Database model

Figure C.1: Database model - logical structure - version 1

Figure C.2: Database model - logical structure - version 2

Figure C.3: Database model - physical structure - version 1

Figure C.4: Database model - physical structure - version 2

D Tables

Key  Column name        Data type      Not null  Check
PK   main_id            Number         YES
     ins_name           NVarchar2(50)  NO
FK   actual_name_id     Number         NO
     ins_middle         NVarchar2(30)  NO
     ins_surname        NVarchar2(50)  YES
FK   actual_surname_id  Number         YES
     gender             Char(1)        YES       "gender" in ('M', 'F')
FK   mother_id          Number         NO
FK   father_id          Number         NO
FK   source_id          Number         NO
     NOS                Number         NO
     NOD                Number         NO
     created_by         Varchar2(30)   YES
     created_date       Timestamp(6)   YES
     changed_by         Varchar2(30)   NO
     changed_date       Timestamp(6)   NO

Table D.1: The individuals table

Key  Column name    Data type      Not null  Check
PK   event_id       Number         YES
     day            Number         NO
     month          Number         NO
     year           Number         NO
     info           Varchar2(255)  NO
FK   main_id        Number         YES
FK   location_id    Number         YES
FK   event_type_id  Number         YES
FK   source_id      Number         NO

Table D.2: The events table

Key  Column name    Data type  Not null  Check
PK   union_id       Number     YES
     day            Number     NO
     month          Number     NO
     year           Number     NO
     divorced       Char(1)    NO        "divorced" in ('Y', 'N')
FK   man_id         Number     YES
FK   woman_id       Number     YES
FK   union_type_id  Number     YES
FK   source_id      Number     NO
FK   location_id    Number     NO

Table D.3: The unions table

Key  Column name       Data type      Not null  Check
PK   source_id         Number         YES
     name_of_source    Varchar2(30)   NO
     info              Varchar2(255)  NO
     reliability       Number         YES
     name_of_location  Varchar2(50)   NO
FK   location_id       Number         YES

Table D.4: The sources table

Key  Column name        Data type     Not null  Check
PK   location_id        Number        YES
     ins_location_name  Varchar2(30)  YES
     street             Varchar2(30)  NO
FK   actual_place_id    Number        YES

Table D.5: The locations table

Key  Column name  Data type  Not null  Check
PFK  main_id      Number     YES
PFK  file_id      Number     YES

Table D.6: The file management table

Key  Column name   Data type    Not null  Check
PK   file_id       Number       YES
     link_to_file  Varchar(50)  YES
FK   file_type_id  Number       YES

Table D.7: The files table

Key  Column name   Data type    Not null  Check
PK   file_type_id  Number       YES
     type          Varchar(30)  YES

Table D.8: The type of file table

Key  Column name    Data type    Not null  Check
PK   union_type_id  Number       YES
     type           Varchar(30)  YES

Table D.9: The type of union table

Key  Column name    Data type    Not null  Check
PK   event_type_id  Number       YES
     type           Varchar(30)  YES

Table D.10: The type of event table

Key  Column name     Data type      Not null  Check
PK   name_id         Number         YES
     name            NVarchar2(50)  YES
     recency         Char(1)        YES       "recency" in ('Y', 'N')
     actual_name_id  Number         YES

Table D.11: The actual names table

Key  Column name          Data type      Not null  Check
PK   surname_id           Number         YES
     surname              NVarchar2(50)  YES
     recency              Char(1)        YES       "recency" in ('Y', 'N')
     actual_f_surname_id  Number         YES
     actual_m_surname_id  Number         YES

Table D.12: The actual surnames table

Key  Column name      Data type       Not null  Check
PK   place_id         Number          YES
     place            NVarchar2(100)  YES
     recency          Char(1)         YES       "recency" in ('Y', 'N')
     actual_place_id  Number          YES
     existing         Char(1)         YES       "existing" in ('Y', 'N')
     info             Varchar2(255)   NO

Table D.13: The actual places table

Key  Column name  Data type      Not null  Check
PK   temp_id      Number         YES
     ins_name     NVarchar2(50)  NO
     ins_middle   NVarchar2(30)  NO
     ins_surname  NVarchar2(50)  YES

Table D.14: The temp individuals table

Key  Column name    Data type      Not null  Check
PK   temp_event_id  Number         YES
     day            Number         NO
     month          Number         NO
     year           Number         NO
     info           Varchar2(255)  NO
FK   event_type_id  Number         YES
FK   temp_id        Number         YES

Table D.15: The temp events table
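To show how the Key, Not null and Check columns of the tables above translate into constraints, the following sketch creates a reduced version of the individuals and events tables and tries a few inserts. It rests on several assumptions: it uses Python's built-in sqlite3 module rather than Oracle, it keeps only a subset of the columns, the SQLite types stand in for the Oracle data types, and the helper names open_demo_db and try_insert are hypothetical.

```python
import sqlite3

# Reduced, SQLite-flavoured subset of Tables D.1 and D.2 (an assumption:
# the thesis schema is defined in Oracle and has more columns).
SCHEMA = """
CREATE TABLE individuals (
    main_id     INTEGER PRIMARY KEY,
    ins_name    TEXT,
    ins_surname TEXT NOT NULL,
    gender      TEXT NOT NULL CHECK (gender IN ('M', 'F')),
    mother_id   INTEGER REFERENCES individuals (main_id),
    father_id   INTEGER REFERENCES individuals (main_id)
);
CREATE TABLE events (
    event_id      INTEGER PRIMARY KEY,
    day           INTEGER,
    month         INTEGER,
    year          INTEGER,
    main_id       INTEGER NOT NULL REFERENCES individuals (main_id),
    event_type_id INTEGER NOT NULL
);
"""


def open_demo_db() -> sqlite3.Connection:
    """Open an in-memory database with foreign key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Run one INSERT; return False if a constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this sketch, a row with gender 'X' is rejected by the check constraint, and an event that refers to a non-existent main_id is rejected by the foreign key.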

E UML diagrams

Figure E.1: Portal - use case diagram

Figure E.2: Portal - state diagram

Figure E.3: Basic search - sequence diagram

Figure E.4: User creation - sequence diagram

Figure E.5: GEDCOM export - activity diagram

F Business diagrams

Figure F.1: Basic search - sketch

Figure F.2: GEDCOM export - sketch

Figure F.3: Insert data - sketch

Figure F.4: Basic search - scenario

Figure F.5: Basic search - diagram

Figure F.6: Extended search - scenario

Figure F.7: Extended search - diagram

Figure F.8: Insert data by admin - scenario

Figure F.9: Insert data by admin - diagram

Figure F.10: Creation of the user - scenario

Figure F.11: Creation of the user - business diagram

Figure F.12: Export data to the GEDCOM file - scenario

Figure F.13: Export data to the GEDCOM file - diagram

Figure F.14: Insert data by user - scenario

Figure F.15: Insert data by user - diagram

Figure F.16: Business architecture

G Simulation of Basic search

Figure G.1: Simulation of Basic select - initial phase

Figure G.2: Simulation of Basic select - step 2

Figure G.3: Simulation of Basic select - step 3

Figure G.4: Simulation of Basic select - step 4

Figure G.5: Simulation of Basic select - step 5

Figure G.6: Simulation of Basic select - step 6

Figure G.7: Simulation of Basic select - step 7

Figure G.8: Simulation of Basic select - step 8

Figure G.9: Simulation of Basic select - step 9

Figure G.10: Simulation of Basic select - step 10

Figure G.11: Simulation of Basic select - step 11

Figure G.12: Simulation of Basic select - step 12

Figure G.13: Simulation of Basic select - step 13

Figure G.14: Simulation of Basic select - step 14

Figure G.15: Simulation of Basic select - step 15

Figure G.16: Simulation of Basic select - step 16

Figure G.17: Simulation of Basic select - step 17

Figure G.18: Simulation of Basic select - step 18

Figure G.19: Simulation of Basic select - step 19

Figure G.20: Simulation of Basic select - step 20

Figure G.21: Simulation of Basic select - step 21

Figure G.22: Simulation of Basic select - step 22

Figure G.23: Simulation of Basic select - step 23

Figure G.24: Simulation of Basic select - step 24

Figure G.25: Simulation of Basic select - step 25

Figure G.26: Simulation of Basic select - step 26

H Content of enclosed CD

Figure H.1: Content of enclosed CD