Part 4
Transforming data into INSPIRE compliant data
Transformation process
1. Conceptual matching
• Understand data specification
• Flip chart exercise with domain experts: schema matching
• Discuss different options for publishing data
• Relationship source data vs INSPIRE theme
• Type of download service
2. Configure mappings/transformation & validate
• Choose ETL/transformation tool
• Configure schema mapping
• Generate data and validate
3. Publish data
• Create data or configure WFS
• Upload data or deploy web service
• Choose metadata tool
• Create INSPIRE discovery metadata
• URLs for accessing the data
• Additional metadata elements as required by data specification
• Update conformance statement
• Publish metadata in INSPIRE discovery service
Basic transformation operations
The next slides highlight a series of basic transformation operations:
1. Schema translation: matching and mapping
2. Coordinate conversion and transformation
3. Filtering and resampling
4. Edge matching
5. Other operations
Schema translation
• A schema is here defined as a formal description of a model
– Conceptual schema = data structures, codelists etc. (UML)
– Logical schema = physical structure (expressed in XSD)
– Transfer files = XML/GML files
• Schema matching (finding semantically related objects)
– Ontology, thesauri, dictionaries
• Schema mapping (finding transformation rules)
– Reclassification
– Data type conversion
– Reference systems
• Schema transformation
– Extract-Transform-Load (ETL)
Required knowledge
• INSPIRE directive, implementing rules and guidelines
• UML: to understand the target data model
• RDBMS: to understand relational models
• XML: needed to understand GML
• GML: needed to understand encoding
– INSPIRE: GML v3.2.1
• Network services: needed to publish harmonised data
– CSW to host metadata catalogue
– WFS to download spatial data in GML format
• ETL tools: needed to conduct data operations
– Used to convert data as closely as possible to target schema
Schema matching
• The process of identifying that two expressions are semantically related
• Schema matching is the process of identifying corresponding concepts in the source schema (national data sets) and the target schema (INSPIRE specification).
• The matching process considers both language issues and semantic differences between the two schemas.
• The result of the schema matching process specifies how the data in the source schema correspond to the data in the target schema.
• Schema matching is the first step in the data transformation.
Matching process
• To start the matching process you need to:
– Identify feature types in both the source schema and the target schema
– Identify structural properties of the feature types
– Identify attribute names in both schemas
– Identify data-value types and characteristics
• The matching process can be performed manually as a desk study or with automated tools that use intelligent techniques.
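As a rough illustration of what an automated aid might do, the sketch below proposes candidate attribute matches by plain string similarity. It is not the method of any particular tool: the function name `suggest_matches`, the threshold value and the attribute names in the usage note are all hypothetical. Lexical similarity cannot bridge language differences (a Swedish name and its English counterpart may share no letters), so every suggestion still needs review by domain experts.

```python
from difflib import SequenceMatcher


def suggest_matches(source_attrs, target_attrs, threshold=0.5):
    """Suggest, for each source attribute, the most similar target attribute.

    Similarity is purely lexical (character overlap), not semantic.
    Returns a dict: source name -> best target name, or None if no
    candidate reaches the (assumed) threshold.
    """
    suggestions = {}
    for src in source_attrs:
        best_name, best_score = None, 0.0
        for tgt in target_attrs:
            score = SequenceMatcher(None, src.lower(), tgt.lower()).ratio()
            if score > best_score:
                best_name, best_score = tgt, score
        suggestions[src] = best_name if best_score >= threshold else None
    return suggestions
```

For example, `suggest_matches(["name", "geom"], ["name", "geometry"])` pairs each source name with its closest target name, while a source name with no lexical counterpart is returned as `None` and left for manual matching.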
• Schema matching ensures that features and attributes in both schemas are semantically related.
• The matching process results in a set of transformation (conversion) rules and a translation table.
• The matching table is used during mapping.
Matching (and filtering)
Schema matching example
• Translation table for matching the GN (Geographical Names) feature type: NamedPlace (INSPIRE target) and Ortnamn (source).
An illustration of schema matching process
Result of schema matching: Transformation rules table
Source data type | Target data type | Conversion type | Meaning
Code list | Character String | CodelistToText | Codelist value converted to character string
number*2 | GML object | CoordinateToPoint | Coordinate pair converted to GML Point
Text or missing value | Text | Assign(Value) | Target value in brackets used instead of source text
Integer | Character String | IntegerToText | Integer value converted to character string
Char | Text | Equal | No conversion required
CodeList | CodeList | Assign | Target value used instead of source value
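The conversion types in the table can be read as small functions. The sketch below is one possible reading, not the implementation of any ETL tool: the function names mirror the table, but their signatures, the `gml_id` and `srs_name` parameters and the GML fragment layout are assumptions made for illustration.

```python
def codelist_to_text(code, codelist):
    """CodelistToText: look up a code and return its label as a string."""
    return str(codelist[code])


def coordinate_to_point(gml_id, x, y, srs_name):
    """CoordinateToPoint: wrap a coordinate pair in a GML Point fragment.

    Assumes the 'gml' prefix is bound to the GML 3.2 namespace by the
    enclosing document; axis order must follow the chosen reference system.
    """
    return (
        f'<gml:Point gml:id="{gml_id}" srsName="{srs_name}">'
        f"<gml:pos>{x} {y}</gml:pos></gml:Point>"
    )


def integer_to_text(value):
    """IntegerToText: integer value converted to character string."""
    return str(int(value))


def assign(_source_value, target_value):
    """Assign / Assign(Value): ignore the source and use the target value."""
    return target_value


def equal(value):
    """Equal: no conversion required."""
    return value
```

Keeping each rule as a separate function makes it straightforward to drive the conversions from the rules table itself, for instance through a dictionary keyed by conversion type.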
Schema matching example: Transport network
• Matching of feature types VAGL, VAGOVRL >>> FormOfWay
• Matching of attributes VAGTYP >> FormOfWay values
Schema mapping
• Schema mapping determines how elements of the source schema are mapped to the corresponding elements of the target schema.
• Two levels of mapping
– Feature mapping: the process of connecting source feature types to target feature types.
– Attribute mapping: the process of connecting attributes of a source feature type to attributes of a target feature type.
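The two levels of mapping can be sketched as two lookup tables, one for feature types and one for attributes. The Ortnamn to NamedPlace pairing comes from the geographical names example earlier in this part; the source attribute names `namn` and `typ` and the function `map_feature` are hypothetical and only serve the illustration.

```python
# Feature mapping: source feature type -> target feature type.
FEATURE_MAPPING = {"Ortnamn": "NamedPlace"}

# Attribute mapping per source feature type.
# The source attribute names here are hypothetical.
ATTRIBUTE_MAPPING = {
    "Ortnamn": {"namn": "name", "typ": "type"},
}


def map_feature(source_type, source_record):
    """Apply feature mapping, then attribute mapping, to one record.

    Source attributes without a mapping are dropped from the result.
    """
    target_type = FEATURE_MAPPING[source_type]
    attr_map = ATTRIBUTE_MAPPING[source_type]
    target_record = {
        attr_map[name]: value
        for name, value in source_record.items()
        if name in attr_map
    }
    return target_type, target_record
```

Holding the mapping as data rather than code keeps it close to the matching table produced with the domain experts and easy to revise.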
Finding the transformation rules between the objects
• Reclassification
• Type conversions
– Integer, Real, Date, Char and Varchar, BLOB, Enum, Spatial (point, linestring, polygon etc.)
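Reclassification and type conversion often occur together, as in the VAGTYP to FormOfWay example. The sketch below assumes a lookup table whose numeric source codes are invented for illustration; only the target values are taken from the INSPIRE FormOfWay code list. The function name `reclassify` is likewise hypothetical.

```python
# Hypothetical source codes mapped to INSPIRE FormOfWay values.
VAGTYP_TO_FORM_OF_WAY = {
    1: "motorway",
    2: "dualCarriageway",
    3: "singleCarriageway",
    4: "bicycleRoad",
}


def reclassify(vagtyp, default=None):
    """Convert a source code to an integer, then reclassify it.

    Returns `default` when the code is missing, not numeric,
    or absent from the lookup table.
    """
    try:
        code = int(vagtyp)  # type conversion, e.g. Char -> Integer
    except (TypeError, ValueError):
        return default
    return VAGTYP_TO_FORM_OF_WAY.get(code, default)
```

Returning an explicit default keeps unmapped source values visible, so they can be reviewed instead of silently dropped.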
Schema mapping • Example of type conversions
Coordinate Conversion
• Spatial reference systems
– Geocentric coordinates (X, Y, Z)
– Geographic coordinates (lat, long, H)
– Projected coordinates (Northing, Easting, Height)
– Local coordinates (x, y, z)
– Linear coordinates (startNode, Length, Distance, R/L)
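As an example of a conversion between two of these coordinate types, the sketch below turns geographic coordinates into geocentric (X, Y, Z) coordinates on the GRS80 ellipsoid using the standard closed-form equations. It assumes the height is an ellipsoidal height in metres and that both coordinate sets refer to the same datum, so no datum transformation is involved. The function name is hypothetical.

```python
import math

# GRS80 ellipsoid parameters.
GRS80_A = 6378137.0            # semi-major axis in metres
GRS80_F = 1 / 298.257222101    # flattening


def geographic_to_geocentric(lat_deg, lon_deg, h=0.0):
    """Convert latitude, longitude (degrees) and ellipsoidal height (m)
    to geocentric X, Y, Z (m) on GRS80."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    e2 = GRS80_F * (2 - GRS80_F)  # first eccentricity squared
    # Radius of curvature in the prime vertical.
    n = GRS80_A / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - e2) + h) * math.sin(lat)
    return x, y, z
```

In practice a projection and transformation library would normally be used for this; the explicit formulas are shown only to make the operation concrete.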
INSPIRE reference systems
Reference system
• For the horizontal component, the European Terrestrial Reference System 1989 (ETRS89) shall be used for areas within its geographical scope.
Ellipsoid
• The parameters of the GRS80 ellipsoid shall be used for the computation of latitude and longitude (ETRS89-GRS80) and for the computation of plane coordinates using a suitable map projection.
Map projection
• The ETRS89 Lambert Azimuthal Equal Area (ETRS-LAEA) projection shall be used when true area representation is required.
• The ETRS89 Lambert Conformal Conic (ETRS-LCC) projection shall be used for conformal mapping at scales smaller than or equal to 1:500,000.
• The ETRS89 Transverse Mercator (ETRS-TMzn) projection shall be used for conformal mapping at scales larger than 1:500,000.
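The map projection rules above can be read as a small decision function. The sketch below is only a restatement of those three rules; the function name and parameters are hypothetical, and the scale is passed as its denominator (500000 for 1:500,000), so a smaller scale means a larger denominator.

```python
def choose_inspire_projection(scale_denominator, true_area_required=False):
    """Pick the projection named in the rules above.

    scale_denominator: e.g. 500000 for a scale of 1:500,000.
    """
    if scale_denominator <= 0:
        raise ValueError("scale denominator must be positive")
    if true_area_required:
        return "ETRS-LAEA"
    # Scales smaller than or equal to 1:500,000 -> denominator >= 500000.
    if scale_denominator >= 500_000:
        return "ETRS-LCC"
    return "ETRS-TMzn"
```

For ETRS-TMzn the zone still has to be chosen from the longitude of the data, which this sketch leaves out.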
Resampling or filtering
Filtering
Resampling
Edge Matching
• Old problem, dating back to the times when map sheets were digitised
• Occurs at borders between data sets / countries
• Geometric and other conditions to be fulfilled
– Connections
– Smoothness
– 90-degree corners (for buildings etc.)
– Conditions often solved by least squares adjustment or similar (averaging etc.)
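The simplest form of the averaging mentioned above can be sketched as follows: line endpoints from two neighbouring data sets are paired when they lie within a tolerance of each other, and both are moved to their midpoint so that the lines connect. This is a greedy nearest-neighbour pairing, not a least squares adjustment, and it handles only the connection condition. The function name, the tolerance parameter and the coordinate tuples are assumptions for illustration.

```python
import math


def match_edges(endpoints_a, endpoints_b, tolerance):
    """Snap nearby endpoints of two data sets to their common midpoint.

    endpoints_a, endpoints_b: lists of (x, y) tuples along the border.
    Each endpoint in B is paired with at most one endpoint in A.
    Returns adjusted copies of both lists; unmatched points are unchanged.
    """
    matched_a = list(endpoints_a)
    matched_b = list(endpoints_b)
    used_b = set()
    for i, (ax, ay) in enumerate(endpoints_a):
        best_j, best_d = None, tolerance
        for j, (bx, by) in enumerate(endpoints_b):
            if j in used_b:
                continue
            d = math.hypot(ax - bx, ay - by)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            bx, by = endpoints_b[best_j]
            midpoint = ((ax + bx) / 2, (ay + by) / 2)
            matched_a[i] = midpoint
            matched_b[best_j] = midpoint
            used_b.add(best_j)
    return matched_a, matched_b
```

Moving both points to the midpoint treats the two data sets as equally accurate; if one side is known to be more reliable, snapping only the other side would be a reasonable variant.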
Other operations
Less common operations
– Address matching (geo-coding)
– Transformation between temporal reference systems
– Multiple representation
– Topology
– Merging old and new information
– Multilinguality
– Nomenclatures and taxonomies