Error and uncertainty: 50 ways to make a mistake. This is lecture eleven


Error and accuracy • Until the 1990s, little attention was paid to the potential for error in GIS. • Researchers were too busy building systems, software and algorithms to do the most elementary operations. • Error and imprecision are problems associated with a more mature science.

Error and analysis • Error can invalidate the results of spatial analysis. • Error can be injected at many points in the process but one of the largest sources of error starts with data. • This is ironic because one of the most attractive features of geographic information systems is their ability to use information from many sources. • However each time a new dataset is introduced, new possibilities for error are also introduced.

Accuracy and precision
• Accuracy is the degree to which information on a map or in a digital database matches true or accepted values.
• Accuracy is an issue pertaining to the quality of data and the number of errors contained in a dataset or map.
• In discussing a GIS database, it is possible to consider horizontal and vertical accuracy with respect to geographic position, as well as attribute, conceptual, and logical accuracy.

Precision • Precision refers to the level of measurement and exactness of description in a GIS database. • Map precision is similar to decimal precision. • Precise locational data may measure position to a fraction of a unit (like meters or inches). • Precise attribute information may specify the characteristics of features in great detail.
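The decimal-precision analogy can be sketched in a few lines (the coordinate below is hypothetical, not from any dataset): each extra decimal place records position to a finer fraction of a metre.

```python
# Hypothetical surveyed easting, in metres.
easting = 482731.8274

# The same measurement reported at increasing levels of precision.
for decimals in (0, 1, 3):
    print(f"{decimals} decimal place(s): {round(easting, decimals)}")
```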

Levels of precision • The level of precision required for particular applications varies greatly. • Engineering projects such as road and utility construction require very precise information measured to the millimeter or tenth of an inch. • Demographic analyses of marketing or electoral trends can often make do with less, say to the closest zip code or precinct boundary. • Highly precise data can be very difficult and costly to collect. • Carefully surveyed locations needed by utility companies to record the locations of pumps, wires, pipes and transformers cost $5-20 per point to collect.

Accuracy and precision • High precision does not indicate high accuracy, nor does high accuracy imply high precision. • But high accuracy and high precision are both expensive. • Two additional terms are used as well: • Data quality refers to the relative accuracy and precision of a particular GIS database. These facts are often documented in data quality reports. • Error encompasses both the imprecision of data and its inaccuracies.
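The distinction can be sketched numerically. In this illustration (all readings and the benchmark value are hypothetical), bias against an accepted true value measures accuracy, while the spread of repeated readings measures precision:

```python
import statistics

TRUE_EASTING = 500000.0  # accepted "true" value of a hypothetical benchmark, metres

# Hypothetical GPS readings: tightly clustered (precise) but offset (inaccurate).
precise_but_inaccurate = [500003.1, 500003.2, 500003.0, 500003.1]
# Scattered (imprecise) but centred on the truth (accurate on average).
accurate_but_imprecise = [499998.0, 500002.5, 499999.0, 500000.5]

def describe(readings):
    mean = statistics.mean(readings)
    bias = mean - TRUE_EASTING           # accuracy: closeness to the true value
    spread = statistics.stdev(readings)  # precision: repeatability of readings
    return bias, spread

for name, data in [("precise/inaccurate", precise_but_inaccurate),
                   ("accurate/imprecise", accurate_but_imprecise)]:
    bias, spread = describe(data)
    print(f"{name}: bias = {bias:+.2f} m, std dev = {spread:.2f} m")
```

The first set would look trustworthy on a map yet is systematically wrong; the second looks noisy yet is unbiased.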

Types of error • Positional • Attribute • Conceptual

Positional Error • This applies to both horizontal and vertical positions. • Accuracy and precision are a function of the scale at which a map (paper or digital) was created. • The mapping standards employed by the United States Geological Survey specify the requirements for horizontal accuracy: 90 per cent of all measurable points must be within 1/30th of an inch for maps at a scale of 1:20,000 or larger, and within 1/50th of an inch for maps at scales smaller than 1:20,000.

Table of accuracy standards
Scale        Tolerance
1:1,200      ± 3.33 feet
1:2,400      ± 6.67 feet
1:4,800      ± 13.33 feet
1:10,000     ± 27.78 feet
1:12,000     ± 33.33 feet
1:24,000     ± 40.00 feet
1:63,360     ± 105.60 feet
1:100,000    ± 166.67 feet

• This means that a location on a map represents a probable location, not an exact one.
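The tolerances in the table follow directly from the quoted USGS standard: the map error (1/30 or 1/50 inch, depending on scale) multiplied by the scale denominator gives the ground error. A small sketch of the arithmetic (the function name is ours, not USGS's):

```python
def nmas_tolerance_feet(scale_denominator: int) -> float:
    """Ground-distance tolerance in feet implied by the US National
    Map Accuracy Standards for a given map scale."""
    # 1/30 inch on the map at 1:20,000 or larger (smaller denominator),
    # 1/50 inch at smaller scales (larger denominator).
    map_error_inches = 1/30 if scale_denominator <= 20000 else 1/50
    ground_error_inches = map_error_inches * scale_denominator
    return ground_error_inches / 12  # 12 inches per foot

for denom in (1200, 2400, 4800, 10000, 12000, 24000, 63360, 100000):
    print(f"1:{denom:,}  ±{nmas_tolerance_feet(denom):.2f} feet")
```

Running this reproduces the table above, e.g. ±40.00 feet at 1:24,000 and ±166.67 feet at 1:100,000.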

Attribute accuracy • The non-spatial data linked to location may also be inaccurate or imprecise. • Inaccuracies may result from many kinds of mistakes. • Non-spatial data can also vary greatly in precision. • Precise attribute information describes phenomena in great detail.

Conceptual accuracy • GIS depend upon the abstraction and classification of real-world phenomena. • The user determines what amount of information is used and how it is classified into appropriate categories. • Sometimes users may use inappropriate categories or misclassify information.

Required quality • It is a mistake to believe that highly accurate and highly precise information is needed for every GIS application. • The need for accuracy and precision will vary radically depending on the type of information coded and the level of measurement needed for a particular application. • Excessive accuracy and precision are not only costly but can cause considerable delay in the execution of a project. • The user must determine what will work.

Sources of imprecision and error • There are many sources of error that may affect the quality of a GIS dataset. • Some are quite obvious, but others can be difficult to discern. • Few of these will be automatically identified by the GIS itself. • It is the user's responsibility to prevent them. • Particular care should be devoted to checking for errors because GIS are quite capable of lulling the user into a false sense of accuracy and precision unwarranted by the data available.

Imprecision associated with cartography • There is an inherent imprecision in cartography that begins with the projection process and its necessary distortion of some of the data - an imprecision that may continue throughout the GIS process. • Recognition of error and importantly what level of error is tolerable and affordable must be acknowledged and accounted for by GIS users.

3 major sources of error 1. Obvious sources of error. 2. Errors resulting from natural variations or from original measurements. 3. Errors arising through processing. • Generally errors of the first two types are easier to detect than those of the third because errors arising through processing can be quite subtle and may be difficult to identify.

Obvious sources of error • Age of data. • Data sources may simply be too old to be useful or relevant to current GIS projects. • Past collection standards may be unknown, nonexistent, or not currently acceptable. • Despite the power of GIS, reliance on old data may unknowingly skew, bias, or negate results.

Obvious 2 • Areal Cover. • Data on a given area may be completely lacking, or only partial levels of information may be available for use in a GIS project.

Obvious 3 • Map Scale. • The ability to show detail in a map is determined by its scale. • Scale restricts the type, quantity, and quality of data. • One must match the appropriate scale to the level of detail required in the project. • Enlarging a small-scale map does not increase its level of accuracy or detail.

Obvious 4 • Density of Observations. • The number of observations within an area is a guide to data reliability and should be known by the map user. • An insufficient number of observations may not provide the level of resolution required to adequately perform spatial analysis.

Obvious 5 • Relevance. • Quite often the desired data regarding a site or area may not exist, and surrogate data may have to be used instead. • A valid relationship must exist between the surrogate and the phenomenon it is used to study, but error may still creep in because the phenomenon is not being measured directly.

Errors resulting from natural variation • Sources of variation in data. • Variations in data may be due to measurement error introduced by faulty observation, biased observers, or by miscalibrated or inappropriate equipment.

Errors resulting from processing • Processing errors are the most difficult to detect by GIS users and must be specifically looked for. • They require knowledge of the information and the systems used to process it. • These are subtle errors that occur in several ways, and are therefore potentially more insidious, particularly because they can occur in multiple sets of data being manipulated in a GIS project. • Initial errors can also be propagated through processing with error increasing with every manipulation.

Examples of processing errors • Numerical Errors. • Different computers may not have the same capability to perform complex mathematical operations and may produce significantly different results for the same problem. • Computer processing errors occur in rounding-off operations and are subject to the inherent limits of number manipulation by the processor. • Another source of error may stem from faulty processors, such as the mathematical flaw identified in Intel's Pentium chip (the 1994 FDIV bug).
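A minimal illustration of the rounding error described above. These are standard IEEE-754 floating-point examples, not values from the lecture: neither 0.1 nor 0.2 is exactly representable in binary, and adding a small number to a very large one can silently discard it.

```python
# Rounding error: the stored sum of 0.1 and 0.2 is not exactly 0.3.
a = 0.1 + 0.2
print(a == 0.3)        # False
print(f"{a:.17f}")     # shows the hidden rounding residue

# Magnitude error: at 1e16 the gap between adjacent doubles exceeds 1.0,
# so the order of operations changes the answer.
big, small = 1e16, 1.0
print((big + small) - big)   # the small value is absorbed and lost
print(big - big + small)     # reordering the same terms preserves it
```

This is why coordinate arithmetic in a GIS (for example, differencing large projected eastings) is often done after subtracting a local false origin.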

Processing 2 • Errors in Topological Analysis. • Logic errors may cause incorrect manipulation of data and topological analyses. • One must recognize that data is not uniform and is subject to variation. • Overlaying multiple map layers can result in problems such as slivers, overshoots, and dangles. • Variation in accuracy between different map layers may be obscured during processing, leading to the creation of "virtual data" which may be difficult to distinguish from real data.
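One symptom of overlay error can be sketched without a GIS: slivers are thin polygons of tiny area created where two nearly coincident boundaries fail to line up exactly. The coordinates, the sliver shape, and the area threshold below are all hypothetical illustrations.

```python
def shoelace_area(ring):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    n = len(ring)
    s = sum(ring[i][0] * ring[(i + 1) % n][1] -
            ring[(i + 1) % n][0] * ring[i][1]
            for i in range(n))
    return abs(s) / 2.0

# A 100 m x 100 m parcel, and the thin gap polygon an overlay might
# produce where a second, slightly mis-registered layer meets its edge.
parcel = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
sliver = [(100.0, 0.0), (100.02, 0.0), (100.005, 100.0), (100.0, 100.0)]

SLIVER_THRESHOLD = 5.0  # square metres; hypothetical tolerance
for name, ring in (("parcel", parcel), ("sliver", sliver)):
    area = shoelace_area(ring)
    print(name, round(area, 2), "sliver!" if area < SLIVER_THRESHOLD else "ok")
```

Real GIS packages flag such polygons the same way in principle: by an area (or width) tolerance chosen by the user, which is itself a judgment about acceptable error.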

Processing 3 • Classification and Generalization Problems. • For the human mind to comprehend vast amounts of data, it must be classified, and in some cases generalized, to be understandable. • About seven divisions of data are ideal, as that is roughly what can be retained in human short-term memory.

Classification problems • Defining class intervals is another problem area. • Data is most accurately displayed and manipulated in small multiples. • Classification and aggregation of attributes used in GIS are subject to interpolation error and may introduce irregularities in the data that are hard to detect.
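The class-interval problem can be sketched directly. In this illustration (the density values are hypothetical), the same attribute data classed by equal intervals and by quantiles put most observations in different classes, so the "same" choropleth map can tell two different stories:

```python
# Hypothetical population densities (persons per hectare) for ten tracts.
densities = [2, 3, 4, 5, 6, 48, 50, 55, 90, 400]

def equal_interval(values, k):
    """Assign each value to one of k classes of equal numeric width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [min(int((v - lo) / width), k - 1) for v in values]

def quantile(values, k):
    """Assign each value to one of k classes with equal counts."""
    ranked = sorted(values)
    return [min(ranked.index(v) * k // len(values), k - 1) for v in values]

print(equal_interval(densities, 4))  # one outlier dominates: 9 tracts in class 0
print(quantile(densities, 4))        # classes spread evenly across tracts
```

With one extreme value (400), equal intervals lump nine of the ten tracts into the lowest class, while quantiles distribute them; neither map is "wrong", but each conceals different variation.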

Processing 4 • Digitizing and Geocoding Errors. • Processing errors occur during other phases of data manipulation such as digitizing and geocoding, overlay and boundary intersections, and rasterizing a vector map. • Physiological errors of the operator, such as involuntary muscle contractions, may result in spikes and loops, overshoots, and polygon errors. • Errors associated with damaged source maps, operator error, and bias can be checked by comparing original maps with digitized versions.

Broad themes in data accuracy & examples from research

Data consistency • One of the great benefits of GIS analysis is that it permits researchers to examine patterns using large amounts of data over great areas. • There is a proviso: data must be consistent.

Missing data • Missing data undermines the analysis. • Consistency is the key to good data, and good data is the key to reliable analysis.

Definitions of a road • SRM Definition: – DA25150000 - A specially prepared route on land for the movement of vehicles (other than railway vehicles) from place to place. These are typically resource roads, either industrial or recreational. Includes road access to log landings.

• Ministry of Forests Definition: – DD31700000 - A narrow path or route not wide enough for the passage of a four wheeled vehicle but suitable for hiking or cycling. Park paths and boardwalks are considered trails.

Spatial mis-registration • Disparity Between the 1996 and the 2001 Street Network Files for the Canada Census.

Extrapolated or interpolated data • Despite the widespread use of data models that depend on areal data, most GIS data start as point data and are extrapolated to areas. • In fact, there are very few data used in GIS that do not originate as point data, but our traditional map-making devices are all based on the display of areas.

Data from points • Points are almost always the basis of homogeneous areas representing levels of phenomena from elevation to population density. • The question then becomes: how do we know where our data come from? • Did they originate as point data? Were they aggregated or combined with other data? • To what map projection do the data correspond? • Who collected them? • The answer to these and other questions lies in metadata.
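Turning point samples into an areal surface usually involves interpolation; inverse-distance weighting (IDW) is one common scheme and serves here as a sketch of the step where error can enter. The sample points and the query location below are hypothetical.

```python
import math

# Hypothetical elevation samples: ((x, y) in metres, elevation in metres).
samples = [((0.0, 0.0), 100.0),
           ((10.0, 0.0), 120.0),
           ((0.0, 10.0), 80.0)]

def idw(x, y, samples, power=2.0):
    """Estimate a value at (x, y) as a distance-weighted mean of samples."""
    num = den = 0.0
    for (sx, sy), value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value          # exactly on a sample point
        w = 1.0 / d ** power      # nearer points weigh more
        num += w * value
        den += w
    return num / den

print(round(idw(5.0, 5.0, samples), 2))
```

Every cell of the resulting surface is an estimate, not a measurement; its reliability depends on the density of observations, which is exactly why the provenance questions above matter.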

Meta-data • Metadata is data about data. • Typical metadata includes lineage (where did the data come from?), projection, scale, data fields, and the name of a data steward. • The problem with this information is that it does not allow the user to make informed decisions about semantic interoperability. • It is based on an uncritical approach to data that assumes the transparency of language.

Desirable meta-data – Six issues are relevant to integrating multiple data sources: (i) specification of sampling methodologies; (ii) definition of terms; (iii) measurement specification; (iv) documentation of classification system and taxonomic details; (v) identification of data model and history; and (vi) specification of collection rationale and purpose of study.
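The six issues above can be sketched as a minimal metadata record with a completeness check. All field names and example values here are hypothetical illustrations, not fields from a real standard such as FGDC or ISO 19115.

```python
# Hypothetical required fields, one per issue (i)-(vi) above.
REQUIRED_FIELDS = (
    "sampling_methodology",   # (i)   how observations were collected
    "term_definitions",       # (ii)  what "road", "trail", etc. mean here
    "measurement_spec",       # (iii) units, instruments, resolution
    "classification_system",  # (iv)  taxonomy and class boundaries
    "data_model_history",     # (v)   lineage and processing steps
    "collection_rationale",   # (vi)  why and for whom the data were made
)

record = {
    "sampling_methodology": "stratified random field plots, 2001",
    "term_definitions": {"road": "prepared route for vehicles (DA25150000)"},
    "measurement_spec": "positions in UTM, metres, +/- 5 m",
    "classification_system": "provincial land-cover taxonomy",
    "data_model_history": "digitized from 1:20,000 sheets, cleaned 2003",
    "collection_rationale": "forest-road inventory for harvest planning",
}

missing = [f for f in REQUIRED_FIELDS if f not in record]
print("complete" if not missing else f"missing: {missing}")
```

A record like this would let a second user judge whether, say, the SRM and Ministry of Forests "road" definitions quoted earlier are commensurable before merging the two layers.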

Meta-data in the international context • Metadata forms part of a burgeoning international effort to extend the interoperability of spatial data. • Efforts to encourage metadata recognize that multiple data coverages are required for data exploration and spatial analysis, but their use is presently limited by problems identifying the data let alone standardizing multiple sources. • There are presently national and international efforts underway to require metadata to be included with spatial data.

International efforts • Metadata initiatives include the Federal Geographic Data Committee (FGDC) in the United States and the International Organization for Standardization (ISO). • In Canada, interoperability efforts are coordinated by GeoConnections. • See: www.geoconnections.org. • Their goal is to facilitate the development of a Canadian Geospatial Data Infrastructure (CGDI).
