Accuracy and Uncertainty. Siri, Dylan, Emily, Anthony, Kaelin, Laura

Author: Basil Short

USE ERROR: THE NEGLECTED ERROR COMPONENT By Kate Beard

“Two things are infinite: the universe and human stupidity; and I’m not sure about the universe.”

Thesis
There are 3 types of error:
1. source error - from collection
2. process error - error in data processing for map compilation
3. use error - the public using a map incorrectly
Use error is often neglected - TO OUR PERIL! However, computer technology makes it possible to address this problem.

Real picture of a young girl who used a map incorrectly

Typology of Map Error
1. source error
a. positional description
b. limited instruments
c. negligence on the part of the collector
d. bad weather?
e. time constraints

2. processing error
a. digital conversion
b. generalization
c. scale change
d. projections
e. graphic representation

3. use error
a. not considered map error
b. no systematic study
c. the neglected error

Cases
Histosol
Gersmehl maps occurrences of histosols with dots; the map is later used to identify the exact location of peatlands.
Problems
1. a map of widely scattered, small occurrences is used to identify the exact location of a large feature - divergence from expectation
2. histosol is not = peatland - lack of information

Why?
1. scale - forced Gersmehl to limit detail and information
2. positional accuracy sacrificed for graphic emphasis at this scale
3. map compiled for one reason and used for another

Aggregate data simplified to be represented at a smaller scale

THEN - used for farmland designation, zoning administration, and tax assessment. Policy implications for taxes and property rights!

CORN

Causes of misuse
1. Lack of information
2. Deviation from conventions and expectations
3. The use of small-scale generalized maps for many purposes because they are convenient and less expensive
4. The lack of current data and of the ability to make frequent updates

Recommendations
1. Store more information in digital maps
a. in paper maps more information is collected than presented - make more information available
2. Represent more detailed, disaggregate data
3. Potential for more extensive data quality documentation
4. Update maps more frequently than paper maps
5. Structure data to avoid illogical mathematical operations
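Recommendation 5 can be made concrete with a small sketch. It assumes each attribute carries a level of measurement (nominal, ordinal, interval, ratio), which is one common way to structure data and is not spelled out on the slide. The names `Level` and `Attribute` are hypothetical.

```python
from enum import Enum
from statistics import mean, mode


class Level(Enum):
    """Levels of measurement (an assumption; the slide does not name them)."""
    NOMINAL = 1   # category codes, e.g. soil class numbers
    ORDINAL = 2   # ranked categories
    INTERVAL = 3  # differences are meaningful
    RATIO = 4     # true zero, e.g. elevation in metres


class Attribute:
    """A map attribute that knows which operations make sense for it."""

    def __init__(self, name, level, values):
        self.name = name
        self.level = level
        self.values = list(values)

    def average(self):
        # Averaging category codes or ranks is the kind of illogical
        # operation the recommendation wants the data structure to block.
        if self.level in (Level.NOMINAL, Level.ORDINAL):
            raise TypeError(
                f"mean of {self.level.name.lower()} attribute "
                f"'{self.name}' is not meaningful"
            )
        return mean(self.values)

    def most_common(self):
        # The mode is defined for every level of measurement.
        return mode(self.values)
```

With this structure, asking for the average of soil class codes fails loudly instead of returning a number that looks plausible, while the most common class is still available.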

Beard: Question 1
Question: When an author is caught misusing data, what are the consequences for her/him?
Answer 1: Blunders--“Errors in map use, however, can carry significant penalties, since a single case of misuse can cancel all investments in source and process error reduction. Failure to consider use error in the past was excusable, but failure to consider it now risks many of the benefits we hope to achieve through GIS” (Beard, p.10). Misuse is not only frowned upon; it undermines many aspects of the work and of the study itself.

Beard: Question 1 continued...
Question: When an author is caught misusing data, what are the consequences for her/him?
Answer 2: Malicious user misuse--while the answer is somewhat vague, the consequences of misusing data are addressed fairly seriously. The USGS Geological Survey Manual states under 600.5 § 7 cl.F(1): “Incidents involving the theft of or malicious damage to IT equipment, fraud, national security violations, or other misuse of IT system resources shall be reported immediately to the Bureau Security Officer and /or other local law enforcement officials depending on the circumstances and location.” (http://www.usgs.gov/usgs-manual/600/600-5.html)

Beard: Question 2
Question: “The author gave an example of user error when describing users who inappropriately used a very generalized map to extract quantitative data. [Is] this type of error eliminated by online access to raw data and click-able original information, or can we still have error here?”
Answer: One should always be wary when using online data, even though a myriad of sources are deemed reliable. It is best to look at the metadata to gather information on a dataset's reliability. Many of the major sites belonging to ESRI, USGS, and the like have some guidelines and checks when it comes to data. Online access to the original data helps, but use error is difficult to eradicate, so always read up on the data.

Fisher - Models of Uncertainty
● Three categories of uncertainty:
o Error, Vagueness, Ambiguity
● The problem of Definition
● Intergrade Zones:
o Should an area that is 51% oak trees be classified differently than an area that is 49%?

(source: Fisher 1999)

Fisher - Models of Uncertainty
● Well-Defined Geographic Objects - usually created by Western societies
o census blocks, land ownership, etc.
● Poorly-Defined Geographic Objects - most of the natural world and traditional societies
o land cover, vegetation, tribal territory, etc.

(source: Fisher 1999)

Fisher - Models of Uncertainty
● Error - explained by probability
o measurement, assignment, class generalisation, spatial generalisation, entry, temporal, processing
● Vagueness - Fuzzy Set Theory (intermediate degrees of belonging)
o Similarity Relation Model
o Semantic Import Model
● Ambiguity - least researched
o Discord, Nonspecificity

(source: Fisher 1999)

5. Ambiguity
“…doubt as to how a phenomenon should be classified because of differing perceptions of it.” (p.197)
● Discord
● Non-Specificity
Examples: what is “soil?” what is “north of?” what is “deprived?”

Controlled Uncertainty
“…although the error may be inconvenient, the consequences of not introducing it may be worse” (pg. 199)
● Exact areas for endangered bird species (widely distributed components can introduce locational errors)
● Adding ‘small counts’ to protect the confidentiality of people
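The ‘small counts’ idea can be sketched as a random adjustment applied to each published count. This is only an illustration under stated assumptions: the function name `perturb_counts`, the `max_shift` parameter, and the uniform integer shift are all hypothetical and are not the procedure used by any particular statistical agency.

```python
import random


def perturb_counts(counts, max_shift=2, seed=None):
    """Return a copy of `counts` with a small random integer added to each.

    counts: dict mapping an area name to a non-negative integer count.
    max_shift: largest absolute change applied to any one count (assumed).
    seed: optional seed so that a run can be repeated.
    """
    rng = random.Random(seed)
    noisy = {}
    for area, count in counts.items():
        shift = rng.randint(-max_shift, max_shift)
        # Counts of people cannot go below zero.
        noisy[area] = max(0, count + shift)
    return noisy


# Usage: small areas no longer reveal exact numbers of individuals.
published = perturb_counts({"block_a": 3, "block_b": 120, "block_c": 0}, seed=1)
```

The error is introduced deliberately and its size is known, which is what separates controlled uncertainty from the other kinds of error in the typology.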

7. Distinguish between vagueness and error
7.1: Viewshed
- Reports areas you can see and areas that you cannot see
- When you don't have the height, GIS uses the cell’s DEM value

Probable Viewshed: Probability of a location being visible (error)

Fuzzy Viewshed: The degree to which objects can be distinguished (vagueness)
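Both viewshed ideas can be sketched on a one-dimensional elevation profile. The first part estimates the probability that each cell is visible by repeatedly adding random error to the DEM (a Monte Carlo simulation). The second part gives a distance-decay membership for how well an object can be distinguished. The function names, the Gaussian error model, and the parameters `rmse`, `clear_range` and `half_range` are assumptions made for illustration.

```python
import random


def visible(profile, observer_height, target):
    """True if cell `target` can be seen from cell 0 along the profile."""
    eye = profile[0] + observer_height
    # Slope of the sight line from the eye to the target cell surface.
    target_slope = (profile[target] - eye) / target
    # The target is hidden if any cell in between rises above that line.
    return all((profile[j] - eye) / j <= target_slope for j in range(1, target))


def visibility_probability(profile, observer_height, rmse, runs=500, seed=0):
    """Fraction of noisy DEM realisations in which each cell is visible."""
    rng = random.Random(seed)
    hits = [0] * len(profile)  # index 0 is the observer cell and stays 0
    for _ in range(runs):
        # Assumed error model: independent Gaussian noise on every cell.
        noisy = [z + rng.gauss(0, rmse) for z in profile]
        for t in range(1, len(profile)):
            hits[t] += visible(noisy, observer_height, t)
    return [h / runs for h in hits]


def distance_membership(distance, clear_range, half_range):
    """Degree (0 to 1) to which an object at `distance` can be distinguished.

    Membership is 1 out to `clear_range` and falls to 0.5 at
    `clear_range + half_range` (an assumed decay form).
    """
    if distance <= clear_range:
        return 1.0
    return 1.0 / (1.0 + ((distance - clear_range) / half_range) ** 2)
```

With `rmse=0` the first function reduces to the ordinary yes/no viewshed; as the assumed DEM error grows, cells near a ridge line take intermediate probabilities.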

Fisher: Question 1
Question: “Is fuzzy set theory like a % goodness of fit?”
Answer: Fuzzy Set Theory--an alternative to Cantor sets. Here, membership in a set is not a strict yes/no or 1/0. It is a real number within a range of 0 to 1. This can be compared to a glass of water: it holds 1 unit of water at the maximum and 0 at the minimum, but tends to hold something in between.
Goodness of Fit--the difference between an experiment’s result and the theoretical result that was expected.
With that said, they are not the same thing. Fuzzy set membership describes the degree to which an item belongs to a class, whereas goodness of fit compares theoretical and experimental values and measures the difference between them.
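The difference can be shown with a short sketch that reuses the 51% versus 49% oak question from the intergrade-zone slide. The thresholds `lower` and `upper` and the linear ramp are assumptions chosen for illustration, and the chi-square statistic stands in as one common goodness-of-fit measure.

```python
def crisp_oak(oak_fraction):
    """Classical (Cantor) set: an area either is oak woodland or is not."""
    return 1 if oak_fraction > 0.5 else 0


def fuzzy_oak(oak_fraction, lower=0.3, upper=0.7):
    """Fuzzy set: degree of belonging to 'oak woodland' between 0 and 1.

    Membership rises linearly from 0 at `lower` to 1 at `upper`
    (both thresholds are assumed values).
    """
    if oak_fraction <= lower:
        return 0.0
    if oak_fraction >= upper:
        return 1.0
    return (oak_fraction - lower) / (upper - lower)


def chi_square(observed, expected):
    """Goodness of fit: summed squared gap between observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Usage: the crisp rule splits 49% and 51% into different classes,
# while the fuzzy memberships of the two areas stay close together.
crisp_pair = (crisp_oak(0.49), crisp_oak(0.51))
fuzzy_pair = (fuzzy_oak(0.49), fuzzy_oak(0.51))
fit = chi_square([18, 22], [20, 20])
```

The fuzzy function describes one area's degree of belonging to a class; the chi-square function compares a whole set of observations with a theoretical expectation. They take different inputs and answer different questions.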

GIS users must:
- Think about possible sources of uncertainty
- Think about how they may be addressed

We must relate our conceptualization of uncertainty to GIS-based data models

Using Metadata to Link Uncertainty and Data Quality Assessments
Comber et al. 2006

‘metadata is data about data’ ‘information that helps the user assess the usefulness of a dataset relative to their problem’

Why
• In this context, assumptions may not generally be reported as a caveat to the “results” of a report or research project. This keeps the customer happy and allows the user to be seen as a “good” researcher.
• User uncertainty > Data uncertainty

Spatial data characteristics
● Data are collected for many reasons and it is impossible to predict future usage
● The “real” metadata often resides in the memory of the scientist who created the data
● There is a reduction in collaboration due to commercialization
● There is a belief that technology makes data integration irrelevant

Comber et al. Recommendations
● “Metadata needs to be expanded in order to include the data semantics and conceptualizations, and user generated metadata”
o How can we do this?

How to expand metadata?
● Expansion of metadata slots to include free text descriptions of the data
● Development of text mining tools to populate slots
● Development of tools to mine metadata so created for matches between user application and data ontologies
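The last two bullets can be sketched in a few lines. This is a minimal illustration, not the tooling Comber et al. propose: it fills a keyword slot from a free-text description by word frequency and scores the overlap between a user's terms and a dataset's keywords. The stopword list and the function names `extract_keywords` and `match_score` are hypothetical.

```python
import re
from collections import Counter

# A tiny stopword list for illustration only; real tools use far larger ones.
STOPWORDS = {"the", "and", "for", "from", "with", "that", "this", "are", "was"}


def extract_keywords(description, top_n=5):
    """Populate a keyword slot from a free-text metadata description."""
    words = re.findall(r"[a-z]+", description.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def match_score(user_terms, dataset_keywords):
    """Share of terms common to the user's application and the dataset.

    Computed as the size of the intersection divided by the size of the
    union of the two term sets (Jaccard similarity).
    """
    user, data = set(user_terms), set(dataset_keywords)
    if not user or not data:
        return 0.0
    return len(user & data) / len(user | data)


# Usage with a made-up description.
description = (
    "Land cover map derived from satellite imagery; land cover classes "
    "follow forest inventory definitions of forest."
)
keywords = extract_keywords(description, top_n=3)
score = match_score(["forest", "land", "soil"], keywords)
```

Plain word counts ignore meaning, so two datasets that both say “forest” can still define it differently. That gap is why the recommendation also calls for describing data semantics, not only indexing terms.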

Comber: Question 1
Question: “I understand in general the idea of text-mining and data-mining, but would like to go more in depth as to what this looks like and entails when putting together metadata.”
Answer: To the right is what the process looks like in general. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data was consulted for a more in-depth reading on this subtopic. Feldman and Sanger’s handbook states that automated indexing is the solution to compiling metadata, for it plays a role in the automated extraction of metadata. Themes, bibliographic codes, and key words are compiled. This does pose some document indexing problems, which are stated to be fixed with text categorization (p.65).

National Centre for Text Mining

Comber: Question 2
Question: “Re: Comber’s definitions of Non-specificity”
Answer: Comber’s definition states that “non-specificity occurs when the assignment of an object to a class is open to interpretation” (Comber, p.282).
