Funding Request for Storage for the Libraries' Spatial Data Collection

Author: Todd Russell
January 2009

Executive Summary
Detailed Report: Overview of Spatial Data Collection
Detailed Report: Storage, Access and Space Requirements
Detailed Report: Proposed Storage and Access Plan
Detailed Report: Assessment of User Demand
Appendices

Prepared by

CU Libraries:
Rob Cartolano, Director, Library Information Technology Office
Jeremiah Christensen, GIS/Map Librarian
Eric Glass, GIS/Metadata Librarian
Jane Weintrop, Data Librarian

CU Information Technology:
Peter Crosta, Applications Systems Analyst Engineer
Rob Lane, Senior Systems Engineer
Lynn Rohr, Lead Application Systems Developer

Executive Summary

The spatial data collection is a digital collection of numeric resources formatted for use with GIS (Geographic Information Systems) software applications. Since 2003 Columbia University Libraries (CUL) and Information Technology (CUIT) have jointly supported the acquisition, management, storage, and dissemination of spatial data and have worked with others in the University to improve the infrastructure for GIS use on campus.

This request is for networked storage designed specifically for spatial data. The requested system will provide stable, secure storage for the collection, ensure that the collection can continue to grow, and result in a better access model designed specifically for delivering spatial data online.

The spatial data collection is outgrowing the options we currently have for storing it. The data are stored on different media and in different formats. Only a small part can be stored and delivered online, so we are underutilizing the potential of our now fully operational spatial data catalog. This proposal will alleviate this situation and position the Libraries/IS for continued growth for the next 3 to 5 years.

Proposed Solution
• An immediate allocation of 500GB of network application space, which will allow online access to the collection to expand over the near term and will serve as bridge storage while the alternative is being developed.
• A 3.3TB allocation in the Library/IS Repository to meet the long-term storage needs for the entire collection through fiscal year 2012-13.
• A presentation server (3.6TB of usable storage), managed by CUIT, running ArcSDE software (ArcSDE is standard software designed specifically for accessing spatial data files), and configured as follows:
  o HP DL360 server with mirrored 72GB drives, running Red Hat Enterprise Linux and Oracle
  o 12-slot HP RAID 5 array with six 1TB drives

Budget
A budget is needed only for the presentation server, as the other resources (Library/IS Repository, network application space, and ArcSDE software) are already owned or licensed by CU. Budgeted as a one-time cost, the presentation server with ArcSDE costs $8,900; leased, it costs $2,967 per year over the first three years of the project. The details of these costs are charted in the Proposed Storage and Access Plan section of the report.

The rest of this document is a detailed report that provides an overview of the collection; outlines our requirements for improved storage, access and space; gives more detail on the proposed solution; and reports on many factors that are indicators of the demand for spatial data.

Detailed Report

Overview of Spatial Data Collection

As of December 2008, the spatial data collection has about 3,610 titles that require 965 GB of storage space. The data come in a variety of formats. Raster data consist of either imagery files (orthoimagery, satellite imagery or scanned maps) or a gridded layer over geographic boundaries with a single value stored in each grid cell. Vector data consist of points, lines, and polygons, along with tables holding any number of data attributes that describe a geographic unit. For either of these formats, a data set will usually consist of several individual files that must be delivered and used together. The current collection contains files ranging in size from a few KB to 159 GB.

The storage methods now in place for the 965 GB are described below.
• 100 GB are stored on CUNIX in space that CUIT has allocated to EDS's numeric data collection (over 80% of that allocation is now in use).
• Online data sets are stored and delivered in zipped format, a necessary step to reduce the space requirement and to ensure users get all the files that comprise a data set.
• The remaining data sets are kept on external hard disks and DVDs in EDS, and users must come to EDS for access.
• 680 GB of the off-line collection consists of files too large to be distributed using our current distribution model without first being segmented into files that cover smaller geographic areas (these larger files are raster files).
• 70% of the titles are in vector format and 30% are in raster format.
• Even from portable storage accessed via a PC, the extraction of data from a large data set can be a twenty-minute process.

A detailed list of CUL’s spatial data collection and storage locations is presented in Appendix I.

The collection includes data sets that anyone can use and data sets that are licensed for use only by Columbia users. The data sets with Columbia-only licenses require UNI authentication for access, and those without restricted licenses require no authentication.

The CU Spatial Data Catalog (http://gis.columbia.edu/data.html) is the tool for identifying items in the collection. As outlined in the Timeline of GIS Services, Appendix H, it is hosted by the Center for International Earth Science Information Network (CIESIN), and EDS staff create and upload metadata for the CUL collection. Since summer 2008 the catalog has described the entire Libraries spatial data collection. The location information in the catalog for the online data sets (stored on CUNIX) is a URL linked to a zipped file; for the off-line collection it directs users to the EDS lab.
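The zipped-delivery step described above, where all files that make up a data set travel together in one archive, could be sketched roughly as follows. This is only an illustration: the function name `bundle_dataset`, the directory layout, and the sample file names are assumptions, not part of the Libraries' actual workflow.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_dataset(data_dir, base_name, out_dir):
    """Zip every file named <base_name>.* in data_dir into a single archive.

    Hypothetical helper: a vector data set usually consists of several
    companion files that share a base name and must be delivered together.
    """
    members = sorted(p for p in Path(data_dir).glob(f"{base_name}.*") if p.is_file())
    if not members:
        raise FileNotFoundError(f"no files found for data set {base_name!r}")
    archive = Path(out_dir) / f"{base_name}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for member in members:
            # Store only the file name so the archive unpacks flat.
            zf.write(member, arcname=member.name)
    return archive


def demo():
    """Build a throwaway data set and return the names stored in its archive."""
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp) / "data"
        out_dir = Path(tmp) / "online"
        data_dir.mkdir()
        out_dir.mkdir()
        for ext in ("shp", "shx", "dbf", "prj"):
            (data_dir / f"districts.{ext}").write_bytes(b"placeholder")
        # A file from a different data set must not end up in the archive.
        (data_dir / "other.shp").write_bytes(b"placeholder")
        archive = bundle_dataset(data_dir, "districts", out_dir)
        with zipfile.ZipFile(archive) as zf:
            return sorted(zf.namelist())
```

Writing the archive to a separate output directory keeps the zip file itself from being picked up as a member on a later run.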


Storage, Access and Space Requirements

The specifications for improved access and storage are as follows:
• The entire spatial data collection will be stored in the CUL Repository.
• Most of the collection will be accessible online via links from the spatial data catalog.
• The catalog links will interface with the data through an ArcSDE server/software interface. This will allow users to select, work with, and then download only the portion of a data product they need, and it will make accessible many of the data sets that are, for all practical purposes, too large to be delivered in zipped format.
• Data with Columbia-only licenses will require UNI authentication before downloading.
• Titles that will not be put online will include:
  o data where license or copyright provisions prohibit campus-wide distribution,
  o data that have been superseded by a more current version and for which there is not sufficient demand to keep them online.
• For those titles not available online, patrons will have to visit EDS to extract the data from DVD or an external hard drive.

Factors that affect the space needed to store the collection include the budget allocation, the format of the data selected for purchase, and the rate at which scanned maps are added to the collection. The assumptions we are making about each are listed here.
• The budget allocation will increase at a rate of 10% per year. Although this percentage sounds high given the current fiscal concerns, the base to which it applies is small: $12,000. We do not anticipate making any large one-time purchases like those we have used in the past to build the collection.
• The collection composition, 70% vector and 30% raster, will continue. This is an important factor because vector data use 35% of the space and raster data use 65%.
• We have been scanning maps from the Lehman map collection and adding the resulting spatial data files to the collection. To take full advantage of having a scanner, we want to increase our commitment to scanning. The benefits are increased use of our map collection and growth in our spatial data collection.

We anticipate needing 3.3TB of space by the end of the 2012-13 fiscal year. This number includes just under 1 TB for titles we now hold, just over 1 TB for estimated growth based on the above assumptions, and a 1 TB contingency to cover events outside our assumptions, such as unanticipated demand for raster data or the availability of one-time funds. See Appendix A for details. Although the counts in Appendix A show even increases over time in the amount of space needed for the collection, increases in space will continue to occur unevenly, just as they have in past years, because of the impact of raster purchases on space needs.

As stated above, the vast majority of the spatial data collection will be stored in the CUL Repository. The only exceptions are titles that have license or copyright restrictions prohibiting campus-wide access, and older versions of regularly updated titles for which there is little demand (approximately 350 GB). Access to these can continue to be provided on-site in EDS in the way it is currently served.
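The budget assumption above (a $12,000 base growing 10% per year) can be illustrated with a short calculation. The function name `project_budget` is an assumption made for this sketch; the base and rate come from the text, and the four results match the projected budget amounts listed in Appendix A.

```python
def project_budget(base, rate, years):
    """Compound a base budget by `rate` each year, rounding to whole dollars."""
    budgets = []
    current = base
    for _ in range(years):
        current = current * (1 + rate)
        budgets.append(round(current))
    return budgets


# $12,000 base and 10% annual growth, as stated in the assumptions above.
projected = project_budget(12_000, 0.10, 4)
print(projected)  # [13200, 14520, 15972, 17569]
```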


Proposed Storage and Access Plan

• Collection Accessioning
Spatial data generally enters the collection by way of CDs, DVDs, portable hard drives, ftp downloads, and other methods. Once the data are acquired, standard FGDC metadata is prepared to describe them and is uploaded to the CU Spatial Data Catalog. As soon as space in the Library/IS Repository is available, accessioning will also include preparation of the files for deposit to that space in compressed format. Work to move files now in the collection will also get underway. The secure storage of all data will mean we no longer have to rely on physical media like CD/DVDs or portable hard drives to back up the collection. Further processing will then depend on the delivery choice for the data.

• Delivery via 500GB of networked application space
For the near term, networked space will still be used in the way it is now. Where the data files for a title are small enough that a zipped version can be efficiently downloaded, a zipped file is created and uploaded to one of two CUNIX directories: one that is accessible only to authorized Columbia users and another that can be used by anyone. By the end of fiscal year 2007-2008 all the space available to us on CUNIX was used. Increasing the allocation by 500GB will allow us to continue with this method of delivery during the development of the ArcSDE server. Once the flow of our data into the Library/IS Repository is in place, the viability of downloading directly from the repository will be tested as an option for handling smaller spatial data files. On the “Spatial Data Flow” diagram, this delivery method is the flow that services web-user access.

• Delivery via the ArcSDE Presentation Server
An ArcSDE interface provides several advantages. First, spatial data can be loaded directly into ArcGIS from the spatial data catalog without first downloading the entire spatial data set to a local computer. This is faster, easier, and eliminates the need for potentially large amounts of local storage. Second, ArcSDE presents users with an image of the geography described by the data and allows users to select data for only the part of the geographic area that they need. Such functionality allows for the delivery of information that is often available only as part of a very large data file.

Requests for data delivered via the ArcSDE server would be received from the Spatial Data Catalog, and the response would be an image layer file of the geography generated by the ArcSDE software. The user downloads the layer file, which pulls data from the ArcSDE server. The user then uses ArcGIS to save (export) whatever small part of the data set is needed. The very large spatial data files that, even when zipped, cannot easily be downloaded will be loaded into an Oracle database on the ArcSDE server for delivery this way.

Delivery of data via the ArcSDE server will be available only on the DSSC/EDS workstations in Lehman Library. For the large files that will be handled this way, this limited access will be an improvement, since the DSSC, unlike EDS, is open whenever Lehman Library is open and because there will be no need to handle physical storage media or require staff assistance. In addition, downloading only a subset of a large data file will be much faster than accessing a very large file on a local CD/DVD or other portable device. On the “Spatial Data Flow” diagram, this delivery method is the branch that services the DSSC/EDS machines.

The infrastructure investment sought in this proposal will solve our immediate storage and access needs, will accomplish the majority of the technical work involved in setting up an ArcSDE interface, and will position us to better assess the costs of further development to provide for campus-wide access and, eventually, authorized web access from anywhere.

• Delivery via CD/DVD or portable storage
There will continue to be a small portion of seldom-used titles where access will continue to require use of CD/DVDs in EDS. Most often these are early versions of files that get republished on a semi-regular basis. Backup for these data will be in the Library/IS Repository. This handling is not shown on the flow diagram.
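The three delivery paths above amount to a simple routing rule for each title. The sketch below is one way to express it; the class and field names are hypothetical, and the 1 GB cutoff for "small enough to download as a zipped file" is an assumed placeholder, since the proposal does not state a threshold.

```python
from dataclasses import dataclass
from enum import Enum


class Delivery(Enum):
    ZIPPED_DOWNLOAD = "zipped download from networked application space"
    ARCSDE = "ArcSDE presentation server (DSSC/EDS workstations)"
    ON_SITE = "CD/DVD or portable storage in EDS"


@dataclass
class Title:
    name: str
    zipped_size_gb: float
    campus_distribution_allowed: bool = True   # False if license/copyright forbids it
    superseded_low_demand: bool = False        # older version with little demand


# Assumed cutoff for an efficient zipped download; not a figure from the proposal.
MAX_ZIP_GB = 1.0


def choose_delivery(title, max_zip_gb=MAX_ZIP_GB):
    """Pick a delivery path for a title following the plan described above."""
    if not title.campus_distribution_allowed or title.superseded_low_demand:
        return Delivery.ON_SITE
    if title.zipped_size_gb <= max_zip_gb:
        return Delivery.ZIPPED_DOWNLOAD
    return Delivery.ARCSDE


print(choose_delivery(Title("small vector layer", 0.03)).name)    # ZIPPED_DOWNLOAD
print(choose_delivery(Title("large raster mosaic", 159.0)).name)  # ARCSDE
```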

The ArcSDE server will be an HP DL360 with mirrored 72GB drives for the operating system (Red Hat Enterprise Linux) and Oracle. It will have a 12-slot HP RAID system attached. Six of the slots will be filled with 1TB drives, giving 3.6TB of usable space under RAID 5. The six empty slots will be available for later expansion. Access to the ArcSDE service port will be restricted to clients located within EDS and DSSC. This will be enforced either through filters set up by the CUIT networking group or by the use of iptables or a similar mechanism on the server itself.
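The access restriction described above is essentially an allowlist check on the client address. The following sketch shows the logic only; it is not the filter CUIT would deploy, and the network ranges are documentation-only placeholders (RFC 5737) standing in for the EDS and DSSC workstation subnets, which the proposal does not specify.

```python
import ipaddress

# Placeholder subnets standing in for EDS and DSSC; the real ranges are not given.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),     # hypothetical EDS workstations
    ipaddress.ip_network("198.51.100.0/24"),  # hypothetical DSSC workstations
]


def client_allowed(client_ip, allowed=ALLOWED_NETWORKS):
    """Return True if the client address falls inside an allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allowed)


print(client_allowed("192.0.2.17"))   # True
print(client_allowed("203.0.113.5"))  # False
```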


The costs for each alternative, purchase or lease, are listed here.

Purchase

Component         Year 1    Year 2    Year 3
DL360             $4,000    $0        $0
RAID Enclosure    $2,900    $0        $0
Six 1 TB disks    $2,000    $0        $0
Total             $8,900    $0        $0

Notes: Includes cable and card.

3 Year Lease

Component         Year 1    Year 2    Year 3
DL360             $1,333    $1,333    $1,333
RAID Enclosure    $967      $967      $967
Six 1 TB disks    $667      $667      $667
Total             $2,967    $2,967    $2,967

Notes: Includes cable and card.
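The totals for the two alternatives can be reproduced with a few lines of arithmetic. The component prices are taken from the cost listing above; the dictionary and function names are illustrative only.

```python
# Component costs as listed above (US dollars).
PURCHASE = {"DL360": 4000, "RAID Enclosure": 2900, "Six 1 TB disks": 2000}
LEASE_PER_YEAR = {"DL360": 1333, "RAID Enclosure": 967, "Six 1 TB disks": 667}
LEASE_YEARS = 3


def purchase_total():
    """One-time cost if the hardware is bought outright."""
    return sum(PURCHASE.values())


def lease_total_per_year():
    """Annual cost under the three-year lease."""
    return sum(LEASE_PER_YEAR.values())


def lease_total():
    """Cumulative lease cost over the full lease term."""
    return lease_total_per_year() * LEASE_YEARS


print(purchase_total())        # 8900
print(lease_total_per_year())  # 2967
print(lease_total())           # 8901
```

On these figures the lease spreads essentially the same total over three years rather than reducing it.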

Assessment of User Demand

Future demand will depend on the interests of academic departments and students. In 2003 it was academic departments that secured university funding for the project called “GIS: Spatial Dimensions in Social Science”, so their interest is proven (see the Timeline of GIS Services, Appendix H, for details). What has so far hindered rapid growth in demand is the slow pace of progress by academic departments in resolving some of the issues raised while working on the project: specifically, having enough qualified instructors for courses and having hands-on classroom space suitable for teaching GIS.

Factors beyond the university will ensure that student expectations continue to pressure the university to address these shortcomings. There is a cultural shift towards disseminating quantitative information using visualization techniques like mapping. Examples include the use in the media of increasingly sophisticated maps as a way to display data, the popularity of web tools like Google Maps and Google Earth, and analytic web mapping tools like SimplyMap (http://www.columbia.edu/cgi-bin/cul/resolve?clio6636232) and Social Explorer (http://www.socialexplorer.com/).

Despite only modest progress in addressing the needs of academic departments and the relative newness of the online spatial data catalog, there are some historic quantifiable measures that indicate these efforts have been successful. These are summarized below.

• Courses on GIS or Spatial Analysis (Appendix C). Between academic years 2002-2003 and 2007-2008 the number of GIS-related courses offered rose from 2 courses with 30 enrollees to 13 courses with 250 enrollees.

• Other Instruction related to GIS. Three departments (Journalism, Religion, and History) have worked with CCNMTL (Columbia Center for New Media, Teaching, and Learning) and EDS to incorporate mapping as a tool used by students. In the 2006-2007 school year, 104 requests for ESRI Virtual Campus (self-paced online courses) were made from 24 different departments or programs.

• EDS Traffic (Appendix D). Between 2003 and 2008 GIS consultations rose from just 5% of traffic to over 30%. This increase meant that total EDS traffic counts could climb each year despite decreasing demand for some data tasks that require less support due to technology improvements.

• Grant Funded Research (Appendix E). In 2006 twelve different departments or research centers listed research projects that used GIS on Columbia's GIS web site. The most active on the list were GSAPP, CIESIN, ISERP and Public Health, often in collaboration with each other. Since this list has not been maintained, updated data illustrating the use of spatial data in research were available only from ISERP. ISERP has had a full-time GIS analyst since 2003 and added a part-time analyst in 2007. Fiscal year 2008 was their most productive year, with 6 GIS-related grant proposals submitted and 5 funded with an award amount of $2,055,548.

• ESRI Licenses (Appendix F). Spatial data is most often used with GIS software, and the application most often used on campus for GIS work is ESRI's ArcGIS. At a cost of $6,000 per site license, availability of software can be seen as a measure of interest in GIS. There are now 14 departments or programs that have purchased a site license. Each site license covers up to 25 seats on site. Less expensive single-seat licenses are purchased by individual faculty members, research projects, and students, but we have no counts of these.

• Usage of ArcGIS Software (Appendix G). ESRI’s ArcGIS software is the most popular way to manage and edit spatial data at Columbia. In the 41-month period from July 2005 to November 2008, there were over 15,000 instances of ArcMap (the leading map creation software) called on CUIT-controlled computers. Users spent over 13,000 hours working with this software, with an average of 64 minutes of usage time per instance. The average user also spent about 81 days between the first time she worked with ArcMap and the last time. From September 2005 to September 2008, there was a twofold increase in the number of first-time ArcMap users and the number of unique ArcMap users.

• Spatial Data Catalog (Appendix B). The catalog receives about 1200 page views and 230 unique visitors per month. Columbia IP addresses account for the majority of these visits, but visitors from local internet providers, government agencies, and other educational institutions are not uncommon.

Index to Appendices

A: Space Requirements
B: Spatial Data Catalog Statistics
C: GIS / Spatial Analysis Listed in the Directory of Classes
D: Reference Traffic in EDS
E: Grant Funded GIS-based Research Activity in ISERP
F: List of CU Departments with ArcGIS Licenses
G: ArcGIS Usage from 7/2005 to 11/2008
H: Timeline of GIS Services
I: Description of Spatial Data Collection, Dec. 2008

Appendix A: Space Requirements

                               Dollars    Space (GB)
Fiscal year                    Budget     Total      Budget    1-time    Scans
03-04                           5,000        11.0      10.0       1.0
04-05                           6,350        20.0      20.0
05-06                           7,000        62.0      42.0      20.0
06-07 **                       10,000       357.0      22.0     335.0
07-08                          10,000       355.0     352.0       3.0
08 (Jun-Dec)                   12,000        50.0      50.0
Scanned maps through Dec. 08                110.0                          110
TOTAL SPACE TO-DATE                         965.0

09 (Jan-Jun)                                161.3     141.3                 20
09-10                          13,200       195.4     155.4                 40
10-11                          14,520       211.0     171.0                 40
11-12                          15,972       228.1     188.1                 40
12-13                          17,569       246.9     206.9                 40
EST. GROWTH 2009-13                       1,042.7
CONTINGENCY                               1,000.0
SPACE REQ                                 3,007.7

1-time dollar amounts: 2,000; 180,000; 7,000; 50,000
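The space figures in Appendix A follow from simple addition: each year's estimated growth is the space attributed to budgeted purchases plus the space attributed to scans, and the requirement is the space held to date plus total growth plus the contingency. The sketch below reproduces that arithmetic; the variable and function names are illustrative.

```python
# Figures from Appendix A, in GB.
HELD_GB = 965.0
GROWTH_FROM_BUDGET_GB = [141.3, 155.4, 171.0, 188.1, 206.9]
GROWTH_FROM_SCANS_GB = [20, 40, 40, 40, 40]
CONTINGENCY_GB = 1000.0


def yearly_growth():
    """Estimated growth per period: budgeted purchases plus scanned maps."""
    return [round(b + s, 1) for b, s in zip(GROWTH_FROM_BUDGET_GB, GROWTH_FROM_SCANS_GB)]


def space_required():
    """Space held to date, plus estimated growth, plus contingency."""
    return round(HELD_GB + sum(yearly_growth()) + CONTINGENCY_GB, 1)


print(yearly_growth())                 # [161.3, 195.4, 211.0, 228.1, 246.9]
print(round(sum(yearly_growth()), 1))  # 1042.7
print(space_required())                # 3007.7
```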


Appendix B: Spatial Data Catalog Statistics

Figure B1: Views and Visits (page views, visits, and unique visitors per month, Feb-08 to Oct-08; vertical axis 0 to 2000)

Figure B3: Page Views per Visit (Feb-08 to Oct-08; vertical axis 0.00 to 6.00)

Appendix C: GIS / Spatial Analysis Listed in the Directory of Classes Table C1 2001- 2002- 2003- 2004- 2005- 2006- 20072008-2009 COURSES OFFERED 2002 2003 2004 2005 2006 2007 2008 Fall/Spring 2 2 Urban Studies/Bar-CC 1 1 2 2 2 Economics/Barnard 1 1 1 2 3* 1 GSAPP 1 1 2 2 2 3 2 2 Public Health 1 2 2 2 2 Env-Earth-Eng 1 1 1 2 2 2 1 1 QMSS 2 1* 1 Environmental Biology 1 2 2 4 7 8 14 13 11 TOTAL Table C2 ENROLLMENT IN 2001- 2002- 2003COURSES OFFERED 2002 2003 2004 Urban Studies Economics/Barnard GSAPP Public Health Env-Earth-Eng QMSS Environmental Biology TOTAL

20042005 22 15 15

23

25

15 42

7

9

17

20

30

34

74

72

2005- 2006- 20072008-2009 2006 2007 2008 Fall Only 23 43 50 27 23 28 15 24 40 41 37 41 60* 1 1 60 49 34 28 53 17 13 11* 11 104 245 250 130

*estimated counts

Comments about the offerings:
• GSAPP - enrollment is limited by lab facilities, and students outside of GSAPP can rarely be accommodated.
• Barnard/CC Urban Studies - for the past four years the courses have always been over-enrolled.
• Environmental and Earth Engineering - the department felt these measures under-represent demand, as GIS is hidden within other courses not identified as GIS-related.
• QMSS - the newest department to offer courses; it is still limited by the availability of instructors and is working to solve this problem.
• Public Health - enrollment is limited by class space with access to the software.

Appendix D: Reference Traffic in EDS

Table D1: Counts

Year   GIS and    Statistical   Find data or   Extraction/   Transfer/   Other    Total
       Mapping    Software/     work with      Conversion    Download
                  Reformat      codebooks
2003   66         236           299            484           47          186      1,318
2004   149        277           348            432           85          33       1,324
2005   388        363           344            302           71          49       1,517
2006   476        356           372            289           63          34       1,590
2007   676        551           383            303           55          36       2,004
2008   547        475           385            230           41          70       1,748

Table D2: Percents

Year   GIS and    Statistical   Find data or   Extraction/   Transfer/   Other    Total
       Mapping    Software/     work with      Conversion    Download
                  Reformat      codebooks
2003   5.0%       17.9%         22.7%          36.7%         3.6%        14.1%    100.0%
2004   11.3%      20.9%         26.3%          32.6%         6.4%        2.5%     100.0%
2005   25.6%      23.9%         22.7%          19.9%         4.7%        3.2%     100.0%
2006   29.9%      22.4%         23.4%          18.2%         4.0%        2.1%     100.0%
2007   33.7%      27.5%         19.1%          15.1%         2.7%        1.8%     100.0%
2008   31.3%      27.1%         22.0%          13.2%         2.4%        4.0%     100.0%
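The percentages in Table D2 are each category's count divided by that year's total in Table D1. As a small illustration, the sketch below recomputes the GIS and Mapping share for each year; the dictionary and function names are assumptions made for the example.

```python
# GIS and Mapping counts and yearly totals from Table D1.
GIS_COUNTS = {2003: 66, 2004: 149, 2005: 388, 2006: 476, 2007: 676, 2008: 547}
TOTALS = {2003: 1318, 2004: 1324, 2005: 1517, 2006: 1590, 2007: 2004, 2008: 1748}


def gis_share(year):
    """GIS and Mapping consultations as a percentage of all EDS traffic."""
    return round(100 * GIS_COUNTS[year] / TOTALS[year], 1)


for year in sorted(GIS_COUNTS):
    print(year, gis_share(year))
# 2003 5.0
# 2004 11.3
# 2005 25.6
# 2006 29.9
# 2007 33.7
# 2008 31.3
```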


Appendix E: Grant Funded GIS-based Research Activity in ISERP

Table E1

Year     Proposals    Proposals   Total amount           Total indirect
         submitted    funded      awarded                costs
FY00     1            0           0                      0
FY01     2            1           56,973                 11,340
FY02     2            2           233,005                62,199
FY03     3            1           340,000                30,718
FY04     7            0           0                      0
FY05     4 (5)        1 (2)       100,000 (1,766,777)    9,091 (615,763)
FY06     0 (1)        0 (1)       0 (49,797)             0 (4,112)
FY07     9 (12)       4 (2)       7,032,405              2,621,776
FY08*    1 (5)        1 (4)       2,055,548              526,069
FY09*    2            2           15,099                 0

* Represents incomplete information on proposal submission and awards, including grant proposals with a “pending” status.
** Numbers in parentheses include collaborative proposals with Mailman or Social Work and involving ISERP personnel.

Appendix F: List of CU Departments with ArcGIS Licenses

CU Department Licenses
1. Columbia University Information Technology (CUIT)
2. Graduate School of Architecture, Planning, and Preservation (GSAPP)
3. Barnard College
4. Center for Environmental Research and Conservation (CERC)
5. Center for International Earth Science Information Network (CIESIN)
6. The Fu Foundation School of Engineering and Applied Science (SEAS)
7. International Research Institute for Climate Prediction (IRI)
8. Institute for Social and Economic Research and Policy (ISERP)
9. The Lamont-Doherty Earth Observatory (LDEO)
10. Mailman School of Public Health
11. School of International and Public Affairs
12. Earth Institute
13. Urban Technical Assistance Program (UTAP)
14. Paul Milstein Center for Real Estate

Spring 07

Fall 08

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x x x x x


Appendix G: ArcGIS Usage from 7/2005 to 11/2008

- 1,021 unique ArcMap users
- 15,563 instances of ArcMap (371 per month)
- 13,440 hours of usage (560 days)
- Average usage time per instance: 1 hour 4 minutes
- Average duration of usage: 81 days

Appendix H: Timeline of GIS Services

2002 Fall - Need for GIS support is described in an Electronic Data Services (EDS) planning document. The recommendation is to support GIS users more actively by hiring a GIS/Map Librarian and moving the data lab into Lehman 215, which is larger, better laid out, and adjacent to the Map Room. Upgrading the computer hardware to accommodate GIS applications is also recommended.

2003 Summer - GIS/Map Librarian joins EDS. EDS moves to 215 Lehman with the existing hardware.

2003-2004 FY - A $5,000 budget is set up for the acquisition of spatial data. Before this time, the collection consisted of data received through the Libraries' participation in the Federal Government Depository Program and from ESRI, which distributes data sets with the license for ArcGIS software.

2004 Winter - The project, GIS: Spatial Dimensions in Social Science, which was funded by an Academic Quality Fund (AQF) grant, gets underway. Participating units in this project were the Graduate School of Architecture, Planning, and Preservation (GSAPP), the Institute for Social and Economic Research and Policy (ISERP), the Center for International Earth Science Information Network (CIESIN), the Libraries, and AcIS. The purpose was "to improve GIS infrastructure and to integrate spatial perspectives into Social Science teaching and research."

2004 Fall - Library intern creates Federal Geographic Data Committee (FGDC) metadata for the most heavily used titles in the collection. This satisfies one of the Libraries' deliverables to the AQF project.

2005 Spring - EDS acquires a large-format scanner suitable for scanning maps. It was purchased by IGERT. AcIS assumed the cost of maintaining it, and the Libraries the personnel for operating it. This creates the ability to view the map collection as a GIS resource.

2006 Summer - EDS computer hardware is upgraded to high-speed, large-memory CPUs and 30" monitors to provide the increasing numbers of GIS users with state-of-the-art standards for access. Heavily used spatial data is stored on the local hardware as part of the software image.

2007 Winter - The Strategic Initiatives for GIS Services document is prepared as a follow-up to the 2006-2009 Libraries Strategic Plan. Needs that were identified include a metadata librarian, an online catalog for spatial data, and network storage space for the collection. https://www1.columbia.edu/sec/cu/libraries/staffweb/img/assets/9195/eds.oct07.pdf

2007 Spring - GIS/Metadata Librarian hired.

2007 Spring – CIESIN agrees to host the CU Spatial Data Catalog. The new GIS/Metadata Librarian moves the existing metadata into the new CU Spatial Data Catalog, making it searchable on the Web. Until this time, the metadata had been loaded on the EDS workstations, where users would search for data using the PC-based ArcCatalog interface. To maximize the benefit of an online service, data sets that can be zipped and made available online are moved into EDS’s DataGate space on CUNIX.

2008 Summer – The heavily used spatial data titles stored on the EDS PCs as part of the software image are removed in preparation for using the EDS image as the basis for the Digital Social Science Center (DSSC) software image. All data sets that are candidates for online storage are moved to space on CUNIX. The CU Spatial Data Catalog entries now reflect our current holdings with the appropriate location (online or onsite in EDS).

2008 Fall – CIESIN updates the CU Spatial Data Catalog with metadata describing their publicly accessible data.

Appendix I: Description of Spatial Data Collection, Dec. 2008

Notes:
A – Currently on file server
B – Too big for file server
C – Archived
D – Restricted, cannot be distributed online
E – Free dataset

Name  Items  Size (GB)  Notes  Format

LandInfo Global GIS Bundle
Land Info GlobalGIS_DVD – in .shp and GRID formats  83  7.35  A, E  raster/vector
Shuttle Radar Topography Mission (SRTM) 30 ArcSecond DEM Global Coverage in USGS ASCII .dem  1  6  B  raster
Shuttle Radar Topography Mission (SRTM) 3 arc second DEM, near global coverage in .bil AND Arc ASCII Grid  1  159  B  raster
Landsat 4/5 1990 30m Near global coverage in MrSID  1  23.4  B  raster
Landsat 7 2000 15m Near Global Coverage in MrSID  1  119  B  raster
Tactical Pilotage Charts 1:500,000, Global coverage in GeoTiff  1  56  B  raster
VMAP0 1:1,000,000 global coverage vector datasets in .shp (2 GB) and .tab  47  2.35  A  vector
World Vector Shoreline, various scales, Global coverage in .shp  1  0.5  A  vector

Orthoimagery
Barcelona, geodatabase (includes roads in .shp)  3  4.73  B  raster
Beijing (includes roads in .shp)  3  12  B  raster
Berlin (includes roads in .shp)  3  13.2  B  raster
Hong Kong (includes roads in .shp)  3  5.32  B  raster
London (includes roads in .shp)  3  22.5  B  raster
Madrid, geodatabase (includes roads in .shp)  3  6.11  B  raster
Nairobi (includes roads in .shp)  3  5.24  B  raster
Paris (includes roads in .shp)  3  39.1  B  raster
Tokyo (includes roads in .shp)  3  71.7  B  raster
New York, .jpg .ecw  3  19  B  raster
San Francisco .tif  1  67.8  B  raster
Buffalo .tif  1  3  B  raster
Los Angeles .tif  1  61  B  raster
Springfield .tif  1  3.5  B  raster

Lead Dog data layers – by city, may include streets, airports, parks, water bodies, neighborhoods, points of interest, railroads
Nairobi  6  0.0153  A  vector
Guadalajara  3  0.0394  A  vector
Singapore  6  0.0072  A  vector
Kingston  8  0.0062  A  vector
Port Au Prince  5  0.0081  A  vector
Hong Kong – includes building footprints and census boundaries  0.0336  A  vector
India – Census boundaries  3  0.0679  A  vector
Mexico City – including census boundaries with data  5  0.207  A  vector
Kenya  8  0.0059  A  vector

New York Files
Census-defined boundaries by borough  97  0.033  A, E  vector
Facilities  1  0  A, E  vector
Community Districts  1  0.0007  A, E  vector
Neighborhoods  1  0.0012  A, E  vector
Pedestrian/bicycle accidents  1  0.0086  A, E  vector
United Hospital Fund Areas  1  0.0002  A, E  vector
Detailed City Boundaries  2  0.0081  A, E  vector
Bytes of the Big Apple Layers  2  0.001  A, E  vector
New York City DEM files  4  0.54  A, E  raster
MapPluto – 2007  1  0.814  D  vector
MapPluto – Tax lots 2003-2006  4  4  D  vector
Rent stabilized apartments  1  0.0156  A  vector
Sub-boroughs  1  0.0003  A  vector
Transportation files, 2007  15  0.0229  A  vector
Transportation files, 2005  10  0.0056  C  vector
New York City Real Property Assessment Database (RPAD)  1  0.942  A  vector
Statewide Electric & Gas Service Areas  2  0.0017  A  vector

International Steering Committee for Global Mapping  82  12  A, E  raster
Africa Data Dissemination Service – Administrative Boundary files (up to 5th level administrative districts)  133  0.037  A, E  vector
Japanese Census Files  4  0.535  A, E  vector
China Historical GIS (CHGIS)  99  0.414  A  vector
China 2000 Township Population files  220  0.56  A  vector
China 2000 County Population files  12  0.134  A  vector
China Historical Census  11  0.096  A  vector
Tibet GIS Maps with Census data  35  0.393  A  vector
Corine Land Cover  1  2.67  B  vector
NTAD data (2006)  24  1.185  A  vector
Historical United States County (HUSCO) boundary files  46  0.054  A  vector

ESRI Data – wide variety of datasets at national, continental and global scales.
ESRI Data (2005-2008)  930  55  A  raster/vector
StreetMapUSA  1  2.72  B  vector
ESRI Data (1998-2005)  1300  62  C  raster/vector

Sanborn Collections – Orthoimagery of Urban Centers, with accompanying vector files of bridges, parks, rivers, building footprints, roof elevations, streets, address points, railroads and ground elevations.
New York (one raster file)  11  0.591  A  raster/vector
Newark (one raster file)  12  0.137  A  raster/vector
Jersey City (one raster file)  13  0.139  A  raster/vector

New Jersey Files
New Jersey Tax Lot databases  21  1.743  A  vector
Transportation files, 2007  14  0.0026  A  vector
Transportation files, 2005  4  0.0011  C  vector

Scanned Maps  279  110  A  raster

Total  3611  964.9972