Data Management for Libraries

Author: Janis Potter

ALA TechSource purchases fund advocacy, awareness, and accreditation programs for library professionals worldwide.

Data Management for Libraries
A LITA Guide

Laura Krier and Carly A. Strasser

An imprint of the American Library Association, Chicago, 2014

© 2014 by the American Library Association. Any claim of copyright is subject to applicable limitations and exceptions, such as rights of fair use and library copying pursuant to Sections 107 and 108 of the U.S. Copyright Act. No copyright is claimed for content in the public domain, such as works of the U.S. government. Printed in the United States of America 18 17 16 15 14   5 4 3 2 1 Extensive effort has gone into ensuring the reliability of the information in this book; however, the publisher makes no warranty, express or implied, with respect to the material contained herein. ISBNs: 978-1-55570-969-3 (paper); 978-1-55570-975-4 (PDF); 978-1-55570-977-8 (ePub); 978-1-55570-976-1 (Kindle). For more information on digital formats, visit the ALA Store at alastore.ala.org and select eEditions. Cataloging-in-Publication data is on file with the Library of Congress. Book design in Berkeley and Avenir. Cover image ©HunThomas/Shutterstock, Inc. This paper meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper).

Contents

Preface
1 What Is Data Management?
2 Starting a New Service
3 Data Management Plans: An Overview
4 The Data Management Interview
5 Metadata
6 Data Preservation
7 Access
8 Data Governance Issues
Afterword
Appendixes
A Resources for Institutional Repositories
B Sample Data Librarian Job Descriptions
C Sample Data Management Plans
About the Authors
Index


Preface

The buzz around data management in libraries has been growing quickly since the National Science Foundation announced that data management plans would be a required component of all grant applications starting in January 2011. Many librarians felt pressed to implement data management consulting services without having a firm grasp of how best to support researchers at their institutions. In all kinds of institutions, librarians are experimenting with different service models and giving themselves crash courses in research data and the requirements for effective data management, and many are doing it with minimal guidance. This book is intended to offer that guidance.

There are many elements to building an effective data management consulting service, and for many of us there is a lot that is new to learn. Jumping into a new arena, and coming up to speed as quickly as many of us have had to, can be challenging. Through extensive research, discussions with data management librarians around the country, and close work with data management experts, we have pulled together this planning guide to help you and your colleagues build an effective and well-used service for your researchers, faculty, and students.

Unfortunately, building a data management service is not something librarians can do on our own. Meaningful data management is a goal that should be supported at all levels of the institution, from lab assistants to department heads to the president of the university. This can be one of the biggest hurdles for libraries looking to implement these new services, but, thankfully, in recent years many people working in academia have come to better understand the need for these kinds of services. This book offers insight into building that support in your institution and maintaining the relationships that ensure your service is successful.


Though many librarians work closely with research faculty and understand the data being produced, some of us are new to the world of big data. This book offers a primer on data: on why and how data should be effectively managed. We also offer some tips for talking to faculty about why data management matters, and we help you learn to conduct successful data management interviews.

This guide is here not only to help you understand data management and how your library can be invaluable to researchers, but also to help you build a service in your library. Most of the data management guidelines on the web are directed at faculty; this guide takes a different approach: to help you help researchers. We walk you through every piece of a data management plan, help you make decisions about repositories and other infrastructure, and guide you through some of the difficult questions that arise about intellectual property, sharing and access, metadata, and preservation.

Data management in libraries is a new and growing area, and there are sure to be changes over time as we learn more. We hope that this guide can make you and your colleagues better able to contribute to the conversation as we all work collectively to organize, preserve, and provide access to research data, as we have with other products of research.

Chapter One

What Is Data Management?

In nearly every field, the practice of research is changing. New technologies and tools are being used to conduct research, resulting in wholly new types of data in vastly expanding quantities. In both the sciences and the humanities, research data increasingly takes a digital form, living on local hard drives and remote servers and scattered across networks. More and more, this data is born digital, although physical forms of data are still common in some fields of study. Some research projects combine physical and digital data, and researchers must keep track of both simultaneously. And, increasingly, research projects are producing huge data sets that would be unmanageable without computers to process them.

These new technologies are opening the door to greater collaboration among researchers, engineers, and computer scientists in all fields of study. Increasingly, librarians are being brought into these partnerships to contribute needed expertise in data management and preservation. Researchers are more interested in conducting their work than in managing and organizing the data behind it, and this is where librarians can provide valuable services and support.

As librarians move into this field, it is crucial that we understand the domains in which researchers are working and that we have a solid grasp of the kinds of research data being produced. Data types can vary widely across institutions and fields of study, but whether you are at a large research library, a medical school, or a liberal arts college, or you support a particular department, it is likely that research is being conducted and that researchers need data support. You need to work closely with faculty and other researchers to know how best to support them, but a quick review of the data landscape provides a solid foundation to begin discussions.

Types of Research Data


You are likely already familiar with one major distinction between types of data: qualitative versus quantitative. Quite simply, quantitative data deals with things numerically. Qualitative data is descriptive and deals with the qualities of things, giving rise to categorization rather than quantification. Researchers in the social sciences and in fields such as physics are often more likely to use quantitative data, whereas fields such as anthropology and history are more likely to use qualitative data. But the distinction between these two data types is not as hard and fast as you may believe, and people in all fields gather both types of data in their research.

Beyond this basic distinction, there are many other categories of data that may be part of a research project. Primary data is collected by the researcher within a particular project. This is original data that arises from a particular experiment or observation, and it is gathered and maintained by the researcher. Researchers often also use secondary data, originally created by someone else. For example, some researchers use census data gathered by a national organization to draw conclusions about a particular population. Libraries may be asked to acquire data sets for use in a particular research project, or researchers may find data sets through open-access repositories.

Both primary and secondary data take many forms. Some research projects produce observational data, which is gathered by observing a particular population or phenomenon. Experimental data, in contrast, is derived from controlled, randomized experiments. Observational data is gathered in instances where it is not possible to conduct a controlled experiment; researchers attempt to measure as many variables as possible in order to elucidate possible cause-and-effect relationships.
Controlled experiments generally attempt to minimize the number of contributing factors that are not of interest in order to measure the primary variable(s) in the study. Traditionally, both observational and experimental data were produced by human researchers taking notes in lab notebooks. But more and more often, data is gathered with computers, sensors, and other monitoring tools. These tools produce far larger data sets for researchers to collect and analyze. For example, sensors collecting traffic information can gather far more data than a human observer can, and computers let us gather and analyze much larger sets of survey data than paper surveys filled out and tallied by hand.

Research projects might also produce computational data. Computational data is the output of a computer that has taken a large set of varied data and run it through a simulation or other computational process. The fields of bioinformatics and genomics are forerunners in the use of computational data. Social scientists use computational data to detect patterns and predict behaviors. Computational linguistics examines the patterns and frequency of words and phrases using n-grams (contiguous sequences of n words). Computational data is increasingly becoming part of all fields of research.
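As a rough illustration of the n-gram counting mentioned above, the following Python sketch counts word bigrams in a short string. The function names are hypothetical, and the lowercase-and-whitespace tokenization is a simplifying assumption; real projects would typically rely on dedicated text-processing tools and much larger corpora.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of contiguous n-token tuples in `tokens`."""
    # Zipping n staggered copies of the token list yields each window of length n;
    # if there are fewer than n tokens, the result is empty.
    return list(zip(*(tokens[i:] for i in range(n))))


def ngram_frequencies(text, n=2):
    """Count how often each n-gram occurs in `text`.

    Tokenization is deliberately naive (lowercase, split on whitespace),
    an assumption made only to keep the sketch short.
    """
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))


if __name__ == "__main__":
    sample = "the data is managed and the data is shared"
    # Print the three most frequent bigrams with their counts.
    for gram, count in ngram_frequencies(sample, n=2).most_common(3):
        print(" ".join(gram), count)
```

Counting with `Counter` keeps the example to the standard library; the same windowing idea scales from bigrams to longer phrases simply by changing `n`.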

Sharing Data

Before the advent of large-scale, born-digital data, research data itself was not widely considered a valuable end product. Researchers produced papers that documented their work and drew conclusions about the data they had gathered and analyzed. The use of new technologies, though, means that some types of research data are expensive to produce. As costs rise and data sets grow, data is becoming a more valuable end product in its own right. Researchers are beginning to see the advantage of sharing and reusing data sets to reach new conclusions or to better understand a related area of study. But the shift to sharing data, in addition to the final, published version of a research paper, is still in its infancy, and the move toward greater data sharing requires the support and collaboration of many members of the academic institution, including librarians.

The first steps toward an open-data landscape are being taken. Some funding bodies have instituted requirements that research papers be shared openly and that plans for managing the data produced during a research project be included in grant proposals. Many subject-specific and institution-specific data repositories preserve and provide access to a wide range of data sets, and other repositories hold open-access copies of research papers. Although some researchers remain skeptical about the value of sharing their research data, the practice is becoming more accepted. Libraries are in a unique position to provide real value to this burgeoning practice and real guidance to researchers in this new world of research.


As more funding bodies and journals issue requirements that papers and data sets be managed and shared, it is important to pay close attention to the exact specifications. For example, the National Science Foundation requires that researchers submit a brief data management plan, but it does not require that data or final papers be released in an open-access repository. The National Institutes of Health require that a data sharing plan be included for grants requesting funds over a certain amount. The National Endowment for the Humanities Office of Digital Humanities began requiring a data management plan in 2012. Some funding sources merely require that the final paper be made available in an open-access repository. Several journals, including ISME Journal, Evolution, and Plant Physiology, have open-data policies, some requiring that data be submitted to specific repositories and others merely requiring that the data be made available to those who request it.1 Requirements vary and are not uniformly enforced. It is important to understand the differences between open-data requirements and open-access publication requirements, and between grants that require only that a data management plan be in place and those that require data deposit.

What Is a Data Management Plan?

In many instances, a researcher is required to submit a data management plan along with a grant proposal. These plans lay out how research data will be organized, managed, and preserved throughout the data’s lifecycle, both during the project and after it. The extent and level of detail in a data management plan depend on the project itself and on the audience for which the plan is being created. Typically, these plans require a description of the project and of the data that will be generated or used; the formats and metadata standards that will be used to store and organize the data; where and how the data will be stored in both the short and long terms; and any access provisions and legal requirements that apply to the data. Above all, funding bodies want to know that researchers have thought about how their digital and physical data will be stored, preserved, and potentially made accessible to a wider audience.
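One way to keep the components listed above in view during a consultation is to treat them as a checklist. The Python sketch below is a hypothetical illustration, not a funder template: the class and field names are our own, and the sections a funding body actually requires, and what it calls them, vary from funder to funder.

```python
from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """Hypothetical checklist of common plan components.

    Field names are illustrative only; each funder defines its own
    required sections and wording.
    """
    project_description: str = ""
    data_description: str = ""        # data to be generated or used
    formats_and_metadata: str = ""    # file formats and metadata standards
    short_term_storage: str = ""      # storage and management during the project
    long_term_preservation: str = ""  # storage and preservation afterward
    access_and_legal: str = ""        # access provisions and legal requirements

    def missing_sections(self):
        """Return the names of components that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


draft = DataManagementPlan(
    project_description="Five-year survey of coastal water temperature.",
    data_description="Hourly sensor readings, about 2 GB per year.",
    formats_and_metadata="CSV files described with Ecological Metadata Language.",
)
print(draft.missing_sections())
```

For the example draft, the sections still to discuss with the researcher are short-term storage, long-term preservation, and access and legal requirements. A simple list of blank fields like this is meant to prompt conversation, not to judge whether a plan is adequate.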


What Is Data Curation and the Data Lifecycle?

There are two ways to think about the lifecycle of data: from a researcher’s perspective and from an archivist’s perspective. The UK Data Archive has created a “research data lifecycle” that can be useful for thinking through all the stages of data from a researcher’s perspective.2 The Digital Curation Centre has, likewise, created a “curation lifecycle model” that lays out all the processes and components involved in data curation from an archivist’s or curator’s perspective.3 Both models are useful for libraries looking to implement data management services.

The research data lifecycle covers the lifespan of research data from creation through reuse. Most of the data services and management needs we discuss in this book relate to the research data lifecycle and to supporting researchers throughout the research process. The sequential steps of this lifecycle are creating data, processing data, analyzing data, preserving data, giving access to data, and reusing data. There are roles for librarians at most stages in this process, and each stage is made easier with good planning and management. We discuss these stages and the roles for librarians in more detail in later chapters.

The data curation lifecycle model covers the lifespan of data after it has been created and analyzed and is ready to be submitted to a repository. Data curation is the management of data once it has been selected for preservation and long-term storage. This model places data and digital objects at its center and treats data curation as an iterative process. The sequential steps of the curation lifecycle are creating or receiving data, appraising and selecting data, ingesting data, performing preservation actions, storing data, accessing data for use and reuse, and transforming data. Occasional actions, such as reappraising and deaccessioning data sets, may interrupt the cycle.
Many individuals are usually involved at various stages of the data lifecycle, both during the research process and during the curation process. Where you come in will likely vary from project to project, depending on the services you elect to provide. Likewise, the data itself may be generated in different ways: some may be created, and some may be transformed from existing data sets. Some key elements must be considered at every stage of the lifecycle, including preservation planning and description. The models are intended to be used as guides for planning and are not necessarily meant to be a set of rules to follow step by step. They can be useful for framing conversations with researchers and administrators and for planning library services. We discuss all the elements of these lifecycles in more detail throughout this book.
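For libraries that track consultations or repository workflows in software, the two sequences described in this section can be written down as ordered stages. This Python sketch is illustrative only: the enum and function names are hypothetical, and, in keeping with the point that these models are guides rather than rules, it reports what comes next without enforcing any order. It also assumes, as one reading of the iterative curation model, that transforming data leads back to creating or receiving data.

```python
from enum import Enum


class ResearchStage(Enum):
    """Research data lifecycle stages, in the order given in the text."""
    CREATE = 1
    PROCESS = 2
    ANALYZE = 3
    PRESERVE = 4
    GIVE_ACCESS = 5
    REUSE = 6


class CurationStage(Enum):
    """Sequential actions of the curation lifecycle, in the order given in the text."""
    CREATE_OR_RECEIVE = 1
    APPRAISE_AND_SELECT = 2
    INGEST = 3
    PRESERVATION_ACTION = 4
    STORE = 5
    ACCESS_USE_REUSE = 6
    TRANSFORM = 7


def next_research_stage(stage):
    """Return the stage after `stage`, or None once data has reached reuse."""
    members = list(ResearchStage)  # Enum iteration follows definition order
    position = members.index(stage)
    return members[position + 1] if position + 1 < len(members) else None


def next_curation_stage(stage):
    """Return the action after `stage`; TRANSFORM wraps back to the start
    (an assumption reflecting the iterative curation model)."""
    members = list(CurationStage)
    return members[(members.index(stage) + 1) % len(members)]


if __name__ == "__main__":
    stage = ResearchStage.CREATE
    while stage is not None:
        print(stage.name)
        stage = next_research_stage(stage)
```

Running the script prints the six research stages from CREATE to REUSE. Occasional curation actions such as reappraisal and deaccessioning are left out of the cycle on purpose, since they interrupt it rather than follow from a particular step.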


What Does This Have to Do with the Library?


Libraries have begun stepping in to help researchers craft data management plans. In some cases, librarians saw a new way to contribute their skills to support researchers; in other libraries, external pressure brought librarians to the table. Either way, librarians have a great opportunity to expand our services in ways that benefit faculty, build stronger relationships between libraries and research communities, and continue our role in preserving scholarly communication. This last point is the real key to our role in data management. Libraries have long been crucial players in the scholarly communication chain, responsible for preserving and making accessible the scholarly record. Now the form that the scholarly record takes is changing, and we must make sure that we are ready and able to continue our role in preserving it and providing access to it. We can help researchers adapt to these changes by taking on new roles in the shifting infrastructure of scholarly communication.

What’s in It for Faculty?

Data sharing is not yet a universal given in the scientific community, but nearly all researchers can see the benefit of improved data management. Jahnke and colleagues, in their Council on Library and Information Resources report The Problem of Data, note that researchers “understand that poor data management can be costly to their research and that access to greater technical expertise, through either a consultant or additional training, would be useful for their work.”4 Few researchers are happy with their own data management practices. They comment that they do not have time for the organizational and administrative work that goes into carefully managing and documenting data, and that they never received explicit training in data management practices. Additionally, many researchers work in fields that lack widely used, well-documented metadata standards or a common integrated data infrastructure.5

It can be challenging to convince faculty to take the time to plan for data management at the outset of a research project. The key to working successfully with faculty in this area is to show them how they can benefit from planning and organizing their work ahead of time and then maintaining their data accurately during the project. They will be more interested in working with you to create an effective data management plan if they can see how it will help them complete and publish their work.

One oft-cited data management problem for principal investigators relates to work done by their research assistants and graduate students. In many labs, research assistants are responsible for managing their own data. However, the varied practices that result from this ad hoc arrangement can create a lack of continuity and lead to missing or incomprehensible data when a particular research assistant leaves the project.6 When data is managed properly, it is easier to retrieve and use, no matter who produced it. Additionally, some researchers report difficulty going back to their own earlier data sets for reuse or reexamination when the original work suffered from poor data management. Without good documentation and contextual information, it can be difficult to understand how and why the data was captured in the first place.

Good data management reduces the amount of work required to interpret and compile information at the end of a research project. When good documentation is created while research is ongoing, it does not need to be reconstructed later. Managing data consistently throughout a project can lead to greater confidence in the accuracy of that data and greater efficiency in analyzing it and producing a paper.

You will encounter a range of attitudes, beliefs, needs, and understandings about research and research data as you begin to work with faculty. Working in this area draws on many of your skills, including conducting a data interview that helps you assess what a researcher really needs, understanding how to organize a variety of data types, and helping researchers make the right decisions about access and preservation for their particular data. Librarians are well suited to move into this area, even though some of it may be new to us.
Throughout the process of establishing a data management service, you are—first and foremost—doing what librarians do best: establishing relationships on your campus and discovering the best ways to be of service to your unique constituents.

Notes

1. See the Open Access Directory’s list of open-data policies for a growing list of these journals at http://oad.simmons.edu/oadwiki/Journal_open-data_policies.
2. “Research Data Lifecycle,” UK Data Archive, http://data-archive.ac.uk/create-manage/life-cycle.
3. “DCC Curation Lifecycle Model,” Digital Curation Centre, www.dcc.ac.uk/resources/curation-lifecycle-model.
4. Lori Jahnke, Andrew Asher, and Spencer D. C. Keralis, The Problem of Data (Washington, DC: Council on Library and Information Resources, August 2012), 15.
5. Dharma Akmon, Ann Zimmerman, Morgan Daniels, and Margaret Hedstrom, “The Application of Archival Concepts to a Data-Intensive Environment: Working with Scientists to Understand Data Management and Preservation Needs,” Archival Science 11, no. 3/4 (November 2011): 329–348.
6. Ibid.

About the Authors

Laura Krier

Laura Krier is a metadata librarian at the California Digital Library in Oakland, California. She works on projects ranging from data modeling and analysis to research into linked data models for libraries. She received an MS from Simmons Graduate School of Library and Information Science and a BA from the University of California–Santa Cruz.

Carly A. Strasser

Carly A. Strasser is a data curation specialist at the California Digital Library, University of California Office of the President. She has a PhD in biological oceanography, which informs her work on helping researchers better manage and share their data. She is involved in development and implementation of many of the UC Curation Center’s services, including the DMPTool (software to guide researchers in creating a data management plan) and DataUp (an application that helps researchers organize, manage, and archive their tabular data).


Index

A

abstract forms, 54 academic publishing community, stakeholders, 73 access about, 61–62 benefits of, 65–66 lifecycle model, 5, 12 plan component, 22 repositories and submitting data, 66–69 restricting, 66 role of identifiers, 63–64 submittal refusal, 69 accountability, 86 actionable identifiers, 63 ad-hoc licenses, 75 administration management and, 6 metadata and, 43–44 advice, plans and, 20–21 agreements, data governance and, 71 agri-environmental sciences, 11 Amazon’s Cloud Drive (storage), 51 analysis interpreting and output, 12 lifecycle model, 5 anonymizing data, 12, 66 appearance, significant properties and, 54 appraisal, 52 Archival Resource Keys (ARKs), 63 archiving data costs and, 51 management plans and, 20 policies and procedures, 47–48 process of, 52–57 services, 11 attribution stacking, 76 Australian National Data Service, 14 authorship, 74 Azure (storage), 51

B Banach, Meghan, 68 banks, data archives, 48 behavior, significant properties and, 54 benefits, data governance and, 72 best-practices documents, 45 bioinformatics, 3 buy-in institutional repositories and, 50, 66–69 new services and, 10

C California Digital Library, 18, 63–64 Callery, Bernadette, 77 capturing data, 45 Carlson, Jake, 36, 39 Carpenter, Maria, 14 Cedars Distributed Archive Prototype Demonstrator, 55 centers, data archive, 48 citations, 25, 63 clarifying questions, 32 cleaning data, 12 Cloud Drive (storage), 36, 51


collaboration, 10–11, 21 communication, libraries and, 6, 14 community practices, personas and, 33 complaints, handling, 31 components of plans data access, sharing, and reuse, 24–25 descriptions, 22–23 long-term storage and preservation, 25–26 resources, 26–27 security, ethics, property, 23–24 short-term storage and management, 25 computers data collection tool, 2–3 mediation and, 53 confidentiality and privacy, 76–77 consultations, 19, 21 content, significant properties and, 54 Content Data Object, 56 Content Standard for Digital Geospatial Metadata (CSDGM), 40 context data, 54, 56 contracts copyright and, 75 data governance and, 71 control, maintaining, 33 copyright data governance and, 71 data librarians and, 14 Copyright law, 74–77 Cornell University, 17–18 costs. See also fees faculty and, 6 management plans and, 17, 26–27 preservation and, 51 repositories and, 48, 52 sharing data and, 3 courses, data management, 13–14 creating data, lifecycle model, 5 Creative Commons, 44, 66, 76, 78 Creative Commons Zero (CC-0), 66, 76 credibility, 65 crediting, 25 criteria, repository selection and, 52–55, 66–67 curation of data costs of, 51 lifecycle model, 5 new services and, 10–15 profiles, 32–33 researchers and, 35

D Darwin Core, 40, 42 data access plan component, 24–25 appraisal of, 52 citation of, 63 considerations for, 22 copyright and, 74–77 governance of, 71–72 interviews and, 31–38 legal mechanisms for, 75–76 librarians and, 14, 84–88 lifecycle of, 11–13 management of, 1–7 rights to, 74–75 sharing registries and, 55 training and, 13–14 Data Curation Profile Kit, 32 Data Documentation Initiative (DDI), 42 Data Governance Interoperability Panel, 78 Data Management Publishing Guide, 18 data repositories access and, 2, 61–63 archiving data and, 47 control and, 33 descriptive metadata and, 41–45 domain types and, 49–50 fees and, 52 institutional, 50, 67–69 interviews and, 31–34 life cycle model of, 13 management plans and, 22, 25–26 metadata guides, 41 preservation services and, 58 research data and, 2–5 resources for, 83 selection of, 52–55, 66–67 services and, 11–14 subject types, 11 submitting data and, 67–69 types of, 48–50


data sets access and, 61 copyright and, 74 sharing information, 3–4 databases, copyright and, 74 DataBib (repository list), 67 DataCite Metadata Schema, 41–42, 63–64 DataONE, 78 Dataverse Network, 11 decision-making, justifying, 21 definitions, data governance and, 72 deposit data, 52 description information, 57 descriptions components of plans, 22–23 curation lifecycle and, 5 data librarian jobs, 84–88 descriptive information, 53 descriptive metadata, 41–43 designated communities, 53 Dietrich, D., 27 Digital Curation Centre, 5, 55 digital data bit sequences and, 54 curation lifecycle, 43 fees for, 51 humanities types, 18 licenses and content, 76 open access and, 66 preservation of, 68 preservation services, 58 storing, 44 Digital Object Identifier (DOI), 63–64 Digital Production and Integration (program), 45 digital signatures (verification), 57 digital systems, user-friendly, 48 digitizing data, 12 directories, data collection and, 22 Directory of Open Access Repositories, 50 discovery systems, 13 document data, 6, 52 domain repositories, 49 domains, metadata services and, 45 Dropbox (storage), 36 Dryad Data Repository, 41, 62

Dryden, A. R., 18, 37 DSpace (software), 49 Dublin Core, 40–42, 56 DuraSpace, 49 duties, data librarian, 84–85 duty statement, 86–87

E e-mail deposition, 67 Ecological Metadata Language (EML), 40, 42 economic considerations, 65–66 eCrystals (domain repository), 49 elements of metadata, 41–43 preservation and, 54 Encoded Archival Description (EAD), 40 environment, data librarians and, 85 errors, detecting, 68 ethics considerations for, 24 plan component, 23–24 privacy, confidentiality and, 76–77 Evolution (journal), 4 existing data. See data experimental data, 2 EZID, 64

F facts, copyright and, 74 faculty new services and, 14–16 planning for management and, 6–7 false attribution, 74 Fedora (software), 49 fees. See also costs digital data and, 51 repositories and, 52 files data collection and, 22 formats of, 44, 55 metadata and, 43 fixity data, 56–57 information, 44 tools, 68 format data collection and, 22


format (cont.) data management plans, 4 metadata and, 43 funders access and, 24 governance and, 72 guidelines of, 27–28 management plans and, 17 sharing data and, 4 as stakeholders, 73 future data governance and, 77–78 users and repositories, 48

G


Garritano, Jeremy R., 36 gathering data, elemental level, 42–43 genomics, 3 Giesecke, Joan, 68 Gold, Anna, 45 Google Docs (storage), 36 governance issues about, 71–72 future of, 77–78 privacy and confidentiality, 76–77 stakeholders, 72–73 status of, 74–76 grants management and, 21 proposals, 4 resources and, 26–27 Green, Ann, 49 guidelines access and, 62 funders and, 27–28 submitting data, 67 Gutmann, Myron, 49

H hard drives, data storage and, 48 hardware, metadata and, 43–44 harvesting, metadata and, 69 hash encryption (verification), 57 health information, confidentiality and, 77

I identifiers access and, 61 broken, 64 role of, 63–64 implementation, 13, 21 InChI (identifier), 64 information professionals, 18–19, 27 institutions repositories, 50, 67–69 resources for, 83 review boards for, 76–77 as stakeholders, 73 support for, 10–11 instruction department of, 13–14 librarians for, 15 new services and, 11–13 intellectual property considerations for, 24 data librarians and, 14 metadata and, 44 plan component, 22 plan components, 23–24 rights to, 74–75 Inter-university Consortium for Political and Social Research, 49 interpreting data, 12 interviews about, 31–34 researchers and, 35–37 starting point, 34 ISME Journal, 4 IUPAC (international chemical identifier), 64

J Jahnke, Lori, 6 job descriptions, samples, 84–88 Jones, S., 20 journals, data requirements for, 73

L Lage, Kathryn, 11, 33–34 legalities confidentiality and, 76–77 data governance and, 71 mechanisms for data and, 75–76 requirements of, 4


Li, Yuan, 68 liaisons, 14–15, 34 librarians, 14 libraries, 6–7, 48 licenses data usage rights and, 75–76 governance and, 71 librarians and, 14 Life Sciences Identifier (LSID), 64 lifecycles, 5, 11–13 logical structure, digital objects and, 45 long-term storage, 25–26 Losoff, Barbara, 34 Louis, K., 65

M machine-readable cataloging (MARC), 40, 56 management plans components of, 25–26 policies and, 62 samples, 89–94 Maness, Jack, 34 Massachusetts Institute of Technology, 18 MD5 checksums, 57 mediation, 53 metadata about, 39–40 administrative, 43–44 considerations of, 23 crafting strategies, 12 data librarians and, 14 descriptive metadata, 41–43 services of, 45–46 standards of, 6 structural, 44–45 updating, 26 Metadata Encoding and Transmission Standard (METS), 40, 44 Metadata Objects Description Schema (MODS), 40 Microsoft Azure (storage), 51 migrating data, 12, 14 models curation lifecycle, 5 Open Archival Information System, 53

monitoring tools, 2–3 motivation, data management plans and, 19–20 MuseumID (repository), 63

N National Center for Biotechnology Information (NCBI), 64 National Endowment for the Humanities, 4, 18 National Geospatial Data Archive Format Registry, 55 National Institutes of Health, 4, 73 National Oceanographic Data Center, 49 National Science Foundation, 18, 28 needs, personas and, 33

O OA Green repositories, 50 OAI-PMH (harvesting), 69 observational data, 2 OCLC/RLG Working Group on Preservation, 56 Ohio State University, 62 one-on-one consultations, 19 online considerations availability of, 66 backup services, 36 open access data governance and, 77 data librarians and, 14 data management plans, 25 interviews and, 33 repositories, 2, 50 restricting, 66–69 sharing data and, 5 Open Archival Information System (OAIS), 53 open-digital rights language, 44 open-source software, 64 OpenDOAR (repository), 50 organizing data, 12–13 ownership, data governance and, 71–72

P participants, locating, 10 personas, interviews and, 32–33 personnel resources, 27 Peters, C., 18, 37 physical structure, digital objects and, 45 pilot projects, 15 Piwowar, H., 65 plans about, 17 advice regarding, 20–21 component knowledge and, 18 components of, 22–27 curation lifecycle and, 5 data management and, 4 funder requirements, 27–28 information professionals and, 18–19 motivating researchers, 19–20 researchers and, 17–18 samples, 89–94 plans, changing, 21 Plant Physiology (journal), 4 policies access and, 62 archiving and, 47 data governance and, 72–75 open access, 4 unifying, 78 political data, privacy and, 66 preserving data about, 47–48 behind the scenes, 53–58 costs of, 51 curation lifecycle model, 5 data management plans and, 4 determining worth, 26 lifecycle model, 5 lifecycle model and, 12 plan component, 25–26 processes of, 52 repositories, 48–51 services of, 58 storage vs. archiving, 48 primary data, 2 print-on-demand, 68 privacy concerns about, 36 confidentiality and, 76–77 data librarians and, 14 repositories and, 48 private keys (verification), 57 The Problem of Data (report), 6 procedures, preservation and, 47 processing data, lifecycle model, 5, 12 profiles, data curation, 32 project management skills, 4, 14 PRONOM Technical Registry, 55 properties determining, 54 plan components, 23–24 Protein DataBank, 49 provenance data, 56–57 Pryor, G., 65 public data librarians and, 15 as stakeholders, 73 public domains, 66, 76 publishing community as stakeholders, 73 Purdue University, 31

Q qualitative vs. quantitative data, 2 quality assurance, data collection and, 22–23 queries, plans and, 20 questions, interview and, 35–37

R randomized experiments, 2 reference data, 56 reference interviews, 31 reference librarians, 15 reference model, 53 registries, 55 regulations, data governance and, 71 relationships, building, 15, 32 repositories access and, 2, 61–63 archiving data and, 47 control and, 33 descriptive metadata and, 41–45 domain types and, 49–50 fees and, 52 institutional, 50, 67–69 interviews and, 31–34 lifecycle model, 13 management plans and, 22, 25–26 metadata guides, 41 preservation services and, 58 research data and, 2–5 resources for, 83 selection of, 52–55, 66–67 subject types, 11 submitting data and, 67–69 types of, 48–50 representation information, 53–55 reproducibility, research and, 71 requirements, 28 research data and, 2–5 data policies, 62 management workshops for, 13 privacy and, 14 sharing papers, 3 researchers data sharing considerations, 24–25 helping, 17–18 maintaining control and, 33 refusal to submit, 69 storage considerations, 25 talking to, 35–37 working with, 10 resolvers, identifiers and, 64 resources components of plans, 26–27 existing types, 21 institutional repositories, 83 preservation costs and, 51 researchers and, 34 responsibilities plans and, 20 stakeholders and, 72 restating questions, 32 restricting access, 66 results, safeguarding data, 26 re3data (repository list), 67 reuse lifecycle model and, 5 plan component, 22, 24–25 review boards, 76–77 rights data and, 74–75 declaration schemas and, 44 metadata and, 44 stakeholders and, 72 roles data librarians, 14–15 identifiers and, 63–64 plans and, 20

S safeguarding data, 26 samples data librarian job descriptions, 84–88 data management plans, 89–94 schemas, 39–40 scholarly communication, 6 secondary data, 2 security components of plans, 23–24 considerations for, 23 data and, 25 plan component, 22 privacy and risks, 61 selection criteria data librarians, 86 research support librarian, 88 semantic information, 54 sensors, data collection tool, 2–3 servers, storing data on, 51 services about, 9–10 collaborating, 10–11 data management types, 6–7 institutional repositories and, 68 lifecycle, 11–13 metadata and, 45–46 preservation of, 58 staffing, 14–16 training and instruction, 13–14 sharing data management and, 3–4 data governance and, 71–72 plan component, 22, 24–25 short-term considerations costs, 51 plan component, 22 storage, 25 social sciences, quantitative data and, 2 software metadata and, 43–44 open-source, 64 sources, finding, 15 staffing changes and, 20 new services and, 14–16 stakeholders access and, 62 copyright and, 74 data governance and, 71–73 data issues and, 72–73 data management plans and, 17 new services and, 10 understanding terms, 47 standards, 4, 18, 39 Stanford University, 62 Steinhart, G., 17 storing data data management plans and, 4 lifecycle model, 11–12 plan component, 22 preservation, archiving and, 48 strategic plans, 14 strategies, creating, 12 structure metadata, 44–45 significant properties and, 54 subject repository, 11, 32, 34 submitting data, 12–13, 66–67 sui generis, 74–75 summarizing questions, 32 support building, 10–11 plans and, 20 survey data, 3

T taxpayers, research funding and, 73 technical metadata, 44 Text Encoding Initiative (TEI), 40 time saver, plans as, 19–20 training, 13–14 transcribing data, 12

U UK Data Archive, 5 underlying abstract form (UAF), 54 Universally Unique Identifier (UUID), 63–64 University of Colorado, 32 University of Edinburgh, 62 University of Guelph, 11 University of Houston, 18 University of Illinois, 32 University of Nebraska, 68 updating metadata, 26 URLs (uniform resource locator), 64 usability, 48 users, 9

V validating data, 12 verification methods, 57

W waivers, copyright and, 75 watermarks (verification), 57 websites guides and, 45 identifiers and resolvers, 64 storage and, 51 Whyte, A., 65 workshops, 13, 19

X x-ray crystallographic data, 49 XACML, 44 XrML, 44

Y Yale University, 45

Z Zero license, 66