- Secure research data
– This group focused narrowly on areas where access to restricted datasets is important in research computing. In the social sciences, researchers sometimes must apply to analyze non-public government data. Medical data is protected by regulation. Geospatial research can involve sensitive data on individuals. Researchers working with industry sometimes face restrictions on data, and intellectual property has to be respected. Recommendations:
1. People who manage research computing environments need to know which federal standards they must comply with – convene a national working group on how to comply. There is a federal interagency working group on data which might be a good venue for communication.
2. Create a simple catalog of solutions from institutions on how to enable remote access to secure data. Use the EDUCAUSE Cyberinfrastructure working group.
3. Catalog items relevant to clinical and translational studies.
1. Develop a set of documentation (elevator speech, executive summary, and extensive report) describing the need for policies and standards across disciplines as much as possible.
2. Develop a workshop for university officers (VP of Research, Provost) to include them in discussions of how institutions can be involved.
3. Catalog issues of data ownership and responsibility. Reduce the mean time to discovery for researchers working out how they should deal with their data.
4. Develop a workshop for leaders of disciplinary communities.
5. Develop a discipline-blind framework – what kinds of things does a discipline need to do to develop policies and standards?
6. The university librarian is key in this role.
7. It’s time for the researchers to walk into the room with the librarians and say “we’re here”. – Brian Athey.
- Assessment and selection of research data
Is it really a goal to keep all data if possible? Good question.
Good practices with physical materials should be studied for guidance.
The expense of managing data shouldn't be the primary consideration in deciding what we keep.
The selection process has to be discipline-specific.
What's the cost of getting rid of something? Is reproduction of the data possible, and if so, what does that cost?
It’s easier to throw things away than to try to collect them after the fact. So collect and manage data before deciding to throw it away.
Researchers will have to provide at least core metadata.
The selection process is not yes/no but a continuum from minimal to full curation.
1. To make decisions easier, develop a framework for making them. The researcher is a full partner in this.
2. Educate key audiences – researchers in all disciplines, and graduate students now – on the importance of curatorial concepts.
3. Encourage policy makers to rethink roles across the institution.
- Funding and operation
Recommendations for action:
1. Repository builders should collaborate – build with knowledge and forethought of others' work. There are too many isolated repositories. Think federation.
2. Make data movable. Funding models will change over time, so data should be movable from one caretaker to another.
3. Prepare for the hand-off. Anybody organizing a repository must put enough detail in the plan and budget to enable a hand-off at the end of the business cycle.
4. It would be useful to have a study of existing repository models.
- Partnering researchers, IT staff, librarians, and archivists
30 people in this breakout!
1. Communicate what's out there – what models exist? Build a portal that identifies workable solutions. What practices work for training – what resources exist for cross-training?
2. Institute more training for grad students.
3. Produce a substantial workshop report from here – task NSF with developing a generic framework that allows institutions to implement policies and appropriate procedures.
4. Hold a workshop to define best institutional practices in communicating between researchers and librarians.
5. Survey our campuses on data management practices.
- Standards for provenance, metadata, discoverability
We got into a discussion of "what is metadata" – anything that supports the core user needs for information. The IFLA definition asks: can you find it, can you identify it, can you select among resources, can you retain or reuse it? We want our metadata to be interoperable – able to move across repositories, workspaces, etc. We also want trustworthy and reliable data.
1. A common framework for data; some are emerging, like METS.
2. The role of ontologies – domains recognizing standardized terminologies. Linked Data (the semantic web) might be worth exploring for this.
3. Instrument data – if the numeric data is off, the data is useless. How do we know the data is good? This is a huge gap today – we need to work with instrument manufacturers. What captured this data? Today it is usually entered manually.
4. Metadata needs to be captured at point of data creation.
5. We need standards for provenance – what was the purpose of creating this data? Relationships between datasets are critical. Most scientists spend a long time exploring dimensions of the same set of problems.
Researchers want to develop their own metadata – treat it like any other data stream. Don’t worry about having to bring it into a structure.
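The points above – capture metadata at the point of data creation, record provenance (who, what instrument, why), and treat metadata as its own data stream – can be sketched in a few lines. This is a minimal illustration only: the `capture_metadata` helper and its field names are hypothetical, not drawn from METS or any standard discussed in this session.

```python
import hashlib
import json
from datetime import datetime, timezone

def capture_metadata(data: bytes, creator: str, instrument: str,
                     purpose: str, related_datasets=None) -> dict:
    """Build a minimal metadata record at the point of data creation.

    Field names here are illustrative, not a standard vocabulary.
    """
    return {
        "created": datetime.now(timezone.utc).isoformat(),
        "creator": creator,
        "instrument": instrument,            # what captured this data
        "purpose": purpose,                  # provenance: why it was created
        "related_datasets": related_datasets or [],  # links between datasets
        "sha256": hashlib.sha256(data).hexdigest(),  # integrity check
        "size_bytes": len(data),
    }

record = capture_metadata(b"1.2,3.4,5.6\n",
                          creator="A. Researcher",
                          instrument="spectrometer-42",
                          purpose="calibration run",
                          related_datasets=["run-041"])
print(json.dumps(record, indent=2))
```

Because the record is plain structured data (here, JSON-serializable), it can travel alongside the dataset through any repository or workspace – which is the interoperability goal stated above.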
- Partnering funding agencies, research institutions, communities, and industrial and corporate partners
1. Joint study of the feasibility of the “digital sheepskin”. Is there a model for a digital container that can be sustained through the ages, including metadata? We’ll probably have to invent some of the social context for this.
2. Conduct an aggregated study of total-cost-of-ownership (TCO) models using a trusted party (academia) for storage in perpetuity or for ten years.
3. Identify the missing pieces of the research data software stack, and encourage collaborations between academia and industry.
4. A study on criteria for throwing data away, by discipline.
5. Continue to emphasize that data volume is growing much faster than our ability to move data around. Think about where we need to site data.
6. What are the possible models for joint activity with industrial partners?
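The "digital sheepskin" in item 1 above – a self-describing container that carries its data, metadata, and the means to verify both – could look roughly like the sketch below. The layout is loosely inspired by bag-style packaging (e.g. BagIt); the file names and functions are illustrative assumptions, not a proposed standard.

```python
import hashlib
import io
import json
import zipfile

def make_container(payload: dict, metadata: dict) -> bytes:
    """Pack data files, their metadata, and a checksum manifest into one
    zip archive, so the container can be verified without outside help.
    Layout is illustrative, not a standard."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        manifest = {}
        for name, data in payload.items():          # payload: name -> bytes
            zf.writestr(f"data/{name}", data)
            manifest[name] = hashlib.sha256(data).hexdigest()
        zf.writestr("metadata.json", json.dumps(metadata))
        zf.writestr("manifest-sha256.json", json.dumps(manifest))
    return buf.getvalue()

def verify_container(blob: bytes) -> bool:
    """Re-check every payload file against the stored checksum manifest."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        manifest = json.loads(zf.read("manifest-sha256.json"))
        return all(
            hashlib.sha256(zf.read(f"data/{name}")).hexdigest() == digest
            for name, digest in manifest.items()
        )

container = make_container({"results.csv": b"x,y\n1,2\n"},
                           {"creator": "A. Researcher", "discipline": "physics"})
print(verify_container(container))
```

The social context the group flags – who re-verifies, who migrates the container when formats age – is exactly what a technical sketch like this cannot supply on its own.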