I’m in DC for the Fall membership meeting of the Coalition for Networked Information, which is always a great place to pick up on the latest goings-on at the intersection of libraries and digital information and technology. As usual, the meeting kicks off with Cliff Lynch, the Executive Director of CNI in giving a summary of the current state of the art.
Some things Cliff won’t talk about:
- The work Joan Lippincott has been leading on digital scholarship centers. Gets at some of the vehicles for forging and sustaining collaborations within research institutions and stewardship of materials that come out of that. There is a session tomorrow about that.
- Output of executive roundtable on the acquisition, collection, and curation of e-books at scale by University libraries, as well as interaction between online textbooks and ebooks in research libraries. A summary session on that tomorrow. A general challenge question in this area: Are there examples from the recent landscape of books being published in electronic format only (or perhaps with print on demand) that contain high impact contact? Are we starting to see the market emerge where you HAVE to deal with electronic material because it’s not coming out in print, to get coverage of recent events. Thinking mostly about books outside of extremely narrow scholarly domains. Spring executive roundtable will look at software as a networked based service.
- MOOCs. A year ago you couldn’t convene a group of five academics and get them not to talk about MOOCs. While discussion continues, it’s at a much lower temperature. There is some interesting preliminary research looking at the characteristics of the folks who seem to be most successful at MOOCs. Invites us to go back to much more fundamental notions of the library as the center of the university (if you connect a learner with a good library they’ll go far). That’s true of a certain type of person, but not all. The delivery of teaching and learning experiences is different than delivering a collection of knowledge. In the early enthusiasm about MOOCs there’s a tendency to see them as courses by other means. We will see MOOCs or MOOC-like things for purposes different than traditional courses, like training and other things that don’t fit well in the traditional academic definition of a course.
Things that are changing the landscape:
Hard to not give prominent place to the OSTP directive for federal funding agencies to develop plans to give access to reports and underlying data produced by funded research. There was an August deadline for submission of plans from agencies, which are not public. OSTP has been forthright that some plans are a bit more mature than others, depending on agency. We don’t have a firm date for when they will become public, but there is momentum. These developments will reshape the landscape for institutions that host the researchers as well as for the researchers themselves. Other nations and non-governmental funders are moving in the same directions.
One way to think about this is in the governmental sector, as a new set of compliance requirements. But we’ve seen the leadership in research and higher education (ARL, AAU, APLU) look at this as an opportunity and a challenge to rationalize the production of scholarly literature and data, which needs to be done. We’re seeing a lot of changes in the obligations and practices that surround scholarly publishing – a whole range of behaviors that need to be rationalized so that researchers aren’t left scratching their heads. Seen a variety of responses to the OSTP mandate – from SHARE (ARL), CHORUS (STEM publishers), government (based on PubMed Central). All have places in the ecology. One of the attractive things about CHORUS is that it makes articles available in the context of the journals in which they appear, which institutional repositories do not. We need to think about how to take advantage of this, not view it as competition.
A little bit of redundancy is not a bad thing. When the government shut down we got an education about how deeply entwined many of the scholarly information services are as non-essential government services. Interesting to look as a case study at what was unavailable during the shutdown. At some level PubMed Central was deemed essential and stayed up, though it was not ingesting new contributions. Conversations that took place recently under the name ANADAPT2, in Barcelona, mostly of national libraries, looking at aligning digital preservation strategies. Can see very clear advantages to aligning strategies at a nation state level, but realize that there are some functions that each nation wants to maintain autonomy in, rather than getting into interdependent collaborations. A set of trade-offs. When does collaboration turn into interdependence?
SHARE is not just an opportunity and a challenge to straighten out publication, but it also deals with data. There was a second executive order over the summer that told federal agencies that the default thinking about data systems (modulo security and privacy concerns) is to provide public access to data. The word public is popular in governmental circles (rather than “open”). When you talk about public access (to let the public of the United States have access to data) that can be from access to raw data files, all the way to systems that help the public understand and analyze the data. There are people in government struggling to understand where on that continuum to fall, especially as there is no money associated with these initiatives.
Issues emerging in the data area: Research and higher-ed community is mobilizing to address needs through bit preservation services. Some data is constrained because of personally identifiable information. Anonymization, while a useful tool, has limited power. It is frighteningly easy to de-anonymize data. Need to think about how to handle personal data while we gain the power of recombination and reuse of research data. We are seeing a movement away of commitment to open-ended preservation of data to a more limited language of data management plans, e.g. preservation of bits for ten years. There are a number of commercial services or consortially based services where you can prepay for ten years. General proposition is we’ll go ten years and then look at what kind of use has been made of the data and then look at alternatives. We have no process for doing that evaluation – we’ll need to involve all sorts of community discussions about value of data, which will need to be cross-institutional (we’ll need registries). It’s not too soon to start thinking about this problem.
This is an example of a broader issue of “transitions of stewardship” – somebody’s been taking care of something, but now their commitment is expiring. We need an orderly way of putting the information resource in front of the scholarly community and evaluating the need for continuing preservation and finding who will step up to it. We’re getting very good at making digital replications of 2-dimensional things like fine art, but the difference is tracking provenance. There’s lots of progress in 3-dimensional work as well (Smithsonian, e.g.). We now have an opportunity to peel off the scholarly side of artifacts for not only exhibition, but as objects of study. There are lots of institutions of cultural memory that are under severe stress – see the discussions around the collection of the Detroit Institute of Art, which again leads to the idea of taking transitions seriously.
We need to be thinking about where we’re assigning resources. Two things that are troublesome: 1) we don’t know how well we’re doing with our digital preservation efforts. How much of the web gets covered by web archiving? We don’t have an inventory of the kinds of things that are out there or what parts are covered, or where the areas of highest risk are. There’s a tendency to go after the easy stuff – part of our strategy going forward needs to become much more systematic. We have a tendency to continually improve things we’ve already in our grasp (like continually improving layers of backup for archives), but we need to look at the tradeoff of resources for this versus focusing on what we’re not yet capturing.
Another place where we’re seeing emerging activities that need to turn into a system is in distributed factual biography. Author identity, citations, aggregation, interchange, and compilation of citations. Connected to compliance issues, with academic processes, social networking among scholars, identifying important work. There’s an enormous amount of siloed work going on. Creeping up towards a place where we have factual biographies that we can break up into smaller parts and reassemble. What degree of assurance do we need on these bits? What role does privacy play? Is the fact that you published something a secret, or should it be able to be? Noteworthiness – Wikipedia has a complicated set of criteria of deciding whether your biography is worthy of Wikipedia. This has a rich wonderful history, going back to the nineteenth century work on national biographical dictionaries. When does someone become a public figure? There’s a question about systems of annual faculty reviews – one of the most hideous examples of siloed activity imaginable. Information often collected in forms that aren’t reusable in multiple ways. These need to be tied together with things like grant management systems, bibliometric systems, etc which are all moving the same data around. Other countries where the government is involved in assessing faculty work to pass out funds are more sophisticated than is typical in the US. One of the things we need to look at hard and quick is interchange formats. There’s good work in Europe and out of the VIVO community.
Notion of coherence at scale – framed by Chuck Henry at CLER. We’re moving past the era of building fairly little system and federating them, but we need to be thinking at scale – how do systems depend on each other and interrelate? Look beyond academia – Wikipedia, Google, Microsoft, Internet Archive. Look at incredible accomplishments of DPLA (Digital Public Library of America). They’re being very clear about what they’re not going to do in the near future, by implication saying that someone else needs to worry about those topics. The scale of engineering we’re looking at to manage scholarship and research knowledge is crossing some fundamental thresholds and we’re going to need to do things very differently than we did in the past. Examples are all around – look at the Pentagon Papers which are now a fundamental reference source to history of that time. That was a book – the research community knew how to deal with it when it was published. What do we do with things like Wikileaks? What do we do with massive data revelations?