Digital Information Management: Wrap up, concluding remarks, and next steps

Wrap up and next steps
Cliff Lynch, CNI

This set of analyses is getting at a set of central, hard, critical problems that every institution is facing. The strategy has been fascinating – it would have been easier for each institution to get its planning grant and go off locally. But Deborah and Jeffrey have taken leadership to make the whole greater than the sum of its parts and to let the whole community learn from it. Each institution will have to develop its own local strategies, and it’s very helpful not to have to do that in isolation. You can learn a lot from other people’s experiences, including what are local weirdnesses to your environment and what are larger, more shared concerns.

Institutions and governance – one of the strong messages we get out of experiences is that this is a strategic institutional problem and it needs to be owned at a high level, like the provost. It’s very appropriate to be able to brief trustees or regents – it is a strategic issue, managing the research at universities. It speaks directly to institutional reputation, values, etc. There are a lot of people implicated in sorting this out – not just the library, or a simple library/IT collaboration. You’ve got legal folks, VPs of research, alumni/development offices, people concerned with reputation, image, public engagement. The question of giving more visibility to what the university is accomplishing is critical. How you come up with the right mechanisms of governance is complicated and will vary significantly from one institution to another.

Open access is very much tangled up in this whole question of managing research outputs and making them accessible. It is an important conversation to be having with faculty and (depending on how it comes out) can provide justification for building infrastructure and services. It is a lot about values – this has moved from a discussion about economics to one that is more fundamentally about values. The economic effects of open access are getting more confused – look at what’s going on in the UK with the Finch report. When we take it down to values about the accessibility of research it’s hard to argue – getting at things like global reach and the public impact of what our universities do.

This is not a simple “build it and we’re done” task – it’s a dynamic process that needs to be governed and will probably unfold over decades. Different institutions will balance portfolios differently, but it’s a range of things: repositories, open access, ETDs, policies around software, institutions’ role in disseminating information at scale, etc. Electronic theses and dissertations are very much a part of this portfolio and have been strikingly slow in coming in the US. Why are they not a complete no-brainer?

Things surfacing on the frontiers of these discussions: Documenting faculty research outputs, traditionally done through the annual activity report, but now a funny electronic thing that lives in six different silos and drives faculty crazy. Those are becoming more important to faculty as time goes on. We are seeing the emergence of a number of systems trying to rationalize this (VIVO being one of many), but the state of the art of interoperability is dismal. The next step from that is alternative metrics, to get at quantifying research impact (a very dangerous activity, but one many institutions seem determined to indulge in), and to help us allocate attention in a world where we’re all drowning. The rate of publication is so high that no matter how finely you niche your work you can’t keep up.

Public engagement will be more important over time – why is this research important, not only to the three people who understand it, but to the people who pay for it? Institutions will be increasingly challenged to help their faculty speak to that.

Research data management was a major part of the discussion here. It’s one of the great institutional challenges as research becomes more data intensive. We’ve seen funders mandate data management and sharing plans. Right now we are at a desperate juncture – we’re a couple of years into this but have almost no idea of what the impact is. The data from Princeton is about the only example of analysis that’s been done. We know very little of what’s going into proposals, which proposals are being funded, the extent to which faculty make good on the plans, etc. The funding agencies themselves don’t seem to be doing any consistent analyses here – funding agencies often don’t know how to do things themselves but wait for someone to show up and propose a grant. Sooner or later (hopefully later), someone will ask a question about what compliance we’re getting on this and if the answers are bad we’ll see a great deal of hysterical activity. We need to get some sense of what’s happening out there. We probably need to have conversations of risk management and research in ways we aren’t doing now – part is about research data, part is about physical things that come out of research (problems with hurricanes are a good example). Not just an IT problem – you put a lot of research at risk with continuity of things like freezers. How is the responsibility for research data distributed – a policy discussion that we’d all like to avoid but is strategic. The other side is if we do intelligent data sharing, can we really improve scientific and scholarly productivity? We’d like to believe that, but we need to try and learn everything we can about the real effects of being successful at research data management.

Concluding Remarks:
Tracey Futhey, VP of IT and CIO at Duke

It was quite telling when David talked about how CIO and Records Management functions hadn’t met – this is an activity that we’re pursuing on both IT and Library tracks, and it’s something we have to do together.

The two approaches from Duke and Dartmouth had the benefit of doubling the learning by combining them. Two very different governance structures, both with commitment from the top and both with IT and Library participation, but different participation beyond that. Duke was more focused on academics, while Dartmouth was more administrative in nature. It is essential for governance structure and membership to reflect the goals of the institution.

We may set the plan, but let’s make sure strategies don’t get stagnant – incorporate emergent needs.

It was pretty clear that leveraging faculty motivation is key – helping their work to be consumed by the broadest possible audience, and keeping it simple for them. Set default for desired outcome, rather than relying on people for proactive input.

How can we bring collaborative activities together around the edges of this domain?

These aren’t just library and IT issues, but institutional issues.


Digital Information Management – Open Access and Archiving Research Publications

Open Access and Archiving Research Publications

MIT Faculty Open Access Policy: Implementation & Impact
Ellen Duranceau, Program Manager: Scholarly Publishing and Licensing, MIT

Aim of policy – “The faculty at MIT is committed to disseminating the fruits of its research and scholarship as widely as possible”

Permission-Based Policy – License grant to MIT
– grants MIT non-exclusive permission to exercise all rights under copyright, provided articles are not sold for a profit.
– Exists prior to any publisher copyright agreement
– copyright is not transferred to MIT
– copyright can still be transferred to publisher, subject to prior license to MIT
– Opt outs accepted automatically on a per-paper basis

– Office of Provost, in consultation with Faculty Committee on Library System. Will be reviewed in 2014.

Key Factors
– Faculty driven, with library involvement
– Value-based and culture/mission-driven
– Permission-based policy changes default with no author action
– Mediated deposit in repository
– Assertive implementation with workflow tracking, provides assessment data
– Convenient – no paperwork from faculty

Three channels to obtain papers
– Publishers’ websites – if allowed (>30)
— Automatic copy of PDF
— Automatic SWORD deposit for: Biomed Central, Hindawi, Nature experiment (test phase)
– Repositories & MIT websites
– Faculty requests, via subject librarian liaisons

Drawing citations from Scopus and Web of Science, de-duping with EndNote, combining with Data Warehouse of faculty names, then sending email to faculty listing papers which the library hasn’t been able to obtain from automated sources and which haven’t been opted out. Liaisons found this process worked to open doors and start conversations with faculty.
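A minimal sketch of that outreach workflow, for illustration only. It assumes citations can be matched on a normalized DOI (standing in for the EndNote de-duping step), and the names `Citation`, `dedupe`, and `papers_to_request` are hypothetical, not MIT's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    doi: str
    title: str
    author: str  # faculty author, as matched against the names list


def normalize_doi(doi: str) -> str:
    # Assumption: DOIs differ only by case and surrounding whitespace.
    return doi.strip().lower()


def dedupe(citations):
    # Keep the first record seen for each DOI (e.g. Scopus before Web of Science).
    seen = {}
    for c in citations:
        seen.setdefault(normalize_doi(c.doi), c)
    return list(seen.values())


def papers_to_request(citations, faculty_names, obtained_dois, opted_out_dois):
    # Return {author: [titles]} for faculty papers that were neither
    # obtained from automated sources nor opted out.
    obtained = {normalize_doi(d) for d in obtained_dois}
    opted_out = {normalize_doi(d) for d in opted_out_dois}
    requests = {}
    for c in dedupe(citations):
        key = normalize_doi(c.doi)
        if c.author in faculty_names and key not in obtained and key not in opted_out:
            requests.setdefault(c.author, []).append(c.title)
    return requests


# Made-up sample data: the same paper appears in both citation sources.
scopus = [Citation("10.1000/A1", "Paper A", "Lee"),
          Citation("10.1000/b2", "Paper B", "Lee")]
wos = [Citation("10.1000/a1", "Paper A", "Lee"),
       Citation("10.1000/c3", "Paper C", "Diaz")]
result = papers_to_request(scopus + wos, {"Lee", "Diaz"},
                           obtained_dois={"10.1000/b2"},
                           opted_out_dois={"10.1000/c3"})
print(result)  # {'Lee': ['Paper A']}
```

Each entry in the result corresponds to one email a liaison would send, listing only the papers still needed from that author.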

– faculty time
– availability of final accepted manuscript
– faculty concerns about publisher
– publisher responses – e.g. changing author agreement to require opt out.
– Most publishers cooperating. Some who require opt out still allow posting.

Staff support for OA policy
– Existing staff repurposed – .5 term position added. .75 – 1.0 FTE librarian (parts of two people). .5 FTE acquisitions support staff. .25-.5 metadata support/student staff, ~10 hours/week temp. acquisitions & metadata student staff.

Impact – 7,800 papers. Over 630,000 downloads since Oct ’09, 40k per month.

About 4% of articles opted out, all because of publisher requirements.

Open Access & Archiving Research Publications
David Seaman, Associate Librarian for Info Management, Dartmouth

Fairly deep and broad commitment to open access. Member of COPE, signatory to Berlin Declaration on Open Access, etc. Struggling to disambiguate publishing in an Open Access journal from making access open. Faculty and student council on the libraries working on Dartmouth Faculty Open Access Resolution and Policy to bring to faculty senate.

Elementa – Partnership with BioOne and four other institutions to launch 6 OA domains in July 2013. “Science of the Anthropocene”. Dartmouth is the tech partner, building a new high-function publication platform based on PLOS ONE’s Ambra 2.0 system using JATS 1.0 XML.

Archiving Research Publications – Have a deep investment in administrative records, using OnBase. Using RSTOR locally for research dataset storage. D2I policy work for campus. Infrastructure planning for digital library program. Doing a Stakeholder Needs Assessment to work out what research publications services are actually needed by faculty. Discovering the Information Needs of Humanists when Planning an Institutional Repository (D-Lib 17, 2011), Ithaka S+R Institutional Repository Services Report, December 2012. Focus with faculty is on articulating a range of services, not on selling the infrastructure.

Where they are today: Solid policy discussions and good stakeholder needs assessment. Systems for managing datasets and admin/archival records in place. New system for OA publishing underway. No existing Institutional repository. No campus-wide faculty profiles (lots of faculty websites of indeterminate freshness).

Looking for a broad research info management system – fed by Banner, HR, CrossRef, Orcid, Scopus, ResearcherID, Google Scholar, etc. Feeding open access repository, dataset access, faculty profiles, grants management, etc.

Kevin Smith – Director of Copyright and Scholarly Communication, Duke
Duke’s Open Access Policy – Twin Foundations

See it as a step in a process of re-imagining scholarly communications.

ETDs (DukeSpace) since 2006, Law School journals open access since 1998, with repository in 2005.

Two channels for discussion: OA policy as a legal mechanism (legal status, implementation), OA policy as an expression of values.

Legal status: policy gives university a non-exclusive license to put a copy of each published journal article written by faculty members into repository. Policy is waivable, but irrevocable once license comes into existence. Gone about implementation in a non-confrontational way, to reduce burden on faculty.

Adoption – origin in faculty committee related to grant. OA policy was first priority chosen. Many conversations with faculty groups. “Secret to success is to take a drink with anyone who offers”. 3 broad issues: Is OA a good idea? (impact on journals and scholarly societies); Is university-wide policy best way to support OA? (disciplinary differences); How will it be implemented (will it create extra work?). Policy approved unanimously by faculty council.

Practical arguments: Higher citation rate/greater visibility; Interdisciplinary relationships; press coverage/public understanding; SPEED! (the big thing that mattered to lots of faculty, particularly in sciences).

Values-based arguments were very important in faculty debate. Expected readers: Patients, researchers at under-resourced institutions; clinicians and care-givers. Unexpected readers: Policy makers, independent researchers, ordinary people making decisions.

Implementation – Faculty made clear commitment to OA as a principle, with several caveats: reduce workload, don’t create conflicts with journal publishers.

Role of Libraries – Answered questions about: peer-review, publishing business models, risk of plagiarism, budget & staffing for repository. Major role has been in implementation: Working to automate process and tie it to activities faculty already do, like annual reporting and creation of profiles; harvesting citations, creating “batches” based on publisher policies; communicating with authors when articles upload or post-print version is needed.
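The batching step can be pictured with a short sketch. This is an assumption-laden illustration, not Duke's actual process: the policy labels, the `PUBLISHER_POLICY` table, and the function `batch_by_policy` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup of what each publisher allows in a repository.
PUBLISHER_POLICY = {
    "Publisher A": "published_pdf_allowed",
    "Publisher B": "accepted_manuscript_only",
}


def batch_by_policy(articles, policies, default="needs_review"):
    # Group (title, publisher) pairs into batches keyed by publisher policy.
    # Publishers missing from the table fall into a manual-review batch.
    batches = defaultdict(list)
    for title, publisher in articles:
        batches[policies.get(publisher, default)].append(title)
    return dict(batches)


# Made-up sample citations.
harvested = [
    ("Paper 1", "Publisher A"),
    ("Paper 2", "Publisher B"),
    ("Paper 3", "Publisher A"),
    ("Paper 4", "Publisher C"),
]
batches = batch_by_policy(harvested, PUBLISHER_POLICY)
print(batches)
```

Under this sketch, the "accepted_manuscript_only" batch is the one that would trigger a message to authors asking for the post-print version.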

Risks and benefits: Method is slow and labor-intensive; pressure on Libraries to make the policy a success. But it helps re-focus the role of Libraries in the digital age and aligns libraries with the clearly expressed academic values of faculty.

Digital Information Management – the Duke Dartmouth Project

Duke Dartmouth Project

Jeffrey Horrell, Dean of Libraries at Dartmouth

Both expressed concerns about grappling with an extraordinary amount of digital information. Services were starting to appear to deal with some parts of the content (journal articles), but that’s only a small piece of the information. What is the role of research libraries in managing this content? Based on conversations with Don Waters, received a joint planning grant and then individual grants to approach questions. Annual advisory council meetings, including colleagues at Chicago, Virginia, Williams, Yale, and others.

Deborah Jakubs – University Librarian, Duke

Both institutions saw, at the start, isolated efforts to manage digital information. Because of decentralization neither institution was able previously to develop infrastructure to straddle both academic and administrative realms. This effort has begun to overcome that issue. Frequent communication to faculty about the value of a coherent approach to digital preservation is important.

Paolo Mangiafico – Director of Digital Information Strategy, Duke
Context: in 2006 one year joint planning project between Duke and Dartmouth. Better understand landscape of digital assets on both campuses. Resulted in a report. Identified challenges: insufficient funding for new infrastructure and support; lack of established models; distributed and independent culture that is hard to change. No one size fits all asset management system. Project about practices and services, nurturing an ecosystem, not implementing a system. Ask faculty to tell us what they want and build around that, not institutional needs.

Faculty wanted collaboration, publishing, and impact. Publishing in terms of getting things out there quickly for people to see that are not ephemeral.

Toolkits and support for:

  • Managing and organizing their own stuff in ways that make sense
  • Disseminating what they’re creating
  • collaboration using modern net tools
  • informal and formal publication
  • metadata standards, taxonomies,
  • data on how stuff is actually used
  • place to hand off data/publications when it’s done.

Tiers of custodianship

  • Formal publication and archiving (institutional repositories) – selection for institutional value
  • Informal publication and mediated management (research repositories) – self archiving, curation by data owners with consulting. permanent urls, basic metadata
  • Basic storage and management (personal repositories) – auto-extracted metadata

selection and filtering processes between boundaries. services at those boundaries.

Process and governance
– provost level steering committee, plus task groups. Decided to focus first on academic materials, so governing group mostly faculty. A limited term task force – the digital futures task force. Seven faculty, plus a handful of staff. Members nominated by deans.

Roadmap and potential projects. Task force decided which to prioritize. Decided to start at top of pyramid and work down. In first year developed an open access policy and repository and publication management services for faculty. Mostly managed by library. Developing an experts database based on the VIVO ontology.

In second year, worked on research data management support.

Both efforts received significant attention on campus, including discussing with board of trustees.

Choice architecture – influenced by Nudge by Richard Thaler and Cass Sunstein. set defaults for what you want outcome to be, but allow people to opt out. That’s how they’ve been developing services for digital asset management. Will encourage culture change better than mandates.
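The opt-out default can be shown in a few lines. This is a sketch under assumed names (`Article`, `opted_out`, `articles_to_deposit` are hypothetical), not the design of any Duke service: the desired outcome happens with no author action, and opting out is a single explicit flag.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    # The default encodes the desired outcome: deposit unless the author opts out.
    opted_out: bool = False


def articles_to_deposit(articles):
    # Everything is deposited except articles whose authors opted out.
    return [a.title for a in articles if not a.opted_out]


papers = [Article("Paper A"), Article("Paper B", opted_out=True), Article("Paper C")]
print(articles_to_deposit(papers))  # ['Paper A', 'Paper C']
```

A mandate-style design would invert the default and require action from every author, which is the friction the nudge approach tries to avoid.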

Stephen McAllister – Dartmouth

Dartmouth Digital Initiative (D2I)
Protect core digital information. Guide transition from paper to digital.

Committee included counsel, libraries, CIO, VP of Research, College records manager, Director of HR and payroll, Associate Librarian for Info Management and the Director of Digital Info Strategy. Met every other week.

What is strategy? (per Henry Mintzberg) – Deliberate and emergent strategies. Combination of intended strategy and emergent strategies (what happens when it hits reality – what people are really doing). Need a model that can incorporate emerging behaviors.

Vision included governance, culture, and technology – decided to tackle governance first. Wanted institutional representation across the whole institution. Used existing faculty councils. Digital Asset Strategy Committee includes – CIO, CFO, Dean of Libraries, Director – Dean of College, Dean of Faculty, Prof. Schools reps, VP Research, Advancement. Individuals on that committee also liaise to faculty committees. First task – Info Security policy approval. 1.5 year process, led by a former CISO at Pfizer. Went to faculty councils and then to governance group for approval. Have recently been talking about cloud policy. Most recently working on Institutional Repository – strong on admin systems, strong on infrastructure, fewer tools in academic area. Engaged Ithaka to report on: peer institutions, benefits, defining the content, implications of our decisions, readiness, process.

Digital Information Management symposium – opening keynote by David Ferriero

I’m in Washington, D.C. for the Coalition for Networked Information meeting, but today is being spent at a pre-meeting symposium on Digital Information Management in The Research University: Strategic Directions and Tactical Approaches.

This is a Duke-Dartmouth Symposium
David Ferriero – Archivist of the United States

Permanence, Perseverance, and Persistence: Managing Digital Information

Citizen Archivist Dashboard:

Nov 2011 President Obama launched executive effort for raising the profiles of records management and records managers, focusing on electronic records. Mandated that OMB and Archives create government wide records management framework. One goal is to encourage open government and access to records. Current standard for federal agencies is print and save electronic records. Aiming to manage all federal records electronically by end of 2019. Revising transfer guidelines for federal records, including metadata by end of next year. When he arrived three years ago he discovered that CIOs in agencies didn’t have existing relationships with Records Managers in the agencies. By end of next year will determine feasibility for secure cloud-based archival data at rest protocol for non-classified data. 250 agencies, all creating records with policies. 2% of what’s created gets transferred to Archives. There’s not a federal job description for a professional records manager – often ends up with junior staff.

In December 2010 PCAST presented Designing a Digital Future – recommended reinvestment in information technology research to accelerate pace of discovery in all fields. Will allow better access to government records and services. 2.7 zettabytes of digital data are now being stored. 4.7 ZB by 2012, 32 ZB by 2014. Storage is a small part of the issue. Require long-term stewardship based on sustainable economic models. Fluency has little to do directly with use of computer, but with computational thinking.

Big data research and development – In March 2012, OSTP announced investment in big data research. A couple of examples: NSF encouraging universities to develop graduate programs for data scientists – is there room for data curators? What skills? Make discoveries while swimming in data, bring structure to formless data, identify rich data sources, clean and combine them. Shift from ad-hoc analysis to an ongoing conversation about data. DARPA XDATA program – developing computational techniques and tools for analyzing semi-structured and unstructured data.

Thomas Carlyle quote: “Permanence, perseverance and persistence in spite of all obstacles, discouragement, and impossibilities: It is this, that in all things distinguishes the strong soul from the weak”