CSG Winter 2018 – Research and Teaching & Learning IT: Partnering with the Library

This morning’s workshop on partnering between IT and Libraries features Jenn Stringer/Chris Hoffman (Berkeley), Jennifer Sparrow/Joe Salem (Penn State), Diane Butler (Rice), Cliff Lynch (CNI), Louis King (Yale), David Millman (NYU)

The morning is starting off with some thoughts from Cliff Lynch (CNI):

Reminders of some things many haven’t lived through: In the early 90s there was a call not only for collaboration between IT and Libraries, but serious talk of merging. It was tried at a few institutions, like Columbia University. The takeaway was that it’s fairly crazy at large institutions. The mission expansion of each has been in differing rather than overlapping areas. But it’s been successful at a number of liberal arts organizations.

When CNI was founded it was totally viewed as a collaboration between the CIO and the head of the Library at member institutions. In the early 2000s that makeup was changing. The representation was the head of the library and someone doing research or academic computing, or doing digital work in the libraries. Led to increasing disengagement of the CIOs. Starting around 2000 started putting on executive roundtables with the intent of re-engaging the CIOs. It was fairly easy in the first few years to come up with topics in that sweet spot, but it got harder. If you look back from 1990 – 2005 you see that Libraries had low levels of technical expertise. At the same time libraries had developed some internal expertise in technologies important for digital humanities, data curation, etc, where there is now more competence than in the central IT org, which has structured its mission around infrastructure, compliance, etc. Libraries continue to rely on IT for fundamental infrastructure.

If you look at the landscape, how much IT capability is native to the library, and how much replicates or complements the expertise in IT? This is hugely inconsistent. If you polled the CSG campuses you’d be surprised at the degree of variation in organic IT expertise in the library.

Collaborations involving the library have become much more multilateral rather than bilateral with IT – involving partners like University Presses, Museums, research data management, digital scholarship centers (often involving academic school or department), geospatial centers, maker spaces.

Don’t forget collaboration on institutional policies. Data governance, privacy and reuse of student data and analytics, responsibility of the university for preserving scholarly products. Had a recent roundtable looking at policy implications of widespread adoption of cloud platforms.

This area does not lend itself to checklists.

UC Berkeley – Chris Hoffman

A history of good intentions – Museum Informatics Project – Housed in Library, Digital collections and DAMS. Complicating factors: Sustainability, budget cuts, grant funding; priorities; loss of key champions; culture.

CollectionSpace – managing collections for museums.

Research Data Management – an impetus for change. New drivers (DMP requirements), new change leaders, new models for partnership. Benchmarking justified need. Broad definition of research data – all digital parts of a research project. Priority to nurture collaboration between IT and Library. Co-funded a position for program manager. Campus-wide perspective, investing in understanding and bridging cultures.

What’s next? More challenging tests to partnerships, RDM 2.0, Visualization and makerspaces, more fundamental technologies? (archival storage, virtual teaching and research environments); strategic alignment?

NYU – Stratos Efstathiadis, David Millman, David Ackerman

Research technology works closely with Libraries.

Data Services – estab. 2008. 11 FTE. Consultation and instructional support for scholars using quantitative, qualitative, survey design, and geospatial software and methods. Joint service of IT and Libraries.

Digital Library Technology Services – estab ca. 2000. Digital content publication and preservation. New services to support current scholarly communication. R&D to develop new services and partnerships, 19 FTE.

Research Data Management Services – estab 2015. 2 FTE. Promulgate best practices in data organization, curation, description, publication, compliance, preservation planning, and sharing.

Research Cloud Services – new collaboration built on other preexisting services. Interconnected research storage environment. Reimagine a spectrum of cloud storage from dynamic to published final products. Provide backbone for researchers but also Libraries’ collections and workflows.

Yale – Louis King

Considerable history at Yale in working in digital transformation space.

Office of Digital Assets and Infrastructure – Sept 2008. Work closely with Library and ITS. Focus on Digital Assets & Infrastructure. Take advantage of disciplinary approach of libraries and technical capacity of IT.

Looking for ways to gain efficiencies and lower overhead for people who want to manage digital content.

Had some substantial initial success, but changes: Initial provost sponsor left Yale, 2009 financial crash, VP retired, two library director transitions, transition in IT director, emerging digital systems in Library.

Late 2012 relaunch as Yale Digital Collections Center, but closed in 2015. But it catalyzed momentum towards digital transformation at Yale. Established the foundation for many successful current and future collaborations.

Rice University – Diane Butler

Library and IT have been partners for a long time. For a very short time, the organizations were merged. Research IT and library have been partners since 2012 and informally even further back. Began with library providing the service and IT providing the core infrastructure but has morphed into a collaborative partnership.

Areas of collaboration: Data Management (through Library). Provide consultation, including creating DMP, describing and organizing data, storing data, and sharing data. Training, Access to resources such as platform for sharing and preserving publications and small-to-medium datasets. Still an area for work as faculty aren’t very engaged.

Digital Scholarship: Service provided by library with IT providing infrastructure. Preserving scholarship, navigating copyright and open access, managing and visualizing data, digitizing materials, consultation, etc.  Research IT has history in supporting engineering and sciences, but not so much in humanities.

Digital Humanities: Imagine Rio Project. Most successful collaborative project to date. Architecture and history professors joined together to imagine Rio de Janeiro. Searchable atlas of social and urban evolution of Rio.

Positive outcomes: Research IT had not supported Humanities or qualitative social sciences previously. Success of project has brought in more funding. Research IT now has 2 facilitators that are working with faculty in those disciplines.

At Rice the board has come up with some base funding for research computing, so that all of the work doesn’t have to be funded by grants.

Penn State – Joe Salem and Jennifer Sparrow

Strong history of working with libraries, IT, and student services on accessibility issues. Thinking about spaces in place and how to leverage institutional spaces. Built a “blue box” classroom.

Worked on the Dreamery – a co-learning space for bringing emerging technologies onto campus.

Driving strategic initiatives: Collaborative, technology-infused space. Inherited a space called the Knowledge Commons. Includes a corner with staffing from both Libraries and Academic Tech. Service partnership profile has grown from just a focus on media, to overall platform for supporting students. Work on curricular support together – open educational resources and portable content. Instructional design is a focus.

Learning Spaces committee – Provide leadership in innovative instruction.

What makes the partnership work … or not? What does each side bring to the table?

Chris – Berkeley

Visualization service at Berkeley. HearstCAVE: Connected virtual spaces over the Pacific Research Platform around archaeological preservation. Thinking about how it connects with data science.

Makerspaces at UCB – pockets of excellence and experimentation. Jacobs Institute for Design Innovation. Talking with library and ETS to look at space.

Hooking the two together in a Center for Connected Learning.

Research Data Management at Yale
Much Ado about Something: Complex funder requirements; reliable verification of results; reuse of data in new research.

What are the responsibilities and rights of the University and faculty regarding research data? They put out a Yale Research Data & Materials Policy. Developed over 2-2 years with collaboration across the university. There is significant collaboration in support of that policy – Library and IT collaboration: Research Data Strategic Initiative Group, Research Data Consultation Group, Yale Center for Research Computing.

Recommendation: Research Data Service Unit; reports within Library – Assessment, coordination, outreach and communication. Federated support model for all research data support services – research technology, data management, metadata, outreach & communications, customer relations, education and training, research data administrative analytics.

NYU – David Millman

Bottom-up requirements – survey local researchers: IT/Lib complementary styles, contacts. Survey peers: IT/Lib coordinated.

Executive review: Dean, AVP-level

NYU – research repository service identification. Umbrella of services across the research lifecycle: creation, manipulation, publication, etc. Holistic, customer focus. 1 – HPC storage; 2 – “medium” performance storage (CIFS, NFS); 3 – “published” storage – preserved, curated, citable.

IT/Library crossover strategy questions: business of universities: long-term preservation of scholarship. Any updates on our participation in digital preservation facilities? Some of our colleagues have recommended highly distributed protocols for better preservation. How do we approach this?




CNI Fall 2015 Day 1

I’m at the fall meeting for the Coalition for Networked Information. For those who don’t know, CNI is a joint initiative of Educause and the Association of Research Libraries and was founded in 1990 to promote the use of digital information technology to advance scholarship and education. I was involved in the early days of CNI and I’m happy to have recently been appointed as a representative of Educause on the CNI Steering Committee.

Cliff Lynch is CNI’s Executive Director, and one of the highlights of the member meetings is his plenary address, where he generally surveys the landscape of digital information and pulls together interesting, intriguing, and sometimes troubling themes that he thinks are worth watching and working on.

In today’s plenary Cliff talked about the evolving landscape of federal mandates for public access to federally funded research results. It is only in 2016 that we will see the actual implementation of the plans the various federal agencies put forward to implement the directive that the Office of Science and Technology Policy put out in 2013. Cliff noted that the implementations of the multiple federal funding agencies are not coordinated, and that some of them are not in sync with existing practices at institutions, and there will be a lot of confusion.

Cliff also had some very interesting observations on the current set of issues surrounding security and privacy. He cited the recent IETF work on pervasive surveillance threat models, noting that if you can watch enough aggregate traffic patterns going to and from network locations you can infer a lot, even if you can’t see into the contents of encrypted traffic.  And with the possible emergence of quantum computing that may be able to break current encryption technologies, security and privacy become much more difficult. Looking at the recent string of data breaches at Sony, the Office of Personnel Management, and several universities, you have to start asking whether we are capable of keeping things secure over time.

He then moved on to discussing privacy issues, noting that all sorts of data is being collected on people’s activities in ways that can be creepy – e-texts that tattle on you, e-companions for children or the elderly that broadcast information. CNI held a workshop in the spring on this topic, and the general consensus was that people should be able to have a reasonable expectation of privacy in their online activities, and they should be informed about use of their data. It’s generally clear that we’re doing a horrible job at this. NISO just issued work on distilling some principles. On our campuses people have different impressions of what’s happening in authorization handoffs between institutions and publishers – it’s confused enough that CNI will be fostering some work to gather some facts about this.

The greatest area of innovation right now that Cliff sees is where technology gets combined with other things (the internet of things) – like drones, autonomous vehicles, machine learning, robotics, etc.  But there isn’t a lot of direct technical IT innovation happening, and what we’re seeing is a degree of planned obsolescence where we’re forced to spend lots of time and effort to upgrade software or hardware in ways that don’t get us any increased functionality or productivity. If that continues to be the case we’ll need to figure out how to “slow down the hamster  wheel.”

Finally Cliff closed by talking about the complexity of preservation in a world where information is presented in ways increasingly tailored to the individual. How do we document the evolution of experiences that are mediated by changing algorithms? And this is not just a preservation problem but an accountability issue, given the pervasive use of personalized algorithms in important functions like credit ratings.




CNI Fall 2013 – Creating A Data Interchange Standard For Researchers, Research, And Research Resources: VIVO-ISF

Dean B. Krafft, Brian Lowe, Cornell University

What is VIVO?

  • Software: an open-source semantic-web-based researcher and research discovery tool
  • Data: Institution-wide, publicly-visible information about research and researchers
  • Standards: A standard ontology (VIVO data) that interconnects researchers

VIVO normalizes complex inputs, connecting scientists and scholars with and through their research and scholarship.

Why is VIVO important?

  • The only standard way to exchange information about research and researchers across diverse institutions
  • Provides authoritative data from institutional databases of record as Linked Open Data
  • Supports search, analysis, and visualization of data
  • Extensible

An HTTP request can return HTML or RDF data
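A minimal sketch of the content negotiation this implies, assuming the server picks a representation from the request's Accept header. The function name `choose_representation`, the list of supported media types, and the choice to ignore q-values are all illustrative assumptions, not a description of how VIVO itself is implemented.

```python
# Hypothetical sketch: choose HTML or RDF for a profile URL from the Accept header.
# The supported media types below are assumptions made for illustration only.
RDF_TYPES = ("application/rdf+xml", "text/turtle")
HTML_TYPE = "text/html"


def choose_representation(accept_header: str) -> str:
    """Return the first supported media type named in the Accept header.

    Quality values (q=...) are ignored for simplicity; HTML is the fallback.
    """
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type == HTML_TYPE or media_type in RDF_TYPES:
            return media_type
    return HTML_TYPE


if __name__ == "__main__":
    # A browser-style header and a linked-data client header.
    print(choose_representation("text/html,application/xhtml+xml"))
    print(choose_representation("application/rdf+xml"))
```

The point is only that one URL can serve both people (HTML) and software (RDF); a real server would also honor q-values and wildcards.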

Value for institutions and consortia

  • Common data substrate
  • Distributed curation beyond what is officially tracked
  • Data that is visible gets fixed

US Dept. of Agriculture implementing VIVO for 45,000 intramural researchers to link to Land Grant universities and international agricultural research institutions.

VIVO exploration and Analytics

  • structured data can be navigated, analyzed, and visualized within or across institutions.
  • VIVO can visualize strengths of networks
  • Create dashboards to understand impact

Providing the context for research data

  • Context is critical to find, understand, and reuse research data
  • Contexts include: narrative publications, research grant data, etc.
  • VIVO dataset registries: Australian National Data Registry, Datastar tool at Cornell

Currently hiring a full-time VIVO project director.

VIVO and the Integrated Semantic Framework

What is the ISF?

  • A semantic infrastructure to represent people based on all the products of their research and activities
  • A partnership between VIVO, eagle-i, and ShareCenter
  • A Clinical and Translational Information Exchange Project (CTSAConnect): 18 months (Feb 2012-Aug 2013), funded by NIH

People and Resources – VIVO interested primarily in people, eagle-i interested in genes, anatomy, manufacturer. Overlap in techniques, training, publications, protocols.

ISF Ontology about making relationships – connecting researchers, resources, and clinical activities. Not about classification and applying terms, but about linking things together.

Going beyond static CVs – distributed data, research and scholarship in context, context aids in disambiguation, contributor roles, outputs and outcomes beyond publications.

Linked Data Vocabularies: FOAF (Friend of a Friend) for people, organizations, groups; VCard (Contact info) (new version); BIBO (publications); SKOS (terminologies, controlled vocabularies, etc).

Open Biomedical Ontologies (OBO family): OBI (Ontology for Biomedical Investigations); ERO (eagle-i Research Resource Ontology); RO (Relation Ontology); IAO (Information Artifact Ontology – goes beyond bibliographic)

Basic Formal Ontology from OBO – Process, Role, Occurrent, Continuant, Spatial Region, Site.

Reified Relationships – Person-Position-Org, Person-Authorship-Article. RDF Subject/predicate model breaks down for some things, like trying to model different position relationships over time.  So use a triple so the relationship gets treated as an entity of its own with its own metadata. Allows aggregation over time, e.g. Position can be held over a particular time interval. Allows building of a distributed CV over time.  Allows aggregating name change data over time by applying time data to multiple VCards with time properties.
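A minimal sketch of the reified Person-Position-Org pattern described above, using plain Python objects rather than an RDF library. The class and function names, the identifiers such as "person/1", and the sample data are all hypothetical; the actual VIVO-ISF ontology defines its own classes and properties.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Position:
    """The relationship itself, treated as an entity with its own metadata."""
    person: str
    organization: str
    title: str
    start_year: int
    end_year: Optional[int] = None  # None means the position is still held


def positions_held(positions: List[Position], person: str, year: int) -> List[Position]:
    """Return the positions a person held during the given year."""
    return [
        p for p in positions
        if p.person == person
        and p.start_year <= year
        and (p.end_year is None or year <= p.end_year)
    ]


def cv_timeline(positions: List[Position], person: str) -> List[Position]:
    """Assemble a simple CV: one person's positions ordered by start year."""
    return sorted(
        (p for p in positions if p.person == person),
        key=lambda p: p.start_year,
    )


# Hypothetical data; in practice the records could come from different institutions.
EXAMPLE_POSITIONS = [
    Position("person/1", "org/a", "Postdoc", 2005, 2008),
    Position("person/1", "org/b", "Assistant Professor", 2008),
    Position("person/2", "org/a", "Librarian", 2001, 2010),
]

if __name__ == "__main__":
    for position in cv_timeline(EXAMPLE_POSITIONS, "person/1"):
        print(position.title, position.organization, position.start_year, position.end_year)
```

Because the time interval lives on the Position rather than on the person or the organization, records from several sources can be merged and still answer "who was where, when".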

Beyond publication bylines – What are people doing? Roles are important in VIVO ISF. Person-Role-Project. Roles and outputs: Person-Role-Project-document, resource, etc.

Application examples: search (beta.vivosearch.org) can pull in data from distributed software (e.g. Harvard Profiles) using VIVO ontologies.

Use cases: Find publications supported by grants; discover and reuse expensive equipment and resources; demonstrate importance of facilities services to research results; discover people with access to resources or expertise in techniques.

Humanities and Artistic Works – performances of a work, translations, collections and exhibits. Steven McCauley and Theodore Lawless at Brown.

Collaborative development – DuraSpace VIVO-ISF Working Group. Biweekly calls Wed 2 pm ET. https://wiki.duraspace.org/display/VIVO/VIVO-ISF+Ontology+Working+Group

Linked Data for Libraries

December 5, 2013 Mellon made a 2 year grant to Cornell, Harvard, and Stanford starting Jan 2014 to develop a Scholarly Resource Semantic Information Store (SRSIS) model to capture the intellectual value that librarians and other domain experts add to information resources, together with the social value evident from patterns of research.

Outcomes: Open source extensible SRSIS ontology compatible with VIVO, BIBFRAME and other ontologies for libraries.

Sloan has funded Cornell to integrate ORCID more closely with VIVO. At Cornell they’re turning MARC records into RDF triples indexed with SOLR – beta.blacklight.cornell.edu


CNI Fall 2013 – Visualizing: A New Data Support Role For Duke University Libraries

Angela Zoss – Data Visualization Coordinator, Duke Libraries

Data visualization can be typical types such as maps or tag clouds, or custom visualizations such as parallel axes plots. Helping people match their data to their needs, and what they want to get out of their data. Also help people think about cost/benefits of creating visualizations.

Why visualize?

  • Explore data, uncover hidden patterns. e.g. Anscombe’s Quartet.
  • Translate something typically invisible into the visible – makes the abstract easier to understand, increase engagement. Important to people performing research as well as reporting to others.
  • Communicate results, contextualize data, tell a story, or possibly even mobilize action around a problem. (see Hans Rosling: The River of Myths). Important to build context around data, not just think that the numbers speak for themselves.
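The Anscombe’s Quartet point can be sketched in a few lines: the four datasets share nearly identical summary statistics, which is why plotting matters. The values below are the published quartet; the helper names `pearson` and `summarize` are illustrative and not from the talk.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's Quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (
        [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
    ),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed directly from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics rounded to two decimals."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "var_x": round(variance(xs), 2),
        "r": round(pearson(xs, ys), 2),
    }


if __name__ == "__main__":
    for name, (xs, ys) in ANSCOMBE.items():
        print(name, summarize(xs, ys))
```

The summaries come out essentially the same for all four sets, yet scatter plots of them look nothing alike: a line, a curve, a line with one outlier, and a vertical stack with one outlier.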

Visualization at Duke

  • No single centralized community, but plenty of distributed groups and projects.
  • Library was already offering GIS help.
  • Who could support visualization? Faculty/department? College/school? Campus-wide organization – was the only option with wide enough reach. There were several options – Duke created a position that reports jointly to Libraries and OIT.
  • Position started in June 2012 – Dual report to Data and GIS Services in the Libraries and Research Computing in OIT.
  • Objectives: instruction and outreach; consultation; develop new visualization services, spaces, programs.

After 18 months, what has been the most successful?

  • Visualization workshop series – software (Tableau, which full-time students get free; d3, a JavaScript library), data processing (text analysis, network analysis), best practices (designing academic figures/posters, top 10 dos and don’ts for charts and graphs). The barrier is understanding data transformations to get data into software.
  • Online instructional material
  • Just-in-time consulting – crucial to people getting started.
  • Ongoing visualization seminar series – this had been happening since 2002. Helped introduce the community.
  • Student data visualization contest

d3 monthly study group – Using GitHub to share sample code. Using Gist and blocks.org to see the visualization right away. e.g. http://bl.ocks.org/dukevis/6768900/.

Top 10 Dos and Don’ts for Charts and Graphs:

  • Simplify less important information
  • Don’t use 3D effects.
  • Don’t use rainbows for ordered, numerical variables. Use single hue, varying luminance.
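As a rough illustration of the “single hue, varying luminance” advice, here is a sketch that builds a sequential palette with Python’s standard colorsys module. The function name, the default saturation, and the lightness range are arbitrary assumptions; HLS lightness is also only an approximation of perceived luminance, so a production palette would more likely come from a tool designed for that purpose.

```python
import colorsys
from typing import List


def sequential_palette(hue: float, steps: int, saturation: float = 0.6) -> List[str]:
    """Return hex colors of one hue, running from light to dark.

    hue and saturation are in the 0-1 range used by colorsys.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    colors = []
    for i in range(steps):
        # Lightness falls linearly from 0.9 (light) to 0.2 (dark).
        lightness = 0.9 - 0.7 * i / (steps - 1)
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return colors


if __name__ == "__main__":
    # Five shades of a blue-ish hue for an ordered numerical variable.
    for color in sequential_palette(hue=0.6, steps=5):
        print(color)
```

Mapping low values to the light end and high values to the dark end gives readers an ordering they can see without consulting a legend, which a rainbow scale does not.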

Just in time consulting

  • Weekly walk-in consulting hours in the Data & GIS Services computer lab
  • Additional appointments outside of walk-in hours
  • Detailed support and troubleshooting via email

Weekly visualization seminars – Lunch provided, speakers from across campus and outside. Regularity helps. Live streaming and archived video. http://vis.duke.edu/FridayForum

Student data visualization contest

  • Goal: to advertise new services, take a survey of visualization at Duke – helped build relationships across the campus.
  • Open to Duke students, any type of visualization
  • Judged on insightfulness, narrative, aesthetics, technical merit, novelty
  • Awarded three finalists and two winners. Created posters of the winners to display in the lab, and run them on the monitor wall.

After 18 months, what are the challenges?

  • Marketing and outreach – easy to get overwhelmed by the people already using services at the expense of reaching new communities.
  • Staying current – every week there’s a new tool.
  • Project work, priorities – important to continue work as a visualizer on projects.
  • Disciplinary silos and conventions
  • Curriculum and skill gaps – there aren’t people teaching visualization at Duke as a separate topic. Common skill gaps: visualization types and tools; spreadsheet and/or database familiarity; scripting; robust data management practices; basic graphic design

Hopes for the future

  • Active student training program (courses, independent studies, student employment)
  • Additional physical and digital exhibit opportunities
  • Continued project and workshop development

What should a coordinator know?

  • Data transformations
  • Range of visualization types, tools
  • Range of teaching strategies
  • Marketing

What should a coordinator do?

  • Find access points to different communities
  • Use events to build community
  • Collaborate on research projects
  • Stockpile interesting datasets
  • Beware of unmanaged screens
  • Block out plenty of quiet time for the above

How should an organization establish a new visualization support program?

  • Identify potential early adopters
  • Budget for a few events, materials, etc
  • Involve other service points
  • Provide a support system for the coordinator
  • Expect high demand

Working primarily with staff and grad students, this quarter a lot of undergrads due to a few courses.

Angela’s background is in communication for the most part. There’s an IEEE visualization conference.

CNI Fall 2013 – Opening Plenary – Cliff Lynch

I’m in DC for the Fall membership meeting of the Coalition for Networked Information, which is always a great place to pick up on the latest goings-on at the intersection of libraries and digital information and technology. As usual, the meeting kicks off with Cliff Lynch, the Executive Director of CNI, giving a summary of the current state of the art.

Some things Cliff won’t talk about:

  • The work Joan Lippincott has been leading on digital scholarship centers. Gets at some of the vehicles for forging and sustaining collaborations within research institutions and stewardship of materials that come out of that. There is a session tomorrow about that.
  • Output of executive roundtable on the acquisition, collection, and curation of e-books at scale by University libraries, as well as interaction between online textbooks and ebooks in research libraries. A summary session on that tomorrow. A general challenge question in this area: Are there examples from the recent landscape of books being published in electronic format only (or perhaps with print on demand) that contain high impact content? Are we starting to see the market emerge where you HAVE to deal with electronic material because it’s not coming out in print, to get coverage of recent events? Thinking mostly about books outside of extremely narrow scholarly domains. Spring executive roundtable will look at software as a network-based service.
  • MOOCs. A year ago you couldn’t convene a group of five academics and get them not to talk about MOOCs. While discussion continues, it’s at a much lower temperature. There is some interesting preliminary research looking at the characteristics of the folks who seem to be most successful at MOOCs. Invites us to go back to much more fundamental notions of the library as the center of the university (if you connect a learner with a good library they’ll go far). That’s true of a certain type of person, but not all. The delivery of teaching and learning experiences is different than delivering a collection of knowledge. In the early enthusiasm about MOOCs there’s a tendency to see them as courses by other means. We will see MOOCs or MOOC-like things for purposes different than traditional courses, like training and other things that don’t fit well in the traditional academic definition of a course.

Things that are changing the landscape:

Hard to not give prominent place to the OSTP directive for federal funding agencies to develop plans to give access to reports and underlying data produced by funded research. There was an August deadline for submission of plans from agencies, which are not public. OSTP has been forthright that some plans are a bit more mature than others, depending on agency. We don’t have a firm date for when they will become public, but there is momentum. These developments will reshape the landscape for institutions that host the researchers as well as for the researchers themselves. Other nations and non-governmental funders are moving in the same directions.

One way to think about this is in the governmental sector, as a new set of compliance requirements. But we’ve seen the leadership in research and higher education (ARL, AAU, APLU) look at this as an opportunity and a challenge to rationalize the production of scholarly literature and data, which needs to be done. We’re seeing  a lot of changes in the obligations and practices that surround scholarly publishing – a whole range of behaviors that need to be rationalized so that researchers aren’t left scratching their heads. Seen a variety of responses to the OSTP mandate – from SHARE (ARL), CHORUS (STEM publishers), government (based on PubMed Central). All have places in the ecology. One of the attractive things about CHORUS is that it makes articles available in the context of the journals in which they appear, which institutional repositories do not. We need to think about how to take advantage of this, not view it as competition.

A little bit of redundancy is not a bad thing. When the government shut down we got an education about how deeply entwined many of the scholarly information services are as non-essential government services. Interesting to look as a case study at what was unavailable during the shutdown. At some level PubMed Central was deemed essential and stayed up, though it was not ingesting new contributions. Conversations that took place recently under the name ANADP II, in Barcelona, mostly of national libraries, looking at aligning digital preservation strategies. Can see very clear advantages to aligning strategies at a nation state level, but realize that there are some functions that each nation wants to maintain autonomy in, rather than getting into interdependent collaborations. A set of trade-offs. When does collaboration turn into interdependence?

SHARE is not just an opportunity and a challenge to straighten out publication, but it also deals with data. There was a second executive order over the summer that told federal agencies that the default thinking about data systems (modulo security and privacy concerns) is to provide public access to data. The word public is popular in governmental circles (rather than “open”). When you talk about public access (to let the public of the United States have access to data) that can be from access to raw data files, all the way to systems that help the public understand and analyze the data. There are people in government struggling to understand where on that continuum to fall, especially as there is no money associated with these initiatives.

Issues emerging in the data area: Research and higher-ed community is mobilizing to address needs through bit preservation services. Some data is constrained because of personally identifiable information. Anonymization, while a useful tool, has limited power. It is frighteningly easy to de-anonymize data. Need to think about how to handle personal data while we gain the power of recombination and reuse of research data. We are seeing a movement away from commitment to open-ended preservation of data toward the more limited language of data management plans, e.g. preservation of bits for ten years. There are a number of commercial services or consortially based services where you can prepay for ten years. General proposition is we’ll go ten years and then look at what kind of use has been made of the data and then look at alternatives. We have no process for doing that evaluation – we’ll need to involve all sorts of community discussions about value of data, which will need to be cross-institutional (we’ll need registries). It’s not too soon to start thinking about this problem.

This is an example of a broader issue of “transitions of stewardship” – somebody’s been taking care of something, but now their commitment is expiring. We need an orderly way of putting the information resource in front of the scholarly community and evaluating the need for continuing preservation and finding who will step up to it. We’re getting very good at making digital replications of 2-dimensional things like fine art, but the difference is tracking provenance. There’s lots of progress in 3-dimensional work as well (Smithsonian, e.g.). We now have an opportunity to peel off the scholarly side of artifacts for not only exhibition, but as objects of study. There are lots of institutions of cultural memory that are under severe stress – see the discussions around the collection of the Detroit Institute of Arts, which again leads to the idea of taking transitions seriously.

We need to be thinking about where we’re assigning resources. Two things that are troublesome: 1) we don’t know how well we’re doing with our digital preservation efforts. How much of the web gets covered by web archiving? We don’t have an inventory of the kinds of things that are out there or what parts are covered, or where the areas of highest risk are. There’s a tendency to go after the easy stuff – part of our strategy going forward needs to become much more systematic. We have a tendency to continually improve things already in our grasp (like continually improving layers of backup for archives), but we need to look at the tradeoff of resources for this versus focusing on what we’re not yet capturing.

Another place where we’re seeing emerging activities that need to turn into a system is in distributed factual biography. Author identity, citations, aggregation, interchange, and compilation of citations. Connected to compliance issues, with academic processes, social networking among scholars, identifying important work. There’s an enormous amount of siloed work going on. Creeping up towards a place where we have factual biographies that we can break up into smaller parts and reassemble. What degree of assurance do we need on these bits? What role does privacy play? Is the fact that you published something a secret, or should it be able to be? Noteworthiness – Wikipedia has a complicated set of criteria of deciding whether your biography is worthy of Wikipedia. This has a rich wonderful history, going back to the nineteenth century work on national biographical dictionaries. When does someone become a public figure? There’s a question about systems of annual faculty reviews – one of the most hideous examples of siloed activity imaginable. Information often collected in forms that aren’t reusable in multiple ways. These need to be tied together with things like grant management systems, bibliometric systems, etc which are all moving the same data around. Other countries where the government is involved in assessing faculty work to pass out funds are more sophisticated than is typical in the US. One of the things we need to look at hard and quick is interchange formats. There’s good work in Europe and out of the VIVO community.

Notion of coherence at scale – framed by Chuck Henry at CLIR. We’re moving past the era of building fairly little systems and federating them, but we need to be thinking at scale – how do systems depend on each other and interrelate? Look beyond academia – Wikipedia, Google, Microsoft, Internet Archive. Look at incredible accomplishments of DPLA (Digital Public Library of America). They’re being very clear about what they’re not going to do in the near future, by implication saying that someone else needs to worry about those topics. The scale of engineering we’re looking at to manage scholarship and research knowledge is crossing some fundamental thresholds and we’re going to need to do things very differently than we did in the past. Examples are all around – look at the Pentagon Papers which are now a fundamental reference source to history of that time. That was a book – the research community knew how to deal with it when it was published. What do we do with things like Wikileaks? What do we do with massive data revelations?