CNI Fall 2014 Meeting – VIVO Evolution

Evolution of VIVO Software

Layne Johnson, VIVO Project Director, DuraSpace

VIVO History – started at Cornell in 2003. 2009-12 NIH funded VIVO ($12 million) to evolve.

Problems – Researchers struggle to identify collaborators; most information and data are highly distributed, difficult to access, reuse, and share, and not standardized for interoperability.

VIVO can facilitate collaborations and bring together disparate information, stored using the VIVO-ISF ontology.

What is VIVO? An open source semantic web application that enables management and discovery of research and scholarship across disciplines and institutions.

VIVO harvests data from authoritative sources thus reducing manual input and providing integrated data sources. Internal data from ERPs, external data from bibliographic sources, ejournals, patents, etc.

VIVO data stored as RDF.

Triple stores and linked open data: provide the ability to infer and reason; are machine readable; link into the open data cloud; provide links into a wide variety of information sources from different interoperable ontologies; allow knowledge about research and researchers to be discovered.
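To make the inference point concrete, here is a minimal sketch of one kind of reasoning a triple store can do: deriving extra type statements from subclass links. It uses plain Python tuples rather than a real triple store, and every `ex:` identifier is an invented example, not a term from the VIVO-ISF ontology.

```python
# Triples are (subject, predicate, object) tuples; all ex: names are illustrative.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer_types(triples):
    """Return the triples plus rdf:type statements implied by rdfs:subClassOf."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new statement can be derived
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        for s, p, o in list(inferred):
            if p != TYPE:
                continue
            for child, parent in subclass_pairs:
                new = (s, TYPE, parent)
                if o == child and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


data = {
    ("ex:FacultyMember", SUBCLASS, "foaf:Person"),
    ("ex:Professor", SUBCLASS, "ex:FacultyMember"),
    ("ex:jane", TYPE, "ex:Professor"),
}
closure = infer_types(data)
```

Only "ex:jane is a Professor" was asserted, but a search for every `foaf:Person` over `closure` would now find her too.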

VIVO supports search & exploration – by individual, type, relationship, combinations and facets.

One of the larger implementations is USDA VIVO. Another interesting one is Find an Expert at the University of Melbourne. Scholars@Duke. Mountain West Research Consortium has a cross consortium search. The Deep Carbon Observatory data portal uses VIVO.

Installed base of VIVO implementations has remained somewhat level.

VIVO Evolution: from grant-funding to open source. In 2012-13 VIVO partnered with DuraSpace, who provide infrastructure and leadership – legal, tax, marketing communication, leadership. Sustained through a community membership model. VIVO project director hired May 1.

Charter process – Jonathan Markow & Steering group. Based on DuraSpace model for consistency across products. Charter finalized in late July, 2014.

VIVO Governance: Leadership group, steering group, management team. Four working groups: Development & Implementation; Applications & Tools; VIVO-ISF Ontology; Community Engagement & Outreach (undergoing reconstitution).

Four levels of membership – $2.5k, $5k, $10k, $20k.

VIVO strategic planning: 14 member strategy group created from leadership, steering, management teams and external members. Met December 1 & 2. Did a survey to determine current state of 41 VIVO leaders. Got 20 respondents. VIVO’s 3 strategic themes: community, sustainability, technology. 5 top goals for each theme selected; each strategy group member got to vote for 3 goals per theme.

Community: increase productivity; develop more transparent governance; increase engaged contributors; maintain a current and dynamic web presence; develop goals for partnerships (ORCID, CRIS, CASRAI, W3C, SciENcv, CRediT, etc.)

Sustainability: create welcoming community; develop clear value proposition; increase adoption; promote the value of membership.

Technology: Develop democratic code processes; clarify core architecture and processes; develop VIVO search; improve/increase core modularity; team-based development processes.


CNI Fall 2013 – Creating A Data Interchange Standard For Researchers, Research, And Research Resources: VIVO-ISF

Dean B. Krafft, Brian Lowe, Cornell University

What is VIVO?

  • Software: an open-source semantic-web-based researcher and research discovery tool
  • Data: Institution-wide, publicly-visible information about research and researchers
  • Standards: A standard ontology (VIVO data) that interconnects researchers

VIVO normalizes complex inputs, connecting scientists and scholars with and through their research and scholarship.

Why is VIVO important?

  • The only standard way to exchange information about research and researchers across diverse institutions
  • Provides authoritative data from institutional databases of record as Linked Open Data
  • Supports search, analysis, and visualization of data
  • Extensible

An HTTP request for the same resource can return either HTML or RDF data.
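A rough sketch of how a client might ask for one representation or the other, assuming the server chooses based on the standard HTTP `Accept` header. The profile URI is hypothetical, and which media types a given VIVO installation honors is an assumption here, not something stated in the talk. The code only builds the requests; it does not send them.

```python
from urllib.request import Request


def build_profile_request(uri, want_rdf=False):
    """Build a GET request asking for RDF/XML or HTML via the Accept header."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept})


# Hypothetical profile URI, for illustration only.
PROFILE_URI = "http://vivo.example.edu/individual/n1234"

html_request = build_profile_request(PROFILE_URI)
rdf_request = build_profile_request(PROFILE_URI, want_rdf=True)
```

Passing either request to `urllib.request.urlopen` would perform the fetch; a browser effectively sends the first form, while a harvester or another VIVO would send the second.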

Value for institutions and consortia

  • Common data substrate
  • Distributed curation beyond what is officially tracked
  • Data that is visible gets fixed

US Dept. of Agriculture is implementing VIVO for 45,000 intramural researchers to link to Land Grant universities and international agricultural research institutions.

VIVO exploration and Analytics

  • structured data can be navigated, analyzed, and visualized within or across institutions.
  • VIVO can visualize strengths of networks
  • Create dashboards to understand impact

Providing the context for research data

  • Context is critical to find, understand, and reuse research data
  • Contexts include: narrative publications, research grant data, etc.
  • VIVO dataset registries: Australian National Data Registry, Datastar tool at Cornell

Currently hiring a full-time VIVO project director.

VIVO and the Integrated Semantic Framework

What is the ISF?

  • A semantic infrastructure to represent people based on all the products of their research and activities
  • A partnership between VIVO, eagle-i, and ShareCenter
  • A Clinical and Translational Information Exchange Project (CTSAConnect): 18 months (Feb 2012-Aug 2013), funded by NIH

People and Resources – VIVO interested primarily in people, eagle-i interested in genes, anatomy, manufacturer. Overlap in techniques, training, publications, protocols.

ISF Ontology about making relationships – connecting researchers, resources, and clinical activities. Not about classification and applying terms, but about linking things together.

Going beyond static CVs – distributed data, research and scholarship in context, context aids in disambiguation, contributor roles, outputs and outcomes beyond publications.

Linked Data Vocabularies: FOAF (Friend of a Friend) for people, organizations, groups; VCard (Contact info) (new version); BIBO (publications); SKOS (terminologies, controlled vocabularies, etc).

Open Biomedical Ontologies (OBO family): OBI (Ontology for Biomedical Investigations); ERO (eagle-i Research Resource Ontology); RO (Relation Ontology); IAO (Information Artifact Ontology – goes beyond bibliographic)

Basic Formal Ontology from OBO – Process, Role, Occurrent, Continuant, Spatial Region, Site.

Reified Relationships – Person-Position-Org, Person-Authorship-Article. A single RDF subject-predicate-object statement breaks down for some things, like trying to model different position relationships over time. So the relationship is reified: it gets treated as an entity of its own with its own metadata. Allows aggregation over time, e.g. a Position can be held over a particular time interval. Allows building of a distributed CV over time. Allows aggregating name change data over time by applying time data to multiple VCards with time properties.
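The Person-Position-Org pattern can be sketched with plain tuples. Each position is its own node carrying the person, the organization, and a time interval, which a direct person-to-organization statement could not hold. The `ex:` property names below are made up for illustration and are not the actual VIVO-ISF terms.

```python
def positions_held(triples, person, year):
    """Return the organizations where `person` held a position during `year`."""
    # Collect each node's properties into one dict.
    nodes = {}
    for s, p, o in triples:
        nodes.setdefault(s, {})[p] = o

    orgs = []
    for props in nodes.values():
        if props.get("ex:heldBy") != person:
            continue
        start = props.get("ex:startYear")
        end = props.get("ex:endYear")  # a missing end year means still held
        org = props.get("ex:inOrganization")
        if org is None or start is None:
            continue
        if start <= year and (end is None or year <= end):
            orgs.append(org)
    return sorted(orgs)


triples = [
    ("ex:position1", "ex:heldBy", "ex:jane"),
    ("ex:position1", "ex:inOrganization", "ex:UniversityA"),
    ("ex:position1", "ex:startYear", 2005),
    ("ex:position1", "ex:endYear", 2010),
    ("ex:position2", "ex:heldBy", "ex:jane"),
    ("ex:position2", "ex:inOrganization", "ex:UniversityB"),
    ("ex:position2", "ex:startYear", 2010),
]
```

Because the two position nodes can live in different institutions' data, merging the triples is what yields the "distributed CV over time".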

Beyond publication bylines – What are people doing? Roles are important in VIVO ISF. Person-Role-Project. Roles and outputs: Person-Role-Project-document, resource, etc.
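A small sketch of the Person-Role-Project-document chain, again with invented `ex:` names rather than real ontology terms. Walking from an output back through the project to the roles answers "who did what" for outputs that carry no byline, such as a dataset.

```python
def contributors(triples, document):
    """Return sorted (person, role type) pairs linked to `document` via a project."""
    projects = {s for s, p, o in triples if p == "ex:produced" and o == document}
    found = []
    for role, p, project in triples:
        if p != "ex:roleIn" or project not in projects:
            continue
        people = [s for s, q, o in triples if q == "ex:hasRole" and o == role]
        kinds = [o for s, q, o in triples if s == role and q == "ex:roleType"]
        found.extend((person, kind) for person in people for kind in kinds)
    return sorted(found)


triples = [
    ("ex:jane", "ex:hasRole", "ex:role1"),
    ("ex:role1", "ex:roleType", "PrincipalInvestigator"),
    ("ex:role1", "ex:roleIn", "ex:projectX"),
    ("ex:raj", "ex:hasRole", "ex:role2"),
    ("ex:role2", "ex:roleType", "DataCurator"),
    ("ex:role2", "ex:roleIn", "ex:projectX"),
    ("ex:projectX", "ex:produced", "ex:dataset7"),
]
```

Calling `contributors(triples, "ex:dataset7")` lists both people with their distinct roles, information a flat author list would lose.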

Application examples: search (can pull in data from distributed software, e.g. Harvard Profiles, using VIVO ontologies).

Use cases: Find publications supported by grants; discover and reuse expensive equipment and resources; demonstrate importance of facilities services to research results; discover people with access to resources or expertise in techniques.

Humanities and Artistic Works – performances of a work, translations, collections and exhibits. Steven McCauley and Theodore Lawless at Brown.

Collaborative development – DuraSpace VIVO-ISF Working Group. Biweekly calls Wed 2 pm ET.

Linked Data for Libraries

December 5, 2013 Mellon made a 2 year grant to Cornell, Harvard, and Stanford starting Jan 2014 to develop Scholarly Resource Semantic Information Store model to capture the intellectual value that librarians and other domain experts add to information resources, together with the social value evident from patterns of research.

Outcomes: Open source extensible SRSIS ontology compatible with VIVO, BIBFRAME and other ontologies for libraries.

Sloan has funded Cornell to integrate ORCID more closely with VIVO. At Cornell they’re turning MARC records into RDF triples indexed with SOLR –