CNI Fall 2014 Meeting: Fedora 4 early adopters

Fedora 4 Early Adopters

David Wilcox, Fedora Product Manager, DuraSpace

Fedora 4.0 released November 27. Built by 35 Fedora community developers. Native citizen of the semantic web – linked data platform service. Hydra and Islandora integration.

Beta pilots – Art Institute of Chicago, Penn State, Stanford, UCSD.

62 members in support of Fedora, funding increased dramatically (over $500k). Effort around building sustainability – more members at lower funding amounts. Governance model – Leadership and steering groups.

Fedora 4 roadmap – short term (6 months) – 4.1 will support migrations from Fedora 3. Want to establish migration pilots, and prioritize 4.1 features.

Fedora 4.1 features – focus on migrations, but some new features – API partitioning, Web Access Control, Audit service, remote/asynch storage are candidates.

Fedora 4 training – 3 workshops held in October (DC, Australia, Colorado), more planned for 2015.

It is possible that Fedora 4 could be a back-end for VIVO.

If you want to go with Hydra at this point, you should go with Fedora 4, not 3.

Declan Fleming, UCSD

Original goals – map UCSD’s deeply-nested metadata to simpler RDF vocabularies, taking advantage of Fedora 4’s RDF functionality. Ingest UCSD DAMS4 71k objects using different storage options to compare ingest performance, functionality, and repository performance. Synchronize content to disk and/or an external triple store.

Current status – Initial mapping of metadata completed for pilot work. Ingested sample dataset using multiple storage options: ModeShape, federated filesystem, and hybrid (ModeShape objects linked to federated filesystem files). Ingested full UCSD DAMS4 dataset into Fedora 4 using ModeShape.

Ongoing work – continuing to refine metadata mapping, as part of the broader Hydra community push toward interoperability and pluggable data models. Full-scale ingest with simultaneous indexing, full-scale ingest with hybrid storage (about ready to give up on that and embrace ModeShape), performance testing.

Metadata ingest slowed down over time. They use a lot of blank nodes, which adds to the complexity of the structure and might be the reason.

File operations were very reliable. Didn’t test huge files rigorously.

Stefano Cossu – Art Institute

DAMS project goals – will take over part of current Collection Management System duties – 270k objects, 2/3 of which are digitized. Strong integration with existing systems, adoption of standards, single source for institution-wide shared data. Meant to become a central hub of knowledge.

LAKE – Linked Asset and Knowledge Ecosystem. Integrates with CITI (collection management system) which is the front-end to Fedora (LAKE) which acts as the asset store.

Why Fedora? Great integration capabilities, very adaptable, built on modern standards, focus on data preservation. Makes no assumptions about front-end interface. REST APIs. Speaks RDF natively.

Key features for the AIC – Content modeling, federation, asynchronous automation, external indexing, flexible storage.

Content modeling: adding/removing functionality via mix-ins. Can define types and sub-types. Spending lots of time building a content model. Serves as a foundation for ontology. Still debating whether JCR is the best model for building a content model. Additional content control is on their wish list.

Asynchronous Automation: Used ModeShape sequencers so far. Camel framework offers more generic functionality and flexibility. Uses: extract metadata on ingestion, create/destroy derivatives based on node events, index content.
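The event-driven uses listed above can be sketched in a few lines. This is only an illustration of the pattern, not how ModeShape sequencers or Camel routes are actually configured: the event names, handler names and the dispatch function are all hypothetical, and the handlers just return a description of the work they would do.

```python
from collections import defaultdict

# Hypothetical event names; the repository's real event vocabulary may differ.
NODE_ADDED = "NODE_ADDED"
NODE_REMOVED = "NODE_REMOVED"

# Maps an event type to the list of handlers registered for it.
handlers = defaultdict(list)


def on(event_type):
    """Decorator that registers a handler function for one event type."""
    def register(func):
        handlers[event_type].append(func)
        return func
    return register


@on(NODE_ADDED)
def extract_metadata(path):
    # Placeholder for metadata extraction on ingestion.
    return f"extract metadata for {path}"


@on(NODE_ADDED)
def create_derivative(path):
    # Placeholder for derivative creation when a node appears.
    return f"create derivative for {path}"


@on(NODE_REMOVED)
def destroy_derivative(path):
    # Placeholder for derivative cleanup when a node is removed.
    return f"destroy derivative for {path}"


def dispatch(event_type, path):
    """Run every handler registered for the event, in registration order."""
    return [handler(path) for handler in handlers[event_type]]
```

In a real deployment the dispatch step would be driven by messages from the repository rather than called directly, so the handlers run outside the ingest request.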

Filesystem federation to access external sources, custom database connector.

Indexing: multiple indexing engines – powerful search/query tools: triplestore, Solr, etc.

Tom Cramer – Stanford

Exercising Fedora as a linked data repository – introducing Triannon and Stanford’s Fedora 4 beta pilot

Use case 1: digital manuscript annotations. Used open annotation W3C working group approach to map annotation into RDF. Tens of thousands of annotations – where to store, manage, and retrieve?
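To make the annotation-to-RDF mapping concrete, here is a minimal sketch of a single comment on a manuscript expressed as JSON-LD with Open Annotation terms. It is not Stanford's actual data model: the function name, the target URI and the choice of an embedded text body are assumptions for illustration. The `oa` and `cnt` prefixes are the Open Annotation and W3C Content in RDF namespaces.

```python
import json


def make_annotation(target_uri, comment_text):
    """Build one Open Annotation comment as a JSON-LD dictionary.

    Hypothetical helper: the shape of the body and target is an assumption.
    """
    return {
        "@context": {
            "oa": "http://www.w3.org/ns/oa#",
            "cnt": "http://www.w3.org/2011/content#",
        },
        "@type": "oa:Annotation",
        "oa:motivatedBy": {"@id": "oa:commenting"},
        # The body has no @id, so it becomes a blank node in the RDF graph.
        "oa:hasBody": {
            "@type": "cnt:ContentAsText",
            "cnt:chars": comment_text,
        },
        # The target is the thing being annotated, referenced by URI.
        "oa:hasTarget": {"@id": target_uri},
    }


if __name__ == "__main__":
    annotation = make_annotation(
        "http://example.org/manuscripts/1/page/3",  # hypothetical target
        "Marginal note in a later hand.",
    )
    print(json.dumps(annotation, indent=2))
```

With tens of thousands of such small graphs, the store has to handle many tiny RDF resources rather than a few large files, which is the scale question the pilot is asking.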

Use case 2: Linked data for libraries. Bibliographic data, person data, circulation and curation data. Build a virtual collection without enriching the core record using linked data to index and visualize.

Need an RDF store, need to persist, manage, and index. Not the ILS nor core repository – this is a fluid space while the repository is stable and reliable. All RDF / linked data.

Fedora was a good fit: Native RDF store, manage assets (bitstreams), built in service framework (versioning, indexing, APIs), easy to deploy.

Linked Data Platform (LDP): W3C draft spec, enables read-write operations of linked data via HTTP, developed at the same time as Fedora 4, Fedora 4 one of a handful of current LDP implementations.
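As a rough illustration of what "read-write linked data via HTTP" means in LDP, a new resource is created by POSTing an RDF body to a container, optionally with a Slug header as a naming hint. The sketch below only builds the request and does not send it. The container URL, the slug and the helper name are assumptions, not details from the talk.

```python
from urllib.request import Request


def build_create_request(container_url, turtle, slug=None):
    """Build (but do not send) an LDP-style POST that creates a resource.

    Hypothetical helper: the URL and slug are supplied by the caller.
    """
    headers = {"Content-Type": "text/turtle"}
    if slug:
        # Slug is only a hint; the server decides the final resource URI.
        headers["Slug"] = slug
    return Request(
        container_url,
        data=turtle.encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # "<>" is a relative IRI that resolves to the resource being created.
    body = '<> <http://purl.org/dc/elements/1.1/title> "First annotation" .'
    request = build_create_request(
        "http://localhost:8080/rest/",  # assumed local repository endpoint
        body,
        slug="annotation-1",
    )
    print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; a server that follows LDP would answer with the URI of the new resource in the Location header.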

Stanford pilot: install, configure & deploy Fedora 4; exercise LDP API for storing annotations and associated text/binary objects; develop support for RDF references to external objects; test scale with millions of small objects; integrate with read/write apps and operations – annotation tools (e.g. Annotator), indexing and visualization (Solr and Blacklight)

Current: Annotator (Mirador) <- json-ld -> Triannon (Rails engine for open annotations stored in Fedora 4) <-> LDP – Fedora 4.

Future: Blacklight and Solr.

Learned to date: Fedora 4 approaching 100% LDP 1.0 compliance, Triannon at alpha stage (can write, read & delete open annotations to/from Fedora 4). Still to come: updates to annotations, storage of binary blobs in Fedora, implement authn/z, deploy against real annotation clients, populate with data at scale.

Looking at Fedora 4 as a general store for enriching digital objects and records through annotating, curating, tagging.