CSG Winter 2018 – Research and Teaching & Learning IT: Partnering with the Library

This morning’s workshop on partnering between IT and Libraries features Jenn Stringer/Chris Hoffman (Berkeley), Jennifer Sparrow/Joe Salem (Penn State), Diane Butler (Rice), Cliff Lynch (CNI), Louis King (Yale), David Millman (NYU)

The morning is starting off with some thoughts from Cliff Lynch (CNI):

Reminders of some things many haven’t lived through: In the early 90s there was a call not only for collaboration between IT and Libraries, but serious talk of merging. It was tried at a few institutions, like Columbia University. The takeaway was that it’s fairly crazy at large institutions. The mission expansion of each has been in differing rather than overlapping areas. But it’s been successful at a number of liberal arts organizations.

When CNI was founded it was entirely viewed as a collaboration between the CIO and the head of the library at member institutions. In the early 2000s that makeup was changing: the representation became the head of the library plus someone doing research or academic computing, or doing digital work in the libraries, which led to increasing disengagement of the CIOs. Starting around 2000, CNI began putting on executive roundtables with the intent of re-engaging the CIOs. It was fairly easy in the first few years to come up with topics in that sweet spot, but it got harder. If you look back at 1990–2005, libraries had low levels of technical expertise; since then libraries have developed internal expertise in technologies important for digital humanities, data curation, etc., where there is now more competence than in the central IT org, which has structured its mission around infrastructure, compliance, etc. Libraries continue to rely on IT for fundamental infrastructure.

If you look at the landscape, how much IT capability is native to the library, and how much replicates or complements the expertise in central IT? This is hugely inconsistent. If you polled the CSG campuses you’d be surprised at the degree of variation in organic IT expertise in the library.

Collaborations involving the library have become much more multilateral rather than bilateral with IT – involving partners like university presses, museums, research data management, digital scholarship centers (often involving an academic school or department), geospatial centers, and maker spaces.

Don’t forget collaboration on institutional policies. Data governance, privacy and reuse of student data and analytics, responsibility of university to preserving scholarly products. Had a recent roundtable looking at policy implications of adoption of widespread cloud platforms.

This area does not lend itself to checklists.

UC Berkeley – Chris Hoffman

A history of good intentions – Museum Informatics Project – Housed in Library, Digital collections and DAMS. Complicating factors: Sustainability, budget cuts, grant funding; priorities; loss of key champions; culture.

CollectionSpace – managing collections for museums.

Research Data Management – an impetus for change. New drivers (DMP requirements), new change leaders, new models for partnership. Benchmarking justified need. Broad definition of research data – all digital parts of a research project. Priority to nurture collaboration between IT and Library. Co-funded a position for program manager. Campus-wide perspective, investing in understanding and bridging cultures.

What’s next? More challenging tests to partnerships, RDM 2.0, Visualization and makerspaces, more fundamental technologies? (archival storage, virtual teaching and research environments); strategic alignment?

NYU – Stratos Efstathiadis, David Millman, David Ackerman

Research technology works closely with Libraries.

Data Services – estab. 2008. 11 FTE. Consultation and instructional support for scholars using quantitative, qualitative, survey design, and geospatial software and methods. A joint service of IT and Libraries.

Digital Library Technology Services – estab ca. 2000. Digital content publication and preservation. New services to support current scholarly communication. R&D to develop new services and partnerships, 19 FTE.

Research Data Management Services – estab. 2015. 2 FTE. Promulgate best practices in data organization, curation, description, publication, compliance, preservation planning, and sharing.

Research Cloud Services – a new collaboration built on other preexisting services. Interconnected research storage environment. Reimagine a spectrum of cloud storage from dynamic working data to published final products. Provides a backbone for researchers, but also for Libraries collections and workflows.

Yale – Louis King

Considerable history at Yale in working in digital transformation space.

Office of Digital Assets and Infrastructure – Sept 2008. Work closely with Library and ITS. Focus on Digital Assets & Infrastructure. Take advantage of disciplinary approach of libraries and technical capacity of IT.

Looking for ways to gain efficiencies and lower overhead for people who want to manage digital content.

Had some substantial initial success, but then changes: the initial provost sponsor left Yale, the 2009 financial crash, the VP retired, two library director transitions, a transition in the IT director, and emerging digital systems in the Library.

Late 2012 relaunch as the Yale Digital Collections Center, which closed in 2015. But it catalyzed momentum towards digital transformation at Yale and established the foundation for many successful current and future collaborations.

Rice University – Diane Butler

Library and IT have been partners for a long time. For a very short time, the organizations were merged. Research IT and the library have been formal partners since 2012, and informal ones even further back. It began with the library providing the service and IT providing the core infrastructure, but has morphed into a collaborative partnership.

Areas of collaboration: Data Management (through the Library). Provide consultation, including creating DMPs, describing and organizing data, storing data, and sharing data. Training, and access to resources such as a platform for sharing and preserving publications and small-to-medium datasets. Still an area for work, as faculty aren’t very engaged.

Digital Scholarship: Service provided by library with IT providing infrastructure. Preserving scholarship, navigating copyright and open access, managing and visualizing data, digitizing materials, consultation, etc.  Research IT has history in supporting engineering and sciences, but not so much in humanities.

Digital Humanities: the Imagine Rio Project – the most successful collaborative project to date. An architecture professor and a history professor joined together to imagine Rio de Janeiro: a searchable atlas of the social and urban evolution of Rio.

Positive outcomes: Research IT had not supported the humanities or qualitative social sciences previously. The success of the project has brought in more funding. Research IT now has 2 facilitators working with faculty in those disciplines.

At Rice the board has come up with some base funding for research computing, so that all of the work doesn’t have to be funded by grants.

Penn State – Joe Salem and Jennifer Sparrow

Strong history of working with libraries, IT, and student services on accessibility issues. Thinking about spaces in place and how to leverage institutional spaces. Built a “blue box” classroom.

Worked on the Dreamery – a co-learning space for bringing emerging technologies onto campus.

Driving strategic initiatives: Collaborative, technology-infused space. Inherited a space called the Knowledge Commons. Includes a corner with staffing from both Libraries and Academic Tech. Service partnership profile has grown from just a focus on media, to overall platform for supporting students. Work on curricular support together – open educational resources and portable content. Instructional design is a focus.

Learning Spaces committee – Provide leadership in innovative instruction.

What makes the partnership work … or not? What does each side bring to the table?

Chris – Berkeley

Visualization service at Berkeley. HearstCAVE: connected virtual spaces over the Pacific Research Platform, focused on archaeological preservation. Thinking about how it connects with data science.

Makerspaces at UCB – pockets of excellence and experimentation. Jacobs Institute for Design Innovation. Talking with the library and ETS to look at space.

Hooking the two together in a Center for Connected Learning.

Research Data Management at Yale
Much Ado about Something: complex funder requirements; reliable verification of results; reuse of data in new research.

What are the responsibilities and rights of the University and faculty regarding research data? They put out a Yale Research Data & Materials Policy, developed over the past couple of years with collaboration across the university. There is significant collaboration in support of that policy – Library and IT collaboration: the Research Data Strategic Initiative Group, the Research Data Consultation Group, and the Yale Center for Research Computing.

Recommendation: a Research Data Service Unit, reporting within the Library – assessment, coordination, outreach and communication. A federated support model for all research data support services – research technology, data management, metadata, outreach & communications, customer relations, education and training, research data administrative analytics.

NYU – David Millman

Bottom-up requirements – surveying local researchers: IT/Lib complementary styles, contacts. Surveying peers: IT/Lib coordinated.

Executive review: Dean, AVP-level

NYU – research repository service identification. An umbrella of services across the research lifecycle: creation, manipulation, publication, etc. Holistic – customer focus. 1. HPC storage; 2. “medium” performance storage (CIFS, NFS); 3. “published” storage – preserved, curated, citable.

IT/Library crossover strategy questions: the business of universities includes long-term preservation of scholarship. Any updates on our participation in digital preservation facilities? Some of our colleagues have recommended highly distributed protocols for better preservation. How do we approach this?


CNI meeting Fall 2014: SHARE update

SHARE update
Tyler Walters, SHARE director and dean of libraries at Virginia Tech
Eric Celeste, SHARE technical director
Jeff Spies, Co-founder/CTO at Center for Open Science

SHARE is a higher education initiative to maximize research impact. (huh?)

Sponsored by ARL, AAU, and APLU.

Knowing what’s going on and keeping informed of what’s going on.

Four working groups addressing key tasks: repository, workflow, technical, communication

Received $1 million from IMLS and Sloan to generate a notification service.

SHARE is a response to the OSTP memo, but roots before that.

Infrastructure: Repositories, research network platforms, CRIS systems, standards and protocols, identifiers

Workflow – multiple silos = administrative burden

Policy – public access, open access, copyright, data management and sharing, internal policies.

Institutional context: US federal agencies join a growing trend to require public access to funded research; measurable proliferation of institutional and disciplinary repositories; a premium on impact and visibility in higher ed.

Research context – Scholarly outcomes are contextualized by materials generated in the process and aftermath of scholarly inquiry. The research process generates materials covering methods employed, evidence used, and formative discussion.

Research libraries: collaboration among institutions is going up; a shift from collections as products to collections as components of the academy’s knowledge resources; the library is supporting, and embedded within, the process of scholarship.

Notification Service: Knowing who is producing what, and under whose auspices, is critical to a wide range of stakeholders – funders, sponsored research offices, etc.

Researchers produce articles, preprints, presentations, datasets, and also administrative output like grant reports and data management plans. Research release events. Meant to be public.

Consumers of research release events: repositories, sponsored research offices, funders, the public. There is interest in process as well as product. Today each entity must make arrangements with every other entity to learn what’s going on. The notification service shares metadata about research release events.
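As a rough sketch of what such shared metadata could look like – the field names below are invented for illustration, not SHARE’s actual schema (which was still under community review at the time of these notes) – a research release event might be a small record like:

```python
# Hypothetical "research release event" record. All field names are
# assumptions made for this sketch, not SHARE's real metadata schema.
def make_release_event(title, contributors, provider, event_type, date):
    """Assemble a minimal event record a provider could push or expose."""
    return {
        "title": title,
        "contributors": contributors,  # ideally with ORCID identifiers
        "provider": provider,          # e.g. a repository or aggregator
        "eventType": event_type,       # e.g. "preprint", "dataset"
        "eventDate": date,             # ISO 8601 date string
    }

event = make_release_event(
    "Example preprint", ["Jane Scholar"], "arxiv", "preprint", "2014-12-08")
```

The point of the record is only that it is small, public, and machine-readable, so that repositories, sponsored research offices, and funders can all subscribe to the same stream instead of making pairwise arrangements.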

The Center for Open Science has partnered with SHARE to implement the notification service. http://bit.ly/sharegithub/

Looking for feedback on proposed metadata schema, though the system is schema agnostic.

API – a push API and content harvesters (pulling data in from various sources). They now have 24 providers and are adding more. 16 use OAI-PMH while 8 use non-standard metadata formats.
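OAI-PMH itself is a simple HTTP protocol, which is why it covers most providers. A minimal sketch of building a ListRecords harvest request (the endpoint below is a placeholder, not one of the actual SHARE providers):

```python
from urllib.parse import urlencode

def oai_listrecords_url(base_url, metadata_prefix="oai_dc",
                        resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    OAI-PMH paginates with resumption tokens: when a token is supplied,
    the protocol requires that verb and token be the only arguments.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

# Placeholder endpoint for illustration only:
url = oai_listrecords_url("https://repository.example.edu/oai")
```

A harvester then fetches each URL, parses the Dublin Core XML records, and follows resumption tokens until the list is exhausted.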

Harvested data gets put into the Open Science Framework, which pushes out RSS/Atom, PubSubHubbub, etc. It sits on top of Elasticsearch; you can add a Lucene-format full-text search to a data request.
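Since the harvested events sit in Elasticsearch, a Lucene-syntax query can be wrapped in a standard `query_string` clause. A small sketch of building such a request body (the field names in the example query are assumptions, not SHARE’s actual index layout):

```python
import json

def lucene_search_body(query, size=10):
    """Wrap a Lucene query string in an Elasticsearch query_string clause."""
    return {
        "query": {"query_string": {"query": query}},
        "size": size,
    }

# Hypothetical field names ("title", "provider") for illustration:
body = lucene_search_body('title:"data management" AND provider:arxiv')
payload = json.dumps(body)  # this is what would be POSTed to a search endpoint
```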

250k research release events so far; arXiv and CrossRef are the largest providers. Averaging about 900 events per day, rising to 2-3k per day in the last few days as new providers are added.

Developed push protocol for providers to push data rather than waiting for pull.

Public release: Early 2015 beta release, fall 2015 first full release.

Some early lessons: metadata rights issues – some sites are not sure about their right to, for example, share abstracts. Is there an explicit license for metadata (e.g. CC Zero)?

Inclusion of identifiers – some key identifiers need to be available in order to create effective notifications. Most sources do not even collect email addresses of authors, much less ORCID or ISNI identifiers. Most sources make no effort to collect funding information or grant award numbers. Guidelines? See https://www.coar-repositories.org

Consistency across providers – reduce errors, simplify preparing for new providers. Required for push reporting.

Next layer: Reconciliation service – takes output of notification service to create enhanced and interrelated data set.

Share Discovery – searchable and friendly.

Phase 2 benefits – researchers can keep everyone informed by keeping anyone informed; institutions can assemble a more comprehensive record of impact; open access advocates can hold publishers accountable for promises; other systems can count on the consistency of metadata from SHARE.

Relation to CHORUS – when items get into CHORUS that is a research release event, which hopefully will get into the notification service.

CNI 2014 Fall meeting – Opening Plenary

I’m in DC for the annual fall meeting of the Coalition for Networked Information. This time the opening plenary is a discussion moderated by Cliff Lynch, CNI’s Executive Director, and including Tom Cramer (Chief Technology Strategist, Stanford University Libraries), Michele Kimpton (Chief Executive Officer, DuraSpace), and James Hilton (Dean of Libraries & Vice Provost for Digital Educational Initiatives, University of Michigan).

Cliff – Notable successes in people launching community source projects over the last 10-15 years. But the landscape is changing: the economic model and speed of development are looking shaky, and there’s an accelerated move to single- or multi-tenant arrangements run elsewhere. Where does this leave you when you come to the point in the lifecycle when you need to think about new systems? How do we engage with new opportunities in the MOOC and Unizin space?

Community source – what is it and does it really have a future?

James Hilton – Community source is not going away. Is community source the same as open source? Open source is often used synonymously with developer-autonomous-centric models. Compare Kuali with Sakai. How do you organize the labor that produces an outcome? We have many more tools to tune development – different organizational models can work.

Michele Kimpton (DuraSpace) – Tuning of the community development model. If you want to collaborate and develop code together, that’s a community model. Code doesn’t advance if people are doing customization at their institutions themselves. Need to invest and be transparent to advance the code base.

Tom Cramer – Many forms of community – one form is to have a centralized organization, but just as many examples where the community is grass-roots driven from the edges, like Fedora.

Is there a trend taking us towards or away from the grass roots model to funding a central model?

Tom Cramer – Examples on both sides. Central organization can bring focus, but so can grassroots – e.g. BlackLight faceted browser for SOLR.

James H – as scale of investment goes up the pressure to organize and centralize goes up.

Is the presence of serious commercial players a factor in central vs. grass-roots?

Tom Cramer – if there’s an absence of commercial players that can buy space and time for grass-roots organizing. Central authority can make missteps, whether community or commercial.

James – Unizin is trying to organize community effort around content and analytics standards. It made a decision to adopt commercial software – in part because it wanted the speed that came with that – contingent on contracts giving the control needed.

Michele Kimpton – Two models: when a commercial entity makes a product open source, that gives an exit strategy, but it’s not community controlled – it’s really serving the core paying customers.

Tom Cramer – Community source projects have failed where they’ve been gated communities – fail to channel the interests outside the gates. Also true of vendor solutions – unless you can tap the bigger market it will be a problem.

James – Unizin’s focus is on creating relays that will be as agnostic as possible. Community development is in building workflows using repositories, not in refining the LMS. It’s not about software; it’s about business and economic models.

Cliff – moving to the migration of software from local hosting to redundant network hosting. There seems to be a big move in that direction. We’re seeing what would have been community source before now taking on the character of community service – like DPN, APT, etc. How does that change the landscape?

James – it makes you ask the question: what parts do I need to control, and what do I not need? Unizin is trying to figure out what parts need control. The LMS is core infrastructure – go for economy of scale. Focus control on building digital workflows, helping humanists and research scientists know where stuff goes.

Michele Kimpton – 1700 institutions running DSpace – difficult to upgrade to new releases. Duraspace wanted to provide pathway for smaller institutions to run latest code. Cloud infrastructure will flip IT in academic environments on its head. Will be hard to justify building data centers when they can buy IT as a service and buy only what they need. Can keep the same governing process and openness.

Tom Cramer – Running data center and installing and maintaining software is not the core competency. It’s higher up in the stack providing value to the community. Where do you want to maintain control? Curation, discovery, preservation.

Cliff – A lot of this software is getting big, volatile, and complex enough (especially in the security environment) that doing maintenance and configuration management is getting to be troublesome. But if you’re out in the cloud you still need to do version control and validation – is that a worthwhile tradeoff?

James – if you’re committed to running everything in this compliance environment, that is all you will do. What do we value as academic institutions? What do we bring that’s unique?

Cliff – A barrier to innovation is everybody forking off code and doing local adaptations. The sense is that in the future, with networked software as a service, that area of variation really goes down. Innovations can diffuse faster.

Tom Cramer – Perhaps getting better at managing diversity. Seen lots of good examples that different communities are good at putting enhancements back into the code base. Separate question than running software as a service. In commercial world looking at securing different layers and diffusing innovation at those layers.

Michele Kimpton – There has been a lot of customization of both DSpace and Fedora, and that leads to frustration in upgrading. But customizations are needed. Part of the beauty of more innovation is you can look at aggregations across instances in the cloud – e.g. how do we aggregate pushing content into DPLA or DPN? Easier to do from cloud to cloud.

Tom Cramer – It’s standardization that enables that, not just cloud. e.g. standardized APIs.

Cliff – standards – are the places where standards are most applicable changing? Used to be notion of standards that allowed replacement of building blocks within a system. Now that you move into a world of aggregated things standards don’t mean as much – may work just fine to be expedient.

James – challenge is how do you move standards at the pace of technology?

Tom Cramer – role for standards based on size of the pool you want to swim in. There are important communities of practice around loose coupling whether informal or formal. Look at the numbers of people using SOLR for searching.

Cliff – A puzzle about how patterns of innovation change. Community source projects come from the grassroots where there is considerable technical expertise at the participating institutions. If we think about collective service-based aggregations, do local technical experts become scarcer, and does that imply less diversity of innovation?

James – if we can move innovation up the stack life gets better.

Tom Cramer – you don’t need to know how to run a server to have technical expertise. Successful solutions will figure out way to tap innovation coming from the edges. Be the community you want to be.

Cliff – you can look back and see the evolution – there used to be many organizations that had huge knowledge of global networking, but now it’s held in fewer institutions.

Michele Kimpton – if the developer can focus on developing and not on setting up servers and talking to IT, it increases innovation. You can throw things up and see if they work. Capital costs to innovation are so much lower. That’s why in the commercial space you see cloud-based services spawning all over the place.

Discussion of contracting and procurement – the legal folks have the same challenge we do in figuring out where we really need to be unique flowers. We all have indemnification and state rules. We don’t need 50 different ways to say it.

CNI Fall 2013 – Opening Plenary – Cliff Lynch

I’m in DC for the Fall membership meeting of the Coalition for Networked Information, which is always a great place to pick up on the latest goings-on at the intersection of libraries, digital information, and technology. As usual, the meeting kicks off with Cliff Lynch, the Executive Director of CNI, giving a summary of the current state of the art.

Some things Cliff won’t talk about:

  • The work Joan Lippincott has been leading on digital scholarship centers. Gets at some of the vehicles for forging and sustaining collaborations within research institutions and stewardship of materials that come out of that. There is a session tomorrow about that.
  • Output of the executive roundtable on the acquisition, collection, and curation of e-books at scale by university libraries, as well as the interaction between online textbooks and ebooks in research libraries. A summary session on that tomorrow. A general challenge question in this area: Are there examples from the recent landscape of books being published in electronic format only (or perhaps with print on demand) that contain high impact content? Are we starting to see a market emerge where you HAVE to deal with electronic material, because it’s not coming out in print, to get coverage of recent events? Thinking mostly about books outside of extremely narrow scholarly domains. The spring executive roundtable will look at software as a network-based service.
  • MOOCs. A year ago you couldn’t convene a group of five academics and get them not to talk about MOOCs. While discussion continues, it’s at a much lower temperature. There is some interesting preliminary research looking at the characteristics of the folks who seem to be most successful at MOOCs. Invites us to go back to much more fundamental notions of the library as the center of the university (if you connect a learner with a good library they’ll go far). That’s true of a certain type of person, but not all. The delivery of teaching and learning experiences is different than delivering a collection of knowledge. In the early enthusiasm about MOOCs there’s a tendency to see them as courses by other means. We will see MOOCs or MOOC-like things for purposes different than traditional courses, like training and other things that don’t fit well in the traditional academic definition of a course.

Things that are changing the landscape:

Hard to not give prominent place to the OSTP directive for federal funding agencies to develop plans to give access to reports and underlying data produced by funded research. There was an August deadline for submission of plans from agencies, which are not public. OSTP has been forthright that some plans are a bit more mature than others, depending on agency. We don’t have a firm date for when they will become public, but there is momentum. These developments will reshape the landscape for institutions that host the researchers as well as for the researchers themselves. Other nations and non-governmental funders are moving in the same directions.

One way to think about this is in the governmental sector, as a new set of compliance requirements. But we’ve seen the leadership in research and higher education (ARL, AAU, APLU) look at this as an opportunity and a challenge to rationalize the production of scholarly literature and data, which needs to be done. We’re seeing  a lot of changes in the obligations and practices that surround scholarly publishing – a whole range of behaviors that need to be rationalized so that researchers aren’t left scratching their heads. Seen a variety of responses to the OSTP mandate – from SHARE (ARL), CHORUS (STEM publishers), government (based on PubMed Central). All have places in the ecology. One of the attractive things about CHORUS is that it makes articles available in the context of the journals in which they appear, which institutional repositories do not. We need to think about how to take advantage of this, not view it as competition.

A little bit of redundancy is not a bad thing. When the government shut down we got an education about how deeply entwined many of the scholarly information services are, as non-essential government services. It is interesting to look, as a case study, at what was unavailable during the shutdown. At some level PubMed Central was deemed essential and stayed up, though it was not ingesting new contributions. Conversations took place recently under the name ANADAPT2, in Barcelona, mostly among national libraries, looking at aligning digital preservation strategies. One can see very clear advantages to aligning strategies at a nation-state level, but realize that there are some functions each nation wants to maintain autonomy in, rather than getting into interdependent collaborations. A set of trade-offs: when does collaboration turn into interdependence?

SHARE is not just an opportunity and a challenge to straighten out publication; it also deals with data. There was a second executive order over the summer that told federal agencies that the default thinking about data systems (modulo security and privacy concerns) is to provide public access to data. The word public is popular in governmental circles (rather than “open”). When you talk about public access (letting the public of the United States have access to data), that can range from access to raw data files all the way to systems that help the public understand and analyze the data. There are people in government struggling to understand where on that continuum to fall, especially as there is no money associated with these initiatives.

Issues emerging in the data area: the research and higher-ed community is mobilizing to address needs through bit preservation services. Some data is constrained because of personally identifiable information. Anonymization, while a useful tool, has limited power – it is frighteningly easy to de-anonymize data. We need to think about how to handle personal data while we gain the power of recombination and reuse of research data. We are seeing a movement away from a commitment to open-ended preservation of data to the more limited language of data management plans, e.g. preservation of bits for ten years. There are a number of commercial or consortially based services where you can prepay for ten years. The general proposition is we’ll go ten years, then look at what kind of use has been made of the data, and then look at alternatives. We have no process for doing that evaluation – it will need to involve all sorts of community discussions about the value of data, which will need to be cross-institutional (we’ll need registries). It’s not too soon to start thinking about this problem.

This is an example of a broader issue of “transitions of stewardship” – somebody’s been taking care of something, but now their commitment is expiring. We need an orderly way of putting the information resource in front of the scholarly community and evaluating the need for continuing preservation and finding who will step up to it. We’re getting very good at making digital replications of 2-dimensional things like fine art, but the difference is tracking provenance. There’s lots of progress in 3-dimensional work as well (Smithsonian, e.g.). We now have an opportunity to peel off the scholarly side of artifacts for not only exhibition, but as objects of study. There are lots of institutions of cultural memory that are under severe stress – see the discussions around the collection of the Detroit Institute of Art, which again leads to the idea of taking transitions seriously.

We need to be thinking about where we’re assigning resources. Two things that are troublesome: 1) we don’t know how well we’re doing with our digital preservation efforts. How much of the web gets covered by web archiving? We don’t have an inventory of the kinds of things that are out there, what parts are covered, or where the areas of highest risk are. There’s a tendency to go after the easy stuff – part of our strategy going forward needs to become much more systematic. 2) We have a tendency to continually improve things we already have in our grasp (like continually improving layers of backup for archives), but we need to look at the tradeoff of resources for this versus focusing on what we’re not yet capturing.

Another place where we’re seeing emerging activities that need to turn into a system is in distributed factual biography. Author identity, citations, aggregation, interchange, and compilation of citations. Connected to compliance issues, with academic processes, social networking among scholars, identifying important work. There’s an enormous amount of siloed work going on. Creeping up towards a place where we have factual biographies that we can break up into smaller parts and reassemble. What degree of assurance do we need on these bits? What role does privacy play? Is the fact that you published something a secret, or should it be able to be? Noteworthiness – Wikipedia has a complicated set of criteria of deciding whether your biography is worthy of Wikipedia. This has a rich wonderful history, going back to the nineteenth century work on national biographical dictionaries. When does someone become a public figure? There’s a question about systems of annual faculty reviews – one of the most hideous examples of siloed activity imaginable. Information often collected in forms that aren’t reusable in multiple ways. These need to be tied together with things like grant management systems, bibliometric systems, etc which are all moving the same data around. Other countries where the government is involved in assessing faculty work to pass out funds are more sophisticated than is typical in the US. One of the things we need to look at hard and quick is interchange formats. There’s good work in Europe and out of the VIVO community.

Notion of coherence at scale – framed by Chuck Henry at CLIR. We’re moving past the era of building fairly small systems and federating them; we need to be thinking at scale – how do systems depend on each other and interrelate? Look beyond academia – Wikipedia, Google, Microsoft, Internet Archive. Look at the incredible accomplishments of the DPLA (Digital Public Library of America). They’re being very clear about what they’re not going to do in the near future, by implication saying that someone else needs to worry about those topics. The scale of engineering we’re looking at to manage scholarship and research knowledge is crossing some fundamental thresholds, and we’re going to need to do things very differently than we did in the past. Examples are all around – look at the Pentagon Papers, which are now a fundamental reference source for the history of that time. That was a book – the research community knew how to deal with it when it was published. What do we do with things like Wikileaks? What do we do with massive data revelations?


CNI 2012 Fall meeting – Opening Plenary

I came in a bit late to this plenary, so forgive me for the incomplete notes.

Cliff Lynch, CNI

MOOCs – Machine grading will be important. Masses of data are coming out of MOOCs – who controls it and who gets to do what with it? Nobody seems to have asked the students. What’s the pact around learning data? Google decided it would be fun to build a MOOC platform and ran a couple of instances of a course on how to search using Google. They did it in order to drive use to their product – an example of applying this technology outside of academia, which will become increasingly common. An affordable way of doing consumer education.

It’s clear that globally promiscuous admission to MOOCs doesn’t map well to the way institutions license content. This will drive greater use of open access materials. It will also drive broader community licensing of materials. Finally, we’re seeing some serious work on alumni access to licensed materials – JSTOR is one among several players working on this. There will be pressure to move towards more rational personal licensing schemes, but these have high transactional costs.

E-textbooks – Seeing some attempts to do licensing at scale. May have some economic payoff. These could completely remap the relationships between faculty, presses, bookstores, etc. One of the messages to take away from MOOCs and e-texts is that we need a more deliberate strategy around licensing of instructional materials. A lot of this work has taken place in the IT community, which has been able to make some progress. Libraries have stayed away from licensing e-texts even as they’ve developed sophisticated understandings of licensing other materials that could be brought to bear.

About a month ago there was a short piece in the Chronicle about a new textbook platform. The distinctive feature was that it would report to the teacher whether you’d done the readings. Quoted a few faculty who thought this was wonderful. Do you find this at all creepy? Could ask the same question about MOOCs or LMS systems. Do the students even know? This is an issue waiting to hit the front pages. Need some conversations about privacy and informed consent around these platforms.

Debates are going on about how vocational higher education should be, whether we should still do humanities, whether there are really decent jobs for STEM graduates, etc. These things are under debate with a new intensity that deserves serious consideration. The role of employers in training, as opposed to universities in teaching – revisit the comments on MOOCs teaching outside of academia.

Science is under a great deal of pressure. There is a crisis about reproducibility of results bubbling up – some attempts to reproduce results aren’t going very well. If we don’t get this under control it will affect the public support and funding of science.

The rise of PLOS ONE – publishing a measurable part of the scholarly literature. Vetting for correctness rather than ranking. Offers a level of predictability not found in many major journals.

Public libraries are being largely cut out of ebooks, particularly mass market ebooks. Very good example of where licensing can take us, and the extraordinary power that licensing rather than first sale gives publishers.

We’ve had some encouraging court judgements supporting the principle of fair use. At the same time there are some very troubling things in the first sale area suggesting that we may see an increasing limitation to first sale. Broad populace is starting to wake up to this – is someone going to be able to inherit your ebooks?

One of the touchstones of CNI’s work has been understanding changes in scholarly practice. That’s worth continually revisiting. We can identify a number of new developments over the past few years. One would be easy to misclassify as “big data”: we’ve moved into a world where there is an abundance of evidence, whether that’s a historian visiting records or an archaeologist seeking to understand urn making. We have lots of examples – we want to look at outliers, make sense of millions of email messages, etc. We’re seeing automatic search and clustering tools, analysis of social networks, etc. These are different from (and predate) big data tools.

Some new scholarly environments – Math Overflow, where mathematicians and upper-level grad students can post questions and get answers. It has an elaborate system of ratings and rankings – similar to Stack Overflow. Very sizeable scholarly practice communities are using these tools in their work. Wolfram Alpha – a new class of information system that has some capabilities for encoding computational knowledge. We need to be very open to recognizing these kinds of new systems showing up in scholarly communities. Scholarly practice does not stay still – change continues to ripple.

Data curation and research data management. CNI has been very active for a decade now, trying to look at what was coming. We’ve seen NSF and NIH requirements, other funding agencies are moving along this path. While we’ve changed the regulations we’re flying mostly blind. We know very little (collectively) about what’s being proposed, what effect it’s having on funding decisions, and whether people do what they say they’re going to do. There’s a tremendous need to collect data so we understand what’s working or not.

Some specific problematic areas – Individually identifiable data: reusing this is very hard. We need to think about research continuity and risk management. See Hurricane Sandy.

Building an expandable outline for the web

For part of a major work project I found myself needing to view a hierarchical list in an expandable outline form on the web.

On my Mac I used Omni Outliner which works great, but that didn’t work for my colleagues who work in Windows. They translated the list into a Word outline, but outline view in Word is not all that great, and who wants to fire up Word just to view an outline?

No, the answer was definitely to be able to view this list in a web browser, but how to build that?

We’ve been managing most of the work on this project using a Sharepoint site, but I couldn’t find a way to build a simple expandable list in Sharepoint. Later I found some articles on how to build one in Sharepoint, but they require developer access to Sharepoint, which I don’t have, and some coding in Visual Studio, which left me out in the cold at home over the long weekend with my house full of Macs.

In casting around I came up with two AJAX toolkits that both have expanding and collapsing tree controls – the Dojo toolkit and the Yahoo! UI Library.

I spent some time this weekend trying out the tree controls in both libraries. I was impressed with both – the features and design of both of these toolkits are pretty amazing.

I spent some time with Dojo’s tree control, but I couldn’t figure out how to apply styles to all the nodes at a given level (e.g. all the top level nodes should be green, second level nodes should have smaller type, etc.). Perhaps I just didn’t spend enough time with it – I’m sure it must be possible.

I was able to implement the controls and styles I needed using Yahoo!’s TreeView control, though it took a couple of hours to understand how to fit the elements of the UI Library together – all of these kinds of tools have their own learning curve. I never did get the tree to build from existing markup in an unordered list – I built my tree content in the Javascript code.
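As a rough illustration of the two things I was after – building the tree content in JavaScript rather than from existing markup, and styling all the nodes at a given level – here’s a minimal sketch in plain JavaScript. The function name and the sample outline are hypothetical (this is not the YUI or Dojo API); it just walks a nested data structure and emits nested unordered lists, tagging each list with a depth-specific class so CSS rules can style whole levels at once:

```javascript
// Hypothetical sketch: render a hierarchical list as nested <ul> markup,
// with a class per depth so CSS can style entire levels, e.g.:
//   .level-0 { color: green; }
//   .level-1 { font-size: smaller; }
function renderOutline(nodes, level) {
  level = level || 0;
  var html = '<ul class="level-' + level + '">';
  for (var i = 0; i < nodes.length; i++) {
    var node = nodes[i];
    html += '<li>' + node.label;
    // Recurse into children, one level deeper
    if (node.children && node.children.length) {
      html += renderOutline(node.children, level + 1);
    }
    html += '</li>';
  }
  return html + '</ul>';
}

// Sample outline data, defined in code rather than in page markup
var outline = [
  { label: 'Project', children: [
    { label: 'Phase 1', children: [{ label: 'Task A' }] },
    { label: 'Phase 2' }
  ]}
];
var markup = renderOutline(outline);
// The markup can then be dropped into the page; a tree widget (or a few
// lines of click handlers toggling display:none on the nested lists)
// makes the outline collapsible.
```

The real TreeView version builds node objects instead of strings, but the same idea – pass the depth down as you recurse – is what gives you per-level styling.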

I should mention that these are tools for web developers – they’re not things where you can just drop a line of JavaScript into a web page and go; they take some savvy and some time to use. But it’s a heck of a lot easier than building tools from scratch!

I’m incredibly impressed with the maturation of the market of freely available tools for web developers since I last looked. I’m also struck again by the maturation of the entire concept of what it means to be a web developer now – it’s not just technical know-how, and it’s not just design ability, but a blending of the two in a way that I think is a fundamentally new discipline. It’s interesting to think about where that discipline finds its home in our academic institutions – at present I don’t really think it has one.