CNI Fall 2012: Closing Plenary – Hunter R. Rawlings III, AAU


Fundamental question in higher education: what is college for? The answer drives educational policy and is now up for grabs in dramatic fashion. Rawlings went to college to get an education, now a rather quaint notion. The assumption was that he’d learn how to learn, and thus would be prepared to be a functioning citizen in a democracy. Today these reasons are buried under getting a job. Much of the public feels this is the reason to send kids to college, and public policy makers agree. Governors of Texas, Florida, Wisconsin, and other states are implementing this in policies.

No quarrel with vocational education to help people get jobs – community colleges do this quite well, and they are growing. But to force this vocational model on public universities, including research universities, is a mistake. Note what Virginia did last year – required colleges and universities to report salaries of graduates by major. Report ranks universities by salaries by major, 18 months after graduation. Entering an era of increased corporatization of higher education – and this will increase.

CEOs of major companies want people who can think for themselves, do sophisticated research, express themselves clearly and deal with unknowns. We’re at a crossroads. State funding for public universities goes down every year, tuition goes up, public anger at tuition grows, and universities end up running themselves like corporations. This leads to problems like what happened in Virginia this summer. This is serious business and the trend line is not good.

Fortunately, there are bright spots amidst the gloom. One is the astounding flood of undergraduate students from China and other countries in the past ten years. At Michigan State there were 30 undergrads from China in 2002. This year there are 2845. In the past two years alone the number of undergraduates from China at American universities has doubled. There are 100 million students in secondary schools in China. 8 million students graduate each year – 100% speak English. A growing number can afford to go to US universities. A tidal wave of 18-year-olds is coming. This will alter American universities and it will alter China. In many cases they major in subjects that American students don’t want. Let’s hope our government will be sensible enough to let those who want to stay and work do so.

We continue to have great colleges and universities, both public and private, but the publics are under a lot of strain.

The state of scholarly communication, libraries, and IT. Technology provides for increasing access to scholarly communication and reducing the cost of producing it. Publishers are concerned about the impact of free availability on their business models. This is generating conflicts in the academy and in federal policy making. Access to NIH-funded research has been greatly increased by PubMed Central. Originally NIH planned for a 6-month embargo on articles. After contention with publishers the embargo was increased to 12 months. NIH made submission mandatory in 2007, and they will begin enforcing that this coming spring. Legislation has been introduced for the last several years for all federal agencies to establish public repositories like PubMed – strongly opposed by commercial publishers.

Scholarly publishing roundtable in Congress, chaired by John Vaughn. Recommendations included a call for all federal funding agencies to establish repositories for free public access to research. 12 out of 14 members endorsed the recommendations. The member from Elsevier believed it called for too much government intrusion into publishing, and the member from PLOS believed it did not call for enough government participation. Several of the recommendations were incorporated in the America COMPETES Act reauthorization.

The digital revolution has a great impact on books. One pernicious impact of the soaring costs of scientific journals has been the downward trend in monograph purchases by libraries. AAU and ARL have established a task force focusing on three domains: university presses, digital repositories, and ? University IT organizations will play an increasing role in these processes.

IT also plays an essential role in online education. Almost daily new coalitions of universities offering MOOCs are announced. Newspapers are delighted with these developments. Does this represent a precipitous paradigm shift? Is the solution to our cost problem at hand? Some political leaders and members of the public think so, but it’s mostly hype. So far very little revenue, no course credit, no degrees, extremely low rates of completion, and (surprisingly) lots of cheating. Not ready to declare victory in the war on cost and in the search for better instruction. Everything we’ve learned about what helps students learn is the opposite of MOOCs – direct interaction helps.

Prior to 400 BC the Greeks communicated orally, not in writing. It was a performance culture and all forms of literature were in verse because it was rhythmic and more easily remembered. The book and prose made their first serious appearance before 400 BC in Athens. Living in Athens was Plato, who learned from Socrates, a great oral philosopher. Plato saw the advent of the book and thought about the consequences of the book for human knowledge. The book had some clear advantages but two disadvantages: Greeks would begin to lose their memory, and (more seriously) you could not query or challenge the ideas in a book – it was a one way street. No dialectic that leads to education. For Plato, when conversation ended, so did learning. Learning depends on intellectual engagement where you perform actively. What did Plato do? He wrote books, but a special kind: Dialogs. Books that mimicked real philosophical conversation. There’s evidence that Plato considered his books playthings, compared to the serious conversation in his academy. The books then are meant to point towards real learning, and he used the form intelligently to maximize its benefits and minimize its weaknesses. Perhaps we should think of MOOCs the same way. Very fine demonstrations by scholars of the life of the mind. Enticing and fun, if you don’t take them too seriously. Not the answer to deep education issues, but if they provide 1% of the benefit to humankind of Plato’s dialogs they will have proven themselves a worthy successor.

Flipped classroom is turning out to be a good thing in many cases – watch the lecture online, come to class to perform. Encourages student engagement. Anything that increases engagement is a plus and we’re starting to find ways that online education can enhance that. But worried about the hype and over-promising that are going on and the folly of thinking that it will solve the cost problem.

What can the scholarly community do to restrain the growth in cost of higher education? A complex issue. Harold Shapiro said we could drop costs easily if we start teaching more students per section – some are pursuing this, but you also cut into value. The real question is how do you keep costs down while maintaining quality? This is the big issue that presidents and provosts are now confronting. The public is tired of hearing complex answers to why costs go up – and the answers are complex. Universities have to control non-educational costs much better than they have – we’ve been competing partly on bells and whistles, and we can’t do that any longer. We should push students to finish degrees in shorter time. ACS yesterday recommended that it should take 4 years to get a chemistry PhD.



CNI Fall 2012 – Digital Humanities At Scale: Hathi Trust Research Center

John Unsworth, Brandeis
Beth Sandore Namachchivaya, UIUC

Public research arm of the Hathi Trust – which is a large corpus providing opportunity for new forms of computational investigation. Over 10 million total volumes currently. The bigger the data, the harder it is to move it to a researcher’s location. Future research will require computation moving to the data.

Requirements gathering: 2010-11, sponsored by the library school at UIUC. Did a study interviewing all 22 researchers with Google Digital Humanities Research Awards.

Findings: improve OCR quality where possible; enhance scanned image views for OCR reference and correction; metadata should expose the quality of OCR; better granular metadata about languages (human correction preferred); need bibliographic records in usable form.

Goals for HTRC
– provide a persistent and sustainable structure to enable scholars to ask and answer new questions. Leveraging data storage and computational infrastructure at Indiana & Illinois; stimulate community development of new functionality and tools; use tools to enable discoveries.
– Enable scholars to fully utilize content while preventing IP misuse under US copyright law.

One of the early research projects was done by Ted Underwood et al. at Illinois. Identify all 18th and 19th century published books in the HathiTrust corpus, and apply topic modeling to create consistent overall subject metadata. Also did an experiment to look at the ratio between words entering the English lexicon before 1150 and after in three different genres. To do this kind of work you start by doing a lot of cleaning of data. The glory is in analyzing the data, which takes just a small amount of the time.
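The before/after-1150 ratio can be illustrated with a minimal sketch. This is not the project's code: the tiny lexicon, its rough placeholder dates, and the function name `pre_post_ratio` are all assumptions made for illustration, and a real study would need a full dated lexicon and careful tokenization.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon: word -> approximate year of entry into English.
# The years are rough placeholders for illustration, not scholarly dates.
FIRST_ATTESTED = {
    "house": 900, "water": 900, "love": 900, "king": 900,
    "government": 1380, "nation": 1300, "science": 1340, "beauty": 1325,
}

CUTOFF = 1150  # dividing year mentioned in the talk


def pre_post_ratio(text, lexicon=FIRST_ATTESTED, cutoff=CUTOFF):
    """Return (pre_count, post_count, ratio) over tokens found in the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        year = lexicon.get(token)
        if year is None:
            continue  # words missing from the lexicon are ignored
        counts["pre" if year < cutoff else "post"] += 1
    pre, post = counts["pre"], counts["post"]
    ratio = pre / post if post else float("inf")
    return pre, post, ratio


sample = "The king's house by the water needs government"
print(pre_post_ratio(sample))
```

Running the same function over volumes grouped by genre would give one ratio per genre to compare.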

Cleaning the data: 1. clean up the OCR/assess error. 2. Identify parts of a volume (e.g. articles in a serial, poetry/prose). 3. Remove library bookplates, running headers, etc.
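As a rough sketch of the third cleaning step, running headers can be detected as first lines that repeat across many pages of a volume. The function name `strip_running_headers`, the `min_share` threshold, and the page-as-string input format are assumptions for illustration; real headers often carry page numbers or OCR noise and would need fuzzier matching.

```python
from collections import Counter


def strip_running_headers(pages, min_share=0.5):
    """Drop a page's first line when that same line opens many pages.

    pages: list of page texts (one string per page).
    min_share: assumed fraction of pages a line must open to count as a header.
    """
    first_lines = []
    for page in pages:
        lines = page.splitlines()
        if lines and lines[0].strip():
            first_lines.append(lines[0].strip())
    counts = Counter(first_lines)
    threshold = max(2, int(len(pages) * min_share))
    headers = {line for line, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        lines = page.splitlines()
        if lines and lines[0].strip() in headers:
            lines = lines[1:]  # remove the repeated header line
        cleaned.append("\n".join(lines))
    return cleaned


pages = [
    "SAMPLE TITLE\nShe went out.",
    "SAMPLE TITLE\nHe stayed.",
    "CHAPTER II\nNew text.",
]
print(strip_running_headers(pages))
```

The chapter heading survives because it opens only one page, while the repeated title line is removed.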

Cleaning/enriching the metadata: 1. resolve uncertain dates such as “18??” 2. discard duplicate volumes / select early editions? 3. Add metadata you need for interpretation like gender or genre.
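One possible way to handle an uncertain catalog date like “18??” is to turn it into a range of possible years and keep a volume only when the whole range fits the period under study. This is a sketch under assumptions: the helper names `year_range` and `within` are hypothetical, and it handles only four-character dates with trailing question marks, not the many other forms found in real records.

```python
import re


def year_range(raw):
    """Map '1843' or '18??' to an inclusive (earliest, latest) year range.

    Returns None for anything that is not four characters of digits
    followed by question marks (for example 'n.d.').
    """
    match = re.fullmatch(r"(\d{1,4})(\?{0,3})", raw.strip())
    if not match or len(match.group(1)) + len(match.group(2)) != 4:
        return None
    digits, unknown = match.group(1), len(match.group(2))
    low = int(digits + "0" * unknown)   # unknown digits at their minimum
    high = int(digits + "9" * unknown)  # unknown digits at their maximum
    return low, high


def within(raw, start, end):
    """True only if every possible year for this date lies in [start, end]."""
    rng = year_range(raw)
    return rng is not None and start <= rng[0] and rng[1] <= end


print(year_range("18??"))          # (1800, 1899)
print(within("18??", 1700, 1899))  # True
print(within("1???", 1700, 1899))  # False
```

Requiring the whole range to fit is a conservative choice: it excludes volumes whose date is too vague to place in the period.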

Things we could share: period lexicons / variant spellings; gazetteers of proper nouns; OCR correction rules for a period; document segmentation and/or cleaned and segmented text; FRBRization; Cleaned / enriched metadata; code to do all the above.

HTRC architecture
Philosophy – computation moves to data; web services architecture and protocols; registry of services and algorithms; Solr full text indexes; noSQL store as volume store; openID authentication; portal front-end, programmatic access; SEASR mining algorithms.

Infrastructure for computational analysis – algorithms must be co-located with data. Analysis of large parts of the corpus requires significant parallel computing resources, requiring batch processing.

Can fair use be determined based on categorization of algorithm? Or is all computational use fair use? Even the public domain content was generated by Google and comes with some contractual limitations on use. What kind (and how much) data gets shipped back to the user as part of a result set? Researchers need context as well as the token that was a result. Could be a paragraph, a stanza, a page. Need to examine user-contributed code to see if it abides by the rules.

Phase 2 availability of resource – March 2013. Workshops @ Digital Humanities 2013 and JCDL. Fix the OCR and Metadata Shortage Community Challenge. Job opportunities – postdoc @ Illinois.

CNI Fall 2012 Meeting: HarvardX: Developing Communities of Practice for Innovation in Online Learning

Samantha Earp, Susan Fliss, Harvard University

Most in the audience think MOOCs are at the peak of inflated expectations on the Gartner hype cycle.

Early work in progress at Harvard (in startup mode).

EdX was launched in May, following on MITX courses. Harvard and MIT are founding institutions and funding initial work. Slowly growing at a deliberate, measured pace. Evolving to a software + services organization.

HarvardX is the Harvard engagement manifesting as a set of courses on EdX.

High level institutional goals: Improve education on campus, bring education to the world, conduct research on how to deliver on the first two goals. Alignment with educational mission at Harvard, priorities should align with the educational goals of the individual schools. Starting with intent that everything recirculates back to campus.

A very faculty-driven effort. Hoping to push the envelope of using the platform – many examples of online and distance learning at Harvard already. Not looking for a cookie-cutter approach, as opposed to the templated approach in older LMS systems. Looking to explore what’s possible, and willing to tolerate some mistakes along the way. Not just a publication effort – not just taking a course and crafting for publication like a bronze in a museum, but trying to make courses available and learn and adapt.

Faculty have established two high-level criteria – focus on quality and impact. Beyond that focus is on extending and sustaining. Not interested in a series of one-offs. The idea is to foster communities of practice. An important strategy in how we learn from each other.

EdX is governed by a board of four members each from MIT and Harvard. Those four (at Harvard) form the core leadership team. There’s a faculty course team, working on major policy questions, what a course looks like, etc., from the faculty perspective. There’s a research committee looking at what is important for Harvard to learn from this.

Course development – have a small core team. Most critical is a HarvardX fellow role – deep pedagogical background along with project management and tech skills. Coordinators of communities of practice and a number of students and contract staff. Will eventually have research fellows, who will work as data scientists on research questions.

Looking for pedagogy to drive platform development. Slated to become open source platform, which implies a governance and community process that don’t exist yet.

What does it mean to teach in this flavor of online space, and what does it mean for the campus? Leads to a rejuvenated discussion of pedagogy on campus. What pedagogies might be out there already that we should adopt locally?

Intellectual property, copyright, content is an issue.

The opportunity to collect and analyze data on librarian participation in courses is exciting to Harvard librarians. Librarians are beginning to plan to support EdX both locally and across the participating institutions. Forming two groups: one on copyright process, one on research skills.

Main issues are use of copyrighted materials, assignment of copyrighted readings, assignment of copyright for original content, applicability of notice and takedown provisions of DMCA, and accessibility. Group is working on suggesting best practices in these areas, using use cases.

Research skills working group in participating libraries – Determining best how to support learners in information seeking tasks and information literacy.

The two groups will collaborate with faculty, academic colleagues, and each other.

Two courses so far. 56,000 people signed up for the epidemiology course and roughly 20% are sticking with it. Adding four more courses in spring – law, Greek lit, government.

Interested in engaging faculty in developing experimental instructional modules – hoping to quickly ramp up experiments with faculty. May get used in future learning experiences that may or may not be a course. Interested in how this interacts with residential education.

For one course Elsevier allowed page images from the textbook to be used in the MOOC and then sold out the entire press run of that book – they’re interested in exploring more use.

Some materials are clearly available openly, some clearly need licensing, but there’s a large body in the middle that might be covered by fair use.


CNI Fall 2012 – Student-Driven Innovation: Simul8 group at UCLA Library

Kevin Rundblad, University of California, Los Angeles
Todd Grappone, University of California, Los Angeles

UCLA received grant funding from Arcadia to do innovative library work. This project is about student-driven application development. The advantage is that the apps are being developed by the people who will use them. Trying to capture innovative startup culture in the Library. Their usual project management process doesn’t really get to the freeform chaotic process that is creativity at the student level.

Innovation Thesis: Great new development projects arise from startup-like cultures.
Qualities of a startup: Small, experimental, feedback loops.

2009: needed to find a way to understand users and build apps. Realized Library is good at defense, maintaining systems and keeping them running. But offense – ability to create new apps for newest devices wasn’t their strength. Students work in a startup mentality by nature. Can we create these types of experiences in a work structure? Working “like them” as much as working “with them”

“If you truly want to understand customers’ wants and needs, you need to remove the distance between you and them.”

Started with 5 students.

Founding concepts – leverage student skillsets. Looking for iOS/Android, PHP/Python/MySQL, HTML/CSS/JavaScript, Amazon Web Services. Students were the principal driver of the move to the cloud. Not waterfall – innovation smeared over time. Requirements found in process. Start off creative. Get closer to user experience – implicit user research.

Focus – Mobile/Tablet/Web apps, iPhone/Android. Experiment with new devices and APIs. Build library value, and build value for students by providing a platform for their skills.

Students work independently on their own laptops. Weekly meeting & flexible hours. Using GitHub. The flexibility in the group is what gives it the power – have to control it like herding cats.

Kevin is the “glue” in the group – App planning. Prototyping – determine suitable technologies. Testing – detail, detail, detail. Find speed/UI issues, GitHub issues.

This group pushes the Library in directions it might not have taken – like Amazon WS. This activity made some folks in the Library IT department uncomfortable, but it’s been great.

CNI Fall 2012 – What we’re learning from e-text pilots

Joan Cheverie, EDUCAUSE
Rodney Petersen, EDUCAUSE
Jarrett Cummings, EDUCAUSE

Educause and Internet2 pilot – Spring 2013 eContent Pilot.

Moving from print to digital, but taking it above and beyond just a PDF of a print text. Want to explore innovative business models.

Funding and Distribution – Institutional agreements: College or university pays; leverage institutional buying power; software site license is a model. Distribution to students: access via LMS; option to purchase print-on-demand; institutional choice on cost recovery.

Stakeholders – Institution, faculty, students, bookstores, IT/libraries, eReaders, publishers. eReaders are increasingly open and part of what publishers offer or have agreements with.

Additional functionality: Highlighting passages, recording comments, sharing notes with other students, reading while offline, using different devices.

Methods – student survey, faculty research protocol, usage data.
Spring 2012 results – option to purchase paper copy (12%), lower cost of etext most important factor, portability also ranked high, offline access desirable, access throughout college, not just class, usability of eReader (especially zoom feature), faculty not using enhanced eText features.

Affordability – Textbook costs and their impact. Licensing e-texts directly could reduce college costs by 4%. For a $100 textbook, direct license cost could be $40, while buying it through the bookstore could cost $67. International survey found 77% reported that they do not always buy textbook for their course, with affordability being a prime concern.
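The per-book figures quoted above can be worked through in a few lines. This sketch uses only the $67 bookstore price and $40 direct license price from the notes; the function name `savings` is an illustrative assumption, and the calculation says nothing about how the separate 4% figure was derived.

```python
def savings(bookstore_price, license_price):
    """Return (dollars saved, share of the bookstore price saved)."""
    saved = bookstore_price - license_price
    return saved, saved / bookstore_price


saved, share = savings(67, 40)
print(f"${saved} saved per book, {share:.0%} below the bookstore price")
# prints "$27 saved per book, 40% below the bookstore price"
```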

Broadband access – Need to have access offline to texts as well as online.

Information Policy issues – Ownership – you know what you can do with a book once you own it, but in a licensing environment it’s different. What happens to fair use? DRM – some systems don’t allow printing or cut & paste. Preservation – long term preservation rights are not the norm. The only way to ensure permanent access is download without DRM. Favorable licensing should include broad academic use, perpetual use rights, DRM-free formats.

Accessibility – E-text pilots and the NFB. Minnesota was part of the first pilot, and their disability student services office evaluated the Courseload e-reader distributing McGraw Hill Education content. Found that Courseload was essentially unusable by people with visual impairments. That report was made public as part of the pilot. NFB sent a letter saying that the pilot was a violation of the ADA. Minnesota’s findings had motivated Courseload to make substantive changes, planned for January 2013. Educause helped facilitate conversations between Courseload, McGraw Hill and NFB. Content remains key. Advisory Commission on Accessible Instructional Material in Postsecondary Education for Students with Disabilities (The AIM Commission) – did a report to Congress, finding that although etext publishers were making progress and EPUB 3 offered possibilities for accessibility, the materials currently on the market have problems. Encouraged adoption of standards for the federal government. CourseSmart says that 80% of their upcoming spring texts are accessible and they have accommodation processes for those that aren’t. The NFB is going to take a look at their offerings.

Privacy and Security – Lots of data analytics about interactions with electronic texts. That’s both good and bad. Materials reside on third parties, and publishers have access to data – what do they do with that? The pilot project has contractual language that prohibits use of data beyond the pilot. Most ereaders are accessible via web browser, so browser security is an issue.

Identity & Access management. Concern about multiple user names and passwords, or sharing passwords. The pilot uses the LMS, so students don’t need a separate credential. But the future is in leveraging federation to use local credentials.