[CSG Spring 2008] Jonathan Maybaum on UM.SiteMaker

Jonathan is a Professor of Pharmacology at the University of Michigan Medical School.

SiteMaker is a flexible authoring tool for public/private websites and web databases. The ability to allow non-technical people to create web applications is something he thinks we should be excited about. There are about 8,000 sites at UM published with SiteMaker. It’s also deployed at U Chicago and others. It’s a server-based Java application, open-sourced under the ECL. So far it looks a little like our own Catalyst WebQ.

SiteMaker works with web SSO systems like Cosign, CAS, and PubCookie, along with LDAP authorization, and they’re working now on Sakai/Blackboard/Moodle integration.

http://sitemaker.umich.edu/sitemaker.resources/data_table_demo

[CSG Spring 2008] Discussion with RIAA

David Hughes and Mark McDevitt from the RIAA are here to talk about the RIAA’s efforts to detect and report unauthorized music distribution.

Primary tool they use is the DMCA takedown notice.

In addition to the notices they focus on eBay auctions selling counterfeit CDs and hard drives loaded with content, online video streaming sites, blogs offering music files, IRC distribution of files, etc. They’re also seeing growth of secret closed groups distributing music, and sites offering ringtones, Facebook apps, and others.

They have a staff of six full-time people investigating. One person is focused on university takedown notices. They’ve sent “tens of thousands” of takedown notices to commercial ISPs, universities, and others.

Functional process for takedown notices – see the recent article in the Chronicle of Higher Ed, and the recent notice to the Educause list from Mark Luker.

MediaSentry started with a list of seven hundred songs – typically newer and very popular, plus a handful of popular older songs, e.g. Hotel California. The first step they take is a text-based search for song titles on Gnutella, Ares, BitTorrent, and eDonkey. They look for a hash that matches a list of previously stored hashes. If they don’t find a match, they download the file, compute its hash, and create a fingerprint of it using Audible Magic. If Audible Magic matches the fingerprint to a known song, they add the hash to the database. If Audible Magic does not find the song, the file is presented to the RIAA for them to listen to and verify.
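The matching pipeline described above can be sketched roughly as follows. This is an illustration only: `known_hashes` stands in for MediaSentry’s stored-hash list, and `fingerprint_matches` is a placeholder for the Audible Magic lookup, whose real API is not described here.

```python
import hashlib

# Stands in for MediaSentry's database of previously verified hashes.
known_hashes = set()

def fingerprint_matches(data: bytes) -> bool:
    """Placeholder for an acoustic-fingerprint lookup such as Audible Magic."""
    return False  # in this sketch, unknown files fall through to human review

def classify(file_data: bytes) -> str:
    digest = hashlib.sha1(file_data).hexdigest()
    if digest in known_hashes:
        return "known-infringing"        # hash already verified earlier
    if fingerprint_matches(file_data):
        known_hashes.add(digest)         # remember the hash of a verified recording
        return "verified-by-fingerprint"
    return "needs-human-review"          # presented to the RIAA to listen to

print(classify(b"some downloaded file"))  # needs-human-review (nothing stored yet)
```

The point of caching hashes is that subsequent sightings of the same file on any P2P network can be classified without re-downloading or re-fingerprinting it.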

MediaSentry takes the most widely distributed hashes on the P2P networks, then sends out requests to the P2P networks for files with those hashes, then sorts through IP addresses to determine ownership. They then attempt a TCP handshake to verify the IP address and request the hash. If the user responds, they log the date, time, and name of the infringing file. That info is then used by a human (Jeremy Landis) to send out takedown notices. The notification is not automated.
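The verification step – handshake with the IP, then log date, time, and file name – might look something like this sketch. The port, timeout, and log format are invented for illustration; this is not MediaSentry’s actual tooling.

```python
import datetime
import socket

def verify_and_log(ip: str, port: int, file_name: str, log: list) -> bool:
    """Attempt the TCP handshake described above; a completed connection
    is treated as confirmation that the IP address is live."""
    try:
        with socket.create_connection((ip, port), timeout=5):
            log.append({
                "ip": ip,
                "file": file_name,
                "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            })
            return True
    except OSError:
        return False  # host unreachable or refused: nothing is logged
```

Only entries that pass this check would then be handed to a human for notice generation, matching the “notification is not automated” point above.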

They do not download every file they send a notice on. There is no rule that says a DMCA notice is sent before a preservation request or presettlement notice. They’re separate processes.

In no case do they rely just on a text match on file info.

The mechanism for presettlement letters has requirements built into it based on the needs of litigation. When the system goes out to find users, it looks for users sharing files belonging to member companies: users sharing multiple copies of songs from members, recordings that can actually be downloaded, and users located in the US.

Because they only know IP addresses, they don’t know whether the people they send a presettlement letter to have received a previous takedown notice. Good faith belief is not enough for litigation. What they’re going for in presettlement letters is “egregious” infringers – ones that have hundreds or thousands of songs.

They started sending out takedown notices to the universities in 2003. Notices to commercial entities started in 1996, predating the DMCA. They say that the volume between universities and commercial entities is roughly equal. That’s different than what they said in the Chronicle, but that’s because some context got lost in that interview. Greg notes that this seems like an imbalance, given the relative proportion of users. They note that we have a large concentration of the demographic. Also they note that the volume of outgoing transfers from universities is much larger than on other ISPs. In the case of non-campus ISPs they try to send notices to where “they have the most effect”. When they send to some ISPs for some p2p use, they appear to go in a black hole.

There’s no relation between the recent increase in notices and any political activity, nor do they use the notices to indicate the level of activity at any particular university. The surge is due to software and hardware upgrades at Media Sentry.

They’ve done recent surveys of college-age students in which 90% know that it’s illegal, but 70% admit to doing it. So they know the population understands it’s wrong but continues the behavior, and that’s what they have to deal with. Elazar notes that after five years of receiving notices, there hasn’t been much change in behavior. The RIAA guys respond by saying that they are seeing some change in behavior – P2P use is not increasing as fast as it was. That’s not true in Canada, where they don’t send out notices or litigate.

Terry asks what the limits are of MediaSentry to see activity inside the institutions, like DC++. They have no visibility into that through MediaSentry. They find out about things like that from things they see – like student newspapers, message boards, etc.

Untitled 8

Lisa Stensland, from Cornell OIT’s Project Management Office, gave a presentation about the project management methods used in their successful upgrade from PeopleSoft 8 to PeopleSoft 9. They used Critical Chain Project Management, which pools the contingency time from individual estimates into a single buffer at the end of the project. She noted that a critical part of the project’s success was the complete (or close to complete) commitment of staff time while working on the project, minimizing the multitasking that they needed to do.
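The buffer-pooling idea can be illustrated with a toy calculation. The 50% rules below are common CCPM heuristics, not the parameters Cornell actually used.

```python
# Critical Chain sketch: strip the safety padding out of each task estimate
# (the common heuristic treats padded estimates as roughly double the
# 50%-confidence duration), then pool half of the removed safety into a
# single project buffer at the end of the chain.

def critical_chain_schedule(padded_estimates_days):
    aggressive = [d / 2 for d in padded_estimates_days]  # 50%-confidence estimates
    project_buffer = sum(aggressive) * 0.5               # pooled shared buffer
    return sum(aggressive), project_buffer

work, buffer_days = critical_chain_schedule([10, 6, 8])
print(work, buffer_days)  # 12.0 days of work plus a 6.0-day shared buffer
```

Pooling the contingency means no single task "owns" slack it will inevitably consume; overruns on any task draw from the one shared buffer.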

[CSG Spring 2008] Agile Organization workshop

Thursday morning is spent in a workshop discussing how our IT organizations can be more agile. I’m one of the coordinators and presenters in this workshop, so can’t really blog it. Maybe somebody else will. Presentations are available on the web site at http://www.stonesoup.org/Meeting.next/work2.pres/

[CSG Spring 2008] Cyberinfrastructure Workshop – Funding for CI – Three Success Stories

Steve Cawley – University of Minnesota

The CIO’s job – get the money. What has worked at Minnesota, what doesn’t work? What are we doing to move forward?

Three centrally funded research areas –

the Minnesota Supercomputer Institute. $6.5 million budget to provide high-performance computing to institution. Moving from engineering/science college to VP for Research.

Central IT – the network is a common good. Unified gig-to-the-desktop, meshed 10-gig backbone. BOREAS-net, a 2100-mile owned optical network connecting to Chicago and Kansas City. Chicago CIC fiber ring and OMNIPOP. NLR and Internet2.

Have had good luck funding SAN storage for researchers over the past three years. Received startup funding for server centralization utilizing virtualization. New data center plan: central IT plus research.

University Library – expanding expertise including data collection and curation, data preservation and stewardship. Virtual organization tools. University Digital Conservancy. Rich Media Services.

Problems – limited collaboration between researchers.

Heavy reliance on chargeback is a detriment. Central IT was 80% chargeback, now only 20% is chargeback. Common good services should be centrally funded.

Moving forward – the Research Cyberinfrastructure Alliance. Great exec partnership – VP for Research, VP for Tech, University Librarian. They try to speak with one voice.

Input from interviews with faculty. Large storage needs. Little thought being given to long term data preservation. University does not exist in a vacuum.

Julian Lombardi – Duke

Center for Computational Science Engineering and Math (CSEM) – Visualization Lab, Duke Shared Cluster Resource – blades donated to cluster. Those who donated got priority cycles.

Provost discontinued funding to the center. Cluster and Vis Lab were still being supported by OIT and departments.

Needs are – broad and participatory direction setting; support for emerging and inter-disciplines.

Bill Clebsch – Stanford – Cyberinfrastructure: A Cultural Change

Religious camps blocking progress.

There were three separate camps – the schools, the faculty, and the administration. Everyone pretending that computing can be managed locally.

Asked the Dean of Research to send out letters to the top fifty researchers. Spent time with each of them to find out what the state of computing is.

Exploding Urban Myths – when they talked to faculty they found out that the received wisdom wasn’t true.

Myths and facts #1 – myth: scientific research computing methodology has not fundamentally changed (heard from Provost). Fact: researchers’ computational needs have changed fundamentally in the last five years, increased computing availability itself directly yields research benefits, researchers have abandoned the notion that computing equipment needs to be down the hall.

Myths and Facts #2: Myth: faculty needs are highly specialized and cannot be met with shared facilities. Fact: Faculty are willing to share resources, clusters, and cycles. Research methodologies are surprisingly similar regardless of discipline (e.g. larger data sets; simulation studies; from shared memory to parallelism). Episodic nature of research computing allows for coordinated resource sharing.

Myths and Facts #3: Myth: distributed facilities can keep pace with demand. Fact: Lessons learned from the BioX Program: running out of space, cycles, cooling, and power. Cross-disciplinary facility economies of scale. Multi-disciplinary computing economies of scale.

Myths and Facts #4: Myth: Central computing facilities are bureaucratic and inflexible. Fact: Colocating and sharing models reduce overhead. Modularity in building, power, cooling, and support help create sustainability. Move from control to collaboration empowers faculty and reduces central cost. Faculty own the clusters – OIT will just cycle power on request.

Where is this going? 21st century cyberinfrastructure costs will dwarf 20th century ERP investments. Sustainability will be an economic necessity. Cloud/grid computing will affect investment horizons.

[CSG Spring 2008] Cyberinfrastructure Workshop – Jim Pepin

Disruptive Change –

Things creating exponential change – transistors, disk capacity, new mass storage, parallel apps, storage management, optics.

Federated identity (“Ken is a disruptive change”) team science/academics; CI as a tool for all scholarship.

Lack of diversity in computing architectures – x86/x64 has “won” – maybe IBM/Power or Sun/SPARC at the edges. Innovation is in the consumer space – game boxes, iPhones, etc.

Network futures – optical bypasses (which we’ve brought on ourselves by building crappy networks with friction). GLIF examples. Security is driving researchers away from campus networks. Will we see our networks become the “campus phone switch” of 2010?

Data futures – massive storage (really, really big); object-oriented (in some cases); preservation and provenance (how do we know the data is real?); distributed; blurring between databases and file systems. Metadata.

New Operating Environments – Operating systems in network (grids) not really OSs. How to build petascale single systems – scaling apps is the biggest problem. “Cargo cult” systems and apps. Haven’t trained a generation of users or apps people to use these new parallel environments.

In response to a question Jim says that grids work for very special cases, but are too heavyweight for general use. Cloud computing works in embarrassingly parallel applications. Big problems want a bunch of big resources that you can’t get.

The distinction is made between high throughput computing and high performance computing.

100s of teraflops on campus – how to tie into national petascale systems, all the problems of teragrid and VOs on steroids – network security friction points, identity management, non-homogenous operating environments.

Computation – massively parallel – many cores (doubling every 2-3 years). Massive collections of nodes with high speed interconnect – heat and power density, optical on chip technology. Legacy code scales poorly.
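Why legacy code scales poorly on many-core hardware is captured by Amdahl’s law (a standard result, added here for context rather than quoted from the talk): any serial fraction caps the achievable speedup no matter how many cores are added.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    # Amdahl's law: total speedup is limited by the serial fraction of the
    # program, no matter how many cores are thrown at the parallel part.
    return 1.0 / ((1 - parallel_fraction) + parallel_fraction / cores)

# code that is 95% parallel tops out below 20x, even on a thousand cores
for n in (8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 1))
```

This is why "many cores doubling every 2-3 years" helps only code whose serial bottlenecks have been re-engineered away.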

Vis/remote access – SHDTV-like quality (4K) enables true telemedicine and robotic surgery; massive storage ties into this.

Versus – old code, written on 360s or VAXes, vector optimized; static IT models – defending the castle of the campus; researchers don’t play well with others; the condo model evolving; will we have to get used to the two-port Internet? And thinking this is just for science and engineering – consider social science apps (e.g. education outcomes at Clemson – large data, statistics on a huge scale) or the Shoah Foundation at USC – many terabytes of video.

Vision/sales pitch – access to various kinds of resources – parallel high performance, flexible node configurations, large storage of various flavors, viz, leading edge networks.

Storage farms – diverse data models: large streams (easy to do); large numbers of small files (hard to do); integrating mandates (security, preservation); blurring between institutional data and personal/research data; storage spans external, campus, departmental, and local. The speed of light matters.

[CSG Spring 2008] Cyberinfrastructure Workshop – CI at UC San Diego

Cyberinfrastructure at UC San Diego

Elazar Harel

Unique assets at UCSD: CalIT2; SDSC; Scripps Institution of Oceanography

They have a sustainable funding model for the network. Allows them to invest in cyberinfrastructure without begging or borrowing from other sources.

Implemented ubiquitous Shibboleth and OpenID presence.

Formed a CI design team – joint workgroup.

New CI Network designed to provide 10 gig or multiples directly to labs. First pilot is in genomics. Rapid deployment of ad-hoc connections. Bottleneck-free 10 gig channels. Working to have reasonable security controls and be as green as possible.

Just bought two Sun Blackboxes – being installed tomorrow. Will be used by labs.

Chaitan Baru – SDSC

Some VO Projects – BIRN (www.birn.net) – NIH Biomedical Informatics Research Network – shares neuroscience imaging data. NEES (www.nees.org) Network for earthquake engineering simulations; GEON (www.geongrid.org) Geosciences network; TEAM (www.teamnetwork.org) field ecology data; GLEON (www.gleon.org) Global Lakes; TDAR (www.tdar.org) digital archaeology record; MOCA (moca.anthropgeny.org) comparative anthropogeny

Cyberinfrastructure at the speed of research – research moves very fast, and researchers think Google is the best tool they’ve ever used. In some cases “do what it takes” to keep up: take shortcuts; leverage infrastructure from other CI projects and off-the-shelf products. Difficult because it can be stressful on developers who take pride in creating their own, and engineers may think the PI is changing course too many times. In other cases “don’t get too far ahead” of the users – sometimes we build too much technology, and the user community may see no apparent benefit to the infrastructure being developed.

The sociology of the research community influences how you think about data.

Portal-based science environments. Support for resource sharing and collaboration. Lots of commonalities, including identity and access issues. Lots of them use the same technologies (e.g. GEON and others). Ways of accessing data and instruments. Lots of interest from scientists in doing server-side processing of data rather than just sharing whole data sets via FTP, e.g. LiDAR on the GEON portal; the OpenTopography model is an attempt to generalize that. The EarthScope data portal is another example – it includes SDSC, IRIS, UNAVCO (Boulder), and ICDP (Potsdam).
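The server-side-processing idea – run the computation where the data lives and return only the result – can be sketched as a bounding-box subset of a point cloud. The data layout and function below are illustrative, not the GEON portal’s actual interface.

```python
# Instead of shipping the entire LiDAR point cloud to the researcher, the
# portal evaluates the filter server-side and returns only the matching
# points. (Hypothetical sketch: real LiDAR services work on indexed,
# on-disk data, not Python lists.)

def subset_points(points, xmin, xmax, ymin, ymax):
    """points: iterable of (x, y, elevation) tuples."""
    return [p for p in points
            if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]

cloud = [(1.0, 1.0, 42.0), (5.0, 5.0, 40.0), (9.0, 2.0, 38.0)]
print(subset_points(cloud, 0, 4, 0, 4))  # only the first point falls inside
```

The payoff is bandwidth and time: a regional subset of a multi-terabyte survey can be a few megabytes.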

Cyberdashboards – live status of information as it’s being collected. Notification of events is also desirable.

Cyberdashboard for Emergency Response – collecting all 911 calls in California. Data mining of spatiotemporal data. Analysis of calls during the San Diego wildfires of October 2007. Wildfire evacuations – visualization of data from the Red Cross DisasterSafe database.

Cyberinfrastructure for Visualization

On-demand access to data – short lead times from request to readiness to rendering and display.

On-demand access to computing – online modeling, analysis and visualization tools

Online collaboration environments – software architecture, facility architecture.

SDSC/Calit2 synthesis center – conceived as a collaboration space to do science together – brings together high-performance computing; large-scale data storage; in-person collaboration; consultation. Has big HD screens, a stereoscopic screen, videoconferencing, etc. Used for workshops, classes, meetings, site visits. Needs tech staff to run it, and research staff to help with visualization, integration, and data mining. So far it has been on project-based funding; lately there’s been a recharge fee.

Calit2 stereo wall (C-Wall) – dual HD resolution (1920 x 2048 pixels) with JVC HD2k projectors.

Calit2 digital cinema theater – 200 seats, 8.2 sound, Sony SRX-R110, SGI Prism with 21 TB, 10GE to computers

The StarCAVE – 30 JVC HD2k (1920 x 1080) projectors.

225 megapixel hiperspace tiled display.

In response to a question from Terry Gray, Chaitan notes that the pendulum is swinging a bit in that PIs still want to own their own clusters, but they no longer want to run them – they want them housed and administered in data centers. Elazar notes that they’re trying to make the hardware immaterial – a few years from now they may all be in the cloud, but the service component to help researchers get what they need will remain on campus.

[CSG Spring 2008] Cyberinfrastructure Workshop – Virtual Organizations

Ken Klingenstein –

An increasing artifact of the landscape of scientific research, driven largely by the cost of new instruments.

Always inter-institutional, frequently international – presents interesting security and privacy issues.

Having a “mission” in teaching and a need for administration. All of these proposals end with “in the final year of our proposal three thousand students will be able to do this simulation”. Three thousand students did hit the Teragrid a few months back for a challenge – 50% of the jobs never returned.

Tend to cluster around unique global scale facilities and instruments.

Heavily reflected in agency solicitations and peer review processes.

Being seen now in arts and humanities.

VO Characteristics – distributed across space and time; dynamic management structures; collaboratively enabled; computationally enhanced.

Building effective VOs. Workshop run by NSF in January 2008. A few very insightful talks, and many not-so-insightful talks. http://www.ci.uchicago.edu/events/VirtOrg2008/

Fell into the rathole of competing collab tools.

Virtual Org Drivers (VOSS) – solicitation just closed. Studying the sociology – org life cycles, production and innovation, etc.

NSF Datanet – to develop new methods, management structures, and technologies. “Those of us who are familiar with boiling the ocean recognize an opportunity.”

COmanage environment – externalizes identity management, privileges, and groups. Being developed by Internet2 with Stanford as the lead institution. Apps being targeted: Confluence (done), Sympa, Asterisk, DimDim, Bedework, Subversion.
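A toy sketch of what “externalizing” groups means: the application keeps no user or group tables of its own and asks a collaboration-managed service instead. The class and membership data below are invented for illustration; COmanage’s real interfaces differ.

```python
# Hypothetical VO group service. In practice this would be backed by LDAP
# or a web service; the application (a wiki, say) only asks the question
# "is this user in this group?" and stores nothing locally.

class VOGroups:
    def __init__(self):
        self.groups = {
            "ligo-wiki-editors": {"alice@uni-a.edu", "bob@uni-b.edu"},
        }

    def is_member(self, user: str, group: str) -> bool:
        return user in self.groups.get(group, set())

vo = VOGroups()
print(vo.is_member("alice@uni-a.edu", "ligo-wiki-editors"))  # True
```

When a collaborator joins or leaves the VO, membership changes in one place and every integrated tool (wiki, mailing list, repository) sees it immediately.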

Two specimen VOs

LIGO-GEO-VIRGO (www.ligo.org)

Ocean Observing Initiative ( http://www.joiscience.org/ocean_observing )

The new order – stick sensors wherever you can and then correlate the hell out of them.

Lessons Learned – people collaborate externally but compete internally; time zones are hell; big turf issue of the local VO sysadmin – LIGO has 9 different wiki technologies spread out over 15 or more sites (collaboration hell). Diversity driven by autonomous sysadmins. Many instruments are black boxes – give you a shell script as your access control. Physical access control matters with these instruments. There are big science egos involved.

Jim Leous – Penn State – A VO Case Study.

Research as a process: lit search/forming the team; writing the proposal; funding; data collection; data processing; publish; archive.

Science & Engineering Indicators 2008

Publications with authors from multiple institutions grew from 41% to 65%. Coauthorship with foreign authors increased by 9% between 1995 and 2005.

How do we support this? Different collaborative tools. Lit search – RefWorks, Zotero, del.icio.us; research info systems – Kuali Research, home grown; proposals – wikis, Google Docs; etc. Lots of logins. COmanage moves the identity and access management out of individual tools and into the collaboration itself.

Need to manage attributes locally – not pollute the central directory with attributes for a specific collaboration effort.

What about institutions that don’t participate? LIGO – 600 scientists from 45 institutions.

LIGO challenges – data rates of 0.5 PB/yr across three detectors (> 1 TB/day); many institutions provide shared infrastructure (e.g. clusters, wikis, instrument control/calibration); international collaboration with other organizations; a typical researcher has dozens of accounts.
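A quick sanity check on those numbers (using 1 PB = 1000 TB):

```python
# 0.5 PB/year, spread over the year, comfortably exceeds 1 TB/day.
pb_per_year = 0.5
tb_per_day = pb_per_year * 1000 / 365  # 1 PB = 1000 TB
print(round(tb_per_day, 2))  # about 1.37 TB/day, consistent with "> 1 TB/day"
```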

Penn State Gravity Team implemented LIGO roster based on LDAP and Kerberos – Penn State “just went out and did it” – drove soul searching from LIGO folks – “why shouldn’t we do this?”. Led to LIGO Hackathon in January, which was very productive. Implemented Shibboleth, several SPs, Confluence, Grouper, etc.

Next steps are to leverage the evolving LIGO IAM infrastructure; establish a permanent instance of LIGO COmanage; encourage remaining institutions to join InCommon; and (eventually) detect a gravity wave?

Bernie Gulachek – Positioning University of Minnesota’s Research Cyberinfrastructure – forming a Virtual Org at Minnesota – the Research Cyberinfrastructure Alliance.

A group of folks who have provided research technology support – academic health center; college of liberal arts; minnesota supercomputer institute; library; etc.

Not (right now) a conversation about technology, but about organization, alliances, and partnerships. Folks not necessarily accountable to each other, but are willing to come together and change the way they think about things to achieve the greater common good.

Both health center and college of liberal arts came to IT to ask how to build sustainable support for research technology .

Assessing Readiness – will this be something successful, or a one-off partnership? What precepts need to be in place for partnership? The goal is to position the institution for computationally intensive research. They have a (short) set of principles for the Alliance.

Research support has been silo’ed – need to have a connection with a specific campus organization, and the researcher needs to bridge those individual organizations. The vision is to bring the silos together. Get research infrastructure providers talking together. Researcher consultations – hired a consultant.

Common Service Portfolio – Consulting Services; Application Support Services; Infrastructure Services – across the silos. Might be offered differently in different disciplines. Consulting Services are the front door to the researcher.

Group is meeting weekly, discussing projects and interests.

[CSG Spring 2008] Cyberinfrastructure Workshop – Bamboo Project

I’m in Ann Arbor for the Spring CSG meeting. The first day is a workshop focusing on cyberinfrastructure issues.

The NSF Atkins report defines CI as high-performance computing; data and information; observation and measurement; interfaces and visualization; and collaboration services. Today will concentrate on the last two.

The workshop agenda will cover interdisciplinary science; virtual organizations; visualization; mapping scientific problems to IT infrastructures; and getting CI funded.

Chad Kainz from University of Chicago is leading off, talking about the Bamboo Planning Project. The Our Cultural Commonwealth report from ACLS served the same kind of function in the humanities that the Atkins report did in the sciences.

Chad starts off with a scenario of a faculty member in a remote Wyoming institution who creates a mashup tool for correlating medieval text with maps, and publishes that tool, which gets picked up for research by someone in New Jersey, where it is used for scholarly discourse. The Wyoming faculty member then uses the fact of that discourse in her tenure review.

What if we could make it easier for faculty to take that moment of inspiration to create something and share it with others? How do we get away from the server under the desk and yet another database?

How can we advance arts and humanities research through the development of shared technology services?

There are a seemingly unending number of humanities disciplines each with only a handful of people – you don’t build infrastructure for a handful of people. One of the challenges is how we boil this down to commonalities to enable working together. Day 2 of the Berkeley Bamboo workshop showed that unintentional normalization will lead to watering down the research innovations. The next workshop will start by trying to look at the common elements.

About eighty institutions participating in the first set of workshops.

One idea is to have demonstrators and pilot projects between workshops to test ideas, explore commonalities, demonstrate shared services, and experiment with new application models. There is one project exposing textual analysis services from the ARTFL project that will probably be the first example.