I ran out of battery yesterday and didn’t get these posted, but here are some notes from a series of presentations on how people are actually doing science on grids. I probably got some of these details wrong, as about 90% of what was said was over my head – and people say us computer types are incomprehensible 🙂
David Baker, a biochemist from U Washington, talked about the Rosetta@home project, where they’re using distributed desktops to search for lowest-energy protein structures. There are about 100k machines currently enrolled in the project.
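The basic idea – many volunteer machines independently sampling conformations and reporting back their best candidate – can be sketched with a toy example. This is purely illustrative: the energy function, work-unit size, and function names below are made up, not Rosetta's actual algorithm (which uses far more sophisticated sampling and scoring).

```python
import random

def toy_energy(conformation):
    # Toy stand-in for a real protein energy function:
    # sum of squared "torsion angles", minimized at all zeros.
    return sum(angle ** 2 for angle in conformation)

def work_unit(seed, n_samples=1000, n_angles=5):
    """One volunteer machine's job: sample random conformations
    and return the lowest-energy one it found."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        c = [rng.uniform(-3.14, 3.14) for _ in range(n_angles)]
        e = toy_energy(c)
        if best is None or e < best[0]:
            best = (e, c)
    return best

# The project server hands out seeds to volunteers and keeps
# the global minimum across all returned results.
results = [work_unit(seed) for seed in range(20)]
global_best = min(results)
print(global_best[0])
```

The appeal for desktop grids is that each work unit is embarrassingly parallel – no volunteer needs to talk to any other, only to the server.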
Robert Riggleman from Wisconsin (the other UW) is talking about the use of distributed parallel computing to study anti-plasticization of polymers. They’ve used over 75 years of CPU time since April of 2006. They use GLOW, Wisconsin’s centralized high-performance computing facility.
Margaret Romine, from PNNL, is talking about the problems of dealing with all the data generated by rapid sequencing of genomes. She’s using the GNARE/PUMA2 software developed at Argonne Lab. The software runs against every genome that’s out there to gather evidence. Sequencing a genome is now fast, but annotating it is slow – typically a year by manual methods. She’s looking for ways to better automate the annotations, particularly in identifying possibly bad matches.
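The flagging idea – automatically assign the annotation of the best-matching known sequence, but mark weak matches for human review – can be sketched as follows. This is a hypothetical toy, not the actual GNARE/PUMA2 logic: the identity measure, the 0.8 threshold, and all names here are my own invention.

```python
def annotate(gene_seq, reference):
    """Toy automated annotator: assign the annotation of the
    best-matching reference sequence, and flag weak matches
    ("possibly bad") for manual review."""
    def identity(a, b):
        # Fraction of agreeing positions over the shorter sequence
        # (a real pipeline would use a proper alignment, e.g. BLAST).
        n = min(len(a), len(b))
        return sum(a[i] == b[i] for i in range(n)) / n

    best_name, best_score = max(
        ((name, identity(gene_seq, seq)) for name, seq in reference.items()),
        key=lambda pair: pair[1])
    return {"annotation": best_name,
            "score": best_score,
            "needs_review": best_score < 0.8}  # possibly bad match

# Illustrative reference set with two known genes.
reference = {"kinase": "ATGGCGTTA", "transporter": "TTGACCGGA"}
result = annotate("ATGGCATTA", reference)
print(result)
```

The point of the review flag is exactly the automation problem she raised: the cheap part is the matching, the expensive part is knowing which matches to distrust.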
Oliver Gutsche from Fermilab is talking about high energy physics and the Large Hadron Collider, which will be used to study proton-proton collisions. They compare simulated data to real data – they’re talking about 6 petabytes of data in 2008. Core CMS infrastructure includes a data bookkeeping service (DBS – a catalog of available datasets), a data location service (which tracks which data is stored at what site), and the Trivial File Catalog.
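The division of labor between the two catalogs can be sketched with a toy lookup. The dataset name, site names, and function below are illustrative stand-ins, not the real CMS schemas or APIs: one catalog answers "what files make up this dataset?", the other "which sites hold a copy?".

```python
# Toy bookkeeping catalog: dataset name -> constituent files.
data_bookkeeping = {
    "/SimulatedMinBias/2008": ["file_001.root", "file_002.root"],
}

# Toy location catalog: dataset name -> sites holding a copy.
data_location = {
    "/SimulatedMinBias/2008": ["T1_FNAL", "T2_DESY"],
}

def find_dataset(name):
    """Answer the two questions a job scheduler asks before
    dispatching work: what is in this dataset, and where is it?"""
    return {"files": data_bookkeeping.get(name, []),
            "sites": data_location.get(name, [])}

info = find_dataset("/SimulatedMinBias/2008")
print(info["sites"])
```

Keeping the two mappings separate means replicas can move between sites without rewriting the dataset's definition – the bookkeeping entry stays fixed while the location entry changes.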