Brian Athey – Big Data 2011 [ #rdlmw ]

Brian Athey is a professor in the Medical School at the University of Michigan.

It’s difficult to incentivize researchers to share data.

Agile data integration is an engine that drives discovery.

Developing a personal health system requires combining genomic data with data extracted from the individual's clinical record.

There’s a disconnect between classic IT’s “command and control” approach and what actually happens in research labs. We want to achieve a focused collaboration balancing high levels of focus and participation.

Next-gen sequencing is turning out around 10 terabytes per day at Michigan, from 1,500 users.

In 2006 there was a knee in the curve where it became more economical to generate the genomic data than to store it. We have to make decisions about what we store – we can’t save everything.

Brian is working on a Federated Enterprise Data Warehouse that stores both clinical and research data. There's an "honest broker" that mediates which data is accessible to the research side.
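The "honest broker" pattern can be sketched in a few lines: researchers never query the clinical store directly; the broker answers their queries after stripping direct identifiers and substituting a stable pseudonym. This is a minimal illustration, not Michigan's actual system — the field names and the pseudonym scheme are hypothetical.

```python
import hashlib

# Hypothetical clinical store; field names are illustrative only.
CLINICAL_STORE = [
    {"mrn": "100234", "name": "J. Doe", "dob": "1961-04-02",
     "diagnosis": "T2 diabetes", "variant": "TCF7L2 rs7903146"},
]

IDENTIFIERS = {"mrn", "name", "dob"}  # fields the broker withholds


def broker_query(diagnosis):
    """Return de-identified records matching a diagnosis.

    The broker, not the researcher, touches the identified data; the
    researcher sees only the de-identified view it returns.
    """
    results = []
    for record in CLINICAL_STORE:
        if record["diagnosis"] == diagnosis:
            # Drop direct identifiers.
            safe = {k: v for k, v in record.items() if k not in IDENTIFIERS}
            # Stable pseudonym so the same patient links across queries
            # without exposing the MRN (a real broker would use a keyed,
            # access-controlled mapping, not a bare hash).
            digest = hashlib.sha256(record["mrn"].encode()).hexdigest()[:8]
            safe["subject_id"] = f"S-{digest}"
            results.append(safe)
    return results


for row in broker_query("T2 diabetes"):
    print(row)
```

The point of the design is that trust is concentrated in one audited component: the research side can join genomic and clinical facts via `subject_id` while the identifying fields never leave the broker.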

The PCAST NITRD "Big Data" report from November has a list of recommendations.

We are all challenged by having to bring heterogeneous data together. Brian is working with Johnson & Johnson on something called tranSMART – J&J have over 400 pharma research databases.

Clinicians have workflow – researchers don't.

Discussion items:
IT doesn’t own the problem.
The rise of “architecture”
Data governance – who owns the data? Bring them into the room. But there also have to be top-down convenors.
Privacy, security, confidentiality – the idea of the “honest broker” could be a model.
Cost and value-centered models – if we remain just a cost center we’re cooked.

Question – why can't we keep all the data? The "Best Buy conundrum": why do you charge me so much for storage when I can get it cheaply elsewhere? Because it takes money to curate and level out the chaos. Maybe we should let the researchers decide what stays and what goes. The questioner, who deals with crystallography data and works with people handling NASA data, says they've learned that getting rid of raw data is a huge mistake. Vijay notes that hardware is now only 5% of the cost of storage – it's people and facilities that cost.

