Daniel Chudnov, Bergis Jules, Daniel Kerchner, Laura Wrubel – George Washington University
Save the time of the researcher:
- There is demand from researchers across the disciplinary spectrum for access to historical Twitter data. The Library of Congress is archiving Twitter but is not providing access.
- How are researchers collecting Twitter data today? One example from GWU, done entirely by hand: Google Reader (RSS feeds from Twitter), copy and paste into Excel, add coding, then pull into SPSS and Stata. Too much work for too little data (not to mention tools that no longer exist). Copy-and-paste into Excel doesn't scale, and this is not an isolated case: over 5,000 theses and dissertations since 2010 have used Twitter data.
- What researchers ask for: specific users and keywords; basic values (user, date, text, retweet and follower counts); thousands of tweets, not millions; delimited files they can import into coding software; and historical data (a minimal sketch of producing such a delimited file appears after this list).
- Getting historical data usually means buying it from a licensed Twitter reseller: DataSift, Gnip, NTT Data, or Topsy (recently purchased by Apple). Some research platforms for working with Twitter data exist. The data is not cheap, but the resellers are friendly and receptive to working with researchers (hoping to hear about products developed specifically for the academic community). They are used to dealing with customers who can handle very large datasets.
- The university archives are using this approach to document student organizations, which are highly active on Twitter. Since March they have tracked 329 accounts, collecting over 299k tweets.
- The Social Feed Manager software is available on GitHub. It is built on the Django framework with the django-social-auth and tweepy libraries (sketches of the tweepy and Django pieces follow this list).
- Just capturing a single feed doesn't capture the full depth of interaction around an account; they want to expand the tool further and draw on other sources as well (see the last sketch below).
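As a rough sketch of the kind of collecting described above (not Social Feed Manager's actual code), the following pulls one account's recent tweets with tweepy and writes the basic values researchers ask for to a delimited file. The credentials and the account name "gwtweets" are placeholders:

```python
import csv
import tweepy

# Placeholder credentials -- register an app with Twitter and
# substitute your own keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

# Write the basic values researchers ask for (user, date, text,
# retweet and follower counts) to a CSV they can import into
# coding software such as SPSS or Stata.
with open("tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["screen_name", "created_at", "text",
                     "retweet_count", "followers_count"])
    # "gwtweets" is a hypothetical account; thousands of tweets,
    # not millions, is the scale researchers actually need.
    for tweet in tweepy.Cursor(api.user_timeline,
                               screen_name="gwtweets").items(1000):
        writer.writerow([tweet.user.screen_name,
                         tweet.created_at.isoformat(),
                         tweet.text.replace("\n", " "),
                         tweet.retweet_count,
                         tweet.user.followers_count])
```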
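On the Django side, a harvester needs a model to persist what tweepy returns. A minimal sketch with illustrative field names; the real Social Feed Manager schema on GitHub is more elaborate:

```python
from django.db import models


class Tweet(models.Model):
    # Illustrative fields only; see the Social Feed Manager
    # repository on GitHub for the actual schema.
    twitter_id = models.BigIntegerField(unique=True)  # Twitter's tweet ID
    screen_name = models.CharField(max_length=50)
    created_at = models.DateTimeField()
    text = models.TextField()
    retweet_count = models.IntegerField(default=0)
    followers_count = models.IntegerField(default=0)  # at harvest time

    class Meta:
        ordering = ["-created_at"]
```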
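Finally, one way to go beyond a single feed is to also harvest tweets that mention an account, capturing replies and conversation the timeline alone misses. A sketch under the same placeholder credentials:

```python
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

# Search for @-mentions of a (hypothetical) account. Note that
# Twitter's standard search only reaches back about a week, which
# is exactly why historical data must come from resellers.
# (In older tweepy releases this method is named api.search.)
for tweet in tweepy.Cursor(api.search_tweets, q="@gwtweets").items(200):
    print(tweet.user.screen_name, tweet.created_at, tweet.text)
```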