Tony asked me yesterday what blogs and other news sources I read. I was going to point him to my list of “blogs I try to read regularly”, which is a blogroll from Bloglines, when I realized that my Bloglines subscription list and my Google Reader subscription list (which was originally derived by exporting the OPML list of feeds from Bloglines) had drifted out of sync with each other.
Well, I thought, this should be simple! I’ll just export the OPML lists from both readers, sort them, compare them using diff or something, bring them back into sync, and clean them up a bit while I’m at it.
No such luck.
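Bloglines’ OPML file leads with outline text= on folder lines (folders get no title= at all) and with outline title= on individual subscription lines. Google’s OPML file does the reverse: title= comes first on folders, and text= comes first on subscriptions. The attributes are mostly the same, but the order differs, so a line-by-line diff sees every entry as changed.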
Bloglines puts all the data for a single subscription on a single line. Google separates elements onto different lines.
Here’s an example:
Bloglines:
<outline text="Blogs" >
<outline title="Accidental Pedagogy" text="Accidental Pedagogy" htmlUrl="http://accidentalpedagogy.typepad.com/accidental_pedagogy/" type="rss" xmlUrl="http://accidentalpedagogy.typepad.com/accidental_pedagogy/atom.xml" />
Google:
<outline title="blogs" text="blogs">
<outline text="apophenia" title="apophenia" type="rss"
xmlUrl="http://www.zephoria.org/thoughts/index.xml" htmlUrl="http://www.zephoria.org/thoughts/"/>
Guess it's time to break out some Python.
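A minimal sketch of one way to do it, assuming both exports are well-formed XML: rather than diffing text, parse each file and key every subscription on its xmlUrl, so attribute order and line breaks stop mattering. The function names feeds and compare are made up for illustration, and URLs are compared exactly as written (no normalizing of case or trailing slashes).

```python
import xml.etree.ElementTree as ET


def feeds(opml_text):
    """Map each subscription's xmlUrl to a display name, skipping folders."""
    root = ET.fromstring(opml_text)
    result = {}
    for node in root.iter("outline"):
        url = node.get("xmlUrl")
        if url is None:
            continue  # folder outlines have no xmlUrl
        # Either reader may favor title= or text=, so accept whichever exists.
        result[url] = node.get("title") or node.get("text") or url
    return result


def compare(bloglines_opml, google_opml):
    """Return (names only in Bloglines, names only in Google), each sorted."""
    a = feeds(bloglines_opml)
    b = feeds(google_opml)
    only_a = sorted(a[u] for u in a.keys() - b.keys())
    only_b = sorted(b[u] for u in b.keys() - a.keys())
    return only_a, only_b
```

To try it, read each export as bytes and pass the contents to compare, for example open("bloglines.opml", "rb").read() and open("google.opml", "rb").read(), where both filenames are placeholders for wherever the exports were saved.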