
Any RSS to Google Maps with Python

November 8th, 2008 Posted in Antarctica, Computers by Ken Mankoff

Desire: I want to keep in touch with people during my upcoming trip to Antarctica, and I want you to get updates in whatever format you prefer (via email, on my site, with RSS, text messages, on a custom map or in a standard Google Map). Last time I went to Antarctica I blogged quite a bit (see 64 entries in ANDRILL category), including photos and videos.

Problem: I’ll be on a boat with no internet connection and a twice-per-day satellite connection limited to 25 KB/day of email. So for a 60 day cruise, I’ll be allowed slightly less than 1.5 megabytes transfer over the entire trip. The text of this post is around 1 KB, so I can write roughly 10 times this and read about 10 times this each day. I guess it isn’t that bad…
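
The budget arithmetic above can be checked in a few lines. This is only a back-of-the-envelope sketch using the figures quoted in this post; the variable names are made up for illustration.

```python
# Back-of-the-envelope email budget, using the numbers quoted in the post.
DAILY_LIMIT_KB = 25   # satellite email allowance per day
CRUISE_DAYS = 60      # length of the cruise
POST_SIZE_KB = 1      # approximate size of this post's text

total_kb = DAILY_LIMIT_KB * CRUISE_DAYS
total_mb = total_kb / 1024.0                     # binary megabytes
posts_per_day = DAILY_LIMIT_KB // POST_SIZE_KB   # sent plus received

print("Total allowance: %d KB (about %.2f MB)" % (total_kb, total_mb))
print("Post-sized messages per day, sent plus received: %d" % posts_per_day)
```

Writing 10 and reading 10 post-sized messages a day therefore leaves a little headroom under the daily cap.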

Solution: I’ve set up a system that provides the following behaviors, given that I can send email to one or more recipients via the To:, Cc:, and Bcc: fields. Recipients can be individuals, this blog, or Twitter. Emails sent to individuals will go to their inbox. Emails to this blog will be posted on the front page and show up on RSS. Emails to Twitter can be read on Twitter, your phone, or RSS. Any emails that end up in RSS and have geographic coordinates in them will be geocoded on my map.

None of this requires any programming skill except the mapping. I’ve written a small Python script that tracks an RSS feed and updates the map if any posts contain geographic coordinates. If you are a programmer, read on to learn how it was done. If you aren’t, enjoy the map.

Remaining Issue: All this is one-way, me-to-you. I won’t see comments made on this site, and they won’t be seen by anyone else because I moderate them (unless you’ve previously commented), so this site will be a one-way communication while I’m on the boat. If you want to communicate with me, the only way is by doing a direct Tweet (@mankoff) or private Tweet (d mankoff). Any emails will be read when I get off the boat in March.

Implementation: I’ve set up a Google Reader tag and made the RSS feed of that tag public. If I come across any feeds (or individual posts) that I want parsed by this program, I just add them to Google Reader and apply that tag. For now, I’m just using my blog posts and my Twitter posts, and only the subset of those that contain geographic coordinates. I’ve defined “geographic coordinates” loosely. On Twitter, it is any tweet that begins with “+-xx.xxx, +-yy.yyyy”. On other RSS items that have content beyond the title, it is the string “lat:+-xx.xxx” anywhere in the body followed by “lon:+-yy.yyy”. I coded this while in New York City, so by adding this text (lat:40.723701, lon:-73.989616) I can make this post show up on the map.
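
To make the two coordinate formats concrete, here is a minimal sketch of how they could be matched. It is not the script itself: the names `TWEET_RE`, `LAT_RE`, `LON_RE`, `parse_tweet` and `parse_body` are made up for illustration, and it assumes the tweet form must start at the beginning of the text and that “lon:” must come after “lat:”, as described above.

```python
import re

# Tweet-style: the text starts with "lat, lon"; the rest is the message.
TWEET_RE = re.compile(r"\s*([-+]?\d+\.\d+),?\s?([-+]?\d+\.\d+),?\s?(.*)", re.DOTALL)
# Body-style: "lat:..." anywhere in the text, followed later by "lon:...".
LAT_RE = re.compile(r"lat:\s?([-+]?\d+\.\d+)")
LON_RE = re.compile(r"lon:\s?([-+]?\d+\.\d+)")

def parse_tweet(text):
    """Return (lat, lon, message) if text begins with coordinates, else None."""
    m = TWEET_RE.match(text)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2)), m.group(3)

def parse_body(text):
    """Return (lat, lon) if text contains lat:... followed by lon:..., else None."""
    lat = LAT_RE.search(text)
    if lat is None:
        return None
    lon = LON_RE.search(text, lat.end())  # only look after the latitude
    if lon is None:
        return None
    return float(lat.group(1)), float(lon.group(1))

print(parse_tweet("40.723701, -73.989616 Hello from New York"))
print(parse_body("adding this text (lat:40.723701, lon:-73.989616) to a post"))
```

Returning None for anything that does not match makes it easy for the caller to skip items without coordinates.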

The code below goes through these steps:

  • Get the feed and for each item…
  • Check if we have already parsed this item
  • Check if it has geographic coordinates embedded in it
  • Record the ID (so we won’t parse it again)
  • Write out a KML placemark that contains the coordinates and the content

Python

I’m using the feedparser package.

#!/usr/bin/env python
# easy_install feedparser, then, code...

import feedparser, re, shutil, os, datetime, time

# subscribe to my personal feed, actually a tag in Google Reader
# anything I want to show up on the map, just tag the item or the feed
d = feedparser.parse("http://www.google.com/reader/public/atom/user%2F08422725465205392443%2Flabel%2Fmine")
for item in reversed(d.entries):

    # check if we've already processed it
    fp = open( 'processed.txt', 'r' )
    alreadyDone = False
    for line in fp:
        if line[:-1] == item.id:
            alreadyDone = True
            break
    fp.close()
    if alreadyDone == True:
        continue

    if item.description == item.title: # Twitter, etc.
        geocode = re.search("([-+]?\d+\.\d+)[,]?[\ ]?([-+]?\d+\.\d+)[,]?[\ ]?(.*)", item.title)
        if geocode != None:
            lat = geocode.group(1)
            lon = geocode.group(2)
            content = geocode.group(3)
    else: # normal RSS with lat/lon embedded anywhere in the body
        geocode = re.search(".*lat:[\ ]?([-+]?\d+\.\d+).*", item.content[0].value)
        geocode1 = re.search(".*lon:[\ ]?([-+]?\d+\.\d+).*", item.content[0].value)
        if geocode != None and geocode1 != None:
            lat = geocode.group(1)
            lon = geocode1.group(1)
            content = item.content[0].value
        else:
            geocode = None # need both lat and lon; otherwise skip this item

    if geocode == None:
        continue

    # At this point we have lat, lon, content of our geocoded item

    # record the id, so we don't re-parse it
    fp = open( 'processed.txt', 'a' )
    fp.write( item.id + "\n" )
    fp.close()

    # Build this placemark
    shutil.copy('rssbot.kml', 'save/rssbot.kml.'+str(time.mktime(datetime.datetime.now().timetuple())))
    kml = (
        '<Placemark>\n'
        '  <name>'+item.date+' @ ('+lat+','+lon+') </name>\n'
        '  <Point>\n'
        '    <coordinates>%s,%s</coordinates>\n'
        '  </Point>\n'
        '  <description>\n'
        '    <![CDATA[\n'
        '      <br>'
        '      %s\n'
        '      <br><hr>'
        '      <center>More info <a href="http://spacebit.org">here</a>.</center>\n'
        '    ]]>\n'
        '  </description>\n'
        '</Placemark>\n'
        ) %(lon, lat, content)

    # Append this placemark
    f = open( 'rssbot_placemarks.kml', 'a' )
    f.write( kml.encode("utf-8") )
    f.close()

# Done with the latest update. Transfer placemarks to full KML file.
f = open( 'rssbot_placemarks.kml', 'r' )
placemarks = f.read()
f.close()
f = open( 'rssbot.kml', 'w' )
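# Wrap the placemarks in a KML document (assumed standard header and footer)
f.write( '<?xml version="1.0" encoding="UTF-8"?>\n' )
f.write( '<kml xmlns="http://earth.google.com/kml/2.2"><Document>\n' )
f.write( placemarks )
f.write( '</Document></kml>\n' )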
f.close()
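
The script above reopens processed.txt for every feed item and expects the file to exist before the first run. One possible alternative, sketched here under the assumption that the list of IDs fits comfortably in memory, is to load the IDs into a set once and append new ones as they are handled. The helper names `load_processed` and `mark_processed` are hypothetical and not part of the script.

```python
import os

def load_processed(path):
    """Return the set of item IDs recorded in path (empty if no file yet)."""
    if not os.path.exists(path):
        return set()
    with open(path) as fp:
        return set(line.rstrip("\n") for line in fp)

def mark_processed(path, item_id, seen):
    """Append item_id to the file and remember it in the in-memory set."""
    with open(path, "a") as fp:
        fp.write(item_id + "\n")
    seen.add(item_id)
```

Inside the feed loop, the check would then be `if item.id in seen: continue`, with `mark_processed('processed.txt', item.id, seen)` called once an item has been geocoded.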

KML

The rssbot.py above updates rssbot_placemarks.kml with the core data, and then at the end copies it to rssbot.kml and wraps it in three extra lines. In order to keep this file simple and in order to allow multiple views of this file, I’ve created a KML wrapper that accesses this via a NetworkLink. People can then load the wrapper in Google Earth, and get updates automatically without reloading any file.

The wrapper KML (NBP09-01.kml) follows, with line 10 linking to the core (rssbot.kml):

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
        <name>NBP09-01.kml</name>
        <Folder>
          <name>NBP09-01</name>
        <NetworkLink>
          <name>NBP09-01</name>
          <visibility>1</visibility>
          <Link><href>http://spacebit.org/maps/NBP09-01/rssbot.kml</href></Link>
        </NetworkLink>
        </Folder>
</Document>
</kml>
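
If you want to reuse this setup for another trip, the wrapper can be generated instead of edited by hand. This is a minimal sketch that fills in the same structure as the listing above; `WRAPPER_TEMPLATE` and `make_wrapper` are hypothetical names, and the name and URL passed in are whatever your own setup uses.

```python
from xml.sax.saxutils import escape

# Same structure as the NBP09-01.kml wrapper, with the name and link left open.
WRAPPER_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
  <name>%(name)s.kml</name>
  <Folder>
    <name>%(name)s</name>
    <NetworkLink>
      <name>%(name)s</name>
      <visibility>1</visibility>
      <Link><href>%(href)s</href></Link>
    </NetworkLink>
  </Folder>
</Document>
</kml>
"""

def make_wrapper(name, href):
    """Return wrapper KML text that points a NetworkLink at href."""
    # escape() protects against &, < and > in the name or URL
    return WRAPPER_TEMPLATE % {"name": escape(name), "href": escape(href)}

print(make_wrapper("NBP09-01", "http://spacebit.org/maps/NBP09-01/rssbot.kml"))
```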

Google Map

Finally, this KML needs to be displayed in a map. You can see the Google Map code by viewing the page and selecting “View Source” in your browser menu. There are very few modifications to this map from a standard Google Maps Hello World map:

  • Remove some controls {map.removeMapType(…);}
  • Some CSS so the map takes 100% of the page {the two height:"100%"s}
  • Load my wrapper KML over the network {map.addOverlay(new GGeoXml("http://spacebit.org/maps/NBP09-01/NBP09-01.kml"));}

Putting It All Together

To complete it:

  • Set up the Google Map and make sure it loads correctly when empty.
  • Add the network layer with a test placemark.
  • Add the 2nd network layer (the 1st being the wrapper, the 2nd being the content), with test placemarks in the 2nd layer.
  • Create an RSS feed with multiple items, some of which contain the string “lat:xx.xx, lon:yy.yyy” where “x” and “y” are numbers.
  • Run the rssbot manually and check that the output KML is created and contains the correct items.
  • When debugging, remember that the ‘processed.txt’ file stops items from being processed multiple times; delete it (or not) depending on whether you want items re-processed.
  • Set up a cron job to run rssbot.py however often you like.
  • Send yourself a tweet with and without correctly encoded coordinates.
  • Send yourself a tweet via email.
  • Update your blog with coordinates embedded in the post.
  • Check all other ways of updating the RSS feed that the script parses.

Enjoy.
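
Checking the output KML by eye gets tedious, so here is a minimal sketch of a helper that parses the file and lists what it found. It assumes the output uses the same KML 2.2 namespace as the wrapper shown earlier; `KML_NS` and `list_placemarks` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumption: the output file declares the same namespace as the wrapper KML.
KML_NS = "{http://earth.google.com/kml/2.2}"

def list_placemarks(kml_bytes):
    """Parse KML and return a (name, lon, lat) tuple for every Placemark."""
    root = ET.fromstring(kml_bytes)  # raises ParseError if not well-formed
    results = []
    for pm in root.iter(KML_NS + "Placemark"):
        name = pm.findtext(KML_NS + "name", default="").strip()
        coords = pm.findtext("%sPoint/%scoordinates" % (KML_NS, KML_NS), default="")
        if not coords.strip():
            continue  # placemark without a point; nothing to report
        lon, lat = [float(v) for v in coords.strip().split(",")[:2]]
        results.append((name, lon, lat))
    return results
```

A quick check after a manual run could be `print(list_placemarks(open('rssbot.kml', 'rb').read()))`. Note that KML stores coordinates as longitude first, then latitude.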
