I came across Ankit Ahuja’s blog today and saw his Twitter Wordle blog post. For those of you who don’t know, Wordle is a simple online application that creates word clouds from text files or websites, giving more prominence to words that are used more often. I wanted to make my own but couldn’t find the right resources on the site, so I ended up doing a Google search for a script that would let me download all of my tweets into a text file. I found a Python script by Zach Seifts, which I ran after downloading BeautifulSoup, a Python HTML/XML parser that the script requires. Nick helped me tweak it a little so that it worked. Here is the code that I used:
import time
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup

# Replace USERNAME with your twitter username
url = u'http://twitter.com/USERNAME?page=%s'
tweets_file = open('tweets', 'w')

for x in range(10*10000):
    f = urlopen(url % x)
    text = f.read()
    # rejoin the split "sc'+'ript" string in the page source before parsing
    text = text.replace("sc'+'ript", "script")
    soup = BeautifulSoup(text)
    f.close()
    # each tweet's text sits in a <span class="entry-content">
    tweets = soup.findAll('span', {'class': 'entry-content'})
    # an empty page means we have run out of tweets
    if len(tweets) == 0:
        break
    for t in tweets:
        tweets_file.write(t.renderContents() + '\n')
    # being nice to twitter's servers
    time.sleep(5)
    print "working...Page",x

tweets_file.close()
This exported all of my tweets into a text file that included all @ replies and HTML tags. Since Wordle would easily pick them up, I had to get rid of all HTML tags and @’s so that they wouldn’t dominate the word cloud. To do so, I used Emacs to create macros that automatically found and deleted them, leaving only raw text that I could plug into Wordle. The result is this glorious Twitter Wordle cloud à la @chloester:
Twitter Wordle
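For anyone who doesn't use Emacs, the cleanup step (stripping the HTML tags and @ replies) could probably be scripted as well. This is only a rough sketch and not what I actually ran: the names `clean_tweet` and `clean_file` are made up for illustration, and it assumes the exported file has one tweet per line.

```python
import re

# Hypothetical helpers; the original cleanup was done with Emacs macros.
TAG_RE = re.compile(r"<[^>]+>")    # any HTML tag, e.g. <a href="...">
MENTION_RE = re.compile(r"@\w+")   # @username mentions


def clean_tweet(line):
    """Return one tweet with HTML tags and @mentions removed."""
    # Strip tags first, so that '@<a href="/nick">nick</a>' becomes '@nick'
    # and can then be matched as a mention.
    without_tags = TAG_RE.sub("", line)
    without_mentions = MENTION_RE.sub("", without_tags)
    # Collapse the whitespace left behind by the removals.
    return " ".join(without_mentions.split())


def clean_file(src_path, dst_path):
    """Write a cleaned copy of src_path (one tweet per line) to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            cleaned = clean_tweet(line)
            if cleaned:  # skip lines that were nothing but tags or mentions
                dst.write(cleaned + "\n")
```

Calling something like `clean_file('tweets', 'tweets_clean.txt')` would leave a file that can be pasted into Wordle (the output filename is just an example). HTML entities such as `&amp;` are left untouched by this sketch, so they may still need a manual pass.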