Saturday, June 8, 2013

Object Oriented Crawls

A quick update

Today I rewrote the crawling script in an object-oriented fashion, which took about 7 hours. I had told Ann I would do this later, since the top priority is starting the newspaper crawls, but I figured that if I could get a working version by the end of today, I'd have killed two birds with one stone. The old crawling scripts are still in use for the language crawls, but the way they are written would have made incorporating a new list of newspapers an extremely involved process. I am testing the new version on the newspapers I have culled so far from newspapermap.com, and all is going well. I'm storing the data on a12, as per Carl's suggestion. Note that I have changed my dating convention. Newspaper crawls are now labeled:
"[month]_[day]_[year]_npsnap".
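
For reference, here is a minimal sketch of how a label in that form could be generated. The function name crawl_label is hypothetical, and I'm assuming numeric month and day without zero padding and a four-digit year; the convention above doesn't pin those details down, so adjust as needed.

```python
from datetime import date


def crawl_label(crawl_date: date) -> str:
    """Build a newspaper-crawl label of the form [month]_[day]_[year]_npsnap.

    Assumption: month and day are plain integers (no zero padding) and the
    year has four digits. The function name is illustrative only.
    """
    return f"{crawl_date.month}_{crawl_date.day}_{crawl_date.year}_npsnap"


# Label for a crawl run on the date of this post.
print(crawl_label(date(2013, 6, 8)))  # 6_8_2013_npsnap
```

If the labels need to sort chronologically as directory names, a zero-padded, year-first variant would sort better, but that would be a different convention from the one described here.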

Time for bed now
