Building on a Ruby scraper that Steve Faulkner whipped together on Saturday during a City Camp Madison session, here’s my hacked-together attempt at a Python scraper using Beautiful Soup and Requests.
It pulls data from the Madison and Dane County Public Health website to show which beaches are open and which are closed.
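Here is a minimal sketch of what a Requests + Beautiful Soup scraper like this might look like. It is not the script itself. The page structure is an assumption: it supposes each beach is a table row whose first cell holds the name and whose second cell holds the status. `fetch_html`, `parse_beaches`, and `format_beach` are hypothetical names, and the real page's URL and markup would need checking before using it.

```python
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page with Requests and return its HTML text."""
    import requests  # imported here so the parser can be tried offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text


def parse_beaches(html):
    """Return (beach name, status) pairs from a status table.

    ASSUMPTION (hypothetical markup): each beach is a <tr> whose first <td>
    holds the name and whose second <td> holds "Open", "Closed", or nothing.
    """
    soup = BeautifulSoup(html, "html.parser")
    beaches = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # skip header rows (which use <th>) and malformed rows
        # Join nested text with spaces so names like "Memorial Union (Pier)" stay intact.
        name = cells[0].get_text(" ", strip=True)
        status = cells[1].get_text(" ", strip=True).lower()
        beaches.append((name, status))
    return beaches


def format_beach(name, status):
    """Render one beach in the 'Name - (status)' style of the terminal output."""
    return f"{name} - ({status})"
```

Usage would be something like `for name, status in parse_beaches(fetch_html(url)): print(format_beach(name, status))`, with `url` pointing at the Public Health page. Keeping the download separate from the parsing makes it easy to test the parser against a saved copy of the page.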
The terminal output looks like:
(data)(learning-scrape)… python beaches-scrape.py
BB Clarke - (open)
Bernies - (open)
Brittingham - (open)
Esther - (open)
Goodland County Park - (closed)
Governor Nelson State Park - ()
Hudson - (open)
James Madison - (open)
Lake Kegonsa State Park - ()
Lake Mendota County Park - (open)
Marshall - (open)
Memorial Union (Pier) - (open)
Olbrich - (open)
Olin - (open)
Spring Harbor - (open)
Stewart County Park - (open)
Stoughton Mandt Park Pond - ()
Tenney - (open)
Verona Fireman's Park - ()
Vilas - (open)
Warner - (open)
There are a couple of things I'd like to add to this: the name of the lake each beach is on, and some kind of if/else check that can tell when a beach is monitored by another agency, in which case no status is available (those are the entries showing empty parentheses).
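Both additions could be sketched roughly like this. It assumes a blank status means another agency monitors the beach, which is a guess based on the empty parentheses, not something the page states. The lake names come from a hypothetical `lakes` dictionary that would have to be filled in by hand, since the status list doesn't include them. `describe_beach` is a made-up helper name.

```python
def describe_beach(name, status, lakes=None):
    """Build a readable line with the lake name and a note for blank statuses.

    `lakes` is a hypothetical dict mapping beach name -> lake name,
    filled in by hand because the scraped status list doesn't include it.
    """
    lakes = lakes or {}
    lake = lakes.get(name, "unknown lake")
    if status in ("open", "closed"):
        detail = status
    elif not status:
        # ASSUMPTION: a blank status means another agency monitors this beach.
        detail = "no status (monitored by another agency)"
    else:
        # Anything unexpected is passed through rather than silently dropped.
        detail = f"unrecognized status: {status}"
    return f"{name} ({lake}) - {detail}"
```

Including a final `else` branch means a new status word on the page (an advisory, say) shows up in the output instead of being mislabeled as open or closed.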
And while Steve's scraper exports the scraped data to JSON, mine will export it to a text file… not very useful, I know, but I'm working on it.
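One way to get both formats from the same (name, status) pairs is sketched below, using only the standard library. The JSON shape, a list of `{"name": ..., "status": ...}` objects, is my own assumption and not necessarily what Steve's Ruby scraper produces. The function names are hypothetical.

```python
import json


def to_text(beaches):
    """One 'Name - (status)' line per beach, like the terminal output."""
    return "".join(f"{name} - ({status})\n" for name, status in beaches)


def to_json(beaches):
    """A JSON list of {"name": ..., "status": ...} objects (assumed shape)."""
    records = [{"name": name, "status": status} for name, status in beaches]
    return json.dumps(records, indent=2)


def save(text, path):
    """Write exported text to disk as UTF-8."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
```

Building the export as a string first and writing it in a separate step keeps the formatting testable without touching the file system, and switching from a text file to JSON is then a one-line change, e.g. `save(to_json(beaches), "beaches.json")`.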
You can also see the scraper at ScraperWiki.