Learning to build on failure to build better in the future...
Achieving a glimmer of success in anything in life comes with numerous lessons -- failures that eventually add up in a way that allows an individual to briefly smile and strike out to fail again.
A week after starting this blog post, I was reminded of this fact thanks to a ViewSource podcast, when Dave Stanton and David Cohn started their discussion with this golden nugget of insight:
"[There's a] notion that journalists are different than technologists because journalists don't have a culture of failing publicly to improve. So, that's what we're going to try to do on this podcast -- talk about how digital journalism fails. Of course we'll talk about the successes as well, but there is no great project that doesn't have a boatload of fail in its wake."
Now I'm not so quick to call myself a technologist or a developer, but I haven't really felt as though I've worked as a journalist -- at least not like I had throughout the 14 prior years -- since at least the beginning of 2010.
Well, after the June 5 Recall Election here in Wisconsin, I sure can relate to a "boatload of fail," and I can relate to failing publicly -- or putting a happy face on it -- getting a lesson to pop up on the homepage of madison.com some 15 minutes after election results started to trickle in on what might have been Wisconsin's most historic night when it comes to politics.
So I want to document what I found to have happened and what I learned from the experience -- you can break something just as easily as you can build a cool presentation -- but also share some of the things that worked. If you want to skip to the re-usable code chunks, they are at the bottom of this post.
As far as the tl;dr? Well:
Experiment, fail, learn, build and work and learn from others, but above all don't add features to a release after it's been approved... especially on a Saturday ... three days before the election... when you're learning how it works. Just. Don't. Don't. Do. It.
When that journalist learns to code
June 5 was the third straight "first-Tuesday-of-the-month-election-night" for madison.com and our two newsrooms, and that in itself is kind of exciting; we covered municipal elections and a presidential primary on April 3 and a series of recall primaries on May 8.
For both of the prior election nights I had used Google's Fusion Tables as the data backend -- Fusion Tables, more than just maps. It's flexible when it comes to importing data from a spreadsheet or a text file, and retrieving data is straightforward and easy. Couple that with the fact that we'd be populating some maps throughout the night and after the results came in, and it made sense.
And frankly it all worked well and without issue for the April 3 and May 8 election nights. So what changed? Ah, first the backstory.
Our April 3 widget displayed presidential primary, Madison school board and Dane County races. In this case, Erik Paulson wrote a scraper that would parse a municipal results website and convert the results to a CSV that could be uploaded into a Google spreadsheet.
The process involved some manual labor -- copying and pasting data, and triggering the Apps Script sync -- but throughout the evening it displayed results in a timely fashion.
In the post-mortem, we talked about having a results page for May 8 that would display a list of all races outside of those we'd feature on the homepage.
And so the May 8 widget, which displayed Democratic candidates for Governor, and five other races on a results page, involved much of the same workflow/code as above, but with a few "efficiencies."
For this election night I was able to take a flat file of election results and upload it to Fusion Tables using a Python script written by Kathryn Hurley of the Fusion Tables team, and explained by WNYC's Keefe here.
The script could run automatically, but we configured it to run manually, so every couple of minutes I'd drop a flat file into a directory, hit the up arrow in the terminal, press enter and watch the numbers update.
This process worked well throughout the evening and the data flowed to the user. And a large part of me wonders now whether, had I left well enough alone -- had I stuck with this prior method, and a prior process -- June 5 would have worked as well.
Inspiration !== act now
One thing the April and May results widgets lacked was a time stamp, so users had no way of knowing when the numbers were last updated, outside of the number of wards that had reported. So I wanted to try to make this happen.
I'd update the Fusion Table and a couple blinks later, the results on the screen in front of me would update ... on a Saturday ... three days before the election. We'll come back to this...
On election night, things started off well enough... Throughout the day I had joked about being nice to the squirrels in the server, hoping they'd be in shape to handle the traffic.
"We've got numbers… 10 wards have reported… Check it out on madison.com" -- Chris Keller (@ChrisLKeller), June 5, 2012
But not long after that tweet, the widget stopped updating with data. And in hindsight it took me too long to learn that was because we had stopped receiving data from the Fusion Table I was using to store the voting results.
And that is because I failed to realize that the bloated, inefficient script I had written would be hitting Google's Fusion Tables API thousands of times each minute, scaling with our user traffic on election night.
I'll say that again, because it's something I need to learn from: Thousands of Times. Each Minute.
Google's Hurley helped me to understand just what I was doing in a follow-up email conversation. She wrote:
Your app was making around 8 calls to the FT API per person every 2 min. Let's say 1000 people view the app within a single minute. That would be 8000 calls to the API at once, which is about 130 queries per second! As more and more people viewed the website, the number of queries per second would add up…
Given the nature of what we were dealing with -- a contentious recall election some 16 months in the making -- it was probably more like 40,000 API calls … every two minutes.
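As a rough sketch of that arithmetic in Python: the function name and its defaults are illustrative only, and it assumes, as Hurley's example does, that every viewer's calls land within the same minute.

```python
def api_load(viewers, calls_per_viewer=8, window_seconds=60):
    """Return (total_calls, queries_per_second) for one burst of viewers."""
    total_calls = viewers * calls_per_viewer
    return total_calls, total_calls / window_seconds


calls, qps = api_load(1000)
print(calls, round(qps))  # 8000 calls, roughly 133 queries per second

# 40,000 calls at 8 calls per person works out to 5,000 viewers
print(40_000 // 8)
```

The point of writing it down is that the load grows with the audience: every extra viewer adds eight more queries per refresh, no matter how rarely the underlying data actually changes.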
So Fusion Tables stopped returning data due to excessive queries, and froze out our IP address followed by our users' IP addresses in short order. And my stomach rose pretty quickly, becoming lodged in an area pretty close to where I had earlier in the day touted this recall election widget, also known as my big mouth.
Then came the deer in the headlights look and having to answer questions about what was happening while troubleshooting and trying to find an alternative solution.
Go on with the chlorophyll.
So lessons... they are numerous, and if you have made it this far, probably best conveyed with bullet points.
Don't add features to a release after it's been approved ... especially on a Saturday ... three days before the election ... when you're learning how to make it work. Just. Don't. Don't. Do. It.
If using a third-party service for a project that may bring high traffic, communicate with them, explain the situation, show them what you are attempting to do and ask for advice. Why I didn't do this I will never know, but it was poor judgement on my part.
Have a backup plan ready to go at the drop of a hat. The failure of the results widget would have been mitigated -- to an extent -- had I been prepared to drop in an AP results widget in an instant. Failure to be prepared to do that continues to sting me.
Find a code editor. Having worked as a reporter and an editor I know of a symbiotic relationship that exists: A reporter can make an editor better, and an editor can make a reporter better. So it holds that an experienced coder can help make a beginner better, and the beginner can buy donuts for the more experienced coder. While what I came up with worked, it could have been much more efficient -- one API call per user instead of eight, anyone? -- and that fact reminds me that you better ask for help in this world.
Static files have come full circle. For many news apps being developed, if the data only updates every so often, developers are storing the data in a static file and querying that. But even on nights like an election night, where the data is updating rapidly, static files might make some sense.
And it deserves to be repeated ... don't add features to a release after it's been approved ... especially on a Saturday ... three days before the election ... when you're learning how to make it work. Just. Don't. Don't. Do. It.
Ask others for advice. Many of these problems aren't new, and there are solutions out there, but be upfront about your skills and the resources you have at your disposal.
Do Not Get Cocky Because You Will Be Burned By The Sun.
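On the static-file lesson above, here is a minimal sketch in Python of what that could look like, not the code we ran. The idea is that one process writes the latest results, plus a last-updated stamp, to a JSON file after each update, and every browser reads that file instead of querying the data service directly. The function name, the file name and the shape of the results are all hypothetical.

```python
import json
import os
from datetime import datetime, timezone


def write_results_snapshot(results, path, now=None):
    """Write results plus a last-updated stamp to a static JSON file."""
    now = now or datetime.now(timezone.utc)
    payload = {"updated": now.isoformat(), "results": results}
    # Write to a temporary file first, then swap it into place,
    # so a reader never picks up a half-written file.
    tmp_path = f"{path}.tmp"
    with open(tmp_path, "w") as handle:
        json.dump(payload, handle)
    os.replace(tmp_path, path)
    return payload
```

Calling something like write_results_snapshot(rows, "results.json") once per data update means the number of queries against the data source no longer depends on how many people are watching, and the "updated" field gives the page a time stamp for free.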
Which brings me to...
Learning new things and working together
All in all, besides the embarrassment of being without election results for too long as I scrambled to let go of what wasn't working and get an AP widget added to our site -- always have a backup plan ready to go at the drop of a hat -- I do have some code I can share, albeit not the code that will overload the Fusion Tables API.
I also owe a post on how we were able to create an election results map that used a series of circles to show how a candidate did in a particular county instead of shading that county a different color. More to come on that.
Bash Script to download results from AP's results FTP
The week prior to the election, Adam Hirsch from Wisconsin Public Radio reached out to me asking for some guidance on getting a Fusion Table map together. He wanted to create a select menu that would toggle between different map layers.
In our subsequent chats, Adam shared with me a terminal script that would hit AP's FTP server every couple of minutes, download the results file, append the date and time to the file name, and create a symlink to the file.
To use, name the file \.sh, open the terminal and cd to the directory and run ./\.sh. If you have your PATH set, you should be able to just run \.sh from the terminal.
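For anyone more comfortable in Python than Bash, the same steps (download the file, stamp the name with the date and time, point a symlink at the newest copy) can be sketched roughly as below. This is not Adam's script. The host, credentials, remote path and file names are placeholders, and the "results-latest" naming is an assumption of mine.

```python
import os
from datetime import datetime
from ftplib import FTP


def fetch_results(host, user, password, remote_path):
    """Download one file from an FTP server and return its bytes."""
    chunks = []
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.retrbinary(f"RETR {remote_path}", chunks.append)
    return b"".join(chunks)


def save_snapshot(data, directory, base_name="results", ext=".txt", now=None):
    """Save data under a timestamped name and repoint a 'latest' symlink at it."""
    now = now or datetime.now()
    stamped = f"{base_name}-{now:%Y%m%d-%H%M%S}{ext}"
    target = os.path.join(directory, stamped)
    with open(target, "wb") as handle:
        handle.write(data)
    link = os.path.join(directory, f"{base_name}-latest{ext}")
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(stamped, tmp_link)  # relative link within the same directory
    os.replace(tmp_link, link)     # swap the new link over the old one
    return target, link
```

To repeat it every couple of minutes, either wrap the two calls in a loop with time.sleep(120) or schedule the script with cron. Keeping every timestamped copy means you can go back and see exactly what the feed looked like at any point in the night.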
Ryan Murphy of the Texas Tribune offered up this bit of code to adjust the notoriously fickle formatting of numbers coming back from Fusion Tables. This function has become a staple.