Quick Script to Implement ‘Paging’ to Search Twitter API for a Given Hashtag

I helped the folks at #wjchat a while back by writing a quick Django app to pull tweets each week and save them to a database. The tweets were then accessible as formatted HTML that could be posted on the #wjchat WordPress blog.

But something happened around the middle of April. The script pulled in 389 tweets on April 16, but the following week only caught 100. Same for the week after.

I did some hand-holding for a few weeks, but more and more of the same occurred.

The script — which was using the Tweepy library — was missing the first hour or so of tweets, and leaving the archive incomplete. Time for a refactor.

I’m not blaming Tweepy, but I wanted to see what else was out there. I tried to make a go of it with requests and oauth, but I don’t think I’m ready for that yet. The twitter library, though, seems to work well.

To tell you the truth, the library didn’t matter. It comes down to three things: how the search API returns tweets (in descending order), the maximum number of results per page (100), and a parameter called max_id.

As far as I can tell, max_id is what amounts to “paging” in the Twitter API.

The solution to the issue described above is to use a technique for working with streams of data called cursoring. Instead of reading a timeline relative to the top of the list (which changes frequently), an application should read the timeline relative to the IDs of Tweets it has already processed. This is achieved through the use of the max_id request parameter.

So the best I could get in a single search after April 16 was 100. And once I had 100 I had no means of moving on to the next page. Thankfully tonight I figured out pretty fast how the max_id parameter can work.
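Here is a minimal sketch of that max_id loop. It is not the #wjchat script itself. The names `fetch_all`, `search_page` and `fake_search` are made up for illustration, and `fake_search` stands in for the real API call so the example runs without credentials. It assumes each result is a dict with an integer `id` and that results come back newest first, 100 at most per request.

```python
def fetch_all(search_page, query, page_size=100):
    """Collect every result by walking backwards through the ids with max_id."""
    collected = []
    max_id = None  # None means "start from the newest tweet"
    while True:
        page = search_page(query, count=page_size, max_id=max_id)
        if not page:
            break  # an empty page means there is nothing older left
        collected.extend(page)
        # Results arrive newest first, so the smallest id is the oldest tweet seen.
        # max_id is inclusive, so subtract 1 to avoid fetching that tweet again.
        max_id = min(tweet["id"] for tweet in page) - 1
    return collected


def fake_search(query, count=100, max_id=None):
    """Hypothetical stand-in for the search API: 250 tweets, ids 250 down to 1."""
    ids = list(range(250, 0, -1))
    if max_id is not None:
        ids = [i for i in ids if i <= max_id]
    return [{"id": i, "text": query} for i in ids[:count]]


tweets = fetch_all(fake_search, "#wjchat")
print(len(tweets))
```

With a real client, `search_page` would wrap the library’s search call and return the list of statuses from the response. The loop itself stays the same.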

The script is based on the #wjchat script. It takes a hashtag and a search start date as arguments and outputs the results as a CSV. It requires the following packages…

    pip install python-dateutil==2.1
    pip install pytz==2013b
    pip install twitter==1.14.3
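The command-line and CSV side of a script like this could look roughly like the sketch below. This is an illustration, not the original script. It uses only the standard library rather than the packages above, the function names and CSV columns are my own choices, and it assumes each tweet has already been flattened into a dict with `id`, `created_at`, `user` and `text` keys, with `created_at` in the format Twitter has used in its JSON.

```python
import argparse
import csv
import io
from datetime import datetime, timezone

# Assumed timestamp format, e.g. "Tue Apr 16 01:00:05 +0000 2013"
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Archive a hashtag search as CSV")
    parser.add_argument("hashtag")
    parser.add_argument("start_date", help="earliest date to keep, as YYYY-MM-DD")
    return parser.parse_args(argv)


def tweets_to_csv(tweets, start_date, out):
    """Write tweets created on or after start_date (UTC) to out; return the count."""
    start = datetime.strptime(start_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    writer = csv.writer(out)
    writer.writerow(["id", "created_at", "user", "text"])
    kept = 0
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], TWITTER_TIME)
        if created >= start:
            writer.writerow([tweet["id"], created.isoformat(), tweet["user"], tweet["text"]])
            kept += 1
    return kept


# Made-up sample data so the sketch runs on its own.
sample = [
    {"id": 2, "created_at": "Tue Apr 16 01:00:05 +0000 2013", "user": "a", "text": "Hello #wjchat"},
    {"id": 1, "created_at": "Tue Apr 09 01:00:05 +0000 2013", "user": "b", "text": "Too old"},
]
args = parse_args(["#wjchat", "2013-04-15"])
buffer = io.StringIO()
kept = tweets_to_csv(sample, args.start_date, buffer)
print(buffer.getvalue())
```

In a real run, `parse_args` would receive `sys.argv[1:]`, the tweets would come from the search, and `out` would be an open file instead of a `StringIO` buffer.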

Makin’ and Bakin’ Chicken Wings


  • ½ bottle of Frank’s Red Hot Wings sauce – Buffalo
  • A glob of Barbecue sauce
  • Three shakes of A.1. Steak Sauce or similar fare
  • Four dollops of Cholula Hot Sauce.
  • Three dashes of Back of the Yards butchers rub
  • Three dashes of cayenne pepper
  • A dash of celery salt

Putting it all together

  • Mix the above together with a fork until it’s a consistent sauce.
  • Butcher two pounds of chicken wings, splitting the biceps from the forearm and discarding everything after the tendon (Tip/Nub).
  • Bake wings for 45 minutes on a cookie sheet at 450 degrees, turning over once about half way through.
  • Toss wings in the sauce.
  • Return to the oven (or a broiler, if you have one) for 15 minutes, or until they reach preferred sauciness.

Cleaning Out the (Digital) Filing Cabinet

At home I have an entire drawer of a filing cabinet devoted to articles and essays that I printed out from websites or pulled out of magazines. As such, clipping services like Evernote and article storage software like Instapaper and Read It Later (now called Pocket) are the greatest thing ever. And for a habitual hoarder of words and ideas like myself, the worst thing ever.

And so I should probably get down to the job of linking to some of these things that I’ve been holding on to in Evernote, etc. with the idea of someday offering thoughts about others’ thoughts.

via Charlie Kindel: Don’t Make Your Team Say No To You

I’m an idea guy. Ideas come to me a mile a minute. At that point in my career I didn’t realize how disruptive it was to my team that I was spouting these ideas to the team while they were executing on the current plan. In my head, I was just talking about potentialities for the future; by telling the team about all the cool things we could do in the future, I was showing “vision”.

What I found out later, when talking to people who had been on that team, was they viewed me as a “randomizer” they needed to control. In other words, the team spent time and energy MANAGING THE MANAGER. I forced them, regularly, to say “No” to ME.

If you are a leader of an early-stage venture, you need to figure out a way to “vent” your ideas that has NO impact on your team. Here are some tactics I’ve used and seen others use that might help you do this.

via Erik Paulson: The next AppleTV, and the future of newspapers

At their core, local TV news and newspapers are fundamentally reporters. However, I believe that it would be easier for a newspaper staff to put together the content for a TV broadcast than a TV station could put together the content for an issue of a newspaper.

via Mark Surman: Mozillians as inventors

Admittedly, we don’t know how to do this at scale yet. The recent MoJo learning lab reached only 60 people, and was too closely tied to whether one became a fellow or not. But we did catch a glimpse of what might be possible through MoJo events and discussions that happened through the challenge cycle. The idea of inventing new web things for the newsroom galvanized people, got them sharing ideas. It had people teaching and mentoring each other even if they didn’t know it.


Link: Making Decisions Based on Available Data

via Stijn Debrouwere

Imagine your newsroom has been pumping out articles about the papal election, yet it turns out that the article readers are clicking on is one about the civil war in Syria. Thank God, you can finally stop writing about the goddamned pope because people don’t care anyway. You can commission a new piece on Syria, perhaps even tailored to people’s exact search terms so you’re not just writing about what they care about, you’re answering their questions too. I think that’s incredibly useful information to have, information you can act on, right now.

Except I’ve never seen a news organization that has a workflow that would allow them to routinely respond to their readers’ behavior right now. Content farms seem to be the only content producers with that capability.


Why Should You Be a Newsroom Developer? Why Shouldn’t You?

Back in primitive times — oh like 2007 — I kept a text file of crime and felony arrests by police departments in Chicago’s South Suburbs.

In addition to serving as the assignment editor for the South Suburban edition of The Times, I was the edition’s primary crime reporter — partly out of necessity, partly out of enjoyment and partly out of the one-man-band syndrome I suffer from.

Each day I would add new cases, determine which cases had upcoming court hearings and update older information. Every. Day.

My updates were meticulous. And inefficient and repetitive.

Naive to guiding lights out there in the world that could show me a better method — one that would allow me to glean insights from this mountain of unstructured data I had collected, or at least lend structure to it — I did what worked.

To this day I think about what I could have done, and now what you (the web developer, technologist or data analyst) could do in a similar situation. Better yet, what could you do every day in a newsroom as part of the Knight-Mozilla Fellowship program?

The short answer: solve problems and show folks new ways of dealing with repetition and inefficiencies.

Fact: there are mountains of stories, charts, graphs, news applications and much more sitting with every reporter, writer and editor in every single news organization in this world. Some are mundane. But some could change the way society thinks, lives and plays.

Here’s another: that information will sit there for any number of reasons, but I’m inclined to think that lack of imagination is not one of them. There are too few people to demonstrate a better, more efficient method. Far too few people around to develop a plan. And that’s where your help matters.

I’m kind of ashamed that my curiosity never led me to a solution that was better than adding the name of the person arrested, their date of birth, the charges against them and other information to a text file.

I say that because I’ve always loved systems. Whether as a dishwasher, the newsstand manager at a Barnes & Noble or in my first jobs as a journalist at newspapers in very small Wisconsin communities, to me the foundation for how the job is done was as important as doing the job itself.

Certainly I was oblivious to other ways. And many other journalists — who write about the inefficiencies of government or health care or the courts and then do their jobs which are part of a larger system of repetition and routine — are as well.

For me, it took learning “what it means to be a developer”, and thanks to the helping hands and support of others (way too many people to link off to), I have learned lessons that are more important than any code I have written.

  • Identify the problem you are trying to solve.
  • Do not repeat yourself.
  • Iterate, iterate, iterate according to some philosophy.
  • “Show Your Work”.
  • Share your data.

These lessons have very little to do with classes, functions and syntax and everything to do with helping to reinforce the core mission of journalism: hold those in power accountable, help people make sense of the world around them and celebrate their place in it.

These lessons are directly applicable to the job descriptions that reporters, editors and web producers have at news organizations large and small where there are obstacles just waiting for solutions of the technical or automated variety.

The solutions could be as “simple” as extracting information from PDFs in a manner that doesn’t require four hours of copying and pasting. Maybe they are as “complex” as combining four or five daily processes that have been duct-taped together through Google spreadsheets, emails, meetings and in-person conversations.

Or maybe there’s simply someone out there with a bunch of text files, waiting for another way they didn’t know they wanted.

Convinced? Become one of the five people who will spend 10 months combining code, data and journalism as a Knight-Mozilla Fellow. Go and apply to become a 2014 fellow today.

Not convinced? Then read what these folks — to whom I offer eternal thank yous — have to say about creating code in newsrooms in the service of journalism. Then just fill out the application. You have nothing to lose and everything to gain.