Recent Posts

Probably_POTUS: A Twitter bot using ML to detect tweets written by Trump

8 minute read

I’ve been looking for a project to play with for a while now so that I could officially end my long hiatus from this blog. (Devotees will notice that the blog’s title has changed and the platform has undergone a shiny upgrade.) One of my favorite data science stories this year was an analysis of (then-candidate) Trump’s tweets showing that the crazy tweets tended to come from his personal Android phone, while more conventional tweets that his staff might have composed came from other sources. I was impressed by how cleanly the device split the space of crazy and sane tweets; you might say the decision function has “wide margins”. That got me thinking: if we didn’t know which device a tweet came from, maybe we could use machine learning to predict it, and thereby predict who composed the tweet.
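
As a rough illustration of the idea (not the model the bot actually uses, which this excerpt doesn’t describe), here is a minimal sketch that swaps in a from-scratch multinomial naive Bayes classifier over the words of a tweet. The tokenizer, the `android`/`other` labels, and the toy training tweets are all hypothetical stand-ins.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: a from-scratch multinomial naive Bayes classifier stands in
# for whatever model the real bot uses. Labels are device names, e.g. "android".


def tokenize(text):
    """Lowercase the tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)  # documents per label, for the prior
        self.word_counts = {}                # label -> Counter of token counts
        for text, label in zip(texts, labels):
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.label_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each token under this label.
            score = math.log(self.label_counts[label] / n_docs)
            for w in tokenize(text):
                score += math.log((counts[w] + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training tweets, invented purely for illustration.
tweets = [
    "Sad! Totally rigged, very unfair!",
    "So unfair, sad!",
    "Join us tonight at the rally in Ohio",
    "Thank you Ohio, see you at the rally",
]
devices = ["android", "android", "other", "other"]

model = NaiveBayes().fit(tweets, devices)
print(model.predict("very sad and unfair"))   # android
print(model.predict("see you at the rally"))  # other
```

Naive Bayes is used here only because it fits in a few lines with no dependencies; any text classifier that outputs a device label would express the same idea.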

Texas Flood: How to Bust a Drought

9 minute read

In my last post, I took a stab at modeling the Colorado River Basin (the one in Texas, not the Grand Canyon). I crawled two years of data covering 2011 and 2012: daily precipitation reports from weather stations in the basin, and stream flows from across the entire Southwest at 15-minute intervals. I used Pig/Hadoop to reduce this mass of data to a more manageable form, selecting only the 50 streams in the Colorado Basin and computing daily averages. I then took historical daily observations of the elevation of Lake Travis (the main source of drinking water for Central Texas) and trained a model to predict the lake’s elevation from the stream flows, which I in turn predicted from the rainfall measurements.
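
The reduction step was done in Pig on Hadoop; as a single-machine stand-in, here is a minimal Python sketch of the same idea: keep only the gauges inside the basin and collapse 15-minute flow readings into daily means. The `(station_id, timestamp, flow)` tuple format, the station IDs, and the `daily_averages` name are assumptions for illustration, not the post’s actual schema or script.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical single-machine stand-in for the Pig/Hadoop reduction step.
# Assumes each reading is a (station_id, ISO-8601 timestamp, flow) tuple.


def daily_averages(readings, basin_stations):
    """Average 15-minute flow readings into one mean per (station, day)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for station, ts, flow in readings:
        if station not in basin_stations:
            continue  # drop gauges outside the basin of interest
        key = (station, datetime.fromisoformat(ts).date())
        sums[key] += flow
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up readings for two gauges; only "A" is treated as inside the basin.
readings = [
    ("A", "2011-06-01T00:00", 10.0),
    ("A", "2011-06-01T00:15", 20.0),
    ("A", "2011-06-02T00:00", 5.0),
    ("B", "2011-06-01T00:00", 100.0),
]
for (station, day), mean_flow in sorted(daily_averages(readings, {"A"}).items()):
    print(station, day, mean_flow)
```

Keeping running sums and counts per key mirrors a group-then-average job, so the same logic maps naturally onto a distributed GROUP BY when the data no longer fits on one machine.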

Texas Flood: Modeling the Colorado River in Texas

9 minute read

Central Texas, including the Austin area where I live, is in the midst of a severe multi-year drought. Walking around town, it’s easy to forget this, especially after the city gets some rain and lawns turn green again. But even with the recent rains, the Highland Lakes, our region’s freshwater reservoirs, are at only 38% of their storage capacity. Lake Travis, located just outside Austin, is currently 53 feet below full. To add some meaning to those numbers, here are a few pictures.

Hurricane Sandy, One Year Later

4 minute read

It’s been a year since Hurricane Sandy hit the East Coast, and although it’s not talked about as much in the national media anymore, things are still not back to normal. Having been born and raised in New Jersey and now living half a country away in Texas, I wanted to look into how bad things still are. I visited the beaches of southern New Jersey last summer, and everything looked pretty much the way I remembered it, but I know that other areas were hit much harder.

(More) Fun with Congressional District Maps

4 minute read

My last post looked at the boundaries of Congressional Districts in the US and tried to draw some conclusions about the political motivations behind how they were drawn. Specifically, I calculated the ratio of each district’s perimeter to its area to find geometric oddballs: districts with funny shapes that I interpreted as evidence of gerrymandering. It turns out I was on the right track, but didn’t have things quite right.
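
For reference, a perimeter-to-area ratio can be computed directly from a district’s boundary vertices: sum the edge lengths for the perimeter and use the shoelace formula for the area. This is a minimal sketch assuming a single simple polygon in planar (projected) coordinates; real district shapefiles can have multiple parts and holes, which it ignores, and `perimeter_area_ratio` is a hypothetical helper name.

```python
import math

# Hypothetical helper: assumes one simple (non-self-intersecting) polygon with
# vertices listed in order, in planar (projected) coordinates. Holes and
# multi-part districts are ignored.


def perimeter_area_ratio(vertices):
    """Return perimeter / area for a polygon given as a list of (x, y) tuples."""
    n = len(vertices)
    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        perimeter += math.hypot(x2 - x1, y2 - y1)
        twice_area += x1 * y2 - x2 * y1  # shoelace term
    return perimeter / (abs(twice_area) / 2.0)


square = [(0, 0), (2, 0), (2, 2), (0, 2)]  # area 4, perimeter 8
sliver = [(0, 0), (4, 0), (4, 1), (0, 1)]  # area 4, perimeter 10
print(perimeter_area_ratio(square))  # 2.0
print(perimeter_area_ratio(sliver))  # 2.5
```

Both shapes have the same area, but the long, skinny one scores higher, which is the kind of oddball the ratio is meant to flag. Because the ratio has units of inverse length, districts should be compared in the same projected coordinate system rather than in raw latitude/longitude degrees.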
