I wrote a post about DIY Data Science back in March. In that post I said that hacking on public data sets and posting about it has the potential to be a big deal in the coming years. This morning I saw a great example of exactly what I was thinking about.
Alastair Coote pulled a bunch of turnstile data from the MTA and figured out which NYC subway stations are the most used during rush hour. He posted his code to GitHub and embedded it on his blog.
If I were a high school math teacher, I would take his work and make it a project for my students to work on together. The MTA makes a lot of data available to play with. This kind of stuff is highly relevant to teenagers in NYC. They would understand the data and the exercise.
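To give a sense of how small a classroom version of this exercise could be, here is a minimal sketch in Python. It is not Alastair's code, and it does not read the MTA's actual file format. It assumes the data has already been boiled down to simple (station, hour, entries) rows, and the rush-hour windows, the function name `busiest_stations`, and the sample numbers are all made up for illustration.

```python
from collections import defaultdict

# Assumed rush-hour windows on a 24-hour clock: 7-9am and 4-6pm start hours.
# These are an illustrative choice, not a definition from the original analysis.
RUSH_HOURS = set(range(7, 10)) | set(range(16, 19))


def busiest_stations(records, top_n=3):
    """Rank stations by total entries counted during rush hours.

    `records` is an iterable of (station, hour, entries) tuples, where
    `entries` is the number of riders counted in that hour.
    """
    totals = defaultdict(int)
    for station, hour, entries in records:
        if hour in RUSH_HOURS:
            totals[station] += entries
    # Sort by total entries, largest first, and keep the top few.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Made-up sample rows for illustration only (not real MTA figures).
sample = [
    ("Station A", 8, 1200),
    ("Station B", 8, 900),
    ("Station A", 13, 400),  # midday, so it is ignored
    ("Station B", 17, 500),
    ("Station C", 17, 1000),
]

print(busiest_stations(sample))
```

Students could swap in real counts, argue about what "rush hour" should mean, and see how the ranking changes when they move the windows.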
The data and tools to do DIY Data Science are becoming more accessible every day. I hope we all get into data hacking and start collaborating on this stuff publicly. At a minimum, it will lead to more data scientists, and we might learn some interesting things about ourselves and our world at the same time.
BTW – Union Square is the most active subway station at rush hour. Midtown South FTW!