Social Network Analysis

Who's Driving Twitter Discussions - Network Analysis

Using graph theory, social network analysis, and a Twitter streamer, we examine how to identify the users who are leading discussions on Twitter through highly retweeted content.

NLP Pipeline Manager

Natural Language Processing (NLP) has a lot of very frustrating parts. In this post, I introduce a library I wrote and explain how I hope it makes NLP suck less.
Machine Learning with Spark

Nicer Machine Learning with Spark: RFormula

Picking up from the last tutorial, we use RFormula to make our code much easier to read.

Machine Learning with Spark

A tutorial on predicting airline delays with Spark's machine learning methods.
Setting up Spark on a Cluster

Setting up a Spark Cluster on AWS

A tutorial for building a cluster of computers, installing Spark, and doing your first Spark project on AWS.

Monte Carlo (Pt 5), Monte Carlo and Business

The fifth in a series of posts dedicated to Monte Carlo. In this edition, we apply MC to see how a choice changes our business's bottom line.

Monte Carlo (Pt 4), Let's Simulate Particles

The fourth in a series of posts dedicated to Monte Carlo. In this edition, we build a 1D particle simulator.
Can you beat Video Poker?

Monte Carlo (Pt 3), Can you beat Video Poker?

The third in a series of posts dedicated to Monte Carlo. This time, we try to outsmart the casino by learning to play video poker as optimally as possible.
How do we Monte Carlo with Python?

Monte Carlo (Part 2), Monte Carlo + Python?

The second in a series of posts dedicated to Monte Carlo. This time, we solve a few simple problems with Python to learn how Monte Carlo simulations work under the hood.
What is a Monte Carlo Simulation?

What is a Monte Carlo Simulation (Part 1)

The first in a series of posts dedicated to exploring the power and flexibility of Monte Carlo techniques. In this post we ask, "what the heck is a Monte Carlo Simulation anyway?"
Simple Recommenders

Recommendation Engines for Dummies

A look into how collaborative filtering works for recommendations, with some Python code to build your own from scratch. Aimed at those without deep technical knowledge of data science.
Streaming Audio

Streaming Audio with Python

Visualizing the sound coming through your computer's microphone using Python's plotting tools.
Color Compressor

KMeans Color Compressor

Using a clustering algorithm called KMeans, I stylize images by forcing pixels into groups of colors.
Hall of Fame Probability

Will Joey Votto Make the Hall Of Fame?

Using machine learning methods, I investigate what it takes to make the HOF and who among active players is likely to join the legends.
Word Highlighter Plot

Which Word is the Most Biblical?

This web crawler extracts text from around the web, then finds and highlights the most common words.

Analysis Tree Maker

This C++ code converts an unmanageable 60+ TB of data into a smaller but still usable data structure for extracting physics results.

zPlot - A ROOT Extension

A C++ class to make managing plot-like objects within CERN's ROOT program simpler, more user-friendly, and more consistent.
Random Walker Example

The Drunken Walker(s)

A visualization of the classic "drunken walk" physics thought-experiment. A mix of random numbers and Jackson Pollock.

Baseball Simulator

Input the statistics of two teams of baseball players, then simulate games to your heart's content.

Write-ups on more projects coming soon...
See for more.