#### Who's Driving Twitter Discussions - Network Analysis

Using graph theory, social network analysis, and a Twitter streamer, we examine how to identify the users who lead discussions on Twitter through highly retweeted content.

#### NLP Pipeline Manager

Natural Language Processing (NLP) has a lot of very frustrating parts. In this post, I introduce a library I wrote and explain how I hope it makes NLP suck less.

#### Nicer Machine Learning with Spark: RFormula

Picking up from the last tutorial, we use RFormula to make our code much easier to read.

#### Machine Learning with Spark

A tutorial on predicting airline delays with Spark's machine learning methods.

#### Setting up a Spark Cluster on AWS

A tutorial for building a cluster of computers, installing Spark, and doing your first Spark project on AWS.

#### Monte Carlo (Pt 5), Monte Carlo and Business

The fifth in a series of posts dedicated to Monte Carlo. In this edition, we apply MC to see how a choice changes our business's bottom line.

#### Monte Carlo (Pt 4), Let's Simulate Particles

The fourth in a series of posts dedicated to Monte Carlo. In this edition, we build a 1D particle simulator.

#### Monte Carlo (Pt 3), Can you beat Video Poker?

The third in a series of posts dedicated to Monte Carlo. This time, we try to outsmart the casino by learning to play video poker as optimally as possible.

#### Monte Carlo (Part 2), Monte Carlo + Python?

The second in a series of posts dedicated to Monte Carlo. This time, we solve a few simple problems with Python to learn how Monte Carlo methods work under the hood.

#### What is a Monte Carlo Simulation (Part 1)

The first in a series of posts dedicated to exploring the power and flexibility of Monte Carlo techniques. In this post we ask, "what the heck is a Monte Carlo Simulation anyway?"

#### Recommendation Engines for Dummies

A look into how collaborative filtering works for recommendations, with some Python code to build your own from scratch. Aimed at those without deep technical knowledge of data science.

#### Streaming Audio with Python

Visualizing the sound coming through your computer's microphone using Python's plotting tools.

#### KMeans Color Compressor

Using a clustering algorithm called KMeans, I stylize images by forcing pixels into groups of colors.

#### Will Joey Votto Make the Hall Of Fame?

Using machine learning methods, I investigate what it takes to make the HOF and who among active players is likely to join the legends.

#### Which Word is the Most Biblical?

This web crawler extracts text from around the web, then finds and highlights the most common words.

#### Analysis Tree Maker

This C++ code converts an unmanageable 60+ TB of data into a smaller but still usable data structure for extracting physics results.

#### zPlot - A ROOT Extension

A C++ class to make managing plot-like objects within CERN's ROOT program simpler, more user-friendly, and more consistent.

#### The Drunken Walker(s)

A visualization of the classic "drunken walk" physics thought experiment. A mix of random numbers and Jackson Pollock.

#### Baseball Simulator

Input the statistics of two teams of baseball players, then simulate games to your heart's content.