Starting-Pitcher-Stats-v2c-01

Baseball Data Visualization Experiment

Inspired by the compelling visuals from the Accurat team, I’m toying around with a new information design to visually represent pitching statistics for a baseball team’s starting pitcher rotation. For those not familiar with baseball, each Major League Baseball team has a “rotation” of 5 starting pitchers who take turns starting games. The first pitcher in the rotation starts one game, the next pitcher starts the next game, and so on. I wanted to visualize the “line” on the starting pitchers for a team.

UPDATE: See the final completed project, The Rise and Fall of the 2014 Oakland Athletics.

The Data

The main reason I really like working with baseball and other sports visualizations is that there is so much data available and it’s so easy to find. Baseball data like this is readily available on several websites including the Major League Baseball official site, ESPN, Baseball-Reference.com, and more. For this visualization, all of the data I need is available in each game’s box score and pitching line. In standard baseball box scores, the “line” on a given pitcher represents the number of innings pitched, hits he gave up, runs he allowed, earned runs allowed, bases on balls (walks), strikeouts, and home runs. The pitching line and the box score are typically reported in simple tables like those below.

Pitching Line

Box Score for a game
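
For anyone who would rather hold a pitching line in code than in a table, here is a minimal Python sketch. It assumes the line arrives as seven whitespace-separated values in the order listed above (IP, H, R, ER, BB, SO, HR). The names `PitchingLine`, `parse_innings`, and `parse_pitching_line` are hypothetical, made up for this illustration. The one subtle part is innings pitched: box scores write a third of an inning as ".1" and two thirds as ".2", so the sketch stores innings as outs recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PitchingLine:
    """One starting pitcher's line for a single game (hypothetical structure)."""
    outs: int          # innings pitched, stored as outs recorded
    hits: int
    runs: int
    earned_runs: int
    walks: int
    strikeouts: int
    home_runs: int


def parse_innings(ip: str) -> int:
    """Convert box-score innings notation to outs, e.g. '6.1' is 6 1/3 innings."""
    whole, _, frac = ip.partition(".")
    thirds = int(frac) if frac else 0
    if thirds not in (0, 1, 2):
        raise ValueError(f"invalid innings value: {ip!r}")
    return int(whole) * 3 + thirds


def parse_pitching_line(text: str) -> PitchingLine:
    """Parse 'IP H R ER BB SO HR' (assumed column order) into a PitchingLine."""
    fields = text.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    ip, *counts = fields
    h, r, er, bb, so, hr = (int(c) for c in counts)
    return PitchingLine(parse_innings(ip), h, r, er, bb, so, hr)
```

Storing outs instead of a decimal avoids treating "6.1" as six and one tenth innings, which would skew the length of any bar drawn from it.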

Concepts and Sketches

The tables may be interesting data for baseball geeks and data nerds, but they’re not very visually compelling. Obsessed with always trying to visually represent things, I had an idea to try to represent the data in both of these tables for each game of a 162-game season for a Major League Baseball team. That’s a lot of data, so to simplify the problem a bit, I decided to just focus on the starting pitcher of each game. With the idea in mind of what data I wanted to visually represent, I started sketching out ideas of how to show the data. I always enjoy the information design work from the team at Accurat, an information design agency based in Milan and New York, and one of their recent works, “Brain Drain” (larger image available here), heavily influenced my design concepts from the very beginning.

Sketch1

Initially I hadn’t planned on showing the inning-by-inning scoring of each team, but during the sketching phase the idea occurred to me to represent that with some area charts.

Sketch2

Digital Designs

Next, I started experimenting with some of these design concepts in Adobe Illustrator. The first version, shown below, used a colored background element for each game to represent whether the team won or lost, green representing a win in this case and the yellowish color representing a loss (green and gold being the team’s colors that I’m representing here – the Oakland A’s). A difference between this first digital concept and the sketch above is switching the inning-by-inning scoring area chart over to the right side and moving the bar chart over to the left. This was to allow for the possibility of games with extra innings. In those games, I need to be able to extend that area chart to the left more to accommodate the extra innings.

Version1

At first I liked this, but the more I looked at it, the noisier it looked to me. I wanted a cleaner design, so I ditched the background color element and decided to represent wins and losses by coloring the bars in the bar chart itself.

Version2

This was definitely a cleaner look. I also tried using black or red circle elements in the middle with the size of the bubble representing the score differential and the color representing a win or loss – black for a win, red for a loss. A big black circle in the middle was pretty dominating though and had the potential to obscure other elements so I tried a version using a semi-circle instead and moving it to the side, shown in the version below.

Version3

Figuring I was maybe too obsessed with incorporating a circular element in the design, I played around with a few versions that represented the score differential using a bar chart element instead, but rejected all of those as they were just too ugly.

Next was a variation on version 2 where I brought back the bubble in the middle but ditched the black and red colors, opting for semi-transparent team colors instead.

 

The Final Design (maybe)

With a few other tweaks and refinements, this is the most recent iteration of the design process. It only shows the first 19 games of a 162-game season, so obviously this infographic will get MUCH larger by the end of the season.

Starting-Pitcher-Stats-v2c-01

And here’s a legend to help understand how to read it.

How-to-Read-It

I really like the idea of using small multiples to show stats from each game. I also like how the design can show different things at different levels. Zoomed out, it’s easy to get a sense of which pitchers are associated with games the team won (predominantly green graphic elements) vs. those associated with games the team ended up losing (predominantly gold).* You can’t really read specific game details zoomed out, but it affords a good high-level overview of the team’s starting pitchers’ performance. Zooming in, this design then visually represents most of the elements from the box score and pitching line.

*As a side note, baseball scoring doesn’t necessarily credit the starting pitcher with a win, even if the team wins the game. The win is often credited to a relief pitcher who comes into the game in the middle innings. But the rules covering how that happens and who gets the win are messier than I want to deal with in this infographic or in this article.

As you can see, this has gone through several iterations of evolutionary development and it’s still far from perfect. I’m well aware that many data visualization pros will cringe at the use of the circular element in the middle that represents the score differential, since our brains have a hard time accurately comparing circle sizes. But the circles here aren’t meant to be compared one to another, but rather just to give a general sense of how big a win or loss the game was. Plus, I like the use of a circular element to try to offset (somewhat) all the rectangles and linearity of each game graphic.
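
For readers who want to experiment with the circle sizing, here is a small Python sketch of one common convention: scale the circle’s area, not its radius, with the score differential, so a 4-run win covers four times the area of a 1-run win. This is an assumption on my part for illustration, not a description of how the circles in the graphic were actually sized, and `differential_radius` and `unit_radius` are hypothetical names.

```python
import math


def differential_radius(team_runs: int, opponent_runs: int,
                        unit_radius: float = 4.0) -> float:
    """Radius for the score-differential circle.

    Area grows linearly with the run differential, so the radius grows
    with its square root. `unit_radius` is the radius used for a
    one-run game (an arbitrary, assumed size).
    """
    differential = abs(team_runs - opponent_runs)
    return unit_radius * math.sqrt(differential)
```

Scaling the radius directly would make blowouts look far bigger than they are, since the eye tends to read the filled area.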

For any readers out there who are both baseball fans AND data visualization fans and have comments, feedback, or suggestions, feel free to comment below or via the contact page. Or, if you’ve seen this data represented visually elsewhere, I’d love to see how others have tackled this data set.

The Process: A Look Behind the Scenes

For those interested in how I’m creating this infographic, I’m actually creating this all manually in Adobe Illustrator. Ideally, it would be great to generate this using Processing, D3, Raphael, or some other visual-oriented programming language or platform. While I am teaching myself Processing, I’m a long way from being able to write the code to scrape this data from a website, clean it up, parse it, and generate the graphics for each game in this design layout. So, I’m relying on my old standby, Illustrator. You can get a behind-the-scenes look at it in this screen capture.

Illustrator
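
As a rough idea of what the “generate the graphics” step could look like in code, here is a minimal Python sketch that turns one game’s stats into a tiny SVG bar chart using only the standard library. It is an illustration under assumptions, not the process used for this infographic: the function name `game_svg`, the sizes, and the CSS color names standing in for the team’s green and gold are all made up, and the scraping and cleanup steps are left out entirely.

```python
from typing import Mapping
from xml.etree import ElementTree as ET

WIN_COLOR = "green"   # stand-in for the team's green (assumed value)
LOSS_COLOR = "gold"   # stand-in for the team's gold (assumed value)


def game_svg(stats: Mapping[str, int], won: bool,
             bar_height: int = 6, gap: int = 2, scale: int = 5) -> str:
    """Return an SVG string with one horizontal bar per stat.

    Bars are colored by the game result, mirroring the idea of coloring
    the bar chart itself to show a win or a loss.
    """
    color = WIN_COLOR if won else LOSS_COLOR
    width = max(stats.values(), default=0) * scale
    height = len(stats) * (bar_height + gap)
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg",
                     width=str(width), height=str(height))
    for row, (label, value) in enumerate(stats.items()):
        rect = ET.SubElement(svg, "rect", x="0",
                             y=str(row * (bar_height + gap)),
                             width=str(value * scale),
                             height=str(bar_height), fill=color)
        # A <title> child gives each bar a tooltip in most SVG viewers.
        ET.SubElement(rect, "title").text = f"{label}: {value}"
    return ET.tostring(svg, encoding="unicode")
```

Calling `game_svg({"H": 5, "R": 2, "SO": 7}, won=True)` returns markup that can be saved to a `.svg` file and opened in a browser or in Illustrator.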

 

It’s a bit of a tedious process, but not as bad as it might seem. Using a lot of layers makes the process much easier. Each starting pitcher gets his own top-level layer and then each of his games is a sub-layer under that. So, creating the graphic for an individual pitcher’s game is simply a matter of copying one layer, moving it, and then adjusting the bar chart elements, area chart, etc. with the new game’s stats. Sure, it’s a bit time-consuming and laborious, but I actually find it relaxing to work on. Every year I end up working on a manually created baseball-related infographic like this. Some people like to paint or knit or restore an old car or just sit back and watch a baseball game to relax. For me, it’s relaxing to work on a data-centric infographic like this.
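
The layer structure described above (one group per pitcher, one sub-item per game) maps naturally onto a small layout routine. The Python sketch below is a hypothetical illustration: it assumes each pitcher gets a column and his starts stack downward in order, which may not match the actual arrangement in the graphic, and `layout_games`, `cell_width`, and `cell_height` are invented names. The pitcher labels in the usage note are placeholders, not real data.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def layout_games(starters: Sequence[str], cell_width: int = 120,
                 cell_height: int = 60) -> List[Tuple[int, str, int, int]]:
    """Place each game in a grid: one column per pitcher, one row per start.

    `starters` lists the starting pitcher of each game in season order.
    Returns (game_number, pitcher, x, y) tuples, where x and y are the
    top-left corner of that game's small graphic.
    """
    columns = {}                    # pitcher -> column index, by first start
    starts = defaultdict(int)       # pitcher -> starts placed so far
    placed = []
    for game_number, pitcher in enumerate(starters, start=1):
        column = columns.setdefault(pitcher, len(columns))
        row = starts[pitcher]
        starts[pitcher] += 1
        placed.append((game_number, pitcher,
                       column * cell_width, row * cell_height))
    return placed
```

With placeholder starters `["A", "B", "C", "A", "B"]`, pitcher A’s second start (game 4) lands directly below his first, which is the code equivalent of copying a game layer and moving it down.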