TIL - Baseball Data

Ever since taking on the new position at Datalogix (now Oracle), I have been coming up to speed on a variety of topics related to AdTech. It’s been a fun, but challenging, learning curve.

One of the distinct differences between this role and every other role I have had designing and building software products is that we have two distinct sets of “builders” inside the company: engineering and analytics. My engineering Kung Fu is pretty strong, but my analytics is, well, not so much. I have a degree in what used to be called Decision Sciences (now called Operations and Information Management), and we did a lot of modeling in Excel and some database work, but nothing like the current state of the art in what today is called Data Science or Big Data.

My strong belief is that to be a great product manager, I need to be able to speak to all of my engineering counterparts, understand what they are saying, and have the ability and credibility to call BS when necessary. To get myself further up the curve, I have decided to get super nerdy with statistical learning and modeling. That’s a longer-term outcome, and in the near term I have to set some more pedestrian goals. Part of that meant finding a project I could sink my teeth into: something that stays interesting to me over time and can become as difficult as I want to make it. Given the abundance of available data, I decided to target baseball analytics. Nothing groundbreaking there. There are plenty of books and blogs out there that go into incredible depth on the topic. That should make it easier to find help when I need it, and that’s super important when trying to pick up new skills.

To get started, I pulled down the Retrosheet database, thanks to a great post walking through how to import it into MySQL. The Retrosheet data is play-by-play data, which is helpful for game analysis. There’s also pitch-by-pitch data out on the net, which I will get to at some point. I also pulled down the Lahman database, which holds season summary statistics, and that is where I plan to start building my SQL skills back up. The discrete event data from Retrosheet will be really helpful for building my training data when I start modeling.

Long preamble for my TIL. Today I Learned that in 2014, Colorado had the best team slugging percentage, and Detroit had the best team batting average. It took some doing to get there, but here’s the SQL query, run against the Lahman database.

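-- Team batting average and slugging percentage for 2014 (Lahman batting table).
-- COUNT(*) counts batting rows: one per player stint with the team.
SELECT teamID,
  COUNT(*) AS batters,
  ROUND(SUM(H) / SUM(AB), 3) AS team_AVG,
  -- Total bases = H + 2B + 2*3B + 3*HR; backticks quote the digit-leading column names.
  ROUND((SUM(H) + SUM(`2B`) + 2 * SUM(`3B`) + 3 * SUM(HR)) / SUM(AB), 3) AS team_SLG
FROM batting
WHERE yearID = 2014
GROUP BY teamID ORDER BY team_AVG DESC;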

I ran it twice: once ordered by team_AVG, and once ordered by team_SLG.
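For anyone who wants to poke at the query without loading the full Lahman database, here is a minimal sketch that runs the same arithmetic against an in-memory SQLite table from Python. Everything in it is an assumption for illustration: the rows are made up (not real Lahman data), the team codes `AAA` and `BBB` are placeholders, and `team_rates` is a hypothetical helper name. Two things differ from the MySQL version: SQLite quotes the `2B` and `3B` columns with double quotes, and it divides integers as integers, so the sums are multiplied by 1.0 first.

```python
import sqlite3

# Toy rows, made up for illustration only (NOT real Lahman data).
# Column order: teamID, yearID, AB, H, 2B, 3B, HR.
TOY_ROWS = [
    ("AAA", 2014, 500, 150, 30, 5, 20),
    ("AAA", 2014, 300, 75, 10, 1, 5),
    ("BBB", 2014, 400, 100, 20, 2, 30),
]


def team_rates(order_by="team_AVG"):
    """Return (teamID, team_AVG, team_SLG) rows for 2014, best first."""
    # Whitelist the sort column, since it is interpolated into the SQL text.
    if order_by not in ("team_AVG", "team_SLG"):
        raise ValueError("order_by must be 'team_AVG' or 'team_SLG'")

    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE batting (teamID TEXT, yearID INTEGER, AB INTEGER, '
        'H INTEGER, "2B" INTEGER, "3B" INTEGER, HR INTEGER)'
    )
    conn.executemany("INSERT INTO batting VALUES (?, ?, ?, ?, ?, ?, ?)", TOY_ROWS)

    # H already counts every hit once, so total bases is
    # H + 2B + 2*3B + 3*HR (one, two, and three extra bases respectively).
    # 1.0 * forces floating-point division in SQLite.
    query = f'''
        SELECT teamID,
               ROUND(1.0 * SUM(H) / SUM(AB), 3) AS team_AVG,
               ROUND(1.0 * (SUM(H) + SUM("2B") + 2 * SUM("3B") + 3 * SUM(HR))
                     / SUM(AB), 3) AS team_SLG
        FROM batting
        WHERE yearID = 2014
        GROUP BY teamID
        ORDER BY {order_by} DESC
    '''
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for column in ("team_AVG", "team_SLG"):
        print(column, team_rates(column))
```

Calling `team_rates("team_AVG")` and `team_rates("team_SLG")` mirrors running the query twice with a different ORDER BY. With these toy rows the team that leads in average is not the team that leads in slugging, which is the same kind of split the real 2014 data showed.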

The Colorado Rockies having the best team slugging percentage is likely due to the thin air at their home ballpark. The next query to validate this would be to see how much better team slugging percentages were when teams played at Coors Field. That’s an exercise for another day. However, to validate that I computed the right numbers, I headed over to ESPN to check team averages.

Things checked out, which is always nice. Nothing really groundbreaking in terms of analysis yet, but getting all of the data onto one computer, and being able to write queries against it and then share, well, that’s pretty astounding. The Internet really has brought about so much empowerment for individuals. I’m looking forward to spending more time in the data, and sharing along the way.

Pros Are Fast

IM World Champ

Yeah, so today I learned just how fast pros are. Specifically, current (and 3x) Ironman World Champion Mirinda Carfrae. From the moment she passed me (going uphill into about 15mph wind) to the time I could snap this photo, she covered that distance. I run with my phone in my hand, so I didn’t have to fumble about for it. It was humbling and amazing to witness.

Piranha Eat Fast

We took the kids over to the Denver Aquarium today, hoping to take advantage of the New Year holiday and avoid the crowds. For this installment of Today I Learned, I was surprised to find out that a large school of piranha can apparently eat an entire cow in 3-4 minutes. 3. To. 4. Minutes. I’m not sure that a well-coordinated pack of human beings could strip a cow carcass that fast, even with proper butchering knives. I was a bit skeptical when I read this, but I did a little searching and it turns out that an account of this sort of cow-nivorous behavior (groan) was reported by none other than President Theodore Roosevelt on an expedition to Brazil. Sure, the fish had been corralled and starved for a few days, but hell’s bells, man.

Running in the Cold

In the spirit of reddit’s “today I learned” subreddit, I have decided that at least once a week, I want to share something that I learned. Today’s edition is about running in the cold. And by cold I mean…brrrrrrrrrrrr cold.

There’s been a bit of a cold snap moving through these parts, and the snow has come with it. Over the course of the last few days, we have seen roughly 10 inches of new snow. Today the sun finally broke through, but all that meant was that I was fooled into thinking that it might actually be warm outside. Today’s workout was a threshold bike ride on the trainer followed by a transition run outside. Given all of my problems running in the heat, it really shouldn’t surprise me that I run well in a colder climate.

So what did I learn today?

  1. Somewhat at odds with what I expected, it was easier to cut a new path while running in 6-8″ of snow than to run on the tracks left by an XC skier.
  2. If you bundle up appropriately, the cold is nothing to fear when exercising. My specific ensemble included running tights, arm and leg warmers, a cold-weather cycling jacket, an Under Armour shirt, tri shorts and a sleeveless jersey, really good gloves, and a beanie.
  3. Bundling up everywhere else will not prevent your toes from getting cold. Time to invest in some proper cold weather running socks.
  4. Yaktrax are an amazing product for running on snow and ice. I had never had cause to buy them before, but they came on the recommendation of Coach Ben and someone at work. I got the running model, and they pretty much made me forget I was running on ice and snow.
  5. One should not put leg warmers on top of running tights. They should go underneath. Otherwise you will stop every 5 minutes to pull them back up, and with gloves on, this is complicated.
  6. Snot freezes.