Twinkie Town: An SB Nation Community


Buying a World Series

I've been playing with a dataset I pulled from baseball-reference last weekend (should have been doing stats homework... ah well). 

My idea stemmed from Montanatwinsfan's post and the community discussion that followed on the validity of "new" vs. "old" statistics. The ultimate goal is to compile a dataset of team statistics by year and run a series of multiple regressions on the old and new stats to see which ones hold up under statistical scrutiny. The research question is: "Which statistics contribute the most to predicting wins, playoff appearances and ultimately World Series appearances?"


 

I currently have all data from 2000 through 2007, but unfortunately haven't yet bought SPSS to run the regressions. In the meantime, I've been playing with the dataset in pivot tables a la Excel 2007 (which, for you data miners out there, is pretty slick compared to the 2003 version).

I realize that most of this is common sense stuff, but thought it might be interesting to have the numbers to back up the common sense.  In what I hope will be the first of several posts based on the data, here is a look at average salary by year, broken into three groups: Teams that did not make the playoffs, teams that made the playoffs but not the WS, and teams that played in the WS. 

I did not break it out any further because the variance becomes too large to draw any meaningful conclusions. This means that each WS data point covers two teams, and each playoff data point covers eight teams (including the WS teams).
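For readers who would rather script the grouping than build a pivot table, here is a minimal sketch in Python. The team rows and payroll figures are made up for illustration and are not the baseball-reference numbers; the `stage` labels and the function name are assumptions as well. It follows the note above that the playoff group includes the WS teams.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (year, team, payroll in $ millions, furthest stage reached).
# Placeholder values only, not the figures from the actual dataset.
rows = [
    (2007, "Team A", 140.0, "ws"),
    (2007, "Team B", 55.0, "ws"),
    (2007, "Team C", 110.0, "playoffs"),
    (2007, "Team D", 90.0, "playoffs"),
    (2007, "Team E", 70.0, "none"),
    (2007, "Team F", 60.0, "none"),
]

def average_payroll_by_group(rows):
    """Return {(year, group): mean payroll}, similar to a year-by-group pivot table."""
    buckets = defaultdict(list)
    for year, _team, payroll, stage in rows:
        if stage == "none":
            buckets[(year, "non-playoff")].append(payroll)
        else:
            # The playoff group includes the WS teams.
            buckets[(year, "playoff")].append(payroll)
            if stage == "ws":
                buckets[(year, "ws")].append(payroll)
    return {key: mean(values) for key, values in buckets.items()}

averages = average_payroll_by_group(rows)
for (year, group), avg in sorted(averages.items()):
    print(year, group, round(avg, 2))
```

With only two teams in each WS bucket, a single low-payroll pennant winner moves that average a long way, which is the variance problem described above.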

The first observation I took from the data is that teams that did not make the playoffs have steadily increased their salary, but fall well short of the kind of money teams that make the playoffs are spending. The second observation is that the average salary of WS teams fluctuates so wildly that it's almost impossible to draw conclusions from the data. Four of the WS data points fall below the playoff average (2002, 2005, 2006, 2007) and the other four fall above the playoff average (2000, 2001, 2003, 2004). We'll approach this anomaly in a moment.

Next, it was time to take a step further into the data.  I created linear regression lines and removed the trend lines from the graph.  You will also note that I provided the regression equation and R-squared calculation, if you're into that sort of thing.

 

 The first thing to note about this graph is that the R-squared value for WS is extremely low, again validating that there is just too little data and too much variance to make an accurate inference about the regression line.  The other two lines, however, seem to fit the data quite well and tell an interesting story.  Simply put, non-playoff teams are averaging a 2.4 million increase per year, while playoff teams are averaging a 6.8 million increase per year.  To put it in context, playoff teams in 2007 spent an average of 24 million more than non-playoff teams.  If this regression holds true, by 2012, playoff teams will spend an average of 46 million more than non-playoff teams!
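As a rough illustration of what the chart's trend line, equation and R-squared represent, here is a small Python sketch of a one-variable least-squares fit. The yearly averages fed into it are invented placeholders, not the values behind the graph, and `fit_line` is a hypothetical helper name. The last few lines redo the 2012 projection using only the figures quoted above (a 24 million gap in 2007 and slopes of 6.8 and 2.4 million per year).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared

# Made-up yearly averages in $ millions (NOT the data from the graph).
years = list(range(2000, 2008))
example_avg = [60.0, 63.0, 65.0, 68.0, 69.0, 72.0, 75.0, 77.0]
slope, intercept, r2 = fit_line(years, example_avg)
print(round(slope, 2), round(r2, 3))

# Projection using only the numbers quoted in the post.
gap_2007 = 24.0           # playoff minus non-playoff average, $ millions
playoff_slope = 6.8       # $ millions per year
non_playoff_slope = 2.4   # $ millions per year
gap_2012 = gap_2007 + (playoff_slope - non_playoff_slope) * (2012 - 2007)
print(round(gap_2012, 1))
```

The gap widens by 6.8 - 2.4 = 4.4 million per year, so five more years adds 22 million to the 2007 gap of 24 million, which is where the 46 million figure comes from.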

While these results may not be earth-shattering to most people who have been lamenting about the Yankees (and now Boston) "buying" the World Series, I wanted to slice this data one more time.  This time, I removed both Boston and New York (Yankees) from the data to see how our regression lines would change:

Wow! The annual increase in playoff team salaries is now only 1 million more than that of non-playoff teams. Additionally, the difference between playoff teams and non-playoff teams is holding fairly constant at about 8 million.

While again, we can't make any actual inferences about the World Series regression line because the R-squared is too small, it's really crazy to see the sheer randomness (is that a word?) of the World Series teams' payrolls. With New York and Boston out of the picture, the Marlins (2003) and Rockies (2007) look almost comical when compared to the rest of the data.

The conclusions drawn from the data above are as follows:

1. The ability to "buy" a World Series is still inconclusive because there is just too little data to accurately predict World Series appearances based on salary. In fact, several of the past WS teams had significantly lower salaries than even the non-playoff teams.

2. The ability to "buy" a playoff berth seems to be a fairly established trend. The rate of increase for playoff teams is greater than that for non-playoff teams, so we can expect the margin to grow over time, regardless of whether you include Boston and New York or not.

3. New York and Boston's combined salary will soon be larger than the sum of the other 28 teams. (Note: Intentional hyperbole thrown in for comic relief... it's late and I've just spent an hour writing about statistics, give me a break)


Comments


Love the research.

I like where you're going with this, and I enjoyed this article. Well done.

My perspective: I'm not sure you can support your conclusions based on the data above. Your conclusions are based on linear regressions, which are themselves based on just eight data points. I'm not sure that we can generalize from this set.

Second, you haven't supported your causation. Your inference - that money buys the playoffs, that y is caused by x - is based on correlation. I'm not suggesting that there is no relationship between the two, only that there are other factors in play.

I like the statistical posts, though - I encourage you to keep working on this type of thing.

by Jon Marthaler on Apr 9, 2008 3:12 AM EDT

Causation vs. correlation

Thanks for bringing this up Jon. It's a good point to make when looking at statistics and I completely agree with your concerns.

First, while I currently only have data for 2000-2007, I hope to eventually build the '90s into the dataset when I get more time. With more data, and without breaking the results out by year, there should be enough data in the "playoff - no ws" category to support these inferences.

Second, as mentioned in my original post, I am hoping to get a statistical software package like SPSS to help with running regressions instead of correlations. Not only will this allow me to add multiple variables to the model, but also help to isolate the effect that team salary (or any other variable) has on playoff appearances and ultimately World Series appearances.

What would my life be like without the '91 World Series?

by MJesser on Apr 9, 2008 9:54 AM EDT

If anyone has the spare time

Jon - well put. These are GREAT posts for starting discussions. The causation/correlation question, to me, is that the teams that make the playoffs tend to get more revenue, allowing them to spend more, but they were already good, so they keep going to the playoffs (think Yankees, Red Sox, Braves). Obviously this is a chicken/egg, dog chasing its tail kind of question.

The statistic I'd really like to see, if anyone is interested in putting it together, is the year over year wins change relative to the year over year payroll change. I think this helps isolate the impact of spending MORE money. It won't account for the fact that many good teams have to start increasing their payroll to keep their young stars, so the results may not be intuitive.

by snolls on Apr 9, 2008 8:27 AM EDT
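The year-over-year comparison suggested in the comment above could be set up along these lines. This is only a sketch: the seasons, payrolls and win totals are hypothetical, `year_over_year` is an invented helper, and it assumes the seasons listed for a team are consecutive.

```python
# Hypothetical (year, payroll in $ millions, wins) for a single team.
seasons = [
    (2005, 50.0, 80),
    (2006, 58.0, 88),
    (2007, 70.0, 85),
]

def year_over_year(seasons):
    """Return (year, payroll change, wins change) for each consecutive pair of seasons."""
    ordered = sorted(seasons)  # sort by year so pairs line up
    changes = []
    for (_, prev_pay, prev_wins), (year, pay, wins) in zip(ordered, ordered[1:]):
        changes.append((year, pay - prev_pay, wins - prev_wins))
    return changes

for year, d_pay, d_wins in year_over_year(seasons):
    print(year, d_pay, d_wins)
```

Collecting these pairs across all teams would give a payroll-change column and a wins-change column that can be plotted against each other or fed into a regression.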

This is great

Nice work on this. I agree with you, too, that the World Series data has so few datapoints as to be pretty much useless - it's especially pronounced in your chart without Boston and New York, because they represent over half the AL representatives in the World Series over the length of the study (meaning that most of the "averages" are a single datapoint), and in 2003 and 2007 they faced extremely low-payroll teams, throwing the trend data way out of whack.

I think another useful way to plot the data would be to compare payrolls to league average, rather than overall dollars - it would be interesting to see, for example, what percentage of league payroll is spent by the playoff teams vs. non-playoff teams, and how those payrolls are increasing/decreasing in relative terms, which probably more accurately represents how teams are participating in the market. It's also another situation in which you may want to consider excluding the Yankees and possibly the Red Sox for at least some of the comparisons.

"There are only two things that are infinite, the universe and human stupidity, and I'm not sure about the former." - Albert Einstein

by BeefMaster on Apr 9, 2008 10:35 AM EDT
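One way to express payrolls relative to the league, as proposed in the comment above, is sketched below. The four-team "league" and its payroll figures are hypothetical, and `relative_payrolls` is an invented name.

```python
# Hypothetical payrolls in $ millions for a tiny four-team league.
payrolls = {"Team A": 200.0, "Team B": 100.0, "Team C": 60.0, "Team D": 40.0}

def relative_payrolls(payrolls):
    """For each team, return its share of total league payroll and its ratio to the league average."""
    total = sum(payrolls.values())
    average = total / len(payrolls)
    return {
        team: {"share": amount / total, "vs_average": amount / average}
        for team, amount in payrolls.items()
    }

relative = relative_payrolls(payrolls)
for team, stats in relative.items():
    print(team, round(stats["share"], 3), round(stats["vs_average"], 2))
```

Because both measures are ratios within a season, league-wide payroll growth cancels out, so values from different years can be compared directly.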

Like it.

"I think another useful way to plot the data would be to compare payrolls to league average, rather than overall dollars."

I like this. This is one of the things I was thinking about last night - payrolls throughout the league have been going up, not just those of the top teams, and so it would be good to remove (or at least slightly remove) the effect of "inflation" from this.

You may want to correct for outliers when calculating the league average, too.

by Jon Marthaler on Apr 9, 2008 12:43 PM EDT
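Two common ways to keep a couple of very large payrolls from dragging the league average upward are the median and a trimmed mean. The sketch below compares them on hypothetical payrolls; `trimmed_mean` is an illustrative helper, not something from the thread.

```python
from statistics import mean, median

# Hypothetical payrolls in $ millions, with two large outliers at the top.
payrolls = [190.0, 140.0, 90.0, 80.0, 70.0, 60.0, 50.0, 40.0]

def trimmed_mean(values, trim=1):
    """Mean after dropping the `trim` lowest and `trim` highest values."""
    ordered = sorted(values)
    if len(ordered) <= 2 * trim:
        raise ValueError("not enough values to trim")
    return mean(ordered[trim:len(ordered) - trim])

print("mean:", mean(payrolls))
print("median:", median(payrolls))
print("trimmed mean:", round(trimmed_mean(payrolls), 2))
```

In this made-up example the plain mean sits well above the median, which is the kind of distortion the comment is worried about.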

One thing that might escape these models

is the whole complex of factors of how baseball is changing.

The '90s were an era of big boppers and steroids. Now, we seem to be moving toward an appreciation for OBP, OPS, pitching, defense and overall team speed.

While such changes might easily be captured by salary analysis, in that teams all spend trying to capture the best pool of talent, what might end up happening is that some desirable qualities, like pitching, defense and team speed, may be more likely possessed by young and fairly cheap players.

Given that there is a huge salary premium in MLB for players with ML experience, we may soon enter an era where young, inexpensive talent can make high quality contributions to a team's overall success. And thus, more and more money will be spent on scouting and minor league development as a near term competitive strategy.

Baseball, like any industry, is never static, and what you might find in such an analysis is that there is a premium that goes to a team that finds the best young talent on the planet.

by Old Twins Cap on Apr 9, 2008 10:42 AM EDT

Moneyball

While such changes might easily be captured by salary analysis, in that teams all spend trying to capture the best pool of talent, what might end up happening is that some desirable qualities, like pitching, defense and team speed, may be more likely possessed by young and fairly cheap players.

That's an interesting hypothesis, and it also made me think of an interesting avenue of research - has anyone looked into correlations between statistics and player salaries? I'd guess there's a fairly strong correlation with obvious stuff like home runs (and likely, by extension, slugging average), but I'd also wonder whether it would be possible to use the data to spot trends in what teams are looking for in players - for example, I'd guess the last decade has seen an increase in the money paid for higher walk rates, possibly along with a decreased emphasis on batting average.

That thought actually wouldn't go along with your idea at all, though, since younger players' salaries aren't determined at all by the market. Even arbitration-eligible players' salaries would have to be taken with a grain of salt in that data, since they're determined largely by seniority in addition to performance.

"There are only two things that are infinite, the universe and human stupidity, and I'm not sure about the former." - Albert Einstein

by BeefMaster on Apr 9, 2008 12:45 PM EDT

Great post

When I have more time, I'll do my best to digest it.

"You're thinking too much. Just have fun." -- Bennie "The Jet" Rodriguez in Sandlot

by cmathewson on Apr 9, 2008 11:48 AM EDT

One thought

I wonder what the graphs would look like if you take Boston and New York out of the pool of teams.

"You're thinking too much. Just have fun." -- Bennie "The Jet" Rodriguez in Sandlot

by cmathewson on Apr 9, 2008 11:57 AM EDT

Already done

It's the third graph.

"There are only two things that are infinite, the universe and human stupidity, and I'm not sure about the former." - Albert Einstein

by BeefMaster on Apr 9, 2008 12:03 PM EDT

It seems a bit silly to me...

...to take them out of the equation completely. They've won 5 of the last 10 WS and have 16 playoff appearances in the last 20 years. They are exhibits 1A and 1B in the case for a huge payroll being a huge competitive advantage.

Considering this issue without considering those two is like considering how good the '48 Braves were without Warren Spahn and Johnny Sain.

by ubelmann on Apr 9, 2008 2:05 PM EDT

The point

The point of removing Boston and New York from the third graph was more an exercise in showing that hypothesis 2 from my original post still holds true even when the extreme outliers that could significantly affect the "Playoff - Non WS" trend line were removed.

What would my life be like without the '91 World Series?

by MJesser on Apr 9, 2008 2:57 PM EDT

I've added this to the front page

because it's some good work; at the very least it's a great discussion.

I'll continue to do this for excellent posts. Keep up the good work!

by Jesse on Apr 9, 2008 1:06 PM EDT

You might not be able to buy a WS...

...but by having a huge payroll you never have to rebuild and you get yourself to the playoffs more often than a team with similar know-how and a lower budget. More chances to win the WS will lead to more WS victories in the long run. Baseball has a ways to go if it truly wants parity in the league.

by ubelmann on Apr 9, 2008 1:58 PM EDT

Exactly

This is exactly the point I'm trying to make. I think that baseball has been quite "lucky" the past couple years that extremely low payroll teams have made it to the Series.

Assuming the trend lines hold reasonably true (which may be a bit of a leap of logic, as Jon mentioned above), these low payroll teams will have an increasingly hard time even MAKING the playoffs, let alone going to the Series. This is why I attempted to focus on the differences and the increasing "gap" between Playoff and Non-Playoff teams and tried to downplay the WS numbers.

What would my life be like without the '91 World Series?

by MJesser on Apr 9, 2008 3:02 PM EDT

Well, as a graduate student in Economics about to take my econometrics comprehensive...

...I have to comment. I haven't read it too carefully because I am in a hurry, but I will say the following things.

What is your regression? From those graphs, it looks like you are regressing salary on year. This would mean you are assuming that "year" is an exogenous (non-random) variable, which is obviously reasonable, but you are also assuming that it EXPLAINS salary. This might be true (salaries go up every year), but it also looks like that data might be deflated. Also, by not including anything else in the regression you assume that ONLY year explains salary. Also, R^2 in this context is the share of the variation in salary that is explained by year, which, again, isn't really that interesting of a question. I think a more interesting regression would be something like wins = a + b*salary + c*(other performance or city characteristics), but that depends on what it is that you are trying to explain.

Also, I didn't realize baseball-reference had salary data; I'll have to check it out. I have a couple of cool baseball data sets and if you want them I can send them to you. I pulled a couple from baseball-reference (like all hitting and pitching data for the last 100 years), I got salary data from USA Today and I have some attendance data from somewhere else (I might even have ticket prices).

http://noblingblings.blogspot.com/

by Aaron Fix on Apr 9, 2008 8:37 PM EDT
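For the kind of regression proposed in the comment above (wins = a + b*salary + c*something else), here is a dependency-free Python sketch that solves the least-squares normal equations directly. The choice of team ERA as the second predictor, the data values and both function names are assumptions made for illustration. A statistics package would also report standard errors and significance, which this does not.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def multiple_regression(predictors, y):
    """Least squares for y = a + b1*x1 + b2*x2 + ...; returns [a, b1, b2, ...]."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1.0 is the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: (payroll in $ millions, team ERA) for six team-seasons, and their wins.
predictors = [(50.0, 4.8), (70.0, 4.5), (90.0, 4.1), (110.0, 4.3), (130.0, 3.9), (60.0, 4.0)]
wins = [72, 80, 90, 86, 97, 88]
a, b, c = multiple_regression(predictors, wins)
print(round(a, 2), round(b, 3), round(c, 2))
```

Here `b` is read as the change in wins per extra million of payroll with ERA held fixed, which is the "isolating the effect" idea discussed in the thread.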

Data

You're right about the regression; it's more like a simple correlation. I used the regression line charts more to show general trends in graphical form. I'm trying to get my hands on SPSS, but for now, I'm forced to use the limited linear regression provided in the Excel 2007 chart wizard. If you have any ideas on how to run multiple regression using Excel, I'm all ears!

As for the data, it took some work to find (and more work to compile the dataset), but it's all there.

Here's a sample of what I pulled for 2007 - AL: http://www.baseball-reference.com/leagues/AL_2006.shtml

You can find team payroll under the League Miscellaneous Stats section. I'm looking forward to playing with some of the other information to see what pops. I started work on some of the defensive stats, but since there are several variables, I'd rather run multiple regression so that I can isolate the effects. I also have some ideas about looking at All Star players and win percentages. We'll see.

Unfortunately the data is all rolled up to the team level, so you're limited in that regard, but for what I'm ultimately trying to accomplish, it should work fine. If you have any thoughts for future posts, let me know.

What would my life be like without the '91 World Series?

by MJesser on Apr 9, 2008 9:00 PM EDT

Great post

even though I didn't understand a word of it. :) What can I say, I'm a lawyer and I can barely do simple arithmetic even with a calculator.

Keep it coming, I enjoy learning, but also please remember to put some of this in English for me because my eyes start to glaze over when you guys throw out mathematical equations and superscript numbers...

by montanatwinsfan on Apr 9, 2008 10:14 PM EDT
