Photo: Corbin Burnes - Instagram
Corbin Burnes is not going to repeat as the National League’s Cy Young Award winner this week, but at least one prominent statistical model suggests he should.
Burnes is not a finalist for the award, which will be announced on Wednesday, so he will finish outside the top three in the voting. Baseball Prospectus’ Wins Above Replacement Player (WARP) model suggests he deserves better: At 5.4 WARP, he was the model’s top-ranked pitcher in all of baseball this season. He finished about three-tenths of a win ahead of any other pitcher, including likely NL Cy Young winner Sandy Alcantara of the Marlins. Burnes’ WARP total is higher than his totals in comparable metrics: Baseball Reference’s version of Wins Above Replacement (bWAR) has him at 4.0 compared to Alcantara’s 8.0, and FanGraphs’ version (fWAR) has him at 4.6 to Alcantara’s 5.7.
Jonathan Judge monitors Baseball Prospectus’ statistical models, and we recently talked to him in an effort to understand why their process sees Burnes differently from the other valuation systems.
KL: I know WARP was one of the first Wins Above Replacement models. How long has it been around, and how has it changed over the years?
JJ: It has been around for at least 25 years, because it started out as Value Over Replacement Player (VORP), which was just a pure offensive metric, and then it sort of started to get defense and other things added to it. The originator of it who is most commonly associated with it is Keith Woolner, who is currently the Principal Data Scientist of Baseball Research and Development for the Cleveland Guardians.
The way it’s changed over the years, I would say it’s just largely gotten more sophisticated as our computing power has gotten more sophisticated. It has gone from being what some of the other metrics still are, like at FanGraphs or Baseball Reference, which is basically “tabulate your results, then try to compensate for your park a little bit,” to where we are actually directly accounting for quality of opponent. We account for things like that. So, it’s gotten a lot more aggressive, and we tend to be a lot more resistant to noise and we think our assessments tend to be a little more accurate for that reason. But a lot of it is just taking advantage of the things we’ve learned from catcher framing and some of those other aspects and trying to take those lessons and let our assessments of batters and pitchers benefit from that.
KL: The differences between the WAR models are kind of interesting all of the time, but the reason they’re topical right now is your assessment of Corbin Burnes, your #1 pitcher. You have Burnes valued differently from most of the commonly cited public metrics. Do you have a feel for why your model saw him differently?
JJ: The thing is, we have a couple of sort of benefits. It’s much harder for a pitcher to have their team influence their numbers for our material than it is for other metrics. So people who strike out a lot of batters, do not walk a lot of batters and tend to get very good results on balls in play on average, as we would expect, are going to do well in our metrics. For us, we would say that’s the way it should be, because things like FIP (Fielding Independent Pitching), the FanGraphs method, they pretend that balls in play are completely uncontrollable by the pitcher. They’re certainly very volatile, but they’re absolutely controllable by a pitcher, to some extent.
Our friends at Baseball Reference really tend to overcredit or punish pitchers for the quality of their defense, which is why they keep ranking (Phillies pitcher) Aaron Nola really low and we keep saying “no, he’s really good.” He’s never seen as their league-leading pitcher or close to it, and we every year keep saying the same thing. It’s because we know that his defense is not great, and we can sort of account for that. So I like to think that we do a better job of really trying to coax out skill and actual contributions, and separate that from the results, within reason.
KL: Is there a level to which you want your metric to be different from the other often-cited numbers? I know you’re not crafting numbers specifically to be different, but is there a level of difference that feels about right to you, as compared to being too similar or different from other models?
JJ: I certainly don’t think we try to be different on purpose. A lot of times DRA (Deserved Run Average) and FIP will agree on people a fair amount. So I tend to think that in general the differences are, I don’t want to say small, but if you’re a good pitcher in one you’re going to be a good pitcher in the other. There are very few instances where one will say you’re terrible and the other will say “My God, you’re a Cy Young candidate.”
I think it is interesting where they actually disagree. And so, I think for us the value is that we have noticed that it’s usually because the other, in our opinion cruder, measurements are usually punishing the pitcher for stuff they can’t do much about. For example, the quality of their defense, or that they’re being credited or cushioned too much by a park. So that tends to be the difference. We feel that we tend to distill down a little bit. Maybe a better way to think about it is that we neutralize more, I think, than others do, so when people have a serious contextual problem that would otherwise skew things a bit we tend to, we feel, do a better job of stopping that from being the case.
I think Corbin Burnes is interesting because he doesn’t have the innings and such that Alcantara has (Burnes pitched 202 innings in 2022 compared to Alcantara’s 228 2/3), but one thing that Burnes does do is strike a lot of guys out, he doesn’t walk a lot of guys, and the contact that he gets tends to be very weak. So those are all things that lean toward why we say “look, this guy is doing everything that you could ask him to do.” We just feel that, by our measurements at least, we value strikeouts extremely highly, we value walks extremely highly, and we also value people who just keep having balls that land in play not go very far or not be very strong. That usually tends to be the reason when there’s an unusual difference.
KL: There are also some cases with that variance where you end up being the low model on pitchers. White Sox pitcher Johnny Cueto, for example, was valued at 3.5 WAR by Baseball Reference but 0.2 by WARP. Jose Quintana of the Pirates and Cardinals (nearly 4 fWAR, 0.5 WARP) is another example. Is this just a case of the same thing, where the mix of inputs creates a very different result?
JJ: I tend to think so. I think we tend to be more skeptical of people who are sort of generating those sorts of results on weak contact only. I think we also have been a little harsh on (Brewers reliever) Brent Suter over the years. We tend to be fairly skeptical of people who in our view are good at avoiding strong contact or at least bad results on contact at an above average level, and that seems to be the primary thing they do. We don’t tend to see that as a particularly sustainable skill and that’s something that, at least within a single season, just doesn’t get as much weight as strikeouts because we know strikeouts are a compelling skill in part because they’re a compound skill: You have to do something three times in a row in the same at bat to achieve it, which means it’s almost certainly a better measure.
KL: At the risk of putting you on the spot, if you had a vote for NL Cy Young would you have voted for Corbin Burnes?
JJ: What I might have done, I certainly would have thought hard about it, but the one concern I have is that he threw fewer innings than Alcantara did, not by a lot but some. There’s a school of thought that the only way to really fairly compensate for that is to add a bunch of league average innings, about 20 average innings (to Burnes) to compensate for the fact that he did pitch fewer innings.
I think I would have thought very hard about casting the vote for Burnes, I wouldn’t have had a problem with that, but I probably could have been talked into Alcantara. He really did a nice job and even though we rated him lower than others, I think there’s something to be said for throwing almost 230 innings at such a high level. I think I would have considered that option as well.