Which brings me to this post about Cliff Lee. Various sources (several articles, Fangraphs, and yes, Wikipedia) tell me that Lee throws five different pitches: a four-seam fastball, a two-seam fastball, a cutter, a changeup, and a curveball. Plenty of past articles have detailed the faults of MLBAM's pitch type classification, which is where all of the great PITCHf/x data comes from, and/or suggested better ways to reclassify pitches. I'll preface this and my future posts by saying that I will use MLBAM's pitch classification for now, as developing my own algorithm to determine a better classification seems like a time-consuming and daunting task, to say the least.
But before I dive into plots of Cliff Lee's pitches, I need to look at his pitch types according to MLBAM. Here is the frequency of his pitches by batter handedness since 2007, according to MLBAM's classification and my database:
FA to R: 2016 pitches
FA to L: 885 pitches
FF to R: 1585 pitches
FF to L: 773 pitches
FC to R: 301 pitches
FC to L: 79 pitches
CH to R: 1237 pitches
CH to L: 80 pitches
CU to R: 490 pitches
CU to L: 286 pitches
SL to R: 222 pitches
SL to L: 384 pitches
Fangraphs's player page for Cliff Lee tells us that Lee has rarely used his slider in recent years (less than 2% of his pitches) but uses his curveball much more. In the list above, the counts for SL (slider) and CU (curveball) are in the same range, so something is definitely fishy here. Fangraphs uses classifications from Baseball Info Solutions, so I am inclined to believe that if Lee does throw a slider, he throws it rarely rather than almost as often as his curveball.
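To put a number on "the same range", here is a small sketch that turns the counts listed above into shares of all listed pitches. The counts are copied from the list; the `counts` dictionary and the `share` helper are just illustrative names. Keep in mind that these counts cover everything since 2007, while the Fangraphs figure refers to recent years, so the comparison is rough.

```python
# Counts copied from the MLBAM list above: (pitch type, batter hand) -> pitches.
counts = {
    ("FA", "R"): 2016, ("FA", "L"): 885,
    ("FF", "R"): 1585, ("FF", "L"): 773,
    ("FC", "R"): 301,  ("FC", "L"): 79,
    ("CH", "R"): 1237, ("CH", "L"): 80,
    ("CU", "R"): 490,  ("CU", "L"): 286,
    ("SL", "R"): 222,  ("SL", "L"): 384,
}

total = sum(counts.values())

def share(pitch_type):
    """Fraction of all listed pitches that MLBAM tagged as pitch_type."""
    n = sum(v for (ptype, _hand), v in counts.items() if ptype == pitch_type)
    return n / total

for ptype in ("SL", "CU"):
    print(f"{ptype}: {share(ptype):.1%}")
```

With these counts, sliders come out to about 7.3% of the listed pitches and curveballs to about 9.3%, which is well above the sub-2% slider usage Fangraphs reports.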
Now a very thorough study would include looking at the release points, spin deflection, and movements of Lee's sliders according to MLBAM to see if they were somehow misclassified as sliders when they were really curveballs that didn't break. I'm not going to do that in this post. Instead, I'll simply ignore the pitches that were classified as sliders and move on.
The same goes for FA (generic fastballs). These are presumably fastballs that could have been four-seamers or two-seamers, so I will ignore FA in this analysis as well, which means that I'll be leaving out two-seam fastballs.
Now that we've got that out of the way, let's look at location density plots of Cliff Lee's four-seam fastballs against RHH (1585 pitches) and LHH (773 pitches):
I've added contour lines indicating the area inside which the batter is 50% likely to swing at a pitch, as well as the area inside which the batter is 50% likely to make contact on a pitch (hopefully I drew them right and am interpreting them correctly; feel free to correct me when you settle down from the PITCHf/x Summit, Jeremy). I also decided to use this color scheme because it contrasts well with the red and blue contour lines and also makes a yellowish contour line to show where Lee tends to locate his four-seam fastballs the most.
Here, Lee throws his four-seamer all over the zone against right-handed hitters, but throws it mostly to the outer (left) part of the zone against left-handed hitters. It would appear that Lee often gets called strikes in this area against LHH, which means that left-handed hitters are more likely to swing when the fastball comes up and inside or down the middle. Both RHH and LHH seem to make contact with Lee's fastballs when they do swing, even when the four-seamer is up and out of the zone.
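For anyone curious how a 50% swing region like this can be estimated, here is a minimal sketch. It is not necessarily how the contours in these plots were drawn: it swaps in a simple Gaussian kernel smoother over pitch locations and keeps the grid cells where the smoothed swing rate reaches 0.5. The function names, the 0.35 ft bandwidth, and the synthetic data are all assumptions for illustration; `px` and `pz` are assumed to be horizontal and vertical plate location in feet.

```python
import math
import random

def swing_probability(pitches, x, z, bandwidth=0.35):
    """Gaussian-kernel-weighted share of nearby pitches that drew a swing.

    pitches: list of (px, pz, swung) with px/pz in feet and swung as 0 or 1.
    bandwidth is a hypothetical smoothing width, not a tuned value.
    """
    num = den = 0.0
    for px, pz, swung in pitches:
        w = math.exp(-((px - x) ** 2 + (pz - z) ** 2) / (2 * bandwidth ** 2))
        num += w * swung
        den += w
    return num / den if den > 0 else 0.0

def swing_zone(pitches, xs, zs, level=0.5):
    """Grid cells whose smoothed swing probability is at least `level`."""
    return [(x, z) for x in xs for z in zs
            if swing_probability(pitches, x, z) >= level]

# Synthetic stand-in data (NOT real PITCHf/x): the hitter swings at anything
# within 1 ft of the point (0, 2.5) and takes everything else.
rng = random.Random(1)
pitches = []
for _ in range(500):
    px, pz = rng.uniform(-2, 2), rng.uniform(0.5, 4.5)
    swung = 1 if math.hypot(px, pz - 2.5) < 1.0 else 0
    pitches.append((px, pz, swung))

xs = [i / 10 for i in range(-20, 21)]   # -2.0 ft to 2.0 ft
zs = [i / 10 for i in range(5, 46)]     # 0.5 ft to 4.5 ft
zone = swing_zone(pitches, xs, zs)
print(len(zone), "grid cells inside the 50% swing region")
```

The boundary of the returned cells is the 50% contour. The same function gives a contact region if the third field marks contact instead of a swing, and a plotting library's contour routine could draw the line from the gridded probabilities.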
Here's a look at Cliff Lee's curveballs and opposing hitters' swing zones and contact zones (490 pitches to RHH, 286 pitches to LHH):
Lee distributes his curveballs very similarly against RHH and LHH, but hitters react differently depending on handedness. It would appear at first that LHH have the advantage by making more contact, but remember that these contour lines only mark where the batter is 50% likely to swing or to make contact. This means that right-handed hitters are making better, more solid contact in the sweet spots of the zone against Lee than left-handed hitters are, as the LHH 50% contact zone encompasses the areas above and below the zone. The RHH contact zone is much smaller relative to the RHH swing zone than is the case for the LHH contour lines, but that doesn't necessarily mean that RHH are getting more swinging strikes than LHH. Even if they were, making contact on low curveballs out of the zone usually results in a routine ground out. Basically, the larger the contact zone, the worse the contact being made, so Lee's curveballs are more successful against LHH than RHH, which is expected from a left-hander.
Finally, let's look at Cliff Lee's cutters (301 pitches) and his changeups (1237 pitches) against right-handed hitters (I'm not including cutters and changeups against LHH because Lee rarely throws these to them, fewer than 100 times for either since 2007):
Lee likes to throw his cutter all over the middle parts of the zone against RHH, and likes to locate his changeups in the outer parts of the zone. What's interesting is that the contact zone is smaller against changeups than against cutters but the swing zones are about the same size, albeit in different locations. To me, this means that Lee is able to get more swinging strikes against RHH with changeups, given that the swing zones are similar in size.
These plots are definitely interesting to look at and analyze, but they are also in a beta stage. I hope to make "visual scouting reports" like these in the future for both batters and pitchers, similar to what Jeremy Greenhouse did last year (to borrow his terminology). Having a good understanding of the relationship between the swing zone and contact zone will be important (and possibly even the swinging strike zone and balls in play zone). Hopefully I was able to interpret them correctly here, but I am ready to go back to the scatter plots to see if some of my analysis is confirmed there as well.
Helpful Tip: I don't have my own pitch classification algorithm either, but if you're looking individually at a single pitcher, you really need to do the pitch classifications manually.
In general, it's not that difficult. Plot scatterplots of pfx_x by pfx_z and pfx_x by start_speed and you should see the pitches break up into decent clusters. Alternatively, you could use spin direction/spin rate and pitch velocity instead.
Note: Separating two-seam and four-seam fastballs can be a pain, but really you just take your best shot at it (two-seamers move more in on same-handed batters and have more drop). Sometimes looking at individual starts helps a lot, though that might be more time consuming.
Anyhow, my point is, if you do individual looks at pitchers, you really need to reclassify your pitches manually. You can't trust the FA designation, and the algorithm is still too happy to give most fastballs an FF designation, even when they are clearly FTs.
Your graphs show you're using GNU R, so it should be trivial to do k-means clustering to get classifications. I'd use start_speed, pfx_x and pfx_z as the input variables. The only thing you have to do is pick your number of clusters in advance.
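As a rough illustration of the k-means idea, here is a minimal sketch in Python rather than R (where the built-in `kmeans` function would be the natural call). It clusters on start_speed, pfx_x and pfx_z after z-scoring each column, and it uses farthest-point seeding only so that the result is deterministic. Both of those are assumptions of this sketch, not part of the suggestion above, and the eight pitches are made-up numbers, not Lee's data.

```python
import math

def standardize(rows):
    """Z-score each column so mph and inches contribute on a similar scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, sds)) for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each pitch goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels

# Hypothetical (start_speed mph, pfx_x in, pfx_z in) rows: four fastball-like
# pitches followed by four curveball-like pitches.
pitches = [
    (91.0, -5.0, 9.0), (90.5, -4.5, 9.5), (91.5, -5.5, 8.5), (90.0, -5.0, 10.0),
    (75.0, 4.0, -6.0), (74.5, 4.5, -6.5), (75.5, 3.5, -5.5), (76.0, 4.0, -7.0),
]
labels = kmeans(standardize(pitches), k=2)
print(labels)
```

The cluster numbers carry no meaning on their own, so each cluster still has to be inspected (for example, by its average speed and movement) and named by hand, and the number of clusters has to be chosen up front.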
Right, you can use k-means clustering also (though at least in my program you need to check the results to make sure it didn't misclassify an obvious pitch).
Right, for example, applying k-means (clusters = 4; he clearly has a four-seam, two-seam, curveball, and changeup) resulted in my program splitting up the curveballs for some odd reason.
But maybe R doesn't have this problem.
Thanks a lot for the advice. I'll definitely look into these. I have looked at the pfx_x and pfx_z variables in movement plots before, but I will take a look at clustering this week, at least for Lee. Hopefully it categorizes the curveballs and two-seamers reasonably.