tue, 26-dec-2023, 14:33

Introduction

For the past two years I’ve played Yahoo fantasy baseball with a group of friends. It’s a fun addition to watching games because it requires you to pay attention to more than just the players on the teams you root for (especially important if your favorite “team” is the Athletics).

Last year we had a draft party, and it was interesting to see how differently people approached the draft. Some of us chose players for emotional reasons, like whether they played for a favorite team or what country they were from, and some used a very analytical approach. The last two years I’ve tended to be more on the emotional side, preferentially choosing former Oakland Athletics players in the first year and current Phillies last year. Some brought computers to track choices and rankings, and some brought nothing but their phones and minds.

I’ve been working on my draft strategy for next year, and plan to take a more analytical approach. I’m working on an app that will have all the players in the draft ranked, and will let me easily mark off who has been selected and who I’ve added to my team, in real time as the draft is underway.

One of the important considerations for choosing any player is what positions they can play. Not only do you need to field a complete team with pitchers, catchers, infielders, and outfielders, but some players are capable of playing multiple positions, and those players can be more valuable to a fantasy manager than their pure numbers would suggest because you can plug them into different positions on any given day. Last year I had Alec Bohm on my team, which allowed me to fill either first base (typically manned by Vladimir Guerrero Jr.) or third, depending on what teams were playing or who might be injured or getting a day off. I used Brandon Drury to great effect two years ago because he was eligible for three infield positions.

Positional eligibility for Yahoo fantasy follows these rules:

  • Position eligibility – 5 starts or 10 total appearances in a position.
  • Pitcher eligibility – 3 starts to be a starter, or 5 relief appearances to qualify as a reliever.
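
To make the two rules concrete, here is a minimal sketch in base R. The function names position_eligible() and pitcher_roles() are invented for this illustration and are not part of any package; the thresholds are the ones listed above.

```r
# Hypothetical helpers (names invented for this sketch) encoding the rules above.

# A position player qualifies with 5 starts or 10 total appearances.
position_eligible <- function(starts, subs) {
   starts >= 5 | (starts + subs) >= 10
}

# A pitcher qualifies as SP with 3 starts and as RP with 5 relief appearances.
pitcher_roles <- function(starts, relief) {
   roles <- c(SP = starts >= 3, RP = relief >= 5)
   names(roles)[roles]
}

position_eligible(starts = 4, subs = 6)   # TRUE: 10 total appearances
position_eligible(starts = 4, subs = 5)   # FALSE: 4 starts, 9 total
pitcher_roles(starts = 3, relief = 7)     # "SP" "RP"
```

A pitcher who meets neither threshold gets an empty result from pitcher_roles().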

In this post I will use Retrosheet event data to determine the positional eligibility for all the players who played in the majors last year. In cases where a player in the draft hasn’t played in the majors but is likely to reach Major League Baseball in 2024, I’ll just use whatever position the projections have him in.

Methods

I’m going to use the retrosheet R package to load the event files for 2023, then determine how many games each player started and substituted at each position, and apply Yahoo’s rules to determine eligibility.

We’ll load some libraries, get the team IDs, and map Retrosheet position IDs to the usual position abbreviations.

library(tidyr)
library(dplyr)
library(purrr)
library(retrosheet)
library(glue)

YEAR <- 2023

team_ids <- getTeamIDs(YEAR)

positions <- tribble(
   ~fieldPos, ~pos,
   "1", "P",
   "2", "C",
   "3", "1B",
   "4", "2B",
   "5", "3B",
   "6", "SS",
   "7", "LF",
   "8", "CF",
   "9", "RF",
   "10", "DH",
   "11", "PH",
   "12", "PR"
)

Next, we write a function to retrieve the data for a single team’s home games and extract the starting and substitution information, which is stored as $start and $sub matrices in the Retrosheet event files. Then we loop over this function for every team, and convert the position IDs to position abbreviations.

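# Load one team's home games and stack the starter and substitute
# records from every game into a single data frame.
get_pbp <- function(team_id) {
   print(glue("loading {team_id}"))

   # A list with one element per home game
   pbp <- getRetrosheet("play", YEAR, team_id)

   # $start holds the starting players for one game
   starters <- map(
      seq_along(pbp),
      function(game) {
         pbp[[game]]$start |>
            as_tibble()
      }
   ) |>
      list_rbind() |>
      mutate(start_sub = "start")

   # $sub holds the substitutions made during one game
   subs <- map(
      seq_along(pbp),
      function(game) {
         pbp[[game]]$sub |>
            as_tibble()
      }
   ) |>
      list_rbind() |>
      mutate(start_sub = "sub")

   bind_rows(starters, subs)
}

# Run for every team, then translate fieldPos codes to abbreviations
pbp_start_sub <- map(
   team_ids,
   get_pbp
) |>
   list_rbind() |>
   inner_join(positions, by = "fieldPos")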

That data frame looks like this, with one row for each start or substitution by a player in each game of the 2023 regular season:

# A tibble: 76,043 × 7
   retroID  name                  team  batPos fieldPos start_sub pos
   <chr>    <chr>                 <chr> <chr>  <chr>    <chr>     <chr>
 1 sprig001 George Springer       0     1      9        start     RF
 2 bichb001 Bo Bichette           0     2      6        start     SS
 3 guerv002 Vladimir Guerrero Jr. 0     3      3        start     1B
 4 chapm001 Matt Chapman          0     4      5        start     3B
 5 merrw001 Whit Merrifield       0     5      7        start     LF
 6 kirka001 Alejandro Kirk        0     6      2        start     C
 7 espis001 Santiago Espinal      0     7      4        start     2B
 8 luplj001 Jordan Luplow         0     8      10       start     DH
 9 kierk001 Kevin Kiermaier       0     9      8        start     CF
10 bassc001 Chris Bassitt         0     0      1        start     P
# ℹ 76,033 more rows
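
The convert-and-stack pattern used in get_pbp() can be tried without downloading anything. The sketch below builds two made-up "games" whose $start matrices imitate the column layout shown above; the players and values are toy data, not Retrosheet output.

```r
library(dplyr)
library(purrr)

# Two made-up "games"; the matrix layout imitates the $start element
# (toy data with invented players, not real Retrosheet output).
cols <- c("retroID", "name", "team", "batPos", "fieldPos")
games <- list(
   list(start = matrix(
      c("aaaa001", "Player A", "0", "1", "9",
        "bbbb001", "Player B", "0", "2", "6"),
      ncol = 5, byrow = TRUE, dimnames = list(NULL, cols)
   )),
   list(start = matrix(
      c("aaaa001", "Player A", "0", "1", "10"),
      ncol = 5, byrow = TRUE, dimnames = list(NULL, cols)
   ))
)

# Convert each matrix to a tibble, then stack the per-game tibbles.
starters <- map(
   games,
   function(game) {
      as_tibble(game$start)
   }
) |>
   list_rbind() |>
   mutate(start_sub = "start")

starters
```

Because the matrices are character matrices, every column comes through as <chr>, which is why fieldPos is stored as text in the positions lookup table.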

Next, we convert that into appearances by grouping the data by player, whether they were a starter or substitute, and by their position. Since each row in the original data frame is per game, we can use n() to count the games each player started and subbed for each position.

appearances <- pbp_start_sub |>
   group_by(retroID, name, start_sub, pos) |>
   summarize(games = n(), .groups = "drop") |>
   pivot_wider(names_from = start_sub, values_from = games)

That looks like this:

# A tibble: 3,479 × 5
   retroID  name          pos     sub start
   <chr>    <chr>         <chr> <int> <int>
 1 abadf001 Fernando Abad P         6    NA
 2 abboa001 Andrew Abbott P        NA    21
 3 abboc001 Cory Abbott   P        22    NA
 4 abrac001 CJ Abrams     SS        3   148
 5 abrac001 CJ Abrams     PH        2    NA
 6 abrac001 CJ Abrams     PR        1    NA
 7 abrea001 Albert Abreu  P        45    NA
 8 abreb002 Bryan Abreu   P        72    NA
 9 abrej003 Jose Abreu    1B       NA   134
10 abrej003 Jose Abreu    DH       NA     7
# ℹ 3,469 more rows
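
To see where the NA values come from, here is the same count-and-pivot step on a handful of invented rows. The players and the toy and toy_appearances names are made up for this sketch.

```r
library(dplyr)
library(tidyr)

# Toy data: one row per game appearance (invented players).
toy <- tribble(
   ~retroID,  ~name,      ~start_sub, ~pos,
   "aaaa001", "Player A", "start",    "SS",
   "aaaa001", "Player A", "start",    "SS",
   "aaaa001", "Player A", "sub",      "SS",
   "aaaa001", "Player A", "sub",      "PH",
   "bbbb001", "Player B", "sub",      "P"
)

# Count games per player, role, and position, then spread the
# start/sub counts into their own columns.
toy_appearances <- toy |>
   group_by(retroID, name, start_sub, pos) |>
   summarize(games = n(), .groups = "drop") |>
   pivot_wider(names_from = start_sub, values_from = games)

toy_appearances
```

Player A ends up with 2 starts and 1 substitute appearance at SS, while the PH and P rows have NA in the start column because no such row existed to count. pivot_wider() also accepts values_fill = 0, which would fill those gaps directly; the post keeps the NA values and replaces them in the next step.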

Finally, we group by player and position, calculate eligibility, then group by player and combine all the positions each is eligible for into a single string. There’s a little funny business at the end to remove pitching eligibility from position players who were called into action as pitchers in blowout games, and to strip name suffixes (Jr., II, IV), which may or may not be necessary for matching against your projection ranks.

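eligibility <- appearances |>
   # Pinch hitting and pinch running don't count toward any position
   filter(pos != "PH", pos != "PR") |>
   mutate(
      # NA means the player never subbed (or never started) at that position
      sub = if_else(is.na(sub), 0, sub),
      start = if_else(is.na(start), 0, start),
      total = sub + start,
      eligible = case_when(
         pos == "P" & start >= 3 & sub >= 5 ~ "SP,RP",
         pos == "P" & start >= 3 ~ "SP",
         pos == "P" & sub >= 5 ~ "RP",
         # Too few pitching appearances for SP or RP
         pos == "P" ~ "P",
         start >= 5 | total >= 10 ~ pos,
         TRUE ~ NA
      )
   ) |>
   filter(!is.na(eligible)) |>
   # Most-played position first; a bare "P" sorts last so it can be stripped below
   arrange(retroID, name, eligible == "P", desc(total)) |>
   group_by(retroID, name) |>
   summarize(
      eligible = paste(eligible, collapse = ","),
      # Drop "P" from position players who pitched in blowouts
      eligible = gsub(",P$", "", eligible),
      .groups = "drop"
   ) |>
   mutate(
      # Strip name suffixes; the dot is escaped and the match is anchored
      name = gsub(" (Jr\\.|II|IV)$", "", name)
   )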

Here’s a look at the final results. You can download the full data as a CSV file below.

# A tibble: 1,402 × 3
   retroID  name            eligible
   <chr>    <chr>           <chr>
 1 abadf001 Fernando Abad   RP
 2 abboa001 Andrew Abbott   SP
 3 abboc001 Cory Abbott     RP
 4 abrac001 CJ Abrams       SS
 5 abrea001 Albert Abreu    RP
 6 abreb002 Bryan Abreu     RP
 7 abrej003 Jose Abreu      1B,DH
 8 abrew002 Wilyer Abreu    CF,LF
 9 acevd001 Domingo Acevedo RP
10 actog001 Garrett Acton   RP
# ℹ 1,392 more rows
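
For players with no 2023 major league appearances, the plan is to fall back on whatever position the projections list. One way that might look is sketched below. The projections table, its columns (name, proj_pos), and the "Rookie Prospect" row are all hypothetical, and joining on name alone assumes names are unique and spelled the same in both sources.

```r
library(dplyr)

# Toy stand-ins: `eligibility_toy` imitates the table above; `projections`
# and its columns (`name`, `proj_pos`) are hypothetical.
eligibility_toy <- tribble(
   ~retroID,   ~name,        ~eligible,
   "abrac001", "CJ Abrams",  "SS",
   "abrej003", "Jose Abreu", "1B,DH"
)

projections <- tribble(
   ~name,             ~proj_pos,
   "CJ Abrams",       "SS",
   "Jose Abreu",      "1B",
   "Rookie Prospect", "OF"
)

draft_positions <- projections |>
   left_join(eligibility_toy, by = "name") |>
   # Prefer the computed eligibility; fall back to the projected position.
   mutate(draft_pos = coalesce(eligible, proj_pos)) |>
   select(name, draft_pos)

draft_positions
```

Players found in the eligibility table keep their computed positions, and the unmatched player keeps the projected one.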

Who is eligible for the most positions? Here's the top 20:

   retroID  name              eligible
   <chr>    <chr>             <chr>
 1 herne001 Enrique Hernandez SS,2B,CF,3B,LF,1B
 2 diaza003 Aledmys Diaz      3B,SS,LF,2B,1B,DH
 3 hampg001 Garrett Hampson   SS,CF,RF,2B,LF
 4 mckiz001 Zach McKinstry    3B,2B,RF,LF,SS
 5 ariag002 Gabriel Arias     SS,1B,RF,3B
 6 bertj001 Jon Berti         SS,3B,LF,2B
 7 biggc002 Cavan Biggio      2B,RF,1B,3B
 8 cabro002 Oswaldo Cabrera   LF,RF,3B,SS
 9 castw003 Willi Castro      LF,CF,3B,2B
10 dubom001 Mauricio Dubon    2B,CF,LF,SS
11 edmat001 Tommy Edman       2B,SS,CF,RF
12 gallj002 Joey Gallo        1B,LF,CF,RF
13 ibana001 Andy Ibanez       2B,3B,LF,RF
14 newmk001 Kevin Newman      3B,SS,2B,1B,DH
15 rengl001 Luis Rengifo      2B,SS,3B,RF
16 senzn001 Nick Senzel       3B,LF,CF,RF
17 shorz001 Zack Short        2B,SS,3B,RP
18 stees001 Spencer Steer     1B,3B,LF,2B,DH
19 vargi001 Ildemaro Vargas   3B,2B,SS,LF
20 vierm001 Matt Vierling     RF,LF,3B,CF
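
The code that produced this list isn't shown here. One way to build a similar ranking, as a sketch on three rows copied from the results above, is to count the comma-separated entries in the eligible string. The n_pos column and most_versatile name are invented for the example, and this simple count may not reproduce the exact ordering of the list above.

```r
library(dplyr)

# Three rows taken from the results above.
toy <- tribble(
   ~name,               ~eligible,
   "Enrique Hernandez", "SS,2B,CF,3B,LF,1B",
   "CJ Abrams",         "SS",
   "Jon Berti",         "SS,3B,LF,2B"
)

most_versatile <- toy |>
   # Number of positions = number of comma-separated entries
   mutate(n_pos = lengths(strsplit(eligible, ","))) |>
   arrange(desc(n_pos)) |>
   head(20)

most_versatile
```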

Code and data

Downloads:

References and Acknowledgements

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.
