The Eighth Man

T8M’s Elo Standings Return

With the new USQ season upon us (at least in some parts of the country), our Elo Standings are back, with new updates to boot. After some off-season optimization (see footnote [1] for full details), our formula has become more precise, more predictive and more responsive to wins. What does that mean in practice? Two main things:

  1. The active range of Elo ratings is wider than ever.
    Last year’s final Elo range stretched from 1055 to 2282; with our formula changes, that range now stretches from 927 to 2442. While our end-of-season mean regression slightly moderates those extremes, this widened range means that Elo now gives the worst team in the country a 0.16 percent chance of beating the best team in the country, instead of its previous 0.86 percent chance. The new odds are likely a lot more accurate given the range of quidditch in the US, but they are still much closer than, say, the odds of a beginner chess player defeating World Champion Magnus Carlsen (0.006 percent). The average Elo rating will remain at 1500 (or close to it, depending on which teams return this season), and teams near that rating will see relatively little change. You can picture this change as simply grabbing the range of Elo ratings at either edge and stretching it out: the further a team is from the average, the more its rating will be adjusted.
  2. Winning games this season will provide a bigger boost than in seasons past.
    This adjustment will undoubtedly make Elo “jumpier” than in seasons past, but it will mean that real improvements in a team’s skill are reflected faster in the ratings. Ratings will be more sensitive to a team’s recent performance, but that does not mean things will be out of control: the system is still set up so that the better a team is, the smaller the jump it will see from an average win.
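
The claim that better teams see smaller jumps follows from how an Elo update is built. Here is a minimal sketch, assuming the textbook 400-point logistic Elo curve and a hypothetical flat K of 32. T8M's actual K depends on score margin and game type and is not reproduced here, and the function names are illustrative only.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Win probability under the standard 400-point logistic Elo curve (assumed)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def rating_gain_for_win(rating: float, opponent: float, k: float = 32.0) -> float:
    """Points gained for a win. k=32 is a hypothetical placeholder, not T8M's value."""
    # A win scores 1.0; the gain is K times the "surprise" of the result.
    return k * (1.0 - expected_score(rating, opponent))


if __name__ == "__main__":
    opponent = 1500.0  # an average team
    for rating in (1300.0, 1500.0, 1700.0, 1900.0):
        gain = rating_gain_for_win(rating, opponent)
        print(f"{rating:.0f}-rated team beats {opponent:.0f}-rated team: +{gain:.1f}")
```

Against the same average opponent, the gain shrinks as the winner's own rating rises, because a win the formula already expected carries little new information.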

You can check out our updated preseason (and most up-to-date) Elo standings here.

[1] Methodology: We analyzed every USQ-official game played from Sept. 13, 2014 to present using a simplified cost function to test our Elo format’s predictiveness and made the following changes:
1. We updated our formula’s “K-Factor,” which was originally based on USQ’s SWIM value alone, to add a 30-point bonus for each win. This change ensures that a win’s value is reflected a little better in our system. Previously, winning by 20 points was worth twice as much as winning by 10 points, winning by 30 was worth three times as much, and so on. Those valuations didn’t reflect the actual value of winning (or losing) a game in-range, where wins by anywhere from 10 to 50 points are all roughly equivalent for most teams’ snitch-on-pitch strategies. The original approach works fine in the USQ algorithm, which uses win record as an additional adjustment in its ranking formula, but since Elo does not have that added benefit, the “win bonus” needs to be baked into the regular calculation. Our cost function determined that a win is worth between 28 and 29 additional quaffle points in this function, and we rounded this to 30 points to make it a “three-goal bonus” adjustment for a win.
2. We changed the “snitch when it matters” adjustment, which logarithmically reduces a snitch catch’s value, to start at +40 rather than at the +30 that USQ’s formula uses. This was essentially a “common-sense” change to USQ’s formula. Since a team can still lose a game if its opponent catches the snitch when it is up 30, we felt that a team catching when it was up 30 did not deserve the slight reduction in snitch value that USQ’s formula gives to such a catch.
3. We updated our own “K-Factor multiplier” from 0.8x for regular season games and 1.2x for postseason (USQ Cup) games to 1.0x and 1.5x, respectively. This is essentially a “data-driven” change, driven solely by our cost function, which showed that our Elo formula is more predictive when more weight is given to a win. This is the main factor that will make our rating system more responsive to wins this season, meaning a team’s rating will jump more throughout the season.
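
Changes 1 and 3 can be illustrated with some quick arithmetic. This is a sketch only: it assumes a win's value scales linearly with the bonus-adjusted point margin, as the description of the old behavior suggests, and it leaves out the snitch adjustment and the rest of the SWIM calculation. The function and constant names are hypothetical.

```python
WIN_BONUS = 30  # the "three-goal bonus" from change 1

# K-Factor multipliers from change 3, stored as (old, new)
MULTIPLIERS = {"regular season": (0.8, 1.0), "postseason": (1.2, 1.5)}


def relative_value(margin_a: int, margin_b: int, bonus: int = 0) -> float:
    """How many times more a win by margin_a counts than a win by margin_b,
    assuming value is proportional to (margin + bonus)."""
    return (margin_a + bonus) / (margin_b + bonus)


def multiplier_increase(game_type: str) -> float:
    """Ratio of the new K-Factor multiplier to the old one."""
    old, new = MULTIPLIERS[game_type]
    return new / old


if __name__ == "__main__":
    for margin in (20, 30, 50):
        before = relative_value(margin, 10)
        after = relative_value(margin, 10, WIN_BONUS)
        print(f"win by {margin} vs. win by 10: {before:.2f}x before, {after:.2f}x after")
    for game_type in MULTIPLIERS:
        print(f"{game_type}: multiplier up {multiplier_increase(game_type):.2f}x")
```

Under these assumptions, a 20-point win drops from 2x to 1.25x the value of a 10-point win, and both the regular season and postseason multipliers rise by 25 percent.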

 

Only games from the 2014-15 season onward were used in this analysis. New teams entered our rating records at a higher relative frequency in the seasons before that one, which limits the model’s predictive power for those seasons.
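
The methodology does not say which cost function was used. As one possible illustration of scoring a rating format's predictiveness, the sketch below computes a Brier score (mean squared error between the pregame win probability and the actual result) using the textbook 400-point Elo curve. Both choices are assumptions, not T8M's documented method, and the sample games are made up.

```python
from typing import Iterable, Tuple

# (rating_a, rating_b, a_won), where a_won is 1 if team A won and 0 otherwise
Game = Tuple[float, float, int]


def win_probability(rating_a: float, rating_b: float) -> float:
    """Team A's win probability under the standard 400-point Elo curve (assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def brier_score(games: Iterable[Game]) -> float:
    """Mean squared error of the predictions; lower means more predictive."""
    games = list(games)
    if not games:
        raise ValueError("need at least one game")
    total = sum((win_probability(a, b) - won) ** 2 for a, b, won in games)
    return total / len(games)


if __name__ == "__main__":
    # Made-up pregame ratings and results, for illustration only.
    sample_games = [(1700.0, 1500.0, 1), (1450.0, 1600.0, 0), (1550.0, 1500.0, 0)]
    print(f"Brier score: {brier_score(sample_games):.4f}")
```

Comparing this score across candidate settings (for example, different win bonuses or multipliers) on the same set of games is one way a tuning process like the one described could pick the more predictive option.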
