Elo rating system


The Elo rating system is a method for calculating the relative skill levels of players, originally designed for rating chess players. It is named after its creator Arpad Elo, a Hungarian-American chess master and physics professor. The Elo system was invented as an improved chess rating system over the previously used Harkness rating system. The system has been adapted for use in other zero-sum games and sports, including tennis, association football, American football, baseball, basketball, pool, various board games and esports.
The difference in the ratings between two players serves as a predictor of the outcome of a match. Two players with equal ratings who play against each other are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent's is expected to score 64%; if the difference is 200 points, then the expected score for the stronger player is 76%.
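The percentages above can be reproduced with a short calculation. The sketch below assumes the widely used logistic form of the expected score with a 400-point scale; this formula is not stated in the text above, and the function name `expected_score` is hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (win = 1, draw = 0.5, loss = 0).

    Assumption: logistic curve with a 400-point scale, as commonly used
    in Elo-style systems.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Rating differences of 0, 100 and 200 points in favour of player A.
for diff in (0, 100, 200):
    print(diff, round(expected_score(1500 + diff, 1500), 2))
# prints:
# 0 0.5
# 100 0.64
# 200 0.76
```

Only the difference between the two ratings matters, so the 1500 baseline is arbitrary.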
A player's Elo rating is a number that may change depending on the outcome of rated games played. After every game, the winning player takes points from the losing one. The difference between the ratings of the winner and loser determines the total number of points gained or lost after a game. If the higher-rated player wins, only a few rating points will be taken from the lower-rated player; however, if the lower-rated player scores an upset win, many rating points will be transferred. The lower-rated player will also gain a few points from the higher-rated player in the event of a draw. This means that this rating system is self-correcting. In the long run, players whose ratings are too low or too high should do better or worse, respectively, than the rating system predicts and thus gain or lose rating points until the ratings reflect their true playing strength.
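The transfer of points described above can be sketched for a single game as follows. The multiplier `K = 20`, the logistic expected-score formula, and the function names are assumptions made for illustration; actual federations choose their own parameters.

```python
K = 20  # hypothetical multiplier; real rating bodies set their own value


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B, assuming a logistic curve (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def rate_game(rating_a: float, rating_b: float, score_a: float, k: float = K):
    """Return the new ratings (A, B) after one game.

    score_a is 1 for a win by A, 0.5 for a draw, 0 for a loss.
    Whatever A gains, B loses, so the total number of points is unchanged.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Player A is rated 1500, player B is rated 1700.
print(rate_game(1500, 1700, 0.0))  # favourite wins: few points change hands
print(rate_game(1500, 1700, 1.0))  # upset win: many points change hands
print(rate_game(1500, 1700, 0.5))  # draw: the lower-rated player gains a little
```

Under these assumptions the favourite's win moves about 5 points, the upset moves about 15, and a draw moves about 5 points toward the lower-rated player.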
Elo ratings are comparative only and are valid only within the rating pool in which they were calculated, rather than being an absolute measure of a player's strength. While Elo-like systems are widely used in two-player settings, variations have also been applied to multiplayer competitions.

History

Arpad Elo was a chess master and an active participant in the United States Chess Federation (USCF) from its founding in 1939. The USCF used a numerical ratings system devised by Kenneth Harkness to enable members to track their individual progress in terms other than tournament wins and losses. The Harkness system was reasonably fair, but in some circumstances gave rise to ratings many observers considered inaccurate. On behalf of the USCF, Elo devised a new system with a sounder statistical basis. At about the same time, György Karoly and Roger Cook independently developed a system based on the same principles for the New South Wales Chess Association.
Elo's system replaced earlier systems of competitive rewards with one based on statistical estimation. Rating systems for many sports award points in accordance with subjective evaluations of the 'greatness' of certain achievements. For example, winning an important golf tournament might be worth an arbitrarily chosen five times as many points as winning a lesser tournament. A statistical endeavor, by contrast, uses a model that relates the game results to underlying variables representing the ability of each player.
Elo's central assumption was that the chess performance of each player in each game is a normally distributed random variable. Although a player might perform significantly better or worse from one game to the next, Elo assumed that the mean value of the performances of any given player changes only slowly over time. Elo thought of a player's true skill as the mean of that player's performance random variable. A further assumption is necessary because chess performance in the above sense is still not measurable. One cannot look at a sequence of moves and derive a number to represent that player's skill. Performance can only be inferred from wins, draws, and losses. Therefore, a player who wins a game is assumed to have performed at a higher level than the opponent for that game. Conversely, a losing player is assumed to have performed at a lower level. If the game ends in a draw, the two players are assumed to have performed at nearly the same level.
Elo did not specify exactly how close two performances ought to be to result in a draw as opposed to a win or loss. In practice, the probability of a draw depends on the performance differential, so the boundary between a draw and a decisive result is better understood as a confidence interval than as a fixed threshold. And while he thought it was likely that players might have different standard deviations to their performances, he made a simplifying assumption to the contrary. To simplify computation even further, Elo proposed a straightforward method of estimating the variables in his model. One could calculate relatively easily from tables how many games players would be expected to win based on comparisons of their ratings to those of their opponents. The ratings of a player who won more games than expected would be adjusted upward, while those of a player who won fewer than expected would be adjusted downward. Moreover, that adjustment was to be in linear proportion to the number of wins by which the player had exceeded or fallen short of their expected number.
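The linear adjustment can be illustrated over a series of games. This is a minimal sketch under stated assumptions: the expected scores come from a logistic formula rather than from Elo's printed tables, and the multiplier `k = 20`, the function names, and the sample ratings are hypothetical.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Expected score against one opponent, assuming a logistic curve (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def adjust_after_event(rating: float, opponents: list, total_score: float,
                       k: float = 20.0) -> float:
    """New rating after an event.

    The change is proportional to (actual score - expected score), where both
    are summed over all games; draws count as half a point.
    """
    expected_total = sum(expected_score(rating, opp) for opp in opponents)
    return rating + k * (total_score - expected_total)


# A 1600-rated player scores 2.5 points from 4 games.
opponents = [1500, 1600, 1700, 1800]
new_rating = adjust_after_event(1600, opponents, 2.5)
print(round(new_rating, 1))
```

Here the player was expected to score about 1.74 points, so scoring 2.5 raises the rating; scoring exactly the expected total would leave it unchanged.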
From a modern perspective, Elo's simplifying assumptions are not necessary because computing power is inexpensive and widely available. Several people, most notably Mark Glickman, have proposed using more sophisticated statistical machinery to estimate the same variables. On the other hand, the computational simplicity of the Elo system has proven to be one of its greatest assets. With the aid of a pocket calculator, an informed chess competitor can calculate to within one point what their next officially published rating will be, which helps promote a perception that the ratings are fair.

Implementing Elo's scheme

The USCF implemented Elo's suggestions in 1960, and the system quickly gained recognition as being both fairer and more accurate than the Harkness rating system. Elo's system was adopted by the World Chess Federation in 1970. Elo described his work in detail in The Rating of Chessplayers, Past and Present, first published in 1978.
Subsequent statistical tests have suggested that chess performance is almost certainly not distributed as a normal distribution, as weaker players have greater winning chances than Elo's model predicts. In paired comparison data, there is often very little practical difference in whether it is assumed that the differences in players' strengths are normally or logistically distributed. Mathematically, however, the logistic function is more convenient to work with than the normal distribution.
FIDE continues to use the rating difference table as proposed by Elo.
The development of the Percentage Expectancy Table is described in more detail by Elo as follows:

The normal probabilities may be taken directly from the standard
tables of the areas under the normal curve when the difference in rating is
expressed as a z score. Since the standard deviation σ of individual
performances is defined as 200 points, the standard deviation σ' of the
differences in performances becomes σ√2 or 282.84. The z value of a
difference D then is D/282.84. This will then divide the area under the
curve into two parts, the larger giving P for the higher rated player and
the smaller giving P for the lower rated player.
For example, let D = 160. Then z = 160/282.84 = .566. The table
gives .7143 and .2857 as the areas of the two portions under the curve.
These probabilities are rounded to two figures in table 2.11.

The table is actually built with standard deviation 200(10/7) as an approximation for 200√2. The normal and logistic distributions are in a way arbitrary points in a spectrum of distributions which would work well. In practice, both of these distributions work very well for a number of different games.
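The z-score procedure in the quoted passage can be checked numerically, and compared with a logistic curve. The sketch below assumes σ = 200 and σ' = σ√2 as quoted, evaluates the normal curve with `math.erf` instead of printed tables, and uses the common 400-point logistic form for comparison (an assumption, not taken from the quotation). The function names and the sample differences are illustrative.

```python
import math

SIGMA = 200.0                        # standard deviation of individual performances
SIGMA_DIFF = SIGMA * math.sqrt(2.0)  # standard deviation of the difference, about 282.84


def normal_expectancy(d: float) -> float:
    """Area under the standard normal curve below z = d / SIGMA_DIFF."""
    z = d / SIGMA_DIFF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_expectancy(d: float) -> float:
    """Logistic alternative with a 400-point scale (assumed form)."""
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))


# Compare the two models for a few rating differences d.
for d in (0, 100, 160, 200, 400):
    print(d, round(normal_expectancy(d), 4), round(logistic_expectancy(d), 4))
```

For differences up to about 200 points the two curves agree to within one percentage point; they drift apart at larger differences, where the logistic curve gives the weaker player somewhat better chances.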

Different ratings systems

The phrase "Elo rating" is often used to mean a player's chess rating as calculated by FIDE. However, this usage may be confusing or misleading because Elo's general ideas have been adopted by many organizations, including the USCF, many other national chess federations, the short-lived Professional Chess Association, and online chess servers including the Internet Chess Club, Free Internet Chess Server, Lichess, Chess.com, and Yahoo! Games. Each organization has a unique implementation, and none of them follow Elo's original suggestions precisely.
Instead one may refer to the organization granting the rating. For example: "As of April 2018, Tatev Abrahamyan had a FIDE rating of 2366 and a USCF rating of 2473." The Elo ratings of these various organizations are not directly comparable, since Elo ratings measure the results within a closed pool of players rather than absolute skill.

FIDE ratings

For top players, the most important rating is their FIDE rating. FIDE has issued the following lists:
  • From 1971 to 1980, one list a year was issued.
  • From 1981 to 2000, two lists a year were issued, in January and July.
  • From July 2000 to July 2009, four lists a year were issued, at the start of January, April, July and October.
  • From July 2009 to July 2012, six lists a year were issued, at the start of January, March, May, July, September and November.
  • Since July 2012, the list has been updated monthly.
The following analysis of the July 2015 FIDE rating list gives a rough impression of what a given FIDE rating means in terms of world ranking:
  • 5,323 players had an active rating in the range 2200 to 2299, which is usually associated with the Candidate Master title.
  • 2,869 players had an active rating in the range 2300 to 2399, which is usually associated with the FIDE Master title.
  • 1,420 players had an active rating between 2400 and 2499, most of whom had either the International Master or the International Grandmaster title.
  • 542 players had an active rating between 2500 and 2599, most of whom had the International Grandmaster title.
  • 187 players had an active rating between 2600 and 2699, all of whom had the International Grandmaster title.
  • 40 players had an active rating between 2700 and 2799.
  • 4 players had an active rating of over 2800.
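To turn these band counts into an approximate world ranking, they can be accumulated from the top down. The sketch below uses only the figures listed above; the band labels and the function name are illustrative, and the result is a rough tally rather than an official ranking.

```python
# (band, number of active players) from the July 2015 list, strongest band first.
BANDS = [
    ("over 2800", 4),
    ("2700-2799", 40),
    ("2600-2699", 187),
    ("2500-2599", 542),
    ("2400-2499", 1420),
    ("2300-2399", 2869),
    ("2200-2299", 5323),
]


def cumulative_counts(bands):
    """Return (band, players in this band or any higher band) for each band."""
    running = 0
    result = []
    for label, count in bands:
        running += count
        result.append((label, running))
    return result


for label, total in cumulative_counts(BANDS):
    print(f"{label:>10}: {total} players in this band or above")
```

On these figures, a rating of 2500 or more placed a player among roughly the top 773 active players, and 2200 or more among roughly the top 10,385.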
The highest ever FIDE rating was 2882, which Magnus Carlsen had on the May 2014 list. A list of the highest-rated players ever is at Comparison of top chess players throughout history.