Tuesday, July 3, 2012

History of Chess Ratings in the USA


“Before Prof. Elo stamped numbers on our foreheads, people played to win tournaments or place as high as they could in the standings or simply win as many games as they could. Forget ratings. Play to win and let others figure out how good you are.” Leon Poliakoff, US Master

1933 – The Correspondence Chess League of America was the first national organization to use a numerical rating system. The CCLA used what was known as the Short System, which clubs on the west coast had been using. In 1934 the CCLA switched to the Walt James Percentage System, but in 1940 it switched to a point system designed by Kenneth Williams.
1942 – Al Horowitz’ Chess Review magazine, which sponsored correspondence events, used the Harkness system, an improvement of the Williams system.
1944 – The CCLA changed to an improved version of the Williams system devised by William Wilcock.
1949 – The Harkness system was proposed to the USCF. The British Chess Federation adopted it later and used it at least as late as 1967.
1950 – The USCF started using the Harkness system and published its first rating list in the November issue of Chess Life.  Dr. Reuben Fine topped the list at 2817 and Samuel Reshevsky was second with 2770.
1959 – The USCF named Prof. Arpad Elo the head of a committee to examine all rating systems and make recommendations.
1960 – Elo developed his system and it was adopted by the USCF.
1970 – FIDE started using the Elo system.
1993 – The German Chess Federation replaced the Ingo System in Germany with Elo’s system.
2001 – The Glicko System was published.

      A rating system is used to estimate the strength of a player based on his performance versus other players. Most systems recalculate ratings after a tournament or match, but some recalculate ratings after individual games. In general, a player's rating goes up if he performs better than expected and down if he performs worse than expected.

Ingo system
      The Ingo system was designed by Anton Hoesslinger and published in 1948. This is a simple system where a player's new rating is the average rating of his competition minus one point for each percentage point above 50 obtained in the tournament. This system is somewhat confusing because lower numbers indicate better performance.
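
The Ingo rule described above can be sketched in a few lines of Python. This is only an illustration of the description in this post, not the federation's official procedure. The function name and arguments are made up for the example, and it assumes that a score below 50 percent adds points in the same way that a score above 50 percent subtracts them.

```python
def ingo_rating(opponent_ratings, points_scored, games_played):
    """Sketch of the Ingo rule as described in the text (lower is better)."""
    # Average rating of the competition.
    average = sum(opponent_ratings) / len(opponent_ratings)
    # Percentage score obtained in the tournament.
    percent = 100.0 * points_scored / games_played
    # One point off for each percentage point above 50.
    # Assumption: scoring below 50 adds points symmetrically.
    return average - (percent - 50.0)


# Scoring 3 out of 4 (75 percent) against an average of 120 gives 120 - 25.
print(ingo_rating([100, 120, 140], 3, 4))
```

Note that the better result produces the lower number, which is exactly what makes the system confusing.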

Harkness system
       The Harkness System was invented by Kenneth Harkness, who published it in 1956. The United States Chess Federation used it from 1950 until 1960.
       Under the Harkness system, when a player competes in a tournament, the average rating of his competition is calculated. If the player scores 50 percent, he receives the average competition rating as his performance rating. If he scores more than 50 percent, his new rating is the competition average plus 10 points for each percentage point above 50. If he scores less than 50 percent, his new rating is the competition average minus 10 points for each percentage point below 50.
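
Here is a rough Python sketch of the Harkness calculation as it is described above. The function name and arguments are invented for the illustration, and the real USCF procedure of the time may have had details this post does not cover.

```python
def harkness_rating(opponent_ratings, points_scored, games_played):
    """Sketch of the Harkness rule as described in the text."""
    # Average rating of the competition.
    average = sum(opponent_ratings) / len(opponent_ratings)
    # Percentage score obtained in the tournament.
    percent = 100.0 * points_scored / games_played
    # Plus 10 points per percentage point above 50, minus 10 per point below.
    return average + 10.0 * (percent - 50.0)


field = [1800, 1900, 2000, 2100]      # average 1950
print(harkness_rating(field, 3, 4))   # 75 percent: 1950 + 250
print(harkness_rating(field, 1, 4))   # 25 percent: 1950 - 250
```

A single good tournament swings the result by hundreds of points, which helps explain why the system sometimes gave ratings people did not think were accurate.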

Elo rating system
       The Elo system was invented by Prof. Arpad Elo and is the most common system in use today. The USCF uses a modification of the Elo system, where there are bonus points for superior performance in a tournament. USCF ratings are generally 50 to 100 points higher than the FIDE equivalents.
       Over the years the USCF has tweaked the system several times. One tweak is rating floors for players who have maintained a certain level of play over a specified number of games. Once a player has an established floor, his rating will not drop below it. For example, if a player’s floor is 1800 and a loss would take him below that, he is still rated 1800: points are added to his opponent's rating but not subtracted from his. The objection to floors is that they distort the rating system. Floors were instituted because, during the Fischer Boom, a great influx of new players was winning points from established players, who saw their ratings go down. This drop discouraged many older players whose ability was waning from playing. The problem was that new players came into the rating pool with zero points, which they won from established players, and the result was a gradual lowering of ratings across the board.
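
The distortion caused by a floor is easy to see in a toy example. The sketch below is not the USCF formula. It simply assumes that some number of points (the hypothetical `transfer` argument) would normally move from the loser to the winner, and shows what happens when the loser is sitting on his floor.

```python
def settle_game(winner_rating, loser_rating, transfer, loser_floor):
    """Toy model of one decisive game when the loser has a rating floor."""
    # The winner always collects the points.
    new_winner = winner_rating + transfer
    # The loser gives up points only until he hits his floor.
    new_loser = max(loser_rating - transfer, loser_floor)
    return new_winner, new_loser


# A player rated exactly at his 1800 floor loses a game "worth" 10 points.
winner, loser = settle_game(1900, 1800, 10, loser_floor=1800)
print(winner, loser)                      # winner gains, loser stays at 1800
print((winner + loser) - (1900 + 1800))   # points created out of thin air
```

Without a floor the total number of points in the pool stays the same after the game. With the floor, points are added to the pool, which is why critics say floors distort the system.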
       During the Fischer Boom, when great numbers of players started entering tournaments, deflation became serious and disturbed a lot of people. For the first 30 years of the rating system deflation had not been a problem, but around 1980 it was discovered that players of relatively stable strength had lost points over time when their ratings should have stayed about the same. As a result the USCF introduced bonus points, feedback points and activity points, collectively known as "fiddle points," to protect them. These resulted in inflation of the ratings. It was possible, if one played enough games against higher-rated players, to actually gain points even with a minus score.
       Elo ratings are calculated (and this is an important point) from results, not from the skill of players, so the system can be used in any two-player game. Of course better players usually have better results…but more on this at the end of this post. The system is named after its creator, Prof. Arpad Elo, a Hungarian-born U.S. player who was also a physics professor. Elo invented his system for chess, but today it is also used in many other games. Another important point is that ratings measure the relative strength of players within a rating pool. For this reason, ratings from various servers, correspondence organizations and even OTB groups (for example, players in different countries) are not comparable.
      Elo was a master and an active participant in the USCF from its founding in 1939. The first USCF rating system was devised by Kenneth Harkness, but in some circumstances it gave ratings which many people thought weren’t accurate, so Elo devised a new system with a sounder statistical basis.
      Although a player might perform significantly better or worse from one game to the next, Elo assumed that the performance of a player changes slowly over time and that a player's true skill is the mean of that player's performance over time.
      Several people, most notably Mark Glickman, have proposed using more sophisticated statistical methods to estimate ratings, but the simplicity of the Elo system remains one of its greatest assets.
      The USCF implemented Elo's suggestions in 1960 and the system quickly gained recognition as being both fairer and more accurate than the Harkness Rating System. Elo's system was adopted by FIDE in 1970.
      Performance can only be inferred from wins, losses, and draws against other players. A player's rating depends on the ratings of his opponents, and the results scored against them. Elo scaled ratings so that a difference of 200 rating points would mean that the stronger player has an expected score of approximately 75 percent.
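
To make the 200-point figure concrete, here is a small Python sketch. It uses the logistic formula that is commonly quoted for Elo expected scores. That formula is not given in this post, and Elo's original tables were based on a slightly different curve, so treat it as an assumption. The update rule and the K value of 32 are likewise only illustrative and are not the exact USCF or FIDE settings.

```python
def expected_score(rating_a, rating_b):
    """Expected score (0 to 1) of player A against player B.

    Assumes the commonly quoted logistic form of the Elo curve.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating, actual_score, expected, k=32):
    """New rating after one game; k is an illustrative assumption."""
    # Rating rises when the result beats expectation and falls otherwise.
    return rating + k * (actual_score - expected)


# A 200-point favorite is expected to score roughly three quarters.
print(round(expected_score(1800, 1600), 2))

# Equal players: the expectation is 0.5, so a win is worth k / 2 points.
print(updated_rating(1500, 1.0, expected_score(1500, 1500)))
```

With this formula a 200-point edge works out to an expected score of about 0.76, close to the roughly 75 percent mentioned above.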
      An increase or decrease in the average rating over all players in the rating system is referred to as rating inflation or rating deflation. For example, if ratings have inflated, a modern rating of 2500 means less than an old rating of 2500, and so using ratings to compare players from different eras becomes impossible.
      It is commonly believed that at the top level modern ratings are inflated but it has also been suggested that an overall increase in ratings reflects greater skill. The number of people with ratings over 2700 has increased. Around 1979 only Karpov was over 2700.  This increased to 15 players in 1994, and by 2009 there were 33 players over 2700. 
      In the mid-1990s, the USCF realized that young players were improving faster than the rating system was able to track, and established players started to lose rating points to the young and underrated players. Several of the older players were frustrated over what they considered an unfair rating decline and some gave up tournament chess. In response, the USCF included a bonus point system which feeds rating points into the system.
      As mentioned earlier, better players usually have better results, and thus higher ratings, but it is important to remember ratings are a measure of results. Claude F. Bloodgood III was a controversial American player. As a young man, he was arrested several times and eventually ended up being sentenced to death after murdering his mother, although this sentence was later commuted. While in prison, he remained a very active player and played a large number of postal games as well as OTB rated games with other inmates. Over time, he achieved a very high USCF rating by manipulating the system.
      Bloodgood set up the Virginia Penitentiary Chess Program in 1972 and taught inmates the game, and they competed against each other in tournaments. Using state money obtained in a grant, Bloodgood bought a bunch of cheap chess sets, registered his 50 or so prisoners with the USCF and became a TD so that he could hold VaPen Open tournaments that were USCF rated. There were no entrance fees for these events and inmates played thousands of rated games. This resulted in Bloodgood racking up hundreds of rating points until he finally reached a rating of 2702 and was ranked number 2 in the US.
      As far back as 1958 Bloodgood had warned USCF officials of a serious statistical flaw in their rating system but nobody listened. When his rating got to the point that it qualified him for a spot in the US championship, the USCF finally realized there was a problem and they solved it quickly and painlessly…they deleted Bloodgood’s name from the rating list.  That’s what Elo meant when he pointed out that ratings measure performance and not ability.  At best Bloodgood was likely a high-rated Expert.
