Monday, March 14, 2011

Engine Positional Evaluations and Comparisons

On the TalkChess forum, GM Larry Kaufman recently posted a position, with comments, showing how differently engines evaluate a material imbalance. The position arises after 1. d4 Nf6 2. c4 g6 3. Nc3 d5 4. Nf3 Bg7 5. Qb3 dxc4 6. Qxc4 O-O 7. e4 b6 8. e5 Be6 9. exf6 Bxc4 10. fxg7 Kxg7 11. Bxc4:

      We have a position with two bishops and a knight for a queen and pawn. Kaufman observes that 7...b6 is rarely played by grandmasters because the resulting position is considered favorable for White. As he points out, material is deemed even, but White has all the positional pluses. As an experiment, he ran this position for 30 minutes on a quad-core machine and arrived at these assessments:
Deep Rybka 4: +0.53
Deep Shredder 12: +0.88
Fritz 12: +0.47
Hiarcs 13.1: +0.48
Komodo 1.3: +0.30
Naum 4: +1.05
Stockfish 2.0: +0.20

Firebird 1.31: +0.09
Rybka 2.3.2a MP: +0.02
Robbolito 0.085g3: 0.00
Ivanhoe 47 and 49: 0.00
Houdini 1.5: -0.13
Critter 0.90: -0.17
      He points out: “Rybka 2.3.2a MP got this quite wrong with a nearly zero score, and Robbolito, which is said to have come from decompiled Rybka 2.3.2a code, also makes the same mistake with a zero score. Of course the scores won't be identical as the searches are different. The engines acknowledged to come from Robbo have of course also a zero or near-zero score. Houdini and Critter actually go negative; it is hard to imagine that a program not starting with the Robbo values would make such a big error in evaluating this position. I don't know much about Critter so I don't mean to start a debate about its status, but this is certainly strange.
      As for why Rybka 2.3.2a gets it wrong, all attempts to fix the undervaluation of minor pieces vs. major pieces tested poorly in Rybka, yet seem to test okay in unrelated engines. So any program that makes this same mistake is likely to either have copied Rybka values, or to be so similar to Rybka that testing produced the same anomalous result.”
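      To make the split in the list easier to see, here is a minimal Python sketch that summarizes the scores quoted above. The 0.10 cutoff and the `summarize` helper are my own assumptions for illustration (the cutoff simply falls in the gap between Stockfish and Firebird); nothing here comes from Kaufman's post beyond the numbers themselves.

```python
from statistics import mean, median

# Scores in pawns from White's point of view, transcribed from the list above.
EVALS = {
    "Deep Rybka 4": 0.53,
    "Deep Shredder 12": 0.88,
    "Fritz 12": 0.47,
    "Hiarcs 13.1": 0.48,
    "Komodo 1.3": 0.30,
    "Naum 4": 1.05,
    "Stockfish 2.0": 0.20,
    "Firebird 1.31": 0.09,
    "Rybka 2.3.2a MP": 0.02,
    "Robbolito 0.085g3": 0.00,
    "Ivanhoe 47 and 49": 0.00,
    "Houdini 1.5": -0.13,
    "Critter 0.90": -0.17,
}


def summarize(evals, cutoff=0.10):
    """Split engines at an arbitrary cutoff (an assumption, not Kaufman's)
    and report simple statistics for each group."""
    high = {name: s for name, s in evals.items() if s >= cutoff}
    low = {name: s for name, s in evals.items() if s < cutoff}
    scores = list(evals.values())
    return {
        "high": high,
        "low": low,
        "high_mean": mean(high.values()),
        "low_mean": mean(low.values()),
        "median": median(scores),
        "spread": max(scores) - min(scores),
    }


if __name__ == "__main__":
    result = summarize(EVALS)
    print("Engines scoring below the cutoff:", ", ".join(result["low"]))
    print(f"Mean above cutoff: {result['high_mean']:+.2f}")
    print(f"Mean below cutoff: {result['low_mean']:+.2f}")
    print(f"Median: {result['median']:+.2f}, spread: {result['spread']:.2f}")
```

      A different cutoff would move engines between the groups, so treat the group means as a rough picture rather than a measurement.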
      CC GM Marjan Semrl states that a Master will begin his analysis by examining the position and its properties very carefully and will take into consideration the advice of the program, but he will also advise the program and force it to analyze the moves he thinks are best.
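      For readers who want to try "forcing" an engine onto their own candidate moves, the UCI protocol allows this through the `searchmoves` option of the `go` command. The sketch below only builds the command strings; it does not launch an engine. The move list is my own hand transcription of the line above into UCI coordinate notation, and the two candidate moves (`b8c6`, `c7c5`) and the `uci_commands` helper are arbitrary placeholders, not recommendations from anyone quoted here.

```python
# Moves 1. d4 ... 11. Bxc4 from the post, transcribed by hand into UCI
# coordinate notation (from-square followed by to-square).
LINE = (
    "d2d4 g8f6 c2c4 g7g6 b1c3 d7d5 g1f3 f8g7 d1b3 d5c4 b3c4 "
    "e8g8 e2e4 b7b6 e4e5 c8e6 e5f6 e6c4 f6g7 g8g7 f1c4"
).split()


def uci_commands(line, candidates, movetime_ms):
    """Return the two UCI commands that set up the position and start a
    search, optionally restricted to the given candidate moves."""
    position = "position startpos moves " + " ".join(line)
    go = f"go movetime {movetime_ms}"
    if candidates:
        # searchmoves goes last because it takes every remaining token
        go += " searchmoves " + " ".join(candidates)
    return [position, go]


if __name__ == "__main__":
    # Hypothetical candidates for Black, chosen only to show the syntax.
    for command in uci_commands(LINE, ["b8c6", "c7c5"], 60_000):
        print(command)
```

      These strings would be written to the engine's standard input after the usual `uci` and `isready` handshake; most chess GUIs offer the same restriction through a menu, so the commands matter mainly if you script the engine yourself.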
      Once again, we see that you simply cannot rely on engine analysis to tell you the best move in a given position unless the position is tactical. As I have pointed out in previous posts, this seems especially true when there is a material imbalance. In other words, don't place blind faith in an engine's numerical evaluation when the position calls mainly for positional judgment.
      All that said, any engine move is likely to be better than the ones you and I select, because most of us aren't quite as good as GMs when it comes to positional judgment.


  1. I recently read Silman's "The Amateur's Mind" and it has helped me a lot in assessing a position.

    I have been picking various positions from master games and trying to figure out a plan based on Silman's method of imbalances.

    It has proven very helpful!! I still need to translate it into games.

    After I have come up with a plan and the best move(s) I check it tactically with an engine to make sure.

    I think that Hiarcs has some of the best positional understanding and I tend to trust it the most. I will double check with Shredder and then Junior (cause Junior can come up with some cool moves!)

    But I won't check with the engine until I have thought it out myself first.

    I think one of the best engines to help with OTB play is Junior or Hiarcs. Sometimes the really big boy engines suggest moves based on lines SO deep I will never get them. Junior and Hiarcs tend to suggest lines that would work OTB (especially in normal club play).

  2. It would be interesting to see what Junior's evaluation is… hint.

  3. I let Deep Junior 11.1a and Deep Junior 12 both look at the position for about 2 minutes or so.

    Deep Junior 11.1a evaluates the position as +0.31 for White. Deep Junior 11.1a instantly thought the position was a + for White but it took about 10 minutes for it to settle in at the final evaluation.

    Deep Junior 12 really was pretty optimistic for White right off the bat and settled in at a +0.66 for White!

    Junior is an optimistic engine at times. When doing post mortems on my OTB games I have often found that Junior gives me the most instructive idea on how I could have played the position. Junior often suggests the most forcing move with just a tad bit of speculation, which I think is helpful in OTB chess.

    It is when the big boy engines (and people) face off that a difference of 0.10 pawns (10 centipawns) matters. That is why the best engine solution isn't always the best OTB solution.

  4. I meant I let them look at the position for about 20 minutes each. :)

  5. Thanks! I get a lot of hits on this Blog by people looking for engine comparisons and such, so this information should be helpful.

  6. Junior is also a very fun engine to play against for people who like to do that. (I do!)

    Junior 12 UCI has the limit strength feature.

    Even at easier settings Junior will challenge one's opening repertoire (especially mine, as my repertoire is very undeveloped!).