Pre-evaluated chess position and its features
Description
One of the hypotheses underlying this project is that the evaluation of chess positions can be improved by systematically analyzing various board features, such as material balance, mobility, central control, king safety, and piece connectivity.
The dataset consists of 200,000 chess positions, each represented by a FEN (Forsyth–Edwards Notation) string along with a pre-calculated evaluation. The FEN strings describe the positions of all pieces on the board, while the evaluations are numerical values indicating the relative advantage of either side. The features are extracted from these positions, with each position evaluated using predefined heuristics. The data was gathered from real games (source: https://github.com/r2dev2/ChessDataContributor).
To interpret this data, several metrics are computed for each position:
Material count: This measures the material difference between the two sides by assigning weights to the different pieces (pawns, knights, bishops, rooks, and queens). Positive values indicate a material advantage for white, while negative values favor black.
Total material: This sums the material on the board, regardless of color.
Mobility: This reflects the number of legal moves available to each side.
Central control: This measures attacks on the central squares (e4, d4, e5, d5), which are critical in chess.
King safety: This examines the vulnerability of each king by evaluating whether the squares surrounding the king are under attack.
Connectivity: This assesses how well the pieces support each other.
Position of the pieces: This records the squares the pieces occupy and the average evaluation when they stand on those squares, creating a sort of heat map.
There is also a graph representation of the raw board, which lists, for each occupied square, the squares attacked from it. Depending on the piece on the square and the overall board configuration, a square can "point" to many others, which allows us to calculate the features above.
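The graph representation described above can be sketched as follows. This is a minimal, standard-library illustration and not the code used to build the dataset, which relies on python-chess. All function names (parse_placement, attacked_from, build_attack_graph, central_control) are hypothetical, and the sketch makes simplifying assumptions: it reads only the piece-placement field of the FEN string, ignores pins, and counts an attack on a square occupied by a friendly piece as an edge, since that is what connectivity needs.

```python
FILES = "abcdefgh"
CENTER = ("d4", "e4", "d5", "e5")

ROOK_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
BISHOP_DIRS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
STEPS = {
    "n": [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)],
    "k": ROOK_DIRS + BISHOP_DIRS,
    "r": ROOK_DIRS,
    "b": BISHOP_DIRS,
    "q": ROOK_DIRS + BISHOP_DIRS,
}
SLIDERS = {"r", "b", "q"}  # pieces that keep moving until blocked


def parse_placement(fen):
    """Map (file, rank) pairs in 0..7 to piece letters (uppercase = white)."""
    board = {}
    for row_index, row in enumerate(fen.split()[0].split("/")):
        rank = 7 - row_index  # FEN lists rank 8 first
        file = 0
        for char in row:
            if char.isdigit():
                file += int(char)  # a digit is a run of empty squares
            else:
                board[(file, rank)] = char
                file += 1
    return board


def square_name(square):
    """Convert a (file, rank) pair to algebraic notation, e.g. (4, 3) -> 'e4'."""
    return FILES[square[0]] + str(square[1] + 1)


def attacked_from(board, square):
    """Return the squares attacked by the piece standing on `square`."""
    piece = board[square]
    kind = piece.lower()
    if kind == "p":
        forward = 1 if piece.isupper() else -1  # white pawns attack up the board
        steps = [(-1, forward), (1, forward)]
    else:
        steps = STEPS[kind]
    targets = []
    for df, dr in steps:
        f, r = square[0] + df, square[1] + dr
        while 0 <= f < 8 and 0 <= r < 8:
            targets.append((f, r))
            # Non-sliders take a single step; sliders stop at the first piece.
            if kind not in SLIDERS or (f, r) in board:
                break
            f, r = f + df, r + dr
    return targets


def build_attack_graph(fen):
    """Adjacency list: occupied square -> sorted list of squares it attacks."""
    board = parse_placement(fen)
    return {
        square_name(sq): sorted(square_name(t) for t in attacked_from(board, sq))
        for sq in board
    }


def central_control(fen):
    """Count attacks on the four central squares as (white, black)."""
    board = parse_placement(fen)
    white = black = 0
    for sq, piece in board.items():
        hits = sum(1 for t in attacked_from(board, sq) if square_name(t) in CENTER)
        if piece.isupper():
            white += hits
        else:
            black += hits
    return white, black
```

Features such as central control then reduce to counting edges that end on particular squares, and a connectivity measure could count edges that end on squares occupied by a piece of the same color.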
These features are then stored in a relational database for future analysis, enabling more advanced techniques such as building an evaluation function that predicts how good a position is. The notable findings from this dataset suggest that a holistic approach to chess evaluation, one that considers both static (material) and dynamic (mobility, connectivity) factors, provides a more comprehensive picture of a position. The data can be interpreted by analyzing the relationships between these metrics and the final evaluations. For example, a high material count often correlates with a favorable evaluation, but in some cases superior king safety or central control can compensate for a material deficit. In conclusion, this study provides insights into the evaluation of chess positions by analyzing key positional features.
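One simple way to quantify the relationship between a metric and the evaluation is a Pearson correlation coefficient. The sketch below is an assumption about how such an analysis could be done, not the method used in this study, and the function name pearson is hypothetical. It expects two equally long lists, for example the material count and the evaluation of each position.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 or -1 indicates a strong linear relationship between the metric and the evaluation, while a value near 0 indicates little linear relationship. It says nothing about non-linear effects, such as king safety offsetting a material deficit only in certain positions.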
Files
Steps to reproduce
The data was collected from a set of chess positions in FEN format, extracted from a public game database. The python-chess library was used to load and manipulate these positions, enabling the extraction of key information such as piece locations and legal moves. Using Python scripts, I processed approximately 200,000 positions, which were initially stored in CSV files for analysis.
Each position was evaluated on factors such as material count, mobility, central control, and king safety. Specific functions were developed for these calculations, such as compute_material_count to assess material advantage and compute_central_control to determine the influence over the central squares. Piece connectivity and mobility were additionally calculated by analyzing the number of legal moves available to each side. Piece locations, for pawns, knights, and all other pieces, were extracted using library functions to enable detailed evaluations.
The results were stored in an SQLite database, with the FEN string used as the primary key for easy referencing. The entire insertion process was automated through scripts that processed the positions and inserted the corresponding evaluations into the database.
Graphs play a crucial role in this process, forming the core of every feature considered. For example, graphs relate mobility to central control across thousands of positions, revealing trends that may not be immediately apparent from the raw data alone. Likewise, examining the distribution of piece connectivity, which relies heavily on graphs, across different positions allows researchers to detect patterns and correlations more effectively. These graphs can be generated by importing the processed data from the SQLite database, making the analysis more insightful and easier to interpret.
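The storage step can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions and not the original scripts: the table name positions, its columns, the function names compute_material and store_positions, and the piece weights (pawn 1, knight 3, bishop 3, rook 5, queen 9) are all hypothetical, and the material calculation reads the FEN string directly instead of going through python-chess.

```python
import sqlite3

# Assumed conventional piece weights; the weights used in the dataset may differ.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}


def compute_material(fen):
    """Return (white minus black, total) material from a FEN string."""
    white = black = 0
    for char in fen.split()[0]:  # piece-placement field only
        value = PIECE_VALUES.get(char.lower())
        if value is None:
            continue  # skips digits, "/" separators, and kings
        if char.isupper():
            white += value
        else:
            black += value
    return white - black, white + black


def store_positions(conn, rows):
    """Insert (fen, evaluation) pairs with their material features."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS positions (
               fen TEXT PRIMARY KEY,
               evaluation REAL,
               material_count INTEGER,
               total_material INTEGER
           )"""
    )
    for fen, evaluation in rows:
        diff, total = compute_material(fen)
        # The FEN primary key makes re-running the script overwrite, not duplicate.
        conn.execute(
            "INSERT OR REPLACE INTO positions VALUES (?, ?, ?, ?)",
            (fen, evaluation, diff, total),
        )
    conn.commit()


if __name__ == "__main__":
    start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    conn = sqlite3.connect(":memory:")
    # 0.0 is a placeholder evaluation, not a value taken from the dataset.
    store_positions(conn, [(start, 0.0)])
    print(conn.execute("SELECT * FROM positions").fetchall())
```

Because the full FEN string is the key, the same piece placement with a different side to move or different castling rights is stored as a separate row.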