# Project


<img src="images/chess.png">

## Libraries

In [1]:


import time
import json
import requests
import pandas as pd
import re
from pandas import json_normalize  # pandas >= 1.0; older releases used pandas.io.json.json_normalize
from IPython.core.display import HTML
from bs4 import BeautifulSoup


## Leaderboards


Here we fetch the leaderboards for the different game types.

In [2]:
url = 'https://api.chess.com/pub/leaderboards'
r = requests.get(url)
leaderboards = r.json()

In [3]:
leaderboards.keys()

dict_keys(['daily', 'daily960', 'live_rapid', 'live_blitz', 'live_bullet', 'live_bughouse', 'live_blitz960', 'live_threecheck', 'live_crazyhouse', 'live_kingofthehill', 'lessons', 'tactics'])

There are different game types depending on the time available to make the moves:

    - Daily
        - Each player has 24 hours to make each move.

    - Rapid
        - Each player has between 10 minutes and 1 hour to make all of their moves.
        - There may be an increment of a few seconds per move.

    - Blitz
        - Each player has between 2 and 10 minutes to make all of their moves.
        - There may be an increment of a few seconds per move.

    - Bullet
        - Each player has one minute to make all of their moves.

There are also different variants:

    - 960
        - Chess games in which the starting position of the pieces is randomized: the pawns stay on their usual squares, but the pieces behind them are shuffled.

    - Kingofthehill
        - The first player to bring their king to the center of the board wins.

    - Bughouse
        - Bughouse is played in pairs: the pieces our partner captures from their opponent become available for us to use. For example, if we play White and our partner (playing Black) captures a knight from their opponent, on our turn we have a white knight that we can place wherever we like.

    - Crazyhouse
        - Lets you use the pieces you capture from your opponent. For example, if we play White and capture a black pawn, we gain a white pawn that we can drop onto the board on our turn. More precisely, the captured piece changes color and becomes ours to use.

In addition, there are lessons and tactics for practicing different situations or solving problems.

### Blitz Leaderboard

In [4]:
leaderboard_blitz_dataframe = pd.DataFrame(leaderboards['live_blitz'])
leaderboard_blitz_dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 14 columns):
player_id      50 non-null int64
@id            50 non-null object
url            50 non-null object
username       50 non-null object
score          50 non-null int64
rank           50 non-null int64
country        50 non-null object
title          48 non-null object
name           45 non-null object
status         50 non-null object
avatar         50 non-null object
trend_score    50 non-null object
trend_rank     50 non-null object
flair_code     50 non-null object
dtypes: int64(3), object(11)
memory usage: 5.6+ KB


In [5]:
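# .copy() makes this an independent DataFrame, so the later .loc assignments do not raise SettingWithCopyWarning.
leaderboard_blitz_dataframe = leaderboard_blitz_dataframe[['player_id', 'username', 'name', 'title', 'score', 'rank', 'avatar', 'country']].copy()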
leaderboard_blitz_dataframe.head()

Unnamed: 0,player_id,username,name,title,score,rank,avatar,country
0,15448422,hikaru,Hikaru Nakamura,GM,3251,1,https://images.chesscomfiles.com/uploads/v1/us...,https://api.chess.com/pub/country/US
1,42022994,firouzja2003,Alireza Firouzja,GM,3134,2,https://images.chesscomfiles.com/uploads/v1/us...,https://api.chess.com/pub/country/IR
2,26897358,grischuk,Alexander Grischuk,GM,3120,3,https://images.chesscomfiles.com/uploads/v1/us...,https://api.chess.com/pub/country/RU
3,26303484,bigfish1995,Vladimir Fedoseev,GM,3107,4,https://images.chesscomfiles.com/uploads/v1/us...,https://api.chess.com/pub/country/RU
4,28417564,lachesisq,Ian Nepomniachtchi,GM,3096,5,https://images.chesscomfiles.com/uploads/v1/us...,https://api.chess.com/pub/country/RU


Here I replace the contents of the country column, swapping each URL for the country name it resolves to.

In [6]:
for i in leaderboard_blitz_dataframe.index:
    url = leaderboard_blitz_dataframe.loc[i, 'country']
    leaderboard_blitz_dataframe.loc[i, 'country'] = requests.get(url).json()['name']
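
Several of the top players share a country, so the loop above requests the same country URL more than once. A minimal sketch of caching those lookups follows; `resolve_country_names` and its `fetch_name` parameter are hypothetical names introduced for illustration, not part of the notebook.

```python
def resolve_country_names(country_urls, fetch_name):
    """Map each country URL to its name, calling fetch_name once per distinct URL."""
    cache = {}
    names = []
    for url in country_urls:
        if url not in cache:
            # Only the first occurrence of a URL triggers a lookup.
            cache[url] = fetch_name(url)
        names.append(cache[url])
    return names
```

In the notebook this could be called as `resolve_country_names(leaderboard_blitz_dataframe['country'], lambda u: requests.get(u).json()['name'])`, with the returned list assigned back to the `country` column.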

In [7]:
leaderboard_blitz_dataframe.head()

Unnamed: 0,player_id,username,name,title,score,rank,avatar,country
0,15448422,hikaru,Hikaru Nakamura,GM,3251,1,https://images.chesscomfiles.com/uploads/v1/us...,United States
1,42022994,firouzja2003,Alireza Firouzja,GM,3134,2,https://images.chesscomfiles.com/uploads/v1/us...,Iran
2,26897358,grischuk,Alexander Grischuk,GM,3120,3,https://images.chesscomfiles.com/uploads/v1/us...,Russia
3,26303484,bigfish1995,Vladimir Fedoseev,GM,3107,4,https://images.chesscomfiles.com/uploads/v1/us...,Russia
4,28417564,lachesisq,Ian Nepomniachtchi,GM,3096,5,https://images.chesscomfiles.com/uploads/v1/us...,Russia


Here I fetch each player's location, if their profile has one.

In [8]:
for i in leaderboard_blitz_dataframe.index:
    url = f"https://api.chess.com/pub/player/{leaderboard_blitz_dataframe.loc[i, 'username']}"
    r = requests.get(url).json()
    # Fall back to 'Unknown' when the profile has no location.
    leaderboard_blitz_dataframe.loc[i, 'location'] = r.get('location', 'Unknown')

In [9]:
leaderboard_blitz_dataframe.head()

Unnamed: 0,player_id,username,name,title,score,rank,avatar,country,location
0,15448422,hikaru,Hikaru Nakamura,GM,3251,1,https://images.chesscomfiles.com/uploads/v1/us...,United States,"Sunrise, Florida"
1,42022994,firouzja2003,Alireza Firouzja,GM,3134,2,https://images.chesscomfiles.com/uploads/v1/us...,Iran,Babol
2,26897358,grischuk,Alexander Grischuk,GM,3120,3,https://images.chesscomfiles.com/uploads/v1/us...,Russia,Moscow
3,26303484,bigfish1995,Vladimir Fedoseev,GM,3107,4,https://images.chesscomfiles.com/uploads/v1/us...,Russia,Moscow
4,28417564,lachesisq,Ian Nepomniachtchi,GM,3096,5,https://images.chesscomfiles.com/uploads/v1/us...,Russia,Moscow


From each player's statistics I extract their best rating and their number of wins, losses, and draws.

In [10]:
for i in leaderboard_blitz_dataframe.index:
    url = f"https://api.chess.com/pub/player/{leaderboard_blitz_dataframe.loc[i,'username']}/stats"
    r = requests.get(url).json()
    leaderboard_blitz_dataframe.loc[i, 'best_rating'] = r['chess_blitz']['best']['rating']
    leaderboard_blitz_dataframe.loc[i, 'games_won'] = r['chess_blitz']['record']['win']
    leaderboard_blitz_dataframe.loc[i, 'games_lost'] = r['chess_blitz']['record']['loss']
    leaderboard_blitz_dataframe.loc[i, 'games_draw'] = r['chess_blitz']['record']['draw']
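
The loop above indexes straight into `r['chess_blitz']['best']` and `r['chess_blitz']['record']`, so a response lacking any of those keys would raise `KeyError`. A more defensive extraction could look like the sketch below; `extract_blitz_stats` is a hypothetical helper, and returning `None` for missing values is an assumption about how gaps should be handled.

```python
def extract_blitz_stats(stats):
    """Pull the blitz fields used in the notebook out of a /stats response.

    Missing sections yield None instead of raising KeyError.
    """
    blitz = stats.get("chess_blitz", {})
    record = blitz.get("record", {})
    return {
        "best_rating": blitz.get("best", {}).get("rating"),
        "games_won": record.get("win"),
        "games_lost": record.get("loss"),
        "games_draw": record.get("draw"),
    }
```

Each returned dictionary maps directly onto the four columns filled in by the loop.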

In [11]:
leaderboard_blitz_dataframe.head()

Unnamed: 0,player_id,username,name,title,score,rank,avatar,country,location,best_rating,games_won,games_lost,games_draw
0,15448422,hikaru,Hikaru Nakamura,GM,3251,1,https://images.chesscomfiles.com/uploads/v1/us...,United States,"Sunrise, Florida",3322.0,13233.0,2381.0,1616.0
1,42022994,firouzja2003,Alireza Firouzja,GM,3134,2,https://images.chesscomfiles.com/uploads/v1/us...,Iran,Babol,3233.0,4581.0,3003.0,1038.0
2,26897358,grischuk,Alexander Grischuk,GM,3120,3,https://images.chesscomfiles.com/uploads/v1/us...,Russia,Moscow,3120.0,257.0,176.0,133.0
3,26303484,bigfish1995,Vladimir Fedoseev,GM,3107,4,https://images.chesscomfiles.com/uploads/v1/us...,Russia,Moscow,3174.0,3363.0,2445.0,753.0
4,28417564,lachesisq,Ian Nepomniachtchi,GM,3096,5,https://images.chesscomfiles.com/uploads/v1/us...,Russia,Moscow,3204.0,1245.0,697.0,297.0


Here I display the top three players together with their avatars.

In [12]:
def path_to_image_html(path):
    return '<img src="'+ path + '" width="100" >'

# None disables column truncation (pandas >= 1.0); recent pandas no longer accepts -1.
pd.set_option('display.max_colwidth', None)

HTML(leaderboard_blitz_dataframe[:3].to_html(escape=False ,formatters=dict(avatar=path_to_image_html)))

Unnamed: 0,player_id,username,name,title,score,rank,avatar,country,location,best_rating,games_won,games_lost,games_draw
0,15448422,hikaru,Hikaru Nakamura,GM,3251,1,,United States,"Sunrise, Florida",3322.0,13233.0,2381.0,1616.0
1,42022994,firouzja2003,Alireza Firouzja,GM,3134,2,,Iran,Babol,3233.0,4581.0,3003.0,1038.0
2,26897358,grischuk,Alexander Grischuk,GM,3120,3,,Russia,Moscow,3120.0,257.0,176.0,133.0


## Games Dataset Blitz

First we collect the list of game-archive URLs for every player.

In [104]:
list_games = []
for i in leaderboard_blitz_dataframe.index:
    url = f"https://api.chess.com/pub/player/{leaderboard_blitz_dataframe.loc[i,'username']}/games/archives"
    r = requests.get(url).json()['archives']
    list_games += r


From each URL we fetch the details of every game. (This takes quite a while to run: it covers all the games of the top 50 players.)

In [114]:
lst_all_games = [json_normalize(requests.get(i).json()['games']) for i in list_games]
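
Because this step issues one request per archive and runs for a long time, a single failed request would abort the whole list comprehension. The sketch below downloads the archives one at a time, pausing between requests and retrying failures. `fetch_all`, its parameters, and the choice to store `None` after the last failed attempt are all assumptions made for illustration.

```python
import time

def fetch_all(urls, fetch, pause=0.5, retries=3, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests and retrying failures."""
    results = {}
    for url in urls:
        for attempt in range(1, retries + 1):
            try:
                results[url] = fetch(url)
                break
            except Exception:
                if attempt == retries:
                    results[url] = None  # give up on this URL but keep going
                else:
                    sleep(pause * 2 ** attempt)  # wait longer before each retry
        sleep(pause)  # short pause between consecutive URLs
    return results
```

With the notebook's names, `fetch` could be `lambda u: requests.get(u, timeout=30).json()['games']`, and the non-`None` values could then be passed to `json_normalize`.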

In [115]:
data = pd.concat(lst_all_games,ignore_index=True, sort=False)

In [121]:
data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 382925 entries, 0 to 382924
Data columns (total 19 columns):
url               382925 non-null object
pgn               356016 non-null object
time_control      382925 non-null object
end_time          382925 non-null int64
rated             382925 non-null bool
fen               382925 non-null object
time_class        382925 non-null object
rules             382925 non-null object
white.rating      382925 non-null int64
white.result      382925 non-null object
white.@id         382925 non-null object
white.username    382925 non-null object
black.rating      382925 non-null int64
black.result      382925 non-null object
black.@id         382925 non-null object
black.username    382925 non-null object
start_time        1288 non-null float64
tournament        9153 non-null object
match             178 non-null object
dtypes: bool(1), float64(1), int64(3), object(14)
memory usage: 53.0+ MB


In [21]:
data1 = data.copy()

In [20]:
data1 = data1[['time_class','white.username', 'white.rating', 'black.username', 'black.rating', 
                        'time_control','white.result','black.result','pgn','rated']]
display(data1.head(1))


Unnamed: 0,time_class,white.username,white.rating,black.username,black.rating,time_control,white.result,black.result,pgn,rated
0,blitz,Hikaru,2354,Godswill,2167,180,win,resigned,"[Event ""Live Chess""]\n[Site ""Chess.com""]\n[Date ""2014.01.06""]\n[Round ""-""]\n[White ""Hikaru""]\n[Black ""Godswill""]\n[Result ""1-0""]\n[ECO ""C25""]\n[ECOUrl ""https://www.chess.com/openings/Vienna-Game-Max-Lange-Paulsen-Variation""]\n[CurrentPosition ""6k1/1p2R3/p1p5/8/2P1B3/1P1P1p1P/P6K/8 b - -""]\n[Timezone ""UTC""]\n[UTCDate ""2014.01.06""]\n[UTCTime ""23:50:17""]\n[WhiteElo ""2354""]\n[BlackElo ""2167""]\n[TimeControl ""180""]\n[Termination ""Hikaru won by resignation""]\n[StartTime ""23:50:17""]\n[EndDate ""2014.01.06""]\n[EndTime ""23:54:39""]\n[Link ""https://www.chess.com/live/game/692667823""]\n\n1. e4 {[%clk 0:03:00]} 1... e5 {[%clk 0:03:00]} 2. Nc3 {[%clk 0:02:57.6]} 2... Nc6 {[%clk 0:02:57.2]} 3. g3 {[%clk 0:02:42.4]} 3... g6 {[%clk 0:02:52.5]} 4. Bg2 {[%clk 0:02:40.8]} 4... Bg7 {[%clk 0:02:51.8]} 5. Nge2 {[%clk 0:02:38.3]} 5... Nge7 {[%clk 0:02:50.8]} 6. O-O {[%clk 0:02:37.6]} 6... O-O {[%clk 0:02:49.7]} 7. d3 {[%clk 0:02:37.1]} 7... d6 {[%clk 0:02:48.7]} 8. h3 {[%clk 0:02:36.6]} 8... h6 {[%clk 0:02:46.9]} 9. Be3 {[%clk 0:02:35.5]} 9... Be6 {[%clk 0:02:45.5]} 10. Qd2 {[%clk 0:02:35]} 10... Qd7 {[%clk 0:02:44]} 11. Kh2 {[%clk 0:02:33]} 11... Kh7 {[%clk 0:02:42.5]} 12. Nd5 {[%clk 0:02:32.2]} 12... Nd4 {[%clk 0:02:37.4]} 13. Nxe7 {[%clk 0:02:21.8]} 13... Qxe7 {[%clk 0:02:33]} 14. c3 {[%clk 0:02:21.1]} 14... Nxe2 {[%clk 0:02:31.7]} 15. Qxe2 {[%clk 0:02:20.3]} 15... c6 {[%clk 0:02:30.5]} 16. f4 {[%clk 0:02:18.9]} 16... f5 {[%clk 0:02:28.8]} 17. Rae1 {[%clk 0:02:16.3]} 17... a6 {[%clk 0:02:18.1]} 18. fxe5 {[%clk 0:02:13.1]} 18... dxe5 {[%clk 0:02:16.6]} 19. exf5 {[%clk 0:02:11.8]} 19... Bxf5 {[%clk 0:02:15.8]} 20. g4 {[%clk 0:02:11.2]} 20... Be6 {[%clk 0:02:14.8]} 21. c4 {[%clk 0:02:10.7]} 21... Rad8 {[%clk 0:02:12.4]} 22. Be4 {[%clk 0:02:09.6]} 22... Bf7 {[%clk 0:02:04.3]} 23. b3 {[%clk 0:02:07.4]} 23... Qh4 {[%clk 0:01:38.8]} 24. Bf2 {[%clk 0:02:04.8]} 24... Qg5 {[%clk 0:01:37]} 25. 
Bc5 {[%clk 0:02:03.6]} 25... Be6 {[%clk 0:01:22.7]} 26. Bxf8 {[%clk 0:01:47.4]} 26... Rxf8 {[%clk 0:01:20.9]} 27. Rxf8 {[%clk 0:01:45.1]} 27... Bxf8 {[%clk 0:01:20.1]} 28. Qf2 {[%clk 0:01:44.4]} 28... Be7 {[%clk 0:01:09.6]} 29. Rf1 {[%clk 0:01:42]} 29... h5 {[%clk 0:01:05.2]} 30. gxh5 {[%clk 0:01:40.1]} 30... Qxh5 {[%clk 0:01:04.5]} 31. Qf3 {[%clk 0:01:39.3]} 31... Qg5 {[%clk 0:00:59.4]} 32. Rg1 {[%clk 0:01:34.1]} 32... Qf4+ {[%clk 0:00:48.6]} 33. Qxf4 {[%clk 0:01:29.8]} 33... exf4 {[%clk 0:00:47.8]} 34. Rxg6 {[%clk 0:01:29.4]} 34... f3 {[%clk 0:00:32.9]} 35. Rxe6+ {[%clk 0:01:27.2]} 35... Kg8 {[%clk 0:00:32]} 36. Rxe7 {[%clk 0:01:26.3]} 1-0",True


In [186]:
data1['time_class'].value_counts()

bullet    194127
blitz     185060
rapid     2450  
daily     1288  
Name: time_class, dtype: int64

In the next step we keep only the games that are blitz and rated, that is, the ones that count toward rating points.

In [187]:
# Keep only rated blitz games; .copy() avoids chained-assignment warnings later on.
data1 = data1.loc[(data1['time_class'] == 'blitz') & data1['rated']].copy()

In [189]:
data1['time_class'].value_counts()

blitz    182194
Name: time_class, dtype: int64

We clean the 'pgn' column to obtain the move notation of each game.

In [190]:
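def game_anotation(x):
    """Reduce a raw PGN string to its numbered moves.

    Input: pgn, the raw PGN text (headers, moves and clock comments).
    Output: a string such as '1. e4 1... e5 2. Nc3 2... Nc6'.
    """
    # Known limitations: the patterns drop check and promotion suffixes,
    # shorten 'O-O-O' to 'O-O', and zip() discards an unanswered final white move.
    # A missing PGN typically arrives as NaN (not None), so str(x) is 'nan'
    # and the result is an empty string rather than "Unknown".
    if x is not None:
        white_moves = re.findall(r'\d+\. \w+-?\w+', str(x))
        black_moves = re.findall(r'\d+\.\.\. \w+-?\w+', str(x))
        return ' '.join(w + ' ' + b for w, b in zip(white_moves, black_moves))
    else:
        return "Unknown"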
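
The regular expressions in `game_anotation` lose some information: check marks, promotions, queenside castling, and a final white move with no reply. One alternative is to strip everything that is not a move and keep the remaining tokens. `extract_moves` below is a hypothetical helper, and it assumes the PGN layout seen in this dataset (one header tag per line, `{...}` clock comments, no nested variations).

```python
import re

RESULT_TOKENS = {"1-0", "0-1", "1/2-1/2", "*"}

def extract_moves(pgn):
    """Return the moves of a PGN string as a list of SAN tokens."""
    text = re.sub(r"^\[.*\]\s*$", "", pgn, flags=re.MULTILINE)  # drop header tag lines
    text = re.sub(r"\{[^}]*\}", "", text)       # drop {...} comments such as clocks
    text = re.sub(r"\d+\.(\.\.)?", "", text)    # drop move numbers like '12.' and '12...'
    return [token for token in text.split() if token not in RESULT_TOKENS]
```

Comments are removed before move numbers on purpose: clock values such as `0:02:57.6` contain a digit followed by a dot and would otherwise be mangled.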



In [191]:
# Apply the cleaning function to the whole column instead of looping row by row.
data1['pgn'] = data1['pgn'].apply(game_anotation)

In [16]:
display(data1.head(1))

Unnamed: 0,time_class,white.username,white.rating,black.username,black.rating,time_control,white.result,black.result,pgn,rated
0,blitz,Hikaru,2354,Godswill,2167,180,win,resigned,1. e4 1... e5 2. Nc3 2... Nc6 3. g3 3... g6 4. Bg2 4... Bg7 5. Nge2 5... Nge7 6. O-O 6... O-O 7. d3 7... d6 8. h3 8... h6 9. Be3 9... Be6 10. Qd2 10... Qd7 11. Kh2 11... Kh7 12. Nd5 12... Nd4 13. Nxe7 13... Qxe7 14. c3 14... Nxe2 15. Qxe2 15... c6 16. f4 16... f5 17. Rae1 17... a6 18. fxe5 18... dxe5 19. exf5 19... Bxf5 20. g4 20... Be6 21. c4 21... Rad8 22. Be4 22... Bf7 23. b3 23... Qh4 24. Bf2 24... Qg5 25. Bc5 25... Be6 26. Bxf8 26... Rxf8 27. Rxf8 27... Bxf8 28. Qf2 28... Be7 29. Rf1 29... h5 30. gxh5 30... Qxh5 31. Qf3 31... Qg5 32. Rg1 32... Qf4 33. Qxf4 33... exf4 34. Rxg6 34... f3 35. Rxe6 35... Kg8,True
1,blitz,Godswill,2163,Hikaru,2438,180,timeout,win,1. d4 1... Nf6 2. c4 2... b6 3. Nc3 3... Bb7 4. f3 4... e6 5. e4 5... d5 6. e5 6... Nfd7 7. cxd5 7... exd5 8. Bd3 8... c5 9. dxc5 9... Bxc5 10. Qe2 10... Nc6 11. f4 11... Nd4 12. Qg4 12... Qe7 13. Nge2 13... Nxe2 14. Nxe2 14... Bb4 15. Kf1 15... O-O 16. a3 16... f6 17. axb4 17... fxe5 18. Bf5 18... Nf6 19. Qh3 19... g6 20. Be6 20... Kh8 21. f5 21... Ne4 22. Kg1 22... Qxb4 23. Bh6 23... Qxb2 24. Re1 24... Qb4 25. Rf1 25... Rf6 26. g4 26... Ba6 27. Qe3 27... gxf5 28. Qg5 28... Nxg5 29. Bxg5 29... Qxg4 30. Ng3 30... Qxg5 31. Rxf5 31... Rxf5 32. Bxf5 32... Qd2 33. h4 33... Rg8 34. Rh3 34... Bd3 35. Be6 35... Rxg3 36. Rxg3 36... Qe1 37. Kh2 37... Qxg3 38. Kxg3 38... d4 39. Kf3 39... a5 40. Bd5 40... a4 41. Kg4 41... a3 42. Kg5 42... Kg7 43. h5 43... h6 44. Kg4 44... b5 45. Kf3 45... b4 46. Kf2 46... e4 47. Ke1 47... e3 48. Kd1 48... Kf6 49. Kc1 49... Ke5 50. Bb3 50... Bf5 51. Kd1 51... d3 52. Ke1 52... Bh3 53. Bd1 53... Kd4 54. Bb3 54... Kc3,True
2,blitz,Hikaru,2509,Dmitriy_From_Russia,2266,180,win,resigned,1. e4 1... d6 2. g3 2... g6 3. Bg2 3... Bg7 4. Ne2 4... Nf6 5. d4 5... O-O 6. c4 6... c6 7. O-O 7... Nbd7 8. Nec3 8... e5 9. d5 9... Nc5 10. b4 10... Na6 11. a3 11... Nc7 12. h3 12... a5 13. b5 13... cxd5 14. cxd5 14... Bd7 15. a4 15... Qc8 16. Kh2 16... Nh5 17. Nd2 17... Na6 18. Ra3 18... f5 19. bxa6 19... Rxa6 20. exf5 20... Bxf5 21. Nde4 21... Nf6 22. Nxf6 22... Bxf6 23. Ne4 23... Bg7 24. Rc3 24... Qd7 25. Qb3 25... Raa8 26. Be3 26... Bxe4 27. Bxe4 27... Rf7 28. Rfc1 28... Raf8 29. Qc2 29... Bf6 30. Rc7 30... Qd8 31. Rxf7 31... Rxf7 32. Qc8 32... Qxc8 33. Rxc8 33... Kg7,True
3,blitz,Dmitriy_From_Russia,2263,Hikaru,2556,180,timeout,win,1. c4 1... d6 2. b3 2... g6 3. Bb2 3... Nf6 4. Nc3 4... Bg7 5. Qc2 5... c5 6. O-O 6... Nc6 7. e3 7... Bf5 8. d3 8... a6 9. Nf3 9... b5 10. Nd5 10... O-O 11. Nxf6 11... exf6 12. Kb1 12... Nb4 13. Qd2 13... bxc4 14. e4 14... Bd7 15. dxc4 15... Bc6 16. Re1 16... Re8 17. a3 17... Bxe4 18. Rxe4 18... Rxe4 19. axb4 19... cxb4 20. Bd3 20... Re7 21. Qxb4 21... Rb8 22. Qa4 22... Reb7 23. Bc2 23... f5 24. Bxg7 24... Kxg7 25. Rd1 25... Qf6 26. Qa1 26... Rb6 27. Qxf6 27... Kxf6 28. Kb2 28... a5 29. Kc3 29... Rc8 30. h3 30... h5 31. Nd4 31... d5 32. Ra1 32... Rc5 33. Nf3 33... dxc4 34. bxc4 34... Rb8 35. Ne1 35... Rbc8 36. Bb3 36... h4 37. Nd3 37... Rd5 38. c5 38... Rdxc5 39. Nxc5 39... Rxc5 40. Bc4 40... Re5 41. Ra2 41... g5 42. Re2 42... Rxe2 43. Bxe2 43... Ke5 44. Kc4 44... a4 45. Kb4 45... Kd4 46. Kxa4 46... Kc3 47. f4 47... f6 48. Kb5 48... Kd4 49. Bd1 49... Ke3 50. fxg5 50... fxg5 51. Bf3 51... Kf4 52. Bh5 52... Kg3 53. Bg6 53... Kf4 54. Bf7 54... g4 55. Bd5 55... Kg3 56. Kc5 56... gxh3 57. gxh3 57... Kxh3 58. Be6 58... Kg4 59. Kd4 59... Kf4 60. Kd3 60... Ke5,True
4,blitz,Hikaru,2590,Dmitriy_From_Russia,2260,180,win,checkmated,1. e4 1... d6 2. g3 2... g6 3. Bg2 3... Bg7 4. Ne2 4... Nf6 5. O-O 5... O-O 6. d4 6... Nfd7 7. c4 7... Re8 8. f4 8... Nf8 9. f5 9... e5 10. d5 10... Na6 11. Nbc3 11... Nc5 12. b4 12... Ncd7 13. a3 13... gxf5 14. exf5 14... Nf6 15. h3 15... Kh8 16. g4 16... h6 17. Ng3 17... N8d7 18. Be3 18... Rg8 19. Qd2 19... Nb6 20. Qd3 20... Bd7 21. c5 21... Na4 22. Nce4 22... dxc5 23. bxc5 23... a6 24. Nxf6 24... Bb5 25. Qd2 25... Bxf6 26. Ne4 26... Bxf1 27. Rxf1 27... Bg5 28. Bxg5 28... hxg5 29. h4 29... gxh4,True


In [197]:
data1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 182194 entries, 0 to 382924
Data columns (total 10 columns):
time_class        182194 non-null object
white.username    182194 non-null object
white.rating      182194 non-null int64
black.username    182194 non-null object
black.rating      182194 non-null int64
time_control      182194 non-null object
white.result      182194 non-null object
black.result      182194 non-null object
pgn               182194 non-null object
rated             182194 non-null bool
dtypes: bool(1), int64(2), object(7)
memory usage: 19.1+ MB


We drop duplicates, because a game appears twice whenever two of these players faced each other.

In [202]:
data1.drop_duplicates(inplace=True)
data1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 162158 entries, 0 to 382924
Data columns (total 10 columns):
time_class        162158 non-null object
white.username    162158 non-null object
white.rating      162158 non-null int64
black.username    162158 non-null object
black.rating      162158 non-null int64
time_control      162158 non-null object
white.result      162158 non-null object
black.result      162158 non-null object
pgn               162158 non-null object
rated             162158 non-null bool
dtypes: bool(1), int64(2), object(7)
memory usage: 12.5+ MB
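
`drop_duplicates()` here compares whole rows after the `pgn` column has been simplified. Another way to identify a repeated game is by its `url` field, which is present in the raw `data` frame but was not kept in `data1`. The sketch below shows that idea on plain dictionaries; `dedupe_games` is a hypothetical helper, and treating `url` as a unique game identifier is an assumption.

```python
def dedupe_games(games):
    """Keep the first copy of each game, identified by its 'url' field."""
    seen = set()
    unique = []
    for game in games:
        if game["url"] not in seen:
            seen.add(game["url"])
            unique.append(game)
    return unique
```

In pandas the same idea would be `drop_duplicates(subset='url')`, provided the `url` column is carried along into the cleaned frame.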


In [201]:
# Count the rows whose notation is 'Unknown'.
(data1['pgn'] == 'Unknown').sum()

0

Here we can see the time controls of the blitz games and how many games were played at each one.

In [203]:
data1['time_control'].value_counts()

180      150486
180+2    7914  
300      1530  
180+1    1205  
300+1    378   
600      266   
300+2    232   
240+2    52    
300+5    40    
600+5    36    
300+3    10    
120+2    3     
180+3    2     
600+3    1     
360+9    1     
480      1     
420+5    1     
Name: time_control, dtype: int64
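
The `time_control` values above are strings of the form `base` or `base+increment`, both in seconds. A small parser makes them usable as numbers. `parse_time_control` is a hypothetical helper; returning `None` for any other format (for example the ones used by daily games) is an assumption of this sketch.

```python
def parse_time_control(value):
    """Split 'base+increment' into (base_seconds, increment_seconds)."""
    base, _, increment = value.partition("+")
    if not base.isdigit() or (increment and not increment.isdigit()):
        return None  # format not handled by this sketch
    return int(base), int(increment or 0)
```

For example, `parse_time_control('180+2')` gives `(180, 2)` and `parse_time_control('180')` gives `(180, 0)`.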

These are the results of the blitz games.

In [204]:
data1['white.result'].value_counts()

win                    77097
resigned               44463
timeout                10918
checkmated             9254 
agreed                 5353 
repetition             4889 
bughousepartnerlose    4354 
insufficient           2901 
timevsinsufficient     1516 
stalemate              689  
abandoned              601  
50move                 85   
threecheck             28   
kingofthehill          10   
Name: white.result, dtype: int64

In [205]:
data1['black.result'].value_counts()

win                    69628
resigned               51056
timeout                11158
checkmated             10308
agreed                 5353 
repetition             4889 
bughousepartnerlose    3916 
insufficient           2901 
timevsinsufficient     1516 
stalemate              689  
abandoned              624  
50move                 85   
threecheck             20   
kingofthehill          15   
Name: black.result, dtype: int64
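
The two result columns can be collapsed into a single outcome per game. In the counts above, the codes `agreed`, `repetition`, `insufficient`, `timevsinsufficient`, `stalemate`, and `50move` occur equally often for White and Black, which suggests they are draws; every other non-`win` code is a loss for that side. The sketch below relies on that reading of the data, and `game_outcome` is a hypothetical helper.

```python
DRAW_CODES = {"agreed", "repetition", "insufficient",
              "timevsinsufficient", "stalemate", "50move"}

def game_outcome(white_result, black_result):
    """Return '1-0', '0-1' or '1/2-1/2' from the two result codes, else None."""
    if white_result == "win":
        return "1-0"
    if black_result == "win":
        return "0-1"
    if white_result in DRAW_CODES and black_result in DRAW_CODES:
        return "1/2-1/2"
    return None  # unexpected combination of codes
```

It could be applied row by row to `white.result` and `black.result` to add an outcome column.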

In [134]:
data.to_csv('games_data.csv', index=False)

In [206]:
data1.to_csv('games_blitz_clean.csv', index=False)

## Web Scraping


My idea is to add to each game the opening that is being played. (This part is not finished yet.)
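
The opening list scraped in this section comes from the Spanish-language Wikipedia, so its moves use Spanish piece initials (`Cf3`, `Ab5`, `De7`), whereas the Chess.com PGNs use English ones (`Nf3`, `Bb5`, `Qe7`). Matching the two would need a translation step like the sketch below. `to_english_san` is a hypothetical helper, and it should only be applied to the move part of an entry, since it would also alter capital letters in the opening names.

```python
# Spanish piece initials (Rey, Dama, Torre, Alfil, Caballo) to English ones.
SPANISH_TO_ENGLISH = str.maketrans({"R": "K", "D": "Q", "T": "R", "A": "B", "C": "N"})

def to_english_san(moves):
    """Translate Spanish piece letters in a move string to English notation."""
    # str.translate maps all letters at once, so 'T' -> 'R' is never re-mapped to 'K'.
    return moves.translate(SPANISH_TO_ENGLISH)
```

File letters are lowercase and castling uses `O`, so neither is affected by the mapping.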

In [144]:
url = 'https://es.wikipedia.org/wiki/Apertura_(ajedrez)'
html = requests.get(url).content
soup = BeautifulSoup(html, 'lxml')
text = soup.find_all('li')

In [143]:
lst = [i.text for i in text if i.text.startswith('1')]
lst

['1 Reglas clásicas',
 '1.e4 e5 Apertura abierta',
 '1.e4 e5 2.Cf3 Cc6 3.Ab5 Apertura española',
 '1.e4 e5 2.Cf3 Cc6 3.c3 Apertura Ponziani',
 '1.e4 e5 2.Cf3 Cc6 3.d4 Apertura escocesa',
 '1.e4 e5 2.Cf3 Cc6 3.Ac4 Apertura italiana Ac5 Giuoco piano y otras variantes (particularmente Cf6)',
 '1.e4 e5 2.Cf3 Cc6 3.Ac4 Ac5 4.b4 Gambito Evans',
 '1.e4 e5 2.Cf3 Cc6 3.Ac4 Cf6 Defensa de los dos caballos',
 '1.e4 e5 2.Cf3 Cc6 3.Ac4 Ae7 Defensa húngara',
 '1.e4 e5 2.Cf3 Cc6 3.Ac4 Cd4?! Gambito Blackburne',
 '1.e4 e5 2.Cf3 Cc6 3.Ac4 f5?! Gambito Rousseau',
 '1.e4 e5 2.Cf3 Cc6 3.Cc3 Cf6 Apertura de los cuatro caballos',
 '1.e4 e5 2.Cf3 Cc6 3.Cc3 sin 3...Cf6 Apertura de los tres caballos',
 '1.e4 e5 2.Cf3 Cc6 3.g3 Apertura Konstantinopolsky',
 '1.e4 e5 2.Cf3 Cf6 Defensa Petrov',
 '1.e4 e5 2.Cf3 f5 Gambito Letón',
 '1.e4 e5 2.Cf3 f6 Defensa Damiano',
 '1.e4 e5 2.Cf3 d5 Gambito elefante',
 '1.e4 e5 2.Cf3 d6 Defensa Philidor',
 '1.e4 e5 2.Cf3 De7 Defensa brasileña o Defensa Gunderam',
 '1.e4 e5 2.Cf3 