In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from bs4 import BeautifulSoup

import urllib.request,urllib.parse, urllib.error



The following is a brief script I am currently working on to scrape SumoDB, select wrestler based on their year of premiere, and then extract the match data for each wrestler, above a certain rank. The goal is to assemble a series of entries, where each wrestler has a set of bashos/tournaments under their name, as well as the win/loss/injury stats. By extracting this info from the plain text representation on SumoDB and converting it into a more user friendly table, I can then begin doing some analysis. 

As a test case, I will be extracting Shodai's match records. Note that the URL displays the site as text-only, as the "table" used is actually a set of images. It is a little easier to scrape the info this way. 

URL:http://sumodb.sumogames.de/Rikishi.aspx?r=12130&t=1

In [3]:
url = input('Enter -')
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html,'html.parser')

Enter -http://sumodb.sumogames.de/Rikishi.aspx?r=12130&t=1


Just parsing the HTML here gives us a rough structure of the page, and there are two areas we want to focus on: the biographical info( age,weight,height) , and the tournament info. Let's first start by extracting the name (look for 'h2' tags).

In [4]:
soup


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta content="Rek8qIcTIMRzIdPb0GEasAfYsajhqVKmwC9WwdCk3U0" name="google-site-verification"/><title>
	Shodai Naoya Rikishi Information
</title><link href="website.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript">
window.onload = function()
{
    if (window.winOnLoad) window.winOnLoad();
}
window.onunload = function()
{
    if (window.winOnUnload) window.winOnUnload();
}
</script>
<script src="scripts/x_core.js" type="text/javascript"></script>
<script src="scripts/xselect.js" type="text/javascript"></script>
<!-- Add jQuery library -->
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.7/jquery.min.js" type="text/javascript"></script>
<!-- Add mousewheel plugin (this is optional) -->
<script src="scripts/jquery.mousewheel-3.0.6.pack.js" type="text/javascript"></script>
<!-- Add fancyBo

In [5]:
Name = soup.find_all('h2')[0].text
Name

'Shodai Naoya'

The rest of our info, is unfortunately stored in plain text. Here, I did a bit of digging, and found a nice snippet from https://medium.com/@chipk215/web-scraping-a-story-of-preformatted-text-df65486a8f15, which gives a quick overview of what is done below: you strip the text, and then find your start/end parts. Web scraping is still a bit new to me, so I figured it's worth putting that out there.

We are now going to take that set of text, and set it up as a nice and messy list. Who doesn't love lists? Again the features here are split in two. Note the premiere date of 'Hatsu Dohyo', essentially meaning when they first entered professional sumo. 

In [6]:
pre = soup.find('pre')
start_section_text = 'Highest Rank'
end_section_text = '2001.05 ' # date of next basho...

page_text = pre.text.strip()
start_position = page_text.find(start_section_text)
end_position = page_text.find(end_section_text)

table_text = page_text[start_position:]

lines = table_text.splitlines()
lines

['Highest Rank     Ozeki',
 'Real Name        SHODAI Naoya',
 'Birth Date       November 5, 1991 (29 years)',
 'Shusshin         Kumamoto-ken, Uto-shi',
 'Height and Weight182 cm 150 kg',
 'University       Tokyo University of Agriculture',
 'Heya             Tokitsukaze',
 'Shikona          Shodai Naoya',
 'Hatsu Dohyo      2014.03',
 '',
 'Career Record    295-216-5/510 (40 basho)',
 '  In Makuuchi    227-198-5/424 (29 basho), 1 Yusho, 2 Jun-Yusho, 1 Shukun-Sho, 6 Kanto-Sho, 1 Kinboshi',
 '   As Ozeki      3-2-5/4 (1 basho)',
 '   As Sekiwake   39-21/60 (4 basho), 1 Yusho, 1 Shukun-Sho, 2 Kanto-Sho',
 '   As Komusubi   4-11/15 (1 basho)',
 '   As Maegashira 181-164/345 (23 basho), 2 Jun-Yusho, 4 Kanto-Sho, 1 Kinboshi',
 '  In Juryo       24-6/30 (2 basho), 1 Yusho',
 '  In Makushita   25-10/35 (5 basho), 1 Yusho',
 '  In Sandanme    6-1/7 (1 basho)',
 '  In Jonidan     6-1/7 (1 basho)',
 '  In Jonokuchi   7-0/7 (1 basho), 1 Yusho',
 '  In Mae-zumo    1 basho',
 '',
 'Shodai Naoya',
 

As you might have noticed, Sumo rankings are given by this upper case + number + lowercase notation, denoting a wrestler/rikishi's rank. Also note that Basho/tournaments are held every other month, and that match records are stored here as *,O,-, which are losses, wins, and withdrawals (due to injury). The next column summarizes a 15 day tournament, with a score out of 15 (i.e. 11-4 denotes 11 wins and 4 losses). 

In [7]:
Name

'Shodai Naoya'

In [8]:
for i in range(len(lines)):
    res = lines[i].find(Name.split()[0])
    if res == 0:
        print(i)
        break

23


In [9]:

record_idx = lines.index(Name)

In [11]:
lines[record_idx:]

['Shodai Naoya',
 '2014.03 Mz                      2-1',
 '2014.05 Jk12w   -O-OO-O-O-O-O-- 7-0    Yusho',
 '2014.07 Jd10e   -O-O-OO--O*---O 6-1',
 '2014.09 Sd48e   -O-OO--O*--OO-- 6-1',
 '2014.11 Ms59e   O-O-*-*--OO---O 5-2',
 '2015.01 Ms37w   -O-O-O-OO-O-O-- 7-0    Yusho',
 '2015.03 Ms3w    O-O-*-O-O-*---* 4-3',
 '2015.05 Ms2w    -O*-O-O-O-*--*- 4-3',
 '2015.07 Ms1e    *-O-O-O-O-*--O- 5-2',
 '2015.09 J12w    OOO*OOOOOO**OO* 11-4',
 '2015.11 J5w     O*OOOOOOOO*OOOO 13-2   Yusho (1st)',
 '2016.01 M12w    O*OOO***OOOO*OO 10-5   Kanto-sho (1st)',
 '2016.03 M6w     OO***OOO**OOOO* 9-6',
 '2016.05 M2e     ******OOO**OOO* 6-9',
 '2016.07 M5e     OO*OO*O*OOO*O** 9-6',
 '2016.09 M2w     *******OOO%O*OO 7-8',
 '2016.11 M3w     OOO*O*OOO*OOOO* 11-4   Kanto-sho (2nd)',
 '2017.01 S1w     *O*OO***O**O*OO 7-8',
 '2017.03 K1w     O*O**O****O**** 4-11',
 '2017.05 M5w     O*OOO*OOOO***OO 10-5',
 '2017.07 M1e     *O***%*O****O*O 5-10   Kinboshi',
 '2017.09 M5e     *O*OO%****O*O** 6-9',
 '2017.11 M7w    

We can quickly get the biographical info by exploiting the structure of this page: the name repeats a few times, and we can use these as handy landmarks to denote when the biographical info ends and tournament records begin. This is really useful since rikishi can earn a variety of prizes in a tournmament, and that the match record (at the top) obviously varies from rikishi to rikishi. By using the names, we can split this list into a few easier to manage snippets.

In [13]:
def get_bio_info(first_snippet):
    premiere_yr = first_snippet[1].split()[-1]
    current_weight, current_height = get_height(first_snippet)
    

In [14]:

def get_height(raw_input):
    phys_line = raw_input[4].split()
    weight = phys_line[-2] # weight in KG
    
    # Check if there is a height/weight first
    
    
    if phys_line[2]!='Weight':
        height = phys_line[2].split('t')[-1] # it got stuck together, seperate to get height in cm
        
    else:
        height = phys_line[3]
        
    return(height,weight)



In [15]:
matches = lines[record_idx+1:]


In [17]:
matches[1:]

['2014.05 Jk12w   -O-OO-O-O-O-O-- 7-0    Yusho',
 '2014.07 Jd10e   -O-O-OO--O*---O 6-1',
 '2014.09 Sd48e   -O-OO--O*--OO-- 6-1',
 '2014.11 Ms59e   O-O-*-*--OO---O 5-2',
 '2015.01 Ms37w   -O-O-O-OO-O-O-- 7-0    Yusho',
 '2015.03 Ms3w    O-O-*-O-O-*---* 4-3',
 '2015.05 Ms2w    -O*-O-O-O-*--*- 4-3',
 '2015.07 Ms1e    *-O-O-O-O-*--O- 5-2',
 '2015.09 J12w    OOO*OOOOOO**OO* 11-4',
 '2015.11 J5w     O*OOOOOOOO*OOOO 13-2   Yusho (1st)',
 '2016.01 M12w    O*OOO***OOOO*OO 10-5   Kanto-sho (1st)',
 '2016.03 M6w     OO***OOO**OOOO* 9-6',
 '2016.05 M2e     ******OOO**OOO* 6-9',
 '2016.07 M5e     OO*OO*O*OOO*O** 9-6',
 '2016.09 M2w     *******OOO%O*OO 7-8',
 '2016.11 M3w     OOO*O*OOO*OOOO* 11-4   Kanto-sho (2nd)',
 '2017.01 S1w     *O*OO***O**O*OO 7-8',
 '2017.03 K1w     O*O**O****O**** 4-11',
 '2017.05 M5w     O*OOO*OOOO***OO 10-5',
 '2017.07 M1e     *O***%*O****O*O 5-10   Kinboshi',
 '2017.09 M5e     *O*OO%****O*O** 6-9',
 '2017.11 M7w     *OO***O**OOOO%O 9-6',
 '2018.01 M4e     ***OOOO**OOO*** 

Now for the fun task: we can convert that messy looking pre-formatted text table into something a bit more useable. We are going to ignore any honours (i.e. Yusho, etc) for now. The next two functions extract a full record, and then trim the record based on a rank cutoff.

In [19]:
matches

['2014.03 Mz                      2-1',
 '2014.05 Jk12w   -O-OO-O-O-O-O-- 7-0    Yusho',
 '2014.07 Jd10e   -O-O-OO--O*---O 6-1',
 '2014.09 Sd48e   -O-OO--O*--OO-- 6-1',
 '2014.11 Ms59e   O-O-*-*--OO---O 5-2',
 '2015.01 Ms37w   -O-O-O-OO-O-O-- 7-0    Yusho',
 '2015.03 Ms3w    O-O-*-O-O-*---* 4-3',
 '2015.05 Ms2w    -O*-O-O-O-*--*- 4-3',
 '2015.07 Ms1e    *-O-O-O-O-*--O- 5-2',
 '2015.09 J12w    OOO*OOOOOO**OO* 11-4',
 '2015.11 J5w     O*OOOOOOOO*OOOO 13-2   Yusho (1st)',
 '2016.01 M12w    O*OOO***OOOO*OO 10-5   Kanto-sho (1st)',
 '2016.03 M6w     OO***OOO**OOOO* 9-6',
 '2016.05 M2e     ******OOO**OOO* 6-9',
 '2016.07 M5e     OO*OO*O*OOO*O** 9-6',
 '2016.09 M2w     *******OOO%O*OO 7-8',
 '2016.11 M3w     OOO*O*OOO*OOOO* 11-4   Kanto-sho (2nd)',
 '2017.01 S1w     *O*OO***O**O*OO 7-8',
 '2017.03 K1w     O*O**O****O**** 4-11',
 '2017.05 M5w     O*OOO*OOOO***OO 10-5',
 '2017.07 M1e     *O***%*O****O*O 5-10   Kinboshi',
 '2017.09 M5e     *O*OO%****O*O** 6-9',
 '2017.11 M7w     *OO***O**OOOO%O 

In [79]:
def extract_record(table_of_bashos):
    date = []
    rank = []
    record = []
    final_score = []
    
    for i in range(len(table_of_bashos)):
        entry = table_of_bashos[i:i+1]
        values = entry[0].split()
        if len(values)<4:
            #print('Name change/Missing Data')
            pass
        else:
            #print(i)
            date.append(values[0])
            rank.append(values[1])
            record.append(values[2])
            final_score.append(values[3])
    return(date,rank,record,final_score)

date,rank,record,final_score = extract_record(matches[1:])

One motivation for trimming based on rank is that Juryo is the first official rank where Rikishi compete in 15 bouts, not 7. This is the top tier of sumo, and while the occasional top ranked wrestler falls into Juryo, they rarely fall below unless their career is practically over. We will use Juryo as our cutoff, and it is where I will consider a rikishi/wrestler to have finally reached "pro" status. 

In [31]:
record
row = 0
entry = rank[row]

while (entry[:2] != 'Jd' and entry[:2] != 'Jk' and entry[:1] == 'J') and row < len(rank):
        # reached a juryo rank
        row = +1
        entry = rank[row]
        print(row)
        
row
record

['-O-OO-O-O-O-O--',
 '-O-O-OO--O*---O',
 '-O-OO--O*--OO--',
 'O-O-*-*--OO---O',
 '-O-O-O-OO-O-O--',
 'O-O-*-O-O-*---*',
 '-O*-O-O-O-*--*-',
 '*-O-O-O-O-*--O-',
 'OOO*OOOOOO**OO*',
 'O*OOOOOOOO*OOOO',
 'O*OOO***OOOO*OO',
 'OO***OOO**OOOO*',
 '******OOO**OOO*',
 'OO*OO*O*OOO*O**',
 '*******OOO%O*OO',
 'OOO*O*OOO*OOOO*',
 '*O*OO***O**O*OO',
 'O*O**O****O****',
 'O*OOO*OOOO***OO',
 '*O***%*O****O*O',
 '*O*OO%****O*O**',
 '*OO***O**OOOO%O',
 '***OOOO**OOO***',
 'O***OOOO***OO**',
 'OOOOO**O***OOO*',
 'O*****%***%OO*O',
 '**OO***OOO**O**',
 'O*O*OO***O*O*%O',
 '**O*O***OO**OOO',
 '*********OO*OOO',
 'OO*OO**OO*OO*OO',
 'OO**OO**O**O**O',
 '*O*O*********O*',
 'OOOO**O*OOOOO*O',
 'OOOOOO*OOOOOO*O',
 'OO*O***O*O*OOO*',
 'OO*OOOOOO**O%O*',
 'OOO*OO*OOOOOOOO',
 'OOO*#-----']

In [80]:
def filter_rank(table_of_bashos, rank):
    # Modifies extracted table to include all bashos after exceeding a certain rank
    # Table of bashos is a list of extracted entries, sorted row-wise. Want to modify into table
    # Rank is a string representation of a rank;
    
    
    # Returns row, the relevant index for a given basho rank's first instance
    if rank == 'Juryo':
        # Basically find first instance of first 'Juryo' bout, slice off the rest.
        row = 0
        entry =table_of_bashos[row]
        while (entry[:2] != 'Jk' and entry[:1] == 'J' and entry[:2]!='Jd') and row < len(table_of_bashos):
            
            #print(entry,row, len(table_of_bashos))
            
            if row >= len(table_of_bashos)-1:
                break
            else:
                row += 1
                entry = table_of_bashos[row]

            
            
            
        return(row) # index where first Juryo match is recorded
    
    elif rank == 'Maegashira':
        row = 0
        while table_of_bashos[row][1:3] =='s' and table_of_bashos[row][:1]!='M': # Makuhita != Maegashira, so we dennote the
            #latter as M(number) vs Ms_number
            row += 1
            
        return(row) # index where first Juryo match is recorded
    else:
        print('Invalid rank, try again')
    


In [81]:
def filter_for_rank(rank_list, rank_to_use):


    if rank_to_use =='Juryo':
        row = 0
        for i in range(len(rank_list)):
            res = rank_list[i].find('J')
            if res != -1 and rank_list[i].find('Jd') == -1 and rank_list[i].find('Jk') == -1 and rank_list[i].find('Js') == -1:
        #found juryo
                row = i
                rank_list[i].find('J')
                break
            else:
                #print('No juryo')
                row= -2
        
    return(row)

In [30]:
test_list = [
 'Sd74e',
 'Sd63e',
 'Sd78w',
 'Jd22e','J12e','J5w','Jd12e','J5w']

filter_for_rank(test_list,'Juryo')

NameError: name 'filter_for_rank' is not defined

Filter rank just gives us a nice little index to trim the list we've been working with further. We use it to reformat the previous lists, so that the next few steps involve only the ranks we are interested in.

In [34]:
result = filter_for_rank(rank, 'Juryo')


    # IGNORE THIS RIKISHI


In [42]:
Jidx = filter_for_rank(rank,'Juryo')
# use Jidx to filter for Juryo only matches.

date_f,rank_f,record_f,final_score_f = date[Jidx:],rank[Jidx:],record[Jidx:],final_score[Jidx:]


In [82]:
def translate_record(record, concat):
    # Inputs: Record is the list of wins,losses, withdrawals in a given basho
    # We one-hot these symbolic representations with three arrays of 0s and 1s
    
    wins = np.zeros(15)
    losses = np.zeros(15)
    withdrawals = np.zeros(15)
    
    for i in range(len(record)):
        if record[i] == 'O' or record[i] == '%':
            wins[i] = 1
        elif record[i] == '*' or record[i] =='#':
            losses[i] = 1
        elif record[i] == '-':
            withdrawals[i]= 1

            
            
    #print(np.sum(wins),np.sum(losses), np.sum(withdrawals),'W/L/With ')
    if concat == True:
        return(np.concatenate((wins,losses,withdrawals)))
        # Stick em all together and return it as one array!
    else:
        return(wins,losses,withdrawals)
        # Return as three seperate arrays
    
    
    #print(np.sum(wins),np.sum(losses), np.sum(withdrawals),'W/L/With ')
        


Translate record just takes the simple inputs I described above, and converts it into either a 45x1 dimensional array (useful for ML) or a 3x15 array of wins,losses,withdrawals. This is just toggled with the "concat" parameter. With this said and done, we have now been able to take a full list of bouts on a single html page, and convert it into a useable array. We can easily assemble an array from this.

To scale it up to the rikishi we are interested in, we can exploit SumoDB's ability to sort by 'Intai' or premiere year. This amounts to a nice little start date for all our Rikishi. 

In [44]:
record_arr =np.zeros((len(rank_f),3,15))
for i in range(len(record_f)):
    basho_record = record_f[i]
    record_arr[i,0,:],record_arr[i,1,:],record_arr[i,2,:]= translate_record(basho_record,concat = False)



In [47]:
 # To scrape the full table, first get reference list (will select based off of that...)

In [48]:
url_ref = 'http://sumodb.sumogames.de/Rikishi.aspx?shikona=&heya=-1&shusshin=-1&b=-1&high=-1&hd=-1&entry=-1&intai=-1&sort=7'
html_ref = urllib.request.urlopen(url_ref).read()
soup_ref = BeautifulSoup(html_ref,'html.parser')

In [49]:
table = soup_ref.find_all('table')
df = pd.read_html(str(table))[1]
df

Unnamed: 0,Shikona,Heya,Shusshin,Birth Date,Highest Rank,Hatsu Dohyo,Intai,Last Shikona
0,Akashi,-,Tochigi,1600,Yokozuna,0.00,0.0,Akashi
1,Araiwa,-,-,,Not in Kyokai,0.00,0.0,Araiwa
2,Ashinoura#,Nakamura,-,,Not in Kyokai,0.00,0.0,Ashinoura#
3,Ayagawa,-,Tochigi,1703,Yokozuna,0.00,0.0,Ayagawa
4,Chitosegawa,-,-,,Not in Kyokai,0.00,0.0,Chitosegawa
...,...,...,...,...,...,...,...,...
12660,Yutakanami,Tatsunami,Fukuoka,"January 2, 2001",Jonidan 51,2019.09,,Yutakanami
12661,Yutakasho,Sakaigawa,Kagoshima,"November 19, 1994",Makushita 39,2013.03,,Yutakasho
12662,Yutakayama,Tokitsukaze,Niigata,"September 22, 1993",Maegashira 1,2016.03,,Yutakayama
12663,Zendaisho,Takadagawa,Chiba,"October 14, 1987",Sandanme 85,2003.05,,Zendaisho


In [81]:
df[df['Shikona']=='Ezonoyama']

Unnamed: 0,Shikona,Heya,Shusshin,Birth Date,Highest Rank,Hatsu Dohyo,Intai,Last Shikona
4577,Ezonoyama,-,-,,Sandanme 78,1960.09,1962.01,Ezonoyama


In [50]:
tags = soup_ref('a')
tag_list = []
for tag in tags:
    tag_list.append((tag.get('href',None)))
    


In [51]:
sorted_df =df.loc[df['Hatsu Dohyo']>=1960.01]

Shikonas = sorted_df['Shikona']

In [52]:
sorted_df

Unnamed: 0,Shikona,Heya,Shusshin,Birth Date,Highest Rank,Hatsu Dohyo,Intai,Last Shikona
4237,Katoyama,-,-,,Jonokuchi 14,1960.01,1960.03,Katoyama
4275,Murakami,-,-,,Jonokuchi 6,1960.03,1960.05,Murakami
4279,Ohirato,-,-,,Jonidan 108,1960.01,1960.05,Ohirato
4297,Kondo,-,-,,Jonokuchi 20,1960.05,1960.07,Kondo
4314,Higashida,-,-,,Jonidan 64,1960.01,1960.09,Higashida
...,...,...,...,...,...,...,...,...
12660,Yutakanami,Tatsunami,Fukuoka,"January 2, 2001",Jonidan 51,2019.09,,Yutakanami
12661,Yutakasho,Sakaigawa,Kagoshima,"November 19, 1994",Makushita 39,2013.03,,Yutakasho
12662,Yutakayama,Tokitsukaze,Niigata,"September 22, 1993",Maegashira 1,2016.03,,Yutakayama
12663,Zendaisho,Takadagawa,Chiba,"October 14, 1987",Sandanme 85,2003.05,,Zendaisho


In [147]:
np.argwhere(np.asarray(sorted_df['Highest Rank'].str.find('Makushita')) !=-1).size

1431

In [53]:
xs =sorted_df['Intai'].isnull()

xs.iloc[0]

False

In [57]:
URL_list = tag_list

In [58]:
URL_list = ['http://sumodb.sumogames.de/' +s+'&t=1' for s in tag_list]

In [59]:
r_upper = 8# full list of RIKISHI, SORTED!
r_lower = -12

In [60]:
full_URL_list = URL_list[r_upper:r_lower]
full_URL_list

['http://sumodb.sumogames.de/Rikishi.aspx?r=11893&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=11791&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=8351&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=11894&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=11799&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=8343&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=11802&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=8331&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=8346&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=8354&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=8353&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=8334&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=8338&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=8339&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=8332&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=8333&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=8344&t=1',
 'http://sumodb.sumogames.de/Rikishi.aspx?r=8345&t=1',
 'htt

In [61]:
IDX=(sorted_df.index.to_numpy())
NIDX = np.zeros(np.size(IDX),dtype =int)
print(type(NIDX[0]))
for i in range(np.size(IDX)):
    NIDX[i] = IDX[i].item()




<class 'numpy.int64'>


In [62]:
type(IDX[0].item())

int

In [63]:
# NOw we have a sorted_URL list

URL_list_sort = np.asarray(full_URL_list)[IDX]

URL_LIST = URL_list_sort.tolist()



In [83]:
def Scrape_and_Save(URL,keyword,concat,indx):
    # Given a URL, rank to filter, and concat status, scrape a link!
    import requests


    htmly = urllib.request.urlopen(URL).read()
    soupy = BeautifulSoup(htmly,'html.parser')
    pre_s = soupy.find('pre')
    start_section_text = 'Highest Rank'
    
    Name =soupy.find_all('h2')[0].text
    
    xs =sorted_df['Intai'].isnull()

    # CHECK IF THERE IS A CUTOFF/INTAI
    if xs.iloc[indx]== False:
        end_section_text = str(sorted_df['Intai'].iloc[indx]) + str(' ') # this space makes all the difference.
        page_text = pre_s.text.strip()
        start_position = page_text.find(start_section_text)
        end_position = page_text.find(end_section_text)

        table_text = page_text[start_position:]

        lines = table_text.splitlines()
    else:
        end_section_text = '2020.11 '
        page_text = pre_s.text.strip()
        start_position = page_text.find(start_section_text)
        end_position = page_text.find(end_section_text)

        table_text = page_text[start_position:end_position]

        lines = table_text.splitlines()# date of next basho...

    # Strip down the page based on this header/footer split

    
    Name =soupy.find_all('h2')[0].text
    if lines.count('Weight') != 0:
        Weight,Height = get_height(lines)
    else:
        Weight,Height = np.nan,np.nan
    First_Name= Name.split()[0]
    for i in range(len(lines)):
        res = lines[i].find(Name.split()[0])
        if res == 0:
            record_idx = i
            break
    #record_idx = lines.index(Name)
    #print(record_idx)
    #print('STRUCTURE: ',lines)
    matches = lines[record_idx+1:]
    #print('MATCHES: ',matches)
    
    date,rank,record,final_score = extract_record(matches[1:])
    Jidx = filter_for_rank(rank,keyword)
    
    if Jidx == -2:
        return # We do not want rikishi with a non 15 record...
        # DO NOT PROCESS FURTHER!
    else:
        
# use Jidx to filter for Juryo only matches.

        date_f,rank_f,record_f,final_score_f = date[Jidx:],rank[Jidx:],record[Jidx:],final_score[Jidx:]
        #print(Jidx)
        if concat== True:
        
            record_arr =np.zeros((len(rank_f),45))
            for i in range(len(record_f)):
                basho_record = record_f[i]
                record_arr[i,:]= \
                translate_record(basho_record,concat =concat)
            
        elif concat== False:
        
            record_arr =np.zeros((len(rank_f),3,15))
            for i in range(len(record_f)):
                basho_record = record_f[i]
                record_arr[i,0,:],record_arr[i,1,:],record_arr[i,2,:]= \
                translate_record(basho_record,concat = concat)

            
        return date_f,rank_f,final_score_f,record_arr,Weight,Height,Name


New database:


ID | RIKISHI_NAME | BASHO | Win/Loss/With vectors (15,15,15) | Weight | Height | Rank | Cutoff_Rank |

In [866]:
lines

['Highest Rank     Maegashira 1',
 'Real Name        YOSHITANE Hiromichi',
 'Birth Date       December 15, 1970',
 'Shusshin         Chiba-ken, Funabashi-shi',
 'Height and Weight183 cm 183 kg',
 'Heya             Tatsutagawa - Michinoku',
 'Shikona          Yoshitane Hiromichi - Shikishima Katsumori',
 'Hatsu Dohyo      1989.01',
 'Intai            2001.05',
 'KabuShikishima Katsumori - Tatsutagawa Katsumori - Fujigane Katsumori - Fujigane Shigeki - Nishikijima Sukemoto - Onogawa Sukemoto - Onogawa Hiromichi - Tanigawa Hiromichi - Ajigawa Hiromichi - Urakaze Hiromichi - Urakaze Tomimichi',
 '',
 'Career Record    416-418-38/832 (75 basho)',
 '  In Makuuchi    175-228-17/402 (28 basho), 2 Kinboshi',
 '   As Maegashira 175-228-17/402 (28 basho), 2 Kinboshi',
 '  In Juryo       128-114-13/241 (17 basho), 1 Yusho',
 '  In Makushita   53-38-8/91 (15 basho), 1 Yusho',
 '  In Sandanme    34-22/56 (8 basho)',
 '  In Jonidan     20-15/35 (5 basho)',
 '  In Jonokuchi   6-1/7 (1 basho)',
 '  In 

In [867]:
unique_ids = np.zeros(len(URL_LIST),dtype= int)
for i in range(len(URL_LIST)):
    unique_ids[i] = int(URL_LIST[i].split('=')[1].split('&')[0])
    
unique_ids

array([ 5921, 11151, 11328, ..., 12292,  2924, 12419])

In [868]:
item_id = int(URL_LIST[0].split('=')[1].split('&')[0])

item_id

5921

In [940]:
if Scrape_and_Save('http://sumodb.sumogames.de/Rikishi.aspx?r=20&t=1','Juryo', True,5000) == None:
    print('Not added, did not exceed rank')
else:
    Date,Rank,Score,Records,Ht,Wt,Nm =Scrape_and_Save('http://sumodb.sumogames.de/Rikishi.aspx?r=20&t=1','Juryo', True,5000)

STRUCTURE:  ['Highest Rank     Maegashira 1', 'Real Name        YOSHITANE Hiromichi', 'Birth Date       December 15, 1970', 'Shusshin         Chiba-ken, Funabashi-shi', 'Height and Weight183 cm 183 kg', 'Heya             Tatsutagawa - Michinoku', 'Shikona          Yoshitane Hiromichi - Shikishima Katsumori', 'Hatsu Dohyo      1989.01', 'Intai            2001.05', 'KabuShikishima Katsumori - Tatsutagawa Katsumori - Fujigane Katsumori - Fujigane Shigeki - Nishikijima Sukemoto - Onogawa Sukemoto - Onogawa Hiromichi - Tanigawa Hiromichi - Ajigawa Hiromichi - Urakaze Hiromichi - Urakaze Tomimichi', '', 'Career Record    416-418-38/832 (75 basho)', '  In Makuuchi    175-228-17/402 (28 basho), 2 Kinboshi', '   As Maegashira 175-228-17/402 (28 basho), 2 Kinboshi', '  In Juryo       128-114-13/241 (17 basho), 1 Yusho', '  In Makushita   53-38-8/91 (15 basho), 1 Yusho', '  In Sandanme    34-22/56 (8 basho)', '  In Jonidan     20-15/35 (5 basho)', '  In Jonokuchi   6-1/7 (1 basho)', '  In Mae-zum

In [74]:
Date,Rank,Score,Records,Ht,Wt,Nm =Scrape_and_Save('http://sumodb.sumogames.de/Rikishi.aspx?r=20&t=1','Juryo', True,5000)

STRUCTURE:  ['Highest Rank     Maegashira 1', 'Real Name        YOSHITANE Hiromichi', 'Birth Date       December 15, 1970', 'Shusshin         Chiba-ken, Funabashi-shi', 'Height and Weight183 cm 183 kg', 'Heya             Tatsutagawa - Michinoku', 'Shikona          Yoshitane Hiromichi - Shikishima Katsumori', 'Hatsu Dohyo      1989.01', 'Intai            2001.05', 'KabuShikishima Katsumori - Tatsutagawa Katsumori - Fujigane Katsumori - Fujigane Shigeki - Nishikijima Sukemoto - Onogawa Sukemoto - Onogawa Hiromichi - Tanigawa Hiromichi - Ajigawa Hiromichi - Urakaze Hiromichi - Urakaze Tomimichi', '', 'Career Record    416-418-38/832 (75 basho)', '  In Makuuchi    175-228-17/402 (28 basho), 2 Kinboshi', '   As Maegashira 175-228-17/402 (28 basho), 2 Kinboshi', '  In Juryo       128-114-13/241 (17 basho), 1 Yusho', '  In Makushita   53-38-8/91 (15 basho), 1 Yusho', '  In Sandanme    34-22/56 (8 basho)', '  In Jonidan     20-15/35 (5 basho)', '  In Jonokuchi   6-1/7 (1 basho)', '  In Mae-zum

In [78]:
w_labels = ['Day %i win'%(i+1) for i in range(15)]
l_labels = ['Day %i loss'%(i+1) for i in range(15)]
wth_labels = ['Day %i withdrawal'%(i+1) for i in range(15)]

labels = w_labels+l_labels+wth_labels
labels

['Day 1 win',
 'Day 2 win',
 'Day 3 win',
 'Day 4 win',
 'Day 5 win',
 'Day 6 win',
 'Day 7 win',
 'Day 8 win',
 'Day 9 win',
 'Day 10 win',
 'Day 11 win',
 'Day 12 win',
 'Day 13 win',
 'Day 14 win',
 'Day 15 win',
 'Day 1 loss',
 'Day 2 loss',
 'Day 3 loss',
 'Day 4 loss',
 'Day 5 loss',
 'Day 6 loss',
 'Day 7 loss',
 'Day 8 loss',
 'Day 9 loss',
 'Day 10 loss',
 'Day 11 loss',
 'Day 12 loss',
 'Day 13 loss',
 'Day 14 loss',
 'Day 15 loss',
 'Day 1 withdrawal',
 'Day 2 withdrawal',
 'Day 3 withdrawal',
 'Day 4 withdrawal',
 'Day 5 withdrawal',
 'Day 6 withdrawal',
 'Day 7 withdrawal',
 'Day 8 withdrawal',
 'Day 9 withdrawal',
 'Day 10 withdrawal',
 'Day 11 withdrawal',
 'Day 12 withdrawal',
 'Day 13 withdrawal',
 'Day 14 withdrawal',
 'Day 15 withdrawal']

In [84]:

dataframe_list = []
record_frame_list = []
import time
for i in range(1250):
    time.sleep(np.random.randint(5,10))
    #print('i =',i)
    if Scrape_and_Save(URL_LIST[i],'Juryo', True,i) == None:
        print('Not added, did not exceed rank')
    else:
    #Date,Rank,Score,Records,Ht,Wt,Nm =Scrape_and_Save('http://sumodb.sumogames.de/Rikishi.aspx?r=20&t=1','Juryo', True,5000)
        Date,Rank,Score,Records,Ht,Wt,Nm = Scrape_and_Save(URL_LIST[i],'Juryo', True,i)
    # Append values row by row:
        print(Date,Rank)
        record_frame = pd.DataFrame(Records, columns = labels)
        dataframe = pd.DataFrame({'Rank': np.asarray(Rank), 'Date':np.asarray(Date), 'Score': np.asarray(Score), 'Name': Nm,'Height':Ht,'Weight':Wt})
        dataframe_list.append(dataframe)
        record_frame_list.append(record_frame)
    if i%100 ==0:
        print(URL_LIST[i],i)
    

[] []
http://sumodb.sumogames.de/Rikishi.aspx?r=5921&t=1 0
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
Not added, did not exceed rank
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
http://sumodb.sumogames.de/Rikishi.aspx?r=11415&t=1 100
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
Not added, did not exceed rank
[] []
Not added, did not exceed rank
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []
[] []


KeyboardInterrupt: 

In [74]:
dataframe_list

[Empty DataFrame
 Columns: [Rank, Date, Score, Name, Height, Weight]
 Index: [],
 Empty DataFrame
 Columns: [Rank, Date, Score, Name, Height, Weight]
 Index: [],
 Empty DataFrame
 Columns: [Rank, Date, Score, Name, Height, Weight]
 Index: [],
 Empty DataFrame
 Columns: [Rank, Date, Score, Name, Height, Weight]
 Index: [],
 Empty DataFrame
 Columns: [Rank, Date, Score, Name, Height, Weight]
 Index: [],
 Empty DataFrame
 Columns: [Rank, Date, Score, Name, Height, Weight]
 Index: [],
 Empty DataFrame
 Columns: [Rank, Date, Score, Name, Height, Weight]
 Index: [],
 Empty DataFrame
 Columns: [Rank, Date, Score, Name, Height, Weight]
 Index: [],
 Empty DataFrame
 Columns: [Rank, Date, Score, Name, Height, Weight]
 Index: [],
 Empty DataFrame
 Columns: [Rank, Date, Score, Name, Height, Weight]
 Index: [],
 Empty DataFrame
 Columns: [Rank, Date, Score, Name, Height, Weight]
 Index: [],
 Empty DataFrame
 Columns: [Rank, Date, Score, Name, Height, Weight]
 Index: [],
 Empty DataFrame
 Columns: [

In [76]:
new_df = pd.concat(dataframe_list)
record_df = pd.concat(record_frame_list)


Unnamed: 0,Rank,Date,Score,Name,Height,Weight
0,J17e,1966.09,4-11,Shodai Naoya,,
1,Ms5w,1966.11,3-4,Shodai Naoya,,
2,Ms7w,1967.01,4-3,Shodai Naoya,,
3,Ms5e,1967.03,3-4,Shodai Naoya,,
4,Ms19e,1967.05,2-5,Shodai Naoya,,
...,...,...,...,...,...,...
27,Ms2e,1971.09,5-2,Shodai Naoya,,
28,J12w,1971.11,8-7,Shodai Naoya,,
29,J9w,1972.01,6-9,Shodai Naoya,,
30,J12e,1972.03,5-10,Shodai Naoya,,


In [844]:
Score

['8-7',
 '6-9',
 '12-3',
 '7-8',
 '9-6',
 '9-6',
 '5-10',
 '9-6',
 '10-5',
 '7-8',
 '8-7',
 '5-10',
 '9-6',
 '8-7',
 '10-5',
 '10-5',
 '6-9',
 '8-7',
 '7-8',
 '9-6',
 '8-7',
 '6-9',
 '8-7',
 '6-9',
 '8-7',
 '8-7',
 '3-12',
 '8-7',
 '4-9-2',
 '0-0-15',
 '7-8',
 '9-6',
 '1-14',
 '8-7']

In [845]:
Records[0]

array([0., 1., 1., 1., 0., 1., 0., 0., 0., 1., 1., 1., 0., 1., 0., 1., 0.,
       0., 0., 1., 0., 1., 1., 1., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [840]:
cols = ['frame', 'count']
N = 4
dat = pd.DataFrame(Records,columns = labels)
for i in range(N):

    #dat = dat.append(np.ones(45,dtype =int),ignore_index=True)
    dat = dat.append(dict(zip(dat.columns, np.ones(45,dtype =int))), ignore_index=True)
dat

Unnamed: 0,Day 1 win,Day 2 win,Day 3 win,Day 4 win,Day 5 win,Day 6 win,Day 7 win,Day 8 win,Day 9 win,Day 10 win,...,Day 6 withdrawal,Day 7 withdrawal,Day 8 withdrawal,Day 9 withdrawal,Day 10 withdrawal,Day 11 withdrawal,Day 12 withdrawal,Day 13 withdrawal,Day 14 withdrawal,Day 15 withdrawal
0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,1.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [709]:
record_frame = pd.DataFrame(Records, columns = labels)
record_frame

Unnamed: 0,Day 1 win,Day 2 win,Day 3 win,Day 4 win,Day 5 win,Day 6 win,Day 7 win,Day 8 win,Day 9 win,Day 10 win,...,Day 6 withdrawal,Day 7 withdrawal,Day 8 withdrawal,Day 9 withdrawal,Day 10 withdrawal,Day 11 withdrawal,Day 12 withdrawal,Day 13 withdrawal,Day 14 withdrawal,Day 15 withdrawal
0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,...,0.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,...,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0
2,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,1.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [659]:
np.column_stack((np.asarray(Rank),np.asarray(Date),np.asarray(Score),Records))

array([['Ms10e', '1993.07', '6-1', ..., '1.0', '1.0', '0.0'],
       ['Ms1e', '1993.09', '4-3', ..., '1.0', '1.0', '1.0'],
       ['J12w', '1993.11', '8-7', ..., '0.0', '0.0', '0.0'],
       ...,
       ['M8e', '1999.01', '9-6', ..., '0.0', '0.0', '0.0'],
       ['M2w', '1999.03', '1-14', ..., '0.0', '0.0', '0.0'],
       ['M11w', '1999.05', '8-7', ..., '0.0', '0.0', '0.0']], dtype='<U32')

In [717]:
dataframe = pd.DataFrame({'Rank': np.asarray(Rank), 'Date':np.asarray(Date), 'Score': np.asarray(Score), 'Name': Name,'Height':Ht,'Weight':Wt})
dataframe.join(record_frame)

# This creates a single part of the giant database.
# To add a row:


Unnamed: 0,Rank,Date,Score,Name,Height,Weight,Day 1 win,Day 2 win,Day 3 win,Day 4 win,...,Day 6 withdrawal,Day 7 withdrawal,Day 8 withdrawal,Day 9 withdrawal,Day 10 withdrawal,Day 11 withdrawal,Day 12 withdrawal,Day 13 withdrawal,Day 14 withdrawal,Day 15 withdrawal
0,Ms10e,1993.07,6-1,Shikishima Katsumori,183,183,0.0,1.0,0.0,0.0,...,0.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0
1,Ms1e,1993.09,4-3,Shikishima Katsumori,183,183,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0
2,J12w,1993.11,8-7,Shikishima Katsumori,183,183,0.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,J8w,1994.01,6-9,Shikishima Katsumori,183,183,1.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,J11w,1994.03,12-3,Shikishima Katsumori,183,183,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,J3e,1994.05,7-8,Shikishima Katsumori,183,183,0.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,J5w,1994.07,9-6,Shikishima Katsumori,183,183,1.0,0.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,J3w,1994.09,9-6,Shikishima Katsumori,183,183,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,M16e,1994.11,5-10,Shikishima Katsumori,183,183,1.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,J4w,1995.01,9-6,Shikishima Katsumori,183,183,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [123]:
sorted_df[sorted_df['Shikona'] =='Takeba']

Unnamed: 0,Shikona,Heya,Shusshin,Birth Date,Highest Rank,Hatsu Dohyo,Intai,Last Shikona
4395,Takeba,-,-,,Jonidan 94,1960.05,1961.01,Takeba


In [100]:
URL_LIST[81]

'http://sumodb.sumogames.de/Rikishi.aspx?r=10200&t=1'

In [None]:
http://sumodb.sumogames.de/Rikishi.aspx?r=10200&t=1