# Scraping The Top Games from Nintendolife.com using BeautifulSoup

#### By Uday Sidhu

### Objective: To retrieve the names, ratings, and page URLs of the top games listed on nintendolife.com.

Nintendolife.com is a website pertaining to everything Nintendo. It hosts the latest news, announcements, guides, and reviews of the latest games released or coming to a Nintendo platform. It also hosts a list of the top rated games of all time. This is the list that we are going to be scraping in this project.




![logo](https://i.imgur.com/P9nyUA9.png)





### What is web scraping?
Web scraping is the process of collecting structured web data in an automated fashion. It's also called web data extraction. Some of the main use cases of web scraping include price monitoring, price intelligence, news monitoring, lead generation, and market research, among many others. In this project we use the Python libraries `requests` and `BeautifulSoup4` to retrieve and parse HTML pages and extract the desired information.

https://www.zyte.com/learn/what-is-web-scraping/


![webscraping_image](https://i.imgur.com/AXQ4S7S.png)


## Outline
1. Retrieve web-page using `requests`.
2. Import `BeautifulSoup4` to parse the webpage thus obtained.
3. Obtain the required information such as game name, game page URL, and rating.
4. Store obtained information in the form of a dictionary.
5. Write extracted information to a .csv file.
6. Consolidate the code into functions and get entries from a specific page number.





In [1]:
!pip install jovian --upgrade --quiet

In [2]:
import jovian

In [3]:
# Execute this to save new versions of the notebook
jovian.commit(project="Web-Scraping-nintendolife.com")

[jovian] Creating a new project "udaysidhu1/Web-Scraping-nintendolife.com"
[jovian] Committed successfully! https://jovian.ai/udaysidhu1/web-scraping-nintendolife-71269


'https://jovian.ai/udaysidhu1/web-scraping-nintendolife-71269'

In [4]:
import requests

## Retrieve web-page using `requests`



In [5]:
top_url='https://www.nintendolife.com/games/browse?sort=rating'

In [6]:
response = requests.get(top_url)

In [7]:

response.status_code

200
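A status code of 200 means the request succeeded. As a minimal sketch, the same check can be made automatic with `raise_for_status()`, which raises an exception for 4xx/5xx responses. The helper name `fetch_html`, the injectable `get` argument, and the 10-second timeout are illustrative assumptions, not part of the original notebook.

```python
import requests


def fetch_html(url, get=requests.get, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors.

    `get` is injectable only so the helper can be exercised without
    network access; by default it is `requests.get`.
    """
    response = get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.text
```

With this helper, the download step could be written as `page_contents = fetch_html(top_url)`.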

In [8]:
page_contents=response.text

In [9]:
# Save the web page locally so it can be inspected later
with open('webpage.html','w',encoding='utf-8') as f:
    f.write(page_contents)
    

## Import `BeautifulSoup4` to parse the webpage thus obtained.

In [10]:
!pip install beautifulsoup4 --upgrade --quiet



In [11]:
from bs4 import BeautifulSoup

In [12]:
doc=BeautifulSoup(page_contents,'html.parser')

## Store obtained information in the form of dictionaries.

In the cell below, we search for the span tags with the class `title accent-hover`. These span tags contain the titles of the games. Only the first 60 matches are kept.

In [13]:
span_tags=doc.find_all('span',class_='title accent-hover')
span_tags=span_tags[:60]

In [14]:
span_tags[5].text

"The Legend of Zelda: Breath of the Wild - The Champions' Ballad + Expansion Pass"

In [15]:
len(span_tags)

60
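One detail worth knowing about the search above: when `class_` is given a string with several class names, BeautifulSoup compares it against the exact string value of the `class` attribute, so the order of the names matters. A CSS selector via `select()` matches tags carrying both classes in any order. The sketch below uses invented markup purely for illustration; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Invented markup for illustration; the real page structure may differ.
SAMPLE_HTML = """
<ul>
  <li><span class="title accent-hover">Game One</span></li>
  <li><span class="accent-hover title">Game Two</span></li>
</ul>
"""

sample_doc = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-word class_ string matches the class attribute's exact string value.
exact_matches = [
    tag.get_text(strip=True)
    for tag in sample_doc.find_all("span", class_="title accent-hover")
]

# A CSS selector matches tags that carry both classes, in any order.
selector_matches = [
    tag.get_text(strip=True)
    for tag in sample_doc.select("span.title.accent-hover")
]

print(exact_matches)
print(selector_matches)
```

Here the exact-string search finds only the first span, while the selector finds both.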

The text from the span tags is appended to a list, giving us a list of game titles.

In [16]:
title_list=[]
for tag in span_tags:
    title_list.append(tag.text)
print(title_list)

['The Legend of Zelda: Breath of the Wild', 'The Legend of Zelda: Ocarina of Time', 'The Legend of Zelda: Breath of the Wild', "The Legend of Zelda: Link's Awakening", 'Chrono Trigger', "The Legend of Zelda: Breath of the Wild - The Champions' Ballad + Expansion Pass", 'Chrono Trigger', 'The Legend of Zelda: A Link to the Past', 'Dodgeball Academia', 'Super Mario World', 'Metroid Prime Trilogy', 'Resident Evil 4', 'Super Metroid', 'The Legend of Zelda: The Wind Waker', 'Metroid Prime', 'Paper Mario: The Thousand-Year Door', 'Final Fantasy III', 'Pokémon HeartGold & SoulSilver', 'Xenoblade Chronicles: Definitive Edition', 'Tetris', 'Super Mario Bros. 3', "The Legend of Zelda: Collector's Edition", 'Super Smash Bros. Ultimate', 'Super Mario Odyssey', 'Xenoblade Chronicles', 'Ori and the Will of the Wisps', 'The Legend of Zelda: Ocarina of Time 3D', 'The Legend of Zelda: The Wind Waker HD', 'The Legend of Zelda: Ocarina of Time / Master Quest', 'Super Mario Galaxy', 'Super Mario Galaxy 2'

In [17]:
a_tags=doc.find_all('a',class_='title accent-hover')
a_tags=a_tags[:60]

In [18]:
print(a_tags[0]['href'])

games/nintendo-switch/legend_of_zelda_breath_of_the_wild


In [19]:
url_list=[]
for tag in a_tags:
    url_cat='https://www.nintendolife.com/'+tag['href']
    url_list.append(url_cat)
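String concatenation works here because the `href` values are relative and have no leading slash. A slightly more defensive sketch uses `urljoin` from the standard library, which also copes with a leading slash or an already absolute link. The helper name `to_absolute` and the example paths are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.nintendolife.com/"


def to_absolute(href, base=BASE_URL):
    """Resolve a relative href against the site root."""
    return urljoin(base, href)


# Hypothetical hrefs, used only to show the behaviour
print(to_absolute("games/example/some_game"))
print(to_absolute("/games/example/some_game"))
```

Both calls resolve to the same absolute URL, so the result does not depend on how the site writes its links.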


In [20]:
# The ratings are stored in <p> tags with the class user-rating
rating_tags=doc.find_all('p',{'class':'user-rating'})
print(rating_tags[7].text)

 9.4


In [21]:
# Collect the text of every <p class="user-rating"> tag
rating_tags=doc.find_all('p',{'class':'user-rating'})
rating_list=[]
for tag in rating_tags:
    rating_list.append(tag.text)


In [22]:
print(rating_list)

[' 9.6', ' 9.6', ' 9.5', ' 9.5', ' 9.5', ' 9.5', ' 9.4', ' 9.4', ' 9.4', ' 9.4', ' 9.4', ' 9.4', ' 9.4', ' 9.4', ' 9.4', ' 9.3', ' 9.3', ' 9.3', ' 9.3', ' 9.3', ' 9.2', ' 9.2', ' 9.2', ' 9.2', ' 9.2', ' 9.2', ' 9.2', ' 9.2', ' 9.2', ' 9.2', ' 9.2', ' 9.2', ' 9.2', ' 9.2', ' 9.2', ' 9.1', ' 9.1', ' 9.1', ' 9.1', ' 9.1', ' 9.1', ' 9.1', ' 9.1', ' 9.1', ' 9.1', ' 9.1', ' 9.1', ' 9.1', ' 9.1', ' 9.1', ' 9.1', ' 9.1', ' 9.1', ' 9', ' 9', ' 9', ' 9', ' 9', ' 9', ' 9']
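The ratings above are strings with a leading space. If numeric values are wanted (for sorting or averaging, say), a small conversion step could look like the sketch below. The function name `parse_ratings` and the choice to map unparseable entries to `None` are assumptions made for illustration.

```python
def parse_ratings(raw_ratings):
    """Convert scraped rating strings such as ' 9.6' to floats.

    Entries that are empty or not numeric become None.
    """
    parsed = []
    for raw in raw_ratings:
        text = raw.strip()
        try:
            parsed.append(float(text))
        except ValueError:
            parsed.append(None)
    return parsed


print(parse_ratings([" 9.6", " 9", ""]))
```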


## Write extracted information to a CSV file.

In [23]:
game_dict={'title':title_list,
            'rating':rating_list,
            'url':url_list
}
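Building a DataFrame from a dictionary of lists only works when all lists have the same length; otherwise pandas raises a `ValueError`. As a hedged sketch, the lengths can be checked explicitly and the parallel lists combined into one record per game. The helper name `build_rows` is hypothetical.

```python
def build_rows(titles, ratings, urls):
    """Combine parallel lists into one dict per game, checking lengths first."""
    lengths = {"title": len(titles), "rating": len(ratings), "url": len(urls)}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return [
        {"title": t, "rating": r, "url": u}
        for t, r, u in zip(titles, ratings, urls)
    ]
```

Since `pd.DataFrame` also accepts a list of dictionaries, `pd.DataFrame(build_rows(title_list, rating_list, url_list))` would produce the same three columns.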


In [24]:
import pandas as pd

In [25]:
games_df=pd.DataFrame(game_dict)

In [26]:
games_df

Unnamed: 0,title,rating,url
0,The Legend of Zelda: Breath of the Wild,9.6,https://www.nintendolife.com/games/nintendo-sw...
1,The Legend of Zelda: Ocarina of Time,9.6,https://www.nintendolife.com/games/n64/legend_...
2,The Legend of Zelda: Breath of the Wild,9.5,https://www.nintendolife.com/games/wiiu/legend...
3,The Legend of Zelda: Link's Awakening,9.5,https://www.nintendolife.com/games/gameboy/leg...
4,Chrono Trigger,9.5,https://www.nintendolife.com/games/ds/chrono_t...
5,The Legend of Zelda: Breath of the Wild - The ...,9.5,https://www.nintendolife.com/games/switch-esho...
6,Chrono Trigger,9.4,https://www.nintendolife.com/games/snes/chrono...
7,The Legend of Zelda: A Link to the Past,9.4,https://www.nintendolife.com/games/snes/legend...
8,Dodgeball Academia,9.4,https://www.nintendolife.com/games/switch-esho...
9,Super Mario World,9.4,https://www.nintendolife.com/games/snes/super_...


In [27]:
games_df.to_csv('games.csv',index=False)

## Consolidate the code into functions and get entries from specified page numbers.

The function `get_doc()` takes a page number and returns the parsed BeautifulSoup document for that page.

In [28]:
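def get_doc(p_no):
    # Build the URL for the requested page, download it, and parse the HTML
    top_url=get_subsi_page_url(p_no)
    response = requests.get(top_url)
    response.raise_for_status()  # stop early on a 4xx/5xx response
    page_contents=response.text
    doc=BeautifulSoup(page_contents,'html.parser')
    return doc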

In [29]:
# Generate the URL for the page number passed as an argument (string or int)
def get_subsi_page_url(p_no):
    p_no=str(p_no)
    if p_no=='1':
        rurl='https://www.nintendolife.com/games/browse?sort=rating'
    else:
        rurl='https://www.nintendolife.com/games/browse?sort=rating&page='+p_no
    return rurl
    

In [30]:
get_subsi_page_url('1')

'https://www.nintendolife.com/games/browse?sort=rating'
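As an alternative sketch, the query string can be assembled with `urlencode` from the standard library instead of string concatenation. The function name `build_browse_url` is hypothetical; the URL and the `sort` and `page` parameters are the ones used above.

```python
from urllib.parse import urlencode

BROWSE_URL = "https://www.nintendolife.com/games/browse"


def build_browse_url(page=1):
    """Return the rating-sorted browse URL for a page number (int or str)."""
    params = {"sort": "rating"}
    # The first page is requested without an explicit page parameter
    if int(page) > 1:
        params["page"] = int(page)
    return f"{BROWSE_URL}?{urlencode(params)}"


print(build_browse_url(1))
print(build_browse_url(3))
```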

In [43]:
# Function to get a list of the systems that each game runs on.
# It reads the global variable p_no, which must be set before the call.
def get_system_list():
    doc=get_doc(p_no)
    span_tags=doc.find_all('span',class_='subtitle')
    span_tags=span_tags[:60]
    system_list=[]
    for tag in span_tags:
        system_list.append(tag.text)
    return system_list

In [32]:
# Function to get a list of game page URLs (reads the global variable p_no)
def get_url_list():
    doc=get_doc(p_no)
    a_tags=doc.find_all('a',class_='title accent-hover')
    a_tags=a_tags[:60]
    url_list=[]
    for tag in a_tags:
        url_cat='https://www.nintendolife.com/'+tag['href']
        url_list.append(url_cat)

    return url_list

In [33]:
#function to get a list of game titles
def get_title_list():
    doc=get_doc(p_no)
    span_tags=doc.find_all('span',class_='title accent-hover')
    span_tags=span_tags[:60]
    title_list=[]
    for tag in span_tags:
        title_list.append(tag.text)
    return title_list

In [34]:
#function to get a list of game ratings
def get_rating_list():
    doc=get_doc(p_no)
    rating_span_tags=doc.find_all('p',{'class':'user-rating'})
    rating_list=[]
    for tag in rating_span_tags:
        rating_list.append(tag.text)
    return rating_list
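Each of the four functions above calls `get_doc(p_no)` on its own, so every page is downloaded four times. One possible consolidation, sketched below, parses an already downloaded document once and returns all four lists together. The name `parse_page` and its `limit` parameter are hypothetical. Unlike the functions above, this sketch strips surrounding whitespace and applies the 60-item limit to the ratings as well.

```python
from bs4 import BeautifulSoup


def parse_page(doc, limit=60):
    """Extract titles, URLs, ratings, and systems from one parsed page."""
    def texts(tags):
        # Keep at most `limit` tags and return their stripped text
        return [tag.get_text(strip=True) for tag in tags[:limit]]

    return {
        "title": texts(doc.find_all("span", class_="title accent-hover")),
        "url": [
            "https://www.nintendolife.com/" + tag["href"]
            for tag in doc.find_all("a", class_="title accent-hover")[:limit]
        ],
        "rating": texts(doc.find_all("p", class_="user-rating")),
        "system": texts(doc.find_all("span", class_="subtitle")),
    }
```

A call such as `page_data = parse_page(get_doc('2'))` would then need only one request per page.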

In [35]:
# Function to build a DataFrame from the global lists and write it, with a header row, to a new CSV
def create_csv():
    game_dict={'title':title_list,
                'rating':rating_list,
                'url':url_list,
                'system':system_list}
    games_df=pd.DataFrame(game_dict)
    games_df.to_csv('games.csv',index=None,mode='w')

In [36]:
# Function to build a DataFrame from the global lists and append it, without a header row, to the existing CSV

def append_csv():
    game_dict={'title':title_list,
                'rating':rating_list,
                'url':url_list,
                'system':system_list}
    games_df=pd.DataFrame(game_dict)
    games_df.to_csv('games.csv',header=False,mode='a',index=None)

In [37]:
# Write only the header row to games.csv; the data rows are appended afterwards
header=[["title", "rating", "url", "system"]]
df = pd.DataFrame(header)
df.to_csv('games.csv', index=False, header=False)

In [38]:

# Calling the functions
num=4 # number of pages to be scraped
for i in range(1,num+1):
    p_no=str(i) # the helper functions read this global page number
    url_list=get_url_list()
    rating_list=get_rating_list()
    title_list=get_title_list()
    system_list=get_system_list()
    # games.csv already contains the header row, so every page is appended
    append_csv()
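Another way to avoid juggling a pre-written header and append mode is to collect the rows from all pages first and write the file once. The sketch below swaps pandas for the standard library `csv` module and uses made-up rows with an in-memory file; the function name `write_games_csv` is hypothetical.

```python
import csv
import io

FIELDNAMES = ["title", "rating", "url", "system"]


def write_games_csv(rows, fileobj):
    """Write the header once, then one line per game dict."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Example with made-up rows and an in-memory file
example_rows = [
    {"title": "Game One", "rating": "9.6", "url": "https://example.com/one", "system": "Switch"},
    {"title": "Game Two", "rating": "9.5", "url": "https://example.com/two", "system": "N64"},
]
buffer = io.StringIO()
write_games_csv(example_rows, buffer)
print(buffer.getvalue())
```

For a real file, the same function could be called inside `with open('games.csv', 'w', newline='', encoding='utf-8') as f:`.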



In [39]:
result_df=pd.read_csv('games.csv')

In [40]:
result_df

Unnamed: 0,title,rating,url,system
0,The Legend of Zelda: Breath of the Wild,9.6,https://www.nintendolife.com/games/nintendo-sw...,Switch
1,The Legend of Zelda: Ocarina of Time,9.6,https://www.nintendolife.com/games/n64/legend_...,N64
2,The Legend of Zelda: Breath of the Wild,9.5,https://www.nintendolife.com/games/wiiu/legend...,Wii U
3,The Legend of Zelda: Link's Awakening,9.5,https://www.nintendolife.com/games/gameboy/leg...,GB
4,Chrono Trigger,9.5,https://www.nintendolife.com/games/ds/chrono_t...,DS
...,...,...,...,...
235,Rhythm Heaven Fever,8.6,https://www.nintendolife.com/games/wii/rhythm_...,Wii
236,Picross 3D,8.6,https://www.nintendolife.com/games/ds/picross_3d,DS
237,DuckTales,8.6,https://www.nintendolife.com/games/nes/ducktales,NES
238,Spiritfarer,8.6,https://www.nintendolife.com/games/switch-esho...,Switch eShop


In [None]:
jovian.submit(assignment="zerotoanalyst-project1", files=['games.csv'])


## Summary
1. Successfully retrieved the web page using `requests`.
2. Parsed the retrieved web page using `BeautifulSoup`.
3. Obtained the required information: game name, game page URL, system, and rating.
4. Stored the obtained information in a dictionary.
5. Wrote the extracted information to a .csv file.
6. Consolidated the code into functions and retrieved information from specific page numbers.



## Future Work

- Scrape other pages such as "Latest Featured" and "Latest Reviews".
- Try other web scraping tools such as Scrapy or Selenium.

## References 
- https://www.crummy.com/software/BeautifulSoup/bs4/doc/
- https://docs.python-requests.org/en/master/
- https://stackoverflow.com/questions/46510966/beautiful-soup-nested-tag-search