## Getting Data from GSMArena
GSMArena's main purpose is to provide detailed, accurate information about mobile phones and their features. Users can post reviews of anything phone-related on the site, and other users can comment on each review.

We therefore scrape this site for user sentiment on the flagship models of the phone brands on the market today. For every brand listed on the site, we scrape its latest models, as ordered by release date.

## Scraping phone reviews from landing page

In [1]:
# All phone reviews scraper

# Importing required libraries
from bs4 import BeautifulSoup as BS
import requests
import re
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# Download Chromedriver and set PATH to it before executing this
browser = webdriver.Chrome("C:/Program Files (x86)/Google/Chrome/Application/chromedriver_win32/chromedriver.exe")

# Change url to the website you want to scrape
url="https://www.gsmarena.com/"
browser.get(url)
r = browser.page_source
#r = requests.get(url)
        
# Beautiful Soup object
data = BS(r, "lxml")
# Each link in the brand menu is relative, so prefix it with the base url
brand_menu = data.find("div", {"class": "brandmenu-v2"}).find("ul")
all_brands_link = [url + brand["href"] for brand in brand_menu.find_all("a")]

In [4]:
print(all_brands_link)

['https://www.gsmarena.com/samsung-phones-9.php', 'https://www.gsmarena.com/apple-phones-48.php', 'https://www.gsmarena.com/nokia-phones-1.php', 'https://www.gsmarena.com/sony-phones-7.php', 'https://www.gsmarena.com/lg-phones-20.php', 'https://www.gsmarena.com/htc-phones-45.php', 'https://www.gsmarena.com/motorola-phones-4.php', 'https://www.gsmarena.com/huawei-phones-58.php', 'https://www.gsmarena.com/microsoft-phones-64.php', 'https://www.gsmarena.com/lenovo-phones-73.php', 'https://www.gsmarena.com/xiaomi-phones-80.php', 'https://www.gsmarena.com/google-phones-107.php', 'https://www.gsmarena.com/acer-phones-59.php', 'https://www.gsmarena.com/asus-phones-46.php', 'https://www.gsmarena.com/oppo-phones-82.php', 'https://www.gsmarena.com/oneplus-phones-95.php', 'https://www.gsmarena.com/meizu-phones-74.php', 'https://www.gsmarena.com/blackberry-phones-36.php', 'https://www.gsmarena.com/alcatel-phones-5.php', 'https://www.gsmarena.com/zte-phones-62.php', 'https://www.gsmarena.com/toshib
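
The brand links above are built by concatenating the base URL with each relative `href`. As a hedged alternative, `urllib.parse.urljoin` from the standard library performs the same join and also copes with `href` values that start with a slash. The `hrefs` list below is sample data copied from the printed output, not a live scrape.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.gsmarena.com/"

# Sample relative links, copied from the printed output above
hrefs = ["samsung-phones-9.php", "apple-phones-48.php", "nokia-phones-1.php"]

# urljoin resolves each relative href against the base URL
brand_links = [urljoin(BASE_URL, href) for href in hrefs]
print(brand_links)
```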

In [2]:
# Dictionary to store all phone models according to brand
all_brands_phones_model = {}

# Dictionary to store all phone models' information links according to brand
all_brands_phones_link = {}

# Retrieving links to each phone model
for brand in all_brands_link:
    browser.get(brand)
    r = browser.page_source
    brand_mainpage = BS(r, "lxml")
    # getting name of brand to append to correct key in 'all_brands_phones' dictionary
    name = re.split(" ", ((brand_mainpage.find("div", {"class": "article-hgroup"})).find("h1", {"class": "article-info-name"})).text)[0]
    # initialize key as phone brand name
    all_brands_phones_link[name] = []
    all_brands_phones_model[name] = []
    # keeping only entries whose image title mentions "phone" (to filter tablets, watches etc.)
    listing = brand_mainpage.find("div", {"class": "makers"}).find("ul")
    # each entry has one link and one image, so the two lists line up pairwise
    for anchor, img in zip(listing.find_all("a"), listing.find_all("img")):
        if "phone" in img["title"]:
            all_brands_phones_link[name].append(url + anchor["href"])
            all_brands_phones_model[name].append(anchor.text)

In [8]:
# writing models of phone brands into json and csv file
import json
import csv

# json.dump writes straight to the file and avoids rebinding the name `json`
with open("all_brand_models.json", "w") as f:
    json.dump(all_brands_phones_model, f)

# newline="" stops the csv module from emitting blank rows on Windows
with open("all_brand_models.csv", "w", newline="") as f:
    w = csv.writer(f)
    for key, val in all_brands_phones_model.items():
        w.writerow([key, val])

In [3]:
print(all_brands_phones_link)

{'Samsung': ['https://www.gsmarena.com/samsung_galaxy_s9+-8967.php', 'https://www.gsmarena.com/samsung_galaxy_s9-8966.php', 'https://www.gsmarena.com/samsung_galaxy_j2_pro_(2018)-8904.php', 'https://www.gsmarena.com/samsung_galaxy_a8+_(2018)-8790.php', 'https://www.gsmarena.com/samsung_galaxy_a8_(2018)-8886.php', 'https://www.gsmarena.com/samsung_galaxy_j2_(2017)-8900.php', 'https://www.gsmarena.com/samsung_galaxy_c7_(2017)-8789.php', 'https://www.gsmarena.com/samsung_galaxy_note8-8505.php', 'https://www.gsmarena.com/samsung_galaxy_s8_active-8676.php', 'https://www.gsmarena.com/samsung_galaxy_j7_v-8778.php', 'https://www.gsmarena.com/samsung_galaxy_note_fe-8683.php', 'https://www.gsmarena.com/samsung_galaxy_j7_max-8684.php', 'https://www.gsmarena.com/samsung_galaxy_j7_(2017)-8675.php', 'https://www.gsmarena.com/samsung_galaxy_j7_pro-8561.php', 'https://www.gsmarena.com/samsung_galaxy_j5_(2017)-8705.php', 'https://www.gsmarena.com/samsung_galaxy_j3_(2017)-8438.php', 'https://www.gsmaren

In [5]:
print(all_brands_phones_model)

{'Samsung': ['Galaxy S9+', 'Galaxy S9', 'Galaxy J2 Pro (2018)', 'Galaxy A8+ (2018)', 'Galaxy A8 (2018)', 'Galaxy J2 (2017)', 'Galaxy C7 (2017)', 'Galaxy Note8', 'Galaxy S8 Active', 'Galaxy J7 V', 'Galaxy Note FE', 'Galaxy J7 Max', 'Galaxy J7 (2017)', 'Galaxy J7 Pro', 'Galaxy J5 (2017)', 'Galaxy J3 (2017)', 'Z4', 'Galaxy S8', 'Galaxy S8+', 'Galaxy C5 Pro', 'Galaxy Xcover 4', 'Galaxy J1 mini prime', 'Galaxy J3 Emerge', 'Galaxy C7 Pro', 'Galaxy A7 (2017)', 'Galaxy A5 (2017)', 'Galaxy A3 (2017)', 'Galaxy Grand Prime Plus', 'Galaxy J2 Prime', 'Galaxy C9 Pro', 'Galaxy C10', 'Galaxy A8 (2016)', 'Galaxy On8', 'Galaxy On7 (2016)', 'Galaxy J5 Prime', 'Galaxy J7 Prime', 'Z2', 'Galaxy Note7 (USA)', 'Galaxy Note7', 'Galaxy On7 Pro', 'Galaxy On5 Pro', 'Galaxy Tab J', 'Galaxy J Max', 'Galaxy J2 Pro (2016)', 'Galaxy J2 (2016)', 'Z3 Corporate Edition', 'Galaxy Xcover 3 G389F', 'Galaxy S7 active', 'Galaxy J3 Pro', 'Galaxy C7', 'Galaxy C5', 'Galaxy A9 Pro (2016)', 'Galaxy J7 (2016)', 'Galaxy J5 (2016)', 

In [3]:
browser = webdriver.Chrome("C:/Program Files (x86)/Google/Chrome/Application/chromedriver_win32/chromedriver.exe")

# creating a nested dictionary: brand -> model -> phone specifications
overall_dict = {}
for brand in all_brands_link:
    browser.get(brand)
    r = browser.page_source
    brand_mainpage = BS(r, "lxml")
    # getting name of brand to append to correct key in 'all_brands_phones' dictionary
    name = re.split(" ", ((brand_mainpage.find("div", {"class": "article-hgroup"})).find("h1", {"class": "article-info-name"})).text)[0]
    # initialize key as phone brand name
    overall_dict[name] = {}
print(overall_dict)
    
for brand in overall_dict:
    print("new round")
    for brand_ in all_brands_phones_model:
        if brand == brand_:
            print(brand_)
            for model in range(len(all_brands_phones_model[brand_])):
                # scraping for each model individually
                wanted_model_url = all_brands_phones_link[brand_][model]
                browser.get(wanted_model_url)
                r = browser.page_source
                model_page = BS(r, "lxml")
                
                # storing specifications in a dictionary of dictionaries
                phone_specs = {}
                # getting headers for specification categories
                all_tables_header = [header.text for header in (model_page.find_all("th"))]
                for header in all_tables_header:
                    phone_specs[header] = {}

                # getting sub-headers of specification categories' headers
                # then populate sub-headers with its respective values
                all_tables = [table for table in (model_page.find_all("table"))]
                for table in all_tables:
                    main_header = table.find("th").text
                    sub_headers = [subs.text for subs in (table.find_all("td", {"class": "ttl"}))]
                    for subheader in sub_headers:
                        if (subheader != '\xa0'):
                            phone_specs[main_header][subheader] = ""
                    sub_headers_content = [content.text for content in (table.find_all("td", {"class": "nfo"}))]
                    counter = 0
                    for key in phone_specs[main_header]:
                        # no missing subheaders
                        if (len(phone_specs[main_header]) == len(sub_headers_content)):
                            phone_specs[main_header][key] = sub_headers_content[counter]
                            counter += 1
                        # missing subheaders from the front
                        else:
                            counter += 1
                            phone_specs[main_header][key] = sub_headers_content[counter]
                            
                # storing phone specs            
                overall_dict[brand][all_brands_phones_model[brand_][model]] = phone_specs
print("Completed this section.")

{'Samsung': {}, 'Apple': {}, 'Nokia': {}, 'Sony': {}, 'LG': {}, 'HTC': {}, 'Motorola': {}, 'Huawei': {}, 'Microsoft': {}, 'Lenovo': {}, 'Xiaomi': {}, 'Google': {}, 'Acer': {}, 'Asus': {}, 'Oppo': {}, 'OnePlus': {}, 'Meizu': {}, 'BlackBerry': {}, 'alcatel': {}, 'ZTE': {}, 'Toshiba': {}, 'Vodafone': {}, 'Energizer': {}, 'XOLO': {}, 'Lava': {}, 'Micromax': {}, 'BLU': {}, 'Gionee': {}, 'vivo': {}, 'LeEco': {}, 'Panasonic': {}, 'HP': {}, 'YU': {}, 'verykool': {}, 'Maxwest': {}, 'Plum': {}}
new round
Samsung
new round
Apple
new round
Nokia
new round
Sony
new round
LG
new round
HTC
new round
Motorola
new round
Huawei
new round
Microsoft
new round
Lenovo
new round
Xiaomi
new round
Google
new round
Acer
new round
Asus
new round
Oppo
new round
OnePlus
new round
Meizu
new round
BlackBerry
new round
alcatel
new round
ZTE
new round
Toshiba
new round
Vodafone
new round
Energizer
new round
XOLO
new round
Lava
new round
Micromax
new round
BLU
new round
Gionee
new round
vivo
new round
LeEco
new round
P
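
The pairing logic in the cell above matches sub-headers to values by position, which can drift when a table contains blank (`\xa0`) sub-header cells. A minimal sketch of a row-by-row alternative follows. It assumes each table row yields one `(ttl, nfo)` text pair and that a blank sub-header marks a continuation of the previous row; both are assumptions about the page layout. `pair_spec_rows` is a hypothetical helper and the sample rows are made up for illustration.

```python
def pair_spec_rows(rows):
    """Build {sub-header: value} from (ttl_text, nfo_text) pairs, one pair per table row."""
    specs = {}
    last_label = None
    for label, value in rows:
        if label != "\xa0":
            specs[label] = value
            last_label = label
        elif last_label is not None:
            # Blank sub-header: treat the value as a continuation of the previous row
            specs[last_label] += "\n" + value
        # A blank sub-header with no previous row is dropped
    return specs

# Made-up sample rows for illustration
rows = [
    ("Technology", "GSM / HSPA / LTE"),
    ("2G bands", "GSM 850 / 900 / 1800 / 1900"),
    ("\xa0", "CDMA 800 / 1900"),
]
print(pair_spec_rows(rows))
```

Because each value is read from the same row as its sub-header, a missing or blank cell cannot shift the remaining values onto the wrong keys.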

In [9]:
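# writing specifications of all phone models into json and csv file
import json
import csv

# json.dump writes straight to the file and avoids rebinding the name `json`
with open("all_brand_models_specs.json", "w") as f:
    json.dump(overall_dict, f)

# newline="" stops the csv module from emitting blank rows on Windows
with open("all_brand_models_specs.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.writer(f)
    for key, val in overall_dict.items():
        w.writerow([key, val])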
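
Writing each brand's whole dictionary into a single CSV cell, as above, stores it as a Python literal string that is awkward to read back. One possible alternative is to flatten the nested dictionary into one row per specification. The sketch below assumes the brand, model, category, field, value nesting built earlier; `flatten_specs` and the sample data are hypothetical.

```python
import csv
import io

def flatten_specs(overall):
    """Yield (brand, model, category, field, value) rows from the nested specs dictionary."""
    for brand, models in overall.items():
        for model, categories in models.items():
            for category, fields in categories.items():
                for field, value in fields.items():
                    yield (brand, model, category, field, value)

# Tiny made-up sample in the same shape as overall_dict
sample = {"Samsung": {"Galaxy S9": {"Display": {"Size": "5.8 inches"}}}}

# StringIO stands in for open("...csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["brand", "model", "category", "field", "value"])
writer.writerows(flatten_specs(sample))
print(buffer.getvalue())
```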

## Scraping comments for each phone model we have extracted
The comments for each phone model are split across several pages, so before we can scrape them we first need the URL of every comments page for each model. We store these in a nested dictionary keyed by phone brand, then by phone model, with a list of comment-page URLs as the value.
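
The nested structure described above can be sketched as a dictionary comprehension over the brand-to-models mapping collected earlier. This is only a sketch: it assumes `all_brands_phones_model` is already populated, and the sample below is a small subset of the printed output.

```python
# Subset of the brand -> models mapping printed earlier
all_brands_phones_model = {"Samsung": ["Galaxy S9+", "Galaxy S9"]}

# brand -> model -> list of comment-page URLs (filled in later)
all_models_comments_url = {
    brand: {model: [] for model in models}
    for brand, models in all_brands_phones_model.items()
}
print(all_models_comments_url)
```

Each model gets its own empty list, so appending URLs for one model does not affect another.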

In [10]:
# scraping comments url for each phone model
browser = webdriver.Chrome("C:/Program Files (x86)/Google/Chrome/Application/chromedriver_win32/chromedriver.exe")

# all_brands_phones_link contains all url to each phone model
all_models_comments_url = {}
for brand in all_brands_link:
    browser.get(brand)
    r = browser.page_source
    brand_mainpage = BS(r, "lxml")
    # getting name of brand to append to correct key in 'all_brands_phones' dictionary
    name = re.split(" ", ((brand_mainpage.find("div", {"class": "article-hgroup"})).find("h1", {"class": "article-info-name"})).text)[0]
    # initialize key as phone brand name
    all_models_comments_url[name] = {}

# give every model of every brand an empty list of comments' URLs
for brand_name in all_models_comments_url:
    if brand_name in all_brands_phones_model:
        for model_name in all_brands_phones_model[brand_name]:
            all_models_comments_url[brand_name][model_name] = []
print("Completed this section.")

{'Samsung': {'Galaxy S9+': [], 'Galaxy S9': [], 'Galaxy J2 Pro (2018)': [], 'Galaxy A8+ (2018)': [], 'Galaxy A8 (2018)': [], 'Galaxy J2 (2017)': [], 'Galaxy C7 (2017)': [], 'Galaxy Note8': [], 'Galaxy S8 Active': [], 'Galaxy J7 V': [], 'Galaxy Note FE': [], 'Galaxy J7 Max': [], 'Galaxy J7 (2017)': [], 'Galaxy J7 Pro': [], 'Galaxy J5 (2017)': [], 'Galaxy J3 (2017)': [], 'Z4': [], 'Galaxy S8': [], 'Galaxy S8+': [], 'Galaxy C5 Pro': [], 'Galaxy Xcover 4': [], 'Galaxy J1 mini prime': [], 'Galaxy J3 Emerge': [], 'Galaxy C7 Pro': [], 'Galaxy A7 (2017)': [], 'Galaxy A5 (2017)': [], 'Galaxy A3 (2017)': [], 'Galaxy Grand Prime Plus': [], 'Galaxy J2 Prime': [], 'Galaxy C9 Pro': [], 'Galaxy C10': [], 'Galaxy A8 (2016)': [], 'Galaxy On8': [], 'Galaxy On7 (2016)': [], 'Galaxy J5 Prime': [], 'Galaxy J7 Prime': [], 'Z2': [], 'Galaxy Note7 (USA)': [], 'Galaxy Note7': [], 'Galaxy On7 Pro': [], 'Galaxy On5 Pro': [], 'Galaxy Tab J': [], 'Galaxy J Max': [], 'Galaxy J2 Pro (2016)': [], 'Galaxy J2 (2016)': 

In [60]:
browser = webdriver.Chrome("C:/Program Files (x86)/Google/Chrome/Application/chromedriver_win32/chromedriver.exe")

# is_number filters out non-numeric link text such as ">>", which appears for models with more than 3 pages of comments
def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        pass
 
    try:
        import unicodedata
        unicodedata.numeric(s)
        return True
    except (TypeError, ValueError):
        pass
 
    return False

for brand in all_models_comments_url:
    for link in range(len(all_brands_phones_link[brand])):
        model_name = all_brands_phones_model[brand][link]
        url = all_brands_phones_link[brand][link]
        browser.get(url)
        r_model = browser.page_source
        model_mainpage = BS(r_model, "lxml")
        user_comments = model_mainpage.find("div", {"id": "user-comments"})
        if user_comments is not None:
            all_comments_urls = []
            main_comments_url = user_comments.find("h2").find("a")["href"].replace(".php", "")
            
            # to determine number of pages of comments a particular phone model has
            complete_url = "https://www.gsmarena.com/" + main_comments_url + ".php"
            browser.get(complete_url)
            r_comments = browser.page_source
            comments_mainpage = BS(r_comments, "lxml")
            nav_pages = comments_mainpage.find("div", {"class": "nav-pages"})
            page_links = nav_pages.find_all("a") if nav_pages is not None else []
            # fall back to a single page when no numbered page links are found
            number_of_pages = max([int(number.text) for number in page_links if is_number(number.text)], default=1)
            
            all_comments_urls.append(complete_url)
            for pg_number in range(2, number_of_pages+1):
                indv_page_url = "https://www.gsmarena.com/" + main_comments_url + "p" + str(pg_number) + ".php"
                all_comments_urls.append(indv_page_url)
            all_models_comments_url[brand][model_name] = all_comments_urls
        print(all_models_comments_url)
print("Completed this section.")

{'Samsung': {'Galaxy S9+': ['https://www.gsmarena.com/samsung_galaxy_s9+-reviews-8967.php', 'https://www.gsmarena.com/samsung_galaxy_s9+-reviews-8967p2.php', 'https://www.gsmarena.com/samsung_galaxy_s9+-reviews-8967p3.php', 'https://www.gsmarena.com/samsung_galaxy_s9+-reviews-8967p4.php', 'https://www.gsmarena.com/samsung_galaxy_s9+-reviews-8967p5.php', 'https://www.gsmarena.com/samsung_galaxy_s9+-reviews-8967p6.php', 'https://www.gsmarena.com/samsung_galaxy_s9+-reviews-8967p7.php', 'https://www.gsmarena.com/samsung_galaxy_s9+-reviews-8967p8.php', 'https://www.gsmarena.com/samsung_galaxy_s9+-reviews-8967p9.php', 'https://www.gsmarena.com/samsung_galaxy_s9+-reviews-8967p10.php', 'https://www.gsmarena.com/samsung_galaxy_s9+-reviews-8967p11.php']}, 'Apple': {}, 'Nokia': {}, 'Sony': {}, 'LG': {}, 'HTC': {}, 'Motorola': {}, 'Huawei': {}, 'Microsoft': {}, 'Lenovo': {}, 'Xiaomi': {}, 'Google': {}, 'Acer': {}, 'Asus': {}, 'Oppo': {}, 'OnePlus': {}, 'Meizu': {}, 'BlackBerry': {}, 'alcatel': {},

WebDriverException: Message: unknown error: cannot determine loading status
from disconnected: received Inspector.detached event
  (Session info: chrome=64.0.3282.167)
  (Driver info: chromedriver=2.35.528161 (5b82f2d2aae0ca24b877009200ced9065a772e73),platform=Windows NT 10.0.16299 x86_64)
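
The page-URL pattern visible in the output above (`...-reviews-8967.php`, then `...-reviews-8967p2.php`, and so on) can be isolated into two small pure functions that are testable without a browser. This is a sketch, not the notebook's code: `max_page_number` and `comment_page_urls` are hypothetical helpers, the pagination link texts are invented sample data, and the first-page URL is assumed to end in `.php`.

```python
def max_page_number(link_texts):
    """Return the highest page number among pagination link texts, or 1 if none are numeric."""
    # str.isdecimal() accepts exactly the digit strings that int() can parse
    numbers = [int(text) for text in link_texts if text.strip().isdecimal()]
    return max(numbers, default=1)

def comment_page_urls(first_page_url, number_of_pages):
    """Return the URLs of all comment pages, given the first page's URL (assumed to end in .php)."""
    stem = first_page_url[: -len(".php")]
    urls = [first_page_url]
    urls += [f"{stem}p{page}.php" for page in range(2, number_of_pages + 1)]
    return urls

# Invented link texts: numbered pages plus a non-numeric "next" marker
pages = max_page_number(["2", "3", "11", ">>"])
print(comment_page_urls("https://www.gsmarena.com/samsung_galaxy_s9+-reviews-8967.php", pages))
```

Using `isdecimal()` instead of `float()` also rejects text such as `"1.5"`, which would pass a float check but fail the later `int()` conversion.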


In [63]:
# scraping comments for each phone model
browser = webdriver.Chrome("C:/Program Files (x86)/Google/Chrome/Application/chromedriver_win32/chromedriver.exe")

# all_brands_phones_link contains all url to each phone model
all_models_comments = {}
for brand in all_brands_link:
    browser.get(brand)
    r = browser.page_source
    brand_mainpage = BS(r, "lxml")
    # getting name of brand to append to correct key in 'all_brands_phones' dictionary
    name = re.split(" ", ((brand_mainpage.find("div", {"class": "article-hgroup"})).find("h1", {"class": "article-info-name"})).text)[0]
    # initialize key as phone brand name
    all_models_comments[name] = {}

# give every model of every brand an empty list of comments
for brand_name in all_models_comments:
    if brand_name in all_brands_phones_model:
        for model_name in all_brands_phones_model[brand_name]:
            all_models_comments[brand_name][model_name] = []
print(all_models_comments)

# go through each brand
for brand in all_models_comments_url:
    # go through each of the brand's phone models
    for model in all_models_comments_url[brand]:
        if (len(all_models_comments_url[brand][model]) != 0):
            # go through all of the model's comments pages
            for link in all_models_comments_url[brand][model]:
                browser.get(link)
                r_comments = browser.page_source
                brand_comments = BS(r_comments, "lxml")
                
                all_comments = brand_comments.find_all("p", {"class": "uopin"})
        
                for comment in all_comments:
                    # replies to other comments carry nested span and a tags
                    # remove all nested span tags
                    while comment.find('span') is not None:
                        comment.span.decompose()
                    # remove all nested a tags
                    while comment.find('a') is not None:
                        comment.a.decompose()
                    # always store the plain text, never the Tag object itself
                    all_models_comments[brand][model].append(comment.get_text())
print("Completed this section.")

{'Samsung': {'Galaxy S9+': [], 'Galaxy S9': [], 'Galaxy J2 Pro (2018)': [], 'Galaxy A8+ (2018)': [], 'Galaxy A8 (2018)': [], 'Galaxy J2 (2017)': [], 'Galaxy C7 (2017)': [], 'Galaxy Note8': [], 'Galaxy S8 Active': [], 'Galaxy J7 V': [], 'Galaxy Note FE': [], 'Galaxy J7 Max': [], 'Galaxy J7 (2017)': [], 'Galaxy J7 Pro': [], 'Galaxy J5 (2017)': [], 'Galaxy J3 (2017)': [], 'Z4': [], 'Galaxy S8': [], 'Galaxy S8+': [], 'Galaxy C5 Pro': [], 'Galaxy Xcover 4': [], 'Galaxy J1 mini prime': [], 'Galaxy J3 Emerge': [], 'Galaxy C7 Pro': [], 'Galaxy A7 (2017)': [], 'Galaxy A5 (2017)': [], 'Galaxy A3 (2017)': [], 'Galaxy Grand Prime Plus': [], 'Galaxy J2 Prime': [], 'Galaxy C9 Pro': [], 'Galaxy C10': [], 'Galaxy A8 (2016)': [], 'Galaxy On8': [], 'Galaxy On7 (2016)': [], 'Galaxy J5 Prime': [], 'Galaxy J7 Prime': [], 'Z2': [], 'Galaxy Note7 (USA)': [], 'Galaxy Note7': [], 'Galaxy On7 Pro': [], 'Galaxy On5 Pro': [], 'Galaxy Tab J': [], 'Galaxy J Max': [], 'Galaxy J2 Pro (2016)': [], 'Galaxy J2 (2016)': 

no error
Shameless THIEF SAMSUNG wants to pay 300 euro for your S8+ because it became ancient  phone. 
What a world...
removed nested spans
From   IP68: Protected from total dust ingress. Protected from long term immersion up to a specified pressure.
 
If it will not fail after long term immersion I call it waterproof.
removed nested spans
Please name me one flagship that is waterproof. Take note you used the word waterproof, not water resistant.
removed nested spans
Not everyone have the time to play the computer anymore.
no error
Note 8 is the best 
removed nested spans
Recommended for you bro: waiting for GSXIII
removed nested spans
If that so, no one will buy $20000 watch.
no error
Off my list unless fm radio and recording for the world too. Cjd
removed nested spans
Ahh come on, if you get $10,000 / monthly from your business, 
you'll definetly build $5,000 gaming PC and buy couple of this phone....
no error
Waiting for galaxy s10 
It will be a game changer 
removed nested spans
Yo

removed nested spans
I don't know where you get your information... It's been said that the S9 wasn't going to be flexible, Samsung has been working on a flexible phone called the Galaxy X.
removed nested spans
I guess it will come to reality pretty soon. Though they are still perfecting it perhaps because of screen burn issues on their amoled screens
removed nested spans
Yep, at the current AU-EUR rates the S9+ would cost me AU$1550. The S8+ cost AU$1350.
 
If the camera is a big step up i will get the S9+. 
The S8+ camera is still awesome so I'm not convinced the difference will be that great.
no error
It doesn't worry me that the design is the same as the S8+. 
Glad they finally have dual camera. 
The only way I will upgrade my S8+ is if the S9+ camera tests are substantially better. 
A bit bigger battery would have been nice.

no error
990 Euro will be too much to ask for this flagship. I wonder how will the existing S8+ users see switching to such costly but almost similar device.

WebDriverException: Message: chrome not reachable
  (Session info: chrome=64.0.3282.167)
  (Driver info: chromedriver=2.35.528161 (5b82f2d2aae0ca24b877009200ced9065a772e73),platform=Windows NT 10.0.16299 x86_64)
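
Both long-running cells above stopped with a `WebDriverException` partway through. One way to make such loops more tolerant of transient failures is to retry the failing call a few times before giving up. The helper below is a generic sketch with hypothetical names (`with_retries`, `flaky_fetch`); it uses a simulated failure rather than a real browser.

```python
import time

def with_retries(action, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call action() until it succeeds or attempts run out; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)

# Demonstration with a stand-in that fails twice before succeeding
calls = {"count": 0}

def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated browser failure")
    return "<html>page source</html>"

print(with_retries(flaky_fetch, attempts=3, delay=0))
```

In the notebook, one might wrap `browser.get(link)` this way and pass `exceptions=(WebDriverException,)`, imported from `selenium.common.exceptions`. A retry alone would probably not recover from "chrome not reachable", since that suggests the browser process itself is gone and a new webdriver instance is needed.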


## Scraping articles from landing page
The scraper below uses Selenium only. It is unfinished, because we later switched to Beautiful Soup to scrape the website (the code shown above).

In [1]:
# Importing required libraries
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# Download Chromedriver and set PATH to it before executing this
browser = webdriver.Chrome("C:/Program Files (x86)/Google/Chrome/Application/chromedriver_win32/chromedriver.exe")
# Change url to the website you want to scrape
url="https://www.gsmarena.com/"
browser.get(url)
time.sleep(1)

In [2]:
# Import By for Selenium's locator-based find_element API
from selenium.webdriver.common.by import By

#content = browser.find_elements(By.CLASS_NAME, 'news-item')
#content.find_element(By.CSS_SELECTOR, 'a').get_attribute('href')

body = browser.find_element(By.TAG_NAME, 'body')
# Moving down the page 5 times
for _ in range(5):
    body.send_keys(Keys.PAGE_DOWN)
    time.sleep(0.2)

In [3]:
# Scraping all reviews shown on the page
content = browser.find_elements(By.CLASS_NAME, 'news-item')

# Retrieve and store review titles
review_titles = []

for review in content:
    title = review.find_element(By.CSS_SELECTOR, 'h3').text
    review_titles.append(title)
#print(review_titles)

# Retrieve and store weblinks to reviews
review_links = []

for review in content:
    link = review.find_element(By.CSS_SELECTOR, 'a').get_attribute('href')
    review_links.append(link)
#print(review_links)

In [5]:
# Initializing another webdriver to scrape each review individually
browser = webdriver.Chrome("C:/Program Files (x86)/Google/Chrome/Application/chromedriver_win32/chromedriver.exe")

# At every review link, there are links to related articles as well, where more reviews can be scraped
# Retrieve and store review body content
review_body = []

for link in review_links:
    browser.get(link)
    text_item = browser.find_element(By.ID, 'review-body')

    # To ensure all paragraphs of text are taken into account, not just the first paragraph
    texts = text_item.find_elements(By.CSS_SELECTOR, 'p')
    
    entire_review = ""
    for text in texts:
        entire_review += text.text + " "
    print(entire_review)
    review_body.append(entire_review)

Global smartphone sales reached 408 million units during the 2017 holiday season, Gartner reported. This is a 5.6% decline, compared with Q4 2016, when over 432 million phones found a new owner, marking the first time sales go down since Gartner started tracking in 2004. According to Anshul Gupta, research director at Gartner, two factors led to the fall: the upgrade from feature phones to smartphones slowed down due to lack of ultra-low-cost devices and users deciding to choose a quality model and stick with it for a longer time before switching. “Moreover, while demand for high quality, 4G connectivity, and better camera features remained strong, high expectations weakened smartphone sales”, added Gupta. Due to the massive success of the Galaxy S8 duo and the disappointment with the pricing and launch strategy of the iPhone X, Samsung was back in the lead both in volume and market share ahead of Apple for Q4. Both companies, though, posted declines, where the actual winners are Huawe

Spanish company BQ has launched two new Android smartphones today, both of which are already available from its international online store. The Aquaris VS and Aquaris VS Plus share a lot of specs, as you'd expect from the naming convention, and yes, the Plus model is bigger. The BQ Aquaris VS has a 5.2-inch 720p IPS touchscreen with 2.5D glass and up to 520 nits of brightness, a 12 MP main camera with f/2.0 aperture and 1.25µm pixels, as well as 1080p video recording. For selfies you have an 8 MP sensor with f/2.0 aperture and 1.12µm pixels, and even a LED flash. The phone is powered by the Snapdragon 430 chipset, which has a 1.4 GHz octa-core Cortex-A53 CPU. You get either 3GB of RAM and 32GB of storage for €189.90, or a 4/64GB combo for €209.90. A fingerprint scanner is on the back, and the Aquaris VS has a 3,100 mAh battery with support for Qualcomm Quick Charge 3.0. It runs Android 7.1.2 Nougat with an update to Oreo promised for the future. The Aquaris VS Plus only has one version

Back in December 2017, Xiaomi was rumored to launch the Mi Max 3 with a 7” tall screen and 5,500 mAh battery, while a January leak suggested a dual camera setup. According to latest find, the phablet will have all these specs, plus a choice between Snapdragon 660 and 630 chipsets and wireless charging. XDA-Developers found the information in strings part of the firmware code of the MIUI Keyguard APK. The lines “Wireless charging has stopped” and the others listed should appear on the screen when the Mi Max 3 is placed on a charging pad. A graphic, showing how the Mi Max 3 should be positioned, also surfaced, although this looks like a generic image of a phone rather than an actual phone. Dual cameras in the tablet-like smartphone will come with either IMX363 sensor by Sony or S5K217+S5K5E8 Samsung setup. On the front, there will be one S5K4H7 Samsung sensor, paired with an iris scanner, currently reserved only for Galaxy phones. Apparently, Xiaomi wants to create a phone that is sittin

In [6]:
# Initializing another webdriver to collect the comments link of each review
browser = webdriver.Chrome("C:/Program Files (x86)/Google/Chrome/Application/chromedriver_win32/chromedriver.exe")

# Retrieve and store links to review comments
review_comments_link = []

for link in review_links:
    browser.get(link)
    comments_link = browser.find_elements(By.CLASS_NAME, 'article-info-meta-link')
    # need to consider the case where the article has no comments
    if (len(comments_link) > 1):
        wanted_comments_link = comments_link[1]
        link = wanted_comments_link.find_element(By.CSS_SELECTOR, 'a').get_attribute('href')
        review_comments_link.append(link)
    else:
        review_comments_link.append("This review has no comments currently.")
print(review_comments_link)

['https://www.gsmarena.com/newscomm-29755.php', 'This review has no comments currently.', 'https://www.gsmarena.com/newscomm-29754.php', 'https://www.gsmarena.com/newscomm-29752.php', 'https://www.gsmarena.com/newscomm-29751.php', 'https://www.gsmarena.com/newscomm-29750.php', 'https://www.gsmarena.com/newscomm-29749.php', 'https://www.gsmarena.com/newscomm-29748.php', 'https://www.gsmarena.com/newscomm-29747.php', 'https://www.gsmarena.com/newscomm-29744.php', 'https://www.gsmarena.com/newscomm-29746.php', 'https://www.gsmarena.com/newscomm-29745.php', 'https://www.gsmarena.com/newscomm-29743.php', 'https://www.gsmarena.com/newscomm-29742.php', 'https://www.gsmarena.com/newscomm-29741.php', 'https://www.gsmarena.com/newscomm-29739.php', 'https://www.gsmarena.com/newscomm-29737.php', 'https://www.gsmarena.com/newscomm-29738.php', 'https://www.gsmarena.com/newscomm-29736.php']
