### OBTAINING SECOND-HAND BIKES IN THE BARCELONA AREA THAT FULFILL CERTAIN CRITERIA

The code below walks through the web scraping process used to obtain data about second-hand bikes listed on the Wallapop website for Barcelona, Spain. It creates a ```DataFrame``` that holds all the listings and a second ```DataFrame``` that summarizes the data obtained (average listing price by bike type and condition).

We are looking for three bike types: ```Bicicletas de carretera```, ```MTB```, and ```Plegables```.

Next, for each of the three bike types, we limit the search to 3 bike states/conditions: ```"Nuevo"```, ```"Como Nuevo"```, and ```"En Buen Estado"```.

To do this, the code below is organized as follows:
1. We import the necessary libraries, initiate a driver instance, and load the ```Wallapop website```
2. We apply the necessary preliminary filters to suit our item search, location, and price restrictions
3. We select the first bike type, then the first bike condition, and scrape the listings for the needed data (nested for-loops handle the iteration and keep the code easy to read and understand). At most 250 listings are scraped per filter combination, even if more exist.
4. Step 3 is repeated until we have obtained the data for every bike type and bike condition
5. The data is then aggregated in a ```DataFrame``` ```df```
6. A summary ```DataFrame``` ```agg``` is created that groups the data obtained by bike type and condition and reports the average listing price
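
To make the plan concrete, here is a minimal sketch that simply enumerates the filter combinations that steps 3 and 4 iterate over. It does not touch the browser, and the names `bike_types`, `bike_conditions`, and `MAX_LISTINGS` are illustrative only; they are not used by the notebook's code.

```python
from itertools import product

# Labels taken from the text above; the variable names are illustrative only.
bike_types = ["Bicicletas de carretera", "MTB", "Plegables"]
bike_conditions = ["Nuevo", "Como Nuevo", "En Buen Estado"]
MAX_LISTINGS = 250  # cap per filter combination, as stated in step 3

# Every (type, condition) pair visited by steps 3 and 4, in order.
filter_combinations = list(product(bike_types, bike_conditions))

for bike_type, bike_condition in filter_combinations:
    print(f"{bike_type} / {bike_condition}: scrape up to {MAX_LISTINGS} listings")
```

Three types times three conditions gives nine combinations, so the scrape can return at most 9 * 250 listings before duplicates are dropped.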


With that said, let's get to work! First, let's begin by importing the required libraries.

In [38]:
# Importing the standard libraries
import time
import pandas as pd
import numpy as np
from datetime import datetime
import re
import locale

# Importing Selenium library and relevant classes
import selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from selenium.webdriver.common.action_chains import ActionChains

Next, we install the necessary ```Selenium``` drivers, launch the ```Wallapop``` website, and maximize the window for a better view.

In [39]:
# Installing the Selenium Chrome web driver
# No external files need to be downloaded with this method of utilizing Selenium
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))

# Fetching the Wallapop home page
driver.get('https://es.wallapop.com')
time.sleep(1)

# Maximizing the window
driver.maximize_window()  
driver.switch_to.window(driver.current_window_handle)
driver.implicitly_wait(1)

Let's accept the website cookies that are shown once the page loads.

Sometimes the page loads without asking us to accept cookies, so the code that accepts them is wrapped in a try-except statement.

In [40]:
# Accepting the page cookies whenever the banner shows up on the screen
try:
    WebDriverWait(driver, 15).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".banner-actions-container"))).click()
except Exception:
    # The banner could not be clicked within 15 seconds, so we carry on without it
    pass

Now let's search for our product of interest using the search box and click on the result that matches our interests.

Again, we are searching for ```bicicletas```, so the search result that we want is ```"Bicicletas"```.

In [41]:
# Searching for all results matching the keyword "bicicleta"
driver.find_element(By.CSS_SELECTOR, ".Search__input").send_keys("bicicleta")
time.sleep(3)

# Clicking on the result which retrieves the "Bicicletas" search results
driver.find_element(By.CSS_SELECTOR, '.Search__suggestion-title--highlighted').click()
time.sleep(3) # A sleep time of 3 seconds has been put to allow the page to load

Once the page loads, we can start further filtering the search results to find what we're looking for.

Let's start by setting our desired search location to "España, Barcelona" and narrowing the search results down to a maximum distance of 10 km.

In [42]:
# Clicking the filter button to access the search filtering form
driver.find_elements(By.CSS_SELECTOR, '.d-flex.ng-star-inserted')[4].click()

# Setting the location preferences to “España, Barcelona”
driver.find_element(By.CSS_SELECTOR, '.LocationFilter__input').clear()
driver.find_element(By.CSS_SELECTOR, '.LocationFilter__input').send_keys("España, Barcelona")
time.sleep(3)

# Clicking on the first result for the location result of interest
driver.find_element(By.CSS_SELECTOR, '.dropdown-item.ng-star-inserted.active').click()

# Narrowing the search down to a maximum of 10km
handle_slider = driver.find_elements(By.CSS_SELECTOR, '.ngx-slider-span.ngx-slider-bar-wrapper.ngx-slider-selection-bar')[0]
ActionChains(driver).drag_and_drop_by_offset(handle_slider, -100, 0).perform()

# Clicking the "acceptar" button to load the filtered results
driver.find_element(By.CSS_SELECTOR, '.btn.btn-filter.btn-primary').click()
time.sleep(3) # A sleep time of 3 seconds has been put to allow the page to load

The next filter limits the search to a maximum bike price of 800€.

In [43]:
# Clicking the filter button to access the price filtering form
driver.find_elements(By.CSS_SELECTOR, '.d-flex.ng-star-inserted')[5].click()
time.sleep(1)

# Setting the maximum price to 800€
driver.find_element(By.CSS_SELECTOR, '.RangeFilter__input__field.pl-2.w-100.ng-untouched.ng-pristine.ng-valid').clear()
driver.find_element(By.CSS_SELECTOR, '.RangeFilter__input__field.pl-2.w-100.ng-untouched.ng-pristine.ng-valid').send_keys("800")
time.sleep(1)

# Clicking "Aplicar"
driver.find_elements(By.CSS_SELECTOR, '.btn.btn-filter.btn-primary')[1].click()
time.sleep(1)

#### MAKING OUR LIFE EASIER AND AVOIDING SPAGHETTI CODING
Now that we have loaded the page and applied all the preliminary filters, we can begin applying our specialized filters (bike type and bike condition). But first, let's create some functions that will make our life easier later on.

In [44]:
# Function for selecting Bicicletas de carretera
def selecting_carretera():
    
    print("Filtering only for Carretera bike types")

    # Filtering the results to include only "Bicicletas y triciclos"
    driver.find_elements(By.CSS_SELECTOR, '.d-flex.ng-star-inserted')[7].click()
    time.sleep(1)
    driver.find_element(By.XPATH, '//p[contains(text(), " Bicicletas y triciclos")]').click()

    # Narrowing down our search to include “Bicicletas de carretera” (road bikes)
    row_elements = driver.find_elements(By.XPATH, "//*[@class = 'w-100 ng-star-inserted']")

    for bike_label_row in row_elements:
        bike_label = bike_label_row.text.strip()
        if bike_label == "Bicicletas de carretera":
            bike_label_row.find_element(By.XPATH, ".//tsl-checkbox-form").click()
            time.sleep(1)

    # Clicking "Aplicar"
    driver.find_elements(By.CSS_SELECTOR, '.btn.btn-filter.btn-primary')[2].click()

In [45]:
# Function for selecting MTB
def selecting_mtb():

    print("Filtering only for MTB bike types")
    
    # Filtering the results to include only "Bicicletas y triciclos"
    driver.find_elements(By.CSS_SELECTOR, '.d-flex.ng-star-inserted')[7].click()
    time.sleep(1)
    driver.find_element(By.XPATH, '//p[contains(text(), " Bicicletas y triciclos")]').click()

    # Narrowing down our search to include “MTB”
    row_elements = driver.find_elements(By.XPATH, "//*[@class = 'w-100 ng-star-inserted']")

    for bike_label_row in row_elements:
        bike_label = bike_label_row.text.strip()
        if bike_label == "MTB":
            bike_label_row.find_element(By.XPATH, ".//tsl-checkbox-form").click()
            time.sleep(1)

    # Clicking "Aplicar"
    driver.find_elements(By.CSS_SELECTOR, '.btn.btn-filter.btn-primary')[2].click()

In [46]:
# Function for selecting Plegables
def selecting_plegables():

    print("Filtering only for Plegables bike types")

    # Filtering the results to include only "Bicicletas y triciclos"
    driver.find_elements(By.CSS_SELECTOR, '.d-flex.ng-star-inserted')[7].click()
    time.sleep(1)
    driver.find_element(By.XPATH, '//p[contains(text(), " Bicicletas y triciclos")]').click()

    # Narrowing down our search to include “Bicicletas plegables” (folding bikes)
    row_elements = driver.find_elements(By.XPATH, "//*[@class = 'w-100 ng-star-inserted']")

    for bike_label_row in row_elements:
        bike_label = bike_label_row.text.strip()
        if bike_label == "Bicicletas plegables":
            bike_label_row.find_element(By.XPATH, ".//tsl-checkbox-form").click()
            time.sleep(1)


    # Clicking "Aplicar"
    driver.find_elements(By.CSS_SELECTOR, '.btn.btn-filter.btn-primary')[2].click()

In [47]:
# Function for selecting "Nuevo" (New)
def selecting_nuevo():

    print("Selecting only Nuevo bike conditions")

    # Opening the "Estado del producto" option button
    driver.find_elements(By.CSS_SELECTOR, '.Bubble__dropdown_arrow.d-flex.justify-content-center.align-items-center.ng-star-inserted')[2].click()
    time.sleep(1)

    # Choosing “Nuevo” 
    row_elements = driver.find_elements(By.XPATH, "//*[@class = 'w-100 ng-star-inserted']")

    for bike_condition_row in row_elements:
        bike_condition = bike_condition_row.text.strip()
        if bike_condition == "Nuevo\nNunca se ha usado":
            bike_condition_row.find_element(By.XPATH, ".//tsl-checkbox-form").click()
            time.sleep(1)

    # Clicking "Aplicar"
    driver.find_elements(By.CSS_SELECTOR, '.btn.btn-filter.btn-primary')[3].click()

In [48]:
# Function for de-selecting "Nuevo" (New)
def deselecting_nuevo():

    print("Deselecting Nuevo bike conditions")

    # Opening the "Estado del producto" option button
    driver.find_element(By.XPATH, '//div[contains(text(), " Nuevo ")]').click()
    time.sleep(1)

    # Unchecking "Nuevo"
    row_elements = driver.find_elements(By.XPATH, "//*[@class = 'w-100 ng-star-inserted']")

    for bike_condition_row in row_elements:
        bike_condition = bike_condition_row.text.strip()
        if bike_condition == "Nuevo\nNunca se ha usado":
            bike_condition_row.find_element(By.XPATH, ".//tsl-checkbox-form").click()
            time.sleep(1)

    # Clicking "Aplicar"
    driver.find_elements(By.CSS_SELECTOR, '.btn.btn-filter.btn-primary')[3].click()

In [49]:
# Function for selecting Como Nuevo (As Good As New)
def selecting_como_nuevo():

    print("Selecting only Como Nuevo bike conditions")

    # Opening the "Estado del producto" option button
    driver.find_elements(By.CSS_SELECTOR, '.Bubble__dropdown_arrow.d-flex.justify-content-center.align-items-center.ng-star-inserted')[2].click()
    time.sleep(1)

    # Choosing "Como Nuevo" 
    row_elements = driver.find_elements(By.XPATH, "//*[@class = 'w-100 ng-star-inserted']")

    for bike_condition_row in row_elements:
        bike_condition = bike_condition_row.text.strip()
        if bike_condition == "Como nuevo\nEn perfectas condiciones":
            bike_condition_row.find_element(By.XPATH, ".//tsl-checkbox-form").click()
            time.sleep(1)

    # Clicking "Aplicar"
    driver.find_elements(By.CSS_SELECTOR, '.btn.btn-filter.btn-primary')[3].click()

In [50]:
# Function for de-selecting "Como Nuevo" (As Good As New)
def deselecting_como_nuevo():

    print("Deselecting Como Nuevo bike conditions")

    # Opening the "Estado del producto" option button
    driver.find_element(By.XPATH, '//div[contains(text(), " Como nuevo ")]').click()
    time.sleep(1)

    # Unchecking "Como Nuevo"
    row_elements = driver.find_elements(By.XPATH, "//*[@class = 'w-100 ng-star-inserted']")

    for bike_condition_row in row_elements:
        bike_condition = bike_condition_row.text.strip()
        if bike_condition == "Como nuevo\nEn perfectas condiciones":
            bike_condition_row.find_element(By.XPATH, ".//tsl-checkbox-form").click()
            time.sleep(1)

    # Clicking "Aplicar"
    driver.find_elements(By.CSS_SELECTOR, '.btn.btn-filter.btn-primary')[3].click()

In [51]:
# Function for selecting “En buen estado” (In good condition)
def selecting_en_buen_estado():

    print("Selecting only En Buen Estado bike conditions")

    # Opening the "Estado del producto" option button
    driver.find_elements(By.CSS_SELECTOR, '.Bubble__dropdown_arrow.d-flex.justify-content-center.align-items-center.ng-star-inserted')[2].click()
    time.sleep(1)

    # Choosing “En buen estado”
    row_elements = driver.find_elements(By.XPATH, "//*[@class = 'w-100 ng-star-inserted']")

    for bike_condition_row in row_elements:
        bike_condition = bike_condition_row.text.strip()
        if bike_condition == "En buen estado\nBastante usado, pero bien conservado":
            bike_condition_row.find_element(By.XPATH, ".//tsl-checkbox-form").click()
            time.sleep(1)

    # Clicking "Aplicar"
    driver.find_elements(By.CSS_SELECTOR, '.btn.btn-filter.btn-primary')[3].click()

In [52]:
# Function for de-selecting “En buen estado” (In good condition)
def deselecting_en_buen_estado():

    print("Deselecting En Buen Estado bike conditions")

    # Opening the "Estado del producto" option button
    driver.find_element(By.XPATH, '//div[contains(text(), " En buen estado ")]').click()
    time.sleep(1)

    # Unchecking “En buen estado”
    row_elements = driver.find_elements(By.XPATH, "//*[@class = 'w-100 ng-star-inserted']")

    for bike_condition_row in row_elements:
        bike_condition = bike_condition_row.text.strip()
        if bike_condition == "En buen estado\nBastante usado, pero bien conservado":
            bike_condition_row.find_element(By.XPATH, ".//tsl-checkbox-form").click()
            time.sleep(1)

    # Clicking "Aplicar"
    driver.find_elements(By.CSS_SELECTOR, '.btn.btn-filter.btn-primary')[3].click()
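
The selecting and deselecting functions above share the same inner loop and differ only in the label they look for. As a possible refactoring, that loop could be pulled into a single helper. The sketch below is an illustration under assumptions rather than part of the notebook: `toggle_checkbox`, `FakeRow`, and `FakeCheckbox` are hypothetical names, the fake classes only stand in for browser elements so that the snippet runs without Selenium, and the one-second pause after each click is left out.

```python
def toggle_checkbox(rows, label):
    """Click the checkbox of every row whose visible text equals `label`.

    `rows` is expected to be what `driver.find_elements(...)` returns for the
    filter rows. Returns how many checkboxes were clicked.
    """
    clicked = 0
    for row in rows:
        if row.text.strip() == label:
            # "xpath" is the string value of Selenium's By.XPATH locator strategy.
            row.find_element("xpath", ".//tsl-checkbox-form").click()
            clicked += 1
    return clicked


# --- Stand-ins for browser elements, used only to dry-run the helper ---
class FakeCheckbox:
    def __init__(self):
        self.clicks = 0

    def click(self):
        self.clicks += 1


class FakeRow:
    def __init__(self, text):
        self.text = text
        self.checkbox = FakeCheckbox()

    def find_element(self, by, value):
        return self.checkbox


rows = [FakeRow(" MTB "), FakeRow("Bicicletas plegables"), FakeRow("Bicicletas de carretera")]
clicked = toggle_checkbox(rows, "MTB")
print("Checkboxes clicked:", clicked)
```

With a helper like this, each `selecting_*` function would reduce to opening the right filter form, calling the helper with its label, and clicking "Aplicar".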

In [53]:
# Function for the scrolling down command
def scroll_down():

    # Get the page scroll height
    last_height = driver.execute_script("return document.body.scrollHeight")

    # Scrolling down all the available pages
    while True:

        # Scrolling down to the bottom of the page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Waiting for the page to load
        time.sleep(5)

        # Calculating the new scroll height and comparing it with last scroll height
        new_height = driver.execute_script("return document.body.scrollHeight")

        if new_height == last_height:

            break

        last_height = new_height
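
The loop above stops once the page height no longer changes. On a page that keeps loading content, it could run for a long time, so one option is to bound the number of scrolls. The sketch below is a hedged variant, not the notebook's function: `scroll_to_bottom`, its `pause` and `max_rounds` parameters, and the `FakePage` stand-in are all assumptions introduced for illustration, and the fake page exists only so the snippet can be run without a browser.

```python
import time


def scroll_to_bottom(driver, pause=5.0, max_rounds=50):
    """Scroll until the page height stops growing or `max_rounds` is reached.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded listings time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_number
        last_height = new_height
    return max_rounds


class FakePage:
    """Stand-in for a driver: each scroll moves to the next height in `heights`."""

    def __init__(self, heights):
        self.heights = heights
        self.position = 0

    def execute_script(self, script):
        if script.startswith("window.scrollTo"):
            self.position = min(self.position + 1, len(self.heights) - 1)
            return None
        return self.heights[self.position]


rounds = scroll_to_bottom(FakePage([1000, 2000, 3000]), pause=0)
print("Scrolls performed:", rounds)
```

Passing the driver as an argument, instead of relying on the global `driver`, also makes the function easy to exercise with a stand-in like the one shown.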

In [54]:
# Function for the scrolling up command
def scroll_up_to_home():

    driver.find_element(By.TAG_NAME, 'html').send_keys(Keys.HOME)

#### COMPILING EVERYTHING
Now we can begin scraping the website.

Keep in mind that the code below is built around two nested for-loops that perform the filtering process. This keeps the code easier to read and understand than writing the same code 9 times (3 bike types * 3 bike conditions = 9 possible filters).
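
Before reading the full cell, it may help to see the order in which the nested loops apply and remove filters. The sketch below replaces every browser action with a stand-in that only records what it would do; the function names and the `call_log` list are hypothetical and exist purely for this illustration.

```python
call_log = []  # records the order of the simulated filter actions

bike_types = ["carretera", "mtb", "plegables"]
bike_conditions = ["nuevo", "como_nuevo", "en_buen_estado"]


def toggle_bike_type(bike_type):
    # Stand-in for selecting_carretera() and friends: the same click checks or unchecks the box.
    call_log.append(f"toggle type {bike_type}")


def select_condition(condition):
    call_log.append(f"select {condition}")


def scrape_listings(bike_type, condition):
    call_log.append(f"scrape {bike_type}/{condition}")


def deselect_condition(condition):
    call_log.append(f"deselect {condition}")


for bike_type in bike_types:
    toggle_bike_type(bike_type)            # check the bike type
    for condition in bike_conditions:
        select_condition(condition)
        scrape_listings(bike_type, condition)
        deselect_condition(condition)      # reset before the next condition
    toggle_bike_type(bike_type)            # uncheck the bike type before the next one

print("\n".join(call_log[:5]))
```

Each condition is deselected before the next one is selected, and each bike type is toggled off before the next one is toggled on, so only one type and one condition are active during any scrape.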

In [55]:
prices = []
titles = []
descriptions = []
image_url = []
posting_url = []
bike_type = []
bike_state = []
children = []
publication_dates = []
bike_size_number = []
bike_size_letter = []
total_no_of_postings_retrieved_list = []
total_no_of_postings_retrieved_list_modified_list = []

selecting_bike_type = [selecting_carretera, selecting_mtb, selecting_plegables]
selecting_bike_condition = [selecting_nuevo, selecting_como_nuevo, selecting_en_buen_estado]
deselecting_bike_condition = [deselecting_nuevo, deselecting_como_nuevo, deselecting_en_buen_estado]

for selecting_bike_type_function in selecting_bike_type:
   
    time.sleep(5)
    
    selecting_bike_type_function()
    
    for selecting_bike_condition_function in selecting_bike_condition:
        
        selecting_bike_condition_function()

        time.sleep(5)

        # Clicking on the "Ver más productos" button if it exists, otherwise scrolling down the page to load the listings
        try:
            
            # Finding the "Ver más productos" button
            driver.execute_script("arguments[0].scrollIntoView(true);", WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, '//button[contains(text(), "Ver más productos")]'))))
            
            # Clicking the "Ver más productos" button
            driver.execute_script("arguments[0].click();", WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, '//button[contains(text(), "Ver más productos")]'))))

            driver.implicitly_wait(20)
            time.sleep(20)

            # Scrolling down to load all the listings
            scroll_down()

            driver.implicitly_wait(5)
            time.sleep(5)

            # Scrolling back up to the start of the page
            scroll_up_to_home()

        except:
            
            # Scrolling down to load all the listings
            scroll_down()

            driver.implicitly_wait(5)
            time.sleep(5)
            
            # Scrolling back up to the start of the page
            scroll_up_to_home()

        time.sleep(10)

        # Understanding how many listings we retrieved in total
        total_no_of_postings_retrieved = driver.find_elements(By.XPATH, '//div[@class="ItemCard__data border-top-0 ItemCard__data--with-description"]')
        print("We have found a total of", len(total_no_of_postings_retrieved), "postings!")
        
        if len(total_no_of_postings_retrieved) > 250:
            print("However, keep in mind that we will only be using a maximum of 250 listings if more than 250 were found!")
        
        print("Now let's retrieve the data for these postings!")
        
        # Understanding how many postings exist
        total_no_of_postings_retrieved_list.append(len(total_no_of_postings_retrieved))

        # Retrieving the required data for a maximum of 250 bike listings (slicing is safe when fewer exist)
        total_no_of_postings_retrieved_modified = total_no_of_postings_retrieved[:250]

        # Checking that we are limiting our extracted listings to 250 in-case more exist
        total_no_of_postings_retrieved_list_modified_list.append(len(total_no_of_postings_retrieved_modified))

        time.sleep(10)

        # Opening every bike listing and then obtaining the data
        for listing in range(len(total_no_of_postings_retrieved_modified)):
            
            try:

                time.sleep(1)
                
                # Opening every bike listing post in a new Chrome tab
                driver.find_elements(By.XPATH, '//div[@class="ItemCard__image ItemCard__image--with-description"]')[listing].click()
                
                time.sleep(1)
                
                # Setting the new tab as the active one so we extract the data we need
                driver.switch_to.window(driver.window_handles[1])

                # Finding the bike price (a decimal comma, if present, is converted to a decimal point)
                price_text = driver.find_element(By.XPATH, '//div[@class="card-product-price-info"]').text
                prices.append(float(price_text.split()[0].replace(",", ".")))
                
                time.sleep(0.1)

                # Finding the post title
                titles.append(driver.find_element(By.XPATH, '//h1[@class="js__card-product-detail--title card-product-detail-title  card-product-detail-title--with-extra-info "]').text) 
                
                time.sleep(0.1)

                # Extracting the post description
                description = driver.find_element(By.XPATH, '//p[@class="js__card-product-detail--description card-product-detail-description"]').text
                descriptions.append(description) 
                
                time.sleep(0.1)

                # Extracting the image URL
                image_url.append(driver.find_elements(By.CSS_SELECTOR, 'div ul li img')[4].get_attribute('src')) 
                
                time.sleep(0.1)

                # Extracting the page URL
                posting_url.append(driver.current_url)
                
                time.sleep(0.1)

                # Obtaining the bike type
                bike_type.append(driver.find_element(By.XPATH, '//div[@class="mb-3 ExtraInfo--horizontally-scrollable"]').text.split("\n")[1])
                
                time.sleep(0.1)
                
                # Obtaining the bike condition
                bike_state.append(driver.find_element(By.XPATH, '//span[@class="ExtraInfo__text"]').text)
                
                time.sleep(0.1)

                # Checking whether the bike is suitable for children (having any of the mentions: “niño/a”, “niño”, “niña”, “niños” , “niñas” or "niño/as")
                bike_title = driver.find_element(By.XPATH, '//h1[@class="js__card-product-detail--title card-product-detail-title  card-product-detail-title--with-extra-info "]').text
                bike_description = driver.find_element(By.XPATH, '//p[@class="js__card-product-detail--description card-product-detail-description"]').text
                # "niño" and "niña" are substrings of every variant listed above, so testing for these two covers them all
                children_keywords = ("niño", "niña")
                listing_text = (bike_title + " " + bike_description).lower()
                children.append(any(keyword in listing_text for keyword in children_keywords))
            
                time.sleep(0.1)


                # Getting the size of the bike based on the letter from the listings descriptions that we have
                if re.findall(" s ", description, flags=re.IGNORECASE):
                    bike_size_letter.append("S")
                elif re.findall(" m ", description, flags=re.IGNORECASE):
                    bike_size_letter.append("M")
                elif re.findall(" l ", description, flags=re.IGNORECASE):
                    bike_size_letter.append("L")
                elif re.findall("S/M", description, flags=re.IGNORECASE):
                    bike_size_letter.append("S/M")
                elif re.findall("M/L", description, flags=re.IGNORECASE):
                    bike_size_letter.append("M/L")
                else:
                    bike_size_letter.append(np.nan)


                # Getting the size of the bike based on the size number from the listing descriptions that we have
                if " talla " in description.lower():
                    
                    # Limiting the text to search to avoid unintentional errors
                    talla_index = description.lower().split().index('talla')
                    mini_description = description.split()[talla_index : talla_index + 3]
                    new_description_small = ' '.join(mini_description)
                    
                    # The typical range of bike sizes was observed to be between 50 and 60
                    # Taking the smallest size in that range that appears in the text (NaN if none does)
                    size_number = next((size for size in range(50, 61) if str(size) in new_description_small), np.nan)
                    bike_size_number.append(size_number)
                
                else:
                    bike_size_number.append(np.nan)
                
                time.sleep(0.1)

                # BONUS QUESTION: Retrieving the posting date of every listing
                publication_date = driver.find_element(By.XPATH, '//div[@class="card-product-detail-user-stats-published"]').text # Obtaining the date
                posting_date = publication_date.title().replace("-", "/") # Adjusting the format for datetime
                publication_dates.append(posting_date)

                time.sleep(0.1)

                # Going back to the main page
                driver.close()
                driver.switch_to.window(driver.window_handles[0])
                
                time.sleep(0.1)
            
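            except Exception:
                # Going back to the main page, closing the listing tab only if one was opened
                if len(driver.window_handles) > 1:
                    driver.switch_to.window(driver.window_handles[1])
                    driver.close()
                driver.switch_to.window(driver.window_handles[0])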
            
        # Scrolling back up to the start of the page
        scroll_up_to_home()

        print("Done!")

        # Deselecting the current bike condition before moving on to the next one
        if selecting_bike_condition_function == selecting_nuevo:
            deselecting_nuevo()
        elif selecting_bike_condition_function == selecting_como_nuevo:
            deselecting_como_nuevo()
        else:
            deselecting_en_buen_estado()

    time.sleep(2)

    # Calling the same function again unchecks the current bike type before moving on to the next one
    selecting_bike_type_function()

    time.sleep(2)

Filtering only for Carretera bike types
Selecting only Nuevo bike conditions
We have found a total of 33 postings!
Now let's retrieve the data for these postings!
Done!
Deselecting Nuevo bike conditions
Selecting only Como Nuevo bike conditions
We have found a total of 168 postings!
Now let's retrieve the data for these postings!
Done!
Deselecting Como Nuevo bike conditions
Selecting only En Buen Estado bike conditions
We have found a total of 269 postings!
However, keep in mind that we will only be using a maximum of 250 listings if more than 250 were found!
Now let's retrieve the data for these postings!
Done!
Deselecting En Buen Estado bike conditions
Filtering only for Carretera bike types
Filtering only for MTB bike types
Selecting only Nuevo bike conditions
We have found a total of 58 postings!
Now let's retrieve the data for these postings!
Done!
Deselecting Nuevo bike conditions
Selecting only Como Nuevo bike conditions
We have found a total of 345 postings!
However, keep in mi
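
Two of the per-listing steps in the loop above, parsing the price and flagging listings aimed at children, can be tried out without a browser. The sketch below is a minimal illustration under assumptions: `parse_price` and `mentions_children` are hypothetical helper names, prices are assumed to look like `"115 €"` or `"12,50 €"`, and thousands separators are not handled. It also shows a pitfall worth knowing: an expression such as `("niño/a" or "niño") in text` only tests the first keyword, because `or` returns its first truthy operand.

```python
CHILD_KEYWORDS = ("niño", "niña")  # also match "niños", "niñas", "niño/a", "niño/as"


def mentions_children(*texts):
    """Return True if any of the texts contains a child-related keyword."""
    return any(keyword in text.lower() for text in texts for keyword in CHILD_KEYWORDS)


def parse_price(price_text):
    """Convert text such as "115 €" or "12,50 €" to a float (decimal comma assumed)."""
    amount = price_text.split()[0]
    return float(amount.replace(",", "."))


# `or` between non-empty strings evaluates to the first one only.
first_only = ("niño/a" or "niño" or "niña")
print(first_only)                                                  # niño/a

print(mentions_children("Bicicleta para NIÑA de 6 años", ""))      # True
print(mentions_children("Bicicleta de carretera", "Talla M"))      # False
print(parse_price("12,50 €"))                                      # 12.5
```

Lower-casing the text before matching means that titles written in capitals are flagged as well.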

Let's make sure that the listings we obtained are limited to 250 in case more exist.

To do this, we recorded how many listings each filter combination returned and then limited each set to 250.
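
The capping step itself comes down to a slice. As a small sketch, the snippet below applies the 250 limit to the listing counts reported in this notebook's run; `cap_listings` and `MAX_LISTINGS` are illustrative names, and plain lists of integers stand in for the scraped page elements.

```python
MAX_LISTINGS = 250


def cap_listings(listings, limit=MAX_LISTINGS):
    """Keep at most `limit` items; slicing is safe even when fewer exist."""
    return listings[:limit]


# Listing counts per filter combination, taken from the run shown in this notebook.
found_counts = [33, 168, 269, 58, 345, 452, 46, 143, 132]
kept_counts = [len(cap_listings(list(range(count)))) for count in found_counts]
print(kept_counts)  # [33, 168, 250, 58, 250, 250, 46, 143, 132]
```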

The ```DataFrame``` below can help us visualize the results better.

In [72]:
# Creating the DataFrame along with the respective columns and data
df_postings = pd.DataFrame({"Original No of Postings":total_no_of_postings_retrieved_list, "Modified No of Postings":total_no_of_postings_retrieved_list_modified_list})
df_postings["Result"] = np.where(df_postings["Original No of Postings"] > df_postings["Modified No of Postings"], "Modified", "Not Modified")
df_postings

| | Original No of Postings | Modified No of Postings | Result |
|---|---|---|---|
| 0 | 33 | 33 | Not Modified |
| 1 | 168 | 168 | Not Modified |
| 2 | 269 | 250 | Modified |
| 3 | 58 | 58 | Not Modified |
| 4 | 345 | 250 | Modified |
| 5 | 452 | 250 | Modified |
| 6 | 46 | 46 | Not Modified |
| 7 | 143 | 143 | Not Modified |
| 8 | 132 | 132 | Not Modified |

### Creating the DataFrame for the data obtained
Now that we have the data we need, let's make the necessary adjustments and then create the ```DataFrame```!

In [59]:
# Dealing with the bike state. Changing "Bueno" to "En Buen Estado"
bike_state_modified = []

for state in bike_state:
    if "Bueno" in state:
        bike_state_modified.append("En Buen Estado")
    else:
        bike_state_modified.append(state)

# Dealing with the date -> BONUS question
US_publication_dates = []

for date in publication_dates:
    if "Dic" in date:
        US_publication_dates.append(date.replace("Dic", "Dec"))
    else:
        US_publication_dates.append(date)
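
The replacement above handles "Dic" only, which is enough when the remaining listings were posted in months whose Spanish and English abbreviations coincide (such as "Nov"). For a scrape run at another time of year, a more general conversion could look like the sketch below. It rests on assumptions: the date text is taken to be day, three-letter Spanish month, and year separated by `-` or `/` (for example `03-dic-2022`), and `SPANISH_MONTHS` and `parse_spanish_date` are hypothetical names.

```python
from datetime import date

# Spanish three-letter month abbreviations mapped to month numbers.
SPANISH_MONTHS = {
    "ene": 1, "feb": 2, "mar": 3, "abr": 4, "may": 5, "jun": 6,
    "jul": 7, "ago": 8, "sep": 9, "oct": 10, "nov": 11, "dic": 12,
}


def parse_spanish_date(text):
    """Parse a date such as "03/Dic/2022" or "03-dic-2022" (assumed day-month-year)."""
    day, month, year = text.strip().lower().replace("-", "/").split("/")
    return date(int(year), SPANISH_MONTHS[month[:3]], int(day))


print(parse_spanish_date("03-dic-2022"))  # 2022-12-03
print(parse_spanish_date("15/Ago/2022"))  # 2022-08-15
```

Mapping the month explicitly avoids depending on the locale settings of the machine running the notebook.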

In [92]:
# Creating the DataFrame along with the respective columns and data
df = pd.DataFrame({"Link":posting_url, "Title":titles, "Description":descriptions, "Price":prices, "Image":image_url, "Type":bike_type, "State":bike_state_modified, "Children":children,
                    "Size (letter)":bike_size_letter, "Size (number)":bike_size_number, "Date":US_publication_dates})

# Setting the date column type to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Dropping the duplicate entries and keeping only the first entry of each set of duplicates
df = df.drop_duplicates(keep='first')

# Checking all the data types in our DataFrame
print("Our DataFrame column data types consist of the following:")
print(df.dtypes)

# Checking the size of our DataFrame and the number of listings obtained
print("\nOur DataFrame has", df.shape[0], "rows, and", df.shape[1], "columns.\nIn other words, we have obtained the data for", df.shape[0], "listings!\n")

# Displaying the data for the first 5 listings obtained
df.head(5)

Our DataFrame column data types consist of the following:
Link                     object
Title                    object
Description              object
Price                   float64
Image                    object
Type                     object
State                    object
Children                   bool
Size (letter)            object
Size (number)           float64
Date             datetime64[ns]
dtype: object

Our DataFrame has 1326 rows, and 11 columns.
In other words, we have obtained the data for 1326 listings!



| | Link | Title | Description | Price | Image | Type | State | Children | Size (letter) | Size (number) | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://es.wallapop.com/item/bicicleta-montana... | Bicicleta montaña barata y nueva | Vendo boco nueva, nunca ha sido utilizado. | 115.0 | https://cdn.wallapop.com/images/10420/e4/sm/__... | Bicicletas de carretera | Nuevo | False | NaN | NaN | 2022-12-03 |
| 1 | https://es.wallapop.com/item/bicicleta-nueva-8... | BICICLETA NUEVA | Bicicleta nueva de hace un año, usada solo dos... | 300.0 | https://cdn.wallapop.com/images/10420/e3/ty/__... | Bicicletas de carretera | Nuevo | False | NaN | NaN | 2022-11-28 |
| 2 | https://es.wallapop.com/item/bicicleta-canonda... | Bicicleta Cannondale | Bicicletas nuevas a estrenar marca Cannondale ... | 550.0 | https://cdn.wallapop.com/images/10420/e3/d1/__... | Bicicletas de carretera | Nuevo | False | M | NaN | 2022-11-25 |
| 3 | https://es.wallapop.com/item/bicicleta-de-grav... | Bicicleta de gravel NUEVA | Nueva con garantia, todas las tallas | 595.0 | https://cdn.wallapop.com/images/10420/e2/2v/__... | Bicicletas de carretera | Nuevo | False | NaN | NaN | 2022-11-24 |
| 4 | https://es.wallapop.com/item/casco-de-bici-o-p... | Casco de bici o patín | Está nuevo lo compramos y nunca lo usamos | 30.0 | https://cdn.wallapop.com/images/10420/e1/vi/__... | Bicicletas de carretera | Nuevo | False | NaN | NaN | 2022-11-16 |


### Creating the agg DataFrame
Now that we have all our data, let's get some more meaning out of it by grouping the results by bike ```Type``` and then by bike ```State```, and displaying the respective ```Average Price (€)```.

In [95]:
agg = pd.DataFrame(df.groupby(['Type', 'State'])["Price"].mean().round(2))
agg.rename(columns = {"Price":"Average Price (€)"}, inplace = True)
agg

| Type | State | Average Price (€) |
|---|---|---|
| Bicicletas de carretera | Como nuevo | 325.7 |
| Bicicletas de carretera | En Buen Estado | 272.04 |
| Bicicletas de carretera | Nuevo | 311.39 |
| Bicicletas plegables | Como nuevo | 168.64 |
| Bicicletas plegables | En Buen Estado | 114.04 |
| Bicicletas plegables | Nuevo | 253.93 |
| MTB | Como nuevo | 288.61 |
| MTB | En Buen Estado | 253.12 |
| MTB | Nuevo | 394.91 |
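
For readers less familiar with ```groupby```, the aggregation amounts to collecting the prices of each (type, state) pair and averaging them. The dependency-free sketch below shows that idea on a few made-up rows; the sample values are invented for illustration and are not taken from the scrape, and the variable names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample rows (type, state, price); not data from the scrape.
sample_rows = [
    ("MTB", "Nuevo", 400.0),
    ("MTB", "Nuevo", 300.0),
    ("MTB", "Como nuevo", 250.0),
    ("Bicicletas plegables", "Nuevo", 199.99),
]

# Collecting the prices that belong to each (type, state) group.
prices_by_group = defaultdict(list)
for bike_type, state, price in sample_rows:
    prices_by_group[(bike_type, state)].append(price)

# Averaging each group, with the groups sorted by (type, state).
average_price = {
    group: round(mean(prices), 2)
    for group, prices in sorted(prices_by_group.items())
}

for (bike_type, state), price in average_price.items():
    print(f"{bike_type} | {state} | {price}")
```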


In [62]:
driver.quit()