## Introduction

In this Jupyter notebook we scrape the CSV example about cars from the Wikipedia article on comma-separated values and save the result to a file in the local folder.


### Load the libraries

In [1]:
from bs4 import BeautifulSoup
import requests

### Get the CSV example

Load the HTML of the Wikipedia page

In [2]:
csv_wiki = requests.get("https://en.wikipedia.org/wiki/Comma-separated_values", timeout=10)
csv_wiki.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(csv_wiki.text, 'html.parser')

Get the CSV example that follows the "Example" heading. Two methods are shown; they return the same string.

In [4]:
# method 1: start from the element whose id is 'Example' and take the next <pre> block
table = soup.find(id='Example').find_next('pre').text
table

'Year,Make,Model,Description,Price\n1997,Ford,E350,"ac, abs, moon",3000.00\n1999,Chevy,"Venture ""Extended Edition""","",4900.00\n1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00\n1996,Jeep,Grand Cherokee,"MUST SELL!\nair, moon roof, loaded",4799.00\n'
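Method 1 relies on two BeautifulSoup calls. `find(id=...)` locates the element that carries the `Example` anchor, and `find_next('pre')` walks forward in document order to the first `<pre>` block. The sketch below wraps these calls in a hypothetical helper, `pre_after_id`, which returns `None` instead of raising `AttributeError` when the anchor or the block is missing. The sample HTML is a hypothetical, trimmed-down stand-in and not Wikipedia's actual markup, so the sketch runs without a network connection.

```python
from typing import Optional

from bs4 import BeautifulSoup


def pre_after_id(html: str, anchor_id: str) -> Optional[str]:
    """Return the text of the first <pre> after the element whose id is anchor_id."""
    soup = BeautifulSoup(html, "html.parser")
    anchor = soup.find(id=anchor_id)
    if anchor is None:
        return None  # the anchor is not on the page
    pre = anchor.find_next("pre")  # first <pre> later in document order
    return pre.get_text() if pre is not None else None


# Hypothetical, trimmed-down markup (not the real Wikipedia page).
sample = """
<h2 id="Example">Example</h2>
<table class="wikitable"><tr><td>1997</td></tr></table>
<pre>Year,Make
1997,Ford
</pre>
"""

print(repr(pre_after_id(sample, "Example")))  # 'Year,Make\n1997,Ford\n'
```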

In [5]:
# method 2: start from the first table with class 'wikitable' and take the next <pre> block
info = soup.find('table', class_='wikitable').find_next('pre').text
info

'Year,Make,Model,Description,Price\n1997,Ford,E350,"ac, abs, moon",3000.00\n1999,Chevy,"Venture ""Extended Edition""","",4900.00\n1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00\n1996,Jeep,Grand Cherokee,"MUST SELL!\nair, moon roof, loaded",4799.00\n'
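Method 2 anchors on the first `wikitable` on the page rather than on the heading. It therefore works only while no other `<pre>` block sits between that table and the example. The sketch below uses the same hypothetical sample markup and expresses method 2 with the CSS selector `select_one("table.wikitable")`. It then asserts that both methods agree, so a layout change fails loudly instead of silently returning the wrong block.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup (not the real Wikipedia page).
sample = """
<h2 id="Example">Example</h2>
<table class="wikitable"><tr><td>1997</td></tr></table>
<pre>Year,Make
1997,Ford
</pre>
"""
soup = BeautifulSoup(sample, "html.parser")

# Method 1: start from the element with id="Example".
via_heading = soup.find(id="Example").find_next("pre").get_text()

# Method 2: start from the first table with class "wikitable" (CSS selector form).
via_table = soup.select_one("table.wikitable").find_next("pre").get_text()

# If the page layout changes, the two anchors may point at different blocks.
assert via_heading == via_table, "the two extraction methods disagree"
print(repr(via_table))  # 'Year,Make\n1997,Ford\n'
```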

### Save the CSV example to a file

In [6]:
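# newline='' writes the text unchanged, so the line break inside the quoted Jeep field is not translated
with open('car.csv', 'w', newline='', encoding='utf-8') as f:
    f.write(table)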
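Before reloading the file with pandas, you can check the saved text with Python's standard-library `csv` module. The example exercises three CSV quoting rules:

- A quoted field may contain commas (`"ac, abs, moon"`).
- A doubled quote `""` inside a quoted field stands for one literal quote.
- A quoted field may span a line break.

The sketch below parses the string scraped above, pasted here as a literal so the block runs on its own, and confirms that every record has as many fields as the header.

```python
import csv
import io

# The string scraped above, pasted as a literal so this block is self-contained.
table = (
    'Year,Make,Model,Description,Price\n'
    '1997,Ford,E350,"ac, abs, moon",3000.00\n'
    '1999,Chevy,"Venture ""Extended Edition""","",4900.00\n'
    '1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00\n'
    '1996,Jeep,Grand Cherokee,"MUST SELL!\nair, moon roof, loaded",4799.00\n'
)

rows = list(csv.reader(io.StringIO(table)))
header, records = rows[0], rows[1:]

# Every record should have exactly as many fields as the header.
assert all(len(r) == len(header) for r in records)

print(len(records))         # 4 records, although the text has 6 physical lines
print(records[1][2])        # doubled quotes collapse to one: Venture "Extended Edition"
print(repr(records[3][3]))  # the quoted line break survives inside the field
```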


Reload the CSV data from the file to check that it was saved correctly.

In [7]:
import pandas as pd
df = pd.read_csv('car.csv')
df

|   | Year | Make | Model | Description | Price |
|---|------|------|-------|-------------|-------|
| 0 | 1997 | Ford | E350 | ac, abs, moon | 3000.0 |
| 1 | 1999 | Chevy | Venture "Extended Edition" |  | 4900.0 |
| 2 | 1999 | Chevy | Venture "Extended Edition, Very Large" |  | 5000.0 |
| 3 | 1996 | Jeep | Grand Cherokee | MUST SELL!\nair, moon roof, loaded | 4799.0 |
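`pd.read_csv` follows the same quoting rules as the `csv` module and then infers a type for each column. The sketch below reads the same text from an in-memory buffer (`io.StringIO`) instead of `car.csv`. It shows that, under pandas' default `na_values` handling, the two quoted empty fields `""` become missing values (`NaN`). The Jeep description keeps its embedded line break.

```python
import io

import pandas as pd

# The scraped string, pasted as a literal so this block is self-contained.
table = (
    'Year,Make,Model,Description,Price\n'
    '1997,Ford,E350,"ac, abs, moon",3000.00\n'
    '1999,Chevy,"Venture ""Extended Edition""","",4900.00\n'
    '1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00\n'
    '1996,Jeep,Grand Cherokee,"MUST SELL!\nair, moon roof, loaded",4799.00\n'
)

# Reading from a buffer skips the file round trip but parses identically.
df = pd.read_csv(io.StringIO(table))

print(df.shape)                           # (4, 5)
print(df["Description"].isna().tolist())  # [False, True, True, False]
print(repr(df.loc[3, "Description"]))     # 'MUST SELL!\nair, moon roof, loaded'
```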


In [8]:
df.shape

(4, 5)

In [9]:
df.head(2)

|   | Year | Make | Model | Description | Price |
|---|------|------|-------|-------------|-------|
| 0 | 1997 | Ford | E350 | ac, abs, moon | 3000.0 |
| 1 | 1999 | Chevy | Venture "Extended Edition" |  | 4900.0 |


In [10]:
df.tail(2)

|   | Year | Make | Model | Description | Price |
|---|------|------|-------|-------------|-------|
| 2 | 1999 | Chevy | Venture "Extended Edition, Very Large" |  | 5000.0 |
| 3 | 1996 | Jeep | Grand Cherokee | MUST SELL!\nair, moon roof, loaded | 4799.0 |


In [11]:
df.columns

Index(['Year', 'Make', 'Model', 'Description', 'Price'], dtype='object')

In [12]:
df.dtypes

Year             int64
Make            object
Model           object
Description     object
Price          float64
dtype: object
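These dtypes are inferred. `Price` becomes `float64` because its values contain a decimal point, and the text columns stay as generic objects. When a column's type matters later, `read_csv` accepts an explicit `dtype` mapping. The sketch below stores `Make` as a `category`; this is an illustrative choice and not something the original notebook does.

```python
import io

import pandas as pd

# The scraped string, pasted as a literal so this block is self-contained.
table = (
    'Year,Make,Model,Description,Price\n'
    '1997,Ford,E350,"ac, abs, moon",3000.00\n'
    '1999,Chevy,"Venture ""Extended Edition""","",4900.00\n'
    '1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00\n'
    '1996,Jeep,Grand Cherokee,"MUST SELL!\nair, moon roof, loaded",4799.00\n'
)

# Columns not listed in the mapping are still inferred as before.
df = pd.read_csv(
    io.StringIO(table),
    dtype={"Year": "int64", "Make": "category", "Price": "float64"},
)

print(df.dtypes)
print(list(df["Make"].cat.categories))  # ['Chevy', 'Ford', 'Jeep']
```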

## What is the make of the most expensive car?

In [13]:
n = df[df['Price'] == df['Price'].max()].index
n

Int64Index([2], dtype='int64')

In [14]:
print(df.loc[n, 'Make'])


2    Chevy
Name: Make, dtype: object
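The cells above locate the maximum with a boolean mask. `Series.idxmax` does the lookup in one step by returning the index label of the largest value. The two differ on ties. `idxmax` returns only the first matching label, whereas the boolean mask keeps every row that shares the maximum price. The sketch below rebuilds only the columns it needs from the four rows above.

```python
import pandas as pd

# Same four rows as the notebook's DataFrame, limited to the columns needed here.
df = pd.DataFrame(
    {
        "Year": [1997, 1999, 1999, 1996],
        "Make": ["Ford", "Chevy", "Chevy", "Jeep"],
        "Price": [3000.0, 4900.0, 5000.0, 4799.0],
    }
)

label = df["Price"].idxmax()   # index label of the highest price (first one on ties)
print(df.loc[label, "Make"])   # Chevy

# The boolean mask keeps every row that ties for the maximum.
top = df.loc[df["Price"] == df["Price"].max(), "Make"]
print(top.tolist())            # ['Chevy']
```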


## How many cars do we have per year?

In [15]:
df['Year'].value_counts()

1999    2
1997    1
1996    1
Name: Year, dtype: int64
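`value_counts` sorts by frequency, so years with equal counts are not listed chronologically; here 1997 appears before 1996. To list the counts by year instead, sort the result by its index or group by year. The sketch below shows both approaches.

```python
import pandas as pd

# Only the Year column from the notebook's data is needed here.
df = pd.DataFrame({"Year": [1997, 1999, 1999, 1996]})

# Option 1: count, then sort by the year labels instead of by the counts.
by_year = df["Year"].value_counts().sort_index()
print(by_year.index.tolist(), by_year.tolist())  # [1996, 1997, 1999] [1, 1, 2]

# Option 2: groupby sorts its keys by default, so the counts come out in year order.
same = df.groupby("Year").size()
print(same.index.tolist(), same.tolist())        # [1996, 1997, 1999] [1, 1, 2]
```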