# The Python Mega Course: Build 10 Real World Applications
---

This notebook contains the source code for the video lectures of Section 6 of [The Python Mega Course: Build 10 Real World Applications](https://www.udemy.com/the-python-mega-course/?couponCode=GITHEADSECTION).

# Section 6: Data Analysis with Pandas
***

**Lecture 95:** [What is Pandas](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/5163238?start=0)
---

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

**Lecture 96:** [Note on IPython](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/7828164?start=0)
---

You can install IPython from your command line / terminal by running:

In [None]:
pip install ipython

or

In [None]:
pip3 install ipython

**Lecture 97:** [Getting Started with Pandas](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/5163240?start=0)
---

In [25]:
import pandas

In [22]:
#create a dataframe object
df1 = pandas.DataFrame([[2, 4, 6], [10, 20, 30]])
df1

Unnamed: 0,0,1,2
0,2,4,6
1,10,20,30


In [23]:
#dataframe object with named columns
df1 = pandas.DataFrame([[2, 4, 6], [10, 20, 30]], columns=["Price", "Age", "Value"])
df1

Unnamed: 0,Price,Age,Value
0,2,4,6
1,10,20,30


In [28]:
#dataframe object with named columns and rows
df1 = pandas.DataFrame([[2, 4, 6], [10, 20, 30]], columns=["Price", "Age", "Value"], index=["First", "Second"])
df1

Unnamed: 0,Price,Age,Value
First,2,4,6
Second,10,20,30


In [24]:
#dataframe object from a list of dictionaries
df2 = pandas.DataFrame([{"Name":"John"},{"Name":"Jack"}])
df2

Unnamed: 0,Name
0,John
1,Jack


In [26]:
#dataframe with many columns
df2 = pandas.DataFrame([{"Name":"John", "Surname":"Johns"},{"Name":"Jack"}])
df2

Unnamed: 0,Name,Surname
0,John,Johns
1,Jack,
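
When a key is missing from one of the dictionaries, pandas fills the gap with `NaN` (shown as the empty cell above). Here is a minimal sketch of how such gaps could be detected and filled; the placeholder string `"Unknown"` and the variable names are assumptions chosen for illustration:

```python
import pandas

# The second record has no "Surname" key, so pandas stores NaN there
people = pandas.DataFrame([{"Name": "John", "Surname": "Johns"}, {"Name": "Jack"}])

# isna() flags the missing cells
missing = people["Surname"].isna()
print(missing.tolist())

# fillna() returns a new Series with the gaps replaced
filled = people["Surname"].fillna("Unknown")
print(filled.tolist())
```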


In [29]:
#get the object type
type(df1)

pandas.core.frame.DataFrame

In [30]:
#get the mean value of each dataframe column
df1.mean()

Price     6.0
Age      12.0
Value    18.0
dtype: float64

In [31]:
#the mean value of all columns
df1.mean().mean()

12.0
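
The mean of the column means equals the overall mean here only because every column holds the same number of values. As a sketch of two related calls, `axis=1` averages across each row instead, and converting to a NumPy array averages every cell directly (the variable names are illustrative):

```python
import pandas

prices = pandas.DataFrame([[2, 4, 6], [10, 20, 30]],
                          columns=["Price", "Age", "Value"],
                          index=["First", "Second"])

# axis=1 computes one mean per row instead of one per column
row_means = prices.mean(axis=1)
print(row_means)

# Mean over every cell, independent of how the values are split into columns
overall = prices.to_numpy().mean()
print(overall)
```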

In [32]:
#accessing a dataframe column
df1["Price"]

First      2
Second    10
Name: Price, dtype: int64

In [33]:
#another way to access the dataframe column
df1.Price

First      2
Second    10
Name: Price, dtype: int64

In [34]:
#the mean value of a column
df1.Price.mean()

6.0
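
Attribute access such as `df1.Price` only works when the column name is a valid Python identifier that does not clash with an existing dataframe attribute. The sketch below uses a made-up dataframe to show two cases where bracket access is the safer choice:

```python
import pandas

shops = pandas.DataFrame({"Price": [2, 10],
                          "Number of Employees": [8, 15],
                          "count": [1, 2]})

# Bracket access works for any column name, including names with spaces
print(shops["Number of Employees"].mean())

# Attribute access resolves to the DataFrame.count method, not the column
print(callable(shops.count))

# The column itself is still reachable with brackets
print(shops["count"].tolist())
```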

**Lecture 98:** [Getting Started with Jupyter Notebooks](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/5163242?start=0)
---

This lecture contains a tutorial on how to use Jupyter Notebooks.

**Lecture 99:** [Note on Loading Excel Files](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/7828274?start=0)
---

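In the next lecture you will learn, among other things, how to load Excel (.xlsx) files in Python with pandas. Pandas relies on a separate library to read Excel files. Recent pandas versions read .xlsx files with openpyxl, so an error that names 'openpyxl' is fixed the same way as below, by installing that package. Older versions use xlrd; if you get an error such as *ModuleNotFoundError: No module named 'xlrd'*, you can fix that by installing xlrd: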
<br>
*pip install xlrd*
<br>
or
<br>
*pip3 install xlrd*

**Lecture 100:** [Loading Data in Python from CSV, Excel, TXT, and JSON Files](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/5163246?start=0)
---

Please download the five files from [this link](http://pythonhow.com/data/python-mega-course-data/section6/).

In [36]:
import pandas

In [37]:
#loading data from a CSV file
df1 = pandas.read_csv("supermarkets.csv")
df1

Unnamed: 0,ID,Address,City,State,Country,Name,Employees
0,1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
1,2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
2,3,332 Hill St,San Francisco,California 94114,USA,Super River,25
3,4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
4,5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
5,6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20


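The CSV file above has a first row that pandas treated as the header of the dataframe. If your data has no header row, set `header` to `None` so that pandas numbers the columns instead. Note that supermarkets.csv does have a header, so with `header=None` that row appears as the first data row: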

In [38]:
#loading data with no header from a CSV
df1 = pandas.read_csv("supermarkets.csv", header=None)
df1

Unnamed: 0,0,1,2,3,4,5,6
0,ID,Address,City,State,Country,Name,Employees
1,1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
2,2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
3,3,332 Hill St,San Francisco,California 94114,USA,Super River,25
4,4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
5,5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
6,6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20
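
When a file really has no header row, the `names` parameter of `read_csv` lets you supply column labels yourself. Here is a small sketch that uses an in-memory string in place of a file; the data and the column names are made up for illustration:

```python
import io
import pandas

# Two data rows and no header line (illustrative data, not the course file)
raw = "1,Madeira,8\n2,Bready Shop,15\n"

# Without names, the columns are labelled 0, 1, 2
numbered = pandas.read_csv(io.StringIO(raw), header=None)
print(numbered.columns.tolist())

# With names, the supplied labels are used instead
named = pandas.read_csv(io.StringIO(raw), header=None,
                        names=["ID", "Name", "Employees"])
print(named)
```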


In [68]:
#setting one of the columns as an index column
df1 = pandas.read_csv("supermarkets.csv")
df1.set_index("ID", inplace=True)
df1

ID,Address,City,State,Country,Name,Employees
1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
3,332 Hill St,San Francisco,California 94114,USA,Super River,25
4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20
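
As an alternative to calling `set_index` afterwards, `read_csv` accepts an `index_col` parameter that sets the index while the file is being read. A minimal sketch with an in-memory string standing in for the file (the data is illustrative):

```python
import io
import pandas

raw = "ID,Name,Employees\n1,Madeira,8\n2,Bready Shop,15\n"

# index_col makes the ID column the index during loading
shops = pandas.read_csv(io.StringIO(raw), index_col="ID")
print(shops)
print(shops.index.name)
```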


In [69]:
#get the number of columns and rows
df1.shape

(6, 6)

In [70]:
#loading data from a JSON file
df2 = pandas.read_json("supermarkets.json")
df2.set_index("ID", inplace=True)
df2

ID,Address,City,Country,Employees,Name,State
1,3666 21st St,San Francisco,USA,8,Madeira,CA 94114
2,735 Dolores St,San Francisco,USA,15,Bready Shop,CA 94119
3,332 Hill St,San Francisco,USA,25,Super River,California 94114
4,3995 23rd St,San Francisco,USA,10,Ben's Shop,CA 94114
5,1056 Sanchez St,San Francisco,USA,12,Sanchez,California
6,551 Alvarado St,San Francisco,USA,20,Richvalley,CA 94114
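
To see what kind of JSON `read_json` can turn into a dataframe, here is a sketch that parses a small list of records from an in-memory string. The layout of supermarkets.json is not shown in this notebook, so the records layout and the sample values below are assumptions for illustration only:

```python
import io
import pandas

# A JSON array in which each object is one row (assumed layout)
raw = ('[{"ID": 1, "Name": "Madeira", "Employees": 8},'
       ' {"ID": 2, "Name": "Bready Shop", "Employees": 15}]')

shops = pandas.read_json(io.StringIO(raw), orient="records")

# set_index returns a new dataframe when inplace is not used
shops = shops.set_index("ID")
print(shops)
```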


In [46]:
#loading data from an Excel file (assuming the data is in the first sheet)
df3 = pandas.read_excel("supermarkets.xlsx", sheet_name=0)
df3

Unnamed: 0,ID,Address,City,State,Country,Supermarket Name,Number of Employees
0,1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
1,2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
2,3,332 Hill St,San Francisco,California 94114,USA,Super River,25
3,4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
4,5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
5,6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20


In [49]:
#loading data from a TXT file with columns divided by commas
df4 = pandas.read_csv("supermarkets-commas.txt")
df4

Unnamed: 0,ID,Address,City,State,Country,Name,Employees
0,1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
1,2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
2,3,332 Hill St,San Francisco,California 94114,USA,Super River,25
3,4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
4,5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
5,6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20


In [53]:
#loading data from a TXT file with columns divided by semi-colons
df5 = pandas.read_csv("supermarkets-semi-colons.txt", sep=";")
df5

Unnamed: 0,ID,Address,City,State,Country,Name,Employees
0,1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
1,2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
2,3,332 Hill St,San Francisco,California 94114,USA,Super River,25
3,4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
4,5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
5,6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20


In [54]:
#loading data from a URL of a CSV file
df6 = pandas.read_csv("http://pythonhow.com/supermarkets.csv")
df6

Unnamed: 0,ID,Address,City,State,Country,Name,Employees
0,1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
1,2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
2,3,332 Hill St,San Francisco,California 94114,USA,Super River,25
3,4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
4,5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
5,6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20


In [56]:
#loading data from a URL of a JSON file
df7 = pandas.read_json("http://pythonhow.com/supermarkets.json")
df7

Unnamed: 0,Address,City,Country,Employees,ID,Name,State
0,3666 21st St,San Francisco,USA,8,1,Madeira,CA 94114
1,735 Dolores St,San Francisco,USA,15,2,Bready Shop,CA 94119
2,332 Hill St,San Francisco,USA,25,3,Super River,California 94114
3,3995 23rd St,San Francisco,USA,10,4,Ben's Shop,CA 94114
4,1056 Sanchez St,San Francisco,USA,12,5,Sanchez,California
5,551 Alvarado St,San Francisco,USA,20,6,Richvalley,CA 94114


**Lecture 101:** [Indexing and Slicing Dataframes](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/5163250?start=0)
---

Let's suppose we have the following dataframe:

In [76]:
df8 = pandas.read_json("http://pythonhow.com/supermarkets.json")
df8.set_index("Address", inplace=True)
df8

Address,City,Country,Employees,ID,Name,State
3666 21st St,San Francisco,USA,8,1,Madeira,CA 94114
735 Dolores St,San Francisco,USA,15,2,Bready Shop,CA 94119
332 Hill St,San Francisco,USA,25,3,Super River,California 94114
3995 23rd St,San Francisco,USA,10,4,Ben's Shop,CA 94114
1056 Sanchez St,San Francisco,USA,12,5,Sanchez,California
551 Alvarado St,San Francisco,USA,20,6,Richvalley,CA 94114


In [77]:
#Extracting a portion of a dataframe based on labels
df8.loc["735 Dolores St":"332 Hill St", "Country":"ID"]

Address,Country,Employees,ID
735 Dolores St,USA,15,2
332 Hill St,USA,25,3


In [80]:
#Extracting all rows (:) of the column Country
df8.loc[:, "Country"]

Address
3666 21st St       USA
735 Dolores St     USA
332 Hill St        USA
3995 23rd St       USA
1056 Sanchez St    USA
551 Alvarado St    USA
Name: Country, dtype: object

In [81]:
#Extracting all rows (:) of the column Country as a list
list(df8.loc[:, "Country"])

['USA', 'USA', 'USA', 'USA', 'USA', 'USA']

In [85]:
#here is the dataframe again
df8

Address,City,Country,Employees,ID,Name,State
3666 21st St,San Francisco,USA,8,1,Madeira,CA 94114
735 Dolores St,San Francisco,USA,15,2,Bready Shop,CA 94119
332 Hill St,San Francisco,USA,25,3,Super River,California 94114
3995 23rd St,San Francisco,USA,10,4,Ben's Shop,CA 94114
1056 Sanchez St,San Francisco,USA,12,5,Sanchez,California
551 Alvarado St,San Francisco,USA,20,6,Richvalley,CA 94114


In [82]:
#Extracting a portion of a dataframe based on index numbers
df8.iloc[1:3, 1:3]

Address,Country,Employees
735 Dolores St,USA,15
332 Hill St,USA,25
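
One difference between the two accessors is easy to miss: a `loc` slice includes its end label, while an `iloc` slice excludes its end position, as ordinary Python slices do. A minimal sketch on a small made-up dataframe:

```python
import pandas

shops = pandas.DataFrame(
    {"Country": ["USA", "USA", "USA"], "Employees": [8, 15, 25], "ID": [1, 2, 3]},
    index=["3666 21st St", "735 Dolores St", "332 Hill St"],
)

# Label slice: both end labels are included (2 rows, 2 columns)
by_label = shops.loc["735 Dolores St":"332 Hill St", "Country":"Employees"]
print(by_label.shape)

# Position slice: the stop position is excluded (1 row, 1 column)
by_position = shops.iloc[1:2, 0:1]
print(by_position.shape)
```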


In [87]:
#Intersection between the row at position 3 and the columns at positions 1 to 3 (the stop value 4 is excluded)
df8.iloc[3, 1:4]

Country      USA
Employees     10
ID             4
Name: 3995 23rd St, dtype: object

**Lecture 102:** [Dropping Dataframe Columns and Rows](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/5163254?start=0)
---

Let's again suppose we have the following dataframe:

In [98]:
df8 = pandas.read_json("http://pythonhow.com/supermarkets.json")
df8.set_index("Address", inplace=True)
df8

Address,City,Country,Employees,ID,Name,State
3666 21st St,San Francisco,USA,8,1,Madeira,CA 94114
735 Dolores St,San Francisco,USA,15,2,Bready Shop,CA 94119
332 Hill St,San Francisco,USA,25,3,Super River,California 94114
3995 23rd St,San Francisco,USA,10,4,Ben's Shop,CA 94114
1056 Sanchez St,San Francisco,USA,12,5,Sanchez,California
551 Alvarado St,San Francisco,USA,20,6,Richvalley,CA 94114


In [99]:
#deleting the column City. axis=1 means you're deleting columns
df8.drop("City", axis=1)

Address,Country,Employees,ID,Name,State
3666 21st St,USA,8,1,Madeira,CA 94114
735 Dolores St,USA,15,2,Bready Shop,CA 94119
332 Hill St,USA,25,3,Super River,California 94114
3995 23rd St,USA,10,4,Ben's Shop,CA 94114
1056 Sanchez St,USA,12,5,Sanchez,California
551 Alvarado St,USA,20,6,Richvalley,CA 94114


Note that the above simply returns a new dataframe without the City column; it doesn't change the original dataframe:

In [100]:
#the original dataframe still has the City column
df8

Address,City,Country,Employees,ID,Name,State
3666 21st St,San Francisco,USA,8,1,Madeira,CA 94114
735 Dolores St,San Francisco,USA,15,2,Bready Shop,CA 94119
332 Hill St,San Francisco,USA,25,3,Super River,California 94114
3995 23rd St,San Francisco,USA,10,4,Ben's Shop,CA 94114
1056 Sanchez St,San Francisco,USA,12,5,Sanchez,California
551 Alvarado St,San Francisco,USA,20,6,Richvalley,CA 94114


In [101]:
#to change the original dataframe set inplace to True
df8.drop("City", axis=1, inplace=True)
df8

Address,Country,Employees,ID,Name,State
3666 21st St,USA,8,1,Madeira,CA 94114
735 Dolores St,USA,15,2,Bready Shop,CA 94119
332 Hill St,USA,25,3,Super River,California 94114
3995 23rd St,USA,10,4,Ben's Shop,CA 94114
1056 Sanchez St,USA,12,5,Sanchez,California
551 Alvarado St,USA,20,6,Richvalley,CA 94114
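
Instead of `inplace=True`, you can assign the returned dataframe to a variable, which keeps the original available under its old name. A short sketch on a made-up dataframe:

```python
import pandas

shops = pandas.DataFrame({"City": ["San Francisco", "San Francisco"],
                          "Name": ["Madeira", "Sanchez"]})

# drop() returns a new dataframe; the result is stored under a new name
trimmed = shops.drop("City", axis=1)
print(trimmed.columns.tolist())

# The original dataframe still has both columns
print(shops.columns.tolist())
```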


In [102]:
#delete a row
df8.drop("332 Hill St", axis=0)

Address,Country,Employees,ID,Name,State
3666 21st St,USA,8,1,Madeira,CA 94114
735 Dolores St,USA,15,2,Bready Shop,CA 94119
3995 23rd St,USA,10,4,Ben's Shop,CA 94114
1056 Sanchez St,USA,12,5,Sanchez,California
551 Alvarado St,USA,20,6,Richvalley,CA 94114


In [103]:
#delete multiple rows
df8.drop(df8.index[0:3], axis=0)

Address,Country,Employees,ID,Name,State
3995 23rd St,USA,10,4,Ben's Shop,CA 94114
1056 Sanchez St,USA,12,5,Sanchez,California
551 Alvarado St,USA,20,6,Richvalley,CA 94114


In [104]:
#delete multiple columns
df8.drop(df8.columns[0:3], axis=1)

Address,Name,State
3666 21st St,Madeira,CA 94114
735 Dolores St,Bready Shop,CA 94119
332 Hill St,Super River,California 94114
3995 23rd St,Ben's Shop,CA 94114
1056 Sanchez St,Sanchez,California
551 Alvarado St,Richvalley,CA 94114
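
`drop` also accepts the keyword arguments `index` and `columns`, which take a label or a list of labels and make the axis explicit. A minimal sketch on a made-up dataframe:

```python
import pandas

shops = pandas.DataFrame(
    {"Country": ["USA", "USA", "USA"],
     "ID": [1, 2, 3],
     "Name": ["Madeira", "Bready Shop", "Super River"]},
    index=["3666 21st St", "735 Dolores St", "332 Hill St"],
)

# Remove two rows by their index labels
fewer_rows = shops.drop(index=["3666 21st St", "332 Hill St"])
print(fewer_rows)

# Remove two columns by name
fewer_columns = shops.drop(columns=["Country", "ID"])
print(fewer_columns)
```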


**Lecture 103:** [Updating and Adding new Columns and Rows](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/5163256?start=0)
---

Again, we're going to work with the following dataframe:

In [105]:
df8 = pandas.read_json("http://pythonhow.com/supermarkets.json")
df8.set_index("Address", inplace=True)
df8

Address,City,Country,Employees,ID,Name,State
3666 21st St,San Francisco,USA,8,1,Madeira,CA 94114
735 Dolores St,San Francisco,USA,15,2,Bready Shop,CA 94119
332 Hill St,San Francisco,USA,25,3,Super River,California 94114
3995 23rd St,San Francisco,USA,10,4,Ben's Shop,CA 94114
1056 Sanchez St,San Francisco,USA,12,5,Sanchez,California
551 Alvarado St,San Francisco,USA,20,6,Richvalley,CA 94114


In [106]:
#adding a new column
df8["Continent"] = "North America"
df8

Address,City,Country,Employees,ID,Name,State,Continent
3666 21st St,San Francisco,USA,8,1,Madeira,CA 94114,North America
735 Dolores St,San Francisco,USA,15,2,Bready Shop,CA 94119,North America
332 Hill St,San Francisco,USA,25,3,Super River,California 94114,North America
3995 23rd St,San Francisco,USA,10,4,Ben's Shop,CA 94114,North America
1056 Sanchez St,San Francisco,USA,12,5,Sanchez,California,North America
551 Alvarado St,San Francisco,USA,20,6,Richvalley,CA 94114,North America


In [108]:
#changing the values of an existing column
df8["Continent"] = df8["Country"] + ", " + "North America"
df8

Address,City,Country,Employees,ID,Name,State,Continent
3666 21st St,San Francisco,USA,8,1,Madeira,CA 94114,"USA, North America"
735 Dolores St,San Francisco,USA,15,2,Bready Shop,CA 94119,"USA, North America"
332 Hill St,San Francisco,USA,25,3,Super River,California 94114,"USA, North America"
3995 23rd St,San Francisco,USA,10,4,Ben's Shop,CA 94114,"USA, North America"
1056 Sanchez St,San Francisco,USA,12,5,Sanchez,California,"USA, North America"
551 Alvarado St,San Francisco,USA,20,6,Richvalley,CA 94114,"USA, North America"


In [110]:
#adding a new row
df8_t = df8.T
df8_t["My Address"] = ["My City", "My Country", 10, 7, "My Shop", "My State", "My Continent"]
df8 = df8_t.T
df8

Address,City,Country,Employees,ID,Name,State,Continent
3666 21st St,San Francisco,USA,8,1,Madeira,CA 94114,"USA, North America"
735 Dolores St,San Francisco,USA,15,2,Bready Shop,CA 94119,"USA, North America"
332 Hill St,San Francisco,USA,25,3,Super River,California 94114,"USA, North America"
3995 23rd St,San Francisco,USA,10,4,Ben's Shop,CA 94114,"USA, North America"
1056 Sanchez St,San Francisco,USA,12,5,Sanchez,California,"USA, North America"
551 Alvarado St,San Francisco,USA,20,6,Richvalley,CA 94114,"USA, North America"
My Address,My City,My Country,10,7,My Shop,My State,My Continent
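
The transpose approach works, but it has a side effect: transposing a dataframe with mixed column types stores every value as a generic Python object, so numeric columns lose their numeric dtype. An alternative sketch, using a made-up dataframe, assigns the new row directly through `loc`:

```python
import pandas

shops = pandas.DataFrame(
    {"City": ["San Francisco", "San Francisco"], "Employees": [8, 15]},
    index=["3666 21st St", "735 Dolores St"],
)

# Assigning to an index label that does not exist yet appends a row
shops.loc["My Address"] = ["My City", 10]
print(shops)

# Transposing twice turns the numeric Employees column into dtype object
round_trip = shops.T.T
print(round_trip["Employees"].dtype)
```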


**Lecture 104:** [Example: Geocoding Addresses with Pandas and Geopy](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/5163258?start=0)
---

This lecture shows how to convert (geocode) address strings to geographical latitude and longitude coordinates. For that you need to install geopy with:<br> *pip install geopy*<br>

In [5]:
#import the ArcGIS geocoder
from geopy.geocoders import ArcGIS

In [6]:
#create a geocoder object
nom = ArcGIS()

In [8]:
#geocode an address
nom.geocode("3995 23rd St, San Francisco, CA 94114")

Location(3995 23rd St, San Francisco, California, 94114, (37.75298458728149, -122.4317017142651, 0.0))

In [9]:
#you can also store the results in a variable
n = nom.geocode("3995 23rd St, San Francisco, CA 94114")

In [10]:
#you can see that n is a location object of geopy
type(n)

geopy.location.Location

In [11]:
#extract latitude and longitude
print(n.latitude, n.longitude)

37.75298458728149 -122.4317017142651


In [12]:
#let's load some csv data with addresses
import pandas
df = pandas.read_csv("supermarkets.csv")
df

Unnamed: 0,ID,Address,City,State,Country,Name,Employees
0,1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
1,2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
2,3,332 Hill St,San Francisco,California 94114,USA,Super River,25
3,4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
4,5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
5,6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20


In [13]:
#update the address column so it also has city, state, and country
df["Address"] = df["Address"] + ", " + df["City"] + ", " + df["State"] + ", " + df["Country"]
df

Unnamed: 0,ID,Address,City,State,Country,Name,Employees
0,1,"3666 21st St, San Francisco, CA 94114, USA",San Francisco,CA 94114,USA,Madeira,8
1,2,"735 Dolores St, San Francisco, CA 94119, USA",San Francisco,CA 94119,USA,Bready Shop,15
2,3,"332 Hill St, San Francisco, California 94114, USA",San Francisco,California 94114,USA,Super River,25
3,4,"3995 23rd St, San Francisco, CA 94114, USA",San Francisco,CA 94114,USA,Ben's Shop,10
4,5,"1056 Sanchez St, San Francisco, California, USA",San Francisco,California,USA,Sanchez,12
5,6,"551 Alvarado St, San Francisco, CA 94114, USA",San Francisco,CA 94114,USA,Richvalley,20


In [14]:
#create a coordinates column to store the geocoded values
df["Coordinates"] = df["Address"].apply(nom.geocode)
df

Unnamed: 0,ID,Address,City,State,Country,Name,Employees,Coordinates
0,1,"3666 21st St, San Francisco, CA 94114, USA",San Francisco,CA 94114,USA,Madeira,8,"(3666 21st St, San Francisco, California, 9411..."
1,2,"735 Dolores St, San Francisco, CA 94119, USA",San Francisco,CA 94119,USA,Bready Shop,15,"(735 Dolores St, San Francisco, California, 94..."
2,3,"332 Hill St, San Francisco, California 94114, USA",San Francisco,California 94114,USA,Super River,25,"(332 Hill St, San Francisco, California, 94114..."
3,4,"3995 23rd St, San Francisco, CA 94114, USA",San Francisco,CA 94114,USA,Ben's Shop,10,"(3995 23rd St, San Francisco, California, 9411..."
4,5,"1056 Sanchez St, San Francisco, California, USA",San Francisco,California,USA,Sanchez,12,"(1056 Sanchez St, San Francisco, California, 9..."
5,6,"551 Alvarado St, San Francisco, CA 94114, USA",San Francisco,CA 94114,USA,Richvalley,20,"(551 Alvarado St, San Francisco, California, 9..."


In [15]:
#extracting the latitude of the first value of the Coordinates column
df.Coordinates[0].latitude

37.75644449321933

In [16]:
#create latitude and longitude columns
df["Latitude"] = df["Coordinates"].apply(lambda x: x.latitude if x is not None else None)
df["Longitude"] = df["Coordinates"].apply(lambda x: x.longitude if x is not None else None)
df

Unnamed: 0,ID,Address,City,State,Country,Name,Employees,Coordinates,Latitude,Longitude
0,1,"3666 21st St, San Francisco, CA 94114, USA",San Francisco,CA 94114,USA,Madeira,8,"(3666 21st St, San Francisco, California, 9411...",37.756444,-122.429353
1,2,"735 Dolores St, San Francisco, CA 94119, USA",San Francisco,CA 94119,USA,Bready Shop,15,"(735 Dolores St, San Francisco, California, 94...",37.757801,-122.425599
2,3,"332 Hill St, San Francisco, California 94114, USA",San Francisco,California 94114,USA,Super River,25,"(332 Hill St, San Francisco, California, 94114...",37.755641,-122.428794
3,4,"3995 23rd St, San Francisco, CA 94114, USA",San Francisco,CA 94114,USA,Ben's Shop,10,"(3995 23rd St, San Francisco, California, 9411...",37.752985,-122.431702
4,5,"1056 Sanchez St, San Francisco, California, USA",San Francisco,California,USA,Sanchez,12,"(1056 Sanchez St, San Francisco, California, 9...",37.752144,-122.429758
5,6,"551 Alvarado St, San Francisco, CA 94114, USA",San Francisco,CA 94114,USA,Richvalley,20,"(551 Alvarado St, San Francisco, California, 9...",37.753709,-122.433251
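
The `None` check matters because a geocoder returns `None` when it cannot resolve an address. The sketch below reproduces that situation without any network access: `FakeLocation` and `fake_geocode` are hypothetical stand-ins written for this example, not part of geopy, and the second address is made up.

```python
from dataclasses import dataclass

import pandas


@dataclass
class FakeLocation:
    """Hypothetical stand-in exposing only latitude and longitude."""
    latitude: float
    longitude: float


def fake_geocode(address):
    """Pretend geocoder: returns None for any address it does not know."""
    known = {
        "3995 23rd St, San Francisco, CA 94114, USA": FakeLocation(37.752985, -122.431702),
    }
    return known.get(address)


addresses = pandas.DataFrame(
    {"Address": ["3995 23rd St, San Francisco, CA 94114, USA", "Nowhere Lane 0"]}
)
addresses["Coordinates"] = addresses["Address"].apply(fake_geocode)

# The guard yields a missing value for failed lookups instead of raising AttributeError
addresses["Latitude"] = addresses["Coordinates"].apply(
    lambda loc: loc.latitude if loc is not None else None)
addresses["Longitude"] = addresses["Coordinates"].apply(
    lambda loc: loc.longitude if loc is not None else None)
print(addresses[["Address", "Latitude", "Longitude"]])
```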
