## Pandas Basics

The table has one row for each album and several columns.

- **artist:** Name of the artist
- **album:** Name of the album
- **released_year:** Year the album was released
- **length_min_sec:** Length of the album (hours:minutes:seconds)
- **genre:** Genre of the album
- **music_recording_sales_millions:** Music recording sales (millions in USD) on [SONG://DATABASE]
- **claimed_sales_millions:** Album's claimed sales (millions in USD) on [SONG://DATABASE]
- **date_released:** Date on which the album was released
- **soundtrack:** Indicates whether the album is a movie soundtrack (Y) or not (N)
- **rating_of_friends:** Rating given by your friends, from 1 to 10


variable = feature = column

record = row

dataframe = dataset


### Importing the Dataset

In [2]:
import pandas as pd

In [3]:
# read dataset
df = pd.read_csv("TopSellingAlbums.csv")

In [4]:
type(df)

pandas.core.frame.DataFrame

In [7]:
# display the top 5 rows
df.head()

Unnamed: 0,Artist,Album,Released,Length,Genre,Music Recording Sales (millions),Claimed Sales (millions),Released.1,Soundtrack,Rating
0,Michael Jackson,Thriller,1982,0:42:19,"pop, rock, R&B",46.0,65,30-Nov-82,,10.0
1,AC/DC,Back in Black,1980,0:42:11,hard rock,26.1,50,25-Jul-80,,9.5
2,Pink Floyd,The Dark Side of the Moon,1973,0:42:49,progressive rock,24.2,45,01-Mar-73,,9.0
3,Whitney Houston,The Bodyguard,1992,0:57:44,"R&B, soul, pop",27.4,44,17-Nov-92,Y,8.5
4,Meat Loaf,Bat Out of Hell,1977,0:46:33,"hard rock, progressive rock",20.6,43,21-Oct-77,,8.0


In [6]:
# display the last 5 rows
df.tail()

Unnamed: 0,Artist,Album,Released,Length,Genre,Music Recording Sales (millions),Claimed Sales (millions),Released.1,Soundtrack,Rating
3,Whitney Houston,The Bodyguard,1992,0:57:44,"R&B, soul, pop",27.4,44,17-Nov-92,Y,8.5
4,Meat Loaf,Bat Out of Hell,1977,0:46:33,"hard rock, progressive rock",20.6,43,21-Oct-77,,8.0
5,Eagles,Their Greatest Hits (1971-1975),1976,0:43:08,"rock, soft rock, folk rock",32.2,42,17-Feb-76,,7.5
6,Bee Gees,Saturday Night Fever,1977,1:15:54,disco,20.6,40,15-Nov-77,Y,7.0
7,Fleetwood Mac,Rumours,1977,0:40:01,soft rock,27.9,40,04-Feb-77,,6.5


In [None]:
# reading excel file
# df_excel = pd.read_excel("")

- We can select the column "Length" and assign it to a new DataFrame `x`.

In [9]:
# double brackets return a 2-D DataFrame, not an array
x = df[['Length']]
type(x)

pandas.core.frame.DataFrame

In [10]:
x

Unnamed: 0,Length
0,0:42:19
1,0:42:11
2,0:42:49
3,0:57:44
4,0:46:33
5,0:43:08
6,1:15:54
7,0:40:01


- You can also select a column as a Series by using just one pair of brackets. You can think of a pandas Series as a 1-D counterpart of a DataFrame.

In [15]:
y = df['Album']
type(y)

pandas.core.series.Series

In [16]:
y

0                           Thriller
1                      Back in Black
2          The Dark Side of the Moon
3                      The Bodyguard
4                    Bat Out of Hell
5    Their Greatest Hits (1971-1975)
6               Saturday Night Fever
7                            Rumours
Name: Album, dtype: object

In [17]:
y.values

array(['Thriller', 'Back in Black', 'The Dark Side of the Moon',
       'The Bodyguard', 'Bat Out of Hell',
       'Their Greatest Hits (1971-1975)', 'Saturday Night Fever',
       'Rumours'], dtype=object)

- You can also convert the above array to a list.

In [19]:
y.values.tolist()

['Thriller',
 'Back in Black',
 'The Dark Side of the Moon',
 'The Bodyguard',
 'Bat Out of Hell',
 'Their Greatest Hits (1971-1975)',
 'Saturday Night Fever',
 'Rumours']
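The difference between single and double brackets can be checked on a small frame. This is a minimal sketch, assuming a hypothetical three-row `sample` built by hand with the same column names, so it runs without the CSV file.

```python
import pandas as pd

# Hypothetical three-row sample using column names from the table above
sample = pd.DataFrame({
    "Artist": ["Michael Jackson", "AC/DC", "Pink Floyd"],
    "Album": ["Thriller", "Back in Black", "The Dark Side of the Moon"],
    "Released": [1982, 1980, 1973],
})

as_series = sample["Album"]    # single brackets -> Series (1-D)
as_frame = sample[["Album"]]   # double brackets -> DataFrame (2-D)

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)

# Series.to_list() is a shorter route to a plain Python list
albums = as_series.to_list()
print(albums)
```

`to_list()` gives the same result as `.values.tolist()` here and saves one step.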

- selecting multiple columns

In [20]:
# to select multiple columns, pass a list of column names (hence the double square brackets)
y = df[['Length', 'Artist', 'Genre']]
y

Unnamed: 0,Length,Artist,Genre
0,0:42:19,Michael Jackson,"pop, rock, R&B"
1,0:42:11,AC/DC,hard rock
2,0:42:49,Pink Floyd,progressive rock
3,0:57:44,Whitney Houston,"R&B, soul, pop"
4,0:46:33,Meat Loaf,"hard rock, progressive rock"
5,0:43:08,Eagles,"rock, soft rock, folk rock"
6,1:15:54,Bee Gees,disco
7,0:40:01,Fleetwood Mac,soft rock


- One way to access individual elements is with the `iloc` and `loc` indexers, for example to get the value at a given row and column.
- They work like indexing and slicing on a list.
- `iloc` selects by integer position and `loc` selects by label.

**iloc** -> integer location

df.iloc[row , column]

In [21]:
df.iloc[0,2]

1982

In [22]:
df.iloc[3:5, 3:6]

Unnamed: 0,Length,Genre,Music Recording Sales (millions)
3,0:57:44,"R&B, soul, pop",27.4
4,0:46:33,"hard rock, progressive rock",20.6


In [23]:
df.iloc[0, 0:2]

Artist    Michael Jackson
Album            Thriller
Name: 0, dtype: object

In [26]:
df.iloc[6:8, 4:6]

Unnamed: 0,Genre,Music Recording Sales (millions)
6,disco,20.6
7,soft rock,27.9


In [27]:
df.iloc[0:2, 0:6]

Unnamed: 0,Artist,Album,Released,Length,Genre,Music Recording Sales (millions)
0,Michael Jackson,Thriller,1982,0:42:19,"pop, rock, R&B",46.0
1,AC/DC,Back in Black,1980,0:42:11,hard rock,26.1


In [30]:
# rows only (all columns are returned)
df.iloc[0:2]

Unnamed: 0,Artist,Album,Released,Length,Genre,Music Recording Sales (millions),Claimed Sales (millions),Released.1,Soundtrack,Rating
0,Michael Jackson,Thriller,1982,0:42:19,"pop, rock, R&B",46.0,65,30-Nov-82,,10.0
1,AC/DC,Back in Black,1980,0:42:11,hard rock,26.1,50,25-Jul-80,,9.5


In [31]:
# columns only (all rows are returned)
df.iloc[:,0:3]

Unnamed: 0,Artist,Album,Released
0,Michael Jackson,Thriller,1982
1,AC/DC,Back in Black,1980
2,Pink Floyd,The Dark Side of the Moon,1973
3,Whitney Houston,The Bodyguard,1992
4,Meat Loaf,Bat Out of Hell,1977
5,Eagles,Their Greatest Hits (1971-1975),1976
6,Bee Gees,Saturday Night Fever,1977
7,Fleetwood Mac,Rumours,1977


- We can also pass the row and column positions as lists.

In [33]:
df.iloc[[6,2,5], [5,1]]

Unnamed: 0,Music Recording Sales (millions),Album
6,20.6,Saturday Night Fever
2,24.2,The Dark Side of the Moon
5,32.2,Their Greatest Hits (1971-1975)
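The positional rules used in the `iloc` cells above can be checked on a small hand-built frame. This is only a sketch: `sample` is a hypothetical four-row stand-in for the album table, not the CSV file itself.

```python
import pandas as pd

# Hypothetical four-row stand-in for the album table
sample = pd.DataFrame({
    "Artist": ["Michael Jackson", "AC/DC", "Pink Floyd", "Whitney Houston"],
    "Album": ["Thriller", "Back in Black",
              "The Dark Side of the Moon", "The Bodyguard"],
    "Released": [1982, 1980, 1973, 1992],
})

first_year = sample.iloc[0, 2]        # row 0, column 2 -> a single value
block = sample.iloc[1:3, 0:2]         # rows 1-2 and columns 0-1; the stop is excluded
last_row = sample.iloc[-1]            # negative positions count from the end
picked = sample.iloc[[3, 0], [2, 1]]  # lists keep the order you ask for

print(first_year)
print(block)
print(last_row)
print(picked)
```

Slices in `iloc` behave like Python list slices, so `1:3` stops before position 3.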


- There is another indexer called `loc`, which uses row and column labels instead of integer positions.
- With `loc`, both ends of a slice are included in the result.

In [36]:
# if the index labels were a, b, c, d, ... you would pass those labels as the rows in loc
# here the index labels are 0, 1, 2, 3, ...
df.loc[0:2, 'Artist':'Genre']

Unnamed: 0,Artist,Album,Released,Length,Genre
0,Michael Jackson,Thriller,1982,0:42:19,"pop, rock, R&B"
1,AC/DC,Back in Black,1980,0:42:11,hard rock
2,Pink Floyd,The Dark Side of the Moon,1973,0:42:49,progressive rock
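To see what the comment about an a, b, c index means in practice, here is a small sketch with string row labels. The frame `sample` and its labels are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose row labels are strings instead of 0, 1, 2
sample = pd.DataFrame(
    {
        "Artist": ["Michael Jackson", "AC/DC", "Pink Floyd"],
        "Album": ["Thriller", "Back in Black", "The Dark Side of the Moon"],
        "Released": [1982, 1980, 1973],
    },
    index=["a", "b", "c"],
)

# loc slices by label and includes BOTH endpoints
by_label = sample.loc["a":"b", "Artist":"Album"]

# iloc slices by position and excludes the stop, like a Python list
by_position = sample.iloc[0:1, 0:1]

# a single cell, addressed by row label and column name
year = sample.loc["c", "Released"]

print(by_label)
print(by_position)
print(year)
```

The `loc` slice returns two rows and two columns, while the `iloc` slice with stop 1 returns only one of each.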


## Adding Column

In [37]:
df['New Artist'] = df['Artist']

In [38]:
df.head()

Unnamed: 0,Artist,Album,Released,Length,Genre,Music Recording Sales (millions),Claimed Sales (millions),Released.1,Soundtrack,Rating,New Artist
0,Michael Jackson,Thriller,1982,0:42:19,"pop, rock, R&B",46.0,65,30-Nov-82,,10.0,Michael Jackson
1,AC/DC,Back in Black,1980,0:42:11,hard rock,26.1,50,25-Jul-80,,9.5,AC/DC
2,Pink Floyd,The Dark Side of the Moon,1973,0:42:49,progressive rock,24.2,45,01-Mar-73,,9.0,Pink Floyd
3,Whitney Houston,The Bodyguard,1992,0:57:44,"R&B, soul, pop",27.4,44,17-Nov-92,Y,8.5,Whitney Houston
4,Meat Loaf,Bat Out of Hell,1977,0:46:33,"hard rock, progressive rock",20.6,43,21-Oct-77,,8.0,Meat Loaf


In [39]:
# adding a column at a specific position (here, position 2)
df.insert(2, 'new_col', 0)
df

Unnamed: 0,Artist,Album,new_col,Released,Length,Genre,Music Recording Sales (millions),Claimed Sales (millions),Released.1,Soundtrack,Rating,New Artist
0,Michael Jackson,Thriller,0,1982,0:42:19,"pop, rock, R&B",46.0,65,30-Nov-82,,10.0,Michael Jackson
1,AC/DC,Back in Black,0,1980,0:42:11,hard rock,26.1,50,25-Jul-80,,9.5,AC/DC
2,Pink Floyd,The Dark Side of the Moon,0,1973,0:42:49,progressive rock,24.2,45,01-Mar-73,,9.0,Pink Floyd
3,Whitney Houston,The Bodyguard,0,1992,0:57:44,"R&B, soul, pop",27.4,44,17-Nov-92,Y,8.5,Whitney Houston
4,Meat Loaf,Bat Out of Hell,0,1977,0:46:33,"hard rock, progressive rock",20.6,43,21-Oct-77,,8.0,Meat Loaf
5,Eagles,Their Greatest Hits (1971-1975),0,1976,0:43:08,"rock, soft rock, folk rock",32.2,42,17-Feb-76,,7.5,Eagles
6,Bee Gees,Saturday Night Fever,0,1977,1:15:54,disco,20.6,40,15-Nov-77,Y,7.0,Bee Gees
7,Fleetwood Mac,Rumours,0,1977,0:40:01,soft rock,27.9,40,04-Feb-77,,6.5,Fleetwood Mac
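The two ways of adding a column shown above can be compared side by side. This is a sketch on a hypothetical two-row `sample`; the values come from the table, but the frame itself is built by hand.

```python
import pandas as pd

# Hypothetical two-row frame built by hand
sample = pd.DataFrame({
    "Artist": ["Eagles", "Bee Gees"],
    "Album": ["Their Greatest Hits (1971-1975)", "Saturday Night Fever"],
})

# plain assignment appends the new column at the end
sample["Released"] = [1976, 1977]

# insert() places the column at the given position and modifies the frame in place
sample.insert(1, "Rating", [7.5, 7.0])
print(list(sample.columns))

# insert() refuses a name that already exists (unless allow_duplicates=True)
try:
    sample.insert(0, "Rating", 0)
    duplicate_rejected = False
except ValueError as err:
    duplicate_rejected = True
    print("insert refused:", err)
```

Because `insert()` works in place, running the same insert cell twice in a notebook raises this `ValueError` on the second run.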


## Dropping Column

## Object type of each Column