# Reading and Writing CSV files in pandas

## Reading a CSV file in pandas

The `pandas.read_csv()` function loads data from a CSV file.
* The result is a pandas DataFrame.

We need to tell this function where the file is:
* If the file is in the working directory, the file name alone is enough: `pandas.read_csv(file_name)`
* If not, we must provide the file path: `pandas.read_csv(file_path)`
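
As a minimal sketch of the two cases above, the snippet below writes a tiny CSV into a temporary directory, then reads it once by full path and once by bare file name after switching the working directory. The file name `demo.csv` and its contents are made up for illustration and are not part of the course data.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical file used only for this illustration
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / "demo.csv"
    file_path.write_text("video_id,views\nabc,10\nxyz,20\n")

    # Case 1: the file is somewhere else, so pass the full path
    by_path = pd.read_csv(file_path)

    # Case 2: the file is in the working directory, so the name is enough
    original_dir = os.getcwd()
    os.chdir(tmp_dir)
    try:
        by_name = pd.read_csv("demo.csv")
    finally:
        os.chdir(original_dir)  # restore the working directory

# Both calls read the same file, so the DataFrames are identical
print(by_path.equals(by_name))
```

If the bare file name raises `FileNotFoundError`, the working directory is probably not the folder that holds the file; `!pwd` (or `os.getcwd()`) shows where pandas is looking.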

In [3]:
!pwd

/Users/yunz/Library/CloudStorage/Dropbox/Teaching/O712/Notebooks/Notebooks_in_Class


In [9]:
import pandas as pd

# Load the "youtube.csv" file
# The file is not in the working directory here, so we pass its full path
youtube = pd.read_csv("/Users/yunz/Downloads/youtube.csv")

In [4]:
# Use a URL if the file is stored remotely
youtube = pd.read_csv("https://raw.githubusercontent.com/zhouy185/BUS_O712/main/Data/youtube.csv")
youtube

|  | video_id | views | likes | dislikes | comment_count | comments_disabled | ratings_disabled |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | a-4_6bThk2E | 666861 | 16006 | 916 | 3686 | False | False |
| 1 | 5O6iNs0cEIU | 239156 | 1598 | 252 | 327 | False | False |
| 2 | RF-Mqs2qC-M | 175490 | 9514 | 245 | 1070 | False | False |
| 3 | YPR8K4TQ88M | 1233755 | 8823 | 268 | 156 | False | False |
| 4 | HQw0LJbwHY4 | 108850 | 730 | 46 | 67 | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | wc27wBExdyw | 108376 | 534 | 57 | 74 | False | False |
| 996 | syAXXLxO8tw | 910675 | 46643 | 791 | 1619 | False | False |
| 997 | 0kMYy9kU9jU | 294246 | 22999 | 281 | 1194 | False | False |
| 998 | VUYFDSs3Wlk | 187172 | 923 | 28 | 44 | False | False |


Use `df.head()` to show the first 5 rows of data, or pass a number to choose how many rows to view, e.g. `df.head(6)`.

In [5]:
youtube.head(6)

|  | video_id | views | likes | dislikes | comment_count | comments_disabled | ratings_disabled |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | a-4_6bThk2E | 666861 | 16006 | 916 | 3686 | False | False |
| 1 | 5O6iNs0cEIU | 239156 | 1598 | 252 | 327 | False | False |
| 2 | RF-Mqs2qC-M | 175490 | 9514 | 245 | 1070 | False | False |
| 3 | YPR8K4TQ88M | 1233755 | 8823 | 268 | 156 | False | False |
| 4 | HQw0LJbwHY4 | 108850 | 730 | 46 | 67 | False | False |
| 5 | oyBfefx7I6c | 1602851 | 28896 | 1214 | 2109 | False | False |
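
A short sketch of how the row count passed to `head()` behaves, using a made-up 10-row DataFrame rather than the YouTube data:

```python
import pandas as pd

# Made-up data: 10 rows in a single column
df = pd.DataFrame({"views": range(10)})

print(len(df.head()))    # 5: the default number of rows
print(len(df.head(6)))   # 6: the number we asked for
print(len(df.head(20)))  # 10: asking for more rows than exist returns them all
```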


By default, the first row in the CSV file is used as the header of the DataFrame.

If the first row is data rather than column names, use the argument `header=None`. pandas then numbers the columns 0, 1, 2, ... and keeps the first row as a row of data.

In [8]:
# A bare file name such as "youtube.csv" only works if the file is in the same directory as your Jupyter notebook
pd.read_csv("youtube.csv", header=None)

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | video_id | views | likes | dislikes | comment_count | comments_disabled | ratings_disabled |
| 1 | a-4_6bThk2E | 666861 | 16006 | 916 | 3686 | FALSE | FALSE |
| 2 | 5O6iNs0cEIU | 239156 | 1598 | 252 | 327 | FALSE | FALSE |
| 3 | RF-Mqs2qC-M | 175490 | 9514 | 245 | 1070 | FALSE | FALSE |
| 4 | YPR8K4TQ88M | 1233755 | 8823 | 268 | 156 | FALSE | FALSE |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 996 | wc27wBExdyw | 108376 | 534 | 57 | 74 | FALSE | FALSE |
| 997 | syAXXLxO8tw | 910675 | 46643 | 791 | 1619 | FALSE | FALSE |
| 998 | 0kMYy9kU9jU | 294246 | 22999 | 281 | 1194 | FALSE | FALSE |
| 999 | VUYFDSs3Wlk | 187172 | 923 | 28 | 44 | FALSE | FALSE |
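
One side effect of `header=None` on a file that does have a header row is worth seeing on a small scale. The sketch below uses a made-up two-column CSV held in memory (via `io.StringIO`, so no file is needed) and compares the default behaviour with `header=None`.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk
csv_text = "video_id,views\nabc,10\nxyz,20\n"

with_header = pd.read_csv(io.StringIO(csv_text))
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.shape)        # (2, 2): the first line became the column names
print(no_header.shape)          # (3, 2): the first line is now a row of data
print(list(no_header.columns))  # [0, 1]: pandas numbers the columns itself

# The text "views" now sits in the same column as 10 and 20,
# so that column can no longer be read as numbers
print(pd.api.types.is_numeric_dtype(with_header["views"]))  # True
print(pd.api.types.is_numeric_dtype(no_header[1]))          # False
```

So `header=None` is meant for files whose first line is already data. Using it on a file with a real header row turns every column into text.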


We can also replace the header in the CSV file by setting `header=0` and passing a list of new column names through the `names` argument.

In [9]:
youtube_copy = pd.read_csv("youtube.csv", header=0, names=['C0', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6'])

In [10]:
youtube_copy

|  | C0 | C1 | C2 | C3 | C4 | C5 | C6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | a-4_6bThk2E | 666861 | 16006 | 916 | 3686 | False | False |
| 1 | 5O6iNs0cEIU | 239156 | 1598 | 252 | 327 | False | False |
| 2 | RF-Mqs2qC-M | 175490 | 9514 | 245 | 1070 | False | False |
| 3 | YPR8K4TQ88M | 1233755 | 8823 | 268 | 156 | False | False |
| 4 | HQw0LJbwHY4 | 108850 | 730 | 46 | 67 | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | wc27wBExdyw | 108376 | 534 | 57 | 74 | False | False |
| 996 | syAXXLxO8tw | 910675 | 46643 | 791 | 1619 | False | False |
| 997 | 0kMYy9kU9jU | 294246 | 22999 | 281 | 1194 | False | False |
| 998 | VUYFDSs3Wlk | 187172 | 923 | 28 | 44 | False | False |


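**Note**: `header=0` tells pandas that the first row of the file is the header, so that row is discarded and replaced by the names we provide. If `names` is given without `header=0`, the original header row is read as a row of data.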
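
To make the role of `header=0` concrete, here is a small sketch on made-up in-memory CSV text (not the YouTube file). It reads the same text twice with the same `names`, once with `header=0` and once without.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk
csv_text = "video_id,views\nabc,10\nxyz,20\n"
new_names = ["C0", "C1"]

# header=0: the first line is treated as a header and replaced by new_names
replaced = pd.read_csv(io.StringIO(csv_text), header=0, names=new_names)

# names only: the first line is kept and becomes the first row of data
kept_as_data = pd.read_csv(io.StringIO(csv_text), names=new_names)

print(replaced.shape)                 # (2, 2)
print(kept_as_data.shape)             # (3, 2)
print(kept_as_data.iloc[0].tolist())  # ['video_id', 'views']
```

Both results use `C0` and `C1` as column names. The difference is only whether the old header line survives as data.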

Given a DataFrame, we can use `df.to_csv()` to write it to a CSV file.

In [11]:
youtube_copy.to_csv("/Users/yunz/Downloads/youtube_copy.csv",index=False)

The file `youtube_copy.csv` will be created at the path we passed. If we pass only a file name, the file is created in the current directory (where the notebook is opened or started).

By default, `to_csv()` writes the DataFrame index as an additional first column in the CSV file.
* To leave the index column out, add the argument `index=False`, as in the cell above.
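
A small sketch of what the `index` argument changes, using a made-up two-row DataFrame. Calling `to_csv()` without a path returns the CSV text as a string, which makes the difference easy to inspect without creating a file.

```python
import io

import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({"C0": ["abc", "xyz"], "C1": [10, 20]})

with_index = df.to_csv()                # index=True is the default
without_index = df.to_csv(index=False)

# Compare the header lines of the two CSV texts
print(with_index.splitlines()[0])     # ,C0,C1  (empty name for the index column)
print(without_index.splitlines()[0])  # C0,C1

# Reading the first version back turns the saved index into an extra column
reloaded = pd.read_csv(io.StringIO(with_index))
print(list(reloaded.columns))  # ['Unnamed: 0', 'C0', 'C1']
```

This is why a CSV written with the default settings shows an `Unnamed: 0` column when it is loaded again with `read_csv()`.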