# CSV (Comma-Separated Values)

Q. What is CSV?
- CSV is a simple and widely used file format for storing tabular data (data in rows and columns).
Each line of the file represents a row, and the values within a row are separated by commas.

![image.png](attachment:image.png)

Q. Advantages of CSV:
* Simplicity: Easy to understand and work with.
* Compatibility: Can be opened and edited in various spreadsheet programs.
* Interoperability: Supported by many data analysis and machine learning tools.

# CSV Cheatcodes 2024

Let's walk through the most useful parameters of the pandas `read_csv` function for working with `csv` or `tsv` files.

Note: This cheat code serves as a reference for addressing any challenges encountered with CSV or TSV files in the future.

### Importing pandas

In [1]:
# Import the pandas library and alias it as pd
import pandas as pd

### 1. Opening a CSV file present locally on system or in same folder

`filepath_or_buffer`: The first argument of `read_csv`; the path to the CSV or TSV file. A relative path is resolved against the current working directory.

In [2]:
# Read a CSV file named 'new_employee.csv' into a pandas DataFrame
data = pd.read_csv("new_employee.csv")

# Display the DataFrame
data.head()

Unnamed: 0,enrollee_id,city,city_development_index,gender,relevent_experience,enrolled_university,education_level,major_discipline,experience,company_size,company_type,last_new_job,training_hours,target
0,8949,city_103,0.92,Male,Has relevent experience,no_enrollment,Graduate,STEM,>20,,,1,36,1.0
1,29725,city_40,0.776,Male,No relevent experience,no_enrollment,Graduate,STEM,15,50-99,Pvt Ltd,>4,47,0.0
2,11561,city_21,0.624,,No relevent experience,Full time course,Graduate,STEM,5,,,never,83,0.0
3,33241,city_115,0.789,,No relevent experience,,Graduate,Business Degree,<1,,Pvt Ltd,never,52,1.0
4,666,city_162,0.767,Male,Has relevent experience,no_enrollment,Masters,STEM,>20,50-99,Funded Startup,4,8,0.0


### 2. Using the 'index_col' parameter

`index_col`: Sets which column (or columns) to use as the index of the DataFrame.

Note: `index_col` is optional. If it is not specified, the DataFrame gets the default integer index, so omit it when no column makes a natural index.

In [3]:
# Read 'new_employee.csv' and set the 'enrollee_id' column as the index
pd.read_csv('new_employee.csv', index_col='enrollee_id')

enrollee_id,city,city_development_index,gender,relevent_experience,enrolled_university,education_level,major_discipline,experience,company_size,company_type,last_new_job,training_hours,target
8949,city_103,0.92,Male,Has relevent experience,no_enrollment,Graduate,STEM,>20,,,1,36,1.0
29725,city_40,0.776,Male,No relevent experience,no_enrollment,Graduate,STEM,15,50-99,Pvt Ltd,>4,47,0.0
11561,city_21,0.624,,No relevent experience,Full time course,Graduate,STEM,5,,,never,83,0.0
33241,city_115,0.789,,No relevent experience,,Graduate,Business Degree,<1,,Pvt Ltd,never,52,1.0
666,city_162,0.767,Male,Has relevent experience,no_enrollment,Masters,STEM,>20,50-99,Funded Startup,4,8,0.0
21651,city_176,0.764,,Has relevent experience,Part time course,Graduate,STEM,11,,,1,24,1.0
28806,city_160,0.92,Male,Has relevent experience,no_enrollment,High School,,5,50-99,Funded Startup,1,24,0.0
402,city_46,0.762,Male,Has relevent experience,no_enrollment,Graduate,STEM,13,<10,Pvt Ltd,>4,18,1.0
27107,city_103,0.92,Male,Has relevent experience,no_enrollment,Graduate,STEM,7,50-99,Pvt Ltd,1,46,1.0
699,city_103,0.92,,Has relevent experience,no_enrollment,Graduate,STEM,17,10000+,Pvt Ltd,>4,123,0.0


### 3. Using the 'usecols' parameter

`usecols`: Specifies a subset of columns to be read from the file.

In [4]:
# Read 'new_employee.csv' and select only specific columns
pd.read_csv("new_employee.csv", usecols=["enrollee_id", "relevent_experience", "experience", "last_new_job"])

Unnamed: 0,enrollee_id,relevent_experience,experience,last_new_job
0,8949,Has relevent experience,>20,1
1,29725,No relevent experience,15,>4
2,11561,No relevent experience,5,never
3,33241,No relevent experience,<1,never
4,666,Has relevent experience,>20,4
5,21651,Has relevent experience,11,1
6,28806,Has relevent experience,5,1
7,402,Has relevent experience,13,>4
8,27107,Has relevent experience,7,1
9,699,Has relevent experience,17,>4
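
`usecols` also accepts a callable that is evaluated against each column name, which helps when it is easier to say what to drop than what to keep. The snippet below is a minimal sketch on a small inline sample; the sample rows are a hypothetical stand-in for `new_employee.csv`.

```python
import io

import pandas as pd

# Hypothetical inline sample standing in for 'new_employee.csv'
csv_text = """enrollee_id,city,experience,last_new_job
8949,city_103,>20,1
29725,city_40,15,>4
"""

# Keep every column whose name satisfies the predicate (here: all except 'city')
subset = pd.read_csv(io.StringIO(csv_text), usecols=lambda col: col != "city")
print(subset.columns.tolist())
```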


### 4. Using the 'squeeze' parameter

`squeeze`: Converts a single-column DataFrame into a Series.

Note: The `squeeze` parameter of `read_csv` was deprecated in pandas 1.4 and removed in pandas 2.0. Calling `.squeeze("columns")` on the result does the same job: it reduces the dimension only when the DataFrame has exactly one column.

In [5]:
# Read only the 'target' column of 'new_employee.csv' and squeeze it into a pandas Series
pd.read_csv("new_employee.csv", usecols=["target"]).squeeze("columns")


0     1.0
1     0.0
2     0.0
3     1.0
4     0.0
5     1.0
6     0.0
7     1.0
8     1.0
9     0.0
10    1.0
11    0.0
12    0.0
13    0.0
14    0.0
15    0.0
16    0.0
17    0.0
18    1.0
19    1.0
20    0.0
21    0.0
22    0.0
23    0.0
24    0.0
25    0.0
26    1.0
27    0.0
28    1.0
29    1.0
30    0.0
31    0.0
32    0.0
Name: target, dtype: float64

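### 5. Using the 'skiprows' parameter

`skiprows`: Skips lines of the file. It accepts an integer (the number of lines to skip from the start), a list of 0-based line numbers, or a callable that receives each line number and returns `True` for the lines to skip.

In [9]:
# Skip every odd-numbered line; line 0 (the header) is even, so it is kept
skip_condition = lambda x: x % 2 != 0

# Read 'new_employee.csv', skipping the lines selected by skip_condition
pd.read_csv("new_employee.csv", skiprows=skip_condition)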

Unnamed: 0,enrollee_id,city,city_development_index,gender,relevent_experience,enrolled_university,education_level,major_discipline,experience,company_size,company_type,last_new_job,training_hours,target
0,29725,city_40,0.776,Male,No relevent experience,no_enrollment,Graduate,STEM,15,50-99,Pvt Ltd,>4,47,0.0
1,33241,city_115,0.789,,No relevent experience,,Graduate,Business Degree,<1,,Pvt Ltd,never,52,1.0
2,21651,city_176,0.764,,Has relevent experience,Part time course,Graduate,STEM,11,,,1,24,1.0
3,402,city_46,0.762,Male,Has relevent experience,no_enrollment,Graduate,STEM,13,<10,Pvt Ltd,>4,18,1.0
4,699,city_103,0.92,,Has relevent experience,no_enrollment,Graduate,STEM,17,10000+,Pvt Ltd,>4,123,0.0
5,23853,city_103,0.92,Male,Has relevent experience,no_enrollment,Graduate,STEM,5,5000-9999,Pvt Ltd,1,108,0.0
6,5826,city_21,0.624,Male,No relevent experience,,,,2,,,never,24,0.0
7,6588,city_114,0.926,Male,Has relevent experience,no_enrollment,Graduate,STEM,16,10/49,Pvt Ltd,>4,18,0.0
8,5764,city_21,0.624,,Has relevent experience,no_enrollment,Graduate,STEM,2,5000-9999,Pvt Ltd,2,7,0.0
9,11399,city_13,0.827,Female,Has relevent experience,no_enrollment,Graduate,Arts,4,,,1,132,1.0
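
The integer and list forms of `skiprows` behave differently with respect to the header, which is easy to trip over. The sketch below uses a tiny hypothetical inline file (not the employee dataset) to show both.

```python
import io

import pandas as pd

# Hypothetical inline sample: line 0 is the header, lines 1-4 are data
csv_text = """id,score
1,10
2,20
3,30
4,40
"""

# An integer skips that many lines from the top of the file.
# The original header is skipped, so the line "1,10" becomes the header.
first_skipped = pd.read_csv(io.StringIO(csv_text), skiprows=1)

# A list skips exactly those 0-based line numbers; the header (line 0) is kept.
some_skipped = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 3])

print(first_skipped.columns.tolist())
print(some_skipped)
```

To skip data rows while keeping the header, the list or callable form is usually the safer choice.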


### 6. Using the 'dtype' parameter

`dtype`: Provides a dictionary mapping column names to data types to force the conversion of specific columns.

In [10]:
# Read 'new_employee.csv' and specify the data type of the 'target' column as integer
pd.read_csv("new_employee.csv", dtype={'target': int})

Unnamed: 0,enrollee_id,city,city_development_index,gender,relevent_experience,enrolled_university,education_level,major_discipline,experience,company_size,company_type,last_new_job,training_hours,target
0,8949,city_103,0.92,Male,Has relevent experience,no_enrollment,Graduate,STEM,>20,,,1,36,1
1,29725,city_40,0.776,Male,No relevent experience,no_enrollment,Graduate,STEM,15,50-99,Pvt Ltd,>4,47,0
2,11561,city_21,0.624,,No relevent experience,Full time course,Graduate,STEM,5,,,never,83,0
3,33241,city_115,0.789,,No relevent experience,,Graduate,Business Degree,<1,,Pvt Ltd,never,52,1
4,666,city_162,0.767,Male,Has relevent experience,no_enrollment,Masters,STEM,>20,50-99,Funded Startup,4,8,0
5,21651,city_176,0.764,,Has relevent experience,Part time course,Graduate,STEM,11,,,1,24,1
6,28806,city_160,0.92,Male,Has relevent experience,no_enrollment,High School,,5,50-99,Funded Startup,1,24,0
7,402,city_46,0.762,Male,Has relevent experience,no_enrollment,Graduate,STEM,13,<10,Pvt Ltd,>4,18,1
8,27107,city_103,0.92,Male,Has relevent experience,no_enrollment,Graduate,STEM,7,50-99,Pvt Ltd,1,46,1
9,699,city_103,0.92,,Has relevent experience,no_enrollment,Graduate,STEM,17,10000+,Pvt Ltd,>4,123,0
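
Forcing a column to plain `int` only works when the column has no missing values. A minimal sketch of the failure and one possible workaround, the nullable `"Int64"` dtype, is shown below; the inline rows are hypothetical.

```python
import io

import pandas as pd

# Hypothetical inline sample: the second row has no 'target' value
csv_text = """enrollee_id,target
8949,1
29725,
11561,0
"""

# Plain int cannot represent a missing value, so read_csv raises ValueError
try:
    pd.read_csv(io.StringIO(csv_text), dtype={"target": int})
except ValueError as exc:
    print("int failed:", exc)

# The nullable "Int64" extension dtype keeps integers and stores <NA> for the gap
df = pd.read_csv(io.StringIO(csv_text), dtype={"target": "Int64"})
print(df["target"])
```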


### 7. Using the 'nrows' parameter

`nrows`: Number of rows to read from the start of the file, which is useful for previewing large files.

Note: `nrows` only limits what is loaded; the CSV file itself is not modified.

In [11]:
# Read the first 10 rows of 'new_employee.csv'
pd.read_csv("new_employee.csv", nrows=10)

Unnamed: 0,enrollee_id,city,city_development_index,gender,relevent_experience,enrolled_university,education_level,major_discipline,experience,company_size,company_type,last_new_job,training_hours,target
0,8949,city_103,0.92,Male,Has relevent experience,no_enrollment,Graduate,STEM,>20,,,1,36,1.0
1,29725,city_40,0.776,Male,No relevent experience,no_enrollment,Graduate,STEM,15,50-99,Pvt Ltd,>4,47,0.0
2,11561,city_21,0.624,,No relevent experience,Full time course,Graduate,STEM,5,,,never,83,0.0
3,33241,city_115,0.789,,No relevent experience,,Graduate,Business Degree,<1,,Pvt Ltd,never,52,1.0
4,666,city_162,0.767,Male,Has relevent experience,no_enrollment,Masters,STEM,>20,50-99,Funded Startup,4,8,0.0
5,21651,city_176,0.764,,Has relevent experience,Part time course,Graduate,STEM,11,,,1,24,1.0
6,28806,city_160,0.92,Male,Has relevent experience,no_enrollment,High School,,5,50-99,Funded Startup,1,24,0.0
7,402,city_46,0.762,Male,Has relevent experience,no_enrollment,Graduate,STEM,13,<10,Pvt Ltd,>4,18,1.0
8,27107,city_103,0.92,Male,Has relevent experience,no_enrollment,Graduate,STEM,7,50-99,Pvt Ltd,1,46,1.0
9,699,city_103,0.92,,Has relevent experience,no_enrollment,Graduate,STEM,17,10000+,Pvt Ltd,>4,123,0.0


### 8. Using the 'na_values' parameter

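`na_values`: Additional strings to recognize as NA/NaN. If a dict is passed, the NA values are applied per column. These are added to the values that pandas already interprets as NaN by default, such as the empty string, `NA`, `N/A`, `NaN`, `null`, and `#N/A`.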

In [14]:
# The `na_values` parameter is used to specify a list of values that should be treated as missing or NaN (Not a Number) values in the resulting DataFrame. 
# In this case, the value "-" is specified as a missing value indicator.
pd.read_csv("new_employee.csv", na_values=["-"])

# Check row 14

Unnamed: 0,enrollee_id,city,city_development_index,gender,relevent_experience,enrolled_university,education_level,major_discipline,experience,company_size,company_type,last_new_job,training_hours,target
0,8949,city_103,0.92,Male,Has relevent experience,no_enrollment,Graduate,STEM,>20,,,1,36,1.0
1,29725,city_40,0.776,Male,No relevent experience,no_enrollment,Graduate,STEM,15,50-99,Pvt Ltd,>4,47,0.0
2,11561,city_21,0.624,,No relevent experience,Full time course,Graduate,STEM,5,,,never,83,0.0
3,33241,city_115,0.789,,No relevent experience,,Graduate,Business Degree,<1,,Pvt Ltd,never,52,1.0
4,666,city_162,0.767,Male,Has relevent experience,no_enrollment,Masters,STEM,>20,50-99,Funded Startup,4,8,0.0
5,21651,city_176,0.764,,Has relevent experience,Part time course,Graduate,STEM,11,,,1,24,1.0
6,28806,city_160,0.92,Male,Has relevent experience,no_enrollment,High School,,5,50-99,Funded Startup,1,24,0.0
7,402,city_46,0.762,Male,Has relevent experience,no_enrollment,Graduate,STEM,13,<10,Pvt Ltd,>4,18,1.0
8,27107,city_103,0.92,Male,Has relevent experience,no_enrollment,Graduate,STEM,7,50-99,Pvt Ltd,1,46,1.0
9,699,city_103,0.92,,Has relevent experience,no_enrollment,Graduate,STEM,17,10000+,Pvt Ltd,>4,123,0.0
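
When a marker such as `-` means "missing" in one column but is real data in another, the dict form of `na_values` restricts each marker to the named column. The sketch below assumes a small hypothetical sample in which `unknown` is also used as a missing-value marker for `gender`.

```python
import io

import pandas as pd

# Hypothetical inline sample with per-column missing-value markers
csv_text = """enrollee_id,gender,company_size
8949,Male,-
29725,-,50-99
11561,unknown,<10
"""

# Each key is a column name; each value lists the strings to treat as NaN there
df = pd.read_csv(
    io.StringIO(csv_text),
    na_values={"gender": ["-", "unknown"], "company_size": ["-"]},
)
print(df.isna().sum())
```

Because the markers are matched against whole cell values, `50-99` is left untouched even though it contains a dash.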


### 9. Loading a huge dataset in chunks using 'chunksize' parameter

`chunksize`: Number of rows to read from the file per chunk.

Note: Use this parameter when the dataset is too large to load into memory at once. With `chunksize` set, `read_csv` returns an iterator that yields one DataFrame per chunk instead of a single DataFrame. Choose the chunk size based on the memory you have available.

In [16]:
# `chunksize=10` reads the file 10 rows at a time.
# `dfs` is an iterator that yields each chunk as a DataFrame.
dfs = pd.read_csv("new_employee.csv", chunksize=10)

# Iterate over the chunks and print the shape of each one
for chunk in dfs:
    print(chunk.shape)

(10, 14)
(10, 14)
(10, 14)
(3, 14)
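
In practice each chunk is usually reduced to a small result so the full file never sits in memory. The following is a minimal sketch of that pattern on generated inline data; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical inline sample: 25 data rows, training_hours = 2 * id
rows = "\n".join(f"{i},{i * 2}" for i in range(25))
csv_text = "id,training_hours\n" + rows + "\n"

total_rows = 0
total_hours = 0

# The reader can be used as a context manager so the file handle is closed afterwards
with pd.read_csv(io.StringIO(csv_text), chunksize=10) as reader:
    for chunk in reader:
        # Keep only running totals; each chunk is discarded after this step
        total_rows += len(chunk)
        total_hours += int(chunk["training_hours"].sum())

print(total_rows, total_hours)
```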


### 10. Handling Dates using 'parse_dates' parameter

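`parse_dates`: A list of columns to parse as dates, so they load as datetime values instead of strings.

Note: NaT stands for Not a Time, the datetime equivalent of NaN. In the output below, the ambiguous value `01/06/2020` was read month-first (`2020-01-06`), while `15/10/2022` can only be read day-first. For DD/MM/YYYY data, pass `dayfirst=True` so every value is read the same way.

In [19]:
# Read 'new_employee.csv' and parse the 'year_of_education' column as datetime
pd.read_csv("new_employee.csv", parse_dates=['year_of_education'])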



Unnamed: 0,enrollee_id,city,city_development_index,gender,relevent_experience,enrolled_university,education_level,year_of_education,major_discipline,experience,company_size,company_type,last_new_job,training_hours,target
0,8949,city_103,0.92,Male,Has relevent experience,no_enrollment,Graduate,2020-01-06,STEM,>20,,,1,36.0,1.0
1,29725,city_40,0.776,Male,No relevent experience,no_enrollment,Graduate,2022-10-15,STEM,15,50-99,Pvt Ltd,>4,47.0,0.0
2,11561,city_21,0.624,,No relevent experience,Full time course,Graduate,2023-02-02,STEM,5,,,never,83.0,0.0
3,33241,city_115,0.789,,No relevent experience,,Graduate,2022-03-09,Business Degree,<1,,Pvt Ltd,never,52.0,1.0
4,666,city_162,0.767,Male,Has relevent experience,no_enrollment,Masters,2021-07-30,STEM,>20,50-99,Funded Startup,4,8.0,0.0
5,21651,city_176,0.764,,Has relevent experience,Part time course,Graduate,2020-05-22,STEM,11,,,1,24.0,1.0
6,28806,city_160,0.92,Male,Has relevent experience,no_enrollment,High School,2021-04-12,,5,50-99,Funded Startup,1,24.0,0.0
7,402,city_46,0.762,Male,Has relevent experience,no_enrollment,Graduate,2023-03-27,STEM,13,<10,Pvt Ltd,>4,18.0,1.0
8,27107,city_103,0.92,Male,Has relevent experience,no_enrollment,Graduate,2020-08-18,STEM,7,50-99,Pvt Ltd,1,46.0,1.0
9,699,city_103,0.92,,Has relevent experience,no_enrollment,Graduate,2021-01-21,STEM,17,10000+,Pvt Ltd,>4,123.0,0.0
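
For day-first data, combining `parse_dates` with `dayfirst=True` avoids ambiguous values being read month-first. A minimal sketch, assuming the column holds DD/MM/YYYY strings as in this hypothetical two-row sample:

```python
import io

import pandas as pd

# Hypothetical inline sample with day-first dates
csv_text = """enrollee_id,year_of_education
8949,01/06/2020
29725,15/10/2022
"""

# dayfirst=True tells the parser to read 01/06/2020 as 1 June, not 6 January
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["year_of_education"],
    dayfirst=True,
)
print(df["year_of_education"])
```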


# Most Important Parameters

### 11. Using a Function with 'converters'

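`converters`: Functions for converting values in specified columns. Keys can be either column labels or column indices.

Note: The expected type is a dict of {Hashable: Callable}, optional. Each function receives the raw string value of a cell.

In [20]:
# Define a function that renames the 'Arts' value of 'major_discipline' to 'Humanities'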
def rename(name):
    if name == "Arts":
        return "Humanities"
    else:
        return name

# Read 'new_employee.csv' and apply the 'rename' function to the 'major_discipline' column
pd.read_csv('new_employee.csv', converters={'major_discipline': rename})


Unnamed: 0,enrollee_id,city,city_development_index,gender,relevent_experience,enrolled_university,education_level,year_of_education,major_discipline,experience,company_size,company_type,last_new_job,training_hours,target
0,8949,city_103,0.92,Male,Has relevent experience,no_enrollment,Graduate,01/06/2020,STEM,>20,,,1,36.0,1.0
1,29725,city_40,0.776,Male,No relevent experience,no_enrollment,Graduate,15/10/2022,STEM,15,50-99,Pvt Ltd,>4,47.0,0.0
2,11561,city_21,0.624,,No relevent experience,Full time course,Graduate,02/02/2023,STEM,5,,,never,83.0,0.0
3,33241,city_115,0.789,,No relevent experience,,Graduate,03/09/2022,Business Degree,<1,,Pvt Ltd,never,52.0,1.0
4,666,city_162,0.767,Male,Has relevent experience,no_enrollment,Masters,30/07/2021,STEM,>20,50-99,Funded Startup,4,8.0,0.0
5,21651,city_176,0.764,,Has relevent experience,Part time course,Graduate,22/05/2020,STEM,11,,,1,24.0,1.0
6,28806,city_160,0.92,Male,Has relevent experience,no_enrollment,High School,04/12/2021,,5,50-99,Funded Startup,1,24.0,0.0
7,402,city_46,0.762,Male,Has relevent experience,no_enrollment,Graduate,27/03/2023,STEM,13,<10,Pvt Ltd,>4,18.0,1.0
8,27107,city_103,0.92,Male,Has relevent experience,no_enrollment,Graduate,18/08/2020,STEM,7,50-99,Pvt Ltd,1,46.0,1.0
9,699,city_103,0.92,,Has relevent experience,no_enrollment,Graduate,21/01/2021,STEM,17,10000+,Pvt Ltd,>4,123.0,0.0
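
The first rows shown above contain no `Arts` entries, so the effect of the converter is not visible there. The sketch below repeats the idea on a hypothetical inline sample and adds a second, made-up converter that turns hours into minutes, to show that converters receive strings and must do their own type conversion.

```python
import io

import pandas as pd


def rename(name):
    # Replace the 'Arts' label; leave every other value unchanged
    return "Humanities" if name == "Arts" else name


# Hypothetical inline sample standing in for 'new_employee.csv'
csv_text = """enrollee_id,major_discipline,training_hours
11399,Arts,132
8949,STEM,36
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    converters={
        "major_discipline": rename,
        # The cell arrives as a string, so convert it before doing arithmetic
        "training_hours": lambda value: int(value) * 60,
    },
)
print(df)
```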


### 12. Opening a 'csv' file from a 'URL'

In [21]:
# Import necessary libraries
import requests
from io import StringIO

# URL of the CSV file
url = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
# Set headers to mimic a web browser request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"}
# Make a request to the URL and fail early on an HTTP error
response = requests.get(url, headers=headers)
response.raise_for_status()
# Use StringIO to create a file-like object from the text content of the response
data = StringIO(response.text)

# Read the CSV content into a pandas DataFrame
# (pd.read_csv(url) also works directly when no custom headers are needed)
pd.read_csv(data)

Unnamed: 0,Country,Region
0,Algeria,AFRICA
1,Angola,AFRICA
2,Benin,AFRICA
3,Botswana,AFRICA
4,Burkina,AFRICA
...,...,...
189,Paraguay,SOUTH AMERICA
190,Peru,SOUTH AMERICA
191,Suriname,SOUTH AMERICA
192,Uruguay,SOUTH AMERICA


### 13. Using the 'delimiter' and 'sep' parameters and naming the columns using 'names'

`delimiter`: An alias for `sep`. Specifies the delimiter used in the file, for example a comma (,) for CSV or a tab (\t) for TSV.

In [44]:
pd.read_csv("occupation_data.tsv", delimiter="\t", names=["movie_name", "release_year", "rating", "votes", "genres"])

Unnamed: 0,movie_name,release_year,rating,votes,genres
m0,10 things i hate about you,1999,6.9,62847,['comedy' 'romance']
m1,1492: conquest of paradise,1992,6.2,10421,['adventure' 'biography' 'drama' 'history']
m2,15 minutes,2001,6.1,25854,['action' 'crime' 'drama' 'thriller']
m3,2001: a space odyssey,1968,8.4,163227,['adventure' 'mystery' 'sci-fi']
m4,48 hrs.,1982,6.9,22289,['action' 'comedy' 'crime' 'drama' 'thriller']
m5,the fifth element,1997,7.5,133756,['action' 'adventure' 'romance' 'sci-fi' 'thri...
m6,8mm,1999,6.3,48212,['crime' 'mystery' 'thriller']


`sep`: Character or regex pattern to treat as the delimiter, default ‘,’ 

`names`: Sequence of column labels to apply. 

Note: If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.

In [34]:
pd.read_csv("occupation_data.tsv", sep="\t", names=["movie_name", 'release_year', 'rating', 'votes', 'genres'] )

Unnamed: 0,movie_name,release_year,rating,votes,genres
m0,10 things i hate about you,1999,6.9,62847,['comedy' 'romance']
m1,1492: conquest of paradise,1992,6.2,10421,['adventure' 'biography' 'drama' 'history']
m2,15 minutes,2001,6.1,25854,['action' 'crime' 'drama' 'thriller']
m3,2001: a space odyssey,1968,8.4,163227,['adventure' 'mystery' 'sci-fi']
m4,48 hrs.,1982,6.9,22289,['action' 'comedy' 'crime' 'drama' 'thriller']
m5,the fifth element,1997,7.5,133756,['action' 'adventure' 'romance' 'sci-fi' 'thri...
m6,8mm,1999,6.3,48212,['crime' 'mystery' 'thriller']
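
Since `sep` may also be a regular expression, files with inconsistent spacing around the separator can be read without pre-cleaning. The sketch below assumes a hypothetical pipe-separated sample; regex separators like this one are handled by the Python parsing engine, so it is requested explicitly.

```python
import io

import pandas as pd

# Hypothetical inline sample: pipe-separated with uneven whitespace
csv_text = """name | age
Alice |25
Bob   | 30
"""

# The pattern matches a pipe plus any surrounding whitespace
df = pd.read_csv(io.StringIO(csv_text), sep=r"\s*\|\s*", engine="python")
print(df)
```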


### 14. Using the 'header' parameter

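`header`: Row number (0-based) to use as the column names. The default, `'infer'`, uses the first row unless `names` is passed. Pass `header=None` when the file has no header row. A boolean is not accepted: `header=True` raises a `TypeError`.

In [37]:
pd.read_csv("new_employee_test.csv", header=0)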

Unnamed: 0,0,enrollee_id,city,city_development_index,gender,relevent_experience,enrolled_university,education_level,year_of_education,major_discipline,experience,company_size,company_type,last_new_job,training_hours,target
0,1,8949,city_103,0.92,Male,Has relevent experience,no_enrollment,Graduate,01/06/2020,STEM,>20,,,1,36,1.0
1,2,29725,city_40,0.776,Male,No relevent experience,no_enrollment,Graduate,15/10/2022,STEM,15,50-99,Pvt Ltd,>4,47,0.0
2,3,11561,city_21,0.624,,No relevent experience,Full time course,Graduate,02/02/2023,STEM,5,,,never,83,0.0
3,4,33241,city_115,0.789,,No relevent experience,,Graduate,03/09/2022,Business Degree,<1,,Pvt Ltd,never,52,1.0
4,5,666,city_162,0.767,Male,Has relevent experience,no_enrollment,Masters,30/07/2021,STEM,>20,50-99,Funded Startup,4,8,0.0
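
The two common combinations of `header` and `names` can be sketched as follows: `header=None` for a file without a header row, and `header=0` to replace an existing header with your own names. Both inline samples are hypothetical.

```python
import io

import pandas as pd

new_names = ["enrollee_id", "city"]

# Hypothetical file with no header row: every line is data
no_header_text = "8949,city_103\n29725,city_40\n"
df_none = pd.read_csv(io.StringIO(no_header_text), header=None, names=new_names)

# Hypothetical file with a header row that should be replaced by new_names
with_header_text = "id,town\n8949,city_103\n29725,city_40\n"
df_replaced = pd.read_csv(io.StringIO(with_header_text), header=0, names=new_names)

print(df_none)
print(df_replaced)
```

Without `header=0` in the second call, the original header line would be read as a data row.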


### 15. Using the 'encoding' parameter to fix 'UnicodeDecodeError' (file is not 'UTF-8' encoded)

`encoding`: Defines the character encoding of the file, ensuring proper reading of text.

Note: Pass `encoding` when the dataset is not UTF-8 encoded, which typically shows up as a `UnicodeDecodeError`. Example: an emoji dataset.

In [38]:
pd.read_csv('people_data.csv', encoding='latin-1')

Unnamed: 0,Name,Age,City
0,Alice,25,New York
1,Bob,30,San Francisco
2,Charlie,22,London
3,Diana,35,Paris
4,Eva,28,Berlin
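
The failure and the fix can be reproduced without a file on disk. The sketch below builds hypothetical Latin-1 bytes in memory, shows that the default UTF-8 decoding fails, and then reads the same bytes with the matching encoding.

```python
import io

import pandas as pd

# Hypothetical inline sample, encoded as Latin-1 instead of UTF-8
raw = "Name,City\nZoë,Zürich\n".encode("latin-1")

# The default encoding is UTF-8, which cannot decode these bytes
try:
    pd.read_csv(io.BytesIO(raw))
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc)

# Passing the encoding the file was actually written in fixes the read
df = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(df)
```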


### 16. Using the 'on_bad_lines' parameter (formerly 'error_bad_lines') to skip bad lines or avoid 'ParserError'

`on_bad_lines`: 'error', 'warn', 'skip', or a callable; default 'error'

Lines with too many fields (e.g. a CSV line with too many separators) cause a `ParserError` by default, and no DataFrame is returned. With `on_bad_lines='skip'`, these “bad lines” are dropped from the DataFrame that is returned. The older `error_bad_lines=False` flag did the same job, but it was deprecated in pandas 1.3 and removed in pandas 2.0.

In [43]:
# Line 4 of 'book_data.csv' has 6 fields instead of the expected 5, so it is skipped
pd.read_csv('book_data.csv', sep=';', encoding="latin-1", on_bad_lines='skip')


Unnamed: 0,Title,Author,Genre,Price,Publication_Year
0,The Great Gatsby,F. Scott Fitzgerald,Fiction,10.99,1925
1,To Kill a Mockingbird,Harper Lee,Fiction,12.99,1960
2,Pride and Prejudice,Jane Austen,Romance,8.99,1813
3,The Catcher in the Rye,J.D. Salinger,Fiction,11.99,1951
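
In pandas 1.3 and later, the `on_bad_lines` parameter controls what happens to malformed lines, and from pandas 1.4 it also accepts a callable (with `engine="python"`) that can repair a line instead of dropping it. The sketch below uses a hypothetical inline sample, and truncating the extra field is just one possible repair strategy, not a recommendation.

```python
import io

import pandas as pd

# Hypothetical inline sample: the third line has one field too many
csv_text = """Title;Author;Price
The Great Gatsby;F. Scott Fitzgerald;10.99
Broken;Row;1.00;extra
Pride and Prejudice;Jane Austen;8.99
"""

# Option 1: silently drop the malformed line
skipped = pd.read_csv(io.StringIO(csv_text), sep=";", on_bad_lines="skip")

# Option 2: repair the line by keeping only the first three fields
# (callables require the Python engine)
repaired = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    engine="python",
    on_bad_lines=lambda fields: fields[:3],
)

print(len(skipped), len(repaired))
```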


These are the major problems I encounter when working with CSV files.

For a comprehensive understanding of the `read_csv` function, refer to the official pandas documentation: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html