# Good Reads Summary

#### The objective of this assignment is for you to explain what is happening in each cell in clear, understandable language. 

#### _There is no need to code._ The code is there for you, and it already runs. Your task is only to explain what each line in each cell does.

#### Each placeholder cell should describe what happens in the cell below it.

**Example**: The cell below imports `pandas` as a dependency because `pandas` functions will be used throughout the program, such as the Pandas `DataFrame` as well as the `read_csv` function.

In [1]:
import pandas as pd

_[This cell stores the path to a CSV file in our directory as a string, then uses the pandas `read_csv` function to load that file into a DataFrame. The `encoding` argument tells pandas how to decode the bytes in the file, so that non-ASCII characters (such as the "é" in "Mary GrandPré") are read in correctly instead of being garbled or causing an error. The last line displays the first 50 rows of the DataFrame.]_

In [4]:
goodreads_path = "Resources/books_clean.csv"

goodreads_df = pd.read_csv(goodreads_path, encoding="utf-8")
goodreads_df.head(50)

Unnamed: 0,ISBN,Publication Year,Original Title,Authors,One Star Reviews,Two Star Reviews,Three Star Reviews,Four Star Reviews,Five Star Reviews
0,439023483,2008.0,The Hunger Games,Suzanne Collins,66715,127936,560092,1481305,2706317
1,439554934,1997.0,Harry Potter and the Philosopher's Stone,"J.K. Rowling, Mary GrandPré",75504,101676,455024,1156318,3011543
2,316015849,2005.0,Twilight,Stephenie Meyer,456191,436802,793319,875073,1355439
3,61120081,1960.0,To Kill a Mockingbird,Harper Lee,60427,117415,446835,1001952,1714267
4,743273567,1925.0,The Great Gatsby,F. Scott Fitzgerald,86236,197621,606158,936012,947718
5,525478817,2012.0,The Fault in Our Stars,John Green,47994,92723,327550,698471,1311871
6,618260307,1937.0,The Hobbit or There and Back Again,J.R.R. Tolkien,46023,76784,288649,665635,1119718
7,316769177,1951.0,The Catcher in the Rye,J.D. Salinger,109383,185520,455042,661516,709176
8,1416524797,2000.0,Angels & Demons,Dan Brown,77841,145740,458429,716569,680175
9,679783261,1813.0,Pride and Prejudice,Jane Austen,54700,86485,284852,609755,1155673
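
As a small illustration of why the encoding matters, the sketch below reads a made-up two-line CSV held in memory (not the real `books_clean.csv`) twice: once with the matching encoding and once with a mismatched one. The sample text and the names `csv_text`, `raw_bytes`, `good_df`, and `bad_df` are assumptions for this example only.

```python
import io

import pandas as pd

# Hypothetical sample data; the real file is Resources/books_clean.csv.
csv_text = "Original Title,Authors\nHarry Potter,Mary GrandPré\n"
raw_bytes = csv_text.encode("utf-8")  # how the text is stored on disk

# Decoding with the same encoding the file was written in keeps "é" intact.
good_df = pd.read_csv(io.BytesIO(raw_bytes), encoding="utf-8")

# Decoding the same bytes as latin-1 does not raise, but garbles "é".
bad_df = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")

print(good_df.loc[0, "Authors"])
print(bad_df.loc[0, "Authors"])
```

UTF-8 is already the default for `read_csv`, so passing `encoding="utf-8"` mainly makes the choice explicit.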


_[This cell creates several variables. The first line counts the unique values in the `Authors` column. The second and third lines find the earliest and most recent publication years in the data set. `iloc[:, 4:]` selects every row and the columns from position 4 onward, which are the five star-rating columns. Passing `axis=1` to `sum` adds those columns together within each row, giving one total per book. Without it, `sum` would add up each column instead, and the result would not line up with the rows, so the `Total Reviews` column would show up as NaN. The next line adds up that new column to get the total number of reviews across all books.]_

In [10]:
author_count = len(goodreads_df["Authors"].unique())

earliest_year = goodreads_df["Publication Year"].min()
latest_year = goodreads_df["Publication Year"].max()

goodreads_df['Total Reviews'] = goodreads_df.iloc[:, 4:].sum(axis=1)
total_reviews = sum(goodreads_df['Total Reviews'])
goodreads_df

Unnamed: 0,ISBN,Publication Year,Original Title,Authors,One Star Reviews,Two Star Reviews,Three Star Reviews,Four Star Reviews,Five Star Reviews,Total Reviews
0,439023483,2008.0,The Hunger Games,Suzanne Collins,66715,127936,560092,1481305,2706317,4942365.0
1,439554934,1997.0,Harry Potter and the Philosopher's Stone,"J.K. Rowling, Mary GrandPré",75504,101676,455024,1156318,3011543,4800065.0
2,316015849,2005.0,Twilight,Stephenie Meyer,456191,436802,793319,875073,1355439,3916824.0
3,61120081,1960.0,To Kill a Mockingbird,Harper Lee,60427,117415,446835,1001952,1714267,3340896.0
4,743273567,1925.0,The Great Gatsby,F. Scott Fitzgerald,86236,197621,606158,936012,947718,2773745.0
5,525478817,2012.0,The Fault in Our Stars,John Green,47994,92723,327550,698471,1311871,2478609.0
6,618260307,1937.0,The Hobbit or There and Back Again,J.R.R. Tolkien,46023,76784,288649,665635,1119718,2196809.0
7,316769177,1951.0,The Catcher in the Rye,J.D. Salinger,109383,185520,455042,661516,709176,2120637.0
8,1416524797,2000.0,Angels & Demons,Dan Brown,77841,145740,458429,716569,680175,2078754.0
9,679783261,1813.0,Pride and Prejudice,Jane Austen,54700,86485,284852,609755,1155673,2191465.0
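
To make the effect of `axis=1` concrete, here is a minimal sketch on a made-up two-book frame. The column layout mirrors the notebook's, but the values and the names `df`, `review_cols`, `row_totals`, `col_totals`, and `Misaligned` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature of the Goodreads frame (values are made up).
df = pd.DataFrame({
    "ISBN": [111, 222],
    "Publication Year": [2001.0, 2002.0],
    "Original Title": ["Book A", "Book B"],
    "Authors": ["Author A", "Author B"],
    "One Star Reviews": [1, 2],
    "Five Star Reviews": [10, 20],
})

# All rows, columns from position 4 onward (the review columns).
review_cols = df.iloc[:, 4:]

# axis=1 sums across the columns of each row: one total per book.
row_totals = review_cols.sum(axis=1)

# The default (axis=0) sums down each column: one total per column,
# labelled by column name rather than by row number.
col_totals = review_cols.sum()

df["Total Reviews"] = row_totals
# Assignment aligns on index labels; column names do not match the
# row labels 0 and 1, so every entry becomes NaN.
df["Misaligned"] = col_totals

print(df[["Original Title", "Total Reviews", "Misaligned"]])
```

Note that re-running the notebook's cell after `Total Reviews` exists would make `iloc[:, 4:]` include that column as well, so the cell is best run once on a freshly loaded frame.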


In [14]:
%whos


Variable         Type         Data/Info
---------------------------------------
author_count     int          4664
earliest_year    float64      -1750.0
goodreads_df     DataFrame                ISBN  Publica<...>[10000 rows x 10 columns]
goodreads_path   str          Resources/books_clean.csv
latest_year      float64      2017.0
pd               module       <module 'pandas' from '//<...>ages/pandas/__init__.py'>
summary_table    DataFrame       Total Unique Authors  <...>    2017.0    596873216.0
total_reviews    float        596873216.0


_[This cell creates a DataFrame that displays the summary values we are interested in. The DataFrame is built from a dictionary, which we can tell from the curly brackets passed to `pd.DataFrame`: each key becomes a column name and each value becomes that column's data. `author_count` is in square brackets because pandas cannot build a DataFrame from a dictionary that contains only single (scalar) values unless it is told how many rows to create. Wrapping one value in a list sets the length to one row, and the remaining single values are filled in to match.]_

In [13]:
summary_table = pd.DataFrame({"Total Unique Authors": [author_count],
                              "Earliest Year": earliest_year,
                              "Latest Year": latest_year,
                              "Total Reviews": total_reviews})
summary_table

Unnamed: 0,Total Unique Authors,Earliest Year,Latest Year,Total Reviews
0,4664,-1750.0,2017.0,596873216.0
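
The role of the square brackets can be checked with a short sketch. The values below are placeholders, not the notebook's results, and the names `all_scalars_error`, `one_row`, and `with_index` are assumptions for this example.

```python
import pandas as pd

# Hypothetical values standing in for the notebook's variables.
author_count = 3
earliest_year = 1900.0

# With only scalar values, pandas cannot tell how many rows to build
# and raises a ValueError.
try:
    pd.DataFrame({"Total Unique Authors": author_count,
                  "Earliest Year": earliest_year})
    all_scalars_error = None
except ValueError as err:
    all_scalars_error = str(err)

# One list-like value fixes the length at one row; the remaining
# scalar is repeated to match that length.
one_row = pd.DataFrame({"Total Unique Authors": [author_count],
                        "Earliest Year": earliest_year})

# Alternative: keep every value scalar and supply the index explicitly.
with_index = pd.DataFrame({"Total Unique Authors": author_count,
                           "Earliest Year": earliest_year},
                          index=[0])

print(all_scalars_error)
print(one_row)
```

Either form yields the same one-row table; the brackets could equally have been placed around any of the other values.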
