# What kind of book should we publish to get into the Amazon best sellers list?

We are a new publisher looking to make our first published book a huge success. We are focusing on a single book because our resources are limited: we can't afford to test the market by publishing multiple books and seeing which sells best. Another important detail is that we're limited to publishing e-books; hence, we won't be considering other media such as physical books or audiobooks.

Our aim is somewhat modest: we are only looking to be included in the Amazon best sellers list. If we can make that list, we will be more likely to appear in other best seller lists, such as the New York Times list.

In relation to the Amazon Book Best Sellers list, we want to answer the following questions:
1. Which genre is the most popular?
2. What is the average price of books?
3. How many reviews does a book receive on average? Does the average number of reviews change significantly depending on the genre?
4. How high is the rating a book receives on average? Does the average rating change significantly depending on the genre?
5. Are there any other factors aside from genre that make a book popular?

We'll start by answering the above questions, plus some incidental ones found along the way; we will then follow up by formulating one or two book profiles that could potentially become best sellers.

## Loading, Cleaning and Exploring the Data

### Load Data Set

In [1]:
import pandas as pd

amazon_books = pd.read_csv("Amazon_popular_books_dataset.csv")
print(amazon_books.shape)

# Display both head and tail of data frame
amazon_books

(2269, 40)


Unnamed: 0,asin,ISBN10,answered_questions,availability,brand,currency,date_first_available,delivery,department,description,...,upc,url,video,video_count,categories,best_sellers_rank,buybox_seller,image,number_of_sellers,colors
0,0007350813,0007350813,0,In Stock.,Emily Brontë,USD,,"[""FREE delivery Tuesday, December 28 if you sp...",,,...,,https://www.amazon.com/dp/0007350813,,0,"[""Books"",""Literature & Fiction"",""Genre Fiction""]","[{""category"":""Books / Literature & Fiction / H...",,,,
1,0007513763,9780007513765,0,In Stock.,Drew Daywalt,USD,,"[""FREE delivery Tuesday, December 28 if you sp...",,,...,,https://www.amazon.com/dp/0007513763,,0,"[""Books"",""Children's Books"",""Literature & Fict...","[{""category"":""Books / Children's Books / Liter...",VMG Books & Media,,,
2,0008183988,0008183988,0,,Bernard Cornwell,USD,,"[""FREE delivery January 4 - 10 if you spend $2...",,,...,,https://www.amazon.com/dp/0008183988,,0,"[""Books"",""Literature & Fiction"",""Genre Fiction""]","[{""category"":""Books / Literature & Fiction / H...",Reuseaworld,,,
3,0008305838,0008305838,0,In Stock.,David Walliams,USD,,"[""FREE delivery Tuesday, December 28 if you sp...",,,...,,https://www.amazon.com/dp/0008305838,,0,"[""Books"",""Children's Books"",""Literature & Fict...","[{""category"":""Books / Children's Books / Liter...",Bahamut Media,,,
4,0008375526,0008375526,0,In Stock.,Caroline Hirons,USD,,"[""FREE delivery Tuesday, December 28"",""Or fast...",,,...,,https://www.amazon.com/dp/0008375526,,0,"[""Books"",""Crafts, Hobbies & Home"",""Home Improv...","[{""category"":""Books / Health, Fitness & Dietin...",KathrynAshleyGallery,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2264,B07P5BPVGM,,0,,Jess Lourey,USD,,[],,,...,,https://www.amazon.com/dp/B07P5BPVGM,,0,"[""Books"",""Mystery, Thriller & Suspense"",""Thril...","[{""category"":""Books / Mystery, Thriller & Susp...",,,,
2265,B07P5JBCFL,,0,,"Heidi Murkoff (Author, Narrator), Meeghan Hola...",USD,,[],,,...,,https://www.amazon.com/dp/B07P5JBCFL,,0,"[""Books"",""Health, Fitness & Dieting"",""Women's ...","[{""category"":""Books / Health, Fitness & Dietin...",,,,
2266,B07NF7DFS2,,0,,"Clea Shearer (Author, Narrator), Joanna Teplin...",USD,,[],,,...,,https://www.amazon.com/dp/B07NF7DFS2,,0,"[""Books"",""Crafts, Hobbies & Home"",""Home Improv...","[{""category"":""Audible Books & Originals / Home...",,,,
2267,B07P67N918,,0,,"Lisa Jewell (Author), Tamaryn Payne (Narrator)...",USD,,[],,,...,,https://www.amazon.com/dp/B07P67N918,,0,"[""Books"",""Mystery, Thriller & Suspense"",""Thril...","[{""category"":""Audible Books & Originals / Myst...",,,,


The 2269 rows of the data frame match the data set [documentation](https://github.com/luminati-io/Amazon-popular-books-dataset#amazon-popular-books-dataset):

> This Amazon dataset contains 2269 best-selling books.

On the other hand, the documentation does not mention the number of columns. We should investigate the column names to make sure nothing odd happened when we loaded the data set.

In [2]:
# Show all columns
pd.DataFrame({'columns': amazon_books.columns})

Unnamed: 0,columns
0,asin
1,ISBN10
2,answered_questions
3,availability
4,brand
5,currency
6,date_first_available
7,delivery
8,department
9,description


### Column Selection

Observing the above results, the column names seem reasonable. Next, we'll select the relevant columns: even if keeping irrelevant columns has no impact on the results of our analysis, it clutters the display and slows down data processing and calculations.

The documentation mentions that the following are the key columns:

> Key data points included in this free dataset:
> - ASIN
> - ISBN10
> - Categories
> - Reviews count
> - Avg. rating
> - Number of sellers
> - URL
> - Image
> - Final price
> - Title
> - Description
> - Availability

The above columns should be enough for the purpose of this project, so we will restrict our data frame to them for now; we can always add the removed columns back later if there's a need for them.

| Full Name         | Column Name       |
|:------------------|:------------------|
| ASIN              | asin              |
| ISBN10            | ISBN10            |
| Categories        | categories        |
| Reviews count     | reviews_count     |
| Average rating    | rating            |
| Number of sellers | number_of_sellers |
| URL               | url               |
| Image             | image             |
| Final price       | final_price       |
| Title             | title             |
| Description       | description       |
| Availability      | availability      |
| Format            | format            |
| Author            | brand             |

We've also added the columns for Format and Author (which is called `brand` - odd, but makes sense because sometimes the book is authored collectively as an institution or publisher; and yes, authors are themselves brands in a way).
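
Before selecting columns, it can help to confirm that every name we plan to keep actually exists in the loaded frame, since selecting a missing name raises a `KeyError`. The following is a minimal sketch; the helper `missing_columns` and the short list of available names are hypothetical and only stand in for the real data set.

```python
def missing_columns(available, wanted):
    """Return the wanted column names that are absent from `available`, in order."""
    available_set = set(available)
    return [name for name in wanted if name not in available_set]


# Illustrative names only; with the real frame, pass `amazon_books.columns`.
available = ["asin", "ISBN10", "brand", "title", "rating"]
wanted = ["asin", "title", "format", "brand"]

print(missing_columns(available, wanted))  # ['format']
```

An empty result would mean that all of the wanted columns are present.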

In [3]:
# Define the columns to keep
keep_cols = ['asin', 
             'ISBN10', 
             'categories', 
             'reviews_count', 
             'rating', 
             'number_of_sellers', 
             'url', 
             'image', 
             'final_price',
             'title',
             'description',
             'availability',
             'format',
             'brand'
            ]

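# Select the columns; take a copy so that later in-place changes
# don't operate on a view of the original data frame
amazon_books_updated = amazon_books[keep_cols].copy()

# Review
amazon_books_updated.head()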

Unnamed: 0,asin,ISBN10,categories,reviews_count,rating,number_of_sellers,url,image,final_price,title,description,availability,format,brand
0,7350813,7350813,"[""Books"",""Literature & Fiction"",""Genre Fiction""]",13451,4.6 out of 5 stars,,https://www.amazon.com/dp/0007350813,,3.99,Wuthering Heights (Collins Classics),,In Stock.,"[{""name"":""Kindle"",""price"":""$0.99"",""url"":""/Wuth...",Emily Brontë
1,7513763,9780007513765,"[""Books"",""Children's Books"",""Literature & Fict...",16628,4.8 out of 5 stars,,https://www.amazon.com/dp/0007513763,,12.08,THE DAYS THE CRAYONS QUIT,,In Stock.,"[{""name"":""Kindle"",""price"":""$10.99"",""url"":""/Day...",Drew Daywalt
2,8183988,8183988,"[""Books"",""Literature & Fiction"",""Genre Fiction""]",11275,4.8 out of 5 stars,,https://www.amazon.com/dp/0008183988,,,War Lord: Book 13 (The Last Kingdom Series),,,"[{""name"":""Kindle"",""price"":""$11.99"",""url"":""/War...",Bernard Cornwell
3,8305838,8305838,"[""Books"",""Children's Books"",""Literature & Fict...",15520,4.8 out of 5 stars,,https://www.amazon.com/dp/0008305838,,20.43,Code Name Bananas: The hilarious and epic new ...,,In Stock.,"[{""name"":""Paperback"",""price"":""$24.78"",""url"":""/...",David Walliams
4,8375526,8375526,"[""Books"",""Crafts, Hobbies & Home"",""Home Improv...",10884,4.8 out of 5 stars,,https://www.amazon.com/dp/0008375526,,28.89,Skincare: The award-winning ultimate no-nonsen...,,In Stock.,"[{""name"":""Kindle"",""price"":""$16.99"",""url"":""/Ski...",Caroline Hirons


### Another Round of Column Selection

The previous column selection was mostly done to retrieve plausible columns for the analysis. Upon closer inspection, there are some columns we would like to remove:

| Full Name    | Column Name  | Reason for Removal                                                                                                                                                                                                                                                                                                                                                                                         |
|:-------------|:-------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ASIN         | asin         | The [Amazon Standard Identification Number](https://en.wikipedia.org/wiki/Amazon_Standard_Identification_Number) (ASIN) is not required. It's sufficient to identify a book by its title.                                                                                                                                                                                                                  |
| ISBN         | ISBN10       | Similar to the reason for ASIN removal, the [International Standard Book Number](https://en.wikipedia.org/wiki/ISBN) (ISBN) is not required.                                                                                                                                                                                                                                                               |
| URL          | url          | The url or link to the Amazon page for the book is not necessary. Our analysis doesn't cover anything that looks into the exact url or the page the url links to.                                                                                                                                                                                                                                          |
| Image        | image        | The image of the book is not needed. It could be useful, but it requires different (and potentially more complicated) data analysis methods which are beyond the scope of this project. A worthy candidate for investigation in our next project.                                                                                                                                                          |
| Description  | description  | Similar to book image, this could be useful but can be too complicated to analyse for the purpose of this project.                                                                                                                                                                                                                                                                                         |
| Availability | availability | We are not concerned with a book's stock availability on Amazon; if the book is included in the best seller list, it's good enough whether it's available or not. Other than that, stock availability data is dynamic and the data set is just a snapshot of a book's availability when the data was collected; we'll need a time series data set of availability if we want to use the data for analysis. |

In [4]:
# Define the columns to remove
remove_cols =   [
                 'asin', 
                 'ISBN10', 
                 'url', 
                 'image', 
                 'description',
                 'availability',
                 ]

# Remove the columns
amazon_books_updated = amazon_books_updated.drop(columns=remove_cols)

# Review
amazon_books_updated.head()

Unnamed: 0,categories,reviews_count,rating,number_of_sellers,final_price,title,format,brand
0,"[""Books"",""Literature & Fiction"",""Genre Fiction""]",13451,4.6 out of 5 stars,,3.99,Wuthering Heights (Collins Classics),"[{""name"":""Kindle"",""price"":""$0.99"",""url"":""/Wuth...",Emily Brontë
1,"[""Books"",""Children's Books"",""Literature & Fict...",16628,4.8 out of 5 stars,,12.08,THE DAYS THE CRAYONS QUIT,"[{""name"":""Kindle"",""price"":""$10.99"",""url"":""/Day...",Drew Daywalt
2,"[""Books"",""Literature & Fiction"",""Genre Fiction""]",11275,4.8 out of 5 stars,,,War Lord: Book 13 (The Last Kingdom Series),"[{""name"":""Kindle"",""price"":""$11.99"",""url"":""/War...",Bernard Cornwell
3,"[""Books"",""Children's Books"",""Literature & Fict...",15520,4.8 out of 5 stars,,20.43,Code Name Bananas: The hilarious and epic new ...,"[{""name"":""Paperback"",""price"":""$24.78"",""url"":""/...",David Walliams
4,"[""Books"",""Crafts, Hobbies & Home"",""Home Improv...",10884,4.8 out of 5 stars,,28.89,Skincare: The award-winning ultimate no-nonsen...,"[{""name"":""Kindle"",""price"":""$16.99"",""url"":""/Ski...",Caroline Hirons


The final list of columns will be the following:

| Full Name         | Column Name       |
|:------------------|:------------------|
| Categories        | categories        |
| Reviews count     | reviews_count     |
| Average rating    | rating            |
| Number of sellers | number_of_sellers |
| Final price       | final_price       |
| Title             | title             |
| Format            | format            |
| Author            | brand             |

### Handling missing values

Let's check for missing values.

In [5]:
# Show the number of missing values per column
amazon_books_updated.isnull().sum()

categories              0
reviews_count           0
rating                  0
number_of_sellers    2265
final_price           877
title                   0
format                 96
brand                   1
dtype: int64

It seems that the number of sellers, final price, format, and brand columns have missing values. The first two, especially number of sellers, have a startling number of missing values. Next, we'll go through the columns one by one to evaluate how to handle them.
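
Raw counts are easier to judge as a share of all rows: with 2269 rows, 2265 missing values is about 99.8% of the number of sellers column and 877 is about 38.7% of the final price column. A minimal sketch of that calculation, using a made-up toy frame (`toy`) rather than the real data:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `amazon_books_updated`; the values are made up.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "final_price": [9.99, np.nan, np.nan, 4.50],
    "number_of_sellers": [np.nan, np.nan, np.nan, 2.0],
})

# The mean of a boolean mask is the fraction of True values,
# i.e. the share of missing entries per column.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```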

**Missing values in the number of sellers column**

There's a significant number of missing values in this column. Interestingly enough, only 4 rows have a value.

In [6]:
# Show frequency of values in number_of_sellers
amazon_books_updated['number_of_sellers'].value_counts()

# Show the rows that have a value in number_of_sellers
amazon_books_updated[amazon_books_updated['number_of_sellers'].notnull()]

Unnamed: 0,categories,reviews_count,rating,number_of_sellers,final_price,title,format,brand
664,"[""Books"",""Humor & Entertainment"",""Puzzles & Ga...",37782,4.9 out of 5 stars,94.0,27.93,Player's Handbook (Dungeons & Dragons),,Wizards of the Coast
1194,"[""Books"",""Self-Help"",""Relationships"",""Love & R...",15377,4.7 out of 5 stars,19.0,8.01,Knock Knock What I Love about You Fill in the ...,,Knock Knock
2029,"[""Toys & Games"",""Learning & Education"",""Electr...",18477,4.8 out of 5 stars,4.0,17.99,"VTech Musical Rhymes Book, Pink",,VTech
2177,"[""Books"",""Calendars"",""Animals"",""Dogs""]",11249,4.8 out of 5 stars,2.0,16.99,Pooping Pooches White Elephant Gag Gift Calendar,,Pooping Pooches


It doesn't seem that we can glean anything important from the available values for number of sellers. Filling in the number of sellers ourselves is not an option because we do not know exactly when the data was collected; even if we did know, it's unclear whether past data could be found for that date. Our only option is to drop the number of sellers column, which we do without reluctance because it is not too important for our analysis.

In [7]:
amazon_books_updated.drop(columns=['number_of_sellers'], inplace=True)
# Verify column has been dropped
pd.DataFrame({"columns": amazon_books_updated.columns})

Unnamed: 0,columns
0,categories
1,reviews_count
2,rating
3,final_price
4,title
5,format
6,brand


**Missing values in the final price and format columns**

We put both the final price and format columns together in this section because they are actually related. Let's see what we mean by that:

In [8]:
# Print the frequency of values in both the final price and format columns
print(amazon_books_updated['final_price'].value_counts(dropna=False))

# Be ready for an eyesore
print(amazon_books_updated['format'].value_counts(dropna=False).head())

NaN      877
9.99      25
6.99      24
8.99      23
15.99     21
        ... 
17.26      1
25.01      1
21.06      1
58.88      1
9.48       1
Name: final_price, Length: 733, dtype: int64
NaN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                96
[{"name":"Kindle","price":"$0.00","url":"/Picture-Dorian-Gray-AmazonClassics-ebook/dp/B071HDXV91"},{"n

Note that the format column also contains prices, but each price is linked to a certain format. This means that a book might have multiple prices depending on its format.

Other than that, the values in the final price column are as expected, at least on a cursory look, but the values in the format column need further inspection. Before that, we need to do some data conversion for the format column because even though the above looks like a list, it is actually a string. Converting all the values into lists is our next step.

In [9]:
import ast
import numpy as np

'''
**Convert each value that is not null into a list**

`ast.literal_eval` evaluates and converts the literal string into the appropriate Python object. 

The [documentation](https://docs.python.org/3/library/ast.html#ast.literal_eval) mentioned that it is safe-ish; it prevents Python code from being called but depending on the input, it's possible to trigger memory or C-stack exhaustion.

We trust the collector of the data set enough to use `ast.literal_eval` in spite of the potential attack.
'''
amazon_books_updated['format'] = amazon_books_updated['format'].apply(lambda x: ast.literal_eval(x) if not pd.isna(x) else np.nan)

# Making sure that all values in format is either a list or a null value
print("Formats are either list or nulls: {}".format(amazon_books_updated['format'].apply(lambda x: type(x) is list or x is np.nan).all()))

# Verify if the number of nulls is still the same
amazon_books_updated['format'].isnull().sum()

Formats are either list or nulls: True


96
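
Since the raw strings in the sample output look like JSON (double-quoted keys and values), `json.loads` might be a possible alternative to `ast.literal_eval`. This is only a sketch under that assumption, which the data set documentation does not confirm; the helper `parse_formats` and the sample string are hypothetical.

```python
import json
import math


def parse_formats(raw):
    """Parse a JSON-encoded list of format dicts; pass missing values through as NaN."""
    # Missing values typically arrive as float NaN rather than as a string.
    if not isinstance(raw, str):
        return math.nan
    return json.loads(raw)


sample = '[{"name":"Kindle","price":"$0.99","url":"/example"}]'
parsed = parse_formats(sample)
print(parsed[0]["name"], parsed[0]["price"])
```

Unlike `ast.literal_eval`, `json.loads` only accepts JSON, so it would raise an error on any value that is a Python literal but not valid JSON (for example, one using single quotes).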

Next, let's explore the possible values for the format name and price.

In [11]:
'''
The value in the format column is a list containing dictionaries. 

Each element of the list is a dictionary with the following structure:
{
    "name": "Format",
    "price": "$0.00",
    "url": "/url/to/the/book/page/on/Amazon"
}

We would like to extract the name and price for each book. 

''' 

price_set = set()
format_set = set()

not_null_format = amazon_books_updated[amazon_books_updated['format'].notnull()]['format']

# Loop through the rows with no missing formats and add to the corresponding sets
for formats in not_null_format:
    for book_format in formats:
        format_name = book_format['name']
        price = book_format['price']
        
        format_set.add(format_name)
        price_set.add(price)

        
# Prepare for display
from IPython.display import display
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "last_expr"

format_df = pd.DataFrame({"format": list(format_set)})
display(format_df)

price_df = pd.DataFrame({"price": list(price_set)})
price_df

Unnamed: 0,format
0,MP3 CD
1,Flexibound
2,Paperback
3,Kindle & comiXology
4,Kindle Edition with Audio/Video
5,Library Binding
6,Plastic Comb
7,Multimedia CD
8,Loose Leaf
9,Rag Book


Unnamed: 0,price
0,
1,$51.95
2,$24.52
3,$15.51
4,$6.65
...,...
1398,$13.19
1399,$14.97
1400,$23.36
1401,$2.83


One thing is certain: we didn't expect such a variety of book formats (which must be obvious to everyone else but not us). This introduces slight complexity on our part because we need to select the formats that best represent our two more general book formats: physical book and e-book.

On the other hand, the values for price look reasonable except for the empty value.
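
The prices are still strings, so they would need converting before any averaging. A minimal sketch of one way to do that, assuming every non-empty price is a US dollar amount prefixed with `$` (as in the sample above); the helper `parse_price` is hypothetical.

```python
import math


def parse_price(text):
    """Convert a price string such as '$24.52' to a float; return NaN when empty or missing."""
    if not isinstance(text, str) or not text.strip():
        return math.nan
    # Drop the currency symbol and any thousands separators before converting.
    return float(text.strip().lstrip("$").replace(",", ""))


print(parse_price("$24.52"))  # 24.52
print(parse_price(""))        # nan
```

Mapping the empty string to NaN keeps it out of later averages instead of counting it as a price of zero.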

Let's identify which book format to keep.

**Identifying relevant formats**

Physical books on Amazon can come in many formats. After reviewing the books we intend to publish, we decided to focus on the following physical formats: paperback and hardcover. For e-books, the obvious choice is Kindle, and we ignore its derivatives, Kindle & comiXology and Kindle Edition with Audio/Video.

We've pruned the format list to only represent what we intend to publish:

| Format Name           | Definition                                                                                                                                                                                                                                                                                                            |
|:----------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Paperback             | A book with a thick paper or paperboard cover. To contrast it with the mass market paperback, a paperback is also called a "trade paperback" and is regarded as being higher in quality.                                                                                                                              |
| Perfect Paperback     | "Perfect" might be a misnomer here depending on your knowledge of book publication. A perfect paperback is not about the quality of the paperback but the kind of binding used. Perfect binding relies solely on adhesive and does not use any stitches.                                                                         |
| Mass Market Paperback | A mass market paperback is similar to a "trade" paperback, but it tends to be smaller and less durable. It's also generally cheaper since it's intended to be produced "in mass". This format is generally used when publishing very popular books in order to make it accessible, mostly price-wise, to most people. |
| Hardcover             | A book with rigid/hard protective covers. A book is typically published in hardcover early in its publication cycle and only for a limited amount of time. Once sales slow down, the publisher normally transitions to the cheaper paperback format to reinvigorate sales.                                           |
| Kindle                | An e-book meant for reading with an Amazon Kindle.                                                                                                                                                                                                                                                                    |

You can find some references on book formats in the <a href="#References">References</a> section.
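
The pruned list above can be collapsed into the two general formats we care about, physical book and e-book. The mapping below is a sketch of one possible grouping; the names `FORMAT_GROUPS` and `general_format` are hypothetical.

```python
# Hypothetical grouping of the retained Amazon formats into two general kinds.
FORMAT_GROUPS = {
    "Paperback": "physical",
    "Perfect Paperback": "physical",
    "Mass Market Paperback": "physical",
    "Hardcover": "physical",
    "Kindle": "e-book",
}


def general_format(format_name):
    """Return 'physical' or 'e-book' for a retained format, or None for anything else."""
    return FORMAT_GROUPS.get(format_name)


print(general_format("Mass Market Paperback"))  # physical
print(general_format("MP3 CD"))                 # None
```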

**Keep only the relevant book formats in the data set**



In [13]:
keep_format = ['Paperback', 'Kindle', 'Perfect Paperback', 'Mass Market Paperback', 'Hardcover']

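def keep_specific_format(format_list, format_to_keep):
    """Return the price of `format_to_keep`, or NaN if the book lacks that format."""
    # `x == np.nan` is always False, so test for a list instead
    if not isinstance(format_list, list):
        return np.nan
    for f_dict in format_list:
        if f_dict['name'] == format_to_keep:
            return f_dict['price']
    return np.nan

def keep_paperback_hardcover_kindle(row):
    """Add a `<format>_price` entry to the row for each format we keep."""
    book_formats = row['format']
    # Rows with a missing format are returned unchanged
    if not isinstance(book_formats, list):
        return row
    for book_format in book_formats:
        name = book_format['name']
        price = book_format['price']
        if name in keep_format:
            # e.g. "Mass Market Paperback" -> "mass_market_paperback_price"
            snakecase_name = "_".join(name.split(' ')).lower()
            row["{}_price".format(snakecase_name)] = price
    return row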

amazon_books_updated = amazon_books_updated.apply(keep_paperback_hardcover_kindle, axis=1)
amazon_books_updated.head()

Unnamed: 0,brand,categories,final_price,format,hardcover_price,kindle_price,mass_market_paperback_price,paperback_price,perfect_paperback_price,rating,reviews_count,title
0,Emily Brontë,"[""Books"",""Literature & Fiction"",""Genre Fiction""]",3.99,"[{'name': 'Kindle', 'price': '$0.99', 'url': '...",$10.99,$0.99,$4.95,,,4.6 out of 5 stars,13451,Wuthering Heights (Collins Classics)
1,Drew Daywalt,"[""Books"",""Children's Books"",""Literature & Fict...",12.08,"[{'name': 'Kindle', 'price': '$10.99', 'url': ...",$9.19,$10.99,,,,4.8 out of 5 stars,16628,THE DAYS THE CRAYONS QUIT
2,Bernard Cornwell,"[""Books"",""Literature & Fiction"",""Genre Fiction""]",,"[{'name': 'Kindle', 'price': '$11.99', 'url': ...",$18.29,$11.99,,,,4.8 out of 5 stars,11275,War Lord: Book 13 (The Last Kingdom Series)
3,David Walliams,"[""Books"",""Children's Books"",""Literature & Fict...",20.43,"[{'name': 'Paperback', 'price': '$24.78', 'url...",,,,$24.78,,4.8 out of 5 stars,15520,Code Name Bananas: The hilarious and epic new ...
4,Caroline Hirons,"[""Books"",""Crafts, Hobbies & Home"",""Home Improv...",28.89,"[{'name': 'Kindle', 'price': '$16.99', 'url': ...",,$16.99,,,,4.8 out of 5 stars,10884,Skincare: The award-winning ultimate no-nonsen...
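
The row-wise `apply` above can be slow on larger frames. As a possible alternative, the same wide layout could be built by exploding the format lists and pivoting. This is a sketch on made-up toy data (`toy`), not the notebook's method, and it assumes each format appears at most once per book.

```python
import pandas as pd

keep_format = ["Paperback", "Kindle", "Hardcover"]

# Toy data standing in for the parsed `format` column; titles and prices are made up.
toy = pd.DataFrame({
    "title": ["Book A", "Book B"],
    "format": [
        [{"name": "Kindle", "price": "$0.99"}, {"name": "Hardcover", "price": "$10.99"}],
        [{"name": "Paperback", "price": "$24.78"}, {"name": "MP3 CD", "price": "$30.00"}],
    ],
})

# One row per (book, format) pair; keep the original row label as `book_id`.
long = toy.explode("format").rename_axis("book_id").reset_index()
long = long.dropna(subset=["format"])  # skip books without any format
long["name"] = long["format"].apply(lambda d: d["name"])
long["price"] = long["format"].apply(lambda d: d["price"])

# Keep only the formats of interest, then spread them into one column per format.
wide = (
    long[long["name"].isin(keep_format)]
    .pivot(index="book_id", columns="name", values="price")
    .rename(columns=lambda n: n.lower().replace(" ", "_") + "_price")
)

# Attach the new price columns to the original rows by row label.
result = toy.join(wide)
print(result.drop(columns="format"))
```

Books with none of the kept formats do not appear in `wide`, so the join leaves their price columns as NaN.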


It seems that the information from the format column is much richer than that in the final price column. We also noticed that we have no idea which format the final price refers to. This highlights another problem: formats are needed because we're only interested in books that were published at least as a hardcover, paperback, or e-book; without the format, we risk adding non-representative books to our analysis.

For these reasons, we are considering removing the final price column and also any books with missing formats. Let's investigate the books with no formats first before removing anything.

In [27]:
# Get books with missing formats
no_formats = amazon_books_updated[amazon_books_updated['format'].isnull()]

display(no_formats.head(5))

# Get the percentage of books with missing formats; needed to decide whether to drop them
print("Number of books with missing formats: {}".format(len(no_formats)))
print("Percentage of total data set: {:.2%}".format(len(no_formats)/len(amazon_books_updated)))

Unnamed: 0,brand,categories,final_price,format,hardcover_price,kindle_price,mass_market_paperback_price,paperback_price,perfect_paperback_price,rating,reviews_count,title
148,Mrs Hinch,"[""Books"",""Crafts, Hobbies & Home"",""Home Improv...",19.7,,,,,,,4.8 out of 5 stars,15127,Mrs Hinch: The Activity Journal
173,Dr. Seuss,"[""Books"",""Children's Books"",""Literature & Fict...",11.98,,,,,,,4.8 out of 5 stars,11204,The Little Blue Box of Bright and Early Board ...
186,Roger Priddy,"[""Books"",""Children's Books"",""Education & Refer...",13.17,,,,,,,4.8 out of 5 stars,11915,First 100 Board Book Box Set (3 books): First ...
244,"by Wynn Kapit (Author), Lawrence M. Elson (Aut...","[""Books"",""New, Used & Rental Textbooks"",""Medic...",,,,,,,,4.6 out of 5 stars,10129,The Anatomy Coloring Book
273,Ann Whitford Paul,"[""Books"",""Children's Books"",""Growing Up & Fact...",4.44,,,,,,,4.8 out of 5 stars,58744,If Animals Kissed Good Night


Number of books with missing formats: 96
Percentage of total data set: 4.23%
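
If we do decide to go ahead with the removal, it could look something like the sketch below. It runs on a made-up toy frame (`toy`) and shows only one possible way to drop the rows and the column, not a decision already taken here.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `amazon_books_updated`; the values are made up.
toy = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C"],
    "final_price": [3.99, 19.70, np.nan],
    "format": [
        [{"name": "Kindle", "price": "$0.99"}],
        np.nan,
        [{"name": "Hardcover", "price": "$18.29"}],
    ],
})

# Drop rows without any format information, then drop the ambiguous price column.
cleaned = (
    toy.dropna(subset=["format"])
       .drop(columns=["final_price"])
       .reset_index(drop=True)
)
print(len(toy) - len(cleaned), "row(s) removed")
```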


## References



https://en.wikipedia.org/wiki/Hardcover

https://www.julesbuono.com/paperback-vs-mass-market-paperback/

https://en.wikipedia.org/wiki/Paperback#Mass-market

https://en.wikipedia.org/wiki/Amazon_Kindle

https://sellercentral.amazon.com/forums/t/what-is-a-perfect-paperback-in-listing-a-book/236479

https://en.wikibooks.org/wiki/Bookbinding/Perfect_binding