## Excel in Python with openpyxl

### Excel terminology

<table class="table table-hover">
<thead>
<tr>
<th>Term</th>
<th>Explanation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Spreadsheet or Workbook</td>
<td>A <strong>Spreadsheet</strong> is the main file you are creating or working with.</td>
</tr>
<tr>
<td>Worksheet or Sheet</td>
<td>A <strong>Sheet</strong> is used to split different kinds of content within the same spreadsheet. A <strong>Spreadsheet</strong> can have one or more <strong>Sheets</strong>.</td>
</tr>
<tr>
<td>Column</td>
<td>A <strong>Column</strong> is a vertical line, and it’s represented by an uppercase letter: <em>A</em>.</td>
</tr>
<tr>
<td>Row</td>
<td>A <strong>Row</strong> is a horizontal line, and it’s represented by a number: <em>1</em>.</td>
</tr>
<tr>
<td>Cell</td>
<td>A <strong>Cell</strong> is a combination of <strong>Column</strong> and <strong>Row</strong>, represented by both an uppercase letter and a number: <em>A1</em>.</td>
</tr>
</tbody>
</table>

### Install openpyxl

With pip: `pip install openpyxl`

With conda: `conda install openpyxl`

### Creating an Excel file

In [None]:
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active

sheet["A1"] = "hello"
sheet["B1"] = "world!"

workbook.save(filename="hello_world.xlsx")

### Reading an Excel file

We'll be using the *reviews-sample.xlsx* file throughout this lesson.

In [2]:
from openpyxl import load_workbook

workbook = load_workbook(filename="reviews-sample.xlsx")
print(workbook.sheetnames)

sheet = workbook.active
print('\n', sheet)

print('\n', sheet.title)

['amazon_reviews_us_Watches_v1_00-sample']

 <Worksheet "amazon_reviews_us_Watches_v1_00-sample">

 amazon_reviews_us_Watches_v1_00-sample




Now, after opening a spreadsheet, you can easily retrieve data from it like this:

In [3]:
print(sheet["A1"]) # wrong

print('\n', sheet["A1"].value) # correct

print('\n', sheet["F10"].value) # correct

<Cell 'amazon_reviews_us_Watches_v1_00-sample'.A1>

 marketplace

 G-Shock Men's Grey Sport Watch


To get the actual value of a cell, access its `.value` attribute. Otherwise, you’ll get the `Cell` object itself. You can also use the `.cell()` method to retrieve a cell by its row and column index.

As before, add `.value` to get the actual value rather than a `Cell` object:

In [4]:
print(sheet.cell(row=10, column=6)) # wrong

print('\n', sheet.cell(row=10, column=6).value) # correct

<Cell 'amazon_reviews_us_Watches_v1_00-sample'.F10>

 G-Shock Men's Grey Sport Watch


> **Note**: Even though in Python you’re used to a zero-indexed notation, with spreadsheets you’ll always use a one-indexed notation where the first row or column always has index 1.
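
To make the link between the two notations concrete, here is a minimal sketch on a throwaway in-memory workbook (the `"demo"` value is made up for illustration). It uses `get_column_letter` and `column_index_from_string` from `openpyxl.utils` to translate between column letters and one-based column numbers:

```python
from openpyxl import Workbook
from openpyxl.utils import column_index_from_string, get_column_letter

# Throwaway workbook, used only for this illustration
demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws["F10"] = "demo"

# Column "F" is the 6th column, so both notations reach the same cell
by_index = demo_ws.cell(row=10, column=6)
print(by_index.coordinate, by_index.value)

# Translate between column letters and one-based column numbers
col_number = column_index_from_string("F")
col_letter = get_column_letter(6)
print(col_number, col_letter)
```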

The above shows you the quickest way to open a spreadsheet. However, you can pass additional parameters to change the way a spreadsheet is loaded.

### Additional reading options

There are a few arguments you can pass to `load_workbook()` that change the way a spreadsheet is loaded. The most important ones are the following two Booleans:

* `read_only` loads a spreadsheet in read-only mode, which lets you open very large Excel files.
* `data_only` loads the values that were last stored for formula cells instead of the formulas themselves.
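
As a rough sketch of how the two flags behave, the snippet below writes a tiny workbook to a temporary folder and reloads it three ways. The file name `options_demo.xlsx` and the cell contents are made up for this example. Note that openpyxl does not evaluate formulas itself, so a file that has never been opened and saved by Excel has no stored result for `data_only=True` to return:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file, created only for this illustration
    path = os.path.join(tmp_dir, "options_demo.xlsx")

    demo_wb = Workbook()
    demo_ws = demo_wb.active
    demo_ws.append([1, 2])
    demo_ws["C1"] = "=A1+B1"
    demo_wb.save(path)

    # read_only=True streams rows instead of loading the whole sheet
    read_only_wb = load_workbook(path, read_only=True)
    first_row = next(read_only_wb.active.iter_rows(values_only=True))
    read_only_wb.close()  # read-only workbooks keep the file open until closed

    # Default load: the formula text is returned
    formula_text = load_workbook(path).active["C1"].value

    # data_only=True: the stored result is returned, which is missing here
    # because the file was written by openpyxl and never calculated by Excel
    stored_result = load_workbook(path, data_only=True).active["C1"].value

print(first_row[:2], formula_text, stored_result)
```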


### Iterating Through the Data

You can slice the data with a combination of columns and rows:

In [5]:
sheet["A1:C2"]

((<Cell 'amazon_reviews_us_Watches_v1_00-sample'.A1>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.B1>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.C1>),
 (<Cell 'amazon_reviews_us_Watches_v1_00-sample'.A2>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.B2>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.C2>))

You can get ranges of rows or columns:

In [6]:
# Get all cells from column A
sheet["A"]

(<Cell 'amazon_reviews_us_Watches_v1_00-sample'.A1>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A2>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A3>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A4>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A5>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A6>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A7>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A8>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A9>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A10>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A11>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A12>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A13>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A14>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A15>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A16>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A17>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A18>,
 <Cell 'amazon_reviews_us_Watches_v1_

In [7]:
# Get all cells for a range of columns
sheet["A:B"]

((<Cell 'amazon_reviews_us_Watches_v1_00-sample'.A1>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A2>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A3>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A4>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A5>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A6>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A7>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A8>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A9>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A10>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A11>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A12>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A13>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A14>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A15>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A16>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A17>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A18>,
  <Cell 'amazon_rev

In [8]:
# Get all cells from row 5
sheet[5]

(<Cell 'amazon_reviews_us_Watches_v1_00-sample'.A5>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.B5>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.C5>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.D5>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.E5>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.F5>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.G5>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.H5>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.I5>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.J5>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.K5>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.L5>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.M5>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.N5>,
 <Cell 'amazon_reviews_us_Watches_v1_00-sample'.O5>)

In [9]:
# Get all cells for a range of rows
sheet[5:6]

((<Cell 'amazon_reviews_us_Watches_v1_00-sample'.A5>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.B5>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.C5>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.D5>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.E5>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.F5>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.G5>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.H5>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.I5>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.J5>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.K5>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.L5>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.M5>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.N5>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.O5>),
 (<Cell 'amazon_reviews_us_Watches_v1_00-sample'.A6>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.B6>,
  <Cell 'amazon_reviews_us_Watches_v1_00-sample'.C6>,
  <Cell 'amazon_reviews_us_

There are also multiple ways of using normal Python generators to go through the data. The main methods you can use to achieve this are:

* `.iter_rows()`
* `.iter_cols()`

Both methods can receive the following arguments:

* `min_row`
* `max_row`
* `min_col`
* `max_col`

These arguments are used to set boundaries for the iteration:

In [10]:
for row in sheet.iter_rows(min_row=1,max_row=2,min_col=1,max_col=3):
    print(row)

(<Cell 'amazon_reviews_us_Watches_v1_00-sample'.A1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.B1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.C1>)
(<Cell 'amazon_reviews_us_Watches_v1_00-sample'.A2>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.B2>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.C2>)


In [11]:
for column in sheet.iter_cols(min_row=1,max_row=2,min_col=1,max_col=3):
    print(column)

(<Cell 'amazon_reviews_us_Watches_v1_00-sample'.A1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.A2>)
(<Cell 'amazon_reviews_us_Watches_v1_00-sample'.B1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.B2>)
(<Cell 'amazon_reviews_us_Watches_v1_00-sample'.C1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.C2>)


One additional argument you can pass to both methods is the Boolean `values_only`. When it’s set to `True`, the values of the cell are returned, instead of the `Cell` object:

In [None]:
for value in sheet.iter_rows(min_row=1,max_row=2,min_col=1,max_col=3,values_only=True):
    print(value)

If you want to iterate through the whole dataset, then you can also use the attributes `.rows` or `.columns` directly, which are shortcuts to using `.iter_rows()` and `.iter_cols()` without any arguments:

In [12]:
for row in sheet.rows:
    print(row)

(<Cell 'amazon_reviews_us_Watches_v1_00-sample'.A1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.B1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.C1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.D1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.E1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.F1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.G1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.H1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.I1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.J1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.K1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.L1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.M1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.N1>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.O1>)
(<Cell 'amazon_reviews_us_Watches_v1_00-sample'.A2>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.B2>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.C2>, <Cell 'amazon_reviews_us_Watches_v1_00-sample'.D2>, <Cell 'ama

### Manipulate data using Python’s default datatypes

For example, say you want to extract product information from the spreadsheet and into a dictionary where each key is a product ID.

A straightforward way to do this is to iterate over all the rows, pick the columns you know are related to product information, and then store that in a dictionary.

First of all, have a look at the headers and see what information you care most about:

In [13]:
for value in sheet.iter_rows(min_row=1,max_row=1,values_only=True):
    print(value)

('marketplace', 'customer_id', 'review_id', 'product_id', 'product_parent', 'product_title', 'product_category', 'star_rating', 'helpful_votes', 'total_votes', 'vine', 'verified_purchase', 'review_headline', 'review_body', 'review_date')


This code prints a tuple with all the column names you have in the spreadsheet. To start, grab the columns with names:

* product_id
* product_parent
* product_title
* product_category

In [14]:
for value in sheet.iter_rows(min_row=2,min_col=4,max_col=7,values_only=True):
    print(value)

('B00FALQ1ZC', 937001370, 'Invicta Women\'s 15150 "Angel" 18k Yellow Gold Ion-Plated Stainless Steel and Brown Leather Watch', 'Watches')
('B00D3RGO20', 484010722, "Kenneth Cole New York Women's KC4944 Automatic Silver Automatic Mesh Bracelet Analog Watch", 'Watches')
('B00DKYC7TK', 361166390, 'Ritche 22mm Black Stainless Steel Bracelet Watch Band Strap Pebble Time/Pebble Classic', 'Watches')
('B000EQS1JW', 958035625, "Citizen Men's BM8180-03E Eco-Drive Stainless Steel Watch with Green Canvas Band", 'Watches')
('B00A6GFD7S', 765328221, "Orient ER27009B Men's Symphony Automatic Stainless Steel Black Dial Mechanical Watch", 'Watches')
('B00EYSOSE8', 230493695, "Casio Men's GW-9400BJ-1JF G-Shock Master of G Rangeman Digital Solar Black Carbon Fiber Insert Watch", 'Watches')
('B00WM0QA3M', 549298279, "Fossil Women's ES3851 Urban Traveler Multifunction Stainless Steel Watch - Rose", 'Watches')
('B00A4EYBR0', 844009113, 'INFANTRY Mens Night Vision Analog Quartz Wrist Watch with Nato Nylon Wa

Now that you know how to get all the important product information you need, let’s put that data into a dictionary:

In [15]:
import json
from pprint import pprint

products = {}

# Use values_only=True to get the cells' values instead of Cell objects
for row in sheet.iter_rows(min_row=2,min_col=4,max_col=7,values_only=True):
    product_id = row[0]
    product = {
        "parent": row[1],
        "title": row[2],
        "category": row[3]
    }
    products[product_id] = product

# Use pprint to format the output for display
pprint(products)

# Just for fun, you could easily turn this into a JSON object:
#products_json = json.dumps(products)
#print(products_json)

{'B000B55AEA': {'category': 'Watches',
                'parent': 851729310,
                'title': 'Timex Easy Reader Day-Date Leather Strap Watch'},
 'B000EQS1JW': {'category': 'Watches',
                'parent': 958035625,
                'title': "Citizen Men's BM8180-03E Eco-Drive Stainless Steel "
                         'Watch with Green Canvas Band'},
 'B000FVE3BG': {'category': 'Watches',
                'parent': 824370661,
                'title': "Invicta Men's 3329 Force Collection Lefty Watch"},
 'B000GAWSA4': {'category': 'Watches',
                'parent': 700023949,
                'title': "Casio Men's AW80V-1BV"},
 'B000GB0G5M': {'category': 'Watches',
                'parent': 492842685,
                'title': "Casio Women's LQ139B-1B Classic Round Analog Watch"},
 'B000JQFX1G': {'category': 'Watches',
                'parent': 400836338,
                'title': "Invicta Men's 8926OB Pro Diver Stainless Steel "
                         'Automatic Watch with L
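
A variation on the loop above, sketched here under the assumption that the first row always holds the column names, pairs each header with its value so you don't have to remember positional indexes. The two product rows are made up so that the snippet runs without *reviews-sample.xlsx*:

```python
from openpyxl import Workbook

# Made-up stand-in for the review data, so the sketch runs on its own
demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws.append(["product_id", "product_parent", "product_title", "product_category"])
demo_ws.append(["ID-001", 111, "Example Watch One", "Watches"])
demo_ws.append(["ID-002", 222, "Example Watch Two", "Watches"])

rows = demo_ws.iter_rows(values_only=True)
header = next(rows)  # the first row holds the column names

products_by_id = {}
for row in rows:
    record = dict(zip(header, row))  # pair each column name with its value
    products_by_id[record["product_id"]] = record

print(products_by_id["ID-002"]["product_title"])
```

The trade-off is that the dictionary keys now follow the spreadsheet's header names rather than names you choose yourself.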

## Appending New Data

Before you start creating very complex spreadsheets, have a quick look at an example of how to append data to an existing spreadsheet.

Go back to the first example spreadsheet you created (hello_world.xlsx) and try opening it and appending some data to it, like this:

In [17]:
# Start by opening the spreadsheet and selecting the main sheet
workbook = load_workbook(filename="hello_world.xlsx")
sheet = workbook.active

# Write what you want into a specific cell
sheet["C1"] = "writing ;)"

# Save the spreadsheet
workbook.save(filename="hello_world_append.xlsx")

### Automatically adding a row to the end of the file

There's an alternative method of adding rows without specifying the cell, if you wish to append them to the end of the file, using the `append` method:

In [18]:
# Start by opening the spreadsheet and selecting the main sheet
workbook = load_workbook(filename="hello_world_append.xlsx")
sheet = workbook.active

# specify the values of the new row:
new_row = ['value1', 'value2', 'value3']

# append the new_row to the end of the file
sheet.append(new_row)

# Save the spreadsheet
workbook.save(filename="hello_world_append_row_to_end.xlsx")
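
Here is a small self-contained sketch of the same idea on an in-memory workbook (the values are placeholders). Each call to `append` writes one row directly below the current last row:

```python
from openpyxl import Workbook

demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws["A1"] = "hello"
demo_ws["B1"] = "world!"

# Each append() call writes one new row below the current last row
for new_row in (["value1", "value2", "value3"], [1, 2, 3]):
    demo_ws.append(new_row)

all_rows = list(demo_ws.iter_rows(values_only=True))
print(all_rows)
```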

### Adding values to cells in the next empty row or column

A few worksheet properties help with this task:

In [20]:
# Start by opening the spreadsheet and selecting the main sheet
workbook = load_workbook(filename="hello_world_append_row_to_end.xlsx")
sheet = workbook.active

# check out these properties:
print('Total rows:', sheet.max_row) # the total number of rows
print('Total columns:', sheet.max_column) # the total number of columns

# if you need to know the letter of the column:
from openpyxl.utils import get_column_letter
print('Letter of last column:', get_column_letter(sheet.max_column))

# let's add a value to a cell in the next new row and the next new column:
the_new_row = sheet.max_row + 1
the_new_column = get_column_letter(sheet.max_column + 1)
the_new_cell = f'{the_new_column}{the_new_row}'
print('The new cell:', the_new_cell)
sheet[the_new_cell] = 'new_value'

# Save the spreadsheet
workbook.save(filename="hello_world_append_to_new_row&column.xlsx")

Total rows: 2
Total columns: 3
Letter of last column: C
The new cell: D3


### Writing Excel spreadsheets

One thing you can do to help with the coming code examples is add the following function to your Python file or console:

In [21]:
def print_rows():
    for row in sheet.iter_rows(values_only=True):
        print(row)

### Adding and Updating Cell Values

You already learned how to add values to a spreadsheet like this:

````python
sheet["A1"] = "value"
````

There’s another way you can do this, by first selecting a cell and then changing its value:

````python
cell = sheet["A1"]
print(cell)        # <Cell 'Sheet'.A1>
print(cell.value)  # hello

cell.value = "hey"
print(cell.value)  # hey
````

The new value is only stored into the spreadsheet once you call `workbook.save()`.

When you add a value, openpyxl creates the cell if it didn’t exist before:

In [22]:
workbook = load_workbook(filename="hello_world.xlsx")
sheet = workbook.active

# Before, our spreadsheet has only 1 row
print_rows()

print('-' * 30)

# Try adding a value to row 10
sheet["B10"] = "test"
print_rows()

('hello', 'world!')
------------------------------
('hello', 'world!')
(None, None)
(None, None)
(None, None)
(None, None)
(None, None)
(None, None)
(None, None)
(None, None)
(None, 'test')


As you can see, when you add a value to cell B10, the sheet grows to 10 rows, with rows 2 to 9 holding only empty (`None`) cells, just so it can hold that test value.
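
A related point worth checking for yourself: in a regular (not read-only) workbook, merely accessing a cell is enough to create it in memory. The sketch below, on a throwaway workbook, compares `max_row` before and after looking at a cell that was never written to:

```python
from openpyxl import Workbook

demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws["A1"] = "hello"
rows_before = demo_ws.max_row

# Looking at B10 creates the cell in memory, even without assigning a value
_ = demo_ws["B10"]
rows_after = demo_ws.max_row

print(rows_before, rows_after)
```

This is one reason to prefer `.iter_rows()` with explicit boundaries over probing individual cells when you only want to read data.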

### Managing Rows and Columns

One of the most common things you have to do when manipulating spreadsheets is adding or removing rows and columns. The openpyxl package allows you to do that in a very straightforward way by using the methods:

* `.insert_rows()`
* `.delete_rows()`
* `.insert_cols()`
* `.delete_cols()`

Every single one of those methods can receive two arguments:

* `idx`
* `amount`

Using our basic *hello_world.xlsx* example again, let’s see how these methods work:

In [23]:
workbook = load_workbook(filename="hello_world.xlsx")
sheet = workbook.active

print_rows()
print('-' * 30)

# Insert a column before the existing column 1 ("A")
sheet.insert_cols(idx=1)
print_rows()
print('-' * 30)

# Insert 5 columns between column 2 ("B") and 3 ("C")
sheet.insert_cols(idx=3, amount=5)
print_rows()
print('-' * 30)

# Insert a new row in the beginning
sheet.insert_rows(idx=1)
print_rows()
print('-' * 30)

# Insert 3 new rows in the beginning
sheet.insert_rows(idx=1, amount=3)
print_rows()
print('-' * 30)

('hello', 'world!')
------------------------------
(None, 'hello', 'world!')
------------------------------
(None, 'hello', None, None, None, None, None, 'world!')
------------------------------
(None, None, None, None, None, None, None, None)
(None, 'hello', None, None, None, None, None, 'world!')
------------------------------
(None, None, None, None, None, None, None, None)
(None, None, None, None, None, None, None, None)
(None, None, None, None, None, None, None, None)
(None, None, None, None, None, None, None, None)
(None, 'hello', None, None, None, None, None, 'world!')
------------------------------


In [24]:
# Delete the created columns
sheet.delete_cols(idx=3, amount=5)
sheet.delete_cols(idx=1)
print_rows()
print('-' * 30)

# Delete the first 4 rows
sheet.delete_rows(idx=1, amount=4)
print_rows()

(None, None)
(None, None)
(None, None)
(None, None)
('hello', 'world!')
------------------------------
('hello', 'world!')


The only thing you need to remember is that when inserting new data (rows or columns), the insertion happens **before** the position given by the `idx` parameter.

So, if you do `insert_rows(1)`, it inserts a new row **before** the existing first row.

It’s the same for columns: when you call `insert_cols(2)`, it inserts a new column right **before** the already existing second column (B).

However, when deleting rows or columns, `.delete_...` deletes data **starting from** the index passed as an argument.

For example, when doing `delete_rows(2)` it deletes row 2, and when doing `delete_cols(3)` it deletes the third column (C).

### Managing Sheets

Sheet management is something you might not use that often, but it’s still worth knowing.

If you look back at the code examples from this tutorial, you’ll notice the following recurring piece of code:

````python
sheet = workbook.active
````

This is the way to select the default sheet from a spreadsheet. However, if you’re opening a spreadsheet with multiple sheets, then you can always select a specific one like this:

````python
# Let's say you have two sheets: "Products" and "Company Sales"
print(workbook.sheetnames)  # ['Products', 'Company Sales']

# You can select a sheet using its title
products_sheet = workbook["Products"]
sales_sheet = workbook["Company Sales"]
````

You can also change a sheet title very easily:

````python
workbook.sheetnames
['Products', 'Company Sales']

products_sheet = workbook["Products"]
products_sheet.title = "New Products"

workbook.sheetnames
['New Products', 'Company Sales']
````

If you want to create or delete sheets, then you can also do that with `.create_sheet()` and `.remove()`:

````python
workbook.sheetnames
['Products', 'Company Sales']

operations_sheet = workbook.create_sheet("Operations")
workbook.sheetnames
['Products', 'Company Sales', 'Operations']

# You can also define the position to create the sheet at
hr_sheet = workbook.create_sheet("HR", 0)
workbook.sheetnames
['HR', 'Products', 'Company Sales', 'Operations']

# To remove them, just pass the sheet as an argument to the .remove()
workbook.remove(operations_sheet)
workbook.sheetnames
['HR', 'Products', 'Company Sales']

workbook.remove(hr_sheet)
workbook.sheetnames
['Products', 'Company Sales']
````

One other thing you can do is make duplicates of a sheet using `.copy_worksheet()`:

````python
workbook.sheetnames
['Products', 'Company Sales']

products_sheet = workbook["Products"]
workbook.copy_worksheet(products_sheet)
<Worksheet "Products Copy">

workbook.sheetnames
['Products', 'Company Sales', 'Products Copy']
````

If you save the workbook after running the code above and then open it, you’ll notice that the sheet *Products Copy* is a duplicate of the sheet *Products*.
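
The snippets in this section assume a workbook that already contains the *Products* and *Company Sales* sheets. If you don't have one at hand, the following sketch builds an equivalent workbook in memory and runs the same operations end to end:

```python
from openpyxl import Workbook

# Build a workbook with the two sheets used in this section
demo_wb = Workbook()
products_sheet = demo_wb.active
products_sheet.title = "Products"
sales_sheet = demo_wb.create_sheet("Company Sales")

# Create a sheet at position 0, duplicate "Products", then remove the new sheet
hr_sheet = demo_wb.create_sheet("HR", 0)
products_copy = demo_wb.copy_worksheet(products_sheet)
demo_wb.remove(hr_sheet)

final_names = demo_wb.sheetnames
print(final_names)
```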

### Freezing rows and columns

In [25]:
workbook = load_workbook(filename="reviews-sample.xlsx")
sheet = workbook.active
sheet.freeze_panes = "B2"
workbook.save("sample_frozen.xlsx")
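
The value assigned to `freeze_panes` is the top-left cell of the area that keeps scrolling: every row above it and every column to its left stays in place. So `"B2"` in the example above freezes row 1 and column A. The sketch below, on made-up data, tries a few other settings:

```python
from openpyxl import Workbook

demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws.append(["id", "name", "score"])
demo_ws.append([1, "example", 4.5])

# Freeze only the header row: everything above row 2 stays visible
demo_ws.freeze_panes = "A2"
header_only = demo_ws.freeze_panes

# Freeze only the first column: everything left of column B stays visible
demo_ws.freeze_panes = "B1"
first_column_only = demo_ws.freeze_panes

# Assigning None removes the frozen panes again
demo_ws.freeze_panes = None

print(header_only, first_column_only, demo_ws.freeze_panes)
```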

### Adding filters

The code below is an example of how you would add some filters to our existing *reviews-sample.xlsx* spreadsheet:

In [26]:
workbook = load_workbook(filename="reviews-sample.xlsx")
sheet = workbook.active

# Check the used spreadsheet space using the attribute "dimensions"
print(sheet.dimensions)

sheet.auto_filter.ref = "A1:O100"
workbook.save(filename="sample_with_filters.xlsx")

A1:O100
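
Instead of hardcoding the range, one option is to reuse `sheet.dimensions`, as in this sketch on a small made-up sheet. Keep in mind that this only adds the filter controls to the header row; the rows themselves are not filtered or hidden until someone applies a filter in Excel:

```python
from openpyxl import Workbook

demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws.append(["product", "online", "store"])
demo_ws.append([1, 30, 45])
demo_ws.append([2, 40, 30])

# Let the filter cover whatever range the sheet currently uses
demo_ws.auto_filter.ref = demo_ws.dimensions
print(demo_ws.auto_filter.ref)
```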


### Adding formulas

Formulas (or *formulae*) are one of the most powerful features of spreadsheets.

They give you the power to apply specific mathematical equations to a range of cells. Using formulas with `openpyxl` is as simple as editing the value of a cell.

You can see the list of formula names that openpyxl recognizes:

In [28]:
from openpyxl.utils import FORMULAE

FORMULAE

frozenset({'ABS',
           'ACCRINT',
           'ACCRINTM',
           'ACOS',
           'ACOSH',
           'AMORDEGRC',
           'AMORLINC',
           'AND',
           'AREAS',
           'ASC',
           'ASIN',
           'ASINH',
           'ATAN',
           'ATAN2',
           'ATANH',
           'AVEDEV',
           'AVERAGE',
           'AVERAGEA',
           'AVERAGEIF',
           'AVERAGEIFS',
           'BAHTTEXT',
           'BESSELI',
           'BESSELJ',
           'BESSELK',
           'BESSELY',
           'BETADIST',
           'BETAINV',
           'BIN2DEC',
           'BIN2HEX',
           'BIN2OCT',
           'BINOMDIST',
           'CEILING',
           'CELL',
           'CHAR',
           'CHIDIST',
           'CHIINV',
           'CHITEST',
           'CHOOSE',
           'CLEAN',
           'CODE',
           'COLUMN',
           'COLUMNS',
           'COMBIN',
           'COMPLEX',
           'CONCATENATE',
           'CONFIDENCE',
           'CO

Let’s add some formulas to our *reviews-sample.xlsx* spreadsheet.

Starting with something easy, let’s check the average star rating for the 99 reviews within the spreadsheet:

In [27]:
workbook = load_workbook(filename="reviews-sample.xlsx")
sheet = workbook.active

# Star rating is column "H"
sheet["P2"] = "=AVERAGE(H2:H100)"
workbook.save(filename="sample_formulas.xlsx")

You can use the same methodology to add any formulas to your spreadsheet. For example, let’s count the number of reviews that had helpful votes:

In [29]:
# The helpful votes are counted on column "I"
sheet["P3"] = '=COUNTIF(I2:I100, ">0")'
workbook.save(filename="sample_formulas.xlsx")

Strings within a formula must always be in double quotes. So either use single quotes around the whole formula, as in the example above, or escape the double quotes inside the formula: `"=COUNTIF(I2:I100, \">0\")"`.
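
Because a formula is just a string, you can also build it from the sheet's size rather than hardcoding the range. The sketch below uses made-up ratings and checks the function name against `FORMULAE` first. openpyxl stores the formula text only; the result is calculated when the file is opened in Excel:

```python
from openpyxl import Workbook
from openpyxl.utils import FORMULAE

demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws.append(["product", "star_rating"])
for name, stars in [("a", 5), ("b", 3), ("c", 4)]:
    demo_ws.append([name, stars])

last_row = demo_ws.max_row
function_name = "AVERAGE"
if function_name not in FORMULAE:
    raise ValueError(f"{function_name} is not a formula name known to openpyxl")

# Build the range from the current number of rows and write the formula below the data
formula = f"={function_name}(B2:B{last_row})"
demo_ws.cell(row=last_row + 1, column=2, value=formula)

print(demo_ws["B5"].value)
```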

### Adding styles

You can apply multiple styling options to your spreadsheet, including fonts, borders, colors, and so on. Have a look at the `openpyxl` <a href="https://openpyxl.readthedocs.io/en/stable/styles.html" target="_blank">documentation</a> to learn more.

You can also choose to either apply a style directly to a cell or create a template and reuse it to apply styles to multiple cells.

Let’s start by having a look at simple cell styling, using our *reviews-sample.xlsx* again as the base spreadsheet:

In [30]:
workbook = load_workbook(filename="reviews-sample.xlsx")
sheet = workbook.active

# Import necessary style classes
from openpyxl.styles import Font, Color, Alignment, Border, Side, colors

# Create a few styles
bold_font = Font(bold=True)
# "00FF0000" is the aRGB hex value for red; newer openpyxl releases no longer define colors.RED
big_red_text = Font(color="00FF0000", size=20)
center_aligned_text = Alignment(horizontal="center")
double_border_side = Side(border_style="double")
square_border = Border(top=double_border_side,right=double_border_side,bottom=double_border_side,left=double_border_side)

# Style some cells!
sheet["B2"].font = bold_font
sheet["B3"].font = big_red_text
sheet["B4"].alignment = center_aligned_text
sheet["B5"].border = square_border
workbook.save(filename="sample_styles.xlsx")

When you want to apply multiple styles to one or several cells, you can use a `NamedStyle` class instead, which is like a style template that you can use over and over again. Have a look at the example below:

In [31]:
from openpyxl.styles import NamedStyle

# Let's create a style template for the header row
header = NamedStyle(name="header")
header.font = Font(bold=True)
header.border = Border(bottom=Side(border_style="thin"))
header.alignment = Alignment(horizontal="center", vertical="center")

# Now let's apply this to all first row (header) cells
header_row = sheet[1]
for cell in header_row:
    cell.style = header

workbook.save(filename="sample_styles.xlsx")
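
Once a named style is registered with a workbook, it can also be applied by its name. The sketch below registers a style explicitly with `add_named_style()` and then assigns it as a string. The style name `demo_header` and the sample headers are made up for this example:

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment, Font, NamedStyle

demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws.append(["id", "name", "score"])

header_style = NamedStyle(name="demo_header")
header_style.font = Font(bold=True)
header_style.alignment = Alignment(horizontal="center")

# Register the style once, then refer to it by name
demo_wb.add_named_style(header_style)
for cell in demo_ws[1]:
    cell.style = "demo_header"

print(demo_ws["A1"].style, demo_ws["A1"].font.bold)
```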

### Conditional formatting

Conditional formatting is a more powerful approach to styling because it applies styles dynamically as the data in the spreadsheet changes.

In a nutshell, conditional formatting allows you to specify a list of styles to apply to a cell (or cell range) according to specific conditions.

For example, a widespread use case is to have a balance sheet where all the negative totals are in red, and the positive ones are in green. This formatting makes it much more efficient to spot good vs bad periods.

In [32]:
from openpyxl.styles import PatternFill, colors
from openpyxl.styles.differential import DifferentialStyle
from openpyxl.formatting.rule import Rule

# "00FF0000" is the aRGB hex value for red; newer openpyxl releases no longer define colors.RED
red_background = PatternFill(bgColor="00FF0000")
diff_style = DifferentialStyle(fill=red_background)
rule = Rule(type="expression", dxf=diff_style)
rule.formula = ["$H1<3"]
sheet.conditional_formatting.add("A1:O100", rule)
workbook.save("sample_conditional_formatting.xlsx")

Code-wise, the only things that are new here are the objects `DifferentialStyle` and `Rule`:

* `DifferentialStyle` is quite similar to `NamedStyle`, which you already saw above, and it’s used to aggregate multiple styles such as fonts, borders, alignment, and so forth.
* `Rule` is responsible for selecting the cells and applying the styles if the cells match the rule’s logic.

Using a `Rule` object, you can create numerous conditional formatting scenarios.
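
For simple comparisons, `openpyxl.formatting.rule` also provides `CellIsRule`, which builds the rule and its differential style in one call. The following is a minimal sketch on made-up ratings, with an arbitrary light-red fill color, that highlights values below 3:

```python
from openpyxl import Workbook
from openpyxl.formatting.rule import CellIsRule
from openpyxl.styles import PatternFill

demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws.append(["star_rating"])
for stars in (5, 2, 4, 1):
    demo_ws.append([stars])

# Arbitrary light-red background, chosen only for this illustration
light_red = PatternFill(bgColor="FFC7CE")

# Highlight every cell in the range whose value is less than 3
low_rating_rule = CellIsRule(operator="lessThan", formula=["3"], fill=light_red)
demo_ws.conditional_formatting.add("A2:A5", low_rating_rule)

print(low_rating_rule.type, low_rating_rule.operator)
```

Compared with the `Rule(type="expression", ...)` version above, this form compares each cell's own value, so it cannot color a whole row based on a single column.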

However, for simplicity’s sake, the `openpyxl` package offers three built-in formats that make it easier to create a few common conditional formatting patterns. These built-ins are:

* `ColorScale`
* `IconSet`
* `DataBar`

The `ColorScale` gives you the ability to create color gradients:

In [33]:
from openpyxl.formatting.rule import ColorScaleRule

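# aRGB hex values for red and green; newer openpyxl releases no longer define colors.RED and colors.GREEN
color_scale_rule = ColorScaleRule(start_type="min", start_color="00FF0000", end_type="max", end_color="0000FF00")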

# Again, let's add this gradient to the star ratings, column "H"
sheet.conditional_formatting.add("H2:H100", color_scale_rule)
workbook.save(filename="sample_conditional_formatting_color_scale.xlsx")

The `IconSet` allows you to add an icon to the cell according to its value:

In [34]:
from openpyxl.formatting.rule import IconSetRule

workbook = load_workbook(filename="reviews-sample.xlsx")
sheet = workbook.active

icon_set_rule = IconSetRule("5Arrows", "num", [1, 2, 3, 4, 5])
sheet.conditional_formatting.add("H2:H100", icon_set_rule)
workbook.save("sample_conditional_formatting_icon_set.xlsx")

Finally, the `DataBar` allows you to create progress bars:

In [36]:
from openpyxl.formatting.rule import DataBarRule

workbook = load_workbook(filename="reviews-sample.xlsx")
sheet = workbook.active

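# "0000FF00" is the aRGB hex value for green; newer openpyxl releases no longer define colors.GREEN
data_bar_rule = DataBarRule(start_type="num", start_value=1, end_type="num", end_value="5", color="0000FF00")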
sheet.conditional_formatting.add("H2:H100", data_bar_rule)
workbook.save("sample_conditional_formatting_data_bar.xlsx")

### Adding images

To be able to load images to a spreadsheet using `openpyxl`, you’ll have to install `Pillow`:

* `pip install Pillow`
* `conda install Pillow`

This is the code you need to import an image (here a file named *logo.jpeg*, which must exist in your working directory) into the *hello_world.xlsx* spreadsheet:

In [37]:
from openpyxl import load_workbook
from openpyxl.drawing.image import Image

# Let's use the hello_world spreadsheet since it has less data
workbook = load_workbook(filename="hello_world.xlsx")
sheet = workbook.active

logo = Image("logo.jpeg")

# A bit of resizing to not fill the whole spreadsheet with the logo
logo.height = 150
logo.width = 150

sheet.add_image(logo, "A3")
workbook.save(filename="hello_world_logo.xlsx")

### Adding charts

For any chart you want to build, you’ll need to define the chart type (`BarChart`, `LineChart`, and so forth) plus the data to be used for the chart, which is passed as a `Reference`. See all the available charts <a href="https://openpyxl.readthedocs.io/en/stable/charts/introduction.html#chart-types" target="_blank">here</a>.

Before you can build your chart, you need to define what data you want to see represented in it. Sometimes, you can use the dataset as is, but other times you need to massage the data a bit to get additional information.

Let’s start by building a new workbook with some sample data:

In [39]:
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

workbook = Workbook()
sheet = workbook.active

# Let's create some sample sales data
rows = [
    ["Product", "Online", "Store"],
    [1, 30, 45],
    [2, 40, 30],
    [3, 40, 25],
    [4, 50, 30],
    [5, 30, 25],
    [6, 25, 35],
    [7, 20, 40],
]

for row in rows:
    sheet.append(row)

Now let's create a bar chart that displays the total number of sales per product:

In [40]:
chart = BarChart()
data = Reference(worksheet=sheet,
                 min_row=1,
                 max_row=8,
                 min_col=2,
                 max_col=3)

chart.add_data(data, titles_from_data=True)
sheet.add_chart(chart, "E2")

workbook.save("bar_chart.xlsx")

Like with images, the top left corner of the chart is on the cell you added the chart to. In your case, it was on cell E2.
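
The bar chart above plots both series, but nothing tells it which product each group of bars belongs to. As a sketch, you can label the groups with the values from the `Product` column and add titles. The title strings are placeholders chosen for this example, the sample data is shortened, and the workbook is saved to an in-memory buffer only so the snippet doesn't write files:

```python
from io import BytesIO

from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

workbook = Workbook()
sheet = workbook.active

rows = [
    ["Product", "Online", "Store"],
    [1, 30, 45],
    [2, 40, 30],
    [3, 40, 25],
]
for row in rows:
    sheet.append(row)

chart = BarChart()
chart.title = "Sales per product"   # placeholder title
chart.x_axis.title = "Product"
chart.y_axis.title = "Units sold"

# Columns B and C, including the header row used for the series titles
data = Reference(worksheet=sheet, min_row=1, max_row=4, min_col=2, max_col=3)
chart.add_data(data, titles_from_data=True)

# Column A without its header: one category label per product
categories = Reference(worksheet=sheet, min_row=2, max_row=4, min_col=1, max_col=1)
chart.set_categories(categories)

sheet.add_chart(chart, "E2")

# Saved to an in-memory buffer here; pass a filename instead to write to disk
buffer = BytesIO()
workbook.save(buffer)
```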

Let's try creating a line chart instead. First we'll create a new workbook with some data:

In [41]:
import random
from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

workbook = Workbook()
sheet = workbook.active

# Let's create some sample sales data
rows = [
    ["", "January", "February", "March", "April",
     "May", "June", "July", "August", "September",
     "October", "November", "December"],
    [1, ],
    [2, ],
    [3, ],
]

for row in rows:
    sheet.append(row)

for row in sheet.iter_rows(min_row=2,
                           max_row=4,
                           min_col=2,
                           max_col=13):
    for cell in row:
        cell.value = random.randrange(5, 100)

Let's create our line chart:

In [42]:
chart = LineChart()
data = Reference(worksheet=sheet,
                 min_row=2,
                 max_row=4,
                 min_col=1,
                 max_col=13)

chart.add_data(data, from_rows=True, titles_from_data=True)
sheet.add_chart(chart, "C6")

workbook.save("line_chart.xlsx")

One thing to keep in mind here is that you’re passing `from_rows=True` when adding the data. This argument makes the chart plot row by row instead of column by column.

In your sample data, each product has a row with 12 values (one column per month). That’s why you use `from_rows`. If you don’t pass that argument, the chart plots by column by default, and you get one series per month instead of one series per product.

A related difference is that the `Reference` now starts from the first column, `min_col=1`, instead of the second one. This change is needed because, with `titles_from_data=True`, the chart takes the first cell of each row as the title of that series.
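
To see the effect of `from_rows`, here is a small sketch that builds the same chart twice over the same `Reference` and counts the resulting series. The `build_chart` helper and the constant sample values are assumptions made for this illustration only:

```python
from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

workbook = Workbook()
sheet = workbook.active

months = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Header row plus three product rows with 12 monthly values each
sheet.append([""] + months)
for product in (1, 2, 3):
    sheet.append([product] + [10 * product] * 12)


def build_chart(from_rows):
    """Build a LineChart over rows 2-4, columns A-M (hypothetical helper)."""
    chart = LineChart()
    data = Reference(worksheet=sheet, min_row=2, max_row=4, min_col=1, max_col=13)
    chart.add_data(data, from_rows=from_rows, titles_from_data=True)
    return chart


by_row = build_chart(from_rows=True)
by_column = build_chart(from_rows=False)

print(len(by_row.series))     # one series per product row
print(len(by_column.series))  # one series per column in the reference
```

With `from_rows=True` there is one series per product, which is the layout you want for a chart of sales over the months.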

There are a couple of other things you can also change regarding the style of the chart. For example, you can add specific categories to the chart:

````python
cats = Reference(worksheet=sheet,
                 min_row=1,
                 max_row=1,
                 min_col=2,
                 max_col=13)
chart.set_categories(cats)
````

Another thing you can do to improve the chart readability is to add axis titles. You can do it using the attributes `x_axis` and `y_axis`:

````python
chart.x_axis.title = "Months"
chart.y_axis.title = "Sales (per unit)"
````

There is also a way to style your chart by using Excel’s default `ChartStyle` property. In this case, you have to choose a number between 1 and 48. Depending on your choice, the colors of your chart change as well:

````python
# You can play with this by choosing any number between 1 and 48
chart.style = 24
````

There is no clear documentation on what each style number looks like, but <a href="https://1drv.ms/x/s!Asf0Y5Y4GI3Mg6kZNRd1IA09NLWv9A" target="_blank">this spreadsheet</a> has a few examples of the styles available.

Here's the complete code for this example:

In [43]:
import random
from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference

workbook = Workbook()
sheet = workbook.active

# Let's create some sample sales data
rows = [
    ["", "January", "February", "March", "April",
     "May", "June", "July", "August", "September",
     "October", "November", "December"],
    [1, ],
    [2, ],
    [3, ],
]

for row in rows:
    sheet.append(row)

for row in sheet.iter_rows(min_row=2,
                           max_row=4,
                           min_col=2,
                           max_col=13):
    for cell in row:
        cell.value = random.randrange(5, 100)

# Create a LineChart and add the main data
chart = LineChart()
data = Reference(worksheet=sheet,
                 min_row=2,
                 max_row=4,
                 min_col=1,
                 max_col=13)
chart.add_data(data, titles_from_data=True, from_rows=True)

# Add categories to the chart
cats = Reference(worksheet=sheet,
                 min_row=1,
                 max_row=1,
                 min_col=2,
                 max_col=13)
chart.set_categories(cats)

# Add titles to the X and Y axes
chart.x_axis.title = "Months"
chart.y_axis.title = "Sales (per unit)"

# Apply a specific Style
chart.style = 24

# Save!
sheet.add_chart(chart, "C6")
workbook.save("line_chart.xlsx")

### Bonus: Working With Pandas

Even though you can use Pandas to handle Excel files, there are a few things that you either can’t accomplish with Pandas or that you’d be better off doing with openpyxl directly.

For example, openpyxl makes it easy to customize your spreadsheet with styles, conditional formatting, and so on.

But guess what, you don’t have to pick one. In fact, openpyxl supports both converting a Pandas `DataFrame` into a workbook and converting an openpyxl workbook into a Pandas `DataFrame`.

#### From Pandas to Openpyxl

Let’s create a sample `DataFrame`:

In [45]:
import pandas as pd

data = {
    "Product Name": ["Product 1", "Product 2"],
    "Sales Month 1": [10, 20],
    "Sales Month 2": [5, 35],
}
df = pd.DataFrame(data)

Now that you have some data, you can use `dataframe_to_rows()` to turn the `DataFrame` into rows and append them to a worksheet:

In [46]:
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

workbook = Workbook()
sheet = workbook.active

for row in dataframe_to_rows(df, index=False, header=True):
    sheet.append(row)

workbook.save("pandas.xlsx")

If you want to add the DataFrame’s index, you can set `index=True`, and each row’s index is written into your spreadsheet as well.
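
As a sketch of what `index=True` changes, the snippet below generates the rows both ways from the same sample data and writes the indexed version to a fresh worksheet. The variable names are chosen for this example. Depending on your openpyxl version, you may also see a nearly empty row right after the header, which holds the name of the index:

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.DataFrame({
    "Product Name": ["Product 1", "Product 2"],
    "Sales Month 1": [10, 20],
    "Sales Month 2": [5, 35],
})

rows_without_index = list(dataframe_to_rows(df, index=False, header=True))
rows_with_index = list(dataframe_to_rows(df, index=True, header=True))

workbook = Workbook()
sheet = workbook.active
for row in rows_with_index:
    sheet.append(row)

# The index values end up in column A, next to the original columns
for row in sheet.iter_rows(values_only=True):
    print(row)
```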

#### From Openpyxl to Pandas

On the other hand, if you want to convert a spreadsheet into a DataFrame, you can also do it in a very straightforward way like so:

In [47]:
import pandas as pd
from openpyxl import load_workbook

workbook = load_workbook(filename="reviews-sample.xlsx")
sheet = workbook.active

values = sheet.values
df = pd.DataFrame(values)

df.head(3)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
0,marketplace,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date
1,US,3653882,R3O9SGZBVQBV76,B00FALQ1ZC,937001370,"Invicta Women's 15150 ""Angel"" 18k Yellow Gold ...",Watches,5,0,0,N,Y,Five Stars,Absolutely love this watch! Get compliments al...,2015-08-31
2,US,14661224,RKH8BNC3L5DLF,B00D3RGO20,484010722,Kenneth Cole New York Women's KC4944 Automatic...,Watches,5,0,0,N,Y,I love thiswatch it keeps time wonderfully,I love this watch it keeps time wonderfully.,2015-08-31


Alternatively, if you want to add the correct headers and use the review ID as the index, for example, then you can also do it like this instead:

In [48]:
import pandas as pd
from openpyxl import load_workbook

workbook = load_workbook(filename="reviews-sample.xlsx")
sheet = workbook.active

data = sheet.values

# Set the first row as the columns for the DataFrame
cols = next(data)
data = list(data)

# Set the column "review_id", the third column (position 2), as the index for each row
idx = [row[2] for row in data]

df = pd.DataFrame(data, index=idx, columns=cols)

df.head(3)

Unnamed: 0,marketplace,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date
R3O9SGZBVQBV76,US,3653882,R3O9SGZBVQBV76,B00FALQ1ZC,937001370,"Invicta Women's 15150 ""Angel"" 18k Yellow Gold ...",Watches,5,0,0,N,Y,Five Stars,Absolutely love this watch! Get compliments al...,2015-08-31
RKH8BNC3L5DLF,US,14661224,RKH8BNC3L5DLF,B00D3RGO20,484010722,Kenneth Cole New York Women's KC4944 Automatic...,Watches,5,0,0,N,Y,I love thiswatch it keeps time wonderfully,I love this watch it keeps time wonderfully.,2015-08-31
R2HLE8WKZSU3NL,US,27324930,R2HLE8WKZSU3NL,B00DKYC7TK,361166390,Ritche 22mm Black Stainless Steel Bracelet Wat...,Watches,2,1,1,N,Y,Two Stars,Scratches,2015-08-31
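
Picking the index column by position is easy to get wrong by one. As an alternative sketch, you can let Pandas select the column by name with `set_index()`. The workbook below is a small in-memory stand-in for *reviews-sample.xlsx*, and its values are made up for this example:

```python
import pandas as pd
from openpyxl import Workbook

# Small in-memory stand-in for reviews-sample.xlsx (values are made up)
workbook = Workbook()
sheet = workbook.active
sheet.append(["marketplace", "customer_id", "review_id", "star_rating"])
sheet.append(["US", 1001, "R-EXAMPLE-1", 5])
sheet.append(["US", 1002, "R-EXAMPLE-2", 2])

data = sheet.values
cols = next(data)  # the first row holds the column names

df = pd.DataFrame(list(data), columns=cols)
df = df.set_index("review_id")  # select the index column by name

print(df)
```

Note that `set_index()` moves `review_id` out of the regular columns by default, whereas building the index from a list keeps the column in place as well.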
