# Exporting Data

> Data science is not effective without saving results.
>
> \- Another wise person

## Applied Review

### Data in Python

* Data is frequently represented inside a **DataFrame** - a class from the pandas library
* Other structures exist, too - dicts, models, etc.
* Data is stored in memory - this makes it relatively quick to access
* Data is session-specific, so quitting Python (shutting down Jupyter notebook) removes the data from memory

### Importing Data

* Tabular data can be imported into DataFrames using the `pd.read_csv()` function - there are parameters for different options
* Other data formats like JSON (key-value pairs) and Pickle (native Python) can be imported using the `with` statement and respective functions:
  * JSON files use the `load()` function from the `json` library
  * Pickle files use the `load()` function from the `pickle` library
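
As a quick refresher, the three import patterns above can be sketched on in-memory data. The sample values and variable names below are made up for illustration, and `io.StringIO`/`io.BytesIO` are assumed stand-ins for files that would normally be opened with `with open(...)`:

```python
import io
import json
import pickle

import pandas as pd

# Hypothetical in-memory stand-ins for files on disk
csv_file = io.StringIO("tailnum;year\nN10156;2004\nN102UW;1998\n")
json_file = io.StringIO('{"first": "Guido", "last": "van Rossum"}')
pickle_file = io.BytesIO(pickle.dumps({"first": "Guido"}))

# Tabular data: the sep parameter handles a non-comma delimiter
review_df = pd.read_csv(csv_file, sep=";")

# JSON and Pickle: load() reads from an open file object
review_json = json.load(json_file)
review_pickle = pickle.load(pickle_file)

print(review_df.shape)
print(review_json["last"])
print(review_pickle)
```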

## General Model

### General Framework

A general way to conceptualize data export from Python to disk:

1. Data sits in memory in the Python session
2. Python code can be used to copy the data from Python's memory to an appropriate format on disk

![export-framework.png](images/export-framework.png)
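
The two steps above can be sketched in a few lines. This is a minimal illustration rather than the method used later in this lesson: the `results` dict, the file name, and the use of a temporary directory are all assumptions made for the example.

```python
import json
import os
import tempfile

# Step 1: data sits in memory in the Python session
results = {"model": "baseline", "accuracy": 0.91}

# Step 2: Python code copies the data to a file on disk
# (a temporary directory is used here so nothing in ../data is touched)
export_path = os.path.join(tempfile.mkdtemp(), "results.json")
with open(export_path, "w") as f:
    json.dump(results, f)

# The in-memory object is unchanged, and a copy now exists on disk
file_exists = os.path.exists(export_path)
print(file_exists)
```

Note that exporting copies the data: `results` is still available in the session afterwards.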

## Exporting DataFrames

Remember that DataFrames are representations of tabular data -- therefore, knowing how to export DataFrames to tabular data files is important.

### Exporting Setup

We need data to export.

Let's begin by revisiting the importing of tabular data into a DataFrame:

In [None]:
import pandas as pd
planes_df = pd.read_csv('../data/planes.csv')

Next, let's do some manipulations on `planes_df`.

<font style="color:#008;">
    <strong>Question</strong>:<br><em>How do we select the `year` and `manufacturer` variables while returning a DataFrame?</em>
</font>

In [None]:
planes_df = planes_df[['year', 'manufacturer']]

<font style="color:#008;">
    <strong>Question</strong>:<br><em>How do we compute the average `year` by `manufacturer`?</em>
</font>

In [7]:
avg_year_by_man_df = planes_df.groupby('manufacturer', as_index = False).mean()


Let's view our result:

In [None]:
avg_year_by_man_df
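
To see what `as_index = False` contributes to the `groupby()` call, here is a small sketch on made-up data (the values are not taken from `planes.csv`, and `toy_df` is a hypothetical name):

```python
import pandas as pd

# Made-up data, not taken from planes.csv
toy_df = pd.DataFrame({
    "manufacturer": ["BOEING", "BOEING", "AIRBUS"],
    "year": [1998, 2002, 2004],
})

# as_index=False keeps the grouping variable as a regular column
as_column = toy_df.groupby("manufacturer", as_index=False).mean()

# The default (as_index=True) moves the grouping variable into the index
as_index = toy_df.groupby("manufacturer").mean()

print(list(as_column.columns))  # ['manufacturer', 'year']
print(list(as_index.columns))   # ['year']
```

Keeping `manufacturer` as a column means it is still written out when the index is later excluded from an export.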

### Exporting DataFrames with Pandas

DataFrames can be exported using a method built into the DataFrame object itself: `DataFrame.to_csv()`.

In [None]:
avg_year_by_man_df.to_csv('../data/avg_year_by_man.csv')

Let's reimport to see the tabular data we just exported:

In [None]:
pd.read_csv('../data/avg_year_by_man.csv').head()

Notice the extra column!

<font style="color:#008;">
    <strong>Question</strong>:<br><em>Where did the extra column come from?</em>
</font>

We can elect not to save the index with the DataFrame by passing `False` to the `index` parameter of `to_csv()`:

In [None]:
avg_year_by_man_df.to_csv('../data/avg_year_by_man.csv', index = False)
pd.read_csv('../data/avg_year_by_man.csv').head()
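
The effect of the `index` parameter can also be checked without writing a file. The sketch below uses a made-up two-row DataFrame and relies on `to_csv()` returning the CSV text as a string when no path is passed:

```python
import io

import pandas as pd

# Made-up data standing in for avg_year_by_man_df
toy_df = pd.DataFrame({"manufacturer": ["AIRBUS", "BOEING"], "year": [2004.0, 2000.0]})

# Default: the index (0, 1) is written as an unnamed first column
with_index = toy_df.to_csv()
# index=False: only the DataFrame's own columns are written
without_index = toy_df.to_csv(index=False)

cols_with = list(pd.read_csv(io.StringIO(with_index)).columns)
cols_without = list(pd.read_csv(io.StringIO(without_index)).columns)
print(cols_with)     # ['Unnamed: 0', 'manufacturer', 'year']
print(cols_without)  # ['manufacturer', 'year']
```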

The `to_csv()` method has similar parameters to `read_csv()`. A few examples:

* `sep` - the data's delimiter
* `header` - whether or not to write out the column names
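
A short sketch of both parameters, again on made-up data and again returning the CSV text as a string instead of writing to disk:

```python
import pandas as pd

# Made-up data for illustration
toy_df = pd.DataFrame({"manufacturer": ["AIRBUS", "BOEING"], "year": [2004, 2000]})

# With no path given, to_csv() returns the CSV text as a string
tab_separated = toy_df.to_csv(sep="\t", index=False)
no_header = toy_df.to_csv(header=False, index=False)

print(tab_separated.splitlines())  # ['manufacturer\tyear', 'AIRBUS\t2004', 'BOEING\t2000']
print(no_header.splitlines())      # ['AIRBUS,2004', 'BOEING,2000']
```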

Full documentation can be pulled up by running the method name followed by a question mark:

In [8]:
pd.DataFrame.to_csv?

Signature: pd.DataFrame.to_csv(self, path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression=None, quoting=None, quotechar='"', line_terminator='\n', chunksize=None, tupleize_cols=...

### Your Turn

1. Exporting data is copying data from Python's ________ to the ________.
2. Fill in the blanks to complete the following code:

   ```python
   import pandas as pd
   flights_df = pd.________('../data/flights.csv')
   flights_to_cvg_df = flights_df[flights_df[________] == 'CVG']
   flights_to_cvg_df.________('../data/flights_to_cvg.csv', ________ = False)
   ```

## Exporting Other Files

Recall that we have already imported JSON and Pickle files -- now we will see how to save them.

### JSON Files

Take a look at the below `dict`:

In [13]:
dict_example = {
    "first": "Guido",
    "last": "van Rossum"
}

We can save it as a JSON file using the `with` statement and the `dump` function from the `json` library:

In [14]:
import json
with open('../data/dict_example_export.json', 'w') as f:
    json.dump(dict_example, f)

We can then reimport this to verify we saved it correctly:

In [15]:
with open('../data/dict_example_export.json', 'r') as f:
    imported_json = json.load(f)

In [16]:
type(imported_json)

dict

In [17]:
imported_json

{'first': 'Guido', 'last': 'van Rossum'}
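
The round trip above returns an equal `dict` because the example holds only strings. As a hedged illustration with made-up data, other Python types may come back changed, since JSON has no tuple type and only allows string keys:

```python
import json

# Made-up example containing a tuple value and an integer key
original = {"name": ("Guido", "van Rossum"), 1991: "first release"}

# dumps()/loads() do the same conversion as dump()/load(), but on strings
round_tripped = json.loads(json.dumps(original))

print(round_tripped)
# {'name': ['Guido', 'van Rossum'], '1991': 'first release'}
print(round_tripped == original)  # False
```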

### Pickle Files

<font style="color:#008;">
    <strong>Question</strong>:<br><em>What are Pickle files?</em>
</font>

Python's native data files are known as **Pickle** files:

* Pickle files conventionally use the `.pickle` (or `.pkl`) extension
* Pickle files are great for saving native Python data that can't easily be represented by other file types
  * Pre-processed data
  * Models
  * Any other Python object...
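
As a small sketch of "data that can't easily be represented by other file types", the made-up object below contains a `set` and a `date`, which the `json` library rejects but `pickle` handles. `pickle.dumps()` and `pickle.loads()` are the in-memory counterparts of the file-based functions, used here so that no file is written:

```python
import datetime
import json
import pickle

# Made-up object mixing types that JSON has no representation for
native_data = {
    "tailnums": {"N10156", "N102UW"},           # set
    "exported_on": datetime.date(2020, 1, 15),  # date
}

# json refuses the object...
try:
    json.dumps(native_data)
    json_ok = True
except TypeError:
    json_ok = False

# ...while pickle serializes it to bytes and restores an equal object
restored = pickle.loads(pickle.dumps(native_data))

print(json_ok)                  # False
print(restored == native_data)  # True
```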

#### Exporting Pickle Files

Pickle files can be exported using the `dump()` function from the `pickle` library, paired with the `with` statement and the `open()` function:

In [22]:
import pickle
with open('../data/pickle_example_export.pickle', 'wb') as f:
    pickle.dump(dict_example, f)

We can then reimport this to verify we saved it correctly:

In [23]:
with open('../data/pickle_example_export.pickle', 'rb') as f:
    imported_pickle = pickle.load(f)

In [24]:
type(imported_pickle)

dict

In [25]:
imported_pickle

{'first': 'Guido', 'last': 'van Rossum'}

# Questions

Are there any questions before we move on?