# Exporting
## JSON, CSV and timelines

This notebook provides an example of how the exporter functions in INCA can be used.

First of all, we have to instantiate INCA.

In [1]:
# Instantiating INCA
from inca import Inca
myinca = Inca()


If you have never scraped any data with INCA before, your Elastic Search database is most likely empty. We can look at the content of the Elastic Search database by running the following command:

In [None]:
myinca.database.list_doctypes()

Let's start by scraping some nu.nl articles using the RSS scraper function so that we have some data to work with.

In [None]:
myinca.rssscrapers.nu()

Checking our Elastic Search database again, we can see that we now have some nu.nl articles.

In [None]:
myinca.database.list_doctypes()

You might want to use this data in other programs, such as Stata or SPSS. To do so, you can use one of the export functions to save the data to your own computer. INCA has two options for this: CSV and JSON.

### CSV
Below we export the nu.nl articles from the Elastic Search database as a CSV file. This function has the following additional options:
 * destination: By default, the function creates a folder named exports in which it stores the output. If we already have a destination folder in which we want to store the CSV file, we can specify the destination parameter.
 * fields: By default, all fields are included in the output. Let's say we only want to include the title, text and publication date of the nu.nl articles. You can see the code for this below.
 * include_meta: By default, META is not included in the output. If we do think it is necessary to include this information, we can set this parameter to True.
 * remove_linebreaks: By default, all line breaks within cells are replaced by a space. If we want to keep the line breaks, we can set the remove_linebreaks parameter to False.
 * delimiter: By default, the delimiter is set to ','. European locales of Microsoft Excel use ';' as a delimiter. Therefore, to ensure compatibility, we can set the delimiter to a semicolon.

In [None]:
# CSV
# Exporting nu.nl articles
myinca.importers_exporters.export_csv(query = 'doctype:"nu"')

# Destination folder
myinca.importers_exporters.export_csv(query = 'doctype:"nu"', destination = '/home/marieke/mycsvfiles')

# Fields
myinca.importers_exporters.export_csv(query = 'doctype:"nu"', fields = ["title", "text", "publication_date"] )

# Include META
myinca.importers_exporters.export_csv(query = 'doctype:"nu"', include_meta = True)

# Keep line breaks (by default, line breaks within cells are replaced by a space)
myinca.importers_exporters.export_csv(query = 'doctype:"nu"', remove_linebreaks = False)

# Delimiter
myinca.importers_exporters.export_csv(query = 'doctype:"nu"', delimiter = ';')
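
To make the effect of the remove_linebreaks and delimiter options more concrete, here is a minimal sketch that uses only Python's standard csv module. It does not call INCA and is not INCA's implementation: the sample documents and the helper `write_csv` are hypothetical and only mimic the behaviour described above.

```python
import csv
import io

# Hypothetical documents standing in for exported nu.nl articles
docs = [
    {"title": "First article", "text": "Line one\nLine two", "publication_date": "2018-01-01"},
    {"title": "Second; article", "text": "Just one line", "publication_date": "2018-01-02"},
]

def write_csv(documents, fields, delimiter=",", remove_linebreaks=True):
    """Write the selected fields of each document to a CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, delimiter=delimiter,
                            lineterminator="\n")
    writer.writeheader()
    for doc in documents:
        row = {}
        for field in fields:
            value = str(doc.get(field, ""))
            if remove_linebreaks:
                # Replace line breaks inside a cell by a single space
                value = value.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")
            row[field] = value
        writer.writerow(row)
    return buffer.getvalue()

# Semicolon-delimited output, as expected by European locales of Excel
csv_text = write_csv(docs, ["title", "text", "publication_date"], delimiter=";")
print(csv_text)

# Reading the result back with the same delimiter recovers the rows
rows = list(csv.DictReader(io.StringIO(csv_text), delimiter=";"))
```

Note that a value that itself contains the delimiter (the second title here) is quoted by the csv module, so it still reads back as a single cell.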

### JSON
For JSON, you have the option to export the documents in one file, or to create a JSON file for each document. (The latter is not recommended, as this can result in a large number of files!) Below we export the nu.nl articles as JSON file(s). There are the following additional options:
 * destination: As with the CSV export, you can specify a destination folder in which the output is stored.
 * compression: By default, the output is not compressed. If we want a gzipped output file, we can set this parameter to 'gz'. (Use 'bz2' for bzip2 compression.)
 * include_meta: As in the CSV export function, the default is False. To include META, set this parameter to True.

In [None]:
# JSON
# Separate files
myinca.importers_exporters.export_json_files(query = 'doctype:"nu"')

# One file
myinca.importers_exporters.export_json_file(query = 'doctype:"nu"')

# Destination
myinca.importers_exporters.export_json_file(query = 'doctype:"nu"', destination = '/home/marieke/myjsonfiles')

# Compression
myinca.importers_exporters.export_json_file(query = 'doctype:"nu"', compression = 'gz')

# Include META
myinca.importers_exporters.export_json_file(query = 'doctype:"nu"', include_meta = True)
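
The difference between one (optionally compressed) file and one file per document can be illustrated with the standard json and gzip modules. This is only a sketch under assumptions: it does not call INCA, the sample documents and file names are hypothetical, and the file layout (a single JSON array) may differ from what INCA actually writes.

```python
import gzip
import json
import os
import tempfile

# Hypothetical documents standing in for exported nu.nl articles
docs = [
    {"doctype": "nu", "title": "First article", "text": "Some text"},
    {"doctype": "nu", "title": "Second article", "text": "More text"},
]

outdir = tempfile.mkdtemp()

# One file for all documents, gzip-compressed
single_path = os.path.join(outdir, "export.json.gz")
with gzip.open(single_path, "wt", encoding="utf-8") as handle:
    json.dump(docs, handle)

# One file per document: the number of files grows with the number of documents
for number, doc in enumerate(docs):
    doc_path = os.path.join(outdir, f"doc_{number}.json")
    with open(doc_path, "w", encoding="utf-8") as handle:
        json.dump(doc, handle)

# A gzipped file can be read back directly, without unpacking it first
with gzip.open(single_path, "rt", encoding="utf-8") as handle:
    loaded = json.load(handle)

per_document_files = sorted(
    name for name in os.listdir(outdir) if name.startswith("doc_")
)
print(len(loaded), per_document_files)
```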

### Timelines
The timeline function exports a file with the number of documents per time period. We specify the Elastic Search doctype in the queries parameter. There are the following additional options:

 * destination: By default, the output is a CSV file named timeline_export.csv. You can specify the folder, filename and type of file. Let's say we want to store the results as a JSON file, then we could set the path to: "/home/marieke/mytimelineoutput.json".
 * timefield: By default, the key under which the date/time is stored is set to 'publication_date'. For nu.nl articles, the date/time key is indeed 'publication_date'. However, if we, for instance, want to export YouTube videos as a timeline file, we can set the timefield to "publishedAt".
 * granularity: By default, the level of aggregation is set to 'week'. You can specify another interval, such as "year", "quarter", "month", "day", "hour", "minute" or "second". For instance, we can export the timeline with the nu.nl articles grouped on a monthly basis.

In [None]:
# Timelines
# Exporting nu.nl as timeline
myinca.importers_exporters.export_timeline(queries = 'doctype:"nu"')

# Destination
myinca.importers_exporters.export_timeline(queries = 'doctype:"nu"', destination = "/home/marieke/mytimelineoutput.json")

# Timefield
myinca.importers_exporters.export_timeline(queries = 'doctype:"youtube_videos"', timefield = "publishedAt")

# Granularity
myinca.importers_exporters.export_timeline(queries = 'doctype:"nu"', granularity = "month")
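
To clarify what "number of documents per time period" means, the sketch below counts documents per period in plain Python. It is not how INCA computes timelines and does not query Elastic Search. The helpers `period_key` and `timeline` and the sample documents are hypothetical, and only four of the granularities listed above are covered.

```python
from collections import Counter
from datetime import datetime

def period_key(timestamp, granularity="week"):
    """Return a label for the period that a timestamp falls into."""
    if granularity == "year":
        return timestamp.strftime("%Y")
    if granularity == "month":
        return timestamp.strftime("%Y-%m")
    if granularity == "week":
        iso_year, iso_week, _ = timestamp.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if granularity == "day":
        return timestamp.strftime("%Y-%m-%d")
    raise ValueError(f"granularity not covered in this sketch: {granularity}")

def timeline(documents, timefield="publication_date", granularity="week"):
    """Count documents per period, reading the date from the given field."""
    counts = Counter()
    for doc in documents:
        timestamp = datetime.fromisoformat(doc[timefield])
        counts[period_key(timestamp, granularity)] += 1
    return dict(sorted(counts.items()))

# Hypothetical documents; only the date field matters here
docs = [
    {"publication_date": "2018-01-03T09:30:00"},
    {"publication_date": "2018-01-20T14:00:00"},
    {"publication_date": "2018-02-05T08:15:00"},
]
videos = [{"publishedAt": "2018-03-01T12:00:00"}]

print(timeline(docs, granularity="month"))
print(timeline(docs, granularity="week"))
print(timeline(videos, timefield="publishedAt", granularity="year"))
```

A coarser granularity merges more documents into the same period: the three sample articles fall into three different weeks but only two different months.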