# Export documents as JSON and CSV files

This notebook shows how the exporter functions in INCA can be used. You might want to use the data you scraped in other programmes, such as STATA or SPSS. To do so, you can use one of the export functions to save the data to your own computer. INCA offers two formats for this: CSV and JSON. This example uses nu.nl news articles, but you can replace this with your doctype of interest.

First of all, we have to instantiate INCA.

In [1]:
from inca import Inca
myinca = Inca()

We can look at the contents of the Elasticsearch database by running the following command. Here you can see that the doctype of the nu.nl news articles is simply called "nu".

In [None]:
myinca.database.list_doctypes()

## CSV
Below we export the nu.nl articles from the Elasticsearch database as a CSV file.

In [None]:
# Exporting nu.nl articles
myinca.importers_exporters.export_csv(query = 'doctype:"nu"')
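Once the export has finished, the CSV file can be read back with Python's standard `csv` module, for instance to check its contents before loading it into STATA or SPSS. The sketch below is self-contained: it first writes a tiny stand-in file, because the name and location of the file INCA actually creates are not shown here. The file name `nu_sample.csv` and the sample rows are made up for illustration.

```python
import csv
import os
import tempfile

# Stand-in for an exported file; the name and the rows are hypothetical.
sample_path = os.path.join(tempfile.mkdtemp(), "nu_sample.csv")
with open(sample_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "publication_date"])
    writer.writerow(["First headline", "2017-03-01"])
    writer.writerow(["Second headline", "2018-06-15"])

# Read the file back; each row becomes a dict keyed by the header row.
with open(sample_path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(len(rows), "rows; columns:", list(rows[0].keys()))
```

To inspect a real export, point `sample_path` at the file INCA wrote and skip the writing step.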

__Selecting time period__

You probably do not want to export every document of your doctype that is in the database. You can limit the time period by specifying it in the Elasticsearch query.

To include only articles published after 2017, add a date condition to the query:

In [None]:
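# Assumes Elasticsearch query-string syntax, where an open-ended range is written as field:>value
myinca.importers_exporters.export_csv(query = 'doctype:"nu" AND publication_date:>2017')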

This function has the following additional options:
  *  destination
  *  fields
  *  include_meta
  *  remove_linebreaks
  *  delimiter

You can use any combination of these parameters.

__Destination__

By default, the function creates a folder named exports in which it stores the output. If we already have a destination folder in which we want to store the CSV file, we can specify the destination parameter.

In [None]:
myinca.importers_exporters.export_csv(query = 'doctype:"nu"', destination = '/home/marieke/mycsvfiles')

__Fields__

By default, all fields are included in the output. Let's say we only want to include the title, text and publication date of the nu.nl articles. You can see the code for this below.

In [None]:
myinca.importers_exporters.export_csv(query = 'doctype:"nu"', fields = ["title", "text", "publication_date"] )

If you want to see which fields are present in your documents, you can look at them with the following code:

In [None]:
myinca.database.doctype_fields('nu')
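To illustrate what restricting the fields means for the resulting file, here is a minimal sketch using only the standard library. It is not INCA's implementation; the documents and the extra `url` field are hypothetical, and `csv.DictWriter` merely stands in for whatever INCA does internally.

```python
import csv
import io

# Hypothetical documents; real nu.nl documents contain more fields.
docs = [
    {"title": "First headline", "text": "Body one",
     "publication_date": "2017-03-01", "url": "https://example.org/1"},
    {"title": "Second headline", "text": "Body two",
     "publication_date": "2018-06-15", "url": "https://example.org/2"},
]
fields = ["title", "text", "publication_date"]

buffer = io.StringIO()
# extrasaction="ignore" silently drops keys that are not listed in fieldnames.
writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
writer.writeheader()
writer.writerows(docs)

csv_text = buffer.getvalue()
print(csv_text)
```

Only the three listed columns end up in the output; the `url` values are left out.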

__Include_meta__ 

By default, META is not included in the output. If we do think it is necessary to include this information, we can set this parameter to True.

In [None]:
myinca.importers_exporters.export_csv(query = 'doctype:"nu"', include_meta = True)

__Remove_linebreaks__

By default, all line breaks within cells are replaced by a space. If we want to keep the line breaks, we can set the remove_linebreaks parameter to False.

In [None]:
myinca.importers_exporters.export_csv(query = 'doctype:"nu"', remove_linebreaks = False)
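Why line breaks inside cells matter can be shown with a small standard-library sketch. It is an illustration under assumptions, not INCA's code: the sample text is made up, and joining the lines with a space is just one simple way to flatten a cell.

```python
import csv
import io

text = "First paragraph.\nSecond paragraph.\r\nThird paragraph."

# Collapse every kind of line break into a single space.
flattened = " ".join(text.splitlines())

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["with breaks", text])
writer.writerow(["without breaks", flattened])

# A CSV parser that honours quoting still sees exactly two rows ...
rows = list(csv.reader(io.StringIO(buffer.getvalue(), newline="")))
# ... but a naive line count does not, which is what confuses some tools.
physical_lines = buffer.getvalue().splitlines()
print(len(rows), "rows,", len(physical_lines), "physical lines")
```

Programs that read a CSV file line by line, rather than with a proper CSV parser, will treat the first row as several broken rows.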

__Delimiter__

By default, the delimiter is set to ','. European locales of Microsoft Excel use ';' as a delimiter. Therefore, to ensure compatibility, we can set the delimiter to a semicolon.

In [None]:
myinca.importers_exporters.export_csv(query = 'doctype:"nu"', delimiter = ';')
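Whichever delimiter is chosen, the program that reads the file has to use the same one. The following sketch, with made-up rows and only the standard `csv` module, shows what happens when a semicolon-delimited file is read with the matching delimiter and with the wrong one.

```python
import csv
import io

# Hypothetical rows; note the comma inside the title.
rows = [["title", "publication_date"], ["Rain, wind and hail", "2017-03-01"]]

buffer = io.StringIO()
csv.writer(buffer, delimiter=";").writerows(rows)
semicolon_csv = buffer.getvalue()

# Reading with the matching delimiter recovers the original cells.
parsed = list(csv.reader(io.StringIO(semicolon_csv, newline=""), delimiter=";"))

# Reading with the default comma splits the rows in the wrong places.
misparsed = list(csv.reader(io.StringIO(semicolon_csv, newline="")))

print(parsed)
print(misparsed)
```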

## JSON
For JSON, you have the option to export the documents in one JSON file...

In [None]:
myinca.importers_exporters.export_json_file(query = 'doctype:"nu"')

...or to create a JSON file for each document. (This is not recommended, as it can result in a large number of files!)

In [None]:
myinca.importers_exporters.export_json_files(query = 'doctype:"nu"')

This function has the following additional options:
 * destination
 * compression
 * include_meta

Again, any combination of these parameters is possible.

__Destination__

As with the CSV export, you can specify a destination folder in which the output is stored.

In [None]:
myinca.importers_exporters.export_json_file(query = 'doctype:"nu"', destination = '/home/marieke/myjsonfiles')
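A folder of JSON files can be loaded back into Python with `pathlib` and `json`. The sketch below creates its own temporary folder so that it runs anywhere; the file names (`doc_0.json`, ...) and the one-object-per-file layout are assumptions made for illustration and may differ from what INCA writes.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for a destination folder; file names and contents are hypothetical.
destination = Path(tempfile.mkdtemp())
for i, title in enumerate(["First headline", "Second headline"]):
    path = destination / f"doc_{i}.json"
    path.write_text(json.dumps({"title": title}), encoding="utf-8")

# Load every JSON file in the folder, in a stable (sorted) order.
documents = [
    json.loads(path.read_text(encoding="utf-8"))
    for path in sorted(destination.glob("*.json"))
]
print(len(documents), "documents loaded")
```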

__Compression__ 

By default, the output is not compressed. If we want a gzipped output file, we can set this parameter to 'gz'.

In [None]:
myinca.importers_exporters.export_json_file(query = 'doctype:"nu"', compression = 'gz')

Or you can export a bzip2 compressed file by setting the parameter to 'bz2'.

In [None]:
myinca.importers_exporters.export_json_file(query = 'doctype:"nu"', compression = 'bz2')
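Compressed files do not need to be unpacked by hand before use: Python's `gzip` and `bz2` modules can open them directly. The sketch below writes and reads its own sample files. The file names are hypothetical, and it assumes the file holds a single JSON array; if an export stores one JSON object per line instead, parse it line by line with `json.loads`.

```python
import bz2
import gzip
import json
import os
import tempfile

docs = [{"title": "First headline"}, {"title": "Second headline"}]  # hypothetical
folder = tempfile.mkdtemp()
gz_path = os.path.join(folder, "sample.json.gz")
bz2_path = os.path.join(folder, "sample.json.bz2")

# Write the same documents with both compression schemes.
with gzip.open(gz_path, "wt", encoding="utf-8") as f:
    json.dump(docs, f)
with bz2.open(bz2_path, "wt", encoding="utf-8") as f:
    json.dump(docs, f)

# Reading works the same way: open in text mode and parse as usual.
with gzip.open(gz_path, "rt", encoding="utf-8") as f:
    from_gz = json.load(f)
with bz2.open(bz2_path, "rt", encoding="utf-8") as f:
    from_bz2 = json.load(f)

print(from_gz == from_bz2)
```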

__Include_meta__ 

As with the CSV export function, the default is set to False. To include META, set this parameter to True.

In [None]:
myinca.importers_exporters.export_json_file(query = 'doctype:"nu"', include_meta = True)