# Importing and Exporting Data

## Reading data, input formats

`pyddsde.Characterize()` takes two main input arguments: the time series `data` and the timestep `t`. See the [Getting Started](./1%20-%20Getting%20Started.ipynb) notebook for more details.

- `data` should be a list containing one or two NumPy arrays, for the scalar or vector case respectively.
- `t` can be either a scalar denoting the time interval between samples, or an array (of the same length as the array(s) in `data`) containing the timestamp of each data point. In either case, `pyddsde` assumes that the data points are evenly spaced in time.
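Before passing your own data to `Characterize()`, it can help to check that it matches the format described above. The sketch below uses a hypothetical helper, `check_inputs()` (not part of `pyddsde`), which validates the list of arrays, accepts `t` either as a scalar or as an array of timestamps, and verifies that the timestamps are evenly spaced.

```python
import numpy as np


def check_inputs(data, t, rtol=1e-6):
    """Hypothetical helper (not part of pyddsde): validate data/t and return the timestep."""
    if not isinstance(data, list) or len(data) not in (1, 2):
        raise ValueError("data must be a list of one (scalar) or two (vector) arrays")
    arrays = [np.asarray(a, dtype=float) for a in data]
    n = len(arrays[0])
    if any(a.ndim != 1 or len(a) != n for a in arrays):
        raise ValueError("arrays in data must be 1-D and of equal length")
    if np.isscalar(t):
        return float(t)  # scalar t: already the sampling interval
    t = np.asarray(t, dtype=float)
    if t.shape != (n,):
        raise ValueError("timestamp array must have one entry per data point")
    steps = np.diff(t)
    if not np.allclose(steps, steps[0], rtol=rtol):
        raise ValueError("timestamps are not evenly spaced")
    return float(steps[0])  # evenly spaced timestamps: return the common interval


rng = np.random.default_rng(0)
x, y = rng.standard_normal(1000), rng.standard_normal(1000)
dt_scalar = check_inputs([x], 0.1)                       # scalar case, scalar timestep
dt_vector = check_inputs([x, y], np.arange(1000) * 0.1)  # vector case, timestamp array
```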

The example notebooks use a helper function, `load_sample_dataset()`, to load example data, but the data can come from any source as long as it is in the correct format. For example, in the [Fitting non-polynomial functions](./4%20-%20Fitting%20non-polynomial%20functions.ipynb) notebook, the data was generated by simulating an SDE.
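As an illustration of producing input data yourself, the sketch below integrates a one-dimensional SDE, dx = f(x) dt + g(x) dW, with the Euler-Maruyama scheme and packs the result into the `[array]`, scalar-`t` format. The helper `simulate_sde()` and the double-well model (f(x) = x - x³, g(x) = 0.5) are hypothetical choices for this example, not necessarily how the linked notebook generated its data. The commented-out call reuses the `Characterize()` arguments shown later in this notebook.

```python
import numpy as np


def simulate_sde(drift, diffusion, x0, dt, n_steps, seed=0):
    """Hypothetical helper: Euler-Maruyama integration of dx = drift(x) dt + diffusion(x) dW."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = x0
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps - 1)  # Wiener increments with variance dt
    for i in range(n_steps - 1):
        x[i + 1] = x[i] + drift(x[i]) * dt + diffusion(x[i]) * dW[i]
    return x


# Hypothetical model: double-well drift with constant noise strength
dt = 0.01
x = simulate_sde(lambda v: v - v**3, lambda v: 0.5, x0=0.0, dt=dt, n_steps=50_000)

data, t = [x], dt  # scalar case: a list with one array, and a scalar timestep
# ddsde = pyddsde.Characterize(data, t, bins=20, show_summary=False)
```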

## Exporting data

`pyddsde` allows you to export data as a Pandas DataFrame, or save data into a CSV file.

```python
import pyddsde
import pandas as pd
```

```python
data, t = pyddsde.load_sample_dataset('model-data-vector-ternary')
ddsde = pyddsde.Characterize(data, t, bins=20, show_summary=False)
```

`ddsde.export_data()` returns a Pandas DataFrame containing the drift and diffusion coefficients. By default, the data is returned as binned averages, with the number of bins controlled by the `bins` parameter of the `pyddsde.Characterize()` call.
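To make "binned averages" concrete, the sketch below groups noisy pointwise estimates into a 20 × 20 grid over (x, y) with `pd.cut` and averages every column within each non-empty cell. The data is synthetic, and this is one plausible way to bin, shown for illustration only; it is not necessarily how `pyddsde` computes its bins internally.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for raw, pointwise estimates (not pyddsde output)
rng = np.random.default_rng(1)
n = 10_000
raw = pd.DataFrame({"x": rng.uniform(-1, 1, n), "y": rng.uniform(-1, 1, n)})
raw["drift_x"] = -raw["x"] + rng.normal(0.0, 0.5, n)  # noisy estimates of a drift -x

bins = 20
edges = np.linspace(-1, 1, bins + 1)
# Assign every point to a cell of a bins x bins grid over (x, y)
raw["x_bin"] = pd.cut(raw["x"], edges, include_lowest=True)
raw["y_bin"] = pd.cut(raw["y"], edges, include_lowest=True)

# Average within each cell; observed=True keeps only the non-empty cells
binned = (
    raw.groupby(["x_bin", "y_bin"], observed=True)[["x", "y", "drift_x"]]
    .mean()
    .reset_index(drop=True)
)
print(len(raw), len(binned))
```

Averaging smooths out most of the noise in the pointwise estimates, at the cost of far fewer rows.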

```python
df = ddsde.export_data()
```

`df` is a [Pandas](https://pandas.pydata.org) DataFrame and supports the usual Pandas operations, such as selection, filtering and aggregation; see the [Pandas documentation](https://pandas.pydata.org/docs/user_guide/index.html) for more details.

For example, `df.head(n)` shows the first `n` rows of the DataFrame (five by default). Individual columns can be accessed by name, e.g. `df.drift_x` or `df['drift_x']`.
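The access patterns above can be tried on a small stand-in DataFrame built from the first three rows (and a subset of the columns) of the binned output shown below; with the real export, the same calls apply to `df` directly.

```python
import pandas as pd

# Stand-in for df: first rows of the binned output (subset of columns)
df = pd.DataFrame(
    {
        "x": [-0.473684, -0.368421, -0.263158],
        "y": [-0.999993, -0.999993, -0.999993],
        "drift_x": [0.008879, 0.00254, -0.005515],
    },
    index=[5, 6, 7],
)

print(df.head(2))              # first two rows
col = df.drift_x               # attribute access: works when the name is a valid identifier
same = df["drift_x"]           # bracket access: works for any column name
negative = df[df.drift_x < 0]  # boolean filtering: rows with negative drift in x
print(negative)
```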

```python
df.head()
```

|   | x | y | drift_x | drift_y | diffusion_x | diffusion_y | diffusion_xy |
|--:|--:|--:|--:|--:|--:|--:|--:|
| 5 | -0.473684 | -0.999993 | 0.008879 | 0.069661 | 0.001583 | 0.001583 | 0.008879 |
| 6 | -0.368421 | -0.999993 | 0.00254 | 0.060316 | 0.002648 | 0.002648 | 0.00254 |
| 7 | -0.263158 | -0.999993 | -0.005515 | 0.055219 | 0.004341 | 0.004341 | -0.005515 |
| 8 | -0.157895 | -0.999993 | -0.006491 | 0.051776 | 0.004562 | 0.004562 | -0.006491 |
| 9 | -0.052632 | -0.999993 | -0.012554 | 0.047965 | 0.004476 | 0.004476 | -0.012554 |


```python
df.drift_x
```

```
5      0.008879
6      0.002540
7     -0.005515
8     -0.006491
9     -0.012554
         ...   
370    0.012908
371    0.004369
372   -0.000901
373   -0.010205
389    0.035175
Name: drift_x, Length: 314, dtype: float64
```

As mentioned above, the data is returned as binned averages by default. If raw, unbinned drift and diffusion estimates are required (i.e. drift and diffusion coefficients estimated at each time point), use the parameter `raw=True`. Note that the raw estimates are very noisy; however, they can be useful, for example, for further analysis with custom regression or curve fitting.
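As an example of such custom analysis, the sketch below fits a cubic polynomial to noisy pointwise drift estimates using ordinary least squares via `np.polyfit`. The data is synthetic, with a hypothetical true drift x - x³; with the real export, you would pass columns such as `df_raw.x` and `df_raw.drift_x` instead. Least-squares polynomial fitting is just one possible choice here.

```python
import numpy as np

# Synthetic stand-in for raw estimates: true drift x - x**3 plus heavy noise
rng = np.random.default_rng(2)
x = rng.uniform(-1.5, 1.5, 50_000)
drift_raw = x - x**3 + rng.normal(0.0, 1.0, x.size)

# Least-squares cubic fit; coefficients are returned highest power first
coeffs = np.polyfit(x, drift_raw, deg=3)
print(np.round(coeffs, 2))
```

Because each pointwise estimate is so noisy, a large number of points is what makes the fitted coefficients stable.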

```python
df_raw = ddsde.export_data(raw=True)
df_raw.head()
```

|   | x | y | drift_x | drift_y | diffusion_x | diffusion_y | diffusion_xy |
|--:|--:|--:|--:|--:|--:|--:|--:|
| 0 | 0.040464 | 0.185828 | -0.84968 | -0.197604 | 0.086635 | 0.004686 | 0.020148 |
| 1 | -0.061497 | 0.162115 | 0.467746 | 0.182182 | 0.026254 | 0.003983 | 0.010226 |
| 2 | -0.005368 | 0.183977 | 0.800044 | 0.231554 | 0.076809 | 0.006434 | 0.02223 |
| 3 | 0.090638 | 0.211764 | 1.142051 | -0.939814 | 0.156514 | 0.10599 | -0.128798 |
| 4 | 0.227684 | 0.098986 | -0.346819 | 0.594684 | 0.014434 | 0.042438 | -0.02475 |


```python
len(df), len(df_raw)
```

```
(314, 580866)
```

### Saving data

To save the data to a CSV file, call `ddsde.export_data()` and specify a filename using the `filename` parameter.

```python
ddsde.export_data(filename='example_export.csv')
```
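To load a saved file back with Pandas, note that `DataFrame.to_csv` writes the index as an unnamed first column by default. Assuming the exported CSV follows that convention (an assumption, not confirmed here), a plain `pd.read_csv` call returns that index as an extra `Unnamed: 0` column, and passing `index_col=0` restores it as the index. The sketch below demonstrates the round trip on a two-row stand-in DataFrame written to a temporary directory.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the exported data (two rows of the binned output above)
df = pd.DataFrame({"x": [-0.473684, -0.368421], "drift_x": [0.008879, 0.00254]}, index=[5, 6])

# Write to a temporary directory so the example leaves no files behind in the cwd
path = os.path.join(tempfile.mkdtemp(), "example_export.csv")
df.to_csv(path)  # the index is written as the first, unnamed column

print(pd.read_csv(path).columns.tolist())  # the index reappears as an 'Unnamed: 0' column
restored = pd.read_csv(path, index_col=0)  # treat the first column as the index instead
print(restored)
```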