# Setup and download data

This notebook sets up CAMPA and downloads an example dataset to the current folder so that you can follow along with the subsequent tutorials. It assumes that you will run all notebooks from the same folder. If this is not the case, adjust `CAMPA_DIR` accordingly.


In [None]:
from pathlib import Path

In [None]:
CAMPA_DIR = Path.cwd()
print(CAMPA_DIR)

Before configuring CAMPA, we need to ensure that all parameter files for configuring and running the different CAMPA steps are present in the `params` subfolder. In general, these files don't need to be in a folder named `params`, but the tutorials follow this convention.
Let us download the necessary parameter files from the [GitHub repository](https://github.com/theislab/campa/tree/main/notebooks/params).

In [None]:
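import requests

# fetch the page and fail early on HTTP errors
r = requests.get("https://github.com/kennethreitz/requests/blob/master/README.rst")
r.raise_for_status()
# `text` is a property of the response, not a method
r.text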
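
As a minimal sketch of how parameter files could be fetched into the `params` subfolder, the helper below builds download URLs and writes each file to disk. It assumes the files are served from the `raw.githubusercontent.com` mirror of the repository folder linked above. The names `RAW_BASE`, `raw_url`, `fetch_text` and `download_params` are hypothetical and not part of CAMPA, and the sketch uses the standard-library `urllib` rather than `requests` to stay dependency-free. Only `ExampleData_constants.py` is named in this tutorial; any other filenames have to be looked up in the repository.

```python
import urllib.request
from pathlib import Path

# Assumption: raw-file mirror of the repository folder linked above.
RAW_BASE = "https://raw.githubusercontent.com/theislab/campa/main/notebooks/params"


def raw_url(filename):
    """Build the (assumed) raw download URL for one parameter file."""
    return f"{RAW_BASE}/{filename}"


def fetch_text(url):
    """Download a text file; urlopen raises HTTPError on 4xx/5xx responses."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def download_params(filenames, dest, fetch=fetch_text):
    """Write each named parameter file into `dest` and return the written paths.

    `fetch` is injectable so the function can be exercised without network access.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for name in filenames:
        target = dest / name
        target.write_text(fetch(raw_url(name)), encoding="utf-8")
        written.append(target)
    return written
```

With network access, `download_params(["ExampleData_constants.py"], CAMPA_DIR / "params")` would place that file where the following cells expect it.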



CAMPA has one main config file: `campa.ini`.
The [overview](../overview.rst)
describes how you can create this config file from the command line, 
but here we will see how we can create a config from within the campa module using 
the config file representation [campa.constants.campa_config](../api/campa.constants.campa_config.rst).

In [1]:
from pathlib import Path

from campa.constants import campa_config

print(campa_config)

2022-10-28 14:08:27.966034: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-28 14:08:28.148129: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-10-28 14:08:28.157973: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-10-28 14:08:28.157988: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if yo

Reading config from /home/icb/hannah.spitzer/.config/campa/campa.ini
CAMPAConfig (fname: /home/icb/hannah.spitzer/.config/campa/campa.ini)
EXPERIMENT_DIR: 
BASE_DATA_DIR: 
CO_OCC_CHUNK_SIZE: 10000000.0
data_config/testdata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/notebooks/params/TestData_constants.py



If you have not yet set up a config, this should look pretty empty.
The lines `WARNING: EXPERIMENT_DIR is not initialised` and `WARNING: BASE_DATA_DIR is not initialised` are expected in this case
and alert us that we need to set `EXPERIMENT_DIR` and `BASE_DATA_DIR` so that CAMPA knows where experiments and data are stored.

Let us set `EXPERIMENT_DIR` and `BASE_DATA_DIR`, and add the `ExampleData` data config.
Here, we set the data and experiment paths relative to the folder that this notebook is stored in.
Note that we assume that this folder contains the `params/ExampleData_constants.py` file.

In [2]:
# point to example data folder in which we will download the example data
campa_config.BASE_DATA_DIR = CAMPA_DIR / "example_data"
# experiments will be stored in example_experiments
campa_config.EXPERIMENT_DIR = CAMPA_DIR / "example_experiments"
# add ExampleData data_config
campa_config.add_data_config("ExampleData", CAMPA_DIR / "params/ExampleData_constants.py")
# set CO_OCC_CHUNK_SIZE (a parameter making co-occurrence calculation more memory efficient)
campa_config.CO_OCC_CHUNK_SIZE = 1e7

print(campa_config)

CAMPAConfig (fname: /home/icb/hannah.spitzer/.config/campa/campa.ini)
EXPERIMENT_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/notebooks/example_experiments
BASE_DATA_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/notebooks/example_data
CO_OCC_CHUNK_SIZE: 10000000.0
data_config/testdata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/notebooks/params/TestData_constants.py
data_config/exampledata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/notebooks/params/ExampleData_constants.py
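
Because the data config only stores a path, a missing `params/ExampleData_constants.py` may go unnoticed until a later step. The sketch below shows one way to check for the file up front. `require_file` is a hypothetical helper written for this tutorial, not a CAMPA function.

```python
from pathlib import Path


def require_file(path):
    """Return the resolved path, or raise a clear error if the file is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Expected parameter file not found: {path}")
    return path.resolve()
```

It could wrap the path passed to the config, for example `campa_config.add_data_config("ExampleData", require_file(CAMPA_DIR / "params/ExampleData_constants.py"))`.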



We can now save the config to quickly load it later on. 
Here, we store the config in the `params` directory in the current folder.

In [3]:
# save config in non-standard location
campa_config.write("params/campa.ini")

Reading config from params/campa.ini



By default, CAMPA looks for config files in the current directory and `$HOME/.config/campa`, but loading a config from any other file is also easy:

In [4]:
# read config from non-standard location by setting campa_config.config_fname
campa_config.config_fname = "params/campa.ini"
print(campa_config)

Reading config from params/campa.ini
CAMPAConfig (fname: params/campa.ini)
EXPERIMENT_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/notebooks/example_experiments
BASE_DATA_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/notebooks/example_data
CO_OCC_CHUNK_SIZE: 10000000.0
data_config/testdata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/notebooks/params/TestData_constants.py
data_config/exampledata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/notebooks/params/ExampleData_constants.py
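
The default lookup described above can be pictured with the following sketch. It is an illustration only, not CAMPA's implementation: the function `find_config` is hypothetical, and it assumes that the file is named `campa.ini` in both locations and that the current directory takes precedence over `$HOME/.config/campa`.

```python
from pathlib import Path


def find_config(cwd=None, home=None, name="campa.ini"):
    """Return the first existing config file, or None if neither location has one.

    Assumed search order: current directory, then $HOME/.config/campa.
    """
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    candidates = [cwd / name, home / ".config" / "campa" / name]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_config()` without arguments checks the two default locations. A config stored elsewhere, such as `params/campa.ini`, is not found this way and has to be selected explicitly via `campa_config.config_fname`, as shown above.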



To follow along with the workflow tutorials, you need to download the example dataset.

Here, we store the example data in the `BASE_DATA_DIR` that we just set in the config.

In [5]:
from campa.data import load_example_data

example_data_path = load_example_data(Path(campa_config.BASE_DATA_DIR).parent)
print("Example data downloaded to: ", example_data_path)

Path or dataset does not yet exist. Attempting to download...
{'x-amz-id-2': 'ek/Wyckw4mANMAR5AM3yHizzb2CdVT3/Hyxg9WYldFcgUkUR53NyKPyVY0V/chT35pI9Lnn0ZPs=', 'x-amz-request-id': 'G0V754TZQ3CP5X6H', 'Date': 'Fri, 28 Oct 2022 12:29:35 GMT', 'x-amz-replication-status': 'COMPLETED', 'Last-Modified': 'Fri, 28 Oct 2022 11:44:27 GMT', 'ETag': '"6300ee9228b5e78480a3a5a540e85730"', 'x-amz-tagging-count': '1', 'x-amz-server-side-encryption': 'AES256', 'Content-Disposition': 'attachment; filename="example_data.zip"', 'x-amz-version-id': 'WbEd4ye51WteRY2_BZaTchKIFVKkAxuw', 'Accept-Ranges': 'bytes', 'Content-Type': 'application/zip', 'Server': 'AmazonS3', 'Content-Length': '126837954'}
attachment; filename="example_data.zip"
Guessed filename: example_data.zip
Downloading... 126837954


123866it [00:01, 69196.16it/s]


Example data downloaded to:  /home/icb/hannah.spitzer/projects/pelkmans/software_new/notebooks/example_data



The example data is now stored in your `campa_config.BASE_DATA_DIR` folder.
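
To confirm that the download and extraction worked, it can help to count the files and bytes under the data folder. The sketch below uses only `pathlib`; `summarize_dir` is a hypothetical helper, and the numbers it reports depend on the downloaded archive.

```python
from pathlib import Path


def summarize_dir(root):
    """Return (number of files, total size in bytes) for everything below `root`."""
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes
```

For example, `n_files, n_bytes = summarize_dir(example_data_path)` gives a quick check; a result of `(0, 0)` would indicate that nothing was extracted.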

The data is represented as an [MPPData][MPPData] object. For more information on this class and the data representation on disk see the [Data representation tutorial](mpp_data.ipynb).

[MPPData]: ../classes/campa.data.MPPData.rst
