# Setup and download data

This tutorial shows how to set up CAMPA and download an example dataset.
To follow along with this and the following tutorials, please execute the following steps first:

- install CAMPA (``pip install campa``) 
- download the [tutorials](https://github.com/theislab/campa/tree/main/notebooks) to a new folder, referred to as ``CAMPA_DIR`` in the following
- navigate to ``CAMPA_DIR`` in the terminal and start this notebook with `jupyter notebook setup.ipynb`

Note that the following notebooks assume that you will run them from the same folder that you run this notebook in (``CAMPA_DIR``). If this is not the case, adjust ``CAMPA_DIR`` at the top of each notebook to point to the folder that you run this notebook in. 

In [1]:
from pathlib import Path

# set CAMPA_DIR to the current working directory
CAMPA_DIR = Path.cwd()
print(CAMPA_DIR)

/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test
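If you run the notebooks from a different folder, ``CAMPA_DIR`` has to be set explicitly. A minimal sketch of how this could be done is shown below; `resolve_campa_dir` is a hypothetical helper for illustration and not part of CAMPA.

```python
from pathlib import Path


def resolve_campa_dir(path=None):
    """Return CAMPA_DIR as an absolute path (hypothetical helper, not part of CAMPA)."""
    # default: the folder the notebook was started from
    campa_dir = Path.cwd() if path is None else Path(path).expanduser().resolve()
    if not campa_dir.is_dir():
        raise NotADirectoryError(f"CAMPA_DIR does not exist: {campa_dir}")
    return campa_dir


# without an argument this is equivalent to CAMPA_DIR = Path.cwd()
CAMPA_DIR = resolve_campa_dir()
print(CAMPA_DIR)
```

Passing a path, for example `resolve_campa_dir("~/my_campa_folder")` (an example path, not a CAMPA default), fails early if the folder does not exist instead of creating files in an unexpected location later on.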


## Download parameter files

Before configuring CAMPA, we need to ensure that all parameter files for configuring and running the different CAMPA steps are present in the `params` subfolder. Note that in general, these files don't need to be in a folder named `params`, but the following tutorials will follow this convention. 
Let us download the necessary parameter files from the [github repository](https://github.com/theislab/campa/tree/main/notebooks/params).

In [2]:
import glob

import requests

# ensure params folder exists
(CAMPA_DIR / "params").mkdir(parents=True, exist_ok=True)

# download parameter files from git
for param_file in [
    "ExampleData_constants",
    "example_data_params",
    "example_experiment_params",
    "example_feature_params",
]:
    r = requests.get(f"https://raw.github.com/theislab/campa/main/notebooks/params/{param_file}.py")
    # fail on HTTP errors instead of writing an error page to the parameter file
    r.raise_for_status()
    with open(CAMPA_DIR / "params" / f"{param_file}.py", "w") as f:
        f.write(r.text)

print(f'Files in {CAMPA_DIR / "params"}: {glob.glob(str(CAMPA_DIR / "params" / "*"))}')

Files in /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params: ['/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/example_experiment_params.py', '/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/example_data_params.py', '/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/ExampleData_constants.py', '/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/example_feature_params.py']
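To confirm that the download worked, it can help to check that every expected parameter file exists and is not empty. The following sketch does this with the standard library only; `missing_param_files` is a hypothetical helper, and it assumes that the files live in the `params` subfolder of the current working directory as in this tutorial.

```python
from pathlib import Path

# file names taken from the download cell above
EXPECTED_PARAM_FILES = [
    "ExampleData_constants",
    "example_data_params",
    "example_experiment_params",
    "example_feature_params",
]


def missing_param_files(params_dir, names=EXPECTED_PARAM_FILES):
    """Return the names whose .py file is absent or empty (hypothetical helper)."""
    params_dir = Path(params_dir)
    missing = []
    for name in names:
        path = params_dir / f"{name}.py"
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


# an empty list means that all parameter files are in place
print(missing_param_files(Path.cwd() / "params"))
```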


## Set up CAMPA config

CAMPA has one main config file: `campa.ini`. 
The [overview](../overview.rst)
describes how you can create this config file from the command line, 
but here we will see how we can create a config from within the campa module using 
the config file representation [campa.constants.campa_config](../api/campa.constants.campa_config.rst).

In [3]:
from campa.constants import campa_config

print(campa_config)

2022-11-25 09:57:06.641175: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-25 09:57:24.354282: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-25 09:57:27.507035: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


CAMPAConfig (fname: None)
EXPERIMENT_DIR: None
BASE_DATA_DIR: None
CO_OCC_CHUNK_SIZE: None



If you have not yet set up a config, this should look pretty empty. 
The lines `WARNING: EXPERIMENT_DIR is not initialised` and `WARNING: BASE_DATA_DIR is not initialised` are expected in this case 
and alert us that we need to set `EXPERIMENT_DIR` and `BASE_DATA_DIR` so that CAMPA knows where experiments and data are stored.

Let us set the ``EXPERIMENT_DIR`` and the ``BASE_DATA_DIR``, and add the `ExampleData` data config.
Here, we set the data and experiments paths relative to ``CAMPA_DIR`` defined above.

In [4]:
# point to example data folder in which we will download the example data
campa_config.BASE_DATA_DIR = CAMPA_DIR / "example_data"
# experiments will be stored in example_experiments
campa_config.EXPERIMENT_DIR = CAMPA_DIR / "example_experiments"
# add ExampleData data_config (pointing to ExampleData_constants file that we just downloaded)
campa_config.add_data_config("ExampleData", CAMPA_DIR / "params/ExampleData_constants.py")
# set CO_OCC_CHUNK_SIZE (a parameter making co-occurrence calculation more memory efficient)
campa_config.CO_OCC_CHUNK_SIZE = 1e7

print(campa_config)

CAMPAConfig (fname: None)
EXPERIMENT_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments
BASE_DATA_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_data
CO_OCC_CHUNK_SIZE: 10000000.0
data_config/exampledata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/ExampleData_constants.py
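Instead of reading the printed config, the required fields can also be checked programmatically. The sketch below assumes that unset fields read as `None` when accessed as attributes, as the printed representation suggests; this has not been verified against the CAMPA API. `unset_fields` is a hypothetical helper, and a stand-in object is used so that the example runs without CAMPA installed.

```python
from types import SimpleNamespace

# the two fields that the text above says must be set
REQUIRED_FIELDS = ("EXPERIMENT_DIR", "BASE_DATA_DIR")


def unset_fields(config, fields=REQUIRED_FIELDS):
    """Return the names in `fields` that are missing or None on `config` (hypothetical helper)."""
    return [name for name in fields if getattr(config, name, None) is None]


# stand-in object for illustration; in the notebook you would pass campa_config
empty = SimpleNamespace(EXPERIMENT_DIR=None, BASE_DATA_DIR=None)
print(unset_fields(empty))
```

In the notebook, `unset_fields(campa_config)` should return an empty list once both directories have been set.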



We can now save the config to quickly load it later on. 
Here, we store the config in the `params` directory in the current folder.

In [5]:
# save config
campa_config.write(CAMPA_DIR / "params" / "campa.ini")

Reading config from /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini
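To see what was written, the saved file can be inspected directly. The sketch below assumes that `campa.ini` is a standard INI file that Python's `configparser` can read; the section and key names are not assumed and are simply printed as found. `read_ini` is a hypothetical helper, not part of CAMPA.

```python
import configparser
from pathlib import Path


def read_ini(path):
    """Return {section: {key: value}} for an INI file (hypothetical helper)."""
    # interpolation=None keeps values containing '%' unchanged
    parser = configparser.ConfigParser(interpolation=None)
    with open(path) as f:  # open explicitly so that a missing file raises an error
        parser.read_file(f)
    # note: configparser lowercases keys and merges DEFAULT entries into each section
    sections = {name: dict(parser.items(name)) for name in parser.sections()}
    if parser.defaults():
        sections["DEFAULT"] = dict(parser.defaults())
    return sections


ini_path = Path.cwd() / "params" / "campa.ini"
if ini_path.is_file():
    for section, values in read_ini(ini_path).items():
        print(section, values)
```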



By default, CAMPA looks for config files in the current directory and ``$HOME/.config/campa``, but loading a config from any other file is also easy:

In [6]:
# read config from non-standard location by setting campa_config.config_fname
campa_config.config_fname = CAMPA_DIR / "params" / "campa.ini"
print(campa_config)

Reading config from /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini
CAMPAConfig (fname: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini)
EXPERIMENT_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments
BASE_DATA_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_data
CO_OCC_CHUNK_SIZE: 10000000.0
data_config/exampledata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/ExampleData_constants.py
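The default lookup described above can be mimicked to find out which config file, if any, is present in the default locations. This is only a sketch: it assumes that the file is named `campa.ini` in both locations and that the current directory is checked first, neither of which is confirmed here. Both functions are hypothetical helpers.

```python
from pathlib import Path


def default_config_candidates(fname="campa.ini"):
    """Candidate config paths, in the order the text lists them (the order is an assumption)."""
    return [Path.cwd() / fname, Path.home() / ".config" / "campa" / fname]


def first_existing(paths):
    """Return the first path that is an existing file, or None."""
    for path in paths:
        if Path(path).is_file():
            return Path(path)
    return None


# prints None if no config exists in either default location
print(first_existing(default_config_candidates()))
```

If this prints `None`, setting `campa_config.config_fname` as shown above is the way to load the config saved in `params`.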



## Download example dataset

To follow along with the workflow tutorials, you need to download the example dataset.

Here, we store the example data in the `BASE_DATA_DIR` that we just set in the config.

In [7]:
from campa.data import load_example_data

# pass the parent folder: the data ends up in its `example_data` subfolder, i.e. in BASE_DATA_DIR
example_data_path = load_example_data(Path(campa_config.BASE_DATA_DIR).parent)
print("Example data downloaded to: ", example_data_path)

Path or dataset does not yet exist. Attempting to download...
{'x-amz-id-2': 'HSPvG563oJllNzdrsV13AQCjHZ7P9FyV0mTfxhkmBn5sm1orzTIridTerZSrwwqhhJja8adJlLA=', 'x-amz-request-id': 'D1AWZ3CZHG6SQ9A8', 'Date': 'Fri, 25 Nov 2022 09:07:47 GMT', 'x-amz-replication-status': 'COMPLETED', 'Last-Modified': 'Fri, 28 Oct 2022 11:44:27 GMT', 'ETag': '"6300ee9228b5e78480a3a5a540e85730"', 'x-amz-tagging-count': '1', 'x-amz-server-side-encryption': 'AES256', 'Content-Disposition': 'attachment; filename="example_data.zip"', 'x-amz-version-id': 'WbEd4ye51WteRY2_BZaTchKIFVKkAxuw', 'Accept-Ranges': 'bytes', 'Content-Type': 'application/zip', 'Server': 'AmazonS3', 'Content-Length': '126837954'}
attachment; filename="example_data.zip"
Guessed filename: example_data.zip
Downloading... 126837954


123866it [00:04, 28644.04it/s]


Example data downloaded to:  /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_data



The example data is now stored in your `campa_config.BASE_DATA_DIR` folder.
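A quick way to confirm that the download and extraction succeeded is to count the files and sum their sizes. The sketch below uses only the standard library; `summarize_dir` is a hypothetical helper, and the `example_data` path assumes the folder layout used in this tutorial.

```python
from pathlib import Path


def summarize_dir(root):
    """Return (number of files, total size in bytes) below `root` (hypothetical helper)."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


data_dir = Path.cwd() / "example_data"
if data_dir.is_dir():
    n_files, n_bytes = summarize_dir(data_dir)
    print(f"{n_files} files, {n_bytes / 1e6:.1f} MB in {data_dir}")
```

An empty or missing folder indicates that the download did not complete and should be repeated.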

The data is represented as an [MPPData][MPPData] object. For more information on this class and the data representation on disk, see the [Data representation tutorial](mpp_data.ipynb).

[MPPData]: ../classes/campa.data.MPPData.rst
