<font color=gray>ADS Sample Notebook.

Copyright (c) 2021 Oracle, Inc. All rights reserved. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl.
</font>

***
# <font color=red>Getting Started with Oracle Cloud Infrastructure Data Science</font>
<p style="margin-left:10%; margin-right:10%;">by the <font color=teal> Oracle Cloud Infrastructure Data Science Service Team </font></p>

***

## Service Overview

Welcome to Oracle Cloud Infrastructure Data Science Service.

The Oracle Cloud Infrastructure Data Science service is a fully managed platform for data science teams to build, train, and manage machine learning models using Oracle Cloud Infrastructure (OCI).

The Data Science service:

* Provides data scientists with a collaborative, project-driven workspace.
* Enables self-service access to infrastructure for data science workloads.
* Includes Python-centric tools, libraries, and packages developed by the open-source community and the [Oracle Accelerated Data Science Library](https://docs.cloud.oracle.com/en-us/iaas/tools/ads-sdk/latest/index.html), which supports the end-to-end lifecycle of predictive models:
    * Data acquisition, profiling, preparation, and visualization.
    * Feature engineering.
    * Model training.
    * Model evaluation, explanation, and interpretation.
    * Model storage through the [Model Catalog](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/manage-models.htm). 
    * Model deployment.
* Integrates with the rest of the OCI services, including [Oracle Functions](https://docs.cloud.oracle.com/en-us/iaas/Content/Functions/Concepts/functionsoverview.htm), [Data Flow](https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_data_flow.htm), [Autonomous Data Warehouse](https://docs.cloud.oracle.com/en-us/iaas/Content/Database/Concepts/adboverview.htm), [Streaming](https://docs.cloud.oracle.com/en-us/iaas/Content/Streaming/Concepts/streamingoverview.htm), [Vault](https://docs.cloud.oracle.com/en-us/iaas/Content/KeyManagement/Concepts/keyoverview.htm), [Logging](https://docs.cloud.oracle.com/en-us/iaas/Content/Logging/Concepts/loggingoverview.htm#loggingoverview), and [Object Storage](https://docs.cloud.oracle.com/en-us/iaas/Content/Object/Concepts/objectstorageoverview.htm).
* Helps data scientists concentrate on methodology and domain expertise to deliver more models to production.

For more details, see the [Data Science documentation](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm).

---

## Overview

This **[Neurophysiology](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/use-notebook-sessions.htm#conda_understand_environments)** conda environment provides tooling for analyzing, visualizing, and exploring neurophysiological data. Tools such as `MNE` and `Gumpy` let you create brain-computer interfaces (BCI) or visualize near-infrared spectroscopy (NIRS) data to assess tissue oxygenation. They also support the analysis of many neurophysiological signals, such as electrocorticograms (ECoG), electroencephalograms (EEG), stereoelectroencephalograms (sEEG), and magnetoencephalograms (MEG). This environment also includes the "lite" distribution of the Oracle Accelerated Data Science (`Oracle-ads`) library, which offers data access, profiling, and transformation features.


---

**Important:**

Placeholder text for required values is surrounded by angle brackets, which must be removed when you add the indicated content. For example, when adding a database name, `database_name = "<database_name>"` would become `database_name = "production"`.
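
As a small illustration of this convention, the sketch below checks whether a value still contains an unreplaced placeholder. The helper `is_unfilled` and the regular expression are hypothetical and are not part of any notebook or library.

```python
import re

# Matches a value that still contains an unreplaced "<...>" placeholder.
PLACEHOLDER = re.compile(r"<[^<>]+>")


def is_unfilled(value: str) -> bool:
    """Return True when the value still holds placeholder text."""
    return PLACEHOLDER.search(value) is not None


database_name = "<database_name>"  # as shipped in a notebook
assert is_unfilled(database_name)

database_name = "production"  # after editing: the angle brackets are removed
assert not is_unfilled(database_name)
```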

---

## Prerequisites:
- Experience with a specific topic: Novice
- Professional experience: None

---

## Objectives:

- <a href='#authentication'>Understanding Authentication to Oracle Cloud Infrastructure Resources from a Notebook Session</a>
    - <a href='#resource_principals'>Authentication with Resource Principals</a>
        - <a href='#resource_principals_ads'>Resource Principals Authentication using the ADS SDK</a>
        - <a href='#resource_principals_oci'>Resource Principals Authentication using the OCI SDK</a>
        - <a href='#resource_principals_cli'>Resource Principals Authentication using the OCI CLI</a> 
    - <a href='#api_keys'>Authentication with API Keys</a>
- <a href='#conda'>Neurophysiology Conda Environment</a>
    - <a href='#conda_overview'>Overview</a>
    - <a href='#conda_libraries'>Principal Conda Libraries</a>
    - <a href='#conda_configuration'>Configuration</a>
- <a href='#ref'>References</a> 

---

In [None]:
import logging
import warnings

from ads import set_auth
from ads import set_documentation_mode
from oci.auth.signers import get_resource_principals_signer
from oci.data_science import DataScienceClient
from os import popen

set_documentation_mode(False)
warnings.filterwarnings('ignore')
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

<a id='authentication'></a>
# Understanding Authentication to Oracle Cloud Infrastructure Resources from a Notebook Session

When working within a notebook session, you operate as the `datascience` user. This user does not have an OCI Identity and Access Management (IAM) identity, so it has no access to the OCI API. To access OCI resources from the notebook environment, including Data Science projects, models, and any other OCI service resources, you must configure either resource principals or API keys. For most applications, the resource principal is the recommended approach.
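
The choice between the two mechanisms can also be made at run time. The following is a minimal sketch, not part of ADS: `choose_auth_mode` is a hypothetical helper, and it assumes that a session with a resource principal exposes the `OCI_RESOURCE_PRINCIPAL_VERSION` environment variable, which the `oci` SDK reads when it builds a resource principal signer.

```python
import os
from typing import Mapping, Optional


def choose_auth_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'resource_principal' or 'api_key' for use with ads.set_auth().

    Assumption: a resource principal is available when the
    OCI_RESOURCE_PRINCIPAL_VERSION environment variable is set.
    """
    env = os.environ if env is None else env
    if env.get("OCI_RESOURCE_PRINCIPAL_VERSION"):
        return "resource_principal"
    return "api_key"


auth_mode = choose_auth_mode()
print(f"Selected authentication mode: {auth_mode}")
# In a notebook session you could then call: set_auth(auth=auth_mode)
```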

<a id='resource_principals'></a>
## Authentication with Resource Principals

Data Science enables easy and secure authentication using the notebook session's resource principal to access other OCI resources, including Data Science projects and models. These steps show you how to use your notebook session's resource principal.

In advance, a tenancy administrator must write policies that grant the resource principal permission to access other OCI resources. See [Manually Configuring Your Tenancy for Data Science](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/configure-tenancy.htm).

There are two ways to configure the notebook to use resource principals: the `ads` library and the `oci` library. Both provide the required authentication, but the `ads` library is specifically designed for easy operation within a Data Science notebook session.

If you don't want to take on these library dependencies, you can use the `oci` command from the command line.

For more details on using resource principals in the Data Science service, see the [ADS Configuration documentation](https://docs.cloud.oracle.com/en-us/iaas/tools/ads-sdk/latest/user_guide/configuration/configuration.html#) and [Authenticating to the Oracle Cloud Infrastructure APIs from a Notebook Session](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/use-notebook-sessions.htm#topic_kxj_znw_pkb).

<a id='resource_principals_ads'></a>
### Resource Principals Authentication using the ADS SDK

The `set_auth()` method sets the proper authentication mechanism for ADS. ADS uses the `oci` SDK to access resources like the model catalog or Object Storage.

Within a notebook session, configure the use of a resource principal for the ADS SDK by running this in a notebook cell:

In [None]:
set_auth(auth='resource_principal') 

<a id='resource_principals_oci'></a>
### Resource Principals Authentication using the OCI SDK

Within your notebook session, the `oci` library can use the resource principal. This cell demonstrates how to make a basic connection using the default settings:

In [None]:
resource_principal = get_resource_principals_signer() 
dsc = DataScienceClient(config={}, signer=resource_principal)
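
Once the client exists, it can be used like any other `oci` client. As a hedged sketch, the hypothetical helper `summarize_project` below calls `get_project()` and reads a few fields from the response. The second helper, `summarize_current_project`, assumes that the `PROJECT_OCID` environment variable holds the OCID of a project that the resource principal is allowed to read.

```python
import os


def summarize_project(client, project_id: str) -> dict:
    """Fetch a project and return a few of its fields as a plain dict.

    `client` is expected to behave like oci.data_science.DataScienceClient.
    """
    project = client.get_project(project_id=project_id).data
    return {
        "id": project.id,
        "display_name": project.display_name,
        "lifecycle_state": project.lifecycle_state,
    }


def summarize_current_project() -> dict:
    """Build a resource principal client and summarize $PROJECT_OCID."""
    # Imported here so that summarize_project() stays usable without oci.
    from oci.auth.signers import get_resource_principals_signer
    from oci.data_science import DataScienceClient

    client = DataScienceClient(config={}, signer=get_resource_principals_signer())
    return summarize_project(client, os.environ["PROJECT_OCID"])
```

In a notebook session, `summarize_current_project()` would return the project's OCID, display name, and lifecycle state, provided the tenancy policies allow the read.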

<a id='resource_principals_cli'></a>
### Resource Principals Authentication using the OCI CLI

Within a notebook session, the OCI CLI can authenticate with the resource principal when you pass the `--auth=resource_principal` option. For example:

In [None]:
cmd = "oci data-science project get --project-id=$PROJECT_OCID --auth=resource_principal 2>&1"
print(popen(cmd).read())

If the resource principal is correctly configured, messages similar to the following are displayed:

```
{
"data": {
"compartment-id": "ocid1.compartment.oc1..aaaaaaaafl3avkal72rrwuy4m5rumpwh7r4axejjwq5hvwjy4h4uoyi7kzyq",
"created-by": "ocid1.user.oc1..aaaaaaaabfrlcbiyvjmjvgh3ns6trdyoewxytqywwta3yqmy3ah3fa3uw76q",
"defined-tags": {},
"description": "my favorite demo project\n",
"display-name": "demo-project",
"freeform-tags": {},
"id": "ocid1.datascienceproject.oc1.iad.aaaaaaaappvg4tp5kmbkurcyeghxaqmaknw3s5yh2oxcvfrvjeaadinsng6q",
"lifecycle-state": "ACTIVE",
"time-created": "2019-11-14T22:29:06.870000+00:00"
},
"etag": "b4d66fb733748f3454206d5de6b9acb3634edc804b2ad1997bd69dc676035a89"
}
```
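
Because the CLI prints JSON, the output can be parsed rather than read by eye. The sketch below uses a hypothetical helper, `project_state`, and a shortened sample response with a placeholder etag. In a notebook session the text would instead come from the `popen(cmd).read()` call. It assumes that the command succeeded, because an error message captured through `2>&1` is not valid JSON.

```python
import json
from typing import Tuple


def project_state(cli_output: str) -> Tuple[str, str]:
    """Return (display-name, lifecycle-state) from `project get` output."""
    data = json.loads(cli_output)["data"]
    return data["display-name"], data["lifecycle-state"]


# Shortened sample; the real response contains more fields.
sample_output = """
{
  "data": {
    "display-name": "demo-project",
    "lifecycle-state": "ACTIVE"
  },
  "etag": "<etag>"
}
"""

name, state = project_state(sample_output)
print(f"{name} is {state}")
```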

<a id='api_keys'></a>
## Authentication with API Keys

If resource principals are not explicitly used, API keys are used by default. For some use cases, you may want to set up API keys. See the instructions in the `api_keys.ipynb` example notebook.


<a id='conda'></a>
# Neurophysiology Conda Environment

<a id='conda_overview'></a>
## Overview

This **[Neurophysiology](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/use-notebook-sessions.htm#conda_understand_environments)** conda environment contains packages for analyzing, visualizing, and exploring neurophysiological data. It includes `MNE` and `Gumpy`, which you can use to create brain-computer interfaces (BCI) or visualize near-infrared spectroscopy (NIRS) data to assess tissue oxygenation. The tools also support the analysis of many neurophysiological signals, such as electrocorticograms (ECoG), electroencephalograms (EEG), stereoelectroencephalograms (sEEG), and magnetoencephalograms (MEG). This environment also includes the lite distribution of the `ADS` library, which offers data access, profiling, and transformation features.

You can access notebook examples for this conda environment in JupyterLab from the **Launcher** tab by clicking **Notebook Examples**. Then you can select one of the notebook examples that are available for all of the conda environments installed in your notebook session.

<a id='conda_libraries'></a>
## Principal Conda Libraries

1. `Oracle-ads`:
    Oracle-ads offers a friendly user interface, with objects and methods that cover all the steps involved in the lifecycle of machine learning models, from data acquisition to model evaluation and interpretation.
    
2. `MNE`:
    MNE provides tooling for analyzing, visualizing, and exploring neurophysiological data. For example, you can use it to visualize near-infrared spectroscopy (NIRS) data to assess tissue oxygenation. It also supports the analysis of many neurophysiological signals, such as electrocorticograms (ECoG), electroencephalograms (EEG), stereoelectroencephalograms (sEEG), and magnetoencephalograms (MEG).

    A great place to get started is to look at the [examples gallery](https://mne.tools/stable/auto_examples/index.html) to see what MNE has to offer. The MNE [tutorials](https://mne.tools/stable/auto_tutorials/index.html) cover many of the common tasks that this conda environment was designed for.
    
3. `Gumpy`:
    Gumpy implements tooling used in the collection and analysis of electroencephalogram (EEG) and electromyogram (EMG) data. It empowers researchers to quickly perform data analysis and implement novel classifiers. The combination of EEG and EMG makes it especially powerful for the development of hybrid brain-computer interfaces (BCI). It is also designed for extensibility so that you can customize it for your specific needs.

    Gumpy comes with the following modules:

    * Dataset: Read Graz 2b, the recorded EEG and EMG dataset, and create new datasets by subclassing gumpy.dataset.Dataset.
    * Signal processing: Process EEG and EMG signals by filtering, normalizing, and much more.
    * Plotting: Create data visualizations that are specific to EEG and EMG data, such as discrete wavelet approximations, PCA and more.
    * Feature extraction: Extract features from EMG and EEG signals.
    * Classification: Apply machine learning classifiers such as SVM, LDA, KNN, LDA with shrinkage, MLP, random forest, logistic regression, and quadratic LDA.
   
    See the Gumpy [examples](http://www.gumpy.org/#org75f77e2).
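
As a first step with `MNE`, the following is a minimal sketch that builds an in-memory recording from random noise and band-pass filters it. The helper names, channel names, sampling rate, and filter band are assumptions chosen for illustration; real work would load a recording with one of the `mne.io` readers instead.

```python
def make_synthetic_raw(n_channels: int = 4, sfreq: float = 250.0, seconds: float = 10.0):
    """Create an in-memory MNE Raw object filled with random EEG-like noise."""
    import numpy as np
    import mne

    rng = np.random.default_rng(seed=0)
    n_samples = int(sfreq * seconds)
    # MNE stores EEG data in volts; scale the noise to about 10 microvolts.
    data = rng.standard_normal((n_channels, n_samples)) * 1e-5
    ch_names = [f"EEG{i:03d}" for i in range(1, n_channels + 1)]
    info = mne.create_info(ch_names=ch_names, sfreq=sfreq, ch_types="eeg")
    return mne.io.RawArray(data, info, verbose=False)


def bandpass(raw, l_freq: float = 1.0, h_freq: float = 40.0):
    """Return a band-pass filtered copy, leaving the input untouched."""
    return raw.copy().filter(l_freq=l_freq, h_freq=h_freq, verbose=False)
```

In a notebook cell, `filtered = bandpass(make_synthetic_raw())` produces an object whose `get_data()` method returns an array with one row per channel and one column per sample.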

<a id='conda_configuration'></a>
## Configuration

No additional configuration is needed to use this conda environment.
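
To confirm that the active kernel really is this conda environment, you can check that the principal libraries are importable. This is a sketch using only the standard library; `missing_packages` is a hypothetical helper, and `ads`, `mne`, and `gumpy` are the import names assumed for the three libraries.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Import names of the principal libraries in this conda environment.
missing = missing_packages(["ads", "mne", "gumpy"])
print("All principal libraries found." if not missing else f"Missing: {missing}")
```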

<a id='ref'></a>
# References

* [Understanding and Using Conda Environments](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/use-notebook-sessions.htm#conda_understand_environments)
* [ADS Configuration documentation](https://docs.cloud.oracle.com/en-us/iaas/tools/ads-sdk/latest/user_guide/configuration/configuration.html#)
* [Authenticating to the Oracle Cloud Infrastructure APIs from a Notebook Session](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/use-notebook-sessions.htm#topic_kxj_znw_pkb)
* [Manually Configuring Your Tenancy for Data Science](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/configure-tenancy.htm)
* [Data Science documentation](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm)
* [Data Science & AI Blog](https://blogs.oracle.com/datascience/)