<font color=gray>ADS Sample Notebook.

Copyright (c) 2020, 2021 Oracle, Inc.  All rights reserved.
Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl.
</font>

***
# <font color=red>Introduction to Projects</font>
<p style="margin-left:10%; margin-right:10%;">by the <font color=teal> Oracle Cloud Infrastructure Data Science Service Team </font></p>

***

## Overview:

This notebook reviews the concepts around projects and how to manipulate them using the `oci` and `ads` libraries.

---

## Prerequisites:
 - Experience with the topic: Novice
 - Professional experience: None
 
---

## Objectives:
This notebook covers the following topics:
- <a href="#introduction">Introduction to Projects</a>
- <a href="#manage_ads">Managing Projects with the Accelerated Data Science (ADS) SDK</a>
    - <a href="#create_project_ads">Creating a Project</a>
    - <a href="#list_project_ads">Listing Projects</a>
    - <a href="#update_project_ads">Updating a Project</a>
        - <a href="#update_project_ads_project">Updating Using the Project Class</a>
        - <a href="#update_project_ads_projectcatalog">Updating Using the ProjectCatalog Class</a>
    - <a href="#delete_project_ads">Deleting a Project</a>
- <a href="#manage_oci">Managing Projects with the Oracle Cloud Infrastructure Data Science SDK</a>
    - <a href="#create_project_oci">Creating a Project</a>
    - <a href="#list_project_oci">Listing Projects</a>
    - <a href="#update_project_oci">Updating a Project</a>
    - <a href="#delete_project_oci">Deleting a Project</a>
- <a href="#ref">References</a>
***

<a id="introduction"></a>
# 1. Introduction to Projects

Projects are collaborative workspaces for organizing and documenting Data Science assets such as notebook sessions and models. Fundamentally, a project is a container to store these Data Science assets and the notebook sessions that are used for interactive coding. Each notebook must belong to one and only one project, but a project can have many notebooks. With the use of Identity and Access Management (IAM), restrictions can be placed on who can view, modify, and delete notebooks. This allows for easy, safe, and secure sharing of Data Science work products. Projects can also be used as an effective method for tracking expenses with the use of *CostCenter* tags.

There are two libraries for interacting with the project API from Python: the `data_science` module of the `oci` library and the `ads.catalog.project` module of the Accelerated Data Science (ADS) library. The `data_science` module is more flexible but requires a deeper knowledge of the API system. The `ads.catalog.project` module has been developed for use by data scientists, as it simplifies many of the operations that they would want to perform. Both libraries can be used together.

<a id="manage_ads"></a>
# 2. Managing Projects with the Accelerated Data Science (ADS) SDK

The primary class that is used to work with a project is `ProjectCatalog`. The `ProjectCatalog` constructor takes an optional parameter `compartment_id` that defines the compartment where the project will be created. By default, it will use the compartment that the notebook belongs to.

Operations generally return a `Project` or `ProjectSummaryList` object. The `Project` object is a representation of the project in Python. Operations on this object are not stored until the `commit()` method is called. The `rollback()` method can be used to revert pending changes.
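
The commit and rollback behavior described above can be pictured with a small stand-alone sketch. It is only an illustration under stated assumptions: `DraftRecord` and the `stored` dictionary are hypothetical stand-ins for a `Project` object and the copy held by the service, and no ADS code is involved.

```python
import copy


class DraftRecord:
    """Hypothetical stand-in for a Project: edits stay local until commit()."""

    def __init__(self, stored):
        self._stored = stored                # plays the role of the service-side copy
        self._local = copy.deepcopy(stored)  # in-memory working copy

    def get(self, name):
        return self._local[name]

    def set(self, name, value):
        # Only the in-memory copy changes here.
        self._local[name] = value

    def commit(self):
        # Push pending changes to the stored copy.
        self._stored.update(copy.deepcopy(self._local))
        return self

    def rollback(self):
        # Discard pending changes by reloading from the stored copy.
        self._local = copy.deepcopy(self._stored)
        return self


stored = {"display_name": "test_project", "description": "This is a test project"}
draft = DraftRecord(stored)

draft.set("description", "a new description")
pending = stored["description"]           # stored copy is unchanged before commit

draft.rollback()
after_rollback = draft.get("description")  # local edit has been discarded

draft.set("description", "A new description")
draft.commit()
after_commit = stored["description"]      # stored copy now reflects the edit
```

The point of the pattern is that several attribute changes can be staged and inspected locally before a single write is made.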

<a id="project_attribute"></a>
The `Project` class has the following attributes. Some of these attributes cannot be committed.

* attribute_map: Map between `ads` and `oci` field names
* compartment_id: OCID of the compartment that the project belongs to
* created_by: OCID of the account that created the project
* defined_tags: Predefined tags
* description: Description of the project
* display_name: Display name of the project
* freeform_tags: User-defined tags for the project
* id: OCID of the project
* lifecycle_state: Lifecycle state of the project, such as 'ACTIVE'
* time_created: Timestamp of when the project was created
* user_email: Email address of the account that created the project
* user_name: User name of the account that created the project

<a id="create_project_ads"></a>
## Creating a Project

To create a project, an instance of `ProjectCatalog` is used with the `create_project()` method. For interoperability with the `oci` library, an `oci.data_science.models.CreateProjectDetails` object can be passed to the `create_project_details` parameter. Alternatively, the following optional parameters can be used:

* compartment_id: OCID of the compartment that the project belongs to. Defaults to notebook compartment.
* defined_tags: Oracle defined tags.
* description: Description of the project.
* display_name: Display name of the project.
* freeform_tags: User-defined tags for the project.

The `create_project()` method will return a `Project` object. In the following example, a project will be created using some of the optional parameters. See the section on using the `oci` libraries on how to create a `CreateProjectDetails` object if that is the preferred method.

The optional parameter `ds_client_auth` of the `ProjectCatalog` constructor can be used to specify the preferred authentication method for accessing the OCI Data Science client.

In [None]:
from ads.catalog.project import ProjectSummaryList, ProjectCatalog
from ads.common import auth as authutil
project = ProjectCatalog(ds_client_auth=authutil.api_keys()).create_project(
    display_name="test_project",
    description="This is a test project")
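
The optional parameters listed above can also be collected in a dictionary and unpacked into `create_project()`. The following is a minimal sketch, not part of the ADS library: `build_project_kwargs` is a hypothetical helper, the `CostCenter` value is made up, storing it as a free-form tag is an assumption, and the call to `create_project()` is left as a comment because it needs a live notebook session.

```python
def build_project_kwargs(display_name, description=None, cost_center=None):
    """Hypothetical helper: assemble optional arguments for create_project()."""
    kwargs = {"display_name": display_name}
    if description is not None:
        kwargs["description"] = description
    if cost_center is not None:
        # Free-form tags are plain key/value string pairs (assumed tag name).
        kwargs["freeform_tags"] = {"CostCenter": str(cost_center)}
    return kwargs


kwargs = build_project_kwargs(
    "test_project",
    description="This is a test project",
    cost_center=42,  # made-up value for illustration
)

# In a notebook session, assuming the parameters listed above:
# project = ProjectCatalog().create_project(**kwargs)
```

Omitting arguments that are `None` keeps the call limited to the fields that were actually chosen.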

Information about the `Project` object can be accessed through a couple of different approaches. The following section details these approaches.

Specific fields can be accessed as attributes.

In [None]:
print("Display name: {}".format(project.display_name))
print("OCID of the creating account: {}".format(project.created_by))
print("Timestamp of when project was created: {}".format(project.time_created))

The `show_in_notebook()` method creates a formatted table that displays information about the project.

In [None]:
project.show_in_notebook()

The `to_dataframe()` method will return details about the project in a Pandas dataframe. 

In [None]:
type(project.to_dataframe())

<a id="list_project_ads"></a>
## Listing Projects

The `list_projects()` method of the `ProjectCatalog` object returns a `ProjectSummaryList` object that contains details of the various projects in a compartment. Each element in the `ProjectSummaryList` is a `ProjectSummary` object. 

The `ProjectSummary` class has the same attributes as the <a href="#project_attribute">`Project` class</a> and can be used to access details of a project. It also has the `show_in_notebook()` method which prints a table with the project information. The `to_dataframe()` method can also be used to convert the information to a Pandas dataframe.

The following section details how to list projects in this notebook's compartment. To list projects in another compartment, pass the OCID of that compartment as the `compartment_id` parameter of the `ProjectCatalog` constructor.

In [None]:
project_list = ProjectCatalog().list_projects()
project_list

The `sort_by()` method of the `ProjectSummaryList` class allows the data to be sorted. It takes a list object that contains the column names to sort by. It has an optional parameter, `reverse`, which will sort the data in descending order if set to `True`. By default, the data will be sorted in ascending order.

The following example sorts the projects by `lifecycle_state` and then `time_created` in descending order.

In [None]:
project_list.sort_by(["lifecycle_state", "time_created"], reverse=True)
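
To make the ordering concrete, the sketch below reproduces a two-column descending sort on plain dictionaries with the standard library. The records are invented sample data, and the assumption that the descending order applies to both columns reflects how Python's `sorted()` treats a tuple key. It is not a documented guarantee of `sort_by()`.

```python
from datetime import datetime

# Invented sample records standing in for ProjectSummary objects.
records = [
    {"display_name": "a", "lifecycle_state": "ACTIVE", "time_created": datetime(2021, 1, 5)},
    {"display_name": "b", "lifecycle_state": "DELETED", "time_created": datetime(2021, 1, 1)},
    {"display_name": "c", "lifecycle_state": "ACTIVE", "time_created": datetime(2021, 2, 1)},
]

# Sort by lifecycle_state first, then by time_created, both descending.
ordered = sorted(
    records,
    key=lambda r: (r["lifecycle_state"], r["time_created"]),
    reverse=True,
)
names = [r["display_name"] for r in ordered]
```

With this data the deleted record comes first, followed by the active records from newest to oldest.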

The `ProjectSummaryList` object can be converted to a Pandas dataframe with the `to_dataframe()` method.

In [None]:
type(project_list.to_dataframe())

Elements of a `ProjectSummaryList` object can be accessed by index. The object returned is of type `ProjectSummary`. In the following example, the `show_in_notebook()` method will print a table with the first project's metadata.

In [None]:
project_list[0].show_in_notebook()

Because the `ProjectSummary` class has the same attributes as the <a href="#project_attribute">`Project` class</a>, individual fields can be read directly. The following code snippet prints the `description` of the first project.

In [None]:
project_list[0].description

`ProjectSummaryList` objects can be used in list comprehensions. In the following example, the creation time of each project (`time_created`) is collected into a list object. In addition, a filter is applied to select only those projects that were created at the same time as or before the project created in this notebook.

In [None]:
print("The current project was created on: {}".format(str(project.time_created)))
[str(x.time_created) for x in project_list if x.time_created <= project.time_created]

`ProjectSummary` objects can be converted to Pandas data frames using the `to_dataframe()` method.

In [None]:
project_list[0].to_dataframe()

The `filter()` method in the `ProjectSummaryList` class accepts a lambda function or a list comprehension which is used to filter the data. It will filter the `ProjectSummaryList` and return another `ProjectSummaryList`. In the case of a list comprehension, the list comprehension must return `ProjectSummary` objects.

The following example filters the project list for the project whose OCID matches the OCID of the project created in this notebook. The first example uses a lambda function and the second uses a list comprehension.

In [None]:
project_list.filter(lambda x: x.id == project.id)

In [None]:
project_list.filter([x for x in project_list if x.id == project.id])
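
The two calling styles can be understood through a small stand-alone sketch. It is not the ADS implementation: `SummaryList` is a hypothetical class, the dictionaries and OCID-like strings are invented placeholders, and the sketch only shows one plausible way a `filter()` method could accept either a function or an already filtered list.

```python
class SummaryList(list):
    """Hypothetical list type whose filter() accepts a predicate or an iterable."""

    def filter(self, selection):
        if callable(selection):
            # Predicate form: keep items for which the function returns True.
            return SummaryList(item for item in self if selection(item))
        # Iterable form: the caller has already chosen the items to keep.
        return SummaryList(selection)


projects = SummaryList([
    {"id": "ocid1.example.aaa", "display_name": "first"},
    {"id": "ocid1.example.bbb", "display_name": "second"},
])
target = "ocid1.example.bbb"  # placeholder identifier

by_lambda = projects.filter(lambda p: p["id"] == target)
by_comprehension = projects.filter([p for p in projects if p["id"] == target])
```

Both calls yield the same one-element list, and both return the list type itself so that further list operations remain available.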

<a id="update_project_ads"></a>
## Updating a Project

There are two methods for programmatically updating projects within the ADS framework. The `ProjectCatalog` class has an `update_project()` method that requires a project's OCID to make changes. Alternatively, a `Project` object can be used to update the project that it represents.

<a id="update_project_ads_project"></a>
### Updating Using the Project Class

`Project` objects are an in-memory representation of a project that is stored in a compartment. Changes made to the `Project` object are only updated in memory. To store these changes back to Oracle Cloud Infrastructure, the `commit()` method must be called. This is analogous to editing a document and then saving the document back to the storage device.

Changes to a `Project` object can be reverted to what is stored in Oracle Cloud Infrastructure by using the `rollback()` method. This is analogous to reloading a text document in an editor without saving the changes.

The following example updates the `project` object to have a new description. After doing that, it retrieves the stored project's description. Both objects refer to the same project, but the descriptions differ because the change has not been committed.

In [None]:
project.description = "a new description"
project_list = ProjectCatalog().list_projects()
project_stored = project_list.filter([x for x in project_list if x.id == project.id])[0]

print("Project object's description: {}".format(project.description))
print("Stored project's description: {}".format(project_stored.description))
print("Are the project OCIDs the same: {}".format(project.id == project_stored.id))

The `rollback()` method can be used to revert the *project* object back to what is stored in the compartment.

In [None]:
project.rollback()
print("Project object's description: {}".format(project.description))
print("Stored project's description: {}".format(project_stored.description))
print("Are the project OCIDs the same: {}".format(project.id == project_stored.id))

Attributes that are changed in a `Project` object are stored into the compartment with the use of the `commit()` method.

The following example updates the description of a project and commits that change.

In [None]:
project.description = "A new description"
project = project.commit()

<a id="update_project_ads_projectcatalog"></a>

### Updating Using the ProjectCatalog Class

The `ProjectCatalog` object has a method, `update_project()`, that allows a project to be updated in the compartment without having to obtain a `Project` object, making changes to the object, and committing the changes. The `update_project()` method requires that the `project_id` parameter be set to the OCID of the project that is to be updated.

For interoperability with the `oci` library, an `oci.data_science.models.UpdateProjectDetails` object can be passed to the `update_project_details` parameter. Alternatively, the following optional parameters can be used:

* compartment_id: OCID of the compartment that the project belongs to. Defaults to notebook compartment.
* defined_tags: Oracle defined tags.
* description: Description of the project.
* display_name: Display name of the project.
* freeform_tags: User-defined tags for the project.

The following example updates the project created in this notebook.

In [None]:
ProjectCatalog().update_project(project_id=project.id, description="Description updated with update_project")
project_list = ProjectCatalog().list_projects()
print("Updated description: {}".format(
    project_list.filter([x for x in project_list if x.id == project.id])[0].description))

<a id="delete_project_ads"></a>
## Deleting a Project

The `ProjectCatalog` class has a `delete_project()` method that deletes a project. The delete operation will fail unless all associated resources (such as notebook sessions or models) are in a DELETED state, so these resources must be deleted first. The `project` parameter requires a project's OCID or a `Project` object.

The `delete_project()` method will return a boolean value indicating if the project was deleted or not. Repeated calls or deleting a project that is already deleted will return a `True` value.

The following example deletes the project that was created in this notebook.

In [None]:
ProjectCatalog().delete_project(project=project)

<a id="manage_oci"></a>

# 3. Managing Projects with the Oracle Cloud Infrastructure Data Science SDK

Oracle Cloud Infrastructure Data Science enables you to authenticate using your notebook session's resource principal to access other Oracle Cloud Infrastructure resources. Resource principals provide a more secure and easy-to-use method of authenticating to Oracle Cloud Infrastructure resources.

The main class for interacting with the Data Science service is `DataScienceClient`. This class requires a dictionary object that has the keys needed to make API calls. However, with resource principals, you only need to pass an empty dictionary as the configuration and the resource principals signer as the `signer` parameter.

In [None]:
import oci
import os

from ads import set_auth
from oci.data_science import DataScienceClient
from oci.data_science.models import CreateProjectDetails
from oci.data_science.models import UpdateProjectDetails

# Use the notebook session's resource principal for ADS calls.
set_auth(auth='resource_principal')
# Signer that is passed to DataScienceClient in the cells below.
rps = oci.auth.signers.get_resource_principals_signer()

<a id="create_project_oci"></a>
## Creating a Project

To create a project, an instance of `DataScienceClient` is used with the `create_project()` method. The `CreateProjectDetails` is used to provide details about the project. The `ads` library optionally accepts this object, but it is required in the `oci` library. 

The `CreateProjectDetails` has the following parameters:

* compartment_id: (required) OCID of the compartment in which the project is created.
* defined_tags: Oracle defined tags.
* description: Description of the project.
* display_name: Display name of the project.
* freeform_tags: User-defined tags for the project.

The `create_project()` method returns a `Response` object. If the project was created successfully, the `Project` object is contained in the `data` attribute. In the following example, a `CreateProjectDetails` object is created. It sets the `compartment_id` to the compartment of this notebook session. The `create_project()` method is then used to create the project, and the `status` attribute of the `Response` object is checked. If the project was created, the `Project` object is assigned to the variable `project`.

In [None]:
project_details = CreateProjectDetails(
    compartment_id=os.environ["NB_SESSION_COMPARTMENT_OCID"],
    display_name="test_project",
    description="This is a test project")
response = DataScienceClient({}, signer=rps).create_project(create_project_details=project_details)
if response.status == 200:
    project = response.data
    print("Project OCID: {}".format(project.id))
else:
    print("The project was not created!")

<a id="list_project_oci"></a>
## Listing Projects

The `list_projects()` method of the `DataScienceClient` object returns a `Response` object. Its `data` attribute holds a Python list with details of the various projects in a compartment. Each element in the list is a `ProjectSummary` object.

The `ProjectSummary` class has the same attributes as the <a href="#project_attribute">`Project` class</a> and can be used to access details of a project.

The `list_projects()` method provides a flexible interface to sort, filter, and paginate the results. The following is a list of the parameters that can be used:

* compartment_id: (required) Filter results by the OCID of the compartment.
* id: Filter results by OCID. 
* display_name: Filter results by its user-friendly name.
* lifecycle_state: Filter results by the specified lifecycle state. Allowed values are: "ACTIVE", "DELETING", "DELETED"
* created_by: Filter results by the OCID of the user who created the resource.
* limit: For list pagination. The maximum number of results per page or items to return in a paginated "List" call. 1 is the minimum, 1000 is the maximum.
* page: For list pagination. The value of the opc-next-page response header from the previous "List" call.
* sort_order: Specifies the sort order to use, either ASC (ascending) or DESC (descending).
* sort_by: Specifies the field to sort by. Accepts only one field. By default, when you sort by timeCreated, results are shown in descending order. When you sort by displayName, results are shown in ascending order. Sort order for displayName field is case sensitive. Allowed values are: "timeCreated", "displayName".
* opc_request_id: Unique Oracle assigned identifier for the request.
* retry_strategy: A retry strategy to apply to this specific operation and call. It overrides any retry strategy set at the client-level. This should be one of the strategies available in the retry module. A convenience DEFAULT_RETRY_STRATEGY is also available. To have this operation explicitly not perform any retries, pass an instance of NoneRetryStrategy.

In the following example, a list of deleted projects is obtained. The list is sorted by the display name in descending order. Upon a successful API call, the projects' display names are printed in the order in which they are returned.

In [None]:
response = DataScienceClient({}, signer=rps).list_projects(
    compartment_id=os.environ["NB_SESSION_COMPARTMENT_OCID"],
    lifecycle_state="DELETED",
    sort_order="DESC",
    sort_by="displayName")
if response.status == 200:
    project_list = response.data
    for item in project_list:
        print(item.display_name)
else:
    print("Unable to list the projects!")
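
The `limit` and `page` parameters described above are typically combined in a loop that requests one page at a time and follows the next-page token until none is returned. The following is a minimal stand-alone sketch of that loop. `FakeResponse` and `fake_list_projects` are hypothetical stubs that stand in for the service so the example runs without credentials; the stub exposes the token as a `next_page` attribute, whereas the service returns it in the `opc-next-page` response header.

```python
from dataclasses import dataclass
from typing import List, Optional

ALL_NAMES = ["p1", "p2", "p3", "p4", "p5"]  # invented project names


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a list response."""
    data: List[str]
    next_page: Optional[str]  # token for the next page, or None on the last page


def fake_list_projects(limit, page=None):
    """Stub that pages through ALL_NAMES; `page` is the start offset as a string."""
    start = int(page) if page is not None else 0
    end = start + limit
    token = str(end) if end < len(ALL_NAMES) else None
    return FakeResponse(data=ALL_NAMES[start:end], next_page=token)


def list_all(list_call, limit):
    """Follow next-page tokens until no further page is reported."""
    results, page = [], None
    while True:
        response = list_call(limit=limit, page=page)
        results.extend(response.data)
        page = response.next_page
        if page is None:
            return results


all_names = list_all(fake_list_projects, limit=2)
```

The `oci` library also provides helpers for this in its `oci.pagination` module, such as `list_call_get_all_results`, which may be preferable to a hand-written loop.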

<a id="update_project_oci"></a>
## Updating a Project

The `update_project()` method requires a project's OCID and an `UpdateProjectDetails` object. The `UpdateProjectDetails` allows the display name, description, freeform tags, and defined tags to be updated. Only the fields that are going to be changed need to be specified. The `update_project()` method returns a `Response` object. If the update is successful, the `data` attribute contains a `Project` object with information about the updated project.

The following example updates the project that was created in this section by giving it a new description.

In [None]:
project_details = UpdateProjectDetails(description="a new description")
response = DataScienceClient({}, signer=rps).update_project(project_id=project.id, update_project_details=project_details)
if response.status == 200:
    print("The project was updated.")
else:
    print("The project was not updated!")
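
Because only the fields that change need to be specified, it can be convenient to compute them from the current and desired values. The sketch below is one way to do that with plain dictionaries. `changed_fields` is a hypothetical helper, the dictionaries are sample data, and the `UpdateProjectDetails` call is left as a comment because it belongs in a live notebook session.

```python
def changed_fields(current, desired):
    """Return only the entries of `desired` that differ from `current`."""
    return {key: value for key, value in desired.items() if current.get(key) != value}


current = {"display_name": "test_project", "description": "This is a test project"}
desired = {"display_name": "test_project", "description": "a new description"}

changes = changed_fields(current, desired)

# In a notebook session, the result could then be unpacked into the details object:
# project_details = UpdateProjectDetails(**changes)
```

Here only `description` differs, so it is the only field that would be sent with the update.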

<a id="delete_project_oci"></a>
## Deleting a Project

The `delete_project()` method requires a project's OCID to delete the project. It returns a `Response` object.

The `Response` object returns a 202 Accepted HTTP code if the API accepts the responsibility to delete the project. Repeated calls to delete the same project will generally result in an HTTP status code of 204 No Content. The value of the `data` attribute is always `None`.

The following example deletes the project that was created in this notebook.

In [None]:
response = DataScienceClient({}, signer=rps).delete_project(project_id=project.id)
if response.status == 202:
    print("The project is pending deletion.")
else:
    print("The project was not deleted!")
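
The two status codes described above can be told apart with a small helper, so that a repeated delete is not reported as a failure. This is a minimal sketch: `describe_delete_status` is a hypothetical function, and the wording of the messages is illustrative.

```python
def describe_delete_status(status):
    """Map the HTTP status of a delete call to a short message."""
    if status == 202:
        return "The project is pending deletion."
    if status == 204:
        return "The project was already deleted."
    return "The project was not deleted (HTTP {}).".format(status)


messages = [describe_delete_status(code) for code in (202, 204, 500)]
```

In a notebook session, the helper would be called as `describe_delete_status(response.status)`.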

<a id="ref"></a>
# 4. References

* [ADS SDK - Project](https://docs.cloud.oracle.com/en-us/iaas/tools/ads-sdk/latest/ads.catalog.html#module-ads.catalog.project)
* [OCI Data Science SDK](https://oracle-cloud-infrastructure-python-sdk.readthedocs.io/en/latest/api/data_science.html)