<a href="https://colab.research.google.com/github/wandb/examples/blob/master/colabs/pytorch/Artifacts_Basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
<!--- @wandbcode{artifacts-basics} -->

<img src="http://wandb.me/logo-im-png" width="400" alt="Weights & Biases" />

<!--- @wandbcode{pytorch-video} -->



Use [Weights & Biases](https://wandb.com) for machine learning experiment tracking, dataset and model versioning, collaboration, and more.


<img src="https://wandb.me/mini-diagram" width="650" alt="Weights & Biases" />





Use W&B Artifacts to track and version data as the inputs and outputs of your W&B Runs. For example, a model training run might take in a dataset as input and produce a trained model as output. In addition to logging hyperparameters, metadata, and metrics to a run, you can use an artifact to log the dataset used to train the model as input and the resulting model checkpoints as outputs.

# Set Up

To log to W&B, install the `wandb` package and import it into your script or notebook. If you are not already signed up or authenticated, `wandb.login()` displays a link you can use to do so.

In [None]:
!pip install wandb
import wandb
wandb.login()

# Create an Artifact

The general workflow for creating an Artifact is:


1.   Initialize a run.
2.   Create an Artifact.
3.   Add a dataset, model, another Artifact, or any files or directories that you want to track and version to the new Artifact.
4.   Log the Artifact to W&B.

This can be accomplished with a few lines of code:

In [None]:
run = wandb.init(project="artifact-basics")
run.log_artifact(artifact_or_path="/content/sample_data/mnist_test.csv", name="my_first_artifact", type="dataset")
run.finish()

First, initialize the run with [`wandb.init()`](https://docs.wandb.ai/ref/python/init). This demo adds the run to the `artifact-basics` project, but you can change the name to anything you'd like.

Next, log the Artifact with [`run.log_artifact()`](https://docs.wandb.ai/ref/python/public-api/run#log_artifact). Because it accepts a path to a file, this single call creates the Artifact, adds `mnist_test.csv` to it, and logs it, covering steps 2 to 4 of the workflow above. Here the Artifact's `type` is `dataset`. You can customize your Artifact with a `name` and other metadata; see the Artifacts Reference guide for more information.


If you change the project name, the Artifact name, or any other argument value, make the same change in the code samples that follow.

# Use an Artifact

To use a specific version of an Artifact in a downstream task, reference it by its version index (`v0`, `v1`, `v2`, and so on) or by an alias you have added. The `latest` alias always refers to the most recently logged version of the Artifact.

The following code retrieves the Artifact `my_first_artifact` by its `latest` alias:


In [None]:
run = wandb.init(project="artifact-basics")
artifact = run.use_artifact(artifact_or_name="my_first_artifact:latest")
run.finish()
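A reference passed to `run.use_artifact()` pairs the Artifact name with a version index or an alias, separated by a colon, for example `my_first_artifact:v0` or `my_first_artifact:latest`. The hypothetical helper `artifact_ref` below builds such strings and rejects ambiguous input; the commented lines sketch how it could be used in a run in this notebook's `artifact-basics` project.

```python
from typing import Optional


def artifact_ref(name: str, version: Optional[int] = None, alias: Optional[str] = None) -> str:
    """Hypothetical helper: build a reference such as "name:v0" or "name:latest"."""
    if version is not None and alias is not None:
        raise ValueError("pass a version index or an alias, not both")
    if version is not None:
        if version < 0:
            raise ValueError("version indices start at 0 (v0)")
        return f"{name}:v{version}"
    # fall back to the `latest` alias when neither a version nor an alias is given
    return f"{name}:{alias if alias is not None else 'latest'}"


print(artifact_ref("my_first_artifact", version=0))       # my_first_artifact:v0
print(artifact_ref("my_first_artifact", alias="sorted"))  # my_first_artifact:sorted
print(artifact_ref("my_first_artifact"))                  # my_first_artifact:latest

# Sketch of use inside a run (needs `import wandb` and a logged Artifact):
# run = wandb.init(project="artifact-basics")
# pinned = run.use_artifact(artifact_ref("my_first_artifact", version=0))
# run.finish()
```

Pinning a version index such as `v0` keeps a downstream task reproducible, while an alias such as `latest` follows whichever version it currently points to.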

# Create a new Artifact version

When you need to add a new file to an Artifact, use the [`artifact.add_file`](https://docs.wandb.ai/ref/python/artifact#add_file) method and log the result as a new version.

In [None]:
run = wandb.init(project="artifact-basics")
# logged Artifact versions are immutable, so start a new draft from the latest version
artifact = run.use_artifact("my_first_artifact:latest").new_draft()
artifact.add_file(local_path="/content/sample_data/california_housing_test.csv", name="new_file")
run.log_artifact(artifact)  # logs the draft as a new version of my_first_artifact
run.finish()

This logs a new version of the `my_first_artifact` Artifact that includes the California housing .csv file, stored in the Artifact as `new_file`.

If you edit a file, log it again in the same way to record the change:

In [None]:
# sorts the .csv file by its seventh column and overwrites it with the sorted data
import pandas as pd
csv_path = "/content/sample_data/california_housing_test.csv"
csv_data = pd.read_csv(csv_path)
csv_data.sort_values(csv_data.columns[6], axis=0, inplace=True)
csv_data.to_csv(csv_path, index=False)  # index=False avoids writing an extra index column
# adds the edited file to a new draft of the Artifact (logged versions are immutable)
run = wandb.init(project="artifact-basics")
artifact = run.use_artifact("my_first_artifact:latest").new_draft()
artifact.add_file(local_path=csv_path, name="sorted_file")
run.log_artifact(artifact, aliases=["sorted"])  # logs a new version; older versions are kept
run.finish()


The sorted file is now part of a new version of `my_first_artifact`. Logging changes never overwrites an existing version: W&B records them as a new version and keeps the older versions available.

This version also has the custom alias `sorted`. An alias is a label that points to a specific Artifact version. Independently of any custom aliases, W&B identifies each version by an automatically incremented index, `vN`, starting at `v0` for the first version, and moves the `latest` alias to the most recently logged version.
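To make these rules concrete, here is a toy, in-memory model of an Artifact's version history. It is a hypothetical sketch: the class `ToyArtifactCollection` is not part of `wandb` and says nothing about how W&B actually stores data. It follows the behavior described above (logging appends a new version `vN`, and `latest` moves to the newest version), and, as an assumption of this model, each alias points to a single version at a time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class ToyArtifactCollection:
    """Hypothetical, in-memory model of Artifact versioning (not part of wandb)."""

    name: str
    versions: List[Dict[str, bytes]] = field(default_factory=list)  # snapshots, never edited
    aliases: Dict[str, int] = field(default_factory=dict)           # alias -> version index

    def log(self, files: Dict[str, bytes], new_aliases: Sequence[str] = ()) -> str:
        """Append a new version instead of overwriting, then point aliases at it."""
        self.versions.append(dict(files))  # copy, so later edits to `files` can't leak in
        index = len(self.versions) - 1     # v0 for the first version, then v1, v2, ...
        for alias in ["latest", *new_aliases]:
            self.aliases[alias] = index    # reassigning an alias detaches it from older versions
        return f"v{index}"

    def get(self, ref: str) -> Dict[str, bytes]:
        """Resolve a version index such as "v0" or an alias such as "latest"."""
        if ref.startswith("v") and ref[1:].isdigit():
            return self.versions[int(ref[1:])]
        return self.versions[self.aliases[ref]]


history = ToyArtifactCollection("my_first_artifact")
print(history.log({"mnist_test.csv": b"digits"}))                         # v0
print(history.log({"mnist_test.csv": b"digits", "new_file": b"houses"}))  # v1
print(history.log({"mnist_test.csv": b"digits", "new_file": b"houses",
                   "sorted_file": b"sorted houses"}, new_aliases=["sorted"]))  # v2
print(sorted(history.get("v0")))  # ['mnist_test.csv'] (v0 was never overwritten)
print(history.aliases)            # {'latest': 2, 'sorted': 2}
```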

# Update Artifact version metadata

You can update the `description`, `metadata`, and `aliases` of a logged Artifact version, either during a W&B Run or outside of one.


This example changes the `description` of the `my_first_artifact` artifact inside a run:

In [None]:
run = wandb.init(project="artifact-basics")
artifact = run.use_artifact(artifact_or_name="my_first_artifact:latest")
artifact.description = "This is an edited description."
artifact.save()  # persists changes to an Artifact's properties
run.finish()

# Download an Artifact

To use an Artifact's files outside W&B, call the [`artifact.download()`](https://docs.wandb.ai/ref/python/artifact#download) method. It downloads the files of the selected Artifact version to a local directory and returns that directory's path.

In [None]:
run = wandb.init(project="artifact-basics")
artifact = run.use_artifact(artifact_or_name="my_first_artifact:latest")
# This will download the specified artifact to where your code is running
datadir = artifact.download()
run.finish()
# prints a separator line, then the path of the downloaded Artifact directory
print(u'\u2500' * 10)
print(f"Data directory located at: {datadir}")

For more information on ways to customize your Artifact download, see the [Download and Usage guide](https://docs.wandb.ai/guides/artifacts/download-and-use-an-artifact).
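You can also choose where the files land by passing a `root` directory to `artifact.download()`. The sketch below pairs that with a small hypothetical helper, `list_files`, for checking which files arrived; the folder `./artifacts/my_first_artifact` is an arbitrary example.

```python
from pathlib import Path
from typing import List


def list_files(root: str) -> List[str]:
    """Hypothetical helper: every file under `root` as a sorted, POSIX-style relative path."""
    base = Path(root)
    return sorted(p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file())


# Sketch of use with W&B (needs `import wandb` and a logged Artifact):
# run = wandb.init(project="artifact-basics")
# artifact = run.use_artifact("my_first_artifact:latest")
# datadir = artifact.download(root="./artifacts/my_first_artifact")  # chosen target folder
# print(list_files(datadir))
# run.finish()
```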

# Navigate the Artifacts UI

You can also manage your Artifacts in the W&B App, which can give you insight into model performance and dataset versions. To see the Artifacts for this project, open the [project overview](https://wandb.ai/wandb/artifact-basics/overview), then click the **Artifacts** tab.

The **Lineage** section of the tab shows the dependency graph formed by calling `run.use_artifact()` when an Artifact is an input to a run, and `run.log_artifact()` when an Artifact is an output of a run. This helps you visualize the relationships between different model versions and other objects in your project, such as datasets and jobs. To view this project's lineage graph, open the [lineage page for `my_first_artifact:v0`](https://wandb.ai/wandb/artifact-basics/artifacts/dataset/my_first_artifact/v0/lineage).

# Next steps
1. [Artifacts Python reference documentation](https://docs.wandb.ai/ref/python/artifact): Deep dive into artifact parameters and advanced methods.
2. [Lineage](https://docs.wandb.ai/guides/artifacts/explore-and-traverse-an-artifact-graph): View lineage graphs, which the W&B Artifacts system builds automatically, providing an auditable visual overview of the relationships between specific artifact versions, datasets, models, and runs.
3. [Model Registry](https://docs.wandb.ai/guides/model_registry): Learn how to centralize your best artifact versions in a shared registry.
4. [Artifact Automations](https://docs.wandb.ai/guides/artifacts/project-scoped-automations): Automatically run specific Weights & Biases jobs based on changes to your artifacts, such as automatically training a new model each time a new version of the training data is logged.
5. [Reference Artifacts](https://docs.wandb.ai/guides/artifacts/track-external-files#download-a-reference-artifact): Track files saved outside the W&B server, like Amazon S3 buckets, GCS buckets, Azure blobs, and more. 