# Getting Started on Azure Databricks

Azure Databricks&reg; provides a notebook-oriented Apache Spark&trade;-as-a-service workspace that makes it easy to manage clusters and explore data interactively.

### Use cases for Apache Spark 
* Read and process huge files and data sets
* Query, explore, and visualize data sets
* Join disparate data sets found in data lakes
* Train and evaluate machine learning models
* Process live streams of data
* Perform analysis on large graph data sets and social networks

## Exercise 1

Create a notebook and Spark cluster.

<img alt="Hint" title="Hint" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.3em" src="https://files.training.databricks.com/static/images/icon-light-bulb.svg"/>&nbsp;**Hint:** This exercise requires you to navigate around Databricks while following the lesson. We recommend you <a href="" target="_blank">open a second browser window</a> so that you can view these instructions in one window and navigate in the other.

### Step 1
Databricks notebooks are backed by clusters: networked computers that work together to process your data. Create a Spark cluster (if you already have a running cluster, skip to **Step 2**):
1. In your new window, click the **Clusters** button in the sidebar.
<div><img src="https://files.training.databricks.com/images/eLearning/create-cluster-4.png" style="height: 200px"/></div><br/>
2. Click the **Create Cluster** button.
<div><img src="https://files.training.databricks.com/images/eLearning/create-cluster-5.png" style="border: 1px solid #aaa; border-radius: 10px 10px 10px 10px; box-shadow: 5px 5px 5px #aaa"/></div><br/>
3. Name your cluster. Use your name or initials to easily differentiate your cluster from your coworkers' clusters.
4. Select the Databricks Runtime version. To complete all the lessons in this module, use the latest runtime (**5.1** or newer) and Scala **2.11**.
5. Select **3** as the Python version.
6. Specify your cluster configuration.
  * For clusters created on a **Community Edition** shard, the default values are sufficient for the remaining fields.
  * For all other environments, please refer to your company's policy on creating and using clusters.<br/><br/>
7. Right-click the **Clusters** button in the left sidebar and open it in a new tab. Click the **Create Cluster** button.
<div><img src="https://databricksdemostore.blob.core.windows.net/images/04/01/create-cluster.png" style="height: 300px; border: 1px solid #aaa; border-radius: 10px 10px 10px 10px; box-shadow: 5px 5px 5px #aaa"/></div>

<img alt="Hint" title="Hint" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.3em" src="https://files.training.databricks.com/static/images/icon-light-bulb.svg"/>&nbsp;**Hint:** Check with your local system administrator to see if there is a recommended default cluster at your company to use for the rest of the class. This could save you some money!
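The settings chosen in Step 1 can also be written down as data, which is handy for comparing notes with a coworker or an administrator. The sketch below only builds and prints a dictionary; it does not contact Databricks. The field names follow the Databricks Clusters REST API (2.0) as we understand it, and every value (`jdoe-training`, the node type, the worker count, the auto-termination timeout) is a hypothetical placeholder rather than a recommendation from this lesson.

```python
import json

# Hypothetical cluster settings mirroring the choices made in the UI.
# All values are assumptions for illustration; follow your company's policy.
cluster_spec = {
    "cluster_name": "jdoe-training",       # your name or initials
    "spark_version": "5.1.x-scala2.11",    # runtime 5.1 with Scala 2.11
    "node_type_id": "Standard_DS3_v2",     # assumed Azure VM size
    "num_workers": 2,                      # assumed worker count
    "autotermination_minutes": 60,         # assumed idle shutdown, to limit cost
}

# Pretty-print the settings so they are easy to review or share.
print(json.dumps(cluster_spec, indent=2))
```

Keeping the settings in one place like this makes it easier to check them against a recommended default cluster before you create your own.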

### Step 2
Create a new notebook in your home folder:
1. Click the **Home** button in the sidebar.
<div><img src="https://files.training.databricks.com/images/eLearning/home.png" style="height: 200px"/></div><br/>
2. Right-click on your home folder.
3. Select **Create**.
4. Select **Notebook**.
<div><img src="https://files.training.databricks.com/images/eLearning/create-notebook-1.png" style="height: 150px; border: 1px solid #aaa; border-radius: 10px 10px 10px 10px; box-shadow: 5px 5px 5px #aaa"/></div><br/>
5. Name your notebook `First Notebook`.<br/>
6. Set the language to **Python**.<br/>
7. Select the cluster to attach this notebook to.  
<img alt="Side Note" title="Side Note" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.05em; transform:rotate(15deg)" src="https://files.training.databricks.com/static/images/icon-note.webp"/> If no cluster is currently running, this option will not be available.
8. Click **Create**.
<div>
  <div style="float:left"><img src="https://files.training.databricks.com/images/eLearning/create-notebook-2b.png" style="width:400px; border: 1px solid #aaa; border-radius: 10px 10px 10px 10px; box-shadow: 5px 5px 5px #aaa"/></div>
  <div style="float:left">&nbsp;&nbsp;&nbsp;or&nbsp;&nbsp;&nbsp;</div>
  <div style="float:left"><img src="https://files.training.databricks.com/images/eLearning/create-notebook-2.png" style="width:400px; border: 1px solid #aaa; border-radius: 10px 10px 10px 10px; box-shadow: 5px 5px 5px #aaa"/></div>
  <div style="clear:both"></div>
</div>

### Step 3

Now that you have a notebook, you can use it to run some code.
1. In the first cell of your notebook, type `1 + 1`. 
2. Run the cell by clicking the run icon and selecting **Run Cell**.
<div><img src="https://files.training.databricks.com/images/eLearning/run-notebook-1.png" style="width:600px; margin-bottom:1em; border: 1px solid #aaa; border-radius: 10px 10px 10px 10px; box-shadow: 5px 5px 5px #aaa"/></div>
<img alt="Side Note" title="Side Note" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.05em; transform:rotate(15deg)" src="https://files.training.databricks.com/static/images/icon-note.webp"/> You can also run a cell by pressing **Ctrl-Enter**.

1 + 1
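As a slightly larger first cell, the following sketch checks that the notebook is running Python 3 (the version selected in Step 1) and shows the two common ways to see a value: `print` works anywhere in a cell, while the value of the last expression is displayed as the cell's result. The variable names `python_major` and `total` are ours, chosen for illustration.

```python
import sys

# Step 1 selected Python 3, so the major version reported here should be 3.
python_major = sys.version_info.major
print("Python major version:", python_major)

# In a notebook, the value of the last expression in a cell is shown as its result.
total = 1 + 1
total
```

If the reported major version is not 3, revisit the Python version setting on your cluster.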

### Attach and Run

If your notebook was not previously attached to a cluster you might receive the following prompt: 
<div><img src="https://files.training.databricks.com/images/eLearning/run-notebook-2.png" style="margin-bottom:1em; border: 1px solid #aaa; border-radius: 10px 10px 10px 10px; box-shadow: 5px 5px 5px #aaa"/></div>

If you click **Attach and Run**, first make sure that you are attaching to the correct cluster.

If it is not the correct cluster, click **Cancel** instead and see the next section, **Attach & Detach**.

### Attach & Detach

If your notebook is detached you can attach it to another cluster:  
<img src="https://files.training.databricks.com/images/eLearning/attach-to-cluster.png" style="border: 1px solid #aaa; border-radius: 10px 10px 10px 10px; box-shadow: 5px 5px 5px #aaa"/>
<br/>
<br/>
<br/>
If your notebook is attached to a cluster you can:
* Detach your notebook from the cluster.
* Restart the cluster.
* Attach to another cluster.
* Open the Spark UI.
* View the driver's log files.

<img src="https://files.training.databricks.com/images/eLearning/detach-from-cluster.png" style="margin-bottom:1em; border: 1px solid #aaa; border-radius: 10px 10px 10px 10px; box-shadow: 5px 5px 5px #aaa"/>
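Once the notebook is attached, a quick cell can confirm which Spark version the cluster provides. The sketch below assumes that Databricks predefines a `spark` session object in attached Python notebooks and reads its `version` attribute; the helper name `spark_version_or_none` is hypothetical. It takes the namespace as an argument so it can also be tried outside Databricks, where no `spark` object exists.

```python
def spark_version_or_none(namespace):
    """Return the Spark version if a `spark` session is defined, else None."""
    session = namespace.get("spark")
    # getattr with a default also covers the case where `spark` is missing (None).
    return getattr(session, "version", None)


version = spark_version_or_none(globals())
if version is None:
    print("No `spark` session found; is the notebook attached to a running cluster?")
else:
    print("Attached; Spark version:", version)
```

After switching clusters, rerun the cell to see the version reported by the new cluster.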

## Summary
* Create notebooks by clicking the down arrow on a folder and selecting the **Create Notebook** option.
* Import notebooks by clicking the down arrow on a folder and selecting the **Import** option.
* Attach to a Spark cluster by selecting the **Attached/Detached** option directly below the notebook title.
* Create clusters using the Clusters button on the left sidebar.

## Review Questions

**Q:** How do you create a Notebook?  
**A:** Sign in to Databricks, select the **Home** icon from the sidebar, right-click your home folder, select **Create**, and then **Notebook**. In the **Create Notebook** dialog, specify the name of your notebook and the default programming language.

**Q:** How do you create a cluster?  
**A:** Select the **Clusters** icon on the sidebar, click the **Create Cluster** button, specify the settings for your cluster, and then click **Create Cluster**.

**Q:** How do you attach a notebook to a cluster?  
**A:** If you run a command while detached, you may be prompted to connect to a cluster. To connect to a specific cluster, open the cluster menu by clicking the **Attached/Detached** menu item and then selecting your desired cluster.

## Next Steps

Start the next lesson, **Querying Files with DataFrames**.
1. In the left sidebar, click **Home**.
2. Select your home folder.
3. Select the folder **DataFrames-(version #)**.
4. Open the notebook **02-Querying-Files** by single-clicking it. You'll work through the rest of the course from within your Databricks account.


<img src="https://files.training.databricks.com/images/eLearning/main-menu-4.png" style="margin-bottom: 5px; border: 1px solid #aaa; border-radius: 10px 10px 10px 10px; box-shadow: 5px 5px 5px #aaa; width: auto; height: auto; max-height: 383px"/>

## Additional Topics & Resources
**Q:** Are there additional docs I can reference to find my way around Databricks?  
**A:** See <a href="https://docs.azuredatabricks.net/getting-started/index.html" target="_blank">Getting Started with Databricks</a>.

**Q:** Where can I learn more about the cluster configuration options?  
**A:** See <a href="https://docs.azuredatabricks.net/user-guide/clusters/index.html#id1" target="_blank">Spark Clusters on Databricks</a>.

**Q:** Can I import formats other than .dbc files?  
**A:** Yes, see <a href="https://docs.azuredatabricks.net/user-guide/notebooks/notebook-manage.html#notebook-external-formats" target="_blank">Importing Notebooks</a>.

**Q:** Can I install the courseware notebooks into a non-Databricks distribution of Spark?  
**A:** No, the files that contain the courseware are in a Databricks-specific format (DBC).