# Table Management

Apache Spark&trade; and Azure Databricks&reg; allow you to access and optimize data in managed and unmanaged tables.

### Optimization of Data Storage with Managed and Unmanaged Tables

A **managed table** is a table for which Spark manages both the data and the metadata.  In this case, a `DROP TABLE` command removes both the metadata for the table and the data itself.  

For **unmanaged tables**, Spark manages only the metadata, such as the schema and the data location. The data itself sits in a different location, often backed by a blob store such as Azure Blob Storage. Dropping an unmanaged table drops only the metadata associated with the table, while the data itself remains in place.

<div><img src="https://files.training.databricks.com/images/eLearning/ETL-Part-2/managed-and-unmanaged-tables.png" style="height: 400px; margin: 20px"/></div>
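The difference in `DROP TABLE` behavior can be sketched with a toy model that uses only local directories. This is not how Spark or the Hive metastore is implemented; `ToyMetastore` and its methods are hypothetical names invented for illustration, and the temporary directories stand in for the warehouse and for external storage.

```python
import shutil
import tempfile
from pathlib import Path


class ToyMetastore:
    """Toy stand-in for a metastore: it records metadata only (name -> location, managed flag)."""

    def __init__(self, warehouse: Path):
        self.warehouse = warehouse
        self.tables = {}  # table name -> (data location, is_managed)

    def create_table(self, name, path=None):
        # No explicit path: the table is managed and lives under the warehouse.
        managed = path is None
        location = self.warehouse / name.lower() if managed else Path(path)
        location.mkdir(parents=True, exist_ok=True)
        (location / "part-00000").write_text("1\n2\n3\n")
        self.tables[name.lower()] = (location, managed)
        return location

    def drop_table(self, name):
        location, managed = self.tables.pop(name.lower())
        if managed:
            # Managed: dropping removes the data along with the metadata.
            shutil.rmtree(location)
        # Unmanaged: only the metadata entry is removed; the files stay put.


root = Path(tempfile.mkdtemp())
store = ToyMetastore(root / "warehouse")
managed_dir = store.create_table("myTableManaged")
unmanaged_dir = store.create_table(
    "myTableUnmanaged", path=root / "external" / "myTableUnmanaged"
)

store.drop_table("myTableManaged")
store.drop_table("myTableUnmanaged")

print(managed_dir.exists())    # False
print(unmanaged_dir.exists())  # True
```

After both drops the toy metastore holds no entries, yet only the managed table's directory is gone. The rest of this lesson walks through the same pattern with real tables.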

### Writing to a Managed Table

Managed tables allow access to data using the Spark SQL API.

Run the cell below to mount the data.

In [5]:
%run "./Includes/Classroom-Setup"

Create a DataFrame.

In [7]:
df = spark.range(1, 100)

display(df)

Register the table.

In [9]:
df.write.mode("OVERWRITE").saveAsTable("myTableManaged")

Use `DESCRIBE EXTENDED` to describe the table.  Scroll down to see the table `Type`, which is `MANAGED`.
Notice that the location is `dbfs:/user/hive/warehouse/mytablemanaged`: the table name, in lowercase, under the default warehouse directory.

In [11]:
%sql
DESCRIBE EXTENDED myTableManaged
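As a rough sketch of how that location is derived, the helper below rebuilds the path from a table name. `default_managed_location` is a hypothetical function, not a Spark API. It assumes the table lives in the default database and that the warehouse root is the `dbfs:/user/hive/warehouse` directory used in this lesson. The output of `DESCRIBE EXTENDED` remains the authoritative source.

```python
def default_managed_location(table_name, warehouse="dbfs:/user/hive/warehouse"):
    """Guess where a managed table in the default database stores its files."""
    # The metastore stores table names in lowercase, so the directory is lowercase too.
    return f"{warehouse.rstrip('/')}/{table_name.lower()}"


print(default_managed_location("myTableManaged"))
# dbfs:/user/hive/warehouse/mytablemanaged
```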

### Writing to an Unmanaged Table

Write to an unmanaged table by adding an `.option()` call that specifies a path for the data.

In [13]:
df.write.mode("OVERWRITE").option('path', '/tmp/myTableUnmanaged').saveAsTable("myTableUnmanaged")

Now examine the table type and location of the data.

<img alt="Side Note" title="Side Note" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.05em; transform:rotate(15deg)" src="https://files.training.databricks.com/static/images/icon-note.webp"/> An external table is the same as an unmanaged table.

In [15]:
%sql
DESCRIBE EXTENDED myTableUnmanaged
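The table type can also be read programmatically instead of scrolling through the `DESCRIBE EXTENDED` output. The sketch below assumes the notebook's existing `spark` session and relies on `spark.catalog.listTables`, whose entries carry `name` and `tableType` fields. The helper name `table_types` is made up for this example.

```python
def table_types(spark, database="default"):
    """Map each table name in `database` to its type, e.g. 'MANAGED' or 'EXTERNAL'."""
    return {t.name: t.tableType for t in spark.catalog.listTables(database)}


# In the notebook, where `spark` is already defined:
# table_types(spark)
```

Temporary views registered in the session appear in the same listing, so the result may contain more than the two tables created in this lesson.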

### Dropping Tables

Take a look at how dropping tables operates differently in the two cases below.

Look at the files backing the managed table.

In [18]:
%fs ls dbfs:/user/hive/warehouse/mytablemanaged

Drop the table.

In [20]:
%sql
DROP TABLE myTableManaged

Next, look at the underlying data.

In [22]:
%fs ls dbfs:/user/hive/warehouse/mytablemanaged

Because dropping the managed table deleted its data, Spark can no longer find the underlying files and the listing fails. Perform the same operation with the unmanaged table.

In [24]:
%fs ls /tmp/myTableUnmanaged

Drop the unmanaged table.

In [26]:
%sql
DROP TABLE myTableUnmanaged

Check whether the data is still there.

In [28]:
%fs ls /tmp/myTableUnmanaged
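To make the before-and-after comparison programmatic rather than visual, the listing call can be wrapped in a small check. This is a minimal sketch: `path_exists` is a hypothetical helper, and the commented usage assumes the Databricks-provided `dbutils` object, whose `dbutils.fs.ls` call raises an error for a path that does not exist.

```python
def path_exists(ls, path):
    """Return True if `ls(path)` succeeds, False if it raises."""
    try:
        ls(path)
        return True
    except Exception:
        # A missing path makes the listing call raise instead of returning an empty list.
        return False


# In the notebook (assumes the Databricks-provided `dbutils` object):
# path_exists(dbutils.fs.ls, "dbfs:/user/hive/warehouse/mytablemanaged")
# path_exists(dbutils.fs.ls, "/tmp/myTableUnmanaged")
```

Catching every exception keeps the sketch short but also hides unrelated failures such as permission errors. A stricter version would inspect the error before returning `False`.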

## Review
**Question:** What happens to the original data when I delete a managed table?  What about an unmanaged table?  
**Answer:** Deleting a managed table deletes both the metadata and the data itself. Deleting an unmanaged table does not delete the original data.

**Question:** What is a metastore?  
**Answer:** A metastore is a repository of metadata, such as each table's schema and the location of its data. A metastore does not include the data itself.

## Additional Topics & Resources

**Q:** Where can I find out more about connecting to my own metastore?  
**A:** Take a look at the <a href="https://docs.azuredatabricks.net/user-guide/advanced/external-hive-metastore.html" target="_blank">Databricks documentation</a> for more details.

**Extra Practice:** Apply what you learned in this module by completing the optional [Custom Transformations, Aggregating, and Loading]($./Optional/Custom-Transformations) exercise.