
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px; height: 163px">
</div>

# Connecting to S3

Apache Spark&trade; and Databricks&reg; allow you to connect to virtually any data store, including Amazon S3.

## In this lesson you:
* Mount and access data in S3
* Define options when reading from S3

## Audience
* Primary Audience: Data Engineers
* Additional Audiences: Data Scientists and Data Pipeline Engineers

## Prerequisites
* Web browser: Please use a <a href="https://docs.databricks.com/user-guide/supported-browsers.html#supported-browsers" target="_blank">supported browser</a>.
* Concept (optional): <a href="https://academy.databricks.com/collections/frontpage/products/dataframes" target="_blank">DataFrames course from Databricks Academy</a>

<iframe  
src="//fast.wistia.net/embed/iframe/r2725pnugw?videoFoam=true"
style="border:1px solid #1cb1c2;"
allowtransparency="true" scrolling="no" class="wistia_embed"
name="wistia_embed" allowfullscreen mozallowfullscreen webkitallowfullscreen
oallowfullscreen msallowfullscreen width="640" height="360" ></iframe>
<div>
<a target="_blank" href="https://fast.wistia.net/embed/iframe/r2725pnugw?seo=false">
  <img alt="Opens in new tab" src="https://files.training.databricks.com/static/images/external-link-icon-16x16.png"/>&nbsp;Watch full-screen.</a>
</div>

### Spark as a Connector

Spark quickly rose to popularity as a replacement for the [Apache Hadoop&trade;](http://hadoop.apache.org/) MapReduce paradigm, in large part because it connected easily to a number of different data sources.  Most important among these data sources was the Hadoop Distributed File System (HDFS).  Now, Spark engineers connect to a wide variety of data sources including:  
<br>
* Traditional databases like Postgres, SQL Server, and MySQL
* Message brokers like <a href="https://kafka.apache.org/" target="_blank">Apache Kafka</a> and <a href="https://aws.amazon.com/kinesis/">Kinesis</a>
* Distributed databases like Cassandra and Redshift
* Data warehouses like Hive
* File types like CSV, Parquet, and Avro

<img src="https://files.training.databricks.com/images/eLearning/ETL-Part-1/open-source-ecosystem_2.png" style="border: 1px solid #aaa; border-radius: 10px 10px 10px 10px; box-shadow: 5px 5px 5px #aaa"/>

### DBFS Mounts and S3

Amazon Simple Storage Service (S3) is the backbone of Databricks workflows.  S3 offers data storage that easily scales to the demands of most data applications and, by colocating data with Spark clusters, Databricks quickly reads from and writes to S3 in a distributed manner.

The Databricks File System, or DBFS, is a layer over S3 that allows you to mount S3 buckets, making them available to other users in your workspace and persisting the data after a cluster is shut down.

In our road map for ETL, this is the <b>Extract and Validate</b> step:

<img src="https://files.training.databricks.com/images/eLearning/ETL-Part-1/ETL-Process-1.png" style="border: 1px solid #aaa; border-radius: 10px 10px 10px 10px; box-shadow: 5px 5px 5px #aaa"/>

<iframe  
src="//fast.wistia.net/embed/iframe/wk0yb1jyz5?videoFoam=true"
style="border:1px solid #1cb1c2;"
allowtransparency="true" scrolling="no" class="wistia_embed"
name="wistia_embed" allowfullscreen mozallowfullscreen webkitallowfullscreen
oallowfullscreen msallowfullscreen width="640" height="360" ></iframe>
<div>
<a target="_blank" href="https://fast.wistia.net/embed/iframe/wk0yb1jyz5?seo=false">
  <img alt="Opens in new tab" src="https://files.training.databricks.com/static/images/external-link-icon-16x16.png"/>&nbsp;Watch full-screen.</a>
</div>

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Classroom-Setup & Classroom-Cleanup<br>

For each lesson to execute correctly, please make sure to run the **`Classroom-Setup`** cell at the start of each lesson (see the next cell) and the **`Classroom-Cleanup`** cell at the end of each lesson.

In [0]:
%run "./Includes/Classroom-Setup"


Define your AWS credentials.  The cell below defines read-only keys, the name of an AWS bucket, and the mount name used to refer to that bucket in DBFS.

<img alt="Side Note" title="Side Note" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.05em; transform:rotate(15deg)" src="https://files.training.databricks.com/static/images/icon-note.webp"/> For details on obtaining AWS keys, <a href="https://docs.aws.amazon.com/general/latest/gr/managing-aws-access-keys.html" target="_blank">take a look at the AWS documentation</a>.

In [0]:
awsAccessKey = "AKIAJBRYNXGHORDHZB4A"
# Encode the Secret Key to remove any "/" characters
secretKey = "a0BzE1bSegfydr3%2FGE3LSPM6uIV5A4hOUfpH8aFF".replace("/", "%2F")
awsBucketName = "databricks-corp-training/common"
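
The `replace("/", "%2F")` call above handles one reserved character. As a hedged sketch of the same idea, Python's standard `urllib.parse.quote` percent-encodes every reserved character in a raw secret before it is embedded in an `s3a://` URI. The helper name `build_s3a_uri` and the sample values are made up for illustration; none of them come from this lesson.

```python
from urllib.parse import quote


def build_s3a_uri(access_key, secret_key, bucket_path):
    """Build an s3a:// URI with the secret key percent-encoded.

    Hypothetical helper: pass the RAW secret key, not one that is
    already encoded, otherwise "%" itself is encoded a second time.
    """
    # safe="" means "/" is encoded too (quote() leaves "/" alone by default)
    encoded_secret = quote(secret_key, safe="")
    return "s3a://{}:{}@{}".format(access_key, encoded_secret, bucket_path)


# Placeholder values, not real credentials
example_uri = build_s3a_uri("EXAMPLEACCESSKEY", "abc/def+ghi", "my-example-bucket/data")
print(example_uri)  # s3a://EXAMPLEACCESSKEY:abc%2Fdef%2Bghi@my-example-bucket/data
```

Apply a helper like this only to a raw key. The key in the cell above already contains `%2F`, so encoding it again would double-encode it.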

In addition to the sourcing information above, we need to define a target location.

So that no two students produce the exact same mount, we are going to be a little more creative with this one.

In [0]:
mountPoint = f"/mnt/etlp1s-vivek-si"

In case you mounted this bucket earlier, you might need to unmount it.

In [0]:
try:
  dbutils.fs.unmount(mountPoint) # Use this to unmount as needed
except Exception:
  print("{} already unmounted".format(mountPoint))


Now mount the bucket [using the template provided in the docs.](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html#mounting-an-s3-bucket)

<img alt="Side Note" title="Side Note" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.05em; transform:rotate(15deg)" src="https://files.training.databricks.com/static/images/icon-note.webp"/> The code below includes error handling logic to handle the case where the bucket is already mounted.

In [0]:
try:
  mountTarget = "s3a://{}:{}@{}".format(awsAccessKey, secretKey, awsBucketName)
  dbutils.fs.mount(mountTarget, mountPoint)
except Exception:
  print("{} already mounted. Run previous cells to unmount first".format(mountPoint))
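
The `try`/`except` above reports every failure as an existing mount, which can hide other problems such as invalid credentials. One alternative, sketched here under the assumption that the code runs on Databricks where `dbutils.fs.mounts()` lists the current mounts, is to check first and mount only when needed. `mount_if_absent` is a hypothetical helper, and `dbutils` is passed in as a parameter so that the function can also be exercised with a stand-in object.

```python
def mount_if_absent(dbutils, source, mount_point):
    """Mount `source` at `mount_point` unless that mount point already exists.

    Returns True if a new mount was created, False if it was already present.
    """
    # Each entry returned by dbutils.fs.mounts() exposes a `mountPoint` attribute
    existing = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in existing:
        return False
    # Any error raised here (for example bad credentials) propagates to the caller
    dbutils.fs.mount(source, mount_point)
    return True
```

In this notebook the call would look like `mount_if_absent(dbutils, mountTarget, mountPoint)`.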

Next, explore the mount using `%fs ls` and the name of the mount.

Remember, your mount name is derived from your email address, so update the path in the `%fs ls` cell below to match your own mount name.

In [0]:
print("Hint: Your mount name is {}".format(mountPoint))

In [0]:
%fs ls /mnt/etlp1s-vivek-si

path,name,size
dbfs:/mnt/etlp1s-vivek-si/301/,301/,0
dbfs:/mnt/etlp1s-vivek-si/Chicago-Crimes-2018.csv,Chicago-Crimes-2018.csv,5201668
dbfs:/mnt/etlp1s-vivek-si/City-Data.delta/,City-Data.delta/,0
dbfs:/mnt/etlp1s-vivek-si/City-Data.parquet/,City-Data.parquet/,0
dbfs:/mnt/etlp1s-vivek-si/EDGAR-Log-20170329/,EDGAR-Log-20170329/,0
dbfs:/mnt/etlp1s-vivek-si/StatLib/,StatLib/,0
dbfs:/mnt/etlp1s-vivek-si/UbiqLog4UCI/,UbiqLog4UCI/,0
dbfs:/mnt/etlp1s-vivek-si/_META/,_META/,0
dbfs:/mnt/etlp1s-vivek-si/adventure-works/,adventure-works/,0
dbfs:/mnt/etlp1s-vivek-si/airbnb/,airbnb/,0


In practice, always secure your AWS credentials.  Do this either by maintaining a single notebook with restricted permissions that holds the AWS keys, or by deleting the cells or notebooks that expose the keys. Once the cell that mounts a bucket has run, the mount is accessible from any notebook and any cluster in the workspace, and can be shared with colleagues.
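
Another way to keep keys out of notebook cells, sketched here as an assumption rather than as part of this lesson, is a Databricks secret scope read through `dbutils.secrets.get`. The scope name `aws-training` and the key names `access-key` and `secret-key` are hypothetical placeholders; a scope has to be created and populated before this works.

```python
def load_aws_credentials(dbutils, scope="aws-training"):
    """Return (access_key, secret_key) read from a Databricks secret scope.

    The scope and key names are placeholders chosen for this sketch.
    """
    access_key = dbutils.secrets.get(scope=scope, key="access-key")
    secret_key = dbutils.secrets.get(scope=scope, key="secret-key")
    return access_key, secret_key
```

With this approach the notebook contains only the scope and key names, never the key values themselves.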

## Adding Options

When you import that data into a cluster, you can add options based on the specific characteristics of the data.

<iframe  
src="//fast.wistia.net/embed/iframe/u2z99yb5p0?videoFoam=true"
style="border:1px solid #1cb1c2;"
allowtransparency="true" scrolling="no" class="wistia_embed"
name="wistia_embed" allowfullscreen mozallowfullscreen webkitallowfullscreen
oallowfullscreen msallowfullscreen width="640" height="360" ></iframe>
<div>
<a target="_blank" href="https://fast.wistia.net/embed/iframe/u2z99yb5p0?seo=false">
  <img alt="Opens in new tab" src="https://files.training.databricks.com/static/images/external-link-icon-16x16.png"/>&nbsp;Watch full-screen.</a>
</div>

Display the first few lines of `Chicago-Crimes-2018.csv` using `%fs head`.

In [0]:
%fs head /mnt/training/Chicago-Crimes-2018.csv

`option` is a method of `DataFrameReader`. Options are key/value pairs and must be specified before calling `.csv()`.

This is a tab-delimited file, as seen in the previous cell. Specify the `"delimiter"` option in the import statement.  

:NOTE: Find a [full list of parameters here.](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=dateformat#pyspark.sql.DataFrameReader.csv)

In [0]:
display(spark.read
  .option("delimiter", "\t")
  .csv("/mnt/training/Chicago-Crimes-2018.csv")
)

_c0,_c1,_c2,_c3,_c4,_c5,_c6,_c7,_c8,_c9,_c10,_c11,_c12,_c13,_c14,_c15,_c16,_c17,_c18,_c19,_c20,_c21
ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
23811,JB141441,02/05/2018 01:10:00 AM,118XX S INDIANA AVE,0110,HOMICIDE,FIRST DEGREE MURDER,VACANT LOT,false,false,0532,005,9,53,01A,1179707,1826280,2018,02/12/2018 03:49:14 PM,41.678585145,-87.617837834,"(41.678585145, -87.617837834)"
11228589,JB148990,01/23/2018 09:00:00 AM,072XX S VERNON AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,false,false,0323,003,6,69,11,,,2018,02/12/2018 03:49:14 PM,,,
11228563,JB148931,01/31/2018 10:12:00 AM,040XX N KEYSTONE AVE,1154,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT $300 AND UNDER,APARTMENT,false,false,1722,017,39,16,11,,,2018,02/12/2018 03:49:14 PM,,,
11228555,JB148885,02/01/2018 02:00:00 PM,017XX W CONGRESS PKWY,0820,THEFT,$500 AND UNDER,HOSPITAL BUILDING/GROUNDS,false,false,1231,012,2,28,06,,,2018,02/12/2018 03:49:14 PM,,,
11228430,JB148675,01/27/2018 09:00:00 PM,061XX S EBERHART AVE,0560,ASSAULT,SIMPLE,RESIDENCE,false,true,0313,003,20,42,08A,,,2018,02/12/2018 03:49:14 PM,,,
11228401,JB148683,02/02/2018 12:00:00 PM,038XX N SAWYER AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,RESIDENCE,false,false,1733,017,33,16,11,,,2018,02/12/2018 03:49:14 PM,,,
11228347,JB148599,01/28/2018 07:00:00 PM,008XX E 45TH ST,0620,BURGLARY,UNLAWFUL ENTRY,RESIDENCE,false,false,0221,002,4,39,05,,,2018,02/12/2018 03:49:14 PM,,,
11228291,JB148591,01/10/2018 04:45:00 PM,010XX E 53RD ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,false,false,0233,002,4,41,11,,,2018,02/12/2018 03:49:14 PM,,,
11228287,JB148482,01/03/2018 03:45:00 PM,0000X W C1 ST,0810,THEFT,OVER $500,AIRPORT TERMINAL LOWER LEVEL - NON-SECURE AREA,false,false,1651,016,41,76,06,,,2018,02/12/2018 03:49:14 PM,,,


Spark doesn't read the header by default, as demonstrated by the column names of `_c0`, `_c1`, etc. Notice that the column names are present in the first row of the DataFrame. 

Fix this by setting the `"header"` option to `True`.

In [0]:
display(spark.read
  .option("delimiter", "\t")
  .option("header", True)
  .csv("/mnt/training/Chicago-Crimes-2018.csv")
)

ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
23811,JB141441,02/05/2018 01:10:00 AM,118XX S INDIANA AVE,0110,HOMICIDE,FIRST DEGREE MURDER,VACANT LOT,False,False,532,5,9,53,01A,1179707.0,1826280.0,2018,02/12/2018 03:49:14 PM,41.678585145,-87.617837834,"(41.678585145, -87.617837834)"
11228589,JB148990,01/23/2018 09:00:00 AM,072XX S VERNON AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,False,False,323,3,6,69,11,,,2018,02/12/2018 03:49:14 PM,,,
11228563,JB148931,01/31/2018 10:12:00 AM,040XX N KEYSTONE AVE,1154,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT $300 AND UNDER,APARTMENT,False,False,1722,17,39,16,11,,,2018,02/12/2018 03:49:14 PM,,,
11228555,JB148885,02/01/2018 02:00:00 PM,017XX W CONGRESS PKWY,0820,THEFT,$500 AND UNDER,HOSPITAL BUILDING/GROUNDS,False,False,1231,12,2,28,06,,,2018,02/12/2018 03:49:14 PM,,,
11228430,JB148675,01/27/2018 09:00:00 PM,061XX S EBERHART AVE,0560,ASSAULT,SIMPLE,RESIDENCE,False,True,313,3,20,42,08A,,,2018,02/12/2018 03:49:14 PM,,,
11228401,JB148683,02/02/2018 12:00:00 PM,038XX N SAWYER AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,RESIDENCE,False,False,1733,17,33,16,11,,,2018,02/12/2018 03:49:14 PM,,,
11228347,JB148599,01/28/2018 07:00:00 PM,008XX E 45TH ST,0620,BURGLARY,UNLAWFUL ENTRY,RESIDENCE,False,False,221,2,4,39,05,,,2018,02/12/2018 03:49:14 PM,,,
11228291,JB148591,01/10/2018 04:45:00 PM,010XX E 53RD ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,233,2,4,41,11,,,2018,02/12/2018 03:49:14 PM,,,
11228287,JB148482,01/03/2018 03:45:00 PM,0000X W C1 ST,0810,THEFT,OVER $500,AIRPORT TERMINAL LOWER LEVEL - NON-SECURE AREA,False,False,1651,16,41,76,06,,,2018,02/12/2018 03:49:14 PM,,,
11228268,JB148558,02/04/2018 04:00:00 PM,044XX S MICHIGAN AVE,2825,OTHER OFFENSE,HARASSMENT BY TELEPHONE,APARTMENT,False,True,215,2,3,38,26,,,2018,02/12/2018 03:49:14 PM,,,


Spark didn't infer the schema, and it didn't parse the timestamps because this file uses an atypical timestamp format.  Handle the timestamps by adding the `"timestampFormat"` option and passing it the format used in this file.  

Set `"inferSchema"` to `True`, which triggers Spark to make an extra pass over the data to infer the schema.
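
Before committing to a format string, it can help to check the layout against a sample value. The sketch below uses Python's standard `datetime.strptime`, whose `%` directives are not the same as Spark's pattern letters, so it validates only the field order of a value copied from the file, not the Spark pattern itself. In Spark's pattern syntax, uppercase `MM` is the month and lowercase `mm` is the minute.

```python
from datetime import datetime

# "Updated On" value copied from the first data row of the file
sample = "02/12/2018 03:49:14 PM"

# %m = month, %d = day, %Y = 4-digit year, %I = 12-hour clock, %p = AM/PM
parsed = datetime.strptime(sample, "%m/%d/%Y %I:%M:%S %p")
print(parsed.isoformat())  # 2018-02-12T15:49:14
```

If the month, day, or hour in the parsed value does not match the raw string, the field order in the format is wrong.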

In [0]:
crimeDF = (spark.read
  .option("delimiter", "\t")
  .option("header", True)
  # "MM" is the month; lowercase "mm" means minutes
  .option("timestampFormat", "MM/dd/yyyy hh:mm:ss a")
  .option("inferSchema", True)
  .csv("/mnt/training/Chicago-Crimes-2018.csv")
)
display(crimeDF)

ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
23811,JB141441,2018-02-05T01:10:00.000+0000,118XX S INDIANA AVE,0110,HOMICIDE,FIRST DEGREE MURDER,VACANT LOT,False,False,532,5,9,53,01A,1179707.0,1826280.0,2018,2018-02-12T15:49:14.000+0000,41.678585145,-87.617837834,"(41.678585145, -87.617837834)"
11228589,JB148990,2018-01-23T09:00:00.000+0000,072XX S VERNON AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,False,False,323,3,6,69,11,,,2018,2018-02-12T15:49:14.000+0000,,,
11228563,JB148931,2018-01-31T10:12:00.000+0000,040XX N KEYSTONE AVE,1154,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT $300 AND UNDER,APARTMENT,False,False,1722,17,39,16,11,,,2018,2018-02-12T15:49:14.000+0000,,,
11228555,JB148885,2018-02-01T14:00:00.000+0000,017XX W CONGRESS PKWY,0820,THEFT,$500 AND UNDER,HOSPITAL BUILDING/GROUNDS,False,False,1231,12,2,28,06,,,2018,2018-02-12T15:49:14.000+0000,,,
11228430,JB148675,2018-01-27T21:00:00.000+0000,061XX S EBERHART AVE,0560,ASSAULT,SIMPLE,RESIDENCE,False,True,313,3,20,42,08A,,,2018,2018-02-12T15:49:14.000+0000,,,
11228401,JB148683,2018-02-02T12:00:00.000+0000,038XX N SAWYER AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,RESIDENCE,False,False,1733,17,33,16,11,,,2018,2018-02-12T15:49:14.000+0000,,,
11228347,JB148599,2018-01-28T19:00:00.000+0000,008XX E 45TH ST,0620,BURGLARY,UNLAWFUL ENTRY,RESIDENCE,False,False,221,2,4,39,05,,,2018,2018-02-12T15:49:14.000+0000,,,
11228291,JB148591,2018-01-10T16:45:00.000+0000,010XX E 53RD ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,233,2,4,41,11,,,2018,2018-02-12T15:49:14.000+0000,,,
11228287,JB148482,2018-01-03T15:45:00.000+0000,0000X W C1 ST,0810,THEFT,OVER $500,AIRPORT TERMINAL LOWER LEVEL - NON-SECURE AREA,False,False,1651,16,41,76,06,,,2018,2018-02-12T15:49:14.000+0000,,,
11228268,JB148558,2018-02-04T16:00:00.000+0000,044XX S MICHIGAN AVE,2825,OTHER OFFENSE,HARASSMENT BY TELEPHONE,APARTMENT,False,True,215,2,3,38,26,,,2018,2018-02-12T15:49:14.000+0000,,,


## The Design Pattern

Other connections work in much the same way, whether your data sits in Cassandra, Redis, Redshift, or another common data store.  The general pattern is always:  
<br>
1. Define the connection point
2. Define connection parameters such as access credentials
3. Add necessary options

Following this pattern, read data using `spark.read.option(<option key>, <option value>).<connection_type>(<endpoint>)`.
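
The pattern can be sketched as one small function. This is an illustration, not the lesson's own code: `read_source` is a hypothetical helper that relies on the `DataFrameReader` methods `format`, `option`, and `load`, and the `spark` session is passed in explicitly.

```python
def read_source(spark, source_format, endpoint, options=None):
    """Read from `endpoint` using the given format and key/value options.

    Hypothetical helper that mirrors the three steps of the pattern.
    """
    # The format string selects the connection type, e.g. "csv" or "parquet"
    reader = spark.read.format(source_format)
    # Options carry connection parameters and data-specific settings
    for key, value in (options or {}).items():
        reader = reader.option(key, value)
    # The endpoint is the connection point, e.g. a mounted path
    return reader.load(endpoint)
```

For the crime data used earlier, a call might look like `read_source(spark, "csv", "/mnt/training/Chicago-Crimes-2018.csv", {"delimiter": "\t", "header": True})`.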

## Exercise 1: Read Wikipedia Data

Read Wikipedia data from S3, accounting for its delimiter and header.

### Step 1: Get a Sense for the Data

Take a look at the head of the data, located at `/mnt/training/wikipedia/pageviews/pageviews_by_second.tsv`.

In [0]:
%fs head /mnt/training/wikipedia/pageviews/pageviews_by_second.tsv

### Step 2: Import the Raw Data

Import the data **without any options** and save it to `wikiDF`. Display the result.

In [0]:
wikiDF = (spark.read
          
  .csv("/mnt/training/wikipedia/pageviews/pageviews_by_second.tsv")
)
display(wikiDF)

_c0
"""timestamp""	""site""	""requests"""
"""2015-03-16T00:09:55""	""mobile""	1595"
"""2015-03-16T00:10:39""	""mobile""	1544"
"""2015-03-16T00:19:39""	""desktop""	2460"
"""2015-03-16T00:38:11""	""desktop""	2237"
"""2015-03-16T00:42:40""	""mobile""	1656"
"""2015-03-16T00:52:24""	""desktop""	2452"
"""2015-03-16T00:54:16""	""mobile""	1654"
"""2015-03-16T01:18:11""	""mobile""	1720"
"""2015-03-16T01:30:32""	""desktop""	2288"


In [0]:
# TEST - Run this cell to test your solution

dbTest("ET1-P-03-01-01", 7200001, wikiDF.count())
dbTest("ET1-P-03-01-02", '_c0', wikiDF.columns[0])

print("Tests passed!")

### Step 3: Import the Data with Options

Import the data with options and save it to `wikiWithOptionsDF`.  Display the result.  Your import statement should account for:<br><br>  

 - The header
 - The delimiter

In [0]:
wikiWithOptionsDF = (spark.read
  .option("delimiter", "\t")
  .option("header", True)
  .csv("/mnt/training/wikipedia/pageviews/pageviews_by_second.tsv")
)
display(wikiWithOptionsDF)


timestamp,site,requests
2015-03-16T00:09:55,mobile,1595
2015-03-16T00:10:39,mobile,1544
2015-03-16T00:19:39,desktop,2460
2015-03-16T00:38:11,desktop,2237
2015-03-16T00:42:40,mobile,1656
2015-03-16T00:52:24,desktop,2452
2015-03-16T00:54:16,mobile,1654
2015-03-16T01:18:11,mobile,1720
2015-03-16T01:30:32,desktop,2288
2015-03-16T01:32:24,mobile,1609


In [0]:
# TEST - Run this cell to test your solution
cols = wikiWithOptionsDF.columns

dbTest("ET1-P-03-02-01", 7200000, wikiWithOptionsDF.count())

dbTest("ET1-P-03-02-02", True, "requests" in cols)
dbTest("ET1-P-03-02-03", True, "site" in cols)
dbTest("ET1-P-03-02-04", True, "timestamp" in cols)

print("Tests passed!")
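
The column checks above can be generalized into a small validation step, in the spirit of the Extract and Validate stage. The sketch below is a hypothetical helper (`missing_columns` is not part of the course's `dbTest` utilities) that relies only on the `columns` attribute of a DataFrame.

```python
def missing_columns(df, expected):
    """Return the names in `expected` that are absent from `df.columns`.

    An empty list means every expected column is present.
    """
    present = set(df.columns)
    return [name for name in expected if name not in present]
```

Calling `missing_columns(wikiWithOptionsDF, ["timestamp", "site", "requests"])` should return an empty list when the header option was applied, while the same call on `wikiDF` should list all three names because its only column is `_c0`.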

## Review

**Question:** What accounts for Spark's quick rise in popularity as an ETL tool?  
**Answer:** Spark easily accesses data virtually anywhere it lives, and the scalable framework lowers the difficulty of building connectors to access data.  Spark offers a unified API for connecting to data, which makes reads from a CSV file, JSON data, or a database, to name a few examples, nearly identical.  This allows developers to focus on writing their code rather than writing connectors.

**Question:** What is DBFS and why is it important?  
**Answer:** The Databricks File System (DBFS) allows access to scalable, fast, and distributed storage backed by S3 or the Azure Blob Store.

**Question:** How do you connect your Spark cluster to S3?  
**Answer:** By mounting it. Mounts require AWS credentials and give access to a virtually infinite store for your data. Using <a href="https://docs.databricks.com/user-guide/cloud-configurations/aws/iam-roles.html" target="_blank">AWS IAM roles</a> provides added security since your keys will not appear in log files.  One other option is to define your keys in a single notebook that only you have permission to access. Click the arrow next to a notebook in the Workspace tab to define access permissions.

**Question:** How do you specify parameters when reading data?  
**Answer:** Using `.option()` during your read allows you to pass key/value pairs specifying aspects of your read.  For instance, options for reading CSV data include `header`, `delimiter`, and `inferSchema`.

**Question:** What is the general design pattern for connecting to your data?  
**Answer:** The general design pattern is as follows:
1. Define the connection point.
2. Define connection parameters such as access credentials.
3. Add necessary options such as for headers or parallelization.

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Classroom-Cleanup<br>

Run the **`Classroom-Cleanup`** cell below to remove any artifacts created by this lesson.

In [0]:
%run "./Includes/Classroom-Cleanup"

## Next Steps

Start the next lesson, [Connecting to JDBC]($./04-Connecting-to-JDBC ).

## Additional Topics & Resources

**Q:** Where can I find more information on DBFS?  
**A:** <a href="https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html" target="_blank">Take a look at the Databricks documentation for more details</a>.

&copy; 2020 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>