# Welcome!

This is an intro-level notebook to walk you through core ideas of working with Spark (particularly PySpark) on Databricks. 

To get started, click the "Clusters" drop-down at top-left to attach to a running cluster (if none are running, go ahead and create one).

To execute each cell, you can:
* Press the "play" button at top-right
* On Mac, press `Shift+Return`

More details on Notebooks: [Use notebooks](https://docs.databricks.com/notebooks/notebooks-use.html#develop-notebooks)

# Working in Databricks

## Notebook Cells
Cells are small segments of code that simplify development and make debugging easier.
You can click the round `(+)` button between existing cells to add a new cell, drag existing cells around, and collapse, hide, or delete cells as needed.

## Magic Commands
Each cell can be a different supported language! This allows you to switch languages depending on the task at hand.

In [0]:
print("This is a python cell")

In [0]:
%sql
SELECT "This is a SQL cell"
FROM (VALUES (1))

This is a SQL cell
This is a SQL cell


If you don't use a magic command, the cell runs in the notebook's default language, visible at the top of the screen.

Other important magic commands:
* `%fs`: file system commands
* `%sh`: shell commands, e.g. `%sh ls`
* `%r`, `%scala`, `%sql`, `%python`: supported notebook languages
* `%run`: to run another notebook

## DBUtils
In addition to programming languages, you can use a set of "utility" functions for common tasks. These utilities are known as `dbutils` and can be accessed in notebooks via the `dbutils` object.

In [0]:
#Example: dbutils filesystem commands to see sample datasets 
dbutils.fs.ls("dbfs:/databricks-datasets")

In [0]:
#See all dbutils modules
dbutils.help()

# Spark in Databricks

## Databricks Runtime
Each Databricks cluster comes pre-installed with most of what you will need to do your job. The collection of libraries, environment settings, and other configs running on your cluster are collectively known as the **Databricks Runtime** (a.k.a. "DBR"). 

As a managed platform, Databricks continuously releases new runtimes with performance improvements and new features. You can learn more about the Databricks Runtime and see release notes here: [DBR Notes](https://docs.databricks.com/release-notes/runtime/releases.html)

To find out the Runtime your cluster is running, simply click the cluster selector at top-left of a notebook. This will show you physical details of the cluster (amount of memory, etc.) and the DBR version. For example, it might say `DBR 7.3 LTS ML`. You can then go to the release notes for that version to see what Python/R libraries are pre-installed, and learn about performance improvements and new functionality.

## Spark Session 
If you have set up Spark before, you have likely seen a similar command: `spark = SparkSession.builder.getOrCreate()`

In Databricks notebooks, each notebook **automatically** has a Spark session set up and ready after attaching to a cluster. Simply call the `spark` variable in Python to start interacting with PySpark!

In [0]:
spark

Each notebook gets its own Spark session by default. If you are sharing a cluster for development purposes with your colleagues, you do not need to worry about conflicting Spark sessions.

You can also **view** and **set** Spark configs for your current session directly from a notebook. Let's try it with a common config, the number of "shuffle partitions" used by our Spark jobs (`spark.sql.shuffle.partitions`):

In [0]:
spark.conf.get("spark.sql.shuffle.partitions") #Get the default value

In [0]:
spark.conf.set("spark.sql.shuffle.partitions",360) #Set the new value

In [0]:
spark.conf.get("spark.sql.shuffle.partitions") #Get the new value after setting

**Side Note:** Spark configs are very important to get the most out of Spark for big data processing. Although the defaults are a best attempt, you can drastically improve performance by tweaking these based on your data, cluster, and computing environment.

If you want to learn more, check out the available Spark tuning courses on academy.databricks.com.

# Interacting with Data

## Understanding Cloud-based filepaths
Just as you use a filepath when working locally (e.g. `C:/...`), cloud-based storage systems expose data through paths. These paths are often found in the cloud provider's console or provided to you by an admin or data engineer. For example, a path may start with `abfss://` on Azure ADLS Gen2 storage, or `s3://` on AWS S3 storage.
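
As a rough illustration of how such paths are structured, the sketch below splits two example URIs with Python's standard `urllib.parse`. The container, account, bucket, and file names are made-up placeholders, not real locations.

```python
from urllib.parse import urlparse

# Hypothetical example paths; every name in them is a placeholder.
adls_path = "abfss://mycontainer@myaccount.dfs.core.windows.net/raw/wine.csv"
s3_path = "s3://my-bucket/raw/wine.csv"

def describe(path):
    """Split a cloud storage URI into scheme, storage location, and object path."""
    parts = urlparse(path)
    return {
        "scheme": parts.scheme,      # which storage system to talk to
        "location": parts.netloc,    # container/account or bucket
        "object_path": parts.path,   # folder and file inside that location
    }

adls = describe(adls_path)
s3 = describe(s3_path)
print(adls)
print(s3)
```

The scheme tells Spark which storage connector to use; the rest identifies where the data lives inside that storage system.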

### DBFS
In addition to these cloud storage locations, **every Databricks workspace has a `dbfs:/` path**. This is the "Databricks File System", which acts as a "default" location for working files (and is actually backed by the same kind of cloud storage referenced above). You can use it for development purposes and shared file assets, but **DO NOT use DBFS for Production data** - instead, save production data to a cloud storage path (e.g. ADLS or S3) where you can properly manage and monitor access.

In [0]:
display(dbutils.fs.ls("dbfs:/databricks-datasets"))

path,name,size
dbfs:/databricks-datasets/COVID/,COVID/,0
dbfs:/databricks-datasets/README.md,README.md,976
dbfs:/databricks-datasets/Rdatasets/,Rdatasets/,0
dbfs:/databricks-datasets/SPARK_README.md,SPARK_README.md,3359
dbfs:/databricks-datasets/adult/,adult/,0
dbfs:/databricks-datasets/airlines/,airlines/,0
dbfs:/databricks-datasets/amazon/,amazon/,0
dbfs:/databricks-datasets/asa/,asa/,0
dbfs:/databricks-datasets/atlas_higgs/,atlas_higgs/,0
dbfs:/databricks-datasets/bikeSharing/,bikeSharing/,0


## Loading Data
In PySpark, we can load data using the Spark DataFrame "Reader" methods. For the rest of the notebook, we will use one of the Databricks Datasets that come loaded with every workspace.

[pyspark.sql.DataFrameReader](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader)

Common reader methods:
* `csv`, `text`, `delta`
* Example: `spark.read.format('json').load('<path>')`

In [0]:
dbutils.fs.ls("dbfs:/databricks-datasets/wine-quality")

In [0]:
wine_wrong = spark.read.csv("dbfs:/databricks-datasets/wine-quality/winequality-red.csv")
display(wine_wrong)

_c0
"""fixed acidity"";""volatile acidity"";""citric acid"";""residual sugar"";""chlorides"";""free sulfur dioxide"";""total sulfur dioxide"";""density"";""pH"";""sulphates"";""alcohol"";""quality"""
7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5
7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;9.8;5
11.2;0.28;0.56;1.9;0.075;17;60;0.998;3.16;0.58;9.8;6
7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.4;0.66;0;1.8;0.075;13;40;0.9978;3.51;0.56;9.4;5
7.9;0.6;0.06;1.6;0.069;15;59;0.9964;3.3;0.46;9.4;5
7.3;0.65;0;1.2;0.065;15;21;0.9946;3.39;0.47;10;7
7.8;0.58;0.02;2;0.073;9;18;0.9968;3.36;0.57;9.5;7


Uh-oh! We see this file has a header row (row 1) and is separated by `;` instead of commas. We can modify the CSV DataFrameReader call to account for this: add `header=True` and `sep=';'`.
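
To see what those two options change, here is a minimal sketch using Python's standard `csv` module rather than Spark. It parses the first lines of the raw file (copied from the output above), once with default settings and once with a `;` delimiter and a header row. It only mirrors the idea; Spark's own reader is what the next cell uses.

```python
import csv
import io

# First lines of the raw file, copied from the output above.
raw = (
    '"fixed acidity";"volatile acidity";"citric acid";"residual sugar";"chlorides";'
    '"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"\n'
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5\n"
)

# Default settings: comma delimiter, no header handling.
# Each line ends up as a single wide column, like `_c0` above.
naive_rows = list(csv.reader(io.StringIO(raw)))

# Semicolon delimiter, with the first line used as column names.
parsed_rows = list(csv.DictReader(io.StringIO(raw), delimiter=";"))

print(len(naive_rows[1]))         # columns found with the default delimiter
print(list(parsed_rows[0])[:3])   # first three column names
print(parsed_rows[0]["pH"])       # values are still strings at this point
```

Note that even after correct parsing every value is still text; data types are a separate concern.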

In [0]:
wine = spark.read.csv("dbfs:/databricks-datasets/wine-quality/winequality-red.csv", <TODO>)
display(wine)

fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
7.4,0.7,0.0,1.9,0.076,11.0,34,0.9978,3.51,0.56,9.4,5
7.8,0.88,0.0,2.6,0.098,25.0,67,0.9968,3.2,0.68,9.8,5
7.8,0.76,0.04,2.3,0.092,15.0,54,0.997,3.26,0.65,9.8,5
11.2,0.28,0.56,1.9,0.075,17.0,60,0.998,3.16,0.58,9.8,6
7.4,0.7,0.0,1.9,0.076,11.0,34,0.9978,3.51,0.56,9.4,5
7.4,0.66,0.0,1.8,0.075,13.0,40,0.9978,3.51,0.56,9.4,5
7.9,0.6,0.06,1.6,0.069,15.0,59,0.9964,3.3,0.46,9.4,5
7.3,0.65,0.0,1.2,0.065,15.0,21,0.9946,3.39,0.47,10.0,7
7.8,0.58,0.02,2.0,0.073,9.0,18,0.9968,3.36,0.57,9.5,7
7.5,0.5,0.36,6.1,0.071,17.0,102,0.9978,3.35,0.8,10.5,5


Note that wrapping a DataFrame (or any tabular object) in `display()` makes it easier to explore and more interactive than the standard Spark `.show()` method:

In [0]:
wine.show()

We now have a Spark DataFrame that we can use in PySpark!

## Saving Data
While you may be familiar with some common data formats (`.csv`, `.txt`, etc.), **not all file formats are equivalent!** Keep in mind these factors: 

* **R - Row or Column Store**: Column-based formats allow more data to be skipped and are generally faster.
* **C - Compression**: Compressing files saves time and money during data transmission and storage.
* **E - Schema Evolution**: Files that are self-describing keep the schema in the same place as the data (e.g. column names, data types) and can evolve over time as new columns are added.
* **S - Splittability**: Files that can be broken down into smaller chunks can be processed in parallel by multiple machines, leading to better performance.
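
The row-versus-column point can be sketched in plain Python. This is a toy model, not how any file format is actually implemented: the same three records (a subset of columns from the wine data above) are held once row by row and once column by column, and we count how many values must be visited to average a single column.

```python
# Three records from the wine data shown earlier (subset of columns).
row_store = [
    {"pH": 3.51, "alcohol": 9.4, "quality": 5},
    {"pH": 3.2, "alcohol": 9.8, "quality": 5},
    {"pH": 3.26, "alcohol": 9.8, "quality": 5},
]

# The same data laid out column by column.
column_store = {
    "pH": [3.51, 3.2, 3.26],
    "alcohol": [9.4, 9.8, 9.8],
    "quality": [5, 5, 5],
}

# Row layout: whole records are read, so every field is visited.
values_touched_row = sum(len(record) for record in row_store)
avg_ph_row = sum(record["pH"] for record in row_store) / len(row_store)

# Column layout: only the pH values are visited; other columns are skipped.
values_touched_col = len(column_store["pH"])
avg_ph_col = sum(column_store["pH"]) / len(column_store["pH"])

print(values_touched_row, values_touched_col)
```

With many columns and many rows, skipping the unused columns is where column-based formats save most of their time.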

Databricks open-sourced [Delta Lake](https://databricks.com/product/delta-lake-on-databricks), a "format" that brings reliability, performance, and lifecycle management to data lakes. We recommend using Delta as your file format when working with Spark. This can be specified in read/write operations with: `format('delta')`

We will take the above data and save it to a Delta table. But first, we must use a very common PySpark method called `withColumnRenamed()` to remove spaces in the column headers:

In [0]:
# We could do something like this, replacing each column name 1-by-1...
wine_slow = wine.withColumnRenamed("fixed acidity","fixed_acidity")
display(wine_slow)

fixed_acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
7.4,0.7,0.0,1.9,0.076,11.0,34,0.9978,3.51,0.56,9.4,5
7.8,0.88,0.0,2.6,0.098,25.0,67,0.9968,3.2,0.68,9.8,5
7.8,0.76,0.04,2.3,0.092,15.0,54,0.997,3.26,0.65,9.8,5
11.2,0.28,0.56,1.9,0.075,17.0,60,0.998,3.16,0.58,9.8,6
7.4,0.7,0.0,1.9,0.076,11.0,34,0.9978,3.51,0.56,9.4,5
7.4,0.66,0.0,1.8,0.075,13.0,40,0.9978,3.51,0.56,9.4,5
7.9,0.6,0.06,1.6,0.069,15.0,59,0.9964,3.3,0.46,9.4,5
7.3,0.65,0.0,1.2,0.065,15.0,21,0.9946,3.39,0.47,10.0,7
7.8,0.58,0.02,2.0,0.073,9.0,18,0.9968,3.36,0.57,9.5,7
7.5,0.5,0.36,6.1,0.071,17.0,102,0.9978,3.35,0.8,10.5,5


In [0]:
# Or we can mix our Python knowledge with Spark to improve efficiency!
new_names = [column.replace(" ","_") for column in wine.columns]
new_wine = wine.toDF(*new_names)
display(new_wine)

fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,alcohol,quality
7.4,0.7,0.0,1.9,0.076,11.0,34,0.9978,3.51,0.56,9.4,5
7.8,0.88,0.0,2.6,0.098,25.0,67,0.9968,3.2,0.68,9.8,5
7.8,0.76,0.04,2.3,0.092,15.0,54,0.997,3.26,0.65,9.8,5
11.2,0.28,0.56,1.9,0.075,17.0,60,0.998,3.16,0.58,9.8,6
7.4,0.7,0.0,1.9,0.076,11.0,34,0.9978,3.51,0.56,9.4,5
7.4,0.66,0.0,1.8,0.075,13.0,40,0.9978,3.51,0.56,9.4,5
7.9,0.6,0.06,1.6,0.069,15.0,59,0.9964,3.3,0.46,9.4,5
7.3,0.65,0.0,1.2,0.065,15.0,21,0.9946,3.39,0.47,10.0,7
7.8,0.58,0.02,2.0,0.073,9.0,18,0.9968,3.36,0.57,9.5,7
7.5,0.5,0.36,6.1,0.071,17.0,102,0.9978,3.35,0.8,10.5,5
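
Spaces are not the only characters that can cause trouble in column names. As a hedged sketch, the helper below (a hypothetical `sanitize` function, not part of PySpark) replaces each character that Delta is assumed to reject, based on the set listed in its error message, with an underscore.

```python
import re

# Characters assumed to be rejected in Delta column names, taken from the
# set quoted in Delta's error message: space , ; { } ( ) newline tab =
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")

def sanitize(name):
    """Hypothetical helper: replace each rejected character with an underscore."""
    return INVALID_CHARS.sub("_", name)

old_names = ["fixed acidity", "free sulfur dioxide", "pH"]
new_names = [sanitize(name) for name in old_names]
print(new_names)
```

In the notebook this would slot into the same pattern as above, e.g. `wine.toDF(*[sanitize(c) for c in wine.columns])`.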


Let's practice by saving this file to a temporary DBFS location.

In [0]:
#Extract the current user's name from their email address
user=dbutils.notebook.entry_point.getDbutils().notebook().getContext().tags().apply('user').split("@")[0].split(".")[0]
#Build filepath for temporary saving of data; we will clean this up later
save_path = "dbfs:/tmp/"+user

new_wine.write.format("delta").mode("overwrite").save(save_path)

Note the use of `mode("overwrite")`. This will "overwrite" the existing table at that location. Supported write modes:

* `append`: Append contents of this DataFrame to existing data.

* `overwrite`: Overwrite existing data.

* `error` or `errorifexists`: Throw an exception if data already exists.

* `ignore`: Silently ignore this operation if data already exists.
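
The four modes differ only in what happens when data already exists at the target. The toy model below makes that explicit in plain Python; the `write` function and the `storage` dict are stand-ins invented for this sketch and are not Spark APIs.

```python
def write(storage, path, rows, mode="error"):
    """Toy model of the save modes; `storage` is a dict standing in for cloud storage."""
    exists = path in storage
    if mode == "append":
        storage[path] = storage.get(path, []) + list(rows)
    elif mode == "overwrite":
        storage[path] = list(rows)
    elif mode in ("error", "errorifexists"):
        if exists:
            raise FileExistsError(f"data already exists at {path}")
        storage[path] = list(rows)
    elif mode == "ignore":
        if not exists:
            storage[path] = list(rows)
    else:
        raise ValueError(f"unknown mode: {mode}")

storage = {}
write(storage, "dbfs:/tmp/demo", [1, 2])                    # nothing there yet, so it succeeds
write(storage, "dbfs:/tmp/demo", [3], mode="append")        # existing rows are kept
write(storage, "dbfs:/tmp/demo", [9], mode="ignore")        # data exists, so nothing happens
after_ignore = list(storage["dbfs:/tmp/demo"])
write(storage, "dbfs:/tmp/demo", [7, 8], mode="overwrite")  # existing rows are replaced
print(after_ignore, storage["dbfs:/tmp/demo"])
```

Calling `write` again with `mode="error"` at this point would raise, which is why re-running a notebook cell usually needs `overwrite` or `append`.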

### Exercise: List out filepath
Use the variable `save_path` and `dbutils` to look at your actual files in cloud storage.

In [0]:
dbutils.<TODO>

## (aside) "Mounts"/mnt
A Databricks "Mount" (aka `mnt`) is simply a shortcut to a cloud storage path. Instead of referencing a very long and complex file path on cloud storage, you simply call the mount. For example:

Instead of:
> `spark.read.csv("adl://<myadlsfolder>.azuredatalakestore.net/Path/To/My/Very/Nested/Files/MyData.csv")`

You can mount that location starting at `Files`, then refer to it simply by:
> `spark.read.csv("/mnt/MyData.csv")`

To learn more about Mounts, refer here: [Databricks File System](https://docs.databricks.com/data/databricks-file-system.html). A workspace administrator will typically be required to set up mounts.
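
Conceptually, a mount is a lookup from a short prefix to a long storage path. The sketch below models that lookup in plain Python; the mount name `/mnt/mydata` and the `resolve` function are hypothetical and only illustrate the idea, they are not how Databricks implements mounts.

```python
# Hypothetical mount table: mount point -> cloud storage location.
mounts = {
    "/mnt/mydata": "adl://<myadlsfolder>.azuredatalakestore.net/Path/To/My/Very/Nested/Files",
}

def resolve(path):
    """Expand a /mnt/... path to the full cloud storage path it stands for."""
    for mount_point, target in mounts.items():
        if path == mount_point or path.startswith(mount_point + "/"):
            return target + path[len(mount_point):]
    return path  # not under any mount; leave unchanged

full_path = resolve("/mnt/mydata/MyData.csv")
print(full_path)
```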

# Data Manipulation Basics

## DataFrames Intro
Thinking tabular: if you have never worked with "dataframes" or tabular data, it helps to have a mental model. Think **rows** and **columns**!

![tabular data](http://exceltutorialworld.com/wp-content/uploads/2017/10/Capture-30.png)

This is an Excel file, but many datasets (both relational and otherwise) can be thought of as tabular.

If you have heard of Spark before, you may have heard of RDDs, or "Resilient Distributed Datasets". While these were important for early versions of Spark, almost all development now happens on Spark DataFrames, a higher-level abstraction built on RDDs. **You should focus your learning efforts on the DataFrame APIs**.

## Differences between SQL and PySpark 
SQL is a syntax; PySpark is an interface to the Spark engine. 

Do you prefer SQL instead? Great! In Databricks you can complete many of the same operations with `%sql SELECT * FROM...` as you can in a traditional query tool. In PySpark, you can also use `spark.sql("SELECT * FROM...")` for a SQL-like interface from Python.

Fun fact: the performance of SQL, PySpark, and Scala Spark is nearly identical in most circumstances. They all compile to the same execution plan!
![DataFrame comparison](https://www.oreilly.com/library/view/learning-pyspark/9781786463708/graphics/B05793_03_03.jpg)

## Differences between Pandas and Spark DataFrames
Many data practitioners start working with tabular data in Pandas, a popular Python library. Let's compare to PySpark:


|                | pandas DataFrame                  | Spark DataFrame                                                     |
| -------------- | --------------------------------- | ------------------------------------------------------------------- |
| Computation    | Eager                             | Lazy                                                                |
| Column         | df\['col'\]                       | df\['col'\]                                                         |
| Mutability     | Mutable                           | Immutable                                                           |
| Add a column   | df\['c'\] = df\['a'\] + df\['b'\] | df.withColumn('c', df\['a'\] + df\['b'\])                           |
| Rename columns | df.columns = \['a','b'\]          | df.select(df\['c1'\].alias('a'), df\['c2'\].alias('b'))             |
| Value count    | df\['col'\].value\_counts()       | df.groupBy(df\['col'\]).count().orderBy('count', ascending = False) |
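
The "Eager" versus "Lazy" row is the one that surprises people most. As a loose analogy in plain Python (not Spark itself), a list comprehension does its work immediately, while a generator only describes the work until something consumes it, much like a Spark transformation waits for an action.

```python
log = []

def double(x):
    log.append(x)      # record each time real work happens
    return x * 2

values = [1, 2, 3]

# Eager (pandas-like): the work runs as soon as the line executes.
eager_result = [double(v) for v in values]
calls_after_eager = len(log)

# Lazy (Spark-like): this only describes the work; nothing runs yet.
lazy_plan = (double(v) for v in values)
calls_after_plan = len(log)

# An "action" (here, list()) triggers the computation.
lazy_result = list(lazy_plan)
calls_after_action = len(log)

print(calls_after_eager, calls_after_plan, calls_after_action)
```

In Spark, methods such as `withColumn()` and `groupBy()` play the role of the plan, and calls such as `display()`, `show()`, or `count()` play the role of the action.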

## (aside) Koalas for Pandas on Spark ![Koalas](https://koalas.readthedocs.io/en/latest/_static/koalas-logo-docs.png)
Koalas is a "pandas-like" library to interact with data. It uses Spark as a backend, allowing massive datasets to be processed but with the convenience of **pandas** APIs. To learn more: https://koalas.readthedocs.io/en/latest/index.html

## Transformations
To get started, we will do a few aggregations on our wine DataFrame. For consistency, let's load from our save path:

In [0]:
#Uncomment and re-run to create the path if needed
#user=dbutils.notebook.entry_point.getDbutils().notebook().getContext().tags().apply('user').split("@")[0].split(".")[0]
#save_path = "dbfs:/tmp/"+user

data = spark.read.format("delta").load(save_path)
display(data)

fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,alcohol,quality
7.4,0.7,0.0,1.9,0.076,11.0,34,0.9978,3.51,0.56,9.4,5
7.8,0.88,0.0,2.6,0.098,25.0,67,0.9968,3.2,0.68,9.8,5
7.8,0.76,0.04,2.3,0.092,15.0,54,0.997,3.26,0.65,9.8,5
11.2,0.28,0.56,1.9,0.075,17.0,60,0.998,3.16,0.58,9.8,6
7.4,0.7,0.0,1.9,0.076,11.0,34,0.9978,3.51,0.56,9.4,5
7.4,0.66,0.0,1.8,0.075,13.0,40,0.9978,3.51,0.56,9.4,5
7.9,0.6,0.06,1.6,0.069,15.0,59,0.9964,3.3,0.46,9.4,5
7.3,0.65,0.0,1.2,0.065,15.0,21,0.9946,3.39,0.47,10.0,7
7.8,0.58,0.02,2.0,0.073,9.0,18,0.9968,3.36,0.57,9.5,7
7.5,0.5,0.36,6.1,0.071,17.0,102,0.9978,3.35,0.8,10.5,5


Let's get a count of the number of records for each `quality` value. We use the `groupBy()` method to return a grouped DataFrame, then `count()` to count the number of rows in each group.

In [0]:
display(data.groupBy("quality").count())

quality,count
3,10
7,199
5,681
4,53
8,18
6,638
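
If `groupBy().count()` feels abstract, this plain-Python sketch does the same thing on a small scale using `collections.Counter`. It uses only the `quality` values of the ten rows displayed earlier, so the totals are much smaller than the full-table counts above.

```python
from collections import Counter

# `quality` values of the ten rows displayed earlier.
quality = [5, 5, 5, 6, 5, 5, 5, 7, 7, 5]

# One entry per distinct value, like one output row per group.
counts = Counter(quality)

for value, n in sorted(counts.items()):
    print(value, n)
```

Spark computes the same result, but splits the rows across the cluster and combines the partial counts for each group.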


However, if we look at the schema of our original DataFrame, we see that all the columns have been created as `string` types:

In [0]:
data.printSchema()

**Takeaway: always consider the data types you are operating on in Spark** 
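
A small plain-Python sketch shows why this matters. Text values are compared character by character, so ordering and maximums can come out differently than they would for numbers. The values below are `alcohol` readings from the rows shown earlier.

```python
# `alcohol` values from the rows above, once as strings and once as numbers.
as_strings = ["9.4", "9.8", "10", "10.5"]
as_doubles = [float(v) for v in as_strings]

# Strings compare character by character, so "9.8" ranks above "10.5".
max_string = max(as_strings)
max_double = max(as_doubles)

print(max_string, max_double)
```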

To do further aggregations, let's build a quick function to convert every column to a numeric value of type `double`:

In [0]:
from pyspark.sql.functions import col
def cast_all_to_double(input_df):
  return input_df.select([col(col_name).cast("double") for col_name in input_df.columns])

data_num = data.transform(cast_all_to_double)
display(data_num)

fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,alcohol,quality
7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5.0
7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5.0
7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5.0
11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6.0
7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5.0
7.4,0.66,0.0,1.8,0.075,13.0,40.0,0.9978,3.51,0.56,9.4,5.0
7.9,0.6,0.06,1.6,0.069,15.0,59.0,0.9964,3.3,0.46,9.4,5.0
7.3,0.65,0.0,1.2,0.065,15.0,21.0,0.9946,3.39,0.47,10.0,7.0
7.8,0.58,0.02,2.0,0.073,9.0,18.0,0.9968,3.36,0.57,9.5,7.0
7.5,0.5,0.36,6.1,0.071,17.0,102.0,0.9978,3.35,0.8,10.5,5.0


**Takeaway: you may need to import additional functions (e.g. `from pyspark.sql.functions import col`) for certain DataFrame operations.**

Now it is your turn. 

### Exercise: Calculate the Average value of `pH` and `chlorides` by wine quality. 
Refer to the PySpark documentation to determine the syntax: [PySpark SQL Docs](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#module-pyspark.sql.functions)

*Bonus: show the result in a single DataFrame, and rename the resulting columns as `avg_pH` and `avg_chlorides`.*

In [0]:
data_num.<TODO>

quality,avg(pH),avg(chlorides)
7.0,3.290753768844219,0.0765879396984924
3.0,3.398,0.1225
5.0,3.304948604992654,0.0927356828193832
8.0,3.2672222222222214,0.0684444444444444
4.0,3.381509433962264,0.0906792452830188
6.0,3.318072100313484,0.0849561128526645


## Spark Functions
If at all possible, it is best practice to leverage existing PySpark functions rather than writing your own. Each PySpark function is written to be highly parallelized and efficient. Any function written in generic Python will likely not be as performant, especially on large datasets. 

Common built-in functions:
* [Data type/conversion functions](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#module-pyspark.sql.types)
* [Date time functions](https://docs.databricks.com/spark/latest/dataframes-datasets/dates-timestamps.html)
* [Math functions](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.sqrt) (e.g. square root)
* [Aggregate functions](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#module-pyspark.sql.functions)
* [Data cleansing functions](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions) (e.g. remove nulls)

One option for parallelizing logic that does not exist as a built-in PySpark function is a pandas UDF (`pandas_udf`). These use efficient serialization techniques (Apache Arrow) to parallelize processing across a Spark cluster. [Learn more here](https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html)

# Automation & Monitoring

## DBU Types

When working in Databricks it is important to consider the **cluster type** you are working on: 
* `Interactive`: "submit a command, get an immediate response" for interactive development purposes. Charged at a *higher* DBU rate. This is what we're doing when working in this notebook!
* `Automated`: "run this code with no human in the loop" for recurring job runs. Charged at a *lower* DBU rate. 

In general, it is a best practice to first develop your code on `Interactive` clusters, then move it to `Automated` clusters as recurring scheduled jobs. To do this, you can schedule directly from your notebook using the `Schedule` button at top-right, or use the `Jobs` tab on the left.

## Jobs Tab
When moving workloads to an `Automated` cluster, you can configure several parameters. Remember that an `Automated` cluster is "ephemeral", just to run your bit of code. This is a highly efficient way to rent cloud computing resources, and you have full control over the exact cluster parameters and configuration to get the best performance. 

* **Schedule**: on what recurring schedule should the code execute, using a simple schedule UI or CRON syntax.
* **Notifications**: set email alerts for job starting, completing, or errors
* **Virtual Machine types**: set cluster compute resources and determine the right hardware for your job
* **Autoscaling parameters**: you can allow the cluster to "autoscale" up and down depending on the number of Spark tasks waiting
* **Logs**: after each run of a job, the Driver, Spark, and any custom logs are available to refer back to later
* **Initialization scripts**: a.k.a "init scripts", these allow you to highly customize your runtime environment with bash scripting
* Many more settings, including permissions, Container services, and runtimes...
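
To make these settings more concrete, here is a sketch of what a job definition might look like as a Python dictionary. Every value is a placeholder (job name, email address, notebook path, node type), and the field names are written from memory of the Databricks Jobs REST API (2.0), so check them against the current API reference before relying on them.

```python
import json

# Hypothetical job settings; all values are placeholders. Field names are
# assumed to follow the Databricks Jobs REST API (2.0).
job_settings = {
    "name": "nightly-wine-aggregation",
    "schedule": {
        # Quartz cron fields: second minute hour day-of-month month day-of-week
        "quartz_cron_expression": "0 0 6 * * ?",  # every day at 06:00
        "timezone_id": "UTC",
    },
    "email_notifications": {"on_failure": ["someone@example.com"]},
    "new_cluster": {
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "autoscale": {"min_workers": 2, "max_workers": 8},
    },
    "notebook_task": {"notebook_path": "/Users/someone@example.com/my-notebook"},
}

# Serialize to JSON, the form a REST API call would send.
payload = json.dumps(job_settings, indent=2)
print(payload)
```

The same choices appear as form fields in the `Jobs` tab, so nothing here needs to be written by hand to schedule a notebook.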

## Cost Efficiency Considerations
In addition to the parameters for a single "job", your team can achieve cost efficiency goals with several additional Databricks features:

* **Instance Pools**: clusters can pull from a "pool" of warm instances, drastically reducing cluster launch and scaling times.
* **High Concurrency Clusters**: multiple users can efficiently "share" cluster compute resources with features like task pre-emption and increased session isolation
* **Autoscaling**: Databricks has proprietary features that will autoscale clusters up and down depending on the work waiting to be completed
* **Autotermination**: clusters with no activity will self-terminate after a period of time. 
* **Tagging**: assign cluster "tags" that propagate back to reporting on the cloud provider to monitor and manage costs

# Cleanup

Thanks for going through this notebook! To learn more, be sure to check out customer training available at [Databricks Academy](https://academy.databricks.com/pathway/INT-AL-FREE-SP)!
Coupon Code: `DB_CE` at checkout

In [0]:
#File cleanup of Wine Delta table created
dbutils.fs.rm(save_path, recurse=True)