## **Delta Lake vs Parquet**

- **Parquet** is a columnar storage file format, excellent for read-heavy workloads, but it is just a file. It has no concept of transactions, no way to update or delete rows in place, and no table-level metadata management (each file only describes itself through its own footer).

- **Delta Lake** is built on top of Parquet but adds a **`_delta_log/`** folder: the transaction log. This is what gives Delta its superpowers.

💡 Key Analogy  
Parquet is a file. Delta is a table system that uses Parquet files internally — similar to how a database uses raw data files but manages them with a layer of intelligence on top.


![image_1771592875617.png](./image_1771592875617.png "image_1771592875617.png")
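To make the "table system" idea concrete, here is a minimal sketch of how a reader could work out which Parquet files belong to the current table version by replaying the commits in `_delta_log/`. It assumes a simplified log in which each commit is a list of JSON lines holding only `add` or `remove` actions keyed by `path`. Real Delta logs also contain `metaData`, `protocol`, and `commitInfo` actions, plus Parquet checkpoints, and this sketch ignores all of them. The file names are hypothetical.

```python
import json

def active_files(commits):
    """Replay simplified Delta commits in version order and return the set of live file paths."""
    live = set()
    for commit in commits:          # commits[0] is version 0, commits[1] is version 1, ...
        for line in commit:         # one JSON action per line, as in _delta_log/*.json
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live

# Hypothetical history: version 0 writes two files, version 1 overwrites them with one new file.
commits = [
    ['{"add": {"path": "part-00000-a.parquet"}}',
     '{"add": {"path": "part-00001-a.parquet"}}'],
    ['{"remove": {"path": "part-00000-a.parquet"}}',
     '{"remove": {"path": "part-00001-a.parquet"}}',
     '{"add": {"path": "part-00000-b.parquet"}}'],
]
print(sorted(active_files(commits)))  # ['part-00000-b.parquet']
```

All three Parquet files would still sit in the directory. Only the log says which one is current, so a raw directory listing and Delta's own file count can disagree.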




## **The Small File Problem**

Every time you perform a small append or streaming write, Spark creates a new Parquet file inside your Delta table directory. Over time, this leads to thousands of tiny files instead of a manageable number of larger ones.

**Why Small Files Hurt Performance**
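- Task overhead: every file must be scheduled and opened. Spark packs small files together into read partitions, but each file still adds a fixed open cost, so 10,000 tiny files produce far more tasks than a few large files holding the same data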
- Metadata cost: Spark must open and read the footer of every Parquet file to find schema and statistics
- Network overhead: More files = more I/O round trips to cloud storage (S3, ADLS, GCS)
- Driver bottleneck: The Spark driver must track every task; too many small tasks can crash it
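You can see the task and open-cost overhead roughly by mimicking how Spark packs files into read partitions. The sketch below is a simplified model of Spark's next-fit-decreasing file packing. Each file is charged its size plus a fixed open cost, using the defaults of `spark.sql.files.maxPartitionBytes` (128 MB) and `spark.sql.files.openCostInBytes` (4 MB). It assumes no file is larger than the split size and fixes the split size at 128 MB. Real Spark may pick a smaller split size depending on cluster parallelism, so treat the numbers as illustrative.

```python
MB = 1024 * 1024
MAX_PARTITION_BYTES = 128 * MB  # default of spark.sql.files.maxPartitionBytes
OPEN_COST_BYTES = 4 * MB        # default of spark.sql.files.openCostInBytes

def count_read_partitions(file_sizes, max_split=MAX_PARTITION_BYTES, open_cost=OPEN_COST_BYTES):
    """Simplified next-fit-decreasing packing of whole files into read partitions (one task each)."""
    partitions = 0
    current = 0        # bytes (including open cost) charged to the partition being filled
    has_files = False  # whether the current partition holds any file yet
    for size in sorted(file_sizes, reverse=True):
        if has_files and current + size > max_split:
            partitions += 1              # close the full partition and start a new one
            current, has_files = 0, False
        current += size + open_cost      # every file pays its size plus a fixed open cost
        has_files = True
    return partitions + (1 if has_files else 0)

KB = 1024
tiny = [10 * KB] * 10_000        # 10,000 files of 10 KB (about 98 MB in total)
one_big = [10 * KB * 10_000]     # the same bytes in a single file
print(count_read_partitions(tiny))     # 313
print(count_read_partitions(one_big))  # 1
```

The same ~98 MB of data needs 313 tasks instead of 1 because the 4 MB open cost charged per file dwarfs each file's 10 KB payload.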

**Common Causes of Small Files**

1. **Frequent small appends:** writing a few rows every few minutes
2. **Streaming jobs:** each micro-batch creates new files
3. **High-cardinality partitioning:** partitioning by user_id can create millions of tiny partition folders
4. **Auto Loader / Kafka consumers:** continuous ingestion with small trigger intervals

> **Note:**  
> A common guideline is 128 MB to 1 GB per Parquet file. If your files are consistently smaller than 32 MB, you likely have a small file problem, and OPTIMIZE is the usual cure.
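You can turn the rule of thumb in the note into a quick check over a list of file sizes, for example sizes collected from a directory listing. The 32 MB threshold is the note's guideline, not a value Delta enforces. The function name and its return fields are hypothetical. "Consistently smaller" is read here as "more than half the files are small", which is also an assumption.

```python
MB = 1024 * 1024
SMALL_FILE_BYTES = 32 * MB  # guideline from the note: files below this count as "small"

def small_file_report(sizes):
    """Summarize file sizes (in bytes) against the small-file guideline."""
    if not sizes:
        return {"files": 0, "avg_mb": 0.0, "small_ratio": 0.0, "small_file_problem": False}
    small = sum(1 for s in sizes if s < SMALL_FILE_BYTES)
    ratio = small / len(sizes)
    return {
        "files": len(sizes),
        "avg_mb": round(sum(sizes) / len(sizes) / MB, 2),
        "small_ratio": ratio,
        # assumption: a majority of small files counts as "consistently smaller than 32 MB"
        "small_file_problem": ratio > 0.5,
    }

print(small_file_report([9169, 9298, 22384]))   # tiny files: small_file_problem is True
print(small_file_report([256 * MB, 512 * MB]))  # healthy sizes: small_file_problem is False
```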


## **OPTIMIZE Command**

The **OPTIMIZE** command compacts many small Parquet files into fewer, larger files.  
By default it targets roughly 1 GB per output file. The replaced small files are recorded as removed in the transaction log but stay on storage, still available for time travel, until VACUUM deletes them.
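At its core, compaction is a bin-packing step. Small files are grouped so that each group adds up to roughly the target size, and each group is then rewritten as one file. The sketch below only plans those groups, using a simple first-fit-decreasing heuristic with a 1 GB target. It is not Delta's implementation, which is more involved: for example, it works per partition and leaves files that are already large enough alone. The input sizes are hypothetical.

```python
GB = 1024 ** 3
MB = 1024 ** 2
TARGET_FILE_BYTES = 1 * GB  # roughly OPTIMIZE's default output file size

def plan_compaction(file_sizes, target=TARGET_FILE_BYTES):
    """Group file sizes into bins of at most `target` bytes (first-fit decreasing).

    Each bin stands for one output file that would replace the files in it.
    Files already at or above the target are left out of the plan.
    """
    bins = []  # each bin is a list of input file sizes
    for size in sorted((s for s in file_sizes if s < target), reverse=True):
        for b in bins:
            if sum(b) + size <= target:  # the first bin with room wins
                b.append(size)
                break
        else:
            bins.append([size])          # no bin has room: open a new output file
    return bins

small_files = [30 * MB] * 100  # 100 files of 30 MB, about 3 GB in total
plan = plan_compaction(small_files)
print(len(plan))                      # 3 output files
print([sum(b) // MB for b in plan])   # [1020, 1020, 960]
```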

**Basic OPTIMIZE Syntax**

```sql
-- Basic OPTIMIZE (compaction only)
OPTIMIZE table_name;

-- OPTIMIZE with ZORDER (compaction + data co-location)
OPTIMIZE table_name ZORDER BY (column1, column2);

-- OPTIMIZE a subset of the table (the WHERE clause may only reference partition columns)
OPTIMIZE table_name WHERE order_date = '2024-01-01';
```



## **What is ZORDER?**
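- ZORDER rewrites the data so that rows with similar values in the chosen columns are physically stored together in the same files. With several columns, it uses a Z-order curve to balance them.
- When you later run `WHERE city = 'Mumbai'`, Spark can skip most files entirely, because Delta's per-file min/max statistics show that those files cannot contain 'Mumbai'.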
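A Z-order curve works by bit interleaving. The bits of two or more column values are woven into a single key, and sorting by that key keeps rows that are close in every column close together on disk. The sketch below interleaves two small non-negative integers. It is a textbook Morton code for illustration only. Delta Lake's own implementation is more involved: it first maps each column's values to range buckets before interleaving, and that step is not modeled here.

```python
def morton_key(x, y, bits=8):
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key.

    Bit i of x goes to position 2*i, and bit i of y goes to position 2*i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

# A 4x4 grid of (x, y) points, e.g. bucketed values of two filter columns.
points = [(x, y) for x in range(4) for y in range(4)]
z_sorted = sorted(points, key=lambda p: morton_key(*p))
print(z_sorted[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]: one 2x2 block stored together
```

If the sorted rows are split into files of four, each file covers a compact 2x2 block, so its min/max range is narrow in both columns. Sorting by x alone would instead give files that are narrow in x but span every value of y.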

| **Scenario**                                      | **Recommendation**                               |
|---------------------------------------------------|--------------------------------------------------|
| **Column used in WHERE filters, low cardinality** | Partition by that column (large tables)          |
| **Column used for date ranges**                   | Partition by date, then ZORDER by another column |
| **High cardinality column (user_id)**             | ZORDER — do NOT partition                        |
| **After bulk load**                               | Always run OPTIMIZE                              |
| **Streaming pipeline**                            | Schedule OPTIMIZE every 1–4 hours                |

## **Basic Performance Thinking**
Good Delta performance comes from making it easy for Spark to skip data it doesn't need.    
The three tools for this are **partitioning, ZORDER**, and **OPTIMIZE** — each works at a different granularity.
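Data skipping relies on per-file statistics. Delta records column min/max values for each file in the transaction log, and a query can skip any file whose range cannot contain the filter value. Partitioning and ZORDER help by keeping those ranges narrow. Below is a minimal sketch with hypothetical file names and string ranges for a `city` column.

```python
from dataclasses import dataclass

@dataclass
class FileStats:
    path: str
    min_city: str
    max_city: str

def files_to_scan(stats, city):
    """Keep only files whose [min, max] range for `city` could contain the value."""
    return [f.path for f in stats if f.min_city <= city <= f.max_city]

# Hypothetical stats: the first layout is well clustered, the second is not.
clustered = [FileStats("f1.parquet", "Agra", "Delhi"),
             FileStats("f2.parquet", "Goa", "Mumbai"),
             FileStats("f3.parquet", "Nagpur", "Pune")]
unclustered = [FileStats("g1.parquet", "Agra", "Pune"),
               FileStats("g2.parquet", "Bhopal", "Surat"),
               FileStats("g3.parquet", "Chennai", "Varanasi")]

print(files_to_scan(clustered, "Mumbai"))    # ['f2.parquet']
print(files_to_scan(unclustered, "Mumbai"))  # all three files must be read
```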

**Partitioning vs ZORDER vs OPTIMIZE vs VACUUM**

| **Technique**    | **Works At**    | **Best For**                     | **Caution**                           |
|------------------|-----------------|----------------------------------|---------------------------------------|
| **Partitioning** | Directory level | Date, Region — low cardinality   | Never use high-cardinality columns    |
| **ZORDER**       | File level      | Frequently filtered, high-cardinality columns | Runs only as part of OPTIMIZE (ZORDER BY) |
| **OPTIMIZE**     | File size       | Reducing file count after writes | Run after bulk loads, not every write |
| **VACUUM**       | Disk cleanup    | Removing old file versions       | Keep retention at 7+ days (the default) to preserve time travel |
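VACUUM's rule can be approximated as follows. A data file is deleted only if the current table version no longer references it and it was removed longer ago than the retention period (7 days by default, set by the `delta.deletedFileRetentionDuration` table property). This is a simplified sketch with a hypothetical input format. Real VACUUM also deals with files the log never tracked and other details that are ignored here.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # default delta.deletedFileRetentionDuration (1 week)

def vacuum_candidates(files, now, retention=RETENTION):
    """Return the paths VACUUM would delete under this simplified rule.

    `files` maps path -> removal time, or None if the file is still part of
    the current table version (hypothetical input format).
    """
    cutoff = now - retention
    return sorted(path for path, removed_at in files.items()
                  if removed_at is not None and removed_at < cutoff)

now = datetime(2024, 1, 15)
files = {
    "part-live.parquet": None,                     # still referenced: never deleted
    "part-old.parquet": datetime(2024, 1, 1),      # removed 14 days ago: deletable
    "part-recent.parquet": datetime(2024, 1, 12),  # removed 3 days ago: kept for time travel
}
print(vacuum_candidates(files, now))  # ['part-old.parquet']
```

Once `part-old.parquet` is deleted, time travel to any version that still needed it stops working. That is the trade-off behind shortening the retention period.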



💡**Golden Rules**

- `Target file size:` 128 MB to 1 GB per Parquet file
- `Partition columns:` Low to medium cardinality only (date, region, status)
- `Do NOT partition by:` user_id, session_id, transaction_id, which create too many directories
- `ZORDER columns:` High-cardinality columns you filter on frequently
- `OPTIMIZE timing:` After every significant batch write; never on every single small append


## **Day 01 PRACTICAL TASKS**

**Task 1: Convert CSV to Delta Format**

In [0]:
df_nov = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/Volumes/workspace/ecommerce/ecommerce_data/2019-Nov.csv")
df_nov.show(5)

+-------------------+----------+----------+-------------------+--------------------+------+------+---------+--------------------+
|         event_time|event_type|product_id|        category_id|       category_code| brand| price|  user_id|        user_session|
+-------------------+----------+----------+-------------------+--------------------+------+------+---------+--------------------+
|2019-11-01 00:00:00|      view|   1003461|2053013555631882655|electronics.smart...|xiaomi|489.07|520088904|4d3b30da-a5e4-49d...|
|2019-11-01 00:00:00|      view|   5000088|2053013566100866035|appliances.sewing...|janome|293.65|530496790|8e5f4f83-366c-4f7...|
|2019-11-01 00:00:01|      view|  17302664|2053013553853497655|                NULL| creed| 28.31|561587266|755422e7-9040-477...|
|2019-11-01 00:00:01|      view|   3601530|2053013563810775923|appliances.kitche...|    lg|712.87|518085591|3bfb58cd-7892-48c...|
|2019-11-01 00:00:01|      view|   1004775|2053013555631882655|electronics.smart...|xiaomi

In [0]:
df_oct = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/Volumes/workspace/ecommerce/ecommerce_data/2019-Oct.csv")

In [0]:
events = df_oct.union(df_nov)

In [0]:
# Convert CSV to Delta format
events.write.format("delta").mode("overwrite").save("/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events")
print("✅ CSV successfully converted to Delta format")

# Verify: list the files created
display(dbutils.fs.ls("/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events/"))
# we'll see .parquet files + a _delta_log/ folder

✅ CSV successfully converted to Delta format


path,name,size,modificationTime
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events/_delta_log/,_delta_log/,0,1771601486850
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events/part-00000-94e158ba-a530-4a6f-87ae-28297e20f4f5.c000.snappy.parquet,part-00000-94e158ba-a530-4a6f-87ae-28297e20f4f5.c000.snappy.parquet,33620245,1771601254000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events/part-00000-cb350f0a-b3fe-441a-9a2e-e2e5eccd9a80.c000.snappy.parquet,part-00000-cb350f0a-b3fe-441a-9a2e-e2e5eccd9a80.c000.snappy.parquet,33620248,1771601426000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events/part-00001-69fd6b5f-3c0d-44bf-bf05-d7849d10e2c7.c000.snappy.parquet,part-00001-69fd6b5f-3c0d-44bf-bf05-d7849d10e2c7.c000.snappy.parquet,31245950,1771601426000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events/part-00001-d9c27c18-035f-43fe-9607-d7383af481b2.c000.snappy.parquet,part-00001-d9c27c18-035f-43fe-9607-d7383af481b2.c000.snappy.parquet,31245947,1771601254000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events/part-00002-983c1dd1-97ec-4fa3-9302-1e17bc7e4e61.c000.snappy.parquet,part-00002-983c1dd1-97ec-4fa3-9302-1e17bc7e4e61.c000.snappy.parquet,31464271,1771601426000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events/part-00002-9e868910-a529-4ab7-851f-6aa02003c88c.c000.snappy.parquet,part-00002-9e868910-a529-4ab7-851f-6aa02003c88c.c000.snappy.parquet,31464268,1771601254000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events/part-00003-88bea0fa-e69c-4734-8a39-ae638d5c835e.c000.snappy.parquet,part-00003-88bea0fa-e69c-4734-8a39-ae638d5c835e.c000.snappy.parquet,32185677,1771601426000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events/part-00003-cb7edc41-f9eb-4ee1-9b80-69c6ff6c9093.c000.snappy.parquet,part-00003-cb7edc41-f9eb-4ee1-9b80-69c6ff6c9093.c000.snappy.parquet,32185674,1771601254000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events/part-00004-0c050077-c382-4a0c-919e-87168d7fb0a7.c000.snappy.parquet,part-00004-0c050077-c382-4a0c-919e-87168d7fb0a7.c000.snappy.parquet,33289292,1771601259000


In [0]:
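# DESCRIBE DETAIL counts only ACTIVE files: 111, not the 222 in the directory
# (the directory also still holds the files from the earlier overwrite run)
display(spark.sql("""
DESCRIBE DETAIL delta.`/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events`
"""))
# numFiles = 111  ← Delta's truth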

format,id,name,description,location,createdAt,lastModified,partitionColumns,clusteringColumns,numFiles,sizeInBytes,properties,minReaderVersion,minWriterVersion,tableFeatures,statistics,clusterByAuto
delta,df5c551a-47c2-4fcf-9b02-b62b53b97f75,,,dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events,2026-02-20T15:27:28.531Z,2026-02-20T15:31:26.000Z,List(),List(),111,3856979937,Map(delta.enableDeletionVectors -> true),3,7,"List(appendOnly, deletionVectors, invariants)","Map(numRowsDeletedByDeletionVectors -> 0, numDeletionVectors -> 0)",False


In [0]:
#  Verify that we can read it back as Delta
df_delta = spark.read.format("delta").load("/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events")
df_delta.show()
print(f"Row count: {df_delta.count()}")

+-------------------+----------+----------+-------------------+--------------------+--------+-------+---------+--------------------+
|         event_time|event_type|product_id|        category_id|       category_code|   brand|  price|  user_id|        user_session|
+-------------------+----------+----------+-------------------+--------------------+--------+-------+---------+--------------------+
|2019-10-01 00:00:00|      view|  44600062|2103807459595387724|                NULL|shiseido|  35.79|541312140|72d76fde-8bb3-4e0...|
|2019-10-01 00:00:00|      view|   3900821|2053013552326770905|appliances.enviro...|    aqua|   33.2|554748717|9333dfbd-b87a-470...|
|2019-10-01 00:00:01|      view|  17200506|2053013559792632471|furniture.living_...|    NULL|  543.1|519107250|566511c2-e2e3-422...|
|2019-10-01 00:00:01|      view|   1307067|2053013558920217191|  computers.notebook|  lenovo| 251.74|550050854|7c90fc70-0e80-459...|
|2019-10-01 00:00:04|      view|   1004237|2053013555631882655|electr

In [0]:
%sql
-- (SQL cell) — Create database first
CREATE DATABASE IF NOT EXISTS day1_db;
USE day1_db;

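Why not `LOCATION`? A `LOCATION` clause creates an external table. Under Unity Catalog, an external table needs an external location and storage credentials set up by an admin, and that setup is not available on a free workspace.

The attempt failed with an error saying the path was missing a cloud file system scheme (like "dbfs:/", "s3://", or "abfss://").

Databricks requires this scheme when registering a table at a path, especially with Unity Catalog. The directory exists and contains Delta files, but it is not accessible as a DBFS or cloud storage path, so the `CREATE TABLE ... LOCATION` command fails.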

In [0]:
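# Let's create a Delta table as well (CTAS, without LOCATION).
# `events` is a Python DataFrame, so expose it to SQL as a temp view first.
events.createOrReplaceTempView("events")
spark.sql("""
CREATE TABLE events_delta
USING DELTA
-- LOCATION '/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events/'
AS SELECT * FROM events
""")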

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
File <command-5183256210756524>, line 2
      1 # lets Create a delta table as well
----> 2 spark.sql("""
      3 CREATE TABLE events_delta
      4 USING DELTA
      5 LOCATION '/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta/events/'
      6 """)

File /databricks/python/lib/python3.12/site-packages/pyspark/sql/connect/session.py:879, in SparkSession.sql(self, sqlQuery, args, **kwargs)
    876         _views.append(SubqueryAlias(df._plan, name))
    878 cmd = SQL(sqlQuery, _a

In [0]:
# saveAsTable creates a MANAGED Delta table — no LOCATION needed
events.write \
    .format("delta") \
    .mode("overwrite") \
    .saveAsTable("events_delta")

print("✅ Managed Delta table created successfully")

✅ Managed Delta table created successfully


In [0]:
%sql
-- Verify the table
DESCRIBE TABLE EXTENDED events_delta;

col_name,data_type,comment
event_time,timestamp,
event_type,string,
product_id,int,
category_id,bigint,
category_code,string,
brand,string,
price,double,
user_id,int,
user_session,string,
,,


In [0]:
# View transaction history
display(spark.sql("DESCRIBE HISTORY events_delta"))

version,timestamp,userId,userName,operation,operationParameters,job,notebook,queryHistoryStatementId,clusterId,readVersion,isolationLevel,isBlindAppend,operationMetrics,userMetadata,engineInfo
0,2026-02-20T17:06:31.000Z,8815326091183894,bvishaladf@gmail.com,CREATE OR REPLACE TABLE AS SELECT,"Map(partitionBy -> [], clusterBy -> [], description -> null, isManaged -> true, properties -> {""delta.enableDeletionVectors"":""true""}, statsOnLoad -> true)",,List(162371018403429),4b99d552-bcdd-4483-8720-814ea00060a9,0220-170202-m8ruoskn-v2n,,WriteSerializable,False,"Map(numFiles -> 111, numRemovedFiles -> 0, numRemovedBytes -> 0, numDeletionVectorsRemoved -> 0, numOutputRows -> 109950743, numOutputBytes -> 3856979937)",,Databricks-Runtime/18.0.x-aarch64-photon-scala2.13


In [0]:
# Simulate small file problem
for i in range(3):
    events.limit(500).write.format("delta").mode("append").save("/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2")

In [0]:
# Better simulation — forces many tiny files per append
for i in range(5):
    # repartition(10) forces 10 tiny files per append
    events.limit(1000) \
        .repartition(10) \
        .write.format("delta") \
        .mode("append") \
        .save("/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2")

# Now check file count — should be 50+ tiny files
display(dbutils.fs.ls("/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2/"))

# Analyze file sizes and count
files = dbutils.fs.ls("/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2/")
file_info = [(f.name, f.size) for f in files if f.name.endswith(".parquet")]
df_files = spark.createDataFrame(file_info, ["file_name", "file_size_bytes"])
display(df_files)

path,name,size,modificationTime
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2/_delta_log/,_delta_log/,0,1771608518516
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2/part-00000-1d1d50df-d04d-4240-b914-dac79d101884.c000.snappy.parquet,part-00000-1d1d50df-d04d-4240-b914-dac79d101884.c000.snappy.parquet,22384,1771607265000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2/part-00000-5e45b71d-8010-46e5-b295-9d576e7eb898.c000.snappy.parquet,part-00000-5e45b71d-8010-46e5-b295-9d576e7eb898.c000.snappy.parquet,9169,1771608509000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2/part-00000-8029ff04-d83c-4435-b0ed-cf53e6e2c8f7.c000.snappy.parquet,part-00000-8029ff04-d83c-4435-b0ed-cf53e6e2c8f7.c000.snappy.parquet,9169,1771608512000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2/part-00000-80c23461-bdfe-40d2-a0f2-e44a43c4ddfd.c000.snappy.parquet,part-00000-80c23461-bdfe-40d2-a0f2-e44a43c4ddfd.c000.snappy.parquet,22384,1771607268000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2/part-00000-ba1fef09-d2db-4609-bd2e-a1a58a768650.c000.snappy.parquet,part-00000-ba1fef09-d2db-4609-bd2e-a1a58a768650.c000.snappy.parquet,9169,1771608515000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2/part-00000-bb9e1a49-345d-4789-8bce-bd43c1b0743d.c000.snappy.parquet,part-00000-bb9e1a49-345d-4789-8bce-bd43c1b0743d.c000.snappy.parquet,22384,1771607261000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2/part-00000-d34822af-8a0b-4243-9c54-7049a65a8465.c000.snappy.parquet,part-00000-d34822af-8a0b-4243-9c54-7049a65a8465.c000.snappy.parquet,9169,1771608518000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2/part-00000-ea42f5bb-9eee-4dfb-80a9-4fc9fea25be9.c000.snappy.parquet,part-00000-ea42f5bb-9eee-4dfb-80a9-4fc9fea25be9.c000.snappy.parquet,9169,1771608506000
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2/part-00001-2c9358b1-cd89-4744-8101-3832605a8f6b.c000.snappy.parquet,part-00001-2c9358b1-cd89-4744-8101-3832605a8f6b.c000.snappy.parquet,9298,1771608506000


file_name,file_size_bytes
part-00000-1d1d50df-d04d-4240-b914-dac79d101884.c000.snappy.parquet,22384
part-00000-5e45b71d-8010-46e5-b295-9d576e7eb898.c000.snappy.parquet,9169
part-00000-8029ff04-d83c-4435-b0ed-cf53e6e2c8f7.c000.snappy.parquet,9169
part-00000-80c23461-bdfe-40d2-a0f2-e44a43c4ddfd.c000.snappy.parquet,22384
part-00000-ba1fef09-d2db-4609-bd2e-a1a58a768650.c000.snappy.parquet,9169
part-00000-bb9e1a49-345d-4789-8bce-bd43c1b0743d.c000.snappy.parquet,22384
part-00000-d34822af-8a0b-4243-9c54-7049a65a8465.c000.snappy.parquet,9169
part-00000-ea42f5bb-9eee-4dfb-80a9-4fc9fea25be9.c000.snappy.parquet,9169
part-00001-2c9358b1-cd89-4744-8101-3832605a8f6b.c000.snappy.parquet,9298
part-00001-84657728-68fc-4169-9bab-ad1489ec914d.c000.snappy.parquet,9298


In [0]:
# Check the total file count before OPTIMIZE
display(spark.sql("DESCRIBE DETAIL delta.`/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2`"))
# Note the numFiles value: should be ~53

format,id,name,description,location,createdAt,lastModified,partitionColumns,clusteringColumns,numFiles,sizeInBytes,properties,minReaderVersion,minWriterVersion,tableFeatures,statistics,clusterByAuto
delta,f31b85fb-3e7f-4420-a7a4-cdf2b0150890,,,dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2,2026-02-20T17:07:37.679Z,2026-02-20T17:28:38.000Z,List(),List(),53,533092,Map(delta.enableDeletionVectors -> true),3,7,"List(appendOnly, deletionVectors, invariants)","Map(numRowsDeletedByDeletionVectors -> 0, numDeletionVectors -> 0)",False


In [0]:
%sql
-- Run OPTIMIZE on the events_2 table (addressed by path)
OPTIMIZE delta.`/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2`

path,metrics
dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2,"List(1, 53, List(60614, 60614, 60614.0, 1, 60614), List(9028, 22384, 10058.33962264151, 53, 533092), 0, null, null, 0, 1, 53, 0, true, 0, 0, 1771609282533, 1771609287992, 8, 1, null, List(0, 0), null, 9, 9, 618, 0, null, null)"
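The `metrics` column above prints its struct values by position, which makes it hard to read. Below is a minimal sketch that pairs the first sixteen values with field names taken from the `metrics` schema printed for `events_delta_2` at the end of this notebook. It assumes both OPTIMIZE runs use that same field order. The values do line up: `totalConsideredFiles` comes out as 53, and `preserveInsertionOrder` lands on the boolean.

```python
# Leading fields of the OPTIMIZE metrics struct, in schema order
# (names copied from the schema printed for events_delta_2; assumed to match this run).
FIELDS = ["numFilesAdded", "numFilesRemoved", "filesAdded", "filesRemoved",
          "partitionsOptimized", "zOrderStats", "clusteringStats", "numBins",
          "numBatches", "totalConsideredFiles", "totalFilesSkipped",
          "preserveInsertionOrder", "numFilesSkippedToReduceWriteAmplification",
          "numBytesSkippedToReduceWriteAmplification", "startTimeMs", "endTimeMs"]
SIZE_STATS = ["min", "max", "avg", "totalFiles", "totalSize"]

# First 16 values copied from the OPTIMIZE output above (None stands in for null).
values = [1, 53, [60614, 60614, 60614.0, 1, 60614],
          [9028, 22384, 10058.33962264151, 53, 533092],
          0, None, None, 0, 1, 53, 0, True, 0, 0,
          1771609282533, 1771609287992]

metrics = dict(zip(FIELDS, values))
metrics["filesAdded"] = dict(zip(SIZE_STATS, metrics["filesAdded"]))
metrics["filesRemoved"] = dict(zip(SIZE_STATS, metrics["filesRemoved"]))

duration_ms = metrics["endTimeMs"] - metrics["startTimeMs"]
print(f"{metrics['numFilesRemoved']} files -> {metrics['numFilesAdded']} file(s)")
print(f"bytes before: {metrics['filesRemoved']['totalSize']}, after: {metrics['filesAdded']['totalSize']}")
print(f"duration: {duration_ms} ms")
```

Read this way, the output says that 53 files totaling 533,092 bytes were replaced by one 60,614-byte file in about 5.5 seconds.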


In [0]:
# Check after OPTIMIZE: numFiles should drop significantly
display(spark.sql("DESCRIBE DETAIL delta.`/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2`"))

format,id,name,description,location,createdAt,lastModified,partitionColumns,clusteringColumns,numFiles,sizeInBytes,properties,minReaderVersion,minWriterVersion,tableFeatures,statistics,clusterByAuto
delta,f31b85fb-3e7f-4420-a7a4-cdf2b0150890,,,dbfs:/Volumes/ecommerce/sc_ecommerce/vol_ecommerce/delta_2/events_2,2026-02-20T17:07:37.679Z,2026-02-20T17:41:28.000Z,List(),List(),1,60614,Map(delta.enableDeletionVectors -> true),3,7,"List(appendOnly, deletionVectors, invariants)","Map(numRowsDeletedByDeletionVectors -> 0, numDeletionVectors -> 0)",False


In [0]:
# saveAsTable creates a MANAGED Delta table — no LOCATION needed
events.write \
    .format("delta") \
    .mode("overwrite") \
    .saveAsTable("events_delta_2")

print("✅ Managed Delta table created successfully")

✅ Managed Delta table created successfully


In [0]:
%sql
-- Verify the table
DESCRIBE TABLE EXTENDED events_delta_2;

col_name,data_type,comment
event_time,timestamp,
event_type,string,
product_id,int,
category_id,bigint,
category_code,string,
brand,string,
price,double,
user_id,int,
user_session,string,
,,


In [0]:
# Returns a DataFrame of OPTIMIZE metrics; wrap it in display() to see the values
spark.sql("OPTIMIZE events_delta_2")


DataFrame[path: string, metrics: struct<numFilesAdded:bigint,numFilesRemoved:bigint,filesAdded:struct<min:bigint,max:bigint,avg:double,totalFiles:bigint,totalSize:bigint>,filesRemoved:struct<min:bigint,max:bigint,avg:double,totalFiles:bigint,totalSize:bigint>,partitionsOptimized:bigint,zOrderStats:struct<strategyName:string,inputCubeFiles:struct<num:bigint,size:bigint>,inputOtherFiles:struct<num:bigint,size:bigint>,inputNumCubes:bigint,mergedFiles:struct<num:bigint,size:bigint>,numOutputCubes:bigint,mergedNumCubes:bigint>,clusteringStats:struct<inputZCubeFiles:struct<numFiles:bigint,size:bigint>,inputOtherFiles:struct<numFiles:bigint,size:bigint>,inputNumZCubes:bigint,mergedFiles:struct<numFiles:bigint,size:bigint>,numOutputZCubes:bigint>,numBins:bigint,numBatches:bigint,totalConsideredFiles:bigint,totalFilesSkipped:bigint,preserveInsertionOrder:boolean,numFilesSkippedToReduceWriteAmplification:bigint,numBytesSkippedToReduceWriteAmplification:bigint,startTimeMs:bigint,endTimeMs:bigint,