Feature Request / Improvement
Request to add limit pushdown to improve the performance of reading a large table by skipping the full batch scan (the batch scan is implemented here).
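For context, Spark's DataSource V2 read API already defines a limit-pushdown hook, `org.apache.spark.sql.connector.read.SupportsPushDownLimit` (available since Spark 3.3). Below is a minimal, hypothetical sketch of a scan builder accepting a pushed limit; the class name, field, and `build()` body are illustrative and are not Iceberg's actual `SparkScanBuilder`:

```java
import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.connector.read.ScanBuilder;
import org.apache.spark.sql.connector.read.SupportsPushDownLimit;

// Hypothetical sketch only: shows the Spark 3.3+ SupportsPushDownLimit contract,
// not Iceberg's real SparkScanBuilder.
public class LimitAwareScanBuilder implements ScanBuilder, SupportsPushDownLimit {

  private int pushedLimit = -1;  // -1 means "no limit pushed"

  @Override
  public boolean pushLimit(int limit) {
    // Remember the limit so scan planning can stop once enough rows are covered.
    // Returning true tells Spark the source will apply the limit.
    this.pushedLimit = limit;
    return true;
  }

  @Override
  public boolean isPartiallyPushed() {
    // Conservative choice: Spark keeps its own Limit operator on top as a safety net.
    return true;
  }

  @Override
  public Scan build() {
    // A real implementation would thread pushedLimit into the scan so that
    // file planning and row reading can terminate early.
    throw new UnsupportedOperationException("Sketch only; not a runnable scan");
  }
}
```

With `isPartiallyPushed()` returning true, Spark still applies its own final Limit operator, so correctness is preserved even if the source returns a few extra rows.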
How is this observed?
When running `select * from table_name limit 1`, Spark still scans all of the data in the table, so the larger the table, the longer the query takes. For example, the physical plan shows a plain BatchScan with no pushed limit:
(1) BatchScan glue_catalog.lakehouse_bronze.table_name
Output [51]: [ISTEST#69, LEADUUID#70, UPDATEDAT#71, ...etc]
glue_catalog.lakehouse_bronze.table_name (branch=null) [filters=, groupedBy=] <-- no limit pushdown
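For reference, a plan like the one above can be inspected with `EXPLAIN FORMATTED`. The snippet below is a sketch assuming the Iceberg runtime is on the classpath and `glue_catalog` is already configured on the SparkSession; the table name simply reuses the example above:

```java
import org.apache.spark.sql.SparkSession;

public class ExplainLimitScan {
  public static void main(String[] args) {
    // Assumes spark.sql.catalog.glue_catalog is configured for Iceberg
    // (e.g. via spark-defaults or --conf options), which is not shown here.
    SparkSession spark = SparkSession.builder()
        .appName("explain-limit-scan")
        .getOrCreate();

    // Print the physical plan; the BatchScan node lists no pushed limit,
    // so all data files are planned even though only one row is requested.
    spark.sql("EXPLAIN FORMATTED "
            + "SELECT * FROM glue_catalog.lakehouse_bronze.table_name LIMIT 1")
        .show(false);

    spark.stop();
  }
}
```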
Query engine
Spark
Willingness to contribute
- I can contribute this improvement/feature independently
- I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- I cannot contribute this improvement/feature at this time