Description
DuckDB's httpfs extension can read Parquet, CSV, JSON, and other files from HTTP servers or cloud object storage. For Parquet files, the query engine uses range reads, together with the file's built-in statistics, to push down queries and limit the amount of data transferred over the network. This helps DuckDB run queries quickly even over files that might be too large to load into DuckDB WASM's memory.
When I tried this spec in Mosaic Playground:
{
  "plot": [
    {
      "mark": "lineY",
      "data": {
        "from": "read_parquet('https://f005.backblazeb2.com/file/alk-data/courtlistener/2024-10-27/opinion-clusters-2024-09-30.parquet')"
      },
      "x": "file",
      "y": "Close"
    }
  ],
  "width": 680,
  "height": 200
}
Mosaic created this query:
DESCRIBE SELECT "Date" AS "col0", "Close" AS "col1" FROM "read_parquet('https://f005/"."backblazeb2"."com/file/alk-data/courtlistener/2024-10-27/opinion-clusters-2024-09-30"."parquet')" AS "source"
And when I changed it to remove the read_parquet function, I got:
DESCRIBE SELECT "Date" AS "col0", "Close" AS "col1" FROM "https://f005/"."backblazeb2"."com/file/alk-data/courtlistener/2024-10-27/opinion-clusters-2024-09-30"."parquet" AS "source"
It would be great to add some logic to detect https:// and http:// strings (and maybe s3:// and hf://, which are also supported by the httpfs extension) in the from field, and output them directly into the output SQL.
mosaic/packages/sql/src/Query.js
Lines 158 to 185 in 56756b0
  from(...expr) {
    const { query } = this;
    if (expr.length === 0) {
      // @ts-ignore
      return query.from;
    } else {
      const list = [];
      expr.flat().forEach(e => {
        if (e == null) {
          // do nothing
        } else if (typeof e === 'string') {
          list.push({ as: e, from: asRelation(e) });
        } else if (e instanceof Ref) {
          list.push({ as: e.table, from: e });
        } else if (isQuery(e) || isSQLExpression(e)) {
          list.push({ from: e });
        } else if (Array.isArray(e)) {
          list.push({ as: unquote(e[0]), from: asRelation(e[1]) });
        } else {
          for (const as in e) {
            list.push({ as: unquote(as), from: asRelation(e[as]) });
          }
        }
      });
      query.from = query.from.concat(list);
      return this;
    }
  }
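A rough sketch of what the detection could look like, in case it helps the discussion. None of these names exist in mosaic-sql: `REMOTE_SCHEMES`, `isRemoteFile`, `remoteFileToSQL`, and `fromString` are hypothetical, and `fromString` only stands in for the `typeof e === 'string'` branch above. It assumes a remote URL should be emitted as a single-quoted SQL string literal, which DuckDB accepts as a file path in a FROM clause.

```javascript
// Hypothetical helpers, not part of mosaic-sql.
// URL schemes that the issue proposes to recognize in a `from` string.
const REMOTE_SCHEMES = ['http://', 'https://', 's3://', 'hf://'];

// True if the value is a string that starts with one of the schemes above.
function isRemoteFile(name) {
  return typeof name === 'string'
    && REMOTE_SCHEMES.some(scheme => name.startsWith(scheme));
}

// Render the URL as a SQL string literal (embedded single quotes doubled),
// so it is not split on dots and quoted as a multi-part identifier.
function remoteFileToSQL(url) {
  return `'${url.replace(/'/g, "''")}'`;
}

// Stand-in for the string branch of from(): remote URLs pass through as
// literals; anything else goes to the supplied relation handler
// (asRelation in the real code).
function fromString(name, asRelation) {
  return isRemoteFile(name) ? remoteFileToSQL(name) : asRelation(name);
}
```

This only covers bare URLs. A string such as read_parquet('https://...') does not start with a scheme, so it would still need separate handling, for example by passing it as a SQL expression rather than a plain string.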
It would also be great to add docs/examples for this to mosaic-sql, vgplot, and mosaic-spec.