feat: subreports
chore: try to get github to recognize license format
totalhack committed May 2, 2023
1 parent 11c9b0b commit 655397a
Showing 11 changed files with 500 additions and 175 deletions.
4 changes: 1 addition & 3 deletions LICENSE
@@ -1,5 +1,4 @@
Copyright (c) 2019 to Present totalhack
Signed: 48ce2494044fc4db7b23a35240ee9c4d163b62b66b630a0cae7ebf8987015d71

GNU LESSER GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
@@ -165,5 +164,4 @@ General Public License ever published by the Free Software Foundation.
whether future versions of the GNU Lesser General Public License shall
apply, that proxy's public statement of acceptance of any version is
permanent authorization for you to choose that version for the
Library.

Library.
134 changes: 89 additions & 45 deletions README.md
@@ -49,17 +49,18 @@ With `Zillion` you can:
* [Warehouse Configuration](#example-warehouse-config)
* [Reports](#example-reports)
* [Advanced Topics](#advanced-topics)
* [Subreports](#subreports)
* [FormulaMetrics](#formula-metrics)
* [Divisor Metrics](#divisor-metrics)
* [FormulaDimensions](#formula-dimensions)
* [DataSource Formulas](#datasource-formulas)
* [Type Conversions](#type-conversions)
* [Config Variables](#config-variables)
* [DataSource Priority](#datasource-priority)
* [AdHocMetrics](#adhoc-metrics)
* [AdHocDimensions](#adhoc-dimensions)
* [AdHocDataTables](#adhoc-data-tables)
* [Technicals](#technicals)
* [Config Variables](#config-variables)
* [DataSource Priority](#datasource-priority)
* [Supported DataSources](#supported-datasources)
* [Multiprocess Considerations](#multiprocess-considerations)
* [Demo UI / Web API](#demo-ui)
@@ -555,6 +556,49 @@ result = wh.execute(
**Advanced Topics**
-------------------

<a name="subreports"></a>

### **Subreports**

Sometimes you need subquery-like functionality to filter one report by the
results of another (which may require a different grain). Zillion provides a
simple way to do this with the `in report` and `not in report` criteria
operations. There are two supported ways to specify the subreport: pass a
report spec ID or pass a dict of report params.

```python
# Assuming you have saved report 1234 and it has "partner" as a dimension:

result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "in report", 1234)
    ]
)

# Or with a dict of report params:

result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "in report", dict(
            metrics=[...],
            dimensions=["partner"],
            criteria=[...]
        ))
    ]
)
```

The criteria field used with `in report` or `not in report` must be a dimension
in the subreport. Note that subreports are executed at `Report` object initialization
time rather than during `execute`, so they cannot be killed using `Report.kill`.
This may change in the future.
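
As a rough mental model of the filtering described above, the following plain-Python
sketch may help. It is not Zillion's implementation: `dimension_values`,
`filter_by_subreport`, and the sample rows are hypothetical names and data made up
for illustration, and rows are assumed to be simple dicts.

```python
# Conceptual sketch only: plain Python, not Zillion's actual implementation.
# All function names and sample rows here are hypothetical.

def dimension_values(subreport_rows, field):
    """Collect the distinct values of `field` from the subreport's rows."""
    if any(field not in row for row in subreport_rows):
        # Mirrors the rule that the criteria field must be a subreport dimension.
        raise ValueError(f"{field!r} must be a dimension in the subreport")
    return {row[field] for row in subreport_rows}


def filter_by_subreport(rows, field, subreport_rows, negate=False):
    """Keep rows whose `field` value is (or, with negate=True, is not) in the subreport."""
    allowed = dimension_values(subreport_rows, field)
    return [row for row in rows if (row[field] in allowed) != negate]


subreport_rows = [
    {"partner": "A", "revenue": 100},
    {"partner": "B", "revenue": 50},
]
rows = [
    {"partner": "A", "date": "2020-02-01", "leads": 3},
    {"partner": "C", "date": "2020-02-01", "leads": 7},
]

in_report = filter_by_subreport(rows, "partner", subreport_rows)
not_in_report = filter_by_subreport(rows, "partner", subreport_rows, negate=True)
```

Here `in_report` keeps only the partner "A" row and `not_in_report` keeps only the
partner "C" row, which is the behavior one would expect from `in report` and
`not in report` respectively.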

<a name="formula-metrics"></a>

### **Formula Metrics**
@@ -658,49 +702,6 @@ To prevent type conversions, set `skip_conversion_fields` to `true` on your
See `zillion.field.TYPE_ALLOWED_CONVERSIONS` and `zillion.field.DIALECT_CONVERSIONS`
for more details on currently supported conversions.

<a name="config-variables"></a>

### **Config Variables**

If you'd like to avoid putting sensitive connection information directly in
your `DataSource` configs you can leverage config variables. In your `Zillion`
yaml config you can specify a `DATASOURCE_CONTEXTS` section as follows:

```yaml
DATASOURCE_CONTEXTS:
  my_ds_name:
    user: user123
    pass: goodpassword
    host: 127.0.0.1
    schema: reporting
```

Then when your `DataSource` config for the datasource named "my_ds_name" is
read, it can use this context to populate variables in your connection url:

```json
"datasources": {
    "my_ds_name": {
        "connect": "mysql+pymysql://{user}:{pass}@{host}/{schema}"
        ...
    }
}
```

<a name="DataSource Priority"></a>

### **DataSource Priority**

On `Warehouse` init you can specify a default priority order for datasources
by name. This will come into play when a report could be satisfied by multiple
datasources. `DataSources` earlier in the list will be higher priority. This
would be useful if you wanted to favor a set of faster, aggregate tables that
are grouped in a `DataSource`.

```python
wh = Warehouse(config=config, ds_priority=["aggr_ds", "raw_ds", ...])
```

<a name="adhoc-metrics"></a>

### **Ad Hoc Metrics**
@@ -794,6 +795,49 @@ appending it to the technical string: i.e. "cumsum:all" or "mean(5):group"

---

<a name="config-variables"></a>

### **Config Variables**

If you'd like to avoid putting sensitive connection information directly in
your `DataSource` configs you can leverage config variables. In your `Zillion`
yaml config you can specify a `DATASOURCE_CONTEXTS` section as follows:

```yaml
DATASOURCE_CONTEXTS:
  my_ds_name:
    user: user123
    pass: goodpassword
    host: 127.0.0.1
    schema: reporting
```

Then when your `DataSource` config for the datasource named "my_ds_name" is
read, it can use this context to populate variables in your connection url:

```json
"datasources": {
    "my_ds_name": {
        "connect": "mysql+pymysql://{user}:{pass}@{host}/{schema}"
        ...
    }
}
```
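
To make the substitution concrete, here is a minimal sketch of how such a context
could fill in the connection URL. It assumes the placeholders behave like Python
`str.format` fields, which is an assumption for illustration rather than a statement
about Zillion's internals; the `contexts` dict simply mirrors the YAML above.

```python
# Illustration only: assumes str.format-style placeholders. This is not
# necessarily how Zillion performs the substitution internally.
contexts = {
    "my_ds_name": {
        "user": "user123",
        "pass": "goodpassword",
        "host": "127.0.0.1",
        "schema": "reporting",
    }
}

template = "mysql+pymysql://{user}:{pass}@{host}/{schema}"

# Each {name} placeholder is replaced by the matching key in the context.
url = template.format(**contexts["my_ds_name"])
print(url)
```

With the values above this yields `mysql+pymysql://user123:goodpassword@127.0.0.1/reporting`,
so the credentials live only in the `Zillion` config rather than in the `DataSource` config.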

<a name="datasource-priority"></a>

### **DataSource Priority**

On `Warehouse` init you can specify a default priority order for datasources
by name. This will come into play when a report could be satisfied by multiple
datasources. `DataSources` earlier in the list will be higher priority. This
would be useful if you wanted to favor a set of faster, aggregate tables that
are grouped in a `DataSource`.

```python
wh = Warehouse(config=config, ds_priority=["aggr_ds", "raw_ds", ...])
```
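
The ordering rule can be pictured with a small sketch. This is a hypothetical
helper, not Zillion's selection logic: `pick_datasource` is a made-up name, and
ranking unlisted datasources last is an assumption made for the example.

```python
# Hypothetical sketch of "earlier in the list wins"; not Zillion's actual code.

def pick_datasource(candidates, ds_priority):
    """Return the candidate datasource name that appears earliest in ds_priority."""
    rank = {name: i for i, name in enumerate(ds_priority)}
    # Assumption for this sketch: names missing from ds_priority rank last.
    return min(candidates, key=lambda name: rank.get(name, len(ds_priority)))


ds_priority = ["aggr_ds", "raw_ds"]

# Both datasources could satisfy the report, so the aggregate one is preferred.
chosen = pick_datasource(["raw_ds", "aggr_ds"], ds_priority)

# A datasource absent from the priority list loses to any listed one.
fallback = pick_datasource(["other_ds", "raw_ds"], ds_priority)
```

In this sketch `chosen` is `"aggr_ds"` and `fallback` is `"raw_ds"`.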

<a name="supported-datasources"></a>

**Supported DataSources**
129 changes: 86 additions & 43 deletions docs/markdown/readme_contents.md
@@ -487,6 +487,49 @@ result = wh.execute(
**Advanced Topics**
-------------------

<a name="subreports"></a>

### **Subreports**

Sometimes you need subquery-like functionality to filter one report by the
results of another (which may require a different grain). Zillion provides a
simple way to do this with the `in report` and `not in report` criteria
operations. There are two supported ways to specify the subreport: pass a
report spec ID or pass a dict of report params.

```python
# Assuming you have saved report 1234 and it has "partner" as a dimension:

result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "in report", 1234)
    ]
)

# Or with a dict of report params:

result = warehouse.execute(
    metrics=["revenue", "leads"],
    dimensions=["date"],
    criteria=[
        ("date", ">", "2020-01-01"),
        ("partner", "in report", dict(
            metrics=[...],
            dimensions=["partner"],
            criteria=[...]
        ))
    ]
)
```

The criteria field used with `in report` or `not in report` must be a dimension
in the subreport. Note that subreports are executed at `Report` object initialization
time rather than during `execute`, so they cannot be killed using `Report.kill`.
This may change in the future.

<a name="formula-metrics"></a>

### **Formula Metrics**
@@ -590,49 +633,6 @@ To prevent type conversions, set `skip_conversion_fields` to `true` on your
See `zillion.field.TYPE_ALLOWED_CONVERSIONS` and `zillion.field.DIALECT_CONVERSIONS`
for more details on currently supported conversions.

<a name="config-variables"></a>

### **Config Variables**

If you'd like to avoid putting sensitive connection information directly in
your `DataSource` configs you can leverage config variables. In your `Zillion`
yaml config you can specify a `DATASOURCE_CONTEXTS` section as follows:

```yaml
DATASOURCE_CONTEXTS:
  my_ds_name:
    user: user123
    pass: goodpassword
    host: 127.0.0.1
    schema: reporting
```

Then when your `DataSource` config for the datasource named "my_ds_name" is
read, it can use this context to populate variables in your connection url:

```json
"datasources": {
    "my_ds_name": {
        "connect": "mysql+pymysql://{user}:{pass}@{host}/{schema}"
        ...
    }
}
```

<a name="DataSource Priority"></a>

### **DataSource Priority**

On `Warehouse` init you can specify a default priority order for datasources
by name. This will come into play when a report could be satisfied by multiple
datasources. `DataSources` earlier in the list will be higher priority. This
would be useful if you wanted to favor a set of faster, aggregate tables that
are grouped in a `DataSource`.

```python
wh = Warehouse(config=config, ds_priority=["aggr_ds", "raw_ds", ...])
```

<a name="adhoc-metrics"></a>

### **Ad Hoc Metrics**
@@ -726,6 +726,49 @@ appending it to the technical string: i.e. "cumsum:all" or "mean(5):group"

---

<a name="config-variables"></a>

### **Config Variables**

If you'd like to avoid putting sensitive connection information directly in
your `DataSource` configs you can leverage config variables. In your `Zillion`
yaml config you can specify a `DATASOURCE_CONTEXTS` section as follows:

```yaml
DATASOURCE_CONTEXTS:
  my_ds_name:
    user: user123
    pass: goodpassword
    host: 127.0.0.1
    schema: reporting
```

Then when your `DataSource` config for the datasource named "my_ds_name" is
read, it can use this context to populate variables in your connection url:

```json
"datasources": {
    "my_ds_name": {
        "connect": "mysql+pymysql://{user}:{pass}@{host}/{schema}"
        ...
    }
}
```

<a name="datasource-priority"></a>

### **DataSource Priority**

On `Warehouse` init you can specify a default priority order for datasources
by name. This will come into play when a report could be satisfied by multiple
datasources. `DataSources` earlier in the list will be higher priority. This
would be useful if you wanted to favor a set of faster, aggregate tables that
are grouped in a `DataSource`.

```python
wh = Warehouse(config=config, ds_priority=["aggr_ds", "raw_ds", ...])
```

<a name="supported-datasources"></a>

**Supported DataSources**
5 changes: 3 additions & 2 deletions docs/markdown/readme_toc.md
@@ -14,17 +14,18 @@
* [Warehouse Configuration](#example-warehouse-config)
* [Reports](#example-reports)
* [Advanced Topics](#advanced-topics)
* [Subreports](#subreports)
* [FormulaMetrics](#formula-metrics)
* [Divisor Metrics](#divisor-metrics)
* [FormulaDimensions](#formula-dimensions)
* [DataSource Formulas](#datasource-formulas)
* [Type Conversions](#type-conversions)
* [Config Variables](#config-variables)
* [DataSource Priority](#datasource-priority)
* [AdHocMetrics](#adhoc-metrics)
* [AdHocDimensions](#adhoc-dimensions)
* [AdHocDataTables](#adhoc-data-tables)
* [Technicals](#technicals)
* [Config Variables](#config-variables)
* [DataSource Priority](#datasource-priority)
* [Supported DataSources](#supported-datasources)
* [Multiprocess Considerations](#multiprocess-considerations)
* [Demo UI / Web API](#demo-ui)