[HOTFIX] Fix documentation errors. Add examples for pre-aggregate usage

This closes apache#1945

(cherry picked from commit fa1c515)
sraghunandan authored and zzcclp committed Mar 1, 2018
1 parent 7726b4f commit c424037
Showing 2 changed files with 51 additions and 45 deletions.
72 changes: 27 additions & 45 deletions docs/data-management-on-carbondata.md
@@ -638,21 +638,21 @@ This tutorial is going to introduce all commands and data operations on CarbonData.

```
LOAD DATA [LOCAL] INPATH 'folder_path'
  INTO TABLE [db_name.]table_name PARTITION (partition_spec)
  OPTIONS(property_name=property_value, ...)

INSERT INTO TABLE [db_name.]table_name PARTITION (partition_spec) <SELECT STATEMENT>
```

Example:
```
LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.csv'
  INTO TABLE locationTable
  PARTITION (country = 'US', state = 'CA')

INSERT INTO TABLE locationTable
  PARTITION (country = 'US', state = 'AL')
  SELECT <columns list excluding partition columns> FROM another_user
```

#### Load Data Using Dynamic Partition
@@ -661,12 +661,11 @@ This tutorial is going to introduce all commands and data operations on CarbonData.

Example:
```
LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.csv'
  INTO TABLE locationTable

INSERT INTO TABLE locationTable
  SELECT <columns list excluding partition columns> FROM another_user
```

#### Show Partitions
@@ -690,19 +689,19 @@ This tutorial is going to introduce all commands and data operations on CarbonData.

```
INSERT OVERWRITE TABLE table_name
  PARTITION (column = 'partition_name')
  select_statement
```

Example:
```
INSERT OVERWRITE TABLE partitioned_user
  PARTITION (country = 'US')
  SELECT * FROM another_user au
  WHERE au.country = 'US';
```

### CARBONDATA PARTITION(HASH,RANGE,LIST) -- Alpha feature, this partition feature does not support update and delete operations.

The partition feature supports three types (Hash, Range, List). Similar to the partition features of other systems, CarbonData partitioning can be used to improve query performance by filtering on the partition column.
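As a sketch of the three types, the DDL below follows the CarbonData pattern of declaring the partition column in `PARTITIONED BY` and the partition type in `TBLPROPERTIES`; the table and column names are illustrative only, so verify the property names against the partition guide before use:

```
-- Hash partition: rows are spread across a fixed number of partitions
CREATE TABLE hash_partition_table(col_A STRING, col_B INT)
PARTITIONED BY (col_C LONG)
STORED BY 'carbondata'
TBLPROPERTIES('PARTITION_TYPE'='HASH', 'NUM_PARTITIONS'='9')

-- Range partition: rows are placed by ordered range boundaries
CREATE TABLE range_partition_table(col_A STRING, col_B INT)
PARTITIONED BY (col_C DATE)
STORED BY 'carbondata'
TBLPROPERTIES('PARTITION_TYPE'='RANGE',
  'RANGE_INFO'='2015-01-01, 2016-01-01, 2017-01-01')

-- List partition: rows are placed by exact-value membership
CREATE TABLE list_partition_table(col_A STRING, col_B INT)
PARTITIONED BY (col_C STRING)
STORED BY 'carbondata'
TBLPROPERTIES('PARTITION_TYPE'='LIST', 'LIST_INFO'='US, UK, CN')
```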

@@ -897,11 +896,11 @@ will be transformed by Query Planner to fetch data from pre-aggregate table **agg

But queries of kind
```
SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY user_id, country, sex
SELECT sex, avg(quantity) from sales GROUP BY sex
SELECT country, max(price) from sales GROUP BY country
```

will fetch the data from the main table **sales**
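A query can only be served from a pre-aggregate table when every requested aggregate is derivable from what that table stores; coarser groupings are then answered by rolling up the stored partial aggregates. In particular, `avg` must be kept as (sum, count) pairs so it can be re-aggregated, which is also why `sum` comes for free on the same column. A minimal sketch of this idea, with hypothetical data and names (not CarbonData internals):

```python
from collections import defaultdict

# Hypothetical pre-aggregate state: one row per (country, sex) group.
# avg(price) is kept as partial (sum, count) pairs rather than a finished
# average, so it can be re-aggregated to coarser groupings.
pre_agg = {
    ("US", "F"): {"sum_price": 100.0, "cnt_price": 4},
    ("US", "M"): {"sum_price": 50.0, "cnt_price": 1},
    ("UK", "F"): {"sum_price": 90.0, "cnt_price": 3},
}

def rollup_to_country(pre_agg):
    """Serve GROUP BY country from the finer (country, sex) pre-aggregate."""
    acc = defaultdict(lambda: [0.0, 0])  # country -> [sum, count]
    for (country, _sex), p in pre_agg.items():
        acc[country][0] += p["sum_price"]
        acc[country][1] += p["cnt_price"]
    # Both sum(price) and avg(price) fall out of the same stored pairs,
    # which is why defining avg also makes sum usable on that column.
    return {c: {"sum": s, "avg": s / n} for c, (s, n) in acc.items()}

print(rollup_to_country(pre_agg))
# US: sum 150.0, avg 30.0; UK: sum 90.0, avg 30.0
```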
@@ -921,18 +920,13 @@ pre-aggregate tables satisfy the query condition, the plan is transformed automatically to use the
pre-aggregate table to fetch the data

##### Compacting pre-aggregate tables
Compaction is an optional operation for pre-aggregate tables. If compaction is performed on the
main table but not on a pre-aggregate table, all queries can still benefit from the pre-aggregate
table. To further improve performance, compaction can be triggered on pre-aggregate tables
directly; this merges the segments inside the pre-aggregate table.
The compaction command (ALTER TABLE COMPACT) needs to be run separately on each pre-aggregate
table. Running the compaction command on the main table will **not automatically** compact its
pre-aggregate tables.
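For example, if the main table `sales` exposes a pre-aggregate table as `sales_agg_sales` (the child-table name here assumes a `<main_table>_<datamap_name>` pattern; check the actual name with `SHOW TABLES`), compaction is triggered on it directly, just like on any table:

```
ALTER TABLE sales_agg_sales COMPACT 'MINOR'
ALTER TABLE sales_agg_sales COMPACT 'MAJOR'
```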

NOTE:
* If the aggregate functions used in the pre-aggregate table creation included distinct-count,
  the pre-aggregate table values are recomputed during compaction. This is a costly operation
  compared to the compaction of pre-aggregate tables containing only other aggregate functions.

##### Update/Delete Operations on pre-aggregate tables
This functionality is not supported.

@@ -1007,16 +1001,6 @@ roll-up for the queries on these hierarchies.
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
avg(price) FROM sales GROUP BY order_time, country, sex
CREATE DATAMAP agg_minute
ON TABLE sales
USING "timeseries"
@@ -1039,9 +1023,7 @@
```

It is **not necessary** to create pre-aggregate tables for each granularity unless required for
the query. CarbonData can roll-up the data and fetch it.

For Example: For main table **sales** , If pre-aggregate tables were created as

@@ -135,6 +135,30 @@ object PreAggregateTableExample {
println("time for query on table without pre-aggregate table:" + time_without_aggTable.toString)
// scalastyle:on

// 3. If avg is defined for a column in the pre-aggregate table, sum can also be used
// on that column, but not the other way round.
val time_without_aggTable_sum = time {
spark.sql(
s"""
| SELECT id, sum(age)
| FROM personTableWithoutAgg group by id
""".stripMargin).count()
}

val time_with_aggTable_sum = time {
spark.sql(
s"""
| SELECT id, sum(age)
| FROM personTable group by id
""".stripMargin).count()
}
// scalastyle:off
println("time for query with function sum on table with pre-aggregate table:" +
time_with_aggTable_sum.toString)
println("time for query with function sum on table without pre-aggregate table:" +
time_without_aggTable_sum.toString)
// scalastyle:on

spark.sql("DROP TABLE IF EXISTS mainTable")
spark.sql("DROP TABLE IF EXISTS personTable")
spark.sql("DROP TABLE IF EXISTS personTableWithoutAgg")
