[HOTFIX] Fix documentation errors. Add examples for pre-aggregate usage

This closes apache#1945

(cherry picked from commit fa1c515)
sraghunandan authored and zzcclp committed Mar 1, 2018
1 parent 7726b4f commit c424037
Showing 2 changed files with 51 additions and 45 deletions.
72 changes: 27 additions & 45 deletions docs/data-management-on-carbondata.md
@@ -638,21 +638,21 @@ This tutorial is going to introduce all commands and data operations on CarbonData.

```
LOAD DATA [LOCAL] INPATH 'folder_path'
  INTO TABLE [db_name.]table_name PARTITION (partition_spec)
  OPTIONS(property_name=property_value, ...)

INSERT INTO TABLE [db_name.]table_name PARTITION (partition_spec) <SELECT STATEMENT>
```

Example:
```
LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.csv'
  INTO TABLE locationTable
  PARTITION (country = 'US', state = 'CA')

INSERT INTO TABLE locationTable
  PARTITION (country = 'US', state = 'AL')
  SELECT <columns list excluding partition columns> FROM another_user
```

#### Load Data Using Dynamic Partition
@@ -661,12 +661,11 @@ This tutorial is going to introduce all commands and data operations on CarbonData.

Example:
```
LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.csv'
  INTO TABLE locationTable

INSERT INTO TABLE locationTable
  SELECT <columns list excluding partition columns> FROM another_user
```

#### Show Partitions
@@ -690,19 +689,19 @@ This tutorial is going to introduce all commands and data operations on CarbonData.

```
INSERT OVERWRITE TABLE table_name
  PARTITION (column = 'partition_name')
  select_statement
```

Example:
```
INSERT OVERWRITE TABLE partitioned_user
  PARTITION (country = 'US')
  SELECT * FROM another_user au
  WHERE au.country = 'US';
```

### CARBONDATA PARTITION(HASH,RANGE,LIST) -- Alpha feature, this partition feature does not support update and delete operations.

The partition feature supports three types (Hash, Range, List). Similar to the partition features of other systems, CarbonData partitioning can be used to improve query performance by filtering on the partition column.
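As a sketch of the three types, the DDL below follows the CarbonData pattern of declaring the partition column in `PARTITIONED BY` and the partition type in `TBLPROPERTIES`; the table and column names are illustrative only, so verify the property names against the partition guide before use:

```
-- Hash partition: rows are spread across a fixed number of partitions
CREATE TABLE hash_partition_table(col_A STRING, col_B INT)
PARTITIONED BY (col_C LONG)
STORED BY 'carbondata'
TBLPROPERTIES('PARTITION_TYPE'='HASH', 'NUM_PARTITIONS'='9')

-- Range partition: rows are placed by ordered range boundaries
CREATE TABLE range_partition_table(col_A STRING, col_B INT)
PARTITIONED BY (col_C DATE)
STORED BY 'carbondata'
TBLPROPERTIES('PARTITION_TYPE'='RANGE',
  'RANGE_INFO'='2015-01-01, 2016-01-01, 2017-01-01')

-- List partition: rows are placed by exact-value membership
CREATE TABLE list_partition_table(col_A STRING, col_B INT)
PARTITIONED BY (col_C STRING)
STORED BY 'carbondata'
TBLPROPERTIES('PARTITION_TYPE'='LIST', 'LIST_INFO'='US, UK, CN')
```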

@@ -897,11 +896,11 @@ will be transformed by Query Planner to fetch data from pre-aggregate table **agg

But queries of kind
```
SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY user_id, country, sex
SELECT sex, avg(quantity) from sales GROUP BY sex
SELECT country, max(price) from sales GROUP BY country
```

will fetch the data from the main table **sales**
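A query can only be served from a pre-aggregate table when every requested aggregate is derivable from what that table stores; coarser groupings are then answered by rolling up the stored partial aggregates. In particular, `avg` must be kept as (sum, count) pairs so it can be re-aggregated, which is also why `sum` comes for free on the same column. A minimal sketch of this idea, with hypothetical data and names (not CarbonData internals):

```python
from collections import defaultdict

# Hypothetical pre-aggregate state: one row per (country, sex) group.
# avg(price) is kept as partial (sum, count) pairs rather than a finished
# average, so it can be re-aggregated to coarser groupings.
pre_agg = {
    ("US", "F"): {"sum_price": 100.0, "cnt_price": 4},
    ("US", "M"): {"sum_price": 50.0, "cnt_price": 1},
    ("UK", "F"): {"sum_price": 90.0, "cnt_price": 3},
}

def rollup_to_country(pre_agg):
    """Serve GROUP BY country from the finer (country, sex) pre-aggregate."""
    acc = defaultdict(lambda: [0.0, 0])  # country -> [sum, count]
    for (country, _sex), p in pre_agg.items():
        acc[country][0] += p["sum_price"]
        acc[country][1] += p["cnt_price"]
    # Both sum(price) and avg(price) fall out of the same stored pairs,
    # which is why defining avg also makes sum usable on that column.
    return {c: {"sum": s, "avg": s / n} for c, (s, n) in acc.items()}

print(rollup_to_country(pre_agg))
# US: sum 150.0, avg 30.0; UK: sum 90.0, avg 30.0
```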
@@ -921,18 +920,13 @@ pre-aggregate tables satisfy the query condition, the plan is transformed automatically to use the
pre-aggregate table to fetch the data

##### Compacting pre-aggregate tables
Compaction is an optional operation for pre-aggregate tables. If compaction is performed on the
main table but not on a pre-aggregate table, all queries can still benefit from the pre-aggregate
table. To further improve performance, compaction can be triggered on pre-aggregate tables
directly; this merges the segments inside the pre-aggregate table.
The compaction command (ALTER TABLE COMPACT) needs to be run separately on each pre-aggregate
table. Running the compaction command on the main table will **not automatically** compact its
pre-aggregate tables.
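For example, if the main table `sales` exposes a pre-aggregate table as `sales_agg_sales` (the child-table name here assumes a `<main_table>_<datamap_name>` pattern; check the actual name with `SHOW TABLES`), compaction is triggered on it directly, just like on any table:

```
ALTER TABLE sales_agg_sales COMPACT 'MINOR'
ALTER TABLE sales_agg_sales COMPACT 'MAJOR'
```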

NOTE:
* If the aggregate functions used in the pre-aggregate table creation included distinct-count,
  the pre-aggregate table values are recomputed during compaction. This is a costly operation
  compared to the compaction of pre-aggregate tables containing only other aggregate functions.

##### Update/Delete Operations on pre-aggregate tables
This functionality is not supported.

@@ -1007,16 +1001,6 @@ roll-up for the queries on these hierarchies.
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
avg(price) FROM sales GROUP BY order_time, country, sex
CREATE DATAMAP agg_minute
ON TABLE sales
USING "timeseries"
@@ -1039,9 +1023,7 @@
```

It is **not necessary** to create pre-aggregate tables for each granularity unless required for
the query. CarbonData can roll-up the data and fetch it.

For Example: For main table **sales** , If pre-aggregate tables were created as

@@ -135,6 +135,30 @@ object PreAggregateTableExample {
println("time for query on table without pre-aggregate table:" + time_without_aggTable.toString)
// scalastyle:on

// 3. If avg is defined for a column in the pre-aggregate table, sum can also be used
// on that column, but not the other way round.
val time_without_aggTable_sum = time {
spark.sql(
s"""
| SELECT id, sum(age)
| FROM personTableWithoutAgg group by id
""".stripMargin).count()
}

val time_with_aggTable_sum = time {
spark.sql(
s"""
| SELECT id, sum(age)
| FROM personTable group by id
""".stripMargin).count()
}
// scalastyle:off
println("time for query with function sum on table with pre-aggregate table:" +
time_with_aggTable_sum.toString)
println("time for query with function sum on table without pre-aggregate table:" +
time_without_aggTable_sum.toString)
// scalastyle:on

spark.sql("DROP TABLE IF EXISTS mainTable")
spark.sql("DROP TABLE IF EXISTS personTable")
spark.sql("DROP TABLE IF EXISTS personTableWithoutAgg")
