Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YCQL] Missing support for Materialized Views #919

Open
fungl164 opened this issue Mar 1, 2019 · 7 comments
Open

[YCQL] Missing support for Materialized Views #919

fungl164 opened this issue Mar 1, 2019 · 7 comments
Assignees
Labels
area/ycql Yugabyte CQL (YCQL) community/request Issues created by external users kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue
Projects

Comments

@fungl164
Copy link

fungl164 commented Mar 1, 2019

Jira Link: DB-2467
Is there planned support for YCQL Materialized Views? Thanxs!

@kmuthukk
Copy link
Collaborator

kmuthukk commented Mar 1, 2019

hi @fungl164 - currently this effort is not in the near-term roadmap. But something we will reconsider after about 3-4 months.

With YugaByte DB, some of the capabilities of MVs can be achieved via the secondary index capability. As part of CREATE INDEX you can specify additional columns to be included in the index (via the INCLUDE clause) to avoid doing a second lookup into the main table for commonly queried columns via the index. See https://docs.yugabyte.com/latest/api/ycql/ddl_create_index/#included-columns-1.

Feel free to share a bit more about the schema you were thinking about for your main table & for your Materialized View, and perhaps we can see if that's easy enough to model via a CREATE INDEX based alternative.

regards,
Kannan

@kmuthukk kmuthukk added this to To Do in YCQL via automation Mar 1, 2019
@kmuthukk kmuthukk added the kind/enhancement This is an enhancement of an existing feature label Mar 1, 2019
@fungl164
Copy link
Author

fungl164 commented Mar 2, 2019

@kmuthukk Thanks for the quick response. I appreciate the work you guys are doing. I think it's an awesome approach to consolidate all these technologies under a common platform.

As a follow up to your offer, at the moment, how would I be able to express the following MVs as indexes? On another note, is it even possible to define multiple secondary indexes with additional included columns (perhaps I'm misreading the docs...)? If so, how would this work? Many thanks for the feedback!

CREATE TABLE IF NOT EXISTS imdb.movie_stats (
 	movie_name text,
	movie_genre text,
	user_name text,
	user_rank int,
	last_watched timestamp, // truncated to MM-DD-YYYY
	PRIMARY KEY (movie_genre, movie_name, user_name) 
);

CREATE MATERIALIZED VIEW IF NOT EXISTS imdb.most_watched_by_year
AS SELECT *
FROM imdb.movie_stats
WHERE movie_genre IS NOT NULL
   AND movie_name IS NOT NULL
   AND user_genre IS NOT NULL
PRIMARY KEY ((movie_genre, last_watched), movie_name, user_name); 

CREATE MATERIALIZED VIEW IF NOT EXISTS imdb.best_rated
AS SELECT *
FROM imdb.movie_stats
WHERE movie_genre IS NOT NULL
   AND movie_name IS NOT NULL
   AND user_name IS NOT NULL
PRIMARY KEY ((user_rank, movie_genre), movie_name, user_name);

@kmuthukk
Copy link
Collaborator

kmuthukk commented Mar 2, 2019

Hi @fungl164 - thanks for the words of appreciation.

And thanks for sharing a full example. That is very helpful.

This is the equivalent schema I would suggest using YCQL secondary indexes.

Notice that the YCQL index specification also allows you to pick the partition columns and clustering columns much the same way as your main table or the materialized view (in your example). So I have tried to match the same selections for the CREATE INDEX. The only difference is that the app simply queries from the main table itself (rather than from the MV), but underneath, the appropriate index should be used based on your lookup pattern. See some SELECT examples below.

CREATE KEYSPACE IF NOT EXISTS imdb;

CREATE TABLE IF NOT EXISTS imdb.movie_stats (
       movie_name text,
       movie_genre text,
       user_name text,
       user_rank int,
       last_watched timestamp,
       PRIMARY KEY (movie_genre, movie_name, user_name)
) WITH transactions = { 'enabled' : true };

CREATE INDEX IF NOT EXISTS most_watched_by_year
  ON imdb.movie_stats((movie_genre, last_watched), movie_name, user_name)
  INCLUDE(user_rank);

CREATE INDEX IF NOT EXISTS best_rated
  ON imdb.movie_stats((user_rank, movie_genre), movie_name, user_name)
  INCLUDE(last_watched);

USE imdb;
INSERT INTO movie_stats(movie_name, movie_genre, user_name, user_rank, last_watched)
    VALUES ('m1', 'g1', 'u1', 5, '2019-01-18');
INSERT INTO movie_stats(movie_name, movie_genre, user_name, user_rank, last_watched)
    VALUES ('m2', 'g2', 'u1', 4, '2019-01-17');
INSERT INTO movie_stats(movie_name, movie_genre, user_name, user_rank, last_watched)
    VALUES ('m3', 'g1', 'u2', 5, '2019-01-18');
INSERT INTO movie_stats(movie_name, movie_genre, user_name, user_rank, last_watched)
    VALUES ('m4', 'g1', 'u1', 2, '2019-02-27');


// If movie_genre,
// or movie_genre & movie_name
// or movie_genre & movie_name & user_name
// are specified, the query should be served efficiently from the primary table.
SELECT *
  FROM movie_stats
 WHERE movie_genre = 'g1';

// If movie_genre & last_watched are specified
// then the query should be served efficiently from the most_watched_by_year index.
SELECT *
  FROM movie_stats
 WHERE movie_genre = 'g1' and last_watched='2019-02-27'

// If user_rank and movie_genre are specified
// then the query should be served efficiently from the best_rated index.
SELECT *
  FROM movie_stats
 WHERE movie_genre = 'g2' and user_rank=5;

SELECT *
  FROM movie_stats
 WHERE movie_genre = 'g2' and user_rank=4;

@harshithdepa - It'll be great if as part of the work you are doing with adding an "explain plan support" #861 for YCQL statements, you can confirm if the output correctly shows if the above variant SELECT statements are using the INDEX as expected and also the name of the INDEX being used.

@fungl164
Copy link
Author

fungl164 commented Mar 2, 2019

Awesome! Thanks! Sounds like all you need is AST mapping between indexes and MVs. A direct select on an index (as opposed to calling the main table) would practically be the same as directly calling the MV and would save you a lookup and a decision on which it the most appropriate index to choose from. Nonetheless, this is great news...thnxs!

Now on to figuring out how to setup a single node cluster for local development purposes.... : )

@kmuthukk
Copy link
Collaborator

kmuthukk commented Mar 2, 2019

Thanks @fungl164 . Your observation is correct. It is indeed mostly syntax sugar, but still work to implement, test and productionize. We will keep this in mind in our next round of feature planning.

@kmuthukk
Copy link
Collaborator

kmuthukk commented Mar 2, 2019

Regarding:

Now on to figuring out how to setup a single node cluster for local development purposes.... : )

You can use steps along these lines to setup a single node cluster with replication factor 1 for test purposes.

# Fresh install. Have a clean data directory. You can use multiple dirs if you are multiple drives.
# In this example we are using only 1 drive as the data drive.
export DIR=$HOME/yb-data
rm -rf $DIR
mkdir $DIR

# kill any old yb-master/yb-tserver processes
pkill yb-master
pkill yb-tserver

# set IP to your node's IP
export IP=10.9.123.137

# with replication factor of 1, you just need one master.
export MASTERS=$IP:7100

# comma separated list of data directories.
export DATA_DIRS=$DIR

cd /home/centos/yugabyte-<1.x.y.z>/bin
./yb-master --master_addresses $MASTERS --rpc_bind_addresses=$IP:7100 --fs_data_dirs $DATA_DIRS --replication_factor 1 >& $DIR/master.out &
./yb-tserver --tserver_master_addrs $MASTERS --rpc_bind_addresses=$IP:9100 --fs_data_dirs $DATA_DIRS >& $DIR/tserver.out &

And then just confirm via ps that your processes are running; otherwise check the file where you

Verify Master UI by checking
links http://$IP:7000

Verify TServer UI by checking:
links http://$IP:9000

@yugabyte-ci yugabyte-ci added the community/request Issues created by external users label Jul 17, 2019
@frozenspider frozenspider added the area/ycql Yugabyte CQL (YCQL) label Sep 9, 2020
@frozenspider frozenspider changed the title Missing support for YCQL Materialized Views [YCQL] Missing support for Materialized Views Sep 9, 2020
@NVargP
Copy link

NVargP commented Feb 4, 2021

Hello, would it be possible to reconsider fixing this issue, please? The approach with secondary indices described above doesn't work for views that use agregators like SUM, COUNT, AVERAGE, etc.

Having materialized aggregated views is a superpower.

Thank you.

@yugabyte-ci yugabyte-ci added the priority/medium Medium priority issue label Jun 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ycql Yugabyte CQL (YCQL) community/request Issues created by external users kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue
Projects
YCQL
  
To Do
Status: No status
Development

No branches or pull requests

6 participants