[YCQL] Missing support for Materialized Views #919

fungl164 · 2019-03-01T08:51:35Z

Jira Link: DB-2467
Is there planned support for YCQL Materialized Views? Thanxs!

kmuthukk · 2019-03-01T15:36:52Z

hi @fungl164 - currently this effort is not in the near-term roadmap. But something we will reconsider after about 3-4 months.

With YugaByte DB, some of the capabilities of MVs can be achieved via the secondary index capability. As part of CREATE INDEX you can specify additional columns to be included in the index (via the INCLUDE clause) to avoid doing a second lookup into the main table for commonly queried columns via the index. See https://docs.yugabyte.com/latest/api/ycql/ddl_create_index/#included-columns-1.

Feel free to share a bit more about the schema you were thinking about for your main table & for your Materialized View, and perhaps we can see if that's easy enough to model via a CREATE INDEX based alternative.

regards,
Kannan

fungl164 · 2019-03-02T08:25:14Z

@kmuthukk Thanks for the quick response. I appreciate the work you guys are doing. I think it's an awesome approach to consolidate all these technologies under a common platform.

As a follow up to your offer, at the moment, how would I be able to express the following MVs as indexes? On another note, is it even possible to define multiple secondary indexes with additional included columns (perhaps I'm misreading the docs...)? If so, how would this work? Many thanks for the feedback!

CREATE TABLE IF NOT EXISTS imdb.movie_stats (
 	movie_name text,
	movie_genre text,
	user_name text,
	user_rank int,
	last_watched timestamp, // truncated to MM-DD-YYYY
	PRIMARY KEY (movie_genre, movie_name, user_name) 
);

CREATE MATERIALIZED VIEW IF NOT EXISTS imdb.most_watched_by_year
AS SELECT *
FROM imdb.movie_stats
WHERE movie_genre IS NOT NULL
   AND movie_name IS NOT NULL
   AND user_genre IS NOT NULL
PRIMARY KEY ((movie_genre, last_watched), movie_name, user_name); 

CREATE MATERIALIZED VIEW IF NOT EXISTS imdb.best_rated
AS SELECT *
FROM imdb.movie_stats
WHERE movie_genre IS NOT NULL
   AND movie_name IS NOT NULL
   AND user_name IS NOT NULL
PRIMARY KEY ((user_rank, movie_genre), movie_name, user_name);

kmuthukk · 2019-03-02T14:05:45Z

Hi @fungl164 - thanks for the words of appreciation.

And thanks for sharing a full example. That is very helpful.

This is the equivalent schema I would suggest using YCQL secondary indexes.

Notice that the YCQL index specification also allows you to pick the partition columns and clustering columns much the same way as your main table or the materialized view (in your example). So I have tried to match the same selections for the CREATE INDEX. The only difference is that the app simply queries from the main table itself (rather than from the MV), but underneath, the appropriate index should be used based on your lookup pattern. See some SELECT examples below.

CREATE KEYSPACE IF NOT EXISTS imdb;

CREATE TABLE IF NOT EXISTS imdb.movie_stats (
       movie_name text,
       movie_genre text,
       user_name text,
       user_rank int,
       last_watched timestamp,
       PRIMARY KEY (movie_genre, movie_name, user_name)
) WITH transactions = { 'enabled' : true };

CREATE INDEX IF NOT EXISTS most_watched_by_year
  ON imdb.movie_stats((movie_genre, last_watched), movie_name, user_name)
  INCLUDE(user_rank);

CREATE INDEX IF NOT EXISTS best_rated
  ON imdb.movie_stats((user_rank, movie_genre), movie_name, user_name)
  INCLUDE(last_watched);

USE imdb;
INSERT INTO movie_stats(movie_name, movie_genre, user_name, user_rank, last_watched)
    VALUES ('m1', 'g1', 'u1', 5, '2019-01-18');
INSERT INTO movie_stats(movie_name, movie_genre, user_name, user_rank, last_watched)
    VALUES ('m2', 'g2', 'u1', 4, '2019-01-17');
INSERT INTO movie_stats(movie_name, movie_genre, user_name, user_rank, last_watched)
    VALUES ('m3', 'g1', 'u2', 5, '2019-01-18');
INSERT INTO movie_stats(movie_name, movie_genre, user_name, user_rank, last_watched)
    VALUES ('m4', 'g1', 'u1', 2, '2019-02-27');


// If movie_genre,
// or movie_genre & movie_name
// or movie_genre & movie_name & user_name
// are specified, the query should be served efficiently from the primary table.
SELECT *
  FROM movie_stats
 WHERE movie_genre = 'g1';

// If movie_genre & last_watched are specified
// then the query should be served efficiently from the most_watched_by_year index.
SELECT *
  FROM movie_stats
 WHERE movie_genre = 'g1' and last_watched='2019-02-27'

// If user_rank and movie_genre are specified
// then the query should be served efficiently from the best_rated index.
SELECT *
  FROM movie_stats
 WHERE movie_genre = 'g2' and user_rank=5;

SELECT *
  FROM movie_stats
 WHERE movie_genre = 'g2' and user_rank=4;

@harshithdepa - It'll be great if as part of the work you are doing with adding an "explain plan support" #861 for YCQL statements, you can confirm if the output correctly shows if the above variant SELECT statements are using the INDEX as expected and also the name of the INDEX being used.

fungl164 · 2019-03-02T19:22:23Z

Awesome! Thanks! Sounds like all you need is AST mapping between indexes and MVs. A direct select on an index (as opposed to calling the main table) would practically be the same as directly calling the MV and would save you a lookup and a decision on which it the most appropriate index to choose from. Nonetheless, this is great news...thnxs!

Now on to figuring out how to setup a single node cluster for local development purposes.... : )

kmuthukk · 2019-03-02T22:50:57Z

Thanks @fungl164 . Your observation is correct. It is indeed mostly syntax sugar, but still work to implement, test and productionize. We will keep this in mind in our next round of feature planning.

kmuthukk · 2019-03-02T22:53:45Z

Regarding:

Now on to figuring out how to setup a single node cluster for local development purposes.... : )

You can use steps along these lines to setup a single node cluster with replication factor 1 for test purposes.

# Fresh install. Have a clean data directory. You can use multiple dirs if you are multiple drives.
# In this example we are using only 1 drive as the data drive.
export DIR=$HOME/yb-data
rm -rf $DIR
mkdir $DIR

# kill any old yb-master/yb-tserver processes
pkill yb-master
pkill yb-tserver

# set IP to your node's IP
export IP=10.9.123.137

# with replication factor of 1, you just need one master.
export MASTERS=$IP:7100

# comma separated list of data directories.
export DATA_DIRS=$DIR

cd /home/centos/yugabyte-<1.x.y.z>/bin
./yb-master --master_addresses $MASTERS --rpc_bind_addresses=$IP:7100 --fs_data_dirs $DATA_DIRS --replication_factor 1 >& $DIR/master.out &
./yb-tserver --tserver_master_addrs $MASTERS --rpc_bind_addresses=$IP:9100 --fs_data_dirs $DATA_DIRS >& $DIR/tserver.out &

And then just confirm via ps that your processes are running; otherwise check the file where you

Verify Master UI by checking
links http://$IP:7000

Verify TServer UI by checking:
links http://$IP:9000

NVargP · 2021-02-04T18:48:39Z

Hello, would it be possible to reconsider fixing this issue, please? The approach with secondary indices described above doesn't work for views that use agregators like SUM, COUNT, AVERAGE, etc.

Having materialized aggregated views is a superpower.

Thank you.

kmuthukk added this to To Do in YCQL via automation Mar 1, 2019

kmuthukk added the kind/enhancement This is an enhancement of an existing feature label Mar 1, 2019

kmuthukk assigned harshithdepa and kmuthukk Mar 2, 2019

yugabyte-ci added the community/request Issues created by external users label Jul 17, 2019

yugabyte-ci unassigned harshithdepa Jul 17, 2019

pashinin mentioned this issue Aug 12, 2019

[question or feature request] CQL: what if additional indexes added after some INSERTs? #2019

Closed

frozenspider added the area/ycql Yugabyte CQL (YCQL) label Sep 9, 2020

frozenspider changed the title ~~Missing support for YCQL Materialized Views~~ [YCQL] Missing support for Materialized Views Sep 9, 2020

yugabyte-ci added the priority/medium Medium priority issue label Jun 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[YCQL] Missing support for Materialized Views #919

[YCQL] Missing support for Materialized Views #919

fungl164 commented Mar 1, 2019 •

edited by yugabyte-ci

kmuthukk commented Mar 1, 2019 •

edited

fungl164 commented Mar 2, 2019 •

edited

kmuthukk commented Mar 2, 2019

fungl164 commented Mar 2, 2019

kmuthukk commented Mar 2, 2019

kmuthukk commented Mar 2, 2019

NVargP commented Feb 4, 2021

[YCQL] Missing support for Materialized Views #919

[YCQL] Missing support for Materialized Views #919

Comments

fungl164 commented Mar 1, 2019 • edited by yugabyte-ci

kmuthukk commented Mar 1, 2019 • edited

fungl164 commented Mar 2, 2019 • edited

kmuthukk commented Mar 2, 2019

fungl164 commented Mar 2, 2019

kmuthukk commented Mar 2, 2019

kmuthukk commented Mar 2, 2019

NVargP commented Feb 4, 2021

fungl164 commented Mar 1, 2019 •

edited by yugabyte-ci

kmuthukk commented Mar 1, 2019 •

edited

fungl164 commented Mar 2, 2019 •

edited