Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-Table Support #398

Open
domoritz opened this issue May 23, 2024 · 3 comments
Open

Multi-Table Support #398

domoritz opened this issue May 23, 2024 · 3 comments
Labels
projects Project ideas for Mosaic

Comments

@domoritz
Copy link
Member

Mosaic assumes a single table right now but often data is spread across tables. Take for example a dataset with movies and actors. The tables could be joined but then the counts for histograms over movies or actors would become confusing. This project adds support for multiple tables by defining what table a selection is over and resolving the selection to the correct joins over other tables.

@domoritz domoritz added the projects Project ideas for Mosaic label May 23, 2024
@domoritz
Copy link
Member Author

In https://github.com/cmudig/Texture @willeppy has shown the value of being able to do queries over joined tables.

@willeppy
Copy link

More info about how I did it for that project...

I extended the selection in mosaic to track the table names for the table the selection comes from and is applied to (PR: willeppy#5)

  • If these are the same, then it is a normal selection like in current mosaic.
  • If they differ, then I wrap the selection in SQL exists clause before applying the filter to join the tables. For a more general implementation, I think there would need to be a way to specify how to join. For my use case, I only ever want to filter the current table to the values that exist in the selection from the other table (ignoring duplicates in the other table).

As an example:

  • I have a main_table_reviews and word_table. main_table_reviews has a primary key "id" and column "sentiment". word_table has foreign key "id" and column "word"
  • I have applied two filters. One to a chart pulling from word_table toword == "hello" and another filter through a different chart directly to the main_table_reviews.sentiment column.
  • This generates the following query to plot the data for main_table_reviews:
SELECT
    *
FROM
    "main_table_reviews"
WHERE
    EXISTS (
        SELECT
            1
        FROM
            "word_table"
        WHERE
            "main_table_reviews"."id" = "word_table"."id"
            AND "word" = 'hello'
    )
    AND "sentiment" = 'negative';

So I think it would be possible to extend this to a more generic implementation potentially where you specify how to join charts and what key to use for joins

One note: anecdotally this makes the client interactions slower right now because I think it messes up the indexing used by mosaic, I haven't looked into this very deeply yet though

@derekperkins
Copy link

This would save a lot of data transfer for us using a standard star schema design where we currently have to pre-join the data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
projects Project ideas for Mosaic
Projects
None yet
Development

No branches or pull requests

3 participants