Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upstream tables that aren't ancestors of direct target table won't be included in the subset #18

Closed
janmeier opened this issue Nov 20, 2020 · 2 comments

Comments

@janmeier
Copy link

Maybe I'm just misunderstanding what condenser is able to do :)

I have this schema - of course much more complex in reality, but this shows the problem I'm facing:

image

When dumping meals, I expect to get all the food items in the meal, and their i18ns. MealI18n is dumped correctly, but FoodI18n is not.

Create tables

CREATE TABLE public.food (
  id uuid NOT NULL PRIMARY KEY
);
CREATE TABLE public.food_i18n (
  id uuid NOT NULL PRIMARY KEY REFERENCES food(id)
);
CREATE TABLE public.meal (
  id uuid NOT NULL PRIMARY KEY
);
CREATE TABLE public.meal_i18n (
  id uuid NOT NULL PRIMARY KEY REFERENCES meal(id)
);
CREATE TABLE public.meal_food (
  meal_id uuid NOT NULL REFERENCES meal (id),
  food_id uuid NOT NULL REFERENCES food (id),
  PRIMARY KEY (meal_id, food_id)
);

INSERT INTO public.food (id) VALUES ('c00d544c-2b15-11eb-adc1-0242ac120002'), ('2aa03896-780b-42de-a297-09e897a55c09');
INSERT INTO public.food_i18n (id) VALUES ('c00d544c-2b15-11eb-adc1-0242ac120002'), ('2aa03896-780b-42de-a297-09e897a55c09');

INSERT INTO public.meal (id)  VALUES ('eb0831ee-16c2-42d3-a1d0-b51b623b1e8e'), ('b0aa8b6d-a5cc-421c-9eac-88d88d9dc8dc');
INSERT INTO public.meal_i18n (id)  VALUES ('eb0831ee-16c2-42d3-a1d0-b51b623b1e8e'), ('b0aa8b6d-a5cc-421c-9eac-88d88d9dc8dc');

INSERT INTO public.meal_food (meal_id, food_id) VALUES ('eb0831ee-16c2-42d3-a1d0-b51b623b1e8e','c00d544c-2b15-11eb-adc1-0242ac120002');

Config

{
  "initial_targets": [
    {
      "table": "public.meal",
      "percent": 200
    }
  ],
  "db_type": "postgres",
  "keep_disconnected_tables": false,
  "excluded_tables": [],
  "passthrough_tables": [],
  "dependency_breaks": [],
  "upstream_filters": [],
  "fk_augmentation": [  ],  
}

Console output

Beginning subsetting with these direct targets: ['public.meal']
Processing 1 of 1: {'table': 'public.meal', 'percent': 200}
Direct target tables completed in 0.019591331481933594s
Beginning greedy upstream subsetting with these tables: ['public.food_i18n', 'public.meal_food', 'public.meal_i18n']
Processing 1 of 3: public.food_i18n
Processing 2 of 3: public.meal_food
Processing 3 of 3: public.meal_i18n
Greedy subsettings completed in 0.026256084442138672s
Beginning pass-through tables: []
Pass-through completed in 1.9073486328125e-06s
Beginning downstream subsetting with these tables: ['public.meal_i18n', 'public.meal_food', 'public.food_i18n', 'public.food', 'public.meal']
Processing 1 of 5: public.meal_i18n
Processing 2 of 5: public.meal_food
Processing 3 of 5: public.food_i18n
Processing 4 of 5: public.food
Processing 5 of 5: public.meal
Downstream subsetting completed in 0.020848989486694336s
Beginning post subset SQL calls
Completed post subset SQL calls in 5.054473876953125e-05s
public.meal, 0, 2, 0
public.meal_food, 0, 1, 0
public.food, 0, 1, 0
public.food_i18n, 0, 0, 0
public.meal_i18n, 0, 2, 0

I thought maybe the problem was, that a primary key references another primary key, but I added food_id toFoodI18n, but that still has the same problem

@theaeolianmachine
Copy link
Contributor

Hi Jan!

Thanks for sending this in. This is actually a known issue with the current implementation of condenser, thought not quite as described — allow me to explain.

When you select a direct target, such as Meals, the current algorithm does one pass "upstream", meaning following foreign (parent) keys that point to Meals (in this case, MealFood and MealI18n), and any grandparents etc until you reach the "root tables", i.e. tables with no keys pointing to them. In this case, MealFood and MealI18n are the tables at the top.

From there, we make a downstream pass all the way down to the "leaf tables", i.e. tables that don't point to anything else. In this case, from MealFood and MealI18n, we go back down to Meal and Food, and then we call it a day. This is a referentially intact subset. Notably though, we never go back upstream! This means when we hit Food (which is first hit on the downstream pass), we don't go back upstream to grab FoodI18n. The intent was for us to minimize the amount of data collected to make a good subset — there's a wide variety of opinions of what makes a good subset.

That being said, we've actually written a new algorithm for subsetting traversal, and we're currently working on adding it to the core Tonic product. It will in fact hit MealI18n, and it ensures every related table will be subset.

We're not sure if we'll be adding this to condenser any time soon, but hopefully this context is helpful! I heard you spoke with Jake recently, happy to help where we can.

@theaeolianmachine theaeolianmachine changed the title Tables that are more than two steps away from the initial target are not dumped Upstream tables that aren't ancestors of direct target table won't be included in the subset Dec 3, 2020
@janmeier
Copy link
Author

Hi Johnny

Thanks for the detailed explanation. I did take a brief look at the description of the algorithm, but your explanation makes it a lot clearer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants