Add support for external db for schema management in mongodb connector #8956

Closed

academy-codex
Contributor

Aim: Fixes #8887
Description:

  1. Added a configuration property that names the MongoDB database Trino uses for schema management on behalf of the other databases and collections in the MongoDB cluster.
  2. The property is optional; if it is not provided, the default behavior of a per-database _schema collection applies.
  3. The Trino MongoDB connector creates collections inside the configured database, named in the format trino_<<db_name>>_schema; each collection holds the schema documents for the collections in <<db_name>>.
  4. Introduced a flag to enable or disable index creation. Users can turn it off if they accept the query-time penalty of running without the index.
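
The naming rules in points 1 to 3 can be sketched roughly as follows. This is only an illustration, not code from this PR: `SchemaLocation` and `resolve` are hypothetical names, and the sketch assumes the `trino_<db_name>_schema` format described above.

```java
import java.util.Optional;

// Hypothetical helper (not part of the PR): illustrates where schema documents
// would live with and without an external schema database configured.
final class SchemaLocation
{
    // Collection name used by the existing per-database behavior
    static final String DEFAULT_SCHEMA_COLLECTION = "_schema";

    final String database;
    final String collection;

    SchemaLocation(String database, String collection)
    {
        this.database = database;
        this.collection = collection;
    }

    // Returns the database and collection holding the schema documents for dataDatabase.
    static SchemaLocation resolve(Optional<String> schemaDatabase, String dataDatabase)
    {
        return schemaDatabase
                // External database configured: one collection per data database
                .map(external -> new SchemaLocation(external, "trino_" + dataDatabase + "_schema"))
                // Not configured: fall back to _schema inside the data database itself
                .orElseGet(() -> new SchemaLocation(dataDatabase, DEFAULT_SCHEMA_COLLECTION));
    }
}
```

For example, `SchemaLocation.resolve(Optional.of("meta"), "sales")` points at collection `trino_sales_schema` in database `meta`, while `SchemaLocation.resolve(Optional.empty(), "sales")` points at `_schema` in `sales`.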

@academy-codex
Contributor Author

@hashhar Kindly have a look at the draft changes.

@ebyhr ebyhr self-requested a review August 25, 2021 14:46
Member

@hashhar hashhar left a comment

Some initial comments.

@hashhar
Member

hashhar commented Aug 26, 2021

Re: #8956 (comment)

I see this is for the collection names. Earlier it was just _schema because every database had one such collection.
Now a single database may hold multiple schema collections.

We need to preserve backward compatibility (which is possible by not setting the db-name),
and we need to avoid collisions between collection names.

I think the cleanest solution is to make the collection-name and database-name configs mutually exclusive.
If db-name is specified, each collection in that db is named the same as the collection whose schema it represents. That way it's really easy for users/admins to find the schema for a given collection.

If collection-name is specified, the code behaves like the old implementation, where each DB has one collection with the name given in collection-name.
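
A minimal sketch of the mutual-exclusivity check suggested here, assuming the two configs map to the `mongodb.schema-collection` and `mongodb.schema-database` properties. `SchemaConfigCheck` is a hypothetical name, and the real connector binds and validates its configuration differently.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical validation sketch (not the connector's actual config class).
final class SchemaConfigCheck
{
    static final String SCHEMA_COLLECTION = "mongodb.schema-collection";
    static final String SCHEMA_DATABASE = "mongodb.schema-database";

    // Rejects catalogs that set both properties; otherwise returns the
    // configured schema database, or empty to keep the old behavior.
    static Optional<String> validate(Map<String, String> properties)
    {
        boolean hasCollection = properties.containsKey(SCHEMA_COLLECTION);
        boolean hasDatabase = properties.containsKey(SCHEMA_DATABASE);
        if (hasCollection && hasDatabase) {
            throw new IllegalArgumentException(
                    SCHEMA_COLLECTION + " and " + SCHEMA_DATABASE + " are mutually exclusive");
        }
        return Optional.ofNullable(properties.get(SCHEMA_DATABASE));
    }
}
```

Leaving `mongodb.schema-database` unset returns an empty `Optional`, which is how existing catalogs would keep their current behavior.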

@academy-codex
Contributor Author

@hashhar I see your point. Let me make the changes and commit.

@hashhar
Member

hashhar commented Aug 26, 2021

One more question: different DBs can have collections with the same name. How does that work with the new mechanism you are adding?

@academy-codex
Contributor Author

academy-codex commented Aug 26, 2021

> One more question: different DBs can have collections with the same name. How does that work with the new mechanism you are adding?

So this should be fine actually. For example, say you have database X, database Y, and a metadata database META. Inside META, two collections will be created with the same names as the databases, so META will have two collections, X and Y.

Let's say X and Y both have a table named sample. For the sample table of X, we query only the X collection inside META where tableName="sample"; for the one in Y, we query the Y collection.

The two tables share a name but have entries in different collections in the metadata database, because they come from different databases.

Hope I was able to explain?
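
The X / Y / META example can be sketched with an in-memory stand-in. `MetaDatabase` is a hypothetical class used only for illustration; it does not use the MongoDB driver and is not how the PR stores schemas.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// In-memory stand-in for the META database (illustration only).
final class MetaDatabase
{
    // collection name (one per data database) -> tableName -> schema document
    private final Map<String, Map<String, String>> collections = new HashMap<>();

    void putSchema(String database, String tableName, String schemaDocument)
    {
        collections.computeIfAbsent(database, name -> new HashMap<>()).put(tableName, schemaDocument);
    }

    // Looks only inside the collection named after the data database, so tables
    // with the same name in different databases never collide.
    Optional<String> findSchema(String database, String tableName)
    {
        return Optional.ofNullable(collections.getOrDefault(database, Map.of()).get(tableName));
    }
}
```

Storing a schema for `sample` under both X and Y and then calling `findSchema("X", "sample")` and `findSchema("Y", "sample")` returns two different documents, because each lookup is scoped to its own collection.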

@academy-codex
Contributor Author

@hashhar Did some changes. Please have a look once you get time :)
Thanks.

@@ -59,6 +59,7 @@ Property Name Description
 ``mongodb.write-concern``             The write concern
 ``mongodb.required-replica-set``      The required replica set name
 ``mongodb.cursor-batch-size``         The number of elements to return in a batch
+``mongodb.schema-database``           The database to use for schema management
Contributor

  • This property should actually be placed right after mongodb.schema-collection.
  • Since we are improving the metadata management, the description of both properties should note that only one of them can be used, not both.
  • mongodb.schema-collection should be marked as deprecated, and we should remove it after a couple of releases.

Member

> mongodb.schema-collection should be marked as deprecated, and we should remove it after a couple of releases.

I disagree. I don't think we want to break backward compatibility.

Contributor

> > mongodb.schema-collection should be marked as deprecated, and we should remove it after a couple of releases.
>
> I disagree. I don't think we want to break backward compatibility.

Alright. Makes sense.

Member

The comment below hasn't been addressed. I think placing it right "before" is better, considering the hierarchy.

> This property should actually be placed right after mongodb.schema-collection.

Contributor Author

Done :)

Member

@ebyhr ebyhr left a comment

Please fix TestMongoClientConfig and add some tests.

Member

@hashhar hashhar left a comment

Some more comments.

The impl looks good now. Can you also add some tests to make sure this is working as intended?

@academy-codex
Contributor Author

> Please fix TestMongoClientConfig and add some tests.

@ebyhr @hashhar Please check if all looks good now. I shall add the test cases and commit thereafter.

@academy-codex
Contributor Author

Added the fixes for the test cases

Member

@ebyhr ebyhr left a comment

Please separate into two commits, index-option and external-db.

@ebyhr ebyhr self-requested a review September 29, 2021 05:23
Member

@ebyhr ebyhr left a comment

Almost good to me. Please squash and separate into 2 commits (create-index and schema-database) when applying comments.

@academy-codex
Copy link
Contributor Author

@ebyhr @hashhar Addressed all review comments. Let me know if it's good to merge :)

Member

@ebyhr ebyhr left a comment

Let's add TestMongoExternalDatabaseConnectorSmokeTest that extends BaseMongoConnectorSmokeTest with the mongodb.schema-database config property set.

``mongodb.schema-collection``                  A collection which contains schema information
``mongodb.create-index-for-schema.enabled``    Create an index for schema collection when it doesn't exist
Member

Please check build failure.

@@ -91,7 +91,7 @@

 public class MongoSession
 {
-    private static final Logger log = Logger.get(MongoSession.class);
+    private static final Logger log = Logger.get(io.trino.plugin.mongodb.MongoSession.class);
Member

Revert unrelated change.

@@ -147,6 +151,7 @@ public void shutdown()
     public List<String> getAllSchemas()
     {
         return ImmutableList.copyOf(client.listDatabaseNames()).stream()
+                .filter(name -> schemaDatabase.map(s -> !name.equals(s)).orElse(true))
Member

Rename s to schema.
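
For illustration, the filter from the hunk above with the requested rename applied might look like this as a standalone method. `SchemaListing` and `visibleSchemas` are hypothetical names, and the sketch uses plain `java.util` types rather than the connector's Guava and MongoDB client classes.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Standalone sketch of the filter under review (not the PR's actual method).
final class SchemaListing
{
    // Hides the external schema database, if one is configured, from the
    // list of databases exposed as schemas.
    static List<String> visibleSchemas(List<String> databaseNames, Optional<String> schemaDatabase)
    {
        return databaseNames.stream()
                .filter(name -> schemaDatabase.map(schema -> !name.equals(schema)).orElse(true))
                .collect(Collectors.toList());
    }
}
```

When `schemaDatabase` is empty, `orElse(true)` keeps every database name, so the listing is unchanged for catalogs that do not set the property.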

@@ -54,6 +56,7 @@ public void testExplicitPropertyMappings()
     {
         Map<String, String> properties = new ImmutableMap.Builder<String, String>()
                 .put("mongodb.schema-collection", "_my_schema")
+                .put("mongodb.enable-schema-create-index", "true")
Member

Please correct the property name.

    {
        Map<String, String> properties = new ImmutableMap.Builder<String, String>()
                .put("mongodb.schema-database", "trino_meta_data_db")
                .put("mongodb.enable-schema-create-index", "true")
Member

Same as the above comment.

@mosabua
Member

mosabua commented Oct 28, 2022

👋 @academy-codex - this PR has become inactive. If you're still interested in working on it, please let us know.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

@colebow
Member

colebow commented Nov 30, 2022

Closing this one out due to inactivity, but please reopen if you would like to pick this back up.

@academy-codex
Contributor Author

Hey! I want to pick this back up :)

@academy-codex academy-codex reopened this May 29, 2023
@github-actions github-actions bot added docs mongodb MongoDB connector labels May 29, 2023
@mosabua
Member

mosabua commented May 29, 2023

You can reopen the PR .. and then we can continue

Successfully merging this pull request may close these issues.

MongoDB connector to have configurable database for _schema collection
6 participants