Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make schema controller and schema cache stateless #7951

Open
wants to merge 3 commits into
base: alpha
Choose a base branch
from

Conversation

pzawadzki-pronos
Copy link

@pzawadzki-pronos pzawadzki-pronos commented Apr 26, 2022

New Pull Request Checklist

Issue Description

Currently Schema-controller, database-controller and schema cache is a singleton and it makes whole parse-server statefull

Related issue: #7950

Approach

The concept is to make schema logic stateless and make possible to override in parse-configuration schema cache, so that it is possible to e.g cache it in Redis or another custom solution

TODOs before merging

NA

  • Add tests
  • Add changes to documentation (guides, repository pages, in-code descriptions)
  • Add security check
  • Add new Parse Error codes to Parse JS SDK
  • A changelog entry is created automatically using the pull request title (do not manually add a changelog entry)

@parse-github-assistant
Copy link

parse-github-assistant bot commented Apr 26, 2022

Thanks for opening this pull request!

  • 🎉 We are excited about your hands-on contribution!

@pzawadzki-pronos pzawadzki-pronos changed the title Make schema controller and schema cache stateless feat: make schema controller and schema cache stateless Apr 26, 2022
@mtrezza
Copy link
Member

mtrezza commented Apr 27, 2022

Thanks for this PR, it seems to be quite a significant one at the internals of Parse Server, so could you provide more details about what the PR changes and whether there are any breaking changes?

@pzawadzki-pronos
Copy link
Author

So @mtrezza answering Your question.

  1. Before my change parse-server was caching schema in 2 internal variables:
    -> in SchemaController itself
    -> adapters/Cache/SchemaCache
    So when You want to get schema it looked SchemController (data field). Adapters/Cache/SchemaCache was used when there were need to build SchemaController/data. What is more adapters/Cache/SchemaCache it wasn't switchable adapter(like other), but just hardcoded, and data were cached in simple variable.
  2. After my change
    I introduced real SchemaCacheAdapter which is fully switchable. By default it uses InMemory structure so cache will work exactly the same as earlier (so there are no breaking changes). In my solution there is only one schema-cache and it work in the way -> try to get schema from SchemaCacheAdapter. If there is no data in cache then query database (of course result are later cached). Next difference is that in my solution we just fetch schema on demand.

So to sum up:

  1. My changes introduces real SchemaCacheAdapter, which can be overriden while initializing parse-server
  2. SchemaController and DatabaseController is stateless since there is no local variable keeping state
  3. In my opinion it makes whole mechanism much more simple and error prone

@mtrezza
Copy link
Member

mtrezza commented Apr 28, 2022

Sounds great, let's wait for some more review feedback just to be sure.

In the meantime, could you rebase this onto alpha and take a look at the failing tests? (Ignore the flaky, unrelated tests failing). You can run each of the tests also locally (see package.json for npm commands).

@mtrezza mtrezza requested a review from a team April 28, 2022 15:28
@mtrezza mtrezza linked an issue Apr 28, 2022 that may be closed by this pull request
3 tasks
@mtrezza mtrezza removed the request for review from a team April 28, 2022 15:31
@Moumouls
Copy link
Member

Hi @pzawadzki-pronos , thanks for your PR ! 👍

After some reading on the issue, a quick look at the changes. I've some questions and maybe some doubts if this PR could be merged into the parse-server code base.

Parse Server is not currently designed in many aspects to handle multi-tenant requests directly.

As I understand in your use case, you send all queries to a single Parse Server deployment type (could be many parse-server with a single configuration behind a load balancer). Then on the fly, you may use a request header or a Parse.Object field to detect the target database. Based on this field you manipulate the Mongo connection string and maybe some other properties to send the query to the right database and handle the response correctly.

From my point of view. Without the complete code to correctly handle multi-tenant requests, your suggested changes will be not usable by the community. These changes are specifically designed for a multi-tenant use case. In this case, the complete multi-tenant implementation should be supported/documented.

From my knowledge of parse-server it means (this is a partial list, it needs a deeper investigation):

  • Introduce a new header (it will be compatible with REST and GraphQL API), this is the technique that you use?
  • This header is easy to manipulate, (JS SDK maybe need some adjustments, to add a header on the fly during query/save)
  • A MultiTenantController + MultiTenantInMemoryAdapter could be introduced to inject on the fly the corresponding database config, Cache config, that will be used for the current query. A Database/Schema store could be handled by the MultiTenantController to keep in cache the config for a specific tenant.
  • Cloud code use singleton Parse Node JS SDK, adjustments are needed. In the current state, it's not usable. How did you manage to correctly handle internal requests in your cloud code ?
  • Global goal, execute the whole request with a matching config (based on the tenant matched from the request header)

It seems that you succeed in using parse-server in a multi-tenant use case 👏 . Could you provide additional details about which changes you needed to make it work properly and the limitations encountered?

I think multi-tenant compatibility will help the community and show how parse-server is versatile, but we need to do this with the complete experience of working out of the box. 🚀

@pzawadzki-pronos
Copy link
Author

pzawadzki-pronos commented Apr 29, 2022

Hi @pzawadzki-pronos , thanks for your PR ! 👍

After some reading on the issue, a quick look at the changes. I've some questions and maybe some doubts if this PR could be merged into the parse-server code base.

Parse Server is not currently designed in many aspects to handle multi-tenant requests directly.

As I understand in your use case, you send all queries to a single Parse Server deployment type (could be many parse-server with a single configuration behind a load balancer). Then on the fly, you may use a request header or a Parse.Object field to detect the target database. Based on this field you manipulate the Mongo connection string and maybe some other properties to send the query to the right database and handle the response correctly.

From my point of view. Without the complete code to correctly handle multi-tenant requests, your suggested changes will be not usable by the community. These changes are specifically designed for a multi-tenant use case. In this case, the complete multi-tenant implementation should be supported/documented.

From my knowledge of parse-server it means (this is a partial list, it needs a deeper investigation):

  • Introduce a new header (it will be compatible with REST and GraphQL API), this is the technique that you use?
  • This header is easy to manipulate, (JS SDK maybe need some adjustments, to add a header on the fly during query/save)
  • A MultiTenantController + MultiTenantInMemoryAdapter could be introduced to inject on the fly the corresponding database config, Cache config, that will be used for the current query. A Database/Schema store could be handled by the MultiTenantController to keep in cache the config for a specific tenant.
  • Cloud code use singleton Parse Node JS SDK, adjustments are needed. In the current state, it's not usable. How did you manage to correctly handle internal requests in your cloud code ?
  • Global goal, execute the whole request with a matching config (based on the tenant matched from the request header)

It seems that you succeed in using parse-server in a multi-tenant use case 👏 . Could you provide additional details about which changes you needed to make it work properly and the limitations encountered?

I think multi-tenant compatibility will help the community and show how parse-server is versatile, but we need to do this with the complete experience of working out of the box. 🚀

Multitenant is not the only use-case. Other are:
a) make closer to achieve stateless parse-server (i sow there is some ticket about it) and we are getting rid of singleton which is rather bad practice
b) You can easily switch way of caching schema -> e.g we are using node-cache-manager with underlying Rerdis
c) I think that it can also solve some issues connected with multiple parse instances, see current implementation (before my changes):
image

Back to questions about multitenant:

  • In our case WEB application is sending applicatinId like "applicationId+tenantId". Just before using Parse, express 'hook' is splitting this string and setting proper applicationId. New header component - we are also considering, but first was quicker (for POC)
  • yes we are overriding databaseAdapter and newly introduced schemaAdapter. We are switching connection on the fly, based on context set in CLS (lib cls-hooked). It is set on the beginning on handling request - before parse is doing anything (based on header)
    image
  • currently we are not using in the project GraphQL
  • Cloud code use singleton - we are using directAccess so all triggers, queries seem to be executed on the same CLS context.
  • Conversion our app to multi-tenant is currently on POC phase. So from quick tests looks that major functionality works (app is not crashing), but we need to introduce some changes not connected with parse-server. We started from parse since we pointed it out like game-changer.

@pzawadzki-pronos pzawadzki-pronos force-pushed the 7950_Make_schema_controller_and_schema_cache_stateless branch from ed338bb to c603bb2 Compare April 29, 2022 13:13
@mtrezza mtrezza force-pushed the alpha branch 2 times, most recently from 59215e6 to e6d7d8f Compare May 1, 2022 02:29
@Moumouls
Copy link
Member

Moumouls commented May 1, 2022

Hi @pzawadzki-pronos

make closer to achieve stateless parse-server

I can see your point of view. A stateless parse-server will play better for many uses case, but the Schema cache is technically already stateless. Parse Server stores the whole schema architecture into the _Schema table/collection, and parse pull this schema from the database when it's needed. Developers have also the control to clear the schema when they want. I think stateless is not the correct word here maybe. It seems that you try to achieve "multi-schema" support that will be relevant mainly in a multi-tenant config.

make closer to achieve stateless parse-server (i sow there is some ticket about it) and we are getting rid of singleton which is rather bad practice

You are right, singleton is terrible and does not play well in many use cases, it can break the code isolation. And your use case shows the limitation of Parse. Cloud and Parse JS SDK singleton. This is not hard to fix I think but it needs a deeper look inside the parse-server and a separate PR.

You can easily switch way of caching schema -> e.g we are using node-cache-manager with underlying Redis

Also a multi-tenant use case. In a "traditional" parse-server config, a parse-server works commonly with one database. Then nothing will be faster than storing the Schema in the memory (like the alpha current implementation).

Cloud code use singleton - we are using directAccess so all triggers, and queries seem to be executed in the same CLS context.

I'm scared for you about maybe a concurrency issue. Did you check with high concurrency parallel usage that the cloud function works correctly? Parse Server is currently not designed for this type of usage, you can land with deep side effects.

Conversion of our app to multi-tenant is currently on POC phase. So from quick tests looks that major functionality works (app is not crashing), but we need to introduce some changes not connected with parse-server. We started from parse since we pointed it out like game-changer.

On my side, I can say that the parse-server is not designed to handle this kind of use case, you could have a hard to making it work properly on a specific use case, or you could be completely blocked on the issue/bug related to the internal architecture of parse-server. And I think it will never be supported since it needs a tremendous internal refactor to support correctly a "multi-tenant" out of the box. We have many features that will not work like DefinedSchema, LiveQuery, and maybe GraphQL.

I'm personally not in favor of this PR.

In another hand, I can suggest you take a look at managing the multi-tenant at a higher level (network/gateway).

Many companies have switched to GraphQL to handle correctly at scale the multi-tenant use case, and to only expose one documented API. This is GraphQL/Apollo federation Gateway https://www.apollographql.com/docs/federation/ it allows you to stitch/merge/manage/connect many (Remote or in memory) GraphQL Schema into one big GraphQL schema. In your use case, you could deploy 1 parse-server with its specific config for each tenant. Then you have another GraphQL Federation serve (a simple express lightweight app) that manages all queries. In the GraphQL federation server, you can easily introduce some custom business logic to handle correctly relations and more. Note: your Cloud code on a tenant will also be able to query your public GraphQL gateway federation server with a master key.
GraphQL is also cross-platform if your team works on mobile, web, game, iot. It will work perfectly.

I hope it could help 🚀

@pzawadzki-pronos pzawadzki-pronos force-pushed the 7950_Make_schema_controller_and_schema_cache_stateless branch from c603bb2 to 1f71dc8 Compare May 3, 2022 11:12
@pzawadzki-pronos
Copy link
Author

pzawadzki-pronos commented May 3, 2022

Hi @pzawadzki-pronos

make closer to achieve stateless parse-server

I can see your point of view. A stateless parse-server will play better for many uses case, but the Schema cache is technically already stateless. Parse Server stores the whole schema architecture into the _Schema table/collection, and parse pull this schema from the database when it's needed. Developers have also the control to clear the schema when they want. I think stateless is not the correct word here maybe. It seems that you try to achieve "multi-schema" support that will be relevant mainly in a multi-tenant config.

make closer to achieve stateless parse-server (i sow there is some ticket about it) and we are getting rid of singleton which is rather bad practice

You are right, singleton is terrible and does not play well in many use cases, it can break the code isolation. And your use case shows the limitation of Parse. Cloud and Parse JS SDK singleton. This is not hard to fix I think but it needs a deeper look inside the parse-server and a separate PR.

You can easily switch way of caching schema -> e.g we are using node-cache-manager with underlying Redis

Also a multi-tenant use case. In a "traditional" parse-server config, a parse-server works commonly with one database. Then nothing will be faster than storing the Schema in the memory (like the alpha current implementation).

Cloud code use singleton - we are using directAccess so all triggers, and queries seem to be executed in the same CLS context.

I'm scared for you about maybe a concurrency issue. Did you check with high concurrency parallel usage that the cloud function works correctly? Parse Server is currently not designed for this type of usage, you can land with deep side effects.

Conversion of our app to multi-tenant is currently on POC phase. So from quick tests looks that major functionality works (app is not crashing), but we need to introduce some changes not connected with parse-server. We started from parse since we pointed it out like game-changer.

On my side, I can say that the parse-server is not designed to handle this kind of use case, you could have a hard to making it work properly on a specific use case, or you could be completely blocked on the issue/bug related to the internal architecture of parse-server. And I think it will never be supported since it needs a tremendous internal refactor to support correctly a "multi-tenant" out of the box. We have many features that will not work like DefinedSchema, LiveQuery, and maybe GraphQL.

I'm personally not in favor of this PR.

In another hand, I can suggest you take a look at managing the multi-tenant at a higher level (network/gateway).

Many companies have switched to GraphQL to handle correctly at scale the multi-tenant use case, and to only expose one documented API. This is GraphQL/Apollo federation Gateway https://www.apollographql.com/docs/federation/ it allows you to stitch/merge/manage/connect many (Remote or in memory) GraphQL Schema into one big GraphQL schema. In your use case, you could deploy 1 parse-server with its specific config for each tenant. Then you have another GraphQL Federation serve (a simple express lightweight app) that manages all queries. In the GraphQL federation server, you can easily introduce some custom business logic to handle correctly relations and more. Note: your Cloud code on a tenant will also be able to query your public GraphQL gateway federation server with a master key. GraphQL is also cross-platform if your team works on mobile, web, game, iot. It will work perfectly.

I hope it could help 🚀

"I can see your point of view. A stateless parse-server will play better for many uses case, but the Schema cache is technically already stateless. Parse Server stores the whole schema architecture into the _Schema table/collection, and parse pull this schema from the database when it's needed. Developers have also the control to clear the schema when they want. I think stateless is not the correct word here maybe. It seems that you try to achieve "multi-schema" support that will be relevant mainly in a multi-tenant config."
"Also a multi-tenant use case. In a "traditional" parse-server config, a parse-server works commonly with one database. Then nothing will be faster than storing the Schema in the memory (like the alpha current implementation)."

In the code of parse-server there is support to create multiple parse-server instances (with different app id) on the same process of node. You can provide there different databases. The only limitation is that You cannot use cloud code on the server (that's why I cannot use this approach - in our application there is a lot of cloud code). And there, this singleton schema cache, can lead to some problems. So the question is -> if Parse is still supporting this option?
Seem like in parse-server code there are a lot of in consequences -> in most places there is cache /checks against applicationId, but 14months ago someone introduced this Adapter/Cache/SchemaCache -> which in my opinion made parse-code much worse, it is in Adapter directory but it is not a standard adapter (You cannot configure it), from my perspective it seem like temporary hack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make SchemaController and SchemaCache stateless
3 participants