Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema and statement validation prior to execution #40

Merged
merged 51 commits into from
Nov 16, 2016
Merged

Schema and statement validation prior to execution #40

merged 51 commits into from
Nov 16, 2016

Conversation

aew
Copy link
Collaborator

@aew aew commented Nov 11, 2016

Fixes a number of outstanding bugs and incompatibilities between validation output and execution input. Backwards compatible.

Changes include:

  • Perform schema and statement validation prior to execution, with more robust error handling (backwards compatible)
  • Line and column numbers in validation error output
  • Fix bug that could cause an infinite loop in the absence of a parent spec
  • Enable memoization of schema and statement preparation prior to execution
  • Post-process validated schema to eliminate unnecessary complexity
  • Fix potential collision of specs for root query and mutation fields sharing the same type
  • Support validation of arguments for root fields
  • Fix bug in validation of unused fragments
  • Fix bugs in validation of scalar leafs for lists of scalars and lists of objects
  • Deprecate graphql-clj.type/create-schema in favor of graphql-clj.validator/validate-schema
  • Fix bug in fragment cycles validation
  • Properly assign specs for recursively nested fields (switch to pre-order traversal for adding statement specs)

This is a large change because the parser was eliminating line and column metadata necessary for producing validation error messages. The executor was coupled to parser output rather than validator output. This PR proposes that the executor take the validator output as an input rather than the parser output.

Related discussion: #22

aew added 30 commits November 5, 2016 16:54
@aew
Copy link
Collaborator Author

aew commented Nov 11, 2016

I think I understand the issue - a query failing validation (because of this subscriptionType issue) is returning more than just the errors key, including things that don't serialize to JSON. I'll patch this.

@aew
Copy link
Collaborator Author

aew commented Nov 12, 2016

There were actually additional bugs hidden by a bug in the fragment cycles validator which was inappropriately short-circuiting the DFS. I've fixed these bugs and added a couple missing unit tests for things that were breaking the Graphiql starter project.

@tendant
Copy link
Owner

tendant commented Nov 12, 2016

Just had simple test in GraphiQL, looks like it work fine now.

Thanks a lot for your great work. The validation really makes huge difference on usability of the library. It helps provide much better error messages.

@tendant
Copy link
Owner

tendant commented Nov 12, 2016

Do you have more commits coming? If not, I might release a new version tonight.

@aew
Copy link
Collaborator Author

aew commented Nov 12, 2016

I'm not planning to commit any more unless we discover more bugs, so feel free to release a new version. Thanks!

@tendant
Copy link
Owner

tendant commented Nov 14, 2016

Found a new issue in inspector:

https://github.com/tendant/graphql-clj-starter/tree/testing-validation-loc

Error result:

{
  "errors": [
    {
      "error": "Cannot query field 'kind' on type 'FullType'.",
      "loc": {
        "line": 21,
        "column": 1
      }
    }
  ]
}

@aew
Copy link
Collaborator Author

aew commented Nov 14, 2016

Ok - taking a look at this. Can you clarify which query you used to produce this error?

@tendant
Copy link
Owner

tendant commented Nov 14, 2016

It came from introspection query.

Just open graphiql, the error will show up in graphql result.

@tendant
Copy link
Owner

tendant commented Nov 15, 2016

Problem might relate to reloading of schema.

Looks like after updating schema in dev environment. Introspection query returns:

{
  "errors": [
    {
      "error": "Cannot query field 'kind' on type 'FullType'.",
      "loc": {
        "line": 21,
        "column": 1
      }
    }
  ]
}

When try to run any query after changing schema in dev server, I got below error for all root query fields:

{:errors [{:node-type :field, :parent {:node-type :operation-definition, :v/parent {:node-type :statement-root}, :v/path [Query]}, :node {:node-type :field, :field-name #object[graphql_clj.box.Box 0x231e19a3 {:status :ready, :val chapters}], :selection-set [{:node-type :field, :field-name #object[graphql_clj.box.Box 0x702e24d5 {:status :ready, :val id}], :v/path [Query chapters id], :spec :graphql-clj.1623193917/id}], :v/path [Query chapters], :spec :graphql-clj.1623193917/chapters, :v/parent {:node-type :operation-definition, :v/parent {:node-type :statement-root}, :v/path [Query]}, :v/parentk :selection-set}, :error Parent type not found}]}

@tendant
Copy link
Owner

tendant commented Nov 15, 2016

BTW: I tried to use validator/validate-schema* function, it still has the same issue.

(let [resolver (resolver/create-resolver-fn (:schema (:state validated-statement)) resolver-fn)]
(assoc-in validated-statement [:state :resolver] resolver)))))))

(def prepare (memoize prepare*))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better not to use memoize. User can choose to use memoize if needed, they can also choose how to memoize.

Although memoize will get higher performance at the expense of higher memory use. However it is better leave it to user to decide whether use it or not.

Another good thing about leaving this to the user is that user can choose to use memoize if they wish, or they can use enhanced version of memoize with customized caching policy.

(graphql-clj.type/create-type-meta-fn graphql-clj.type/demo-schema))
(let [type-schema (type/create-schema graphql-clj.introspection/introspection-schema)]
(execute nil type-schema nil "query{__schema{types {name kind}}}")))
([validated-document]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For public api of execute, we should try to keep it to the bare minimal. User can create they own version of execute if needed.

It is easier to support only one execute function and maintain backward compatibility in the long run.

@tendant
Copy link
Owner

tendant commented Nov 15, 2016

The problem goes away when all memoize function calls are removed.

Also, I would like to propose to expose execute function to be as simple as we can:

(defn execute
  [context schema-or-state resolver-fn statement-or-state variables]
    ...))

User can cache validated schema, validated statement and resolver-fn as they wish.

@tendant
Copy link
Owner

tendant commented Nov 15, 2016

Another issue with the validated schema and statement: validation result is more than 10x bigger than parsing result.

Below are size comparison for schema "graphql-clj-starter.graphql/starter-schema"

6.2K  parsed-schema
70K   validated-schema

Below are size comparison for statement "{droid}"

242B parsed-statement
71K    validated-statement

Looks like the validated data contains the schema in addition to the result.

(filter #(= (:type-name %) root-name))
first
:fields
(map (juxt (comp box/box->val :field-name) (comp box/box->val :type-name)))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug here: Root fields return nil as field type, when root field is using composite type like NOT_NULL, LIST etc.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Root field type can't be expressed use string value of type-name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this for list types. I think non-null types are working (there is a test for this with loremIpsum of type String!) but I may be missing something.

(if (or recursive? (not (meta d)))
(do
(assert (= (first d) 'clojure.spec/def)) ;; Protect against unexpected statement eval
(try (eval d) (catch Compiler$CompilerException _ d))) ;; Squashing errors here to provide better error messages in validation
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(assert (= (first d) 'clojure.spec/def)) does not prevent against all unexpected statement eval. The best way is not to use eval at all.

Copy link
Collaborator Author

@aew aew Nov 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't think of another way to dynamically generate these specs. We could avoid using spec - we could use prismatic schema instead or something like that. Other suggestions?

@aew
Copy link
Collaborator Author

aew commented Nov 16, 2016

Sounds good re: removing memoize and the single arity for the execute function - I'll make those changes.

Re: size of the validation output - I'm waiting to see how validation output interacts with a simplified execution phase to understand exactly what we need to keep, then we can remove everything else and it will be small.

@aew
Copy link
Collaborator Author

aew commented Nov 16, 2016

I've made the changes discussed above - @tendant worth taking another look.

For additional context, there are a lot of outstanding issues that concern me, but I don't know how many of them we want to fix in a single PR:

  • Getting to "of type" is complex, happens in different ways in different places, and most likely needs to be solved a different way to maintain the same behavior with simpler code
  • I'm not very happy with the dynamic use of spec - we may be better served using prismatic schema given spec's global state, eval, minimal leverage from the specs after they are created, etc.
  • I would expect the validation phase to be slow (I haven't stopped to benchmark it). I don't think that optimizing this is a high priority at the moment, because ideally validation would behave like JIT compilation, whereas execution affects latency of serving each request. I think we want the output of the validation phase to inline everything necessary to execute a query (probably including the specific resolver function for each field, which would be a breaking change from today). I don't have a clear idea of what the statement validation output format should be until we re-implement the execution phase.
  • With this change, current library users may start doing validation inside the inner loop, which could be an unexpected breaking change from a performance perspective (this is why I initially thought to do memoization inside the library to keep things working as expected). We may want to think about how we make it clear where people should be memoizing, or bite the bullet and make an actual breaking change once the execution phase is clear.

Given all of this, we could either merge this and hope for the best, or remove validation as a requirement for the execution phase, and perhaps add a different execution function for previously "prepared" inputs. WDYT?

@tendant
Copy link
Owner

tendant commented Nov 16, 2016

I will test it and merge it, if there is no major issue.

As long as we keep the public API simple and consistent, it is not too hard to change the internal implementation if needed.

@tendant tendant merged commit 973fcb8 into tendant:master Nov 16, 2016
@tendant
Copy link
Owner

tendant commented Nov 16, 2016

Merged. Thanks you @aew for your great contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants