Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OPTIMADE API] Several validation issues reported by the OPTIMADE provider dashboard #128

Closed
ml-evs opened this issue Sep 29, 2021 · 15 comments

Comments

@ml-evs
Copy link
Contributor

ml-evs commented Sep 29, 2021

Hi qmpy devs (I guess mostly @tachyontraveler 馃槈), as I don't think any of you were in the last OPTIMADE meeting, I just thought I would give you a heads-up that the OPTIMADE provider dashboard now lists the errors associated with your implementation. OQMD has quite a few validation errors that you may want to sort out (that others have also raised at recent OPTIMADE tutorials, that make it impossible to use OQMD with the new pymatgen OPTIMADE client). Here is the dashboard link: https://www.optimade.org/providers-dashboard/providers/oqmd.html

The main errors reported by that dashboard can be clustered vaguely into:

  1. Issues with the OptimadeStructureSerializer, e.g. missing values (that should be null or present), incorrect chemical formula formats (additional redundant "1"s, wrong element orderings etc) and some fields having the wrong type, e.g. elements should be a list and not a string.
  2. Incorrect reporting of errors/not implemented filters, e.g. the filter ?filter=id < 12345 should return a proper status code and error format that indicates to automated clients that this filter is not available.
  3. The only mandatory filter (structure_features) is not supported - in this case it, I don't think OQMD hosts any "weird" structures that would have values for structure_features, so it should probably just return all-or-nothing.
  4. A minor issue with the /info/structures endpoint, you need to report that your prefix is _oqmd (even if it is obvious) so that clients know to look for your prefixed properties (which are perhaps the most useful ones I've seen - would love it if all "stability" dbs could agree on a format of hull distance for filtering).

Most of the reported errors should only require minor changes, I'm happy to help out and make some of the tweaks if that would help! If you could handle adding optimade-python-tools as a dependency (only a minimal install would be needed), you could also validate the models produced by your serializer with Python code before it gets deployed.

Let me know what you think!

@tachyontraveler
Copy link
Member

Hi @ml-evs , Thanks for reporting the issues. I will work on them this week and they all should be fixed by this weekend - unless I ran into any totally unexpected issues with the Django REST API framework that we use in OQMD.

(and I will be there at the October OPTIMADE meeting)

I'd be happy to look into adding the optimade-python-tools to qmpy once the current errors are fixed, and will let you know if I come across any significant hurdles.

Abhi

@tachyontraveler
Copy link
Member

I've fixed every issue except for the "CONSTANTFIRST" queries since they're optional in OPTIMADE v1.0.0 and v1.1.0

I will close this issue in a few days unless @ml-evs has any comments/suggestions.

@ml-evs
Copy link
Contributor Author

ml-evs commented Nov 16, 2021

Thanks @tachyontraveler, this looks great! The dashboard confirms that the only issue is now the constant first comparison.

One issue is that I don't think constant first queries are optional... the spec says:

6.2.2   Numeric and String comparisons

Comparisons involving Numeric and String properties can be expressed using the usual comparison operators: '<', '>', '<=', '>=', '=', '!='. Implementations MUST support comparisons in the forms:

identifier <operator> constant
constant <operator> identifier

Where identifier is a property name and constant is either a numerical or string type constant.

Implementations MAY also support comparisons with identifiers on both sides, and comparisons with numerical type constants on both sides, i.e., in the forms:

identifier <operator> identifier
constant <operator> constant

so while I think identifier <operator> identifier is optional (and whatever constant <operator> constant is meant to do...), it seems that constant <operator> identifier is a "MUST".

@tachyontraveler tachyontraveler unpinned this issue Nov 16, 2021
@tachyontraveler
Copy link
Member

Thanks for clarifying that, @ml-evs . I have patched all the issues and now the local optimade validator installed my machine finds no test failures. Also passes 2/4 optional tests as well. I can work on the remaining optional fixes sometime later - probably after this quarter.

@tachyontraveler
Copy link
Member

Closing this issue since the OQMD implementation now passes all the validation tests, as indicated on the dashboard.

@ml-evs
Copy link
Contributor Author

ml-evs commented Nov 17, 2021

Awesome, nice one!

@ml-evs
Copy link
Contributor Author

ml-evs commented Nov 25, 2021

Hey @tachyontraveler, @JPBergsma spotted one final issue with the OQMD OPTIMADE (that also revealed a bug in the validator). The anonymous chemical formulae are ordered incorrectly, e.g., you serve ABCD4 rather than A4BCD. We had a bug that would only enforce the correct order on proportions != 1...

I will quickly throw together a PR for this now.

@tachyontraveler
Copy link
Member

I will close this issue after the anonymous formula issue is resolved in the scripts and then implemented at oqmd.org

@tachyontraveler
Copy link
Member

tachyontraveler commented Dec 17, 2021

@ml-evs

I tested a few ways to solve this problem and hit a wall at a consistency issue in the filtering of chemical_formula_anonymous.

There's an issue with implementing support for STARTS operation for chemical_formula_anonymous because of the disparity in the ordering between OQMD and OPTIMADE.

For example,

The OPTIMADE query chemical_formula_anonymous STARTS "A2B" AND nelements < 4 requires to return all materials whose anonymous formula is one of
"A2B", "A2B2", "A2BC", "A2B2C2"

But internally, I have to convert the query to look for chemical_formula_anonymous to start with "AB2", not "A2B", as the order in which the anonymous formula is saved in OQMD is ascending. In that case, the internal Django query would return the materials whose anonymous formula is one of
"AB2", "AB2C", "AB2C2", "AB2C3", "AB2C4", "AB2C5", ...

Writing a script to parse the OPTIMADE query and finding relevant matches before the Django SQL query is impractical when nelements filter is not applied.

Now, I can just change the format of retrieved data and disable the support for the STARTS query with chemical_formula_anonymous. Lmk if that's a solution for now with OPTIMADE?

There's a very low chance that I will change the OQMD's generic formula convention anytime soon as it'll break many of the internal users' data generation and retrieval scripts. I will keep this issue open until it's fully resolved.

@ml-evs
Copy link
Contributor Author

ml-evs commented Dec 17, 2021

Tricky, and completely understand that you don't want to change the qmpy representation!

It actually looks like supporting substring comparisons for chemical_formula_anonymous is not mandatory (see https://github.com/Materials-Consortia/OPTiMaDe/blob/develop/optimade.rst#chemical-formula-anonymous), so you can just spit out a 501 Not Implemented for CONTAINS, STARTS, ENDS etc. Do you have a filter mapper that already allows you to check for equality? e.g. if they ask for A2B in the filter, the query gets converted to AB2 for Django? If so, that should be enough!

@ml-evs
Copy link
Contributor Author

ml-evs commented Dec 17, 2021

I see that the dashboard was having a go at you for this (which I guess prompted your comment) - I can't check right now, but if it still moans when you return a 501, we can fix that in the validator itself.

@tachyontraveler
Copy link
Member

Tricky, and completely understand that you don't want to change the qmpy representation!

It actually looks like supporting substring comparisons for chemical_formula_anonymous is not mandatory (see https://github.com/Materials-Consortia/OPTiMaDe/blob/develop/optimade.rst#chemical-formula-anonymous), so you can just spit out a 501 Not Implemented for CONTAINS, STARTS, ENDS etc. Do you have a filter mapper that already allows you to check for equality? e.g. if they ask for A2B in the filter, the query gets converted to AB2 for Django? If so, that should be enough!

Yes, I can easily implement the equality comparison after conversion to OQMD format. I will proceed in this way in that case.

@tachyontraveler
Copy link
Member

Now the query with START operation results in a 501 error while a query with equality operation returns a successful response.

The anonymous formulae are returned in OPTIMADE's preferred format

@ml-evs
Copy link
Contributor Author

ml-evs commented Dec 17, 2021

Looks good to me! I notice that the validator does incorrectly flag this, so I raised an issue to track this in the Python tools.

@tachyontraveler
Copy link
Member

Sounds good. I will close this issue in that case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants