SST1RSoXSDB catalog search fails (metadata change): proposal_id can be string or numeric #201

@pbeaucage

Description

SST1RSoXSDB instrument metadata has changed again: the proposal_id field is now sometimes written as a numeric datatype rather than a string, e.g.:

from tiled.client import from_uri
c = from_uri('https://tiled.nsls2.bnl.gov/')
c['rsoxs']['raw'].distinct('proposal_id',counts=True)

Out[66]: 
{'metadata': {'start.proposal_id': [{'value': 'C-315077', 'count': 1185},
  [...]
   {'value': 'GU-307808-3', 'count': 228},
   {'value': 316980, 'count': 608},
   {'value': 'GU-308931', 'count': 1454},
   {'value': 'PU-303840', 'count': 117},
   {'value': '305401', 'count': 334},
[...]]},
 'structure_families': None,
 'specs': None}

This is a complicated one to fix because searchCatalog currently uses Regex for most matching. Regex, of course, only matches string values, so it can't match a number. Eq can match a number or a string, but only exactly, and it is datatype-sensitive, e.g.,

from tiled.queries import Eq
len(c['rsoxs']['raw'].search(Eq('proposal_id', "317132")))  # string query --> 0
len(c['rsoxs']['raw'].search(Eq('proposal_id', 317132)))    # integer query --> 132

So perhaps we add a special-case carve-out for proposal_id matches that first tries the search as a regex and, if that returns zero results, casts the query to int and tries an Eq match? I hate the special case but don't see another way. We could ask/search to see whether Tiled has an equivalence query that could handle this server-side, and long term the instrument should be writing this data as strings anyway to accommodate the GU/PU distinction.
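A rough sketch of that fallback, in case it helps the discussion. The function name `search_proposal_id` and its parameters are hypothetical, not existing searchCatalog code. The query constructors are passed in as arguments so the logic can be tried without a live server; in practice they would presumably be `Regex` and `Eq` from `tiled.queries`.

```python
def search_proposal_id(catalog, proposal_id, regex_query, eq_query, key="proposal_id"):
    """Try a regex (string) match first; fall back to an exact integer match.

    Hypothetical helper: `regex_query` and `eq_query` are callables taking
    (key, value), assumed here to be tiled.queries.Regex and tiled.queries.Eq.
    """
    # Regex only matches string-typed metadata, so run it on the string form.
    results = catalog.search(regex_query(key, str(proposal_id)))
    if len(results) > 0:
        return results

    # Nothing matched as a string: see whether the query has an integer form.
    try:
        numeric_id = int(proposal_id)
    except (TypeError, ValueError):
        # Not castable (e.g. 'GU-308931'), so there is no numeric form to try.
        return results

    # Eq is datatype-sensitive, so this only matches integer-typed values.
    return catalog.search(eq_query(key, numeric_id))


# Intended usage against the catalog above (assumes a reachable Tiled server):
#   from tiled.queries import Eq, Regex
#   runs = search_proposal_id(c['rsoxs']['raw'], "317132", Regex, Eq)
```

This still costs a second round trip to the server whenever the regex search comes back empty, and it only handles the integer case, so it keeps the special-case flavor complained about above.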

Any suggestions on how to accomplish this are most welcome. It's a bit of a bear.
