Description
With PyTables, you can run queries like `someTable.where("X**2 + Y**2 < 1")` (as described here). pandas, however, seems to impose its own, more restrictive query syntax that allows only a very limited set of query operations. This is confusing to users who expect to be able to query a pandas HDFStore in the same way that they would query the underlying PyTables table.
Comments by @jreback on this Stack Overflow question suggest that the pandas query handling is needed to support complex queries and/or queries involving datetimes. However, it seems rather extreme to block all queries from using complex expressions, even when the same query would work fine on the PyTables table itself. (The SO question gives a simple example of that.) I suggest the following:
- The documentation should make it very clear that the examples provided there exhaustively describe the types of possible queries. (If they don't, then we need to come up with a comprehensive explanation of what is allowed.) It should also state that this query syntax is a subset of what PyTables allows.
- It would be nice to provide some sort of override flag (or perhaps a separate function) saying "Just pass this query through to PyTables", to stop pandas from messing with the query. There is still work to be done in terms of wrapping the query result in a DataFrame, but this is on the output side and doesn't require modifying the query on the way in. This might also need to accept the `condvars` argument to pass ad-hoc variables to PyTables for use in queries.
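To make the second suggestion concrete, here is a rough sketch of what such a pass-through could look like from user code today. It is only an illustration under a few assumptions: `select_passthrough` is a hypothetical name (not an existing pandas function), the frame is stored in table format with `data_columns=True` so that `X` and `Y` are real PyTables columns, and edge cases such as empty results are not handled. It asks PyTables for the matching row coordinates and then lets pandas build the DataFrame from those coordinates.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def select_passthrough(store, key, condition, condvars=None):
    """Hypothetical helper: evaluate a raw PyTables condition, return a DataFrame."""
    # The underlying tables.Table that pandas wrote for this key.
    table = store.get_storer(key).table
    # Let PyTables evaluate the condition untouched and return row coordinates.
    # An explicit dict is always passed so that PyTables does not look up
    # variables in this helper's local namespace.
    coords = table.get_where_list(condition, condvars=dict(condvars or {}))
    # pandas accepts integer row coordinates as `where` and wraps the rows.
    return store.select(key, where=coords)


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(-2, 2, size=(1000, 2)), columns=["X", "Y"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")
    with pd.HDFStore(path, mode="w") as store:
        # Table format plus data_columns=True makes X and Y queryable columns.
        store.put("points", df, format="table", data_columns=True)

        # The query from the description, passed straight to PyTables.
        inside = select_passthrough(store, "points", "X**2 + Y**2 < 1")

        # Same idea with an ad-hoc variable supplied through condvars.
        scaled = select_passthrough(
            store, "points", "X**2 + Y**2 < r2", condvars={"r2": 4.0}
        )
```

This only touches the input side through `get_where_list`; the output side reuses the existing `select` machinery, which is roughly the split described in the bullet above.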
Any thoughts on this? I'm curious what kinds of queries motivated the creation of this pandas-specific query syntax initially.