Issue while querying the nested data parquet files. #21850

Closed
guderamesh opened this issue May 8, 2024 · 5 comments

@guderamesh

I get the error below when trying to query Parquet files that contain nested data.

"class org.apache.parquet.io.PrimitiveColumnIO cannot be cast to class org.apache.parquet.io.GroupColumnIO (org.apache.parquet.io.PrimitiveColumnIO and org.apache.parquet.io.GroupColumnIO are in unnamed module of loader io.trino.server.PluginClassLoader @16488ed7)"

Below is a sample record from the file.
clusterid=3243242, decesed=False, overallscroe=0, tier=0, buildverison=20230615, buildtype=0, names=arrays([{'firstname': 'myranda', 'middleinitial': 'K',
'middlename': 'K', 'lastname': 'Fielder', 'sources': array([3434], dtype=int32), 'sourcelevel':3, 'bizname':''}], dtype=object),
dobs=array([{{'year': 2005, 'month': 4, 'day': 8, 'yearscore': 7093, 'monthscore', 'sources': array([[3434], dtype=int32), 'sourcelevel':3, 'appendable':'False'}], dtype=object),
ethincities=array([], dtype=object),
phones=array([], dtype=object)

Note: The same Parquet file can be queried through a BigQuery external table.
Please suggest how to resolve the issue.
Thank you.

@ebyhr
Member

ebyhr commented May 8, 2024

How did you create the Parquet file? Can you share the file along with the table definition, or the complete steps to reproduce?

@guderamesh
Author

The Parquet file is created by an upstream application. I am not sure how at this point and need to check.

@raunaqmorarka
Member

Which release of Trino are you using? It might have been fixed by #20943.
We need a sample file and the table definition to work on this problem.

@guderamesh
Author

I resolved the issue by reading the Parquet file schema and creating a Hive table compatible with that schema. I am now able to query the Parquet files.
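
For anyone hitting the same error, here is a rough sketch of that last step: turning a schema that has already been read from the Parquet file into a matching table definition. It assumes the Hive connector's `format` and `external_location` table properties. The column types, table name, and location are hypothetical placeholders loosely based on the sample record above, not the actual schema of the file. The helper names `trino_type` and `create_table_sql` are made up for this illustration.

```python
# Sketch only: the schema below is an assumption inferred from the sample
# record in this issue. The real types must come from the Parquet file itself.

def trino_type(spec):
    """Render a nested schema description as a Trino type string.

    str            -> primitive type name, e.g. "varchar"
    one-item list  -> array(<element type>)
    dict           -> row(<field> <type>, ...)
    """
    if isinstance(spec, str):
        return spec
    if isinstance(spec, list):
        if len(spec) != 1:
            raise ValueError("an array spec must contain exactly one element type")
        return f"array({trino_type(spec[0])})"
    if isinstance(spec, dict):
        fields = ", ".join(f"{name} {trino_type(t)}" for name, t in spec.items())
        return f"row({fields})"
    raise TypeError(f"unsupported schema spec: {spec!r}")


def create_table_sql(table, columns, location):
    """Build a CREATE TABLE statement for an external Parquet table."""
    cols = ",\n".join(f"    {name} {trino_type(t)}" for name, t in columns.items())
    return (
        f"CREATE TABLE {table} (\n{cols}\n)\n"
        f"WITH (format = 'PARQUET', external_location = '{location}')"
    )


# Hypothetical columns: a primitive column and a nested array-of-row column.
columns = {
    "clusterid": "bigint",
    "names": [{
        "firstname": "varchar",
        "middleinitial": "varchar",
        "middlename": "varchar",
        "lastname": "varchar",
        "sources": ["integer"],
        "sourcelevel": "integer",
        "bizname": "varchar",
    }],
}

# Table name and location are placeholders.
print(create_table_sql("hive.default.people", columns, "s3://example-bucket/people/"))
```

The point of the sketch is that each nested column in the table must have the same shape (array, row, or primitive) as the corresponding column in the file. A mismatch between a primitive and a nested type is one plausible way to end up with the cast error above.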

@guderamesh
Author

Thank you.


3 participants