slice bounds out of range #462

Open · programmerX1123 opened this issue May 12, 2022 · 2 comments

Comments

programmerX1123 commented May 12, 2022

Hi, I am parsing a parquet file whose schema, as generated by parquet-tools, is:

{
  "Tag": "name=Schema, repetitiontype=REQUIRED",
  "Fields": [
    {
      "Tag": "name=Timestamp, type=INT64, repetitiontype=OPTIONAL"
    },
    {
      "Tag": "name=File_name, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=OPTIONAL"
    },
    {
      "Tag": "name=Avro_name, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=OPTIONAL"
    },
    {
      "Tag": "name=Offset, type=INT32, repetitiontype=OPTIONAL"
    },
    {
      "Tag": "name=File_format, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=OPTIONAL"
    },
    {
      "Tag": "name=Meta_data, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=OPTIONAL"
    }
  ]
}

And I use the following struct to hold the content of the parquet file:

type Schema struct {
	Timestamp int64  `parquet:"name=timestamp, type=INT64"`
	AvroName  string `parquet:"name=avro_name, type=BYTE_ARRAY"`
	FileName  string `parquet:"name=file_name, type=BYTE_ARRAY"`
	Offset    int32  `parquet:"name=offset, type=INT32"`
}

When I try to parse a parquet file that has 4905 rows, the following error is thrown:

panic: runtime error: slice bounds out of range [:4905] with capacity 3072

But when I run the same code on a parquet file that has only 5 rows, there is no error (these two parquet files are generated by the same script, so they share the same schema). Here is the result:

[{211297138286 Image0.avro 211297138286.png 269475} 
{210997038286 Image0.avro 210997038286.png 58} 
{210997038286 Image0.avro 210997038286.png 58} 
{210997038286 Image0.avro 210997038286.png 58} 
{210997038286 Image0.avro 210997038286.png 58}]

So is there a limit on the size of the parquet file?
Besides, when I omit the AvroName field, the first parquet file can also be read successfully (but AvroName holds file names just like FileName, so I don't see any difference between them).
Moreover, I have tested several parquet files with different numbers of rows, and they all produce the same slice bounds out of range error, so I don't think it is caused by an occasional mistake during file generation.
Now I am really confused and would appreciate your help in fixing this bug. Thank you in advance!

hangxie (Contributor) commented May 19, 2022

The schema and Go struct don't match: OPTIONAL fields should be defined as pointers so they can be nil.
If it still does not work after changing the definition of type Schema, it would help to have a sample parquet file (and ideally a snippet of your source code) to troubleshoot.
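
For reference, here is a minimal sketch of what that change might look like, assuming the tag syntax from the struct posted above and mirroring the parquet-tools schema dump; the field names and tags are illustrative, not a verified fix.

package main

import "fmt"

// OPTIONAL columns are declared as pointers so a NULL cell can be represented as nil.
type Schema struct {
	Timestamp *int64  `parquet:"name=Timestamp, type=INT64, repetitiontype=OPTIONAL"`
	FileName  *string `parquet:"name=File_name, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=OPTIONAL"`
	AvroName  *string `parquet:"name=Avro_name, type=BYTE_ARRAY, convertedtype=UTF8, repetitiontype=OPTIONAL"`
	Offset    *int32  `parquet:"name=Offset, type=INT32, repetitiontype=OPTIONAL"`
}

func main() {
	// After rows are read back into []Schema, a nil pointer marks a NULL value.
	var row Schema
	if row.FileName == nil {
		fmt.Println("File_name is NULL for this row")
	}
}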

zolstein pushed a commit to zolstein/parquet-go that referenced this issue Jun 23, 2023
When a Read is performed after SeekToRow on mergedRowGroups, the rowIndex is
checked against the seek index and advanced until the rowIndex == seek index.
Previously, the rowIndex was not advanced in the normal read path, resulting in
mistakenly dropping unread rows when advancing the rowIndex.
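
As a side note, here is a hypothetical sketch of the pattern that commit message describes (illustrative names only, not the library's actual code): rowIndex has to advance on the normal read path as well, otherwise a later SeekToRow skips rows that were never returned.

package main

import "fmt"

type mergedRows struct {
	rows     []int
	rowIndex int64 // number of rows already returned by Read
	seekTo   int64 // target set by SeekToRow; -1 means no pending seek
}

func (r *mergedRows) SeekToRow(n int64) { r.seekTo = n }

func (r *mergedRows) Read() (int, bool) {
	// Skip rows until rowIndex catches up with the seek target.
	for r.seekTo >= 0 && r.rowIndex < r.seekTo {
		if r.rowIndex >= int64(len(r.rows)) {
			return 0, false
		}
		r.rowIndex++
	}
	r.seekTo = -1
	if r.rowIndex >= int64(len(r.rows)) {
		return 0, false
	}
	v := r.rows[int(r.rowIndex)]
	r.rowIndex++ // the fix: also advance rowIndex on a normal read
	return v, true
}

func main() {
	r := &mergedRows{rows: []int{10, 20, 30, 40}, seekTo: -1}
	r.Read()         // returns 10; rowIndex becomes 1
	r.SeekToRow(2)   // should skip only row 20
	v, _ := r.Read() // returns 30, not 40
	fmt.Println(v)
}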

ZhenSh commented Jul 24, 2023

Hi @programmerX1123,
I have run into the same issue and am wondering how you got it resolved. Could you share the details?
Appreciate it. Thanks!
