
Optimize for reading large file? #845

Closed
yqylovy opened this issue May 19, 2021 · 7 comments
Comments

yqylovy commented May 19, 2021

I have a 30 MB Excel file that contains 50,000+ rows.
After unzipping, xl/worksheets/sheet1.xml is about 330 MB, so reading it uses a lot of memory. Is there any way to optimize this?

xuri (Member) commented May 19, 2021

The column and row iterators read data as a stream, so they use less memory than GetRows.
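The memory difference between streaming and GetRows can be sketched with the standard library alone: a token-by-token XML decode hands each row to the caller as soon as it is parsed, so only one row is resident at a time, while a GetRows-style call must materialize every row at once. This is a minimal illustration, not excelize's actual implementation, and the worksheet XML below is a simplified stand-in for the real sheet1.xml format:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// streamRows decodes the worksheet XML token by token, handing each row to
// yield as soon as it is parsed, so only one row is held in memory at a time.
func streamRows(sheet string, yield func([]string)) error {
	dec := xml.NewDecoder(strings.NewReader(sheet))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		if se, ok := tok.(xml.StartElement); ok && se.Name.Local == "row" {
			var row struct {
				Cells []string `xml:"c>v"`
			}
			// DecodeElement consumes exactly this <row>…</row> subtree.
			if err := dec.DecodeElement(&row, &se); err != nil {
				return err
			}
			yield(row.Cells)
		}
	}
}

func main() {
	sheet := `<worksheet>
	  <row><c><v>a1</v></c><c><v>b1</v></c></row>
	  <row><c><v>a2</v></c><c><v>b2</v></c></row>
	</worksheet>`
	_ = streamRows(sheet, func(cells []string) {
		fmt.Println(strings.Join(cells, "\t"))
	})
}
```

The key point is that the callback sees each row exactly once and nothing obliges the decoder to keep earlier rows around, which is why an iterator can stay flat in memory regardless of sheet size.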

yqylovy (Author) commented May 20, 2021

In my case, just calling OpenFile and the Cols method uses 700 MB. I found that File.XLSX holds the raw data in memory, so it takes up space while reading.
Still, reading data as a stream is better than GetRows, thanks.

ywtGGG commented Jul 28, 2021

I've hit the same problem: a 170 MB file with about 2,000,000 rows takes nearly 2 GB of memory to read, even using the rows iterator. Can this be optimized? pprof output:
Showing nodes accounting for 1916.15MB, 99.58% of 1924.18MB total
Dropped 30 nodes (cum <= 9.62MB)
Showing top 10 nodes out of 20
flat flat% sum% cum cum%
1336.30MB 69.45% 69.45% 1336.30MB 69.45% github.com/360EntSecGroup-Skylar/excelize/v2.readFile
346.17MB 17.99% 87.44% 515.17MB 26.77% github.com/360EntSecGroup-Skylar/excelize/v2.rowXMLHandler
145.50MB 7.56% 95.00% 145.50MB 7.56% encoding/xml.copyValue

xuri closed this as completed in ec8120c on Sep 18, 2021
xuri added a commit that referenced this issue Sep 19, 2021
- Close spreadsheet and row's iterator required
- New options `WorksheetUnzipMemLimit` have been added
- Improve streaming reading performance, memory usage decrease about 93.7%
xuri (Member) commented Sep 19, 2021

Hi @yqylovy, @ywtGGG, I have improved the streaming reader performance for the rows iterator; memory usage decreases by about 93.7%. Please upgrade to the master branch code. This is a breaking change: users should close the stream after using the row iterator, and close the spreadsheet after opening an existing one, like this:

package main

import (
    "fmt"

    "github.com/xuri/excelize/v2"
)

func main() {
    f, err := excelize.OpenFile("Book1.xlsx")
    if err != nil {
        fmt.Println(err)
        return
    }
    rows, err := f.Rows("Sheet1")
    if err != nil {
        fmt.Println(err)
        return
    }
    for rows.Next() {
        row, err := rows.Columns()
        if err != nil {
            fmt.Println(err)
        }
        for _, colCell := range row {
            fmt.Print(colCell, "\t")
        }
        fmt.Println()
    }
    // Close the stream
    if err = rows.Close(); err != nil {
        fmt.Println(err)
    }
    // Close the workbook
    if err := f.Close(); err != nil {
        fmt.Println(err)
    }
}

yqylovy (Author) commented Mar 14, 2022

@xuri hi, one question: I noticed that inside the Cols method there are still calls like

colIterator.cols.sheetXML = f.readBytes(name)
decoder := f.xmlNewDecoder(bytes.NewReader(colIterator.cols.sheetXML))

which decompress the whole sheet, so memory spikes when Cols is called. Is there a reason the raw bytes need to be kept here?

xuri (Member) commented Mar 14, 2022

Cell data is stored row by row in the worksheet, so the columns iterator, which requires backtracking, works differently from the rows iterator.
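Because the sheet XML is laid out row-major, extracting even a single column forces a decode of every row element; iterating several columns therefore needs either the raw decompressed bytes kept around (so a fresh decoder can be restarted per column) or repeated decompression of the sheet. A rough stdlib illustration of this layout constraint, using simplified worksheet XML rather than excelize's internals:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// columnOf pulls out the idx-th cell of every row. Even though only one
// column is wanted, the row-major layout means every <row> must be decoded.
func columnOf(sheet string, idx int) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(sheet))
	var col []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return col, nil
		}
		if err != nil {
			return nil, err
		}
		if se, ok := tok.(xml.StartElement); ok && se.Name.Local == "row" {
			var row struct {
				Cells []string `xml:"c>v"`
			}
			if err := dec.DecodeElement(&row, &se); err != nil {
				return nil, err
			}
			if idx < len(row.Cells) {
				col = append(col, row.Cells[idx])
			}
		}
	}
}

func main() {
	sheet := `<worksheet>
	  <row><c><v>a1</v></c><c><v>b1</v></c></row>
	  <row><c><v>a2</v></c><c><v>b2</v></c></row>
	</worksheet>`
	col, _ := columnOf(sheet, 1) // column B: both rows are still decoded
	fmt.Println(strings.Join(col, " "))
}
```

A rows iterator can discard the stream as it goes; a columns iterator that must revisit the sheet once per column cannot, which is one plausible reason the raw bytes are retained.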

yqylovy (Author) commented Mar 15, 2022

I see it also builds a decoder to iterate; where exactly is the difference? If this were changed to f.xmlDecoder(name), would it only affect performance, or would it affect functionality as well?

jenbonzhang pushed a commit to jenbonzhang/excelize that referenced this issue Oct 22, 2023
…eaking changes

- Close spreadsheet and row's iterator required
- New options `WorksheetUnzipMemLimit` have been added
- Improve streaming reading performance, memory usage decrease about 93.7%