
Optimize for reading large file? #845

Closed
yqylovy opened this issue May 19, 2021 · 7 comments
Comments

yqylovy commented May 19, 2021

I have a 30 MB Excel file that contains 50,000+ rows.
After unzipping, xl/worksheets/sheet1.xml is about 330 MB, so reading it uses a lot of memory. Is there any way to optimize this?

xuri (Member) commented May 19, 2021

The column and row iterators read data as a stream, so they use less memory than GetRows.
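The memory difference between streaming and GetRows can be sketched with the standard library alone: a token-by-token XML decode hands each row to the caller as soon as it is parsed, so only one row is resident at a time, while a GetRows-style call must materialize every row at once. This is a minimal illustration, not excelize's actual implementation, and the worksheet XML below is a simplified stand-in for the real sheet1.xml format:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// streamRows decodes the worksheet XML token by token, handing each row to
// yield as soon as it is parsed, so only one row is held in memory at a time.
func streamRows(sheet string, yield func([]string)) error {
	dec := xml.NewDecoder(strings.NewReader(sheet))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		if se, ok := tok.(xml.StartElement); ok && se.Name.Local == "row" {
			var row struct {
				Cells []string `xml:"c>v"`
			}
			// DecodeElement consumes exactly this <row>…</row> subtree.
			if err := dec.DecodeElement(&row, &se); err != nil {
				return err
			}
			yield(row.Cells)
		}
	}
}

func main() {
	sheet := `<worksheet>
	  <row><c><v>a1</v></c><c><v>b1</v></c></row>
	  <row><c><v>a2</v></c><c><v>b2</v></c></row>
	</worksheet>`
	_ = streamRows(sheet, func(cells []string) {
		fmt.Println(strings.Join(cells, "\t"))
	})
}
```

The key point is that the callback sees each row exactly once and nothing obliges the decoder to keep earlier rows around, which is why an iterator can stay flat in memory regardless of sheet size.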

yqylovy (Author) commented May 20, 2021

In my case, just calling OpenFile and the Cols method uses 700 MB. I found that File.XLSX holds the raw data in memory, so it takes up space while reading.
Still, reading data as a stream is better than GetRows, thanks.

ywtGGG commented Jul 28, 2021

I've hit the same problem: a 170 MB file with about 2,000,000 rows takes nearly 2 GB of memory to read, even using the rows iterator. Can this be optimized? pprof output:
Showing nodes accounting for 1916.15MB, 99.58% of 1924.18MB total
Dropped 30 nodes (cum <= 9.62MB)
Showing top 10 nodes out of 20
flat flat% sum% cum cum%
1336.30MB 69.45% 69.45% 1336.30MB 69.45% github.com/360EntSecGroup-Skylar/excelize/v2.readFile
346.17MB 17.99% 87.44% 515.17MB 26.77% github.com/360EntSecGroup-Skylar/excelize/v2.rowXMLHandler
145.50MB 7.56% 95.00% 145.50MB 7.56% encoding/xml.copyValue

xuri closed this as completed in ec8120c on Sep 18, 2021
xuri added a commit that referenced this issue Sep 19, 2021
- Close spreadsheet and row's iterator required
- New options `WorksheetUnzipMemLimit` have been added
- Improve streaming reading performance, memory usage decrease about 93.7%
xuri (Member) commented Sep 19, 2021

Hi @yqylovy, @ywtGGG, I have improved the streaming reader performance for the rows iterator; memory usage decreases by about 93.7%. Please upgrade to the master branch code. This is a breaking change: users should close the stream after using the row iterator, and close the spreadsheet after opening an existing one, like this:

package main

import (
    "fmt"

    "github.com/xuri/excelize/v2"
)

func main() {
    f, err := excelize.OpenFile("Book1.xlsx")
    if err != nil {
        fmt.Println(err)
        return
    }
    rows, err := f.Rows("Sheet1")
    if err != nil {
        fmt.Println(err)
        return
    }
    for rows.Next() {
        row, err := rows.Columns()
        if err != nil {
            fmt.Println(err)
        }
        for _, colCell := range row {
            fmt.Print(colCell, "\t")
        }
        fmt.Println()
    }
    // Close the stream
    if err = rows.Close(); err != nil {
        fmt.Println(err)
    }
    // Close the workbook
    if err := f.Close(); err != nil {
        fmt.Println(err)
    }
}

yqylovy (Author) commented Mar 14, 2022

@xuri hi, one question: I noticed that inside the Cols method there are still calls like

colIterator.cols.sheetXML = f.readBytes(name)
decoder := f.xmlNewDecoder(bytes.NewReader(colIterator.cols.sheetXML))

which decompress the whole sheet, so memory spikes when Cols is called. Is there a reason the raw bytes need to be kept here?

xuri (Member) commented Mar 14, 2022

Cell data is stored row by row in the worksheet, so the columns iterator, which requires backtracking, works differently from the rows iterator.
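Because the sheet XML is laid out row-major, extracting even a single column forces a decode of every row element; iterating several columns therefore needs either the raw decompressed bytes kept around (so a fresh decoder can be restarted per column) or repeated decompression of the sheet. A rough stdlib illustration of this layout constraint, using simplified worksheet XML rather than excelize's internals:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// columnOf pulls out the idx-th cell of every row. Even though only one
// column is wanted, the row-major layout means every <row> must be decoded.
func columnOf(sheet string, idx int) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(sheet))
	var col []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return col, nil
		}
		if err != nil {
			return nil, err
		}
		if se, ok := tok.(xml.StartElement); ok && se.Name.Local == "row" {
			var row struct {
				Cells []string `xml:"c>v"`
			}
			if err := dec.DecodeElement(&row, &se); err != nil {
				return nil, err
			}
			if idx < len(row.Cells) {
				col = append(col, row.Cells[idx])
			}
		}
	}
}

func main() {
	sheet := `<worksheet>
	  <row><c><v>a1</v></c><c><v>b1</v></c></row>
	  <row><c><v>a2</v></c><c><v>b2</v></c></row>
	</worksheet>`
	col, _ := columnOf(sheet, 1) // column B: both rows are still decoded
	fmt.Println(strings.Join(col, " "))
}
```

A rows iterator can discard the stream as it goes; a columns iterator that must revisit the sheet once per column cannot, which is one plausible reason the raw bytes are retained.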

yqylovy (Author) commented Mar 15, 2022

I see it also builds a decoder to iterate; where exactly is the difference? If this were changed to f.xmlDecoder(name), would it only affect performance, or would it affect functionality as well?

jenbonzhang pushed a commit to jenbonzhang/excelize that referenced this issue Oct 22, 2023
…eaking changes

- Close spreadsheet and row's iterator required
- New options `WorksheetUnzipMemLimit` have been added
- Improve streaming reading performance, memory usage decrease about 93.7%