Hash of identical files #1732

Closed
Levap123 opened this issue Nov 28, 2023 · 3 comments · Fixed by #1735
@Levap123

Hello excelize team,

I am currently working with the excelize library and have encountered a scenario where I need to generate Excel files that are identical in every respect, such that their SHA256 hashes would also be identical. However, I've observed that even when creating new files with the same content and structure, the resulting SHA256 hashes are different for each file.

This leads me to wonder whether excelize adds any unique metadata or data (such as timestamps or unique identifiers) to each file during creation, which might be causing this discrepancy in hashes.

Could you please provide some insights on the following:

  • Does excelize inherently include any unique data or metadata in each new Excel file it creates?
  • If so, is there a way to control or disable this behavior to ensure that two identically created files using excelize would yield the same SHA256 hash?
@xuri
Member

xuri commented Nov 29, 2023

Thanks for your issue. Internally, we use a sync.Map to store the path name and content of each part of the workbook package, so the order of these parts may differ from one save to the next; that is the reason for this problem. I'll consider changing this design, but that will take some time. In the meantime, if you can, I suggest making a copy of the saved workbook file instead of saving the same workbook again as a new file.

@xuri xuri added the confirmed This issue can be reproduced label Nov 29, 2023
user65536 added a commit to user65536/excelize that referenced this issue Nov 30, 2023
@user65536
Contributor

@xuri I've created a pull request for this issue. The internal part paths are now sorted, so the output files are identical.
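For readers following along, here is a minimal sketch of the idea discussed above, using only the Go standard library. It is not excelize's actual code: `newParts`, `writeParts`, and `digest` are hypothetical helpers, and the part names and contents are made-up placeholders. The sketch assumes the parts live in a `sync.Map` (whose `Range` order is not guaranteed) and writes them to a ZIP archive in reverse-sorted name order, matching the wording of the commit message in this thread, so that the resulting bytes and their SHA256 hash do not depend on iteration order.

```go
package main

import (
	"archive/zip"
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"sync"
)

// newParts (hypothetical) fills a sync.Map with name/content pairs,
// inserting them in the order given.
func newParts(pairs [][2]string) *sync.Map {
	m := &sync.Map{}
	for _, p := range pairs {
		m.Store(p[0], []byte(p[1]))
	}
	return m
}

// writeParts (hypothetical) collects the part names, sorts them in reverse
// order, and writes every part to a ZIP archive in that fixed order.
func writeParts(parts *sync.Map) ([]byte, error) {
	var names []string
	parts.Range(func(k, _ any) bool {
		names = append(names, k.(string))
		return true
	})
	sort.Sort(sort.Reverse(sort.StringSlice(names)))

	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, name := range names {
		content, _ := parts.Load(name)
		// Create leaves the modification time unset, so no timestamp
		// ends up in the archive.
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write(content.([]byte)); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// digest (hypothetical) returns the hex-encoded SHA256 of the archive.
func digest(parts *sync.Map) (string, error) {
	data, err := writeParts(parts)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	// Same placeholder parts, inserted in different orders.
	a := newParts([][2]string{
		{"xl/workbook.xml", "<workbook/>"},
		{"[Content_Types].xml", "<Types/>"},
		{"xl/worksheets/sheet1.xml", "<worksheet/>"},
	})
	b := newParts([][2]string{
		{"xl/worksheets/sheet1.xml", "<worksheet/>"},
		{"xl/workbook.xml", "<workbook/>"},
		{"[Content_Types].xml", "<Types/>"},
	})
	ha, _ := digest(a)
	hb, _ := digest(b)
	fmt.Println(ha == hb)
}
```

Sorting the names before writing is what makes the archive layout independent of map iteration order; whether the order is ascending or reversed does not matter for reproducibility as long as it is fixed.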

@xuri xuri added this to Improve the Compatibility in v2.8.1 Nov 30, 2023
@tjayrush

This feature is HUGE. Thanks for including this. We publish Excel sheets from a process that starts with blockchain data, which (as you probably know) is hashed data. Until now, we would arrive all the way at the end of a lengthy data extraction pipeline that preserves the "hashed data nature" of the blockchain data, but when we wrote to Excel using your library, that preserved, hashable nature of the data was destroyed.

We do all this so we can store the Excel file on IPFS, which is a hash-based, content-addressable storage medium. Our pipeline can now produce perfectly auditable and reproducible data all the way from the blockchain into an Excel spreadsheet. SUPER useful. Thanks.

xuri added a commit to barlevd/excelize that referenced this issue Apr 30, 2024
Saving workbook with reverse sorted internal part path to keep same hash of identical files
xuri pushed a commit that referenced this issue Apr 30, 2024
Saving workbook with reverse sorted internal part path to keep same hash of identical files and fix incorrect MIME type
barlevd added a commit to barlevd/excelize that referenced this issue Apr 30, 2024