Skip to content

teh/deltadiff

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Introduction

This is implementation of the rsync delta algorithm without a rolling hash function.

Instead of using a rolling hash I use the first 8 bytes of each block as the weak lookup key. For data with high entropy, e.g. gzipped data this works quite well. It breaks hard for arbitrary data, so use with care.

You can check the Shannon entropy of your data with the ent.py script:

$  cat /tmp/data.tgz | python ent.py 
0.999503541539

High entropy means something close to 1.

I am avoiding the rolling hash because a pure python implementation is very slow. This implementation is, while not fast, at least usable for files up to a few hundred megabytes.

Example usage:

from deltadiff2 import *

# We have an original file on the server and a changed file on the
# client:
sig = generate_signature('/tmp/original.zip')
delta = generate_delta('/tmp/changed.zip', sig)

# Now the client sends up the deltas over the dial up modem and we
# apply them on the server side:
patch('/tmp/original.zip', '/tmp/new.zip', sig, delta)

About

Implementation of rsync's delta diff (for high entropy data only!)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages