# Context

LZ4 is a widely used lossless compression algorithm. The reference
implementation of LZ4 at github.com/lz4/lz4 is subject to a heap-based
buffer overflow in releases prior to 1.9.2 as described in
CVE-2019-17543. The vulnerability is fixed by commit
d7cad81093cd805110291f84d64d385557d0ffba.


# Aim

Let's try to determine all the code that may still be vulnerable!

First, let's define the WoC (World of Code) access functions.

In [34]:
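import requests, json

def showCnt (type, sha1):
  # Fetch the content of a git object (e.g. a 'commit') from the WoC lookup API.
  url='http://worldofcode.org/api/lookup?command=showCnt' + '&type='+ type + '&sha1=' + sha1
  r = requests.get(url)
  res = json.loads(r.content)['stdout']
  if type == 'commit':
    # Commit content is ';'-separated: fields 1-3 hold tree, parent and author.
    res = res.split(';')
    return ('Tree:'+res[1]+'\nParent:'+res[2]+'\nAuthor:'+res[3]+'\n')
  return (res)


def getValue (map, key):
  # Look up the values stored under key in a WoC map such as 'c2b' or 'b2P'.
  #map/key may not be correct names of the parameters
  url='http://worldofcode.org/api/lookup?command=getValues' + '&type='+ map + '&sha1=' + key
  r = requests.get(url)
  # Skip the first ';'-separated field and return the remaining ones.
  return (json.loads(r.content)['stdout']).split(';')[1:]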

Once these functions are defined, we can investigate the problem:

What is the commit?

What files does it modify?

In [35]:
print (showCnt('commit', 'd7cad81093cd805110291f84d64d385557d0ffba'))

print (getValue('c2b', 'd7cad81093cd805110291f84d64d385557d0ffba'))

print (getValue('c2f', 'd7cad81093cd805110291f84d64d385557d0ffba'))

Tree:9af7fb7a0b32809791cad70c12eda3dc9ccb48c7
Parent:1bcde6414a68094601ecd57a968808fdd43fb986
Author:Nick Terrell <terrelln@fb.com>

['9808d70aed03290c648b983ea404446779eff501\n']
['lib/lz4.c\n']


The commit with the fix contains only one blob
(9808d70aed03290c648b983ea404446779eff501) that creates a new version of lib/lz4.c. The author of that commit is
Nick Terrell <terrelln@fb.com>. 
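
The last element of each list printed above still carries a trailing newline from the raw response. A minimal sketch of a parser that strips it is shown here; `parse_values` is a hypothetical helper, and the `key;value1;value2\n` layout of the response is an assumption inferred from the `split(';')[1:]` call and the outputs above, not from API documentation.

```python
def parse_values(stdout):
    # Assumed layout: the first ';'-separated field is skipped (as in getValue),
    # and the remaining fields are the values.
    fields = [field.strip() for field in stdout.split(';')]
    # Drop empty strings left behind by stray separators or a bare newline.
    return [field for field in fields[1:] if field]

# Sample string built from the c2f output shown above.
sample = 'd7cad81093cd805110291f84d64d385557d0ffba;lib/lz4.c\n'
print(parse_values(sample))
```

Cleaning the values this way makes later set comparisons between project lists safe, since `'lz4_lz4'` and `'lz4_lz4\n'` would otherwise count as different entries.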

Which projects fixed the vulnerability? If they have the fixed blob, they should be OK;
if they have the same commit, they are also probably forks.
We use p to represent the original repo and P to represent the deforked repo (one among the repos that
share lots of commits; see https://arxiv.org/abs/2002.02707).

In [36]:
print (getValue('b2P', '9808d70aed03290c648b983ea404446779eff501'))

print (getValue('c2P', 'd7cad81093cd805110291f84d64d385557d0ffba'))

print (getValue('c2p', 'd7cad81093cd805110291f84d64d385557d0ffba'))

['0-wiz-0_libuv', '06094051_librdkafka', '1010101012101_borg', '3370sohail_gecko-dev', '540KJ_root', '6180_python-lz4', 'Alexhuszagh_c-blosc', 'Bambooie_gdsfmt', 'cactus74_fst', 'darkskygit_simple_kv', 'draede_cx', 'dudw_libportable', 'fangq_matzip', 'girdharshubham_gokafka', 'git.bioconductor.org_packages_gdsfmt', 'jmoiron_golz4', 'karubabu_quickbms', 'liliilli_Dy', 'lmtwga_lzbench', 'marcelorl_graphql-real-time-example', 'pharaoh1_7-Zip-zstd', 'scalarwaves_zbox', 'silnrsi_grcompiler', 'tafia_fstlib', 'ushiiwaka_ps2-packer', 'windreamer_py-lz4framed\n']
['0-wiz-0_libuv\n']
['MiniclipPortugal_lz4', 'bobby0809_lz4', 'gitlab.com_exokos_data/LibLZ4', 'gitlab.com_freedesktop-sdk_mirrors/github/lz4/lz4', 'gitlab.com_fuchsia-cn_fuchsia/third_party/lz4', 'lz4_lz4', 'terrelln_lz4\n']


b2P shows 26 projects.

c2P shows one project (that one project has 7681 forks).

c2p shows seven projects.

This is interesting: somehow the fix was produced via different commits in the forks, and
many other projects (26) have implemented the fix even though they are not forks.
 
                       
Now let's try to identify the code that may still be vulnerable. How?
By looking at the pre-fix file content.


In [37]:
print (getValue('b2ob', '9808d70aed03290c648b983ea404446779eff501'))

['08cf6b5cd72b8182552dcc53bdc0d83ccd5382fd', '143c36e1a7448c488a44498ac953ea222f3f38d0', '4046102e6deea607dc12f870c14295cab1efee77', '707b94c41954792f95b6bb2d316b787352969cef', '877d14edad4b0568598d64579ecb68db82bd59f4', 'c9c5a072a193b9b7f7c010797d0e122038587332', 'e51a3e0a46c9608bedbb0b9565d736240b30bde6', 'e614c4577f2ae8b2db76ff838f2051eeeeb1a89b', 'ed928ced3f154ab414f657c4dbd0193cbe7cd969\n']


In fact, following b2ob recursively finds 514 unique old blobs in 706 different projects.
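
The recursive search over old blobs can be sketched as a breadth-first walk over the b2ob map. This is only a sketch: `collect_old_blobs` and its `lookup` parameter are hypothetical names, and the toy dictionary stands in for real WoC responses so the example runs without network access.

```python
from collections import deque

def collect_old_blobs(fixed_blob, lookup):
    # lookup(blob) is assumed to return the list of direct predecessor blobs.
    seen = set()
    queue = deque([fixed_blob])
    while queue:
        blob = queue.popleft()
        for old in lookup(blob):
            old = old.strip()  # drop the trailing newline seen in WoC output
            if old and old != fixed_blob and old not in seen:
                seen.add(old)
                queue.append(old)
    return seen

# Toy stand-in for the b2ob map (hypothetical data, not real blob hashes).
toy_b2ob = {'fixed': ['a', 'b\n'], 'a': ['c\n'], 'b': ['c', 'a\n'], 'c': []}
old_blobs = collect_old_blobs('fixed', lambda b: toy_b2ob.get(b, []))
print(sorted(old_blobs))
```

With the access functions defined earlier, `lookup` could be `lambda b: getValue('b2ob', b)`, assuming the API returns an empty result for a blob that has no older version.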

We then need to identify the projects that do not have the fixed blob, in order to determine
which projects still contain vulnerable code.
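
That last step amounts to a set difference: projects that contain any old blob, minus projects that contain the fixed blob. A minimal sketch follows, assuming a `blob_to_projects` callable that behaves like `getValue('b2P', blob)`; the function name `still_vulnerable` and the toy data are hypothetical.

```python
def still_vulnerable(old_blobs, fixed_blob, blob_to_projects):
    # Projects that already contain the fixed version of the file.
    fixed_projects = {p.strip() for p in blob_to_projects(fixed_blob)}
    # Projects that contain at least one pre-fix version of the file.
    exposed_projects = set()
    for blob in old_blobs:
        exposed_projects.update(p.strip() for p in blob_to_projects(blob))
    # Candidates: an old blob is present and the fixed blob is absent.
    return exposed_projects - fixed_projects

# Toy stand-in for the b2P map (hypothetical project names).
toy_b2P = {
    'fixed': ['proj_a', 'proj_b\n'],
    'old1': ['proj_a', 'proj_c\n'],
    'old2': ['proj_d\n'],
}
candidates = still_vulnerable({'old1', 'old2'}, 'fixed',
                              lambda b: toy_b2P.get(b, []))
print(sorted(candidates))
```

The result should be read as a list of candidates rather than confirmed cases, since a project could have patched the file with a blob other than the one produced by the upstream commit.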