Danger of arbitrary code execution by torch.load() #193

Open
34j opened this issue Mar 31, 2023 · 17 comments · Fixed by #327

Comments

@34j
Collaborator

34j commented Mar 31, 2023

Do not use untrusted models, as there is currently no way to address this.

@34j 34j changed the title Danger of arbitrary code execution by torch.load() Danger of arbitrary code execution by torch.load() Mar 31, 2023
@34j 34j pinned this issue Mar 31, 2023
@34j 34j added the wontfix This will not be worked on label Mar 31, 2023
@BlueAmulet
Collaborator

The way of fixing/mitigating this is to use a custom unpickler with torch.load that only allows certain safe classes to be used.
An example of that can be found here https://github.com/joeyballentine/ESRGAN-Bot/blob/master/utils/unpickler.py
You then call torch.load like torch.load("some_model.pth", pickle_module=RestrictedUnpickle)
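
For readers who want to see the allowlist idea in isolation, here is a minimal sketch that uses only the standard library, so it can be tried without torch. The names SAFE_GLOBALS, restricted_loads and Payload are made up for this illustration, and the payload merely evaluates "1 + 1" as a harmless stand-in for malicious code.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allowlist for this demo; a real state dict needs more entries.
SAFE_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Every global the pickle wants to import passes through here.
        if (module, name) not in SAFE_GLOBALS:
            raise pickle.UnpicklingError(
                "Global '{}.{}' is forbidden".format(module, name)
            )
        return super().find_class(module, name)


def restricted_loads(data):
    """Unpickle bytes while refusing any global outside SAFE_GLOBALS."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    """Harmless stand-in for a malicious object: unpickling it would call eval()."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


benign = pickle.dumps(OrderedDict(weight=[1.0, 2.0]))
hostile = pickle.dumps(Payload())

print(restricted_loads(benign))
try:
    restricted_loads(hostile)
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

The check happens in find_class, before the referenced callable is ever returned to the unpickler, so the payload is rejected rather than executed.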

@sbersier

sbersier commented Apr 1, 2023

EDIT: For some reason, in my case, @BlueAmulet's code worked in one environment but not in another (possibly due to different torch versions).

DISCLAIMER: I'm not a security expert, so take what I say with caution.

NOTE: There is the picklescan Python library (pip install picklescan): "Security scanner detecting Python Pickle files performing suspicious actions." E.g.: picklescan -p some_model.pth. Is it the best way to detect malicious files? I don't know, but it looks like it does the job.

Indeed, ML models can contain executable code.
There is a good video on this subject by Yannic Kilcher:

https://www.youtube.com/watch?v=2ethDz9KnLk

He also provides a link to a (perfectly safe) toy model to illustrate the point. When you load the model, it opens your default browser, just to show that loading it ran code on your machine. Since these models are loaded in a Python environment, you can look for telltale strings such as "import", "exec", "os.popen", "subprocess", "os.system" and the like.

For example, on Linux (you'll have to figure out the equivalent on Windows or macOS yourself), the following command, run on the toy model provided by Yannic, extracts all strings from the model and searches them for the given keywords.

strings -a pytorch_model.bin | grep 'system\|subprocess\|exec\|popen\|import\|eval'

In this case, it spits out:

exec('''import webbrowser
import sys

Which, in a real case, should immediately raise your suspicion.
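
For Windows or macOS users, the same idea can be approximated in Python. This is only a rough sketch of what strings piped into grep does: the function name find_suspicious_strings is made up here, the sample bytes are invented, and a keyword match is a hint, not proof of anything.

```python
import re

# Same keywords as the grep pattern above.
KEYWORDS = (b"system", b"subprocess", b"exec", b"popen", b"import", b"eval")

# Runs of at least 4 printable ASCII characters, roughly what `strings` extracts.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def find_suspicious_strings(data):
    """Return the printable runs in `data` that contain any keyword."""
    hits = []
    for match in PRINTABLE_RUN.finditer(data):
        run = match.group()
        if any(keyword in run for keyword in KEYWORDS):
            hits.append(run.decode("ascii"))
    return hits


# Invented sample bytes; for a real file use:
#   with open("pytorch_model.bin", "rb") as fh: data = fh.read()
sample = b"\x00\x01exec('''import webbrowser\nimport sys\x00\x80weights"
for line in find_suspicious_strings(sample):
    print(line)
```

Like the shell version, this only looks at raw bytes, so it can miss obfuscated payloads and flag innocent strings.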

Detection with picklescan:

picklescan -p pytorch_model.bin

results in:


/path/to/pytorch_model.bin:archive/data.pkl: dangerous import '__builtin__ eval' FOUND
----------- SCAN SUMMARY -----------
Scanned files: 1
Infected files: 1
Dangerous globals: 1

So, picklescan correctly detects it as malicious.
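
To give an idea of how such a scan can work without ever unpickling the file, here is a rough standard-library sketch. It is not picklescan's actual implementation; the function names are made up, the handling of STACK_GLOBAL is a simplifying heuristic, and it only covers zip-format checkpoints (like the archive/data.pkl entry reported above).

```python
import io
import pickle
import pickletools
import zipfile


def list_globals(data):
    """Return (module, name) pairs a pickle would import, without unpickling it."""
    found = []
    recent_strings = []  # string arguments seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports these as a single "module name" string.
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: module and name are the two strings pushed just before.
            # Heuristic only: strings fetched back from the memo are not tracked.
            if len(recent_strings) >= 2:
                found.append((recent_strings[-2], recent_strings[-1]))
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


def globals_in_zip_checkpoint(file):
    """Scan every *.pkl member of a zip-format checkpoint (path or file object)."""
    result = {}
    with zipfile.ZipFile(file) as archive:
        for member in archive.namelist():
            if member.endswith(".pkl"):
                result[member] = list_globals(archive.read(member))
    return result


class Payload:
    """Harmless stand-in for a malicious object."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


# Build a tiny in-memory archive that mimics the layout of a zip checkpoint.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("archive/data.pkl", pickle.dumps(Payload(), protocol=2))

report = globals_in_zip_checkpoint(buffer)
print(report)
```

Any global outside a small expected set (OrderedDict, the torch tensor rebuild helpers, storages) is worth a closer look before loading.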

Is it enough to be perfectly safe? Probably not. Hugging Face is aware of the threat, so they scan the files. But if you download models from dubious places, you might get a bad surprise.

Now, to go back to the code mentioned by @BlueAmulet:
In my case, it didn't do the job in the base environment (but it worked in a different venv). I tried it with the toy model I mentioned above. It could be due to the torch version: 1.13.1+cu117

  1. import the RestrictedUnpickler class
  2. torch.load("pytorch_model.bin", pickle_module=RestrictedUnpickle)

The result (I tested it): It opens the browser. So, it doesn't work (torch version:
Additional note: .bin, .pt and .pth are just different file extensions for the same format; torch.load() treats them identically.

Here is the Python script to reproduce it. Of course, you need the toy model by Yannic Kilcher:

# Safe unpickler to prevent arbitrary code execution
# From: https://github.com/joeyballentine/ESRGAN-Bot/blob/master/utils/unpickler.py

import pickle
from types import SimpleNamespace

safe_list = {
    ("collections", "OrderedDict"),
    ("torch._utils", "_rebuild_tensor_v2"),
    ("torch", "FloatStorage"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow required classes to load state dict
        if (module, name) not in safe_list:
            raise pickle.UnpicklingError(
                "Global '{}.{}' is forbidden".format(module, name)
            )
        return super().find_class(module, name)


RestrictedUnpickle = SimpleNamespace(
    Unpickler=RestrictedUnpickler,
    __name__="pickle",
    load=lambda *args, **kwargs: RestrictedUnpickler(*args, **kwargs).load(),
)

###################################
import torch
torch.load("pytorch_model.bin", pickle_module=RestrictedUnpickle)

# ---> Result: it opens the browser!

EDIT:
It works in the environment I created for so-vits-svc-fork. So, I don't know what to think about it.


(so-vits-svc-fork) steph@steph-desktop:/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads$ python test.py 
Traceback (most recent call last):
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads/test.py", line 31, in <module>
    torch.load("pytorch_model.bin", pickle_module=RestrictedUnpickle)
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/AUDIO_PROCESSING/SOVITS_2.1.5/so-vits-svc-fork/lib/python3.10/site-packages/torch/serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/AUDIO_PROCESSING/SOVITS_2.1.5/so-vits-svc-fork/lib/python3.10/site-packages/torch/serialization.py", line 1172, in _load
    result = unpickler.load()
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/AUDIO_PROCESSING/SOVITS_2.1.5/so-vits-svc-fork/lib/python3.10/site-packages/torch/serialization.py", line 1165, in find_class
    return super().find_class(mod_name, name)
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads/test.py", line 17, in find_class
    raise pickle.UnpicklingError(
_pickle.UnpicklingError: Global '__builtin__.eval' is forbidden

@34j
Collaborator Author

34j commented Apr 1, 2023

Cannot reproduce. It correctly returns an error. Did you really try it???

@34j
Collaborator Author

34j commented Apr 1, 2023

I think the BlueAmulet example says almost the same thing as the official Python documentation.

@sbersier

sbersier commented Apr 1, 2023

Absolutely, I executed the code and the browser opens to the page: https://www.ykilcher.com/pickle
I double checked it. No error returned whatsoever.
Are we talking about the same file: pytorch_model.bin ?
From here: https://huggingface.co/ykilcher/totally-harmless-model

Strange... Could you paste returned error in your case?

But I can prove it. I recorded my Desktop. Link to video:
https://drive.google.com/file/d/1qpBv_3Hy5ehh_t7RkxjsTFlrHfhuqpi-/view?usp=sharing

@34j
Collaborator Author

34j commented Apr 1, 2023

[Image: custom_unpickle]

@34j
Collaborator Author

34j commented Apr 1, 2023

Am I doing something wrong? Strange......

@sbersier

sbersier commented Apr 1, 2023

Weird indeed...
Note:
python -c "import pickle; print(pickle.__version__)" returns:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'pickle' has no attribute '__version__'

Is it the same for you?

@34j
Collaborator Author

34j commented Apr 1, 2023

Yes.

@34j
Collaborator Author

34j commented Apr 1, 2023

Thanks for your verification. The post does not need to be deleted. (Why did you think it should be?) Additional investigation is needed.

@sbersier

sbersier commented Apr 1, 2023

So, the correct way to get the pickle version is:
python -c "import pickle; print(pickle.format_version)"
which returns 4.0 in both environments.
So it looks like the pickle version was not involved after all...

But: I have torch version 1.13.1+cu117 in my base env and version 2.0.0+cu117 in my so-vits-svc-fork env.
Could this be the reason?
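
To compare two environments in one go, a small report like the following could be run in each of them. This is just a sketch: the function name environment_report is made up, and the torch version is read from the installed package metadata so the script also runs where torch is missing.

```python
import pickle
from importlib import metadata


def environment_report():
    """Collect the version facts discussed above for the current environment."""
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None  # torch is not installed in this environment
    return {
        "pickle_format_version": pickle.format_version,
        "pickle_highest_protocol": pickle.HIGHEST_PROTOCOL,
        "torch": torch_version,
    }


for key, value in environment_report().items():
    print(key, "=", value)
```

If the pickle lines match and only the torch line differs between the two environments, torch is the more likely suspect.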

@sbersier

sbersier commented Apr 1, 2023

Note:
Ubuntu 22.04.2
python==3.10.6

Both torch==1.13.1 and torch==2.0.0 allow for code execution when loading a model with torch.load() method.
The "safe unpickler patch" @BlueAmulet is referring to works for torch==2.0.0 but not for torch==1.13.1

The "malicious" test model is a toy model by Yannic Kilcher.
It just opens the web browser when the model is loaded with torch.load().
See: https://www.youtube.com/watch?v=2ethDz9KnLk and the link in the description.
The model is: pytorch_model.bin


cd Downloads
mkdir TEST
cd TEST
# Note: The code for test.py is given in the last section and should be placed in TEST/.
# The model was downloaded to the parent directory (Downloads/), so copy it from there.
cp ../pytorch_model.bin .
python -m venv torch_1_13_1
python -m venv torch_2_0_0

##############################
# CASE 1: Using torch==1.13.1:

source torch_1_13_1/bin/activate
pip install -U torch==1.13.1
pip install -U numpy

# Case 1.a):

python -c "import torch; torch.load('pytorch_model.bin')"

# Case 1.b):

python test.py # (which, in principle, implements a "safe" unpickler)

# BOTH result in the web browser opening (i.e. successful code execution)
# That is, even when using the "safe" unpickler class defined in test.py, it fails to prevent code execution.

#############################
# CASE 2: Using torch==2.0.0:

source torch_2_0_0/bin/activate
pip install -U torch==2.0.0
pip install -U numpy 

# Case 2.a):
python -c "import torch; torch.load('pytorch_model.bin')"
# The "attack" is SUCCESSFUL (we have code execution)

# Case 2.b):
python test.py

# In this case, the "patch" works and the code execution fails with the expected message:
Traceback (most recent call last):
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads/TEST/test.py", line 31, in <module>
    torch.load("pytorch_model.bin", pickle_module=RestrictedUnpickle)
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads/TEST/torch_2_0_0/lib/python3.10/site-packages/torch/serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads/TEST/torch_2_0_0/lib/python3.10/site-packages/torch/serialization.py", line 1172, in _load
    result = unpickler.load()
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads/TEST/torch_2_0_0/lib/python3.10/site-packages/torch/serialization.py", line 1165, in find_class
    return super().find_class(mod_name, name)
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads/TEST/test.py", line 17, in find_class
    raise pickle.UnpicklingError(
_pickle.UnpicklingError: Global '__builtin__.eval' is forbidden

CONCLUSION:
The safe unpickler "patch" @BlueAmulet is referring to works for torch 2.0.0 but NOT for torch 1.13.1

The problem is that a lot of ML-related repositories (that are not necessarily very old) have a torch version lower than 2.0.0 explicitly specified in their requirements...

(NOTE: both torch versions use pickle version 4.0)

###################################
# Code for test.py:

#
# test.py
# From: https://github.com/joeyballentine/ESRGAN-Bot/blob/master/utils/unpickler.py
# Safe unpickler to prevent arbitrary code execution
#

import torch
import pickle
from types import SimpleNamespace

safe_list = {
    ("collections", "OrderedDict"),
    ("torch._utils", "_rebuild_tensor_v2"),
    ("torch", "FloatStorage"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow required classes to load state dict
        if (module, name) not in safe_list:
            raise pickle.UnpicklingError(
                "Global '{}.{}' is forbidden".format(module, name)
            )
        return super().find_class(module, name)


RestrictedUnpickle = SimpleNamespace(
    Unpickler=RestrictedUnpickler,
    __name__="pickle",
    load=lambda *args, **kwargs: RestrictedUnpickler(*args, **kwargs).load(),
)


###################################
# Load the model:

torch.load("pytorch_model.bin", pickle_module=RestrictedUnpickle)

@34j
Collaborator Author

34j commented Apr 1, 2023

Thank you for a perfect survey. I had no idea that PyTorch had let such a serious problem go so far......

@malfet

malfet commented Apr 1, 2023

As of pytorch-1.13 there is a weights_only option designed to mitigate the problem, i.e. torch.load("foobar.pth", weights_only=True) should be safe to execute on untrusted models. There is even a PR proposing switching this option to True by default in the next release: pytorch/pytorch#97495
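
A possible way to wire this in while refusing to load on older torch versions is sketched below. The helper names parse_version, safe_load_kwargs and load_untrusted are invented for this example; only torch.load, weights_only and map_location come from PyTorch itself, and the 1.13 threshold is taken from the comment above.

```python
def parse_version(version):
    """Turn a version string such as '1.13.1+cu117' into (1, 13)."""
    release = version.split("+", 1)[0]
    major, minor = release.split(".")[:2]
    return int(major), int(minor)


def safe_load_kwargs(torch_version):
    """Return the torch.load keyword arguments to use for untrusted files."""
    if parse_version(torch_version) < (1, 13):
        raise RuntimeError(
            "torch {} has no weights_only option; refusing to load".format(
                torch_version
            )
        )
    return {"weights_only": True}


def load_untrusted(path):
    """Load a checkpoint with weights_only=True (needs torch >= 1.13)."""
    import torch  # imported lazily so the helpers above work without torch

    kwargs = safe_load_kwargs(torch.__version__)
    return torch.load(path, map_location="cpu", **kwargs)


print(safe_load_kwargs("1.13.1+cu117"))
```

Failing loudly on older versions seems preferable to silently falling back to a plain torch.load().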

@BlueAmulet
Collaborator

As of pytorch-1.13 there is weights_only option designed to mitigate the problem.

Good to know that's now a thing. The unpickler idea I referenced at the beginning of this discussion was tested back when pytorch 1.6.0 was the latest release. It seems that when weights_only was added, they accidentally broke pickle_module: pytorch/pytorch#88438

weights_only is simpler to add in, and works starting with 1.13.0

@34j
Collaborator Author

34j commented Apr 16, 2023

Still not resolved, but only for cluster model loading.

@NanoCode012

Would it be possible to use safetensors instead? https://github.com/huggingface/safetensors
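
For context on why safetensors avoids the pickle problem: as far as I understand the layout described in its README, a file is an 8-byte little-endian header length, a JSON header, and then raw tensor bytes, so there is nothing to execute. The sketch below parses such a header with the standard library only; the function names and the tiny example file are made up, and in practice you would use the safetensors library itself.

```python
import io
import json
import struct


def read_safetensors_header(stream):
    """Parse only the JSON header of a safetensors stream; no tensor data is read."""
    (header_len,) = struct.unpack("<Q", stream.read(8))
    return json.loads(stream.read(header_len).decode("utf-8"))


def build_example(tensor_bytes):
    """Build a tiny in-memory file in the same layout, for illustration only."""
    header = {
        "weight": {
            "dtype": "F32",
            "shape": [2],
            "data_offsets": [0, len(tensor_bytes)],
        }
    }
    encoded = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(encoded)) + encoded + tensor_bytes


blob = build_example(struct.pack("<2f", 1.0, 2.0))
header = read_safetensors_header(io.BytesIO(blob))
print(header)
```

Loading such a file comes down to JSON parsing plus reading byte ranges, which is a much smaller attack surface than unpickling.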
