Danger of arbitrary code execution by `torch.load()` #193

34j · 2023-03-31T01:58:44Z

Do not use untrusted models, as there is currently no way to address this.

BlueAmulet · 2023-04-01T02:30:53Z

The way of fixing/mitigating this is to use a custom unpickler with torch.load that only allows certain safe classes to be used.
An example of that can be found here https://github.com/joeyballentine/ESRGAN-Bot/blob/master/utils/unpickler.py
You then call torch.load like torch.load("some_model.pth", pickle_module=RestrictedUnpickle)

sbersier · 2023-04-01T08:47:57Z

EDIT: For some reason, in my case, @BlueAmulet's code worked or not depending on the environment (possibly due to different torch versions).

DISCLAIMER: I'm not a security expert, so take what I say with caution.

NOTE: There is the picklescan Python library (pip install picklescan): "Security scanner detecting Python Pickle files performing suspicious actions." Usage, e.g.: picklescan -p some_model.pth. Is it the best way to detect malicious files? I don't know, but it looks like it does the job.

Indeed, ML models can contain executable code.
There is a good video on this subject by Yannic Kilcher:

https://www.youtube.com/watch?v=2ethDz9KnLk

He also provides a link to a (perfectly safe) toy model to illustrate the point. When you load the model, it opens your default browser, just to show that it launched a shell process. Now, since these models will be loaded in a Python environment, you could look for instructions like "import", "exec", "os.popen", "subprocess", "os.system" and things like that.
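To see why merely loading a file can run code, here is a minimal sketch using only the standard pickle module (which torch.load relies on by default). The Payload class is made up for illustration, and the eval call is a harmless stand-in for what a real attack would do.

```python
import pickle


class Payload:
    """Hypothetical stand-in for a malicious object hidden in a model file."""

    def __reduce__(self):
        # pickle records "call this function with these arguments" and
        # replays that call at load time. A real attack would point this at
        # something like os.system; here it only evaluates a harmless sum.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload object; it runs the recorded call.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The point is that the receiving side does not need the Payload class at all: the pickle stream only references eval, which is why scanning for such names is a reasonable first check.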

For example, on Linux (on Windows or macOS you'll have to figure out the equivalent yourself), the following command, when run on the toy model provided by Yannic, extracts all strings from the model and searches them for the given keywords.

In this case, it spits out:

exec('''import webbrowser
import sys

Which, in a real case, should immediately raise your suspicion.
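The same idea can be sketched in Python, assuming a plain byte-level search is good enough. The function name and keyword list below are hypothetical (the keywords are the ones named above), and a scan like this would miss anything that is compressed or obfuscated inside the file.

```python
import re
from pathlib import Path

# Hypothetical keyword list, based on the instructions named above.
SUSPICIOUS = [b"import", b"exec", b"eval", b"os.popen", b"os.system", b"subprocess"]


def find_suspicious_strings(path, min_len=4):
    """Return printable strings in the file that contain a suspicious keyword."""
    data = Path(path).read_bytes()
    # Collect runs of printable ASCII, roughly like the Unix `strings` utility.
    runs = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    return [r.decode("ascii") for r in runs if any(k in r for k in SUSPICIOUS)]


# Usage (file name as in this thread):
#     for hit in find_suspicious_strings("pytorch_model.bin"):
#         print(hit)
```

Expect false positives: a word like "import" can legitimately appear in metadata, so a hit is a reason to look closer, not proof of an attack.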

Detection with picklescan:

picklescan -p pytorch_model.bin

results in:


/path/to/pytorch_model.bin:archive/data.pkl: dangerous import '__builtin__ eval' FOUND
----------- SCAN SUMMARY -----------
Scanned files: 1
Infected files: 1
Dangerous globals: 1

So, picklescan correctly detects it as malicious.
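For a rough idea of what such a scan involves, the sketch below walks the pickle opcodes with the standard pickletools module and lists which globals the file would import, without ever unpickling it. This is not picklescan's actual implementation; list_globals and scan_file are hypothetical names, and the STACK_GLOBAL handling is a heuristic that assumes the module and name strings were pushed just before the opcode.

```python
import pickletools
import zipfile


def list_globals(pickle_bytes):
    """Return the set of (module, name) pairs a pickle stream would import."""
    found = set()
    strings = []  # string arguments seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(pickle_bytes):
        if opcode.name in ("GLOBAL", "INST"):
            # pickletools reports these as one "module name" string.
            module, name = arg.split(" ", 1)
            found.add((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Heuristic: the two most recent strings are module and name.
            found.add((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found


def scan_file(path):
    """Scan a raw pickle, or every .pkl member of a zip-based checkpoint."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            members = [n for n in zf.namelist() if n.endswith(".pkl")]
            return {n: list_globals(zf.read(n)) for n in members}
    with open(path, "rb") as f:
        return {path: list_globals(f.read())}
```

Anything outside the handful of globals a plain state dict needs (for example an entry from builtins, os or subprocess) would be a reason not to load the file.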

Is it enough to be perfectly safe? Probably not. Hugging Face is aware of the threat, so they scan the files. But if you download models from dubious places, you might get a nasty surprise.

Now, to go back to the code mentioned by @BlueAmulet:
In my case, it didn't do the job in the base environment (but it worked in a different venv). I tried it with the toy model I mentioned above. It could be due to the torch version: 1.13.1+cu117

import the RestrictedUnpickler class
torch.load("pytorch_model.bin", pickle_module=RestrictedUnpickle)

The result (I tested it): It opens the browser. So, it doesn't work (torch version:
Additional note: .bin, .pt and .pth are strictly equivalent

Here is the Python script to reproduce it. Of course, you need the toy model by Yannic Kilcher:

# Safe unpickler to prevent arbitrary code execution
# From: https://github.com/joeyballentine/ESRGAN-Bot/blob/master/utils/unpickler.py

import pickle
from types import SimpleNamespace

safe_list = {
    ("collections", "OrderedDict"),
    ("torch._utils", "_rebuild_tensor_v2"),
    ("torch", "FloatStorage"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow required classes to load state dict
        if (module, name) not in safe_list:
            raise pickle.UnpicklingError(
                "Global '{}.{}' is forbidden".format(module, name)
            )
        return super().find_class(module, name)


RestrictedUnpickle = SimpleNamespace(
    Unpickler=RestrictedUnpickler,
    __name__="pickle",
    load=lambda *args, **kwargs: RestrictedUnpickler(*args, **kwargs).load(),
)

###################################
import torch
torch.load("pytorch_model.bin", pickle_module=RestrictedUnpickle)

# ---> Result: it opens the browser!

EDIT:
It works in the environment I created for so-vits-svc-fork. So, I don't know what to think about it.


(so-vits-svc-fork) steph@steph-desktop:/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads$ python test.py 
Traceback (most recent call last):
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads/test.py", line 31, in <module>
    torch.load("pytorch_model.bin", pickle_module=RestrictedUnpickle)
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/AUDIO_PROCESSING/SOVITS_2.1.5/so-vits-svc-fork/lib/python3.10/site-packages/torch/serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/AUDIO_PROCESSING/SOVITS_2.1.5/so-vits-svc-fork/lib/python3.10/site-packages/torch/serialization.py", line 1172, in _load
    result = unpickler.load()
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/AUDIO_PROCESSING/SOVITS_2.1.5/so-vits-svc-fork/lib/python3.10/site-packages/torch/serialization.py", line 1165, in find_class
    return super().find_class(mod_name, name)
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads/test.py", line 17, in find_class
    raise pickle.UnpicklingError(
_pickle.UnpicklingError: Global '__builtin__.eval' is forbidden

34j · 2023-04-01T10:34:31Z

Cannot reproduce. It correctly returns an error. Did you really try it???

34j · 2023-04-01T10:35:34Z

I think the BlueAmulet example says almost the same thing as the official Python documentation.

sbersier · 2023-04-01T11:03:53Z

Absolutely, I executed the code and the browser opens to the page: https://www.ykilcher.com/pickle
I double checked it. No error returned whatsoever.
Are we talking about the same file: pytorch_model.bin ?
From here: https://huggingface.co/ykilcher/totally-harmless-model

Strange... Could you paste returned error in your case?

But I can prove it. I recorded my Desktop. Link to video:
https://drive.google.com/file/d/1qpBv_3Hy5ehh_t7RkxjsTFlrHfhuqpi-/view?usp=sharing

34j · 2023-04-01T11:14:38Z

34j · 2023-04-01T11:15:32Z

Am I doing something wrong? Strange......

sbersier · 2023-04-01T11:23:19Z

Weird indeed...
Note:
python -c "import pickle; print(pickle.__version__)" returns:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'pickle' has no attribute '__version__'

Is it the same for you?

34j · 2023-04-01T11:24:00Z

Yes.

34j · 2023-04-01T11:47:09Z

Thanks for your verification. The post does not need to be deleted. (Rather why did you think that?) Additional investigations are needed.

sbersier · 2023-04-01T12:03:19Z

So, to get the pickle version, the correct way is:
python -c "import pickle; print(pickle.format_version)"
which returns 4.0 for both environments.
So it looks like the pickle version was not involved after all...

But I have torch version 1.13.1+cu117 in my base env and version 2.0.0+cu117 in my so-vits-svc-fork env.
Could this be the reason?

sbersier · 2023-04-01T14:43:24Z

Note:
Ubuntu 22.04.2
python==3.10.6

Both torch==1.13.1 and torch==2.0.0 allow for code execution when loading a model with torch.load() method.
The "safe unpickler patch" @BlueAmulet is referring to works for torch==2.0.0 but not for torch==1.13.1

The "malicious" test model is a toy model by Yannic Kilcher.
It just opens the web browser when the model is loaded with torch.load().
See: https://www.youtube.com/watch?v=2ethDz9KnLk and the link in the description.
The model is: pytorch_model.bin


cd Downloads
mkdir TEST
cd TEST
# Note: The code for test.py is given in the last section and should be placed in TEST/.
cp ../pytorch_model.bin .  # we are now in Downloads/TEST, so the model sits one level up
python -m venv torch_1_13_1
python -m venv torch_2_0_0

##############################
# CASE 1: Using torch==1.13.1:

source torch_1_13_1/bin/activate
pip install -U torch==1.13.1
pip install -U numpy

# Case 1.a):

python -c "import torch; torch.load('pytorch_model.bin')"

# Case 1.b):

python test.py # (which, in principle, implements a "safe" unpickler)

# BOTH result in the web browser opening (i.e. successful code execution)
# That is, even the "safe" unpickler class defined in test.py fails to prevent code execution.

#############################
# CASE 2: Using torch==2.0.0:

source torch_2_0_0/bin/activate
pip install -U torch==2.0.0
pip install -U numpy 

# Case 2.a):
python -c "import torch; torch.load('pytorch_model.bin')"
# The "attack" is SUCCESSFUL (we have code execution)

# Case 2.b):
python test.py

# In this case, the "patch" works and the code execution fails with the expected message:
Traceback (most recent call last):
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads/TEST/test.py", line 31, in <module>
    torch.load("pytorch_model.bin", pickle_module=RestrictedUnpickle)
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads/TEST/torch_2_0_0/lib/python3.10/site-packages/torch/serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads/TEST/torch_2_0_0/lib/python3.10/site-packages/torch/serialization.py", line 1172, in _load
    result = unpickler.load()
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads/TEST/torch_2_0_0/lib/python3.10/site-packages/torch/serialization.py", line 1165, in find_class
    return super().find_class(mod_name, name)
  File "/media/steph/417d421a-664e-4d33-9de2-72375ee18508/home/steph/Downloads/TEST/test.py", line 17, in find_class
    raise pickle.UnpicklingError(
_pickle.UnpicklingError: Global '__builtin__.eval' is forbidden

CONCLUSION:
The safe unpickler "patch" @BlueAmulet is referring to works for torch 2.0.0 but NOT for torch 1.13.1

The problem is that a lot of ML-related repositories (that are not necessarily very old) have a torch version lower than 2.0.0 explicitly specified in their requirements...

(NOTE: both torch versions use pickle version 4.0)

###################################
# Code for test.py:

#
# test.py
# From: https://github.com/joeyballentine/ESRGAN-Bot/blob/master/utils/unpickler.py
# Safe unpickler to prevent arbitrary code execution
#

import torch
import pickle
from types import SimpleNamespace

safe_list = {
    ("collections", "OrderedDict"),
    ("torch._utils", "_rebuild_tensor_v2"),
    ("torch", "FloatStorage"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow required classes to load state dict
        if (module, name) not in safe_list:
            raise pickle.UnpicklingError(
                "Global '{}.{}' is forbidden".format(module, name)
            )
        return super().find_class(module, name)


RestrictedUnpickle = SimpleNamespace(
    Unpickler=RestrictedUnpickler,
    __name__="pickle",
    load=lambda *args, **kwargs: RestrictedUnpickler(*args, **kwargs).load(),
)


###################################
# Load the model:

torch.load("pytorch_model.bin", pickle_module=RestrictedUnpickle)

34j · 2023-04-01T15:08:43Z

Thank you for a perfect survey. I had no idea that PyTorch had let such a serious problem go so far......

malfet · 2023-04-01T16:37:51Z

As of pytorch-1.13 there is a weights_only option designed to mitigate the problem, i.e. torch.load("foobar.pth", weights_only=True) should be safe to execute on untrusted models. There is even a PR proposing to switch this option to True by default in the next release: pytorch/pytorch#97495

BlueAmulet · 2023-04-01T16:50:09Z

> As of pytorch-1.13 there is weights_only option designed to mitigate the problem.

Good to know that's now a thing. The unpickler idea I referenced at the beginning of this discussion was tested back when pytorch 1.6.0 was the latest release. It seems that when weights_only was added, they accidentally broke pickle_module (pytorch/pytorch#88438).

weights_only is simpler to add and works starting with 1.13.0.
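A minimal sketch of how a project might act on this, assuming it wants to refuse loading on torch versions that predate weights_only. Both helper names are hypothetical, and the 1.13 cutoff is simply the one stated in the comments above.

```python
import re


def supports_weights_only(torch_version):
    """True if the version string is >= 1.13 (cutoff taken from this thread)."""
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 13)


def load_untrusted(path):
    """Hypothetical helper: load only if weights_only is available."""
    import torch  # imported lazily so the version check is usable without torch

    if not supports_weights_only(torch.__version__):
        raise RuntimeError("torch < 1.13 has no weights_only; refusing to load")
    return torch.load(path, map_location="cpu", weights_only=True)
```

Failing loudly on older versions seems preferable to silently falling back to a plain torch.load, since the fallback would reintroduce the problem discussed here.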

34j · 2023-04-16T05:26:24Z

Still unresolved, but only for cluster model loading.

NanoCode012 · 2023-04-25T15:42:22Z

Would it be possible to use safetensors instead? https://github.com/huggingface/safetensors
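As a rough illustration of why that format sidesteps the problem: a safetensors file is laid out as an 8-byte little-endian header length, a JSON header, and then raw tensor bytes, so reading it involves no unpickling. The sketch below builds and parses such a file by hand with only the standard library; read_safetensors_header is a hypothetical helper, not part of the safetensors package.

```python
import json
import struct


def read_safetensors_header(raw):
    """Split safetensors bytes into (parsed JSON header, raw tensor bytes)."""
    (header_len,) = struct.unpack("<Q", raw[:8])
    header = json.loads(raw[8 : 8 + header_len].decode("utf-8"))
    return header, raw[8 + header_len :]


# Build a tiny file by hand: one float32 tensor "w" of shape [2].
payload = struct.pack("<2f", 1.0, 2.0)
header_json = json.dumps(
    {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, len(payload)]}}
).encode("utf-8")
blob = struct.pack("<Q", len(header_json)) + header_json + payload

# Parsing is just JSON decoding plus byte slicing; nothing gets executed.
header, data = read_safetensors_header(blob)
start, end = header["w"]["data_offsets"]
values = struct.unpack("<2f", data[start:end])
print(header["w"]["shape"], values)
```

In practice one would use the safetensors package itself rather than a hand-rolled parser; the sketch is only meant to show that the loader never has to resolve or call arbitrary globals.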

34j changed the title ~~Danger of arbitrary code execution by torch.load()~~ Danger of arbitrary code execution by torch.load() Mar 31, 2023

34j pinned this issue Mar 31, 2023

34j added the wontfix This will not be worked on label Mar 31, 2023

34j mentioned this issue Apr 14, 2023

fix: fix torch.load and save to use file objects to allow non-ASCII characters and use weights_only and remove unidecode #327

Merged

34j closed this as completed in #327 Apr 14, 2023

34j removed the wontfix This will not be worked on label Apr 14, 2023

34j unpinned this issue Apr 14, 2023

34j mentioned this issue Apr 16, 2023

fix(cluster): do not use weights_only in get_cluster_model() #354

Merged

34j reopened this Apr 16, 2023
