Skip to content

Different ids for FakeOsModule.<some> Python3.12 #1171

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
evgenii-moriakhin opened this issue May 23, 2025 · 19 comments
Open

Different ids for FakeOsModule.<some> Python3.12 #1171

evgenii-moriakhin opened this issue May 23, 2025 · 19 comments

Comments

@evgenii-moriakhin
Copy link

evgenii-moriakhin commented May 23, 2025

On Python3.12 (python:3.12-alpine)

import shutil
from unittest.mock import patch

from pyfakefs.fake_filesystem import FakeFilesystem
from pyfakefs.fake_os import FakeOsModule

fake_fs = FakeFilesystem()
os_mock = FakeOsModule(fake_fs)

for _ in range(10):
    print(id(os_mock.lstat))

with patch("shutil.os", os_mock):
    shutil.rmtree("/foo")

prints (sometimes) different ids and raised

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python3.12/shutil.py", line 759, in rmtree
    _rmtree_safe_fd(stack, onexc)
  File "/usr/local/lib/python3.12/shutil.py", line 667, in _rmtree_safe_fd
    assert func is os.lstat
           ^^^^^^^^^^^^^^^^
AssertionError

I don't understand what this is related to, why different id's are returned for class

@mrbean-bremen
Copy link
Member

So you are saying os_mock.lstat changes while inside the range loop (that does nothing)? This is weird. At the moment, I don't have the slightest idea (and could not reproduce it yet).

@evgenii-moriakhin
Copy link
Author

Yes

full output on python:3.12-alpine image for example

root@dynamix:~# docker run -it --rm python:3.12-alpine sh
/ # pip install pyfakefs
Collecting pyfakefs
  Downloading pyfakefs-5.8.0-py3-none-any.whl.metadata (7.8 kB)
Downloading pyfakefs-5.8.0-py3-none-any.whl (230 kB)
Installing collected packages: pyfakefs
Successfully installed pyfakefs-5.8.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

[notice] A new release of pip is available: 25.0.1 -> 25.1.1
[notice] To update, run: pip install --upgrade pip
/ # python
Python 3.12.10 (main, May  9 2025, 23:41:15) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shutil
>>> from unittest.mock import patch
>>> from pyfakefs.fake_filesystem import FakeFilesystem
>>> from pyfakefs.fake_os import FakeOsModule
>>> fake_fs = FakeFilesystem()
>>> os_mock = FakeOsModule(fake_fs)
>>> for _ in range(10):
...     print(id(os_mock.lstat))
... 
140042655692736
140042655693312
140042655692736
140042655693312
140042655692736
140042655693312
140042655692736
140042655693312
140042655692736
140042655693312
>>> with patch("shutil.os", os_mock):
...     shutil.rmtree("/foo")
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python3.12/shutil.py", line 759, in rmtree
    _rmtree_safe_fd(stack, onexc)
  File "/usr/local/lib/python3.12/shutil.py", line 667, in _rmtree_safe_fd
    assert func is os.lstat
           ^^^^^^^^^^^^^^^^
AssertionError
>>> 

@mrbean-bremen
Copy link
Member

Thank you - now I can reproduce the problem. And it happens not always - weird indeed.

@mrbean-bremen
Copy link
Member

I verified that pyfakefs have always behaved that way (e.g. the different IDs), so this is not new. I still don't understand the behavior, though.

@evgenii-moriakhin
Copy link
Author

evgenii-moriakhin commented May 25, 2025

import shutil
from unittest.mock import patch

from pyfakefs.fake_filesystem import FakeFilesystem
from pyfakefs.fake_os import FakeOsModule

fake_fs = FakeFilesystem()
os_mock = FakeOsModule(fake_fs)

for _ in range(10):
    print(id(os_mock))
    print(id(os_mock.lstat))
    print()

with patch("shutil.os", os_mock):
    shutil.rmtree("/foo")

And this is how the id's stabilize, I haven't caught a case where they're different yet

update

I figured out why, there's only 2 of them, and they either change or they don't

@mrbean-bremen
Copy link
Member

mrbean-bremen commented May 25, 2025

Funny, I did (and found) exactly the same, though I'm still stomped.
I also found that if not printing the IDs, they stay the same, regardless of how often I get them:

    ids = set()
    for _ in range(100000):
        ids.add(id(os_mock.lstat))
    assert len(ids) == 1  # always passes

@evgenii-moriakhin
Copy link
Author

I'm not sure if it's print-dependent and not something weirder.

Because with print - I just did a demo after catching this error in shutil (I didn't have any prints)

@mrbean-bremen
Copy link
Member

mrbean-bremen commented May 26, 2025

I don't think it's specifically dependent on print, it may also be triggered by some other call - I haven't checked this. And it is not specific to lstat, the same happens with other calls.
I also checked back until pyfakefs 2.9 (the first version I was involved with) to check if some of my changes broke it, and the behavior was always the same. I don't even know how to investigate this, so this may take some time.

@mrbean-bremen
Copy link
Member

Ok, I'm still not sure why ids of function objects behave this way, but this seems to be a general Python behavior:

class Foo:
    def bar(self):
        pass

for _ in range(10):
    print(id(foo.bar))

behaves the same way, so this a red herring. I should have checked the actual problem (shutil.rmtree) instead...

This was actually a known problem which is fixed if using the Patcher or related methods, but not if directly patching os. Should have remembered that... There are a few things that are additionally patched in shutils by the patcher, such as:

        if hasattr(shutil, "_use_fd_functions"):
            shutil._use_fd_functions = False

which avoids using this optimized path which does not work with pyfakefs. See the setUp method for more stuff done there.

I have to think about how to best handle this without using the patcher...

@evgenii-moriakhin
Copy link
Author

evgenii-moriakhin commented May 26, 2025

It didn't occur to me to check the class object, I checked the class itself :\

After your code I asked chatgpt (o3) with search mode and it gave me some great articles

https://discuss.python.org/t/comparing-class-methods-using-is-is-wrong/29105/11

https://discuss.python.org/t/python-weirdness-can-anyone-explain-to-this-python-newbee/11976

As per the suggestion to cache in __dict__ on one of the links, I did so and the id became the same. I hope I haven't broken any python magic

import types

class Foo:
    def bar(self):
        pass

    def __getattribute__(self, name):
        attr = object.__getattribute__(self, name)
        if isinstance(attr, types.MethodType) and attr.__self__ is self:
            object.__setattr__(self, name, attr)

        return attr

foo = Foo()

for _ in range(10):
    print(id(foo.bar))

@mrbean-bremen
Copy link
Member

Ah, that explains it, thanks! Though it also means that is just not a good idea to compare ids of methods in instances.
Learned something new, even if it is not really related to this issue.

@evgenii-moriakhin
Copy link
Author

I think the caching approach (if it doesn't break anything) is related to the problem, in that you can make your pyfakefs classes behave like “modules”, even though they are not in fact modules

though if in stdlib somebody would want to check for isinstance(..., ModuleType) etc... Anyway, I don't want to go too far, these seem to be really extreme cases

@mrbean-bremen
Copy link
Member

mrbean-bremen commented May 27, 2025

I think the caching approach (if it doesn't break anything) is related to the problem, in that you can make your pyfakefs classes behave like “modules”, even though they are not in fact modules

True, that could help for this specific case, though there are other cases (for example the mentioned isinstance checks in the code) that should be handled. I think I'll move the special handling of shutil into FakeShutilModule if possible. That would mean that you also have to patch shutil if you want to use it, but I think that would be the correct way to handle it anyway, as there are already patched functions there that you might need.
I'll keep in mind your approach of caching the method though - it might still be needed for other cases.

@mrbean-bremen
Copy link
Member

I'm starting to think that this is not a good idea - whatever I do, it doesn't work reliably on all Python versions and systems. And yes, your idea (caching the methods) would probably work under Linux and Python < 3.14 (as the problem is only the function check), but with macOS or Python 3.14 (in all systems) this is not sufficient, as they use system-level calls per default that cannot be patched and should be avoided.

I was thinking about some other approach, but first need to understand your use case better. You wrote in the other issue:

Because the application is multithreaded, patching or mocking would be a bad idea

but by injecting the faked os with the fake filesystem into your code, it would use the same fake filesystem in all threads anyway, so the problem persists - or I don't understand something. Note that usually you won't notice problems in multi-threaded code, except if opening the same file for writing from several threads simultanously (or similar scenarios).

Can you elaborate on the difference between your approach, and just using the Patcher in your tests in some way?

@evgenii-moriakhin
Copy link
Author

In short - I should have multiple filesystem instances in my tests, not just one

That is

fake_os1
fake_pathlib1
etc

fake_os2
fake_pathlib2
etc

And there are several running threads that need to use these different file systems. They switch file systems by using set ContextVar inside the thread, and modifying the product code

def business_logic():
 os = os_ctx.get()
 pathlib = pathlib_ctx.get()
 ...

I did it this way because to use patching on the test code side:

def test_some():
     with Patcher/mock.patch(...):
        ... # any background code can be executed here, using different file systems in threads

actually this applies not only to threads, but also to asyncio coroutines, but I simplify the scenario

So I can't use with Pacher() in test code, and similarly I can't use it inside threads in product code in any (hidden) way, because patching changes the global state.

a simple demo of how this might look like

import shutil

from threading import Thread
from random import random
import time
from pyfakefs.fake_filesystem_unittest import Patcher


def th1():
    for _ in range(10):
        with Patcher(allow_root_user=False) as patcher:
            file_path = "/foo/bar"
            patcher.fs.create_dir(file_path)
        time.sleep(random())
        
def th2():
    for _ in range(10):
        with Patcher(allow_root_user=False) as patcher:
            file_path = "/foo/bar"
            patcher.fs.create_dir(file_path)
        time.sleep(random())

def th3():
    shutil.copy("/foo", "/bar")
    
thread1 = Thread(target=th1)
thread2 = Thread(target=th2)
thread3 = Thread(target=th3)

thread1.start()
thread2.start()
thread3.start()

thread1.join()
thread2.join()
thread3.join()

but even this code causes errors. If we leave thread1 - there will be no errors, but if we include other threads (thread3 just for demo), we will get errors related to threads

Traceback (most recent call last):
  File "/opt/jumpscale7/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/opt/jumpscale7/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/apiserver/jober/t.py", line 20, in th2
    patcher.fs.create_dir(file_path)
AttributeError: 'NoneType' object has no attribute 'create_dir'

so I went a different way, I try to make isolated module objects, without patching

for example for shutil like this (I have already shown, this is how I “memorize” the os module inside the shutil module)

open_func_contextvar = ContextVar("open_func_contextvar", default=open)
os_lib_contextvar = ContextVar("os_lib_contextvar", default=os)
shutil_lib_contextvar = ContextVar("shutil_lib_contextvar", default=shutil)


def get_open_func():  # noqa: ANN201
    return open_func_contextvar.get()


def get_os_lib():  # noqa: ANN201
    return os_lib_contextvar.get()


def get_shutil_lib():  # noqa: ANN201
    return shutil_lib_contextvar.get()


def patched_open(*args, **kwargs):
    return open_func_contextvar.get()(*args, **kwargs)

builtins.open = patched_open
class FakeShutilModule(_FakeShutilModule):
    def __init__(
        self,
        filesystem: FakeFilesystem,
        fake_os: FakeOsModule,
    ) -> None:
        # Lock while we mutate global state.
        with _patch_module_lock:
            with suppress(KeyError):
                del sys.modules["shutil"]

            sys.modules["os"] = cast("ModuleType", fake_os)

            patched_shutil = importlib.import_module("shutil")

            sys.modules["os"] = os
            del sys.modules["shutil"]

            import shutil  # noqa: F401 # type: ignore (returns real module to sys.modules)

        self.filesystem = filesystem
        self._shutil_module = patched_shutil

    def disk_usage(self, path: str) -> tuple[int, int, int]:
        return self.filesystem.get_disk_usage(path)

    def __getattr__(self, name: str) -> Any:  # noqa: ANN401
        return getattr(self._shutil_module, name)

and for each thread/background coroutine

        fake_fs = FakeFilesystem()
        open_func_mock = FakeFileOpen(fake_fs)
        os_mock = FakeOsModule(fake_fs)
        shutil_mock = FakeShutilModule(fake_fs, os_mock)

and for each too

        token_open_func = open_func_contextvar.set(cast("Any", open_func_mock))
        token_os = os_lib_contextvar.set(cast("Any", os_mock))
        token_shutil = shutil_lib_contextvar.set(cast("Any", shutil_mock))
        token_pathlib = pathlib_lib_contextvar.set(cast("Any", pathlib_mock))

I can now write product code in this way

def product_code():
    shutil = get_shutil_lib()
    ...

and it becomes thread and coroutine independent and uses its own file systems

I did the same thing with pathlib before your last PR, and even for some other modules missing in pyfakefs, like tempfile

@evgenii-moriakhin
Copy link
Author

More conceptually, imagine that you are running two web servers in threads in the same pytest session, and you need each of them to use a different file system.

I did this as setting a ContextVar for each incoming request to each web server.

@mrbean-bremen
Copy link
Member

Thank you for the explanation!

A couple of question:
You wrote (and showed in your example), that you need to create a separate fake filesystem for each thread. Why is this? Wouldn't it make more sense to use one filesystem - or do the threads in reality work on different file systems? I probably still don't get the use case...

So I can't use with Pacher() in test code, and similarly I can't use it inside threads in product code in any (hidden) way, because patching changes the global state.

I'm not sure what is the difference between that and injecting a fake os, which uses a fake filesystem. Are you saying that you want to use the fake filesystem only for some code/threads, but not for others?

and even for some other modules missing in pyfakefs, like tempfile

tempfile is a bit similar to shutil, as there is some global patching done in the Patcher setup (at least under Linux). Also, the temp directory is created at that time, as tempfile assumes its existence.

@mrbean-bremen
Copy link
Member

Ok, I missed your second comment about the web servers... so you are indeed using different file systems for different threads. Hmm...

@evgenii-moriakhin
Copy link
Author

evgenii-moriakhin commented May 29, 2025

It was actually working pretty well until I upgraded to 3.12 python - and there was a problem with os.lstat

I can think of a workaraund for this, but at least you now know more about how your library is being used :)

The ability to create modules (and more generally, fake objects) separately - I liked it, it's important in tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants