GitPython locks the repo folder on Windows so it cannot be removed #387

Open
Lehych opened this issue Feb 17, 2016 · 7 comments

@Lehych

Lehych commented Feb 17, 2016

Tested on

GitPython version: 1.0.2
Windows: Server 2012
Git: 2.7.0.windows

To reproduce

import os
import shutil
from git import Repo

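path_to_repo = 'somepath'
r = Repo.init(path_to_repo)
# Create an empty file (like `touch`); open() replaces the Python 2-only file()
with open(os.path.join(path_to_repo, 'test.txt'), 'a'):
    os.utime(os.path.join(path_to_repo, 'test.txt'), None)
r.index.add(['test.txt'])
r.index.commit('Test commit')

# Fails on Windows while git subprocesses still hold the folder
shutil.rmtree(path_to_repo)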

You'll see:
WindowsError: [Error 32] The process cannot access the file because it is being used by another process:
You'll also see four git subprocesses that keep running until the Python process exits (I was using a REPL, and these subprocesses hung around until I closed it).

They are git cat-file --batch-check and git cat-file --batch.
On macOS, running this example in a REPL also leaves two unfinished git processes behind.

Process Explorer shows that the folder (path_to_repo) is locked by 4 git processes.
[Screenshot: gitpython_bug]

Maybe there is a concept that a Repo should be closed somehow, but I could not find that kind of API.

Some GitPython tests fail with this error (tests run on the master branch):

======================================================================
ERROR: test_commit_serialization (git.test.performance.test_commit.TestPerformance)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\gp\git\test\performance\lib.py", line 89, in tearDown
    shutil.rmtree(self.gitrwrepo.working_dir)
  File "c:\python27\Lib\shutil.py", line 256, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "c:\python27\Lib\shutil.py", line 254, in rmtree
    os.rmdir(path)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\me\\appdata\\local\\temp\\3\\tmpdjaibf'
-------------------- >> begin captured logging << --------------------
root: INFO: You can set the GIT_PYTHON_TEST_GIT_REPO_BASE environment variable to a .git repository ofyour choice - defaulting to the gitpython repository
--------------------- >> end captured logging << ---------------------

======================================================================
ERROR: test_large_data_streaming (git.test.performance.test_streams.TestObjDBPerformance)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\gp\git\test\lib\helper.py", line 121, in repo_creator
    return func(self, rw_repo)
  File "D:\gp\git\test\performance\test_streams.py", line 90, in test_large_data_streaming
    os.remove(db_file)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: u'c:\\users\\me\\appdata\\local\\temp\\3\\tmpnylxewbare_test_large_data_streaming\\objects\\81\\7bd0459ba45c7186b5279fbacc69dc39c42efb'
-------------------- >> begin captured logging << --------------------
root: INFO: You can set the GIT_PYTHON_TEST_GIT_REPO_BASE environment variable to a .git repository ofyour choice - defaulting to the gitpython repository
--------------------- >> end captured logging << ---------------------

P.S. I know it is hard to test on the Windows platform. My colleague mentioned AppVeyor for Windows CI; pip-accel uses it.

@Lehych
Author

Lehych commented Feb 17, 2016

There is an issue with multiple instances of git running simultaneously: if two of them are running at the same time, there could be a lock. Maybe GitPython runs several git processes at the same time on commit?

The same problems also occur with gitdb (master branch):

======================================================================
ERROR: test_writing (gitdb.test.db.test_pack.TestPackDB)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\gitdb\test\lib.py", line 87, in wrapper
    return func(self, path)
  File "D:\gitdb\gitdb\test\lib.py", line 114, in wrapper
    return func(self, path)
  File "D:\gitdb\gitdb\test\db\test_pack.py", line 33, in test_writing
    os.rename(pack_path, new_pack_path)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process


======================================================================
ERROR: test_large_data_streaming (gitdb.test.performance.test_stream.TestObjDBPerformance)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\gitdb\gitdb\test\lib.py", line 72, in wrapper
    return func(self, *args, **kwargs)
  File "D:\gitdb\gitdb\test\lib.py", line 87, in wrapper
    return func(self, path)
  File "D:\gitdb\gitdb\test\performance\test_stream.py", line 107, in test_large_data_streaming
    os.remove(db_file)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: u'c:\\users\\me\\appdata\\local\\temp\\3\\test_large_data_streaminghwncwz\\16\\09b36fbb091bf2e35c05a4fd61c16ac18e2296'

@Byron
Member

Byron commented Feb 21, 2016

Thanks for that wonderfully detailed and conclusive issue!
The problem described here is well known to me, and may originate in my previous misconception about the reliability of destructors. As a result, GitPython relies on __del__ to release resources, even though these methods might never be called.
Even if they are called eventually, there might be a delay until the file locks are actually released by Windows, which I believe could be part of the reason the tests fail to clean up.
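
If a delay in releasing file locks is part of the problem, a cleanup step can retry the removal for a short while instead of failing on the first error. The following is a minimal sketch using only the standard library. rmtree_with_retry and its attempts and delay parameters are hypothetical names with untuned values. It does not fix the leaked git processes; it only tolerates locks that are released shortly afterwards.

```python
import shutil
import time


def rmtree_with_retry(path, attempts=5, delay=0.2):
    """Remove a directory tree, retrying if the OS still reports it as in use.

    `attempts` and `delay` are illustrative defaults, not tuned values.
    """
    for attempt in range(1, attempts + 1):
        try:
            shutil.rmtree(path)
            return
        except OSError:
            # On Windows a sharing violation surfaces as an OSError
            # (WindowsError is an alias of OSError on Python 3).
            if attempt == attempts:
                raise  # give up and surface the last error
            time.sleep(delay * attempt)  # wait a little longer each time
```

If the lock is never released (for example because the git subprocesses are still alive), the helper re-raises the last error after the final attempt.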

At some point I stopped testing on Windows as well, at a time when AppVeyor and Travis didn't even exist yet to compensate.

However, there should be a release() method on the repositories which is supposed to be called when you are done with them. Maybe these work as advertised and can help to workaround the issue.
Another option might be to offload GitPython calls to another process using multiprocessing. That way, one can more easily control and enforce the release of resources, without polluting your own process's resources.
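
One way to sketch the idea of offloading work to another process is to run the GitPython code in a separate Python interpreter and wait for it to exit. Note that this swaps in the standard library's subprocess module rather than multiprocessing, and run_in_child is a hypothetical helper, not part of GitPython. When the child interpreter exits, the operating system closes every handle it owned. The git helper processes it started should then see end-of-input and exit too, though that has not been verified here on Windows.

```python
import subprocess
import sys


def run_in_child(script, *args, timeout=60):
    """Run `script` in a fresh Python interpreter and return its stdout.

    Extra `args` are available to the script as sys.argv[1:].
    """
    result = subprocess.run(
        [sys.executable, "-c", script, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )
    return result.stdout


# Usage sketch, assuming GitPython and a git executable are installed:
#     script = (
#         "import sys\n"
#         "from git import Repo\n"
#         "r = Repo.init(sys.argv[1])\n"
#         "print(r.working_dir)\n"
#     )
#     run_in_child(script, "somepath")
```

The parent process never imports GitPython in this arrangement, so it holds no handles on the repository folder itself.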

Even though I don't think I will be able to fix the issue, I will leave it open for everyone to see.

@slacAWallace

Could you point me in the direction of the release() method so I can try it? I can't find it. I am running into this problem as well. I tried this:
[Screenshot: rmtree-example]

@ankostis
Contributor

ankostis commented Oct 31, 2018

Please search the issues tagged tag.leaks (edited; originally tag.deadlocks).

@slacAWallace
Copy link

Found something that worked:
#546 (comment)

@claell

claell commented Jul 17, 2024

Just ran into this issue. This is quite bad, especially when wanting to do follow-up actions with the repository.

Interesting, though, how little attention this issue got, given the possible consequences it has.

> However, there should be a release() method on the repositories which is supposed to be called when you are done with them. Maybe these work as advertised and can help to workaround the issue.

r.release() doesn't exist. I found a mention of a release() method at https://gitpython.readthedocs.io/en/stable/tutorial.html, but that doesn't really help.
@Byron, if you have any more information on this, that would be appreciated!

The workaround from #546 (comment) also lines up with the docs.
In https://gitpython.readthedocs.io/en/stable/reference.html#git.cmd.Git.clear_cache it is mentioned that this is used to release resources.

Another possible workaround (though it seems less nice): #718 (comment)

Referencing #718 which seems to be about the same problem.

TLDR:
r.git.clear_cache() fixes the problem (at least for me).
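
A minimal sketch of that workaround as a helper. Only repo.git.clear_cache() comes from GitPython. release_and_remove is a hypothetical name, and the gc.collect() call is an assumption (a precaution against handles kept alive by unreachable objects), not something confirmed in this thread.

```python
import gc
import shutil


def release_and_remove(repo, path):
    """Release GitPython's cached resources, then delete the folder.

    `repo` is expected to be a git.Repo; only repo.git.clear_cache() is used.
    """
    # The GitPython reference describes clear_cache() as releasing resources.
    repo.git.clear_cache()
    # Assumption: a collection pass drops any lingering file objects.
    gc.collect()
    shutil.rmtree(path)


# Usage sketch, assuming GitPython and a git executable are installed:
#     from git import Repo
#     r = Repo.init('somepath')
#     ...  # add files, commit
#     release_and_remove(r, 'somepath')
```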

@Byron
Member

Byron commented Jul 17, 2024

Maybe submitting a PR that makes clear_cache() easier to discover would be a good course of action here?
