Concurrent stream access causes ValueErrors with confusing message #595

Open
quantus opened this issue Feb 27, 2017 · 1 comment

quantus commented Feb 27, 2017

While experimenting with the API in the Python interpreter, I ran into seemingly random errors when reading blob contents from different commits. Leaving one stream "open" while making other calls on the repository causes a ValueError to be raised. Because I was working in the interpreter, I did not realize that I still had streams "open", and it took some time to figure out what was wrong. In some cases I also got ValueError: I/O operation on closed file, but I assume the root cause is the same.

How to reproduce the issue:

  • git clone https://github.com/octocat/Hello-World
  • Run the following Python script:
from git import Repo

r = Repo('Hello-World')
stream = r.commit('553c2077f0edc3d5dc5d17262f6aa498e69d6f8e').tree['README'].data_stream.stream
# This will raise ValueError: SHA Hello could not be resolved, git returned: 'Hello World!'
r.commit('762941318ee16e59dabbacb1b4049eec22f0d303').tree['README']

stream.readlines()
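
The error text suggests that the second lookup parsed the unread content of the first blob ("Hello World!") as if it were an object header. A minimal toy model of that failure mode, assuming (not confirmed in this thread) that all lookups share the output pipe of one long-lived `git cat-file --batch` process. The names `FakeBatchProcess` and `LazyStream` are hypothetical and are not GitPython classes:

```python
class FakeBatchProcess:
    """Toy stand-in for one long-lived process answering object requests."""

    def __init__(self, objects):
        self._objects = objects   # object name -> content bytes
        self._pipe = bytearray()  # bytes written by the process, not yet read

    def _ask(self, name):
        # The process answers "<name> <type> <size>\n<content>\n".
        data = self._objects[name]
        self._pipe += b"%s blob %d\n" % (name.encode(), len(data)) + data + b"\n"

    def _readline(self):
        line, _, rest = bytes(self._pipe).partition(b"\n")
        self._pipe = bytearray(rest)
        return line

    def _read(self, n):
        chunk = bytes(self._pipe[:n])
        del self._pipe[:n]
        return chunk

    def header(self, name):
        """Request an object and parse the next line in the pipe as its header."""
        self._ask(name)
        line = self._readline()
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(
                "SHA %s could not be resolved, returned: %r"
                % (parts[0].decode(), line.decode())
            )
        return parts[0].decode(), parts[1].decode(), int(parts[2])

    def open_stream(self, name):
        """Parse the header now, but leave the content in the pipe."""
        _, _, size = self.header(name)
        return LazyStream(self, size)


class LazyStream:
    """Reads its content from the shared pipe only when asked to."""

    def __init__(self, proc, size):
        self._proc = proc
        self._size = size

    def read(self):
        data = self._proc._read(self._size)
        self._proc._read(1)  # consume the newline that terminates the content
        return data
```

In this model, calling `open_stream` for one object and then `header` for another before reading the stream raises a ValueError naming "Hello", because the leftover content is consumed as the next header. Calling `read()` on the stream first leaves the pipe clean and the second lookup succeeds.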

I think the best option would be to allow multiple streams to be open at the same time. If that is not feasible for some reason, the other option would be to warn the user that they are doing something unsupported.

OS: macOS 10.12.3
GitPython: 2.1.1

Byron commented Mar 5, 2017

Thanks a lot for raising the issue, and for making it easily reproducible!
Fixing it would certainly be possible, but sounds like a lot of work. Tracking concurrent access to the same stream sounds like some work too.

As a workaround, I tried to use the pure python implementation aka GitDB, but to no avail:

from git import Repo, GitDB

r = Repo('Hello-World', odbt=GitDB)
stream = r.commit('553c2077f0edc3d5dc5d17262f6aa498e69d6f8e').tree['README'].data_stream.stream
# This will raise ValueError: SHA Hello could not be resolved, git returned: 'Hello World!'
r.commit('762941318ee16e59dabbacb1b4049eec22f0d303').tree['README']

stream.readlines()
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    stream.readlines()
  File "/Users/byron/dev/GitPython/git/ext/gitdb/gitdb/util.py", line 237, in __getattr__
    self._set_cache_(attr)
  File "/Users/byron/dev/GitPython/git/ext/gitdb/gitdb/stream.py", line 88, in _set_cache_
    assert attr == '_s'
AssertionError
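
A different workaround, not proposed in the thread, is to read each blob completely before making any other call on the repository, so that no partially read stream is left open. A minimal sketch, assuming that fully draining the stream avoids the conflict (untested against GitPython 2.1.1). The helper name `read_blob_bytes` is hypothetical; `data_stream.read()` is the documented way to read blob contents in GitPython:

```python
def read_blob_bytes(blob):
    """Return the blob's full contents, draining its stream immediately.

    `blob` is expected to be a GitPython Blob, or any object that exposes
    `data_stream.read()`.
    """
    return blob.data_stream.read()


# Usage with GitPython, using the Hello-World clone from the report:
#
#   from git import Repo
#
#   r = Repo('Hello-World')
#   first = read_blob_bytes(
#       r.commit('553c2077f0edc3d5dc5d17262f6aa498e69d6f8e').tree['README'])
#   second = read_blob_bytes(
#       r.commit('762941318ee16e59dabbacb1b4049eec22f0d303').tree['README'])
```

The trade-off is that the whole blob is held in memory, which gives up the benefit of streaming for large files.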
