Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Rewrite gitjob.Git() as a single-job class using Qt-style signals #981

Merged
merged 33 commits into from
Jul 9, 2017

Conversation

uliska
Copy link
Collaborator

@uliska uliska commented Jun 28, 2017

This refers to my comment #974 (comment)

Git() is now a more straightforward object responsible for synchronously
or asynchronously running a single Git command.
The asynchronous behaviour relies completely on Qt-style signalling,
i.e. there are no callback functions but the option to connect to signals.
The class forwards all the signals from QProcess.

I have removed the queue concept (although it was well implemented). If we need it at all we should move the idea to the Repo() object but keep the Git() object clean and focused.
Maybe we don't need a queue at all because the same effect may be achieved using signals and slots (call Git command A and connect to a slot. In that slot call Git command B etc.)


This code will not work without correspondingly updating the owning Git repo class. But I wanted to create this in order to discuss the approach.

I'm feeling somewhat bad to suggest such far-reaching changes, but we should get to a common understanding before building too much further code upon it.
Please ask if there is anything you don't understand - or if you have any objections.

This refers to my comment #974 (comment)

Git() is now a more straightforward object responsible for synchronously
or asynchronously running a single Git command.
The asynchronous behaviour relies completely on Qt-style signalling,
i.e. there are no callback functions but the option to connect to signals.
The class forwards all the signals from QProcess.
@wenxin-bupt
Copy link
Collaborator

Except concerns of lagging too much behind the schedule, I like to rewrite any part of my code if it is necessary. I'll give my feedback after reading your commit.

I do this without further discussion but in a separate commit
which makes it revertable
This has been superseded by the wrapped QProcess signals
a)
Git commands don't raise GitError only because of something present
in stderr

b)
Through checking self._stderr and _stdout for "None" it is possible to
determine a job has completed.

c)
run_blocking returns a (stdout, stderr) tuple, the async version also
populates these two properties

=>
Of course these changes have to be reflected by the callers (= currently
gitrepoo.Repo() )
@uliska
Copy link
Collaborator Author

uliska commented Jun 29, 2017

I've added some more commits reshaping and cleaning gitjob.py even more, and I think we should go that way, updating the "caller" accordingly.

I would also suggest to continue with #974 (comment) and especially #974 (comment),
but I would rather merge this Pull Request back to the rewrite-vcs branch first.

@wenxin-bupt
Copy link
Collaborator

  1. I think we have different abstract of Git().

    I take a Git() instance as a Git command shell. As long as it is alive, I can give it commands and it outputs the result.

    Your Git() is a wrapped QProcess. When we need to run a command, we instantiate a Git() and connect the related functions with it.

    If we instantiate and run a Git() object in a function, will it still work when the function ends execution and the Git() object is destroyed?

  2. I think the command queue pattern have some advantages over the signal-slot-chain pattern. I suggest keeping it in Repo().

    Here are the scenarios I considered when I chose command queue pattern:

    Scenario 1
    We need to check the repository's status every several seconds in case the repository is modified outside Frescobaldi. Typically, when a file in this repository is opened in Frescobaldi, there are four commands need to run one after another.

    1. check whether the Git is working
    2. check whether the root folder is still a repository.
    3. check whether the committed file is updated
    4. check whether the file in Index is updated.

    When connecting them with signal-slot, we can't simply connect the later Git() command to currently running Git() command's finished signal. The later one must wait until the handling function ends and call it. It surely needs more code than just appending them into a command queue. More importantly, signal-slot-chain couples the functions together, while adding commands into a queue is easier to read and modify.

    Scenario 2
    The users can type in characters very quickly.
    E.g. when a key in keyboard is pressed continuously. 1000 'a' are entered consecutively.

    Repo() needs to run a git diff command once receive a document-changed signal. We need to run 1000 times git diff command. This would certainly deteriorate the performance.

    If we use signal-slot-chain here, how can we optimize the it? I think we can set a command execution time threshold. For example, two consecutive Git commands must have a interval of more than 5ms. Otherwise the later Git command will be neglected. By this way, we can reduce the number of Git command that really execute.

    But if we use a command queue here, it will be much easier. When 1000 Git commands are inserted into the queue, we do a duplication removal operation on the queue. Only the last Git diff command will be kept and other git commands won't be neglected at the same time.

So in my opinion we can change Git() and move the command queue to Repo()

@wenxin-bupt wenxin-bupt mentioned this pull request Jun 29, 2017
@uliska
Copy link
Collaborator Author

uliska commented Jun 30, 2017

I think we have different abstract of Git().

Yes and no.

I take a Git() instance as a Git command shell. As long as it is alive, I can give it commands and it outputs the result.

Your Git() is a wrapped QProcess. When we need to run a command, we instantiate a Git() and connect the related functions with it.

Not necessarily. Depending on the use case, "my" Git() can also be kept as a member variable. it is designed to perform multiple jobs consecutively, although it has a narrower focus of just one job at a time. It is provided for that for example by reinitializing the result fields before starting the process.

If we instantiate and run a Git() object in a function, will it still work when the function ends execution and the Git() object is destroyed?

Yes, if the process is "owned" by some persistent object. I think I've done this by passing the owner through the constructor methods, but we have to test this once the objects are actually instantiated (all of my code here is completely untested so far.


"command queue"

ad Scenario 1: yes, this is true, and maybe this can be keptl I think it would then be the responsibility of the queue (and maybe we should use the existing QQueue object since it's there) to start a job, connect to its finished signal in order to trigger the next one.

However, I would still investigate QFileSystemWatcher to trigger the updates. It should be much easier to simply react to an external event than to repeatedly going through this loop of (expensive) Git checks.

ad Scenario 2: I think I said this somewhere, but we should take the behaviour of LilyPond=>Automatic Engrave, which

  • only triggers a compilation after a threshold of (I'm not sure) 500ms, which seems a good value for the Git refresh too
  • kills a running compilation if the new one starts before the old one has completed.

I think this is a sufficient approach for the Git updates too.

So in my opinion we can change Git() and move the command queue to Repo()

OK, I have no objections.

@wenxin-bupt
Copy link
Collaborator

wenxin-bupt commented Jul 1, 2017

The test code I insert into view()'s __init__() function to test Git()

        path_1 = '/home/pysrc/git/frescobaldi/frescobaldi/frescobaldi_app/vcs/d.py'
        path_2 = '/home/pysrc/git/frescobaldi/frescobaldi/frescobaldi_app/vcs/gitrepoo.py'
        path_3 = '/home/pysrc/git/frescobaldi/frescobaldi/.git/index/'
        path_4 = '/home/'
        path_5 = '/home/pysrc/git/frescobaldi/frescobaldi/'

        import gitjob

        class fakeowner:
            def __init__(self, path):
                self._root_path = path
                self._relative_path = ''

        owner = fakeowner(path_5)

        self.git = gitjob.Git(owner)

        args1 = ['status', '--porcelain', owner._relative_path]
        args2 = ['status']
        
        def printStarted():
            print('Started is emitted')  

        def printStateChanged(state):
            dict = {
            0 : 'State: NotRunning',
            1 : 'State: Starting',
            2 : 'State: Running'   
            }
            print(dict[state])

        def printReadyReadStandardError():
            print('readyReadStandardError is emitted')

        def printReadyReadStandardOutput():
            print('readyReadStandardOutput is emitted')    

        def printres(gitprocess):
            print("finished is emmitted")
            print(gitprocess.stdout())

        self.git.started.connect(printStarted)
        self.git.stateChanged.connect(printStateChanged)
        self.git.readyReadStandardError.connect(printReadyReadStandardError)
        self.git.readyReadStandardOutput.connect(printReadyReadStandardOutput)
        self.git.finished.connect(printres)

        self.git.run_blocking(args1)
        self.git.run_blocking(args2)
        self.git.run_blocking(args1)
        self.git.run(args2)

@wenxin-bupt
Copy link
Collaborator

wenxin-bupt commented Jul 2, 2017

The test code I insert into view()'s __init__() function to test Git() and GitJobQueue()

        path_1 = '/home/pysrc/git/frescobaldi/frescobaldi/frescobaldi_app/vcs/d.py'
        path_2 = '/home/pysrc/git/frescobaldi/frescobaldi/frescobaldi_app/vcs/gitrepoo.py'
        path_3 = '/home/pysrc/git/frescobaldi/frescobaldi/.git/index/'
        path_4 = '/home/'
        path_5 = '/home/pysrc/git/frescobaldi/frescobaldi/'

        import gitjob

        class fakeowner:
            def __init__(self, path):
                self._root_path = path
                self._relative_path = ''

        self.owner = fakeowner(path_5)

        self.jobqueue = gitjob.GitJobQueue()

        args1 = ['status', '--porcelain', self.owner._relative_path]
        args2 = ['status']
        
        def printStarted():
            print('Started is emitted')  

        def printStateChanged(state):
            dict = {
            0 : 'State: NotRunning',
            1 : 'State: Starting',
            2 : 'State: Running'   
            }
            print(dict[state])

        def printReadyReadStandardError():
            print('readyReadStandardError is emitted')

        def printReadyReadStandardOutput():
            print('readyReadStandardOutput is emitted')    

        def printres(gitprocess):
            print("finished is emmitted")
            print(gitprocess.stdout())
            # to invoke next git command in the jobqueue
            gitprocess.executed.emit(0)

         
        git1 = gitjob.Git(self.owner)
        git1.preset_args = args1  
        git1.started.connect(printStarted)
        git1.stateChanged.connect(printStateChanged)
        git1.readyReadStandardError.connect(printReadyReadStandardError)
        git1.readyReadStandardOutput.connect(printReadyReadStandardOutput)
        git1.finished.connect(printres)

        git2 = gitjob.Git(self.owner)
        git2.preset_args = args2
        git2.started.connect(printStarted)
        git2.stateChanged.connect(printStateChanged)
        git2.readyReadStandardError.connect(printReadyReadStandardError)
        git2.readyReadStandardOutput.connect(printReadyReadStandardOutput)
        git2.finished.connect(printres)

        self.jobqueue.enqueue(git1)
        self.jobqueue.enqueue(git2)

@uliska
Copy link
Collaborator Author

uliska commented Jul 2, 2017

This is indeed strange. I asked about it on stackoverflow: https://stackoverflow.com/questions/44868741/is-qqueue-missing-from-pyqt-5

@wenxin-bupt
Copy link
Collaborator

Thank you!

I find another strange issue:
In this page, PyQt5 QProcess's documentation is directly linked to its C++ version.
http://pyqt.sourceforge.net/Docs/PyQt5/api/qprocess.html

I can't access signal errorOccurred through a PyQt5 QProcess object, while signal errorOccurred is listed in the C++ doc as one of the signals.

I tried to access signal error and succeeded. In fact, signal error is the lower-version of errorOccured. It is provided in PyQt4.

http://pyqt.sourceforge.net/Docs/PyQt4/qprocess.html#error-2

I think PyQt5 is still in development. And in future they will alter the signal name from error to errorOccurred to keep the consistency between PyQt and Qt. This is the issue we must prepare in advance.

Signal "errorOccured" is still not an attribute of QProcess in PyQt5 (PyQt 5.5.1 specifically), while it is listed in Qt's doc.
Signal "error" does the same job now. It is an attribute of PyQt4's QProcess. I assume that it will be replaced by "errorOccured" in future.
@wenxin-bupt
Copy link
Collaborator

wenxin-bupt commented Jul 2, 2017

Add a function to print error code:

  def printErrorOccured(error):
            dict = {
            0 : 'Error: FailedToStart',
            1 : 'Error: Crashed',
            2 : 'Error: Timedout',
            4 : 'Error: WriteError',
            3 : 'Error: ReadError',
            5 : 'Error: UnknownError'
            }
            print(dict[error])

# connect the signal with it
git1.errorOccurred.connect(printErrorOccured)

I changed the Git.executable's content from "git" to "gitt" to trigger the error signal.
Then I received FailedToStart error. And signal finished was not emitted when error occurred.

"if not self.isRunning():" doesn't make sense
Invoke kill() will trigger two signal: "errorOccured" and "stateChanged"
"errorOccured" is emitted at first with the error code: "Crashed"
"stateChanged" then signals with the state code "NotRunning"
You will not receive these two signals when the Git() object is destroyed right after Git.kill() is called.
@wenxin-bupt
Copy link
Collaborator

According to this question Is deleteLater() necessary in PyQt/PySide?

Often, removing the last Python reference to an object won't be enough to fully clean up, because there could still be a reference held on the Qt side.

Since we need to frequently create and delete Git() in GitJobQueue(), I think we need to take care of the memory usage.

If "args" is passed in, run() and run_blocking() will use it, otherwise run() and run_blocking() will use "self.preset_args". If both "args" and "self.preset_args" are None, we think it's the programmer's fault. We raise an exception here.
If the process crashes, a "errorOccurred" signal will be emitted and "finished" signal won't be emitted.
So _finished() only need to handle the successful result.
Remove the Git() object in queue-head. If the Git() object is running, terminate it by calling kill().
Modify the related code in KillAll() and _auto_run_next()
@uliska
Copy link
Collaborator Author

uliska commented Jul 3, 2017

@wenxin-bupt please review https://stackoverflow.com/questions/44868741/is-qqueue-missing-from-pyqt-5, there has been an informative answer.

@wenxin-bupt
Copy link
Collaborator

Thank you.
I checked errorOccurred again. I'm sure I can't access it in PyQt 5.5.1. Is the solution in commit Git(): Forward signal "errorOccured" acceptable?

@uliska
Copy link
Collaborator Author

uliska commented Jul 3, 2017

Yes, but please replace the # errorOccurred may not be supported comment with a slightly more explicit explanation of the situation and a link to the stackoverflow post

@wenxin-bupt
Copy link
Collaborator

wenxin-bupt commented Jul 4, 2017

In above commit, should I define an enum instead of emitting an int in signal finished()?

Error message will output to stderr()
"""
self._handle_results()
self.finished.emit(exitcode)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Oh.... I need to put my question here)
Should I define an enum instead of emitting an int in signal finished()?

"self" should be emitted along with the "exitcode" in signal "finished"
…Occurred"

When an error happens to the Git() object in the queue,  GitJobQueue() object will not work properly.
So "errorOccurred" should be emitted from  GitJobQueue() object to let the host object decide what to do next.
E.g. kill_all(), run_next()
"stdout()" and "stderr()" now return "_stdout" and "_stderr" directly.
"""
# TODO: should we check isRunning() before or can we rely on the "is not None" check?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isRunning() works by checking the process's state: QProcess.NotRunning

There is a time interval between changing state into QProcess.NotRunning and emitting finished() signal.

_stdout and _stderr are updated after the signal finished() has been emitted. So there will be the case in which isRunning() returns False while _stdout and _stderr are still None.

"""
# TODO: should we check isRunning() before or can we rely on the "is not None" check?
# A simpler approach would be to simply return self._stdout and have the caller interpret
# the type of result: either None (job not completed) or a (potentially empty) string list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to return _stdout and _stderr directly to let the caller decide what to next.

So I pushed this commit.

Save binary results in "_handle_results()".
Let user decide which kind of result they want in "stdout()" and "stderr()"
Class Git() should concentrate on the running process of a Git command.
"version()" will be moved to "Repo()".
@wenxin-bupt
Copy link
Collaborator

I think Git() and GitJobQueue() is nearing completion. And how errors which occur during running are handled in Git() and GitJobQueue() should be explained here.

I met two kind of errors.

The first kind is induced by intentionally changing the Git().executable from "git" to "gitt". Git() emits signal errorOccurred with the message QProcess.FailedToStart and then crashes. It won't emit signal finished. This kind of error is useful when we want to check whether Git is available in shell. I think error handling should be placed in Repo(). So Git() just simply emits the errorOccurred signal.

Another kind of errors don't emit errorOccurred signal. But they pass a exitcode in finished() signal. This kind of error will occur when I intentionally change the argument args to a wrong argument. And then we can get the error message in stderr(). Same as the solution above, I just let Git() emit finished with the exitcode (0 for success, 1 for failure).

Git() also emits errorOccurred with QProcess.Crashed when calling kill() on it. But this kind of "errors" are caused by us. So it shouldn't be discussed here.

@uliska
Copy link
Collaborator Author

uliska commented Jul 6, 2017 via email

@wenxin-bupt
Copy link
Collaborator

wenxin-bupt commented Jul 8, 2017

Could I merge this branch to rewrite-vcs and then I can directly implement the Repo() part without worrying about the conflict?


Update:

I've started working on rewrite-vcs, so this branch can be kept for further revision.

@uliska
Copy link
Collaborator Author

uliska commented Jul 8, 2017

OK. I read your previous comment but couldn't reply.

This branch was only created in order not to impose this Git() class change without discussing it first. So you can merge this pull request whenever you like into rewrite-vcs.

@wenxin-bupt wenxin-bupt merged commit fe249d1 into rewrite-vcs Jul 9, 2017
@uliska uliska deleted the vcs-signals branch July 17, 2017 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants