WIP: Rewrite gitjob.Git() as a single-job class using Qt-style signals #981

uliska · 2017-06-28T15:51:40Z

This refers to my comment #974 (comment)

Git() is now a more straightforward object responsible for synchronously
or asynchronously running a single Git command.
The asynchronous behaviour relies completely on Qt-style signalling,
i.e. there are no callback functions but the option to connect to signals.
The class forwards all the signals from QProcess.

I have removed the queue concept (although it was well implemented). If we need it at all we should move the idea to the Repo() object but keep the Git() object clean and focused.
Maybe we don't need a queue at all because the same effect may be achieved using signals and slots (call Git command A and connect to a slot. In that slot call Git command B etc.)

This code will not work without correspondingly updating the owning Git repo class. But I wanted to create this in order to discuss the approach.

I'm feeling somewhat bad to suggest such far-reaching changes, but we should get to a common understanding before building too much further code upon it.
Please ask if there is anything you don't understand - or if you have any objections.

This refers to my comment #974 (comment) Git() is now a more straightforward object responsible for synchronously or asynchronously running a single Git command. The asynchronous behaviour relies completely on Qt-style signalling, i.e. there are no callback functions but the option to connect to signals. The class forwards all the signals from QProcess.

wenxin-bupt · 2017-06-29T06:29:55Z

Except concerns of lagging too much behind the schedule, I like to rewrite any part of my code if it is necessary. I'll give my feedback after reading your commit.

I do this without further discussion but in a separate commit which makes it revertable

This has been superseded by the wrapped QProcess signals

a) Git commands don't raise GitError only because of something present in stderr b) Through checking self._stderr and _stdout for "None" it is possible to determine a job has completed. c) run_blocking returns a (stdout, stderr) tuple, the async version also populates these two properties => Of course these changes have to be reflected by the callers (= currently gitrepoo.Repo() )

uliska · 2017-06-29T10:03:50Z

I've added some more commits reshaping and cleaning gitjob.py even more, and I think we should go that way, updating the "caller" accordingly.

I would also suggest to continue with #974 (comment) and especially #974 (comment),
but I would rather merge this Pull Request back to the rewrite-vcs branch first.

wenxin-bupt · 2017-06-29T13:14:51Z

I think we have different abstract of Git().

I take a Git() instance as a Git command shell. As long as it is alive, I can give it commands and it outputs the result.

Your Git() is a wrapped QProcess. When we need to run a command, we instantiate a Git() and connect the related functions with it.

If we instantiate and run a Git() object in a function, will it still work when the function ends execution and the Git() object is destroyed?
I think the command queue pattern have some advantages over the signal-slot-chain pattern. I suggest keeping it in Repo().

Here are the scenarios I considered when I chose command queue pattern:

Scenario 1
We need to check the repository's status every several seconds in case the repository is modified outside Frescobaldi. Typically, when a file in this repository is opened in Frescobaldi, there are four commands need to run one after another.
1. check whether the Git is working
2. check whether the root folder is still a repository.
3. check whether the committed file is updated
4. check whether the file in Index is updated.
When connecting them with signal-slot, we can't simply connect the later Git() command to currently running Git() command's finished signal. The later one must wait until the handling function ends and call it. It surely needs more code than just appending them into a command queue. More importantly, signal-slot-chain couples the functions together, while adding commands into a queue is easier to read and modify.

Scenario 2
The users can type in characters very quickly.
E.g. when a key in keyboard is pressed continuously. 1000 'a' are entered consecutively.

Repo() needs to run a git diff command once receive a document-changed signal. We need to run 1000 times git diff command. This would certainly deteriorate the performance.

If we use signal-slot-chain here, how can we optimize the it? I think we can set a command execution time threshold. For example, two consecutive Git commands must have a interval of more than 5ms. Otherwise the later Git command will be neglected. By this way, we can reduce the number of Git command that really execute.

But if we use a command queue here, it will be much easier. When 1000 Git commands are inserted into the queue, we do a duplication removal operation on the queue. Only the last Git diff command will be kept and other git commands won't be neglected at the same time.

So in my opinion we can change Git() and move the command queue to Repo()

uliska · 2017-06-30T08:46:16Z

I think we have different abstract of Git().

Yes and no.

I take a Git() instance as a Git command shell. As long as it is alive, I can give it commands and it outputs the result.

Your Git() is a wrapped QProcess. When we need to run a command, we instantiate a Git() and connect the related functions with it.

Not necessarily. Depending on the use case, "my" Git() can also be kept as a member variable. it is designed to perform multiple jobs consecutively, although it has a narrower focus of just one job at a time. It is provided for that for example by reinitializing the result fields before starting the process.

If we instantiate and run a Git() object in a function, will it still work when the function ends execution and the Git() object is destroyed?

Yes, if the process is "owned" by some persistent object. I think I've done this by passing the owner through the constructor methods, but we have to test this once the objects are actually instantiated (all of my code here is completely untested so far.

"command queue"

ad Scenario 1: yes, this is true, and maybe this can be keptl I think it would then be the responsibility of the queue (and maybe we should use the existing QQueue object since it's there) to start a job, connect to its finished signal in order to trigger the next one.

However, I would still investigate QFileSystemWatcher to trigger the updates. It should be much easier to simply react to an external event than to repeatedly going through this loop of (expensive) Git checks.

ad Scenario 2: I think I said this somewhere, but we should take the behaviour of LilyPond=>Automatic Engrave, which

only triggers a compilation after a threshold of (I'm not sure) 500ms, which seems a good value for the Git refresh too
kills a running compilation if the new one starts before the old one has completed.

I think this is a sufficient approach for the Git updates too.

So in my opinion we can change Git() and move the command queue to Repo()

OK, I have no objections.

wenxin-bupt · 2017-07-01T09:00:07Z

The test code I insert into view()'s __init__() function to test Git()

        path_1 = '/home/pysrc/git/frescobaldi/frescobaldi/frescobaldi_app/vcs/d.py'
        path_2 = '/home/pysrc/git/frescobaldi/frescobaldi/frescobaldi_app/vcs/gitrepoo.py'
        path_3 = '/home/pysrc/git/frescobaldi/frescobaldi/.git/index/'
        path_4 = '/home/'
        path_5 = '/home/pysrc/git/frescobaldi/frescobaldi/'

        import gitjob

        class fakeowner:
            def __init__(self, path):
                self._root_path = path
                self._relative_path = ''

        owner = fakeowner(path_5)

        self.git = gitjob.Git(owner)

        args1 = ['status', '--porcelain', owner._relative_path]
        args2 = ['status']
        
        def printStarted():
            print('Started is emitted')  

        def printStateChanged(state):
            dict = {
            0 : 'State: NotRunning',
            1 : 'State: Starting',
            2 : 'State: Running'   
            }
            print(dict[state])

        def printReadyReadStandardError():
            print('readyReadStandardError is emitted')

        def printReadyReadStandardOutput():
            print('readyReadStandardOutput is emitted')    

        def printres(gitprocess):
            print("finished is emmitted")
            print(gitprocess.stdout())

        self.git.started.connect(printStarted)
        self.git.stateChanged.connect(printStateChanged)
        self.git.readyReadStandardError.connect(printReadyReadStandardError)
        self.git.readyReadStandardOutput.connect(printReadyReadStandardOutput)
        self.git.finished.connect(printres)

        self.git.run_blocking(args1)
        self.git.run_blocking(args2)
        self.git.run_blocking(args1)
        self.git.run(args2)

Use a raw list instead

wenxin-bupt · 2017-07-02T06:42:56Z

The test code I insert into view()'s __init__() function to test Git() and GitJobQueue()

        path_1 = '/home/pysrc/git/frescobaldi/frescobaldi/frescobaldi_app/vcs/d.py'
        path_2 = '/home/pysrc/git/frescobaldi/frescobaldi/frescobaldi_app/vcs/gitrepoo.py'
        path_3 = '/home/pysrc/git/frescobaldi/frescobaldi/.git/index/'
        path_4 = '/home/'
        path_5 = '/home/pysrc/git/frescobaldi/frescobaldi/'

        import gitjob

        class fakeowner:
            def __init__(self, path):
                self._root_path = path
                self._relative_path = ''

        self.owner = fakeowner(path_5)

        self.jobqueue = gitjob.GitJobQueue()

        args1 = ['status', '--porcelain', self.owner._relative_path]
        args2 = ['status']
        
        def printStarted():
            print('Started is emitted')  

        def printStateChanged(state):
            dict = {
            0 : 'State: NotRunning',
            1 : 'State: Starting',
            2 : 'State: Running'   
            }
            print(dict[state])

        def printReadyReadStandardError():
            print('readyReadStandardError is emitted')

        def printReadyReadStandardOutput():
            print('readyReadStandardOutput is emitted')    

        def printres(gitprocess):
            print("finished is emmitted")
            print(gitprocess.stdout())
            # to invoke next git command in the jobqueue
            gitprocess.executed.emit(0)

         
        git1 = gitjob.Git(self.owner)
        git1.preset_args = args1  
        git1.started.connect(printStarted)
        git1.stateChanged.connect(printStateChanged)
        git1.readyReadStandardError.connect(printReadyReadStandardError)
        git1.readyReadStandardOutput.connect(printReadyReadStandardOutput)
        git1.finished.connect(printres)

        git2 = gitjob.Git(self.owner)
        git2.preset_args = args2
        git2.started.connect(printStarted)
        git2.stateChanged.connect(printStateChanged)
        git2.readyReadStandardError.connect(printReadyReadStandardError)
        git2.readyReadStandardOutput.connect(printReadyReadStandardOutput)
        git2.finished.connect(printres)

        self.jobqueue.enqueue(git1)
        self.jobqueue.enqueue(git2)

uliska · 2017-07-02T08:14:42Z

This is indeed strange. I asked about it on stackoverflow: https://stackoverflow.com/questions/44868741/is-qqueue-missing-from-pyqt-5

wenxin-bupt · 2017-07-02T09:17:19Z

Thank you!

I find another strange issue:
In this page, PyQt5 QProcess's documentation is directly linked to its C++ version.
http://pyqt.sourceforge.net/Docs/PyQt5/api/qprocess.html

I can't access signal errorOccurred through a PyQt5 QProcess object, while signal errorOccurred is listed in the C++ doc as one of the signals.

I tried to access signal error and succeeded. In fact, signal error is the lower-version of errorOccured. It is provided in PyQt4.

http://pyqt.sourceforge.net/Docs/PyQt4/qprocess.html#error-2

I think PyQt5 is still in development. And in future they will alter the signal name from error to errorOccurred to keep the consistency between PyQt and Qt. This is the issue we must prepare in advance.

Signal "errorOccured" is still not an attribute of QProcess in PyQt5 (PyQt 5.5.1 specifically), while it is listed in Qt's doc. Signal "error" does the same job now. It is an attribute of PyQt4's QProcess. I assume that it will be replaced by "errorOccured" in future.

wenxin-bupt · 2017-07-02T12:00:56Z

Add a function to print error code:

  def printErrorOccured(error):
            dict = {
            0 : 'Error: FailedToStart',
            1 : 'Error: Crashed',
            2 : 'Error: Timedout',
            4 : 'Error: WriteError',
            3 : 'Error: ReadError',
            5 : 'Error: UnknownError'
            }
            print(dict[error])

# connect the signal with it
git1.errorOccurred.connect(printErrorOccured)

I changed the Git.executable's content from "git" to "gitt" to trigger the error signal.
Then I received FailedToStart error. And signal finished was not emitted when error occurred.

"if not self.isRunning():" doesn't make sense

Invoke kill() will trigger two signal: "errorOccured" and "stateChanged" "errorOccured" is emitted at first with the error code: "Crashed" "stateChanged" then signals with the state code "NotRunning" You will not receive these two signals when the Git() object is destroyed right after Git.kill() is called.

wenxin-bupt · 2017-07-02T14:12:39Z

According to this question Is deleteLater() necessary in PyQt/PySide?

Often, removing the last Python reference to an object won't be enough to fully clean up, because there could still be a reference held on the Qt side.

Since we need to frequently create and delete Git() in GitJobQueue(), I think we need to take care of the memory usage.

If "args" is passed in, run() and run_blocking() will use it, otherwise run() and run_blocking() will use "self.preset_args". If both "args" and "self.preset_args" are None, we think it's the programmer's fault. We raise an exception here.

If the process crashes, a "errorOccurred" signal will be emitted and "finished" signal won't be emitted. So _finished() only need to handle the successful result.

Remove the Git() object in queue-head. If the Git() object is running, terminate it by calling kill(). Modify the related code in KillAll() and _auto_run_next()

uliska · 2017-07-03T08:17:48Z

@wenxin-bupt please review https://stackoverflow.com/questions/44868741/is-qqueue-missing-from-pyqt-5, there has been an informative answer.

wenxin-bupt · 2017-07-03T13:50:53Z

Thank you.
I checked errorOccurred again. I'm sure I can't access it in PyQt 5.5.1. Is the solution in commit Git(): Forward signal "errorOccured" acceptable?

uliska · 2017-07-03T13:57:39Z

Yes, but please replace the # errorOccurred may not be supported comment with a slightly more explicit explanation of the situation and a link to the stackoverflow post

wenxin-bupt · 2017-07-04T02:36:11Z

In above commit, should I define an enum instead of emitting an int in signal finished()?

wenxin-bupt · 2017-07-04T02:56:35Z

frescobaldi_app/vcs/gitjob.py

+                        Error message will output to stderr() 
+        """ 
+        self._handle_results()
+        self.finished.emit(exitcode)


(Oh.... I need to put my question here)
Should I define an enum instead of emitting an int in signal finished()?

"self" should be emitted along with the "exitcode" in signal "finished"

…Occurred" When an error happens to the Git() object in the queue, GitJobQueue() object will not work properly. So "errorOccurred" should be emitted from GitJobQueue() object to let the host object decide what to do next. E.g. kill_all(), run_next()

"stdout()" and "stderr()" now return "_stdout" and "_stderr" directly.

wenxin-bupt · 2017-07-04T13:04:34Z

frescobaldi_app/vcs/gitjob.py

        """
-        # TODO: should we check isRunning() before or can we rely on the "is not None" check?


isRunning() works by checking the process's state: QProcess.NotRunning

There is a time interval between changing state into QProcess.NotRunning and emitting finished() signal.

_stdout and _stderr are updated after the signal finished() has been emitted. So there will be the case in which isRunning() returns False while _stdout and _stderr are still None.

wenxin-bupt · 2017-07-04T13:09:19Z

frescobaldi_app/vcs/gitjob.py

        """
-        # TODO: should we check isRunning() before or can we rely on the "is not None" check?
-        # A simpler approach would be to simply return self._stdout and have the caller interpret
-        # the type of result: either None (job not completed) or a (potentially empty) string list


I think it's better to return _stdout and _stderr directly to let the caller decide what to next.

So I pushed this commit.

Save binary results in "_handle_results()". Let user decide which kind of result they want in "stdout()" and "stderr()"

Class Git() should concentrate on the running process of a Git command. "version()" will be moved to "Repo()".

wenxin-bupt · 2017-07-06T08:56:24Z

I think Git() and GitJobQueue() is nearing completion. And how errors which occur during running are handled in Git() and GitJobQueue() should be explained here.

I met two kind of errors.

The first kind is induced by intentionally changing the Git().executable from "git" to "gitt". Git() emits signal errorOccurred with the message QProcess.FailedToStart and then crashes. It won't emit signal finished. This kind of error is useful when we want to check whether Git is available in shell. I think error handling should be placed in Repo(). So Git() just simply emits the errorOccurred signal.

Another kind of errors don't emit errorOccurred signal. But they pass a exitcode in finished() signal. This kind of error will occur when I intentionally change the argument args to a wrong argument. And then we can get the error message in stderr(). Same as the solution above, I just let Git() emit finished with the exitcode (0 for success, 1 for failure).

Git() also emits errorOccurred with QProcess.Crashed when calling kill() on it. But this kind of "errors" are caused by us. So it shouldn't be discussed here.

uliska · 2017-07-06T09:04:34Z

the vcs module has a function is_available('git') which in turn loads the vcs.gitrepo module, instantiates Repo and invokes the class method vcs_available('git') (see https://github.com/wbsoft/frescobaldi/blob/rewrite-vcs/frescobaldi_app/vcs/gitrepo.py#L83). Please copy that function to "your" Repo class and adapt to the new Git job as you describe.

I think a Repo() object doesn't have to care about the presence of Git. It should only be created in the first place if Git is available. (And we don't have to consider the case that a user uninstalls Git during a Frescobaldi session ...)

All correct

wenxin-bupt · 2017-07-08T11:38:25Z

Could I merge this branch to rewrite-vcs and then I can directly implement the Repo() part without worrying about the conflict?

Update:

I've started working on rewrite-vcs, so this branch can be kept for further revision.

uliska · 2017-07-08T14:42:47Z

OK. I read your previous comment but couldn't reply.

This branch was only created in order not to impose this Git() class change without discussing it first. So you can merge this pull request whenever you like into rewrite-vcs.

uliska added the gsoc-2017-git label Jun 28, 2017

uliska assigned wenxin-bupt Jun 28, 2017

uliska added 5 commits June 29, 2017 11:10

Git(): Remove deprecated elements

81a8ac6

I do this without further discussion but in a separate commit which makes it revertable

Git(): Remove obsolete import

71f5553

Git(): Remove obsolete signal

779a0ef

This has been superseded by the wrapped QProcess signals

Git() Add question comment

4600235

wenxin-bupt mentioned this pull request Jun 29, 2017

Rewrite vcs #974

Open

Fix some errors to test Git()

c7123c1

wenxin-bupt added 2 commits July 1, 2017 23:21

Add a demo of GitJobQueue

ce7efd2

QQueue is not provided in PyQt

cfb4cec

Use a raw list instead

wenxin-bupt added 3 commits July 2, 2017 21:44

Git(): Fix the error in kill()

5affda6

"if not self.isRunning():" doesn't make sense

GitJobQueue(): using QObject.deleteLater() to delete jobs in the queue

d532a8e

wenxin-bupt added 3 commits July 3, 2017 13:44

Git(): Raise an exception when "args" is None

09d2642

If "args" is passed in, run() and run_blocking() will use it, otherwise run() and run_blocking() will use "self.preset_args". If both "args" and "self.preset_args" are None, we think it's the programmer's fault. We raise an exception here.

Git(): Remove the TODO comment in _finished()

d106cf3

If the process crashes, a "errorOccurred" signal will be emitted and "finished" signal won't be emitted. So _finished() only need to handle the successful result.

GitJobQueue(): Add _remove_current(),

fe1c0e5

Remove the Git() object in queue-head. If the Git() object is running, terminate it by calling kill(). Modify the related code in KillAll() and _auto_run_next()

Git(): modify _finished() to forward "exitcode"

1ccd1af

Git(): revise the comment of signal "errorOccurred"

d22f0bb

wenxin-bupt reviewed Jul 4, 2017

View reviewed changes

wenxin-bupt added 7 commits July 4, 2017 11:25

Git(): Fix an error in _finished()

13617b0

"self" should be emitted along with the "exitcode" in signal "finished"

GitJobQueue(): Revise "_queue" using "collections.deque()"

9fcc073

GitJobQueue(): Change the name of "_auto_run_next()" and "killAll()"

534efb7

GitJobQueue(): Add a comment in enqueue()

10a0249

Git(): Fix a typo error in "stderr()"

b29c199

Git(): Remove the TODO comment in "stdout()" and "stderr()"

e32bbd2

"stdout()" and "stderr()" now return "_stdout" and "_stderr" directly.

wenxin-bupt reviewed Jul 4, 2017

View reviewed changes

wenxin-bupt added 7 commits July 4, 2017 21:26

Should use "is None" instead of "== None"

c5e1900

Git(): "isbinary" is moved to "stdout()" and "stderr()"

947b8c2

Save binary results in "_handle_results()". Let user decide which kind of result they want in "stdout()" and "stderr()"

Git(): Remove "version()"

7f0b3d7

Class Git() should concentrate on the running process of a Git command. "version()" will be moved to "Repo()".

Git(): Remove the "_tic()" "_toc()" helper function

844d2d3

Git(): Move the docstring of class "Git()"

435e7af

GitJobQueue(): Revise the docstrings in class "GitJobQueue"

91c3f74

Trim trailing spaces

4d9bf17

Git(): Revise the comment of signal "executed"

1134a9d

wenxin-bupt merged commit fe249d1 into rewrite-vcs Jul 9, 2017

uliska deleted the vcs-signals branch July 17, 2017 12:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Rewrite gitjob.Git() as a single-job class using Qt-style signals #981

WIP: Rewrite gitjob.Git() as a single-job class using Qt-style signals #981

uliska commented Jun 28, 2017

wenxin-bupt commented Jun 29, 2017

uliska commented Jun 29, 2017 •

edited

wenxin-bupt commented Jun 29, 2017

uliska commented Jun 30, 2017

wenxin-bupt commented Jul 1, 2017 •

edited

wenxin-bupt commented Jul 2, 2017 •

edited

uliska commented Jul 2, 2017

wenxin-bupt commented Jul 2, 2017

wenxin-bupt commented Jul 2, 2017 •

edited

wenxin-bupt commented Jul 2, 2017

uliska commented Jul 3, 2017 •

edited

wenxin-bupt commented Jul 3, 2017

uliska commented Jul 3, 2017

wenxin-bupt commented Jul 4, 2017 •

edited

wenxin-bupt Jul 4, 2017

wenxin-bupt Jul 4, 2017

wenxin-bupt Jul 4, 2017

wenxin-bupt commented Jul 6, 2017

uliska commented Jul 6, 2017 via email

wenxin-bupt commented Jul 8, 2017 •

edited

uliska commented Jul 8, 2017

		"""
		# TODO: should we check isRunning() before or can we rely on the "is not None" check?

WIP: Rewrite gitjob.Git() as a single-job class using Qt-style signals #981

WIP: Rewrite gitjob.Git() as a single-job class using Qt-style signals #981

Conversation

uliska commented Jun 28, 2017

wenxin-bupt commented Jun 29, 2017

uliska commented Jun 29, 2017 • edited

wenxin-bupt commented Jun 29, 2017

uliska commented Jun 30, 2017

wenxin-bupt commented Jul 1, 2017 • edited

wenxin-bupt commented Jul 2, 2017 • edited

uliska commented Jul 2, 2017

wenxin-bupt commented Jul 2, 2017

wenxin-bupt commented Jul 2, 2017 • edited

wenxin-bupt commented Jul 2, 2017

uliska commented Jul 3, 2017 • edited

wenxin-bupt commented Jul 3, 2017

uliska commented Jul 3, 2017

wenxin-bupt commented Jul 4, 2017 • edited

wenxin-bupt Jul 4, 2017

Choose a reason for hiding this comment

wenxin-bupt Jul 4, 2017

Choose a reason for hiding this comment

wenxin-bupt Jul 4, 2017

Choose a reason for hiding this comment

wenxin-bupt commented Jul 6, 2017

uliska commented Jul 6, 2017 via email

wenxin-bupt commented Jul 8, 2017 • edited

uliska commented Jul 8, 2017

uliska commented Jun 29, 2017 •

edited

wenxin-bupt commented Jul 1, 2017 •

edited

wenxin-bupt commented Jul 2, 2017 •

edited

wenxin-bupt commented Jul 2, 2017 •

edited

uliska commented Jul 3, 2017 •

edited

wenxin-bupt commented Jul 4, 2017 •

edited

wenxin-bupt commented Jul 8, 2017 •

edited