Pipe #33

Closed
wants to merge 24 commits into from

7 participants

Ted Nyman Aman Gupta Postmodern Ryan Tomayko Bryan Helmkamp Caleb Spare Scott J. Goldman
Ted Nyman
Collaborator
tnm commented

This adds a full implementation of the Pipe Based Method of talking to Pygments. It maintains full API compatibility with the existing pygments.rb library.

The basic data flow:

  1. Rubyland opens a pipe to the mentos.py process if one does not already exist

  2. mentos listens for a fixed-length, 32-character string encoding an integer (in binary digits), which gives the length of the header

  3. Rubyland sends a JSON header containing the 'RPC'-ish method name, args, kwargs, and the length of the text to follow (if any)

  4. Rubyland sends over the text, if any

  5. mentos returns the requested data
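The framing in steps 2 to 4 can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the code in this PR: it assumes the header length is written as a 32-character string of binary digits (consistent with the `int(size, 2)` call in the mentos.py excerpt) and that both sides exchange UTF-8 bytes. `encode_request` and `decode_request` are made-up names.

```python
import io
import json

HEADER_LEN_DIGITS = 32  # fixed width of the length prefix, in characters


def encode_request(method, args=(), kwargs=None, text=""):
    """Build one request: 32-digit binary length prefix, JSON header, then the text."""
    body = text.encode("utf-8")
    header = json.dumps({
        "method": method,
        "args": list(args),
        "kwargs": kwargs or {},
        "bytes": len(body),  # length of the text that follows, in bytes
    }).encode("utf-8")
    # format(n, "032b") zero-pads n to 32 binary digits.
    prefix = format(len(header), "032b").encode("ascii")
    return prefix + header + body


def decode_request(stream):
    """Read one request from a binary stream; return (header_dict, text)."""
    size = stream.read(HEADER_LEN_DIGITS)
    header_bytes = int(size, 2)  # parse the binary digits back into an integer
    header = json.loads(stream.read(header_bytes).decode("utf-8"))
    text = stream.read(header["bytes"]).decode("utf-8")
    return header, text


# Round trip through an in-memory stream standing in for the pipe.
pipe = io.BytesIO(encode_request("highlight", kwargs={"lexer": "ruby"}, text="puts 'hi'"))
print(decode_request(pipe))
```

Because the prefix has a fixed width, the reader always knows how many bytes to consume next and never has to scan for a delimiter such as a newline.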

This pull request also handles dead child processes by trapping SIGCHLD and then issuing the subsequent wait to reap the child. We can also respawn child processes if they die, and we catch the relevant errors raised by the syscalls.

This also adds a benchmark tool allowing for arbitrary iterations as well as increasing the length of the input data itself.

All existing tests are green, and I've also added several more unit tests. There are now also some test data files (from Redis and Gunicorn) in test/.

lib/pygments/mentos.py
((72 lines not shown))
  72 + Highlight the relevant code, and return a result string.
  73 + The default formatter is html, but alternate formatters can be passed in via
  74 + the formatter_name argument. Additional parameters can be passed as args
  75 + or kwargs.
  76 + """
  77 + # Default to html if we don't have the formatter name.
  78 + if formatter_name:
  79 + _format_name = str(formatter_name)
  80 + else:
  81 + _format_name = "html"
  82 +
  83 + # Return a lexer object
  84 + lexer = self.return_lexer(args, kwargs)
  85 +
  86 + # Make sure we successfully got a lexer
  87 + if lexer:
Aman Gupta Owner
tmm1 added a note

What happens in the else case here? Do we need to send back an error response of some sort?
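One possible answer to the else-case question, sketched in Python under the assumption that mentos replies with a JSON header: rather than returning nothing, send back an `error` field, and use the same path for unexpected exceptions so Rubyland could raise it (for example as the `MentosError` class in popen.rb). `handle_highlight`, `find_lexer`, and `render` are hypothetical stand-ins for the lexer lookup and the Pygments formatting step, not functions from mentos.py.

```python
import traceback


def handle_highlight(text, args, kwargs, find_lexer, render):
    """Return (response_header, body) for a highlight request.

    find_lexer and render are hypothetical callables standing in for the
    Pygments lexer lookup and formatting step.
    """
    try:
        lexer = find_lexer(args, kwargs)
        if lexer is None:
            # The else case: report the failure instead of returning nothing.
            return {"error": "Could not find a lexer for the given arguments"}, ""
        body = render(text, lexer)
        return {"bytes": len(body.encode("utf-8"))}, body
    except Exception:
        # Ship the full Python traceback so the caller can surface it.
        return {"error": traceback.format_exc()}, ""
```

Either way, the caller always receives a header it can parse, so a failed lookup cannot leave Rubyland waiting for a body that never arrives.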

lib/pygments/mentos.py
((166 lines not shown))
  166 +
  167 + The header is of form:
  168 + { "method": "highlight", "args": [], "kwargs": {"arg1": "v"}, "bytes": 128, "fd": "8"}
  169 + """
  170 +
  171 + while True:
  172 + res = None
  173 +
  174 + # The loop begins by reading off a fixed 32-character string of binary
  175 + # digits representing a 32-bit integer: the length of our JSON header.
  176 + # Using a fixed-width length prefix lets us avoid worrying about newlines.
  177 + size = sys.stdin.read(32)
  178 +
  179 + # Read from stdin the amount of bytes we were told to expect.
  180 + header_bytes = int(size, 2)
  181 + line = sys.stdin.read(header_bytes)
Aman Gupta Owner
tmm1 added a note

We've had issues in the past sending utf-8 over stdin. I think we did something like

sys.stdin  = codecs.getreader('UTF-8')(sys.stdin)
Bryan Helmkamp
brynary added a note

@tmm1 -- What issues did you see with utf-8 over stdin in the past?
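Whatever caused the earlier problems, one pitfall worth guarding against is mixing byte counts and character counts. In Python 3, `sys.stdin` is a text stream whose `read(n)` counts characters, while the header's `bytes` field counts bytes, so any non-ASCII text makes the two disagree. A sketch of the alternative, assuming Python 3 (where `sys.stdin.buffer` is the underlying binary stream): read exactly the advertised number of bytes, then decode. `read_exactly` and `read_text` are hypothetical helper names.

```python
import io


def read_exactly(stream, n):
    """Read exactly n bytes from a binary stream, looping over short reads."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("pipe closed after %d of %d bytes" % (n - remaining, n))
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_text(stream, nbytes):
    """Read an nbytes-long UTF-8 payload and decode it only once it is complete."""
    return read_exactly(stream, nbytes).decode("utf-8")


# In mentos this would be sys.stdin.buffer; BytesIO stands in for the pipe here.
payload = "naïve café".encode("utf-8")
stream = io.BytesIO(payload + b"next request")
print(read_text(stream, len(payload)))  # the trailing bytes stay unread
```

Decoding after the full read also means a multi-byte character can never be split across two reads.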

lib/pygments/popen.rb
((19 lines not shown))
  19 + # The #start method also includes logic for dealing with signals from the
  20 + # child.
  21 + #
  22 + def start(pygments_path = File.expand_path('../../../vendor/pygments-main/', __FILE__))
  23 + ENV['PYGMENTS_PATH'] = pygments_path
  24 +
  25 + # Make sure we kill off the child when we're done
  26 + at_exit { stop }
  27 +
  28 + # A pipe to the mentos python process. #POSIX::Spawn#popen4 gives us
  29 + # the pid and three IO objects to write and read.
  30 + @pid, @in, @out, @err = popen4(File.expand_path('../mentos.py', __FILE__))
  31 +
  32 + # Deal with dying child processes.
  33 + Signal.trap('CHLD') do
  34 +
Aman Gupta Owner
tmm1 added a note

I think you can do something like this to make sure other existing CHLD handlers are not overwritten:

old = trap('CHLD') do
  old.call if old.respond_to?(:call)
end

I'm not sure we want to in this case though... since the other handler is probably also going to call waitpid and would block.
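The same chaining idea looks like this in Python, where `signal.signal` returns the previously installed handler. This is a hedged sketch, not the PR's code: as noted above, chaining only helps if the previous handler does not itself block in a wait, and the previous value may be `SIG_DFL`, `SIG_IGN`, or `None` rather than a callable, so it is only invoked when callable. `install_chained_handler` is a hypothetical name.

```python
import signal


def install_chained_handler(signum, handler):
    """Install handler for signum, then also call whatever handler was there before."""
    previous = None  # bound once signal.signal returns; until then chaining is skipped

    def chained(sig, frame):
        handler(sig, frame)
        # previous may be SIG_DFL, SIG_IGN or None; only real callables are invoked.
        if callable(previous):
            previous(sig, frame)

    previous = signal.signal(signum, chained)
    return previous
```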

lib/pygments/popen.rb
((27 lines not shown))
  27 +
  28 + # A pipe to the mentos python process. #POSIX::Spawn#popen4 gives us
  29 + # the pid and three IO objects to write and read.
  30 + @pid, @in, @out, @err = popen4(File.expand_path('../mentos.py', __FILE__))
  31 +
  32 + # Deal with dying child processes.
  33 + Signal.trap('CHLD') do
  34 +
  35 + # Once waitpid() returns the pid (i.e., the child process exited),
  36 + # we can safely set our pid variable to nil. Next time a Pygments.rb
  37 + # method gets called, the child will be spawned again, so we don't
  38 + # need to spawn a new child in this block right now. For extra safety,
  39 + # if an ECHILD (no children) is set by waitpid(), don't die horribly;
  40 + # still set the @pid to nil.
  41 + begin
  42 + @pid = nil if Process.waitpid == @pid
Aman Gupta Owner
tmm1 added a note

Does it make sense to use Process::WNOHANG here as well? To avoid any blocking in case the process has already been reaped.

Also might be worth using Process.waitpid(@pid) instead, but I'm not convinced.

Ted Nyman Collaborator
tnm added a note

Yeah totally, I can add in the WNOHANG. I do think plain Process.waitpid is fine here.

Also, once we get the start/stop process exactly how we want it, I'll go ahead and extract it, since it seems generic and reusable.
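A sketch of the non-blocking reap discussed in this thread, written in Python because `os.waitpid` and `os.WNOHANG` mirror Ruby's `Process.waitpid` and `Process::WNOHANG`. With `WNOHANG`, the call returns immediately with a pid of 0 if the child is still running, and an `ECHILD` error (`ChildProcessError` in Python) means the child was already reaped. Waiting on the specific pid, rather than on any child as a bare `Process.waitpid` does, also avoids swallowing the exit status of unrelated children of the same process. `reap_if_exited` is a hypothetical helper, not code from this PR.

```python
import os


def reap_if_exited(pid):
    """Return True if child `pid` has exited (reaping it), False if still running.

    Uses WNOHANG so the call never blocks, like Process.waitpid(@pid, Process::WNOHANG).
    """
    if pid is None:
        return True
    try:
        reaped_pid, _status = os.waitpid(pid, os.WNOHANG)
    except ChildProcessError:
        # ECHILD: no such child, e.g. it was already reaped elsewhere.
        return True
    # waitpid returns (0, 0) while the child is still running.
    return reaped_pid == pid
```

A SIGCHLD handler built on this would clear the stored pid only when `reap_if_exited` returns True, leaving the next Pygments call to respawn the child.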

lib/pygments/popen.rb
((65 lines not shown))
  65 + Process.waitpid(@pid)
  66 + rescue Errno::ESRCH, Errno::ECHILD
  67 + end
  68 + end
  69 +
  70 + @pid = nil
  71 + end
  72 +
  73 + # Check for a @pid variable, and then hit `kill -0` with the pid to
  74 + # check if the pid is still in the process table. If this function
  75 + # gives us an ENOENT or ESRCH, we can also safely return false (no process
  76 + # to worry about).
  77 + #
  78 + # Returns true if the child is alive.
  79 + def alive?
  80 + return true if @pid && Process.kill(0, @pid)
Aman Gupta Owner
tmm1 added a note

I used this kill(0, pid) trick but I'm wondering if there's a better solution. I guess I'm concerned it might be expensive, but it doesn't really look that way.

/cc @rtomayko @scottjg

Scott J. Goldman
scottjg added a note

i'm not sure offhand how expensive it is, but i'm wondering if it's really reliable/necessary? if the process is dead, won't the pipe be broken - and that gets caught below anyway?

Aman Gupta Owner
tmm1 added a note

Good point, if we catch the EPIPE when writing the json payload and respawn/retry that'll probably be good enough.

Ryan Tomayko
rtomayko added a note

My understanding is that kill(0, pid) is fast and reliable and the best way to do a pure process exists check. Use it if you need it. Agreed EPIPE looks like it makes it unnecessary though.

Ted Nyman Collaborator
tnm added a note

Yep, the EPIPE should take care of it, although I suppose it's harmless enough to leave this check in. I'll try to rig up some tests anyway to get some data on the perf implications, if any.
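Both approaches from this thread can be sketched in Python (hypothetical helper names; not the PR's code). `os.kill(pid, 0)` sends no signal and only performs the existence and permission checks, so `ProcessLookupError` (ESRCH) means the process is gone, while `PermissionError` (EPERM) means it exists but cannot be signalled by us. One caveat: an exited child that has not been reaped yet is a zombie and still passes this check. The EPIPE route instead notices the dead child on the next write: writing to a pipe whose read end is closed fails with `BrokenPipeError`, since Python ignores SIGPIPE by default and the failure surfaces as an exception.

```python
import os


def alive(pid):
    """Pure existence check: signal 0 performs error checking but sends nothing."""
    if pid is None:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:  # ESRCH: no such process
        return False
    except PermissionError:  # EPERM: exists, but we may not signal it
        return True
    return True


def write_or_report_dead(fd, data):
    """Write all of data to a pipe fd; return False instead of raising if the reader is gone."""
    view = memoryview(data)
    try:
        while view:
            written = os.write(fd, view)
            view = view[written:]  # handle partial writes
    except BrokenPipeError:  # EPIPE: the child closed its end (or died)
        return False
    return True
```

A caller would respawn the child and resend the request when `write_or_report_dead` returns False, which covers the dead-child case without a separate liveness probe.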

test/test_pygments.rb
((6 lines not shown))
5 5
6   -class PygmentsLexerTest < Test::Unit::TestCase
7   - include Pygments
  6 +P = Pygments
Aman Gupta Owner
tmm1 added a note

This whole P. thing is kind of stupid, but the include Pygments into all the test classes and subclasses was causing weird inheritance behaviors and making the tests unreliable. Maybe there's a better solution, but I couldn't come up with anything..

Aman Gupta
Owner
tmm1 commented

:fire: This looks great. If the simplejson issue is resolved and CI is green, we should branch deploy this to an fe or two and see what happens.

lib/pygments/popen.rb
... ... @@ -0,0 +1,261 @@
  1 +# coding: utf-8
  2 +
  3 +require 'posix/spawn'
  4 +require 'yajl'
  5 +
  6 +# Error class
  7 +class MentosError < IOError
Ted Nyman Collaborator
tnm added a note

I'll add in the code so we make full use of this error class (getting a Python trace, etc)

Postmodern

Has this been tested on Windows?

lib/pygments/popen.rb
... ... @@ -0,0 +1,261 @@
  1 +# coding: utf-8
  2 +
  3 +require 'posix/spawn'
  4 +require 'yajl'
Postmodern
postmodern added a note

Why yajl instead of json or multi_json?

Aman Gupta Owner
tmm1 added a note

I like yajl, and it provides a fast symbolize_keys option.

Postmodern
postmodern added a note

The fewer dependencies and C-extensions the better.

Aman Gupta
Owner
tmm1 commented

Has this been tested on Windows?

No. Likely will not work, although posix-spawn kinda works on 1.9 in windows.

Either way, it requires python which is not bundled on windows. win32 is not a priority right now. You can continue to use pygments 0.2.x on windows if that works, since it's API compatible.

Ryan Tomayko

Man this looks awesome. Good shit.

Bryan Helmkamp

Hey @tmm1, @tnm...

I tried this branch out yesterday, and I think I ran into an issue where Pygments returned the wrong result for a request. Haven't been able to reproduce yet. Is there any chance that a bug in this code might cause Pygments to incorrectly return the result of a previous highlight? (Or perhaps if it's used incorrectly -- maybe somehow something is sharing a socket which shouldn't be, etc.)

Thoughts?

-Bryan
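On how a pipe protocol can return a previous request's result in general: if one response is left partly unread (for example after a timeout or an exception mid-read), the next read picks up those leftover bytes, and later replies can then be off by one. One common guard, shown as a hypothetical Python sketch rather than anything this PR implements, is to tag each request with a random id, have the server echo it back, and treat a mismatch as a desynchronized pipe that should be torn down and respawned. The `id` header field and both helper names are assumptions.

```python
import secrets


def make_request(method, text):
    """Build a request header carrying a fresh random id (hypothetical 'id' field)."""
    return {"method": method, "bytes": len(text.encode("utf-8")), "id": secrets.token_hex(4)}


def check_response(request, response):
    """Raise if the response was produced for a different request."""
    if response.get("id") != request["id"]:
        raise RuntimeError(
            "pipe out of sync: expected id %r, got %r" % (request["id"], response.get("id"))
        )
    return response
```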

Ted Nyman
Collaborator
tnm commented

Hey @brynary,

I'm still finishing this up, so I wouldn't use it quite yet. Once the release is done, a bug like that won't be possible :)

-t

Bryan Helmkamp

@tnm -- Thanks for the heads up. Does the behavior I describe sound like a possible or known bug?

tnm and others added some commits
Ted Nyman tnm Update core mentos operations, adding pipe checks.
This also updates tests, improves the shell lexer recognition, and allows for
logging to be set via the ENV.
0b2049c
Ted Nyman tnm Update with latest error handling, timeouts, and local lexers 02c05ff
Ted Nyman tnm closed this
Caleb Spare

Hey, I'm thinking of writing something like pygments.rb for Go, and I'm curious why you guys switched approaches from the embedded python to a child process. I looked through this ticket and others and couldn't see any mention of the reasoning. Does this approach afford some kind of speedup, maybe? Perhaps using rubypython to embed a python runtime has higher memory usage than a separate process, or can cause instabilities? I'm just guessing here.

Ted Nyman
Collaborator
tnm commented

Hey @cespare — There were reliability issues with rubypython and the embedded approach; too-frequent segfaults, mostly, as well as difficulty in debugging them. #13 was an example that would occur sometimes in tests.

Caleb Spare

@tnm Thanks, appreciate it.


Showing 24 unique commits by 3 authors.

Apr 19, 2012
Aman Gupta add really basic v0.1 of Pygments::Popen 712644d
Jul 01, 2012
Ted Nyman tnm merge 11abc8d
Jul 04, 2012
Ted Nyman tnm Full implementation of popen. Remove old extensions, update Rakefile. 6517d46
Ted Nyman tnm Merge branch 'master' into full-popen 602e68e
Jul 07, 2012
Ted Nyman tnm Improve pipe communication style, more tests, benchmarks 8d84bc0
Ted Nyman tnm Add new benchmarks to the README 0beeec7
Ted Nyman tnm Nicer README formatting for benchmarks 9810916
Ted Nyman tnm Add benchmark task to Rakefile 62e06b1
Jul 08, 2012
Ted Nyman tnm Additional tests and matching 3aa50a7
Jul 09, 2012
Ted Nyman tnm Vendor simplejson and ensure it is on the path 75ece4b
Ted Nyman tnm Raise mentos errors in Rubyland 671d395
Ted Nyman tnm Better error detail ec8dce5
Jul 10, 2012
Ted Nyman tnm Convert unicode keys to strings for older Python 410b981
Ted Nyman tnm Raise full Python errors in Rubyland 1d575a7
Ted Nyman tnm Remove old code ac94b46
Jul 12, 2012
Ted Nyman tnm Respect SIGCHLD raised from other children of the same process 61ed40e
Jul 13, 2012
Ted Nyman tnm Explicitly pass in code to matching 9d32e7d
Ted Nyman tnm Add default arg bf69dce
Jul 14, 2012
Ted Nyman tnm Bump version to 0.3.0, simplify pipe communication, test additions d7fe71e
Ted Nyman tnm Remove old code 26af3b0
Jul 15, 2012
Ted Nyman tnm Better pythonpath 478eaa4
Jul 16, 2012
Ted Nyman tnm Make sure not to send any code as JSON bf2c254
Aug 04, 2012
Ted Nyman tnm Update core mentos operations, adding pipe checks.
This also updates tests, improves the shell lexer recognition, and allows for
logging to be set via the ENV.
0b2049c
Sep 25, 2012
Ted Nyman tnm Update with latest error handling, timeouts, and local lexers 02c05ff