Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a lexer for the R language in Haskell #369

Closed
facundominguez opened this issue Nov 16, 2021 · 15 comments
Closed

Implement a lexer for the R language in Haskell #369

facundominguez opened this issue Nov 16, 2021 · 15 comments

Comments

@facundominguez
Copy link
Member

facundominguez commented Nov 16, 2021

We discovered in #368 that inline-r stopped working with ghc-8.10.6 in MacOS.

Abstracting over low-level details, the R runtime started complaining that the Template Haskell code is running in an inappropriate thread.

inline-r uses quasiquotes to embed R code, and then uses Template Haskell to generate the calls to execute the code in the R runtime. One of the tasks that Template Haskell accomplishes is parsing the R code to extract from it the quoted variables that must be marshaled from the surrounding Haskell context.

The parser is implemented itself in the R runtime, so we have Template Haskell invoke the R runtime to do the parsing. Now the R runtime is very picky and wants to run in the main thread of the program invoking it, and it turns out that a patch in ghc-8.10.6 started running TH splices in a forked thread.

I imagine that we could embrace this constraint, and demand of ghc that it provides some flag to recover the behavior up-to ghc-8.10.4. But perhaps it would work better to just give up trying to run the R runtime in Template Haskell, and instead do our lexing in Haskell. After all, we don't expect the tokens of the R language to change a lot over time. What do you think?

@nrnrnr
Copy link
Member

nrnrnr commented Nov 16, 2021

Unless R has a gnarly lexical structure, I would go for this. The manual makes it look pretty civilized, even if you have to write a parser as well as a lexer.

@goldfirere
Copy link

Happening upon this thread only by luck... but would it make sense to run R in a separate process and then send the code to it via a pipe of some sort? That would sound like it would respect R's sensitive runtime while not reimplementing something annoying.

@facundominguez
Copy link
Member Author

Yep, that is a feasible approach as well. Thanks for bringing it up!

@idontgetoutmuch
Copy link
Member

Would that work on windows?

@facundominguez
Copy link
Member Author

facundominguez commented Nov 17, 2021

It could be made to work in windows.
The simplest thing would be to run R once per quasiquote, so we don't have to figure out when to start and terminate a long running process. The major downside, is that there probably will be platform differences in the communication mechanism with R and we will have at least 3 platforms to check I'd expect most platform differences to be abstracted away by the process package.

Another thing to consider is that if we go with a lexer (one that only scans enough to get quoted variables), parsing errors would be deferred until runtime. Is this acceptable? A full lexer and a parser do look like they will take some more trouble to get right.

@UnkindPartition
Copy link
Collaborator

@facundominguez it seems to work just fine with GHC 9 and the latest stackage nightly, both locally and on the CI: https://github.com/tweag/HaskellR/runs/4627191190?check_suite_focus=true

I can't test locally with GHC 8.10.6 unfortunately; with nix it fails with attribute 'ghc8106' missing, and without nix I'm running into #257.

So I'm inclined to close this unless someone can reproduce it with GHC 9+.

@facundominguez
Copy link
Member Author

facundominguez commented Dec 24, 2021

Hello! Perhaps you could reproduce it by reverting this commit: 97e99d0

I checked that the GHC modifications that broke inline-r are still present in ghc-9.2.1. Maybe upgrade CI to ghc-9.2.1 and see what happens there. I have only seen it fail in MacOS. I'm updating the description here to make that explicit. I'm sorry for the oversight.

@facundominguez
Copy link
Member Author

So I'm inclined to close this unless someone can reproduce it with GHC 9+.

Given that TH splices don't run in the main thread still, I think we are being unlucky. If the failure isn't visible at runtime, we have at least a design flaw that still would deserve being fixed. Also, one thing to try is running ghc with +RTS -N or -j and see if that makes a difference.

@facundominguez
Copy link
Member Author

facundominguez commented Dec 24, 2021

I just checked the source code of ghc-9.0.1, which is the actual version that the dev branch is trying. It does not contain the breaking changes. You will have to try 9.2.1 to have a chance of reproducing it.

The 9.0.2 release also includes the breaking changes: https://www.haskell.org/ghc/blog/20211225-ghc-9.0.2-released.html
They appear as issue 19410 in the release notes.

@UnkindPartition
Copy link
Collaborator

@facundominguez @idontgetoutmuch I still cannot reproduce the issue on GHC 9.2.1, either locally or on CI, including on Mac OS (see e.g. this job).

However, I have now made a change which uses an external R interpreter process instead of an embedded one; it's currently on the ghc-9.2.1 branch. Could you please test it and see if it fixes the issue for you?

@facundominguez
Copy link
Member Author

My way of reproducing was CI. So given that CI is green, your fix is looking good. I had a look at fe9b5bd, and it matches what I expected.

One thing I didn't anticipated was the need to write a temporary file with the quasiquote contents, but likely this is necessary to avoid escaping issues if one were to construct an R string instead. Perhaps there is some way to feed stdin as a file: https://stackoverflow.com/questions/9370609/piping-stdin-to-r
but I'm nitpicking.

@UnkindPartition
Copy link
Collaborator

My way of reproducing was CI. So given that CI is green, your fix is looking good.

Well, the CI was green even before the fix — see the link above.

Perhaps there is some way to feed stdin as a file

The problem isn't so much to read the input from stdin, but the fact that I'm feeding the R script itself through stdin to the interpreter. The alternative would be to introduce a run-time dependency (with the help of Cabal) on the installed R/collectAntis.R script file as opposed to embedding it into the binary like it's done now. I thought the current way was more resilient, but I'm open to changing it.

@facundominguez
Copy link
Member Author

facundominguez commented Jan 31, 2022

Right, I was reading the git history backwards. In that case looks like I can't help much reproducing it. I'm trying ghc-9.0.2 here though.

The alternative would be to introduce a run-time dependency (with the help of Cabal) on the installed R/collectAntis.R script file as opposed to embedding it into the binary like it's done now. I thought the current way was more resilient, but I'm open to changing it.

I see. In that case, and in the absence of further issues, I'd keep it as is.

@facundominguez
Copy link
Member Author

I think I managed to reproduce the failure with ghc-9.0.2 in this build.

HaskellR-examples                > Error: C stack usage  17587407606712 is too close to the limit

@idontgetoutmuch
Copy link
Member

@facundominguez @idontgetoutmuch I still cannot reproduce the issue on GHC 9.2.1, either locally or on CI, including on Mac OS (see e.g. this job).

However, I have now made a change which uses an external R interpreter process instead of an embedded one; it's currently on the ghc-9.2.1 branch. Could you please test it and see if it fixes the issue for you?

For me this now works on 9.2.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants