New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New qq #188

Closed
wants to merge 6 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@qnikst
Member

qnikst commented Jul 24, 2015

Instead of analysing and generatic code that will recreate R's SEXP tree,
at runtime, new quasi-quoter analyses R's and adds additional functions
that take care of interporability of between runtime. I.e. converts
and assignes all required values and then passes computation to
plain parse-eval function in R. So QQ converts (simplified):

[r| plot(x_hs, y_hs) |]

into

do define "x_hs" (mkSEXP x_hs)
   define "y_hs" (mkSEXP y_hs)
   x <- tryEval "plot(x_hs, y_hs)"
   undefine "x_hs"
   undefine "y_hs"
   return x

Non simplified version uses withGlobalyDefined helper instead of
define/undefine pair, mkSEXP happen in IO, and special care about
functions is introduced.

This approach seems very reasonable unless we don't want to change
R's SEXP tree, more over we could convert this represendation into
the form

let expression = parsePure "plot(x_hs, y_hs)"
in do ...
      x <- eval expressin
      ...

In this form parsed expressions could be floated out. However it's
not obvious if this transformation is safe to apply in R and require
additional investigation.

@mboes

This comment has been minimized.

Show comment
Hide comment
@mboes

mboes Jul 24, 2015

Member

@Fuuzetsu could you give this a review? Reassign to me once done.

Member

mboes commented Jul 24, 2015

@Fuuzetsu could you give this a review? Reassign to me once done.

@mboes mboes assigned Fuuzetsu and unassigned mboes Jul 24, 2015

@Fuuzetsu

This comment has been minimized.

Show comment
Hide comment
@Fuuzetsu

Fuuzetsu Jul 24, 2015

Contributor

@mboes sure

Contributor

Fuuzetsu commented Jul 24, 2015

@mboes sure

@Fuuzetsu

This comment has been minimized.

Show comment
Hide comment
@Fuuzetsu

Fuuzetsu Jul 24, 2015

Contributor

There seem to be two main issues, or 1 issue and 1 maybe-issue-depends-who-you-ask. First the obvious one from the diff, closures being broken:

+    -- XXX: there is a problem with closures that we need to solve somehow,
+    --      temporary solution, just not use it.
+    -- _ <- [r| z <- function(y) y_hs + y |]
+    -- ("8" @=?) =<< [r| z(3) |]

Seems like a pretty big regression. AFAIK @qnikst wanted some review before digging into it further though. I don't know what the policy is but I wonder if it'd be better to have ‘expecting to fail with such and such message’ test rather than just comment it out. At least then we might catch when something changes again.

On master:

Prelude H H R R H> let y = (5 :: Double) in [r| z <- function(y) y_hs + y |] >> return ()
Prelude H H R R H> H.printQuote [r| z(3) |]
[1] 8

This PR:

Prelude H H R R H> let y = (5 :: Double) in [r| z <- function(y) y_hs + y |] >> return ()
Prelude H H R R H> H.printQuote [r| z(3) |]
*** Exception: R Runtime Error: Error in z(3) : object 'y_hs' not found

Second problem is change of semantics in how values that get mutated work. Notably, on master:

Prelude H H R R H GHC.Int> [r| abc <- function(x){eval(parse(text = paste(substitute(x), "<<- 6"))); 10} |] >> return ()
Prelude H H R R H GHC.Int> let x = (2 :: Int32) in H.printQuote [r| abc(x_hs) + x_hs |]
*** Exception: R Runtime Error: Error in 2 <<- 6 : invalid (do_set) left-hand side to assignment

but with this PR:

Prelude H H R R H GHC.Int> [r| abc <- function(x){eval(parse(text = paste(substitute(x), "<<- 6"))); 10} |] >> return ()
Prelude H H R R H GHC.Int> let x = (2 :: Int32) in H.printQuote [r| abc(x_hs) + x_hs |]
[1] 16

What the right result should be is ‘it depends’. Personally I'd expect 12, i.e. look from the Haskell side where values are immutable and we're effectively splicing, something like

Prelude H H R R H GHC.Int> let x = (2 :: Int32); y = x in H.printQuote [r| abc(x_hs) + y_hs |]
[1] 12

This gives a suggestion on how to implement such a thing, that is by binding each separate occurance of _hs value to a different name. Given an approach that returns 12, we can recover mutability inside R with something like [r| x <- x_hs; abc(x) + x|], result of which I think should not be surprising to anyone, 16.

But if you ask R person they'll probably tell you they'd expect 16 to begin with.

Whatever we settle on, I think we should have a test: this is a change in behaviour that went under the radar.

Just going to leave very minor nitpicks in-code but otherwise I see no other issues.

Contributor

Fuuzetsu commented Jul 24, 2015

There seem to be two main issues, or 1 issue and 1 maybe-issue-depends-who-you-ask. First the obvious one from the diff, closures being broken:

+    -- XXX: there is a problem with closures that we need to solve somehow,
+    --      temporary solution, just not use it.
+    -- _ <- [r| z <- function(y) y_hs + y |]
+    -- ("8" @=?) =<< [r| z(3) |]

Seems like a pretty big regression. AFAIK @qnikst wanted some review before digging into it further though. I don't know what the policy is but I wonder if it'd be better to have ‘expecting to fail with such and such message’ test rather than just comment it out. At least then we might catch when something changes again.

On master:

Prelude H H R R H> let y = (5 :: Double) in [r| z <- function(y) y_hs + y |] >> return ()
Prelude H H R R H> H.printQuote [r| z(3) |]
[1] 8

This PR:

Prelude H H R R H> let y = (5 :: Double) in [r| z <- function(y) y_hs + y |] >> return ()
Prelude H H R R H> H.printQuote [r| z(3) |]
*** Exception: R Runtime Error: Error in z(3) : object 'y_hs' not found

Second problem is change of semantics in how values that get mutated work. Notably, on master:

Prelude H H R R H GHC.Int> [r| abc <- function(x){eval(parse(text = paste(substitute(x), "<<- 6"))); 10} |] >> return ()
Prelude H H R R H GHC.Int> let x = (2 :: Int32) in H.printQuote [r| abc(x_hs) + x_hs |]
*** Exception: R Runtime Error: Error in 2 <<- 6 : invalid (do_set) left-hand side to assignment

but with this PR:

Prelude H H R R H GHC.Int> [r| abc <- function(x){eval(parse(text = paste(substitute(x), "<<- 6"))); 10} |] >> return ()
Prelude H H R R H GHC.Int> let x = (2 :: Int32) in H.printQuote [r| abc(x_hs) + x_hs |]
[1] 16

What the right result should be is ‘it depends’. Personally I'd expect 12, i.e. look from the Haskell side where values are immutable and we're effectively splicing, something like

Prelude H H R R H GHC.Int> let x = (2 :: Int32); y = x in H.printQuote [r| abc(x_hs) + y_hs |]
[1] 12

This gives a suggestion on how to implement such a thing, that is by binding each separate occurance of _hs value to a different name. Given an approach that returns 12, we can recover mutability inside R with something like [r| x <- x_hs; abc(x) + x|], result of which I think should not be surprising to anyone, 16.

But if you ask R person they'll probably tell you they'd expect 16 to begin with.

Whatever we settle on, I think we should have a test: this is a change in behaviour that went under the radar.

Just going to leave very minor nitpicks in-code but otherwise I see no other issues.

Show outdated Hide outdated inline-r/src/Language/R/QQ.hs
_ -> error "Impossible happen."
mkQQ :: String -> Q TH.Exp
mkQQ input = parse input >>= \expr ->
-- XXX: possibly we need to introduce our own parser and do not reuse R one.

This comment has been minimized.

@Fuuzetsu

Fuuzetsu Jul 24, 2015

Contributor

As mentioned on slack, should state why this might be the case

@Fuuzetsu

Fuuzetsu Jul 24, 2015

Contributor

As mentioned on slack, should state why this might be the case

Show outdated Hide outdated inline-r/src/Language/R/QQ.hs
nmext = name ++ "_ext"
splice = "function(...) .Call(" ++ nmext ++ ", ...)"
hvar <- fmap (TH.varE . (maybe (TH.mkName nm) id)) (TH.lookupValueName nm)
[| -- let expression = H.unsafeParseText splice False

This comment has been minimized.

@Fuuzetsu

Fuuzetsu Jul 24, 2015

Contributor

Spurious commented line, should remove it rather than keep it around

@Fuuzetsu

Fuuzetsu Jul 24, 2015

Contributor

Spurious commented line, should remove it rather than keep it around

Show outdated Hide outdated inline-r/src/Language/R/QQ.hs
splice = "function(...) .Call(" ++ nmext ++ ", ...)"
hvar <- fmap (TH.varE . (maybe (TH.mkName nm) id)) (TH.lookupValueName nm)
[| -- let expression = H.unsafeParseText splice False
R.withProtected (H.mkSEXPIO $hvar) $ \val -> do

This comment has been minimized.

@Fuuzetsu

Fuuzetsu Jul 24, 2015

Contributor

Surely this is just H.mkSEXPIO hvar instead of H.mkSEXPIO $hvar or even H.mkSEXPIO $ hvar?

@Fuuzetsu

Fuuzetsu Jul 24, 2015

Contributor

Surely this is just H.mkSEXPIO hvar instead of H.mkSEXPIO $hvar or even H.mkSEXPIO $ hvar?

This comment has been minimized.

@qnikst

qnikst Jul 25, 2015

Member

$hvar is splice here, we have following error otherwise:

No instance for (TH.Lift TH.ExpQ) arising from a use of ‘TH.lift’
        In the expression: TH.lift hvar
        In the expression:
          [| R.withProtected (H.mkSEXPIO hvar)
``
@qnikst

qnikst Jul 25, 2015

Member

$hvar is splice here, we have following error otherwise:

No instance for (TH.Lift TH.ExpQ) arising from a use of ‘TH.lift’
        In the expression: TH.lift hvar
        In the expression:
          [| R.withProtected (H.mkSEXPIO hvar)
``

This comment has been minimized.

@Fuuzetsu

Fuuzetsu Jul 25, 2015

Contributor

Ah, ok, I only thought that it could have been a splice later, didn't know you didn't have to do $(hvar).

@Fuuzetsu

Fuuzetsu Jul 25, 2015

Contributor

Ah, ok, I only thought that it could have been a splice later, didn't know you didn't have to do $(hvar).

@Fuuzetsu Fuuzetsu assigned mboes and unassigned Fuuzetsu Jul 24, 2015

qnikst added some commits Jul 23, 2015

Update tests to new QQ.
Currently one test is disabled because closures generated by QQ
captures wrong environment values.
Add defineVar function.
defineVar function allow to assign a SEXP to a name in the
given environment.
Add withGloballyDefined function.
withGloballyDefined - is a helper function that allow to
  assign value to name in global environment for a
  sequence of commands.
Introduce new QQ.
Instead of analysing and generatic code that will recreate R's SEXP tree,
at runtime, new quasi-quoter analyses R's and adds additional functions
that take care of interporability of between runtime. I.e. converts
and assignes all required values and then passes computation to
plain parse-eval function in R. So QQ converts (simplified):

[r| plot(x_hs, y_hs) |]

into

do define "x_hs" (mkSEXP x_hs)
   define "y_hs" (mkSEXP y_hs)
   x <- tryEval "plot(x_hs, y_hs)"
   undefine "x_hs"
   undefine "y_hs"
   return x

Non simplified version uses withGlobalyDefined helper instead of
define/undefine pair, mkSEXP happen in IO, and special care about
functions is introduced.

This approach seems very reasonable unless we don't want to change
R's SEXP tree, more over we could convert this represendation into
the form

let expression = parsePure "plot(x_hs, y_hs)"
in do ...
      x <- eval expressin
      ...

In this form parsed expressions could be floated out. However it's
not obvious if this transformation is safe to apply in R and require
additional investigation.
@qnikst

This comment has been minimized.

Show comment
Hide comment
@qnikst

qnikst Jul 25, 2015

Member

I've rebased PR on the top of the master banch and addressed minor comments. I'll reply major now,

Member

qnikst commented Jul 25, 2015

I've rebased PR on the top of the master banch and addressed minor comments. I'll reply major now,

@qnikst

This comment has been minimized.

Show comment
Hide comment
@qnikst

qnikst Jul 25, 2015

Member

1st problem with closures.
In order to address that program I need to describe where the problem is, why it worked before, and why we have problem now. The problem is that we need to support closures that work with say "immutable lexical scoping", this means that "value" of free variables is taken from the lexical scope of the function and never changes. The problem that R's scoping rules are not "immutable" and instead of taking a value it takes a reference to the environment that holds a value, and looks up a value on each use. Example:

> x <- 5
> z <- function(y) x + y
> z(0)
[1] 5
> x <- 4
> z(0)
[1] 4

Why everything worked before: we have modified SEXP tree, and instead of reference, we substibuted a value. So there we no lookup at all, and tree that for function(y) x_hs + y was converted to tree for function(y) <value_of_x_in_haskell> + t, so the R's algorithm was not used here.
With new solution we keep a reference, however, the way how we work with variable substitution brings problems here. We generate (simplified) following code:

defineVar("x_hs", mkSEXP x, globalEnv)
parseEval("function(y) x_hs + y")
undefineVar("x_hs")

so after execution of undefineVar lookup of x_hs will fail.

How could we workaround this problem.
I don't have great solution for now, but I propose following approaches:

  1. for each QQ create a new environment rpho (same as R's code rpho <- new.env(), and assign all haskell variables in that env, then accesses will be converted to function(y) rpho$x + y.
    This will solve a problem because we use special environments that are protected by the closure and are "private"
  2. Another approach is to generate unique name for each haskell variable, and do not unassing that ever, so it will be garbage collected once closure is garbage collected.
    I'll think about other approaches though.
Member

qnikst commented Jul 25, 2015

1st problem with closures.
In order to address that program I need to describe where the problem is, why it worked before, and why we have problem now. The problem is that we need to support closures that work with say "immutable lexical scoping", this means that "value" of free variables is taken from the lexical scope of the function and never changes. The problem that R's scoping rules are not "immutable" and instead of taking a value it takes a reference to the environment that holds a value, and looks up a value on each use. Example:

> x <- 5
> z <- function(y) x + y
> z(0)
[1] 5
> x <- 4
> z(0)
[1] 4

Why everything worked before: we have modified SEXP tree, and instead of reference, we substibuted a value. So there we no lookup at all, and tree that for function(y) x_hs + y was converted to tree for function(y) <value_of_x_in_haskell> + t, so the R's algorithm was not used here.
With new solution we keep a reference, however, the way how we work with variable substitution brings problems here. We generate (simplified) following code:

defineVar("x_hs", mkSEXP x, globalEnv)
parseEval("function(y) x_hs + y")
undefineVar("x_hs")

so after execution of undefineVar lookup of x_hs will fail.

How could we workaround this problem.
I don't have great solution for now, but I propose following approaches:

  1. for each QQ create a new environment rpho (same as R's code rpho <- new.env(), and assign all haskell variables in that env, then accesses will be converted to function(y) rpho$x + y.
    This will solve a problem because we use special environments that are protected by the closure and are "private"
  2. Another approach is to generate unique name for each haskell variable, and do not unassing that ever, so it will be garbage collected once closure is garbage collected.
    I'll think about other approaches though.
@qnikst

This comment has been minimized.

Show comment
Hide comment
@qnikst

qnikst Jul 25, 2015

Member

2nd problem is very interesting. On the contrary to the first one I have not considered that issue and never tested that (as you can see we had no test to caught that problem).

I think that the "sanity-test" here will be: observable result of splitting QQ into several one should be equal to result of the expression. In this example:
let x = 2 in [r| abc(x_hs) + x_hs |] should be equal to let x = 2 in do {y <- [r| abc(x_hs)|]; z <- [r| x_hs|]; [r| y_hs + z_hs |]
It's easy to see that current this test will not hold for the current implementation, this means that we have a bug.

The only solution for that bug that I see is to create a separate entry for each use of the variable, i.e.
[r| abc(x_hs) + x_hs |] will be converted to:

let v = mkSEXP
assign("x_hs_1", v, globalEnv)
assign("x_hs_2", v, globalEnv)
parseEval("abs(x_hs_1) + x_hs_2")
unassign("x_hs_1")
unassing("x_hs_2")

Or just document this problem with a big fat warning that user should not do such a tricks.
I need to mention that this solution plays nicely with second solution of the closure problem, but will complicate implementation a bit.

Member

qnikst commented Jul 25, 2015

2nd problem is very interesting. On the contrary to the first one I have not considered that issue and never tested that (as you can see we had no test to caught that problem).

I think that the "sanity-test" here will be: observable result of splitting QQ into several one should be equal to result of the expression. In this example:
let x = 2 in [r| abc(x_hs) + x_hs |] should be equal to let x = 2 in do {y <- [r| abc(x_hs)|]; z <- [r| x_hs|]; [r| y_hs + z_hs |]
It's easy to see that current this test will not hold for the current implementation, this means that we have a bug.

The only solution for that bug that I see is to create a separate entry for each use of the variable, i.e.
[r| abc(x_hs) + x_hs |] will be converted to:

let v = mkSEXP
assign("x_hs_1", v, globalEnv)
assign("x_hs_2", v, globalEnv)
parseEval("abs(x_hs_1) + x_hs_2")
unassign("x_hs_1")
unassing("x_hs_2")

Or just document this problem with a big fat warning that user should not do such a tricks.
I need to mention that this solution plays nicely with second solution of the closure problem, but will complicate implementation a bit.

@Fuuzetsu

This comment has been minimized.

Show comment
Hide comment
@Fuuzetsu

Fuuzetsu Jul 27, 2015

Contributor

let x = 2 in do {y <- [r| abc(x_hs)|]; z <- [r| x_hs|]; [r| y_hs + z_hs |]

Surely meant to be let x = 2 in do {y <- [r| abc(x_hs)|]; z <- [r| x_hs|]; [r| y + z |]

I think that assigning a different name for each place Haskell variable is used is the easiest approach with new qq.

Or just document this problem with a big fat warning that user should not do such a tricks.

Warnings only work when the user is in control of the whole code… What if the user is using a library which does something like this?

Contributor

Fuuzetsu commented Jul 27, 2015

let x = 2 in do {y <- [r| abc(x_hs)|]; z <- [r| x_hs|]; [r| y_hs + z_hs |]

Surely meant to be let x = 2 in do {y <- [r| abc(x_hs)|]; z <- [r| x_hs|]; [r| y + z |]

I think that assigning a different name for each place Haskell variable is used is the easiest approach with new qq.

Or just document this problem with a big fat warning that user should not do such a tricks.

Warnings only work when the user is in control of the whole code… What if the user is using a library which does something like this?

@qnikst

This comment has been minimized.

Show comment
Hide comment
@qnikst

qnikst Jul 27, 2015

Member

Surely meant to be let x = 2 in do {y <- [r| abc(x_hs)|]; z <- [r| x_hs|]; [r| y + z |]

No, it should be [r| y_hs + z_hs|] because nothing is bounded to names y and z in R environment, so [r| y + z |] will fail.

I think that assigning a different name for each place Haskell variable is used is the easiest approach with new qq.

I agree with that, so I think we need to agree on details of implementation here.

Member

qnikst commented Jul 27, 2015

Surely meant to be let x = 2 in do {y <- [r| abc(x_hs)|]; z <- [r| x_hs|]; [r| y + z |]

No, it should be [r| y_hs + z_hs|] because nothing is bounded to names y and z in R environment, so [r| y + z |] will fail.

I think that assigning a different name for each place Haskell variable is used is the easiest approach with new qq.

I agree with that, so I think we need to agree on details of implementation here.

@Fuuzetsu

This comment has been minimized.

Show comment
Hide comment
@Fuuzetsu

Fuuzetsu Jul 27, 2015

Contributor

No, it should be [r| y_hs + z_hs|] because nothing is bounded to names y and z in R environment, so [r| y + z |] will fail.

Ah, I read do { … } as being QQ'd.

Contributor

Fuuzetsu commented Jul 27, 2015

No, it should be [r| y_hs + z_hs|] because nothing is bounded to names y and z in R environment, so [r| y + z |] will fail.

Ah, I read do { … } as being QQ'd.

@mboes

This comment has been minimized.

Show comment
Hide comment
@mboes

mboes Nov 21, 2015

Member

@qnikst could you resolve the merge conflicts on this branch?

Member

mboes commented Nov 21, 2015

@qnikst could you resolve the merge conflicts on this branch?

@mboes mboes referenced this pull request Dec 30, 2015

Merged

Rewrite R quasiquoter. #230

@mboes

This comment has been minimized.

Show comment
Hide comment
@mboes

mboes Dec 30, 2015

Member

Closing. Will reopen of #230 doesn't get merged.

Member

mboes commented Dec 30, 2015

Closing. Will reopen of #230 doesn't get merged.

@mboes mboes closed this Dec 30, 2015

@mboes mboes deleted the new-qq branch May 22, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment