New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring token parsing in quasi module #1206
Refactoring token parsing in quasi module #1206
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one behavior change here, but otherwise this looks great.
@@ -33,7 +33,6 @@ | |||
* Add `createRawSqlitePoolFromInfo`, `createRawSqlitePoolFromInfo_`, | |||
`withRawSqlitePoolInfo`, and `withRawSqlitePoolInfo_` to match the existing | |||
pool functions for regular `SqlBackend`. [#983](https://github.com/yesodweb/persistent/pull/983) | |||
>>>>>>> master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch 😄
persistent/Database/Persist/Quasi.hs
Outdated
-- | Tokenize a string. | ||
tokenize :: Text -> [Token] | ||
tokenize t | ||
| T.null t = [] | ||
| "-- | " `T.isPrefixOf` t = [DocComment t] | ||
| "-- | " `T.isPrefixOf` t = [DocComment (T.drop 5 t)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of the magic number - is that T.length "-- | " == 5
?
If so, may be better to change this to a pattern guard:
| "-- | " `T.isPrefixOf` t = [DocComment (T.drop 5 t)] | |
| Just txt <- T.stripPrefix "-- | " t = [DocComment t] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. this is a breaking change, actually - the original code has the documentation comments unmodified, eg still carrying the -- |
stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this isn't so great but this function here returns the -- |
to the Text
value representation of DocComment
:
persistent/persistent/Database/Persist/Quasi.hs
Lines 560 to 564 in 10bc6c5
tokenText :: Token -> Text | |
tokenText tok = | |
case tok of | |
Token t -> t | |
DocComment t -> "-- | " <> t |
IE when it gets sent back into the EntityDef
here
in (x, M.insert name (map lineText children) y) |
lineText
ultimately calls tokenText
to convert the line into text for those EntityDef
fields - there is a bit more detail on the PR description about this. I don't think it is amazing, but the state of this currently isn't too great either, and I think this change is a step in the right direction and certainly is making use of the types more rather than passing around Text
values.
I'm happy to update/write any tests to assert nothing has broken, as this should not break anything, my intention is to maintain the existing functionality here. I have obviously had to remove/update some tests due to changing types, and removing some functions, and I've tried to update them but unfortunately it isn't the cleanest diff so that may not be so obvious (and there may be some flaws in how I've changed the tests possibly?).
Looking through the tests again, I guess we have nothing that asserts what should appear on the EntityDef
at the end of this process (AFAICT), perhaps I could add some along with this PR to assert that that behaviour has been maintained?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so I've added some tests for parse
, I've tried to cover off most variations of comments in there, but if there is a particular entity definition you want me to check (or some use case I've missed out or something), then I can always add that to these tests. You can see now though that it is retaining the -- |
in the values passed back to the entity def in these tests:
persistent/persistent/test/main.hs
Lines 283 to 286 in ddb7b0a
it "should parse the `entityAttrs` field" $ do | |
entityAttrs bicycle `shouldBe` ["-- | this is a bike"] | |
entityAttrs car `shouldBe` [] | |
entityAttrs vehicle `shouldBe` [] |
persistent/persistent/test/main.hs
Lines 351 to 354 in ddb7b0a
it "should parse the `entityEntities` field" $ do | |
entityExtra bicycle `shouldBe` Map.singleton "ExtraBike" [["foo", "bar", "-- | this is a foo bar"], ["baz"]] | |
entityExtra car `shouldBe` mempty | |
entityExtra vehicle `shouldBe` mempty |
This field here, however, does strip out the -- |
:
persistent/persistent/test/main.hs
Lines 361 to 364 in ddb7b0a
it "should parse the `entityComments` field" $ do | |
entityComments bicycle `shouldBe` Nothing | |
entityComments car `shouldBe` Just "This is a Car\n" | |
entityComments vehicle `shouldBe` Nothing |
But this is the case in master
, this function here is loading the comment into LinesWithComments
:
persistent/persistent/Database/Persist/Quasi.hs
Lines 698 to 701 in 03e794f
case isComment (NEL.head (tokens line)) of | |
Just comment | |
| lineIndent line == lowestIndent -> | |
consComment comment lwc : lwcs |
Where isComment
is stripping out the -- |
portion:
persistent/persistent/Database/Persist/Quasi.hs
Lines 661 to 663 in 03e794f
isComment :: Text -> Maybe Text | |
isComment xs = | |
T.stripPrefix "-- | " xs |
As a way of testing the tests, I cherry picked these onto master
and ran them to assert that they assert the same functionality as what is in master
currently. Hope this is sufficient to cover off that this change is not having any detrimental impact, if there is something I have missed from the above that would be good to check then I can always add more checks here 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh man i never even thought to test that the comments weren't being picked up by attrs and extras! That feels like a silly mistake on my part.
Great investigation, thanks! I'm happy with this.
persistent/Database/Persist/Quasi.hs
Outdated
lns <- NEL.nonEmpty (T.lines txt) | ||
NEL.nonEmpty $ mapMaybe parseLine (NEL.toList lns) | ||
|
||
-- TODO: refactor to return (Line' NonEmpty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed I'd left this in, but there is no real explanation here... sorry! This is something I wanted to change, but was worried that it may inflate this initial PR, I will address this following this PR I think. Effectively we can parse a Line' NonEmpty
upfront here, which means we can greatly simplify the implementation of Line'
as it no longer needs to be polymorphic as the the type used for the 'collection' of tokens. I will do this as another change though following this one.
If you don't want TODOs
lying around I can remove this 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you leave this PR comment in the note? That'd be great for whoever sees this TODO later on. THanks!
This is ready for another look now :) With the comments thing, is it that the comments that appear in I don't really use this functionality in persistent so unsure how exactly it is meant to work. If that is something that needs addressing I can open an issue for this to follow this PR 👍 (or if you can open one with some detail about the changes required I'd be happy to work on this) |
Yeah, the comments should only be in |
Before submitting your PR, check that you've:
@since
declarations to the HaddockAfter submitting your PR:
(unreleased)
on the ChangelogFurther refactoring of the
Quasi
module. Currently thetokenize
function is collectingSpace
tokens, however spaces are largely immaterial to the process of parsingEntityDef
s, besides parsing the amount of indentation into theLine
type.The existence of space between tokens is implicit in the result of the token parsing, and there are no rules around the amount of spacing that comes between tokens, so this PR changes the
Token
type so it does not include spaces at all, maintaining existing functionality, and parsing directly into theLine
type which contains all the necessary tokens and indentation. This enables us to get rid of some redundant functions (empty
,removeSpaces
) which retrospectively look back through the parsed space tokens in order to remove them.This change also allows us to differentiate between
Token
s andDocComment
values, where currently we are re-parsingDocComments
(viaisComment
), this function has been re-written to make use of the initial tokenization to detectDocComments
.Getting a list of
Text
values has been recovered with the functionlineText
which effectively gets whattokens
fromLine
previously got you. I think this is mainly becauseEntityDef
's are afterText
values here (entityDerives
andentityExtra
I think this applies to), I stopped at this point as I didn't want to change theEntityDef
definition as part of this change. There is still plenty of room for more refactoring following this change though.