Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring token parsing in quasi module #1206

Merged
merged 12 commits into from Mar 29, 2021

Conversation

danbroooks
Copy link
Contributor

@danbroooks danbroooks commented Mar 15, 2021

Before submitting your PR, check that you've:

After submitting your PR:

  • Update the Changelog.md file with a link to your PR
  • Bumped the version number if there isn't an (unreleased) on the Changelog
  • Check that CI passes (or if it fails, for reasons unrelated to your change, like CI timeouts)

Further refactoring of the Quasi module. Currently the tokenize function is collecting Space tokens, however spaces are largely immaterial to the process of parsing EntityDefs, besides parsing the amount of indentation into the Line type.

The existence of space between tokens is implicit in the result of the token parsing, and there are no rules around the amount of spacing that comes between tokens, so this PR changes the Token type so it does not include spaces at all, maintaining existing functionality, and parsing directly into the Line type which contains all the necessary tokens and indentation. This enables us to get rid of some redundant functions (empty, removeSpaces) which retrospectively look back through the parsed space tokens in order to remove them.

This change also allows us to differentiate between Tokens and DocComment values, where currently we are re-parsing DocComments (via isComment), this function has been re-written to make use of the initial tokenization to detect DocComments.

Getting a list of Text values has been recovered with the function lineText which effectively gets what tokens from Line previously got you. I think this is mainly because EntityDef's are after Text values here (entityDerives and entityExtra I think this applies to), I stopped at this point as I didn't want to change the EntityDef definition as part of this change. There is still plenty of room for more refactoring following this change though.

Copy link
Collaborator

@parsonsmatt parsonsmatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one behavior change here, but otherwise this looks great.

@@ -33,7 +33,6 @@
* Add `createRawSqlitePoolFromInfo`, `createRawSqlitePoolFromInfo_`,
`withRawSqlitePoolInfo`, and `withRawSqlitePoolInfo_` to match the existing
pool functions for regular `SqlBackend`. [#983](https://github.com/yesodweb/persistent/pull/983)
>>>>>>> master
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch 😄

-- | Tokenize a string.
tokenize :: Text -> [Token]
tokenize t
| T.null t = []
| "-- | " `T.isPrefixOf` t = [DocComment t]
| "-- | " `T.isPrefixOf` t = [DocComment (T.drop 5 t)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of the magic number - is that T.length "-- | " == 5 ?

If so, may be better to change this to a pattern guard:

Suggested change
| "-- | " `T.isPrefixOf` t = [DocComment (T.drop 5 t)]
| Just txt <- T.stripPrefix "-- | " t = [DocComment t]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. this is a breaking change, actually - the original code has the documentation comments unmodified, eg still carrying the -- | stuff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this isn't so great but this function here returns the -- | to the Text value representation of DocComment:

tokenText :: Token -> Text
tokenText tok =
case tok of
Token t -> t
DocComment t -> "-- | " <> t

IE when it gets sent back into the EntityDef here

in (x, M.insert name (map lineText children) y)

lineText ultimately calls tokenText to convert the line into text for those EntityDef fields - there is a bit more detail on the PR description about this. I don't think it is amazing, but the state of this currently isn't too great either, and I think this change is a step in the right direction and certainly is making use of the types more rather than passing around Text values.

I'm happy to update/write any tests to assert nothing has broken, as this should not break anything, my intention is to maintain the existing functionality here. I have obviously had to remove/update some tests due to changing types, and removing some functions, and I've tried to update them but unfortunately it isn't the cleanest diff so that may not be so obvious (and there may be some flaws in how I've changed the tests possibly?).

Looking through the tests again, I guess we have nothing that asserts what should appear on the EntityDef at the end of this process (AFAICT), perhaps I could add some along with this PR to assert that that behaviour has been maintained?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I've added some tests for parse, I've tried to cover off most variations of comments in there, but if there is a particular entity definition you want me to check (or some use case I've missed out or something), then I can always add that to these tests. You can see now though that it is retaining the -- | in the values passed back to the entity def in these tests:

it "should parse the `entityAttrs` field" $ do
entityAttrs bicycle `shouldBe` ["-- | this is a bike"]
entityAttrs car `shouldBe` []
entityAttrs vehicle `shouldBe` []

it "should parse the `entityEntities` field" $ do
entityExtra bicycle `shouldBe` Map.singleton "ExtraBike" [["foo", "bar", "-- | this is a foo bar"], ["baz"]]
entityExtra car `shouldBe` mempty
entityExtra vehicle `shouldBe` mempty

This field here, however, does strip out the -- | :

it "should parse the `entityComments` field" $ do
entityComments bicycle `shouldBe` Nothing
entityComments car `shouldBe` Just "This is a Car\n"
entityComments vehicle `shouldBe` Nothing

But this is the case in master, this function here is loading the comment into LinesWithComments:

case isComment (NEL.head (tokens line)) of
Just comment
| lineIndent line == lowestIndent ->
consComment comment lwc : lwcs

Where isComment is stripping out the -- | portion:

isComment :: Text -> Maybe Text
isComment xs =
T.stripPrefix "-- | " xs

As a way of testing the tests, I cherry picked these onto master and ran them to assert that they assert the same functionality as what is in master currently. Hope this is sufficient to cover off that this change is not having any detrimental impact, if there is something I have missed from the above that would be good to check then I can always add more checks here 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh man i never even thought to test that the comments weren't being picked up by attrs and extras! That feels like a silly mistake on my part.

Great investigation, thanks! I'm happy with this.

lns <- NEL.nonEmpty (T.lines txt)
NEL.nonEmpty $ mapMaybe parseLine (NEL.toList lns)

-- TODO: refactor to return (Line' NonEmpty)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed I'd left this in, but there is no real explanation here... sorry! This is something I wanted to change, but was worried that it may inflate this initial PR, I will address this following this PR I think. Effectively we can parse a Line' NonEmpty upfront here, which means we can greatly simplify the implementation of Line' as it no longer needs to be polymorphic as the the type used for the 'collection' of tokens. I will do this as another change though following this one.

If you don't want TODOs lying around I can remove this 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you leave this PR comment in the note? That'd be great for whoever sees this TODO later on. THanks!

@danbroooks danbroooks changed the title Refactoring quasi module Refactoring token parsing in quasi module Mar 28, 2021
@danbroooks
Copy link
Contributor Author

danbroooks commented Mar 29, 2021

This is ready for another look now :)

With the comments thing, is it that the comments that appear in entityExtras and entityAttrs should actually appear in entityComments?

I don't really use this functionality in persistent so unsure how exactly it is meant to work. If that is something that needs addressing I can open an issue for this to follow this PR 👍 (or if you can open one with some detail about the changes required I'd be happy to work on this)

@parsonsmatt parsonsmatt merged commit 4560b80 into yesodweb:master Mar 29, 2021
@parsonsmatt
Copy link
Collaborator

Yeah, the comments should only be in entityComments. It's not a huge deal but it's a bit embarrassing that I missed it when I imlpemented this 😅

@danbroooks danbroooks deleted the refactoring-quasi-module branch April 22, 2021 10:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants