Refactoring token parsing in quasi module #1206

danbroooks · 2021-03-15T20:11:32Z

Before submitting your PR, check that you've:

Documented new APIs with Haddock markup
Added @since declarations to the Haddock

After submitting your PR:

Update the Changelog.md file with a link to your PR
Bumped the version number if there isn't an (unreleased) on the Changelog
Check that CI passes (or if it fails, for reasons unrelated to your change, like CI timeouts)

Further refactoring of the Quasi module. Currently the tokenize function collects Space tokens, but spaces are largely immaterial to parsing EntityDefs, apart from determining the amount of indentation stored in the Line type.

The presence of space between tokens is already implicit in the result of tokenizing, and there are no rules about how much spacing comes between tokens. This PR therefore changes the Token type so that it no longer includes spaces at all, maintaining existing functionality, and parses directly into the Line type, which holds all the necessary tokens and the indentation. This lets us remove some redundant functions (empty, removeSpaces) that looked back through the parsed Space tokens after the fact in order to remove them.
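A minimal sketch of the idea, assuming a heavily simplified tokenizer: there is no Space constructor, and indentation is recorded once on the line. The names `parseLine` and `lineTokens` are hypothetical here, and persistent's real tokenizer also handles quoting, parentheses and doc comments.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Simplified, hypothetical sketch; not persistent's real tokenizer.
import Data.Text (Text)
import qualified Data.Text as T

-- There is no Space constructor: spacing between tokens is implicit.
newtype Token = Token Text
    deriving (Show, Eq)

-- Indentation is recorded once, on the line, instead of as Space tokens.
data Line = Line
    { lineIndent :: Int
    , lineTokens :: [Token]
    } deriving (Show, Eq)

-- Parse one raw line straight into a Line; blank lines yield Nothing.
parseLine :: Text -> Maybe Line
parseLine txt
    | null toks = Nothing
    | otherwise = Just (Line indent toks)
  where
    -- Count leading spaces only (tabs are ignored in this sketch).
    indent = T.length (T.takeWhile (== ' ') txt)
    -- T.words splits on any run of whitespace, so how much spacing
    -- separates two tokens has no effect on the result.
    toks = map Token (T.words txt)
```

With this shape there is nothing left for helpers like removeSpaces to clean up afterwards: `parseLine "  Foo   bar"` yields an indent of 2 and just the two tokens.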

This change also lets us distinguish between Token and DocComment values. Currently DocComments are re-parsed (via isComment); that function has been rewritten to use the initial tokenization to detect DocComments.
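A sketch of detecting doc comments once, at tokenization time, so later passes only pattern-match on the token. The Token/DocComment constructors follow the discussion in this PR; the tokenizer body and the helper `docCommentText` are simplified assumptions, not the merged code.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Hypothetical sketch: constructor names follow the PR discussion,
-- but the tokenizer body is heavily simplified.
import Data.Char (isSpace)
import Data.Text (Text)
import qualified Data.Text as T

data Token
    = Token Text
    | DocComment Text -- comment body, without the "-- | " prefix
    deriving (Show, Eq)

-- A doc comment is recognised once, during tokenization,
-- and runs to the end of the line.
tokenize :: Text -> [Token]
tokenize = go . T.stripStart
  where
    go t
        | T.null t = []
        | Just body <- T.stripPrefix "-- | " t = [DocComment body]
        | otherwise =
            let (word, rest) = T.break isSpace t
            in Token word : go (T.stripStart rest)

-- Later passes inspect the token rather than re-parsing the raw text.
docCommentText :: Token -> Maybe Text
docCommentText (DocComment c) = Just c
docCommentText _ = Nothing
```

For example, `tokenize "foo bar -- | this is a foo bar"` gives two ordinary tokens followed by a single DocComment holding the stripped body.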

A list of Text values can still be obtained with the new lineText function, which returns what `tokens` on a Line previously gave you. I think this is needed mainly because some EntityDef fields expect Text values (entityDerives and entityExtra, I think). I stopped at this point because I didn't want to change the EntityDef definition as part of this PR. There is still plenty of room for further refactoring after this change.
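A small sketch of how lineText can recover the old `[Text]` view. `tokenText` is the function quoted later in this thread; the list-based `Line` and the one-line definition of `lineText` are simplifying assumptions.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

data Token = Token Text | DocComment Text
    deriving (Show, Eq)

-- Simplified Line; the PR's real Line type may differ.
data Line = Line
    { lineIndent :: Int
    , tokens :: [Token]
    } deriving (Show, Eq)

-- tokenText as quoted in the review: doc comments get their prefix back.
tokenText :: Token -> Text
tokenText tok =
    case tok of
        Token t -> t
        DocComment t -> "-- | " <> t

-- Hypothetical definition of lineText: the [Text] view of a line that
-- EntityDef fields such as entityExtra still expect.
lineText :: Line -> [Text]
lineText = map tokenText . tokens
```

Because tokenText puts the `-- | ` prefix back, the Text handed to the EntityDef is the same as before the Token/DocComment split.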

parsonsmatt

There's one behavior change here, but otherwise this looks great.

parsonsmatt · 2021-03-22T16:39:17Z

persistent-sqlite/ChangeLog.md

@@ -33,7 +33,6 @@
 * Add `createRawSqlitePoolFromInfo`, `createRawSqlitePoolFromInfo_`,
  `withRawSqlitePoolInfo`, and `withRawSqlitePoolInfo_` to match the existing
  pool functions for regular `SqlBackend`. [#983](https://github.com/yesodweb/persistent/pull/983)
->>>>>>> master


Nice catch 😄

parsonsmatt · 2021-03-22T16:41:09Z

persistent/Database/Persist/Quasi.hs

 -- | Tokenize a string.
 tokenize :: Text -> [Token]
 tokenize t
    | T.null t = []
-    | "-- | " `T.isPrefixOf` t = [DocComment t]
+    | "-- | " `T.isPrefixOf` t = [DocComment (T.drop 5 t)]


Oh, this is actually a breaking change: the original code leaves documentation comments unmodified, e.g. still carrying the `-- |` prefix.

Yeah, this isn't great, but this function adds the `-- |` prefix back onto the Text representation of a DocComment:

persistent/persistent/Database/Persist/Quasi.hs

Lines 560 to 564 in 10bc6c5

    tokenText :: Token -> Text
    tokenText tok =
        case tok of
            Token t -> t
            DocComment t -> "-- | " <> t

i.e. when it is sent back into the EntityDef here:

persistent/persistent/Database/Persist/Quasi.hs

Line 961 in 10bc6c5

in (x, M.insert name (map lineText children) y)

lineText ultimately calls tokenText to convert the line into text for those EntityDef fields; there is a bit more detail about this in the PR description. I don't think it is amazing, but the current state of this isn't great either, and I think this change is a step in the right direction: it makes more use of the types rather than passing around Text values.

I'm happy to update or write tests to assert that nothing has broken. This change shouldn't break anything; my intention is to maintain the existing functionality. I have had to remove or update some tests because of the changed types and removed functions, and I've tried to update them carefully, but unfortunately the diff isn't the cleanest, so that may not be obvious (and there may be some flaws in how I've changed the tests).

Looking through the tests again, I don't think we have anything that asserts what should end up on the EntityDef at the end of this process (AFAICT). Perhaps I could add some tests along with this PR to assert that this behaviour has been maintained?

That's a good idea :)

OK, so I've added some tests for parse, trying to cover most variations of comments. If there is a particular entity definition you want me to check (or a use case I've missed), I can add it to these tests. You can now see, though, that the `-- |` is retained in the values passed back to the EntityDef in these tests:

persistent/persistent/test/main.hs

Lines 283 to 286 in ddb7b0a

    it "should parse the `entityAttrs` field" $ do
        entityAttrs bicycle `shouldBe` ["-- | this is a bike"]
        entityAttrs car `shouldBe` []
        entityAttrs vehicle `shouldBe` []

persistent/persistent/test/main.hs

Lines 351 to 354 in ddb7b0a

    it "should parse the `entityEntities` field" $ do
        entityExtra bicycle `shouldBe` Map.singleton "ExtraBike" [["foo", "bar", "-- | this is a foo bar"], ["baz"]]
        entityExtra car `shouldBe` mempty
        entityExtra vehicle `shouldBe` mempty

This field, however, does strip out the `-- |`:

persistent/persistent/test/main.hs

Lines 361 to 364 in ddb7b0a

    it "should parse the `entityComments` field" $ do
        entityComments bicycle `shouldBe` Nothing
        entityComments car `shouldBe` Just "This is a Car\n"
        entityComments vehicle `shouldBe` Nothing

But this is also the case on master: this function loads the comment into LinesWithComments:

persistent/persistent/Database/Persist/Quasi.hs

Lines 698 to 701 in 03e794f

    case isComment (NEL.head (tokens line)) of
        Just comment
            | lineIndent line == lowestIndent ->
                consComment comment lwc : lwcs

where isComment strips out the `-- |` prefix:

persistent/persistent/Database/Persist/Quasi.hs

Lines 661 to 663 in 03e794f

    isComment :: Text -> Maybe Text
    isComment xs =
        T.stripPrefix "-- | " xs

As a way of testing the tests, I cherry-picked them onto master and ran them to confirm that they pass there too, i.e. that they assert the same behaviour master currently has. I hope this is enough to show that this change has no detrimental impact; if there is anything I've missed that would be good to check, I can add more checks here 👍
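The same equivalence can be stated per line as a small check. This is a hedged sketch, assuming a simplified `tokenizeComment` that stands in for the branch's tokenizer; `isComment` and `tokenText` are as quoted above, and `agreesWithMaster` is a hypothetical name.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

data Token = Token Text | DocComment Text
    deriving (Show, Eq)

-- master: the comment body is recovered by re-parsing the raw text.
isComment :: Text -> Maybe Text
isComment xs = T.stripPrefix "-- | " xs

-- Branch (simplified, hypothetical): the body is captured once, at tokenization.
tokenizeComment :: Text -> Maybe Token
tokenizeComment t = DocComment <$> T.stripPrefix "-- | " t

tokenText :: Token -> Text
tokenText (Token t) = t
tokenText (DocComment t) = "-- | " <> t

-- True when the new path agrees with master on a single line:
--   * the stripped body (what entityComments receives) matches isComment;
--   * tokenText restores the original text (what entityAttrs/entityExtra receive).
agreesWithMaster :: Text -> Bool
agreesWithMaster line =
    case tokenizeComment line of
        Nothing -> isComment line == Nothing
        Just tok -> commentBody tok == isComment line && tokenText tok == line
  where
    commentBody (DocComment c) = Just c
    commentBody (Token _) = Nothing
```

Stripping the prefix during tokenization and restoring it in tokenText is a round trip, which is why both the entityComments test and the entityAttrs/entityExtra tests pass on master and on this branch.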

Oh man, I never even thought to test that the comments weren't being picked up by attrs and extras! That feels like a silly mistake on my part.

Great investigation, thanks! I'm happy with this.

danbroooks · 2021-03-28T21:29:08Z

persistent/Database/Persist/Quasi.hs

+    lns <- NEL.nonEmpty (T.lines txt)
+    NEL.nonEmpty $ mapMaybe parseLine (NEL.toList lns)
+
+-- TODO: refactor to return (Line' NonEmpty)


I noticed I'd left this in without any real explanation, sorry! This is something I wanted to change, but I was worried it might inflate this initial PR, so I think I'll address it in a follow-up. Effectively, we can parse a Line' NonEmpty upfront here, which means we can greatly simplify Line', as it no longer needs to be polymorphic in the type used for the 'collection' of tokens. I'll do this as a separate change after this one.
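A sketch of where this TODO could lead, assuming tokens are plain Text for brevity: once lines are parsed straight into NonEmpty, the Line type can be monomorphic. The `Line`, `parseLine`, `parseLines` and `firstToken` definitions here are hypothetical and only mirror the shape of the quoted diff (`nonEmpty` over the lines, then `mapMaybe parseLine`).

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Hypothetical sketch of the follow-up described in the TODO.
import Data.List.NonEmpty (NonEmpty, nonEmpty)
import qualified Data.List.NonEmpty as NEL
import Data.Maybe (mapMaybe)
import Data.Text (Text)
import qualified Data.Text as T

-- Monomorphic: the token collection is always NonEmpty.
data Line = Line
    { lineIndent :: Int
    , tokens :: NonEmpty Text
    } deriving (Show, Eq)

-- A blank line has no tokens, so it never becomes a Line at all.
parseLine :: Text -> Maybe Line
parseLine txt = Line indent <$> nonEmpty (T.words txt)
  where
    indent = T.length (T.takeWhile (== ' ') txt)

-- Same shape as the quoted diff: nonEmpty on the lines, then mapMaybe parseLine.
parseLines :: Text -> Maybe (NonEmpty Line)
parseLines txt = do
    lns <- nonEmpty (T.lines txt)
    nonEmpty (mapMaybe parseLine (NEL.toList lns))

-- With NonEmpty guaranteed by the type, taking the first token is total.
firstToken :: Line -> Text
firstToken = NEL.head . tokens
```

The benefit is that calls like `NEL.head (tokens line)` need no separate emptiness check, because the type rules out a Line without tokens.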

If you don't want TODOs lying around I can remove this 👍

Could you leave this PR comment in the note? That'd be great for whoever sees this TODO later on. Thanks!

danbroooks · 2021-03-29T17:50:57Z

This is ready for another look now :)

On the comments issue: should the comments that currently appear in entityExtra and entityAttrs actually appear in entityComments instead?

I don't really use this functionality in persistent, so I'm unsure exactly how it is meant to work. If it needs addressing, I can open an issue to follow up on this PR 👍 (or if you open one with some detail about the changes required, I'd be happy to work on it)

parsonsmatt · 2021-03-29T19:25:07Z

Yeah, the comments should only be in entityComments. It's not a huge deal, but it's a bit embarrassing that I missed it when I implemented this 😅
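One way a follow-up might keep doc comments out of entityAttrs and entityExtra is to split each line's tokens by constructor. This is a minimal sketch, assuming the Token/DocComment split from this PR; `splitDocComments` is a hypothetical helper, not part of persistent.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Either (partitionEithers)
import Data.Text (Text)

data Token = Token Text | DocComment Text
    deriving (Show, Eq)

-- Hypothetical: separate a line's ordinary tokens (destined for attrs/extras)
-- from its doc-comment text (destined for entityComments), preserving order.
splitDocComments :: [Token] -> ([Text], [Text])
splitDocComments = partitionEithers . map classify
  where
    classify (Token t) = Left t
    classify (DocComment c) = Right c
```

Since the constructor already records whether a token is a comment, this split needs no string matching on the `-- |` prefix.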

danbroooks added 6 commits March 15, 2021 18:25

Extract isCapitalizedText function

dfdcad4

Extract setFieldComments as top level function

607d57e

Refactoring quasi quoter

977b9c7

Update changelog

7ac5b6b

Correct test description

d95dde1

Remove git conflict marker left in changelog

10bc6c5

parsonsmatt requested changes Mar 22, 2021


danbroooks added 3 commits March 23, 2021 12:53

Merge branch 'master' into refactoring-quasi-module

28bdd24

Add a test for parse

ddb7b0a

Refactor to use pattern guard

114bee0

danbroooks commented Mar 28, 2021


danbroooks changed the title ~~Refactoring quasi module~~ Refactoring token parsing in quasi module Mar 28, 2021

danbroooks added 3 commits March 28, 2021 22:45

Rename PR

93901ac

More detail on TODO comment

ed798a1

Merge branch 'master' into refactoring-quasi-module

51c90ec

parsonsmatt approved these changes Mar 29, 2021


parsonsmatt merged commit 4560b80 into yesodweb:master Mar 29, 2021

danbroooks mentioned this pull request Apr 20, 2021

Simplify Line type in Quasi module, always use NonEmpty #1231

Merged


danbroooks deleted the refactoring-quasi-module branch April 22, 2021 10:38

The `tokenize` clause rewritten with a pattern guard:

-    | "-- | " `T.isPrefixOf` t = [DocComment (T.drop 5 t)]
+    | Just txt <- T.stripPrefix "-- | " t = [DocComment txt]
