Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mdast-util-split: start of work #315

Closed
wants to merge 10 commits into from

Conversation

artragis
Copy link
Member

@artragis artragis commented Feb 21, 2019

Context

this PR aims to add a new mdast-util project to prepare some work on zmarkdown that will try to import a flat markdown text to a fully hierarchized tutorial/article.

For that I need an util to traverse all headings and extract "part" from what follow those headings.

What is done

  • the basic documentation
  • the canonical case (presented in doc & in tests)

What remains before merging

  • test of all configuration (for example I'm not sure my first commit properly handles the conclusion parameter)
  • test edge cases : what if a sub header is inside a quote? At the beginning I was using unist-util-find but as it finds everything in depth I changed the logic to a custom function that I haven't fully tested yet
  • test edge cases : what if there is no header? don't forget to mention mdast-util-normalize
  • implement the "sub header parsing" create a more splitted document (example with
    part > subpart > chapter)

Why doing such a PR?

  • I'd like to have some comment about the functionality
  • Is it a good way to perform what I want?
  • Is my documentation good enough?
  • Have you ideas of edge cases?
  • Have you comment on my code style?
  • Hound please help me improve code quality.

packages/mdast-util-split/src/index.js Outdated Show resolved Hide resolved
packages/mdast-util-split/src/index.js Outdated Show resolved Hide resolved
packages/mdast-util-split/src/index.js Outdated Show resolved Hide resolved
packages/mdast-util-split/src/index.js Outdated Show resolved Hide resolved
packages/mdast-util-split/__tests__/index.js Outdated Show resolved Hide resolved
@vhf
Copy link
Contributor

vhf commented Feb 21, 2019

Nice! I'll do a few rounds of reviews, the first one will only be obvious things, later on I might PR against your branch to fix trickier things. [edit]cf. artragis#7 [/edit]

I'm not sure about the naming of this package. mdast-util-split sounds very generic while this package seems to solve a specific thing.

Copy link
Contributor

@vhf vhf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests for other depths and test a file with no heading.

# mdast-util-split [![Build Status][build-badge]][build-status] [![Coverage Status][coverage-badge]][coverage-status]


**rebber** is a LaTeX stringifier for [remark][]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**rebber** is a LaTeX stringifier for [remark][]

module.exports = split
import visit from 'unist-util-visit'

function split (tree, {splitDepth = 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

splitDepth is not used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet implemented

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it's implemented, but I'm not happy at all with my namings, I always mix "trees" and "children" and it's quite complex when splitDepth is greater than 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's confusing I agree. I think this.subTrees actually refers to new root trees, not subtrees?

Try drawing a simple schema of the output for a simple md doc at various splitDepth, it should make it easier to see what is a root tree, what is a subtree (i.e. a subpart of mdast root tree)?

Copy link
Contributor

@vhf vhf Feb 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the introduction

# chapter 1

a paragraph

> a quote to *ensure this is parsed*

## section 1.1

content 1.1

## subsection 1.1.1

content 1.1.1

## section 1.2

content 1.2

# chapter 2

## subsection 2.1.1

content 2.1.1

# the conclusion

paragraph

For instance what is expected here with splitDepth 1 and splitDepth 2?

My guess for splitDepth 1:

{
  introduction: {type: root, children: [/* from doc root to `chapter 1` (not included) */]},
  trees: [
      {type: root, children: [/* from `chapter 1` to `chapter 2` (not included) */]},
      {type: root, children: [/* from `chapter 2` to `conclusion` (not included) */]},
  ],
  conclusion: {type: root, children: [/* from `chapter 2` (not included) to document end */]},
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this one, it is what I want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, not, the conclusion must be from "conclusion" (not included)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is what you mean:

{
  introduction: {type: root, children: [/* from doc root to `chapter 1` (not included) */]},
  trees: [
      {type: root, children: [/* from `chapter 1` to `chapter 2` (not included) */]},
      {type: root, children: [/* from `chapter 2` to `conclusion` (not included) */]},
  ],
  conclusion: {type: root, children: [/* from `conclusion` (not included) to document end */]},
}

What about splitDepth 2 for this document?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what I mean. For splitDepth 2 here it what i would like :

{
  introduction: {type: root, children: [/* from doc root to `chapter 1` (not included) */]},
  title: { type: root, children: [/*the heading node*/],
  trees: [
      {type: root, title: {type: 'root', children:[/*chapter 1 heading*/]},
        trees: [/** trees for section1.1=>section 1.2],
        conclusion: {type: root, children:[/* content of section 1.2, but heading discarded*/]},
      {type: root, children: [/* from `chapter 2` to `conclusion` (not included) */]},
  ],
  conclusion: {type: root, children: [/* from `conclusion` (not included) to document end */]},
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does chapter 1 have a conclusion but no introduction? Could you please explain how all of this works in the README?

packages/mdast-util-split/src/index.js Outdated Show resolved Hide resolved
packages/mdast-util-split/src/index.js Outdated Show resolved Hide resolved
})
}
extractIntroductions () {
const self = this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this trick

Copy link
Member Author

@artragis artragis Feb 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can I do it then? if I use this in the lambda, this is undifined
.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this will be the object, not undefined.

packages/mdast-util-split/src/index.js Outdated Show resolved Hide resolved
A boolean driving the way we want to process introduction :
- if `true`, all MDAST elements between the split header and the first sub header will be extracted and put in the `introduction` property of the result
property of the returned object
- if `false` they will just be added as MDAST elements of the `children` property.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this line, could you please clarify?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the property set to false, we do not extract the introduction from the tree. In though I had changed like for the conclusion in if false, the part tree will not be changed

property of the returned object
- if `false` they will just be added as MDAST elements of the `children` property.

#### `options.conclusionAsProperty`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#### `options.conclusionAsProperty`
#### `options.conclusionAsProperty = false`


A boolean driving the way we want to process the last sub header if there are more than one.
- if `true`, the last sub header is removed from tree and all following elements are stored in the `conclusion` property of the result
- if `false` (default), nothing is changed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- if `false` (default), nothing is changed.
- if `false`, nothing is changed.

this.depth = depth
this._introduction = newRootTree()
}
set introduction (nodes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a getter/setter here makes it harder to understand IMO but I might be failing to see the benefits it brings.

_introduction is already initialized in the constructor, and later on (for instance l.120) when we assign to introduction it's hard to tell whether it enter the setter or not. I would refactor to not use these.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I wanted to use a "simple" get/set at the beginning, failed and instead of reworking the whole code kept the get/set. if you want to remove this, you clearly can, it's an antipattern now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, go ahead then

}
extractConclusions () {
this.subTrees.forEach(splittedPart => {
const firstHeading = find(splittedPart.children, {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

splitPart, not splitted

splittedPart.conclusion = newRootTree(rootContent)
})
}
extractIntroductions () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which cases do we have several introductions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when all chapters have introductions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still extract it when only one chapter has one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we extract it for every chapter that has one introduction.

@artragis
Copy link
Member Author

artragis commented Mar 4, 2019

Thanks to your first review and PR I could have a better view on my work and I was wondering if I was not trying to create a too complex thing.

An idea that showed up was to say :

just extract one level with two parametters :

  • which depth is considered as "root" depth for titles, default to 1
  • do we extract the last paragraph as a conclusion?

the result would just be :

{
    introduction: {type: 'root', children: [everything before first heading]},
    chapters: [all trees],
    conclusion: /* nothing if extractConclusion is false, else the last tree of chapters*/
}

thanks to that we can use all the existing utils on subtrees if we want a deeper splitting or a specific treatment.

@vhf
Copy link
Contributor

vhf commented Mar 4, 2019

Nice! I agree, the initial code was quite complicated and the purpose unclear, it's good we now see things more clearly.

{
    introduction: {type: 'root', children: [everything before first heading]},
    chapters: [all trees],
    conclusion: /* nothing if extractConclusion is false, else the last tree of chapters*/
}

If I understand correctly, when extractConclusion is true conclusion contains chapters.pop(). If that's correct I wouldn't bother, it's not worth it. If anyone wants the last chapter it's easy enough to get it, no need for an extra config option.

Other than that I like the new idea. Mapping the splitter to its result (with depth+1) is good enough when a deeper split is required.

@artragis
Copy link
Member Author

artragis commented Mar 4, 2019

If I understand correctly, when extractConclusion is true conclusion contains chapters.pop(). If that's correct I wouldn't bother, it's not worth it. If anyone wants the last chapter it's easy enough to get it, no need for an extra config option.

in fact I would say newTree(chapters.pop()) but you are probably right, it's perhaps too much.

Mapping the splitter to its result (with depth+1) is good enough when a deeper split is required.

perhaps you were rephrasing my message but I do not get it here. What do you mean by "mapping the splitter to its result"?

@vhf
Copy link
Contributor

vhf commented Mar 5, 2019

const splitAtDepth = (depth) => (tree) => split(tree, { depth })

const level1 = splitAtDepth(1)(doc)
const level2 = level1.chapters.map(splitAtDepth(2))

@vhf
Copy link
Contributor

vhf commented Apr 5, 2019

I'm not sure about the naming of this package. mdast-util-split sounds very generic while this package seems to solve a specific thing.

What about mdast-util-split-by-heading?

@artragis
Copy link
Member Author

artragis commented Apr 6, 2019

About module name : I do not know what is better and having a look at unified documentation I saw you're part of the kind-of-board of unified/remark ecosystem so I think you can put the name you want without negociating it will be the best thing for remark ecosystem.

@vhf vhf mentioned this pull request Apr 14, 2019
@vhf vhf closed this Apr 14, 2019
@artragis artragis deleted the add_split_utils branch December 19, 2020 11:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants