Improved readme
tobiasli committed Oct 27, 2019
1 parent fa2602d commit b49a9e7
Showing 2 changed files with 71 additions and 5 deletions.
71 changes: 70 additions & 1 deletion README.md
@@ -12,6 +12,75 @@ pip install fileparse-tobiasli

## Usage

Say you have some text, and you have an idea of the structure of this text.

```python
nested_text = """# This is a title.
This is contents.
And some more.
## This is a subtitle.
with subtitle contents.
# This is another title.
With some contents.
"""
```

You can then define a few simple classes that describe this content structure, together with regex patterns that match each content type. For each content type we also define a `model.ContentFinder`, which lets the parser search for that content type in the text.

```python
# [add usecase]
import re

import fileparse.parsing as model
import fileparse.readers as readers


# Plain text: a line that does not start with '#'.
class Text(model.Content):
    pass

text_match = re.compile(r'^(?P<text>[^#].+)$')
text_finder = model.ContentFinder(start_pattern=text_match,
                                  content_type=Text)


# Subtitle: a line starting with '##'; may contain Text.
class SubTitle(model.Content):
    pass

subtitle_match = re.compile(r'^## ?(?P<subtitle>[^#].+)$')
subtitle_finder = model.ContentFinder(start_pattern=subtitle_match,
                                      content_type=SubTitle,
                                      sub_content_finders=[text_finder])


# Title: a line starting with a single '#'; may contain SubTitle and Text.
class Title(model.Content):
    pass

title_match = re.compile(r'^# ?(?P<title>[^#].+)$')
title_finder = model.ContentFinder(start_pattern=title_match,
                                   content_type=Title,
                                   sub_content_finders=[subtitle_finder, text_finder])
```
Notice two things:
* The regex patterns use named capture groups. Each named capture group is added as a property on the matching content type, i.e. a `SubTitle` instance will receive a `SubTitle.subtitle` property.
* `Text` content can be found within both a `Title` and a `SubTitle`, whereas a `SubTitle` can only be found within a `Title`.
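
To see what the named groups capture before any `fileparse` classes are involved, the patterns can be tried directly with the standard `re` module. This is a minimal sketch using only the standard library; the sample lines are taken from `nested_text` above, and the `captured` helper is a hypothetical name introduced for illustration.

```python
import re

# The same patterns as above, written as raw strings.
text_match = re.compile(r'^(?P<text>[^#].+)$')
subtitle_match = re.compile(r'^## ?(?P<subtitle>[^#].+)$')
title_match = re.compile(r'^# ?(?P<title>[^#].+)$')


def captured(pattern, line):
    """Return the named groups captured from `line`, or None if there is no match."""
    match = pattern.match(line)
    return match.groupdict() if match else None


print(captured(subtitle_match, '## This is a subtitle.'))  # {'subtitle': 'This is a subtitle.'}
print(captured(title_match, '# This is a title.'))         # {'title': 'This is a title.'}
print(captured(title_match, '## This is a subtitle.'))     # None: '##' is not a title
print(captured(text_match, 'This is contents.'))           # {'text': 'This is contents.'}
```

The group name is what later shows up as the property name on the content instance, so it is worth choosing group names that read well as attributes.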

Finally, we define the Parser.

```python
file_finder = model.Parser(finders=[title_finder])
```

The `file_finder` is now ready to parse text content.

For this specific content, we need a text stream that can read from a string. We define it like this:

```python
stream = readers.TextStream(reader=readers.StringReader(string=nested_text))
```

We can now parse the text with the rules defined in `file_finder` and see what comes out of it. To get information out of the resulting file object, use the `file.get_contents_by_type(content_type)` method.

```python
file = file_finder.parse_stream(stream)

print(file.get_contents_by_type(SubTitle)[0].subtitle == 'This is a subtitle.')
print(file.get_contents_by_type(SubTitle)[0].contents[0].text == 'with subtitle contents.')
```
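
As a rough illustration of which pattern claims which line of `nested_text`, the sketch below classifies each line with plain `re`. It does not use or reproduce `fileparse`'s parsing logic; `classify_line`, the `patterns` list, and the order in which patterns are tried are assumptions made for this example.

```python
import re

nested_text = """# This is a title.
This is contents.
And some more.
## This is a subtitle.
with subtitle contents.
# This is another title.
With some contents.
"""

# (label, pattern) pairs, tried in this order for every line.
patterns = [
    ('Title', re.compile(r'^# ?(?P<title>[^#].+)$')),
    ('SubTitle', re.compile(r'^## ?(?P<subtitle>[^#].+)$')),
    ('Text', re.compile(r'^(?P<text>[^#].+)$')),
]


def classify_line(line):
    """Return (label, captured text) for the first matching pattern, else None."""
    for label, pattern in patterns:
        match = pattern.match(line)
        if match:
            return label, match.group(1)
    return None


classified = [classify_line(line) for line in nested_text.splitlines()]
for item in classified:
    print(item)
```

Each line matches exactly one of the three patterns, which is what lets a `Title` own the `Text` and `SubTitle` lines that follow it until the next `Title` line.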

Happy parsing.
5 changes: 1 addition & 4 deletions fileparse/test/test_parsing.py
@@ -12,9 +12,7 @@
¤ this is not
fin"""

-NESTED_TEXT = """
-# This is a title.
+NESTED_TEXT = """# This is a title.
This is contents.
And some more.
@@ -23,7 +21,6 @@
# This is another title.
With some contents.
"""

