This notebook is available on my GitHub, _tfer_.
* Much of this is taken from the official SRL website, which is under the MIT license, so this notebook is too.

In [None]:
# Make sure the srl module can be imported, installing it first if needed
srl_module_present = True
try:
    from srl import SRL
except ImportError:
    srl_module_present = False
    import sys
    # IPython shell escape: run pip with the same interpreter as this kernel
    !{sys.executable} -m pip install srl
    from srl import SRL

if srl_module_present:
    print("The srl module was already present.")
else:
    print("The srl module had to be installed first.")

__===========================================================__

# Simple Regex Language

## There are a few basic syntax rules you should know before diving in

* SRL is case insensitive. Thus, LITERALLY "test" is exactly the same as literally "test". But beware that everything inside a string is case sensitive: LITERALLY "TEST" does NOT equal literally "test".
* The comma separating statements is completely optional and has no effect whatsoever. In fact, it gets removed before interpreting. But since it helps the human eye distinguish between statements, it's allowed.
* Strings are interpreted as literal characters and will have any escapes interpreted. They can be defined using either 'single' or "double" quotation marks (plus """triple""" quotes in Python). A quotation mark inside a string can be escaped with a backslash.
* Parentheses should only be used when building a sub-query, for example in a capture or non-capture group that applies a quantifier to multiple items. [^elsewhere they're used for sub-query and/or capture]
* Comments are currently not supported and may be implemented in the future.
__===========================================================__


## Frame [^ Called 'Character' in the official Doc]

A frame is anything that can match a tile in our model. An SRL statement that describes one frame, or a sequence of frames, takes the following form:

```srl-BNF
<character-set-name> [specification] [quantifier] [anchor]
```

As you can see, the `<character-set-name>` almost always comes first (though an `[anchor]` can come before it). It starts a new statement, and everything that follows defines or refines the frame(s) it introduces. Some `<character-set-name>`s allow a specification. For example, `LETTER` lets you specify a span of allowed letters, e.g. `from a to f`.

Every frame or frame sequence can be quantified. You may want to match exactly four letters from a to f. This would match abcd, but not abcg. You can do that by supplying `exactly 4 times` as a quantifier:

```srl
letter from a to f exactly 4 times
```

Note: this adds 4 frames to our regex, four copies of `letter from a to f`
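You can see what such a quantified frame does even without the srl module by using Python's built-in `re` module. This sketch assumes that `letter from a to f exactly 4 times` becomes the class `[a-f]` followed by the count `{4}`. The pattern is written by hand rather than generated by SRL. It uses `fullmatch` so that the whole test string must match.

```python
import re

# Hand-written equivalent of: letter from a to f exactly 4 times
# (assumption: SRL renders this as a character class plus a {4} count)
four_a_to_f = re.compile(r"[a-f]{4}")

def is_four_a_to_f(text: str) -> bool:
    """True if the whole string is exactly four letters from a to f."""
    return four_a_to_f.fullmatch(text) is not None

print(is_four_a_to_f("abcd"))  # True
print(is_four_a_to_f("abcg"))  # False: g is outside a..f
```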


Okay, let's dive into the different `<character-set-name>`s. Below, we'll go through all the available ones and give some example queries.

---

### LITERALLY
```srl
literally "string"
```

The `literally` <character-set-name> lets you build up a sequence of frames with one statement. It passes a string to the query that will be interpreted exactly as you've written it. Nothing else will match besides your string, and any special character is automatically escaped.

#### Example query:
```srl
literally "sample"
```


In [None]:
srl = SRL('literally "sample"')
# the returned Python regex is expected to wrap the string in a
# non-capturing group, (?:sample), a syntax that began as a Perl 5 extension
srl.pattern
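You can imitate the automatic escaping done by `literally` with `re.escape` from the standard library. This sketch builds the pattern by hand and assumes that SRL wraps the escaped text in a non-capturing group `(?:...)`, matching the comment in the cell above. The helper name `literally` is hypothetical and is not part of the srl module.

```python
import re

def literally(text: str) -> str:
    """Hypothetical helper: escape regex metacharacters, wrap in a non-capturing group."""
    return "(?:" + re.escape(text) + ")"

pattern = literally("1+1=2")
print(pattern)  # (?:1\+1=2)

# The '+' is now an ordinary character, so only the exact string matches
print(re.fullmatch(pattern, "1+1=2") is not None)  # True
print(re.fullmatch(pattern, "11=2") is not None)   # False

# Unescaped, '+' would act as a quantifier ("one or more 1s") and match "11=2"
print(re.fullmatch("1+1=2", "11=2") is not None)   # True
```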

### ONE OF

example format:
```srl-BNF
one of "characters"
```

So `literally` (above) comes in handy if the string is known. But if the string is unknown and may only contain certain characters, `ONE OF` makes much more sense. It matches one of the supplied characters.

#### Example query:
```srl
one of "a%1"
```


In [56]:
srl = SRL('one of "a%1"')
# the returned Python regex is a character class; the % sign is escaped
#  with a backslash, which the repr below shows doubled
srl.pattern

'[a\\%1]'
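You can test the character class from the output above directly with `re`. This sketch uses the pattern exactly as shown, `[a\%1]`. The backslash before `%` is harmless, because Python's `re` treats an escaped non-alphanumeric character as that literal character.

```python
import re

one_of = re.compile(r"[a\%1]")  # the pattern from the cell above

for candidate in ["a", "%", "1", "b", "a%"]:
    # fullmatch: the whole candidate must be a single allowed character
    print(repr(candidate), one_of.fullmatch(candidate) is not None)
```

Only `a`, `%` and `1` are accepted. `"a%"` is rejected because the class matches a single character.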

### LETTER _and_ UPPERCASE LETTER

format:

```srl-BNF
letter [from a to z]
```

This helps you match a letter within a specific span when the exact character expected isn't known. If you know you're expecting a letter, go for it. If you don't supply a span, any letter from a to z will be matched. To narrow it down, define a span using the `from <x> to <y>` syntax.

Please note, that this will only match one letter. If you expect more than one letter, use a quantifier.

>Note: LETTER would be called an alphabetic character in computer science class.

#### Example queries:

```srl
letter from a to f
uppercase letter
```

In [None]:
srl = SRL('letter from a to f')
# below returns a character class using python 'span' notation
srl.pattern

In [None]:
srl = SRL('uppercase letter')
# below returns a character class using python 'span' notation
srl.pattern
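Assuming the two queries above produce the classes `[a-f]` and `[A-Z]`, a quick check with `re` shows that each one matches exactly one character of the given span and case.

```python
import re

lower_a_to_f = re.compile(r"[a-f]")  # letter from a to f
upper_any = re.compile(r"[A-Z]")     # uppercase letter

def matches(pattern: re.Pattern, text: str) -> bool:
    """True if the whole text is matched by the pattern."""
    return pattern.fullmatch(text) is not None

print(matches(lower_a_to_f, "c"), matches(lower_a_to_f, "g"))  # True False
print(matches(upper_any, "Q"), matches(upper_any, "q"))        # True False
# Only one letter at a time: a quantifier is needed for more
print(matches(lower_a_to_f, "ab"))                              # False
```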

`uppercase letter` takes an optional span in the same format:

```srl-BNF
uppercase letter [from A to Z]
```

This behaves just like `letter`, with one difference: `uppercase letter` only matches uppercase letters. If the case-insensitive flag is applied to the query, the two behave exactly the same.

#### Example query:

```srl
uppercase letter from A to F
```

In [None]:
srl = SRL('uppercase letter from A to F')
# below returns a character class using python 'span' notation
srl.pattern
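Python's `re.IGNORECASE` lets you check the claim that `letter` and `uppercase letter` behave the same under the case-insensitive flag. This sketch assumes that SRL's case-insensitive flag maps to that regex flag. The patterns `[a-f]` and `[A-F]` are written by hand.

```python
import re

lower = re.compile(r"[a-f]", re.IGNORECASE)  # letter from a to f, case-insensitive
upper = re.compile(r"[A-F]", re.IGNORECASE)  # uppercase letter from A to F, case-insensitive

# With the flag set, both classes accept exactly the same characters
same = all(
    (lower.fullmatch(ch) is not None) == (upper.fullmatch(ch) is not None)
    for ch in "abcdefghABCDEFGH"
)
print(same)  # True
```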

### ANY CHARACTER

format:

```srl-BNF
any character
```

Like `letter`, `any character` matches a single character. However, it accepts any letter from a to z in either case, plus the digits 0 to 9 and `_`. This lets you check, for example, whether someone entered a valid username.

>In many computer languages, including Python, these are the characters from which you can form valid identifiers.

#### Example query:

```srl
starts with any character once or more, must end
```

>Note: this example shows an `anchor` in front, i.e. the `starts with`.


In [None]:
srl = SRL('starts with any character once or more, must end')
# expected result, written as a Python raw string:
#  r'^\w+$'
srl.pattern
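Here is the username check from the example query, written directly against the expected pattern `^\w+$`. Be aware that in Python 3, `\w` in a `str` pattern matches any Unicode word character, not only a to z, A to Z, 0 to 9 and `_`. Passing `re.ASCII` restricts it to the ASCII set described above.

```python
import re

username = re.compile(r"^\w+$")                  # expected SRL output, as a Python pattern
ascii_username = re.compile(r"^\w+$", re.ASCII)  # \w limited to [a-zA-Z0-9_]

print(bool(username.match("user_01")))     # True
print(bool(username.match("bad name")))    # False: the space is not a word character
print(bool(username.match("café")))        # True: é counts as a Unicode word character
print(bool(ascii_username.match("café")))  # False under re.ASCII
```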

### NO CHARACTER

```srl-BNF
no character
```

The inverse of `any character` is `no character`. This will match everything except a to z, A to Z, 0 to 9 and _.

#### Example query:

```srl
starts with no character once or more, must end
```


In [None]:
regex_in_SRL = """
starts with
    no character once or more
must end
"""

srl = SRL(regex_in_SRL)
# expected result, written as a Python raw string:
#  r'^\W+$'
srl.pattern
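Here is a quick check of the expected pattern `^\W+$`. It accepts strings made up entirely of non-word characters. It rejects any string that contains a letter, a digit or an underscore, and because of `once or more` it also rejects the empty string.

```python
import re

no_word_chars = re.compile(r"^\W+$")  # expected result of the query above

print(bool(no_word_chars.match("!? -")))  # True: only punctuation and spaces
print(bool(no_word_chars.match("a!")))    # False: 'a' is a word character
print(bool(no_word_chars.match("")))      # False: 'once or more' needs at least one
```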

### DIGIT or NUMBER

format:

```srl-BNF
digit [from 0 to 9]
```

When expecting a digit, but not a specific one, this comes in handy. Each `digit` matches only one digit, meaning it will only match a single `digit from 0 to 9`, but you can repeat that by using a quantifier.  Obviously, limiting the allowed values for `digit` isn't a problem either.  So if you're searching for a `number from 5 to 7`, go for it!

>Note: `number` is an alias for `digit`.

#### Example query:

```srl
starts with digit from 5 to 7 exactly 2 times, must end
```


In [None]:
srl = SRL('starts with digit from 5 to 7 exactly 2 times, must end')
# expected result: r'^[5-7]{2}$'
srl.pattern

The output above has this form:


>`anchor`, a `character-class` in span-format, a `curlied-exact-count`, an `anchor`
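This sketch takes `^[5-7]{2}$` as the expected output, which is an assumption based on the form described above. It shows what each part does. The anchors force the whole string to match, the class limits each digit to 5 through 7, and `{2}` requires exactly two of them.

```python
import re

two_digits_5_to_7 = re.compile(r"^[5-7]{2}$")

for text in ["57", "75", "58", "5", "567"]:
    print(text, bool(two_digits_5_to_7.match(text)))
```

Only `57` and `75` are accepted. `58` contains a digit outside the span, and the other two strings have the wrong count.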


### ANYTHING

format:

```srl-BNF
anything
```

Any character whatsoever. Well... except for line breaks. This will match any character except a new line, and only once, of course. So don't forget to apply a quantifier if necessary.

#### Example query:

```srl
anything
```


In [None]:
srl = SRL('anything')
# expected result: '.'
srl.pattern
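Assuming `anything` compiles to `.`, this sketch shows the line-break exception. By default, `.` does not match `\n`. Python's `re.DOTALL` flag changes that.

```python
import re

any_char = re.compile(r".")                    # assumed output of: anything
any_char_dotall = re.compile(r".", re.DOTALL)  # same pattern with DOTALL

print(any_char.fullmatch("x") is not None)          # True
print(any_char.fullmatch("\n") is not None)         # False: line breaks are excluded
print(any_char_dotall.fullmatch("\n") is not None)  # True once DOTALL is set
print(any_char.fullmatch("xy") is not None)         # False: one character only
```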

### NEW LINE

format:

```srl-BNF
new line
```

Matches a new line character. Forgive us for not providing an example query for this one, but feel free to try it out yourself in a code cell.
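As a stand-in for an example query, this sketch assumes that `new line` compiles to `\n`. The pattern then finds each line break in a made-up sample string and can split the string into lines.

```python
import re

text = "first line\nsecond line\nthird line"
newline = re.compile(r"\n")  # assumed output of: new line

print(len(newline.findall(text)))  # 2 line breaks
print(newline.split(text))         # ['first line', 'second line', 'third line']
```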


### WHITESPACE and NO WHITESPACE

```srl-BNF
[no] whitespace
```

This matches any whitespace character, including a space, a tab or a new line. With `no whitespace`, everything except a whitespace character will match.

#### Example query:

```srl
whitespace
```


In [None]:
srl = SRL('whitespace')
# below's result === r'\s'
srl.pattern

In [None]:
r'\s' == '\\s'
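This sketch uses `\s`, the output above, to list which characters count as whitespace. It also shows the inverse `\S`, which is assumed to be what `no whitespace` produces.

```python
import re

sample = "a b\tc\nd"
spaces = re.findall(r"\s", sample)      # whitespace
non_spaces = re.findall(r"\S", sample)  # assumed output of: no whitespace

print(spaces)      # [' ', '\t', '\n']
print(non_spaces)  # ['a', 'b', 'c', 'd']
```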

### TAB

```srl-BNF
tab
```

If you want to match tabs, but no other whitespace characters, this might be for you. It will only match the tab character, and nothing else.

#### Example query:

```srl
tab
```


In [None]:
srl = SRL('tab')
srl.pattern

In [None]:
'\\t' == r'\t'
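In contrast to `\s`, the `\t` pattern shown above picks out only tab characters. This sketch uses a made-up line that contains one tab and two spaces.

```python
import re

sample = "name\tvalue with spaces"
tabs = re.findall(r"\t", sample)             # assumed output of: tab
spaces_and_tabs = re.findall(r"\s", sample)  # any whitespace, for comparison

print(len(tabs))             # 1: only the tab
print(len(spaces_and_tabs))  # 3: the tab plus two spaces
```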

### BACKSLASH

```srl
backslash
```

Matching a backslash with `literally` would work, but it requires escaping, since the backslash is the escape character. Thus, you'd have to use `literally "\\"` to match one backslash. Or you could just write `backslash`.

#### Example query:

```srl
backslash
```


In [None]:
#srl = SRL('backslash')
#srl.pattern

# seems broken

In [None]:
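# should give this: an escaped backslash, the regex that matches one backslash
# (r'\' is not valid Python: a raw string cannot end in a single backslash)
'\\\\' == r'\\'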
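To see why the backslash needs escaping, compare the regex level with the Python string level. The regex that matches one backslash is `\\`, written `r'\\'` in Python, and `re.escape` produces the same thing. This sketch assumes that SRL's `backslash` yields that pattern.

```python
import re

one_backslash = r"\\"  # regex source: two characters, an escaped backslash

print(len(one_backslash))                             # 2
print(re.escape("\\") == one_backslash)               # True
print(re.fullmatch(one_backslash, "\\") is not None)  # True: matches a single backslash
```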

### RAW

format:

```srl
raw "expression"
```

Sometimes, you may want to enforce a specific part of a regular expression. You can do this by using raw. This will append the given string without escaping it.

#### Example query:

```srl
literally "an", whitespace, raw "[a-zA-Z]"
```

In [None]:
srl = SRL('literally "an", whitespace, raw "[a-zA-Z]"')
srl.pattern
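The query above is assumed to produce something like `(?:an)\s[a-zA-Z]`; compare this with the cell's actual output. Because the `raw` part is passed through untouched, `[a-zA-Z]` stays a character class instead of being escaped into literal brackets. This sketch contrasts the two cases, using `re.escape` to roughly imitate escaping.

```python
import re

with_raw = r"(?:an)\s[a-zA-Z]"                 # raw "[a-zA-Z]" left as a class
escaped = r"(?:an)\s" + re.escape("[a-zA-Z]")  # the same text, escaped instead

print(bool(re.search(with_raw, "an apple")))    # True
print(bool(re.search(escaped, "an apple")))     # False
print(bool(re.search(escaped, "an [a-zA-Z]")))  # True: matches the brackets literally
```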

__===========================================================__

## Quantifiers

Quantifiers are probably one of the most important things here. If you've specified a character or a group in your query and now want to repeat it, you don't have to copy and paste all of it. Just say how many repetitions to allow.

Oh, and don't be confused: sometimes these quantifiers may seem not to agree with the example string. That's okay, since we're not forcing the string to start or end at the match. Even if only part of the string matches, the expression is satisfied.
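You can reproduce this partial-match behaviour in Python. `re.search` accepts a match anywhere in the string, much like an unanchored SRL query. Anchoring with `^...$`, or using `re.fullmatch`, requires the whole string to fit. The example uses the pattern `[0-9]{3}`, assumed to correspond to `digit exactly 3 times`.

```python
import re

three_digits = r"[0-9]{3}"
text = "12345"

print(bool(re.search(three_digits, text)))              # True: '123' is found inside
print(bool(re.search("^" + three_digits + "$", text)))  # False: the string has 5 digits
print(bool(re.fullmatch(three_digits, text)))           # False for the same reason
```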

>Remember: you can run any Python cell in this notebook by clicking it and pressing Shift+Enter!

___

### EXACTLY _and_ ONCE _and_ TWICE

__format:__

```srl-BNF
exactly <x> times
```

You're sure. You don't guess, you dictate `exactly 4 times`. Not more, not less. The statement before has to match _exactly x times_.

>Note: since `exactly x times` is a lot to write, two common shortcuts exist. Instead of `exactly 1 time`, you can write `once`, and instead of `exactly 2 times`, you can write `twice`.

__Example query:__

```srl
digit exactly 3 times, letter twice
```
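Assuming `digit exactly 3 times, letter twice` becomes `[0-9]{3}[a-z]{2}`, the counts behave as shown below. Each candidate is checked against the whole string with `fullmatch`.

```python
import re

# Assumed output of: digit exactly 3 times, letter twice
pattern = re.compile(r"[0-9]{3}[a-z]{2}")

for text in ["123ab", "12ab", "1234ab", "123abc"]:
    print(text, pattern.fullmatch(text) is not None)
```

Only `123ab` passes, because both counts are exact.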

### BETWEEN <x\> AND <y\> TIMES

__format:__

```srl-BNF
between <x> and <y> times
```
For a number of repetitions within the span <x\> to <y\>, use this quantifier. It makes sure the previous character exists between x and y times.

>Note: since `between x and y times` is a lot to write, you can drop the `times`: `between 1 and 5`.

__Example query:__

```srl
starts with digit between 3 and 5 times, letter twice
```
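This sketch assumes that the query becomes `^[0-9]{3,5}[a-z]{2}`. The query has `starts with` but no `must end`, so only the start is anchored, and trailing text after the two letters is allowed.

```python
import re

# Assumed output of: starts with digit between 3 and 5 times, letter twice
# There is no 'must end', so only the start of the string is anchored
pattern = re.compile(r"^[0-9]{3,5}[a-z]{2}")

for text in ["123ab", "12345ab", "12ab", "123456ab", "123abXYZ"]:
    print(text, pattern.match(text) is not None)
```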

### OPTIONAL

You can't always be sure that something exists. Sometimes it's okay if something is missing. In that case, the `optional` quantifier comes in handy. It will match the sub-query, if it's there, and ignore it, if it's missing.

__Example query:__

```srl
digit optional, letter twice
```
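Assuming that `optional` maps to the regex `?` quantifier, the example query corresponds to `[0-9]?[a-z]{2}`. The digit may be present once or not at all.

```python
import re

# Assumed output of: digit optional, letter twice
pattern = re.compile(r"[0-9]?[a-z]{2}")

for text in ["ab", "1ab", "12ab"]:
    print(text, pattern.fullmatch(text) is not None)
```

`ab` and `1ab` both match. `12ab` does not, because `optional` allows at most one digit.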

### ONCE/NEVER OR MORE

__format:__

```srl-BNF
once or more
never or more
```

If something must exist at least once (`once or more`), or may be missing entirely (`never or more`), but may repeat any number of times when present, these two quantifiers will do the job.

__Example query:__

```srl
starts with letter once or more, must end
```
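In regex terms, `once or more` and `never or more` are assumed to correspond to `+` and `*`. The empty string shows the difference between them.

```python
import re

once_or_more = re.compile(r"^[a-z]+$")   # starts with letter once or more, must end
never_or_more = re.compile(r"^[a-z]*$")  # the same query with never or more

for text in ["", "a", "abc", "ab1"]:
    print(repr(text), bool(once_or_more.match(text)), bool(never_or_more.match(text)))
```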

### AT LEAST X TIMES

__format:__

```srl-BNF
at least <x> times
```

Something may repeat any number of times, but must exist at least x times.

__Example query:__

```srl
letter at least 10 times
```
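Assuming that `letter at least 10 times` becomes the open-ended count `[a-z]{10,}`, there is a lower bound but no upper one.

```python
import re

# Assumed output of: letter at least 10 times
at_least_ten = re.compile(r"[a-z]{10,}")

print(at_least_ten.fullmatch("abcdefghij") is not None)  # True: exactly 10
print(at_least_ten.fullmatch("abcdefghi") is not None)   # False: only 9
print(at_least_ten.fullmatch("a" * 25) is not None)      # True: no upper limit
```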