Structures semi-structured text, useful when parsing command line output from unix systems and networking devices.
where you have a nice string of text and say to yourself, “wow I wish it just returned something structured that I could deal with like JSON or some other key/value format”.
Well that’s where pytijo
tries to help. It lets you define the payload you wish came back to you, and with a sprinkle of the right regular expressions it does!
With pip: :
pip install pytijo
From source :
make install
Pass your text and a "structure" (python dictionary) to the parser
modules parse
method.
from pytijo import parser
output = """
eth0 Link encap:Ethernet HWaddr 00:11:22:3a:c4:ac
inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:147142475 errors:0 dropped:293854 overruns:0 frame:0
TX packets:136237118 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:17793317674 (17.7 GB) TX bytes:46525697959 (46.5 GB)
eth1 Link encap:Ethernet HWaddr 00:11:33:4a:c8:ad
inet addr:192.168.1.3 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::225:90ff:fe4a:c8ad/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:51085118 errors:0 dropped:251 overruns:0 frame:0
TX packets:3447162 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4999277179 (4.9 GB) TX bytes:657283496 (657.2 MB)
"""
struct = {
'interfaces': [{
'#id': '(eth\d{1,2})',
'ipv4_address': 'inet addr:(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})',
'mac_address': 'HWaddr\s((?:[a-fA-F0-9]{2}[:|\-]?){6})'
}]
}
parsed = parser.parse(output, struct)
print parsed
This will return the python dictionary
{
'interfaces': [
{
'id': 'eth0',
'ipv4_address': '192.168.1.2',
'mac_address': '00:11:22:3a:c4:ac'
},
{
'id': 'eth1',
'ipv4_address': '192.168.1.3',
'mac_address': '00:11:33:4a:c8:ad'
}
]
}
Which you can then do with as you please, maybe return as JSON as part of a REST service...
With the values either being a dictionary
{}
, a list []
, or a regular expression string [a-z](\d)
with one group (to populate the value).The structure is recursively parsed, populating the dictionary/structure that was provided with values from the input text.
To parse these sections of text, we define a dictionary with key of either
#id
or #start
the difference being #start
key/value is dropped from the resulting output.This
#id
or #start
marks the beginning and end for each “chunk” that you’d like parsed.You can forcefully mark the end of a “chunk” by specifying a
#end
key and regex value.An example is useful here.
E.g. The following structure.
{
'tables': [
{
'#id': '\[TABLE (\d{1,2})\]',
'flows': [
{
'#id': '\[FLOW_ID(\d+)\]',
'info': 'info\s+=\s+(.*)'
}
]
}
]
}
Will create a “chunk/block” from the following output
[TABLE 0] Total entries: 3
[FLOW_ID1]
info = related to table 0 flow 1
[TABLE 1] Total entries: 31
[FLOW_ID1]
info = related to table 1 flow 1
That will be parsed as:
{
'tables': [{
'id': '0',
'flows': [{ 'id': '1', 'info': 'related to table 0 flow 1' }],
}, {
'id': '1',
'flows': [{ 'id': '1', 'info': 'related to table 1 flow 1' }]
}]
}
See under tests/test_parser_api.py
for more usage examples.