
Trying to use unsupported "(?-i)" in a Python regex will result in an error #562

Closed
ghost opened this issue Aug 27, 2020 · 22 comments

@ghost

ghost commented Aug 27, 2020

In previous versions I was able to use double quotes in my grep filters. Now, when using the case-insensitive flag (?i), my jobs fail if the search terms are in single or double quotes.

When searching for two terms separated by the pipe character, how can I specify one term as case insensitive and the other as case sensitive? Like this:
grep: (?i)"National Age Group Record"|(?-i)"NAG"

When I don't quote NAG or specify it as case sensitive, I end up with matches I don't want, e.g. "Snags".

@SHU-red

SHU-red commented Aug 27, 2020

Are you sure you don't need to wrap your expression in single quotes ('')?

Something like:
grep: '(?i)"National Age Group Record"|(?-i)"NAG"'

@ghost
Author

ghost commented Aug 27, 2020

@SHU-red,

Thanks for your input. I gave that a shot and received errors.

Just in case my job has incorrect syntax, here it is. I am wondering whether putting html2text: re before grep is causing the issue.

name: (24)SwimSwam News "NAG"
url: https://swimswam.com/news
filter:
  - xpath:
      path: '(//div[contains(@class,"item")]//h3)/a|(//div[contains(@class,"item")]//h4)/a'
  - html2text: re
  - grep: '(?i)"National Age Group"|(?-i)"NAG"'
---

I receive this error:

Traceback (most recent call last):
  File "/Users/john/Library/Python/3.8/bin/urlwatch", line 8, in <module>
    sys.exit(main())
  File "/Users/john/Library/Python/3.8/lib/python/site-packages/urlwatch/cli.py", line 112, in main
    urlwatch_command.run()
  File "/Users/john/Library/Python/3.8/lib/python/site-packages/urlwatch/command.py", line 402, in run
    self.handle_actions()
  File "/Users/john/Library/Python/3.8/lib/python/site-packages/urlwatch/command.py", line 204, in handle_actions
    sys.exit(self.test_filter(self.urlwatch_config.test_filter))
  File "/Users/john/Library/Python/3.8/lib/python/site-packages/urlwatch/command.py", line 137, in test_filter
    raise job_state.exception
  File "/Users/john/Library/Python/3.8/lib/python/site-packages/urlwatch/handler.py", line 113, in process
    data = FilterBase.process(filter_kind, subfilter, self, data)
  File "/Users/john/Library/Python/3.8/lib/python/site-packages/urlwatch/filters.py", line 146, in process
    return filtercls(state.job, state).filter(data, subfilter)
  File "/Users/john/Library/Python/3.8/lib/python/site-packages/urlwatch/filters.py", line 363, in filter
    return '\n'.join(line for line in data.splitlines()
  File "/Users/john/Library/Python/3.8/lib/python/site-packages/urlwatch/filters.py", line 364, in <genexpr>
    if re.search(subfilter['re'], line) is not None)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/re.py", line 201, in search
    return _compile(pattern, flags).search(string)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/sre_parse.py", line 805, in _parse
    flags = _parse_flags(source, state, char)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/sre_parse.py", line 913, in _parse_flags
    raise source.error(msg, len(char))
re.error: missing : at position 29
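The traceback can be reproduced outside urlwatch; the pattern itself is what `re` rejects, independent of any YAML quoting (a minimal repro, not from the thread):

```python
import re

# The single quotes in the YAML are consumed by the YAML parser, so the
# string Python's re module sees is exactly this; it rejects the bare
# (?-i) group with the same error shown in the traceback.
try:
    re.compile('(?i)"National Age Group"|(?-i)"NAG"')
except re.error as exc:
    print(exc)  # missing : at position 29
```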

Examples of failing jobs:
name: (02)urlwatch webpage
url: "https://thp.io/2008/urlwatch/"
filter: 
  - html2text: re
  - grep: (?i)current version 
  - strip 
---
name: (02)urlwatch webpage
url: "https://thp.io/2008/urlwatch/"
filter: 
  - html2text: re
  - grep: (?i)'current version' 
  - strip
---
name: (02)urlwatch webpage
url: "https://thp.io/2008/urlwatch/"
filter: 
  - html2text: re
  - grep: (?i)"current version" 
  - strip 
---
name: (02)urlwatch webpage
url: "https://thp.io/2008/urlwatch/"
filter: 
  - html2text: re
  - grep: (?i)'current.*version'
  - strip 
---
name: (02)urlwatch webpage
url: "https://thp.io/2008/urlwatch/"
filter: 
  - html2text: re
  - grep: (?i)"current.*version"
  - strip 
---

Examples of successful jobs:
name: (02)urlwatch webpage
url: "https://thp.io/2008/urlwatch/"
filter: 
  - html2text: re
  - grep: (?i)current.*version
  - strip # Strip leading and trailing whitespace 
---
name: (02)urlwatch webpage
url: "https://thp.io/2008/urlwatch/"
filter: 
  - html2text: re
  - grep: "Current.*version"
  - strip # Strip leading and trailing whitespace 
---


@thp
Owner

thp commented Aug 27, 2020

For me, this parses successfully. Are you sure you are using Python 3 and all the latest packages and dependencies?

Also, this line:

name: (24)SwimSwam News "NAG"

Maybe try:

name: '(24)SwimSwam News "NAG"'

@thp
Owner

thp commented Aug 27, 2020

Are you sure there are no weird invisible bytes in your file? Maybe attach it to this thread so we can have a look. Looking at the file in a hex editor, or an editor that shows all hidden bytes, could also be useful.

@SHU-red

SHU-red commented Aug 27, 2020

As soon as I use the "grep" line, I get the same error.

Maybe using shellpipe instead helps?

Something like this?
- shellpipe: "grep -i -o '(?i)National Age Group|(?-i)NAG'"

@thp
Owner

thp commented Sep 6, 2020

@jprokos any news on that front?

@ghost
Author

ghost commented Sep 6, 2020

I am still looking into it.

I began by:
Reinstalling Python using Homebrew.
Reinstalling urlwatch and its dependencies and packages.

This job failed:

name: (23)SwimSwam News [NAG]
url: https://swimswam.com/news
filter:
  - xpath:
      path: '(//div[contains(@class,"item")]//h3)/a|(//div[contains(@class,"item")]//h4)/a'
  - html2text: re
  - grep: '(?i)"National Age Group"|(?-i)"NAG"'
---
Sun Sep 06 03:30:06
iMac191:~ john$ uwtf 23
Traceback (most recent call last):
  File "/usr/local/bin/urlwatch", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/urlwatch/cli.py", line 112, in main
    urlwatch_command.run()
  File "/usr/local/lib/python3.8/site-packages/urlwatch/command.py", line 402, in run
    self.handle_actions()
  File "/usr/local/lib/python3.8/site-packages/urlwatch/command.py", line 204, in handle_actions
    sys.exit(self.test_filter(self.urlwatch_config.test_filter))
  File "/usr/local/lib/python3.8/site-packages/urlwatch/command.py", line 137, in test_filter
    raise job_state.exception
  File "/usr/local/lib/python3.8/site-packages/urlwatch/handler.py", line 113, in process
    data = FilterBase.process(filter_kind, subfilter, self, data)
  File "/usr/local/lib/python3.8/site-packages/urlwatch/filters.py", line 146, in process
    return filtercls(state.job, state).filter(data, subfilter)
  File "/usr/local/lib/python3.8/site-packages/urlwatch/filters.py", line 363, in filter
    return '\n'.join(line for line in data.splitlines()
  File "/usr/local/lib/python3.8/site-packages/urlwatch/filters.py", line 364, in <genexpr>
    if re.search(subfilter['re'], line) is not None)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/re.py", line 201, in search
    return _compile(pattern, flags).search(string)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/sre_parse.py", line 805, in _parse
    flags = _parse_flags(source, state, char)
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/sre_parse.py", line 913, in _parse_flags
    raise source.error(msg, len(char))
re.error: missing : at position 29

These jobs are working:

name: (02)urlwatch webpage
url: "https://thp.io/2008/urlwatch/"
filter: 
  - html2text: re
  - grep: (?i)"current\sversion"
  - strip
---
name: (21)Swimming World Time Standar/NAG/USA Swimming
url: https://www.swimmingworldmagazine.com/news/category/usa/
filter:
  - xpath: '(//div[contains(@class,"post-detail")]//h2)/a'
  - html2text: re
  - grep: (?i)USA.*Swimming|Time.*Standard|National.*Age.*Group|"NAG"
---
name: (22)SwimSwam News [USA Swimming, Time Standard, National Age Group]
url: https://swimswam.com/news
filter:
  - xpath:
      path: '(//div[contains(@class,"item")]//h3)/a|(//div[contains(@class,"item")]//h4)/a'
  - html2text: re
  - grep: (?i)"USA\sSwimming"|"time\sStandard"|"National\sAge\sGroup"
---
name: (23)SwimSwam News [NAG]
url: https://swimswam.com/news
filter:
  - xpath:
      path: '(//div[contains(@class,"item")]//h3)/a|(//div[contains(@class,"item")]//h4)/a'
  - html2text: re
  - grep: "NAG"
---

@thp
Owner

thp commented Sep 7, 2020

You cannot use (?-i)
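For what it's worth (a sketch using a feature the thread doesn't mention): Python 3.6+ does support scoped flag groups such as (?i:...) and (?-i:...), so per-alternative case sensitivity is possible without the unsupported bare (?-i):

```python
import re

# (?i:...) applies case-insensitivity to one alternative only
# (supported since Python 3.6); the NAG alternative stays case sensitive.
pattern = re.compile(r'(?i:national age group)|NAG')

print(bool(pattern.search('National Age Group Record')))  # True
print(bool(pattern.search('a new NAG record')))           # True
print(bool(pattern.search('Snags along the way')))        # False
```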

@ghost
Author

ghost commented Sep 7, 2020

EDIT:

You cannot use (?-i)

Thanks, I removed that from my jobs.

I am still seeing inconsistent behavior when combining grep, (?i), and double quotes. I can't really understand what is going on. There is no error; the filter just doesn't return data.

I've checked my job file and don't see anything odd. I installed a hex editor, but I'm not familiar with how to use it to spot issues in this file.

The odd thing is that it seems to be job dependent. The format works for other jobs; see job 22 below.

No longer finding anything:

name: (02)urlwatch webpage
url: "https://thp.io/2008/urlwatch/"
filter: 
  - html2text: re
  - grep: (?i)"current\sversion"
  - strip
---

returns nothing

Changing to the following works...

name: (02)urlwatch webpage
url: "https://thp.io/2008/urlwatch/"
filter: 
  - html2text: re
  - grep: (?i)current\sversion
  - strip
---
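One plausible explanation for the quoted variant returning nothing (an observation about regex semantics, not something stated in the thread): the double quotes inside the pattern are matched as literal characters, so (?i)"current\sversion" only matches lines that actually contain quote characters around those words:

```python
import re

line = 'Current version: urlwatch-2.21.tar.gz (2020-07-31)'

# The quotes in the pattern are literal; this line has none, so no match.
print(re.search(r'(?i)"current\sversion"', line))  # None

# Without the quotes, the same flag and pattern match fine.
print(bool(re.search(r'(?i)current\sversion', line)))  # True
```

A job where the quoted pattern "works" would then be one where the fetched text happens to contain literal quote characters.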
Mon Sep 07 04:00:39
iMac191:~ john$ urlwatch --test-filter 2
Current version: urlwatch-2.21.tar.gz (2020-07-31)

The - grep: (?i)"" format works for this job:

name: (22)SwimSwam News [USA Swimming, Time Standard, National Age Group]
url: https://swimswam.com/news
filter:
  - xpath:
      path: '(//div[contains(@class,"item")]//h3)/a|(//div[contains(@class,"item")]//h4)/a'
  - html2text: re
  - grep: (?i)"USA\sSwimming"|"time\sStandard"|"National\sAge\sGroup"
---

returns

Mon Sep 07 04:00:01
iMac191:~ john$ urlwatch --test-filter 22
USA Swimming Announces Speedo Swim Again Virtual Series
USA Swimming National Team Roster For 2020-21 Released With 115 Total Athletes

@ghost
Author

ghost commented Sep 16, 2020

I looked at my Python install and can't see anything wrong. I still have inconsistencies. I am on macOS, which doesn't use the GNU command-line tools; would this matter?

What should the format be for the grep statement: double quotes, single quotes, and pipes?

Here is another example where the pipe seems to be messing up my job.

This will not return the second item in the xpath:

name: (34)Apple Releases RSS
url: "https://developer.apple.com/news/releases/rss/releases.rss"
filter:
  - xpath: '//item/title/text()|//item/pubDate/text()'
  - html2text: re
---

This does return data for "pubDate" and the grep argument works as entered.

name: (35)Apple Releases RSS
url: "https://developer.apple.com/news/releases/rss/releases.rss"
filter:
  - css:
      selector: 'item > title, item > pubDate'
      method: xml
  - html2text: re
  - grep: (?i)ios|macos|xcode
---

@thp
Owner

thp commented Sep 23, 2020

Try the following in an interactive Python shell:

>>> import yaml
>>> import pprint
>>> pprint.pprint(yaml.load(open('/path/to/your/urls.yaml'), Loader=yaml.SafeLoader))

Where /path/to/your/urls.yaml is of course the full path to the urls.yaml that you are trying to use.

The grep filter has nothing to do with the grep command-line utility, other than that it mimics its behavior a bit (read the code in lib/urlwatch/filters.py):

        return '\n'.join(line for line in data.splitlines()                                                              
                         if re.search(subfilter['re'], line) is not None)                                                

Nowhere do the docs say that it uses the grep utility. If you are using a recent version of urlwatch, you can use the shellpipe filter to open any command (including grep) and filter the content with that command (stdin/stdout). In that case, if you are on macOS, grep will usually be BSD grep and in most Linux distributions, it will be GNU grep.
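The filter excerpt above can be exercised standalone (a minimal sketch of the same line-by-line re.search behavior; the function name here is chosen for illustration):

```python
import re

def grep_filter(data: str, pattern: str) -> str:
    """Keep only the lines where re.search finds the pattern,
    mirroring the urlwatch filter excerpt quoted above."""
    return '\n'.join(line for line in data.splitlines()
                     if re.search(pattern, line) is not None)

data = 'USA Swimming news\nSnags and delays\nNAG record falls'
print(grep_filter(data, r'(?i:usa swimming)|NAG'))
# USA Swimming news
# NAG record falls
```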

Again, the issue you have been seeing is with (?-i), which isn't a valid Python regular expression feature.

The other things that you were mentioning (which should be a different issue, this isn't a support forum thread) might be a limitation of Python's xpath support: https://stackoverflow.com/a/22560964/1047040

@thp thp changed the title grep issue when using case insensitive flag (?i) Trying to use unsupported "(?-i)" in a Python regex will result in an error Sep 23, 2020
@thp thp added the wontfix label Sep 24, 2020
@ghost
Author

ghost commented Oct 15, 2020

Sorry for the long delay. This is incorrect:

Again, the issue you have been seeing is with (?-i), which isn't a valid Python regular expression feature.

That is no longer in my jobs and I still have the same issues. I wanted to correct that misconception before I continue troubleshooting. Thanks.

@ghost
Author

ghost commented Oct 16, 2020

Try the following in an interactive Python shell:

>>> import yaml
>>> import pprint
>>> pprint.pprint(yaml.load(open('/path/to/your/urls.yaml'), Loader=yaml.SafeLoader))

Where /path/to/your/urls.yaml is of course the full path to the urls.yaml that you are trying to use.

@thp Thanks for continuing to troubleshoot this. I did exactly as you asked and ran into an error.

I do have multiple "jobs" separated by ____ in my urls.yaml file. First job starts on line 7 and ends with ___ on line 13.

>>> import yaml
>>> import pprint
>>> pprint.pprint(yaml.load(open('/Users/john/Desktop/urls.yaml'), Loader=yaml.SafeLoader))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "/usr/local/lib/python3.8/site-packages/yaml/constructor.py", line 49, in get_single_data
    node = self.get_single_node()
  File "/usr/local/lib/python3.8/site-packages/yaml/composer.py", line 41, in get_single_node
    raise ComposerError("expected a single document in the stream",
yaml.composer.ComposerError: expected a single document in the stream
  in "/Users/john/Desktop/urls.yaml", line 7, column 1
but found another document
  in "/Users/john/Desktop/urls.yaml", line 13, column 1

Hope that is helpful.

@ghost
Author

ghost commented Oct 16, 2020

As soon as I use the "grep" line, I get the same error.

Maybe using shellpipe instead helps?

Something like this?
- shellpipe: "grep -i -o '(?i)National Age Group|(?-i)NAG'"

That seems to work, except that when - shellpipe: "grep 'NAG'" doesn't find anything, the whole job fails because of the command's exit status: subprocess.CalledProcessError: Command 'grep 'NAG'' returned non-zero exit status 1.

Do all - shellpipe commands require ; > /dev/null 2>&1 at the end to swallow the exit status so the job doesn't fail?

@thp
Owner

thp commented Oct 17, 2020

I do have multiple "jobs" separated by ____ in my urls.yaml file. First job starts on line 7 and ends with ___ on line 13.

>>> import yaml
>>> import pprint
>>> pprint.pprint(yaml.load(open('/Users/john/Desktop/urls.yaml'), Loader=yaml.SafeLoader))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "/usr/local/lib/python3.8/site-packages/yaml/constructor.py", line 49, in get_single_data
    node = self.get_single_node()
  File "/usr/local/lib/python3.8/site-packages/yaml/composer.py", line 41, in get_single_node
    raise ComposerError("expected a single document in the stream",
yaml.composer.ComposerError: expected a single document in the stream
  in "/Users/john/Desktop/urls.yaml", line 7, column 1
but found another document
  in "/Users/john/Desktop/urls.yaml", line 13, column 1

The jobs should be separated with ---, not ___.

Anyway, try this instead:

>>> import yaml
>>> import pprint
>>> pprint.pprint(yaml.load_all(open('/path/to/your/urls.yaml'), Loader=yaml.SafeLoader))

@thp
Owner

thp commented Oct 17, 2020

Do all -shellpipe commands require ; > /dev/null 2>&1 at the end to catch the exit status so the job doesn't fail?

Yes.
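An alternative idiom (a suggestion not made in the thread, relying on standard shell behavior rather than anything urlwatch-specific): appending || true keeps grep's matches on stdout but forces a zero exit status when nothing matches:

```yaml
# Hypothetical job fragment; "|| true" resets the exit status so a
# no-match grep (exit status 1) does not fail the urlwatch job.
filter:
  - shellpipe: "grep 'NAG' || true"
```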

@thp
Owner

thp commented Oct 17, 2020

I'm closing this now, as the original issue (using (?-i), which isn't a thing) has been resolved.

@thp thp closed this as completed Oct 17, 2020
@ghost
Author

ghost commented Oct 17, 2020

Anyway, try this instead:

>>> import yaml
>>> import pprint
>>> pprint.pprint(yaml.load_all(open('/path/to/your/urls.yaml'), Loader=yaml.SafeLoader))
$ python
Python 3.8.6 (default, Oct  8 2020, 14:07:53) 
[Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import yaml
>>> import pprint
>>> pprint.pprint(yaml.load_all(open("/Users/john/Desktop/urls.yaml"), Loader=yaml.SafeLoader))
<generator object load_all at 0x10d5aa9e0>
>>> 

@thp
Owner

thp commented Oct 19, 2020

Ok, try this then:

>>> import yaml
>>> import pprint
>>> pprint.pprint(list(yaml.load_all(open('/path/to/your/urls.yaml'), Loader=yaml.SafeLoader)))

@ghost
Author

ghost commented Oct 19, 2020

[{'filter': [{'xpath': {'path': '(//div[contains(@class,"release-timeline-tags")]//h4)[1]/a'}},
             {'html2text': 're'}],
  'name': '01_urlwatch update released',
  'url': 'https://github.com/thp/urlwatch/releases'},
 {'filter': [{'html2text': 're'}, {'grep': '(?i)current\\sversion'}, 'strip'],
  'name': '02_urlwatch webpage',
  'url': 'https://thp.io/2008/urlwatch/'},
 {'filter': [{'html2text': 're'}, {'grep': '(?i)current\\sversion'}, 'strip'],
  'name': '03_RansomWhere? Objective-See',
  'url': 'https://objective-see.com/products/ransomwhere.html'},
 {'filter': [{'html2text': 're'}, {'grep': '(?i)current\\sversion'}, 'strip'],
  'name': '04_BlockBLock Objective-See',
  'url': 'https://objective-see.com/products/blockblock.html'},
 {'filter': [{'html2text': 're'},
             {'grep': '(?i)current\\sversion'},
             {'re.sub': '(?m)^[ \\t]*'},
             'strip'],
  'name': '05_ProcessMonitor/FileMonitor Objective-See',
  'url': 'https://objective-see.com/products/utilities.html'},
 {'filter': [{'xpath': {'path': '//div[contains(@class,"download-link")]//a'}},
             {'html2text': 're'},
             {'grep': '(?i)Download.*VNC.*Server'},
             {'re.sub': '(?m)^[ \\t]*'}],
  'name': '06_RealVNC Server Version',
  'url': 'https://www.realvnc.com/en/connect/download/vnc/macos/'},
 {'filter': [{'xpath': {'path': '/html/body/div[4]/div/main/div[2]/div/div[2]/div[2]/div/div/div[1]/h4/a'}},
             {'html2text': 're'},
             'strip'],
  'name': '07_SwiftDefaultApps Preference Pane',
  'url': 'https://github.com/Lord-Kamina/SwiftDefaultApps/tags'},
 {'filter': [{'xpath': '(//div[contains(@class,"field-item even")]//li)[6]'},
             {'html2text': 're'}],
  'name': '08_Concept2 Firmware Update',
  'url': 'https://www.concept2.com/service/monitors/pm5/firmware/timeline'},
 {'filter': [{'xpath': {'path': '//tbody/tr/td'}},
             {'html2text': 're'},
             {'re.sub': '(?m)^[ \\t]*'}],
  'name': '09_ISI Championship Meets',
  'url': 'https://www.teamunify.com/SubTabGeneric.jsp?team=ilslsc&_stabid_=161229'},
 {'filter': [{'xpath': {'path': '//*[@id="colMain_contentMain"]/table/tr[position()<11]'}},
             {'html2text': 're'},
             {'re.sub': '(?m)^[ \\t]*'}],
  'name': '10_ISI News',
  'url': 'https://www.teamunify.com/News.jsp?team=ilslsc'},
 {'filter': [{'element-by-id': 'colMain_content'}],
  'name': '11_ISI Records',
  'url': 'https://www.teamunify.com/SubTabGeneric.jsp?team=ilslsc&_stabid_=203319'},
 {'filter': [{'element-by-id': 'colMain_content'}],
  'name': '12_ISI Top 10 Times',
  'url': 'https://www.teamunify.com/SubTabGeneric.jsp?team=ilslsc&_stabid_=203320'},
 {'filter': [{'element-by-id': 'colMain_content'}],
  'name': '13_ISI Time Standards',
  'url': 'https://www.teamunify.com/SubTabGeneric.jsp?team=ilslsc&_stabid_=202420'},
 {'filter': [{'element-by-id': 'colMain_content'}],
  'name': '14_ISI Dual in the Pool',
  'url': 'https://www.teamunify.com/SubTabGeneric.jsp?team=ilslsc&_stabid_=195361'},
 {'filter': [{'element-by-id': 'colMain_content'}],
  'name': '15_ISI Camps',
  'url': 'https://www.teamunify.com/SubTabGeneric.jsp?team=ilslsc&_stabid_=160922'},
 {'filter': [{'element-by-id': 'colMain_content'}],
  'name': '16_ISI Olympic Trial Qualifiers',
  'url': 'https://www.teamunify.com/SubTabGeneric.jsp?team=ilslsc&_stabid_=203332'},
 {'filter': [{'element-by-id': 'events_content'}],
  'name': '17_ISI Zone Events',
  'url': 'https://www.teamunify.com/Home.jsp?_tabid_=0&team=ilzone'},
 {'filter': [{'xpath': '(//div[contains(@class,"pg_news")]//td)/a'},
             {'html2text': 're'}],
  'name': '18_ISI Zone News',
  'url': 'https://www.teamunify.com/News.jsp?team=ilzone'},
 {'filter': [{'xpath': '(//table[contains(@id,"AutoNumber5")]//tbody//tr[2])/td[1]|(//table[contains(@id,"AutoNumber5")]//tbody//tr[3])/td[1]'},
             {'html2text': 're'},
             {'re.sub': '(?m)^[ \\t]*'}],
  'name': '19_Central Zone Swimming',
  'url': 'https://www.teamunify.com/TabGeneric.jsp?_tabid_=47721&team=cenzone'},
 {'filter': [{'xpath': '//div[contains(@class,"Times_NationalAgeGroupRecords '
                       '")]'},
             {'html2text': 're'},
             {'re.sub': '(?m)^[ \\t]*'}],
  'name': '20_USA Swimming 11-12 SCY NAG Records',
  'url': 'https://www.usaswimming.org/times/popular-resources/national-age-group-records/scy/11-12'},
 {'filter': [{'xpath': '(//div[contains(@class,"post-detail")]//h2)/a'},
             {'html2text': 're'},
             {'grep': '(?i)USA.*Swimming|Time.*Standard|National.*Age.*Group|"NAG"'}],
  'name': '21_Swimming World Time Standard/NAG/USA Swimming',
  'url': 'https://www.swimmingworldmagazine.com/news/category/usa/'},
 {'filter': [{'xpath': {'exclude': 'a',
                        'method': 'xml',
                        'path': '//item/description/text()'}},
             {'grep': '(?i)"usa\\sswimming"|"time.*standard"|"national\\sage\\sgroup"'},
             {'html2text': 're'}],
  'name': '22_SwimSwam News [USA Swimming, Time Standard, National Age Group]',
  'url': 'https://swimswam.com/feed/#first'},
 {'filter': [{'xpath': {'exclude': 'a',
                        'method': 'xml',
                        'path': '//item/description/text()'}},
             {'shellpipe': "grep 'NAG'; > /dev/null 2>&1"},
             {'html2text': 're'}],
  'name': '23_SwimSwam News [NAG]',
  'url': 'https://swimswam.com/feed/#second'},
 {'filter': [{'element-by-id': 'ranking'},
             {'html2text': 're'},
             {'re.sub': '(?m)^[ \\t]*'}],
  'name': '24_SwimSwam Girls 11-12 NAG Records SCY',
  'url': 'https://swimswam.com/records/girls-11-12-us-national-age-group-records-scy/'},
 {'filter': [{'element-by-id': 'ranking'},
             {'html2text': 're'},
             {'re.sub': '(?m)^[ \\t]*'}],
  'name': '25_SwimSwam Girls 11-12 NAG Records LCM',
  'url': 'https://swimswam.com/records/girls-11-12-us-national-age-group-records-lcm/'},
 {'filter': [{'xpath': '(//div[contains(@class,"entry")]//h3)/a'},
             {'html2text': 're'}],
  'name': '26_NCSA Upcoming Events',
  'url': 'https://www.teamunify.com/EventsCurrent.jsp?team=recndncsa'},
 {'filter': [{'xpath': '//td[contains(@class,"headlines")]//a'},
             {'html2text': 're'}],
  'name': '27_NCSA Upcoming Events',
  'url': 'https://www.teamunify.com/News.jsp?team=recndncsa'},
 {'filter': [{'xpath': '(//article[contains(@class,"article")]//h1)//a|//div[contains(@class,"byline")]'},
             {'html2text': 're'},
             {'grep': '(?i)Apple.*Seeds'}],
  'name': '28_MacRumors Apple Seeds',
  'url': 'https://www.macrumors.com'},
 {'filter': [{'xpath': {'path': '(//div[contains(@class,"navbuttons")]//p)/a'}},
             {'html2text': 're'}],
  'name': '29_Apple 10.15.x "Catalina" Updates',
  'url': 'https://support.apple.com/en-us/HT210642'},
 {'filter': [{'xpath': '(//div[contains(@id,"sections")]//ul)[1]/li[position()<5]'},
             {'html2text': 're'}],
  'name': '30_Apple Latest Versions',
  'url': 'https://support.apple.com/en-us/HT201222'},
 {'filter': [{'xpath': '(//div[contains(@class,"content-rounded")]//center)/div[1]'},
             {'html2text': 're'}],
  'name': '31_RARBG Registration',
  'url': 'https://rarbg.to/login'},
 {'filter': [{'grep': '".*.app.tar.xz"|".*\\.pkg"'}, {'html2text': 're'}],
  'name': '32_filebot Beta',
  'url': 'https://get.filebot.net/filebot/BETA/'},
 {'filter': [{'grep': '".*.app.tar.xz"|".*\\.pkg"'}, {'html2text': 're'}],
  'name': '33_filebot_4.9.2',
  'url': 'https://get.filebot.net/filebot/FileBot_4.9.2/'},
 {'filter': [{'css': {'method': 'xml',
                      'selector': 'item > title, item > pubDate'}},
             {'html2text': 're'},
             {'grep': '(?i)ios|macos|xcode'}],
  'name': '34_Apple Releases RSS',
  'url': 'https://developer.apple.com/news/releases/rss/releases.rss'},
 None]

@thp
Owner

thp commented Oct 19, 2020

Cool! Maybe remove the last "---" in the file, so the None element goes away. Other than that, it looks like it's properly parsed.

@ghost
Author

ghost commented Oct 19, 2020

Thanks for looking over the file. Appreciate you.
Is "---" a divider between jobs, a header, or a footer? For some reason I remember that all jobs had to end with "---". I did remove it.
