Regression in 0.0.31? #78

Closed
crd477 opened this issue Jul 27, 2022 · 13 comments

@crd477

crd477 commented Jul 27, 2022

Hello, I've observed what seems to be a regression in the latest release.

With version 0.0.30, the URLs in my test file (below) are correctly flagged as problematic, but 0.0.31 doesn't appear to work at all.

crd@raspberrypi:~ $ python3 -m venv uc
crd@raspberrypi:~ $ . uc/bin/activate
(uc) crd@raspberrypi:~ $ cd uc
(uc) crd@raspberrypi:~/uc $ mkdir foo
(uc) crd@raspberrypi:~/uc $ cat > foo/foo.md
 *   Principal Lead Software Engineer -- https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997
 *   Senior Software Engineer -- https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999
 *    Associate Software Engineer -- https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005
(uc) crd@raspberrypi:~/uc $ pip install urlchecker==0.0.30
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting urlchecker==0.0.30
  Using cached https://www.piwheels.org/simple/urlchecker/urlchecker-0.0.30-py3-none-any.whl (26 kB)
Collecting fake-useragent
  Using cached https://www.piwheels.org/simple/fake-useragent/fake_useragent-0.1.11-py3-none-any.whl (13 kB)
Collecting requests>=2.18.4
  Using cached https://www.piwheels.org/simple/requests/requests-2.28.1-py3-none-any.whl (62 kB)
Collecting certifi>=2017.4.17
  Using cached https://www.piwheels.org/simple/certifi/certifi-2022.6.15-py3-none-any.whl (160 kB)
Collecting idna<4,>=2.5
  Using cached https://www.piwheels.org/simple/idna/idna-3.3-py3-none-any.whl (64 kB)
Collecting charset-normalizer<3,>=2
  Using cached https://www.piwheels.org/simple/charset-normalizer/charset_normalizer-2.1.0-py3-none-any.whl (39 kB)
Collecting urllib3<1.27,>=1.21.1
  Using cached https://www.piwheels.org/simple/urllib3/urllib3-1.26.11-py2.py3-none-any.whl (139 kB)
Installing collected packages: urllib3, idna, charset-normalizer, certifi, requests, fake-useragent, urlchecker
Successfully installed certifi-2022.6.15 charset-normalizer-2.1.0 fake-useragent-0.1.11 idna-3.3 requests-2.28.1 urlchecker-0.0.30 urllib3-1.26.11
(uc) crd@raspberrypi:~/uc $ urlchecker check foo/
           original path: foo/
              final path: foo/
               subfolder: None
                  branch: main
                 cleanup: False
              file types: ['.md', '.py']
                   files: []
               print all: True
                 verbose: False
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997

🤔 Uh oh... The following urls did not pass:
❌️ https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005
❌️ https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999
❌️ https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997
(uc) crd@raspberrypi:~/uc $ pip install --upgrade urlchecker==0.0.31
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting urlchecker==0.0.31
  Using cached https://www.piwheels.org/simple/urlchecker/urlchecker-0.0.31-py3-none-any.whl (28 kB)
Requirement already satisfied: fake-useragent in ./lib/python3.9/site-packages (from urlchecker==0.0.31) (0.1.11)
Requirement already satisfied: requests>=2.18.4 in ./lib/python3.9/site-packages (from urlchecker==0.0.31) (2.28.1)
Requirement already satisfied: charset-normalizer<3,>=2 in ./lib/python3.9/site-packages (from requests>=2.18.4->urlchecker==0.0.31) (2.1.0)
Requirement already satisfied: idna<4,>=2.5 in ./lib/python3.9/site-packages (from requests>=2.18.4->urlchecker==0.0.31) (3.3)
Requirement already satisfied: certifi>=2017.4.17 in ./lib/python3.9/site-packages (from requests>=2.18.4->urlchecker==0.0.31) (2022.6.15)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./lib/python3.9/site-packages (from requests>=2.18.4->urlchecker==0.0.31) (1.26.11)
Installing collected packages: urlchecker
  Attempting uninstall: urlchecker
    Found existing installation: urlchecker 0.0.30
    Uninstalling urlchecker-0.0.30:
      Successfully uninstalled urlchecker-0.0.30
Successfully installed urlchecker-0.0.31
(uc) crd@raspberrypi:~/uc $ urlchecker check foo/
           original path: foo/
              final path: foo/
               subfolder: None
                  branch: main
                 cleanup: False
              file types: ['.md', '.py']
                   files: []
               print all: True
                 verbose: False
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
2022-07-26 20:55:30,171 - urlchecker - ERROR - Error running task
🤔 There were no URLs to check.
(uc) crd@raspberrypi:~/uc $

Maybe I'm doing something wrong?

@vsoch
Collaborator

vsoch commented Jul 27, 2022

Can you provide your test file with URLs for me to check?

@vsoch
Collaborator

vsoch commented Jul 27, 2022

oh lolz this is usrse jobs right?

@vsoch
Collaborator

vsoch commented Jul 27, 2022

This is what I'm getting with your foo.md

$ urlchecker check --files foo.md .
           original path: .
              final path: /home/vanessa/Desktop/Code/urlchecker-python
               subfolder: None
                  branch: main
                 cleanup: False
              file types: ['.md', '.py']
                   files: ['foo.md']
               print all: True
                 verbose: False
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999

@vsoch
Collaborator

vsoch commented Jul 27, 2022

In your run, I don't see that any files were detected (i.e., files: [] is empty).

@vsoch
Collaborator

vsoch commented Jul 27, 2022

I think running it the same way as you works for me too:

$ urlchecker check foo/
           original path: foo/
              final path: foo/
               subfolder: None
                  branch: main
                 cleanup: False
              file types: ['.md', '.py']
                   files: []
               print all: True
                 verbose: False
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997

🎉 All URLS passed!

Could you provide some way to reproduce what is happening so I could help?

@crd477
Author

crd477 commented Jul 27, 2022

(uc) crd@raspberrypi:~/uc $ urlchecker check --files foo/foo.md foo
           original path: foo
              final path: foo
               subfolder: None
                  branch: main
                 cleanup: False
              file types: ['.md', '.py']
                   files: ['foo/foo.md']
               print all: True
                 verbose: False
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
2022-07-26 21:11:47,887 - urlchecker - ERROR - Error running task
🤔 There were no URLs to check.

> In your run I don't see that it's detected any files (i.e., files: [] is empty)

But that invocation worked OK in 0.0.30.

@crd477
Author

crd477 commented Jul 27, 2022

Incidentally, what version of Python are you using?

@vsoch
Collaborator

vsoch commented Jul 27, 2022

ahh I see this:

2022-07-26 20:55:30,171 - urlchecker - ERROR - Error running task

This means one of the multiprocessing workers hit an error. To get to the bottom of it, you can pip install IPython and insert:

import IPython
IPython.embed()

here:

funcs[file_name] = check_task

and here:

and then you'll want to run the function manually (so it isn't run by a worker), e.g.:

kwargs = {
    "file_name": file_name,
    "exclude_patterns": exclude_patterns,
    "exclude_urls": exclude_urls,
    "print_all": self.print_all,
    "retry_count": retry_count,
    "timeout": timeout,
    "port": ports.pop(0),
}
# Call the task directly in this process instead of handing it to a worker
check_task(**kwargs)

That will drop you into the second function; from there, manually run the check for each URL and inspect what happens. Report back here and we'll try to work on it!
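Outside of IPython, the same idea (calling the task in the main process so its exception isn't swallowed by a worker) can be sketched with the standard `traceback` module. This is a minimal sketch, not urlchecker's code: `run_directly` and `check_task` below are hypothetical names, and this `check_task` always fails on purpose to mimic a worker error.

```python
import traceback
from typing import Any, Callable, Optional, Tuple


def run_directly(func: Callable[..., Any], **kwargs: Any) -> Tuple[Any, Optional[str]]:
    """Call a task in the current process and return (result, traceback_text).

    traceback_text is None on success; on failure it holds the full
    formatted traceback rather than a one-line "Error running task".
    """
    try:
        return func(**kwargs), None
    except Exception:
        return None, traceback.format_exc()


def check_task(file_name: str, timeout: int = 5) -> list:
    # Hypothetical stand-in for the per-file task; fails on purpose.
    raise ValueError(f"could not check {file_name} (timeout={timeout})")


result, tb = run_directly(check_task, file_name="foo/foo.md", timeout=5)
print(tb)  # full traceback ending in the ValueError message
```

Returning the traceback as text keeps the sketch usable from a script as well as from an interactive session.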

@vsoch
Collaborator

vsoch commented Jul 27, 2022

I have a fairly new one:

$ python --version
Python 3.9.12

I hope it's not that! If it is, we can find out with the check above. Let me know what you find!

@vsoch
Collaborator

vsoch commented Jul 27, 2022

And when we do figure it out, we should probably capture this particular error better so it's clearer to you what happened! I missed that error message the first time I looked at it.
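One way to capture worker errors more clearly, sketched with `concurrent.futures` and assuming one task per file: `Future.result()` re-raises whatever the worker raised, so the caller can record which file failed and why instead of logging a generic message. A `ThreadPoolExecutor` keeps the sketch self-contained; `ProcessPoolExecutor` futures re-raise worker exceptions the same way. `check_file` and `run_tasks` are hypothetical names, not urlchecker's API.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger("urlchecker-sketch")


def check_file(file_name):
    # Hypothetical task: fails for one file to simulate a worker error.
    if file_name.endswith("bad.md"):
        raise ConnectionError(f"simulated failure for {file_name}")
    return [f"https://example.com/{file_name}"]


def run_tasks(file_names, max_workers=4):
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(check_file, name): name for name in file_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()  # re-raises the worker's exception
            except Exception as exc:
                # Keep the file name and the real error, not just "Error running task".
                errors[name] = f"{type(exc).__name__}: {exc}"
                logger.exception("Error running task for %s", name)
    return results, errors
```

With per-file errors collected this way, a run where every task fails could report those errors instead of "There were no URLs to check."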

@vsoch
Collaborator

vsoch commented Aug 4, 2022

Hey @crd477! I think I was able to reproduce your error. Would you mind testing both of the commands shown at the branch in #80 (comment)? Thank you!

@vsoch
Collaborator

vsoch commented Aug 4, 2022

Fixed with #82

vsoch closed this as completed Aug 4, 2022
@crd477
Author

crd477 commented Aug 4, 2022

Yes, sorry I didn't get back to you sooner; I got busy with something else, and I'm mostly AFK this week.
This fix appears to work for me:

(0.0.32) crd@raspberrypi:~/uc $ urlchecker --version
0.0.32
(0.0.32) crd@raspberrypi:~/uc $ urlchecker check foo --serial
           original path: foo
              final path: foo
               subfolder: None
                  branch: main
                 cleanup: False
                  serial: True
              file types: ['.md', '.py']
                   files: []
               print all: True
                 verbose: False
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997

🤔 Uh oh... The following urls did not pass:
❌️ https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999
❌️ https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005
❌️ https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997
(0.0.32) crd@raspberrypi:~/uc $ urlchecker check foo
           original path: foo
              final path: foo
               subfolder: None
                  branch: main
                 cleanup: False
                  serial: False
              file types: ['.md', '.py']
                   files: []
               print all: True
                 verbose: False
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005
HTTPSConnectionPool(host='uwhires.admin.washington.edu', port=443): Max retries exceeded with url: /ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005

🤔 Uh oh... The following urls did not pass:
❌️ https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209999
❌️ https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=209997
❌️ https://uwhires.admin.washington.edu/ENG/Candidates/default.cfm?szCategory=jobprofile&szOrderID=210005

Just to be clear, the fact that these specific URLs fail is OK and is not related to my issue report.

Thanks for the fix!
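The 0.0.32 output above reports a new `serial` setting, toggled by `--serial`. How urlchecker implements it isn't shown in this thread; a minimal sketch of such a toggle, assuming one task per file, might look like this (`check_all` and `fake_task` are hypothetical names):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def check_all(
    file_names: Iterable[str],
    task: Callable[[str], List[str]],
    serial: bool = False,
    max_workers: int = 4,
) -> Dict[str, List[str]]:
    """Run `task` once per file, either in this process or in a worker pool."""
    names = list(file_names)  # materialize so the names can be reused below
    if serial:
        # Serial mode: no workers, so any exception surfaces with a full traceback.
        return {name: task(name) for name in names}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order and re-raises a worker's
        # exception when that result is reached.
        return dict(zip(names, pool.map(task, names)))


def fake_task(name: str) -> List[str]:
    # Hypothetical task: pretend each file contains one URL.
    return [f"https://example.com/{name}"]


print(check_all(["a.md", "b.md"], fake_task, serial=True))
```

Both modes should return the same mapping; serial mode trades speed for easier debugging, which is why running the same check with and without `--serial` is a useful comparison.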
