New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New: Markdown links validator #1

Merged
merged 10 commits into from Jan 10, 2019

Conversation

Projects
None yet
4 participants
@sarvaje
Copy link
Member

sarvaje commented Nov 30, 2018

This package is need to do: webhintio/hint#1543

For now, the validator can:

  • Validate an URL (e.g. [link](https://example.com))
  • Validate if a relative link is valid (e.g. [link](./doc.md))
  • Validate if an "internal" link is valid (e.g. [link](#title))
  • Validate if a relative + "internal" link is valid (e.g. [link](./docs/doc.md#title))
@sarvaje

This comment has been minimized.

Copy link
Member

sarvaje commented Nov 30, 2018

@webhintio/contributors this is ready to review.

Show resolved Hide resolved src/lib/cli.ts Outdated
Show resolved Hide resolved src/lib/cli.ts
Show resolved Hide resolved src/lib/cli.ts
Show resolved Hide resolved src/lib/cli.ts Outdated
Show resolved Hide resolved src/lib/mdfile.ts Outdated
Show resolved Hide resolved package.json
Show resolved Hide resolved package.json
Show resolved Hide resolved package.json Outdated

@molant molant requested review from alrra and antross Dec 2, 2018

@molant

This comment has been minimized.

Copy link
Member

molant commented Dec 2, 2018

@antross can you take a look at this PR whenever you have the time? It's a new tool to validate that our links in md are correct.

@sarvaje

This comment has been minimized.

Copy link
Member

sarvaje commented Dec 3, 2018

I have found some issues that I need to fix before this is merged.

@sarvaje

This comment has been minimized.

Copy link
Member

sarvaje commented Dec 4, 2018

@antross this should be ready for review

const validateLinks = async (mdFiles: IMDFile[]): Promise<void> => {
for (const mdFile of mdFiles) {
// Validate file by file to prevent 429 and socket hang up errors.
await mdFile.validateLinks();

This comment has been minimized.

@molant

molant Dec 4, 2018

Member

Are we validating all links from all files at the same time? If so maybe we should look into parallelizing something? Won't one-by-one take a long time? Maybe parallelize the ones that are for the file system?

This comment has been minimized.

@sarvaje

sarvaje Dec 4, 2018

Member

Validate relative and internal links is fast, I think we don't need to improve anything about that. For absolute links we can't validate all at the same time, or we get network or "too many request" errors, That is why I am validating file by file.

This comment has been minimized.

@molant

molant Dec 5, 2018

Member

validating file by file.

Validating one by one you mean, right?
How long does it take to validate all the links with the current code? That will allow us to better decide if we should parallelize or not.

This comment has been minimized.

@sarvaje

sarvaje Jan 8, 2019

Member

Aprox. 127 seconds to validate hint repo.

This comment has been minimized.

@molant

molant Jan 8, 2019

Member

How many links are there? 2 minutes is quite a lot IMO 😖

This comment has been minimized.

@sarvaje

sarvaje Jan 8, 2019

Member

Doing chunks of 10 files, the analysis takes 40 seconds when everything is valid.

This comment has been minimized.

@molant

molant Jan 8, 2019

Member

Nice! Have you tried with more files? Is that the sweet spot?

This comment has been minimized.

@sarvaje

sarvaje Jan 8, 2019

Member

No, I can try with more, but it is going to depend on the files and the links on them, what works today, tomorrow can fail.

This comment has been minimized.

@molant

molant Jan 8, 2019

Member

Please give it a try. No need to be super exact (i.e.: increase by 5 each time). Any savings are welcome :)

This comment has been minimized.

@sarvaje

sarvaje Jan 8, 2019

Member

Take a look to the new solution. Instead of analyzing group of 10 files, I'm analyzing up to 10 files at the same time all the time.

With this solution, the time with a limit of 10 files and 20 files is more or less the same, so I prefer to keep 10.

btw, the time is more or less 22s in my computer.

@antross

antross approved these changes Dec 4, 2018

Copy link
Member

antross left a comment

Nice work!

How much work this required really makes me wish there was already a tool that could handle this. Perhaps this will be that tool for someone else in the future.

Show resolved Hide resolved src/lib/cli.ts
Show resolved Hide resolved src/lib/cli.ts Outdated
Update src/lib/cli.ts
Co-Authored-By: sarvaje <dgarcia@plainconcepts.com>
@sarvaje

This comment has been minimized.

Copy link
Member

sarvaje commented Dec 4, 2018

There is a couple of issues right now:

  1. Some links (e.g. https://scotthelme.co.uk/protect-site-from-cryptojacking-csp-sri/) returns a 500 error but if you navigate with a browser, the link works.
  2. https://webhint.visualstudio.com/ require authentication, but we are using it in the extension-vscode contributing.md file.

There is two options, investigate the error 500 and remove the https://webhint.visualstudio.com link or implement an ignore list so we can ignore those links.

@webhint/core thoughts?

@antross

This comment has been minimized.

Copy link
Member

antross commented Dec 4, 2018

I think an explicit ignore list sounds like a good solution as I suspect most of the value here will come from catching "new" links. Ones that we've previously verified through other means should be OK to exclude.

@sarvaje

This comment has been minimized.

Copy link
Member

sarvaje commented Dec 4, 2018

I think an explicit ignore list sounds like a good solution

Perfect!! I will work on that.

@molant

This comment has been minimized.

Copy link
Member

molant commented Dec 4, 2018

@sarvaje

This comment has been minimized.

Copy link
Member

sarvaje commented Dec 4, 2018

@sarvaje sarvaje force-pushed the sarvaje:validate-external-links branch from 2f23963 to 929af32 Dec 5, 2018

@molant

This comment has been minimized.

Copy link
Member

molant commented Dec 5, 2018

The headers received for https://scotthelme.co.uk/protect-site-from-cryptojacking-csp-sri/ is:

I meant the ones sent by the browser or the ones sent by awesome bot when we validate via cron

@sarvaje

This comment has been minimized.

Copy link
Member

sarvaje commented Dec 5, 2018

ahhhh ok

@molant

This comment has been minimized.

Copy link
Member

molant commented Jan 7, 2019

@sarvaje what's the status of this? Is this ready for review?

@molant

This comment has been minimized.

Copy link
Member

molant commented Jan 7, 2019

BTW, if I do curl -I https://scotthelme.co.uk/protect-site-from-cryptojacking-csp-sri/ I get a 200 status code.

@sarvaje

This comment has been minimized.

Copy link
Member

sarvaje commented Jan 7, 2019

@sarvaje what's the status of this? Is this ready for review?

I need to check the url https://scotthelme.co.uk/protect-site-from-cryptojacking-csp-sri/ but it should be ready to review.

@sarvaje

This comment has been minimized.

Copy link
Member

sarvaje commented Jan 7, 2019

@molant I'm changing to node http and https packages and https://scotthelme.co.uk/protect-site-from-cryptojacking-csp-sri/ also returns a 200

I will push the changes in few minutes

@sarvaje

This comment has been minimized.

Copy link
Member

sarvaje commented Jan 7, 2019

now, using the packages http and https I get an 403 error for the url: https://tools.ietf.org/html/rfc6265...

@sarvaje

This comment has been minimized.

Copy link
Member

sarvaje commented Jan 7, 2019

Never mind, I have found the issue.

@sarvaje

This comment has been minimized.

Copy link
Member

sarvaje commented Jan 8, 2019

@webhintio/contributors this is ready to review.

sarvaje added some commits Jan 8, 2019

@molant

molant approved these changes Jan 8, 2019

Show resolved Hide resolved package.json Outdated
Update package.json
Co-Authored-By: sarvaje <dgarcia@plainconcepts.com>

@molant molant merged commit c083d09 into webhintio:master Jan 10, 2019

1 check passed

licence/cla Contributor License Agreement is signed.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment