HTML Link Parser

Exercise details

The goal of this exercise is to create a package that makes it easy to parse an HTML file and extract all of the links (<a href="">...</a> tags). For each extracted link, the package should return a data structure that includes both the href and the text inside the link. Any HTML inside the link can be stripped out, along with any extra whitespace, including newlines and back-to-back spaces. Links may be nested in different HTML elements and may themselves contain markup, as in the following example:

<a href="/dog">
  <span>Something in a span</span>
  Text not in a span
  <b>Bold text!</b>
</a>

In situations like these we want to get output like:

Link{
  Href: "/dog",
  Text: "Something in a span Text not in a span Bold text!",
}
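
The `Link` structure and the whitespace cleanup described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the field names come from the sample output, while `cleanText` and the `package main` declaration are assumptions made for the example.

```go
package main

import "strings"

// Link represents one <a href="..."> element found in a document.
type Link struct {
	Href string // value of the href attribute
	Text string // text inside the link, with markup and extra whitespace removed
}

// cleanText (hypothetical helper) collapses every run of whitespace,
// including newlines and repeated spaces, into a single space and trims
// both ends of the string.
func cleanText(s string) string {
	return strings.Join(strings.Fields(s), " ")
}
```

`strings.Fields` splits on any run of whitespace, so joining its result with single spaces handles newlines and back-to-back spaces in one step.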

Notes

1. Use the golang.org/x/net/html package

2. Ignore nested links

Ignore any links nested inside another link. For example, given the following HTML, only the outer link needs to be returned:

<a href="#">
  Something here <a href="/dog">nested dog link</a>
</a>
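
One way to ignore nested links is to stop descending once an `<a>` element has been recorded. The sketch below shows that walk on a simplified stand-in tree. The `node` type, `collectText`, and `findLinks` are hypothetical names for illustration and are not part of x/net/html. With the real package, the same traversal would presumably run over `*html.Node` values using their `FirstChild` and `NextSibling` pointers. The sketch also assumes that the text of a nested link counts as part of the outer link's text, which the exercise does not spell out.

```go
package main

import "strings"

// node is a simplified, hypothetical stand-in for a parsed HTML node.
// An element node has a Tag (and optionally Attrs); a text node has only Text.
type node struct {
	Tag      string
	Attrs    map[string]string
	Text     string
	Children []*node
}

// Link holds the href and the whitespace-normalized text of an <a> element.
type Link struct {
	Href string
	Text string
}

// collectText appends all text found beneath n, including text inside
// nested elements, separating the pieces with spaces.
func collectText(n *node, sb *strings.Builder) {
	if n.Tag == "" {
		sb.WriteString(n.Text)
		sb.WriteString(" ")
	}
	for _, c := range n.Children {
		collectText(c, sb)
	}
}

// findLinks walks the tree depth-first. When it reaches an <a> element it
// records the link and returns without searching that element's children
// for further links, so nested links are ignored.
func findLinks(n *node) []Link {
	if n.Tag == "a" {
		var sb strings.Builder
		collectText(n, &sb)
		text := strings.Join(strings.Fields(sb.String()), " ")
		return []Link{{Href: n.Attrs["href"], Text: text}}
	}
	var links []Link
	for _, c := range n.Children {
		links = append(links, findLinks(c)...)
	}
	return links
}
```

Because `findLinks` returns as soon as it handles an `<a>`, the inner `/dog` link in the HTML above never produces its own `Link` value.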
