issue finding href of <a> tag #242

Closed
psquared-dev opened this issue Jun 4, 2023 · 6 comments

Comments

@psquared-dev

Here is the code:

  const root = parse(html);
  const links = root.querySelectorAll("a");

  for (const a of links) {
    console.log(a.rawAttrs);
  }

a.rawAttrs returns 'href="/" rel="home"', but a.getAttribute("href") returns undefined.

Also a.attrs always returns an empty object {}.
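
As a possible stopgap while getAttribute misbehaves, the value can be read out of the rawAttrs string directly. This is a minimal sketch, assuming rawAttrs always has the shape shown above; parseRawAttrs is a hypothetical helper and not part of the library:

```javascript
// Hypothetical helper, NOT part of node-html-parser: splits a raw attribute
// string such as 'href="/" rel="home"' into a plain object.
function parseRawAttrs(rawAttrs) {
  const attrs = {};
  // An attribute name, optionally followed by "=" and a double-quoted,
  // single-quoted, or unquoted value.
  const pattern = /([^\s=\/>"']+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?/g;
  let match;
  while ((match = pattern.exec(rawAttrs)) !== null) {
    const [, name, doubleQuoted, singleQuoted, bare] = match;
    // Attributes without a value (e.g. "disabled") map to an empty string.
    attrs[name.toLowerCase()] = doubleQuoted ?? singleQuoted ?? bare ?? "";
  }
  return attrs;
}

const attrs = parseRawAttrs('href="/" rel="home"');
console.log(attrs.href); // "/"
```

This does not decode HTML entities in values, so it is only suitable as a workaround for simple attributes.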

@psquared-dev psquared-dev changed the title issue finding href of a tag issue finding href of a tag Jun 4, 2023
@psquared-dev psquared-dev changed the title issue finding href of a tag issue finding href of <a> tag Jun 4, 2023
@Ionys320

Hi,
If you replace querySelectorAll with getElementsByTagName, you'll be able to get the href by using .getAttribute("href").

const root = parse(html);
const links = root.getElementsByTagName("a");

for (const a of links) {
    console.log(a.rawAttrs); 
    console.log(a.getAttribute("href"));
}

taoqf added a commit that referenced this issue Aug 16, 2023
@excelsior091224

I have the same issue.
I tried to extract the <code> contained in the <pre> as follows, but what I got back was an empty list.
No matter how I look at it, I am not getting the <code>.

<pre>
  <code>test</code>
</pre>

test code

// test
const root = parse(data.content);
const pre_list = root.getElementsByTagName("pre");
pre_list.map((pre) => {
  console.log("pre:"+pre);
});
const pre_code = root.getElementsByTagName("pre code");
console.log("pre_code:"+pre_code);
pre_list.map((pre) => {
  const code = pre.getElementsByTagName("code");
  console.log("code:"+code);
});

result

// pre_list
// 1st <pre>
BlogPreview.tsx:33 pre:<pre><code class="language-typescript">
// omission
// 2nd <pre>
BlogPreview.tsx:33 pre:<pre><code class="language-typescript">  public async getBlogs(queries?: MicroCMSQueries) {
// omission
// 3rd <pre>
BlogPreview.tsx:33 pre:<pre><code>---
// omission
// 4th <pre>
BlogPreview.tsx:33 pre:<pre><code class="language-typescript">import { Cache, CacheContainer } from &quot;node-ts-cache&quot;;
// omission
// 5th <pre>
BlogPreview.tsx:33 pre:<pre><code class="language-json">{
// omission
// pre_code
BlogPreview.tsx:36 pre_code:
// code
BlogPreview.tsx:39 code:   (logged 5 times)

@taoqf
Owner

taoqf commented Sep 8, 2023

@excelsior091224 I'm afraid this is a different issue. In your case, you should just pass an option to parse:

const root = parse(html, {
	blockTextElements: {
		script: true,
		noscript: true,
		style: true,
	}
});

taoqf added a commit that referenced this issue Sep 8, 2023
@devansh-sharma-tw

@taoqf, this commit (released in v6.1.7 and later) breaks the earlier behavior of ignoring the text content of specific tags when they are set to false in blockTextElements. That seems unintended to me.

  • This is the behavior before this commit (running v6.1.6):
const htmlString = "sample <b><strong>text</strong> inside tags</b> <script>text inside script</script>"

console.log(parse(htmlString, {
    blockTextElements: {
        script: false
    }
}).text)    // Output: sample text inside tags

console.log(parse(htmlString, {
    blockTextElements: {
        script: true
    }
}).text)    // Output: sample text inside tags text inside script

This matches the behavior explained in the README as well.

  • This is the behavior after this commit (running v6.1.7-v6.1.9):
const htmlString = "sample <b><strong>text</strong> inside tags</b> <script>text inside script</script>"

console.log(parse(htmlString, {
    blockTextElements: {
        script: false
    }
}).text)    // Output: sample text inside tags text inside script

console.log(parse(htmlString, {
    blockTextElements: {
        script: true
    }
}).text)    // Output: sample text inside tags text inside script

Could you please check?
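
The two cases above can be folded into one small regression check. This is only a sketch: checkScriptTextToggle is a hypothetical name, and it assumes the parse function exported by node-html-parser is passed in as an argument, so the check itself has no dependencies:

```javascript
// Hypothetical regression check. `parse` is assumed to be node-html-parser's
// parse function, injected by the caller.
function checkScriptTextToggle(parse) {
  const htmlString =
    "sample <b><strong>text</strong> inside tags</b> <script>text inside script</script>";

  // Parse the same input with the script flag set to the given value.
  const textWith = (flag) =>
    parse(htmlString, { blockTextElements: { script: flag } }).text;

  const hidden = textWith(false);
  const shown = textWith(true);

  return {
    hidden,
    shown,
    // Behavior reported for v6.1.6: script text appears only when the flag is true.
    ok: !hidden.includes("text inside script") && shown.includes("text inside script"),
  };
}
```

Going by the outputs reported above, checkScriptTextToggle(parse).ok would be true on v6.1.6 and false on v6.1.7 through v6.1.9.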

@taoqf taoqf added the bug label Sep 15, 2023
@taoqf taoqf closed this as completed in 36670f4 Sep 15, 2023
@taoqf
Owner

taoqf commented Sep 15, 2023

@devansh-sharma-tw Sorry about that. You can try v6.1.0 now.
@excelsior091224 For your case, you should now pass an empty object as blockTextElements in the options, like this:

const html = `<pre>
  <code>test</code>
</pre>`;
const root = parse(html, {
	blockTextElements: {
	}
});
const list = root.getElementsByTagName("code");
const [code] = list;
code.text.should.eql('test');

@devansh-sharma-tw

@taoqf Thanks for the fix!
