Charset on final file is not like expected #23

Closed
TiagoGouvea opened this issue Sep 9, 2023 · 5 comments · Fixed by #22

Configuration

version: website-scraper@5.3.1

options:

const options = {
  urls: ['https://www.appmasters.io/pt/'],
  directory: '/Users/tiagogouvea/www/scrapper/test/app/',
  plugins: [ new SaveToExistingDirectoryPlugin() ]
};

Description

Scraping a website that declares <meta charSet="utf-8"/> produces saved content whose non-ASCII characters are not displayed correctly.

The following screenshot shows the original content next to the scraped copy.

[Screenshot: original App Masters page next to the scraped copy]

Expected behavior: The saved file uses the same charset as the original page.

Actual behavior: The file is not saved with the correct charset.

Additional Information

Congratulations for this amazing project! 👏👏👏

henq98 commented Sep 9, 2023

Edit index.js in the website-scraper-existing-directory module folder, changing the line from
await fs.outputFile(filename, text, { encoding: 'binary' });
to
await fs.outputFile(filename, text, { encoding: 'utf-8' });
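
A minimal sketch of why the encoding option matters here, assuming `text` holds an already decoded JavaScript string (the sample word is made up for illustration). In Node.js, 'binary' is an alias for 'latin1', so each character is written as a single byte; a reader that expects UTF-8 then sees invalid byte sequences.

```javascript
// Hypothetical sample text with Portuguese accented characters.
const text = 'ação';

// 'binary' (latin1) keeps one byte per character.
const asBinary = Buffer.from(text, 'binary');
// 'utf-8' encodes ç and ã as two bytes each.
const asUtf8 = Buffer.from(text, 'utf-8');

console.log(asBinary.toString('hex')); // 61e7e36f
console.log(asUtf8.toString('hex'));   // 61c3a7c3a36f

// Decoding the latin1 bytes as UTF-8, as a browser honouring the
// utf-8 meta tag would, does not give the original text back.
const roundTrip = asBinary.toString('utf-8');
console.log(roundTrip === text); // false
```

The second buffer is what a page declaring utf-8 needs on disk, which is consistent with the one-line change suggested above.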

aivus (Member) commented Sep 9, 2023

@henq98 good catch! We will check this.

@s0ph1e looks like we need to replace

await fs.outputFile(filename, text, { encoding: 'binary' });

with

await fs.outputFile(filename, text, { encoding: resource.getEncoding() });
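
The idea behind this replacement can be sketched as follows. This is not the plugin's actual source: `makeResource`, `saveResource`, and `demo` are hypothetical helpers, the stub's `getFilename()` and `getText()` methods are assumptions (only `getEncoding()` is named in this thread), and Node's built-in `fs/promises` stands in for `fs.outputFile` so the sketch runs without extra packages.

```javascript
const fs = require('node:fs/promises');
const os = require('node:os');
const path = require('node:path');

// Hypothetical stand-in for a scraped resource.
function makeResource(filename, text, encoding) {
  return {
    getFilename: () => filename,
    getText: () => text,
    getEncoding: () => encoding,
  };
}

// Write the resource using its own encoding instead of a hard-coded 'binary'.
async function saveResource(directory, resource) {
  const filename = path.join(directory, resource.getFilename());
  // Create missing parent directories before writing the file.
  await fs.mkdir(path.dirname(filename), { recursive: true });
  await fs.writeFile(filename, resource.getText(), {
    encoding: resource.getEncoding(),
  });
  return filename;
}

// Save a small UTF-8 page to a temporary directory and read it back.
async function demo() {
  const directory = await fs.mkdtemp(path.join(os.tmpdir(), 'scraper-demo-'));
  const resource = makeResource('index.html', '<p>ação</p>', 'utf8');
  const filename = await saveResource(directory, resource);
  return fs.readFile(filename, 'utf8');
}

demo().then((saved) => console.log(saved));
```

Taking the encoding from the resource rather than fixing it to 'utf-8' presumably lets resources that should stay binary be written unchanged, while text resources keep their declared charset.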

aivus transferred this issue from website-scraper/node-website-scraper on Sep 9, 2023

aivus (Member) commented Sep 9, 2023

Issue moved to #23

s0ph1e (Member) commented Sep 10, 2023

Hi all 👋

I've just released a bug fix in version website-scraper-existing-directory@1.0.1

Thank you for your collaboration 👍

TiagoGouvea (Author) commented

Great! Thank you all!
