Replace remote image assets with downloaded one #494

Closed
jeepers3327 opened this issue Jun 11, 2022 · 3 comments


@jeepers3327

Configuration

version: website-scraper@5.0.0

options:

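// website-scraper v5 is ESM-only, so the plugin is loaded with `import`.
import PuppeteerPlugin from "website-scraper-puppeteer";

class Plugin {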
  apply(registerAction) {
    let getUrl;

    registerAction("beforeStart", ({ utils }) => {
      getUrl = utils.getUrl;
    });

    registerAction(
      "getReference",
      ({ resource, parentResource, originalReference }) => {
        // Resource was not downloaded: keep an absolute link by resolving
        // the original reference against the parent page URL. This also
        // handles protocol-relative references such as "//cdn.example/x.png".
        if (!resource) {
          return {
            reference: new URL(originalReference, parentResource.url).href,
          };
        }

        const isImage = /\.(gif|jpe?g|tiff?|png|webp|bmp)$/i.test(
          originalReference
        );

        if (isImage) {
          console.log("image", originalReference, "->", resource.getFilename());
        } else {
          console.log(resource.url);
        }

        // Resource was downloaded: point the reference at the saved file.
        return { reference: resource.getFilename() };
      }
    );
  }
}
// Timestamp used to give each run its own output directory.
const now = new Date();

const options = {
  urls: ["https://ahfarmer.github.io/emoji-search/"],
  directory: `./${now.getTime()}`,
  plugins: [
    new PuppeteerPlugin({
      scrollToBottom: { timeout: 10000, viewportN: 10 } /* optional */,
      blockNavigation: true /* optional */,
    }),
    new Plugin(),
  ],
};

Description

I want to replace assets hosted online with the downloaded ones so the page can be used without internet. The page at the URL above has an image whose src attribute is set to //cdn.jsdelivr.net/emojione/assets/png/1f4af.png. I was going to use the getReference action, but I later realized that the scraper already does what I want, with the exception of image URLs.

Expected behavior: //cdn.jsdelivr.net/emojione/assets/png/1f4af.png, or any other remote image URL, is replaced with a reference to the downloaded file.

Actual behavior: The image URL was not replaced with the locally downloaded asset.
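
For reference, the src value above is a protocol-relative URL, so it only becomes a full address once it is resolved against the page that contains it. The sketch below shows that resolution with the standard URL class, together with the same extension check used in the plugin. The helper name resolveImageReference is hypothetical and not part of website-scraper.

```javascript
// Hypothetical helper for illustration; not part of website-scraper's API.
const IMAGE_EXT = /\.(gif|jpe?g|tiff?|png|webp|bmp)$/i;

function resolveImageReference(originalReference, pageUrl) {
  // The WHATWG URL constructor resolves relative and protocol-relative
  // references against the base URL of the page.
  const absolute = new URL(originalReference, pageUrl);
  return {
    url: absolute.href,
    // Test the pathname so a query string or hash does not hide the extension.
    isImage: IMAGE_EXT.test(absolute.pathname),
  };
}

const result = resolveImageReference(
  "//cdn.jsdelivr.net/emojione/assets/png/1f4af.png",
  "https://ahfarmer.github.io/emoji-search/"
);
console.log(result.url); // https://cdn.jsdelivr.net/emojione/assets/png/1f4af.png
console.log(result.isImage); // true
```

Testing the pathname instead of the raw reference is a design choice here: a reference like image.png?v=2 would fail a check anchored at the end of the full string.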

@s0ph1e
Member

s0ph1e commented Jun 11, 2022

Hi @jeepers3327

I executed the code and it works fine for me: I have <img alt="100" src="images/1f4af.png"> in the resulting HTML.
Did you check the logs to see what happens with this resource?

@jeepers3327
Author

After looking at the actual HTML file, it is indeed working: the assets are already replaced. The issue is that the React SPA loads its content from the JS file instead of from the HTML.
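
For anyone hitting the same thing: when the page content is rendered from a JS bundle, image URLs that appear literally in that bundle could be rewritten in a post-processing step. The following is only a rough sketch under assumptions: rewriteRemoteImages and its regular expression are hypothetical, the "images" directory is taken from the result shown above, and URLs that the app assembles at runtime would not be caught.

```javascript
// Hypothetical post-processing helper; not part of website-scraper's API.
// Matches http(s) or protocol-relative URLs that end in an image extension.
const REMOTE_IMAGE =
  /(?:https?:)?\/\/[^\s"'`)]+?\.(?:gif|jpe?g|tiff?|png|webp|bmp)/gi;

function rewriteRemoteImages(source, localDir = "images") {
  return source.replace(REMOTE_IMAGE, (match) => {
    // Keep only the file name and point it at the local directory.
    const name = match.slice(match.lastIndexOf("/") + 1);
    return `${localDir}/${name}`;
  });
}

const bundle = 'var src = "//cdn.jsdelivr.net/emojione/assets/png/1f4af.png";';
console.log(rewriteRemoteImages(bundle));
// var src = "images/1f4af.png";
```

This assumes the downloaded file keeps its original name in the local directory; if the scraper renames files to avoid collisions, a lookup from remote URL to saved filename would be needed instead.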

@jeepers3327
Author

Thanks for taking the time to check this out, @s0ph1e.
