
add: cache for remote images #310

Merged
merged 5 commits into webp-sh:master on Feb 11, 2024

Conversation

@HolgerHuo (Contributor) commented Jan 27, 2024

Thoughts

Currently webp-server pings every remote file with a HEAD request to see whether it has changed, but in most cases image sources are static. The extra HEAD request increases network overhead and adds latency to the response. A Redis cache can map each URL to its ETag and skip the pinging step. This PR illustrates a simple implementation; in my tests, the latency from fetching the remote image is eliminated.
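
A minimal sketch of the idea, assuming a go-redis client; the key prefix, TTL, and function names here are illustrative, not the PR's actual code:

```go
package handler

import (
	"context"
	"net/http"
	"time"

	"github.com/redis/go-redis/v9"
)

var rdb = redis.NewClient(&redis.Options{Addr: "localhost:6379"})

// hasFreshETag reports whether we already hold a cached ETag for url,
// in which case the HEAD probe to the origin can be skipped entirely.
func hasFreshETag(ctx context.Context, url string) bool {
	_, err := rdb.Get(ctx, "etag:"+url).Result()
	return err == nil // redis.Nil (miss) or a real error both mean "probe"
}

// probeAndRemember performs the usual HEAD request, then caches the ETag
// with a TTL so subsequent requests within the window skip this round trip.
func probeAndRemember(ctx context.Context, url string, ttl time.Duration) (string, error) {
	resp, err := http.Head(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	etag := resp.Header.Get("ETag")
	rdb.Set(ctx, "etag:"+url, etag, ttl)
	return etag, nil
}
```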

Improvements to make

  • Respect the cache-control header sent by the origin server
  • Write locks for Redis
  • Possibly smarter eviction of Redis entries to constrain Redis memory usage
  • Further caching: the calculation of the smallest file, etc. still seems redundant and heavy, but is it cost-effective to cache those results as well?

Above are some of my thoughts on making webp-server-go faster. I don't know whether these changes are effective or well implemented, and I would like advice from the dev team on how best to use caching in our scenario. I'll keep working on this PR and hopefully it will be completed soon.

@HolgerHuo (Contributor, Author)

Further caching: the calculation of the smallest file, etc. still seems redundant and heavy, but is it cost-effective to cache those results as well?

After some profiling, I found that an in-memory cache for the ETag is enough unless the server is severely lacking in I/O, and a cache won't help with that anyway.

Plus, after a week of field testing, the processing time (mean Convert function call) of second requests to my webp-server instance dropped by ~200 ms, which is exactly the latency between the server and the origin server. My Redis memory usage is around 10 MiB with ~10,000 entries at a 3-day TTL. No server-side errors were observed during this period. However, my instance serves only ~20k requests with a total bandwidth of ~2.5 GiB, so data from larger instances still needs to be collected.

About TTL for redis:

  1. It's best to maintain the hit rate, so the combination of Redis's built-in eviction policy (LRU) and a max memory limit works best.
  2. Respect Cache-Control using a module like https://github.com/pquerna/cachecontrol; however, since most static images have a de facto infinite expiry, a global max TTL should still be enforced (see the sketch after this list).
  3. Force a global TTL for Redis expiration, which is also the current behavior in this PR.
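
A sketch of option 2 above, assuming github.com/pquerna/cachecontrol; maxTTL and ttlFor are illustrative names, not code from this PR:

```go
package handler

import (
	"net/http"
	"time"

	"github.com/pquerna/cachecontrol"
)

const maxTTL = 72 * time.Hour // global cap: many images never declare an expiry

// ttlFor derives a cache TTL from the origin's Cache-Control headers,
// falling back to (and never exceeding) the global maximum.
func ttlFor(req *http.Request, resp *http.Response) time.Duration {
	reasons, expires, err := cachecontrol.CachableResponse(req, resp, cachecontrol.Options{})
	if err != nil || len(reasons) > 0 {
		return 0 // origin forbids caching (or headers were unparsable)
	}
	if expires.IsZero() {
		return maxTTL // no explicit expiry: the de facto "infinite" case
	}
	ttl := time.Until(expires)
	switch {
	case ttl <= 0:
		return 0 // already expired: don't cache
	case ttl > maxTTL:
		return maxTTL
	default:
		return ttl
	}
}
```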

Also, since webp-server now supports multiple origin servers, should the TTL config apply to all hosts, or are separate per-host configs better? The drawback of the latter is that it would break the current config file structure.

The caching logic in this PR looks OK to me and it's ready for review. Please let me know if the dev team thinks a different default Redis TTL behavior better fits the majority, and I'll update soon!

@HolgerHuo HolgerHuo marked this pull request as ready for review February 3, 2024 16:07
@HolgerHuo HolgerHuo changed the title from "[wip] add: redis cache for remote images" to "add: redis cache for remote images" Feb 3, 2024
@HolgerHuo HolgerHuo force-pushed the master branch 2 times, most recently from edd0a0b to b86ed62, on February 3, 2024 16:13
@BennyThink BennyThink added the enhancement New feature or request label Feb 5, 2024
@n0vad3v (Member) commented Feb 6, 2024

Thank you for your contribution @HolgerHuo. Yes, adding a stateful cache in fetchRemoteImg will reduce the HEAD requests made to the origin on every request and can cut down latency.

However, adding an external service, especially a stateful one (e.g. Redis), will greatly increase the complexity of the entire system; this is my main concern about adding such features, including this PR and issue #305.

We have similar concerns in several areas, and as you can see, WebP Server Go can currently run without any other service dependencies; the metadata service, for example, was replaced with a static local JSON file.

Speaking of your comment, you mentioned:

I found that an in-memory cache for the ETag is enough unless the server is severely lacking in I/O, and a cache won't help with that anyway.

I didn't quite get that part: what do you mean by "unless the server is severely lacking in I/O, and a cache won't help with that anyway"?

BTW, we already include github.com/patrickmn/go-cache as WriteCache; could we use the same package for this function? Redis would introduce a TTL for cache eviction as well, and, per your experiment ("Redis memory usage is around 10 MiB with ~10,000 entries at a 3-day TTL"), RAM usage shouldn't be a problem here. This would eliminate the need for an external service and keep WebP Server Go simple to set up and configure.
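
A minimal sketch of that suggestion, reusing github.com/patrickmn/go-cache in process; the 3-day TTL mirrors the experiment above, and the names are illustrative rather than the PR's actual identifiers:

```go
package handler

import (
	"time"

	"github.com/patrickmn/go-cache"
)

// 3-day default TTL, with expired entries swept hourly, so memory stays
// bounded without any external service.
var etagCache = cache.New(72*time.Hour, time.Hour)

func getCachedETag(url string) (string, bool) {
	if v, found := etagCache.Get(url); found {
		return v.(string), true
	}
	return "", false
}

func setCachedETag(url, etag string) {
	etagCache.Set(url, etag, cache.DefaultExpiration)
}
```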

@HolgerHuo (Contributor, Author)

Hi! I understand your concern about the complexity of the system.

Speaking of your comment, you mentioned:

I found that an in-memory cache for the ETag is enough unless the server is severely lacking in I/O, and a cache won't help with that anyway.

This was a comment on my earlier point, "Further caching: the calculation of the smallest file, etc. still seems redundant and heavy, but is it cost-effective to cache those results as well?" Unless the hard disk is incapable of handling size-comparison operations (my VPS, 4G8G with an NVMe SSD, handles 10-20 concurrent file-size comparisons in under 1 ms on average), caching the smallest-file result is unnecessary. Even if it helped on slower SSD-cached servers, webp-server would still eventually have to calculate the ETag of the response and send the file over HTTP, and Redis won't help with those operations, so we don't need to cache the smallest-file result in memory.
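
For scale, picking the smallest file is just one metadata read per candidate; a hedged sketch of such a comparison (findSmallest is an illustrative name, not webp-server's actual function):

```go
package handler

import "os"

// findSmallest returns the path with the smallest size among candidates;
// each os.Stat is a single metadata read, typically microseconds on an SSD.
func findSmallest(paths []string) string {
	best, bestSize := "", int64(-1)
	for _, p := range paths {
		info, err := os.Stat(p)
		if err != nil {
			continue // skip missing or unreadable candidates
		}
		if bestSize < 0 || info.Size() < bestSize {
			best, bestSize = p, info.Size()
		}
	}
	return best
}
```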

However, adding an external service, especially a stateful one (e.g. Redis), will greatly increase the complexity of the entire system; this is my main concern about adding such features, including this PR and issue #305.

Did you mean adding features that depend on external stateful services? How is #305 related to this?

We have similar concerns in several areas, and as you can see, WebP Server Go can currently run without any other service dependencies; the metadata service, for example, was replaced with a static local JSON file.
BTW, we already include github.com/patrickmn/go-cache as WriteCache; could we use the same package for this function? Redis would introduce a TTL for cache eviction as well, and, per your experiment ("Redis memory usage is around 10 MiB with ~10,000 entries at a 3-day TTL"), RAM usage shouldn't be a problem here. This would eliminate the need for an external service and keep WebP Server Go simple to set up and configure.

I see; I'll update the PR in a little while.

Btw, has the dev team considered making webp-server able to scale horizontally, i.e., work better in a multi-node environment, since I saw #296 was implemented? I had considered Go's internal in-memory cache, but I missed that WriteCache was already implemented this way; I thought an external service would help in a multi-node environment and let it accommodate more requests. Certainly, other parts of the code would also need changes to make it cloud native, and we could make this an option, keeping the single-machine simplicity, with an opt-in Redis backend (a design sketch follows).
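
One possible shape for that option, sketched under the assumption of a small cache interface; none of these names exist in the project:

```go
package handler

import "time"

// MetaCache is the minimal surface both backends would need to satisfy.
type MetaCache interface {
	Get(key string) (string, bool)
	Set(key, value string, ttl time.Duration)
}

// A config switch would then pick the backend at startup, e.g.
// (newRedisCache / newMemoryCache are hypothetical constructors):
//
//	var c MetaCache
//	if cfg.RedisURL != "" {
//	    c = newRedisCache(cfg.RedisURL) // shared across nodes
//	} else {
//	    c = newMemoryCache() // go-cache, zero extra dependencies
//	}
```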

@HolgerHuo (Contributor, Author)

By the way, as for the TTL, do you think there is a better mechanism for cache expiration? Does a global TTL fit most cases?

@HolgerHuo (Contributor, Author) commented Feb 6, 2024

Hi! I have refactored the code, and initial results show that go-cache is working normally. More tests are still ongoing, but it should be stable and won't interfere with the rest of webp-server's logic.

@HolgerHuo HolgerHuo changed the title from "add: redis cache for remote images" to "add: cache for remote images" Feb 6, 2024
@HolgerHuo (Contributor, Author)

Sorry, I overlooked the "0 means no expiration" behavior in config.go; I fixed that in the following commit.
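
A guess at the gotcha, based on go-cache's documented semantics rather than the actual commit: a duration of 0 means cache.DefaultExpiration, while "never expire" is the sentinel cache.NoExpiration, so a config value of 0 needs explicit mapping:

```go
package config

import (
	"time"

	"github.com/patrickmn/go-cache"
)

// ttlFromConfig maps a user-facing TTL setting in seconds (hypothetical
// field) onto go-cache's sentinel values: 0 in the config means "no
// expiration", which go-cache spells as cache.NoExpiration (-1), not 0.
func ttlFromConfig(seconds int) time.Duration {
	if seconds == 0 {
		return cache.NoExpiration
	}
	return time.Duration(seconds) * time.Second
}
```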

@n0vad3v (Member) commented Feb 7, 2024

Nice work, using go-cache looks better!

Did you mean adding features that depend on external stateful services? How is #305 related to this?

Yes, but not limited to that: issue #305 suggests adding another GET parameter, which we think might increase the complexity of the code, so we're still thinking about it.

Does a global TTL fit most cases?

My intuition says a global TTL is okay; feel free to discuss if you've thought of a case where a global TTL won't fit.

Btw, has the dev team considered making webp-server able to scale horizontally

We haven't thought about this yet, but we are open to ideas.

we could make this an option, keeping the single-machine simplicity, with an opt-in Redis backend

Agreed. As for me, I've come up with a few things that might be solved at the same time (when considering horizontal scaling, especially in a Kubernetes environment):

  1. Should all nodes share the same filesystem (PV) for caching? If so, how should those instances run without creating locks on files?
  2. Or, if they run independently, would an external service like Redis still be needed?

I haven't thought this problem through very clearly yet; your thoughts and suggestions are welcome.

@HolgerHuo (Contributor, Author)

Thanks!

My intuition says a global TTL is okay; feel free to discuss if you've thought of a case where a global TTL won't fit.

I also think it's okay, though a better approach might be an LRU eviction policy bounded by the response's TTL; in the current design, that doesn't feel necessary.

Agreed. As for me, I've come up with a few things that might be solved at the same time (when considering horizontal scaling, especially in a Kubernetes environment):

I think that to enable it (Kubernetes-style horizontal scaling), a shared filesystem is the better choice, because:

  1. Pods in Kubernetes are commonly in the same datacenter or geographically close, and a shared filesystem (a PV, Ceph, or a SAN) exists in most cases;
  2. if instances have their own storage, the cache may be duplicated or used inefficiently, since requests are forwarded by load balancers without the instances knowing;
  3. even if the LB distributes load by URI, that effectively makes pods stateful, which doesn't fit Kubernetes's philosophy: pods may be torn down or spun up according to system load, and if requests consistently hit specific resources, new pods won't help since they start with an empty cache, and vice versa.

If they run independently, go-cache also falls short, since it isn't persistent across restarts, and unlike the write lock's lifecycle, cached request entries usually live much longer; still, Redis is not a must.

Should all nodes share the same filesystem (PV) for caching? If so, how should those instances run without creating locks on files?

I don't think that's possible to do gracefully without locks, but race conditions are not easily hit in practice because file I/O operations don't take very long. Even so, to make the server reliable, locks should still be implemented; a sketch follows.
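
A hedged sketch of such a lock using flock(2) on a shared POSIX filesystem (Unix-only, and not reliable on every network filesystem; withFileLock is an illustrative name):

```go
package handler

import (
	"os"
	"syscall"
)

// withFileLock runs fn while holding an exclusive advisory lock on
// path+".lock", serializing writers across processes (and across nodes
// when the shared filesystem propagates flock semantics).
func withFileLock(path string, fn func() error) error {
	lock, err := os.OpenFile(path+".lock", os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return err
	}
	defer lock.Close()

	// Blocks until no other process holds the lock on this file.
	if err := syscall.Flock(int(lock.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(lock.Fd()), syscall.LOCK_UN)

	return fn()
}
```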

@HolgerHuo (Contributor, Author) commented Feb 7, 2024

There are some other thoughts regarding k8s deployment, such as managing the instance lifecycle. If the dev team is interested, I might try to work on this in another PR.

PS: a small bug is fixed in the following commit.

(Review comment on handler/remote.go: outdated, resolved)
@n0vad3v (Member) commented Feb 10, 2024

Nice!❤️

I've added a comment on the code; the rest looks good to me. Please comment when this PR is ready to merge.

There are some other thoughts regarding k8s deployment, such as managing the instance lifecycle. If the dev team is interested

I think we can create an issue for this topic to make sure we are on the same page before creating the PR.

@HolgerHuo (Contributor, Author)

Hi! I've updated the code. I'm not very familiar with the go-cache module, but since it is an internal module and we have full knowledge of what is written to the cache, is this error handling necessary? Either way, it is the better practice.

I think we can create an issue for this topic to make sure we are on the same page before creating the PR.

Yes, we can continue further work there.

Btw, happy Chinese New Year.

@n0vad3v (Member) commented Feb 11, 2024

Happy Lunar New Year!

@n0vad3v n0vad3v merged commit 123c96d into webp-sh:master Feb 11, 2024
4 of 6 checks passed
HolgerHuo added a commit to HolgerHuo/webp_server_go that referenced this pull request Mar 25, 2024
HolgerHuo added a commit to HolgerHuo/webp_server_go that referenced this pull request May 8, 2024
HolgerHuo added a commit to HolgerHuo/webp_server_go that referenced this pull request May 8, 2024
@HolgerHuo HolgerHuo deleted the master branch May 8, 2024 11:38