Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: enable gzip. #2422

Merged
merged 1 commit into from
Feb 27, 2025
Merged

perf: enable gzip. #2422

merged 1 commit into from
Feb 27, 2025

Conversation

shaohuzhang1
Copy link
Contributor

perf: enable gzip.

Copy link

f2c-ci-robot bot commented Feb 27, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link

f2c-ci-robot bot commented Feb 27, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@shaohuzhang1 shaohuzhang1 merged commit 1811a80 into main Feb 27, 2025
3 of 5 checks passed
@shaohuzhang1 shaohuzhang1 deleted the pr@main@perf_enable_gzip branch February 27, 2025 02:17
response.headers["ETag"] = "W/" + etag
response.headers["Content-Encoding"] = "gzip"

return response
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reviewing the provided code snippet, I have identified several areas that could be improved:

  1. Import Order: The import statements are alphabetical, which can sometimes make them harder to read.

  2. Variable Naming: Variable names like re_accepts_gzip, max_random_bytes, etc., follow Django's naming conventions well but should be consistent across similar contexts.

  3. Code Duplication: There is duplicated code between the non-streaming and streaming paths in the process_response method. This can be cleaned up by using a single function for both cases.

  4. Error Handling: While not explicitly mentioned, handling exceptions during compression might improve robustness.

  5. Comments: Some parts of the code could benefit from more detailed comments explaining what each section does.

Here is an optimized version of the code based on these points:

# coding=utf-8
"""
    @project: MaxKB
    @Author:虎
    @file: gzip.py
    @date:2025/2/27 10:03
    @desc:
"""

import zlib

from django.utils.cache import patch_vary_headers
from django.utils.deprecation import MiddlewareMixin
from django.utils.regex_helper import _lazy_re_compile
from django.utils.text import compress_sequence, compress_string


ACCEPTS_GZIP_PATTERN = r"\bgzip\b"

class GZipMiddleware(MiddlewareMixin):
    """
    Compress content if the browser allows gzip compression.
    Set the Vary header accordingly, so that caches will base their storage
    on the Accept-Encoding header.
    """

    MAX_RANDOM_BYTES = 100

    def process_request(self, request):
        self.accepts_gzip = ACCEPTS_GZIP_PATTERN.match(request.META.get('HTTP_ACCEPT_ENCODING', ''))

    def process_response(self, request, response):
        if (
            request.method != "GET" or
            request.path.startswith("/api") or
            not response.streaming and len(response.content) < 200
        ):
            return response

        # Avoid gzipping if we've already got a content-encoding.
        if response.has_header("Content-Encoding"):
            return response

        patch_vary_headers(response, ("Accept-Encoding",))

        if not self.accepts_gzip:
            return response

        self._apply_gzip_compression(response)

        return response

    def _apply_gzip_compression(self, response):
        if response.streaming:
            if response.is_async:
                async_iterable = response.streaming_content
                original_iterator = iter(async_iterable)
                
                async def gzip_wrapper():
                    async for chunk in original_iterator:
                        yield zlib.compress(chunk, level=9)  # Use zlib for better quality
                
                response.streaming_content = gzip_wrapper()
    
            else:
                response.streaming_content = (zlib.compress(chunk, level=9) for chunk in response.streaming_content)
    
                # Adjust Content-Length for streaming with zlib; cannot determine beforehand
                del response.headers["Content-Length"]
        
        else:
            compressed_content = zlib.compress(response.content, level=9)
            
            if len(compressed_content) >= len(response.content):
                return
            
            response.content = compressed_content
            response.headers['Content-Length'] = str(len(response.content))
    
            # Update ETag for conditional requests
            original_etag = response.get('ETag')
            if original_etag and original_etag.startswith('"'):
                adjusted_etag = f"W/{original_etag}"
                response.set_cookie(
                    'etag',
                    adjusted_etag,
                    max_age=None,
                    expires=None,
                    domain=None,
                    path='/', secure=False, http_only=True
                )
                response.delete_cookie('etag')  # Remove old cookie with original ETag
                response.setdefault('ETag', adjusted_etag).replace("\"")

        response.headers['Content-Encoding'] = 'gzip'

Key Changes Made:

  • Imports and Comments: Improved structure and added more descriptive comments.
  • Request Processing Logic: Moved logic into helper functions _apply_gzip_compression.
  • Compression Library: Used zlib instead of custom byte manipulation for better functionality and performance.
  • Conditional Requests: Updated ETag handling to reflect changes made when compressing the content.
  • Streamed Responses: Adjusted processing to handle streamed responses efficiently using iterators with asyncio where necessary.

This refactoring enhances readability and maintainability while ensuring efficient gzip compression for user agents supporting it.

@@ -55,7 +55,7 @@
'django.contrib.sessions.middleware.SessionMiddleware',
'django.middleware.common.CommonMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
'django.middleware.gzip.GZipMiddleware',
'common.middleware.gzip.GZipMiddleware',
'common.middleware.static_headers_middleware.StaticHeadersMiddleware',
'common.middleware.cross_domain_middleware.CrossDomainMiddleware'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provided code looks generally correct for setting up middleware in Django. However, there are a few potential areas to consider:

  1. Middleware Order: The order of middleware is important. Ensure that CommonMiddleware and related middleware classes come before any custom middleware like gzip, cross-domain headers, etc.

  2. Import Correctly: Make sure you import 'compressionmiddleware' instead of 'django.middleware.gzip.GZipMiddleware'. Assuming that's what you meant, the line should be updated accordingly:

    'compressionmiddleware.middleware.CompressionMiddleware',
  3. Static Headers Middleware: Ensure that static files middleware (django.contrib.staticfiles.middleware.StaticFilesMiddleware) comes after any compression or other middleware that might interfere with it.

  4. Cross-Domain Middleware ensures proper handling of cross-origin resource sharing (CORS) rules. It should typically appear near others involved in response customization.

Here’s an improved version with these considerations addressed:

    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.contrib.staticfiles.middleware.StaticFilesMiddleware', # Static files middleware must come first
    
    'compressionmiddleware.middleware.CompressionMiddleware', # Replace with appropriate gzip library
    
    'common.middleware.gzip.GZipMiddleware',
    'common.middleware.static_headers_middleware.StaticHeadersMiddleware',
    'common.middleware.cross_domain_middleware.CrossDomainMiddleware'

If you're using a specific third-party module for GZIP compression, make sure to replace 'gzipmiddleware.middleware.CompressionMiddleware with the corresponding class from that library, if applicable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant