refactor: File model #3050

shaohuzhang1 · 2025-05-07T03:43:57Z

refactor: File model

f2c-ci-robot · 2025-05-07T03:44:01Z

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

f2c-ci-robot · 2025-05-07T03:44:05Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

shaohuzhang1 · 2025-05-07T03:44:22Z

apps/knowledge/migrations/0005_remove_file_workspace_id_file_file_size_and_more.py

+            name='source_type',
+            field=models.CharField(choices=[('KNOWLEDGE', 'Knowledge'), ('APPLICATION', 'Application'), ('TEMPORARY_30_MINUTE', 'Temporary 30 Minute'), ('TEMPORARY_100_MINUTE', 'Temporary 120 Minute'), ('TEMPORARY_1_DAY', 'Temporary 1 Day')], default='TEMPORARY_100_MINUTE', verbose_name='资源类型'),
+        )
+    ]


The provided migration script looks generally correct and should not contain any immediate issues. However, there are a few optimizations you can consider:

Indexing Fields: If these table sizes become significantly larger in the future, consider adding indexes to fields that will be frequently queried.

class Migration(migrations.Migration): dependencies = [ ('knowledge', '0004_knowledge_file_size_limit_alter_document_status_and_more'), ] operations = [ migrations.RemoveField( model_name='file', name='workspace_id', ), migrations.AddField( model_name='file', name='file_size', field=models.IntegerField(default=0, verbose_name='文件大小', db_index=True), ), migrations.AddField( model_name='file', name='sha256_hash', field=models.CharField(default='', verbose_name='文件SHA256哈希标识', db_index=True), ), migrations.AddField( model_name='file', name='source_id', field=models.CharField(default='TEMPORARY_100_MINUTE', verbose_name='资源ID'), ), migrations.AddField( model_name='file', name='source_type', field=models.CharField(choices=[('KNOWLEDGE', '知识'), ('APPLICATION', '应用'), ('TEMPORARY_30_MINUTE', '临时30分钟'), ('TEMPORARY_100_MINUTE', '临时120分钟'), ('TEMPORARY_1_DAY', '临时一天')], default='TEMPORARY_100_MINUTE', verbose_name='资源类型', db_index=True) ) ]

Data Type Considerations: Ensure that the data types used align with your expected storage requirements. For example, if source_type is unlikely to change once set, it might make sense to keep it as a character type rather than an integer for better performance and flexibility.

Version Control Comments: Add comments explaining changes between versions to ease maintenance over time.

By implementing these optimizations, you can improve query performance and maintainability of your database schema.

shaohuzhang1 · 2025-05-07T03:44:35Z

apps/knowledge/models/knowledge.py

-    select_one(f'SELECT lo_unlink({instance.loid})', [])
+    exist = QuerySet(File).filter(loid=instance.loid).exclude(id=instance.id).exists()
+    if not exist:
+        select_one(f'SELECT lo_unlink({instance.loid})', [])


The code looks mostly clean and well-structured. However, there are a few areas where improvements can be made:

Improvements

Duplicate Code: The get_bytes method has duplicate logic inside the save method. Consider separating this logic into its own utility function to avoid repetition.

UUID Handling: Ensure that uuid_uuid7() generates a unique UUID each time it's called. This might need adjustment based on specific requirements.

Null Values: Since source_type and source_id have defaults, ensure null values should be handled appropriately. You may want to add checks to handle these cases gracefully.

SQL Injection Prevention: Always use parameterized queries when executing SQL statements to prevent SQL injection attacks.

Error Handling: In the on_delete_file signal handler, consider adding error handling to manage exceptions properly.

Optimization: For large databases, query performance might be optimized further depending on complexity and indexing strategies.

Suggested Corrections

# Separating get_bytes logic def fetch_data(file_size, bytea): if file_size > 0: result = select_one("SELECT lo_from_bytea(%s, %s::bytea) as loid", [file_size, bytea]) return result['loid'] return 0 # Updated save method to call helper function def save(self, bytea=None, force_insert=False, force_update=False, using=None, update_fields=None): file_size = len(bytea) self.loid = fetch_data(file_size, bytea) @receiver(pre_delete, sender=File) def on_delete_file(sender, instance, **kwargs): exist = QuerySet(File).filter(loid=instance.loid).exclude(id=instance.id).exists() if exist: # Only delete if another File with the same loid exists select_one(f'SELECT lo_unlink({instance.loid})', [])

These changes aim to reduce redundancy, enhance readability, and improve maintainability of the codebase while ensuring security practices are followed.

shaohuzhang1 · 2025-05-07T03:45:00Z

apps/common/utils/common.py

+def get_sha256_hash(_bytes):
+    sha256 = hashlib.sha256()
+    sha256.update(_bytes)
+    return sha256.hexdigest()


The provided code looks mostly clean and well-structured. However, there are a few areas that can be improved:

Docstring: Adding docstrings to functions can help clarify their purpose.

# In password_encrypt function def password_encrypt(row_password): """Encrypts the row_password using SHA-256.""" # Other functions similarly

Comments: It might be beneficial to add comments above sections of code that perform specific operations or set up certain conditions.

Code Consistency:

The sub_array function should handle cases where item_num > len(array) gracefully.

Ensure consistency in naming conventions (e.g., use underscores instead of hyphens).

def sub_array(array: List, item_num=10): if not array: return [] start_index = max(0, len(array) - item_num) end_index = min(len(array), start_index + item_num) return array[start_index:end_index]

Performance Optimization:

If bulk_create_in_batches is called frequently, consider optimizing it further or integrating with an asynchronous database client like Django Channels or Celery.

Exception Handling:

Consider adding more robust exception handling around file operations and other critical operations.

You might want to log exceptions rather than raising them directly.

Here's revised version with some of these improvements:

from typing import List from ..exception.app_exception import AppApiException from ..models.db_model_manage import DBModelManage import hashlib def password_encrypt(row_password): """ Encrypts the row_password using SHA-256. :param row_password: Password to encrypt. :return: Encrypted password in hexadecimal format. """ sha256 = hashlib.sha256() sha256.update(row_password.encode('utf-8')) # Assuming UTF-8 encoding return sha256.hexdigest() def get_file_content(path): """ Reads content from the specified path. :param path: Path to the file. :return: Content of the file as a string. """ try: with open(path, 'r', encoding='utf-8') as file: content = file.read() return content except FileNotFoundError: raise AppApiException("File not found.") except Exception as e: raise AppApiException(f"An error occurred while reading the file: {str(e)}") def sub_array(array: List, item_num=10): """ Subtracts the first N items from the list. :param array: Input list. :param item_num: Number of items to subtract. :return: Modified list. """ if not array: return [] start_index = max(0, len(array) - item_num) end_index = min(len(array), start_index + item_num) return array[start_index:end_index] def bulk_create_in_batches(model, data, batch_size=1000): """ Creates multiple instances of the specified model in batches. :param model: Model class to create instances of. :param data: Data representing models to create. :param batch_size: Number of instances per batch. """ for i in range(0, len(data), batch_size): batch = data[i:i + batch_size] model.objects.bulk_create(batch) def get_sha256_hash(_bytes): """ Generates a SHA-256 hash digest from bytes. :param _bytes: Bytes to hash. :return: Hexadecimal representation of the SHA-256 hash. """ sha256 = hashlib.sha256() sha256.update(_bytes) return sha256.hexdigest()

These changes enhance readability, maintainability, and error-handling capabilities of the code.

refactor: File model

f729ea9

f2c-ci-robot bot added the do-not-merge/release-note-label-needed label May 7, 2025

shaohuzhang1 commented May 7, 2025

View reviewed changes

shaohuzhang1 merged commit bcf6cce into v2 May 7, 2025
3 of 4 checks passed

shaohuzhang1 commented May 7, 2025

View reviewed changes

shaohuzhang1 deleted the pr@v2@refactor_file_model branch May 7, 2025 03:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor: File model #3050

refactor: File model #3050

Uh oh!

shaohuzhang1 commented May 7, 2025

Uh oh!

f2c-ci-robot bot commented May 7, 2025

Uh oh!

f2c-ci-robot bot commented May 7, 2025

Uh oh!

shaohuzhang1 May 7, 2025

Uh oh!

shaohuzhang1 May 7, 2025

Uh oh!

Uh oh!

shaohuzhang1 May 7, 2025

Uh oh!

Uh oh!

refactor: File model #3050

refactor: File model #3050

Uh oh!

Conversation

shaohuzhang1 commented May 7, 2025

Uh oh!

f2c-ci-robot bot commented May 7, 2025

Uh oh!

f2c-ci-robot bot commented May 7, 2025

Uh oh!

shaohuzhang1 May 7, 2025

Choose a reason for hiding this comment

Uh oh!

shaohuzhang1 May 7, 2025

Choose a reason for hiding this comment

Improvements

Suggested Corrections

Uh oh!

Uh oh!

shaohuzhang1 May 7, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!