-
Notifications
You must be signed in to change notification settings - Fork 2.2k
refactor: File model #3050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: File model #3050
Conversation
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
name='source_type', | ||
field=models.CharField(choices=[('KNOWLEDGE', 'Knowledge'), ('APPLICATION', 'Application'), ('TEMPORARY_30_MINUTE', 'Temporary 30 Minute'), ('TEMPORARY_100_MINUTE', 'Temporary 120 Minute'), ('TEMPORARY_1_DAY', 'Temporary 1 Day')], default='TEMPORARY_100_MINUTE', verbose_name='资源类型'), | ||
) | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The provided migration script looks generally correct and should not contain any immediate issues. However, there are a few optimizations you can consider:
-
Indexing Fields: If these table sizes become significantly larger in the future, consider adding indexes to fields that will be frequently queried.
class Migration(migrations.Migration): dependencies = [ ('knowledge', '0004_knowledge_file_size_limit_alter_document_status_and_more'), ] operations = [ migrations.RemoveField( model_name='file', name='workspace_id', ), migrations.AddField( model_name='file', name='file_size', field=models.IntegerField(default=0, verbose_name='文件大小', db_index=True), ), migrations.AddField( model_name='file', name='sha256_hash', field=models.CharField(default='', verbose_name='文件SHA256哈希标识', db_index=True), ), migrations.AddField( model_name='file', name='source_id', field=models.CharField(default='TEMPORARY_100_MINUTE', verbose_name='资源ID'), ), migrations.AddField( model_name='file', name='source_type', field=models.CharField(choices=[('KNOWLEDGE', '知识'), ('APPLICATION', '应用'), ('TEMPORARY_30_MINUTE', '临时30分钟'), ('TEMPORARY_100_MINUTE', '临时120分钟'), ('TEMPORARY_1_DAY', '临时一天')], default='TEMPORARY_100_MINUTE', verbose_name='资源类型', db_index=True) ) ]
-
Data Type Considerations: Ensure that the data types used align with your expected storage requirements. For example, if
source_type
is unlikely to change once set, it might make sense to keep it as a character type rather than an integer for better performance and flexibility. -
Version Control Comments: Add comments explaining changes between versions to ease maintenance over time.
By implementing these optimizations, you can improve query performance and maintainability of your database schema.
select_one(f'SELECT lo_unlink({instance.loid})', []) | ||
exist = QuerySet(File).filter(loid=instance.loid).exclude(id=instance.id).exists() | ||
if not exist: | ||
select_one(f'SELECT lo_unlink({instance.loid})', []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks mostly clean and well-structured. However, there are a few areas where improvements can be made:
Improvements
-
Duplicate Code: The
get_bytes
method has duplicate logic inside thesave
method. Consider separating this logic into its own utility function to avoid repetition. -
UUID Handling: Ensure that
uuid_uuid7()
generates a unique UUID each time it's called. This might need adjustment based on specific requirements. -
Null Values: Since
source_type
andsource_id
have defaults, ensure null values should be handled appropriately. You may want to add checks to handle these cases gracefully. -
SQL Injection Prevention: Always use parameterized queries when executing SQL statements to prevent SQL injection attacks.
-
Error Handling: In the
on_delete_file
signal handler, consider adding error handling to manage exceptions properly. -
Optimization: For large databases, query performance might be optimized further depending on complexity and indexing strategies.
Suggested Corrections
# Separating get_bytes logic
def fetch_data(file_size, bytea):
if file_size > 0:
result = select_one("SELECT lo_from_bytea(%s, %s::bytea) as loid", [file_size, bytea])
return result['loid']
return 0
# Updated save method to call helper function
def save(self, bytea=None, force_insert=False, force_update=False, using=None, update_fields=None):
file_size = len(bytea)
self.loid = fetch_data(file_size, bytea)
@receiver(pre_delete, sender=File)
def on_delete_file(sender, instance, **kwargs):
exist = QuerySet(File).filter(loid=instance.loid).exclude(id=instance.id).exists()
if exist: # Only delete if another File with the same loid exists
select_one(f'SELECT lo_unlink({instance.loid})', [])
These changes aim to reduce redundancy, enhance readability, and improve maintainability of the codebase while ensuring security practices are followed.
def get_sha256_hash(_bytes): | ||
sha256 = hashlib.sha256() | ||
sha256.update(_bytes) | ||
return sha256.hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The provided code looks mostly clean and well-structured. However, there are a few areas that can be improved:
-
Docstring: Adding docstrings to functions can help clarify their purpose.
# In password_encrypt function def password_encrypt(row_password): """Encrypts the row_password using SHA-256.""" # Other functions similarly
-
Comments: It might be beneficial to add comments above sections of code that perform specific operations or set up certain conditions.
-
Code Consistency:
- The
sub_array
function should handle cases whereitem_num > len(array)
gracefully. - Ensure consistency in naming conventions (e.g., use underscores instead of hyphens).
def sub_array(array: List, item_num=10): if not array: return [] start_index = max(0, len(array) - item_num) end_index = min(len(array), start_index + item_num) return array[start_index:end_index]
- The
-
Performance Optimization:
- If
bulk_create_in_batches
is called frequently, consider optimizing it further or integrating with an asynchronous database client like Django Channels or Celery.
- If
-
Exception Handling:
- Consider adding more robust exception handling around file operations and other critical operations.
- You might want to log exceptions rather than raising them directly.
Here's revised version with some of these improvements:
from typing import List
from ..exception.app_exception import AppApiException
from ..models.db_model_manage import DBModelManage
import hashlib
def password_encrypt(row_password):
"""
Encrypts the row_password using SHA-256.
:param row_password: Password to encrypt.
:return: Encrypted password in hexadecimal format.
"""
sha256 = hashlib.sha256()
sha256.update(row_password.encode('utf-8')) # Assuming UTF-8 encoding
return sha256.hexdigest()
def get_file_content(path):
"""
Reads content from the specified path.
:param path: Path to the file.
:return: Content of the file as a string.
"""
try:
with open(path, 'r', encoding='utf-8') as file:
content = file.read()
return content
except FileNotFoundError:
raise AppApiException("File not found.")
except Exception as e:
raise AppApiException(f"An error occurred while reading the file: {str(e)}")
def sub_array(array: List, item_num=10):
"""
Subtracts the first N items from the list.
:param array: Input list.
:param item_num: Number of items to subtract.
:return: Modified list.
"""
if not array:
return []
start_index = max(0, len(array) - item_num)
end_index = min(len(array), start_index + item_num)
return array[start_index:end_index]
def bulk_create_in_batches(model, data, batch_size=1000):
"""
Creates multiple instances of the specified model in batches.
:param model: Model class to create instances of.
:param data: Data representing models to create.
:param batch_size: Number of instances per batch.
"""
for i in range(0, len(data), batch_size):
batch = data[i:i + batch_size]
model.objects.bulk_create(batch)
def get_sha256_hash(_bytes):
"""
Generates a SHA-256 hash digest from bytes.
:param _bytes: Bytes to hash.
:return: Hexadecimal representation of the SHA-256 hash.
"""
sha256 = hashlib.sha256()
sha256.update(_bytes)
return sha256.hexdigest()
These changes enhance readability, maintainability, and error-handling capabilities of the code.
refactor: File model