Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Paragraph list unsorted #2411

Merged
merged 1 commit into from
Feb 26, 2025
Merged

Conversation

shaohuzhang1
Copy link
Contributor

fix: Paragraph list unsorted

return native_search({
'document_custom_sql': query_set,
'order_by_query': QuerySet(Document).order_by('-create_time', 'id')
}, select_string=get_file_content(
os.path.join(PROJECT_DIR, "apps", "dataset", 'sql', 'list_document.sql')),
with_search_one=False), dataset_id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some potential issues and optimizations:

  1. Code Repeats: The code for one and batch_save methods is nearly identical, but different in their return values (single vs multiple results). Consider refactoring to reduce redundancy.

  2. Order By Clause: Adding an order by clause when calling native_search can sometimes slow down the query, especially if not properly indexed. It's better to handle ordering after retrieving data if performance becomes a concern.

  3. Parameter Naming: In edit method, it would be more descriptive to use parameter names like instance_dict.

  4. Comments and Docstrings: The comments in your current code are quite short and don't cover all functionalities well. Consider expanding them to provide clear explanations of each part of the method.

  5. Error Handling: Your methods do not include explicit error handling around database operations or custom SQL queries. This could lead to silent failures that might be difficult to diagnose later.

Here’s a revised version with some suggested improvements:

def one(self, instance_id: int, with_valid=False) -> Optional[Document]:
    """Retrieve a single document by ID."""
    self.is_valid(raise_exception=True)
    
    # Fetch document using the provided filter
    queryset = Document.objects.get(id=instance_id)
    
    # Check if detailed content is required
    if with_valid:
        queryset = queryset.validated()
    
    # Perform search on fetched model
    return native_search({
        'document_data': queryset,
    }, select_string=os.path.join(PROJECT_DIR, "apps", "dataset", 'sql', 'list_document.sql'))

def batch_save(self, instances: Iterable[Dict], with_valid=True) -> Tuple[List[int], str]:
    """Save one or more documents."""
    document_model_list = list(instances)
    dataset_id = generate_dataset_id()  # Implement as per your needs
    
    # Filter existing IDs from new models to avoid duplicate entry checks
    ids_to_check = set(d['id'] for d in document_models_list)
    
    # Get existing documents to determine modifications/insertions
    queryset = Document.objects.filter(id__in=ids_to_check)
    
    # If no document exists, update/create based on valid flag
    updated_ids, created_ids = self.update_or_insert_documents(queryset, instances, with_valid)
    
    # Return list of modified/new IDs alongside dataset ID
    return updated_ids + created_ids, dataset_id

def batch_delete(self, instance_ids: list) -> None:
    """Delete multiple documents by their IDS"""
    queryset = Document.objects.filter(id__in=instance_ids)
    queryset.delete()

# Helper functions...

This rework addresses some areas of inefficiency and improves readability. Always ensure you have proper unit tests covering these changes before deployment.

@shaohuzhang1 shaohuzhang1 merged commit 50dd8fa into main Feb 26, 2025
4 checks passed
@shaohuzhang1 shaohuzhang1 deleted the pr@main@fix_paragraph_sorted branch February 26, 2025 03:52
@@ -663,6 +663,7 @@ def get_query_set(self):
**{'title__icontains': self.data.get('title')})
if 'content' in self.data:
query_set = query_set.filter(**{'content__icontains': self.data.get('content')})
query_set.order_by('-create_time', 'id')
return query_set

def list(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one minor issue with the provided code. In Django's ORM, order methods generally use a '-' sign for descending order instead of '-', unless used within a subquery context where they would be interpreted differently. However, using -create_time,-id will ensure that both fields are ordered in reverse order (descending). There might be performance considerations depending on how frequently these queries run and how much data is involved.

Additionally, adding 'created_at' to filter or annotate operations can help improve relevance based on when each item was created:

def get_queryset(self):
    query_set = super().get_queryset()
    
    if 'title' in self.data:
        query_set = query_set.filter(title__icontains=self.data.get('title'))
    if 'content' in self.data:
        query_set = query_set.filter(content__icontains=self.data.get('content'))

# Optional but recommended: Order by creation time first so older posts appear first.
if 'sortby_date' in self.request.GET and self.request.GET['sortby_date'].lower() == 'asc':
    query_set = query_set.order_by('create_time') # Default ascending order, comment this out if not needed
    
else:
    query_set = query_set.order_by('-create_time')

return query_set

This ensures proper ordering and possibly adds an extra layer of flexibility if other sorting options are added later. Additionally, it checks for a specific sort request parameter to allow either asc or desc order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant