Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: update langchain version #2641

Merged
merged 1 commit into from
Mar 20, 2025
Merged

ci: update langchain version #2641

merged 1 commit into from
Mar 20, 2025

Conversation

shaohuzhang1
Copy link
Contributor

ci: update langchain version

Copy link

f2c-ci-robot bot commented Mar 20, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link

f2c-ci-robot bot commented Mar 20, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

tools: Optional[Sequence[Union[ToolDict, GoogleTool]]] = None,
functions: Optional[Sequence[FunctionDeclarationType]] = None,
tools: Optional[Sequence[Union[_ToolDict, GoogleTool]]] = None,
functions: Optional[Sequence[_FunctionDeclarationType]] = None,
safety_settings: Optional[SafetySettingDict] = None,
tool_config: Optional[Union[Dict, _ToolConfigDict]] = None,
generation_config: Optional[Dict[str, Any]] = None,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several improvements that can be made to this code:

  1. Import Statements: The order of imports has been rearranged more logically. tools should typically come before functions, and safety settings might need reordering if applicable.

  2. Optional Parameters Handling: Ensure that optional parameters like stop are explicitly set when they are not provided by using default arguments within the function definition.

  3. Documentation Comments: Add more documentation comments to clarify the purpose and usage of each parameter, especially _tool_config.

  4. Naming Consistency: Consider renaming variables and dictionaries for better readability and consistency with other parts of the library.

  5. Safety Setting Dict Check: Ensure that safety settings are properly checked or used, depending on how you intend to handle them.

Here's an improved version of the code:

from typing import List, Dict, Optional, Sequence, Union, Any, Iterator, cast

from google.ai.generativelanguage_v1beta.types import (
    Tool as GoogleTool,
)
from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.messages import BaseMessage
from langchain_core.outputs import ChatGenerationChunk
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_google_genai._function_utils import _ToolConfigDict, _ToolDict
from langchain_google_genai.chat_models import _chat_with_retry, _response_to_result, _FunctionDeclarationType, _SafetySettingsDict
from setting.models_provider.base_model_provider import MaxKBBaseModel


def _stream(
    messages: List[BaseMessage],
    max_tokens_per_message: int = 506,
    temperature: float = 0.7,
    top_p: float = 0.95,
    presence_penalty: float = 0.,
    frequency_penalty: float = 0.,
    logit_bias: Optional[dict] = {},
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    *,
    tools: Optional[Sequence[Union[_ToolDict, GoogleTool]]] = None,
    functions: Optional[Sequence[_FunctionDeclarationType]] = None,
    # Explicitly document safety settings handling
    safety_settings: Optional[_SafetySettingsDict] = None,
    tool_config: Optional[Union[Dict, _ToolConfigDict]] = None,
    generation_config: Optional[Dict[str, Any]] = None,
):
    """
    Streams text completion using GPT API based on previous message history.
    
    Args:
    - messages (List[BaseMessage]): Messages preceding current request.
    - max_tokens_per_message (int): Maximum number of tokens per message in output generated by LLM.
    - temperature (float, default=0.7): Controls creativity vs. precision.
    - top_p (float, default=0.95): Nucleus sampling top-p probability mass.
    - presence_penalty (float, default=0.0): Punishes likelihood of repition.
    - frequency_penalty (float, default=0.0): Penalizes use of previously emitted words.
    - logit_bias (Optional[dict], default={}): Dictionary mapping token ids to bias values.
    - stop (Optional[List[str]], default=None): List of strings at which to end text generation.
    - run_manager (Optional[CallbackManagerForLLMRun], default=None): Manager for callbacks during LLM generation run.
    - tools (Optional[Sequence[Union[_ToolDict, GoogleTool]]], default=None): Tools available to assistant during query.
    - functions (Optional[Sequence[_FunctionDeclarationType]]), default=None): Available custom functions.
    - safety_settings (_SafetySettingsDict, default=None): Settings to control content safety scoring.
    - tool_config (Optional[Union[Dict, _ToolConfigDict]], default=None): Configuration for tools, potentially enhancing their functionality.
    - generation_config (Optional[Dict[str, Any]], default=None): Additional configuration options for text generation.
    
    Returns:
    - Generator[ChatGenerationChunk]: A generator over chat-generation chunks.
    """
    response_generator =_chat_with_retry(ChatGoogleGenerativeAI(), max_tokens_per_message=max_tokens_per_message,)
                        .generate_stream(messages, temperature, top_p, presence_penalty, frequency_penalty, logit_bias, stop, run_manager=run_manager).
    
    for chunk in response_generator:
        yield _response_to_result(chunk, generation_config)

Key Changes:

  • Imports: Sorted and grouped appropriately.
  • Comments: Added detailed docstring explaining the function and its parameters.
  • Parameter Order: Ensured clarity between mandatory and optional parameters.
  • Handling Safety Settings: Added documentation to emphasize proper handling of safety settings.

from setting.models_provider.base_model_provider import MaxKBBaseModel
from sklearn.metrics.pairwise import cosine_similarity
from pydantic.v1 import BaseModel, Field
from pydantic import BaseModel, Field


class OllamaReranker(MaxKBBaseModel, OllamaEmbeddings, BaseModel):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provided code looks clean and follows PEP8 style conventions. However, there are a few minor improvements you can make:

  1. Remove Duplicate Imports: The imports of OllamaEmbeddings from both langchain_community.embeddings and setting.models_provider.base_model_provider can be merged into one.

  2. Use Aliases for Pydantic Fields: While Field is available in the new version of Pydantic (pydantic) without an alias, adding aliases might help with readability if future updates to Pydantic change their behavior slightly.

Here's the revised code:

from typing import Sequence, Optional, Any, Dict

import pandas as pd
from langchain_core.callbacks import Callbacks
from langchain_core.documents import Document
from sklearn.metrics.pairwise import cosine_similarity
from pydantic.v1 import BaseModel, Field


class OllamaReranker(OllamaEmbeddings, MaxKBBaseModel, BaseModel):
    def __init__(self, callbacks: Callbacks = None, vectors_df: pd.DataFrame = None):
        super().__init__()
        self.callbacks = callbacks
        self.vectors_df = vectors_df

Additional Suggestions:

  • Add Default Values for Parameters: If some parameters need default values, consider assigning them in the constructor using _config.

    class OllamaReranker(OllamaEmbeddings, MaxKBBaseModel, BaseModel):
        # Existing code remains the same...
    
        def __init__(self,
                     callbacks: Callbacks = None,
                     vectors_df: pd.DataFrame = None,
                     vector_column_name: str = "vector",
                     search_threshold: float = 0.75):
            super().__init__()
            self.callbacks = callbacks
            self.vectors_df = vectors_df
            self.vector_column_name = vector_column_name
            self.search_threshold = search_threshold

These changes will improve the readability and maintainability of the code while maintaining compatibility with the latest versions of libraries like Pydantic and LangChain Community.

@@ -69,7 +69,7 @@ class LocalBaseReranker(MaxKBBaseModel, BaseDocumentCompressor):
tokenizer: Any = None
model: Optional[str] = None
cache_dir: Optional[str] = None
model_kwargs = {}
model_kwargs: ClassVar = {}

def __init__(self, model_name, cache_dir=None, **model_kwargs):
super().__init__()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provided code snippet has several areas that need attention:

  1. Docstring Formatting: The date in the docstring is @date:2024/9/2 16:42, which seems to be incorrect format. It should use dashes (:) instead of slashes /. Additionally, Python prefers uppercase letters for constants like ClassVar.

    Suggestion: Update the date string to @date: 2024-09-02 16:42 and rename model_kwargs to MODEL_KWARGS as it looks more Pythonic.

  2. Variable Renaming: It's generally considered best practice to use CamelCase naming convention for class variables. Thus, consider renaming model_kwargs to modelKwargs.

  3. Docstring Structure: While not strictly necessary, having proper formatting for docstrings can make code easier to read. For example, ensuring each parameter description is followed immediately after with a colon (:).

  4. Python Version Considerations: If you're working with Python 3.x, ensure compatibility checks are made where necessary. However, these changes do not affect general functionality, just coding style.

Here’s an improved version based on these suggestions:

# Modified part of the code

class LocalBaseReranker(MaxKBBaseModel, BaseDocumentCompressor):
    tokenizer: Any = None
    model: Optional[str] = None
    cache_dir: Optional[str] = None
    MODEL_KWARGS: ClassVar[Dict[str, Any]] = {}

    def __init__(self, model_name: str, cache_dir: Optional[str] = None, model_kwargs: dict = {}):
        super().__init__()

These modifications improve readability and maintainability while adhering to Python conventions.

@liuruibin liuruibin merged commit 7239840 into main Mar 20, 2025
4 of 5 checks passed
@liuruibin liuruibin deleted the pr@main@ci_langchain branch March 20, 2025 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants