In [22]:
azure_configs = {
    "base_url": "https://nitoropenai.openai.azure.com/",
    "model_deployment": "gpt-4",
    "model_name": "gpt-4",
    "embedding_deployment": "text-embedding-ada-002",
    "embedding_name": "text-embedding-ada-002",  # most likely
}


In [23]:
from dotenv import load_dotenv
load_dotenv()

True
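
`load_dotenv()` returning `True` only says that a `.env` file was found, not that it holds the credentials the Azure clients need. A minimal sketch of a pre-flight check follows. It assumes the API key is supplied through `AZURE_OPENAI_API_KEY`, the variable `langchain_openai` reads by default; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names introduced here, not part of any library.

```python
import os

# Assumption: the key is provided via this variable; extend the tuple
# if your setup also relies on, e.g., an endpoint or API version variable.
REQUIRED_VARS = ("AZURE_OPENAI_API_KEY",)


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Report missing variables instead of failing later inside an API call.
missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```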

In [27]:
from langchain_openai.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper


azure_llm = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    validate_base_url=False,
)

# init the embeddings for answer_relevancy, answer_correctness and answer_similarity
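azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["embedding_deployment"],
    model=azure_configs["embedding_name"],
)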

azure_llm = LangchainLLMWrapper(azure_llm)
azure_embeddings = LangchainEmbeddingsWrapper(azure_embeddings)

---

In [42]:
answer = """
Based solely on the provided text, several misconceptions hinder young founders:\n\n1. **Misunderstanding of \"Work Experience\":**  Young founders often misunderstand the meaning of \"work experience.\" They may believe it refers to specific skills or expertise, rather than the elimination of childhood habits like \"flaking\" and the development of an understanding of the inherently difficult nature of work and its relationship to money. This misunderstanding can lead them to underestimate the importance of developing these crucial attributes.\n\n2. **Overemphasis on Effort over Results:** Young founders may believe that hard work alone guarantees success, a misconception stemming from the \"good effort\" reward system in school.  The text explicitly states that the market doesn't reward effort; it rewards results that meet user needs. This misunderstanding can lead to wasted time and resources on projects that don't deliver value.\n\n3. **Naive View of Wealth:**  The text suggests that young founders may equate wealth with superficial things like Ferraris and admiration, rather than understanding its true value as a means to escape the \"brutal equation\" of working long hours to avoid starvation. This limited perspective can affect their motivation and strategic decision-making.\n\n4. **Underestimating the Importance of User Focus:**  The text implies that young founders, lacking experience in the relationship between work and money, may not automatically focus on the user's needs.  This lack of focus can lead to the development of products or services that fail to meet market demands.\n\n5. **Ignoring the Importance of Adaptability:** The text highlights that many startups end up doing something different than initially planned.  A rigid, pre-ordained plan, coupled with significant spending, is detrimental.  
Young founders may not fully grasp the need for flexibility and adaptation in their approach.\n\n\nIn summary, young founders often hold misconceptions about the nature of work, the value of wealth, the importance of user focus, and the need for adaptability. These misconceptions can lead to inefficient resource allocation, a lack of focus on user needs, and ultimately, hinder their progress in building successful startups.\n
"""

In [25]:
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import FactualCorrectness


sample = SingleTurnSample(
    response="The Eiffel Tower is located in Paris.",
    reference="The Eiffel Tower is located in Paris. I has a height of 1000ft."
)

scorer = FactualCorrectness()
scorer.llm = azure_llm
await scorer.single_turn_ascore(sample)

0.67
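
The 0.67 above is consistent with an F1 score over claims. The sketch below reproduces the arithmetic under two assumptions that the notebook does not confirm: the scorer runs in an F1 mode, and the judge LLM split the response into one claim (location, supported by the reference) and the reference into two claims (location and height, the latter missing from the response). `f1_from_claims` is a hypothetical helper, not a ragas function.

```python
def f1_from_claims(tp: int, fp: int, fn: int) -> float:
    """F1 from counts of supported, unsupported, and missed claims."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumed decomposition: 1 response claim supported by the reference (tp),
# 0 unsupported response claims (fp), 1 reference claim not covered (fn).
score = f1_from_claims(tp=1, fp=0, fn=1)
print(round(score, 2))  # 0.67
```

Precision is 1.0 and recall is 0.5 here, so the response is penalized only for omitting the height.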

In [43]:
from ragas import SingleTurnSample
from ragas.metrics import ResponseRelevancy

sample = SingleTurnSample(
    user_input="What common misconceptions do young founders (e.g., recent college graduates) have about building successful startups, and how do these misconceptions hinder their progress?",
    response=answer,
)

scorer = ResponseRelevancy()
scorer.llm = azure_llm
scorer.embeddings = azure_embeddings
await scorer.single_turn_ascore(sample)

0.9459876223990668

In [32]:
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import SemanticSimilarity


sample = SingleTurnSample(
    response="The Eiffel Tower is located in Paris.",
    reference="The Eiffel Tower is located in Paris. I has a height of 1000ft."
)

scorer = SemanticSimilarity()
scorer.embeddings = azure_embeddings
await scorer.single_turn_ascore(sample)

0.9379679986573374
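
Semantic similarity of this kind is typically the cosine similarity between the embedding of the response and the embedding of the reference. The sketch below shows that computation on toy vectors. The three-dimensional vectors are made up for illustration and are not real `text-embedding-ada-002` outputs; `cosine_similarity` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for the response and reference embeddings (illustrative only).
response_vec = [0.2, 0.7, 0.1]
reference_vec = [0.25, 0.6, 0.3]
print(cosine_similarity(response_vec, reference_vec))
```

A high score, as in the cell above, therefore reflects topical overlap rather than factual agreement: the response still omits the height.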

In [34]:
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import NonLLMStringSimilarity

sample = SingleTurnSample(
    response="The Eiffel Tower is located in India.",
    reference="The Eiffel Tower is located in Paris."
)

scorer = NonLLMStringSimilarity()
await scorer.single_turn_ascore(sample)

0.8918918918918919
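
The score above equals 33/37, which is what a Levenshtein distance normalized by the longer string's length would give: both strings have 37 characters and differ by four substitutions ("India" versus "Paris" share only the fourth letter). The sketch below reproduces that in plain Python. It assumes the metric's default distance is normalized Levenshtein; `levenshtein` and `string_similarity` are hypothetical helpers, not the library's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) via row-wise DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """1 minus the edit distance normalized by the longer string's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


response = "The Eiffel Tower is located in India."
reference = "The Eiffel Tower is located in Paris."
similarity = string_similarity(response, reference)
print(levenshtein(response, reference), similarity)
```

The example also shows the metric's weakness: a factually wrong answer scores about 0.89 because only a few characters differ.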

In [48]:
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import SimpleCriteriaScore


sample = SingleTurnSample(
    user_input="Where is the Eiffel Tower located?",
    response="The Eiffel Tower is located in France.",
    reference="The Eiffel Tower is located in France."
)

scorer = SimpleCriteriaScore(
    name="coarse_grained_score",
    definition="Score 0 to 5 by similarity, 0 is the lowest and 5 is the highest",
    llm=azure_llm
)

await scorer.single_turn_ascore(sample)

5

In [None]:
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

correctness_metric = GEval(
    name="Correctness",
    # NOTE: provide either criteria or evaluation_steps, not both; evaluation_steps is used here
    # criteria="Determine whether the actual output is factually correct based on the expected output.",
    evaluation_steps=[
        "Check whether the facts in 'actual output' contradicts any facts in 'expected output'",
        "You should also heavily penalize omission of detail",
        "Vague language, or contradicting OPINIONS, are OK",
    ],
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
)

In [73]:
from langchain_openai import AzureChatOpenAI
from deepeval.models.base_model import DeepEvalBaseLLM
import os
from dotenv import load_dotenv

load_dotenv()

class DEAzureOpenAI(DeepEvalBaseLLM):
    """Adapter that lets deepeval metrics call a LangChain chat model."""

    def __init__(
        self,
        model,
        model_name
    ):
        self.model = model
        self.model_name = model_name

    def load_model(self):
        return self.model

    def generate(self, prompt: str) -> str:
        chat_model = self.load_model()
        return chat_model.invoke(prompt).content

    async def a_generate(self, prompt: str) -> str:
        chat_model = self.load_model()
        res = await chat_model.ainvoke(prompt)
        return res.content

    def get_model_name(self):
        return self.model_name


# gpt_35 = AzureChatOpenAI(
#     openai_api_version=os.getenv("OPENAI_API_VERSION"),
#     azure_deployment=os.getenv("AZURE_OPENAI_DEPLOYMENT_GPT_35"),
#     azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
#     openai_api_key=os.getenv("AZURE_OPENAI_API_KEY"),
# )
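# Re-create the raw LangChain client: azure_llm was rebound to a ragas
# LangchainLLMWrapper earlier, and the adapter above needs .invoke/.ainvoke.
azure_llm = AzureChatOpenAI(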
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    validate_base_url=False,
)

# gpt_35 = DEAzureOpenAI(model=gpt_35, model_name="Azure OpenAI GPT-3.5")
gpt_4 = DEAzureOpenAI(model=azure_llm, model_name="Azure OpenAI GPT-4")


In [77]:
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams

correctness_metric = GEval(
    model=gpt_4,
    name="Correctness",
    evaluation_steps=[
        "Check whether the facts in 'actual output' contradicts any facts in 'expected output'",
        "You should also heavily penalize omission of detail",
        "Vague language, or contradicting OPINIONS, are OK"
    ],
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
)

In [78]:
import nest_asyncio
nest_asyncio.apply()

In [104]:
answer = """The provided text states that the decrease in the percentage of inherited wealth among the richest Americans from 1982 to 2020 is not due to increased inheritance taxes; in fact, these taxes decreased significantly during that period.  Instead, the decline is attributed to the rise in the number of people creating new fortunes.  This increase in newly created wealth is primarily driven by two factors:\n\n1. **Company Founding:**  Approximately three-quarters of new fortunes in 2020 stemmed from founding companies or early employee equity.  This contrasts sharply with 1982, where inheritance was the dominant source of wealth for the richest individuals.\n\n2. **Investment Management:**  Another significant factor is the rise of successful investment fund managers.  While hedge funds and private equity firms existed in 1982, none of their founders were wealthy enough to be among the top 100 richest Americans at that time.  By 2020, however, 17 of the 73 new fortunes were attributable to managing investment funds.  This reflects both the development of new high-return investment strategies and increased investor trust in these fund managers.\n\nIn summary, the shift away from inherited wealth as the primary source of riches among the Forbes 400 between 1982 and 2020 is primarily explained by the substantial increase in wealth creation through company founding and successful investment management, not by changes in inheritance tax laws."""

In [105]:
from deepeval.test_case import LLMTestCase
...

test_case = LLMTestCase(
    input="What were the main reasons for the decline in inherited wealth as the primary source of riches among the Forbes 400 between 1982 and 2020?",
    actual_output=answer,
    expected_output="The decline in inherited wealth isn't due to increased inheritance taxes (which actually decreased during this period).  Instead, it's primarily because more people are creating new fortunes through starting companies and investing, especially in the tech sector. This shift is largely attributed to the decreasing cost of starting businesses, driven by technological advancements, and the rise of new, high-growth industries. The rise of venture capital and private equity has also played a significant role, providing funding and expertise to help startups scale rapidly.  Essentially, the opportunities for creating wealth through entrepreneurship and investment have expanded significantly, outpacing the accumulation of wealth through inheritance."
)

correctness_metric.measure(test_case)
print(correctness_metric.score)
print(correctness_metric.reason)

0.8
The actual output accurately identifies the main reasons for the decline in inherited wealth in line with the expected output, but omits detail about the role of tech sector and the decreasing cost of starting businesses due to technological advancements.
