# Simple demo
### Description:
* #### Basic demo: categorize an item into the current category list
* #### Article categorization: assign an article to an existing category or create a new one, and generate a summary

## basic demo

In [3]:
from dotenv import load_dotenv
load_dotenv()

True

In [4]:
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
# from langchain_core.pydantic_v1 import BaseModel, Field
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

In [5]:
llm = ChatOpenAI(model = "gpt-4o", temperature = 0)

In [6]:
# class Issue_list(BaseModel):
#     category: str = Field(description="category of new issue")
#     category_list: list[str] = Field(description="List of category of issues")

class IssueList(BaseModel):  # Per Python naming conventions, class names use CamelCase
    category: str = Field(description="Category of new issue")
    category_list: list[str] = Field(default_factory=list, description="List of category of issues")

parser = JsonOutputParser(pydantic_object=IssueList)

In [7]:
prompt_template = """
Categorize the new issue into current list of issues.
If the issue cannot be categorized into the current list, then add it to the category_list.

{format_instructions}

<Current_category_list>
{current_category_list}
</Current_category_list>

<new_issue>
{new_issue}
</new_issue>

"""

prompt = PromptTemplate(
    template = prompt_template,
    input_variables = ["current_category_list", "new_issue"],
    partial_variables = {"format_instructions": parser.get_format_instructions()}
    )

In [8]:
categorizer = prompt | llm | parser

In [9]:
res = categorizer.invoke({"new_issue": "macbook", "current_category_list": ['fruit', 'animal', 'cars']})
print(res)


{'category': 'macbook', 'category_list': ['fruit', 'animal', 'cars', 'macbook']}
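
`JsonOutputParser` yields a plain `dict`, so nothing guarantees that the model actually returned both fields. Below is a minimal sketch of a sanity check to run before using the result. It uses only the standard library, and `check_issue_result` is a hypothetical helper, not part of LangChain.

```python
def check_issue_result(res: dict) -> dict:
    """Raise ValueError if the parsed output lacks the expected fields."""
    category = res.get("category")
    if not isinstance(category, str) or not category.strip():
        raise ValueError(f"missing or empty 'category': {res!r}")

    # category_list is optional in the schema, so default to an empty list.
    category_list = res.get("category_list", [])
    if not isinstance(category_list, list) or not all(
        isinstance(c, str) for c in category_list
    ):
        raise ValueError(f"'category_list' must be a list of strings: {res!r}")

    return {"category": category.strip(), "category_list": category_list}


# Same shape as the output printed above.
sample = {"category": "macbook", "category_list": ["fruit", "animal", "cars", "macbook"]}
checked = check_issue_result(sample)
print(checked["category"])
```

If stricter validation is needed, the same dict could be validated against the `IssueList` model instead.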


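## Method 2

Here the model returns only the category; the category list is updated in Python instead of by the model.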

In [10]:
llm = ChatOpenAI(model = "gpt-4o", temperature = 0)

class IssueList(BaseModel):  # Per Python naming conventions, class names use CamelCase
    category: str = Field(description="Category of new issue")
    # category_list: list[str] = Field(default_factory=list, description="List of category of issues")

parser = JsonOutputParser(pydantic_object=IssueList)

prompt_template = """
Categorize the new issue into current list of issues.
If the issue cannot be categorized into the current list, then return a new category name for it.

{format_instructions}

<Current_category_list>
{current_category_list}
</Current_category_list>

<new_issue>
{new_issue}
</new_issue>

"""

prompt = PromptTemplate(
    template = prompt_template,
    input_variables = ["current_category_list", "new_issue"],
    partial_variables = {"format_instructions": parser.get_format_instructions()}
    )

categorizer = prompt | llm | parser

In [11]:
current_list = ['fruit', 'animal', 'cars']
res = categorizer.invoke({"new_issue": "macbook", "current_category_list": current_list})
# print(res)

if res['category'] not in current_list:
    print(f"new category: {res['category']}")
    current_list.append(res['category'])
print(current_list)

new category: macbook
['fruit', 'animal', 'cars', 'macbook']
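
The membership check above is case-sensitive, so a model answer such as `Fruit` would be appended as a new category even though `fruit` already exists. Here is a minimal sketch of a more forgiving merge, assuming that case and surrounding whitespace should be ignored; `merge_category` is a hypothetical helper introduced for illustration.

```python
def merge_category(current: list[str], proposed: str) -> tuple[str, bool]:
    """Return (canonical_name, is_new); append to `current` only when new."""
    key = proposed.strip().casefold()
    for existing in current:
        if existing.casefold() == key:
            return existing, False  # reuse the spelling already in the list
    name = proposed.strip()
    current.append(name)
    return name, True


categories = ["fruit", "animal", "cars"]
print(merge_category(categories, "Fruit "))   # ('fruit', False)
print(merge_category(categories, "macbook"))  # ('macbook', True)
print(categories)  # ['fruit', 'animal', 'cars', 'macbook']
```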


## Article categorization

The example articles were generated with ChatGPT using this prompt:
```
generate 3 different category of article, with label and content, ex:
label: animal
content: eagle can fly ....

the content should be between 100~200 words
```

The test articles were generated with this follow-up prompt:
```
give me 2 more contents that belong to previous categories, and two more contents in new categories
```

In [12]:
test_samples_01 = [
{ # Technology
    "article":"Blockchain technology is transforming the way data is stored and transactions are conducted. Originally developed for Bitcoin, blockchain is a decentralized ledger that ensures transparency and security in financial transactions. Its potential extends far beyond cryptocurrency; industries like supply chain management and healthcare are exploring blockchain for secure data sharing and tracking. One of its key features is immutability, meaning once data is recorded, it cannot be altered. This characteristic makes blockchain a powerful tool against fraud and data tampering, paving the way for more secure and transparent digital interactions.",
},
{# Nature
    "article": "Coral reefs are often referred to as the “rainforests of the sea” due to their immense biodiversity. These vibrant underwater ecosystems provide habitat and shelter for a vast array of marine species, from tiny plankton to large fish. Coral reefs also play a crucial role in protecting coastlines from erosion by dissipating wave energy. However, they are highly sensitive to environmental changes. Rising sea temperatures, pollution, and overfishing are some of the major threats to coral reefs. Efforts such as marine protected areas and sustainable fishing practices are vital to preserving these delicate ecosystems for future generations."
},
{# History
    "article": "The Industrial Revolution, beginning in the late 18th century, marked a major turning point in human history. Originating in Britain, it spread rapidly across Europe and the United States, fundamentally altering economies and societies. The introduction of machinery, such as the steam engine, enabled mass production, leading to the growth of factories and urbanization. This period also saw significant advancements in transportation, with the development of railways and steamships. While the Industrial Revolution brought about economic growth and technological innovation, it also led to challenging working conditions and environmental degradation, issues that continue to resonate today."
},
{# Label: Art
"article": "Impressionism, an art movement that began in the late 19th century, broke away from traditional artistic conventions. Artists like Claude Monet and Pierre-Auguste Renoir sought to capture the fleeting effects of light and color, often painting en plein air (outdoors). Their work emphasized the perception of the moment, with loose brushwork and vibrant colors that conveyed a sense of immediacy. Initially met with criticism, Impressionism eventually gained recognition and influenced many subsequent art movements. Today, it remains one of the most beloved and influential styles in the history of art, celebrated for its innovative approach and emotional depth."
},
{ #Label: Health
"article": "Mental health awareness has gained significant traction in recent years, emphasizing the importance of emotional well-being alongside physical health. Conditions such as anxiety, depression, and stress are increasingly recognized as serious health issues that require attention and care. Therapy, medication, and lifestyle changes are common methods for managing mental health, but there’s also a growing emphasis on preventive measures like mindfulness, exercise, and social support. Reducing stigma and promoting open conversations about mental health are crucial steps in ensuring that individuals receive the help they need to lead fulfilling lives."
}
]

In [13]:
test_samples_02 = [
    {# Label: Science
        "article": "Quantum computing represents the next frontier in computational power, leveraging the principles of quantum mechanics to perform complex calculations at unprecedented speeds. Unlike classical computers, which use bits as units of information, quantum computers use quantum bits, or qubits, which can exist in multiple states simultaneously. This property, known as superposition, allows quantum computers to solve problems that are currently intractable for classical systems, such as optimizing large datasets or simulating molecular interactions. While still in its early stages, quantum computing holds immense potential for advancements in cryptography, medicine, and artificial intelligence, potentially revolutionizing various fields."
        },
    {# Label: Literature
        "article":""" The novel "1984" by George Orwell is a seminal work of dystopian fiction that explores themes of totalitarianism, surveillance, and individual freedom. Set in a world where the government, led by the omnipresent Big Brother, exercises total control over its citizens, the story follows Winston Smith as he navigates a society devoid of privacy or truth. Orwell's portrayal of a world where language is manipulated, history is rewritten, and independent thought is punished serves as a powerful warning about the dangers of unchecked governmental power. "1984" remains a relevant and thought-provoking exploration of political oppression and the human spirit. """
        },
    {#Label: Sports
        "article":"""Soccer, known as football outside of North America, is the world's most popular sport, with millions of fans and players across the globe. The game is played on a rectangular field with two teams of eleven players each, aiming to score goals by getting the ball into the opposing team's net. Soccer is renowned for its simplicity, requiring minimal equipment, and its ability to bring people together across cultural and geographical boundaries. Major tournaments like the FIFA World Cup and UEFA Champions League attract global attention, showcasing the best talent and uniting fans in a shared passion for the sport."""
        }
]

In [15]:
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate, FewShotPromptTemplate
# from langchain_core.pydantic_v1 import BaseModel, Field
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model = "gpt-4o", temperature = 0)

In [16]:
class ArticleList(BaseModel):  # CamelCase, consistent with IssueList above
    category: str = Field(description="Category of new article. If the article cannot be categorized into the current list, then create a new category.")
    summary: str = Field(description="Summary of new article")

parser = JsonOutputParser(pydantic_object=ArticleList)

In [38]:
template = """
<example>
<article>
{article}
</article>

<category>
{category}
</category>

<summary>
{summary}
</summary>
</example>
"""

example_prompt = PromptTemplate(template = template, input_variables = ["article", "category", "summary"])

In [39]:
examples = [
    { # Technology
        "article": "Artificial intelligence (AI) has rapidly evolved in recent years, becoming an integral part of various industries. From healthcare to finance, AI is revolutionizing how tasks are performed, improving efficiency and accuracy. In healthcare, AI is used for predictive diagnostics and personalized treatment plans, enabling doctors to provide better patient care. In the financial sector, AI algorithms help detect fraud and make data-driven investment decisions. As AI continues to advance, ethical considerations such as privacy and job displacement become increasingly important. Balancing innovation with these concerns will be key to harnessing AI’s full potential.",
        "category": "Technology",
        "summary": "AI is transforming industries like healthcare and finance, improving efficiency, but ethical challenges like privacy and job displacement remain."
    },
    {
        "article":
        "The Amazon Rainforest, often referred to as the “lungs of the Earth,” plays a critical role in regulating the global climate. Spanning across nine countries in South America, it is home to an incredibly diverse array of plant and animal species. The dense canopy of trees in the Amazon absorbs large amounts of carbon dioxide, helping to mitigate the effects of climate change. However, deforestation poses a significant threat to this vital ecosystem. Conservation efforts are crucial to preserving the Amazon’s biodiversity and ensuring that it continues to contribute to the health of our planet.",
        "category": "Nature",
        "summary": "The Amazon Rainforest is vital for climate regulation, but deforestation threatens its biodiversity and environmental role."
    },
    {
        "article": "The Renaissance, which spanned from the 14th to the 17th century, was a period of great cultural and intellectual growth in Europe. This era saw the revival of classical art, literature, and learning, with figures like Leonardo da Vinci and Michelangelo leading the way in artistic innovation. The invention of the printing press by Johannes Gutenberg in the mid-15th century revolutionized the dissemination of knowledge, making books more accessible to the public. The Renaissance laid the foundation for the modern world, fostering a spirit of inquiry and creativity that continues to influence contemporary thought and culture.",
        "category": "History",
        "summary": "The Renaissance was a cultural revival in Europe, advancing art and knowledge, and laying the groundwork for modern thought."
    },
]

In [40]:
categorizer_template = """Please, categorize the following article into the correct category, and provide a brief summary of the article:
<article>
{article}
</article>

<list_of_categories>
{list_of_categories}
</list_of_categories>

{format_instructions}
"""

In [41]:
# each example is rendered with example_prompt; categorizer_template is appended as the suffix
fewshot_prompt = FewShotPromptTemplate(
    example_prompt = example_prompt,
    examples = examples,
    suffix = categorizer_template,
    input_variables=["article", "list_of_categories"],
    partial_variables = {
        "format_instructions": parser.get_format_instructions()}
)
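
To make the assembly explicit, here is a rough standard-library sketch of the idea behind a few-shot prompt: render each example with the example template, then append the suffix filled with the actual inputs. This is an illustration only, not LangChain's implementation; `render_few_shot`, the shortened templates, and the `"\n\n"` separator are assumptions made for the sketch.

```python
EXAMPLE_TEMPLATE = (
    "<example>\n"
    "<article>\n{article}\n</article>\n\n"
    "<category>\n{category}\n</category>\n\n"
    "<summary>\n{summary}\n</summary>\n"
    "</example>"
)

SUFFIX_TEMPLATE = (
    "Categorize the following article:\n"
    "<article>\n{article}\n</article>\n\n"
    "<list_of_categories>\n{list_of_categories}\n</list_of_categories>"
)


def render_few_shot(examples: list[dict], suffix: str, separator: str = "\n\n", **inputs: str) -> str:
    """Render every example, then append the suffix filled with the inputs."""
    rendered = [EXAMPLE_TEMPLATE.format(**ex) for ex in examples]
    rendered.append(suffix.format(**inputs))
    return separator.join(rendered)


# Toy data, much shorter than the real examples in this notebook.
toy_examples = [
    {"article": "AI is changing healthcare.", "category": "Technology", "summary": "AI in healthcare."},
]
text = render_few_shot(
    toy_examples,
    SUFFIX_TEMPLATE,
    article="Coral reefs protect coastlines.",
    list_of_categories="Technology, Nature",
)
print(text)
```

The examples always come before the task text, so the model sees the expected article/category/summary pattern before the article it has to label.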

In [42]:
prompt_check = fewshot_prompt.format(
    article = test_samples_01[2].get("article"),
    list_of_categories = "Technology, History, Art"
    )
print(prompt_check)


<example>
<article>
Artificial intelligence (AI) has rapidly evolved in recent years, becoming an integral part of various industries. From healthcare to finance, AI is revolutionizing how tasks are performed, improving efficiency and accuracy. In healthcare, AI is used for predictive diagnostics and personalized treatment plans, enabling doctors to provide better patient care. In the financial sector, AI algorithms help detect fraud and make data-driven investment decisions. As AI continues to advance, ethical considerations such as privacy and job displacement become increasingly important. Balancing innovation with these concerns will be key to harnessing AI’s full potential.
</article>

<category>
Technology
</category>

<summary>
AI is transforming industries like healthcare and finance, improving efficiency, but ethical challenges like privacy and job displacement remain.
</summary>
</example>



<example>
<article>
The Amazon Rainforest, often referred to as the “lungs of the E

In [46]:
test_samples_01[2].get("article")

'The Industrial Revolution, beginning in the late 18th century, marked a major turning point in human history. Originating in Britain, it spread rapidly across Europe and the United States, fundamentally altering economies and societies. The introduction of machinery, such as the steam engine, enabled mass production, leading to the growth of factories and urbanization. This period also saw significant advancements in transportation, with the development of railways and steamships. While the Industrial Revolution brought about economic growth and technological innovation, it also led to challenging working conditions and environmental degradation, issues that continue to resonate today.'

In [44]:
chain = fewshot_prompt | llm | parser

In [47]:
res = chain.invoke({
    "article": test_samples_01[2].get("article"),
    "list_of_categories": "Technology, History, Art"
    })
print(res)

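# Note: current_list still holds the list from the basic demo above,
# so 'History' is reported as a new category here.
if res['category'] not in current_list: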
    print(f"new category: {res['category']}")
    current_list.append(res['category'])
print(current_list)

{'category': 'History', 'summary': 'The Industrial Revolution, starting in the late 18th century, transformed economies and societies with machinery and mass production, but also introduced challenging working conditions and environmental issues.'}
new category: History
['fruit', 'animal', 'cars', 'macbook', 'History']


In [48]:
current_list = {"Technology", "History"}
# print(type(current_list))

for sample in test_samples_02:
    res = chain.invoke({
        "article": sample.get("article"),
        "list_of_categories": ", ".join(sorted(current_list))
    })
    print(res)
    current_list.add(res.get("category"))
    print(current_list)

{'category': 'Technology', 'summary': 'Quantum computing uses qubits and superposition to perform complex calculations, promising advancements in cryptography, medicine, and AI.'}
{'Technology', 'History'}
{'category': 'Literature', 'summary': "George Orwell's '1984' is a dystopian novel exploring themes of totalitarianism, surveillance, and individual freedom, highlighting the dangers of unchecked governmental power."}
{'Literature', 'Technology', 'History'}
{'category': 'Sports', 'summary': "Soccer, the world's most popular sport, is played globally and unites people through major tournaments like the FIFA World Cup and UEFA Champions League."}
{'Literature', 'Technology', 'History', 'Sports'}
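
Because a `set` has no fixed iteration order, the category string sent to the model can differ from run to run. Here is a small sketch of a formatting step that keeps the prompt stable and the names clearly separated; `format_categories` is a hypothetical helper, not something defined earlier in this notebook.

```python
def format_categories(categories) -> str:
    """Join category names into a sorted, comma-separated string."""
    return ", ".join(sorted(categories))


print(format_categories({"Technology", "History"}))  # History, Technology
```

Sorting costs almost nothing at this size and makes prompts reproducible, which helps when comparing outputs at `temperature = 0`.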
