# **Reuters-21578 News Classification Using Generative AI**

## **Key Features of the Approach**
- **Zero-shot Classification**:
  - Classify articles directly with a pre-trained language model, using a prompt that lists the target topics but contains no labeled examples.
- **One-shot Classification**:
  - Guide the model by including one labeled example per topic in the prompt.
- **Generative AI**:
  - Use a large language model (here GPT-4, accessed through `langchain-openai`) to generate the predicted topic as structured output.
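
The difference between the two settings comes down to whether a labeled example is placed in the prompt. The following is a minimal sketch of that idea in plain Python. The `build_prompt` helper and the sample passage are hypothetical illustrations, not code from this notebook, which builds its prompts with `ChatPromptTemplate` further down.

```python
# Hypothetical helper for illustration; the notebook itself uses ChatPromptTemplate.
TOPICS = ["money-fx", "ship", "interest", "acq", "earn"]

def build_prompt(passage, example=None):
    """Build a zero-shot prompt, or a one-shot prompt when `example` is given.

    `example` is an optional (passage, topic) pair shown to the model
    before the passage that should be classified.
    """
    lines = ["Classify the passage into exactly one of: " + ", ".join(TOPICS) + "."]
    if example is not None:
        example_passage, example_topic = example
        lines.append(f"Example passage: {example_passage}")
        lines.append(f"Example topic: {example_topic}")
    lines.append(f"Passage: {passage}")
    return "\n".join(lines)

# Made-up passage, used only to show the two prompt shapes.
zero_shot = build_prompt("Bank cuts base lending rate.")
one_shot = build_prompt(
    "Bank cuts base lending rate.",
    example=("Shr five cts vs one ct", "earn"),
)
print(zero_shot)
print("---")
print(one_shot)
```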

# Import Libraries

In [1]:
!pip install httpx==0.27.2



In [9]:
! pip -qqq install langchain-openai


In [4]:
from google.colab import userdata

In [17]:
import os
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY1')

In [11]:
from typing import List

import numpy as np
import pandas as pd
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from openai import OpenAI
from pydantic import BaseModel, Field

# Plain OpenAI client that retries failed requests up to 5 times.
client = OpenAI(max_retries=5)

In [7]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Dataset

In [8]:
file_path = '/content/drive/MyDrive/topics_classification_dataset.csv'
df = pd.read_csv(file_path)
df

Unnamed: 0,LEWISSPLIT,Text,Topics
0,TRAIN,JAGUAR SEES STRONG GROWTH IN NEW MODEL SALES J...,earn
1,TRAIN,NORD RESOURCES CORP <NRD> 4TH QTR NET Shr 19 c...,earn
2,TRAIN,FIVE GROUPS APPLY TO BUY FRENCH TELEPHONE GROU...,acq
3,TRAIN,BLIZZARD CLOSES BOSPHORUS Blizzard conditions ...,ship
4,TRAIN,JAPAN FUND <JPN> SEEKERS CONFIDENT OF FINANCIN...,acq
...,...,...,...
7052,TRAIN,BAKER INTERNATIONAL CORP SUES HUGHES TOOL SEEK...,acq
7053,TRAIN,USAIR GROUP REJECTS TRANS WORLD AIRLINES TAKEO...,acq
7054,TRAIN,BAKER <BKO> SUES TO FORCE HUGHES <HT> MERGER B...,acq
7055,TRAIN,SPAIN DEREGULATES BANK DEPOSIT INTEREST RATES ...,interest


# Initialize LLM

In [20]:
class Classification(BaseModel):
    topics: List[str] = Field(
        description=(
            "The topic that the article belongs to. You must choose exactly one. "
            "Possible values are: 'money-fx', 'ship', 'interest', 'acq', 'earn'."
        )
    )

llm = ChatOpenAI(temperature=0, model="gpt-4").with_structured_output(Classification)
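
Because `topics` is declared as `List[str]`, the schema alone does not prevent the model from returning no label, several labels, or a label outside the five allowed values. One way to guard against this is a check applied to `response.topics` after each call. The sketch below assumes a hypothetical helper named `validate_topics`, which is not part of the original notebook.

```python
# Hypothetical post-processing helper; not part of the original notebook.
ALLOWED_TOPICS = {"money-fx", "ship", "interest", "acq", "earn"}

def validate_topics(topics):
    """Return the single predicted topic in lowercase, or raise ValueError."""
    if len(topics) != 1:
        raise ValueError(f"expected exactly one topic, got {len(topics)}")
    topic = topics[0].strip().lower()
    if topic not in ALLOWED_TOPICS:
        raise ValueError(f"unexpected topic: {topic!r}")
    return topic

print(validate_topics(["earn"]))
```

In the notebook this could be called as `validate_topics(response.topics)` once a response has been obtained.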

In [18]:
prompt = ChatPromptTemplate.from_template(
     """
You are an expert news classifier. Your task is to read the following passage and classify it into exactly one of the following topics:

- money-fx
- ship
- interest
- acq
- earn

You **must** choose exactly one topic from the list above. If the passage does not perfectly match any topic, choose the closest relevant topic based on its overall context.

**Output Format:**
Provide your answer in JSON format matching the 'Classification' class.

Example:
{{
    "topics": ["earn"]
}}

Passage:
{input}
"""
)

In [33]:
# Take the first TRAIN article by position rather than by index label.
response = llm.invoke(prompt.format(input=df.loc[df['LEWISSPLIT'] == 'TRAIN', 'Text'].iloc[0]))

print(response.topics)

['earn']


In [34]:
df.loc[df['LEWISSPLIT'] == 'TRAIN', 'Topics'].iloc[0]

'earn'
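
Comparing a single prediction with its label is only a spot check. A rough sketch of extending it to accuracy over a sample follows. It assumes a hypothetical `classify` callable that would wrap `llm.invoke(...)` and return one topic string. So that the block runs without an API key, a keyword-based stand-in is used instead of the model, and the sample texts are shortened headlines from the table above.

```python
def accuracy(texts, labels, classify):
    """Fraction of texts whose predicted topic equals the gold label."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    if not texts:
        return 0.0
    correct = sum(1 for text, label in zip(texts, labels) if classify(text) == label)
    return correct / len(texts)

# Stand-in for the LLM call, for illustration only: it is NOT the model.
def keyword_stub(text):
    return "earn" if "NET" in text.upper() else "acq"

sample_texts = [
    "NORD RESOURCES CORP <NRD> 4TH QTR NET",
    "FIVE GROUPS APPLY TO BUY FRENCH TELEPHONE",
    "BLIZZARD CLOSES BOSPHORUS",
]
sample_labels = ["earn", "acq", "ship"]

sample_accuracy = accuracy(sample_texts, sample_labels, keyword_stub)
print(f"{sample_accuracy:.2f}")
```

With the real model, `classify` would format the prompt, call `llm.invoke`, and return the first entry of `response.topics`.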

# Prompt Engineering for One-shot Classification



In [35]:
examples = [
    {
        "example_passage": "Zambia will reintroduce a modified\nforeign exchange auction at the end of this month as part of a\nnew two-tier exchange rate, central bank governor Leonard\nChivuno said.\n    Chivuno told a press conference at the end of three weeks\nof negotiations with the International Monetary Fund (IMF) that\nthere would be a fixed exchange rate for official transactions\nand a fluctuating rate, decided by the auction, for other types\nof business.\n    The Bank of Zambia previously held weekly auctions to\ndistribute foreign exchange to the private sector and determine\nthe kwacha's exchange rate, but these were suspended at the end\nof January.\n    President Kenneth Kaunda said at the time that he was\nsuspending the auction system in view of the rapid devaluation\nand violent fluctuations of the exchange rate which had\nresulted.\n    Business and banking sources said another reason for\nsuspending the auction was that the central bank was low on\nforeign exchange and was 10 weeks behind in paying successful\nbidders.    The kwacha stood at 2.2 per dollar when the auction system\nwas first introduced in October 1985, but it slid to around 15\nper dollar by the time it was suspended 16 months later.\n    Since then, Zambia has operated a fixed exchange rate of\nabout nine kwacha per dollar.\n REUTER\n\x03",
        "example_classification": {"topics": ["money-fx"]},
    },
    {
        "example_passage": "Some 10 Indian ships have been held up\nat Calcutta port after four days of industrial action by local\nseamen, a spokesman for the shipowners' association INSA said.\n    The dispute has prevented local crewmen signing on and off,\nbut has not affected foreign ships with international crews\ndocking at Calcutta, which exports tea and jute and imports\nmachinery, crude oil and petroleum products, the spokesman\nsaid.\n    Foreign ships may also suffer if dock workers join the\naction, he said. The Shipping Corporation of India (SCI) has\nasked its ships to avoid the port until the dispute is over,\nNational Union of Seafarers in India president Leo Barnes said.\n Reuter\n\x03",
        "example_classification": {"topics": ["ship"]},
    },
    {
        "example_passage": "British bank base lending rates are\nlikely to fall by as much as one full point to 9-1/2 pct this\nweek following the sharp three billion stg cut in the U.K.\nCentral government borrowing target to four billion stg set in\ntoday's 1987 budget, bank analysts said.\n    The analysts described Chancellor of the Exchequer Nigel\nLawson's budget as cautious, a quality which currency and money\nmarkets had already started to reward.\n    Sterling surged on foreign exchange markets and money\nmarket interest rates moved sharply lower as news of the budget\nmeasures came through, the analysts said.\n    Lloyds merchant bank chief economist Roger Bootle said he\nexpected base rates to be cut by one full point tomorrow.\n    \"This is very much a safety-first budget in order to get\ninterest rates down,\" he said.\n    Bootle said the money markets had almost entirely\ndiscounted such a one point cut, with the key three month\ninterbank rate down to 9-11/16 pct from 9-13/16 last night, and\nit ",
        "example_classification": {"topics": ["interest"]},
    },
    {
        "example_passage": "Irving Bank Corp said it bought the\nfactoring division of Associates Commercial Corp, a unit of\nGulf and Western Co Inc's Associates Corp of North America.\n    The terms of the previously announced deal were not\ndisclosed.\n    It said the assets were transferred to Irving Commercial\nCorp.\n\n Reuter\n\x03",
        "example_classification": {"topics": ["acq"]},
    },
    {
        "example_passage": "Shr five cts vs one ct\n    Net 196,986 vs 37,966\n    Revs 15.5 mln vs 8,900,000\n    Nine mths\n    Shr 52 cts vs 22 cts\n    Net two mln vs 874,000\n    Revs 53.7 mln vs 28.6 mln\n Reuter\n\x03",
        "example_classification": {"topics": ["earn"]},
    },
]

tagging_prompt_one_shot = ChatPromptTemplate.from_template(
     """
  You are an expert news classifier. Your task is to read the following passage and classify it into exactly one of the following topics:

  - money-fx
  - ship
  - interest
  - acq
  - earn

  You **must** choose exactly one topic from the list above. If the passage does not perfectly match any topic, choose the closest relevant topic based on its overall context.

  **Output Format:**
  Provide your answer in JSON format matching the 'Classification' class.

  Here are some examples to guide you:

  {examples_text}

  Example:
  {{
      "topics": ["earn"]
  }}

  Passage:
  {input}
"""
)

In [38]:
# Classify the article text; the 'Topics' column holds the gold label, not the passage.
input_passage = df[df['LEWISSPLIT'] == 'TEST']['Text'].reset_index(drop=True)[0]

formatted_prompt = tagging_prompt_one_shot.format(input=input_passage, examples_text=examples)

response = llm.invoke(formatted_prompt)

print(response.topics)

['ship']


In [39]:
df[df['LEWISSPLIT'] == 'TEST']['Topics'].reset_index(drop=True)[0]

'ship'
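
In the one-shot cell, `examples_text` receives the raw Python list, so the prompt contains its `repr`, with dictionary syntax and escaped newlines. A possible alternative is to render each example as a readable passage/classification pair first. The sketch below assumes a hypothetical helper named `format_examples`, not present in the notebook, and uses one shortened example for brevity.

```python
import json

def format_examples(examples):
    """Render example dicts as numbered passage/classification blocks."""
    blocks = []
    for i, example in enumerate(examples, start=1):
        # Drop the end-of-text control character (\x03) found in the raw articles.
        passage = example["example_passage"].replace("\x03", "").strip()
        label = json.dumps(example["example_classification"])
        blocks.append(f"Example {i}:\nPassage: {passage}\nClassification: {label}")
    return "\n\n".join(blocks)

# Shortened version of the 'earn' example, for illustration.
demo_examples = [
    {
        "example_passage": "Shr five cts vs one ct\n Reuter\n\x03",
        "example_classification": {"topics": ["earn"]},
    },
]
rendered = format_examples(demo_examples)
print(rendered)
```

The rendered string could then be passed as `examples_text=format_examples(examples)` when formatting `tagging_prompt_one_shot`.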