# **Reuters-21578 News Classification Using Generative AI**

## **Key Features of the Approach**
- **Zero-shot Classification**:
  - Classify articles directly with a pre-trained language model, using a prompt that lists the candidate topics but contains no labeled example passages.
- **One-shot Classification**:
  - Guide the model with one labeled example per topic, included in the prompt, before it classifies a new article.
- **Generative AI**:
  - Use a pre-trained chat model (GPT-4 via `langchain-openai`) with structured output to predict the topic label.


# Table of Contents
1. [Load Dataset](#load-dataset)
2. [Install Required Libraries](#install-required-libraries)
3. [Set up OpenAI API Key](#set-up-openai-api-key)
4. [Import Libraries](#import-libraries)
5. [Initialize LLM](#initialize-llm)
6. [Prompt Engineering for Zero-shot Classification](#prompt-engineering-for-zero-shot-classification)
7. [Prompt Engineering for One-shot Classification](#prompt-engineering-for-one-shot-classification)

# Load Dataset

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
file_path = '/content/drive/MyDrive/processed_dataset.csv'

In [3]:
import pandas as pd
df = pd.read_csv(file_path)
df

Unnamed: 0,Topics,Places,People,Orgs,Exchanges,Companies,LEWISSPLIT,CGISPLIT,OLDID,NEWID,Title,Dateline,Body
0,earn,usa,,,,,TEST,TRAINING-SET,5041,16003,BEVERLY ENTERPRISES <BEV> SETS REGULAR DIVIDEND,"PASEDENA, Calif., April 9 - \n",Qtly div five cts vs five cts prior\n Pay J...
1,money-fx,usa,james-baker,imf,,,TEST,TRAINING-SET,5044,16006,TREASURY'S BAKER SAYS SYSTEM NEEDS STABILITY,"WASHINGTON, April 9 -",Treasury Secretary James Baker said\nthe float...
2,earn,usa,,,,,TEST,TRAINING-SET,5052,16013,TRUSTCORP INC <TTCO> 1ST QTR NET,"TOLEDO, Ohio, April 9 - \n","Shr 67 cts vs 62 cts\n Net 9,160,000 vs 7,7..."
3,earn,usa,,,,,TEST,TRAINING-SET,5054,16015,NAPA VALLEY BANCORP <NVBC> 1ST QTR NET,"NAPA, Calif, April 9 -\n","Shr 20 cts vs 25 cts\n Net 487,000 vs 435,0..."
4,earn,usa,,,,,TEST,TRAINING-SET,5055,16016,INTERNATIONAL POWER MACHINES <PWR> 4TH QTR LOSS,"MESQUITE, Texas, April 9 - \n",Shr loss 21 cts vs loss 28 cts\n Net loss 8...
...,...,...,...,...,...,...,...,...,...,...,...,...,...
6481,interest,west-germany,,,,,TRAIN,TRAINING-SET,18389,1971,BUNDESBANK LEAVES CREDIT POLICIES UNCHANGED,"FRANKFURT, March 5 -",The Bundesbank left credit policies\nunchanged...
6482,money-fx,egypt,,,,,TRAIN,TRAINING-SET,18401,1983,EGYPTIAN CENTRAL BANK DOLLAR RATE UNCHANGED,"CAIRO, March 4 -",Egypt's Central Bank today set the dollar\nrat...
6483,acq,usa,,,,,TRAIN,TRAINING-SET,18412,1994,BAKER <BKO> SUES TO FORCE HUGHES <HT> MERGER,"HOUSTON, March 5 -",Baker International corp said it has\nfiled su...
6484,interest,spain,,ec,,,TRAIN,TRAINING-SET,18413,1995,SPAIN DEREGULATES BANK DEPOSIT INTEREST RATES,"MADRID, March 5 -",Spain's Finance Ministry deregulated bank\ndep...


# Install Required Libraries

In [4]:
! pip -qqq install langchain-openai

# Set up OpenAI API Key

In [5]:
from google.colab import userdata
import os
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
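Outside Colab, `userdata` is not available, so the key would normally come from the process environment. The sketch below is one possible guard, not part of the original notebook: the helper name `require_api_key` is hypothetical, and it only checks that the variable is present and non-empty so that a missing key fails early with a readable message.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from a mapping (defaults to os.environ) or raise if missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before initializing the LLM.")
    return key


# Demonstration with a fake mapping so the example does not depend on a real key.
demo_key = require_api_key({"OPENAI_API_KEY": "sk-demo-not-a-real-key"})
print(demo_key.startswith("sk-"))
```

In a notebook this could be called once, without arguments, before the model is created.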

# Import Libraries

In [55]:
from typing import List

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


# Initialize LLM

In [56]:
class Classification(BaseModel):
    topics: List[str] = Field(
        description=(
            "The topic that the article belongs to. You must choose exactly one. "
            "Possible values are: 'money-fx', 'ship', 'interest', 'acq', 'earn'."
        )
    )

# Initialize the LLM with structured output
llm = ChatOpenAI(temperature=0, model="gpt-4").with_structured_output(Classification)
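The schema above describes the allowed values only in free text, so nothing stops the model from returning a label outside the list. A minimal sketch of a stricter variant follows, assuming a single-valued field typed with `typing.Literal`; the class name `SingleTopicClassification` and the field name `topic` are hypothetical and not used elsewhere in this notebook. Whether this improves results when passed to `with_structured_output` has not been verified here.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError

# Hypothetical alternative to the List[str] schema: one field, five allowed values.
Topic = Literal["money-fx", "ship", "interest", "acq", "earn"]


class SingleTopicClassification(BaseModel):
    topic: Topic = Field(description="The single topic the article belongs to.")


# A label from the allowed set is accepted.
ok = SingleTopicClassification(topic="acq")
print(ok.topic)

# A label outside the allowed set (here, wrong casing) fails validation.
try:
    SingleTopicClassification(topic="ACQ")
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

The design choice is to let validation, rather than the prompt wording alone, enforce the label set.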

# Prompt Engineering for Zero-shot Classification

In [8]:
tagging_prompt = ChatPromptTemplate.from_template(
     """
You are an expert news classifier. Your task is to read the following passage and classify it into exactly one of the following topics:

- money-fx
- ship
- interest
- acq
- earn

You **must** choose exactly one topic from the list above. If the passage does not perfectly match any topic, choose the closest relevant topic based on its overall context.

**Output Format:**
Provide your answer in JSON format matching the 'Classification' class.

Example:
{{
    "topics": ["earn"]
}}

Passage:
{input}
"""
)

In [None]:
# Your input passage
input_passage = """Viacom International Inc said &lt;National
Amusements Inc> has again raised the value of its offer for
Viacom's publicly held stock.
    The company said the special committee of its board plans
to meet later today to consider this offer and the one
submitted March one by &lt;MCV Holdings Inc>.
    A spokeswoman was unable to say if the committee met as
planned yesterday.
    Viacom said National Amusements' Arsenal Holdings Inc
subsidiary has raised the amount of cash it is offering for
each Viacom share by 75 cts to 42.75 dlrs while the value of
the fraction of a share of exchangeable Arsenal Holdings
preferred to be included was raised 25 cts to 7.75 dlrs.
    National Amusements already owns 19.6 pct of Viacom's stock."""

# Generate the classification using the .invoke() method
response = llm.invoke(tagging_prompt.format(input=input_passage))

# Access the topics
print(response.topics)


['ACQ']
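The output above is `ACQ` in upper case, while the dataset labels are lower case (`acq`), so a direct string comparison against the `Topics` column would count this prediction as wrong. One way to handle this, sketched below under the assumption that only the five listed topics are valid, is to normalize the returned labels before comparing; `normalize_topics` is a hypothetical helper, not part of the original notebook.

```python
ALLOWED_TOPICS = ("money-fx", "ship", "interest", "acq", "earn")


def normalize_topics(raw_topics):
    """Lowercase and strip each label, keeping only labels from the allowed set."""
    cleaned = [t.strip().lower() for t in raw_topics]
    return [t for t in cleaned if t in ALLOWED_TOPICS]


print(normalize_topics(["ACQ"]))               # ['acq']
print(normalize_topics([" Earn ", "sports"]))  # ['earn']
```

In the notebook this could be applied as `normalize_topics(response.topics)`.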


In [16]:
df[(df['LEWISSPLIT'] == 'TRAIN') & (df['Topics'] == 'acq')]['Body'][583]


"Irving Bank Corp said it bought the\nfactoring division of Associates Commercial Corp, a unit of\nGulf and Western Co Inc's Associates Corp of North America.\n    The terms of the previously announced deal were not\ndisclosed.\n    It said the assets were transferred to Irving Commercial\nCorp.\n\n Reuter\n\x03"

# Prompt Engineering for One-shot Classification

In [46]:
examples = [
    {
        "example_passage": "Zambia will reintroduce a modified\nforeign exchange auction at the end of this month as part of a\nnew two-tier exchange rate, central bank governor Leonard\nChivuno said.\n    Chivuno told a press conference at the end of three weeks\nof negotiations with the International Monetary Fund (IMF) that\nthere would be a fixed exchange rate for official transactions\nand a fluctuating rate, decided by the auction, for other types\nof business.\n    The Bank of Zambia previously held weekly auctions to\ndistribute foreign exchange to the private sector and determine\nthe kwacha's exchange rate, but these were suspended at the end\nof January.\n    President Kenneth Kaunda said at the time that he was\nsuspending the auction system in view of the rapid devaluation\nand violent fluctuations of the exchange rate which had\nresulted.\n    Business and banking sources said another reason for\nsuspending the auction was that the central bank was low on\nforeign exchange and was 10 weeks behind in paying successful\nbidders.    The kwacha stood at 2.2 per dollar when the auction system\nwas first introduced in October 1985, but it slid to around 15\nper dollar by the time it was suspended 16 months later.\n    Since then, Zambia has operated a fixed exchange rate of\nabout nine kwacha per dollar.\n REUTER\n\x03",
        "example_classification": {"topics": ["money-fx"]},
    },
    {
        "example_passage": "Some 10 Indian ships have been held up\nat Calcutta port after four days of industrial action by local\nseamen, a spokesman for the shipowners' association INSA said.\n    The dispute has prevented local crewmen signing on and off,\nbut has not affected foreign ships with international crews\ndocking at Calcutta, which exports tea and jute and imports\nmachinery, crude oil and petroleum products, the spokesman\nsaid.\n    Foreign ships may also suffer if dock workers join the\naction, he said. The Shipping Corporation of India (SCI) has\nasked its ships to avoid the port until the dispute is over,\nNational Union of Seafarers in India president Leo Barnes said.\n Reuter\n\x03",
        "example_classification": {"topics": ["ship"]},
    },
    {
        "example_passage": "British bank base lending rates are\nlikely to fall by as much as one full point to 9-1/2 pct this\nweek following the sharp three billion stg cut in the U.K.\nCentral government borrowing target to four billion stg set in\ntoday's 1987 budget, bank analysts said.\n    The analysts described Chancellor of the Exchequer Nigel\nLawson's budget as cautious, a quality which currency and money\nmarkets had already started to reward.\n    Sterling surged on foreign exchange markets and money\nmarket interest rates moved sharply lower as news of the budget\nmeasures came through, the analysts said.\n    Lloyds merchant bank chief economist Roger Bootle said he\nexpected base rates to be cut by one full point tomorrow.\n    \"This is very much a safety-first budget in order to get\ninterest rates down,\" he said.\n    Bootle said the money markets had almost entirely\ndiscounted such a one point cut, with the key three month\ninterbank rate down to 9-11/16 pct from 9-13/16 last night, and\nit ",
        "example_classification": {"topics": ["interest"]},
    },
    {
        "example_passage": "Irving Bank Corp said it bought the\nfactoring division of Associates Commercial Corp, a unit of\nGulf and Western Co Inc's Associates Corp of North America.\n    The terms of the previously announced deal were not\ndisclosed.\n    It said the assets were transferred to Irving Commercial\nCorp.\n\n Reuter\n\x03",
        "example_classification": {"topics": ["acq"]},
    },
    {
        "example_passage": "Shr five cts vs one ct\n    Net 196,986 vs 37,966\n    Revs 15.5 mln vs 8,900,000\n    Nine mths\n    Shr 52 cts vs 22 cts\n    Net two mln vs 874,000\n    Revs 53.7 mln vs 28.6 mln\n Reuter\n\x03",
        "example_classification": {"topics": ["earn"]},
    },
]

# Corrected prompt template with escaped curly braces
tagging_prompt_one_shot = ChatPromptTemplate.from_template(
     """
  You are an expert news classifier. Your task is to read the following passage and classify it into exactly one of the following topics:

  - money-fx
  - ship
  - interest
  - acq
  - earn

  You **must** choose exactly one topic from the list above. If the passage does not perfectly match any topic, choose the closest relevant topic based on its overall context.

  **Output Format:**
  Provide your answer in JSON format matching the 'Classification' class.

  Here are some examples to guide you:

  {examples_text}

  Example:
  {{
      "topics": ["earn"]
  }}

  Passage:
  {input}
"""
)
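The `{examples_text}` placeholder accepts any value, but a list of dictionaries is inserted as its raw Python representation, escape sequences and all. The sketch below shows one possible way to render the examples as readable text first. `render_examples` is a hypothetical helper, and `demo_examples` holds shortened stand-ins for the real passages so the block stays small.

```python
def render_examples(examples):
    """Turn {"example_passage", "example_classification"} dicts into prompt text."""
    blocks = []
    for ex in examples:
        label = ex["example_classification"]["topics"][0]
        # Drop the end-of-text control character found in the Reuters bodies.
        passage = ex["example_passage"].replace("\x03", "").strip()
        blocks.append(f"Passage:\n{passage}\nTopic: {label}")
    return "\n\n".join(blocks)


# Shortened stand-ins for illustration only; the notebook's `examples` list has the full text.
demo_examples = [
    {
        "example_passage": "Shr five cts vs one ct\n    Net 196,986 vs 37,966\n Reuter\n\x03",
        "example_classification": {"topics": ["earn"]},
    },
    {
        "example_passage": "Irving Bank Corp said it bought the\nfactoring division.\n Reuter\n\x03",
        "example_classification": {"topics": ["acq"]},
    },
]

examples_text = render_examples(demo_examples)
print(examples_text)
```

The resulting string could then be passed as `examples_text` when formatting the prompt.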

In [47]:
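# Sample input passage: the body of the first article in the test split
# (the 'Topics' column holds the gold label and must not be used as the input)
df_test = df[df['LEWISSPLIT'] == 'TEST']
input_passage = df_test['Body'].iloc[0]

# Format the prompt with the input passage and the labeled examples
formatted_prompt = tagging_prompt_one_shot.format(input=input_passage, examples_text=examples)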

# Invoke the LLM
response = llm.invoke(formatted_prompt)

# Output the topics
print(response.topics)

['earn']
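A single prediction says little about overall quality. As a rough sketch of how predictions could be scored against the `Topics` column, the block below computes plain accuracy for any callable that maps a passage to a label. Everything here is an assumption for illustration: `accuracy` and `keyword_stub` are hypothetical names, the stub replaces the LLM call so the code runs offline, and the three rows are hand-made from snippets shown in the dataset preview.

```python
def accuracy(predict, rows):
    """Fraction of (passage, gold_label) rows where predict(passage) equals gold_label."""
    if not rows:
        return 0.0
    correct = sum(1 for passage, gold in rows if predict(passage) == gold)
    return correct / len(rows)


def keyword_stub(passage):
    """Stand-in for the LLM call so the sketch runs offline; not a real classifier."""
    return "earn" if "Shr" in passage else "acq"


# Tiny hand-made sample, not a real evaluation set.
rows = [
    ("Shr 67 cts vs 62 cts", "earn"),
    ("Irving Bank Corp said it bought the factoring division", "acq"),
    ("The Bundesbank left credit policies unchanged", "interest"),
]

score = accuracy(keyword_stub, rows)
print(round(score, 3))
```

In the notebook, `predict` could wrap the prompt formatting and `llm.invoke(...)` call, returning the first element of `response.topics` in lower case.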
