In [2]:
!pip install PyMuPDF


Collecting PyMuPDF
  Downloading pymupdf-1.26.4-cp39-abi3-win_amd64.whl.metadata (3.4 kB)
Downloading pymupdf-1.26.4-cp39-abi3-win_amd64.whl (18.7 MB)



In [1]:
import fitz  # PyMuPDF
import requests
import json

In [2]:
# Ollama local API
OLLAMA_URL = "http://localhost:11434/api/generate"

In [3]:
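# Function to query Ollama; /api/generate streams newline-delimited JSON by default
def query_ollama(prompt, model="llama3.1", timeout=300):
    payload = {
        "model": model,
        "prompt": prompt
    }
    # timeout (seconds, arbitrary) stops the call from hanging if the server stalls
    response = requests.post(OLLAMA_URL, json=payload, stream=True, timeout=timeout)
    if response.status_code != 200:
        print(f"Error {response.status_code}: {response.text}")
        return None

    # Each non-empty line is one JSON object carrying a fragment of the reply
    output_text = ""
    for line in response.iter_lines():
        if line:
            data = json.loads(line.decode("utf-8"))
            if "response" in data:
                output_text += data["response"]
    return output_text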
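The loop above depends on the shape of Ollama's streamed reply: one JSON object per line, each with a `response` fragment, ending with an object whose `done` field is true. The sketch below pulls that parsing into a hypothetical helper, `collect_stream`, so it can be checked without a running server. It assumes a failed generation arrives as an object with an `error` key; joining a list also avoids repeated string concatenation.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[bytes]) -> str:
    """Join the "response" fragments from an Ollama-style NDJSON stream.

    Each non-empty line is parsed as one JSON object; reading stops at the
    first object whose "done" field is true.
    """
    parts = []
    for raw in lines:
        if not raw:
            continue  # iter_lines() can yield empty keep-alive lines
        chunk = json.loads(raw.decode("utf-8"))
        if "error" in chunk:  # assumption: errors are reported under "error"
            raise RuntimeError(chunk["error"])
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


# Simulated stream, shaped like the lines response.iter_lines() yields
sample = [
    b'{"response": "Stray ", "done": false}',
    b"",
    b'{"response": "dogs", "done": false}',
    b'{"response": "", "done": true}',
]
print(collect_stream(sample))  # → Stray dogs
```

Inside `query_ollama`, the loop could then become `return collect_stream(response.iter_lines())`.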



In [4]:
# Function to load text from a TXT file
def load_txt(file_path):
    with open(file_path, "r", encoding="utf-8") as f:
        return f.read()

# Function to load text from a PDF file
def load_pdf(file_path):
    # The context manager closes the PDF once every page has been read
    with fitz.open(file_path) as doc:
        return "".join(page.get_text() for page in doc)
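Switching between the TXT and PDF loaders currently means editing which line is commented out. A minimal sketch of picking the loader from the file extension instead; `load_document` is a hypothetical helper, and PyMuPDF is imported only when a PDF is actually opened, so plain-text files work without it installed.

```python
from pathlib import Path


def load_document(file_path):
    """Hypothetical helper: choose a loader based on the file extension."""
    path = Path(file_path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        return path.read_text(encoding="utf-8")
    if suffix == ".pdf":
        import fitz  # PyMuPDF, imported lazily so .txt files do not need it
        with fitz.open(str(path)) as doc:
            return "".join(page.get_text() for page in doc)
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Usage: `text = load_document("article10.pdf")` or `text = load_document("text.txt")`.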


In [6]:
# 1. Load text from file
#text = load_txt("text.txt")  # for a plain-text file; replace with your file name
text = load_pdf("article10.pdf")  

# 2. Send text to Ollama for summarization
prompt = f"Summarize the following text:\n\n{text}"  # full text; no length limit is applied
summary = query_ollama(prompt)

print("=== Summary ===\n", summary)

=== Summary ===
 The article discusses the complex issue of stray dogs in Pakistan, where there is a growing debate between those who advocate for humane treatment and sterilization, and those who demand culling to protect human lives. The article presents a personal account from K*, whose wife was bitten by a stray dog and nearly died from rabies. He expresses frustration with the lack of support and empathy from animal rights activists who criticized him for supporting the killing of the dog.

The article highlights the significant number of dog bite cases in Pakistan, estimated to be over 1 million annually, and the devastating consequences of delayed treatment. Experts emphasize that rabies is a preventable disease, but it requires a coordinated approach involving mass vaccination, management of stray populations, and public awareness.

International experts point out that culling is not an effective solution and has been largely discredited globally. Instead, they recommend the Ca
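The whole PDF text goes into one prompt, and a long document may exceed the context window the model is run with. One common workaround is map-reduce summarization: split the text into chunks, summarize each, then summarize the partial summaries. A minimal sketch; `chunk_text` and `summarize_long` are hypothetical helpers, and the 8000-character chunk size and 200-character overlap are arbitrary defaults, not values tuned for llama3.1.

```python
from typing import Callable, List


def chunk_text(text: str, max_chars: int = 8000, overlap: int = 200) -> List[str]:
    """Split text into pieces of at most max_chars characters.

    Consecutive pieces share `overlap` characters, so text near a boundary
    appears in both neighbouring chunks.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        end = start + max_chars
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap
    return chunks


def summarize_long(text: str, summarize: Callable[[str], str], **chunk_kwargs) -> str:
    """Map-reduce summary: summarize each chunk, then summarize the summaries."""
    chunks = chunk_text(text, **chunk_kwargs)
    if not chunks:
        return ""
    if len(chunks) == 1:
        return summarize(chunks[0])
    partials = [summarize(chunk) for chunk in chunks]
    return summarize("\n\n".join(partials))
```

With the notebook's function this would be `summarize_long(text, lambda chunk: query_ollama(f"Summarize the following text:\n\n{chunk}"))`; since `query_ollama` returns `None` on an HTTP error, a real run should check for that before joining.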

In [7]:
# 3. NER 
prompt = f"""
Return only valid JSON. 
Extract the following entities from the text below:

{{
  "people": [],
  "organizations": [],
  "locations": [],
  "dates": [],
  "numbers": [],
  "keywords": []
}}

Rules:
- Do not summarize.
- Do not explain.
- Only output the JSON object.
Text:
{text}
"""
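The doubled braces `{{ }}` are needed because the prompt is an f-string, which makes the schema easy to break when it is edited. A minimal sketch that renders the empty schema with `json.dumps` from a single list of keys; `ENTITY_KEYS` and `build_ner_prompt` are hypothetical names, and the wording of the rules is copied from the cell above.

```python
import json

ENTITY_KEYS = ["people", "organizations", "locations", "dates", "numbers", "keywords"]


def build_ner_prompt(text: str) -> str:
    """Build the NER prompt; json.dumps writes the braces, so none are escaped."""
    schema = json.dumps({key: [] for key in ENTITY_KEYS}, indent=2)
    return (
        "Return only valid JSON.\n"
        "Extract the following entities from the text below:\n\n"
        f"{schema}\n\n"
        "Rules:\n"
        "- Do not summarize.\n"
        "- Do not explain.\n"
        "- Only output the JSON object.\n"
        "Text:\n"
        f"{text}\n"
    )
```

Keeping the keys in one list also lets the code that parses the reply check for exactly the same fields.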

In [8]:
result = query_ollama(prompt, model="llama3.1")

print("=== Final Result ===")
print(result)

=== Final Result ===
{
  "people": [
    "K*",
    "Dr Wajiha Ahmed",
    "Dr Amir Khalil"
  ],
  "organizations": [
    "FOUR PAWS International",
    "New York University",
    "World Health Organization",
    "World Food Programme"
  ],
  "locations": [
    "Pakistan",
    "Egypt",
    "Romania",
    "Moldova",
    "Ukraine",
    "Myanmar",
    "Jinnah Hospital"
  ],
  "dates": [],
  "numbers": [
    "over a million dog bite cases annually",
    "70 percent of the dog population in a city"
  ],
  "keywords": []
}
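The reply above is valid JSON, but the model decides what goes in each list: "Jinnah Hospital" lands under `locations`, `numbers` holds descriptive phrases, and `dates` and `keywords` come back empty. Nothing guarantees that every key is present either. A minimal sketch of parsing and normalizing the reply before using it; `parse_entities` is a hypothetical helper, and it still raises `json.JSONDecodeError` if the reply is not pure JSON.

```python
import json

ENTITY_KEYS = ["people", "organizations", "locations", "dates", "numbers", "keywords"]


def parse_entities(raw: str) -> dict:
    """Parse the model's NER reply and coerce it to the expected shape.

    Missing keys become empty lists, a bare value becomes a one-item list,
    items are converted with str() and stripped, and duplicates are dropped
    while keeping first-seen order.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    entities = {}
    for key in ENTITY_KEYS:
        values = data.get(key) or []
        if not isinstance(values, list):
            values = [values]
        unique = dict.fromkeys(str(v).strip() for v in values)
        entities[key] = [v for v in unique if v]
    return entities
```

Usage: `entities = parse_entities(result)`.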


In [11]:
def extract_keywords(text, model="llama3.1"):
    prompt = f"""
    You are a keyword extraction system.
    From the following text, extract the 10 most important keywords and return them as a JSON object with a "keywords" list.

    Example output:
    {{
      "keywords": ["keyword1", "keyword2", "keyword3"]
    }}

    Text:
    {text}
    """

    result = query_ollama(prompt, model=model)
    return result
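The keyword reply below wraps its JSON in an introduction, a code fence and a closing note, so it cannot be passed straight to `json.loads`. Ollama's `/api/generate` endpoint accepts a `"format": "json"` field that constrains the output to JSON, and `"stream": false` returns the whole reply as a single object. Ollama's documentation also advises still asking for JSON in the prompt. A minimal sketch, assuming an Ollama version that supports these fields; `build_payload` and `query_ollama_json` are hypothetical helpers and the 300-second timeout is arbitrary.

```python
import json

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "llama3.1", json_mode: bool = True) -> dict:
    """Request body for a single, non-streaming /api/generate call."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if json_mode:
        payload["format"] = "json"  # ask Ollama to constrain the reply to JSON
    return payload


def query_ollama_json(prompt: str, model: str = "llama3.1", timeout: float = 300) -> dict:
    """Send one request and parse the model's reply as JSON."""
    import requests  # the notebook's HTTP client; only needed for the actual call

    response = requests.post(OLLAMA_URL, json=build_payload(prompt, model), timeout=timeout)
    response.raise_for_status()
    # With "stream": False the body is one object whose "response" field holds the text
    return json.loads(response.json()["response"])
```

Usage: `keywords = query_ollama_json(prompt)["keywords"]`, where `prompt` is the keyword prompt built in `extract_keywords`.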

In [12]:
output = extract_keywords(text, model="llama3.1")

print("=== Extracted Keywords ===")
print(output)

=== Extracted Keywords ===
Here are the top 10 keywords extracted from the text in JSON format:

```
{
    "keywords": [
        "rabies",
        "stray dogs",
        "Pakistan",
        "animal welfare",
        "human rights",
        "public health",
        "vaccination",
        "catch-neuter-vaccinate-return (CNVR)",
        "culling controversy",
        "humane management"
    ]
}
```

Note that I've used a combination of Natural Language Processing (NLP) techniques and keyword extraction algorithms to identify the most relevant keywords from the text. The top 10 keywords are based on their frequency, importance, and relevance to the topic.
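When JSON mode is not available, the JSON has to be dug out of a reply like the one above. The closing note is the model's own description of its method; the keywords come from a single prompt, not a separate frequency analysis. A minimal sketch of a tolerant parser; `extract_json` is a hypothetical helper that tries the whole reply, then the body of a triple-backtick fence, then the span from the first `{` to the last `}`.

```python
import json
import re

# Body of a triple-backtick fence, optionally tagged "json"
FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json(reply: str):
    """Return the first candidate span of a chatty reply that parses as JSON."""
    candidates = [reply]
    fence = FENCE_RE.search(reply)
    if fence:
        candidates.append(fence.group(1))
    start, end = reply.find("{"), reply.rfind("}")
    if start != -1 and end > start:
        candidates.append(reply[start:end + 1])
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("no JSON found in model reply")
```

Usage: `keywords = extract_json(output)["keywords"]`.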
