In [1]:
import warnings
warnings.filterwarnings('ignore')

### CrewAI + Ollama + qwen2.5:14b

In [2]:
from crewai import Agent, Crew, Task, LLM

def chat_with_llm(prompt, model="ollama_chat/qwen2.5:14b", base_url="http://localhost:11434", max_tokens=8192):
    # CrewAI's LLM wrapper routes "ollama_chat/<model>" (a LiteLLM-style name) to the local Ollama server
    llm = LLM(
        model=model,
        base_url=base_url,
        temperature=0.7,
        max_tokens=max_tokens
    )

    agent = Agent(
        role="Helpful assistant",
        goal="Respond accurately to user prompts",
        backstory="You are a concise and knowledgeable assistant.",
        verbose=False,
        llm=llm
    )

    task = Task(
        description=prompt,
        expected_output="A helpful and accurate response to the user's prompt.",
        agent=agent
    )

    crew = Crew(
        agents=[agent],
        tasks=[task],
        verbose=False
    )

    return crew.kickoff()

## Example usage
#if __name__ == "__main__":
#    reply = chat_with_llm("Explain the Bellman equation in simple terms.")
#    print(reply)

print(chat_with_llm("Explain the Bellman equation in simple terms.", max_tokens=1000))


The Bellman equation is a key concept in decision-making processes, particularly in fields like economics, artificial intelligence, and operations research. It's used to solve complex problems by breaking them down into simpler sub-problems.

Imagine you're playing a video game where at each level, you have choices that affect your score for that level and the options available in future levels. The Bellman equation helps you make decisions not just based on what gives you immediate rewards but also considers long-term benefits from those choices.

In simple terms, it represents an optimal decision rule telling you how to act now given some information about what might happen later if you follow certain strategies or policies. It takes into account the current state (like your score and level in a game), possible actions you can take, immediate rewards for those actions, and potential future benefits.

The equation looks like this: V(s) = max_a [R(s,a) + γ * ∑_s' P(s'|s,a)V(s')]

Where

### Tokenizer

In [3]:
from utils import llama4, llama4_together
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)


None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
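Before sending a whole PDF to the model, it can help to estimate whether it fits in the context window. The sketch below assumes you pass an `encode` function such as the `tokenizer.encode` loaded above. Note that this is the Llama 4 tokenizer (`meta-llama/Llama-4-Scout-17B-16E-Instruct`), while the summaries below are generated by qwen2.5:14b, so the counts are only approximate. `count_tokens`, `fits_in_context`, `context_tokens`, and `reserve_tokens` are hypothetical names, not part of any library.

```python
from typing import Callable, List


def count_tokens(text: str, encode: Callable[[str], List[int]]) -> int:
    """Return the number of token ids produced by `encode` for `text`."""
    return len(encode(text))


def fits_in_context(
    text: str,
    encode: Callable[[str], List[int]],
    context_tokens: int,
    reserve_tokens: int = 600,
) -> bool:
    """True if the prompt text plus a reserved output budget fits in the window.

    `context_tokens` is the model's context length (an assumption you must set
    for your model and server); `reserve_tokens` leaves room for the summary.
    """
    return count_tokens(text, encode) + reserve_tokens <= context_tokens


# Stand-in encoder for illustration only: one "token" per whitespace-separated word.
def whitespace_encode(text: str) -> List[int]:
    return list(range(len(text.split())))


print(fits_in_context("a short document", whitespace_encode, context_tokens=1000))  # True
```

With the real tokenizer, the call would be `fits_in_context(text, tokenizer.encode, context_tokens=...)`, where the context length is whatever your Ollama model is actually configured with.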


### Multiple document summarization

In [13]:
from utils import pdf2text

papers = [
    "長照專業服務操作指引-觀念篇_公告.pdf",
    # "長照專業服務操作指引-操作篇-共通操作指引_公告.pdf",
    # "長照專業服務操作指引-操作篇-居家護理指導與諮詢操作指引_公告.pdf",
]

paper_texts = []
for n, paper in enumerate(papers):
    text = pdf2text(f"data/pdfs/{paper}")
    paper_texts.append(f"Processing paper {n+1} - {paper}, {len(text)} characters\n")
    print(f"Processing paper {n+1} - {paper}, {len(text)} characters")

    summary = chat_with_llm(f"""give me a summary of less than 140 words for the article below {text}""", 
                            max_tokens=600)
    paper_texts.append(f"{summary}\n\n")

total_text = "\n\n".join(paper_texts)
print(f"\nTotal papers processed: {len(paper_texts) // 2}, {len(total_text)} characters")


Processing paper 1 - 長照專業服務操作指引-觀念篇_公告.pdf, 36408 characters

Total papers processed: 1, 875 characters
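This first document alone is 36,408 characters long, and a local Ollama model only sees as much of the prompt as its configured context length allows. A long document may therefore be cut off before the model reads all of it. One common workaround is map-reduce summarization: split the text into overlapping chunks, summarize each chunk, then summarize the joined partial summaries. The sketch below assumes a `summarize(text, max_words)` callable that you would implement by wrapping `chat_with_llm`. `split_text`, `summarize_long_text`, `max_chars`, and `overlap` are hypothetical names, and splitting by characters is a simplification of splitting by tokens.

```python
from typing import Callable, List


def split_text(text: str, max_chars: int = 4000, overlap: int = 200) -> List[str]:
    """Split `text` into chunks of at most `max_chars` characters.

    Consecutive chunks share `overlap` characters, so text near a chunk
    boundary appears in both neighbouring chunks.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back to create the overlap
    return chunks


def summarize_long_text(
    text: str,
    summarize: Callable[[str, int], str],
    max_words: int = 140,
    max_chars: int = 4000,
    overlap: int = 200,
) -> str:
    """Map-reduce summary: summarize each chunk, then summarize the partial summaries."""
    chunks = split_text(text, max_chars, overlap)
    if len(chunks) <= 1:
        return summarize(text, max_words)
    partial = [summarize(chunk, max_words) for chunk in chunks]  # map step
    return summarize("\n\n".join(partial), max_words)  # reduce step


# Hypothetical adapter around chat_with_llm (defined earlier in this notebook):
# def summarize(chunk, max_words):
#     prompt = f"Summarize the text below in fewer than {max_words} words:\n\n{chunk}"
#     return str(chat_with_llm(prompt, max_tokens=600))

print(split_text("abcdefghij", max_chars=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

When there are many chunks, the joined partial summaries can themselves exceed the window. Applying the reduce step again to groups of partial summaries is one way to handle that.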


In [14]:
print(total_text)


Processing paper 1 - 長照專業服務操作指引-觀念篇_公告.pdf, 36408 characters


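Based on the provided document and its content, the following are concrete measures for promoting interprofessional collaboration to achieve successful reablement services:

1. **Education and training**:
   - Offer continuing education courses on professional services so that case managers correctly understand the concept of reablement and the content of professional services.
   - Give new case managers and individualized care managers (A-unit case managers) field observation opportunities, visiting clients together with professionals from B units, so they can see how professional services are actually delivered.
   - Hire new professionals only after they pass an assessment, ensuring they are able to provide high-quality professional services.

2. **Establishing collaboration and communication mechanisms**:
   - Each client has a dedicated notebook, which facilitates communication among the various professionals and service staff.
   - Set up interprofessional groups (e.g., Line groups) so that all relevant professionals and staff can jointly discuss the client's condition, guidance measures, performance, and other information.
   - The notebook records the date of each visit, the client's problems, what was carried out, and the guidance measures, along with points to watch for or suggestions for the next visit.

3. **Joint visits**:
   - Professional service instructors visit clients together with home care workers, ensuring that suitable practice opportunities are provided at every home visit.
   - Use the interprofessional collaboration model (as shown in Figure 6) to track and evaluate the client's training progress on self-selected activities.

4. **Observation/internship system**:
   - Designate model internship units so that new units can learn how high-performing units operate.
   - New professionals must complete an observation/internship program and pass an assessment to ensure they can provide high-quality professional services.

5. **Principle of daily intensive training**:
   - Ensure that home care workers provide suitable practice opportunities for the client's self-selected activities at every visit, emphasizing the importance of "daily intensive training".

6. **Image and document management**:
   - Obtain consent from the client's family and inform them who may view the photos or videos, to protect personal data.
   - Upload photos or videos that need to be shared with the team to the dedicated notebook, to track the correctness and progress of training.

These measures can ensure successful interprofessional collaboration, promote the effective delivery of reablement services, and help older adults regain and maintain the ability to live independently.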




### Loop through the whole directory

In [16]:
import os
from utils import pdf2text

# Set the directory containing PDFs
pdf_dir = "data/pdfs"

# Automatically find all PDF files in the directory
papers = [f for f in os.listdir(pdf_dir) if f.endswith(".pdf")]

print(f"Found {len(papers)} PDF files to process.\n")

paper_texts = []
for n, paper in enumerate(papers):
    text = pdf2text(os.path.join(pdf_dir, paper))
    paper_texts.append(f"Processing paper {n+1} - {paper}, {len(text)} characters\n")
    print(f"Processing paper {n+1} - {paper}, {len(text)} characters")

    summary = chat_with_llm(f"""give me a summary of less than 110 words for the article below {text}""", 
                            max_tokens=500)
    paper_texts.append(f"{summary}\n\n")

total_text = "\n\n".join(paper_texts)
print(f"\nTotal papers processed: {len(paper_texts) // 2}, {len(total_text)} characters")


Found 53 PDF files to process.

Processing paper 1 - 2.獎勵布建住宿式長照機構資源計畫-114年度待獎勵區域(1140502).pdf, 3801 characters
Processing paper 2 - 衛部顧字第1131962420號公告.pdf, 2 characters
Processing paper 3 - 問答集1.pdf, 1293 characters
Processing paper 4 - 長照專業服務操作指引-操作篇-共通操作指引_公告.pdf, 19399 characters
Processing paper 5 - 問答集2.pdf, 938 characters
Processing paper 6 - 照顧實務指導員訓練(公告).pdf, 1311 characters
Processing paper 7 - 家庭照顧者支持服務據點專業人員工作手冊.pdf, 75805 characters
Processing paper 8 - 院臺衛字第1131014413號.pdf, 695 characters
Processing paper 9 - 問答集3.pdf, 557 characters
Processing paper 10 - 112年居家失能個案家庭醫師照護方案(1120626).pdf, 13720 characters
Processing paper 11 - 112年居家失能個案家庭醫師照護方案公告(1120626).pdf, 0 characters
Processing paper 12 - 家庭照顧者支持服務原則(公告).pdf, 1804 characters
Processing paper 13 - 附件1-申請流程圖（114.04.09修正版）.pdf, 906 characters
Processing paper 14 - 各縣市失智症照顧及服務資訊.pdf, 1721 characters
Processing paper 15 - 「住宿機構強化感染管制獎勵計畫」縣市計畫書格式.pdf, 2173 characters
Processing paper 16 - 3.公告-獎勵布建住宿式長照機構資源計畫.pdf, 0 chara

Ignoring wrong pointing object 6 0 (offset 0)
Ignoring wrong pointing object 8 0 (offset 0)
Ignoring wrong pointing object 10 0 (offset 0)
Ignoring wrong pointing object 12 0 (offset 0)
Ignoring wrong pointing object 14 0 (offset 0)
Ignoring wrong pointing object 16 0 (offset 0)
Ignoring wrong pointing object 24 0 (offset 0)
Ignoring wrong pointing object 26 0 (offset 0)
Ignoring wrong pointing object 28 0 (offset 0)
Ignoring wrong pointing object 43 0 (offset 0)
Ignoring wrong pointing object 48 0 (offset 0)
Ignoring wrong pointing object 54 0 (offset 0)
Ignoring wrong pointing object 227 0 (offset 0)
Ignoring wrong pointing object 232 0 (offset 0)
Ignoring wrong pointing object 234 0 (offset 0)
Ignoring wrong pointing object 236 0 (offset 0)
Ignoring wrong pointing object 238 0 (offset 0)
Ignoring wrong pointing object 240 0 (offset 0)
Ignoring wrong pointing object 242 0 (offset 0)
Ignoring wrong pointing object 257 0 (offset 0)
Ignoring wrong pointing object 752 0 (offset 0)
Ignori

Processing paper 28 - 附件3-長照機構暨長照人員相關管理資訊系統_品質提升管理_介接規格書v4.5.pdf, 73705 characters
Processing paper 29 - 失智共同照護中心及社區服務據點參考手冊.pdf, 26055 characters
Processing paper 30 - 113至116年住宿機構照顧品質獎勵計畫(公告核定版).pdf, 21152 characters
Processing paper 31 - 附件2-住宿機構照顧品質獎勵計畫說明會議 v1.4R（114.6.13更新）.pdf, 5459 characters
Processing paper 32 - 住宿式機構與醫院合作服務合約參考範本.pdf, 1047 characters
Processing paper 33 - 長照高負荷家庭照顧者轉介及服務流程.pdf, 1255 characters
Processing paper 34 - 院臺衛字第1131020942號函PDF.pdf, 619 characters
Processing paper 35 - 1.公告-獎勵布建住宿式長照機構資源計畫-114年度待獎勵區域.pdf, 6 characters
Processing paper 36 - 附件4、住宿機構照顧品質獎勵計畫-懶人包.pdf, 326 characters
Processing paper 37 - 住宿機構照顧品質獎勵計畫問答集(113年11月28日版).pdf, 21100 characters
Processing paper 38 - 長照專業服務操作指引-操作篇-居家護理指導與諮詢操作指引_公告.pdf, 4301 characters
Processing paper 39 - 「住宿式機構強化感染管制獎勵計畫」衛生福利部權責司署計畫審查內容及評分原則.pdf, 807 characters
Processing paper 40 - 高負荷家庭照顧者初篩指標.pdf, 1345 characters
Processing paper 41 - L2+L3課程及師資公告函.pdf, 1318 characters
Processing paper 42 - 長照專業服務手冊1120109
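Several PDFs above yield almost no text (0, 2, or 6 characters). This often means the pages are scanned images or the text layer could not be extracted, and sending such files to the LLM only produces filler summaries. The sketch below screens extracted texts before summarizing. It assumes a dict that maps file names to extracted text. `partition_by_text_length` is a hypothetical helper, `min_chars=50` is an arbitrary placeholder threshold, and recovering text from image-only PDFs would need OCR, which this does not do.

```python
from typing import Dict, List, Tuple


def partition_by_text_length(
    texts: Dict[str, str], min_chars: int = 50
) -> Tuple[Dict[str, str], List[str]]:
    """Split extracted texts into (usable, too_short) by stripped length.

    `usable` maps file name -> text for documents worth summarizing;
    `too_short` lists file names that probably need OCR or manual review.
    """
    usable: Dict[str, str] = {}
    too_short: List[str] = []
    for name, text in texts.items():
        if len(text.strip()) >= min_chars:
            usable[name] = text
        else:
            too_short.append(name)
    return usable, too_short


# Example with made-up texts (not from the real PDFs)
usable, too_short = partition_by_text_length(
    {"a.pdf": "x" * 3801, "b.pdf": "  ", "c.pdf": ""}
)
print(sorted(usable), too_short)  # ['a.pdf'] ['b.pdf', 'c.pdf']
```

In the loop above, one could first build `texts = {paper: pdf2text(os.path.join(pdf_dir, paper)) for paper in papers}` and then call `chat_with_llm` only for the entries in `usable`.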

In [17]:
import os

def write_long_text_to_file(filename: str, content: str):
    """
    Writes a given text string to a local file, overwriting if it exists.
    """
    try:
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(content)
        print(f"Content successfully written to '{filename}'.")
    except IOError as e:
        print(f"Error writing to file '{filename}': {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

'''
if __name__ == "__main__":
    # Example Usage
    text_content = """
    This is a concise example of text to be written to a file.
    It demonstrates writing multiple lines efficiently.
    """
    output_file = "concise_text_file.txt"
    write_long_text_to_file(output_file, text_content)

    # Optional: Verify content
    try:
        with open(output_file, 'r', encoding='utf-8') as f:
            read_back = f.read()
            print(f"Read back (first 50 chars): '{read_back[:50]}...'")
            if read_back == text_content:
                print("Verification: Content matches.")
            else:
                print("Verification: Content mismatch.")
    except FileNotFoundError:
        print(f"Error: File '{output_file}' not found for verification.")
'''
write_long_text_to_file("./total_text.txt", total_text)


Content successfully written to './total_text.txt'.
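Step 2 below recovers each (header, summary) pair from `total_text` with a regular expression, so it depends on the exact wording of the "Processing paper" header. An alternative is to also save the pairs as structured JSON and reload them directly. The sketch below uses only the standard library. `save_articles`, `load_articles`, and the file name `summaries.json` are hypothetical.

```python
import json
from typing import List, Tuple


def save_articles(path: str, articles: List[Tuple[str, str]]) -> None:
    """Write (file_name, summary) pairs as a JSON list of objects."""
    records = [{"file": name, "summary": summary} for name, summary in articles]
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps the Chinese file names readable in the file
        json.dump(records, f, ensure_ascii=False, indent=2)


def load_articles(path: str) -> List[Tuple[str, str]]:
    """Read the pairs back in the order they were saved."""
    with open(path, encoding="utf-8") as f:
        return [(r["file"], r["summary"]) for r in json.load(f)]
```

In the summarization loop, one would append `(paper, str(summary))` to a list and call `save_articles("summaries.json", pairs)` after the loop. The classifier could then start from `load_articles("summaries.json")` instead of re-parsing the text file.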


### Article Summary Classifier

In [18]:
# 🧠 Step 1: Setup
import re
import openai
import os
from collections import defaultdict

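# The OpenAI() client created in Step 3 reads OPENAI_API_KEY from the environment;
# setting it here as well is harmless. Avoid pasting the key into the notebook.
openai.api_key = os.getenv("OPENAI_API_KEY")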

# Use OpenAI's cost-efficient model
MODEL = "gpt-4o-mini"


In [19]:
# 📄 Step 2: Preprocess article summaries

raw_text = total_text
pattern = r"(Processing paper \d+ - .+?\.pdf, \d+ characters)"
splits = re.split(pattern, raw_text)[1:]

articles = []
for i in range(0, len(splits), 2):
    header = splits[i].strip()
    summary = splits[i + 1].strip()
    articles.append((header, summary))

print(f"✅ Loaded {len(articles)} articles.")

# 🔎 Filter short summaries
summary_lengths = [len(summary) for _, summary in articles]
median_length = sorted(summary_lengths)[len(summary_lengths) // 2]
short_threshold = 0.25 * median_length

print(f"ℹ️ Median summary length: {median_length} characters")
print(f"📉 Articles shorter than {short_threshold:.0f} characters will be grouped as 'Short Summaries'")

short_articles = []
normal_articles = []
for pair in articles:
    if len(pair[1]) < short_threshold:
        short_articles.append(pair)
    else:
        normal_articles.append(pair)

print(f"📂 {len(short_articles)} short articles identified.")


✅ Loaded 53 articles.
ℹ️ Median summary length: 530 characters
📉 Articles shorter than 132 characters will be grouped as 'Short Summaries'
📂 1 short articles identified.


In [20]:
print(short_articles)


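[('Processing paper 34 - 院臺衛字第1131020942號函PDF.pdf, 619 characters', 'A letter from the Executive Yuan approving the draft "Residential Institution Care Quality Incentive Program" (住宿機構照顧品質獎勵計畫) submitted by the Ministry of Health and Welfare, directing that the original be sent to the Ministry of Health and Welfare with copies to the Ministry of the Interior, the National Development Council, and the Directorate-General of Budget, Accounting and Statistics, Executive Yuan. The letter is dated August 19, ROC year 113 (2024), with document number 院臺衛字第1131020942號.')]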


In [22]:
# 🤖 Step 3: Classify normal articles using GPT-4o-mini
def classify_articles(normal_articles):
    from openai import OpenAI
    client = OpenAI()

    preview_text = "\n\n".join(
        [f"{i+1}. {summary[:500]}" for i, (_, summary) in enumerate(normal_articles)]
    )

    prompt = f"""
You are a helpful assistant. I have article summaries and want to group them thematically.
Please classify them into **no more than 8 groups** based on content, and give each group a one-sentence label.

Only classify the summaries below. Ignore missing or overly short ones.

Respond exactly in this format:
Group 1: [Label]
Articles: [list of numbers]
Group 2: ...

Here are the summaries:
{preview_text}
"""

    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2
    )

    return response.choices[0].message.content

grouping_response = classify_articles(normal_articles)
print("🧠 LLM Classification Result:\n")
print(grouping_response)


🧠 LLM Classification Result:

Group 1: [Long-Term Care Policies and Programs]
Articles: [1, 3, 6, 12, 13, 35]

Group 2: [Professional Training and Development]
Articles: [4, 8, 20, 24, 40]

Group 3: [Technology in Caregiving]
Articles: [5, 17, 22, 25]

Group 4: [Family Caregiver Support]
Articles: [7, 12, 39, 51]

Group 5: [Assessment and Evaluation in Long-Term Care]
Articles: [10, 17, 33, 37]

Group 6: [Health and Safety Regulations]
Articles: [14, 18, 32, 38]

Group 7: [Remote Work and Workplace Dynamics]
Articles: [2, 41]

Group 8: [Environmental and Agricultural Issues]
Articles: [16, 34]
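The classification above does not divide the articles cleanly into separate groups. Articles 12 and 17 each appear in two groups, and only 29 of the 52 numbered summaries are assigned at all. The prompt's "Ignore missing or overly short ones" may encourage the model to skip some. Labels such as "Remote Work and Workplace Dynamics" and "Environmental and Agricultural Issues" also look out of place for a collection of long-term care documents and may be worth checking by hand. The sketch below runs a consistency check on a grouping. It assumes the groups are given as a mapping from label to 1-based article numbers, and `check_grouping` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def check_grouping(
    groups: Dict[str, List[int]], n_articles: int
) -> Tuple[List[int], List[int], List[int]]:
    """Return (duplicated, missing, out_of_range) 1-based article numbers."""
    counts = Counter(n for numbers in groups.values() for n in numbers)
    duplicated = sorted(n for n, c in counts.items() if c > 1)
    missing = [n for n in range(1, n_articles + 1) if n not in counts]
    out_of_range = sorted(n for n in counts if not 1 <= n <= n_articles)
    return duplicated, missing, out_of_range


# The grouping printed above; 52 == len(normal_articles) in this run
llm_groups = {
    "Long-Term Care Policies and Programs": [1, 3, 6, 12, 13, 35],
    "Professional Training and Development": [4, 8, 20, 24, 40],
    "Technology in Caregiving": [5, 17, 22, 25],
    "Family Caregiver Support": [7, 12, 39, 51],
    "Assessment and Evaluation in Long-Term Care": [10, 17, 33, 37],
    "Health and Safety Regulations": [14, 18, 32, 38],
    "Remote Work and Workplace Dynamics": [2, 41],
    "Environmental and Agricultural Issues": [16, 34],
}
dup, missing, bad = check_grouping(llm_groups, n_articles=52)
print(dup)           # [12, 17]
print(len(missing))  # 23
print(bad)           # []
```

The same check could run on the parsed indices before the files are written. If it fails, one option is to retry the classification with a stricter instruction such as "assign every article to exactly one group".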


In [28]:
# 📊 Step 4: Parse the LLM grouping output (square brackets optional) and merge the short-summary group
import re
from collections import defaultdict

grouped_articles = defaultdict(list)

# Split the response into one block per "Group N:" heading
group_blocks = re.findall(r"(Group \d+:.*?)(?=Group \d+:|\Z)", grouping_response, re.DOTALL)

for block in group_blocks:
    # Label: rest of the "Group N:" line, with optional surrounding [brackets]
    label_match = re.search(r"Group \d+:\s*\[?([^\]\n]+?)\]?\s*$", block, re.MULTILINE)
    label = label_match.group(1).strip() if label_match else "Unnamed Group"

    # Article numbers after "Articles:", with optional brackets, e.g. [1, 2, 3]
    article_match = re.search(r"Articles:\s*\[?([\d,\s]+)\]?", block)
    if article_match:
        number_str = article_match.group(1)
        # Convert the 1-based numbers used in the prompt to 0-based list indices
        indices = [int(n.strip()) - 1 for n in number_str.split(",") if n.strip().isdigit()]
        for idx in indices:
            if 0 <= idx < len(normal_articles):
                grouped_articles[label].append(normal_articles[idx])

# Add short summary group if any
if short_articles:
    grouped_articles["Short Summaries or Incomplete Articles"] = short_articles

print(f"📦 Total {len(grouped_articles)} groups including short articles.")


📦 Total 9 groups including short articles.


In [30]:
# 📝 Step 5: Write results to text files
output_dir = "classified_articles"
os.makedirs(output_dir, exist_ok=True)

for label, items in grouped_articles.items():
    filename = re.sub(r'[\\/:"*?<>|]+', "_", label[:60]) + ".txt"
    filepath = os.path.join(output_dir, filename)
    with open(filepath, "w", encoding="utf-8") as f:
        for header, summary in items:
            f.write(header + "\n")
            f.write(summary + "\n\n")

print(f"✅ Saved all groups to folder: {output_dir}")


✅ Saved all groups to folder: classified_articles
