## Running Inference via API

Run inference via API using OpenRouter (staying within the free tier).  

Examples of available models:  
`deepseek/deepseek-chat-v3-0324:free` : 685B  
`deepseek/deepseek-r1-0528:free` : 671B  
`deepseek/deepseek-r1:free` : 671B  
`qwen/qwen3-235b-a22b-07-25:free` : 235B  
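The set of `:free` variants may change over time. To check what is currently offered, one option is to filter the model list by the `:free` suffix. This is a minimal sketch that assumes an OpenAI-style response shape `{"data": [{"id": ...}, ...]}` from OpenRouter's `GET https://openrouter.ai/api/v1/models`; `free_model_ids` is a hypothetical helper and the sample payload is made up for illustration.

```python
from typing import Any, Dict, List


def free_model_ids(models_response: Dict[str, Any]) -> List[str]:
    """Return the sorted IDs of models whose ID ends with ':free'.

    Assumes an OpenAI-style payload: {"data": [{"id": "..."}, ...]}.
    """
    entries = models_response.get("data", [])
    return sorted(m["id"] for m in entries if m.get("id", "").endswith(":free"))


# Illustrative payload only (not real API output).
sample = {
    "data": [
        {"id": "deepseek/deepseek-r1:free"},
        {"id": "deepseek/deepseek-r1"},
        {"id": "qwen/qwen3-235b-a22b-07-25:free"},
    ]
}
print(free_model_ids(sample))
```

In practice the payload would come from an HTTP GET on that endpoint, for example with `urllib.request` or `requests`.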
  
Reference  
[【LLMは無料で使え！】OpenRouterのススメ【CLINEにも！】](https://zenn.dev/asap/articles/5cda4576fbe7cb) (in Japanese: "Use LLMs for free! Why OpenRouter is worth it (works with Cline too!)")

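The free tier appears to allow up to 1,000 requests per day, which should be enough for preliminary experiments. (According to OpenRouter's rate-limit documentation at the time of writing, the 1,000 requests/day limit for `:free` models applies to accounts that have purchased at least 10 credits; otherwise the limit is 50 requests/day.)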
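A notebook that loops over many personas can use up the daily quota quickly, so a small client-side guard may help. This is a minimal sketch: `DailyRequestBudget` is a hypothetical helper, the default limit simply mirrors the figure above, and counting days in UTC is an assumption about when the quota resets. The server-side limits remain authoritative.

```python
import datetime as dt
from typing import Callable, Optional


class DailyRequestBudget:
    """Client-side counter that stops issuing requests once a per-day limit is hit.

    The default limit (1,000) is an assumption based on the free-tier figure above.
    Days are counted in UTC via an injectable clock; the count lives in memory only,
    so it resets when the kernel restarts.
    """

    def __init__(self, limit: int = 1000,
                 today: Optional[Callable[[], dt.date]] = None) -> None:
        self.limit = limit
        self._today = today or (lambda: dt.datetime.now(dt.timezone.utc).date())
        self._day = self._today()
        self._used = 0

    def try_acquire(self) -> bool:
        """Consume one request slot; return False if today's budget is exhausted."""
        current = self._today()
        if current != self._day:  # a new (UTC) day has started: reset the counter
            self._day, self._used = current, 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True


# Usage (not executed here):
# budget = DailyRequestBudget()
# if budget.try_acquire():
#     response = client.chat.completions.create(model="deepseek/deepseek-r1:free", messages=msgs)
```

Injecting the clock keeps the day-rollover logic testable without waiting for midnight.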

#### Concerns

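Some aspects of API usage remain unclear (although the organizers' position is that free APIs are acceptable).
Please treat this content as a preliminary investigation, not as training-data generation.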


In [1]:
from IPython.display import display, HTML, Math, Markdown

### Registering the API key

Create a `.env` file and register the key as shown below. Alternatively, you can write the key directly in this notebook.

```
OPENROUTER_API_KEY=your_api_key_here
```
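If the key is missing, `os.environ.get` returns `None` and the problem only surfaces later as an authentication error from the API. A small guard can fail fast instead. This is a sketch with a hypothetical helper `require_env`, assuming the key is stored under `OPENROUTER_API_KEY` as above.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, failing fast if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to .env (and call load_dotenv()) or export it in your shell."
        )
    return value


# Usage after load_dotenv():
# api_key = require_env("OPENROUTER_API_KEY")
```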

In [2]:

import os
from dotenv import load_dotenv

from typing import List, Dict

load_dotenv()

# api_key = "your_api_key_here"
# api_key = os.environ.get("OPENAI_API_KEY")
# base_url = "https://api.openai.com/v1"

# api_key = "your_api_key_here"
api_key = os.environ.get("OPENROUTER_API_KEY")
base_url = "https://openrouter.ai/api/v1"

### Sanity check

OpenRouter appears to follow the OpenAI API specification, so the `openai` package can be used as-is (with `base_url` pointed at OpenRouter).
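To make "OpenAI-compatible" concrete, here is a stdlib-only sketch that builds the kind of chat-completions request the `openai` client sends: a POST to `{base_url}/chat/completions` with a Bearer token and a JSON body containing `model` and `messages`. It is a simplified illustration (no streaming, no optional OpenRouter headers); `build_chat_request` is a hypothetical helper and `sk-demo` is a placeholder key.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(base_url: str, api_key: str, model: str,
                       messages: List[Dict[str, str]]) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completions POST request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "https://openrouter.ai/api/v1", "sk-demo", "deepseek/deepseek-r1:free",
    [{"role": "user", "content": "Hello!"}],
)
print(req.full_url, req.get_method())
# Sending it would require a valid key and network access: urllib.request.urlopen(req)
```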

In [3]:
import openai

client = openai.OpenAI(api_key=api_key, base_url=base_url)

In [4]:
msgs:List[Dict[str, str]] = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Quick test call (DeepSeek-R1 via OpenRouter)
response = client.chat.completions.create(
    model="deepseek/deepseek-r1:free",
    messages=msgs,
)
print(response.choices[0].message.content)

Hello! If there is anything you need, please feel free to ask. I would be glad to help!


# Verification

Have the model create problems based on PersonaHub, with the target persona made explicit.  

`proj-persona/PersonaHub` (license: cc-by-nc-sa-4.0)  
https://huggingface.co/datasets/proj-persona/PersonaHub  

↓

Borrow a single sample from the Hugging Face dataset for testing.  

persona
```text
A scientist who specializes in virology and the study of ancient viruses. This persona is interested in the potential dangers of thawing permafrost and the revival of ancient viruses, and is likely to be part of a research team studying the potential impact of these viruses on human and animal health. This persona may also be familiar with the work of Jean-Michel Claverie, the lead author of the study.

```


Generate a problem based on this persona.


This time, we test with `deepseek/deepseek-r1:free` (671B).

In [None]:
user_prompt = (
    "Create a math problem related to the following persona:\n\n"
    "{persona}\n\n"
    "Note:\n\n"
    "1. The math problem should be challenging and involve advanced mathematical skills and knowledge. Only top talents can solve it correctly.\n"
    "2. You should make full use of the persona description to create the math problem to ensure that the math problem is unique and specific to the persona.\n"
    "3. First, consider **Topics** that require advanced mathematical skills closely related to persona descriptions and list the relevant **Mathematical skills**."
    " Then, drawing on your knowledge and experience of persona descriptions, create specialised master's- or doctoral-level problems that combine highly interrelated **Topics** and **Mathematical skills**.\n"
    "4. If the math problem contains more than one task, structure it so that each task relates to the others, with the most difficult task being completed last.\n"
    "5. The problem must be solved analytically, not numerically."
    # " Also, make sure that the conditions required for an analytical solution are included in the problem statement, and that the problem is presented as solvable.\n"
    " Also, make sure that the problem statement includes all the conditions necessary for an analytical solution and presents the problem as one that can be uniquely solved.\n"
    "6. Your response should always start with 'Math problem:'. Your response should not include a solution to the created math problem."
    " Do not include any information, such as 'notes', that is not necessary for solving the question.\n\n"
    # "4. The problem must be solved analytically, not numerically."
    # " Also, make sure that the conditions required for an analytical solution are included in the problem statement, and that the problem is presented as solvable.\n"
    # "4. Consider the advanced mathematical skills that are highly relevant to the persona description, and create specialised problems at master's or doctoral level that reflect these skills."
    # "4. Consider the advanced mathematical skills that are highly relevant to the persona description, and create specialised problems at leading researchers or doctoral-level level that reflect these skills.\n\n"
    # "5. First, consider <topics> that require advanced mathematical skills closely related to persona descriptions and list the relevant <mathematical skills>."
    # " Then, drawing on your knowledge and experience of persona descriptions, create specialised master's- or doctoral-level problems that combine highly interrelated <topics> and <mathematical skills>.\n"
    # "6. If your problem contains more than one task, structure it so that each task relates to the others, with the most difficult task being completed last.\n\n"
    
    # "The following issues should also be avoided when creating the math problem:\n"
    # "- Insufficient information: Problems may lack the essential details needed to solve them, leaving them incomplete or ambiguous. For example, a trigonometry question might omit the necessary angles or distances.\n"
    # "- Unsolvable or computationally intractable problems: Some problems are either unsolvable or require excessive brute-force calculations, which are impractical for evaluating reasoning abilities.\n"
    # "- Nonsensical problems: Models sometimes produce problems that are logically inconsistent, confusing or ambiguous, such as a probability issue with unclear parameters or an impossible geometry scenario."
    # " Inconsistent, confusing or ambiguous problems, such as a probability issue with unclear parameters or an impossible geometry scenario.\n"
    # # "- Deceitful Solutions: Occasionally, models fabricate solutions to nonsensical or unsolvable problems, presenting incorrect logic as plausible reasoning.\n"
)
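The cell above keeps many commented-out variants of the numbered notes. One way to iterate on them is to keep the notes in a list and number them programmatically, so that an instruction such as the **Topics**/**Mathematical Skills** note can be swapped in or out without renumbering. This is a minimal sketch: `build_problem_prompt` is a hypothetical helper and the notes are shortened versions of the ones used in this notebook.

```python
from typing import List


def build_problem_prompt(persona: str, notes: List[str]) -> str:
    """Assemble a persona-conditioned problem-generation prompt.

    Numbering is added automatically, so notes can be added or removed
    without renumbering them by hand.
    """
    numbered = "\n".join(f"{i}. {note}" for i, note in enumerate(notes, start=1))
    return (
        "Create a math problem related to the following persona:\n\n"
        f"{persona}\n\n"
        "Note:\n\n"
        f"{numbered}\n\n"
    )


notes = [
    "The math problem should be challenging and involve advanced mathematical skills and knowledge.",
    "First, consider **Topics** closely related to the persona and list the relevant **Mathematical Skills**.",
    "Your response should always start with 'Math problem:'.",
]
prompt = build_problem_prompt(
    "A scientist who specializes in virology and the study of ancient viruses.", notes
)
print(prompt)
```

Because the notes are inserted as values rather than being part of a `str.format` template, any literal braces inside a note would not need to be doubled.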

## 5. Let the model explore the relevant topics and mathematical skills on its own

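We want the model to first recall categories (topics) that demand advanced mathematics related to the domain (persona), then list concrete mathematical skills (math_skill), and finally combine them to make the problem harder.  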
↓  
R1-class models:  
* likely already have MATH-level data internalized
* likewise for the relationships between specialized topics and mathematical skills  
  
Supplying unnecessary information from outside to pin these down seems to act as a constraint instead, so we also leave the choice of topic × math_skill combinations to the model. 
  
Using placeholders for `topic` and `math_skill`, we try to have the model explore them in depth and combine them, to broaden the range of problems.

↓  
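Difficulty:  
Math skills: master's level or above  
Interdisciplinarity: integrates epidemiology, earth science, and stochastic processes (doctoral/research level)  
However, some revisions are still needed.  
The problem is split into multiple tasks.  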

In [40]:
persona_deiscription = "A scientist who specializes in virology and the study of ancient viruses."

# # p_seed = {"subject": "computer science", "group": "artificial intelligence", "category": "explainable AI", "job": "AIAI Robotics Researcher"}
# p_seed = {"subject": "engineering", "group": "biomedical engineering", "category": "medical imaging", "job": "Biomechanics Engineer"}
# persona_deiscription = f"A {p_seed['job']} who specializes in {p_seed['group']} and the study of {p_seed['category']}."

user_prompt = (
    "Create a math problem related to the following persona:\n\n"
    "{persona}\n\n"
    "Note:\n\n"
    "1. The math problem should be challenging and involve advanced mathematical skills and knowledge. Only top talents can solve it correctly.\n"
    "2. You should make full use of the persona description to create the math problem to ensure that the math problem is unique and specific to the persona.\n"
    "3. First, consider **Topics** that require advanced mathematical skills closely related to persona descriptions and list the relevant **Mathematical Skills**."
    " Then, drawing on your knowledge and experience of persona descriptions, create specialised master's- or doctoral-level problems that combine highly interrelated **Topics** and **Mathematical Skills**.\n"
    "4. If the math problem contains more than one task, structure it so that each task relates to the others, with the most difficult task being completed last.\n"
    "5. The problem must be solved analytically and must not require any numerical calculations."
    " Also, make sure that the problem statement includes all the conditions necessary for an analytical solution and presents the problem as one that can be uniquely solved.\n"
    "6. Your response should always start with 'Math problem:'. Your response should not include a solution to the created math problem."
    " Do not include any information, such as 'notes', that is not necessary for solving the question.\n\n"
)
print(user_prompt.format(persona=persona_deiscription))


Create a math problem related to the following persona:

A scientist who specializes in virology and the study of ancient viruses.

Note:

1. The math problem should be challenging and involve advanced mathematical skills and knowledge. Only top talents can solve it correctly.
2. You should make full use of the persona description to create the math problem to ensure that the math problem is unique and specific to the persona.
3. First, consider **Topics** that require advanced mathematical skills closely related to persona descriptions and list the relevant **Mathematical Skills**. Then, drawing on your knowledge and experience of persona descriptions, create specialised master's- or doctoral-level problems that combine highly interrelated **Topics** and **Mathematical Skills**.
4. If the math problem contains more than one task, structure it so that each task relates to the others, with the most difficult task being completed last.
5. The problem must be solved analytically and must 

In [22]:
''' Generate the problem '''

msgs:List[Dict[str, str]] = [
    {"role": "user", "content": user_prompt.format(persona=persona_deiscription)},
]

client = openai.OpenAI(api_key=api_key, base_url=base_url)

response = client.chat.completions.create(
    # model="deepseek/deepseek-r1-0528:free",
    model="deepseek/deepseek-r1:free",
    messages=msgs,
)
generated_problem = response.choices[0].message.content
Math(generated_problem)
# print(generated_problem)

<IPython.core.display.Math object>

In [23]:
generated_problem

'Math problem:  \n\nA virologist reconstructs the evolutionary dynamics of an ancient RNA virus using a coalescent model. The viral population is assumed to evolve according to a time-varying effective population size \\( N(t) = N_0 e^{-\\alpha t} \\), where \\( \\alpha > 0 \\), and mutations follow a Poisson process with rate \\( \\theta \\) per lineage per unit time. Observations from *n* contemporary viral strains are used to infer the time since the most recent common ancestor (TMRCA) and mutation rate.  \n\n**Tasks**:  \n1. Let \\( k \\) lineages exist at time \\( t = 0 \\). Derive the probability that at least \\( m \\) lineages (\\( 1 \\leq m \\leq k \\)) survive without coalescing until time \\( T \\), assuming coalescence occurs at rate \\( \\binom{j}{2}/N(t) \\) when there are \\( j \\) lineages.  \n\n2. Using the time-varying \\( N(t) \\), show that the expected TMRCA of the *n* strains satisfies the integral equation:  \n\\[\n\\mathbb{E}[T_{\\text{MRCA}}] = \\int_{0}^{\\inf
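Note 6 of the generation prompt requires the reply to start with 'Math problem:'. Before passing `generated_problem` to the evaluation step, it may be worth checking that format and stripping the prefix. A minimal sketch with a hypothetical `extract_problem` helper; the sample string is abbreviated from the output above. Separately, `Math(...)` renders its whole argument as a LaTeX math expression, so for replies that mix prose and formulas `Markdown(...)` is likely the better display choice.

```python
from typing import Optional


def extract_problem(reply: Optional[str], prefix: str = "Math problem:") -> Optional[str]:
    """Return the problem text after `prefix`, or None if the reply breaks the required format."""
    if not reply:
        return None
    stripped = reply.strip()
    if not stripped.startswith(prefix):
        return None
    return stripped[len(prefix):].strip()


sample = ("Math problem:  \n\nA virologist reconstructs the evolutionary dynamics "
          "of an ancient RNA virus using a coalescent model.")
print(extract_problem(sample))
```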

### Evaluation

In [24]:
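user_prompt = (
    "The following problem is a mathematics problem that demands a high level of expertise. Is it well-posed as an answerable mathematics problem? Also, explain in Japanese the mathematical skills required to solve it and the level of any other specialized knowledge needed (undergraduate, master's, doctoral, interdisciplinary, etc.).\n\n"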
    "{problem}\n\n"
)

# print(user_prompt.format(problem=generated_problem))

In [25]:

''' Evaluate the generated problem '''
msgs:List[Dict[str, str]] = [
    {"role": "user", "content": user_prompt.format(problem=generated_problem)},
]

client = openai.OpenAI(api_key=api_key, base_url=base_url)

response = client.chat.completions.create(
    model="deepseek/deepseek-r1:free",
    messages=msgs,
)
assessment_result = response.choices[0].message.content
Markdown(assessment_result)

This problem requires advanced mathematical expertise, but it can be analyzed task by task as follows.

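**Task 1**  
**Well-posedness**:  
It is indeed well-posed. The task is to derive the survival probability of lineages in a coalescent model under a time-varying effective population size \( N(t) \). It requires combining a non-homogeneous Poisson process with integrals of the coalescence rate, and it can be solved rigorously using the theory of stochastic processes.

**Required skills**:  
- **Mathematics**: non-homogeneous Poisson processes, integral transforms, combinatorial probability.  
- **Domain knowledge**: fundamentals of coalescent theory and the treatment of time-varying population sizes.  
- **Level**: doctoral level (at the intersection of probability theory and population genetics).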

---

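**Task 2**  
**Well-posedness**:  
Deriving and simplifying the given integral equation is feasible as an extension of coalescent theory. However, when analyzing the behavior as \( \alpha \to 0 \), consistency with the steady state (constant population size) must be checked.

**Outline of the derivation**:  
1. Map coalescent events onto the standard coalescent process via the time change \( \tau(t) = \int_0^t \frac{1}{N(s)}ds \).  
2. Consistency check: show that the integral equation correctly represents the expectation \( \mathbb{E}[T_{\text{MRCA}}] \).  
3. For \( N(t) = N_0 e^{-\alpha t} \), evaluate the time integral analytically and show that it agrees with the classical constant-population-size result as \( \alpha \to 0 \).

**Required skills**:  
- **Mathematics**: integral transforms, asymptotic analysis, differential equations.  
- **Domain knowledge**: computing the expected TMRCA under a time-varying population size.  
- **Level**: master's to doctoral level (applications of stochastic differential equations).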

---

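**Task 3**  
**Well-posedness**:  
Constructing the likelihood function for the mutation process is standard. The MLE can be derived using the properties of the Poisson process and integrals over branch times.

**Outline of the derivation**:  
1. Model the occurrence of mutations along the branch lengths \( L(t) \) of the reconstructed phylogenetic tree as a Poisson process.  
2. Write down the likelihood function \( \mathcal{L}(\theta) \) and derive the maximum likelihood estimator \( \hat{\theta} = \frac{k}{\int_0^{T_{\text{MRCA}}} L(t)dt} \).  
3. State the underlying assumptions explicitly (infinite-sites assumption, neutral evolution, accurate branch lengths).

**Required skills**:  
- **Mathematics**: likelihood functions of Poisson processes, maximum likelihood estimation.  
- **Domain knowledge**: phylogenetic analysis and molecular evolution models.  
- **Level**: advanced undergraduate to master's level (fundamentals of biostatistics).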

---

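**Overall assessment of the required expertise**:  
- **Mathematical requirements**: probability theory (non-homogeneous Poisson processes, integral equations), statistics (maximum likelihood estimation), calculus.  
- **Domain knowledge**: coalescent theory from population genetics, phylogenetic evolution models, dynamics of viral evolution.  
- **Required education level**: doctoral level for Tasks 1 and 2, master's level for Task 3. Because the problem is interdisciplinary, a deep understanding of both mathematics and biology is needed.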
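As a hand check of the time change in the Task 2 outline (worked out here, not taken from the model output): with \( N(t) = N_0 e^{-\alpha t} \) the integral has a closed form, and since \( j \) lineages coalesce at rate \( \binom{j}{2}/N(t) \), the probability that none of them coalesce before \( T \) follows directly. This is only one building block for Task 1, not a full solution.

```latex
\begin{aligned}
\tau(t) &= \int_0^t \frac{ds}{N(s)} = \frac{1}{N_0}\int_0^t e^{\alpha s}\,ds = \frac{e^{\alpha t}-1}{\alpha N_0},
\qquad \lim_{\alpha \to 0}\tau(t) = \frac{t}{N_0},\\
P\big(\text{no coalescence among } j \text{ lineages in } [0,T]\big)
&= \exp\!\left(-\binom{j}{2}\,\tau(T)\right)
 = \exp\!\left(-\binom{j}{2}\frac{e^{\alpha T}-1}{\alpha N_0}\right).
\end{aligned}
```

As \( \alpha \to 0 \) the second line reduces to \( \exp\big(-\binom{j}{2}T/N_0\big) \), the constant-size coalescent result, which is the consistency check the evaluation asks for.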

### Revision

In [38]:
user_prompt = (
    "Solve the following problems and confirm whether they can be answered analytically as mathematical problems.\n"
    "{problem}\n\n"
    "- The problem must be solved analytically and must not require any numerical calculations.\n"
    "- If there are any deficiencies in the problem, such as insufficient conditions, revise the problem statement"
    " so that a unique answer can be obtained analytically, without the need for numerical calculation. Then, create a **Revised Problem**."
    # "- If there are deficiencies in the problem, such as insufficient conditions, revise the problem statement"
    # " so that a unique answer can be obtained analytically, and create a **Revised Problem**."
    " Also, explain the revisions: output the **Revised Points**.\n"
    "- If you have revised the question, please submit your answer to the revised question as your **Final Answer**."
    " If you have not revised the question, please submit your answer to the original question as your **Final Answer**.\n"
    # "- Please store only the answer in the **Final Answer** and do not include any unnecessary tags or explanations."
    "- Make sure the **Final Answer** contains only the answer, written in LaTeX."
    " If there are multiple answers, list them in the **Final Answer** in the form [Answer 1, Answer 2, ...].\n"
    # " If there are multiple numerical answers, write them as a comma separated list (n1, n2, ...).\n"
    "- Respond only in the following format. Do not include any other text.\n"
    "json format:\n"
    '{{"revised_problem": "**Revised Problem**", "revise_point": "**Revised Points**", "answer": "**Final Answer**"}}'
)

print(user_prompt.format(problem=generated_problem))

Solve the following problems and confirm whether they can be answered analytically as mathematical problems.
Math problem:  

A virologist reconstructs the evolutionary dynamics of an ancient RNA virus using a coalescent model. The viral population is assumed to evolve according to a time-varying effective population size \( N(t) = N_0 e^{-\alpha t} \), where \( \alpha > 0 \), and mutations follow a Poisson process with rate \( \theta \) per lineage per unit time. Observations from *n* contemporary viral strains are used to infer the time since the most recent common ancestor (TMRCA) and mutation rate.  

**Tasks**:  
1. Let \( k \) lineages exist at time \( t = 0 \). Derive the probability that at least \( m \) lineages (\( 1 \leq m \leq k \)) survive without coalescing until time \( T \), assuming coalescence occurs at rate \( \binom{j}{2}/N(t) \) when there are \( j \) lineages.  

2. Using the time-varying \( N(t) \), show that the expected TMRCA of the *n* strains satisfies the in
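The prompt above asks for a JSON object, but the reply may arrive wrapped in a ```json fence or with extra text, so `json.loads` on the raw reply can fail. LaTeX inside JSON strings is a further trap: `\(` is not a valid JSON escape and raises an error, while `\frac` or `\theta` silently decode to form-feed or tab characters. A defensive parsing sketch follows; `parse_revision_reply` is a hypothetical helper, and the required keys are taken from the format string above.

```python
import json
from typing import Any, Dict

REQUIRED_KEYS = ("revised_problem", "revise_point", "answer")


def parse_revision_reply(reply: str) -> Dict[str, Any]:
    """Extract and validate the JSON object requested by the revision prompt.

    Takes the span from the first '{' to the last '}' so that ```json fences or
    stray text around the object are ignored. Raises ValueError (json.JSONDecodeError
    is a subclass) if no valid object is found or a required key is missing.
    """
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the reply")
    data = json.loads(reply[start:end + 1])
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


reply = '```json\n{"revised_problem": "...", "revise_point": "...", "answer": "[1, 2]"}\n```'
print(parse_revision_reply(reply)["answer"])
```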

In [39]:
msgs:List[Dict[str, str]] = [
    {"role": "user", "content": user_prompt.format(problem=generated_problem)},
]

client = openai.OpenAI(api_key=api_key, base_url=base_url)

response = client.chat.completions.create(
    model="deepseek/deepseek-r1:free",
    messages=msgs,
)
assessment_result = response.choices[0].message.content
Markdown(assessment_result)

JSONDecodeError: Expecting value: line 1291 column 1 (char 7095)

In [37]:
assessment_result

''
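The run above failed with a `JSONDecodeError` and left `assessment_result` empty, which suggests a malformed or empty response from the free endpoint. One mitigation is to retry with backoff. This is a minimal sketch assuming the failure surfaces as `json.JSONDecodeError`, as in the traceback above; `call_with_retry` is a hypothetical helper, and the `sleep` parameter is injectable so the logic can be tested without waiting.

```python
import json
import time
from typing import Callable, Optional


def call_with_retry(fn: Callable[[], Optional[str]], attempts: int = 3,
                    base_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call `fn` until it returns non-empty text, with exponential backoff.

    Retries when `fn` raises json.JSONDecodeError or returns empty/None content;
    any other exception propagates immediately. Raises the last error if every
    attempt fails.
    """
    last_error: Exception = RuntimeError("no attempts were made")
    for i in range(attempts):
        try:
            content = fn()
            if content and content.strip():
                return content
            last_error = RuntimeError("empty response content")
        except json.JSONDecodeError as exc:
            last_error = exc
        if i < attempts - 1:
            sleep(base_delay * 2 ** i)  # 2 s, 4 s, 8 s, ... with the defaults
    raise last_error


# Usage with the client defined above (not executed here):
# def ask() -> Optional[str]:
#     response = client.chat.completions.create(model="deepseek/deepseek-r1:free", messages=msgs)
#     return response.choices[0].message.content
# assessment_result = call_with_retry(ask)
```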