In [None]:

!pip install --upgrade pandas pandasai



In [None]:
# PandasAI itself is not used below; only SmartDataframe and the OpenAI wrapper are needed
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm.openai import OpenAI

### PandasAI

In [None]:
### ENTER YOUR OPENAI KEY HERE
OPENAI_API_KEY = "YOUR-KEY-HERE"

llm = OpenAI(api_token=OPENAI_API_KEY)
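
As an alternative to pasting the key into the notebook, here is a minimal sketch that reads it from an environment variable. The helper name `load_api_key` and the variable name `OPENAI_API_KEY` are assumptions for illustration, not part of pandasai.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the environment variable `var_name`.

    Raises RuntimeError with a readable message when the variable is
    unset or contains only whitespace.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running this cell."
        )
    return key
```

With this in place, the cell above could call `OpenAI(api_token=load_api_key())`, which keeps the key out of the saved notebook.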

### Scenario 2

In [None]:
# load data into pandas dataframe
# other files with different scenarios - platelets_1, platelets_2
df_p2 = pd.read_excel("platelets_3.xlsx", index_col=[0])

# move the index back into a column and give the first two columns readable names
df_p2 = df_p2.reset_index()
df_p2 = df_p2.rename(columns={'index': 'Week', 'Unnamed: 1': 'Observation'})
df_p2.head(50)

| | Week | Observation | Tofacitinib 5 mg BID OL (N=225) |
|---|---|---|---|
| 0 | | | |
| 1 | Baseline | Result | |
| 2 | Baseline | N | 223 |
| 3 | Baseline | Mean (SE) | 316.87 (6.21) |
| 4 | Baseline | Median | 310 |
| 5 | Baseline | Min. Max | (135.0. 913.0) |
| 6 | Baseline | Q1 | 258 |
| 7 | Baseline | Q3 | 356 |
| 8 | Baseline | | |
| 9 | Week 2 | Result | |


In [None]:
df = SmartDataframe(df_p2.fillna(0), config={"llm": llm})

In [None]:
# extract different clinical trial arms
list_arms = [x for x in df_p2.columns.values if x not in ['Week','Observation']]
arm = list_arms[0] # there's only one clinical trial arm in the example table
text_output = df.chat("What is the Mean value for each Week for {0} under change from baseline ?".format(arm))

In [None]:
# generate prompts
prompt_template =\
"""
Given the following data from a clinical trial (from Table 35), where the first value reports the average platelet count for baseline week, and subsequent values reporting the change from baseline of mean value of platelet counts for each week (that the study participants were observed), identify the trend (you can directly use the values corresponding to each week - as they are obtained by taking a difference from the observation of baseline week) and give a comparative summary from baseline.

Data: {0}.

 Example: "Mean platelets initially increased from baseline at Week 2 and decreased compared to Week 18. The largest decrease was observed between Weeks 2 and 4. At Week 18, the mean (SE) change from baseline in platelets was -14.24 (4.62) (Table 28)"
""".format(text_output)
print(prompt_template)


Given the following data from a clinical trial (from Table 35), where the first value reports the average platelet count for baseline week, and subsequent values reporting the change from baseline of mean value of platelet counts for each week (that the study participants were observed), identify the trend (you can directly use the values corresponding to each week - as they are obtained by taking a difference from the observation of baseline week) and give a comparative summary from baseline.

Data: {'Baseline': '316.87 (6.21)', 'Week 2': '4.86 (4.70)', 'Week 4': '-25.06 (4.49)', 'Week 8': '-14.16 (5.02)', 'Week 12': '-13.09 (4.56)', 'Week 18': '-11.09 (4.56)'}. 
 
 Example: "Mean platelets initially increased from baseline at Week 2 and decreased compared to Week 18. The largest decrease was observed between Weeks 2 and 4. At Week 18, the mean (SE) change from baseline in platelets was -14.24 (4.62) (Table 28)"
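
The values in the Data line are strings of the form `mean (SE)`. If they need to be used numerically, a small parser along these lines may help. This is a sketch that assumes the extracted data is (or has been converted to) a dict shaped like the Data line above; `parse_mean_se` and `MEAN_SE` are hypothetical names.

```python
import re

# Matches strings such as "316.87 (6.21)" or "-25.06 (4.49)".
MEAN_SE = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*\(\s*(\d+(?:\.\d+)?)\s*\)\s*$")


def parse_mean_se(text):
    """Split a 'mean (SE)' string into a (mean, se) tuple of floats."""
    match = MEAN_SE.match(text)
    if match is None:
        raise ValueError(f"not a 'mean (SE)' value: {text!r}")
    return float(match.group(1)), float(match.group(2))


# Values copied from the Data line of the printed prompt.
data = {
    "Baseline": "316.87 (6.21)",
    "Week 2": "4.86 (4.70)",
    "Week 4": "-25.06 (4.49)",
    "Week 8": "-14.16 (5.02)",
    "Week 12": "-13.09 (4.56)",
    "Week 18": "-11.09 (4.56)",
}

# Convert every entry to a numeric (mean, se) pair, keeping the visit order.
parsed = {week: parse_mean_se(value) for week, value in data.items()}
print(parsed["Week 4"])
```

Raising on a non-matching string, rather than returning a default, makes a malformed extraction visible before it reaches the summary prompt.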



### Outputs

1. Scenario 1

  Based on the provided data, we can identify the trend and provide a comparative summary from baseline for the mean platelet counts. Here's a summary in the format you requested:

  "Mean platelet counts initially increased from baseline at Week 2 and continued to increase over the subsequent weeks. The largest increase was observed between Week 0 (Baseline) and Week 18. The trend in mean platelet counts over time is as follows:

  - Baseline: The average platelet count at baseline was 316.87 (SE 6.21).

  - Week 2: Mean platelet counts increased by 4.86 units from baseline (SE 4.70).

  - Week 4: Mean platelet counts increased by 9.86 units from baseline (SE 4.70).

  - Week 8: Mean platelet counts increased by 14.86 units from baseline (SE 4.70).

  - Week 12: Mean platelet counts increased by 17.86 units from baseline (SE 4.70).

  - Week 18: Mean platelet counts increased by 22.86 units from baseline (SE 4.70).

  The largest increase in mean platelet counts occurred between Week 0 (Baseline) and Week 18, where the mean (SE) change from baseline in platelets was +22.86 (4.70). The trend suggests a progressive increase in platelet counts over the course of the study."

  Please note that the standard error (SE) is provided to indicate the variability or uncertainty associated with each mean platelet count measurement. This trend suggests a positive response in mean platelet counts to the treatment or intervention being studied. However, further analysis and clinical interpretation would be necessary to fully understand the clinical implications of these changes.

---
2. Scenario 2

  Based on the provided data, we can identify the trend and provide a comparative summary from baseline for the mean platelet counts. Here's a summary in the format you requested:

  "Mean platelet counts initially decreased from baseline at Week 2 and continued to decrease over the subsequent weeks. The largest decrease was observed between Week 2 and Week 4. At Week 18, the mean (SE) change from baseline in platelets was -11.09 (4.56).

  - Baseline: The average platelet count at baseline was 316.87 (SE 6.21).

  - Week 2: Mean platelet counts decreased by -27.06 units from baseline (SE 4.49).

  - Week 4: Mean platelet counts decreased by -25.06 units from baseline (SE 4.49).

  - Week 8: Mean platelet counts decreased by -14.16 units from baseline (SE 5.02).

  - Week 12: Mean platelet counts decreased by -13.09 units from baseline (SE 4.56).

  - Week 18: Mean platelet counts decreased by -11.09 units from baseline (SE 4.56).

  The largest decrease in mean platelet counts occurred between Week 2 and Week 4, where the mean (SE) change from baseline in platelets was -25.06 (4.49). The trend suggests a progressive decrease in platelet counts over the course of the study."

  This trend indicates that platelet counts are decreasing over time, suggesting a potential response to the treatment or intervention being studied. However, further analysis and clinical interpretation are necessary to fully understand the clinical implications of these changes.

---
3. Scenario 3

  Based on the provided data, we can identify the trend and provide a comparative summary from baseline for the mean platelet counts. Here's a summary in the format you requested:

  "Mean platelet counts initially increased from baseline at Week 2 but then decreased compared to Week 4, and this decreasing trend continued until Week 18. The largest decrease was observed between Weeks 4 and 8. At Week 18, the mean (SE) change from baseline in platelets was -11.09 (4.56).

  - Baseline: The average platelet count at baseline was 316.87 (SE 6.21).

  - Week 2: Mean platelet counts increased by 4.86 units from baseline (SE 4.70).

  - Week 4: Mean platelet counts decreased by -25.06 units from baseline (SE 4.49).

  - Week 8: Mean platelet counts decreased by -14.16 units from baseline (SE 5.02).

  - Week 12: Mean platelet counts decreased by -13.09 units from baseline (SE 4.56).

  - Week 18: Mean platelet counts decreased by -11.09 units from baseline (SE 4.56).

  The largest decrease in mean platelet counts occurred between Week 4 and Week 8, where the mean (SE) change from baseline in platelets was -14.16 (5.02). The trend suggests an initial increase in platelet counts at Week 2, followed by a continuous decrease from Week 4 to Week 18."

  This trend indicates that platelet counts initially increased but then progressively decreased over the course of the study. It's important to interpret these findings in the context of the clinical trial and consult with a medical professional to understand the clinical implications.
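
Claims such as "the largest decrease was observed between Weeks X and Y" can be checked directly against the numbers. The sketch below does this for the change-from-baseline means shown in the Data line of the printed prompt, treating the baseline change as 0. The function name `largest_drop` is an assumption for illustration.

```python
def largest_drop(changes):
    """Return (from_visit, to_visit, delta) for the most negative step.

    `changes` maps visit label -> mean change from baseline, in visit order,
    and must contain at least two visits.
    """
    visits = list(changes)
    steps = [
        (visits[i], visits[i + 1], changes[visits[i + 1]] - changes[visits[i]])
        for i in range(len(visits) - 1)
    ]
    # The most negative difference between consecutive visits is the largest drop.
    return min(steps, key=lambda step: step[2])


# Mean change from baseline per visit, copied from the Data line;
# the baseline visit itself has a change of 0 by definition.
changes = {
    "Baseline": 0.0,
    "Week 2": 4.86,
    "Week 4": -25.06,
    "Week 8": -14.16,
    "Week 12": -13.09,
    "Week 18": -11.09,
}

start, end, delta = largest_drop(changes)
print(f"{start} -> {end}: {delta:+.2f}")  # Week 2 -> Week 4: -29.92
```

For this data the steepest drop falls between Week 2 and Week 4, and the values from Week 4 onward move back toward baseline, so a generated summary can be compared against both facts before it is accepted.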

### Loop for multiple trial arms

In [None]:
# extract different clinical trial arms
list_arms = [x for x in df_p2.columns.values if x not in ['Week','Observation']]

# create dictionary to store all the relevant filtered numbers
dict_relevant_numbers = {x:{} for x in list_arms}

# loop over all arms and extract the relevant rows
for arm in list_arms:
  text_values = df.chat("What is the Mean value for each Week for {0} under change from baseline ?".format(arm))
  dict_relevant_numbers[arm] = text_values


In [None]:
# generate prompts
prompt_template =\
"""
Given the following data from a clinical trial (from Table 35), where the first value reports the average platelet count for baseline week, and subsequent values reporting the change from baseline of mean value of platelet counts for each week (that the study participants were observed), identify the trend (you can directly use the values corresponding to each week - as they are obtained by taking a difference from the observation of baseline week) and give a comparative summary from baseline.

Data: {0}.

 Example: "Mean platelets initially increased from baseline at Week 2 and decreased compared to Week 18. The largest decrease was observed between Weeks 2 and 4. At Week 18, the mean (SE) change from baseline in platelets was -14.24 (4.62) (Table 28)"
""".format(dict_relevant_numbers)

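
The cell above formats the whole dictionary into a single prompt. When there are several arms, one prompt per arm may be easier to trace back to its summary. The following is a sketch under that assumption; `PROMPT_TEMPLATE` is an abbreviated stand-in for the full prompt text, and `build_prompts` and the `{arm}` / `{data}` placeholders are hypothetical names.

```python
# Abbreviated stand-in for the full prompt; `{arm}` and `{data}` are
# hypothetical placeholder names filled in per trial arm.
PROMPT_TEMPLATE = (
    "Given the following change-from-baseline data for the trial arm "
    "'{arm}', identify the trend and give a comparative summary from baseline.\n\n"
    "Data: {data}.\n"
)


def build_prompts(numbers_by_arm):
    """Return {arm: prompt}, with one prompt per trial arm."""
    return {
        arm: PROMPT_TEMPLATE.format(arm=arm, data=values)
        for arm, values in numbers_by_arm.items()
    }


# Arm name and values taken from the example table; in the notebook the
# argument would be dict_relevant_numbers.
example = {
    "Tofacitinib 5 mg BID OL (N=225)": {
        "Baseline": "316.87 (6.21)",
        "Week 2": "4.86 (4.70)",
    },
}

prompts = build_prompts(example)
for arm, prompt in prompts.items():
    print(prompt)
```

Keeping the result keyed by arm means each model response can be stored under the same key as the numbers it was generated from.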