# FIN411X ChatGPT Co‑Tutor Portfolio Project  
**Replication package README** (`Readme.ipynb`)

This notebook documents the structure of the replication package for the study on an AI‑assisted, Python‑based CAPM portfolio project in FIN411X (Investment Analysis). It explains:

- how the repository is organised  
- which data files are included  
- what each notebook does  
- how to reproduce all tables and figures reported in the article

You can read this notebook directly on GitHub or open it in Jupyter / Google Colab. All paths below are given relative to the root folder of the replication package.


## 1. Repository layout

In the ZIP file supplied with the manuscript, the top‑level folder is:

```text
Replication_Package/
  Readme.ipynb                  <-- this notebook (to be added at repo root)
  code_capm/
    capm_portfolio_project.ipynb
    portfolio_analysis.ipynb
  code_stats/
    FIN411X_Stats_Analysis.ipynb
    FIN411X_Make_Tables_and_Figures.ipynb
    run_FIN411X_–_STATS_SCRIPT.ipynb
  data/
    chatgpt_survey_instrument.pdf
    chatgpt_survey_raw_id.csv
    summary_all_portfolios.csv
  documentation/
    ChatGPT Prompt Library for CAPM & Fixed‑Weight Portfolio (FIN411X Project).docx
    python_code_notes.docx
  figures/
    ai_literacy_vs_performance_scatterplot.png
    chatgpt_scales_boxplot.png
    portfolio_returns_histogram.png
  output/
    fin411x_correlations.csv
    fin411x_descriptives.csv
  tables/
    table1_portfolio_performance.csv
    table2_ai_scales_reliability.csv
    table3_A2_frequency.csv
    table3_A3_frequency.csv
    table3_A5_frequency.csv
    table3_E1_frequency.csv
    table4_correlations_matrix.csv
    table5_ai_scales_vs_performance.csv
```

The core artefacts for replication are located in the `code_capm/`, `code_stats/`, `data/`, `tables/`, and `figures/` folders.


## 2. Software requirements

The notebooks were designed to run either:

- in **Google Colab** (recommended for quick use), or  
- in a local **Jupyter** environment (e.g., Anaconda, VS Code, JupyterLab).

All analyses use standard, widely available Python packages.

### 2.1 Python and libraries

- Python 3.9 or later  
- `numpy`  
- `pandas`  
- `yfinance` (for downloading daily price data from Yahoo Finance)  
- `statsmodels` (for CAPM regressions)  
- `scipy` (for Cronbach’s alpha and correlations)  
- `matplotlib` (for figures)  
- `quantstats` (optional; used for some portfolio diagnostics)

In a local environment you can install these with, for example:

```bash
pip install numpy pandas yfinance statsmodels scipy matplotlib quantstats
```

> **Note.** In Google Colab, most packages are pre‑installed; the notebooks include `pip` cells where an additional install is needed.
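
Before running the notebooks, it can help to confirm that the packages listed above are importable. The following is a minimal sketch using only the standard library; the helper name `find_missing` is hypothetical and is not part of the replication package.

```python
import importlib.util

# Import names of the packages listed in this section; quantstats is optional.
REQUIRED = ["numpy", "pandas", "yfinance", "statsmodels", "scipy", "matplotlib"]
OPTIONAL = ["quantstats"]

def find_missing(names):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print("Missing required packages:", find_missing(REQUIRED) or "none")
print("Missing optional packages:", find_missing(OPTIONAL) or "none")
```

Any name reported as missing can be installed with the `pip` command shown earlier in this section.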


## 3. Data files (`data/`)

All data files used in the article’s quantitative analyses are included in the `data/` folder.

### 3.1 `summary_all_portfolios.csv`

- One row per student portfolio (N = 30).  
- Key variables:
  - `student_id` (pseudonymous ID S01–S30)  
  - `portfolio_id` (P01–P30)  
  - `final_value_aed` (final portfolio value, AED)  
  - `pnl_aed` (absolute profit/loss vs. AED 600,000 initial capital)  
  - `total_return_pct` (total portfolio return in %)  
  - `alpha_annual` (annualised CAPM alpha)  
  - `beta` (CAPM beta)

This file is the primary source for all portfolio‑level analyses (RQ1) and for linking performance to survey responses (RQ2).

> For **exact replication** of the article, you normally do **not** need to rerun the CAPM portfolio notebook; you can work directly from this CSV.
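
As a quick sanity check on this file, the sketch below tests whether `pnl_aed` and `total_return_pct` agree with `final_value_aed` and the AED 600,000 initial capital. It assumes that total return is defined as P&L divided by initial capital, which this README does not state explicitly. The rows are made up for illustration and `check_summary` is a hypothetical helper.

```python
import pandas as pd

INITIAL_CAPITAL_AED = 600_000  # initial capital stated in this README

def check_summary(df, tol=0.01):
    """Return rows whose P&L or return disagrees with the final value."""
    expected_pnl = df["final_value_aed"] - INITIAL_CAPITAL_AED
    # Assumed definition: total return (%) = P&L / initial capital * 100.
    expected_ret = expected_pnl / INITIAL_CAPITAL_AED * 100
    bad = ((df["pnl_aed"] - expected_pnl).abs() > tol) | (
        (df["total_return_pct"] - expected_ret).abs() > tol
    )
    return df[bad]

# Made-up rows for illustration only; not values from the study.
demo = pd.DataFrame({
    "student_id": ["S01", "S02"],
    "portfolio_id": ["P01", "P02"],
    "final_value_aed": [630_000.0, 588_000.0],
    "pnl_aed": [30_000.0, -12_000.0],
    "total_return_pct": [5.0, -2.0],
})
print(check_summary(demo))  # an empty DataFrame means no inconsistencies
```

With the real file, the call would be `check_summary(pd.read_csv("data/summary_all_portfolios.csv"))`. The tolerance may need loosening if the file stores rounded values.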

### 3.2 `chatgpt_survey_raw_id.csv`

- Item‑level responses to the *AI Literacy and Metacognition: ChatGPT Survey*.  
- Includes the same pseudonymous IDs used in `summary_all_portfolios.csv`. Note that the column is named `Student_id` here (capital S) but `student_id` in the summary file.  
- Columns cover:
  - usage indicators (Section A)  
  - learning & confidence items (Section B)  
  - metacognitive strategies items (Section C)  
  - AI literacy / critical awareness items (Section D)  
  - behavioural and open‑ended items (Sections E and F).

The `FIN411X_Stats_Analysis` and `FIN411X_Make_Tables_and_Figures` notebooks use this file to construct the `B_scale`, `C_scale`, and `D_scale` variables.

### 3.3 `chatgpt_survey_instrument.pdf`

- PDF of the full survey instrument for reference.  
- This file is **not** used directly in the code, but documents the wording and ordering of all items.


## 4. Notebooks and their roles

This section summarises the purpose and inputs/outputs of each notebook.

### 4.1 CAPM / portfolio notebooks (`code_capm/`)

#### `capm_portfolio_project.ipynb`

This notebook mirrors the student workflow for the CAPM portfolio project. It:

- sets up the Python environment (imports `numpy`, `pandas`, `yfinance`, `statsmodels`, and, optionally, `quantstats`),  
- downloads daily price data for an assigned 10‑asset portfolio and a benchmark index using `yfinance`,  
- computes daily asset returns and constructs the portfolio’s daily value based on fixed weights,  
- calculates daily portfolio returns and corresponding benchmark returns,  
- builds excess return series for the portfolio and benchmark using a specified risk‑free rate,  
- estimates CAPM alpha and beta via ordinary least squares (OLS) regression,  
- computes summary performance indicators (final value, P&L, total return, alpha, beta), and  
- writes a **one‑row CSV summary** for the chosen `student_id` / `portfolio_id` combination.

In the teaching context, each student ran this notebook once for their assigned portfolio. For replication, you can either:

- use the already‑aggregated `summary_all_portfolios.csv` in `data/` (recommended), or  
- rerun this notebook for new or alternative portfolios.

> Because price data are retrieved from Yahoo Finance at runtime, rerunning this notebook in future may yield slightly different CAPM estimates if the data provider revises historical prices. The article’s results are tied to the snapshot captured in `summary_all_portfolios.csv`.
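
The core calculation described above can be sketched on synthetic data as follows. This is not the notebook's code. It uses simulated prices instead of `yfinance` downloads, `numpy` least squares instead of `statsmodels`, and assumes a buy-and-hold reading of "fixed weights", a hypothetical 3% annual risk-free rate, and 252 trading days for annualisation.

```python
import numpy as np

rng = np.random.default_rng(0)        # synthetic data, for illustration only
n_days, n_assets = 250, 10
TRADING_DAYS = 252                    # assumed annualisation factor
rf_daily = 0.03 / TRADING_DAYS        # hypothetical 3% annual risk-free rate

# Simulated daily returns and prices (stand-ins for downloaded price data).
bench_ret = rng.normal(0.0004, 0.01, n_days)
asset_ret = 0.9 * bench_ret[:, None] + rng.normal(0.0, 0.01, (n_days, n_assets))
prices = 100 * np.cumprod(1 + asset_ret, axis=0)

# Buy-and-hold with fixed initial weights: shares are set on the first day.
weights = np.full(n_assets, 1 / n_assets)
initial_capital = 600_000
shares = weights * initial_capital / prices[0]
port_value = prices @ shares
port_ret = port_value[1:] / port_value[:-1] - 1

# Excess returns over the risk-free rate, aligned on the same days.
y = port_ret - rf_daily
x = bench_ret[1:] - rf_daily

# OLS of portfolio excess return on benchmark excess return: y = alpha + beta * x.
X = np.column_stack([np.ones_like(x), x])
(alpha_daily, beta), *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_annual = alpha_daily * TRADING_DAYS

print(f"beta = {beta:.3f}, annualised alpha = {alpha_annual:.4f}")
```

The intercept of the regression is the daily alpha and the slope is beta. Multiplying the daily alpha by the number of trading days is one common annualisation convention; the notebook may use a different one.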

#### `portfolio_analysis.ipynb`

This instructor‑facing notebook aggregates multiple one‑row portfolio summaries into a single file. It:

- reads individual portfolio summary CSVs (one per student),  
- concatenates them into a single DataFrame,  
- standardises column names and ordering, and  
- exports `summary_all_portfolios.csv` (same structure as the file bundled in `data/`).

For most users, this notebook is **optional**: you only need it if you wish to regenerate `summary_all_portfolios.csv` from a set of individual portfolio files.
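
The aggregation step can be sketched with `pandas` as shown here. The file-name pattern `summary_S*.csv` and the helper names are assumptions for illustration, and the demonstration rows are made up.

```python
import glob
import os
import tempfile

import pandas as pd

COLUMNS = ["student_id", "portfolio_id", "final_value_aed", "pnl_aed",
           "total_return_pct", "alpha_annual", "beta"]

def load_one(path):
    """Read a one-row summary and normalise its column names."""
    frame = pd.read_csv(path)
    frame.columns = [c.strip().lower() for c in frame.columns]
    return frame

def aggregate_summaries(folder, pattern="summary_S*.csv"):
    """Stack all matching one-row summaries into a single DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder!r}")
    combined = pd.concat([load_one(p) for p in paths], ignore_index=True)
    return combined[COLUMNS].sort_values("student_id").reset_index(drop=True)

# Demonstration with two made-up summaries written to a temporary folder.
demo_dir = tempfile.mkdtemp()
rows = [
    ("S02", "P02", 588_000.0, -12_000.0, -2.0, -0.01, 0.8),
    ("S01", "P01", 630_000.0, 30_000.0, 5.0, 0.02, 1.1),
]
for row in rows:
    pd.DataFrame([row], columns=COLUMNS).to_csv(
        os.path.join(demo_dir, f"summary_{row[0]}.csv"), index=False)

summary = aggregate_summaries(demo_dir)
summary.to_csv(os.path.join(demo_dir, "summary_all_portfolios.csv"), index=False)
print(summary)
```

Normalising column names before concatenation avoids duplicate columns when individual files differ only in capitalisation or stray whitespace.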


### 4.2 Statistics, tables, and figures (`code_stats/`)

#### `FIN411X_Stats_Analysis.ipynb`

This notebook performs the main quantitative analyses reported in the Results section. Specifically, it:

1. Loads `data/summary_all_portfolios.csv` (portfolio performance).  
2. Loads `data/chatgpt_survey_raw_id.csv` (survey responses with `Student_id`).  
3. Cleans ID fields and merges the two datasets on student ID (S01–S30).  
4. Constructs composite scale scores:
   - `B_scale` – learning & confidence (Section B items),  
   - `C_scale` – metacognitive strategies (Section C items),  
   - `D_scale` – AI literacy / critical awareness (Section D items).  
5. Computes Cronbach’s alpha for each scale.  
6. Produces descriptive statistics for:
   - portfolio indicators (final value, P&L, total return, alpha, beta), and  
   - the three AI‑related scales.  
7. Computes Pearson correlation matrices for:
   - portfolio performance indicators, and  
   - relationships between performance and AI‑related scales.
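
Steps 3 and 4 can be illustrated with the sketch below. The item column names (`B1`, `D1`, and so on), the use of item means as scale scores, and the data are assumptions made for the example; the notebook defines the actual item lists and scoring.

```python
import pandas as pd

# Made-up rows for illustration; item column names are assumed.
perf = pd.DataFrame({"student_id": ["S01", "S02", "S03"],
                     "total_return_pct": [5.0, -2.0, 1.5]})
survey = pd.DataFrame({"Student_id": [" s01", "S02 ", "s03"],
                       "B1": [4, 5, 3], "B2": [5, 4, 3],
                       "D1": [3, 4, 2], "D2": [4, 4, 3]})

def clean_id(series):
    """Strip whitespace and upper-case IDs so that ' s01' matches 'S01'."""
    return series.astype(str).str.strip().str.upper()

perf["student_id"] = clean_id(perf["student_id"])
survey["student_id"] = clean_id(survey.pop("Student_id"))

# validate="one_to_one" raises an error if an ID is duplicated on either side.
df = perf.merge(survey, on="student_id", how="inner", validate="one_to_one")

# Composite scale = mean of the items in that section (assumed scoring rule).
df["B_scale"] = df[["B1", "B2"]].mean(axis=1)
df["D_scale"] = df[["D1", "D2"]].mean(axis=1)
print(df[["student_id", "total_return_pct", "B_scale", "D_scale"]])
```

Checking `len(df)` after the merge is a simple way to confirm that no student was dropped because of an ID mismatch.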

Key outputs:

- Printed tables of descriptive statistics and correlation matrices.  
- Two CSV files saved to the working directory:
  - `fin411x_descriptives.csv`  
  - `fin411x_correlations.csv`  

These CSVs are also provided in the `output/` folder for convenience.
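
For readers unfamiliar with the reliability and correlation steps, here is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the summed score), together with a Pearson correlation matrix from `pandas`. The function name and the data are hypothetical, and the notebook's own implementation may differ in detail.

```python
import pandas as pd

def cronbach_alpha(items):
    """Cronbach's alpha for a DataFrame with one column per scale item."""
    items = items.dropna()                       # listwise deletion of missing rows
    k = items.shape[1]                           # number of items in the scale
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed score
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Made-up Likert-style responses for three items of one scale.
items = pd.DataFrame({
    "C1": [1, 2, 3, 4, 5],
    "C2": [2, 2, 3, 5, 5],
    "C3": [1, 3, 3, 4, 4],
})
alpha = cronbach_alpha(items)
print(f"Cronbach's alpha = {alpha:.3f}")

# Pearson correlations between all pairs of columns.
corr = items.corr(method="pearson")
print(corr.round(2))
```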

#### `FIN411X_Make_Tables_and_Figures.ipynb`

This notebook generates publication‑ready tables and figures from the processed data. It:

- (Re)loads `summary_all_portfolios.csv` and `chatgpt_survey_raw_id.csv`,  
- ensures consistency of ID fields and scale construction (B_scale, C_scale, D_scale),  
- formats and exports the main tables reported in the article (Tables 1–5) as CSVs in the `tables/` folder, and  
- creates the key figures, saving them as PNGs in the `figures/` folder.

Concretely, it produces:

- **Tables (CSV):**  
  - `tables/table1_portfolio_performance.csv`  
  - `tables/table2_ai_scales_reliability.csv`  
  - `tables/table3_A2_frequency.csv`, `tables/table3_A3_frequency.csv`, `tables/table3_A5_frequency.csv`, `tables/table3_E1_frequency.csv`  
  - `tables/table4_correlations_matrix.csv`  
  - `tables/table5_ai_scales_vs_performance.csv`  

- **Figures (PNG):**  
  - `figures/portfolio_returns_histogram.png` (distribution of total returns)  
  - `figures/ai_literacy_vs_performance_scatterplot.png` (AI literacy vs. total return)  
  - `figures/chatgpt_scales_boxplot.png` (distribution of the three AI‑related scales)

These files correspond directly to the tables and figures cited in the Results section.
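
The Table 3 files are frequency tables for individual survey items. A table of this kind could be built roughly as follows. The response categories, the output column names, and the helper `frequency_table` are assumptions for illustration rather than the notebook's actual code.

```python
import pandas as pd

def frequency_table(series):
    """Count each response and express it as a percentage of valid answers."""
    counts = series.value_counts().sort_index()
    table = counts.rename_axis("response").reset_index(name="n")
    table["percent"] = (100 * table["n"] / table["n"].sum()).round(1)
    return table

# Made-up responses to a hypothetical usage item.
responses = pd.Series(["Weekly", "Daily", "Weekly", "Never", "Weekly"], name="A2")
table = frequency_table(responses)
print(table)
```

The result can then be exported with `table.to_csv(..., index=False)` in the same way as the other tables.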

#### `run_FIN411X_–_STATS_SCRIPT.ipynb`

This small helper notebook focuses on regenerating the main figures from an already‑merged DataFrame `df` (as produced by `FIN411X_Stats_Analysis`). It:

- assumes `df` is available in memory,  
- uses `matplotlib` to draw:
  - the AI literacy vs. total return scatterplot,  
  - the boxplot of ChatGPT‑related scales, and  
  - the histogram of portfolio total returns, and  
- saves each figure to the `figures/` folder.

If you run `FIN411X_Make_Tables_and_Figures.ipynb`, you normally do **not** need this helper notebook; it is included for transparency.
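
The plotting pattern used for these figures can be sketched as follows. The data are made up, the styling choices are arbitrary, and the figures are written to a temporary folder rather than `figures/` so that the bundled PNGs are not overwritten.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up merged data standing in for `df`; not values from the study.
df = pd.DataFrame({
    "total_return_pct": [5.0, -2.0, 1.5, 3.2, -0.7, 4.1],
    "D_scale": [3.5, 4.0, 2.5, 3.8, 3.1, 4.2],
})
out_dir = tempfile.mkdtemp()  # replace with "figures" in the real package

# Histogram of portfolio total returns.
fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(df["total_return_pct"], bins=5, edgecolor="black")
ax.set_xlabel("Total return (%)")
ax.set_ylabel("Number of portfolios")
hist_path = os.path.join(out_dir, "portfolio_returns_histogram.png")
fig.savefig(hist_path, dpi=300, bbox_inches="tight")
plt.close(fig)

# Scatterplot of the AI literacy scale against total return.
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(df["D_scale"], df["total_return_pct"])
ax.set_xlabel("D_scale (AI literacy)")
ax.set_ylabel("Total return (%)")
scatter_path = os.path.join(out_dir, "ai_literacy_vs_performance_scatterplot.png")
fig.savefig(scatter_path, dpi=300, bbox_inches="tight")
plt.close(fig)
```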


## 5. Replication workflow

This section summarises the minimum steps needed to reproduce the quantitative results, tables, and figures from the article.

### 5.1 Quick replication (recommended)

If your goal is to reproduce the reported tables and figures **exactly**, you can work directly from the bundled CSV files:

1. **Obtain the repository**  
   - Download / clone the `chatgpt-cotutor-finance` repository (or the `Replication_Package` folder) to your machine, **or**  
   - Upload the entire `Replication_Package` folder to Google Drive and open it in Google Colab.

2. **Run the stats analysis notebook**  
   - Open `code_stats/FIN411X_Stats_Analysis.ipynb`.  
   - Make sure both `data/summary_all_portfolios.csv` and `data/chatgpt_survey_raw_id.csv` are accessible (in Colab, upload or mount Drive as indicated in the notebook).  
   - Run all cells from top to bottom.  
   - This will recompute descriptive statistics, reliability indices, and correlation matrices and save:
     - `fin411x_descriptives.csv`  
     - `fin411x_correlations.csv`  

3. **Generate tables and figures**  
   - Open `code_stats/FIN411X_Make_Tables_and_Figures.ipynb`.  
   - Run all cells from top to bottom.  
   - The notebook will:
     - (re)build B_scale, C_scale, D_scale,  
     - format and export Tables 1–5 to the `tables/` folder, and  
     - export the main figures to the `figures/` folder.

At this point, the contents of `tables/` and `figures/` should match the tables and figures reported in the manuscript, apart from differences in rounding or display formatting.
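
To confirm that every expected output exists after these steps, a small check like the one below can be run from the root of the replication folder. The file list is taken from the repository layout in §1; the helper name `missing_outputs` is hypothetical.

```python
from pathlib import Path

# Output files listed in the repository layout of this README.
EXPECTED = [
    "tables/table1_portfolio_performance.csv",
    "tables/table2_ai_scales_reliability.csv",
    "tables/table3_A2_frequency.csv",
    "tables/table3_A3_frequency.csv",
    "tables/table3_A5_frequency.csv",
    "tables/table3_E1_frequency.csv",
    "tables/table4_correlations_matrix.csv",
    "tables/table5_ai_scales_vs_performance.csv",
    "figures/ai_literacy_vs_performance_scatterplot.png",
    "figures/chatgpt_scales_boxplot.png",
    "figures/portfolio_returns_histogram.png",
]

def missing_outputs(root="."):
    """Return the expected files that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]

print("Missing outputs:", missing_outputs(".") or "none")
```

This only checks that the files are present; it does not compare their contents with the published values.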

### 5.2 Optional: Regenerating portfolio summaries

If you wish to regenerate `summary_all_portfolios.csv` from scratch (for example, to extend the project or explore alternative portfolios):

1. Open `code_capm/capm_portfolio_project.ipynb`.  
2. Set `student_id` and `portfolio_id` as described in the notebook.  
3. Run all cells to download prices, compute CAPM statistics, and write a one‑row summary CSV for that portfolio.  
4. Repeat as needed for additional portfolios.  
5. Open `code_capm/portfolio_analysis.ipynb` to aggregate individual portfolio summary files into a new `summary_all_portfolios.csv`.  
6. Move this new CSV into the `data/` folder (or adjust the paths in the stats notebooks accordingly).  
7. Rerun the stats and table/figure notebooks as in §5.1.

> **Caution.** Because price data are retrieved from Yahoo Finance via `yfinance`, re‑computing portfolios at a later date may yield slightly different values if the provider updates historical prices. For strict replication of the published article, use the `summary_all_portfolios.csv` included in `data/`.


## 6. Adapting the materials

The replication package is designed to be reusable by other instructors and researchers. Some common adaptations include:

- **Changing the portfolio universe**  
  - Edit the asset tickers and fixed weights in `capm_portfolio_project.ipynb`.  
  - Regenerate `summary_all_portfolios.csv` as described in §5.2.

- **Modifying the survey or scales**  
  - If you adapt the AI literacy / metacognition survey, update the column selections and item lists in `FIN411X_Stats_Analysis.ipynb` and `FIN411X_Make_Tables_and_Figures.ipynb` so that B_scale, C_scale, and D_scale reflect your new instrument.

- **Adding new performance metrics**  
  - Extend the CAPM notebook with additional indicators (e.g., Sharpe ratio, maximum drawdown) using `pandas`, `numpy`, or `quantstats`.  
  - Store these metrics in `summary_all_portfolios.csv` and incorporate them into `FIN411X_Stats_Analysis` and `FIN411X_Make_Tables_and_Figures` as needed.

- **Using different AI tools**  
  - The current design frames ChatGPT as a “co‑tutor” for Python and CAPM. You can adapt the reflections and survey items to explore other generative AI systems while keeping the same portfolio and stats pipeline.
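
As an illustration of the "adding new performance metrics" adaptation, the sketch below computes an annualised Sharpe ratio and a maximum drawdown with `numpy`. The function names, the 252-day annualisation factor, and the portfolio values are assumptions for the example.

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualisation factor

def sharpe_ratio(daily_returns, rf_daily=0.0):
    """Annualised Sharpe ratio from daily returns and a daily risk-free rate."""
    excess = np.asarray(daily_returns, dtype=float) - rf_daily
    return np.sqrt(TRADING_DAYS) * excess.mean() / excess.std(ddof=1)

def max_drawdown(values):
    """Largest peak-to-trough decline of a value series, as a negative fraction."""
    values = np.asarray(values, dtype=float)
    running_peak = np.maximum.accumulate(values)
    return (values / running_peak - 1).min()

# Made-up daily portfolio values in AED.
values = np.array([600_000, 630_000, 567_000, 610_000], dtype=float)
daily_returns = values[1:] / values[:-1] - 1

print(f"Sharpe ratio: {sharpe_ratio(daily_returns):.2f}")
print(f"Max drawdown: {max_drawdown(values):.1%}")
```

Metrics computed this way could be added as extra columns to the one-row summary written by the CAPM notebook.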

If you modify the notebooks, it is good practice to:

- keep a copy of the original replication package for reference, and  
- document any changes in a short changelog section at the top of each modified notebook.


## 7. Reproducibility notes

- All analyses are deterministic given the fixed CSV files in `data/`; no random seeds are required.  
- The only potential source of variation arises if you choose to **regenerate** `summary_all_portfolios.csv` from live market data using `yfinance`.  
- The repository does **not** include raw ChatGPT transcripts, for confidentiality reasons. The qualitative analysis is documented in the manuscript and supplementary materials, but it cannot be fully reproduced from the files in this package.

For any issues running the notebooks, check that:

1. The current working directory in Jupyter/Colab is the root of the replication folder, and  
2. All required Python libraries are installed and importable.
