# Notebook 3 – Analytical Modeling and Ranking

The previous notebook focused on cleaning and standardizing the data, resulting in the `clean_dataset` view — a consistent, validated table that serves as the foundation for analysis.

This notebook shifts from **data preparation** to **data interpretation**.  Here, we create `rank_metrics`, a view that applies SQL window functions to rank companies and compute percentile scores across key financial ratios.

The idea is simple:
- `clean_dataset` → ensures data quality  
- `rank_metrics` → enables comparison and ranking  

This separation keeps the workflow modular and transparent.

### Domain Logic Reference

As a data analyst, my role is to structure, transform, and interpret data — not define the financial meaning behind it.  To apply the right interpretation rules, I consulted with a subject matter expert (SME), who clarified the general logic used in evaluating financial performance:

- **Lower P/E Ratio** → Better (suggests undervaluation)  
- **Higher ROE** → Better (stronger profitability)  
- **Lower Debt-to-Equity** → Better (lower financial risk)  
- **Higher EPS Growth** → Better (improving earnings)  

These rules were defined by the SME and applied here purely for analytical demonstration.

## 1. Environment Setup

In [1]:
%load_ext sql
%load_ext dotenv
%dotenv
%config SqlMagic.style = '_DEPRECATED_DEFAULT'

%sql postgresql://$DB_USER:$DB_PASSWORD@$DB_SERVER:$DB_PORT/$DB_NAME
%sql SELECT current_database() AS connected_to, NOW() AS time_check;

 * postgresql://postgres:***@localhost:5432/fundamentals
1 rows affected.


connected_to,time_check
fundamentals,2025-10-22 22:48:36.313739+08:00


## 2. Inspect the Clean Dataset
Let’s confirm the unified dataset is ready for analysis.

In [2]:
%%sql
SELECT * FROM clean_dataset LIMIT 5;

 * postgresql://postgres:***@localhost:5432/fundamentals
5 rows affected.


company_id,ticker,year,quarter,market_cap,eps,roe,debt_to_equity,pe_ratio,price_per_share,dividends_per_share,shares_outstanding,industry,avg_pe,avg_roe
11,JGS,2021,1,2036741112.714,0.712,0.086,0.477,9.926,7.067,1.708,288217712,Holding Firms,12.2,0.12
7,GTCAP,2021,1,1823691041.418,0.464,0.062,0.754,11.167,5.184,3.014,351787156,Holding Firms,12.2,0.12
4,SMC,2021,1,9442699073.125,0.167,0.082,1.186,65.954,10.983,0.352,859791827,Holding Firms,12.2,0.12
1,AC,2021,1,14533845909.705,0.489,0.296,0.644,59.341,28.995,2.296,501249478,Holding Firms,12.2,0.12
10,PNB,2021,1,6124708133.262,0.164,0.111,1.59,45.8,7.518,2.667,814622670,Banks,10.9,0.16


## 3. Compute Relative Performance Metrics

We’ll use **Common Table Expressions (CTEs)** to calculate percentile ranks for each key ratio within its industry.

- Lower P/E → Better
- Higher ROE → Better
- Lower Debt-to-Equity → Better
- Higher Dividend Yield → Better

*Note:* `PERCENT_RANK()` is used instead of `RANK()` to express each company’s standing as a percentile within the industry.  All percentile ranks and leaderboard ranks are computed per `industry`, per `year`, and per `quarter`.  This ensures each ranking compares companies within the same snapshot (industry at that quarter).


In [3]:
%%sql
DROP VIEW IF EXISTS rank_metrics;

CREATE VIEW rank_metrics AS
WITH ranked AS (
    SELECT
        ticker,
        industry,
        year,
        quarter,
        pe_ratio,
        roe,
        debt_to_equity,
        dividends_per_share,
        price_per_share,
        -- dividend yield
        ROUND(dividends_per_share / NULLIF(price_per_share, 0), 6) AS dividend_yield,
        -- percentile ranks scoped to industry + year + quarter
        PERCENT_RANK() OVER (
            PARTITION BY industry, year, quarter
            ORDER BY pe_ratio ASC
        ) AS pe_rank,
        PERCENT_RANK() OVER (
            PARTITION BY industry, year, quarter
            ORDER BY roe DESC
        ) AS roe_rank,
        PERCENT_RANK() OVER (
            PARTITION BY industry, year, quarter
            ORDER BY debt_to_equity ASC
        ) AS debt_rank,
        PERCENT_RANK() OVER (
            PARTITION BY industry, year, quarter
            ORDER BY (dividends_per_share / NULLIF(price_per_share,0)) DESC
        ) AS div_rank
    FROM clean_dataset
)
SELECT
    ticker,
    industry,
    year,
    quarter,
    ROUND(pe_ratio, 6) AS pe_ratio,
    ROUND(roe, 6) AS roe,
    ROUND(debt_to_equity, 6) AS debt_to_equity,
    -- ROUND(dividends_per_share, 6) AS dividends_per_share,
    dividend_yield,
    ROUND(CAST(pe_rank AS numeric), 2) AS pe_rank, 
    ROUND(CAST(roe_rank AS numeric), 2) AS roe_rank, 
    ROUND(CAST(debt_rank AS numeric), 2) AS debt_rank, 
    ROUND(CAST(div_rank AS numeric), 2) AS div_rank
FROM ranked;


SELECT * FROM rank_metrics ORDER BY industry, year, quarter LIMIT 5;

 * postgresql://postgres:***@localhost:5432/fundamentals
Done.
Done.
5 rows affected.


ticker,industry,year,quarter,pe_ratio,roe,debt_to_equity,dividend_yield,pe_rank,roe_rank,debt_rank,div_rank
SECB,Banks,2021,1,9.926,0.086,0.477,0.241687,0.0,0.67,0.0,1.0
PNB,Banks,2021,1,45.8,0.111,1.59,0.354749,0.67,0.33,0.67,0.33
MBT,Banks,2021,1,67.855,0.025,0.633,0.520969,1.0,1.0,0.33,0.0
BDO,Banks,2021,1,11.005,0.2,1.735,0.279114,0.33,0.0,1.0,0.67
MBT,Banks,2021,2,3.296,0.082,0.997,0.655288,0.0,1.0,0.0,0.33



## 4. Compute Composite Scores
Now we combine the percentile ranks into a single weighted score.
We set the weights per our SME's requirements:

- P/E Ratio → 30%
- ROE → 30%
- Debt-to-Equity → 20%
- Dividend Yield → 20%

*Note:* In this project, a higher composite score means a stronger overall performance.  Some metrics, like P/E and Debt-to-Equity, are considered better when they’re lower.  To keep everything moving in the same direction, their percentile ranks were flipped using (1 - rank).  This doesn’t change who’s doing better or worse — it just keeps all scores consistent and easy to read.

In [4]:
%%sql
DROP VIEW IF EXISTS stock_scores;

CREATE VIEW stock_scores AS
SELECT
    ticker,
    industry,
    year,
    quarter,
    pe_ratio,
    roe,
    debt_to_equity,
    dividend_yield,
    ROUND(
        (0.30 * (1 - pe_rank)) +   -- lower PE is better => invert percentile
        (0.30 * roe_rank) +        -- higher ROE is better
        (0.20 * (1 - debt_rank)) + -- lower debt is better => invert percentile
        (0.20 * div_rank),         -- higher dividend yield is better
        6
    ) AS composite_score
FROM rank_metrics;

SELECT * FROM stock_scores ORDER BY industry, year, quarter, composite_score DESC LIMIT 5;

 * postgresql://postgres:***@localhost:5432/fundamentals
Done.
Done.
5 rows affected.


ticker,industry,year,quarter,pe_ratio,roe,debt_to_equity,dividend_yield,composite_score
SECB,Banks,2021,1,9.926,0.086,0.477,0.241687,0.901
MBT,Banks,2021,1,67.855,0.025,0.633,0.520969,0.434
BDO,Banks,2021,1,11.005,0.2,1.735,0.279114,0.335
PNB,Banks,2021,1,45.8,0.111,1.59,0.354749,0.33
MBT,Banks,2021,2,3.296,0.082,0.997,0.655288,0.866


## 5. Generate Buy–Hold–Sell Recommendations

After computing the composite scores for all companies, I consulted the SME to interpret what those scores should mean in practical terms.  Since I’m focusing on the analytics side (not financial advisory), I relied on the SME to define the thresholds that translate a score into a qualitative recommendation.

According to the SME:

- **Buy** → Composite score ≥ 0.70  
- **Hold** → Composite score between 0.40 and 0.69  
- **Sell** → Composite score < 0.40  

These thresholds were used to create the `stock_recommendations` view below.

In [None]:
%%sql

DROP VIEW IF EXISTS stock_recommendations CASCADE;

CREATE VIEW stock_recommendations AS
SELECT
    ticker,
    industry,
    year,
    quarter,
    ROUND(pe_ratio, 3) AS pe_ratio,
    ROUND(roe, 3) AS roe,
    ROUND(debt_to_equity, 3) AS debt_to_equity,
    ROUND(dividend_yield, 4) AS dividend_yield,
    ROUND(composite_score, 3) AS composite_score,
    CASE
        WHEN composite_score IS NULL THEN 'NO DATA'
        WHEN composite_score >= 0.70 THEN 'BUY'
        WHEN composite_score >= 0.40 THEN 'HOLD'
        ELSE 'SELL'
    END AS recommendation
FROM stock_scores
ORDER BY year DESC, quarter DESC, industry, composite_score DESC;

 * postgresql://postgres:***@localhost:5432/fundamentals
Done.
Done.


[]

In [6]:
%%sql

WITH latest AS (
    SELECT year, quarter
    FROM stock_recommendations
    ORDER BY year DESC, quarter DESC
    LIMIT 1
)
SELECT
    r.year,
    r.quarter,
    r.ticker,
    r.recommendation
FROM stock_recommendations r
JOIN latest l
  ON r.year = l.year
 AND r.quarter = l.quarter
ORDER BY r.ticker, r.recommendation;

 * postgresql://postgres:***@localhost:5432/fundamentals
12 rows affected.


year,quarter,ticker,recommendation
2023,4,AC,HOLD
2023,4,AP,HOLD
2023,4,BDO,SELL
2023,4,FGEN,BUY
2023,4,GTCAP,SELL
2023,4,JGS,HOLD
2023,4,MBT,HOLD
2023,4,MER,SELL
2023,4,PNB,HOLD
2023,4,SECB,BUY


## 6. Summary

This notebook introduced the analytical layer of the project:

- Computed percentile-based ranks and composite scores across multiple financial indicators.  
- Translated numerical insights into clear recommendations using SME-defined thresholds.  
- Produced a final, analysis-ready view: `stock_recommendations`.