
[Pipeline] Underlying SQL Metrics #1099

Open · 1 of 2 tasks
zacandcheese opened this issue Jan 3, 2024 · 1 comment
Labels: Machine Learning - Model Evaluation (Cross Validation, HP Tuning, ...), Pipeline (Anything related to the Pipelines.)

zacandcheese (Contributor) commented Jan 3, 2024

Description:

There is currently no way to generate the SQL used to build a metrics table.

Tasks:

  • machine_learning/metrics/classification.py: Create a way to get the underlying SQL of the metrics
  • machine_learning/metrics/regression.py: Add a parameter to regression_report that returns the SQL of each metric instead of its computed result (see the sketch below).

Definition of Done:

  • SQL code generation is possible for regression and classification.
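
A minimal sketch of what the requested interface could look like. The return_sql flag, this exact signature, and the query template are assumptions for illustration, not the current API; _executeSQL is the internal executor shown in the Concerns snippet below:

def accuracy_score(
    y_true: str,
    y_score: str,
    input_relation: str,
    return_sql: bool = False,  # hypothetical flag, not in the current API
):
    """Return the accuracy, or its underlying SQL when return_sql=True."""
    # 0.12.0-style template, reused here purely for illustration.
    sql = (
        f"SELECT AVG(CASE WHEN {y_true} = {y_score} THEN 1 ELSE 0 END) "
        f"FROM {input_relation}"
    )
    if return_sql:
        return sql
    return _executeSQL(query=sql, method="fetchall")[0][0]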

Concerns:

An example showing that we no longer use SQL directly to compute classification metrics (a sketch of one possible way to expose the SQL again follows the code below):

  • how accuracy_score used to be computed in _metrics.py in VerticaPy 0.12.0:
    AVG(CASE WHEN {0} = {1} THEN 1 ELSE 0 END)
  • how accuracy_score is computed now in classification.py in 1.0.0:
# Excerpt from classification.py (1.0.0); signatures elided in the original.
def accuracy_score(...):
    # Delegates to the generic scorer; no SQL is built here.
    return _compute_final_score(
        _accuracy_score,
        **locals(),
    )

def _accuracy_score(...):
    # Pure Python arithmetic on confusion-matrix counts.
    return (tp + tn) / (tp + tn + fn + fp)

def confusion_matrix(...) -> np.ndarray:
    # The only SQL lives here, inside the Vertica CONFUSION_MATRIX call.
    res = _executeSQL(
        query=f"""
        SELECT 
            CONFUSION_MATRIX(obs, response 
            USING PARAMETERS num_classes = 2) OVER() 
        FROM 
            (SELECT 
                DECODE({y_true}, '{pos_label}', 
                       1, NULL, NULL, 0) AS obs, 
                DECODE({y_score}, '{pos_label}', 
                       1, NULL, NULL, 0) AS response 
             FROM {input_relation}) VERTICAPY_SUBTABLE;""",
        title="Computing Confusion matrix.",
        method="fetchall",
    )
    return np.round(np.array([x[1:-1] for x in res])).astype(int)

def _compute_final_score(...):
    cm = confusion_matrix(y_true, y_score, input_relation, **kwargs)
    return _compute_final_score_from_cm(metric, cm, average=average, multi=multi)
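
One possible direction, sketched under the assumption that we keep 0.12.0-style templates around: a helper that renders the full query string without executing it. The names ACCURACY_SQL and accuracy_score_sql are illustrative, not existing code:

# Hypothetical helper: rebuild the metric SQL instead of executing it.
ACCURACY_SQL = "AVG(CASE WHEN {y_true} = {y_score} THEN 1 ELSE 0 END)"

def accuracy_score_sql(y_true: str, y_score: str, input_relation: str) -> str:
    # Render the query string; the caller decides whether to run it.
    expr = ACCURACY_SQL.format(y_true=y_true, y_score=y_score)
    return f"SELECT {expr} FROM {input_relation}"

For example, accuracy_score_sql("label", "prediction", "public.scores") returns
SELECT AVG(CASE WHEN label = prediction THEN 1 ELSE 0 END) FROM public.scores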
oualib added the Machine Learning - Model Evaluation and Pipeline labels on Jan 4, 2024
oualib added this to the VerticaPy 1.1.0 milestone on Mar 3, 2024
oualib (Member) commented Mar 10, 2024

@zacandcheese did you find any solution for this one?
