
[Pipeline] Underlying SQL Metrics #1099

Open · 1 of 2 tasks
zacandcheese opened this issue Jan 3, 2024 · 1 comment
Labels: Machine Learning - Model Evaluation (Cross Validation, HP Tuning, ...), Pipeline (Anything related to the Pipelines.)

zacandcheese (Contributor) commented Jan 3, 2024

Description:

There is currently no way to generate the SQL used to build a metrics table.

Tasks:

  • machine_learning/metrics/classification.py: Create a way to get the underlying SQL of the metrics
  • machine_learning/metrics/regression.py: Add a parameter to regression_report that returns the SQL of each metric instead of its computed result (see the sketch below).

Definition of Done:

  • SQL code generation is possible for regression and classification.
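
A minimal sketch of what the requested interface could look like. The return_sql flag, this exact signature, and the query template are assumptions for illustration, not the current API; _executeSQL is the internal executor shown in the Concerns snippet below:

def accuracy_score(
    y_true: str,
    y_score: str,
    input_relation: str,
    return_sql: bool = False,  # hypothetical flag, not in the current API
):
    """Return the accuracy, or its underlying SQL when return_sql=True."""
    # 0.12.0-style template, reused here purely for illustration.
    sql = (
        f"SELECT AVG(CASE WHEN {y_true} = {y_score} THEN 1 ELSE 0 END) "
        f"FROM {input_relation}"
    )
    if return_sql:
        return sql
    return _executeSQL(query=sql, method="fetchall")[0][0]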

Concerns:

An example showing that we no longer use SQL directly to compute classification metrics (a sketch of one possible way to expose the SQL again follows the code below):

  • how accuracy_score used to be computed in _metrics.py in VerticaPy 0.12.0:
    AVG(CASE WHEN {0} = {1} THEN 1 ELSE 0 END)
  • how accuracy_score is computed now in classification.py in 1.0.0:
# Excerpt from classification.py (1.0.0); signatures elided in the original.
def accuracy_score(...):
    # Delegates to the generic scorer; no SQL is built here.
    return _compute_final_score(
        _accuracy_score,
        **locals(),
    )

def _accuracy_score(...):
    # Pure Python arithmetic on confusion-matrix counts.
    return (tp + tn) / (tp + tn + fn + fp)

def confusion_matrix(...) -> np.ndarray:
    # The only SQL lives here, inside the Vertica CONFUSION_MATRIX call.
    res = _executeSQL(
        query=f"""
        SELECT 
            CONFUSION_MATRIX(obs, response 
            USING PARAMETERS num_classes = 2) OVER() 
        FROM 
            (SELECT 
                DECODE({y_true}, '{pos_label}', 
                       1, NULL, NULL, 0) AS obs, 
                DECODE({y_score}, '{pos_label}', 
                       1, NULL, NULL, 0) AS response 
             FROM {input_relation}) VERTICAPY_SUBTABLE;""",
        title="Computing Confusion matrix.",
        method="fetchall",
    )
    return np.round(np.array([x[1:-1] for x in res])).astype(int)

def _compute_final_score(...):
    cm = confusion_matrix(y_true, y_score, input_relation, **kwargs)
    return _compute_final_score_from_cm(metric, cm, average=average, multi=multi)
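
One possible direction, sketched under the assumption that we keep 0.12.0-style templates around: a helper that renders the full query string without executing it. The names ACCURACY_SQL and accuracy_score_sql are illustrative, not existing code:

# Hypothetical helper: rebuild the metric SQL instead of executing it.
ACCURACY_SQL = "AVG(CASE WHEN {y_true} = {y_score} THEN 1 ELSE 0 END)"

def accuracy_score_sql(y_true: str, y_score: str, input_relation: str) -> str:
    # Render the query string; the caller decides whether to run it.
    expr = ACCURACY_SQL.format(y_true=y_true, y_score=y_score)
    return f"SELECT {expr} FROM {input_relation}"

For example, accuracy_score_sql("label", "prediction", "public.scores") returns
SELECT AVG(CASE WHEN label = prediction THEN 1 ELSE 0 END) FROM public.scores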
oualib added the Machine Learning - Model Evaluation and Pipeline labels on Jan 4, 2024
oualib added this to the VerticaPy 1.1.0 milestone on Mar 3, 2024
oualib (Member) commented Mar 10, 2024

@zacandcheese did you find any solution for this one?
