
Redundancy between ibis and ibis-ml #181

@vspinu

Thanks for the amazing initiative!

I am a bit taken aback by the redundancy of abstractions between ibis-ml and native ibis. I would expect ibis-ml to be, as much as possible, a lightweight extension of ibis, but that doesn't seem to be the case: ibis-ml implements its own abstractions, which are not compatible with core ibis.

Here are a few examples that I came across.

Selectors

Ibis-ml has its own abstraction for selectors. For example, the following cast

```python
ml.Cast(ml.has_type("boolean"), "int8")
```

could have used the core ibis selectors (`import ibis.selectors as s`):

```python
ml.Cast(s.of_type("boolean"), "int8")
```
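To illustrate why a single selector abstraction could serve both libraries, here is a toy model. It is hypothetical and is not how ibis implements selectors internally: a selector is just a composable predicate over column names and dtype strings, resolved against a concrete schema. Under that assumption, an ML step such as a cast and a plain `Table.mutate` could consume exactly the same object.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping


# Hypothetical toy model: a selector is a predicate over (column name, dtype).
@dataclass(frozen=True)
class Selector:
    predicate: Callable[[str, str], bool]

    def expand(self, schema: Mapping[str, str]) -> List[str]:
        # Resolve the selector against a concrete schema, keeping column order.
        return [name for name, dtype in schema.items() if self.predicate(name, dtype)]

    def __and__(self, other: "Selector") -> "Selector":
        return Selector(lambda n, t: self.predicate(n, t) and other.predicate(n, t))

    def __or__(self, other: "Selector") -> "Selector":
        return Selector(lambda n, t: self.predicate(n, t) or other.predicate(n, t))


def of_type(dtype: str) -> Selector:
    # Match columns whose dtype string equals `dtype`.
    return Selector(lambda name, t: t == dtype)


def endswith(suffix: str) -> Selector:
    # Match columns whose name ends with `suffix`.
    return Selector(lambda name, t: name.endswith(suffix))


schema = {"user_id": "int64", "approved": "boolean", "day": "date", "flag": "boolean"}
print(of_type("boolean").expand(schema))  # ['approved', 'flag']
print((endswith("_id") | of_type("boolean")).expand(schema))  # ['user_id', 'approved', 'flag']
```

Because the selector only needs a schema to resolve, the same value works whether it is passed to a table method or to a recipe step.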

Casing

Ibis-ml uses CamelCase (e.g. `ml.Cast`), whereas ibis uses snake_case (e.g. `s.of_type`).

Ibis Pipelines

Most importantly, ibis pipelines are already lazy and backend-independent. So why not reuse them directly as ML recipes?

Ibis-ml could either

  1. enrich the existing backend transforms with the ML functionality, or
  2. provide its own proxy backend, which would dispatch to a concrete backend depending on the input data passed to the `fit` method.

For example:

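```python
import ibis.selectors as s
import ibis_ml as ml  # used by option 2 below
from ibis import _

# `df` is an existing ibis table expression.

## 1. An Ibis table is already a deferred recipe, so use it as such.
# Proposed (hypothetical) methods: fill_na(selector), ordinal_encode() and fit().
rcp = (
    df
    .drop("approved")
    .mutate(day=_.day.cast("string"))
    .mutate(s.across(s.endswith("_id"), _.cast("string")))
    .fill_na(s.of_type("string"))
    .mutate(s.across(s.of_type("boolean"), _.cast("int8")))
    .ordinal_encode(s.of_type("string"), min_frequency=0.01)
)

tr = rcp.fit()  # or rcp.fit(df), or rcp.fit(df_from_other_backend)
```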
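The reason a recipe needs a `fit` step at all is that some transforms, such as ordinal encoding with a minimum frequency, learn state from training data that must then be reused unchanged on new data. The pure-Python sketch below illustrates that split on lists of row dicts. Everything in it is hypothetical: the `Recipe` and `OrdinalEncode` names, the assumption that a float `min_frequency` is a fraction of training rows, and the convention that infrequent or unseen categories map to `-1`.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class OrdinalEncode:
    """Hypothetical step: learns a category -> integer mapping during fit."""
    column: str
    min_frequency: float = 0.0  # assumed: fraction of training rows
    mapping: dict = field(default_factory=dict)

    def fit(self, rows):
        counts = Counter(row[self.column] for row in rows)
        threshold = self.min_frequency * len(rows)
        # Keep categories seen often enough; sort them for deterministic codes.
        kept = sorted(v for v, c in counts.items() if c >= threshold)
        self.mapping = {v: i for i, v in enumerate(kept)}
        return self

    def transform(self, rows):
        # Assumed convention: infrequent or unseen categories map to -1.
        return [{**row, self.column: self.mapping.get(row[self.column], -1)} for row in rows]


@dataclass
class Recipe:
    """Hypothetical recipe: an ordered list of steps, fitted in sequence."""
    steps: list

    def fit(self, rows):
        # Each step is fitted on the output of the previous steps.
        for step in self.steps:
            rows = step.fit(rows).transform(rows)
        return self

    def transform(self, rows):
        for step in self.steps:
            rows = step.transform(rows)
        return rows


train = [{"city": c} for c in ["a", "a", "a", "b", "b", "c"]]
rcp = Recipe([OrdinalEncode("city", min_frequency=0.25)]).fit(train)
encoded = rcp.transform([{"city": "b"}, {"city": "c"}, {"city": "z"}])
print(encoded)  # [{'city': 1}, {'city': -1}, {'city': -1}]
```

Here `"c"` appears in 1 of 6 training rows, below the 0.25 × 6 = 1.5 threshold, so it is treated the same as the never-seen `"z"`.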


```python
## 2. Start with an ml.recipe pseudo-backend and bind it to data at fit time.
# Proposed (hypothetical) API: ml.recipe is the pseudo-backend suggested here.
rcp = (
    ml.recipe
    .drop("approved")
    .mutate(day=_.day.cast("string"))
    .mutate(s.across(s.endswith("_id"), _.cast("string")))
    .fill_na(s.of_type("string"))
    .mutate(s.across(s.of_type("boolean"), _.cast("int8")))
    .ordinal_encode(s.of_type("string"), min_frequency=0.01)
)

tr = rcp.fit(df)
```
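One way such a pseudo-backend could work is to record the chain of method calls without touching any data, then replay it on whatever concrete table is passed in, so the backend is chosen only when data arrives. The sketch below is a hypothetical illustration of that idea, not ibis-ml code: `DeferredRecipe`, its `replay` method and the `ToyTable` stand-in are all invented names, and a real `fit(df)` would additionally fit any stateful steps after replaying.

```python
class DeferredRecipe:
    """Hypothetical pseudo-backend: records method calls, replays them later."""

    def __init__(self, calls=()):
        self._calls = tuple(calls)

    def __getattr__(self, name):
        # Any unknown attribute becomes a recorder for a call named `name`.
        def record(*args, **kwargs):
            return DeferredRecipe(self._calls + ((name, args, kwargs),))
        return record

    def replay(self, table):
        # Bind the recorded chain to a concrete table from any backend.
        for name, args, kwargs in self._calls:
            table = getattr(table, name)(*args, **kwargs)
        return table


class ToyTable:
    """Minimal stand-in for a backend table: a dict of column lists."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def drop(self, *names):
        return ToyTable({k: v for k, v in self.columns.items() if k not in names})

    def cast(self, name, fn):
        return ToyTable({**self.columns, name: [fn(x) for x in self.columns[name]]})


recipe = DeferredRecipe().drop("approved").cast("day", str)
result = recipe.replay(ToyTable({"approved": [True, False], "day": [1, 2]}))
print(result.columns)  # {'day': ['1', '2']}
```

Each recorded call returns a new `DeferredRecipe`, so a recipe value can be shared and extended without mutating it, much like an ibis expression.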

Does this make sense?
