# sklearn_experimenter

`sklearn_experimenter` is a Python class that makes it easy to run experiments with different datasets, models, splits, and metrics using scikit-learn. It provides a convenient way to organize and automate your machine learning experiments.
- New feature: it is now possible to create a custom dataset reader and to use scikit-learn pipelines in `add_models()`.
```python
from sklearn_experimenter import ModelRunner
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score

runner = ModelRunner()
```
Datasets are added with the `runner.data.add_datasets()` method, as in the following examples:
- Adding a folder path: `runner.data.add_datasets(['bases/'])`
- Adding a file path: `runner.data.add_datasets(['bases/df1.csv'])`
- Adding a DataFrame with a name: `runner.data.add_datasets([(X, 'X')])`
- Adding a DataFrame without a name: `runner.data.add_datasets([X])`
- Mixing different types: `runner.data.add_datasets(['bases/', 'bases/df1.csv', (X, 'X'), X])`
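The DataFrame examples above assume a pandas DataFrame `X` already exists. As an illustration only (the iris dataset is not part of this library), one simple way to build such a DataFrame, with the target as the last column, is:

```python
import pandas as pd
from sklearn.datasets import load_iris

# Build a DataFrame whose last column ('target') holds the labels
iris = load_iris(as_frame=True)
X = iris.frame
```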
It is possible to set a custom dataset reader before adding the datasets. The function should return a pandas DataFrame.
```python
import pandas as pd

# Custom function for reading the dataset
def custom_reader(filename):
    data = pd.read_csv(filename, decimal=',', sep=';')
    return data

runner.data.reader = custom_reader
```
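As a quick standalone check of what such a reader handles (semicolon-separated columns with comma decimal marks), here is the same `read_csv` call applied to an in-memory string instead of a file:

```python
import io
import pandas as pd

# Sample data using ';' as the separator and ',' as the decimal mark
csv_text = "a;b\n1,5;2,5\n3,0;4,0\n"
df = pd.read_csv(io.StringIO(csv_text), decimal=',', sep=';')
```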
In the current version, the target column is automatically set as the last column in each dataset, and the remaining columns are considered as the feature columns. However, if you need to customize the target and feature columns for a specific dataset, you can use the following approach:
```python
runner.data.datasets[Nth].target = 'target_column_name'
runner.data.datasets[Nth].feature = ['feature_column1', 'feature_column2', ...]
```
Replace `Nth` with the index of the dataset whose target and feature columns you want to modify. Set `'target_column_name'` to the desired name of the target column, and provide a list of column names `['feature_column1', 'feature_column2', ...]` for the feature columns.
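The default convention described above (last column is the target, the rest are features) can be illustrated on a plain DataFrame; the column names here are placeholders, not names the library expects:

```python
import pandas as pd

df = pd.DataFrame({
    'feature_column1': [1, 2, 3],
    'feature_column2': [4, 5, 6],
    'target_column_name': [0, 1, 0],
})

# Last column is treated as the target; the remaining columns are features
target = df.columns[-1]
features = list(df.columns[:-1])
```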
```python
runner.splits.add_holdout([0.3])  # Holdouts
runner.splits.add_fold([3])       # K-folds
```
It is possible to add models by name, by passing an estimator instance, or even by adding a pipeline as a model.
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Assuming knn refers to scikit-learn's k-nearest-neighbors classifier
from sklearn.neighbors import KNeighborsClassifier as knn

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('bagging', BaggingClassifier(estimator=knn()))
])

runner.models.add_models(['DecisionTreeClassifier',
                          BaggingClassifier(estimator=knn()),
                          pipeline])
```
```python
runner.metrics.add_score([
    confusion_matrix,
    accuracy_score,
    lambda y_true, y_pred: f1_score(y_true, y_pred, average='micro')
])
```
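Any callable taking `(y_true, y_pred)` can serve as a score. For instance, the micro-averaged F1 wrapper above can be checked on its own; for single-label multiclass data, micro-averaged F1 equals accuracy:

```python
from sklearn.metrics import f1_score

micro_f1 = lambda y_true, y_pred: f1_score(y_true, y_pred, average='micro')

y_true = [0, 1, 2, 2]
y_pred = [0, 1, 1, 2]
score = micro_f1(y_true, y_pred)  # 3 of 4 predictions correct
```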
```python
runner.random.add_model_seed([42])
runner.save_path = 'output.pkl'
runner.run()
```
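Assuming the results saved to `save_path` are a standard pickle (the exact structure of the saved object depends on the library's internals), they can be reloaded afterwards with Python's `pickle` module. The dictionary below is purely illustrative:

```python
import pickle

# Illustrative stand-in for whatever sklearn_experimenter writes to save_path
results = {'model': 'DecisionTreeClassifier', 'accuracy': 0.93}

with open('output.pkl', 'wb') as f:
    pickle.dump(results, f)

# Reload the saved results from disk
with open('output.pkl', 'rb') as f:
    loaded = pickle.load(f)
```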
For more detailed information on the available methods and functionality of the `sklearn_experimenter` class, please refer to the documentation.
- Set the `n_jobs` parameter of functions.
This project is licensed under the MIT License. See the LICENSE file for more details.