Releases: webis-de/small-text

v1.0.0b4

04 May 18:20

This release adds two new query strategies, improves the Dataset interface, and introduces optional dependencies.

Added

  • General:
    • We now have a concept for optional dependencies, which allows components to rely on soft dependencies, i.e. Python dependencies that can be installed on demand (and only when certain functionality is needed).
  • Datasets:
    • The Dataset interface now has a clone() method that creates an identical copy of the respective dataset.
  • Query Strategies:
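
The optional-dependency concept listed under "General" can be illustrated with a minimal sketch. This is not small-text's actual mechanism: the names `is_available`, `require`, and `MissingOptionalDependencyError` are hypothetical and only show the general pattern of checking for a soft dependency at the point of use.

```python
import importlib
import importlib.util


class MissingOptionalDependencyError(ImportError):
    """Raised when a soft dependency is needed but not installed (hypothetical name)."""


def is_available(module_name):
    """Return True if a top-level module can be imported, without importing it."""
    # find_spec returns None for a missing top-level module.
    return importlib.util.find_spec(module_name) is not None


def require(module_name):
    """Import and return a soft dependency, or fail with an actionable message."""
    if not is_available(module_name):
        raise MissingOptionalDependencyError(
            f"Optional dependency '{module_name}' is required for this feature. "
            f"Install it on demand, e.g. via 'pip install {module_name}'."
        )
    return importlib.import_module(module_name)
```

A component would call `require(...)` only inside the code path that needs the dependency, so the base installation keeps working without it.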

Changed

  • Datasets:
    • Separated the previous DatasetView implementation into interface (DatasetView) and implementation (SklearnDatasetView).
    • Added clone() method which creates an identical copy of the dataset.
  • Query Strategies:
    • EmbeddingBasedQueryStrategy now only embeds instances that are either in the labeled or in the unlabeled pool (and no longer the entire dataset).
  • Code examples:
    • Code structure was unified.
    • Number of iterations can now be passed via a CLI argument.
  • small_text.integrations.pytorch.utils.data:
    • Method get_class_weights() now scales the resulting multi-class weights so that the smallest class weight is equal to 1.0.
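
The rescaling described for get_class_weights() can be sketched in plain Python. The weighting formula below (inverse class frequency) is an assumption chosen for illustration, and `scaled_class_weights` is a hypothetical helper, not the library's function. Only the final step, dividing by the smallest weight so that it becomes 1.0, reflects the change described above.

```python
from collections import Counter


def scaled_class_weights(labels, num_classes, eps=1e-8):
    """Return per-class weights whose smallest entry is exactly 1.0."""
    counts = Counter(labels)
    num_samples = len(labels)
    # Assumed base weighting: inverse class frequency (eps avoids division by zero).
    weights = [num_samples / (counts[c] + eps) for c in range(num_classes)]
    # Rescale so that the smallest class weight equals 1.0.
    smallest = min(weights)
    return [w / smallest for w in weights]


weights = scaled_class_weights([0, 0, 0, 0, 1, 1, 2], num_classes=3)
```

In this example the most frequent class (label 0) receives weight 1.0, and rarer classes receive proportionally larger weights.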

v1.0.0b3

06 Mar 16:16

This release adds a new query strategy, improves the docs, and cleans up the interfaces in preparation for v1.0.0.

Added

Changed

  • Cleaned up and unified argument naming: The naming of variables related to datasets and
    indices has been improved and unified. The naming of datasets had been inconsistent,
    and the previous x_ notation for indices was a relic of earlier versions of this library and
    no longer reflected the underlying object.

    • PoolBasedActiveLearner:

      • attribute x_indices_labeled was renamed to indices_labeled
      • attribute x_indices_ignored was renamed to indices_ignored
      • attribute queried_indices was renamed to indices_queried
      • attribute _x_index_to_position was renamed to _index_to_position
      • arguments x_indices_initial, x_indices_ignored, and x_indices_validation were
        renamed to indices_initial, indices_ignored, and indices_validation. This affects most
        methods of the PoolBasedActiveLearner.
    • QueryStrategy

      • old: query(self, clf, x, x_indices_unlabeled, x_indices_labeled, y, n=10)
      • new: query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n=10)
    • StoppingCriterion

      • old: stop(self, active_learner=None, predictions=None, proba=None, x_indices_stopping=None)
      • new: stop(self, active_learner=None, predictions=None, proba=None, indices_stopping=None)
  • Renamed environment variable which sets the small-text temp folder from ALL_TMP to SMALL_TEXT_TEMP
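
To show what the new query() signature looks like from an implementer's point of view, here is a minimal sketch of a random-sampling strategy. `RandomSamplingSketch` is a hypothetical stand-alone class: it does not subclass the library's QueryStrategy and only mirrors the argument names listed above.

```python
import random


class RandomSamplingSketch:
    """Hypothetical strategy that mirrors the renamed query() arguments."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n=10):
        # clf, dataset, indices_labeled, and y are unused by random sampling;
        # they are kept only so the signature matches the one shown above.
        if n > len(indices_unlabeled):
            raise ValueError("n must not exceed the number of unlabeled indices")
        # Draw n distinct indices from the unlabeled pool.
        return self._rng.sample(list(indices_unlabeled), n)


strategy = RandomSamplingSketch(seed=0)
queried = strategy.query(
    clf=None,
    dataset=list(range(100)),
    indices_unlabeled=list(range(20, 100)),
    indices_labeled=list(range(20)),
    y=[0] * 20,
    n=5,
)
```

Code written against the old signature would need its keyword arguments renamed from x, x_indices_unlabeled, and x_indices_labeled to dataset, indices_unlabeled, and indices_labeled.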

v1.0.0b2

22 Feb 16:48

This release fixes some broken links that were caused by the recent change in naming the git tags (1.0.0a8 -> v1.0.0b1).

Fixed

  • Fix links to the documentation in README.md and notebooks.

v1.0.0b1

22 Feb 16:22

First beta release with multi-label functionality and stopping criteria. Added/revised large parts of the documentation.

Added

  • Added a changelog.
  • All provided classifiers are now capable of multi-label classification.

Changed

  • Documentation has been overhauled considerably.
  • PoolBasedActiveLearner: Renamed incremental_training kwarg to reuse_model.
  • SklearnClassifier: Changed __init__(clf) to __init__(model, num_classes, multi_label=False).
  • SklearnClassifierFactory: Changed __init__(clf_template, kwargs={}) to __init__(base_estimator, num_classes, kwargs={}).
  • Refactored KimCNNClassifier and TransformerBasedClassification.

Removed

  • Removed device kwarg from PytorchDataset.__init__(),
    PytorchTextClassificationDataset.__init__() and TransformersDataset.__init__().