Pigeon annotator #2641

Merged
merged 39 commits into from
May 16, 2024
Commits
e072a87
initial structure and changes
strickvl Apr 25, 2024
a59c9eb
add implementation
strickvl Apr 25, 2024
d613be2
update flavor
strickvl Apr 25, 2024
5216a06
small tweaks
strickvl Apr 25, 2024
9909fc8
renaming and init updates
strickvl Apr 25, 2024
dd36e99
docstrings
strickvl Apr 25, 2024
dcae059
Remove unnecessary import statement in integrations module
strickvl Apr 25, 2024
74fee6c
Refactor get_datasets and get_dataset_names methods in PigeonAnnotato…
strickvl Apr 25, 2024
d7b3bc9
add mixin to config
strickvl Apr 25, 2024
ae261c0
fix annotator
strickvl Apr 25, 2024
149dedb
use an old version of ipywidgets
strickvl Apr 25, 2024
00c2c6f
remove pigeon dependency and fix display
strickvl Apr 25, 2024
235889c
update annotator
strickvl Apr 25, 2024
ce9c34d
ignore pigeon in CI
strickvl Apr 25, 2024
f8a7556
fix to use latest ipywidgets
strickvl Apr 25, 2024
1ac2493
allow modern ipywidgets
strickvl Apr 25, 2024
1d23e7d
update main page
strickvl Apr 25, 2024
c00c62b
add docs
strickvl Apr 25, 2024
44536e0
Refactor PigeonAnnotator class to return annotations
strickvl Apr 25, 2024
827ef71
update docs page
strickvl Apr 25, 2024
dc54040
add image
strickvl Apr 25, 2024
219fcf3
linting fix
strickvl Apr 25, 2024
084d4f2
mypy fix
strickvl Apr 25, 2024
6d65dd6
mypy fixes and retake image
strickvl Apr 25, 2024
0279440
fix formatting
strickvl Apr 25, 2024
7515b81
Optimised images with calibre/image-actions
github-actions[bot] Apr 25, 2024
c940137
Remove unused Jupyter notebook and annotation file
strickvl Apr 25, 2024
9d6871e
add pigeon to toc
strickvl Apr 25, 2024
97a9b60
mypy fixes
strickvl Apr 25, 2024
0bf0ae2
docstring fixes
strickvl Apr 26, 2024
6464a47
Merge branch 'develop' into feature/pigeon-annotator
strickvl Apr 29, 2024
ce2b41d
update pigeon launch signature as per prodigy launch sig update
strickvl May 1, 2024
e02db76
Merge branch 'feature/pigeon-annotator' of github.com:zenml-io/zenml …
strickvl May 1, 2024
b899fc3
Merge remote-tracking branch 'origin/develop' into feature/pigeon-ann…
strickvl May 1, 2024
fffc009
pigeon conforms to prodigy updated signature
strickvl May 1, 2024
1979228
Merge remote-tracking branch 'origin/develop' into feature/pigeon-ann…
strickvl May 13, 2024
2328dfa
fix dataset_stats method
strickvl May 13, 2024
adb3acb
print -> logger
strickvl May 13, 2024
1f75b17
Merge branch 'develop' into feature/pigeon-annotator
strickvl May 16, 2024
Binary file added docs/book/.gitbook/assets/pigeon.png
@@ -55,11 +55,12 @@ The core parts of the annotation workflow include:
### List of available annotators

For production use cases, some more flavors can be found in specific `integrations` modules. In terms of annotators,
ZenML features an integration with `label_studio`.
ZenML features integrations with `label_studio` and `pigeon`.

| Annotator | Flavor | Integration | Notes |
|-----------------------------------------|----------------|----------------|----------------------------------------------------------------------|
| [LabelStudioAnnotator](label-studio.md) | `label_studio` | `label_studio` | Connect ZenML with Label Studio |
| [PigeonAnnotator](pigeon.md) | `pigeon` | `pigeon` | Connect ZenML with Pigeon. Notebook only & for image and text classification tasks. |
| [ProdigyAnnotator](prodigy.md) | `prodigy` | `prodigy` | Connect ZenML with [Prodigy](https://prodi.gy/) |
| [Custom Implementation](custom.md) | _custom_ | | Extend the annotator abstraction and provide your own implementation |

@@ -71,9 +72,11 @@ zenml annotator flavor list

### How to use it

The available implementation of the annotator is built on top of the Label Studio integration, which means that using an
annotator currently is no different from what's described on
the [Label Studio page: How to use it?](label-studio.md#how-do-you-use-it).
The available implementation of the annotator is built on top of the Label
Studio integration, which means that using an annotator currently is no
different from what's described on the [Label Studio page: How to use
it?](label-studio.md#how-do-you-use-it). ([Pigeon](pigeon.md) is also supported, but has
very limited functionality and only works within Jupyter notebooks.)

### A note on names

113 changes: 113 additions & 0 deletions docs/book/stacks-and-components/component-guide/annotators/pigeon.md
@@ -0,0 +1,113 @@
---
description: Annotating data using Pigeon.
---

# Pigeon

Pigeon is a lightweight, open-source annotation tool designed for quick and easy labeling of data directly within Jupyter notebooks. It provides a simple and intuitive interface for annotating various types of data, including:

* Text Classification
* Image Classification
* Text Captioning

### When would you want to use it?

![Pigeon annotator interface](../../../.gitbook/assets/pigeon.png)

If you need to label a small to medium-sized dataset as part of your ML workflow and prefer the convenience of doing it directly within your Jupyter notebook, Pigeon is a great choice. It is particularly useful for:

* Quick labeling tasks that don't require a full-fledged annotation platform
* Iterative labeling during the exploratory phase of your ML project
* Collaborative labeling within a Jupyter notebook environment

### How to deploy it?

To use the Pigeon annotator, you first need to install the ZenML Pigeon integration:

```shell
zenml integration install pigeon
```

Next, register the Pigeon annotator with ZenML, specifying the output directory where the annotation files will be stored:

```shell
zenml annotator register pigeon --flavor pigeon --output_dir="path/to/dir"
```

Note that the `output_dir` is relative to the repository or notebook root.

Finally, add the Pigeon annotator to your stack (if that stack is not already active, run `zenml stack set <YOUR_STACK_NAME>` afterwards):

```shell
zenml stack update <YOUR_STACK_NAME> --annotator pigeon
```

Now you're ready to use the Pigeon annotator in your ML workflow!

### How do you use it?

With the Pigeon annotator registered and added to your active stack, you can easily access it using the ZenML client within your Jupyter notebook.

For text classification tasks, you can launch the Pigeon annotator as follows:

````python
from zenml.client import Client

annotator = Client().active_stack.annotator

annotations = annotator.launch(
    data=[
        'I love this movie',
        'I was really disappointed by the book',
    ],
    options=[
        'positive',
        'negative',
    ],
)
````

For image classification tasks, you can provide a custom display function to render the images:

````python
from zenml.client import Client
from IPython.display import display, Image

annotator = Client().active_stack.annotator

annotations = annotator.launch(
    data=[
        '/path/to/image1.png',
        '/path/to/image2.png',
    ],
    options=[
        'cat',
        'dog',
    ],
    display_fn=lambda filename: display(Image(filename)),
)
````

The `launch` method returns the annotations as a list of tuples, where each tuple contains the data item and its corresponding label.
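
Because the return value is a plain list of `(item, label)` tuples, it can be post-processed with ordinary Python. The following is a minimal sketch, assuming annotations shaped as described above. The `annotations` literal stands in for the value returned by `annotator.launch(...)`, and `summarize_labels` is a hypothetical helper, not part of ZenML or Pigeon.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_labels(annotations: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count how many items received each label (hypothetical helper)."""
    return dict(Counter(label for _, label in annotations))


# Stand-in for the value returned by `annotator.launch(...)`.
annotations = [
    ("I love this movie", "positive"),
    ("I was really disappointed by the book", "negative"),
]

label_counts = summarize_labels(annotations)
print(label_counts)
```

A quick count like this can help spot a heavily imbalanced label distribution before the annotations are passed on to a training step.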

You can also use the `zenml annotator dataset` commands to manage your datasets:

* `zenml annotator dataset list` - List all available datasets
* `zenml annotator dataset delete <dataset_name>` - Delete a specific dataset
* `zenml annotator dataset stats <dataset_name>` - Get statistics for a specific dataset

Annotation files are saved as JSON files in the specified output directory. Each
annotation file represents a dataset, with the filename serving as the dataset
name.
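
Since the filename doubles as the dataset name, the datasets in an output directory can also be enumerated by hand. This is a rough sketch under that assumption only: `list_dataset_names` and `load_dataset` are hypothetical helpers rather than ZenML API, and nothing is assumed about the structure of each file beyond it being valid JSON.

```python
import json
from pathlib import Path
from typing import Any, List


def list_dataset_names(output_dir: str) -> List[str]:
    """Return dataset names, i.e. the stems of the JSON files in output_dir."""
    return sorted(path.stem for path in Path(output_dir).glob("*.json"))


def load_dataset(output_dir: str, dataset_name: str) -> Any:
    """Load the raw JSON content of a single annotation file."""
    file_path = Path(output_dir) / f"{dataset_name}.json"
    with open(file_path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `list_dataset_names("path/to/dir")` would be called with the same directory that was passed as `output_dir` when registering the annotator.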

## Acknowledgements

Pigeon was created by [Anastasis Germanidis](https://github.com/agermanidis) and
released as a [Python package](https://pypi.org/project/pigeon-jupyter/) and
[GitHub repository](https://github.com/agermanidis/pigeon). It is licensed under
the Apache License. The version used by this integration has been updated to work
with more recent `ipywidgets` versions and includes some small UI improvements. We
are grateful to Anastasis for creating this tool and making it available to the
community.

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>
1 change: 1 addition & 0 deletions docs/book/toc.md
@@ -164,6 +164,7 @@
* [Develop a Custom Feature Store](stacks-and-components/component-guide/feature-stores/custom.md)
* [Annotators](stacks-and-components/component-guide/annotators/annotators.md)
* [Label Studio](stacks-and-components/component-guide/annotators/label-studio.md)
* [Pigeon](stacks-and-components/component-guide/annotators/pigeon.md)
* [Prodigy](stacks-and-components/component-guide/annotators/prodigy.md)
* [Develop a Custom Annotator](stacks-and-components/component-guide/annotators/custom.md)
* [Image Builders](stacks-and-components/component-guide/image-builders/image-builders.md)
3 changes: 2 additions & 1 deletion scripts/install-zenml-dev.sh
@@ -32,7 +32,8 @@ install_integrations() {
# figure out the python version
python_version=$(python -c "import sys; print('.'.join(map(str, sys.version_info[:2])))")

ignore_integrations="feast label_studio bentoml seldon pycaret skypilot_aws skypilot_gcp skypilot_azure prodigy"
ignore_integrations="feast label_studio bentoml seldon pycaret skypilot_aws skypilot_gcp skypilot_azure pigeon prodigy"

# if python version is 3.11, exclude all integrations depending on kfp
# because they are not yet compatible with python 3.11
if [ "$python_version" = "3.11" ]; then
5 changes: 2 additions & 3 deletions src/zenml/integrations/__init__.py
@@ -17,8 +17,6 @@
support. This includes orchestrators like Apache Airflow, visualization tools
like the ``facets`` library, as well as deep learning libraries like PyTorch.
"""
import sys

from zenml.integrations.airflow import AirflowIntegration # noqa
from zenml.integrations.aws import AWSIntegration # noqa
from zenml.integrations.azure import AzureIntegration # noqa
@@ -49,8 +47,9 @@
from zenml.integrations.neptune import NeptuneIntegration # noqa
from zenml.integrations.neural_prophet import NeuralProphetIntegration # noqa
from zenml.integrations.openai import OpenAIIntegration # noqa
from zenml.integrations.pigeon import PigeonIntegration # noqa
from zenml.integrations.pillow import PillowIntegration # noqa
from zenml.integrations.polars import PolarsIntegration
from zenml.integrations.polars import PolarsIntegration # noqa
from zenml.integrations.prodigy import ProdigyIntegration # noqa
from zenml.integrations.pycaret import PyCaretIntegration # noqa
from zenml.integrations.pytorch import PytorchIntegration # noqa
1 change: 1 addition & 0 deletions src/zenml/integrations/constants.py
@@ -44,6 +44,7 @@
NEPTUNE = "neptune"
NEURAL_PROPHET = "neural_prophet"
OPEN_AI = "openai"
PIGEON = "pigeon"
PILLOW = "pillow"
PLOTLY = "plotly"
POLARS = "polars"
44 changes: 44 additions & 0 deletions src/zenml/integrations/pigeon/__init__.py
@@ -0,0 +1,44 @@
# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
"""Initialization of the Pigeon integration."""
from typing import List, Type

from zenml.integrations.constants import PIGEON
from zenml.integrations.integration import Integration
from zenml.stack import Flavor

PIGEON_ANNOTATOR_FLAVOR = "pigeon"


class PigeonIntegration(Integration):
    """Definition of Pigeon integration for ZenML."""

    NAME = PIGEON
    REQUIREMENTS = ["ipywidgets>=8.0.0"]

    @classmethod
    def flavors(cls) -> List[Type[Flavor]]:
        """Declare the stack component flavors for the Pigeon integration.

        Returns:
            List of stack component flavors for this integration.
        """
        from zenml.integrations.pigeon.flavors import (
            PigeonAnnotatorFlavor,
        )

        return [PigeonAnnotatorFlavor]


PigeonIntegration.check_installation()
20 changes: 20 additions & 0 deletions src/zenml/integrations/pigeon/annotators/__init__.py
@@ -0,0 +1,20 @@
# Copyright (c) ZenML GmbH 2024. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
# or implied. See the License for the specific language governing
# permissions and limitations under the License.
"""Initialization of the Pigeon annotators submodule."""

from zenml.integrations.pigeon.annotators.pigeon_annotator import (
    PigeonAnnotator,
)

__all__ = ["PigeonAnnotator"]