Difficulty in viewing dataset plots that have long text and numerous items #295

Open
jeongukjae opened this issue Aug 2, 2023 · 0 comments
Labels
bug (Something isn't working), contributions welcome (This issue is ready to be worked on)

Comments

@jeongukjae
Contributor

What happened?

When I render a model card containing a histogram with very long labels and many items, the resulting plot is hard to read.

For instance, rendering string statistics with 50 numbered lorem ipsum buckets produces a model card like the one shown below.

[Screenshot 2023-08-02, 12:12:56 PM] [Screenshot 2023-08-02, 12:13:02 PM]

The labels overlap and are difficult to read.

What is the expected behavior?

Clearer plots.

I think it would be beneficial for model-card-toolkit to limit the number of words per label and the number of items when generating histogram plots. The current implementation is:

def draw_histogram(graph: Graph) -> Optional[Graph]:
  """Draw a histogram given the graph.
  Args:
    graph: The Graph object represents the necessary data to draw a histogram.
  Returns:
    A Graph object, or None if plotting raises TypeError given the raw data.
  """
  if not graph:
    return None
  try:
    # generate and open a new figure
    figure, ax = plt.subplots()
    # When graph.x or y is str, the histogram is ill-defined.
    ax.barh(graph.y, graph.x, color=graph.color)
    ax.set_title(graph.title)
    if graph.xlabel:
      ax.set_xlabel(graph.xlabel)
    if graph.ylabel:
      ax.set_ylabel(graph.ylabel)
    for index, value in enumerate(graph.x):
      show_value = f'{value:.2f}' if isinstance(value, float) else value
      # To avoid the number has overlap with the box of the graph.
      if value > 0.9 * max(graph.x):
        ax.text(
            value - (value / 10), index, show_value, va='center', color='w'
        )
      else:
        ax.text(value, index, show_value, va='center')
    graph.figure = figure
    graph.base64str = figure_to_base64str(figure)
  except TypeError as e:
    logging.info('skipping %s for histogram; plot error: %s:', graph.name, e)
    return None
  finally:
    # closes the figure (to limit memory consumption)
    plt.close()
  return graph
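
To make the suggestion concrete, here is a rough sketch of how labels and item counts might be limited before plotting. The helper names (`truncate_label`, `limit_histogram_items`) and the `max_chars` / `max_items` parameters are hypothetical and not part of model-card-toolkit. Labels are shortened in the middle rather than at the end, on the assumption that labels like the ones in the reproduction below differ only in their trailing number.

```python
from typing import List, Sequence, Tuple


def truncate_label(label: str, max_chars: int = 40) -> str:
    """Shorten a label to at most max_chars by eliding its middle.

    Hypothetical helper; keeps the head and tail so that labels which
    differ only at the end remain distinguishable.
    """
    if max_chars < 5:
        raise ValueError('max_chars must be at least 5')
    if len(label) <= max_chars:
        return label
    budget = max_chars - 3  # room left after the '...' marker
    head = (budget + 1) // 2
    tail = budget // 2
    return label[:head] + '...' + label[-tail:]


def limit_histogram_items(
    labels: Sequence[str],
    values: Sequence[float],
    max_items: int = 20,
    max_chars: int = 40,
) -> Tuple[List[str], List[float]]:
    """Keep the max_items largest buckets and shorten their labels.

    Hypothetical helper; returns (labels, values) sorted by value,
    largest first.
    """
    pairs = sorted(zip(labels, values), key=lambda p: p[1], reverse=True)
    pairs = pairs[:max_items]
    kept_labels = [truncate_label(str(label), max_chars) for label, _ in pairs]
    kept_values = [value for _, value in pairs]
    return kept_labels, kept_values
```

In `draw_histogram`, the returned lists could be passed to `ax.barh` in place of `graph.y` and `graph.x`. Note that shortened labels can still collide if the originals differ only in the elided middle, so a real fix may need to de-duplicate them.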

How can we reproduce the problem?

Run the following code to reproduce the rendering shown above:

from tensorflow_metadata.proto.v0 import statistics_pb2

from model_card_toolkit import model_card
from model_card_toolkit.utils import tf_graphics

lorem = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed efficitur, enim sit amet ultrices malesuada, lorem augue rhoncus quam, sit amet ullamcorper dolor ligula quis est. Sed tempor blandit pharetra. Aenean facilisis eu lacus non molestie. Sed enim turpis, semper vel gravida sed, egestas at lacus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Aliquam at libero posuere, dapibus tellus at, aliquet ipsum. Fusce quis ante nec neque interdum mollis mattis vitae ante. Curabitur aliquet enim enim, ac porttitor nibh lobortis nec. Nam id gravida ex. Donec mi magna, fermentum ac pulvinar vitae, cursus vel odio."
feature = statistics_pb2.FeatureNameStatistics()
# append() adds one path step; extend() on a str would add one step per character.
feature.path.step.append("string_feature")
feature.type = statistics_pb2.FeatureNameStatistics.STRING
for i in range(50):
    bucket = feature.string_stats.rank_histogram.buckets.add()
    bucket.label = f"{lorem} {i}"
    bucket.sample_count = 1000 + i * 100

feature_stats = statistics_pb2.DatasetFeatureStatistics()
feature_stats.features.add().CopyFrom(feature)
datasets = statistics_pb2.DatasetFeatureStatisticsList()
datasets.datasets.add().CopyFrom(feature_stats)

mc = model_card.ModelCard()
tf_graphics.annotate_dataset_feature_statistics_plots(
    mc, [datasets]
)

mc.render(
    template_path="model_card_toolkit/template/html/default_template.html.jinja",
    output_path="sample/model_card.html"
)

Model Card Toolkit Version

2.0.0

Python Version

3.8.10

Platforms

docker

Relevant log output

No response

@codesue added the bug and contributions welcome labels on Aug 3, 2023