Difficulty in viewing dataset plots that have long text and numerous items #295

jeongukjae · 2023-08-02T03:23:48Z

What happened?

When I render a model card containing a histogram with very long labels and many items, the resulting plot is hard to read.

For instance, rendering string statistics with 50 numbered lorem ipsum buckets produces a model card like the one shown below.

The labels overlap and are difficult to read.

What is the expected behavior?

Clearer plots.

I think it would be beneficial for model-card-toolkit to limit the number of words per label and the number of items shown when generating histogram plots.

model-card-toolkit/model_card_toolkit/utils/graphics.py

Lines 52 to 91 in 74d7e6d

    
    def draw_histogram(graph: Graph) -> Optional[Graph]:
      """Draw a histogram given the graph.

      Args:
        graph: The Graph object represents the necessary data to draw a histogram.

      Returns:
        A Graph object, or None if plotting raises TypeError given the raw data.
      """
      if not graph:
        return None
      try:
        # generate and open a new figure
        figure, ax = plt.subplots()
        # When graph.x or y is str, the histogram is ill-defined.
        ax.barh(graph.y, graph.x, color=graph.color)
        ax.set_title(graph.title)
        if graph.xlabel:
          ax.set_xlabel(graph.xlabel)
        if graph.ylabel:
          ax.set_ylabel(graph.ylabel)
        for index, value in enumerate(graph.x):
          show_value = f'{value:.2f}' if isinstance(value, float) else value
          # To avoid the number has overlap with the box of the graph.
          if value > 0.9 * max(graph.x):
            ax.text(
                value - (value / 10), index, show_value, va='center', color='w'
            )
          else:
            ax.text(value, index, show_value, va='center')
        graph.figure = figure
        graph.base64str = figure_to_base64str(figure)
      except TypeError as e:
        logging.info('skipping %s for histogram; plot error: %s:', graph.name, e)
        return None
      finally:
        # closes the figure (to limit memory consumption)
        plt.close()
      return graph
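
A minimal sketch of the kind of limit suggested above, written as two standalone helpers that could be applied to the labels and counts before they are plotted. The names `truncate_label` and `limit_items` and the defaults `max_chars=40` and `max_items=20` are hypothetical and are not part of model-card-toolkit; this is one possible approach, not the library's implementation.

```python
from typing import List, Sequence, Tuple


def truncate_label(label: str, max_chars: int = 40) -> str:
    """Shorten a label to at most max_chars characters (hypothetical helper).

    Assumes max_chars > 3 so there is room for the trailing '...'.
    """
    if len(label) <= max_chars:
        return label
    return label[: max_chars - 3].rstrip() + "..."


def limit_items(
    labels: Sequence[str], values: Sequence[float], max_items: int = 20
) -> Tuple[List[str], List[float]]:
    """Keep the max_items largest values, in their original order (hypothetical helper)."""
    if len(labels) <= max_items:
        return list(labels), list(values)
    # Indices of the largest values, then restored to original order.
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:max_items]
    keep = sorted(top)
    return [labels[i] for i in keep], [values[i] for i in keep]


# Example with data shaped like the reproduction script: 50 long labels.
labels = [f"{'lorem ipsum ' * 20}{i}" for i in range(50)]
values = [1000 + i * 100 for i in range(50)]
short_labels, short_values = limit_items(labels, values, max_items=20)
short_labels = [truncate_label(label) for label in short_labels]
```

One caveat with plain truncation: labels that differ only near the end, like the numbered lorem ipsum buckets here, become identical after shortening. If the plot is drawn against the label strings themselves, matplotlib treats identical strings as the same category, so it may be safer to plot against integer positions and set the shortened strings as tick labels.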

How can we reproduce the problem?

Run the following code to reproduce the image above:

from tensorflow_metadata.proto.v0 import statistics_pb2

from model_card_toolkit import model_card
from model_card_toolkit.utils import tf_graphics

lorem = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed efficitur, enim sit amet ultrices malesuada, lorem augue rhoncus quam, sit amet ullamcorper dolor ligula quis est. Sed tempor blandit pharetra. Aenean facilisis eu lacus non molestie. Sed enim turpis, semper vel gravida sed, egestas at lacus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Aliquam at libero posuere, dapibus tellus at, aliquet ipsum. Fusce quis ante nec neque interdum mollis mattis vitae ante. Curabitur aliquet enim enim, ac porttitor nibh lobortis nec. Nam id gravida ex. Donec mi magna, fermentum ac pulvinar vitae, cursus vel odio."
feature = statistics_pb2.FeatureNameStatistics()
feature.path.step.append("string_feature")  # append, not extend: extend would add one step per character
feature.type = statistics_pb2.FeatureNameStatistics.STRING
for i in range(50):
    bucket = feature.string_stats.rank_histogram.buckets.add()
    bucket.label = f"{lorem} {i}"
    bucket.sample_count = 1000 + i * 100

feature_stats = statistics_pb2.DatasetFeatureStatistics()
feature_stats.features.add().CopyFrom(feature)
datasets = statistics_pb2.DatasetFeatureStatisticsList()
datasets.datasets.add().CopyFrom(feature_stats)

mc = model_card.ModelCard()
tf_graphics.annotate_dataset_feature_statistics_plots(
    mc, [datasets]
)

mc.render(
    template_path="model_card_toolkit/template/html/default_template.html.jinja",
    output_path="sample/model_card.html"
)

Model Card Toolkit Version

2.0.0

Python Version

3.8.10

Platforms

docker

Relevant log output

No response

codesue added the "bug" and "contributions welcome" labels on Aug 3, 2023
