Code Interpreter Generates Inadequate Themes for Complaints in CSV

I encountered an issue with TaskWeaver's code interpreter when analyzing a CSV file containing customer complaints. After I uploaded the CSV and requested the top 5 themes of the complaints, the interpreter generated Python code that uses Latent Dirichlet Allocation (LDA) for topic modeling. However, the resulting themes were merely lists of words without meaningful context, not the coherent thematic summaries I expected.

**Steps to Reproduce:**
1. Upload a CSV file with a column containing textual data of customer complaints.
2. Ask TaskWeaver: "Provide the top 5 themes of the complaints."

**Expected Behavior:** The Large Language Model (LLM) should interpret the complaints and provide coherent thematic summaries, such as:
1. Billing Issues
2. Service Delays
3. Product Quality Concerns
4. Customer Support Complaints
5. Account Access Problems

**Actual Behavior:** The code interpreter generated the following Python code:
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Vectorize the complaint texts
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(df['COMPLAINT_SUM_TXT'])

# Apply LDA for topic modeling
lda = LatentDirichletAllocation(n_components=5, random_state=0)
lda.fit(X)

# Get the top words for each topic
def get_top_words(model, feature_names, n_top_words):
    top_words = []
    for topic_idx, topic in enumerate(model.components_):
        top_words.append([feature_names[i] for i in topic.argsort()[:-n_top_words - 1:-1]])
    return top_words

n_top_words = 5
feature_names = vectorizer.get_feature_names_out()
top_themes = get_top_words(lda, feature_names, n_top_words)
top_themes
```

**This code produced the following output:**
```
1. ['Databricks', 'client', 'account', 'told', 'form']
2. ['client', 'Databricks', 'funds', 'contract', 'received']
3. ['policy', 'client', 'account', 'called', 'told']
4. ['client', 'Databricks', 'account', 'states', 'advisor']
5. ['client', 'policy', 'received', '2023', 'told']
```

These outputs are lists of words without clear thematic context, making it difficult to derive actionable insights.
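
To illustrate the kind of post-processing I have in mind, here is a minimal sketch, not a working TaskWeaver integration. It takes the keyword lists produced by the LDA code plus a few example complaints per topic and asks an LLM for a short label. The names `build_theme_prompt`, `label_topics`, `examples_per_topic`, and the `llm` callable are all hypothetical; `llm` stands in for whatever prompt-to-text function is available.

```python
from typing import Callable, List, Sequence


def build_theme_prompt(keywords: Sequence[str], examples: Sequence[str]) -> str:
    """Build a prompt asking an LLM to name one theme from LDA keywords."""
    example_lines = "\n".join(f"- {text}" for text in examples)
    return (
        "Name the customer-complaint theme described below in at most "
        "five words.\n"
        f"Keywords: {', '.join(keywords)}\n"
        f"Example complaints:\n{example_lines}"
    )


def label_topics(
    top_themes: Sequence[Sequence[str]],
    examples_per_topic: Sequence[Sequence[str]],
    llm: Callable[[str], str],  # hypothetical: any prompt -> text function
) -> List[str]:
    """Return one short label per topic, in the same order as top_themes."""
    labels = []
    for keywords, examples in zip(top_themes, examples_per_topic):
        prompt = build_theme_prompt(keywords, examples)
        labels.append(llm(prompt).strip())
    return labels
```

The example complaints for each topic could presumably be chosen as the rows with the highest weight for that topic in `lda.transform(X)`, but I have not tried this.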

Is there a way to have TaskWeaver use the LLM, or invoke an agent or plugin, to post-process the code's output and generate the expected themes? Any suggestions or alternative approaches would be appreciated. Thank you!

Code Interpreter Generates Inadequate Themes for Complaints in CSV #465
