Description
I encountered an issue with TaskWeaver's code interpreter when analyzing a CSV file of customer complaints. After I uploaded the CSV and asked for the top 5 themes of the complaints, the interpreter generated Python code that uses Latent Dirichlet Allocation (LDA) for topic modeling. However, the resulting "themes" were only lists of words without meaningful context, which falls short of the coherent thematic summaries I expected.
Steps to Reproduce:
- Upload a CSV file with a column containing the text of customer complaints.
- Ask TaskWeaver: "Provide the top 5 themes of the complaints."
Expected Behavior: The Large Language Model (LLM) should interpret the complaints and provide coherent thematic summaries, such as:
- Billing Issues
- Service Delays
- Product Quality Concerns
- Customer Support Complaints
- Account Access Problems
Actual Behavior: The code interpreter generated the following Python code:
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# `df` is the DataFrame loaded from the uploaded CSV earlier in the session
# Vectorize the complaint texts
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(df['COMPLAINT_SUM_TXT'])

# Apply LDA for topic modeling
lda = LatentDirichletAllocation(n_components=5, random_state=0)
lda.fit(X)

# Get the top words for each topic
def get_top_words(model, feature_names, n_top_words):
    top_words = []
    for topic_idx, topic in enumerate(model.components_):
        # argsort is ascending; this slice keeps the n_top_words largest weights, highest first
        top_words.append([feature_names[i] for i in topic.argsort()[:-n_top_words - 1:-1]])
    return top_words

n_top_words = 5
feature_names = vectorizer.get_feature_names_out()
top_themes = get_top_words(lda, feature_names, n_top_words)
top_themes
```
This code produced the following output:
1. ['Databricks', 'client', 'account', 'told', 'form']
2. ['client', 'Databricks', 'funds', 'contract', 'received']
3. ['policy', 'client', 'account', 'called', 'told']
4. ['client', 'Databricks', 'account', 'states', 'advisor']
5. ['client', 'policy', 'received', '2023', 'told']
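Each "theme" is just a list of words, and generic terms such as 'client', 'Databricks', 'account' and 'told' recur across most of the topics, so the output shows no clear themes and is hard to turn into actionable insights.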
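Part of the problem is that corpus-wide words like 'client' and 'account' dominate every topic. A minimal sketch of spotting such words by document frequency, using only the standard library; the function name `high_df_terms` and its simplified tokenizer are illustrative assumptions, not part of TaskWeaver or scikit-learn:

```python
import re
from collections import Counter

def high_df_terms(texts, max_df=0.5):
    """Return terms that appear in more than `max_df` (a fraction) of the documents.

    Such corpus-wide terms carry little thematic signal and are candidates
    for extra stop words. The regex tokenizer here is a simplification and
    does not match CountVectorizer's default token_pattern exactly.
    """
    n_docs = len(texts)
    doc_freq = Counter()
    for text in texts:
        # Count each lowercase token at most once per document.
        doc_freq.update(set(re.findall(r"[a-z0-9]+", text.lower())))
    return sorted(term for term, df in doc_freq.items() if df / n_docs > max_df)

complaints = [
    "Client called about account fees",
    "Client says account locked",
    "Client received late refund",
    "Shipping delay on order",
]
print(high_df_terms(complaints, max_df=0.5))  # ['client']
```

Within scikit-learn itself, `CountVectorizer(max_df=0.5)` ignores terms whose document frequency is strictly above that fraction, and `stop_words` also accepts a list, e.g. `list(ENGLISH_STOP_WORDS) + extra_terms` with `ENGLISH_STOP_WORDS` imported from `sklearn.feature_extraction.text`. This sharpens the word lists but still does not produce named themes.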
Is there a way to extend TaskWeaver so that it uses the LLM, or invokes an agent or plugin, to interpret the code's output and generate themes like those listed above? Any suggestions, or different approaches, would be appreciated. Thank you!
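One possible approach, sketched under stated assumptions: keep LDA for grouping, then send each topic's top words together with a few of its most representative complaints to an LLM and ask it for a theme name. The helper names below are hypothetical (they are not TaskWeaver APIs), and `llm` stands in for whatever function sends a prompt to a model and returns its text reply:

```python
from typing import Callable, List, Sequence

def representative_docs(doc_topic: Sequence[Sequence[float]], texts: Sequence[str],
                        topic: int, n: int = 3) -> List[str]:
    """Return the n texts with the highest weight for `topic`.

    `doc_topic` is a documents x topics matrix, e.g. the result of
    lda.transform(X).tolist().
    """
    ranked = sorted(range(len(texts)), key=lambda i: doc_topic[i][topic], reverse=True)
    return [texts[i] for i in ranked[:n]]

def build_label_prompt(top_words: Sequence[str], examples: Sequence[str]) -> str:
    """Compose a prompt asking the LLM to name one theme from keywords plus examples."""
    lines = [
        "These keywords and sample customer complaints come from one topic.",
        "Keywords: " + ", ".join(top_words),
        "Sample complaints:",
    ]
    lines += [f"- {text}" for text in examples]
    lines.append("Reply with a short theme name (2-5 words) and a one-sentence summary.")
    return "\n".join(lines)

def label_topics(top_themes: Sequence[Sequence[str]],
                 doc_topic: Sequence[Sequence[float]],
                 texts: Sequence[str],
                 llm: Callable[[str], str],
                 n_examples: int = 3) -> List[str]:
    """Ask `llm` (hypothetical prompt -> text function) to label every topic."""
    labels = []
    for topic, words in enumerate(top_themes):
        examples = representative_docs(doc_topic, texts, topic, n_examples)
        labels.append(llm(build_label_prompt(words, examples)))
    return labels
```

With the generated code above, the inputs would be `top_themes`, `lda.transform(X).tolist()` and `df['COMPLAINT_SUM_TXT'].tolist()`. Including example complaints matters because bare keywords such as 'client' or 'told' give the model little to work with. For a small CSV, a simpler alternative is to skip LDA and pass batches of complaints directly to the LLM, asking for themes per batch and then merging them.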