# Synthetic Data Generation


In [1]:
import json
import sys
import csv
sys.path.append('..')


import tinytroupe
from tinytroupe.agent import TinyPerson
from tinytroupe.environment import TinyWorld, TinySocialNetwork
from tinytroupe.factory import TinyPersonFactory
from tinytroupe.extraction import default_extractor as extractor
from tinytroupe.extraction import ResultsReducer
import tinytroupe.control as control


!!!!
DISCLAIMER: TinyTroupe relies on Artificial Intelligence (AI) models to generate content. 
The AI models are not perfect and may produce inappropriate or inaccurate results. 
For any serious or consequential use, please review the generated content before using it.
!!!!

Looking for default config on: c:\Users\pdasilva\OneDrive - Microsoft\Git repositories\tinytroupe-opensource\TinyTroupe\examples\..\tinytroupe\config.ini
Found custom config on: c:\Users\pdasilva\OneDrive - Microsoft\Git repositories\tinytroupe-opensource\TinyTroupe\examples\config.ini

Current TinyTroupe configuration 
[OpenAI]
api_type = openai
azure_api_version = 2023-05-15
model = gpt-4o-mini
max_tokens = 4000
temperature = 1.0
freq_penalty = 0.3
presence_penalty = 0.0
timeout = 60
max_attempts = 5
waiting_time = 0
exponential_backoff_factor = 5
embedding_model = text-embedding-3-small
cache_api_calls = False
cache_file_name = openai_api_cache.pickle
max_content_display_length = 1024

[Simulation]
rai_harmful_

Let's create the agents whose conversations will serve as our data.

In [2]:
factory = TinyPersonFactory("A random knowledge worker in a company providing marketing services.")

In [3]:
people = []
for i in range(2):
    person = factory.generate_person(temperature=1.6)
    print(person.minibio())
    people.append(person)

len(people)

Lucas Martinez is a 29 year old Digital Marketing Specialist, Spanish, currently living in Spain. Lucas Martinez is not only dedicated to his role as a Digital Marketing Specialist but also possesses a friendly demeanor that makes him a valued collaborator among his peers. His analytical nature drives him to seek data-driven solutions, although he sometimes struggles with anxiety over deadlines, which can hinder his creativity. Outside of work, Lucas finds joy in photography, capturing moments during his travels, and enjoys unwinding with video games. He also has a passion for cooking, often experimenting with new recipes on weekends, which allows him to express his creativity in a different way.
Clara Thompson is a 34 year old Content Strategist, American, currently living in United States. Clara Thompson is not only a dedicated Content Strategist but also a highly organized individual who thrives in collaborative environments. Her creative mindset drives her passion for storytelling,

2

In [4]:
company = TinyWorld("Some Corp Inc.", people)

In [5]:
company.make_everyone_accessible()

In [6]:
company.broadcast("Get some work done together, help each other.")

In [7]:
company.run(5)

We can now extract the conversations, which form the synthetic corpus we wanted.

In [8]:
people[0].pp_current_interactions()

In [9]:
reducer = ResultsReducer()

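def aux_extract_content(focus_agent: TinyPerson, source_agent: TinyPerson, target_agent: TinyPerson, kind: str, event: str, content: str, timestamp: str):
    """Map one TALK or CONVERSATION event to an (author, content) tuple."""

    if event == "TALK":
        # Something the focus agent said itself.
        author = focus_agent.name
    elif event == "CONVERSATION":
        # Something said to the focus agent; with no source agent, attribute it to the user.
        if source_agent is None:
            author = "USER"
        else:
            author = source_agent.name
    else:
        raise ValueError(f"Unknown event: {event}")

    entry = (author, content)
    print(entry)
    return entry
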
reducer.add_reduction_rule("TALK", aux_extract_content)
reducer.add_reduction_rule("CONVERSATION", aux_extract_content)
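
The rule registration above follows a dispatch-by-event-type pattern. The sketch below illustrates that pattern in isolation, without calling TinyTroupe. `SketchReducer`, `add_rule`, `reduce`, the dictionary-based event format, and the `"OTHER"` event type are all hypothetical names invented for this illustration; this is not how `ResultsReducer` is implemented, only a minimal stand-in under those assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical rule type: takes the focus agent's name and one event dict,
# and returns an (author, content) row.
Rule = Callable[[str, dict], Tuple[str, str]]


class SketchReducer:
    """Simplified stand-in for a rule-based reducer (NOT TinyTroupe's ResultsReducer)."""

    def __init__(self) -> None:
        self.rules: Dict[str, Rule] = {}

    def add_rule(self, event: str, rule: Rule) -> None:
        # One rule per event type; a later registration replaces an earlier one.
        self.rules[event] = rule

    def reduce(self, focus_name: str, events: List[dict]) -> List[Tuple[str, str]]:
        rows = []
        for ev in events:
            rule = self.rules.get(ev["event"])
            if rule is None:
                # Assumption of this sketch: events without a rule are skipped.
                continue
            rows.append(rule(focus_name, ev))
        return rows


def extract_author_and_content(focus_name: str, ev: dict) -> Tuple[str, str]:
    # Mirrors the logic of aux_extract_content on the simplified event format.
    if ev["event"] == "TALK":
        author = focus_name
    else:  # "CONVERSATION"
        author = ev.get("source") or "USER"
    return (author, ev["content"])


reducer_sketch = SketchReducer()
reducer_sketch.add_rule("TALK", extract_author_and_content)
reducer_sketch.add_rule("CONVERSATION", extract_author_and_content)

# Sample events in the hypothetical format; the texts are taken from the run shown in this notebook.
events = [
    {"event": "CONVERSATION", "source": None,
     "content": "Get some work done together, help each other."},
    {"event": "TALK",
     "content": "I think working together is a great idea! What tasks do you want to tackle first?"},
    {"event": "OTHER", "content": ""},  # made-up event type with no rule registered
    {"event": "CONVERSATION", "source": "Clara Thompson",
     "content": "I think we should start by brainstorming new content ideas for our upcoming campaign. What do you think?"},
]

rows = reducer_sketch.reduce("Lucas Martinez", events)
for row in rows:
    print(row)
```

Registering the same function for both event types keeps the author-attribution logic in one place, which is the same choice the notebook makes with `aux_extract_content`.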

Finally, we reduce the first agent's interactions to a dataframe with `author` and `content` columns and save it as a `.csv` file for later use in other applications.

In [10]:
df = reducer.reduce_agent_to_dataframe(people[0], column_names=["author", "content"])
df

('USER', 'Get some work done together, help each other.')
('Lucas Martinez', 'I think working together is a great idea! What tasks do you want to tackle first?')
('Lucas Martinez', "Hey Clara, I'm ready to get started on our projects. What do you think we should focus on first?")
('Clara Thompson', 'I think we should start by brainstorming new content ideas for our upcoming campaign. What do you think?')
('Lucas Martinez', 'I think brainstorming new content ideas is a great starting point! I have a few ideas in mind that we can discuss. What do you think?')
('Clara Thompson', "I'm excited to hear your ideas! What do you have in mind?")
('Lucas Martinez', 'I have a few ideas in mind! One is to create interactive social media posts that engage our audience. Another idea is to encourage user-generated content by running a contest. What do you think?')
('Clara Thompson', "I really like your ideas! Creating interactive social media posts and running a contest for user-generated content coul

|   | author | content |
|---|--------|---------|
| 0 | USER | Get some work done together, help each other. |
| 1 | Lucas Martinez | I think working together is a great idea! What... |
| 2 | Lucas Martinez | Hey Clara, I'm ready to get started on our pro... |
| 3 | Clara Thompson | I think we should start by brainstorming new c... |
| 4 | Lucas Martinez | I think brainstorming new content ideas is a g... |
| 5 | Clara Thompson | I'm excited to hear your ideas! What do you ha... |
| 6 | Lucas Martinez | I have a few ideas in mind! One is to create i... |
| 7 | Clara Thompson | I really like your ideas! Creating interactive... |
| 8 | Lucas Martinez | Great! Let's start by discussing the platforms... |
| 9 | Clara Thompson | I agree that Instagram and Facebook are great ... |


In [11]:
df.to_csv("../data/extractions/synthetic_data_generation.out.csv", index=False)
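
As one possible way to consume the saved corpus in another application, the sketch below reads `author`/`content` rows with the standard-library `csv` module and counts turns per author. To stay self-contained it reads from an in-memory sample (`sample_csv`, a hypothetical stand-in built from two rows of the output above) rather than from the file written by `to_csv`; in practice you would open that file instead.

```python
import csv
import io

# In-memory stand-in for the saved .csv file, with the same two columns.
# Assumption: the real file has an "author,content" header, as written with index=False.
sample_csv = io.StringIO()
writer = csv.writer(sample_csv)
writer.writerow(["author", "content"])
writer.writerow(["USER", "Get some work done together, help each other."])
writer.writerow(["Lucas Martinez",
                 "I think working together is a great idea! What tasks do you want to tackle first?"])
sample_csv.seek(0)

# DictReader uses the header row as keys, so each row is {"author": ..., "content": ...}.
rows = list(csv.DictReader(sample_csv))

# Count how many turns each author contributed.
turns_per_author = {}
for row in rows:
    turns_per_author[row["author"]] = turns_per_author.get(row["author"], 0) + 1

print(turns_per_author)
```

The `csv` module quotes fields containing commas when writing and unquotes them when reading, so message text such as the first row's content round-trips unchanged.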