# Simulating Constitutive Processes of Semantic Change within Heterogeneous Populations of Speakers

In [1]:
# basic imports
import torch
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm

# project code imports
from mod.agent import *
from mod.network import *
from mod.plot import *

In [2]:
no_agents = 50
no_connections = 10

## A very basic simulation

In [3]:
vocab_size = 10
semantic_features = 3

In [4]:
ag1 = agent(vocab_size, semantic_features, starting_observations=5)
ag2 = agent(vocab_size, semantic_features, starting_observations=5)

# fixed environment: Gaussian around a randomly drawn mean, covariance 0.2 * I
starting_env = torch.distributions.MultivariateNormal(
    torch.randn(size=(1, semantic_features)),
    covariance_matrix=torch.eye(semantic_features) * .2,
)
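`starting_env` is a single fixed distribution: its mean is drawn once from a standard normal and its covariance is `0.2 * I`, so every turn samples a slightly different environment around the same point. A short sketch of what `sample()` returns, rebuilding the same distribution with a fixed seed (the seed is an addition here, only for reproducibility):

```python
import torch

torch.manual_seed(0)  # seed added only so this sketch is reproducible
semantic_features = 3

# same construction as starting_env: random (1, 3) mean, isotropic covariance 0.2 * I
mean = torch.randn(size=(1, semantic_features))
env_dist = torch.distributions.MultivariateNormal(
    mean, covariance_matrix=torch.eye(semantic_features) * .2
)

env = env_dist.sample()
print(env.shape)  # one environment per turn, one value per semantic feature

# many draws scatter around the fixed mean with per-feature variance close to 0.2
draws = env_dist.sample((10_000,))  # shape (10000, 1, 3)
print(draws.mean(dim=0))
print(draws.var(dim=0))
```

Because the mean never moves, the agents in this basic simulation are all learning about one stationary environment.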

#### Learning for a single semantic dimension

In [5]:
((ag1.vocab - ag2.vocab)**2).sum()

tensor(70.4477)
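The alignment measure `((ag1.vocab - ag2.vocab)**2).sum()` is the squared Frobenius (Euclidean) distance between the two vocabulary tensors. A small sketch with hypothetical random matrices standing in for the agents' vocabularies (the `(vocab_size, semantic_features)` shape is an assumption about how `mod.agent` stores them):

```python
import torch

def vocab_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Squared Frobenius distance: sum of squared element-wise differences."""
    return ((a - b) ** 2).sum()

# hypothetical stand-ins for ag1.vocab and ag2.vocab
torch.manual_seed(0)
v1 = torch.randn(10, 3)
v2 = torch.randn(10, 3)

d = vocab_distance(v1, v2)
# the same quantity via PyTorch's built-in Frobenius norm, squared
d_norm = torch.linalg.matrix_norm(v1 - v2, ord="fro") ** 2
print(d, d_norm)

# because the differences are squared, halving every gap divides the distance by 4
halfway = v1 + 0.5 * (v2 - v1)
print(vocab_distance(v1, halfway) / d)  # approximately 0.25
```

The squaring matters when reading the numbers: a large drop in this measure corresponds to a smaller relative drop in the raw distance between word vectors.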

In [6]:
# we'll just focus on one semantic dimension here!
word_mask = torch.FloatTensor([0,1,0])

# and set the number of iterations
turns = 300
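In the cells shown in this notebook, `word_mask` is not passed to `speak` or `listen`, so the loop still runs over all three features. If the intent is to restrict attention to the second feature, one hypothetical way to apply a 0/1 mask is element-wise multiplication, which zeroes the other dimensions and broadcasts over the leading batch dimension (the 2-word `vocab` below is made up for illustration):

```python
import torch

word_mask = torch.tensor([0., 1., 0.])  # equivalent to torch.FloatTensor([0, 1, 0])

env = torch.tensor([[0.7, -1.2, 0.4]])  # one (1, 3) environment, as sampled above
masked_env = env * word_mask            # (1, 3) * (3,) broadcasts to (1, 3)
print(masked_env)                       # only the middle feature survives

# the same mask restricts a per-word squared distance to the selected feature
vocab = torch.tensor([[0.0, 1.0, 2.0],
                      [3.0, -1.0, 5.0]])  # hypothetical 2-word vocabulary
per_word_sq = (((vocab - env) * word_mask) ** 2).sum(dim=1)
print(per_word_sq)  # only the middle column contributes
```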

In [7]:
utt_tracking, vocab_dif = [], []

In [8]:
for _ in tqdm(range(turns)):
    env = starting_env.sample()
    speaker_prob = torch.rand(size=(1,))

    # pick who speaks this turn; the other agent is the listener
    speaker, listener = (ag1, ag2) if speaker_prob > .5 else (ag2, ag1)

    utt = speaker.speak(env, lam=3)
    # both agents update on the shared utterance
    ag2.listen(utt, env)
    ag1.listen(utt, env)

    # record the utterance alongside the listener's own choice for this env
    utt_tracking += [(utt, listener.speak(env, lam=3))]
    vocab_dif += [((ag1.vocab - ag2.vocab)**2).sum()]

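The `agent` class lives in `mod.agent`, which isn't shown here, so the actual update rules behind `speak(env, lam=...)` and `listen(utt, env)` can't be read off this notebook. The toy below is a hypothetical stand-in (`ToyAgent`, its learning rate `lr`, and its rules are assumptions, not the project's implementation) that makes the same turn-taking loop runnable on its own: each word is a point in feature space, the speaker samples a word with probability proportional to $\exp(-\lambda \cdot d^2)$, where $d^2$ is the word's squared distance to the environment (reading `lam` as an inverse temperature is a guess), and both agents move the spoken word a fraction `lr` of the way toward the environment.

```python
import torch

class ToyAgent:
    """Hypothetical stand-in for mod.agent.agent; not the project's implementation."""

    def __init__(self, vocab_size: int, n_features: int, lr: float = 0.1):
        self.vocab = torch.randn(vocab_size, n_features)  # one row per word
        self.lr = lr

    def speak(self, env: torch.Tensor, lam: float = 3.0) -> int:
        # softmax over negative squared distances: closer words are likelier
        sq_dist = ((self.vocab - env) ** 2).sum(dim=1)
        probs = torch.softmax(-lam * sq_dist, dim=0)
        return int(torch.multinomial(probs, 1).item())

    def listen(self, utt: int, env: torch.Tensor) -> None:
        # move the heard word a fraction lr of the way toward the environment
        self.vocab[utt] += self.lr * (env.squeeze(0) - self.vocab[utt])


torch.manual_seed(0)
ag1, ag2 = ToyAgent(10, 3), ToyAgent(10, 3)
env_dist = torch.distributions.MultivariateNormal(
    torch.randn(1, 3), covariance_matrix=torch.eye(3) * .2
)

start_gap = ((ag1.vocab - ag2.vocab) ** 2).sum(dim=1)  # per-word squared gap
for _ in range(300):
    env = env_dist.sample()
    speaker = ag1 if torch.rand(1).item() > .5 else ag2
    utt = speaker.speak(env, lam=3)
    for ag in (ag1, ag2):  # both agents update on the shared utterance
        ag.listen(utt, env)
end_gap = ((ag1.vocab - ag2.vocab) ** 2).sum(dim=1)

print(f"vocab difference: {start_gap.sum().item():.2f} -> {end_gap.sum().item():.2f}")
```

In this toy, every use of a word shrinks that word's gap between the two agents by a factor of $(1 - \text{lr})^2$, so the total difference can only fall; words that are rarely or never chosen keep their initial mismatch, which is one way a difference curve can level off above zero.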

In [9]:
utt_tracking = torch.FloatTensor(utt_tracking)
vocab_dif = torch.FloatTensor(vocab_dif)

In [10]:
((ag1.vocab - ag2.vocab)**2).sum()

tensor(6.6240)

In [11]:
fig = plot(vocab_dif.numpy(), 'vocabulary difference')
fig.update_layout(
    title='Dyadic interaction in a random environment',
    yaxis_title='Δ vocab',
    xaxis_title='turns'
)
fig.show()

## Returning to forced birth vs. pro-life

So this one is trolly and fun. Basically, we want to replicate the changes in the relative frequency of _forced birth_ (FB) versus _pro-life_ (PL) in the months before and after the _Dobbs_ decision. Each time window can be described by a set of semantic features, each weighted by the relative probability that a term will be associated with that feature. Something like the following table (note: these aren't strictly normalized probabilities; each row below sums to 1.0001. I'm not sure whether we ought to normalize them or not.):

| **Date range**  | **Antiabortion** | **Legality** | **($\neg$) Activist** | **Morality** |
|-----------------|------------------|--------------|-----------------------|--------------|
| _2022/1-2022/5_ | .35              | .2           | .45                   | .0001        |
| _2022/6-2023/1_ | .2               | .45          | .35                   | .0001        |
| ...             | ...              | ...          | ...                   | ...          |
| _2024/1-2024/5_ | .0001            | .2           | .45                   | .35          |

which we can then use as a series of environments that dictate (1) what people say and (2) how people update their beliefs about the constraints on when to use certain words. We can even initialize the network with the same number of "users" as there are on _r/Feminism_!
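One way to turn the table into a schedule of environments, sketched under assumptions the notes above leave open: each row becomes a categorical distribution over the four features (rows are divided by their sums, which settles the normalization question one particular way), and each simulated month draws its features from the row covering that month. The feature names, the `YYYY-MM` month format, and the `env_for_month` helper are hypothetical; the elided `...` rows are left out rather than guessed.

```python
import torch

features = ["antiabortion", "legality", "not_activist", "morality"]

# the three rows given in the table; the elided middle rows are not filled in
period_weights = {
    ("2022-01", "2022-05"): [.35, .2, .45, .0001],
    ("2022-06", "2023-01"): [.2, .45, .35, .0001],
    ("2024-01", "2024-05"): [.0001, .2, .45, .35],
}

periods = list(period_weights)
weights = torch.tensor([period_weights[p] for p in periods])  # shape (3, 4)
# Categorical also rescales probs internally; normalizing here makes it explicit
probs = weights / weights.sum(dim=1, keepdim=True)

def env_for_month(month: str) -> torch.distributions.Categorical:
    """Hypothetical helper: feature distribution for a 'YYYY-MM' month."""
    for i, (start, end) in enumerate(periods):
        if start <= month <= end:  # zero-padded 'YYYY-MM' strings sort chronologically
            return torch.distributions.Categorical(probs=probs[i])
    raise KeyError(f"no table row covers {month}")

torch.manual_seed(0)
draws = env_for_month("2022-08").sample((1000,))  # feature indices, shape (1000,)
counts = torch.bincount(draws, minlength=len(features))
print(dict(zip(features, counts.tolist())))
```

Months that fall in the elided rows (e.g. `"2023-06"`) raise a `KeyError` here instead of silently borrowing a neighbouring row, which keeps the gaps in the table visible.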
