# Redirection Demo in US Supreme Court oral arguments

This notebook demonstrates the redirection framework introduced in our paper, **Taking a turn for the better: Conversation redirection throughout the course of mental-health therapy.** In the paper, we define redirection as the extent to which speakers shift the immediate focus of the conversation, and we apply our measure in the context of long-term messaging therapy. In this demo, we provide an initial exploration of how the framework can be applied in other domains, in particular to a publicly available dataset of U.S. Supreme Court oral arguments (Danescu-Niculescu-Mizil et al., 2012; Chang et al., 2020). Although court proceedings differ from therapy in topics, goals, and interaction styles, their relatively unstructured and dynamic nature lets us examine how such discussions are redirected.

In this setting, we focus on the interactions between justices and lawyers. The power dynamics between these distinct roles reflect the asymmetric relationship between therapists and patients in mental-health domains, where one party generally holds more influence over the conversation.

In [2]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"

We first import `Corpus` from ConvoKit, along with our wrapper models and default config objects.

In [3]:
from convokit import Corpus
from likelihoodModel import LikelihoodModel
from gemmaLikelihoodModel import GemmaLikelihoodModel
from redirectionModel import RedirectionModel
from config import DEFAULT_BNB_CONFIG, DEFAULT_LORA_CONFIG, DEFAULT_TRAIN_CONFIG

We then load the `supreme-court` corpus we will be using for training and analysis.

In [4]:
DATA_DIR = '/supreme_court/supreme-corpus'
corpus = Corpus(DATA_DIR)
corpus.print_summary_stats()

Number of Speakers: 8979
Number of Utterances: 1700789
Number of Conversations: 7817


Next, we define the likelihood model, which computes the likelihood of an utterance given a provided context. These likelihoods are later used to compute a redirection score for each utterance. Here, we use a Gemma-2B likelihood model called `GemmaLikelihoodModel`, which inherits from the default `LikelihoodModel` interface. Other models (Gemma, Llama, Mistral, etc.) can be supported by inheriting from this base interface.
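To make the inheritance pattern concrete, here is a minimal sketch of a likelihood interface with one toy subclass. Everything in it is an assumption for illustration: `ToyLikelihoodBase`, `ToyOverlapLikelihood`, and the method names `fit` and `log_likelihood` are hypothetical and are not the actual `LikelihoodModel` API. The toy model is a smoothed unigram counter, not a language model.

```python
import math
from abc import ABC, abstractmethod
from collections import Counter


class ToyLikelihoodBase(ABC):
    """Hypothetical stand-in for a likelihood interface (not the notebook's real class)."""

    @abstractmethod
    def fit(self, pairs):
        """Train on (context, utterance) string pairs."""

    @abstractmethod
    def log_likelihood(self, context, utterance):
        """Return log P(utterance | context)."""


class ToyOverlapLikelihood(ToyLikelihoodBase):
    """Add-one-smoothed unigram model that boosts words already present in the context."""

    def __init__(self, context_boost=5.0):
        self.context_boost = context_boost
        self.counts = Counter()

    def fit(self, pairs):
        for _context, utterance in pairs:
            self.counts.update(utterance.lower().split())
        return self

    def log_likelihood(self, context, utterance):
        context_words = set(context.lower().split())
        utterance_words = utterance.lower().split()
        vocab = set(self.counts) | context_words | set(utterance_words)

        def weight(word):
            # Smoothed training count, plus a bonus if the word occurs in the context.
            bonus = self.context_boost if word in context_words else 0.0
            return self.counts[word] + 1.0 + bonus

        total = sum(weight(w) for w in vocab)
        return sum(math.log(weight(w) / total) for w in utterance_words)


model = ToyOverlapLikelihood().fit([("what about the lien", "the lien was perfected")])
on_topic = model.log_likelihood("did the lien expire", "the lien was perfected")
off_topic = model.log_likelihood("may i ask about the jury", "the lien was perfected")
print(on_topic > off_topic)
```

In this toy setup the same utterance is more likely under a context that shares its vocabulary. That is the kind of signal a context-conditioned likelihood provides.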

Because this demo loads Gemma-2B through HuggingFace, we need to provide an authentication token to access the model.

In [5]:
gemma_likelihood_model = \
    GemmaLikelihoodModel(
        hf_token = "TODO: add your HuggingFace authentication token",
        model_id = "google/gemma-2b", 
        train_config = DEFAULT_TRAIN_CONFIG,
        bnb_config = DEFAULT_BNB_CONFIG,
        lora_config = DEFAULT_LORA_CONFIG,
    )

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

We use the default configs and parameters imported above for fine-tuning. You can override them by defining your own configs and passing them to `GemmaLikelihoodModel`.

Now we can define our redirection model, providing the initialized `gemma_likelihood_model` as its `LikelihoodModel`. The `redirection_attribute_name` is the name of the metadata field in the corpus where the redirection scores are saved.

You can also define your own `actual_context_selector` and `reference_context_selector` to control which contexts are compared. Each function takes an utterance as input and returns the actual or reference context for that utterance. By default, we use the immediate contexts described in our paper.
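As a rough illustration of what a context selector does, the sketch below walks back through reply links and returns a window of preceding turns. It uses a hypothetical `ToyUtterance` dataclass rather than ConvoKit objects, and the function name `preceding_turns`, its `k` parameter, and its return type are assumptions. The selectors expected by `RedirectionModel` may have a different signature, and this is not the context definition from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyUtterance:
    """Hypothetical minimal utterance; real ConvoKit utterances carry more fields."""
    speaker: str
    text: str
    reply_to: Optional["ToyUtterance"] = None


def preceding_turns(utt: ToyUtterance, k: int = 2) -> List[ToyUtterance]:
    """Return up to k utterances before `utt`, oldest first, by following reply links."""
    context = []
    current = utt.reply_to
    while current is not None and len(context) < k:
        context.append(current)
        current = current.reply_to
    return context[::-1]


u1 = ToyUtterance("lawyer", "The lien had not expired.")
u2 = ToyUtterance("justice", "Did that destroy their right to perfect it?", reply_to=u1)
u3 = ToyUtterance("lawyer", "No, Your Honor.", reply_to=u2)

window = preceding_turns(u3, k=2)
print([u.speaker for u in window])
```

A pair of selectors of this kind, each choosing a different window, would give the likelihood model two contexts to compare for the same utterance.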

In [6]:
redirection_model = \
    RedirectionModel(
        likelihood_model = gemma_likelihood_model,
        redirection_attribute_name = "redirection",
#         actual_context_selector = <YOUR OWN ACTUAL CONTEXT SELECTOR>, 
#         reference_context_selector = <YOUR OWN REFERENCE CONTEXT SELECTOR>,
    )

Now we can call the `fit` method to fine-tune our model on a subset of the conversations in the corpus. We use a selector function to fine-tune only on the `train` subset of our data. Alternatively, if you have already saved a fine-tuned model, you can load it into memory with `load_from_disk`.

In [7]:
# redirection_model.fit(corpus, selector=lambda convo: convo.meta["train"] == True)
redirection_model.load_from_disk('redirection_model')
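The selectors passed to `fit` and `transform` rely on boolean `train` and `test` flags in each conversation's metadata. How those flags were assigned is not shown in this notebook. The sketch below is one possible approach, a deterministic hash-based split, and is an assumption. The helper `assign_split`, the 20% test fraction, and the conversation ids are hypothetical.

```python
import hashlib


def assign_split(convo_id: str, test_fraction: float = 0.2) -> dict:
    """Map a conversation id to boolean train/test flags, deterministically."""
    digest = hashlib.sha256(convo_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # value in [0, 1]
    is_test = bucket < test_fraction
    return {"train": not is_test, "test": is_test}


# Hypothetical conversation ids; with a real corpus the flags would be
# written into each conversation's metadata before calling fit/transform.
splits = {cid: assign_split(cid) for cid in ["case_a", "case_b", "case_c"]}
for cid, flags in splits.items():
    print(cid, flags)
```

Hashing the id keeps the split stable across runs and machines, so a model loaded from disk is evaluated on the same held-out conversations each time.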

After we have our fine-tuned model, we can then run inference on the test conversations in order to compute the redirection scores. 

In [8]:
redirection_model.transform(corpus, selector=lambda convo: convo.meta['test'] == True)

We can then call `summarize` to view examples of high- and low-redirection utterances for each speaker role.

In [9]:
redirection_model.summarize(corpus)

[JUSTICE] HIGH REDIRECTION
Alright, passed for cause.
The Court - Alright the preemptory is with the people, Mr. Gustafson, I accept that Your Honor.
Mr. Solomon, Your Honor, before the jurist one may counsel for its events.
Yes and the following proceedings were held.
Now tell me in there what -- what was said to -- to challenge any single juror who sat in that case. 

Now, let's assume that they had this lien and the time hadn't expired (Inaudible) when you took over the property.
Did that then destroy their right to perfect their lien? 

-- that I asked you.
You -- you told us what the Commission would tell a shipper.
Now, the Commissioner says you must -- you must go to the court and the court will refer back to us and then we'll determine whether it's reasonable or unreasonable.
Now, does it go on and -- and then you said, you can go to the court if we find that unreasonable.
Do they go on beyond that and tell -- tell the carrier that he may then get reparations for past unreasona

We can also run a `FightingWords` analysis to find the bigrams that distinguish high-redirection from low-redirection utterances for each speaker role.

In [10]:
from convokit import FightingWords
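
The cells that follow select utterances through a `type` metadata field with values such as `justice_high` and `justice_low`. This notebook does not show how that field is set. One plausible approach, offered here only as an assumption and not as the authors' confirmed procedure, is to rank each role's utterances by redirection score and label the `k` highest and `k` lowest. The helper `label_extremes` and the scores below are made up for illustration.

```python
def label_extremes(scores, role, k):
    """Label the k highest- and k lowest-scoring utterance ids for one role.

    `scores` maps utterance id -> redirection score.
    Returns a dict of utterance id -> '<role>_high' or '<role>_low'.
    """
    ranked = sorted(scores, key=scores.get)  # ascending by score
    if k < 1 or 2 * k > len(ranked):
        raise ValueError("k must be >= 1 and small enough that the groups do not overlap")
    labels = {utt_id: f"{role}_low" for utt_id in ranked[:k]}
    labels.update({utt_id: f"{role}_high" for utt_id in ranked[-k:]})
    return labels


# Made-up scores for illustration only
justice_scores = {"u1": -0.8, "u2": 0.1, "u3": 1.4, "u4": -0.2, "u5": 0.9}
labels = label_extremes(justice_scores, role="justice", k=2)
print(labels)
```

Utterances in the middle of the ranking get no label, so they would be excluded from both comparison classes.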

Here we first show phrasings indicative of low redirection from justices.

In [11]:
fw_justice = FightingWords(ngram_range=(2,2))
class1 = 'justice_high'
class2 = 'justice_low'
fw_justice.fit(corpus, class1_func=lambda utt: utt.meta['type'] == class1, 
               class2_func=lambda utt: utt.meta['type'] == class2)
justice = fw_justice.summarize(corpus, plot=False, class1_name=class1, class2_name=class2)
justice.head(20)

Initializing default CountVectorizer with ngram_range (2, 2)... Done.
class1_func returned 2500 valid corpus components. class2_func returned 2500 valid corpus components.
Vocab size is 1192
Comparing language...
ngram zscores computed.


| ngram | z-score | class |
|---|---|---|
| voice overlap | -13.635868 | justice_low |
| all right | -5.982801 | justice_low |
| that right | -5.361657 | justice_low |
| what you | -4.07298 | justice_low |
| your honor | -4.027127 | justice_low |
| that is | -3.966338 | justice_low |
| no no | -3.509171 | justice_low |
| want to | -3.477982 | justice_low |
| you may | -3.442936 | justice_low |
| of their | -3.441729 | justice_low |
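
For intuition about the z-scores in the table above, the sketch below computes a z-scored log-odds ratio with a Dirichlet prior, the statistic from Monroe et al. (2008) on which the Fighting Words method is based. It is a simplified illustration with a uniform prior and made-up counts. The function `log_odds_zscores` and its `prior` parameter are assumptions, and ConvoKit's implementation may differ in its details.

```python
import math
from collections import Counter


def log_odds_zscores(counts1, counts2, prior=0.1):
    """z-scored log-odds ratio per n-gram, using a uniform Dirichlet prior.

    Positive values lean toward counts1 (class 1), negative toward counts2 (class 2).
    Assumes the combined vocabulary has at least two n-grams.
    """
    vocab = set(counts1) | set(counts2)
    n1, n2 = sum(counts1.values()), sum(counts2.values())
    a0 = prior * len(vocab)  # total prior mass over the vocabulary
    zscores = {}
    for ngram in vocab:
        y1, y2 = counts1.get(ngram, 0), counts2.get(ngram, 0)
        log_odds1 = math.log((y1 + prior) / (n1 + a0 - y1 - prior))
        log_odds2 = math.log((y2 + prior) / (n2 + a0 - y2 - prior))
        variance = 1.0 / (y1 + prior) + 1.0 / (y2 + prior)
        zscores[ngram] = (log_odds1 - log_odds2) / math.sqrt(variance)
    return zscores


# Made-up bigram counts for two classes
high = Counter({"ask you": 8, "your honor": 2, "the first": 5})
low = Counter({"ask you": 1, "your honor": 9, "all right": 5})
z = log_odds_zscores(high, low)
for ngram, score in sorted(z.items(), key=lambda item: item[1]):
    print(f"{ngram:12s} {score:+.2f}")
```

Under this sign convention, n-grams associated with the second class get negative scores, which matches the tables here: `justice_low` and `lawyer_low` entries are negative and the `_high` entries are positive.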


Here we show phrasings indicative of high redirection from justices.

In [12]:
justice.tail(20)[::-1]

| ngram | z-score | class |
|---|---|---|
| ask you | 3.623118 | justice_high |
| may ask | 3.537771 | justice_high |
| court of | 3.104942 | justice_high |
| the first | 2.924341 | justice_high |
| that you | 2.609062 | justice_high |
| if he | 2.536161 | justice_high |
| the union | 2.489816 | justice_high |
| under the | 2.485726 | justice_high |
| as understand | 2.477095 | justice_high |
| as to | 2.458816 | justice_high |


We can perform the corresponding analysis for lawyers as well.

In [13]:
fw_lawyer = FightingWords(ngram_range=(2,2))
class1 = 'lawyer_high'
class2 = 'lawyer_low'
fw_lawyer.fit(corpus, class1_func=lambda utt: utt.meta['type'] == class1,
              class2_func=lambda utt: utt.meta['type'] == class2)
lawyer = fw_lawyer.summarize(corpus, plot=False, class1_name=class1, class2_name=class2)
lawyer.head(20)

Initializing default CountVectorizer with ngram_range (2, 2)... Done.
class1_func returned 2500 valid corpus components. class2_func returned 2500 valid corpus components.
Vocab size is 4660
Comparing language...
ngram zscores computed.


| ngram | z-score | class |
|---|---|---|
| yes sir | -10.526824 | lawyer_low |
| your honor | -6.915489 | lawyer_low |
| yes your | -5.612426 | lawyer_low |
| trial court | -4.789453 | lawyer_low |
| the association | -4.153246 | lawyer_low |
| it for | -4.082037 | lawyer_low |
| oh yes | -3.773727 | lawyer_low |
| the purpose | -3.64175 | lawyer_low |
| collective bargaining | -3.569423 | lawyer_low |
| it was | -3.555451 | lawyer_low |


In [14]:
lawyer.tail(20)[::-1]

| ngram | z-score | class |
|---|---|---|
| the petitioner | 4.977367 | lawyer_high |
| the federal | 4.449306 | lawyer_high |
| this court | 4.232517 | lawyer_high |
| the fourteenth | 3.99746 | lawyer_high |
| where the | 3.96237 | lawyer_high |
| seems to | 3.884303 | lawyer_high |
| fourteenth amendment | 3.779286 | lawyer_high |
| the jones | 3.61385 | lawyer_high |
| of goods | 3.574844 | lawyer_high |
| it seems | 3.485075 | lawyer_high |
