# VQA Rephrase

Here we try to rephrase some of the questions in simple manner

Notes:
1. It seems that we can only rephrase a portion of these questions...
  - "Is their hair long or short?" (even though answer is "other", we can't quite rephrase this)
  
Initial thoughts on rules:
1. Let's grab "What ... is" or "What ... does" or "What ... are", "What .... do" -- grab the "AUX" verb from the POS tagging, use it as the first word of the question, capitalize it.
2. "Is/Are/Does/Do NOUN ...", so we need to find either a noun or a pronoun right after moving "AUX" verb. What does this NOUN look like? It can also be "Is/Are/Does/Do ... VERB NOUN", then we need to know if the verb is transitive or not.
  - We divide this into three situations and conquer
  - Situation 1: "AUX ..." no verb after AUX, then we directly attach the answer! Is the bowl ANSWER (brown)? is the name of this Inn hobo inn? 
  - Situation 1a: "AUX PREP PHRASE" (AUX + propositional phrase) like "What kind of vehicle is on the left?" --> "Is on the left?", then instead of adding answer to the end, we add it to the "Is there a truck on the left?". Similarly "Is the lamp post ANSWER(yellow) on the left side?"
  - Situation 2: "AUX ... VERB ..." with a verb
  - Situation 2a: verb is transitive, we directly add answer to it.
  - Situation 2b: verb is intransitive
3. Adding the answer
  - "AUX DET", then we just append the answer
  - "AUX PRON [ANS]" or just "AUX [ANS] PP" if the answer if NOUN and singular, we need to add "a" or "an" to it, plural nouns are fine

In [1]:
import spacy

In [2]:
nlp = spacy.load("en_core_web_sm")

In [56]:
from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()

def is_noun_plural(word):
    lemma = wnl.lemmatize(word, 'n')
    plural = True if word is not lemma else False
    return plural, lemma

In [4]:
isplural("apples")

(True, 'apple')

In [5]:
from prettytable import PrettyTable

In [6]:
def display(data, ranges, what_only=True):
    x = PrettyTable()
    x.field_names = ["Question", "Answer1", "Answer2"]
    for idx in ranges:
        if 'What' in data[idx]['question']:
            x.add_row([data[idx]['question'], data[idx]['answer1'], data[idx]['answer2']])
    print(x)

In [7]:
def search(data, search_word, limit=5):
    x = PrettyTable()
    cnt = 0
    for d in data:
        if search_word in d['question']:
            x.add_row([d['question'], d['answer1'], d['answer2']])
            cnt += 1
        if cnt == limit:
            break
    print(x)

In [8]:
import json
train_data = json.load(open('./data/vqa/pragmatic_other_train.json'))
val_data = json.load(open('./data/vqa/pragmatic_other_val.json'))

In [53]:
train_data[0]

{'question': 'What type of fruit is in the bottom right corner?',
 'answer1': 'apples',
 'image1': 472405,
 'answer2': 'orange',
 'image2': 258073,
 'answer_type1': 'other',
 'answer_type2': 'other',
 'question_type': 'what type of'}

In [54]:
train_data[1]

{'question': 'Is their hair long or short?',
 'answer1': 'short',
 'image1': 520590,
 'answer2': 'long',
 'image2': 155268,
 'answer_type1': 'other',
 'answer_type2': 'other',
 'question_type': 'is'}

In [84]:
display(train_data, range(60))

+----------------------------------------------------+------------------+-----------------+
|                      Question                      |     Answer1      |     Answer2     |
+----------------------------------------------------+------------------+-----------------+
| What type of fruit is in the bottom right corner?  |      apples      |      orange     |
|             What type of food is this?             |     sandwich     |      pizza      |
|              What color is the plate?              |  blue and white  |      white      |
|   What type of meat do you see in the sandwich?    |     chicken      |       ham       |
|    What color is the spot below the cat's nose?    |      black       |      white      |
|       What does the photo say at the bottom?       |    dutchsimba    |     nothing     |
|              What color is the bowl?               |      brown       |      white      |
|          What does it say on the ground?           |     no entry     |      c

In [98]:
search(train_data, 'How', limit=15)

+---------------------------------------------------------------+--------------+-----------+
|                            Field 1                            |   Field 2    |  Field 3  |
+---------------------------------------------------------------+--------------+-----------+
|                   How supplied these banana?                  |     dole     |  chiquita |
|                      How is the weather?                      |    cloudy    |   sunny   |
|                  How tall are the buildings?                  |     very     | very tall |
|              How does the dog carry his Frisbee?              | in his mouth |   mouth   |
| How would you describe the pattern of the little girls dress? |   tye dye    |    dots   |
|          How far does the water come up on the bears?         |    chest     |   ankles  |
|                   How bright is the laptop?                   | very bright  |   bright  |
|                  How is the mouth of the man?                 |    w

In [96]:
search(train_data, 'give', limit=20)

+-----------------------------------------------------------------+------------------------+-----------------------------+
|                             Field 1                             |        Field 2         |           Field 3           |
+-----------------------------------------------------------------+------------------------+-----------------------------+
|                     What is given off light?                    |        lantern         |             lamp            |
|     What does the sign above the walkway give directions to?    |        platform        | street league skateboarding |
|      What color glow do the large recessed lights give off?     |         yellow         |            purple           |
|                  What vehicle gives this view?                  |        airplane        |             bike            |
|              What toy has the elephant been given?              |          ball          |            tires            |
|               

In [9]:
def show_pos(sent):
    print(sent)
    print(" ".join([token.pos_ for token in nlp(sent)]))

In [30]:
show_pos("What color is the lamp post on the left side?")

What color is the lamp post on the left side?
DET NOUN AUX DET NOUN NOUN ADP DET ADJ NOUN PUNCT


In [31]:
show_pos("What type of food is this?")

What type of food is this?
DET NOUN ADP NOUN AUX DET PUNCT


In [32]:
show_pos("What does it say on the ground?")

What does it say on the ground?
PRON AUX PRON VERB ADP DET NOUN PUNCT


In [73]:
show_pos("What kind of vehicle is on the left?")

What kind of vehicle is on the left?
DET NOUN ADP NOUN AUX ADP DET NOUN PUNCT


In [33]:
show_pos("What type of meat do you see in the sandwich?")

What type of meat do you see in the sandwich?
DET NOUN ADP NOUN AUX PRON VERB ADP DET NOUN PUNCT


In [34]:
show_pos("What kind of bike is this person riding?")

What kind of bike is this person riding?
DET NOUN ADP NOUN AUX DET NOUN VERB PUNCT


In [51]:
show_pos("What is covering the ground?")

What is covering the ground?
PRON AUX VERB DET NOUN PUNCT


In [74]:
show_pos("What are the people doing?")

What are the people doing?
PRON AUX DET NOUN VERB PUNCT


In [82]:
show_pos("What is the hot dog sitting on top of?")

What is the hot dog sitting on top of?
PRON AUX DET ADJ NOUN VERB ADP NOUN ADP PUNCT


In [85]:
show_pos("What is between the elephant?")

What is between the elephant?
PRON AUX ADP DET NOUN PUNCT


In [99]:
show_pos("What is given off light?")

What is given off light?
PRON AUX VERB ADP NOUN PUNCT


In [100]:
show_pos("What did the girl give to the man?")

What did the girl give to the man? 
PRON AUX DET NOUN VERB ADP DET NOUN PUNCT


In [102]:
show_pos("What color is the bowl with a handle?")

What color is the bowl with a handle?
DET NOUN AUX DET NOUN ADP DET NOUN PUNCT


In [57]:
is_noun_plural("apples")

(True, 'apple')

In [59]:
is_noun_plural("klm")

(False, 'klm')

In [10]:
def check_verb(token):
    """Check verb type given spacy token"""
    if token.pos_ == 'VERB':
        indirect_object = False
        direct_object = False
        for item in token.children:
            if(item.dep_ == "iobj" or item.dep_ == "pobj"):
                indirect_object = True
            if (item.dep_ == "dobj" or item.dep_ == "dative"):
                direct_object = True
        if indirect_object and direct_object:
            return 'DITRANVERB'
        elif direct_object and not indirect_object:
            return 'TRANVERB'
        elif not direct_object and not indirect_object:
            return 'INTRANVERB'
        else:
            return 'VERB'
    else:
        return token.pos_

In [60]:
from copy import copy

def join_cap_sent(list_of_words, answer):
    # We unify AUX verb with answer plurality here
    # because different answer has different plurality!
    aux_verb = unify_answer(answer, list_of_words[0])
    list_of_words[0] = aux_verb
    
    return " ".join(list_of_words).capitalize()

def check_pron_or_noun_before_verb(pos_start_from_aux):
    pron_idx, noun_idx = -1, -1
    if 'PRON' in pos_start_from_aux:
        pron_idx = pos_start_from_aux.index("PRON")
    if 'NOUN' in pos_start_from_aux:
        noun_idx = pos_start_from_aux.index("NOUN")
        
    pron_or_noun_idx = min(noun_idx, pron_idx)
    
    verb_idx = pos_start_from_aux.index("VERB")
    return pron_or_noun_idx < verb_idx

def check_adp_immediately_after_verb(pos_start_from_aux):
    # this can suggest transitivity
    # "sitting on top of" vs "riding" / "say"
    adp_idx = -1
    if 'ADP' in pos_start_from_aux:
        adp_idx = pos_start_from_aux.index("ADP")
    verb_idx = pos_start_from_aux.index("VERB")
    return adp_idx - verb_idx == 1

def get_right_most_idx(pos_start_from_aux, pos_tag):
    return next(i for i in reversed(range(len(pos_start_from_aux))) if pos_start_from_aux[i] == pos_tag)

aux_verb_get_dual = {
    "is": "are",
    "are": "is",
    "does": "do",
    "do": "does"
}

def unify_answer(answer, aux_verb):
    noun_plural, _ = is_noun_plural(answer)
    if noun_plural and aux_verb in {'is', 'does'}:
        aux_verb = aux_verb_get_dual[aux_verb]
    if not noun_plural and aux_verb in {'are', 'do'}:
        aux_verb = aux_verb_get_dual[aux_verb]
    return aux_verb
    
def rephrase(question, answer1, answer2):
    """
    Return (pragmatic_question1, pragmatic_question2) or (None, None)
    """
    nlp_sent = nlp(question)
    
    # Filter out questions that do not start with "WHAT"
    if nlp_sent[0].text != 'What' and "What's" in question:
        return None, None
    
    pos_per_tokens = [token.pos_ for token in nlp_sent]
    
    # 0). No AUX found (do nothing)
    if 'AUX' not in pos_per_tokens:
        return None, None
    
    aux_idx = pos_per_tokens.index("AUX")
    sent_start_from_aux = [token.text for token in nlp_sent][aux_idx:]
    pos_start_from_aux = pos_per_tokens[aux_idx:]
    
    # Then we go into branches 
    
    # No verb situation
    if 'VERB' not in pos_start_from_aux:
        # 2). "AUX ADP...?" -> "AUX [ANSWER] ADP ...?"
        # This assumes no NOUN, we insert answer as NOUN
        if 'ADP' in pos_start_from_aux:
            return join_cap_sent([sent_start_from_aux[0], answer1] + sent_start_from_aux[1:], answer1), join_cap_sent([sent_start_from_aux[0], answer2] + sent_start_from_aux[1:], answer2)
        # 1). "AUX..NOUN?" -> "AUX..NOUN [ANSWER]"
        # directly append to the last part before PUNCT
        elif 'NOUN' in pos_start_from_aux:
            return join_cap_sent(sent_start_from_aux[:-1] + [answer1, '?'], answer1), join_cap_sent(sent_start_from_aux[:-1] + [answer2, '?'], answer2)
    else:
        # 3). "AUX..VERB..?"
        # a). "AUX PRON/NOUN doing" -> "AUX PRON/NOUN [ANSWER]...?" (special case)
        verb_idx = pos_per_tokens.index('VERB')
        VERB_FORM = check_verb(nlp_sent[verb_idx])

        if 'doing' in sent_start_from_aux:
            # replace "doing" with the answer
            new_sent_1 = copy(sent_start_from_aux)
            new_sent_2 = copy(sent_start_from_aux)
            new_sent_1[new_sent_1.index('doing')] = answer1
            new_sent_2[new_sent_2.index('doing')] = answer2
            return join_cap_sent(new_sent_1, answer1), join_cap_sent(new_sent_2, answer2)
        
        # d). "AUX ... PRON/NOUN VERB ADP..ADP..?" -> "AUX PRON/NOUN VERB ADP...ADP [ANSWER] ...?"
        elif check_adp_immediately_after_verb(pos_start_from_aux) and check_pron_or_noun_before_verb(pos_start_from_aux) and VERB_FORM == 'INTRANVERB':
            right_most_adp_idx = get_right_most_idx(pos_start_from_aux, 'ADP')
            return join_cap_sent(sent_start_from_aux[:right_most_adp_idx+1] + [answer1] + sent_start_from_aux[right_most_adp_idx+1:], answer1), \
                        join_cap_sent(sent_start_from_aux[:right_most_adp_idx+1] + [answer2] + sent_start_from_aux[right_most_adp_idx+1:], answer2)
        # c). "AUX VERB ...?" -> "AUX [ANSWER] VERB ...?"
        elif pos_start_from_aux.index("VERB") - pos_start_from_aux.index("AUX") == 1:
            verb_idx = pos_start_from_aux.index("VERB")
            # insert answer before the verb!
            return join_cap_sent(sent_start_from_aux[:verb_idx] + [answer1] + sent_start_from_aux[verb_idx:], answer1), \
                    join_cap_sent(sent_start_from_aux[:verb_idx] + [answer2] + sent_start_from_aux[verb_idx:], answer2)
        # b). "AUX ... PRON/NOUN VERB...?" -> "AUX PRON/NOUN VERB [ANSWER] ...?"
        elif check_pron_or_noun_before_verb(pos_start_from_aux):
            verb_idx = pos_start_from_aux.index("VERB")
            # insert answer right after the verb
            return join_cap_sent(sent_start_from_aux[:verb_idx+1] + [answer1] + sent_start_from_aux[verb_idx+1:], answer1), \
                    join_cap_sent(sent_start_from_aux[:verb_idx+1] + [answer2] + sent_start_from_aux[verb_idx+1:], answer2)

    return None, None

In [40]:
show_pos("do you see in the sandwich?")

do you see in the sandwich?
AUX PRON VERB ADP DET NOUN PUNCT


In [43]:
show_pos("is the hot dog sitting on top of?")

is the hot dog sitting on top of?
AUX DET ADJ NOUN VERB ADP NOUN ADP PUNCT


In [46]:
nlp_sent = nlp("What do you see in the sandwich?")
check_verb(nlp_sent[3])

'TRANVERB'

In [47]:
nlp_sent = nlp("What is the hot dog sitting on top of?")
check_verb(nlp_sent[5])

'INTRANVERB'

In [147]:
rephrase("What color is the bowl?", "brown", "blue")

('Is the bowl brown ?', 'Is the bowl blue ?')

In [148]:
rephrase("What kind of vehicle is on the left?", "truck", "train")

('Is truck on the left ?', 'Is train on the left ?')

In [149]:
rephrase("What is in the dogs mouth?", "frisbee", "teddy bear")

('Is frisbee in the dogs mouth ?', 'Is teddy bear in the dogs mouth ?')

In [150]:
rephrase("What are the people doing?", "skiing", "snowboarding")

('Are the people skiing ?', 'Are the people snowboarding ?')

In [151]:
rephrase("What does it say on the ground?", "no entry", "clear")

('Does it say no entry on the ground ?', 'Does it say clear on the ground ?')

In [152]:
rephrase("What kind of bike is this person riding?", "mountain bike", "motorcycle")

('Is this person riding mountain bike ?', 'Is this person riding motorcycle ?')

In [166]:
rephrase("What is covering the ground?", 'snow', 'sand')

('Is snow covering the ground ?', 'Is sand covering the ground ?')

In [61]:
rephrase("What is the hot dog sitting on top of?", "paper", 'bun')

('Is the hot dog sitting on top of paper ?',
 'Is the hot dog sitting on top of bun ?')

In [50]:
rephrase("What kinds of meat is on this sandwich?", 'beef', 'chicken')

('Is beef on this sandwich ?', 'Is chicken on this sandwich ?')

In [62]:
rephrase("What letters are on the airplane?", 'klm', 'jet blue')

('Is klm on the airplane ?', 'Is jet blue on the airplane ?')

In [52]:
rephrase("What type of meat do you see in the sandwich?", "ham", "chicken")

('Do you see ham in the sandwich ?', 'Do you see chicken in the sandwich ?')

In [63]:
rephrase('What type of fruit is in the bottom right corner?', 'apples', 'orange')

('Are apples in the bottom right corner ?',
 'Is orange in the bottom right corner ?')

In [49]:
sent = 'What does it say on the ground?'
check_verb(nlp(sent)[3])

'TRANVERB'

In [83]:
sent = 'What is the hot dog sitting on top of?'
check_verb(nlp(sent)[5])

'INTRANVERB'

In [101]:
sent = 'What did the girl give to the man?'
check_verb(nlp(sent)[4])

'TRANVERB'

In [145]:
sent = 'What kind of bike is this person riding?'
check_verb(nlp(sent)[-2])

'INTRANVERB'

In [142]:
sent = 'The girl is riding a bicycle'
check_verb(nlp(sent)[3])

'TRANVERB'

In [61]:
show_pos("Is it a crow?")

Is it a crow?
AUX PRON DET NOUN PUNCT


In [62]:
show_pos("Is on the left?")

Is on the left?
AUX ADP DET NOUN PUNCT


In [77]:
nlp("yellow")[-1].pos_

'PROPN'

In [75]:
nlp("white")[-1].pos_

'PROPN'

In [76]:
nlp("truck")[-1].pos_

'NOUN'

In [80]:
nlp("evening")[-1].pos_

'NOUN'

### Check Stats

Now we run the transformation and see how many came up non-empty

In [13]:
from tqdm import tqdm

In [19]:
non_empty_train = 0
for d in tqdm(train_data):
    pq1, pq2 = rephrase(d['question'], d['answer1'], d['answer2'])
    if pq1 is not None:
        non_empty_train += 1

100%|██████████| 91041/91041 [17:44<00:00, 85.51it/s] 


In [20]:
non_empty_train

66416

In [21]:
number_of_what_qs = 0
for d in train_data:
    if 'What' in d['question']:
        number_of_what_qs += 1
number_of_what_qs

75291

In [14]:
non_empty_val = 0
for d in tqdm(val_data):
    pq1, pq2 = rephrase(d['question'], d['answer1'], d['answer2'])
    if pq1 is not None:
        non_empty_val += 1

100%|██████████| 42623/42623 [08:22<00:00, 84.88it/s] 


In [15]:
non_empty_val

30791

In [16]:
number_of_what_qs = 0
for d in val_data:
    if 'What' in d['question']:
        number_of_what_qs += 1

In [17]:
number_of_what_qs

35183

In [18]:
len(train_data)

91041

In [22]:
len(val_data)

42623

## Load Data

In [70]:
import json
train_what_rephrased = json.load(open("./data/vqa/pragmatic_other_train_what_rephrased.json"))
val_what_rephrased = json.load(open("./data/vqa/pragmatic_other_val_what_rephrased.json"))

In [71]:
def display_rephrased(data, ranges):
    """
    This only shows rephrased with original, and answers
    """
    for idx in ranges:
        print("Pragmatic Q3:", data[idx]['question'])
        print("Literal Q1:", data[idx]['literal_question1'])
        print("Literal Q2:", data[idx]['literal_question2'])
        print()

In [72]:
display_rephrased(train_what_rephrased, range(40))

Pragmatic Q3: What type of fruit is in the bottom right corner?
Literal Q1: Are apples in the bottom right corner ?
Literal Q2: Is orange in the bottom right corner ?

Pragmatic Q3: What color is the plate?
Literal Q1: Is the plate blue and white ?
Literal Q2: Is the plate white ?

Pragmatic Q3: What type of meat do you see in the sandwich?
Literal Q1: Does you see chicken in the sandwich ?
Literal Q2: Does you see ham in the sandwich ?

Pragmatic Q3: What color is the spot below the cat's nose?
Literal Q1: Is black the spot below the cat 's nose ?
Literal Q2: Is white the spot below the cat 's nose ?

Pragmatic Q3: What does the photo say at the bottom?
Literal Q1: Does the photo say dutchsimba at the bottom ?
Literal Q2: Does the photo say nothing at the bottom ?

Pragmatic Q3: What color is the bowl?
Literal Q1: Is the bowl brown ?
Literal Q2: Is the bowl white ?

Pragmatic Q3: What does it say on the ground?
Literal Q1: Does it say no entry on the ground ?
Literal Q2: Does it say c