In [42]:
import dspy
import torch

# Use a local Mistral model served by Ollama as the default language model.
lm = dspy.OllamaLocal(model='mistral', temperature=0.2)
dspy.settings.configure(lm=lm)


In [43]:

print(torch.cuda.is_available())  # Should return True if GPU is accessible


True


In [44]:

if torch.cuda.is_available():
    print(f"CUDA is available. Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA is not available. Using CPU.")


CUDA is available. Using GPU: NVIDIA GeForce RTX 4060 Laptop GPU
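
The CUDA check above confirms that PyTorch can see the GPU, but `dspy.OllamaLocal` talks to a separate Ollama server process, so it may also be worth confirming that the server itself is reachable. A minimal sketch, assuming Ollama's default address `http://localhost:11434` and its `/api/tags` model-listing endpoint; the helper name `ollama_is_reachable` is hypothetical and not part of this notebook.

```python
import urllib.request


def ollama_is_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP GET to <base_url>/api/tags answers with 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # Refused connections, timeouts and HTTP errors are OSError subclasses;
        # ValueError covers a malformed URL.
        return False


print(ollama_is_reachable())
```

A `False` here would suggest that later `lm` calls fail for reasons unrelated to the GPU.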


In [45]:
sentences_data_test = [
    ("Bitcoin is for $7,094 this morning, which CoinDesk says.", 0),
    ("Bitcoin goes for $7,094 this morning, according to CoinDesk.", 1),
    ("The effect of widespread dud targets two face up attack position monsters on the field.", 0),
    ("The effect of 'widespread dud' targets two face-up attack position monsters on the field.", 1),
    ("tax on sales of stores for non residents are set at 21% for 2014 and 20% in 2015 payable on sales tentatively earned from the difference of the property value some time of purchase (price differences according to working time) and theyear to which sale couples (sales costs), based on the approved annual on the base approved by law).", 0),
    ("Capital Gains tax on the sale of properties for non-residents is set at 21% for 2014 and 20% in 2015 payable on profits earned on the difference of the property value between the year of purchase (purchase price plus costs) and the year of sale (sales price minus costs), based on the approved annual percentage increase on the base value approved by law.", 1),
    ("Much many brands and sellers still in the market.", 0),
    ("Many brands and sellers still in the market.", 1),
    ("this is is the latest Maintenance release of Samba 3.6", 0),
    ("This is is the latest maintenance release of Samba 3.6.", 1),
    ("Fairy Or Not, I'm the Godmother: no just look, but my outfit for taking the part as godmother.", 0),
    ("Fairy Or Not, I'm the Godmother: Not just a look, but my outfit for taking on the role as godmother.", 1),
    ("Watcch as this Dodge Challenger Hellcat gets smoked by a Tesla Model S - with the drag strip.", 0),
    ("Watch as this Dodge Challenger Hellcat gets smoked by a Tesla Model S at the drag strip.", 1),
    ("Momover, these devices have been proven to help consumers during another company his information.", 0),
    ("Moreover, these devices are proven to help consumers while another company that information.", 1),
    ("Ever cloud has a silver lining and it’s just possible that we were beaten before the off as the first three home came from stalls eight to 12, while we were drawn in berth two which meant that our fellow was forced to race in the middle the course while the leader kicked on on the stands’ high rail.", 0),
    ("Every cloud has a silver lining and it’s just possible that we were beaten before the off as the first three home came from stalls eight to 12, while we were drawn in berth two which meant that our fellow was forced to race in the middle of the course while the leader kicked on on the stands’ rail.", 1),
    ("Worthless involved's supporting for the movement.", 0),
    ("Get involved and help the movement!", 1)
]

converted_data = [(text, "No" if label == 0 else "Yes") for text, label in sentences_data_test]
print(converted_data)

[('Bitcoin is for $7,094 this morning, which CoinDesk says.', 'No'), ('Bitcoin goes for $7,094 this morning, according to CoinDesk.', 'Yes'), ('The effect of widespread dud targets two face up attack position monsters on the field.', 'No'), ("The effect of 'widespread dud' targets two face-up attack position monsters on the field.", 'Yes'), ('tax on sales of stores for non residents are set at 21% for 2014 and 20% in 2015 payable on sales tentatively earned from the difference of the property value some time of purchase (price differences according to working time) and theyear to which sale couples (sales costs), based on the approved annual on the base approved by law).', 'No'), ('Capital Gains tax on the sale of properties for non-residents is set at 21% for 2014 and 20% in 2015 payable on profits earned on the difference of the property value between the year of purchase (purchase price plus costs) and the year of sale (sales price minus costs), based on the approved annual percenta

In [46]:
dataset = []
for sent in converted_data:
    dataset.append(dspy.Example(text=sent[0], answer=sent[1]).with_inputs("text"))
print(dataset[:3])


[Example({'text': 'Bitcoin is for $7,094 this morning, which CoinDesk says.', 'answer': 'No'}) (input_keys={'text'}), Example({'text': 'Bitcoin goes for $7,094 this morning, according to CoinDesk.', 'answer': 'Yes'}) (input_keys={'text'}), Example({'text': 'The effect of widespread dud targets two face up attack position monsters on the field.', 'answer': 'No'}) (input_keys={'text'})]
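
The list alternates an ungrammatical sentence with its corrected version, and the same 20 examples are later used both to compile and to evaluate the program. One way to hold some data out without separating a sentence from its twin is to split by pair. The sketch below uses plain tuples instead of `dspy.Example` objects; `split_pairs`, the choice of three held-out pairs, and the seed are illustrative assumptions, not something this notebook does.

```python
import random


def split_pairs(examples, dev_pairs=3, seed=0):
    """Split a list laid out as [bad0, good0, bad1, good1, ...] pair by pair."""
    if len(examples) % 2:
        raise ValueError("expected an even number of examples (bad/good pairs)")
    pairs = [examples[i:i + 2] for i in range(0, len(examples), 2)]
    random.Random(seed).shuffle(pairs)
    dev = [ex for pair in pairs[:dev_pairs] for ex in pair]
    train = [ex for pair in pairs[dev_pairs:] for ex in pair]
    return train, dev


# Toy stand-in for `dataset`: (text, answer) tuples in bad/good order.
toy = []
for i in range(10):
    toy.append((f"bad {i}", "No"))
    toy.append((f"good {i}", "Yes"))

train, dev = split_pairs(toy)
print(len(train), len(dev))  # 14 6
```

The same function should work on the list of `dspy.Example` objects, since it only relies on list order.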


In [47]:
# `converted_data` and `dataset` were already built in In [45] and In [46];
# the identical definitions are not repeated here.
print(dataset[:3])

class GC(dspy.Signature):
    """
    You are given a sentence/text and you must classify whether it is gramatically correct
    only with a Yes or No 
    
    """
    text = dspy.InputField()
    answer = dspy.OutputField(desc="Yes or No")


class CoT(dspy.Module):
    """Chain-of-thought classifier built on the GC signature."""

    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought(GC)

    def forward(self, text):
        return self.generate_answer(text=text)

from dspy.teleprompt import BootstrapFewShotWithRandomSearch

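# Exact string match between the predicted and gold `answer` fields.
metric_EM = dspy.evaluate.answer_exact_match

# NOTE: `config` is defined here but never passed to the optimizer below, so
# BootstrapFewShotWithRandomSearch runs with its default settings (the log
# reports 16 candidate sets rather than 5).
config = dict(max_bootstrapped_demos=4, max_labeled_demos=4, num_candidate_programs=5, num_threads=4)

teleprompter = BootstrapFewShotWithRandomSearch(metric=metric_EM)

# No separate validation set is given, so the same 20 examples are used both
# for bootstrapping demonstrations and for scoring each candidate program.
compiled_grammar_judge = teleprompter.compile(CoT(), trainset=dataset)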

[Example({'text': 'Bitcoin is for $7,094 this morning, which CoinDesk says.', 'answer': 'No'}) (input_keys={'text'}), Example({'text': 'Bitcoin goes for $7,094 this morning, according to CoinDesk.', 'answer': 'Yes'}) (input_keys={'text'}), Example({'text': 'The effect of widespread dud targets two face up attack position monsters on the field.', 'answer': 'No'}) (input_keys={'text'})]
Going to sample between 1 and 4 traces per predictor.
Will attempt to train 16 candidate sets.


  0%|          | 0/20 [00:00<?, ?it/s]

Average Metric: 11 / 20  (55.0): 100%|██████████| 20/20 [00:18<00:00,  1.08it/s]


Average Metric: 11 / 20  (55.0%)
Score: 55.0 for set: [0]
New best score: 55.0 for seed -3
Scores so far: [55.0]
Best score: 55.0


Average Metric: 8 / 20  (40.0): 100%|██████████| 20/20 [00:29<00:00,  1.48s/it]


Average Metric: 8 / 20  (40.0%)
Score: 40.0 for set: [16]
Scores so far: [55.0, 40.0]
Best score: 55.0


100%|██████████| 20/20 [01:12<00:00,  3.64s/it]


Bootstrapped 3 full traces after 20 examples in round 0.


Average Metric: 0 / 20  (0.0): 100%|██████████| 20/20 [01:03<00:00,  3.18s/it]


Average Metric: 0 / 20  (0.0%)
Score: 0.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.75
Average of max per entry across top 3 scores: 0.75
Average of max per entry across top 5 scores: 0.75
Average of max per entry across top 8 scores: 0.75
Average of max per entry across top 9999 scores: 0.75


 50%|█████     | 10/20 [00:35<00:35,  3.53s/it]


Bootstrapped 4 full traces after 11 examples in round 0.


Average Metric: 1 / 20  (5.0): 100%|██████████| 20/20 [00:54<00:00,  2.74s/it] 


Average Metric: 1 / 20  (5.0%)
Score: 5.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.75
Average of max per entry across top 3 scores: 0.75
Average of max per entry across top 5 scores: 0.75
Average of max per entry across top 8 scores: 0.75
Average of max per entry across top 9999 scores: 0.75


 20%|██        | 4/20 [00:12<00:51,  3.23s/it]


Bootstrapped 2 full traces after 5 examples in round 0.


Average Metric: 9 / 20  (45.0): 100%|██████████| 20/20 [00:39<00:00,  1.95s/it]


Average Metric: 9 / 20  (45.0%)
Score: 45.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0, 45.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.8
Average of max per entry across top 5 scores: 0.8
Average of max per entry across top 8 scores: 0.8
Average of max per entry across top 9999 scores: 0.8


 50%|█████     | 10/20 [00:35<00:35,  3.51s/it]


Bootstrapped 1 full traces after 11 examples in round 0.


Average Metric: 10 / 20  (50.0): 100%|██████████| 20/20 [00:26<00:00,  1.32s/it]


Average Metric: 10 / 20  (50.0%)
Score: 50.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0, 45.0, 50.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.8
Average of max per entry across top 3 scores: 0.95
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 20%|██        | 4/20 [00:16<01:07,  4.24s/it]


Bootstrapped 2 full traces after 5 examples in round 0.


Average Metric: 6 / 20  (30.0): 100%|██████████| 20/20 [00:43<00:00,  2.20s/it]


Average Metric: 6 / 20  (30.0%)
Score: 30.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0, 45.0, 50.0, 30.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.8
Average of max per entry across top 3 scores: 0.95
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 20%|██        | 4/20 [00:14<00:58,  3.66s/it]


Bootstrapped 2 full traces after 5 examples in round 0.


Average Metric: 4 / 20  (20.0): 100%|██████████| 20/20 [00:54<00:00,  2.72s/it]


Average Metric: 4 / 20  (20.0%)
Score: 20.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0, 45.0, 50.0, 30.0, 20.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.8
Average of max per entry across top 3 scores: 0.95
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 35%|███▌      | 7/20 [00:27<00:50,  3.91s/it]


Bootstrapped 3 full traces after 8 examples in round 0.


Average Metric: 0 / 20  (0.0): 100%|██████████| 20/20 [00:57<00:00,  2.88s/it]


Average Metric: 0 / 20  (0.0%)
Score: 0.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0, 45.0, 50.0, 30.0, 20.0, 0.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.8
Average of max per entry across top 3 scores: 0.95
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 10%|█         | 2/20 [00:07<01:10,  3.90s/it]


Bootstrapped 1 full traces after 3 examples in round 0.


Average Metric: 0 / 20  (0.0): 100%|██████████| 20/20 [01:04<00:00,  3.22s/it]


Average Metric: 0 / 20  (0.0%)
Score: 0.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0, 45.0, 50.0, 30.0, 20.0, 0.0, 0.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.8
Average of max per entry across top 3 scores: 0.95
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 20%|██        | 4/20 [00:14<00:59,  3.75s/it]


Bootstrapped 3 full traces after 5 examples in round 0.


Average Metric: 0 / 20  (0.0): 100%|██████████| 20/20 [00:51<00:00,  2.60s/it]


Average Metric: 0 / 20  (0.0%)
Score: 0.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0, 45.0, 50.0, 30.0, 20.0, 0.0, 0.0, 0.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.8
Average of max per entry across top 3 scores: 0.95
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 75%|███████▌  | 15/20 [00:49<00:16,  3.33s/it]


Bootstrapped 2 full traces after 16 examples in round 0.


Average Metric: 5 / 20  (25.0): 100%|██████████| 20/20 [00:52<00:00,  2.64s/it]


Average Metric: 5 / 20  (25.0%)
Score: 25.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0, 45.0, 50.0, 30.0, 20.0, 0.0, 0.0, 0.0, 25.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.8
Average of max per entry across top 3 scores: 0.95
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 70%|███████   | 14/20 [00:51<00:22,  3.71s/it]


Bootstrapped 4 full traces after 15 examples in round 0.


Average Metric: 0 / 20  (0.0): 100%|██████████| 20/20 [00:53<00:00,  2.66s/it]


Average Metric: 0 / 20  (0.0%)
Score: 0.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0, 45.0, 50.0, 30.0, 20.0, 0.0, 0.0, 0.0, 25.0, 0.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.8
Average of max per entry across top 3 scores: 0.95
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


  5%|▌         | 1/20 [00:02<00:54,  2.86s/it]


Bootstrapped 1 full traces after 2 examples in round 0.


Average Metric: 9 / 20  (45.0): 100%|██████████| 20/20 [00:47<00:00,  2.37s/it]


Average Metric: 9 / 20  (45.0%)
Score: 45.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0, 45.0, 50.0, 30.0, 20.0, 0.0, 0.0, 0.0, 25.0, 0.0, 45.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.8
Average of max per entry across top 3 scores: 0.95
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 30%|███       | 6/20 [00:22<00:53,  3.82s/it]


Bootstrapped 4 full traces after 7 examples in round 0.


Average Metric: 1 / 20  (5.0): 100%|██████████| 20/20 [00:57<00:00,  2.87s/it] 


Average Metric: 1 / 20  (5.0%)
Score: 5.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0, 45.0, 50.0, 30.0, 20.0, 0.0, 0.0, 0.0, 25.0, 0.0, 45.0, 5.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.8
Average of max per entry across top 3 scores: 0.95
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


100%|██████████| 20/20 [01:19<00:00,  3.96s/it]


Bootstrapped 3 full traces after 20 examples in round 0.


Average Metric: 1 / 20  (5.0): 100%|██████████| 20/20 [00:55<00:00,  2.77s/it] 


Average Metric: 1 / 20  (5.0%)
Score: 5.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0, 45.0, 50.0, 30.0, 20.0, 0.0, 0.0, 0.0, 25.0, 0.0, 45.0, 5.0, 5.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.8
Average of max per entry across top 3 scores: 0.95
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 35%|███▌      | 7/20 [00:25<00:46,  3.58s/it]


Bootstrapped 3 full traces after 8 examples in round 0.


Average Metric: 0 / 20  (0.0): 100%|██████████| 20/20 [00:57<00:00,  2.85s/it]


Average Metric: 0 / 20  (0.0%)
Score: 0.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0, 45.0, 50.0, 30.0, 20.0, 0.0, 0.0, 0.0, 25.0, 0.0, 45.0, 5.0, 5.0, 0.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.8
Average of max per entry across top 3 scores: 0.95
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 10%|█         | 2/20 [00:07<01:10,  3.94s/it]


Bootstrapped 1 full traces after 3 examples in round 0.


Average Metric: 7 / 20  (35.0): 100%|██████████| 20/20 [00:27<00:00,  1.37s/it]


Average Metric: 7 / 20  (35.0%)
Score: 35.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0, 45.0, 50.0, 30.0, 20.0, 0.0, 0.0, 0.0, 25.0, 0.0, 45.0, 5.0, 5.0, 0.0, 35.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.8
Average of max per entry across top 3 scores: 0.95
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 10%|█         | 2/20 [00:06<00:54,  3.04s/it]


Bootstrapped 2 full traces after 3 examples in round 0.


Average Metric: 5 / 20  (25.0): 100%|██████████| 20/20 [00:49<00:00,  2.49s/it]

Average Metric: 5 / 20  (25.0%)
Score: 25.0 for set: [16]
Scores so far: [55.0, 40.0, 0.0, 5.0, 45.0, 50.0, 30.0, 20.0, 0.0, 0.0, 0.0, 25.0, 0.0, 45.0, 5.0, 5.0, 0.0, 35.0, 25.0]
Best score: 55.0
Average of max per entry across top 1 scores: 0.55
Average of max per entry across top 2 scores: 0.8
Average of max per entry across top 3 scores: 0.95
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0
19 candidate programs found.
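
The "Average of max per entry across top K scores" lines appear to report an oracle-style figure: take the K best-scoring candidate programs, keep for each example the best score any of them achieved, and average those maxima. That would explain how the figure reaches 1.0 while the best single program stays at 55.0. A sketch of that reading, using made-up per-example scores because the log does not show them; the function name is hypothetical.

```python
def avg_of_max_per_entry(candidates, k):
    """candidates: list of (overall_score, per_example_scores) tuples."""
    top_k = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    # Transpose so that each tuple holds one example's scores across programs.
    per_example = zip(*(subscores for _, subscores in top_k))
    best = [max(scores) for scores in per_example]
    return sum(best) / len(best)


# Hypothetical 0/1 scores for three candidate programs on four examples.
candidates = [
    (50.0, [1, 1, 0, 0]),
    (25.0, [0, 0, 1, 0]),
    (75.0, [1, 0, 1, 1]),
]
for k in (1, 2, 3):
    print(k, avg_of_max_per_entry(candidates, k))
```

With these toy numbers the best single program solves 3 of 4 examples, while the top two together cover all four.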





In [48]:
lm.inspect_history(n=20)





You are given a sentence/text and you must classify whether it is gramatically correct
    only with a Yes or No

---

Text: The effect of widespread dud targets two face up attack position monsters on the field.
Answer: No

Text: Fairy Or Not, I'm the Godmother: no just look, but my outfit for taking the part as godmother.
Answer: No

Text: Watcch as this Dodge Challenger Hellcat gets smoked by a Tesla Model S - with the drag strip.
Answer: No

Text: Many brands and sellers still in the market.
Answer: Yes

Text: Watch as this Dodge Challenger Hellcat gets smoked by a Tesla Model S at the drag strip.
Answer: Yes

Text: This is is the latest maintenance release of Samba 3.6.
Answer: Yes

Text: Worthless involved's supporting for the movement.
Answer: No

Text: Fairy Or Not, I'm the Godmother: Not just a look, but my outfit for taking on the role as godmother.
Answer: Yes

Text: Momover, these devices have been proven to help consumers during another company his information.
Answer:
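
The history above shows the instruction text, a run of labeled demonstrations, and finally the query with an empty `Answer:` slot for the model to fill. A simplified sketch of that layout follows; it is not DSPy's internal template code, `render_prompt` is a hypothetical helper, and the instruction string is paraphrased.

```python
def render_prompt(instructions, demos, query):
    """Lay out instructions, labeled demos, and an open slot for the query."""
    blocks = [f"Text: {text}\nAnswer: {answer}" for text, answer in demos]
    blocks.append(f"Text: {query}\nAnswer:")
    return instructions.strip() + "\n\n---\n\n" + "\n\n".join(blocks)


demos = [
    ("Many brands and sellers still in the market.", "Yes"),
    ("Worthless involved's supporting for the movement.", "No"),
]
prompt = render_prompt(
    "Classify whether the text is grammatically correct, answering only Yes or No.",
    demos,
    "Momover, these devices have been proven to help consumers during another company his information.",
)
print(prompt)
```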

In [49]:
class FactJudge(dspy.Signature):
    """
    You are given a premise and a hypothesis.
    You must indicate with Yes or No answer whether we can logically
    conclude the hypothesis from the premise.
    From the premise is the hypothesis validated?
    """
    premise = dspy.InputField()
    hypothesis = dspy.InputField()
    entailment = dspy.OutputField(desc="Yes or No")

judge = dspy.ChainOfThought(FactJudge)

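def factuality_metric(example, pred, trace=None):
    """LM-as-judge metric: ask `judge` whether the predicted verdict holds for the text."""
    premise = example.text

    # Turn the predicted Yes/No into a statement the judge can check.
    # Any answer other than the exact string "Yes" is treated as "No".
    if pred.answer == "Yes":
        hypothesis = "The given text is Grammatically correct"
    else:
        hypothesis = "The given text is Grammatically wrong"

    # Ask the judge whether the hypothesis follows from the premise.
    result = judge(premise=premise, hypothesis=hypothesis)

    # Return 1 only if the judge answers exactly "Yes", otherwise 0.
    return int(result.entailment == "Yes")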

from dspy.evaluate import Evaluate

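evaluate_program = Evaluate(
    devset=dataset,
    metric=factuality_metric,
    num_threads=2,
    display_progress=True,
    display_table=20,
    return_outputs=True,
)

# With return_outputs=True the call returns the overall score together with a
# list of (example, prediction, metric value) tuples.
new_data = evaluate_program(compiled_grammar_judge)

print(new_data)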

Average Metric: 14 / 20  (70.0): 100%|██████████| 20/20 [01:09<00:00,  3.47s/it]

Average Metric: 14 / 20  (70.0%)



 '0' '1']' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  df.loc[:, metric_name] = df[metric_name].apply(


Unnamed: 0,text,example_answer,rationale,pred_answer,factuality_metric
0,"Bitcoin is for $7,094 this morning, which CoinDesk says.",No,"determine if the sentence is grammatically correct. We start by identifying the subject of the sentence, which is ""Bitcoin"". The predicate is the verb phrase...",Yes,1
1,"Bitcoin goes for $7,094 this morning, according to CoinDesk.",Yes,"determine if the sentence is grammatically correct. We check the subject-verb agreement, punctuation, and proper use of articles. In this case, ""Bitcoin"" is the subject,...",Yes,1
2,The effect of widespread dud targets two face up attack position monsters on the field.,No,"determine if the sentence is grammatically correct. We first check the subject-verb agreement, but there seems to be no clear subject or verb in this...",No,1
3,The effect of 'widespread dud' targets two face-up attack position monsters on the field.,Yes,"determine if this sentence is grammatically correct. We first check for subject-verb agreement, and then look for proper use of punctuation and syntax. The subject...",No,0
4,tax on sales of stores for non residents are set at 21% for 2014 and 20% in 2015 payable on sales tentatively earned from the...,No,"determine if this text is grammatically correct. We will examine the structure of the sentence, punctuation, and use of capitalization.",No,1
5,Capital Gains tax on the sale of properties for non-residents is set at 21% for 2014 and 20% in 2015 payable on profits earned on...,Yes,"classify whether this text is grammatically correct. We will examine each clause, phrase, and word for proper grammar usage, subject-verb agreement, punctuation, and sentence structure.",Yes,1
6,Much many brands and sellers still in the market.,No,"determine if this sentence is grammatically correct. We can break down the sentence into its components: ""Much"" modifies ""many"", but ""much"" should be replaced with...",No (incorrect grammar),1
7,Many brands and sellers still in the market.,Yes,"determine if this sentence is grammatically correct. We first need to identify the subject, verb, and predicate of the sentence. The subject appears to be...","No, the sentence is not grammatically correct in its current form because it lacks a clear verb or predicate that clearly indicates an action or...",0
8,this is is the latest Maintenance release of Samba 3.6,No,"determine if the sentence is grammatically correct. We notice that there are two instances of ""is"" in the sentence, which is incorrect as a subject...",No,0
9,This is is the latest maintenance release of Samba 3.6.,Yes,"determine if this sentence is grammatically correct. We notice that there are two instances of ""is"" in the subject, which should only appear once. The...",No (incorrect due to duplicate subject pronoun),1


(70.0, [(Example({'text': 'Bitcoin goes for $7,094 this morning, according to CoinDesk.', 'answer': 'Yes'}) (input_keys={'text'}), Prediction(
    rationale='determine if the sentence is grammatically correct. We check the subject-verb agreement, punctuation, and proper use of articles. In this case, "Bitcoin" is the subject, and "goes" is the verb. The verb agrees with the subject because Bitcoin is singular. The article "the" before "morning" is appropriate since it is a specific time. The sentence also contains correct punctuation. Therefore, Answer: Yes',
    answer='Yes'
), 1), (Example({'text': 'Bitcoin is for $7,094 this morning, which CoinDesk says.', 'answer': 'No'}) (input_keys={'text'}), Prediction(
    rationale='determine if the sentence is grammatically correct. We start by identifying the subject of the sentence, which is "Bitcoin". The predicate is the verb phrase "is for $7,094 this morning", and the object is "this morning". The object complement is "which CoinDesk sa
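
Several predictions in the table are not a bare "Yes" or "No" (for example "No (incorrect grammar)"), so an exact-match metric counts them as wrong even when the leading verdict agrees with the label. The sketch below compares only the leading verdict and applies it to the ten rows displayed above. `normalize_yes_no` and `lenient_match` are hypothetical helpers, not part of this notebook; the optional `trace` parameter follows the usual DSPy metric convention, and row 7's prediction is copied as truncated in the table.

```python
import re
from types import SimpleNamespace


def normalize_yes_no(raw):
    """Return 'Yes' or 'No' if the answer starts with one of them, else None."""
    match = re.match(r"\s*(yes|no)\b", raw or "", flags=re.IGNORECASE)
    return match.group(1).capitalize() if match else None


def lenient_match(example, pred, trace=None):
    """Metric-style function that compares only the leading Yes/No verdict."""
    gold = normalize_yes_no(example.answer)
    return gold is not None and gold == normalize_yes_no(pred.answer)


# (gold label, predicted answer, judge score) for rows 0-9 of the table.
rows = [
    ("No", "Yes", 1), ("Yes", "Yes", 1), ("No", "No", 1), ("Yes", "No", 0),
    ("No", "No", 1), ("Yes", "Yes", 1), ("No", "No (incorrect grammar)", 1),
    ("Yes", "No, the sentence is not grammatically correct...", 0),
    ("No", "No", 0),
    ("Yes", "No (incorrect due to duplicate subject pronoun)", 1),
]
exact = sum(gold == pred for gold, pred, _ in rows)
lenient = sum(
    lenient_match(SimpleNamespace(answer=gold), SimpleNamespace(answer=pred))
    for gold, pred, _ in rows
)
judge = sum(score for _, _, score in rows)
print(exact, lenient, judge)  # 5 6 7
```

On these ten rows the judge-based score is higher than either agreement count, and it awards a 1 to row 0, where the prediction contradicts the gold label. That suggests the judge metric measures whether a second LM call agrees with the first, not accuracy against the labels.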

In [50]:
lm.inspect_history(n=20)





You are given a premise and a hypothesis.
    You must indicate with Yes or No answer whether we can logically
    conclude the hypothesis from the premise.
    From the premise is the hypothesis validated?

---

Follow the following format.

Premise: ${premise}

Hypothesis: ${hypothesis}

Reasoning: Let's think step by step in order to ${produce the entailment}. We ...

Entailment: Yes or No

---

Premise: Watch as this Dodge Challenger Hellcat gets smoked by a Tesla Model S at the drag strip.

Hypothesis: The given text is Grammatically correct

Reasoning: Let's think step by step in order to determine if the sentence is grammatically correct. We observe that the sentence follows a subject-verb-object structure, with "Watch" as the subject, "gets smoked" as the verb phrase, and "by a Tesla Model S at the drag strip" as the object. The verb tense is also consistent within the sentence. Therefore, Yes, the given text is grammatically correct.







You are given a premise

In [51]:
# Disabled: compare the compiled program against an uncompiled CoT baseline.
# grammar_judge = CoT()
#
# text = "this is correct not will do"
# pred = compiled_grammar_judge(text)
# print('compiled', pred)
# pred = grammar_judge(text)
# print('plain', pred)
#
# text = "You can read well"
# pred = compiled_grammar_judge(text)
# print('compiled', pred)
# pred = grammar_judge(text)
# print('plain', pred)

In [52]:
#lm.inspect_history(n=8)

In [53]:
#evaluate_program(grammar_judge)

In [54]:
#lm.inspect_history(n=8)