### Step 1: Choose the bug to inject from the bug library
Bug details are listed in bug_lib.py and in the appendix of the paper. In particular, bug_no = -1 means that no bug is injected, so stable-baselines3 runs without any injected bug. The algorithm parameter can be 'ppo', 'a2c' or 'td3'.

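```python
bug_no = -1        # -1: no bug injected; otherwise a bug number from bug_lib.py
algorithm = 'ppo'  # 'ppo', 'a2c' or 'td3'
n = 2              # state dimension (A is n x n)
m = 2              # control-input dimension (B is n x m)
I = 1              # number of (A, B) pairs, i.e. environments / trained agents
J = 800            # passed to the evaluation in Step 6
```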

### Step 2: Inject the bug into the Stable-Baselines3 library
**Important:** Restart the kernel after injecting the bug so that the modified stable-baselines3 is the version that gets imported. After the kernel restarts, you do not need to run this block again.

```python
import bug_lib

# bug_no = -1 maps to an empty list, i.e. no bug is injected
if bug_no == -1:
    bug_lib.cover_then_inject_bugs([])
else:
    bug_lib.cover_then_inject_bugs([bug_no])
```
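`bug_lib.cover_then_inject_bugs` takes a list of bug numbers, and an empty list leaves the library bug-free. bug_lib.py is not reproduced here. Based only on the function name, one plausible scheme is to "cover" (overwrite) the installed source file with a clean copy and then "inject" each bug as a text replacement. The sketch below illustrates that idea with a hypothetical `cover_then_inject` helper and a toy one-line file. It is an assumption, not the repository's implementation.

```python
import shutil
import tempfile
from pathlib import Path

def cover_then_inject(target: Path, pristine: Path, patches: list[tuple[str, str]]) -> None:
    """Hypothetical sketch, not bug_lib's code: restore a clean file, then apply text patches."""
    # "Cover": overwrite the target with the pristine copy, undoing any earlier injection
    shutil.copyfile(pristine, target)
    source = target.read_text()
    # "Inject": swap each correct snippet for its buggy counterpart (first occurrence only)
    for correct, buggy in patches:
        if correct not in source:
            raise ValueError(f"snippet not found in {target}: {correct!r}")
        source = source.replace(correct, buggy, 1)
    target.write_text(source)


# Usage on a throwaway toy file standing in for a library module
with tempfile.TemporaryDirectory() as tmp:
    pristine = Path(tmp) / "clean_module.py"
    target = Path(tmp) / "module.py"
    pristine.write_text("loss = policy_loss + ent_coef * entropy_loss\n")

    cover_then_inject(target, pristine, [("+ ent_coef", "- ent_coef")])
    print(target.read_text(), end="")  # loss = policy_loss - ent_coef * entropy_loss

    cover_then_inject(target, pristine, [])  # empty list: clean copy only
    print(target.read_text(), end="")  # loss = policy_loss + ent_coef * entropy_loss
```

Restoring before injecting keeps successive runs independent, so switching from one bug number to another never stacks two bugs.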

### Step 3: Import the relevant libraries and the (possibly buggy) SB3
**Important:** Restarting the kernel clears all variables, so first re-run the Step 1 code block to redefine bug_no, algorithm and the other parameters.

```python
import Util
import LPEA_Env
import numpy as np
from gymnasium.wrappers import TimeLimit
import stable_baselines3 as sb3
import Lyaponov_oracle_util as LO
import os
```
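The kernel restart matters because Python caches every imported module in `sys.modules`. A later `import` statement returns the cached module even if its source file has been rewritten on disk. The toy sketch below demonstrates this with a throwaway file. The module name `toy_lib` is made up for the example. `importlib.reload` re-executes only the one module it is given. stable-baselines3, by contrast, is a package whose submodules import one another. For that reason, restarting the kernel and re-running Step 1 is the dependable way to pick up an injected bug.

```python
import importlib
import sys
import tempfile
from pathlib import Path

old_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True  # no .pyc cache, so reload() always re-reads the source

with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / "toy_lib.py"
    module_file.write_text("BEHAVIOUR = 'original'\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import toy_lib
        print(toy_lib.BEHAVIOUR)  # original

        # Simulate bug injection: rewrite the module's source on disk
        module_file.write_text("BEHAVIOUR = 'patched'\n")

        import toy_lib  # returns the cached object from sys.modules
        print(toy_lib.BEHAVIOUR)  # original

        importlib.reload(toy_lib)  # re-executes this single module only
        print(toy_lib.BEHAVIOUR)  # patched
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("toy_lib", None)
        sys.dont_write_bytecode = old_flag
```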




### Step 4: Generate state transition matrices
In our experiment, we set I = 20, n = 2 and m = 2. This means that we randomly generate 20 pairs of matrices A (n × n) and B (n × m), here both 2 × 2. Each pair defines one environment, giving 20 environments in total. See **Section II-B Lyapunov Stability Control Theory** of the paper for more information. For this demo, however, use a small I such as 1 to shorten the running time.

```python
# Store the arrays in saved_array/{n}by{m}/, the directory Step 5 reads from
file_path = './saved_array/{n}by{m}'.format(n=n, m=m)
os.makedirs(file_path, exist_ok=True)

for i in range(I):
    # Randomly generate the i-th pair: A is n x n, B is n x m
    A, B = Util.generate_state_transition_matix(n, m)
    np.save(os.path.join(file_path, f'array_A_{i}.npy'), A)
    np.save(os.path.join(file_path, f'array_B_{i}.npy'), B)
```
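Section II-B of the paper gives the Lyapunov background. As a reminder, assume the discrete-time linear model x_{t+1} = A x_t + B u_t. The unforced system (u_t = 0) is asymptotically stable exactly when the Lyapunov equation A^T P A - P = -Q with Q = I has a positive definite solution P. Such a P makes V(x) = x^T P x a Lyapunov function. The numpy sketch below checks this condition for a given matrix. The helper names are hypothetical and are not part of Util or LPEA_Env. A linear feedback u = -Kx, used here only for illustration, turns the check into one on A - BK.

```python
import numpy as np

def solve_discrete_lyapunov(A, Q):
    """Solve A^T P A - P = -Q for P by vectorisation (adequate for small n)."""
    n = A.shape[0]
    # Column-major vec identity: vec(A^T P A) = kron(A^T, A^T) @ vec(P)
    lhs = np.eye(n * n) - np.kron(A.T, A.T)
    vec_p = np.linalg.solve(lhs, Q.flatten(order="F"))
    return vec_p.reshape((n, n), order="F")

def is_lyapunov_stable(A):
    """True if V(x) = x^T P x certifies that x_{t+1} = A x_t is asymptotically stable."""
    n = A.shape[0]
    try:
        P = solve_discrete_lyapunov(A, np.eye(n))
    except np.linalg.LinAlgError:
        return False  # some eigenvalue product lambda_i * lambda_j equals 1
    P = (P + P.T) / 2  # remove round-off asymmetry
    return bool(np.all(np.linalg.eigvalsh(P) > 0))

A = np.diag([1.2, 0.5])   # open loop: eigenvalue 1.2 lies outside the unit circle
B = np.eye(2)
K = np.diag([0.9, 0.0])   # illustrative gain giving A - B K = diag(0.3, 0.5)
print(is_lyapunov_stable(A))          # False
print(is_lyapunov_stable(A - B @ K))  # True
```

The Kronecker form builds an n² × n² system. That is fine for the 2 × 2 matrices used here, but a dedicated solver is preferable for large n.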

### Step 5: Train **I** agents and save each trained agent for later use
This step may take some time, depending on your device and the algorithm you choose. Typically, with I = 20, PPO and A2C take about 20 minutes to train, while TD3 takes approximately 2 hours.

```python
# Agents are saved under trained_models/oracle_{alg}/bug_{bug}/{n}by{m}/
file_path_log = './trained_models/oracle_{alg}/bug_{bug}/{n}by{m}/'.format(n=n, m=m, bug=bug_no, alg=algorithm)
os.makedirs(file_path_log, exist_ok=True)
random_seed = 1

for i in range(I):
    # Load the i-th (A, B) pair generated in Step 4
    file_path_A = 'saved_array/{n}by{m}/array_A_{i}.npy'.format(n=n, m=m, i=i)
    file_path_B = 'saved_array/{n}by{m}/array_B_{i}.npy'.format(n=n, m=m, i=i)
    loaded_A = np.load(file_path_A)
    loaded_B = np.load(file_path_B)
    # Truncate every episode after 50 steps
    env = TimeLimit(LPEA_Env.CustomEnv(loaded_A, loaded_B, n, m), max_episode_steps=50)
    if algorithm == 'ppo':
        model = sb3.PPO("MlpPolicy", env, verbose=0, seed=random_seed, learning_rate=0.0012)
        model.learn(total_timesteps=120000)
    elif algorithm == 'a2c':
        model = sb3.A2C("MlpPolicy", env, verbose=0, seed=random_seed, learning_rate=0.0004)
        model.learn(total_timesteps=90000)
    elif algorithm == 'td3':
        model = sb3.TD3("MlpPolicy", env, verbose=0, seed=random_seed)
        model.learn(total_timesteps=90000)
    else:
        raise ValueError(f"Unsupported algorithm: {algorithm!r}")
    # Save as {i}_model (SB3 appends the .zip extension)
    model.save(os.path.join(file_path_log, f'{i}_model'))
```
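Each environment in this step is wrapped in gymnasium's `TimeLimit` with `max_episode_steps=50`. The wrapper sets the `truncated` flag returned by `step()` once 50 steps have elapsed, and SB3 then resets the environment. The numpy-only sketch below makes this reset/step contract concrete without needing SB3 or the repository's modules. `LinearSystemEnv` is hypothetical and is not `LPEA_Env.CustomEnv`. Its initial-state range, reward and termination rule are assumptions. A fixed linear feedback u = -Kx stands in for a trained policy.

```python
import numpy as np

class LinearSystemEnv:
    """Hypothetical stand-in for LPEA_Env.CustomEnv: x_{t+1} = A x_t + B u_t."""

    def __init__(self, A, B, max_episode_steps=50, seed=None):
        self.A, self.B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
        self.max_episode_steps = max_episode_steps  # plays the role of TimeLimit
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        self.x = self.rng.uniform(-1.0, 1.0, size=self.A.shape[0])  # assumed initial-state range
        return self.x.copy(), {}

    def step(self, u):
        self.x = self.A @ self.x + self.B @ np.asarray(u, dtype=float)
        self.t += 1
        reward = -float(self.x @ self.x)  # assumed reward: negative squared state norm
        terminated = False                # assumed: no early termination
        truncated = self.t >= self.max_episode_steps  # the TimeLimit rule
        return self.x.copy(), reward, terminated, truncated, {}

# Roll out the illustrative feedback u = -K x until the episode is truncated
env = LinearSystemEnv(A=np.diag([1.2, 0.5]), B=np.eye(2), seed=1)
K = np.diag([0.9, 0.0])
obs, _ = env.reset()
steps, done = 0, False
while not done:
    obs, reward, terminated, truncated, _ = env.step(-K @ obs)
    steps += 1
    done = terminated or truncated
print(steps)  # 50
```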

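### Step 6: Evaluate the trained agents

First compute the evaluation metrics of the trained agents. Then sweep the two LPEA oracle thresholds, vartheta and theta, and report the oracle's verdict for each combination. See Section III, Step 3 of the paper for their definitions.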

```python
buggy_metrics = LO.buggy_trained_model_metrics_calculation(algorithm, n, m, I, J, bug_no)
```

```text
  logger.warn(
  logger.warn(
  logger.warn(
Exception: code expected at most 16 arguments, got 18
Exception: code expected at most 16 arguments, got 18
```


```python
# vartheta: 100%, 90%, ..., 50%; theta: 100%, 75%, 50%
for vartheta in range(100, 40, -10):
    for theta in range(100, 45, -25):
        Oracle_result = LO.LPEA_Oracle(buggy_metrics, I, J, vartheta * 0.01, theta * 0.01)
        verdict = "bug-less" if Oracle_result else "buggy"
        print(f"vartheta={vartheta}%, theta={theta}%, the software is {verdict} based on LPEA Oracle")
```

```text
vartheta=100%, theta=100%, the software is buggy based on LPEA Oracle
vartheta=100%, theta=75%, the software is buggy based on LPEA Oracle
vartheta=100%, theta=50%, the software is bug-less based on LPEA Oracle
vartheta=90%, theta=100%, the software is buggy based on LPEA Oracle
vartheta=90%, theta=75%, the software is bug-less based on LPEA Oracle
vartheta=90%, theta=50%, the software is bug-less based on LPEA Oracle
vartheta=80%, theta=100%, the software is buggy based on LPEA Oracle
vartheta=80%, theta=75%, the software is bug-less based on LPEA Oracle
vartheta=80%, theta=50%, the software is bug-less based on LPEA Oracle
vartheta=70%, theta=100%, the software is buggy based on LPEA Oracle
vartheta=70%, theta=75%, the software is bug-less based on LPEA Oracle
vartheta=70%, theta=50%, the software is bug-less based on LPEA Oracle
vartheta=60%, theta=100%, the software is buggy based on LPEA Oracle
vartheta=60%, theta=75%, the software is bug-less based on LPEA Oracle
vartheta=60%, th
```
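The sweep prints one line per (vartheta, theta) combination, 6 × 3 = 18 lines in total, and the captured output above is cut off partway through. To read the results at a glance, the verdicts can be collected into a grid instead. In the sketch below, `sweep_oracle` and `print_grid` are hypothetical helpers. In the notebook you would pass `LO.LPEA_Oracle` and `buggy_metrics`. The runnable example instead uses a dummy oracle whose rule is made up purely for illustration.

```python
def sweep_oracle(oracle_fn, metrics, I, J,
                 varthetas=range(100, 40, -10), thetas=range(100, 45, -25)):
    """Collect verdicts into {(vartheta%, theta%): 'buggy' or 'bug-less'}."""
    return {
        (vt, th): "bug-less" if oracle_fn(metrics, I, J, vt * 0.01, th * 0.01) else "buggy"
        for vt in varthetas
        for th in thetas
    }

def print_grid(verdicts):
    """Print one row per vartheta and one column per theta."""
    varthetas = sorted({vt for vt, _ in verdicts}, reverse=True)
    thetas = sorted({th for _, th in verdicts}, reverse=True)
    print("vartheta\\theta " + "".join(f"{th:>10}%" for th in thetas))
    for vt in varthetas:
        print(f"{vt:>14}%" + "".join(f"{verdicts[(vt, th)]:>11}" for th in thetas))

# Dummy oracle for illustration only (NOT the LPEA oracle)
def dummy_oracle(metrics, I, J, vartheta, theta):
    return vartheta + theta <= metrics

# In the notebook: print_grid(sweep_oracle(LO.LPEA_Oracle, buggy_metrics, I, J))
print_grid(sweep_oracle(dummy_oracle, 1.5, I=1, J=800))
```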

```python
buggy_metrics
```