# Fine-Tuning an LLM to Generate TextFSM Templates

This is a basic proof of concept for fine-tuning a large language model (LLM) to generate TextFSM templates based on raw text and the expected output.

The [ntc-templates](https://github.com/networktocode/ntc-templates) repository provides a collection of TextFSM templates, along with unit tests that include raw data and expected outputs. These resources serve as the foundation for this fine-tuning process.

Low-Rank Adaptation (LoRA) is used during fine-tuning to reduce memory consumption: the base model's weights stay frozen and only small low-rank adapter matrices are trained.

# Preparing the data

Download and process the data.

In [1]:
!wget https://github.com/networktocode/ntc-templates/archive/refs/heads/master.zip -O master.zip

--2024-09-08 21:34:38--  https://github.com/networktocode/ntc-templates/archive/refs/heads/master.zip
Resolving github.com (github.com)... 140.82.112.4
Connecting to github.com (github.com)|140.82.112.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/networktocode/ntc-templates/zip/refs/heads/master [following]
--2024-09-08 21:34:38--  https://codeload.github.com/networktocode/ntc-templates/zip/refs/heads/master
Resolving codeload.github.com (codeload.github.com)... 140.82.113.9
Connecting to codeload.github.com (codeload.github.com)|140.82.113.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: 'master.zip'

master.zip              [  <=>               ]   2.99M  9.10MB/s    in 0.3s    

2024-09-08 21:34:39 (9.10 MB/s) - 'master.zip' saved [3135747]



In [2]:
!unzip -oq master.zip

In [3]:
from pathlib import Path
test_data = Path("ntc-templates-master/tests")
template_data = Path("ntc-templates-master/ntc_templates/templates")

In [4]:
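def get_data():
    """Yield one record per ntc-templates test case (raw text, expected YAML, template)."""
    for raw_file in test_data.rglob("*.raw"):
        # Test files live in tests/<platform>/<command>/<name>.raw
        command_dir = raw_file.parent
        platform_dir = command_dir.parent
        try:
            raw = raw_file.read_text()
            # Expected parsed output sits next to the raw file as <name>.yml or <name>.yaml
            fsm_data = next(command_dir.glob(f"{raw_file.stem}.y*l")).read_text()
            # Templates are named <platform>_<command>.textfsm
            fsm_template = (template_data / f"{platform_dir.name}_{command_dir.name}.textfsm").read_text()
        except (OSError, UnicodeDecodeError, StopIteration):
            continue  # Skip test cases with a missing or unreadable YAML/template file.
        yield dict(
            raw=raw,
            fsm_data=fsm_data,
            fsm_template=fsm_template,
            vendor=platform_dir.name.replace("_", " "),
            cmd=command_dir.name.replace("_", " "),
        )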
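`get_data()` relies on the ntc-templates naming conventions: test files live under `tests/<platform>/<command>/`, the matching template is `<platform>_<command>.textfsm`, and the expected output is a YAML file with the same stem as the `.raw` file. The sketch below isolates that mapping for a single path so it can be checked without the repository. `describe_test_file` is a hypothetical helper, and the example file name is made up to follow the layout.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def describe_test_file(raw_path: str) -> dict:
    """Mirror the naming rules get_data() relies on for one .raw test file."""
    p = PurePosixPath(raw_path)
    platform = p.parent.parent.name  # e.g. "cisco_ios"
    command = p.parent.name          # e.g. "show_ip_interface_brief"
    return {
        "template": f"{platform}_{command}.textfsm",
        "expected_glob": f"{p.stem}.y*l",
        "vendor": platform.replace("_", " "),
        "cmd": command.replace("_", " "),
    }


# Hypothetical file name that follows the tests/<platform>/<command>/ layout.
info = describe_test_file(
    "ntc-templates-master/tests/cisco_ios/show_ip_interface_brief/"
    "cisco_ios_show_ip_interface_brief.raw"
)
print(info["template"])  # cisco_ios_show_ip_interface_brief.textfsm

# The ".y*l" pattern accepts both YAML extensions for the expected output.
print(fnmatch("x.yml", "x.y*l"), fnmatch("x.yaml", "x.y*l"))  # True True
```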

In [5]:
!pip install pandas



In [6]:
import pandas as pd

In [7]:
df = pd.DataFrame(get_data())

In [8]:
df

|  | raw | fsm_data | fsm_template | vendor | cmd |
|---|---|---|---|---|---|
| 0 | 0 D address=192.168.69.254 address-lists="" s... | ---\nparsed_sample:\n - active_address: "192.... | Value INDEX (\d+)\nValue FLAG ([XRDB]+)\nValue... | mikrotik routeros | ip dhcp-server lease print terse without-paging |
| 1 | 0 comment=UniFi1 address=10.124.3.199 mac-a... | ---\nparsed_sample:\n - index: "0"\n flag:... | Value INDEX (\d+)\nValue FLAG ([XRDB]+)\nValue... | mikrotik routeros | ip dhcp-server lease print terse without-paging |
| 2 | time: 10:00:47\n ... | ---\nparsed_sample:\n - time: "10:00:47"\n ... | Value TIME (\d{2}\:\d{2}\:\d{2})\nValue DATE (... | mikrotik routeros | system clock print |
| 3 | Flags: X - disabled, E - established\n 0 E nam... | ---\nparsed_sample:\n - index: "0"\n flag:... | Value INDEX (\d+)\nValue FLAG (X\|E)\nValue NAM... | mikrotik routeros | routing bgp peer print status without-paging |
| 4 | Flags: X - disabled, E - established \n 0 E na... | ---\nparsed_sample:\n - index: "0"\n flag:... | Value INDEX (\d+)\nValue FLAG (X\|E)\nValue NAM... | mikrotik routeros | routing bgp peer print status without-paging |
| ... | ... | ... | ... | ... | ... |
| 1468 | \nAP Database\n-----------\nName ... | ---\nparsed_sample:\n - ap_model: "635"\n ... | Value AP_NAME (\S+)\nValue GROUP (\S+)\nValue ... | aruba os | show ap database long |
| 1469 | Interface IP Address / IP Ne... | ---\nparsed_sample:\n - admin: "up"\n inte... | Value INTERFACE (\S+\s\S+)\nValue IP_ADDRESS (... | aruba os | show ip interface brief |
| 1470 | Interface [Status/Protocol]\... | ---\nparsed_sample:\n - admin: "up"\n inte... | Value INTERFACE (\S+\s\S+\|\S+)\nValue List IPV... | aruba os | show ipv6 interface brief |
| 1471 | AP Radio Database\n-----------------\nName ... | ---\nparsed_sample:\n - ap_name: "ap-building... | Value AP_NAME (\S+)\nValue GROUP (\S+)\nValue ... | aruba os | show ap radio-database |


## Create the prompt

In [9]:
prompt_template = """You are a powerful text-to-TextFSM model. Your job is generate TextFSM templates to extract data from semi structured text. You are given a example text and the expected structured data.

You must output the TextFSM template that extracts the expected structured data.

# Text:
```
{raw}
```

# Expected Data:
```
{fsm_data}
```

# Response:
```
{fsm_template}
```
"""

In [10]:
df["prompt"] = df.apply(lambda row: prompt_template.format(raw=row["raw"], fsm_data=row["fsm_data"], fsm_template=row["fsm_template"]), axis=1)
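Each training prompt ends with `# Response:` followed by the fenced reference template. At inference time the template has to be left out; the `demo_prompt` used later stops before `# Response:` altogether. A prompt that ends exactly where the training prompts switch to the answer may match the training format more closely. A minimal sketch, assuming the training prompts built above; `to_inference_prompt` is a hypothetical helper, and the three-backtick fence is built at runtime only so the snippet can sit inside a Markdown code block.

```python
FENCE = "`" * 3  # three backticks


def to_inference_prompt(training_prompt: str, marker: str = "# Response:") -> str:
    """Drop the reference template so the model has to generate it.

    Keeps everything up to and including the last `marker`, then opens the
    response fence so the model continues with the template body.
    """
    head, found, _ = training_prompt.rpartition(marker)
    if not found:
        raise ValueError(f"{marker!r} not found in prompt")
    return f"{head}{marker}\n{FENCE}\n"
```

In the notebook this could be applied as `to_inference_prompt(df.prompt[0])` and the result passed to `gemma_lm.generate(...)`.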

In [11]:
df

|  | raw | fsm_data | fsm_template | vendor | cmd | prompt |
|---|---|---|---|---|---|---|
| 0 | 0 D address=192.168.69.254 address-lists="" s... | ---\nparsed_sample:\n - active_address: "192.... | Value INDEX (\d+)\nValue FLAG ([XRDB]+)\nValue... | mikrotik routeros | ip dhcp-server lease print terse without-paging | You are a powerful text-to-TextFSM model. Your... |
| 1 | 0 comment=UniFi1 address=10.124.3.199 mac-a... | ---\nparsed_sample:\n - index: "0"\n flag:... | Value INDEX (\d+)\nValue FLAG ([XRDB]+)\nValue... | mikrotik routeros | ip dhcp-server lease print terse without-paging | You are a powerful text-to-TextFSM model. Your... |
| 2 | time: 10:00:47\n ... | ---\nparsed_sample:\n - time: "10:00:47"\n ... | Value TIME (\d{2}\:\d{2}\:\d{2})\nValue DATE (... | mikrotik routeros | system clock print | You are a powerful text-to-TextFSM model. Your... |
| 3 | Flags: X - disabled, E - established\n 0 E nam... | ---\nparsed_sample:\n - index: "0"\n flag:... | Value INDEX (\d+)\nValue FLAG (X\|E)\nValue NAM... | mikrotik routeros | routing bgp peer print status without-paging | You are a powerful text-to-TextFSM model. Your... |
| 4 | Flags: X - disabled, E - established \n 0 E na... | ---\nparsed_sample:\n - index: "0"\n flag:... | Value INDEX (\d+)\nValue FLAG (X\|E)\nValue NAM... | mikrotik routeros | routing bgp peer print status without-paging | You are a powerful text-to-TextFSM model. Your... |
| ... | ... | ... | ... | ... | ... | ... |
| 1468 | \nAP Database\n-----------\nName ... | ---\nparsed_sample:\n - ap_model: "635"\n ... | Value AP_NAME (\S+)\nValue GROUP (\S+)\nValue ... | aruba os | show ap database long | You are a powerful text-to-TextFSM model. Your... |
| 1469 | Interface IP Address / IP Ne... | ---\nparsed_sample:\n - admin: "up"\n inte... | Value INTERFACE (\S+\s\S+)\nValue IP_ADDRESS (... | aruba os | show ip interface brief | You are a powerful text-to-TextFSM model. Your... |
| 1470 | Interface [Status/Protocol]\... | ---\nparsed_sample:\n - admin: "up"\n inte... | Value INTERFACE (\S+\s\S+\|\S+)\nValue List IPV... | aruba os | show ipv6 interface brief | You are a powerful text-to-TextFSM model. Your... |
| 1471 | AP Radio Database\n-----------------\nName ... | ---\nparsed_sample:\n - ap_name: "ap-building... | Value AP_NAME (\S+)\nValue GROUP (\S+)\nValue ... | aruba os | show ap radio-database | You are a powerful text-to-TextFSM model. Your... |


## Shuffle the dataset and take the first 1200 examples for training

In [12]:
# frac=1 returns every row in random order; no random_state is set, so the split differs per run.
df = df.sample(frac=1).reset_index(drop=True)

In [13]:
df

|  | raw | fsm_data | fsm_template | vendor | cmd | prompt |
|---|---|---|---|---|---|---|
| 0 | MAC age-time : 300 seconds\nNumber ... | ---\nparsed_sample:\n - mac_address: "88:3a:3... | Value MAC_ADDRESS (\S+)\nValue VLAN_ID (\d+)\n... | aruba aoscx | show mac-address-table | You are a powerful text-to-TextFSM model. Your... |
| 1 | Type escape sequence to abort.\nSending 2, 100... | ---\nparsed_sample:\n - destination: "10.32.2... | Value Required SENT_QTY (\d+)\nValue Required ... | cisco ios | ping | You are a powerful text-to-TextFSM model. Your... |
| 2 | ==============================================... | ---\nparsed_sample:\n - admin_state: "Up"\n ... | Value Required PORT_ID (\S+)\nValue Required S... | alcatel sros | show service sap-using | You are a powerful text-to-TextFSM model. Your... |
| 3 | Flags: I - Internal usage VLAN\nAging time is ... | ---\nparsed_sample:\n - destination_address: ... | Value DESTINATION_ADDRESS ((\w\w:){5}\w\w)\nVa... | cisco s300 | show mac address-table | You are a powerful text-to-TextFSM model. Your... |
| 4 | SWITCH-NAME# show lldp neighbor-info detail\n-... | ---\nparsed_sample:\n - capabilities: "WLAN"\... | Value Required LOCAL_INTERFACE (\S+)\nValue Re... | aruba aoscx | show lldp neighbors-info detail | You are a powerful text-to-TextFSM model. Your... |
| ... | ... | ... | ... | ... | ... | ... |
| 1468 | Interface IP-Address OK?... | ---\nparsed_sample:\n - interface: "Ethernet0... | Value INTERFACE (\S+)\nValue IP_ADDRESS (\S+)\... | cisco ios | show ip interface brief | You are a powerful text-to-TextFSM model. Your... |
| 1469 | iGigabitEthernet0/0/0 has 0 neighbors\n\nGigab... | ---\nparsed_sample:\n - capabilities: "bridge... | Value Required LOCAL_INTERFACE (\S+)\nValue CH... | huawei vrp | display lldp neighbor | You are a powerful text-to-TextFSM model. Your... |
| 1470 | Power Supply:\nVoltage: 12 Volts\nPower ... | ---\nparsed_sample:\n - power_supply: "1"\n ... | Value POWER_SUPPLY (\d+)\nValue POWER_SUPPLY_M... | cisco nxos | show environment | You are a powerful text-to-TextFSM model. Your... |
| 1471 | SNMP write community: Kl3t5k0p\nSNMP access co... | ---\nparsed_sample:\n - name: "Kl3t5k0p"\n ... | Value NAME (\S+)\nValue SECURITY_NAME (\S+)\nV... | oneaccess oneos | show snmp community | You are a powerful text-to-TextFSM model. Your... |


In [14]:
data = df.prompt.tolist()[:1200]
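Taking the first 1200 shuffled rows leaves the other 272 rows unused. Because ntc-templates often has several test files for the same template (rows 3 and 4 in the first table both exercise the `routing bgp peer print status without-paging` template), a random row split can put the same template on both sides. If the remaining rows are later used for evaluation, splitting by template avoids that overlap. A minimal sketch, using a hypothetical `split_by_template` helper:

```python
import random
from collections import defaultdict


def split_by_template(records, key, train_fraction=0.8, seed=0):
    """Split records so that all examples sharing key(record) land on the same side."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    keys = sorted(groups)              # deterministic order before shuffling
    random.Random(seed).shuffle(keys)  # reproducible group order
    n_train = round(len(keys) * train_fraction)
    train = [rec for k in keys[:n_train] for rec in groups[k]]
    held_out = [rec for k in keys[n_train:] for rec in groups[k]]
    return train, held_out
```

In the notebook it could be called as `train, held_out = split_by_template(df.to_dict("records"), key=lambda r: (r["vendor"], r["cmd"]))`, followed by `data = [r["prompt"] for r in train]`. Note that this splits by template count, not by row count, so the number of training rows varies with the seed.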

In [15]:
from IPython.display import display, Markdown

display(Markdown(data[0]))

You are a powerful text-to-TextFSM model. Your job is generate TextFSM templates to extract data from semi structured text. You are given a example text and the expected structured data.

You must output the TextFSM template that extracts the expected structured data.

# Text:
```
MAC age-time            : 300 seconds
Number of MAC addresses : 5

MAC Address          VLAN     Type                      Port
--------------------------------------------------------------
88:3a:30:a3:86:80    1        dynamic                   lag100
90:e2:ba:28:0d:f1    10       dynamic                   lag100
00:01:2e:82:0f:7b    3560     dynamic                   lag100
90:e2:ba:28:0d:f0    3590     dynamic                   lag100
88:3a:30:a3:86:80    3590     dynamic                   lag100
80:5e:0c:76:ed:bb    2015     port-access-security      1/1/30
```

# Expected Data:
```
---
parsed_sample:
  - mac_address: "88:3a:30:a3:86:80"
    port: "lag100"
    type: "dynamic"
    vlan_id: "1"
  - mac_address: "90:e2:ba:28:0d:f1"
    port: "lag100"
    type: "dynamic"
    vlan_id: "10"
  - mac_address: "00:01:2e:82:0f:7b"
    port: "lag100"
    type: "dynamic"
    vlan_id: "3560"
  - mac_address: "90:e2:ba:28:0d:f0"
    port: "lag100"
    type: "dynamic"
    vlan_id: "3590"
  - mac_address: "88:3a:30:a3:86:80"
    port: "lag100"
    type: "dynamic"
    vlan_id: "3590"
  - mac_address: "80:5e:0c:76:ed:bb"
    port: "1/1/30"
    type: "port-access-security"
    vlan_id: "2015"

```

# Response:
```
Value MAC_ADDRESS (\S+)
Value VLAN_ID (\d+)
Value TYPE (\S+)
Value PORT (\S+)

Start
  ^MAC\s+age-time.*$$
  ^Number\s+of\s+MAC.*$$
  ^MAC\s+Address\s+VLAN\s+Type\s+Port
  ^-+$$
  ^${MAC_ADDRESS}\s+${VLAN_ID}\s+${TYPE}\s+${PORT} -> Record
  ^\s*$$
  ^. -> Error

```


# Build LLM

In [16]:
!pip install -q -U keras-nlp
!pip install -q -U "keras>=3"  # quote the specifier, otherwise the shell treats ">" as a redirect

In [17]:
import os

os.environ["KERAS_BACKEND"] = "jax"  # Or "torch" or "tensorflow"; must be set before importing keras.
# Let JAX preallocate all GPU memory (the default is 75%) to avoid memory fragmentation.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.00"

import keras
import keras_nlp

In [18]:
class Settings:
    base_model = "gemma_instruct_2b_en"  # base LLM (Keras NLP preset name)
    rank = 4  # LoRA rank
    sequence_length = 512  # max tokens per example; longer prompts are truncated
    batch_size = 1  # increase if GPU memory allows
    epochs = 1  # more epochs are worth trying
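Because the preprocessor truncates every example to `sequence_length` tokens and the reference template sits at the very end of each prompt, any prompt longer than 512 tokens loses part or all of its template during training. Measuring this before training is cheap. A sketch with a hypothetical `truncation_report` helper that takes any token-counting callable:

```python
def truncation_report(texts, count_tokens, max_len):
    """Summarize how many texts are longer than max_len tokens.

    count_tokens is any callable mapping a string to a token count.
    """
    lengths = [count_tokens(t) for t in texts]
    over = sum(n > max_len for n in lengths)
    return {
        "examples": len(lengths),
        "over_limit": over,
        "fraction_over": over / len(lengths) if lengths else 0.0,
        "longest": max(lengths, default=0),
    }


# Toy usage with whitespace "tokens".
print(truncation_report(["a b c", "a b c d e f", ""], lambda t: len(t.split()), max_len=4))
```

After the model is loaded, something like `truncation_report(data, lambda t: len(gemma_lm.preprocessor.tokenizer(t)), Settings.sequence_length)` should give the real numbers. This is an assumption about the tokenizer's return type, and the preprocessor also adds start/end tokens, so treat the counts as approximate.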

Provide the Kaggle API token and accept the Gemma license on Kaggle to download the model.

In [19]:
%%time
# The 2B model fits on a free Colab or Kaggle Notebook GPU; the 7B model would be worth trying.

gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset(Settings.base_model)

normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.


CPU times: user 6.13 s, sys: 7.73 s, total: 13.9 s
Wall time: 1min 1s


In [20]:
gemma_lm.summary()

## Test Base LLM

In [21]:
%%time
display(Markdown(gemma_lm.generate("What is TextFSM in the context of Network Automation?", max_length=1024)))

What is TextFSM in the context of Network Automation?

TextFSM is a software tool that can be used to automate the configuration and management of network devices and services. It is a powerful tool that can be used to streamline the network configuration process and to ensure that all devices and services are configured correctly.

**Key features of TextFSM include:**

* **Device and service discovery:** TextFSM can automatically discover the network devices and services that are available.
* **Configuration management:** TextFSM can be used to manage the configuration of network devices and services.
* **Troubleshooting:** TextFSM can be used to troubleshoot network issues.
* **Reporting:** TextFSM can generate reports on the status of the network.

TextFSM is a popular choice for network automation due to its ease of use and its powerful features. It is a valuable tool for any network administrator who wants to streamline the network configuration process and to ensure that all devices and services are configured correctly.

CPU times: user 19.6 s, sys: 235 ms, total: 19.8 s
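Wall time: 20 s

The base model's answer is wrong: TextFSM is a Python module, originally from Google, that parses semi-structured text such as CLI output using template files that describe a state machine. It does not discover, configure, or troubleshoot devices.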


In [22]:
%%time
demo_prompt = """You are a powerful text-to-TextFSM model. Your job is generate TextFSM templates to extract data from semi structured text. You are given a example text and the expected structured data.

You must output the TextFSM template that extracts the expected structured data.

# Text:
```
18:42:41.321 PST Sun Feb 8 2009
12:18:42.123 CET Sun Feb 14 2021
08:15:00.0 PST Mon Okt 31 2020
```

# Expected Data:
```
[
  {
    "Year": "2009",
    "MonthDay": "8",
    "Month": "Feb",
    "Timezone": "PST",
    "Time": "18:42:41"
  },
  {
    "Year": "2021",
    "MonthDay": "14",
    "Month": "Feb",
    "Timezone": "CET",
    "Time": "12:18:42"
  },
  {
    "Year": "2020",
    "MonthDay": "31",
    "Month": "Okt",
    "Timezone": "PST",
    "Time": "08:15:00"
  }
]
```
"""
display(Markdown(gemma_lm.generate(demo_prompt, max_length=1024)))

You are a powerful text-to-TextFSM model. Your job is generate TextFSM templates to extract data from semi structured text. You are given a example text and the expected structured data.

You must output the TextFSM template that extracts the expected structured data.

# Text:
```
18:42:41.321 PST Sun Feb 8 2009
12:18:42.123 CET Sun Feb 14 2021
08:15:00.0 PST Mon Okt 31 2020
```

# Expected Data:
```
[
  {
    "Year": "2009",
    "MonthDay": "8",
    "Month": "Feb",
    "Timezone": "PST",
    "Time": "18:42:41"
  },
  {
    "Year": "2021",
    "MonthDay": "14",
    "Month": "Feb",
    "Timezone": "CET",
    "Time": "12:18:42"
  },
  {
    "Year": "2020",
    "MonthDay": "31",
    "Month": "Okt",
    "Timezone": "PST",
    "Time": "08:15:00"
  }
]
```
**TextFSM Template:**
```
(Timestamp) ([Year] ([Month] ([Day])) ([Timezone]) ([Time])
```

CPU times: user 2.3 s, sys: 0 ns, total: 2.3 s
Wall time: 2.3 s
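The base model's "template" above is not valid TextFSM: it declares no `Value` lines and no states. For comparison, below is one hand-written template that should produce the expected data. Since this notebook does not use the `textfsm` package, the snippet checks the template with `parse_single_rule`, a deliberately tiny, hypothetical stand-in that only handles a single `Start` rule ending in `-> Record` and `Value` lines without options. It is not the textfsm library.

```python
import re

# Hand-written template for the demo text (hypothetical, not produced by the model).
DEMO_TEMPLATE = r"""Value Year (\d+)
Value MonthDay (\d+)
Value Month (\w+)
Value Timezone (\S+)
Value Time (\d{2}:\d{2}:\d{2})

Start
  ^${Time}\.\d+\s+${Timezone}\s+\w+\s+${Month}\s+${MonthDay}\s+${Year}\s*$$ -> Record
"""

DEMO_TEXT = """18:42:41.321 PST Sun Feb 8 2009
12:18:42.123 CET Sun Feb 14 2021
08:15:00.0 PST Mon Okt 31 2020"""


def parse_single_rule(template: str, text: str) -> list[dict]:
    """Toy stand-in for TextFSM: one Start rule ending in '-> Record', no Value options."""
    values, rule = {}, None
    for line in template.splitlines():
        if line.startswith("Value "):
            _, name, regex = line.split(" ", 2)
            values[name] = regex[1:-1]  # strip the outer parentheses
        elif line.strip().startswith("^"):
            rule = line.strip().split(" -> ")[0]
    # Like TextFSM, turn "$$" into an end-of-line "$" and each ${Name} into a named group.
    pattern = rule.replace("$$", "$")
    for name, regex in values.items():
        pattern = pattern.replace("${" + name + "}", f"(?P<{name}>{regex})")
    compiled = re.compile(pattern)
    return [m.groupdict() for line in text.splitlines() if (m := compiled.match(line))]


for record in parse_single_rule(DEMO_TEMPLATE, DEMO_TEXT):
    print(record)
```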


## Fine-tuning with Low-Rank Adaptation (LoRA)

In [23]:
gemma_lm.backbone.enable_lora(rank=Settings.rank)
gemma_lm.summary()
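`enable_lora` freezes the backbone weights and adds trainable low-rank adapters to selected layers. For a frozen weight `W`, LoRA learns two small matrices `A` (rank × d_in) and `B` (d_out × rank) and computes `W x + (alpha / rank) · B A x`. The pure-Python sketch below shows that forward pass and the parameter arithmetic. `lora_forward`, `alpha`, and the 2048 × 2048 shape are illustrative assumptions; the `alpha / rank` scaling is the convention from the LoRA paper, and Keras may scale the update differently.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, rank):
    """y = W x + (alpha / rank) * B (A x).

    W (d_out x d_in) is the frozen pretrained weight; only A (rank x d_in) and
    B (d_out x rank) are trained. B usually starts at zero, so training starts
    from the unmodified base model.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


# Illustrative 2048 x 2048 projection with rank 4 (Settings.rank above):
full, lora = 2048 * 2048, lora_trainable_params(2048, 2048, 4)
print(full, lora, f"{lora / full:.2%}")  # 4194304 16384 0.39%
```

The memory saving comes from the optimizer only tracking state (e.g. AdamW's moment estimates) for the small `A` and `B` matrices rather than for the full weights.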

In [24]:
# Set sequence length to control memory usage
gemma_lm.preprocessor.sequence_length = Settings.sequence_length

# Use AdamW (Adam with Weight Decay)
optimizer = keras.optimizers.AdamW(
    learning_rate=5e-5,  # Also worth trying: 1e-4 or 2e-4 (higher), 2e-5 (lower)
    weight_decay=0.01,
    beta_1=0.9,
    beta_2=0.999
)

# Don't apply weight decay to bias and normalization scale variables.
optimizer.exclude_from_weight_decay(var_names=["bias", "scale"])

gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=optimizer,
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()]
)
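AdamW differs from Adam with L2 regularization in that the weight decay is applied directly to the weights rather than added to the gradient, so it is not rescaled by Adam's per-parameter step size. Below is a scalar sketch of one update step in that decoupled form. `adamw_step` is a hypothetical helper, and the exact order of operations and epsilon in Keras' implementation may differ. Variables matched by `exclude_from_weight_decay` above behave as if `weight_decay=0`.

```python
import math


def adamw_step(p, g, m, v, t, lr=5e-5, beta_1=0.9, beta_2=0.999,
               eps=1e-7, weight_decay=0.01):
    """One AdamW update for a scalar parameter p with gradient g.

    m and v are the running first/second moment estimates, t is the
    1-based step count. Returns the new (p, m, v).
    """
    m = beta_1 * m + (1 - beta_1) * g
    v = beta_2 * v + (1 - beta_2) * g * g
    m_hat = m / (1 - beta_1 ** t)  # bias correction for the zero-initialized moments
    v_hat = v / (1 - beta_2 ** t)
    # Decoupled decay: shrink the weight itself, independent of the gradient.
    p = p - lr * weight_decay * p
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v
```

With a zero gradient only the decay term acts; with a nonzero gradient the first step has magnitude close to `lr`, whatever the gradient's scale.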

In [25]:
gemma_lm.fit(data, epochs=Settings.epochs, batch_size=Settings.batch_size)

[1m1200/1200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1764s[0m 1s/step - loss: 1.6225 - sparse_categorical_accuracy: 0.6901


<keras.src.callbacks.history.History at 0x7987606562c0>
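The progress bar reports loss and accuracy averaged over the epoch. Keras' sparse categorical cross-entropy uses the natural logarithm, so `exp(loss)` converts the reported loss into a per-token perplexity. This is a quick sanity check, not an evaluation. Presumably the loss covers every non-padding token of the prompt (raw text and YAML included), not only the template, so on its own it says little about template quality.

```python
import math

# Epoch-average values printed by gemma_lm.fit above.
loss = 1.6225
accuracy = 0.6901

# exp(cross-entropy in nats) = perplexity: roughly, the model is as uncertain
# about each next token as a uniform choice among this many tokens.
perplexity = math.exp(loss)
print(f"perplexity = {perplexity:.2f}, next-token accuracy = {accuracy:.1%}")
```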

In [26]:
gemma_lm.save("textfsmLLM.keras")

## Test the fine-tuned LLM

In [27]:
print(demo_prompt)

You are a powerful text-to-TextFSM model. Your job is generate TextFSM templates to extract data from semi structured text. You are given a example text and the expected structured data.

You must output the TextFSM template that extracts the expected structured data.

# Text:
```
18:42:41.321 PST Sun Feb 8 2009
12:18:42.123 CET Sun Feb 14 2021
08:15:00.0 PST Mon Okt 31 2020
```

# Expected Data:
```
[
  {
    "Year": "2009",
    "MonthDay": "8",
    "Month": "Feb",
    "Timezone": "PST",
    "Time": "18:42:41"
  },
  {
    "Year": "2021",
    "MonthDay": "14",
    "Month": "Feb",
    "Timezone": "CET",
    "Time": "12:18:42"
  },
  {
    "Year": "2020",
    "MonthDay": "31",
    "Month": "Okt",
    "Timezone": "PST",
    "Time": "08:15:00"
  }
]
```



In [28]:
%%time
display(Markdown(gemma_lm.generate(demo_prompt, max_length=1024)))

You are a powerful text-to-TextFSM model. Your job is generate TextFSM templates to extract data from semi structured text. You are given a example text and the expected structured data.

You must output the TextFSM template that extracts the expected structured data.

# Text:
```
18:42:41.321 PST Sun Feb 8 2009
12:18:42.123 CET Sun Feb 14 2021
08:15:00.0 PST Mon Okt 31 2020
```

# Expected Data:
```
[
  {
    "Year": "2009",
    "MonthDay": "8",
    "Month": "Feb",
    "Timezone": "PST",
    "Time": "18:42:41"
  },
  {
    "Year": "2021",
    "MonthDay": "14",
    "Month": "Feb",
    "Timezone": "CET",
    "Time": "12:18:42"
  },
  {
    "Year": "2020",
    "MonthDay": "31",
    "Month": "Okt",
    "Timezone": "PST",
    "Time": "08:15:00"
  }
]
```
# Response:
```
Value YEAR (\d+)
Value MONTH (\d+)
Value DAY (\d+)
Value MONTHDAY (\d+)
Value TIME (\d{2}:\d{2}:\d{2}.\d{3} \(.*?\)
Value TIME_ZONE (\w+)
Value TZ (\w+)
Value TZ_DST (\w+)
Value TZ_OFFSET (\d+)
Value TZ_OFFSET_DST (\d+)
Value TZ_OFFSET_DST_OFFSET (\d+)
Value TZ_OFFSET_DST_OFFSET_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET_DST_DST_DST (\d+)
Value TZ_OFFSET_DST_OFFSET

CPU times: user 44.5 s, sys: 93.5 ms, total: 44.6 s
Wall time: 44.8 s
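In the output above, the fine-tuned model picks up the `# Response:` format and emits `Value` lines. However, several regexes do not fit the data (e.g. `MONTH (\d+)` cannot match `Feb`), one `Value` line repeats until `max_length` cuts the output off mid-line, and no `Start` state is ever produced. Post-processing cannot fix the missing state, but it can at least strip the output down to the template body and drop duplicate or truncated `Value` declarations before the result is handed to a parser. A sketch with a hypothetical `extract_template` helper (the fence is built at runtime only so the snippet can sit inside a Markdown code block):

```python
FENCE = "`" * 3  # three backticks


def extract_template(generated: str) -> str:
    """Return the template body after '# Response:', without duplicate/truncated Values."""
    _, found, after = generated.partition("# Response:")
    if not found:
        return ""
    after = after.lstrip()
    if after.startswith(FENCE):
        after = after[len(FENCE):]
    body = after.split(FENCE, 1)[0]  # stop at the closing fence, if there is one

    kept, seen_names = [], set()
    for line in body.strip("\n").splitlines():
        if line.startswith("Value "):
            parts = line.split()
            # "Value [Options] NAME (regex)": NAME is the token before the regex.
            regex_at = next((i for i, p in enumerate(parts) if p.startswith("(")), None)
            if regex_at is None or regex_at < 2:
                continue  # truncated or malformed Value line
            name = parts[regex_at - 1]
            if name in seen_names:
                continue  # duplicate declaration
            seen_names.add(name)
        kept.append(line)
    return "\n".join(kept)
```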


# ToDo

This is just a basic first attempt, and the model still needs extensive validation and optimization.
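One possible starting point for that validation, sketched under assumptions: parse each held-out example's raw text with the generated template using the `textfsm` package (for example `fsm = textfsm.TextFSM(io.StringIO(template))`, then `rows = fsm.ParseText(raw)` and `fsm.header`), load the expected data (e.g. `yaml.safe_load(fsm_data)["parsed_sample"]`), and compare the two. The hypothetical `score_parse` helper below covers only the comparison step. It is deliberately strict (exact string equality, order-sensitive), and a template that fails to compile would simply score 0.

```python
def normalize_key(key: str) -> str:
    """Compare field names case- and underscore-insensitively (MAC_ADDRESS == mac_address == MacAddress)."""
    return key.replace("_", "").lower()


def score_parse(header: list[str], rows: list[list], expected: list[dict]) -> float:
    """Fraction of records that match the expected data exactly, position by position.

    header/rows have the shape of a TextFSM parse result; expected is a list of
    dicts such as the parsed_sample list from an ntc-templates YAML file.
    """
    got = [{normalize_key(h): v for h, v in zip(header, row)} for row in rows]
    want = [{normalize_key(k): v for k, v in rec.items()} for rec in expected]
    if not got and not want:
        return 1.0
    hits = sum(g == w for g, w in zip(got, want))
    return hits / max(len(got), len(want))
```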