# ZigBee2MQTT Capability Extraction Script
The following script was used in the RetroWoT paper to extract the service capabilities of devices.
ZigBee2MQTT describes device models in the following format:

```
const definitions: Definition[] = [
    {
        zigbeeModel: ['DimmerSwitch-2Gang-ZB3.0'],
        model: 'D086-ZG',
        vendor: 'HZC Electric',
        description: 'Zigbee dual dimmer',
        extend: [
            deviceEndpoints({endpoints: {'l1': 1, 'l2': 2}}),
            light({endpointNames: ['l1', 'l2'], configureReporting: true}),
        ],
    },
    {
        zigbeeModel: ['TempAndHumSensor-ZB3.0'],
        model: 'S093TH-ZG',
        vendor: 'HZC Electric',
        description: 'Temperature and humidity sensor',
        fromZigbee: [fz.temperature, fz.humidity, fz.linkquality_from_basic], // <--- We need to extract this line
        toZigbee: [],                                                         // <--- We need to extract this line
        extend: [e.temperature(), e.humidity()],
        exposes: [e.temperature(), e.humidity()], // Unfortunately, battery percentage is not reported by this device
    },
];
```

The clusters a device supports can be identified from its `fromZigbee` and `toZigbee` lines.

`fromZigbee` lists the service capabilities that other devices can read from the device, such as the temperature or humidity in this example.

`toZigbee` lists the service capabilities used to control the device, for example changing the brightness of a light.
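To make the distinction concrete, here is a minimal sketch that pulls the converter references out of the two marked lines of the sensor definition. The regular expressions are the ones used in the extraction script; the `snippet` string and the variable names `readable` and `writable` are illustrative only.

```python
import re

# Two lines copied from the sensor definition shown above.
snippet = """
fromZigbee: [fz.temperature, fz.humidity, fz.linkquality_from_basic],
toZigbee: [],
"""

# Optional dotted namespace prefixes, followed by fz.<name> or tz.<name>.
readable = re.findall(r"(?:\w+\.)*fz\.\w+", snippet)
writable = re.findall(r"(?:\w+\.)*tz\.\w+", snippet)

print(readable)  # ['fz.temperature', 'fz.humidity', 'fz.linkquality_from_basic']
print(writable)  # []
```

The sensor exposes three readable capabilities and none that can be written, which matches its empty `toZigbee` list.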

This script extracts these entries from every device file, together with the functions called on `extend` lines, so that the results can be analysed and cleaned further in Excel.

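```python
import re
from typing import List


def get_fz_functions(content: str) -> List[str]:
    """Return all fromZigbee converter references, e.g. fz.battery."""
    # Optional dotted namespace prefixes (e.g. "tuya."), then fz.<name>
    pattern = r"(?:\w+\.)*fz\.\w+"
    return re.findall(pattern, content)


def get_tz_functions(content: str) -> List[str]:
    """Return all toZigbee converter references, e.g. tz.on_off."""
    # Optional dotted namespace prefixes (e.g. "legacy."), then tz.<name>
    pattern = r"(?:\w+\.)*tz\.\w+"
    return re.findall(pattern, content)


def get_extend_lines(content: str) -> List[str]:
    """Return the names of functions called on lines that contain 'extend'.

    Only calls on the same line as the word 'extend' are captured; calls on
    the following lines of a multi-line extend array are not.
    """
    res = []
    # Every line that mentions "extend"
    extend_lines = re.findall(r".*extend.*", content)
    # An identifier directly followed by an opening parenthesis
    function_pattern = r"(\w+)\("

    for line in extend_lines:
        res += re.findall(function_pattern, line)

    return res
```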
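`get_extend_lines` only inspects lines that themselves contain the word `extend`. The sketch below repeats the function so that it runs on its own and applies it to two fragments taken from the definitions above; the variable names `single_line` and `multi_line` are illustrative. Calls on the same line are found, whereas calls on the following lines of a multi-line array are not.

```python
import re
from typing import List


def get_extend_lines(content: str) -> List[str]:
    # Same logic as the function above, repeated to keep this example self-contained.
    res = []
    for line in re.findall(r".*extend.*", content):
        res += re.findall(r"(\w+)\(", line)
    return res


# Calls written on the same line as "extend".
single_line = "        extend: [e.temperature(), e.humidity()],"

# Calls written on the lines after "extend: [".
multi_line = """
        extend: [
            deviceEndpoints({endpoints: {'l1': 1, 'l2': 2}}),
            light({endpointNames: ['l1', 'l2'], configureReporting: true}),
        ],
"""

print(get_extend_lines(single_line))  # ['temperature', 'humidity']
print(get_extend_lines(multi_line))   # []
```

The counts for `extend` entries therefore reflect single-line arrays only, which is worth keeping in mind when interpreting the frequencies.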



```python
import os
from collections import Counter

import pandas as pd

folder_path = "./zigbee-herdsman-converters/src/devices/"
services = []

# Iterate over the device definition files in the folder
for file_name in os.listdir(folder_path):
    file_path = os.path.join(folder_path, file_name)

    # Skip subdirectories and other entries that are not regular files
    if os.path.isfile(file_path):
        with open(file_path, "r", encoding="utf-8") as file:
            content = file.read()

        # Collect the fromZigbee, toZigbee and extend entries of this file
        services += get_fz_functions(content)
        services += get_tz_functions(content)
        services += get_extend_lines(content)

# Count how often each service occurs across all device files
data = dict(Counter(services))

# Write the frequencies, most common first, to a CSV file for further analysis
pd.DataFrame(list(data.items()), columns=["Service", "Count"]).sort_values(
    "Count", ascending=False
).to_csv("zigbee_herdmans_frequency_new.csv", index=False)
```

| Index | Service | Count |
|---|---|---|
| 1 | fz.battery | 509 |
| 0 | light | 471 |
| 487 | philipsLight | 462 |
| 11 | tz.on_off | 223 |
| 3 | fz.on_off | 222 |
| ... | ... | ... |
| 389 | tuya.fz.backlight_mode_off_on | 1 |
| 384 | legacy.tz.tuya_thermostat_boost_time | 1 |
| 382 | legacy.tz.tuya_thermostat_force | 1 |
| 381 | legacy.tz.tuya_thermostat_eco_temp | 1 |
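For the follow-up analysis it can help to separate readable, writable and `extend` entries. The sketch below is an illustration and not part of the original script: it takes a handful of rows from the output above and groups their counts with a hypothetical helper, `classify`, which assumes that any name with an `fz` or `tz` segment is a converter and that everything else comes from an `extend` line.

```python
from collections import Counter


def classify(service: str) -> str:
    """Hypothetical helper: label an entry by its converter namespace."""
    parts = service.split(".")
    if "fz" in parts:
        return "fromZigbee"
    if "tz" in parts:
        return "toZigbee"
    return "extend"


# A few rows taken from the output above (not the full table).
counts = {
    "fz.battery": 509,
    "light": 471,
    "philipsLight": 462,
    "tz.on_off": 223,
    "fz.on_off": 222,
    "tuya.fz.backlight_mode_off_on": 1,
    "legacy.tz.tuya_thermostat_boost_time": 1,
}

# Sum the counts per category.
totals = Counter()
for service, count in counts.items():
    totals[classify(service)] += count

print(dict(totals))  # {'fromZigbee': 732, 'extend': 933, 'toZigbee': 224}
```

Splitting on `.` rather than testing for a substring ensures that prefixed names such as `tuya.fz.backlight_mode_off_on` are still assigned to the right direction.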
