In [0]:
import numpy as np
from numpy.random import default_rng
import matplotlib.pyplot as plt

# Data Resilience Tips and Tricks in LoRaWAN

## Data-driven SLA

Many use cases can be implemented with the LoRaWAN protocol, but I will focus on one of the most challenging: water metering.

Everynet customers are primarily interested in the service itself, so the SLA is expressed solely in terms of the availability of meter readings in the customer’s database or ERP system.

For example, a typical SLA for a metering project is as follows:

 - 97% of meters must be connected to an ERP system;
 - 95% of daily consumption datapoints should eventually be available in an ERP system.
 
The percentage numbers could be different, but neither “network availability” nor “signal quality”, nor "message loss", nor the applied connectivity technology is involved in this data-driven SLA definition. 

Customers are only referring to the availability of meter readings and the percentage of sensors covered by the network. 

Any other SLA definition dilutes responsibility, drags the discussion into unnecessary technical detail, and does not correspond to the needs of the customer.

**Coverage availability** is something that the network operator is supposed to deliver, while **datapoint availability** is something that can be significantly improved by the solution provider and the device manufacturer.

## Message Loss vs Data Loss

Meter readings, or **datapoints**, are values generated by the meter and delivered in LoRaWAN **messages**.

Some messages are lost during transmission, but it is important to distinguish between **lost messages** and **lost datapoints**.

Lost messages are inevitable with any radio technology, but they do not always lead to lost datapoints if data redundancy techniques are used.

The simplest data redundancy technique is message repetition. 

We will focus on this technique in this paper.


## Data redundancy by repetition

Here is an example of how repetition helps to decrease data loss.

Let's say that a device generates a datapoint and delivers it using LoRaWAN messages:

- the probability that a single message is delivered to the gateway is $prr$
- the message is repeated $n$ times

The datapoint is considered delivered if **at least one message out of $n$ is delivered**.

Assuming that messages are lost independently of each other, all $n$ messages are lost with probability $(1 - prr)^n$, so the datapoint delivery probability is $1 - (1 - prr)^n$.


### Example I

Suppose the probability of message delivery is $prr = 0.75$ and the number of repetitions is $n=3$.

Probability of datapoint delivery: $1 - (1 - 0.75)^3 \approx 0.98$


### Example II

Suppose the probability of message delivery is $prr = 0.25$ and the number of repetitions is $n=5$.

Probability of datapoint delivery: $1 - (1 - 0.25)^5 \approx 0.76$

In [0]:
# Probability of datapoint delivery using repeated message
# prr - probability of single message delivery, n - number of message repetitions
def repeated_message_prr(prr, n):
    return 1.0 - (1.0 - prr)**n


repeated_message_prr(prr = 0.75, n = 3)

0.984375
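
As a sanity check, the closed-form value can be compared with a small Monte Carlo experiment. This is only a sketch: the function name `simulate_datapoint_delivery`, the number of trials and the fixed seed are assumptions made for illustration, and message losses are assumed to be independent.

```python
import numpy as np

def simulate_datapoint_delivery(prr, n, trials=200_000, seed=0):
    """Estimate the datapoint delivery probability by simulation."""
    rng = np.random.default_rng(seed)
    # Each row is one datapoint, each column is one independent transmission attempt.
    delivered = rng.random((trials, n)) < prr
    # A datapoint is delivered if at least one of its n messages got through.
    return delivered.any(axis=1).mean()

closed_form = 1.0 - (1.0 - 0.75) ** 3
estimate = simulate_datapoint_delivery(prr=0.75, n=3)
print(f"closed form: {closed_form:.4f}, simulated: {estimate:.4f}")
```

The two numbers should agree to about two decimal places. If real-world losses are correlated (for example, a meter in a permanently bad spot), the formula will overestimate the benefit of repetition.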

## Message loss estimate

Now that we have a formula, we need a reasonable value of $prr$ to plug into it.

An accurate message loss figure would require an accurate simulation.

We will use a relatively simple simulation to obtain a message loss estimate that is good enough to be used in practice.


### Device behaviour

Let's agree on a device behaviour model before we put together a simulator:
- each device transmits a message once per hour
- transmission time is selected randomly within the hour
- a uniform distribution is used for the random transmission time
- all messages have the same time on air

Let's calculate a $prr$ estimate given the number of devices per gateway and the message time on air.

In [0]:
# Simplified model of message collisions under the following assumptions:
# - message is sent once an hour
# - one gateway is used for reception
# - devices are spread evenly over the frequency channels
# - a message counts as received when the previous transmission on the same
#   channel has finished before it starts
def message_prr_with_collisions(number_of_devices, message_time_on_air_ms):
    number_of_frequency_channels = 7
    hour_in_ms = 60 * 60 * 1000
    number_of_devices_per_channel = number_of_devices // number_of_frequency_channels
    messages_received = 0

    # Uniformly random transmission start times (ms) on a single channel
    start = default_rng().integers(0, hour_in_ms, number_of_devices_per_channel)
    finish = start + message_time_on_air_ms

    # Sort transmissions by start time so neighbours in the list are neighbours in time
    intervals = sorted(zip(start, finish), key=lambda x: x[0])

    # Compare every message with the one that started just before it
    for i in range(1, len(intervals)):
        prev_end = intervals[i - 1][1]
        curr_start = intervals[i][0]
        if prev_end < curr_start:
            messages_received += 1

    prr = messages_received / number_of_devices_per_channel
    return prr

message_prr_with_collisions(number_of_devices=8000, message_time_on_air_ms=3000)

0.38266199649737304
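
The simulated value can be cross-checked analytically. The simulation counts a message as received when no other transmission on the same channel started within one time on air before it. With uniformly random start times, the probability of that is approximately $e^{-NT/H}$, where $N$ is the number of devices per channel, $T$ the time on air and $H$ one hour. The sketch below is an approximation under the same simplified model, and `approx_message_prr` is a name introduced here for illustration.

```python
import math

def approx_message_prr(number_of_devices, message_time_on_air_ms,
                       number_of_frequency_channels=7):
    hour_in_ms = 60 * 60 * 1000
    devices_per_channel = number_of_devices // number_of_frequency_channels
    # Expected number of transmissions that start within one time-on-air window
    load = devices_per_channel * message_time_on_air_ms / hour_in_ms
    # Probability that no transmission started within the preceding window
    return math.exp(-load)

estimate = approx_message_prr(number_of_devices=8000, message_time_on_air_ms=3000)
print(round(estimate, 3))  # about 0.386
```

The result is close to the simulated value above, which varies slightly from run to run because the start times are random.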

## Number of repetitions estimate

Once we know the value of $prr$, we can estimate $n$ from the required datapoint availability SLA.

In [0]:

def datapoint_sla_to_repetitions(required_datapoint_sla, message_prr):
    n = 1
    while n < 1024:
        if repeated_message_prr(message_prr, n) > required_datapoint_sla:
            break
        n += 1
    return n

datapoint_sla_to_repetitions(required_datapoint_sla=0.97, message_prr=0.38)

8
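
The same number can be obtained without a loop by solving $1 - (1 - prr)^n \ge SLA$ for $n$, which gives $n \ge \ln(1 - SLA) / \ln(1 - prr)$. A minimal sketch, assuming $0 < prr < 1$ and $0 < SLA < 1$; the function name `repetitions_closed_form` is introduced here for illustration.

```python
import math

def repetitions_closed_form(required_datapoint_sla, message_prr):
    # Smallest integer n such that 1 - (1 - prr)^n >= sla
    return math.ceil(math.log(1.0 - required_datapoint_sla) / math.log(1.0 - message_prr))

n = repetitions_closed_form(required_datapoint_sla=0.97, message_prr=0.38)
print(n)  # 8
```

Note that the loop above uses a strict comparison, so the two versions can differ by one in the rare case where the SLA is met exactly.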

## Implementing data redundancy in message format

To implement data redundancy without sending more LoRaWAN messages, we suggest adding historical values to each message.

The suggested message format offers no redundancy for hourly water consumption, but adds redundancy for daily and monthly water consumption.

```
typedef struct {
    uint16_t hourly_water_consumption;
    uint16_t daily_water_consumption[8];
    uint16_t monthly_water_consumption[2];
} message_t;
```
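
For experimenting with this layout in the notebook, the structure can be mirrored with Python's `struct` module. This is a sketch only: the little-endian byte order, the absence of padding, the function names and the sample values are all assumptions, since the source does not specify the wire encoding.

```python
import struct

# Assumed wire layout: little-endian, no padding,
# 1 hourly + 8 daily + 2 monthly unsigned 16-bit values.
MESSAGE_FORMAT = "<H8H2H"

def pack_message(hourly, daily, monthly):
    """Serialize one message; daily has 8 values, monthly has 2."""
    return struct.pack(MESSAGE_FORMAT, hourly, *daily, *monthly)

def unpack_message(payload):
    """Split a payload back into (hourly, daily list, monthly list)."""
    values = struct.unpack(MESSAGE_FORMAT, payload)
    return values[0], list(values[1:9]), list(values[9:11])

# Illustrative values only
payload = pack_message(12, [310, 295, 280, 301, 322, 290, 305, 298], [9100, 8850])
print(len(payload))  # 22
```

The payload is 22 bytes, so the historical values add redundancy without requiring any extra transmissions.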

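Given this message format, each daily water consumption reading is carried in 24 hourly messages per day for 8 days, so it has $24 * 8 = 192$ attempts to be delivered ($n=192$).

Each monthly consumption reading is carried for 2 months of roughly 30 days, so it has $24 * 30 * 2 = 1440$ attempts to be delivered.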

Given the number of attempts and the datapoint SLA value, we can calculate the required message PRR for both daily and monthly readings:
- message PRR to achieve 95% of daily readings delivered is about 1.5%
- message PRR to achieve 95% of monthly readings delivered is about 0.2%

This is a massive improvement achieved with very little effort.
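
These figures follow from inverting the delivery formula: solving $1 - (1 - prr)^n = SLA$ for $prr$ gives $prr = 1 - (1 - SLA)^{1/n}$. A minimal sketch of that calculation; the function name `required_message_prr` is introduced here for illustration.

```python
def required_message_prr(required_datapoint_sla, n):
    # Invert 1 - (1 - prr)^n = sla for prr
    return 1.0 - (1.0 - required_datapoint_sla) ** (1.0 / n)

daily = required_message_prr(0.95, 24 * 8)
monthly = required_message_prr(0.95, 24 * 30 * 2)
print(f"daily: {daily:.4f}, monthly: {monthly:.4f}")
```

The results are roughly 0.015 and 0.002, in line with the percentages quoted above.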

In [0]:

def datapoint_sla_to_message_prr(required_datapoint_sla, message_format_redundancy_n):

    prr_l, prr_r = 0.0000000001, 1.0

    while prr_r - prr_l > 0.00001:
        prr_mid = (prr_l + prr_r) / 2
        if repeated_message_prr(prr_mid, message_format_redundancy_n) > required_datapoint_sla:
            prr_r = prr_mid
        else:
            prr_l = prr_mid

    return prr_r

datapoint_sla_to_message_prr(required_datapoint_sla=0.97, message_format_redundancy_n=1440)

0.0024337769552253727

## End-to-end simulation example

We are going to simulate the following SLA: **97% of monthly datapoints should eventually be available in the ERP system**.

In other words, our datapoint SLA is $0.97$.

Our message size is $22$ bytes + LoRaWAN overhead of $13$ bytes.

Given the size of the message, we calculate the message time on air at SF12: $1811$ milliseconds.

Time on air calculator: https://avbentem.github.io/airtime-calculator/ttn/as923/48
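
The time on air can also be reproduced in the notebook with the LoRa packet duration formula from the Semtech SX127x datasheet. The sketch below is not the calculator's own code, and the radio parameters are assumptions: 125 kHz bandwidth, coding rate 4/5, 8 preamble symbols, explicit header, CRC enabled and low data rate optimization on. The function name `lora_time_on_air_ms` is introduced here for illustration.

```python
import math

def lora_time_on_air_ms(payload_bytes, sf=12, bandwidth_hz=125_000, coding_rate=1,
                        preamble_symbols=8, explicit_header=True, crc=True,
                        low_data_rate_optimize=True):
    """LoRa packet duration in ms; coding_rate=1 means 4/5, 4 means 4/8."""
    symbol_ms = (2 ** sf) / bandwidth_hz * 1000.0
    preamble_ms = (preamble_symbols + 4.25) * symbol_ms
    ih = 0 if explicit_header else 1          # implicit header flag
    de = 1 if low_data_rate_optimize else 0   # low data rate optimization flag
    numerator = 8 * payload_bytes - 4 * sf + 28 + 16 * int(crc) - 20 * ih
    payload_symbols = 8 + max(
        math.ceil(numerator / (4 * (sf - 2 * de))) * (coding_rate + 4), 0)
    return preamble_ms + payload_symbols * symbol_ms

# 22 bytes of application payload + 13 bytes of LoRaWAN overhead
toa = lora_time_on_air_ms(22 + 13)
print(round(toa, 1))  # 1810.4
```

Rounded up to a whole millisecond, this gives the $1811$ ms used in the simulation.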

The longer the time on air, the higher the probability of a message collision, so we took the worst case.

Our device transmits messages once per hour, so the number of delivery attempts for a monthly value is $1440$.



In [0]:
required_datapoint_sla = 0.97
message_time_on_air_ms = 1811
message_format_redundancy_n = 1440
number_of_devices = 80000

achievable_message_prr = message_prr_with_collisions(number_of_devices, message_time_on_air_ms)
required_message_prr = datapoint_sla_to_message_prr(required_datapoint_sla, message_format_redundancy_n)
achievable_datapoint_sla = repeated_message_prr(achievable_message_prr, message_format_redundancy_n)

print(f"Only {round(required_message_prr * 100, 2)}% of messages need to be delivered to achieve {required_datapoint_sla * 100}% datapoint delivery.")
print(f"In reality around {round(achievable_message_prr * 100, 2)}% of messages will be delivered, which will result in a {round(achievable_datapoint_sla * 100, 2)}% datapoint SLA.")

Only 0.24% of messages need to be delivered to achieve 97.0% datapoint delivery.
In reality around 0.36% of messages will be delivered, which will result in a 99.43% datapoint SLA.



Hope you enjoyed this notebook!