## What is LIDAR?

Lidar stands for Light Detection And Ranging. It is analogous to sonar and radar, which use sound and radio waves respectively. Lidar uses beams of light to measure the range to an object. This is very useful to a robot, which can sense the distance to a target or obstacle and build a local “map” of the world around it.

There is a huge variety of lidar models on the market, employing a range of technologies at different price points. For now, we can boil these down into three categories:

<img src= "https://frenzy86.s3.eu-west-2.amazonaws.com/python/Lidar1.png" width=600>

- 1D - Measures the distance to a single point. These are often used as “digital tape measures”. On their own they have limited uses, e.g. for a UAV to measure its height above the ground or for a mobile robot to measure the distance to a wall directly in front of it.

- 2D - The most common type of lidar, this measures many points in a scan plane. The simplest models are just a 1D lidar attached to a motor with an encoder. These are often used on mobile robots to create a floor plan to navigate around. Models vary in scan frequency, range (maximum distance), resolution, and horizontal field-of-view. These are sometimes called “laser scanners”.

- 3D - There is an increasing number of 3D lidars on the market that use a variety of methods and technologies. Rather than a single 2D scan plane, these lidars see in three dimensions. Some models can effectively be treated as 3D cameras, returning an image with a distance measured for each pixel.

<img src= "https://frenzy86.s3.eu-west-2.amazonaws.com/python/Lidar2.png" width=600>

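In ROS, a 2D lidar typically publishes its data as LaserScan messages. Each LaserScan message contains the data for a single sweep of the sensor: essentially an array of floats, one per range measurement (in metres), along with some extra parameters (such as the start angle and the angular increment between beams) needed to interpret the data properly.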
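
To make the structure of a sweep more concrete, here is a minimal sketch that turns one array of ranges into (x, y) points in the sensor frame. It assumes the beams are evenly spaced and that missing returns are reported as non-finite values. The function name and the example numbers are illustrative only and do not come from any particular driver.

```python
import numpy as np

def scan_to_points(ranges, angle_min, angle_increment):
    """Convert one sweep of range readings (metres) into (x, y) points."""
    ranges = np.asarray(ranges, dtype=float)
    # Beam i is assumed to point at angle_min + i * angle_increment (radians).
    angles = angle_min + angle_increment * np.arange(ranges.size)
    # Drop readings that are not finite (no echo, or out of range).
    valid = np.isfinite(ranges)
    x = ranges[valid] * np.cos(angles[valid])
    y = ranges[valid] * np.sin(angles[valid])
    return np.column_stack((x, y))

# Four beams a quarter turn apart; the third beam got no return.
points = scan_to_points([1.0, 2.0, float("inf"), 0.5],
                        angle_min=0.0, angle_increment=np.pi / 2)
print(points.round(3))
```

Plotting these points for every sweep gives the kind of floor-plan view that 2D lidars are commonly used for.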

The principle of LIDAR operation is pretty simple. A focused light beam is aimed at an object and a sensor looks for its reflection. If the reflection is detected, its intensity and angle (or phase) are measured. These values are then plugged into an equation by a fast onboard computer to determine the reflecting object's position and characteristics.
By “sweeping” the beam and receiver array mechanically we can quickly build up a 3D “image” of the surrounding area. This is often displayed as a “point cloud” to help us humans visualize what the LIDAR is “seeing”.
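
As one example of the kind of equation involved, the sketch below computes range from a measured phase lag. It assumes a sensor that modulates the beam's amplitude at a fixed frequency and compares the phase of the returned signal with the emitted one. The text above does not say which ranging method a given sensor uses, so treat this as an illustration of one method, not a description of a specific device.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def phase_shift_range(delta_phi, modulation_hz):
    """Range in metres from the phase lag (radians) of the returned signal."""
    # The light travels out and back, hence 4*pi rather than 2*pi.
    return SPEED_OF_LIGHT * delta_phi / (4.0 * math.pi * modulation_hz)

def unambiguous_range(modulation_hz):
    """Largest range that can be measured before the phase wraps past 2*pi."""
    return SPEED_OF_LIGHT / (2.0 * modulation_hz)

# Quarter-cycle lag at an assumed 10 MHz modulation frequency.
d = phase_shift_range(math.pi / 2, 10e6)
print(f"range: {d:.3f} m, max unambiguous range: {unambiguous_range(10e6):.3f} m")
```

A lower modulation frequency extends the unambiguous range, at the cost of coarser distance resolution for the same phase accuracy.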

Of course, LIDAR is not the only way to sense external surroundings. Let’s see how LIDAR compares to other remote position sensing methods.

## Machine Learning robot with LIDAR



In [None]:
## Use the Visualize app to view the data

<img src= "https://frenzy86.s3.eu-west-2.amazonaws.com/python/Lidar4.png" width=600>

### Loading the data (Supervised Dataset)

We drove the robot around several different types of circuit to collect supervised data, recording the joystick direction together with each Lidar acquisition.

<img src= "https://frenzy86.s3.eu-west-2.amazonaws.com/python/Lidar3.png" width=600>

Each saved data file contains a different number of samples but always has exactly 241 columns. Columns 0-239 hold the measurements from the LIDAR; the last column holds the label, a single letter. There are only five possible letters:

- F - forward
- I - forward right
- R - right
- G - forward left
- L - left
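
The layout described above can be checked before any training starts. The following is a minimal sketch; the names `N_BEAMS`, `LABEL_MEANINGS` and `split_recording` are made up for illustration, and the two-row frame is synthetic, not taken from the recordings.

```python
import pandas as pd

N_BEAMS = 240  # columns 0-239 hold the lidar readings
LABEL_MEANINGS = {
    "F": "forward",
    "I": "forward right",
    "R": "right",
    "G": "forward left",
    "L": "left",
}

def split_recording(df):
    """Check the 241-column layout and return (readings, labels)."""
    if df.shape[1] != N_BEAMS + 1:
        raise ValueError(f"expected {N_BEAMS + 1} columns, got {df.shape[1]}")
    readings = df.iloc[:, :N_BEAMS]
    labels = df.iloc[:, N_BEAMS]
    # Reject any letter outside the five documented labels.
    unknown = set(labels.unique()) - set(LABEL_MEANINGS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return readings, labels

# Tiny synthetic recording with two samples.
demo = pd.DataFrame([[100] * N_BEAMS + ["F"], [120] * N_BEAMS + ["G"]])
readings, labels = split_recording(demo)
print(readings.shape, [LABEL_MEANINGS[letter] for letter in labels])
```

Failing early on a malformed file is cheaper than debugging a classifier trained on shifted columns.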

In [1]:
import pandas as pd

from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

In [2]:
# path = 'data/data1.txt'
path = "https://frenzy86.s3.eu-west-2.amazonaws.com/python/data/data1.txt"

In [3]:
df = pd.read_csv(path, header=None)
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,231,232,233,234,235,236,237,238,239,240
0,235,237,239,241,243,246,248,251,253,256,...,222,222,223,223,224,225,226,227,228,R
1,299,308,316,320,327,334,342,350,359,368,...,248,251,254,257,260,264,267,271,275,F
2,294,302,308,317,322,329,336,344,353,362,...,243,246,249,252,255,258,262,266,270,F
3,218,225,230,234,239,244,244,255,261,267,...,182,184,187,189,191,194,196,199,202,F
4,188,194,172,170,172,176,179,183,187,192,...,182,184,187,167,165,167,170,172,174,G
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1781,274,274,274,274,274,274,275,275,276,277,...,275,275,274,275,274,276,279,279,278,F
1782,274,275,276,278,280,279,289,288,288,289,...,261,261,261,262,262,262,263,263,264,F
1783,261,263,264,266,268,270,272,275,277,280,...,249,250,250,250,251,251,252,253,253,F
1784,245,248,249,251,253,255,257,260,262,265,...,234,234,234,235,235,236,237,238,239,G


In [4]:
## Target
df.iloc[:,-1].value_counts()

240,count
F,1176
G,298
I,291
L,16
R,5


### Data cleaning
Next, we can start cleaning the data. I intend to keep this step simple, but feel free to experiment and improve the cleaning process. As the data science saying goes, "garbage in, garbage out": the cleaner the data, the better the final result.

I will rename the last column to "Target" for convenience. Additionally, we'll drop all samples labelled "L," "R," "H," or "J." ("H" and "J" do not occur in this file, so filtering them has no effect here.) To make the task easier for the classifier, I've chosen to focus solely on driving forward, forward left, and forward right. This selection is sufficient for navigating the racetracks I designed.

In [5]:
df.rename(columns={df.columns[-1]: 'Target'}, inplace=True)
print(f"Label counts before cleaning the data: \n {df['Target'].value_counts()}")

## drop L and R because they have too few observations; H and J are filtered too, although this file contains none
df = df[~df['Target'].isin(['L', 'R', 'H', 'J'])].reset_index(drop=True)


Label counts before cleaning the data: 
 Target
F    1176
G     298
I     291
L      16
R       5
Name: count, dtype: int64


Now we will separate the input data X from the output labels y. After that we will split them into training and test sets with train_test_split. LabelEncoder converts the letters in the label column to integers so that the classifier can work with them.

In [6]:
df['Target'].value_counts()

Target,count
F,1176
G,298
I,291


In [7]:
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,231,232,233,234,235,236,237,238,239,Target
0,299,308,316,320,327,334,342,350,359,368,...,248,251,254,257,260,264,267,271,275,F
1,294,302,308,317,322,329,336,344,353,362,...,243,246,249,252,255,258,262,266,270,F
2,218,225,230,234,239,244,244,255,261,267,...,182,184,187,189,191,194,196,199,202,F
3,188,194,172,170,172,176,179,183,187,192,...,182,184,187,167,165,167,170,172,174,G
4,188,194,172,170,172,176,179,183,187,192,...,182,184,187,167,165,167,170,172,174,G
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1760,274,274,274,274,274,274,275,275,276,277,...,275,275,274,275,274,276,279,279,278,F
1761,274,275,276,278,280,279,289,288,288,289,...,261,261,261,262,262,262,263,263,264,F
1762,261,263,264,266,268,270,272,275,277,280,...,249,250,250,250,251,251,252,253,253,F
1763,245,248,249,251,253,255,257,260,262,265,...,234,234,234,235,235,236,237,238,239,G


In [8]:
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

le = LabelEncoder()
y = le.fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=667
                                                    )
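
Because "F" is far more frequent than "G" and "I", a plain random split can give the training and test sets slightly different class mixes. The sketch below shows the `stratify` argument of `train_test_split`, which keeps the proportions equal in both parts. It uses synthetic stand-in data with roughly the same class mix as the cleaned dataset. This is an optional variation and not what the cell above does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in: 1000 samples with roughly a 67/17/16 % class mix.
y_demo = np.repeat([0, 1, 2], [670, 170, 160])
X_demo = rng.integers(100, 400, size=(y_demo.size, 240))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.2,
    random_state=667,
    stratify=y_demo,  # keep class proportions equal in both parts
)

train_share = np.bincount(y_tr) / y_tr.size
test_share = np.bincount(y_te) / y_te.size
print("train:", train_share.round(3))
print("test: ", test_share.round(3))
```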

### Data selection

We don't need all the data. Most of it is just noise that won't be useful to us (remember? garbage in, garbage out). We are not too concerned about what is behind us; driving forward while looking backwards is not the best idea. That's why we do data selection. Why now? Data selection should be fitted after the split into training and test sets, using the training set only; otherwise we are exposed to data leakage.

We will perform data selection with SelectKBest from the sklearn package.

With k you define how many features you want to select. We were able to get the robot to navigate the race track autonomously with decent precision using as few as 10 features. For the second video, where I tried to make the robots race, I had to increase the number of dimensions to 80 to get it to work. Even with that many dimensions the Arduino still seems to cope well.

In [10]:
k = 80
k_best = SelectKBest(score_func=f_classif, k=k) #f_classif --> ANOVA F-value between label/feature for classification tasks.
k_best.fit(X_train, y_train)

selected_feature_indices = k_best.get_support(indices=True)
# we have to print it like this to have the commas between the indices so that it's easy to copy and paste to Arduino IDE
print("selected features: ", X.columns[selected_feature_indices])

selected features:  Index([135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148,
       149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162,
       163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 184, 185, 186,
       187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200,
       201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214,
       215, 216, 217, 218, 219, 220, 221, 222, 223, 224],
      dtype='object')
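
One way to guard against the leakage mentioned above when cross-validating is to put the selector and the classifier into a single scikit-learn `Pipeline`, so the selector is refitted on the training part of every fold. The sketch below uses synthetic data in which only the first 20 columns carry signal. It is an alternative arrangement for experimentation, not the workflow used in this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
y_demo = rng.integers(0, 3, size=300)
X_demo = rng.normal(size=(300, 240))
X_demo[:, :20] += y_demo[:, None]  # make the first 20 columns informative

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=80)),
    ("forest", RandomForestClassifier(max_depth=3, random_state=42)),
])

# The selector is fitted inside each fold, on that fold's training part only.
scores = cross_val_score(pipe, X_demo, y_demo, cv=5)

pipe.fit(X_demo, y_demo)
chosen = pipe.named_steps["select"].get_support(indices=True)
print("mean CV accuracy:", scores.mean().round(3))
print("number of selected features:", len(chosen))
```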


### Training the model
Training the model is straightforward thanks to the libraries available in Python. The final outcome depends on our dataset and the preceding steps. After training, we compute the accuracy on the test set; higher is better.

In the videos, the classifiers I used reached a maximum accuracy of about 75%. That is not optimal and leaves room for improvement, but it was enough for the robot to navigate the racetrack autonomously. Occasional collisions with the wall did occur; at times, the robot could drive for a few minutes without any crashes. We will also print the classification report to see the precision, recall and F1-score for each class.

In [11]:
clf = RandomForestClassifier(max_depth=3, random_state=42)
clf.fit(X_train.iloc[:, selected_feature_indices], y_train)

y_pred = clf.predict(X_test.iloc[:, selected_feature_indices])

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

class_names = le.classes_
report = classification_report(y_test, y_pred, target_names=class_names, zero_division=0)
print('Classification Report:\n', report)

Accuracy: 0.7478753541076487
Classification Report:
               precision    recall  f1-score   support

           F       0.77      0.91      0.83       245
           G       0.68      0.44      0.53        57
           I       0.62      0.29      0.40        51

    accuracy                           0.75       353
   macro avg       0.69      0.55      0.59       353
weighted avg       0.73      0.75      0.72       353
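
To put the accuracy in context, it helps to compare it with the score of always predicting the most frequent class. The short calculation below uses only the support counts and the accuracy printed in the report above.

```python
# Test-set support per class and accuracy, copied from the report above.
support = {"F": 245, "G": 57, "I": 51}
accuracy = 0.7478753541076487

total = sum(support.values())
majority_class = max(support, key=support.get)
# Accuracy obtained by always predicting the majority class.
baseline = support[majority_class] / total

print(f"always-{majority_class} baseline: {baseline:.3f}")
print(f"model accuracy:      {accuracy:.3f}")
print(f"gain over baseline:  {accuracy - baseline:.3f}")
```

The baseline comes out at about 69%, so most of the model's accuracy comes from the dominant "F" class. The low recall for "G" and "I" in the report tells the same story.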



### Exporting the Classifier

While working in Python is convenient, we cannot run Python code on the Arduino. Therefore, the next step is exporting the classifier. I've come across an [excellent article](https://eloquentarduino.github.io/2019/11/how-to-train-a-classifier-in-scikit-learn/) online that explains in detail how to export the classifier to C and integrate it with Arduino. The resulting file is saved in the directory where you are currently working, so please remember to move it to the Arduino folder. If you are experimenting with several models, make sure to change the index at the end of the file name to avoid mixing up files.

**REMEMBER** to copy the selected dimensions and paste them into the Arduino file. The number of dimensions used during training and later during classification must match, otherwise it won't work!


In [None]:
from micromlgen import port

# write the generated C header; the file is closed automatically
with open("randomForest10.h", mode="w") as arduino_code:
    arduino_code.write(port(clf))
print("selected features: ", X.columns[selected_feature_indices])

selected features:  Index([135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148,
       149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162,
       163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 184, 185, 186,
       187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200,
       201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214,
       215, 216, 217, 218, 219, 220, 221, 222, 223, 224],
      dtype='object')
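
Copying the indices by hand is error-prone, so a small helper can format them as a C array definition ready to paste. This is only a sketch: the function name, the array name and the `const int` type are assumptions, and the declaration your Arduino sketch expects may differ.

```python
def to_c_array(indices, name="selected_features"):
    """Format feature indices as a C array definition (name is a placeholder)."""
    values = ", ".join(str(int(i)) for i in indices)
    return f"const int {name}[{len(indices)}] = {{{values}}};"

# Example with the first three selected indices from the output above.
print(to_c_array([135, 136, 137]))
```

In the notebook, the argument would be the column labels selected earlier, e.g. `X.columns[selected_feature_indices]`.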
