# Report for Landsat Satellite Classification
## Project Purpose
Landsat data is one of many sources of information available for a scene. As remote sensing moves toward integrated methods, interpreting scenes by combining spatial data of different types and resolutions (multi-spectral and radar data, maps of terrain and land use, and so on) is becoming increasingly important; NASA's Earth Observing System is one example. Existing statistical methods are poorly suited to such heterogeneous data. This project aims to predict the class of a pixel from its multi-spectral values using neural networks.

## Problem Statement
The database contains the multi-spectral values of the pixels in 3x3 neighborhoods of a satellite image, together with the class of the central pixel of each neighborhood. The goal is to predict this class from the multi-spectral values. The class of a pixel is encoded as an integer in the dataset.

## Dataset Description
The dataset I use is the Landsat Satellite Dataset. It was generated from Landsat Multi-Spectral Scanner image data provided by the Department of Statistics and Modelling Science, University of Strathclyde. Each pixel value is an integer ranging from 0 to 255, and the class of a pixel is also coded as an integer. The dataset consists of 4435 training samples and 2000 testing samples. Every sample has 36 features: the values of four spectral bands for each of the 9 pixels in a 3x3 neighborhood of the satellite image. The label of a sample is an integer representing the class of the central pixel. Label 6 does not occur in the data, so six classes remain. The meaning of the labels is shown in the table below:
<table>
  <thead>
    <tr>
      <th>label</th>
      <th>class</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>red soil</td>
    </tr>
    <tr>
      <td>2</td>
      <td>cotton crop</td>
    </tr>    
    <tr>
      <td>3</td>
      <td>grey soil</td>
    </tr>    
    <tr>
      <td>4</td>
      <td>damp grey soil</td>
    </tr>
    <tr>
      <td>5</td>
      <td>soil with vegetation stubble</td>
    </tr>
    <tr>
      <td>7</td>
      <td>very damp grey soil</td>
    </tr>
  </tbody>
</table>


## Project Repository
https://github.com/yichiz3/project-7300
 

## Data Process

In [1]:
from util import *

def idx2label(idx):
    if idx<5:
        return idx+1
    return idx+2

def label2idx(label):
    if label<=5:
        return label-1
    return 5

class Dataset(data.Dataset):
    def __init__(self, df):
        self.x = np.array(df.iloc[:, :-1]) / 255
        self.x = torch.tensor(self.x, dtype=torch.float32)
        # map labels {1..5, 7} to class indices {0..5}; pass a NumPy array to torch
        self.y = torch.tensor(df.iloc[:, -1].map(label2idx).values, dtype=torch.int64)

    def __getitem__(self, index):     
        return self.x[index, :], self.y[index]

    def __len__(self):
        return int(self.x.shape[0])


def get_data(batch_size=32):
    train_data = pd.read_csv(PROJ_PATH+"/sat.trn", sep=' ', header=None)
    test_data = pd.read_csv(PROJ_PATH+"/sat.tst", sep=' ', header=None)
    train_set = Dataset(train_data)
    test_set = Dataset(test_data)
    train_loader = data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
    test_loader = data.DataLoader(test_set, batch_size=batch_size, shuffle=True)
    return train_loader, test_loader


The training and testing data are stored in the files ‘sat.trn’ and ‘sat.tst’, respectively. First, the training and testing samples are read into pandas data frames. The features of each sample are divided by 255, which turns them into floating-point numbers in [0, 1]. The labels 1-5 and 7 are mapped to the class indices 0-5. Finally, the features and labels are converted into PyTorch tensors and wrapped in PyTorch data loaders.
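The preprocessing described above can be illustrated without PyTorch. The sketch below is a pure-Python illustration rather than the project's loader: `label2idx` and `idx2label` mirror the functions in the cell above, while `scale_pixels` is a hypothetical helper name introduced only for this example.

```python
# Labels that appear in the table above (label 6 is absent).
LABELS = [1, 2, 3, 4, 5, 7]

def label2idx(label):
    """Map a dataset label (1-5, 7) to a contiguous class index (0-5)."""
    return label - 1 if label <= 5 else 5

def idx2label(idx):
    """Inverse mapping: class index (0-5) back to the dataset label."""
    return idx + 1 if idx < 5 else idx + 2

def scale_pixels(row):
    """Hypothetical helper: scale 8-bit pixel values into [0, 1]."""
    return [value / 255 for value in row]

indices = [label2idx(label) for label in LABELS]
round_trip = [idx2label(i) for i in indices]
scaled = scale_pixels([0, 51, 255])

print(indices)     # [0, 1, 2, 3, 4, 5]
print(round_trip)  # [1, 2, 3, 4, 5, 7]
print(scaled)      # [0.0, 0.2, 1.0]
```

The round trip matters because the loss function expects class indices in the range 0 to 5, while reports are easier to read with the original labels.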

## EDA
### The Distribution of Labels

In [None]:


train_data = pd.read_csv(PROJ_PATH+"/sat.trn", sep=' ', header=None)
test_data = pd.read_csv(PROJ_PATH+"/sat.tst", sep=' ', header=None)
train_label = train_data.iloc[:, -1]
test_label = test_data.iloc[:, -1]

train_map = {}
for label in train_label.values:
    if label not in train_map:
        train_map[label] = 1
    else:
        train_map[label] += 1
sum_ = sum(train_map.values())

def pie():
    sum_ = sum(train_map.values())

    plt.pie(np.array(list(train_map.values()))/sum_, labels=train_map.keys(),autopct='%.1f%%')
    plt.show()

    
    test_map = {}
    for label in test_label.values:
        if label not in test_map:
            test_map[label] = 1
        else:
            test_map[label] += 1
    sum_ = sum(test_map.values())
    
    plt.pie(np.array(list(test_map.values()))/sum_, labels=test_map.keys(),autopct='%.1f%%')
    plt.show()  

pie()

As the pie charts show, the labels are reasonably balanced: no label dominates the dataset, and none is so rare that the model cannot learn it. The label distribution of the training set is similar to that of the testing set. Label 7 (very damp grey soil) is one of the most frequent labels, with a proportion of 23.4-23.5%. Label 4 (damp grey soil) has the lowest proportion, around 9.4%-10.5%.
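The per-label proportions behind such a pie chart can also be computed directly. The following is a minimal sketch using `collections.Counter` in place of the manual dictionary counting; `label_proportions` is a hypothetical helper, and the labels are toy values, not the real dataset.

```python
from collections import Counter

def label_proportions(labels):
    """Return {label: fraction of samples}, ordered by label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}

# Toy labels for illustration only; not the real dataset.
toy_labels = [1, 1, 2, 3, 3, 3, 4, 7, 7, 7]
proportions = label_proportions(toy_labels)
for label, share in proportions.items():
    print(f"label {label}: {share:.1%}")
```

Applying the same function to the training and testing labels would give two dictionaries that can be compared label by label.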

### The Histogram of the pixel value

In [None]:

def hist():
    # Draw all four histograms in one 2x2 figure. plt.show() is called once at
    # the end; calling it after every subplot renders each panel separately.
    columns = [0, 9, 18, 27]
    names = ["1st", "10th", "19th", "28th"]
    plt.figure(figsize=(12, 8))
    for i, (col, name) in enumerate(zip(columns, names), start=1):
        plt.subplot(2, 2, i)
        plt.hist(train_data.iloc[:, col].values)
        plt.title(f"Histogram of Pixel Values on the {name} dimension")
    plt.tight_layout()
    plt.show()
hist()

We also analyzed the predictor variables. There are 36 variables in total, so we pick the 1st, 10th, 19th and 28th variables and draw a histogram of the pixel values for each.
The distributions of the 1st, 10th and 19th variables are approximately symmetric: most of the pixel values are gathered around the center of the distribution. The distribution of the 28th variable is right skewed: most of the pixel values are gathered at the smaller values of the distribution.
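The visual impression of symmetry or skew can be backed by a number. Below is a minimal sketch of moment-based skewness in pure Python; `sample_skewness` is a hypothetical helper and the two lists are toy values, not pixel data. A value near 0 suggests a symmetric distribution, and a clearly positive value suggests a right skew.

```python
import math

def sample_skewness(values):
    """Moment-based skewness: mean of cubed z-scores (population form)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / std) ** 3 for v in values) / n

# Toy values for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 10]

skew_symmetric = sample_skewness(symmetric)
skew_right = sample_skewness(right_skewed)
print(round(skew_symmetric, 3))
print(round(skew_right, 3))
```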


In [None]:

def correlation():
    corrdata=train_data.iloc[:, [0, 9, 18, 27]]
    corr = corrdata.corr()
    plt.figure(figsize=(15,8))
    sns.heatmap(corr, annot=True, annot_kws={"size": 15})
    
    plt.title("Correlation between four spectrums")
    plt.show()
correlation()

As we can see from Figure 2, spectral bands 3 and 4 are highly correlated, while the other pairs of bands show no notable correlation. Spectral bands 3 and 4 therefore tend to take similar values for the same sample.
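Each cell of such a heatmap is a Pearson correlation coefficient, which is what `DataFrame.corr()` computes by default. As a rough illustration of the quantity, the sketch below computes it by hand in pure Python; `pearson` is a hypothetical helper and the three lists are toy values, not taken from the dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy pixel values for illustration only.
band_a = [80, 92, 101, 110, 125]
band_b = [78, 95, 99, 113, 121]   # rises and falls together with band_a
band_c = [60, 40, 70, 45, 62]     # no clear relationship with band_a

r_ab = pearson(band_a, band_b)
r_ac = pearson(band_a, band_c)
print(round(r_ab, 3))
print(round(r_ac, 3))
```

A coefficient close to 1 corresponds to the strongly colored cells of the heatmap, and a coefficient close to 0 to the weakly colored ones.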

## Model Details
Because all features in the dataset are continuous numeric variables, I chose logistic regression and a neural network to solve the classification problem; both models operate naturally on continuous inputs. Logistic regression has a simple but effective structure. With a softmax output, logistic regression finds linear decision boundaries between the classes, and a neural network with a softmax output layer does the same on top of its learned hidden features. The feature dimension here is 36; in general, the higher the dimension, the more likely it is that the samples can be separated by a linear boundary. I then applied a decision tree and Adaboost to compare their performance.

### Logistic Regression

In [None]:
class Logistic(nn.Module):
    """
    Logistic Regression
    """
    def __init__(self, input_dim=36, output_dim=6):
        super(Logistic, self).__init__()
        self.line = nn.Linear(input_dim, output_dim)
        self.activate = nn.Softmax(dim=-1)

    def forward(self, x):
        h1 = self.line(x)
        return self.activate(h1)


The input dimension of the logistic regression is 36 and the output dimension is 6, with each output dimension representing the probability that the sample belongs to a specific class. Apart from the training settings (learning rate, batch size, number of epochs), logistic regression has no hyperparameters of its own.

### Neural network

In [None]:
class Model(nn.Module):
    """
    Two Layer NN
    """
    def __init__(self, input_dim=36, hidden_dim=72, output_dim=6):
        super(Model, self).__init__()
        self.line1 = nn.Linear(input_dim, hidden_dim)
        self.line2 = nn.Linear(hidden_dim, output_dim)
        self.activate = nn.Softmax(dim=-1)

    def forward(self, x):
        h1 = self.line1(x)
        a1 = torch.relu(h1)
        return self.activate(self.line2(a1))

For the two-layer neural network, the tunable choices are the output dimension of the first linear layer (the hidden dimension) and the activation function applied to it. After several tries, I set the hidden dimension to 72 and used relu as the activation function. The output dimension is 6, which is equal to the number of labels. The structure of my two-layer neural network model is shown below. ![Model Structure](C:\Users\gqf12\Documents\ml\project-7300\nn.png)
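To give a sense of the relative size of the two models, the number of trainable parameters follows from the layer dimensions above. The sketch below is plain arithmetic; `linear_params` is a hypothetical helper, and it assumes each `nn.Linear` layer keeps its default bias term.

```python
def linear_params(in_features, out_features, bias=True):
    """Number of trainable parameters in one fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

# Logistic regression: a single 36 -> 6 layer.
logistic_params = linear_params(36, 6)
# Two-layer network: 36 -> 72 followed by 72 -> 6.
two_layer_params = linear_params(36, 72) + linear_params(72, 6)

print(logistic_params)    # 222
print(two_layer_params)   # 3102
```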

I use mini-batch gradient descent to update the parameters: in each epoch, the training set is processed batch by batch. For each batch $x \in \mathbb{R}^{Batch \times 36}$, the output of logistic regression can be written as:
$$h_1 = xW \in \mathbb{R}^{Batch \times 6}$$
$$y_{score} = \mathrm{softmax}(h_1) \in \mathbb{R}^{Batch \times 6}$$
where $W \in \mathbb{R}^{36 \times 6}$ is the weight matrix (bias terms are omitted here for brevity).

The softmax function is defined as:
$$\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$$
It converts the predicted scores into predicted probabilities.
For the two-layer neural network, the formula is slightly more complicated. Suppose the input of the model is $x \in \mathbb{R}^{Batch \times 36}$; then in the first layer:
$$h_1 = xW_1 \in \mathbb{R}^{Batch \times 72}$$
where $W_1 \in \mathbb{R}^{36 \times 72}$ is the weight matrix of layer 1. Then $h_1$ is passed through the relu activation:
$$a_1 = \mathrm{relu}(h_1) \in \mathbb{R}^{Batch \times 72}$$
The output of the first layer, $a_1$, is sent into the second layer:
$$h_2 = a_1 W_2 \in \mathbb{R}^{Batch \times 6}$$
where $W_2 \in \mathbb{R}^{72 \times 6}$ is the weight matrix of layer 2. Finally, after the softmax function, the output probability for this batch, $y_{score}$, is obtained by:
$$y_{score} = \mathrm{softmax}(h_2) \in \mathbb{R}^{Batch \times 6}$$
The cross-entropy loss function:
$$\mathrm{CrossEntropy}(y, \hat{y}) = -\sum_i y_i \log(\hat{y}_i)$$
is chosen as the loss function, where $y$ is the one-hot true label and $\hat{y}$ is the predicted probability vector. Adam with a learning rate of 0.005 is chosen as the optimizer of the network. In practice, the batch size is set to 64 and the model is trained for 15 epochs.
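The softmax and cross-entropy formulas above can be checked on a single sample. The following is a minimal pure-Python sketch with toy scores (not values produced by the trained models); subtracting the maximum score before exponentiating is a common numerical-stability step and does not change the result.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_idx):
    """With a one-hot target, cross-entropy reduces to -log(p_true)."""
    return -math.log(probs[true_idx])

scores = [2.0, 1.0, 0.1, 0.0, -1.0, 0.5]    # toy h_2 for one sample, 6 classes
probs = softmax(scores)

print([round(p, 3) for p in probs])
print(round(cross_entropy(probs, 0), 3))    # loss if the true class index is 0
print(round(cross_entropy(probs, 4), 3))    # larger loss for an unlikely class
```

The loss is small when the true class receives a high probability and grows as that probability shrinks, which is what drives the parameter updates.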


### Decision Tree

In [None]:

class Node():
    def __init__(self, feature_index=None, threshold=None, left=None, 
                 right=None, info_gain=None, value=None):
        # for decision node
        self.feature_index = feature_index
        self.threshold = threshold
        self.left = left
        self.right = right
        self.info_gain = info_gain
        # for leaf node
        self.value = value

class DecisionTreeClassifier():
    def __init__(self, min_samples_split=2, max_depth=2):
        ''' constructor '''
        # initialize the root of the tree 
        self.root = None
        # stopping conditions
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth
        
    def build_tree(self, dataset, curr_depth=0):
        ''' recursive function to build the tree ''' 
        
        X, Y = dataset[:,:-1], dataset[:,-1]
        num_samples, num_features = np.shape(X)
        
        # split until stopping conditions are met
        if num_samples>=self.min_samples_split and curr_depth<=self.max_depth:
            # find the best split
            best_split = self.get_best_split(dataset, num_samples, num_features)
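            # check that a valid split was found and its information gain is positive
            # (best_split is empty when no threshold yields two non-empty children)
            if best_split.get("info_gain", 0)>0: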
                # recur left
                left_subtree = self.build_tree(best_split["dataset_left"], curr_depth+1)
                # recur right
                right_subtree = self.build_tree(best_split["dataset_right"], curr_depth+1)
                # return decision node
                return Node(best_split["feature_index"], best_split["threshold"], 
                            left_subtree, right_subtree, best_split["info_gain"])
        
        # compute leaf node
        leaf_value = self.calculate_leaf_value(Y)
        # return leaf node
        return Node(value=leaf_value)
    
    def get_best_split(self, dataset, num_samples, num_features):
        ''' function to find the best split '''
        
        # dictionary to store the best split
        best_split = {}
        max_info_gain = -float("inf")
        
        # loop over all the features
        for feature_index in range(num_features):
            feature_values = dataset[:, feature_index]
            possible_thresholds = np.unique(feature_values)
            # loop over all the feature values present in the data
            for threshold in possible_thresholds:
                # get current split
                dataset_left, dataset_right = self.split(dataset, feature_index, threshold)
                # check if childs are not null
                if len(dataset_left)>0 and len(dataset_right)>0:
                    y, left_y, right_y = dataset[:, -1], dataset_left[:, -1], dataset_right[:, -1]
                    # compute information gain
                    curr_info_gain = self.information_gain(y, left_y, right_y, "gini")
                    # update the best split if needed
                    if curr_info_gain>max_info_gain:
                        best_split["feature_index"] = feature_index
                        best_split["threshold"] = threshold
                        best_split["dataset_left"] = dataset_left
                        best_split["dataset_right"] = dataset_right
                        best_split["info_gain"] = curr_info_gain
                        max_info_gain = curr_info_gain
                        
        # return best split
        return best_split
    
    def split(self, dataset, feature_index, threshold):
        ''' function to split the data '''
        
        dataset_left = np.array([row for row in dataset if row[feature_index]<=threshold])
        dataset_right = np.array([row for row in dataset if row[feature_index]>threshold])
        return dataset_left, dataset_right
    
    def information_gain(self, parent, l_child, r_child, mode="entropy"):
        ''' function to compute information gain '''
        
        weight_l = len(l_child) / len(parent)
        weight_r = len(r_child) / len(parent)
        if mode=="gini":
            gain = self.gini_index(parent) - (weight_l*self.gini_index(l_child) + weight_r*self.gini_index(r_child))
        else:
            gain = self.entropy(parent) - (weight_l*self.entropy(l_child) + weight_r*self.entropy(r_child))
        return gain
    
    def entropy(self, y):
        ''' function to compute entropy '''
        
        class_labels = np.unique(y)
        entropy = 0
        for cls in class_labels:
            p_cls = len(y[y == cls]) / len(y)
            entropy += -p_cls * np.log2(p_cls)
        return entropy
    
    def gini_index(self, y):
        ''' function to compute gini index '''
        
        class_labels = np.unique(y)
        gini = 0
        for cls in class_labels:
            p_cls = len(y[y == cls]) / len(y)
            gini += p_cls**2
        return 1 - gini
        
    def calculate_leaf_value(self, Y):
        ''' function to compute leaf node '''
        
        Y = list(Y)
        return max(Y, key=Y.count)
    
    def print_tree(self, tree=None, indent=" "):
        ''' function to print the tree '''
        
        if not tree:
            tree = self.root

        if tree.value is not None:
            print(tree.value)

        else:
            print("X_"+str(tree.feature_index), "<=", tree.threshold, "?", tree.info_gain)
            print("%sleft:" % (indent), end="")
            self.print_tree(tree.left, indent + indent)
            print("%sright:" % (indent), end="")
            self.print_tree(tree.right, indent + indent)
    
    def fit(self, X, Y):
        ''' function to train the tree '''
        dataset = np.concatenate((X, Y), axis=1)
        self.root = self.build_tree(dataset)
    
    def predict(self, X):
        ''' function to predict new dataset '''
        predictions = [self.make_prediction(x, self.root) for x in X]
        return predictions

    def predict_One(self, single_X):
        return self.make_prediction(single_X, self.root) 
    
    def make_prediction(self, x, tree):
        ''' function to predict a single data point '''
        # a leaf stores a class value; use an identity check so class index 0 is handled explicitly
        if tree.value is not None: return tree.value
        feature_val = x[tree.feature_index]
        if feature_val<=tree.threshold:
            return self.make_prediction(x, tree.left)
        else:
            return self.make_prediction(x, tree.right)

### Adaboost

In [None]:

class Adaboost:
    def __init__(self, num_learner: int, Model, random_State=None, 
                 min_samples_split_lower=2, min_samples_split_upper=5, max_depth_lower=2, max_depth_upper=6):
        self.num_learner = num_learner
        self.entry_weights = None
        self.learner_weights = None
        self.sorted_learners = None
        self.map_Label_2_Index = {}
        self.learners = self.create_Learners(
            Model=Model,
            random_State=random_State,
            min_samples_split_lower=min_samples_split_lower,
            min_samples_split_upper=min_samples_split_upper,
            max_depth_lower=max_depth_lower,
            max_depth_upper=max_depth_upper)

    def create_Learners(self, Model, random_State, min_samples_split_lower=2, min_samples_split_upper=5, 
                        max_depth_lower=2, max_depth_upper=6):
        if random_State:
            random.setstate(random_State)
        learners = []
        for i in range(self.num_learner):
            learners.append(
                Model(
                    min_samples_split=random.randint(
                        min_samples_split_lower,
                        min_samples_split_upper),
                    max_depth=random.randint(
                        max_depth_lower,
                        max_depth_upper
                    )))
        return learners

    def __fit_Learners(self, X_Train, Y_Train):
        for learner in self.learners:
            learner.fit(X_Train, Y_Train)

    def fit(self, X_Train, Y_Train):
        self.num_Of_Classes = len(np.unique(Y_Train))
        self.map_Label_2_Index = {
            label: index for index, label in enumerate(np.unique(Y_Train))}
        self.map_Index_2_Label = {
            index: label for index, label in enumerate(np.unique(Y_Train))}

        num_Of_Data = X_Train.shape[0]
        self.__fit_Learners(X_Train=X_Train, Y_Train=Y_Train)
        self.entry_weights = np.full(
            (num_Of_Data,), fill_value=1/num_Of_Data, dtype=np.float32)
        self.learner_weights = np.zeros((self.num_learner,), dtype=np.float32)

        score = [0 for i in range(self.num_learner)]
        for learner_idx, learner in enumerate(self.learners):
            score[learner_idx] = accuracy_score(
                learner.predict(X_Train), Y_Train)
        self.sorted_learners = [l for l, e in sorted(
            zip(self.learners, score), key=lambda pair: pair[1], reverse=True)]

        for learner_idx, learner in enumerate(self.sorted_learners):
            Y_Predicted = learner.predict(X_Train)
            is_wrong = np.array(Y_Predicted != Y_Train.reshape(-1)).astype(int)
            weighted_learner_error = np.sum(
                is_wrong * self.entry_weights)/self.entry_weights.sum()
            self.learner_weights[learner_idx] = max(0, 
                    np.log(1/(weighted_learner_error + 1e-6) - 1) + np.log(
                self.num_Of_Classes - 1))
            alpha_arr = np.full(
                (num_Of_Data,), fill_value=self.learner_weights[learner_idx], 
                                                dtype=np.float32)
            self.entry_weights = self.entry_weights * \
                np.exp(alpha_arr * is_wrong)
            self.entry_weights = self.entry_weights/self.entry_weights.sum()

        self.learner_weights = self.learner_weights/self.learner_weights.sum()

    def predict(self, features):
        return [self.predict_One(feature) for feature in features]

    def predict_One(self, feature):
        pooled_prediction = np.zeros((self.num_Of_Classes,), dtype=np.float32)
        for learner_idx, learner in enumerate(self.sorted_learners):
            predicted_cat = learner.predict_One(feature)
            prediction = np.full(
                (self.num_Of_Classes,), fill_value=-1/(self.num_Of_Classes-1), 
                                                        dtype=np.float32)
            prediction[self.map_Label_2_Index[predicted_cat]] = 1
            pooled_prediction += prediction*self.learner_weights[learner_idx]

        return self.map_Index_2_Label[np.argmax(pooled_prediction)]


## Model Training and Evaluation

In [None]:

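def train_or_eval(model, epochs=15, batch_size=32, lr=5e-3, l2=1e-2):
    # Note: l2 is accepted but not used below (no weight decay is passed to Adam).
    from sklearn.metrics import classification_report, confusion_matrix

    # nn.CrossEntropyLoss applies log-softmax internally and expects raw scores;
    # the models above already return softmax probabilities.
    loss_func = nn.CrossEntropyLoss()
    optim = torch.optim.Adam(model.parameters(), lr=lr)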
    train_loader, test_loader = get_data(batch_size)
    losses = []
    for e in range(epochs):
        total_loss = 0
        y_preds = []
        y_truth = []
        model.train()
        for x, y in train_loader:
            optim.zero_grad()
            y_score = model(x)
            y_pred = y_score.argmax(-1)
            y_preds.extend(y_pred.numpy().tolist())
            y_truth.extend(y.numpy().tolist())
            loss = loss_func(y_score, y)
            total_loss += loss.item() * y_pred.shape[0]
            loss.backward()
            optim.step()
        print("training details for epoch ", e+1)
        losses.append(total_loss / len(y_preds))
        print("loss: ", losses[-1])
        print(classification_report(y_truth, y_preds, target_names=['1','2','3','4','5','7']))
        print(confusion_matrix(y_truth, y_preds))
        
        model.eval()
        y_preds = []
        y_truth = []
        with torch.no_grad():
            for x, y in test_loader:
                y_score = model(x)
                y_pred = y_score.argmax(-1)
                y_preds.extend(y_pred.numpy().tolist())
                y_truth.extend(y.numpy().tolist()) 
        print("testing details for epoch ", e+1)
        print(classification_report(y_truth, y_preds, target_names=['1','2','3','4','5','7']))
        print(confusion_matrix(y_truth, y_preds))
    return losses     


### Logistic Regression

In [None]:
lr_model = Logistic()
lr_losses = train_or_eval(lr_model, epochs=30, batch_size=64, lr=1e-2)
plt.plot(lr_losses)
plt.xlabel("epoch")
plt.ylabel("loss")
plt.show()

For this model, among the 6 classes, the highest precision is 0.78 on class 3 (grey soil) and the highest recall is 0.99 on class 1 (red soil). The model performs poorly on class 2 (cotton crop), class 4 (damp grey soil) and class 5 (soil with vegetation stubble). The overall accuracy on the testing set is 0.64.
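One detail of the training setup may be worth checking when reading these scores. PyTorch's `nn.CrossEntropyLoss` applies log-softmax to its input and therefore expects raw scores, whereas both model classes above already return softmax probabilities. The sketch below is a pure-Python illustration with toy numbers (not the project's code) of how a second softmax flattens the distribution that the loss works with. Whether this explains part of the accuracy gap is an assumption; it could be tested by returning the output of the last linear layer directly.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.0, 0.0, -1.0, 0.5]   # toy raw scores for 6 classes
once = softmax(logits)                      # what the models return
twice = softmax(once)                       # what the loss effectively works with

# With 6 classes, softmax of a probability vector can never exceed e / (e + 5).
upper_bound = math.e / (math.e + 5)

print(round(max(once), 3))
print(round(max(twice), 3))
print(round(upper_bound, 3))
```

Because the second softmax caps the largest probability well below 1, the loss cannot approach zero even for a confidently correct prediction.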

### Neural network

In [None]:
nn_model = Model()
nn_losses = train_or_eval(nn_model, epochs=15, batch_size=64)
plt.plot(nn_losses)
plt.xlabel("epoch")
plt.ylabel("loss")
plt.show()

Among the 6 classes, the model has its highest precision (0.93) and highest recall (0.99) on class 1 (red soil). It performs poorly on class 4 (damp grey soil): none of the samples from class 4 is classified correctly. The overall accuracy on the testing set is 0.82. The loss curve also shows that the training loss of the neural network stabilizes after 11 epochs, whereas the logistic regression needs about 30 epochs.

### Decision Tree classifier

In [None]:
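from StandardScaler import StandardScaler
# metrics used below; imported here so the cell does not rely on train_or_eval's local import
from sklearn.metrics import accuracy_score, confusion_matrix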
train_data = pd.read_csv("sat.trn", sep=' ', header=None)
test_data = pd.read_csv("sat.tst", sep=' ', header=None)
sc = StandardScaler()
x_train = sc.fit_transform(np.array(train_data.iloc[:, :-1]) )
y_train = np.array(list(map(label2idx, train_data.iloc[:, -1])))
x_test = sc.transform(np.array(test_data.iloc[:, :-1]) )
y_test = np.array(list(map(label2idx, test_data.iloc[:, -1])))

model_Tree = DecisionTreeClassifier(min_samples_split=4, max_depth=5)
model_Tree.fit(x_train, y_train.reshape([-1,1]))
model_Tree.print_tree()
Y_Pred_Tree = model_Tree.predict(x_train)
print(
    f"Decision Tree Classifier accuracy for training is {accuracy_score(y_train, Y_Pred_Tree)}")
print("Confusion matrix for training:")
print(confusion_matrix(y_train, Y_Pred_Tree))


pred_Test_Tree = model_Tree.predict(x_test)
print(
    f"Test accuracy score of decision tree is: {accuracy_score(y_test, pred_Test_Tree)}")
print(confusion_matrix(y_test, pred_Test_Tree))


I also used a decision tree classifier for comparison. The maximum depth is set to 5 and the minimum number of samples required to split a node is 4, to balance bias and variance. The decision tree reaches a test accuracy of 0.84, which is comparable to the neural network (0.82) and well above logistic regression (0.64). Moreover, the diagonal of the confusion matrix, which holds the number of correctly classified samples for each class, shows that the decision tree is able to separate all the classes.
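The tree picks each split by maximizing the decrease in Gini impurity, as in the `information_gain` method with `mode="gini"`. The sketch below reproduces that calculation in pure Python on toy class indices; `gini` and `gini_gain` are stand-alone helpers written for this illustration, not the class methods themselves.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_gain(parent, left, right):
    """Impurity decrease obtained by splitting parent into left and right."""
    w_left, w_right = len(left) / len(parent), len(right) / len(parent)
    return gini(parent) - (w_left * gini(left) + w_right * gini(right))

# Toy class indices for illustration only.
parent = [0, 0, 0, 0, 2, 2, 2, 2]
perfect = gini_gain(parent, [0, 0, 0, 0], [2, 2, 2, 2])   # children are pure
useless = gini_gain(parent, [0, 0, 2, 2], [0, 0, 2, 2])   # children mirror the parent

print(perfect)  # 0.5
print(useless)  # 0.0
```

A split with zero gain is rejected by the tree, which then turns the node into a leaf holding the majority class.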

### Adaboost

In [None]:
model_Adabost = Adaboost(num_learner=10, Model=DecisionTreeClassifier, min_samples_split_lower=3,
                         min_samples_split_upper=5, max_depth_lower=4, max_depth_upper=6)
model_Adabost.fit(x_train, y_train.reshape(-1, 1))
Y_Pred_ada = model_Adabost.predict(x_train)
print(
    f"Adaboost accuracy is {accuracy_score(Y_Pred_ada, y_train.reshape(-1))}")
print("confusion matrix:")
print(confusion_matrix(y_train, Y_Pred_ada))

I also used the Adaboost algorithm with decision trees as base learners. The model has 10 decision tree learners, with the minimum number of samples for a split drawn from 3 to 5 and the maximum depth drawn from 4 to 6. Adaboost combines multiple weak decision tree models into a weighted vote, which is intended to limit overfitting. Adaboost reaches an accuracy of 0.85, which is higher than the accuracy of the decision tree, logistic regression and neural network on this dataset. Note that the cell above computes this accuracy on the training set, so it is not directly comparable to the test accuracies reported for the other models.
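The learner weights and sample re-weighting inside the `fit` method can be followed on a small example. The sketch below is a pure-Python restatement of those two formulas (a multi-class, SAMME-style weight clipped at zero); the function names and the toy numbers are assumptions made for this illustration.

```python
import math

def learner_weight(weighted_error, num_classes, eps=1e-6):
    """Multi-class learner weight, clipped at zero as in the Adaboost class."""
    alpha = math.log(1 / (weighted_error + eps) - 1) + math.log(num_classes - 1)
    return max(0.0, alpha)

def update_entry_weights(weights, is_wrong, alpha):
    """Up-weight misclassified samples by exp(alpha), then renormalize."""
    raised = [w * math.exp(alpha * wrong) for w, wrong in zip(weights, is_wrong)]
    total = sum(raised)
    return [w / total for w in raised]

alpha = learner_weight(0.3, num_classes=6)
weights = update_entry_weights([0.25] * 4, [0, 1, 0, 0], alpha)

print(round(alpha, 3))
print([round(w, 3) for w in weights])
print(learner_weight(0.9, num_classes=6))  # 0.0: worse than random guessing over 6 classes
```

A learner whose weighted error exceeds the random-guess level of 5/6 receives a weight of zero and does not influence the final vote.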

## Model Selection
Based on the loss curves shown in Figures 5 and 6, once the loss has stabilized, the training loss of the two-layer neural network is smaller than that of logistic regression. The loss curve of the neural network also converges faster than that of logistic regression.
In terms of classification accuracy, the overall accuracy of logistic regression on the testing set is only 0.64, while the neural network reaches 0.82. Looking at the per-class results, logistic regression performs poorly on classes 2, 4, and 5, whereas the neural network performs poorly only on class 4. The precision and recall on the other classes also show that the neural network classifies the samples better. I therefore select the two-layer neural network as the final classification model.

## Conclusion and Recommendation
In this report, I chose the Landsat Satellite dataset. The features of the samples in this dataset are all pixel values. I did an exploratory analysis of the distribution of pixel values for several features and checked whether the distribution of labels was almost the same for the training and testing sets. After that, I applied logistic regression and a neural network as two candidate classification models and selected the neural network as the final model for classifying the Landsat Satellite dataset, based on their loss curves and classification reports.
According to the confusion matrix of the final model, among the 211 samples from class 4, 76 are misclassified as class 3 (grey soil) and 123 are misclassified as class 7 (very damp grey soil). Those two classes are probably similar to damp grey soil, which makes class 4 difficult for this model to recognize.
I recommend that this neural network model be used for Landsat Satellite classification for all soil classes except damp grey soil. It performs especially well on red soil and cotton crops.
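The class 4 figures quoted above can be turned into proportions with a few lines of arithmetic. The sketch below uses only the three counts stated in this section; the variable names are chosen for this illustration.

```python
class4_total = 211          # class 4 test samples, as quoted above
to_class3 = 76              # predicted as class 3 (grey soil)
to_class7 = 123             # predicted as class 7 (very damp grey soil)
remaining = class4_total - to_class3 - to_class7   # all other predictions

for name, count in [("class 3", to_class3), ("class 7", to_class7), ("other", remaining)]:
    print(f"{name}: {count} ({count / class4_total:.1%})")
```

Together, classes 3 and 7 absorb 199 of the 211 class 4 samples, which is consistent with the three grey soil classes being hard to tell apart.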
