# STINTSY MCO

In [12]:
# dependencies
import pandas as pd

In [13]:
# load the data
data = pd.read_csv('dataset/pumpkin_seeds.csv', encoding='latin1')

# Section 1: Introduction
(Possible problem statement)
In the agricultural industry, the ability to classify pumpkin seeds by type is essential for efficient seed sorting, quality control, and market distribution. Accurate seed type identification allows farmers and distributors to ensure that the seeds meet specific requirements for growth, yield, and commercial viability.

In this project, we aim to develop a machine learning model that classifies pumpkin seeds into two types: Cercevelik and Urgup Sivrisi. The dataset describes each seed with twelve morphological features, including area, perimeter, axis lengths, compactness, and eccentricity. We will explore, preprocess, and analyze these features, then build a model that predicts a seed's type from its physical characteristics, giving farmers and agricultural businesses a tool to streamline sorting and distribution.

# Section 2: Description of the dataset

In [14]:
print(data.dtypes)

Area                   int64
Perimeter            float64
Major_Axis_Length    float64
Minor_Axis_Length    float64
Convex_Area            int64
Equiv_Diameter       float64
Eccentricity         float64
Solidity             float64
Extent               float64
Roundness            float64
Aspect_Ration        float64
Compactness          float64
Class                 object
dtype: object


In [15]:
data.head()

Unnamed: 0,Area,Perimeter,Major_Axis_Length,Minor_Axis_Length,Convex_Area,Equiv_Diameter,Eccentricity,Solidity,Extent,Roundness,Aspect_Ration,Compactness,Class
0,56276,888.242,326.1485,220.2388,56831,267.6805,0.7376,0.9902,0.7453,0.8963,1.4809,0.8207,erevelik
1,76631,1068.146,417.1932,234.2289,77280,312.3614,0.8275,0.9916,0.7151,0.844,1.7811,0.7487,erevelik
2,71623,1082.987,435.8328,211.0457,72663,301.9822,0.8749,0.9857,0.74,0.7674,2.0651,0.6929,erevelik
3,66458,992.051,381.5638,222.5322,67118,290.8899,0.8123,0.9902,0.7396,0.8486,1.7146,0.7624,erevelik
4,66107,998.146,383.8883,220.4545,67117,290.1207,0.8187,0.985,0.6752,0.8338,1.7413,0.7557,erevelik


| Attribute             | Description                                                                                           | Data Type |
|-----------------------|-------------------------------------------------------------------------------------------------------|-----------|
| **Area**              | Number of pixels within the borders of a pumpkin seed                                                 | int64     |
| **Perimeter**         | Circumference in pixels of a pumpkin seed                                                             | float64   |
| **Major_Axis_Length** | Large axis distance of a pumpkin seed                                                                 | float64   |
| **Minor_Axis_Length** | Small axis distance of a pumpkin seed                                                                 | float64   |
| **Convex_Area**       | Number of pixels of the smallest convex shell at the region formed by the pumpkin seed                | int64     |
| **Equiv_Diameter**    | Computed as $\sqrt{\frac{4a}{\pi}}$, where $a$ is the area of the pumpkin seed                                 | float64   |
| **Eccentricity**      | Eccentricity of a pumpkin seed                                                                        | float64   |
| **Solidity**          | Convex condition of the pumpkin seeds                                                                 | float64   |
| **Extent**            | Ratio of a pumpkin seed area to the bounding box pixels                                               | float64   |
| **Roundness**         | Ovality of pumpkin seeds without considering the distortion of the edges                              | float64   |
| **Aspect_Ration**     | Aspect ratio of the pumpkin seeds (the dataset spells this column `Aspect_Ration`)                    | float64   |
| **Compactness**       | Proportion of the area of the pumpkin seed relative to the area of the circle with the same circumference | float64 |
| **Class**             | Seed type, either Cercevelik or Urgup Sivrisi                                                         | object    |
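
Several of these attributes appear to be derived from the basic measurements. The sketch below recomputes them for the first row shown by `data.head()` using standard shape-descriptor formulas. The formulas are assumptions checked only against the sample rows, not definitions taken from the dataset's documentation, and the names `row` and `derived` are illustrative.

```python
import math

# First row of data.head(), copied from the output above
row = {
    "Area": 56276,
    "Perimeter": 888.242,
    "Major_Axis_Length": 326.1485,
    "Minor_Axis_Length": 220.2388,
    "Convex_Area": 56831,
}

# Assumed formulas (standard shape descriptors, not confirmed by the dataset authors)
equiv_diameter = math.sqrt(4 * row["Area"] / math.pi)
derived = {
    "Equiv_Diameter": equiv_diameter,
    "Eccentricity": math.sqrt(
        1 - (row["Minor_Axis_Length"] / row["Major_Axis_Length"]) ** 2
    ),
    "Solidity": row["Area"] / row["Convex_Area"],
    "Roundness": 4 * math.pi * row["Area"] / row["Perimeter"] ** 2,
    "Aspect_Ration": row["Major_Axis_Length"] / row["Minor_Axis_Length"],
    "Compactness": equiv_diameter / row["Major_Axis_Length"],
}

# Print each recomputed value for comparison with the stored columns
for name, value in derived.items():
    print(f"{name}: {value:.4f}")
```

For this row, the recomputed values agree with the stored ones to about three decimal places. Under these assumed formulas, the stored `Compactness` matches `Equiv_Diameter / Major_Axis_Length`, while the circle-area ratio $\frac{4\pi a}{p^2}$ (with $p$ the perimeter) matches the stored `Roundness`, at least for the sample rows. This suggests that several columns are strongly related to one another, which may matter during feature selection.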


# Section 3: List of dependencies

# Section 4: Data preprocessing and cleaning

In [16]:
data.head()

Unnamed: 0,Area,Perimeter,Major_Axis_Length,Minor_Axis_Length,Convex_Area,Equiv_Diameter,Eccentricity,Solidity,Extent,Roundness,Aspect_Ration,Compactness,Class
0,56276,888.242,326.1485,220.2388,56831,267.6805,0.7376,0.9902,0.7453,0.8963,1.4809,0.8207,erevelik
1,76631,1068.146,417.1932,234.2289,77280,312.3614,0.8275,0.9916,0.7151,0.844,1.7811,0.7487,erevelik
2,71623,1082.987,435.8328,211.0457,72663,301.9822,0.8749,0.9857,0.74,0.7674,2.0651,0.6929,erevelik
3,66458,992.051,381.5638,222.5322,67118,290.8899,0.8123,0.9902,0.7396,0.8486,1.7146,0.7624,erevelik
4,66107,998.146,383.8883,220.4545,67117,290.1207,0.8187,0.985,0.6752,0.8338,1.7413,0.7557,erevelik


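The `Class` labels contain Turkish characters (Ç, ç, Ü, ü) that were not decoded correctly when the file was read with `encoding='latin1'`, so they appear as unprintable bytes. We replace them with ASCII-only labels.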

In [17]:
print(data['Class'].unique())

['\x82er\x8develik' '\x86rg\x9fp Sivrisi']


In [None]:
# Map both mis-decoded class labels to ASCII-only names in a single pass
data['Class'] = data['Class'].replace({
    '\x82er\x8develik': 'Cercevelik',
    '\x86rg\x9fp Sivrisi': 'Sivrisi',
})

print(data['Class'].unique())

['Cercevelik' 'Sivrisi']
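
The escaped bytes in the raw labels (`\x82`, `\x8d`, `\x86`, `\x9f`) are consistent with the CSV having been saved in Mac OS Roman, where those bytes encode Ç, ç, Ü, and ü. As an alternative to hard-coding the replacements, the following minimal sketch reverses the `latin1` decoding and decodes the bytes again. It assumes that encoding guess is right, and the helper name `redecode` is hypothetical.

```python
def redecode(label: str, wrong: str = "latin1", right: str = "mac_roman") -> str:
    """Undo a wrong decoding: turn the text back into bytes, then decode again."""
    return label.encode(wrong).decode(right)


# Labels exactly as printed by data['Class'].unique() before cleaning
garbled = ['\x82er\x8develik', '\x86rg\x9fp Sivrisi']

# Assumption: the file's true encoding is Mac OS Roman
fixed = [redecode(label) for label in garbled]
print(fixed)  # ['Çerçevelik', 'Ürgüp Sivrisi']
```

If the guess holds, passing `encoding='mac_roman'` to `pd.read_csv` would load the labels correctly in the first place, and `data['Class'].map(redecode)` would apply the same fix to an already loaded column. The ASCII labels used in this notebook remain a reasonable choice when downstream tools handle non-ASCII text poorly.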
