
**Association Rule Mining (ARM)** is a fundamental data mining technique used to discover interesting relationships, patterns, and associations among items in large datasets. This technique is particularly valuable in agricultural research where understanding the relationships between different farming practices, fertilizers, and seasonal patterns can lead to improved crop yields and more efficient resource utilization.

**Background and Motivation**

In agricultural systems, the relationship between fertilizer usage and seasonal variations is crucial for optimizing crop production. Cassava, being a staple crop in many regions, requires careful management of nutrients across different growing seasons. Understanding these associations can help farmers:

1. **Optimize fertilizer application timing** across different seasons
2. **Identify complementary fertilizer combinations** that work well together
3. **Reduce costs** by avoiding redundant or conflicting fertilizer applications
4. **Improve yield predictions** based on seasonal fertilizer patterns

**Association Rule Mining Overview**

Association Rule Mining follows the general form: **IF {antecedent} THEN {consequent}**

Where:
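- **Antecedent**: The condition or set of conditions (e.g., a fertilizer treatment such as F1100)
- **Consequent**: The outcome associated with the antecedent (e.g., the season in which that treatment appears)
- **Support**: Fraction of transactions in the dataset that contain both the antecedent and the consequent
- **Confidence**: Estimated probability that the consequent occurs given the antecedent, i.e. support(antecedent and consequent) / support(antecedent)
- **Lift**: Confidence divided by the support of the consequent. A lift of 1 means the two sides are independent, a lift above 1 a positive association, and a lift below 1 a negative one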
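
To make the three metrics concrete, here is a minimal sketch that computes them by hand for one rule. The transactions and the item names `A`, `B`, `C` are hypothetical and are not taken from the cassava dataset; exact fractions are used so the arithmetic is easy to check.

```python
from fractions import Fraction

# Hypothetical toy transactions (not from the cassava dataset).
toy_transactions = [
    {"A", "B"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(itemset <= t for t in transactions)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """confidence / support(consequent); 1 means independence."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


# Rule: IF {A} THEN {B}
rule_support = support({"A", "B"}, toy_transactions)
rule_confidence = confidence({"A"}, {"B"}, toy_transactions)
rule_lift = lift({"A"}, {"B"}, toy_transactions)

print(float(rule_support), float(rule_confidence), float(rule_lift))
```

In this toy case `A` and `B` co-occur in 3 of 5 transactions, `A` occurs in 4, and `B` occurs in 4, so the lift comes out slightly below 1.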

**Methodology**

This analysis employs the **Apriori Algorithm**, one of the most widely used algorithms for association rule mining. The Apriori algorithm works by:

1. **Finding frequent itemsets** that meet a minimum support threshold
2. **Generating association rules** from these frequent itemsets
3. **Evaluating rule quality** using confidence and lift metrics
4. **Filtering rules** based on minimum thresholds
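
The four steps above can be sketched in plain Python. This is a simplified illustration of the level-wise idea, not the `mlxtend` implementation used in this notebook; the function names `apriori_sketch` and `rules_from` and the four toy transactions are hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {
        frozenset([i]) for i in items if support(frozenset([i])) >= min_support
    }
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c
            for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def rules_from(frequent, min_confidence):
    """Generate (antecedent, consequent, support, confidence, lift) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                conf = sup / frequent[antecedent]
                lift = conf / frequent[consequent]
                if conf >= min_confidence:
                    rules.append((antecedent, consequent, sup, conf, lift))
    return rules


# Hypothetical toy data in the same [season, fertiliser] shape used later.
toy = [["1", "F1100"], ["1", "F2150"], ["2", "F1100"], ["2", "F2150"]]
toy_frequent = apriori_sketch(toy, min_support=0.25)
toy_rules = rules_from(toy_frequent, min_confidence=0.1)
print(len(toy_frequent), len(toy_rules))
```

The pruning step relies on the property that every subset of a frequent itemset is itself frequent, which is also why `frequent[antecedent]` is always available during rule generation.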

In [1]:
# Import libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder


Step 1: Read the Excel file into a DataFrame

In [2]:

df = pd.read_excel("C:/Users/HP/Big Data/Assignments/Assignment2/Cassava_Yield_Data.xlsx")

In [3]:
print(df.head())

   Sesn  locn  block  rep tillage    ferT  Plants_harvested  No_bigtubers  \
0     2     1      1    1     conv  F2150                28             0   
1     2     1      1    1     conv  F1100                28             0   
2     2     1      1    1     conv  F3200                28             2   
3     2     1      1    1     conv  F5300                28             6   
4     2     1      1    1     conv  F4250                28             3   

   Weigh_bigtubers  No_mediumtubers  Weight_mediumtubers  No_smalltubers  \
0              0.0               61                  2.5             319   
1              0.0              110                  4.6             260   
2              0.2              115                  5.2             319   
3              0.7               60                  2.7             303   
4              0.3               82                  3.4             332   

   Weight_smalltubers  Totaltuberno  AV_tubers_Plant  Total_tubweight  \
0      

In [7]:
print(df.shape)

(115, 20)


In [9]:
# checking for missing values or data
df.isna().sum()

Sesn                     0
locn                     0
block                    0
rep                      0
tillage                  0
ferT                     0
Plants_harvested         0
No_bigtubers             0
Weigh_bigtubers          0
No_mediumtubers          0
Weight_mediumtubers      0
No_smalltubers           0
Weight_smalltubers       0
Totaltuberno             0
AV_tubers_Plant          0
Total_tubweight          0
plotsize                 0
HEC                      0
TotalWeightperhectare    0
TotalTuberperHectare     0
dtype: int64

Step 2: Convert each (season, fertiliser) pair into a transaction of strings

In [10]:
# Each row becomes one two-item transaction: [season, fertiliser], both as strings
transactions = df[['Sesn','ferT']].astype(str).values.tolist()
print(transactions)

[['2', 'F2150'], ['2', 'F1100'], ['2', 'F3200'], ['2', 'F5300'], ['2', 'F4250'], ['2', 'F5300'], ['2', 'F3200'], ['2', 'F4250'], ['2', 'F1100'], ['2', 'F2150'], ['2', 'F4250'], ['2', 'F5300'], ['2', 'F2150'], ['2', 'F3200'], ['2', 'F1100'], ['2', 'F2150'], ['2', 'F1100'], ['2', 'F3200'], ['2', 'F5300'], ['2', 'F4250'], ['2', 'F5300'], ['2', 'F3200'], ['2', 'F4250'], ['2', 'F1100'], ['2', 'F2150'], ['2', 'F4250'], ['2', 'F5300'], ['2', 'F2150'], ['2', 'F3200'], ['2', 'F1100'], ['1', 'F1100'], ['1', 'F3200'], ['1', 'F2150'], ['1', 'F4250'], ['1', 'F5300'], ['1', 'F1100'], ['1', 'F3200'], ['1', 'F2150'], ['1', 'F4250'], ['1', 'F5300'], ['1', 'F1100'], ['1', 'F3200'], ['1', 'F2150'], ['1', 'F4250'], ['1', 'F5300'], ['1', 'F1100'], ['1', 'F3200'], ['1', 'F2150'], ['1', 'F4250'], ['1', 'F5300'], ['1', 'F1100'], ['1', 'F3200'], ['1', 'F2150'], ['1', 'F4250'], ['1', 'F5300'], ['1', 'F1100'], ['1', 'F3200'], ['1', 'F2150'], ['1', 'F4250'], ['1', 'F5300'], ['1', 'F1100'], ['1', 'F3200'], ['1', '

Step 3: One-hot encode

In [11]:
# Encode the transactions as a boolean matrix: one row per transaction, one column per item
te = TransactionEncoder()
te_data = te.fit(transactions).transform(transactions)
df1 = pd.DataFrame(te_data, columns=te.columns_)
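
As an illustration of what the encoding step produces, the sketch below builds the same kind of boolean matrix by hand. The three-row `sample` is hypothetical, and the sorted column order is an assumption about how `TransactionEncoder` arranges `columns_`, so treat this as an approximation of its behaviour rather than a description of its internals.

```python
# Hypothetical mini-sample in the same [season, fertiliser] shape as `transactions`.
sample = [["2", "F2150"], ["2", "F1100"], ["1", "F1100"]]

# One column per distinct item, in sorted order (assumed to match te.columns_).
columns = sorted({item for transaction in sample for item in transaction})

# One row per transaction: True where the transaction contains the item.
encoded = [[item in transaction for item in columns] for transaction in sample]

print(columns)
for row in encoded:
    print(row)
```

Each row has exactly two `True` values, one in a season column and one in a fertiliser column, because every transaction here holds exactly one season and one fertiliser.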


Step 4: Frequent itemsets

In [13]:
# Keep itemsets that occur in at least 5% of the transactions
frequent_itemsets = apriori(df1, min_support=0.05, use_colnames=True)

Step 5: Association rules

In [14]:
# Derive rules from the frequent itemsets, keeping those with confidence >= 0.1
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.1)

Step 6: Unique fertilisers and seasons

In [15]:
# Item labels used to tell fertiliser items apart from season items
ferT = df['ferT'].astype(str).unique().tolist()
Sesn = df['Sesn'].astype(str).unique().tolist()

Step 7: Filter fertiliser ↔ season associations

In [19]:
# Keep rules with a fertiliser on one side and a season on the other, in either direction
fert_season_rules = rules[
    (rules['antecedents'].apply(lambda x: any(i in x for i in ferT)) &
     rules['consequents'].apply(lambda x: any(i in x for i in Sesn))) |
    (rules['antecedents'].apply(lambda x: any(i in x for i in Sesn)) &
     rules['consequents'].apply(lambda x: any(i in x for i in ferT)))
]

print(fert_season_rules[['antecedents','consequents','support','confidence','lift']])

   antecedents consequents   support  confidence  lift
0      (F1100)         (1)  0.095652    0.478261   1.0
1          (1)     (F1100)  0.095652    0.200000   1.0
2      (F2150)         (1)  0.095652    0.478261   1.0
3          (1)     (F2150)  0.095652    0.200000   1.0
4      (F3200)         (1)  0.095652    0.478261   1.0
5          (1)     (F3200)  0.095652    0.200000   1.0
6      (F4250)         (1)  0.095652    0.478261   1.0
7          (1)     (F4250)  0.095652    0.200000   1.0
8      (F5300)         (1)  0.095652    0.478261   1.0
9          (1)     (F5300)  0.095652    0.200000   1.0
10         (2)     (F1100)  0.104348    0.200000   1.0
11     (F1100)         (2)  0.104348    0.521739   1.0
12         (2)     (F2150)  0.104348    0.200000   1.0
13     (F2150)         (2)  0.104348    0.521739   1.0
14         (2)     (F3200)  0.104348    0.200000   1.0
15     (F3200)         (2)  0.104348    0.521739   1.0
16         (2)     (F4250)  0.104348    0.200000   1.0
17     (F4

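Every fertilizer-season rule shown has a lift of 1.0, so each fertilizer is exactly as likely to occur in a given season as it is in the dataset overall. Fertilizer and season are therefore statistically independent in the Cassava_Yield dataset, and the rules reveal no meaningful association between them. The confidence values point the same way: every season → fertilizer rule shown has a confidence of 0.20, meaning each of the five fertilizer treatments accounts for one fifth of the plots within a season, as expected when treatments are spread evenly across seasons.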
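
The lift of 1.0 can be checked by hand for one fertilizer. The sketch below uses counts inferred from the printed output rather than read from the data: 115 rows (from `df.shape`), 11 and 12 F1100 plots in seasons 1 and 2 (support × 115), and 55 and 60 plots per season (pair count ÷ confidence 0.20). These counts are assumptions derived from rounded figures and should be confirmed against the DataFrame.

```python
from fractions import Fraction

n_rows = 115                          # from df.shape
f1100_by_season = {"1": 11, "2": 12}  # inferred: support * n_rows
season_total = {"1": 55, "2": 60}     # inferred: pair count / confidence 0.20
f1100_total = sum(f1100_by_season.values())

lifts = {}
for season, pair_count in f1100_by_season.items():
    observed = Fraction(pair_count, n_rows)
    # Support expected if fertilizer and season were independent.
    expected = Fraction(f1100_total, n_rows) * Fraction(season_total[season], n_rows)
    lifts[season] = observed / expected

for season, value in lifts.items():
    print(f"lift(F1100, season {season}) = {float(value)}")
```

Under these inferred counts the observed support equals the support expected under independence in both seasons, which is what a lift of exactly 1 means.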