# Exploring the historical truck transactional data

## Configuration

In [6]:
import pandas as pd
import plotly.express as px

In [7]:
trucks = pd.read_csv('historical_data/truck_hist_cleaned.csv')
trucks.sample(10)

|  | timestamp | type | total | truck_id |
|---|---|---|---|---|
| 2838 | 2025-01-17 12:24:00 | cash | 4.99 | 3 |
| 1437 | 2025-01-16 21:43:00 | cash | 8.7 | 2 |
| 934 | 2025-01-19 13:19:00 | cash | 7.0 | 1 |
| 3880 | 2025-01-15 14:51:00 | cash | 5.0 | 5 |
| 611 | 2025-01-16 14:50:00 | card | 7.0 | 1 |
| 3695 | 2025-01-13 18:05:00 | card | 5.0 | 5 |
| 4767 | 2025-01-15 16:45:00 | cash | 5.99 | 6 |
| 465 | 2025-01-15 12:42:00 | cash | 7.0 | 1 |
| 4380 | 2025-01-18 14:24:00 | cash | 5.0 | 5 |
| 163 | 2025-01-14 12:46:00 | cash | 7.0 | 1 |


In [9]:
trucks['timestamp'] = pd.to_datetime(trucks['timestamp'])
trucks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5278 entries, 0 to 5277
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   timestamp  5278 non-null   datetime64[ns]
 1   type       5278 non-null   object        
 2   total      5278 non-null   float64       
 3   truck_id   5278 non-null   int64         
dtypes: datetime64[ns](1), float64(1), int64(1), object(1)
memory usage: 165.1+ KB
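
The conversion can also happen at load time. A minimal sketch, assuming the CSV holds only the four columns listed by `info()`; the inline text stands in for `historical_data/truck_hist_cleaned.csv` and reuses three rows from the sample output:

```python
import io
import pandas as pd

# Stand-in for the real file: three rows copied from the sample output above.
# Assumption: the CSV stores only these four columns (no saved index column).
csv_text = """timestamp,type,total,truck_id
2025-01-17 12:24:00,cash,4.99,3
2025-01-16 21:43:00,cash,8.7,2
2025-01-19 13:19:00,cash,7.0,1
"""

# parse_dates converts the named column while reading, so no separate
# pd.to_datetime call is needed afterwards.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])
print(sample.dtypes)
```

With the real file, the same `parse_dates=["timestamp"]` argument would be passed alongside the file path.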


In [10]:
trucks.describe()

|  | timestamp | total | truck_id |
|---|---|---|---|
| count | 5278 | 5278.0 | 5278.0 |
| mean | 2025-01-16 13:26:42.470632704 | 6.358513 | 3.276241 |
| min | 2025-01-13 09:06:00 | 0.99 | 1.0 |
| 25% | 2025-01-14 18:07:15 | 4.99 | 2.0 |
| 50% | 2025-01-16 14:31:00 | 6.99 | 3.0 |
| 75% | 2025-01-18 09:38:45 | 7.99 | 5.0 |
| max | 2025-01-19 23:58:00 | 12.99 | 6.0 |
| std |  | 2.304055 | 1.687172 |


## Explore the data

1. Which truck has the highest number of transactions?

In [18]:
trucks.groupby(['truck_id']).count().reset_index()

|  | truck_id | timestamp | type | total |
|---|---|---|---|---|
| 0 | 1 | 976 | 976 | 976 |
| 1 | 2 | 921 | 921 | 921 |
| 2 | 3 | 1444 | 1444 | 1444 |
| 3 | 4 | 283 | 283 | 283 |
| 4 | 5 | 914 | 914 | 914 |
| 5 | 6 | 740 | 740 | 740 |


Grouping by `truck_id` and counting rows shows that truck #3 has the most transactions (1,444). Truck #1 is second with 976, and truck #4 has the fewest with 283.
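
The busiest and quietest trucks can also be read off programmatically rather than by eye. A minimal sketch, assuming a stand-in frame rebuilt from the per-truck counts in the table above (`stand_in`, `busiest`, and `quietest` are illustrative names, not part of the original notebook):

```python
import numpy as np
import pandas as pd

# Per-truck transaction counts copied from the table above.
per_truck = {1: 976, 2: 921, 3: 1444, 4: 283, 5: 914, 6: 740}

# Stand-in for the real frame: one row per transaction, truck_id only.
stand_in = pd.DataFrame(
    {"truck_id": np.repeat(list(per_truck), list(per_truck.values()))}
)

counts = stand_in.groupby("truck_id").size()  # rows per truck
busiest, quietest = counts.idxmax(), counts.idxmin()
print(f"Most transactions: truck #{busiest} ({counts.max()})")
print(f"Fewest transactions: truck #{quietest} ({counts.min()})")
```

`size()` returns a single count per group, which avoids the repeated columns that `count()` produces when no values are missing.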

2. Which truck has the lowest total transaction value?

In [21]:
trucks.groupby('truck_id')['total'].sum().nsmallest(1)

truck_id
4    770.17
Name: total, dtype: float64

Truck #4, which also has the fewest transactions, has the lowest total transaction value at £770.17.
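
Transaction count and total value can be computed side by side in one pass, which makes the link between the two easier to see. A minimal sketch on the ten sampled rows shown near the top of the notebook; the column names `transactions` and `revenue` are illustrative, and because truck #4 does not appear in that sample, the result here differs from the full-data answer:

```python
import pandas as pd

# The ten sampled rows shown near the top of the notebook (not the full dataset).
sample = pd.DataFrame({
    "truck_id": [3, 2, 1, 5, 1, 5, 6, 1, 5, 1],
    "total": [4.99, 8.7, 7.0, 5.0, 7.0, 5.0, 5.99, 7.0, 5.0, 7.0],
})

# Named aggregation: one output column per statistic.
summary = sample.groupby("truck_id")["total"].agg(
    transactions="count",
    revenue="sum",
)
print(summary)
print("Lowest revenue in the sample: truck", summary["revenue"].idxmin())
```

`idxmin()` returns the group label itself, so the truck is identified directly instead of being inferred from the minimum value.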

3. What is the average transaction value?

In [26]:
trucks['total'].mean()

np.float64(6.358512694202348)

The average transaction value across all six trucks is £6.36.

4. What is the average transaction value for each truck?

In [29]:
trucks.groupby(['truck_id'])['total'].mean().reset_index()

|  | truck_id | total |
|---|---|---|
| 0 | 1 | 7.753791 |
| 1 | 2 | 8.040717 |
| 2 | 3 | 5.866039 |
| 3 | 4 | 2.721449 |
| 4 | 5 | 5.449344 |
| 5 | 6 | 5.899459 |


The table above shows the average transaction value for each truck: truck #2 has the highest at £8.04, and truck #4 the lowest at £2.72.
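
The per-truck averages tie back to the earlier answers: count multiplied by average gives each truck's total, and weighting the averages by transaction count gives the overall average. A rough cross-check using figures copied from the tables in this notebook (the averages are rounded to six decimals, so the results are approximate):

```python
import pandas as pd

truck_ids = [1, 2, 3, 4, 5, 6]
# Transaction counts from question 1 and average values from the table above.
counts = pd.Series([976, 921, 1444, 283, 914, 740], index=truck_ids)
means = pd.Series(
    [7.753791, 8.040717, 5.866039, 2.721449, 5.449344, 5.899459],
    index=truck_ids,
)

totals = counts * means                     # approximate total value per truck
overall_mean = totals.sum() / counts.sum()  # transaction-weighted average

print(totals.round(2))
print(f"Lowest total: truck #{totals.idxmin()}")
print(f"Overall average: {overall_mean:.2f}")
```

The check reproduces truck #4's total of about £770.17 and the overall average of about £6.36, so the four answers are mutually consistent.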

5. What proportion of transactions use cash?

In [48]:
cash_share = (trucks['type'] == 'cash').mean() * 100
print(f"Proportion of transactions that use cash: {cash_share:.2f}%")

Proportion of transactions that use cash: 50.66%
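
The same calculation extends to every payment type at once with `value_counts(normalize=True)`. A minimal sketch on the payment types of the ten sampled rows shown near the top of the notebook; ten rows are far too few to be representative, so the shares here will not match the full-data figure of 50.66%:

```python
import pandas as pd

# Payment types from the ten sampled rows shown near the top of the notebook.
sample_types = pd.Series(
    ["cash", "cash", "cash", "cash", "card", "card", "cash", "cash", "cash", "cash"],
    name="type",
)

# normalize=True returns fractions instead of raw counts; scale to percent.
shares = sample_types.value_counts(normalize=True) * 100
print(shares.round(2))
```

On the full frame the equivalent call would be `trucks['type'].value_counts(normalize=True)`.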
