# Exploring the historical truck transactional data

## Configuration

In [1]:
import pandas as pd
import plotly.express as px

In [2]:
trucks = pd.read_csv('historical_data/truck_hist_cleaned.csv')
trucks.sample(10)

Unnamed: 0,timestamp,type,total,truck_id
2270,2025-01-14 12:18:00,card,7.99,3
2690,2025-01-16 16:04:00,cash,12.99,3
2177,2025-01-13 16:26:00,card,7.99,3
162,2025-01-14 12:44:00,card,7.0,1
4612,2025-01-15 11:58:00,cash,6.99,6
1457,2025-01-16 23:07:00,cash,8.5,2
2121,2025-01-13 15:51:00,cash,1.99,3
433,2025-01-15 11:08:00,cash,8.9,1
3484,2025-01-16 10:40:00,cash,0.99,4
2138,2025-01-13 16:03:00,card,7.99,3


In [3]:
trucks['timestamp'] = pd.to_datetime(trucks['timestamp'])
trucks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5278 entries, 0 to 5277
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   timestamp  5278 non-null   datetime64[ns]
 1   type       5278 non-null   object        
 2   total      5278 non-null   float64       
 3   truck_id   5278 non-null   int64         
dtypes: datetime64[ns](1), float64(1), int64(1), object(1)
memory usage: 165.1+ KB
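
The timestamp conversion can also be done while the file is read, rather than in a separate step. This is a minimal sketch, assuming the CSV has the four columns listed by `info()`. The three rows below are invented stand-ins so the snippet runs without the real file.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for truck_hist_cleaned.csv (values invented).
csv_text = """timestamp,type,total,truck_id
2025-01-13 09:06:00,card,7.99,3
2025-01-14 12:44:00,cash,7.00,1
2025-01-15 11:58:00,cash,6.99,6
"""

# parse_dates converts the named column to datetime while the file is read.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])
print(sample.dtypes)
```

With the real file, the same `parse_dates` argument would go into the `pd.read_csv` call from cell [2].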


In [4]:
trucks.describe()

Unnamed: 0,timestamp,total,truck_id
count,5278,5278.0,5278.0
mean,2025-01-16 13:26:42.470632704,6.358513,3.276241
min,2025-01-13 09:06:00,0.99,1.0
25%,2025-01-14 18:07:15,4.99,2.0
50%,2025-01-16 14:31:00,6.99,3.0
75%,2025-01-18 09:38:45,7.99,5.0
max,2025-01-19 23:58:00,12.99,6.0
std,,2.304055,1.687172


## Explore the data

1. Which truck has the highest number of transactions?

In [5]:
trucks.groupby(['truck_id']).count().reset_index()

Unnamed: 0,truck_id,timestamp,type,total
0,1,976,976,976
1,2,921,921,921
2,3,1444,1444,1444
3,4,283,283,283
4,5,914,914,914
5,6,740,740,740


Grouping by 'truck_id' and counting the rows shows that truck #3 has the most transactions, at 1444. Truck #1 is second with 976, and truck #4 has the fewest, with 283.
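
The busiest and quietest trucks can also be picked out in code instead of read off the table. This is a minimal sketch on invented toy data (the `toy` frame is hypothetical, not the real dataset). It uses `size()` to count rows per group, then `idxmax()` and `idxmin()` to get the group labels.

```python
import pandas as pd

# Invented toy data: truck 3 has three rows, truck 1 two, truck 4 one.
toy = pd.DataFrame({
    "truck_id": [3, 1, 3, 4, 1, 3],
    "total": [7.99, 7.00, 1.99, 0.99, 8.90, 7.99],
})

# size() counts rows per group and returns a single Series,
# rather than one identical count per remaining column.
counts = toy.groupby("truck_id").size()

busiest = counts.idxmax()   # truck_id with the most rows
quietest = counts.idxmin()  # truck_id with the fewest rows
print(busiest, counts.max(), quietest, counts.min())
```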

2. Which truck has the lowest total transaction value?

In [16]:
trucks.groupby('truck_id')['total'].sum().reset_index()

Unnamed: 0,truck_id,total
0,1,7567.7
1,2,7405.5
2,3,8470.56
3,4,770.17
4,5,4980.7
5,6,4365.6


As with the number of transactions, truck #4 comes last: it has the lowest total transaction value, at £770.17.

3. What is the average transaction value?

In [7]:
trucks['total'].mean()

np.float64(6.358512694202348)

The average transaction value across all six trucks is £6.36.

4. What is the average transaction value for each truck?

In [8]:
trucks.groupby(['truck_id'])['total'].mean().reset_index()

Unnamed: 0,truck_id,total
0,1,7.753791
1,2,8.040717
2,3,5.866039
3,4,2.721449
4,5,5.449344
5,6,5.899459


The table above shows the average transaction value for each truck. Truck #2 has the highest average at £8.04, and truck #4 has the lowest at £2.72.
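
The per-truck count, sum, and mean from the separate cells above can also be gathered into one table. This is a minimal sketch using pandas named aggregation on invented toy data. The output column names (`transactions`, `total_value`, `mean_value`) are illustrative choices, not names from the dataset.

```python
import pandas as pd

# Invented toy data, not the real transactions.
toy = pd.DataFrame({
    "truck_id": [1, 1, 2, 2, 2, 4],
    "total": [7.00, 9.00, 8.50, 8.00, 7.50, 0.99],
})

# Named aggregation: each keyword becomes an output column,
# and its value names the aggregation applied to 'total'.
summary = (
    toy.groupby("truck_id")["total"]
    .agg(transactions="count", total_value="sum", mean_value="mean")
    .reset_index()
)
print(summary)
```

Keeping the three measures side by side makes it easier to see that a truck can have a low total because it has few transactions, a low average value, or both.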

5. What proportion of transactions use cash?

In [12]:
cash_share = (trucks['type'] == 'cash').mean() * 100
print(f"Proportion of transactions that use cash: {cash_share:.2f}%")

Proportion of transactions that use cash: 50.66%
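
The shares for every payment type can be computed in one call, which is one way to check that the cash and card percentages add up to 100. This is a minimal sketch on an invented `payments` series, using `value_counts(normalize=True)`.

```python
import pandas as pd

# Invented payment types: three cash, two card.
payments = pd.Series(["cash", "card", "cash", "card", "cash"], name="type")

# normalize=True returns fractions of the total instead of raw counts.
shares = payments.value_counts(normalize=True).mul(100).round(2)
print(shares)
```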


## Visualise the data

1. A bar chart showing average transaction total for each truck

In [10]:
px.bar(trucks.groupby('truck_id')['total'].mean().reset_index(), x='truck_id', y='total', title='Average Transaction Total per Truck')

2. A pie chart showing the proportion of transactions made using cash or card

In [14]:
cat_counts = trucks['type'].value_counts()
px.pie(names=cat_counts.index, values=cat_counts.values, title='Proportion of payment type')

3. A line chart showing the total transaction value each day

In [15]:
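# Bucket by calendar day; grouping on the raw timestamp gives one point per distinct timestamp, not per day.
daily = trucks.groupby(pd.Grouper(key='timestamp', freq='D'))['total'].sum().reset_index()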
px.line(daily, x='timestamp', y='total', title='Total Transaction Value per day')
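
A daily total needs the timestamps bucketed by calendar day, because grouping on the raw `timestamp` column produces one group per distinct timestamp. This is a minimal sketch on invented data that compares the two, using `DataFrame.resample` as one possible way to do the bucketing.

```python
import pandas as pd

# Invented toy data: two transactions on each of two days.
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2025-01-13 09:06:00", "2025-01-13 16:26:00",
        "2025-01-14 12:18:00", "2025-01-14 12:44:00",
    ]),
    "total": [2.0, 3.0, 4.0, 1.5],
})

# One group per distinct timestamp: nothing is actually summed here.
per_timestamp = toy.groupby("timestamp")["total"].sum()

# One group per calendar day: transactions on the same day are added together.
per_day = toy.resample("D", on="timestamp")["total"].sum()

print(len(per_timestamp), len(per_day))
```

Note that `resample` also emits a row for any day with no transactions inside the covered range, with a sum of 0.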