# REAL RACING 3 VEHICLES - DATA ANALYSIS - OUTCOME 1 (Revision 1)

### **Welcome to the first part of my data analysis of Real Racing 3 vehicles!**

This first part has been **revised for game version 8.4** to keep the analysis up to date with the vehicles currently available in the game.

As always, we begin by importing the necessary modules.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

And, of course, we have to read the data frame that contains all vehicles...

**Important:** GitHub seems to resize some tables oddly and/or print weird characters when the dollar sign `$` appears in the dataset during rendering. To avoid this, every `Price` value that ends with `$` is rewritten (the `$` becomes `dollar`) before the analysis begins.

In [2]:
rr3_df = pd.read_csv('real_racing_3_vehicles_v8_4.csv')
# Swap a trailing '$' in Price for the word 'dollar' so tables render cleanly on GitHub
for i in range(len(rr3_df)):
    price = rr3_df.loc[i, 'Price']
    if price.endswith('$'):
        rr3_df.loc[i, 'Price'] = price[:-1] + 'dollar'
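
The row-by-row loop works, but the same clean-up can probably be done in a single vectorized call. Here is a minimal sketch on a made-up three-row `sample` frame (not the real CSV), assuming the raw prices look like `255000 R$`:

```python
import pandas as pd

# Hypothetical three-row sample mimicking the raw `Price` strings.
sample = pd.DataFrame({"Price": ["750 Gold", "255000 R$", "2000000 M$"]})

# Replace a trailing "$" with "dollar" in one vectorized call;
# the regex `\$$` matches a literal dollar sign at the end of the string.
sample["Price"] = sample["Price"].str.replace(r"\$$", "dollar", regex=True)

print(sample["Price"].tolist())
```

Unlike the loop, `str.replace` also skips missing values instead of raising an error, which may or may not matter for this dataset.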

Let's see what we have in the first 10 rows:

In [3]:
rr3_df.head(10)

Unnamed: 0,Manufacturer,Model,PR,Top Speed,Acceleration,Braking,Grip,Class,Type,Series,Price,Service Time,Service Cost,Availability,Exclusive Events
0,Acura,NSX GT3,65.2,274,3.02,29.6,1.65,R,MR,2,750 Gold,180,1990.0,True,False
1,Acura,ARX-05,78.4,322,2.8,25.6,1.76,R,MR,1,850 Gold,180,1997.0,True,False
2,Alfa Romeo,Giulietta TCR,34.5,241,5.3,29.3,1.3,R,FF,1,180 Gold,180,1332.0,True,False
3,Alfa Romeo,4C,29.4,257,4.34,30.2,1.05,P,MR,1,200 Gold,180,1415.0,True,False
4,Alfa Romeo,155 V6 TI,38.0,274,4.6,29.0,1.35,R,F4,2,300 Gold,180,1665.0,True,True
5,Ariel,Atom 3.5,44.5,249,2.7,30.5,1.0,S,RR,2,255000 Rdollar,140,1170.0,True,False
6,Ariel,Atom V8,61.1,322,2.3,30.5,1.2,S,RR,3,150 Gold,180,1402.0,True,True
7,Aston Martin,DB9,28.2,295,4.5,32.0,1.0,P,FR,4,230000 Rdollar,130,1132.0,True,False
8,Aston Martin,Vanquish,33.6,295,4.1,31.4,1.02,P,FR,3,110 Gold,150,1230.0,True,False
9,Aston Martin,V12 Vantage S,43.8,330,3.7,30.5,1.1,P,FR,3,425000 Rdollar,170,1355.0,True,True


The dataset is prepared properly then, good! Let's get the manufacturer and the model name of the **50th vehicle** in the data frame:

In [4]:
rr3_df.loc[49, ['Manufacturer', 'Model']]

Manufacturer                            Bugatti
Model           Veyron 16.4 Grand Sport Vitesse
Name: 49, dtype: object

Well, this returns a **Series**, as we can confirm by checking the type of the output above. But I want it to appear as plain text:

In [5]:
print(rr3_df.loc[49, 'Manufacturer'] + ' ' + rr3_df.loc[49, 'Model'])

Bugatti Veyron 16.4 Grand Sport Vitesse
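
As a side note, the two-column selection can be flattened to text without spelling out each column separately. A small sketch, using a hypothetical one-row `cars` frame that stands in for row 49:

```python
import pandas as pd

# Hypothetical one-row stand-in for row 49 of the real data frame.
cars = pd.DataFrame(
    {"Manufacturer": ["Bugatti"], "Model": ["Veyron 16.4 Grand Sport Vitesse"]},
    index=[49],
)

# .loc with a list of columns returns a Series of strings;
# str.join iterates over its values and glues them together.
full_name = " ".join(cars.loc[49, ["Manufacturer", "Model"]])
print(full_name)
```

This only works when every selected column holds strings, so it would need a cast for numeric columns.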


OK, what are the specs of the vehicle at index 125?

In [6]:
rr3_df.loc[125, 'PR':'Grip']

PR              79.2
Top Speed        323
Acceleration    2.77
Braking         25.6
Grip            1.76
Name: 125, dtype: object

Hmmm, judging by the base stats, this looks like **a race car**, I suppose...

Anyway, this time, I'd like to see the **price to buy**, **total service time** and **cost** for the **250th vehicle**:

In [7]:
rr3_df.loc[249, ['Price', 'Service Time', 'Service Cost']]

Price           850 Gold
Service Time         180
Service Cost        1997
Name: 249, dtype: object

According to these results, it should be an expensive sports car from a well-known manufacturer. The value 180 for **Service Time** means the vehicle needs **180 minutes** of servicing, and 1997 for **Service Cost** means you have to spend **1997 R$** to start servicing it.

How about the **classes** and **types** of the vehicles at indices **251 through 260**?

In [8]:
rr3_df.loc[251:260, ['Class', 'Type']]

Unnamed: 0,Class,Type
251,S,MR
252,S,MR
253,S,RR
254,P,RR
255,S,RR
256,S,R4
257,P,RR
258,S,RR
259,S,RR
260,R,RR
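
Notice that the slice returned 10 rows, including index 260. That is because `.loc` slices by label and includes both ends, while `.iloc` slices by position and excludes the end. A quick sketch on a throwaway `demo` frame (a made-up stand-in with a default integer index):

```python
import pandas as pd

# Hypothetical frame with a default 0..299 integer index.
demo = pd.DataFrame({"value": range(300)})

by_label = demo.loc[251:260]      # label-based: both 251 and 260 are included
by_position = demo.iloc[251:260]  # position-based: stops before position 260

print(len(by_label), len(by_position))
```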


All right, we can inspect vehicles individually like these all day. But now, let's get more detailed info about this dataset.

First of all, these are the columns in the dataset (for pandas, a DataFrame) I'm working with:

In [9]:
rr3_df.columns

Index(['Manufacturer', 'Model', 'PR', 'Top Speed', 'Acceleration', 'Braking',
       'Grip', 'Class', 'Type', 'Series', 'Price', 'Service Time',
       'Service Cost', 'Availability', 'Exclusive Events'],
      dtype='object')

Here's the shape of the dataframe: 

In [10]:
rr3_df.shape

(310, 15)

How about the general statistics of the features that hold **numerical values** (i.e. those with a numeric data type such as `int64` or `float64`)?

In [11]:
round(rr3_df.describe(), 2)

Unnamed: 0,PR,Top Speed,Acceleration,Braking,Grip,Series,Service Time,Service Cost
count,310.0,310.0,310.0,310.0,310.0,310.0,310.0,308.0
mean,49.73,310.53,3.66,30.77,1.33,2.14,165.21,1396.21
std,26.58,46.17,1.15,5.51,0.5,1.16,60.82,563.62
min,0.1,180.0,1.9,14.3,0.72,0.0,1.0,0.0
25%,34.8,280.0,2.9,28.78,1.03,1.0,150.0,1132.0
50%,48.35,317.0,3.5,30.5,1.17,2.0,180.0,1483.5
75%,66.15,332.75,4.1,32.3,1.5,3.0,180.0,1901.5
max,130.3,447.0,8.2,48.8,4.0,7.0,325.0,2192.0
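
In case it's unclear why text columns such as `Model` are missing from this summary: by default, `describe()` on a mixed frame only reports the numeric columns. A minimal sketch on a tiny `sample` frame built from three rows shown earlier:

```python
import pandas as pd

# Three rows copied from the head() output; only a few columns are kept.
sample = pd.DataFrame({
    "Model": ["NSX GT3", "ARX-05", "4C"],   # text column
    "PR": [65.2, 78.4, 29.4],               # float column
    "Top Speed": [274, 322, 257],           # integer column
})

# Columns pandas treats as numeric (both integers and floats).
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# describe() keeps exactly those columns when the frame is mixed.
summary = sample.describe()
print(numeric_cols)
print(list(summary.columns))
```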


Whoa, look at that! That's a lot of info obtained from just one command! Well, these numbers already speak for themselves!

What makes me wonder is why the count for **Service Cost** is 2 less than the count of the other features above.

Therefore, I want to find out which row(s) have missing values, most likely **NaN values**.

In [12]:
rr3_df[rr3_df['Service Cost'].isnull()]

Unnamed: 0,Manufacturer,Model,PR,Top Speed,Acceleration,Braking,Grip,Class,Type,Series,Price,Service Time,Service Cost,Availability,Exclusive Events
211,McLaren,Senna,66.2,335,2.9,28.0,1.45,S,MR,1,850 Gold,180,,True,False
213,McLaren,Senna GTR,79.5,362,2.8,25.9,1.7,S,MR,1,1000 Gold,180,,True,True
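
A quicker way to spot such gaps across all columns at once might be `isna().sum()`. The sketch below uses a made-up three-row `sample` frame; the middle row and its cost of 1500.0 are placeholders, not real game data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two rows with a missing cost, one placeholder row without.
sample = pd.DataFrame({
    "Model": ["Senna", "Example Car", "Senna GTR"],
    "Service Time": [180, 180, 180],
    "Service Cost": [np.nan, 1500.0, np.nan],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Rows that have at least one missing value in any column.
rows_with_gaps = sample[sample.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_gaps["Model"].tolist())
```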


Ah yes! I remember now: the McLaren Senna and Senna GTR were added in version 8.3, and I forgot to include their service costs in the dataset! I should fill in those values ASAP.

OK, now that I know this, let's keep going...

With the `describe()` method, I've seen many **descriptive statistics**. Can I get the same with `numpy` methods? Let's try them on the PR column:

In [13]:
PR = rr3_df['PR'].values
print("Count: {}, Mean: {:.3}, Std. dev.: {:.5}".format(PR.size, PR.mean(), PR.std()))

Count: 310, Mean: 49.7, Std. dev.: 26.538
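
One detail worth pointing out: the standard deviation here (26.538) is slightly lower than the 26.58 reported by `describe()`. That is expected, because pandas divides by n - 1 (`ddof=1`) by default while NumPy divides by n (`ddof=0`); indeed 26.538 × sqrt(310/309) ≈ 26.58. A small sketch, using only the first five PR values from the table at the top as sample data:

```python
import numpy as np
import pandas as pd

# First five PR values from the head() output, used purely as sample data.
pr = pd.Series([65.2, 78.4, 34.5, 29.4, 38.0])

sample_std = pr.std()                  # pandas default: ddof=1 (divide by n - 1)
population_std = pr.to_numpy().std()   # NumPy default: ddof=0 (divide by n)
matched_std = pr.to_numpy().std(ddof=1)  # NumPy told to behave like pandas

print(sample_std, population_std, matched_std)
```

Passing `ddof=1` to the NumPy call should make both libraries agree.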


In [14]:
print("Min: {}, Max: {}".format(PR.min(), PR.max()))

Min: 0.1, Max: 130.3


In [15]:
print("25th, 50th and 75th percentile values are {}".format(np.percentile(PR, [25, 50, 75])))

25th, 50th and 75th percentile values are [34.8  48.35 66.15]


Yep, they are all doable! With this information in hand, I'd like to get the names of the vehicles that have **the lowest** PR value:

In [16]:
lowest_PR = rr3_df[rr3_df['PR'] == 0.1]
lowest_PR.loc[:, ['Manufacturer', 'Model', 'Price']]

Unnamed: 0,Manufacturer,Model,Price
194,Mazda,RX-3,50 Gold
229,Nissan,Skyline 2000 GT-R (KPGC10),50 Gold
254,Porsche,911 Targa (1974),53000 Rdollar


Hmmm, the lowest PR value is shared by **3 vehicles**, not one! But why do the Mazda RX-3 and Nissan Skyline 2000 GT-R cost Gold instead of R$!?

How about the vehicle with **the highest PR** value?

In [17]:
highest_PR = rr3_df[rr3_df['PR'] == 130.3]
highest_PR.iloc[:,[0,1,3,4,5,6,10]]

Unnamed: 0,Manufacturer,Model,Top Speed,Acceleration,Braking,Grip,Price
214,McLaren,MP4-X,402,1.9,14.3,4.0,1200 Gold


WOW! **McLaren MP4-X**, a futuristic F1-inspired hypercar, must be immensely fast! This vehicle really deserves to have the highest PR in the game (still true in version 8.4). Moreover, it has the quickest acceleration (the lowest value, 1.9), the highest cornering grip (4.0) and the shortest braking distance (14.3)! _(Compare these values with the `min` and `max` rows of the `describe()` output above.)_

On the other hand, **it doesn't have the highest top speed**! So, which vehicle does?

In [18]:
highest_TS = rr3_df['Top Speed'].max()
highest_TS_name = rr3_df[rr3_df['Top Speed'] == highest_TS]
highest_TS_name[['Manufacturer', 'Model', 'Top Speed']]

Unnamed: 0,Manufacturer,Model,Top Speed
163,Koenigsegg,Agera RS,447


Aah! Koenigsegg Agera RS (available since game version 8.0) has the highest top speed (447 kph) even with **no upgrades**!

Now, I wonder which vehicle appears the most throughout the career series.

In [19]:
highest_series = rr3_df['Series'].max()
rr3_df[rr3_df['Series'] == highest_series]

Unnamed: 0,Manufacturer,Model,PR,Top Speed,Acceleration,Braking,Grip,Class,Type,Series,Price,Service Time,Service Cost,Availability,Exclusive Events
272,Porsche,918 Spyder Concept,60.2,322,3.0,30.2,1.4,S,M4,7,845000 Rdollar,215,1605.0,True,False


Ah yes, one of my favourite vehicles here: **Porsche 918 Spyder Concept**. Would it be a good choice for a full upgrade?

OK, lastly, from the stats we've seen with `describe()`, I noticed that the minimum servicing time is just **1 minute**.

So, which vehicle needs almost no servicing time?

In [20]:
minimum_service_time = rr3_df['Service Time'].min()
rr3_df[rr3_df['Service Time'] == minimum_service_time]

Unnamed: 0,Manufacturer,Model,PR,Top Speed,Acceleration,Braking,Grip,Class,Type,Series,Price,Service Time,Service Cost,Availability,Exclusive Events
226,Nissan,Silvia (S15),1.7,243,5.5,34.4,0.85,P,FR(RHD),3,25700 Rdollar,1,342.0,True,False


Hey, that's our good ol' friend **Nissan Silvia**! That's the first vehicle newcomers get to own in the game, right? Plus, it shouldn't be too expensive to fully upgrade!

In contrast, which one has the longest servicing time?

In [21]:
max_service_time = rr3_df['Service Time'].max()
rr3_df[rr3_df['Service Time'] == max_service_time]

Unnamed: 0,Manufacturer,Model,PR,Top Speed,Acceleration,Braking,Grip,Class,Type,Series,Price,Service Time,Service Cost,Availability,Exclusive Events
309,Toyota,TS040 Hybrid (2014),89.4,394,2.5,26.2,1.82,R,M4,2,950 Gold,325,2192.0,True,False


Ah, it's a **Toyota**! Not a street-legal vehicle, though; it comes from the _Endurance Motorsport Series_. It needs **5 hours 25 minutes** to be fully serviced. **The highest servicing cost** belongs to this vehicle, too!
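
For the record, the conversion from 325 minutes to hours and minutes is a one-liner with `divmod`. A tiny sketch (the variable names are just for illustration):

```python
# Service time of the Toyota TS040 Hybrid, in minutes (from the table above).
service_minutes = 325

# divmod returns the quotient (whole hours) and the remainder (leftover minutes).
hours, minutes = divmod(service_minutes, 60)

print(f"{hours} hours {minutes} minutes")
```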

In [22]:
median_service_time = rr3_df['Service Time'].median()
print("Median of service time is {:.0f} minutes.".format(median_service_time))
count_median_service_time = rr3_df[rr3_df['Service Time'] == median_service_time].shape[0]
print("And {} vehicles have this duration of servicing time.".format(count_median_service_time))

Median of service time is 180 minutes.
And 149 vehicles have this duration of servicing time.


OK, it's not surprising that the median servicing time is **180 minutes = 3 hours**, since many vehicles added in recent updates, even the cheap ones, have their service times set to that value. As of game version 8.4, that count has risen to **149** vehicles!

For this reason, this value is clearly the dominant one!

In [23]:
print("Ratio of vehicles having service time of 3 hours: {:.2f} percent".format(
    count_median_service_time/rr3_df.shape[0] * 100))

Ratio of vehicles having service time of 3 hours: 48.06 percent
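
The same share can probably be obtained in one step, because the mean of a boolean mask equals the fraction of `True` values. In the sketch below, only the 149 entries of 180 minutes (out of 310) come from this notebook; the split of the remaining entries into 130 and 215 minutes is invented so that the median stays at 180:

```python
import pandas as pd

# 310 hypothetical service times: 149 of them are 180 minutes (as in the data),
# the 100 / 61 split of the other values is made up for illustration.
service_time = pd.Series([130] * 100 + [180] * 149 + [215] * 61)

median = service_time.median()

# Mean of a boolean Series = share of rows where the condition holds.
share = (service_time == median).mean()

print("{:.2f} percent".format(share * 100))
```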


A simple calculation shows that this accounts for **48%** of the vehicles, nearly half!

Another surprising thing is that there are one or more vehicles **with NO service cost**. Hmmm, which ones are those?

In [24]:
no_service_cost = rr3_df[rr3_df['Service Cost'] == 0]
no_service_cost.loc[:,['Manufacturer', 'Model', 'Service Time', 'Price']]

Unnamed: 0,Manufacturer,Model,Service Time,Price
127,Mercedes-AMG,GT R F1 Safety Car,180,500000 Mdollar
128,Formula 1,F1 Academy Car,180,0 Mdollar
129,Formula 1,Renault F1 Team R.S. 19,180,2000000 Mdollar
130,Formula 1,Red Bull Racing RB15,180,2000000 Mdollar
131,Formula 1,Toro Rosso STR14,180,2000000 Mdollar
132,Formula 1,Scuderia Ferrari SF90,180,2000000 Mdollar
133,Formula 1,Mercedes-AMG Petronas Motorsport F1 W10 EQ Power+,180,2000000 Mdollar
134,Formula 1,Haas F1 Team VF-19,180,2000000 Mdollar
135,Formula 1,Alfa Romeo Racing C38,180,2000000 Mdollar
136,Formula 1,McLaren F1 Team MCL34,180,2000000 Mdollar


Oh my goodness! We came across many vehicles from the **2019 Formula 1 season**! Plus, with the adjustments in game version 8.4, the **Formula E** and **two GT3** vehicles also had their service costs removed. Despite that, these vehicles still need **180 minutes** to service!

In addition, all these vehicles can be obtained with the special `Mdollar` currency, which was introduced in game version 8.0 when F1 vehicles had appeared for the first time.
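
Since `Price` mixes an amount and a currency in one string, splitting it into two columns could make currency-based questions easier. A minimal sketch on a few price strings copied from the outputs above; the column names `Amount` and `Currency` are my own choice, not part of the dataset:

```python
import pandas as pd

# Price strings as they appear after the "$" clean-up.
prices = pd.Series(["750 Gold", "255000 Rdollar", "2000000 Mdollar", "0 Mdollar"])

# Split once on the space: left part is the amount, right part is the currency.
parts = prices.str.split(" ", n=1, expand=True)
parts.columns = ["Amount", "Currency"]  # hypothetical column names
parts["Amount"] = parts["Amount"].astype(int)

# How many vehicles are priced in each currency?
currency_counts = parts["Currency"].value_counts()

print(parts)
print(currency_counts)
```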

Lastly, we know that when a new version of the game is released, it generally makes **two** available vehicles eligible for **Exclusive Events**. However, in order to unlock these events, you need to **fully upgrade** those vehicles! Well, how many vehicles have access to their own exclusive events?

In [25]:
print(rr3_df['Exclusive Events'].value_counts())
print("*** That accounts for {:.2f} percent.".format(len(rr3_df[rr3_df['Exclusive Events'] == True]) / len(rr3_df) * 100))

False    242
True      68
Name: Exclusive Events, dtype: int64
*** That accounts for 21.94 percent.


Well, a total of **68** vehicles. Can I name a few arbitrarily?

In [26]:
rr3_exclusive = rr3_df[rr3_df['Exclusive Events'] == True]
rr3_exclusive.iloc[[7, 12, 22, 34, 47, 50], [0, 1]]

Unnamed: 0,Manufacturer,Model
22,Audi,R8 LMS Ultra
49,Bugatti,Veyron 16.4 Grand Sport Vitesse
103,Ferrari,LaFerrari
166,Koenigsegg,Regera
212,McLaren,MP4/4
215,Mercedes-Benz,190E 2.5-16 Evolution II


So, a vehicle with exclusive events might be cheap or expensive. If you'd like to access those events, you'll need a pile of **Gold** and **R$** for full upgrades.

### **All right folks, that's all for the 1st analysis outcome for Vehicles in Real Racing 3!**

**In the next part, I will reveal more statistical info for better insight into this dataset, and a few visualization charts would be nice there, too!**

## Until the next part, keep racing!