# REAL RACING 3 VEHICLES - DATA ANALYSIS - OUTCOME 1 (Revision 2)

### **Welcome to the 1st part of my data analysis of Real Racing 3 vehicles!**

This first part has been **revised for game version 9.0** to keep the analysis up to date with the vehicles currently available in the game.

As always, we begin by importing the necessary modules.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import math

And, of course, we have to read the data frame that contains all vehicles...

**Important:** GitHub's renderer seems to resize some tables oddly and/or print weird characters when the dollar sign `$` appears in the dataset. To avoid this issue, every `Price` value that ends with `$` is changed before the analysis begins.

In [2]:
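# Load the vehicle dataset for game version 9.0
rr3_df = pd.read_csv('real_racing_3_vehicles_v9_0.csv')
# Replace a trailing '$' in Price with the word 'dollar' (e.g. 'R$' -> 'Rdollar')
for i in range(len(rr3_df)):
    if rr3_df.loc[i, 'Price'][-1] == '$':
        s = rr3_df.loc[i, 'Price'][:-1] + 'dollar'
        rr3_df.loc[i, 'Price'] = s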
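
By the way, the same replacement can probably be done without an explicit loop. Here is a minimal sketch, assuming every `Price` entry is a string; the small `demo` frame is made up for illustration and is not the real dataset:

```python
import pandas as pd

# Hypothetical mini-frame standing in for rr3_df
demo = pd.DataFrame({'Price': ['750 Gold', '255000 R$', '1950000 M$']})

# Replace a '$' at the very end of each string with 'dollar' in one vectorized step
demo['Price'] = demo['Price'].str.replace(r'\$$', 'dollar', regex=True)
print(demo['Price'].tolist())
# ['750 Gold', '255000 Rdollar', '1950000 Mdollar']
```

The regular expression `\$$` matches a literal dollar sign only at the end of the string, so values such as `750 Gold` are left untouched.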

Let's see what we have in the first 10 rows:

In [3]:
rr3_df.head(10)

Unnamed: 0,Manufacturer,Model,PR,Top Speed,Acceleration,Braking,Grip,Class,Type,Series,Price,Service Time,Service Cost,Availability,Exclusive Events,In Main Career
0,Acura,NSX GT3,65.2,274,3.02,29.6,1.65,R,MR,2,750 Gold,180,1990.0,True,False,False
1,Acura,ARX-05,78.5,322,2.8,25.6,1.76,R,MR,1,850 Gold,180,1997.0,True,False,False
2,Alfa Romeo,Giulietta TCR,34.5,241,5.3,29.3,1.3,R,FF,1,180 Gold,180,1332.0,True,False,False
3,Alfa Romeo,4C,29.4,257,4.34,30.2,1.05,P,MR,1,200 Gold,180,1415.0,True,False,False
4,Alfa Romeo,155 V6 TI,38.0,274,4.6,29.0,1.35,R,F4,2,300 Gold,180,1665.0,True,True,False
5,Ariel,Atom 3.5,44.5,249,2.7,30.5,1.0,S,RR,2,255000 Rdollar,140,1170.0,True,False,True
6,Ariel,Atom V8,61.1,322,2.3,30.5,1.2,S,RR,3,150 Gold,180,1402.0,True,True,True
7,Aston Martin,DB9,28.2,295,4.5,32.0,1.0,P,FR,4,230000 Rdollar,130,1132.0,True,False,True
8,Aston Martin,Vanquish,33.6,295,4.1,31.4,1.02,P,FR,3,110 Gold,150,1230.0,True,False,True
9,Aston Martin,V12 Vantage S,43.8,330,3.7,30.5,1.1,P,FR,3,425000 Rdollar,170,1355.0,True,True,True


The dataset is prepared properly then, good! 

Let's get the manufacturer and the model name of the **60th vehicle** in the data frame:

In [4]:
rr3_df.loc[59, ['Manufacturer', 'Model']]

Manufacturer            Chevrolet
Model           Camaro ZL1 (2013)
Name: 59, dtype: object

Well, this returns **multiple values** (a Series), because we asked for two columns at once, as the output above shows.

But I want it to appear as plain text:

In [5]:
print(rr3_df.loc[59, 'Manufacturer'] + ' ' + rr3_df.loc[59, 'Model'])

Chevrolet Camaro ZL1 (2013)


OK, what are the specs of the vehicle at **index 130**?

In [6]:
rr3_df.loc[130, 'PR':'Grip']

PR              43.8
Top Speed        322
Acceleration     3.5
Braking         29.9
Grip            1.04
Name: 130, dtype: object

Hmmm, according to the base stats above, this looks like a **production-based sports car**, I suppose...

Anyway, this time, I'd like to see the **price to buy**, **total service time** and **cost** for the **250th vehicle**:

In [7]:
rr3_df.loc[249, ['Price', 'Service Time', 'Service Cost']]

Price           1000 Gold
Service Time          180
Service Cost         2037
Name: 249, dtype: object

According to these results, it should be an expensive sports car that belongs to a well-known manufacturer. Well, the value 180 for **Service Time** means the vehicle needs **180 minutes** for servicing and 2037 for **Service Cost** means you've got to spend **2037 R$** to begin servicing that vehicle.

How about the **classes** and **types** of the vehicles at indices **251 through 260**?

In [8]:
rr3_df.loc[251:260, ['Class', 'Type']]

Unnamed: 0,Class,Type
251,P,FR
252,P,F4
253,S,FR
254,S,FR
255,R,FR
256,S,FR
257,R,FR
258,R,MR
259,R,FR
260,R,FR
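
Note that the label-based slice above returns 10 rows, because `.loc` includes both endpoints. A small sketch of the difference from position-based slicing, using a made-up frame with a default integer index (not the real dataset):

```python
import pandas as pd

# Hypothetical frame with a default 0..299 integer index
demo = pd.DataFrame({'Class': ['P'] * 300})

by_label = demo.loc[251:260]      # label slice: both ends included
by_position = demo.iloc[251:260]  # position slice: end excluded

print(len(by_label), len(by_position))
# 10 9
```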


All right, we can inspect vehicles individually like these all day. But now, let's get more detailed info about this dataset.

First of all, **these are the columns in the dataset** I'm working with:

In [9]:
rr3_df.columns

Index(['Manufacturer', 'Model', 'PR', 'Top Speed', 'Acceleration', 'Braking',
       'Grip', 'Class', 'Type', 'Series', 'Price', 'Service Time',
       'Service Cost', 'Availability', 'Exclusive Events', 'In Main Career'],
      dtype='object')

Here's the current shape of the dataframe: 

In [10]:
rr3_df.shape

(349, 16)

From these numbers, there are **349 rows**, i.e. 349 vehicles in the dataset, and **16 different features**.

How about the general statistics of the features which hold **numerical values** (i.e. those with a data type of `int64` or `float64`)?

In [11]:
round(rr3_df.describe(), 2)

Unnamed: 0,PR,Top Speed,Acceleration,Braking,Grip,Series,Service Time,Service Cost
count,349.0,349.0,349.0,349.0,349.0,349.0,349.0,347.0
mean,52.1,307.16,3.58,30.13,1.39,2.05,166.86,1266.99
std,27.57,46.08,1.13,5.75,0.55,1.13,57.5,679.04
min,0.1,180.0,1.9,14.3,0.72,0.0,1.0,0.0
25%,37.8,277.0,2.86,27.7,1.05,1.0,165.0,847.0
50%,48.8,312.0,3.4,30.2,1.22,2.0,180.0,1422.0
75%,66.8,330.0,4.0,32.0,1.5,3.0,180.0,1840.0
max,130.3,447.0,8.2,48.8,4.0,7.0,325.0,2192.0


Whoa, look at that! That's a lot of info obtained from just one command! These numbers mostly explain themselves, but in case anything is unclear, let me clarify:

* **count** is the number of non-missing values
* **mean** is the average
* **std** is the standard deviation
* **min** and **max** are the minimum and the maximum values
* **25%**, **50%**, **75%** are the 25th percentile, 50th percentile (median), and 75th percentile values, respectively.

What makes me wonder is why the count for **Service Cost** is 2 less than the count of the other features above.

Therefore, I want to find out which row(s) have missing values, possibly **NaN values**.

In [12]:
rr3_df[rr3_df['Service Cost'].isnull()]

Unnamed: 0,Manufacturer,Model,PR,Top Speed,Acceleration,Braking,Grip,Class,Type,Series,Price,Service Time,Service Cost,Availability,Exclusive Events,In Main Career
21,Aston Martin,Valkyrie,95.5,367,2.6,21.3,2.53,S,MR,1,1150 Gold,180,,True,False,False
67,Chevrolet,Corvette C6.R GT2,57.0,274,3.8,27.7,1.6,R,FR,1,450 Gold,180,,True,False,False


Ah yes! I remember that __Aston Martin Valkyrie__ and __Chevrolet Corvette C6.R GT2__ are the newest vehicles, added in version 9.0. Perhaps I could not obtain their values from the source website at the time they appeared. I should fill in those values ASAP.
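
As a side note, here is a rough sketch of how missing values could be counted per column and temporarily filled. The frame and its numbers are invented for illustration, and filling with the median is only an assumption on my side, not something applied to the real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame; the numbers are illustrative, not real game data
demo = pd.DataFrame({
    'Model': ['A', 'B', 'C', 'D'],
    'Service Cost': [1000.0, np.nan, 1400.0, np.nan],
})

# Number of missing values in each column
missing_per_column = demo.isna().sum()
print(missing_per_column)

# One possible stop-gap (an assumption): fill the gaps with the column median
# until the real values are known
filled = demo['Service Cost'].fillna(demo['Service Cost'].median())
print(filled.tolist())
# [1000.0, 1200.0, 1400.0, 1200.0]
```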

OK, let's keep going...

With the `describe()` method, I've seen many **descriptive statistics**. Can I get the same with `numpy` methods? Let's try them on the **PR** column:

In [13]:
PR = rr3_df['PR'].values
print("Count: {}, Mean: {:.3}, Std. dev.: {:.5}".format(PR.size, PR.mean(), PR.std()))

Count: 349, Mean: 52.1, Std. dev.: 27.53
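
You may notice that the standard deviation here (27.53) differs slightly from the one reported by `describe()` (27.57). The reason is that NumPy's `std()` divides by `n` by default (`ddof=0`), whereas pandas divides by `n - 1` (`ddof=1`). A small sketch with toy numbers (not the PR column):

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0])  # toy data

population_std = values.to_numpy().std()      # NumPy default: ddof=0
sample_std = values.std()                     # pandas default: ddof=1
matched_std = values.to_numpy().std(ddof=1)   # NumPy, made to match pandas

print(population_std, sample_std, matched_std)
```

Passing `ddof=1` to the NumPy call should reproduce the value from `describe()`.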


In [14]:
print("Min: {}, Max: {}".format(PR.min(), PR.max()))

Min: 0.1, Max: 130.3


In [15]:
print("25th, 50th, 75th percentile values are {}".format(np.percentile(PR, [25, 50, 75])))

25th, 50th, 75th percentile values are [37.8 48.8 66.8]


Yep, they are all doable! With this information in hand, I'd like to get the names of the vehicles which have **the lowest** PR value:

In [16]:
lowest_PR = rr3_df[rr3_df['PR'] == 0.1]
lowest_PR.loc[:, ['Manufacturer', 'Model', 'Price']]

Unnamed: 0,Manufacturer,Model,Price
230,Mazda,RX-3,50 Gold
265,Nissan,Skyline 2000 GT-R (KPGC10),50 Gold
290,Porsche,911 Targa (1974),53000 Rdollar


Hmmm, the lowest PR value is shared by **3 vehicles**, not one! Well, we may ask why **Mazda RX-3** and **Nissan Skyline 2000 GT-R** cost Gold to buy instead of R$!

How about the vehicle with **the highest PR** value?

In [17]:
highest_PR = rr3_df[rr3_df['PR'] == 130.3]
highest_PR.iloc[:,[0,1,3,4,5,6,10]]

Unnamed: 0,Manufacturer,Model,Top Speed,Acceleration,Braking,Grip,Price
250,McLaren,MP4-X,402,1.9,14.3,4.0,1200 Gold


Wow! **McLaren MP4-X**, a futuristic F1-inspired hypercar, looks immensely fast! This vehicle really deserves to have the highest PR in the game (still true in version 9.0). Moreover, it has the quickest acceleration, the highest cornering grip and the shortest braking distance! _(Compare these values with the `min` and `max` rows of the `describe()` output above.)_

On the other hand, it **doesn't have the highest top speed**! So, which vehicle does?

In [18]:
highest_TS = rr3_df['Top Speed'].max()
highest_TS_name = rr3_df[rr3_df['Top Speed'] == highest_TS]
highest_TS_name[['Manufacturer', 'Model', 'Top Speed']]

Unnamed: 0,Manufacturer,Model,Top Speed
199,Koenigsegg,Agera RS,447


Aah! **Koenigsegg Agera RS** (available since game version 8.0) has the highest top speed (447 kph, about 278 mph) even with **no upgrades**!
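
For the curious, the unit conversion can be checked quickly. This is just a sketch; the constant name `KM_PER_MILE` is my own:

```python
KM_PER_MILE = 1.609344  # exact, by definition of the international mile

top_speed_kph = 447
top_speed_mph = top_speed_kph / KM_PER_MILE
print("{} kph is about {:.0f} mph".format(top_speed_kph, top_speed_mph))
# 447 kph is about 278 mph
```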

Now, I wonder which vehicle appears the most throughout the career series.

In [19]:
highest_series = rr3_df['Series'].max()
rr3_df[rr3_df['Series'] == highest_series]

Unnamed: 0,Manufacturer,Model,PR,Top Speed,Acceleration,Braking,Grip,Class,Type,Series,Price,Service Time,Service Cost,Availability,Exclusive Events,In Main Career
309,Porsche,918 Spyder Concept,60.2,322,3.0,30.2,1.4,S,M4,7,845000 Rdollar,215,1605.0,True,False,True


Ah yes, one of my favourite vehicles here: **Porsche 918 Spyder Concept**. Would this be a good choice for a full upgrade?

OK, lastly, from the stats we've seen with `describe()`, I noticed that the minimum servicing time is just **1 minute**.

So, which vehicle needs almost no servicing time?

In [20]:
minimum_service_time = rr3_df['Service Time'].min()
rr3_df[rr3_df['Service Time'] == minimum_service_time]

Unnamed: 0,Manufacturer,Model,PR,Top Speed,Acceleration,Braking,Grip,Class,Type,Series,Price,Service Time,Service Cost,Availability,Exclusive Events,In Main Career
262,Nissan,Silvia (S15),1.7,243,5.5,34.4,0.85,P,FR(RHD),3,25700 Rdollar,1,342.0,True,False,True


Hey, that's our good ol' friend **Nissan Silvia (S15)**! That was the first vehicle we ever owned in the game, right? Plus, it shouldn't be too expensive to upgrade fully!

In contrast, which one has the longest servicing time?

In [21]:
max_service_time = rr3_df['Service Time'].max()
rr3_df[rr3_df['Service Time'] == max_service_time]

Unnamed: 0,Manufacturer,Model,PR,Top Speed,Acceleration,Braking,Grip,Class,Type,Series,Price,Service Time,Service Cost,Availability,Exclusive Events,In Main Career
348,Toyota,TS040 Hybrid (2014),89.4,394,2.5,26.2,1.82,R,M4,2,950 Gold,325,2192.0,True,False,True


Ah, it's a **Toyota**! Not a street-legal vehicle, though; it comes from the _Endurance Motorsport Series_. It needs **5 hours 25 minutes** to be fully serviced. **The highest servicing cost** belongs to this vehicle, too!

In [22]:
median_service_time = rr3_df['Service Time'].median()
print("Median of service time is {:.0f} minutes.".format(median_service_time))
count_median_service_time = rr3_df[rr3_df['Service Time'] == median_service_time].shape[0]
print("And {} vehicles have this duration of servicing time.".format(count_median_service_time))

Median of service time is 180 minutes.
And 188 vehicles have this duration of servicing time.


OK, it's not surprising that the median servicing time is **180 minutes = 3 hours**, since many vehicles added in recent updates have their service times set to that value. As of game version 9.0, the count has risen to **188** vehicles!

In [23]:
print("Ratio of vehicles having service time of 3 hours: {:.2f} percent".format(
    count_median_service_time/rr3_df.shape[0] * 100))

Ratio of vehicles having service time of 3 hours: 53.87 percent


This simple calculation shows that **nearly 54%** of all vehicles have this service time!
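
The count and the share of each service time can also be obtained in one go with `value_counts()`. A minimal sketch on made-up service times (not the real column):

```python
import pandas as pd

# Hypothetical service times in minutes
service_time = pd.Series([180, 180, 180, 150, 215, 180, 130, 325])

counts = service_time.value_counts()                # occurrences of each value
shares = service_time.value_counts(normalize=True)  # same, as fractions of the total

most_common = counts.idxmax()
print(most_common, counts[most_common], round(shares[most_common] * 100, 2))
# 180 4 50.0
```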

Another thing: there are vehicles **with NO service costs**. Interesting, right? But which ones are they?

In [24]:
no_service_cost = rr3_df[rr3_df['Service Cost'] == 0]
no_service_cost.loc[:,['Manufacturer', 'Model', 'Service Time', 'Price']]

Unnamed: 0,Manufacturer,Model,Service Time,Price
15,Aston Martin,Vantage GTE (2019),180,1950000 Mdollar
32,Bentley,Continental GT3,180,1300000 Mdollar
66,Chevrolet,Corvette C8.R,180,1950000 Mdollar
75,Chevrolet,Camaro ZL1 1LE (Hendrick Motorsports - 2020),180,2150000 Mdollar
76,Chevrolet,Camaro ZL1 1LE (Chip Ganassi Racing - 2020),180,2150000 Mdollar
77,Chevrolet,Camaro ZL1 1LE (Richard Childress Racing - 2020),180,2150000 Mdollar
78,Chevrolet,Camaro ZL1 1LE (Richard Petty Racing - 2020),180,2150000 Mdollar
135,Ford,Mustang (Stewart-Haas Racing - 2020),180,2150000 Mdollar
136,Ford,Mustang (Team Penske - 2020),180,2150000 Mdollar
140,Formula 1,Mercedes-AMG GT R F1 Safety Car,180,500000 Mdollar


Oh my goodness! We came across many vehicles! These are all from the **Motorsports Discipline**! It is good to know that you don't need to pay anything when you want to service them.

If we categorize them all, these are **2019 and 2020 Season vehicles from Formula 1, 6th Season from Formula E (including several ones from previous seasons), 2020 Season vehicles from NASCAR, GT3 class, EuroMaster class**, and **2020 Endurance GTE class**. What they have in common is that they need **180 minutes** to service!

In addition, all these vehicles can be obtained with the special `Mdollar` currency, **available since game version 8.0**. 

If you are eager to own them all in the current state, you would need:

In [25]:
def get_mdollars(vals):
    # Strip the trailing ' Mdollar' (8 characters) and keep the integer amount
    n = []
    for v in vals:
        n.append(int(v[:-8]))
    return n
mdollars = get_mdollars(no_service_cost['Price'].values)
print("You need {} Mdollars to own all motorsport vehicles.".format(sum(mdollars)))

You need 104900000 Mdollars to own all motorsport vehicles.


Moreover, there is a daily limit: you can earn up to M$250,000 per day. So, **how many consecutive days should you play** to reach the value above?

In [26]:
print("You need to play for {:.0f} consecutive days!".format(sum(mdollars) / 250000))

You need to play for 420 consecutive days!
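
One caveat: formatting with `{:.0f}` rounds to the nearest integer, so it could under-report whenever the fractional part is below one half. Since a partial day still means one more day of play, rounding up with `math.ceil` is arguably safer. A small sketch using the total printed above (the variable names are my own):

```python
import math

total_mdollars = 104900000   # total printed in the output above
daily_limit = 250000         # daily M$ earning cap mentioned in the text

exact_days = total_mdollars / daily_limit
full_days = math.ceil(exact_days)  # always round up to whole days
print(exact_days, full_days)
# 419.6 420
```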


Ah! That's more than a year to buy all the vehicles with M$! And I'm not even counting the **other factors**, such as bonuses, upgrades, tuning setups and driver training.

On to the next one: when a new version of the game is released, it generally makes **one** or **two** of the available vehicles eligible for **Exclusive Events**. However, in order to unlock these events, you need to **fully upgrade** those vehicles! Well, how many vehicles have access to their own exclusive events?

In [27]:
print(rr3_df['Exclusive Events'].value_counts())
print("*** That accounts for {:.2f} percent.".format(len(rr3_df[rr3_df['Exclusive Events'] == True]) / len(rr3_df) * 100))

False    272
True      77
Name: Exclusive Events, dtype: int64
*** That accounts for 22.06 percent.
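
A shorter way to get such a percentage is to take the mean of the boolean column, since `True` counts as 1 and `False` as 0. A quick sketch on made-up flags (not the real column):

```python
import pandas as pd

# Hypothetical flags standing in for the 'Exclusive Events' column
flags = pd.Series([True, False, False, True, False])

share = flags.mean() * 100   # fraction of True values, as a percentage
print("{:.2f} percent".format(share))
# 40.00 percent
```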


Well, a total of **77** vehicles. Can I name a few at random?

In [28]:
rr3_exclusive = rr3_df[rr3_df['Exclusive Events'] == True]
rr3_exclusive.sample(n=7, random_state=1)[['Manufacturer', 'Model']]

Unnamed: 0,Manufacturer,Model
180,Hennessey,Venom GT
219,Lamborghini,Centenario LP770-4
114,Ferrari,FXX K
330,Porsche,935 (2019)
257,Mercedes-AMG,GT3
258,Mercedes-Benz,CLK-LM
263,Nissan,Skyline GT-R V-Spec (R34)


So, a vehicle with exclusive events may be cheap or expensive. If you'd like to access those events, you need a pile of **Gold** and **R$** for the full upgrades.

Lastly, let me investigate the vehicles in the **main career series**, which you can race in the **Road Collection**:

In [29]:
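def get_prices(vals):
    # Collect R$ and Gold amounts separately; other currencies are skipped
    r, g = [], []
    for v in vals:
        if v[-7:] == 'Rdollar':
            r.append(int(v[:-8]))   # drop ' Rdollar' (8 characters)
        elif v[-4:] == 'Gold':
            g.append(int(v[:-5]))   # drop ' Gold' (5 characters)
    return r, g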
rr3_main_career = rr3_df[rr3_df['In Main Career'] == True]
print("There are {} vehicles available in main career series.".format(rr3_main_career.shape[0]))
print("*** This accounts for {:.2f} percent.".format((rr3_main_career.shape[0] / rr3_df.shape[0]) * 100))
rdollars, golds = get_prices(rr3_main_career['Price'].values)
print("\nYou should accumulate {} R$ and {} Gold to buy all these vehicles.".format(sum(rdollars), sum(golds)))

There are 94 vehicles available in main career series.
*** This accounts for 26.93 percent.

You should accumulate 33006755 R$ and 14045 Gold to buy all these vehicles.
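
As an alternative to slicing the strings by hand, the price could be split into an amount and a currency and then summed per currency. This is only a sketch on invented prices in the same `<amount> <currency>` format; the `Amount` and `Currency` column names are my own and do not exist in the dataset:

```python
import pandas as pd

# Hypothetical prices in the same '<amount> <currency>' format
demo = pd.DataFrame({'Price': ['255000 Rdollar', '150 Gold',
                               '230000 Rdollar', '110 Gold']})

# Split each price once, from the right, into amount and currency
parts = demo['Price'].str.rsplit(' ', n=1, expand=True)
demo['Amount'] = parts[0].astype(int)
demo['Currency'] = parts[1]

# Total amount needed per currency
totals = demo.groupby('Currency')['Amount'].sum()
print(totals['Gold'], totals['Rdollar'])
# 260 485000
```

This way no character counts need to be hard-coded, and any additional currency would get its own total automatically.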


Here are some examples of vehicles from main career series:

In [30]:
rr3_main_career.sample(n=10, random_state=1)[['Manufacturer', 'Model']]

Unnamed: 0,Manufacturer,Model
100,Ferrari,F12Berlinetta
86,Dodge,Viper SRT10 Coupe
118,Ford,Shelby GT500
309,Porsche,918 Spyder Concept
296,Porsche,911 GT3 Cup
122,Ford,GT FIA GT1
253,Mercedes-Benz,SLS AMG
117,Ford,Focus RS
286,Pagani,Zonda F
269,Nissan,GT-R Premium (R35)


### **All right folks, that's all for the 1st part of the analysis of vehicles in Real Racing 3!**

**In the next part, I will reveal more statistical info for better insight into this dataset, and a few visualization charts will be a nice addition!**

## As always, keep racing!