## Exercises: Filter and query data

In [1]:
import pandas as pd
autos = pd.read_json("../Data/autos.json")

**Use the "autos" dataset to answer the following questions:**

**Q1:** How many cars from Jaguar are in the dataset, and what is the price of the most expensive one?

<details>
<summary>Answer</summary>
<br>
&nbsp;&nbsp;&nbsp;count=3
&nbsp;&nbsp;&nbsp;price=36000.0
</details>

**My solution:**

In [17]:
# We'll use a query for this one
jaguars = autos.query("make == 'jaguar'")
print(f"There are {len(jaguars)} Jaguars in the dataset.")

highest_price = jaguars["price"].max()
print(f"The most expensive one costs ${highest_price}")

There are 3 Jaguars in the dataset.
The most expensive one costs $36000.0


**Q2:** How many cars from Toyota are in the dataset, and what is the price of the most expensive one?

<details>
<summary>Answer</summary>
<br>
&nbsp;&nbsp;&nbsp;count=32
&nbsp;&nbsp;&nbsp;price=17669.0
</details>

**My solution:**

In [16]:
# This time we will use a mask
toyotas = autos[autos["make"] == "toyota"]
print(f"There are {len(toyotas)} cars from Toyota in the dataset.")

highest_price = toyotas["price"].max()
print(f"The most expensive Toyota costs ${highest_price}")

There are 32 cars from Toyota in the dataset.
The most expensive Toyota costs $17669.0
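
Q1 and Q2 ask the same thing for two different makes, so both could also be answered in one pass with a grouped aggregation. The sketch below is only an illustration: `toy` is a hypothetical stand-in with made-up rows, not the real autos data.

```python
import pandas as pd

# Hypothetical stand-in for the autos dataset (rows are made up).
toy = pd.DataFrame({
    "make": ["jaguar", "toyota", "toyota", "jaguar", "volvo"],
    "price": [32250.0, 5348.0, 17669.0, 36000.0, None],
})

# One row per make: number of cars and the highest price.
# "size" counts every row, whereas "count" would skip missing prices.
summary = toy.groupby("make")["price"].agg(["size", "max"])
print(summary.loc[["jaguar", "toyota"]])
```

With the real dataset, `autos` would take the place of `toy`, and `summary.loc["jaguar"]` would hold both numbers asked for in Q1.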


**Q3:** What is the length, width and height of the most expensive car in the entire dataset?

<details>
<summary>Answer</summary>
<br>
&nbsp;&nbsp;&nbsp;length=199.2
&nbsp;&nbsp;&nbsp;width=72.0
&nbsp;&nbsp;&nbsp;height=55.4
</details>

**My solution:**

In [27]:
top_price_car = autos.sort_values(by="price", ascending=False).iloc[0]
features = ["length", "width", "height"]
print("The most expensive car has the following measurements:")
for feature in features:
    print(f"{feature}: {top_price_car[feature]} inches")

The most expensive car has the following measurements:
length: 199.2 inches
width: 72.0 inches
height: 55.4 inches
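
Sorting the whole DataFrame works, but only one row is needed. A possible alternative is `idxmax`, which returns the index label of the highest price so that `.loc` can fetch that row directly. The frame `toy` below is a hypothetical example with made-up values.

```python
import pandas as pd

# Hypothetical stand-in for the autos dataset (rows are made up).
toy = pd.DataFrame({
    "make": ["audi", "bmw", "mazda"],
    "price": [13950.0, 41315.0, None],
    "length": [176.6, 193.8, 166.8],
    "width": [66.2, 67.9, 64.2],
    "height": [54.3, 53.7, 54.1],
})

# idxmax skips missing prices and returns the label of the largest one.
top_price_car = toy.loc[toy["price"].idxmax()]
print(top_price_car[["length", "width", "height"]])
```

If several cars share the highest price, `idxmax` returns only the first of them.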


**Q4:** What is the lowest price per horsepower in the dataset, and what brand ("make") is that car?

<details>
<summary>Answer</summary>
<br>
&nbsp;&nbsp;&nbsp;price per horsepower=72.84
&nbsp;&nbsp;&nbsp;brand=Toyota
</details>

**My solution:**

In [40]:
# add a new column "horsepower_price"
autos["horsepower_price"] = autos["price"] / autos["horsepower"]

best_value_car = autos.sort_values(by="horsepower_price", ascending=True).iloc[0]
brand = best_value_car["make"].title()
price_per_hp = best_value_car["horsepower_price"]
print(
    f"The car that has the lowest price per horsepower is a {brand} "
    f"with a price of ${price_per_hp:.2f} per horsepower."
)

The car that has the lowest price per horsepower is a Toyota with a price of $72.84 per horsepower.
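
Adding `horsepower_price` as a column changes `autos` for every later cell, which is why the column also shows up in the T3 output. If that is not wanted, the ratio can be kept in a separate Series instead. This is a sketch on a hypothetical `toy` frame with made-up numbers.

```python
import pandas as pd

# Hypothetical stand-in for the autos dataset (rows are made up).
toy = pd.DataFrame({
    "make": ["toyota", "porsche", "honda"],
    "price": [8000.0, 34000.0, 7200.0],
    "horsepower": [100.0, 200.0, 75.0],
})

# Keep the ratio in its own Series so the DataFrame keeps its original columns.
price_per_hp = toy["price"] / toy["horsepower"]

# idxmin gives the index label of the smallest ratio.
best = price_per_hp.idxmin()
print(f"{toy.loc[best, 'make'].title()}: ${price_per_hp[best]:.2f} per horsepower")
```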


**Q5:** How many of the cars in the dataset have as many cylinders as they have doors?

<details>
<summary>Answer</summary>
<br>
&nbsp;&nbsp;&nbsp;cars=95
</details>

**My solution:**

In [43]:
matching_cars = autos.query("`num-of-cylinders` == `num-of-doors`")

print(f"{len(matching_cars)} of the cars have an equal number of cylinders and doors.")

95 of the cars have an equal number of cylinders and doors.
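
The backticks in the query are needed because the column names contain hyphens and are therefore not valid Python identifiers. Both columns hold words such as `four` (see the sample output in T3), so this is a plain string comparison. The sketch below, on a hypothetical `toy` frame, shows that the query and an equivalent boolean mask select the same rows.

```python
import pandas as pd

# Hypothetical stand-in for the two relevant columns (rows are made up).
toy = pd.DataFrame({
    "num-of-cylinders": ["four", "six", "four", "two"],
    "num-of-doors": ["four", "four", "two", "two"],
})

# Backticks let query() refer to column names that are not valid identifiers.
with_query = toy.query("`num-of-cylinders` == `num-of-doors`")

# The same filter written as a boolean mask.
with_mask = toy[toy["num-of-cylinders"] == toy["num-of-doors"]]

print(len(with_query), len(with_mask))
```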


**Use the "autos" dataset and write Python code to solve the following tasks:**

**T1:** Print a string to inform the user of the price difference between the cheapest and the most expensive car in the dataset.

<details>
<summary>Solution</summary>
<br>
&nbsp;&nbsp;&nbsp;<b>Example:</b><br>
&nbsp;&nbsp;&nbsp;The price difference between the cheapest and the most expensive car is 40282.0
</details>

**My solution:**

In [46]:
max_price = autos["price"].max()
min_price = autos["price"].min()
price_diff = max_price - min_price

print(
    f"The price difference between the most expensive car (${max_price}) "
    f"and the cheapest car (${min_price}) is ${price_diff}."
)

The price difference between the most expensive car ($45400.0) and the cheapest car ($5118.0) is $40282.0.


**T2:** Ask the user to input a brand, then print the price range for that brand.

<details>
<summary>Solution</summary>
<br>
&nbsp;&nbsp;&nbsp;<b>Example 1:</b><br>
&nbsp;&nbsp;&nbsp;Input the name of a brand: volvo<br>
&nbsp;&nbsp;&nbsp;The prices for cars of brand 'volvo' range from 12940.0 to 22625.0<br>
<br>
&nbsp;&nbsp;&nbsp;<b>Example 2:</b><br>
&nbsp;&nbsp;&nbsp;Input the name of a brand: toyota<br>
&nbsp;&nbsp;&nbsp;The prices for cars of brand 'toyota' range from 5348.0 to 17669.0<br>
<br>
&nbsp;&nbsp;&nbsp;<b>Example 3:</b><br>
&nbsp;&nbsp;&nbsp;Input the name of a brand: tesla<br>
&nbsp;&nbsp;&nbsp;The brand 'tesla' does not exist in the dataset.<br>
</details>

**My solution:**

In [54]:
while True:
    brand = input("Enter the name of a brand: ").strip()
    if not brand:
        print("Invalid input. Try again.")
    else:
        break

# Turn "Mercedes Benz" to "mercedes-benz" and so on
brand = brand.lower()
brand = brand.replace(" ", "-")

brand_cars = autos[autos["make"] == brand]
if brand_cars.empty:
    print(f"The brand '{brand}' does not exist in the dataset.")
else:
    min_price = brand_cars["price"].min()
    max_price = brand_cars["price"].max()
    print(
        f"The prices of cars of brand '{brand}' range from "
        f"{min_price} to {max_price}."
    )

The prices of cars of brand 'mercedes-benz' range from 25552.0 to 45400.0.
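
T2, T3 and T4 all normalize the brand name and filter on it in the same way, so that part could be moved into small helper functions. The names `normalize_brand` and `cars_of_brand` are hypothetical, and `toy` is a made-up stand-in for the dataset. Leaving `input()` outside the helpers keeps them easy to test.

```python
import pandas as pd


def normalize_brand(raw: str) -> str:
    """Turn user input such as ' Mercedes Benz ' into 'mercedes-benz'."""
    return raw.strip().lower().replace(" ", "-")


def cars_of_brand(df: pd.DataFrame, raw_brand: str) -> pd.DataFrame:
    """Return the rows whose 'make' matches the normalized brand name."""
    return df[df["make"] == normalize_brand(raw_brand)]


# Hypothetical stand-in for the autos dataset (rows are made up).
toy = pd.DataFrame({
    "make": ["mercedes-benz", "volvo", "mercedes-benz"],
    "price": [25552.0, 12940.0, 45400.0],
})

matches = cars_of_brand(toy, " Mercedes Benz ")
if matches.empty:
    print("No cars of that brand.")
else:
    print(f"{matches['price'].min()} to {matches['price'].max()}")
```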


**T3:** Ask the user to input a brand, then print the number of cars in the dataset for that brand, and all attributes for a random sample car of that brand.

<details>
<summary>Solution</summary>
<br>
&nbsp;&nbsp;&nbsp;<b>Example:</b><br>
&nbsp;&nbsp;&nbsp;Input the name of a brand: mazda<br>
&nbsp;&nbsp;&nbsp;There are 17 cars of brand 'mazda' in the dataset.<br><br>
&nbsp;&nbsp;&nbsp;Here is the data for a random 'mazda' car:<br>
&nbsp;&nbsp;&nbsp;aspiration = std<br>
&nbsp;&nbsp;&nbsp;body-style = sedan<br>
&nbsp;&nbsp;&nbsp;bore = 3.03<br>
&nbsp;&nbsp;&nbsp;city-mpg = 31<br>
&nbsp;&nbsp;&nbsp;compression-ratio = 9.0<br>
&nbsp;&nbsp;&nbsp;curb-weight = 1945<br>
&nbsp;&nbsp;&nbsp;drive-wheels = fwd<br>
&nbsp;&nbsp;&nbsp;engine-location = front<br>
&nbsp;&nbsp;&nbsp;engine-size = 91<br>
&nbsp;&nbsp;&nbsp;engine-type = ohc<br>
&nbsp;&nbsp;&nbsp;fuel-system = 2bbl<br>
&nbsp;&nbsp;&nbsp;fuel-type = gas<br>
&nbsp;&nbsp;&nbsp;height = 54.1<br>
&nbsp;&nbsp;&nbsp;highway-mpg = 38<br>
&nbsp;&nbsp;&nbsp;horsepower = 68.0<br>
&nbsp;&nbsp;&nbsp;length = 166.8<br>
&nbsp;&nbsp;&nbsp;make = mazda<br>
&nbsp;&nbsp;&nbsp;normalized-losses = 113.0<br>
&nbsp;&nbsp;&nbsp;num-of-cylinders = four<br>
&nbsp;&nbsp;&nbsp;num-of-doors = four<br>
&nbsp;&nbsp;&nbsp;peak-rpm = 5000.0<br>
&nbsp;&nbsp;&nbsp;price = 6695.0<br>
&nbsp;&nbsp;&nbsp;stroke = 3.15<br>
&nbsp;&nbsp;&nbsp;symboling = 1<br>
&nbsp;&nbsp;&nbsp;wheel-base = 93.1<br>
&nbsp;&nbsp;&nbsp;width = 64.2<br>
</details>

**My solution:**

In [66]:
while True:
    brand = input("Enter the name of a brand: ").strip()
    if not brand:
        print("Invalid input. Try again.")
    else:
        break

# Turn "Mercedes Benz" to "mercedes-benz" and so on
brand = brand.lower()
brand = brand.replace(" ", "-")

brand_cars = autos[autos["make"] == brand]
if brand_cars.empty:
    print(f"The brand '{brand}' does not exist in the dataset.")
else:
    print(f"There are {len(brand_cars)} cars of the brand '{brand}' in the dataset.")
    print(f"Here is the data of a random '{brand}' car:")
    # Beware: .sample() returns a one-row DataFrame, so .iloc[0] extracts the Series
    sample_car = brand_cars.sample().iloc[0]
    for column in autos.columns:
        print(f"{column}: {sample_car[column]}")

There are 11 cars of the brand 'volvo' in the dataset.
Here is the data of a random 'volvo' car:
aspiration: turbo
body-style: sedan
bore: 3.62
city-mpg: 17
compression-ratio: 7.5
curb-weight: 3045
drive-wheels: rwd
engine-location: front
engine-size: 130
engine-type: ohc
fuel-system: mpfi
fuel-type: gas
height: 56.2
highway-mpg: 22
horsepower: 162.0
length: 188.8
make: volvo
normalized-losses: 103.0
num-of-cylinders: four
num-of-doors: four
peak-rpm: 5100.0
price: 18420.0
stroke: 3.15
symboling: -2
wheel-base: 104.3
width: 67.2
horsepower_price: 113.70370370370371
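
Because `.sample()` picks a different row on each run, the output above changes every time the cell is executed. For a reproducible result, `random_state` can be set. The sketch below uses a hypothetical `toy` frame and also iterates over the sampled Series with `.items()` instead of looking up each column by name.

```python
import pandas as pd

# Hypothetical stand-in for the cars of one brand (rows are made up).
toy = pd.DataFrame({
    "make": ["mazda", "mazda", "mazda"],
    "body-style": ["sedan", "hatchback", "sedan"],
    "price": [6695.0, 5195.0, 8845.0],
})

# A fixed random_state makes the "random" pick repeatable between runs.
sample_car = toy.sample(n=1, random_state=0).iloc[0]

# .items() yields (column name, value) pairs of the sampled row.
for column, value in sample_car.items():
    print(f"{column} = {value}")
```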


**T4:** Ask the user to input a brand, then export all cars of that brand to a .csv file with the same name as the brand.

<details>
<summary>Solution</summary>
<br>
&nbsp;&nbsp;&nbsp;<b>Example 1:</b><br>
&nbsp;&nbsp;&nbsp;Input the name of a brand: volkswagen<br>
&nbsp;&nbsp;&nbsp;Exported 12 cars to 'volkswagen.csv'<br>
<br>
&nbsp;&nbsp;&nbsp;<b>Example 2:</b><br>
&nbsp;&nbsp;&nbsp;Input the name of a brand: tesla<br>
&nbsp;&nbsp;&nbsp;The brand 'tesla' does not exist in the dataset.<br>
</details>

**My solution:**

In [71]:
while True:
    brand = input("Enter the name of a brand: ").strip()
    if not brand:
        print("Invalid input. Try again.")
    else:
        break

# Turn "Mercedes Benz" to "mercedes-benz" and so on
brand = brand.lower()
brand = brand.replace(" ", "-")

brand_cars = autos[autos["make"] == brand]

if brand_cars.empty:
    print(f"The brand '{brand}' does not exist in the dataset.")
else:
    file_name = f"{brand}.csv"
    path = f"../Data/{file_name}"
    brand_cars.to_csv(path)
    print(f"Exported {len(brand_cars)} cars to '{file_name}'")

Exported 12 cars to 'volkswagen.csv'
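
By default, `to_csv` also writes the DataFrame index as an unnamed first column. If the row labels from the original dataset are not needed in the export, `index=False` leaves them out. The sketch below writes a hypothetical `toy` frame to a temporary directory, so nothing is created in `../Data`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the filtered cars (rows and index labels are made up).
toy = pd.DataFrame(
    {"make": ["volkswagen", "volkswagen"], "price": [7775.0, 7975.0]},
    index=[182, 183],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "volkswagen.csv"
    # index=False omits the row labels, so the file holds only the data columns.
    toy.to_csv(path, index=False)
    header = path.read_text().splitlines()[0]
    restored = pd.read_csv(path)

print(header)
```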
