### 1. Case Study Selection 
- Televisions listed on the Flipkart website

### 2. Search for Relevant Websites 
- Find websites that allow you to scrape data for the domain or use case you have chosen.

### 3. Define the Problem Statement
Customer Satisfaction Analysis:

***Target Feature*** : Reviews or Scores

***Problem Statement*** : Identify factors that impact customer satisfaction (measured by Scores or Reviews) and determine which features contribute to better customer reviews for TVs.

***Goal*** : This could be useful for understanding what TV features or attributes are associated with higher satisfaction levels and could guide customer-centered product recommendations.

### 4. Extract the Data from website

In [3]:
# importing modules
import pandas as pd
import requests
from bs4 import BeautifulSoup
import re
import numpy as np

# Scraping the data from the Flipkart website
url = "https://www.flipkart.com/search?q=television"
url

'https://www.flipkart.com/search?q=television'

In [4]:
rq = requests.get(url)
rq

<Response [403]>

In [5]:
request_header = {'Content-Type': 'text/html; charset=UTF-8','User-Agent': 'Chrome/101.0.0.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/119.0','Accept-Encoding': 'gzip, deflate, br'}

In [6]:
page = requests.get(url, headers=request_header)
page

<Response [200]>

### Brand, Inches

In [7]:
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(page.text, "html.parser")

In [7]:
soup.find_all("div",class_ = "KzDlHZ")

[<div class="KzDlHZ">Mi by Xiaomi A Series 80 cm (32 inch) HD Ready LED Smart Google TV 2024 Edition with 200+ Free Channel...</div>,
 <div class="KzDlHZ">Infinix 80 cm (32 inch) HD Ready LED Smart Linux TV 2024 Edition</div>,
 <div class="KzDlHZ">LG 32LQBPTA 80 cm (32 inch) HD Ready LED Smart WebOS TV with Alpha5 Gen5 AI Processor, ThinQ AI, AI So...</div>,
 <div class="KzDlHZ">TCL S5500 79.97 cm (32 inch) Full HD LED Smart Google TV 2024 Edition with 1.5 GB RAM + 16 GB ROM</div>,
 <div class="KzDlHZ">TCL L4B 79.97 cm (32 inch) HD Ready LED Smart Android TV 2024 Edition with Metallic Bezel Less and Chr...</div>,
 <div class="KzDlHZ">Thomson Alpha 80 cm (32 inch) HD Ready LED Smart Linux TV with 30 W Sound Output &amp; Bezel-Less Design</div>,
 <div class="KzDlHZ">Foxsky 80 cm (32 inch) HD Ready LED Smart Android TV</div>,
 <div class="KzDlHZ">LG 32LMBPTC 80 cm (32 inch) HD Ready LED Smart WebOS TV with Quad Core Processor, Active HDR, 60 Hz Re...</div>,
 <div class="KzDlHZ">iFFALCON b

In [22]:
for i in soup.find_all("div",class_="KzDlHZ"):
    print(i.text)

XElectron 60 cm (24 inch) HD Ready LED TV
XElectron 108 cm (43 inch) Ultra HD (4K) LED Smart Android TV
SAMSUNG 80 cm (32 Inch) HD Ready LED Smart Tizen TV with Bezel-Free Design | 300+ Free Channels | PurC...
Mi by Xiaomi A Series 80 cm (32 inch) HD Ready LED Smart Google TV 2024 Edition with 200+ Free Channel...
SAMSUNG New D Series Brighter Crystal 4K Vision Pro (2024 Edition) 108 cm (43 inch) Ultra HD (4K) LED ...
SAMSUNG 108 cm (43 inch) Full HD LED Smart Tizen TV with 300+ Free Channels | HDR | PurColor | Dolby D...
LG 32LMBPTC 80 cm (32 inch) HD Ready LED Smart WebOS TV with Quad Core Processor, Active HDR, 60 Hz Re...
LG 32LQBPTA 80 cm (32 inch) HD Ready LED Smart WebOS TV with Alpha5 Gen5 AI Processor, ThinQ AI, AI So...
TCL L4B 79.97 cm (32 inch) HD Ready LED Smart Android TV 2024 Edition with Metallic Bezel Less and Chr...
HUIDI 80 cm (32 inch) HD Ready LED TV 2024 Edition with Bezel Less Display
iFFALCON by TCL U64 139 cm (55 inch) Ultra HD (4K) LED Smart Google TV with 24W

In [6]:
Brand = []
Inches = []

for i in soup.find_all("div", class_="KzDlHZ"):
    text = i.text  
    
    brand_match = re.findall(r"^(\w+)(?:\s\w+)?", text)
    if brand_match:
        Brand.append(brand_match[0])  
    else:
        Brand.append(np.nan) 

    inch_match = re.findall(r"\((\d+)\s?inch\)", text)
    if inch_match:
        Inches.append(inch_match[0])  
    else:
        Inches.append(np.nan)  

print("Brand:", Brand)
print("Inches:", Inches)


Brand: ['XElectron', 'XElectron', 'SAMSUNG', 'Mi', 'Reliance', 'XElectron', 'SAMSUNG', 'SAMSUNG', 'LG', 'TCL', 'Infinix', 'iFFALCON', 'LG', 'HUIDI', 'TCL', 'MOTOROLA', 'Foxsky', 'KODAK', 'TCL', 'Daiwa', 'REDMI', 'SAMSUNG', 'Thomson', 'Acer']
Inches: ['32', '24', nan, '32', '55', '32', '43', '43', '32', '32', '32', '55', '32', nan, '32', '43', '32', '32', '55', '32', '43', '55', nan, '40']


In [25]:
len(Brand)

24

In [26]:
len(Inches)

24

In [78]:
pd.DataFrame({"Brand":Brand,"Inches":Inches,})

Unnamed: 0,Brand,Inches
0,XElectron,24.0
1,XElectron,43.0
2,SAMSUNG,
3,Mi,32.0
4,SAMSUNG,43.0
5,SAMSUNG,43.0
6,LG,32.0
7,LG,32.0
8,TCL,32.0
9,HUIDI,32.0
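
The missing `Inches` value for SAMSUNG appears to come from the title `SAMSUNG 80 cm (32 Inch) ...`, which capitalises "Inch" while the pattern above matches only lowercase `inch`. A minimal sketch of a case-insensitive variant follows. The helper name `extract_inches` is hypothetical, and the decimal support is a precaution rather than something seen in the scraped titles.

```python
import re

# Case-insensitive pattern; also accepts a decimal size such as "(31.5 inch)".
INCH_PATTERN = re.compile(r"\((\d+(?:\.\d+)?)\s?inch\)", re.IGNORECASE)


def extract_inches(title):
    """Return the screen size in inches as a float, or None if absent."""
    match = INCH_PATTERN.search(title)
    return float(match.group(1)) if match else None


titles = [
    "SAMSUNG 80 cm (32 Inch) HD Ready LED Smart Tizen TV with Bezel-Free Design",
    "XElectron 60 cm (24 inch) HD Ready LED TV",
]
print([extract_inches(t) for t in titles])
```

Returning a number instead of a string also avoids a separate type conversion later.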


### Ratings, Reviews, Scores

In [7]:
# Lists to store data
Scores = []
Ratings = []
Reviews = []

for item in soup.find_all("div", class_="tUxRFH"):
    score_text = item.find("div", class_="XQDdHH")
    if score_text:
        Scores.append(score_text.text)
    else:
        Scores.append(np.nan)

    rating_review_text = item.find("span", class_="Wphh3N")
    if rating_review_text:
        rating_review_text = rating_review_text.text
    else:
        rating_review_text = ""

    rating_match = re.search(r"([\d,]+)\sRatings", rating_review_text)
    if rating_match:
        Ratings.append(rating_match.group(1))
    else:
        Ratings.append(np.nan)

    review_match = re.search(r"([\d,]+)\sReviews", rating_review_text)
    if review_match:
        Reviews.append(review_match.group(1))
    else:
        Reviews.append(np.nan)

# Output results to verify
print("Scores:", Scores)
print("Ratings Count:", Ratings)
print("Reviews Count:", Reviews)


Scores: ['4.1', '4.1', '4.3', '4.3', nan, '4.1', '4.3', '4.3', '4.3', '4.1', '4.2', '4.2', '4.3', '3.8', '4.1', '4.3', '4', '4.2', '4.1', '4.1', '4.2', '4.3', '4.4', '4.2']
Ratings Count: ['777', '777', '1,47,714', '1,68,634', nan, '777', '36,365', '1,47,714', '21,185', '22,685', '48,479', '67,052', '21,185', '1,776', '22,685', '3,583', '605', '5,745', '5,013', '2,189', '8,945', '36,365', '41,269', '26,058']
Reviews Count: ['155', '155', '10,942', '13,109', nan, '155', '2,899', '10,942', '1,405', '1,898', '5,109', '7,515', '1,405', '153', '1,898', '309', '93', '579', '554', '211', '553', '2,899', '5,610', '2,825']


In [97]:
len(Scores)

24

In [98]:
len(Ratings)

24

In [99]:
len(Reviews)

24

In [101]:
pd.DataFrame({"Brand":Brand,"Inches":Inches,"Scores":Scores,"Reviews":Reviews,"Ratings":Ratings})

Unnamed: 0,Brand,Inches,Scores,Reviews,Ratings
0,XElectron,24.0,4.1,154,776
1,XElectron,43.0,4.1,154,776
2,SAMSUNG,,4.3,10935,147586
3,Mi,32.0,4.3,13099,168507
4,SAMSUNG,43.0,4.3,2894,36269
5,SAMSUNG,43.0,4.3,10935,147586
6,LG,32.0,4.3,1400,21115
7,LG,32.0,4.3,1400,21115
8,TCL,32.0,4.1,1887,22568
9,HUIDI,32.0,4.0,816,7659
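
The rating and review counts are scraped as strings with Indian-style digit grouping (for example `'1,47,714'`), so they cannot be used as numbers directly. A small sketch of one way to convert them is shown here. `parse_count` is a hypothetical helper, not part of the notebook.

```python
def parse_count(text):
    """Convert a count such as '1,47,714' to an int; return None if missing."""
    if not isinstance(text, str):
        return None  # covers np.nan (a float) and None
    digits = text.replace(",", "")
    return int(digits) if digits.isdigit() else None


ratings = ["777", "1,47,714", float("nan"), "36,365"]
print([parse_count(r) for r in ratings])
```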


### OS, Launch Year, Warranty

In [105]:
soup.find_all("div",class_="_6NESgJ")

[<div class="_6NESgJ"><ul class="G4BRas"><li class="J+igdf">HD Ready 1366 x 768 Pixels</li><li class="J+igdf">Launch Year: 2023</li><li class="J+igdf">1 Year warranty from the date of purchase. For more help &amp; support call us at: 8527312304 or mail us at: customercare@xelectron.com</li></ul></div>,
 <div class="_6NESgJ"><ul class="G4BRas"><li class="J+igdf">Operating System: Android</li><li class="J+igdf">Ultra HD (4K) 3840 x 2160 Pixels</li><li class="J+igdf">Launch Year: 2022</li><li class="J+igdf">1 Year warranty from the date of purchase. For more help &amp; support call us at: 8527312304 or mail us at: customercare@xelectron.com</li></ul></div>,
 <div class="_6NESgJ"><ul class="G4BRas"><li class="J+igdf">Operating System: Tizen</li><li class="J+igdf">HD Ready 1366 x 768 Pixels</li><li class="J+igdf">Launch Year: 2022</li><li class="J+igdf">1 Year Comprehensive Warranty on Product and 1 Year Additional on Panel</li></ul></div>,
 <div class="_6NESgJ"><ul class="G4BRas"><li class

In [103]:
for i in soup.find_all("div",class_="_6NESgJ"):
    print(i.text)

HD Ready 1366 x 768 PixelsLaunch Year: 20231 Year warranty from the date of purchase. For more help & support call us at: 8527312304 or mail us at: customercare@xelectron.com
Operating System: AndroidUltra HD (4K) 3840 x 2160 PixelsLaunch Year: 20221 Year warranty from the date of purchase. For more help & support call us at: 8527312304 or mail us at: customercare@xelectron.com
Operating System: TizenHD Ready 1366 x 768 PixelsLaunch Year: 20221 Year Comprehensive Warranty on Product and 1 Year Additional on Panel
Operating System: Google TVHD Ready 1366 x 768 PixelsLaunch Year: 20241 Year Warranty on Product and 1 Year Additional Warranty on Panel
Operating System: TizenUltra HD (4K) 3840 x 2160 PixelsLaunch Year: 20242 Year Warranty (1 Year Standard Warranty + 1 Year additional warranty on Panel)
Operating System: TizenFull HD 1920 x 1080 PixelsLaunch Year: 20231 Year Comprehensive Warranty on Product and 1 Year Additional Warranty on Panel
Operating System: WebOSHD Ready 1366 x 768 P

In [8]:
os_list = []
warranty_list = []
launch_year_list = []

for item in soup.find_all("div", class_="_6NESgJ"):
    text = item.text
    
    os_match = re.search(r"Operating System: (\w+)", text)
    if os_match:
        os_list.append(os_match.group(1))
    else:
        os_list.append(None)
    
    year_match = re.search(r"Launch Year: (\d{4})", text)
    if year_match:
        launch_year_list.append(year_match.group(1))
    else:
        launch_year_list.append(None)
    
    warranty_match = re.search(r"(\d)\s+Year", text) 
    if warranty_match:
        warranty_list.append(warranty_match.group(1))
    else:
        warranty_list.append(None)

# Output the extracted lists
print("Operating Systems:", os_list)
print("Launch Years:", launch_year_list)
print("Warranties:", warranty_list)


Operating Systems: [None, None, 'TizenHD', 'Google', 'WebOSUltra', 'AndroidHD', 'TizenUltra', 'TizenFull', 'WebOSHD', 'AndroidHD', 'LinuxHD', 'Google', 'WebOSHD', 'Android', 'Google', 'Google', 'Google', 'LinuxHD', 'Google', 'LinuxHD', 'FireTv', 'TizenUltra', 'LinuxHD', 'Google']
Launch Years: ['2023', '2023', '2022', '2024', '2024', '2022', '2024', '2023', '2020', '2024', '2024', '2024', '2023', '2024', '2024', '2024', '2024', '2024', '2024', '2024', '2024', '2024', '2023', '2024']
Warranties: ['1', '1', '1', '1', '1', '1', '2', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '2', None, '2', '2', '1', '1']


In [8]:
len(os_list)

24

In [9]:
len(launch_year_list)

24

In [10]:
len(warranty_list)

24

In [17]:
pd.DataFrame({"Brand":Brand,"Inches":Inches,"Scores":Scores,"Reviews":Reviews,"Ratings":Ratings,\
              "Os":os_list,"Launch Year":launch_year_list,"Warranty":warranty_list})

Unnamed: 0,Brand,Inches,Scores,Reviews,Ratings,Os,Launch Year,Warranty
0,XElectron,24.0,4.1,154.0,776.0,,2023,1.0
1,Reliance,32.0,3.9,1.0,50.0,AndroidHD,2024,1.0
2,SAMSUNG,,4.3,10935.0,147586.0,TizenHD,2022,1.0
3,Mi,32.0,4.3,13099.0,168507.0,Google,2024,1.0
4,Reliance,32.0,3.8,36.0,392.0,AndroidHD,2023,1.0
5,XElectron,43.0,4.1,154.0,776.0,AndroidUltra,2022,1.0
6,SAMSUNG,43.0,4.3,2894.0,36269.0,TizenUltra,2024,2.0
7,LG,32.0,4.3,1400.0,21115.0,WebOSHD,2020,1.0
8,SAMSUNG,43.0,4.3,10935.0,147586.0,TizenFull,2023,1.0
9,TCL,32.0,4.1,1887.0,22568.0,AndroidHD,2024,1.0
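
Values such as `TizenHD` and `WebOSUltra` arise because `item.text` joins the `<li>` entries without a separator, so `\w+` runs on into the resolution text, and `Google TV` is cut to `Google` at the space. One possible fix is to read each `<li>` separately, for instance with `[li.get_text() for li in item.find_all("li")]`, and parse the resulting strings one by one. The sketch below assumes that list of strings is already available. `parse_specs` is a hypothetical helper, and it assumes the warranty line starts with the number of years, as in the samples printed above.

```python
import re


def parse_specs(items):
    """Pull OS, launch year and warranty years from a list of <li> strings."""
    specs = {"os": None, "launch_year": None, "warranty_years": None}
    for line in items:
        line = line.strip()
        if line.startswith("Operating System:"):
            specs["os"] = line.split(":", 1)[1].strip()
        elif line.startswith("Launch Year:"):
            year = re.search(r"\d{4}", line)
            specs["launch_year"] = int(year.group()) if year else None
        elif specs["warranty_years"] is None:
            # Assumption: the warranty entry begins with "<n> Year".
            warranty = re.match(r"(\d+)\s+Years?\b", line, re.IGNORECASE)
            if warranty:
                specs["warranty_years"] = int(warranty.group(1))
    return specs


items = [
    "Operating System: Google TV",
    "HD Ready 1366 x 768 Pixels",
    "Launch Year: 2024",
    "1 Year Warranty on Product and 1 Year Additional Warranty on Panel",
]
print(parse_specs(items))
```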


### Selling Price, Original Price, Discount

In [19]:
soup.find_all("div",class_ = "hl05eU")

[<div class="hl05eU"><div class="Nx9bqj _4b5DiR">₹23,990</div><div class="yRaY8j ZYYwLA">₹<!-- -->29,990</div><div class="UkUFwK"><span>20% off</span></div></div>,
 <div class="hl05eU"><div class="Nx9bqj _4b5DiR">₹14,999</div><div class="yRaY8j ZYYwLA">₹<!-- -->31,400</div><div class="UkUFwK"><span>52% off</span></div></div>,
 <div class="hl05eU"><div class="Nx9bqj _4b5DiR">₹43,990</div><div class="yRaY8j ZYYwLA">₹<!-- -->73,000</div><div class="UkUFwK"><span>39% off</span></div></div>,
 <div class="hl05eU"><div class="Nx9bqj _4b5DiR">₹3,05,900</div><div class="yRaY8j ZYYwLA">₹<!-- -->3,99,900</div><div class="UkUFwK"><span>23% off</span></div></div>,
 <div class="hl05eU"><div class="Nx9bqj _4b5DiR">₹5,990</div><div class="yRaY8j ZYYwLA">₹<!-- -->13,999</div><div class="UkUFwK"><span>57% off</span></div></div>,
 <div class="hl05eU"><div class="Nx9bqj _4b5DiR">₹19,999</div><div class="yRaY8j ZYYwLA">₹<!-- -->34,999</div><div class="UkUFwK"><span>42% off</span></div></div>,
 <div class="

In [96]:
sp = []
for i in soup.find_all("div",class_="tUxRFH"):
        a =i.find("div",class_="Nx9bqj _4b5DiR")
        if a :
            sp.append(a.text)
        else:
            sp.append(np.nan)

In [97]:
print(sp)
print(len(sp))

['₹4,45,000', '₹9,799', '₹24,490', '₹31,889', '₹5,990', '₹19,999', '₹1,10,000', '₹29,999', '₹8,999', '₹21,990', '₹23,499', '₹13,499', '₹15,999', '₹28,999', '₹34,390', '₹53,090', '₹59,912', '₹34,599', '₹16,390', '₹34,999', '₹22,199', '₹11,099', '₹35,500', '₹69,999']
24


In [14]:
op = []
for i in soup.find_all("div",class_="tUxRFH"):
        a =i.find("div",class_="yRaY8j ZYYwLA")
        if a :
            op.append(a.text)
        else:
            op.append(np.nan)

In [15]:
print(op)
print(len(op))

['₹29,990', '₹31,400', '₹73,000', '₹3,99,900', '₹13,999', '₹34,999', '₹65,990', '₹1,00,790', '₹31,999', '₹42,999', '₹2,54,990', nan, '₹17,599', '₹1,19,990', '₹59,990', '₹47,999', '₹24,999', '₹20,000', nan, '₹54,999', '₹37,999', '₹24,999', '₹5,49,990', '₹39,990']
24


In [22]:
soup.find_all("div",class_="UkUFwK")

[<div class="UkUFwK"><span>20% off</span></div>,
 <div class="UkUFwK"><span>52% off</span></div>,
 <div class="UkUFwK"><span>39% off</span></div>,
 <div class="UkUFwK"><span>23% off</span></div>,
 <div class="UkUFwK"><span>57% off</span></div>,
 <div class="UkUFwK"><span>42% off</span></div>,
 <div class="UkUFwK"><span>43% off</span></div>,
 <div class="UkUFwK"><span>37% off</span></div>,
 <div class="UkUFwK"><span>31% off</span></div>,
 <div class="UkUFwK"><span>33% off</span></div>,
 <div class="UkUFwK"><span>49% off</span></div>,
 <div class="UkUFwK"><span>60% off</span></div>,
 <div class="UkUFwK"><span>41% off</span></div>,
 <div class="UkUFwK"><span>38% off</span></div>,
 <div class="UkUFwK"><span>33% off</span></div>,
 <div class="UkUFwK"><span>32% off</span></div>,
 <div class="UkUFwK"><span>41% off</span></div>,
 <div class="UkUFwK"><span>41% off</span></div>,
 <div class="UkUFwK"><span>67% off</span></div>,
 <div class="UkUFwK"><span>36% off</span></div>,
 <div class="UkUFwK"

In [25]:
dp = []
for i in soup.find_all("div",class_="tUxRFH"):
        a =i.find("div",class_="UkUFwK")
        if a :
            dp.append(a.text)
        else:
            dp.append(np.nan)

In [26]:
print(dp)
print(len(dp))

['20% off', '52% off', '39% off', '23% off', '57% off', '42% off', '43% off', '37% off', '31% off', '33% off', '49% off', nan, '60% off', '41% off', '38% off', '33% off', '32% off', '41% off', nan, '41% off', '67% off', '36% off', '38% off', '52% off']
24
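
The prices and discounts are still text (`'₹23,990'`, `'20% off'`). A minimal sketch of turning them into integers is given below, together with a consistency check of the discount against the two prices. The helper names are hypothetical. The sample values come from the first two `hl05eU` blocks printed above.

```python
import re


def parse_price(text):
    """Convert '₹4,45,000' to 445000; return None for missing values."""
    if not isinstance(text, str):
        return None
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


def parse_discount(text):
    """Convert '20% off' to 20; return None for missing values."""
    if not isinstance(text, str):
        return None
    match = re.search(r"(\d+)\s*%", text)
    return int(match.group(1)) if match else None


def implied_discount(selling, original):
    """Discount percentage implied by the two prices, rounded to an integer."""
    return round(100 * (1 - selling / original))


selling = parse_price("₹23,990")
original = parse_price("₹29,990")
print(selling, original, parse_discount("20% off"), implied_discount(selling, original))
```

Comparing the scraped discount with the implied one is a cheap way to spot rows where the price columns have drifted out of alignment.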


In [8]:
Brand = []
Inches = []
Scores = []
Ratings = []
Reviews = []
os_list = []
warranty_list = []
launch_year_list = []
sp = []
op = []
dp = []
for page_num in range(1, 25):  # pages 1 to 24 of the search results
    url = f"https://www.flipkart.com/search?q=television&page={page_num}"

    page = requests.get(url, headers=request_header)
    soup = BeautifulSoup(page.text, "html.parser")

    #for i in soup.find_all("div",class_ = "tUxRFH"):
     #   tv_name.append(i.text)
    for i in soup.find_all("div", class_="KzDlHZ"):
        text = i.text  
    
        brand_match = re.findall(r"^(\w+)(?:\s\w+)?", text)
        if brand_match:
            Brand.append(brand_match[0])  
        else:
            Brand.append(np.nan) 
    
        inch_match = re.findall(r"\((\d+)\s?inch\)", text)
        if inch_match:
            Inches.append(inch_match[0])  
        else:
            Inches.append(np.nan)  
    for item in soup.find_all("div", class_="tUxRFH"):
        score_text = item.find("div", class_="XQDdHH")
        if score_text:
            Scores.append(score_text.text)
        else:
            Scores.append(np.nan)
    
        rating_review_text = item.find("span", class_="Wphh3N")
        if rating_review_text:
            rating_review_text = rating_review_text.text
        else:
            rating_review_text = ""
    
        rating_match = re.search(r"([\d,]+)\sRatings", rating_review_text)
        if rating_match:
            Ratings.append(rating_match.group(1))
        else:
            Ratings.append(np.nan)
    
        review_match = re.search(r"([\d,]+)\sReviews", rating_review_text)
        if review_match:
            Reviews.append(review_match.group(1))
        else:
            Reviews.append(np.nan)

    for item in soup.find_all("div", class_="_6NESgJ"):
        text = item.text
        
        os_match = re.search(r"Operating System: (\w+)", text)
        if os_match:
            os_list.append(os_match.group(1))
        else:
            os_list.append(np.nan)
        
        year_match = re.search(r"Launch Year: (\d{4})", text)
        if year_match:
            launch_year_list.append(year_match.group(1))
        else:
            launch_year_list.append(np.nan)
        
        warranty_match = re.search(r"(\d)\s+Year", text)  
        if warranty_match:
            warranty_list.append(warranty_match.group(1))  
        else:
            warranty_list.append(np.nan)

    for i in soup.find_all("div",class_="tUxRFH"):
        a =i.find("div",class_="Nx9bqj _4b5DiR")
        if a :
            sp.append(a.text)
        else:
            sp.append(np.nan)
    for i in soup.find_all("div",class_="tUxRFH"):
        a =i.find("div",class_="yRaY8j ZYYwLA")
        if a :
            op.append(a.text)
        else:
            op.append(np.nan)
    for i in soup.find_all("div",class_="tUxRFH"):
        a =i.find("div",class_="UkUFwK")
        if a :
            dp.append(a.text)
        else:
            dp.append(np.nan)

In [9]:
print(len(Brand))
print(len(Inches))
print(len(Scores))
print(len(Ratings))
print(len(Reviews))
print(len(os_list))
print(len(warranty_list))
print(len(launch_year_list))
print(len(sp))
print(len(op))
print(len(dp))

576
576
576
576
576
576
576
576
576
576
576
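
All eleven lists have the same length here, but they are filled by separate `find_all` passes over different elements, so a single card without one of those elements could shift every later value by one row. A sketch of a per-card alternative follows. It assumes that each `tUxRFH` card contains its own title, score and price elements, which the notebook does not confirm. The sample HTML is made up for illustration and the helper names are hypothetical.

```python
from bs4 import BeautifulSoup


def text_or_none(card, tag, class_name):
    """Return the stripped text of the first matching child, or None."""
    node = card.find(tag, class_=class_name)
    return node.get_text(strip=True) if node else None


def parse_cards(html):
    """Build one record per product card so fields cannot shift between rows."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for card in soup.find_all("div", class_="tUxRFH"):
        records.append({
            "title": text_or_none(card, "div", "KzDlHZ"),
            "score": text_or_none(card, "div", "XQDdHH"),
            "selling_price": text_or_none(card, "div", "Nx9bqj"),
            "original_price": text_or_none(card, "div", "yRaY8j"),
            "discount": text_or_none(card, "div", "UkUFwK"),
        })
    return records


# Illustrative markup only; the real page structure may differ.
sample_html = """
<div class="tUxRFH"><div class="KzDlHZ">XElectron 60 cm (24 inch) HD Ready LED TV</div>
  <div class="XQDdHH">4.1</div><div class="Nx9bqj _4b5DiR">₹5,990</div></div>
<div class="tUxRFH"><div class="KzDlHZ">Foxsky 80 cm (32 inch) HD Ready LED Smart Android TV</div></div>
"""
for record in parse_cards(sample_html):
    print(record)
```

A list of dictionaries like this can be passed straight to `pd.DataFrame`, and a missing field stays in its own row as `None`.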


### 5. Create a Data Frame 

- Convert the scraped data to DataFrame 


In [9]:
df = pd.DataFrame({"Brand":Brand,"Inches":Inches,"Scores":Scores,"Reviews":Reviews,"Ratings":Ratings,\
              "OS":os_list,"Launch Year":launch_year_list,"Warranty":warranty_list,\
               "Selling_Price":sp,"Original_Price":op,"Discount_Percentage":dp})
df

Unnamed: 0,Brand,Inches,Scores,Reviews,Ratings,OS,Launch Year,Warranty,Selling_Price,Original_Price,Discount_Percentage
0,XElectron,32,4.1,157,789,,2023,1,"₹7,650","₹17,999",57% off
1,XElectron,24,4.1,157,789,,2023,1,"₹5,950","₹13,999",57% off
2,LG,32,4.3,1454,22008,WebOSHD,2023,1,"₹13,990","₹19,990",30% off
3,Mi,32,4.3,13210,170109,Google,2024,1,"₹12,990","₹24,999",48% off
4,XElectron,43,4.1,157,789,AndroidUltra,2022,1,"₹19,950","₹34,999",42% off
...,...,...,...,...,...,...,...,...,...,...,...
571,LIMEBERRY,50,,,,WebOSUltra,2024,1,"₹48,019","₹89,999",46% off
572,LIMEBERRY,32,,,,Google,2023,1,"₹12,500","₹37,999",67% off
573,TOSHIBA,85,4,2,5,VIDAAUltra,2023,2,"₹1,99,999","₹4,99,999",60% off
574,Haier,32,,,,,2021,1,"₹12,990","₹25,990",50% off


In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 576 entries, 0 to 575
Data columns (total 11 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Brand                576 non-null    object
 1   Inches               571 non-null    object
 2   Scores               519 non-null    object
 3   Reviews              519 non-null    object
 4   Ratings              519 non-null    object
 5   OS                   535 non-null    object
 6   Launch Year          576 non-null    object
 7   Warranty             517 non-null    object
 8   Selling_Price        574 non-null    object
 9   Original_Price       574 non-null    object
 10  Discount_Percentage  573 non-null    object
dtypes: object(11)
memory usage: 49.6+ KB


### 6. Export into .csv format 
- Export the data frame into .csv format 


In [14]:
data = df.to_csv()
data

',Brand,Inches,Scores,Reviews,Ratings,OS,Launch Year,Warranty,Selling_Price,Original_Price,Discount_Percentage\r\n0,XElectron,32,4.1,157,789,,2023,1,"₹7,650","₹17,999",57% off\r\n1,XElectron,24,4.1,157,789,,2023,1,"₹5,950","₹13,999",57% off\r\n2,LG,32,4.3,"1,454","22,008",WebOSHD,2023,1,"₹13,990","₹19,990",30% off\r\n3,Mi,32,4.3,"13,210","1,70,109",Google,2024,1,"₹12,990","₹24,999",48% off\r\n4,XElectron,43,4.1,157,789,AndroidUltra,2022,1,"₹19,950","₹34,999",42% off\r\n5,Infinix,32,4.2,"5,150","48,900",LinuxHD,2024,1,"₹8,499","₹16,999",50% off\r\n6,TCL,32,4.1,"2,000","23,694",AndroidHD,2024,1,"₹10,490","₹20,990",50% off\r\n7,TCL,32,4.1,"2,000","23,694",Google,2024,1,"₹12,990","₹23,990",45% off\r\n8,Thomson,32,4.4,"5,666","41,671",LinuxHD,2022,1,"₹8,499","₹14,999",43% off\r\n9,MOTOROLA,43,4.3,337,"3,910",Google,2024,1,"₹21,699","₹51,999",58% off\r\n10,Foxsky,32,3.9,764,"3,945",AndroidHD,2023,1,"₹7,699","₹22,499",65% off\r\n11,iFFALCON,55,4.2,"7,573","67,670",Google,2024,1,"₹26,999","₹73
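
Called without a path, `df.to_csv()` returns the CSV text as a string and writes nothing to disk. The sketch below shows a round trip using a small made-up frame. An in-memory buffer stands in for a real file path such as `"flipkart_tv.csv"`, which is a hypothetical name.

```python
import io

import pandas as pd

sample = pd.DataFrame({"Brand": ["XElectron", "LG"], "Inches": ["32", "32"]})

# The buffer plays the role of a file; pass a path string to write to disk instead.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)  # index=False: do not write the row index as a column
buffer.seek(0)

restored = pd.read_csv(buffer)
print(restored.columns.tolist())
```

When the index is written (the default), reading the file back yields an extra `Unnamed: 0` column unless `index_col=0` is passed to `pd.read_csv`.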

### 7. Read CSV File 
- After exporting the data frame, import it again from the .csv file.


#### How many features (columns) do you have?

The data frame has 11 columns. Here are brief descriptions of each:

1. **Brand** : The brand name of the TV.
2. **Inches** : Screen size of the TV in inches.
3. **Scores** : Average score rating given by customers.
4. **Reviews** : Total number of customer reviews for the TV.
5. **Ratings** : Total number of customer ratings for the TV.
6. **OS** : Operating system of the TV.
7. **Launch Year** : Year the TV model was released.
8. **Warranty** : Warranty period (in years) offered with the TV.
9. **Selling Price** : Current selling price of the TV.
10. **Original Price**: Original listed price of the TV before discounts.
11. **Discount Percentage** : Discount on the original price, scraped as text such as "57% off".

#### How many observations(rows) do you have?


 - I extracted 576 rows from the Flipkart website using web scraping.

#### What is the data type of each feature(Columns)?


- When I extracted the data from Flipkart, every column was stored with the `object` data type.
- The next step is to convert each column to its correct data type.

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 576 entries, 0 to 575
Data columns (total 11 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Brand                576 non-null    object
 1   Inches               571 non-null    object
 2   Scores               519 non-null    object
 3   Reviews              519 non-null    object
 4   Ratings              519 non-null    object
 5   OS                   535 non-null    object
 6   Launch Year          576 non-null    object
 7   Warranty             517 non-null    object
 8   Selling_Price        574 non-null    object
 9   Original_Price       574 non-null    object
 10  Discount_Percentage  573 non-null    object
dtypes: object(11)
memory usage: 49.6+ KB
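
Columns that already hold plain digit strings, such as `Inches`, `Scores` and `Launch Year`, can be converted with `pd.to_numeric`. This is a sketch on a small made-up frame rather than the scraped data.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "Inches": ["32", "24", np.nan],
    "Scores": ["4.1", np.nan, "4.3"],
    "Launch Year": ["2023", "2023", "2022"],
})

for column in ["Inches", "Scores", "Launch Year"]:
    # errors="coerce" turns unparseable entries into NaN instead of raising.
    sample[column] = pd.to_numeric(sample[column], errors="coerce")

print(sample.dtypes)
```

Columns that contain missing values come out as floats, because NaN has no integer representation in the default NumPy-backed dtypes.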


#### How many missing values are there?


- According to the `df.info()` output above, 9 of the 11 columns contain missing values.
- Those columns are Inches (5 values), Scores (57 values), Reviews (57 values), Ratings (57 values), OS (41 values), Warranty (59 values), Selling_Price (2 values), Original_Price (2 values) and Discount_Percentage (3 values).
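
The missing counts can be cross-checked against the `df.info()` output by subtracting each non-null count from the 576 rows. The sketch below copies the non-null counts from that output by hand.

```python
TOTAL_ROWS = 576

# Non-null counts copied from the df.info() output above.
non_null = {
    "Brand": 576, "Inches": 571, "Scores": 519, "Reviews": 519,
    "Ratings": 519, "OS": 535, "Launch Year": 576, "Warranty": 517,
    "Selling_Price": 574, "Original_Price": 574, "Discount_Percentage": 573,
}

# Keep only the columns that actually have gaps.
missing = {
    column: TOTAL_ROWS - count
    for column, count in non_null.items()
    if count < TOTAL_ROWS
}
print(missing)
```

On the live data frame, `df.isna().sum()` gives the same per-column counts directly.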

### 8. Clean the Data : 
In this section you have to clean the data by:
- Removing special characters
- Fixing incorrect headers
- Fixing incorrectly formatted data (invalid values, columns)
- Converting the data types
- Imputing the missing values
- Identifying and treating the outliers

***Note: Fixing Rows and Columns***

In [18]:
df

Unnamed: 0,Brand,Inches,Scores,Reviews,Ratings,OS,Launch Year,Warranty,Selling_Price,Original_Price,Discount_Percentage
0,XElectron,32,4.1,157,789,,2023,1,"₹7,650","₹17,999",57% off
1,XElectron,24,4.1,157,789,,2023,1,"₹5,950","₹13,999",57% off
2,LG,32,4.3,1454,22008,WebOSHD,2023,1,"₹13,990","₹19,990",30% off
3,Mi,32,4.3,13210,170109,Google,2024,1,"₹12,990","₹24,999",48% off
4,XElectron,43,4.1,157,789,AndroidUltra,2022,1,"₹19,950","₹34,999",42% off
...,...,...,...,...,...,...,...,...,...,...,...
571,LIMEBERRY,50,,,,WebOSUltra,2024,1,"₹48,019","₹89,999",46% off
572,LIMEBERRY,32,,,,Google,2023,1,"₹12,500","₹37,999",67% off
573,TOSHIBA,85,4,2,5,VIDAAUltra,2023,2,"₹1,99,999","₹4,99,999",60% off
574,Haier,32,,,,,2021,1,"₹12,990","₹25,990",50% off


In [81]:
Data1.groupby(["OS"]).size().reset_index(name="count")

Unnamed: 0,OS,count
0,Android,11
1,AndroidFull,26
2,AndroidHD,31
3,AndroidUltra,54
4,CoolitaFull,6
5,CoolitaHD,4
6,CoolitaUltra,1
7,FireTv,6
8,Google,315
9,Linux,2
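
Labels such as `AndroidFull`, `AndroidHD` and `AndroidUltra` look like the same operating system with a fragment of the resolution text attached. One way to merge them is to strip a trailing `HD`, `Full` or `Ultra` before counting. This sketch uses only the ten rows visible above and assumes no genuine OS name ends in one of those suffixes.

```python
import re
from collections import Counter

# Counts copied from the visible rows of the groupby output above.
os_counts = {
    "Android": 11, "AndroidFull": 26, "AndroidHD": 31, "AndroidUltra": 54,
    "CoolitaFull": 6, "CoolitaHD": 4, "CoolitaUltra": 1,
    "FireTv": 6, "Google": 315, "Linux": 2,
}


def normalise_os(label):
    """Drop a trailing resolution fragment (HD, Full or Ultra) from an OS label."""
    return re.sub(r"(?:HD|Full|Ultra)$", "", label)


merged = Counter()
for label, count in os_counts.items():
    merged[normalise_os(label)] += count

print(dict(merged))
```

On the data frame itself, the same function could be applied with `Data1["OS"].map(normalise_os, na_action="ignore")` so that missing values are left untouched.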
