# Bengaluru House Price Prediction

## About this dataset file

- What are the things that a potential home buyer considers before purchasing a house? The location, the size of the property, vicinity to offices, schools, parks, restaurants, hospitals or the stereotypical white picket fence? What about the most important factor — the price?

- Now with the lingering impact of demonetization, the enforcement of the Real Estate (Regulation and Development) Act (RERA), and the lack of trust in property developers in the city, housing units sold across India in 2017 dropped by 7 percent. In fact, the property prices in Bengaluru fell by almost 5 percent in the second half of 2017, said a study published by property consultancy Knight Frank.

- For example, for a potential homeowner, over 9,000 apartment projects and flats for sale are available in the range of ₹42-52 lakh, followed by over 7,100 apartments that are in the ₹52-62 lakh budget segment, says a report by property website Makaan. According to the study, there are over 5,000 projects in the ₹15-25 lakh budget segment followed by those in the ₹34-43 lakh budget category.

- Buying a home, especially in a city like Bengaluru, is a tricky choice. While the major factors are usually the same for all metros, there are others to be considered for the Silicon Valley of India. With its millennial crowd, vibrant culture, great climate and a slew of job opportunities, it is difficult to ascertain the price of a house in Bengaluru.

In [1]:
import numpy as np
import pandas as pd

In [2]:
data=pd.read_csv("Bengaluru_House_Data.csv")

In [3]:
data.sample(5)

Unnamed: 0,area_type,availability,location,size,society,total_sqft,bath,balcony,price
9174,Super built-up Area,Ready To Move,Choodasandra,2 BHK,Maidsr,1075,2.0,2.0,45.0
12961,Built-up Area,Ready To Move,Kavika Layout,6 BHK,,1799,6.0,3.0,101.0
335,Super built-up Area,19-Dec,Whitefield,2 BHK,Oreldhi,1173,2.0,1.0,58.0
2313,Super built-up Area,Ready To Move,Malleshwaram,2 BHK,SrntsRV,900,2.0,1.0,95.0
7065,Built-up Area,Ready To Move,Ashwathnagar,2 BHK,,1100,2.0,2.0,55.0


In [4]:
data.head()

Unnamed: 0,area_type,availability,location,size,society,total_sqft,bath,balcony,price
0,Super built-up Area,19-Dec,Electronic City Phase II,2 BHK,Coomee,1056,2.0,1.0,39.07
1,Plot Area,Ready To Move,Chikka Tirupathi,4 Bedroom,Theanmp,2600,5.0,3.0,120.0
2,Built-up Area,Ready To Move,Uttarahalli,3 BHK,,1440,2.0,3.0,62.0
3,Super built-up Area,Ready To Move,Lingadheeranahalli,3 BHK,Soiewre,1521,3.0,1.0,95.0
4,Super built-up Area,Ready To Move,Kothanur,2 BHK,,1200,2.0,1.0,51.0


In [5]:
data.shape

(13320, 9)

In [6]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13320 entries, 0 to 13319
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   area_type     13320 non-null  object 
 1   availability  13320 non-null  object 
 2   location      13319 non-null  object 
 3   size          13304 non-null  object 
 4   society       7818 non-null   object 
 5   total_sqft    13320 non-null  object 
 6   bath          13247 non-null  float64
 7   balcony       12711 non-null  float64
 8   price         13320 non-null  float64
dtypes: float64(3), object(6)
memory usage: 936.7+ KB


In [7]:
data.isnull().sum()

area_type          0
availability       0
location           1
size              16
society         5502
total_sqft         0
bath              73
balcony          609
price              0
dtype: int64

In [8]:
for column in data.columns:
    print(data[column].value_counts())
    print("*"*40)

area_type
Super built-up  Area    8790
Built-up  Area          2418
Plot  Area              2025
Carpet  Area              87
Name: count, dtype: int64
****************************************
availability
Ready To Move    10581
18-Dec             307
18-May             295
18-Apr             271
18-Aug             200
                 ...  
16-Oct               1
17-Jan               1
16-Nov               1
16-Jan               1
14-Jul               1
Name: count, Length: 81, dtype: int64
****************************************
location
Whitefield                         540
Sarjapur  Road                     399
Electronic City                    302
Kanakpura Road                     273
Thanisandra                        234
                                  ... 
3rd Stage Raja Rajeshwari Nagar      1
Chuchangatta Colony                  1
Electronic City Phase 1,             1
Chikbasavanapura                     1
Abshot Layout                        1
Name: count, Length: 130

In [9]:
data.columns

Index(['area_type', 'availability', 'location', 'size', 'society',
       'total_sqft', 'bath', 'balcony', 'price'],
      dtype='object')

In [10]:
data.drop(columns=['area_type','availability','society','balcony'],inplace=True)

In [11]:
data.describe()

Unnamed: 0,bath,price
count,13247.0,13320.0
mean,2.69261,112.565627
std,1.341458,148.971674
min,1.0,8.0
25%,2.0,50.0
50%,2.0,72.0
75%,3.0,120.0
max,40.0,3600.0


In [12]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13320 entries, 0 to 13319
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   location    13319 non-null  object 
 1   size        13304 non-null  object 
 2   total_sqft  13320 non-null  object 
 3   bath        13247 non-null  float64
 4   price       13320 non-null  float64
dtypes: float64(2), object(3)
memory usage: 520.4+ KB


In [13]:
data['location'].value_counts()

location
Whitefield                         540
Sarjapur  Road                     399
Electronic City                    302
Kanakpura Road                     273
Thanisandra                        234
                                  ... 
3rd Stage Raja Rajeshwari Nagar      1
Chuchangatta Colony                  1
Electronic City Phase 1,             1
Chikbasavanapura                     1
Abshot Layout                        1
Name: count, Length: 1305, dtype: int64

In [14]:
data['location']=data['location'].fillna('Whitefield')

In [15]:
data['size'].value_counts()

size
2 BHK         5199
3 BHK         4310
4 Bedroom      826
4 BHK          591
3 Bedroom      547
1 BHK          538
2 Bedroom      329
5 Bedroom      297
6 Bedroom      191
1 Bedroom      105
8 Bedroom       84
7 Bedroom       83
5 BHK           59
9 Bedroom       46
6 BHK           30
7 BHK           17
1 RK            13
10 Bedroom      12
9 BHK            8
8 BHK            5
11 BHK           2
10 BHK           2
11 Bedroom       2
27 BHK           1
19 BHK           1
43 Bedroom       1
16 BHK           1
14 BHK           1
12 Bedroom       1
13 BHK           1
18 Bedroom       1
Name: count, dtype: int64

In [16]:
# Only 16 values in 'size' are missing and '2 BHK' is the most frequent value, so fill the gaps with it
data['size']=data['size'].fillna('2 BHK')
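
The fill value above is hard-coded from the `value_counts()` output. A minimal sketch of deriving it programmatically with `Series.mode()` instead; the `sizes` Series is a made-up stand-in for `data['size']`, not the real column:

```python
import pandas as pd

# Hypothetical toy column standing in for data['size'].
sizes = pd.Series(["2 BHK", "3 BHK", None, "2 BHK", "4 Bedroom", None])

# mode() skips missing values by default and returns a Series
# (several entries if there is a tie), so take the first one.
most_common = sizes.mode().iloc[0]

# Fill the missing entries with the most frequent value.
filled = sizes.fillna(most_common)

print(most_common)
print(filled.isnull().sum())
```

This keeps the imputation correct if the data changes and a different size becomes the most frequent.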

In [17]:
data.isnull().sum()

location       0
size           0
total_sqft     0
bath          73
price          0
dtype: int64

In [18]:
data['bath']=data['bath'].fillna(data['bath'].median())

In [19]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13320 entries, 0 to 13319
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   location    13320 non-null  object 
 1   size        13320 non-null  object 
 2   total_sqft  13320 non-null  object 
 3   bath        13320 non-null  float64
 4   price       13320 non-null  float64
dtypes: float64(2), object(3)
memory usage: 520.4+ KB


In [20]:
data['bhk']=data['size'].str.split().str.get(0).astype(int)

In [21]:
data['bhk']

0        2
1        4
2        3
3        3
4        2
        ..
13315    5
13316    4
13317    2
13318    4
13319    1
Name: bhk, Length: 13320, dtype: int64

In [22]:
data[data.bhk>20]

Unnamed: 0,location,size,total_sqft,bath,price,bhk
1718,2Electronic City Phase II,27 BHK,8000,27.0,230.0,27
4684,Munnekollal,43 Bedroom,2400,40.0,660.0,43


In [23]:
data[data.bhk==2]

Unnamed: 0,location,size,total_sqft,bath,price,bhk
0,Electronic City Phase II,2 BHK,1056,2.0,39.07,2
4,Kothanur,2 BHK,1200,2.0,51.00,2
5,Whitefield,2 BHK,1170,2.0,38.00,2
12,7th Phase JP Nagar,2 BHK,1000,2.0,38.00,2
13,Gottigere,2 BHK,1100,2.0,40.00,2
...,...,...,...,...,...,...
13302,Annaiah Reddy Layout,2 BHK,1075,2.0,48.00,2
13304,Raja Rajeshwari Nagar,2 BHK,1187,2.0,40.14,2
13310,Rachenahalli,2 BHK,1050,2.0,52.71,2
13312,Bellandur,2 BHK,1262,2.0,47.00,2


In [24]:
data[data.bhk==4]

Unnamed: 0,location,size,total_sqft,bath,price,bhk
1,Chikka Tirupathi,4 Bedroom,2600,5.0,120.0,4
6,Old Airport Road,4 BHK,2732,4.0,204.0,4
7,Rajaji Nagar,4 BHK,3300,4.0,600.0,4
11,Whitefield,4 Bedroom,2785,5.0,295.0,4
22,Thanisandra,4 Bedroom,2800,5.0,380.0,4
...,...,...,...,...,...,...
13294,Sultan Palaya,4 BHK,2200,3.0,80.0,4
13299,Whitefield,4 BHK,2830 - 2882,5.0,154.5,4
13306,Rajarajeshwari Nagara,4 Bedroom,1200,5.0,325.0,4
13316,Richards Town,4 BHK,3600,5.0,400.0,4


In [25]:
data['total_sqft'].unique()

array(['1056', '2600', '1440', ..., '1133 - 1384', '774', '4689'],
      dtype=object)

In [26]:
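def convertRange(x):
    # Average a 'low - high' range; otherwise parse a plain number.
    temp=x.split('-')
    try:
        if len(temp)==2:
            return (float(temp[0])+float(temp[1]))/2
        return float(x)
    except ValueError:
        # Anything that is neither a number nor a two-part range becomes missing
        return None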

In [27]:
data['total_sqft']=data['total_sqft'].apply(convertRange)
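
To illustrate how the range handling behaves, here is a small self-contained sketch. `to_sqft` is a hypothetical helper that mirrors the logic of `convertRange`, and the sample strings are assumed examples of the formats that can appear in `total_sqft`:

```python
def to_sqft(x):
    """Return the area as a float; average 'low - high' ranges; None if unparseable."""
    parts = x.split('-')
    try:
        if len(parts) == 2:
            # float() tolerates the surrounding spaces in '1133 - 1384'
            return (float(parts[0]) + float(parts[1])) / 2
        return float(x)
    except ValueError:
        return None

# Assumed sample inputs: a plain number, a range, and a value with a unit suffix.
samples = ["1056", "1133 - 1384", "34.46Sq. Meter"]
converted = [to_sqft(s) for s in samples]
print(converted)
```

Entries that come back as `None` turn into `NaN` in the column, which explains why later `describe()` calls report fewer non-null `total_sqft` values than rows.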

In [28]:
data.head()

Unnamed: 0,location,size,total_sqft,bath,price,bhk
0,Electronic City Phase II,2 BHK,1056.0,2.0,39.07,2
1,Chikka Tirupathi,4 Bedroom,2600.0,5.0,120.0,4
2,Uttarahalli,3 BHK,1440.0,2.0,62.0,3
3,Lingadheeranahalli,3 BHK,1521.0,3.0,95.0,3
4,Kothanur,2 BHK,1200.0,2.0,51.0,2


In [29]:
# Price per square foot in rupees (price is quoted in lakh, hence the factor of 100000)
data['price_per_sqft']=data['price']*100000/data['total_sqft']
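
As a worked check of the formula, assuming `price` is quoted in lakh of rupees (1 lakh = 100,000), the first row of the dataset (39.07 lakh for 1056 sq ft) can be computed by hand; `LAKH` is just an illustrative constant name:

```python
LAKH = 100_000  # rupees per lakh

price_lakh = 39.07   # first row of the dataset
total_sqft = 1056.0

# Convert the price to rupees, then divide by the area.
price_per_sqft = price_lakh * LAKH / total_sqft
print(round(price_per_sqft, 2))
```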

In [30]:
data.head()

Unnamed: 0,location,size,total_sqft,bath,price,bhk,price_per_sqft
0,Electronic City Phase II,2 BHK,1056.0,2.0,39.07,2,3699.810606
1,Chikka Tirupathi,4 Bedroom,2600.0,5.0,120.0,4,4615.384615
2,Uttarahalli,3 BHK,1440.0,2.0,62.0,3,4305.555556
3,Lingadheeranahalli,3 BHK,1521.0,3.0,95.0,3,6245.890861
4,Kothanur,2 BHK,1200.0,2.0,51.0,2,4250.0


In [31]:
data['location'].value_counts()

location
Whitefield                         541
Sarjapur  Road                     399
Electronic City                    302
Kanakpura Road                     273
Thanisandra                        234
                                  ... 
3rd Stage Raja Rajeshwari Nagar      1
Chuchangatta Colony                  1
Electronic City Phase 1,             1
Chikbasavanapura                     1
Abshot Layout                        1
Name: count, Length: 1305, dtype: int64

In [32]:
data['location'] = data['location'].apply(lambda x:x.strip())
location_count=data['location'].value_counts()

In [33]:
location_count_less_10=location_count[location_count<=10]
location_count_less_10

location
Basapura                                10
Dairy Circle                            10
Nagappa Reddy Layout                    10
Naganathapura                           10
Sector 1 HSR Layout                     10
                                        ..
Duddanahalli                             1
Doddanakunte                             1
Jogupalya                                1
Subhash Nagar                            1
Kengeri Satellite Town KHB Apartment     1
Name: count, Length: 1053, dtype: int64

In [34]:
data['location']=data['location'].apply(lambda x:'other' if x in location_count_less_10 else x)
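
The same "lump rare categories into `other`" step can be sketched without a Python-level lambda. This is only an illustration on made-up location names, and it uses a threshold of 1 instead of the notebook's 10 so the toy data stays small:

```python
import pandas as pd

# Hypothetical locations; note the stray whitespace on the last two.
locs = pd.Series(["A", "A", "A", "B", "B", " C", "D "])

# Strip whitespace first so ' C' and 'C' are not counted separately.
locs = locs.str.strip()

counts = locs.value_counts()
rare = counts[counts <= 1].index  # toy threshold; the notebook uses <= 10

# Keep frequent locations and replace the rare ones with 'other'.
grouped = locs.where(~locs.isin(rare), "other")
print(grouped.value_counts())
```

Grouping rare locations keeps the later one-hot encoding from creating over a thousand mostly empty columns.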

In [35]:
data.sample(5)

Unnamed: 0,location,size,total_sqft,bath,price,bhk,price_per_sqft
13154,Sarjapur Road,2 BHK,1112.0,2.0,58.0,2,5215.827338
9847,Chandapura,2 BHK,1025.0,2.0,27.68,2,2700.487805
5595,Rajaji Nagar,2 BHK,1440.0,2.0,185.0,2,12847.222222
12816,other,2 BHK,1200.0,2.0,50.0,2,4166.666667
6880,HBR Layout,2 BHK,1089.0,2.0,60.0,2,5509.641873


In [36]:
data['location'].value_counts()

location
other                        2885
Whitefield                    542
Sarjapur  Road                399
Electronic City               304
Kanakpura Road                273
                             ... 
Tindlu                         11
Marsur                         11
2nd Phase Judicial Layout      11
Thyagaraja Nagar               11
HAL 2nd Stage                  11
Name: count, Length: 242, dtype: int64

In [37]:
data.describe()

Unnamed: 0,total_sqft,bath,price,bhk,price_per_sqft
count,13274.0,13320.0,13320.0,13320.0,13274.0
mean,1559.626694,2.688814,112.565627,2.802778,7907.501
std,1238.405258,1.338754,148.971674,1.294496,106429.6
min,1.0,1.0,8.0,1.0,267.8298
25%,1100.0,2.0,50.0,2.0,4266.865
50%,1276.0,2.0,72.0,3.0,5434.306
75%,1680.0,3.0,120.0,3.0,7311.746
max,52272.0,40.0,3600.0,43.0,12000000.0


In [38]:
(data['total_sqft']/data['bhk']).describe()

count    13274.000000
mean       575.074878
std        388.205175
min          0.250000
25%        473.333333
50%        552.500000
75%        625.000000
max      26136.000000
dtype: float64

In [39]:
data=data[((data['total_sqft']/data['bhk'])>=300)]
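
A small sketch of what this filter does, on made-up rows. It also shows a side effect worth knowing: rows whose `total_sqft` is `NaN` are dropped as well, because a comparison with `NaN` evaluates to `False`:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: a normal flat, a cramped one, a missing area, a borderline one.
toy = pd.DataFrame({
    "total_sqft": [1200.0, 500.0, np.nan, 900.0],
    "bhk": [2, 4, 2, 3],
})

# Area per bedroom: 600, 125, NaN, 300
sqft_per_bhk = toy["total_sqft"] / toy["bhk"]

# Keep rows with at least 300 sq ft per bedroom; NaN >= 300 is False.
kept = toy[sqft_per_bhk >= 300]
print(kept)
```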

In [40]:
data.sample(5)

Unnamed: 0,location,size,total_sqft,bath,price,bhk,price_per_sqft
1655,Konanakunte,4 BHK,3000.0,4.0,160.0,4,5333.333333
13191,Devanahalli,3 BHK,1520.0,2.0,69.76,3,4589.473684
5075,Sarjapur Road,5 Bedroom,3200.0,5.0,140.0,5,4375.0
9466,Hormavu,2 BHK,1165.0,2.0,40.0,2,3433.476395
1982,other,3 BHK,1685.0,2.0,110.0,3,6528.189911


In [41]:
data.describe()

Unnamed: 0,total_sqft,bath,price,bhk,price_per_sqft
count,12530.0,12530.0,12530.0,12530.0,12530.0
mean,1594.564544,2.559537,111.382401,2.650838,6303.979357
std,1261.271296,1.077938,152.077329,0.976678,4162.237981
min,300.0,1.0,8.44,1.0,267.829813
25%,1116.0,2.0,49.0,2.0,4210.526316
50%,1300.0,2.0,70.0,3.0,5294.117647
75%,1700.0,3.0,115.0,3.0,6916.666667
max,52272.0,16.0,3600.0,16.0,176470.588235


In [42]:
data.shape

(12530, 7)

In [43]:
data['price_per_sqft'].describe()

count     12530.000000
mean       6303.979357
std        4162.237981
min         267.829813
25%        4210.526316
50%        5294.117647
75%        6916.666667
max      176470.588235
Name: price_per_sqft, dtype: float64

In [44]:
def remove_outliers_sqft(df):
    df_output=pd.DataFrame()
    for key,subdf in df.groupby('location'):
        m=np.mean(subdf.price_per_sqft)
        
        st=np.std(subdf.price_per_sqft)
        
        gen_df=subdf[(subdf.price_per_sqft>(m-st)) & (subdf.price_per_sqft <= (m+st))]
        df_output=pd.concat([df_output,gen_df],ignore_index=True)
    return df_output
data=remove_outliers_sqft(data)
data.describe()

Unnamed: 0,total_sqft,bath,price,bhk,price_per_sqft
count,10301.0,10301.0,10301.0,10301.0,10301.0
mean,1508.440608,2.471702,91.286372,2.574896,5659.062876
std,880.694214,0.979449,86.342786,0.897649,2265.774749
min,300.0,1.0,10.0,1.0,1250.0
25%,1110.0,2.0,49.0,2.0,4244.897959
50%,1286.0,2.0,67.0,2.0,5175.600739
75%,1650.0,3.0,100.0,3.0,6428.571429
max,30400.0,16.0,2200.0,16.0,24509.803922
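
The per-location filter in `remove_outliers_sqft` keeps rows within one standard deviation of their location's mean price per square foot. A roughly equivalent sketch using `groupby(...).transform`, on made-up numbers; it assumes the population standard deviation (`ddof=0`), which is what `np.std` uses by default:

```python
import pandas as pd

# Hypothetical data: location X has one obvious outlier (500).
toy = pd.DataFrame({
    "location": ["X"] * 4 + ["Y"] * 4,
    "price_per_sqft": [100.0, 110.0, 120.0, 500.0, 300.0, 310.0, 320.0, 330.0],
})

g = toy.groupby("location")["price_per_sqft"]
mean = g.transform("mean")                  # per-row mean of its location
std = g.transform(lambda s: s.std(ddof=0))  # population std, like np.std

# Same bounds as the notebook: strictly above mean-std, at most mean+std.
mask = (toy["price_per_sqft"] > mean - std) & (toy["price_per_sqft"] <= mean + std)
kept = toy[mask].reset_index(drop=True)
print(kept)
```

A one-standard-deviation band is fairly aggressive: in location Y it also removes the two perfectly ordinary edge values, which is consistent with the notebook losing over two thousand rows at this step.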


In [45]:
def bhk_outlier_remover(df):
    exclude_indices=np.array([])
    for location, location_df in df.groupby('location'):
        bhk_stats={}
        for bhk,bhk_df in location_df.groupby('bhk'):
            bhk_stats[bhk]={
                'mean':np.mean(bhk_df.price_per_sqft),
                'std':np.std(bhk_df.price_per_sqft),
                'count':bhk_df.shape[0]
            }
            
        for bhk,bhk_df in location_df.groupby('bhk'):
            stats=bhk_stats.get(bhk-1)
            if stats and stats['count']>5:
                exclude_indices=np.append(exclude_indices, bhk_df[bhk_df.price_per_sqft<(stats['mean'])].index.values)
    return df.drop(exclude_indices,axis='index')
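
To make the rule in `bhk_outlier_remover` concrete: within a location, a home with `n` bedrooms is dropped if its price per square foot is below the mean for homes with `n-1` bedrooms, provided that smaller group has more than 5 rows. A compact sketch of the same idea on made-up data (the `stats` table and variable names are illustrative):

```python
import pandas as pd

# Hypothetical location with six 2-BHK homes and two 3-BHK homes.
toy = pd.DataFrame({
    "location": ["X"] * 8,
    "bhk": [2] * 6 + [3, 3],
    "price_per_sqft": [5000.0] * 6 + [4000.0, 6000.0],
})

# Mean and row count of price_per_sqft per (location, bhk) group.
stats = toy.groupby(["location", "bhk"])["price_per_sqft"].agg(["mean", "count"])

drop_idx = []
for (loc, bhk), grp in toy.groupby(["location", "bhk"]):
    smaller = (loc, bhk - 1)
    # Only compare against the (bhk-1) group if it exists and has > 5 rows.
    if smaller in stats.index and stats.loc[smaller, "count"] > 5:
        below = grp[grp["price_per_sqft"] < stats.loc[smaller, "mean"]]
        drop_idx.extend(below.index)

kept = toy.drop(index=drop_idx)
print(kept)
```

Here the 3 BHK priced at 4000 per sq ft is removed because it is cheaper per square foot than the average 2 BHK (5000) in the same location.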

In [46]:
data=bhk_outlier_remover(data)

In [47]:
data.shape

(7361, 7)

In [48]:
data

Unnamed: 0,location,size,total_sqft,bath,price,bhk,price_per_sqft
0,1st Block Jayanagar,4 BHK,2850.0,4.0,428.0,4,15017.543860
1,1st Block Jayanagar,3 BHK,1630.0,3.0,194.0,3,11901.840491
2,1st Block Jayanagar,3 BHK,1875.0,2.0,235.0,3,12533.333333
3,1st Block Jayanagar,3 BHK,1200.0,2.0,130.0,3,10833.333333
4,1st Block Jayanagar,2 BHK,1235.0,2.0,148.0,2,11983.805668
...,...,...,...,...,...,...,...
10292,other,2 BHK,1200.0,2.0,70.0,2,5833.333333
10293,other,1 BHK,1800.0,1.0,200.0,1,11111.111111
10296,other,2 BHK,1353.0,2.0,110.0,2,8130.081301
10297,other,1 Bedroom,812.0,1.0,26.0,1,3201.970443


In [49]:
data.drop(columns=['size','price_per_sqft'],inplace=True)

In [50]:
data.head()

Unnamed: 0,location,total_sqft,bath,price,bhk
0,1st Block Jayanagar,2850.0,4.0,428.0,4
1,1st Block Jayanagar,1630.0,3.0,194.0,3
2,1st Block Jayanagar,1875.0,2.0,235.0,3
3,1st Block Jayanagar,1200.0,2.0,130.0,3
4,1st Block Jayanagar,1235.0,2.0,148.0,2


In [51]:
data['location'].value_counts()

location
other                    1154
Whitefield                248
Sarjapur  Road            195
Electronic City           162
Raja Rajeshwari Nagar     140
                         ... 
Banjara Layout              4
Vishwapriya Layout          4
Thyagaraja Nagar            4
Vishveshwarya Layout        4
Marsur                      3
Name: count, Length: 242, dtype: int64

In [52]:
data.to_csv("Cleaned_data.csv")
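
By default `to_csv` also writes the DataFrame index, which is why an extra `Unnamed: 0` column shows up when `Cleaned_data.csv` is read back later in the notebook. A minimal sketch of the difference, using an in-memory buffer and a made-up one-row frame so no file is touched:

```python
import io
import pandas as pd

toy = pd.DataFrame({"location": ["other"], "price": [50.0]})

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_default = pd.read_csv(with_index).columns.tolist()

# index=False: only the real columns are written.
no_index = io.StringIO()
toy.to_csv(no_index, index=False)
no_index.seek(0)
cols_clean = pd.read_csv(no_index).columns.tolist()

print(cols_default)
print(cols_clean)
```

If the index carries no information, passing `index=False` when saving avoids having to drop the extra column after loading.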

In [53]:
X=data.drop(columns=['price'])
y=data['price']

In [54]:
X.columns

Index(['location', 'total_sqft', 'bath', 'bhk'], dtype='object')

In [56]:
X.describe()

Unnamed: 0,total_sqft,bath,bhk
count,7361.0,7361.0,7361.0
mean,1496.942529,2.448173,2.500611
std,865.78199,1.011515,0.929312
min,300.0,1.0,1.0
25%,1096.0,2.0,2.0
50%,1260.0,2.0,2.0
75%,1680.0,3.0,3.0
max,30000.0,16.0,16.0


In [58]:
for column in X.columns:
    print(column,X[column].dtype)

location object
total_sqft float64
bath float64
bhk int64


In [55]:
y.name

'price'

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression,Lasso,Ridge
from sklearn.preprocessing import OneHotEncoder,StandardScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score

In [None]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=0)

In [None]:
print(X_train.shape)
print(X_test.shape)

(5888, 4)
(1473, 4)


In [None]:
X_train.columns

Index(['location', 'total_sqft', 'bath', 'bhk'], dtype='object')

## Applying Linear Regression

In [None]:
column_trans=make_column_transformer((OneHotEncoder(sparse_output=False),['location']),
                                                remainder='passthrough')

In [None]:
scaler=StandardScaler()

In [None]:
lr=LinearRegression()

In [None]:
pipe = make_pipeline(column_trans,scaler,lr)

In [None]:
pipe.fit(X_train,y_train)

The format of the columns of the 'remainder' transformer in ColumnTransformer.transformers_ will change in version 1.7 to match the format of the other transformers.
At the moment the remainder columns are stored as indices (of type int). With the same ColumnTransformer configuration, in the future they will be stored as column names (of type str).



In [None]:
y_pred_lr=pipe.predict(X_test)

In [None]:
r2_score(y_test,y_pred_lr)

0.825226976284512
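
For reference, `r2_score` is the coefficient of determination, R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the targets from their mean. A minimal NumPy sketch on made-up numbers (not the notebook's predictions):

```python
import numpy as np

# Hypothetical true prices and predictions, in lakh.
y_true = np.array([50.0, 70.0, 120.0, 60.0])
y_hat = np.array([55.0, 65.0, 110.0, 70.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # 25 + 25 + 100 + 100 = 250
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # mean is 75, so 2900

r2 = 1 - ss_res / ss_tot
print(r2)
```

A score of 1 means perfect predictions, and 0 means the model does no better than always predicting the mean price.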

## Applying Lasso

In [None]:
lasso=Lasso()

In [None]:
pipe=make_pipeline(column_trans,scaler,lasso)

In [None]:
pipe.fit(X_train,y_train)




In [None]:
y_pred_lasso=pipe.predict(X_test)

In [None]:
r2_score(y_test,y_pred_lasso)

0.814689475169039

## Applying Ridge

In [None]:
ridge=Ridge()

In [None]:
pipe=make_pipeline(column_trans,scaler,ridge)

In [None]:
pipe.fit(X_train,y_train)




In [None]:
y_pred_ridge=pipe.predict(X_test)

In [None]:
r2_score(y_test,y_pred_ridge)

0.825234850229012

## Applying Random Forest

In [None]:
from sklearn.ensemble import RandomForestRegressor
rf_reg=RandomForestRegressor()

In [None]:
pipe=make_pipeline(column_trans,scaler,rf_reg)

In [None]:
pipe.fit(X_train,y_train)




In [None]:
y_pred_rf=pipe.predict(X_test)

In [None]:
r2_score(y_test,y_pred_rf)

0.7825063012573916

LinearRegression and Ridge give nearly identical r2_score values (about 0.825), the highest of the four models, so we choose Ridge.

In [None]:
pipe=make_pipeline(column_trans,scaler,ridge)

In [None]:
pipe.fit(X_train,y_train)




In [None]:
y_pred_ridge=pipe.predict(X_test)
r2_score(y_test,y_pred_ridge)

0.825234850229012

In [None]:
import pickle

In [None]:
# Use a context manager so the file is closed once the model is written
with open("RidgeModel.pkl","wb") as f:
    pickle.dump(pipe,f)
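
A pickled pipeline can later be loaded and queried with a one-row DataFrame that has the same columns as `X`. The sketch below is self-contained and does not use the notebook's model: it trains a tiny hypothetical pipeline (`toy_pipe`) on made-up data, round-trips it through `pickle` in memory, and predicts. `handle_unknown="ignore"` is an assumption added here so that an unseen location does not raise an error at prediction time.

```python
import pickle

import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up training data with the same columns as X in the notebook.
train = pd.DataFrame({
    "location": ["A", "A", "B", "B"],
    "total_sqft": [1000.0, 1500.0, 1000.0, 1500.0],
    "bath": [2.0, 3.0, 2.0, 3.0],
    "bhk": [2, 3, 2, 3],
})
prices = [50.0, 75.0, 80.0, 120.0]  # hypothetical prices in lakh

toy_pipe = make_pipeline(
    make_column_transformer(
        (OneHotEncoder(handle_unknown="ignore"), ["location"]),
        remainder="passthrough",
    ),
    Ridge(),
)
toy_pipe.fit(train, prices)

# Round-trip through pickle (in memory here; a file works the same way).
restored = pickle.loads(pickle.dumps(toy_pipe))

# One query row: column names and order match the training frame.
query = pd.DataFrame(
    [["A", 1200.0, 2.0, 2]],
    columns=["location", "total_sqft", "bath", "bhk"],
)
pred = restored.predict(query)
print(pred)
```

With the real model, the equivalent steps would be opening `RidgeModel.pkl` in `"rb"` mode, calling `pickle.load`, and passing a frame with `location`, `total_sqft`, `bath` and `bhk` columns. Only unpickle files from a trusted source.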

In [None]:
data['location'].unique()

array(['1st Block Jayanagar', '1st Phase JP Nagar',
       '2nd Phase Judicial Layout', '2nd Stage Nagarbhavi',
       '5th Block Hbr Layout', '5th Phase JP Nagar', '6th Phase JP Nagar',
       '7th Phase JP Nagar', '8th Phase JP Nagar', '9th Phase JP Nagar',
       'AECS Layout', 'Abbigere', 'Akshaya Nagar', 'Ambalipura',
       'Ambedkar Nagar', 'Amruthahalli', 'Anandapura', 'Ananth Nagar',
       'Anekal', 'Anjanapura', 'Ardendale', 'Arekere', 'Attibele',
       'BEML Layout', 'BTM 2nd Stage', 'BTM Layout', 'Babusapalaya',
       'Badavala Nagar', 'Balagere', 'Banashankari',
       'Banashankari Stage II', 'Banashankari Stage III',
       'Banashankari Stage V', 'Banashankari Stage VI', 'Banaswadi',
       'Banjara Layout', 'Bannerghatta', 'Bannerghatta Road',
       'Basavangudi', 'Basaveshwara Nagar', 'Battarahalli', 'Begur',
       'Begur Road', 'Bellandur', 'Benson Town', 'Bharathi Nagar',
       'Bhoganhalli', 'Billekahalli', 'Binny Pete', 'Bisuvanahalli',
       'Bommanahalli'

In [None]:
newdf=pd.read_csv('Cleaned_data.csv')

In [None]:
newdf.columns

Index(['Unnamed: 0', 'location', 'total_sqft', 'bath', 'price', 'bhk'], dtype='object')

In [None]:
newdf.location.value_counts()

location
other                    1154
Whitefield                248
Sarjapur  Road            195
Electronic City           162
Raja Rajeshwari Nagar     140
                         ... 
Banjara Layout              4
Vishwapriya Layout          4
Thyagaraja Nagar            4
Vishveshwarya Layout        4
Marsur                      3
Name: count, Length: 242, dtype: int64

In [None]:
newdf['bhk'].unique()

array([ 4,  3,  2,  5,  1,  6,  8,  7,  9, 10, 11, 16, 13])

In [None]:
newdf['bath'].unique()

array([ 4.,  3.,  2.,  5.,  8.,  1.,  6.,  7.,  9., 12., 16., 13.])

In [None]:
newdf['total_sqft'].max()

np.float64(30000.0)

In [None]:
newdf['total_sqft'].info()

<class 'pandas.core.series.Series'>
RangeIndex: 7361 entries, 0 to 7360
Series name: total_sqft
Non-Null Count  Dtype  
--------------  -----  
7361 non-null   float64
dtypes: float64(1)
memory usage: 57.6 KB


In [None]:
newdf['total_sqft'].dtype

dtype('float64')