# Booj Test

I am writing a script to download the XML file from the URL given in the Git README file.

In [18]:
import json
import time
import requests
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup

Here I download the raw data from the URL provided.

In [19]:
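url = "http://syndication.enterprise.websiteidx.com/feeds/BoojCodeTest.xml"

# A plain GET needs no request body, so the empty payload is dropped
response = requests.get(url)
response.raise_for_status()  # fail early if the download did not succeed

print(response.text)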

<?xml version='1.0' encoding='UTF-8'?>
<Listings>

  <Listing>
    <Location>
      <StreetAddress>0 Castro Peak Mountainway</StreetAddress>
      <UnitNumber/>
      <City>Malibu</City>
      <State>CA</State>
      <Zip>90265</Zip>
      <ParcelID/>
      <Lat>34.087528</Lat>
      <Long>-118.804301</Long>
      <County/>
      <StreetIntersection/>
      <DisplayAddress>Yes</DisplayAddress>
    </Location>
    <ListingDetails>
      <Status>Active</Status>
      <Price>535000.00</Price>
      <ListingUrl>http://www.thepartnerstrust.com/property/42921875/syn/43/</ListingUrl>
      <MlsId>14799273</MlsId>
      <MlsName>CLAW</MlsName>
      <ProviderListingId>42921875</ProviderListingId>
      <DateListed>2014-10-03 00:00:00</DateListed>
      <VirtualTourUrl><![CDATA[http://www.thepartnerstrust.com/property/42921875/syn/43/]]></VirtualTourUrl>
      <ListingEmail>zillow-41@leadrelay.com</ListingEmail>
      <AlwaysEmailAgent>0</AlwaysEmailAgent>
      <Shor

As we can see, the response is in XML format, so I parse it with Beautiful Soup (imported above) to make it easier to work with.

In [20]:
soup = BeautifulSoup(response.text, 'lxml')
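
One thing worth knowing here: passing `'lxml'` selects Beautiful Soup's HTML parser. That is why the parsed output has lowercase tag names and an added `<html><body>` wrapper, and it appears to be why the CDATA content of `VirtualTourUrl` comes back empty. As a rough sketch of what a real XML parser returns instead, the snippet below uses the standard library's `xml.etree.ElementTree` as a stand-in on a fragment abridged from the feed (the `sample` string is my own shortened copy, not the full document).

```python
import xml.etree.ElementTree as ET

# Fragment abridged from the feed output shown earlier (not the full document).
sample = """<Listings>
  <Listing>
    <ListingDetails>
      <MlsId>14799273</MlsId>
      <VirtualTourUrl><![CDATA[http://www.thepartnerstrust.com/property/42921875/syn/43/]]></VirtualTourUrl>
    </ListingDetails>
  </Listing>
</Listings>"""

root = ET.fromstring(sample)

# An XML parser keeps the original tag case...
tag_names = [element.tag for element in root.iter()]

# ...and returns the text stored inside the CDATA section.
tour_url = root.findtext("Listing/ListingDetails/VirtualTourUrl")

print(tag_names)
print(tour_url)
```

With Beautiful Soup itself, the XML parser is selected with `BeautifulSoup(response.text, 'xml')` (lxml must be installed). Tag lookups then become case-sensitive, for example `find_all('MlsId')`, so the lookups in the following cells would need their tag names adjusted.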

In [21]:
soup

<?xml version='1.0' encoding='UTF-8'?><html><body><listings>
<listing>
<location>
<streetaddress>0 Castro Peak Mountainway</streetaddress>
<unitnumber></unitnumber>
<city>Malibu</city>
<state>CA</state>
<zip>90265</zip>
<parcelid></parcelid>
<lat>34.087528</lat>
<long>-118.804301</long>
<county></county>
<streetintersection></streetintersection>
<displayaddress>Yes</displayaddress>
</location>
<listingdetails>
<status>Active</status>
<price>535000.00</price>
<listingurl>http://www.thepartnerstrust.com/property/42921875/syn/43/</listingurl>
<mlsid>14799273</mlsid>
<mlsname>CLAW</mlsname>
<providerlistingid>42921875</providerlistingid>
<datelisted>2014-10-03 00:00:00</datelisted>
<virtualtoururl></virtualtoururl>
<listingemail>zillow-41@leadrelay.com</listingemail>
<alwaysemailagent>0</alwaysemailagent>
<shortsale></shortsale>
<reo></reo>
</listingdetails>
<rentaldetails>
<availability></availability>
<leaseterm></leaseterm>
<depositfees></depositfees>
<utilitiesincluded>
<water></water>

### Feature Selections

In [22]:
dates = [t.text for t in soup.find_all('datelisted')]
mlsid = [t.text for t in soup.find_all('mlsid')]
mlsname = [t.text for t in soup.find_all('mlsname')]
street = [t.text for t in soup.find_all('streetaddress')]
price = [t.text for t in soup.find_all('price')]
bedrooms = [t.text for t in soup.find_all('bedrooms')]
bathrooms = [t.text for t in soup.find_all('bathrooms')]
appliances = [t.text for t in soup.find_all('appliances')]
description = [t.text for t in soup.find_all('description')]
fullbath = [t.text for t in soup.find_all('fullbathrooms')]
halfbath = [t.text for t in soup.find_all('halfbathrooms')]
threequarter = [t.text for t in soup.find_all('threequarterbathrooms')]

I looked through the first few listings to get a feel for the data and noticed that many fields were missing values.  I decided to pull each requested feature, load them all into a pandas DataFrame, and clean them up from there.  I was a little confused about what qualifies as a *room* or as *bathrooms*.  I wasn't sure whether you wanted all the different sizes of bathrooms, so I included them in the DataFrame just in case.  I never worked out a logical way to determine the total number of *rooms*: for example, I wasn't sure whether you would count a basement or a sauna as a room.
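
Pulling every tag into its own list only lines up row for row if each tag occurs exactly once per listing. An alternative is to walk the feed one `<Listing>` at a time and read each field relative to that listing. The sketch below assumes the element paths visible in the raw output (`Location/StreetAddress`, `ListingDetails/MlsId`, and so on); the `<Office>` element and the entire second listing are made up for illustration, and `FIELDS` and `listing_to_row` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Illustrative sample: the first listing reuses values from the feed above;
# the <Office> element and the whole second listing are made up.
SAMPLE = """<Listings>
  <Listing>
    <Location><StreetAddress>0 Castro Peak Mountainway</StreetAddress></Location>
    <ListingDetails>
      <Price>535000.00</Price>
      <MlsId>14799273</MlsId>
      <DateListed>2014-10-03 00:00:00</DateListed>
    </ListingDetails>
    <Office><StreetAddress>1 Example Office Way</StreetAddress></Office>
  </Listing>
  <Listing>
    <Location><StreetAddress>2 Example Street</StreetAddress></Location>
    <ListingDetails>
      <MlsId>EXAMPLE-2</MlsId>
      <DateListed>2016-01-01 00:00:00</DateListed>
    </ListingDetails>
  </Listing>
</Listings>"""

# Column name -> path relative to one <Listing> element.
FIELDS = {
    "Date": "ListingDetails/DateListed",
    "MLSid": "ListingDetails/MlsId",
    "Street": "Location/StreetAddress",
    "Price": "ListingDetails/Price",
}


def listing_to_row(listing):
    """Build one row; a missing element yields None instead of shifting rows."""
    return {name: listing.findtext(path) for name, path in FIELDS.items()}


rows = [listing_to_row(listing) for listing in ET.fromstring(SAMPLE).iter("Listing")]
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` would then give exactly one row per listing, with the second `StreetAddress` inside `<Office>` ignored.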

In [23]:
data = [dates, mlsid, mlsname, street, price, bedrooms, bathrooms, appliances, description, fullbath,
        halfbath, threequarter]

In [24]:
df = pd.DataFrame(data)

In [25]:
df = df.T

In [26]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11
0,2014-10-03 00:00:00,14799273,CLAW,0 Castro Peak Mountainway,535000.0,0,,,,,,
1,2014-10-17 00:00:00,14802845,CLAW,"23410 Civic Center Way, C1",200000.0,0,,,,,,
2,2015-03-03 00:00:00,15883387,CLAW,0 SADDLE PEAK RD,23500.0,3,,\nBuilt-Ins\nRange Hood\nMicrowave\nRangeOven\n,,4.0,,
3,2015-03-18 00:00:00,15888095,CLAW,"23410 Civic Center Way, C1",72500.0,5,,\nBuilt-Ins\nRange Hood\nMicrowave\nRangeOven\n,,4.0,1.0,
4,2015-11-18 00:00:00,15959941,CLAW,21310 PACIFIC COAST HWY,230000.0,2,,\nRangeOven\n,,1.0,,


In [27]:
df.shape

(438, 12)

Now that I have the raw data in a pandas DataFrame, I can easily clean up the rest.  I am unhappy that the feed had 219 entries before the conversion while the DataFrame has 438 rows: somewhere between the XML and pandas, my rows doubled.  I'll keep an eye out for what happened there as I investigate further.
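
One way the row count can grow in this kind of construction is when the input lists have different lengths: the transposed DataFrame gets as many rows as the longest list, and the shorter lists are padded with missing values. A minimal sketch with made-up values (not the real feed) shows the effect and a quick length check.

```python
import pandas as pd

# Hypothetical miniature of the situation: one tag matched twice as often.
mlsid = ["14799273", "14802845"]
street = ["street A", "street B", "street C", "street D"]

# Comparing list lengths up front reveals the mismatch.
lengths = {"mlsid": len(mlsid), "street": len(street)}
print(lengths)

demo = pd.DataFrame([mlsid, street]).T
demo.columns = ["MLSid", "Street"]

print(demo.shape)           # the row count follows the longest list
print(demo.isnull().sum())  # the shorter list is padded with missing values
```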

In [28]:
df.columns

RangeIndex(start=0, stop=12, step=1)

In [29]:
df.columns = ['Date', 'MLSid', 'MLS Name', 'Street', 'Price', 'Bedrooms', 'Bathrooms', 
                   'Appliances', 'Description', 'Fullbath', 'Halfbath', 'Threequarter Bath']

In [30]:
df_work = df.copy()  # work on a copy so the raw frame stays untouched

In [31]:
df.head()

Unnamed: 0,Date,MLSid,MLS Name,Street,Price,Bedrooms,Bathrooms,Appliances,Description,Fullbath,Halfbath,Threequarter Bath
0,2014-10-03 00:00:00,14799273,CLAW,0 Castro Peak Mountainway,535000.0,0,,,,,,
1,2014-10-17 00:00:00,14802845,CLAW,"23410 Civic Center Way, C1",200000.0,0,,,,,,
2,2015-03-03 00:00:00,15883387,CLAW,0 SADDLE PEAK RD,23500.0,3,,\nBuilt-Ins\nRange Hood\nMicrowave\nRangeOven\n,,4.0,,
3,2015-03-18 00:00:00,15888095,CLAW,"23410 Civic Center Way, C1",72500.0,5,,\nBuilt-Ins\nRange Hood\nMicrowave\nRangeOven\n,,4.0,1.0,
4,2015-11-18 00:00:00,15959941,CLAW,21310 PACIFIC COAST HWY,230000.0,2,,\nRangeOven\n,,1.0,,


### Now begins my EDA with my ```df_work``` dataframe

In [32]:
df_work.isnull().sum()

Date                 219
MLSid                219
MLS Name             219
Street                 0
Price                219
Bedrooms             219
Bathrooms            219
Appliances           219
Description            0
Fullbath             219
Halfbath             219
Threequarter Bath    219
dtype: int64

In [33]:
df_work.dtypes

Date                 object
MLSid                object
MLS Name             object
Street               object
Price                object
Bedrooms             object
Bathrooms            object
Appliances           object
Description          object
Fullbath             object
Halfbath             object
Threequarter Bath    object
dtype: object

It looks like all the features are stored as ```object``` and need to be converted to proper types.

In [34]:
df_work['Date'] = pd.to_datetime(df_work['Date'])

In [35]:
df_work['Price'] = df_work['Price'].astype(float)

In [36]:
df_work['Bedrooms'].unique()

array(['0', '3', '5', '2', '6', '4', '7', '1', '8', None], dtype=object)

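I noticed a ```None``` in the *numeric* column ```Bedrooms```.  It has to be dealt with before the whole feature can be converted to an integer.  Note that the array shows the missing value ```None```, not the string ```'None'```, so the replacement below leaves it untouched; the rows holding it are only removed by the ```dropna``` call further down.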

In [37]:
df_work['Bedrooms'] = df_work['Bedrooms'].replace('None', '0')
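
To see how the two kinds of "None" behave, here is a small sketch on a stand-in Series (the values are a shortened copy of the `unique()` output above). `replace('None', '0')` only matches the literal string, while `fillna` targets missing values.

```python
import pandas as pd

bedrooms = pd.Series(["0", "3", None], dtype=object)

# Replacing the *string* 'None' does not touch the missing value.
after_replace = bedrooms.replace("None", "0")

# fillna targets missing values (None / NaN) directly.
after_fillna = bedrooms.fillna("0")

print(after_replace.isnull().sum())
print(after_fillna.tolist())
```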

I decided to go through each integer feature by hand to check its unique values and correct the formatting.

In [38]:
df_work['Bathrooms'].unique()

array(['', None], dtype=object)

In [39]:
df_work['Bathrooms'] = df_work['Bathrooms'].replace('None', '0')
df_work['Bathrooms'] = df_work['Bathrooms'].replace('', '0')

In [40]:
df_work['Fullbath'].unique()

array(['', '4', '1', '3', '2', '7', '5', '8', '6', None], dtype=object)

In [41]:
df_work['Fullbath'] = df_work['Fullbath'].replace('None', '0')
df_work['Fullbath'] = df_work['Fullbath'].replace('', '0')

In [42]:
df_work['Halfbath'].unique()

array(['', '1', '2', '3', '99', None], dtype=object)

This was an incredibly amusing find.  How can there be a listing with ```'99'``` half bathrooms?

In [43]:
df_work['Halfbath'] = df_work['Halfbath'].replace('None', '0')
df_work['Halfbath'] = df_work['Halfbath'].replace('', '0')

In [44]:
df_work['Halfbath'].value_counts()

0     139
1      72
2       5
3       2
99      1
Name: Halfbath, dtype: int64

There was only one listing with 99 half bathrooms, so at first I felt it was necessary to change that value to zero.

In [45]:
#df_work['Halfbath'] = df_work['Halfbath'].replace('99', '0')

**_NOTE: I later noticed that one address appears in 83 listings below, which could account for that one address having so many half bathrooms.  For the integrity of the data I decided to keep the 99 rather than replace it with 0, so the line above is commented out._**
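
Rather than overwriting an odd value, another option is to flag it so it can be reviewed later. The sketch below uses the distinct values from the `value_counts()` output; the threshold `MAX_PLAUSIBLE_HALF_BATHS` is a hypothetical choice of mine, not something the data defines.

```python
import pandas as pd

# Distinct values taken from the value_counts() output above.
halfbath = pd.Series(["0", "1", "2", "3", "99"])

# Hypothetical threshold: anything above it is treated as suspicious.
MAX_PLAUSIBLE_HALF_BATHS = 10

as_int = halfbath.astype(int)
suspicious = as_int > MAX_PLAUSIBLE_HALF_BATHS

print(as_int[suspicious].tolist())
```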

In [46]:
df_work['Threequarter Bath'].unique()

array(['', '1', '2', '3', '4', None], dtype=object)

In [47]:
df_work['Threequarter Bath'] = df_work['Threequarter Bath'].replace('None', '0')
df_work['Threequarter Bath'] = df_work['Threequarter Bath'].replace('', '0')
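
The column-by-column replacements above could also be folded into one helper. This is only a sketch under my own assumptions: `to_int_column` is a hypothetical name, and unlike the notebook (which later drops rows with missing values) it turns missing values into 0.

```python
import pandas as pd


def to_int_column(series):
    """Turn '' and missing values into 0 and everything else into integers."""
    numbers = pd.to_numeric(series, errors="coerce")  # '' and None become NaN
    return numbers.fillna(0).astype(int)


# Stand-in data shaped like the unique() outputs above.
demo = pd.DataFrame({
    "Fullbath": ["", "4", "1", None],
    "Halfbath": ["", "1", "99", None],
})

for column in ["Fullbath", "Halfbath"]:
    demo[column] = to_int_column(demo[column])

print(demo)
print(demo.dtypes)
```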

In [48]:
df_work.head()

Unnamed: 0,Date,MLSid,MLS Name,Street,Price,Bedrooms,Bathrooms,Appliances,Description,Fullbath,Halfbath,Threequarter Bath
0,2014-10-03,14799273,CLAW,0 Castro Peak Mountainway,535000.0,0,0,,,0,0,0
1,2014-10-17,14802845,CLAW,"23410 Civic Center Way, C1",200000.0,0,0,,,0,0,0
2,2015-03-03,15883387,CLAW,0 SADDLE PEAK RD,23500.0,3,0,\nBuilt-Ins\nRange Hood\nMicrowave\nRangeOven\n,,4,0,0
3,2015-03-18,15888095,CLAW,"23410 Civic Center Way, C1",72500.0,5,0,\nBuilt-Ins\nRange Hood\nMicrowave\nRangeOven\n,,4,1,0
4,2015-11-18,15959941,CLAW,21310 PACIFIC COAST HWY,230000.0,2,0,\nRangeOven\n,,1,0,0


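I realized that there were 219 rows of *null* values, which I decided to remove.  I believe this is the discrepancy I noticed earlier, when the rows doubled after converting to pandas.  The null counts above point to the cause: Street and Description have no nulls while every other column has exactly 219, which suggests those two tags each matched 438 times and the shorter lists were padded to the same length.  It also means the Street values kept here may not line up one-to-one with the listings.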

In [49]:
df_work = df_work.dropna(axis=0, how = 'any')
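
`dropna` returns a new object derived from the original frame, and assigning columns on it is what triggers the `SettingWithCopyWarning` messages seen in the following cells. Taking an explicit `.copy()` is one common way to make the result an independent DataFrame. A minimal sketch on made-up data:

```python
import pandas as pd

raw = pd.DataFrame({
    "Price": ["535000.00", None],
    "Bedrooms": ["0", "3"],
})

# dropna returns a new object; .copy() makes it an independent DataFrame.
clean = raw.dropna(axis=0, how="any").copy()
clean["Bedrooms"] = clean["Bedrooms"].astype(int)

print(clean)
```

The original `raw` frame is left unchanged, and later assignments to `clean` are unambiguous.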

Now I can convert all the necessary features into integers

In [50]:
df_work['Bedrooms'] = df_work['Bedrooms'].astype(int)
df_work['Bathrooms'] = df_work['Bathrooms'].astype(int)
df_work['Fullbath'] = df_work['Fullbath'].astype(int)
df_work['Halfbath'] = df_work['Halfbath'].astype(int)
df_work['Threequarter Bath'] = df_work['Threequarter Bath'].astype(int)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.

In [51]:
df_work['Rooms'] = df_work['Bedrooms']+df_work['Bathrooms']+df_work['Fullbath']+df_work['Halfbath']+df_work['Threequarter Bath']

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


In [52]:
df_work.head()

Unnamed: 0,Date,MLSid,MLS Name,Street,Price,Bedrooms,Bathrooms,Appliances,Description,Fullbath,Halfbath,Threequarter Bath,Rooms
0,2014-10-03,14799273,CLAW,0 Castro Peak Mountainway,535000.0,0,0,,,0,0,0,0
1,2014-10-17,14802845,CLAW,"23410 Civic Center Way, C1",200000.0,0,0,,,0,0,0,0
2,2015-03-03,15883387,CLAW,0 SADDLE PEAK RD,23500.0,3,0,\nBuilt-Ins\nRange Hood\nMicrowave\nRangeOven\n,,4,0,0,7
3,2015-03-18,15888095,CLAW,"23410 Civic Center Way, C1",72500.0,5,0,\nBuilt-Ins\nRange Hood\nMicrowave\nRangeOven\n,,4,1,0,10
4,2015-11-18,15959941,CLAW,21310 PACIFIC COAST HWY,230000.0,2,0,\nRangeOven\n,,1,0,0,3
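
The `+` chain used for `Rooms` only adds numbers once the columns are integers; on string columns the same expression concatenates the digits instead. The sketch below reuses the counts from row 3 of the table above to show both outcomes, and uses `sum(axis=1)` as a shorter equivalent for the integer case (`ROOM_COLUMNS` is a name I introduce here).

```python
import pandas as pd

ROOM_COLUMNS = ["Bedrooms", "Bathrooms", "Fullbath", "Halfbath", "Threequarter Bath"]

# Row 3 of the table above, once as integers and once as strings.
as_int = pd.DataFrame([[5, 0, 4, 1, 0]], columns=ROOM_COLUMNS)
as_str = as_int.astype(str)

# Integer columns: a row-wise sum gives the room total.
rooms_int = as_int[ROOM_COLUMNS].sum(axis=1)

# String columns: '+' joins the digits rather than adding them.
rooms_str = (
    as_str["Bedrooms"] + as_str["Bathrooms"] + as_str["Fullbath"]
    + as_str["Halfbath"] + as_str["Threequarter Bath"]
)

print(rooms_int.iloc[0])
print(rooms_str.iloc[0])
```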


In [53]:
df_work.dtypes

Date                 datetime64[ns]
MLSid                        object
MLS Name                     object
Street                       object
Price                       float64
Bedrooms                      int64
Bathrooms                     int64
Appliances                   object
Description                  object
Fullbath                      int64
Halfbath                      int64
Threequarter Bath             int64
Rooms                         int64
dtype: object

In [54]:
df_work['MLSid'].unique()

array(['14799273', '14802845', '15883387', '15888095', '15959941',
       '15965091', '16106762', '16109128', '16114818', '16115130',
       '16115640', '16118008', '16122648', '16125100', '16129206',
       '16130744', '16132084', '16132158', '16133142', '16134056',
       '16134704', '16134756', '16134906', '16135516', '16136820',
       '16137056', '16137086', '16141262', '16142180', '16142204',
       '16144050', '16145020', '16146358', '16146576', '16148546',
       '16150042', '16150280', '16150318', '16150758', '16151208',
       '16151674', '16153228', '16154748', '16154856', '16154966',
       '16155556', '16155592', '16156574', '16157052', '16157428',
       '16157766', '16157850', '16158240', '16158556', '16158874',
       '16159656', '16159808', '16160058', '16160314', '16161032',
       '16161068', '16161388', '16161666', '16162304', '16162424',
       '16162540', '16162570', '16162696', '16162826', '16163034',
       '16163198', '16163248', '16163266', '16163826', '161639

It looks like not all of the IDs are purely numeric.  Toward the very end there are string IDs that mix letters and numbers.
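
A quick way to list the IDs that are not purely numeric is `Series.str.isdigit()`. The sketch below uses a handful of IDs that appear in the data; it also illustrates why `MLSid` is better kept as a string column than converted to an integer.

```python
import pandas as pd

# A few IDs that appear in the data.
mls_ids = pd.Series(["14799273", "316010702", "PT2979", "SB16712658MR"])

is_numeric = mls_ids.str.isdigit()
non_numeric_ids = mls_ids[~is_numeric].tolist()

print(non_numeric_ids)
```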

### Organizing Description

In [57]:
df_test = df_work

In [58]:
df_test.tail()

Unnamed: 0,Date,MLSid,MLS Name,Street,Price,Bedrooms,Bathrooms,Appliances,Description,Fullbath,Halfbath,Threequarter Bath,Rooms
214,2016-11-18 07:45:06,316010702,ITECH,1227 GRANVILLE AVE,1025000.0,4,0,\nBuilt-In Gas\n,,1,0,1,6
215,2014-07-24 16:33:10,PT2979,partnerstrust,"11726 San Vicente Blvd, Suite 350",50000.0,4,0,,,4,0,0,8
216,2015-02-13 14:29:51,PT3446,partnerstrust,7462 CLINTON ST,999990.0,0,0,,,0,0,0,0
217,2015-02-27 09:00:55,PT3481,partnerstrust,"9378 Wilshire Boulevard, Suite 200",3700.0,0,0,,,2,0,0,2
218,2016-10-04 00:00:00,SB16712658MR,CRMLS,610 N Orlando AVE,779000.0,3,0,\nRangeOven\n,,2,99,0,104


In [59]:
df_test['Description'] = df_test['Description'].replace('', '0')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


In [60]:
df_test.head()

Unnamed: 0,Date,MLSid,MLS Name,Street,Price,Bedrooms,Bathrooms,Appliances,Description,Fullbath,Halfbath,Threequarter Bath,Rooms
0,2014-10-03,14799273,CLAW,0 Castro Peak Mountainway,535000.0,0,0,,0,0,0,0,0
1,2014-10-17,14802845,CLAW,"23410 Civic Center Way, C1",200000.0,0,0,,0,0,0,0,0
2,2015-03-03,15883387,CLAW,0 SADDLE PEAK RD,23500.0,3,0,\nBuilt-Ins\nRange Hood\nMicrowave\nRangeOven\n,0,4,0,0,7
3,2015-03-18,15888095,CLAW,"23410 Civic Center Way, C1",72500.0,5,0,\nBuilt-Ins\nRange Hood\nMicrowave\nRangeOven\n,0,4,1,0,10
4,2015-11-18,15959941,CLAW,21310 PACIFIC COAST HWY,230000.0,2,0,\nRangeOven\n,0,1,0,0,3


Now that the blank descriptions have been replaced with a placeholder, I can create a new data frame containing only the rows that have a description, of which there were only 3.  I believe this narrows down the selection; however, those rows carry no accompanying data except the ```Street``` address.  I suspect I lost something in my initial conversion from XML to pandas.

In [61]:
df_description = df_test[df_test['Description'] != '0']
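
A description made up only of spaces or newlines would slip past an exact comparison with an empty string or a placeholder. As a sketch on made-up descriptions, stripping whitespace before testing avoids the need for a placeholder value at all:

```python
import pandas as pd

# Made-up rows for illustration only.
demo = pd.DataFrame({
    "MLSid": ["A1", "A2", "A3"],
    "Description": ["", "  \n", "Ocean view and a large deck"],
})

# Treat missing, empty, and whitespace-only descriptions as blank.
has_description = demo["Description"].fillna("").str.strip() != ""
df_with_description = demo[has_description]

print(df_with_description)
```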

In [62]:
df_description

Unnamed: 0,Date,MLSid,MLS Name,Street,Price,Bedrooms,Bathrooms,Appliances,Description,Fullbath,Halfbath,Threequarter Bath,Rooms


In [63]:
df_description.to_csv('Description-only.csv')

### Organizing Date

In [64]:
df_test

Unnamed: 0,Date,MLSid,MLS Name,Street,Price,Bedrooms,Bathrooms,Appliances,Description,Fullbath,Halfbath,Threequarter Bath,Rooms
0,2014-10-03 00:00:00,14799273,CLAW,0 Castro Peak Mountainway,535000.0,0,0,,0,0,0,0,0
1,2014-10-17 00:00:00,14802845,CLAW,"23410 Civic Center Way, C1",200000.0,0,0,,0,0,0,0,0
2,2015-03-03 00:00:00,15883387,CLAW,0 SADDLE PEAK RD,23500.0,3,0,\nBuilt-Ins\nRange Hood\nMicrowave\nRangeOven\n,0,4,0,0,7
3,2015-03-18 00:00:00,15888095,CLAW,"23410 Civic Center Way, C1",72500.0,5,0,\nBuilt-Ins\nRange Hood\nMicrowave\nRangeOven\n,0,4,1,0,10
4,2015-11-18 00:00:00,15959941,CLAW,21310 PACIFIC COAST HWY,230000.0,2,0,\nRangeOven\n,0,1,0,0,3
5,2015-12-11 00:00:00,15965091,CLAW,"23410 Civic Center Way, C1",2988000.0,0,0,,0,0,0,0,0
6,2016-03-16 00:00:00,16106762,CLAW,23826 MALIBU RD,25000.0,3,0,\nBuilt-Ins\nMicrowave\nRangeOven\n,0,3,1,0,7
7,2016-03-25 00:00:00,16109128,CLAW,"23410 Civic Center Way, C1",6500.0,2,0,\nBuilt-Ins\nMicrowave\nBuilt-In Gas\nCooktop ...,0,1,1,0,4
8,2016-04-12 00:00:00,16114818,CLAW,943 16th ST,5800000.0,3,0,\nBuilt-Ins\nRange Hood\nRangeOven\n,0,3,0,0,6
9,2016-04-13 00:00:00,16115130,CLAW,1333 Montana Ave,5499.0,3,0,,0,2,0,0,5


Now it is time to focus on the dates: keep only the 2016 listings and put them in descending order.

In [65]:
df_test = df_test[(df_test['Date'].dt.year == 2016)]

In [66]:
df_test.sort_values(by='Date', ascending = False)

Unnamed: 0,Date,MLSid,MLS Name,Street,Price,Bedrooms,Bathrooms,Appliances,Description,Fullbath,Halfbath,Threequarter Bath,Rooms
214,2016-11-18 07:45:06,316010702,ITECH,1227 GRANVILLE AVE,1025000.0,4,0,\nBuilt-In Gas\n,0,1,0,1,6
212,2016-11-18 00:00:00,316010634,ITECH,15630 WOODVALE RD,785000.0,2,0,\nBuilt-Ins\nMicrowave\nOven\nRangeOven\nRange...,0,3,0,0,5
156,2016-11-18 00:00:00,16181228,CLAW,928 19TH ST,7488000.0,4,0,\nBuilt-In BBQ\nBuilt-Ins\nRange Hood\nRangeOv...,0,4,1,0,9
155,2016-11-18 00:00:00,16181224,CLAW,"9378 Wilshire Boulevard, Suite 200",839000.0,2,0,\nBuilt-Ins\nRangeOven\n,0,2,1,0,5
213,2016-11-17 00:00:00,316010650,ITECH,"11726 San Vicente Blvd, Suite 350",1299000.0,4,0,\nBuilt-In Gas\n,0,3,1,0,8
154,2016-11-17 00:00:00,16180298,CLAW,1616 N LA BREA AVE,1125000.0,3,0,\nRangeOven\nGas/Electric Range\nRange Hood\nS...,0,2,0,0,5
153,2016-11-16 00:00:00,16180182,CLAW,"11726 San Vicente Blvd, Suite 350",1875000.0,3,0,\nRange Hood\nMicrowave\nRangeOven\n,0,3,1,0,7
211,2016-11-14 00:00:00,316010536,ITECH,"11726 San Vicente Blvd, Suite 350",2600000.0,5,0,\nBuilt-In BBQ\n,0,4,0,0,9
152,2016-11-14 00:00:00,16179602,CLAW,1310 NAPOLI DR,3986000.0,3,0,\nBuilt-In BBQ\nBuilt-Ins\nMicrowave\nRangeOven\n,0,1,0,1,5
210,2016-11-13 00:00:00,316010528,ITECH,235 MAIN ST,569900.0,4,0,\nOven\n,0,3,0,0,7


I wasn't sure whether this *should* be in descending or ascending order.  I thought most recent first made more sense.
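
One detail to keep in mind: `sort_values` returns a new DataFrame and leaves the original untouched, so the sorted order is only kept (and only reaches a later `to_csv`) if the result is assigned back. A minimal sketch using three rows from the data (`listings_2016` is a name I introduce here):

```python
import pandas as pd

demo = pd.DataFrame({
    "MLSid": ["15959941", "16106762", "316010702"],
    "Date": pd.to_datetime([
        "2015-11-18 00:00:00",
        "2016-03-16 00:00:00",
        "2016-11-18 07:45:06",
    ]),
})

# Keep only the 2016 listings.
listings_2016 = demo[demo["Date"].dt.year == 2016]

# sort_values returns a new frame, so keep the result.
listings_2016 = listings_2016.sort_values(by="Date", ascending=False)

print(listings_2016)
```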

### Organizing Appliances

In [67]:
df_test['Appliances']

6                    \nBuilt-Ins\nMicrowave\nRangeOven\n
7      \nBuilt-Ins\nMicrowave\nBuilt-In Gas\nCooktop ...
8                   \nBuilt-Ins\nRange Hood\nRangeOven\n
9                                                       
10                                                      
11                           \nBuilt-In BBQ\nRangeOven\n
12                                                      
13     \nBuilt-In BBQ\nBuilt-Ins\nMicrowave\nRangeOven\n
14                                                      
15                                                      
16                                                      
17                \nBuilt-In BBQ\nMicrowave\nRangeOven\n
18                                         \nRangeOven\n
19                                                      
20                                                      
21                                                      
22     \nRange Hood\nMicrowave\nCooktop - Gas\nOven-G...
23                  \nBuilt-Ins

In [300]:
df_test.loc[df_test['Appliances'].str.contains('\n'), :]

Unnamed: 0,Date,MLSid,MLS Name,Street,Price,Bedrooms,Bathrooms,Appliances,Description,Fullbath,Halfbath,Threequarter Bath,Rooms
6,2016-03-16 00:00:00,16106762,CLAW,23826 MALIBU RD,25000.0,3,0,\nBuilt-Ins\nMicrowave\nRangeOven\n,0,3,1,0,331
7,2016-03-25 00:00:00,16109128,CLAW,"23410 Civic Center Way, C1",6500.0,2,0,\nBuilt-Ins\nMicrowave\nBuilt-In Gas\nCooktop ...,0,1,1,0,211
8,2016-04-12 00:00:00,16114818,CLAW,943 16th ST,5800000.0,3,0,\nBuilt-Ins\nRange Hood\nRangeOven\n,0,3,0,0,33
11,2016-04-25 00:00:00,16118008,CLAW,"23410 Civic Center Way, C1",6995000.0,5,0,\nBuilt-In BBQ\nRangeOven\n,0,4,0,0,54
13,2016-05-16 00:00:00,16125100,CLAW,"23410 Civic Center Way, C1",150000.0,6,0,\nBuilt-In BBQ\nBuilt-Ins\nMicrowave\nRangeOven\n,0,7,1,0,671
17,2016-06-06 00:00:00,16132158,CLAW,"9378 Wilshire Boulevard, Suite 200",27500.0,3,0,\nBuilt-In BBQ\nMicrowave\nRangeOven\n,0,5,0,0,35
18,2016-07-15 00:00:00,16133142,CLAW,1912 KELTON AVE,1699000.0,3,0,\nRangeOven\n,0,2,1,0,321
22,2016-06-16 00:00:00,16134906,CLAW,28011 PAQUET PL,1749000.0,2,0,\nRange Hood\nMicrowave\nCooktop - Gas\nOven-G...,0,2,0,0,22
23,2016-06-16 00:00:00,16135516,CLAW,"23410 Civic Center Way, C1",2195000.0,3,0,\nBuilt-Ins\nRange Hood\nRangeOven\n,0,3,0,0,33
24,2016-07-06 00:00:00,16136820,CLAW,6237 BONSALL DR,17500000.0,5,0,\nBuilt-In BBQ\nBuilt-Ins\nRangeOven\n,0,5,2,0,552


In [78]:
df_test['Appliances'] = df_test['Appliances'].str.replace('\n', ' ')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


In [79]:
df_test

Unnamed: 0,Date,MLSid,MLS Name,Street,Price,Bedrooms,Bathrooms,Appliances,Description,Fullbath,Halfbath,Threequarter Bath,Rooms,Test
6,2016-03-16 00:00:00,16106762,CLAW,23826 MALIBU RD,25000.0,3,0,Built-Ins Microwave RangeOven,0,3,1,0,7,Built-Ins Microwave RangeOven
7,2016-03-25 00:00:00,16109128,CLAW,"23410 Civic Center Way, C1",6500.0,2,0,Built-Ins Microwave Built-In Gas Cooktop - Ga...,0,1,1,0,4,Built-Ins Microwave Built-In Gas Cooktop - Ga...
8,2016-04-12 00:00:00,16114818,CLAW,943 16th ST,5800000.0,3,0,Built-Ins Range Hood RangeOven,0,3,0,0,6,Built-Ins Range Hood RangeOven
9,2016-04-13 00:00:00,16115130,CLAW,1333 Montana Ave,5499.0,3,0,,0,2,0,0,5,
10,2016-04-14 00:00:00,16115640,CLAW,27061 SEA VISTA DR,798000.0,0,0,,0,0,0,0,0,
11,2016-04-25 00:00:00,16118008,CLAW,"23410 Civic Center Way, C1",6995000.0,5,0,Built-In BBQ RangeOven,0,4,0,0,9,Built-In BBQ RangeOven
12,2016-05-06 00:00:00,16122648,CLAW,27086 MALIBU COVE COLONY DR,6649000.0,3,0,,0,3,1,0,7,
13,2016-05-16 00:00:00,16125100,CLAW,"23410 Civic Center Way, C1",150000.0,6,0,Built-In BBQ Built-Ins Microwave RangeOven,0,7,1,0,14,Built-In BBQ Built-Ins Microwave RangeOven
14,2016-05-26 00:00:00,16129206,CLAW,36 BREEZE AVE,5975.0,4,0,,0,2,0,0,6,
15,2016-06-02 00:00:00,16130744,CLAW,1730 Ocean Park Blvd,2450000.0,0,0,,0,0,0,0,0,
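
Replacing every newline with a space leaves a stray space at each end and makes multi-word appliances such as "Range Hood" hard to tell apart from their neighbours. One alternative, sketched here on values copied from the column above, is to strip the outer newlines first and join the items with a comma:

```python
import pandas as pd

appliances = pd.Series([
    "\nBuilt-Ins\nMicrowave\nRangeOven\n",
    "\nRangeOven\n",
    "",
])

# Strip the outer newlines first, then join the remaining items with ', '.
tidy = appliances.str.strip().str.replace("\n", ", ", regex=False)

print(tidy.tolist())
```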


In [80]:
df_test.drop(columns = 'Test', inplace = True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


In [81]:
df_test.head()

Unnamed: 0,Date,MLSid,MLS Name,Street,Price,Bedrooms,Bathrooms,Appliances,Description,Fullbath,Halfbath,Threequarter Bath,Rooms
6,2016-03-16,16106762,CLAW,23826 MALIBU RD,25000.0,3,0,Built-Ins Microwave RangeOven,0,3,1,0,7
7,2016-03-25,16109128,CLAW,"23410 Civic Center Way, C1",6500.0,2,0,Built-Ins Microwave Built-In Gas Cooktop - Ga...,0,1,1,0,4
8,2016-04-12,16114818,CLAW,943 16th ST,5800000.0,3,0,Built-Ins Range Hood RangeOven,0,3,0,0,6
9,2016-04-13,16115130,CLAW,1333 Montana Ave,5499.0,3,0,,0,2,0,0,5
10,2016-04-14,16115640,CLAW,27061 SEA VISTA DR,798000.0,0,0,,0,0,0,0,0


In [82]:
df_test.to_csv('Final.csv')
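
By default `to_csv` writes the DataFrame index as an unnamed first column. If the row labels left over from filtering (6, 7, 8, ...) are not wanted in the file, `index=False` omits them. A small sketch on a one-row stand-in frame:

```python
import pandas as pd

demo = pd.DataFrame(
    {"MLSid": ["16106762"], "Price": [25000.0]},
    index=[6],  # leftover index label from earlier filtering
)

# With no path, to_csv returns the CSV text, which is handy for a quick look.
csv_text = demo.to_csv(index=False)

print(csv_text)
```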