<script>
    function findAncestor (el, name) {
        while ((el = el.parentElement) && el.nodeName.toLowerCase() !== name);
        return el;
    }
    function colorAll(el, textColor) {
        el.style.color = textColor;
        Array.from(el.children).forEach((e) => {colorAll(e, textColor);});
    }
    function setBackgroundImage(src, textColor) {
        var section = findAncestor(document.currentScript, "section");
        if (section) {
            section.setAttribute("data-background-image", src);
            if (textColor) colorAll(section, textColor);
        }
    }
</script>

<style>
h1 {
  border: 1.5px solid #333;
  padding: 8px 12px;
  background-image: linear-gradient(#2774AE,#ebf8e1, #FFD100);
  position: static;
}
</style>

<h1 style='color:white'> Statistics 21 <br/> Python & Other Technologies for Data Science </h1>

<h3 style='color:white'>Vivian Lew, PhD - Friday, Week 5</h3>

<script>
    setBackgroundImage("Window1.jpg");
</script>

# Pandas DataFrames

## Week 5 Friday

## Adapted from Pandas in Action by B. Paskhaver and Python for Data Analysis by W. McKinney

We will be using the following today:

In [1]:
import numpy as np
import pandas as pd
import pyreadstat # you may need to install it
import datetime

and various datasets 

## Intro

- A pandas DataFrame is a two-dimensional table of data with rows and columns 
- pandas assigns an index label and an index position to each DataFrame row
- pandas also assigns a label and a position to each column

The DataFrame is two-dimensional because it requires two points of reference to specify a data value. 

`pd.DataFrame( )` is the basic constructor. It works most naturally with a dictionary (keys become column labels), but it also accepts other inputs such as NumPy arrays and lists.
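As a minimal sketch of the dictionary case, assuming a small made-up table of grades (the names `grades`, `midterm`, and `final` are hypothetical and not part of the course data):

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes a column label,
# each list becomes that column's values.
grades = pd.DataFrame(
    {"midterm": [88, 92, 79], "final": [91, 85, 83]},
    index=["s1", "s2", "s3"],  # optional row labels
)

# Two points of reference locate one value: a row label and a column label.
one_value = grades.loc["s2", "final"]
# The same cell by position (row 1, column 1, counting from 0).
same_value = grades.iloc[1, 1]

print(grades)
print(one_value, same_value)
```

Both lookups reach the same cell, which is the sense in which the DataFrame is two-dimensional.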

## Direct read to DataFrame from different sources

- the `read_` functions can load data from a variety of formats (e.g., csv, SAS, SPSS, json)
- and can do so remotely from a URL:

In [2]:
DIS = pd.read_csv('http://www.stat.ucla.edu/~vlew/datasets/DISNEY.csv')

In [3]:
DIS

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,YM
0,2019-07-01,140.449997,141.949997,139.220001,141.649994,139.939774,8996500,19/07
1,2019-07-02,141.399994,142.860001,141.270004,142.529999,140.809143,7554100,19/07
2,2019-07-03,142.699997,143.000000,142.000000,142.979996,141.253708,4150900,19/07
3,2019-07-05,141.419998,142.889999,140.699997,142.449997,141.601624,5596000,19/07
4,2019-07-08,142.179993,142.229996,140.970001,141.020004,140.180145,4993900,19/07
...,...,...,...,...,...,...,...,...
248,2020-06-24,115.849998,116.000000,110.029999,112.070000,112.070000,22252500,20/06
249,2020-06-25,108.989998,111.510002,108.500000,111.360001,111.360001,17240400,20/06
250,2020-06-26,110.949997,111.199997,108.019997,109.099998,109.099998,15270900,20/06
251,2020-06-29,109.000000,111.570000,108.099998,111.519997,111.519997,12584300,20/06


## Generated data from Numpy to Pandas

In [4]:
np.random.seed(1)
random_data = np.random.randint(1, 101, [3, 5])
random_data

array([[38, 13, 73, 10, 76],
       [ 6, 80, 65, 17,  2],
       [77, 72,  7, 26, 51]])

In [5]:
pd.DataFrame(data = random_data)

Unnamed: 0,0,1,2,3,4
0,38,13,73,10,76
1,6,80,65,17,2
2,77,72,7,26,51


In [6]:
scores = pd.DataFrame(
            data = random_data, 
            index = ['s1', 's2', 's3'], 
            columns = ['t1', 't2', 't3', 't4', 't5']
        )
scores

Unnamed: 0,t1,t2,t3,t4,t5
s1,38,13,73,10,76
s2,6,80,65,17,2
s3,77,72,7,26,51


## For Practice (you will need to install & import pyreadstat)

SPSS (IBM's stat software)

In [7]:
cc = pd.read_spss('SPDLCWave1Data.sav')
# Set the desired column as the index
index_column = 'ID'  
cc.set_index(index_column, inplace=True)
cc.head()

Unnamed: 0_level_0,U_S_,reside_spouse,reside_child,cookBR,laundryBR,shopBR,dishesBR,cleanBR,driveBR,cookAR,...,reltext,evan,attend,married,prevmar,prevmarP,prevchild,extend,state,CPSweightW1
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1.0,Yes,Yes,Yes,I did more of it,I did more of it,I did it all,My partner did more of it,We shared it equally,We shared it equally,I do more of it,...,,Yes,Green,Married,No,No,No,No,Michigan,0.419133
2.0,Yes,Yes,Yes,I did more of it,I did more of it,I did more of it,I did more of it,We shared it equally,We shared it equally,I do it all,...,,No,Green,Married,No,No,No,No,Kentucky,0.599348
3.0,Yes,Yes,Yes,We shared it equally,We shared it equally,My partner did more of it,We shared it equally,We shared it equally,,I do more of it,...,spiritual,No,Green,Married,No,No,No,No,Maryland,0.419133
4.0,Yes,Yes,Yes,My partner did more of it,We shared it equally,I did more of it,I did more of it,We shared it equally,We shared it equally,My partner does more of it,...,,No,Green,Married,No,No,No,No,California,1.791661
5.0,Yes,Yes,Yes,I did more of it,I did more of it,We shared it equally,I did it all,I did it all,We shared it equally,My partner does more of it,...,,,Green,Married,No,No,No,No,Mississippi,0.419133


## more practice

SAS - widely used in high places (e.g., the federal government), mostly for advanced analytics.

In [8]:
d = pd.read_sas("http://www.principlesofeconometrics.com/sas/usa.sas7bdat")

In [9]:
d.head()

Unnamed: 0,GDP,INF,F,B
0,4119.5,3.548619,8.47667,10.6767
1,4178.399902,3.645685,7.92333,9.76333
2,4261.299805,3.309139,7.9,9.28667
3,4321.799805,3.469453,8.10333,8.84333
4,4385.600098,3.06095,7.82667,7.93667


## still more practice

Stata - "boutique" software favored by Economists.

In [10]:
cps = pd.read_stata('20zpallagi.dta')
cps.head()

Unnamed: 0,statefips,state,zipcode,agi_stub,n1,mars1,mars2,mars4,elf,cprep,...,a85300,n11901,a11901,n11900,a11900,n11902,a11902,n12000,a12000,year
0,1,AL,0,"$1 under $25,000",785000.0,519980.0,85690.0,165290.0,724170.0,22560.0,...,0.0,57720.0,46577.0,674840.0,1827202.0,672200.0,1818867.0,2900.0,6089.0,2020.0
1,1,AL,0,"$25,000 under $50,000",554310.0,270870.0,121420.0,146470.0,515150.0,13260.0,...,0.0,81770.0,112540.0,470410.0,1445383.0,466960.0,1432458.0,4660.0,11648.0,2020.0
2,1,AL,0,"$50,000 under $75,000",290630.0,113280.0,124770.0,44570.0,269700.0,6420.0,...,0.0,70360.0,144380.0,220710.0,626662.0,216530.0,610170.0,5760.0,16235.0,2020.0
3,1,AL,0,"$75,000 under $100,000",181010.0,42010.0,120820.0,14410.0,168830.0,2570.0,...,0.0,49500.0,135429.0,130670.0,437179.0,126790.0,419324.0,3730.0,14903.0,2020.0
4,1,AL,0,"$100,000 under $200,000",269080.0,31310.0,224330.0,8270.0,252360.0,3250.0,...,20.0,103250.0,470206.0,165650.0,724529.0,156910.0,642895.0,11280.0,80064.0,2020.0


In [11]:
cps.dtypes.value_counts() 

float64     161
int32         2
object        1
category      1
float32       1
Name: count, dtype: int64

## yet more practice

JSON - JavaScript Object Notation. A lightweight, text-based format for representing data. It is human readable, and it is easy for Python to parse and to generate.

In [12]:
yelp = pd.read_json('Yelp/yelp_academic_dataset_business.json', lines = True)
# Set the desired column as the index
index_column = 'business_id'  
yelp.set_index(index_column, inplace=True)

yelp.head()

Unnamed: 0_level_0,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
business_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Pns2l4eNsfO8kk83dixA6A,"Abby Rappoport, LAC, CMQ","1616 Chapala St, Ste 2",Santa Barbara,CA,93101,34.426679,-119.711197,5.0,7,0,{'ByAppointmentOnly': 'True'},"Doctors, Traditional Chinese Medicine, Naturop...",
mpf3x-BjTdTEA3yCZrAYPw,The UPS Store,87 Grasso Plaza Shopping Center,Affton,MO,63123,38.551126,-90.335695,3.0,15,1,{'BusinessAcceptsCreditCards': 'True'},"Shipping Centers, Local Services, Notaries, Ma...","{'Monday': '0:0-0:0', 'Tuesday': '8:0-18:30', ..."
tUFrWirKiKi_TAnsVWINQQ,Target,5255 E Broadway Blvd,Tucson,AZ,85711,32.223236,-110.880452,3.5,22,0,"{'BikeParking': 'True', 'BusinessAcceptsCredit...","Department Stores, Shopping, Fashion, Home & G...","{'Monday': '8:0-22:0', 'Tuesday': '8:0-22:0', ..."
MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,935 Race St,Philadelphia,PA,19107,39.955505,-75.155564,4.0,80,1,"{'RestaurantsDelivery': 'False', 'OutdoorSeati...","Restaurants, Food, Bubble Tea, Coffee & Tea, B...","{'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', ..."
mWMc6_wTdE0EUBKIGXDVfA,Perkiomen Valley Brewery,101 Walnut St,Green Lane,PA,18054,40.338183,-75.471659,4.5,13,1,"{'BusinessAcceptsCreditCards': 'True', 'Wheelc...","Brewpubs, Breweries, Food","{'Wednesday': '14:0-22:0', 'Thursday': '16:0-2..."


In [13]:
yelp.dtypes.value_counts() 

object     8
float64    3
int64      2
Name: count, dtype: int64
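The `lines = True` argument tells `pd.read_json` to expect one JSON object per line. A small sketch, assuming two made-up records held in memory instead of the Yelp file (the ids and values are hypothetical):

```python
from io import StringIO
import pandas as pd

# Hypothetical two-record file in JSON Lines layout: one JSON object per line.
json_lines = StringIO(
    '{"business_id": "a1", "stars": 4.5, "review_count": 10}\n'
    '{"business_id": "b2", "stars": 3.0, "review_count": 25}\n'
)

small = pd.read_json(json_lines, lines=True)
# Same idea as the yelp cell, written without inplace=True.
small = small.set_index("business_id")
print(small)
```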

## Interview example 

Even deeply nested JSON like this can be read. There are other ways, but this one is perhaps the most understandable.

In [14]:
datasift = pd.read_json('http://www.stat.ucla.edu/~vlew/datasets/DataSift.json', 
                        lines = True)
datasift

Unnamed: 0,count,hash,hash_type,id,delivered_at,interactions
0,18,2f07e6e5e408d2a2988e,historic,49de7967b29ac130ab49248db140d634,2014-01-09 23:53:55+00:00,[{'interaction': {'author': {'avatar': 'http:/...


In [15]:
datasift['interactions'][0][0]

{'interaction': {'author': {'avatar': 'http://a0.twimg.com/profile_images/1833467114/oxford_2_normal.jpg',
   'id': 494925931,
   'link': 'http://twitter.com/OxfordFreegle',
   'name': 'Oxford Freegle',
   'username': 'OxfordFreegle'},
  'content': 'WANTED: Blue Brooches - Any style or condition (Abingdon Ox14) http://t.co/cETDGisx',
  'created_at': 'Sun, 13 Jan 2013 23:08:04 +0000',
  'id': '1e25dd61585aa200e07405b0f7d6e7ec',
  'link': 'http://twitter.com/OxfordFreegle/statuses/290596104419041280',
  'schema': {'version': 3},
  'source': 'Freegle',
  'type': 'twitter'},
 'klout': {'score': 25},
 'language': {'confidence': 62, 'tag': 'en'},
 'links': {'code': [200],
  'created_at': ['Sun, 13 Jan 2013 23:08:05 +0000'],
  'hops': [[]],
  'meta': {'charset': ['UTF-8'],
   'content_type': ['text/html'],
   'lang': ['unknown']},
  'normalized_url': ['http://go.frgl.it/Z39rd'],
  'retweet_count': [0],
  'title': [None],
  'url': ['http://go.frgl.it/Z39rd']},
 'salience': {'content': {'sentim

In [16]:
# list comprehension
texts = [datasift['interactions'][0][i]['interaction']['content'] for i in range(18)]

# tuple and the built-in enumerate
for i, text in enumerate(texts):
    print(f"Tweet {i + 1}: {text}")

Tweet 1: WANTED: Blue Brooches - Any style or condition (Abingdon Ox14) http://t.co/cETDGisx
Tweet 2: I uploaded a @YouTube video http://t.co/X1aUIU7k Relevence - Player - Neyo Style Beat
Tweet 3: Ya estamos en el 2013. Ya déjen de poner el puto Gangnam Style por favor!
Tweet 4: Watching the Golden Globes red carpet on E! I'll definitely be tweeting my personal commentary #ERedCarpet
Tweet 5: I alwaaayyss have to pick out my outfit the night before school. #girlproblems
Tweet 6: so sick of how i look and what i dress like
Tweet 7: @DonnaNewcross I'd have to wear net knickers. At least plenty of supplies on ward, toiletries etc!
Tweet 8: Watching #thegoldenblobes #redcarpet
Tweet 9: #eredcarpet is the hashtag to use if you want to send comments to the social team
Tweet 10: Its my favorite time if year, red carpet season!! #GoldenGlobes #Eredcarpet
Tweet 11: Red Carpet #GoldenGlobes
Tweet 12: @GiulianaRancic love your second outfit. You look absolutely stunning!!!
Tweet 13: RT @_IFB: Bes
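One of the other ways is `pd.json_normalize`, which flattens nested dictionaries into columns. The sketch below assumes two made-up records shaped only loosely like the `interactions` entries; it is not run against the DataSift file itself.

```python
import pandas as pd

# Hypothetical records shaped loosely like the nested 'interactions' entries.
records = [
    {"interaction": {"author": {"username": "user_a"}, "content": "first post"},
     "klout": {"score": 25}},
    {"interaction": {"author": {"username": "user_b"}, "content": "second post"},
     "klout": {"score": 40}},
]

# json_normalize flattens nested dictionaries into dot-separated column names.
flat = pd.json_normalize(records)

print(flat.columns.tolist())
print(flat["interaction.content"].tolist())
```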

## Fixed Width File

The columns and data are orderly (lined up nicely), but the file is a little messy: it starts with comment lines, and its column names do not follow Python naming conventions. We skip the leading lines and supply our own names.

In [17]:
# Define column names (when needed)
column_names = ['DATE', 'TIME', 'ET', 'GT', 'MAG',  'M', 'LAT',  'LON', 'DEPTH', 'Q',  'EVID', 'NPH', 'NGRM']

eq2023 = pd.read_fwf("https://service.scedc.caltech.edu/ftp/catalogs/SCEC_DC/2023.catalog", 
                 names=column_names, skiprows=10)


In [18]:
eq2023.head()

Unnamed: 0,DATE,TIME,ET,GT,MAG,M,LAT,LON,DEPTH,Q,EVID,NPH,NGRM
0,2023/01/01,00:32:22.63,eq,l,0.84,l,33.408,-116.617,7.5,A,40152455.0,71.0,1270.0
1,2023/01/01,00:57:36.51,eq,l,0.47,l,33.556,-116.613,15.2,A,40152463.0,36.0,1453.0
2,2023/01/01,01:00:24.86,eq,l,1.93,l,34.401,-118.713,6.7,A,40152471.0,87.0,2922.0
3,2023/01/01,01:07:57.96,eq,l,0.84,l,33.403,-116.37,4.2,A,40152479.0,51.0,1283.0
4,2023/01/01,02:10:25.56,eq,l,1.0,l,33.013,-116.429,6.7,A,40152487.0,69.0,1342.0
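The same pattern can be tried offline. This sketch assumes a tiny made-up fixed-width text (not the SCEDC catalog) with two leading comment lines:

```python
from io import StringIO
import pandas as pd

# Hypothetical fixed-width text: two comment lines, then aligned columns.
raw = StringIO(
    "# made-up catalog\n"
    "# date       mag  depth\n"
    "2023/01/01  0.84    7.5\n"
    "2023/01/02  1.93    6.7\n"
)

# skiprows drops the leading comment lines; names supplies our own column labels.
toy = pd.read_fwf(raw, skiprows=2, names=["DATE", "MAG", "DEPTH"])
print(toy)
```

By default `read_fwf` infers the column boundaries from the whitespace that lines up across rows, which is why the data must be neatly aligned.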


## Reading & Converting Dates

In [19]:
DIS = pd.read_csv('http://www.stat.ucla.edu/~vlew/datasets/DISNEY.csv')
DIS.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,YM
0,2019-07-01,140.449997,141.949997,139.220001,141.649994,139.939774,8996500,19/07
1,2019-07-02,141.399994,142.860001,141.270004,142.529999,140.809143,7554100,19/07
2,2019-07-03,142.699997,143.0,142.0,142.979996,141.253708,4150900,19/07
3,2019-07-05,141.419998,142.889999,140.699997,142.449997,141.601624,5596000,19/07
4,2019-07-08,142.179993,142.229996,140.970001,141.020004,140.180145,4993900,19/07


In [20]:
DIS.dtypes

Date          object
Open         float64
High         float64
Low          float64
Close        float64
Adj Close    float64
Volume         int64
YM            object
dtype: object

## Reading instead with Date immediately identified

In [21]:
DIS = pd.read_csv("http://www.stat.ucla.edu/~vlew/datasets/DISNEY.csv", 
                  parse_dates = ["Date"])

In [22]:
# create a more usable date field, perhaps
DIS['YM'] = DIS['Date'].dt.to_period('M')
DIS.dtypes

Date         datetime64[ns]
Open                float64
High                float64
Low                 float64
Close               float64
Adj Close           float64
Volume                int64
YM                period[M]
dtype: object

In [23]:
DIS.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,YM
0,2019-07-01,140.449997,141.949997,139.220001,141.649994,139.939774,8996500,2019-07
1,2019-07-02,141.399994,142.860001,141.270004,142.529999,140.809143,7554100,2019-07
2,2019-07-03,142.699997,143.0,142.0,142.979996,141.253708,4150900,2019-07
3,2019-07-05,141.419998,142.889999,140.699997,142.449997,141.601624,5596000,2019-07
4,2019-07-08,142.179993,142.229996,140.970001,141.020004,140.180145,4993900,2019-07
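If the data are already loaded with dates stored as strings, the column can be converted afterwards with `pd.to_datetime`. A sketch assuming a small made-up frame (`prices` is hypothetical):

```python
import pandas as pd

# Hypothetical frame whose dates arrived as plain strings.
prices = pd.DataFrame({"Date": ["2019-07-01", "2019-07-02", "2019-08-01"],
                       "Close": [141.6, 142.5, 140.0]})

# Convert the existing column instead of re-reading the file.
prices["Date"] = pd.to_datetime(prices["Date"])

# The .dt accessor is available once the column holds datetimes.
prices["YM"] = prices["Date"].dt.to_period("M")
print(prices.dtypes)
```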


## Application

In [24]:
# Group the DataFrame by the time period variable YM and 
# compute the min, max, and median for the 'Close' variable by chaining
result = DIS.groupby('YM')['Close'].agg(['min', 'max', 'median'])

# Round multiple columns at once using a dictionary
round_dict = {'min': 1, 'max': 1, 'median': 0}
result = result.round(round_dict)

print(result)

           min    max  median
YM                           
2019-07  139.9  146.4   143.0
2019-08  131.7  141.9   136.0
2019-09  130.0  139.6   136.0
2019-10  128.1  132.4   130.0
2019-11  131.3  151.6   147.0
2019-12  143.8  150.6   146.0
2020-01  135.9  148.2   144.0
2020-02  117.7  144.7   141.0
2020-03   85.8  120.0   100.0
2020-04   93.9  112.2   102.0
2020-05  100.9  121.5   109.0
2020-06  109.1  127.3   117.0


In [25]:
type(result)

pandas.core.frame.DataFrame

## value_counts() answers a different question

In [26]:
DIS['YM'].value_counts().sort_index()

YM
2019-07    22
2019-08    22
2019-09    20
2019-10    23
2019-11    20
2019-12    21
2020-01    21
2020-02    19
2020-03    22
2020-04    21
2020-05    20
2020-06    22
Freq: M, Name: count, dtype: int64

In [27]:
type(DIS['YM'].value_counts())

pandas.core.series.Series
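To make the contrast concrete, here is a sketch on a made-up three-row frame (`toy` is hypothetical): `value_counts()` counts how many rows carry each label, while `groupby(...).agg(...)` summarizes another column within each label.

```python
import pandas as pd

# Hypothetical month labels and closing prices.
toy = pd.DataFrame({"YM": ["2019-07", "2019-07", "2019-08"],
                    "Close": [141.6, 142.5, 136.0]})

# How many rows fall in each month? (a count of labels)
counts = toy["YM"].value_counts().sort_index()

# What do the Close values look like within each month? (a summary of values)
summary = toy.groupby("YM")["Close"].agg(["min", "max"])

print(counts)
print(summary)
```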

## Rearranging values

Both Series and DataFrames have the `.sort_index()` and `.sort_values()` methods, which can be used to rearrange the rows by index label or by value.

In [28]:
DIS.sort_values(by = "Close", ascending = True)

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,YM
183,2020-03-23,84.489998,87.279999,81.089996,85.760002,85.760002,32246600,2020-03
182,2020-03-20,95.989998,96.989998,85.839996,85.980003,85.980003,31957800,2020-03
180,2020-03-18,87.589996,89.339996,79.070000,88.800003,88.800003,43592500,2020-03
176,2020-03-12,97.620003,100.000000,91.639999,91.809998,91.809998,40392900,2020-03
179,2020-03-17,95.800003,97.459999,91.150002,93.529999,93.529999,27526200,2020-03
...,...,...,...,...,...,...,...,...
103,2019-11-25,148.800003,150.210007,147.699997,149.690002,148.798508,11316800,2019-11
107,2019-12-02,152.940002,152.970001,149.100006,150.619995,149.722961,10351000,2019-12
105,2019-11-27,152.300003,152.570007,151.149994,151.479996,150.577850,6155400,2019-11
106,2019-11-29,151.479996,152.470001,151.009995,151.580002,150.677261,6284900,2019-11
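`sort_values` also accepts several columns, each with its own direction, and `sort_index` restores label order. A sketch on a made-up frame (the `group` and `score` columns are hypothetical):

```python
import pandas as pd

# Hypothetical scores with a shuffled integer index.
toy = pd.DataFrame({"group": ["b", "a", "b", "a"],
                    "score": [3, 9, 7, 1]},
                   index=[2, 0, 3, 1])

# Sort by group A-Z, then by score from high to low within each group.
by_values = toy.sort_values(by=["group", "score"], ascending=[True, False])

# Put the rows back in the order of their index labels.
by_index = toy.sort_index()

print(by_values)
print(by_index)
```

Both calls return new objects; `toy` itself keeps its original row order.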


## DataFrame Attributes

In [29]:
result.index

PeriodIndex(['2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2019-12',
             '2020-01', '2020-02', '2020-03', '2020-04', '2020-05', '2020-06'],
            dtype='period[M]', name='YM')

In [30]:
result.ndim, result.shape, result.size

(2, (12, 3), 36)

## Filtering/Subsetting

In [31]:
# multiple conditions use bitwise & and |
yelp_bad = yelp[(yelp["review_count"] > 100) & (yelp["stars"] < 2)]
type(yelp_bad)

pandas.core.frame.DataFrame

In [32]:
yelp_bad_sorted = yelp_bad.sort_values(by='review_count', ascending=False)
yelp_bad_sorted.head()

Unnamed: 0_level_0,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
business_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
1fy9fS3UH2k4TfQcryNKkA,Goedeker's,13850 Manchester Rd,Ballwin,MO,63011,38.595703,-90.483721,1.5,747,1,"{'BusinessAcceptsCreditCards': 'True', 'Restau...","Appliances & Repair, Local Services, Appliance...","{'Monday': '10:0-18:0', 'Tuesday': '10:0-18:0'..."
ONuqtwn8euUIWumg3U_4DQ,Sears Home Services,639 B Gravios Bluffs Blvd,Fenton,MO,63026,38.507848,-90.435434,1.0,575,0,{'BusinessAcceptsCreditCards': 'True'},"Local Services, Home Services, Appliances & Re...","{'Monday': '9:30-17:30', 'Tuesday': '9:30-17:3..."
-jsmtvdoUI-GJRSklYmEuA,Pets Best,"2323 S Vista Ave, Ste 100",Boise,ID,83705,43.581395,-116.214277,1.5,461,1,,"Pet Services, Pets, Pet Insurance","{'Monday': '6:0-19:0', 'Tuesday': '6:0-19:0', ..."
xDLh8Rgh1nL-JZW7wYQF8A,Defender Security Company,"3750 Priority Way South Dr, Ste 200",Indianapolis,IN,46240,39.920326,-86.102723,1.0,413,1,"{'ByAppointmentOnly': 'True', 'BusinessAccepts...","Security Systems, Heating & Air Conditioning/H...","{'Monday': '9:0-17:0', 'Tuesday': '9:0-17:0', ..."
uLp_wwemUq6cAuxPwaZF3Q,Zatarain's Kitchen,"900 Airline Dr, Cocourse D, Louis Armstrong Ne...",Kenner,LA,70062,29.985551,-90.254249,1.5,412,0,"{'RestaurantsReservations': 'False', 'Business...","Seafood, Restaurants, Food, Desserts, Cajun/Cr...","{'Monday': '10:0-20:0', 'Tuesday': '10:0-20:0'..."


## Filtering/Subsetting (cont'd)

In [33]:
# | (or) and & (and): wrap each condition in parentheses
yelp_high_low = yelp[((yelp["stars"] < 2) | (yelp["stars"] > 4.25)) & (yelp["review_count"] > 400)]
yelp_high_low_sorted = yelp_high_low.sort_values(by='stars', ascending=False)
yelp_high_low_sorted

Unnamed: 0_level_0,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
business_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
l_7TW_Ix58-QvhQgpJi_Xw,SUGARED + BRONZED,1120 Walnut St,Philadelphia,PA,19107,39.948570,-75.160072,5.0,513,1,"{'BusinessParking': '{'garage': True, 'street'...","Beauty & Spas, Shopping, Waxing, Cosmetics & B...","{'Monday': '0:0-0:0', 'Tuesday': '9:0-23:0', '..."
tARR9jhv5gi9TjsfSVmjmw,Kaffe Crepe,"1300 East Plumb Ln, Ste C4",Reno,NV,89502,39.504208,-119.782768,5.0,454,1,"{'Caters': 'False', 'HasTV': 'False', 'GoodFor...","Food, Restaurants, Cafes, Creperies, Coffee & Tea","{'Monday': '9:0-15:0', 'Tuesday': '8:0-16:0', ..."
_aKr7POnacW_VizRKBpCiA,Blues City Deli,2438 McNair Ave,Saint Louis,MO,63104,38.605024,-90.218110,5.0,991,1,"{'BikeParking': 'True', 'RestaurantsAttire': '...","Delis, Bars, Restaurants, Nightlife, Pubs, Ame...","{'Monday': '0:0-0:0', 'Tuesday': '10:30-15:0',..."
FHDuu5Mv1bEkusxEuhptZQ,Barracuda Deli Cafe St. Pete Beach,6640 Gulf Blvd,St Pete Beach,FL,33706,27.736694,-82.748189,5.0,521,1,"{'RestaurantsAttire': 'u'casual'', 'Restaurant...","Caribbean, Latin American, Restaurants, Breakf...","{'Tuesday': '11:0-20:30', 'Wednesday': '11:0-2..."
8QqnRpM-QxGsjDNuu0E57A,Carlillos Cocina,415 S Rock Blvd,Sparks,NV,89431,39.530096,-119.766608,5.0,799,1,"{'NoiseLevel': 'u'average'', 'GoodForMeal': '{...","Bars, Mexican, Breakfast & Brunch, Restaurants...","{'Monday': '7:0-14:0', 'Tuesday': '7:0-14:0', ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...
uLp_wwemUq6cAuxPwaZF3Q,Zatarain's Kitchen,"900 Airline Dr, Cocourse D, Louis Armstrong Ne...",Kenner,LA,70062,29.985551,-90.254249,1.5,412,0,"{'RestaurantsReservations': 'False', 'Business...","Seafood, Restaurants, Food, Desserts, Cajun/Cr...","{'Monday': '10:0-20:0', 'Tuesday': '10:0-20:0'..."
-jsmtvdoUI-GJRSklYmEuA,Pets Best,"2323 S Vista Ave, Ste 100",Boise,ID,83705,43.581395,-116.214277,1.5,461,1,,"Pet Services, Pets, Pet Insurance","{'Monday': '6:0-19:0', 'Tuesday': '6:0-19:0', ..."
1fy9fS3UH2k4TfQcryNKkA,Goedeker's,13850 Manchester Rd,Ballwin,MO,63011,38.595703,-90.483721,1.5,747,1,"{'BusinessAcceptsCreditCards': 'True', 'Restau...","Appliances & Repair, Local Services, Appliance...","{'Monday': '10:0-18:0', 'Tuesday': '10:0-18:0'..."
ONuqtwn8euUIWumg3U_4DQ,Sears Home Services,639 B Gravios Bluffs Blvd,Fenton,MO,63026,38.507848,-90.435434,1.0,575,0,{'BusinessAcceptsCreditCards': 'True'},"Local Services, Home Services, Appliances & Re...","{'Monday': '9:30-17:30', 'Tuesday': '9:30-17:3..."


## Filtering/Subsetting (cont'd)

In [34]:
# keep businesses with at least 100 reviews
yelp_big = yelp[yelp["review_count"].between(100, yelp["review_count"].max())] # inclusive
yelp_big["stars"].value_counts().sort_index()

stars
1.0      28
1.5      88
2.0     237
2.5     576
3.0    1403
3.5    3364
4.0    5403
4.5    3269
5.0     279
Name: count, dtype: int64
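The endpoints of `between()` are included by default, and the `inclusive` argument changes that. A sketch with made-up counts (the Series `counts` is hypothetical):

```python
import pandas as pd

counts = pd.Series([99, 100, 150, 200])  # hypothetical review counts

# Default behavior: both endpoints are included.
both = counts.between(100, 200)
# inclusive="left" keeps 100 but excludes 200.
left_only = counts.between(100, 200, inclusive="left")
# With no upper bound in mind, a plain comparison does the same job.
at_least = counts >= 100

print(both.tolist(), left_only.tolist(), at_least.tolist())
```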

## Filtering/Subsetting (cont'd)

In [35]:
# filtering on a string; .str.contains is a Series method that treats the pattern as a regex,
# so the + is escaped inside a raw string
names_with_plus = yelp[yelp["name"].str.contains(r"\+")]
names_with_plus.head()

Unnamed: 0_level_0,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
business_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
rR30b1XWbxZFFgLyGbfNAw,BODHI : Craft Bar + Thai Bistro,922 Massachusetts Ave,Indianapolis,IN,46202,39.78022,-86.141803,4.5,132,1,"{'Ambience': '{u'divey': False, u'hipster': Fa...","Gastropubs, Nightlife, Restaurants, Thai, Bars...","{'Monday': '0:0-0:0', 'Wednesday': '17:0-22:0'..."
2ahg0y3bmn8gRo0aGrFiEQ,Wink Lash Studio + Bath Bar,"538 W Plumb Ln, Ste E",Reno,NV,89509,39.505012,-119.815995,4.5,15,1,"{'BusinessAcceptsCreditCards': 'True', 'Restau...","Beauty & Spas, Eyelash Service","{'Tuesday': '10:0-17:0', 'Wednesday': '10:0-17..."
voKcGki0lA5JETqKFwm6AA,Factotum Barber + Supply,902 Piety St,New Orleans,LA,70117,29.964984,-90.042294,5.0,14,1,"{'RestaurantsPriceRange2': '2', 'ByAppointment...","Barbers, Men's Hair Salons, Hair Salons, Beaut...",
SvcWlFeXbSNkENnZfWYgEQ,willow + june hair,6212 A Ridge Ave,Philadelphia,PA,19128,40.035794,-75.218028,5.0,24,1,"{'ByAppointmentOnly': 'True', 'RestaurantsPric...","Beauty & Spas, Hair Extensions, Makeup Artists...","{'Tuesday': '14:0-20:0', 'Wednesday': '14:0-20..."
MuKoTR56s6elHEI2wUkgJA,Grace Meat + Three,4270 Manchester,St. Louis,MO,63110,38.626791,-90.256705,4.5,470,1,"{'WiFi': 'u'no'', 'HasTV': 'True', 'Restaurant...","Restaurants, Soul Food, Southern, Soup, Sandwi...","{'Monday': '0:0-0:0', 'Wednesday': '11:0-21:0'..."
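An alternative to escaping is to switch off regular-expression matching. A sketch assuming a few made-up business names, including one missing value:

```python
import pandas as pd

# Hypothetical names; None stands in for a missing value.
names = pd.Series(["Grace Meat + Three", "Target", "A+ Tutoring", None])

# regex=False treats "+" as an ordinary character, so no escaping is needed.
# na=False makes missing names count as "no match" instead of propagating NaN.
has_plus = names.str.contains("+", regex=False, na=False)

print(names[has_plus].tolist())
```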


## Changing the Index

The Index object of a Pandas Series or Pandas DataFrame is immutable: its individual labels cannot be modified in place.

BUT, if you want to change the index of a Series or DataFrame, you can define a new index and assign it in place of the existing one.
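A short sketch of both halves of that statement, assuming a made-up two-item Series `s`:

```python
import pandas as pd

s = pd.Series([1.4, 2.3], index=["d", "c"])  # hypothetical two-item Series

try:
    s.index[0] = "z"          # changing one label in place is not allowed
    changed_in_place = True
except TypeError as err:
    changed_in_place = False
    print("TypeError:", err)

s.index = ["z", "c"]          # replacing the whole index is allowed
print(s)
```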

In [36]:
# note that the value after the decimal place corresponds to the letter position.
# i.e. 1.4 corresponds to d, the fourth letter.
original1 = pd.Series([1.4, 2.3, 3.1, 4.2], index = ['d','c','a','b'])
original1.index

Index(['d', 'c', 'a', 'b'], dtype='object')

In [37]:
print(original1)

d    1.4
c    2.3
a    3.1
b    4.2
dtype: float64


In [38]:
original1.index = range(4) # I replace the index of the series with this range object.

In [39]:
original1

0    1.4
1    2.3
2    3.1
3    4.2
dtype: float64

In [40]:
original1.index # We can see this has automatically become a RangeIndex object

RangeIndex(start=0, stop=4, step=1)

In [41]:
original1[1]

2.3

In [42]:
original1.loc[1] # behaves the same as above

2.3

In [43]:
original1.iloc[1] # behaves the same as above because the range index starts at 0

2.3

In [44]:
original1.index = range(1,5)

In [45]:
original1

1    1.4
2    2.3
3    3.1
4    4.2
dtype: float64

In [46]:
original1[1]

1.4

In [47]:
original1.loc[1]

1.4

In [48]:
original1.iloc[1] # why?

2.3
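One way to see the answer is to put the two lookups side by side. The sketch below rebuilds a Series with the same values and the same 1-to-4 index (`s` is a stand-in name):

```python
import pandas as pd

# Same values as original1, with integer labels 1 through 4.
s = pd.Series([1.4, 2.3, 3.1, 4.2], index=range(1, 5))

by_label = s.loc[1]      # the row whose label is 1: the first value
by_position = s.iloc[1]  # the row at position 1, counting from 0: the second value

print(by_label, by_position)
```

`.loc` always uses labels and `.iloc` always uses positions, so they agree only when the labels happen to start at 0 and count up by 1.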

In [49]:
original1.index = ['a','b','c','d'] # be careful: pandas does not check that the new labels keep their original meaning.
# in the original, 'a' was associated with 3.1. This index will associate it with 1.4

In [50]:
original1

a    1.4
b    2.3
c    3.1
d    4.2
dtype: float64

In [51]:
original1['a']

1.4

In [52]:
original1[0] # with a string index, [] falls back to position; recent pandas versions deprecate this, so prefer original1.iloc[0]

1.4

In [53]:
original1.index = [1, 2, 3, 4, 5]
# if the object you provide is of a different length, you get a value error

ValueError: Length mismatch: Expected axis has 4 elements, new values have 5 elements

In [54]:
# similarly you can change the index of a DataFrame by defining a new object and assigning it to the index.
original2 = pd.Series([2.2, 3.1, 1.3, 4.4], index = ['b','a','c','d'])
df = pd.DataFrame({"x":original1, "y": original2})
df.index = ['j','k','l','m']
df

Unnamed: 0,x,y
j,1.4,3.1
k,2.3,2.2
l,3.1,1.3
m,4.2,4.4


## Reindexing

Reindexing is different from just defining a new index. It is also different from setting an existing column as the index, which is what `set_index()` did with `ID` and `business_id` earlier.

Reindexing takes a current Pandas object and creates a *new* Pandas object that *conforms* to the specified index: labels that already exist keep their values, and labels that are new get missing values.
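The difference is easiest to see side by side. A sketch assuming a made-up three-item Series `s`:

```python
import pandas as pd

s = pd.Series([1.4, 2.3, 3.1], index=["c", "a", "b"])  # hypothetical

# Assigning an index relabels by position: the values stay where they are.
relabeled = s.copy()
relabeled.index = ["a", "b", "c"]

# reindex looks up each requested label: the values move with their labels.
conformed = s.reindex(["a", "b", "c"])

print(relabeled.tolist())   # values in their original order
print(conformed.tolist())   # values rearranged to follow a, b, c
```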

In [55]:
original = pd.Series([1.4, 2.3, 3.1, 4.2], index = ['d','c','a','b'])

In [56]:
original

d    1.4
c    2.3
a    3.1
b    4.2
dtype: float64

In [57]:
newobj = original.reindex(['a','b','c','d','e']) # note this has an index value that doesn't exist in the original series

In [58]:
newobj  # takes the data in original and moves it so it conforms to the specified index
# labels that do not exist in the original index get NaN

a    3.1
b    4.2
c    2.3
d    1.4
e    NaN
dtype: float64

In [59]:
# if you don't want NaN, you can specify a fill_value
newobj2 = original.reindex(['a','b','c','d','e'], fill_value = 0)
newobj2

a    3.1
b    4.2
c    2.3
d    1.4
e    0.0
dtype: float64

For ordered data like a time series, it might be desirable to fill in the missing values from neighboring observations when reindexing.

In [60]:
obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 3, 6])
obj3

0      blue
3    purple
6    yellow
dtype: object

In [61]:
obj3.reindex(range(9))  # without any optional arguments, lots of missing values

0      blue
1       NaN
2       NaN
3    purple
4       NaN
5       NaN
6    yellow
7       NaN
8       NaN
dtype: object

In [62]:
obj3.reindex(range(9), method='ffill')
# forward-fill pushes values 'forward' until a new value is encountered

0      blue
1      blue
2      blue
3    purple
4    purple
5    purple
6    yellow
7    yellow
8    yellow
dtype: object

In [63]:
obj3.reindex(range(9), method='bfill')  
# back-fill works in the opposite direction
# there is no value after index 6 to pull back, so indexes 7 and 8 remain NaN

0      blue
1    purple
2    purple
3    purple
4    yellow
5    yellow
6    yellow
7       NaN
8       NaN
dtype: object
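The fill can also be capped. As a sketch, the `limit` argument of `reindex` restricts how many consecutive new slots each value may be carried into (the Series below repeats the color example under the stand-in name `obj`):

```python
import pandas as pd

obj = pd.Series(["blue", "purple", "yellow"], index=[0, 3, 6])

# limit=1 lets each value be carried forward into at most one new slot.
limited = obj.reindex(range(9), method="ffill", limit=1)
print(limited)
```

Positions 2, 5, and 8 are two steps away from the last known value, so they stay missing.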

## when there is an existing index

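Here, yelp uses business_id as the index, and those IDs would be lost if we simply assigned a new index over them. To preserve the existing information, we can use `reset_index()`, which moves the current index back into an ordinary column and replaces it with the default integer index.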

In [64]:
yelp.head(3)

Unnamed: 0_level_0,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
business_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Pns2l4eNsfO8kk83dixA6A,"Abby Rappoport, LAC, CMQ","1616 Chapala St, Ste 2",Santa Barbara,CA,93101,34.426679,-119.711197,5.0,7,0,{'ByAppointmentOnly': 'True'},"Doctors, Traditional Chinese Medicine, Naturop...",
mpf3x-BjTdTEA3yCZrAYPw,The UPS Store,87 Grasso Plaza Shopping Center,Affton,MO,63123,38.551126,-90.335695,3.0,15,1,{'BusinessAcceptsCreditCards': 'True'},"Shipping Centers, Local Services, Notaries, Ma...","{'Monday': '0:0-0:0', 'Tuesday': '8:0-18:30', ..."
tUFrWirKiKi_TAnsVWINQQ,Target,5255 E Broadway Blvd,Tucson,AZ,85711,32.223236,-110.880452,3.5,22,0,"{'BikeParking': 'True', 'BusinessAcceptsCredit...","Department Stores, Shopping, Fashion, Home & G...","{'Monday': '8:0-22:0', 'Tuesday': '8:0-22:0', ..."


In [65]:
yelp.reset_index().head(3)

Unnamed: 0,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
0,Pns2l4eNsfO8kk83dixA6A,"Abby Rappoport, LAC, CMQ","1616 Chapala St, Ste 2",Santa Barbara,CA,93101,34.426679,-119.711197,5.0,7,0,{'ByAppointmentOnly': 'True'},"Doctors, Traditional Chinese Medicine, Naturop...",
1,mpf3x-BjTdTEA3yCZrAYPw,The UPS Store,87 Grasso Plaza Shopping Center,Affton,MO,63123,38.551126,-90.335695,3.0,15,1,{'BusinessAcceptsCreditCards': 'True'},"Shipping Centers, Local Services, Notaries, Ma...","{'Monday': '0:0-0:0', 'Tuesday': '8:0-18:30', ..."
2,tUFrWirKiKi_TAnsVWINQQ,Target,5255 E Broadway Blvd,Tucson,AZ,85711,32.223236,-110.880452,3.5,22,0,"{'BikeParking': 'True', 'BusinessAcceptsCredit...","Department Stores, Shopping, Fashion, Home & G...","{'Monday': '8:0-22:0', 'Tuesday': '8:0-22:0', ..."
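Whether the old index is kept is controlled by the `drop` argument. A sketch on a made-up two-row frame (the ids are hypothetical):

```python
import pandas as pd

# Hypothetical frame with a named index, like yelp's business_id.
toy = pd.DataFrame({"stars": [5.0, 3.0]},
                   index=pd.Index(["id_a", "id_b"], name="business_id"))

kept = toy.reset_index()              # business_id becomes a regular column
dropped = toy.reset_index(drop=True)  # business_id is discarded

print(kept.columns.tolist())
print(dropped.columns.tolist())
```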


## Date Ranges as Index

In [66]:
# we specify the creation of a date_index using the date_range function
# freq = 'D' creates Daily values 
date_index = pd.date_range('1/1/2010', periods=6, freq='D')
date_index

DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
               '2010-01-05', '2010-01-06'],
              dtype='datetime64[ns]', freq='D')

In [67]:
# we create a DataFrame with the date index
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]}, index=date_index)
df2

Unnamed: 0,prices
2010-01-01,100.0
2010-01-02,101.0
2010-01-03,
2010-01-04,100.0
2010-01-05,89.0
2010-01-06,88.0


In [68]:
date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')  # a new date index
df2.reindex(date_index2)

Unnamed: 0,prices
2009-12-29,
2009-12-30,
2009-12-31,
2010-01-01,100.0
2010-01-02,101.0
2010-01-03,
2010-01-04,100.0
2010-01-05,89.0
2010-01-06,88.0
2010-01-07,


In [69]:
df2.reindex(date_index2, method = 'bfill') 
# The value for Jan 3 isn't filled in because that NaN was not created by the reindexing process
# The NaN already existed in the data.

Unnamed: 0,prices
2009-12-29,100.0
2009-12-30,100.0
2009-12-31,100.0
2010-01-01,100.0
2010-01-02,101.0
2010-01-03,
2010-01-04,100.0
2010-01-05,89.0
2010-01-06,88.0
2010-01-07,
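
If the goal is to fill the pre-existing gap as well, one option is to reindex first and then call `.bfill()` on the result. This is a minimal sketch under that assumption; the names `reindexed` and `filled` are illustrative and do not appear in the lecture.

```python
import numpy as np
import pandas as pd

date_index = pd.date_range('1/1/2010', periods=6, freq='D')
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]}, index=date_index)
date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')

# reindex(method='bfill') fills only the labels that are new to the index
reindexed = df2.reindex(date_index2, method='bfill')

# a separate .bfill() call fills every NaN that has a later valid value
filled = df2.reindex(date_index2).bfill()

print(filled)
```

In `filled`, Jan 3 takes the Jan 4 price, while Jan 7 is still missing because nothing follows it.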


## Dropping rows or columns

Use `df.drop()` to remove rows (the default) or columns (specify `axis = 1`) by their index labels.

In [70]:
df = pd.DataFrame(np.arange(12).reshape(3,4), columns=['A', 'B', 'C', 'D'], index = ['x','y','z'])
df

Unnamed: 0,A,B,C,D
x,0,1,2,3
y,4,5,6,7
z,8,9,10,11


In [71]:
# drop rows
# df.drop returns a new object and leaves df unchanged

df_y = df.drop(['x', 'z'])
df_y

Unnamed: 0,A,B,C,D
y,4,5,6,7


In [72]:
# drop columns
# inplace = True modifies df itself instead of returning a new object

df.drop(['B', 'C'], axis = 1, inplace = True) 

# we must specify axis = 1; otherwise pandas will look for "B" and "C" among the row labels

In [73]:
df

Unnamed: 0,A,D
x,0,3
y,4,7
z,8,11
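
As an alternative to `axis`, `drop()` also accepts the keyword arguments `index=` and `columns=`, which some readers find easier to read. A short sketch follows; the names `no_bc` and `only_y` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  columns=['A', 'B', 'C', 'D'], index=['x', 'y', 'z'])

# columns= is equivalent to passing the labels with axis = 1
no_bc = df.drop(columns=['B', 'C'])

# index= is equivalent to the default axis = 0
only_y = df.drop(index=['x', 'z'])

# neither call changed df, so reassigning is an alternative to inplace = True
df = df.drop(columns=['B', 'C'])
print(df)
```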


## Data Alignment

When performing element-wise arithmetic, Pandas aligns the operands on their index labels before doing the computation.

In [74]:
s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s1

a    7.3
c   -2.5
d    3.4
e    1.5
dtype: float64

In [75]:
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1],
               index=['a', 'c', 'e', 'f', 'g'])
s2

a   -2.1
c    3.6
e   -1.5
f    4.0
g    3.1
dtype: float64

In [76]:
pd.DataFrame({'s1':s1,'s2':s2}) # for reference

Unnamed: 0,s1,s2
a,7.3,-2.1
c,-2.5,3.6
d,3.4,
e,1.5,-1.5
f,,4.0
g,,3.1


In [77]:
s1 + s2  # returns a new Series indexed by the union of both indexes; labels missing from either side give NaN

a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64

In [78]:
s1.add(s2)

a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64

In [79]:
pd.DataFrame({'s1':s1,'s2':s2})

Unnamed: 0,s1,s2
a,7.3,-2.1
c,-2.5,3.6
d,3.4,
e,1.5,-1.5
f,,4.0
g,,3.1


In [80]:
s1.sub(s2, fill_value = 0)

a    9.4
c   -6.1
d    3.4
e    3.0
f   -4.0
g   -3.1
dtype: float64

In [81]:
s1.rsub(s2, fill_value = 0) # .rsub is the reversed subtraction: it computes s2 - s1

a   -9.4
c    6.1
d   -3.4
e   -3.0
f    4.0
g    3.1
dtype: float64
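
To confirm the direction of the reversed method, the sketch below compares `s1.rsub(s2)` with `s2.sub(s1)`. The names `left`, `right` and `opposite` are illustrative only.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

left = s1.rsub(s2, fill_value=0)      # computes s2 - s1
right = s2.sub(s1, fill_value=0)      # the same subtraction written the other way
opposite = s1.sub(s2, fill_value=0)   # computes s1 - s2, the negation of the two above

print(pd.DataFrame({'left': left, 'right': right, 'opposite': opposite}))
```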

In [82]:
s1 * s2

a   -15.33
c    -9.00
d      NaN
e    -2.25
f      NaN
g      NaN
dtype: float64

In [83]:
s1.multiply(s2, fill_value = 1)

a   -15.33
c    -9.00
d     3.40
e    -2.25
f     4.00
g     3.10
dtype: float64

For DataFrames with different rows and columns, both the row labels and the column labels are aligned.

In [84]:
df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'),
                   index=['Ohio', 'Texas', 'Colorado'])
df1

Unnamed: 0,b,c,d
Ohio,0.0,1.0,2.0
Texas,3.0,4.0,5.0
Colorado,6.0,7.0,8.0


In [85]:
df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                   index=['Utah', 'Ohio', 'Texas', 'Oregon'])
df2

Unnamed: 0,b,d,e
Utah,0.0,1.0,2.0
Ohio,3.0,4.0,5.0
Texas,6.0,7.0,8.0
Oregon,9.0,10.0,11.0


In [86]:
df1 + df2 
# c is in df1, but not df2
# e is in df2, but not df1
# the result has the union of the rows and columns, with NaN wherever an element is missing from either DataFrame

Unnamed: 0,b,c,d,e
Colorado,,,,
Ohio,3.0,,6.0,
Oregon,,,,
Texas,9.0,,12.0,
Utah,,,,


In [87]:
# to avoid NaN where only one DataFrame has a value, use df.add() and specify fill_value
# the missing operand is replaced by fill_value before the addition is performed
df1.add(df2, fill_value = 0)
# you still get NaN where the value is missing from both DataFrames

Unnamed: 0,b,c,d,e
Colorado,6.0,7.0,8.0,
Ohio,3.0,1.0,6.0,5.0
Oregon,9.0,,10.0,11.0
Texas,9.0,4.0,12.0,8.0
Utah,0.0,,1.0,2.0
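
The remaining NaN cells are exactly the positions that neither DataFrame covers. A minimal sketch that locates them and, if treating them as zero is acceptable for the analysis (an assumption, not something the lecture states), fills them afterwards. The names `total`, `still_missing` and `total_no_nan` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'),
                   index=['Ohio', 'Texas', 'Colorado'])
df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                   index=['Utah', 'Ohio', 'Texas', 'Oregon'])

total = df1.add(df2, fill_value=0)

# True where neither frame had a value, e.g. ('Colorado', 'e')
still_missing = total.isna()

# a follow-up fillna(0) treats those cells as zero as well
total_no_nan = total.fillna(0)

print(still_missing)
```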


Arithmetic operations that can be called on DataFrames and Series are:

- `.add()`, `.radd()` and `.sub()`, `.rsub()`
- `.mul()`, `.rmul()` and `.div()`, `.rdiv()` 
- `.floordiv()`, `.rfloordiv()` (floor division `//`)
- `.pow()`, `.rpow()` (exponentiation `**`)
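
Each method mirrors an operator, and the `r` version swaps the operands. A brief sketch on a made-up Series `s` (not from the lecture data):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 4.0], index=['a', 'b', 'c'])

halves = s.div(2)        # same as s / 2
inverse = s.rdiv(2)      # same as 2 / s
whole = s.floordiv(3)    # same as s // 3
squares = s.pow(2)       # same as s ** 2
powers = s.rpow(2)       # same as 2 ** s

print(pd.DataFrame({'div': halves, 'rdiv': inverse, 'floordiv': whole,
                    'pow': squares, 'rpow': powers}))
```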

In [88]:
df = pd.DataFrame({'one':[1.5,6.0,np.nan, 1.5,4,6, np.nan],
                   'two':[np.nan, -4.5, np.nan, -1.5, 0, -4.5, 4]},
                  index=['a', 'b', 'c', 'd','e','f','g'])
df

Unnamed: 0,one,two
a,1.5,
b,6.0,-4.5
c,,
d,1.5,-1.5
e,4.0,0.0
f,6.0,-4.5
g,,4.0


## Filtering Out Missing Values

In [89]:
df

Unnamed: 0,one,two
a,1.5,
b,6.0,-4.5
c,,
d,1.5,-1.5
e,4.0,0.0
f,6.0,-4.5
g,,4.0


In [90]:
df.dropna() # drops every row that contains at least one NaN

Unnamed: 0,one,two
b,6.0,-4.5
d,1.5,-1.5
e,4.0,0.0
f,6.0,-4.5


In [91]:
df.dropna(how = 'all')  # only drops rows that are entirely NaN

Unnamed: 0,one,two
a,1.5,
b,6.0,-4.5
d,1.5,-1.5
e,4.0,0.0
f,6.0,-4.5
g,,4.0
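
`dropna()` has two more options that may be useful here: `subset=` restricts the check to certain columns, and `thresh=` sets a minimum number of non-missing values a row must have to be kept. A short sketch; the names `has_one` and `at_least_one` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1.5, 6.0, np.nan, 1.5, 4, 6, np.nan],
                   'two': [np.nan, -4.5, np.nan, -1.5, 0, -4.5, 4]},
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# drop a row only when column 'one' is missing
has_one = df.dropna(subset=['one'])

# keep rows that have at least 1 non-missing value
at_least_one = df.dropna(thresh=1)

print(has_one)
print(at_least_one)
```

With two columns, `thresh=1` keeps the same rows as `how = 'all'`.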


## Filling in Missing Values

In [92]:
df

Unnamed: 0,one,two
a,1.5,
b,6.0,-4.5
c,,
d,1.5,-1.5
e,4.0,0.0
f,6.0,-4.5
g,,4.0


In [93]:
df.fillna(0) # fill in missing values with a constant

Unnamed: 0,one,two
a,1.5,0.0
b,6.0,-4.5
c,0.0,0.0
d,1.5,-1.5
e,4.0,0.0
f,6.0,-4.5
g,0.0,4.0


In [94]:
df.fillna({'one': 1000, 'two': 0})  # use a dictionary to specify values to use for each column

Unnamed: 0,one,two
a,1.5,0.0
b,6.0,-4.5
c,1000.0,0.0
d,1.5,-1.5
e,4.0,0.0
f,6.0,-4.5
g,1000.0,4.0


In [95]:
df.fillna(method = 'bfill')  # backfills; 'ffill' forward-fills. Recent pandas versions deprecate method= here in favor of df.bfill() / df.ffill()

Unnamed: 0,one,two
a,1.5,-4.5
b,6.0,-4.5
c,1.5,-1.5
d,1.5,-1.5
e,4.0,0.0
f,6.0,-4.5
g,,4.0
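
The same fills are available as the dedicated methods `df.bfill()` and `df.ffill()`. A minimal sketch comparing the two directions on the same data; the names `back` and `fwd` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1.5, 6.0, np.nan, 1.5, 4, 6, np.nan],
                   'two': [np.nan, -4.5, np.nan, -1.5, 0, -4.5, 4]},
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

back = df.bfill()   # each NaN takes the next valid value below it
fwd = df.ffill()    # each NaN takes the last valid value above it

print(back)
print(fwd)
```

Backfilling leaves a trailing NaN (row g of `one`) and forward filling leaves a leading NaN (row a of `two`), since there is no value to copy from in that direction.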


In [96]:
df.mean()

one    3.8
two   -1.3
dtype: float64

In [97]:
df.fillna(df.mean())  # passing df.mean() fills each column's NaN with that column's mean

Unnamed: 0,one,two
a,1.5,-1.3
b,6.0,-4.5
c,3.8,-1.3
d,1.5,-1.5
e,4.0,0.0
f,6.0,-4.5
g,3.8,4.0


All of the `fillna` calls above returned new DataFrame objects. To modify the current DataFrame instead, pass the optional argument `inplace = True`.

In [98]:
df.T

Unnamed: 0,a,b,c,d,e,f,g
one,1.5,6.0,,1.5,4.0,6.0,
two,,-4.5,,-1.5,0.0,-4.5,4.0


In [99]:
# fillna with a dictionary/Series matches on column labels, so to fill with row means
# we transpose, fill, and transpose back; row c stays NaN because its row mean is NaN
df.T.fillna(df.T.mean()).T

Unnamed: 0,one,two
a,1.5,1.5
b,6.0,-4.5
c,,
d,1.5,-1.5
e,4.0,0.0
f,6.0,-4.5
g,4.0,4.0
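
Another way to fill with row means, without transposing, is to apply `fillna` to each row. This is a sketch of one alternative and is not necessarily faster than the transpose approach; the name `row_filled` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1.5, 6.0, np.nan, 1.5, 4, 6, np.nan],
                   'two': [np.nan, -4.5, np.nan, -1.5, 0, -4.5, 4]},
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# axis=1 passes each row to the function as a Series;
# row.mean() skips NaN, so a row with one value is filled with that value
row_filled = df.apply(lambda row: row.fillna(row.mean()), axis=1)

print(row_filled)
```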


## Dealing with Duplicates

In [100]:
df

Unnamed: 0,one,two
a,1.5,
b,6.0,-4.5
c,,
d,1.5,-1.5
e,4.0,0.0
f,6.0,-4.5
g,,4.0


In [101]:
df.duplicated()  # flags each row that is a duplicate of an earlier row

a    False
b    False
c    False
d    False
e    False
f     True
g    False
dtype: bool

In [102]:
nodups = df[~df.duplicated()]  # gets rid of the duplicated rows
nodups

Unnamed: 0,one,two
a,1.5,
b,6.0,-4.5
c,,
d,1.5,-1.5
e,4.0,0.0
g,,4.0
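
pandas also provides `drop_duplicates()`, which performs the same filtering in one call and lets you choose which copy to keep. A minimal sketch; the name `keep_last` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1.5, 6.0, np.nan, 1.5, 4, 6, np.nan],
                   'two': [np.nan, -4.5, np.nan, -1.5, 0, -4.5, 4]},
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# keeps the first occurrence, like df[~df.duplicated()]
nodups = df.drop_duplicates()

# keep='last' keeps the later copy instead, so row f survives and row b is dropped
keep_last = df.drop_duplicates(keep='last')

print(nodups)
print(keep_last)
```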


<h1> Statistics 21 <br/> Have a great weekend! </h1>

<script>
    setBackgroundImage("Window1.jpg", "black");
</script>