# Students:
## Fabio Fonseca, registration
## Julio Cesar, registration
## Tiago Fernandes, 20171021201

## 1. Introduction

In the past few missions, you've learned how to use Pandas to analyze data quickly and efficiently. You applied this knowledge by creating guided projects to solidify your knowledge. You'll go further in this project, and you'll build an end-to-end data analysis project on your own, using Pandas and Python.



## 2. The data

In this project, you'll be working with crime data from [Montgomery County, MD](https://en.wikipedia.org/wiki/Montgomery_County,_Maryland). Each row in the data is a crime reported by a law enforcement officer in <span style="background-color: #F9EBEA; color:##C0392B">2013</span> and entered into a database.

You'll want to download the data from the repository. After downloading the data, you'll want to create a new Jupyter notebook in the same folder, and ensure that any code or analysis you do on the data occurs in that notebook.

You can load the data in and display the first <span style="background-color: #F9EBEA; color:##C0392B">5</span> rows to get a better idea of the structure:

>```python
import pandas as pd
crimes = pd.read_csv("MontgomeryCountyCrime2013.csv")
crimes.head()
```

You'll also want to display all of the column names with:

>```python
crimes.columns
```
>```python
Index(['Incident ID', 'CR Number', 'Dispatch Date / Time', 'Class',
       'Class Description', 'Police District Name', 'Block Address', 'City',
       'State', 'Zip Code', 'Agency', 'Place', 'Sector', 'Beat', 'PRA',
       'Start Date / Time', 'End Date / Time', 'Latitude', 'Longitude',
       'Police District Number', 'Location', 'Address Number'],
      dtype='object')
```

After displaying some of the data, make sure you look through and understand each column. It can be helpful to display the first few values in each column in order to understand it better. It can also be useful to perform a Google search to help give you context for columns. For example, looking up <span style="background-color: #F9EBEA; color:##C0392B">Police District Number Montgomery County</span> bring you to this [page](https://www.montgomerycountymd.gov/pol/districts/map.html), which helps you understand them. Make sure to write up a Markdown cell explaining anything relevant that you learned.

You'll also want to explore missing values in each column. Why do you think certain columns have missing values? Make sure to write up your thoughts on missing values, and how they'll impact your analysis.

Also make sure to look at the format of each column. For example, <span style="background-color: #F9EBEA; color:##C0392B">Zip Code</span> is a float column, but if you know about Zip codes in the US, you know that they're always integers. Keeping that this column is of the "wrong" type in mind will help you as you analyze the data.

In [9]:
#Import packages and load csv data file
import pandas as pd
import numpy as np
import datetime

crimes = pd.read_csv("MontgomeryCountyCrime2013.csv")

#Print all data from csv file
crimes

Unnamed: 0,Incident ID,CR Number,Dispatch Date / Time,Class,Class Description,Police District Name,Block Address,City,State,Zip Code,...,Sector,Beat,PRA,Start Date / Time,End Date / Time,Latitude,Longitude,Police District Number,Location,Address Number
0,200939101,13047006,10/02/2013 07:52:41 PM,511,BURG FORCE-RES/NIGHT,OTHER,25700 MT RADNOR DR,DAMASCUS,MD,20872.0,...,,,,10/02/2013 07:52:00 PM,,,,OTHER,,25700.0
1,200952042,13062965,12/31/2013 09:46:58 PM,1834,CDS-POSS MARIJUANA/HASHISH,GERMANTOWN,GUNNERS BRANCH RD,GERMANTOWN,MD,20874.0,...,M,5M1,470.0,12/31/2013 09:46:00 PM,,,,5D,,
2,200926636,13031483,07/06/2013 09:06:24 AM,1412,VANDALISM-MOTOR VEHICLE,MONTGOMERY VILLAGE,OLDE TOWNE AVE,GAITHERSBURG,MD,20877.0,...,P,6P3,431.0,07/06/2013 09:06:00 AM,,,,6D,,
3,200929538,13035288,07/28/2013 09:13:15 PM,2752,FUGITIVE FROM JUSTICE(OUT OF STATE),BETHESDA,BEACH DR,CHEVY CHASE,MD,20815.0,...,D,2D1,11.0,07/28/2013 09:13:00 PM,,,,2D,,
4,200930689,13036876,08/06/2013 05:16:17 PM,2812,DRIVING UNDER THE INFLUENCE,BETHESDA,BEACH DR,SILVER SPRING,MD,20815.0,...,D,2D3,178.0,08/06/2013 05:16:00 PM,,,,2D,,
5,200931009,13037095,08/07/2013 11:31:19 PM,1864,CDS IMPLMNT-MARIJUANA/HASHISH,MONTGOMERY VILLAGE,N270 CUTOVR X8 TO X9 HWY,GAITHERSBURG,MD,,...,P,6P1,444.0,08/07/2013 11:31:00 PM,,,,6D,,
6,200931987,13037600,08/10/2013 07:52:08 PM,1833,CDS-POSS COCAINE& DERIVATIVES,MONTGOMERY VILLAGE,SAM EIG HWY,ROCKVILLE,MD,20877.0,...,P,6P2,660.0,08/10/2013 07:52:00 PM,,,,6D,,
7,200936488,13043769,09/15/2013 06:56:49 AM,2791,ALL OTHER NON-TRAFFIC CRIM OFFENSES,MONTGOMERY VILLAGE,WOODFIELD RD,WASHINGTON GROVE,MD,20877.0,...,P,6P3,419.0,09/15/2013 06:40:00 AM,,,,6D,,
8,200938488,13046321,09/29/2013 12:44:15 AM,2812,DRIVING UNDER THE INFLUENCE,ROCKVILLE,WOOTTON PKW,ROCKVILLE,MD,20852.0,...,A,1A1,263.0,09/29/2013 12:44:00 AM,,,,1D,,
9,200939746,13047878,10/07/2013 11:39:48 PM,2812,DRIVING UNDER THE INFLUENCE,ROCKVILLE,WOOTTON PKW,ROCKVILLE,MD,20850.0,...,A,1A1,260.0,10/07/2013 11:39:00 PM,10/08/2013 12:30:00 AM,,,1D,,


### Observations from crimes database:

* There are more than twenty thousand occurrences (23369), where each occurrence contains 22 attributes;

### Individual analysis from each Crimes table column:

These descriptions are based on direct understanding of each column name. Below, a brief description of some specific columns.

- **Incident ID**: Maybe a occurrences unique identifier. Through a previous check, was possible to verify that all are unique values.
- **CR Number**: *Unknown*.
- **Dispatch Date/Time**: Maybe something associated to release moment.
- **Class**: A kind of code for occurrence class.
- **Class Description**: Description of occurrence class (item above).
- **Police District Name**: Responsible policial district name of occurrence.
- **Agency**: *Unknown*.
- **Place**: A description of occurrence place, like "in the house", "in the car", etc.
- **Sector**: A letter that represents the sector, in a district view.
- **Beat**: Police patrol identifier
- **PRA**: *Unknown*.
- **Start Date/Time**: Occurrence start register
- **End Date/Time**: Occurrence end register
- **Police District Number**: Police district identifier number

Among other more direct attributes: **Block Address, City, State, Zip Code, Latitude, Longitude, Location and Address Number**.



## 3. Analyzing the times of crimes

The  <span style="background-color: #F9EBEA; color:##C0392B">Dispatch Date / Time</span> column looks very interesting, because it allows us to figure out when crimes are most likely to occur. You can use this column to answer questions like:

- What day of the week are the most crimes committed on? (ie Monday, Tuesday, etc)
- During what time of day are the most crimes committed?
- During what month are the most crimes committed?

You can answer these questions by first parsing the <span style="background-color: #F9EBEA; color:##C0392B">Dispatch Date / Time</span> column using the [pandas.to_datetime](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html) function, like this:

>```python
d_time = pandas.to_datetime(crimes["Dispatch Date / Time"])
```

After doing the type conversion, you'll need to extract the components of the datetime you're interested in. You can see documentation for this [here](http://pandas.pydata.org/pandas-docs/stable/api.html#datetimelike-properties). After the extraction, you can use the <span style="background-color: #F9EBEA; color:##C0392B">pandas.Series.value_counts</span> method to count up the items you want.

There is some nuance around counting up the time of day when crimes are committed. You'll have to decide how you want to define "time of day". This can be as simple as using the hour, or as complex as assigning categories to certain times, like "morning", "afternoon", "evening", and "night".

As you answer these questions, make sure to document your code, and add in explanations after each cell. Your explanations should discuss the answer, along with anything interesting your discovered.

Were you surprised by your findings? Why do you think that crimes follow the patterns that they do? It may be useful to do some research here to see if you can find support for your theories.

After you're done, take a look at the <span style="background-color: #F9EBEA; color:##C0392B">End Date / Time</span> and <span style="background-color: #F9EBEA; color:##C0392B">Start Date / Time</span> columns. Are these different from the <span style="background-color: #F9EBEA; color:##C0392B">Dispatch Date / Time</span> column? Would it be useful to use one or both of those columns to do this analysis instead?


In [10]:
# Separeted block, too slow
dispatch_date_crimes = crimes["Dispatch Date / Time"]
dispatch_date_time = pd.to_datetime(dispatch_date_crimes)

In [25]:
#What day of the week are the most crimes committed on? (ie Monday, Tuesday, etc)
#Solution: Just take the dayofweek attribute and make a count from values.

#Code:
print(" - What day of the week are the most crimes committed on? (ie Monday, Tuesday, etc)","\n")
print(dispatch_date_time.dt.dayofweek.value_counts())
print("obs.: 0 = monday, 6 = sunday")
print("===================================================================================")


#During what time of day are the most crimes committed?
#Solution: First, was made a dictionary with the limits of each part of day. After, for each part of day, 
#          were recorded all occurrences that were included into especific limits.
#          Morning:   06:00:00 ~ 11:59:59
#          Afternoon: 12:00:00 ~ 17:59:59
#          Evening:   18:00:00 ~ 23:59:59
#          Night:     00:00:00 ~ 05:59:59

#Code:
print(" - During what time of day are the most crimes committed?","\n")
categorical_part_of_day = {'morning' : {'init':datetime.time(6,0,0), 'finish':datetime.time(11,59,59)},   # morning: 05h00 AM ~ 11h59 AM
                           'afternoon' : {'init':datetime.time(12,0,0),'finish':datetime.time(17,59,59)}, # afternoon 12h00 PM ~ 16h59 PM
                           'evening' : {'init':datetime.time(18,0,0),'finish':datetime.time(23,59,59)},   # afternoon 17h00 PM ~ 20h59 PM
                           'night' : {'init':datetime.time(0,0,0),'finish':datetime.time(5,59,59)}        # afternoon 21h00 PM ~ 04h59 PM
                          }      
crimes_per_part_of_day = {}
for part_of_day, limits in categorical_part_of_day.items() :
    crimes_per_part_of_day[part_of_day] = dispatch_date_time.dt.time[(dispatch_date_time.dt.time > limits['init']) & (dispatch_date_time.dt.time < limits['finish'])].size    
print(crimes_per_part_of_day)
print("===================================================================================")

#During what month are the most crimes committed?
#Solution: Same as dayofweek, we use the month attribute and then count all values.

#Code:
print(" - During what month are the most crimes committed?","\n")
print(dispatch_date_time.dt.month.value_counts())
print("===================================================================================")

 - What day of the week are the most crimes committed on? (ie Monday, Tuesday, etc) 

1    3836
0    3734
2    3611
4    3594
3    3404
5    2807
6    2383
Name: Dispatch Date / Time, dtype: int64
obs.: 0 = monday, 6 = sunday
 - During what time of day are the most crimes committed? 

{'evening': 6286, 'afternoon': 6842, 'morning': 6971, 'night': 3265}
 - During what month are the most crimes committed? 

10    4075
8     4002
11    3941
9     3927
12    3904
7     3520
Name: Dispatch Date / Time, dtype: int64


In [5]:
# ============================================ Solucao de Julio ===============================================


#Analisando o campo "Dispatch Date / Time"
##########################################################
d_time = pd.to_datetime(crimes["Dispatch Date / Time"])#Converte a coluna "Dispatch Date / Time" para datetime

######################################################################################################
dias_da_ocorrencia = d_time.dt.dayofweek#Obtem o dia da semana do datetime segunda=0, domingo=6
dias_da_semana = [0,1,2,3,4,5,6]#segunda=0, domingo=6
ocorrencias_por_dia = {}

#Soma as ocorrencias que ocorreram nos dias da semana 
for dia in dias_da_semana:
    ocorrencias_por_dia[dia] = sum(dias_da_ocorrencia == dia)

print("Ocorrencias por dia da semana: ", ocorrencias_por_dia)#segunda=0, domingo=6

#####################################################################################################
meses_da_ocorrencia = d_time.dt.month#Obtem o mes do ano do datetime janeiro=1, dezembro=12
#print(meses_da_ocorrencia)
meses_do_ano = [1,2,3,4,5,6,7,8,9,10,11,12]#janeiro=1, dezembro=12
ocorrencias_por_mes = {}

#Soma as ocorrencias que ocorreram nos meses do ano
for mes in meses_do_ano:
    ocorrencias_por_mes[mes] = sum(meses_da_ocorrencia == mes)

print("Ocorrencias por mês do ano: ",ocorrencias_por_mes)#janeiro=1, dezembro=12

#####################################################################################################
horarios_ocorrencias = d_time.dt.time#Obtem o horario da ocorrencia 
#cria um dicionario com os campos 'madrugada', 'manha', 'tarde' e 'noite', que representam respectivamente horarios entre
#00:00:00 e 06:00:00, 06:00:00 e 12:00:00, 12:00:00 e 18:00:00, 18:00:00 e 23:59:59
ocorrencia_periodo_dia = {'madrugada':0,'manha':0,'tarde':0,'noite':0}

#Soma para cada periodo do dia
for hora in horarios_ocorrencias:
    if hora < datetime.time(6,0,0):
        ocorrencia_periodo_dia['madrugada'] += 1
    if hora > datetime.time(6,0,0) and hora < datetime.time(12,0,0):
        ocorrencia_periodo_dia['manha'] += 1
    if hora > datetime.time(12,0,0) and hora < datetime.time(18,0,0):
        ocorrencia_periodo_dia['tarde'] += 1
    if hora > datetime.time(18,0,0) and hora < datetime.time(23,59,59):
        ocorrencia_periodo_dia['noite'] += 1
        
print("Ocorrencia por periodo do dia: ",ocorrencia_periodo_dia)

Ocorrencias por dia da semana:  {0: 3734, 1: 3836, 2: 3611, 3: 3404, 4: 3594, 5: 2807, 6: 2383}
Ocorrencias por mês do ano:  {1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 3520, 8: 4002, 9: 3927, 10: 4075, 11: 3941, 12: 3904}
Ocorrencia por periodo do dia:  {'manha': 6971, 'tarde': 6842, 'madrugada': 3269, 'noite': 6286}


## 4. Analyzing locations of crimes

There are a few columns that encode information about the location of crimes:

- <span style="background-color: #F9EBEA; color:##C0392B">Police District Name</span>
- <span style="background-color: #F9EBEA; color:##C0392B">Block Address</span>
- <span style="background-color: #F9EBEA; color:##C0392B">Zip Code</span>
- <span style="background-color: #F9EBEA; color:##C0392B">Sector</span>
- <span style="background-color: #F9EBEA; color:##C0392B">Beat</span>
- <span style="background-color: #F9EBEA; color:##C0392B">Latitude</span>
- <span style="background-color: #F9EBEA; color:##C0392B">Longitude</span>
- <span style="background-color: #F9EBEA; color:##C0392B">Police District Number</span>
- <span style="background-color: #F9EBEA; color:##C0392B">Location</span>
- <span style="background-color: #F9EBEA; color:##C0392B">Address Number</span>

These columns have varying numbers of missing values, and varying granularity. Some of the columns represent areas with large granularity (like police districts), whereas some represent areas with small granularity, like <span style="background-color: #F9EBEA; color:##C0392B">Latitude</span>, and <span style="background-color: #F9EBEA; color:##C0392B">Longitude</span>.


In order to decide which column to use to analyze the locations of crimes, you need to utilize the following criteria:

- **Granularity**. Areas that are too small aren't great, because only a few crimes were committed inside them, which makes it hard to analyze and compare. For example, if I tell you that Silver Spring (a city in Montgomery County) is the place with the most crimes, you'll know to avoid that area. However, if I tell you that a 100 foot section of Silver Spring has the most crimes, it won't be as helpful (it's unlikely that you'll ever be in that 100 foot section).
- **Comprehensibility**. You looked up the Police District map of Montgomery County before, so it's simple to tell what area corresponds to district <span style="background-color: #F9EBEA; color:##C0392B">6</span>. However, what area does Beat <span style="background-color: #F9EBEA; color:##C0392B">5M1</span> correspond to? You may be able to look this up, but it's harder to comprehend.
- **Missing values**. If a column has a lot of missing values, that means that the conclusions you draw are less valid, because you don't know if the missing data is systematic (ie all data for a given district is missing) or random (equal amounts of data are missing from each district). You should try to select a column that has minimal missing values.

Based on the above criteria, pick a column that you want to use to analyze location. After picking a column, see if you can answer these questions:

- In what area did the most crimes occur? What physical locations (like cities) does this area correspond to?
- Which area has the highest number of crimes per capita? You may be able to find population data per area online. For example, [this](https://www.montgomerycountymd.gov/POL/Resources/Files/crime/MCP2015AnnualCrimeReportFINAL.pdf) annual report has per-district populations towards the bottom.

Make sure to write up the answers to these questions, along with your code, and explain why you reached the conclusions you did.


In [34]:
# Load columns and check null values
sectors_crime = crimes["Sector"]
districts_crime = crimes["Police District Number"]
city_crime = crimes["City"]
sectors_is_null = sectors_crime.isnull()
districts_is_null = districts_crime.isnull()

#Remove null values
sectors_crime = sectors_crime[~sectors_is_null]
districts_crime = districts_crime[~districts_is_null]


#In what area did the most crimes occur?
#Solution: We are using area as Police District Number. So, just take the Police District
#          Number column and make a count from values.

#Code:
print(" - In what area did the most crimes occur? (Police District Number)","\n")
crimes_per_district = districts_crime.value_counts()
print(crimes_per_district)
print("===================================================================================")


#What physical locations (like cities) does this area correspond to?
#Solution: For each district name, was considered all cities, from unique way, that had 
#          association into the occurrences

#Code:
print(" - What physical locations (like cities) does this area correspond to?","\n")
cities_per_district = {}
for district in districts_crime.unique():    
    cities_per_district[district] = city_crime[districts_crime == district].unique()  
    print(district,':',cities_per_district[district])
print("===================================================================================")


#Which area has the highest number of crimes per capita? You may be able to find population data per area online. For example, this annual report has per-district populations towards the bottom.
#Solution: First of all, we collected the population data per district from external resource.
#          With this, for each district population, we divided the amount of crimes by 
#          district population.

#Code:
print(" - Which area has the highest number of crimes per capita? You may be able to find population data per area online. For example, this annual report has per-district populations towards the bottom.","\n")
population_per_district = {"1D":148000, "2D":182000, "3D":152000, "4D":207000, "5D":130000, "6D":146000}
crimes_per_capita = {}
for district, population in population_per_district.items():
    crimes_per_capita[district] = crimes_per_district[district]/population
    print(district,':',crimes_per_capita[district])
print("obs.: Just these districts has a population values. No records for OTHER and TPPD districts.")
print("===================================================================================")



 - In what area did the most crimes occur? (Police District Number) 

3D       5533
4D       4375
6D       3812
1D       3480
2D       3383
5D       2755
TPPD       23
OTHER       8
Name: Police District Number, dtype: int64
 - What physical locations (like cities) does this area correspond to? 

OTHER : ['DAMASCUS' 'CHEVY CHASE' 'KISSIMMEE' 'LAUREL' 'HYATTSVILLE' 'GAITHERSBURG'
 'OLNEY' 'GERMANTOWN']
5D : ['GERMANTOWN' 'DAMASCUS' 'CLARKSBURG' 'BOYDS' 'BROOKEVILLE' 'GAITHERSBURG'
 'POOLESVILLE' 'MOUNT AIRY' 'DICKERSON' 'MONTGOMERY VILLAGE' 'BARNESVILLE'
 'DERWOOD']
6D : ['GAITHERSBURG' 'ROCKVILLE' 'WASHINGTON GROVE' 'MONTGOMERY VILLAGE'
 'DERWOOD' 'GERMANTOWN' 'OLNEY']
2D : ['CHEVY CHASE' 'SILVER SPRING' 'BETHESDA' 'ROCKVILLE' 'KENSINGTON'
 'POTOMAC' 'CABIN JOHN' 'GLEN ECHO']
1D : ['ROCKVILLE' 'DERWOOD' 'BOYDS' 'GAITHERSBURG' 'POTOMAC' 'SILVER SPRING'
 'POOLESVILLE' 'DICKERSON' 'GERMANTOWN' 'BEALLSVILLE']
3D : ['SILVER SPRING' 'BURTONSVILLE' 'TAKOMA PARK' 'CHEVY CHASE' 'SPENCERVILLE'
 

## 5. Analyzing types of crime

The <span style="background-color: #F9EBEA; color:##C0392B">Class Description</span> column tells us something about the type of crime that was committed. We can use this column to discover which crimes are committed most often.

Here are some initial questions to answer:

- Which crimes are the most common? Least common?
- Can you split the types of crimes manually into "Violent" (caused harm to others or involved weapons) and "Nonviolent" (mostly property crimes, like theft? What's the most common violent crime? The most common nonviolent?

To manually split up violent and nonviolent crimes, just assign each crime to a category. For example, <span style="background-color: #F9EBEA; color:##C0392B">ASSAULT & BATTERY - CITIZEN</span> is violent, and <span style="background-color: #F9EBEA; color:##C0392B">VANDALISM-MOTOR VEHICLE</span> is nonviolent. It may be useful to create a column called <span style="background-color: #F9EBEA; color:##C0392B">Violent</span>, and then use the [pandas.DataFrame.apply](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) method to assign **True** or **False** to each row in the column. For example, if **ASSAULT** is in Class Description, it's violent, but if **LARCENY** is in Class Description, it's nonviolent.

Make sure to write up the answers to these questions, along with your code, and explain why you reached the conclusions you did.

In [7]:
#Which crimes are the most common? Least common?
#Can you split the types of crimes manually into "Violent" (caused harm to others or involved weapons) and "Nonviolent" 
#(mostly property crimes, like theft? What's the most common violent crime? The most common nonviolent?
crimes1 =crimes
crimes2 =crimes
#crimes

id_crime = crimes["Class"]
description_crime = crimes["Class Description"]
city_crime = crimes["City"]
column_names = crimes.columns
info_crimes = column_names.tolist()

un_crimes = crimes["Class"].unique()
un_des_crimes = crimes["Class Description"].unique()
mul_cols = crimes[["Class", "Class Description"]]
#print(un_des_crimes)


#id_crime = id_crime[~id_crime_is_null]
#description_crime = description_crime[~description_crime_is_null]
#Which crimes are the most common? Least common?
print("Which crimes are the most common? Least common?\n")
rate_crimes_per_district = id_crime.value_counts()
desc_crimes_per_district = description_crime[rate_crimes_per_district.index.tolist()]
desc_list = []
for i in range (0, len(desc_crimes_per_district)):
    desc_list.append(mul_cols.loc[mul_cols["Class"] == rate_crimes_per_district.index[i], "Class Description"].values[0])
dat1 = pd.DataFrame({'ID Class':rate_crimes_per_district.index.tolist()})
dat2 = pd.DataFrame({'Class Description':desc_list})
dat3 = pd.DataFrame({'Amount':rate_crimes_per_district.tolist()})
dat4 = dat1.join(dat2)
dat5 = dat4.join(dat3)
print("Most common")
print(dat5.head());
print("\nLeast common")
print(dat5.tail());

#print(dat5);

#Can you split the types of crimes manually into "Violent" (caused harm to others or involved weapons) and "Nonviolent" 
#(mostly property crimes, like theft? What's the most common violent crime? The most common nonviolent?
print("\nCan you split the types of crimes manually into 'Violent' (caused harm to others or involved weapons) and 'Nonviolent'" 
      +"(mostly property crimes, like theft? What's the most common violent crime? The most common nonviolent?")
listfalse = [False]*crimes.size

dat10 = pd.DataFrame({'Violent':listfalse})
crimes1 = crimes1.join(dat10);
#print(crimes1)


if 1:
    listviolent = []
    #for i in range (0, dat5.size):
    new_listviolent = []
    listviolent.append( [(mul_cols.loc[mul_cols["Class Description"] == s, "Class"].values[0])  for s in crimes["Class Description"] if "ASSLT" in s])
    for item in listviolent[0]:
        if item not in new_listviolent:
            new_listviolent.append(item)
    listviolent.append( [(mul_cols.loc[mul_cols["Class Description"] == s, "Class"].values[0])  for s in crimes["Class Description"] if "ASSAULT" in s])
    for item in listviolent[1]:
        if item not in new_listviolent:
            new_listviolent.append(item)
    listviolent.append( [(mul_cols.loc[mul_cols["Class Description"] == s, "Class"].values[0])  for s in crimes["Class Description"] if "WEAPON" in s])
    for item in listviolent[2]:
        if item not in new_listviolent:
            new_listviolent.append(item)
    listviolent.append( [(mul_cols.loc[mul_cols["Class Description"] == s, "Class"].values[0])  for s in crimes["Class Description"] if "HOMICIDE" in s])
    for item in listviolent[3]:
        if item not in new_listviolent:
            new_listviolent.append(item)
    listviolent.append( [(mul_cols.loc[mul_cols["Class Description"] == s, "Class"].values[0])  for s in crimes["Class Description"] if "ROB" in s])
    for item in listviolent[4]:
        if item not in new_listviolent:
            new_listviolent.append(item)
    listviolent.append( [(mul_cols.loc[mul_cols["Class Description"] == s, "Class"].values[0])  for s in crimes["Class Description"] if "WPN" in s])
    for item in listviolent[5]:
        if item not in new_listviolent:
            new_listviolent.append(item)
    listviolent.append( [(mul_cols.loc[mul_cols["Class Description"] == s, "Class"].values[0])  for s in crimes["Class Description"] if "RAPE" in s])
    for item in listviolent[6]:
        if item not in new_listviolent:
            new_listviolent.append(item)
    listviolent.append( [(mul_cols.loc[mul_cols["Class Description"] == s, "Class"].values[0])  for s in crimes["Class Description"] if "KIDNAP" in s])
    for item in listviolent[7]:
        if item not in new_listviolent:
            new_listviolent.append(item)
    for item in new_listviolent:
        if item not in new_listviolent:
            new_listviolent.append(item)
            
    sizeofcrimes = len(crimes1) - 1 
    print(sizeofcrimes)
    #crimes1[["Class"]].loc[i]
    for j in range (0, len(new_listviolent) - 1):
        for i in range (0, len(crimes1) - 1):
                #print(i)
                #if crimes1['Class'].loc[[i]].item() == new_listviolent[0]:
                if crimes1.iloc[i, crimes1.columns.get_loc('Class')] == new_listviolent[j]:
                    crimes1.iloc[i, crimes1.columns.get_loc('Violent')] = True
    #for item in new_listviolent:        
    #print(len(new_listviolent))
    #crimes.loc[crimes["Class Description"]. find("ASSAULT")]
    print("ACABOU!!!!")
    print(crimes1[["Class", "Class Description", "Violent"]])
    
    dat_s = pd.DataFrame({'Violent':crimes1['Violent']})
    crimes2 = crimes2.join(dat_s);
    
    #crimes2.to_csv("MontgomeryCountyCrime2013_1.csv", sep='\t', encoding='utf-8')
    crimes2.to_csv("MontgomeryCountyCrime2013_1.csv")
    
    rate_violent = crimes1["Violent"].value_counts()
    
    print(rate_violent)
    

Which crimes are the most common? Least common?

Most common
   ID Class                Class Description  Amount
0      2812      DRIVING UNDER THE INFLUENCE    1710
1      1834       CDS-POSS MARIJUANA/HASHISH    1334
2      2938                  POL INFORMATION    1191
3       614      LARCENY FROM AUTO OVER $200     914
4       617  LARCENY FROM BUILDING OVER $200     895

Least common
     ID Class                 Class Description  Amount
280       435    AGG ASSLT OTHER WPN ON ELDERLY       1
281       115                    HOMICIDE-OTHER       1
282       333     ROB OTHER WEAPON GAS/SVC  STA       1
283       445        AGG ASSLT BEAT/INJ ELDERLY       1
284      1818  CDS-MANU DRUG OVERDOSE NOT FATAL       1

Can you split the types of crimes manually into 'Violent' (caused harm to others or involved weapons) and 'Nonviolent'(mostly property crimes, like theft? What's the most common violent crime? The most common nonviolent?
23368
ACABOU!!!!
       Class                    

## 6. Combine Analysis

After doing some analysis on types of crimes, you can combine our analysis with location and time data to answer more complex questions, like:

- Where are the most violent crimes committed? How about nonviolent?
- When are the most violent crimes committed? How about nonviolent?

Make sure to write up the answers to these questions, along with your code, and explain why you reached the conclusions you did.

In [8]:
#Where are the most violent crimes committed? How about nonviolent?
#When are the most violent crimes committed? How about nonviolent?
crimes = pd.read_csv("MontgomeryCountyCrime2013_1.csv")
viol_crime = crimes["Violent"]
sectors_crime = crimes["Sector"]
districts_crime = crimes["Police District Number"]
city_crime = crimes["City"]
column_names = crimes.columns

print("Where are the most violent crimes committed? How about nonviolent?")
print("\nViolent")
crimes_violent = crimes.loc[crimes['Violent'] == True]
#crimes.loc[crimes['Violent'].isin([True])]
crimes_area_violent = crimes_violent["Police District Number"].value_counts()
#print(crimes_area_violent)

print("\nNon violent")

crimes_nonviolent = crimes.loc[crimes['Violent'] == False]
#crimes.loc[crimes['Violent'].isin([True])]
crimes_area_nonviolent = crimes_nonviolent["Police District Number"].value_counts()
#print(crimes_area_nonviolent)

#crimes

print("When are the most violent crimes committed? How about nonviolent?")


print("\nViolent")
night = []
morning = []
afternoon = []
evening = []
night = [s for s in crimes_violent["Start Date / Time"] if (s.endswith("AM") & ((s.endswith("12",10,13)) | s.endswith("01",10,13) |
                                                                         s.endswith("02",10,13) | s.endswith("03",10,13) |
                                                                         s.endswith("04",10,13) | s.endswith("05",10,13)))]
morning = [s for s in crimes_violent["Start Date / Time"] if (s.endswith("AM") & ((s.endswith("06",10,13)) | s.endswith("07",10,13) |
                                                                         s.endswith("08",10,13) | s.endswith("09",10,13) |
                                                                         s.endswith("10",10,13) | s.endswith("11",10,13)))]
afternoon = [s for s in crimes_violent["Start Date / Time"] if (s.endswith("PM") & ((s.endswith("12",10,13)) | s.endswith("01",10,13) |
                                                                         s.endswith("02",10,13) | s.endswith("03",10,13) |
                                                                         s.endswith("04",10,13) | s.endswith("05",10,13)))]
evening = [s for s in crimes_violent["Start Date / Time"] if (s.endswith("PM") & ((s.endswith("06",10,13)) | s.endswith("07",10,13) |
                                                                         s.endswith("08",10,13) | s.endswith("09",10,13) |
                                                                         s.endswith("10",10,13) | s.endswith("11",10,13)))]
print("At night crimes:", len(night))
print("At morning crimes:", len(morning))
print("At afternoon crimes:", len(afternoon))
print("At evening crimes:", len(evening))

print("\nNon violent")
night = []
morning = []
afternoon = []
evening = []
night = [s for s in crimes_nonviolent["Start Date / Time"] if (s.endswith("AM") & ((s.endswith("12",10,13)) | s.endswith("01",10,13) |
                                                                         s.endswith("02",10,13) | s.endswith("03",10,13) |
                                                                         s.endswith("04",10,13) | s.endswith("05",10,13)))]
morning = [s for s in crimes_nonviolent["Start Date / Time"] if (s.endswith("AM") & ((s.endswith("06",10,13)) | s.endswith("07",10,13) |
                                                                         s.endswith("08",10,13) | s.endswith("09",10,13) |
                                                                         s.endswith("10",10,13) | s.endswith("11",10,13)))]
afternoon = [s for s in crimes_nonviolent["Start Date / Time"] if (s.endswith("PM") & ((s.endswith("12",10,13)) | s.endswith("01",10,13) |
                                                                         s.endswith("02",10,13) | s.endswith("03",10,13) |
                                                                         s.endswith("04",10,13) | s.endswith("05",10,13)))]
evening = [s for s in crimes_nonviolent["Start Date / Time"] if (s.endswith("PM") & ((s.endswith("06",10,13)) | s.endswith("07",10,13) |
                                                                         s.endswith("08",10,13) | s.endswith("09",10,13) |
                                                                         s.endswith("10",10,13) | s.endswith("11",10,13)))]

print("At night crimes:", len(night))
print("At morning crimes:", len(morning))
print("At afternoon crimes:", len(afternoon))
print("At evening crimes:", len(evening))

#for item in morning:
#    print(item)
#st = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')


Where are the most violent crimes committed? How about nonviolent?

Violent

Non violent
When are the most violent crimes committed? How about nonviolent?

Violent
At night crimes: 506
At morning crimes: 302
At afternoon crimes: 568
At evening crimes: 767

Non violent
At night crimes: 3998
At morning crimes: 3798
At afternoon crimes: 6565
At evening crimes: 6865


## 7. Posing and answering your own questions

After you've finished exploring the data and answering some directed questions, you should be able to start coming up with some of your own.

You can think of questions based on a few strategies:

- Expanding or tweaking the directed questions from earlier.
- Exploring patterns you found while exploring the data.
- Questions based on research you've done about the previous lessons

Try to think of at least three questions, then answer them the same way you did in previous screens.