![Los Angeles skyline](la_skyline.jpg)

Los Angeles, California 😎. The City of Angels. Tinseltown. The Entertainment Capital of the World! 

Los Angeles is known for its warm weather, palm trees, sprawling coastline, and Hollywood, and for producing some of the most iconic films and songs. However, as with any highly populated city, it isn't always glamorous, and there can be a large volume of crime. That's where you can help!

You have been asked to support the Los Angeles Police Department (LAPD) by analyzing crime data to identify patterns in criminal behavior. They plan to use your insights to allocate resources effectively to tackle various crimes in different areas.

## The Data

They have provided you with a single dataset to use. A summary and preview are provided below.

It is a modified version of the original data, which is publicly available from Los Angeles Open Data.

# crimes.csv

| Column     | Description              |
|------------|--------------------------|
| `'DR_NO'` | Division of Records Number: Official file number made up of a 2-digit year, area ID, and 5 digits. |
| `'Date Rptd'` | Date reported - MM/DD/YYYY. |
| `'DATE OCC'` | Date of occurrence - MM/DD/YYYY. |
| `'TIME OCC'` | In 24-hour military time. |
| `'AREA NAME'` | The 21 Geographic Areas or Patrol Divisions are also given a name designation that references a landmark or the surrounding community that it is responsible for. For example, the 77th Street Division is located at the intersection of South Broadway and 77th Street, serving neighborhoods in South Los Angeles. |
| `'Crm Cd Desc'` | Indicates the crime committed. |
| `'Vict Age'` | Victim's age in years. |
| `'Vict Sex'` | Victim's sex: `F`: Female, `M`: Male, `X`: Unknown. |
| `'Vict Descent'` | Victim's descent:<ul><li>`A` - Other Asian</li><li>`B` - Black</li><li>`C` - Chinese</li><li>`D` - Cambodian</li><li>`F` - Filipino</li><li>`G` - Guamanian</li><li>`H` - Hispanic/Latin/Mexican</li><li>`I` - American Indian/Alaskan Native</li><li>`J` - Japanese</li><li>`K` - Korean</li><li>`L` - Laotian</li><li>`O` - Other</li><li>`P` - Pacific Islander</li><li>`S` - Samoan</li><li>`U` - Hawaiian</li><li>`V` - Vietnamese</li><li>`W` - White</li><li>`X` - Unknown</li><li>`Z` - Asian Indian</li></ul> |
| `'Weapon Desc'` | Description of the weapon used (if applicable). |
| `'Status Desc'` | Crime status. |
| `'LOCATION'` | Street address of the crime. |

In [189]:
# Re-run this cell
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
crimes = pd.read_csv("crimes.csv", parse_dates=["Date Rptd", "DATE OCC"], dtype={"TIME OCC": str})
crimes.head()

Unnamed: 0,DR_NO,Date Rptd,DATE OCC,TIME OCC,AREA NAME,Crm Cd Desc,Vict Age,Vict Sex,Vict Descent,Weapon Desc,Status Desc,LOCATION
0,220314085,2022-07-22,2020-05-12,1110,Southwest,THEFT OF IDENTITY,27,F,B,,Invest Cont,2500 S SYCAMORE AV
1,222013040,2022-08-06,2020-06-04,1620,Olympic,THEFT OF IDENTITY,60,M,H,,Invest Cont,3300 SAN MARINO ST
2,220614831,2022-08-18,2020-08-17,1200,Hollywood,THEFT OF IDENTITY,28,M,H,,Invest Cont,1900 TRANSIENT
3,231207725,2023-02-27,2020-01-27,635,77th Street,THEFT OF IDENTITY,37,M,H,,Invest Cont,6200 4TH AV
4,220213256,2022-07-14,2020-07-14,900,Rampart,THEFT OF IDENTITY,79,M,B,,Invest Cont,1200 W 7TH ST


In [190]:
import pandas as pd

In [191]:
crimes=pd.read_csv('crimes.csv', header=0, sep=",")

print(crimes.shape)
print(crimes)

(185715, 12)
            DR_NO  ...                                  LOCATION
0       220314085  ...   2500 S  SYCAMORE                     AV
1       222013040  ...   3300    SAN MARINO                   ST
2       220614831  ...                         1900    TRANSIENT
3       231207725  ...   6200    4TH                          AV
4       220213256  ...   1200 W  7TH                          ST
...           ...  ...                                       ...
185710  231510379  ...   5300    DENNY                        AV
185711  231604807  ...  12500    BRANFORD                     ST
185712  231606525  ...  12800    FILMORE                      ST
185713  231210064  ...   6100 S  VERMONT                      AV
185714  230906458  ...  14500    HARTLAND                     ST

[185715 rows x 12 columns]


In [192]:
print(crimes.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 185715 entries, 0 to 185714
Data columns (total 12 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   DR_NO         185715 non-null  int64 
 1   Date Rptd     185715 non-null  object
 2   DATE OCC      185715 non-null  object
 3   TIME OCC      185715 non-null  int64 
 4   AREA NAME     185715 non-null  object
 5   Crm Cd Desc   185715 non-null  object
 6   Vict Age      185715 non-null  int64 
 7   Vict Sex      185704 non-null  object
 8   Vict Descent  185705 non-null  object
 9   Weapon Desc   73502 non-null   object
 10  Status Desc   185715 non-null  object
 11  LOCATION      185715 non-null  object
dtypes: int64(3), object(9)
memory usage: 17.0+ MB
None


1. Hour with the highest crime frequency: `peak_crime_hour`
2. Area with the highest frequency of night crimes: `peak_night_crime_location`
3. Crime counts by victim age group: `victim_ages`

## Task 1
# Hour with the highest crime frequency: `peak_crime_hour`

Approach: extract the hour from each occurrence time, then count the crimes per hour.

In [193]:
print(crimes['TIME OCC'])

0         1110
1         1620
2         1200
3          635
4          900
          ... 
185710    1100
185711    1800
185712    1000
185713    1630
185714     900
Name: TIME OCC, Length: 185715, dtype: int64


We can either count the entries in each hourly interval individually, or convert the 'TIME OCC' column into an hour-of-day value and count those.

Taking approach 2: converting each time to its hour, i.e. rounding down to the whole hour
- Start by creating a new DataFrame that holds only the 'TIME OCC' column.
- 'TIME OCC' is stored as an integer in HHMM form (for example, 1110 or 635), so dividing by 100 and discarding the fractional part leaves just the hour (11 or 6).
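As a quick check of the arithmetic, here is a minimal sketch on a few HHMM integers taken from the 'TIME OCC' preview above. `divmod` is used only for illustration, to show the hour and minute parts side by side; the variable names are hypothetical.

```python
# A few HHMM integers, as they appear in the 'TIME OCC' preview
sample_times = [1110, 1620, 1200, 635, 900]

# Floor division by 100 keeps the hour and drops the minutes
hours = [t // 100 for t in sample_times]

# divmod splits each value into an (hour, minute) pair
hour_minute = [divmod(t, 100) for t in sample_times]

print(hours)
print(hour_minute)
```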

In [195]:
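# Copy the column so the edits below do not touch (or warn about) the crimes frame
df = crimes[['TIME OCC']].copy()

# HHMM integer -> hour of day: floor division drops the minutes (e.g. 1110 -> 11)
df['TIME OCC'] = df['TIME OCC'] // 100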

#print(df)

Conveniently, `df['column'].value_counts()` returns a Series that maps each column value to its number of occurrences, sorted from most to least frequent.
Converting it into a dictionary makes it easy to use in the next step.
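A small sketch of what `value_counts()` and `to_dict()` return, using made-up hour values rather than the real data (the names `toy_hours`, `counts`, and `counts_dict` are assumptions for illustration):

```python
import pandas as pd

# Made-up hour values, for illustration only
toy_hours = pd.Series([12, 18, 12, 0, 18, 12])

# Series: index = hour, value = count, most frequent first
counts = toy_hours.value_counts()

# Plain dict with the same hour -> count pairs, in the same order
counts_dict = counts.to_dict()

print(counts_dict)
```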

In [196]:
hr_grp=df['TIME OCC'].value_counts().to_dict()

print(hr_grp)

{12: 13663, 18: 10125, 17: 9964, 20: 9579, 15: 9393, 19: 9262, 16: 9224, 14: 8872, 11: 8787, 0: 8728, 21: 8701, 22: 8531, 13: 8474, 10: 8440, 8: 7523, 23: 7419, 9: 7092, 1: 5836, 6: 5621, 7: 5403, 2: 4726, 3: 3943, 4: 3238, 5: 3171}


The following code block uses these variables:
- l: list of the dictionary's values (the counts)
- i: index of the maximum value in the dictionary, found through l
- hr: the hour with the highest crime frequency, retrieved by using i on the list of the dictionary's keys
- peak_crime_hour: assigned the value of hr

In [197]:
l=list(hr_grp.values())
i=l.index(max(l))
hr=list(hr_grp.keys())[i]
#hr=int(hr)


peak_crime_hour= hr
print(peak_crime_hour)

12
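For comparison, the same lookup can be sketched without the intermediate lists by passing the dictionary's `get` method as the key function to `max`. The dictionary below is a hypothetical subset of the counts printed above, so that the snippet runs on its own.

```python
# Hypothetical subset of the hour -> count dictionary printed earlier
toy_hr_grp = {12: 13663, 18: 10125, 17: 9964, 5: 3171}

# max() iterates over the keys and compares them by their counts
busiest_hour = max(toy_hr_grp, key=toy_hr_grp.get)

print(busiest_hour)
```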


## Task 2
# Area with the highest frequency of night crimes: `peak_night_crime_location`

Since we need the crime frequency per area within a time window, a DataFrame with the columns 'AREA NAME' and 'TIME OCC' is a good fit.

`df2['TIME OCC']=df['TIME OCC']` carries over the hour-of-day values from Task 1, so the filter can work on whole hours.

Lastly, the data is filtered to the night-time window, i.e. hours 22-23 and 0-3 (10:00 pm to 3:59 am).
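Because the window wraps past midnight, the two conditions have to be combined with OR rather than AND. A minimal sketch on made-up rows (the frame `toy` and its values are assumptions, not the real data):

```python
import pandas as pd

# Made-up rows; 'TIME OCC' already holds the hour of day (0-23)
toy = pd.DataFrame({
    "AREA NAME": ["Central", "Newton", "Central", "Olympic", "Newton"],
    "TIME OCC": [22, 3, 4, 12, 0],
})

# Night window wraps past midnight: hour >= 22 OR hour < 4
is_night = (toy["TIME OCC"] >= 22) | (toy["TIME OCC"] < 4)
night = toy[is_night]

print(night)
```

Hour 4 is excluded and hour 22 is included, which matches the 10:00 pm to 3:59 am window.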

In [199]:
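df2=crimes[['AREA NAME', 'TIME OCC']].copy()   # copy, so the assignment below works on an independent frame
df2['TIME OCC']=df['TIME OCC']                 # reuse the hour-of-day values from Task 1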

df3=df2[(df2['TIME OCC']>=22)|(df2['TIME OCC']<4)]
print(df3)

          AREA NAME  TIME OCC
8       77th Street         0
10       Devonshire         1
30           Newton         0
33           Newton        23
36         Foothill         0
...             ...       ...
185687       Newton        22
185695  77th Street        23
185700       Newton        22
185701     Van Nuys        22
185704   Devonshire         2

[39183 rows x 2 columns]


As in the previous task, `.value_counts().to_dict()` gives the number of crimes in each area, applied here to the night-time rows in `df3`. The area with the most crimes is then obtained in the same way.

In [200]:
area=df3['AREA NAME'].value_counts().to_dict()   # count on the night-filtered frame df3, not the unfiltered df2

#print(area)
l=list(area.values())
mfi=l.index(max(l))

l=list(area.keys())
peak_night_crime_location=l[mfi]
#print(area)
#print(peak_night_crime_location)
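As an alternative sketch, `value_counts()` followed by `idxmax()` returns the most frequent area directly, without building the dictionary and lists. The rows below are made up for illustration, and the names `toy_night`, `area_counts`, and `busiest_area` are hypothetical.

```python
import pandas as pd

# Made-up night-time rows, for illustration only
toy_night = pd.DataFrame({
    "AREA NAME": ["Central", "Newton", "Central", "Hollywood", "Central", "Newton"],
})

area_counts = toy_night["AREA NAME"].value_counts()

# idxmax() returns the index label (here, the area name) with the largest count
busiest_area = area_counts.idxmax()

print(busiest_area)
```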

## Task 3
# Crime counts by victim age group: `victim_ages`

Variables used in the following code block:
- A: DataFrame holding the 'Vict Age' column
- l: list of the lower limits of the age groups
- o, k: number of rows in each filtered DataFrame, obtained with `.shape[0]` (o for the open-ended last group, k for every other group)
- s: string label for the age range

To count the ages that satisfy a condition, the DataFrame is filtered with a boolean mask.


In [201]:
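A=crimes[['Vict Age']]

dic={}
l=[0,18,26,35,45,55,65]   # lower limit of each age group
for i in range(len(l)):
    if l[i]==65:
        # Last group is open-ended: everyone aged 65 or older
        o=A[(l[i]<=A['Vict Age'])].shape[0]
        
        s=str(l[i])+"+"
        
        dic[s]=o
        break
    
    # Closed group: from this lower limit up to one below the next lower limit
    k=A[(l[i] <= A['Vict Age']) & (A['Vict Age'] <= (l[i+1]-1))].shape[0]
    
    s=str(l[i])+'-'+str(l[i+1]-1)
    dic[s]=k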

#print(dic)

victim_ages=pd.Series(dic)
print(victim_ages)

0-17      4528
18-25    28291
26-34    47470
35-44    42157
45-54    28353
55-64    20169
65+      14747
dtype: int64
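The same grouping can also be sketched with `pd.cut`, which assigns each age to a labelled bin. This is an alternative under stated assumptions, not the method used above: the ages are made up, and `right=False` makes each bin include its lower edge and exclude its upper edge, so that the bins line up with the 0-17, 18-25, ... ranges.

```python
import pandas as pd

# Made-up ages, two per group, for illustration only
toy_ages = pd.Series([0, 17, 18, 25, 26, 34, 35, 44, 45, 54, 55, 64, 65, 90])

bins = [0, 18, 26, 35, 45, 55, 65, float("inf")]
labels = ["0-17", "18-25", "26-34", "35-44", "45-54", "55-64", "65+"]

# right=False -> intervals are [0, 18), [18, 26), ..., [65, inf)
groups = pd.cut(toy_ages, bins=bins, labels=labels, right=False)

# sort_index() puts the counts back in bin order after counting
toy_victim_ages = groups.value_counts().sort_index()

print(toy_victim_ages)
```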
