# Problem set 3: Text analysis of DOJ press releases

**Total points (without extra credit)**: 52 

- For background:

    - DOJ is the federal law enforcement agency responsible for federal prosecutions; this contrasts with the local prosecutions in the Cook County dataset we analyzed earlier. Here's a short explainer on which crimes get prosecuted federally versus locally: https://www.criminaldefenselawyer.com/resources/criminal-defense/federal-crime/state-vs-federal-crimes.htm#:~:text=Federal%20criminal%20prosecutions%20are%20handled,of%20state%20and%20local%20law. 
    - Here's the Kaggle dataset that contains the data: https://www.kaggle.com/jbencina/department-of-justice-20092018-press-releases 
    - If you're interested, here's the code the dataset creator used to scrape those press releases: https://github.com/jbencina/dojreleases

## 0.0 Import packages

In [None]:
## helpful packages
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
import random
import re
import string

## nltk imports
import nltk
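### uncomment and run these lines if you haven't downloaded the relevant nltk add-ons yet
### nltk.download('averaged_perceptron_tagger')
### nltk.download('stopwords')
### nltk.download('punkt')  # tokenizer models used by word_tokenize ('punkt_tab' on newer nltk versions)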
from nltk import pos_tag
from nltk.tokenize import word_tokenize, wordpunct_tokenize
from nltk.stem.snowball import SnowballStemmer
from nltk.corpus import stopwords

## spacy imports
import spacy
### uncomment and run the line below if you haven't downloaded the en_core_web_sm model yet
### ! python -m spacy download en_core_web_sm
import en_core_web_sm
nlp = en_core_web_sm.load()

## vectorizer
from sklearn.feature_extraction.text import CountVectorizer

## sentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

## lda
from gensim import corpora
import gensim

## repeated printouts and wide-format text
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
pd.set_option('display.max_colwidth', None)

## 0.1 Load and clean text data

In [412]:
## first, unzip the file pset3_inputdata.zip
## then, run this code to load the unzipped json file into a dataframe
## (you may need to change the path depending on where you stored the file)
## and convert some of the attributes from lists to strings
doj = pd.read_json("../pset3/combined.json", lines = True)

## in the json, topics are stored as lists, so join each list into one string separated by "; "
## (an empty list becomes "No topic")
doj['topics_clean'] = ["; ".join(topic) 
                      if len(topic) > 0 else "No topic" 
                      for topic in doj.topics]

## similarly with components
doj['components_clean'] = ["; ".join(comp) 
                           if len(comp) > 0 else "No component" 
                           for comp in doj.components]

## keep only the columns we need (drops the original list-valued columns)
doj = doj[['id', 'title', 'contents', 'date', 'topics_clean', 
           'components_clean']].copy()

doj

Unnamed: 0,id,title,contents,date,topics_clean,components_clean
0,,Convicted Bomb Plotter Sentenced to 30 Years,"PORTLAND, Oregon. – Mohamed Osman Mohamud, 23, who was convicted in 2013 of attempting to use a weapon of mass destruction (explosives) in connection with a plot to detonate a vehicle bomb at an annual Christmas tree lighting ceremony in Portland, was sentenced today to serve 30 years in prison, followed by a lifetime term of supervised release. Mohamud, a naturalized U.S. citizen from Somalia and former resident of Corvallis, Oregon, was arrested on Nov. 26, 2010, after he attempted to detonate what he believed to be an explosives-laden van that was parked near the tree lighting ceremony in Portland. The arrest was the culmination of a long-term undercover operation, during which Mohamud was monitored closely for months as his bomb plot developed. The device was in fact inert, and the public was never in danger from the device. At sentencing, United States District Court Judge Garr M. King, who presided over Mohamed’s 14-day trial, said “the intended crime was horrific,” and that the defendant, even though he was presented with options by undercover FBI employees, “never once expressed a change of heart.” King further noted that the Christmas tree ceremony was attended by up to 10,000 people, and that the defendant “wanted everyone to leave either dead or injured.” King said his sentence was necessary in view of the seriousness of the crime and to serve as deterrence to others who might consider similar acts. “With today’s sentencing, Mohamed Osman Mohamud is being held accountable for his attempted use of what he believed to be a massive bomb to attack innocent civilians attending a public Christmas tree lighting ceremony in Portland,” said John P. Carlin, Assistant Attorney General for National Security. “The evidence clearly indicated that Mohamud was intent on killing as many people as possible with his attack. 
Fortunately, law enforcement was able to identify him as a threat, insert themselves in the place of a terrorist that Mohamud was trying to contact, and thwart Mohamud’s efforts to conduct an attack on our soil. This case highlights how the use of undercover operations against would-be terrorists allows us to engage and disrupt those who wish to commit horrific acts of violence against the innocent public. The many agents, analysts, and prosecutors who have worked on this case deserve great credit for their roles in protecting Portland from the threat posed by this defendant and ensuring that he was brought to justice.” “This trial provided a rare glimpse into the techniques Al Qaeda employs to radicalize home-grown extremists,” said Amanda Marshall, U.S. Attorney for the District of Oregon. “With the sentencing today, the court has held this defendant accountable. I thank the dedicated professionals in the law enforcement and intelligence communities who were responsible for this successful outcome. I look forward to our continued work with Muslim communities in Oregon who are committed to ensuring that all young people are safe from extremists who seek to radicalize others to engage in violence.” According to the trial evidence, in February 2009, Mohamud began communicating via e-mail with Samir Khan, a now-deceased al Qaeda terrorist who published Jihad Recollections, an online magazine that advocated violent jihad, and who also published Inspire, the official magazine of al-Qaeda in the Arabian Peninsula. Between February and August 2009, Mohamed exchanged approximately 150 emails with Khan. Mohamud wrote several articles for Jihad Recollections that were published under assumed names. In August 2009, Mohamud was in email contact with Amro Al-Ali, a Saudi national who was in Yemen at the time and is today in custody in Saudi Arabia for terrorism offenses. 
Al-Ali sent Mohamud detailed e-mails designed to facilitate Mohamud’s travel to Yemen to train for violent jihad. In December 2009, while Al-Ali was in the northwest frontier province of Pakistan, Mohamud and Al-Ali discussed the possibility of Mohamud traveling to Pakistan to join Al-Ali in terrorist activities. Mohamud responded to Al-Ali in an e-mail: “yes, that would be wonderful, just tell me what I need to do.” Al-Ali referred Mohamud to a second associate overseas and provided Mohamud with a name and email address to facilitate the process. In the following months, Mohamud made several unsuccessful attempts to contact Al-Ali’s associate. Ultimately, an FBI undercover operative contacted Mohamud via email under the guise of being an associate of Al-Ali’s. Mohamud and the FBI undercover operative agreed to meet in Portland in July 2010. At the meeting, Mohamud told the FBI undercover operative he had written articles that were published in Jihad Recollections. Mohamud also said that he wanted to become “operational.” Asked what he meant by “operational,” Mohamud said he wanted to put an explosion together, but needed help. According to evidence presented at trial, at a meeting in August 2010, Mohamud told undercover FBI operatives he had been thinking of committing violent jihad since the age of 15. Mohamud then told the undercover FBI operatives that he had identified a potential target for a bomb: the annual Christmas tree lighting ceremony in Portland’s Pioneer Courthouse Square on Nov. 26, 2010. The undercover FBI operatives cautioned Mohamud several times about the seriousness of this plan, noting there would be many people at the event, including children, and emphasized that Mohamud could abandon his attack plans at any time with no shame. Mohamud indicated the deaths would be justified and that he would not mind carrying out a suicide attack on the crowd. 
According to evidence presented at trial, in the ensuing months Mohamud continued to express his interest in carrying out the attack and worked on logistics. On Nov. 4, 2010, Mohamud and the undercover FBI operatives traveled to a remote location in Lincoln County, Oregon, where they detonated a bomb concealed in a backpack as a trial run for the upcoming attack. During the drive back to Corvallis, Mohamud was asked if was capable looking at all the bodies of those who would be killed during the explosion. In response, Mohamud noted, “I want whoever is attending that event to be, to leave either dead or injured.” Mohamud later recorded a video of himself, with the assistance of the undercover FBI operatives, in which he read a statement that offered his rationale for his bomb attack. On Nov. 18, 2010, undercover FBI operatives picked up Mohamud to travel to Portland to finalize the details of the attack. On Nov. 26, 2010, just hours before the planned attack, Mohamud examined the 1,800 pound bomb in the van and remarked that it was “beautiful.” Later that day, Mohamud was arrested after he attempted to remotely detonate the inert vehicle bomb rked near the Christmas tree lighting ceremony This case was investigated by the FBI, with assistance from the Oregon State Police, the Corvallis Police Department, the Lincoln County Sheriff’s Office and the Portland Police Bureau. The prosecution was handled by Assistant U.S. Attorneys Ethan D. Knight and Pamala Holsinger from the U.S. Attorney’s Office for the District of Oregon. Trial Attorney Jolie F. Zimmerman, from the Counterterrorism Section of the Justice Department’s National Security Division, assisted. # # # 14-1077",2014-10-01T00:00:00-04:00,No topic,National Security Division (NSD)
1,12-919,$1 Million in Restitution Payments Announced to Preserve North Carolina Wetlands,"WASHINGTON – North Carolina’s Waccamaw River watershed will benefit from a $1 million restitution order from a federal court, funding environmental projects to acquire and preserve wetlands in an area damaged by illegal releases of wastewater from a corporate hog farm, announced Ignacia S. Moreno, Assistant Attorney General of the Justice Department’s Environment and Natural Resources Division; U.S. Attorney for the Eastern District of North Carolina Thomas G. Walker; Director Greg McLeod from the North Carolina State Bureau of Investigation; and Camilla M. Herlevich, Executive Director of the North Carolina Coastal Land Trust. Freedman Farms Inc. was sentenced in February 2012 to five years of probation and ordered to pay $1.5 million in fines, restitution and community service payments for violating the Clean Water Act when it discharged hog waste into a stream that leads to the Waccamaw River. William B. Freedman, president of Freedman Farms, was sentenced to six months in prison to be followed by six months of home confinement. Freedman Farms also is required to implement a comprehensive environmental compliance program and institute an annual training program. In an order issued on April 19, 2012, the court ordered that the defendants would be responsible for restitution of $1 million in the form of five annual payments starting in January 2013, which the court will direct to the North Carolina Coastal Land Trust (NCCLT). The NCCLT plans to use the money to acquire and conserve land along streams in the Waccamaw watershed. The court also directed a $75,000 community service payment to the Southern Environmental Enforcement Network, an organization dedicated to environmental law enforcement training and information sharing in the region. 
“The resolution of the case against Freedman Farms demonstrates the commitment of the Department of Justice to enforcing the Clean Water Act to ensure the protection of human health and the environment,” said Assistant Attorney General Moreno. “The court-ordered restitution in this case will conserve wetlands for the benefit of the people of North Carolina. By enforcing the nation’s environmental laws, we will continue to ensure that concentrated animal feeding operations (CAFOs) operate without threatening our drinking water, the health of our communities and the environment.” “This office is committed to doing our part to hold accountable those who commit crimes against our environment, which can cause serious health problems to residents and damage the environment that makes North Carolina such a beautiful place to live and visit,” said U.S. Attorney Walker. “This case shows what we can accomplish when our SBI agents work closely with their local, state and federal partners to investigate environmental crimes and hold the polluters accountable,” said Director McLeod. “We’ll continue our efforts to fight illegal pollution that damages our water and puts the public’s health at risk.” “The Waccamaw is unique and wild,” said Director Herlevich of the North Carolina Coastal Land Trust. “Its watershed includes some of the most extensive cypress gum swamps in the state, and its headwaters at Lake Waccamaw contain fish that are found nowhere else on Earth. We appreciate the trust of the court and the U. S. Attorney, and we look forward to using these funds for conservation projects in a river system that is one of our top conservation priorities.” According to evidence presented in court, in December 2007 Freedman Farms discharged hog waste into Browder’s Branch, a tributary to the Waccamaw River that flows through the White Marsh, a large wetlands complex. 
Freedman Farms, located in Columbus County, N.C., is in the business of raising hogs for market, and this particular farm had some 4,800 hogs. The hog waste was supposed to be directed to two lagoons for treatment and disposal. Instead, hog waste was discharged from Freedman Farms directly into Browder’s Branch. The Clean Water Act is a federal law that makes it illegal to knowingly or negligently discharge a pollutant into a water of the United States. The Freedman case was investigated by the U.S. Environmental Protection Agency (EPA) Criminal Investigation Division, the U.S. Army Corps of Engineers and the North Carolina State Bureau of Investigation, with assistance from the EPA Science and Ecosystem Support Division. The case was prosecuted by Assistant U.S. Attorney J. Gaston B. Williams of the Eastern District of North Carolina and Trial Attorney Mary Dee Carraway of the Environmental Crimes Section of the Justice Department’s Environment and Natural Resources Division. The North Carolina Coastal Land Trust is celebrating its 20th anniversary of saving special lands in eastern North Carolina. The organization has protected nearly 50,000 acres of lands with scenic, recreational, historic and ecological values. North Carolina Coastal Land Trust has saved streams and wetlands that provide clean water, forests that are havens for wildlife, working farms that provide local food and nature parks that everyone can enjoy. More information about the Coastal Land Trust is available at www.coastallandtrust.org.",2012-07-25T00:00:00-04:00,No topic,Environment and Natural Resources Division
2,11-1002,$1 Million Settlement Reached for Natural Resource Damages at Superfund Site in Massachusetts,"BOSTON– A $1-million settlement has been reached for natural resource damages (NRD) at the Blackburn & Union Privileges Superfund Site in Walpole, Mass., the Departments of Justice and Interior (DOI), and the Office of the Massachusetts Attorney General announced today. The Blackburn & Union Privileges Superfund Site includes 22 acres of contaminated land and water in Walpole. The contamination resulted from the operations of various industrial facilities dating back to the 19th century that exposed the site to asbestos, arsenic, lead and other hazardous substances. The private parties involved in the settlement include two former owners and operators of the site, W.R. Grace & Co.– Conn. and Tyco Healthcare Group LP, as well as the current owners, BIM Investment Corp. and Shaffer Realty Nominee Trust. From about 1915 to 1936, a predecessor of W.R. Grace manufactured asbestos brake linings and clutch linings on a large portion of the property. From 1946 to about 1983, a predecessor of Tyco Healthcare operated a cotton fabric manufacturing business, which used caustic solutions, on a portion of the property. In a 2010 settlement with U.S. Environmental Protection Agency (EPA), the four private parties agreed to perform a remedial action to clean up the site at an estimated cost of $13 million. The consent decree lodged today resolves both state and federal NRD liability claims; it requires the parties to pay $1,094,169.56 to the state and federal natural resource trustees, the Massachusetts Executive Office of Energy and Environmental Affairs (EEA) and DOI, for injuries to ecological resources including groundwater and wetlands, which provide habitat for waterfowl and wading birds, including black ducks and great blue herons. The trustees will use the settlement funds for natural resource restoration projects in the area. 
“This settlement demonstrates our commitment to recovering damages from the parties responsible for injury to natural resources, in partnership with state trustees,” said Bruce Gelber, Acting Deputy Assistant Attorney General of the Justice Department’s Environment and Natural Resources Division. “The citizens of Walpole have had to live with the environmental impact of this contamination for many years,” Attorney General Martha Coakley said. “We are pleased that today’s agreement will not only require the responsible parties to reimburse taxpayer dollars, but will also provide funding to begin restoring or replacing the wetland and other natural resources.” The consent decree was lodged in the U.S. District Court for Massachusetts. A portion of the funds, $300,000, will be distributed to the EEA-sponsored groundwater restoration projects; $575,000 will be used for ecological restoration projects jointly sponsored by EEA and the U.S. Fish and Wildlife Service (FWS). In addition, $125,000 will go for projects jointly sponsored by EEA and FWS that achieve both ecological and groundwater restoration; $57,491.34 will be allocated for reimbursement for the FWS’s assessment costs; and $36,678.22 will be distributed as reimbursement for the commonwealth’s assessment costs. “This settlement provides the means for a range of projects designed to compensate the public for decades of groundwater and other ecological damage at this site. I encourage local citizens and organizations to become engaged in the public process that will take place as we solicit, take comment on, and choose these projects in the months ahead,” said Energy and Environmental Affairs Secretary Richard K. Sullivan Jr., who serves as the Commonwealth’s Natural Resources Damages trustee. “This settlement will help restore habitat for fish and wildlife in the Neponset River watershed,” said Tom Chapman of the FWS New England Field Office. 
“We look forward to working with the commonwealth and local stakeholders to implement restoration.” “More than 100 years-worth of industrial activities at this site caused major environmental contamination to the Neponset River, nearby wetlands and to groundwater below the site,” said Commissioner Kenneth Kimmell of the Massachusetts Department of Environmental Protection (MassDEP), which will staff the Trustee Council for the Commonwealth. “We will ensure that the community and the public will be active participants in the process to use these NRD funds to restore the injured natural resources.” Under the federal Comprehensive Environmental Response, Compensation and Liability Act, EEA and DOI, acting through the FWS, are the designated state and federal natural resource Trustees for the site. The site has been listed on the EPA’s National Priorities List since 1994. The consent decree is subject to a public comment period and court approval. A copy of the consent decree and instructions about how to submit comments is available on www.usdoj.gov/enrd/Consent_Decrees.html . After the consent decree is approved, EEA and FWS will develop proposed restoration plans to use the settlement funds for restoration projects. The proposed restoration plans will also be made available to the public for review and comment. Assistant Attorney General Matthew Brock of Massachusetts Attorney General Coakley's Environmental Protection Division handled this matter. Attorney Jennifer Davis of MassDEP, Attorney Anna Blumkin of EEA and MassDEP’s NRD Coordinator Karen Pelto also worked on this settlement.",2011-08-03T00:00:00-04:00,No topic,Environment and Natural Resources Division
3,10-015,10 Las Vegas Men Indicted \r\nfor Falsifying Vehicle Emissions Tests,"WASHINGTON—A federal grand jury in Las Vegas today returned indictments against 10 Nevada-certified emissions testers for falsifying vehicle emissions test reports, the Justice Department announced. Each defendant faces one felony Clean Air Act count for falsifying reports between November 2007 and May 2009. The number of falsifications varied by defendant, with some defendants having falsified approximately 250 records, while others falsified more than double that figure. One defendant is alleged to have falsified over 700 reports. The individuals indicted include: Escudero resides in Pahrump, Nev. All other individuals are from Clark County, Nev. The 10 defendants are alleged to have engaged in a practice known as ""clean scanning"" vehicles. The scheme involved entering the Vehicle Identification Number (VIN) for a vehicle that would not pass the emissions test into the computerized system, then connecting a different vehicle the testers knew would pass the test. These falsifications were allegedly performed for anywhere from $10 to $100 over and above the usual emissions testing fee. The U.S. Environmental Protection Agency (EPA), under the Clean Air Act, requires the state of Nevada to conduct vehicle emissions testing in certain areas because the areas exceed national standards for carbon monoxide and ozone. Las Vegas is currently required to perform emissions testing. To obtain a registration renewal, vehicle owners bring the vehicles to a licensed inspection station for testing. The emissions inspector logs into a computer to activate the system by using a unique password issued to the emissions inspector. The emissions inspector manually inputs the vehicle’s VIN to identify the tested vehicle, then connects the vehicle for model year 1996 and later to an onboard diagnostics port connected to an analyzer. 
The analyzer downloads data from the vehicle’s computer, analyzes the data and provides a ""pass"" or ""fail"" result. The pass or fail result and vehicle identification data are reported on the Vehicle Inspection Report. It is a crime to knowingly alter or conceal any record or other document required to be maintained by the Clean Air Act. ""Falsifications of vehicle emissions testing, such as those alleged in the indictments unsealed today, are serious matters and we intend to use all of our enforcement tools to stop this harmful practice. These actions undermine a system that is designed to reduce air pollutants including smog and provide better air quality for the citizens of Nevada,"" said Ignacia S. Moreno, Assistant Attorney General for the Justice Department’s Environment and Natural Resources Division. ""The residents of Nevada deserve to know that the vast majority of licensed vehicle emission inspectors are not corrupt and are not circumventing emission testing procedures,"" said U.S. Attorney Bogden. ""These indictments should serve as a clear warning to offenders that the Department of Justice will prosecute you if you make fraudulent statements and reports concerning compliance with the federal Clean Air Act."" ""Lying about car emissions means dirtier air, which is especially of concern in areas like Las Vegas that are already experiencing air quality problems,"" said Cynthia Giles, Assistant Administrator for Enforcement and Compliance Assurance at EPA. ""We will take aggressive action to ensure communities have clean air."" The maximum penalty for the felony violations contained in the indictments includes up to two years in prison and a fine of up to $250,000. An indictment is merely an accusation, and a defendant is presumed innocent unless and until proven guilty in a court of law. The case was investigated by the EPA, Criminal Investigation Division; and the Nevada Department of Motor Vehicles Compliance Enforcement Division. 
The case is being prosecuted by the U.S. Attorney’s Office for the District of Nevada and the Justice Department’s Environmental Crimes Section.",2010-01-08T00:00:00-05:00,No topic,Environment and Natural Resources Division
4,18-898,"$100 Million Settlement Will Speed Cleanup Work at Centredale Manor Superfund Site in North Providence, R.I.","The U.S. Department of Justice, the U.S. Environmental Protection Agency (EPA), and the Rhode Island Department of Environmental Management (RIDEM) announced today that two subsidiaries of Stanley Black & Decker Inc.—Emhart Industries Inc. and Black & Decker Inc.—have agreed to clean up dioxin contaminated sediment and soil at the Centredale Manor Restoration Project Superfund Site in North Providence and Johnston, Rhode Island. “We are pleased to reach a resolution through collaborative work with the responsible parties, EPA, and other stakeholders,” said Acting Assistant Attorney General Jeffrey H. Wood for the Justice Department's Environment and Natural Resources Division . “Today’s settlement ends protracted litigation and allows for important work to get underway to restore a healthy environment for citizens living in and around the Centredale Manor Site and the Woonasquatucket River.” “This settlement demonstrates the tremendous progress we are achieving working with responsible parties, states, and our federal partners to expedite sites through the entire Superfund remediation process,” said EPA Acting Administrator Andrew Wheeler. “The Centredale Manor Site has been on the National Priorities List for 18 years; we are taking charge and ensuring the Agency makes good on its promise to clean it up for the betterment of the environment and those communities affected.” “Successfully concluding this settlement paves the way for EPA to make good on our commitment to aggressively pursue cleaning up the Centredale Manor Superfund Site,” said EPA New England Regional Administrator Alexandra Dunn. 
“We are excited to get to work on the cleanup at this site, and get it closer to the goal of being fully utilized by the North Providence and Johnston communities.” “We are pleased that the collective efforts of the State of Rhode Island, EPA, and DOJ in these negotiations have concluded in this major milestone toward the cleanup of the Centredale Manor Restoration Superfund site and are consistent with our long-standing efforts to make the polluter pay,” said RIDEM Director Janet Coit. “The settlement will speed up a remedy that protects public health and the river environment, and moves us closer to the day that we can reclaim recreational uses of this beautiful river resource.” The settlement, which includes cleanup work in the Woonasquatucket River (River) and bordering residential and commercial properties along the River, requires the companies to perform the remedy selected by EPA for the Site in 2012, which is estimated to cost approximately $100 million, and resolves longstanding litigation. The cleanup remedy includes excavation of contaminated sediment and floodplain soil from the Woonasquatucket River, including from adjacent residential properties. Once the cleanup remedy is completed, full access to the Woonasquatucket River should be restored for local citizens. The cleanup will be a step toward the State’s goal of a fishable and swimmable river. The work will also include upgrading caps over contaminated soil in the peninsula area of the Site that currently house two high-rise apartment buildings. The settlement also ensures that the long-term monitoring and maintenance of the site, as directed in the remedy, will be implemented to ensure that public health is protected. Under the settlement, Emhart and Black & Decker will reimburse EPA for approximately $42 million in past costs incurred at the Site. 
The companies will also reimburse EPA and the State of Rhode Island for future costs incurred by those agencies in overseeing the work required by the settlement. The settlement will also include payments on behalf of two federal agencies to resolve claims against those agencies. These payments, along with prior settlements related to the Site, will result in a 100 percent recovery for the United States of its past and future response costs related to the Site. Litigation related to the Site has been ongoing for nearly eight years. While the Federal District Court found Black & Decker and Emhart to be liable for their hazardous waste and responsible to conduct the cleanup of the Site, it had also ruled that EPA needed to reconsider certain aspects of that cleanup. EPA appealed the decision requiring it to reconsider aspects of the cleanup. This settlement, once entered by the District Court, will resolve the litigation between the United States, Rhode Island, and Emhart and Black and Decker, allowing the cleanup of the Site to begin. The Site spans a one and a half mile stretch of the Woonasquatucket River and encompasses a nine-acre peninsula, two ponds and a significant forested wetland. From the 1940s to the early 1970s, Emhart’s predecessor operated a chemical manufacturing facility on the peninsula and used a raw material that was contaminated with 2,3,7,8-tetrachlorodibenzo-p-dioxin, a toxic form of dioxin. The Site property was also previously used by a barrel refurbisher. Elevated levels of dioxins and other contaminants have been detected in soil, groundwater, sediment, surface water and fish. The Site was added to the National Priorities List (NPL) in 2000, and in December 2017, EPA included the Centredale Manor Restoration Project Superfund Site on a list of Superfund sites targeted for immediate and intense attention. 
Several short-term actions were previously performed at the Site to address immediate threats to the residents and minimize potential erosion and downstream transport of contaminated soil and sediment. This settlement is the latest agreement EPA has reached since the Site was listed on the NPL. Prior agreements addressed the performance and recovery of costs for the past environmental investigations and interim cleanup actions from Emhart, the barrel reconditioning company, the current owners of the peninsula portion of the Site, and other potentially responsible parties. The Consent Decree, lodged in the U.S. District Court of Rhode Island, will be posted in the Federal Register and available for public comment for a period of 30 days. The Consent Decree can be viewed on the Justice Department website: www.justice.gov/enrd/Consent_Decrees.html. EPA information on the Centredale Manor Superfund Site: www.epa.gov/superfund/centredale.",2018-07-09T00:00:00-04:00,Environment,Environment and Natural Resources Division
...,...,...,...,...,...,...
13082,16-735,Yuengling to Upgrade Environmental Measures to Settle Clean Water Act Violations at Two Pennsylvania Breweries,"The Department of Justice and the U.S. Environmental Protection Agency (EPA) today announced that D. G. Yuengling and Son Inc., has settled Clean Water Act violations involving its two large-scale breweries near Pottsville, Pennsylvania. In a consent decree filed today in federal court in Harrisburg, Pennsylvania, the company has agreed to spend approximately $7 million to improve environmental measures at its brewery operations after it allegedly discharged pollutants into the Greater Pottsville Area Sewer Authority municipal wastewater treatment plant. Yuengling will also pay a $2.8 million penalty. In addition, the consent decree includes a requirement to implement an environmental management system (EMS) focused on achieving CWA compliance at the facilities. Yuengling must hire a third party consultant to develop the EMS and a third party auditor to ensure proper implementation at the facility operations. The company allegedly violated Clean Water Act requirements for companies that discharge industrial waste to municipal publically-owned wastewater treatment facilities numerous times between 2008 and 2015. Companies must obtain and comply with permit limits on discharges of industrial waste that goes to public treatment facilities, which in many cases require “pretreatment” of waste before it is discharged. The case was referred to EPA by the Greater Pottsville Area Sewer Authority (GPASA). “It is vital that companies using municipal wastewater treatment facilities strictly follow pretreatment guidelines and permit limits for their wastewater. It is what good neighbors expect, and it is what the law requires,” said Assistant Attorney General John C. Cruden for the Department of Justice’s Environment and Natural Resources Division. 
“This settlement requires Yuengling to put into place an environmental management system designed to manage compliance with the Clean Water Act in a systemic, planned, and documented manner to establish a top-down, prevention-focused approach. The settlement also mandates independent audits of Yuengling’s compliance with the consent decree, among other requirements.” “Yuengling is responsible for serious violations of its Clean Water Act pretreatment discharge limits, posing a potential risk to the Schuylkill River which provides drinking water to 1.5 million people,” said EPA Regional Administrator Shawn M. Garvin. “This history of violations and failure to fully respond to orders from the Greater Pottsville Area Sewer Authority and EPA to correct the problems resulted in this enforcement action.” In a complaint filed concurrently with the settlement, the United States alleged that Yuengling violated pretreatment permit requirements, including discharge limits for biological oxygen demand (BOD), phosphorus, zinc and pH to the GPASA treatment plant, at least 141 times from 2008 to 2015. Pretreatment helps remove or change the composition of pollutants in wastewater. Unpermitted or excessive industrial discharges may interfere with the operation of public wastewater treatment plants, which are generally designed to handle sewage and domestic waste, leading to the discharge of untreated or inadequately treated wastewater into local waters. In addition to the monetary penalty, Yuengling has also agreed to take measures that will prevent future violations including: The consent decree, which is subject to a 30-day public comment period and final court approval, is available at: www.justice.gov/enrd/ More information on the settlement: www.epa.gov/compliance/resources/cases/civil/cwa/arch.html",2016-06-23T00:00:00-04:00,Environment,Environment and Natural Resources Division
13083,10-473,Zarein Ahmedzay Pleads Guilty to Terror Violations in Connection with Al-Qaeda New York Subway Plot,"The Justice Department announced that Zarein Ahmedzay, a U.S. citizen and resident of Queens, N.Y., pleaded guilty today in the Eastern District of New York to terrorism violations stemming from, among other activities, his role in an al-Qaeda plot to conduct coordinated suicide bombings on New York’s subway system in September 2009. At a hearing this afternoon before Chief U.S. Magistrate Judge Steven M. Gold, Ahmedzay, 25, pleaded guilty to the following violations: conspiracy to use a weapon of mass of destruction (explosive bombs) against persons or property in the United States; conspiracy to commit murder in a foreign country; and providing material support to a foreign terrorist organization, namely al-Qaeda. Ahmedzay faces a sentence of up to life in prison. Ahmedzay was first indicted on Jan. 8, 2010, in the Eastern District of New York on charges of making false statements to the FBI about his travels to Pakistan and Afghanistan. On Feb. 25, 2010, he was charged in a superseding indictment in the Eastern District of New York with conspiracy to use weapons of mass destruction; conspiracy to commit murder in a foreign country; providing material support to al-Qaeda; receiving military-type training from al-Qaeda; and making false statements. ""The facts disclosed today add chilling details to what we know was a deadly plot hatched by al-Qaeda leaders overseas to kill scores of Americans in the New York City subway system in September 2009,"" said Attorney General Eric Holder. ""This plot, as well as others we have encountered, makes clear we face a continued threat from al-Qaeda and its affiliates overseas. 
With three guilty pleas already and the investigation continuing, this prosecution underscores the importance of using every tool we have available to both disrupt plots against our nation and hold suspected terrorists accountable."" FBI Director Robert S. Mueller said, ""Ahmedzay’s plea makes clear that he betrayed his adopted country and its people by providing support to al-Qaeda and planning to bring deadly violence to New York. The FBI and our law enforcement and intelligence partners will continue to investigate this plot and to bring all necessary resources to bear to protect Americans from terrorist attacks."" As Ahmedzay admitted during today’s guilty plea allocution and as reflected in previous government filings and the guilty plea allocution of co-defendant Najibullah Zazi, Ahmedzay, Zazi and a third individual agreed to travel to Afghanistan to join the Taliban and fight against United States and allied forces. In furtherance of their plans, they flew from Newark Liberty International Airport in Newark, N.J., to Peshawar, Pakistan at the end of August 2008. Ahmedzay and the third individual attempted to enter Afghanistan but were turned back at the border and returned to Peshawar. Within a few days, Ahmedzay, Zazi and the third individual met with an al-Qaeda facilitator in Peshawar and agreed to travel for training in Waziristan. Upon arriving, they met with two al-Qaeda leaders, but did not learn their true identities. As the government represented during today’s guilty plea, the leaders were Saleh al-Somali, the head of international operations for al-Qaeda, and Rashid Rauf, a key al-Qaeda operative. The three Americans said that they wanted to fight in Afghanistan, but the al-Qaeda leaders explained that they would be more useful to al-Qaeda and the jihad if they returned to New York and conducted attacks there. Ahmedzay and the others received training on several different kinds of weapons. 
During the training, al-Qaeda leaders continued to encourage them to return to the United States and conduct suicide operations. They agreed, and had further conversations with al-Qaeda about the timing of the attacks and possible target locations in Manhattan. Al-Qaeda leadership emphasized the need to hit well-known structures and maximize the number of casualties. After the initial training, the three Americans left Waziristan. The plan was for Ahmedzay and Zazi to return to Waziristan a month later to receive explosives training from al-Qaeda. Ahmedzay later changed his mind about attending the training, and Zazi went by himself. Ahmedzay later reviewed Zazi’s bomb-making notes from the training. Ahmedzay and Zazi returned to New York, and Zazi moved to Denver. Ahmedzay initially had reservations about going forward with the suicide bombing, but resolved to go forward with the plan. Zazi traveled to New York from Colorado and the three Americans met in Queens and agreed to carry out suicide bombings during the month of Ramadan, Aug. 22, 2009 to Sept. 20, 2009. They agreed that Zazi would prepare the explosives, that Zazi and Ahmedzay would assemble the devices in New York, and that all three would conduct suicide attacks. Ahmedzay later evaluated potential bombing targets in Manhattan. Zazi traveled a second time to New York, and Ahmedzay and Zazi discussed the attack in further detail. By that time, Zazi had begun researching and experimenting with explosives in Colorado. Based on the amount of explosives Zazi anticipated he could produce by Ramadan, Zazi and Ahmedzay decided that they would conduct suicide attacks on subway trains rather than targeting a larger structure such as a building. Zazi returned to Colorado and constructed the explosives for the detonator components of the bombs. 
In July and August 2009, Zazi purchased large quantities of components necessary to produce the explosive TATP [Triacetone Triperoxide] and twice checked into a hotel room near Denver, where bomb making residue was later found. On Sept. 8, 2009, Zazi rented a car and drove from Denver to New York, taking with him the explosives and other materials necessary to build the bombs. Zazi arrived in New York City on Thursday, Sept. 10, 2009. Zazi and Ahmedzay intended to obtain and assemble the remaining components of the bombs over the weekend and the three of them would conduct the attack on Manhattan subway lines on Sept. 14, Sept. 15, or Sept. 16, 2009. However, shortly after arriving in New York, they realized that law enforcement was investigating their activities. Ahmedzay and Zazi discarded the explosives and other bomb-making materials, and Zazi traveled back to Denver. This case is being prosecuted by the U.S. Attorney’s Office for the Eastern District of New York, with assistance from the U.S. Attorney’s Office for the District of Colorado and the Counterterrorism Section of the Justice Department’s National Security Division. The investigation is being conducted by the New York and Denver FBI Joint Terrorism Task Forces, which combined have investigators from more than 50 federal, state and local law enforcement agencies. Zazi returned to Colorado and constructed the explosives for the detonator components of the bombs. In July and August 2009, Zazi purchased large quantities of components necessary to produce the explosive TATP [Triacetone Triperoxide] and twice checked into a hotel room near Denver, where bomb making residue was later found. On Sept. 8, 2009, Zazi rented a car and drove from Denver to New York, taking with him the explosives and other materials necessary to build the bombs. Zazi arrived in New York City on Thursday, Sept. 10, 2009. 
Zazi and Ahmedzay intended to obtain and assemble the remaining components of the bombs over the weekend and the three of them would conduct the attack on Manhattan subway lines on Sept. 14, Sept. 15, or Sept. 16, 2009. However, shortly after arriving in New York, they realized that law enforcement was investigating their activities. Ahmedzay and Zazi discarded the explosives and other bomb-making materials, and Zazi traveled back to Denver. This case is being prosecuted by the U.S. Attorney’s Office for the Eastern District of New York, with assistance from the U.S. Attorney’s Office for the District of Colorado and the Counterterrorism Section of the Justice Department’s National Security Division. The investigation is being conducted by the New York and Denver FBI Joint Terrorism Task Forces, which combined have investigators from more than 50 federal, state and local law enforcement agencies.",2010-04-23T00:00:00-04:00,No topic,Office of the Attorney General
13084,17-045,Zimmer Biomet Holdings Inc. Agrees to Pay $17.4 Million to Resolve Foreign Corrupt Practices Act Charges,"Subsidiary Agrees to Plead Guilty to Violating the Foreign Corrupt Practices Act Zimmer Biomet Holdings Inc. (Zimmer Biomet), an Indiana-based manufacturer of orthopedic and dental implant devices, has agreed to pay a $17.4 million criminal penalty in connection with a scheme to pay bribes to government officials in Mexico and for violations of the internal controls provisions of the Foreign Corrupt Practices Act (FCPA) involving the company’s operations in Mexico and Brazil. Zimmer Biomet had been in breach of a 2012 deferred prosecution agreement (DPA) with the department resolving an earlier investigation into FCPA violations committed by Biomet Inc., which became part of Zimmer Biomet in 2015. Assistant Attorney General Leslie R. Caldwell of the Justice Department’s Criminal Division and Assistant Director Stephen Richardson of the FBI’s Criminal Investigative Division made the announcement. “Zimmer Biomet had the opportunity to avoid criminal charges but its misconduct allowed the bribes to continue,” said Assistant Attorney General Caldwell. “Zimmer Biomet is now paying the price for disregarding its obligations under the earlier deferred prosecution agreement. In appropriate circumstances the department will resolve serious criminal conduct through alternative means, but there will be consequences for those companies that refuse to take these agreements seriously.” “Zimmer Biomet failed to rectify their misconduct and get back on track in compliance with the law, and now they are facing the consequences of their corrupt actions,” said Assistant Director Richardson. “The FBI will not stand idly by when companies operate outside the law and attempt to play by different rules in the marketplace. 
We remain vigilant and committed to holding those accountable who disregard the rule of law in the United States.” According to admissions made in the resolution documents, even after the 2012 DPA between the department and Biomet, the company knowingly and willfully continued to use a third-party distributor in Brazil known to have paid bribes to government officials on Biomet’s behalf. Biomet also failed to implement an adequate system of internal accounting controls at the company’s subsidiary in Mexico, despite employees and executives having been made aware of red flags suggesting that bribes were being paid. By failing to require appropriate due diligence and documentation and contracts for payments to third parties, Biomet allowed its Mexican subsidiary, Biomet 3i Mexico S.A. de C.V. (3i Mexico), to pay bribes to Mexican customs officials through customs brokers and sub-agents so 3i Mexico could import contraband dental implants into Mexico. Importing those products into Mexico violated Mexican law because they lacked proper registration or labeling. Zimmer Biomet entered into a three-year DPA tin connection with a superseding criminal information, filed today in the District of Columbia, charging the company with failing to implement a system of effective internal accounting controls. Pursuant to its agreement with the department, Zimmer Biomet agreed to pay a $17.4 million criminal penalty and retain an independent corporate compliance monitor for three years. JERDS Luxembourg Holding S.ár.l. (JERDS), an indirect subsidiary of Zimmer Biomet, agreed to plead guilty to a one-count criminal information, also filed in the District of Columbia, charging it with causing Biomet to violate the books and records provisions of the FCPA through the actions of 3i Mexico, a wholly-owned subsidiary of JERDS. The plea agreement is subject to court approval. The case was assigned to Senior U.S. District Judge Reggie B. 
Walton of the District of Columbia and the change of plea is scheduled to take place on Jan. 13, 2017 at 3:45 p.m. In related proceedings, the U.S. Securities and Exchange Commission (SEC) filed a cease and desist order against Zimmer Biomet whereby the company agreed to pay to the SEC disgorgement of $6.5 million including pre-judgment interest and $6.5 million as a civil penalty. The Criminal Division’s Fraud Section reached this resolution based on a number of factors, including that Zimmer Biomet was in breach of the 2012 DPA between Biomet and the department. That agreement resolved an earlier investigation by the department into violations of the FCPA committed by Biomet, including the bribery of government officials in Argentina, Brazil and China as well as the falsification of the company’s financial records to conceal the true nature of the bribe payments. Pursuant to the 2012 DPA, Biomet had been required to retain an independent compliance monitor. The monitor’s term was extended for one year in 2015, due to both the bribery in Brazil and Mexico and the fact that the Zimmer Biomet compliance program did not meet the requirements of the 2012 DPA. At the conclusion of the extended period, the independent monitor was unable to certify that the company’s compliance program satisfied the requirements of the 2012 DPA and the department notified Zimmer Biomet that it was deemed to be in breach of the agreement. Zimmer Biomet fully cooperated with the current investigation and provided to the Fraud Section all relevant facts known to the company, including information about individuals involved in the misconduct. Nevertheless, because Zimmer Biomet failed to implement an effective compliance program and committed additional crimes while under a DPA and monitorship, the current DPA requires Zimmer Biomet retain an independent compliance monitor for a term of three years. The FBI’s International Corruption Squad in Washington, D.C., investigated the case. 
Assistant Chief Tarek J. Helou and Trial Attorney John Borchert of the Fraud Section prosecuted the case. The Office of International Affairs also provided substantial assistance in this matter. The Criminal Division’s Fraud Section is responsible for investigating and prosecuting all FCPA matters. Additional information about the department’s FCPA enforcement efforts can be found at www.justice.gov/criminal/fraud/fcpa. Court Documents: Zimmer Superseding Information Zimmer DPA JERDS Information",2017-01-12T00:00:00-05:00,Foreign Corruption,Criminal Division; Criminal - Criminal Fraud Section; Criminal Investigative Division (FBI)
13085,17-252,ZTE Corporation Agrees to Plead Guilty and Pay Over $430.4 Million for Violating U.S. Sanctions by Sending U.S.-Origin Items to Iran,"ZTE Corporation has agreed to enter a guilty plea and to pay a $430,488,798 penalty to the U.S. for conspiring to violate the International Emergency Economic Powers Act (IEEPA) by illegally shipping U.S.-origin items to Iran, obstructing justice and making a material false statement. ZTE simultaneously reached settlement agreements with the U.S. Department of Commerce’s Bureau of Industry and Security (BIS) and the U.S. Department of the Treasury’s Office of Foreign Assets Control (OFAC). In total ZTE has agreed to pay the U.S. Government $892,360,064. The BIS has suspended an additional $300,000,000, which ZTE will pay if it violates its settlement agreement with the BIS. Attorney General of the United States Jeff Sessions, Acting Assistant Attorney General for National Security Mary B. McCord, U.S. Attorney John R. Parker for the Northern District of Texas and FBI Assistant Director Bill Priestap for the Counterintelligence Division made the announcement today. “ZTE Corporation not only violated export controls that keep sensitive American technology out of the hands of hostile regimes like Iran’s – they lied to federal investigators and even deceived their own counsel and internal investigators about their illegal acts,” said Attorney General Sessions. “This plea agreement holds them accountable, and makes clear that our government will use every tool we have to punish companies who would violate our laws, obstruct justice and jeopardize our national security. I am grateful to the Justice Department’s National Security Division, the U.S. Attorney’s Office for the Northern District of Texas and the FBI for their outstanding work on this investigation.” “ZTE engaged in an elaborate scheme to acquire U.S.-origin items, send the items to Iran and mask its involvement in those exports. 
The plea agreement, which is pending before the Court, alleges that the highest levels of management within the company approved the scheme. ZTE then repeatedly lied to and misled federal investigators, its own attorneys and internal investigators. Its actions were egregious and warranted a significant penalty,” said Acting Assistant Attorney General McCord. “The enforcement of U.S. export control and sanctions laws is a major component of the National Security Division’s commitment to protecting the national security of the United States. Companies that violate these laws – including foreign companies – will be investigated and held to answer for their actions.” “ZTE Corporation not only violated our export control laws but, once caught, shockingly resumed illegal shipments to Iran during the course of our investigation,” said U.S. Attorney Parker. “ZTE Corporation then went to great lengths to devise elaborate, corporate-wide schemes to hide its illegal conduct, including lying to its own lawyers.” ""The plea agreement in this case shows ZTE repeatedly violated export controls and illegally shipped U.S. technology to Iran,"" said Assistant Director Priestap. ""The company also took extensive measures to hide what it was doing from U.S. authorities. This case is an excellent example of cooperation among multiple U.S. agencies to uncover illegal technology transfers and make those responsible pay for their actions."" The plea agreement, which is contingent on the court’s approval, also requires ZTE to submit to a three-year period of corporate probation, during which time an independent corporate compliance monitor will review and report on ZTE’s export compliance program. ZTE is also required to cooperate fully with the Department of Justice (DOJ) regarding any criminal investigation by U.S. law enforcement authorities. 
The plea agreement ends a five-year joint investigation into ZTE’s export practices, which was handled by the DOJ’s National Security Division, the U.S. Attorney’s Office for the Northern District of Texas, the FBI, the BIS and the Department of Homeland Security, U.S. Immigration and Customs Enforcement’s Homeland Security Investigations. A criminal information was filed today in federal court in the Northern District of Texas charging ZTE with one count of knowingly and willfully conspiring to violate the IEEPA, one count of obstruction of justice and one count of making a material false statement. ZTE waived the requirement of being charged by way of federal indictment, agreed to the filing of the information and has accepted responsibility for its criminal conduct by entering into a plea agreement with the government. The plea agreement, which is contingent on the court’s approval, requires that ZTE pay a fine in the amount of $286,992,532 and a criminal forfeiture in the amount of $143,496,266. The criminal fine represents the largest criminal fine in connection with an IEEPA prosecution. Summary of the Criminal Conduct According to documents filed today, for a period of almost six years, ZTE obtained U.S.-origin items – including controlled dual-use goods on the Department of Commerce’s Commerce Control List (CCL) – incorporated some of those items into ZTE equipment and shipped the ZTE equipment and U.S.-origin items to customers in Iran. ZTE engaged in this conduct knowing that such shipments to Iran were illegal. ZTE further lied to federal investigators during the course of the investigation when it insisted, through outside and in-house counsel, that the company had stopped sending U.S.-origin items to Iran. In fact, while the investigation was ongoing, ZTE resumed its business with Iran and shipped millions of dollars’ worth of U.S. items there. 
ZTE also created an elaborate scheme to hide the data related to these transactions from a forensic accounting firm hired by defense counsel to conduct a review of ZTE’s transactions with sanctioned countries. It did so knowing that the information provided to the forensic accounting firm would be reported to the U.S. government by outside counsel. Outside counsel was not aware of this scheme and indeed was wholly unaware that ZTE had resumed business with Iran. After ZTE informed its counsel of the scheme, counsel reported – with permission from ZTE – the conduct to the U.S. government. The Iran Business According to court documents, between January 2010 and January 2016, ZTE, either directly or indirectly through a third company, shipped approximately $32,000,000 of U.S.-origin items to Iran without obtaining the proper export licenses from the U.S. government. In early 2010, ZTE began bidding on two different Iranian projects. The projects involved installing cellular and landline network infrastructure. Each contract was worth hundreds of millions of U.S. Dollars and required U.S. components for the final products. In December 2010, ZTE finalized the contracts with Iranian customers. The contracts were signed by four parties: the Iranian customer, ZTE, Beijing 8 Star and ZTE Parsian. Court documents explain that ZTE identified Beijing 8 Star (8S) as a possible vehicle for hiding its illegal shipments of U.S. items to Iran. It intended to use 8S to export U.S.-origin items from China to ZTE customers in Iran. As part of this plan, ZTE supplied 8S with necessary capital and took over control of the company. Under the terms of the Iran contracts, ZTE agreed to supply the “self-developed equipment,” collect payments for the projects and manage the whole network. ZTE Parsian was to provide locally purchased materials and all services. 8S was responsible for “relevant third-party equipment,” which primarily meant parts that would be subject to U.S. export laws. 
ZTE intended for 8S to be an “isolation company,” that is, ZTE intended for 8S (rather than ZTE) to purchase the embargoed equipment from suppliers and provide that equipment under the contract in an effort to distance ZTE from U.S. export-controlled products and insulate ZTE from U.S. export violations. However, 8S had no purchasing or shipping history and no real business reputation. Ultimately, although 8S was a party to the contracts, ZTE itself purchased and shipped the embargoed goods under the contract. In its shipping containers, it packaged the U.S. items with its own self-manufactured items to hide the U.S.-origin goods. ZTE did not include the U.S. items on the customs declaration forms, though it did include the U.S.-origin items on the packing lists included inside of the shipments. In early 2011, when ZTE determined that the use of 8S was insufficient to hide ZTE’s connection to the illegal export of U.S.-origin goods to Iran, senior management of ZTE ordered that a company-level export control project team study, handle and respond to the company’s export control risks. In September 2011, four senior managers signed an Executive Memo, which proposed that the company identify and establish new “isolation companies” that would be responsible for supplying U.S. component parts necessary for projects in embargoed countries. The isolation companies would conceal ZTE’s role in the transshipment scheme and would insulate ZTE from export control risks. In March 2012, Reuters published an article regarding ZTE’s sale of equipment to Iran. In response, ZTE made a decision to temporarily cease sending new U.S. equipment to Iran. By November 2013, however, ZTE had resumed its business with Iran. Beginning in July 2014, ZTE began shipping U.S.-origin equipment to Iran once again without the necessary licenses. Instead of using 8S, however, ZTE identified a new isolation company. 
ZTE signed a contract with the new isolation company, which in turn signed contracts with the two Iranian customers. According to the new scheme, ZTE purchased and manufactured all relevant equipment – both U.S.-origin and ZTE-manufactured – and prepared them for pick-up at its warehouse by the new isolation company. The new isolation company then shipped all items to the Iranian customers. Shipments to Iran continued from January 2014 through January 2016. The Obstruction and False Statement According to court documents, despite its knowledge of an ongoing grand jury investigation into its Iran exports, ZTE took several steps to conceal relevant information from the U.S. government. It further took affirmative steps to mislead the U.S. government. In the summer of 2012, ZTE asked each of the employees who were involved in the Iran sales to sign nondisclosure agreements in which the employees agreed to keep confidential all information related to the company’s U.S. exports to Iran. During meetings throughout late 2014, late 2015 and early 2016, outside counsel for ZTE, unaware that the statements ZTE had given to counsel for communication to the government were false, represented to the DOJ and federal law enforcement agents that ZTE had stopped doing business with Iran and therefore was no longer violating U.S. export laws. Similarly, on July 8, 2015, in-house counsel for ZTE accompanied outside counsel in a meeting with the DOJ and federal law enforcement agents and reported that ZTE was abiding by U.S. laws. That statement was also false. ZTE also hid data related to its resumed illegal sales to Iran from a forensic accounting firm hired by defense counsel to conduct an internal investigation into the company’s Iran sales. ZTE knew the forensic accounting firm was reviewing its systems and knew that the analysis was being reported to the DOJ and U.S. law enforcement. 
To avoid detection of its 2013-2016 resumed illegal sales to Iran, ZTE formed the “contract data induction team” (“CDIT”). The CDIT was comprised of approximately 13 people whose job it was to “sanitize the databases” of all information related to the 2013-2016 Iran business. The team identified and removed from the databases all data related to those sales. ZTE also established an auto-delete function for the email accounts of those 13 individuals on the CDIT, so their emails were deleted every night – a departure from its normal practices – to ensure there were no communications related to the hiding of the data. The case is being prosecuted by Deputy Chief Elizabeth Cannon of the National Security Division’s Counterintelligence and Export Control Sections and Assistant U.S. Attorney Mark Penley of the Northern District of Texas. ZTE Information ZTE Plea Agreement Supplement ZTE Plea Agreement ZTE Factual Resume",2017-03-07T00:00:00-05:00,Asset Forfeiture; Counterintelligence and Export Control,"National Security Division (NSD); USAO - Texas, Northern"


## 1. Tagging and sentiment scoring (17 points)

Focus on the following press release: `id` == "17-1204" about this pharmaceutical kickback prosecution: https://www.forbes.com/sites/michelatindera/2017/11/16/fentanyl-billionaire-john-kapoor-to-plead-not-guilty-in-opioid-kickback-case/?sh=21b8574d6c6c 

The `contents` column is the one we're treating as a document. You may need to convert it from a pandas Series to a single string.

We'll call the raw string of this press release `pharma`.

In [413]:
## your code to subset to one press release and take the string
doj_filter = doj[doj['id'] == "17-1204"] 
pharma = ' '.join(doj_filter.contents) # join the matching row(s) into one string
type(pharma) # check that the result is a str


str

### 1.1 part of speech tagging (3 points)

A. Preprocess the `pharma` press release to remove all punctuation / digits (you can use `.isalpha()` to subset)

B. With the preprocessed press release from part A, use the part of speech tagger within nltk to tag all the words in that one press release with their part of speech. 

C. Using the output from B, extract the adjectives and sort those adjectives from most occurrences to fewest occurrences. Print a dataframe with the 5 most frequent adjectives and their counts in the `pharma` release. See here for a list of the names of adjectives within nltk: https://pythonprogramming.net/natural-language-toolkit-nltk-part-speech-tagging/

**Resources**:

- Documentation for `.isalpha()`: https://www.w3schools.com/python/ref_string_isalpha.asp
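For part C, the counting step can be sketched with `collections.Counter`. This is a minimal illustration, not the required solution: `sample_pos` is a hypothetical stand-in for the tagged output of part B, and lowercasing words before counting is an assumption (it merges "More" and "more").

```python
from collections import Counter

# Hypothetical stand-in for the (word, tag) pairs produced by nltk's pos_tag
sample_pos = [
    ("The", "DT"), ("former", "JJ"), ("executive", "NN"),
    ("led", "VBD"), ("a", "DT"), ("nationwide", "JJ"), ("conspiracy", "NN"),
    ("with", "IN"), ("former", "JJ"), ("managers", "NNS"),
    ("and", "CC"), ("More", "JJR"), ("illegal", "JJ"), ("nationwide", "JJ"),
    ("former", "JJ"),
]

# Penn Treebank adjective tags: base, comparative, superlative
ADJ_TAGS = {"JJ", "JJR", "JJS"}

def top_adjectives(tagged, n=5):
    """Return the n most frequent adjectives as (word, count) pairs."""
    counts = Counter(word.lower() for word, tag in tagged if tag in ADJ_TAGS)
    return counts.most_common(n)

top5 = top_adjectives(sample_pos)
print(top5)
```

Passing the result to `pd.DataFrame(top5, columns=["adjective", "count"])` gives the dataframe the question asks for.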

In [414]:
## A. 
## your code here to restrict to alpha
pharma_alpha = [word for word in wordpunct_tokenize(pharma) # tokenize and 
                          if word.isalpha()]          # subset for only alphabetical 

pharma_alpha # check 

['The',
 'founder',
 'and',
 'majority',
 'owner',
 'of',
 'Insys',
 'Therapeutics',
 'Inc',
 'was',
 'arrested',
 'today',
 'and',
 'charged',
 'with',
 'leading',
 'a',
 'nationwide',
 'conspiracy',
 'to',
 'profit',
 'by',
 'using',
 'bribes',
 'and',
 'fraud',
 'to',
 'cause',
 'the',
 'illegal',
 'distribution',
 'of',
 'a',
 'Fentanyl',
 'spray',
 'intended',
 'for',
 'cancer',
 'patients',
 'experiencing',
 'breakthrough',
 'pain',
 'More',
 'than',
 'Americans',
 'died',
 'of',
 'synthetic',
 'opioid',
 'overdoses',
 'last',
 'year',
 'and',
 'millions',
 'are',
 'addicted',
 'to',
 'opioids',
 'And',
 'yet',
 'some',
 'medical',
 'professionals',
 'would',
 'rather',
 'take',
 'advantage',
 'of',
 'the',
 'addicts',
 'than',
 'try',
 'to',
 'help',
 'them',
 'said',
 'Attorney',
 'General',
 'Jeff',
 'Sessions',
 'This',
 'Justice',
 'Department',
 'will',
 'not',
 'tolerate',
 'this',
 'We',
 'will',
 'hold',
 'accountable',
 'anyone',
 'from',
 'street',
 'dealers',
 'to',
 

In [415]:
## B 
## your code here for part of speech tagging
pharma_pos = pos_tag(pharma_alpha)
pharma_pos # check 


[('The', 'DT'),
 ('founder', 'NN'),
 ('and', 'CC'),
 ('majority', 'NN'),
 ('owner', 'NN'),
 ('of', 'IN'),
 ('Insys', 'NNP'),
 ('Therapeutics', 'NNP'),
 ('Inc', 'NNP'),
 ('was', 'VBD'),
 ('arrested', 'VBN'),
 ('today', 'NN'),
 ('and', 'CC'),
 ('charged', 'VBN'),
 ('with', 'IN'),
 ('leading', 'VBG'),
 ('a', 'DT'),
 ('nationwide', 'JJ'),
 ('conspiracy', 'NN'),
 ('to', 'TO'),
 ('profit', 'VB'),
 ('by', 'IN'),
 ('using', 'VBG'),
 ('bribes', 'NNS'),
 ('and', 'CC'),
 ('fraud', 'NN'),
 ('to', 'TO'),
 ('cause', 'VB'),
 ('the', 'DT'),
 ('illegal', 'JJ'),
 ('distribution', 'NN'),
 ('of', 'IN'),
 ('a', 'DT'),
 ('Fentanyl', 'NNP'),
 ('spray', 'NN'),
 ('intended', 'VBD'),
 ('for', 'IN'),
 ('cancer', 'NN'),
 ('patients', 'NNS'),
 ('experiencing', 'VBG'),
 ('breakthrough', 'NN'),
 ('pain', 'NN'),
 ('More', 'JJR'),
 ('than', 'IN'),
 ('Americans', 'NNPS'),
 ('died', 'VBD'),
 ('of', 'IN'),
 ('synthetic', 'JJ'),
 ('opioid', 'NN'),
 ('overdoses', 'NNS'),
 ('last', 'JJ'),
 ('year', 'NN'),
 ('and', 'CC'),
 (

In [416]:
## C. 
## Subset for adjectives 
adjectives = [(word, tag) for word, tag in pharma_pos if tag in ('JJ', 'JJR', 'JJS')]
adjectives

[('nationwide', 'JJ'),
 ('illegal', 'JJ'),
 ('More', 'JJR'),
 ('synthetic', 'JJ'),
 ('last', 'JJ'),
 ('medical', 'JJ'),
 ('accountable', 'JJ'),
 ('corporate', 'JJ'),
 ('nationwide', 'JJ'),
 ('American', 'JJ'),
 ('current', 'JJ'),
 ('other', 'JJ'),
 ('former', 'JJ'),
 ('federal', 'JJ'),
 ('later', 'JJ'),
 ('additional', 'JJ'),
 ('several', 'JJ'),
 ('former', 'JJ'),
 ('former', 'JJ'),
 ('former', 'JJ'),
 ('former', 'JJ'),
 ('former', 'JJ'),
 ('former', 'JJ'),
 ('various', 'JJ'),
 ('many', 'JJ'),
 ('powerful', 'JJ'),
 ('narcotic', 'JJ'),
 ('intense', 'JJ'),
 ('large', 'JJ'),
 ('most', 'JJS'),
 ('former', 'JJ'),
 ('reluctant', 'JJ'),
 ('non', 'JJ'),
 ('prior', 'JJ'),
 ('nationwide', 'JJ'),
 ('potent', 'JJ'),
 ('ongoing', 'JJ'),
 ('opioid', 'JJ'),
 ('accountable', 'JJ'),
 ('street', 'JJ'),
 ('level', 'JJ'),
 ('corporate', 'JJ'),
 ('utilized', 'JJ'),
 ('acceptable', 'JJ'),
 ('addictive', 'JJ'),
 ('better', 'JJR'),
 ('street', 'JJR'),
 ('level', 'JJ'),
 ('important', 'JJ'),
 ('pharmaceutical'

### 1.2 named entity recognition (4 points)

A. Using the original `pharma` press release (so the one before stripping punctuation/digits), use spaCy to extract all named entities from the press release.

B. Print the unique named entities with the tag: `LAW`
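Because the same entity can appear several times in one release, part B needs a deduplication step. The sketch below shows one way to do it while keeping first-seen order; `sample_entities` is a hypothetical stand-in for the `(text, label)` pairs from part A, and `unique_by_label` is an illustrative helper name.

```python
# Hypothetical stand-in for the (text, label) pairs pulled from doc.ents
sample_entities = [
    ("Insys Therapeutics Inc.", "ORG"),
    ("RICO", "LAW"),
    ("Kapoor", "PERSON"),
    ("the Controlled Substances Act", "LAW"),
    ("RICO", "LAW"),
]

def unique_by_label(entities, label):
    """Return the distinct entity texts with the given label, in first-seen order."""
    texts = (text for text, ent_label in entities if ent_label == label)
    # dict.fromkeys drops duplicates and preserves insertion order
    return list(dict.fromkeys(texts))

unique_laws = unique_by_label(sample_entities, "LAW")
print(unique_laws)  # ['RICO', 'the Controlled Substances Act']
```

A `set` would also remove duplicates, but it does not keep the order in which the entities appear in the release.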

In [417]:
## your code here for part A

doc = nlp(pharma)
entities = [(ent.text, ent.label_) for ent in doc.ents]
entities

[('Insys Therapeutics Inc.', 'ORG'),
 ('today', 'DATE'),
 ('Fentanyl', 'PERSON'),
 ('More than 20,000', 'CARDINAL'),
 ('Americans', 'NORP'),
 ('last year', 'DATE'),
 ('millions', 'CARDINAL'),
 ('Jeff Sessions', 'PERSON'),
 ('This Justice Department', 'ORG'),
 ('Trump', 'PERSON'),
 ('American', 'NORP'),
 ('”John N. Kapoor', 'PERSON'),
 ('74', 'DATE'),
 ('Phoenix', 'GPE'),
 ('Ariz.', 'GPE'),
 ('the Board of Directors', 'ORG'),
 ('Insys', 'ORG'),
 ('this morning', 'TIME'),
 ('Arizona', 'GPE'),
 ('RICO', 'LAW'),
 ('Kapoor', 'PERSON'),
 ('Executive', 'ORG'),
 ('Board', 'ORG'),
 ('Insys', 'ORG'),
 ('Phoenix', 'GPE'),
 ('today', 'DATE'),
 ('U.S.', 'GPE'),
 ('District Court', 'ORG'),
 ('Boston', 'GPE'),
 ('a later date', 'DATE'),
 ('today', 'DATE'),
 ('Boston', 'GPE'),
 ('Insys', 'ORG'),
 ('December 2016.The', 'DATE'),
 ('Kapoor', 'GPE'),
 ('Michael L. Babich', 'PERSON'),
 ('40', 'DATE'),
 ('Scottsdale', 'GPE'),
 ('Ariz.', 'GPE'),
 ('Alec Burlakoff', 'PERSON'),
 ('42', 'DATE'),
 ('Charlotte', 

In [418]:
## your code here for part B
laws = [(word, tag) for word, tag in entities if tag == 'LAW']
laws

[('RICO', 'LAW'), ('the Controlled Substances Act', 'LAW'), ('RICO', 'LAW')]

C. Use Google to summarize in one sentence what the `RICO` named entity means and why this might apply to a pharmaceutical kickbacks case (and not just a mafia case...) 

## Part C- Defining RICO:
A charge under the Racketeer Influenced and Corrupt Organizations Act (**RICO**) targets people who take part in an organized criminal enterprise through a pattern of illegal activity, which is why it can apply to, for instance, a pharmaceutical company showing a systematic pattern of fraud and bribery to increase profits.

D. You want to extract the possible sentence lengths the CEO is facing; pull out the named entities with (1) the label `DATE` and (2) that contain the word year or years (hint: you may want to use the `re` module for that second part). Print these named entities.

In [419]:
## your code here
dates = [(word, tag) for word, tag in entities if tag == 'DATE' and re.search(r'\byear(s)?\b', word, re.IGNORECASE)]
dates

[('last year', 'DATE'),
 ('20 years', 'DATE'),
 ('three years', 'DATE'),
 ('five years', 'DATE'),
 ('three years', 'DATE')]
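To see what the word-boundary pattern in part D keeps and drops, here is a small check on a few hypothetical strings. These are made up for illustration and are not taken from the press release. It uses the equivalent shorter form `\byears?\b`.

```python
import re

# Matches "year" or "years" as a whole word, in any letter case
YEAR_PATTERN = re.compile(r"\byears?\b", re.IGNORECASE)

# Hypothetical DATE-style strings
candidates = [
    "last year",
    "20 years",
    "Five Years",
    "yearly",
    "a 10-year term",
    "December 2016",
]

matches = [text for text in candidates if YEAR_PATTERN.search(text)]
print(matches)  # ['last year', '20 years', 'Five Years', 'a 10-year term']
```

The pattern rejects "yearly" because no word boundary follows "year", but it still keeps "last year", which is not a sentence length, so the results need a manual read.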

E. Pull and print the original parts of the press release where those year lengths are mentioned (e.g., the sentences or rough region of the press release). Describe in your own words (1 sentence) what length of sentence (prison) and probation (supervised release) the CEO may be facing if convicted after this indictment (if multiple lengths are mentioned, describe the maximum).

**Hint**: you may want to use re.search or re.findall 

- For part E, you can use `re.search` and `re.findall`, or anything that works 😳.
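One way to pull the rough region around each year length from a single string is `re.finditer` with a fixed character window. The sketch below makes several assumptions: `sample_text` is a hypothetical stand-in for `pharma` (not the real release text), `regions_with_terms` is an illustrative helper name, and the 40-character window is arbitrary.

```python
import re

# Hypothetical stand-in for `pharma`; not the real press release text
sample_text = (
    "The defendant was arrested today. "
    "The first charge carries a sentence of up to 20 years in prison. "
    "It also carries three years of supervised release. "
    "The case was investigated by several agencies."
)
year_lengths = ["20 years", "three years", "three years"]

def regions_with_terms(text, terms, window=40):
    """Return the text surrounding each occurrence of any term."""
    unique_terms = list(dict.fromkeys(terms))  # drop repeated terms
    if not unique_terms:
        return []
    # re.escape guards against terms that contain regex metacharacters
    pattern = "|".join(re.escape(term) for term in unique_terms)
    regions = []
    for match in re.finditer(pattern, text):
        start = max(match.start() - window, 0)
        end = min(match.end() + window, len(text))
        regions.append(text[start:end])
    return regions

regions = regions_with_terms(sample_text, year_lengths)
for region in regions:
    print(region)
    print("---")
```

Called as `regions_with_terms(pharma, year_lengths)`, this searches only the one release rather than every row of `doj`.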

In [420]:
## convert dates to a list of strings (the tuples were causing issues)

year_lengths = [year_length for year_length, tag in dates]
year_lengths

## return every press release that mentions one of the year lengths
## note: this searches all of `doj`, not only the `pharma` release
def contains_year_length(press_release):
    return any(year_length in press_release for year_length in year_lengths)

## subset doj to the rows flagged by contains_year_length
filtered_doj = doj[doj['contents'].apply(contains_year_length)]

## check that the filter worked: filtered_doj should have fewer rows
len(doj)
len(filtered_doj)

## print rows
filtered_doj


['last year', '20 years', 'three years', 'five years', 'three years']

13087

3986

Unnamed: 0,id,title,contents,date,topics_clean,components_clean
1,12-919,$1 Million in Restitution Payments Announced to Preserve North Carolina Wetlands,"WASHINGTON – North Carolina’s Waccamaw River watershed will benefit from a $1 million restitution order from a federal court, funding environmental projects to acquire and preserve wetlands in an area damaged by illegal releases of wastewater from a corporate hog farm, announced Ignacia S. Moreno, Assistant Attorney General of the Justice Department’s Environment and Natural Resources Division; U.S. Attorney for the Eastern District of North Carolina Thomas G. Walker; Director Greg McLeod from the North Carolina State Bureau of Investigation; and Camilla M. Herlevich, Executive Director of the North Carolina Coastal Land Trust. Freedman Farms Inc. was sentenced in February 2012 to five years of probation and ordered to pay $1.5 million in fines, restitution and community service payments for violating the Clean Water Act when it discharged hog waste into a stream that leads to the Waccamaw River. William B. Freedman, president of Freedman Farms, was sentenced to six months in prison to be followed by six months of home confinement. Freedman Farms also is required to implement a comprehensive environmental compliance program and institute an annual training program. In an order issued on April 19, 2012, the court ordered that the defendants would be responsible for restitution of $1 million in the form of five annual payments starting in January 2013, which the court will direct to the North Carolina Coastal Land Trust (NCCLT). The NCCLT plans to use the money to acquire and conserve land along streams in the Waccamaw watershed. The court also directed a $75,000 community service payment to the Southern Environmental Enforcement Network, an organization dedicated to environmental law enforcement training and information sharing in the region. 
“The resolution of the case against Freedman Farms demonstrates the commitment of the Department of Justice to enforcing the Clean Water Act to ensure the protection of human health and the environment,” said Assistant Attorney General Moreno. “The court-ordered restitution in this case will conserve wetlands for the benefit of the people of North Carolina. By enforcing the nation’s environmental laws, we will continue to ensure that concentrated animal feeding operations (CAFOs) operate without threatening our drinking water, the health of our communities and the environment.” “This office is committed to doing our part to hold accountable those who commit crimes against our environment, which can cause serious health problems to residents and damage the environment that makes North Carolina such a beautiful place to live and visit,” said U.S. Attorney Walker. “This case shows what we can accomplish when our SBI agents work closely with their local, state and federal partners to investigate environmental crimes and hold the polluters accountable,” said Director McLeod. “We’ll continue our efforts to fight illegal pollution that damages our water and puts the public’s health at risk.” “The Waccamaw is unique and wild,” said Director Herlevich of the North Carolina Coastal Land Trust. “Its watershed includes some of the most extensive cypress gum swamps in the state, and its headwaters at Lake Waccamaw contain fish that are found nowhere else on Earth. We appreciate the trust of the court and the U. S. Attorney, and we look forward to using these funds for conservation projects in a river system that is one of our top conservation priorities.” According to evidence presented in court, in December 2007 Freedman Farms discharged hog waste into Browder’s Branch, a tributary to the Waccamaw River that flows through the White Marsh, a large wetlands complex. 
Freedman Farms, located in Columbus County, N.C., is in the business of raising hogs for market, and this particular farm had some 4,800 hogs. The hog waste was supposed to be directed to two lagoons for treatment and disposal. Instead, hog waste was discharged from Freedman Farms directly into Browder’s Branch. The Clean Water Act is a federal law that makes it illegal to knowingly or negligently discharge a pollutant into a water of the United States. The Freedman case was investigated by the U.S. Environmental Protection Agency (EPA) Criminal Investigation Division, the U.S. Army Corps of Engineers and the North Carolina State Bureau of Investigation, with assistance from the EPA Science and Ecosystem Support Division. The case was prosecuted by Assistant U.S. Attorney J. Gaston B. Williams of the Eastern District of North Carolina and Trial Attorney Mary Dee Carraway of the Environmental Crimes Section of the Justice Department’s Environment and Natural Resources Division. The North Carolina Coastal Land Trust is celebrating its 20th anniversary of saving special lands in eastern North Carolina. The organization has protected nearly 50,000 acres of lands with scenic, recreational, historic and ecological values. North Carolina Coastal Land Trust has saved streams and wetlands that provide clean water, forests that are havens for wildlife, working farms that provide local food and nature parks that everyone can enjoy. More information about the Coastal Land Trust is available at www.coastallandtrust.org.",2012-07-25T00:00:00-04:00,No topic,Environment and Natural Resources Division
15,18-78,2 Men Charged With Conspiring to Illegally Obtain Technology and Computer Chips That Were Sent to China,"Federal authorities arrested Yi-Chi Shih, 62, and Kiet Ahn Mai, 63, on Jan. 19, on federal charges that allege a scheme to illegally obtain technology and integrated circuits with military applications that were exported to a Chinese company without the required export license. The announcement was made by Acting Assistant Attorney General for National Security Dana J. Boente; U.S. Attorney Nicola T. Hanna for the Central District of California; Assistant Director in Charge Paul Delacourt of the FBI’s Los Angeles Field Office; Special Agent in Charge R. Damon Rowe of IRS Criminal Investigation; Special Agent in Charge Richard Weir of the U.S. Department of Commerce, Bureau of Industry and Security, Office of Export Enforcement, Los Angeles Field Office. “According to the complaint, the defendants allegedly schemed to illegally export semiconductors having military and civilian applications to a Chinese company,” said Acting Assistant Attorney General Boente. “Protecting this type of technology and preventing its illegal acquisition by our adversaries remains a key priority in preserving our national security.” “This case outlines a scheme to secure proprietary technology, some of which was allegedly sent to China, where it could be used to provide companies there with significant advantages that would compromise U.S. business interests,” said U.S. Attorney Hanna. 
“The very sensitive information would also benefit foreign adversaries who could use the technology to further or develop military applications that would be detrimental to our national security.” “The FBI, working jointly with our law enforcement partners, remains committed to bringing to justice those who seek to illegally export some of our nation’s most sensitive technologies to the detriment of our national security and hard-working United States companies,” said Assistant Director in Charge Delacourt. “Rest assured, the FBI will continue to diligently pursue any and all leads that involve the illegal exportation of U.S. technology which will cause harm to our long-term national security interests.” “Today’s actions serve as a reminder that the government will hold individuals accountable who fraudulently procure and export unlawfully protected United States technology and attempt to conceal their criminal activity through international money laundering,” said Special Agent in Charge Rowe. “The IRS plays an important role in tracing illicit funds through both domestic and international financial intuitions. The IRS is proud to partner with the FBI and Department of Commerce and share its world-renowned financial investigative expertise in this investigation.” “Today’s arrests demonstrate the Office of Export Enforcement’s strong commitment to enforcing our nation’s export control and public safety laws,” said Special Agent in Charge Weir. “We will continue to work with our law enforcement partners to identify, deter, and keep the most sensitive U.S. origin goods and technology out of the most dangerous hands.” Shih, an electrical engineer who is a part-time Los Angeles resident and a naturalized U.S. citizen originally from Taiwan, and Mai who resides in Pasadena, California and is a naturalized U.S. citizen originally from Vietnam, were arrested on Jan. 19, without incident by federal agents. 
Shih and Mai, who previously worked together at two different companies, are named in a criminal complaint unsealed on Jan. 19, that charges them with conspiracy. Shih is also charged with violating the International Emergency Economic Powers Act (IEEPA), a federal law that makes illegal, among other things, certain unauthorized exports. The complaint alleges that Shih and Mai conspired to illegally provide Shih with unauthorized access to a protected computer of a U.S. company that manufactured specialized, high-speed computer chips known as monolithic microwave integrated circuits (MMICs). The conspiracy count also alleges that the two men engaged in mail fraud, wire fraud and international money laundering to further the scheme. According to the affidavit in support of the criminal complaint, Shih and Mai executed a scheme to defraud the U.S. company out of its proprietary, export-controlled items, including technology associated with its design services for MMICs. As part of the scheme, Shih and Mai accessed the victim company’s computer systems via its web portal after Mai obtained that access by posing as a domestic customer seeking to obtain custom-designed MMICs that would be used solely in the United States. Shih and Mail allegedly concealed Shih’s true intent to transfer the U.S. company’s technology and products to the People’s Republic of China. The victim company’s proprietary semiconductor technology has a number of commercial and military applications, and its customers include the Air Force, Navy and the Defense Advanced Research Projects Agency. MMICs are used in electronic warfare, electronic warfare countermeasures and radar applications. The computer chips at the heart of this case allegedly were shipped to Chengdu GaStone Technology Company (CGTC), a Chinese company that established a MMIC manufacturing facility in Chengdu. 
Shih was the president of CGTC, which in 2014 was placed on the Commerce Department’s Entity List, according to the affidavit, “due to its involvement in activities contrary to the national security and foreign policy interest of the United States – specifically, that it had been involved in the illicit procurement of commodities and technologies for unauthorized military end use in China.” Because it was on the Entity List, a license from the Commerce Department was required to export U.S.-origin MMICs to CGTC, and there was a “presumption of denial” of a license. The complaint outlines a scheme in which Shih used a Los Angeles-based company he controlled – Pullman Lane Productions, LLC – to funnel funds provided by Chinese entities to finance the manufacturing of MMICs by the victim company. The complaint affidavit alleges that Pullman Lane received financing from a Beijing-based company that was placed on the Entity List the same day as CGTC “on the basis of its involvement in activities contrary to the national security and foreign policy interests of the United States.” Mai acted as the middleman by using his Los Angeles company – MicroEx Engineering – to pose as a legitimate domestic customer that ordered and paid for the manufacturing of MMICs that Shih illegally exported to CGTC in China, according to the complaint. It is the export of the MMICs that forms the basis of the IEEPA violation alleged against Shih. The specific exported MMICs also required a license from the Commerce Department before being exported to China, and a license was never sought or obtained for this export. Shih and Mai are expected to made their first court appearances on Jan. 19, in U.S. District Court in downtown Los Angeles. The charges contained in the Complaint are merely accusations, and the defendants are presumed innocent unless and until proven guilty. If convicted, Mai faces a maximum sentence of five years in prison, and Shih faces a maximum sentence of 25 years in prison. 
The maximum statutory sentences are prescribed by Congress and are provided here for informational purposes. If convicted of any offense, the sentencing of the defendants will be determined by the court based on the advisory Sentencing Guidelines and other statutory factors. This case is being investigated by the FBI; the U.S. Department of Commerce, Bureau of Industry and Security, Office of Export Enforcement; and IRS Criminal Investigation. This case is being prosecuted by Assistant U.S. Attorneys Judith A. Heinz, Melanie Sartoris and Khaldoun Shobaki of the Central District of California, and Trial Attorney Matthew Walczewski of the National Security Division Counterintelligence and Export Control Section.",2018-01-23T00:00:00-05:00,No topic,"National Security Division (NSD); USAO - California, Central"
16,18-78,2 Men Charged With Conspiring to Illegally Obtain Technology and Computer Chips That Were Sent to China,"Federal authorities arrested Yi-Chi Shih, 62, and Kiet Ahn Mai, 63, on Jan. 19, on federal charges that allege a scheme to illegally obtain technology and integrated circuits with military applications that were exported to a Chinese company without the required export license. The announcement was made by Acting Assistant Attorney General for National Security Dana J. Boente; U.S. Attorney Nicola T. Hanna for the Northern District of California; Assistant Director in Charge Paul Delacourt of the FBI’s Los Angeles Field Office; Special Agent in Charge R. Damon Rowe of IRS Criminal Investigation; Special Agent in Charge Richard Weir of the U.S. Department of Commerce, Bureau of Industry and Security, Office of Export Enforcement, Los Angeles Field Office. “According to the complaint, the defendants allegedly schemed to illegally export semiconductors having military and civilian applications to a Chinese company,” said Acting Assistant Attorney General Boente. “Protecting this type of technology and preventing its illegal acquisition by our adversaries remains a key priority in preserving our national security.” “This case outlines a scheme to secure proprietary technology, some of which was allegedly sent to China, where it could be used to provide companies there with significant advantages that would compromise U.S. business interests,” said U.S. Attorney Hanna. 
“The very sensitive information would also benefit foreign adversaries who could use the technology to further or develop military applications that would be detrimental to our national security.” “The FBI, working jointly with our law enforcement partners, remains committed to bringing to justice those who seek to illegally export some of our nation’s most sensitive technologies to the detriment of our national security and hard-working United States companies,” said Assistant Director in Charge Delacourt. “Rest assured, the FBI will continue to diligently pursue any and all leads that involve the illegal exportation of U.S. technology which will cause harm to our long-term national security interests.” “Today’s actions serve as a reminder that the government will hold individuals accountable who fraudulently procure and export unlawfully protected United States technology and attempt to conceal their criminal activity through international money laundering,” said Special Agent in Charge Rowe. “The IRS plays an important role in tracing illicit funds through both domestic and international financial intuitions. The IRS is proud to partner with the FBI and Department of Commerce and share its world-renowned financial investigative expertise in this investigation.” “Today’s arrests demonstrate the Office of Export Enforcement’s strong commitment to enforcing our nation’s export control and public safety laws,” said Special Agent in Charge Weir. “We will continue to work with our law enforcement partners to identify, deter, and keep the most sensitive U.S. origin goods and technology out of the most dangerous hands.” Shih, an electrical engineer who is a part-time Los Angeles resident and a naturalized U.S. citizen originally from Taiwan, and Mai who resides in Pasadena, California and is a naturalized U.S. citizen originally from Vietnam, were arrested on Jan. 19, without incident by federal agents. 
Shih and Mai, who previously worked together at two different companies, are named in a criminal complaint unsealed on Jan. 19, that charges them with conspiracy. Shih is also charged with violating the International Emergency Economic Powers Act (IEEPA), a federal law that makes illegal, among other things, certain unauthorized exports. The complaint alleges that Shih and Mai conspired to illegally provide Shih with unauthorized access to a protected computer of a U.S. company that manufactured specialized, high-speed computer chips known as monolithic microwave integrated circuits (MMICs). The conspiracy count also alleges that the two men engaged in mail fraud, wire fraud and international money laundering to further the scheme. According to the affidavit in support of the criminal complaint, Shih and Mai executed a scheme to defraud the U.S. company out of its proprietary, export-controlled items, including technology associated with its design services for MMICs. As part of the scheme, Shih and Mai accessed the victim company’s computer systems via its web portal after Mai obtained that access by posing as a domestic customer seeking to obtain custom-designed MMICs that would be used solely in the United States. Shih and Mail allegedly concealed Shih’s true intent to transfer the U.S. company’s technology and products to the People’s Republic of China. The victim company’s proprietary semiconductor technology has a number of commercial and military applications, and its customers include the Air Force, Navy and the Defense Advanced Research Projects Agency. MMICs are used in electronic warfare, electronic warfare countermeasures and radar applications. The computer chips at the heart of this case allegedly were shipped to Chengdu GaStone Technology Company (CGTC), a Chinese company that established a MMIC manufacturing facility in Chengdu. 
Shih was the president of CGTC, which in 2014 was placed on the Commerce Department’s Entity List, according to the affidavit, “due to its involvement in activities contrary to the national security and foreign policy interest of the United States – specifically, that it had been involved in the illicit procurement of commodities and technologies for unauthorized military end use in China.” Because it was on the Entity List, a license from the Commerce Department was required to export U.S.-origin MMICs to CGTC, and there was a “presumption of denial” of a license. The complaint outlines a scheme in which Shih used a Los Angeles-based company he controlled – Pullman Lane Productions, LLC – to funnel funds provided by Chinese entities to finance the manufacturing of MMICs by the victim company. The complaint affidavit alleges that Pullman Lane received financing from a Beijing-based company that was placed on the Entity List the same day as CGTC “on the basis of its involvement in activities contrary to the national security and foreign policy interests of the United States.” Mai acted as the middleman by using his Los Angeles company – MicroEx Engineering – to pose as a legitimate domestic customer that ordered and paid for the manufacturing of MMICs that Shih illegally exported to CGTC in China, according to the complaint. It is the export of the MMICs that forms the basis of the IEEPA violation alleged against Shih. The specific exported MMICs also required a license from the Commerce Department before being exported to China, and a license was never sought or obtained for this export. Shih and Mai are expected to made their first court appearances on Jan. 19, in U.S. District Court in downtown Los Angeles. The charges contained in the Complaint are merely accusations, and the defendants are presumed innocent unless and until proven guilty. If convicted, Mai faces a maximum sentence of five years in prison, and Shih faces a maximum sentence of 25 years in prison. 
The maximum statutory sentences are prescribed by Congress and are provided here for informational purposes. If convicted of any offense, the sentencing of the defendants will be determined by the court based on the advisory Sentencing Guidelines and other statutory factors. This case is being investigated by the FBI; the U.S. Department of Commerce, Bureau of Industry and Security, Office of Export Enforcement; and IRS Criminal Investigation. This case is being prosecuted by Assistant U.S. Attorneys Judith A. Heinz, Melanie Sartoris and Khaldoun Shobaki of the Northern District of California, and Trial Attorney Matthew Walczewski of the National Security Division Counterintelligence and Export Control Section.",2018-01-23T00:00:00-05:00,No topic,"National Security Division (NSD); USAO - California, Northern"
18,14-550,$20 Million Stolen Identity Refund Fraud Ring Indicted,"Tracy Mitchell, Dameisha Mitchell, Latasha Mitchell, Keisha Lanier, Tameka Hoskins, Sharondra Johnson, Cynthia Johnson, Mequetta Snell-Quick, Talarious Paige and Patrice Taylor were indicted for their roles in a $20 million stolen identity refund fraud (SIRF) conspiracy, Assistant Attorney General Kathryn Keneally of the Justice Department's Tax Division and U.S. Attorney George L. Beck Jr. for the Middle District of Alabama announced today following the unsealing of the superseding indictment yesterday. According to the superseding indictment, between January 2011 and December 2013, the defendants ran a large-scale identity theft ring in which they filed over 7,000 false tax returns that claimed in excess of $20 million in fraudulent claims. The defendants obtained stolen identities from various sources to be used in filing false returns. Tracy Mitchell worked at the hospital on Fort Benning in Columbus, Georgia, where she had access to the identification data of military personnel, including soldiers who were deployed to Afghanistan. Tracy Mitchell and her daughter, Latasha Mitchell, also obtained stolen identities from an Alabama state agency. Keisha Lanier obtained stolen identities from the Alabama Department of Corrections. Talarious Paige and Patrice Taylor worked in a call center for a Columbus company and stole identities. According to the superseding indictment, in order to file tax returns, the defendants obtained Electronic Filing Numbers in the names of several tax preparation businesses. On behalf of those tax preparation businesses, the defendants applied for bank products from various financial institutions, which mailed blank check stock to the defendants’ homes. The defendants directed anticipated tax refunds to prepaid debit cards, to U.S. Treasury checks and to financial institutions, which in turn issued the refunds via checks or prepaid debit cards. The defendants directed U.S. 
Treasury checks to be mailed to several addresses in Alabama and then obtained those checks from the mail. The defendants coordinated the cashing of the refund checks by sending various text messages among themselves. The defendants cashed the fraudulent checks at several businesses located in Alabama, Georgia and Kentucky. In addition to the conspiracy charge, the defendants are also charged with mail and wire fraud, access device fraud and aggravated identity theft. An indictment merely alleges that crimes have been committed and the defendants are presumed innocent until proven guilty beyond a reasonable doubt. If convicted, each defendant faces a statutory maximum potential sentence of 10 years in prison for the conspiracy charge, a statutory maximum potential sentence of 20 years in prison for each wire and mail fraud count, a statutory maximum potential sentence of 15 years in prison for each access device fraud count, and a mandatory two year sentence in prison for each aggravated identity theft count. The defendants are also subject to fines, forfeiture and mandatory restitution if convicted. The case was investigated by special agents of the Internal Revenue Service - Criminal Investigation and the U.S. Army – Criminal Investigation Division. Trial Attorney Michael Boteler of the Tax Division and Assistant U.S. Attorney Todd Brown for the Middle District of Alabama are prosecuting the case. The U.S. Attorney’s Office for the Middle District of Georgia provided assistance in this matter.",2014-05-22T00:00:00-04:00,No topic,Tax Division
19,17-1419,2017 Southeast Regional Animal Cruelty Prosecutions Training Held at Valdosta State University,"The United States Attorney’s Office for the Middle District of Georgia, the Environmental Crimes Section of the United States Department of Justice’s Environment and Natural Resources Division, and the United States Department of Agriculture – Office of Inspector General hosted training for the Southeast region on animal cruelty prosecutions, Dec. 13-14. This training represents the collaboration and coordination of federal and local agencies and offices to combat crimes of animal cruelty, including organized dog fighting, cock fighting, and horse soring. The conference provided participants with an overview of the federal animal welfare and cruelty statutes, investigation techniques, and strategies to overcome prosecution challenges. The Humane Society of the United States, along with prosecutors and federal agents, shared their experience in handling dog fighting and animal cruelty cases strengthening the response to these serious crimes. “Fighting contests involving dogs and other animals are morally wrong and illegal, said United States Attorney Charles E. Peeler.” They also create havens for additional illegal conduct such as gambling, drug trade and unlawful gun possession. Our office works with federal, state and local law enforcement agencies to identify and prosecute those involved in this reprehensible conduct.” “Ending animal fighting ventures and other inhumane practices will require a close partnership among federal, state, and local law enforcement agencies,” said Acting Assistant Attorney General Jeffrey H. Wood of the Justice Department’s Environment and Natural Resources Division. 
“Our Division is proud to be a leader in this worthy cause and to participate in this important training event in the wonderful city of Valdosta, Georgia.” “The USDA OIG is pleased to have worked closely with the Department of Justice to coordinate this important training initiative to combat animal fighting and the associated crimes which often occur in animal fighting ventures,” said Special Agent in Charge Karen Citizen-Wilcox for the USDA OIG Southeast Region Office of Investigations. “Special Agents from all of the OIG’s regional offices will share their knowledge of and experiences with animal fighting investigations with personnel attending from other law enforcement agencies and private organizations.” During the training, animal fighting investigators from the Humane Society of the United States, along with prosecutors and USDA OIG agents who have successfully investigated and prosecuted animal fighting cases, shared their experiences with attendees. Instructors provided participants with an overview of the business of dog fighting, a description of federal animal welfare and cruelty statutes, effective investigative techniques, evidence collection best practices, available resources and authorities for the seizure and post-seizure care of animals and successful sentencing strategies. State and national animal control associations estimate that upwards of 40,000 people participate in dog fighting in the United States at a professional level, meaning that dog fighting and its associated gambling are their primary or only source of income. An unknown but potentially larger number of people participate in dog fighting on an occasional basis. Cockfighting is thought to be similarly widespread. In addition, animal fighting activities attract other serious crimes, such as gambling, drug dealing, weapons offenses and money laundering. Children are commonly present at animal fighting events. 
The federal Animal Welfare Act makes it a felony punishable by up to five years in prison to knowingly sell, buy, possess, train, transport, deliver, or receive any animal, including dogs, for purposes of having the animal participate in an animal fighting venture. In 2014, the Department of Justice designated the Environment and Natural Resources Division as the centralized body within the Department responsible for tracking, coordinating, and working with the U.S. Attorneys’ Offices on animal cruelty enforcement matters.",2017-12-14T00:00:00-05:00,Environment,"Environment and Natural Resources Division; USAO - Georgia, Middle"
...,...,...,...,...,...,...
13063,14-482,Wyoming Businessman Sentenced to Prison for Using Concealed Caribbean Bank Account in Tax Evasion Scheme,"Robert C. Sathre was sentenced today to serve 36 months in federal prison for tax evasion by U.S. District Judge Alan B. Johnson in Cheyenne, Wyoming, the Justice Department and Internal Revenue Service (IRS) announced. Sathre was also ordered to pay $3,113,882 in restitution to the IRS and to serve three years of supervised release. Sathre pleaded guilty on Feb. 26, 2014, to willfully evading the payment of his 1995 and 1996 tax liability. According to court documents and proceedings, Sathre sold a Minnesota business and received installment payments in 1995 and 1996 of more than $3 million. Sathre concealed his income by filing a 1995 tax return in which he reported only $64,928 in total income. Sathre then purchased land and set up another business, a gas station and convenience store in Sheridan, Wyoming, known as the Rock Stop. According to court documents and proceedings, Sathre concealed assets by opening a foreign bank account in the Caribbean island of Nevis and by using purported trusts. In a 10 month period spanning from 2005 through 2006, Sathre sent over $500,000 to the account in Nevis to keep the funds out of reach from the IRS. When Sathre sold the Rock Stop in 2007, he wired over $1,250,000 from the sale proceeds to the trust account of a Wyoming law firm. He later directed the law firm to wire $900,000 from the trust account to his account at the Bank of Nevis. Sathre also provided a false declaration and false promissory note to the Bank of Nevis to conceal the source of this transfer and obtained a debit card linked to the foreign account to access funds locally. In addition, Sathre provided the Bank of Sheridan with an IRS form on which he falsely claimed that he was neither a citizen nor a resident of the United States. This case was investigated by special agents of IRS – Criminal Investigation. 
Trial Attorneys Ellen Quattrucci and Ignacio Perez de la Cruz of the Justice Department’s Tax Division prosecuted the case.",2014-05-07T00:00:00-04:00,No topic,Tax Division
13064,13-678,Wyoming Couple Indicted for Tax Evasion,"In an indictment unsealed on June 12, 2013, Robert and Judy Sathre, of Sheridan, Wyo. were charged by a federal grand jury in Cheyenne, Wyo., for conspiring to defraud the IRS and tax evasion relating to taxes owed by Robert Sathre for tax years 1995 and 1996. Judy Sathre was also charged with filing a false tax return for tax year 2007. According to the indictment, Robert Sathre sold a Minnesota business and received installment payments in 1995 and 1996 for more than three million dollars. Robert Sathre concealed his income by filing a 1995 tax return in which he reported only $64,928 in total income. Robert Sathre then purchased land and set up another business, a gas station/convenience store in Sheridan, Wyo. known as the Rock Stop. According to the indictment, the Sathres concealed assets by opening a foreign bank account in the Caribbean island of Nevis and by using purported trusts. In a ten-month period spanning 2005-2006, Mr. Sathre sent over $500,000 to the account in Nevis to keep the funds out of reach from the IRS. When Robert Sathre sold the Rock Stop in 2007, he had over $1,250,000 from the sale proceeds wired to the trust account of a Wyoming law firm. Later the Sathres directed the law firm to wire $900,000 from the trust account to their account at the Bank of Nevis. They also provided a false declaration and false promissory note to the Bank of Nevis to conceal the source of this transfer. Robert Sathre obtained a debit card linked to the foreign account to access funds locally. He also provided the Bank of Sheridan with an IRS form on which he falsely claimed that he was neither a citizen nor a resident of the United States. The indictment also alleges that the Sathres tried to conceal their ownership of real estate. They used a purported trust to encumber their residence at Troon Place in Sheridan and to conceal their ownership of property in Hennepin County in Minnesota. 
To conceal ownership of the Rock Stop, they similarly used a second purported trust, at one point resigning as trustees and appointing their teenage daughter as the trustee. The indictment also charges Judy Sathre with one count of filing a false tax return for 2007. The indictment alleges that the return was false both for reporting only $42 in interest income and for failing to disclose that she had a financial interest and signatory authority over the bank account at the Bank of Nevis. A trial date has not been scheduled. An indictment is merely an accusation, and every defendant is presumed innocent unless and until proven guilty. The conspiracy and tax evasion charges each carry a maximum potential penalty of five years in prison and a fine of $250,000. The false return charge carries a maximum potential penalty of three years in prison and a $250,000 fine. This case is being prosecuted by Trial Attorneys Ellen Quattrucci and Ignacio Perez de la Cruz of the Justice Department’s Tax Division and was investigated by IRS – Criminal Investigation.",2013-06-13T00:00:00-04:00,No topic,Tax Division
13070,10-1127,Wyoming Used Car Dealer Sentenced to 33 Months in Prison\r\nfor Odometer Tampering Scheme,"WASHINGTON – Jay Lee (aka Michael Smith; John Marks; Ricky Marks; and Anthony Romero) was sentenced today in connection with an odometer tampering scheme that defrauded numerous victims in Wyoming and Colorado, among other places, the Justice Department announced. U.S. District Court Judge Alan B. Johnson of Cheyenne, Wyo., sentenced Lee to a term of 33 months in prison and a term of three years of supervised release during which he cannot be involved in the sale of motor vehicles. The court ordered Lee to pay $195,654.55 in restitution. On July 23, 2009, a Casper, Wyo., federal grand jury returned an indictment charging Lee and a co-defendant, Randy Lee, in a 28-count indictment alleging conspiracy, odometer tampering, securities fraud, providing false odometer certifications and mail fraud. Investigators were unable to locate Jay Lee at the time, with the result that Randy Lee proceeded to trial alone. Randy Lee was convicted on Jan. 21, 2010, after a two-week trial, by a federal jury in Cheyenne of 11 felony counts, and is currently serving a 37 month prison sentence. Authorities subsequently located and arrested Jay Lee in Salt Lake City, where he was doing business using the name Michael Smith. On July 29, 2010, Jay Lee pleaded guilty to one count of conspiracy and one count of mail fraud. According to the indictment, the Lees purchased high-mileage, used motor vehicles from various businesses in New Mexico and Wyoming, as well as from a wholesale motor vehicle auction in Loveland, Colo. The Lees were charged with altering the odometers, the motor vehicle titles and sales documentation associated with these vehicle titles in order to reflect a false, lower mileage. As a result, the state of Wyoming issued motor vehicle titles reflecting false, lower mileage, which the Lees knew to be untrue. 
The defendants then sold the motor vehicles to local consumers, other used motor vehicle dealers, and at wholesale auto auctions. As a result, the Lees received higher sales prices for the vehicles they sold. ""Fraud schemes like odometer rollbacks cost consumers by de-valuing one of the most important purchases they make: their automobiles. Corrupt dealers who engage in this fraudulent practice steal consumers' hard-earned money, impede intelligent buying choices, and raise safety concerns by misrepresenting the true condition of the vehicles they sell,"" said Tony West, Assistant Attorney General for the Civil Division of the Department of Justice. ""The Justice Department will seek tough and appropriate sentences for those who try to cheat consumers by engaging in this illegal practice."" Assistant Attorney General West thanked the agencies that worked collaboratively to achieve this result. The underlying investigation was conducted by the Wyoming Department of Transportation’s Office of Compliance and Investigation and the U.S. Department of Transportation’s National Highway Traffic Safety Administration in Denver. The case was prosecuted by attorneys David Sullivan and Alan Phelps in the Office of Consumer Litigation in the Justice Department’s Civil Division.",2010-10-07T00:00:00-04:00,No topic,Civil Division
13081,14-1377,"Yuba City, California, Man Sentenced to 46 Months in Prison for Racially Motivated Attack on White Man and African-American Woman","Anthony Merrell Tyler, 34, of Yuba City, California, was sentenced today by U.S. District Court Judge John A. Mendez to serve 46 months in prison for violating the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act. The crime involved a racially motivated attack by Tyler and two co-defendants, Billy Hammett, 30, and Perry Jackson, 29, on a white man and an African-American woman in Marysville, California, in 2011. In addition to his term of incarceration, Tyler was ordered to serve three years of supervised release upon his release from prison and to pay $175 in restitution. According to documents filed with the court, around 10:45 p.m. on April 18, 2011, a white man and an African-American woman parked their car at a convenience store in Marysville. Shortly afterward, the three defendants attacked the man and woman because of their race. Jackson punched him twice in the head through the open passenger window. At the same time, Hammett opened the driver-side door and kicked the woman in the chest. Seconds later, Tyler smashed the car’s windshield with a crowbar, sending shattered glass into the passenger compartment. As the attack continued, the woman managed to take refuge inside the convenience store and the man struggled to get away. All three assailants then descended upon the male victim and began attacking him in the parking lot. He sustained abrasions on his right forearm and knees, while the woman suffered bruising to her chest. None of the defendants knew their victims. In today’s hearing, and during Hammett and Jackson’s proceedings, Judge Mendez considered the defendants’ backgrounds and criminal histories. Tyler has the words “white pride” tattooed down the backs of his arms and a swastika on his left upper arm. 
He has previously acknowledged being a member of the Yuba County Peckerwoods, a local white supremacist group. Hammett, who has a tattoo of the words “white power” across his abdomen, was previously convicted for the unprovoked assault on a 72-year-old African-American man and was sentenced on March 25, 2014, to 87 months in prison. Jackson, who has the words “white power” tattooed in block letters down his shins, was sentenced on April 29, 2014, to 70 months in prison. Tyler entered his guilty plea on March 11, 2014. “These three defendants targeted the victims because of their race,” said Acting Assistant Attorney General Vanita Gupta for the Civil Rights Division. “This type of attack causes harm not only to the immediate victims, but tears at the fabric of our communities and society itself. The department will continue to vigorously prosecute such acts of racial violence.” “Racially motivated violence not only threatens the harmony of our diverse communities, it undermines the principle of equality under law, which is a foundation of our society,” said U.S. Attorney Benjamin B. Wagner for the Eastern District of California. “For these reasons, prosecuting hate crimes will continue to be one of our highest priorities.” This case was investigated by the FBI, with assistance from the Yuba County Sheriff’s Office and the Yuba County District Attorney’s Office. The case was prosecuted by U.S. Attorney Wagner and Trial Attorney Chiraag Bains of the Justice Department’s Civil Rights Division.",2014-12-09T00:00:00-05:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section; Civil Rights - Housing and Civil Enforcement Section


In [421]:
## Regex pattern to use in the search

pattern = r'(years in)|(years of)'

## Initialize empty list
sentences_with_years = []

## Search each sentence in pharma for the regex pattern with re.search; if there is a match, add the sentence text to the list initialized above
pharma_doc = nlp(pharma)
for sent in pharma_doc.sents:
    if re.search(pattern, sent.text, re.IGNORECASE):
        sentences_with_years.append(sent.text)
sentences_with_years

['The charges of conspiracy to commit RICO and conspiracy to commit mail and wire fraud each provide for a sentence of no greater than 20 years in prison, three years of supervised release and a fine of $250,000, or twice the amount of pecuniary gain or loss.\xa0 ',
 'The charges of conspiracy to violate the Anti-Kickback Law provide for a sentence of no greater than five years in prison, three years of supervised release and a $25,000 fine.']

If the CEO in the pharma press release is convicted, the maximum prison sentence is **20 years**, with **3 years** of supervised release.
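
The answer above was read off the matched sentences by eye. As a rough sketch of how the same numbers could be pulled out programmatically, the snippet below captures the token before "years" and converts it to an integer. The names `WORD_TO_NUM`, `TERM_PATTERN`, and `extract_terms` are hypothetical helpers, not part of the assignment, and the word-to-number lookup only covers a handful of values.

```python
import re

# Hypothetical lookup for spelled-out numbers; extend as needed.
WORD_TO_NUM = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
               "ten": 10, "fifteen": 15, "twenty": 20}

# Capture the token before "years" plus which kind of term follows it.
TERM_PATTERN = re.compile(
    r"\b(\w+) years (in prison|of supervised release)", re.IGNORECASE
)

def extract_terms(sentence):
    """Return a list of (years, kind) pairs found in one sentence."""
    terms = []
    for amount, kind in TERM_PATTERN.findall(sentence):
        if amount.isdigit():
            years = int(amount)
        else:
            years = WORD_TO_NUM.get(amount.lower())  # None if not in the lookup
        terms.append((years, kind.lower()))
    return terms

# The two sentences matched in the cell above.
example_sentences = [
    "The charges of conspiracy to commit RICO and conspiracy to commit mail "
    "and wire fraud each provide for a sentence of no greater than 20 years "
    "in prison, three years of supervised release and a fine of $250,000, "
    "or twice the amount of pecuniary gain or loss.",
    "The charges of conspiracy to violate the Anti-Kickback Law provide for "
    "a sentence of no greater than five years in prison, three years of "
    "supervised release and a $25,000 fine.",
]

all_terms = [t for sent in example_sentences for t in extract_terms(sent)]
max_prison = max(y for y, kind in all_terms if kind == "in prison" and y is not None)
print(all_terms)
print(max_prison)
```

Taking the maximum over the "in prison" matches gives the longest possible prison term across the charges.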

## 1.3 sentiment analysis  (10 points)

A. Subset the press releases to those labeled with one of three topics via `topics_clean`: Civil Rights, Hate Crimes, and Project Safe Childhood. We'll call this `doj_subset` going forward and it should have 717 rows.



In [422]:
## your code here for subsetting
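## regex pattern: the whole string must equal one of the three topics
## (non-capturing group avoids the pandas match-group warning in str.contains)
pattern = r'^(?:Civil Rights|Hate Crimes|Project Safe Childhood)$'
## subset doj; .copy() so later column assignments don't act on a view of doj
doj_subset = doj[doj['topics_clean'].str.contains(pattern)].copy()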
## check 
print(f'There are {len(doj_subset)} rows in the subset.')
doj_subset.head()

There are 717 rows in the subset.
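
Because the pattern anchors on both `^` and `$`, the filter is really an exact-match test, so `isin` is a possible alternative that skips regex altogether. The sketch below uses a small made-up dataframe (`toy`) rather than the real `doj` data, purely to illustrate the behavior.

```python
import pandas as pd

# Toy stand-in for the real `doj` dataframe (values are made up for illustration).
toy = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "topics_clean": [
        "Civil Rights",
        "No topic",
        "Hate Crimes",
        "Civil Rights; Hate Crimes",  # combined label: not an exact match
    ],
})

keep_topics = ["Civil Rights", "Hate Crimes", "Project Safe Childhood"]

# isin keeps rows whose topic equals one of the listed strings exactly.
toy_subset = toy[toy["topics_clean"].isin(keep_topics)].copy()
print(toy_subset["id"].tolist())
```

Rows with a combined label are excluded, which is the same behavior the anchored regex gives.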


Unnamed: 0,id,title,contents,date,topics_clean,components_clean
77,17-1235,Additional Former Correctional Officer Pleads Guilty to Beating of Handcuffed and Shackled Inmate at Angola State Prison,"A former supervisory correctional officer at Louisiana State Penitentiary in Angola, Louisiana, pleaded guilty yesterday in connection with the beating of a handcuffed and shackled inmate, in addition to conspiring to cover up their misconduct by falsifying official records and lying to internal investigators about what happened. James Savoy, 39, of Marksville, Louisiana, admitted during his plea hearing that he witnessed other officers using excessive force against the inmate and failed to intervene; that he conspired with other officers to cover up the beating by engaging in a variety of obstructive acts; and that he personally falsified official prison records to cover up the attack. Scotty Kennedy, 48, of Beebe, Arkansas, and John Sanders, 30, of Marksville, Louisiana previously pleaded guilty in November 2016, and September 2017, for their roles in the beating and cover up. “Every citizen has the right to due process and protection from unreasonable force, and correctional officers who violate these basic Constitutional rights must be held accountable for their egregious actions” said Acting Assistant Attorney General John Gore of the Civil Rights Division. “The Justice Department will continue to vigorously prosecute correctional officers who violate the public’s trust by committing crimes and to covering up violations of federal criminal law.” “Yesterday is another example of our office’s unwavering commitment to pursuing those who violate the federal criminal civil rights laws,” said Acting United States Attorney for the Middle District of Louisiana Corey Amundson. “We will continue to work closely with the Justice Department’s Civil Rights Division and the FBI to ensure that no one is above the law.” This case is being investigated by the FBI’s Baton Rouge Resident Agency and is being prosecuted by Assistant U.S. 
Attorney Frederick A. Menner, Jr. of the Middle District of Louisiana and Trial Attorney Christopher J. Perras of the Civil Rights Division’s Criminal Section.",2017-11-02T00:00:00-04:00,Civil Rights,"Civil Rights Division; USAO - Louisiana, Middle"
155,15-1522,Alabama Man Found Guilty of Aggravated Sexual Abuse of a Child,"A federal jury convicted Rick Lee Evans, 43, of Anniston, Alabama, today of aggravated sexual abuse of a child after a five-day trial, Assistant Attorney General Leslie R. Caldwell of the Justice Department’s Criminal Division and U.S. Attorney Joyce White Vance of the Northern District of Alabama announced. According to evidence introduced at trial, Evans, a former U.S. Army soldier, and his then-wife, a Department of Defense employee, were residing in Germany when they were asked to take temporary custody of a five-year-old child whose parents were deployed to Iraq with the U.S. Army. Evans sexually abused the child on multiple occasions during the 18 months that the child lived with him from May 2007 to December 2008. Trial Attorney Austin M. Berry of the Criminal Division’s Child Exploitation and Obscenity Section (CEOS) and Assistant U.S. Attorney Jacquelyn Hutzell of the Northern District of Alabama are prosecuting the case. U.S. Army Criminal Investigations Division and the FBI’s Birmingham, Alabama, Division investigated the case. This case was brought as part of Project Safe Childhood, a nationwide initiative to combat the growing epidemic of child sexual exploitation and abuse, launched in May 2006 by the Department of Justice. Led by U.S. Attorneys’ offices and CEOS, Project Safe Childhood marshals federal, state and local resources to better locate, apprehend and prosecute individuals who exploit children via the Internet, as well as to identify and rescue victims. For more information about Project Safe Childhood, please visit www.justice.gov/psc.",2015-12-11T00:00:00-05:00,Project Safe Childhood,"Criminal Division; Criminal - Child Exploitation and Obscenity Section; USAO - Alabama, Northern"
157,16-213,Alabama Man Indicted on Child Pornography and Sex Tourism Charges,"An Alabama native was indicted today and charged with multiple crimes involving travel with intent to engage in illicit sexual conduct with minors and child pornography, announced Assistant Attorney General Leslie R. Caldwell of the Justice Department’s Criminal Division and U.S. Attorney Kenyen R. Brown of the Southern District of Alabama. Clarence Edward Evers Jr., aka Bud, a technology teacher employed by the Conecuh County, Alabama, Board of Education, was arrested on Feb. 11, 2016, and was charged today with five counts of travel with intent to engage in illicit sexual conduct with a minor, one count of attempted travel with intent to engage in illicit sexual conduct with a minor, one count of production and attempted production of child pornography, one count of transportation of child pornography, one count of receipt of child pornography, one count of access with intent to view child pornography and one count of possession of child pornography. According to the indictment, Evers allegedly traveled to Thailand in the summers of 2010 through 2014 for the purpose of engaging in illicit sexual conduct with a minor and allegedly attempted to make a similar trip in the spring of 2015. During the 2014 trip, Evers also allegedly photographed his victims’ abuse and then transported the images back to the United States. In addition, Evers allegedly had other images of child sexual exploitation on his computers and other electronic devices. The charges contained in the indictment are only allegations. Evers is presumed innocent unless and until he is proven guilty beyond a reasonable doubt in a court of law. ICE-HSI is investigating this case. Trial Attorney James E. Burke IV of the Criminal Division’s Child Exploitation and Obscenity Section (CEOS) and Assistant U.S. Attorneys Sean P. Costello and Maria E. Murphy of the Southern District of Alabama are prosecuting the case. 
This case was brought as part of Project Safe Childhood, a nationwide initiative to combat the growing epidemic of child sexual exploitation and abuse launched in May 2006 by the Department of Justice. Led by U.S. Attorneys’ Offices and CEOS, Project Safe Childhood marshals federal, state and local resources to better locate, apprehend and prosecute individuals who exploit children via the Internet, as well as to identify and rescue victims. For more information about Project Safe Childhood, please visit www.justice.gov/psc.",2016-02-24T00:00:00-05:00,Project Safe Childhood,"Criminal Division; Criminal - Child Exploitation and Obscenity Section; USAO - Alabama, Southern"
162,16-381,Alabama Man Indicted for Producing Child Pornography Involving Multiple Victims,"An Alabama man was indicted today by a federal grand jury in Birmingham, Alabama, on charges related to the production of child pornography involving four minor victims, announced Assistant Attorney General Leslie R. Caldwell of the Justice Department’s Criminal Division and U.S. Joyce White Vance of the Northern District of Alabama. Gregory Jerome Lee, 53, formerly of Cullman County, Alabama, was indicted on four counts of production of child pornography, one count of conspiracy to advertise child pornography and one count of conspiracy to distribute and receive child pornography. According to the indictment, from September 1996 through December 2004, Lee used, persuaded, coerced and enticed minors to engage in sexually explicit conduct in order to produce images of that conduct. Between September 1996 and August 2007, Lee conspired with other individuals to distribute and receive child pornography through a variety of means, including the Internet. The U.S. Postal Inspection Service (USPIS) is investigating the case. Trial Attorney Amy E. Larson of the Criminal Division’s Child Exploitation and Obscenity Section (CEOS) and Assistant U.S. Attorney Jacquelyn Hutzell of the Northern District of Alabama are prosecuting the case. The charges and allegations contained in an indictment are merely accusations. The defendant is presumed innocent unless and until proven guilty. Members of the public who may have information related to this matter should call the USPIS Birmingham Office at (205) 326-2909. This case was brought as part of Project Safe Childhood, a nationwide initiative to combat the growing epidemic of child sexual exploitation and abuse launched in May 2006 by the Department of Justice. Led by U.S. 
Attorneys’ Offices and CEOS, Project Safe Childhood marshals federal, state and local resources to better locate, apprehend and prosecute individuals who exploit children via the Internet, as well as to identify and rescue victims. For more information about Project Safe Childhood, please visit www.justice.gov/psc.",2016-03-30T00:00:00-04:00,Project Safe Childhood,"Criminal Division; Criminal - Child Exploitation and Obscenity Section; USAO - Alabama, Northern"
168,14-464,Alabama Man Indicted for Threatening African-American Man and Another Person at Restaurant,"Jeremy Heath Higgins was indicted for threatening an African-American man at a Quinton, Alabama, restaurant, and for threatening another person who ordered Higgins to leave the restaurant due to his behavior, Acting Assistant Attorney General Jocelyn Samuels for the Justice Department’s Civil Rights Division and U.S. Attorney Joyce Vance for the Northern District of Alabama announced today. Higgins, 28, was charged in a three count indictment returned yesterday by a federal grand jury in the U.S. District Court for the Northern District of Alabama. The indictment charges him with one felony count and two misdemeanor counts of interference with a federally-protected activity. The indictment alleges that on June 14, 2013, Higgins approached and threatened an African-American man at the Alabama Rose Steakhouse because the man was present at the restaurant with a white woman. According to the indictment, another person ordered Higgins to leave the premises of the restaurant because of Higgins’ behavior toward the African-American man, after which Higgins allegedly shouted a threat to burn down the restaurant. The indictment further alleges that Higgins threatened the person who had ordered him to leave the restaurant by painting graffiti on the restaurant’s exterior and fence. If convicted of the felony count of the indictment, Higgins could face a maximum sentence of 10 years in prison and a $250,000 fine. For each of the misdemeanor charges, Higgins could face a maximum sentence of one year in prison and a $200,000 fine. This case is being investigated by the FBI and is being prosecuted by Assistant U.S. Attorney Robin B. Mark of the Northern District of Alabama and Trial Attorney David Reese of the Justice Department’s Civil Rights Division. 
An indictment is merely an accusation, and the defendant is presumed innocent unless proven guilty.",2014-05-01T00:00:00-04:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section


B. Write a function that takes one press release string as an input and:

- Removes named entities from each press release string (**Hint**: you may want to use `re.sub` with an or condition)
- Scores the sentiment of the entire press release using the `SentimentIntensityAnalyzer` and `polarity_scores`
- Returns the length-four (negative, positive, neutral, compound) sentiment dictionary (any order is fine)

Apply that function to each of the press releases in `doj_subset`. 

**Hints**: 

- A function + list comprehension to execute will take about 30 seconds on a respectable local machine and about 2 minutes on jhub; if it's taking much longer, you may want to check your code for inefficiencies. If you can't fix those, you can take a small random sample of the 717 for partial credit on this part and full credit on the remainder.


In [406]:
## your code here to define function
    
def sentiment_analysis(press_release_string):
    ## process the press release contents with spaCy
    doc = nlp(press_release_string) 
    
    ## build a regex pattern with OR, as recommended, that matches every named entity identified via ent.text 
        ## re.escape handles special characters so they don't need to be escaped manually 
    pattern = '|'.join(re.escape(ent.text) for ent in doc.ents if ent.text.strip()) 

    ## use re.sub, as recommended, to replace named entities in press_release_string with an empty string
    cleaned_text = re.sub(pattern, '', press_release_string, flags=re.IGNORECASE)

    ## initialize the SentimentIntensityAnalyzer 
    sentiment = SentimentIntensityAnalyzer()
    
    ## score the sentiment of the entire press release using `polarity_scores`
    sentiment_dict = sentiment.polarity_scores(cleaned_text)

    ## return the length-four (negative, positive, neutral, compound) sentiment dictionary 
    return sentiment_dict
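
One subtlety of the OR pattern: Python's `re` tries alternatives left to right, so if a short entity such as "U.S." appears in the pattern before a longer one such as "U.S. Army", only the short prefix is removed and " Army" is left behind. A possible refinement, sketched below with plain strings standing in for the spaCy entities, is to sort the entity strings longest-first before joining them. `remove_entities` and the example entity list are hypothetical and only illustrate the ordering issue.

```python
import re

def remove_entities(text, entity_strings):
    """Delete every entity string from text, trying longer entities first."""
    # Drop blanks and duplicates, then sort longest-first so "U.S. Army"
    # is tried before its prefix "U.S." in the alternation.
    unique = {e for e in entity_strings if e.strip()}
    if not unique:
        return text
    ordered = sorted(unique, key=len, reverse=True)
    pattern = "|".join(re.escape(e) for e in ordered)
    return re.sub(pattern, "", text)

text = "The U.S. Army and the FBI investigated."
entities = ["U.S.", "U.S. Army", "FBI"]  # hypothetical NER output

# Naive join in the given order: "U.S." wins and " Army" survives.
naive = re.sub("|".join(re.escape(e) for e in entities), "", text)

print(repr(naive))
print(repr(remove_entities(text, entities)))
```

The early return for an empty entity list also makes it explicit that text with no detected entities is passed through unchanged.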


In [407]:
## your code here executing the function
for press_release_string in doj_subset.contents:
    print(sentiment_analysis(press_release_string))

{'neg': 0.197, 'neu': 0.754, 'pos': 0.049, 'compound': -0.9931}
{'neg': 0.134, 'neu': 0.797, 'pos': 0.069, 'compound': -0.9325}
{'neg': 0.092, 'neu': 0.832, 'pos': 0.076, 'compound': -0.7579}
{'neg': 0.127, 'neu': 0.788, 'pos': 0.085, 'compound': -0.9037}
{'neg': 0.179, 'neu': 0.777, 'pos': 0.044, 'compound': -0.9864}
{'neg': 0.148, 'neu': 0.799, 'pos': 0.053, 'compound': -0.987}
{'neg': 0.155, 'neu': 0.766, 'pos': 0.079, 'compound': -0.9559}
{'neg': 0.093, 'neu': 0.841, 'pos': 0.066, 'compound': -0.7783}
{'neg': 0.107, 'neu': 0.832, 'pos': 0.061, 'compound': -0.9136}
{'neg': 0.167, 'neu': 0.776, 'pos': 0.056, 'compound': -0.9801}
{'neg': 0.216, 'neu': 0.748, 'pos': 0.036, 'compound': -0.9973}
{'neg': 0.095, 'neu': 0.841, 'pos': 0.064, 'compound': -0.8519}
{'neg': 0.083, 'neu': 0.852, 'pos': 0.065, 'compound': -0.6486}
{'neg': 0.307, 'neu': 0.66, 'pos': 0.034, 'compound': -0.9936}
{'neg': 0.179, 'neu': 0.753, 'pos': 0.069, 'compound': -0.9889}
{'neg': 0.125, 'neu': 0.803, 'pos': 0.071,

C. Add the four sentiment scores to the `doj_subset` dataframe to create a new dataframe, `doj_subset_wscore`. Sort from highest to lowest `neg` score and print the `id`, `contents`, and `neg` columns of the two most negative press releases. 

Notes:

- Don't worry if your sentiment scores differ slightly from our output on GitHub; differences in preprocessing can lead to different scores
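
A minimal sketch of one way to turn a list of score dictionaries into columns, shown on toy data. The `toy` frame and the `scores` list are invented for illustration; in the problem set the dictionaries would come from the sentiment function applied to `contents`. Building a DataFrame directly from the list of dicts is an alternative to `.apply(pd.Series)`. Note that `sort_values` returns a new frame, so its result has to be assigned (or chained) before calling `head`.

```python
import pandas as pd

# Toy stand-in for doj_subset; ids and texts are invented for illustration.
toy = pd.DataFrame({"id": ["a-1", "b-2", "c-3"],
                    "contents": ["text one", "text two", "text three"]},
                   index=[10, 20, 30])

# Toy stand-in for the list of dictionaries returned by the sentiment function.
scores = [
    {"neg": 0.10, "neu": 0.80, "pos": 0.10, "compound": 0.00},
    {"neg": 0.30, "neu": 0.65, "pos": 0.05, "compound": -0.90},
    {"neg": 0.00, "neu": 0.85, "pos": 0.15, "compound": 0.80},
]

# Reuse the original index so rows line up when concatenating.
score_cols = pd.DataFrame(scores, index=toy.index)
toy_wscore = pd.concat([toy, score_cols], axis=1)

# sort_values does not modify the frame in place; chain head() on its result.
most_neg = toy_wscore.sort_values(by="neg", ascending=False).head(2)
print(most_neg[["id", "neg"]])
```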

In [423]:
## your code here
doj_subset['sentiment_scores'] = doj_subset.contents.apply(sentiment_analysis)
## unpack each dictionary in the column so that each key (neg, neu, pos, compound) becomes its own column;
## drop the original sentiment_scores column and keep all rows
doj_subset_wscore = pd.concat([doj_subset.drop(['sentiment_scores'], axis=1), doj_subset['sentiment_scores'].apply(pd.Series)], axis=1)
doj_subset_wscore.columns
doj_subset_wscore.sort_values(by = 'neg', ascending = False) ## put in descending order high to low


Index(['id', 'title', 'contents', 'date', 'topics_clean', 'components_clean',
       'neg', 'neu', 'pos', 'compound'],
      dtype='object')

Unnamed: 0,id,title,contents,date,topics_clean,components_clean,neg,neu,pos,compound
329,14-248,Albuquerque Man Charged with Federal Hate Crime Related to Anti-Semitic Threats Against Businesswoman,"The Department of Justice announced that this morning John W. Ng, 58, of Albuquerque, N.M., made his initial appearance in federal court on a criminal complaint charging him with a hate crime offense. This charge is related to anti-Semitic threats Ng made against a Jewish woman who owns and operates the Nosh Jewish Delicatessen and Bakery in Albuquerque. Ng was arrested by the FBI on March 7, 2014, based on a criminal complaint alleging that he interfered with the victim’s federally protected rights by threatening her and interfering with her business because of her religion. According to the criminal complaint, between Jan. 22, 2014, and Feb. 8, 2014, Ng allegedly posted threatening anti-Semitic notes on and in the vicinity of the victim’s business. A criminal complaint merely establishes probable cause, and Ng is presumed innocent unless proven guilty. If convicted on the offense charged in the criminal complaint, Ng faces a maximum statutory penalty of one year in prison. This matter was investigated by the Albuquerque Division of the FBI and is being prosecuted by Assistant U.S. Attorney Mark T. Baker of the U.S. Attorney’s Office for the District of New Mexico and Trial Attorney AeJean Cha of the U.S. Department of Justice’s Civil Rights Division.",2014-03-10T00:00:00-04:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section,0.307,0.660,0.034,-0.9936
572,13-312,Aryan Brother Inmate Sentenced for Federal Hate Crime for Assaulting Fellow Inmate,"John Hall, 27, an Aryan Brotherhood member and inmate at the Federal Correctional Institution (FCI) in Seagoville, Texas, was sentenced today by U.S. District Judge Reed O’Connor after pleading guilty to violating the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act stemming from his assault of a fellow inmate, whom he believed to be gay, the Department of Justice announced. Hall assaulted his fellow inmate with a dangerous weapon, causing bodily injury to the victim on Dec. 20, 2011. Hall was sentenced to serve 71 months in prison to be served consecutively with the sentence he is currently serving. The assault occurred on Dec. 20, 2011, inside the FCI Seagoville when Hall targeted and attacked the victim, a fellow inmate, because he believed the victim was gay or involved in a sexual relationship with another male inmate. Hall repeatedly punched, kicked and stomped on the victim’s face with his shod feet, a dangerous weapon, while yelling a homophobic slur. The victim lost consciousness during the assault and suffered multiple lacerations to his face. The victim also sustained a fractured eye socket, lost a tooth, fractured other teeth and was treated at a hospital for the injuries he sustained during Hall’s unprovoked attack. Hall pleaded guilty to violating the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act on Nov. 8, 2012. “Brutality and violence based on sexual orientation has no place in a civilized society,” said Thomas E. Perez, Assistant Attorney General for the Civil Rights Division. “The Justice Department is committed to using all the tools in our law enforcement arsenal, including the Matthew Shepard and James Byrd Jr. 
Hate Crimes Prevention Act, to prosecute acts motivated by hate.” “This prosecution sends a clear message that this office, in partnership with attorneys in the department’s Civil Rights Division, will prioritize and aggressively prosecute hate crimes and others civil rights violations in North Texas,” said U.S. Attorney Sarah R. Saldaña of the Northern District of Texas. This case was investigated by the FBI Dallas Division. The case was prosecuted by Assistant U.S. Attorney Errin Martin and Trial Attorney Adriana Vieco of the Civil Rights Division.",2013-03-14T00:00:00-04:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section,0.301,0.675,0.025,-0.9983
11593,16-718,Three Mississippi Correctional Officers Indicted for Inmate Assault and Cover-Up,"In a nine-count indictment unsealed today, two Mississippi correctional officers were charged with beating an inmate and a third was charged with helping to cover it up. The indictment charged Lawardrick Marsher, 28, and Robert Sturdivant, 47, officers at Mississippi State Penitentiary, in Parchman, Mississippi, with a beating that included kicking, punching and throwing the victim to the ground. Marsher and Sturdivant were charged with violating the right of K.H., a convicted prisoner, to be free from cruel and unusual punishment. Sturdivant was also charged with failing to intervene while Marsher was punching and beating K.H. The indictment alleges that their actions involved the use of a dangerous weapon and resulted in bodily injury to the victim. A third officer, Deonte Pate, 23, was charged along with Marsher and Sturdivant for conspiring to cover up the beating. The indictment alleges that all three officers submitted false reports and that all three lied to the FBI. If convicted, Marsher and Sturdivant face a maximum sentence of 10 years in prison on the excessive force charges. Each of the three officers faces up to five years in prison on the conspiracy and false statement charges, and up to 20 years in prison on the false report charges. An indictment is merely an accusation, and the defendants are presumed innocent unless and until proven guilty. This case is being investigated by the FBI’s Jackson Division, with the cooperation of the Mississippi Department of Corrections. It is being prosecuted by Assistant U.S. Attorney Robert Coleman of the Northern District of Mississippi and Trial Attorney Dana Mulhauser of the Civil Rights Division’s Criminal Section. Marsher Indictment",2016-06-21T00:00:00-04:00,Civil Rights,"Civil Rights Division; Civil Rights - Criminal Section; USAO - Mississippi, Northern",0.297,0.670,0.033,-0.9964
501,11-626,Arkansas Man Pleads Guilty to Federal Hate Crime Related to the Assault of Five Hispanic Men,"WASHINGTON – The Justice Department announced today that Sean Popejoy, 19, of Green Forest, Ark., pleaded guilty in federal court to one count of committing a federal hate crime and one count of conspiring to commit a federal hate crime. This is the first conviction for a violation of the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act, which was enacted in October 2009. Information presented during the plea hearing established that in the early morning hours of June 20, 2010, Popejoy admitted that he was part of a conspiracy to threaten and injure five Hispanic men who had pulled into a gas station parking lot. The co-conspirators pursued the victims in a truck. When the co-conspirators caught up to the victims, Popejoy leaned outside of the front passenger window and waived a tire wrench at the victims and continued to threaten and hurl racial epithets at the victims. The co-conspirator rammed into the victims' car, which caused the victims’ car to cross the opposite lane of traffic, go off the road, crash into a tree and ignite. As a result of the co-conspirators’ actions, the victims suffered bodily injury, including one victim who sustained life-threatening injuries. “James Byrd, Jr. and Matthew Shepard were brutally murdered more than a decade ago, and today the first defendant is convicted for a hate crime under the critical new law enacted in their names,” said Thomas E. Perez, Assistant Attorney General for the Civil Rights Division. “It is unacceptable that violent acts of hate committed because of someone’s race continue to occur in 2011, and the department will continue to use every available tool to identify and prosecute hate crimes whenever and wherever they occur. “It is terrible and disturbing that violence motivated by hatred of another’s race continues to occur,” said Conner Eldridge, U.S. 
Attorney for the Western District of Arkansas. “We are committed to prosecuting such crimes in the Western District of Arkansas.” If convicted, the defendant faces a maximum punishment of 15 years in prison. This case is being investigated by the FBI’s Fayetteville Division in cooperation with the Arkansas State Police Department and the Carroll County Sheriff’s Office. The case is being prosecuted by Trial Attorney Edward Chung of the Department of Justice’s Civil Rights Division and Assistant U.S. Attorney Kyra Jenner for the Western District of Arkansas.",2011-05-16T00:00:00-04:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section,0.294,0.676,0.031,-0.9986
11248,10-1194,Tennessee Man Sentenced for Conspiring to Commit Murders of African-Americans,"WASHINGTON - The Justice Department announced that Daniel Cowart was sentenced today to 14 years in prison and three years of supervised release for his role in a conspiracy to murder dozens of African-Americans, including then-Senator and presidential candidate Barack Obama, because of their race. On March 29, 2010, Cowart pleaded guilty to conspiracy, threatening to kill and inflict bodily harm upon a major candidate for the office of President of the United States, interstate transportation of a short-barreled shotgun, interstate transportation of a firearm for the purpose of committing a felony, unlicensed transportation of an unauthorized short-barreled shotgun, possession of a short-barreled shotgun, intentional damage to religious real property and discharge of a firearm during and in relation to a crime of violence. Cowart, 22, of Bells, Tenn., admitted to conspiring with Paul Schlesselman of West Helena, Ark., to engage in a killing spree specifically targeting African-Americans. He further acknowledged that he intended to culminate these attacks by assassinating President Obama, a U.S. Senator and presidential candidate at the time of the conspiracy. Cowart admitted that he and Schlesselman also conspired to burglarize a federally-licensed firearms dealer to obtain additional weapons for their scheme. He also admitted to transporting a sawed-off shotgun from Arkansas to Tennessee for the purpose of committing felonies. Cowart additionally admitted to shooting the window of the Allen Baptist Church in Brownsville, Tenn. Under the plea agreement, Cowart agreed that an appropriate sentence would be between twelve and eighteen years. The charges to which he pleaded guilty carried a minimum sentence of 10 years and a maximum sentence of 75 years in prison. 
""Threats of violence fueled by bigotry and hate have no place in the United States of America, and they will not be tolerated,"" said Thomas E. Perez, Assistant Attorney General for the Civil Rights Division. ""Although the heroic intervention of law enforcement spared us from a tragedy, this conspiracy and its associated crimes demanded a severe sentence. The sentence imposed constitutes serious punishment for a serious crime."" ""Thankfully, the defendants were not able to execute their violent scheme. Nevertheless, this is a grave matter and Judge Breen’s sentence reflects that crimes of this magnitude demand stiff penalties,"" said Edward L. Stanton III, U.S. Attorney for the Western District of Tennessee. ""I would like to recognize the extraordinary diligence of the Crockett County Sheriff’s Department, the Bureau of Alcohol, Tobacco and Firearms, the U.S Secret Service, and the FBI."" Cowart’s co-defendant, Paul Schlesselman, pleaded guilty on Jan. 14, 2010, to one count of conspiracy, one count of threatening to kill and inflict bodily harm upon a presidential candidate, and one count of possessing a firearm in furtherance of a crime of violence. Schlesselman was sentenced to 10 years in prison on April 15, 2010. This case was investigated by the Bureau of Alcohol, Tobacco, Firearms and Explosives; the U.S. Secret Service; the FBI; and the Crockett County Sheriff’s Office. The case was prosecuted by Assistant U.S. Attorneys Larry Laurenzi and James Powell and Civil Rights Division Trial Attorney Jonathan Skrmetti.",2010-10-22T00:00:00-04:00,Hate Crimes,Civil Rights Division; Civil Rights - Criminal Section,0.283,0.652,0.065,-0.9990
...,...,...,...,...,...,...,...,...,...,...
11085,15-667,"Statement from Vanita Gupta, Head of the Justice Department's Civil Rights Division, U.S. Attorney Steven M. Dettlebach for the Northern District of Ohio and Special Agent in Charge Stephen D. Anthony for the FBI","Statement from Vanita Gupta, head of the Justice Department’s Civil Rights Division, U.S. Attorney Steven M. Dettelbach for the Northern District of Ohio and Special Agent in Charge Stephen D. Anthony for the FBI: “The U.S. Attorney's Office, the Federal Bureau of Investigation and the Civil Rights Division of the Department of Justice have been monitoring the extensive investigation that has been conducted around the events of Nov. 29, 2012. We will now review the testimony and evidence presented in the state trial. We will continue our assessment, review all available legal options and will collaboratively determine what, if any, additional steps are available and appropriate given the requirements and limitations of the applicable laws in the federal judicial system. This review is separate and distinct from the Civil Rights Division and U.S. Attorney's Office's productive efforts to resolve civil pattern and practice allegations under 42 U.S.C. 14141 with the city of Cleveland.”",2015-05-23T00:00:00-04:00,Civil Rights,Civil Rights Division,0.000,0.940,0.060,0.7003
7594,16-539,"Justice Department Statements Regarding Court Approval of the Agreement with Newark, New Jersey, to Reform Unconstitutional Policing Practices","Principal Deputy Assistant Attorney General Vanita Gupta, head of the Justice Department’s Civil Rights Division, and U.S. Attorney Paul J. Fishman of the District of New Jersey released the following statements regarding the U.S. District Court for the District of New Jersey’s approval of the department’s agreement with the city of Newark, New Jersey, to reform the police department’s unconstitutional practices: “We appreciate the court’s swift approval of the Justice Department’s consent decree with the city of Newark,” said Principal Deputy Assistant Attorney General Gupta. “This agreement will help the Newark Police Department reform policies, improve systems and rebuild trust between officers and the community they serve. As Newark implements this agreement, we will continue to work closely with city officials, law enforcement and community members to put in place the necessary changes that can make Newark a national model for constitutional, effective and accountable policing. Once fully implemented, these reforms will make all of those in Newark – officers and civilians alike – safer. And these reforms will ensure that law enforcement in Newark complies with the Constitution and safeguards the civil rights of every Newark resident.” “This consent decree, now approved by the court, provides a roadmap for reform in Newark and a model for best practices for police departments across the country,” said U.S. Attorney Fishman. 
“Implementing the systemic changes outlined in the consent decree will take time, but this is what the city of Newark and the men and women who serve in the Police department want and need, and it is what the people of Newark deserve: a first-class police department that keeps them safe and respects their constitutional rights.”",2016-05-06T00:00:00-04:00,Civil Rights,Civil Rights Division; Civil Rights - Special Litigation Section; USAO - New Jersey,0.000,0.829,0.171,0.9854
6787,17-132,Justice Department Reaches Agreement with St. James Parish Louisiana School District to Desegregate Schools,"The Department of Justice has reached an agreement with the St. James Parish School District in Louisiana that upon completion will end court supervision of the district’s schools. The consent order, approved yesterday by the U.S. District Court for the Eastern District of Louisiana, addresses all remaining issues in the school desegregation case, and when fully implemented will lead to the closing of that case. The consent order, negotiated with the school district and private plaintiffs, represented by the NAACP Legal Defense and Educational Fund, puts the district on a path to full unitary status within three years provided it: The consent order declares that the district has already met its desegregation obligations in the area of transportation. The court will retain jurisdiction over the consent order during its implementation, and the Justice Department will monitor the district’s compliance. “We are pleased to have worked hand-in-hand with the schools to ensure equal and fair treatment for the students of the St. James Parish School District,” said Acting Assistant Attorney General Tom Wheeler of the Civil Rights Division. “We look forward to working with the district and private plaintiffs to implement the consent order and bring this case to a successful close.” Promoting school desegregation and enforcing Title IV of the Civil Rights Act of 1964 is a top priority of the Justice Department’s Civil Rights Division. Additional information about the Civil Rights Division is available on its website at www.justice.gov/crt. St. James Parish Consent Order",2017-01-31T00:00:00-05:00,Civil Rights,Civil Rights Division; Civil Rights - Educational Opportunities Section,0.000,0.840,0.160,0.9794
1857,17-271,"Court Approves Desegregation Plan for Cleveland, Mississippi, Schools","Cleveland School District to Open Consolidated Middle and High Schools by August 2017 U.S. District Court Judge Debra M. Brown of the Northern District of Mississippi today approved a joint settlement agreement filed on Feb. 8 by the Justice Department, private plaintiffs, and the Cleveland School District. The agreement will lead to the effective desegregation of Cleveland’s middle and high schools by the start of the next school year. Under the terms approved today, the school district agrees to comply with a May 13, 2016 court ruling mandating consolidation of Cleveland middle and high schools to remedy decades-long segregation in the school district. The consolidated high school, to be named Cleveland Central High School, will open by August at the current Margaret Green/Cleveland High campus. Also by August, the district will open the consolidated middle school (seventh and eighth grades), Cleveland Central Middle School, at the current East Side High facility. Under the agreement, sixth grade students will attend district elementary schools rather than the consolidated middle school. As part of the agreement, the district and plaintiffs have withdrawn all alternative desegregation proposals from consideration by the Court. The district has also withdrawn its pending appeal before the U.S. Court of Appeals for the Fifth Circuit. “The Department is pleased to have reached agreement with the Cleveland School District and private plaintiffs to settle this decades-long litigation,” said Acting Assistant Attorney General Tom Wheeler of the Justice Department’s Civil Rights Division. “The plan approved today allows the community to move forward together. 
It reflects the parties’ shared commitment to high quality equal educational opportunities for all Cleveland students.” Additional information is available on the Justice Department’s website at: www.justice.gov/opa/pr/federal-court-orders-justice-department-desegregation-plan-cleveland-mississippi-schools. Promoting school desegregation and enforcing Title IV of the Civil Rights Act of 1964 is a top priority of the Justice Department’s Civil Rights Division. Additional information about the Civil Rights Division is available on its website at www.justice.gov/crt.",2017-03-13T00:00:00-04:00,Civil Rights,"Civil Rights Division; Civil Rights - Educational Opportunities Section; USAO - Louisiana, Eastern; USAO - Mississippi, Northern",0.000,0.833,0.167,0.9909


In [424]:
doj_subset_wscore_C = doj_subset_wscore.sort_values(by = 'neg', ascending = False).head(2) 
doj_subset_wscore_C[['id','contents', 'neg']] 

Unnamed: 0,id,contents,neg
329,14-248,"The Department of Justice announced that this morning John W. Ng, 58, of Albuquerque, N.M., made his initial appearance in federal court on a criminal complaint charging him with a hate crime offense. This charge is related to anti-Semitic threats Ng made against a Jewish woman who owns and operates the Nosh Jewish Delicatessen and Bakery in Albuquerque. Ng was arrested by the FBI on March 7, 2014, based on a criminal complaint alleging that he interfered with the victim’s federally protected rights by threatening her and interfering with her business because of her religion. According to the criminal complaint, between Jan. 22, 2014, and Feb. 8, 2014, Ng allegedly posted threatening anti-Semitic notes on and in the vicinity of the victim’s business. A criminal complaint merely establishes probable cause, and Ng is presumed innocent unless proven guilty. If convicted on the offense charged in the criminal complaint, Ng faces a maximum statutory penalty of one year in prison. This matter was investigated by the Albuquerque Division of the FBI and is being prosecuted by Assistant U.S. Attorney Mark T. Baker of the U.S. Attorney’s Office for the District of New Mexico and Trial Attorney AeJean Cha of the U.S. Department of Justice’s Civil Rights Division.",0.307
572,13-312,"John Hall, 27, an Aryan Brotherhood member and inmate at the Federal Correctional Institution (FCI) in Seagoville, Texas, was sentenced today by U.S. District Judge Reed O’Connor after pleading guilty to violating the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act stemming from his assault of a fellow inmate, whom he believed to be gay, the Department of Justice announced. Hall assaulted his fellow inmate with a dangerous weapon, causing bodily injury to the victim on Dec. 20, 2011. Hall was sentenced to serve 71 months in prison to be served consecutively with the sentence he is currently serving. The assault occurred on Dec. 20, 2011, inside the FCI Seagoville when Hall targeted and attacked the victim, a fellow inmate, because he believed the victim was gay or involved in a sexual relationship with another male inmate. Hall repeatedly punched, kicked and stomped on the victim’s face with his shod feet, a dangerous weapon, while yelling a homophobic slur. The victim lost consciousness during the assault and suffered multiple lacerations to his face. The victim also sustained a fractured eye socket, lost a tooth, fractured other teeth and was treated at a hospital for the injuries he sustained during Hall’s unprovoked attack. Hall pleaded guilty to violating the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act on Nov. 8, 2012. “Brutality and violence based on sexual orientation has no place in a civilized society,” said Thomas E. Perez, Assistant Attorney General for the Civil Rights Division. “The Justice Department is committed to using all the tools in our law enforcement arsenal, including the Matthew Shepard and James Byrd Jr. Hate Crimes Prevention Act, to prosecute acts motivated by hate.” “This prosecution sends a clear message that this office, in partnership with attorneys in the department’s Civil Rights Division, will prioritize and aggressively prosecute hate crimes and others civil rights violations in North Texas,” said U.S. Attorney Sarah R. Saldaña of the Northern District of Texas. This case was investigated by the FBI Dallas Division. The case was prosecuted by Assistant U.S. Attorney Errin Martin and Trial Attorney Adriana Vieco of the Civil Rights Division.",0.301


D. With the dataframe from part C, find the mean compound sentiment score for each of the three topics in `topics_clean` using `groupby` and `agg`.

E. Add a one-sentence interpretation of why we might see the variation in scores (remember that compound is a standardized summary where -1 is most negative; +1 is most positive)


In [425]:
## agg and find the mean compound score by topic
doj_subset_wscore.groupby('topics_clean')['compound'].agg('mean')

topics_clean
Civil Rights             -0.092504
Hate Crimes              -0.935982
Project Safe Childhood   -0.660279
Name: compound, dtype: float64

This variation likely arises because press releases about hate crimes and child exploitation describe violent or abusive acts, so they contain many negatively scored words, whereas civil rights releases also cover settlements and agreements whose wording is more neutral or positive.
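
To make the "standardized summary" concrete: to my understanding, VADER computes `compound` by summing word-level valence scores and then squashing the sum with x / sqrt(x² + alpha), where alpha defaults to 15. The sketch below reimplements only that squashing step as an assumption about the library's behavior (the name `normalize_sum` is ours, not VADER's API), to show why long releases with many negative words end up close to -1.

```python
import math

def normalize_sum(valence_sum, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1).

    Mirrors, as an assumption, the normalization behind VADER's compound score.
    """
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)

# The more negative words a document accumulates, the closer it gets to -1.
for total in (-1, -5, -20, -60):
    print(total, round(normalize_sum(total), 4))
```

Because the raw sum grows with document length, a long press release with consistently negative vocabulary can score near -1 even if no single sentence is extreme, which is worth keeping in mind when comparing topic means.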

# 2. Topic modeling (25 points)

For this question, use the `doj_subset_wscore` data that is restricted to civil rights, hate crimes, and project safe childhood, with the sentiment scores added.


## 2.1 Preprocess the data by removing stopwords, punctuation, and non-alpha words (5 points)

A. Write a function that:

- Takes in a single raw string in the `contents` column from that dataframe
- Does the following preprocessing steps:

    - Converts the words to lowercase
    - Removes stopwords, adding the custom stopwords in your code cell below to the default stopwords list
    - Only retains alpha words (so removes digits and punctuation)
    - Only retains words 4 characters or longer
    - Uses the snowball stemmer from nltk to stem

- Returns a joined preprocessed string
    
B. Use `apply` or list comprehension to execute that function and create a new column in the data called `processed_text`
    
C. Print the `id`, `contents`, and `processed_text` columns for the following press releases:

id = 16-718 (this case: https://www.wlbt.com/story/32275512/three-mississippi-correctional-officers-indicted-for-inmate-assault-and-cover-up/)

id = 16-217 (this case: https://www.seattletimes.com/nation-world/doj-miami-police-reach-settlement-in-civil-rights-case/)
    
**Resources**:

- Here are code examples for the snowball stemmer: https://www.geeksforgeeks.org/snowball-stemmer-nlp/

In [426]:
custom_doj_stopwords = ["civil", "rights", "division", "department", "justice",
                        "office", "attorney", "district", "case", "investigation", "assistant",
                       "trial", "assistance", "assist"]

In [427]:
## your code defining a text processing function
def preprocess_text(text, additional_stopwords):
    
    # Combine custom stopwords with the default stopwords
    default_stopwords = set(stopwords.words('english'))
    all_stopwords = default_stopwords.union(set(additional_stopwords))
    
    # Initialize Snowball stemmer
    stemmer = SnowballStemmer("english")
    
    # Convert contents to lowercase
    text = text.lower()
    
    # Tokenize words in contents 
    words = text.split()
    
    # Remove stopwords, keep alphabetic tokens with length 4 or more
    words = [word for word in words if word not in all_stopwords and word.isalpha() and len(word) >= 4]
    
    # Apply stemming
    words = [stemmer.stem(word) for word in words]
    # Return the processed text
    return ' '.join(words)
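
A note on the tokenization step: `text.split()` followed by `isalpha()` discards any word that has punctuation attached (`prison.`, `victim,`), so those tokens never reach the stemmer. The stdlib-only sketch below contrasts that behavior with a simple regex tokenizer on an invented sentence. `regex_tokens` is a hypothetical alternative, not part of the assignment; either choice is defensible as long as it is applied consistently.

```python
import re

sentence = "the victim, a fellow inmate, was sentenced to prison."

# Whitespace split: punctuation stays glued to the word, so isalpha() rejects it.
split_tokens = [w for w in sentence.lower().split() if w.isalpha() and len(w) >= 4]

# Regex tokenizer: extract runs of letters, leaving punctuation behind.
regex_tokens = [w for w in re.findall(r"[a-z]+", sentence.lower()) if len(w) >= 4]

print(split_tokens)   # ['fellow', 'sentenced']
print(regex_tokens)   # ['victim', 'fellow', 'inmate', 'sentenced', 'prison']
```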


In [428]:
## your code executing the function
doj_subset_wscore['processed_text'] = doj_subset_wscore['contents'].apply(lambda x: preprocess_text(x, custom_doj_stopwords))

In [429]:
## your code showing the examples
print(doj_subset_wscore.loc[doj_subset_wscore['id'].isin(['16-718', '16-217']), ['id', 'contents', 'processed_text']])

           id  \
6727   16-217   
11593  16-718   

## 2.2 Create a document-term matrix from the preprocessed press releases and explore top words (5 points)

A. Use the `create_dtm` function I provide (alternately, feel free to write your own!) and create a document-term matrix using the preprocessed press releases; make sure metadata contains the following columns: `id`, `compound` sentiment column you added, and the `topics_clean` column

B. Print the top 10 words for press releases with compound sentiment in the top 5% (so the most positive sentiment)

C. Print the top 10 words for press releases with compound sentiment in the bottom 5% (so the most negative sentiment)

**Hint**: for these, remember the pandas quantile function from pset one.  

D. Print the top 10 words for press releases in each of the three `topics_clean`

For steps B - D, to receive full credit, write a function `get_topwords` that helps you avoid duplicated code when you find top words for the different subsets of the data. There are different ways to structure it but one way is to feed it subsetted data (so data subsetted to one topic etc.) and for it to get the top words for that subset.
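
One possible shape for such a function, sketched on a toy document-term matrix with invented counts. The assumptions here are that the DTM keeps its metadata in the columns `id`, `compound`, and `topics_clean`, and that the caller does the subsetting (by sentiment quantile or by topic) before passing the rows in; `META_COLS` and `toy_dtm` are illustrative names only.

```python
import pandas as pd

# Metadata columns assumed to sit alongside the word-count columns.
META_COLS = ["id", "compound", "topics_clean"]

def get_topwords(dtm_subset, n=10):
    """Return the n most frequent terms in an already-subsetted document-term matrix."""
    counts = dtm_subset.drop(columns=META_COLS, errors="ignore").sum(axis=0)
    return counts.sort_values(ascending=False).head(n)

# Toy DTM with invented counts, standing in for the real document-term matrix.
toy_dtm = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "compound": [0.9, -0.9, 0.1, -0.5],
    "topics_clean": ["Civil Rights", "Hate Crimes", "Civil Rights", "Hate Crimes"],
    "agreement": [3, 0, 2, 0],
    "assault": [0, 4, 0, 1],
    "court": [1, 1, 1, 1],
})

# Same function, different subsets: first by sentiment quantile ...
cutoff = toy_dtm["compound"].quantile(0.75)
top_positive = get_topwords(toy_dtm[toy_dtm["compound"] >= cutoff], n=2)

# ... then by topic.
top_by_topic = {topic: get_topwords(group, n=2)
                for topic, group in toy_dtm.groupby("topics_clean")}

print(top_positive)
print(top_by_topic["Hate Crimes"])
```

Keeping the subsetting outside the function means the same few lines serve parts B, C, and D without branching on the kind of subset.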


In [432]:
def create_dtm(list_of_strings, metadata):
    vectorizer = CountVectorizer(lowercase = True)
    dtm_sparse = vectorizer.fit_transform(list_of_strings)
    dtm_dense_named = pd.DataFrame(dtm_sparse.todense(), 
        columns=vectorizer.get_feature_names_out())
    dtm_dense_named_withid = pd.concat([metadata.reset_index(drop=True), dtm_dense_named], axis=1)
    return(dtm_dense_named_withid)

In [567]:
def get_topwords(df, sentiment_type, quantile):
    if sentiment_type == 'positive':
        threshold = df['compound'].quantile(quantile)
        selected_releases = df[df['compound'] > threshold]
        label = ' Most Positive Press Releases'
    elif sentiment_type == 'negative':
        threshold = df['compound'].quantile(quantile)
        selected_releases = df[df['compound'] < threshold]
        label = ' Most Negative Press Releases'
    
    metadata_list = selected_releases[['id', 'compound', 'topics_clean']]
    list_of_strings = selected_releases['processed_text'].tolist()
    
    dtm = create_dtm(list_of_strings, metadata_list)
    display_top_words(dtm, label)

def display_top_words(dtm, subset_label, n=10):
    non_word_columns = ['id', 'compound', 'topics_clean']
    word_columns = dtm.drop(columns=non_word_columns, errors='ignore').select_dtypes(include=[np.number])
    word_sums = word_columns.sum(axis=0)
    sorted_words = word_sums.sort_values(ascending=False)
    top_words = sorted_words.head(n)
    print(f"\nTop {n} words in {subset_label}:")
    print(top_words)
    
# Step B: Top 10 words for press releases with compound sentiment in the top 5%
get_topwords(doj_subset_wscore, 'positive', 0.95)

# Step C: Top 10 words for press releases with compound sentiment in the bottom 5%
get_topwords(doj_subset_wscore, 'negative', 0.05)

# Step D
for topic in dtm['topics_clean'].unique():
    topic_releases = dtm[dtm['topics_clean'] == topic]
    display_top_words(topic_releases, f"press releases for topic '{topic}'")


Top 10 words in  Most Positive Press Releases:
agreement    144
enforc       120
ensur        106
state         94
communiti     84
said          75
servic        73
general       73
provid        72
polic         72
dtype: int64

Top 10 words in  Most Negative Press Releases:
crime       130
offic       121
assault     116
hate        110
defend      109
feder        90
sentenc      88
prosecut     87
victim       84
guilti       83
dtype: int64

Top 10 words in press releases for topic 'Civil Rights':
hous         588
offic        504
enforc       498
said         490
discrimin    476
feder        459
violat       444
alleg        389
general      386
state        347
dtype: int64

Top 10 words in press releases for topic 'Project Safe Childhood':
child       995
exploit     673
sexual      570
safe        475
project     472
crimin      404
prosecut    357
sentenc     329
children    313
investig    256
dtype: int64

Top 10 words in press releases for topic 'Hate Crimes':
prosecut 

## 2.3 Estimate a topic model using those preprocessed words (5 points)

A. Going back to the preprocessed words from part 2.3.1, estimate a topic model with 3 topics, since you want to see if the unsupervised topic models recover different themes for each of the three manually-labeled areas (civil rights; hate crimes; project safe childhood). You have free rein over the other topic model parameters beyond the number of topics.

B. After estimating the topic model, print the top 15 words in each topic.

**Hints and Resources**:

- Same topic modeling resources linked to above
- Make sure to use the `random_state` argument within the model so that the numbering of topics does not move around between runs of your code

In [338]:
# your code here
preprocessed_text = doj_subset_wscore['processed_text']

# Build the document-term matrix
vectorizer = CountVectorizer(lowercase=True, stop_words='english')
dtm = vectorizer.fit_transform(preprocessed_text)
vocab = vectorizer.get_feature_names_out()

# Convert the DTM into a list of word frequencies
gensim_corpus = [[(i, count) for i, count in enumerate(row) if count > 0] for row in dtm.toarray()]
gensim_dict = corpora.Dictionary.from_corpus(gensim_corpus, id2word=dict(enumerate(vocab)))

# Train an LDA model using Gensim
lda_model = gensim.models.LdaModel(corpus=gensim_corpus, num_topics=3, id2word=gensim_dict, random_state=42)

# Display the top 15 words for each topic
def display_lda_topics(model, num_words=15):
    for topic_id in range(model.num_topics):
        words = model.show_topic(topic_id, num_words)
        print(f"Topic #{topic_id + 1}: " + " ".join([word for word, _ in words]))
display_lda_topics(lda_model)

Topic #1: child hous sexual said discrimin enforc prosecut state general exploit crimin charg inform investig feder
Topic #2: sentenc feder guilti charg child said prosecut investig crime year state defend plead violat offic
Topic #3: prosecut offic general said enforc feder sentenc polic today sexual investig year child charg crimin


## 2.4 Add topics back to main data and explore correlation between manual labels and our estimated topics (10 points)

Part A: Extract the document-level topic probabilities. Within `get_document_topics`, use the argument `minimum_probability` = 0 to make sure all 3 topic probabilities are returned. Write an assert statement to make sure the length of the list is equal to the number of rows in the `doj_subset_wscore` dataframe. 


In [538]:
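## your code here to get doc-level topic probabilities 
document_topics = [lda_model.get_document_topics(doc, minimum_probability = 0) for doc in gensim_corpus] 
## one list of topic probabilities per press release
assert len(document_topics) == len(doj_subset_wscore)
document_topics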

[[(0, 0.003221889), (1, 0.98277354), (2, 0.014004559)],
 [(0, 0.79061824), (1, 0.0043349667), (2, 0.2050468)],
 [(0, 0.7499364), (1, 0.05501781), (2, 0.19504581)],
 [(0, 0.71746784), (1, 0.045845125), (2, 0.23668705)],
 [(0, 0.0036483894), (1, 0.0037361847), (2, 0.99261546)],
 [(0, 0.0029602465), (1, 0.9939866), (2, 0.0030531965)],
 [(0, 0.854145), (1, 0.003907867), (2, 0.14194717)],
 [(0, 0.3369616), (1, 0.36639237), (2, 0.29664603)],
 [(0, 0.003451244), (1, 0.0034336322), (2, 0.9931151)],
 [(0, 0.72897476), (1, 0.0038276436), (2, 0.2671976)],
 [(0, 0.0025305126), (1, 0.99483454), (2, 0.0026349542)],
 [(0, 0.1109064), (1, 0.002595601), (2, 0.886498)],
 [(0, 0.7708085), (1, 0.0027782617), (2, 0.2264132)],
 [(0, 0.10493142), (1, 0.33293962), (2, 0.56212896)],
 [(0, 0.0035699822), (1, 0.4761567), (2, 0.5202733)],
 [(0, 0.3607696), (1, 0.6366765), (2, 0.0025539245)],
 [(0, 0.8358622), (1, 0.16209182), (2, 0.002046015)],
 [(0, 0.0021882257), (1, 0.98356825), (2, 0.014243571)],
 [(0, 0.0025

Part B: Add the topic probabilities to the `doj_subset_wscore` dataframe as columns and create a column, `top_topic`, that assigns each document to its highest-probability topic (e.g., topic 1, 2, or 3)

In [539]:
## your code here to add those topic probabilities to the dataframe
highest_probability_topics = [max(topic_list, key=lambda x: x[1] )[0] + 1 for topic_list in document_topics]
# highest_probability_topics
doj_subset_wscore['top_topics'] = highest_probability_topics 
doj_subset_wscore['top_topics']

77       2
155      1
157      1
162      1
168      3
        ..
13002    2
13032    1
13034    1
13068    1
13081    2
Name: top_topics, Length: 717, dtype: int64
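The cell above stores only the top topic. A minimal sketch of how the per-topic probabilities themselves could be flattened into columns follows; it assumes the `[(topic_id, prob), ...]` structure printed in Part A, and the toy values and the helper name `topics_to_row` are hypothetical.

```python
# Toy stand-in for the output of get_document_topics(..., minimum_probability=0)
document_topics_toy = [
    [(0, 0.10), (1, 0.85), (2, 0.05)],
    [(0, 0.70), (1, 0.10), (2, 0.20)],
]

def topics_to_row(topic_list):
    """Turn [(topic_id, prob), ...] into a flat dict with 1-based topic names."""
    row = {f"topic_{topic_id + 1}": prob for topic_id, prob in topic_list}
    # Highest-probability topic, numbered from 1 to match the column names
    row["top_topic"] = max(topic_list, key=lambda pair: pair[1])[0] + 1
    return row

rows = [topics_to_row(t) for t in document_topics_toy]
for row in rows:
    print(row)
```

With pandas, something like `pd.DataFrame(rows, index=doj_subset_wscore.index)` could then be joined onto the main dataframe. Passing the index matters here because the dataframe's index is not 0..n-1 (the printed index starts at 77).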

Part C: For each of the manual labels in `topics_clean` (Hate Crimes, Civil Rights, Project Safe Childhood), print the breakdown of the % of documents with each top topic (so, for instance, if Hate Crimes has 246 documents and 123 of those documents are coded to topic_1, that would be 50%; and so on). **Hint**: pd.crosstab and normalize may be helpful: https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.crosstab.html

In [540]:
doj_crosstab = pd.crosstab(doj_subset_wscore['topics_clean'], doj_subset_wscore['top_topics'], normalize='index') * 100
doj_crosstab

| topics_clean \ top_topics | 1 | 2 | 3 |
|---|---|---|---|
| Civil Rights | 46.229508 | 27.213115 | 26.557377 |
| Hate Crimes | 6.097561 | 67.073171 | 26.829268 |
| Project Safe Childhood | 46.385542 | 31.325301 | 22.289157 |
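To make the row normalization concrete, here is a small sketch of the arithmetic behind `normalize='index'`, written with the standard library on made-up (label, top topic) pairs. It is an illustration of the calculation, not a replacement for `pd.crosstab`.

```python
from collections import Counter, defaultdict

# Made-up (manual label, top topic) pairs
pairs = [
    ("Hate Crimes", 2), ("Hate Crimes", 2), ("Hate Crimes", 1), ("Hate Crimes", 3),
    ("Civil Rights", 1), ("Civil Rights", 3),
]

# Count documents per (label, topic)
counts = defaultdict(Counter)
for label, topic in pairs:
    counts[label][topic] += 1

# Divide each count by its row total so every row sums to 100
row_pct = {}
for label, topic_counts in counts.items():
    total = sum(topic_counts.values())
    row_pct[label] = {t: 100 * c / total for t, c in sorted(topic_counts.items())}

for label, pct in row_pct.items():
    print(label, pct)
```

Each percentage is a share of that manual label's documents, so rows are comparable even when the labels have different numbers of press releases.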


Part D: Using a couple of press releases as examples, write a 1-2 sentence interpretation of why some of the manual topics map on more cleanly to an estimated topic than other manual topic(s)

In [541]:
cr_examples = doj_subset_wscore[doj_subset_wscore['topics_clean'] == 'Civil Rights'].head(1)
hc_examples = doj_subset_wscore[doj_subset_wscore['topics_clean'] == 'Hate Crimes'].head(1)
cr_examples.contents
hc_examples.contents

77    A former supervisory correctional officer at Louisiana State Penitentiary in Angola, Louisiana, pleaded guilty yesterday in connection with the beating of a handcuffed and shackled inmate, in addition to conspiring to cover up their misconduct by falsifying official records and lying to internal investigators about what happened.     James Savoy, 39, of Marksville, Louisiana, admitted during his plea hearing that he witnessed other officers using excessive force against the inmate and failed to intervene; that he conspired with other officers to cover up the beating by engaging in a variety of obstructive acts; and that he personally falsified official prison records to cover up the attack.   Scotty Kennedy, 48, of Beebe, Arkansas, and John Sanders, 30, of Marksville, Louisiana previously pleaded guilty in November 2016, and September 2017, for their roles in the beating and cover up.   “Every citizen has the right to due process and protection from unreasonable force, and correc

168    Jeremy Heath Higgins was indicted for threatening an African-American man at a Quinton, Alabama, restaurant, and for threatening another person who ordered Higgins to leave the restaurant due to his behavior, Acting Assistant Attorney General Jocelyn Samuels for the Justice Department’s Civil Rights Division and U.S. Attorney Joyce Vance for the Northern District of Alabama announced today.   Higgins, 28, was charged in a three count indictment returned yesterday by a federal grand jury in the U.S. District Court for the Northern District of Alabama.  The indictment charges him with one felony count and two misdemeanor counts of interference with a federally-protected activity.  The indictment alleges that on June 14, 2013, Higgins approached and threatened an African-American man at the Alabama Rose Steakhouse because the man was present at the restaurant with a white woman.  According to the indictment, another person ordered Higgins to leave the premises of the restaurant bec

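Hate Crimes maps most cleanly onto topic 2 (about 67% of its press releases), because topic 2's top words (sentenc, guilti, charg, crime, plead) are the criminal-prosecution vocabulary typical of hate crime cases like the indictment in the second example.
Civil Rights maps most strongly to topic 1 (about 46%), but less cleanly: topic 1 mixes civil rights stems such as discrimin and hous with child-related stems such as child and exploit, so Project Safe Childhood also lands mostly in topic 1 (about 46%), and criminal civil rights cases like the first example (a guilty plea) use the same prosecution language as topic 2.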


# 3. Extend the analysis from unigrams to bigrams (10 points)

In the previous question, you found top words via a unigram representation of the text. Now, we want to see how those top words change with bigrams (pairs of words)

A. Using the `doj_subset_wscore` data and the `processed_text` column (so the words after stemming/other preprocessing), create a column in the data called `processed_text_bigrams` that combines each consecutive pair of words into a bigram separated by an underscore. Eg:

"depart reach settlem" would become "depart_reach reach_settlem"

Do this by writing a function `create_bigram_onedoc` that takes in a single `processed_text` string and returns a string with its bigrams structured similarly to above example
 
**Hint**: there are many ways to solve but `zip` may be helpful: https://stackoverflow.com/questions/21303224/iterate-over-all-pairs-of-consecutive-items-in-a-list

B. Print the `id`, `processed_text`, and `processed_text_bigrams` columns for the press release with id = 16-217

In [549]:
def create_bigram_onedoc(text):
    words = text.split()
    bigrams = ['_'.join(pair) for pair in zip(words, words[1:])]
    return ' '.join(bigrams)

doj_subset_wscore['processed_text_bigrams'] = doj_subset_wscore['processed_text'].apply(create_bigram_onedoc)

(doj_subset_wscore.loc[doj_subset_wscore['id'] == '16-217', ['id', 'processed_text', 'processed_text_bigrams']])


Unnamed: 0,id,processed_text,processed_text_bigrams
6727,16-217,reach comprehens settlement agreement citi miami miami polic resolv shoot announc princip deputi general vanita head wifredo ferrer southern approv citi commiss today effect agreement sign resolv claim stem shoot conduct violent crime control enforc issu juli identifi pattern practic excess forc shoot violat fourth amend complianc settlement monitor independ former polic chief jane settlement citi implement comprehens reform ensur constitut polic support public settlement agreement design minim shoot effect quick investig shoot measur settlement repres renew commit citi miami chief rodolfo llane provid constitut polic miami resid protect public safeti sustain said princip deputi general agreement help strengthen relationship communiti serv improv account offic fire weapon provid communiti particip enforc agreement result joint effort citi miami ensur miami polic continu effort make communiti safe protect sacr constitut said oversight agreement seek make perman posit chang former chief orosa chief llane applaud citi settlement agreement build upon import reform implement citi sinc issu conduct attorney staff special litig section southern,reach_comprehens comprehens_settlement settlement_agreement agreement_citi citi_miami miami_miami miami_polic polic_resolv resolv_shoot shoot_announc announc_princip princip_deputi deputi_general general_vanita vanita_head head_wifredo wifredo_ferrer ferrer_southern southern_approv approv_citi citi_commiss commiss_today today_effect effect_agreement agreement_sign sign_resolv resolv_claim claim_stem stem_shoot shoot_conduct conduct_violent violent_crime crime_control control_enforc enforc_issu issu_juli juli_identifi identifi_pattern pattern_practic practic_excess excess_forc forc_shoot shoot_violat violat_fourth fourth_amend amend_complianc complianc_settlement settlement_monitor monitor_independ independ_former former_polic polic_chief chief_jane jane_settlement settlement_citi citi_implement implement_comprehens 
comprehens_reform reform_ensur ensur_constitut constitut_polic polic_support support_public public_settlement settlement_agreement agreement_design design_minim minim_shoot shoot_effect effect_quick quick_investig investig_shoot shoot_measur measur_settlement settlement_repres repres_renew renew_commit commit_citi citi_miami miami_chief chief_rodolfo rodolfo_llane llane_provid provid_constitut constitut_polic polic_miami miami_resid resid_protect protect_public public_safeti safeti_sustain sustain_said said_princip princip_deputi deputi_general general_agreement agreement_help help_strengthen strengthen_relationship relationship_communiti communiti_serv serv_improv improv_account account_offic offic_fire fire_weapon weapon_provid provid_communiti communiti_particip particip_enforc enforc_agreement agreement_result result_joint joint_effort effort_citi citi_miami miami_ensur ensur_miami miami_polic polic_continu continu_effort effort_make make_communiti communiti_safe safe_protect protect_sacr sacr_constitut constitut_said said_oversight oversight_agreement agreement_seek seek_make make_perman perman_posit posit_chang chang_former former_chief chief_orosa orosa_chief chief_llane llane_applaud applaud_citi citi_settlement settlement_agreement agreement_build build_upon upon_import import_reform reform_implement implement_citi citi_sinc sinc_issu issu_conduct conduct_attorney attorney_staff staff_special special_litig litig_section section_southern


C. Use the create_dtm function and the `processed_text_bigrams` column to create a document-term matrix (`dtm_bigram`) with these bigrams. Keep the following three columns in the data: `id`, `topics_clean`, and `compound` 

D. Print the (1) dimensions of the `dtm` matrix from question 2.2  and (2) the dimensions of the `dtm_bigram` matrix. Comment on why the bigram matrix has more dimensions than the unigram matrix 

E. Find and print the 10 most prevalent bigrams for each of the three topics_clean using the `get_topwords` function from 2.2

In [559]:
# your code here
metadata_list = doj_subset_wscore[['id','topics_clean','compound']]
processed_text_bigrams = doj_subset_wscore['processed_text_bigrams']
dtm_bigram = create_dtm(processed_text_bigrams, metadata_list) 

x = dtm.shape
y = dtm_bigram.shape 

print(f'The dimensions of the dtm matrix from question 2.2 are {x}, while the dimensions of the dtm_bigram matrix are {y}.')


The dimensions of the dtm matrix from question 2.2 are (717, 6110), while the dimensions of the dtm_bigram matrix are (717, 63644)


The `dtm_bigram` matrix has far more columns than the `dtm` from 2.2 because its features are ordered pairs of consecutive words rather than single words. The same word appears next to many different neighbors across the corpus, and each distinct pair becomes its own column, so the number of distinct bigrams (63,644) is much larger than the number of distinct unigrams (6,110). The number of rows (717) is unchanged because there is still one row per press release. 
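The growth in vocabulary can be seen on a tiny scale. The sketch below uses three made-up documents built from the same four stems and compares the number of distinct unigrams with the number of distinct bigrams; the documents are hypothetical and `bigrams` simply mirrors the `zip` approach used in part A.

```python
# Made-up documents that all reuse the same four stems in different orders
docs = [
    "settlement agreement citi miami",
    "citi settlement miami agreement",
    "miami citi agreement settlement",
]

def bigrams(text):
    """Join each consecutive pair of words with an underscore."""
    words = text.split()
    return ["_".join(pair) for pair in zip(words, words[1:])]

unigram_vocab = {w for d in docs for w in d.split()}
bigram_vocab = {b for d in docs for b in bigrams(d)}

print("distinct unigrams:", len(unigram_vocab))
print("distinct bigrams:", len(bigram_vocab))
```

Word order changes the bigram but not the unigram, which is why reordering the same stems keeps adding new bigram columns.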

In [568]:
for topic in dtm_bigram['topics_clean'].unique():
    topic_releases = dtm_bigram[dtm_bigram['topics_clean'] == topic]
    display_top_words(topic_releases, f"press releases for topic '{topic}'")


Top 10 words in press releases for topic 'Civil Rights':
fair_hous         225
deputi_general    221
princip_deputi    221
vanita_head       200
general_vanita    199
said_princip      186
announc_today     116
unit_state        106
consent_decre      92
act_general        80
dtype: int64

Top 10 words in press releases for topic 'Project Safe Childhood':
project_safe         472
child_exploit        266
child_pornographi    237
sexual_exploit       218
plead_guilti         188
child_sexual         177
exploit_obscen       175
safe_childhood       161
obscen_section       161
individu_exploit     156
dtype: int64

Top 10 words in press releases for topic 'Hate Crimes':
hate_crime       299
plead_guilti     269
special_agent    117
year_prison      104
thoma_general     95
grand_juri        93
said_thoma        91
act_general       85
face_maximum      77
feder_hate        76
dtype: int64


# 4. Optional extra credit (2 points)

You notice that the pharmaceutical kickbacks press release we analyzed in question 1 was for an indictment, and that in the original data, there's not a clear label for whether a press release outlines an indictment (charging someone with a crime), a conviction (convicting them after that charge either via a settlement or trial), or a sentencing (how many years of prison or supervised release a defendant is sentenced to after their conviction).

You want to see if you can identify pairs of press releases where one press release is from one stage (e.g., indictment) and another is from a different stage (e.g., a sentencing).

You decide that one way to approach this is to find the pairwise string similarity between each of the processed press releases in `doj_subset`. There are many ways to do this, so Google for some approaches, focusing on ones that work well for entire documents rather than small strings.

Find the top two pairs (so four press releases total)-- do they seem like different stages of the same crime or just press releases covering similar crimes?

In [None]:
# your code here 
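One document-level approach that could fit here is TF-IDF weighting followed by cosine similarity. The sketch below implements it with the standard library on made-up, already-preprocessed toy documents; the ids, texts, and helper names are hypothetical, and the smoothed IDF formula is one common choice rather than the only one.

```python
import math
from collections import Counter
from itertools import combinations

# Made-up, already-preprocessed toy documents (ids are hypothetical)
docs = {
    "r1": "defend indict charg fraud kickback pharmaci",
    "r2": "defend sentenc prison fraud kickback pharmaci",
    "r3": "polic agreement settlement citi reform",
    "r4": "child exploit sentenc prison",
}

def tfidf_vectors(docs):
    """Map each doc id to a {term: tf * idf} dict using a smoothed idf."""
    n_docs = len(docs)
    tokenized = {doc_id: text.split() for doc_id, text in docs.items()}
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    vectors = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[doc_id] = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in tf.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = tfidf_vectors(docs)
# Score every unordered pair of documents, most similar first
scored = sorted(
    ((cosine(vectors[a], vectors[b]), a, b) for a, b in combinations(docs, 2)),
    reverse=True,
)
for score, a, b in scored[:2]:
    print(f"{a} vs {b}: {score:.3f}")
```

On the real data, scikit-learn's `TfidfVectorizer` together with `sklearn.metrics.pairwise.cosine_similarity` would likely be a more convenient route than hand-rolled vectors. Either way, a high score only says two press releases share vocabulary; reading the top pairs is still needed to tell different stages of the same case apart from similar but unrelated crimes.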