# Monthly Dacon Court Judgment Prediction AI Competition
Given a case between a first party and a second party, predict whether the first party wins.
https://dacon.io/competitions/official/236112/overview/description
- ID : case sample ID
- first_party : first party to the case
- second_party : second party to the case
- facts : facts of the case
- first_party_winner : whether the first party won (0 : lost, 1 : won)
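The column list above can be written as a typed record. This is a minimal sketch that assumes the field types (strings for the text columns, an int label). `CaseRecord` is a hypothetical name and is not part of the competition files. The sample values come from row TRAIN_0003, which appears later in this notebook.

```python
from dataclasses import dataclass

@dataclass
class CaseRecord:
    """One row of train.csv; field types are assumed from the column descriptions."""
    ID: str
    first_party: str
    second_party: str
    facts: str
    first_party_winner: int  # 0 = first party lost, 1 = first party won

    def __post_init__(self):
        # Reject labels outside the documented {0, 1} range
        if self.first_party_winner not in (0, 1):
            raise ValueError(f"unexpected label: {self.first_party_winner}")

row = CaseRecord(
    ID="TRAIN_0003",
    first_party="Linkletter",
    second_party="Walker",
    facts="Victor Linkletter was convicted in state court on evidence illegally "
          "obtained by police prior to the Supreme Court decision concerning the "
          "Fourth Amendment in Mapp v. Ohio.",
    first_party_winner=0,
)
print(row.first_party_winner)  # 0: the first party (Linkletter) lost
```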

+++ For comparison, see:
https://dacon.io/competitions/official/235671/codeshare/2006?page=1&dtype=recent

In [None]:
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings(action='ignore')

In [None]:
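import nltk # sentence tokenizer
nltk.download('punkt')  # NLTK 3.9+ looks up 'punkt_tab' instead; download that too if sent_tokenize raises LookupError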

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [None]:
train = pd.read_csv("train.csv")
train.head(50)

Unnamed: 0,ID,first_party,second_party,facts,first_party_winner
0,TRAIN_0000,Phil A. St. Amant,Herman A. Thompson,"On June 27, 1962, Phil St. Amant, a candidate ...",1
1,TRAIN_0001,Stephen Duncan,Lawrence Owens,Ramon Nelson was riding his bike when he suffe...,0
2,TRAIN_0002,Billy Joe Magwood,"Tony Patterson, Warden, et al.",An Alabama state court convicted Billy Joe Mag...,1
3,TRAIN_0003,Linkletter,Walker,Victor Linkletter was convicted in state court...,0
4,TRAIN_0004,William Earl Fikes,Alabama,"On April 24, 1953 in Selma, Alabama, an intrud...",1
5,TRAIN_0005,"C & A Carbone, Inc., et al.",Town of Clarkstown,"A New York town, Clarkstown, allowed a contrac...",1
6,TRAIN_0006,"David Jennings, et al.","Alejandro Rodriguez, et al.",Sections of the Immigration and Nationality Ac...,1
7,TRAIN_0007,"US Airways, Inc.",Barnett,"In 1990, Robert Barnett injured his back while...",1
8,TRAIN_0008,"Ron Davis, Acting Warden",Hector Ayala,"Hector Ayala, a Hispanic man, was charged with...",1
9,TRAIN_0009,Paul A. McDaniel,"Selma Cash Paty, et al.","Since its first state Constitution in 1796, Te...",1


In [None]:
# Different parties that share a name,
# or repeated records of the same party?
len(train.first_party.unique()),len(train.second_party.unique())

(2110, 1974)
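Both counts are well below the 2478 rows in the training set, so many names repeat. The sketch below inspects that repetition on hypothetical toy columns. With the real data, `train.first_party.value_counts()` gives the same per-column counts. Exact string matching cannot separate different parties that share a name from the same party, and it does not merge variants of one name (for example `Phil A. St. Amant` and `St. Amant`). Those cases need normalization or manual review.

```python
from collections import Counter

def name_overlap(first_parties, second_parties):
    """Return names repeated within each column and names found in both columns."""
    first_counts = Counter(first_parties)
    second_counts = Counter(second_parties)
    repeated_first = {name: c for name, c in first_counts.items() if c > 1}
    repeated_second = {name: c for name, c in second_counts.items() if c > 1}
    # Exact-string intersection: parties that appear on either side in different cases
    both_sides = set(first_counts) & set(second_counts)
    return repeated_first, repeated_second, both_sides

# Hypothetical toy columns, not taken from train.csv
first = ["United States", "Alabama", "United States", "Nix"]
second = ["Walker", "United States", "Williams", "Alabama"]
rep1, rep2, both = name_overlap(first, second)
print(rep1)          # {'United States': 2}
print(rep2)          # {}
print(sorted(both))  # ['Alabama', 'United States']
```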

A quick look at the basic contents    
=> to see whether the cases need to be separated by crime type
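Keyword tagging is one lightweight way to check whether cases should be grouped by crime type. The sketch below uses a hypothetical keyword map built only from the cases inspected in this notebook. It is not an official taxonomy. A case can receive several tags. Because matching is whole-word, `murder` does not match `murdered`, so the keyword lists would need to be extended.

```python
import re

# Hypothetical keyword map, drawn from the cases inspected in this notebook
CASE_KEYWORDS = {
    "defamation": ["defamation"],
    "murder": ["murder"],
    "illegal_evidence": ["illegally obtained"],
    "intrusion": ["intruder"],
    "ordinance": ["ordinance"],
}

def tag_case_types(facts: str) -> list:
    """Return every category whose keywords occur in the text (case-insensitive, whole words)."""
    tags = []
    for label, keywords in CASE_KEYWORDS.items():
        for kw in keywords:
            if re.search(r"\b" + re.escape(kw) + r"\b", facts, flags=re.IGNORECASE):
                tags.append(label)
                break  # one hit is enough for this category
    return tags

text = "Victor Linkletter was convicted in state court on evidence illegally obtained by police"
print(tag_case_types(text))  # ['illegal_evidence']
```

On the DataFrame, this could be applied as `train['facts'].apply(tag_case_types)`.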

In [None]:
train.facts[0] # defamation

'On June 27, 1962, Phil St. Amant, a candidate for public office, made a television speech in Baton Rouge, Louisiana.  During this speech, St. Amant accused his political opponent of being a Communist and of being involved in criminal activities with the head of the local Teamsters Union.  Finally, St. Amant implicated Herman Thompson, an East Baton Rouge deputy sheriff, in a scheme to move money between the Teamsters Union and St. Amant’s political opponent. \r\nThompson successfully sued St. Amant for defamation.  Louisiana’s First Circuit Court of Appeals reversed, holding that Thompson did not show St. Amant acted with “malice.”  Thompson then appealed to the Supreme Court of Louisiana.  That court held that, although public figures forfeit some of their First Amendment protection from defamation, St. Amant accused Thompson of a crime with utter disregard of whether the remarks were true.  Finally, that court held that the First Amendment protects uninhibited, robust debate, rather

In [None]:
train[train.facts.str.contains('defamation')]

Unnamed: 0,ID,first_party,second_party,facts,first_party_winner
0,TRAIN_0000,Phil A. St. Amant,Herman A. Thompson,"On June 27, 1962, Phil St. Amant, a candidate ...",1
104,TRAIN_0104,Jim Garrison,Louisiana,"On November 2, 1962, Jim Garrison, the Distric...",1
296,TRAIN_0296,Air Wisconsin Airlines Corporation,William L. Hoeper,Section 125 of the Aviation Transportation Saf...,1
390,TRAIN_0390,Michael Milkovich,"Lorain Journal Co., The News Herald, J. Theodo...","Michael Milkovich, Maple Heights High School’s...",1
829,TRAIN_0829,Albert Snyder,"Fred W. Phelps, Sr., et al.",The family of deceased Marine Lance Cpl. Matth...,0
1601,TRAIN_1601,Farmers Educational and Cooperative Union of A...,"WDAY, Inc.","The radio and television station WDAY, Inc. br...",0
1783,TRAIN_1783,"Old Dominion Branch No. 496, National Associat...","Henry M. Austin, et al.","In the spring of 1970, Old Dominion Branch No....",1
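By default, `str.contains` is case-sensitive and matches substrings, so a filter such as `contains('defamation')` depends on exact casing and spelling. The following sketch runs on toy sentences (not from train.csv). It shows the `case=False` option and a word-boundary regex.

```python
import pandas as pd

# Toy sentences for illustration only
facts = pd.Series([
    "The newspaper was sued for defamation.",
    "Defamation was the only claim.",
    "The jury convicted him of murder.",
    "She was charged with murdering her partner.",
])

# Default: case-sensitive substring match, so "Defamation" is missed
print(facts.str.contains("defamation").tolist())              # [True, False, False, False]
# case=False also catches the capitalized form
print(facts.str.contains("defamation", case=False).tolist())  # [True, True, False, False]
# Substring matching also catches inflected forms such as "murdering"
print(facts.str.contains("murder").tolist())                  # [False, False, True, True]
# Word boundaries restrict the match to the exact word
print(facts.str.contains(r"\bmurder\b").tolist())             # [False, False, True, False]
```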


In [None]:
train.facts[1] # murder charge

'Ramon Nelson was riding his bike when he suffered a lethal blow to the back of his head with a baseball bat. After two eyewitnesses identified Lawrence Owens from an array of photos and then a lineup, he was tried and convicted for Nelson’s death. Because Nelson was carrying cocaine and crack cocaine potentially for distribution, the judge at Owens’ bench trial ruled that Owens was probably also a drug dealer and was trying to “knock [Nelson] off.” Owens was found guilty of first-degree murder and sentenced to 25 years in prison.\r\nOwens filed a petition for a writ of habeas corpus on the grounds that his constitutional right to due process was violated during the trial. He argued that the eyewitness identification should have been inadmissible based on unreliability and that the judge impermissibly inferred a motive when a motive was not an element of the offense. The district court denied the writ of habeas corpus, and Owens appealed. The U.S. Court of Appeals for the Seventh Circu

In [None]:
train[train.facts.str.contains('murder')]

Unnamed: 0,ID,first_party,second_party,facts,first_party_winner
1,TRAIN_0001,Stephen Duncan,Lawrence Owens,Ramon Nelson was riding his bike when he suffe...,0
2,TRAIN_0002,Billy Joe Magwood,"Tony Patterson, Warden, et al.",An Alabama state court convicted Billy Joe Mag...,1
8,TRAIN_0008,"Ron Davis, Acting Warden",Hector Ayala,"Hector Ayala, a Hispanic man, was charged with...",1
36,TRAIN_0036,Nix,Williams,Williams was arrested for the murder of a ten-...,1
54,TRAIN_0054,Martinez,California,Richard Thomas was convicted of attempted murd...,0
...,...,...,...,...,...
2423,TRAIN_2423,Payton,New York,New York City police suspected Theodore Payton...,1
2429,TRAIN_2429,"Charles L. Ryan, Director, Arizona Dept. of Co...",Edward Harold Schad,"In 1985, an Arizona jury convicted Edward Scha...",0
2436,TRAIN_2436,"David Bobby, Warden",Michael Bies,"In 1992, Michael Bies was convicted of kidnapp...",1
2461,TRAIN_2461,Cicenia,Lagay,"During police interrogation for a murder, Cice...",0


In [None]:
train[train.facts.str.contains('murder')]['first_party_winner'].value_counts()

1    184
0     57
Name: first_party_winner, dtype: int64
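Raw counts are easier to compare as a rate. Based on the counts in the output above, the first party won 184 of the 241 murder-related cases. This is a minimal sketch. Whether the resulting rate is high or low depends on the overall first-party win rate, which this notebook has not computed. With pandas, `train['first_party_winner'].mean()` would give it.

```python
def win_rate(wins, losses):
    """Fraction of cases in which the first party won."""
    total = wins + losses
    if total == 0:
        raise ValueError("no cases to compute a rate from")
    return wins / total

# Counts from the value_counts() output above (label 1: 184, label 0: 57)
murder_rate = win_rate(wins=184, losses=57)
print(f"{murder_rate:.3f}")  # 0.763

# With the DataFrame, the same rate comes directly from the label mean:
# train.loc[train.facts.str.contains('murder'), 'first_party_winner'].mean()
```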

In [None]:
train.facts[3] # use of illegally obtained evidence

'Victor Linkletter was convicted in state court on evidence illegally obtained by police prior to the Supreme Court decision concerning the Fourth Amendment in Mapp v. Ohio. Mapp applied the exclusionary rule to state criminal proceedings, denying the use of illegally obtained evidence at trial. Linkletter argued for a retrial based on the Mapp decision.\r\n'

In [None]:
train.facts[4] # (home) break-in

'On April 24, 1953 in Selma, Alabama, an intruder broke into the apartment of the daughter of the city mayor. The daughter and the intruder struggled through several rooms until she was able to seize his knife, and he fled. The assailant had a towel over his head, so the victim could not identify the defendant during the trial. The police apprehended William Earl Fikes on the basis of a call from a private citizen and held him “on an open charge of investigation.” The police questioned Fikes for hours, placed him in jail, and limited his access to anyone familiar. After nearly a week of this treatment, Fikes confessed in the form of answers to the interrogator’s leading questions. Five days later, Fikes confessed under questioning a second time. When these confessions were admitted into the trial as evidence, Fikes did not testify regarding the events surrounding his interrogation because the judge had ruled he would be subjected to unlimited cross-examination. The jury convicted Fikes

In [None]:
train[train.facts.str.contains('intruder')]

Unnamed: 0,ID,first_party,second_party,facts,first_party_winner
4,TRAIN_0004,William Earl Fikes,Alabama,"On April 24, 1953 in Selma, Alabama, an intrud...",1
1274,TRAIN_1274,Willie Mae Barker,"John W. Wingo, Warden","On July 20, 1958, intruders beat an elderly co...",0


In [None]:
train.facts[5] # ordinance violation

'A New York town, Clarkstown, allowed a contractor to construct and operate a waste processing plant within town limits. The revenue from the plant would help compensate the contractor. Clarkstown promised that the plant would receive 120,000 tons of solid waste each year, and permitted the contractor to charge an $81 "tipping fee" for each ton received. To meet the 120,000 ton quota, Clarkstown adopted a "flow control ordinance." The ordinance required that all solid waste flowing into and out of the town pass through the new plant. C & A Carbone, Inc. operated a similar plant within the town. To avoid paying the $81 fee, Carbone trucked processed waste directly to an Indiana landfill. In 1991, a Carbone truck carrying illegal waste crashed and police discovered that Carbone was violating the ordinance. Clarkstown sued Carbone in a New York Supreme Court. Carbone responded by suing Clarkstown in a federal District Court, claiming that the ordinance violated the Commerce Clause by disr

In [None]:
train.facts[6]

'Sections of the Immigration and Nationality Act require that noncitizens who are determined to be inadmissible to the United States must be detained during removal proceedings, though some may be released on bond if they can demonstrate that they are not a flight risk or a danger to the community. Alejandro Rodriguez and other detained noncitizens sued and argued that their prolonged detention without hearings and determinations to justify the detentions violated their due process rights. After litigation regarding class certification, the district court granted a preliminary injunction that required the government to provide each detainee with a bond hearing and to release that detainee unless the government could show, by clear and convincing evidence, that continued detention was justified. The U.S. Court of Appeals for the Ninth Circuit held that prolonged detention without a hearing raised serious constitutional concerns, and therefore that the relevant mandatory statutory langua

## Splitting facts into sentence lists   
● Sentence Tokenization   
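Text is split on periods, exclamation marks, question marks, and similar punctuation.      
A period does not always end a sentence. It also appears mid-sentence in abbreviations such as Mr. and Dr. For English text, NLTK's tokenizer handles most of these cases well (for example, St. Amant in the first fact stays within one sentence), although it can still miss a boundary: in the first fact, the sentence ending in “malice.” is not split from the next one.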
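A deliberately naive splitter shows why abbreviations are hard. It breaks after every `.`, `!` or `?` that is followed by whitespace. The sketch pairs it with a hypothetical fix that uses a hand-picked abbreviation list. This is only an illustration: NLTK's Punkt tokenizer learns abbreviations from data, without supervision, rather than relying on a fixed list.

```python
import re

# Hypothetical, hand-picked abbreviation list; Punkt instead learns these from data
ABBREVIATIONS = {"mr.", "dr.", "st.", "v."}

def naive_split(text: str) -> list:
    """Split after . ! ? whenever whitespace follows, with no special cases."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_with_abbreviations(text: str) -> list:
    """Like naive_split, but glue a piece back on if the previous one ends in a known abbreviation."""
    sentences = []
    for piece in naive_split(text):
        if sentences and sentences[-1].split()[-1].lower() in ABBREVIATIONS:
            sentences[-1] += " " + piece
        else:
            sentences.append(piece)
    return sentences

text = "Phil St. Amant gave a speech. Dr. Smith listened."
print(naive_split(text))
# ['Phil St.', 'Amant gave a speech.', 'Dr.', 'Smith listened.']
print(split_with_abbreviations(text))
# ['Phil St. Amant gave a speech.', 'Dr. Smith listened.']
```

The fixed-list approach fails in the opposite direction as well: it wrongly merges two sentences whenever a real sentence happens to end with a listed abbreviation.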

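- Building a good vocabulary directly determines tokenization quality.
- It also affects how well the model learns contextual knowledge and, in turn, how well it performs on downstream tasks.
- The vocabulary should therefore be inspected carefully before model training begins.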
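A word-level vocabulary makes the points above concrete. This minimal sketch assumes a lowercase regex tokenizer, a `min_freq` cutoff, and `<pad>`/`<unk>` special tokens. All three are illustrative choices; pretrained models ship their own subword vocabularies. Printing the rarest tokens is one quick way to inspect a vocabulary before training. In these facts, curly quotes such as the `’` in `Louisiana’s` would show up as separate tokens.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+|[^\w\s]")  # a run of word characters, or a single punctuation mark

def tokenize(sentence):
    return TOKEN_RE.findall(sentence.lower())

def build_vocab(sentences, min_freq=2, specials=("<pad>", "<unk>")):
    """Give ids to special tokens first, then to tokens seen at least min_freq times."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, freq in counts.most_common():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab, counts

def encode(sentence, vocab):
    """Map each token to its id, falling back to <unk> for out-of-vocabulary tokens."""
    unk_id = vocab["<unk>"]
    return [vocab.get(tok, unk_id) for tok in tokenize(sentence)]

# Toy sentences for illustration only
sentences = ["The court reversed.", "The court affirmed the judgment."]
vocab, counts = build_vocab(sentences)
print(vocab)                              # {'<pad>': 0, '<unk>': 1, 'the': 2, 'court': 3, '.': 4}
print(encode("The court ruled.", vocab))  # [2, 3, 1, 4]
print(counts.most_common()[-3:])          # the rarest tokens, worth a manual look
```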

In [None]:
train.facts

0       On June 27, 1962, Phil St. Amant, a candidate ...
1       Ramon Nelson was riding his bike when he suffe...
2       An Alabama state court convicted Billy Joe Mag...
3       Victor Linkletter was convicted in state court...
4       On April 24, 1953 in Selma, Alabama, an intrud...
                              ...                        
2473    Congress amended the Clean Air Act through the...
2474    Alliance Bond Fund, Inc., an investment fund, ...
2475    In 1992, the District Court sentenced Manuel D...
2476    On March 8, 1996, Enrico St. Cyr, a lawful per...
2477    Herbert Markman owns the patent to a system th...
Name: facts, Length: 2478, dtype: object

In [None]:
# Split the text into a list of sentences
text = "I am a college student. I'm 23 years old. I like to read books."
sentences = nltk.sent_tokenize(text)
print(sentences)

['I am a college student.', "I'm 23 years old.", 'I like to read books.']


In [None]:
train.facts[0]

'On June 27, 1962, Phil St. Amant, a candidate for public office, made a television speech in Baton Rouge, Louisiana.  During this speech, St. Amant accused his political opponent of being a Communist and of being involved in criminal activities with the head of the local Teamsters Union.  Finally, St. Amant implicated Herman Thompson, an East Baton Rouge deputy sheriff, in a scheme to move money between the Teamsters Union and St. Amant’s political opponent. \r\nThompson successfully sued St. Amant for defamation.  Louisiana’s First Circuit Court of Appeals reversed, holding that Thompson did not show St. Amant acted with “malice.”  Thompson then appealed to the Supreme Court of Louisiana.  That court held that, although public figures forfeit some of their First Amendment protection from defamation, St. Amant accused Thompson of a crime with utter disregard of whether the remarks were true.  Finally, that court held that the First Amendment protects uninhibited, robust debate, rather

In [None]:
train.facts[0] = nltk.sent_tokenize(train.facts[0])
train.facts[0]

['On June 27, 1962, Phil St. Amant, a candidate for public office, made a television speech in Baton Rouge, Louisiana.',
 'During this speech, St. Amant accused his political opponent of being a Communist and of being involved in criminal activities with the head of the local Teamsters Union.',
 'Finally, St. Amant implicated Herman Thompson, an East Baton Rouge deputy sheriff, in a scheme to move money between the Teamsters Union and St. Amant’s political opponent.',
 'Thompson successfully sued St. Amant for defamation.',
 'Louisiana’s First Circuit Court of Appeals reversed, holding that Thompson did not show St. Amant acted with “malice.”  Thompson then appealed to the Supreme Court of Louisiana.',
 'That court held that, although public figures forfeit some of their First Amendment protection from defamation, St. Amant accused Thompson of a crime with utter disregard of whether the remarks were true.',
 'Finally, that court held that the First Amendment protects uninhibited, robus
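In the output above, the fifth element still holds two sentences. Punkt did not split after “malice.” because the period sits inside a curly closing quote. The sketch below is a minimal post-processing step built on a heuristic rule (not part of NLTK): split wherever `.`, `!` or `?` plus a closing quote is followed by whitespace and an uppercase letter.

```python
import re

# Heuristic rule: a sentence ends inside closing quotes, as in ...acted with “malice.”  Thompson...
QUOTE_BOUNDARY = re.compile(r'(?<=[.!?][”"’])\s+(?=[A-Z])')

def split_quote_boundaries(sentences):
    """Further split each sentence wherever the quote-boundary pattern matches."""
    out = []
    for s in sentences:
        out.extend(part for part in QUOTE_BOUNDARY.split(s) if part)
    return out

sents = ['Louisiana’s First Circuit Court of Appeals reversed, holding that Thompson '
         'did not show St. Amant acted with “malice.”  Thompson then appealed to the '
         'Supreme Court of Louisiana.']
for s in split_quote_boundaries(sents):
    print(s)
```

It could be applied after tokenization, for example with `train['facts'].apply(split_quote_boundaries)`. The rule can also produce false splits, so the results should be checked on a sample.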

In [None]:
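# Tokenize every row (2478 in total); rows that are already lists, like row 0 above, are left unchanged
train['facts'] = train['facts'].apply(lambda x: x if isinstance(x, list) else nltk.sent_tokenize(x))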

In [None]:
train

Unnamed: 0,ID,first_party,second_party,facts,first_party_winner
0,TRAIN_0000,Phil A. St. Amant,Herman A. Thompson,"[On June 27, 1962, Phil St. Amant, a candidate...",1
1,TRAIN_0001,Stephen Duncan,Lawrence Owens,[Ramon Nelson was riding his bike when he suff...,0
2,TRAIN_0002,Billy Joe Magwood,"Tony Patterson, Warden, et al.",[An Alabama state court convicted Billy Joe Ma...,1
3,TRAIN_0003,Linkletter,Walker,[Victor Linkletter was convicted in state cour...,0
4,TRAIN_0004,William Earl Fikes,Alabama,"[On April 24, 1953 in Selma, Alabama, an intru...",1
...,...,...,...,...,...
2473,TRAIN_2473,"HollyFrontier Cheyenne Refining, LLC, et al.","Renewable Fuels Association, et al.",[Congress amended the Clean Air Act through th...,1
2474,TRAIN_2474,"Grupo Mexicano de Desarrollo, S. A.","Alliance Bond Fund, Inc.","[Alliance Bond Fund, Inc., an investment fund,...",1
2475,TRAIN_2475,Peguero,United States,"[In 1992, the District Court sentenced Manuel ...",0
2476,TRAIN_2476,Immigration and Naturalization Service,St. Cyr,"[On March 8, 1996, Enrico St. Cyr, a lawful pe...",0
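After tokenizing, it is worth confirming that every row is now a list and checking how many sentences each case contains. The sketch below runs on a hypothetical toy list. With the real data it would be called as `sentence_stats(train['facts'].tolist())`. A check like this would catch a loop that stops before the last row.

```python
from statistics import mean, median

def sentence_stats(tokenized_facts):
    """Report rows that are not lists yet, plus sentence-count statistics for the rest."""
    untokenized = [i for i, f in enumerate(tokenized_facts) if not isinstance(f, list)]
    counts = [len(f) for f in tokenized_facts if isinstance(f, list)]
    stats = {"untokenized_rows": untokenized}
    if counts:
        stats.update(min=min(counts), median=median(counts),
                     mean=mean(counts), max=max(counts))
    return stats

# Hypothetical toy column: row 2 was never tokenized
toy = [["A.", "B."], ["C."], "Not tokenized yet.", ["D.", "E.", "F."]]
print(sentence_stats(toy))
# {'untokenized_rows': [2], 'min': 1, 'median': 2, 'mean': 2, 'max': 3}
```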
