# Court Data Extractor Demo

This notebook demonstrates the four extractor modules of the Court Data Extractor package: articles, gender, municipality, and punishments.

## Setup

First, let's import all necessary modules and create instances of our extractors.

In [1]:
import pandas as pd
from src.articles import ArticlesExtractor
from src.gender import GenderExtractor
from src.districts import MunicipalityExtractor
from src.punishments import PunishmentExtractor

# Initialize extractors
articles_extractor = ArticlesExtractor(remove_duplicates=True)
gender_extractor = GenderExtractor(russian_names_db=False)
municipality_extractor = MunicipalityExtractor()
punishment_extractor = PunishmentExtractor()

## 1. Articles Extractor

The Articles Extractor parses legal articles, their parts, and subparts from court decision texts, and labels each result with the code type (for example, `CRIMINAL` or `ADMIN`).

In [2]:
# Example 1: Process single string
test_string = "Губаев Борис Магомедович - ст.159 ч.2 УК РФ"
result = articles_extractor.process_string(test_string)
print("Single string processing result:")
print(result)

# Example 2: Process DataFrame
data = {
    'text_column': [
        "Губаев Борис Магомедович - ст.159 ч.2 УК РФ",
        "ст. 20.1 КоАП; ст. 19.3 ч.1 КоАП",
        "ст. 105 ч.1 п.а УК; ст. 111 ч.2 УК"
    ]
}
df = pd.DataFrame(data)

results = articles_extractor.process_dataframe(df, 'text_column', parallel=True, n_workers=2)
print("\nDataFrame processing results:")
for i, row_result in enumerate(results, start=1):
    print(f"\nRow {i}:")
    print(row_result)

Single string processing result:
[{'person': 'Person 1', 'articles': [{'article': '159', 'part': '2', 'subpart': None}], 'code_type': 'CRIMINAL'}]

DataFrame processing results:

Row 1:
[{'person': 'Person 1', 'articles': [{'article': '159', 'part': '2', 'subpart': None}], 'code_type': 'CRIMINAL'}]

Row 2:
[{'person': 'Person 1', 'articles': [{'article': '20.1', 'part': None, 'subpart': None}, {'article': '19.3', 'part': '1', 'subpart': None}], 'code_type': 'ADMIN'}]

Row 3:
[{'person': 'Person 1', 'articles': [{'article': '105', 'part': '1', 'subpart': ['а']}, {'article': '111', 'part': '2', 'subpart': None}], 'code_type': 'CRIMINAL'}]
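
The nested output above is awkward to filter or join, so a flat layout with one record per article may be easier to work with. The following is a minimal sketch that assumes the result structure is exactly as printed above; `flatten_articles` and the `row` key are hypothetical names introduced for illustration and are not part of the package.

```python
# Sample records copied from the DataFrame processing output shown above.
results = [
    [{'person': 'Person 1', 'articles': [{'article': '159', 'part': '2', 'subpart': None}], 'code_type': 'CRIMINAL'}],
    [{'person': 'Person 1', 'articles': [{'article': '20.1', 'part': None, 'subpart': None},
                                         {'article': '19.3', 'part': '1', 'subpart': None}], 'code_type': 'ADMIN'}],
    [{'person': 'Person 1', 'articles': [{'article': '105', 'part': '1', 'subpart': ['а']},
                                         {'article': '111', 'part': '2', 'subpart': None}], 'code_type': 'CRIMINAL'}],
]


def flatten_articles(results):
    """Hypothetical helper: turn the nested per-row output into one flat dict per article."""
    rows = []
    for row_index, persons in enumerate(results):
        for person in persons:
            for art in person['articles']:
                rows.append({
                    'row': row_index,  # position of the source text in the input
                    'person': person['person'],
                    'code_type': person['code_type'],
                    'article': art['article'],
                    'part': art['part'],
                    # subpart is None or a list of letters; join it into one string
                    'subpart': ','.join(art['subpart']) if art['subpart'] else None,
                })
    return rows


flat = flatten_articles(results)
for record in flat:
    print(record)
```

A list of flat dicts like this can be passed directly to `pd.DataFrame(flat)` if a tabular view is needed.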


## 2. Gender Extractor

The Gender Extractor infers the gender of defendants from the full names found in a text and returns a list of `(name, gender)` pairs.

In [3]:
# Example: Extract gender from text
text = "Волостных Владислав Витальевич - ст.291 ч.3; ст.222 ч.1; ст.290 ч.5 п.в; ст.290 ч.5 п.в; ст.290 ч.5 п.в УК РФ"
result = gender_extractor.extract_genders(text)
print("Gender extraction result:")
print(result)

# Note: Results show (name, gender) where gender can be:
# M - male
# F - female
# U - undefined
# C - contradiction

Gender extraction result:
[('Волостных Владислав Витальевич', 'M')]
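
When many texts are processed, the single-letter codes are usually summarized rather than read one by one. Here is a small sketch that maps the codes listed in the note above to readable labels and counts them. It assumes the `(name, gender)` tuple shape shown in the output; `GENDER_LABELS` and `summarize_genders` are hypothetical names, and every entry except the first is made-up placeholder data.

```python
from collections import Counter

# Labels taken from the note in the cell above.
GENDER_LABELS = {'M': 'male', 'F': 'female', 'U': 'undefined', 'C': 'contradiction'}

pairs = [
    ('Волостных Владислав Витальевич', 'M'),  # from the output above
    ('Hypothetical Person A', 'F'),           # placeholder, not real output
    ('Hypothetical Person B', 'U'),           # placeholder, not real output
]


def summarize_genders(pairs):
    """Hypothetical helper: count how often each gender label occurs."""
    return Counter(GENDER_LABELS.get(code, 'unknown code') for _, code in pairs)


summary = summarize_genders(pairs)
for label, count in summary.items():
    print(f"{label}: {count}")
```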


## 3. Municipality Extractor

The Municipality Extractor maps a court code to its region, municipality, and OKTMO code.

In [4]:
# Example 1: Get municipality for single court code
court_code = "61RS0006"
region, municipality, oktmo = municipality_extractor.get_municipality(court_code)
print(f"Court code: {court_code}")
print(f"Region: {region}")
print(f"Municipality: {municipality}")
print(f"OKTMO: {oktmo}")

# Example 2: Process DataFrame with court codes
data = {
    'court_code': ['61RS0006', '61RS0007', '61RS0008']
}
df = pd.DataFrame(data)
df = municipality_extractor.process_dataframe(df, 'court_code')
print("\nDataFrame processing result:")
print(df)

Court code: 61RS0006
Region: Ростовская
Municipality: Городской округ Город Ростов-на-Дону
OKTMO: 60701000

DataFrame processing result:
  court_code      region                          municipality     oktmo
0   61RS0006  Ростовская  Городской округ Город Ростов-на-Дону  60701000
1   61RS0007  Ростовская  Городской округ Город Ростов-на-Дону  60701000
2   61RS0008  Ростовская  Городской округ Город Ростов-на-Дону  60701000
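
Before looking up a large column of codes, it can help to check that each value at least has the expected shape. The sketch below assumes that court codes follow the pattern seen in the three examples above (two digits, two Latin letters, four digits); that pattern is inferred from the examples only, and `split_court_code` and its field names are hypothetical, not part of the package.

```python
import re

# Assumed shape, inferred from the examples above: 2 digits + 2 letters + 4 digits.
COURT_CODE_RE = re.compile(r'(?P<region>\d{2})(?P<court_type>[A-Z]{2})(?P<number>\d{4})')


def split_court_code(code):
    """Hypothetical helper: return the code's parts, or None if the shape does not match."""
    match = COURT_CODE_RE.fullmatch(code.strip().upper())
    if match is None:
        return None
    return match.groupdict()


for code in ['61RS0006', '61RS0007', 'not-a-code']:
    print(code, '->', split_court_code(code))
```

Values for which the helper returns `None` can be set aside for manual review instead of being passed to the extractor.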


## 4. Punishment Extractor

The Punishment Extractor extracts structured information about punishments from court decision texts and adds it to the DataFrame as a `punishments` column.

In [None]:
# Example: Process DataFrame with court decisions
# Note: This requires a DataFrame with 'result_text' and 'defendants_simple' columns
# and a valid punishments.yaml configuration file

# Sample data
data = {
    'result_text': [
        "Приговор: Иванов Иван Иванович приговорен к лишению свободы сроком на 5 лет",
        "Приговор: Петров Петр Петрович приговорен к штрафу в размере 100000 рублей"
    ],
    'defendants_simple': [
        "Иванов Иван Иванович",
        "Петров Петр Петрович"
    ]
}
df = pd.DataFrame(data)

# Process the DataFrame
df = punishment_extractor.process_dataframe(df)
print("Punishment extraction results:")
print(df[['defendants_simple', 'punishments']])
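
To give a rough idea of what "structured information" can mean for the two sample texts, here is a standalone regex sketch. It is not the package's method, which is driven by `punishments.yaml`, and the output keys `imprisonment_years` and `fine_rubles` as well as the function `sketch_extract` are hypothetical names chosen for illustration.

```python
import re

# Illustrative patterns only; they cover just the two phrasings in the sample data.
PATTERNS = {
    'imprisonment_years': re.compile(r'лишению свободы сроком на (\d+) (?:год|года|лет)'),
    'fine_rubles': re.compile(r'штрафу в размере (\d+) рублей'),
}


def sketch_extract(text):
    """Hypothetical helper: return the numeric value for each pattern found in text."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = int(match.group(1))
    return found


samples = [
    "Приговор: Иванов Иван Иванович приговорен к лишению свободы сроком на 5 лет",
    "Приговор: Петров Петр Петрович приговорен к штрафу в размере 100000 рублей",
]
for text in samples:
    print(sketch_extract(text))
```

Real decision texts vary far more in wording than these two samples, which is presumably why the package relies on a configuration file rather than hard-coded patterns.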