## Scrape of the List of Victims and of Victims Recognized Motu Proprio

<img src='https://web.archive.org/web/20220517114810im_/https://hrvvmemcom.gov.ph/wp-content/uploads/slider18/dividerblockbackground.jpg'>

The data was scraped from the website of the Human Rights Violations Victims' Memorial Commission (https://hrvvmemcom.gov.ph/list-of-victims-recognized-motu-proprio/; archived copy: https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/list-of-victims-recognized-motu-proprio/).

Because historical revisionism is likely to intensify under the regime of Bongbong Marcos Jr., it was decided to archive the website and scrape the names of the victims. The original intention was to archive the website of the Human Rights Victims' Claims Board, but that website is already defunct.

In [1]:
original_page = 'https://hrvvmemcom.gov.ph/list-of-victims-recognized-motu-proprio/'
mirror = 'https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/list-of-victims-recognized-motu-proprio/'
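The `mirror` URL follows the Wayback Machine snapshot pattern `https://web.archive.org/web/<timestamp>/<original URL>`, where the 14-digit timestamp reads `YYYYMMDDhhmmss`. A minimal sketch of building such a URL, assuming that pattern; `wayback_url` is a hypothetical helper, not part of any library:

```python
from datetime import datetime


def wayback_url(original: str, timestamp: str) -> str:
    """Build a Wayback Machine snapshot URL (hypothetical helper).

    Assumes the 14-digit YYYYMMDDhhmmss timestamp format used in `mirror`.
    """
    # Validate the timestamp; strptime raises ValueError if it is malformed.
    datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{timestamp}/{original}"


original_page = 'https://hrvvmemcom.gov.ph/list-of-victims-recognized-motu-proprio/'
mirror = wayback_url(original_page, '20220517114810')
print(mirror)
```

Pinning a specific timestamp keeps the scrape reproducible even if later snapshots of the page change.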

In [2]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from time import sleep
import re

# download (or reuse a cached copy of) the chromedriver matching the installed Chrome
service = Service(executable_path=ChromeDriverManager().install())



Current google-chrome version is 101.0.4951
Get LATEST chromedriver version for 101.0.4951 google-chrome
Driver [C:\Users\Vanessa\.wdm\drivers\chromedriver\win32\101.0.4951.41\chromedriver.exe] found in cache


In [4]:
# scrape the archived mirror rather than the live site
r = requests.get(mirror, timeout=60)
r.raise_for_status()
index_page = mirror

In [5]:
# create beautiful soup object
soup = BeautifulSoup(r.text, 'html.parser')

#### Point System

* 10 : killings and enforced disappearance
* 9 : torture (rape and forcible abduction)
* 8 : torture (mutilation, sexual abuse, children and minors involved)
* 7 : torture (psychological, mental, emotional harm, acts of lasciviousness)
* 6 : cruel, inhumane and degrading treatment
* 5 : arbitrary detention (> 6 months)
* 4 : arbitrary detention (15 days - 6 months)
* 3 : arbitrary detention (36 hours - 15 days)
* 2 : involuntary exile (violence and illegal takeover of business)
* 1 : involuntary exile (intimidation and physical injuries)
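To label the exported files, the point system above can be kept as a lookup table. A minimal sketch: the names `POINT_SYSTEM` and `describe` are hypothetical, and the descriptions are copied from the list above.

```python
# Point system from the list above, keyed by points (hypothetical name).
POINT_SYSTEM = {
    10: "killings and enforced disappearance",
    9: "torture (rape and forcible abduction)",
    8: "torture (mutilation, sexual abuse, children and minors involved)",
    7: "torture (psychological, mental, emotional harm, acts of lasciviousness)",
    6: "cruel, inhumane and degrading treatment",
    5: "arbitrary detention (> 6 months)",
    4: "arbitrary detention (15 days - 6 months)",
    3: "arbitrary detention (36 hours - 15 days)",
    2: "involuntary exile (violence and illegal takeover of business)",
    1: "involuntary exile (intimidation and physical injuries)",
}


def describe(points: int) -> str:
    """Return the violation category for a point value (hypothetical helper)."""
    if points not in POINT_SYSTEM:
        raise KeyError(f"no category for {points} points")
    return POINT_SYSTEM[points]


# Named victim lists exist for 10 to 6 points; only totals exist for 5 to 1.
NAMED_LEVELS = [p for p in POINT_SYSTEM if p >= 6]
print(NAMED_LEVELS)  # [10, 9, 8, 7, 6]
print(describe(6))   # cruel, inhumane and degrading treatment
```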

#### Get all pages assigned to each point level

The point system above follows Annex A, “Legal Guide on Definition of Human Rights Violations and Awarding of Points Under RA 10368”.

In [6]:
# find all links whose text is 'Click here to see names of victims'
links = soup.find_all('a', string='Click here to see names of victims')

for link in links:
    print(link['href'])

https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/10-points-5/
https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/9-points-2/
https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/8-points-2/
https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/7-points-2/
https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/6-points-2/
https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/work-in-progress/
https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/work-in-progress/
https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/work-in-progress/
https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/work-in-progress/
https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/work-in-progress/


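Trim the list to the first five links, the pages that actually list victims (10 to 6 points); the other five all point to a `work-in-progress` placeholder page.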

In [7]:
point_links = [link['href'] for link in links[0:5]]
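Slicing `links[0:5]` relies on the named lists always coming first. A hedged alternative, assuming the named-list pages keep URLs like `.../10-points-5/` while the others point to `.../work-in-progress/`, is to filter by URL and read the point value from it. `points_from_url` and `POINTS_RE` are hypothetical names; the sample hrefs are copied from the output above.

```python
import re

# Sample hrefs from the printed output (in the notebook they come from `links`).
hrefs = [
    "https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/10-points-5/",
    "https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/9-points-2/",
    "https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/8-points-2/",
    "https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/7-points-2/",
    "https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/6-points-2/",
    "https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/work-in-progress/",
    "https://web.archive.org/web/20220517114810/https://hrvvmemcom.gov.ph/work-in-progress/",
]

# Assumed URL shape: "hrvvmemcom.gov.ph/<points>-points[-<n>]/" at the end.
POINTS_RE = re.compile(r"hrvvmemcom\.gov\.ph/(\d+)-points(?:-\d+)?/?$")


def points_from_url(href):
    """Return the point value encoded in a list-page URL, or None."""
    match = POINTS_RE.search(href)
    return int(match.group(1)) if match else None


point_pages = {}
for href in hrefs:
    points = points_from_url(href)
    if points is not None:  # skips the work-in-progress placeholders
        point_pages[points] = href

print(sorted(point_pages, reverse=True))  # [10, 9, 8, 7, 6]
```

With such a mapping, the download loop further down could name its files from `points` directly instead of computing `10-index`.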

#### Get totals for 5 to 1 points

The site does not list victim names for cases worth 5 to 1 points (arbitrary detention to involuntary exile), so only the total number of victims for each of these point levels is retrieved.

In [8]:
# load the page in Chrome and give the JavaScript counter animations time to finish
chrome = webdriver.Chrome(service=service)
chrome.get(index_page)
sleep(30)

# the last five counters on the page hold the totals for 5 to 1 points
numbers_elements = chrome.find_elements(By.CLASS_NAME, 'n2-ss-item-counter-counting-div')
numbers = [number.text for number in numbers_elements][-5:]

chrome.quit()
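The fixed `sleep(30)` gives the counter animations time to finish. A hedged alternative is to poll until the counter texts stop changing between two reads. The sketch is generic: `read` is any zero-argument callable, and `wait_until_stable` is a hypothetical helper, not a Selenium API.

```python
import time


def wait_until_stable(read, interval=1.0, timeout=60.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll read() until two consecutive results are equal and non-empty.

    Returns the stable value or raises TimeoutError once `timeout` seconds
    have passed. `clock` and `sleep` can be swapped out for offline testing.
    """
    deadline = clock() + timeout
    previous = read()
    while True:
        sleep(interval)
        current = read()
        if current and current == previous:
            return current
        if clock() >= deadline:
            raise TimeoutError("values did not stop changing before the timeout")
        previous = current


# Offline demo with made-up counter frames that settle at "1,000".
frames = iter([["0"], ["500"], ["1,000"], ["1,000"]])
stable = wait_until_stable(lambda: next(frames), interval=0)
print(stable)  # ['1,000']
```

In the notebook, `read` could be a lambda wrapping the `chrome.find_elements(...)` call above. `interval` should be longer than any pause in the animation; otherwise an intermediate value could be mistaken for the final one.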

In [9]:
# build a table of victim counts for 5 to 1 points
detention_and_exile = pd.DataFrame({'Points': range(5, 0, -1), 'Victims': numbers})
# counters are strings with thousands separators; drop the commas before casting
detention_and_exile['Victims'] = detention_and_exile['Victims'].str.replace(',', '', regex=False).astype(int)

# export without the pandas index
detention_and_exile.to_csv('detentions_exile.csv', index=False)

#### On Motu Proprio Recognized Victims

Victims recognized motu proprio (Latin for "on its own initiative") suffered varying degrees of oppression but chose to forgo reparations. They were identified under Sections 18 and 26 of R.A. 10368 and Section 20 of the Implementing Rules and Regulations of R.A. 10368.

In [10]:
# collect every table row on the index page (the motu proprio list),
# remove the row numbers and save the names to a text file
mp_victims = soup.find_all('tr')
mp_victims_names = [re.sub(r'\d+', '', victim.text).strip() for victim in mp_victims]

with open('motu_proprio_victims.txt', 'w', encoding='utf-8') as f:
    for item in mp_victims_names:
        f.write(f'{item}\n')
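The cell above removes every digit in a row, which also strips the row index but would remove any other digits a row might contain. A narrower sketch, assuming each row's text starts with its index number, removes only that leading number. `strip_row_index` is a hypothetical helper, and "Juan Dela Cruz" is a placeholder name, not scraped data.

```python
import re

# Leading row number, optionally followed by a period, then any whitespace.
LEADING_INDEX = re.compile(r"^\s*\d+\.?\s*")


def strip_row_index(row_text: str) -> str:
    """Drop a leading row index and surrounding whitespace (hypothetical helper)."""
    return LEADING_INDEX.sub("", row_text, count=1).strip()


print(strip_row_index("12 Juan Dela Cruz"))       # Juan Dela Cruz
print(strip_row_index("  3.\nJuana Dela Cruz "))  # Juana Dela Cruz
print(strip_row_index("Juan Dela Cruz"))          # Juan Dela Cruz
```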

#### Get all names for Victims with 10 to 6 points

In [12]:
# get names for cases with 10 to 6 points
for index, link in enumerate(point_links):

    # point_links is ordered 10, 9, 8, 7, 6
    fname = f'points{10-index}.csv'

    # request the archived page
    r = requests.get(link, timeout=60)
    soup = BeautifulSoup(r.text, 'html.parser')

    # get the table rows (name and place of incident), skipping the header row
    rows = soup.find_all('tr')
    content = [row.text.strip() for row in rows[1:]]

    victim_names = [text.split('\n')[0] for text in content]

    # keep the place of incident only for the 10-point list; other lists keep names only
    if index == 0:
        place_of_incident = [text.split('\n')[-1] for text in content]
        df = pd.DataFrame(data={'victim': victim_names, 'place_of_incident': place_of_incident})
    else:
        df = pd.DataFrame(data={'victim': victim_names})

    # written tab-separated despite the .csv extension
    df.to_csv(fname, index=False, sep='\t')

    # pause between requests to go easy on the Wayback Machine
    sleep(60)
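Splitting `row.text` on `'\n'` depends on how whitespace happens to fall between cells. A hedged, standard-library sketch that collects the text of each `<td>`/`<th>` cell per `<tr>` instead; `TableRows` is a hypothetical class, it assumes every cell has a closing tag, and the sample HTML is made up to mimic a two-column table of victim and place of incident. With BeautifulSoup, the per-row equivalent would be `[td.get_text(strip=True) for td in row.find_all('td')]`.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the whitespace-normalized text of every cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # collapse internal newlines and runs of spaces into single spaces
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


# Made-up sample; placeholder names and places, not scraped data.
sample = """
<table>
  <tr><th>Name</th><th>Place of Incident</th></tr>
  <tr><td>Juan Dela Cruz</td><td>Sample Town</td></tr>
  <tr><td>Juana
      Dela Cruz</td><td>Sample City</td></tr>
</table>
"""

parser = TableRows()
parser.feed(sample)
header, *body = parser.rows
victims = [row[0] for row in body]
places = [row[-1] for row in body]
print(header)   # ['Name', 'Place of Incident']
print(victims)  # ['Juan Dela Cruz', 'Juana Dela Cruz']
```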