# Group members:
-  Bao Luu
-  Lev Nguyen
-  Sean Safi


# How the data is stored in the database

**Get the created database from here: https://drive.google.com/drive/folders/1nfhloimOLZ06Qe5KwPbVcggNtWJ0c8gM?usp=share_link**

The data stored in the database is essentially the same as the scraped data, with the same column names and values:

-  Database: `ukraine_war.db`

-  Includes three tables, matching the three tables that were scraped:

  +  Table `Civilian_deaths`: Civilian Deaths by Area including the number of people killed, the time period and the data source

  +  Table `Dead_Foreign_Fighters`: Dead foreign fighters and volunteers including their nationality, allegiances, and their military forces

  +  Table `Total_losses`: All Russian losses reported by the Ministry of Defence of Ukraine
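
To confirm that the downloaded database file really contains the three tables listed above, the table names can be read from SQLite's built-in `sqlite_master` catalog. The following is a minimal sketch: `list_tables` is a hypothetical helper, and the in-memory database with two empty tables (and guessed column types) only stands in for the real file.

```python
import sqlite3


def list_tables(connection):
    """Return the names of all user tables in an SQLite database, sorted by name."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Stand-in for the real database file (assumption: column types are illustrative only)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Civilian_deaths (Area TEXT, Fatalities TEXT)")
demo.execute("CREATE TABLE Total_losses (date TEXT, tanks INTEGER)")

tables = list_tables(demo)
print(tables)
demo.close()
```

With the real file, the same helper would be called on `sqlite3.connect(<path to the downloaded database>)`.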

# Importing Libraries

In [None]:
import json
import pandas as pd
import numpy as np
import requests
from lxml import etree
import io
import sqlite3 as sql

# Function Preparation

1.  `sql_select()` takes the database path and a query, and returns the matching rows as a list of tuples

2.  `process_data()` converts those rows into a pandas DataFrame

In [None]:
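def sql_select(db, qry):
  # Open a connection, run the query, and return all rows as a list of tuples
  connection = sql.connect(db)
  try:
    cursor = connection.cursor()
    subset = cursor.execute(qry)
    return subset.fetchall()
  finally:
    # Always release the database file, even if the query fails
    connection.close()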

In [None]:
def process_data(data):
  df = pd.DataFrame(data)
  return df
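
Because `fetchall()` returns bare tuples, the column names have to be assigned by hand after each query. As an alternative, the names can be read from the cursor's `description` attribute. The sketch below assumes a hypothetical helper `select_with_columns` and an in-memory stand-in table filled with two rows taken from the Civilian_deaths output; it is not the notebook's own method.

```python
import sqlite3


def select_with_columns(connection, qry):
    """Run a query and return (column_names, rows)."""
    cursor = connection.execute(qry)
    # Each entry of cursor.description is a 7-tuple whose first item is the column name
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()


# In-memory stand-in; columns are left untyped so mixed int/str values are stored as given
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Civilian_deaths (Area, Fatalities)")
demo.executemany(
    "INSERT INTO Civilian_deaths VALUES (?, ?)",
    [("Cherkasy Oblast", 2), ("Chernihiv Oblast", "700+")],
)

columns, rows = select_with_columns(
    demo, "SELECT Area, Fatalities FROM Civilian_deaths ORDER BY Area"
)
print(columns)
print(rows)
demo.close()
```

The returned `columns` list could then be passed as `pd.DataFrame(rows, columns=columns)` instead of setting `df.columns` manually.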

# Querying subsets of the data from Database

## Table 1: Civilian Deaths by Area

 - Query the subset of all Areas and their Fatalities from the table Civilian_deaths

In [None]:
db = 'ukraine_war.db'
qry = 'SELECT Area,Fatalities FROM Civilian_deaths'

data1 = sql_select(db, qry)
print(data1)

[('Cherkasy Oblast', 2), ('Chernihiv Oblast', '700+'), ('Dnipropetrovsk Oblast', 53), ('Donetsk Oblast', '1,246'), ('Kharkiv Oblast', '1,600+'), ('Kherson Oblast', 467), ('Kirovohrad Oblast', 7), ('Kyiv Oblast', '1,596+'), ('Luhansk Oblast', '1,986+'), ('Lviv Oblast', 7), ('Mariupol', '25,000+'), ('Mykolaiv Oblast', 403), ('Odesa Oblast', 33), ('Poltava Oblast', 22), ('Rivne Oblast', 25), ('Sumy Oblast', '106+'), ('Vinnytsia Oblast', 23), ('Volyn Oblast', 5), ('Zaporizhzhia Oblast', 113), ('Zhytomyr Oblast', 13)]


In [None]:
df1 = process_data(data1)
df1.columns = ['Areas', 'Fatalities']
df1

,Areas,Fatalities
0,Cherkasy Oblast,2
1,Chernihiv Oblast,700+
2,Dnipropetrovsk Oblast,53
3,Donetsk Oblast,"1,246"
4,Kharkiv Oblast,"1,600+"
5,Kherson Oblast,467
6,Kirovohrad Oblast,7
7,Kyiv Oblast,"1,596+"
8,Luhansk Oblast,"1,986+"
9,Lviv Oblast,7


In [None]:
import plotly.express as px

In [None]:
df1 = process_data(data1)
df1.columns = ['Areas', 'Fatalities']

# Fatalities mixes integers with strings such as '1,600+'; normalise everything to integers
df1['Fatalities'] = df1['Fatalities'].astype(str)
# regex=False treats '+' as a literal character rather than a regex quantifier
df1['Fatalities'] = df1['Fatalities'].str.replace('+', '', regex=False)
df1['Fatalities'] = df1['Fatalities'].str.replace(',', '', regex=False)
df1['Fatalities'] = pd.to_numeric(df1['Fatalities'])
df1 = df1.sort_values('Fatalities', ascending=False)

fig = px.bar(df1, x="Fatalities", y="Areas", orientation='h',
             height=700,
             title='Civilian Deaths',
             )

fig.show()
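
Stripping the `+` suffix discards the information that a figure such as `'1,600+'` is a lower bound rather than an exact count. One possible way to keep it is to parse each value into a number plus a flag. This is a minimal sketch in plain Python: `parse_count` is a hypothetical helper, and the three sample rows are copied from the query output above.

```python
def parse_count(value):
    """Convert values such as 53, '1,246' or '1,600+' to (number, is_lower_bound)."""
    text = str(value).strip()
    # A trailing '+' marks the figure as "at least this many"
    is_lower_bound = text.endswith("+")
    number = int(text.rstrip("+").replace(",", ""))
    return number, is_lower_bound


sample = [
    ("Cherkasy Oblast", 2),
    ("Donetsk Oblast", "1,246"),
    ("Mariupol", "25,000+"),
]

parsed = {area: parse_count(raw) for area, raw in sample}
print(parsed)
```

The flag could be stored in an extra DataFrame column so that lower-bound figures can be marked in the chart.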



## Table 2: Dead foreign fighters

Query the dead foreign fighters' nationality (`Country`), the number of deaths (`Deaths`), and the forces they fought for (`Forces`)

In [None]:
db = 'ukraine_war.db'
qry = 'SELECT Country,Deaths,Forces FROM Dead_foreign_fighters'

data2 = sql_select(db, qry)
print(data2)

[('Argentina', 1, 'Ukrainian Armed forces'), ('Australia', 3, 'Ukrainian Armed forces'), ('Austria', 1, 'Ukrainian Armed forces'), ('Azerbaijan', 25, 'Ukrainian Armed forces'), ('Belarus', 16, 'Ukrainian Armed forces'), ('Brazil', 3, 'Ukrainian Armed forces'), ('Canada', 2, 'Ukrainian Armed forces'), ('Colombia', 3, 'Ukrainian Armed forces'), ('Croatia', 1, 'Ukrainian Armed forces'), ('Czech Republic', 1, 'Ukrainian Armed forces'), ('Denmark', 1, 'Ukrainian Armed forces'), ('France', 3, 'Ukrainian Armed forces'), ('Georgia', 35, 'Ukrainian Armed forces'), ('Germany', 1, 'Ukrainian Armed forces'), ('Ireland', 1, 'Ukrainian Armed forces'), ('Israel', 2, 'Ukrainian Armed forces'), ('Italy', 1, 'Ukrainian Armed forces'), ('Japan', 1, 'Ukrainian Armed forces'), ('Netherlands', 1, 'Ukrainian Armed forces'), ('New Zealand', 1, 'Ukrainian Armed forces'), ('Poland', 6, 'Ukrainian Armed forces'), ('Russia', 4, 'Ukrainian Armed forces'), ('Spain', 1, 'Ukrainian Armed forces'), ('South Korea', 1, 

In [None]:
df2 = process_data(data2)
df2.columns = ['Nationality', 'Deaths', 'Forces']
df2

,Nationality,Deaths,Forces
0,Argentina,1,Ukrainian Armed forces
1,Australia,3,Ukrainian Armed forces
2,Austria,1,Ukrainian Armed forces
3,Azerbaijan,25,Ukrainian Armed forces
4,Belarus,16,Ukrainian Armed forces
5,Brazil,3,Ukrainian Armed forces
6,Canada,2,Ukrainian Armed forces
7,Colombia,3,Ukrainian Armed forces
8,Croatia,1,Ukrainian Armed forces
9,Czech Republic,1,Ukrainian Armed forces


In [None]:
import plotly.express as px

fig = px.bar(df2, x='Deaths', y='Nationality', color='Forces',
             labels={'Nationality':'Dead foreign fighters'}, height=1000)
fig.show()
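
The bar chart breaks deaths down by nationality; the totals per force can also be computed directly in SQL with `GROUP BY`. The following sketch assumes the column types shown in the `CREATE TABLE` statement and uses only the first three rows of the query output above as stand-in data, so the totals it prints are not the totals for the full table.

```python
import sqlite3

# In-memory stand-in for the Dead_foreign_fighters table (column types are an assumption)
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE Dead_foreign_fighters (Country TEXT, Deaths INTEGER, Forces TEXT)"
)
demo.executemany(
    "INSERT INTO Dead_foreign_fighters VALUES (?, ?, ?)",
    [
        ("Argentina", 1, "Ukrainian Armed forces"),
        ("Australia", 3, "Ukrainian Armed forces"),
        ("Austria", 1, "Ukrainian Armed forces"),
    ],
)

# Sum the deaths and count the nationalities recorded for each force
qry = (
    "SELECT Forces, SUM(Deaths) AS total_deaths, COUNT(*) AS countries "
    "FROM Dead_foreign_fighters GROUP BY Forces ORDER BY total_deaths DESC"
)
totals = demo.execute(qry).fetchall()
print(totals)
demo.close()
```

The same query string could be passed to `sql_select()` to run it against the real database.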

## Table 3: Total Losses

Query all columns of the table to get the losses in all military vehicles and weapons


In [None]:
db = 'ukraine_war.db'
qry = 'SELECT * FROM Total_losses'

data3 = sql_select(db, qry)
print(data3)

[('2022 - 02 - 24  00 : 00 : 00', 30, 130, 7, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 800), ('2022 - 02 - 25  00 : 00 : 00', 100, 516, 10, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2800), ('2022 - 02 - 26  00 : 00 : 00', 100, 540, 16, 18, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3000), ('2022 - 02 - 27  00 : 00 : 00', 150, 706, 27, 26, 50, 1, 4, 0, 0, 2, 0, 2, 0, 0, 4500), ('2022 - 02 - 28  00 : 00 : 00', 191, 816, 29, 29, 74, 1, 21, 0, 0, 3, 0, 2, 0, 0, 5300), ('2022 - 03 - 01  00 : 00 : 00', 198, 846, 29, 29, 77, 1, 24, 0, 7, 3, 0, 2, 0, 0, 5710), ('2022 - 03 - 02  00 : 00 : 00', 211, 862, 30, 31, 85, 0, 40, 0, 9, 3, 0, 2, 0, 0, 5840), ('2022 - 03 - 03  00 : 00 : 00', 217, 900, 30, 31, 90, 0, 0, 42, 11, 3, 0, 2, 0, 0, 9000), ('2022 - 03 - 04  00 : 00 : 00', 251, 939, 37, 37, 105, 0, 0, 50, 18, 3, 0, 2, 0, 0, 9166), ('2022 - 03 - 05  00 : 00 : 00', 269, 945, 39, 40, 105, 0, 0, 50, 19, 3, 0, 2, 0, 0, 10000), ('2022 - 03 - 06  00 : 00 : 00', 285, 985, 44, 48, 109, 0, 0, 50, 21, 4, 0, 2, 0, 0, 11000), ('2022 - 03 

In [None]:
df3 = process_data(data3)
df3.columns = ['date','tanks','armored_vehicle','planes','helicopters','cannons','mlrs_buk','mlrs_grad','mlrs','anti_air','uav','cruise_missiles','ships','cars_cisterns','special_equipment','personnel']
df3.head(65)

,date,tanks,armored_vehicle,planes,helicopters,cannons,mlrs_buk,mlrs_grad,mlrs,anti_air,uav,cruise_missiles,ships,cars_cisterns,special_equipment,personnel
0,2022 - 02 - 24 00 : 00 : 00,30,130,7,6,0,0,0,0,0,0,0,0,0,0,800
1,2022 - 02 - 25 00 : 00 : 00,100,516,10,7,0,0,0,0,0,0,0,0,0,0,2800
2,2022 - 02 - 26 00 : 00 : 00,100,540,16,18,0,0,0,0,0,0,0,0,0,0,3000
3,2022 - 02 - 27 00 : 00 : 00,150,706,27,26,50,1,4,0,0,2,0,2,0,0,4500
4,2022 - 02 - 28 00 : 00 : 00,191,816,29,29,74,1,21,0,0,3,0,2,0,0,5300
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
60,2022 - 04 - 25 00 : 00 : 00,884,2258,181,154,411,0,0,149,69,201,0,8,28,28,21900
61,2022 - 04 - 26 00 : 00 : 00,918,2308,184,154,416,0,0,149,69,205,0,8,31,31,22100
62,2022 - 04 - 27 00 : 00 : 00,939,2342,185,155,421,0,0,149,71,207,0,8,31,31,22400
63,2022 - 04 - 28 00 : 00 : 00,970,2389,187,155,431,0,0,151,72,215,0,8,31,31,22800
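
The `date` values contain extra spaces, and the loss columns appear to be running totals rather than per-day figures. A minimal sketch of how both could be handled in plain Python follows; `parse_date` and `daily_increments` are hypothetical helpers, and the five sample rows (date and `tanks`) are taken from the output above.

```python
from datetime import datetime


def parse_date(raw):
    """Parse a string like '2022 - 02 - 24  00 : 00 : 00' into a date object."""
    compact = raw.replace(" ", "")  # becomes '2022-02-2400:00:00'
    return datetime.strptime(compact, "%Y-%m-%d%H:%M:%S").date()


def daily_increments(cumulative):
    """Turn a running total into day-over-day differences."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


rows = [
    ("2022 - 02 - 24  00 : 00 : 00", 30),
    ("2022 - 02 - 25  00 : 00 : 00", 100),
    ("2022 - 02 - 26  00 : 00 : 00", 100),
    ("2022 - 02 - 27  00 : 00 : 00", 150),
    ("2022 - 02 - 28  00 : 00 : 00", 191),
]

dates = [parse_date(raw_date) for raw_date, _ in rows]
tank_increments = daily_increments([tanks for _, tanks in rows])
print(dates[0], tank_increments)
```

Plotting the increments instead of the running totals would show on which days the reported losses rose fastest.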


# Answering the central question
**What are the major factors contributing to the huge number of casualties of the Russia-Ukraine war?**
 
- From the DataFrame for the Civilian Deaths subset, we can see that areas adjacent to Russia or in eastern Ukraine (Chernihiv, Kharkiv, Luhansk, Mariupol) have the highest numbers of civilian casualties. Kyiv is the exception: it also has a high count because Russian forces concentrated on Ukraine's capital city during their operation.

- The bar chart for the Dead foreign fighters table indicates that the majority of foreign fighters are volunteers who went to Ukraine to fight for the Ukrainian Armed forces, while the Russian army is supported by the Donetsk Armed forces and Luhansk Armed forces, which are pro-Russian paramilitaries in the Donbas region of eastern Ukraine.

- The Total losses table indicates that the reported losses for most vehicle and weapon types are constantly increasing, indicating that more and more military vehicles and weapons were used in Ukraine. The exceptions are the BM-21 Grad (`mlrs_grad`), the Buk missile system (`mlrs_buk`), and ships, which were not utilized much throughout the time period.


--> The factors contributing to the huge number of casualties of the Russia-Ukraine war are:
 
 1. Areas: the closer an area is to the Russian border, the more casualties there are
 2. Many foreign fighters volunteering to fight alongside Ukrainians, with numerous others fighting for the pro-Russian people's militias in Ukraine's Donbas region
 3. The constant increase in weapons and military vehicles being supplied to the Russian army during the war
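
The observation that most loss counts keep rising while a few do not can be checked column by column. The sketch below is one possible check, not part of the original analysis: `is_non_decreasing` is a hypothetical helper, and the three series are the first eight values of `tanks`, `mlrs_grad` and `mlrs` from the Total_losses output above.

```python
def is_non_decreasing(values):
    """Return True if no value is smaller than the one before it."""
    return all(earlier <= later for earlier, later in zip(values, values[1:]))


# First eight rows (2022-02-24 to 2022-03-03) of three Total_losses columns
series = {
    "tanks": [30, 100, 100, 150, 191, 198, 211, 217],
    "mlrs_grad": [0, 0, 0, 4, 21, 24, 40, 0],
    "mlrs": [0, 0, 0, 0, 0, 0, 0, 42],
}

trend = {name: is_non_decreasing(values) for name, values in series.items()}
print(trend)
```

In this sample `mlrs_grad` drops back to 0 on the same day that `mlrs` first becomes non-zero, so the reset may reflect a change in how the source reports that category. This reading is an assumption and would need to be checked against the source.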
