# Serbian Legislation Network

### Describing the Serbian legislation system as a complex network

## Data

The latest version of each legal document currently in force is available on the <a href="http://www.pravno-informacioni-sistem.rs/SlGlasnikPortal/reg/advancedSearch">Serbian Legal Information System website</a>. A <a href="https://github.com/vdragan1993/serbian-document-network/blob/master/src/crawler.py">crawler</a> and a <a href="https://github.com/vdragan1993/serbian-document-network/blob/master/src/scraper.py">scraper</a> were developed to collect all legislation at the level of the Republic, together with each document's ID card and its list of related regulations. Only documents that have an ID card were scraped. The originally collected data can be found in <a href="https://github.com/vdragan1993/serbian-document-network/tree/master/dataset/original_data">dataset/original_data/</a>.

## Network

To build the legislation network, the nodes and the links between them need to be created.

### Nodes

Every collected document is a node. The list of document names extracted from the ID cards, each paired with a custom id number, can be found in <a href="https://github.com/vdragan1993/serbian-document-network/blob/master/dataset/new_data/new_num_doc_sorted.txt">dataset/new_data/</a>.

### Links

Links between nodes are full-explicit references. The reference extraction process for each document is:

1. Converting Serbian Cyrillic to Latin and replacing special characters with their ASCII counterparts (č -> c, š -> s, ...)
2. Tokenization
3. Stemming using <a href="https://github.com/vdragan1993/serbian-stemmer">Serbian Stemmer</a>
4. Joining stemmed words into space-separated text
5. Searching for collected document names in joined text
6. Saving each detected reference in the <b>this_document_name\t\t\tfound_document_name</b> format
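
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`to_ascii`, `tokenize`, `stem`, `find_references`) are hypothetical, the regex tokenizer is an assumption, the `đ -> dj` mapping is inferred from names such as "sprovodjenje" in the dataset, and `stem` is an identity placeholder standing in for the Serbian Stemmer.

```python
import re

# Serbian Cyrillic -> ASCII Latin (assumed mapping for this sketch)
CYR_TO_ASCII = {
    'а': 'a', 'б': 'b', 'в': 'v', 'г': 'g', 'д': 'd', 'ђ': 'dj', 'е': 'e',
    'ж': 'z', 'з': 'z', 'и': 'i', 'ј': 'j', 'к': 'k', 'л': 'l', 'љ': 'lj',
    'м': 'm', 'н': 'n', 'њ': 'nj', 'о': 'o', 'п': 'p', 'р': 'r', 'с': 's',
    'т': 't', 'ћ': 'c', 'у': 'u', 'ф': 'f', 'х': 'h', 'ц': 'c', 'ч': 'c',
    'џ': 'dz', 'ш': 's',
}
# Latin letters with diacritics -> ASCII (assumed mapping for this sketch)
LATIN_TO_ASCII = {'č': 'c', 'ć': 'c', 'š': 's', 'ž': 'z', 'đ': 'dj'}


def to_ascii(text):
    """Step 1: lowercase, then map Cyrillic and special Latin characters to ASCII."""
    return ''.join(CYR_TO_ASCII.get(ch, LATIN_TO_ASCII.get(ch, ch))
                   for ch in text.lower())


def tokenize(text):
    """Step 2: split into word tokens (a simple regex tokenizer, assumed)."""
    return re.findall(r'\w+', text)


def stem(token):
    """Step 3: placeholder only. The real pipeline uses the Serbian Stemmer."""
    return token


def normalize(text, stem_fn=stem):
    """Steps 1-4: transliterate, tokenize, stem, join with single spaces."""
    return ' '.join(stem_fn(tok) for tok in tokenize(to_ascii(text)))


def find_references(doc_name, doc_text, known_names, stem_fn=stem):
    """Steps 5-6: search for known document names and format each hit."""
    joined = ' ' + normalize(doc_text, stem_fn) + ' '
    lines = []
    for name in known_names:
        if name == doc_name:
            continue  # a document is not linked to itself in this sketch
        needle = ' ' + normalize(name, stem_fn) + ' '
        if needle in joined:
            lines.append('%s\t\t\t%s' % (doc_name, name))
    return lines
```

With the identity placeholder, only names that appear in exactly the same inflected form are matched; passing a real stemmer as `stem_fn` is what would let an inflected form such as "zakona o vladi" match the name "zakon o vladi".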

After this, all detected references were aggregated into a single file. A second file was then created by replacing the document names with their custom id numbers.

The result of this process is two lists of references, one by <a href="https://github.com/vdragan1993/serbian-document-network/blob/master/dataset/new_data/all_text_lines.txt">document name</a> and one by <a href="https://github.com/vdragan1993/serbian-document-network/blob/master/dataset/new_data/all_num_lines.txt">document id</a>; these files are our legislation network. The references detected in each individual document can be found in <a href="https://github.com/vdragan1993/serbian-document-network/tree/master/dataset/new_data/graph">dataset/new_data/graph/</a>.
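
As an illustration of the name-to-id replacement, the sketch below converts name-based reference lines into id-based ones. The helper name `names_to_ids` is hypothetical, and the output format (two ids separated by a single space) is an assumption, chosen because it is the whitespace-delimited edge list format that networkx's `read_edgelist` reads by default.

```python
def names_to_ids(text_lines, num_doc_mapper, sep='\t\t\t'):
    """Turn 'src_name<sep>dest_name' lines into 'src_id dest_id' lines.

    num_doc_mapper maps id -> document name; it is inverted here so that
    names can be looked up. Assumes every name in the input is in the mapper.
    """
    name_to_id = {name: num for num, name in num_doc_mapper.items()}
    id_lines = []
    for line in text_lines:
        line = line.rstrip('\r\n')
        if not line:
            continue  # skip empty lines
        src, dest = line.split(sep)
        id_lines.append('%d %d' % (name_to_id[src], name_to_id[dest]))
    return id_lines
```

For example, with a mapper `{1: 'zakon o vladi', 2: 'uredba o vojnoj legitimaciji'}`, the line `uredba o vojnoj legitimaciji\t\t\tzakon o vladi` becomes `2 1`.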

To evaluate the accuracy of the process, full-explicit references were manually identified in 10 randomly selected documents. After validation, the obtained accuracy of the reference extraction process was: %.
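
The text does not state how accuracy was computed. One possible way to compare manually identified references with extracted ones is sketched below using precision and recall; both the choice of measures and the function name `evaluate_references` are assumptions made for illustration.

```python
def evaluate_references(manual, extracted):
    """Compare extracted references against manually identified ones.

    manual, extracted: dicts mapping a document name to the set of
    document names it references. Only documents present in `manual`
    (the validated sample) are scored.
    Returns (precision, recall).
    """
    true_pos = false_pos = false_neg = 0
    for doc, gold in manual.items():
        found = extracted.get(doc, set())
        true_pos += len(gold & found)    # references found by both
        false_pos += len(found - gold)   # extracted but not confirmed manually
        false_neg += len(gold - found)   # confirmed manually but missed
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

Reporting both numbers separates two kinds of error: references that were wrongly detected and references that were missed.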

In [48]:
# imports
import pandas as pd
import codecs
import graphistry
import warnings
import networkx as nx
import matplotlib.pyplot as plt
from collections import OrderedDict
from operator import itemgetter
from collections import Counter
from itertools import islice
from numpy import linalg
# setup
warnings.filterwarnings('ignore')
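# read the API key, closing the file and dropping any trailing newline
with open('API_key.txt') as key_file:
    api_key = key_file.read().strip()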
graphistry.register(key=api_key)
%matplotlib inline

In [11]:
def load_num_doc(file_path):
    """
    Read the num - doc dictionary that maps a document id to its name.
    Each line is expected to have the form: <id>,<document name>
    """
    num_doc_mapper = {}
    with codecs.open(file_path, 'r', 'utf8') as f:
        for line in f:
            # strip the line ending (\r\n or \n) and skip empty lines
            line = line.rstrip('\r\n')
            if not line:
                continue
            # split on the first comma only: document names may contain commas
            number, text = line.split(',', 1)
            num_doc_mapper[int(number)] = text
    return num_doc_mapper

In [21]:
def sort_dictionary_by_value_asc(input_dict):
    output_dict = OrderedDict(sorted(input_dict.items(), key=itemgetter(1)))
    return output_dict

def sort_dictionary_by_value_desc(input_dict):
    # reverse=True gives descending order (largest value first)
    output_dict = OrderedDict(sorted(input_dict.items(), key=itemgetter(1), reverse=True))
    return output_dict

In [2]:
# reading graph
# a multi-character separator is handled by the python parsing engine
edges = pd.read_csv('dataset/new_data/all_text_lines.txt', sep='\t\t\t', names=['src', 'dest'], engine='python')
print(edges.head())

                                                 src                    dest
0  ustavni zakon za sprovodjenje ustava republike...  zakon o ministarstvima
1  ustavni zakon za sprovodjenje ustava republike...  ustav republike srbije
2  uredba o prestanku vazenja uredbe o osnivanju ...           zakon o vladi
3                       uredba o vojnoj legitimaciji           zakon o vladi
4                       uredba o vojnoj legitimaciji   zakon o vojsci srbije


In [3]:
# visualization using graphistry
graphistry.bind(source='src', destination='dest').plot(edges)

In [14]:
# reading num - name dictionary
num_doc_mapper = load_num_doc('dataset/new_data/new_num_doc_sorted.txt')

# reading and creating network using networkx
graph = nx.read_edgelist('dataset/new_data/all_num_lines.txt', create_using=nx.DiGraph(), nodetype=int)
print(nx.info(graph))

Name: 
Type: DiGraph
Number of nodes: 5391
Number of edges: 17343
Average in degree:   3.2170
Average out degree:   3.2170


In [40]:
# highest degrees
print("Nodes with highest degrees: (in + out)\n")
# dict() keeps this working whether degree() returns a dict or a degree view
degrees_high = sort_dictionary_by_value_desc(dict(graph.degree()))
degrees_high_count = Counter(degrees_high)
for k, v in degrees_high_count.most_common(5):
    print('%s: %i (%i + %i)\n' % (num_doc_mapper[k], v, graph.in_degree(k), graph.out_degree(k)))

Nodes with highest degrees: (in + out)

zakon o vladi: 1492 (1482 + 10)

zakon o planiranju i izgradnji: 319 (284 + 35)

zakon o radu: 310 (290 + 20)

zakon o zastiti zivotne sredine: 302 (282 + 20)

zakon o carinskoj tarifi: 289 (289 + 0)



In [45]:
# lowest degrees
print("Nodes with lowest degrees: (in + out)\n")
# dict() keeps this working whether degree() returns a dict or a degree view
degrees_low = sort_dictionary_by_value_asc(dict(graph.degree()))
degrees_low_count = islice(degrees_low.items(), 0, 5)
for k, v in degrees_low_count:
    print('%s: %i (%i + %i)\n' % (num_doc_mapper[k], v, graph.in_degree(k), graph.out_degree(k)))

Nodes with lowest degrees: (in + out)

pravilnik o upotrebi grba, zastave i himne republike srbije u diplomatsko-konzularnim predstavnistvima republike srbije i na zvanicnim dokumentima ministarstva spoljnih poslova: 1 (0 + 1)

pravilnik o blizim uslovima za cuvanje, upravljanje i distribuciju krvi i komponenata krvi: 1 (0 + 1)

zakljucak o upotrebi grba, zastave i himne republike srbije: 1 (1 + 0)

pravilnik o sistemu pracenja, nacinu oznacavanja i drugim pitanjima od znacaja za identifikaciju svakog pojedinacnog uzimanja krvi, odnosno pojedinacne jedinice krvi, kao i o nacinu, postupku i sadrzaju obrasca za prijavljivanje ozbiljnih nezeljenih dogadjaja, odnosno ozbiljnih nezeljenih reakcija: 1 (0 + 1)

pravilnik o nacinu upisa cinjenice drzavljanstva u maticnu knjigu rodjenih, obrascima za vodjenje evidencija o resenjima o sticanju i prestanku drzavljanstva i obrascu uverenja o drzavljanstvu: 1 (0 + 1)



In [46]:
# highest in_degrees
print("Nodes with highest in degrees:\n")
# dict() keeps this working whether in_degree() returns a dict or a degree view
in_degrees_high = sort_dictionary_by_value_desc(dict(graph.in_degree()))
in_degrees_high_count = Counter(in_degrees_high)
for k, v in in_degrees_high_count.most_common(5):
    print('%s: %i\n' % (num_doc_mapper[k], v))

Nodes with highest in degrees:

zakon o vladi: 1482

zakon o radu: 290

zakon o carinskoj tarifi: 289

zakon o planiranju i izgradnji: 284

zakon o zastiti zivotne sredine: 282



In [47]:
# highest out_degrees
print("Nodes with highest out degrees:\n")
# dict() keeps this working whether out_degree() returns a dict or a degree view
out_degrees_high = sort_dictionary_by_value_desc(dict(graph.out_degree()))
out_degrees_high_count = Counter(out_degrees_high)
for k, v in out_degrees_high_count.most_common(5):
    print('%s: %i\n' % (num_doc_mapper[k], v))

Nodes with highest out degrees:

strategija prevencije i zastite od diskriminacije: 76

zakljucak (o usvajanju nacionalnog akcionog plana za koriscenje obnovljivih izvora energije republike srbije): 73

zakljucak o usvajanju treceg akcionog plana za energetsku efikasnost republike srbije za period do 2018. godine: 67

uredba o utvrdjivanju prostornog plana podrucja posebne namene za infrastrukturni koridor visokonaponskog dalekovoda 110 kv broj 113/x od ts nis 1 do vrle III: 64

strategija reforme javne uprave u republici srbiji: 59

