### Abstract

In [7]:
### Red Scare

### Abstract:

#   -Input: A graph G with vertex set V(G) and edge set E(G); the graph can be directed or undirected; 
#               - no multiple edges between any pair of vertices and unweighted;  
#               - every graph comes with two specified vertices s, t ∈ V(G) called start and end vertices and a subset R ⊆ V(G) of red vertices; R can include s and t;
#               - an s,t-path is a sequence of DISTINCT vertices v1, ... vl such that v1 = s, vl = t and (vi, vi+1) ∈ E(G) for all i = 1, ..., l − 1 := AKA simple path;

#           Every input file is of the form: 	n m r
#	                                            s t
#	                                            <vertices>
#	                                            <edges>        # with n vertices, m edges and r cardinality of R(how many red vertices are there)
#                                                              # each vertex name is a string from [_a-z0-9]+
#                                                              # the names of vertices in R are followed by *; Ex.: 7 *       
#                                                              # edges of the form : u -- v for undirected edge , u --> v for directed arc 



#    Sub-tasks we want to solve for each problem:

#             - None: Return 1 if the length of a shorthest path internally avoiding R(red vertices) exists, -1 otherwise; * if the edge (s, t) exists then length(path(s,t)) = 2;

#             - Some: Return True if there is a path from s to t that includes at least one vertex from R

#             - Many: Return the maximum number of red vertices on any path from s to t; if no path return -1

#             - Few: Return minimum number of red vertices on any path from s to t; if no path, return -1

#             - Alternate: Return true if there is a path from s to t that alternates between red and non-red vertices, false otherwise



#    Requirements:

#            - Hint: For 3, we should be able to handle all instances; 2 roughly 50% of instances;
#            - The algorithms should run in polynomial time; if no polynom and > 1h report;
#            - Hint to tackle: For 2, not able to write one algo that works for all graphs; for 1 of these 2 should be able to argue for computational hardness with a simple reduction; mistify 2
#            - Universality: the algo must run in polynomial time on a well-defined class of graphs:
#                            - Well-defined classes:  * all graphs, * directed graphs, * undirected graphs, * bipartite graphs,
#                                                     * acyclic graphs, * graphs of bounded treewidth, * planar graphs, * expanders, * combination of these;
#            - Allowed:  if(isBipartite(G)) then
#                             # run the Strumpf-Chosa algorithm
#                        else print('!') # problem is NP-hard for non-bipartite graph
#            - Not allowed:  if (filename == 'rusty-I-17") then print(14) solved by hand


#            Libraries:

#            - Focus is on choosing between algorithms, not implementing them; not required to write them from scratch;
#            - Allowed: implementation can be either reusing code, built-in, books, external;


#     Deliverables:

#            1. A report; follow the skeleton in doc/report.pdf.
#            2. A text file results.txt with all the results, as specified in report.
#            3. Scripts, ReadME file that explains how to recreate results.txt by running your programs.




In [1]:

#     Steps:


#          Keywords, concepts, tests: - * We have to build the graphs for all the instances/files;
#                                     - * Graph tests: the algorithm must run on defined classes of graphs, ex.directed, undirected, bipartite; graph is connected, so maybe specify this;
#                                     - The tests should tell us what kind of algorithm should we use for that specific graph, without knowing the type of the graph; blind graph;
#                                     - For some problems, the red vertices appear randomly, for others they are fixed; different rules for checking if the vertex is red; colloring the vertex red as we build the graphs vs build and then check for red vertices while searching for paths;
#                                     -  Remember that for each subtask we check if there is a path from s to t, s and t can be red;
#                                     - * A path from s to t has distinct vertices, the path starts at s ends at t; the number of edges tells us the type of graph each problem might respond to tests; Ex.: If #edges == 3 then problem is Individual graphs,  if #edges == N^2 then problem is Grids;
#                                     - Object implementation vs functional implementation; 


#          Problems:

#                     1. Individual Graphs:
#                                           * Small graph, 3 vertices and an-all red dodechaderon; good to test parser
#                                           * T: Can be directed or undirected, no tree;
#                     2. Word Graphs
#                                           * Each vertex represents a 5-letter word; 
#                                           * An edge (u,v) if the corresponding words are anagrams or differ in exactly k positions, k € { 1, 2};
#                                           * T: has distinctive name for the vertices;
#                     3. Grids
#                                           * Consists of N^2 vertices 
#                                           * Each vertex (x, y) is connected to (x-1, y), (x, y-1), (x-1, y-1) if they exist;
#                                           * Every second row is red, except for the top- or bottom-most vertex, alternatingly;
#                                           * T: consists of exactly N^2 edges, can be both directed and indirected;
#                     4. Walls
#                                           * Family consisting of N overlapping 8-cycles called bricks; the bricks are laid in a wall of height 2 with various intervals of overlap;
#                                           * Each wall has a single red vertex w, the rightmost vertex of the same vertex as vertex 0;
#                                           * T: Contains cycles of length 8 with just one red vertex, can be both directed and undirected; 
#                     5. Sky
#                                           * Tree, in each level move down either one step left either right; 
#                                           * "Get from the start to the goal, avoiding the trees" --> avoid red vertices but maybe also avoid using a tree
#                                           * T: Directed, no cycles;
#                     6. Increasing numbers
#                                           * Each Increasing graph is generated from a sequence idx_1, .. idx_n of unique ints with 0 < val_i < 2n;
#                                           * The random process: Pick a subset of size n from {1, ..., 2n} and arrange them randomly;
#                                           * s = val_1, t = val_n; Odd numbers are red; Edge (val_i, val_i+1) if idx_i < idx_j and val_i < val_j;

#          Algorithms:
#                      * Maximum independent set 
#                      * Spanning tree, BFS, DFS, Prim, Dijsktra
#                      * Greedy
#                      * Divide and conquer --> Grids 
#                      * Dynamic programming, backtragking
#                      * Network flow
#                      * Np-hardness

#          Tests:
#                      * Number of edges, vertices, ratio vertices/edges --> Individual graphs, Grid, Tree;
#                         - as you check graphs and gather info on ratio, collect it and update along for each type of problem, do majority voting for tests; outlier detection to establish range for edges/vertices ratio;
#                      * Complete graph  --> Individual graphs
#                      * Tree  --> Sky
#                      * Dense graph --> Grids
#                      * Sparse graph --> Increasing numbers
#                      * Based on input format, we color the red vertices - 5 * is a red vertex; source and target are set and can be red so check them; also check if the graph is directed or undirected; also if name is string or int;
#

#             Majority voting of the tests: - if 3/5 | 2/3 tests say that the graph is a tree, then we assume that the graph is a tree; 
#                      - connected graph --> all problems
#                      - directed vs undirected --> all problems - if directed, then sky and incresing numbers but no grid nor individual graphs
#                      - number of edges --> all problems  - if #edges = 3 -> Individual graphs, if #edges = N^2 -> Grids
#                      - check if there are 8 non-overlapping cycles --> Walls


### Read the files

In [14]:
### taking the input, check for graphs of size 3

import warnings
from tqdm import tqdm
import os
import networkx as nx
import pandas as pd

warnings.filterwarnings('ignore')
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'


PATH = "../data/"

files_list = os.listdir(PATH)
df = pd.DataFrame(columns=['Instance name', 'n', 'F'])

###  count number of graphs in the folder
count_graphs = 0

for file in tqdm(files_list):
    if file.endswith('.txt'):
        full_path = os.path.join(PATH + file)
        if os.path.isfile(full_path):
            print(full_path)
            with open (full_path, 'r') as f:
                DG = nx.DiGraph()

                n, m, r = map(int, f.readline().strip().split())
                s, t = map(str, f.readline().strip().split())
                d = {}

                for i in range(n):
                    name = f.readline().strip().split(' ')
                    
                    #Red
                    if len(name) > 1:
                        DG.add_node(name[0], color="red")
                        d[name[0]] = "red"
                                
                    #Black
                    else:
                        DG.add_node(name[0], color="black")
                        d[name[0]] = "black"
                    
                
                for j in range(m):
                    start, directed, end = f.readline().strip().split(' ')
                    
                    if directed == '--':
                        if DG.nodes[start]['color'] == "red":
                            if DG.nodes[end]['color'] == "red":
                                DG.add_edge(start, end, weight=m)
                                DG.add_edge(end, start, weight=m)
                            else:
                                DG.add_edge(start, end, weight=1)
                                DG.add_edge(end, start, weight=m)
                        else:
                            if DG.nodes[end]['color'] == "red":
                                DG.add_edge(start, end, weight=m)
                                DG.add_edge(end, start, weight=1)
                            else:
                                DG.add_edge(start, end, weight=1)
                                DG.add_edge(end, start, weight=1)
                    else:
                        if DG.nodes[end]['color'] == "red":
                            DG.add_edge(start, end, weight=m)
                        else:
                            DG.add_edge(start, end, weight=1)
                
                try:
                    few_path = nx.shortest_path(DG, s, t)
                    count = 0
                    for i in few_path:
                        if d[i] == "red":
                            count += 1
                except nx.NetworkXNoPath:
                    count = -1
                
                file = file.replace('.txt', '')
                new_row = {'Instance name': file, 'n': n, 'F': count}
                df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

df.to_csv('few_result.csv', index=False)

  5%|▍         | 7/155 [00:00<00:04, 33.34it/s]

../data/rusty-2-2000.txt
../data/wall-p-3.txt
../data/ski-level20-3.txt
../data/gnm-1000-2000-1.txt
../data/gnm-3000-6000-0.txt
../data/dodecahedron.txt
../data/common-2-2500.txt


  9%|▉         | 14/155 [00:00<00:02, 47.96it/s]

../data/wall-n-1000.txt
../data/gnm-3000-6000-1.txt
../data/gnm-1000-2000-0.txt
../data/ski-level20-2.txt
../data/wall-p-2.txt
../data/common-1-5000.txt
../data/common-1-1500.txt
../data/ski-level20-1.txt
../data/wall-p-1.txt
../data/grid-25-0.txt
../data/common-1-3500.txt
../data/gnm-5000-7500-1.txt
../data/wall-z-1000.txt
../data/smallworld-20-1.txt
../data/increase-n20-1.txt
../data/smallworld-20-0.txt


 15%|█▍        | 23/155 [00:00<00:02, 62.70it/s]

../data/rusty-1-3000.txt
../data/gnm-5000-7500-0.txt
../data/wall-p-4.txt
../data/grid-25-1.txt
../data/common-2-20.txt
../data/gnm-2000-3000-1.txt
../data/wall-p-1000.txt
../data/gnm-4000-6000-1.txt
../data/rusty-2-4500.txt


 21%|██        | 32/155 [00:00<00:03, 39.36it/s]

../data/common-2-250.txt
../data/gnm-10-15-0.txt
../data/increase-n20-3.txt
../data/wall-z-10000.txt


 25%|██▍       | 38/155 [00:01<00:04, 27.37it/s]

../data/increase-n20-2.txt
../data/gnm-10-15-1.txt
../data/gnm-4000-6000-0.txt
../data/common-2-4000.txt
../data/bht.txt


 27%|██▋       | 42/155 [00:01<00:04, 24.17it/s]

../data/grid-25-2.txt
../data/gnm-2000-3000-0.txt
../data/rusty-2-5757.txt


 30%|██▉       | 46/155 [00:01<00:05, 18.40it/s]

../data/wall-n-4.txt
../data/common-2-50.txt
../data/increase-n100-3.txt
../data/gnm-3000-4500-1.txt
../data/gnm-1000-1500-0.txt
../data/wall-p-10000.txt


 32%|███▏      | 50/155 [00:02<00:06, 16.79it/s]

../data/rusty-1-3500.txt
../data/gnm-1000-1500-1.txt
../data/gnm-3000-4500-0.txt
../data/common-1-3000.txt
../data/increase-n100-2.txt
../data/common-2-4500.txt


 36%|███▌      | 56/155 [00:02<00:05, 18.43it/s]

../data/gnm-2000-4000-1.txt
../data/common-2-5757.txt


 38%|███▊      | 59/155 [00:02<00:07, 12.36it/s]

../data/rusty-2-4000.txt
../data/gnm-2000-4000-0.txt


 43%|████▎     | 66/155 [00:03<00:05, 17.17it/s]

../data/increase-n100-1.txt
../data/wall-n-2.txt
../data/grid-5-0.txt
../data/common-1-250.txt
../data/common-2-2000.txt
../data/rusty-2-2500.txt
../data/grid-5-1.txt
../data/wall-n-3.txt
../data/wall-n-1.txt
../data/rusty-1-5000.txt
../data/smallworld-40-0.txt
../data/wall-n-10000.txt


 51%|█████     | 79/155 [00:03<00:03, 21.86it/s]

../data/common-1-1000.txt
../data/smallworld-40-1.txt
../data/grid-5-2.txt
../data/common-1-20.txt
../data/grid-50-1.txt
../data/increase-n500-2.txt
../data/smallworld-10-0.txt
../data/increase-n10-1.txt
../data/rusty-1-4500.txt
../data/common-1-100.txt
../data/smallworld-10-1.txt
../data/increase-n500-3.txt
../data/wall-n-100.txt


 55%|█████▌    | 86/155 [00:03<00:02, 28.32it/s]

../data/common-1-4000.txt
../data/grid-50-0.txt
../data/rusty-1-5757.txt
../data/ski-level3-1.txt
../data/common-2-3500.txt


 65%|██████▌   | 101/155 [00:04<00:01, 33.63it/s]

../data/ski-level3-3.txt
../data/ski-illustration.txt
../data/smallworld-3-1.txt
../data/grid-50-2.txt
../data/increase-n500-1.txt
../data/increase-n10-2.txt
../data/increase-n10-3.txt
../data/rusty-2-3000.txt
../data/smallworld-3-0.txt
../data/ski-level3-2.txt
../data/wall-p-10.txt
../data/ski-level10-2.txt
../data/gnm-5000-10000-0.txt
../data/common-2-1500.txt
../data/common-2-5000.txt


 74%|███████▍  | 115/155 [00:04<00:01, 36.72it/s]

../data/increase-n8-3.txt
../data/rusty-1-17.txt
../data/increase-n8-2.txt
../data/gnm-5000-10000-1.txt
../data/ski-level5-1.txt
../data/ski-level10-3.txt
../data/rusty-1-2000.txt
../data/common-1-500.txt
../data/smallworld-30-0.txt
../data/ski-level10-1.txt
../data/ski-level5-3.txt
../data/increase-n8-1.txt
../data/ski-level5-2.txt
../data/common-1-2500.txt
../data/wall-z-100.txt
../data/smallworld-30-1.txt
../data/rusty-2-5000.txt


 87%|████████▋ | 135/155 [00:04<00:00, 48.49it/s]

../data/wall-z-1.txt
../data/P3.txt
../data/common-2-1000.txt
../data/wall-z-2.txt
../data/common-1-2000.txt
../data/wall-z-3.txt
../data/rusty-1-2500.txt
../data/common-2-100.txt
../data/common-1-4500.txt
../data/grid-10-0.txt
../data/common-2-500.txt
../data/smallworld-50-1.txt
../data/common-1-5757.txt
../data/smallworld-50-0.txt
../data/increase-n50-1.txt
../data/rusty-1-4000.txt
../data/common-1-50.txt
../data/grid-10-1.txt
../data/gnm-4000-8000-1.txt


 96%|█████████▌| 149/155 [00:05<00:00, 50.05it/s]

../data/wall-n-10.txt
../data/wall-z-4.txt
../data/increase-n50-3.txt
../data/gnm-10-20-0.txt
../data/rusty-2-3500.txt
../data/wall-p-100.txt
../data/increase-n50-2.txt
../data/gnm-10-20-1.txt
../data/G-ex.txt
../data/common-2-3000.txt


100%|██████████| 155/155 [00:05<00:00, 29.27it/s]

../data/grid-10-2.txt
../data/gnm-4000-8000-0.txt
../data/wall-z-10.txt





In [15]:
df

Unnamed: 0,Instance name,n,F
0,rusty-2-2000,2000,0
1,wall-p-3,20,0
2,ski-level20-3,254,-1
3,gnm-1000-2000-1,1000,5
4,gnm-3000-6000-0,3000,0
...,...,...,...
149,G-ex,8,1
150,common-2-3000,3000,1
151,grid-10-2,100,4
152,gnm-4000-8000-0,4000,0
