## CSE 255 Problem Set 1: Collinear points

For this problem set, we'll be using the Jupyter notebook:

![](jupyter.png)

### In this problem set, you will write python3 code to find sets of collinear points given __ 2D points.

Definition of collinearity: In geometry, collinearity of a set of points is the property of their lying on a single line. A set of points with this property is said to be collinear.

![](non-collinear-points.jpg)

Here, points P,Q,R and A,R,B are collinear. However, points A,B,C are non-collinear.
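
The definition can be checked directly for three points without computing any slope. The snippet below is a minimal sketch, not part of the assignment's solution; the helper name `are_collinear` is hypothetical. It relies on the fact that three points lie on one line exactly when the cross product of the vectors from the first point to the other two is zero, which also covers vertical lines.

```python
def are_collinear(p, q, r):
    """Return True if the 2D points p, q and r lie on a single line."""
    # Cross product of the vectors p->q and p->r.
    # It is zero exactly when the triangle p, q, r has zero area.
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return cross == 0

print(are_collinear((1, 1), (2, 2), (3, 3)))   # True
print(are_collinear((0, 1), (0, 5), (0, -3)))  # True (vertical line)
print(are_collinear((1, 1), (0, 1), (2, 2)))   # False
```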

Given an input file containing a set of co-ordinates, your task is to use PySpark library functions to write a program that finds every set of three or more collinear points.

For instance, if given these points: {(1,1), (0,1), (2,2), (3,3), (0,5), (3,4), (5,6), (0,-3), (-2,-2)}

Sets of collinear points are: {((-2,-2), (1,1), (2,2), (3,3)), ((0,1), (3,4), (5,6)), ((0,-3), (0,1), (0,5))}. Note that the ordering of the points in a set or the order of the sets does not matter. 

Note: Every set must contain at least three points, since a single point or a pair of points always lies on a line.

1. https://en.wikipedia.org/wiki/Collinearity
2. http://www.mathcaptain.com/geometry/collinear-points.html

In [1]:
from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("Collinear Points").setMaster("local[4]")
sc = SparkContext(conf=conf)

### Approach

1. Find the Cartesian product of the RDD with itself and drop the pairs whose two points are identical, so that you are left with ordered pairs of distinct points. For instance, if your RDD is {(1,0), (2,0), (3,0)}, the function should return {((1,0), (2,0)), ((2,0), (1,0)), ((1,0), (3,0)), ((3,0), (1,0)), ((2,0), (3,0)), ((3,0), (2,0))}
2. Find the slope of the line through each pair of points. If two or more pairs share the same first point and have the same slope, all the points involved are collinear. For instance, take the result from the previous step, {((1,0), (2,0)), ((2,0), (1,0)), ((1,0), (3,0)), ((3,0), (1,0)), ((2,0), (3,0)), ((3,0), (2,0))}. The slope between (1,0) and (2,0) is 0, and the slope between (1,0) and (3,0) is also 0. Since (1,0) is common to both pairs, the points (1,0), (2,0) and (3,0) are collinear.
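
The two steps above can be sketched in plain Python, without Spark, to see the grouping idea on a small input. This is only an illustration under the assumption that the points fit in memory; the names `slope` and `collinear_sets` are hypothetical and are not the functions the assignment asks for.

```python
from collections import defaultdict
from itertools import product

def slope(a, b):
    """Slope of the line through a and b; "inf" marks a vertical line."""
    if b[0] == a[0]:
        return "inf"
    return (b[1] - a[1]) / (b[0] - a[0])

def collinear_sets(points):
    """Return the set of sorted tuples of three or more collinear points."""
    # Step 1: Cartesian product of the points with themselves, minus (p, p) pairs.
    pairs = [(a, b) for a, b in product(points, repeat=2) if a != b]

    # Step 2: group the second point of each pair by (first point, slope).
    groups = defaultdict(list)
    for a, b in pairs:
        groups[(a, slope(a, b))].append(b)

    # Two or more points that share a slope from the same first point
    # are collinear with that point.
    return {
        tuple(sorted(others + [a]))
        for (a, _), others in groups.items()
        if len(others) > 1
    }

points = [(1, 1), (0, 1), (2, 2), (3, 3), (0, 5), (3, 4), (5, 6), (0, -3), (-2, -2)]
print(collinear_sets(points))
```

Sorting each tuple and collecting the results in a set removes the duplicates that arise because every point on a line produces the same group.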


In [2]:
### BEGIN SOLUTION
def convert_to_tuples(x):
    arr = x.split()
    return tuple([int(el) for el in arr])

def filt(x):                
    return x[0] != x[1]

def sort_elements(x):
    return tuple(sorted(x))

def remove_duplicates(x, y):
    return set((x,y))
### END SOLUTION

def preprocessing(rdd):
    """
    Find the Cartesian product of the RDD with itself, keeping only the pairs of DISTINCT points
    """
    ### BEGIN SOLUTION
    
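    # Parse each line of text into a tuple of integer co-ordinates
    rdd = rdd.map(convert_to_tuples)
    # Pair every point with every point (Cartesian product)
    rdd = rdd.cartesian(rdd)
    
    # Drop pairs that repeat the same point, e.g. ((1,0), (1,0))
    rdd = rdd.filter(filt)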
    ### END SOLUTION
    return rdd

In [3]:
### BEGIN SOLUTION
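def find_slope(x):
    # x is a pair of points ((Ax, Ay), (Bx, By))
    if x[1][0] - x[0][0] == 0:
        # Vertical line: the slope is undefined, so use a sentinel value
        slope = "inf"
    else:
        slope = 1.0 * (x[1][1] - x[0][1])/(x[1][0] - x[0][0])
    # Key on (first point, slope) so that points collinear with A group together
    return ((x[0], slope), (x[1]))

def returnpoints(x):
    # x is ((A, slope), [B, C1, ...]); add A back to the list of grouped points
    x[1].append(x[0][0])
    return tuple(x[1])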
### END SOLUTION

def find_collinear(rdd):
    """
    1. Find the slope of the line between all pairs of points A = (Ax, Ay) and B = (Bx, By).
    2. For each (A, B), find all points C = ((C1x, C1y), (C2x, C2y), ... (Cnx, Cny)) 
       where slope of (A,B) = slope of (A, Ci).
    3. Return (A, B, Ck) where Ck = all points of C which satisfy the condition in step 2.
    """
    ### BEGIN SOLUTION
    rdd = rdd.map(find_slope).groupByKey().mapValues(list).filter(lambda x: len(x[1]) > 1)
    rdd = rdd.map(returnpoints)
    ### END SOLUTION
    return rdd
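
The solution above groups points by a floating-point slope. One possible variation, which is not part of the original solution, is to use an exact rational slope as the grouping key so that the grouping does not depend on floating-point rounding. The sketch below assumes integer co-ordinates, and the name `exact_slope` is hypothetical.

```python
from fractions import Fraction

def exact_slope(a, b):
    """Exact slope of the line through integer points a and b, or "inf" if vertical."""
    dx = b[0] - a[0]
    dy = b[1] - a[1]
    if dx == 0:
        return "inf"
    # Fraction reduces to lowest terms, so 2/4 and 1/2 compare and hash equal.
    return Fraction(dy, dx)

print(exact_slope((0, 0), (2, 1)) == exact_slope((0, 0), (4, 2)))  # True
print(exact_slope((0, 1), (0, 5)))  # inf
```

Because equal `Fraction` values hash equally, such a value could stand in for the float inside the `(point, slope)` key.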

In [4]:
def get_sorted(x):
    return tuple(sorted(x))

def execute(filename):
    """
    Compute the set of collinear points
    """
    rdd = sc.textFile(filename)
    rdd = preprocessing(rdd)
    rdd = find_collinear(rdd)
    rdd = rdd.map(get_sorted)
    res = set(rdd.collect())
    return res

In [5]:
#Intermediate tests
#sorted(preprocessing(sc.textFile("data.txt")).collect())

In [6]:
"""Check that the execute function returns the correct output for several input files"""
assert execute("data.txt") == {((-2, -2), (1, 1), (2, 2), (3, 3)), ((0, 1), (3, 4), (5, 6)), ((0, -3), (0, 1), (0, 5))}
assert execute("test.txt") == {((1, 0), (2, 0), (3, 0))}
### BEGIN HIDDEN TESTS

### END HIDDEN TESTS

In [8]:
"""Check that the execute function raises an error for invalid input files"""
try:
    execute("data.txt")
except ValueError:
    pass
else:
    raise AssertionError("did not raise")

AssertionError: did not raise