# 09 - GraphX: graph processing

Parallel graph programming with Spark

-   Main abstraction: [*Graph*](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.Graph)
    -   Directed multigraph with properties attached to vertices and edges
    -   Built as an extension of RDDs
- Includes graph builders, basic operators (*reverse*, *subgraph*…) and graph algorithms (*PageRank*, *Triangle Counting*…)
- Currently not available in PySpark (Scala only)

Documentation: [spark.apache.org/docs/latest/graphx-programming-guide.html](http://spark.apache.org/docs/latest/graphx-programming-guide.html)
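
The basic operators and algorithms mentioned above can be tried on a toy graph. What follows is a minimal sketch, not taken from the original notes: the three-vertex graph, its labels and weights, and the names `g`, `reversed`, `heavy` and `ranks` are made up for illustration. It assumes Spark with GraphX on the classpath (for example inside `spark-shell`); `SparkContext.getOrCreate` reuses the shell's context if one already exists.

```Scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Reuse the running context (spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[*]").setAppName("graphx-operators-sketch"))

// Hypothetical toy data: vertex property = label, edge property = weight.
val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
val edges = sc.parallelize(Seq(
  Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(3L, 1L, 3.0)))
val g: Graph[String, Double] = Graph(vertices, edges)

// reverse: same vertices and properties, every edge direction flipped.
val reversed = g.reverse
reversed.edges.collect().foreach(println)

// subgraph: keep only the edges whose weight is greater than 1.0.
val heavy = g.subgraph(epred = t => t.attr > 1.0)
println(heavy.edges.count())  // 2

// PageRank: iterate until the ranks change by less than the tolerance.
val ranks = g.pageRank(0.001).vertices
ranks.collect().foreach { case (id, rank) => println(s"$id -> $rank") }
```

`subgraph` also accepts a vertex predicate (`vpred`), so vertices and edges can be filtered in a single call.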

### Graphs in GraphX

![GraphX](media/16.grapxgraph.png)
(Source: M.S. Malak, R. East, "Spark GraphX in Action", Manning, 2016)

In [1]:
!pip install pyspark

In [11]:
# Create the Apache Spark context
from pyspark import SparkContext
sc = SparkContext(master="local", appName="Mi app")

In [10]:
# Stop the Apache Spark context
sc.stop()

Example of a simple graph

![Grafo](media/17.simpsonsgraph.png)

(Source: P. Zečević, M. Bonaći, "Spark in Action", Manning, 2017)

Example code in Scala (run in `spark-shell`, where `sc` is predefined):

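```Scala
import org.apache.spark.graphx._

// Vertex property: a person with a name and an age
case class Person(name: String, age: Int)

// Vertices are (VertexId, property) pairs; VertexId is an alias for Long
val vertices = sc.parallelize(Array((1L, Person("Homer", 39)),
                                    (2L, Person("Marge", 39)),
                                    (3L, Person("Bart", 12)),
                                    (4L, Person("Milhouse", 12))))

// Directed edges: Edge(srcId, dstId, property)
val aristas = sc.parallelize(Array(Edge(4L, 3L, "friend"),
                                   Edge(3L, 1L, "father"),
                                   Edge(3L, 2L, "mother"),
                                   Edge(1L, 2L, "marriedTo")))

// Build the property graph from the two RDDs
val graph = Graph(vertices, aristas)

graph.vertices.count()  // 4 vertices
graph.edges.count()     // 4 edges
```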
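
Once the graph is built, its vertices, edges and triplets can be queried like any other RDD. The sketch below is an illustrative addition, not part of the original notes: it rebuilds the same graph so that it runs on its own, and the names `edges`, `adults` and `inDeg` are chosen here for the example. It assumes Spark with GraphX on the classpath (for example `spark-shell`).

```Scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Reuse the running context (spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[*]").setAppName("graphx-triplets-sketch"))

case class Person(name: String, age: Int)

// Same graph as in the example above, rebuilt so this block is self-contained.
val vertices = sc.parallelize(Seq(
  (1L, Person("Homer", 39)), (2L, Person("Marge", 39)),
  (3L, Person("Bart", 12)), (4L, Person("Milhouse", 12))))
val edges = sc.parallelize(Seq(
  Edge(4L, 3L, "friend"), Edge(3L, 1L, "father"),
  Edge(3L, 2L, "mother"), Edge(1L, 2L, "marriedTo")))
val graph = Graph(vertices, edges)

// A triplet joins an edge with the properties of its two endpoints.
graph.triplets
  .map(t => s"${t.srcAttr.name} -[${t.attr}]-> ${t.dstAttr.name}")
  .collect()
  .foreach(println)

// graph.vertices is an RDD of (id, property) pairs, so RDD operations apply.
val adults = graph.vertices
  .filter { case (_, person) => person.age >= 18 }
  .map { case (_, person) => person.name }
  .collect()

// inDegrees lists only the vertices that have at least one incoming edge.
val inDeg = graph.inDegrees.collect().toMap
```

Mapping each triplet to a string before `collect()` keeps only plain values on the driver, which is why the sketch does not collect the triplet objects themselves.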