# Types of RDDs
Aside from the base RDD class, which contains the members (properties or attributes and functions) common to all RDDs, Spark provides specific RDD implementations that enable additional operators and functions. These additional RDD types include the following:

# i) PairRDD:
An RDD of key/value pairs. You have already seen this type of RDD, because the wholeTextFiles() method creates one automatically, with each file's name as the key and the file's content as the value.
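Pair RDDs are what make key-based operations possible. The sketch below is plain Python, not Spark: `reduce_by_key` is a hypothetical helper that models the semantics of the pair-RDD `reduceByKey()` operation on an in-memory list of tuples, and the file names are made up to mimic the shape of wholeTextFiles() output.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def reduce_by_key(pairs: Iterable[Tuple[K, V]], func: Callable[[V, V], V]) -> Dict[K, V]:
    """Merge all values that share a key with func (a local model of reduceByKey)."""
    result: Dict[K, V] = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


# Hypothetical pairs shaped like wholeTextFiles() output: (file name, file content).
files = [
    ("hdfs:///data/a.txt", "spark is fast\nspark is lazy"),
    ("hdfs:///data/b.txt", "hadoop"),
]

# Emit (file name, 1) for every line, then sum the ones per key to count lines per file.
line_counts = reduce_by_key(
    ((name, 1) for name, content in files for _ in content.splitlines()),
    lambda a, b: a + b,
)
print(line_counts)  # {'hdfs:///data/a.txt': 2, 'hdfs:///data/b.txt': 1}
```

In Spark the same reduction happens per partition first and then across partitions after a shuffle, which is why the merge function should be associative.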

# ii)DoubleRDD: 
An RDD consisting only of double values. Because all the values share the same numeric type, several additional statistical functions are available, including mean(), sum(), stdev(), variance(), and histogram(), among others.
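To make these functions concrete, here is a plain-Python approximation of what they compute on a local list; `stats` and `histogram` are hypothetical helpers, not Spark APIs. It assumes Spark's documented behavior that variance() and stdev() are population statistics (divide by n), and that histogram(n) with an integer builds n evenly spaced buckets between the minimum and maximum, with the last bucket closed so the maximum is counted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def stats(values: Sequence[float]) -> Dict[str, float]:
    """Summary statistics; variance and stdev divide by n (population statistics)."""
    n = len(values)
    total = sum(values)
    mean = total / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return {"count": n, "sum": total, "mean": mean,
            "variance": variance, "stdev": math.sqrt(variance)}


def histogram(values: Sequence[float], bucket_count: int) -> Tuple[List[float], List[int]]:
    """Evenly spaced buckets between min and max; the last bucket also includes max."""
    lo, hi = min(values), max(values)
    if lo == hi or bucket_count == 1:
        # All values identical (or one bucket requested): a single bucket holds everything.
        return [lo, hi], [len(values)]
    width = (hi - lo) / bucket_count
    edges = [lo + i * width for i in range(bucket_count)] + [hi]
    counts = [0] * bucket_count
    for x in values:
        # Clamp so the maximum (and any rounding at the top edge) lands in the last bucket.
        index = min(int((x - lo) / width), bucket_count - 1)
        counts[index] += 1
    return edges, counts


print(stats([1.0, 2.0, 3.0, 4.0]))
print(histogram(list(range(51)), 2))  # ([0.0, 25.0, 50], [25, 26])
```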

# iii) DataFrame (formerly known as SchemaRDD):
A distributed collection of data organized into named, typed columns. A DataFrame is equivalent to a relational table in Spark SQL. The read.jdbc() and read.json() functions discussed earlier both return DataFrames.

# iv) SequenceFileRDD:
An RDD created from a SequenceFile, either compressed or
uncompressed.

# v) HadoopRDD: 
An RDD that provides core functionality for reading data stored in HDFS using the older v1 MapReduce API (org.apache.hadoop.mapred).

# vi) NewHadoopRDD: 
An RDD that provides core functionality for reading data stored in Hadoop (for example, files in HDFS, sources in HBase, or S3) using the new MapReduce API (org.apache.hadoop.mapreduce).

# vii) CoGroupedRDD: 
An RDD that cogroups its parents. For each key in the parent RDDs, the resulting RDD contains a tuple holding, for each parent, the list of values for that key. (We will discuss the cogroup() function later in this chapter.)
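The following plain-Python sketch models the cogroup semantics described above on local lists of key/value pairs. `cogroup` here is a hypothetical helper, not the Spark function, and the sales and price data are invented for illustration. Note that a key missing from one parent still appears, with an empty list in that parent's position.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def cogroup(*parents: Iterable[Tuple[Hashable, object]]) -> Dict[Hashable, Tuple[List[object], ...]]:
    """For every key in any parent, collect one list of values per parent (empty if absent)."""
    grouped: Dict = defaultdict(lambda: tuple([] for _ in parents))
    for position, parent in enumerate(parents):
        for key, value in parent:
            grouped[key][position].append(value)
    return dict(grouped)


sales = [("apples", 3), ("pears", 1), ("apples", 5)]
prices = [("apples", 0.5), ("plums", 0.9)]
print(cogroup(sales, prices))
# {'apples': ([3, 5], [0.5]), 'pears': ([1], []), 'plums': ([], [0.9])}
```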

# viii) JdbcRDD: 
An RDD resulting from a SQL query to a JDBC connection. It is available in the
Scala API only.
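A JdbcRDD reads a table in parallel by splitting a numeric key range into one sub-range per partition; as we understand the JdbcRDD Scaladoc, the SQL query must contain two `?` placeholders that receive each partition's lower and upper bound. The plain-Python sketch below models that split; `jdbc_partition_bounds` is a hypothetical helper, and the `books` query is an illustrative assumption rather than part of any real schema.

```python
from typing import List, Tuple


def jdbc_partition_bounds(lower_bound: int, upper_bound: int,
                          num_partitions: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [lower_bound, upper_bound] into contiguous per-partition ranges."""
    length = 1 + upper_bound - lower_bound
    return [
        (lower_bound + (i * length) // num_partitions,
         lower_bound + ((i + 1) * length) // num_partitions - 1)
        for i in range(num_partitions)
    ]


# Each (start, end) pair would be bound to the two "?" placeholders of a query such as
# "SELECT title, author FROM books WHERE ? <= id AND id <= ?" (hypothetical table).
for start, end in jdbc_partition_bounds(1, 10, 3):
    print(start, end)
# 1 3
# 4 6
# 7 10
```

Because the ranges are computed only from the bounds, skewed key distributions can leave some partitions much larger than others.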

# ix) PartitionPruningRDD: 
An RDD used to prune partitions so that Spark does not have to launch tasks on all of them. For example, if you know the RDD is partitioned by range and the execution DAG has a filter on the key, you can avoid launching tasks on partitions whose range does not cover the key.
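The range-partitioning example above can be modeled in plain Python. This sketch is not Spark code: the partitions, their key bounds, and the helpers `partitions_to_scan` and `filtered_scan` are hypothetical. It shows the essential idea, deciding from partition metadata alone which partitions could contain matching keys, and reading only those.

```python
from typing import List, Tuple

# Hypothetical range-partitioned data: (low key, high key, rows) for each partition.
Partition = Tuple[int, int, List[Tuple[int, str]]]

partitions: List[Partition] = [
    (0, 99, [(5, "a"), (42, "b")]),
    (100, 199, [(101, "c"), (150, "d")]),
    (200, 299, [(205, "e"), (250, "f")]),
]


def partitions_to_scan(parts: List[Partition], key_low: int, key_high: int) -> List[int]:
    """Keep only partitions whose key range overlaps the filter range [key_low, key_high]."""
    return [i for i, (low, high, _) in enumerate(parts) if low <= key_high and key_low <= high]


def filtered_scan(parts: List[Partition], key_low: int,
                  key_high: int) -> Tuple[List[int], List[Tuple[int, str]]]:
    kept = partitions_to_scan(parts, key_low, key_high)
    # Only the kept partitions are read ("launched"); pruned partitions are never touched.
    rows = [row for i in kept for row in parts[i][2] if key_low <= row[0] <= key_high]
    return kept, rows


print(filtered_scan(partitions, 120, 210))
# ([1, 2], [(150, 'd'), (205, 'e')])
```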

# x) ShuffledRDD: 
The RDD that results from a shuffle, such as one caused by repartitioning data.
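A shuffle redistributes records so that all records with the same key end up in the same output partition. The plain-Python sketch below models hash partitioning, assigning each record to `hash(key) % num_partitions`; `shuffle` is a hypothetical helper, not a Spark API. Integer keys are used deliberately: Python randomizes string hashes per process, which is why PySpark uses its own portable hash function for partitioning.

```python
from typing import List, Tuple

Record = Tuple[int, str]


def shuffle(partitions: List[List[Record]], num_partitions: int) -> List[List[Record]]:
    """Redistribute (key, value) records so every record for a key lands in one output partition."""
    out: List[List[Record]] = [[] for _ in range(num_partitions)]
    for partition in partitions:
        for key, value in partition:
            out[hash(key) % num_partitions].append((key, value))
    return out


inputs = [[(1, "a"), (2, "b")], [(3, "c"), (1, "d")]]
print(shuffle(inputs, 2))
# [[(2, 'b')], [(1, 'a'), (3, 'c'), (1, 'd')]]
```

Both records with key 1 started in different input partitions but end up together, which is what later per-key operations rely on.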

# xi) UnionRDD: 
An RDD resulting from a union() operation against two or more RDDs.
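A union does not move or de-duplicate data: the result simply contains every partition of every parent. The plain-Python sketch below models an RDD as a list of partitions; `union` is a hypothetical helper, not the Spark method. (When all parents share the same partitioner, Spark may instead produce a partitioner-aware union, which this sketch does not model.)

```python
from typing import List

Partitioned = List[List[int]]


def union(*rdds: Partitioned) -> Partitioned:
    """Concatenate the parents' partitions in order; no shuffle and no de-duplication."""
    return [partition for rdd in rdds for partition in rdd]


left = [[1, 2], [3]]
right = [[3, 4]]
combined = union(left, right)
print(len(combined), combined)  # 3 [[1, 2], [3], [3, 4]]
```

The value 3 appears twice in the result; removing duplicates requires a separate distinct() step, which does shuffle.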

There are other RDD variants as well, including ParallelCollectionRDD, which underlies RDDs created by the parallelize() and range() functions discussed previously, and PythonRDD, which Spark uses when running transformations written in Python.
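When you call parallelize() with a number of slices, the local collection is cut into contiguous chunks, one per partition. The plain-Python sketch below models that slicing, assuming the start/end formula used by ParallelCollectionRDD on the JVM side (slice i covers positions `i * n // k` up to `(i + 1) * n // k`). `slice_positions` and `parallelize` are hypothetical local helpers, and PySpark's batching may group elements slightly differently.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def slice_positions(length: int, num_slices: int) -> List[Tuple[int, int]]:
    """Half-open (start, end) index ranges that split `length` items into `num_slices` chunks."""
    return [((i * length) // num_slices, ((i + 1) * length) // num_slices)
            for i in range(num_slices)]


def parallelize(data: Sequence[T], num_slices: int) -> List[List[T]]:
    """Cut a local collection into contiguous partitions, one per slice."""
    return [list(data[start:end]) for start, end in slice_positions(len(data), num_slices)]


print(parallelize(list(range(10)), 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```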

Throughout this book, in addition to the base RDD class, you will mainly use the PairRDD,
DoubleRDD, and DataFrame RDD classes, but it’s worthwhile to be familiar with all the various
RDD types. Documentation and more information about the types of RDDs can be found in the
Spark Scala API documentation at https://spark.apache.org/docs/latest/api/scala/index.html.