Install MongoDB in Ambari
```
cd /var/lib/ambari-server/resources/stacks/HDP/2.5/services
git clone https://github.com/nikunjness/mongo-ambari.git
sudo service ambari restart
```
Then log in to Ambari and choose 'Add Service' -> MongoDB.

Make sure pymongo is installed on the Ambari server as well.


In [None]:
from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.sql import functions

def parseInput(line):
    fields = line.split('|')
    # The field names become the keys of the MongoDB documents
    return Row(user_id=int(fields[0]), age=int(fields[1]), gender=fields[2],
               occupation=fields[3], zip=fields[4])

if __name__ == "__main__":
    # Create a SparkSession
    spark = SparkSession.builder.appName("MongoDBIntegration").getOrCreate()

    # Get the raw data
    # u.user holds the pipe-delimited user records that parseInput expects
    lines = spark.sparkContext.textFile("hdfs://127.0.0.1:8020/user/maria_dev/ml-100k/u.user")
    # Convert it to an RDD of Row objects with (user_id, age, gender, occupation, zip)
    users = lines.map(parseInput)
    # Convert that to a DataFrame
    usersDataset = spark.createDataFrame(users)

    # Write it into Mongo
    usersDataset.write\
        .format("com.mongodb.spark.sql.DefaultSource")\
        .option("uri", "mongodb://127.0.0.1/movielens.users")\
        .mode('append')\
        .save()

    # Read it back from Mongo into a new DataFrame
    # load() is lazy: it does not pull the data into memory yet
    readUsers = spark.read\
        .format("com.mongodb.spark.sql.DefaultSource")\
        .option("uri", "mongodb://127.0.0.1/movielens.users")\
        .load()

    readUsers.createOrReplaceTempView("users")

    sqlDF = spark.sql("SELECT * FROM users WHERE age < 20")
    # show() triggers execution: the SQL query is translated into a MongoDB
    # query where possible, and that query is then run in MongoDB
    sqlDF.show()

    # Stop the session
    spark.stop()
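
To check what `parseInput` produces without starting Spark, the same parsing can be sketched in plain Python. This is only an illustration: the name `parse_user_line` and the sample line are assumptions made for the example, and a plain dict stands in for the Spark `Row`.

```python
def parse_user_line(line):
    """Split one pipe-delimited user record into named fields.

    Mirrors parseInput above, but returns a dict instead of a Spark Row
    so it can be run without a Spark installation.
    """
    fields = line.split('|')
    return {
        "user_id": int(fields[0]),
        "age": int(fields[1]),
        "gender": fields[2],
        "occupation": fields[3],
        "zip": fields[4],  # kept as a string, as in parseInput
    }


# Hypothetical sample line in the user_id|age|gender|occupation|zip layout
sample_line = "1|24|M|technician|85711"
print(parse_user_line(sample_line))
```

Each record parsed this way corresponds to one document in the `movielens.users` collection once the DataFrame is written.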


Mongo shell

```
mongo
use movielens
db.users.find({user_id:100}) // find the record with user_id 100
```

The find above is inefficient: user_id is not indexed, so MongoDB has to scan the entire collection. We can create an index:

```
db.users.createIndex({user_id:1}) // 1 means ascending
```
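
Why the index helps can be sketched with a toy model in plain Python. This is not how MongoDB implements indexes internally; it only contrasts checking every document with looking a key up in a structure sorted ascending by `user_id`. The function names and sample records are made up for the illustration.

```python
from bisect import bisect_left

# Hypothetical sample documents, deliberately not sorted by user_id
users = [
    {"user_id": 3, "age": 23, "occupation": "writer"},
    {"user_id": 1, "age": 24, "occupation": "technician"},
    {"user_id": 2, "age": 53, "occupation": "other"},
]


def find_by_scan(docs, user_id):
    """Check documents one by one, like a find() without an index.

    Returns the matching document (or None) and how many were examined.
    """
    examined = 0
    for doc in docs:
        examined += 1
        if doc["user_id"] == user_id:
            return doc, examined
    return None, examined


def build_index(docs):
    """Build an ascending index: sorted keys plus each document's position."""
    entries = sorted((doc["user_id"], pos) for pos, doc in enumerate(docs))
    keys = [key for key, _ in entries]
    positions = [pos for _, pos in entries]
    return keys, positions


def find_by_index(docs, index, user_id):
    """Binary-search the sorted keys instead of scanning every document."""
    keys, positions = index
    i = bisect_left(keys, user_id)
    if i < len(keys) and keys[i] == user_id:
        return docs[positions[i]]
    return None


index = build_index(users)
print(find_by_scan(users, 2))
print(find_by_index(users, index, 2))
```

The scan cost grows with the number of documents, while the sorted lookup needs only a logarithmic number of comparisons.
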
Other mongo commands:
```
db.users.aggregate([
    { $group: { _id: { occupation: "$occupation" }, avgAge: { $avg: "$age" } } }
])
    
db.users.count()
db.getCollectionInfos()
db.users.drop()
```
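
The `$group` stage in the aggregate call above computes the average age per occupation. As a rough sketch of that computation, here is the same grouping in plain Python. The function name and sample documents are invented for the example, and the result is a simple dict rather than the `{ _id: ..., avgAge: ... }` documents MongoDB returns.

```python
from collections import defaultdict


def average_age_by_occupation(docs):
    """Group documents by occupation and average their ages."""
    totals = defaultdict(lambda: [0, 0])  # occupation -> [age sum, count]
    for doc in docs:
        entry = totals[doc["occupation"]]
        entry[0] += doc["age"]
        entry[1] += 1
    return {occ: age_sum / count for occ, (age_sum, count) in totals.items()}


# Hypothetical sample documents
sample_users = [
    {"user_id": 1, "age": 18, "occupation": "student"},
    {"user_id": 2, "age": 22, "occupation": "student"},
    {"user_id": 3, "age": 35, "occupation": "engineer"},
]

print(average_age_by_occupation(sample_users))
```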

