# Syntax for handling collections
Below is a series of tests that show the different methods we can use to minimise the number of API calls needed to collect and store all the relevant data.

In [1]:
# Imports
from bson import ObjectId  # Needed to look documents up by their ObjectId
import os 
import pymongo 

In [2]:
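from urllib.parse import quote_plus  # Credentials must be percent-encoded inside a MongoDB URI
# os.environ[...] raises a KeyError straight away if a variable is missing
mongodb_password = quote_plus(os.environ["mongodb_password"])
mongodb_user = quote_plus(os.environ["mongodb_user"])
client = pymongo.MongoClient("mongodb+srv://" + mongodb_user + ":" + mongodb_password + "@basiccluster-6s0er.mongodb.net/test?retryWrites=true&w=majority")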
db = client.football

Using sort:
https://stackoverflow.com/questions/10242149/using-sort-with-pymongo/10242305

In [3]:
db.test.drop()

In [4]:
for i in range(1,6):
    db.test.insert_one({'test_id': i})

In [5]:
list(db.test.find())

[{'_id': ObjectId('5dc6f62cd57b60e8df505784'), 'test_id': 1},
 {'_id': ObjectId('5dc6f62cd57b60e8df505785'), 'test_id': 2},
 {'_id': ObjectId('5dc6f62dd57b60e8df505786'), 'test_id': 3},
 {'_id': ObjectId('5dc6f62dd57b60e8df505787'), 'test_id': 4},
 {'_id': ObjectId('5dc6f62dd57b60e8df505788'), 'test_id': 5}]

In [6]:
db.test.update_one({'test_id': 1}, 
               { '$set':
                {
                 'stats': {
                     'goals':1, 
                     'shots':10
                 }   
                }}, upsert=True)

db.test.update_one({'test_id': 2}, 
               { '$set':
                {
                 'stats': {
                     'goals':4, 
                     'shots':9
                 }   
                }}, upsert=True)

<pymongo.results.UpdateResult at 0x11024ecd0>

Specifying `upsert=True` means that the document is updated if it matches the filter and inserted if it isn't present in the collection. Using `$set` means that the update only touches the keys we name and leaves the document's other keys in place.

In [7]:
list(db.test.find())

[{'_id': ObjectId('5dc6f62cd57b60e8df505784'),
  'test_id': 1,
  'stats': {'goals': 1, 'shots': 10}},
 {'_id': ObjectId('5dc6f62cd57b60e8df505785'),
  'test_id': 2,
  'stats': {'goals': 4, 'shots': 9}},
 {'_id': ObjectId('5dc6f62dd57b60e8df505786'), 'test_id': 3},
 {'_id': ObjectId('5dc6f62dd57b60e8df505787'), 'test_id': 4},
 {'_id': ObjectId('5dc6f62dd57b60e8df505788'), 'test_id': 5}]
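
One detail worth keeping in mind with the update above: `$set` on the key `stats` replaces the whole `stats` sub-document, so any nested key left out of the new value is dropped. To change a single nested value, MongoDB's dot notation can be used instead. The sketch below only builds the update document; `dotted_set` is a hypothetical helper name, not part of pymongo.

```python
def dotted_set(prefix, fields):
    """Build a $set document that targets individual nested keys.

    dotted_set('stats', {'goals': 2}) -> {'$set': {'stats.goals': 2}}
    """
    return {'$set': {f'{prefix}.{key}': value for key, value in fields.items()}}


# Replaces the entire 'stats' sub-document: an existing 'shots' key would be lost.
replace_whole = {'$set': {'stats': {'goals': 2}}}

# Touches only stats.goals and leaves stats.shots as it is.
update_one_key = dotted_set('stats', {'goals': 2})
print(update_one_key)  # {'$set': {'stats.goals': 2}}

# Assumed usage against the test collection from this notebook:
# db.test.update_one({'test_id': 1}, update_one_key, upsert=True)
```

Which form is appropriate depends on whether the API returns the full set of stats each time (replace the sub-document) or only some of them (use dotted keys).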

## Deleting records

In [8]:
# db.test.delete_one({"_id": ObjectId('5dc6e8be6eb37078520186e4')})
db.test.delete_one({"test_id": 3})

<pymongo.results.DeleteResult at 0x110210aa0>

In [9]:
list(db.test.find())

[{'_id': ObjectId('5dc6f62cd57b60e8df505784'),
  'test_id': 1,
  'stats': {'goals': 1, 'shots': 10}},
 {'_id': ObjectId('5dc6f62cd57b60e8df505785'),
  'test_id': 2,
  'stats': {'goals': 4, 'shots': 9}},
 {'_id': ObjectId('5dc6f62dd57b60e8df505787'), 'test_id': 4},
 {'_id': ObjectId('5dc6f62dd57b60e8df505788'), 'test_id': 5}]

Using this workflow, we will be able to store stats and update them as they become available through the API.

## Sorting records
https://stackoverflow.com/questions/12031507/mongodb-sorting-by-nested-object-value  

1: ascending   
-1: descending

In [10]:
db.test.find().sort([('test_id',-1)])[0]

{'_id': ObjectId('5dc6f62dd57b60e8df505788'), 'test_id': 5}

In [11]:
# Games sorted by shots, descending
list(db.test.find().sort([('stats.shots',-1)]))

[{'_id': ObjectId('5dc6f62cd57b60e8df505784'),
  'test_id': 1,
  'stats': {'goals': 1, 'shots': 10}},
 {'_id': ObjectId('5dc6f62cd57b60e8df505785'),
  'test_id': 2,
  'stats': {'goals': 4, 'shots': 9}},
 {'_id': ObjectId('5dc6f62dd57b60e8df505787'), 'test_id': 4},
 {'_id': ObjectId('5dc6f62dd57b60e8df505788'), 'test_id': 5}]

In [12]:
# Games sorted by goals, ascending
list(db.test.find().sort([('stats.goals',1)]))

[{'_id': ObjectId('5dc6f62dd57b60e8df505787'), 'test_id': 4},
 {'_id': ObjectId('5dc6f62dd57b60e8df505788'), 'test_id': 5},
 {'_id': ObjectId('5dc6f62cd57b60e8df505784'),
  'test_id': 1,
  'stats': {'goals': 1, 'shots': 10}},
 {'_id': ObjectId('5dc6f62cd57b60e8df505785'),
  'test_id': 2,
  'stats': {'goals': 4, 'shots': 9}}]
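
The two sorted outputs above show where documents without `stats` end up: first when ascending, last when descending. MongoDB treats a missing field as null when sorting, and null compares lower than any number. The following is a plain-Python model of that ordering for numeric fields only, not how pymongo or the server implements it; `get_path` and `sort_like_mongo` are hypothetical helper names. The model keeps tied documents in input order, which the server does not promise.

```python
def get_path(doc, dotted):
    """Follow a dotted path such as 'stats.shots'; return None if any part is missing."""
    current = doc
    for part in dotted.split('.'):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def sort_like_mongo(docs, dotted, direction):
    """Model find().sort([(dotted, direction)]) for numeric fields.

    A missing field counts as null, which sorts below every number.
    """
    def key(doc):
        value = get_path(doc, dotted)
        return (0, 0) if value is None else (1, value)

    return sorted(docs, key=key, reverse=(direction == -1))


docs = [
    {'test_id': 1, 'stats': {'goals': 1, 'shots': 10}},
    {'test_id': 2, 'stats': {'goals': 4, 'shots': 9}},
    {'test_id': 4},
    {'test_id': 5},
]

print([d['test_id'] for d in sort_like_mongo(docs, 'stats.goals', 1)])   # [4, 5, 1, 2]
print([d['test_id'] for d in sort_like_mongo(docs, 'stats.shots', -1)])  # [1, 2, 4, 5]
```

Both printed orders match the notebook outputs above, so a query such as "lowest-scoring game" should filter on `stats` existing before sorting ascending.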

In [13]:
list(db.test.find({'stats.shots':9}).sort([('stats.goals',1)]))

[{'_id': ObjectId('5dc6f62cd57b60e8df505785'),
  'test_id': 2,
  'stats': {'goals': 4, 'shots': 9}}]

It is likely that we are going to need to check whether stats exist or not; we can do this using the syntax below. 

In [14]:
list(db.test.find({'stats':{'$exists': True}}))

[{'_id': ObjectId('5dc6f62cd57b60e8df505784'),
  'test_id': 1,
  'stats': {'goals': 1, 'shots': 10}},
 {'_id': ObjectId('5dc6f62cd57b60e8df505785'),
  'test_id': 2,
  'stats': {'goals': 4, 'shots': 9}}]

We can also select only the games that we don't yet have statistics for, so that we know which games to request from the API.

In [15]:
# Collect the _ids of documents that already have stats, then exclude them
ids_with_stats = [doc['_id'] for doc in db.test.find({'stats': {'$exists': True}}, {'_id': 1})]
list(db.test.find({'_id': {'$nin': ids_with_stats}}))

[{'_id': ObjectId('5dc6f62dd57b60e8df505787'), 'test_id': 4},
 {'_id': ObjectId('5dc6f62dd57b60e8df505788'), 'test_id': 5}]
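
The two-step `$nin` lookup above can likely be collapsed into one query, since `$exists: False` matches documents that do not contain the field at all. A minimal sketch, assuming a pymongo-style collection object is passed in; `games_missing_stats` is a hypothetical helper name.

```python
def games_missing_stats(collection):
    """Return the test_id of every document without a 'stats' field, using one query."""
    cursor = collection.find(
        {'stats': {'$exists': False}},   # only documents lacking 'stats'
        {'test_id': 1, '_id': 0},        # return test_id only
    )
    return [doc['test_id'] for doc in cursor]


# Assumed usage with the collection from this notebook:
# games_missing_stats(db.test)
```

This avoids pulling every `_id` that already has stats back to the client, which matters more as the collection grows.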

## Indexes

In [16]:
# default_language only applies to text indexes, so it is left out here
db.test.create_index('test_id', unique=True, name='test_id_pk')

'test_id_pk'

In [17]:
db.test.insert_one({'test_id': 1})

DuplicateKeyError: E11000 duplicate key error collection: football.test index: test_id_pk dup key: { : 1 }

With the unique index in place, we can't accidentally insert multiple records for a single match, which avoids ambiguous reads later.
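
Since a plain `insert_one` now raises on a repeated `test_id`, one way to register a match without handling the exception is an upsert that only writes on insert. This is a sketch under assumptions: `insert_if_absent` is a hypothetical helper, and it expects a pymongo-style collection whose `update_one` returns a result with an `upserted_id` attribute.

```python
def insert_if_absent(collection, test_id):
    """Create the document for test_id if it is missing; leave an existing one untouched.

    Returns True when a new document was inserted, False when one already existed.
    """
    result = collection.update_one(
        {'test_id': test_id},
        {'$setOnInsert': {'test_id': test_id}},  # applied only when the upsert inserts
        upsert=True,
    )
    return result.upserted_id is not None


# Assumed usage with the collection from this notebook:
# insert_if_absent(db.test, 1)   # False here, because test_id 1 already exists
```

The unique index is still worth keeping, as it remains the guard against duplicates if two writers upsert the same `test_id` at once.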