## **04 MongoDB Python Advanced**

### 0. Basic pymongo template code
> Using the sample_mflix dataset, this section looks at how the MongoDB syntax covered so far can be applied in pymongo.

In [1]:
from pymongo import MongoClient

# Connect to MongoDB (when no authentication is required)
client = MongoClient("mongodb://localhost:27017")
# client = MongoClient("mongodb://username:password@localhost:27017")
# Use the first form when authentication is not required, and the second form when it is

db = client.sample_mflix   # use sample_mflix (select the database)
movies = db.movies         # select the 'movies' collection

In [2]:
movies

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'sample_mflix'), 'movies')
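
When the authenticated form of the connection string is used, reserved characters in the username or password (for example `@`, `:` or `/`) need to be percent-encoded before they go into the URI. The following is a minimal sketch using only the standard library; `build_mongo_uri` and the sample credentials are hypothetical names introduced for illustration.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host="localhost", port=27017):
    """Return a mongodb:// URI with percent-encoded credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}:{port}"


# Hypothetical credentials, for illustration only
uri = build_mongo_uri("app_user", "p@ss/word")
print(uri)
# The resulting string can then be passed to MongoClient(uri)
```

Encoding the credentials separately keeps an `@` inside the password from being read as the separator between credentials and host.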

### Applying find() in various ways

**1. Projection - specifying which fields appear in the result documents:**

In [3]:
movies.find({"year": 1923}, {"_id": 0, "title": 1, "year": 1}) 
# db.movies.find( { year: 1923 }, { _id: 0, title: 1, year: 1 } )

<pymongo.synchronous.cursor.Cursor at 0x22c0f1446e0>

In [4]:
for movie in movies.find({"year": 1923}, {"_id": 0, "title": 1, "year": 1}):
    print(movie, movie['year'])

{'title': 'The Hunchback of Notre Dame', 'year': 1923} 1923
{'title': 'Our Hospitality', 'year': 1923} 1923
{'title': 'Safety Last!', 'year': 1923} 1923
{'title': 'Three Ages', 'year': 1923} 1923
{'title': 'A Woman of Paris: A Drama of Fate', 'year': 1923} 1923
{'title': 'The Chechahcos', 'year': 1923} 1923


In [5]:
type(movie)

dict
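
Because a projection is just a Python dict, it can be built programmatically. The sketch below assumes an inclusion-style projection; `make_projection` is a hypothetical helper, not part of pymongo. It reflects the rule that `_id` is returned by default and is the only field that may be excluded alongside included fields.

```python
def make_projection(fields, include_id=False):
    """Build an inclusion projection such as {"title": 1, "_id": 0}."""
    projection = {field: 1 for field in fields}
    if not include_id:
        projection["_id"] = 0  # _id is returned unless explicitly excluded
    return projection


projection = make_projection(["title", "year"])
print(projection)
# Could then be used as: movies.find({"year": 1923}, projection)
```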

**2. Comparison query operators - using MongoDB comparison query operators:**

In [6]:
list(movies.find({"year": {"$lt": 1910}}, {"_id": 0, "title": 1, "year": 1})) 
# db.movies.find( { year: { $lt: 1910 } }, { _id: 0, title: 1, year: 1 } )

[{'title': 'Blacksmith Scene', 'year': 1893},
 {'title': 'The Great Train Robbery', 'year': 1903},
 {'title': 'A Corner in Wheat', 'year': 1909},
 {'title': 'The Kiss', 'year': 1896},
 {'title': 'Dickson Experimental Sound Film', 'year': 1894},
 {'title': 'The Kiss', 'year': 1896},
 {'title': 'Newark Athlete', 'year': 1891}]

In [7]:
# Find movies released before 1910
for movie in movies.find({"year": {"$lt": 1910}}, {"_id": 0, "title": 1, "year": 1}):
    print(movie)

{'title': 'Blacksmith Scene', 'year': 1893}
{'title': 'The Great Train Robbery', 'year': 1903}
{'title': 'A Corner in Wheat', 'year': 1909}
{'title': 'The Kiss', 'year': 1896}
{'title': 'Dickson Experimental Sound Film', 'year': 1894}
{'title': 'The Kiss', 'year': 1896}
{'title': 'Newark Athlete', 'year': 1891}
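
Several comparison operators can be placed on the same field to express a range. The sketch below builds such a filter and mirrors its meaning in plain Python; `year_range_filter` is a hypothetical helper, and the sample years are taken from the output above.

```python
def year_range_filter(start, end):
    """Build a filter matching documents with start <= year < end."""
    return {"year": {"$gte": start, "$lt": end}}


decade = year_range_filter(1900, 1910)
print(decade)
# Could then be used as: movies.find(decade, {"_id": 0, "title": 1, "year": 1})

# Plain-Python equivalent of the same condition, applied to the years printed above
sample_years = [1893, 1903, 1909, 1896, 1894, 1896, 1891]
in_range = [year for year in sample_years if 1900 <= year < 1910]
print(in_range)
```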


**3. Logical query operators - using MongoDB logical query operators:**

In [8]:
# Find movies released before 1900 or after 2015
for movie in movies.find(
        {"$or": [{"year": {"$lt": 1900}}, {"year": {"$gt": 2015}}]},
        {"_id": 0, "title": 1, "year": 1}
):
    print(movie)

{'title': 'Blacksmith Scene', 'year': 1893}
{'title': 'The Kiss', 'year': 1896}
{'title': 'Dickson Experimental Sound Film', 'year': 1894}
{'title': 'The Kiss', 'year': 1896}
{'title': 'Newark Athlete', 'year': 1891}
{'title': 'The Masked Saint', 'year': 2016}
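
An `$or` clause can sit next to ordinary field conditions in the same filter; top-level keys are combined with an implicit AND. A minimal sketch, where `early_or_recent` is a hypothetical helper and the extra `type` condition is only an illustration:

```python
def early_or_recent(before, after, **extra):
    """Build a filter for year < before OR year > after, ANDed with extra fields."""
    query = {"$or": [{"year": {"$lt": before}}, {"year": {"$gt": after}}]}
    query.update(extra)  # top-level keys are combined with an implicit AND
    return query


query = early_or_recent(1900, 2015, type="movie")
print(query)
# Could then be used as: movies.find(query, {"_id": 0, "title": 1, "year": 1})
```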


**4. Array query operators - using MongoDB array query operators:**

In [23]:
# Find movies whose genres include both 'Action' and 'Sci-Fi'
for movie in movies.find(
        {"genres": {"$all": ["Action", "Sci-Fi"]}},
        {"_id": 0, "title": 1, "year": 1, "genres": 1}
):
    print(movie)

{'genres': ['Action', 'Adventure', 'Sci-Fi'], 'title': 'Flash Gordon', 'year': 1936}
{'genres': ['Action', 'Sci-Fi', 'Thriller'], 'title': 'The War of the Worlds', 'year': 1953}
{'genres': ['Action', 'Sci-Fi'], 'title': 'The 10th Victim', 'year': 1965}
{'genres': ['Action', 'Drama', 'Sci-Fi'], 'title': 'Das Millionenspiel', 'year': 1970}
{'genres': ['Action', 'Sci-Fi'], 'title': 'Battle for the Planet of the Apes', 'year': 1973}
{'genres': ['Action', 'Sci-Fi', 'Thriller'], 'title': 'Westworld', 'year': 1973}
{'genres': ['Action', 'Sci-Fi', 'Sport'], 'title': 'Rollerball', 'year': 1975}
{'genres': ['Action', 'Adventure', 'Sci-Fi'], 'title': 'Buck Rogers in the 25th Century', 'year': 1979}
{'year': 1978, 'genres': ['Action', 'Drama', 'Sci-Fi'], 'title': 'Superman'}
{'genres': ['Action', 'Adventure', 'Sci-Fi'], 'title': 'Message from Space', 'year': 1978}
{'genres': ['Action', 'Adventure', 'Sci-Fi'], 'title': 'Mad Max', 'year': 1979}
{'genres': ['Action', 'Drama', 'Sci-Fi'], 'title': 'The

**5. Sorting (sort), skipping leading results (skip), and limiting the number of results (limit):**
- Chained onto find() as separate methods

In [10]:
list(movies.find().sort("imdb.rating", -1).skip(3).limit(3) )
# db.movies.find().sort( { "imdb.rating": -1 } ).skip(3).limit(3)

[{'_id': ObjectId('573a13d3f29313caabd9473c'),
  'plot': 'As profiled in the film "Aging Out", Risa Bejarano was a foster care success story. Recently graduated, she set out for college with multiple scholarships and a sense of excitement about ...',
  'genres': ['Documentary'],
  'title': 'No Tomorrow',
  'poster': 'https://m.media-amazon.com/images/M/MV5BMjI2OTgxODExOF5BMl5BanBnXkFtZTgwMzQ3NTA2MDE@._V1_SY1000_SX677_AL_.jpg',
  'countries': ['USA'],
  'fullplot': 'As profiled in the film "Aging Out", Risa Bejarano was a foster care success story. Recently graduated, she set out for college with multiple scholarships and a sense of excitement about her future. Then, she was brutally murdered. Soon, "Aging Out" became the centerpiece of Risa\'s murder trial, as prosecutors used the film to heighten sympathy for the victim and hatred for the defendant. Troubled that their documentary was being used to advance the prosecutor\'s argument for the death penalty, filmmakers Vanessa Roth (Acad

In [11]:
for movie in movies.find().sort("imdb.rating", -1).skip(3).limit(3):  # -1 means descending order.
    print(movie)

{'_id': ObjectId('573a13d3f29313caabd9473c'), 'plot': 'As profiled in the film "Aging Out", Risa Bejarano was a foster care success story. Recently graduated, she set out for college with multiple scholarships and a sense of excitement about ...', 'genres': ['Documentary'], 'title': 'No Tomorrow', 'poster': 'https://m.media-amazon.com/images/M/MV5BMjI2OTgxODExOF5BMl5BanBnXkFtZTgwMzQ3NTA2MDE@._V1_SY1000_SX677_AL_.jpg', 'countries': ['USA'], 'fullplot': 'As profiled in the film "Aging Out", Risa Bejarano was a foster care success story. Recently graduated, she set out for college with multiple scholarships and a sense of excitement about her future. Then, she was brutally murdered. Soon, "Aging Out" became the centerpiece of Risa\'s murder trial, as prosecutors used the film to heighten sympathy for the victim and hatred for the defendant. Troubled that their documentary was being used to advance the prosecutor\'s argument for the death penalty, filmmakers Vanessa Roth (Academy Aware win
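
The `skip`/`limit` pair maps naturally onto page-based access. The sketch below is one possible way to wrap it; `page_window` and `fetch_page` are hypothetical helpers, and `fetch_page` assumes it is given a pymongo collection such as `movies`. Adding `_id` as a secondary sort key is an assumption made here so that documents with equal ratings keep a stable order between pages.

```python
def page_window(page, page_size):
    """Translate a 1-based page number into (skip, limit) values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (page - 1) * page_size, page_size


def fetch_page(collection, page, page_size, sort_field="imdb.rating"):
    """Return one page of documents, highest sort_field first."""
    skip, limit = page_window(page, page_size)
    # A list of (field, direction) pairs sorts on several keys in order
    cursor = collection.find().sort([(sort_field, -1), ("_id", 1)])
    return list(cursor.skip(skip).limit(limit))


# Page 2 with 3 items per page is the same window as skip(3).limit(3)
print(page_window(2, 3))
```

With a live connection, `fetch_page(movies, 2, 3)` would request the same window as the cells above, with ties broken by `_id`.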

**6. Regular expressions and pymongo**

- A regular expression object created with the `compile` function of `re`, Python's regular expression module, can be passed to pymongo as a query value.

- Example: re.I (IGNORECASE) makes matching case-insensitive, so 'Star', 'STAR', 'star', 'sTaR' and so on are all matched.

In [12]:
import re
regex = re.compile('Star', re.I)  # Matches 'Star' case-insensitively.

for movie in movies.find({"title": regex}).limit(1): # Find movies whose title contains 'Star'
    print(movie)

{'_id': ObjectId('573a1392f29313caabcdb497'), 'plot': 'A young woman comes to Hollywood with dreams of stardom, but achieves them only with the help of an alcoholic leading man whose best days are behind him.', 'genres': ['Drama'], 'runtime': 111, 'rated': 'NOT RATED', 'cast': ['Janet Gaynor', 'Fredric March', 'Adolphe Menjou', 'May Robson'], 'poster': 'https://m.media-amazon.com/images/M/MV5BMmE5ODI0NzMtYjc5Yy00MzMzLTk5OTQtN2Q3MzgwOTllMTY3XkEyXkFqcGdeQXVyNjc0MzMzNjA@._V1_SY1000_SX677_AL_.jpg', 'title': 'A Star Is Born', 'fullplot': 'Esther Blodgett is just another starry-eyed farm kid trying to break into the movies. Waitressing at a Hollywood party, she catches the eye of alcoholic star Norman Maine, is given a test, and is caught up in the Hollywood glamor machine (ruthlessly satirized). She and her idol Norman marry; but his career abruptly dwindles to nothing', 'languages': ['English'], 'released': datetime.datetime(1937, 4, 27, 0, 0), 'directors': ['William A. Wellman', 'Jack Con

In [13]:
list(movies.find({"title": regex}).limit(1)) # db.movies.find( { title: /Star/i } )

[{'_id': ObjectId('573a1392f29313caabcdb497'),
  'plot': 'A young woman comes to Hollywood with dreams of stardom, but achieves them only with the help of an alcoholic leading man whose best days are behind him.',
  'genres': ['Drama'],
  'runtime': 111,
  'rated': 'NOT RATED',
  'cast': ['Janet Gaynor', 'Fredric March', 'Adolphe Menjou', 'May Robson'],
  'poster': 'https://m.media-amazon.com/images/M/MV5BMmE5ODI0NzMtYjc5Yy00MzMzLTk5OTQtN2Q3MzgwOTllMTY3XkEyXkFqcGdeQXVyNjc0MzMzNjA@._V1_SY1000_SX677_AL_.jpg',
  'title': 'A Star Is Born',
  'fullplot': 'Esther Blodgett is just another starry-eyed farm kid trying to break into the movies. Waitressing at a Hollywood party, she catches the eye of alcoholic star Norman Maine, is given a test, and is caught up in the Hollywood glamor machine (ruthlessly satirized). She and her idol Norman marry; but his career abruptly dwindles to nothing',
  'languages': ['English'],
  'released': datetime.datetime(1937, 4, 27, 0, 0),
  'directors': ['William

- A regular expression can also be passed to pymongo directly, without the re module
- `$options`
   - `i`: case-insensitive matching (re.I in the re library)

In [14]:
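# Find movies whose title contains 'Star' (without the re module)
list(movies.find({"title": {"$regex": "Star", "$options": "i"}}).limit(1)) # db.movies.find( { title: { $regex: "Star", $options: "i" } } ).limit(1)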

[{'_id': ObjectId('573a1392f29313caabcdb497'),
  'plot': 'A young woman comes to Hollywood with dreams of stardom, but achieves them only with the help of an alcoholic leading man whose best days are behind him.',
  'genres': ['Drama'],
  'runtime': 111,
  'rated': 'NOT RATED',
  'cast': ['Janet Gaynor', 'Fredric March', 'Adolphe Menjou', 'May Robson'],
  'poster': 'https://m.media-amazon.com/images/M/MV5BMmE5ODI0NzMtYjc5Yy00MzMzLTk5OTQtN2Q3MzgwOTllMTY3XkEyXkFqcGdeQXVyNjc0MzMzNjA@._V1_SY1000_SX677_AL_.jpg',
  'title': 'A Star Is Born',
  'fullplot': 'Esther Blodgett is just another starry-eyed farm kid trying to break into the movies. Waitressing at a Hollywood party, she catches the eye of alcoholic star Norman Maine, is given a test, and is caught up in the Hollywood glamor machine (ruthlessly satirized). She and her idol Norman marry; but his career abruptly dwindles to nothing',
  'languages': ['English'],
  'released': datetime.datetime(1937, 4, 27, 0, 0),
  'directors': ['William

In [15]:
for movie in movies.find({"title": {"$regex": "star", "$options": 'i'}}).limit(1): 
    print(movie)

{'_id': ObjectId('573a1392f29313caabcdb497'), 'plot': 'A young woman comes to Hollywood with dreams of stardom, but achieves them only with the help of an alcoholic leading man whose best days are behind him.', 'genres': ['Drama'], 'runtime': 111, 'rated': 'NOT RATED', 'cast': ['Janet Gaynor', 'Fredric March', 'Adolphe Menjou', 'May Robson'], 'poster': 'https://m.media-amazon.com/images/M/MV5BMmE5ODI0NzMtYjc5Yy00MzMzLTk5OTQtN2Q3MzgwOTllMTY3XkEyXkFqcGdeQXVyNjc0MzMzNjA@._V1_SY1000_SX677_AL_.jpg', 'title': 'A Star Is Born', 'fullplot': 'Esther Blodgett is just another starry-eyed farm kid trying to break into the movies. Waitressing at a Hollywood party, she catches the eye of alcoholic star Norman Maine, is given a test, and is caught up in the Hollywood glamor machine (ruthlessly satirized). She and her idol Norman marry; but his career abruptly dwindles to nothing', 'languages': ['English'], 'released': datetime.datetime(1937, 4, 27, 0, 0), 'directors': ['William A. Wellman', 'Jack Con
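
When the search text comes from user input, characters such as `.` or `*` would otherwise be interpreted as regular expression syntax. The sketch below escapes the text with `re.escape` before placing it in `$regex`. `contains_filter` is a hypothetical helper, and it assumes that Python's backslash escaping of punctuation is also accepted by the server's regex engine.

```python
import re


def contains_filter(field, text):
    """Build a case-insensitive 'contains' filter that treats text literally."""
    return {field: {"$regex": re.escape(text), "$options": "i"}}


query = contains_filter("title", "A.I.")
print(query)
# Could then be used as: movies.find(query).limit(1)

# Local check with Python's own regex engine:
# the unescaped pattern "A.I." also matches "Alien", the escaped one does not.
escaped = re.compile(query["title"]["$regex"], re.I)
unescaped = re.compile("A.I.", re.I)
print(bool(unescaped.search("Alien")), bool(escaped.search("Alien")))
```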

**7. distinct: this method returns all unique values of a given field.**

In [16]:
distinct_genres = movies.distinct('genres') # db.movies.distinct("genres")
print(distinct_genres)

['Action', 'Adventure', 'Animation', 'Biography', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Family', 'Fantasy', 'Film-Noir', 'History', 'Horror', 'Music', 'Musical', 'Mystery', 'News', 'Romance', 'Sci-Fi', 'Short', 'Sport', 'Talk-Show', 'Thriller', 'War', 'Western']
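
Note that `genres` is an array field, yet the result above is a flat list of strings: for array fields, `distinct` treats each array element as a separate value. The following pure-Python emulation illustrates that behaviour on a few made-up documents. `distinct_values` and `sample_docs` are hypothetical, and this is not how the server implements it.

```python
def distinct_values(documents, field):
    """Emulate distinct(): array elements count as individual values."""
    values = set()
    for doc in documents:
        if field not in doc:
            continue  # documents without the field contribute nothing
        value = doc[field]
        if isinstance(value, list):
            values.update(value)
        else:
            values.add(value)
    return sorted(values)


# Made-up documents, for illustration only
sample_docs = [
    {"title": "A", "genres": ["Action", "Sci-Fi"]},
    {"title": "B", "genres": ["Drama"]},
    {"title": "C", "genres": ["Action", "Drama"]},
    {"title": "D"},
]
result = distinct_values(sample_docs, "genres")
print(result)
```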


**8. $in: this operator selects documents whose field value matches any one of the values in a given array.**

In [17]:
movies.find({'genres': {'$in': ['Action', 'Adventure']}}).limit(3) 
# db.movies.find( { genres: { $in: [ "Action", "Adventure" ] } } ).limit(3)
# Find movies in the 'Action' or 'Adventure' genre

<pymongo.synchronous.cursor.Cursor at 0x22c1990b4d0>

In [18]:
for movie in movies.find({'genres': {'$in': ['Action', 'Adventure']}}).limit(3):
    print(movie['genres'])

['Action']
['Action', 'Adventure', 'Crime']
['Comedy', 'Short', 'Action']
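
On an array field such as `genres`, `$in` requires at least one of the listed values to be present, whereas `$all` (section 4) requires every listed value. The plain-Python sketch below emulates that difference for array fields only; `matches_in` and `matches_all` are hypothetical helpers.

```python
def matches_in(doc_values, wanted):
    """Emulate $in on an array field: at least one wanted value is present."""
    return any(value in wanted for value in doc_values)


def matches_all(doc_values, wanted):
    """Emulate $all on an array field: every wanted value is present."""
    return all(value in doc_values for value in wanted)


genres = ["Comedy", "Short", "Action"]  # third document in the output above
wanted = ["Action", "Adventure"]
print(matches_in(genres, wanted), matches_all(genres, wanted))
```

The third document printed above matches the `$in` query because it contains 'Action', but it would not match the same list under `$all` because 'Adventure' is absent.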


**9. $exists: this operator selects documents based on whether a given field is present in the document.**

In [19]:
movies.find({'writers': {'$exists': False}}).limit(3) 
# db.movies.find( { writers: { $exists: false } } ).limit(3)
# Find movies that have no 'writers' field

<pymongo.synchronous.cursor.Cursor at 0x22c19a442d0>

In [20]:
for movie in movies.find({'writers': {'$exists': False}}).limit(3):
    print(movie)

{'_id': ObjectId('573a1390f29313caabcd4135'), 'plot': 'Three men hammer on an anvil and pass a bottle of beer around.', 'genres': ['Short'], 'runtime': 1, 'cast': ['Charles Kayser', 'John Ott'], 'num_mflix_comments': 1, 'title': 'Blacksmith Scene', 'fullplot': 'A stationary camera looks at a large anvil with a blacksmith behind it and one on either side. The smith in the middle draws a heated metal rod from the fire, places it on the anvil, and all three begin a rhythmic hammering. After several blows, the metal goes back in the fire. One smith pulls out a bottle of beer, and they each take a swig. Then, out comes the glowing metal and the hammering resumes.', 'countries': ['USA'], 'released': datetime.datetime(1893, 5, 9, 0, 0), 'directors': ['William K.L. Dickson'], 'rated': 'UNRATED', 'awards': {'wins': 1, 'nominations': 0, 'text': '1 win.'}, 'lastupdated': '2015-08-26 00:03:50.133000000', 'year': 1893, 'imdb': {'rating': 6.2, 'votes': 1189, 'id': 5}, 'type': 'movie', 'tomatoes': {'
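
`{"$exists": False}` is not the same as comparing with `None`: a filter such as `{"writers": None}` matches documents where the field is null as well as documents where it is missing. The pure-Python emulation below illustrates the difference on made-up documents; the helper names and `docs` are hypothetical.

```python
def matches_exists(doc, field, should_exist):
    """Emulate {field: {"$exists": should_exist}}."""
    return (field in doc) == should_exist


def matches_null(doc, field):
    """Emulate {field: None}: true when the field is null or missing."""
    return doc.get(field) is None


# Made-up documents, for illustration only
docs = [
    {"title": "A", "writers": ["X"]},
    {"title": "B", "writers": None},
    {"title": "C"},
]
missing = [d["title"] for d in docs if matches_exists(d, "writers", False)]
null_or_missing = [d["title"] for d in docs if matches_null(d, "writers")]
print(missing, null_or_missing)
```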

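**10. count_documents: this method returns the number of documents that match a query.**
- The count is obtained by calling count_documents() instead of find()

> find().count() used to be another way to count documents, but Cursor.count() was deprecated in PyMongo 3.7 and removed in PyMongo 4.0, so it is not available in current versions of the driver.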

In [21]:
# Count movies in the 'Action' or 'Adventure' genre
count = movies.count_documents({'genres': {'$in': ['Action', 'Adventure']}}) 
# db.movies.countDocuments( { genres: { $in: [ "Action", "Adventure" ] } } )
print(count)

3805
