FastApi & MongoDB - the full guide #1515

Closed

mclate opened this issue Jun 4, 2020 · 33 comments

@mclate

mclate commented Jun 4, 2020

Description

In this issue I'd like to gather all the information about using MongoDB, FastAPI, and Pydantic together. At this point this is a "rather complete" solution, but I'd like to gather feedback and comments from the community to see how it can be improved.

The biggest pain point that started this and several other threads about using FastAPI with Mongo is the _id field. There are several issues here:

  1. The best-known one: the _id field is an ObjectId, which is not very JSON-friendly.
  2. The _id field's name is not very Python-friendly (written as is in a Pydantic model, it would become a private field; many IDEs will point that out).

Below I'll try to describe the solutions I've found in different places and see what cases they cover and what's left unsolved.

Let's say we have Joe, a regular developer. Joe just discovered FastAPI and is familiar with Mongo (to the extent that he can create and fetch documents from the DB). Joe wants to build a clean and fast API that would:

1️⃣ Be able to define mongo-compatible documents as regular Pydantic models (with all the proper validations in place):

class User(BaseModel):
    id: ObjectId = Field(description="User id")
    name: str = Field()

2️⃣ Write routes that would use native Pydantic models as usual:

@app.post('/me', response_model=User)
def save_me(body: User):
   ...

3️⃣ Have the API return JSON like {"id": "5ed8b7eaccda20c1d4e95bb0", "name": "Joe"} (in the "outer world" it's quite expected to have an id field for the document rather than _id. And it just looks nicer.)
4️⃣ Have the Swagger and ReDoc documentation display the fields id (str) and name (str)
5️⃣ Be able to save Pydantic documents into Mongo with proper id field substitution:

user = User(id=ObjectId(), name='Joe')
inserted = db.user.insert_one(user) # This should insert document as `{"_id": user.id, "name": "Joe"}`
assert inserted.inserted_id == user.id

6️⃣ Be able to fetch documents from Mongo with proper id matching:

user_id = ObjectId()
found = db.user.find_one({"_id": user_id})
user = User(**found)
assert user.id == user_id

Known solutions

Validating ObjectId

As proposed in #452, one can define a custom field type for ObjectId and apply validation to it. One can also create a base model that encodes ObjectId into strings:

from datetime import datetime

from bson import ObjectId
from bson.errors import InvalidId
from fastapi import FastAPI
from pydantic import BaseConfig, BaseModel, Field

app = FastAPI()


class OID(str):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v):
        try:
            return ObjectId(str(v))
        except InvalidId:
            raise ValueError("Not a valid ObjectId")


class MongoModel(BaseModel):
    class Config(BaseConfig):
        json_encoders = {
            datetime: lambda dt: dt.isoformat(),
            ObjectId: lambda oid: str(oid),
        }

class User(MongoModel):
    id: OID = Field()
    name: str = Field()


@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    return body

Now we have:

1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣
☑️ ☑️

Dealing with _id

Another suggested option is to use alias="_id" on the Pydantic model:

class MongoModel(BaseModel):
    class Config(BaseConfig):
        allow_population_by_field_name = True  # << Added
        json_encoders = {
            datetime: lambda dt: dt.isoformat(),
            ObjectId: lambda oid: str(oid),
        }

class User(MongoModel):
    id: OID = Field(alias="_id")  # << Notice alias
    name: str = Field()


@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    res = db.insert_one(body.dict(by_alias=True))  # << Inserting as dict with aliased fields
    assert res.inserted_id == body.id
    return body

Now we are able to save to the DB using the User.id field as _id - that solves 5️⃣.

However, now Swagger and ReDoc show the id field as _id, and the JSON that is returned looks like this: {"_id":"5ed803afba6455fd78659988","name":"Joe"}. This is a regression for 3️⃣ and 4️⃣.
Now we have:

1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣
☑️ ☑️ ✅️ ☑️

Hacking our way through

We can do some extra coding to keep the id field and still insert into the DB properly. Effectively, we're shuffling the id and _id fields in MongoModel upon dumping/loading.

class MongoModel(BaseModel):

    class Config(BaseConfig):
        allow_population_by_field_name = True
        json_encoders = {
            datetime: lambda dt: dt.isoformat(),
            ObjectId: lambda oid: str(oid),
        }

    @classmethod
    def from_mongo(cls, data: dict):
        """We must convert _id into "id". """
        if not data:
            return data
        id = data.pop('_id', None)
        return cls(**dict(data, id=id))

    def mongo(self, **kwargs):
        exclude_unset = kwargs.pop('exclude_unset', True)
        by_alias = kwargs.pop('by_alias', True)

        parsed = self.dict(
            exclude_unset=exclude_unset,
            by_alias=by_alias,
            **kwargs,
        )

        # Mongo uses `_id` as default key. We should stick to that as well.
        if '_id' not in parsed and 'id' in parsed:
            parsed['_id'] = parsed.pop('id')

        return parsed

@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    res = db.insert_one(body.mongo())  # << Notice that we should use `User.mongo()` now.
    assert res.inserted_id == body.id
    return body

This brings back documentation and proper output and solves the insertion:

1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣
✅️ ✅️ ✅️ ☑️

Looks like we're getting closer...

Fetching docs from DB

Now, let's try to fetch a doc from the DB and return it:

@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    res = db.insert_one(body.mongo())  # << Notice that we should use `User.mongo()` now.
    assert res.inserted_id == body.id

    found = col.find_one({'_id': res.inserted_id})
    return found

    """
    pydantic.error_wrappers.ValidationError: 1 validation error for User
    response -> id
      field required (type=value_error.missing)
    """

The workaround for this is to use User.from_mongo:

@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    res = db.insert_one(body.mongo())
    assert res.inserted_id == body.id

    found = col.find_one({'_id': res.inserted_id})
    return User.from_mongo(found)  # << Notice that we should use `User.from_mongo()` now.

This seems to cover fetching from the DB. Now we have:

1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣
✅️ ✅️ ✅️ ✅️

Conclusion and questions

Under the spoiler one can find the final code to make FastAPI work with Mongo in the most "native" way:

Full code
from datetime import datetime

from bson import ObjectId
from bson.errors import InvalidId
from fastapi import FastAPI
from pydantic import BaseConfig, BaseModel, Field
from pymongo import MongoClient

app = FastAPI()
col = MongoClient()["test_db"]["users"]  # target collection; database and collection names are illustrative


class OID(str):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v):
        try:
            return ObjectId(str(v))
        except InvalidId:
            raise ValueError("Not a valid ObjectId")


class MongoModel(BaseModel):

    class Config(BaseConfig):
        allow_population_by_field_name = True
        json_encoders = {
            datetime: lambda dt: dt.isoformat(),
            ObjectId: lambda oid: str(oid),
        }

    @classmethod
    def from_mongo(cls, data: dict):
        """We must convert _id into "id". """
        if not data:
            return data
        id = data.pop('_id', None)
        return cls(**dict(data, id=id))

    def mongo(self, **kwargs):
        exclude_unset = kwargs.pop('exclude_unset', True)
        by_alias = kwargs.pop('by_alias', True)

        parsed = self.dict(
            exclude_unset=exclude_unset,
            by_alias=by_alias,
            **kwargs,
        )

        # Mongo uses `_id` as default key. We should stick to that as well.
        if '_id' not in parsed and 'id' in parsed:
            parsed['_id'] = parsed.pop('id')

        return parsed


class User(MongoModel):
    id: OID = Field()
    name: str = Field()


@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    res = col.insert_one(body.mongo())
    assert res.inserted_id == body.id

    found = col.find_one({'_id': res.inserted_id})
    return User.from_mongo(found)

And the list of things that are sub-optimal with given code:

  1. One can no longer return arbitrary data and expect FastAPI to apply response_model validation - you have to use User.from_mongo with every return. This is essentially code duplication; it would be nice to get rid of it somehow (one possible mitigation is sketched below).
  2. The amount of "boilerplate" code needed to make FastAPI work "natively" with Mongo is quite significant, and it's not that straightforward. This can lead to errors and raises the entry bar for someone who wants to start using FastAPI with Mongo.
  3. There is still a duality where models use the id field while all Mongo queries are built using _id. I'm afraid there is no way to get rid of this, though... (I'm aware that MongoEngine and other ODM engines cover this, but I specifically decided to stay out of that subject and focus on "native" code.)
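As a rough illustration of one way to reduce the duplication from point 1 (a sketch only, reusing the MongoModel, User, app, and col objects from the full code above; the decorator name is made up):

import functools

def returns_mongo_model(model_cls):
    """Convert a route's raw Mongo result via model_cls.from_mongo before FastAPI validates it."""
    def decorator(func):
        @functools.wraps(func)  # preserves the signature, so FastAPI's dependency handling still works
        def wrapper(*args, **kwargs):
            return model_cls.from_mongo(func(*args, **kwargs))
        return wrapper
    return decorator


@app.get('/me/{user_id}', response_model=User)
@returns_mongo_model(User)
def get_me(user_id: str):
    return col.find_one({'_id': ObjectId(user_id)})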
@mclate mclate added the question Question or problem label Jun 4, 2020
@dbanty
Contributor

dbanty commented Jun 7, 2020

Firstly, nice work! As you said, this is a fully working solution to using MongoDB with FastAPI that I'm sure will benefit people going forward.

I would highly recommend that, if this is to become the "recommended" way of working with MongoDB, we recommend an ODM (object-document mapper) and show any potential issues with using one with Pydantic/FastAPI. The main reasons are:

  1. It would fall more in line with most of the examples in the docs (e.g. SQLAlchemy). Most FastAPI examples with response models show returning ORM-like objects. An ODM is the natural translation for Mongo.
  2. It resolves the 3 sub-optimal points you mentioned above.
  3. It encourages using objects instead of dicts for all code, which allows type annotations and editor completion.

The existing ODMs are not great. I don't think any of the major ones include type annotations or bulk write support. But they are fairly lightweight and get us most of the way there, and allow you to reach down into raw Mongo queries when you need to. I think if we're going to put some development effort into making Mongo easier to use with Pydantic/FastAPI, it would be best spent writing docs that are as accessible as possible and maybe contributing to existing ODMs to clear up any sticking points.

Obviously ODMs can be a contentious topic, but so can ORMs, and FastAPI does not shy away from showing them as the easier way to get started. I think in an ideal world, we'd include the more straightforward "here's an ODM, point and click" approach first and the more advanced "DIY" approach after, for people who want to wander into the deep end.
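For illustration only, a rough sketch of the pattern described above, assuming MongoEngine as the ODM and an existing FastAPI app (the model, field, and database names are made up, and this isn't an endorsement of a specific library):

import mongoengine
from pydantic import BaseModel

mongoengine.connect("example_db")  # hypothetical database name


class UserDoc(mongoengine.Document):
    name = mongoengine.StringField(required=True)


class UserIn(BaseModel):
    name: str


class UserOut(BaseModel):
    id: str
    name: str


@app.post("/me", response_model=UserOut)
def save_me(body: UserIn):
    doc = UserDoc(name=body.name).save()  # the ODM owns _id/ObjectId handling
    return UserOut(id=str(doc.id), name=doc.name)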

@mdgilene

mdgilene commented Jun 11, 2020

Correct me if I'm wrong, but isn't the real missing piece in all of this the serializers/deserializers/validators for all the Mongo/BSON datatypes in Pydantic? If Pydantic added support for all the extra datatypes, then you could just return a MongoEngine instance directly, no?

@leonh

leonh commented Jul 24, 2020

Interested to know how people are handling the creation of indexes in MongoDB. Does anyone know of a suitable way to define an index on a Pydantic model?
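There doesn't seem to be a Pydantic-native way to declare indexes; a common workaround is to create them with the driver at application startup, next to (but outside of) the model. A sketch assuming pymongo and an existing FastAPI app (database, collection, and field names are made up):

from pymongo import ASCENDING, MongoClient

client = MongoClient()
db = client["example_db"]


@app.on_event("startup")
def create_indexes():
    # create_index is effectively idempotent: it is a no-op if the index already exists
    db.users.create_index([("email", ASCENDING)], unique=True)
    db.users.create_index("name")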

@MarkShawn2020

MarkShawn2020 commented Jul 24, 2020

Here it comes!

I gave up on the json_encoder in FastAPI and developed a handier one, specialized for MongoDB.

Keep in mind that if the document already has an _id field, MongoDB won't generate an ObjectId for it.

So it's better to always generate our own _id.

# -*- coding: utf-8 -*-
# -----------------------------------
# @CreateTime   : 2020/7/25 0:27
# @Author       : Mark Shawn
# @Email        : shawninjuly@gmail.com
# ------------------------------------

import json
from datetime import datetime, date
from typing import Union
from uuid import UUID

from bson import ObjectId
from pydantic import BaseModel


def mongo_json_encoder(record: Union[dict, list, BaseModel]):
    """
    This is a json_encoder designed specially for dumping MongoDB records.

    It can deal with both record-item and record-list types queried from MongoDB.

    You can extend the encoder's abilities in the recursive function `convert_type`.

    I just covered the following datatypes: datetime, date, UUID, ObjectId.

    Contact me if any further support is needed.

    Attention: it changes the raw record in place, so copy it before applying this function if necessary.

    Parameters
    ----------
    **record**: a dict or a list, like the queried documents from mongodb.

    Returns
    -------
    json formatted data.
    """

    def convert_type(data):
        if isinstance(data, (datetime, date)):
            # ISO format: data.isoformat()
            return str(data)
        elif isinstance(data, (UUID, ObjectId)):
            return str(data)
        elif isinstance(data, list):
            return list(map(convert_type, data))
        elif isinstance(data, dict):
            return mongo_json_encoder(data)
        try:
            json.dumps(data)
            return data
        except TypeError:
            raise TypeError({
                "error_msg": "Serialization of this type is not supported yet",
                "key": key,
                "value": value,
                "type": type(value)
            })

    # add support for BaseModel
    if isinstance(record, BaseModel):
        return mongo_json_encoder(record.dict(by_alias=True))
    elif isinstance(record, dict):
        for key, value in record.items():
            record[key] = convert_type(value)
        return record
    else:
        return list(map(mongo_json_encoder, record))


def mongo_json_encoder_decorator(func):
    """
    A decorator that passes the documents returned by the wrapped function
    through mongo_json_encoder.

    Parameters
    ----------
    func: a function returning MongoDB documents.

    Returns
    -------
    The wrapped function, whose result is converted to JSON-serializable data.
    """
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        return mongo_json_encoder(res)
    return wrapper

and the test script passes as follows:

# -*- coding: utf-8 -*-
# -----------------------------------
# @CreateTime   : 2020/7/25 0:47
# @Author       : Mark Shawn
# @Email        : shawninjuly@gmail.com
# ------------------------------------

import uuid
from uuid import UUID
from bson import ObjectId
from typing import  List, Union
from pydantic import BaseModel, Field
from utils.json import mongo_json_encoder


class FriendBase(BaseModel):
    class Config:
        arbitrary_types_allowed = True
        allow_population_by_field_name = True

    id: Union[str, UUID, ObjectId] = Field(alias='_id')
    name: str


class Friend(FriendBase):
    friends: List[FriendBase] = []


f_1 = Friend(id='test', name='test')
f_2 = Friend(id=uuid.uuid1(), name='test', friends=[f_1])
f_3 = Friend(id=ObjectId(), name='test', friends=[f_1, f_2])

i_1 = f_1.dict(by_alias=True)
i_2 = f_2.dict(by_alias=True)
i_3 = f_3.dict(by_alias=True)

j_1 = mongo_json_encoder(i_1.copy())
j_2 = mongo_json_encoder(i_2.copy())
j_3 = mongo_json_encoder(i_3.copy())
j_all = [f_1, f_2, f_3]

assert i_1 == j_1
assert i_2 == j_2, "this should not pass"
assert i_3 == j_3, "this should not pass"

It just runs well!

@raedkit

raedkit commented Oct 12, 2020

I hope @tiangolo will adapt FastAPI to have less boilerplate code when using MongoDB. This would be fantastic.

@art049

art049 commented Nov 10, 2020

I recently wrote ODMantic to ease the integration of FastAPI/Pydantic and MongoDB.
Basically, it completely bundles all the boilerplate code required to work with Mongo, and you can still perform raw Mongo queries on top of the ones provided by the ODM engine.

There is a FastAPI example in the documentation if you wanna have a look 😃

@mdgilene

@art049 Hey, this looks very promising as an all-in-one solution to the problems discussed in this thread. It would be great to get some buy-in from the major players in the Python world and see the project grow more mature. I'm always hesitant to pull relatively new libraries (it looks like your project is ~6-7 months old) into production code until they are proven to be relatively mature and well maintained. Either way, this does look like it addresses pretty much all of the issues people have brought up. Looking forward to how this progresses.

@philmade

Dumb question

Doesn't this problem go away if you just allow your Mongo engine to auto-place _id on your models, and then use that instead of id?

@philmade

philmade commented Nov 25, 2020

This was my solution - just define a new 'out' schema with 'id' on it, then set it to '_id' from the object which comes out of the database on a query. It allowed me to use a standard response model.

class UserBase(BaseModel):
    id: Optional[PyObjectId] = Field(alias='_id')
    username: str

class UserOut(UserBase):
    id: Optional[PyObjectId]

@core.get('/user', response_model=users.UserOut)
async def userfake() -> users.UserFull:
    user = UserBase()
    result = await mdb.users.insert_one(user.dict())
    in_db = await mdb.users.find_one({'_id': result.inserted_id})
    out_db = users.UserOut(**in_db)
    out_db.id = in_db['_id']
    return out_db

@philmade

philmade commented Nov 26, 2020

I actually improved on that slightly so it kinda 'just works'.

I've created a MongoBase and a MongoOut schema, which you could subclass for all other outward data. This way, our alias allows us to write to _id on the way in, and our MongoOut schema reworks the data on the way back out. The MongoOut class should always be the class you inherit from first - it won't work the other way. It eliminates the need for those messy lines above.

class MongoBase(BaseModel):
    id: Optional[PyObjectId] = Field(alias='_id')

    class Config(BaseConfig):
        orm_mode = True
        allow_population_by_field_name = True
        json_encoders = {
            datetime: lambda dt: dt.isoformat(),
            ObjectId: lambda oid: str(oid),
        }


class MongoOut(MongoBase):
    id: Optional[PyObjectId]

    def __init__(self, **pydict):
        super().__init__(**pydict)
        self.id = pydict.pop('_id')

class UserOut(MongoOut, UserBase):
    pass

@core.get('/user', response_model=users.UserOut)
async def userfake():
    user = fake_user()
    result = await mdb.users.insert_one(user.dict())
    in_db = await mdb.users.find_one({'_id': result.inserted_id})
    return in_db

@philmade

philmade commented Nov 26, 2020

Sorry to add more, hopefully this is useful. The hackiest but simplest solution I've found is below - you don't actually need the alias when using the motor engine. Motor automatically adds an ObjectId to every object if it's not there, so you can actually drop the MongoOut and have one simple MongoBase which populates id with _id at initialisation:

class MongoBase(BaseModel):
    id: Optional[PyObjectId]

    class Config(BaseConfig):
        orm_mode = True
        allow_population_by_field_name = True
        json_encoders = {
            datetime: datetime.isoformat,
            ObjectId: str
        }

    def __init__(self, **pydict):
        super().__init__(**pydict)
        self.id = pydict.get('_id')

class UserBase(MongoBase):
    username: str
    email: str = None
    first_name: str = None
    last_name: str = None

@core.get('/user', response_model=users.UserBase)
async def userfake():
    user = fake_user()
    result = await mdb.users.insert_one(user.dict())
    in_db = await mdb.users.find_one({'_id': result.inserted_id})
    return in_db

The downside (is it a downside?) is that in the DB there's a redundant 'id' which isn't being used. Below is what in_db looks like before it's put back into UserBase(MongoBase). However, in_db['_id'] is equal to the out_db.id object, and the Swagger docs are all correct...

{'_id': ObjectId('5fb9f4c00d1263cc1555d197'), 'id': None, 'username': 'Denise Garcia', }

@leonh

leonh commented Nov 26, 2020

@NomeChomsky I use a mixin that works in a similar way.

class DBModelMixin(BaseModel):
    id: Optional[ObjectIdStr] = Field(..., alias="_id")

    class Config:
        json_loads = json_util.loads
        json_dumps = json_util.dumps
        allow_population_by_field_name = True
        json_encoders = {ObjectId: lambda x: str(x)}

with pydantic classes like this...

class Item(BaseModel):
    name: str = Field(..., max_length=250)

class ItemDB(Item, DBModelMixin):
    pass

@tuxnani

tuxnani commented Dec 31, 2020

How do you paginate a MongoDB cursor object in FastAPI?

@Kilo59

Kilo59 commented Dec 31, 2020

How do you paginate a MongoDB cursor object in FastAPI?
https://pymongo.readthedocs.io/en/stable/api/pymongo/cursor.html#pymongo.cursor.Cursor
https://motor.readthedocs.io/en/stable/api-asyncio/cursors.html#motor.motor_asyncio.AsyncIOMotorCursor

Both regular pymongo and motor cursors have next() implemented, so you can iterate over the items one at a time.
You could also use skip & limit along with .to_list()/list() to "paginate" over chunks of documents if you are so inclined.
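A minimal sketch of the skip/limit approach with motor, assuming db is an AsyncIOMotorDatabase and an existing FastAPI app (the collection name and page size are made up):

@app.get("/items")
async def list_items(page: int = 1, page_size: int = 20):
    cursor = db.items.find().skip((page - 1) * page_size).limit(page_size)
    docs = await cursor.to_list(length=page_size)
    # stringify _id so the default JSON encoding doesn't choke on ObjectId
    return [{**doc, "_id": str(doc["_id"])} for doc in docs]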

@mghayour

mghayour commented Feb 21, 2021

Hi guys,
I was getting this error while returning something like this as an API result:

{
   "_id": ObjectId("6031523be7ff2bb4e5294211"),
   "name": "mahdi"
}

Error:

File "/home/mahdi/.local/lib/python3.7/site-packages/fastapi/encoders.py", line 143, in jsonable_encoder
    raise ValueError(errors)
ValueError: [TypeError("'ObjectId' object is not iterable"), TypeError('vars() argument must have __dict__ attribute')]

I browsed other solutions, but they need changes across the entire project and its APIs.
So, this is my solution:

# fix ObjectId & FastApi conflict
import pydantic
from bson.objectid import ObjectId
pydantic.json.ENCODERS_BY_TYPE[ObjectId]=str

This works fine for me to serialize ObjectId with native FastAPI methods.
Hope it helps!
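For illustration, once that registration has run at import time, routes can return raw documents directly; a small sketch assuming an existing app and a synchronous pymongo db handle with a users collection (all names are made up):

from bson.objectid import ObjectId


@app.get("/users/{user_id}")
def get_user(user_id: str):
    doc = db.users.find_one({"_id": ObjectId(user_id)})
    # jsonable_encoder now knows how to render ObjectId as str
    return doc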

@mghayour

mghayour commented Feb 21, 2021

And this is my user model, supporting Pydantic and MongoDB.
This code also fixes all of the issues mentioned in the main question.

from pydantic import BaseModel
import struct
import pydantic
from bson.objectid import ObjectId

class BeeObjectId(ObjectId):
    # fix for FastApi/docs
    __origin__ = pydantic.typing.Literal
    __args__ = (str, )

    @property
    def timestamp(self):
        timestamp = struct.unpack(">I", self.binary[0:4])[0]
        return timestamp

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v):
        if not isinstance(v, ObjectId):
            raise ValueError("Not a valid ObjectId")
        return v

# fix ObjectId & FastApi conflict
pydantic.json.ENCODERS_BY_TYPE[ObjectId]=str
pydantic.json.ENCODERS_BY_TYPE[BeeObjectId]=str


class User(BaseModel):
    id: BeeObjectId
    name: str
    class Config:
        fields = {'id': '_id'}

@avico78

avico78 commented Mar 8, 2021

And this is my user model, to support pydantic and mongodb,
These code also fix all of issues that mentioned in main question.

from pydantic import BaseModel
import struct
import pydantic
from bson.objectid import ObjectId

class BeeObjectId(ObjectId):
    # fix for FastApi/docs
    __origin__ = pydantic.typing.Literal
    __args__ = (str, )

    @property
    def timestamp(self):
        timestamp = struct.unpack(">I", self.binary[0:4])[0]
        return timestamp

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v):
        if not isinstance(v, ObjectId):
            raise ValueError("Not a valid ObjectId")
        return v

# fix ObjectId & FastApi conflict
pydantic.json.ENCODERS_BY_TYPE[ObjectId]=str
pydantic.json.ENCODERS_BY_TYPE[BeeObjectId]=str


class User(BaseModel):
    id: BeeObjectId
    name: str
    class Config:
        fields = {'id': '_id'}

Having an issue with FastAPI and Mongo, especially when it comes to date formats configured in the BaseModel.
Not sure if what you did solves it, but can you provide an example of its use?

@avico78

avico78 commented Mar 8, 2021

Is there a way to work with formatted dates with FastAPI/Mongo?
I'm breaking my head trying to do a simple assignment on a Pydantic class,
but Mongo doesn't accept datetime.date...

any suggestion?

class CustomerBase(BaseModel):
    birthdate: date = None

Mongo connection:

from motor.motor_asyncio import AsyncIOMotorClient

DB = DB_CLIENT[CONF.get("databases", dict())["mongo"]["NAME"]]

For input:

{ "birthdate": "2021-03-05"}

Routing:

@customers_router.post("/", response_model=dict)
async def add_customer(customer: CustomerBase):
    print(customer.dict())

>> {'birthdate': datetime.date(2021, 3, 5)}

    await DB.customer.insert_one(customer.dict())
    return {"test":1}

>> 
 File "./customers/routes.py", line 74, in add_customer
    await DB.customer.insert_one(customer.dict())
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.8/site-packages/pymongo/collection.py", line 698, in insert_one
    self._insert(document,
  File "/usr/local/lib/python3.8/site-packages/pymongo/collection.py", line 613, in _insert
    return self._insert_one(
  File "/usr/local/lib/python3.8/site-packages/pymongo/collection.py", line 602, in _insert_one
    self.__database.client._retryable_write(
  File "/usr/local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1498, in _retryable_write
    return self._retry_with_session(retryable, func, s, None)
  File "/usr/local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1384, in _retry_with_session
    return self._retry_internal(retryable, func, session, bulk)
  File "/usr/local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1416, in _retry_internal
    return func(session, sock_info, retryable)
  File "/usr/local/lib/python3.8/site-packages/pymongo/collection.py", line 590, in _insert_command
    result = sock_info.command(
  File "/usr/local/lib/python3.8/site-packages/pymongo/pool.py", line 699, in command
    self._raise_connection_failure(error)
  File "/usr/local/lib/python3.8/site-packages/pymongo/pool.py", line 683, in command
    return command(self, dbname, spec, slave_ok,
  File "/usr/local/lib/python3.8/site-packages/pymongo/network.py", line 120, in command
    request_id, msg, size, max_doc_size = message._op_msg(
  File "/usr/local/lib/python3.8/site-packages/pymongo/message.py", line 714, in _op_msg
    return _op_msg_uncompressed(
bson.errors.InvalidDocument: cannot encode object: datetime.date(2021, 3, 5), of type: <class 'datetime.date'>

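The underlying limitation here is in BSON rather than FastAPI: BSON has a datetime type but no date-only type, so pymongo/motor refuse to encode datetime.date. One possible workaround (a sketch only, reusing the CustomerBase, DB, and customers_router names from the snippets above) is to promote dates to datetimes just before inserting:

from datetime import date, datetime, time


def to_mongo_dict(customer: CustomerBase) -> dict:
    doc = customer.dict()
    # BSON can store datetime but not date, so promote date fields to midnight datetimes
    for key, value in doc.items():
        if isinstance(value, date) and not isinstance(value, datetime):
            doc[key] = datetime.combine(value, time.min)
    return doc


@customers_router.post("/", response_model=dict)
async def add_customer(customer: CustomerBase):
    await DB.customer.insert_one(to_mongo_dict(customer))
    return {"test": 1}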

@Kilo59

Kilo59 commented Apr 2, 2021

@mannawar
I'm not sure I fully understand your complete question, and this seems off-topic (it doesn't really have anything to do with Mongo).

But when it comes to how to model your data, I would suggest a single generic audio or item model, with song, podcast, and audiobook as enum types.

import enum
from datetime import datetime
from typing import Annotated
from uuid import uuid4

from pydantic import BaseModel, Field

class AudioType(str, enum.Enum):
  song = "song"
  podcast = "podcast"
  audiobook = "audiobook"

class Audio(BaseModel):
    id: Annotated[str, Field(default_factory=lambda: uuid4().hex)]
    name: str = Field(..., exclusiveMaximum=10)
    duration: int = Field(...)
    uploaded_time: datetime = Field(...)
    type: AudioType = Field(...)

If the models need to be significantly different, you could also use Union[Model1, Model2, Model3] and do an isinstance check.

@RamiAwar

Hey guys, I created a small ORM that would solve this issue. It's based on Pydantic and wraps PyMongo, and it's nothing fancy yet, just something that I use in a small production environment.

It's still in development, but hopefully the core API is stable enough now. It lacks some features like index creation, fancy query operations, etc., but it gets the job done better than what's already in FastAPI Contrib and some other wrappers that I found.

Docs (kind of): https://ramiawar.github.io/Mongomantic
Repo: https://github.com/RamiAwar/Mongomantic

The inspiration for this was ditching mongoengine, which mongomantic is heavily inspired by. Using Pydantic and Mongoengine requires the definition of two schemas, one being a Pydantic model and the other a Mongoengine model.

Mongomantic would solve this problem by relying solely on one Pydantic model.

Feel free to submit any issues!

@yoshiya0503

yoshiya0503 commented Jun 2, 2021

If you want to use models the "SQLAlchemy way" based on pymongo + pydantic, you can do it as follows.

It looks extremely simple.

from typing import Union
from abc import abstractmethod, ABCMeta
from datetime import datetime
from pydantic import BaseModel, Field
from bson.objectid import ObjectId
from motor.motor_asyncio import AsyncIOMotorClient
from config import settings

client = AsyncIOMotorClient(settings.MONGO_URL)
db = client[settings.MONGO_DB]

class Collection(metaclass=ABCMeta):
    __collection__ = None

    @classmethod
    @property
    def collection(cls):
        if cls.__collection__ is None:
            raise ValueError('collection name is invalid')
        return cls.__collection__

    @abstractmethod
    def document(self) -> dict:
        raise NotImplementedError('you have to define document()')

    @classmethod
    async def find(cls) -> list:
        return await db[cls.collection].find().to_list(None)

    @classmethod
    async def get(cls, query: dict) -> dict:
        return await db[cls.collection].find_one(query)

    async def create(self) -> dict:
        await db[self.collection].insert_one(self.document())
        return self.document()

    async def update(self, id: ObjectId, update: dict = {}) -> dict:
        update = self.document()
        del update['_id'], update['updated_at']
        await db[self.collection].update_one({'_id': id}, {'$set': update})
        return await self.get({'_id': id})

    async def delete(self, id: ObjectId) -> None:
        return await db[self.collection].delete_one({'_id': id})


class Model(BaseModel):
    id: ObjectId = Field(default_factory=ObjectId, alias="_id")
    created_at: datetime = datetime.now()
    updated_at: datetime = datetime.now()

    def document(self) -> dict:
        return self.dict(by_alias=True)

    @classmethod
    def model(cls, document: Union[dict, list, None]):
        if not document:
            return document
        if type(document) is dict:
            return cls(**dict(document))
        if type(document) is list:
            return [cls(**dict(doc)) for doc in document]

    class Config:
        allow_population_by_field_name = True
        arbitrary_types_allowed = True
        json_encoders = {
            ObjectId: str,
            datetime: lambda dt: dt.isoformat()
        }


class Document(Model, Collection):
    pass
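For illustration, a hypothetical concrete document and a pair of routes built on the classes above, assuming an existing FastAPI app (the collection name and fields are made up):

class User(Document):
    __collection__ = "users"
    name: str


@app.post("/users")
async def create_user(user: User):
    return User.model(await user.create())


@app.get("/users/{user_id}")
async def get_user(user_id: str):
    return User.model(await User.get({"_id": ObjectId(user_id)}))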

@mclate
Author

mclate commented Jun 2, 2021

Motor and other async Mongo libs use pymongo under the hood, which is not async in any way. Apparently, they all use a ThreadPoolExecutor to make pymongo calls in a separate thread (I wish I were wrong on this one). Thus, here is a simple (yet not full) solution for how to make MongoEngine look like an async one:

import asyncio
import concurrent.futures
import functools

import mongoengine

executor = concurrent.futures.ThreadPoolExecutor()


def aio(f):
    @functools.wraps(f)
    async def aio_wrapper(*args, **kwargs):
        f_bound = functools.partial(f, *args, **kwargs)
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(executor, f_bound)

    return aio_wrapper


class AsyncQuerySet(mongoengine.QuerySet):
    _get = mongoengine.QuerySet.get
    get = aio(mongoengine.QuerySet.get)

    _count = mongoengine.QuerySet.count
    count = aio(mongoengine.QuerySet.count)

    _first = mongoengine.QuerySet.first
    first = aio(mongoengine.QuerySet.first)


class Document(mongoengine.Document):
    meta = {
        'abstract': True,
        'queryset_class': AsyncQuerySet,
    }

    _save = mongoengine.Document.save
    save = aio(mongoengine.Document.save)

    _update = mongoengine.Document.update
    update = aio(mongoengine.Document.update)

    _modify = mongoengine.Document.modify
    modify = aio(mongoengine.Document.modify)

    _delete = mongoengine.Document.delete
    delete = aio(mongoengine.Document.delete)

class MyDoc(Document):
    ....

await MyDoc.objects(id='...').first()  # async version
# or 
MyDoc.objects(id='...')._first()  # sync version

Notice that this doesn't cover all the cases yet. In particular, things like for i in MyDoc.objects(): are still synchronous calls. If anyone can figure out how to make QuerySet.__iter__ work asynchronously, that would be very nice.
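For the iteration case, one possible (untested) sketch that pushes each next() call through the same executor so the event loop isn't blocked:

_SENTINEL = object()


async def aiterate(queryset):
    """Asynchronously iterate a synchronous QuerySet, one document at a time."""
    loop = asyncio.get_running_loop()
    iterator = iter(queryset)
    while True:
        # next(iterator, _SENTINEL) avoids leaking StopIteration out of the executor
        doc = await loop.run_in_executor(executor, lambda: next(iterator, _SENTINEL))
        if doc is _SENTINEL:
            break
        yield doc


# async for doc in aiterate(MyDoc.objects()):
#     ...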

@aknoerig

Having tried a couple of alternatives myself, I have now settled for the ODM library Beanie. It's very much in line with FastAPI philosophies (async, pydantic, etc), well documented and actively maintained.

@mickdewald

Having tried a couple of alternatives myself, I have now settled for the ODM library Beanie. It's very much in line with FastAPI philosophies (async, pydantic, etc), well documented and actively maintained.

Can you compare it to ODMantic? I have settled on ODMantic, but I don't like that the project is rather inactive (the latest release is relatively old) and that it does not work with the latest motor version.

@aknoerig

Having tried a couple of alternatives myself, I have now settled for the ODM library Beanie. It's very much in line with FastAPI philosophies (async, pydantic, etc), well documented and actively maintained.

Can you compare it to ODMantic? I have settled with ODMantic, but I don't like, that the project is rather inactive (the latest release is relatively old) and it does not work with the latest motor version.

I haven't done a thorough analysis, but from what I can tell, the two are very similar in how they work internally. I also tried ODMantic, but indeed the fact that there has been zero recent activity steered me off. I also prefer the API of Beanie, as it allows you to do the database interactions right from the document model (e.g., MyType.find() and my_instance.save() vs. the engine-API of ODMantic). Finally, Beanie seems to be more complete when it comes to covering MongoDB features.

@vaizki

vaizki commented Feb 14, 2022

Having tried a couple of alternatives myself, I have now settled for the ODM library Beanie. It's very much in line with FastAPI philosophies (async, pydantic, etc), well documented and actively maintained.

How did you solve the _id vs id issue in Beanie + FastAPI? I have tried Field(..., alias='_id') with response_model_by_alias=False on the route decorators, and it seems like a thousand other combinations, but when I get one operation working another one breaks. I need to use id everywhere in the code and the APIs; the only place where _id should be present is in the Mongo documents.

@aknoerig

How did you solve the _id vs id in Beanie + FastAPI?

I think what you want is already the default behaviour for Beanie. All you need to do is to inherit your model from beanie.Document and it will automatically have an id field for use in Python -- no need to define your own.

When converting the model to json (for storing in MongoDB, or for your JavaScript frontend) it gets automatically converted to _id, as you would expect. Same for the other way around: the json _id gets converted back into the python id field.
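A minimal sketch of that default behaviour, based on Beanie's documented Document / init_beanie API (the model, field, and database names are made up):

from beanie import Document, init_beanie
from motor.motor_asyncio import AsyncIOMotorClient


class Todo(Document):
    text: str  # `id` comes from beanie.Document and maps to Mongo's `_id`


async def demo():
    client = AsyncIOMotorClient("mongodb://localhost:27017")
    await init_beanie(database=client.example_db, document_models=[Todo])

    todo = await Todo(text="write docs").insert()  # stored with `_id` in Mongo
    fetched = await Todo.get(todo.id)              # but accessed as `.id` in Python
    assert fetched.id == todo.id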

@fleshgakker

fleshgakker commented Mar 18, 2022

I don't like the idea of JSON-ing FastAPI models, so I came up with the following Pydantic validator:

from pydantic import BaseModel, root_validator

# MongoObjectId is assumed to be a custom ObjectId field type, like the ones defined earlier in this thread

def _mongo_id_mutator(cls, values) -> dict:
    if '_id' in values:
        values['id'] = values['_id']
        del values['_id']
    return values

def mongo_id_mutator() -> classmethod:
    decorator = root_validator(pre=True, allow_reuse=True)
    validation = decorator(_mongo_id_mutator)
    return validation

class ProductId(BaseModel):
    id: MongoObjectId
    _id_validator: classmethod = mongo_id_mutator()

@codespresso

How did you solve the _id vs id in Beanie + FastAPI?

I think what you want is already the default behaviour for Beanie. All you need to do is to inherit your model from beanie.Document and it will automatically have an id field for use in Python -- no need to define your own.

When converting the model to json (for storing in MongoDB, or for your JavaScript frontend) it gets automatically converted to _id, as you would expect. Same for the other way around: the json _id gets converted back into the python id field.

Beanie is a great library, but it suffers from the same issue: while returning the document instance, it returns "_id" in the JSON response while we expect "id" to be returned. Here is how I solved the problem, taking tips from all the issues.

Use a separate response model and set "id" explicitly in __init__ to avoid using the dict method:

class InvestorResponse(Investor):
    """
    Investor Response Model
    """
    class Config:
        fields = {'id': 'id'}

    def __init__(self, **pydict):
        super(InvestorResponse, self).__init__(**pydict)
        self.id = pydict.get('_id')

@LrsK

LrsK commented Jul 5, 2022

How did you solve the _id vs id in Beanie + FastAPI?

I think what you want is already the default behaviour for Beanie. All you need to do is to inherit your model from beanie.Document and it will automatically have an id field for use in Python -- no need to define your own.
When converting the model to json (for storing in MongoDB, or for your JavaScript frontend) it gets automatically converted to _id, as you would expect. Same for the other way around: the json _id gets converted back into the python id field.

Beanie is a great library but it suffers from the same issue. While returning the document instance it returns "_id" in the json response while we expect "id" to be returned. Here is how I solved the problem taking tips from all the issues

Use separate Response model and set "id" explicitly in "init" method to avoid using dict method

class InvestorResponse(Investor):
    """
    Investor Response Model
    """
    class Config:
        fields = {'id': 'id'}

    def __init__(self, **pydict):
        super(InvestorResponse, self).__init__(**pydict)
        self.id = pydict.get('_id')

While using Beanie, you can get the correct "id" field instead of "_id" by setting response_model_by_alias=False on the FastAPI route.

@app.get("/todos", response_model=list[Todo], response_model_by_alias=False)
async def get_todos():
    ...

@AndreMPCosta

How did you solve the _id vs id in Beanie + FastAPI?

I think what you want is already the default behaviour for Beanie. All you need to do is to inherit your model from beanie.Document and it will automatically have an id field for use in Python -- no need to define your own.
When converting the model to json (for storing in MongoDB, or for your JavaScript frontend) it gets automatically converted to _id, as you would expect. Same for the other way around: the json _id gets converted back into the python id field.

Beanie is a great library but it suffers from the same issue. While returning the document instance it returns "_id" in the json response while we expect "id" to be returned. Here is how I solved the problem taking tips from all the issues
Use separate Response model and set "id" explicitly in "init" method to avoid using dict method

class InvestorResponse(Investor):
    """
    Investor Response Model
    """
    class Config:
        fields = {'id': 'id'}

    def __init__(self, **pydict):
        super(InvestorResponse, self).__init__(**pydict)
        self.id = pydict.get('_id')

While using beanie, you can get the correct "id"-field instead of "_id", by unsetting the response_model_by_alias field in the FastAPI-route.

@app.get("/todos", response_model=list[Todo], response_model_by_alias=False)
async def get_todos():
    ...

Simple and effective, thanks!

@Pablongo24

Pablongo24 commented Oct 24, 2022

Sorry to add more, hopefully this is useful. The hackiest but simplest solution I've found is below - you don't actually need the alias when using motor engine. Motor automatically adds ObjectID to every object if its not there, so you can actually drop the MongoOut and have one simple MongoBase which populates id with _id at initialisation :

class MongoBase(BaseModel):
    id: Optional[PyObjectId]

    class Config(BaseConfig):
        orm_mode = True
        allow_population_by_field_name = True
        json_encoders = {
            datetime: datetime.isoformat,
            ObjectId: str
        }

    def __init__(self, **pydict):
        super().__init__(**pydict)
        self.id = pydict.get('_id')

class UserBase(MongoBase):
    username: str
    email: str = None
    first_name: str = None
    last_name: str = None

@core.get('/user', response_model=users.UserBase)
async def userfake():
    user = fake_user()
    result = await mdb.users.insert_one(user.dict())
    in_db = await mdb.users.find_one({'_id': result.inserted_id})
    return in_db

The downside (is it a downside?) is that in the DB there's a redundant 'id' which isn't being used. Below is what in_db looks like before its put back into UserBase(MongoBase). However, in_db['_id'] is equal to the out_db.id object, and the swagger docs are all correct....

{'_id': ObjectId('5fb9f4c00d1263cc1555d197'), 'id': None, 'username': 'Denise Garcia', }

You can improve on this solution by just adding a conditional when you initialize your MongoBase. I ended up going in this direction, where I let Mongo handle all "_id" at document creation time (i.e. I don't pass any kind of ID on insert_one or insert_many). For each collection, I have a BaseModel and a ResponseModel (e.g. BaseUserModel and ResponseUserModel). In the Response Models, I handle the conversion of "_id" to "id".

The sample below is almost fully working. It only needs a Mongo client connector object, which is what I import on line 8: from app.services.mongo_db_client import client

from datetime import datetime
from typing import Union

from bson import ObjectId
from pydantic import BaseModel, EmailStr
from fastapi import APIRouter, Depends, HTTPException, status

from app.services.mongo_db_client import client

router = APIRouter(prefix='/users')


# The 3 classes below would be in your `models.py`, or modules inside your `models` subpackage
class BaseMongoModel(BaseModel):

    def __init__(self, **data: dict):
        data = self._reformat_mongo_id_key(data)
        super(BaseMongoModel, self).__init__(**data)

    @staticmethod
    def _reformat_mongo_id_key(data):
        if not data:
            return data
        if '_id' in data and 'id' not in data:
            data['id'] = data.pop('_id', None)
        return data


class BaseUserModel(BaseMongoModel):
    first_name: str
    last_name: str
    email: EmailStr

    class Config:
        arbitrary_types_allowed = True  # `id` below is typed as ObjectId
        json_encoders = {ObjectId: str, datetime: str}


class ResponseUserModel(BaseUserModel):
    id: ObjectId
    date_created: datetime


# These classes would be in `services.py`, or in modules inside your `services` subpackage
class CrudBase:
    def __init__(self, db_name: str, collection_name: str) -> None:
        self.client = client.get_client()
        self.db = self.client[db_name]
        self.collection = self.db[collection_name]
        self._id_field = '_id'
        self._base_date_fields = ['date_created']

    async def create(self, data: dict) -> dict:
        data = self._assign_date_fields(data)
        # insert_one mutates `data` in place, adding the generated `_id`
        await self.collection.insert_one(data)
        return data

    async def fetch_by_id(self, item_id: Union[ObjectId, str]) -> dict:
        item = await self.collection.find_one({self._id_field: ObjectId(item_id)})
        return item

    def _assign_date_fields(self, data: dict) -> dict:
        utc_now = datetime.utcnow()
        data.update({date_field: utc_now for date_field in self._base_date_fields})
        return data

    # Additional Crud operations would go in this class, e.g. insert_many, delete, update_one, update_many, etc...


class CrudUser(CrudBase):
    # Inherits all from CrudBase
    pass


# This class would be in your dependencies.py. Checks that the record exists in the db. if not, raises HTTPException.
class IdValidators:
    def __init__(self, crud_service: CrudUser):
        self.crud_service = crud_service

    async def validate_id(self, item_id: str) -> dict:
        item = await self.crud_service.fetch_by_id(item_id)
        if item is None:
            raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="<Your Error Message Here>")
        return item


# The functions below would be in your routers.py
crud = CrudUser(db_name='<Your DB Name>', collection_name='<Your Collection Name>')
validator = IdValidators(crud_service=crud)


@router.get("/{item_id}", response_model=ResponseUserModel)
async def get_user_by_id(user: dict = Depends(validator.validate_id)) -> dict:
    return user


@router.post("/", response_model=ResponseUserModel, status_code=status.HTTP_201_CREATED)
async def create_user(user: BaseUserModel) -> dict:
    user = user.dict()
    user = await crud.create(user)
    return user

The solution above covers all 6 points addressed in the original post:

  1. Be able to define mongo-compatible documents as regular Pydantic models (with all the proper validations in place).
  2. Write routes that would use native Pydantic models as usual
  3. Have api to return json like {"id": "5ed8b7eaccda20c1d4e95bb0", "name": "Joe"} (it's quite expected in the "outer world" to have id field for the document rather than _id. And it just looks nicer.)
  4. Have Swagger and ReDoc documentation to display fields id (str), name (str)
  5. Be able to save Pydantic documents into Mongo with proper id field substitution.
  6. Should be able to fetch documents from Mongo with proper id matching
  • I achieve 1. with only a bit of extra code (have to define separate BaseModels and ResponseModels for each Model type)
  • For 2. my routes use Pydantic models, plus I can also use Dependencies to handle database validation - i.e. IdValidator class
  • For 3. I return JSON with "id", which I agree is much better to return for typical client "outer world" expectations
  • For 4. Works with swagger and ReDoc
  • For 5. I don't need to worry about ID substitution at creation time. I let Mongo handle it.
  • For 6. Done, plus added flexibility of handling IDs as strings (if HTTP request) or ObjectId (internal data handling)

In addition:

  • I do let FastAPI handle all response_model validation. The original post uses a rather cumbersome .from_mongo() class method, which is not only annoying (no offense meant) since you have to call it on every endpoint return, but also kind of defeats one of the main purposes of using FastAPI (integrated response validation)
  • I don't end up with redundant "id" and "_id" fields in my Mongo collections
  • While there is still duality of models using 'id' while Mongo queries use "_id", I define this ONCE in the CrudBase attribute self._id_field = '_id', and then all queries use this attribute, so I don't need to remember to differentiate between "id" and "_id" in my routers, plus I get all the benefits of code completion.

Unfortunately, the OP's 2nd point is very true: there is quite a bit of boilerplate needed to make Mongo work "nicely" with FastAPI. Even if you go with my design of letting MongoDB handle all ID creation and only using the API to handle responses, you still need 3 different classes to define a ResponseModel, with one class overriding Pydantic's BaseModel constructor. I agree that it is a high barrier to entry. While much of the material I found in different online forums (and Mongo's own blog) was quite helpful, it still took me a couple of days to figure out a good working solution for my use case.

@nyxgear

nyxgear commented Dec 20, 2022

To solve the problem, I would point to the very convenient pydantic-mongo package:
https://github.com/jefersondaniel/pydantic-mongo

It implements the management of ObjectId as described above, as well as abstractions to upsert, query, and delete entities in MongoDB collections.

From the project's readme:

import os
from typing import List

from bson import ObjectId
from pydantic import BaseModel
from pydantic_mongo import AbstractRepository, ObjectIdField
from pymongo import MongoClient

# Foo and Bar are example models defined elsewhere in the project's readme

class Spam(BaseModel):
    id: ObjectIdField = None
    foo: Foo
    bars: List[Bar]

    class Config:
        # The ObjectIdField creates a bson ObjectId value, so it's necessary to set up the json encoding
        json_encoders = {ObjectId: str}

class SpamRepository(AbstractRepository[Spam]):
    class Meta:
        collection_name = 'spams'

client = MongoClient(os.environ["MONGODB_URL"])
database = client[os.environ["MONGODB_DATABASE"]]

spam = Spam(foo=Foo(count=1, size=1.0),bars=[Bar()])

spam_repository = SpamRepository(database=database)

# Insert / Update
spam_repository.save(spam)

# Delete
spam_repository.delete(spam)

# Find One By Id
result = spam_repository.find_one_by_id(spam.id)

@tiangolo tiangolo changed the title [QUESTION] FastApi & MongoDB - the full guide FastApi & MongoDB - the full guide Feb 24, 2023
Repository owner locked and limited conversation to collaborators Feb 28, 2023
@tiangolo tiangolo converted this issue into discussion #9074 Feb 28, 2023

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
