Pydantic vs TypedDict experiment

TLDR at the bottom.

I am working on a project that interacts with an external HTTP API. The API accepts JSON input with certain keys. To simplify the request object, let's take the following example:

{
    "menge": int,
    "stuck_preis": float,
    "farbe": string
}

To handle the creation of the request body there is a pydantic class, something like:

from pydantic import BaseModel

class ExternalAPIRequest(BaseModel):
    menge: int
    stuck_preis: float
    farbe: str


request = ExternalAPIRequest(
    menge=50,
    stuck_preis=10,
    farbe="rot"
)

As you can see, the keys are not in English. My code base is in English and I don't like to mix languages, so I have a mapping object such as:

from enum import Enum

class ExternalAPIKeys(Enum):
    QUANTITY = "menge"
    PRICE_PER_UNIT = "stuck_preis"
    COLOR = "farbe"

So far so good, but then I had a thought: what if the external API changes its keys to English? I would have to update the keys everywhere. I am not a big fan of renaming tasks like this, so could the renaming effort be reduced somehow?

Simple mapping object approach

The ExternalAPIRequest instances can be created with the help of the mapping object:

# Keyword unpacking needs string keys, hence .value on each enum member.
request = ExternalAPIRequest(**{
    ExternalAPIKeys.QUANTITY.value: 50,
    ExternalAPIKeys.PRICE_PER_UNIT.value: 10.5,
    ExternalAPIKeys.COLOR.value: "rot",
})

Much better: the keys are hard-coded in just two places, the key mapping and the pydantic model.
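To make the translation step explicit, here is a minimal sketch that uses only the standard library. The helper name to_payload is hypothetical and not part of the project; it simply swaps each enum member for the string key the API expects, so calling code can stay in English.

```python
from enum import Enum
from typing import Any, Dict


class ExternalAPIKeys(Enum):
    QUANTITY = "menge"
    PRICE_PER_UNIT = "stuck_preis"
    COLOR = "farbe"


def to_payload(values: Dict[ExternalAPIKeys, Any]) -> Dict[str, Any]:
    """Hypothetical helper: replace each enum member with the API's key."""
    return {key.value: value for key, value in values.items()}


payload = to_payload(
    {
        ExternalAPIKeys.QUANTITY: 50,
        ExternalAPIKeys.PRICE_PER_UNIT: 10.5,
        ExternalAPIKeys.COLOR: "rot",
    }
)
print(payload)
# Output: {'menge': 50, 'stuck_preis': 10.5, 'farbe': 'rot'}
```

The resulting dict can be unpacked into the model with **payload. If the API ever renames its keys, only the enum values need to change on the calling side.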

Metaclasses based on the pydantic model

Two places is still more than ideal. Let's define the pydantic class dynamically, then. It was at this point that I realized I would need metaclasses to achieve this, and that it was probably not a good idea. However, could this work? Would mypy complain? Now I had to dig deeper.

PARAMETERS_TYPE_MAPPING = {
    ExternalAPIKeys.QUANTITY: int,
    ExternalAPIKeys.PRICE_PER_UNIT: float,
    ExternalAPIKeys.COLOR: str
}


DynamicExternalAPIRequest = type(
    "DynamicExternalAPIRequest",
    (BaseModel,),
    {
        "__annotations__": {
            key.value: key_type for key, key_type in PARAMETERS_TYPE_MAPPING.items()
        }
    },
)

request = DynamicExternalAPIRequest(**{
    ExternalAPIKeys.QUANTITY.value: 50,
    ExternalAPIKeys.PRICE_PER_UNIT.value: 10.5,
    ExternalAPIKeys.COLOR.value: "rot",
})

Testing

Let's test this with a correct and a buggy instance:

request = DynamicExternalAPIRequest(
    **{
        ExternalAPIKeys.QUANTITY.value: 50,
        ExternalAPIKeys.PRICE_PER_UNIT.value: 10.5,
        ExternalAPIKeys.COLOR.value: "rot",
    }
)
print(request.json())

The test passes and the object is built correctly:

{
    "menge": 50, 
    "stuck_preis": 10.5, 
    "farbe": "rot"
}

Now let's check a buggy instance:

wrong_request = DynamicExternalAPIRequest(
    false_value=50, stuck_preis=25, farbe="rot"
)

This second object raises an exception:

  File "/pydantic_base.py", line 104, in third_test
    wrong_request = DynamicExternalAPIRequest(
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for DynamicExternalAPIRequest
menge
  field required (type=value_error.missing)

Which is great, exactly what we want.

Mypy Test

Mypy raises the following errors:

pydantic_base.py:51: error: Unexpected keyword argument "id" for "Foo"  [call-arg]
pydantic_base.py:53: error: Missing named argument "other" for "Foo"  [call-arg]

Mypy only complains about the statically declared class. As I suspected, it did not complain about the dynamically generated class.

This is not good for my goals. Since the project already runs mypy as part of the CI pipeline, it would be great if it caught the errors. It is far better to fail in the CI pipeline than at runtime.

Using TypedDict

I was interested in mypy catching the errors. TypedDict might therefore be a better fit, since mypy does catch errors when the objects are defined statically (i.e. not using metaclasses). Let's define a couple of new objects for the request, one static and one dynamic:

from enum import Enum
from typing import TypedDict


class ExternalAPIParams(TypedDict):
    menge: int
    farbe: str


class Parameters(Enum):
    quantity = "menge"
    color = "farbe"

PARAMETERS_TYPE_MAPPING = {
    Parameters.quantity: int,
    Parameters.color: str,
}

DynamicTypedDict = type(
    "DynamicTypedDict",
    (TypedDict,),
    {
        "__annotations__": {
            p.value: PARAMETERS_TYPE_MAPPING[p] for p in PARAMETERS_TYPE_MAPPING
        },
    },
)

Testing

Let's run some test cases:

# Correct ExternalAPIParams (static case)
correct_ex = ExternalAPIParams(menge=22, farbe="rot")
print(correct_ex)
# Output: {'menge': 22, 'farbe': 'rot'}

# Wrong ExternalAPIParams (static case)
ex = ExternalAPIParams(menge=22, color="rot")
print(ex)
# Output: {'menge': 22, 'color': 'rot'}

# Correct DynamicTypedDict (dynamic case)
correct_dc = DynamicTypedDict(
    **{Parameters.quantity.value: 1, Parameters.color.value: "rot"}
)
print(correct_dc)
# Output: {'menge': 1, 'farbe': 'rot'}

# Wrong DynamicTypedDict (dynamic case)
dc = DynamicTypedDict(id=22, color="rot")
print(dc)
# Output: {'id': 22, 'color': 'rot'}

There are no exceptions raised at all.
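If a runtime check is still wanted with TypedDict, one option is to compare the keys by hand. The following is only a sketch under that assumption: check_keys is a hypothetical helper, not something TypedDict or mypy provides, and it checks key names only, not value types.

```python
from typing import Any, Mapping, TypedDict, get_type_hints


class ExternalAPIParams(TypedDict):
    menge: int
    farbe: str


def check_keys(td_class: type, data: Mapping[str, Any]) -> None:
    """Hypothetical helper: raise if data's keys differ from the declared ones."""
    expected = set(get_type_hints(td_class))
    missing = expected - set(data)
    extra = set(data) - expected
    if missing or extra:
        raise KeyError(
            f"missing keys: {sorted(missing)}, extra keys: {sorted(extra)}"
        )


# The correct instance passes silently.
check_keys(ExternalAPIParams, ExternalAPIParams(menge=22, farbe="rot"))

# The buggy payload from the test above is now rejected at runtime.
try:
    check_keys(ExternalAPIParams, {"menge": 22, "color": "rot"})
except KeyError as err:
    print("rejected:", err)
```

This is essentially a small part of what pydantic already does, so it mostly illustrates how much TypedDict leaves to the static checker.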

Mypy Test

Mypy complains this time, although it only catches the bugs in the buggy statically defined object:

ex = ExternalAPIParams(menge=22, color="rot")

The errors read as follows:

type_dict.py:48: error: Missing key "farbe" for TypedDict "ExternalAPIParams"  [typeddict-item]
type_dict.py:48: error: Extra key "color" for TypedDict "ExternalAPIParams"  [typeddict-unknown-key]

But there are no complaints about the buggy dynamic case, so this approach isn't great either. It is nice that mypy catches the errors in the statically built TypedDict; after all, this is exactly what mypy was built for. But the fact that mypy does not catch the bug in the dynamically generated TypedDict speaks against its use.

Note on python version

While running the TypedDict experiments I noticed two different results depending on the Python version.

With Python 3.11, the interpreter raises an error when creating the dynamic class with type(). The error is:

TypeError: type() doesn't support MRO entry resolution; use types.new_class()
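The error message points at types.new_class(). Another route, sketched here under the assumption that the goal is only to build the class at runtime, is the functional syntax of TypedDict, which takes the class name and a dict of fields. As far as I know, mypy only understands this form when the dict is written out as a literal, so the static-checking limitation stays the same.

```python
from enum import Enum
from typing import TypedDict


class Parameters(Enum):
    quantity = "menge"
    color = "farbe"


PARAMETERS_TYPE_MAPPING = {
    Parameters.quantity: int,
    Parameters.color: str,
}

# Functional syntax: TypedDict(name, {key: type, ...}).
# The fields dict is built at runtime from the mapping.
DynamicTypedDict = TypedDict(
    "DynamicTypedDict",
    {p.value: p_type for p, p_type in PARAMETERS_TYPE_MAPPING.items()},
)

correct_dc = DynamicTypedDict(menge=1, farbe="rot")
print(correct_dc)
# Output: {'menge': 1, 'farbe': 'rot'}
```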

Learnings

  • Mypy is not able to catch errors when the models are defined dynamically.
  • Pydantic complains at runtime when a buggy object is built, even if the model was dynamically defined.
  • TypedDict does not complain at runtime when a buggy instance is created, no matter whether the model was statically or dynamically defined.

What should you use?

The decision depends on whether you want an error to be raised at runtime or not.

Should you do this?

By this I mean defining models dynamically and expecting the static type checker to raise errors. And my answer is: probably not.

Using metaclasses adds complexity to the code and, as we have seen with the TypedDict-based models, makes mypy less reliable.

I realized this was a bad idea while trying to explain to a colleague what I was aiming for. I put it this way: I am trying to dynamically define models to raise errors in the static type checker of a dynamically typed language.