P G Jones

My evolution writing JSON-REST APIs

2020-12-11

I've been writing JSON-REST APIs in Python for a number of years, and over that time I've found the tooling has greatly improved. To show how you can benefit, I'm going to walk through how my own approach has evolved. However, if you want to skip to the tooling I use today, take a look at Quart-Schema.

Writing JSON-REST APIs in Python using Flask/Quart has always been pleasant. Yet communicating the API structure (routes, data, etc.) to third parties and to my future self has not been. It is also difficult to guarantee that the API works as the documentation says it does. This has resulted in issues, bugs, and misuse in production that I'd love to have avoided.

I think I would have avoided these if the APIs I wrote,

  • ensured that any data received or sent matched the communicated structure,
  • communicated their structure using standardised documentation, specifically OpenAPI,
  • were set up such that linting checks informed me if I made any mistakes,
  • took less effort to write (whilst achieving these aims).

Initial Flask setup

This is a simple example of how I used to write Flask APIs,

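from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/todo/", methods=["POST"])
def post_todo():  # named so it does not shadow the create_todo helper
    data = request.get_json()
    # create_todo is the storage helper, assumed to be defined elsewhere.
    todo_id = create_todo(data["task"], data["due"])
    todo = {"id": todo_id, "task": data["task"], "due": data["due"]}
    return jsonify(todo), 201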

which works very nicely, but meets none of my aims. Both I and any third-party user would have to read the code to understand what is sent to and returned by the API.

It also gives no indication when the wrong data has been sent. For example, due should be a date, but if it isn't, the create_todo function will raise a ValueError, resulting in the API returning a 500 response. It should return a 400 response so that the third party and I can tell that the error lies with the data rather than with the server or the code.
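As a rough illustration of the 400-versus-500 distinction, here is a minimal sketch using only the standard library rather than Flask. The BadRequest exception and the parse_todo helper are hypothetical names introduced for this example, and the sketch assumes due arrives as an ISO 8601 date string. In a real route the exception would be turned into a 400 response, for example with Flask's abort(400).

```python
from datetime import date
from typing import Any, Dict


class BadRequest(Exception):
    """Hypothetical error that the route would turn into a 400 response."""

    status_code = 400


def parse_todo(data: Any) -> Dict[str, Any]:
    """Check a todo payload up front, raising BadRequest on bad data."""
    if not isinstance(data, dict):
        raise BadRequest("expected a JSON object")
    task = data.get("task")
    if not isinstance(task, str):
        raise BadRequest("task must be a string")
    try:
        # Assumption: due is sent as an ISO 8601 date, e.g. "2020-12-11".
        due = date.fromisoformat(data["due"])
    except (KeyError, TypeError, ValueError):
        raise BadRequest("due must be an ISO 8601 date") from None
    return {"task": task, "due": due}


try:
    parse_todo({"task": "Write docs", "due": "tomorrow"})
except BadRequest as error:
    print(error.status_code, error)
```

With the data checked before create_todo is called, anything that still fails with a 500 is far more likely to be a genuine server or programming issue.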

Flask with docstrings

In an attempt to solve the documentation problem with the simple Flask setup, I tried the Sphinx autohttp extensions,

@app.route("/todo/", methods=["POST"])
def post_todo():  # named so it does not shadow the create_todo helper
    """Create a TODO task.

    .. :quickref: Create a TODO.

    :reqheader Accept: application/json
    :<json string task: task description
    :<json string due: todo due date
    :>json int id: todo unique identifier
    :>json string task: task description
    :>json string due: todo due date
    :resheader Content-Type: application/json
    :status 201: todo created
    """
    data = request.get_json()
    todo_id = create_todo(data["task"], data["due"])
    todo = {"id": todo_id, "task": data["task"], "due": data["due"]}
    return jsonify(todo), 201

which enables documentation to be created by manually specifying the data structures in the docstring.

Whilst this helps with the documentation, it has the same issues as the simpler Flask example, in that it validates neither the data sent to the route nor the data returned. It also gives linters nothing to work with; for example, a simple typo like data["duo"] would only raise an error during testing or, worse, in production.
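The typo problem is easy to reproduce outside of Flask. In this small sketch (the payload and the read_due helper are made up for illustration) nothing complains until the offending line actually executes:

```python
from typing import Any, Dict

payload: Dict[str, Any] = {"task": "Write docs", "due": "2020-12-11"}


def read_due(data: Dict[str, Any]) -> str:
    # Typo: "duo" instead of "due". A type checker only knows this is a
    # dict with string keys, so it has no way to flag the mistake.
    return data["duo"]


try:
    read_due(payload)
except KeyError as error:
    # The mistake only surfaces once this line actually runs.
    missing_key = error.args[0]
    print(f"KeyError for key: {missing_key!r}")
```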

The bigger issue, though, is writing the docstring documentation: it is time-consuming, boring work that I (and my colleagues) would forget to do. Over time this led to the documentation being wrong, which in turn led to distrust and lack of use.

I eventually stopped using this technique and reverted to the simpler example, as this documentation ended up being actively misleading. It also became clear that the winning documentation standard was OpenAPI, rather than Sphinx documentation.

Aside: switch to Quart

It was around this time that I switched from writing APIs in Flask to writing them in Quart. The snippets from here onwards assume Quart, but the only difference is the use of the async/await keywords; you can remove them to get a Flask equivalent.

I wrote Quart as I wanted to start using the async/await keywords and libraries.

JSONSchema decorators

The data sent and received in the APIs I write is formatted as JSON. This means a JSONSchema could be used to describe the structure and used to validate that any data matches it. This allows me to write decorators that accept JSONSchemas to validate request and response data,

from functools import wraps
from typing import Any, Callable, Dict

import fastjsonschema
from quart import abort, make_response, request


def validate_request(schema: Dict[str, Any]) -> Callable:
    """This validates the request JSON.

    If there is no JSON in the body, or the JSON doesn't validate this
    will trigger a 400 response.
    """
    validator = fastjsonschema.compile(schema)

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            data = await request.get_json()
            try:
                validator(data)
            except fastjsonschema.JsonSchemaException:
                abort(400)
            else:
                return await func(*args, **kwargs)

        return wrapper

    return decorator

def validate_response(schemas: Dict[int, Dict[str, Any]]) -> Callable:
    """This validates the response JSON.

    The schemas are keyed by status code.
    """
    validators = {
        status: fastjsonschema.compile(schema)
        for status, schema in schemas.items()
    }

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            result = await func(*args, **kwargs)
            response = await make_response(result)
            if response.status_code in validators:
                validators[response.status_code](await response.get_json())

            return response

        return wrapper

    return decorator

These can then be used, for example,

@app.route("/todo/", methods=["POST"])
@validate_request(
    {
        "type": "object",
        "properties": {
            "due": {"type": "string", "format": "date"},
            "task": {"type": "string"},
        },
        "required": ["due", "task"],
        "additionalProperties": False,
    }
)
@validate_response(
    {
        201: {
            "type": "object",
            "properties": {
                "due": {"type": "string", "format": "date"},
                "id": {"type": "number"},
                "task": {"type": "string"},
            },
            "required": ["due", "id", "task"],
            "additionalProperties": False,
        },
    }
)
async def post_todo():  # named so it does not shadow the create_todo helper
    data = await request.get_json()
    todo_id = create_todo(data["task"], data["due"])
    todo = {"id": todo_id, "task": data["task"], "due": data["due"]}
    return todo, 201

The JSONSchemas can then be used as the documentation, either directly when developing or to create an OpenAPI specification of the API for third parties.
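As a sketch of that last step, the request and response schemas can be slotted into an OpenAPI 3.0 document by hand. The build_openapi helper, the API title and the version string below are assumptions made for illustration; they are not part of the decorators above.

```python
import json
from typing import Any, Dict

TODO_DATA_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "due": {"type": "string", "format": "date"},
        "task": {"type": "string"},
    },
    "required": ["due", "task"],
    "additionalProperties": False,
}

TODO_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "due": {"type": "string", "format": "date"},
        "id": {"type": "number"},
        "task": {"type": "string"},
    },
    "required": ["due", "id", "task"],
    "additionalProperties": False,
}


def build_openapi(
    path: str,
    method: str,
    request_schema: Dict[str, Any],
    response_schemas: Dict[int, Dict[str, Any]],
) -> Dict[str, Any]:
    """Build a single-route OpenAPI 3.0 document (hypothetical helper)."""
    responses = {
        # OpenAPI keys responses by the status code as a string.
        str(status): {
            "description": f"Response with status {status}",
            "content": {"application/json": {"schema": schema}},
        }
        for status, schema in response_schemas.items()
    }
    operation = {
        "requestBody": {
            "required": True,
            "content": {"application/json": {"schema": request_schema}},
        },
        "responses": responses,
    }
    return {
        "openapi": "3.0.3",
        "info": {"title": "Todo API", "version": "0.1.0"},  # assumed values
        "paths": {path: {method.lower(): operation}},
    }


spec = build_openapi("/todo/", "POST", TODO_DATA_SCHEMA, {201: TODO_SCHEMA})
print(json.dumps(spec, indent=2))
```

In practice the path, method and schemas would be collected from the decorated routes rather than passed in by hand.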

This solves the validation and documentation problems, whilst ensuring that the documentation stays correct. It is however just as painful to write as the sphinx docstrings (although now it cannot be ignored). It also doesn't help linters understand what we are doing with the data.

Dataclass decorators

To gain help from the linting tools, the data needs to be defined in a structure they understand. Conveniently, Python 3.7 shipped dataclasses, which we can use to define the structure and which enable IDEs and linters to alert me to any issues. Altering the decorators to convert the dataclass to a JSONSchema allows a dataclass to be used instead,

@dataclass
class TodoData:
    due: date
    task: str

@dataclass
class Todo(TodoData):
    id: int

@app.route("/todo/", methods=["POST"])
@validate_request(TodoData)
@validate_response(Todo, 201)
async def index() -> Todo:
    data = await request.get_json()
    todo_data = TodoData(**data)
    todo_id = create_todo(todo_data.task, todo_data.due)
    return Todo(id=todo_id, task=todo_data.task, due=todo_data.due), 201

which is quite pleasant code to write and, with IDE autocompletion, is actually quicker to write than any of the previous examples.

I have been experimenting with the code to make the dataclass decorators work for a while, and it isn't the easiest. Thankfully I recently came across pydantic's dataclasses which validate the arguments, and can generate schemas for documentation.
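For a sense of what the hand-rolled conversion involves, here is a minimal standard-library sketch. It is not the code I experimented with, and the dataclass_to_schema helper and its type map are assumptions for illustration. It only handles a few flat field types, which hints at why a complete version is hard to get right.

```python
from dataclasses import MISSING, dataclass, fields, is_dataclass
from datetime import date
from typing import Any, Dict, get_type_hints

# Only a handful of simple types are mapped; anything else is rejected.
_TYPE_MAP: Dict[Any, Dict[str, str]] = {
    bool: {"type": "boolean"},
    int: {"type": "integer"},
    float: {"type": "number"},
    str: {"type": "string"},
    date: {"type": "string", "format": "date"},
}


def dataclass_to_schema(cls: type) -> Dict[str, Any]:
    """Convert a flat dataclass into a JSONSchema dict (hypothetical helper)."""
    if not is_dataclass(cls):
        raise TypeError(f"{cls!r} is not a dataclass")
    hints = get_type_hints(cls)
    properties: Dict[str, Any] = {}
    required = []
    for field in fields(cls):
        try:
            properties[field.name] = dict(_TYPE_MAP[hints[field.name]])
        except (KeyError, TypeError):
            raise TypeError(f"unsupported type for {field.name!r}") from None
        # Fields without a default must be present in the JSON.
        if field.default is MISSING and field.default_factory is MISSING:
            required.append(field.name)
    return {
        "type": "object",
        "properties": properties,
        "required": required,
        "additionalProperties": False,
    }


@dataclass
class TodoData:
    due: date
    task: str


@dataclass
class Todo(TodoData):
    id: int


print(dataclass_to_schema(Todo))
```

Nested dataclasses, Optional fields, lists and converting the validated JSON back into typed values are all left out here, and those are the cases pydantic handles.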

Quart-Schema (Conclusion)

Quart-Schema is a Quart extension that provides the validation decorators and auto-generated documentation. The example in this article in full is,

from datetime import date

from pydantic.dataclasses import dataclass
from quart import Quart
from quart_schema import QuartSchema, validate_request, validate_response

app = Quart(__name__)
QuartSchema(app)

@dataclass
class TodoData:
    due: date
    task: str

@dataclass
class Todo(TodoData):
    id: int

@app.route("/todo/", methods=["POST"])
@validate_request(TodoData)
@validate_response(Todo, 201)
async def index(data: TodoData) -> Todo:
    todo = await create_todo(data)
    return todo, 201

with the documentation available at /docs or /openapi.json for the raw form.

I think, with Quart-Schema, that I've finally achieved all my initial aims in a form that is very clear and simple to write. This has only been possible with the async/await, type hinting, and dataclass improvements to Python made over the past few years.

I'd quite like to know what you think of this. I'm on Twitter @pdgjones.