
Task queues

Task queues manage background work that must be executed outside the usual HTTP request-response cycle.

Why are task queues necessary?

Tasks are handled asynchronously either because they are not initiated by an HTTP request or because they are long-running jobs that would dramatically slow down the HTTP response.

For example, a web application could poll the GitHub API every 10 minutes to collect the names of the top 100 starred repositories. A task queue would handle invoking code to call the GitHub API, process the results and store them in a persistent database for later use.
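For a concrete picture, here is a minimal sketch of the kind of function a task queue worker could invoke on that ten-minute schedule. The requests library, SQLite storage and the top_repos table name are assumptions made for illustration, not part of any particular project.

    # Sketch of a function a task queue worker could run every 10 minutes.
    # Assumes the third-party requests library and a local SQLite database;
    # the table and column names are illustrative only.
    import sqlite3

    import requests


    def fetch_top_starred_repos():
        """Call the GitHub search API and store the top 100 starred repos."""
        response = requests.get(
            "https://api.github.com/search/repositories",
            params={"q": "stars:>1", "sort": "stars", "order": "desc",
                    "per_page": 100},
            timeout=10,
        )
        response.raise_for_status()
        repos = response.json()["items"]

        conn = sqlite3.connect("repos.db")
        conn.execute("CREATE TABLE IF NOT EXISTS top_repos "
                     "(name TEXT, stars INTEGER)")
        conn.execute("DELETE FROM top_repos")  # replace the previous snapshot
        conn.executemany(
            "INSERT INTO top_repos (name, stars) VALUES (?, ?)",
            [(r["full_name"], r["stargazers_count"]) for r in repos],
        )
        conn.commit()
        conn.close()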

Another example is when a database query would take too long during the HTTP request-response cycle. The query could be performed in the background on a fixed interval with the results stored in the database. When an HTTP request comes in that needs those results, a query simply fetches the precalculated result instead of re-executing the longer query. This precalculation scenario is a form of caching enabled by task queues.
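A rough sketch of that precalculation pattern, using SQLite to keep the example self-contained: refresh_report is the function a task queue would run on a fixed interval, while get_report is all the HTTP request handler needs to call. The function, table and query names are made up for this example.

    # Sketch of the precalculation pattern described above. refresh_report()
    # would be run by a task queue on a fixed interval; get_report() is what
    # the HTTP request handler calls. All names here are illustrative.
    import json
    import sqlite3

    DB_PATH = "app.db"


    def refresh_report():
        """Run the slow aggregate query in the background and cache the result."""
        conn = sqlite3.connect(DB_PATH)
        rows = conn.execute(
            "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
        ).fetchall()
        conn.execute("CREATE TABLE IF NOT EXISTS report_cache "
                     "(name TEXT PRIMARY KEY, payload TEXT)")
        conn.execute(
            "INSERT OR REPLACE INTO report_cache (name, payload) VALUES (?, ?)",
            ("totals_by_customer", json.dumps(rows)),
        )
        conn.commit()
        conn.close()


    def get_report():
        """Called during the HTTP request: read only the precomputed result."""
        conn = sqlite3.connect(DB_PATH)
        row = conn.execute(
            "SELECT payload FROM report_cache WHERE name = ?",
            ("totals_by_customer",),
        ).fetchone()
        conn.close()
        return json.loads(row[0]) if row else None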

Other types of jobs for task queues include:

  • spreading out large numbers of independent database inserts over time instead of inserting everything at once (a chunking sketch follows this list)

  • aggregating collected data values on a fixed interval, such as every 15 minutes

  • scheduling periodic jobs such as batch processes
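As an example of the first item above, the sketch below spreads a large batch of inserts across many small tasks scheduled a minute apart. It assumes Celery, which is introduced in the next section, with a Redis broker running locally; the task body is a placeholder where real database code would go.

    # Sketch of spreading a large batch of inserts over time. Assumes Celery
    # (introduced in the next section) and a Redis broker on localhost; the
    # task body is a placeholder for real database code.
    from celery import Celery

    app = Celery("tasks", broker="redis://localhost:6379/0")


    @app.task
    def insert_records(records):
        """Insert one small chunk of records."""
        for record in records:
            print("inserting", record)  # placeholder for the real insert


    def enqueue_in_chunks(records, chunk_size=500, spacing_seconds=60):
        """Split the full batch into chunks and schedule each one a minute apart."""
        for i, start in enumerate(range(0, len(records), chunk_size)):
            chunk = records[start:start + chunk_size]
            insert_records.apply_async(args=[chunk],
                                       countdown=i * spacing_seconds)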

Task queue projects

The de facto standard Python task queue is Celery. The other task queue projects that arise tend to come from the perspective that Celery is overly complicated for simple use cases. My recommendation is to put the effort into Celery's reasonable learning curve, as it is worth the time it takes to understand how to use the project.

  • The Celery distributed task queue is the most commonly used Python library for handling asynchronous tasks and scheduling.

  • RQ (Redis Queue) is a simple Python library for queueing jobs and processing them in the background with workers. RQ is backed by Redis and is designed to have a low barrier to entry; a minimal enqueue sketch follows this list. The intro post contains information on design decisions and how to use RQ.

  • Taskmaster is a lightweight simple distributed queue for handling large volumes of one-off tasks.
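To show how low RQ's barrier to entry is, here is a minimal enqueue sketch. It assumes a Redis server running locally and a separate worker process started with the rq worker command; count_words_at_url is just an example job placed in a module the worker can import.

    # Minimal RQ sketch: enqueue a function call for a background worker.
    # Assumes a local Redis server and a worker started separately with
    # "rq worker"; the job function must live in a module the worker can import.
    from redis import Redis
    from rq import Queue

    import requests


    def count_words_at_url(url):
        """Example job: download a page and count the words in it."""
        response = requests.get(url, timeout=10)
        return len(response.text.split())


    queue = Queue(connection=Redis())

    # Returns immediately; the worker process picks the job up from Redis.
    job = queue.enqueue(count_words_at_url, "https://www.fullstackpython.com/")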

Hosted message and task queue services

Third-party task queue services aim to solve the complexity issues that arise when scaling out a large deployment of distributed task queues.

  • Iron.io is a distributed messaging service platform that works with many types of task queues, such as Celery. It is also built to work with other IaaS and PaaS environments such as Amazon Web Services and Heroku.

  • Amazon Simple Queue Service (SQS) is a set of five APIs for creating, sending, receiving, modifying and deleting messages; a short boto3 sketch of the basic calls follows this list.

  • CloudAMQP is, at its core, a set of managed servers with RabbitMQ installed and configured. This service is an option if you are using RabbitMQ and do not want to maintain RabbitMQ installations on your own servers.
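As a rough illustration of those SQS operations, the boto3 sketch below creates a queue, sends a message, then receives and deletes it. It assumes AWS credentials are already configured in the environment; the queue name and message body are arbitrary examples.

    # Sketch of the basic SQS operations with boto3. Assumes AWS credentials
    # are configured in the environment; the queue name and message body are
    # arbitrary examples.
    import boto3

    sqs = boto3.client("sqs")

    # Create (or look up) a queue and get its URL.
    queue_url = sqs.create_queue(QueueName="tasks")["QueueUrl"]

    # Send a message describing a unit of background work.
    sqs.send_message(QueueUrl=queue_url,
                     MessageBody='{"job": "rebuild_report"}')

    # A worker polls for messages, processes them, then deletes them.
    response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=10)
    for message in response.get("Messages", []):
        print("processing", message["Body"])
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=message["ReceiptHandle"])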

Task queue resources

Task queue learning checklist

  1. Pick a slow function in your project that is called during an HTTP request.

  2. Determine whether you can precompute the results on a fixed interval instead of during the HTTP request. If so, create a separate function you can call from elsewhere, then store the precomputed value in the database.

  3. Read the Celery documentation and the links in the resources section above to understand how the project works.

  4. Install a message broker such as RabbitMQ or Redis, then add Celery to your project and configure Celery to work with the installed message broker.

  5. Use Celery to invoke the function from step one on a regular basis (a configuration sketch follows this checklist).

  6. Have the HTTP request function use the precomputed value instead of the slow-running code it originally relied upon.
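The sketch below covers steps 4 and 5: a Celery application configured for a local Redis broker, with a beat schedule that runs the precompute task every 15 minutes. The module name (tasks.py), task name and interval are assumptions to adapt to your own project.

    # Sketch of steps 4 and 5, saved as tasks.py. Assumes a Redis broker on
    # localhost; the task body, task name and 15-minute interval are examples.
    from celery import Celery

    app = Celery("tasks", broker="redis://localhost:6379/0")


    @app.task(name="tasks.precompute_report")
    def precompute_report():
        """Run the slow function from step one and store its result."""
        # Placeholder: call the slow function picked in step 1 and save the
        # output so the HTTP request handler can read it later.
        ...


    app.conf.beat_schedule = {
        "precompute-report-every-15-minutes": {
            "task": "tasks.precompute_report",
            "schedule": 15 * 60,  # interval in seconds
        },
    }

    # Start a worker and the beat scheduler in separate shells:
    #   celery -A tasks worker --loglevel=info
    #   celery -A tasks beat --loglevel=info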

What's next after task queues?

How do I monitor my app and its task queues with logging?

How can I learn more about the users of my application?

What tools exist for monitoring a live web application?

