Part 2: Django, Docker, Elastic Beanstalk, RDS, SQS, Celery, and .ebextensions
Have your bottle of ibuprofen ready.
For a solution on how to make Django + Celery + Docker fully scalable on Elastic Beanstalk, see Part 3.
In Part 1 I deployed a Docker app running Django to AWS Elastic Beanstalk (EB) using Docker Compose. There, we just used the standard internal sqlite3 database for demonstration purposes and not a remote database, but if we’re using EB then it makes sense to have our database rest inside of AWS’s RDS service. At the end of the post, I commented that adding AWS RDS to the configuration would be “straightforward” and shrugged off the .ebextensions folder. As it turns out, it is not “straightforward,” and the deployment scheme we drew up completely bypassed .ebextensions!
In this post, we’re going back to the drawing board to redesign our deployment routine to include an RDS database and to launch a Celery asynchronous background task manager.
A quick note: in multi-instance EB environments, you might have problems staying logged in. What gives? Go to your EB environment dashboard, open the configuration panel, find your load balancer settings, and enable session stickiness.
An additional issue: after following through with setting up RDS and turning off Django’s debug mode, I noticed that my app was functioning but my load balancer health check was suddenly erroring out, putting my EB environment into “severe” status with a Target.ResponseCodeMismatch error code while simultaneously telling me that “ELB health is failing or not available for all instances” and not giving me which HTTP code it was unexpectedly receiving. When reading the environment logs, I also noticed that /var/log/nginx/access.log, which contains information about health check responses, had disappeared. The load balancer is responsible for sending out health checks to each instance, and it is possible that something went haywire there (this would not be the first time I broke an EB environment in an unexpected way after heavy experimentation). As frustrating as it is, the fix appears to have been to just make an entirely new environment. This is not ideal, but it is important to have EB health checks functioning properly.
Associating an RDS Database
Creating a new RDS database, either fresh or from a snapshot of an existing database, is very easy with EB. Simply navigate to your EB environment dashboard, go to the configuration panel, and either create a fresh database or create a new one from an existing snapshot name. Be careful with how you set up your deletion rules: if you terminate the environment without taking precautions, there is a chance your RDS database will be deleted too.
For those of you wanting to associate a new EB environment with an existing RDS database, I have bad news. Apparently, the underlying foundations of EB do not permit this. You will have to either expose your database to the internet and inject its information as environment variables, or somehow figure out how to do it within the VPC.
Using ebcli’s Deploy Command and Updates to docker-compose.yml
In the last post, we deployed our app by just sending EB our docker-compose.yml file, which simply contained a link to our image. This was fine for what we were doing, but we were ultimately bypassing some important steps. Most notably, the .ebextensions folder was embedded into our image and not executing on deployment! This is fine if you don’t need any pre- or post-deployment routines or hooks, but for any sophisticated service this will likely be an issue.
So we’ll have to deploy the old-fashioned way: by zipping our app and deploying it with the eb deploy <ENVIRONMENT-NAME> command in the working directory of your Django app (this must be configured with the ebcli first by running eb init!).
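For reference, the basic flow from the project root looks roughly like this (the application and environment names are placeholders):
# one-time setup: link this directory to an EB application on the Docker platform
eb init my-application --platform docker --region us-east-1
# zip the working directory and deploy it to the target environment
eb deploy my-environment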
It is very likely that once you have your deployment working, you can set up your deployment routine to more closely resemble what we did in Part 1 by telling eb deploy to only upload docker-compose.yml, .ebextensions, .platform, and any other directories you need, and to point Docker to a cloud-based image in docker-compose.yml. But that is a headache for a different day, and we’ll just send our entire app to the cloud and have docker-compose.yml build the image there. My new docker-compose.yml for this deployment routine looks like this:
version: '3.8'
services:
  web:
    build:
      context: .
      dockerfile: Dockerfile
    image: project:latest
    command: bash -c "python3 manage.py migrate && python3 manage.py runserver 0.0.0.0:8000"
    ports:
      - 80:8000
    env_file:
      - .env
There are some important differences here from Part 1. First, as I said above, we’ll just build the image locally on deployment and call it project:latest (it is not being sent to a remote repo, like Docker Hub or ECR, so don’t worry about overwriting anything unless you explicitly tell it to). The build section’s context is the folder, relative to docker-compose.yml, where the Dockerfile lives, and dockerfile is the name of the Dockerfile (useful if you have multiple Dockerfiles depending on the stage of deployment). Next, we’re going to remove our database migration from .ebextensions and just do it before running the server; to make a long story short, the formatting after the command argument is the best way to chain together multiple commands on Amazon Linux 2. Lastly, we’re including an env_file. More on this in a second.
I have also made another change in my deployment routine: a second “base” image that builds upon the first. Since we’re building the app image locally, it would otherwise have to install all of our Python libraries during every deployment, and depending on how many you have, this could take a lot of time. In my second “base” image, I have just copied the requirements.txt pip install from Part 1 and referenced the original base image (the one with all of the OS libraries). I tagged this one project:base-python:
# Use Original Base Image
FROM project:base
# copy requirements.txt to the Docker workdir and install all dependencies
COPY requirements.txt /app/requirements.txt
RUN python3 -m pip install -r requirements.txt
After generating this image and pushing it to my repos, I then go to work on the deployment Dockerfile that references it. The Dockerfile I am deploying and referencing in docker-compose.yml is just the port-opening and Django-project-copying commands from our Part 1 image. If you are confused, just know that I split the original Dockerfile from Part 1 into two different images to speed up the deployment by pre-installing the Python libraries. Here is the Dockerfile I am deploying:
FROM project:base-python
# copy project
COPY . /app/
# port where the Django app runs
EXPOSE 80
EXPOSE 8000
# Use docker-compose to run the Django app
Note that if you’re using a public Docker Hub repo for project:base-python, then the FROM reference should be fine. If you’re using a private repo, then it’s better to host project:base-python on ECR, where you won’t have permission issues. In this case, you can just change the FROM line to point to the ECR URI of your tagged image, as sketched below. This process will make production deployments lighter, but its feasibility depends on your project’s needs.
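For example, the FROM line would reference the image’s full ECR URI instead of a local tag (the account ID, region, and repository name here are placeholders):
# pull the pre-built Python base image from a private ECR repository
FROM 123456789012.dkr.ecr.us-east-1.amazonaws.com/project:base-python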
Environment Variables and RDS
It is good security practice to avoid hardcoding your database login information in code. Instead, we can scoop it up programmatically at runtime through environment variables, generally handled by the standard os library in Python. Here is what this generally looks like in settings.py when using a remote database:
DATABASES = {
    'default': {
        'ENGINE': 'django.contrib.gis.db.backends.postgis',
        'NAME': os.environ['RDS_DB_NAME'],
        'USER': os.environ['RDS_USERNAME'],
        'PASSWORD': os.environ['RDS_PASSWORD'],
        'HOST': os.environ['RDS_HOSTNAME'],
        'PORT': os.environ['RDS_PORT'],
    }
}
For those of you familiar with deploying Django to EB without Docker, this has likely never raised an issue. However, if you were to simply run this code with the configuration in Part 1, you would find that these keys are not available, causing your code to error out and your deployment to fail. What gives? At the end of Part 1, I showed you how to inspect your Docker environment after ssh’ing into your instance. Running docker inspect on our instance, we will see that these RDS environment variables (or any environment variables injected via the EB environment dashboard) are not listed. And here it is again, the sinking feeling of yet another possibly insurmountable deployment issue on EB. However, an unlikely hero, the AWS EB documentation, will save the day.
Above, in our production docker-compose.yml, we included a reference to an environment file. As it turns out, the environment variables injected into our EB environment are only available to be directly referenced during deployment, and are otherwise (specifically on the Docker platform on EB) stored in a hidden file called .env placed in the root directory of our application. This includes the RDS environment variables (hostname, username, etc.) for the database associated with our EB environment. According to the AWS documentation, we can simply scoop this file up via the env_file entry in docker-compose.yml, and its contents will be injected as environment variables that can be programmatically referenced by Python and Django. With these changes, your deployed instances should have ready access to AWS RDS. We can also add more environment variables via the environment argument; this is useful if you want to easily switch Django’s DEBUG variable in settings.py depending on whether you are debugging or deploying to production, as sketched below.
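A minimal version of that switch (DJANGO_DEBUG is a variable name of my own choosing; set it under the environment argument in docker-compose.yml or in the EB console):
import os
# settings.py: default to production behavior unless DJANGO_DEBUG is explicitly set to a truthy value
DEBUG = os.environ.get('DJANGO_DEBUG', 'False').lower() in ('true', '1')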
Django Static Files: A Pitfall of Docker + Elastic Beanstalk
Static files are all of the JS, CSS, and media used to make your web pages function. Django aggregates these into a central location when python3 manage.py collectstatic is run. In Part 1, I deployed Django in debug mode, and static files were being served as usual. However, in production, Django changes where and how it serves its static files. Generally, static files are served via URL references in your web pages, and Django treats these URLs a bit differently than those you’ve defined in your project. In non-Docker deployments running Django with DEBUG=False, you must configure your proxy server via .ebextensions to serve static files. For example, .ebextensions/01_staticfiles.config would look like:
option_settings:
  aws:elasticbeanstalk:environment:proxy:staticfiles:
    /static: staticfiles/
However, the AWS documentation shows that configuring the proxy to serve static files on Amazon Linux 2 is unavailable for Docker. Some people have hacked around this limitation by copying the files elsewhere on the system and then configuring nginx to look for them there. Others seem to have handled it by attaching volumes via docker-compose.yml. However, I’m not a fan of keeping static files on the web servers themselves, as it puts additional load on your servers and doesn’t scale well at all (what happens when a user needs to upload an image in your multi-instance environment without a central place to put it?). A better solution is to use AWS S3 to host them, which can be integrated into your project directly with django-storages without much additional work or changes to the standard Django workflow beyond configuring settings.py. Not only does this get around the problem, but you also get your own centralized CDN for all static files and media. You can also extend the storage backends to point to different static URLs for debug or production if you’re worried about messing things up. Once this is configured (getting the S3 bucket permissions appropriately accessible in read-only mode can be a frustrating but very important process), we will have successfully bypassed this limitation in serving static files. You can either send static files to S3 locally by building some switches into settings.py, or you can add collectstatic to the commands in docker-compose.yml (but this can be risky for accidentally overwriting your production static files!).
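For reference, the heart of the django-storages setup is just a handful of settings in settings.py; a minimal sketch might look like the following (the bucket name and region are placeholders, and the exact setting names depend on your django-storages version):
# settings.py: hand static and media files off to S3 via django-storages' boto3 backend
AWS_STORAGE_BUCKET_NAME = 'my-project-static'  # placeholder bucket name
AWS_S3_REGION_NAME = 'us-east-1'  # placeholder region
AWS_S3_CUSTOM_DOMAIN = f'{AWS_STORAGE_BUCKET_NAME}.s3.amazonaws.com'
STATICFILES_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
STATIC_URL = f'https://{AWS_S3_CUSTOM_DOMAIN}/static/'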
Celery + Docker + Django + EB: Welcome to the Thunderdome
Any sophisticated web platform will want the ability to schedule and queue tasks. Celery is just that service: a way to schedule and execute asynchronous tasks. Using django-celery and django-celery-beat, and configuring our Django project accordingly, we can use our own database as a centralized task scheduler. Celery requires two components: the Beat, which sends out the signal that a task needs to be run, and the Workers, which execute the task. The Beat and the Workers need to be connected by some sort of messaging system so that they can talk to each other. For this, the obvious choice is AWS SQS, which integrates with django-celery and gives you so many free messages that this will likely not cost you anything to implement. There are other tutorials out there which show you how to configure SQS in settings.py as your message broker between Beat and Workers. Pro tip: be sure to set up a dedicated queue per environment and switch out the queue names as you change between production deployments and debug setups. If your production queue and your debug queue are the same, then tasks could be sent to either! I like to use the same switch in settings.py that I used for choosing between local and production static files to solve this issue.
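As a rough sketch of what those SQS settings can look like (assuming the usual celery.py app that reads CELERY_-prefixed settings; the queue name and region are placeholders you would switch per environment):
# settings.py: use SQS as the Celery broker; credentials come from the instance role or env vars
CELERY_BROKER_URL = 'sqs://'
CELERY_BROKER_TRANSPORT_OPTIONS = {
    'region': 'us-east-1',      # placeholder region
    'polling_interval': 10,     # seconds between SQS polls
}
CELERY_TASK_DEFAULT_QUEUE = 'my-project-production'  # use a different queue name for debug
# let django-celery-beat keep the task schedule in the database
CELERY_BEAT_SCHEDULER = 'django_celery_beat.schedulers:DatabaseScheduler'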
However, this is where Elastic Beanstalk becomes very unideal. First, Celery should run as a daemon in most normal configurations, and configuring/debugging this on deployment is a total nightmare. Second, you only ever want one Beat running at a time, which can be problematic on an auto-scaling service like EB. Multiple Beats mean that multiple signals to run a task are being sent, causing task duplication, which can cause all sorts of issues. Third, EB currently offers no ability to protect a specific instance from being terminated during auto-scaling. Even if you get only one Beat running, it always exists under the threat of being trimmed. EB treats all instances equally, and the leader has no protective weight. Lastly, it is difficult to decouple your Beat and Workers from your web environment: generally, you need access to your database and the Django ORM in order to execute most tasks.
A new leader instance is elected by EB when eb deploy is run. The leader is just a chosen instance in your environment that executes exclusive code. Ideally, we could leverage the leader_only flag in .ebextensions to start a Beat on only the leader (a sketch of that syntax follows this paragraph), and then allow the Workers to run on every instance. We would then want to make sure that we turned off auto-scaling by setting the same minimum and maximum number of allowed instances, as a stopgap solution to protect our Beat. There are creative solutions for trying to keep a leader identified at all times during instance adjustments, but this is too much of a headache for now. An additional issue is that you cannot tell which instance is the leader, so identifying which instance your Beat is running on would be a nightmare of ssh’ing and proc checks.
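For reference, leader_only hangs off a container_commands entry in an .ebextensions config, something like the following (the project name is a placeholder, and this is not the route we end up taking here):
container_commands:
  01_start_celery_beat:
    # runs only on the instance EB designates as the deployment leader
    command: "celery -A project beat --detach"
    leader_only: true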
The ideal setup would be to decouple the Beat and the Workers and to put them in a different environment that can be more easily managed, requiring you to open your database to the internet and deploy your Django code somewhere else so that the Beat could read the database and access your ORM. You would have to be careful to keep your models.py up-to-date between your Celery environment and your web environment though, otherwise your ORM will begin to throw errors. Unsurprisingly, there is no good documentation for how to accomplish this. Another task for another day.
For now, let’s compromise on auto-scaling and keep our instance count rigid. This is not ideal and potentially costly, but it is so, so hard to give up the ease of use of EB. In a previous project, I used Amazon Linux 1’s supervisor to run Celery Worker and Beat as daemons, with Beat only running on the leader thanks to the leader_only flag in the .ebextensions config where I was launching the service. However, supervisor does not come configured in Amazon Linux 2, and the paths to reference any install are totally different. Unless we’re down to try hours of hacking, this is not an ideal path forward. Additionally, solutions attempting to use platform hooks did not work with our configuration (after hours of trying, I forget the reason this particular attempt failed). To make a long story short, after several exhausting hours of trying, what I found to work is to take advantage of docker-compose and run Beat and Worker in separate containers alongside our web app. This does not scale as well as the leader_only option we previously used on Amazon Linux 1, but it may be possible to kill Beat containers with platform hooks if they fail the EB_IS_COMMAND_LEADER environment variable check (an untested sketch of this follows below). I have not tried this, but it is a promising path for future work. For now, we will live with a single EB instance, and we can manually kill extra Beat containers through ssh if we have to spin up a reasonable number of additional instances.
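To be clear, I have not tried this, but an untested sketch of such a hook (placed under .platform/hooks/postdeploy/ and assuming the compose service for Beat is named celery_beat, as it is below) might look something like:
#!/bin/bash
# untested sketch: stop the Beat container on any instance that was not the deployment leader
IS_LEADER=$(grep -s EB_IS_COMMAND_LEADER /opt/elasticbeanstalk/deployment/env | cut -d '=' -f 2)
if [ "$IS_LEADER" != "true" ]; then
  # the name filter matches the docker-compose service used for Beat
  docker stop $(docker ps -q --filter "name=celery_beat") || true
fi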
Ok, so let’s set up a multi-container environment to handle Celery Beat and Worker alongside our app. First problem: the Worker requires the notoriously difficult-to-install pycurl Python library (Celery’s SQS transport depends on it). Let’s go back to our Python base image (the second image we built above to handle our requirements.txt install, which references our initial base image from Part 1 in its FROM line) and add the following block to install pycurl:
RUN yum -y install openssl-static && \
export PYCURL_SSL_LIBRARY=openssl && \
pip3 install pycurl==7.43.0.5 --global-option="--with-openssl" --upgrade
Rebuild your base Python image, tag it as base-python, and push to ECR.
Now, we need to configure our Dockerfile and docker-compose.yml. Here, I will take inspiration from this blog post. In the root of your Django project, add a folder called celery and create two subfolders called beat and worker. In each, we will add a bash script simply called start. In celery/beat/start, add the following:
#!/bin/bash
set -o errexit
set -o nounset
rm -f './celerybeat.pid'
celery -A project beat -l info
Then, in celery/worker/start, add:
#!/bin/bash
set -o errexit
set -o nounset
celery -A project worker --loglevel=info
In both scripts, replace project with the name of the folder off the Django root directory where settings.py lives (i.e. from the Django root, I would find settings.py under project/settings.py). Now, in our production Dockerfile, we will add commands to copy these files to the Docker image root so that we can call them later from docker-compose.yml. Your new production Dockerfile should look like this (using the base Python image stored at the ECR URL):
# Use Base Python Image
FROM URL:base-python
# copy project
COPY . /app/
ENV DJANGO_SETTINGS_MODULE=project.settings
# copy the Celery start scripts, strip any Windows line endings, and make them executable
COPY ./celery/worker/start /start-celeryworker
RUN sed -i 's/\r$//g' /start-celeryworker
RUN chmod +x /start-celeryworker
COPY ./celery/beat/start /start-celerybeat
RUN sed -i 's/\r$//g' /start-celerybeat
RUN chmod +x /start-celerybeat
# port where the Django app runs
EXPOSE 80
EXPOSE 8000
Again, replace project.settings with your specific settings.py path. Now, let’s extend our original docker-compose.yml into the following:
version: '3.8'
services:
  web:
    build:
      context: .
      dockerfile: Dockerfile
    image: project:latest
    command: bash -c "python3 manage.py migrate && python3 manage.py runserver 0.0.0.0:8000"
    volumes:
      - .:/app
    ports:
      - 80:8000
    env_file:
      - .env
  celery_worker:
    build:
      context: .
      dockerfile: Dockerfile
    image: django_celery_worker
    command: /start-celeryworker
    volumes:
      - .:/app
    env_file:
      - .env
    environment:
      - DJANGO_SETTINGS_MODULE=project.settings
    restart: on-failure # will restart until it succeeds
    depends_on:
      - web
  celery_beat:
    build:
      context: .
      dockerfile: Dockerfile
    image: django_celery_beat
    command: /start-celerybeat
    volumes:
      - .:/app
    env_file:
      - .env
    environment:
      - DJANGO_SETTINGS_MODULE=project.settings
    depends_on:
      - web
There is much to explain here. Again, we are building three separate images, and in each we are referencing the Dockerfile that we just produced above. The volumes entry refers to the app folder that we create in our Dockerfile; it is important that the Celery images have these contents because they need access to our Django settings.py. There is likely a way to “trim the fat” from the rest of the project so that we don’t include more than we have to and make each Celery image bigger than necessary, but for now we will just deal with it. The command in each of the Celery services references the bash scripts we made above as a way to execute Beat and Worker. Additionally, we are assuming that you have settings.py set up to use RDS; otherwise, Worker and Beat might be using separate copies of the same database. If you are not using a remote database, it would be a good idea to build a fourth container to host a database and use depends_on accordingly. We are also using the .env file in each service to get our RDS credentials (note: for local testing purposes you’ll want to comment this line out, and if you make a local .env file, be sure to delete it before deploying to EB or else it will overwrite EB’s .env file!). Lastly, we add restart: on-failure to the worker service. This is important: even though we make both Celery containers depend on the web service, depends_on only controls the order in which containers start, not whether the web app inside is actually ready! I had several occasions where the Worker started before the web app was up, making the database unavailable during local testing and causing docker-compose to error out. The restart policy will just keep restarting the errored-out Worker container until the web service boots up.
You may then deploy to EB, and it should work! I use django-celery-results to store task results in the database for easy confirmation that my setup is working. You can also see the standard output of Beat and Worker in the Elastic Beanstalk environment logs by inspecting /var/log/eb-docker/containers/eb-current-app/eb-stdouterr.log, either over ssh or via eb logs.
An Imperfect Solution
I have (somewhat) successfully combined Django, Docker, Celery, and RDS on AWS Elastic Beanstalk while using SQS as my message broker. Somewhat defeating the purpose of EB, this solution does not scale to multiple instances without producing multiple Beats, but it should get you deployed in a limited capacity until you can figure that part out (leave a comment if you do!). Some potential solutions are killing the Beat containers on non-leader instances through platform hooks (see my brief description above) or trying something like redbeat. The best solution would be to remove your Beat and Worker from your web service’s EB environment entirely, perhaps to a different EB environment, and then connect them to your RDS instance manually (this may require exposing the database to the internet if you can’t get your VPC grouping right). If I get a solution like this working, I will be sure to write another post.