The Big Log

You didn’t just do that, Heroku

Mon, 17 Apr 2023 00:00:00 GMT

Obligatory TL;DR with spoilers at the end of the post.

Background

MusicButler is the first web-app I ever built. It notifies its users about new music by artists found in their libraries. It’s also how I learned a lot of what I know about programming and most of what I know about web-development. I believe it played a huge part in how I got to make a career shift to a software developer in my 30s.

Before the dawn of Dockerized apps, one way a layman like me would get an app to production was using Heroku’s Procfiles. MusicButler went live in August 2018. Today it’s a nice side-project that I’m proud to stand behind.

The app backend is built on Django and relies heavily on the Celery library for carrying out millions of background and scheduled tasks each day. In fact, all Heroku dynos except for the web one are Celery dynos:[^1]

web: daphne musicbutler.asgi:application --port $PORT --bind 0.0.0.0
celerybeat: celery -A musicbutler beat
celerybackgroundworker1: celery -A musicbutler worker -Q regular
celerybackgroundworker2: celery -A musicbutler worker -Q regular
celeryimportantworker: celery -A musicbutler worker -Q important

The dyno of interest here is celerybeat: the Celery “scheduler” responsible for assigning scheduled tasks to other Celery workers.

April 8: first duplicate email received

One of the oldest pieces of code in MusicButler is the one that sends out music-drop emails to thousands of users each day, once a day.

I was surprised when I saw that my test user had received two emails titled "New Music for April 8, 2023" at the same time. The emails were identical. This had never happened before.

Upon checking my email delivery provider dashboard, I confirmed that MusicButler had sent twice its average volume of emails that day. Heroku's logs showed that celerybeat had dispatched the scheduled task twice:

celerybeat.1 [2023-04-08 21:35:00,000: INFO/MainProcess] Scheduler: Sending due task send_music_drops (send_music_drops)
celerybeat.1 [2023-04-08 21:35:00,014: INFO/MainProcess] Scheduler: Sending due task send_music_drops (send_music_drops)

Ok. This isn't the email provider’s fault. It’s mine, or Celery’s. Probably the former.

I was busy that night so I utilized a Celery companion library called Celery Once which ensures only one instance of a task can run concurrently.

April 12: First user complaint

Over the next few days some personal matters came up that steered my attention away. It was not until a long-time user complained about receiving duplicate emails that I realized the issue was still persisting. Apparently Celery Once doesn’t handle locking for scheduled tasks.

Music drops are one of the core features of the app, so I needed to fix this quickly. I had approximately 24 hours until the next batch of emails was due to be sent out.

I could implement a distributed lock myself, or make the email-sending task truly idempotent, but I wanted to understand why something was broken in the first place. This code had been running flawlessly for years.

Unfortunate coincidence

Nothing stood out when I ran the service locally. A quick scan of the code didn't bring up any suspects either.

I inspected Heroku’s logs again and saw that it wasn’t just this specific task that was being dispatched twice, all of them were:

celerybeat.1 [2023-04-12 18:22:00,000: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
celerybeat.1 [2023-04-12 18:22:00,015: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)

Luckily most of these tasks are truly idempotent so little damage was done there.

Here’s where a mix of Impostor Syndrome and mandated developer humility sent me looking in the wrong directions.

In the weeks before, I’d written some new code that was also supposed to be scheduled by celerybeat. Did I muck up something there? Since this is Python, you can shoot yourself in the foot in a variety of ways. Maybe I'd imported the scheduler configuration code twice, maybe I'd inadvertently put some code in an __init__ file, like this Stack Overflow answer I came across suggests.

I started deleting the new code. At first gradually, and then frantically. With each deployment that failed to mitigate the issue, I was running out of time and nerves.

“It’s Celery’s fault”

Using Heroku's log drains again, I managed to pinpoint the exact time the issue first manifested. The celerybeat dyno started sending the same tasks twice on April 6, 2023 at 22:16:00 UTC:

# once every two minutes
celerybeat.1 [2023-04-06 22:10:00,003: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
celerybeat.1 [2023-04-06 22:12:00,001: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
celerybeat.1 [2023-04-06 22:14:00,001: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
# twice every two minutes
celerybeat.1 [2023-04-06 22:16:00,000: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
celerybeat.1 [2023-04-06 22:16:00,015: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)

22:16:00 is the exact time a new build of the app was deployed. This was a revelation, but not a very useful one by the time I discovered it: I had already commented out all new code that could remotely be connected to the “bug”.
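
Spotting this pattern by eye is tedious, and a check like it can be scripted. Here is a minimal sketch, assuming log lines shaped like the excerpts above. The function name find_duplicate_dispatches and the regex are illustrative, not something Heroku or Celery ships. It groups "Sending due task" lines by task and second and reports every group with more than one dispatch:

```python
import re
from collections import Counter

# Matches lines such as:
# celerybeat.1 [2023-04-06 22:16:00,015: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (...)
DISPATCH_RE = re.compile(
    r"^(?P<dyno>\S+) \[(?P<second>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+: [^\]]+\] "
    r"Scheduler: Sending due task (?P<task>\S+)"
)


def find_duplicate_dispatches(lines):
    """Return {(task, second): count} for tasks dispatched more than once in the same second."""
    counts = Counter()
    for line in lines:
        match = DISPATCH_RE.match(line.strip())
        if match:
            counts[(match["task"], match["second"])] += 1
    return {key: n for key, n in counts.items() if n > 1}


log = """\
celerybeat.1 [2023-04-06 22:14:00,001: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
celerybeat.1 [2023-04-06 22:16:00,000: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
celerybeat.1 [2023-04-06 22:16:00,015: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
""".splitlines()

duplicates = find_duplicate_dispatches(log)
print(duplicates)
```

Grouping by whole seconds is crude: two dispatches that straddle a second boundary would slip through, so treat it as a first filter rather than proof.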

So is it Celery then? This awesome library has its fair share of complaints about duplicated task execution — most are due to misconfiguration. I scanned dozens of Stack Overflow threads and GitHub issues in Celery’s repository. Very few were relevant to my case, and none of the fixes were.

April 13: You didn’t just do that, Heroku

The last thread I read and the one that would lead me to the shocking discovery included a brute-force suggestion: delete Celery’s celerybeat-schedule file, where celerybeat keeps its schedule.

There’s no reason this would happen on a Serverless platform like Heroku, I thought to myself. But, at this point nothing makes sense anymore, I also thought to myself.

I decided to do something different just before; I renamed the dyno celerybeat to celerybeatnew in the Procfile and deployed:

web: daphne musicbutler.asgi:application --port $PORT --bind 0.0.0.0
celerybeatnew: celery -A musicbutler beat
celerybackgroundworker1: celery -A musicbutler worker -Q regular
celerybackgroundworker2: celery -A musicbutler worker -Q regular
celeryimportantworker: celery -A musicbutler worker -Q important

After the deployment was over I checked the logs again; now everything became clear:

👇
celerybeatnew.1 [2023-04-13 19:42:00,000: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
👇
celerybeat.1 [2023-04-13 19:42:00,015: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)

See, at this point the celerybeat dyno shouldn’t even exist. It was nowhere to be found on my list of dynos. But here it is, alive, well, and scheduling tasks.

So what happened on that April 6 deployment is that Heroku either spun up two celerybeat dynos instead of one, or just never killed the old one. It’s not that celerybeat was misbehaving, it’s just that there were now two of them. It was virtually impossible to get to this conclusion until I changed the dyno's name in the Procfile.

Heroku’s support lives up to its name

I actually know what happened, but not thanks to Heroku’s support. I contacted them on April 13 and as of April 17, their only response is “we’re looking into this”. I haven't heard from them since, and the old celerybeat dyno is still up with nothing I can do to stop it.[^2]

Update: Heroku finally terminated the zombie dyno on April 17 at 14:00 UTC. They confirmed there was nothing I could've done to terminate it myself.

In hindsight, it was a good decision to not simply rewrite the task’s code to utilize locks. That would have prevented new instances of the app from executing tasks twice, but Heroku is running an old, zombie dyno with outdated code. I know this because I removed some tasks from celerybeatnew and witnessed how they’re still running, courtesy of Heroku’s on-the-house worker. If Heroku were simply spinning up a new extra dyno with each deployment, these tasks should’ve stopped executing entirely.

Writing on the wall

I’ve been reading about Heroku’s slow demise in developer communities for years now. Just weeks earlier, I’d read a horror story on Twitter about Heroku just flat-out deleting someone’s account, including production apps.

And still. I don’t know what I could’ve done differently here: when I list an internet-taught software developer, a talented OSS team, and a multi-gazillion dollar corporation, my instinct is to look into them in this same order.

Next steps

Truly idempotent tasks

This could have happened on any platform, and for other reasons. If a task should never ever be executed twice, or concurrently, it shouldn't count on a scheduler to prevent that. In fact, I've followed Celery's own guide on ensuring a task is only executed one at a time. It's just that I didn't retroactively apply that to a 4-year-old piece of code.
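
One way to get there is to record a key per user and date, and refuse to send when that key has already been claimed. What follows is a minimal sketch under assumptions, not MusicButler's code: SentLog is a hypothetical in-memory stand-in for a store shared by all workers (a database unique constraint or a cache's atomic add would play that role in production), and send_music_drop and deliver are made-up names.

```python
import threading
from datetime import date


class SentLog:
    """In-memory stand-in for a shared store with an atomic add-if-absent."""

    def __init__(self):
        self._keys = set()
        self._lock = threading.Lock()

    def add(self, key):
        """Record key; return True only for the first caller to record it."""
        with self._lock:
            if key in self._keys:
                return False
            self._keys.add(key)
            return True


def send_music_drop(user_id, drop_date, sent_log, deliver):
    """Send at most one music-drop email per user per day."""
    key = f"music-drop:{user_id}:{drop_date.isoformat()}"
    if not sent_log.add(key):
        return False  # another run already claimed this email
    deliver(user_id, drop_date)
    return True


outbox = []
sent_log = SentLog()
drop_day = date(2023, 4, 8)

# Two schedulers dispatch the same task; only the first call delivers.
first = send_music_drop(42, drop_day, sent_log, lambda u, d: outbox.append((u, d)))
second = send_music_drop(42, drop_day, sent_log, lambda u, d: outbox.append((u, d)))
print(first, second, len(outbox))
```

Claiming the key before delivering gives at-most-once behaviour: if deliver fails, the email is skipped rather than retried. Whether that trade-off is acceptable depends on the task.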

No more vendor lock

In 2018 Heroku was pretty much the only option for a newbie like me to get started with web-development. The landscape is very different now with providers like Render, Railway, and more offering a similar DX.

I've just finished deploying a Celery worker on Railway. The experience was OK. It would cost a bit more than what I currently pay for a Heroku dyno. Railway invented their own deployment-file format, Nixpacks, which allows one to build Docker images. Render utilizes render.yaml files. Wherever I decide to go, I’ll be Dockerizing MusicButler so I can move between vendors less painfully.

When you're a solo-developer doing something on the side you need to prioritize your time ruthlessly. I've always prioritized new features and user feedback over infra work. Looks like it's time to start paying attention to the latter.

TL;DR

Heroku has been running a 2nd copy of my scheduler instance since April 6, 2023 and I have zero control over it.
All scheduled tasks were carried out twice, causing disturbance to users and unnecessarily high load.
Given how Heroku works and how they present their logs, I had no way to detect this early on, or reason to suspect that’s what happened.
I discovered the root cause April 13 and contacted Heroku. The zombie instance is still running as of April 17 at 18:45 UTC.

[^1]: Some arguments were omitted for brevity.
[^2]: I tried every documented method, and some undocumented ones.

The Django Speed Handbook: making a Django app faster

Tue, 25 Feb 2020 00:00:00 GMT

Over the course of developing several Django apps, I've learned quite a bit about speed optimizations. Some parts of this process, whether on the backend or frontend, are not well-documented. I've decided to collect most of what I know in this article.

If you haven’t taken a close look at the performance of your web-app yet, you're bound to find something good here.

Why speed is important

On the web, 100 milliseconds can make a significant difference and 1 second is a lifetime. Countless studies indicate that faster loading times are associated with better conversion-rates, user-retention, and organic traffic from search engines. Most importantly, they provide a better user experience.

Different apps, different bottlenecks

There are many techniques and practices to optimize your web-app’s performance. It’s easy to get carried away. Look for the highest return-to-effort ratio. Different web-apps have different bottlenecks and therefore will gain the most when those bottlenecks are taken care of. Depending on your app, some tips will be more useful than others.

While this article is catered to Django developers, the speed optimization tips here can be adjusted to pretty much any stack. On the frontend side, it’s especially useful for people hosting with Heroku and who do not have access to a CDN service.

Analyzing and debugging performance issues

On the backend, I recommend the tried-and-true django-debug-toolbar. It will help you analyze your request/response cycles and see where most of the time is spent. It's especially useful because it shows database query execution times and a nice SQL EXPLAIN in a separate pane that appears in the browser.

Google PageSpeed will display mainly frontend related advice, but some can apply to the backend as well (like server response times). PageSpeed scores do not directly correlate with loading times but should give you a good picture of where the low-hanging fruits for your app are. In development environments, you can use Google Chrome's Lighthouse which provides the same metrics but can work with local network URIs. GTmetrix is another detail-rich analysis tool.

Disclaimer

Some people will tell you that some of the advice here is wrong or lacking. That's okay; this is not meant to be a bible or the ultimate go-to-guide. Treat these techniques and tips as ones you may use, not should or must use. Different needs call for different setups.

Backend: the database layer

Starting with the backend is a good idea since it's usually the layer that's supposed to do most of the heavy lifting behind the scenes.

There's little doubt in my mind which two ORM functionalities I want to mention first: these are select_related and prefetch_related. They both deal specifically with retrieving related objects and will usually improve speed by minimizing the number of database queries.

select_related

Let's take a music web-app for example, which might have these models:

# music/models.py, some fields & code omitted for brevity
from django.db import models


class RecordLabel(models.Model):
    name = models.CharField(max_length=560)


class MusicRelease(models.Model):
    title = models.CharField(max_length=560)
    release_date = models.DateField()


class Artist(models.Model):
    name = models.CharField(max_length=560)
    label = models.ForeignKey(
        RecordLabel,
        related_name="artists",
        null=True,  # required when on_delete is SET_NULL
        on_delete=models.SET_NULL,
    )
    music_releases = models.ManyToManyField(
        MusicRelease,
        related_name="artists",
    )

So each artist is related to one and only one record company and each record company can sign multiple artists: a classic one-to-many relationship. Artists have many music-releases, and each release can belong to one artist or more.

I've created some dummy data:

20 record labels
each record label has 25 artists
each artist has 100 music releases

Overall, we have ~50,500 of these objects in our tiny database.

Now let's wire-up a fairly standard function that pulls our artists and their label. django_query_analyze is a decorator I wrote to count the number of database queries and time to run the function. Its implementation can be found in the appendix.

# music/selectors.py
@django_query_analyze
def get_artists_and_labels():
    result = []
    artists = Artist.objects.all()
    for artist in artists:
        result.append({"name": artist.name, "label": artist.label.name})
    return result

get_artists_and_labels is a regular function which you may use in a Django view. It returns a list of dictionaries, each containing the artist's name and their label. I'm accessing artist.label.name to force-evaluate the Django QuerySet; you can equate this to trying to access these objects in a Jinja template:

{% for artist in artists_and_labels %}
Name: {{ artist.name }}, Label: {{ artist.label.name }}
{% endfor %}

Now let's run this function:

ran function get_artists_and_labels
--------------------
number of queries: 501
Time of execution: 0.3585s

So we've pulled 500 artists and their labels in 0.36 seconds, but more interestingly — we've hit the database 501 times. Once for all the artists, and 500 more times: once for each of the artists' labels. This is called "The N+1 problem". Let's tell Django to retrieve each artist's label in the same query with select_related:

@django_query_analyze
def get_artists_and_labels_select_related():
    result = []
    artists = Artist.objects.select_related("label") # select_related
    for artist in artists:
        result.append(
            {"name": artist.name, "label": artist.label.name if artist.label else "N/A"}
        )
    return result

Now let's run this:

ran function get_artists_and_labels_select_related
--------------------
number of queries: 1
Time of execution: 0.01481s

500 fewer queries and a 96% speed improvement.
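
Under the hood, select_related swaps the per-row lookups for a single SQL JOIN. The sketch below reproduces the difference with Python's built-in sqlite3 module rather than the Django ORM, so it runs without a project. The table names and the run helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE label (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE artist (
        id INTEGER PRIMARY KEY, name TEXT, label_id INTEGER REFERENCES label(id)
    );
    """
)
conn.executemany("INSERT INTO label VALUES (?, ?)", [(i, f"Label {i}") for i in range(1, 21)])
conn.executemany(
    "INSERT INTO artist VALUES (?, ?, ?)",
    [(i, f"Artist {i}", (i - 1) % 20 + 1) for i in range(1, 501)],
)

queries = []  # every statement sent to the database is recorded here


def run(sql, params=()):
    """Execute a query and record it, like a crude query analyzer."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


# N+1: one query for the artists, then one more per artist for its label.
queries.clear()
naive = [
    {"name": name, "label": run("SELECT name FROM label WHERE id = ?", (label_id,))[0][0]}
    for name, label_id in run("SELECT name, label_id FROM artist ORDER BY id")
]
naive_queries = len(queries)

# JOIN: roughly what select_related("label") asks the database to do.
queries.clear()
joined = [
    {"name": name, "label": label}
    for name, label in run(
        "SELECT artist.name, label.name FROM artist "
        "LEFT JOIN label ON label.id = artist.label_id ORDER BY artist.id"
    )
]
joined_queries = len(queries)

print(naive_queries, joined_queries, naive == joined)
```

Both versions build the same list; only the number of round trips changes.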

prefetch_related

Let's look at another function, for getting the first 100 artists and each artist's music releases:

@django_query_analyze
def get_artists_and_releases():
    result = []
    artists = Artist.objects.all()[:100]
    for artist in artists:
        result.append(
            {
                "name": artist.name,
                "releases": [release.title for release in artist.music_releases.all()],
            }
        )
    return result

How long does it take to fetch 100 artists and 100 releases for each one of them?

ran function get_artists_and_releases
--------------------
number of queries: 101
Time of execution: 0.18245s

Let's change the artists variable in this function and add select_related so we can bring the number of queries down and hopefully get a speed boost:

artists = Artist.objects.select_related("music_releases")

If you actually do that, you'll get an error:

django.core.exceptions.FieldError: Invalid field name(s) given in select_related: 'music_releases'. Choices are: label

That's because select_related can only be used to cache ForeignKey or OneToOneField attributes. The relationship between Artist and MusicRelease is many-to-many though, and that's where prefetch_related comes in:

@django_query_analyze
def get_artists_and_releases_prefetch_related():
    result = []
    artists = Artist.objects.all()[:100].prefetch_related("music_releases") # prefetch_related
    for artist in artists:
        result.append(
            {
                "name": artist.name,
                "releases": [rel.title for rel in artist.music_releases.all()],
            }
        )
    return result

select_related can only cache the "one" side of the "one-to-many" relationship, or either side of a "one-to-one" relationship. You can use prefetch_related for all other caching, including the many side in one-to-many relationships, and many-to-many relationships. Here's the improvement in our example:

ran function get_artists_and_releases_prefetch_related
--------------------
number of queries: 2
Time of execution: 0.13239s

Nice.
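
prefetch_related works differently from select_related: instead of a JOIN on the main query, it runs one extra query per prefetched relation and stitches the results together in Python. Here is a rough sketch of that idea using sqlite3. It is not Django's actual implementation, and the schema and variable names are assumed for illustration.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE release (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE artist_release (artist_id INTEGER, release_id INTEGER);
    """
)
conn.executemany("INSERT INTO artist VALUES (?, ?)", [(1, "Artist 1"), (2, "Artist 2")])
conn.executemany(
    "INSERT INTO release VALUES (?, ?)", [(1, "Release A"), (2, "Release B"), (3, "Release C")]
)
# Release B is shared by both artists (many-to-many).
conn.executemany("INSERT INTO artist_release VALUES (?, ?)", [(1, 1), (1, 2), (2, 2), (2, 3)])

# Query 1: the artists themselves.
artists = conn.execute("SELECT id, name FROM artist ORDER BY id").fetchall()

# Query 2: every release for those artists, fetched in one go with IN (...).
ids = [artist_id for artist_id, _ in artists]
placeholders = ", ".join("?" for _ in ids)
rows = conn.execute(
    "SELECT ar.artist_id, r.title FROM release r "
    "JOIN artist_release ar ON ar.release_id = r.id "
    f"WHERE ar.artist_id IN ({placeholders}) ORDER BY r.id",
    ids,
).fetchall()

# Stitch the two result sets together in Python.
releases_by_artist = defaultdict(list)
for artist_id, title in rows:
    releases_by_artist[artist_id].append(title)

result = [{"name": name, "releases": releases_by_artist[artist_id]} for artist_id, name in artists]
print(result)
```

Both the IN list and the Python-side stitching grow with the number of rows, which is one reason very large result sets can get slower rather than faster.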

Things to keep in mind about select_related and prefetch_related:

If you aren't pooling your database connections, the gains will be even bigger because of fewer roundtrips to the database.
For very large result-sets, running prefetch_related can actually make things slower.
One database query isn't necessarily faster than two or more.

Indexing

Indexing your database columns can have a big impact on query performance. Why then, is it not the first clause of this section? Because indexing is more complicated than simply scattering db_index=True on your model fields.

Creating an index on frequently accessed columns can improve the speed of look-ups pertaining to them. Indexing comes at the cost of additional writes and storage space though, so you should always measure your benefit:cost ratio. In general, creating indices on a table will slow down inserts/updates.
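
Measuring starts with checking whether the database uses the index at all. The sketch below does that with sqlite3 and EXPLAIN QUERY PLAN. In a Django project, db_index=True or Meta.indexes would create the index, and the django-debug-toolbar SQL pane would show the plan. The table and index names here are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT, bio TEXT)")
conn.executemany(
    "INSERT INTO artist (name, bio) VALUES (?, ?)",
    [(f"Artist {i}", "placeholder bio") for i in range(1000)],
)


def query_plan(sql, params=()):
    """Return SQLite's plan for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # the last column holds the description


lookup = "SELECT * FROM artist WHERE name = ?"

plan_before = query_plan(lookup, ("Artist 500",))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_artist_name ON artist (name)")
plan_after = query_plan(lookup, ("Artist 500",))  # the planner can now use the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, but the second plan should mention idx_artist_name and the first should not.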

Take only what you need

When possible, use values() and especially values_list() to only pull the needed properties of your database objects. Continuing our example, if we only want to display a list of artist names and don't need the full ORM objects, it's usually better to write the query like so:

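artist_names = Artist.objects.values('name')
# QuerySet of dicts, one per artist: {'name': ...}

artist_names = Artist.objects.values_list('name')
# QuerySet of 1-tuples: (name,)

artist_names = Artist.objects.values_list('name', flat=True)
# QuerySet of plain values, here the name strings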

Haki Benita, a true database expert (unlike me), reviewed some parts of this section. You should read Haki's blog.

Backend: the request layer

The next layer we're going to look at is the request layer. These are your Django views, context processors, and middleware. Good decisions here will also lead to better performance.

Pagination

In the section about select_related we were using the function to return 500 artists and their labels. In many situations returning this many objects is either unrealistic or undesirable. The section about pagination in the Django docs is crystal clear on how to work with the Paginator object. Use it when you don't want to return more than N objects to the user, or when doing so makes your web-app too slow.

Asynchronous execution/background tasks

There are times when a certain action inevitably takes a lot of time. For example, a user requests to export a big number of objects from the database to an XML file. If we're doing everything in the same process, the flow looks like this:

web: user requests file -> process file -> return response

Say it takes 45 seconds to process this file. You're not really going to let the user wait all this time for a response. First, because it's a horrible experience from a UX standpoint, and second, because some hosts will actually cut the process short if your app doesn't respond with a proper HTTP response after N seconds.

In most cases, the sensible thing to do here is to remove this functionality from the request-response loop and relay it to a different process:

web: user requests file -> delegate to another process -> return response
                           |
                           v
background process:        receive job -> process file -> notify user

Background tasks are beyond the scope of this article but if you've ever needed to do something like the above I'm sure you've heard of libraries like Celery.
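
Still, the shape of that flow can be sketched with the standard library alone. This is not Celery: a ThreadPoolExecutor stands in for the background worker, and export_to_xml, request_export, and the notifications list are hypothetical names used only for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)  # stand-in for a pool of background workers
notifications = []  # stand-in for an email or in-app notification


def export_to_xml(user_id, object_ids):
    """The slow job: build the file, then notify the user."""
    time.sleep(0.1)  # pretend this takes 45 seconds
    items = "".join(f"<item id='{i}'/>" for i in object_ids)
    notifications.append((user_id, f"<export>{items}</export>"))


def request_export(user_id, object_ids):
    """The 'view': hand the job off and respond immediately."""
    future = executor.submit(export_to_xml, user_id, object_ids)
    return {"status": 202, "body": "Export started"}, future


response, job = request_export(user_id=7, object_ids=[1, 2, 3])
print(response)  # available before the export has finished
job.result()  # only for this demo: wait so the outcome can be inspected
print(notifications)
executor.shutdown(wait=True)
```

A real task queue adds what this sketch lacks: jobs that survive a restart, retries, and workers that run on separate machines.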

Compressing Django's HTTP responses

This is not to be confused with static-file compression, which is mentioned later in the article.

Compressing Django's HTTP/JSON responses also stands to save your users some latency. How much exactly? Let's check the number of bytes in our response's body without any compression:

Content-Length: 66980
Content-Type: text/html; charset=utf-8

So our HTTP response is around 67KB. Can we do better? Many use Django's built-in GZipMiddleware for gzip compression, but today the newer and more effective brotli enjoys the same support across browsers (except IE11, of course).

Important: Compression can potentially open your website to security breaches, as mentioned in the GZipMiddleware section of the Django docs.

Let's install the excellent django-compression-middleware library. It will choose the fastest compression mechanism supported by the browser by checking the request's Accept-Encoding headers:

pip install django-compression-middleware

Include it in our Django app's middleware:

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "compression_middleware.middleware.CompressionMiddleware",
    # ...
]

And inspect the body's Content-Length again:

Content-Encoding: br
Content-Length: 7239
Content-Type: text/html; charset=utf-8

The body size is now 7.24KB, 89% smaller. You can certainly argue this kind of operation should be delegated to a dedicated server like Nginx or Apache. I'd argue that everything is a balance between simplicity and resources.
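
How much you save depends on how repetitive the payload is, and rendered HTML tends to be very repetitive. Here is a quick way to get a feel for it, using the standard library's gzip as a stand-in, since brotli needs a third-party package. The sample markup is made up.

```python
import gzip

# Made-up, repetitive markup, similar in spirit to a rendered list page.
rows = "".join(
    f'<li class="artist"><a href="/artists/{i}/">Artist {i}</a></li>\n' for i in range(1000)
)
html = f"<html><body><ul>\n{rows}</ul></body></html>".encode("utf-8")

compressed = gzip.compress(html)
saving = 1 - len(compressed) / len(html)

print(len(html), len(compressed), f"{saving:.0%} smaller")
assert gzip.decompress(compressed) == html  # lossless round trip
```

Running the same measurement on one of your own rendered pages gives a more honest number than any synthetic sample.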

Caching

Caching is the process of storing the result of a certain calculation for faster future retrieval. Django has an excellent caching framework that lets you do this on a variety of levels and using different storage backends.

Caching can be tricky in data-driven apps: you'd never want to cache a page that's supposed to display up-to-date, realtime information at all times. So, the big challenge isn't so much setting up caching as it is figuring out what should be cached, for how long, and understanding when or how the cache is invalidated.

Before resorting to caching, make sure you've made proper optimizations at the database-level and/or on the frontend. If designed and queried properly, databases are ridiculously fast at pulling out relevant information at scale.
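
To make the "for how long" and "when is it invalidated" questions concrete, here is a tiny in-memory cache with per-key expiry. It is an illustration, not Django's cache framework, although Django's cache API offers a similarly named get_or_set. The TTLCache class, the injectable clock, and expensive_chart are all hypothetical.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-key expiry (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_set(self, key, compute, timeout):
        """Return the cached value, recomputing it once it is older than timeout seconds."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = compute()
        self._store[key] = (now + timeout, value)
        return value

    def delete(self, key):
        """Explicit invalidation, for example after the underlying data changes."""
        self._store.pop(key, None)


calls = []


def expensive_chart():
    calls.append(1)
    return f"chart v{len(calls)}"


fake_now = [0.0]  # a controllable clock, so the example does not need to sleep
cache = TTLCache(clock=lambda: fake_now[0])

a = cache.get_or_set("top-artists", expensive_chart, timeout=60)  # computed
b = cache.get_or_set("top-artists", expensive_chart, timeout=60)  # served from the cache
fake_now[0] = 61.0
c = cache.get_or_set("top-artists", expensive_chart, timeout=60)  # expired, recomputed
print(a, b, c, len(calls))
```

The timeout bounds how stale a value can get, while delete covers the cases where you know the data just changed.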

Frontend: where it gets hairier

Reducing static files/assets sizes can significantly speed up your web application. Even if you've done everything right on the backend, serving your images, CSS, and JavaScript files inefficiently will degrade your application's speed.

Between compiling, minifying, compressing, and purging, it's easy to get lost. Let's try not to.

Serving static-files

You have several options on where and how to serve static files. Django's docs mention a dedicated server running Nginx or Apache, Cloud/CDN, or the same-server approach.

I've gone with a bit of a hybrid attitude: images are served from a CDN, large file-uploads go to S3, but all serving and handling of other static assets (CSS, JavaScript, etc…) is done using WhiteNoise (covered in-detail later).

Vocabulary

Just to make sure we're on the same page, here's what I mean when I say:

Compiling: If you're using SCSS for your stylesheets, you'll first have to compile those to CSS because browsers don't understand SCSS.
Minifying: reducing whitespace and removing comments from CSS and JS files can have a significant impact on their size. Sometimes this process involves uglifying: the renaming of long variable names to shorter ones, etc...
Compressing/Combining: for CSS and JS, combining multiple files into one. For images, this usually means removing some data from images to make their file size smaller.
Purging: remove unneeded/unused code. In CSS for example: removing selectors that aren't used.

Serving static files from Django with WhiteNoise

WhiteNoise allows your Python web-application to serve static assets on its own. As its author states, it comes in when other options like Nginx/Apache are unavailable or undesired.

Let's install it:

pip install "whitenoise[brotli]"

Before enabling WhiteNoise, make sure your STATIC_ROOT is defined in settings.py:

STATIC_ROOT = os.path.join(BASE_DIR, "staticfiles")

To enable WhiteNoise, add its WhiteNoise middleware right below SecurityMiddleware in settings.py:

MIDDLEWARE = [
  'django.middleware.security.SecurityMiddleware',
  'whitenoise.middleware.WhiteNoiseMiddleware',
  # ...
]

In production, you'll have to run manage.py collectstatic for WhiteNoise to work.

While this step is not mandatory, it's strongly advised to add caching and compression:

STATICFILES_STORAGE = 'whitenoise.storage.CompressedManifestStaticFilesStorage'

Now whenever it encounters a {% static %} tag in templates, WhiteNoise will take care of compressing and caching the file for you. It also takes care of cache-invalidation.

One more important step: To ensure that we get a consistent experience between development and production environments, we add runserver_nostatic:

INSTALLED_APPS = [
    'whitenoise.runserver_nostatic',
    'django.contrib.staticfiles',
    # ...
]

This can be added regardless of whether DEBUG is True or not, because you don't usually run Django via runserver in production.

I found it useful to also increase the caching time:

# Whitenoise cache policy
WHITENOISE_MAX_AGE = 31536000 if not DEBUG else 0 # 1 year

Wouldn't this cause problems with cache-invalidation? No, because WhiteNoise creates versioned files when you run collectstatic: each file's name includes a hash of its contents.

So when you deploy your application again, your static files are overwritten and will have a different name, thus the previous cache becomes irrelevant.
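
The principle behind versioned names fits in a few lines: derive part of the file name from a hash of the file's contents, so the name changes only when the contents do. The sketch below illustrates the idea and is not WhiteNoise's or Django's actual code. The versioned_name helper, the choice of SHA-256, and the 12-character digest are assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path, content):
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = versioned_name("css/app.css", b"body { color: black; }")
same = versioned_name("css/app.css", b"body { color: black; }")
new = versioned_name("css/app.css", b"body { color: navy; }")
print(old, new)
```

Unchanged files keep their name and stay cached for the full year, while edited files get a new URL that no browser has seen yet.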

Compressing and combining with django-compressor

WhiteNoise already compresses static files, so django-compressor is optional. But the latter offers an additional enhancement: combining the files. To use compressor with WhiteNoise we have to take a few extra steps.

Let's say the user loads an HTML document that links three .css files:

Your browser will make three different requests to these locations. In many scenarios it's more effective to combine these different files when deploying, and django-compressor does that with its {% compress css %} template tag:

This:

{% load compress %}

  {% compress css %}
    
    
    
  {% endcompress %}

Becomes:

Let's go over the steps to make django-compressor and WhiteNoise play well. Install:

pip install django_compressor

Tell compressor where to look for static files:

COMPRESS_STORAGE = "compressor.storage.GzipCompressorFileStorage"
COMPRESS_ROOT = os.path.abspath(STATIC_ROOT)

Because of the way these two libraries intercept the request-response cycle, they're incompatible with their default configurations. We can overcome this by modifying some settings.

I prefer to use environment variables in .env files and have one Django settings.py, but if you have settings/dev.py and settings/prod.py, you'll know how to convert these values:

main_project/settings.py:

from decouple import config
#...

COMPRESS_ENABLED = config("COMPRESS_ENABLED", cast=bool)
COMPRESS_OFFLINE = config("COMPRESS_OFFLINE", cast=bool)

COMPRESS_OFFLINE is True in production and False in development. COMPRESS_ENABLED is True in both[^fn-1-compress].

With offline compression, one must run manage.py compress on every deployment. On Heroku, you'll want to disable the platform from automatically running collectstatic for you (on by default) and instead opt to do that in the post_compile hook, which Heroku will run when you deploy. If you don't already have one, create a folder called bin at the root of your project and inside of it a file called post_compile with the following:

python manage.py collectstatic --noinput
python manage.py compress --force
python manage.py collectstatic --noinput

Another nice thing about compressor is that it can compile SCSS/SASS files:

COMPRESS_PRECOMPILERS = (
    ("text/x-sass", "django_libsass.SassCompiler"),
    ("text/x-scss", "django_libsass.SassCompiler"),
)

Minifying CSS & JS

Another important thing to apply when talking about load-times and bandwidth usage is minifying: the process of (automatically) decreasing your code's file-size by eliminating whitespace and removing comments.

There are several approaches to take here, but if you're using django-compressor specifically, you get that for free as well. You just need to add the following (or any other filters compressor supports) to your settings.py file:

COMPRESS_FILTERS = {
    "css": [
        "compressor.filters.css_default.CssAbsoluteFilter",
        "compressor.filters.cssmin.rCSSMinFilter",
    ],
    "js": ["compressor.filters.jsmin.JSMinFilter"],
}

Defer-loading JavaScript

Another thing that contributes to slower performance is loading external scripts. The gist of it is that browsers will fetch and execute JavaScript files as they are encountered, before parsing the rest of the page:

We can use the async and defer keywords to mitigate this:

async and defer both allow the script to be fetched asynchronously without blocking. One of the key differences between them is when the script is allowed to execute: With async, once the script has been downloaded, all parsing is paused until the script has finished executing, while with defer the script is executed only after all HTML has been parsed.

I suggest referring to Flavio Copes' article on the defer and async keywords. Its general conclusion is:

The best thing to do to speed up your page loading when using scripts is to put them in the head, and add a defer attribute to your script tag.

Lazy-loading images

Lazily loading images means that we only request them when or a little before they enter the client's (user's) viewport. It saves time and bandwidth ($ on cellular networks) for your users. With excellent, dependency-free JavaScript libraries like LazyLoad, there really isn't an excuse to not lazy-load images. Moreover, Google Chrome has natively supported the loading="lazy" attribute since version 76.

Using the aforementioned LazyLoad is fairly simple and the library is very customizable. In my own app, I want it to apply on images only if they have a lazy class, and start loading an image 300 pixels before it enters the viewport:

$(document).ready(function (e) {
  new LazyLoad({
    elements_selector: ".lazy", // classes to apply to
    threshold: 300, // pixel threshold
  });
});

Now let's try it with an existing image:

We replace the src attribute with data-src and add lazy to the class attribute:

Now the client will request this image when the latter is 300 pixels under the viewport.

If you have many images on certain pages, using lazy-loading will dramatically improve your load times.

Optimize & dynamically scale images

Another thing to consider is image-optimization. Beyond compression, there are two more techniques to consider here.

First, file-format optimization. There are newer formats like WebP that are reportedly 25-30% smaller than your average JPEG image at the same quality. As of 02/2020 WebP has decent but incomplete browser support, so you'll have to provide a standard format fallback if you want to use it.

Second, serving different image sizes to different screen sizes: if some mobile device has a maximum viewport width of 650px, then why serve it the same 1050px image you're displaying to a 13″ 2560px Retina display?

Here, too, you can choose the level of granularity and customization that suits your app. For simpler cases, you can use the srcset attribute to control sizing and be done with it, but if, for example, you're also serving WebP with JPEG fallbacks for the same image, you may use the <picture> element with multiple sources and source-sets.
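As a rough illustration of what a srcset value looks like, here is a small Python sketch that builds one. The build_srcset helper and the ?w= resizing URL scheme are assumptions made up for this example; your image host or resizing service will have its own URL format.

```python
def build_srcset(base_url: str, widths: list[int]) -> str:
    """Build a srcset value from a base URL and a list of pixel widths.

    Assumes a hypothetical resizing endpoint that accepts ?w=<width>.
    """
    # each candidate is "<url> <width>w"; candidates are comma-separated
    return ", ".join(
        f"{base_url}?w={width} {width}w" for width in sorted(widths)
    )


srcset = build_srcset("/media/cover.jpg", [1050, 650])
print(srcset)
```

The resulting string goes into the image's srcset attribute, and the browser picks the candidate that best fits the device.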

If the above sounds as complicated to you as it does to me, this guide should help explain the terminology and use-cases.

Unused CSS: Removing imports

If you're using a CSS framework like Bootstrap, don't just include all of its components blindly. In fact, I would start with commenting out all of the non-essential components and only add those gradually as the need arises. Here's a snippet of my bootstrap.scss, where all of its different parts are imported:

// ...

// Components
// ...
@import "bootstrap/dropdowns";
@import "bootstrap/button-groups";
@import "bootstrap/input-groups";
@import "bootstrap/navbar";
// @import "bootstrap/breadcrumbs";
// @import "bootstrap/badges";
// @import "bootstrap/jumbotron";

// Components w/ JavaScript
@import "bootstrap/modals";
@import "bootstrap/tooltip";
@import "bootstrap/popovers";
// @import "bootstrap/carousel";

I don't use things like badges or jumbotron so I can safely comment those out.

Unused CSS: Purging CSS with PurgeCSS

A more aggressive and more complicated approach is using a library like PurgeCSS, which analyzes your files, detects CSS content that's not in use, and removes it. PurgeCSS is an NPM package, so if you're hosting Django on Heroku, you'll need to install the Node.js buildpack side-by-side with your Python one.

Conclusion

I hope you've found at least one area where you can make your Django app faster. If you have any questions, suggestions, or feedback, don't hesitate to drop me a line on Twitter.

Appendices

Decorator used for QuerySet performance analysis

Below is the code for the django_query_analyze decorator:

from functools import wraps
from timeit import default_timer as timer

from django.db import connection, reset_queries


def django_query_analyze(func):
    """Decorator to perform analysis on Django queries.

    Note: connection.queries is only populated when DEBUG = True.
    """

    @wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        timings = []
        query_counts = []
        # run the function 20 times to smooth out timing noise
        for _ in range(20):
            reset_queries()
            start = timer()
            func(*args, **kwargs)
            end = timer()
            timings.append(end - start)
            query_counts.append(len(connection.queries))
            reset_queries()

        print()
        print(f"ran function {func.__name__}")
        print("-" * 20)
        print(f"number of queries: {int(sum(query_counts) / len(query_counts))}")
        print(f"Time of execution: {min(timings):.5f}s")
        print()
        # one final call to produce the actual return value
        return func(*args, **kwargs)

    return wrapper

[^fn-1-compress]: it's still useful to hold this boolean in the environment

How to scaffold Django projects with Cookiecutter

Sun, 28 Jul 2019 00:00:00 GMT

This post is a guide on how to scaffold (quick-start) new projects efficiently with Cookiecutter, a library that creates projects from project-templates. It outlines how I created my own Django cookiecutter, Scaffold Django X, but the same can be applied to Flask and pretty much any other Python project.

Working on some Django articles, I found myself needing to start a new project more often than usual. It can get tedious: having to initialize a project, filling boilerplate settings, adding template-files and directories…the list goes on.

Sure, you can duplicate an existing project. But then too much time is spent on removing unneeded files, figuring out why the new project isn't running (you forgot to remove a file or a line), and fiddling with settings. Might as well just start from scratch.

Or, use Cookiecutter.

What is Cookiecutter?

Cookiecutter is a command-line utility that creates projects from project-templates, aptly called cookiecutters. It allows for dynamic insertion of content within files and inclusion/exclusion of the files themselves in a way that makes project generation flexible and convenient.

Cookiecutter (the library) shares its name with what it can be used to generate, a cookiecutter. This may be confusing at times. Keep in mind that Cookiecutter the library is written with a capital C.

After you install Cookiecutter you can clone any local or remote project-template, choose how to configure your project from a set of options (predefined by the template's author), and you're ready to go. Want to use Celery? It's automatically included in the requirements.txt file, its relevant configs are added to Django's settings.py, and Celery-specific files are already in the generated project's tree. Using Django as a backend only? A cookiecutter can remove the static and templates folders for you.

The most popular Django-related Cookiecutter project is Cookiecutter Django by Daniel Roy Greenfeld. It's very customizable and includes a long list of options and features. I found Cookiecutter Django too opinionated (and a bit of an overkill) for my use-case so I created my own cookiecutter: Scaffold Django X.

Generating a project from a cookiecutter

Before creating a cookiecutter, it's first worth understanding how you'd generate a project from an existing one.

Because you would want to be able to use it from any folder, it's a good idea to install Cookiecutter in your global/main Python environment:

$ pip install cookiecutter

If you want to scaffold a project from a local cookiecutter, navigate to the folder in which you want your actual project to live, open Terminal, and type cookiecutter followed by the path to the cookiecutter that you want to base your project on.

To create a project in ~/my-projects/ based on the cookiecutter/template called simple-django-cookiecutter, you go to ~/my-projects/:

$ cd ~/my-projects/

This is where Cookiecutter will place the generated project folder. You then invoke cookiecutter, specifying the path to the project template:

$ cookiecutter ~/code/my-cookiecutters/simple-django-cookiecutter/

You will then be prompted to fill-in some details that will be used to populate your project with the relevant configurations. These configuration variables and their default values are derived from a special kind of file, cookiecutter.json, which is covered later in this guide.

For example, the simple cookiecutter that I've created prompts for the following options when invoked:

project_slug [open_folder]:

project_name [Open Folder]:

description [A very nice weblog]:

author_name [SKM]:

author_email [openfolder@example.com]:

include_jquery_cdn [n]:

Select css_framework:
1 - tailwindcss
2 - bootstrap
3 - none

Choose from 1, 2, 3 (1, 2, 3) [1]:

If the project_slug is provided as music_app, then this is what the project's root folder will be called. The description will automatically go in the website's meta tags. include_jquery_cdn is handled in a similar fashion: if y is provided instead of the default n, then a reference to jQuery's CDN is inserted in the project's base.html. Scaffold Django X also links empty main.css and main.js files in base.html so that they're ready to use.

At the end of this process, a music_app folder is placed under ~/my-projects and you can start developing the preconfigured Django project.

It's also possible to clone remote cookiecutters. From GitHub, for example:

$ cookiecutter https://github.com/SHxKM/django-scaffold-cookiecutter

This is how one would generate a project from a template. Next: how to build the template itself.

How to create a cookiecutter: the basics

The minimum requirement for a valid project-template is that it contains a cookiecutter.json file at its root folder. This file is used to define the different variables the user has to fill or choose from during the generation stage. It also sets an overridable default for each variable:

{
  "some_variable": "some_default_value",
  "project_slug": "open_folder",
  "project_name": "Open Folder",
  "description": "A very nice weblog",
  "author_name": "SKM",
  "author_email": "openfolder@example.com",
  "include_jquery_cdn": "n",
  "css_framework": ["tailwindcss", "bootstrap", "none"]
}

These are key-value pairs, with each value denoting the default to use. If a user simply hits enter when prompted for the project_name, it will be Open Folder. If a list is used, like in css_framework above, Cookiecutter will present a numbered choice prompt for that option, with the first item as the default.

So, when the user has finished answering all prompts, Cookiecutter then scans the files and looks for blocks that match each of the keys above. But where and how are these values used?

Variables in filenames and folders

Here's the root folder of an example cookiecutter:

.
├── cookiecutter.json
└── {{ cookiecutter.project_slug }}
    ├── Pipfile
    ├── manage.py
    ├── static
    ├── templates
    └── {{ cookiecutter.project_slug }}

So for one, filenames and directories can themselves be variables. The Django project lives under the directory {{ cookiecutter.project_slug }}. The directory is named this way because Cookiecutter is going to dynamically rename it when the project is generated. You may be familiar with this curly-brace notation as Cookiecutter uses the same Jinja2 templating engine that Django supports.

Variables in HTML files

Here's a snippet from base.html:

{# base.html #}

...

  {{ cookiecutter.project_name }}
  
...

When the project is generated, {{ cookiecutter.project_name }} is simply replaced with the name provided by the user in the Terminal.

The fact that Cookiecutter uses the same syntax as Django templates can create a problem if it tries to parse Django's own tags, like {% static %} or {% url %}. You can escape these with the {% raw %} and {% endraw %} tags:

{% raw %}{% load static %}{% endraw %}

For conditionals, like whether to include the jQuery CDN, an if block is employed:

{%- if cookiecutter.include_jquery_cdn == "y" -%}
  
{%- endif %}

Variables in Python files

That's basically the gist of it for template files, but the same logic can be used in Python files. Here's a snippet from Django's settings.py:

# settings.py
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    {%- if cookiecutter.css_framework == 'tailwindcss' -%}
    "tailwind",
    "theme",
    {%- endif %}
]

The above isn't valid Python but these tags are stripped-out anyway when the project is generated. If the user picks Tailwind CSS as their CSS framework then we need to include some extra lines (apps) in INSTALLED_APPS.

Here's how you'd access a cookiecutter's variable inside a Python file:

project_slug = "{{ cookiecutter.project_slug }}"

Variables in…cookiecutter.json

We can make our cookiecutter.json smarter by deriving project_slug's default value from project_name[^1]:

{
  "project_name": "Open Folder",
  "project_slug": "{{ cookiecutter.project_name.lower()|replace(' ', '_')|replace('-', '_')|replace('.', '_')|trim() }}"
}

This way, if the user enters "My App" as their project_name, the default value for project_slug becomes my_app. The user can then simply hit enter to use this value for the slug, or override it as they wish.
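If the filter chain is hard to read, here is the same transformation sketched in plain Python. default_slug is a hypothetical name used only for this illustration; Cookiecutter itself evaluates the Jinja2 expression from cookiecutter.json, not this function.

```python
def default_slug(project_name: str) -> str:
    """Plain-Python equivalent of the Jinja2 filter chain above."""
    return (
        project_name.lower()   # "My App" -> "my app"
        .replace(" ", "_")     # spaces become underscores
        .replace("-", "_")     # so do hyphens
        .replace(".", "_")     # and dots
        .strip()               # counterpart of Jinja2's trim filter
    )


for name in ("My App", "Open Folder", "django-ajax.search"):
    print(f"{name!r} -> {default_slug(name)!r}")
```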

Ignoring files when parsing

We can tell Cookiecutter to ignore — not attempt to parse — certain directories or files:

{
  "project_slug": "open_folder",
  "project_name": "Open Folder",
  "_copy_without_render": ["theme"]
}

_copy_without_render tells Cookiecutter to copy files "as-is" without attempting to render (parse) them. In the case above, theme is a folder that contains a package that integrates the Tailwind CSS framework into Django. It contains 3rd-party files that should remain untouched even if they contain curly braces {{ }} that Cookiecutter usually sniffs for and strips.

Pre/Post-generate hooks

Cookiecutter also supports pre- and post-generation hooks. These are just regular Python files that are run before/after the project is generated. They are named pre_gen_project.py and post_gen_project.py, respectively. You place them inside a directory named hooks at the root of the project:

├── cookiecutter.json
├── hooks
│   ├── post_gen_project.py
│   └── pre_gen_project.py
└── {{ cookiecutter.project_slug }}
    ├── Pipfile
    ├── manage.py
    ├── static
    ├── templates
    ...

These generation hooks can be extremely useful when files (not just lines of code) should be added/removed dynamically depending on the user input. Below are examples of how each of these hooks can be useful.

Pre-generation hooks

If you want to validate that the project_slug given by the user is all lower-case, you can create a file hooks/pre_gen_project.py and include the following:

# hooks/pre_gen_project.py
project_slug = "{{ cookiecutter.project_slug }}"

assert (
    project_slug == project_slug.lower()
), f"{project_slug} project slug should be all lowercase"

Before Cookiecutter attempts to parse the project files, it will run pre_gen_project.py and if the user provided a slug that isn't all lowercase, the assertion will fail. The project isn't generated at all and an appropriate error message is displayed.
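A stricter check is possible with a regular expression. The sketch below is one way to do it, under the assumption that a valid slug is a lowercase Python identifier; SLUG_PATTERN and is_valid_slug are hypothetical names. In a real hook you would pass it the rendered "{{ cookiecutter.project_slug }}" string and exit with a non-zero status if it returns False.

```python
import re

# Assumed rule: lowercase letters, digits and underscores,
# not starting with a digit (i.e. a lowercase Python identifier).
SLUG_PATTERN = re.compile(r"[a-z_][a-z0-9_]*")


def is_valid_slug(slug: str) -> bool:
    """Return True if the slug matches the assumed naming rule."""
    return SLUG_PATTERN.fullmatch(slug) is not None


for candidate in ("music_app", "Music_App", "music-app", "9lives"):
    print(f"{candidate!r}: {is_valid_slug(candidate)}")
```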

Post-generation hooks

We can do some interesting things in post_gen_project.py as well. Remember the aforementioned theme folder? It contains necessary files and modules for the 3rd party package django-tailwind. But if the user chose bootstrap or none we don't need this directory anymore:

# hooks/post_gen_project.py
import os
import shutil


def remove_tailwind_folder():
    theme_dir_path = "theme"
    if os.path.exists(theme_dir_path):
        shutil.rmtree(theme_dir_path)

# ...

def main():
    if "{{cookiecutter.css_framework}}".lower() != "tailwindcss":
        remove_tailwind_folder()

if __name__ == "__main__":
    main()

main() checks if the user chose to use Tailwind CSS, and if not, calls the function remove_tailwind_folder() which will delete its folders. As you can see, we have access to project variables in the generation hooks files:

if "{{cookiecutter.css_framework}}".lower() != "tailwindcss":
   # variable key                            # variable value

Conclusion

Cookiecutter can cut project generation time significantly. For more complex boilerplate, setup time can be reduced by more than 90%. As always, the package's docs site is a good place to start if there's something you're unsure of.

[^1]: (full-credit for this to the aforementioned Cookiecutter Django)

Eliminating indentation by returning early

Fri, 28 Jun 2019 00:00:00 GMT

Returning early is a fairly basic but useful technique and it's one that I've only adopted relatively late in my Python journey. The Zen of Python states that "flat is better than nested" and returning early can definitely make a noticeable difference in this regard.

Consider the following function:

def make_odd_even(number: int) -> int:
    if number % 2 != 0:
        return number + 1
    else:
        return number

Given an integer, make_odd_even() converts an odd number to an even one. It first verifies that it's odd, and then adds 1 to it, making it even. If the number is already even, it's returned as is.

Here's another, shorter way to write it:

def make_odd_even_v2(number: int) -> int:
    if number % 2 != 0:  # check if odd
        return number + 1
    return number

The else clause is omitted and becomes implicit because we know that if the number isn't odd then it’s surely even. There is no third option. Considering that return always stops any further code from being executed, we also know that if the number is odd, the second return statement is never reached. Same result, shorter code.

Another way to write the function is to flip the if clause check:

def make_odd_even_v3(number: int) -> int:
    if number % 2 == 0:  # check if even
        return number
    return number + 1

It's hard to see when the code is so trivial, but though they achieve the same result, only make_odd_even_v3() is an example of a function that returns early.

What is returning early?

Returning early is the practice of first checking for one or more "invalid"/terminating conditions, usually at the beginning of the code, and halting the execution if any of these conditions is satisfied.

That's a mouthful.

In programming-speak, an early-return check is known as a guard clause, or guard code. Thanks to reddit user novel_yet_trivial for pointing this out.

Here's a more involved example: Suppose we want to write a function to download media (image or video) from a tweet and then upload it from our local machine to an FTP server. This function should receive one parameter, tweet_url, and if all goes well, it should return a URL to the uploaded media file.

Here's one possible implementation:

The function below calls other helper functions and raises custom exceptions. Their implementations are beside the point of this article and are therefore omitted.

def upload_tweet_media(tweet_url: str) -> str:
    if check_valid_tweet(tweet_url):
        local_path = download_media_from_tweet(tweet_url)
        if local_path:
            try:
                url_to_file = upload_to_ftp(local_path)
                return url_to_file
            except ftputil.error.FTPOSError:
                raise FTPError("Couldn't upload to FTP server")
        else:
            raise DownloadError("Couldn't download twitter media")
    else:
        raise TwitterURLError("URL is invalid")

We first check whether tweet_url is a valid URL and that it actually points to a tweet. If it does, we then attempt to download the media from this tweet using the helper function download_media_from_tweet() - this naughty function returns either the local_path to the downloaded file, or None if the download failed for any reason. If the download is successful, we then pass the file’s local path to upload_to_ftp(). Assuming all goes well, the function returns the URL to the uploaded file. For every condition check, we’re also including an else clause.

That's a lot of indentation up there. At the innermost part, we're three levels deep.

Advantages of returning early

How would the function above look with early-return clauses: what if it checks for the "negative", falsey, or invalid scenarios first?

def upload_tweet_media(tweet_url: str) -> str:
    if not check_valid_tweet(tweet_url):
        raise TwitterURLError("URL is invalid")

    local_path = download_media_from_tweet(tweet_url)

    if not local_path:
        raise DownloadError("Couldn't download twitter media")

    try:
        url_to_file = upload_to_ftp(local_path)
        return url_to_file
    except ftputil.error.FTPOSError:
        raise FTPError("Couldn't upload to FTP server")

Here, we — only seemingly — flipped the order by which we check for invalid conditions. In reality, the outer if clauses are evaluated first anyway - we just changed the way the code is laid out.

The function is now flatter, shorter even with spacing, and cleaner to the eye. As for readability, I think this change only makes our intention clearer: unless the URL is invalid, and then unless the file couldn't be downloaded, try to upload it to the FTP. An added benefit is that our “happy-path” return value (url_to_file) is no longer indented three levels deep, and is clearly visible towards the end of the function.

Order (sometimes) matters

In the example above, the order by which we perform the checks matters. It's obvious: we shouldn't attempt to download a file if the URL isn't valid, so we should first check if the URL is invalid, and only then attempt the download.

However, it isn't always immediately evident that conditions are coupled. When refactoring code to return early, keep in mind to verify that dependent checks are performed in the right order. You're no longer guided by the mental hints of indentation.

It's only returning early if you actually return

Also remember that there has to be some kind of terminating statement in your return early clauses. In the example above, these are raise statements, but they could have been returns. The important thing is to halt the execution inside these clauses.
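To see why the terminating statement matters, here is a small sketch with made-up functions (average_broken and average are hypothetical and unrelated to the tweet example). The first version has a check that looks like a guard but doesn't halt execution; the second one actually returns.

```python
def average_broken(values: list[float]) -> float:
    if not values:
        print("no values given")  # logs, but execution continues
    return sum(values) / len(values)  # ZeroDivisionError for []


def average(values: list[float]) -> float:
    if not values:
        return 0.0  # the guard terminates the function here
    return sum(values) / len(values)


print(average([]))
print(average([1, 2, 3]))

try:
    average_broken([])
except ZeroDivisionError:
    print("average_broken fell through its 'guard'")
```

Returning 0.0 for an empty list is just one possible choice; raising an exception would be an equally valid way to terminate.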

Conclusion

It's not always possible and it doesn't always make sense to use early-returns. Where it does, they can eliminate multiple levels of indentation, make the code more readable, shorter, and the intention behind it clearer.

Django tutorial: as-you-type search with Ajax

Sun, 26 May 2019 00:00:00 GMT

Updated 24/03/2022: Django 4.0.3

This is a walkthrough tutorial on how to implement what's known as "incremental search" in a Django app. We want results to refresh (with a tiny delay) as the user types their search term. We'll also give a visual indication that the search is running by animating the search icon.

The source code for this tutorial is available on GitHub.

Our app

Don't worry, not another blog or to-do app. This time it's a music website that displays music albums and artists. The structure is taken from an actual web-app I've built but is simplified for this post's purposes. Here's a folder-only view:

django-ajax-search/
├── core
│   └── migrations
├── django-ajax-search
├── static
│   └── django-ajax-search
│       └── javascript
└── templates

Our project's root directory is called django-ajax-search, and we've created an app called core where we'll write most of our Django-related code. Make sure it's included in INSTALLED_APPS in your settings.py file.

Here's the models.py file:

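# core/models.py
from django.db import models
from django.utils import timezone


class MusicRelease(models.Model):
    title = models.CharField(max_length=560)
    release_date = models.DateField(blank=True, null=True)  # some releases don't have release-dates

    def __str__(self):
        return self.title

    @property
    def is_released(self):
        # assumption: a release without a date is treated as unreleased,
        # since None can't be compared to a date
        if self.release_date is None:
            return False
        return self.release_date < timezone.now().date()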


class Artist(models.Model):
    music_releases = models.ManyToManyField(MusicRelease, blank=True)
    name = models.CharField(max_length=560)

    def __str__(self):
        return f"{self.name} (release count: {self.music_releases.count()})"

Nothing fancy: each artist can have many music releases, and each release can have many artists. I've also added some helpful string representations to the Artist model.

High-level overview

We're going to let users search for artists in the database by name. Instead of a form with a submit button, we're going to refresh the results as the user types their query.

There are several moving pieces in this article so here's an outline of what we're going to do:

1. Briefly go over how HTTP GET parameters are handled in Django views and make our view capture the user's query.
2. Make the Django view handle Ajax requests and respond to them properly with a JSON response containing the new results.
3. Use JavaScript and jQuery to send an Ajax request to our view once the user starts typing in the HTML search box. This request will include the term so the server can return relevant results.
4. Once our view returns the JSON response, our JS code will use it to change the information presented to the user without a page-refresh.

Some of the concepts above will be discussed in detail and others will be covered briefly.

Dependencies and additional setup

Make sure jQuery (CDN link) is included inside the head tag of the base.html template. While they're not strictly required, I'll also be using Bootstrap 4 as a CSS framework and Font Awesome for the search icon, which we'll make blink when a search is being taken care of by the server.

Another thing to verify is that the JS file is included in base.html:

{# base.html #}
{% block footer %}
  
{% endblock %}

Again, you don't have to follow the structure religiously but if you're ever confused about where things belong, check out the GitHub repository.

While source code is shared in the GitHub repository as is, platform-specific styling is often omitted in the blocks below to keep them short and relatively portable. Styling isn't the point of this guide anyway.

Artists in our database

Let's also create artists to work with. The demo app is going to have three:

Chet Faker (release count: 1)
Queen (release count: 1)
Parker Sween (release count: 0)

I don't know who Parker Sween is.

The artists view

Here's the views.py file:

# core/views.py
from django.shortcuts import render

from .models import Artist


def artists_view(request):
    ctx = {}
    # value of the "q" GET parameter, or None if it wasn't passed
    url_parameter = request.GET.get("q")

    if url_parameter:
        artists = Artist.objects.filter(name__icontains=url_parameter)
    else:
        artists = Artist.objects.all()

    ctx["artists"] = artists

    return render(request, "artists.html", context=ctx)

This view is referenced like this in our urls.py:

# urls.py
from django.urls import path
from core import views as core_views

urlpatterns = [
    # ...
    path("artists/", core_views.artists_view, name="artists"),
]

So the path ourapp.com/artists/ is going to hit this view. Let's pick it further apart.

Capturing HTTP GET parameters

The first thing to make sure of is that the view captures the GET parameter we're going to send. Here's the relevant line:

def artists_view(request):
    # ...
    url_parameter = request.GET.get("q")

So, one way to pass information between clients and servers is using HTTP GET parameters:

https://www.somewebsite.com/some-page?name=josh

When a URL like the above is requested, the server will receive the GET parameter name alongside its value josh. It's up to the server to decide what to do with this parameter, if at all.

GET parameters are often referred to as query strings, URL parameters, and other combinations of the two.

In Django views, these URL GET parameters are made available in a special kind of dictionary — a QueryDict called GET. This QueryDict lives in the request object, the one every Django view accepts as its first argument. Going back to the line above:

def artists_view(request):
    # ...
    url_parameter = request.GET.get("q")

This means that our view will capture a GET parameter q. If it isn't passed at all, url_parameter will be None. The first GET is the dictionary itself, and the second get() is just the method used to retrieve a key's value from a dictionary.

Some examples of URLs requested and how they would map:

URL requested: https://ourapp.com/artists?q=Queen
url_parameter value: "Queen"

URL requested: https://ourapp.com/artists?q=Samba
url_parameter value: "Samba"

URL requested: https://ourapp.com/artists?q=Chet Faker (decoded)
url_parameter value: "Chet Faker"

URL requested: https://ourapp.com/artists/?q=Chet%20Faker (encoded)
url_parameter value: "Chet Faker"
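You can reproduce this mapping outside of Django with the standard library. The sketch below uses urllib.parse as a stand-in for Django's QueryDict (it is not what Django runs internally, and get_param is a hypothetical helper), but it shows the same decoding behavior.

```python
from urllib.parse import parse_qs, urlsplit


def get_param(url: str, key: str):
    """Return the value of a GET parameter, or None if it's absent."""
    query = urlsplit(url).query
    # keep_blank_values=True so that "?q=" yields "" rather than nothing
    values = parse_qs(query, keep_blank_values=True).get(key)
    # like QueryDict.get(), return the last value if the key repeats
    return values[-1] if values else None


urls = [
    "https://ourapp.com/artists/?q=Queen",
    "https://ourapp.com/artists/?q=Chet%20Faker",
    "https://ourapp.com/artists/",
]
for url in urls:
    print(f"{url} -> {get_param(url, 'q')!r}")
```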

You may be more used to seeing URL parameters directly appended to the URL path without a forward-slash, like artists?q=Queen rather than artists/?q=Queen. The first looks cleaner, yes, but requires some workarounds that are irrelevant to the subject at hand. In any case, both paths will resolve correctly given the above configuration.

Case insensitive filtering

Another portion to go over in artists_view:

# core/views.py
def artists_view(request):
    # ...
    if url_parameter:
        artists = Artist.objects.filter(name__icontains=url_parameter)
    else:
        artists = Artist.objects.all()

If url_parameter's value is neither None nor an empty string, it means that some string was passed after ?q= and we want to filter for Artist objects containing this string. Using icontains means the search will also be case-insensitive. For example: if url_parameter is KER, our view will return a QuerySet containing two of our artists: Chet Faker and Parker Sween. Queen won't be there.
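For intuition, here is a plain-Python analogue of that filter applied to our three artist names. In the real view the matching is done by the database through the ORM; filter_icontains is a hypothetical helper that only mimics the result.

```python
artist_names = ["Chet Faker", "Queen", "Parker Sween"]


def filter_icontains(names: list[str], term: str) -> list[str]:
    """Keep the names that contain term, ignoring case."""
    term = term.lower()
    return [name for name in names if term in name.lower()]


print(filter_icontains(artist_names, "KER"))
print(filter_icontains(artist_names, "een"))
```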

Template files

Our view renders the template file artists.html:

{# artists.html #}
{% extends "base.html" %}

{% block content %}
Artists



  {# icon and search-box #}
  
    
    
  

  {# artist-list section #}
  
    {% include 'artists-results-partial.html' %}
  


{% endblock %}

The first thing to note is that this template includes another template, artists-results-partial.html:

{# artists-results-partial.html #}
{% if artists %}
  
  {% for artist in artists %}
    {{ artist.name }}
  {% endfor %}
  
{% else %}
  No artists found.
{% endif %}

Including the artist-list in a separate template partial doesn't only yield better readability; more importantly, it will allow us to more easily refresh this part (and this part only) of the page using JavaScript & jQuery. Also, take note of the HTML id attributes we assign to each of the search icon, the input field, and the div holding our artist list. We will use these values later when we target these elements for manipulation with jQuery.

Making the view respond to Ajax requests

Before we get to the JS code, there's one last addition we need to make in artists_view so it responds to Ajax requests:

from django.http import JsonResponse
from django.shortcuts import render
from django.template.loader import render_to_string


def artists_view(request):
    # ...earlier code
    # assumed definition: True if the client accepts a JSON response
    # (HttpRequest.accepts is available in Django 3.1+)
    does_req_accept_json = request.accepts("application/json")
    is_ajax_request = (
        request.headers.get("x-requested-with") == "XMLHttpRequest"
        and does_req_accept_json
    )

    if is_ajax_request:
        html = render_to_string(
            template_name="artists-results-partial.html",
            context={"artists": artists}
        )

        data_dict = {"html_from_view": html}

        return JsonResponse(data=data_dict, safe=False)

    return render(request, "artists.html", context=ctx)

We first check if the request was made via an Ajax call. If so, we want to return a JsonResponse to the browser. But what are we returning, exactly?

We're passing JsonResponse a dictionary we've constructed, called data_dict. It has a single key html_from_view. This key's value is going to be the variable html.

html is our template artists-results-partial.html rendered as a string. It literally is the HTML output of our artist-list. We provide Django's render_to_string() a template to use and a context dictionary, and it returns to us that template as a string given the context it was fed. If it's not clear yet, here's an example:

In the view, if the variable artists is this QuerySet:

<QuerySet [<Artist: Chet Faker (release count: 1)>]>

Then these lines:

html = render_to_string(
    template_name="artists-results-partial.html",
    context={"artists": artists}
)
print(html)

Will print the following:


  Chet Faker

You can see where this is going by now: using JS and jQuery, we can pass whatever the user is typing in the input box to our view as a GET parameter, filter by that string, and then return a JSON response with the new HTML to the browser where it will replace the old HTML.
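To make the shape of the response concrete, here is a rough standard-library sketch of the JSON body the browser would receive. json.dumps stands in for JsonResponse's serialization, and the markup in html is hypothetical (the real partial's HTML may differ).

```python
import json

# hypothetical rendered partial; stands in for render_to_string()'s output
html = "<ul><li>Chet Faker</li></ul>"

# roughly what ends up in the response body
payload = json.dumps({"html_from_view": html})
print(payload)

# what the client sees after parsing the JSON
decoded = json.loads(payload)
print(decoded["html_from_view"])
```

On the client side, the value under html_from_view is what gets swapped into the page.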

Implementing Ajax search

We're going to send an Ajax request to the server. Once we get a JSON response back, we'll use jQuery to manipulate the relevant HTML elements. I can't possibly go in-depth on each piece of functionality here as that's beyond the scope of this post but I'll try to at least explain the bigger picture.

Ajax (AJAX) stands for Asynchronous JavaScript and XML. The key word here is asynchronous: it allows data to be sent and received between clients (browsers) and servers without the need to reload the entire page.

jQuery is one of the most popular JavaScript libraries, to the point where some mistake it for a language of its own. Its mission statement is "to allow developers to do more while writing less code".

The code below is written in the ES6 syntax of JavaScript and may not work on a minority of browsers like Internet Explorer. If you want to support those you'll need to use a transpiler or employ a polyfill.

Here's the full JavaScript code:

const user_input = $("#user-input");
const search_icon = $("#search-icon");
const artists_div = $("#replaceable-content");
const endpoint = "/artists/";
const delay_by_in_ms = 700;
let scheduled_function = false;

let ajax_call = function (endpoint, request_parameters) {
  $.getJSON(endpoint, request_parameters).done((response) => {
    // fade out the artists_div, then:
    artists_div
      .fadeTo("slow", 0)
      .promise()
      .then(() => {
        // replace the HTML contents
        artists_div.html(response["html_from_view"]);
        // fade-in the div with new contents
        artists_div.fadeTo("slow", 1);
        // stop animating search icon
        search_icon.removeClass("blink");
      });
  });
};

user_input.on("keyup", function () {
  const request_parameters = {
    q: $(this).val(), // value of user_input: the HTML element with ID user-input
  };

  // start animating the search icon with the CSS class
  search_icon.addClass("blink");

  // if scheduled_function is NOT false, cancel the execution of the function
  if (scheduled_function) {
    clearTimeout(scheduled_function);
  }

  // setTimeout returns the ID of the function to be executed
  scheduled_function = setTimeout(
    ajax_call,
    delay_by_in_ms,
    endpoint,
    request_parameters
  );
});

Let's look at the first few lines:

const user_input = $("#user-input");
const search_icon = $("#search-icon");
const artists_div = $("#replaceable-content");

Remember how we gave some of the HTML elements in artists.html an ID attribute? Here, we’re using a jQuery selector to save those elements as variables so we can more easily refer to them later in the code. All jQuery selectors start with a dollar sign, with the selector string enclosed in parentheses.

We’re then initializing some additional variables:

const endpoint = "/artists/";
const delay_by_in_ms = 700;
let scheduled_function = false;

The first one is the relative path to the endpoint we’re going to make our Ajax request to. Note that this has to be a path where a Django URL is defined and we should always use the URL path because JavaScript knows nothing about Django's named URLs or views.

scheduled_function and delay_by_in_ms are explained later.

After the variables, we define the function ajax_call() which we invoke towards the end of the code:

let ajax_call = function (endpoint, request_parameters) {
  $.getJSON(endpoint, request_parameters).done((response) => {
    // fade out the artists_div, then:
    artists_div
      .fadeTo("slow", 0)
      .promise()
      .then(() => {
        // replace the HTML contents
        artists_div.html(response["html_from_view"]);
        // fade-in the div with new contents
        artists_div.fadeTo("slow", 1);
        // stop animating search icon
        search_icon.removeClass("blink");
      });
  });
};

This one takes two arguments, endpoint and request_parameters. It then uses jQuery's getJSON() method to send an Ajax request to the endpoint alongside the parameters. When it's done, it's going to give us an object we call response. We then fade out artists_div, replace its contents with response['html_from_view'], and fade it back in. If you're confused about where html_from_view is coming from, go back to the view code responsible for handling Ajax requests.

Once the function is defined, we're using jQuery’s on() to bind a function to each keyup event that happens on user_input:

user_input.on("keyup", function () {
  // our code
});

This means that each time a keyboard key is released (after being pressed) inside user_input, the function is run. Let's inspect this function's body:

const request_parameters = {
  q: $(this).val(), // value of user_input: the HTML element with ID user-input
};

The first step is getting the value inside the input field. This is the string the user has typed so far. We save it inside an object request_parameters where its key is q.

Next, we add the blink CSS class to our search icon:

// start animating the search icon with the CSS class
search_icon.addClass("blink");

This lets the user know we’re doing something with their request. The search icon will blink indefinitely as long as it has this class. That's why we remove it at the end of ajax_call().

Here's the CSS code defining blink:

@keyframes blinker {
  from {
    opacity: 1;
  }
  to {
    opacity: 0;
  }
}

.blink {
  text-decoration: blink;
  animation-name: blinker;
  animation-duration: 0.6s;
  animation-iteration-count: infinite;
  animation-timing-function: ease-in-out;
  animation-direction: alternate;
}

Making sure the server isn't hammered

Now, to the setTimeout/clearTimeout part:

// if scheduled_function is NOT false, cancel the execution of the function
if (scheduled_function) {
  clearTimeout(scheduled_function);
}

// setTimeout returns the ID of the function to be executed
scheduled_function = setTimeout(
  ajax_call,
  delay_by_in_ms,
  endpoint,
  request_parameters
);

setTimeout() is a built-in JavaScript function that delays a function execution by a predefined duration (specified in milliseconds). It returns an ID of the function that is scheduled for execution. Here's its signature:

setTimeout(func, delay_in_ms, func_param1, func_param2, ...)

Compare this with the parameters we're passing above and you can see that the code is scheduling the ajax_call() function to execute after 700 milliseconds.

But why introduce a delay?

Because if we actually hit the server every time the keyup event is registered, we're going to flood it with too many requests in a short amount of time. Here's an example of a naïve implementation that doesn't use setTimeout() and clearTimeout(). I've added a logging message that prints to the console each time a request is made:

[Video: typing into the search box while the console logs a request for every keystroke]

So yeah, we don't want to hammer the server with every keystroke like that. That's what we utilize setTimeout() for. But that's only one part of the puzzle. With setTimeout(), all we're doing is delaying the execution of each query by 700ms. What we really want to do is send a request only after the user has ceased typing for a bit. That's where clearTimeout() comes in.

clearTimeout() is another built-in function. Given a function ID returned by setTimeout(), it cancels the execution of that function if it hasn't already been executed.

Now let's look at that piece of code again:

// if scheduled_function is NOT false, cancel the execution of the function
if (scheduled_function) {
  clearTimeout(scheduled_function);
}

// setTimeout returns the ID of the function to be executed
scheduled_function = setTimeout(
  ajax_call,
  delay_by_in_ms,
  endpoint,
  request_parameters
);

The above code block ensures an Ajax call is sent to our server only once the user has paused typing for 700 milliseconds, so requests are never less than 700 milliseconds apart.

Since we initialized scheduled_function to false, the first time the if statement is evaluated it skips clearTimeout(), immediately schedules our Ajax call to execute after 700ms, and stores the ID returned by setTimeout() in scheduled_function.

Now, if the user types another letter within that very short timespan (less than 700 milliseconds), scheduled_function is truthy, so the if condition evaluates to true and clearTimeout() cancels the call that was scheduled for execution. Immediately after that, a new Ajax call is scheduled, and the cycle continues. Once 700 milliseconds pass without the user typing anything, ajax_call() is executed normally.

If you're still grappling with this concept, try to think of setTimeout() as scheduleFunction() and clearTimeout() as cancelScheduledFunction().
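For readers who think more easily in Python, here is a rough sketch of the same schedule-then-cancel idea using threading.Timer. It is only an analogy, not part of the app: the Debouncer class, the 0.2 second delay, and the sent list are all made up for illustration.

```python
import threading
import time


class Debouncer:
    """Run func only after `delay` seconds pass without a new call."""

    def __init__(self, func, delay):
        self.func = func
        self.delay = delay
        self._timer = None  # plays the role of scheduled_function

    def __call__(self, *args):
        # Counterpart of clearTimeout(): cancel the pending call, if any
        if self._timer is not None:
            self._timer.cancel()
        # Counterpart of setTimeout(): schedule a fresh call
        self._timer = threading.Timer(self.delay, self.func, args)
        self._timer.start()


sent = []  # stands in for the requests that reach the server
search = Debouncer(sent.append, delay=0.2)

for typed in ("c", "ch", "che"):  # three quick "keystrokes"
    search(typed)

time.sleep(0.6)  # wait long enough for the last timer to fire
print(sent)  # only the final query got through
```

Each new "keystroke" cancels the previous timer, so only the last value survives the quiet period.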

Summary

Nowadays using Django as a backend with a frontend framework like Vue.js or React is all the rage, but sometimes all that's needed for interactivity in a “classic” Django app is some JavaScript and jQuery knowledge.

You can clone the GitHub repository and play around with the working code if you feel you need a better understanding of some of the concepts outlined.

A more Pythonic dictionary

Fri, 10 May 2019 00:00:00 GMT

Dictionaries are versatile, fast, and efficient. This post will cover two dictionary-related features that I feel don't get enough attention: setdefault and defaultdict. They're presented together to highlight both the differences and the similarities between them.

Use case: how many views did each article get?

Here's a simplified real-world scenario: a call to Google Analytics' API returns the following list of lists where each sub-list represents an article: the first item is the article's ID and the second one is its view count. Some article IDs may appear in more than one sub-list, and we want to sum the view counts for each distinct article:

received_list = [
    [1678, 30],  # 1678 is the ID, 30 is the view count
    [1987, 99],
    [1822, 50],
    [1678, 22],  # ID already appears
    [2299, 30],
    [1987, 100],  # ID already appears
]

If you know some Python, this should be pretty simple:

articles_and_views = {}

for each_list in received_list:
    article_id = each_list[0]
    article_views = each_list[1]

    if articles_and_views.get(article_id):
        articles_and_views[article_id] += article_views
    else:
        articles_and_views[article_id] = article_views

This if block is your standard "check whether some key is in dictionary" code. If it is, then we increment its corresponding value by article_views; if the key isn't already in the dictionary, we create it by assignment.

The output is correct as article 1678 appeared twice, first with 30 views and then with 22:

{1678: 52, 1987: 199, 1822: 50, 2299: 30}

The example above is a simple one. This is so this post can focus more on what setdefault and defaultdict do, and less on the underlying data structures. In other scenarios you may be operating inside nested dictionaries, nested lists, and even more complicated structures. That's where these two will often come in handy.

setdefault

setdefault is a dictionary method, just like get. In fact, you can think of it as a get combined with a conditional set: get a key's value, but if the key isn't present in the dictionary, create it with the default value provided:

my_dict.setdefault(k, v_if_not_k)
# k: the key to search for
# v_if_not_k (optional): value to assign to the previously non-existent key after creating it

In our case, we can utilize this to get rid of the if clause:

articles_and_views = {}

for each_list in received_list:
    article_id = each_list[0]
    article_views = each_list[1]
    articles_and_views.setdefault(article_id, 0)
    articles_and_views[article_id] += article_views

That's because there's a hidden if inside of setdefault. We're asking the dictionary articles_and_views: "did you see this article_id in your keys before? if so, give us that key's value. If not, create this key and set its value to 0". The default value can of course be a number other than 0, a list, or any other object. If you don't provide this second argument at all, the default value will be None.

Using setdefault makes sure that when we get to this next line:

articles_and_views[article_id] += article_views

article_id is undoubtedly an existing key in the dictionary. Either we just initialized it with a value of 0, or it had already existed before, so setdefault did not alter it. In any case, we can now increment its value safely.

In this case, we're not using the value returned by setdefault, but it's good to keep in mind it is available if needed.
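To make the return value concrete, here's a small sketch reusing one article ID from the list above. The variable names first, second, and missing are only for illustration.

```python
articles_and_views = {}

# Key is missing: it is created with the default and the default is returned
first = articles_and_views.setdefault(1678, 0)
articles_and_views[1678] += 30

# Key exists: the default is ignored and the current value is returned
second = articles_and_views.setdefault(1678, 0)

# No second argument: the key is created with None
missing = articles_and_views.setdefault(1987)

print(first, second, missing)  # 0 30 None
print(articles_and_views)      # {1678: 30, 1987: None}
```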

While it's not unique to setdefault, there's one important thing to stress about this method: you can't assign to its return value. Meaning, this won't work:

articles_and_views.setdefault(article_id, 0) += article_views
# SyntaxError: can't assign to function call

If you're confused by this, remember that setdefault is a method (function), and you can't assign (=) to the result of a function call. The above snippet is comparable to this one (which is hopefully more obviously incorrect):

# a function/method on the left?!
n = -50
abs(n) += 25

# SyntaxError: can't assign to function call

However, you certainly can do something like this with setdefault if you wanted to simply append each article_views to a list instead of adding them up:

# notice the default value is now a list
articles_and_views = {}

for each_list in received_list:
    article_id = each_list[0]
    article_views = each_list[1]
    articles_and_views.setdefault(article_id, []).append(article_views)

This code will work and articles_and_views ends up looking like this:

{1678: [30, 22], 1987: [99, 100], 1822: [50], 2299: [30]}

Every time you need a default value inside of a dictionary, consider setdefault. It will save you time and logical overhead. I found it especially useful for unifying external data:

employees_from_api = [
    {"name": "Britney", "age": 32, "bonus": 1500},
    {"name": "Jeff", "age": 32, "bonus": 2400},
    {"name": "Benjamin", "age": 21}, # no bonus
]

for employee in employees_from_api:
    # returns the existing bonus, or sets it to 500 and returns that
    bonus = employee.setdefault("bonus", 500)
    print(f"{employee['name']}'s yearly bonus is {bonus}")

Output:

Britney's yearly bonus is 1500
Jeff's yearly bonus is 2400
Benjamin's yearly bonus is 500

defaultdict

defaultdict is a subclass of dict and can be imported from the standard library's collections module:

from collections import defaultdict

For the most part, defaultdict behaves just like dict, but it has one distinct feature: if provided with a valid callable as its first argument (more on this later), it never raises a KeyError when accessing non-existing keys; instead, it creates those.

This should help demonstrate this:

>>> regular_dict = {}
>>> regular_dict['non_existent_key']
KeyError: 'non_existent_key'

>>> from collections import defaultdict
>>> int_defaultdict = defaultdict(int)
>>> int_defaultdict['non_existent_key']
0

>>> list_defaultdict = defaultdict(list)
>>> list_defaultdict["non_existent_key"]
[]

>>> dict_defaultdict = defaultdict(dict)
>>> dict_defaultdict["non_existent_key"]
{}

To apply it to our example:

from collections import defaultdict

articles_and_views = defaultdict(int)

for each_list in received_list:
    article_id = each_list[0]
    article_views = each_list[1]
    articles_and_views[article_id] += article_views

We've eliminated 3/6 lines compared to the same implementation with the if block. The code is cleaner, no less readable, and a lot more Pythonic.

defaultdict creates a key and uses the callable to set its value only if that key doesn't already exist in the dictionary. In this case the callable is int, which returns 0 when invoked (and remember it will be invoked only when article_id does not exist as a key in the dictionary).

>>> print(articles_and_views)
defaultdict(<class 'int'>, {1678: 52, 1987: 199, 1822: 50, 2299: 30})

As you can see, the representation of a defaultdict is different from that of a regular dictionary. The former also specifies the callable it uses, or as the Python docs define it: the default_factory (in this case: int). We can always get the regular representation by converting it back to a dict:

>>> print(dict(articles_and_views))
{1678: 52, 1987: 199, 1822: 50, 2299: 30}

Default factory must be a callable

Say we now wanted to boost our ego (or avoid getting fired) and start each article's view count at 1,000:

articles_and_views = defaultdict(1000)

The above will return an error:

TypeError: first argument must be callable or None

defaultdict needs a callable "default factory", and we gave it the integer 1000 which is...not callable.

So let's give it a callable:

def return_one_thousand():
    return 1000

articles_and_views = defaultdict(return_one_thousand)

Notice that we are not calling the function return_one_thousand (no parentheses after its name) because that would defeat the purpose: we'd be handing defaultdict the integer 1000 again. Instead, it's defaultdict that will call it each time it needs to create a missing key. The function return_one_thousand is of course a callable so defaultdict doesn't complain.
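If you'd like to see when the factory actually gets called, here's a small sketch with a callable that counts its own invocations. CountingFactory is a made-up helper for this demonstration, not something from the collections module.

```python
from collections import defaultdict


class CountingFactory:
    """Hypothetical callable that records how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return 1000


factory = CountingFactory()
articles_and_views = defaultdict(factory)  # passed, not called

articles_and_views[1678] += 52   # missing key: factory is called
articles_and_views[1678] += 10   # existing key: factory is not called
articles_and_views[1987] += 199  # another missing key: called again

print(factory.calls)             # 2
print(dict(articles_and_views))  # {1678: 1062, 1987: 1199}
```

Any object that can be called with no arguments will do, which is why a class instance works here just as well as a function.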

If we only want to return a simple value we don't have to define a function and can simply use a lambda:

articles_and_views = defaultdict(lambda: 1000)

Both the return_one_thousand and the lambda implementations will return the following:

{1678: 1052, 1987: 1199, 1822: 1050, 2299: 1030}

So how come int, dict, and list worked? Try invoking int and see what you get:

>>> int()
0

When you need a certain default behavior with a dictionary, consider defaultdict. It will often yield cleaner code than setdefault.

Summary

setdefault and defaultdict's usages can overlap, but they are different tools: the former is a method that works on a key-by-key basis and the latter is a subclass of the "regular" Python dict class. It's good to remember that the convenience offered by defaultdict — never raising a KeyError — can be a double-edged sword.
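Here's a short sketch of that double-edged sword. The mistyped ID 1687 and the ID 9999 are invented for the example.

```python
from collections import defaultdict

views = defaultdict(int)
views[1678] += 52

# A mistyped lookup doesn't raise KeyError; it silently creates the key
typo_value = views[1687]
print(typo_value)     # 0
print(1687 in views)  # True
print(len(views))     # 2

# .get() and membership tests don't trigger the default factory
print(views.get(9999))  # None
print(9999 in views)    # False
```

So when you only want to peek at a key without creating it, get or in is the safer choice.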

Grouping in Django templates

Sun, 28 Apr 2019 00:00:00 GMT

I've recently deployed a tiny changelog app in one of my Django projects. The models.py file looks like this:

# changelog/models.py (truncated)
class ChangeLog(models.Model):

    IMPROVEMENT = ('improvement', 'improvement')
    FEATURE = ('feature', 'feature')
    BUG = ('bugfix', 'bug fix')

    CHOICES = (IMPROVEMENT, FEATURE, BUG,)

    title = models.CharField(max_length=560)
    description = models.TextField(null=True, blank=True)
    category = models.CharField(choices=CHOICES, max_length=215)
    display_date = models.DateTimeField(editable=True)

Nothing special so far. The only slight oddity here is display_date: unlike what its name suggests, it's actually a datetime field.

In this app's main template, I wanted to sort (in reverse order) and group items by the date portion of their display_date so the output would be something like this:


<div>
  <h3>March 17, 2019</h3>
  changelog #2 created on this date
  changelog #1 created on this date
</div>

<div>
  <h3>March 15, 2019</h3>
  changelog #1 created on this date
</div>

So, ChangeLog objects that have the same date should all be inside the same div. This is the view I had wired up at the time:

# views.py
def changelog_index(request):
    changelog_items = ChangeLog.objects.order_by('-display_date')

    context = {
        'changelog_items': changelog_items
    }

    return render(request, 'changelog_index.html', context)

order_by takes care of sorting the changelog items in reverse chronological order. But there's a step missing here: how to group these changelog items by date?

Grouping in the view

One way is to group inside the view:

# views.py - grouping in view
def changelog_index(request):
    changelogs = ChangeLog.objects.order_by('-display_date')

    dates_and_items = {}

    for changelog in changelogs:
        current_key = changelog.display_date.date()  # the item's date
        dates_and_items.setdefault(current_key, []).append(changelog)

    context = {'dates_items': dates_and_items}

    return render(request, 'changelog_index.html', context)

Don't worry if you don't get what setdefault is doing, just know that this view creates a dictionary with dates as keys, and each such key holds a list of ChangeLog objects belonging to that date.
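If you want to see the shape of that dictionary without a database, here's a sketch in plain Python. Entry is a hypothetical stand-in for ChangeLog, and the titles and timestamps are made up; the list is already in reverse chronological order, as order_by would return it.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for the ChangeLog model
Entry = namedtuple("Entry", ["title", "display_date"])

entries = [
    Entry("changelog #2", datetime(2019, 3, 17, 18, 30)),
    Entry("changelog #1", datetime(2019, 3, 17, 9, 0)),
    Entry("changelog #1", datetime(2019, 3, 15, 12, 0)),
]

dates_and_items = {}
for entry in entries:
    current_key = entry.display_date.date()  # drop the time portion
    dates_and_items.setdefault(current_key, []).append(entry.title)

for day, titles in dates_and_items.items():
    print(day, titles)
```

Since dictionaries keep insertion order (Python 3.7+), the dates come out in the same order the sorted items went in.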

And then changelog_index.html would include something like this:

{# changelog_index.html - grouping in view #}
{% for date, item_list in dates_items.items %}
  <div>
    <h3>{{ date }}</h3>
    {% for changelog in item_list %}
      {{ changelog.title }} - {{ changelog.description }}
    {% endfor %}
  </div>
{% endfor %}

Here we are iterating over each date in our dictionary, and in the nested for loop, we iterate over this key’s item_list.

Grouping in the template

The other option is to leave the views.py file untouched. Reminder:

# views.py - grouping in template
def changelog_index(request):
    changelog_items = ChangeLog.objects.order_by('-display_date')

    context = {
        'changelog_items': changelog_items
    }

    return render(request, 'changelog_index.html', context)

And use Django’s built-in {% regroup %} tag:

{# changelog_index.html - grouping in template #}
{% regroup changelog_items by display_date.date as dates_items %}
{% for date in dates_items %}
  <div>
    <h3>{{ date.grouper }}</h3>
    {% for changelog in date.list %}
      {{ changelog.title }} - {{ changelog.description }}
    {% endfor %}
  </div>
{% endfor %}

Notice that the markup is almost identical to the previous template snippet. Let's go over the differences:

{% regroup changelog_items by display_date.date as dates_items %}

regroup is an aptly named tag. It takes a list-like collection, and regroups it by a common attribute. Above, we’re regrouping the changelog_items QuerySet by its items’ display_date.date, and calling this regrouped collection dates_items which we can then use in the for loop.[^1]

If we wanted to regroup changelogs by category, we'd write:

{# group by each changelog item's category #}
{% regroup changelog_items by category as cats_items %}

Anyway...we take this regrouped collection and iterate over it like so:

{# changelog_index.html - grouping in template, continued #}
{% for date in dates_items %}
  <div>
    <h3>{{ date.grouper }}</h3>
    {% for changelog in date.list %}
      {{ changelog.title }} - {{ changelog.description }}
    {% endfor %}
  </div>
{% endfor %}

Of special note are date.grouper in the H3 tag, and the date.list we iterate over in the nested for loop. These are objects that regroup creates: grouper is the item that was grouped-by, and list is the list of objects that belong to this group.

You can think of grouper as a key in the dictionary, and list as the value, which is a list of items belonging to that “key”. In our case, each grouper is a distinct date, which has a list of changelog items.

Important caveat

Note that {% regroup %} itself does not sort the collection it regroups. In our case, the ChangeLog objects were sorted in the view, so regroup works as expected. If they weren't, regroup would create duplicate sections with the same date.

But there is a way to sort in the template, using the dictsort/dictsortreversed template filters:

{% regroup changelog_items|dictsortreversed:"display_date" by display_date.date as sorted_dates %}

Here, receiving an unordered collection changelog_items, we sort by display_date in descending order (from latest to earliest), and then group by the display_date.date.[^2]
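The caveat is easier to see outside of a template. Python's itertools.groupby behaves in a comparable way (I'm using it here as an analogy only): it merges only neighbouring items, so unsorted input produces repeated groups. The date strings below are made up.

```python
from itertools import groupby

# Hypothetical, deliberately unsorted dates
unsorted_dates = ["2019-03-17", "2019-03-15", "2019-03-17"]

# Only neighbours are merged, so March 17 shows up as two separate groups
naive = [(day, len(list(items))) for day, items in groupby(unsorted_dates)]
print(naive)

# Sorting first (latest to earliest) yields one group per date
ordered = sorted(unsorted_dates, reverse=True)
grouped = [(day, len(list(items))) for day, items in groupby(ordered)]
print(grouped)
```

The first list has three groups and the second has two, which is the difference between duplicate sections and the output we actually want.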

Where to group?

I don't claim to know the definitive answer, and I don't think there is one. In the case above, grouping in the template involved less effort and took less time to write. One more possible case to utilize regroup is when you want to group the same QuerySet by different attributes in the same view. In other cases, different considerations (like speed) may favor grouping in the view. Always weigh and balance.

As I wrote earlier this month, I generally prefer my templates to be as dumb as possible, but every rule has its exception, and it's good to have regroup in one's arsenal when the situation calls for it.

[^1]: Note that we never passed dates_items from the view.
[^2]: Using display_date (a datetime) in dictsortreversed means that while items are grouped by dates, more recent items within the same date are displayed first.

macOS migrations with Brewfile

Mon, 22 Apr 2019 00:00:00 GMT

Perhaps the most dreaded aspect of setting up a new machine is the time spent reinstalling apps and reapplying all of the customizations from the previous one. As my MacBook Pro is about to turn six, I had been looking for a way to automate this process. At least for the applications part, I recently found a good solution (that’s apparently been around for a while).

This post is about using a Brewfile to migrate macOS packages and applications. If you're already versed in the world of Homebrew and Homebrew Bundle, you might find it overly verbose. It’s written from a beginner’s perspective as up until recently I wasn't too familiar with the concept myself.

Brewfile in a nutshell

A Brewfile contains instructions on which packages, command-line utilities, and applications to install on a macOS system. Here's a short snippet:

# Brewfile snippet

# install Python and SQLite
brew "python"
brew "sqlite"

# install 1Password, Pages, and Drafts from the Mac App Store
mas "com.agilebits.onepassword-osx", id: 443987910 # 1Password
mas "com.apple.iWork.Pages", id: 409201541 # Pages
mas "com.agiletortoise.Drafts-OSX", id: 1435957248 # Drafts


# install the apps below from Homebrew's repository
cask "carbon-copy-cloner"
cask "dropbox"
cask "vlc"

If I were to "run" this Brewfile, it would install the Python and SQLite packages, then 1Password, Pages, and Drafts from the Mac App Store, and finally Carbon Copy Cloner, Dropbox, and VLC from Homebrew’s repository (which usually pulls them from their respective websites). All apps are installed in the Applications folder by default, but the ability to differentiate between App Store and non App Store applications is significant in my case.

This is already faster than doing any of these steps manually. What's more, a Brewfile can be generated automatically so you’d rarely need to write the lines above one-by-one.

Why Brewfile

Because the alternatives aren't as good.

Cloning: Using the excellent Carbon Copy Cloner to clone my old HD to the new one would theoretically be the quickest way to get going, but after 6 years, I imagine there's more than a little cruft in my system files, and recent changes to Apple’s hardware make this option even less attractive. There are also apps on my current machine that I actually don't want to move over.

Time Machine and/or Migration Assistant: Migration Assistant hasn't been known for its reliability lately, and Time Machine backups are no less problematic. Listing the advantages and drawbacks is beyond the scope of this post, but if you want to read more about the pros and cons of each migration strategy, Jason Snell does a good job on that.

Starting fresh: Nothing could go wrong, but a lot of time would be spent on configuration and installing apps.

A detour

To understand what a Brewfile does and how it can fit in a migration strategy, it's good to be familiar with the moving parts that make it useful. This is not an exhaustive overview, but rather an introduction to each.

Homebrew

In the beginning, there was Homebrew, a package manager created by Max Howell in 2009. After installing homebrew, you can open the Terminal and install packages easily and quickly:

# installs ffmpeg, a popular command-line package, on macOS
$ brew install ffmpeg

# now that ffmpeg is installed, we can use it:
$ ffmpeg -i input.mp4 output.avi

Behind the scenes, brew is using what it calls a "formula" to install the ffmpeg package. This formula is a piece of code that’s responsible for holding all the information required to install ffmpeg: its name, version, URL to the source files that should be downloaded, and other packages that ffmpeg needs in order to operate.

Homebrew not only makes it easy to install packages, but also to maintain them:

# upgrades ffmpeg
$ brew upgrade ffmpeg

# upgrades all outdated formulae
$ brew upgrade

# update Homebrew itself and its list of available formulae (installed packages are not upgraded)
$ brew update

# uninstalls ffmpeg
$ brew uninstall ffmpeg

And to discover them:

# search for youtube-dl
$ brew search youtube-dl

# get info about youtube-dl
$ brew info youtube-dl

homebrew is very nice indeed. It's lauded for its ease-of-use, documentation and helpful command-line feedback.

Homebrew Cask

homebrew-cask is like homebrew, but for macOS apps, fonts, plugins, and other non-open source software. If brew install [formula-name] installs a package corresponding to that formula's name, then brew cask install [cask-appname] installs an application with that cask's name:

# install firefox
$ brew cask install firefox

# install slack
$ brew cask install slack

By default, it places installed apps in the Mac's Applications directory. You can search for casks the same way you search for formulae:

$ brew search firefox

# Output:
==> Casks
firefox
multifirefox
homebrew/cask-versions/firefox-beta
homebrew/cask-versions/firefox-developer-edition
homebrew/cask-versions/firefox-esr
homebrew/cask-versions/firefox-nightly

But where is firefox coming from here? How does brew cask install firefox know what to install?

$ brew cask info firefox

# Output:
firefox: 66.0.3 (auto_updates)
https://www.mozilla.org/firefox/
Not installed
From: https://github.com/Homebrew/homebrew-cask/blob/master/Casks/firefox.rb

==> Name
Mozilla Firefox

==> Languages
cs, de, en-GB, en, eo, es-AR, es-CL, es-ES, fi, fr, gl, in, it, ja, ko, nl, pl, pt-BR, pt, ru, tr, uk, zh-TW, zh

==> Artifacts
Firefox.app (App)

A few pieces of information here:

firefox: 66.0.3 is the version we can expect homebrew-cask to install.
From: holds the URL where the cask lives. If you inspect it you'll see that somewhere in there is also the URL one would go to in the browser when installing Firefox the “regular” way. There's no magic here.
Install options, like Languages. Running brew cask install firefox --language=it will install Firefox in Italian.

Indeed, homebrew-cask is very, very nice.

Mac App Store command line interface

There's one more tool that we need to cover before Brewfile: mas-cli is a simple command line interface for the Mac App Store (MAS). It can't install apps that you haven't downloaded or purchased before, but it will allow you to upgrade those that you have installed, and download apps tied to your iCloud account:

# search for 1Password
$ mas search 1Password

# Output:
1333542190  1Password 7 - Password Manager (7.2.5)

# install 1Password by its app identifier
$ mas install 1333542190

# upgrade all apps that have pending updates
$ mas upgrade

# upgrade 1Password
$ mas upgrade 1333542190

mas-cli may not seem terribly useful at first glance, but it was the missing piece in my migration strategy since it provides a way to capture all Mac App Store apps currently installed:

# list all apps installed through the Mac App Store
$ mas list

# Output (truncated)
1225570693 com.ulyssesapp.mac (15.2)
986304488 com.zive.kiwi (2.0.18)
422304217 com.dayoneapp.dayone (1.10.6)
# ^identifier
#         ^bundle name
#                               ^version

Yes, mas-cli is nifty.

So, brew install, brew cask install, and mas install make things a lot faster. The next step is to find a way to automate the generation and execution of these commands.

Homebrew Bundle

homebrew-bundle is an extension of homebrew and is installed as soon as the command brew bundle is first used. It's the glue that brings everything together.

Run brew bundle dump and Homebrew Bundle will generate a file called Brewfile listing all of the installed brew packages, cask applications, and Mac App Store applications currently on the machine. If, on the other hand, you run brew bundle from a folder that contains a Brewfile, it will install everything listed in that file.

So, given a Brewfile with the following content:

# install Python and SQLite
brew "python"
brew "sqlite"

# install 1Password, Pages, and Drafts from the Mac App Store
mas "com.agilebits.onepassword-osx", id: 443987910 # 1Password
mas "com.apple.iWork.Pages", id: 409201541 # Pages
mas "com.agiletortoise.Drafts-OSX", id: 1435957248 # Drafts


# install the apps below from their own respective websites
cask "carbon-copy-cloner"
cask "dropbox"
cask "vlc"

Running brew bundle from the same directory where Brewfile is located will install the above packages and applications.

Notice that the Brewfile syntax differs from the commands you'd usually type in the Terminal. This table should help:

Terminal command	Brewfile
`brew install [formulaName]`	`brew "[formulaName]"`
`brew cask install [caskName]`	`cask "[caskName]"`
`mas install [identifier]`	`mas "[bundleIdentifier]", id: [identifier]`
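The same mapping can be written as a small function. This is a rough sketch of the table above, nothing more: to_brewfile_line is a hypothetical name, and because a mas entry needs a bundle identifier that the terminal command doesn't carry, it has to be supplied separately.

```python
def to_brewfile_line(command, bundle_identifier=None):
    """Translate one install command into its Brewfile entry, per the table."""
    words = command.split()
    if words[:2] == ["brew", "install"] and len(words) == 3:
        return f'brew "{words[2]}"'
    if words[:3] == ["brew", "cask", "install"] and len(words) == 4:
        return f'cask "{words[3]}"'
    if words[:2] == ["mas", "install"] and len(words) == 3:
        if bundle_identifier is None:
            raise ValueError("a mas entry also needs the bundle identifier")
        return f'mas "{bundle_identifier}", id: {words[2]}'
    raise ValueError(f"unsupported command: {command!r}")


print(to_brewfile_line("brew install python"))
print(to_brewfile_line("brew cask install vlc"))
print(to_brewfile_line("mas install 409201541", "com.apple.iWork.Pages"))
```

In practice brew bundle dump does this for you; the function is only here to make the syntax difference concrete.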

I think you know where this is going by now: run brew bundle dump on the current machine, copy the Brewfile generated to the new one, run brew bundle, and Homebrew will take it from there. If you have lots of apps and packages the process will take some time, but nowhere near the time (or effort) it would have taken to do manually.

A quick-guide on setting up a new macOS using a Brewfile

Here's an abbreviated guide to setting up a new macOS machine with Homebrew Bundle. Unless otherwise stated, all commands below are to be typed in the macOS Terminal prompt.

The steps involved are:

Installing dependencies on the current (source) macOS machine
Installing Homebrew taps
Generating a Brewfile
Migration

1. Installing dependencies on the source machine

Homebrew

Check if you already have Homebrew installed:

$ brew help

If Homebrew isn't installed, the output should be something like brew: command not found. Homebrew itself depends on the command line tools (CLT) for Xcode, installed like this:

$ xcode-select --install

You can then install Homebrew by pasting the following in your Terminal prompt:

$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

If you do have Homebrew, and help prints a long list of commands, it's a good idea to run an update before proceeding:

$ brew update

Homebrew Cask

Comes with Homebrew, but it doesn't hurt to make sure it's there:

# more on "tap" later
$ brew tap homebrew/cask

Homebrew Bundle

Will be installed as we run it (later).

Mac App Store CLI

The way to install mas-cli varies depending on the OS version. You can find simple instructions in the project's GitHub repository, but if you have a recent version this should suffice:

$ brew install mas

2. Installing Homebrew taps

Think of taps as additional sources brew will look at when searching and installing formulae and casks. Here's what I recommend if you're following this tutorial:

# for good measure, I've included the default taps:
brew tap homebrew/bundle
brew tap homebrew/cask
brew tap homebrew/cask-fonts
brew tap homebrew/core
brew tap homebrew/services
brew tap mas-cli/tap

3. Creating the Brewfile

Now that all of the dependencies are installed, let's generate a Brewfile:

# navigate to the user's home (~) directory
$ cd

# "dump" (create) the Brewfile in our home directory
# based on which packages and apps are installed
$ brew bundle dump

Notice that Brewfile may be missing non-MAS applications and packages that you haven't installed with brew or brew cask. If you installed Firefox from Mozilla's website, homebrew-bundle doesn't know about it. It's easy enough to search for those and add them manually. And, it's something you only have to do once since you'll never ever again go to a website, find the install link, wait for the download to finish, and then drag the app icon to /Applications.

A Brewfile looks something like this:

tap "homebrew/bundle"
tap "homebrew/cask"
tap "homebrew/cask-fonts"
tap "homebrew/core"
tap "homebrew/services"
# ... possibly more tap commands here

brew "atomicparsley"
brew "autoconf"
brew "freetype"
# ... more brew commands here

cask "font-fira-mono"
cask "sip"
# ... more cask commands here

mas "com.acqualia.soulver", id: 413965349
mas "com.agilebits.onepassword-osx", id: 443987910
mas "com.agiletortoise.Drafts-OSX", id: 1435957248
mas "com.apple.dt.Xcode", id: 497799835
mas "com.apple.iWork.Keynote", id: 409183694
mas "com.apple.iWork.Numbers", id: 409203825
mas "com.apple.iWork.Pages", id: 409201541
# ... more mas commands here

If you'd like to omit some packages or otherwise change the Brewfile that your target macOS will use, you can simply copy the file somewhere else and make your changes there.

I keep my Brewfile in a GitHub repository, but you can place it in Dropbox, Google Drive, or wherever.

One more change I make is placing all mas directives before the cask ones, so the App Store version of an app is preferred in case that app is mistakenly listed in both sections.
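If you'd rather not move those lines by hand, the reordering is easy to script. A minimal sketch, assuming one directive per line as in the generated Brewfile above; mas_before_cask is a hypothetical helper, not something Homebrew Bundle provides.

```python
def mas_before_cask(brewfile_text):
    """Return Brewfile text with every mas line moved ahead of the first cask line."""
    lines = brewfile_text.splitlines()
    mas_lines = [line for line in lines if line.startswith("mas ")]
    others = [line for line in lines if not line.startswith("mas ")]

    # Find where the cask section starts; with no cask lines, append at the end.
    first_cask = next(
        (i for i, line in enumerate(others) if line.startswith("cask ")),
        len(others),
    )
    reordered = others[:first_cask] + mas_lines + others[first_cask:]
    return "\n".join(reordered) + "\n"


SAMPLE = '''\
brew "python"
cask "dropbox"
cask "vlc"
mas "com.apple.iWork.Pages", id: 409201541
'''

print(mas_before_cask(SAMPLE), end="")
```

Comments and blank lines keep their relative order; only lines starting with mas are moved.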

4. Migration

The only dependency needed on the new machine is Homebrew (see step 1). That's because the Brewfile pulled from the old setup already stages all others for installation.

Once Homebrew is installed and a Brewfile is present, it's as simple as running:

$ brew bundle

brew bundle will look for a Brewfile in the current directory, but you can also specify the path manually:

# will install from a Brewfile in the Dropbox folder
$ brew bundle --file=~/Dropbox/Brewfile

If you enjoyed this post, please consider donating to Homebrew.

Django: Keeping logic out of templates (and views)

Mon, 08 Apr 2019 00:00:00 GMT

When I first started dabbling with Django and web-development, a good friend with a little more experience advised that I should keep logic away from my templates. "Templates should be dumb".

I didn't really understand what that meant until I started suffering the consequences of having logic in my .html files. After 3 years with Django, I now try to keep business-logic away not only from templates, but also from views.

In this post I'll work my way from the least to the most recommended approach and outline the advantages that each one offers.

Our app: a simple blog

Let's start with extracting logic from the templates first. As is the case with most real-world apps, the project usually starts simple and plain in its specifications and requirements, and starts growing gradually.

Given this model:

# models.py
from django.db import models
from django.utils import timezone


class Post(models.Model):
    title = models.CharField(max_length=90, blank=False)
    content = models.TextField(blank=False)
    slug = models.SlugField(max_length=90)
    is_draft = models.BooleanField(default=True, null=False)
    is_highlighted = models.BooleanField(default=False)
    published_date = models.DateTimeField(default=timezone.now)
    likes = models.IntegerField(default=0)

    class Meta:
        ordering = ('-published_date',)

    def __str__(self):
        return self.title

    @property
    def is_in_past(self):
        return self.published_date < timezone.now()

The worst: logic in templates

In our blog's index.html, we want to display the latest 10 posts' titles and their publication date. The title should also be a link to the post-detail view, where the post content is presented.

While we do want to see our drafts so we can preview how they look on the website, we certainly don't want them visible to other visitors.

# views.py
from django.shortcuts import render

from .models import Post


def all_posts(request):
    context = {}
    posts = Post.objects.all()[:10]
    context['posts'] = posts
    return render(request, 'index.html', context)

{# index.html #}
{% for post in posts %}
  {% if request.user.is_superuser %}
    
      
        {{ post.title }}

        {% if post.is_draft %}
          Draft
        {% endif %}

        {% if not post.is_in_past %}
          Future Post
        {% endif %}
       Date: {{ post.published_date }}
      
    
  {% elif not request.user.is_superuser and not post.is_draft %}
    
      
        {{ post.title }}
      
       Date: {{ post.published_date }}
    
  {% endif %}
{% endfor %}

In index.html, we're checking if request.user is an admin, and if they are, we're not filtering any posts. In the elif block that applies to all other visitors, we're making sure the is_draft property is False before displaying the post:

{% elif not request.user.is_superuser and not post.is_draft %}

We’re also adding some Bootstrap markup so an admin can see clearly if a certain post is a draft or one that is scheduled in the future. We don't need this markup for regular visitors because they're not supposed to see these posts in the first place.

This kind of design is pretty bad for several reasons:

No separation of concerns: why is the template deciding which posts to show?
Violates the DRY (Don't Repeat Yourself) principle: look at the span tag that holds the date. Because of our choice, we have to repeat it in both clauses of our if statement.
Verbosity: our index.html is only displaying links to our posts, yet it already feels very cluttered.
Readability and maintainability: the Jinja/Django templating engine is good, but isn't known for its clean syntax. If you come back to this in 6 months, can you quickly tell what's happening? Will you remember that if you add a div containing the post's author name, you should do it in both clauses of the if statement?

The better way

If instead we write our view like this:

# views.py
def posts_index(request):
    context = {}
    limit = 10
    posts = Post.objects.all()

    if not request.user.is_superuser:
        # hide drafts
        posts = posts.filter(is_draft=False)

    context['posts'] = posts[:limit]
    return render(request, 'index.html', context)

Then our index.html file looks like this:

{# index.html #}
{% for post in posts %}
  
    
      {{ post.title }}

      {% if post.is_draft %}
        Draft
      {% endif %}

      {% if not post.is_in_past %}
        Future Post
      {% endif %}
    
     Date: {{ post.published_date }}
  
{% endfor %}

We keep the business logic outside of the template file, as it should be strictly responsible for presentation 90% of the time. Templates should mostly be concerned with how elements are rendered, not which, or if they are.

What we gain here:

DRYness: we're no longer repeating the HTML for rendering the post.
Reusability: because index.html no longer makes a decision about whether to display a post, we can use it in other views later (archive for example).
Readability: it's much clearer now what's happening in index.html and it'll be easier to figure out when we come back to it in the future.

So this is much better, and probably sufficient if you're developing a super-simple application. But even with this, you'll start repeating yourself sooner rather than later.

You may have spotted a bug in the code above. We’re not filtering out future posts (those with a published_date value in the future) when we render the index to the blog’s visitors.

Let's fix that:

# views.py
from django.utils import timezone

def posts_index(request):
    context = {}
    limit = 10
    posts = Post.objects.all()[:limit]

    if not request.user.is_superuser:
        # filter out drafts and future posts
        posts = Post.objects.filter(is_draft=False, published_date__lte=timezone.now())[:limit]

    context['posts'] = posts
    return render(request, 'index.html', context)

Now only the admin will see future posts.
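The rule the view now enforces can be tried out without a database. What follows is a plain-Python sketch, not Django code: PostStub and visible_posts are hypothetical names, and a list comprehension stands in for the queryset filter.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class PostStub:
    """Hypothetical stand-in for the Post model; only the fields the rule needs."""
    title: str
    is_draft: bool
    published_date: datetime


def visible_posts(posts, is_superuser, now, limit=10):
    """Mirror the view's rule: admins see everything, visitors see published posts only."""
    if not is_superuser:
        posts = [p for p in posts if not p.is_draft and p.published_date <= now]
    # newest first, like the model's ordering = ('-published_date',)
    ordered = sorted(posts, key=lambda p: p.published_date, reverse=True)
    return ordered[:limit]


now = datetime(2019, 4, 8, tzinfo=timezone.utc)
posts = [
    PostStub("live", is_draft=False, published_date=now - timedelta(days=1)),
    PostStub("draft", is_draft=True, published_date=now - timedelta(days=2)),
    PostStub("scheduled", is_draft=False, published_date=now + timedelta(days=1)),
]

print([p.title for p in visible_posts(posts, is_superuser=False, now=now)])
print([p.title for p in visible_posts(posts, is_superuser=True, now=now)])
```

A visitor gets only the post that is neither a draft nor scheduled for the future, while an admin gets all three.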

Now, we create a new view, featured_posts, where we only want to display posts that are marked as highlighted by us, using the is_highlighted field of the model. Simple enough:

def featured_posts(request):
    context = {}
    posts = Post.objects.filter(is_highlighted=True)

    if not request.user.is_superuser:
        posts = posts.filter(is_draft=False, published_date__lte=timezone.now())

    context['posts'] = posts
    # we're free to use `index.html` here because our template is now re-usable
    return render(request, 'index.html', context)

Now let's create a third view, dashboard, where we display the latest 10 regular posts, and the latest 10 highlighted posts (they may overlap):

def dashboard(request):
    context = {}
    posts = Post.objects.all()
    limit = 10
    posts_featured = Post.objects.filter(is_highlighted=True)

    if not request.user.is_superuser:
        posts = posts.filter(is_draft=False, published_date__lte=timezone.now())
        posts_featured = posts_featured.filter(is_draft=False, published_date__lte=timezone.now())

    context['last_posts'] = posts[:limit]
    context['last_posts_featured'] = posts_featured[:limit]

    return render(request, 'dashboard.html', context)

We already see two problems here:

Our code is getting more and more verbose, and that's with only two fields to filter by. Imagine having 3 or 4 (like author and tags for example). With real-world applications you'll often have more.
We're leaking implementation details of our models to our views: our view now has to know that there's a field called is_highlighted in our models.

Worse yet, consider what happens if we now decide that posts appearing under the featured sections in our blog should meet two criteria:

is_highlighted is True
likes count is at least 3

We now have to update the code in two of our views so it includes the new criterion:

Post.objects.filter(is_draft=False, is_highlighted=True, likes__gte=3)

Now imagine the work involved when you have 7 views, and two more criteria to filter by - definitely a possibility when you're dealing with larger scale apps.

The even better way(s)

There are two ways to go about this. We'll quickly cover the first one, which is considered less conventional and less natural, but does the job fine if you need something quick and dirty.

Class methods

class Post(models.Model):
    # ...

    @classmethod
    def published(cls):
        """
        :return: published posts only: no drafts and no future posts
        """
        return cls.objects.filter(is_draft=False, published_date__lte=timezone.now())

    @classmethod
    def featured(cls):
        """
        :return: featured posts only
        """
        return cls.objects.filter(is_highlighted=True)

We've added two model methods, which we can use in our views like this:

# notice: no .objects because it's a model/class method

published_posts = Post.published()
featured_posts = Post.featured()
published_and_featured = Post.published() & Post.featured()

Look at how much cleaner our dashboard becomes with this change:

def dashboard(request):
    context = {}
    posts = Post.objects.all()
    limit = 10
    posts_featured = Post.featured()

    if not request.user.is_superuser:
        posts = posts & Post.published()
        posts_featured = posts_featured & Post.published()

    context['last_posts'] = posts[:limit]
    context['last_posts_featured'] = posts_featured[:limit]

    return render(request, 'dashboard.html', context)

What's more, changing our criteria for what is considered a "featured" post becomes as simple as changing one line in Post.featured():

class Post(models.Model):
    # ...
    @classmethod
    def featured(cls):
        """
        :return: highlighted posts with at least 3 likes
        """
        return cls.objects.filter(is_highlighted=True, likes__gte=3)

Now all the views that invoke this model method will update accordingly.

So this is pretty sweet, but as I wrote, considered less conventional in the Django community. One more limitation of model methods is that they are not directly chainable:

# attempting to chain our two methods
>>> posts_featured_published = Post.featured().published()

AttributeError: 'QuerySet' object has no attribute 'published'

This is why we turn to using the logical AND (&) operator:

# using '&' to further filter our queryset
posts_featured_published = Post.featured() & Post.published()

So using model methods solves many of the previous method's shortcomings, but there's an even better way.

Custom model managers

I'm not going to go in-depth about managers vs querysets, as this is beyond the scope of this post. Let's get rid of our model methods in the previous step, and instead define our models.py file like this:


# models.py
from django.db import models
from django.utils import timezone


class PostQuerySet(models.QuerySet):
    def published(self):
        return self.filter(is_draft=False, published_date__lte=timezone.now())

    def featured(self):
        return self.filter(is_highlighted=True)


# Create your models here.
class Post(models.Model):
    title = models.CharField(max_length=90, blank=False)
    content = models.TextField(blank=False)
    slug = models.SlugField(max_length=90)
    is_draft = models.BooleanField(default=True, null=False)
    is_highlighted = models.BooleanField(default=False)
    published_date = models.DateTimeField(default=timezone.now)
    likes = models.IntegerField(default=0)

    # use PostQuerySet as the manager for this model
    objects = PostQuerySet.as_manager()

    class Meta:
        ordering = ('-published_date',)

    def __str__(self):
        return self.title

    @property
    def is_in_past(self):
        return self.published_date < timezone.now()

Of note is the objects attribute we've added to Post, which instructs this model to use PostQuerySet as its manager.

Let's examine, once again, our dashboard view:

def dashboard(request):
    context = {}
    posts = Post.objects.all()
    limit = 10
    posts_featured = Post.objects.featured()

    if not request.user.is_superuser:
        posts = posts.published()
        posts_featured = posts_featured.published()

    context['last_posts'] = posts[:limit]
    context['last_posts_featured'] = posts_featured[:limit]

    return render(request, 'dashboard.html', context)

Notice how these two manager methods are now chainable:

>>> posts_featured_published = Post.objects.featured().published()

<PostQuerySet [<Post: ...>, <Post: ...>]>

With PostQuerySet in our models.py file, we're extending the manager methods at our disposal, so alongside get, filter, aggregate, etc., we now have published and featured.
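Chaining works because each custom method returns another object of the same class, so the next method is still there to call. Here's a plain-Python sketch of that idea with no Django involved: PostList is a hypothetical list-backed stand-in for the QuerySet subclass, and now is passed in explicitly so the example is deterministic.

```python
from datetime import datetime, timedelta


class PostList(list):
    """Hypothetical list-backed stand-in for a QuerySet subclass (not Django)."""

    def published(self, now):
        # Returning the same class is what keeps the next call available.
        return PostList(
            p for p in self if not p["is_draft"] and p["published_date"] <= now
        )

    def featured(self):
        return PostList(p for p in self if p["is_highlighted"])


now = datetime(2019, 4, 8)
yesterday = now - timedelta(days=1)
posts = PostList([
    {"title": "a", "is_draft": False, "is_highlighted": True, "published_date": yesterday},
    {"title": "b", "is_draft": True, "is_highlighted": True, "published_date": yesterday},
    {"title": "c", "is_draft": False, "is_highlighted": False, "published_date": yesterday},
])

# Both orders filter down to the same posts.
print([p["title"] for p in posts.featured().published(now)])
print([p["title"] for p in posts.published(now).featured()])
```

The class-method version breaks this chain because Post.featured() hands back a plain QuerySet, which knows nothing about published.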

A few advantages of using model managers over class methods:

Chainability and clarity: Post.objects.featured().published() looks more Pythonic and natural than Post.featured() & Post.published().
Reusability: in many cases you can reuse the same manager for more than one model. Maybe in the future you'll create a ShortNote model that shares these fields, which the same PostQuerySet can then manage. With model methods you'd have to redefine the custom filters inside your ShortNote model.

There are a few more advantages, such as the ability to define several managers on the same model, but these are beyond the scope of this post.

So, takeaway: keep logic out of templates at almost all costs, and try to have as little of it as possible in your views. If you want something quick, a model method may suffice, but prefer model managers.

Django templates: 'include' context

Sun, 09 Sep 2018 00:00:00 GMT

Something I learned today which should come in handy. The include tag allows rendering a partial template from another:

{% include 'foo/bar.html' %}

So I was doing this to pass context to the included partial:

{% with obj=release %}
{% include 'releases_widget.html' %}
{% endwith %}

And this is why it's good to read the docs, because apparently this can be done much better like so:

{% include 'releases_widget.html' with obj=release %}

Restoring a database from Heroku for local development

Wed, 05 Sep 2018 00:00:00 GMT

I've recently had to download my Django app's database for local inspection. Heroku lets you do that pretty easily with:

$ heroku pg:backups:download

This gets you a .dump file. Now it's time to create a database clone out of it.

Here's the gist. We first create a new database:

$ sudo -u USERNAME createdb NEW_DATABASE_NAME

Note that USERNAME and NEW_DATABASE_NAME should be replaced with the respective values.

The next step is to restore the downloaded .dump to the database we just created:

$ pg_restore --verbose --clean --no-acl --no-owner -h localhost -d NEW_DATABASE_NAME /PATH/TO/latest.dump

And now there's a database clone that you can connect to at NEW_DATABASE_NAME. It's also possible to overwrite an existing database by supplying its name instead of a new database name, which makes the database-creation step redundant.
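If you do this often, the two commands can be assembled by a small script. A minimal sketch: restore_commands is a hypothetical helper that only builds the argument lists using the flags shown above and doesn't run anything. Passing create=False covers the overwrite case where the database already exists.

```python
import shlex


def restore_commands(dump_path, database, username=None, create=True):
    """Build the createdb / pg_restore invocations from this post as argument lists."""
    commands = []
    if create:
        createdb = ["createdb", database]
        if username:  # run createdb as another system user, as in the post
            createdb = ["sudo", "-u", username] + createdb
        commands.append(createdb)
    commands.append([
        "pg_restore", "--verbose", "--clean", "--no-acl", "--no-owner",
        "-h", "localhost", "-d", database, dump_path,
    ])
    return commands


# Placeholders are kept as-is; replace them with real values before running.
for argv in restore_commands("/PATH/TO/latest.dump", "NEW_DATABASE_NAME", username="USERNAME"):
    print(shlex.join(argv))
```

Each list can be handed to subprocess.run to execute it.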

The process usually finishes with some reported errors, but I've never noticed anything weird or wrong with database copies I've generated this way.