<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~files/feed-premium.xsl"?>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:feedpress="https://feed.press/xmlns" xmlns:media="http://search.yahoo.com/mrss/" xmlns:podcast="https://podcastindex.org/namespace/1.0" version="2.0">
  <channel>
    <feedpress:locale>en</feedpress:locale>
    <atom:link rel="via" href="https://feed.thebiglog.com/"/>
    <atom:link rel="self" href="https://feed.thebiglog.com/"/>
    <atom:link rel="hub" href="https://feedpress.superfeedr.com/"/>
    <title>The Big Log</title>
    <description>RSS feed for The Big Log</description>
    <link>https://thebiglog.com/</link>
    <item>
      <title>You didn’t just do that, Heroku</title>
      <link>https://thebiglog.com/posts/heroku-anti-dx/</link>
      <guid isPermaLink="true">https://thebiglog.com/posts/heroku-anti-dx/</guid>
      <pubDate>Mon, 17 Apr 2023 00:00:00 GMT</pubDate>
      <content:encoded><![CDATA[<div class="notice">
<div class="notice-content">
<p>Obligatory <a href="https://thebiglog.com#tldr">TL;DR</a> with spoilers at the end of the post.</p>
</div>
</div>
<h2>Background</h2>
<p><a href="https://www.musicbutler.io/">MusicButler</a> is the first web-app I ever built. It notifies its users about new music by artists found in their libraries. It’s also how I learned a lot of what I know about programming and most of what I know about web-development. I believe it played a huge part in my career shift to software development in my 30s.</p>
<p>Before the dawn of Dockerized apps, one way a layman like me would get an app to production was using Heroku’s Procfiles. MusicButler went live in August 2018. Today it’s a nice side-project that I’m proud to stand behind.</p>
<p>The app backend is built on Django and relies heavily on the <a href="https://docs.celeryq.dev/en/stable/">Celery library</a> for carrying out millions of background and scheduled tasks each day. In fact, all Heroku dynos except for the <code>web</code> one are Celery dynos:[^1]</p>
<pre><code>web: daphne musicbutler.asgi:application --port $PORT --bind 0.0.0.0
celerybeat: celery -A musicbutler beat
celerybackgroundworker1: celery -A musicbutler worker -Q regular
celerybackgroundworker2: celery -A musicbutler worker -Q regular
celeryimportantworker: celery -A musicbutler worker -Q important
</code></pre>
<p>The dyno of interest here is <code>celerybeat</code>: the Celery “scheduler” responsible for assigning scheduled tasks to other Celery workers.</p>
<h2>April 8: first duplicate email received</h2>
<p>One of the oldest pieces of code in MusicButler is the one that sends out music-drop emails to thousands of users each day, once a day.</p>
<p>I was surprised when I saw that my test user had received two emails titled "New Music for April 8, 2023" at the same time. The emails were identical. This had never happened before.</p>
<p>Upon checking my email delivery provider dashboard, I confirmed that MusicButler had sent twice its average volume of emails that day. Heroku's logs showed that <code>celerybeat</code> had dispatched the scheduled task twice:</p>
<pre><code>celerybeat.1 [2023-04-08 21:35:00,000: INFO/MainProcess] Scheduler: Sending due task send_music_drops (send_music_drops)
celerybeat.1 [2023-04-08 21:35:00,014: INFO/MainProcess] Scheduler: Sending due task send_music_drops (send_music_drops)
</code></pre>
<p>Ok. This isn't the email provider’s fault. It’s mine, or Celery’s. Probably the former.</p>
<p>I was busy that night so I utilized a Celery companion library called <a href="https://github.com/cameronmaske/celery-once">Celery Once</a> which ensures only one instance of a task can run concurrently.</p>
<h2>April 12: First user complaint</h2>
<p>Over the next few days some personal matters came up that steered my attention away. It was not until a long-time user complained about receiving duplicate emails that I realized the issue was still persisting. Apparently Celery Once doesn’t handle locking for scheduled tasks.</p>
<p>Music drops are one of the core features of the app, so I needed to fix this quickly. I had approximately 24 hours until the next batch of emails was due to be sent out.</p>
<p>I could implement a distributed lock myself, or make the email-sending task truly idempotent, but I wanted to understand why something had broken in the first place. This code had been running flawlessly <em>for years</em>.</p>
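<p>For reference, the idempotent route could look roughly like this: record each delivery under a unique key before sending, so a duplicate dispatch finds the record and skips. This is a minimal sketch under stated assumptions, not MusicButler's code. Python's built-in <code>sqlite3</code> stands in for the app's database, and <code>sent_drops</code>, <code>send_music_drop</code>, and the <code>send_email</code> callback are hypothetical names.</p>
<pre><code>import sqlite3
from datetime import date


def make_db():
    # In-memory stand-in for the app's real database (hypothetical schema).
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE sent_drops ("
        " user_id INTEGER NOT NULL,"
        " drop_date TEXT NOT NULL,"
        " PRIMARY KEY (user_id, drop_date))"
    )
    return db


def send_music_drop(db, user_id, drop_date, send_email):
    """Send at most one music-drop email per user per day; return True if sent."""
    try:
        with db:  # commits on success, rolls back on error
            # The primary key makes this claim atomic: a second run for the
            # same (user_id, drop_date) fails here instead of sending again.
            db.execute(
                "INSERT INTO sent_drops (user_id, drop_date) VALUES (?, ?)",
                (user_id, drop_date.isoformat()),
            )
    except sqlite3.IntegrityError:
        return False  # already claimed by an earlier (or duplicate) run
    send_email(user_id, drop_date)
    return True


# Two dispatches of the same scheduled task for the same user and day.
db = make_db()
outbox = []
today = date(2023, 4, 8)
first = send_music_drop(db, 42, today, lambda u, d: outbox.append((u, d)))
second = send_music_drop(db, 42, today, lambda u, d: outbox.append((u, d)))
print(first, second, len(outbox))  # True False 1
</code></pre>
<p>Claiming before sending trades one failure mode for another. If the process dies between the <code>INSERT</code> and the send, that user misses the day's email. In production, the unique key would also need to live in the database that every worker shares.</p>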
<h2>Unfortunate coincidence</h2>
<p>Nothing stood out when I ran the service locally. A quick scan of the code didn't bring up any suspects either.</p>
<p>I inspected Heroku’s logs again and saw that it wasn’t just this specific task that was being dispatched twice, <em>all of them were</em>:</p>
<pre><code>celerybeat.1 [2023-04-12 18:22:00,000: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
celerybeat.1 [2023-04-12 18:22:00,015: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
</code></pre>
<p>Luckily, most of these tasks are truly idempotent, so little damage was done there.</p>
<p>Here’s where a mix of Impostor Syndrome and mandated developer humility sent me looking in the wrong directions.</p>
<p>In the weeks before, I had written some new code that was also supposed to be scheduled by <code>celerybeat</code>. Did I muck something up there? Since this is Python, you can shoot yourself in the foot in a variety of ways. Maybe I had imported the scheduler configuration code twice, or maybe I had inadvertently put some code in an <code>__init__</code> file, as <a href="https://stackoverflow.com/a/41582354/5013234">this Stack Overflow answer</a> I came across suggests.</p>
<p>I started deleting the new code, gradually at first and then frantically. With each deployment that failed to mitigate the issue, I was running out of time and nerves.</p>
<h2>“It’s Celery’s fault”</h2>
<p>Using Heroku's log drains again, I managed to pinpoint the exact time the issue first manifested. The <code>celerybeat</code> dyno started sending the same tasks twice on April 6, 2023 at 22:16:00 UTC:</p>
<pre><code># once every two minutes
celerybeat.1 [2023-04-06 22:10:00,003: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
celerybeat.1 [2023-04-06 22:12:00,001: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
celerybeat.1 [2023-04-06 22:14:00,001: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
# twice every two minutes
celerybeat.1 [2023-04-06 22:16:00,000: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
celerybeat.1 [2023-04-06 22:16:00,015: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
</code></pre>
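<p>Once you know the symptom, a short script can find it in a saved log drain. This is a rough sketch that assumes the exact line format shown above. It counts "Sending due task" lines per task and per second, then reports the earliest second in which a task went out more than once. <code>first_duplicate_dispatch</code> is a hypothetical helper, not a Heroku or Celery tool.</p>
<pre><code>import re
from collections import Counter

# Matches the "Sending due task" lines shown above and captures the
# timestamp (truncated to the second) and the task name.
DISPATCH_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}: INFO/MainProcess\] "
    r"Scheduler: Sending due task (\S+)"
)


def first_duplicate_dispatch(lines):
    """Return (second, task, count) for the earliest second in which the same
    task was sent more than once, or None if no dispatch was repeated."""
    counts = Counter()
    for line in lines:
        match = DISPATCH_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    # ISO-style timestamps sort chronologically as plain strings.
    repeated = sorted(key for key, n in counts.items() if n > 1)
    if not repeated:
        return None
    second, task = repeated[0]
    return second, task, counts[(second, task)]


log = """\
celerybeat.1 [2023-04-06 22:14:00,001: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
celerybeat.1 [2023-04-06 22:16:00,000: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
celerybeat.1 [2023-04-06 22:16:00,015: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
"""
print(first_duplicate_dispatch(log.splitlines()))
# ('2023-04-06 22:16:00', 'refresh_user_library', 2)
</code></pre>
<p>Because the script groups by whole seconds, a duplicate pair that straddles a second boundary would slip through. Widening the window is an easy tweak if that matters.</p>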
<p>22:16:00 is the exact time a new build of the app was deployed. This was a revelation, but not a very useful one by the time I discovered it: I had already commented out all new code that could <em>remotely</em> be connected to the “bug”.</p>
<p>So is it Celery then? This awesome library has its fair share of complaints about duplicated task execution — most are due to misconfiguration. I scanned dozens of Stack Overflow threads and GitHub issues in Celery’s repository. Very few were relevant to my case, and none of the fixes were.</p>
<h2>April 13: You didn’t just do that, Heroku</h2>
<p>The last thread I read, and the one that would lead me to the shocking discovery, included a brute-force suggestion: delete Celery’s <code>celerybeat-schedule</code> file, where <code>celerybeat</code> keeps its schedule.</p>
<p>There’s no reason this would happen on a Serverless platform like Heroku, I thought to myself. But, at this point nothing makes sense anymore, I also thought to myself.</p>
<p>Just before trying that, I decided to do something different: I renamed the dyno <code>celerybeat</code> to <code>celerybeatnew</code> in the Procfile and deployed:</p>
<pre><code>web: daphne musicbutler.asgi:application --port $PORT --bind 0.0.0.0
celerybeatnew: celery -A musicbutler beat
celerybackgroundworker1: celery -A musicbutler worker -Q regular
celerybackgroundworker2: celery -A musicbutler worker -Q regular
celeryimportantworker: celery -A musicbutler worker -Q important
</code></pre>
<p>After the deployment was over I checked the logs again; now everything became clear:</p>
<pre><code>👇
celerybeatnew.1 [2023-04-13 19:42:00,000: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
👇
celerybeat.1 [2023-04-13 19:42:00,015: INFO/MainProcess] Scheduler: Sending due task refresh_user_library (refresh_user_library)
</code></pre>
<p>See, at this point the <code>celerybeat</code> dyno shouldn’t even exist. It was nowhere to be found on my list of dynos. But here it is, alive, well, and scheduling tasks.</p>
<p>So what happened on that April 6 deployment is that Heroku either spun up two <code>celerybeat</code> dynos instead of one, or just never killed the old one. It’s not that <code>celerybeat</code> was misbehaving; it’s just that there were now two of them. It was virtually impossible to reach this conclusion until I changed the dyno's name in the Procfile.</p>
<h2>Heroku’s support lives up to its name</h2>
<p>I actually know what happened, but not thanks to Heroku’s support. I contacted them on April 13 and as of April 17, their only response is “we’re looking into this”. I haven't heard from them since, and the old <code>celerybeat</code> dyno is still up with nothing I can do to stop it.[^2]</p>
<div class="notice">
<div class="notice-content">
<p><span class="font-bold">Update:</span> Heroku finally terminated the zombie dyno on April 17 at 14:00 UTC. They confirmed there was nothing I could've done to terminate it myself.</p>
</div>
</div>
<p>In hindsight, it was a good decision not to simply rewrite the task’s code to use locks. That would have prevented <em>new instances</em> of the app from executing tasks twice, but Heroku is running an old, zombie dyno with outdated code. I know this because I removed some tasks from <code>celerybeatnew</code> and watched them keep running, courtesy of Heroku’s on-the-house worker. If Heroku were simply spinning up a <em>new</em> extra dyno with each deployment, these tasks should’ve stopped executing entirely.</p>
<h2>Writing on the wall</h2>
<p>I’ve been reading about Heroku’s slow demise in developer communities for years now. Just weeks earlier, I’d read a <a href="https://twitter.com/dannypostmaa/status/1624689089332281344?lang=en">horror story on Twitter</a> about Heroku flat-out deleting someone’s account, including production apps.</p>
<p>And still. I don’t know what I could’ve done differently here: when I list an internet-taught software developer, a talented OSS team, and a multi-gazillion dollar corporation, my instinct is to look into them in this same order.</p>
<h2>Next steps</h2>
<h3>Truly idempotent tasks</h3>
<p>This could have happened on any platform, and for other reasons. If a task should never ever be executed twice, or concurrently, it shouldn't count on a scheduler to prevent that. In fact, I've followed Celery's <a href="https://docs.celeryq.dev/en/stable/tutorials/task-cookbook.html#ensuring-a-task-is-only-executed-one-at-a-time">own guide</a> on ensuring a task is only executed one at a time. It's just that I didn't retroactively apply that to a 4-year-old piece of code.</p>
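<p>The cookbook's recipe relies on the fact that Django's <code>cache.add</code> only stores a key if it isn't already there, which makes it usable as an atomic lock. The sketch below mirrors that shape in plain Python so it runs without Django or memcached. <code>FakeCache</code>, <code>task_lock</code>, and the lock key are hypothetical. Because <code>FakeCache</code> lives in a single process, it would not coordinate separate dynos the way a shared cache does.</p>
<pre><code>import time
from contextlib import contextmanager

LOCK_EXPIRE = 60 * 10  # seconds; assumed upper bound on how long the task runs


class FakeCache:
    """Single-process stand-in for a shared cache that supports add()."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def add(self, key, value, timeout):
        # Store the key only if it is missing or expired, and report whether we did.
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False
        self._data[key] = (value, now + timeout)
        return True

    def delete(self, key):
        self._data.pop(key, None)


@contextmanager
def task_lock(cache, lock_id, owner):
    """Yield True if this caller acquired the lock, False if someone else holds it."""
    # Keep a small safety margin before the expiry time.
    release_before = time.monotonic() + LOCK_EXPIRE - 3
    acquired = cache.add(lock_id, owner, LOCK_EXPIRE)
    try:
        yield acquired
    finally:
        # Release only a lock we took, and only while it has not expired;
        # after expiry another worker may legitimately hold it.
        if acquired and release_before > time.monotonic():
            cache.delete(lock_id)


cache = FakeCache()
lock_id = "send_music_drops-lock"  # hypothetical lock key
with task_lock(cache, lock_id, "dyno-1") as first:
    # A duplicate dispatch arriving while the first run still holds the lock.
    with task_lock(cache, lock_id, "dyno-2") as second:
        print(first, second)  # True False
# The first run released the lock when it finished, so a later run can take it.
with task_lock(cache, lock_id, "dyno-3") as third:
    print(third)  # True
</code></pre>
<p>Note that this prevents overlapping runs, not repeated ones. A duplicate dispatch that arrives after the first run has finished would still get the lock, so a task that must never run twice also needs a record of completed runs. The lock also only helps workers that run this code, so it would not have stopped a zombie dyno running an older version.</p>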
<h3>No more vendor lock</h3>
<p>In 2018 Heroku was pretty much the only option for a newbie like me to get started with web-development. The landscape is very different now, with providers like <a href="https://render.com/">Render</a>, <a href="https://railway.app/">Railway</a>, and more offering a similar DX.</p>
<p>I've just finished deploying a Celery worker on Railway. The experience was OK. It would cost a bit more than what I currently pay for a Heroku dyno. Railway built their own build tool, <a href="https://nixpacks.com/docs">Nixpacks</a>, which turns app source code into Docker images. Render uses <code>render.yaml</code> files. Wherever I decide to go, I’ll be Dockerizing MusicButler so I can move between vendors less painfully.</p>
<p>When you're a solo-developer doing something on the side you need to prioritize your time ruthlessly. I've always prioritized new features and user feedback over infra work. Looks like it's time to start paying attention to the latter.</p>
<h2>TL;DR</h2>
<ul>
<li>Heroku has been running a 2nd copy of my scheduler instance since April 6, 2023 and I have zero control over it.</li>
<li>All scheduled tasks were carried out twice, causing disturbance to users and unnecessarily high load.</li>
<li>Given how Heroku works and how they present their logs, I had no way to detect this early on, or reason to suspect that’s what happened.</li>
<li>I discovered the root cause April 13 and contacted Heroku. The zombie instance is still running as of April 17 at 18:45 UTC.</li>
</ul>
<p>[^1]: Some arguments were omitted for brevity.
[^2]: I tried every documented method, and some undocumented ones.</p>
]]></content:encoded>
    </item>
    <item>
      <title>The Django Speed Handbook: making a Django app faster</title>
      <link>https://thebiglog.com/posts/django-faster-speed-tutorial/</link>
      <guid isPermaLink="true">https://thebiglog.com/posts/django-faster-speed-tutorial/</guid>
      <pubDate>Tue, 25 Feb 2020 00:00:00 GMT</pubDate>
      <content:encoded><![CDATA[<p>Over the course of developing several Django apps, I've learned quite a bit about speed optimizations. Some parts of this process, whether on the backend or frontend, are not well-documented. I've decided to collect most of what I know in this article.</p>
<p><strong>If you haven’t taken a close look at the performance of your web-app yet, you're bound to find something good here</strong>.</p>
<details class="toc-container"><summary class="toc-title">What's in this article?</summary>
<ul class="toc">
<li><a href="https://thebiglog.com#why-speed-is-important">Why speed is important</a></li>
<li><a href="https://thebiglog.com#different-apps-different-bottlenecks">Different apps, different bottlenecks</a></li>
<li><a href="https://thebiglog.com#analyzing-and-debugging-performance-issues">Analyzing and debugging performance issues</a></li>
<li><a href="https://thebiglog.com#disclaimer">Disclaimer</a></li>
<li><a href="https://thebiglog.com#backend-the-database-layer">Backend: the database layer</a><ul>
<li><a href="https://thebiglog.com#select_related">select_related</a></li>
<li><a href="https://thebiglog.com#prefetch_related">prefetch_related</a></li>
<li><a href="https://thebiglog.com#indexing">Indexing</a></li>
<li><a href="https://thebiglog.com#take-only-what-you-need">Take only what you need</a></li>
</ul>
</li>
<li><a href="https://thebiglog.com#backend-the-request-layer">Backend: the request layer</a><ul>
<li><a href="https://thebiglog.com#pagination">Pagination</a></li>
<li><a href="https://thebiglog.com#asynchronous-executionbackground-tasks">Asynchronous execution/background tasks</a></li>
<li><a href="https://thebiglog.com#compressing-djangos-http-responses">Compressing Django's HTTP responses</a></li>
<li><a href="https://thebiglog.com#caching">Caching</a></li>
</ul>
</li>
<li><a href="https://thebiglog.com#frontend-where-it-gets-hairier">Frontend: where it gets hairier</a><ul>
<li><a href="https://thebiglog.com#serving-static-files">Serving static-files</a></li>
<li><a href="https://thebiglog.com#vocabulary">Vocabulary</a></li>
<li><a href="https://thebiglog.com#serving-static-files-from-django-with-whitenoise">Serving static files from Django with WhiteNoise</a></li>
<li><a href="https://thebiglog.com#compressing-and-combining-with-django-compressor">Compressing and combining with django-compressor</a></li>
<li><a href="https://thebiglog.com#minifying-css-js">Minifying CSS &amp; JS</a></li>
<li><a href="https://thebiglog.com#defer-loading-javascript">Defer-loading JavaScript</a></li>
<li><a href="https://thebiglog.com#lazy-loading-images">Lazy-loading images</a></li>
<li><a href="https://thebiglog.com#optimize-dynamically-scale-images">Optimize &amp; dynamically scale images</a></li>
<li><a href="https://thebiglog.com#unused-css-removing-imports">Unused CSS: Removing imports</a></li>
<li><a href="https://thebiglog.com#unused-css-purging-css-with-purgecss">Unused CSS: Purging CSS with PurgeCSS</a></li>
</ul>
</li>
<li><a href="https://thebiglog.com#conclusion">Conclusion</a></li>
<li><a href="https://thebiglog.com#appendices">Appendices</a><ul>
<li><a href="https://thebiglog.com#decorator-used-for-queryset-performance-analysis">Decorator used for QuerySet performance analysis</a></li>
</ul>
</li>
</ul>
</details>
<h2>Why speed is important</h2>
<p>On the web, 100 milliseconds can make a significant difference and 1 second is a lifetime. Countless studies indicate that faster loading times are associated with better conversion-rates, user-retention, and organic traffic from search engines. Most importantly, they provide a better user experience.</p>
<h2>Different apps, different bottlenecks</h2>
<p>There are <em>many</em> techniques and practices to optimize your web-app’s performance. It’s easy to get carried away. <strong>Look for the highest return-to-effort ratio</strong>. Different web-apps have different bottlenecks and therefore will gain the most when those bottlenecks are taken care of. Depending on your app, some tips will be more useful than others.</p>
<p>While this article is aimed at Django developers, the speed optimization tips here can be adapted to pretty much any stack. On the frontend side, it’s especially useful for people hosting on Heroku who do not have access to a CDN service.</p>
<h2>Analyzing and debugging performance issues</h2>
<p>On the backend, I recommend the tried-and-true <a href="https://github.com/jazzband/django-debug-toolbar"><code>django-debug-toolbar</code></a>. It will help you analyze your request/response cycles and see where most of the time is spent. It's especially useful because it shows database query execution times and a SQL <code>EXPLAIN</code> for each query, in a separate pane that appears in the browser.</p>
<p><a href="https://developers.google.com/speed/pagespeed/insights/">Google PageSpeed</a> will display mainly frontend-related advice, but some of it applies to the backend as well (like server response times). PageSpeed scores do not directly correlate with loading times, but they should give you a good picture of where your app's low-hanging fruit is. In development environments, you can use <a href="https://developers.google.com/web/tools/lighthouse">Google Chrome's Lighthouse</a>, which provides the same metrics but can work with local network URIs. <a href="https://gtmetrix.com/">GTmetrix</a> is another detail-rich analysis tool.</p>
<h2>Disclaimer</h2>
<p>Some people will tell you that some of the advice here is wrong or lacking. That's okay; this is not meant to be a bible or the ultimate go-to guide. Treat these techniques and tips as ones you may use, not ones you should or must use. Different needs call for different setups.</p>
<h2>Backend: the database layer</h2>
<p>Starting with the backend is a good idea since it's usually the layer that's supposed to do most of the heavy lifting behind the scenes.</p>
<p>There's little doubt in my mind which two ORM functionalities I want to mention first: these are <code>select_related</code> and <code>prefetch_related</code>. They both deal specifically with retrieving related objects and will usually improve speed by minimizing the number of database queries.</p>
<h3>select_related</h3>
<p>Let's take a music web-app for example, which might have these models:</p>
<pre><code># music/models.py, some fields &amp; code omitted for brevity
from django.db import models


class RecordLabel(models.Model):
    name = models.CharField(max_length=560)


class MusicRelease(models.Model):
    title = models.CharField(max_length=560)
    release_date = models.DateField()


class Artist(models.Model):
    name = models.CharField(max_length=560)
    label = models.ForeignKey(
        RecordLabel,
        related_name="artists",
        null=True,  # on_delete=SET_NULL requires a nullable column
        on_delete=models.SET_NULL,
    )
    music_releases = models.ManyToManyField(
        MusicRelease,
        related_name="artists",
    )
</code></pre>
<p>So each artist belongs to at most one record label, and each record label can sign multiple artists: a classic one-to-many relationship. Artists have many music releases, and each release can belong to one or more artists.</p>
<p>I've created some dummy data:</p>
<ul>
<li>20 record labels</li>
<li>each record label has 25 artists</li>
<li>each artist has 100 music releases</li>
</ul>
<p>Overall, we have ~50,500 of these objects in our tiny database.</p>
<p>Now let's wire up a fairly standard function that pulls our artists and their labels. <code>django_query_analyze</code> is a decorator I wrote to count the function's database queries and measure its execution time. Its implementation can be found in the appendix.</p>
<pre><code># music/selectors.py
from .models import Artist

# django_query_analyze is defined in the appendix
@django_query_analyze
def get_artists_and_labels():
    result = []
    artists = Artist.objects.all()
    for artist in artists:
        result.append({"name": artist.name, "label": artist.label.name})
    return result
</code></pre>
<p><code>get_artists_and_labels</code> is a regular function which you may use in a Django view. It returns a list of dictionaries, each containing an artist's name and their label's name. Accessing <code>artist.label.name</code> forces Django to load each artist's related label. It's the same thing that happens when you access these objects in a template:</p>
<pre><code>{% for artist in artists_and_labels %}
&lt;p&gt;Name: {{ artist.name }}, Label: {{ artist.label.name }}&lt;/p&gt;
{% endfor %}
</code></pre>
<p>Now let's run this function:</p>
<pre><code>ran function get_artists_and_labels
--------------------
number of queries: 501
Time of execution: 0.3585s
</code></pre>
<p>So we've pulled 500 artists and their labels in 0.36 seconds. More interestingly, we've hit the database 501 times: once for all the artists, and 500 more times, <em>once for each</em> of the artists' labels. This is known as the N+1 problem. Let's tell Django to retrieve each artist's <code>label</code> in the same query with <code>select_related</code>:</p>
<pre><code>@django_query_analyze
def get_artists_and_labels_select_related():
    result = []
    artists = Artist.objects.select_related("label") # select_related
    for artist in artists:
        result.append(
            {"name": artist.name, "label": artist.label.name if artist.label else "N/A"}
        )
    return result
</code></pre>
<p>Now let's run this:</p>
<pre><code>ran function get_artists_and_labels_select_related
--------------------
number of queries: 1
Time of execution: 0.01481s
</code></pre>
<p>That's 500 fewer queries, and execution time dropped by 96% (from 0.3585s to 0.01481s, roughly 24 times faster).</p>
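<p>To see what changed at the SQL level without setting up a Django project, here is a rough analogy using Python's built-in <code>sqlite3</code>. It reads the same 500 artists two ways: with one extra query per label (N+1), and with a single <code>JOIN</code>, which is the kind of query <code>select_related</code> builds. The schema is a simplified assumption. Queries are counted with <code>Connection.set_trace_callback</code>, which SQLite calls for every statement it executes.</p>
<pre><code>import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE label (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE artist (
        id INTEGER PRIMARY KEY,
        name TEXT,
        label_id INTEGER REFERENCES label (id)
    );
    """
)
# Dummy data shaped like the post's: 20 labels with 25 artists each.
db.executemany(
    "INSERT INTO label VALUES (?, ?)", [(i, f"Label {i}") for i in range(1, 21)]
)
db.executemany(
    "INSERT INTO artist VALUES (?, ?, ?)",
    [(i, f"Artist {i}", (i - 1) // 25 + 1) for i in range(1, 501)],
)
db.commit()

queries = []
db.set_trace_callback(queries.append)  # called once per SQL statement executed


def artists_n_plus_one():
    # One query for the artists, then one more query per artist for its label.
    result = []
    artists = db.execute("SELECT name, label_id FROM artist ORDER BY id").fetchall()
    for name, label_id in artists:
        (label_name,) = db.execute(
            "SELECT name FROM label WHERE id = ?", (label_id,)
        ).fetchone()
        result.append({"name": name, "label": label_name})
    return result


def artists_joined():
    # A single query with a JOIN, similar in spirit to select_related("label").
    rows = db.execute(
        "SELECT artist.name, label.name FROM artist "
        "LEFT JOIN label ON label.id = artist.label_id ORDER BY artist.id"
    ).fetchall()
    return [{"name": name, "label": label or "N/A"} for name, label in rows]


queries.clear()
slow = artists_n_plus_one()
n_plus_one_count = len(queries)

queries.clear()
fast = artists_joined()
joined_count = len(queries)

print(n_plus_one_count, joined_count, slow == fast)  # 501 1 True
</code></pre>
<p>Timings from an in-memory SQLite database won't match a networked Postgres, but the query counts mirror the 501-versus-1 difference above.</p>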
<h3>prefetch_related</h3>
<p>Let's look at another function, this one for getting the first 100 artists and their music releases:</p>
<pre><code>@django_query_analyze
def get_artists_and_releases():
    result = []
    artists = Artist.objects.all()[:100]
    for artist in artists:
        result.append(
            {
                "name": artist.name,
                "releases": [release.title for release in artist.music_releases.all()],
            }
        )
    return result
</code></pre>
<p>How long does it take to fetch 100 artists and 100 releases for each one of them?</p>
<pre><code>ran function get_artists_and_releases
--------------------
number of queries: 101
Time of execution: 0.18245s
</code></pre>
<p>Let's change the <code>artists</code> variable in this function and add <code>select_related</code> so we can bring the number of queries down and hopefully get a speed boost:</p>
<pre><code>artists = Artist.objects.select_related("music_releases")
</code></pre>
<p>If you actually do that, you'll get an error:</p>
<pre><code>django.core.exceptions.FieldError: Invalid field name(s) given in select_related: 'music_releases'. Choices are: label
</code></pre>
<p>That's because <code>select_related</code> can only be used to cache ForeignKey or OneToOneField attributes. The relationship between <code>Artist</code> and <code>MusicRelease</code> is many-to-many though, and that's where <code>prefetch_related</code> comes in:</p>
<pre><code>@django_query_analyze
def get_artists_and_releases_prefetch_related():
    result = []
    artists = Artist.objects.all()[:100].prefetch_related("music_releases") # prefetch_related
    for artist in artists:
        result.append(
            {
                "name": artist.name,
                "releases": [rel.title for rel in artist.music_releases.all()],
            }
        )
    return result
</code></pre>
<p><code>select_related</code> can only cache the "one" side of the "one-to-many" relationship, or either side of a "one-to-one" relationship. You can use <code>prefetch_related</code> for all other caching, including the many side in one-to-many relationships, and many-to-many relationships. Here's the improvement in our example:</p>
<pre><code>ran function get_artists_and_releases_prefetch_related
--------------------
number of queries: 2
Time of execution: 0.13239s
</code></pre>
<p>Nice.</p>
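<p>Django's documentation describes <code>prefetch_related</code> as doing a separate lookup for each relationship and doing the "joining" in Python. Below is a stdlib sketch of that two-query shape with <code>sqlite3</code>. The table names and the tiny dataset are made up for illustration and are not what Django generates.</p>
<pre><code>import sqlite3
from collections import defaultdict

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE music_release (id INTEGER PRIMARY KEY, title TEXT);
    -- Join table for the many-to-many relation, like the one Django creates.
    CREATE TABLE artist_music_releases (artist_id INTEGER, release_id INTEGER);
    """
)
# Tiny hand-made dataset: release 1 is shared by artists 1 and 2.
db.executemany("INSERT INTO artist VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
db.executemany(
    "INSERT INTO music_release VALUES (?, ?)",
    [(i, f"r{i}") for i in range(1, 6)],
)
db.executemany(
    "INSERT INTO artist_music_releases VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (2, 3), (3, 4), (3, 5)],
)
db.commit()


def artists_with_releases_prefetched():
    # Query 1: the artists themselves.
    artists = db.execute("SELECT id, name FROM artist ORDER BY id").fetchall()
    ids = [artist_id for artist_id, _ in artists]
    # Query 2: the releases of all those artists at once, through the join table.
    placeholders = ", ".join("?" for _ in ids)
    rows = db.execute(
        "SELECT j.artist_id, r.title FROM artist_music_releases AS j "
        "JOIN music_release AS r ON r.id = j.release_id "
        f"WHERE j.artist_id IN ({placeholders}) ORDER BY r.id",
        ids,
    ).fetchall()
    # The "join" happens in Python: group release titles by artist id.
    releases = defaultdict(list)
    for artist_id, title in rows:
        releases[artist_id].append(title)
    return [{"name": name, "releases": releases[artist_id]} for artist_id, name in artists]


queries = []
db.set_trace_callback(queries.append)  # count the statements SQLite runs
result = artists_with_releases_prefetched()
print(len(queries), result[1])  # 2 {'name': 'B', 'releases': ['r1', 'r3']}
</code></pre>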
<p>Things to keep in mind about <code>select_related</code> and <code>prefetch_related</code>:</p>
<ul>
<li>The gains are usually even bigger when your database runs on a separate host, since every query is another network round trip.</li>
<li>For very large result-sets, running <code>prefetch_related</code> can actually make things slower.</li>
<li>One database query isn't <em>necessarily</em> faster than two or more.</li>
</ul>
<h3>Indexing</h3>
<p>Indexing your database columns can have a big impact on query performance. Why, then, isn't it the first item in this section? Because indexing is more complicated than simply scattering <code>db_index=True</code> on your model fields.</p>
<p>Creating an index on frequently accessed columns can improve the speed of look-ups pertaining to them. Indexing comes at the cost of additional writes and storage space though, so you should always measure your benefit:cost ratio. In general, creating indices on a table will slow down inserts/updates.</p>
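<p>One way to check whether a lookup actually uses an index is to ask the database for its query plan. The sketch below does that with SQLite through Python's <code>sqlite3</code>, on a hypothetical <code>artist</code> table. In a Django project the index would come from <code>db_index=True</code> or <code>Meta.indexes</code>, and <code>QuerySet.explain()</code> shows the plan your real database picks.</p>
<pre><code>import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT, label_id INTEGER)")
db.executemany(
    "INSERT INTO artist (name, label_id) VALUES (?, ?)",
    [(f"Artist {i}", i % 20) for i in range(500)],
)

QUERY = "SELECT id, name FROM artist WHERE name = ?"


def query_plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = db.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


before = query_plan(QUERY, ("Artist 42",))
db.execute("CREATE INDEX artist_name_idx ON artist (name)")
after = query_plan(QUERY, ("Artist 42",))

print(before)  # a full-table SCAN of artist
print(after)   # a SEARCH of artist using artist_name_idx
</code></pre>
<p>Before the index exists, SQLite reports a full-table <code>SCAN</code>. Afterwards, it reports a <code>SEARCH</code> using <code>artist_name_idx</code>. The write cost still applies: every insert or update touching <code>artist.name</code> now also has to maintain the index.</p>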
<h3>Take only what you need</h3>
<p>When possible, use <code>values()</code> and especially <code>values_list()</code> to only pull the needed properties of your database objects. Continuing our example, if we only want to display a list of artist names and don't need the full ORM objects, it's usually better to write the query like so:</p>
<pre><code>artist_names = Artist.objects.values('name')
# &lt;QuerySet [{'name': 'Chet Faker'}, {'name': 'Billie Eilish'}]&gt;

artist_names = Artist.objects.values_list('name')
# &lt;QuerySet [('Chet Faker',), ('Billie Eilish',)]&gt;

artist_names = Artist.objects.values_list('name', flat=True)
# &lt;QuerySet ['Chet Faker', 'Billie Eilish']&gt;
</code></pre>
<div class="notice">
<div class="notice-content">
<p>
Haki Benita, a true database expert (unlike me), reviewed some parts of this section. You should read <a href="http://hakibenita.com/">Haki's blog</a>.
</p>
</div>
</div>
<h2>Backend: the request layer</h2>
<p>The next layer we're going to look at is the request layer. These are your Django views, context processors, and middleware. Good decisions here will also lead to better performance.</p>
<h3>Pagination</h3>
<p>In the section about <code>select_related</code> we were using the function to return 500 artists and their labels. In many situations returning this many objects is either unrealistic or undesirable. The section about <a href="https://docs.djangoproject.com/en/2.2/topics/pagination/">pagination in the Django docs</a> is crystal clear on how to work with the <code>Paginator</code> object. Use it when you don't want to return more than <code>N</code> objects to the user, or when doing so makes your web-app too slow.</p>
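<p>Under the hood, <code>Paginator</code> counts the objects and slices the QuerySet, and Django turns that slice into a <code>LIMIT</code>/<code>OFFSET</code> query. The stdlib sketch below reproduces that shape with <code>sqlite3</code>. <code>get_page</code> here is a hypothetical helper, not Django's <code>Paginator.get_page</code>, although like it, the helper falls back to a valid page instead of raising on out-of-range numbers.</p>
<pre><code>import math
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany(
    "INSERT INTO artist (name) VALUES (?)", [(f"Artist {i}",) for i in range(1, 501)]
)

PER_PAGE = 25  # assumed page size


def get_page(number, per_page=PER_PAGE):
    """Return (rows, page_number, num_pages) for a 1-based page number."""
    (count,) = db.execute("SELECT COUNT(*) FROM artist").fetchone()
    num_pages = max(1, math.ceil(count / per_page))
    number = min(max(1, number), num_pages)  # clamp out-of-range requests
    rows = db.execute(
        "SELECT id, name FROM artist ORDER BY id LIMIT ? OFFSET ?",
        (per_page, (number - 1) * per_page),
    ).fetchall()
    return rows, number, num_pages


rows, number, num_pages = get_page(2)
print(number, num_pages, rows[0], len(rows))  # 2 20 (26, 'Artist 26') 25
</code></pre>
<p>The <code>ORDER BY</code> matters. Without a stable ordering, rows can shift between pages, which is why Django emits an <code>UnorderedObjectListWarning</code> when you paginate an unordered QuerySet.</p>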
<h3>Asynchronous execution/background tasks</h3>
<p>There are times when a certain action inevitably takes a lot of time. For example, a user requests to export a large number of objects from the database to an XML file. If we're doing everything in the same process, the flow looks like this:</p>
<pre><code>web: user requests file -&gt; process file -&gt; return response
</code></pre>
<p>Say it takes 45 seconds to process this file. You're not really going to make the user wait all that time for a response. First, because it's a horrible experience from a UX standpoint, and second, because some hosts will cut the request short if your app doesn't return an HTTP response within a set time (Heroku's router, for example, times out requests after 30 seconds).</p>
<p>In most cases, the sensible thing to do here is to remove this functionality from the request-response loop and relay it to a different process:</p>
<pre><code>web: user requests file -&gt; delegate to another process -&gt; return response
                           |
                           v
background process:        receive job -&gt; process file -&gt; notify user
</code></pre>
<p>Background tasks are beyond the scope of this article but if you've ever needed to do something like the above I'm sure you've heard of libraries like <a href="http://www.celeryproject.org/">Celery</a>.</p>
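<p>To make the diagram above concrete without a broker, here is a single-process sketch using the standard library's <code>queue</code> and <code>threading</code> modules. The "view" enqueues the job and responds with <code>202 Accepted</code> right away, while a worker thread does the slow part. It only illustrates the flow. A real deployment would run the worker as a separate process (for example, a Celery worker). <code>request_export</code>, <code>export_worker</code>, and the <code>jobs</code> dict are hypothetical.</p>
<pre><code>import queue
import threading
import uuid

jobs = {}  # job_id -> status dict; a real app would keep this in its database
work_queue = queue.Queue()


def export_worker():
    # Runs outside the request-response loop, the way a Celery worker would.
    while True:
        job_id, object_ids = work_queue.get()
        if job_id is None:  # sentinel value: stop the worker
            work_queue.task_done()
            return
        # Stand-in for the slow part, e.g. serializing objects to an XML file.
        lines = [f"object {object_id}" for object_id in object_ids]
        jobs[job_id] = {"status": "done", "lines": len(lines)}
        work_queue.task_done()  # a real app would notify the user here


def request_export(object_ids):
    """The 'view': hand the work off and respond right away."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending"}
    work_queue.put((job_id, list(object_ids)))
    return {"status": 202, "job_id": job_id}  # 202 Accepted


threading.Thread(target=export_worker, daemon=True).start()

response = request_export(range(1000))
print(response["status"])  # 202
work_queue.join()  # demo only: wait for the worker to finish the job
print(jobs[response["job_id"]])  # {'status': 'done', 'lines': 1000}
work_queue.put((None, None))  # shut the worker down
</code></pre>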
<h3>Compressing Django's HTTP responses</h3>
<div class="notice">
<div class="notice-content">
<p>This is not to be confused with static-file compression, which is mentioned later in the article.</p>
</div>
</div>
<p>Compressing Django's HTTP/JSON responses also stands to save your users some latency. How much exactly? Let's check the number of bytes in our response's body without any compression:</p>
<pre><code>Content-Length: 66980
Content-Type: text/html; charset=utf-8
</code></pre>
<p>So our HTTP response is around 67KB. Can we do better? Many use Django's built-in <code>GZipMiddleware</code> for <code>gzip</code> compression, but today the newer and more effective <code>brotli</code> enjoys the same support across browsers (except IE11, of course).</p>
<div class="notice">
<div class="notice-content">
<p><b>Important:</b> Compression can <i>potentially</i> open your website to security breaches, as mentioned in the <a href="https://docs.djangoproject.com/en/2.2/ref/middleware/#module-django.middleware.gzip">GZipMiddleware section</a> of the Django docs.</p>
</div>
</div>
<p>Let's install the excellent <a href="https://pypi.org/project/django-compression-middleware/">django-compression-middleware</a> library. It checks the request's <code>Accept-Encoding</code> header and picks the best compression method the browser supports:</p>
<pre><code>pip install django-compression-middleware
</code></pre>
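<p>The negotiation step can be sketched roughly as follows. <code>choose_encoding()</code> is a simplified, hypothetical helper, not the library's code. It ignores wildcards like <code>*</code>. It also honors only <code>q=0</code> ("not acceptable") instead of ranking encodings by weight. It returns the first server-preferred encoding that the client accepts.</p>

```python
from typing import Iterable, Optional


def choose_encoding(accept_encoding: str, preferences: Iterable[str] = ("br", "gzip")) -> Optional[str]:
    """Return the first server-preferred encoding the client accepts, or None."""
    accepted = set()
    for part in accept_encoding.split(","):
        # e.g. " gzip;q=0.8" -> token "gzip", params "q=0.8"
        token, _, params = part.strip().partition(";")
        token = token.strip().lower()
        params = params.strip()
        q = 1.0
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat malformed weights as "not acceptable"
        if token and q > 0:  # q=0 explicitly means "don't send this encoding"
            accepted.add(token)
    for encoding in preferences:
        if encoding in accepted:
            return encoding
    return None  # fall back to an uncompressed response


print(choose_encoding("gzip, deflate, br"))  # br
print(choose_encoding("gzip, deflate"))  # gzip
```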
<p>Then add <code>CompressionMiddleware</code> to our Django app's <code>MIDDLEWARE</code> setting:</p>
<pre><code>MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "compression_middleware.middleware.CompressionMiddleware",
    # ...
]
</code></pre>
<p>And inspect the body's <code>Content-Length</code> again:</p>
<pre><code>Content-Encoding: br
Content-Length: 7239
Content-Type: text/html; charset=utf-8
</code></pre>
<p>The body size is now about 7.2KB, 89% smaller. You can certainly argue that this kind of work should be delegated to a dedicated server like Nginx or Apache. I'd argue that everything is a balance between simplicity and resources.</p>
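<p>You can reproduce that arithmetic with a quick standard-library sketch, which also shows how well text responses compress. It uses <code>gzip</code> because Brotli requires the third-party <code>brotli</code> package. The JSON body is made up, so its ratio won't match a real page.</p>

```python
import gzip
import json


def percent_smaller(original_size: int, compressed_size: int) -> float:
    """How much smaller the compressed body is, as a percentage of the original."""
    return 100 * (1 - compressed_size / original_size)


# The Content-Length values from the two responses above
print(f"{percent_smaller(66980, 7239):.1f}% smaller")  # 89.2% smaller

# A made-up, repetitive JSON body, compressed with the standard library's gzip
body = json.dumps(
    [{"title": f"Album {i}", "artist": "Some Artist", "type": "album"} for i in range(1000)]
).encode("utf-8")
compressed = gzip.compress(body)
print(f"{len(body)} -> {len(compressed)} bytes "
      f"({percent_smaller(len(body), len(compressed)):.1f}% smaller)")
```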
<h3>Caching</h3>
<p>Caching is the process of storing the result of a certain calculation for faster future retrieval. Django has an excellent <a href="https://docs.djangoproject.com/en/3.0/topics/cache/">caching framework</a> that lets you do this on a variety of levels and using different storage backends.</p>
<p>Caching can be tricky in data-driven apps: you'd never want to cache a page that's supposed to display up-to-date, realtime information at all times. So, the big challenge isn't so much setting up caching as it is figuring out what should be cached, for how long, and understanding when or how the cache is invalidated.</p>
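<p>A small, framework-free sketch can illustrate three questions: what to cache, for how long, and when the cache is invalidated. <code>TTLCache</code>, <code>get_or_compute</code>, and <code>render_home_page</code> are hypothetical names. The <code>get</code>/<code>set</code>/<code>delete</code> methods loosely mirror the shape of Django's low-level cache API (<code>cache.set(key, value, timeout)</code> and friends), but this is not Django's implementation.</p>

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire `timeout` seconds after being set."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: time-based invalidation
            return default
        return value

    def set(self, key: str, value: Any, timeout: float) -> None:
        self._store[key] = (time.monotonic() + timeout, value)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)  # explicit invalidation, e.g. after the data changes


def get_or_compute(cache: TTLCache, key: str, compute: Callable[[], Any], timeout: float) -> Any:
    """Return the cached value, or compute it, store it for `timeout` seconds, and return it."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, timeout)
    return value


calls = []


def render_home_page() -> str:
    """Stand-in for an expensive computation (hypothetical)."""
    calls.append(1)
    return "rendered page"


cache = TTLCache()
get_or_compute(cache, "home", render_home_page, timeout=60)  # computed
get_or_compute(cache, "home", render_home_page, timeout=60)  # served from the cache
cache.delete("home")  # e.g. a new album was added, so the page is stale
get_or_compute(cache, "home", render_home_page, timeout=60)  # computed again
print(len(calls))  # 2
```

<p>Entries expire on their own after <code>timeout</code> seconds, and <code>delete()</code> lets you invalidate an entry early when the underlying data changes. This sketch treats a cached <code>None</code> as a miss.</p>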
<p>Before resorting to caching, make sure you've made proper optimizations at the database-level and/or on the frontend. If designed and queried properly, databases are ridiculously fast at pulling out relevant information at scale.</p>
<h2>Frontend: where it gets hairier</h2>
<p>Reducing the size of your static files (assets) can significantly speed up your web application. Even if you've done everything right on the backend, serving your images, CSS, and JavaScript files inefficiently will slow your application down.</p>
<p>Between compiling, minifying, compressing, and purging, it's easy to get lost. Let's try not to.</p>
<h3>Serving static-files</h3>
<p>You have several options for where and how to serve static files. <a href="https://docs.djangoproject.com/en/2.2/howto/static-files/deployment/#deploying-static-files">Django's docs</a> mention three approaches: a dedicated server running Nginx or Apache, a cloud service/CDN, or serving them from the same server as your app.</p>
<p>I've gone with a bit of a hybrid approach. Images are served from a CDN and large file uploads go to S3. All serving and handling of other static assets (CSS, JavaScript, etc.) is done using WhiteNoise, which is covered in detail later.</p>
<h3>Vocabulary</h3>
<p>Just to make sure we're on the same page, here's what I mean when I say:</p>
<ul>
<li>Compiling: if you're using SCSS for your stylesheets, you'll first have to compile it to CSS, because browsers don't understand SCSS.</li>
<li>Minifying: removing whitespace and comments from CSS and JS files, which can have a significant impact on their size. Sometimes this process also involves uglifying: renaming long variable names to shorter ones, etc.</li>
<li>Compressing/Combining: for CSS and JS, combining multiple files into one. For images, it usually means removing some data from the image to make the file smaller.</li>
<li>Purging: removing unneeded/unused code. In CSS, for example, this means removing selectors that aren't used.</li>
</ul>
<h3>Serving static files from Django with WhiteNoise</h3>
<p>WhiteNoise allows your Python web application to serve its own static assets. <a href="http://whitenoise.evans.io/en/stable/index.html#what-s-the-point-in-whitenoise-when-i-can-do-the-same-thing-in-a-few-lines-of-apache-nginx-config">As its author states</a>, it's meant for situations where other options, like Nginx or Apache, are unavailable or undesired.</p>
<p>Let's install it:</p>
<pre><code>pip install "whitenoise[brotli]"
</code></pre>
<p>Before enabling WhiteNoise, make sure your <code>STATIC_ROOT</code> is defined in <code>settings.py</code>:</p>
<pre><code>STATIC_ROOT = os.path.join(BASE_DIR, "staticfiles")
</code></pre>
<p>To enable WhiteNoise, add its middleware in <code>settings.py</code>, directly below <code>SecurityMiddleware</code> and above all other middleware:</p>
<pre><code>MIDDLEWARE = [
  'django.middleware.security.SecurityMiddleware',
  'whitenoise.middleware.WhiteNoiseMiddleware',
  # ...
]
</code></pre>
<p>In production, you'll have to run <code>manage.py collectstatic</code> for WhiteNoise to work.</p>
<p>While this step is not mandatory, it's strongly advised to add caching and compression:</p>
<pre><code>STATICFILES_STORAGE = 'whitenoise.storage.CompressedManifestStaticFilesStorage'
</code></pre>
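<p>On Django 4.2 and later, <code>STATICFILES_STORAGE</code> is deprecated in favor of the <code>STORAGES</code> setting, and it was removed in Django 5.1. The equivalent configuration would look roughly like this. The <code>default</code> entry shown is Django's standard file-system storage, which is an assumption; adjust it if your uploads live elsewhere (e.g. S3).</p>

```python
# Django 4.2+ equivalent of the STATICFILES_STORAGE setting
STORAGES = {
    # Storage for user-uploaded media (Django's default backend)
    "default": {
        "BACKEND": "django.core.files.storage.FileSystemStorage",
    },
    # WhiteNoise's compressed storage with content-hashed filenames
    "staticfiles": {
        "BACKEND": "whitenoise.storage.CompressedManifestStaticFilesStorage",
    },
}
```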
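<p>With this storage backend, <code>collectstatic</code> writes compressed copies of your static files: gzip, plus Brotli thanks to the <code>[brotli]</code> extra. It also gives each file a unique, content-based name, and the <code>{% static %}</code> template tag resolves to that versioned name. This takes care of cache invalidation as well.</p>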
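<p>One more important step: to get a consistent experience between development and production, add <code>whitenoise.runserver_nostatic</code> above <code>django.contrib.staticfiles</code> in <code>INSTALLED_APPS</code>. This way, WhiteNoise also serves static files when you use <code>runserver</code> in development:</p>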
<pre><code>INSTALLED_APPS = [
    'whitenoise.runserver_nostatic',
    'django.contrib.staticfiles',
    # ...
]
</code></pre>
<p>This can be added regardless of whether <code>DEBUG</code> is <code>True</code>. It only affects <code>runserver</code>, which you shouldn't be using in production anyway.</p>
<p>I found it useful to also increase the caching time:</p>
<pre><code># Whitenoise cache policy
WHITENOISE_MAX_AGE = 31536000 if not DEBUG else 0  # 1 year in production, no caching in development
</code></pre>
<p>Wouldn't this cause problems with cache invalidation? No. Static files get <em>versioned</em>, content-hashed names when you run <code>collectstatic</code>, and the same goes for the bundles django-compressor generates, which are covered below:</p>
<pre><code>&lt;link
  rel="stylesheet"
  href="https://thebiglog.com/static/CACHE/css/4abd0e4b71df.css"
  type="text/css"
  media="all"
/&gt;
</code></pre>
<p>So when you deploy a new version of your application, any file whose contents changed gets a new name. Browsers request the new URL, and the stale cached copy is never referenced again.</p>
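<p>The idea behind those versioned names fits in a few lines: derive the filename from a hash of the file's contents, so new content automatically means a new URL. The hypothetical <code>hashed_name()</code> helper below approximates the scheme used by Django's <code>ManifestStaticFilesStorage</code>, which WhiteNoise's storage builds on. That scheme inserts the first 12 hex characters of an MD5 digest before the extension. This is not Django's actual code.</p>

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(name: str, content: bytes) -> str:
    """Return e.g. 'css/base.<12 hex chars>.css' for 'css/base.css'."""
    path = PurePosixPath(name)
    digest = hashlib.md5(content).hexdigest()[:12]  # a content fingerprint, not a security hash
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old = hashed_name("css/base.css", b"body { color: black; }")
new = hashed_name("css/base.css", b"body { color: navy; }")
print(old)
print(new)
print(old != new)  # changed content -> new URL -> the stale cache entry is never requested
```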
<h3>Compressing and combining with django-compressor</h3>
<p>WhiteNoise already compresses static files, so <code>django-compressor</code> is optional. But the latter offers an additional enhancement: combining the files. To use compressor with WhiteNoise we have to take a few extra steps.</p>
<p>Let's say the user loads an HTML document that links three <code>.css</code> files:</p>
<pre><code>&lt;head&gt;
  &lt;link rel="stylesheet" href="base.css" type="text/css" media="all" /&gt;
  &lt;link rel="stylesheet" href="additions.css" type="text/css" media="all" /&gt;
  &lt;link rel="stylesheet" href="new_components.css" type="text/css" media="all" /&gt;
&lt;/head&gt;
</code></pre>
<p>Your browser will make three different requests to these locations. In many scenarios it's more effective to combine these different files when deploying, and <code>django-compressor</code> does that with its <code>{% compress css %}</code> template tag:</p>
<p>This:</p>
<pre><code>{% load compress %}
&lt;head&gt;
  {% compress css %}
    &lt;link rel="stylesheet" href="base.css" type="text/css" media="all"&gt;
    &lt;link rel="stylesheet" href="additions.css" type="text/css" media="all"&gt;
    &lt;link rel="stylesheet" href="new_components.css" type="text/css" media="all"&gt;
  {% endcompress %}
&lt;/head&gt;
</code></pre>
<p>Becomes:</p>
<pre><code>&lt;head&gt;
  &lt;link rel="stylesheet" href="https://thebiglog.comcombined.css" type="text/css" media="all" /&gt;
&lt;/head&gt;
</code></pre>
<p>Let's go over the steps to make <code>django-compressor</code> and WhiteNoise play well. Install:</p>
<pre><code>pip install django_compressor
</code></pre>
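<p>django-compressor's installation docs also have you register the app. Because we use <code>django.contrib.staticfiles</code>, you also need to add its file finder. Here is a sketch of the relevant settings. Merge these entries into your existing lists rather than replacing them; the first two finders are Django's defaults:</p>

```python
# settings.py: register django-compressor and its static-files finder
INSTALLED_APPS = [
    # ... your other apps
    "django.contrib.staticfiles",
    "compressor",
]

STATICFILES_FINDERS = [
    "django.contrib.staticfiles.finders.FileSystemFinder",
    "django.contrib.staticfiles.finders.AppDirectoriesFinder",
    "compressor.finders.CompressorFinder",
]
```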
<p>Tell compressor where to look for static files:</p>
<pre><code>COMPRESS_STORAGE = "compressor.storage.GzipCompressorFileStorage"
COMPRESS_ROOT = os.path.abspath(STATIC_ROOT)
</code></pre>
<p>Because of the way these two libraries intercept the request-response cycle, they're incompatible with their default configurations. We can overcome this by modifying some settings.</p>
<p>I prefer to use environment variables in <code>.env</code> files and have one Django <code>settings.py</code>, but if you have <code>settings/dev.py</code> and <code>settings/prod.py</code>, you'll know how to convert these values:</p>
<p><code>main_project/settings.py</code>:</p>
<pre><code>from decouple import config
#...

COMPRESS_ENABLED = config("COMPRESS_ENABLED", cast=bool)
COMPRESS_OFFLINE = config("COMPRESS_OFFLINE", cast=bool)
</code></pre>
<p><code>COMPRESS_OFFLINE</code> is <code>True</code> in production and <code>False</code> in development. <code>COMPRESS_ENABLED</code> is <code>True</code> in both[^fn-1-compress].</p>
<p>With offline compression, you must run <code>manage.py compress</code> on every deployment. Heroku runs <code>collectstatic</code> for you by default. You'll want to turn this off (setting the <code>DISABLE_COLLECTSTATIC=1</code> config var does that) and run it yourself in the <code>post_compile</code> hook, which Heroku runs when you deploy. If you don't already have one, create a folder called <code>bin</code> at the root of your project, and inside it a file called <code>post_compile</code> with the following:</p>
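<pre><code>#!/usr/bin/env bash
python manage.py collectstatic --noinput
# Build the offline-compressed bundles
python manage.py compress --force
# Collect again so WhiteNoise's storage also processes the generated bundles
python manage.py collectstatic --noinput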
</code></pre>
<p>Another nice thing about compressor is that it can compile SCSS/Sass files to CSS, here using the <code>django-libsass</code> package as a precompiler:</p>
<pre><code>COMPRESS_PRECOMPILERS = (
    ("text/x-sass", "django_libsass.SassCompiler"),
    ("text/x-scss", "django_libsass.SassCompiler"),
)
</code></pre>
<h3>Minifying CSS &amp; JS</h3>
<p>Another important technique for reducing load times and bandwidth usage is minifying: (automatically) shrinking your code's file size by eliminating whitespace and removing comments.</p>
<p>There are several approaches to take here, but if you're using <code>django-compressor</code> specifically, you get that for free as well. You just need to add the following (or any other filters compressor supports) to your <code>settings.py</code> file:</p>
<pre><code>COMPRESS_FILTERS = {
    "css": [
        "compressor.filters.css_default.CssAbsoluteFilter",
        "compressor.filters.cssmin.rCSSMinFilter",
    ],
    "js": ["compressor.filters.jsmin.JSMinFilter"],
}
</code></pre>
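<p>Here is a deliberately naive, regex-based sketch of what a minifier does. It is not how <code>rCSSMinFilter</code> works internally. It would mangle strings or <code>url()</code> values that contain these characters. It also leaves colons alone, so selectors like <code>a :hover</code> keep their meaning. Use a real minifier in production.</p>

```python
import re


def naive_minify_css(css: str) -> str:
    """Strip comments and collapse whitespace (illustration only)."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # remove /* ... */ comments
    css = re.sub(r"\s+", " ", css)  # collapse runs of whitespace into one space
    css = re.sub(r"\s*([{};,>])\s*", r"\1", css)  # drop spaces around these characters
    css = css.replace(";}", "}")  # the last semicolon in a block is optional
    return css.strip()


source = """
/* Album grid */
.album-artwork {
    width: 100%;
    border-radius: 4px;
}

.album-artwork:hover {
    opacity: 0.8;
}
"""
print(naive_minify_css(source))
# .album-artwork{width: 100%;border-radius: 4px}.album-artwork:hover{opacity: 0.8}
```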
<h3>Defer-loading JavaScript</h3>
<p>Another thing that contributes to slower performance is loading external scripts. The gist of it is that browsers will try to fetch and execute JavaScript files in the <code>&lt;head&gt;</code> tag as they are encountered <em>and before</em> parsing the rest of the page:</p>
<pre><code>&lt;html&gt;
  &lt;head&gt;
    &lt;script src="https://will-block.js"&gt;&lt;/script&gt;
    &lt;script src="https://will-also-block.js"&gt;&lt;/script&gt;
  &lt;/head&gt;
&lt;/html&gt;
</code></pre>
<p>We can use the <code>async</code> and <code>defer</code> attributes to mitigate this:</p>
<pre><code>&lt;html&gt;
  &lt;head&gt;
    &lt;script async src="https://thebiglog.comsomelib.somecdn.js"&gt;&lt;/script&gt;
  &lt;/head&gt;
&lt;/html&gt;
</code></pre>
<p><code>async</code> and <code>defer</code> both let the script be fetched in parallel without blocking HTML parsing. One of the key differences between them is <em>when</em> the script is allowed to execute. With <code>async</code>, parsing is paused as soon as the script has been downloaded and resumes once the script has finished executing. With <code>defer</code>, scripts are executed in document order, only after all HTML has been parsed.</p>
<p>I suggest referring to <a href="https://flaviocopes.com/javascript-async-defer/">Flavio Copes' article</a> on the <code>defer</code> and <code>async</code> attributes. Its general conclusion is:</p>
<blockquote>
<p>The best thing to do to speed up your page loading when using scripts is to put them in the <code>head</code>, and add a <code>defer</code> attribute to your <code>script</code> tag.</p>
</blockquote>
<h3>Lazy-loading images</h3>
<p>Lazily loading images means that we only request them when, or a little before, they enter the client's (user's) viewport. It saves your users time and bandwidth, which also means money on metered cellular plans. With excellent, dependency-free JavaScript libraries like <a href="https://github.com/verlok/lazyload">LazyLoad</a>, there really isn't an excuse not to lazy-load images. Moreover, Google Chrome has natively supported the <code>loading="lazy"</code> attribute since version 76.</p>
<p>Using the aforementioned LazyLoad is fairly simple and the library is very customizable. In my own app, I want it to apply on images only if they have a <code>lazy</code> class, and start loading an image 300 pixels before it enters the viewport:</p>
<pre><code>$(document).ready(function (e) {
  new LazyLoad({
    elements_selector: ".lazy", // CSS selector of the images to lazy-load
    threshold: 300, // start loading 300px before an image enters the viewport
  });
});
</code></pre>
<p>Now let's try it with an existing image:</p>
<pre><code>&lt;img class="album-artwork" alt="{{ album.title }}"  src="https://thebiglog.com{{ album.image_url }}"&gt;
</code></pre>
<p>We replace the <code>src</code> attribute with <code>data-src</code> and add <code>lazy</code> to the class attribute:</p>
<pre><code>&lt;img class="album-artwork lazy" alt="{{ album.title }}"  data-src="https://thebiglog.com{{ album.image_url }}"&gt;
</code></pre>
<p>Now the client will request this image only once it comes within 300 pixels of the viewport.</p>
<p>If you have many images on certain pages, using lazy-loading will dramatically improve your load times.</p>
<h3>Optimize &amp; dynamically scale images</h3>
<p>Another thing to consider is image-optimization. Beyond compression, there are two more techniques to consider here.</p>
<p>First, file-format optimization. Newer formats like <code>WebP</code> are reportedly 25-30% smaller than your average <code>JPEG</code> image at the same quality. As of 02/2020, WebP has <a href="https://caniuse.com/#feat=webp">decent but incomplete</a> browser support, so you'll have to provide a standard-format fallback if you want to use it.</p>
<p>Second, serving different image sizes to different screen sizes. If a mobile device has a maximum viewport width of 650px, why serve it the same 1050px image you're displaying on a 13″ 2560px Retina display?</p>
<p>Here, too, you can choose the level of granularity and customization that suits your app. For simpler cases, you can use the <code>srcset</code> attribute to control sizing and be done with it. If you're also serving <code>WebP</code> with <code>JPEG</code> fallbacks for the same image, for example, you can use the <code>&lt;picture&gt;</code> element with multiple sources and source sets.</p>
<p>If this sounds as complicated to you as it does to me, <a href="https://dev.to/jsco/a-comprehensive-guide-to-responsive-images-picture-srcset-source-etc-4adj">this guide</a> should help explain the terminology and use cases.</p>
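<p>If you build the <code>srcset</code> attribute server-side, for example in a view or a custom template tag, a small helper keeps it tidy. This sketch assumes you already have resized copies of each image at predictable URLs. <code>build_srcset()</code>, the URL pattern, and the widths are all hypothetical.</p>

```python
from typing import Iterable


def build_srcset(url_pattern: str, widths: Iterable[int]) -> str:
    """Build a srcset value such as 'img-320.jpg 320w, img-640.jpg 640w'.

    `url_pattern` is a format string with a `{width}` placeholder that points
    at pre-generated resized copies of the same image (an assumption).
    """
    return ", ".join(f"{url_pattern.format(width=w)} {w}w" for w in sorted(widths))


srcset = build_srcset("/media/artwork/album-42-{width}.jpg", [1050, 650, 320])
print(srcset)
```

<p>The resulting string goes into the image's <code>srcset</code> attribute, usually together with a <code>sizes</code> attribute that tells the browser how wide the image will be displayed.</p>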
<h3>Unused CSS: Removing imports</h3>
<p>If you're using a CSS framework like Bootstrap, don't just include all of its components blindly. In fact, I would start with commenting out all of the non-essential components and only add those gradually as the need arises. Here's a snippet of my <code>bootstrap.scss</code>, where all of its different parts are imported:</p>
<pre><code>// ...

// Components
// ...
@import "bootstrap/dropdowns";
@import "bootstrap/button-groups";
@import "bootstrap/input-groups";
@import "bootstrap/navbar";
// @import "bootstrap/breadcrumbs";
// @import "bootstrap/badges";
// @import "bootstrap/jumbotron";

// Components w/ JavaScript
@import "bootstrap/modals";
@import "bootstrap/tooltip";
@import "bootstrap/popovers";
// @import "bootstrap/carousel";
</code></pre>
<p>I don't use things like <code>badges</code> or <code>jumbotron</code> so I can safely comment those out.</p>
<h3>Unused CSS: Purging CSS with PurgeCSS</h3>
<p>A more aggressive and more complicated approach is using a library like <a href="https://github.com/FullHuman/purgecss">PurgeCSS</a>, which analyzes your files, detects CSS content that's not in use, and removes it. PurgeCSS is an NPM package, so if you're hosting Django on Heroku, you'll need to install the Node.js buildpack side-by-side with your Python one.</p>
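<p>The core idea behind purging fits in a few lines of Python: collect the class names your templates use, collect the class selectors your CSS defines, and report the difference. This is a naive, regex-based illustration with hypothetical helper names, not how PurgeCSS works internally. For instance, it would misread file extensions inside <code>url()</code> as class names. It would also wrongly flag classes that JavaScript or template logic adds dynamically, which is why PurgeCSS offers a <code>safelist</code> option.</p>

```python
import re
from typing import Set

CLASS_ATTR = re.compile(r"""class\s*=\s*["']([^"']*)["']""")
CLASS_SELECTOR = re.compile(r"\.(-?[_a-zA-Z][_a-zA-Z0-9-]*)")


def used_classes(html: str) -> Set[str]:
    """Collect every class name that appears in a class="..." attribute."""
    classes: Set[str] = set()
    for value in CLASS_ATTR.findall(html):
        classes.update(value.split())
    return classes


def unused_class_selectors(css: str, html: str) -> Set[str]:
    """Class selectors defined in the CSS that never appear in the HTML."""
    defined = set(CLASS_SELECTOR.findall(css))
    return defined - used_classes(html)
```

<p>For example, take CSS that defines <code>.album-artwork</code>, <code>.badge</code>, and <code>.jumbotron</code>, and a template whose only image uses <code>class="album-artwork lazy"</code>. Here, <code>unused_class_selectors()</code> reports <code>badge</code> and <code>jumbotron</code> as candidates for removal.</p>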
<h2>Conclusion</h2>
<p>I hope you've found at least one area where you can make your Django app faster. If you have any questions, suggestions, or feedback don't hesitate to <a href="https://twitter.com/SHxKM">drop me a line on Twitter</a>.</p>
<h2>Appendices</h2>
<h3>Decorator used for QuerySet performance analysis</h3>
<p>Below is the code for the <code>django_query_analyze</code> decorator:</p>
<pre><code>from functools import wraps
from timeit import default_timer as timer

from django.db import connection, reset_queries


def django_query_analyze(func):
    """Decorator to perform analysis on Django queries.

    Note: connection.queries is only populated when settings.DEBUG is True.
    """

    @wraps(func)
    def wrapper(*args, **kwargs):
        timings = []
        query_counts = []
        result = None
        for _ in range(20):
            reset_queries()
            start = timer()
            result = func(*args, **kwargs)
            end = timer()
            timings.append(end - start)
            query_counts.append(len(connection.queries))
        reset_queries()

        print()
        print(f"ran function {func.__name__}")
        print("-" * 20)
        print(f"number of queries: {int(sum(query_counts) / len(query_counts))}")
        print(f"Time of execution: {min(timings):.5f}s")  # fastest of the 20 runs
        print()
        # Return the last run's result instead of calling func a 21st time
        return result

    return wrapper
</code></pre>
<p>[^fn-1-compress]: it's still useful to hold this boolean in the environment</p>
]]></content:encoded>
    </item>
    <item>
      <title>How to scaffold Django projects with Cookiecutter</title>
      <link>https://thebiglog.com/posts/django-how-to-scaffold-cookiecutter/</link>
      <guid isPermaLink="true">https://thebiglog.com/posts/django-how-to-scaffold-cookiecutter/</guid>
      <pubDate>Sun, 28 Jul 2019 00:00:00 GMT</pubDate>
      <content:encoded><![CDATA[
<p>This post is a guide on how to scaffold (quick-start) new projects efficiently with <a href="https://github.com/cookiecutter/cookiecutter">Cookiecutter</a>, a library that creates projects from project-templates. It outlines how I created my own Django cookiecutter, <a href="https://github.com/SHxKM/django-scaffold-cookiecutter">Scaffold Django X</a>, but the same can be applied to Flask and pretty much any other Python project.</p>
<p>While working on some Django articles, I found myself needing to start new projects more often than usual. It can get tedious: initializing a project, filling in boilerplate settings, adding template files and directories… the list goes on.</p>
<p>Sure, you can duplicate an existing project. But then too much time is spent on removing unneeded files, figuring out why the new project isn't running (you forgot to remove a file or a line), and fiddling with settings. Might as well just start from scratch.</p>
<p>Or, use Cookiecutter:</p>
<p>&lt;div class="vidwrap"&gt;
&lt;div class="space-x-4"&gt;
&lt;a href="https://thebiglog.com#" id="button-minified-demo" onclick="var v = document.getElementById('vid-minified-demo'); if (v.paused) v.play(); else v.pause(); return false;" class="underline"&gt;Play&lt;/a&gt;
&lt;a target="_blank" href={demoVideo} class="underline"&gt;Full Screen&lt;/a&gt;
&lt;/div&gt;
&lt;video src={demoVideo} playsinline muted class="playable w-full mt-0" id="vid-minified-demo" onclick="if (this.paused) this.play(); else this.pause();" onplay="document.getElementById('button-minified-demo').textContent = 'Pause';" onpause="document.getElementById('button-minified-demo').textContent = 'Play'"&gt;
&lt;/video&gt;
&lt;/div&gt;</p>
<h2>What is Cookiecutter?</h2>
<p><code>Cookiecutter</code> is a command-line utility that creates projects from project-templates, aptly called cookiecutters. It allows for dynamic insertion of content within files and inclusion/exclusion of the files themselves in a way that makes project generation flexible and convenient.</p>
<p>&lt;div class="notice"&gt;
&lt;div class="notice-content"&gt;
&lt;p&gt;Cookiecutter (the library) shares the same name with what it can be used to generate, a cookiecutter. This may be confusing at times. Keep in mind that Cookiecutter the library is written with a big C.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;</p>
<p>After you install Cookiecutter, you can point it at any local or remote project template and choose how to configure your project from a set of options predefined by the template's author. Then you're ready to go. Want to use Celery? It's automatically included in the <code>requirements.txt</code> file, its relevant configs are added to Django's <code>settings.py</code>, and Celery-specific files are already in the generated project's tree. Using Django as a backend only? A cookiecutter can remove the <code>static</code> and <code>templates</code> folders for you.</p>
<p>The most popular Django-related Cookiecutter project is <a href="https://github.com/pydanny/cookiecutter-django">Cookiecutter Django</a> by Daniel Roy Greenfeld. It's very customizable and includes a long list of options and features. I found Cookiecutter Django too opinionated (and a bit of overkill) for my use case, so I created my own cookiecutter: <a href="https://github.com/SHxKM/django-scaffold-cookiecutter">Scaffold Django X</a>.</p>
<h2>Generating a project from a cookiecutter</h2>
<p>Before creating a cookiecutter, it's first worth understanding how you'd generate a project from an existing one.</p>
<p>Because you would want to be able to use it from any folder, it's a good idea to install Cookiecutter in your global/main Python environment:</p>
<pre><code>$ pip install cookiecutter
</code></pre>
<p>To scaffold a project from a local cookiecutter, open a terminal and navigate to the folder where you want your actual project to live. Then type <code>cookiecutter</code> followed by the path to the cookiecutter you want to base your project on.</p>
<p>To create a project in <code>~/my-projects/</code> based on the cookiecutter/template called <code>simple-django-cookiecutter</code>, you go to <code>~/my-projects/</code>:</p>
<pre><code>$ cd ~/my-projects/
</code></pre>
<p>This is where Cookiecutter will place the generated project folder. You then invoke <code>cookiecutter</code>, specifying the path to the project template:</p>
<pre><code>$ cookiecutter ~/code/my-cookiecutters/simple-django-cookiecutter/
</code></pre>
<p>You will then be prompted to fill in some details that will be used to populate your project with the relevant configurations. These configuration variables and their default values are derived from a special kind of file, <code>cookiecutter.json</code>, which is covered later in this guide.</p>
<p>For example, <a href="https://github.com/SHxKM/django-scaffold-cookiecutter/tree/master">the simple cookiecutter that I've created</a> prompts for the following options when invoked:</p>
<pre><code>project_slug [open_folder]:

project_name [Open Folder]:

description [A very nice weblog]:

author_name [SKM]:

author_email [openfolder@example.com]:

include_jquery_cdn [n]:

Select css_framework:
1 - tailwindcss
2 - bootstrap
3 - none

Choose from 1, 2, 3 (1, 2, 3) [1]:
</code></pre>
<p>If the <code>project_slug</code> is provided as <code>music_app</code>, then this is what the project's root folder will be called. The <code>description</code> will automatically go in the website's meta tags. <code>include_jquery_cdn</code> is handled in a similar fashion: if <code>y</code> is provided instead of the default <code>n</code>, a <code>&lt;script&gt;</code> tag loading jQuery from its CDN is inserted in the project's <code>base.html</code>. Scaffold Django X also links empty <code>main.css</code> and <code>main.js</code> files in <code>base.html</code> so that they're ready to use.</p>
<p>At the end of this process, a <code>music_app</code> folder is placed under <code>~/my-projects</code> and you can start developing the preconfigured Django project.</p>
<p>It's also possible to clone remote cookiecutters. From GitHub, for example:</p>
<pre><code>cookiecutter https://github.com/SHxKM/django-scaffold-cookiecutter
</code></pre>
<p>This is how one would generate a project from a template. Next: how to build the template itself.</p>
<h2>How to create a cookiecutter: the basics</h2>
<p>The minimum requirement for a valid project-template is that it contains a <code>cookiecutter.json</code> file at its root folder. This file is used to define the different variables the user has to fill or choose from during the generation stage. It also sets an overridable default for each variable:</p>
<pre><code>{
  "some_variable": "some_default_value",
  "project_slug": "open_folder",
  "project_name": "Open Folder",
  "description": "A very nice weblog",
  "author_name": "SKM",
  "author_email": "openfolder@example.com",
  "include_jquery_cdn": "n",
  "css_framework": ["tailwindcss", "bootstrap", "none"]
}
</code></pre>
<p>These are key-value pairs, with each value denoting the default to use. If a user simply hits Enter when prompted for the <code>project_name</code>, it will be <code>Open Folder</code>. If a list is used, like in <code>css_framework</code> above, Cookiecutter will present a numbered choice prompt for that option, with the first item as the default.</p>
<p>Once the user has answered all prompts, Cookiecutter renders the template's files, replacing each reference to one of the keys above with the value the user chose. But where and how are these values used?</p>
<h3>Variables in filenames and folders</h3>
<p>Here's the root folder of an example cookiecutter:</p>
<pre><code>.
├── cookiecutter.json
└── {{ cookiecutter.project_slug }}
    ├── Pipfile
    ├── manage.py
    ├── static
    ├── templates
    └── {{ cookiecutter.project_slug }}
</code></pre>
<p>So for one, filenames and directories can themselves be variables. The Django project lives under the directory <code>{{ cookiecutter.project_slug }}</code>. The directory is named this way because Cookiecutter is going to dynamically rename it when the project is generated. You may be familiar with this curly-brace notation as Cookiecutter uses the same Jinja2 templating engine that Django supports.</p>
<h3>Variables in HTML files</h3>
<p>Here's a snippet from <code>base.html</code>:</p>
<pre><code>{# base.html #}
&lt;head&gt;
...

  &lt;title&gt;{{ cookiecutter.project_name }}&lt;/title&gt;
  &lt;meta name="description" content="{{ cookiecutter.description }}"&gt;
...
&lt;/head&gt;
</code></pre>
<p>When the project is generated, <code>{{ cookiecutter.project_name }}</code> is simply replaced with the name provided by the user in the Terminal.</p>
<p>The fact that Cookiecutter uses the same syntax as Django templates can create a problem if it tries to parse Django's own tags, like <code>{% static %}</code> or <code>{% url %}</code>. You can escape these with the <code>{% raw %}</code> and <code>{% endraw %}</code> tags:</p>
<pre><code>{% raw %}{% load static %}{% endraw %}
</code></pre>
<p>For conditionals, like whether to include the jQuery CDN, an <code>if</code> block is employed:</p>
<pre><code>{%- if cookiecutter.include_jquery_cdn == "y" -%}
  &lt;script src="https://code.jquery.com/jquery-3.4.1.min.js"&gt;&lt;/script&gt;
{%- endif %}
</code></pre>
<h3>Variables in Python files</h3>
<p>That's basically the gist of it for template files, but the same logic can be used in Python files. Here's a snippet from Django's <code>settings.py</code>:</p>
<pre><code># settings.py
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    {%- if cookiecutter.css_framework == 'tailwindcss' -%}
    "tailwind",
    "theme",
    {%- endif %}
]
</code></pre>
<p>The above isn't valid Python, but the template tags are removed when the project is generated. If the user picks <a href="https://tailwindcss.com/">Tailwind CSS</a> as their CSS framework, we need to include some extra lines (apps) in <code>INSTALLED_APPS</code>.</p>
<p>Here's how you'd access a cookiecutter's variable inside a Python file:</p>
<pre><code>project_slug = "{{ cookiecutter.project_slug }}"
</code></pre>
<h3>Variables in…cookiecutter.json</h3>
<p>We can make our <code>cookiecutter.json</code> smarter by deriving <code>project_slug</code>'s default value from <code>project_name</code>[^1]:</p>
<pre><code>{
  "project_name": "Open Folder",
  "project_slug": "{{ cookiecutter.project_name.lower()|replace(' ', '_')|replace('-', '_')|replace('.', '_')|trim() }}"
}
</code></pre>
<p>This way, if the user enters "My App" as their <code>project_name</code>, the default value for <code>project_slug</code> becomes <code>my_app</code>. The user can then simply hit enter to use this value for the slug, or override it as they wish.</p>
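<p>It can help to see that expression as plain Python, for instance to test your slug rules outside Cookiecutter. The hypothetical <code>default_slug()</code> below mirrors the Jinja filters one-to-one, with <code>str.strip()</code> playing the role of <code>trim()</code>:</p>

```python
def default_slug(project_name: str) -> str:
    """Python equivalent of the Jinja expression used for project_slug's default."""
    return (
        project_name.lower()
        .replace(" ", "_")
        .replace("-", "_")
        .replace(".", "_")
        .strip()  # Jinja's trim() strips leading/trailing whitespace
    )


print(default_slug("My App"))  # my_app
print(default_slug("Music-Butler.io"))  # music_butler_io
```

<p><code>trim()</code> runs after the replacements, so a leading or trailing space has already become an underscore by the time it is applied (<code>" My App"</code> yields <code>_my_app</code>).</p>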
<h2>Ignoring files when parsing</h2>
<p>We can tell Cookiecutter not to render (parse) certain directories or files:</p>
<pre><code>{
  "project_slug": "open_folder",
  "project_name": "Open Folder",
  "_copy_without_render": ["theme"]
}
</code></pre>
<p><code>_copy_without_render</code> tells Cookiecutter to copy files "as-is" without attempting to render (parse) them. In the case above, <code>theme</code> is a folder with a package that integrates the Tailwind CSS framework into Django. It contains third-party files that should remain untouched, even if they include curly braces like <code>{{ }}</code> that Cookiecutter would otherwise try to render.</p>
<h2>Pre/Post-generate hooks</h2>
<p>Cookiecutter also supports pre- and post-generation hooks. These are regular Python files that run before or after the project is generated, named <code>pre_gen_project.py</code> and <code>post_gen_project.py</code>, respectively. You place them inside a directory named <code>hooks</code> at the root of the template, next to <code>cookiecutter.json</code>:</p>
<pre><code>├── cookiecutter.json
├── hooks
│   ├── post_gen_project.py
│   └── pre_gen_project.py
└── {{ cookiecutter.project_slug }}
    ├── Pipfile
    ├── manage.py
    ├── static
    ├── templates
    ...
</code></pre>
<p>These generation hooks can be extremely useful when files (not just lines of code) should be added/removed dynamically depending on the user input. Below are examples of how each of these hooks can be useful.</p>
<h3>Pre-generation hooks</h3>
<p>If you want to validate that the <code>project_slug</code> given by the user is all lower-case, you can create a file <code>hooks/pre_gen_project.py</code> and include the following:</p>
<pre><code># hooks/pre_gen_project.py
project_slug = "{{ cookiecutter.project_slug }}"

assert (
    project_slug == project_slug.lower()
), f"{project_slug} project slug should be all lowercase"
</code></pre>
<p>Before Cookiecutter attempts to parse the project files, it will run <code>pre_gen_project.py</code> and if the user provided a slug that isn't all lowercase, the assertion will fail. The project isn't generated at all and an appropriate error message is displayed.</p>
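<p>The assertion above only checks case. Since the slug also names a Python package directory, you may want a stricter check. The sketch below assumes slugs should be lowercase, valid Python identifiers. <code>slug_error()</code> and <code>validate()</code> are hypothetical helpers. Exiting with a non-zero status is the approach Cookiecutter's docs describe for aborting generation from a hook.</p>

```python
import keyword
import sys
from typing import Optional


def slug_error(slug: str) -> Optional[str]:
    """Return an error message if `slug` can't be used as a package name, else None."""
    if slug != slug.lower():
        return f"{slug!r} should be all lowercase"
    if not slug.isidentifier() or keyword.iskeyword(slug):
        return f"{slug!r} is not a valid Python package name"
    return None


def validate(slug: str) -> None:
    """Print the problem and exit non-zero, which makes Cookiecutter abort generation."""
    error = slug_error(slug)
    if error:
        print(f"ERROR: {error}")
        sys.exit(1)
```

<p>In <code>hooks/pre_gen_project.py</code>, you would call <code>validate("{{ cookiecutter.project_slug }}")</code> at module level, after the definitions.</p>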
<h3>Post-generation hooks</h3>
<p>We can do some interesting things in <code>post_gen_project.py</code> as well. Remember the <code>theme</code> folder mentioned earlier? It contains necessary files and modules for the third-party package <code>django-tailwind</code>. But if the user chose <code>bootstrap</code> or <code>none</code>, we don't need this directory anymore:</p>
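<pre><code># hooks/post_gen_project.py
import os
import shutil


def remove_tailwind_folder():
    # Cookiecutter runs hooks from the root of the generated project,
    # so this relative path points into the newly generated project
    theme_dir_path = "theme"
    if os.path.exists(theme_dir_path):
        shutil.rmtree(theme_dir_path)


# ...


def main():
    # the template variable is rendered to the user's answer before the hook runs
    if "{{cookiecutter.css_framework}}".lower() != "tailwindcss":
        remove_tailwind_folder()


if __name__ == "__main__":
    main()
</code></pre>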
<p><code>main()</code> checks whether the user chose Tailwind CSS and, if not, calls <code>remove_tailwind_folder()</code>, which deletes the <code>theme</code> folder. Hook files are rendered like any other template file, so we have access to project variables inside them:</p>
<pre><code># key:   cookiecutter.css_framework
# value: whatever the user chose, substituted in before the hook runs
if "{{cookiecutter.css_framework}}".lower() != "tailwindcss":
</code></pre>
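<p>Once rendered, the hook is plain Python, so its logic can be tried out without running Cookiecutter at all. The sketch below is a hypothetical refactor for local experimentation. It passes the project directory and the chosen framework in as arguments and runs both branches against temporary directories. <code>post_gen()</code> and <code>make_fake_project()</code> are illustrative names, not Cookiecutter APIs.</p>
<pre><code>import os
import shutil
import tempfile


def remove_tailwind_folder(project_dir):
    # Same cleanup as the hook, but the project root is passed in
    # instead of relying on the current working directory.
    theme_dir_path = os.path.join(project_dir, "theme")
    if os.path.exists(theme_dir_path):
        shutil.rmtree(theme_dir_path)


def post_gen(project_dir, css_framework):
    # css_framework stands in for the rendered "{{cookiecutter.css_framework}}"
    if css_framework.lower() != "tailwindcss":
        remove_tailwind_folder(project_dir)


def make_fake_project():
    # Create a throwaway project directory containing an empty theme/ folder.
    project_dir = tempfile.mkdtemp()
    os.mkdir(os.path.join(project_dir, "theme"))
    return project_dir


bootstrap_project = make_fake_project()
post_gen(bootstrap_project, "bootstrap")
print(os.path.exists(os.path.join(bootstrap_project, "theme")))  # False

tailwind_project = make_fake_project()
post_gen(tailwind_project, "TailwindCSS")
print(os.path.exists(os.path.join(tailwind_project, "theme")))  # True

# clean up the throwaway directories
for project_dir in (bootstrap_project, tailwind_project):
    shutil.rmtree(project_dir)
</code></pre>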
<h2>Conclusion</h2>
<p>Cookiecutter can cut project generation time significantly. For more complex boilerplate, setup time can be reduced by more than 90%. As always, the package's <a href="https://cookiecutter.readthedocs.io/en/latest/index.html">docs site</a> is a good place to start if there's something you're unsure of.</p>
<p>[^1]: (full-credit for this to the aforementioned <a href="https://github.com/pydanny/cookiecutter-django">Cookiecutter Django</a>)</p>
]]></content:encoded>
    </item>
    <item>
      <title>Eliminating indentation by returning early</title>
      <link>https://thebiglog.com/posts/returning-early/</link>
      <guid isPermaLink="true">https://thebiglog.com/posts/returning-early/</guid>
      <pubDate>Fri, 28 Jun 2019 00:00:00 GMT</pubDate>
      <content:encoded><![CDATA[<p>Returning early is a fairly basic but useful technique and it's one that I've only adopted relatively late in my Python journey. <a href="https://www.python.org/dev/peps/pep-0020/">The Zen of Python</a> states that "flat is better than nested" and returning early can definitely make a noticeable difference in this regard.</p>
<p>Consider the following function:</p>
<pre><code>def make_odd_even(number: int) -&gt; int:
    if number % 2 != 0:
        return number + 1
    else:
        return number
</code></pre>
<p>Given an integer, <code>make_odd_even()</code> converts an odd number to an even one. It first verifies that it's odd, and then adds 1 to it, making it even. If the number is already even, it's returned as is.</p>
<p>Here's another, shorter way to write it:</p>
<pre><code>def make_odd_even_v2(number: int) -&gt; int:
    if number % 2 != 0:  # check if odd
        return number + 1
    return number
</code></pre>
<p>The <code>else</code> clause is omitted and becomes implicit because we know that if the number isn't odd then it’s surely even. There is no third option. Considering that <code>return</code> always stops any further code from being executed, we also know that if the number is odd, the second return statement is never reached. Same result, shorter code.</p>
<p>Another way to write the function is to flip the <code>if</code> clause check:</p>
<pre><code>def make_odd_even_v3(number: int) -&gt; int:
    if number % 2 == 0:  # check if even
        return number
    return number + 1
</code></pre>
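<p>It's hard to see when the code is this trivial, but although all three versions achieve the same result, only <code>make_odd_even_v3()</code> is an example of a returning early function: it handles the case that needs no work (an even number) first and exits right away.</p>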
<h2>What is returning early?</h2>
<p>Returning early is the practice of first checking for one or more "invalid"/terminating conditions, usually at the beginning of the code, and halting the execution if any of these conditions is satisfied.</p>
<p>That's a mouthful.</p>
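<p>Stripped down to its essentials, the pattern looks like the sketch below. Each check handles one invalid case and leaves the function immediately, so the happy path sits unindented at the bottom. <code>safe_ratio()</code> is a made-up illustration, not code from this post's examples:</p>
<pre><code>def safe_ratio(numerator, denominator):
    # Guard 1: bail out if either argument isn't a number.
    numbers = (int, float)
    if not isinstance(numerator, numbers) or not isinstance(denominator, numbers):
        return None
    # Guard 2: bail out on division by zero.
    if denominator == 0:
        return None
    # Happy path: every invalid case has already returned.
    return numerator / denominator


print(safe_ratio(6, 3))    # 2.0
print(safe_ratio(1, 0))    # None
print(safe_ratio("6", 3))  # None
</code></pre>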
<p>&lt;div class="notice"&gt;
&lt;div class="notice-content"&gt;
&lt;p&gt;In programming-speak, returning early is known as &lt;a href="https://en.wikipedia.org/wiki/Guard_(computer_science)"&gt;guard&lt;/a&gt; or guard-code. Thanks to reddit user &lt;a href="https://www.reddit.com/user/novel_yet_trivial"&gt;novel_yet_trivial&lt;/a&gt; for pointing this out.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;</p>
<p>Here's a more involved example: suppose we want to write a function that downloads media (an image or a video) from a tweet and then uploads it from our local machine to an FTP server. This function should receive one parameter, <code>tweet_url</code>, and if all goes well, it should return a URL to the uploaded media file.</p>
<p>Here's one possible implementation:</p>
<p>&lt;div class="notice"&gt;
&lt;div class="notice-content"&gt;
&lt;p&gt;The function below calls other helper functions and raises custom exceptions. Their implementations are beside the point of this article and are therefore omitted.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;</p>
<pre><code>import ftputil.error


def upload_tweet_media(tweet_url: str) -&gt; str:
    if check_valid_tweet(tweet_url):
        local_path = download_media_from_tweet(tweet_url)
        if local_path:
            try:
                url_to_file = upload_to_ftp(local_path)
                return url_to_file
            except ftputil.error.FTPOSError:
                raise FTPError("Couldn't upload to FTP server")
        else:
            raise DownloadError("Couldn't download twitter media")
    else:
        raise TwitterURLError("URL is invalid")
</code></pre>
<p>We first check whether <code>tweet_url</code> is a valid URL and that it actually points to a tweet. If it does, we then attempt to download the media from this tweet using the helper function <code>download_media_from_tweet()</code> - this naughty function returns either the <code>local_path</code> to the downloaded file, or <code>None</code> if the download failed for any reason. If the download is successful, we then pass the file’s local path to <code>upload_to_ftp()</code>. Assuming all goes well, the function returns the URL to the uploaded file. For every condition check, we’re also including an <code>else</code> clause.</p>
<p>That's a lot of indentation up there. At the innermost part, we're three levels deep.</p>
<h2>Advantages of returning early</h2>
<p>How would the function above look with early-return clauses: what if it checks for the "negative", falsey, or invalid scenarios <strong>first</strong>?</p>
<pre><code>import ftputil.error


def upload_tweet_media(tweet_url: str) -&gt; str:
    if not check_valid_tweet(tweet_url):
        raise TwitterURLError("URL is invalid")

    local_path = download_media_from_tweet(tweet_url)

    if not local_path:
        raise DownloadError("Couldn't download twitter media")

    try:
        url_to_file = upload_to_ftp(local_path)
        return url_to_file
    except ftputil.error.FTPOSError:
        raise FTPError("Couldn't upload to FTP server")
</code></pre>
<p>Here, we only seemingly flipped the order in which we check for invalid conditions. In reality, the outer <code>if</code> clauses were evaluated first anyway; we just changed the way the code is laid out.</p>
<p>The function is now flatter, shorter even with the blank lines, and cleaner to the eye. As for readability, I think this change only makes our intention clearer: <em>unless</em> the URL is invalid, <em>and then unless</em> the file couldn't be downloaded, try to upload it to the FTP server. An added benefit is that our “happy-path” return value (<code>url_to_file</code>) is no longer indented three levels deep, and is clearly visible towards the end of the function.</p>
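<p>To watch each guard fire, here's a self-contained version of the early-return function. The helpers are hypothetical stubs standing in for the omitted implementations. <code>FakeUploadFailure</code> replaces <code>ftputil.error.FTPOSError</code> so the sketch runs without ftputil installed. It also chains the original exception with <code>raise ... from</code>, a small addition that isn't in the function above.</p>
<pre><code>class TwitterURLError(Exception):
    pass


class DownloadError(Exception):
    pass


class FTPError(Exception):
    pass


class FakeUploadFailure(OSError):
    """Hypothetical stand-in for ftputil.error.FTPOSError."""


def check_valid_tweet(tweet_url):
    # Stub: treat twitter.com status URLs as valid tweets.
    return tweet_url.startswith("https://twitter.com/") and "/status/" in tweet_url


def download_media_from_tweet(tweet_url):
    # Stub: tweets whose ID ends in 0 "have no media", so the download fails.
    tweet_id = tweet_url.rsplit("/", 1)[-1]
    if tweet_id.endswith("0"):
        return None
    return "/tmp/" + tweet_id + ".jpg"


def upload_to_ftp(local_path):
    # Stub: the "server" rejects files whose name starts with 9.
    filename = local_path.rsplit("/", 1)[-1]
    if filename.startswith("9"):
        raise FakeUploadFailure("upload rejected")
    return "https://files.example.com/" + filename


def upload_tweet_media(tweet_url: str):
    if not check_valid_tweet(tweet_url):
        raise TwitterURLError("URL is invalid")

    local_path = download_media_from_tweet(tweet_url)

    if not local_path:
        raise DownloadError("Couldn't download twitter media")

    try:
        return upload_to_ftp(local_path)
    except FakeUploadFailure as error:
        raise FTPError("Couldn't upload to FTP server") from error


print(upload_tweet_media("https://twitter.com/someone/status/123"))
# https://files.example.com/123.jpg
</code></pre>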
<h2>Order (sometimes) matters</h2>
<p>In the example above, the order in which we perform the checks matters. It's obvious: we shouldn't attempt to download a file if the URL isn't valid, so we first check whether the URL is invalid, and only then attempt the download.</p>
<p>However, it isn't always immediately evident that conditions are coupled. When refactoring code to return early, make sure dependent checks are still performed in the right order. You're no longer guided by the mental hints of indentation.</p>
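<p>Here is a small, made-up illustration of coupled checks. The second guard only makes sense once the first has passed, and swapping them turns a clean early return into a crash:</p>
<pre><code>def first_word_upper(text):
    # Guard 1 must come first: text.split() would fail on None.
    if text is None:
        return ""
    words = text.split()
    # Guard 2 depends on guard 1 having passed.
    if not words:
        return ""
    return words[0].upper()


def first_word_upper_wrong_order(text):
    words = text.split()  # raises AttributeError when text is None
    if text is None:      # too late: never reached for None
        return ""
    if not words:
        return ""
    return words[0].upper()


print(first_word_upper("hello world"))  # HELLO
print(repr(first_word_upper(None)))     # ''

try:
    first_word_upper_wrong_order(None)
except AttributeError as error:
    print("wrong order:", type(error).__name__)  # wrong order: AttributeError
</code></pre>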
<h2>It's only returning early if you actually return</h2>
<p>Also remember that there has to be some kind of terminating statement in your return early clauses. In the example above, these are <code>raise</code> statements, but they could have been <code>return</code>s. The important thing is to halt the execution inside these clauses.</p>
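<p>Here's what goes wrong without a terminating statement, using a hypothetical parity helper. The "guard" sets a value but doesn't leave the function, so the code below it still runs:</p>
<pre><code>def parity_buggy(number):
    if number % 2 == 0:
        result = "even"  # no return here, so execution falls through
    result = "odd"       # ...and this line overwrites the result
    return result


def parity(number):
    if number % 2 == 0:
        return "even"  # terminating statement: nothing below runs
    return "odd"


print(parity_buggy(4))  # odd (wrong)
print(parity(4))        # even
</code></pre>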
<h2>Conclusion</h2>
<p>It's not always possible and it doesn't always make sense to use early-returns. Where it does, they can eliminate multiple levels of indentation, make the code more readable, shorter, and the intention behind it clearer.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Django tutorial: as-you-type search with Ajax</title>
      <link>https://thebiglog.com/posts/django-tutorial-as-you-type-search-with-ajax/</link>
      <guid isPermaLink="true">https://thebiglog.com/posts/django-tutorial-as-you-type-search-with-ajax/</guid>
      <pubDate>Sun, 26 May 2019 00:00:00 GMT</pubDate>
      <content:encoded><![CDATA[
<p>&lt;div class="notice"&gt;
&lt;div class="notice-content"&gt;
&lt;p&gt;Updated 24/03/2022: Django 4.0.3&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;</p>
<p>This is a walkthrough tutorial on how to implement what's known as “incremental search” in a Django app. We want results to refresh (with a tiny delay) as the user types their search term. We’ll also give a visual indication that the search is running by animating the search icon.</p>
<p>The source code for this tutorial is available <a href="https://github.com/SHxKM/django-ajax-search">on Github</a>.</p>
<h2>Our app</h2>
<p>Don't worry, not another blog or to-do app. This time it's a music website that displays music albums and artists. The structure is taken from an actual web-app I've built but is simplified for this post's purposes. Here's a folder-only view:</p>
<pre><code>django-ajax-search/
├── core
│   └── migrations
├── django-ajax-search
├── static
│   └── django-ajax-search
│       └── javascript
└── templates
</code></pre>
<p>Our project's root directory is called <code>django-ajax-search</code>, and we've created an app called <code>core</code> where we'll write most of our Django-related code. Make sure it's included in <code>INSTALLED_APPS</code> in your <code>settings.py</code> file.</p>
<p>Here's the <code>models.py</code> file:</p>
<pre><code># core/models.py
from django.db import models
from django.utils import timezone


class MusicRelease(models.Model):
    title = models.CharField(max_length=560)
    release_date = models.DateField(blank=True, null=True)  # some releases don't have release-dates

    def __str__(self):
        return self.title

    @property
    def is_released(self):
        # release_date may be None, and None can't be compared with a date
        return self.release_date is not None and self.release_date &lt; timezone.now().date()


class Artist(models.Model):
    music_releases = models.ManyToManyField(MusicRelease, blank=True)
    name = models.CharField(max_length=560)

    def __str__(self):
        return f"{self.name} (release count: {self.music_releases.count()})"
</code></pre>
<p>Nothing fancy: each artist can have many music releases, and each release can have many artists. I've also added some helpful string representations to the <code>Artist</code> model.</p>
<h2>High-level overview</h2>
<p>We're going to let users search for artists in the database by name. Instead of a form with a submit button, we're going to refresh the results as the user types their query.</p>
<p>There are several moving pieces in this article so here's an outline of what we're going to do:</p>
<ol>
<li>Briefly go over how HTTP GET parameters are handled in Django views and make our view capture the user's query.</li>
<li>Make the Django view handle Ajax requests and respond to them properly with a JSON response containing the new results.</li>
<li>Use JavaScript and jQuery to send an Ajax request to our view once the user starts typing in the HTML search box. This request will include the term so the server can return relevant results.</li>
<li>Once our view returns the JSON response, our JS code will use it to change the information presented to the user without a page-refresh.</li>
</ol>
<p>Some of the concepts above will be discussed in detail and others will be covered briefly.</p>
<h2>Dependencies and additional setup</h2>
<p>Make sure jQuery (<a href="https://code.jquery.com/">CDN link</a>) is included inside the <code>head</code> tag of the <code>base.html</code> template. While they're not strictly required, I'll also be using <a href="https://getbootstrap.com/docs/4.0/getting-started/introduction/">Bootstrap 4</a> as a CSS framework and <a href="https://fontawesome.com/start">Font Awesome</a> for the search icon, which we'll make blink while the server is handling a search.</p>
<p>Another thing to verify is that the JS file is included in <code>base.html</code>:</p>
<pre><code>{# base.html #}
{% load static %}
{% block footer %}
  &lt;script type="text/javascript" src="{% static 'javascript/main.js' %}"&gt;&lt;/script&gt;
{% endblock %}
</code></pre>
<p>Again, you don't have to follow the structure religiously but if you're ever confused on where things belong, check out the <a href="https://github.com/SHxKM/django-ajax-search">Github repository</a>.</p>
<p>&lt;div class="notice"&gt;
&lt;div class="notice-content"&gt;
&lt;p&gt;While source code is shared in the Github repository as is, platform-specific styling is often omitted in the blocks below to keep them short and relatively portable. Styling isn't the point of this guide anyway.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;</p>
<h2>Artists in our database</h2>
<p>Let's also create artists to work with. The demo app is going to have three:</p>
<pre><code>Chet Faker (release count: 1)
Queen (release count: 1)
Parker Sween (release count: 0)
</code></pre>
<p>I don't know who Parker Sween is.</p>
<h2>The artists view</h2>
<p>Here's the <code>views.py</code> file:</p>
<pre><code># core/views.py
from django.shortcuts import render

from .models import Artist


def artists_view(request):
    ctx = {}
    url_parameter = request.GET.get("q")

    if url_parameter:
        artists = Artist.objects.filter(name__icontains=url_parameter)
    else:
        artists = Artist.objects.all()

    ctx["artists"] = artists

    return render(request, "artists.html", context=ctx)
</code></pre>
<p>This view is referenced like this in our <code>urls.py</code>:</p>
<pre><code># urls.py
from django.urls import path
from core import views as core_views

urlpatterns = [
    # ...
    path("artists/", core_views.artists_view, name="artists"),
]
</code></pre>
<p>So the path <code>ourapp.com/artists/</code> is going to hit this view. Let's pick it apart further.</p>
<h3>Capturing HTTP GET parameters</h3>
<p>The first thing to make sure of is that the view captures the GET parameter we're going to send. Here's the relevant line:</p>
<pre><code>def artists_view(request):
    # ...
    url_parameter = request.GET.get("q")
</code></pre>
<p>So, one way to pass information between clients and servers is using HTTP GET parameters:</p>
<pre><code>https://www.somewebsite.com/some-page?name=josh
</code></pre>
<p>When a URL like the above is requested, the server will receive the GET parameter <code>name</code> alongside its value <code>josh</code>. It's up to the server to decide what to do with this parameter, if at all.</p>
<p>&lt;div class="notice"&gt;
&lt;div class="notice-content"&gt;
&lt;p&gt;GET parameters are often referred to as query strings, URL parameters, and other combinations of the two.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;</p>
<p>In Django views, these URL GET parameters are made available in a special kind of dictionary — a QueryDict called <code>GET</code>. This QueryDict lives in the <code>request</code> object, the one every Django view accepts as its first argument. Going back to the line above:</p>
<pre><code>def artists_view(request):
    # ...
    url_parameter = request.GET.get("q")
</code></pre>
<p>This means that our view will capture a GET parameter <code>q</code>. If it isn't passed at all, <code>url_parameter</code> will be <code>None</code>. The first <code>GET</code> is the dictionary itself, and the second <code>get()</code> is just the method used to retrieve a key's value from a dictionary.</p>
<p>Some examples of URLs requested and how they would map:</p>
<pre><code>URL requested: https://ourapp.com/artists?q=Queen
url_parameter value: "Queen"

URL requested: https://ourapp.com/artists?q=Samba
url_parameter value: "Samba"

URL requested: https://ourapp.com/artists?q=Chet Faker (decoded)
url_parameter value: "Chet Faker"

URL requested: https://ourapp.com/artists/?q=Chet%20Faker (encoded)
url_parameter value: "Chet Faker"
</code></pre>
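<p>Django does this parsing for you, but the standard library's <code>urllib.parse</code> can show the same decoding outside a view. The <code>get_q()</code> helper below is a rough, hypothetical stand-in for <code>request.GET.get("q")</code>. Like a QueryDict, it keeps blank values and returns the last value when a key repeats:</p>
<pre><code>from urllib.parse import parse_qs, urlsplit


def get_q(url):
    # Parse and percent-decode the query string, then return the last
    # value for "q", or None when "q" isn't present at all.
    values = parse_qs(urlsplit(url).query, keep_blank_values=True).get("q")
    return values[-1] if values else None


print(get_q("https://ourapp.com/artists/?q=Queen"))       # Queen
print(get_q("https://ourapp.com/artists/?q=Chet%20Faker"))  # Chet Faker
print(get_q("https://ourapp.com/artists/"))               # None
</code></pre>
<p>An empty <code>?q=</code> yields an empty string rather than <code>None</code>. Both are falsy, so the view's <code>if url_parameter:</code> check treats them the same way.</p>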
<p>&lt;div class="notice"&gt;
&lt;div class="notice-content"&gt;
&lt;p&gt;You may be more used to see URL parameters directly appended to the URL path without a forward-slash, like &lt;code&gt;artists?q=Queen&lt;/code&gt; rather than &lt;code&gt;artists/?q=Queen&lt;/code&gt;. The first looks cleaner, yes, but requires some workarounds that are irrelevant to the subject at hand. In any case, both paths will resolve correctly given the above configuration.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;</p>
<h3>Case insensitive filtering</h3>
<p>Another portion to go over in <code>artists_view</code>:</p>
<pre><code># core/views.py
def artists_view(request):
    # ...
    if url_parameter:
        artists = Artist.objects.filter(name__icontains=url_parameter)
    else:
        artists = Artist.objects.all()
</code></pre>
<p>If <code>url_parameter</code> holds a non-empty string, some text was passed after <code>?q=</code>, and we want to filter for <code>Artist</code> objects whose name contains it. Using <code>icontains</code> makes the search case-insensitive. For example: if <code>url_parameter</code> is <code>KER</code>, our view will return a QuerySet containing two of our artists: Chet Fa<strong>ker</strong> and Par<strong>ker</strong> Sween. Queen won't be there.</p>
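<p>To make the matching rule concrete, here's a plain-Python approximation of what <code>name__icontains</code> checks. It is only an illustration: Django performs the filtering in the database with SQL, and the database backend decides how case-insensitivity applies to non-ASCII text.</p>
<pre><code>artist_names = ["Chet Faker", "Queen", "Parker Sween"]


def icontains_filter(names, query):
    # Rough in-memory analogue of filter(name__icontains=query):
    # keep names that contain the query, ignoring case.
    needle = query.lower()
    return [name for name in names if needle in name.lower()]


print(icontains_filter(artist_names, "KER"))  # ['Chet Faker', 'Parker Sween']
</code></pre>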
<h2>Template files</h2>
<p>Our view renders the template file <code>artists.html</code>:</p>
<pre><code>{# artists.html #}
{% extends "base.html" %}

{% block content %}
&lt;h3&gt;Artists&lt;/h3&gt;

&lt;div class="row"&gt;

  {# icon and search-box #}
  &lt;div class="col-6 align-left"&gt;
    &lt;i id="search-icon" class="fas fa-search"&gt;&lt;/i&gt;
    &lt;input id="user-input" placeholder="Search"&gt;
  &lt;/div&gt;

  {# artist-list section #}
  &lt;div id="replaceable-content" class="col-6"&gt;
    {% include 'artists-results-partial.html' %}
  &lt;/div&gt;

&lt;/div&gt;
{% endblock %}
</code></pre>
<p>The first thing to note is that this template includes another template, <code>artists-results-partial.html</code>:</p>
<pre><code>{# artists-results-partial.html #}
{% if artists %}
  &lt;ul&gt;
  {% for artist in artists %}
    &lt;li&gt;{{ artist.name }}&lt;/li&gt;
  {% endfor %}
  &lt;/ul&gt;
{% else %}
  &lt;p&gt;No artists found.&lt;/p&gt;
{% endif %}
</code></pre>
<p>Putting the artist list in a separate template partial doesn't just improve readability. More importantly, it lets us refresh this part (and only this part) of the page with JavaScript &amp; jQuery. Also, take note of the HTML <code>id</code> attributes we assign to the search icon, the input field, and the div holding our artist list. We will use these values later to target these elements for manipulation with jQuery.</p>
<h2>Making the view respond to Ajax requests</h2>
<p>Before we get to the JS code, there's one last addition we need to make in <code>artists_view</code> so it responds to Ajax requests:</p>
<pre><code>from django.http import JsonResponse
from django.shortcuts import render
from django.template.loader import render_to_string


def artists_view(request):
    # ...earlier code
    # request.is_ajax() was removed in Django 4.0, so check the headers directly:
    # jQuery sets X-Requested-With on same-origin Ajax calls, and getJSON() asks for JSON
    does_req_accept_json = request.accepts("application/json")
    is_ajax_request = request.headers.get("x-requested-with") == "XMLHttpRequest" and does_req_accept_json

    if is_ajax_request:
        html = render_to_string(
            template_name="artists-results-partial.html",
            context={"artists": artists}
        )

        data_dict = {"html_from_view": html}

        return JsonResponse(data=data_dict)

    return render(request, "artists.html", context=ctx)
</code></pre>
<p>We first check whether the request was made via an Ajax call. If so, we want to return a <code>JsonResponse</code> to the browser. But what are we returning, exactly?</p>
<p>We're passing <code>JsonResponse</code> a dictionary we've constructed, called <code>data_dict</code>. It has a single key, <code>html_from_view</code>, whose value is the variable <code>html</code>.</p>
<p><code>html</code> is our template <code>artists-results-partial.html</code> rendered as a string. It <em>literally is</em> the HTML output of our artist-list. We provide Django's <code>render_to_string()</code> a template to use and a context dictionary, and it returns to us that template as a string given the context it was fed. If it's not clear yet, here's an example:</p>
<p>In the view, if the variable <code>artists</code> is this QuerySet:</p>
<pre><code>&lt;QuerySet [&lt;Artist: Chet Faker (release count: 1)&gt;]&gt;
</code></pre>
<p>Then these lines:</p>
<pre><code>html = render_to_string(
    template_name="artists-results-partial.html",
    context={"artists": artists}
)
print(html)
</code></pre>
<p>Will print the following:</p>
<pre><code>&lt;ul&gt;
  &lt;li&gt;Chet Faker&lt;/li&gt;
&lt;/ul&gt;
</code></pre>
<p>You can see where this is going by now: using JS and jQuery, we can pass whatever the user is typing in the input box to our view as a GET parameter, filter by that string, and then return a JSON response with the new HTML to the browser where it will replace the old HTML.</p>
<h2>Implementing Ajax search</h2>
<p>We're going to send an Ajax request to the server. Once we get a JSON response back, we'll use jQuery to manipulate the relevant HTML elements. I can't possibly go in-depth on each piece of functionality here as that's beyond the scope of this post but I'll try to at least explain the bigger picture.</p>
<p>Ajax (AJAX) stands for Asynchronous JavaScript and XML. The key word here is asynchronous: it lets clients (browsers) and servers exchange data without reloading the entire page.</p>
<p>jQuery is one of the most popular JavaScript libraries, to the point where some mistake it for a language of its own. Its mission statement is "to allow developers to do more while writing less code".</p>
<p>&lt;div class="notice"&gt;
&lt;div class="notice-content"&gt;
&lt;p&gt;The code below is written in the ES6 syntax of JavaScript and may not work on a minority of browsers like Internet Explorer. If you want to support those you'll need to use a transpiler or employ a polyfill.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;</p>
<p>Here's the full JavaScript code:</p>
<pre><code>const user_input = $("#user-input");
const search_icon = $("#search-icon");
const artists_div = $("#replaceable-content");
const endpoint = "/artists/";
const delay_by_in_ms = 700;
let scheduled_function = false;

let ajax_call = function (endpoint, request_parameters) {
  $.getJSON(endpoint, request_parameters).done((response) =&gt; {
    // fade out the artists_div, then:
    artists_div
      .fadeTo("slow", 0)
      .promise()
      .then(() =&gt; {
        // replace the HTML contents
        artists_div.html(response["html_from_view"]);
        // fade-in the div with new contents
        artists_div.fadeTo("slow", 1);
        // stop animating search icon
        search_icon.removeClass("blink");
      });
  });
};

user_input.on("keyup", function () {
  const request_parameters = {
    q: $(this).val(), // value of user_input: the HTML element with ID user-input
  };

  // start animating the search icon with the CSS class
  search_icon.addClass("blink");

  // if scheduled_function is NOT false, cancel the execution of the function
  if (scheduled_function) {
    clearTimeout(scheduled_function);
  }

  // setTimeout returns the ID of the function to be executed
  scheduled_function = setTimeout(
    ajax_call,
    delay_by_in_ms,
    endpoint,
    request_parameters
  );
});
</code></pre>
<p>Let's look at the first few lines:</p>
<pre><code>const user_input = $("#user-input");
const search_icon = $("#search-icon");
const artists_div = $("#replaceable-content");
</code></pre>
<p>Remember how we gave some of the HTML elements in <code>artists.html</code> an ID attribute? Here, we’re using jQuery selectors to save those elements as variables so we can refer to them more easily later in the code. A jQuery selection starts with the dollar sign (the <code>$</code> function), followed by the CSS selector in parentheses.</p>
<p>We’re then initializing some additional variables:</p>
<pre><code>const endpoint = "/artists/";
const delay_by_in_ms = 700;
let scheduled_function = false;
</code></pre>
<p>The first one is the relative path to the endpoint we’re going to send our Ajax request to. Note that this has to be a path matched by one of our Django URL patterns. We have to spell out the path itself because JavaScript knows nothing about Django's named URLs or views.</p>
<p><code>scheduled_function</code> and <code>delay_by_in_ms</code> are explained later.</p>
<p>After the variables, we define the function <code>ajax_call()</code> which we invoke towards the end of the code:</p>
<pre><code>let ajax_call = function (endpoint, request_parameters) {
  $.getJSON(endpoint, request_parameters).done((response) =&gt; {
    // fade out the artists_div, then:
    artists_div
      .fadeTo("slow", 0)
      .promise()
      .then(() =&gt; {
        // replace the HTML contents
        artists_div.html(response["html_from_view"]);
        // fade-in the div with new contents
        artists_div.fadeTo("slow", 1);
        // stop animating search icon
        search_icon.removeClass("blink");
      });
  });
};
</code></pre>
<p>This one takes two arguments, <code>endpoint</code> and <code>request_parameters</code>. It then uses jQuery's <code>getJSON()</code> method to send an Ajax request to the endpoint alongside the parameters. When the request succeeds, the callback passed to <code>done()</code> receives the parsed JSON, which we call <code>response</code>. We then fade out <code>artists_div</code>, replace its contents with <code>response['html_from_view']</code>, and fade it back in. If you're confused about where <code>html_from_view</code> is coming from, go back to the view code responsible for handling Ajax requests.</p>
<p>Once the function is defined, we're using jQuery’s <code>on()</code> to bind a function to each <code>keyup</code> event that happens on <code>user_input</code>:</p>
<pre><code>user_input.on("keyup", function () {
  // our code
});
</code></pre>
<p>This means that each time a keyboard key is released (after being pressed) inside <code>user_input</code>, the function is run. Let's inspect this function's body:</p>
<pre><code>const request_parameters = {
  q: $(this).val(), // value of user_input: the HTML element with ID user-input
};
</code></pre>
<p>The first step is getting the value inside the input field, which is the string the user has typed so far. We save it in an object, <code>request_parameters</code>, under the key <code>q</code>.</p>
<p>Next, we add the <code>blink</code> CSS class to our search icon:</p>
<pre><code>// start animating the search icon with the CSS class
search_icon.addClass("blink");
</code></pre>
<p>This lets the user know we’re doing something with their request. The search icon will blink indefinitely as long as it has this class. That's why we remove it at the end of <code>ajax_call()</code>.</p>
<p>Here's the CSS code defining <code>blink</code>:</p>
<pre><code>@keyframes blinker {
  from {
    opacity: 1;
  }
  to {
    opacity: 0;
  }
}

.blink {
  text-decoration: blink;
  animation-name: blinker;
  animation-duration: 0.6s;
  animation-iteration-count: infinite;
  animation-timing-function: ease-in-out;
  animation-direction: alternate;
}
</code></pre>
<h2>Making sure the server isn't hammered</h2>
<p>Now, to the <code>setTimeout</code>/<code>clearTimeout</code> part:</p>
<pre><code>// if scheduled_function is NOT false, cancel the execution of the function
if (scheduled_function) {
  clearTimeout(scheduled_function);
}

// setTimeout returns the ID of the function to be executed
scheduled_function = setTimeout(
  ajax_call,
  delay_by_in_ms,
  endpoint,
  request_parameters
);
</code></pre>
<p><code>setTimeout()</code> is a built-in JavaScript function that delays a function's execution by a given duration (in milliseconds). It returns an ID that identifies the scheduled call. Here's its signature:</p>
<pre><code>setTimeout(func, delay_in_ms, func_param1, func_param2, ...)
</code></pre>
<p>Compare this with the parameters we're passing above and you can see that the code is scheduling the <code>ajax_call()</code> function to execute after 700 milliseconds.</p>
<p>But why introduce a delay?</p>
<p>Because if we <em>actually</em> hit the server every time the <code>keyup</code> event is registered, we're going to flood it with too many requests in a short amount of time. A naïve implementation that doesn't use <code>setTimeout()</code> and <code>clearTimeout()</code> sends a request every time a key is released, so typing a ten-letter query alone fires ten requests.</p>
<p>So yeah, we don't want to hammer the server with every keystroke like that. That's what we utilize <code>setTimeout()</code> for. But that's only one part of the puzzle. With <code>setTimeout()</code>, all we're doing is delaying the execution of each query by 700ms. What we really want to do is send a request only after the user has ceased typing for a bit. That's where <code>clearTimeout()</code> comes in.</p>
<p><code>clearTimeout()</code> is another built-in function. Given a function ID returned by <code>setTimeout()</code>, it cancels the execution of that function <em>if it hasn't already been executed</em>.</p>
<p>Now let's look at that piece of code again:</p>
<pre><code>// if scheduled_function is NOT false, cancel the execution of the function
if (scheduled_function) {
  clearTimeout(scheduled_function);
}

// setTimeout returns the ID of the function to be executed
scheduled_function = setTimeout(
  ajax_call,
  delay_by_in_ms,
  endpoint,
  request_parameters
);
</code></pre>
<p>The above code block ensures an Ajax call is sent to our server only once the user has stopped typing for 700 milliseconds, so requests go out at most once every 700 milliseconds. This technique is commonly called debouncing.</p>
<p>Since we initialized <code>scheduled_function</code> to <code>false</code>, the first time the <code>if</code> statement is evaluated, it skips <code>clearTimeout()</code>, schedules our Ajax call to execute after 700ms, and stores that timer's ID in <code>scheduled_function</code>.</p>
<p>Now, if the user types another letter within those 700 milliseconds, <code>scheduled_function</code> is truthy, so the <code>if</code> condition is met and <code>clearTimeout()</code> cancels the call that was scheduled for execution. Immediately after that, a new Ajax call is scheduled, and the cycle continues. If 700 milliseconds <em>do</em> pass without another keystroke, <code>ajax_call()</code> is executed normally.</p>
<p>If you're still grappling with this concept, try to think of <code>setTimeout()</code> as <code>scheduleFunction()</code> and <code>clearTimeout()</code> as <code>cancelScheduledFunction()</code>.</p>
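<p>If it helps to see the same pattern in the language of the Django backend, here's a rough Python analogue. It's a sketch, not part of the project's code: <code>threading.Timer</code> stands in for <code>setTimeout()</code> and its <code>cancel()</code> method for <code>clearTimeout()</code>. The <code>Debouncer</code> class, its names, and the 0.3-second delay are hypothetical.</p>
<pre><code>import threading
import time


class Debouncer:
    """Hypothetical helper: call `func` only after `delay` seconds without new calls."""

    def __init__(self, func, delay):
        self.func = func
        self.delay = delay
        self.scheduled = None  # plays the role of `scheduled_function = false`

    def __call__(self, *args):
        # like `if (scheduled_function) { clearTimeout(scheduled_function); }`
        if self.scheduled is not None:
            self.scheduled.cancel()
        # like `scheduled_function = setTimeout(ajax_call, delay_by_in_ms, ...)`
        self.scheduled = threading.Timer(self.delay, self.func, args)
        self.scheduled.start()


sent_queries = []  # stands in for the requests that actually reach the server
search = Debouncer(sent_queries.append, delay=0.3)

for query in ["d", "dj", "dja", "djan", "djang", "django"]:
    search(query)     # a simulated keystroke...
    time.sleep(0.05)  # ...every 50 ms, faster than the 0.3 s delay

time.sleep(0.6)  # the "user" stops typing long enough for the last call to fire
print(sent_queries)  # expected: ['django']
</code></pre>
<p>Every keystroke cancels the pending timer and starts a new one, so only the query typed last survives long enough to run.</p>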
<h2>Summary</h2>
<p>Nowadays using Django as a backend with a frontend framework like Vue.js or React is all the rage, but sometimes all that's needed for interactivity in a “classic” Django app is some JavaScript and jQuery knowledge.</p>
<p>You can clone the <a href="https://github.com/SHxKM/django-ajax-search">Github repository</a> and play around with the working code if you feel you need a better understanding of some of the concepts outlined.</p>
]]></content:encoded>
    </item>
    <item>
      <title>A more Pythonic dictionary</title>
      <link>https://thebiglog.com/posts/a-more-pythonic-dictionary/</link>
      <guid isPermaLink="true">https://thebiglog.com/posts/a-more-pythonic-dictionary/</guid>
      <pubDate>Fri, 10 May 2019 00:00:00 GMT</pubDate>
      <content:encoded><![CDATA[<p>Dictionaries are versatile, fast, and efficient. This post will cover two dictionary related features that I feel don't get enough attention: <code>setdefault</code> and <code>defaultdict</code>. They're presented together to highlight both the differences and the similarities between them.</p>
<h2>Use case: how many views did each article get?</h2>
<p>Here's a simplified real-world scenario: a call to Google Analytics' API returns the following list of lists where each sub-list represents an article: the first item is the article's ID and the second one is its view count. Some article IDs may appear in more than one sub-list, and we want to sum the view counts for each distinct article:</p>
<pre><code>received_list = [
    [1678, 30],  # 1678 is the ID, 30 is the view count
    [1987, 99],
    [1822, 50],
    [1678, 22],  # ID already appears
    [2299, 30],
    [1987, 100],  # ID already appears
]
</code></pre>
<p>If you know some Python, this should be pretty simple:</p>
<pre><code>articles_and_views = {}

for each_list in received_list:
    article_id = each_list[0]
    article_views = each_list[1]

    if article_id in articles_and_views:
        articles_and_views[article_id] += article_views
    else:
        articles_and_views[article_id] = article_views
</code></pre>
<p>This <code>if</code> block is your standard "check whether some key is in dictionary" code. If it is, then we increment its corresponding value by <code>article_views</code>; if the key isn't already in the dictionary, we create it by assignment.</p>
<p>The output is correct as article <code>1678</code> appeared twice, first with 30 views and then with 22:</p>
<pre><code>{1678: 52, 1987: 199, 1822: 50, 2299: 30}
</code></pre>
<p>The example above is deliberately simple, so this post can focus on what <code>setdefault</code> and <code>defaultdict</code> do rather than on the underlying data structures. In other scenarios you may be operating inside nested dictionaries, nested lists, or even more complicated structures. That's where these two often come in handy.</p>
<h2>setdefault</h2>
<p><code>setdefault</code> is a dictionary method, just like <code>get</code>. In fact, you can think of it as a <code>get</code> that combines a conditional <code>set</code>: get a key's value, but if the key isn't present in the dictionary, create it with the default value provided:</p>
<pre><code>my_dict.setdefault(k, v_if_not_k)
# k: the key to search for
# v_if_not_k (optional): value to assign to the previously non-existent key after creating it
</code></pre>
<p>In our case, we can utilize this to get rid of the <code>if</code> clause:</p>
<pre><code>articles_and_views = {}

for each_list in received_list:
    article_id = each_list[0]
    article_views = each_list[1]
    articles_and_views.setdefault(article_id, 0)
    articles_and_views[article_id] += article_views
</code></pre>
<p>That's because there's a hidden <code>if</code> inside of <code>setdefault</code>. We're asking the dictionary <code>articles_and_views</code>: "Did you see this <code>article_id</code> in your keys before? If so, give us that key's value. If not, create this key and set its value to 0." The default value can of course be a number other than 0, a list, or any other object. If you don't provide this second argument at all, the default value will be <code>None</code>.</p>
<p>Using <code>setdefault</code> makes sure that when we get to this next line:</p>
<pre><code>articles_and_views[article_id] += article_views
</code></pre>
<p><code>article_id</code> is undoubtedly an existing key in the dictionary. Either we just initialized it with a value of 0, or it had already existed before, so <code>setdefault</code> did not alter it. In any case, we can now increment its value safely.</p>
<p>In this case, we're not using the value returned by <code>setdefault</code>, but it's good to keep in mind it is available if needed.</p>
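<p>To make the "hidden <code>if</code>" concrete, here's a rough pure-Python equivalent of what <code>setdefault</code> does, next to the real method's return values. The <code>my_setdefault</code> helper is hypothetical and only mimics the behavior for illustration:</p>
<pre><code>def my_setdefault(d, key, default=None):
    """Hypothetical stand-in that mimics dict.setdefault."""
    if key not in d:
        d[key] = default  # create the key only if it's missing
    return d[key]  # always return the (possibly new) value


views = {1678: 30}

print(views.setdefault(1678, 0))  # 30: key exists, value untouched
print(views.setdefault(2299, 0))  # 0: key created with the default
print(views.setdefault(1987))     # None: no default given
print(views)                      # {1678: 30, 2299: 0, 1987: None}

mine = {1678: 30}
print(my_setdefault(mine, 1678, 0), my_setdefault(mine, 2299, 0))  # 30 0
</code></pre>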
<p>While it's not unique to <code>setdefault</code>, there's one important thing to stress about this method: you can't <em>assign</em> to its return value. Meaning, this won't work:</p>
<pre><code>articles_and_views.setdefault(article_id, 0) += article_views
# SyntaxError: can't assign to function call
</code></pre>
<p>If you're confused by this, remember that it's a method (function) call, and you can't assign (<code>=</code>) to the result of a function call. The above snippet is comparable to this one (which is hopefully more obviously incorrect):</p>
<pre><code># a function/method on the left?!
n = -50
abs(n) += 25

# SyntaxError: can't assign to function call
</code></pre>
<p>However, because <code>setdefault</code> returns the object actually stored in the dictionary, you can call methods on it. For instance, here's how you'd append each <code>article_views</code> to a list instead of adding them up:</p>
<pre><code># notice the default value is now a list
articles_and_views = {}

for each_list in received_list:
    article_id = each_list[0]
    article_views = each_list[1]
    articles_and_views.setdefault(article_id, []).append(article_views)
</code></pre>
<p>This code will work and <code>articles_and_views</code> ends up looking like this:</p>
<pre><code>{1678: [30, 22], 1987: [99, 100], 1822: [50], 2299: [30]}
</code></pre>
<p>Every time you need a default value inside of a dictionary, consider <code>setdefault</code>. It will save you time and logical overhead. I found it especially useful for unifying external data:</p>
<pre><code>employees_from_api = [
    {"name": "Britney", "age": 32, "bonus": 1500},
    {"name": "Jeff", "age": 32, "bonus": 2400},
    {"name": "Benjamin", "age": 21}, # no bonus
]

for employee in employees_from_api:
    bonus = employee.setdefault("bonus", 500)
    print(f"{employee['name']}'s yearly bonus is {employee['bonus']}")
</code></pre>
<p>Output:</p>
<pre><code>Britney's yearly bonus is 1500
Jeff's yearly bonus is 2400
Benjamin's yearly bonus is 500
</code></pre>
<h2>defaultdict</h2>
<p><code>defaultdict</code> is a subclass of <code>dict</code> and can be imported from the built-in <code>collections</code> module:</p>
<pre><code>from collections import defaultdict
</code></pre>
<p>For the most part, <code>defaultdict</code> behaves just like <code>dict</code>, but it has one distinct feature: if provided with a valid callable as its first argument (more on this later), it never raises a <code>KeyError</code> when you look up a missing key with square brackets; instead, it creates that key and sets its value to whatever the callable returns.</p>
<p>This should help demonstrate this:</p>
<pre><code>&gt;&gt;&gt; regular_dict = {}
&gt;&gt;&gt; regular_dict['non_existent_key']
KeyError: 'non_existent_key'

&gt;&gt;&gt; from collections import defaultdict
&gt;&gt;&gt; int_defaultdict = defaultdict(int)
&gt;&gt;&gt; int_defaultdict['non_existent_key']
0

&gt;&gt;&gt; list_defaultdict = defaultdict(list)
&gt;&gt;&gt; list_defaultdict["non_existent_key"]
[]

&gt;&gt;&gt; dict_defaultdict = defaultdict(dict)
&gt;&gt;&gt; dict_defaultdict["non_existent_key"]
{}
</code></pre>
<p>To apply it to our example:</p>
<pre><code>from collections import defaultdict

articles_and_views = defaultdict(int)

for each_list in received_list:
    article_id = each_list[0]
    article_views = each_list[1]
    articles_and_views[article_id] += article_views
</code></pre>
<p>We've eliminated three of the six lines in the loop body compared to the implementation with the <code>if</code> block. The code is cleaner, no less readable, and a lot more Pythonic.</p>
<p><code>defaultdict</code> creates a key and uses the callable to set its value only if that key doesn't already exist in the dictionary. In this case the callable is <code>int</code>, which returns <code>0</code> when invoked (and remember it will be invoked only when <code>article_id</code> does not exist as a key in the dictionary).</p>
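<p>One way to convince yourself that the factory runs only for missing keys is to pass a callable that records each time it's invoked. The <code>counting_zero</code> function below is a hypothetical helper for this demonstration, not part of <code>collections</code>; the data is the same <code>received_list</code> as above:</p>
<pre><code>from collections import defaultdict

factory_calls = []


def counting_zero():
    """Hypothetical factory: behaves like int() but records each call."""
    factory_calls.append(1)
    return 0


received_list = [
    [1678, 30],
    [1987, 99],
    [1822, 50],
    [1678, 22],  # existing key: factory not called
    [2299, 30],
    [1987, 100],  # existing key: factory not called
]

articles_and_views = defaultdict(counting_zero)
for article_id, article_views in received_list:
    articles_and_views[article_id] += article_views

print(len(factory_calls))        # 4: one call per distinct article ID
print(dict(articles_and_views))  # {1678: 52, 1987: 199, 1822: 50, 2299: 30}
</code></pre>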
<pre><code>&gt;&gt;&gt; print(articles_and_views)
defaultdict(&lt;class 'int'&gt;, {1678: 52, 1987: 199, 1822: 50, 2299: 30})
</code></pre>
<p>As you can see, the representation of a <code>defaultdict</code> is different from that of a regular dictionary. The former also shows the callable it uses, which the Python docs call the <code>default_factory</code> (in this case, <code>int</code>). To get the familiar representation, we can always convert it back to a regular <code>dict</code>:</p>
<pre><code>&gt;&gt;&gt; print(dict(articles_and_views))
{1678: 52, 1987: 199, 1822: 50, 2299: 30}
</code></pre>
<h3>Default factory must be a callable</h3>
<p>Say we now wanted to boost our ego (or avoid getting fired) and start each article's view count at 1,000:</p>
<pre><code>articles_and_views = defaultdict(1000)
</code></pre>
<p>The above will return an error:</p>
<pre><code>TypeError: first argument must be callable or None
</code></pre>
<p><code>defaultdict</code> needs a callable "default factory", and we gave it the integer <code>1000</code> which is...not callable.</p>
<p>So let's give it a callable:</p>
<pre><code>def return_one_thousand():
    return 1000

articles_and_views = defaultdict(return_one_thousand)
</code></pre>
<p>Notice that we are <em>not calling</em> the function <code>return_one_thousand</code> (no parentheses): calling it would pass its return value, the integer <code>1000</code>, and we'd be back to the same <code>TypeError</code>. Instead, it's <code>defaultdict</code> that will call it each time it needs to create a missing key. The function <code>return_one_thousand</code> is of course a callable, so <code>defaultdict</code> doesn't complain.</p>
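<p>The difference between passing the function and passing the result of calling it is easy to check. A small sketch, redefining <code>return_one_thousand</code> so the snippet runs on its own (the <code>views</code> name is just for illustration):</p>
<pre><code>from collections import defaultdict


def return_one_thousand():
    return 1000


# Passing the function object: defaultdict calls it whenever a key is missing.
views = defaultdict(return_one_thousand)
print(views["new_article"])  # 1000

# Calling it first hands defaultdict the int 1000, which isn't callable.
try:
    defaultdict(return_one_thousand())
    error_message = None
except TypeError as error:
    error_message = str(error)

print(error_message)  # first argument must be callable or None
</code></pre>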
<p>If we only want to return a simple value, we don't have to define a function and can use a <code>lambda</code> instead:</p>
<pre><code>articles_and_views = defaultdict(lambda: 1000)
</code></pre>
<p>Running the same loop with either the <code>return_one_thousand</code> or the <code>lambda</code> implementation produces the following counts (shown as a regular <code>dict</code>):</p>
<pre><code>{1678: 1052, 1987: 1199, 1822: 1050, 2299: 1030}
</code></pre>
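<p>So how come <code>int</code>, <code>dict</code>, and <code>list</code> worked? They're callables too: each is a class that returns an "empty" value (<code>0</code>, <code>{}</code>, <code>[]</code>) when called with no arguments. Try invoking <code>int</code> and see what you get:</p>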
<pre><code>&gt;&gt;&gt; int()
0
</code></pre>
<p>When you need a certain default behavior with a dictionary, consider <code>defaultdict</code>. It will often yield cleaner code than <code>setdefault</code>.</p>
<h2>Summary</h2>
<p><code>setdefault</code> and <code>defaultdict</code>'s usages can overlap, but they are different tools: the former is a method that works on a key-by-key basis and the latter is a subclass of the "regular" Python <code>dict</code> class. It's good to remember that the convenience offered by <code>defaultdict</code> (never raising a <code>KeyError</code>) can be a double-edged sword: a mistyped key silently creates a new entry instead of failing loudly.</p>
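<p>A short sketch of why that convenience cuts both ways, using made-up counts: square-bracket access creates keys, while <code>in</code> and <code>.get()</code> don't, and setting the <code>default_factory</code> attribute to <code>None</code> switches automatic creation off so missing keys raise <code>KeyError</code> again.</p>
<pre><code>from collections import defaultdict

articles_and_views = defaultdict(int, {1678: 52, 1987: 199})

# A typo (1768 instead of 1678) silently creates a new key with value 0.
print(articles_and_views[1768])    # 0
print(1768 in articles_and_views)  # True: the typo is now a key

# Membership tests and .get() never call the default factory.
print(articles_and_views.get(9999))  # None
print(9999 in articles_and_views)    # False

# With default_factory set to None, missing keys raise KeyError again.
articles_and_views.default_factory = None
try:
    articles_and_views[9999]
    raised = False
except KeyError:
    raised = True
print(raised)  # True
</code></pre>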
]]></content:encoded>
    </item>
    <item>
      <title>Grouping in Django templates</title>
      <link>https://thebiglog.com/posts/grouping-in-django-templates/</link>
      <guid isPermaLink="true">https://thebiglog.com/posts/grouping-in-django-templates/</guid>
      <pubDate>Sun, 28 Apr 2019 00:00:00 GMT</pubDate>
      <content:encoded><![CDATA[<p>I've recently deployed a tiny changelog app in one of my Django projects. The <code>models.py</code> file looks like this:</p>
<pre><code># changelog/models.py (truncated)
class ChangeLog(models.Model):

    IMPROVEMENT = ('improvement', 'improvement')
    FEATURE = ('feature', 'feature')
    BUG = ('bugfix', 'bug fix')

    CHOICES = (IMPROVEMENT, FEATURE, BUG,)

    title = models.CharField(max_length=560)
    description = models.TextField(null=True, blank=True)
    category = models.CharField(choices=CHOICES, max_length=215)
    display_date = models.DateTimeField(editable=True)
</code></pre>
<p>Nothing special so far. The only slight oddity here is <code>display_date</code>: unlike what its name suggests, it's actually a <code>datetime</code> field.</p>
<p>In this app's main template, I wanted to sort (in reverse order) and group items by the date portion of their <code>display_date</code> so the output would be something like this:</p>
<pre><code>&lt;div class="changelog-day"&gt;
  &lt;h3 class="changelog-heading"&gt;March 17, 2019&lt;/h3&gt;
  &lt;p&gt;changelog #2 created on this date&lt;/p&gt;
  &lt;p&gt;changelog #1 created on this date&lt;/p&gt;
&lt;/div&gt;

&lt;div class="changelog-day"&gt;
  &lt;h3 class="changelog-heading"&gt;March 15, 2019&lt;/h3&gt;
  &lt;p&gt;changelog #1 created on this date&lt;/p&gt;
&lt;/div&gt;
</code></pre>
<p>So, <code>ChangeLog</code> objects that have the same date should all be inside the same <code>div</code>. This is the view I had wired up at the time:</p>
<pre><code># views.py
def changelog_index(request):
    changelog_items = ChangeLog.objects.order_by('-display_date')

    context = {
        'changelog_items': changelog_items
    }

    return render(request, 'changelog_index.html', context)
</code></pre>
<p><code>order_by</code> takes care of sorting the changelog items in reverse chronological order. But there's a step missing here: how to group these changelog items by date?</p>
<h2>Grouping in the view</h2>
<p>One way is to group inside the view:</p>
<pre><code># views.py - grouping in view
from django.shortcuts import render

from .models import ChangeLog


def changelog_index(request):
    changelogs = ChangeLog.objects.order_by('-display_date')

    dates_and_items = {}

    for changelog in changelogs:
        current_key = changelog.display_date.date()  # the item's date
        dates_and_items.setdefault(current_key, []).append(changelog)

    context = {'dates_items': dates_and_items}

    return render(request, 'changelog_index.html', context)
</code></pre>
<p>Don't worry if you don't get what <code>setdefault</code> is doing; just know that this view creates a dictionary with dates as keys, and each such key holds a list of <code>ChangeLog</code> objects belonging to that date.</p>
<p>And then <code>changelog_index.html</code> would include something like this:</p>
<pre><code>{# changelog_index.html - grouping in view #}
{% for date, item_list in dates_items.items %}
  &lt;div class="changelog-day"&gt;
    &lt;h3 class="changelog-heading"&gt;&lt;b&gt;{{ date }}&lt;/b&gt;&lt;/h3&gt;
    {% for changelog in item_list %}
      &lt;p&gt;{{ changelog.title }} - {{ changelog.description }}&lt;/p&gt;
    {% endfor %}
  &lt;/div&gt;
{% endfor %}
</code></pre>
<p>Here we are iterating over each <code>date</code> in our dictionary, and in the nested for loop, we iterate over this key’s <code>item_list</code>.</p>
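<p>If the shape of <code>dates_items</code> is hard to picture, here's a framework-free sketch of the same <code>setdefault</code> grouping. A hypothetical <code>FakeChangeLog</code> dataclass stands in for the Django model and the timestamps are made up; the real view works on a QuerySet instead:</p>
<pre><code>from dataclasses import dataclass
from datetime import datetime


@dataclass
class FakeChangeLog:
    """Hypothetical stand-in for the ChangeLog model (only the fields used here)."""
    title: str
    display_date: datetime


# Already in reverse chronological order, like order_by('-display_date').
changelogs = [
    FakeChangeLog("changelog #2", datetime(2019, 3, 17, 18, 0)),
    FakeChangeLog("changelog #1", datetime(2019, 3, 17, 9, 30)),
    FakeChangeLog("changelog #1", datetime(2019, 3, 15, 12, 0)),
]

dates_and_items = {}
for changelog in changelogs:
    current_key = changelog.display_date.date()  # drop the time portion
    dates_and_items.setdefault(current_key, []).append(changelog)

# Dicts keep insertion order (Python 3.7+), so the newest date comes first.
for day, item_list in dates_and_items.items():
    print(day, [item.title for item in item_list])
# 2019-03-17 ['changelog #2', 'changelog #1']
# 2019-03-15 ['changelog #1']
</code></pre>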
<h2>Grouping in the template</h2>
<p>The other option is to leave the <code>views.py</code> file untouched. Reminder:</p>
<pre><code># views.py - grouping in template
def changelog_index(request):
    changelog_items = ChangeLog.objects.order_by('-display_date')

    context = {
        'changelog_items': changelog_items
    }

    return render(request, 'changelog_index.html', context)
</code></pre>
<p>And use Django’s built-in <code>{% regroup %}</code> tag:</p>
<pre><code>{# changelog_index.html - grouping in template #}
{% regroup changelog_items by display_date.date as dates_items %}
{% for date in dates_items %}
  &lt;div class="changelog-day"&gt;
    &lt;h3 class="changelog-heading"&gt;&lt;b&gt;{{ date.grouper }}&lt;/b&gt;&lt;/h3&gt;
    {% for changelog in date.list %}
      &lt;p&gt;{{ changelog.title }} - {{ changelog.description }}&lt;/p&gt;
    {% endfor %}
  &lt;/div&gt;
{% endfor %}
</code></pre>
<p>Notice that the markup is almost identical to the previous template snippet. Let's go over the differences:</p>
<pre><code>{% regroup changelog_items by display_date.date as dates_items %}
</code></pre>
<p><code>regroup</code> is an aptly named tag. It takes a list-like collection, and regroups it by a common attribute. Above, we’re regrouping the <code>changelog_items</code> QuerySet by its items’ <code>display_date.date</code>, and calling this regrouped collection <code>dates_items</code> which we can then use in the <code>for</code> loop.[^1]</p>
<p>If we wanted to regroup changelogs by category, we'd write:</p>
<pre><code>{# group by each changelog item's category #}
{% regroup changelog_items by category as cats_items %}
</code></pre>
<p>Anyway...we take this regrouped collection and iterate over it like so:</p>
<pre><code>{# changelog_index.html - grouping in template, continued #}
{% for date in dates_items %}
  &lt;div class="changelog-day"&gt;
    &lt;h3 class="changelog-heading"&gt;&lt;b&gt;{{ date.grouper }}&lt;/b&gt;&lt;/h3&gt;
    {% for changelog in date.list %}
      &lt;p&gt;{{ changelog.title }} - {{ changelog.description }}&lt;/p&gt;
    {% endfor %}
  &lt;/div&gt;
{% endfor %}
</code></pre>
<p>Of special note are <code>date.grouper</code> in the <code>H3</code> tag, and the <code>date.list</code> we iterate over in the nested <code>for</code> loop. These are attributes that <code>regroup</code> creates: <code>grouper</code> is the value the items were grouped by, and <code>list</code> is the list of objects that belong to this group.</p>
<p>You can think of <code>grouper</code> as a key in the dictionary, and <code>list</code> as the value, which is a list of items belonging to that “key”. In our case, each <code>grouper</code> is a distinct date, which has a <code>list</code> of changelog items.</p>
<h2>Important caveat</h2>
<p>Note that <code>{% regroup %}</code> itself <em>does not</em> sort the collection it regroups. In our case, the <code>ChangeLog</code> objects were sorted in the view, so <code>regroup</code> works as expected. If they weren't, <code>regroup</code> would create duplicate sections with the same date, because it only groups <em>consecutive</em> items that share the same value.</p>
<p>But there is a way to sort in the template, using the <code>dictsort</code>/<code>dictsortreversed</code> template filters:</p>
<pre><code>{% regroup changelog_items|dictsortreversed:"display_date" by display_date.date as sorted_dates %}
</code></pre>
<p>Here, receiving an unordered collection <code>changelog_items</code>, we sort by <code>display_date</code> in descending order (newest first), and then group by <code>display_date.date</code>.[^2]</p>
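<p>Python's <code>itertools.groupby</code> also groups only consecutive items, so it makes a handy way to reason about <code>regroup</code> outside a template. The sketch below is an analogy, not Django's code; it uses hypothetical <code>(title, display_date)</code> tuples to show duplicate groups for unsorted input, then the fix of sorting newest-first before grouping by date:</p>
<pre><code>from datetime import datetime
from itertools import groupby

# Hypothetical (title, display_date) pairs, deliberately NOT sorted.
items = [
    ("fix login bug", datetime(2019, 3, 17, 9, 30)),
    ("dark mode", datetime(2019, 3, 15, 12, 0)),
    ("faster search", datetime(2019, 3, 17, 18, 0)),
]


def group_titles(rows):
    """Group consecutive rows by the date part of display_date, like regroup."""
    groups = groupby(rows, key=lambda row: row[1].date())
    return [(day.isoformat(), [title for title, _ in group]) for day, group in groups]


# Unsorted input: 2019-03-17 appears as two separate groups.
print(group_titles(items))
# [('2019-03-17', ['fix login bug']), ('2019-03-15', ['dark mode']), ('2019-03-17', ['faster search'])]

# Sort newest-first by the full datetime (like dictsortreversed:"display_date"),
# then group: one group per day, with the newest item first inside each day.
newest_first = sorted(items, key=lambda row: row[1], reverse=True)
print(group_titles(newest_first))
# [('2019-03-17', ['faster search', 'fix login bug']), ('2019-03-15', ['dark mode'])]
</code></pre>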
<h2>Where to group?</h2>
<p>I don't claim to know the definitive answer, and I don't think there is one. In the case above, grouping in the template involved less effort and took less time to write. Another case where <code>regroup</code> shines is when you want to group the same QuerySet by different attributes in the same template. In other cases, different considerations (like speed) may favor grouping in the view. Always weigh and balance.</p>
<p>As I wrote <a href="https://thebiglog.comdjango-keeping-logic-out-of-templates-and-views">earlier this month</a>, I generally prefer my templates to be as dumb as possible, but every rule has its exception, and it's good to have <code>regroup</code> in one's arsenal when the situation calls for it.</p>
<p>[^1]: Note that we never passed <code>dates_items</code> from the view.
[^2]: Using <code>display_date</code> (a <code>datetime</code>) in <code>dictsortreversed</code> means that while items are grouped by dates, more recent items within the same date are displayed first.</p>
]]></content:encoded>
    </item>
    <item>
      <title>macOS migrations with Brewfile</title>
      <link>https://thebiglog.com/posts/macos-migrations-with-brewfile/</link>
      <guid isPermaLink="true">https://thebiglog.com/posts/macos-migrations-with-brewfile/</guid>
      <pubDate>Mon, 22 Apr 2019 00:00:00 GMT</pubDate>
      <content:encoded><![CDATA[<p>Perhaps the most-dreaded aspect of setting-up a new machine is the time spent on reinstalling apps and reapplying all of the customizations from the previous one. As my MacBook Pro is about to turn six, I had been looking for a way to automate this process. At least for the applications part, I recently found a good solution (that’s apparently been around for a while).</p>
<p>This post is about using a <code>Brewfile</code> to migrate macOS packages and applications. If you're already versed in the world of Homebrew and Homebrew Bundle, you might find it overly verbose. It’s written from a beginner’s perspective as up until recently I wasn't too familiar with the concept myself.</p>
<h2>Brewfile in a nutshell</h2>
<p>A <code>Brewfile</code> contains instructions on which packages, command-line utilities, and applications to install on a macOS system. Here's a short snippet:</p>
<pre><code># Brewfile snippet

# install Python and SQLite
brew "python"
brew "sqlite"

# install 1Password, Pages, and Drafts from the Mac App Store
mas "com.agilebits.onepassword-osx", id: 443987910 # 1Password
mas "com.apple.iWork.Pages", id: 409201541 # Pages
mas "com.agiletortoise.Drafts-OSX", id: 1435957248 # Drafts


# install the apps below from Homebrew's repository
cask "carbon-copy-cloner"
cask "dropbox"
cask "vlc"

</code></pre>
<p>If I were to "run" this <code>Brewfile</code>, it would install the Python and SQLite packages, then 1Password, Pages, and Drafts from the Mac App Store, and finally Carbon Copy Cloner, Dropbox, and VLC from Homebrew’s repository (which usually pulls them from their respective websites). All apps are installed in the Applications folder by default, but the ability to differentiate between App Store and non App Store applications is significant in my case.</p>
<p>This is already faster than doing any of these steps manually. What's more, a <code>Brewfile</code> can be generated automatically so you’d rarely need to write the lines above one-by-one.</p>
<h2>Why Brewfile</h2>
<p>Because the alternatives aren't as good.</p>
<p><strong>Cloning</strong>: Using the excellent <a href="https://bombich.com/">Carbon Copy Cloner</a> to clone my old HD to the new one would theoretically be the quickest way to get going, but after 6 years, I imagine there's more than a little cruft in my system files, and recent changes to Apple’s hardware make this option even less attractive. There are also apps on my current machine that I actually <em>don't</em> want to move over.</p>
<p><strong>Time Machine and/or Migration Assistant</strong>: Migration Assistant hasn't been known for its reliability lately, and Time Machine backups are not less problematic. Listing the advantages and drawbacks is beyond the scope of this post, but if you want to read more about the pros and cons of each migration strategy, <a href="https://sixcolors.com/post/2016/11/whats-the-best-way-to-migrate/">Jason Snell</a> does a good job on that.</p>
<p><strong>Starting fresh</strong>: Nothing can go wrong, but it means a lot of time spent on configuring and installing apps.</p>
<h2>A detour</h2>
<p>To understand what a <code>Brewfile</code> does and how it can fit in a migration strategy, it's good to be familiar with the moving parts that make it useful. This is not an exhaustive overview, but rather an introduction to each.</p>
<h3>Homebrew</h3>
<p>In the beginning, there was Homebrew, a package manager created by Max Howell in 2009. After installing <code>homebrew</code>, you can open the Terminal and install packages easily and quickly:</p>
<pre><code># installs ffmpeg, a popular command-line package, on macOS
$ brew install ffmpeg

# now that ffmpeg is installed, we can use it:
$ ffmpeg -i input.mp4 output.avi
</code></pre>
<p>Behind the scenes, <code>brew</code> is using what it calls a "formula" to install the <code>ffmpeg</code> package. This formula is a piece of code that’s responsible for holding all the information required to install <code>ffmpeg</code>: its name, version, URL to the source files that should be downloaded, and other packages that <code>ffmpeg</code> needs in order to operate.</p>
<p>Homebrew not only makes it easy to install packages, but also to maintain them:</p>
<pre><code># upgrades ffmpeg
$ brew upgrade ffmpeg

# upgrades all outdated formulae
$ brew upgrade

# update Homebrew itself and its formula definitions (doesn't upgrade installed packages)
$ brew update

# uninstalls ffmpeg
$ brew uninstall ffmpeg
</code></pre>
<p>And to discover them:</p>
<pre><code># search for youtube-dl
$ brew search youtube-dl

# get info about youtube-dl
$ brew info youtube-dl
</code></pre>
<p><code>homebrew</code> is very nice indeed. It's lauded for its ease-of-use, documentation and helpful command-line feedback.</p>
<h3>Homebrew Cask</h3>
<p><code>homebrew-cask</code> is like <code>homebrew</code>, but for macOS apps, fonts, plugins, and other prebuilt software, whether proprietary or open source (Firefox and VLC are both casks). If <code>brew install [formula-name]</code> installs a package corresponding to that formula's name, then <code>brew cask install [cask-appname]</code> installs an application with that cask's name:</p>
<pre><code># install firefox
$ brew cask install firefox

# install slack
$ brew cask install slack
</code></pre>
<p>By default, it places installed apps in the Mac's Applications directory. You can search for casks the same way you search for formulae:</p>
<pre><code>$ brew search firefox

# Output:
==&gt; Casks
firefox
multifirefox
homebrew/cask-versions/firefox-beta
homebrew/cask-versions/firefox-developer-edition
homebrew/cask-versions/firefox-esr
homebrew/cask-versions/firefox-nightly
</code></pre>
<p>But where is <code>firefox</code> coming from here? How does <code>brew cask install firefox</code> know what to install?</p>
<pre><code>$ brew cask info firefox

# Output:
firefox: 66.0.3 (auto_updates)
https://www.mozilla.org/firefox/
Not installed
From: https://github.com/Homebrew/homebrew-cask/blob/master/Casks/firefox.rb

==&gt; Name
Mozilla Firefox

==&gt; Languages
cs, de, en-GB, en, eo, es-AR, es-CL, es-ES, fi, fr, gl, in, it, ja, ko, nl, pl, pt-BR, pt, ru, tr, uk, zh-TW, zh

==&gt; Artifacts
Firefox.app (App)
</code></pre>
<p>A few pieces of information here:</p>
<ol>
<li><code>firefox: 66.0.3</code> is the version we can expect <code>homebrew-cask</code> to install.</li>
<li><code>From:</code> holds the URL where the <code>cask</code> lives. If you <a href="https://github.com/Homebrew/homebrew-cask/blob/master/Casks/firefox.rb">inspect it</a> you'll see that somewhere in there is also the URL one would go to in the browser when installing Firefox the “regular” way. There's no magic here.</li>
<li>Install options, like <code>Languages</code>. Running <code>brew cask install firefox --language=it</code> will install Firefox in Italian.</li>
</ol>
<p>Indeed, <code>homebrew-cask</code> is very, very nice.</p>
<h3>Mac App Store command line interface</h3>
<p>There's one more tool that we need to cover before <code>Brewfile</code>: <code>mas-cli</code> is a simple command line interface for the Mac App Store (MAS). It can't install apps that you haven't downloaded or purchased before, but it will allow you to upgrade those that you have installed, and download apps tied to your iCloud account:</p>
<pre><code># search for 1Password
$ mas search 1Password

# Output:
1333542190  1Password 7 - Password Manager (7.2.5)

# install 1Password by its app identifier
$ mas install 1333542190

# upgrade all apps that have pending updates
$ mas upgrade

# upgrade 1Password
$ mas upgrade 1333542190
</code></pre>
<p><code>mas-cli</code> may not seem terribly useful at first glance, but it was the missing piece in my migration strategy since it provides a way to capture all Mac App Store apps currently installed:</p>
<pre><code># list all apps installed through the Mac App Store
$ mas list

# Output (truncated)
1225570693 com.ulyssesapp.mac (15.2)
986304488 com.zive.kiwi (2.0.18)
422304217 com.dayoneapp.dayone (1.10.6)
# ^identifier
# ^bundle name
#^version
</code></pre>
<p>Yes, <code>mas-cli</code> is nifty.</p>
<p>So, <code>brew install</code>, <code>brew cask install</code>, and <code>mas install</code> make things a lot faster. The next step is to find a way to automate the generation and execution of these commands.</p>
<h3>Homebrew Bundle</h3>
<p><code>homebrew-bundle</code> is an extension of <code>homebrew</code> and is installed as soon as the command <code>brew bundle</code> is first used. It's the glue that brings everything together.</p>
<p>Run <code>brew bundle dump</code> and Homebrew Bundle will generate a file called <code>Brewfile</code> listing <strong>all</strong> of the installed brew packages, cask applications, and Mac App Store applications currently on the machine. If, on the other hand, you run <code>brew bundle</code> from a folder that contains a <code>Brewfile</code>, it will install everything listed in that file.</p>
<p>So, given a <code>Brewfile</code> with the following content:</p>
<pre><code># install Python and SQLite
brew "python"
brew "sqlite"

# install 1Password, Pages, and Drafts from the Mac App Store
mas "com.agilebits.onepassword-osx", id: 443987910 # 1Password
mas "com.apple.iWork.Pages", id: 409201541 # Pages
mas "com.agiletortoise.Drafts-OSX", id: 1435957248 # Drafts


# install the apps below from their own respective websites
cask "carbon-copy-cloner"
cask "dropbox"
cask "vlc"
</code></pre>
<p>Running <code>brew bundle</code> from the same directory where <code>Brewfile</code> is located will install the above packages and applications.</p>
<p>Notice that the <code>Brewfile</code> syntax differs from the commands you'd usually type in the Terminal. This table should help:</p>
<table>
<thead>
<tr>
<th>Terminal command</th>
<th>Brewfile</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>brew install [formulaName]</code></td>
<td><code>brew "[formulaName]"</code></td>
</tr>
<tr>
<td><code>brew cask install [caskName]</code></td>
<td><code>cask "[caskName]"</code></td>
</tr>
<tr>
<td><code>mas install [identifier]</code></td>
<td><code>mas "[bundleIdentifier]", id: [identifier]</code></td>
</tr>
</tbody>
</table>
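<p>The mapping in the table is mechanical enough to express in a few lines. Here's a hypothetical Python helper that turns simple <code>Brewfile</code> lines into the equivalent Terminal commands. It's only a sketch for understanding the syntax (Homebrew Bundle does its own parsing and supports many more options), it also covers the <code>tap</code> lines a generated <code>Brewfile</code> starts with, and it uses the <code>brew cask install</code> form from this post:</p>
<pre><code>import re

# Hypothetical translation table: Brewfile pattern -> Terminal command template.
PATTERNS = [
    (re.compile(r'^tap "([^"]+)"'), "brew tap {}"),
    (re.compile(r'^brew "([^"]+)"'), "brew install {}"),
    (re.compile(r'^cask "([^"]+)"'), "brew cask install {}"),
    (re.compile(r'^mas "[^"]*", id: (\d+)'), "mas install {}"),
]


def to_terminal_command(line):
    """Return the Terminal command for one Brewfile line, or None for blanks/comments."""
    line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
    for pattern, template in PATTERNS:
        match = pattern.match(line)
        if match:
            return template.format(match.group(1))
    return None


brewfile = '''
# install Python and SQLite
brew "python"
mas "com.apple.iWork.Pages", id: 409201541 # Pages
cask "vlc"
'''

for line in brewfile.splitlines():
    command = to_terminal_command(line)
    if command:
        print(command)
# brew install python
# mas install 409201541
# brew cask install vlc
</code></pre>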
<p>I think you know where this is going by now: run <code>brew bundle dump</code> on the current machine, copy the <code>Brewfile</code> generated to the new one, run <code>brew bundle</code>, and Homebrew will take it from there. If you have lots of apps and packages the process will take some time, but nowhere near the time (or effort) it would have taken to do manually.</p>
<h2>A quick-guide on setting up a new macOS using a Brewfile</h2>
<p>Here's an abbreviated guide to setting up a new Mac with Homebrew Bundle. Unless otherwise stated, all commands below are to be typed in the macOS Terminal prompt.</p>
<p>The steps involved are:</p>
<ol>
<li>Installing dependencies on the current (source) macOS machine</li>
<li>Installing Homebrew taps</li>
<li>Generating a <code>Brewfile</code></li>
<li>Migration</li>
</ol>
<h3>1. Installing dependencies on the source machine</h3>
<h3>Homebrew</h3>
<p>Check if you already have Homebrew installed:</p>
<pre><code>$ brew help
</code></pre>
<p>If Homebrew isn't installed, the output should be something like <code>brew: command not found</code>. Homebrew itself depends on the command line tools (CLT) for Xcode, installed like this:</p>
<pre><code>$ xcode-select --install
</code></pre>
<p>You can then install Homebrew by pasting the following in your Terminal prompt:</p>
<pre><code>$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
</code></pre>
<p>If you do have Homebrew, and <code>help</code> prints a long list of commands, it's a good idea to run an update before proceeding:</p>
<pre><code>$ brew update
</code></pre>
<h3>Homebrew Cask</h3>
<p>Homebrew Cask comes with Homebrew, but it doesn't hurt to make sure it's there:</p>
<pre><code># more on "tap" later
$ brew tap homebrew/cask
</code></pre>
<h4>Homebrew Bundle</h4>
<p>It will be installed automatically the first time we run it (later).</p>
<h4>Mac App Store CLI</h4>
<p>The way to install <code>mas-cli</code> varies depending on the macOS version. You can find simple instructions in the project's <a href="https://github.com/mas-cli/mas">GitHub repository</a>, but if you have a recent version this should suffice:</p>
<pre><code>$ brew install mas
</code></pre>
<h3>2. Installing Homebrew taps</h3>
<p>Think of <code>taps</code> as additional sources <code>brew</code> will look at when searching for and installing formulae and casks. Here's what I recommend if you're following this tutorial:</p>
<pre><code># for good measure, I've included the default taps:
brew tap homebrew/bundle
brew tap homebrew/cask
brew tap homebrew/cask-fonts
brew tap homebrew/core
brew tap homebrew/services
brew tap mas-cli/tap
</code></pre>
<h3>3. Creating the Brewfile</h3>
<p>Now that all of the dependencies are installed, let's generate a <code>Brewfile</code>:</p>
<pre><code># navigate to the user's home (~) directory
$ cd

# "dump" (create) the Brewfile in our home directory
# based on which packages and apps are installed
$ brew bundle dump
</code></pre>
<p>Notice that <code>Brewfile</code> may be missing non-MAS applications and packages that you haven't installed with <code>brew</code> or <code>brew cask</code>. If you installed Firefox from Mozilla's website, <code>homebrew-bundle</code> doesn't know about it. It's easy enough to search for those and add them manually. And, it's something you only have to do once since you'll never ever again go to a website, find the install link, wait for the download to finish, and then drag the app icon to <code>/Applications</code>.</p>
<p>A <code>Brewfile</code> looks something like this:</p>
<pre><code>tap "homebrew/bundle"
tap "homebrew/cask"
tap "homebrew/cask-fonts"
tap "homebrew/core"
tap "homebrew/services"
# ... possibly more tap commands here

brew "atomicparsley"
brew "autoconf"
brew "freetype"
# ... more brew commands here

cask "font-fira-mono"
cask "sip"
# ... more cask commands here

mas "com.acqualia.soulver", id: 413965349
mas "com.agilebits.onepassword-osx", id: 443987910
mas "com.agiletortoise.Drafts-OSX", id: 1435957248
mas "com.apple.dt.Xcode", id: 497799835
mas "com.apple.iWork.Keynote", id: 409183694
mas "com.apple.iWork.Numbers", id: 409203825
mas "com.apple.iWork.Pages", id: 409201541
# ... more mas commands here

</code></pre>
<p>If you'd like to omit some packages or otherwise change the <code>Brewfile</code> that your target macOS will use, you can simply copy the file somewhere else and make your changes there.</p>
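<p>If you'd rather script those edits, here's a minimal Python sketch that drops entries by name. It assumes the one-directive-per-line layout that <code>brew bundle dump</code> produces (as in the sample above); <code>omit_entries</code> is a hypothetical helper, not part of Homebrew.</p>
<pre><code>import re

# Matches directive lines such as:  cask "sip"  or  mas "com.apple.dt.Xcode", id: 497799835
# Assumption: one directive per line, with a double-quoted name right after the keyword.
DIRECTIVE = re.compile(r'^\s*(tap|brew|cask|mas)\s+"([^"]+)"')


def omit_entries(brewfile_text, names_to_omit):
    """Return a copy of the Brewfile text without directives whose quoted name is listed."""
    kept = []
    for line in brewfile_text.splitlines():
        match = DIRECTIVE.match(line)
        if match and match.group(2) in names_to_omit:
            continue  # drop this tap/formula/cask/app
        kept.append(line)  # comments, blank lines and all other directives are kept as-is
    return "\n".join(kept) + "\n"
</code></pre>
<p>For example, <code>omit_entries(text, {"sip", "com.apple.dt.Xcode"})</code> would leave out the <code>sip</code> cask and the Xcode <code>mas</code> entry while keeping everything else.</p>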
<p>I keep <a href="https://github.com/SHxKM/macos-setup/blob/master/Brewfile">my Brewfile</a> in a GitHub repository, but you can place it in Dropbox, Google Drive, or wherever.</p>
<p>One more change I make is placing all <code>mas</code> directives before the <code>cask</code> ones, so the App Store version of an app is preferred in case that app is mistakenly listed in both sections.</p>
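<p>That reordering can be scripted too. Below is a minimal Python sketch, again assuming one directive per line; <code>mas_before_cask</code> is a hypothetical helper name. It moves every <code>mas</code> line to just before the first <code>cask</code> line and leaves everything else in its original order.</p>
<pre><code>def mas_before_cask(brewfile_text):
    """Return Brewfile text with all `mas` lines moved in front of the first `cask` line."""
    lines = brewfile_text.splitlines()
    mas_lines = [line for line in lines if line.lstrip().startswith("mas ")]
    other_lines = [line for line in lines if not line.lstrip().startswith("mas ")]

    # Find the position of the first cask directive among the remaining lines.
    for index, line in enumerate(other_lines):
        if line.lstrip().startswith("cask "):
            break
    else:
        index = len(other_lines)  # no casks at all: append the mas lines at the end

    reordered = other_lines[:index] + mas_lines + other_lines[index:]
    return "\n".join(reordered) + "\n"
</code></pre>
<p>To apply it in place, read and rewrite the file, e.g. <code>path.write_text(mas_before_cask(path.read_text()))</code>, where <code>path</code> is a <code>pathlib.Path</code> pointing at your copy of the <code>Brewfile</code>.</p>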
<h3>4. Migration</h3>
<p>The only dependency needed on the new machine is Homebrew (see step 1). That's because the <code>Brewfile</code> pulled from the old setup already stages all others for installation.</p>
<p>Once Homebrew is installed and a <code>Brewfile</code> is present, it's as simple as running:</p>
<pre><code>$ brew bundle
</code></pre>
<p><code>brew bundle</code> will look for a <code>Brewfile</code> in the current directory, but you can also specify the path manually:</p>
<pre><code># will install from a Brewfile in the Dropbox folder
$ brew bundle --file ~/Dropbox/Brewfile
</code></pre>
<p>If you enjoyed this post, please consider <a href="https://github.com/homebrew/brew#donations">donating to Homebrew</a>.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Django: Keeping logic out of templates (and views)</title>
      <link>https://thebiglog.com/posts/django-keeping-logic-out-of-templates-and-views/</link>
      <guid isPermaLink="true">https://thebiglog.com/posts/django-keeping-logic-out-of-templates-and-views/</guid>
      <pubDate>Mon, 08 Apr 2019 00:00:00 GMT</pubDate>
      <content:encoded><![CDATA[<p>When I first started dabbling with Django and web-development, a good friend with a little more experience advised that I should keep logic away from my templates. <em>"Templates should be dumb"</em>.</p>
<p>I didn't really understand what that meant until I started suffering the consequences of having logic in my <code>.html</code> files. After 3 years with Django, I now try to keep business-logic away not only from templates, but also from views.</p>
<p>In this post I'll go over the options gradually, from the least to the most recommended, and outline the advantages each one offers.</p>
<h2>Our app: a simple blog</h2>
<p>Let's start with extracting logic from the templates first. As with most real-world apps, the project starts simple in its specifications and requirements, and grows gradually.</p>
<p>Given this model:</p>
<pre><code># models.py
from django.db import models
from django.utils import timezone


class Post(models.Model):
    title = models.CharField(max_length=90, blank=False)
    content = models.TextField(blank=False)
    slug = models.SlugField(max_length=90)
    is_draft = models.BooleanField(default=True, null=False)
    is_highlighted = models.BooleanField(default=False)
    published_date = models.DateTimeField(default=timezone.now)
    likes = models.IntegerField(default=0)

    class Meta:
        ordering = ('-published_date',)

    def __str__(self):
        return self.title

    @property
    def is_in_past(self):
        return self.published_date &lt; timezone.now()

</code></pre>
<h2>The worst: logic in templates</h2>
<p>In our blog's <code>index.html</code>, we want to display the latest 10 posts' titles and their publication date. The title should also be a link to the <code>post-detail</code> view, where the post content is presented.</p>
<p>While we do want to see our drafts so we can preview how they look on the website, we certainly don't want them visible to other visitors.</p>
<pre><code># views.py
from django.shortcuts import render

from .models import Post


def all_posts(request):
    context = {}
    posts = Post.objects.all()[:10]
    context['posts'] = posts
    return render(request, 'index.html', context)

</code></pre>
<pre><code>{# index.html #}
{% for post in posts %}
  {% if request.user.is_superuser %}
    &lt;div class="post-section"&gt;
      &lt;h4&gt;
        &lt;a href="https://thebiglog.com{% url 'post-detail' pk=post.id %}"&gt;{{ post.title }}&lt;/a&gt;

        {% if post.is_draft %}
          &lt;span class="alert alert-info small"&gt;Draft&lt;/span&gt;
        {% endif %}

        {% if not post.is_in_past %}
          &lt;span class="alert alert-info small"&gt;Future Post&lt;/span&gt;
        {% endif %}
      &lt;span class="text-muted"&gt; Date: {{ post.published_date }}&lt;/span&gt;
      &lt;/h4&gt;
    &lt;/div&gt;
  {% elif not request.user.is_superuser and not post.is_draft %}
    &lt;div class="post-section"&gt;
      &lt;h4&gt;
        &lt;a href="https://thebiglog.com{% url 'post-detail' pk=post.id %}"&gt;{{ post.title }}&lt;/a&gt;
      &lt;/h4&gt;
      &lt;span class="text-muted"&gt; Date: {{ post.published_date }}&lt;/span&gt;
    &lt;/div&gt;
  {% endif %}
{% endfor %}
</code></pre>
<p>In <code>index.html</code>, we're checking if <code>request.user</code> is an admin, and if they are, we're not filtering any posts. In the <code>elif</code> block that applies to all other visitors, we're making sure the <code>is_draft</code> field is <code>False</code> before displaying the post:</p>
<pre><code>{% elif not request.user.is_superuser and not post.is_draft %}
</code></pre>
<p>We’re also adding some Bootstrap markup so an admin can see clearly if a certain post is a draft or one that is scheduled in the future. We don't need this markup for regular visitors because they're not supposed to see these posts in the first place.</p>
<p>This kind of design is pretty bad for several reasons:</p>
<ol>
<li>No separation of concerns: why is the <em>template</em> deciding which posts to show?</li>
<li>Violates the DRY (Don't Repeat Yourself) principle: look at the span tag that holds the date. Because of our choice, we have to repeat it in both clauses of our <code>if</code> statement.</li>
<li>Verbosity: our <code>index.html</code> is only displaying links to our posts, yet it already feels very cluttered.</li>
<li>Readability and maintainability: the Django template language is good, but it isn't known for its clean syntax. If you come back to this in 6 months, can you quickly tell what's happening? Will you remember that if you add a <code>div</code> containing the post's author name, you have to add it in both clauses of the <code>if</code> statement?</li>
</ol>
<h2>The better way</h2>
<p>If instead we write our view like this:</p>
<pre><code># views.py
from django.shortcuts import render

from .models import Post


def posts_index(request):
    context = {}
    limit = 10
    posts = Post.objects.all()

    if not request.user.is_superuser:
        # hide drafts
        posts = posts.filter(is_draft=False)

    context['posts'] = posts[:limit]
    return render(request, 'index.html', context)

</code></pre>
<p>Then our <code>index.html</code> file looks like this:</p>
<pre><code>{# index.html #}
{% for post in posts %}
  &lt;div class="post-section"&gt;
    &lt;h4&gt;
      &lt;a href="https://thebiglog.com{% url 'post-detail' pk=post.id %}"&gt;{{ post.title }}&lt;/a&gt;

      {% if post.is_draft %}
        &lt;span class="alert alert-info small"&gt;Draft&lt;/span&gt;
      {% endif %}

      {% if not post.is_in_past %}
        &lt;span class="alert alert-info small"&gt;Future Post&lt;/span&gt;
      {% endif %}
    &lt;/h4&gt;
    &lt;span class="text-muted"&gt; Date: {{ post.published_date }}&lt;/span&gt;
  &lt;/div&gt;
{% endfor %}

</code></pre>
<p>We keep the business logic out of the template, which should be responsible strictly for presentation 90% of the time. Templates should mostly be concerned with <em>how</em> elements are rendered, not <em>which</em> ones are rendered, or <em>whether</em> they are.</p>
<p>What we gain here:</p>
<ul>
<li>DRYness: we're no longer repeating the HTML for rendering the post.</li>
<li>Reusability: because <code>index.html</code> no longer makes a decision about whether to display a post, we can use it in other views later (<code>archive</code> for example).</li>
<li>Readability: it's much clearer now what's happening in <code>index.html</code> and it'll be easier to figure out when we come back to it in the future.</li>
</ul>
<p>So this is much better, and probably sufficient if you're developing a super-simple application. But even with this, you'll start repeating yourself sooner rather than later.</p>
<p>You may have spotted a bug in the code above. We’re not filtering out <em>future posts</em> (those with a <code>published_date</code> value in the future) when we render the index to the blog’s visitors.</p>
<p>Let's fix that:</p>
<pre><code># views.py
from django.utils import timezone

def posts_index(request):
    context = {}
    limit = 10
    posts = Post.objects.all()[:limit]

    if not request.user.is_superuser:
        # filter out drafts and future posts
        posts = Post.objects.filter(is_draft=False, published_date__lte=timezone.now())[:limit]

    context['posts'] = posts
    return render(request, 'index.html', context)

</code></pre>
<p>Now only the admin will see future posts.</p>
<p>Now, we create a new view, <code>featured_posts</code>, where we only want to display posts that are marked as highlighted by us, using the <code>is_highlighted</code> field of the model. Simple enough:</p>
<pre><code>def featured_posts(request):
    context = {}
    posts = Post.objects.filter(is_highlighted=True)

    if not request.user.is_superuser:
        posts = posts.filter(is_draft=False, published_date__lte=timezone.now())

    context['posts'] = posts
    # we're free to use `index.html` here because our template is now re-usable
    return render(request, 'index.html', context)
</code></pre>
<p>Now let's create a third view, <code>dashboard</code>, where we display the latest 10 regular posts and the latest 10 highlighted posts (the two lists may overlap):</p>
<pre><code>def dashboard(request):
    context = {}
    posts = Post.objects.all()
    limit = 10
    posts_featured = Post.objects.filter(is_highlighted=True)

    if not request.user.is_superuser:
        posts = posts.filter(is_draft=False, published_date__lte=timezone.now())
        posts_featured = posts_featured.filter(is_draft=False, published_date__lte=timezone.now())

    context['last_posts'] = posts[:limit]
    context['last_posts_featured'] = posts_featured[:limit]

    return render(request, 'dashboard.html', context)
</code></pre>
<p>We already see two problems here:</p>
<ol>
<li>Our code is getting more and more verbose, and that's with only two fields to filter by. Imagine having 3 or 4 (like author and tags for example). With real-world applications you'll often have more.</li>
<li>We're leaking implementation details of our models to our views: our view now has to know that there's a field called <code>is_highlighted</code> in our models.</li>
</ol>
<p>Worse yet, consider what happens if we now decide that posts appearing under the featured sections in our blog should meet two criteria:</p>
<ul>
<li><code>is_highlighted</code> is <code>True</code></li>
<li><code>likes</code> count is at least <code>3</code></li>
</ul>
<p>We now have to update the code in two of our views so it includes the new criterion:</p>
<pre><code>Post.objects.filter(is_highlighted=True, likes__gte=3)
</code></pre>
<p>Now imagine the work involved when you have 7 views, and two more criteria to filter by - definitely a possibility when you're dealing with larger scale apps.</p>
<h2>The even better way(s)</h2>
<p>There are two ways to go about this. We'll quickly cover the first one, which is considered less conventional and less natural, but does the job fine if you need something quick and dirty.</p>
<h3>Class methods</h3>
<pre><code>class Post(models.Model):
    # ...

    @classmethod
    def published(cls):
        """
        :return: published posts only: no drafts and no future posts
        """
        return cls.objects.filter(is_draft=False, published_date__lte=timezone.now())

    @classmethod
    def featured(cls):
        """
        :return: featured posts only
        """
        return cls.objects.filter(is_highlighted=True)
</code></pre>
<p>We've added two model methods, which we can use in our views like this:</p>
<pre><code># notice: no .objects because it's a model/class method

published_posts = Post.published()
featured_posts = Post.featured()
published_and_featured = Post.published() &amp; Post.featured()
</code></pre>
<p>Look at how much cleaner our <code>dashboard</code> becomes with this change:</p>
<pre><code>def dashboard(request):
    context = {}
    posts = Post.objects.all()
    limit = 10
    posts_featured = Post.featured()

    if not request.user.is_superuser:
        posts = posts &amp; Post.published()
        posts_featured = posts_featured &amp; Post.published()

    context['last_posts'] = posts[:limit]
    context['last_posts_featured'] = posts_featured[:limit]

    return render(request, 'dashboard.html', context)
</code></pre>
<p>What's more, changing our criteria for what is considered a "featured" post becomes as simple as changing one line in <code>Post.featured()</code>:</p>
<pre><code>class Post(models.Model):
    # ...
    @classmethod
    def featured(cls):
        """
        :return: highlighted posts with at least 3 likes
        """
        return cls.objects.filter(is_highlighted=True, likes__gte=3)
</code></pre>
<p>Now all the views that invoke this model method will update accordingly.</p>
<p>So this is pretty sweet, but, as I wrote, it's considered less conventional in the Django community. One more limitation of model methods is that they are not directly chainable:</p>
<pre><code># attempting to chain our two methods
&gt;&gt;&gt; posts_featured_published = Post.featured().published()

AttributeError: 'QuerySet' object has no attribute 'published'
</code></pre>
<p>This is why we turn to the <code>&amp;</code> operator, which Django overloads on querysets to combine two of them with SQL <code>AND</code>:</p>
<pre><code># using '&amp;' to further filter our queryset
posts_featured_published = Post.featured() &amp; Post.published()
</code></pre>
<p>So using model methods solves many of the previous approach's shortcomings, but there's an even better way.</p>
<h3>Custom model managers</h3>
<p>I'm not going to go in-depth about managers vs querysets, as this is beyond the scope of this post. Let's get rid of the model methods from the previous step and instead define our <code>models.py</code> file like this:</p>
<pre><code># models.py
from django.db import models
from django.utils import timezone


class PostQuerySet(models.QuerySet):
    def published(self):
        return self.filter(is_draft=False, published_date__lte=timezone.now())

    def featured(self):
        return self.filter(is_highlighted=True)


# Create your models here.
class Post(models.Model):
    title = models.CharField(max_length=90, blank=False)
    content = models.TextField(blank=False)
    slug = models.SlugField(max_length=90)
    is_draft = models.BooleanField(default=True, null=False)
    is_highlighted = models.BooleanField(default=False)
    published_date = models.DateTimeField(default=timezone.now)
    likes = models.IntegerField(default=0)

    # use PostQuerySet as the manager for this model
    objects = PostQuerySet.as_manager()

    class Meta:
        ordering = ('-published_date',)

    def __str__(self):
        return self.title

    @property
    def is_in_past(self):
        return self.published_date &lt; timezone.now()
</code></pre>
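<p>Of note is the <code>objects</code> attribute we've added to <code>Post</code>: <code>PostQuerySet.as_manager()</code> builds a manager that exposes <code>PostQuerySet</code>'s methods, and assigning it to <code>objects</code> makes it this model's manager.</p>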
<p>Let's examine, once again, our <code>dashboard</code> view:</p>
<pre><code>def dashboard(request):
    context = {}
    posts = Post.objects.all()
    limit = 10
    posts_featured = Post.objects.featured()

    if not request.user.is_superuser:
        posts = posts.published()
        posts_featured = posts_featured.published()

    context['last_posts'] = posts[:limit]
    context['last_posts_featured'] = posts_featured[:limit]

    return render(request, 'dashboard.html', context)
</code></pre>
<p>Notice how these two manager methods are now chainable:</p>
<pre><code>&gt;&gt;&gt; posts_featured_published = Post.objects.featured().published()

&lt;PostQuerySet [&lt;Post: ...&gt;, &lt;Post: ...&gt;]&gt;
</code></pre>
<p>With <code>PostQuerySet</code> in our <code>models.py</code> file, we're extending the manager methods at our disposal, so alongside <code>get</code>, <code>filter</code>, <code>aggregate</code>, etc., we now have <code>published</code> and <code>featured</code>.</p>
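<p>To see why the custom methods chain, here's a toy, in-memory Python analogy. It is <em>not</em> Django code: <code>MiniQuerySet</code>, <code>MiniPostQuerySet</code> and <code>FakePost</code> are hypothetical names, and this <code>filter()</code> takes a function rather than field lookups. What it illustrates is that <code>filter()</code> returns a new object of the <em>same class</em> it was called on, so the subclass's methods are still available after every step. Django's <code>QuerySet</code> clones itself in a broadly similar way, which is what lets <code>Post.objects.featured().published()</code> work.</p>
<pre><code>from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FakePost:
    # hypothetical in-memory stand-in for Post: only the fields our filters read
    title: str
    is_draft: bool
    is_highlighted: bool
    published_date: datetime


class MiniQuerySet:
    """Toy imitation of a QuerySet: filter() returns a new object of the same class."""

    def __init__(self, items):
        self.items = list(items)

    def filter(self, predicate):
        # type(self) preserves subclasses, so custom methods survive filtering
        return type(self)(item for item in self.items if predicate(item))

    def titles(self):
        return [item.title for item in self.items]


class MiniPostQuerySet(MiniQuerySet):
    # counterparts of PostQuerySet.published() and PostQuerySet.featured()
    def published(self):
        now = datetime.now()
        return self.filter(lambda p: not p.is_draft and now >= p.published_date)

    def featured(self):
        return self.filter(lambda p: p.is_highlighted)


now = datetime.now()
posts = MiniPostQuerySet([
    FakePost("Live and featured", False, True, now - timedelta(days=1)),
    FakePost("Draft", True, True, now - timedelta(days=1)),
    FakePost("Scheduled", False, True, now + timedelta(days=1)),
    FakePost("Live, not featured", False, False, now - timedelta(days=1)),
])

# chaining works in either order because each step returns a MiniPostQuerySet
print(posts.featured().published().titles())  # ['Live and featured']
print(posts.published().featured().titles())  # ['Live and featured']

# the plain base class has no published() method, mirroring the
# AttributeError that the class-method approach runs into
print(hasattr(MiniQuerySet([]), "published"))  # False
</code></pre>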
<p>A few advantages of using model managers over class methods:</p>
<ol>
<li>Chainability and clarity: <code>Post.objects.featured().published()</code> looks more Pythonic and natural than <code>Post.featured() &amp; Post.published()</code>.</li>
<li>Reusability: in many cases you can reuse the same queryset class for more than one model. Maybe in the future you'll create a <code>ShortNote</code> model (with the same fields) that can be managed by the same <code>PostQuerySet</code>. With model methods you'd have to redefine the custom filters inside your <code>ShortNote</code> model.</li>
</ol>
<p>There are a few more advantages, such as the ability to define <em>several</em> managers on the same model, but these are beyond the scope of this post.</p>
<p>So, the takeaway: keep logic out of templates at almost all costs, and keep as little of it as possible in your views. If you want something quick, a model method may suffice, but prefer model managers.</p>
]]></content:encoded>
    </item>
    <item>
      <title>Django templates: 'include' context</title>
      <link>https://thebiglog.com/posts/django-templates-include-context/</link>
      <guid isPermaLink="true">https://thebiglog.com/posts/django-templates-include-context/</guid>
      <pubDate>Sun, 09 Sep 2018 00:00:00 GMT</pubDate>
      <content:encoded><![CDATA[<p>Something I learned today which should come in handy. The <code>include</code> tag allows rendering a partial template from another:</p>
<pre><code>{% include 'foo/bar.html' %}
</code></pre>
<p>So I was doing this to pass context to the included partial:</p>
<pre><code>{% with obj=release %}
{% include 'releases_widget.html' %}
{% endwith %}
</code></pre>
<p>And this is why it's good to read the docs, because apparently this can be done much better like so:</p>
<pre><code>{% include 'releases_widget.html' with obj=release %}
</code></pre>
]]></content:encoded>
    </item>
    <item>
      <title>Restoring a database from Heroku for local development</title>
      <link>https://thebiglog.com/posts/restoring-a-database-from-heroku-for-local-development/</link>
      <guid isPermaLink="true">https://thebiglog.com/posts/restoring-a-database-from-heroku-for-local-development/</guid>
      <pubDate>Wed, 05 Sep 2018 00:00:00 GMT</pubDate>
      <content:encoded><![CDATA[<p>I've recently had to download my Django app's database for local inspection. Heroku lets you do that pretty easily with:</p>
<pre><code>$ heroku pg:backups:download
</code></pre>
<p>This gets you a <code>.dump</code> file. Now it's time to create a database clone out of it.</p>
<p>Here's the gist. We first create a new database:</p>
<pre><code>$ sudo -u USERNAME createdb NEW_DATABASE_NAME
</code></pre>
<p>Note that <code>USERNAME</code> and <code>NEW_DATABASE_NAME</code> should be replaced with the respective values.</p>
<p>The next step is to restore the downloaded <code>.dump</code> to the database we just created:</p>
<pre><code>$ pg_restore --verbose --clean --no-acl --no-owner -h localhost -d NEW_DATABASE_NAME /PATH/TO/latest.dump
</code></pre>
<p>And now there's a database clone that you can connect to at <code>NEW_DATABASE_NAME</code>. It's also possible to overwrite an existing database by supplying its name instead of a new database name, which makes the database-creation step redundant.</p>
<p>The process usually finishes with some reported errors, but I never noticed anything weird or wrong with database copies I've generated this way.</p>
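<p>If you do this often, the two commands can be wrapped in a short Python script. A minimal sketch, assuming <code>sudo</code>, <code>createdb</code> and <code>pg_restore</code> are on your <code>PATH</code>; the function names are hypothetical, and the arguments stand in for the post's <code>USERNAME</code>, <code>NEW_DATABASE_NAME</code> and <code>/PATH/TO/latest.dump</code> placeholders. The flags are exactly the ones used above.</p>
<pre><code>import subprocess


def restore_commands(dump_path, new_db, os_user):
    """Return the createdb and pg_restore commands from this post as argument lists."""
    create = ["sudo", "-u", os_user, "createdb", new_db]
    restore = [
        "pg_restore", "--verbose", "--clean", "--no-acl", "--no-owner",
        "-h", "localhost", "-d", new_db, dump_path,
    ]
    return create, restore


def restore_locally(dump_path, new_db, os_user):
    create, restore = restore_commands(dump_path, new_db, os_user)
    # stop here if the database can't be created (e.g. the name is already taken)
    subprocess.run(create, check=True)
    # don't raise on a non-zero exit: as noted above, pg_restore usually
    # reports some errors even when the resulting copy is usable
    subprocess.run(restore, check=False)
</code></pre>
<p>For example, <code>restore_locally("latest.dump", "myapp_local", "postgres")</code>, with your own values substituted. To overwrite an existing database instead, you would skip the <code>createdb</code> step and pass that database's name.</p>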
]]></content:encoded>
    </item>
  </channel>
</rss>
