Cron vs Timedelta on Airflow: how to use them properly

Fabio Antunes
3 min read · Mar 28, 2023

When setting up an Airflow DAG (Directed Acyclic Graph), two of the most important arguments to set are the start date and the schedule interval. The ‘start_date’, as straightforward as its name, defines the point in time from which the DAG starts being scheduled. To define it, you can write something like this:

with DAG('dag_name', start_date=datetime(2023, 1, 1)) as dag:

If you set a start date in the past, as in the example, be aware of the ‘catchup’ parameter: it controls whether Airflow backfills the runs that were missed since that date. Read the Airflow docs about it and set it explicitly to True or False.
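To get a feel for why catchup matters, here is a rough sketch that counts how many daily intervals have already closed between a past start date and "now". With catchup enabled, Airflow would create a run for each of them. The `completed_intervals` helper and the fixed `now` value are made up for this illustration; they are not part of the Airflow API.

```python
from datetime import datetime, timedelta


def completed_intervals(start_date, now, interval):
    """Count the whole intervals that have fully elapsed between start_date and now."""
    if now <= start_date:
        return 0
    # Dividing two timedeltas with // gives an integer number of whole intervals.
    return (now - start_date) // interval


start_date = datetime(2023, 1, 1)
now = datetime(2023, 1, 11, 12, 0)  # assumption: the DAG is deployed on Jan 11 at noon

backlog = completed_intervals(start_date, now, timedelta(days=1))
print(backlog)  # 10 daily intervals are waiting to be backfilled
```

With catchup set to False, only the most recent interval would be scheduled instead of the whole backlog.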

On the other hand, the ‘schedule_interval’ parameter defines how often your DAG and its tasks will run. (From Airflow 2.4 onward the same values are passed through the ‘schedule’ parameter, and ‘schedule_interval’ is deprecated.) There are a few ways to define it, but in this short article we’ll focus on cron expressions and timedelta objects, and on the differences between them. We’ll also go over use cases.

If you use Linux or another Unix-based OS, you probably know how to use cron expressions to schedule tasks to run periodically. If you don’t, in short: a cron expression is a pattern of five fields (minute, hour, day of month, month, day of week), and a task is executed whenever the current time matches it. You may have seen something like this before: ‘0 8 * * 1-5’, which means run a task every workday at 8am. Alternatively, Airflow also allows you to use cron presets, such as ‘@daily’, ‘@hourly’, ‘@weekly’ and so forth.

Your code will look like this:

with DAG('dag_name', start_date=datetime(2023, 1, 1), schedule_interval='@daily') as dag:

Or:

with DAG('dag_name', start_date=datetime(2023, 1, 1), schedule_interval='0 0 * * *') as dag:
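If the "pattern that matches the current time" idea still feels abstract, the sketch below checks a few datetimes against the ‘0 8 * * 1-5’ expression from earlier. It is a deliberately simplified matcher written for this article: it only understands `*`, single numbers, ranges and comma lists, it requires all five fields to match, and the helper names `field_matches` and `cron_matches` are made up. It is not how cron or Airflow implement matching.

```python
from datetime import datetime


def field_matches(field, value):
    """Check one cron field; supports '*', 'a', 'a-b' and comma-separated lists."""
    for part in field.split(','):
        if part == '*':
            return True
        low, _, high = part.partition('-')
        if int(low) <= value <= int(high or low):
            return True
    return False


def cron_matches(expression, moment):
    """Return True when every field of the expression matches the given datetime."""
    minute, hour, day, month, weekday = expression.split()
    # cron counts Sunday as 0, while Python's weekday() counts Monday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    return (field_matches(minute, moment.minute)
            and field_matches(hour, moment.hour)
            and field_matches(day, moment.day)
            and field_matches(month, moment.month)
            and field_matches(weekday, cron_weekday))


workdays_at_8 = '0 8 * * 1-5'
monday_8am = cron_matches(workdays_at_8, datetime(2023, 1, 2, 8, 0))
monday_9am = cron_matches(workdays_at_8, datetime(2023, 1, 2, 9, 0))
saturday_8am = cron_matches(workdays_at_8, datetime(2023, 1, 7, 8, 0))
print(monday_8am, monday_9am, saturday_8am)  # True False False
```

Notice that the matcher only ever looks at the datetime it is given. It has no idea when the previous match happened, and that is the whole point of the comparison that follows.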

Besides cron expressions, it’s also possible to set the ‘schedule_interval’ using timedelta. According to the Python docs, datetime is a module that supplies classes for manipulating dates and times, and timedelta is one of those classes: it represents a duration, the difference between two dates or times. So, you can simply set a timedelta to ‘days=1’ or ‘hours=12’. In this case, your code might look like this:

with DAG('dag_name', start_date=datetime(2023, 1, 1), schedule_interval=timedelta(days=1)) as dag:
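In case timedelta is new to you, here is a tiny standard-library sketch of what "a duration" means in practice. Nothing here is Airflow-specific, and the variable names are just for illustration.

```python
from datetime import datetime, timedelta

start = datetime(2023, 1, 1, 8, 0)

# Adding a duration to a point in time gives another point in time.
next_daily = start + timedelta(days=1)    # 2023-01-02 08:00
next_half = start + timedelta(hours=12)   # 2023-01-01 20:00

# Subtracting two points in time gives a duration.
gap = datetime(2023, 1, 4) - datetime(2023, 1, 1)

print(next_daily, next_half, gap.days)
```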

But, ok, what’s the difference between them?

At first glance, they really do seem to be two ways of doing the same thing when setting the schedule_interval argument. To understand their differences and use cases, it helps to dig a little deeper and look at some examples.

A cron expression, as said before, is a pattern that triggers a task whenever the current time matches it. However, cron is stateless: it only looks at the clock and the calendar, so it doesn’t record past jobs or know how long ago the previous one ran. That makes it a poor fit when what you need is a fixed interval between runs. To make this crystal clear, let’s use an example.

When writing your DAG arguments and parameters, let’s assume that you’ve set the start date to January 1st, 2023 at 8am. After that, you’ve set the schedule_interval with the cron preset ‘@daily’, which stands for the cron expression ‘0 0 * * *’. You might expect the schedule to follow the 8am of your start date, but it follows midnight instead: the first scheduled interval begins on January 2nd at 00:00, the first midnight after the start date, and Airflow triggers that run once the interval has ended. This happens because a cron expression doesn’t measure the time elapsed since the start date; it simply fires when its pattern matches the clock (stateless, as mentioned before).

In contrast, timedelta schedules are stateful: each run is computed from the start_date that has been set and then from the previous run, as a difference between two dates. Using the previous example, your DAG will be triggered on January 2nd at 8am, because the one-day interval is counted from the 8am start date, and every later run follows 24 hours after the one before it.

To use it, you just need to specify the frequency. For instance, to run your DAG with a one-day interval between runs, set ‘schedule_interval=timedelta(days=1)’.
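The two behaviours can be put side by side with a small standard-library sketch. It only computes the schedule boundaries each approach is anchored to (midnight for the cron preset, the start date itself for timedelta). Exactly when a run fires relative to those boundaries is up to Airflow, and the helper names `midnight_points` and `timedelta_points` are invented for this example.

```python
from datetime import datetime, timedelta


def midnight_points(start_date, count):
    """Daily boundaries aligned to 00:00, like '@daily' / '0 0 * * *'."""
    first = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
    if first < start_date:
        first += timedelta(days=1)  # first midnight at or after start_date
    return [first + timedelta(days=i) for i in range(count)]


def timedelta_points(start_date, interval, count):
    """Boundaries spaced by a fixed interval, counted from start_date itself."""
    return [start_date + interval * i for i in range(count)]


start_date = datetime(2023, 1, 1, 8, 0)
cron_like = midnight_points(start_date, 3)
delta_like = timedelta_points(start_date, timedelta(days=1), 3)

for point in cron_like:
    print('cron     ', point)   # Jan 2, Jan 3, Jan 4, all at 00:00
for point in delta_like:
    print('timedelta', point)   # Jan 1, Jan 2, Jan 3, all at 08:00
```

The cron-like boundaries ignore the 8am in the start date, while the timedelta ones carry it forward on every step.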

Ok, ok! Now we got it! But when should I use cron expressions and when timedelta objects? If the previous execution doesn’t really matter and you just want your DAG to run at the clock times described by a pattern, go with cron! But when frequency and previous execution matter, let’s say you want to run it with a 3-day interval, timedelta will fit better! A cron expression such as ‘0 0 */3 * *’ restarts its count at the beginning of each month, so the gap between the last run of one month and the first run of the next can be shorter than 3 days, and that doesn’t happen using timedelta.
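Here is a quick way to see the month-boundary problem for yourself. The sketch compares the days matched by a `*/3` day-of-month field (1, 4, 7, ... up to 31, then back to 1) with the days produced by adding three days at a time. Both helper functions are hypothetical and written only for this comparison; they use the standard library and don't touch Airflow.

```python
from datetime import date, timedelta


def every_third_day_of_month(start, end):
    """Days matched by a '*/3' day-of-month field: 1, 4, 7, ... restarting each month."""
    days = []
    current = start
    while current <= end:
        if (current.day - 1) % 3 == 0:
            days.append(current)
        current += timedelta(days=1)
    return days


def every_three_days(start, end):
    """Days produced by stepping forward with timedelta(days=3) from the start."""
    days = []
    current = start
    while current <= end:
        days.append(current)
        current += timedelta(days=3)
    return days


cron_days = every_third_day_of_month(date(2023, 1, 25), date(2023, 2, 7))
delta_days = every_three_days(date(2023, 1, 25), date(2023, 2, 7))

cron_gaps = [(b - a).days for a, b in zip(cron_days, cron_days[1:])]
delta_gaps = [(b - a).days for a, b in zip(delta_days, delta_days[1:])]

print(cron_gaps)   # [3, 3, 1, 3, 3] -> Jan 31 is followed by Feb 1
print(delta_gaps)  # [3, 3, 3, 3]
```

The one-day gap between January 31st and February 1st is exactly the kind of surprise that a fixed timedelta avoids.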

For further information, be sure to go over the Python docs on timedelta and the Airflow docs!
