By Chris Knight – Senior BI Engineer
Here at synvert TCM (formerly Crimson Macaw), we use dbt, a data transformation tool to help build data warehouses for our clients. We’re all about saving our clients time and cost, so how can you go about saving time in dbt with macros?
You can learn more about dbt here.
Macros in dbt are pieces of code that are referencable in multiple dbt models. They ensure that the code you write is ‘DRY’ (don’t repeat yourself – a principal that is fundamental to the dbt ethos) since they are reusable across multiple dbt models. This results in code that is easier to understand and quicker to write.
Macros in dbt use jinja2; a templating language that you can use in conjunction with SQL.
You must create them in the macros directory in your dbt project, as defined in the macro-paths element within the projects.yml file as shown below.
Example 1 – metadata generator
At synvert TCM, one way in which we use macros is to add standard metadata columns to our dbt models. We use metadata columns to capture data about each row such as the source system, change data capture (CDC) timestamp and operation, as well as the insert and update dates.
Having a macro that inserts these standard columns saves time as you don’t have to manually add them to each model. If a change or addition is required, this can be done in one place, with any changes propagating to each model, which also reduces the risk of errors.
The screenshot below shows the macro code. Note the opening and closing jinja2 tags.
When we create our dbt models, rather than manually listing every metadata column in every model that requires them, we need only add a reference to the macro, using the appropriate jinja2 syntax.
When the dbt model is compiled, the jinja2 code will be replaced by the column definitions from the macro. You can find the compiled code within the target\compiled directory within your dbt project. The compiled code for the above example; a user dimension is shown below.
As you can see, once the model has been compiled, the jinja2 reference to the macro name is replaced by the columns specified in the macro itself.
Example 2 – date dimension
A date dimension is a common feature of many data models. Date dimensions typically include additional attributes relating to a date and allow definable hierarchies so that measures roll up. For example, they allow daily sales to roll up into weekly sales, and weekly sales to roll up to monthly sales etc, which is useful for drill-down reports. In addition, they often include details of fiscal periods, national holidays, and leap years.
Date dimensions will often have the same or similar structure across not only multiple data models within a project but across multiple projects and multiple clients. This is a great scenario where a macro can save time. Here at synvert TCM, we have developed a macro which accepts parameters for the start and end dates to determine the range of dates the finished dimension will contain. It also contains the start of the fiscal year. This allows for slightly different requirements across projects and clients, whilst removing the need for multiple variations of the same macro thus ensuring that the code is DRY.
The macro leverages the date_spine macro from the dbt_utils package to provide the list of dates within the range specified by the start and end date parameters and uses this as the basis for the logic for each column in the date dimension. Other dbt packages can be found on the dbt hub.
To use the macro, create your date dimension dbt model and call the macro, passing in the desired variables.
Portable Macros
To allow this macro to be portable across projects we have included it in our own custom utilities package. We reference this in the packages.yml file, so it can install in a project simply by running dbt deps.
Here is the markdown that we added to our dbt documentation, which gives further details:
Conclusion: Saving time in dbt with macros.
The examples we’ve looked at here only scratch the surface of what is possible. At synvert TCM we have developed macros for a wide variety of uses. This includes batch control, generation of prototype data models and automated schema.yml file generation. We recommend the use of macros whenever you have code that repeats across multiple dbt models. If you have macros you are using across multiple dbt projects, consider including these in a package.
The consistency afforded using macros leads to quicker cross-project familiarisation, which in turn allows us to better support our clients.
Want to find out more? Need support on your own data journey? Get in touch here.