Migrating from v0.5 to v0.6¶
The release of dbtvault v0.6 has brought in a number of major changes:
Staging has been significantly improved, as we have introduced the new stage macro.
The source variable used by table macros in the dbt_project.yml file has previously caused some confusion.
This variable has been renamed to source_model and must be used in all models. See below for more details.
A big thank you to @balmasi for this suggestion.
With this update we've finally completed our move from writing metadata in-model to writing metadata in YAML
instead. The new stage macro entirely replaces the functionality of the old staging macros.
It is no longer necessary to call a combination of the old macros (such as multi_hash and from) in a staging model.
Previously, your staging model looked like this:
In v0.6, the equivalent is now this:
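As a rough illustration of what YAML-driven staging can look like, here is a minimal sketch of a dbt_project.yml fragment. It is not the original example from this page. The project name, model names and column names are hypothetical, and the hashed_columns and derived_columns keys are recalled from the dbtvault v0.6 documentation rather than shown in this guide, so check them against the stage macro documentation before relying on them.

```yaml
models:
  my_dbtvault_project:            # hypothetical project name
    staging:
      stg_customer_hashed:        # hypothetical staging model
        vars:
          # Model that supplies the raw data (hypothetical name)
          source_model: 'raw_customer'
          # Assumed key: columns to hash, written as new_column: source column(s)
          hashed_columns:
            CUSTOMER_PK: 'CUSTOMER_ID'
            CUSTOMER_NATION_PK:
              - 'CUSTOMER_ID'
              - 'NATION_ID'
          # Assumed key: extra columns derived from existing columns or constants
          derived_columns:
            SOURCE: '!STG_CUSTOMER'          # assumed: a leading ! marks a constant value
            EFFECTIVE_FROM: 'BOOKING_DATE'
```

With the metadata held in YAML like this, the staging model itself only needs to call the stage macro. Either the hashed_columns or the derived_columns section can be left out if it is not needed.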
No more unnecessary from macro, no more hard-to-read nested lists and no more awkward comma.
With this new approach, staging is also more modular; if you do not need to derive or hash columns then you can simply
skip writing that configuration in the YAML.
Staging has been messy for a little while, and we appreciate your patience whilst we worked on improving it! We hope that this makes life easier.
For more details, specific use case examples and full documentation of functionality see below:
derive_columns and hash_columns¶
The old staging macros, including multi_hash, have been reworked and renamed; the new macros are derive_columns and hash_columns.
We believe these names make much more sense. Generally, you won't need to use these macros individually (the new stage macro uses them internally),
but you may need to if you have specific staging needs and prefer to write your own staging layer macros with these macros as helpers.
source is now source_model¶
source has been renamed to source_model, which refers to the model that is the source of data for the current
model being used, e.g. a hub or link. This change was made after receiving feedback that the source variable
may cause confusion. Previously, the vars section of the YAML for each model in the dbt_project.yml file
set a variable named source. In v0.6, the same entry in the dbt_project.yml file is named source_model.
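The rename can be sketched with a hub's vars section. This is an illustrative fragment, not the original example from this page. The project name, model names and column names are hypothetical, and the src_* keys are recalled from the dbtvault hub documentation.

```yaml
models:
  my_dbtvault_project:          # hypothetical project name
    raw_vault:
      hubs:
        hub_customer:           # hypothetical hub model
          vars:
            # v0.5 named this key `source`; v0.6 expects `source_model`
            source_model: 'stg_customer_hashed'
            src_pk: 'CUSTOMER_PK'
            src_nk: 'CUSTOMER_ID'
            src_ldts: 'LOADDATE'
            src_source: 'SOURCE'
```

Only the key name changes; the value still names the staging model that feeds the table.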
This variable change applies to all models in the v0.6 release (not just hubs and links). Please adjust all
variables and variable invocations in the dbt_project.yml file and in your models to match.
Hubs and Links¶
The functionality of hubs and links has been updated to allow for loading multiple load dates in bulk. The hub and link SQL has also been refactored to use common table expressions (CTEs), as suggested in the Fishtown Analytics SQL style guide, to improve code readability.
The invocation of the hub and link macros has not changed, aside from the variable change stated above. The old invocations of the macros were:
The new invocation of the macros is now:
Hubs and Links are the only tables that can be loaded in bulk. Other table types (e.g. Satellites) require iteration due to their temporal attributes, and must be loaded in order. As of dbtvault v0.7.0, we have a new materialisation to make this easier.
The t-links have not changed, other than their invocation.
Satellites have gone through a minor change in v0.6.
As with other table macros, the invocation of the macro has changed as follows:
Satellites have been updated to allow hashdiff columns to be aliased. This feature will become part of a more versatile global aliasing functionality, which will allow users to set constant values for naming convention purposes.
HASHDIFF columns should be called HASHDIFF, as per Data Vault 2.0 standards. Because we have a shared
staging layer for the raw vault, we cannot have multiple columns sharing the same name. This means we have to name each
HASHDIFF column differently. dbtvault aims to align as closely as possible with Data Vault 2.0 standards,
and the following new feature is one of many steps we will be taking towards that goal.
Below is an example satellite YAML config from a
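A minimal sketch of such a satellite config follows. It is not the original example from this page. The project name, satellite name and payload columns are hypothetical, the source_column and alias keys are recalled from the dbtvault satellite documentation, and the alias value HASHDIFF is an assumption based on the naming standard described above.

```yaml
models:
  my_dbtvault_project:                  # hypothetical project name
    raw_vault:
      sats:
        sat_customer_details:           # hypothetical satellite model
          vars:
            source_model: 'stg_customer_details_hashed'
            src_pk: 'CUSTOMER_PK'
            # Aliasing: read CUSTOMER_HASHDIFF from staging, expose it as HASHDIFF
            src_hashdiff:
              source_column: 'CUSTOMER_HASHDIFF'
              alias: 'HASHDIFF'
            src_payload:                # hypothetical payload columns
              - 'NAME'
              - 'ADDRESS'
              - 'PHONE'
            src_eff: 'EFFECTIVE_FROM'
            src_ldts: 'LOADDATE'
            src_source: 'SOURCE'
```

The src_hashdiff entry is the part that changes: instead of a single column name, it takes a mapping from the staging column to the name used in the satellite.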
The highlighted lines show the syntax required to alias a column named
CUSTOMER_HASHDIFF (present in the
stg_customer_details_hashed staging layer) as