Migrating from v0.5 to v0.6
The release of dbtvault v0.6 has brought in a number of major changes:
- Staging has been significantly improved with the introduction of the new stage macro.
- The hub and link macros have been refactored to allow for multi-date and intra-day loading.
- The source variable used by table macros in the dbt_project.yml file has previously caused some confusion. This variable has been renamed to source_model and must be used in all models. See below for more details. A big thank you to @balmasi for this suggestion.
Staging
With this update we've finally completed our move from writing metadata in-model to writing metadata in YAML. The new stage macro entirely replaces the functionality of the old staging macros: it is no longer necessary to call a combination of multi_hash, add_columns and from in a staging model.
Previously, your staging model looked like this:
In v0.6, the equivalent is now this:
No more unnecessary from macro, no more hard-to-read nested lists and no more awkward comma.
With this new approach, staging is also more modular; if you do not need to derive or hash columns then you can simply
skip writing the configuration in the YAML.
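The Python sketch below gives a rough picture of this modularity. It models a staging step in which derivation and hashing run only when their configuration is supplied. It is not dbtvault code: the real stage macro is a Jinja macro that generates SQL. The function name stage, the parameter shapes and the simplified '||'-joined MD5 hash are all assumptions made for the example.

```python
import hashlib
from typing import Optional


def md5_hex(values: list) -> str:
    """Simplified hash (assumption): MD5 of the values' string forms joined with '||'."""
    joined = "||".join(str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def stage(
    rows: list[dict],
    derived_columns: Optional[dict[str, str]] = None,
    hashed_columns: Optional[dict[str, list[str]]] = None,
) -> list[dict]:
    """Hypothetical stand-in for a staging step, not dbtvault's API.

    Each part runs only if its configuration is supplied.
    """
    staged = []
    for row in rows:
        out = dict(row)  # start from a copy of the source columns
        # derived_columns maps a new column name to an existing source column.
        for new_col, src_col in (derived_columns or {}).items():
            out[new_col] = row[src_col]
        # hashed_columns maps a new column name to the source columns hashed together.
        for new_col, cols in (hashed_columns or {}).items():
            out[new_col] = md5_hex([row[c] for c in cols])
        staged.append(out)
    return staged
```

Calling stage(rows) with no configuration simply passes the source columns through. This mirrors leaving the derived and hashed sections out of the YAML.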
Staging has been messy for a little while, and we appreciate your patience whilst we worked on improving it! We hope that this makes life easier.
For more details, specific use case examples and full documentation of the functionality, see below:
See Also
derive_columns and hash_columns
The old macros multi_hash and add_columns have been reworked and renamed to hash_columns and derive_columns respectively.
We believe these names make much more sense. Generally, you won't need to use these macros individually, since the new stage macro uses them internally. However, you may need them if you have specific staging needs and prefer to write your own staging-layer macros with these macros as helpers.
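Here is a minimal Python sketch of what the two helpers are responsible for. hash_columns hashes one or more columns into a single value, and derive_columns adds new columns based on existing ones. The function names mirror the macros, but the signatures are hypothetical. The trim/upper-case normalisation, the '||' delimiter and the '^^' NULL placeholder are assumptions based on common Data Vault hashing practice, not a description of the SQL dbtvault generates.

```python
import hashlib
from typing import Optional

# Assumptions for this sketch, not dbtvault's confirmed hashing rules.
NULL_PLACEHOLDER = "^^"
DELIMITER = "||"


def normalise(value: Optional[object]) -> str:
    """Trim and upper-case a value; substitute a placeholder for NULL or empty."""
    if value is None:
        return NULL_PLACEHOLDER
    text = str(value).strip().upper()
    return text if text else NULL_PLACEHOLDER


def hash_columns(row: dict, columns: list[str]) -> str:
    """Hypothetical helper: MD5 over the normalised, delimited column values."""
    payload = DELIMITER.join(normalise(row.get(c)) for c in columns)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def derive_columns(row: dict, mapping: dict[str, str]) -> dict:
    """Hypothetical helper: copy of row with new columns taken from existing ones."""
    out = dict(row)
    for new_col, src_col in mapping.items():
        out[new_col] = row[src_col]
    return out
```

Normalising before hashing means ' alice ' and 'ALICE' produce the same hash. A NULL and an empty string also map to the same placeholder.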
Table Macros
source is now source_model
The variable source has been renamed to source_model. It refers to the model that supplies the data for the current model being built, e.g. a hub or link. This change was made after receiving feedback that the name source could cause confusion. Previously, the vars section of the YAML for each model in the dbt_project.yml file looked like:
The dbt_project.yml file will now look like:
Note
This variable change applies to all models in the v0.6 release, not just hubs and links. Please update all variables in the dbt_project.yml file, and all variable invocations in your models, accordingly.
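If you have many models to update, the rename can be scripted. The sketch below shows one possible approach in Python. It assumes the dbt_project.yml contents have already been loaded into nested dicts (for example with a YAML library) and that model SQL invokes the variable as var('source') or var("source"). The function names are hypothetical, so review the results before committing them.

```python
import re
from typing import Any

# Hypothetical migration helpers for the v0.6 rename of 'source' to 'source_model'.


def rename_source_var(node: Any) -> Any:
    """Recursively rename the 'source' key to 'source_model' inside every 'vars' mapping."""
    if isinstance(node, dict):
        renamed = {}
        for key, value in node.items():
            if key == "vars" and isinstance(value, dict):
                value = {("source_model" if k == "source" else k): v for k, v in value.items()}
            renamed[key] = rename_source_var(value)
        return renamed
    if isinstance(node, list):
        return [rename_source_var(item) for item in node]
    return node


# Matches var('source') or var("source"), but not var('src_source') or var('source_model').
_SOURCE_VAR = re.compile(r"""var\(\s*(['"])source\1\s*\)""")


def rename_sql_invocations(sql: str) -> str:
    """Rewrite var('source') invocations in model SQL to var('source_model')."""
    return _SOURCE_VAR.sub(r"var(\1source_model\1)", sql)
```

Round-tripping dbt_project.yml through a library such as PyYAML discards comments, so check the resulting diff carefully.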
Hubs and Links
Hubs and links have been updated to allow multiple load dates to be loaded in bulk. The hub and link SQL has also been refactored to use common table expressions (CTEs), as suggested in the Fishtown Analytics SQL style guide, to improve code readability.
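Conceptually, bulk loading a hub collapses several load dates into one insert. Each new primary key should appear once, with the earliest load date at which it was seen, and keys already in the hub are skipped. The Python sketch below illustrates that idea under those assumptions. It is not the SQL dbtvault generates (which uses CTEs), and the StagedRow and load_hub names are hypothetical.

```python
from datetime import date
from typing import NamedTuple


class StagedRow(NamedTuple):
    """Hypothetical staged record feeding a hub."""
    pk: str
    nk: str
    load_date: date
    source: str


def load_hub(staged: list[StagedRow], existing_pks: set[str]) -> list[StagedRow]:
    """Keep the earliest-loaded row per PK across all load dates, skipping PKs already in the hub."""
    earliest: dict[str, StagedRow] = {}
    for row in staged:
        if row.pk in existing_pks:
            continue
        current = earliest.get(row.pk)
        if current is None or row.load_date < current.load_date:
            earliest[row.pk] = row
    # Sort only to make the output order deterministic.
    return sorted(earliest.values(), key=lambda r: (r.load_date, r.pk))
```

Ties on load date are not handled specially here: whichever row is seen first wins.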
The invocation of the hub and link macros has not changed, aside from the variable rename described above. The old invocations of the macros were:
The new invocation of the macros is now:
Note
Hubs and Links are the only tables that can be loaded in bulk. Other table types (e.g. Satellites) require iteration due to their temporal attributes and must be loaded in order. As of dbtvault v0.7.0, a new materialisation is available to make this easier.
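The Python sketch below illustrates why satellites need ordered, iterative loading. Each batch is compared against the latest record per key, so a record is only inserted when its hashdiff changes. The sketch assumes one row per key per batch and ISO-formatted date strings, and all names are hypothetical. It is not the v0.7.0 materialisation.

```python
from typing import NamedTuple


class SatRow(NamedTuple):
    """Hypothetical satellite record."""
    pk: str
    hashdiff: str
    load_date: str  # ISO date string, so lexical order matches chronological order


def load_satellite(history: list[SatRow], batch: list[SatRow]) -> list[SatRow]:
    """Append batch rows whose hashdiff differs from the latest record for that PK."""
    latest: dict[str, SatRow] = {}
    for row in history:
        if row.pk not in latest or row.load_date > latest[row.pk].load_date:
            latest[row.pk] = row
    new_rows = [
        r for r in batch
        if latest.get(r.pk) is None or latest[r.pk].hashdiff != r.hashdiff
    ]
    return history + new_rows


def load_in_order(batches: dict[str, list[SatRow]]) -> list[SatRow]:
    """Iterate over load dates in ascending order, running one load per date."""
    history: list[SatRow] = []
    for load_date in sorted(batches):
        history = load_satellite(history, batches[load_date])
    return history
```

Each batch is compared with the history built by the previous ones. Applying the batches in a different order could therefore record different changes, which is what rules out a simple bulk load.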
T-Links
The t-links have not changed, other than their invocation.
Satellites
Satellites have gone through a minor change in v0.6.
Invocation
As with other table macros, the invocation of the macro has changed as follows:
Hashdiff aliasing
Satellites have been updated to allow hashdiff columns to be aliased. This feature will become part of more versatile global aliasing functionality, which will allow users to set constant values for naming-convention purposes.
As per Data Vault 2.0 standards, HASHDIFF columns should be called HASHDIFF. However, because the raw vault shares a single staging layer, we cannot have multiple columns with the same name, so each HASHDIFF column in staging has to be named differently. dbtvault aims to align as closely as possible with Data Vault 2.0 standards, and hashdiff aliasing is one of many steps we will be taking towards that goal.
Below is an example satellite YAML config from a dbt_project.yml file:
The highlighted lines show the syntax required to alias a column named CUSTOMER_HASHDIFF (present in the stg_customer_details_hashed staging layer) as HASHDIFF.
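The effect of the alias can be pictured as a rename during column selection. In the Python sketch below, a shared staging row carries two differently named hashdiff columns, and a satellite selects one of them and exposes it as HASHDIFF. The dict keys source_column and alias, and the ORDER_HASHDIFF column, are assumptions for the sketch, not a statement of dbtvault's YAML syntax.

```python
from typing import Union

# A column spec is either a plain column name or a hypothetical
# {"source_column": ..., "alias": ...} mapping.
ColumnSpec = Union[str, dict]


def resolve(spec: ColumnSpec) -> tuple[str, str]:
    """Return (staging column, output name) for a plain or aliased column spec."""
    if isinstance(spec, dict):
        return spec["source_column"], spec["alias"]
    return spec, spec


def select_satellite_columns(row: dict, specs: list[ColumnSpec]) -> dict:
    """Project a staging row onto satellite columns, applying any aliases."""
    out = {}
    for spec in specs:
        source, name = resolve(spec)
        out[name] = row[source]
    return out
```

Because the rename happens only when the satellite selects its columns, the staging layer can keep uniquely named hashdiff columns while each satellite still exposes a column called HASHDIFF.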