Links are another fundamental component in a Data Vault.
Links model an association or link, between two business keys. A good example would be a list of all Orders and the Customer associated with that order, in the business.
Due to the similarities in the load logic between links and hubs, most of this page will be familiar if you have already followed the hubs page.
Our links will contain:
Primary Key (src_pk)
A primary key (or surrogate key) which is usually a hashed representation of the natural key. For links, we take the natural keys (prior to hashing) represented by the foreign key columns below and create a hash on a concatenation of them.
Foreign Keys (src_fk)
Foreign keys referencing the primary key for each hub referenced in the link (2 or more depending on the number of hubs referenced)
Load date (src_ldts)
A load date or load date timestamp. This identifies when the record was first loaded into the database.
Record Source (src_source)
The source for the record. This can be a code which is assigned to a source name in an external lookup table, or a string directly naming the source system.
Setting up link models¶
Create a new dbt model as before. We'll call this one
To create a link model, we simply copy and paste the above template into a model named after the link we are creating. dbtvault will generate a link using metadata provided in the next steps.
Links should use the incremental materialization, as we load and add new records to the existing data set.
We recommend setting the
incremental materialization on all of your links using the
1 2 3 4 5 6 7 8 9 10 11 12
Adding the metadata¶
Now we need to provide some metadata to the link macro.
The first piece of metadata we need is the source model. This step is easy, as we created the
staging layer ourselves. All we need to do is provide the name of the staging layer in the
and dbtvault will do the rest for us.
1 2 3 4
See our metadata reference for more ways to provide metadata
Next, we define the columns which we would like to bring from the source.
Using our knowledge of what columns we need in our
link_customer_nation table, we can identify columns in our
staging layer which map to them:
- A primary key, which is a combination of the two natural keys: In this case
CUSTOMER_NATION_PKwhich we added in our staging layer.
CUSTOMER_KEYwhich is one of our natural keys (we'll use the hashed column,
NATION_KEYthe second natural key (we'll use the hashed column,
- A load date timestamp, which is present in the staging layer as
We can now add this metadata to the
1 2 3 4 5 6 7 8 9
We are using
src_fk, a list of the foreign keys. This is instead of the
we used when building the hubs. These columns must be given in this list format in the
for the links.
With our model complete and our YAML written, we can run dbt to create our
dbt run -m +link_customer_nation
And our table will look like this:
Loading from multiple sources to form a union-based link¶
In some cases, we may need to create a link via a union, instead of a single source as we have seen so far. This may be because we have multiple source staging tables, each of which contains a natural key of the link. This would require multiple feeds into one table: dbt prefers one feed, so we union the different feeds into one source before performing the insert via dbt.
So, this data can and should be combined because these records have a shared key. We can union the tables on that key, and create a link containing a complete record set.
We'll need to have a staging model for each of the sources involved,
and provide them as a list of strings in the
dbt_project.yml file as shown below.
If your primary key and natural key columns have different names across the different tables, they will need to be aliased to the same name in the respective staging layers via the stage macro.
The union link model will look exactly the same as creating a single source link model. To create a union you need to provide a list of sources rather than a single source in the metadata, the link macro will handle the rest.
1 2 3 4 5 6 7 8 9 10 11
We have now created:
- A staging layer
- A Hub
- A Link
Next we will look at transactional links.