Effectivity Satellites are built on Links and record the time period when the corresponding link records start and end effectivity. The next section will explore the need for effectivity satellites and explain some of the logic around their implementation.
The Driving Key problem and why we have Effectivity Satellites¶
Say we have a Table A. Table A contains a column which is a foreign key link to a Table B. In the Data Vault we model this as a Link, as follows:
The link has no from and to dates, it declares that there is a link between A and B for a reason, that is all.
To solve this, we create an Effectivity Satellite off
EFF_SAT_A_B. This contains information about
the status of
LINK_A_B. Now we have two columns,
END_DATE. When the Effectivity Satellite
record is created, we set the
START_DATE and leave the
END_DATE empty, or in our case set it to the maximal date of
9999-31-12 to signify that the relationship is will not expire - it is active, or 'open'.
Now imagine, the foreign key link in Table A changes for some reason, pointing to another record in Table B. The foreign key column in the row changes to point to the new record, it no longer points to the old record. This gives us two items of information: the new key and the ending of the old key, perhaps we have an order, and we want to change who placed the order.
The load will create a new
EFF_SAT_A_B record. The old record now needs to be end-dated so that we do not have 2 open
Link records. Ideally, we should end-date the original link relationship inside the same transaction to avoid 2 SQL queries
and possibly de-syncing.
In any Link there are two FK columns, one will change over time and the other will remain constant. The one that does not change is called the driving key and the one that does change is called the driven key (or secondary key).
Our effectivity satellites will contain:
Primary Key (src_pk)¶
A primary key (or surrogate key) which is usually a hashed representation of the natural key. For an effectivity satellite, this should be the same as the corresponding link's PK.
Driving Foreign Key (src_dfk)¶
The driving foreign key stores the primary key of the associated link, which will remain constant over time.
For example, in a relationship between a customer and an order, the order will always have occurred, but the customer
attached to the order may change over time if the order is amended. In this case the DFK would be the
(Derived from the
More on driving keys is described below.
Secondary Foreign Key (src_sfk)¶
The secondary foreign key stores the primary key of the associated link, which is likely to change over time.
As per the example in the DFK section above, this would be the
CUSTOMER_PK, derived from the
Start Date (src_start_date)¶
The start date (along with the end date) form the effectivity satellite payload, and is metadata related to the corresponding link table. The start date is the start date of the relationship in the link, which is being tracked by the effectivity satellite.
It is important to note than whilst this is usually the same or initialised to the same source data
EFFECTIVE_FROM, there can be divergence over time. The start date tracks the start date of the link, whilst
EFFECTIVE_FROM tracks the date that the effectivity satellite recorded a change in the relationship in the link.
End Date (src_end_date)¶
The end date also forms part of the effectivity satellite payload, and is metadata related to the corresponding link table, in the same way the start date is.
In cases of 1-1 and 1-M (One to many) relationships, this can usually be inferred. Unfortunately, with M-M (Many to Many) relationships, it becomes impossible to infer the end date between a pair of specific business entities, so we must explicitly define it and track the end date separately.
Effective From (src_eff)¶
An effectivity date. Usually called
EFFECTIVE_FROM, this column is the business effective date of a
satellite record. It records that a record is valid from a specific point in time.
If the link relationship changes, then the record with the old relation should no longer be valid, and it will no
longer have the most recent
Load date (src_ldts)¶
A load date or load date timestamp. This identifies when the record was first loaded into the database.
Record Source (src_source)¶
The source for the record. This can be a code which is assigned to a source name in an external lookup table, or a string directly naming the source system.
LOAD_DATE is the time the record is loaded into the database.
EFFECTIVE_FROM is different,
holding the business effectivity date of the record (i.e. When it actually happened in the real world) and will usually
hold a different value, especially if there is a batch processing delay between when a business event happens and the
record arriving in the database for load. Having both dates allows us to ask the questions 'what did we know when'
and 'what happened when' using the
EFFECTIVE_FROM date accordingly.
Setting up effectivity satellite models¶
Create a new dbt model as before. We'll call this one
1 2 3 4
To create an effectivity satellite model, we simply copy and paste the above template into a model named after the effectivity satellite we are creating. dbtvault will generate an effectivity satellite using parameters provided in the next steps.
Effectivity Satellites should use the incremental materialization, as we load and add new records to the existing data set.
We recommend setting the
incremental materialization on all of your satellites using the
1 2 3 4 5 6 7 8 9 10 11 12
Loading Effectivity Satellites correctly
dbtvault provides custom materialisations, designed to load effectivity satellites (among other structures) in the correct way:
Adding the metadata¶
Let's look at the metadata we need to provide to the eff_sat macro.
The first piece of metadata we need is the source model. This step is easy, as in this example we created the staging layer ourselves. All we need to do is provide the name of stage table as a string in our metadata as follows.
1 2 3
See our metadata reference for more ways to provide metadata
Next, we define the columns which we would like to bring from the source.
Using our knowledge of what columns we need in our
eff_sat_customer_nation table, we can identify columns in our
staging layer which map to them:
- The primary key of the parent link table, which is a hashed natural key.
CUSTOMER_NATION_PKwe created earlier in the staging section will be used for
CUSTOMER_PKcolumn. This will be our driving key.
NATION_PKcolumn. This will be our secondary key.
START_DATEcolumn. This should be initialised to the same value as the
END_DATEcolumn. This should be initialised to the maximal date (
9999-31-12), to signify an open link record (i.e. An active relationship)
EFFECTIVE_FROMcolumn, also added in staging.
- A load date timestamp, which is present in the staging layer as
We can now add this metadata to the
1 2 3 4 5 6 7 8 9 10 11
With our model complete and our YAML written, we can run dbt to create our
eff_sat_customer_nation effectivity satellite.
dbt run -m +eff_sat_customer_nation
And our table will look like this: