Effectivity Satellites are built on Links and record the time period when the corresponding Link records start and end effectivity. The next section will explore the need for Effectivity Satellites and explain why we have them and how they work.
The Driving Key problem and why we have Effectivity Satellites¶
Say we have a Table A. Table A contains a column which is a foreign key link to a Table B. In the Data Vault we model this as a Link, as follows:
A Link does not have any temporal data; it declares that there is a relationship between A and B, but nothing about when or for how long.
To solve this, we create an Effectivity Satellite from
EFF_SAT_A_B. This contains information about
the status of
LINK_A_B. Now we have two columns,
END_DATE. When the Effectivity Satellite
record is created, we set the
START_DATE and leave the
END_DATE empty, or in our case set it to the maximal date of
9999-31-12 to signify that the relationship will not expire - it is active, or 'open'.
Now imagine, the foreign key link in Table A changes for some reason, pointing to another record in Table B. The foreign key column in the row changes to point to the new record and it no longer points to the old record. This gives us two important pieces of information: the new key and the fact that the old key is no longer valid.
Perhaps we have a booking for a hotel room which has been cancelled, and we want to change the booking.
The load will create a new
EFF_SAT_A_B record. The old record now needs to be end-dated so that we do not have 2 open
Link records. Ideally, we should end-date the original Link relationship inside the same transaction to avoid 2 SQL queries
and possible de-syncing.
In any Link there are two FK columns, one will change over time and the other will remain constant. The one that does not change is called the driving key and the one that does change is called the driven key (or secondary key).
An Effectivity Satellite contains:
Primary Key (src_pk)¶
A primary key (or surrogate key) which is usually a hashed representation of the natural key. For an Effectivity Satellite, this should be the same as the corresponding link's PK.
Driving Foreign Key (src_dfk)¶
The driving foreign key stores the primary key of one of the associated Hubs to the parent Link, which will remain constant over time.
For example, in a relationship between a customer and an order, the order will always have occurred, but the customer
attached to the order may change over time if the order is amended. In this case the DFK would be the
(Derived from the
More on the concept of driving keys is described above.
Secondary Foreign Key (src_sfk)¶
The secondary foreign key stores the primary key of one of the associated Hubs to the parent Link, which is likely to change over time.
As per the example in the DFK section above, this would be the
CUSTOMER_HK, derived from the
Start Date (src_start_date)¶
The start date (along with the end date) form the Effectivity Satellite payload, and is metadata related to the corresponding Link table. The start date is the start date of the relationship in the Link, which is being tracked by the Effectivity Satellite.
It is important to note than whilst this is usually the same or initialised to the same source data
EFFECTIVE_FROM, there can be divergence over time. The start date tracks the start date of the link, whilst
EFFECTIVE_FROM tracks the date that the Effectivity Satellite recorded a change in the relationship in the Link.
End Date (src_end_date)¶
The end date also forms part of the effectivity satellite payload, and is metadata related to the corresponding Link table, in the same way the start date is.
In cases of 1-1 and 1-M relationships, this can usually be inferred. Unfortunately, with M-M relationships, it becomes impossible to infer the end date between a pair of specific business entities, so we must explicitly define it and track the end date separately.
Effective From (src_eff)¶
An effectivity date. Usually called
EFFECTIVE_FROM, this column is the business effective date of a
satellite record. It records that a record is valid from a specific point in time.
If the link relationship changes, then the record with the old relation should no longer be valid, and it will no
longer have the most recent
Load date (src_ldts)¶
A load date or load date timestamp. This identifies when the record was first loaded into the database.
Record Source (src_source)¶
The source for the record. This can be a code which is assigned to a source name in an external lookup table, or a string directly naming the source system.
Load date vs. Effective From Date¶
LOAD_DATE is the time the record is loaded into the database.
EFFECTIVE_FROM is different,
holding the business effectivity date of the record (i.e. When it actually happened in the real world) and will usually
hold a different value, especially if there is a batch processing delay between when a business event happens and the
record arriving in the database for load. Having both dates allows us to ask the questions 'what did we know when'
and 'what happened when' using the
EFFECTIVE_FROM date accordingly.
EFFECTIVE_FROM field is not part of the Data Vault 2.0 standard, and as such it is an optional field, however,
in our experience we have found it useful for processing and applying business rules in downstream Business Vault, for
use in presentation layers.
Creating effectivity satellite models¶
Create a new dbt model as before.
1 2 3 4
To create an Effectivity Satellite model, we simply copy and paste the above template into a model named after the Effectivity Satellite we are creating. dbtvault will generate an Effectivity Satellite using parameters provided in the next steps.
The recommended materialisation for Effectivity Satellites is
incremental, as we load and add new records to the existing data set.
Adding the metadata¶
Let's look at the metadata we need to provide to the eff_sat macro.
See our metadata reference for more detail on how to provide metadata.
We provide the column names which we would like to select from the staging area (
Using our knowledge of what columns we need in our
eff_sat_customer_nation Effectivity Satellite, we can identify columns in our
staging layer which map to them:
When we provide the metadata above, our model should look like the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
With our metadata provided and our model complete, we can run dbt to create
Effectivity Satellite, as follows:
dbt run -m +eff_sat_customer_nation
dbt run -s +eff_sat_customer_nation
The resulting Effectivity Satellite table will look like this:
|3C5984...||B8C37E...||79CBA1...||1993-01-01 00:00:00.000||9999-31-12 00:00:00.000||1993-01-01 00:00:00.000||1993-01-01 00:00:00.000||1|
|D8CB1F...||FED333...||8FAA77...||1993-01-01 00:00:00.000||9999-31-12 00:00:00.000||1993-01-01 00:00:00.000||1993-01-01 00:00:00.000||1|