Satellites contain point-in-time payload data related to their parent hub or link records. Satellites are where the concrete data for the business entities in our hubs and links resides. Each hub or link record may have one or more child satellite records, which form a history of changes to that hub or link record as they happen.
Each component of a satellite is described below.
Primary Key (src_pk)¶
A primary key (or surrogate key) which is usually a hashed representation of the natural key. For a satellite, this should be the same as the corresponding link or hub PK, concatenated with the load timestamp.
Hashdiff (src_hashdiff)¶
This is a concatenation of the payload (below) and the primary key. This allows us to detect changes in a record (much like a checksum). For example, if a customer changes their name, the hashdiff will change as a result of the payload changing.
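As an illustrative sketch (the column names are hypothetical, and in practice dbtvault generates this logic for you in the staging layer), a hashdiff could be derived like this:

```sql
-- Hypothetical example: a hashdiff built by hashing the key column
-- together with the payload columns. If any payload value changes,
-- the hash changes, signalling a new satellite record.
SELECT
    MD5(CONCAT_WS('||',
        CUSTOMER_ID,       -- key
        CUSTOMER_NAME,     -- payload
        CUSTOMER_DOB,      -- payload
        CUSTOMER_PHONE     -- payload
    )) AS CUSTOMER_HASHDIFF
FROM source_data
```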
Payload (src_payload)¶
The payload consists of concrete data for an entity (e.g. a customer). This could be a name, a date of birth, nationality, age, gender or more. The payload will contain some or all of the concrete data for an entity, depending on the purpose of the satellite.
Effective From (src_eff) - optional¶
An effectivity date. Usually called EFFECTIVE_FROM, this column is the business effective date of a satellite record. It records that a record is valid from a specific point in time. If a customer changes their name, then the record with their 'old' name should no longer be valid, and it will no longer have the most recent EFFECTIVE_FROM value.
This is an optional metadata column which can be useful later on, and is not part of the Data Vault 2.0 standard.
Load date (src_ldts)¶
A load date or load date timestamp. This identifies when the record was first loaded into the database.
Record Source (src_source)¶
The source for the record. This can be a code which is assigned to a source name in an external lookup table, or a string directly naming the source system.
Load date vs. Effective From Date¶
LOAD_DATE is the time the record is loaded into the database. EFFECTIVE_FROM is different, holding the business effectivity date of the record (i.e. when it actually happened in the real world). The two will usually hold different values, especially if there is a batch processing delay between when a business event happens and the record arriving in the database for load. Having both dates allows us to ask the questions 'what did we know when' and 'what happened when', using the LOAD_DATE and EFFECTIVE_FROM dates respectively.
The EFFECTIVE_FROM field is not part of the Data Vault 2.0 standard, and as such it is an optional field. However, in our experience we have found it useful for processing and applying business rules in the downstream Business Vault, and for use in presentation layers.
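For example (an illustrative query, assuming the sat_customer_details satellite built later in this tutorial):

```sql
-- 'What did we know when': constrain on LOAD_DATE.
-- 'What happened when': constrain on EFFECTIVE_FROM.
-- A name change effective on 2020-01-01 but loaded in a batch on
-- 2020-01-03 satisfies the EFFECTIVE_FROM filter below, but not the
-- LOAD_DATE filter, because we did not yet know about it on 2020-01-02.
SELECT *
FROM sat_customer_details
WHERE LOAD_DATE <= '2020-01-02'            -- what did we know by 2020-01-02?
-- WHERE EFFECTIVE_FROM <= '2020-01-02'    -- what was true by 2020-01-02?
```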
Creating satellite models¶
Create a new dbt model as before. We'll call this one sat_customer_details.sql.
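Based on dbtvault's sat macro, the template looks something like the following sketch (check the dbtvault documentation for the exact template for your version):

```jinja
{{ dbtvault.sat(src_pk=src_pk, src_hashdiff=src_hashdiff, src_payload=src_payload,
                src_eff=src_eff, src_ldts=src_ldts, src_source=src_source,
                source_model=source_model) }}
```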
To create a satellite model, we simply copy and paste the above template into a model named after the satellite we are creating. dbtvault will generate a satellite using parameters provided in the next steps.
The recommended materialisation for satellites is
incremental, as we load and add new records to the existing data set.
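In dbt, the materialisation can be set per model with a config block at the top of the model file:

```jinja
{{ config(materialized='incremental') }}
```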
Adding the metadata¶
Let's look at the metadata we need to provide to the satellite macro.
We provide the column names which we would like to select from the staging area.
Using our knowledge of what columns we need in our
sat_customer_details satellite, we can identify columns in our
staging layer which map to them:
| Parameter | Value |
| --- | --- |
| src_hashdiff | source_column: CUSTOMER_HASHDIFF, alias: HASHDIFF |
| src_payload | CUSTOMER_NAME, CUSTOMER_DOB, CUSTOMER_PHONE |
We're supplying a mapping (dictionary) to our src_hashdiff parameter, which allows the CUSTOMER_HASHDIFF source column to be aliased as HASHDIFF in the generated satellite.
When we provide the metadata above, our model should look like the following:
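Here is a sketch of the completed model. The src_hashdiff and src_payload values come from the table above; the src_pk, src_eff, src_ldts, src_source and source_model values shown here are illustrative assumptions and should be replaced with the column and model names from your own staging layer:

```jinja
{{ config(materialized='incremental') }}

{{ dbtvault.sat(src_pk='CUSTOMER_PK',
                src_hashdiff={'source_column': 'CUSTOMER_HASHDIFF',
                              'alias': 'HASHDIFF'},
                src_payload=['CUSTOMER_NAME', 'CUSTOMER_DOB', 'CUSTOMER_PHONE'],
                src_eff='EFFECTIVE_FROM',
                src_ldts='LOAD_DATE',
                src_source='RECORD_SOURCE',
                source_model='v_stg_customer') }}
```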
See our metadata reference for more detail on how to provide metadata to satellites.
With our model complete and our YAML written, we can run dbt to create our sat_customer_details satellite:

```shell
dbt run -m +sat_customer_details
```
And our table will look like this: