Multi Active Satellites (MAS)¶
Multi Active Satellites (MAS) contain point-in-time payload data related to their parent hub or link records that allow for multiple records to be valid at the same time. Some example use cases could be when customers have multiple active phone numbers or addresses.
In order to accommodate for multiple records of the same entity at a point-in-time, one or more Child Dependent Keys will be included in the Primary Key.
Our multi active satellite structures will contain:
Primary Key (src_pk)¶
A primary key (or surrogate key) which is usually a hashed representation of the natural key. For a multi active satellite, this should be the same as the corresponding link or hub PK, concatenated with the load timestamp.
Child Dependent Key(s) (src_cdk)¶
The child dependent keys are a subset of the payload (below) that helps with identifying the different valid records for each entity inside the multi active satellite. For example, a customer could have different valid phone number valid at the same time. The phone number attribute will be selected as a child dependent key that helps the natural key keep records unique and identifiable. If the customer has only one phone number, but multiple extensions associated with that phone number, then both the phone number, and the extension attribute will be considered a child dependent key.
This is a concatenation of the payload (below) and the primary key. This allows us to detect changes in a record (much like a checksum). For example, if a customer changes their name, the hashdiff will change as a result of the payload changing.
The payload consists of concrete data for an entity (e.g. A customer). This could be a name, a phone number, a date of birth, nationality, age, gender or more. The payload will contain some or all of the concrete data for an entity, depending on the purpose of the satellite.
Effective From (src_eff)¶
An effectivity date. Usually called
EFFECTIVE_FROM, this column is the business effective date of a multi active
satellite record. It records that a record is valid from a specific point in time.
If a customer changes their name, then the record with their 'old' name should no longer be valid, and it will no
longer have the most recent
Load date (src_ldts)¶
A load date or load date timestamp. This identifies when the record first gets loaded into the database.
Record Source (src_source)¶
The source for the record. This can be a code which is assigned to a source name in an external lookup table, or a string directly naming the source system.
LOAD_DATE is the time the record is loaded into the database.
EFFECTIVE_FROM is different,
holding the business effectivity date of the record (i.e. When it actually happened in the real world) and will usually
hold a different value, especially if there is a batch processing delay between when a business event happens and the
record arriving in the database for load. Having both dates allows us to ask the questions 'what did we know when'
and 'what happened when' using the
EFFECTIVE_FROM date accordingly.
Setting up MAS models¶
Create a new dbt model as before. We'll call this one
1 2 3
To create a MAS model, we simply copy and paste the above template into a model named after the MAS we are creating. dbtvault will generate a MAS using parameters provided in the next steps.
MAS should use the incremental materialization, as we load and add new records to the existing data set.
We recommend setting the
incremental materialization on all of your MAS using the
1 2 3 4 5 6 7 8 9 10
Loading Multi Active Satellites correctly
dbtvault provides custom materialisations, designed to load structures which contain deltas (such as multi active satellites, among other structures) in the correct way:
Adding the metadata¶
Let's look at the metadata we need to provide to the ma_sat macro.
The first piece of metadata we need is the source model. This step is easy, as in this example we created the staging layer ourselves. All we need to do is provide the name of stage table as a string in our metadata as follows.
1 2 3
See our metadata reference for more ways to provide metadata
Next, we define the columns which we would like to bring from the source.
Using our knowledge of what columns we need in our
ma_sat_customer_details table, we can identify columns in our
staging layer which map to them:
The primary key of the parent hub or link table, which is a hashed natural key. The
CUSTOMER_PKwe created earlier in the staging section will be used for
The child dependent key,
CUSTOMER_PHONE, that is part of the payload inside the raw staging layer.
- A hashdiff. We created
HASHDIFFin staging earlier, which we will use here.
- Some payload columns:
CUSTOMER_PHONEwhich should be present in the raw staging layer via an stage macro call.
EFFECTIVE_FROMcolumn, also added in staging.
- A load date timestamp, which is present in the staging layer as
We can now add this metadata to the
1 2 3 4 5 6 7 8 9 10 11 12
With our model complete and our YAML written, we can run dbt to create our
ma_sat_customer_details multi active satellite.
dbt run -m +ma_sat_customer_details
And our table will look like this: