Bridge tables are query assistant tables that are part of the Business Vault. Similar to PIT tables, their purpose is to improve performance of queries on the Raw Data Vault by reducing the number of required joins for such queries to simple equi-joins. A Bridge table spans across a Hub and one or more associated Links.
This means that it is essentially a specialised form of Link table, containing hash keys from the Hub and the Links its spans. It does not contain information from Satellites, however, it may contain computations and aggregations (according to grain) to increase query performance upstream when creating virtualised data marts. Bridge tables provide a timeline for valid sets of Hub and Link relationships for a given set of dates described in an As of Date table.
Below, is a diagram showing Effectivity Satellites, Hubs and Links which would be spanned by a Bridge table:
Our Bridge structures will contain:
Hub Table Name (source_model)¶
This is the name of the Hub that contains the primary key (
src_pk) and to which the Links are connected to.
Primary Key (src_pk)¶
A primary key (or surrogate key) which is usually a hashed representation of the natural key. This will be the primary key used by the Hub.
As of Date Table (as_of_dates_table)¶
The As of Date table describes the history needed to construct the Bridge table as a list of dates. This is where you would supply the name of your As of Date table.
Bridge Table Parameters (bridge_walk)¶
This is a dictionary of Bridge table metadata subdivided into dictionaries for each link relationship. The metadata for each link relationship includes Bridge table column aliases (bridge_x), Link table name and foreign key column names (link_x), and the related Effectivity Satellite table details (eff_sat_x).
Stage Load Date Timestamps (stage_tables_ldts)¶
List of stage table load date timestamp columns. These are used to find the water-level, i.e. the latest date that hasn't yet been impacted by the stage table.
Hub Load Date Timestamp (src_ldts)¶
Hub load date timestamp column. This is used to distinguish new key relationships when compared to the water-level.
Creating Bridge models¶
Create a new dbt model as before. We'll call this one
1 2 3 4 5
To create a Bridge model, we simply copy and paste the above template into a model named after the Bridge table we are creating. dbtvault will generate a Bridge table using parameters provided in the following steps.
Bridge tables should use the
bridge_incremental materialisation, as the Bridge is remade with each new As of Date table.
Adding the metadata¶
Let's look at the metadata we need to provide to the bridge macro.
Here we will define the metadata for the source_model. We will use the
HUB_CUSTOMER that we built before.
1 2 3
Next we need to choose which source columns we will use in our
The primary key of the parent Hub, which is a hashed natural key. The
CUSTOMER_PKwe created earlier in the Hub section will be used for
BRIDGE_CUSTOMER_ORDERas the origin Primary Key.
LOAD_DATETIMEcolumn which represents the load date timestamp the
CUSTOMER_PKis valid for.
1 2 3 4 5
As of Date Table¶
The As of Date table is the source information of the As of Dates. This will provide the dates for which to generate the Bridge table.
Here we name our As of Date table
1 2 3 4 5 6
Bridge table parameters (
Finally, we need to choose which Links to incorporate in our
Below there are described the different Bridge aliases, Links table and column names, Effectivity Satellite table and column names associated with one of the Link - Effectivity Satellite pair (
LINK_CUSTOMER_ORDER_PKwill be the alias for the Primary Key column of the
LINK_CUSTOMER_ORDERLink inside the
EFF_SAT_CUSTOMER_ORDER_ENDDATEis the Bridge alias for the
EFF_SAT_CUSTOMER_ORDER_LOADDATEis the Bridge alias for the
- The full table name of the Link connecting the Customer and Order Hubs is
- The name of the Primary Key column of
- The first Foreign Key is
- The second Foreign Key is
- The full table name of the associated Effectivity Satellite is
- The Primary Key of the
EFF_SAT_CUSTOMER_ORDERtable is the same as of the parent link:
- The name of the column inside the
EFF_SAT_CUSTOMER_ORDERtable describing the timestamp when a
CUSTOMER_ORDERrelationship ended is
- The name of the column inside the
EFF_SAT_CUSTOMER_ORDERtable recording the load date/timestamp of a
In a similar fashion, continue defining the different aliases for the
ORDER_PRODUCT Link and Effectivity Satellite columns.
The dbt_project.yml below only defines two Link relationships but to add others you would follow the same method inside
bridge_walk metadata. For instance, it can be seen where the
PRODUCT_COMPONENT relationship metadata would begin.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Finally, we add the Links & Effectivity Satellites stage table names and their Load Date/Timestamp column name.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
In the end, our model should look like the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
In order to finalise the creation of the
bridge_customer_order table we use the following dbt command:
dbt run -m +bridge_customer_order
dbt run -s +bridge_customer_order
The resulting Bridge table should look like this: