Skip to content

Point In Time (PIT) tables

A Point-In-Time table is a query assistant structure, part of the Business Vault, meant to improve the performance of loading and creating the information marts. Given a supplied list of dates/timestamps in an As of Date table, the PIT table will identify the relevant records from each Satellite for that specific date/timestamp and record the Hash Key and the LDTS value of that Satellite record. By identifying the coordinates of the relevant records at each point-in-time a priori, the information marts queries can make use of equi-joins which offer a significant boost in performance.

The recommendation is to use the PIT table when referencing at least two Satellites and especially when the Satellites have different rates of update.

Structure

Our Point-In-Time structures will contain:

Source Model (source_model)

This is the name of the parent Hub that contains the primary key (src_pk) and to which the Satellites are connected to.

Primary Key (src_pk)

A primary key (or surrogate key) which is usually a hashed representation of the natural key. This will be the primary key used by the parent Hub.

Load Date/Timestamp (src_ldts)

This is a string with the name of the Hub's Load Date/Timestamp column

As of Date Table (as_of_dates_table)

The as_of_dates_table describes the history needed to construct the PIT table as a list of dates. This is where you would supply the name of your As of Date table.

Satellites (satellites)

This is a dictionary that contains the metadata for the Satellites in subject. It will have three levels of keys.

The first level key is the name of the Satellite in uppercase.

The second level keys will be pk and ldts.

The third level key will be 'PK' and 'LDTS'. The expected value for the 'PK' key is the Hash Key column name of the Satellite (e.g. CUSTOMER_HK). The expected value for the 'LDTS' key is the Load Date/Timestamp column name of the Satellite (e.g. LOAD_DATETIME).

Stage Models (stage_tables_ldts)

This is a dictionary that contains the names of the Load Date/Timestamp columns for each stage table sourcing the Satellites.

The keys in the dictionary will be the stage table names (e.g. 'STG_CUSTOMER_DETAILS), whereas the values will be the name of the Load Date/Timestamp column for that stage table (e.g. 'LOAD_DATETIME')

Tip

To see a full example of how the metadata needs to be defined for a PIT object, please check the PIT section on the metadata page.

Creating PIT models

Create a new dbt model as before. We'll call this one pit_customer.

1
2
3
4
5
{{ automate_dv.pit(source_model=source_model, src_pk=src_pk,
                   as_of_dates_table=as_of_dates_table,
                   satellites=satellites,
                   stage_tables_ldts=stage_tables_ldts,
                   src_ldts=src_ldts) }}

To create a PIT model, we simply copy and paste the above template into a model named after the PIT we are creating. AutomateDV will generate a PIT using parameters provided in the next steps.

Materialisation

PIT tables should use the pit_incremental materialisation, as they will be remade with every new As of Date table.

Adding the metadata

Let's look at the metadata we need to provide to the pit macro.

Parameter Value
source_model hub_customer
src_pk CUSTOMER_HK
as_of_dates_table AS_OF_DATE
satellites {"SAT_CUSTOMER_DETAILS":
  {"pk": {"PK": "CUSTOMER_HK"}, {"ldts": {"LDTS": "LOAD_DATETIME"}}
}
{"SAT_CUSTOMER_LOGIN":
  {"pk": {"PK": "CUSTOMER_HK"}, {"ldts": {"LDTS": "LOAD_DATETIME"}}
}
stage_tables_ldts {"STG_CUSTOMER_DETAILS": "LOAD_DATETIME",
"STG_CUSTOMER_LOGIN": "LOAD_DATETIME"}
src_ldts LOAD_DATETIME

Source table

Here we will define the metadata for the source_model. We will use the HUB_CUSTOMER that we built before.

1
2
3
{%- set yaml_metadata -%}
source_model: hub_customer
...

Primary Key

Next we need add the Hub's Primary Key column

1
2
3
4
{%- set yaml_metadata -%}
source_model: hub_customer
src_pk: CUSTOMER_HK
...

Load Date/Timestamp

Next, we add the Load Date/Timestamp column name of the parent Hub

1
2
3
4
5
{%- set yaml_metadata -%}
source_model: hub_customer
src_pk: CUSTOMER_HK
src_ldts: LOAD_DATETIME
...

As of Date Table

Next, we provide the table name of the PIT's As of Date table.

1
2
3
4
5
6
{%- set yaml_metadata -%}
source_model: hub_customer
src_pk: CUSTOMER_HK
as_of_dates_table: AS_OF_DATE
src_ldts: LOAD_DATETIME
...

Satellites metadata

Here we add the Satellite related details (i.e. the Primary/Hash Key and the Load Date/Timestamp column names)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
{%- set yaml_metadata -%}
source_model: hub_customer
src_pk: CUSTOMER_HK
as_of_dates_table: AS_OF_DATE
satellites: 
  SAT_CUSTOMER_DETAILS:
    pk:
      PK: CUSTOMER_HK
    ldts:
      LDTS: LOAD_DATETIME
  SAT_CUSTOMER_LOGIN:
    pk:
      PK: CUSTOMER_HK
    ldts:
      LDTS: LOAD_DATETIME
src_ldts: LOAD_DATETIME
...

Stage metadata

Finally, we add Satellites' stage table names and their Load Date/Timestamp column names

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
{%- set yaml_metadata -%}
source_model: hub_customer
src_pk: CUSTOMER_HK
as_of_dates_table: AS_OF_DATE
satellites: 
  SAT_CUSTOMER_DETAILS:
    pk:
      PK: CUSTOMER_HK
    ldts:
      LDTS: LOAD_DATETIME
  SAT_CUSTOMER_LOGIN:
    pk:
      PK: CUSTOMER_HK
    ldts:
      LDTS: LOAD_DATETIME
stage_tables: 
  STG_CUSTOMER_DETAILS: LOAD_DATETIME
  STG_CUSTOMER_LOGIN: LOAD_DATETIME    
src_ldts: LOAD_DATETIME
{%- endset -%}

Now, our model should look like the following:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
{{ config(materialized='pit_incremental') }}

{%- set yaml_metadata -%}
source_model: hub_customer
src_pk: CUSTOMER_HK
as_of_dates_table: AS_OF_DATE
satellites: 
  SAT_CUSTOMER_DETAILS:
    pk:
      PK: CUSTOMER_HK
    ldts:
      LDTS: LOAD_DATETIME
  SAT_CUSTOMER_LOGIN:
    pk:
      PK: CUSTOMER_HK
    ldts:
      LDTS: LOAD_DATETIME
stage_tables: 
  STG_CUSTOMER_DETAILS: LOAD_DATETIME
  STG_CUSTOMER_LOGIN: LOAD_DATETIME    
src_ldts: LOAD_DATETIME
{%- endset -%}

{% set metadata_dict = fromyaml(yaml_metadata) %}

{% set source_model = metadata_dict['source_model'] %}
{% set src_pk = metadata_dict['src_pk'] %}
{% set as_of_dates_table = metadata_dict['as_of_dates_table'] %}
{% set satellites = metadata_dict['satellites'] %}
{% set stage_tables_ldts = metadata_dict['stage_tables_ldts'] %}
{% set src_ldts = metadata_dict['src_ldts'] %}

{{ automate_dv.pit(source_model=source_model, src_pk=src_pk,
                   as_of_dates_table=as_of_dates_table,
                   satellites=satellites,
                   stage_tables_ldts=stage_tables_ldts,
                   src_ldts=src_ldts) }}

Note

See our metadata reference for more details on how to provide metadata to PITs.

Running dbt

With our model complete and our YAML written, we can run dbt to create our pit_customer table.

dbt run -s +pit_customer

The resulting Point in Time table would look like this:

CUSTOMER_HK AS_OF_DATE SAT_CUSTOMER_DETAILS_PK SAT_CUSTOMER_DETAILS_LDTS SAT_CUSTOMER_LOGIN_PK SAT_CUSTOMER_LOGIN_LDTS
HY67OE... 2021-11-01 HY67OE... 2020-06-05 000000... 1900-01-01
RF57V3... 2021-11-01 RF57V3... 2017-04-24 RF57V3... 2021-04-01
. . . . . .
. . . . . .
HY67OE... 2021-11-15 HY67OE... 2021-11-09 HY67OE... 2021-11-14
RF57V3... 2021-11-15 RF57V3... 2017-04-24 RF57V3... 2021-04-01
. . . . . .
. . . . . .
HY67OE... 2021-11-31 HY67OE... 2021-11-09 HY67OE... 2021-11-30
RF57V3... 2021-11-31 RF57V3... 2021-11-20 RF57V3... 2021-04-01