![[transformation.drawio.svg]]

# Data layers in Enterprise Data Warehouse

BigQuery, Snowflake, or Databricks

| | **RAW_PRD** <br>**(*_staging table)** | **STAGING** | **INTERMEDIATE** | **MART** | **SEMANTIC LAYER** |
| --- | --- | --- | --- | --- | --- |
| **Environment** | Snowflake | Snowflake | Snowflake | Snowflake | Cube Cloud |
| **Description** | Represents raw data from your data sources | Applies incremental logic to the source tables | Cleans and prepares data for transformation. <br>Only joins between staging models are allowed | Converts data into fact, dimension & bridge tables | Views with applied business logic & business metrics, ready for consumption by the BI tool |
| **Technical Rules** | 1:1 replication of operational source data, done by the [Ingestion tool](https://asadventure.atlassian.net/wiki/spaces/Lakehouse/pages/4463231001) | - **Create a surrogate key when…** the raw data has a clear natural key (e.g. customer_id, order_id) and you want a consistent surrogate key early in the process<br>- Incremental logic (dbt snapshot or dbt incremental)<br>- Store static reference data (CSV, Excel) as dbt seeds<br>- Column-level data type transformations, e.g. string → date, unix_timestamp → datetime<br>- Column renames, e.g. id_customer → customer_id<br>- Rename columns & tables according to the [naming conventions](https://asadventure.atlassian.net/wiki/spaces/Lakehouse/pages/4475158529)<br>(see the staging sketch below) | - **Create a surrogate key when…** it needs to be derived from multiple sources (e.g. a composite key across multiple tables)<br>- UNION, e.g. customers_be & customers_nl<br>- JOINS, e.g. key_mapping<br>- Use of CTEs<br>- Pivot tables<br>(see the intermediate sketch below) | Check for uniqueness & not null on the surrogate_key (see the test sketch below) | - Pre-aggregated views ready for consumption by the BI tool<br>- Data is in star schema format<br>- Security: row-level + user-based security is added in the semantic layer |
| **dbt Materialization strategy** | NOT APPLICABLE | **Dimension data:** dbt snapshot, with strategy <br>a) **unique key + timestamp** (e.g. ingestion_export_date, airbyte_extracted_at) <br>or <br>b) **unique key + check** (using columns if no timestamp is available)<br>(see the snapshot sketch below)<br><br>**Transactional data:** type 1 history is sufficient, as full history is kept in RAW_PRD, so no extra incremental logic is needed | IF < 20 M rows: full refresh<br>ELSE: incremental + merge<br>(see the incremental sketch below) | IF < 20 M rows: full refresh<br>ELSE: incremental + merge | Pre-aggregated views served by the semantic layer |
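
To illustrate the staging rules, a minimal dbt staging model might look like the sketch below. The source and column names (`raw_prd`, `customers`, `id_customer`, `unix_created_at`) are hypothetical; the surrogate key uses the `generate_surrogate_key` macro from the dbt_utils package.

```sql
-- models/staging/stg_customers.sql (hypothetical example)
with source as (

    -- 1:1 read from the raw layer
    select * from {{ source('raw_prd', 'customers') }}

),

renamed as (

    select
        -- consistent surrogate key, created early on the clear natural key
        {{ dbt_utils.generate_surrogate_key(['id_customer']) }} as customer_key,

        -- column rename according to the naming conventions
        id_customer as customer_id,

        -- column-level data type transformations
        cast(birth_date as date) as birth_date,
        to_timestamp(unix_created_at) as created_at

    from source

)

select * from renamed
```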
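In the intermediate layer, a model combining the two staging sources from the UNION example could look like this sketch. The model and column names are assumptions; the composite surrogate key across sources is again built with dbt_utils.

```sql
-- models/intermediate/int_customers_unioned.sql (hypothetical example)
with customers_be as (

    select * from {{ ref('stg_customers_be') }}

),

customers_nl as (

    select * from {{ ref('stg_customers_nl') }}

),

unioned as (

    -- UNION of the country-specific staging models
    select 'BE' as country_code, * from customers_be
    union all
    select 'NL' as country_code, * from customers_nl

)

select
    -- surrogate key derived from multiple sources (composite key)
    {{ dbt_utils.generate_surrogate_key(['country_code', 'customer_id']) }} as customer_key,
    *
from unioned
```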
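The uniqueness and not-null checks on the surrogate key in the mart layer are typically expressed as dbt generic tests in a schema file. A sketch, assuming a hypothetical `fct_orders` model:

```yaml
# models/marts/_marts.yml (hypothetical example)
version: 2

models:
  - name: fct_orders
    columns:
      - name: order_key
        description: Surrogate key of the fact table
        tests:
          - unique
          - not_null
```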
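For dimension data in staging, the unique key + timestamp snapshot strategy maps to a dbt snapshot block like the one below. The snapshot name, target schema, and source are assumptions; `airbyte_extracted_at` comes from the example in the table.

```sql
-- snapshots/customers_snapshot.sql (hypothetical example)
{% snapshot customers_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='airbyte_extracted_at'
    )
}}

select * from {{ source('raw_prd', 'customers') }}

{% endsnapshot %}
```

For variant b), when no reliable timestamp is available, the same block would use `strategy='check'` with a `check_cols` list of the columns to compare.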
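For models above the 20 M row threshold, incremental + merge corresponds to a dbt config like this sketch (model and column names are assumptions):

```sql
-- models/intermediate/int_order_lines.sql (hypothetical example)
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='order_line_key'
    )
}}

select
    order_line_key,
    order_id,
    updated_at
from {{ ref('stg_order_lines') }}

{% if is_incremental() %}
-- on incremental runs, only pick up rows changed since the last run
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```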