ETL Overview for Analytics Data
Overview
Bonsai ingests raw, event-level first-party analytics data — including GA4, Adobe Analytics, Heap, or custom 1P tracking implementations — and standardizes it into a unified customer journey dataset for marketing attribution, incrementality modeling, and campaign analytics.
Analytics Platforms
GA4 BigQuery Export
The GA4 BigQuery Export provides raw event-level analytics data (sessions, events, traffic source metadata, device data, and commerce events) in BigQuery. Bonsai uses this export as an upstream source of truth for web and app behavioral touchpoints.
API documentation: This connector leverages the Google Analytics BigQuery Export Integration
Adobe Analytics Data Feed
The Adobe Analytics Data Feed provides raw, hit-level event data (page views, custom events, eVars, props, traffic source metadata, device data, and commerce events) delivered via cloud storage or API extraction. Bonsai uses this raw export as an upstream source of truth for web and app behavioral touchpoints.
API documentation: This connector leverages the Adobe Analytics Data Feed and Reporting APIs
Heap Data Export
Heap provides raw, event-level behavioral data (user interactions, page views, custom events, properties, traffic attribution metadata, and user identifiers) via warehouse sync or API export. Bonsai ingests this event-level export as a first-party behavioral source to construct standardized customer journey paths.
API documentation: This connector leverages the Heap Warehouse Sync and API Export
Custom First-Party Tracking (1P Event Stream)
Custom first-party tracking implementations (including proprietary event pipelines, server-side tracking, or custom 1P cookie frameworks) provide raw event-level behavioral data such as sessions, user identifiers, traffic source parameters, and commerce events. Bonsai ingests this structured event stream via secure file transfer or warehouse connection and transforms it into a standardized customer journey dataset.
API documentation: Integration specifications are provided directly by the client. Data must meet Bonsai’s event schema requirements (event timestamp, user identifier, session identifier, event name, traffic source metadata).
Transformation (ETL) Summary
Bonsai runs a daily ETL pipeline that normalizes any analytics event data and enriches it with campaign, geographic, device, and conversion attributes. The output is a unified touchpoint dimension table that represents each event in the customer journey.
Key transformation steps include:
Extract event-level records from analytics export tables.
Normalize identifiers (cookie/device ID, user ID, session ID) to support journey stitching.
Flatten and map traffic source fields into standardized source/medium/campaign fields.
Derive geographic and device dimensions from analytics event metadata.
Join commerce-related events to transaction_id and item details when available.
Parse / extract click identifiers (gclid, dclid, fbclid, etc.) for paid attribution.
Generate derived date dimensions (day of week, week, month, year).
Apply client-defined touchpoint dimensions (dim1–dim10) and matching flags.
Write consolidated records into the Bonsai touchpoint table.
Output Table
Table: event_touchpoint_dim_table
This table consolidates customer journey data, linking key interactions to subsequent events such as orders. It serves as a single source of truth for understanding how different marketing touchpoints contribute to customer purchase.
Table metadata:
Description: A comprehensive table that records a customer's journey, including touchpoints, campaign data, and attributed metrics, to facilitate marketing effectiveness analysis.
Update Frequency: Updated daily via an ETL process that aggregates data from various source systems.
Partitioning: This table is not currently partitioned.
Example Use Case: Analyze the custom attributed metrics against each campaign in any DMA.
Field Definitions:
client_number
INTEGER
An internal client identifier.
key
STRING
A primary key for the specific interaction or event within the journey.
cookie_id
STRING
Unique identifier assigned to a browser or device via cookie tracking.
user_id
STRING
Unique identifier associated with a known or logged-in user.
event_name
STRING
Name of the event captured (e.g., page_view, purchase, click).
journey_time
INTEGER
The timestamp of the event or interaction within the journey.
session_id
INTEGER
Unique identifier for the user’s session, grouping related events together.
continent
STRING
Geographical dimension related to the user's location. Used for broad geographic segmentation.
sub_continent
STRING
Geographical dimensions related to the user's location. Used for more granular geographic grouping.
country
STRING
Geographical dimensions related to the user's location. Used for national-level data segmentation and filtering.
region
STRING
Geographical dimensions related to the user's location. Can represent a state, province, or a collection of postal codes. Used for more specific location-based targeting.
metro
STRING
Geographical dimensions related to the user's location. A metropolitan area, which includes a city and its surrounding suburbs. Used for local-level targeting and analysis.
campaign
STRING
An organized effort to promote a product or service. Used to group and track performance data.
source
STRING
The origin of a user or event. Used to identify where traffic or conversions came from.
medium
STRING
The category of the source. Used to classify the type of channel that drove traffic (e.g., CPC, organic, email).
operating_system
STRING
User operating system
operating_system_version
STRING
User operating system version.
deviceCategory
STRING
Category of the device used (e.g., desktop, mobile, tablet).
mobile_brand_name
STRING
Brand name of the mobile device.
mobile_model_name
STRING
Model name of the mobile device.
mobile_marketing_name
STRING
Marketing name of the mobile device.
dayofweek
INTEGER
Numeric representation of the day of the week (1–7).
week
INTEGER
Week number of the year.
month
INTEGER
Month number of the year.
year
INTEGER
Four-digit year value.
date
DATE
Date of the event (YYYY-MM-DD).
transaction_id
STRING
Unique identifier for the transaction or purchase event.
item_category
STRING
Primary category of the item purchased or viewed.
item_category2
STRING
Secondary category of the item.
item_category3
STRING
Tertiary category of the item.
item_category4
STRING
Fourth-level item category.
item_category5
STRING
Fifth-level item category.
product_sku
STRING
Product SKU or stock keeping unit identifier.
netsalesquantity
INTEGER
Quantity of items sold (net of returns).
product_brand
STRING
Brand name of the product.
product_name
STRING
Name or title of the product.
product_category
STRING
Overall category or type of the product.
net_sales_amount_usd
FLOAT
Total net sales amount in U.S. dollars.
referrer
STRING
Referring URL or source that directed the user.
gclid
STRING
A Google click ID used by Google Ads.
dclid
STRING
A click ID used by DoubleClick.
fbclid
STRING
A click ID used by Facebook and Instagram Ads.
gbraid
STRING
A Google click ID used for iOS campaigns, specifically for app-to-web measurement on browsers that do not support third-party cookies. (Google Brand Referrer ID)
ko_click_id
STRING
A click ID used by TikTok for its advertising campaigns
li_fat_id
STRING
A click ID used by LinkedIn.
twclid
STRING
A click ID used by Twitter (X) for its advertising platform.
ttclid
STRING
TikTok click identifier used for ad attribution tracking.
wbraid
STRING
A Google click ID for iOS campaigns, used for web-to-app measurement.
dim1-10
STRING
Client-defined custom 1st-party touch point dimensions
cookie_match_dim1
STRING
flag1-10
STRING
cookie_match_flag
INTEGER
key_interaction_flag
INTEGER
A flag indicating if the event was a key interaction.
key_interaction_weight
INTEGER
A value representing the weight or importance of a key interaction.
Last updated
