ETL Overview for Analytics Data

Overview

Bonsai ingests raw, event-level first-party analytics data — including GA4, Adobe Analytics, Heap, or custom 1P tracking implementations — and standardizes it into a unified customer journey dataset for marketing attribution, incrementality modeling, and campaign analytics.

Analytics Platforms

GA4 BigQuery Export

The GA4 BigQuery Export provides raw event-level analytics data (sessions, events, traffic source metadata, device data, and commerce events) in BigQuery. Bonsai uses this export as an upstream source of truth for web and app behavioral touchpoints.

API documentation: This connector leverages the Google Analytics BigQuery Export Integrationarrow-up-right

Adobe Analytics Data Feed

The Adobe Analytics Data Feed provides raw, hit-level event data (page views, custom events, eVars, props, traffic source metadata, device data, and commerce events) delivered via cloud storage or API extraction. Bonsai uses this raw export as an upstream source of truth for web and app behavioral touchpoints.

API documentation: This connector leverages the Adobe Analytics Data Feed and Reporting APIsarrow-up-right

Heap Data Export

Heap provides raw, event-level behavioral data (user interactions, page views, custom events, properties, traffic attribution metadata, and user identifiers) via warehouse sync or API export. Bonsai ingests this event-level export as a first-party behavioral source to construct standardized customer journey paths.

API documentation: This connector leverages the Heap Warehouse Sync and API Exportarrow-up-right

Custom First-Party Tracking (1P Event Stream)

Custom first-party tracking implementations (including proprietary event pipelines, server-side tracking, or custom 1P cookie frameworks) provide raw event-level behavioral data such as sessions, user identifiers, traffic source parameters, and commerce events. Bonsai ingests this structured event stream via secure file transfer or warehouse connection and transforms it into a standardized customer journey dataset.

API documentation: Integration specifications are provided directly by the client. Data must meet Bonsai’s event schema requirements (event timestamp, user identifier, session identifier, event name, traffic source metadata).

Transformation (ETL) Summary

Bonsai runs a daily ETL pipeline that normalizes any analytics event data and enriches it with campaign, geographic, device, and conversion attributes. The output is a unified touchpoint dimension table that represents each event in the customer journey.

Key transformation steps include:

  • Extract event-level records from analytics export tables.

  • Normalize identifiers (cookie/device ID, user ID, session ID) to support journey stitching.

  • Flatten and map traffic source fields into standardized source/medium/campaign fields.

  • Derive geographic and device dimensions from analytics event metadata.

  • Join commerce-related events to transaction_id and item details when available.

  • Parse / extract click identifiers (gclid, dclid, fbclid, etc.) for paid attribution.

  • Generate derived date dimensions (day of week, week, month, year).

  • Apply client-defined touchpoint dimensions (dim1–dim10) and matching flags.

  • Write consolidated records into the Bonsai touchpoint table.

Output Table

Table: event_touchpoint_dim_table

This table consolidates customer journey data, linking key interactions to subsequent events such as orders. It serves as a single source of truth for understanding how different marketing touchpoints contribute to customer purchase.

Table metadata:

  • Description: A comprehensive table that records a customer's journey, including touchpoints, campaign data, and attributed metrics, to facilitate marketing effectiveness analysis.

  • Update Frequency: Updated daily via an ETL process that aggregates data from various source systems.

  • Partitioning: This table is not currently partitioned.

  • Example Use Case: Analyze the custom attributed metrics against each campaign in any DMA.

Field Definitions:

Field Name
Type
Description

client_number

INTEGER

An internal client identifier.

key

STRING

A primary key for the specific interaction or event within the journey.

cookie_id

STRING

Unique identifier assigned to a browser or device via cookie tracking.

user_id

STRING

Unique identifier associated with a known or logged-in user.

event_name

STRING

Name of the event captured (e.g., page_view, purchase, click).

journey_time

INTEGER

The timestamp of the event or interaction within the journey.

session_id

INTEGER

Unique identifier for the user’s session, grouping related events together.

continent

STRING

Geographical dimension related to the user's location. Used for broad geographic segmentation.

sub_continent

STRING

Geographical dimensions related to the user's location. Used for more granular geographic grouping.

country

STRING

Geographical dimensions related to the user's location. Used for national-level data segmentation and filtering.

region

STRING

Geographical dimensions related to the user's location. Can represent a state, province, or a collection of postal codes. Used for more specific location-based targeting.

metro

STRING

Geographical dimensions related to the user's location. A metropolitan area, which includes a city and its surrounding suburbs. Used for local-level targeting and analysis.

campaign

STRING

An organized effort to promote a product or service. Used to group and track performance data.

source

STRING

The origin of a user or event. Used to identify where traffic or conversions came from.

medium

STRING

The category of the source. Used to classify the type of channel that drove traffic (e.g., CPC, organic, email).

operating_system

STRING

User operating system

operating_system_version

STRING

User operating system version.

deviceCategory

STRING

Category of the device used (e.g., desktop, mobile, tablet).

mobile_brand_name

STRING

Brand name of the mobile device.

mobile_model_name

STRING

Model name of the mobile device.

mobile_marketing_name

STRING

Marketing name of the mobile device.

dayofweek

INTEGER

Numeric representation of the day of the week (1–7).

week

INTEGER

Week number of the year.

month

INTEGER

Month number of the year.

year

INTEGER

Four-digit year value.

date

DATE

Date of the event (YYYY-MM-DD).

transaction_id

STRING

Unique identifier for the transaction or purchase event.

item_category

STRING

Primary category of the item purchased or viewed.

item_category2

STRING

Secondary category of the item.

item_category3

STRING

Tertiary category of the item.

item_category4

STRING

Fourth-level item category.

item_category5

STRING

Fifth-level item category.

product_sku

STRING

Product SKU or stock keeping unit identifier.

netsalesquantity

INTEGER

Quantity of items sold (net of returns).

product_brand

STRING

Brand name of the product.

product_name

STRING

Name or title of the product.

product_category

STRING

Overall category or type of the product.

net_sales_amount_usd

FLOAT

Total net sales amount in U.S. dollars.

referrer

STRING

Referring URL or source that directed the user.

gclid

STRING

A Google click ID used by Google Ads.

dclid

STRING

A click ID used by DoubleClick.

fbclid

STRING

A click ID used by Facebook and Instagram Ads.

gbraid

STRING

A Google click ID used for iOS campaigns, specifically for app-to-web measurement on browsers that do not support third-party cookies. (Google Brand Referrer ID)

ko_click_id

STRING

A click ID used by TikTok for its advertising campaigns

li_fat_id

STRING

A click ID used by LinkedIn.

twclid

STRING

A click ID used by Twitter (X) for its advertising platform.

ttclid

STRING

TikTok click identifier used for ad attribution tracking.

wbraid

STRING

A Google click ID for iOS campaigns, used for web-to-app measurement.

dim1-10

STRING

Client-defined custom 1st-party touch point dimensions

cookie_match_dim1

STRING

flag1-10

STRING

cookie_match_flag

INTEGER

key_interaction_flag

INTEGER

A flag indicating if the event was a key interaction.

key_interaction_weight

INTEGER

A value representing the weight or importance of a key interaction.

Last updated