# ETL Overview for Analytics Data

## Overview

Bonsai ingests raw, event-level first-party analytics data — including GA4, Adobe Analytics, Heap, or custom 1P tracking implementations — and standardizes it into a unified customer journey dataset for marketing attribution, incrementality modeling, and campaign analytics.

## Analytics Platforms

### GA4 BigQuery Export

The GA4 BigQuery Export provides raw event-level analytics data (sessions, events, traffic source metadata, device data, and commerce events) in BigQuery. Bonsai uses this export as an upstream source of truth for web and app behavioral touchpoints.

**API documentation:** This connector leverages the [Google Analytics BigQuery Export Integration](https://support.google.com/analytics/answer/7029846?hl=en)

### Adobe Analytics Data Feed

The Adobe Analytics Data Feed provides raw, hit-level event data (page views, custom events, eVars, props, traffic source metadata, device data, and commerce events) delivered via cloud storage or API extraction. Bonsai uses this raw export as an upstream source of truth for web and app behavioral touchpoints.

**API documentation:** This connector leverages the [Adobe Analytics Data Feed and Reporting APIs](https://experienceleague.adobe.com/docs/analytics/export/analytics-data-feed/data-feed-overview.html)

### Heap Data Export

Heap provides raw, event-level behavioral data (user interactions, page views, custom events, properties, traffic attribution metadata, and user identifiers) via warehouse sync or API export. Bonsai ingests this event-level export as a first-party behavioral source to construct standardized customer journey paths.

**API documentation:** This connector leverages the [Heap Warehouse Sync and API Export](https://developers.heap.io/docs)

### Custom First-Party Tracking (1P Event Stream)

Custom first-party tracking implementations (including proprietary event pipelines, server-side tracking, or custom 1P cookie frameworks) provide raw event-level behavioral data such as sessions, user identifiers, traffic source parameters, and commerce events. Bonsai ingests this structured event stream via secure file transfer or warehouse connection and transforms it into a standardized customer journey dataset.

**API documentation:** Integration specifications are provided directly by the client. Data must meet Bonsai’s event schema requirements (event timestamp, user identifier, session identifier, event name, traffic source metadata).

## Transformation (ETL) Summary

Bonsai runs a daily ETL pipeline that normalizes any analytics event data and enriches it with campaign, geographic, device, and conversion attributes. The output is a unified touchpoint dimension table that represents each event in the customer journey.

Key transformation steps include:

* Extract event-level records from analytics export tables.
* Normalize identifiers (cookie/device ID, user ID, session ID) to support journey stitching.
* Flatten and map traffic source fields into standardized source/medium/campaign fields.
* Derive geographic and device dimensions from analytics event metadata.
* Join commerce-related events to transaction\_id and item details when available.
* Parse / extract click identifiers (gclid, dclid, fbclid, etc.) for paid attribution.
* Generate derived date dimensions (day of week, week, month, year).
* Apply client-defined touchpoint dimensions (dim1–dim10) and matching flags.
* Write consolidated records into the Bonsai touchpoint table.

### Output Table

#### Table: event\_touchpoint\_dim\_table

This table consolidates customer journey data, linking key interactions to subsequent events such as orders. It serves as a single source of truth for understanding how different marketing touchpoints contribute to customer purchase.

**Table metadata:**

* Description: A comprehensive table that records a customer's journey, including touchpoints, campaign data, and attributed metrics, to facilitate marketing effectiveness analysis.
* Update Frequency: Updated daily via an ETL process that aggregates data from various source systems.
* Partitioning: This table is not currently partitioned.
* Example Use Case: Analyze the custom attributed metrics against each campaign in any DMA.

**Field Definitions:**

| Field Name                 | Type    | Description                                                                                                                                                                  |
| -------------------------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| client\_number             | INTEGER | An internal client identifier.                                                                                                                                               |
| key                        | STRING  | A primary key for the specific interaction or event within the journey.                                                                                                      |
| cookie\_id                 | STRING  | Unique identifier assigned to a browser or device via cookie tracking.                                                                                                       |
| user\_id                   | STRING  | Unique identifier associated with a known or logged-in user.                                                                                                                 |
| event\_name                | STRING  | Name of the event captured (e.g., page\_view, purchase, click).                                                                                                              |
| journey\_time              | INTEGER | The timestamp of the event or interaction within the journey.                                                                                                                |
| session\_id                | INTEGER | Unique identifier for the user’s session, grouping related events together.                                                                                                  |
| continent                  | STRING  | Geographical dimension related to the user's location. Used for broad geographic segmentation.                                                                               |
| sub\_continent             | STRING  | Geographical dimensions related to the user's location. Used for more granular geographic grouping.                                                                          |
| country                    | STRING  | Geographical dimensions related to the user's location. Used for national-level data segmentation and filtering.                                                             |
| region                     | STRING  | Geographical dimensions related to the user's location. Can represent a state, province, or a collection of postal codes. Used for more specific location-based targeting.   |
| metro                      | STRING  | Geographical dimensions related to the user's location. A metropolitan area, which includes a city and its surrounding suburbs. Used for local-level targeting and analysis. |
| campaign                   | STRING  | An organized effort to promote a product or service. Used to group and track performance data.                                                                               |
| source                     | STRING  | The origin of a user or event. Used to identify where traffic or conversions came from.                                                                                      |
| medium                     | STRING  | The category of the source. Used to classify the type of channel that drove traffic (e.g., CPC, organic, email).                                                             |
| operating\_system          | STRING  | User operating system                                                                                                                                                        |
| operating\_system\_version | STRING  | User operating system version.                                                                                                                                               |
| deviceCategory             | STRING  | Category of the device used (e.g., desktop, mobile, tablet).                                                                                                                 |
| mobile\_brand\_name        | STRING  | Brand name of the mobile device.                                                                                                                                             |
| mobile\_model\_name        | STRING  | Model name of the mobile device.                                                                                                                                             |
| mobile\_marketing\_name    | STRING  | Marketing name of the mobile device.                                                                                                                                         |
| dayofweek                  | INTEGER | Numeric representation of the day of the week (1–7).                                                                                                                         |
| week                       | INTEGER | Week number of the year.                                                                                                                                                     |
| month                      | INTEGER | Month number of the year.                                                                                                                                                    |
| year                       | INTEGER | Four-digit year value.                                                                                                                                                       |
| date                       | DATE    | Date of the event (YYYY-MM-DD).                                                                                                                                              |
| transaction\_id            | STRING  | Unique identifier for the transaction or purchase event.                                                                                                                     |
| item\_category             | STRING  | Primary category of the item purchased or viewed.                                                                                                                            |
| item\_category2            | STRING  | Secondary category of the item.                                                                                                                                              |
| item\_category3            | STRING  | Tertiary category of the item.                                                                                                                                               |
| item\_category4            | STRING  | Fourth-level item category.                                                                                                                                                  |
| item\_category5            | STRING  | Fifth-level item category.                                                                                                                                                   |
| product\_sku               | STRING  | Product SKU or stock keeping unit identifier.                                                                                                                                |
| netsalesquantity           | INTEGER | Quantity of items sold (net of returns).                                                                                                                                     |
| product\_brand             | STRING  | Brand name of the product.                                                                                                                                                   |
| product\_name              | STRING  | Name or title of the product.                                                                                                                                                |
| product\_category          | STRING  | Overall category or type of the product.                                                                                                                                     |
| net\_sales\_amount\_usd    | FLOAT   | Total net sales amount in U.S. dollars.                                                                                                                                      |
| referrer                   | STRING  | Referring URL or source that directed the user.                                                                                                                              |
| gclid                      | STRING  | A Google click ID used by Google Ads.                                                                                                                                        |
| dclid                      | STRING  | A click ID used by DoubleClick.                                                                                                                                              |
| fbclid                     | STRING  | A click ID used by Facebook and Instagram Ads.                                                                                                                               |
| gbraid                     | STRING  | A Google click ID used for iOS campaigns, specifically for app-to-web measurement on browsers that do not support third-party cookies. (Google Brand Referrer ID)            |
| ko\_click\_id              | STRING  | A click ID used by TikTok for its advertising campaigns                                                                                                                      |
| li\_fat\_id                | STRING  | A click ID used by LinkedIn.                                                                                                                                                 |
| twclid                     | STRING  | A click ID used by Twitter (X) for its advertising platform.                                                                                                                 |
| ttclid                     | STRING  | TikTok click identifier used for ad attribution tracking.                                                                                                                    |
| wbraid                     | STRING  | A Google click ID for iOS campaigns, used for web-to-app measurement.                                                                                                        |
| dim1-10                    | STRING  | Client-defined custom 1st-party touch point dimensions                                                                                                                       |
| cookie\_match\_dim1        | STRING  |                                                                                                                                                                              |
| flag1-10                   | STRING  |                                                                                                                                                                              |
| cookie\_match\_flag        | INTEGER |                                                                                                                                                                              |
| key\_interaction\_flag     | INTEGER | A flag indicating if the event was a key interaction.                                                                                                                        |
| key\_interaction\_weight   | INTEGER | A value representing the weight or importance of a key interaction.                                                                                                          |
