Databricks - Setup Guide

Who can use this feature?
- Available with Anywhere: Warehouse, an add-on for Business, Advanced, and Enterprise plans.
- Requires an Admin or Architect role to configure.

Note: Databricks syncs are supported for Databricks workspaces hosted on Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure.

Databricks is a unified, open analytics platform for building, deploying, sharing, and maintaining enterprise-grade data, analytics, and AI solutions at scale. Fullstory customers with Anywhere: Warehouse can sync data to Databricks Unity Catalog with Ready to Analyze Views. This setup guide walks through everything you need to get started.

Setting up Databricks

Fullstory's Warehouse Data Sync for Databricks uses Unity Catalog to manage data governance and a Databricks SQL Warehouse to execute the loading process.

The architecture utilizes Workload Identity Federation, allowing Fullstory to securely write data to a staging area and then load it into your Delta Lake managed tables without the need for long-lived static secrets.

Setting up Account-level resources

  1. Create a Service Principal.
    In your Databricks Account Console, create a new Service Principal. This identity will be used by Fullstory to authenticate and perform sync operations.

  2. Create a Federation Policy for the Service Principal.
    To allow Fullstory to authenticate as your Service Principal, you must create a federation policy using the following configuration:

    • Issuer: https://accounts.google.com
    • Subject:
      • NA customers: 116984388253902328461
      • EU customers: 107589159240321051166
    • Audience: fullstory.com.pgm.c5nprx.cc
  3. Create a Workspace (optional).
    If you wish to use an existing workspace, skip this step. Otherwise, create a new workspace.
    Note down your workspace URL (e.g., https://xxx.x.gcp.databricks.com).
  4. Assign Service Principal to the Workspace.
    Add the Service Principal created in Step 1 as a User of the workspace that you'd like to use.
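As a sketch of what the policy from Step 2 looks like when created through the Databricks Account API or CLI, the request body is roughly the following; the `oidc_policy` and `audiences` field names are assumptions based on the API's general shape, so verify them against the Databricks documentation (the NA subject is shown):

```json
{
  "oidc_policy": {
    "issuer": "https://accounts.google.com",
    "subject": "116984388253902328461",
    "audiences": ["fullstory.com.pgm.c5nprx.cc"]
  }
}
```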

Setting up Workspace-level resources

Navigate to your workspace using the workspace URL and perform the following actions.

  1. Identify or create your SQL Warehouse.
    Select an existing SQL warehouse or create a new one in your workspace. Note down the Warehouse ID.

  2. Identify or create your Catalog.
    Select an existing Catalog or create a new one in your workspace. Note down the Catalog Name.

  3. Configure Unity Catalog Permissions.
    For the Catalog you wish to use, assign the following permissions to your Service Principal:

    • USE CATALOG
    • USE SCHEMA
    • CREATE SCHEMA
    • CREATE TABLE
    • MODIFY
    • SELECT
  4. Set up Storage Credentials.
    Fullstory uses a cloud storage staging layer to load data into Databricks. The setup differs depending on your cloud provider. Expand the section below that matches your Databricks deployment.

    GCP - Set up Storage Credentials

    If your Databricks workspace is hosted on GCP, Fullstory uses Google Cloud Storage (GCS) as the staging layer. To ensure Databricks can read the staged files, a Storage Credential is used.

    1. In Unity Catalog, create a new Storage Credential.
    2. Ensure you select GCP Service Account as the credential type.
    3. Assign the READ FILES permission to your Service Principal.
    4. Note down the Credential Name and the GCP Service Account email associated with it.

    For details on how Fullstory stages data using GCS, see GCP: Direct Read from GCS into Unity Catalog.
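The READ FILES grant from step 3 can also be applied in SQL from the SQL editor; the credential name and principal below are placeholders:

```sql
-- Placeholder names: replace the credential name and the Service
-- Principal's application ID with your own values.
GRANT READ FILES ON STORAGE CREDENTIAL `my_fullstory_credential`
  TO `<service-principal-application-id>`;
```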

    AWS - Set up Storage Credentials

    If your Databricks workspace is hosted on AWS, Fullstory uses S3 as a staging layer. To ensure Databricks can read these files, an S3 bucket and a Storage Credential are used.

    1. In Unity Catalog, create a new Storage Credential.
    2. Ensure you select AWS IAM Role as the credential type.

      1. After the credential is created, add the Trust Policy from Databricks to the IAM Role.
      2. Additionally, add the following Trust Policy for Fullstory to the IAM Role. Use the identifier for your Fullstory region as the `accounts.google.com:aud` value:
        • NA: 116984388253902328461
        • EU: 107589159240321051166
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "Federated": "accounts.google.com"
              },
              "Action": "sts:AssumeRoleWithWebIdentity",
              "Condition": {
                "StringEquals": {
                  "accounts.google.com:aud": "<identifier for your Fullstory region from above>"
                }
              }
            }
          ]
        }
    3. Grant permissions to the IAM Role for your S3 bucket. Create an S3 bucket if one does not already exist. This bucket must not overlap with Unity Catalog managed storage — use a dedicated bucket for Fullstory staging. Then grant the IAM Role access to the bucket following the instructions from Databricks.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Action": [
              "s3:GetObject",
              "s3:PutObject",
              "s3:DeleteObject",
              "s3:ListBucket",
              "s3:GetBucketLocation",
              "s3:ListBucketMultipartUploads",
              "s3:ListMultipartUploadParts",
              "s3:AbortMultipartUpload"
            ],
            "Resource": ["arn:aws:s3:::<BUCKET>/*", "arn:aws:s3:::<BUCKET>"],
            "Effect": "Allow"
          },
          {
            "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*"],
            "Resource": ["arn:aws:kms:<KMS-KEY>"],
            "Effect": "Allow"
          },
          {
            "Action": ["sts:AssumeRole"],
            "Resource": ["arn:aws:iam::<AWS-ACCOUNT-ID>:role/<AWS-IAM-ROLE-NAME>"],
            "Effect": "Allow"
          }
        ]
      }
    4. Assign the READ FILES permission to your Service Principal.
    5. Note down the Credential Name, S3 Bucket, AWS Region, and the IAM Role ARN associated with it.

    For details on how Fullstory stages data using S3, see AWS: Upload to S3, then Read into Unity Catalog.

    Azure - Set up Storage Credentials

    If your Databricks workspace is hosted on Azure, Fullstory uses ADLS Gen2 as a staging layer. To ensure Databricks can read these files, a Storage Credential in Databricks and a Managed Identity in Azure are used.

    1. Create an ADLS Gen2 storage account and container for Fullstory staging data. This may be the same as the Unity Catalog root. If so, use a different prefix for the staging area.
    2. Create an Azure Managed Identity (or App Registration in Entra ID) that will allow Fullstory to write data to ADLS Gen2. Configure it with a federated credential that trusts Fullstory's identity.
      • Issuer: https://accounts.google.com
      • Subject:
        • NA customers: 116984388253902328461
        • EU customers: 107589159240321051166
      • Audience: api://AzureADTokenExchange
    3. Assign the Storage Blob Data Contributor role to the Azure Managed Identity on the storage account or container.
    4. In Unity Catalog, create a new Storage Credential configured to access the ADLS Gen2 container.
    5. Assign the READ FILES permission to your Service Principal created earlier.
    6. Note down the Credential Name, Azure Tenant ID, Azure Client ID (Managed Identity), Storage Account Name, and Container Name.
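One way to create the federated credential described in step 2 is with the Azure CLI; the identity and resource group names below are placeholder assumptions (NA subject shown):

```shell
# Sketch: trust Fullstory's Google-issued tokens on a user-assigned
# managed identity. The name, identity, and resource group values
# are placeholders.
az identity federated-credential create \
  --name fullstory-federation \
  --identity-name fullstory-staging-identity \
  --resource-group my-resource-group \
  --issuer https://accounts.google.com \
  --subject 116984388253902328461 \
  --audiences api://AzureADTokenExchange
```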

    Understanding the two Client IDs (Azure): When using Azure, the configuration requires two separate Client IDs.

    • The Service Principal Client ID (common field) is used to authenticate to the Databricks workspace — running SQL statements and orchestrating the sync.
    • The Azure Client ID (Azure-specific field) is used to write staged data into ADLS Gen2 storage.
    • These can be the same value if your Databricks Service Principal is backed by the same Azure Managed Identity, or different values if you prefer to keep Fullstory's access to your Azure storage separate from the identity behind Databricks' Storage Credential.

    For details on how Fullstory stages data using ADLS Gen2, see Azure: Upload to ADLS Gen2, then Read into Unity Catalog.
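The Unity Catalog permissions from step 3 of this section can also be granted in a single SQL statement; the catalog name and principal below are placeholders:

```sql
-- Grant the Fullstory Service Principal the privileges it needs on
-- the target catalog. Replace the placeholder names with your own.
GRANT USE CATALOG, USE SCHEMA, CREATE SCHEMA, CREATE TABLE, MODIFY, SELECT
  ON CATALOG `my_catalog`
  TO `<service-principal-application-id>`;
```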

Understanding how Staging and Sync Work

Fullstory syncs curated ready-to-analyze views into your Databricks Unity Catalog. The sync process follows different data paths depending on your cloud provider, but the end result is the same: data lands in your Unity Catalog as managed tables.

Sync Process Overview

For each batch of data, Fullstory executes the sync in two main steps:

  1. Stage — Prepare data files in cloud storage and make them accessible to Databricks.
  2. Copy & Merge — Run COPY INTO to load staged data and merge (or insert) data into Unity Catalog.

The key difference between cloud providers is Step 1: how data migrates from Fullstory's internal storage to a location Databricks can read.
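As an illustration, Step 2 is conceptually similar to the following Databricks SQL; the table names, file format, and merge key are hypothetical, and the statements Fullstory actually runs may differ:

```sql
-- 1) Load staged files into a staging table (hypothetical names).
COPY INTO my_catalog.fullstory.events_staging
  FROM 's3://my-fullstory-staging/events/batch-001/'
  FILEFORMAT = PARQUET;

-- 2) Merge the batch into the target managed table.
MERGE INTO my_catalog.fullstory.events AS t
USING my_catalog.fullstory.events_staging AS s
  ON t.event_id = s.event_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```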

GCP: Direct Read from GCS into Unity Catalog

During each sync, Fullstory grants temporary read access to the GCP service account you provided. Databricks then reads directly from gs://; no data is copied to an intermediate location. After each sync, access is revoked.

AWS: Upload to S3, then Read into Unity Catalog

Fullstory uploads data files to your S3 bucket by assuming your IAM role via OIDC federation. Databricks then reads staged files from s3://.

Your S3 bucket must not overlap with Unity Catalog managed storage. Overlap will cause the INVALID_PARAMETER_VALUE.LOCATION_OVERLAP error from Databricks during sync.

Staging data files TTL recommendation: Staged files in your S3 bucket are not cleaned up automatically by Fullstory. Configure an S3 lifecycle policy with a TTL (for example 7 days) to automatically delete stale files.
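One way to apply such a TTL is an S3 lifecycle rule via the AWS CLI; the bucket name below is a placeholder:

```shell
# Sketch: expire staged objects after 7 days. The bucket name is a
# placeholder; scope the Filter to your staging prefix if the bucket
# is shared with other workloads.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-fullstory-staging \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-fullstory-staging",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 7}
    }]
  }'
```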

Azure: Upload to ADLS Gen2, then Read into Unity Catalog

Fullstory uploads data files to your ADLS Gen2 container using the Azure Managed Identity. Files are written under abfss://<container>@<storageaccount>.dfs.core.windows.net/. Databricks reads the staged files using the Storage Credential.

Staging data files TTL recommendation: Configure a lifecycle management policy with a TTL (for example 7 days) on the staging path to automatically delete stale files.

Summary

| Aspect | GCP | AWS | Azure |
| --- | --- | --- | --- |
| Staging location | Fullstory's GCS (no copy) | Customer's S3 bucket | Customer's ADLS Gen2 container |
| How Fullstory stages data | Grants temporary ACL on GCS objects | Uploads files from GCS to S3 | Uploads files from GCS to ADLS Gen2 |
| Databricks reads from | gs:// (Fullstory's bucket) | s3:// (your bucket) | abfss:// (your container) |
| Cleanup after sync | ACL revoked | Staging table dropped | Staging table dropped |
| Customer manages staging storage? | No | Yes (S3 bucket) | Yes (ADLS Gen2 container) |
| Recommended staging TTL | N/A | 7 days | 7 days |

Setting up Databricks in Fullstory

  1. Log in to Fullstory and navigate to Settings > Anywhere > Warehouse.
  2. Select Databricks from the list of available destinations.
  3. Use the Choose your cloud provider selector to pick your provider (GCP, AWS, or Azure), then fill in the following fields using the information noted from the above steps:

    | GCP | AWS | Azure |
    | --- | --- | --- |
    | Workspace URL | Workspace URL | Workspace URL |
    | Service Principal Client ID | Service Principal Client ID | Service Principal Client ID |
    | Warehouse ID | Warehouse ID | Warehouse ID |
    | Catalog Name | Catalog Name | Catalog Name |
    | Storage Credential Name | Storage Credential Name | Storage Credential Name |
    | GCP Service Account Email | S3 Bucket | Azure Tenant ID |
    |  | IAM Role Amazon Resource Name (ARN) | Azure Client ID |
    |  | Region | Azure Storage Account Name |
    |  |  | Azure Container Name |
    |  |  | Azure Storage Path Prefix (optional) |

  4. Click Save.

Frequently Asked Questions

Does Fullstory use Databricks Delta Share?

No, Fullstory uses a traditional warehouse connection to sync data to Databricks, not Delta Share. With this approach:

  • You own your data in your Databricks warehouse.
  • You have full control over data retention and governance.
  • You receive the same Ready to Analyze Views available for Snowflake, Redshift, and BigQuery.

What data does Fullstory sync to Databricks?

Fullstory syncs data using Ready to Analyze Views, which provides pre-built views optimized for analytics and business intelligence. This is the same data format used for Snowflake, Redshift, and BigQuery connections. For complete details about the data model, sample queries, and sync expectations, see the Ready to Analyze Views developer documentation.

