Composable Analytics for React: Mixing ClickHouse, Snowflake, and Client-Side Instrumentation
Composable analytics pairs privacy-first client telemetry with ClickHouse for rapid frontend insights and Snowflake for long-term governance.
Why frontend teams need composable analytics in 2026
Frontend teams are forced to answer the same three questions faster than ever: Which UI changes improve conversion? Where do users get stuck? And can we trust the data without violating privacy rules? The modern answer is a composable analytics architecture that pairs lightweight, privacy-aware client telemetry with fast OLAP engines like ClickHouse for rapid iteration, while pushing long-term storage, heavy joins, and governance workloads to systems like Snowflake.
Why this approach matters in 2026
Late 2025 and early 2026 accelerated two trends that matter to frontend analytics teams. First, OLAP databases are attracting heavy investment and shipping fast: ClickHouse's continued growth (including multi-hundred-million-dollar funding rounds in 2025–2026) reflects how strongly teams favor low-latency analytical stores for event-driven workloads. Second, privacy regulation and user expectations have hardened: cookie deprecation, stricter consent UX, and enterprise compliance policies make naive client telemetry untenable.
The result: teams need to collect the smallest possible telemetry footprint on the client, do meaningful enrichment server-side, and use a split backend where ClickHouse serves the fast, iterative queries and Snowflake acts as the canonical, governed data lake/warehouse.
High-level architecture: Composable pipeline
Here is the recommended, pragmatic architecture for frontend teams who want fast insights without privacy trade-offs:
- Client instrumentation (privacy-first): minimal events, hashing/anonymization, consent checks, sampling.
- Lightweight ingestion API: validates and enriches events, strips PII, applies deterministic sampling, forwards to streaming layer.
- Streaming layer: Kafka / Pulsar / managed alternatives (Kinesis, Pub/Sub) with topic partitions for event types and per-tenant isolation.
- Fast OLAP (ClickHouse): real-time or near-real-time ingestion for interactive analysis and dashboards used by frontend/product teams.
- Data lake / Warehouse (Snowflake): batched ETL or CDC for long-term retention, compliance, heavy transformations and BI.
- ETL / Transformation: dbt for Snowflake models; lightweight materialized views / aggregated tables in ClickHouse for high-cardinality, low-latency queries.
- Retention & Governance: TTLs and partitioning in ClickHouse for cost control, Snowflake time-travel and access policies for compliance.
Why split ClickHouse + Snowflake?
Use each system for what it does best:
- ClickHouse: millisecond to sub-second query times for high-cardinality event data, excellent columnar compression and aggregation throughput—ideal for iterative funnel analysis, sessionization, and A/B quick-checks.
- Snowflake: governance, complex joins with canonical customer or billing tables, long-term archiving, and integration with existing enterprise BI stacks.
Client-side instrumentation: minimal, privacy-aware, durable
Think of the browser SDK as a strict gatekeeper: it collects what you need and nothing more. The goal is to minimize the sensitive data that leaves the device while keeping events useful once they are enriched server-side.
Principles to follow
- Collect only necessary attributes: page, route, UI component id, interaction type, result (success/failure), non-identifying metadata.
- Hash or bucketize identifiers: user ID, email, or device ID should be hashed with a salt and optionally bucketed to lower cardinality.
- Consent-first: respect consent states and provide a lightweight consent SDK that toggles what the client sends.
- Local buffering & reliable delivery: batch events, use navigator.sendBeacon or fetch keepalive, and fall back to indexedDB for offline storage.
- Sampling & rate limits: implement deterministic sampling to control volume while preserving statistical validity for experiments.
Example: a compact React telemetry hook (TypeScript)
The following hook sends minimized events with hashing and batching. It is intentionally small so you can extend it for consent, sampling or privacy requirements.
import { useEffect, useRef } from 'react'

type TelemetryEvent = {
  type: string
  ts: number
  route?: string
  component?: string
  result?: string
  userHash?: string // pre-hashed on client or server
}

function hashId(id: string, salt = 'static-salt') {
  // lightweight non-cryptographic hash for bucketing; replace with crypto.subtle in production
  let h = 2166136261
  for (let i = 0; i < id.length; i++) {
    h ^= id.charCodeAt(i) + salt.charCodeAt(i % salt.length)
    h += (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h >> 24)
  }
  return String(Math.abs(h))
}

export function useTelemetry() {
  const queue = useRef<TelemetryEvent[]>([])
  const timer = useRef<number | null>(null)

  function sendBatch() {
    if (!queue.current.length) return
    const payload = JSON.stringify({ events: queue.current })
    queue.current = []
    // prefer sendBeacon so the batch survives navigation; send JSON so the server can parse it
    if (navigator.sendBeacon) {
      const ok = navigator.sendBeacon('/ingest', new Blob([payload], { type: 'application/json' }))
      if (ok) return
    }
    fetch('/ingest', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: payload,
      keepalive: true
    }).catch(() => { /* swallow delivery errors */ })
  }

  function track(e: Omit<TelemetryEvent, 'ts' | 'userHash'> & { userId?: string }) {
    // respect client-side sampling here if configured
    // never queue the raw userId: hash it and drop the original
    const { userId, ...rest } = e
    const event: TelemetryEvent = {
      ...rest,
      ts: Date.now(),
      userHash: userId ? hashId(userId) : undefined
    }
    queue.current.push(event)
    if (!timer.current) {
      timer.current = window.setTimeout(() => { sendBatch(); timer.current = null }, 2000)
    }
    if (queue.current.length >= 20) sendBatch()
  }

  useEffect(() => {
    // flush pending events when the page is hidden or the component unmounts
    const flush = () => { try { sendBatch() } catch { /* ignore */ } }
    window.addEventListener('pagehide', flush)
    return () => { window.removeEventListener('pagehide', flush); flush() }
  }, [])

  return { track }
}
Ingestion API: validation, enrichment, PII scrubbing
The ingestion API is the right place to perform deterministic enrichment and strict PII removal. Keep the API simple and fast—use Node, Go, or Rust microservices—and forward events to a streaming layer.
Server-side responsibilities
- Validate schema: reject malformed events to protect downstream query performance.
- Remove PII: drop emails, phone numbers, exact addresses; keep hashed or bucketed IDs.
- Enrich: add geo (country only), device bucket, feature flags, or experiment metadata based on deterministic hashing.
- Forward to streaming: write to topics used by ClickHouse and Snowflake ingestion pipelines.
// Express ingestion endpoint (Node + kafkajs); validate, removePII, and countryFromIP
// are application-specific helpers assumed to exist elsewhere
import express from 'express'
import { Kafka } from 'kafkajs'

const app = express()
app.use(express.json())
const kafka = new Kafka({ brokers: ['kafka:9092'] })
const producer = kafka.producer() // await producer.connect() during startup

app.post('/ingest', async (req, res) => {
  const body = req.body
  if (!validate(body)) return res.status(400).send('bad event') // schema check protects downstream queries
  const cleaned = removePII(body) // drop emails, phone numbers, raw identifiers
  const enriched = { ...cleaned, country: countryFromIP(req.ip), env: process.env.NODE_ENV }
  await producer.send({ topic: 'events', messages: [{ value: JSON.stringify(enriched) }] })
  res.status(202).send('ok')
})
Streaming & ingestion into ClickHouse
ClickHouse offers native ingestion patterns suitable for high throughput: Kafka engine, HTTP inserts, and cloud native ingestion. The typical pattern is to write events to Kafka (or a managed equivalent) and use ClickHouse materialized views with the Kafka engine (or a streaming consumer) to populate mergeable tables for near-real-time queries.
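As a concrete sketch of that pattern, the DDL below consumes JSON rows from a Kafka topic and streams them into the events MergeTree table defined in the next section. The broker address, topic, and consumer group are illustrative, and it assumes the ingestion service emits fields that already match the warehouse column names (event_type, user_hash, and a DateTime-compatible ts).

-- Kafka source table: ClickHouse pulls JSON rows from the 'events' topic (names are illustrative)
CREATE TABLE events_kafka (
  ts DateTime64(3),
  event_type String,
  route String,
  component String,
  user_hash String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse-events',
         kafka_format = 'JSONEachRow';

-- Materialized view copies rows into the MergeTree table as they arrive
CREATE MATERIALIZED VIEW events_consumer_mv TO events AS
SELECT ts, event_type, route, component, user_hash
FROM events_kafka;

The Kafka engine table tracks its own consumer-group offsets, so the materialized view keeps the MergeTree table near real time without running a separate consumer process.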
Schema design for ClickHouse
Design event tables for append-heavy workloads with appropriate partitioning and TTLs:
CREATE TABLE events (
event_date Date DEFAULT toDate(ts),
ts DateTime64(3),
event_type String,
route String,
component String,
user_hash String,
attrs Nested (k String, v String)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_type, user_hash, ts)
TTL ts + INTERVAL 30 DAY
SETTINGS index_granularity = 8192
;
This example sets a 30-day TTL in ClickHouse so raw events older than 30 days are removed automatically—excellent for cost control and privacy.
Fast interactive queries
ClickHouse shines at aggregations and funnels. For example, a simple funnel: page_view -> checkout_start -> purchase can be computed in seconds for recent data.
SELECT
  toStartOfInterval(ts, INTERVAL 1 HOUR) AS hour,
  countIf(event_type = 'page_view') AS page_views,
  countIf(event_type = 'checkout_start') AS checkout_starts,
  countIf(event_type = 'purchase') AS purchases
FROM events
WHERE ts > now() - INTERVAL 7 DAY
GROUP BY hour
ORDER BY hour
ETL and moving data to Snowflake
ClickHouse is excellent for quick iteration, but Snowflake is often the canonical source for cross-functional analysis, governed reporting, and data sharing. Use a periodic ETL to push raw or aggregated data from the streaming layer (or ClickHouse) into Snowflake.
Two patterns to move data
- Streaming to Snowflake with Snowpipe: stream events into a cloud storage bucket (S3/GCS) and use Snowpipe for near-real-time ingestion. This keeps administrative overhead low (a minimal setup sketch follows the dbt example below).
- Batch ETL with dbt: run scheduled jobs to transform raw events into canonical models for business reporting. dbt models run best on Snowflake for heavy joins and historical modeling.
Example: a dbt-style model that aggregates hourly metrics in Snowflake:
-- dbt-style model (Snowflake)
WITH hourly AS (
SELECT
DATE_TRUNC('HOUR', ts) AS hour,
event_type,
COUNT(*) AS cnt
FROM raw_events
WHERE ts > DATEADD(day, -90, CURRENT_TIMESTAMP())
GROUP BY hour, event_type
)
SELECT * FROM hourly;
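For the Snowpipe pattern described above, a minimal setup might look like the following sketch. The bucket path, file format, pipe, and raw_events table names are illustrative, and the storage integration and credentials are omitted.

-- Landing zone for event files written by the export job (bucket path is illustrative)
CREATE FILE FORMAT IF NOT EXISTS events_json_format TYPE = JSON;

CREATE STAGE IF NOT EXISTS events_stage
  URL = 's3://my-analytics-bucket/events/'
  FILE_FORMAT = events_json_format;  -- storage integration / credentials omitted

-- Snowpipe loads new files into raw_events as they land in the bucket
CREATE PIPE IF NOT EXISTS events_pipe AUTO_INGEST = TRUE AS
  COPY INTO raw_events
  FROM @events_stage
  FILE_FORMAT = (FORMAT_NAME = 'events_json_format')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;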
Retention, cost control, and privacy guarantees
Managing retention and cost is where many teams fail. Here are practical rules that scale:
- Hot vs. warm vs. cold: keep 7–30 days of raw events in ClickHouse (hot), 90–365 days of aggregates in ClickHouse or Snowflake (warm), and archive raw events beyond that in object storage (cold).
- TTL & partitioning: enforce TTLs in ClickHouse and partition by month to make deletions and queries efficient.
- Downsample: after 30 days, downsample or roll up to hourly or daily aggregates to save space while preserving analytic value (see the rollup sketch below).
- Deterministic hashing + ephemeral IDs: avoid storing permanent identifiers in hot tables. Use salted hashes that can be rotated if necessary.
- Audit & lineage: use a data catalog and dbt lineage, so every dashboard can be traced back to the transformation that generated it.
Retention example in ClickHouse
ALTER TABLE events
MODIFY TTL
  ts + INTERVAL 30 DAY TO VOLUME 'cold',  -- move raw rows to cheaper storage after 30 days
  ts + INTERVAL 90 DAY DELETE;            -- drop them entirely after 90 days
Use storage policies to move older partitions to cheaper volumes or object storage if your ClickHouse deployment supports it.
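One way to implement the downsampling rule from the retention list is a rollup table fed by a materialized view, so hourly aggregates stay queryable long after the raw rows expire. Table and view names below are illustrative.

-- Hourly rollup kept for a year; raw events can expire after 30 days
CREATE TABLE events_hourly (
  hour DateTime,
  event_type String,
  cnt UInt64
) ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (event_type, hour)
TTL hour + INTERVAL 365 DAY;

-- Populated automatically on every insert into events
CREATE MATERIALIZED VIEW events_hourly_mv TO events_hourly AS
SELECT toStartOfHour(ts) AS hour, event_type, count() AS cnt
FROM events
GROUP BY hour, event_type;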
Operational concerns & observability
To make this system reliable for frontend teams:
- Monitor ingestion lag: track the difference between event time and ingestion time to detect pipeline slowdowns (see the query sketch after this list).
- Track sampling ratios: expose the sampling rate used by the client in metadata so engineers can reconstruct statistical significance.
- Test end-to-end: run canary tests that send synthetic events and assert they appear in ClickHouse and Snowflake.
- Backpressure & throttling: implement per-tenant rate limiting at the API layer to prevent noisy events from overwhelming the cluster.
Security, compliance & privacy tech trends in 2026
As of 2026, expect these mature practices to be table stakes:
- Privacy-preserving analytics: deterministic hashing, k-anonymity, and privacy-aware sampling applied early in the pipeline are widely adopted across product teams.
- Differential privacy (DP): larger orgs are experimenting with DP for aggregated metrics shared externally; small teams apply simple noise injection to public dashboards.
- Consent & signal orchestration: consent frameworks are integrated into SDKs, and consent states are treated as first-class attributes to filter event flows.
- Data catalog + governance: dbt lineage, Snowflake access controls, and audit logs are required for product analytics teams that collaborate with legal and security.
Practical privacy: it’s better to collect slightly less now and have reliable, analyzable data than to have more data you can’t use because of policy or cost.
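On the Snowflake side, column-level controls can keep even hashed identifiers restricted to the teams that need them. A minimal sketch, assuming a raw_events table and an ANALYTICS_ADMIN role (both illustrative):

-- Only the analytics admin role sees the hashed identifier; everyone else gets a masked value
CREATE MASKING POLICY user_hash_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() = 'ANALYTICS_ADMIN' THEN val ELSE '***MASKED***' END;

ALTER TABLE raw_events MODIFY COLUMN user_hash SET MASKING POLICY user_hash_mask;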
Example: an end-to-end mini-pipeline (concrete steps)
Here’s a concrete implementation plan you can follow in sprints:
- Instrument a small number of critical events with the useTelemetry hook. Keep payloads tiny and hash identifiers.
- Deploy a single ingestion microservice that validates and forwards to a managed Kafka or equivalent topic.
- Connect Kafka to ClickHouse using the Kafka engine + materialized views for near-real-time analytics.
- Set ClickHouse TTLs to 30 days and create hourly aggregated tables kept for 90 days.
- Run a nightly job that copies raw or aggregated files to S3 and triggers Snowpipe for Snowflake ingestion.
- Model canonical tables in dbt on Snowflake for long-term retention and cross-team reporting.
- Build lightweight dashboards connected to ClickHouse for rapid product experiments, and Snowflake for executive reporting.
Queries & patterns frontend teams will find most valuable
Frontend teams need quick access to these query patterns in ClickHouse (sketches for the variant funnel and latency percentiles follow the list):
- Funnel conversion per variant (hourly): group by experiment variant and compute stage conversions.
- Sessionization: group events into sessions with window functions (for example, gap-based session splitting) or custom stateful approaches.
- Drop-offs by component: aggregate by component id and error type to find hotspots for bug-fixing sprints.
- Latency percentiles: compute p50/p95 response times for critical flows using ClickHouse's quantile functions.
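Here are sketches for two of these patterns. Both assume columns that are not in the minimal schema shown earlier (a variant experiment column and a duration_ms measurement), so treat them as templates rather than drop-in queries.

-- Funnel conversion per experiment variant, using windowFunnel over a 1-hour window
SELECT
  variant,
  countIf(steps >= 1) AS reached_page_view,
  countIf(steps >= 2) AS reached_checkout,
  countIf(steps >= 3) AS reached_purchase
FROM (
  SELECT
    user_hash,
    any(variant) AS variant,
    windowFunnel(3600)(toDateTime(ts),
      event_type = 'page_view',
      event_type = 'checkout_start',
      event_type = 'purchase') AS steps
  FROM events
  WHERE ts > now() - INTERVAL 7 DAY
  GROUP BY user_hash
)
GROUP BY variant;

-- Latency percentiles per route for a critical flow
SELECT
  route,
  quantiles(0.5, 0.95)(duration_ms) AS p50_p95
FROM events
WHERE event_type = 'api_call' AND ts > now() - INTERVAL 1 DAY
GROUP BY route
ORDER BY route;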
Costs, scaling, and when to move fully to Snowflake
Start with ClickHouse for fast iteration. If your organization needs centralized governance, complex joins with data from many sources, or you want managed elasticity, Snowflake may eventually host more of your workloads. However, keep the fast path: even teams who centralize on Snowflake often retain a ClickHouse cluster for ultra-fast exploration and experimentation.
Real-world case study (anonymized)
At a SaaS company we worked with in 2025–2026, product teams were waiting hours for Snowflake queries to complete. After adding ClickHouse to the stack and migrating a small set of events into a hot ClickHouse tier with 14-day TTL, the team reduced iteration time for A/B checks from hours to minutes. They retained Snowflake for monthly financial reconciliation and complex cohort joins. Privacy was improved by hashing user IDs in the client and only joining back to PII in a separate Snowflake-only process with strict access controls.
Actionable checklist to get started this week
- Pick 5 business-critical events and add the telemetry hook to record them.
- Implement client-side hashing for identifiers and consent checks.
- Stand up a simple ingestion endpoint that forwards to a managed Kafka or equivalent topic.
- Create a ClickHouse table with a 30-day TTL and ingest some test data.
- Build a small dashboard (Metabase, Apache Superset, or internal UI) pointing at ClickHouse for immediate feedback.
Summary: Why this composition wins for frontend teams
Composable analytics gives frontend teams the best of both worlds in 2026: fast, iterative insights from ClickHouse to move quickly on UX and feature work, and governed, long-term analysis in Snowflake for compliance and cross-team alignment. When you design telemetry with privacy-first defaults—minimal client payloads, hashing, sampling, and TTLs—you reduce cost and legal risk while increasing trust in your metrics.
Call to action
Ready to try a starter kit? Clone a minimal repository that wires up the useTelemetry hook, a small Node ingestion service, and a ClickHouse table with TTL and sample queries. Start with the five-event checklist today and ship dashboards that empower product teams to iterate faster and safer. If you want, share your telemetry schema and I'll review it with retention and privacy recommendations tailored to your app.