React + ClickHouse: Building a Real-Time Product Analytics Panel


reacts
2026-01-30 12:00:00
11 min read

Wire a React frontend to ClickHouse for near-real-time funnels and retention: schema tips, materialized views, Node API, and React streaming patterns.

Stop waiting minutes for dashboard answers: with React + ClickHouse you can get near-real-time funnels and retention.

If you own a product analytics dashboard, you’ve felt the pain: expensive, slow queries, brittle joins, and stale funnels that don’t reflect the last few minutes of user activity. In 2026 the expectation is real-time insights — not hourly batches. This guide shows a practical, battle-tested path to wire a React frontend to ClickHouse for near-real-time funnel and retention reports, with concrete schema patterns, efficient ClickHouse query patterns, and a compact Node.js API you can deploy today.

Why ClickHouse for near-real-time product analytics in 2026?

Columnar OLAP engines like ClickHouse became mainstream for product analytics because they balance fast analytical queries with high ingestion rates. ClickHouse’s momentum accelerated through 2025 and into 2026 — driven by big funding rounds and wider adoption for event analytics, so you’ll be building on a well-supported platform. The practical upside for engineering teams is:

  • Fast aggregations: sub-second group-by on millions of rows when your schema and partitions are correct.
  • Streaming ingestion: Kafka / HTTP / Buffer pipelines that reach MergeTree tables quickly.
  • Materialized views and pre-aggregations: let dashboards query compact, precomputed tables instead of raw event scans.
ClickHouse’s strong market traction through late 2025 and early 2026 makes it a sensible choice for product teams that need both scale and speed.

Overview: The architecture we’ll build

High-level flow — most important decisions first:

  1. Event ingestion: HTTP or Kafka -> transient staging (Kafka/Buffer engine).
  2. Materialized view(s): stream into MergeTree tables and pre-aggregate funnels/cohorts.
  3. Server API: Node.js service runs parameterized ClickHouse queries / reads pre-aggregations and exposes JSON.
  4. React frontend: fetch via SSE or short-polling / React Query to render real-time charts and tables.

1) Schema design: event table patterns for speed and flexibility

Start with an event-level table that keeps raw events for a limited window and a set of pre-aggregations for dashboards. Key rules:

  • Partition by date (toDate(event_time) or toYYYYMMDD): enables partition pruning.
  • Primary key order should include user_id and event_time (ORDER BY (user_id, event_time)).
  • Use appropriate types: LowCardinality(String) for event_type and other enums, DateTime64 for sub-second times, UInt64 for IDs.
  • TTL for raw events if you only need X days of history in raw form.

Example raw events table

CREATE TABLE analytics.events_raw (
  event_time DateTime64(6),
  user_id UInt64,
  event_type LowCardinality(String),
  properties String, -- JSON blob (use JSONExtract in queries)
  platform LowCardinality(String)
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(event_time)
ORDER BY (user_id, event_time)
TTL event_time + toIntervalDay(90);

Why this works: ordering by user_id + event_time speeds per-user sequence queries (funnels) and cohort calculations. Keeping properties as a JSON blob avoids schema churn; extract needed fields at query time or during materialized view processing.
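
For example, a query-time extraction might look like the following sketch (assuming events carry a `plan` key in `properties`; adjust the key to your payload):

SELECT
  event_type,
  JSONExtractString(properties, 'plan') AS plan, -- returns '' when the key is missing
  count() AS events
FROM analytics.events_raw
WHERE event_time >= now() - INTERVAL 1 DAY
GROUP BY event_type, plan
ORDER BY events DESC;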

2) Ingestion patterns for near-real-time

There are two practical ingestion pipelines for near-real-time dashboards:

  • HTTP batch inserts (good for simple setups): push small batches every few seconds from your services or edge collectors to ClickHouse HTTP insert endpoints (see the sketch after this list).
  • Kafka engine (recommended for scale and durability): write events to Kafka and let ClickHouse consume using a Kafka table + materialized view to MergeTree.
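
Here's a minimal sketch of the HTTP-batch approach using the same @clickhouse/client library the API section relies on; the in-memory buffer, flush interval, and lack of error handling are illustrative assumptions, not a production collector:

import { createClient } from '@clickhouse/client';

const ch = createClient({ host: process.env.CLICKHOUSE_URL });
const buffer = [];

// Collect events in memory...
export function track(event) {
  buffer.push(event); // { event_time, user_id, event_type, properties, platform }
}

// ...and flush a small batch every few seconds
setInterval(async () => {
  if (buffer.length === 0) return;
  const batch = buffer.splice(0, buffer.length);
  await ch.insert({
    table: 'analytics.events_raw',
    values: batch,
    format: 'JSONEachRow',
  });
}, 5000);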

Kafka engine example (durable, low-latency)

CREATE TABLE events_kafka (
  event_time DateTime64(6),
  user_id UInt64,
  event_type String,
  properties String,
  platform String
) ENGINE = Kafka('kafka:9092', 'events-topic', 'group1', 'JSONEachRow');

-- Stream consumed rows into the MergeTree table defined earlier
CREATE MATERIALIZED VIEW mv_events_kafka TO analytics.events_raw AS
SELECT event_time, user_id, event_type, properties, platform
FROM events_kafka;

The materialized view is where you transform and validate events as they flow into events_raw; dashboard queries then read the MergeTree table (or, better, the pre-aggregations below) rather than the Kafka table.

3) Pre-aggregation: materialized views and AggregatingMergeTree

To serve sub-second dashboards, don’t query raw events for every request. Pre-aggregate common metrics.

  • Daily per-user events for cohort/retention precomputation.
  • Step counts for funnels: count users who completed each step within a sliding window.

Example: daily_user_events (compact base for cohort calculations)

CREATE TABLE daily_user_events (
  day Date,
  user_id UInt64,
  events AggregateFunction(groupArray, String)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMMDD(day)
ORDER BY (day, user_id);

CREATE MATERIALIZED VIEW mv_daily_user_events TO daily_user_events AS
SELECT
  toDate(event_time) AS day,
  user_id,
  groupArrayState(toString(event_type)) AS events
FROM events_raw
GROUP BY day, user_id;

With daily_user_events we can compute cohort retention without scanning the full event stream for every query.
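
Because the table stores groupArray aggregate states, read them back with the -Merge combinator when you actually need the per-user arrays; a minimal sketch:

SELECT
  day,
  user_id,
  groupArrayMerge(events) AS events
FROM daily_user_events
WHERE day >= today() - 7
GROUP BY day, user_id;

The retention query below only needs day and user_id, so it never has to merge the arrays at all.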

4) Efficient funnel queries in ClickHouse

Funnels ask: how many users move from step A → B → C within N hours/days? The performant pattern is:

  1. Aggregate per-user minimal timestamps per step (minIf).
  2. Filter by time windows and ordering (ensure t2 >= t1 etc.).
  3. Count users using uniqExact or uniqCombined for larger scale.

Funnel SQL example (3 steps)

WITH per_user AS (
  SELECT
    user_id,
    minIf(event_time, event_type = 'view_product') AS t_view,
    minIf(event_time, event_type = 'add_to_cart') AS t_add,
    minIf(event_time, event_type = 'purchase') AS t_purchase
  FROM events_raw
  WHERE event_time >= now() - INTERVAL 30 DAY
  GROUP BY user_id
)
SELECT
  -- minIf returns the epoch (1970-01-01) when no row matches, so compare against zero rather than NULL
  uniqExactIf(user_id, t_view > toDateTime64(0, 6)) AS viewed,
  uniqExactIf(user_id, t_view > toDateTime64(0, 6) AND t_add >= t_view) AS added,
  uniqExactIf(user_id, t_view > toDateTime64(0, 6) AND t_add >= t_view AND t_purchase >= t_add) AS purchased
FROM per_user;

Notes: uniqExact is accurate but memory-intensive. For very high cardinality, use uniqCombined for probabilistic counting. Materialize per_user into a daily table if this query is frequent.
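
ClickHouse also ships a built-in windowFunnel aggregate that enforces step order inside a sliding window; it is worth benchmarking against the minIf pattern, keeping in mind the semantics differ slightly (strict step chain within the window vs. independent per-step minima). A hedged sketch:

SELECT
  countIf(level >= 1) AS viewed,
  countIf(level >= 2) AS added,
  countIf(level >= 3) AS purchased
FROM (
  SELECT
    user_id,
    -- 86400s = 24h window; cast DateTime64 to DateTime so the window is measured in seconds
    windowFunnel(86400)(
      toDateTime(event_time),
      event_type = 'view_product',
      event_type = 'add_to_cart',
      event_type = 'purchase'
    ) AS level
  FROM events_raw
  WHERE event_time >= now() - INTERVAL 30 DAY
  GROUP BY user_id
);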

5) Efficient retention queries (cohort analysis)

Cohort retention is computed by cohort date (first event) and then counting distinct users who returned on subsequent days. Efficient approach:

  1. Compute cohorts once (daily) and write to a small cohorts table.
  2. Join cohorts to daily_user_events and group by day difference.

Retention SQL example

-- Build cohorts (run daily via MV or batch)
CREATE TABLE user_cohorts (
  user_id UInt64,
  cohort_date Date
) ENGINE = MergeTree() PARTITION BY toYYYYMMDD(cohort_date) ORDER BY (cohort_date, user_id);

-- Populate cohorts (simplified)
INSERT INTO user_cohorts
SELECT user_id, toDate(min(event_time))
FROM events_raw
WHERE event_time >= now() - INTERVAL 90 DAY
GROUP BY user_id;

-- Retention query
SELECT
  c.cohort_date,
  dateDiff('day', c.cohort_date, d.day) AS day_after_cohort,
  uniqExact(d.user_id) AS retained_users
FROM user_cohorts AS c
INNER JOIN daily_user_events AS d ON c.user_id = d.user_id
WHERE d.day BETWEEN c.cohort_date AND addDays(c.cohort_date, 30)
GROUP BY c.cohort_date, day_after_cohort
ORDER BY c.cohort_date, day_after_cohort;

By precomputing user_cohorts and using the narrow daily_user_events table, the query works on a much smaller dataset and is fast enough for interactive dashboards.

6) Node.js API: safe, parameterized queries to ClickHouse

Never hit ClickHouse directly from the browser. Build a small API that validates parameters and returns JSON. Use the official ClickHouse client for Node.js (@clickhouse/client) or a maintained driver.

Example Node/Express endpoint (simplified)

import express from 'express';
import { createClient } from '@clickhouse/client';

const ch = createClient({
  host: process.env.CLICKHOUSE_URL,
  username: process.env.CLICKHOUSE_USER,
  password: process.env.CLICKHOUSE_PASS,
});

const app = express();
app.use(express.json());

app.post('/api/funnel', async (req, res) => {
  const { start, end, steps } = req.body;
  // Basic validation: steps must be a small array of event-type strings
  if (!Array.isArray(steps) || steps.length === 0 || steps.length > 10 || !steps.every((s) => typeof s === 'string')) {
    return res.status(400).json({ error: 'steps must be an array of event type strings' });
  }

  // Build a parametrized query (typed placeholders, never string-concatenated values)
  const sql = `WITH per_user AS (
    SELECT user_id, ${steps.map((s, i) => `minIf(event_time, event_type = {step${i}:String}) AS t${i}`).join(',')}
    FROM events_raw
    WHERE event_time BETWEEN {start:DateTime64(6)} AND {end:DateTime64(6)}
    GROUP BY user_id
  ) SELECT ${steps.map((s, i) => `uniqExactIf(user_id, t${i} > toDateTime64(0, 6)) AS step_${i}`).join(',')} FROM per_user`;

  const query_params = steps.reduce((p, s, i) => ({ ...p, [`step${i}`]: s }), { start, end });

  const resultSet = await ch.query({ query: sql, format: 'JSONEachRow', query_params });
  const rows = await resultSet.json();
  res.json(rows[0]);
});

app.listen(8080);

Use parameterized queries instead of concatenating user input into SQL: this avoids injection and lets ClickHouse handle parameter typing. Also harden your API paths with timeouts, retries, and sensible error responses so one slow query doesn't cascade into an incident.
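
For reference, a client call against this endpoint (from inside an async function or a quick test script) might look like the sketch below; the dates and step names are illustrative and should match your own event types:

const res = await fetch('/api/funnel', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    start: '2026-01-01 00:00:00',
    end: '2026-01-30 00:00:00',
    steps: ['view_product', 'add_to_cart', 'purchase'],
  }),
});
const funnel = await res.json(); // { step_0, step_1, step_2 }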

7) React: fetching and streaming patterns for near-real-time dashboards

Two common client-side patterns:

  • Server-Sent Events (SSE): push incremental updates for high-frequency small updates (e.g., latest counts every second).
  • Short polling with smart diffs: React Query or SWR polling every 3–10 seconds; simpler infrastructure to run and easier to scale (see the polling sketch after this list).
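
A polling hook for the funnel endpoint might look like this sketch; it assumes @tanstack/react-query and the /api/funnel endpoint from the previous section:

import { useQuery } from '@tanstack/react-query';

export function useFunnel(start, end, steps) {
  return useQuery({
    queryKey: ['funnel', start, end, steps],
    queryFn: async () => {
      const res = await fetch('/api/funnel', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ start, end, steps }),
      });
      if (!res.ok) throw new Error(`Funnel request failed: ${res.status}`);
      return res.json();
    },
    // Poll every 5s; by default React Query pauses polling while the tab is backgrounded
    refetchInterval: 5000,
  });
}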

Minimal React SSE hook

import { useEffect, useState } from 'react';

export function useSSE(url) {
  const [data, setData] = useState(null);
  useEffect(() => {
    const es = new EventSource(url);
    es.onmessage = (e) => setData(JSON.parse(e.data));
    // Closing on error disables EventSource's automatic reconnection;
    // add a backoff/retry for production dashboards.
    es.onerror = () => es.close();
    return () => es.close();
  }, [url]);
  return data;
}

Server-side: stream relevant JSON chunks from your Node API to keep the React chart in sync.
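
On the server, the matching endpoint can be a plain Express route that keeps the connection open and writes `data:` frames. The sketch below reuses the `ch` client from the API example and polls a simple count, which you would swap for your own pre-aggregated query:

app.get('/api/funnel/stream', (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.flushHeaders();

  const timer = setInterval(async () => {
    try {
      const result = await ch.query({
        query: 'SELECT count() AS events FROM events_raw WHERE event_time >= now() - INTERVAL 1 MINUTE',
        format: 'JSONEachRow',
      });
      const rows = await result.json();
      res.write(`data: ${JSON.stringify(rows[0])}\n\n`);
    } catch (err) {
      // swallow transient errors to keep the stream alive; log them in a real setup
    }
  }, 5000);

  req.on('close', () => clearInterval(timer));
});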

8) Query and performance tuning checklist

When dashboards are slow, run this checklist:

  • Partition pruning: ensure queries include date filters that match PARTITION BY (see the example after this checklist).
  • ORDER BY chosen for your most common queries (user-based funnels).
  • LowCardinality for repeating strings reduces memory.
  • Materialize frequent heavy queries into AggregatingMergeTree / SummingMergeTree tables.
  • Right counting function: uniqExact for exact; uniqCombined for memory-efficient estimates — see notes on memory footprint.
  • Limit JOIN sizes: prefer pre-join in MVs to avoid large distributed joins.
  • Compression: use LZ4 or ZSTD for cold storage; consider codec per column for heavy arrays.
  • Query concurrency: tune max_threads and max_concurrent_queries on the cluster for your workloads.
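
Two of those checks in one small example: the date filter below lines up with the daily PARTITION BY so only recent partitions are scanned, and the per-query SETTINGS clause caps parallelism on a busy cluster (the values are illustrative):

SELECT event_type, count() AS events
FROM analytics.events_raw
WHERE event_time >= now() - INTERVAL 1 DAY -- prunes to the one or two most recent daily partitions
GROUP BY event_type
SETTINGS max_threads = 4;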

9) Operational tips & cost control

  • Keep raw events for a rolling window (e.g., 30–90 days) and move older data to cheaper object storage or summarized tables.
  • Use TTLs on MergeTree to expire raw or intermediate tables.
  • Cache heavy dashboard responses in Redis for repeated views over short windows (see the sketch after this list).
  • Use read replicas or separate ClickHouse clusters / edge regions for analytical-heavy workloads to avoid overloading ingestion nodes.
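
A small caching helper for the Node API might look like the following sketch; it assumes ioredis and a REDIS_URL environment variable, and the key scheme, TTL, and runFunnelQuery placeholder are illustrative:

import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL);

// Serve a cached JSON payload if present; otherwise compute, cache for ttlSeconds, and return it
export async function cached(key, ttlSeconds, compute) {
  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit);
  const value = await compute();
  await redis.set(key, JSON.stringify(value), 'EX', ttlSeconds);
  return value;
}

// Usage inside an endpoint:
// const data = await cached(`funnel:${start}:${end}`, 30, () => runFunnelQuery(start, end));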

10) Monitoring, observability, and testing

Track these metrics to keep analytics reliable:

  • Ingest latency: time from event generation to visibility in MergeTree (see the query after this list).
  • Query latency and 95/99th percentiles for dashboard endpoints.
  • Merge queue length and background merges to avoid query interference.
  • Result sanity checks (e.g., daily totals should be within expected ranges).
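
Ingest latency in particular is cheap to track straight from ClickHouse; a minimal sketch that compares the newest visible event to the current time:

SELECT dateDiff('second', max(event_time), now()) AS ingest_lag_seconds
FROM analytics.events_raw
WHERE event_time >= now() - INTERVAL 1 HOUR;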

Real-world pattern: a SaaS product funnel at scale (case sketch)

Imagine a SaaS product with 10M monthly active users, generating 200M events/day. We built a near-real-time funnel by:

  1. Ingesting via Kafka into ClickHouse Kafka engine.
  2. Materializing per-user daily events and per-step minima into SummingMergeTree tables every minute via streaming MVs.
  3. Exposing an API that reads pre-aggregated tables and falls back to on-the-fly aggregations for ad-hoc windows.
  4. Frontend uses SSE for live dashboards and React Query for historical panels.

Result: median funnel query latency under 300ms for pre-aggregated windows, ingest-to-dashboard latency between 5–15 seconds depending on consumer lag — good for product teams that need near-real-time visibility without scanning terabytes on every click.

As of early 2026, three trends are shaping analytics stacks:

  • Columnar OLAP adoption continues to grow — ClickHouse funding and ecosystem investments mean more managed offerings and connectors.
  • Event streaming to analytics is the default: teams move away from batch ETL toward streaming MVs and Kafka-first architectures.
  • Edge/Realtime compute: expect serverless query engines and lighter push-based SDKs to let frontends push small events to analytics endpoints with low cost.

Common pitfalls and how to avoid them

  • Naive raw scans: heavy queries over raw events will kill latency. Always pre-aggregate or narrow by time.
  • Overuse of exact counts: uniqExact is precise but expensive; use probabilistic functions for large datasets (see notes on memory footprint).
  • Sending too much to browser: paginate and aggregate on the server; the browser should render small payloads (timeseries points, not millions of rows).

Actionable checklist to implement this in 7 days

  1. Day 1: Create events_raw MergeTree table, set partition/ORDER BY, and TTL.
  2. Day 2: Wire ingestion (HTTP batch or Kafka pipeline) and verify data lands in ClickHouse.
  3. Day 3: Build materialized views for daily_user_events and user_cohorts.
  4. Day 4: Implement Node.js API with parameterized funnel & retention endpoints.
  5. Day 5: Build React pages: one funnel chart (SSE) and one cohort table (React Query polling).
  6. Day 6: Add monitoring (ingest lag, query latency) and tune partitions / compression.
  7. Day 7: Load-test with synthetic events, tune uniq function choice, and ship to a small set of users.

Summary: tradeoffs and final recommendations

ClickHouse gives you the combination of scale and speed needed for near-real-time product analytics, but success depends on schema design, smart pre-aggregation, and the right ingestion pattern. If you prefer low operational overhead, consider managed ClickHouse services. For very low-latency push updates, pair materialized views with SSE or WebSockets in the UI.

Call to action

If you want a ready-to-run starter repo: I’ve published a compact reference implementation (ClickHouse schema + Node API + React dashboard) that follows the patterns in this article. Grab it, deploy to a test cluster, and run the 7-day checklist. Need help adapting the patterns to your data volume? Ask for a quick architecture review and I’ll suggest concrete partition keys, aggregation windows, and query templates tuned to your workload.


Related Topics

#analytics #tutorial #ClickHouse

reacts

Contributor

Senior editor and content strategist. Writing about technology, design, and the future of digital media. Follow along for deep dives into the industry's moving parts.
