May 17, 2026
TraceAI — Inspectable AI Workflow Observability Dashboard
TraceAI is an inspectable workflow observability system for AI pipelines. It exposes execution traces, retrieval context, validation failures, latency, and model metadata through a controlled workflow demo.
- Role: Solo developer. System design, workflow engine, database schema, API routes, trace UI, retrieval metadata, live/demo execution modes, deployment setup.
- Stack: Next.js · TypeScript · React · Tailwind · shadcn/ui · Neon PostgreSQL · Gemini Flash · Vercel
- Links: Live · Repository
Problem
AI workflows are difficult to debug when generation, retrieval, validation, and fallback behavior are hidden behind a single response. When a run fails, it is often unclear which step caused the failure, what context was retrieved, how long each step took, or whether the output passed validation.
Solution
I built TraceAI as a step-level observability layer for AI workflow runs. The system records each execution step, visualizes latency over time, exposes retrieved context and model metadata, and surfaces validation failures so workflow behavior can be inspected instead of guessed.
Decisions
- Separated generation from validation to make model output failures visible
- Used a timeline view instead of raw logs to expose execution order and bottlenecks
- Kept the support workflow as a controlled demo scenario rather than the product itself
- Stored workflow steps as structured trace data so each run can be inspected after execution
Architecture highlights
- Step-level execution tracing for AI workflow runs
- Timeline-based latency visualization
- Retrieval context inspection with matched documents
- Structured validation layer for AI-generated outputs
- Failure-first debugging for incomplete or invalid workflow runs
Outcomes
- Made failed AI workflow runs easier to inspect at the step level
- Exposed latency, retrieval, model metadata, and validation behavior in one trace
- Turned opaque support reply generation into a debuggable workflow execution
Overview
TraceAI is an inspectable workflow observability system for AI pipelines.
Instead of treating an AI response as a single black-box output, the project exposes execution traces, retrieval context, validation failures, latency, and step-level metadata through a controlled workflow scenario.
The support reply workflow is used only as a deterministic demo environment for inspecting AI pipeline behavior.
Each workflow execution is stored as a run. Each stage is stored as an ordered step with status, duration, input/output previews, and structured metadata. This makes it possible to inspect what happened inside the workflow after it runs.
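The run and step records described above can be sketched in TypeScript. This is a minimal illustration, not the project's actual schema: the field names, the `recordStep` helper, and the injectable `now` clock are all assumptions made for the example.

```typescript
// Hypothetical trace shapes; names are illustrative, not TraceAI's real schema.
type StepStatus = "success" | "failed";

interface WorkflowStep {
  runId: string;
  position: number; // order of the step within its run
  name: string;
  status: StepStatus;
  durationMs: number;
  inputPreview: string;
  outputPreview: string;
  metadata: Record<string, unknown>;
}

interface WorkflowRun {
  id: string;
  status: StepStatus;
  steps: WorkflowStep[];
}

// Truncate long values so stored previews stay small.
function preview(value: unknown, maxLength = 80): string {
  const text = typeof value === "string" ? value : JSON.stringify(value) ?? "";
  return text.length > maxLength ? text.slice(0, maxLength) + "..." : text;
}

// Run one stage, time it, and append an ordered step record to the run.
function recordStep<I, O>(
  run: WorkflowRun,
  name: string,
  input: I,
  work: (input: I) => O,
  now: () => number = Date.now,
): O | undefined {
  const startedAt = now();
  let output: O | undefined;
  let status: StepStatus = "success";
  const metadata: Record<string, unknown> = {};
  try {
    output = work(input);
  } catch (error) {
    status = "failed";
    metadata.error = error instanceof Error ? error.message : String(error);
  }
  run.steps.push({
    runId: run.id,
    position: run.steps.length + 1,
    name,
    status,
    durationMs: now() - startedAt,
    inputPreview: preview(input),
    outputPreview: status === "success" ? preview(output) : "",
    metadata,
  });
  // A failed step marks the whole run as failed, but the step is still recorded.
  if (status === "failed") run.status = "failed";
  return output;
}
```

Because a failing stage is recorded rather than thrown away, the trace still shows which step broke and how long it ran before failing.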
Problem
AI workflows are often difficult to debug because the important decisions happen between the user input and the final model output.
For the support reply workflow, the system needs to answer questions such as:
- Which policy documents were retrieved?
- How relevant were they?
- Did the draft cite the required policy?
- Where did the latency happen?
- Was the response generated by a live model or a deterministic fallback?
- Why did a run fail?
Without trace-level visibility, the workflow becomes hard to trust and hard to improve.
Most AI systems expose only the final output. TraceAI focuses on exposing the execution layer behind that output: retrieval decisions, validation behavior, latency distribution, and failure points.
Solution
I built TraceAI as a focused observability layer for inspectable AI workflow execution.
The system runs a support reply pipeline and records every stage into PostgreSQL. The dashboard shows recent runs, success rate, average latency, and run history. A run detail page shows a trace timeline where each step can be opened in a side drawer.
The drawer exposes the internal data for that step: retrieved policy documents, scores, matched keywords, snippets, model information, token estimates, validation checks, errors, and output previews.
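The dashboard figures mentioned above (success rate and average latency) can be derived from stored runs. The sketch below assumes each run carries a status and a total duration; `RunSummary`, `DashboardStats`, and `summarizeRuns` are hypothetical names, and the real project may compute these in SQL instead.

```typescript
// Hypothetical summary of one stored run.
interface RunSummary {
  status: "success" | "failed";
  totalMs: number;
}

interface DashboardStats {
  totalRuns: number;
  successRate: number; // fraction between 0 and 1
  averageLatencyMs: number;
}

// Aggregate recent runs into the numbers a dashboard header would show.
function summarizeRuns(runs: RunSummary[]): DashboardStats {
  if (runs.length === 0) {
    return { totalRuns: 0, successRate: 0, averageLatencyMs: 0 };
  }
  const succeeded = runs.filter((run) => run.status === "success").length;
  const totalMs = runs.reduce((sum, run) => sum + run.totalMs, 0);
  return {
    totalRuns: runs.length,
    successRate: succeeded / runs.length,
    averageLatencyMs: totalMs / runs.length,
  };
}
```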
Architecture
The project uses a simple layered architecture:
- Next.js App Router for pages and API routes
- A workflow engine responsible for step orchestration
- A structured trace model for execution visibility and debugging
- PostgreSQL tables for runs, steps, and policy documents
- A retrieval layer for scoring policy documents
- A Gemini draft layer for optional live AI generation
- A deterministic demo engine for reliable public demos
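To make the retrieval layer concrete, here is one possible keyword-overlap scorer. It is only a sketch: the source does not describe the actual scoring formula, so the `PolicyDocument` shape, the `scoreDocuments` function, and the "fraction of keywords matched" score are all assumptions.

```typescript
// Hypothetical policy document with hand-assigned keywords.
interface PolicyDocument {
  id: string;
  title: string;
  body: string;
  keywords: string[];
}

interface RetrievalMatch {
  documentId: string;
  score: number; // fraction of the document's keywords found in the query
  matchedKeywords: string[];
  snippet: string;
}

// Lowercase the text and split it into a set of alphanumeric tokens.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Score every document against the query and keep the best matches.
function scoreDocuments(
  query: string,
  documents: PolicyDocument[],
  limit = 3,
): RetrievalMatch[] {
  const queryTokens = tokenize(query);
  return documents
    .map((doc) => {
      const matchedKeywords = doc.keywords.filter((keyword) =>
        queryTokens.has(keyword.toLowerCase()),
      );
      const score =
        doc.keywords.length === 0 ? 0 : matchedKeywords.length / doc.keywords.length;
      return {
        documentId: doc.id,
        score,
        matchedKeywords,
        snippet: doc.body.slice(0, 120),
      };
    })
    .filter((match) => match.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Returning the matched keywords and a snippet alongside the score is what lets a trace view explain why a document was retrieved, not only that it was.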
The core data model is based on two main entities:
- workflow_runs
- workflow_steps
A run represents one workflow execution. A step represents one stage inside that execution.
Key Decisions
Run- and step-based persistence
I modeled each execution as a workflow run and each pipeline stage as an ordered workflow step. This made the UI easier to reason about and allowed the system to show a timeline instead of only a final result.
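As an illustration of why ordered steps make a timeline straightforward, the sketch below turns step durations into start and end offsets. It assumes steps run sequentially with no gaps; `TimedStep`, `TimelineBar`, and `buildTimeline` are hypothetical names, not the project's code.

```typescript
// Hypothetical minimal view of a stored step.
interface TimedStep {
  position: number;
  name: string;
  durationMs: number;
}

interface TimelineBar {
  name: string;
  startMs: number;
  endMs: number;
  shareOfRun: number; // fraction of total run time spent in this step
}

// Lay steps end to end in position order, assuming sequential execution.
function buildTimeline(steps: TimedStep[]): TimelineBar[] {
  const ordered = [...steps].sort((a, b) => a.position - b.position);
  const totalMs = ordered.reduce((sum, step) => sum + step.durationMs, 0);
  let cursor = 0;
  return ordered.map((step) => {
    const bar: TimelineBar = {
      name: step.name,
      startMs: cursor,
      endMs: cursor + step.durationMs,
      shareOfRun: totalMs === 0 ? 0 : step.durationMs / totalMs,
    };
    cursor = bar.endMs;
    return bar;
  });
}
```

The `shareOfRun` value is what makes a bottleneck visible at a glance: the step with the widest bar is the one to investigate first.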
JSONB metadata for step details
Different steps produce different kinds of observability data. Policy retrieval produces document scores and snippets. AI generation produces model metadata and token estimates. Validation produces checks and possible error messages.
Instead of over-normalizing every metadata field too early, I stored step-specific details in JSONB.
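On the application side, one way to keep loosely structured JSONB metadata type-safe is a discriminated union keyed by step kind. The sketch below is an assumption about how that could look; the `kind` tag, the field names, and `describeMetadata` are hypothetical and not taken from the project.

```typescript
// Hypothetical metadata shapes, one per step kind, stored as JSONB.
interface RetrievalMetadata {
  kind: "retrieval";
  documents: { id: string; score: number; snippet: string }[];
}

interface GenerationMetadata {
  kind: "generation";
  model: string;
  mode: "live" | "demo";
  estimatedTokens: number;
}

interface ValidationMetadata {
  kind: "validation";
  checks: { name: string; passed: boolean; message?: string }[];
}

type StepMetadata = RetrievalMetadata | GenerationMetadata | ValidationMetadata;

// Produce a one-line summary; the switch narrows the union by its `kind` tag.
function describeMetadata(metadata: StepMetadata): string {
  switch (metadata.kind) {
    case "retrieval":
      return `${metadata.documents.length} document(s) retrieved`;
    case "generation":
      return `${metadata.model} (${metadata.mode}), ~${metadata.estimatedTokens} tokens`;
    case "validation": {
      const failed = metadata.checks.filter((check) => !check.passed);
      return failed.length === 0
        ? "all checks passed"
        : `failed: ${failed.map((check) => check.name).join(", ")}`;
    }
  }
}
```

The trade-off is that the database does not enforce these shapes; the union only documents and checks them in application code, which fits the goal of not normalizing every field up front.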
Deterministic demo mode
The public demo should not break just because an AI API key is missing or a quota limit is reached. For that reason, the project supports a deterministic demo engine.
When Gemini is available, the AI draft step can run in live mode. If a quota or rate-limit error occurs, the workflow falls back to the demo engine instead of failing the whole experience.
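The fallback behavior can be sketched as a small wrapper around the draft step. The live generator is injected as a plain function so no real Gemini client call is shown; `generateDraft`, `DraftGenerator`, and the assumption that quota and rate-limit errors carry an HTTP-style `status` of 429 are all hypothetical.

```typescript
interface DraftResult {
  text: string;
  mode: "live" | "demo";
  fallbackReason?: string;
}

// Hypothetical signature for whatever function calls the live model.
type DraftGenerator = (prompt: string) => Promise<string>;

// Assumption: quota and rate-limit failures expose `status: 429` on the error.
function isQuotaOrRateLimitError(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  return (error as { status?: unknown }).status === 429;
}

// Try the live model first; fall back to the deterministic demo engine.
async function generateDraft(
  prompt: string,
  live: DraftGenerator | null, // null when no API key is configured
  demo: (prompt: string) => string,
): Promise<DraftResult> {
  if (live === null) {
    return { text: demo(prompt), mode: "demo", fallbackReason: "live model not configured" };
  }
  try {
    return { text: await live(prompt), mode: "live" };
  } catch (error) {
    // Only quota and rate-limit errors trigger the fallback; others propagate.
    if (!isQuotaOrRateLimitError(error)) throw error;
    return { text: demo(prompt), mode: "demo", fallbackReason: "quota or rate limit reached" };
  }
}
```

Recording `mode` and `fallbackReason` on the result keeps the fallback itself visible in the trace, so a demo response is never mistaken for a live one.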
Scope control
I intentionally avoided building authentication, multi-user support, vector search, cost analytics, and a workflow builder in the first version.
The goal was not to create a full observability platform. The goal was to demonstrate how AI workflow execution can become inspectable, debuggable, and traceable instead of opaque.