Transac AI

Multi-cloud Kubernetes microservices that turn transactional data into real-time, natural-language insights using LLMs.

Kubernetes · GenAI · gRPC · Kafka · GraphQL · Microservices

Problem

Transactional data — payments, transfers, settlements — produces vast streams of structured records. The patterns in those records (compliance flags, fraud signatures, risk anomalies) usually live behind a chain of dashboards, SQL, and human analysts. The interesting work is buried; the time-to-insight is hours, sometimes days.

I wanted to compress that loop. Could a fine-tuned LLM, given the right context per transaction, surface the same insights an analyst would — in real time, in natural language?

Architecture

Multi-cloud microservices. Core services run on Google Kubernetes Engine; auxiliary services run on AWS Lambda + EventBridge. Confluent provides managed Kafka for event streams. Supabase backs real-time storage.

The hot path:

client → WMS (Go) → Kafka → PBS (Python) → LLM → GraphQL Insights API → Supabase

WMS (Workload Manager Service, Go): receives generation requests, validates them, and enqueues them. Built in Go for sustained concurrency under burst load.
PBS (Prompt Builder Service, Python): pulls transaction context, applies prompt templates, and calls LLMs. Python is the right home for the data-prep-heavy work and for integration with the broader Python ML ecosystem.
GraphQL servers (TypeScript + Prisma): store the generated insights and the original requests; clients read both via a single GraphQL endpoint.

Internal RPCs use gRPC; externally accessible services use Connect RPC so non-gRPC clients can connect over plain HTTP. Kafka keeps insight generation non-blocking; the producer never waits on the LLM.
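
A minimal sketch of that non-blocking hand-off, written in Python for illustration only. It assumes an in-process `queue.Queue` as a stand-in for the Kafka topic, and every name in it (`submit_request`, `run_consumer`, `fake_llm_call`, the message fields) is hypothetical rather than taken from the project. The point is only the shape: the producer enqueues and returns, while the consumer absorbs the model latency.

```python
import queue
import threading
import time

# In-process stand-in for a Kafka topic (assumption: the real system
# publishes to Confluent-managed Kafka instead).
requests_topic: queue.Queue = queue.Queue()
insights: list = []


def submit_request(request_id: str, client_id: str) -> None:
    """Producer side (WMS role): enqueue the request and return immediately."""
    requests_topic.put({"request_id": request_id, "client_id": client_id})


def fake_llm_call(message: dict) -> str:
    """Placeholder for the slow LLM call made on the consumer side."""
    time.sleep(0.05)  # simulated model latency
    return f"insight for {message['request_id']}"


def run_consumer() -> None:
    """Consumer side (PBS role): drain the topic at its own pace."""
    while True:
        message = requests_topic.get()
        if message is None:  # sentinel value: stop the worker
            break
        insights.append(
            {"request_id": message["request_id"], "text": fake_llm_call(message)}
        )


def demo() -> int:
    """Submit five requests, wait for the worker, return the insight count."""
    worker = threading.Thread(target=run_consumer)
    worker.start()
    for i in range(5):
        submit_request(f"req-{i}", "client-1")  # never blocks on the LLM
    requests_topic.put(None)
    worker.join()
    return len(insights)
```

Calling `demo()` enqueues all five requests before the first simulated LLM call has finished; the worker then processes them in order.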

LLM response generation runs on top of QvikChat — same author, same project family.

Key decisions

Polyglot, deliberately. Go for concurrency-bound services, Python for data-shaping and ML, TypeScript for ORM-backed APIs. Each chosen because the language fits the job, not because of stack uniformity.
Kafka instead of synchronous calls between WMS and PBS. An LLM call can take seconds, so a synchronous chain would tie request throughput to model latency. Decoupling the two lets WMS accept and rate-limit requests independently of how fast PBS consumes them.
Per-transaction prompt templates, not a giant context window. Each insight type (compliance / risk / fraud) gets its own template; the prompt builder picks one based on transaction shape.
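
The per-insight-type template idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's prompt builder: the template wording, the transaction fields (`tx_id`, `amount`, `currency`, `jurisdiction`, `device_fingerprint`), and the routing rules in `pick_insight_type` are all invented for the example.

```python
from string import Template

# Hypothetical templates, one per insight type named in the text.
TEMPLATES = {
    "compliance": Template(
        "Review transaction $tx_id ($amount $currency) against the "
        "compliance rules for jurisdiction $jurisdiction."
    ),
    "fraud": Template(
        "Assess transaction $tx_id ($amount $currency) from device "
        "$device_fingerprint for signs of fraud."
    ),
    "risk": Template(
        "Summarize any risk anomalies in transaction $tx_id "
        "($amount $currency)."
    ),
}


def pick_insight_type(tx: dict) -> str:
    """Choose a template key from the fields the transaction carries.

    The rules here are illustrative assumptions about "transaction shape".
    """
    if "device_fingerprint" in tx:
        return "fraud"
    if "jurisdiction" in tx:
        return "compliance"
    return "risk"


def build_prompt(tx: dict) -> str:
    """Fill the selected template; raises KeyError if a field is missing."""
    return TEMPLATES[pick_insight_type(tx)].substitute(tx)
```

For example, `build_prompt({"tx_id": "t-1", "amount": 120.5, "currency": "EUR", "jurisdiction": "DE"})` selects the compliance template. Keeping each template small means the prompt carries only the fields that insight type needs.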

What I'd do differently

More observability earlier. I added structured tracing late in the project; on a system with this many hops, distributed tracing is a day-one requirement, not a polish item.
Pin LLM versions per insight type. Prompt drift across model upgrades is real — ungoverned, an "innocuous" model swap can change the tone or shape of insights overnight.
Stricter input validation in WMS. Coral-style explicit schemas, not just type checks at the language level.
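
As one way to picture the stricter validation in the last point, here is a small sketch of an explicit request schema. It is written in plain Python for illustration (WMS itself is a Go service) and does not use any particular schema library; the field names, the allowed insight types, and the 1 to 1000 limit are assumptions made up for the example.

```python
from dataclasses import dataclass

# Assumed set of insight types, taken from the three named in the text.
ALLOWED_INSIGHT_TYPES = {"compliance", "risk", "fraud"}
KNOWN_FIELDS = {"request_id", "client_id", "insight_type", "max_transactions"}


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical shape of a validated generation request."""
    request_id: str
    client_id: str
    insight_type: str
    max_transactions: int


def parse_request(payload: dict) -> GenerationRequest:
    """Validate a raw payload against explicit rules, collecting every error."""
    errors = []

    for name in ("request_id", "client_id", "insight_type"):
        value = payload.get(name)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{name}: must be a non-empty string")

    insight_type = payload.get("insight_type")
    if (isinstance(insight_type, str) and insight_type.strip()
            and insight_type not in ALLOWED_INSIGHT_TYPES):
        errors.append("insight_type: must be one of compliance, risk, fraud")

    limit = payload.get("max_transactions")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 1000:
        errors.append("max_transactions: must be an integer between 1 and 1000")

    unknown = sorted(set(payload) - KNOWN_FIELDS)
    if unknown:
        errors.append(f"unknown fields: {', '.join(unknown)}")

    if errors:
        raise ValueError("; ".join(errors))
    return GenerationRequest(
        request_id=payload["request_id"],
        client_id=payload["client_id"],
        insight_type=payload["insight_type"],
        max_transactions=payload["max_transactions"],
    )
```

Rejecting unknown fields and out-of-range values at the edge goes beyond what a language-level type check catches, and reporting all errors at once saves the client a round trip per mistake.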

Live demo · Documentation · GitHub