Transac AI
Multi-cloud Kubernetes microservices that turn transactional data into real-time, natural-language insights using LLMs.
Problem
Transactional data — payments, transfers, settlements — produces vast streams of structured records. The patterns in those records (compliance flags, fraud signatures, risk anomalies) usually live behind a chain of dashboards, SQL, and human analysts. The interesting work is buried; the time-to-insight is hours, sometimes days.
I wanted to compress that loop. Could a fine-tuned LLM, given the right context per transaction, surface the same insights an analyst would — in real time, in natural language?
Architecture
Multi-cloud microservices. Core services run on Google Kubernetes Engine; auxiliary services run on AWS Lambda + EventBridge. Confluent provides managed Kafka for event streams. Supabase backs real-time storage.
The hot path:
client → WMS (Go) → Kafka → PBS (Python) → LLM → GraphQL Insights API → Supabase
- WMS (Workload Manager Service, Go): receives generation requests, validates them, and enqueues them. Built in Go for sustained concurrency under burst load.
- PBS (Prompt Builder Service, Python): pulls transaction context, applies prompt templates, and calls LLMs. Python is the right home for the data-prep-heavy work and for integration with the broader Python ML ecosystem.
- GraphQL servers (TypeScript + Prisma): store the generated insights and the original requests; clients read both through a single GraphQL endpoint.
Internal RPCs use gRPC; externally accessible services use Connect RPC so non-gRPC clients can connect over plain HTTP. Kafka keeps insight generation non-blocking; the producer never waits on the LLM.
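To make the non-blocking hand-off concrete, here is a minimal Python sketch of the producer/consumer split. It is an illustration under assumptions, not the project's code: an in-memory `queue.Queue` stands in for the Kafka topic, `fake_llm` is a hypothetical slow model call, and the function names are invented for this example.

```python
import queue
import threading
import time

# In-memory stand-in for the Kafka topic between WMS and PBS (assumption).
requests_topic = queue.Queue()
insights = []


def fake_llm(prompt):
    """Hypothetical slow model call; sleeps to mimic LLM latency."""
    time.sleep(0.05)
    return f"insight for: {prompt}"


def enqueue_request(request):
    """Producer side (WMS role): publish and return without waiting on the LLM."""
    requests_topic.put(request)


def consume_requests():
    """Consumer side (PBS role): drain the topic at the model's own pace."""
    while True:
        request = requests_topic.get()
        if request is None:  # sentinel value: stop consuming
            break
        text = fake_llm(request["prompt"])
        insights.append({"request_id": request["id"], "text": text})


def run_demo(n=5):
    """Enqueue n requests and return the seconds the producer spent doing so."""
    worker = threading.Thread(target=consume_requests)
    worker.start()
    start = time.perf_counter()
    for i in range(n):
        enqueue_request({"id": i, "prompt": f"transaction {i}"})
    producer_seconds = time.perf_counter() - start
    requests_topic.put(None)  # tell the consumer to stop once the backlog is done
    worker.join()
    return producer_seconds
```

Calling `run_demo()` should show the producer finishing its enqueue loop long before the consumer has worked through the simulated model calls, which is the property the Kafka hop is there to provide.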
LLM response generation runs on top of QvikChat — same author, same project family.
Key decisions
- Polyglot, deliberately. Go for concurrency-bound services, Python for data-shaping and ML, TypeScript for ORM-backed APIs. Each chosen because the language fits the job, not because of stack uniformity.
- Kafka instead of synchronous calls between WMS and PBS. The latency of an LLM call (sometimes seconds) makes a synchronous chain a bad idea: you'd cap throughput at the slowest model. Decoupling them lets WMS accept and rate-limit requests independently of how fast PBS drains the topic.
- Per-transaction prompt templates, not a giant context window. Each insight type (compliance / risk / fraud) gets its own template; the prompt builder picks one based on transaction shape.
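The template-per-insight-type decision can be sketched roughly as follows. This is a hedged illustration only: the template wording, the `cross_border` and `amount` fields, the `HIGH_VALUE_THRESHOLD` cutoff, and the selection rules are all assumptions made up for the example, not the project's actual logic.

```python
from string import Template

# Hypothetical templates, one per insight type named in the text.
TEMPLATES = {
    "compliance": Template(
        "Explain which compliance rules transaction $tx_id may trigger: $details"
    ),
    "risk": Template("Summarize the risk profile of transaction $tx_id: $details"),
    "fraud": Template("Describe any fraud signatures in transaction $tx_id: $details"),
}

HIGH_VALUE_THRESHOLD = 10_000  # assumed cutoff, purely illustrative


def pick_insight_type(tx):
    """Choose a template key from the shape of the transaction (assumed rules)."""
    if tx.get("cross_border"):
        return "compliance"
    if tx.get("amount", 0) >= HIGH_VALUE_THRESHOLD:
        return "risk"
    return "fraud"


def build_prompt(tx):
    """Render the selected template with only this transaction's own fields."""
    insight_type = pick_insight_type(tx)
    details = ", ".join(f"{key}={tx[key]}" for key in sorted(tx) if key != "id")
    return TEMPLATES[insight_type].substitute(tx_id=tx["id"], details=details)
```

Because each prompt carries only the fields of one transaction, prompt size stays bounded no matter how large the overall stream grows.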
What I'd do differently
- More observability earlier. I added structured tracing late in the project; on a system with this many hops, distributed tracing is a day-one requirement, not a polish item.
- Pin LLM versions per insight type. Prompt drift across model upgrades is real — ungoverned, an "innocuous" model swap can change the tone or shape of insights overnight.
- Stricter input validation in WMS. Coral-style explicit schemas, not just type checks at the language level.
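Two of these changes, explicit request schemas and pinned model versions, could look something like the sketch below. WMS itself is written in Go; this is a Python illustration of the idea only. The field names, the `GenerationRequest` type, and the model identifiers in `PINNED_MODELS` are placeholders assumed for the example.

```python
from dataclasses import dataclass

INSIGHT_TYPES = {"compliance", "risk", "fraud"}

# Placeholder model identifiers (assumptions), pinned per insight type so a
# model upgrade is an explicit config change rather than a silent swap.
PINNED_MODELS = {
    "compliance": "example-model-2024-01",
    "risk": "example-model-2024-01",
    "fraud": "example-model-2024-06",
}


@dataclass(frozen=True)
class GenerationRequest:
    client_id: str
    insight_type: str
    transaction_ids: tuple


def validate_request(payload):
    """Check a raw payload against an explicit schema; raise ValueError listing every violation."""
    errors = []

    client_id = payload.get("client_id")
    if not isinstance(client_id, str) or not client_id.strip():
        errors.append("client_id must be a non-empty string")

    insight_type = payload.get("insight_type")
    if not isinstance(insight_type, str) or insight_type not in INSIGHT_TYPES:
        errors.append("insight_type must be one of " + ", ".join(sorted(INSIGHT_TYPES)))

    tx_ids = payload.get("transaction_ids")
    if (
        not isinstance(tx_ids, list)
        or not tx_ids
        or not all(isinstance(t, str) and t for t in tx_ids)
    ):
        errors.append("transaction_ids must be a non-empty list of non-empty strings")

    if errors:
        raise ValueError("; ".join(errors))
    return GenerationRequest(client_id, insight_type, tuple(tx_ids))


def model_for(request):
    """Look up the pinned model for a validated request."""
    return PINNED_MODELS[request.insight_type]
```

Collecting every violation before raising gives the caller one complete error message instead of a fix-and-retry loop, and routing every model lookup through `model_for` keeps version changes in one reviewable place.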