Pranav Kural


Transac AI

Multi-cloud Kubernetes microservices that turn transactional data into real-time, natural-language insights using LLMs.



Problem

Transactional data — payments, transfers, settlements — produces vast streams of structured records. The patterns in those records (compliance flags, fraud signatures, risk anomalies) usually live behind a chain of dashboards, SQL, and human analysts. The interesting work is buried; the time-to-insight is hours, sometimes days.

I wanted to compress that loop. Could a fine-tuned LLM, given the right context per transaction, surface the same insights an analyst would — in real time, in natural language?

Architecture

Multi-cloud microservices. Core services run on Google Kubernetes Engine; auxiliary services run on AWS Lambda + EventBridge. Confluent provides managed Kafka for event streams. Supabase backs real-time storage.

The hot path:

client → WMS (Go) → Kafka → PBS (Python) → LLM → GraphQL Insights API → Supabase
  • WMS (Workload Manager Service, Go). Receives generation requests, validates them, and enqueues them on Kafka. Built in Go for sustained concurrency under burst load.
  • PBS (Prompt Builder Service, Python). Pulls transaction context, applies prompt templates, and calls the LLM. Python suits the data-prep-heavy work and integrates with the broader Python ML ecosystem.
  • GraphQL servers (TypeScript + Prisma). Store the generated insights and the original requests; clients read both through a single GraphQL endpoint.

Internal RPCs use gRPC; externally accessible services use Connect RPC so non-gRPC clients can connect over plain HTTP. Kafka keeps insight generation non-blocking; the producer never waits on the LLM.
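
The decoupling described above can be illustrated with a minimal sketch. It is not the project's code: an in-memory `queue.Queue` stands in for the Kafka topic, a plain list stands in for the insights store, and `slow_llm`, `enqueue_request`, `consume`, and `run_demo` are hypothetical names chosen for the example. The point it shows is only that the producer returns as soon as the message is enqueued, while the consumer works through the backlog at the model's pace.

```python
import queue
import threading
import time


def slow_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call that takes a while."""
    time.sleep(0.05)
    return f"insight for {prompt}"


def enqueue_request(topic: queue.Queue, request_id: str, prompt: str) -> None:
    """Producer role (WMS in the text): validate, enqueue, return immediately."""
    if not prompt:
        raise ValueError("prompt must be non-empty")
    topic.put({"id": request_id, "prompt": prompt})  # does not wait on the LLM


def consume(topic: queue.Queue, store: list) -> None:
    """Consumer role (PBS in the text): drain the topic at the model's pace."""
    while True:
        message = topic.get()
        if message is None:  # sentinel used by this sketch to stop the worker
            return
        store.append({"id": message["id"], "text": slow_llm(message["prompt"])})


def run_demo(n: int) -> list:
    topic = queue.Queue()  # in-memory stand-in for a Kafka topic
    store = []             # in-memory stand-in for the insights store
    worker = threading.Thread(target=consume, args=(topic, store))
    worker.start()
    for i in range(n):
        enqueue_request(topic, f"req-{i}", f"transaction {i}")
    topic.put(None)  # tell the worker to stop once the backlog is drained
    worker.join()
    return store


if __name__ == "__main__":
    for insight in run_demo(3):
        print(insight["id"], "->", insight["text"])
```

With a real broker the worker would be a separate service and the sentinel would not exist; the sketch keeps both sides in one process only so it can run on its own.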

LLM response generation runs on top of QvikChat — same author, same project family.

Key decisions

  • Polyglot, deliberately. Go for concurrency-bound services, Python for data-shaping and ML, TypeScript for ORM-backed APIs. Each was chosen because the language fits the job; stack uniformity was not a goal.
  • Kafka instead of synchronous calls between WMS and PBS. The latency of an LLM call (sometimes seconds) makes a synchronous chain a bad idea: request throughput would be capped by the slowest model call. Decoupling the two lets WMS accept and rate-limit requests independently of how fast PBS drains the topic.
  • Per-transaction prompt templates, not a giant context window. Each insight type (compliance / risk / fraud) gets its own template; the prompt builder picks one based on transaction shape.
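
As an illustration of the template-per-insight-type idea, here is a small sketch using Python's standard `string.Template`. The template wording, the transaction fields (`id`, `amount`, `currency`, `cross_border`, `first_time_counterparty`), and the routing rules in `pick_insight_type` are all assumptions made up for the example; the project's actual templates and selection logic are not shown in this write-up.

```python
from string import Template

# Hypothetical templates, one per insight type named in the text.
TEMPLATES = {
    "compliance": Template(
        "Review transaction $id ($amount $currency) for compliance concerns."
    ),
    "risk": Template(
        "Assess the risk profile of transaction $id ($amount $currency)."
    ),
    "fraud": Template(
        "Look for fraud indicators in transaction $id ($amount $currency)."
    ),
}


def pick_insight_type(txn: dict) -> str:
    """Assumed routing rules: choose a template from the transaction's shape."""
    if txn.get("cross_border"):
        return "compliance"
    if txn.get("first_time_counterparty"):
        return "fraud"
    return "risk"


def build_prompt(txn: dict) -> tuple:
    """Return (insight_type, prompt) for a single transaction."""
    insight_type = pick_insight_type(txn)
    prompt = TEMPLATES[insight_type].substitute(
        id=txn["id"], amount=txn["amount"], currency=txn["currency"]
    )
    return insight_type, prompt


if __name__ == "__main__":
    txn = {"id": "t1", "amount": "250.00", "currency": "EUR", "cross_border": True}
    print(build_prompt(txn))
```

Keeping each template short and specific means the prompt carries only the context one insight type needs, which is the trade-off the bullet above describes.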

What I'd do differently

  • More observability earlier. I added structured tracing late in the project; on a system with this many hops, distributed tracing is a day-one requirement, not a polish item.
  • Pin LLM versions per insight type. Prompt drift across model upgrades is real — ungoverned, an "innocuous" model swap can change the tone or shape of insights overnight.
  • Stricter input validation in WMS. Coral-style explicit schemas, not just type checks at the language level.
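
To make the last point concrete, the sketch below shows what an explicit request schema could look like, as opposed to relying on language-level types alone. WMS itself is written in Go; Python is used here only for illustration. The `GenerationRequest` fields, the allowed insight types, and the specific rules (three-letter uppercase currency, positive finite amount) are assumptions, not the service's real contract.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation

ALLOWED_INSIGHT_TYPES = {"compliance", "risk", "fraud"}


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical shape of a validated generation request."""
    request_id: str
    insight_type: str
    amount: Decimal
    currency: str


def parse_request(raw: dict) -> GenerationRequest:
    """Check every field against explicit rules and report all failures at once."""
    errors = []

    request_id = raw.get("request_id")
    if not isinstance(request_id, str) or not request_id.strip():
        errors.append("request_id: non-empty string required")

    insight_type = raw.get("insight_type")
    if not isinstance(insight_type, str) or insight_type not in ALLOWED_INSIGHT_TYPES:
        errors.append("insight_type: must be one of compliance, risk, fraud")

    currency = raw.get("currency")
    if not (isinstance(currency, str) and len(currency) == 3
            and currency.isalpha() and currency.isupper()):
        errors.append("currency: three uppercase letters required")

    amount = None
    try:
        amount = Decimal(str(raw.get("amount")))
    except InvalidOperation:
        errors.append("amount: decimal number required")
    else:
        # is_finite() is checked first so NaN never reaches the comparison.
        if not amount.is_finite() or amount <= 0:
            errors.append("amount: must be a positive, finite number")

    if errors:
        raise ValueError("; ".join(errors))
    return GenerationRequest(request_id, insight_type, amount, currency)


if __name__ == "__main__":
    print(parse_request({"request_id": "r1", "insight_type": "fraud",
                         "amount": "12.50", "currency": "USD"}))
```

A string like `"NaN"` would pass a bare "is it a number" check, which is the kind of gap an explicit schema closes before a request is enqueued.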

Live demo · Documentation · GitHub