Year in Review (2025): Systems Engineering
The theme for 2025 was Systems Engineering—not just writing code, but architecting and deploying end-to-end systems that are secure by design, resource-efficient, and robust in data handling. Each of the following projects pushes this principle in a different direction.
Scalable Architecture for Identity-Linked Data
Anchoring the 2025 roadmap was an infrastructure platform engineered to link consumer wearable biometrics with verified national identity layers. The system bridges Polar AccessLink (biometrics) and SingPass MyInfo (identity), solving a critical challenge: ingesting high-volume, identity-sensitive health data without exposing identity fields to intermediate states or database administrators.
The platform was designed as a strictly event-driven, serverless ecosystem on AWS, emphasizing granular scalability and isolation.
- **Identity Federation & Data Privacy** (updated): One significant engineering complexity was the handling of NRIC data. A layered security flow was implemented: sensitive data is transmitted between services via secure HTTP POST (TLS)—preventing exposure in browser history or server logs—and is persisted in DynamoDB as an RSA-2048-encrypted payload. This ensures that only designated data custodians holding the private key can recover the plaintext NRIC, while backend services process plaintext only in volatile memory during the transaction.
- **OLTP vs. OLAP Separation**: Operational and analytical data planes are strictly separated. Ingestion is write-optimized, asynchronously routing payloads to DynamoDB and S3. The analytical plane is read-optimized: a scheduled incremental load transforms JSON into columnar Parquet, made queryable via Amazon Athena. Large-scale queries can thus run downstream without impacting live ingestion workloads, while preserving cost-efficiency.
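To make the custodian model concrete, here is a minimal sketch using the `cryptography` package. The function names and the OAEP padding choice are illustrative assumptions, not the production implementation; the key property is that ingest-side code holds only the public key, so stored payloads are opaque to services and database administrators alike.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# OAEP with SHA-256 is a common padding choice; assumed here for illustration.
OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

def encrypt_for_storage(public_key, nric: str) -> bytes:
    """Ingestion side: encrypt before persisting; this service can
    never recover plaintext from what it stores."""
    return public_key.encrypt(nric.encode(), OAEP)

def custodian_decrypt(private_key, payload: bytes) -> str:
    """Custodian side: the private key never leaves the custodian."""
    return private_key.decrypt(payload, OAEP).decode()

# Demo key pair; in production only the public half is deployed.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()
```

Plaintext exists only transiently in process memory during encryption; nothing recoverable ever reaches DynamoDB.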
-
Other areas that have been covered broadly in a previous post include:
- Asynchronous concurrency, SQS buffering, rate-limiting
- Infrastructure as Code
- CI/CD and Containerization
- Observability and Diagnostics
- Development Environment and Tooling
- LLM-Assisted Development
Traffic Engineering for an LLM Fleet
`llm-control-plane` manages a distributed fleet of self-deployed LLMs on heterogeneous hardware through a centralized smart proxy and routing layer.
- **Semantic & Hardware-Aware Routing**: The system routes traffic intelligently—a `WorkloadClassifier` categorizes incoming prompts by intent (e.g., reasoning, programming, high-throughput) with a lightweight LLM, enabling the `LLMRouter` to dispatch requests appropriately. By aligning task complexity with hardware capabilities, the routing layer assigns heavy inferential workloads to high-VRAM clusters and trivial queries to agile, low-overhead instances, ensuring optimal utilization of heterogeneous compute resources. These routing decisions assume that the LLM nodes themselves are well optimized; for example, one class of endpoints is backed by custom Vulkan builds of `llama.cpp` on Strix Halo, described in this post.
- **Streaming Proxy with Protocol Normalization**: The core of the system is a custom FastAPI proxy designed to intercept and normalize OpenAI-compatible requests. A notable engineering challenge was unifying divergent model output formats—in particular, separating internal “reasoning” traces (chain-of-thought) from final text output. A robust stream processor was implemented that buffers and detects provider-specific delimiters (such as `<think>` tags or proprietary control tokens) in real time. This logic dynamically bifurcates the incoming Server-Sent Events (SSE) stream into distinct “reasoning” and “content” channels for standardized client-side consumption, while an `SSEAccumulator` simultaneously reconstructs the complete response server-side for conversation-history logging (e.g., for context).
- **Real-Time Observability Plane**: To validate these orchestration decisions, a dedicated observability interface was developed using Shiny for Python. The dashboard exposes the router’s internal state, visualizing the rationale behind specific dispatches and surfacing other performance metrics. This provides transparent verification that complex workloads are correctly landing on high-compute targets, allowing for subsequent feedback-based fine-tuning and troubleshooting of the smart routing.
- **Stateless Core with Config-Driven Topology**: The control-plane routing layer is stateless at the process level, and all LLM nodes are fully stateless. Endpoints are (re-)discovered and configured via a registry, while conversation state is stored centrally in persistent storage accessible only by the control plane. For each request, the router reconstructs the conversation from storage and sends the full context to a selected LLM node; the nodes themselves never hold or read conversational state. This keeps the entire system horizontally scalable and simplifies failure recovery: any failed LLM node can be restarted or replaced without affecting session state.
- **Extensibility and Contract-First Design**: The entire system is built around explicit contracts:
  - a contract for how workloads are classified;
  - a contract for how endpoints advertise capabilities; and
  - a contract for how streams (reasoning + content) are emitted and consumed.
  By treating these as stable interfaces, it becomes straightforward to add new workloads (e.g., evaluation, embedding-only, high-token “dump” modes), new endpoints (new nodes, new runtimes), or new visualizations in the observability layer without destabilizing the existing system.
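As a minimal illustration of the delimiter-bifurcation idea, here is a simplified, non-streaming stand-in. The function name is hypothetical, and the real processor additionally buffers deltas so a delimiter split across SSE chunks is still detected.

```python
def split_channels(deltas):
    """Partition a sequence of text deltas into 'reasoning' and 'content'
    channels, using <think>...</think> as the provider delimiter."""
    reasoning, content = [], []
    inside_think = False
    for delta in deltas:
        if delta == "<think>":
            inside_think = True
        elif delta == "</think>":
            inside_think = False
        else:
            # Route each delta to whichever channel is currently active.
            (reasoning if inside_think else content).append(delta)
    return "".join(reasoning), "".join(content)
```

In the actual proxy this logic runs incrementally over the SSE stream, emitting each delta to the appropriate channel as it arrives, while the accumulator rebuilds the full response for logging.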
Algorithmic Rigor in Signal Processing
`fit-diff` is a specialized analytical application for validating wearable sensor data. The complexity lies in transforming messy, asynchronous sensor data into a format suitable for statistical comparison: it employs mathematical optimization to reconcile asynchronous hardware clocks and proprietary PPG algorithmic variance.
- **Adaptive Step-Size Search**: The `determine_optimal_shift` function implements a search algorithm that iteratively shifts the “test” signal against the “reference.” It calculates an adaptive step size based on the greatest common divisor (GCD) of the sampling intervals of the two files, ensuring the shift resolution matches the data density while minimizing computational overhead.
- **Loss Function Optimization**: The alignment isn’t arbitrary; it minimizes a specific loss function (mean absolute error, mean squared error) or maximizes a correlation function (Pearson, concordance correlation coefficient) to find an appropriate temporal offset.
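The following sketch captures both ideas in a deliberately simplified, index-based form: integer-second sampling intervals, both series assumed resampled to a common 1 s grid, and MAE as the only loss. The real `determine_optimal_shift` works on actual timestamps and supports multiple loss and correlation functions.

```python
import math

def optimal_shift(ref, test, ref_interval_s, test_interval_s, max_shift_s):
    """Grid-search the time shift of `test` against `ref`, stepping by the
    GCD of the two sampling intervals and scoring each candidate by MAE."""
    step = math.gcd(ref_interval_s, test_interval_s)
    best_shift, best_loss = 0, float("inf")
    for shift in range(-max_shift_s, max_shift_s + 1, step):
        # Pair up samples that overlap at this candidate shift.
        pairs = [(ref[i], test[i + shift])
                 for i in range(len(ref)) if 0 <= i + shift < len(test)]
        if not pairs:
            continue
        mae = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if mae < best_loss:
            best_shift, best_loss = shift, mae
    return best_shift, best_loss
```

The GCD step is the coarsest resolution that can still land exactly on both devices' sample boundaries, which is why it matches data density without wasted candidate shifts.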
Ongoing work includes an algorithmic validation engine designed to detect anomalies within asynchronous data streams—without relying on “benchmark” devices. Since participants are expected to perform similar activities, physiological patterns are also expected to broadly converge. By treating heart rate streams as signals and applying z-normalization and global cross-correlation, signal overlap can be maximized mathematically to establish a baseline of “expected” behavior. This process will serve primarily as a quality gate: rather than merely aligning data, potential outlier devices or sensor-failure incidents are flagged and excluded from the final analysis.
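A pure-Python sketch of the planned baseline computation: z-normalize each heart-rate stream, then take the global cross-correlation peak; a device whose peak similarity to the cohort falls below some threshold would be flagged. All names here are illustrative, not the engine's API.

```python
from statistics import mean, pstdev

def znorm(signal):
    """Z-normalize so streams with different resting rates are comparable."""
    mu, sd = mean(signal), pstdev(signal)
    return [(x - mu) / sd for x in signal]

def peak_xcorr(a, b):
    """Global cross-correlation peak of two equal-length signals over all
    integer lags; values near 1.0 indicate strong pattern overlap."""
    a, b = znorm(a), znorm(b)
    n = len(a)
    return max(
        sum(a[i] * b[i + lag] for i in range(n) if 0 <= i + lag < n) / n
        for lag in range(-(n - 1), n)
    )
```

Because z-normalization removes offset and scale, two participants with different baseline heart rates but similar activity patterns still score near 1.0, which is exactly the convergence assumption the quality gate relies on.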
Robust Data Acquisition via Browser Automation
To correlate physiological metrics with environmental stressors, a custom Robotic Process Automation (RPA) engine was built to retrieve weather data locked behind a legacy portal lacking modern API access.
HTTP scraping was insufficient due to the target’s frame-based architecture and session management. A highly customized, extensible automation layer was therefore built using Selenium to autonomously navigate the legacy interface, handling multi-step selection logic for weather stations and parameters—effectively creating a “phantom API” over a GUI-only backend.
The system was designed to operate autonomously in a headless environment, orchestrating the scheduled download of TSV-like files as they are generated. These artifacts are subsequently normalized and synced to an S3 Data Lake, decoupling the downstream analytical layers from the fragility of the acquisition method and ensuring that recent environmental data is always available for correlation.
Some key features of the system include:
- **Fluent Interface Pattern**: The core `Automata` class wraps the raw Selenium WebDriver in a fluent interface. This allows for method chaining, making the automation scripts readable and declarative. Instead of verbose Selenium boilerplate, the code reads like a set of instructions: `automata.open(url).select(...).input_text(...).click(...).wait(...).click(...)`. Even logic for logins and navigation is encapsulated in high-level methods.
- **Idempotent Merging Logic**: The entire pipeline is designed to be re-runnable. The `_merge_with_existing_data` method checks S3 for existing Parquet partitions, loads them, merges newly retrieved data, and deduplicates based on timestamp and station ID. This ensures that re-running the scraper for overlapping dates does not corrupt the dataset, failed runs can be retried without data loss, and the entire historical dataset can be rebuilt from scratch if absolutely necessary.
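The deduplication core of this idea can be sketched without the S3/Parquet I/O: last-write-wins keyed on (timestamp, station ID), so applying the same batch twice is a no-op. This is a simplified stand-in for `_merge_with_existing_data`, with illustrative field names.

```python
def merge_weather_rows(existing, incoming):
    """Merge freshly scraped rows into the stored partition, deduplicating
    on (timestamp, station_id); incoming rows win on conflict."""
    by_key = {(r["timestamp"], r["station_id"]): r for r in existing}
    for row in incoming:
        by_key[(row["timestamp"], row["station_id"])] = row
    # Deterministic ordering keeps rebuilt partitions byte-comparable.
    return sorted(by_key.values(),
                  key=lambda r: (r["timestamp"], r["station_id"]))
```

Re-running the merge with the same incoming batch yields an identical dataset, which is the property that makes retries and full-history rebuilds safe.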
Data Lake and Onboarding
This project is primarily a Shiny (Python) application that lets users explore the AWS Data Lake through a reactive web interface, rather than manually inspecting metadata, schemas, and raw data files.
- **State Management via Reactive Caching**: Race conditions—where UI elements render before schemas resolve—were fixed by preloading the AWS Glue catalog into memory at startup via `initialize_data_cache`. Despite the built-in reactivity, the reactive graph was also hardened by enforcing explicit dependencies in analysis functions, ensuring stale computations are invalidated only when the full context is available.
- **Hybrid Compute & Dual-Layer Caching**: The platform bridges serverless and local compute. Heavy data retrieval is offloaded to AWS Athena, while fine-grained statistical tests (ANOVA, OLS) run in-memory. To support this, a dual-layer caching strategy was implemented: global schema caching for UI responsiveness, and per-session query-result caching to prevent redundant S3 scans and preserve Athena concurrency limits.
- **Semantic Data Integrity**: To address “schema-on-read” ambiguity, the system relies on configuration (`ui_config.py`) rather than automated detection to enforce data types. This guarantees that identifiers such as numeric IDs are treated as categorical variables, preventing erroneous regression analyses and ensuring the robustness of the automated statistical engine.
- **Client-Side Infrastructure Automation**: A PowerShell automation script encapsulates complex ODBC connection parameters (e.g., AWS region, IAM profiles, S3 output locations) into a single execution. It ensures every analyst’s local environment is identical, effectively decoupling connectivity troubleshooting from the actual data analysis work.
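A toy sketch of the two cache layers, with counters standing in for Glue and Athena calls. All names are illustrative, not the application's actual API.

```python
from functools import lru_cache

calls = {"glue": 0, "athena": 0}

@lru_cache(maxsize=None)
def table_schema(table_name):
    """Global layer: schema fetched once per process, shared by all sessions."""
    calls["glue"] += 1
    return ("participant_id", "hr", "timestamp")  # stand-in for Glue metadata

class AnalystSession:
    """Per-session layer: repeated queries within one session reuse cached
    results instead of re-scanning S3 through Athena."""
    def __init__(self):
        self._results = {}

    def query(self, sql):
        if sql not in self._results:
            calls["athena"] += 1
            self._results[sql] = f"rows for: {sql}"  # stand-in for Athena
        return self._results[sql]
```

Scoping query results per session (rather than globally) keeps one analyst's cached results from masking another's, while still capping each session's draw on Athena concurrency.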