# AGENTS.md

## Purpose

This repository contains a BI and LiveOps system for a coloring game. Agents working in this repo should optimize for data correctness, operational safety, and low-risk incremental changes.

Use this file as the default project behavior guide for future coding, debugging, and optimization work.

## Repository Overview

- `oms/`: Node.js + TypeScript backend, data services, cron jobs, admin APIs.
- `omsapp/`: Angular admin dashboard.
- `oms/services/event-api-service.ts`: receives analytics events and publishes to RabbitMQ.
- `oms/services/log-service.ts`: consumes RabbitMQ events and writes rotating logs.
- `oms/services/ingestor-service.ts`: consumes RabbitMQ events and writes to ClickHouse and MongoDB.
- `oms/services/cron-jobs/done-rate.ts`: daily completion-rate aggregation job.
- `oms/src/services/clickhouseService.ts`: ClickHouse table management and query wrapper.
- `oms/scripts/`: operational and migration scripts.

## Working Principles

1. Prefer small, reversible changes over broad rewrites.
2. Preserve production behavior unless the task explicitly requires a behavior change.
3. Treat data scripts, cron jobs, and ingestion paths as high-risk surfaces.
4. When changing analytics logic, explain the data impact clearly.
5. When changing operational scripts, favor idempotent and restart-safe behavior.

## High-Risk Areas

Changes in these areas require extra care and focused validation:

- RabbitMQ publish and consume paths
- ClickHouse schema, partitioning, and migration scripts
- MongoDB aggregation and batch update logic
- Cron jobs that scan large datasets
- PM2-managed services and cutover scripts

## Default Behavior Expectations

### Code Changes

- Fix the narrowest correct slice first.
- Do not refactor unrelated code while touching a critical data path.
- Reuse existing service abstractions unless they are the root cause.
- Keep public API and payload formats stable unless explicitly asked to change them.

### Data and Analytics Changes

- Prefer query-shape optimization before changing product semantics.
- For ClickHouse queries, always consider:
  - partition pruning
  - scanned time range
  - aggregation fanout
  - whether multiple scans can be merged into one
- For long-running jobs, add execution timing and result-size logs when useful.
- Use half-open time ranges: `[start, nextStart)`.

### Operational Scripts

- Scripts should default to safe behavior.
- Prefer dry-run by default for destructive or cutover actions.
- Print the exact SQL or command plan before executing.
- Avoid relying on host-installed database tools when the repo already uses Dockerized services.
- When reconciling data, prefer idempotent logic keyed by stable identifiers such as `log_id`.

### Frontend and Generated Assets

- Do not manually edit built Angular assets under `oms/public/app/` unless explicitly asked.
- Prefer editing source files under `omsapp/src/` and rebuilding.

## Validation Rules

After making changes, prefer the narrowest useful validation in this order:

1. file-level type or syntax validation
2. targeted script execution or query validation
3. focused runtime check on the changed service or cron job
4. broader build only if needed

For database or migration work:

1. validate counts before and after
2. validate per-month or per-partition distribution when relevant
3. keep an explicit rollback path

## Environment Conventions

- This project commonly runs ClickHouse in Docker.
- Prefer `docker exec ... clickhouse-client ...` over assuming `clickhouse-client` exists on the host.
- PM2 is used in production-like environments.
- Be careful not to restart unrelated services during operational changes.

## Current Project-Specific Guidance

### Done Rate

- `oms/services/cron-jobs/done-rate.ts` is a hotspot.
- Keep ClickHouse aggregation consolidated when possible.
- Watch both ClickHouse query time and MongoDB update time.

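
For a daily job such as done-rate, the half-open range convention `[start, nextStart)` might look like the sketch below. The helper names, the UTC day boundary, and the assumption that `time` is a `DateTime` column compared against UTC literals are illustrative and not taken from the repo.

```typescript
// Hypothetical helpers: names and the UTC day boundary are assumptions,
// not existing code from this repository.
interface TimeRange {
  start: Date;     // inclusive lower bound
  nextStart: Date; // exclusive upper bound (start of the next window)
}

// Build the half-open UTC day window [start, nextStart) that contains `at`.
function dayRange(at: Date): TimeRange {
  const start = new Date(
    Date.UTC(at.getUTCFullYear(), at.getUTCMonth(), at.getUTCDate()),
  );
  const nextStart = new Date(start.getTime() + 24 * 60 * 60 * 1000);
  return { start, nextStart };
}

// True when `t` falls inside the half-open range.
function inRange(t: Date, r: TimeRange): boolean {
  return t.getTime() >= r.start.getTime() && t.getTime() < r.nextStart.getTime();
}

// Format a Date as a 'YYYY-MM-DD HH:MM:SS' literal in UTC.
function toSqlDateTime(d: Date): string {
  return d.toISOString().slice(0, 19).replace("T", " ");
}

// Render the WHERE fragment as plain comparisons on the raw `time` column.
function timePredicate(r: TimeRange): string {
  return `time >= '${toSqlDateTime(r.start)}' AND time < '${toSqlDateTime(r.nextStart)}'`;
}
```

Because the upper bound is exclusive, consecutive windows never overlap and an event stamped exactly at midnight is counted once. Filtering on the raw `time` column rather than a wrapped expression is the kind of predicate that generally lets ClickHouse prune monthly partitions, though this should be confirmed against the actual query plan.
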
### Ingestor Reliability

- `oms/services/ingestor-service.ts` currently accepts a small amount of analytics loss as a tradeoff.
- Do not change acknowledgement semantics unless the task explicitly targets ingestion reliability.

### ClickHouse Storage

- The `events` table is partitioned monthly by `toYYYYMM(time)`.
- Future schema changes must preserve partition-aware query patterns.

## Tracking and Follow-Up

- Use `oms/OPTIMIZATION_TRACKER.md` as the source of truth for known completed and pending optimizations.
- When finishing a meaningful optimization, update that tracker.

## When Unsure

- Choose the safer operational path.
- Prefer observability over speculation.
- Add logs and narrow validation before making a second larger change.
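
One way to make the safer path the default in an operational script is to print the plan and execute nothing unless an explicit flag is passed. The sketch below is a minimal illustration: the `--execute` flag, the `Plan` shape, and the `runPlan` function are hypothetical and are not existing conventions in this repo.

```typescript
// Hypothetical sketch: the `--execute` flag and these types are assumptions,
// not existing conventions in this repository.
interface Plan {
  description: string;
  statements: string[]; // exact SQL or shell commands, in execution order
}

interface RunResult {
  dryRun: boolean;
  printed: string[];  // every line that was logged
  executed: string[]; // statements actually run (empty in dry-run)
}

// Print the full plan first; run it only when `--execute` is present in argv.
function runPlan(
  plan: Plan,
  argv: string[],
  exec: (statement: string) => void,
  log: (line: string) => void = console.log,
): RunResult {
  const dryRun = argv.indexOf("--execute") === -1;
  const printed: string[] = [];
  const executed: string[] = [];
  const emit = (line: string): void => {
    printed.push(line);
    log(line);
  };

  emit(`${dryRun ? "[dry-run]" : "[execute]"} ${plan.description}`);
  for (const statement of plan.statements) {
    emit(`  ${statement}`);
  }

  if (dryRun) {
    emit("No changes made. Re-run with --execute to apply.");
    return { dryRun, printed, executed };
  }

  for (const statement of plan.statements) {
    exec(statement);
    executed.push(statement);
  }
  return { dryRun, printed, executed };
}
```

A script would pass its own argument list (for example `process.argv.slice(2)`) and a function that actually runs each statement. Keeping `exec` injectable means the printed plan and the executed plan come from the same list, so they cannot drift apart.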