Health & monitoring
Two scripts in scripts/ cover application health and host/container health. Run them after every deploy and before any demo.
Application health — scripts/health-check.sh
A two‑tier sweep against the live API.
- Tier 1 (no token): gateway, all microservices (/api/v1/health/services), and the auth + DB write path.
- Tier 2 (with a token): read‑only probes across wallet, agreements, distributions, settlements, funding, KYC.
# Liveness only (anywhere):
BASE=https://api.vistus.io bash scripts/health-check.sh
# Full check (from the server or your machine):
BASE=https://api.vistus.io TOKEN="<jwt>" PROFILE_ID="<business-profile-id>" \
ADDR=0xYourWallet bash scripts/health-check.sh
Reading results:
| Result | Meaning |
|---|---|
| ✓ 200 | working |
| ✗ 401 | token expired/invalid (access tokens are short‑lived, so grab a fresh one) |
| ✗ 403 | authorization, not failure: a business endpoint hit without owning the PROFILE_ID, or a personal account hitting a business endpoint |
| ✗ 5xx | the service or its DB is unhappy; check $DC logs -f service-<name> |
| ℹ reconciliation | informational: that worker isn't in the gateway's health aggregator (see below); verify with docker compose ps service-reconciliation |
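As a rough illustration of how one probe maps onto the results above, here is a minimal sketch, assuming curl is installed. The helper names explain_status and probe are hypothetical and are not taken from scripts/health-check.sh; only BASE, TOKEN, and /api/v1/health/services come from this document.

```shell
# Hypothetical helpers for illustration; not the contents of scripts/health-check.sh.

# Map an HTTP status code to the meaning listed in the results table.
explain_status() {
  case "$1" in
    200) echo "working" ;;
    401) echo "token expired or invalid" ;;
    403) echo "authorization, not failure" ;;
    5??) echo "service or its DB is unhappy" ;;
    *)   echo "not covered by the table" ;;
  esac
}

# Probe one path under $BASE and print a mark, the status code, and its meaning.
# The bearer token is sent only when TOKEN is set (a Tier 2 style request).
probe() {
  probe_url="${BASE}$1"
  if [ -n "${TOKEN:-}" ]; then
    probe_code=$(curl -s -o /dev/null -w '%{http_code}' \
      -H "Authorization: Bearer ${TOKEN}" "$probe_url")
  else
    probe_code=$(curl -s -o /dev/null -w '%{http_code}' "$probe_url")
  fi
  if [ "$probe_code" = "200" ]; then probe_mark="✓"; else probe_mark="✗"; fi
  printf '%s %s %s (%s)\n' "$probe_mark" "$probe_code" "$1" \
    "$(explain_status "$probe_code")"
}
```

For example, after setting BASE=https://api.vistus.io, running probe /api/v1/health/services prints one result line for the service aggregator.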
Get a JWT from a logged‑in browser: DevTools → Network → any api.vistus.io request → authorization: Bearer …. For PROFILE_ID, switch to business mode and copy the x-axon-profile-id header (or query profiles where type='BUSINESS').
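Because access tokens are short‑lived, it can help to check a copied token's expiry before a full run. The sketch below is an assumption-laden helper, not part of either script: it assumes the token is a standard three-segment JWT with a numeric exp claim, that base64 -d is available (GNU coreutils), and it does not verify the signature. The names jwt_exp and jwt_is_fresh are hypothetical.

```shell
# Hypothetical helpers; they only read the token locally, nothing is sent anywhere.

# Print the exp claim (Unix seconds) from a JWT's payload segment.
jwt_exp() {
  # The payload is the second dot-separated segment, base64url-encoded.
  jwt_payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url omits.
  case $(( ${#jwt_payload} % 4 )) in
    2) jwt_payload="${jwt_payload}==" ;;
    3) jwt_payload="${jwt_payload}=" ;;
  esac
  printf '%s' "$jwt_payload" | base64 -d 2>/dev/null |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Exit 0 if the token's exp lies in the future, non-zero otherwise.
jwt_is_fresh() {
  jwt_exp_value=$(jwt_exp "$1")
  [ -n "$jwt_exp_value" ] && [ "$jwt_exp_value" -gt "$(date +%s)" ]
}
```

A typical use would be jwt_is_fresh "$TOKEN" || echo "token expired, grab a fresh one" before starting the Tier 2 run.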
Host & container health — scripts/infra-check.sh
Snapshots host CPU/memory/disk, per‑container resource use, and stability (restarts, OOM kills).
bash scripts/infra-check.sh # add sudo if it says dmesg needs it
It flags available RAM below 20% (and again below 10%), disk usage at or above 80% (and again at 90%), any container that is not running (db‑migrate's Exited state is treated as the expected one‑shot), restart loops, and OOM kills.
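The two-level thresholds can be pictured with a small sketch. This is not the logic of scripts/infra-check.sh, which may compute things differently. The labels "warning" and "critical" are my own assumption for the two levels, the function names are hypothetical, and the RAM reading assumes a Linux host with /proc/meminfo.

```shell
# Hypothetical sketch of the thresholds; not the contents of scripts/infra-check.sh.

# Classify "percent of RAM available": lower is worse.
ram_level() {
  if [ "$1" -lt 10 ]; then echo "critical"
  elif [ "$1" -lt 20 ]; then echo "warning"
  else echo "ok"; fi
}

# Classify "percent of disk used": higher is worse.
disk_level() {
  if [ "$1" -ge 90 ]; then echo "critical"
  elif [ "$1" -ge 80 ]; then echo "warning"
  else echo "ok"; fi
}

# Percent of RAM available, from MemAvailable / MemTotal (Linux only).
ram_available_pct() {
  awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
       END {if (t > 0) printf "%d\n", a * 100 / t}' /proc/meminfo
}

# Percent of the root filesystem in use, read from POSIX-format df output.
disk_used_pct() {
  df -P / | awk 'NR == 2 {sub(/%/, "", $5); print $5}'
}
```

For instance, echo "RAM: $(ram_level "$(ram_available_pct)") disk: $(disk_level "$(disk_used_pct)")" gives a one-line summary of the host.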
Disk hygiene
Frequent --no-cache rebuilds pile up Docker build cache — the usual disk hog. Reclaim safely (never --volumes, which would delete the database):
docker system df # see what's using space
docker system prune -f # stopped containers, unused networks, dangling images
docker builder prune -f --reserved-space 20GB # trim build cache, keep ~20GB recent
Known monitoring gap
The gateway's /health/services aggregator can't reach service-reconciliation (it pings the wrong host/path), so it reports "unreachable" even when the worker is healthy. The worker itself has a Docker healthcheck at /api/v1/health. Until the aggregator is fixed, confirm the worker with docker compose ps service-reconciliation. The fix is a small gateway change: add SERVICE_RECONCILIATION_HOST to the gateway env, correct the path in apps/api-gateway/src/health/health.controller.ts, and rebuild the gateway.
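To read the worker's Docker healthcheck result directly, without scanning the ps table, something like the following sketch may help. It assumes a single service-reconciliation container and that it is run from the directory holding the compose file; worker_health is a hypothetical helper name.

```shell
# Hypothetical helper: print the Docker healthcheck status of the worker
# (one of "starting", "healthy", "unhealthy"), or "not running" if no
# container exists for the service.
worker_health() {
  worker_cid=$(docker compose ps -q service-reconciliation)
  if [ -z "$worker_cid" ]; then
    echo "not running"
    return 1
  fi
  docker inspect --format '{{.State.Health.Status}}' "$worker_cid"
}
```

Calling worker_health after a deploy should print healthy once the container's own check against /api/v1/health has passed, regardless of what the gateway aggregator reports.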