Health & monitoring

Two scripts in scripts/ cover application health and host/container health. Run them after every deploy and before any demo.

Application health — scripts/health-check.sh

A two‑tier sweep against the live API.

  • Tier 1 (no token): gateway, all microservices (/api/v1/health/services), and the auth + DB write path.
  • Tier 2 (with a token): read‑only probes across wallet, agreements, distributions, settlements, funding, KYC.

# Liveness only (anywhere):
BASE=https://api.vistus.io bash scripts/health-check.sh

# Full check (from the server or your machine):
BASE=https://api.vistus.io TOKEN="<jwt>" PROFILE_ID="<business-profile-id>" \
  ADDR=0xYourWallet bash scripts/health-check.sh

Reading results:

Result            Meaning
✓ 200             working
✗ 401             token expired or invalid (access tokens are short‑lived; grab a fresh one)
✗ 403             authorization, not failure: a business endpoint hit without owning the PROFILE_ID, or a personal account hitting a business endpoint
✗ 5xx             the service or its DB is failing; check $DC logs -f service-<name>
ℹ reconciliation  informational: the gateway's health aggregator doesn't reach that worker (see below); verify with docker compose ps service-reconciliation
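
To make the table concrete, here is a minimal sketch of how a single probe could be classified. It is not taken from scripts/health-check.sh: the function names classify_status and probe are hypothetical, and the messages only paraphrase the table. BASE and TOKEN are the same variables used in the commands above.

```shell
# Hypothetical helper: map an HTTP status code to the meaning in the table.
classify_status() {
  case "$1" in
    200)         echo "ok: working" ;;
    401)         echo "auth: token expired or invalid, fetch a fresh JWT" ;;
    403)         echo "authz: profile or account type mismatch, not an outage" ;;
    5[0-9][0-9]) echo "fail: service or DB problem, check the service logs" ;;
    *)           echo "other: unexpected status $1" ;;
  esac
}

# Hypothetical helper: request one path and print its classification.
# Sends the Authorization header only when TOKEN is set (tier 2).
probe() {
  local path="$1" code
  if [ -n "${TOKEN:-}" ]; then
    code=$(curl -s -o /dev/null -w '%{http_code}' \
      -H "Authorization: Bearer $TOKEN" "$BASE$path")
  else
    code=$(curl -s -o /dev/null -w '%{http_code}' "$BASE$path")
  fi
  printf '%s %s\n' "$path" "$(classify_status "$code")"
}
```

For example, after setting BASE, running probe /api/v1/health/services prints the path followed by one of the messages above. curl reports 000 when it cannot connect at all, which falls through to the "other" branch.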

Get a JWT from a logged‑in browser: DevTools → Network → any api.vistus.io request → authorization: Bearer …. For PROFILE_ID, switch to business mode and copy the x-axon-profile-id header (or query profiles where type='BUSINESS').
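
Because a stale token shows up as 401 on every tier 2 probe, it can help to check the token before running the full sweep. The sketch below assumes the JWT carries the standard exp claim (seconds since the epoch) and that base64 -d is available, as on GNU coreutils; neither is confirmed for this API. jwt_exp and jwt_seconds_left are hypothetical names. The signature is not verified, only the payload is read.

```shell
# Hypothetical helper: print the exp claim of a JWT passed as $1.
jwt_exp() {
  local payload
  # Take the middle segment and convert base64url to standard base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that base64url strips.
  case $(( ${#payload} % 4 )) in
    2) payload="$payload==" ;;
    3) payload="$payload=" ;;
  esac
  printf '%s' "$payload" | base64 -d |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p'
}

# Hypothetical helper: seconds until the token expires (negative if expired).
jwt_seconds_left() {
  local exp
  exp=$(jwt_exp "$1")
  [ -n "$exp" ] || { echo "no exp claim found" >&2; return 1; }
  echo $(( exp - $(date +%s) ))
}
```

Usage: jwt_seconds_left "$TOKEN" prints the remaining lifetime; a negative number means a fresh token is needed before the full check.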

Host & container health — scripts/infra-check.sh

Snapshots host CPU/memory/disk, per‑container resource use, and stability (restarts, OOM kills).

bash scripts/infra-check.sh         # add sudo if it says dmesg needs it

It flags: available RAM below 20% and again below 10%, disk usage at or above 80% and again at 90%, any container not running (db‑migrate's Exited state is expected, since it is a one‑shot job), restart loops, and OOM kills.
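
The two-level thresholds can be sketched as small classifiers. This is an illustration, not the logic of scripts/infra-check.sh: the function names and the ok/warn/crit labels are assumptions, and ram_avail_pct assumes a Linux host where "available" means MemAvailable from /proc/meminfo.

```shell
# Hypothetical: classify available RAM, given as an integer percentage.
ram_level() {
  if   [ "$1" -lt 10 ]; then echo crit
  elif [ "$1" -lt 20 ]; then echo warn
  else                       echo ok
  fi
}

# Hypothetical: classify used disk space, given as an integer percentage.
disk_level() {
  if   [ "$1" -ge 90 ]; then echo crit
  elif [ "$1" -ge 80 ]; then echo warn
  else                       echo ok
  fi
}

# Available RAM as a percentage of total (Linux only, reads /proc/meminfo).
ram_avail_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { printf "%d\n", avail * 100 / total }' /proc/meminfo
}
```

On a Linux host, ram_level "$(ram_avail_pct)" prints the current RAM level.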

Disk hygiene

Frequent --no-cache rebuilds pile up Docker build cache — the usual disk hog. Reclaim safely (never --volumes, which would delete the database):

docker system df                              # see what's using space
docker system prune -f                        # stopped containers, unused networks, dangling images, dangling build cache
docker builder prune -f --reserved-space 20GB # trim build cache, keep ~20GB recent
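
If you would rather prune only when the disk is actually filling up, a guard along these lines may help. It is a sketch under assumptions: disk_used_pct and reclaim_plan are hypothetical names, the 80% default mirrors the lower disk threshold above, and it measures the root filesystem, which is only right if Docker's data directory lives there. It prints the commands instead of running them.

```shell
# Used percentage of the filesystem that holds the given path (default /).
disk_used_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Print the reclaim commands when usage reaches the threshold ($1, default 80).
# Nothing is executed; review the output before running it by hand.
reclaim_plan() {
  local threshold="${1:-80}" used
  used=$(disk_used_pct /)
  if [ "$used" -ge "$threshold" ]; then
    echo "docker system prune -f"
    echo "docker builder prune -f --reserved-space 20GB"
  else
    echo "# disk at ${used}%, below ${threshold}%: nothing to do"
  fi
}
```

Printing rather than executing keeps the destructive step deliberate, in line with the warning about --volumes.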

Known monitoring gap

The gateway's /api/v1/health/services aggregator can't reach service-reconciliation (it pings the wrong host/path), so it reports "unreachable" even when the worker is healthy. The worker itself has a Docker healthcheck at /api/v1/health. Until the aggregator is fixed, confirm the worker with docker compose ps service-reconciliation. The fix is a small gateway change: add SERVICE_RECONCILIATION_HOST to the gateway env, correct the path in apps/api-gateway/src/health/health.controller.ts, and rebuild the gateway.
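
For scripting the interim check, the worker's Docker healthcheck status can be read directly instead of eyeballing docker compose ps. A minimal sketch, assuming it runs from the directory that holds the compose file, that the service runs as a single container, and that the healthcheck mentioned above is defined; service_health is a hypothetical name.

```shell
# Print the Docker healthcheck status (starting, healthy, unhealthy) of a
# compose service, or "not-running" when no running container is found.
service_health() {
  local svc="${1:-service-reconciliation}" cid
  cid=$(docker compose ps -q "$svc")
  if [ -z "$cid" ]; then
    echo "not-running"
    return 1
  fi
  docker inspect --format '{{.State.Health.Status}}' "$cid"
}
```

Calling service_health with no argument checks service-reconciliation; a result of healthy means the worker is fine even though the aggregator reports it as unreachable.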
