Operations (Runbook)

How to configure, run, verify, and operate the bot. For why the operational constraints exist, follow the ADR links.

All config is loaded by pydantic-settings from environment / .env (src/abitly_bot/config.py). Template: .env.example.

| Variable | Default | Purpose |
| --- | --- | --- |
| BOT_TOKEN | (required) | Telegram bot token (SecretStr). |
| ADMIN_IDS | "" | Comma-separated Telegram user ids allowed to run admin commands. |
| ACCOUNT_LINK_URL | "" | Base URL for the website→bot account-link button. |
| DB_HOST / DB_PORT | (no default) / 5432 | PostgreSQL host/port. |
| DB_USERNAME / DB_PASSWORD | (required) | DB credentials (SecretStr). |
| DB_NAME | (required) | Database name. |
| DB_SCHEMA | abitly | Shared schema; set as search_path per connection. |
| DB_SSL | true | Keep TLS verification ON (ADR 0007). |
| DB_CA_FILE | null | Path to the managed-DB CA cert, if the provider uses its own CA. |
| REDIS_HOST / REDIS_PORT | localhost / 6379 | Redis (shared with backend). |
| REDIS_PASSWORD / REDIS_DB | null / 0 | Redis auth / logical DB. |
| DEFAULT_TELEGRAM_MESSAGE_MAX_RETRY | 3 | 429 retry attempts before dropping. |
| DEFAULT_TELEGRAM_MESSAGE_RETRY_DELAY_MS | 1000 | Base retry delay. |
| SEND_MAX_CONCURRENCY | 5 | Max concurrent outbound sends. |
| SEND_RATE_PER_SECOND | 30 | Global outbound rate cap (aiolimiter). |
| PORT | 3000 | aiohttp /healthcheck port. |
| TZ | Europe/Kyiv | APScheduler timezone for the 07:00 job. |
| LOG_LEVEL | INFO | Log level. |

Secrets: BOT_TOKEN, DB_PASSWORD, REDIS_PASSWORD are SecretStr and must come from the platform’s secret store, never committed. .env and .secrets/ are gitignored.
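
As an illustration of the ADMIN_IDS format, here is a minimal sketch of turning the comma-separated value into a set of integer ids. The helper name `parse_admin_ids` is hypothetical; the actual parsing lives in src/abitly_bot/config.py and may differ.

```python
def parse_admin_ids(raw: str) -> frozenset[int]:
    """Parse a comma-separated ADMIN_IDS value into a set of user ids.

    Hypothetical helper for illustration only. Empty segments (for
    example a trailing comma) are ignored; the default "" yields an
    empty set, meaning nobody may run admin commands.
    """
    return frozenset(int(part) for part in raw.split(",") if part.strip())
```

For example, `parse_admin_ids("1001, 1002")` yields the two ids, and a non-numeric segment raises `ValueError` so a typo is caught at startup rather than silently dropped.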

```sh
uv venv && source .venv/bin/activate   # or python -m venv .venv
uv pip install -e ".[dev]"             # or pip install -e ".[dev]"
cp .env.example .env                   # fill BOT_TOKEN (enough for a basic /start)
python -m abitly_bot                   # starts polling + healthcheck on $PORT
```

Smoke check: the bot answers /start; GET http://localhost:$PORT/healthcheck → OK.
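
The HTTP half of the smoke check can be scripted. The sketch below uses only the standard library; the function name `is_alive` and its defaults are assumptions for illustration, not part of the project.

```python
import urllib.error
import urllib.request


def is_alive(port: int, host: str = "localhost", timeout: float = 3.0) -> bool:
    """Return True if GET /healthcheck answers with HTTP 200.

    Hypothetical helper. Proxies are bypassed on purpose: this probes a
    local process, so an HTTP proxy from the environment must not be used.
    """
    url = f"http://{host}:{port}/healthcheck"
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False
```

A `True` result only confirms that the process is listening; whether the bot answers /start still has to be checked in Telegram.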

Quality gate (offline — no DB/Telegram needed)

```sh
ruff check .   # lint (E,F,I,UP,B,ASYNC)
mypy src       # strict
pytest         # unit tests; integration tests auto-skip without DB_* env
```

This is the gate that must stay green for every change.

  • The healthcheck server starts first so the platform’s probe passes during boot.
  • The bot then does a fail-fast DB check (SELECT 1); if the DB is unreachable it disposes resources and exits non-zero (SystemExit(1)), so the platform restarts it rather than serving a broken bot. See Runtime Flows.
  • Liveness endpoint: GET /healthcheck → 200 OK on $PORT. Note this proves the process is up, not that the DB is reachable (that is the fail-fast’s job at boot).
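
The fail-fast step described above can be sketched as follows. This is not the project's startup code: `ping_db` and `dispose` are hypothetical callables standing in for the real SELECT 1 check and resource cleanup.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable

log = logging.getLogger("startup")


async def fail_fast_db_check(
    ping_db: Callable[[], Awaitable[None]],  # hypothetical: executes SELECT 1
    dispose: Callable[[], Awaitable[None]],  # hypothetical: closes engine, Redis, ...
) -> None:
    """Exit non-zero if the database cannot be reached at boot."""
    try:
        await ping_db()
    except Exception:
        log.exception("database unreachable at boot; exiting")
        # Release resources first, then exit with code 1 so the
        # platform restarts the process instead of serving a broken bot.
        await dispose()
        raise SystemExit(1)
```

Because the healthcheck server is already up at this point, the platform's probe keeps passing while the check runs; only the non-zero exit signals the failure.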

TLS verification is always on (ADR 0007). For a managed DB that presents its own CA:

  1. Obtain the provider’s CA certificate (download the official cert; do not disable verification).
  2. Point DB_CA_FILE at it.
  3. Ensure the host’s egress IP is on the provider’s trusted-sources allowlist, or TCP to the DB port will silently time out (packets dropped).

See docs/MIGRATION_STATUS.md (Blocker 1) for the concrete provider steps used during bring-up.
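
As a rough illustration of steps 2 and 3, the sketch below builds a verifying TLS context from a CA file and probes TCP reachability with an explicit timeout. Both function names are hypothetical, and how the bot actually passes DB_CA_FILE to its database driver is not shown here.

```python
import socket
import ssl
from typing import Optional


def build_ssl_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a TLS context that always verifies the server certificate.

    With ca_file=None the system trust store is used; otherwise the
    given CA bundle (the DB_CA_FILE path) is loaded in addition.
    """
    # create_default_context enables CERT_REQUIRED and hostname checks.
    return ssl.create_default_context(cafile=ca_file)


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout.

    A False result after the full timeout (rather than an immediate
    refusal) is the typical symptom of a missing allowlist entry.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `can_reach` from the deployment host, not from a laptop, is what matters, since the allowlist is keyed on the egress IP.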

These need live (read-only) DB access and run under the integration marker:

```sh
export DB_CA_FILE=/abs/path/to/ca.pem   # if the DB uses a private CA
# plus DB_HOST/DB_PORT/DB_USERNAME/DB_PASSWORD/DB_NAME in the env
pytest tests/integration -v -m integration
```

test_schema_reflection.py is the schema-drift safety net (ADR 0006); it tolerates the two pending filter tables but surfaces any other mismatch.

  • What: daily open-day reminders for events 1 or 3 days away.
  • When: 07:00 in TZ (default Europe/Kyiv).
  • Semantics: coalesce=True (a restart-missed run fires once on resume), misfire_grace_time=3600. Jobstore is in-memory — job state is not shared with the backend or across instances. Code: infra/scheduler.py.
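
To make the job's rules concrete, here is a small standard-library sketch of the two decisions involved: which events are due a reminder, and whether a late run still falls inside the grace window. The function names are hypothetical, and the grace check reflects the usual reading of APScheduler's misfire_grace_time rather than the code in infra/scheduler.py.

```python
from datetime import date, datetime, timedelta

# Reminders go out when the open day is exactly 1 or 3 days away.
REMINDER_OFFSETS_DAYS = frozenset({1, 3})


def is_reminder_due(event_day: date, today: date) -> bool:
    """True if the event is 1 or 3 calendar days after today."""
    return (event_day - today).days in REMINDER_OFFSETS_DAYS


def within_grace(scheduled: datetime, now: datetime, grace_seconds: int = 3600) -> bool:
    """True if a run scheduled at `scheduled` may still fire at `now`."""
    lateness = now - scheduled
    return timedelta(0) <= lateness <= timedelta(seconds=grace_seconds)
```

With the default grace of 3600 seconds, a process that resumes at 07:40 would still run the 07:00 job, while one that resumes at 08:30 would skip it until the next day.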

Outbound fan-outs go through MessageSender: bounded concurrency (SEND_MAX_CONCURRENCY), a global rate cap (SEND_RATE_PER_SECOND), and 429 retry via TelegramRetryAfter (sleep retry_after, re-queue up to …_MAX_RETRY). Blocked users (TelegramForbiddenError) and unexpected errors are logged and dropped per message without aborting the batch. Tune via the SEND_* / …_RETRY* env vars. Details: ADR 0005.
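
The retry-and-drop behaviour can be sketched with asyncio alone. This is not the MessageSender implementation: `RetryAfter` and `Forbidden` are stand-ins for the aiogram exceptions, `send_all` is a hypothetical name, and the global rate cap is omitted to keep the example short.

```python
import asyncio
from collections.abc import Awaitable, Callable


class RetryAfter(Exception):
    """Stand-in for a 429 error carrying the server's retry delay."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"retry after {retry_after}s")
        self.retry_after = retry_after


class Forbidden(Exception):
    """Stand-in for 'the user blocked the bot'."""


SendFn = Callable[[int, str], Awaitable[None]]


async def send_all(
    send: SendFn,
    chat_ids: list[int],
    text: str,
    *,
    max_concurrency: int = 5,
    max_retry: int = 3,
) -> dict[int, str]:
    """Fan out one message; failures affect only their own chat."""
    gate = asyncio.Semaphore(max_concurrency)
    outcome: dict[int, str] = {}

    async def deliver(chat_id: int) -> None:
        retries = 0
        while True:
            try:
                async with gate:  # at most max_concurrency sends in flight
                    await send(chat_id, text)
                outcome[chat_id] = "sent"
                return
            except RetryAfter as exc:
                if retries >= max_retry:
                    outcome[chat_id] = "dropped"
                    return
                retries += 1
                # Sleep outside the semaphore so other sends can proceed.
                await asyncio.sleep(exc.retry_after)
            except Forbidden:
                outcome[chat_id] = "blocked"
                return
            except Exception:
                outcome[chat_id] = "failed"
                return

    await asyncio.gather(*(deliver(chat_id) for chat_id in chat_ids))
    return outcome
```

In this sketch a message is attempted once plus up to `max_retry` retries, which is one possible reading of "retry attempts before dropping"; the real sender may count differently.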

Run exactly one instance. Two independent constraints enforce this:

  1. Long-polling — multiple pollers double-deliver updates (ADR 0001).
  2. In-process state — the MessageSender queue/limiter and APScheduler jobstore are per-process; a second instance would not share the rate budget and would double-fire the daily job (ADR 0005).

FSM state is shared (Redis), but that alone is not enough to make the bot multi-instance-safe. Going multi-instance would require: webhook ingress (or a single leader poller), a shared/distributed rate limiter, and a single scheduler owner.

On KeyboardInterrupt / SystemExit, main()’s finally stops the scheduler (wait=False) and closes the bot session, Redis, DB engine, and healthcheck runner. Code: src/abitly_bot/__main__.py:83-89.
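
The shape of that cleanup can be sketched as a try/finally around the serving coroutine. The names below are hypothetical, every closer is modelled as an async callable (a synchronous call such as a scheduler shutdown would need a thin wrapper), and continuing past a failed closer is a choice made for this sketch, not necessarily what main() does.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable

log = logging.getLogger("shutdown")

Closer = Callable[[], Awaitable[None]]


async def run_with_cleanup(
    serve: Callable[[], Awaitable[None]],
    closers: list[tuple[str, Closer]],
    closed: list[str],
) -> None:
    """Run `serve`, then close every resource even if `serve` raised.

    `closed` collects the names that shut down cleanly, in order.
    """
    try:
        await serve()
    finally:
        # Runs on normal return, KeyboardInterrupt, and SystemExit alike.
        for name, close in closers:
            try:
                await close()
                closed.append(name)
            except Exception:
                # One failing resource should not block the rest.
                log.exception("failed to close %s", name)
```

The closers would be listed in the order given above: scheduler, bot session, Redis, DB engine, healthcheck runner.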

  1. Allowlist the egress IP + provision DB_CA_FILE; run the integration tests green.
  2. Land the backend prerequisites (mint endpoint and the two filter tables; verify M2M names). See Data Model.
  3. Live smoke against a staging token: /start, paste an offer URL, /myoffers, /statistics.
  4. Point the production token at this app (Ф7 wiring is ready).

Living status: docs/MIGRATION_STATUS.md.