💾 Lightflow Backup

Lightflow is a generic task runner: the backup logic lives entirely in the scripts it runs and in the host-side gate they reach over SSH. This tutorial configures an installed Lightflow instance to drive the nightly OneDrive backups of the Podman services (calibre, immich, kouizine, n8n, planka, vaultwarden). It replaces the heavier Airflow stack while keeping exactly the same security model: Lightflow never reads another service's files and never holds the OneDrive token. The boundary lives in backupctl on the host.

📋 Requirements

Info

Lightflow Backup requires the installation of:


🛤️ End-to-end path

How a nightly run travels from the scheduler all the way to OneDrive — and who enforces what at each hop:

[Diagram: backup_process]


🗂️ The four-step backup model

For each backed-up service the nightly run executes four steps, in order:

| Step | Action | Box |
| --- | --- | --- |
| stop | podmanctl --stop <svc> (whole stack → consistent files) | 🟩 / 🟥 |
| sync | rclone sync data/ → OneDrive (mirror) | 🟩 / 🟥 |
| archive | tar\|gzip\|rclone rcat, retention, Sundays only | 🟩 / 🟥 / ⬜ (grey other days) |
| start | podmanctl --start <svc>, run rule always | 🟩 / 🟥 |

A step's exit code drives its box: 0 → success 🟩, 75 → skipped ⬜, anything else → failed 🟥. The start step uses the always run rule, so the service is restarted even if a previous step failed.
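
The exit-code rule above can be sketched in a few lines of Python. This is only an illustration of the mapping as the text describes it, not Lightflow's own code; the function name `box_for_exit_code` and the status strings are hypothetical.

```python
SKIP_EXIT_CODE = 75  # exit code the text above gives for a skipped step


def box_for_exit_code(code: int) -> str:
    """Map a step's exit code to the status of its box in the grid."""
    if code == 0:
        return "success"   # green box
    if code == SKIP_EXIT_CODE:
        return "skipped"   # grey box
    return "failed"        # red box, for any other exit code


if __name__ == "__main__":
    for code in (0, 75, 1, 255):
        print(code, box_for_exit_code(code))
```

A script therefore only has to pick the right exit code; it needs no other channel to report its state to the grid.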


🔐 Backup Gate

The host-side plumbing that the Lightflow tasks call through SSH:

  • rclone (OneDrive transfers)
  • backupctl (the validating gate)
  • the backup_ops SSH account

☁️ OneDrive Remote

Install rclone and configure the OneDrive remote
# install rclone (OneDrive transfers are driven by backupctl)
sudo apt update
sudo apt install -y rclone
# the ODROID is headless: the Microsoft OAuth consent needs a graphical
# browser, and its redirect is hard-wired to localhost:53682 ON THE
# MACHINE RUNNING RCLONE. An SSH tunnel makes the desktop browser's
# localhost reach the ODROID, so everything stays on the device.
#
# 1. on the DESKTOP (Windows 11: OpenSSH is built in), open the tunnel
#    and KEEP THE SESSION OPEN:
#      ssh -L 53682:localhost:53682 debian@odroid
#
# 2. inside that SSH session, run the interactive configuration:
rclone config
#      n) New remote
#      name>             onedrive
#      Storage>          onedrive
#      client_id>        (empty)
#      client_secret>    (empty)
#      region>           global
#      Edit advanced config?            n
#      Use web browser to authenticate? y   ← yes: the tunnel carries the redirect
#
# 3. rclone cannot open a browser on the ODROID: it prints the URL
#      http://127.0.0.1:53682/auth?state=...
#    open it in the DESKTOP browser and sign in to the Microsoft
#    account — the localhost redirect flows back through the tunnel
#    and rclone resumes automatically:
#
#      Your type of connection>         onedrive (personal or business)
#      Drive to use>     (select your drive → fills drive_id / drive_type)
#      Keep this "onedrive" remote?     y
# the live token belongs to root only: backupctl is the single consumer
sudo mkdir -p /etc/backupctl
sudo mv ~/.config/rclone/rclone.conf /etc/backupctl/rclone.conf
sudo chown root:root /etc/backupctl/rclone.conf
sudo chmod 0600 /etc/backupctl/rclone.conf

# verify the remote and create the backup folder
sudo rclone --config /etc/backupctl/rclone.conf lsd onedrive:
sudo rclone --config /etc/backupctl/rclone.conf mkdir onedrive:/OdroidBackup

The live OAuth token rotates — never re-apply an old copy

Microsoft rotates the refresh token on every use, and rclone persists each rotation by rewriting /etc/backupctl/rclone.conf. That file (root, 0600) is the single source of truth for the token. The Lightflow Variables never hold it: ONEDRIVE_REMOTE and ONEDRIVE_FOLDER carry only the remote name and the folder. Keep a copy of the working config in Bitwarden.

🎛️ Install backupctl

Install backupctl to /usr/local/bin
# download the script published by this documentation site
sudo curl -fsSL https://docs.fum-server.fr/files/backupctl.py -o /usr/local/bin/backupctl

# make it executable for everyone, writable by root only
sudo chmod 0755 /usr/local/bin/backupctl

# check the installation (backupctl refuses to run without root)
sudo backupctl --help

The script uses only the Python 3 standard library.

👤 Create the backup_ops SSH account

Create the backup_ops user and sudoers rule
# dedicated account used ONLY by the Lightflow container to reach backupctl.
# NOT added to the no-ssh group on purpose: SSH is its entire reason to exist,
# and the forced command below is the only thing it can ever execute.
sudo useradd --system --create-home --home-dir /home/backup_ops --shell /bin/bash backup_ops

# sudoers drop-in: keep the client's original command line visible to the
# gate (sshd exports it as SSH_ORIGINAL_COMMAND) and allow exactly ONE
# invocation as root — no wildcard
sudo tee /etc/sudoers.d/backupctl >/dev/null <<'EOF'
Defaults!/usr/local/bin/backupctl env_keep += "SSH_ORIGINAL_COMMAND"
backup_ops ALL=(root) NOPASSWD: /usr/local/bin/backupctl --ssh-gate
EOF
sudo chmod 0440 /etc/sudoers.d/backupctl

# validate the sudoers syntax before it can lock you out
sudo visudo -cf /etc/sudoers.d/backupctl

🔑 Authorize Lightflow's SSH key

Lightflow doesn't keep a private key file on disk: the key material lives in the BACKUP_SSH_KEY Variable (secret), and backup.py writes it to a private, short-lived temp file only while ssh runs. The same applies to the host-key line in BACKUP_KNOWN_HOSTS. This keeps the script generic: a future task that reaches another device just gets its own key Variable, with nothing to mount.
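
The "short-lived temp file" idea can be illustrated with a small context manager. This is a minimal sketch of the pattern, not the code backup.py actually uses; the name `secret_file` is hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def secret_file(content: str, suffix: str = ".key"):
    """Write `content` to a private temp file and delete it on exit."""
    # mkstemp creates the file readable and writable by the owner only
    fd, path = tempfile.mkstemp(prefix="lightflow-", suffix=suffix)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content.rstrip("\n") + "\n")
        yield path
    finally:
        # remove the file whether the caller succeeded or raised
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass


if __name__ == "__main__":
    with secret_file("dummy key material") as path:
        print("exists while in use:", os.path.exists(path))
    print("exists afterwards:", os.path.exists(path))
```

The file exists only inside the `with` block, so a crash in the calling code still leaves nothing behind once the block unwinds.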

Generate a key pair, authorize it, and capture the values to paste
# generate a throwaway key pair in a temp dir: the PRIVATE key will live only
# in the BACKUP_SSH_KEY Variable, never as a file on the host or a mount
TMP="$(mktemp -d)"
ssh-keygen -t ed25519 -N "" -C "lightflow-backup" -f "${TMP}/id_ed25519"

# create the account's .ssh directory (0700) before writing into it
sudo install -d -m 0700 -o backup_ops -g backup_ops /home/backup_ops/.ssh

# authorize the PUBLIC key for backup_ops with the SAME forced command
PUBKEY="$(cat "${TMP}/id_ed25519.pub")"
sudo tee -a /home/backup_ops/.ssh/authorized_keys >/dev/null <<EOF
restrict,command="sudo /usr/local/bin/backupctl --ssh-gate" ${PUBKEY}
EOF
sudo chmod 0600 /home/backup_ops/.ssh/authorized_keys
sudo chown backup_ops:backup_ops /home/backup_ops/.ssh/authorized_keys

# ---- value for BACKUP_KNOWN_HOSTS -------------------------------------------
# the container reaches the host as host.containers.internal, so the known_hosts
# line must carry THAT name in front of the host's own ed25519 public host key
printf 'host.containers.internal %s\n' \
  "$(sudo awk '{print $1, $2}' /etc/ssh/ssh_host_ed25519_key.pub)"

# ---- value for BACKUP_SSH_KEY -----------------------------------------------
# print the PRIVATE key to copy, then wipe the temp dir so nothing lingers
cat "${TMP}/id_ed25519"
rm -rf "${TMP}"

Paste the private key into the BACKUP_SSH_KEY Variable (secret) and the host.containers.internal … line into BACKUP_KNOWN_HOSTS. No private key file is kept on the host and no .ssh volume is mounted. backupctl accepts only --stop/--start/--sync/--archive over the gate. --restore stays a manual, on-host operation.
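
To make the allow-list idea concrete, here is a hypothetical sketch of how a gate could validate the client's command line (the value sshd exports as SSH_ORIGINAL_COMMAND). It is not backupctl's actual code: the function name `parse_gate_command` and the error messages are assumptions, and the action set simply mirrors the sentence above.

```python
import shlex

# Actions the paragraph above lists as accepted over the gate.
# Extend this set if your gate accepts more (backup.py also sends --sync-extra).
GATE_ACTIONS = {"--stop", "--start", "--sync", "--archive"}


def parse_gate_command(original_command: str) -> list[str]:
    """Return the backupctl arguments if the command is allowed, else raise."""
    argv = shlex.split(original_command)  # raises ValueError on bad quoting
    if len(argv) < 3 or argv[0] != "backupctl":
        raise ValueError("expected: backupctl <action> <service> ...")
    if argv[1] not in GATE_ACTIONS:
        raise ValueError(f"action {argv[1]!r} is not allowed over the gate")
    return argv[1:]


if __name__ == "__main__":
    print(parse_gate_command("backupctl --stop vaultwarden --timeout 120"))
    try:
        parse_gate_command("backupctl --restore vaultwarden")
    except ValueError as exc:
        print("refused:", exc)
```

Rejecting everything that is not explicitly listed means a new backupctl feature stays unreachable over SSH until someone adds it to the set on purpose.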


⚙️ Configure Lightflow (in the UI)

1. Create the backup pool

Pools → Add pool

  • Name: backup
  • Slots: 1

A 1-slot pool serializes every backup step across all services: two steps can never run at once.
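
The effect of a 1-slot pool can be illustrated with a semaphore. This is a sketch of the general technique, not a description of how Lightflow implements pools; `peak_concurrency` is a hypothetical helper that reports how many steps ever ran at the same time.

```python
import threading
import time


def peak_concurrency(slots: int, step_names: list[str]) -> int:
    """Run one thread per step behind a `slots`-wide pool; return the peak."""
    pool = threading.BoundedSemaphore(slots)
    guard = threading.Lock()
    state = {"running": 0, "peak": 0}

    def run_step(name: str) -> None:
        with pool:  # blocks until a slot is free
            with guard:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            time.sleep(0.01)  # stand-in for the real step
            with guard:
                state["running"] -= 1

    threads = [threading.Thread(target=run_step, args=(n,)) for n in step_names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["peak"]


if __name__ == "__main__":
    steps = ["calibre:sync", "immich:sync", "n8n:sync", "planka:sync"]
    print("peak with 1 slot:", peak_concurrency(1, steps))
```

With a single slot the peak can never exceed one, which is the property the backups rely on: no two rclone transfers or service stops overlap.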

2. Create the backup variables

Variables → Add variable (these become environment variables for the scripts):

| Key | Value | Secret |
| --- | --- | --- |
| BACKUP_HOST | backup_ops@host.containers.internal | no |
| BACKUP_SSH_KEY | (paste the SSH private key contents, see above) | yes |
| BACKUP_KNOWN_HOSTS | (paste the known_hosts host-key line, see above) | no |
| ONEDRIVE_REMOTE | onedrive | no |
| ONEDRIVE_FOLDER | /OdroidBackup | no |
| ARCHIVE_WEEKDAY | sunday | no |
| KEEP_WEEKLY | 3 | no |
| KEEP_MONTHLY | 12 | no |

The OneDrive token is not stored here. It rotates in /etc/backupctl/rclone.conf on the host.
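
Because the Variables reach the scripts as environment variables, a script can gather them in one place. The following is a minimal sketch under that assumption; the `BackupConfig` class and `load_config` function are hypothetical names, and the defaults are the values from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupConfig:
    host: str
    remote: str
    folder: str
    archive_weekday: str
    keep_weekly: int
    keep_monthly: int

    @property
    def target(self) -> str:
        """rclone-style destination, e.g. onedrive:/OdroidBackup."""
        return f"{self.remote}:{self.folder}"


def load_config(environ=None) -> BackupConfig:
    """Build the configuration from environment variables."""
    environ = os.environ if environ is None else environ
    host = environ.get("BACKUP_HOST")
    if not host:
        raise KeyError("BACKUP_HOST is required")
    return BackupConfig(
        host=host,
        remote=environ.get("ONEDRIVE_REMOTE", "onedrive"),
        folder=environ.get("ONEDRIVE_FOLDER", "/OdroidBackup"),
        archive_weekday=environ.get("ARCHIVE_WEEKDAY", "sunday").lower(),
        keep_weekly=int(environ.get("KEEP_WEEKLY", "3")),
        keep_monthly=int(environ.get("KEEP_MONTHLY", "12")),
    )


if __name__ == "__main__":
    demo = {"BACKUP_HOST": "backup_ops@host.containers.internal"}
    print(load_config(demo))
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise without touching the real environment.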

3. Create the backup scripts

backup.py
#!/usr/bin/env python3
"""
backup.py — sample Lightflow task script (OneDrive backup of a Podman service).

Lightflow itself is a generic runner: this script holds the backup logic and
reaches the host EXACTLY like the old Airflow DAGs did — over the restricted
`backup_ops` SSH forced command, which runs `sudo backupctl --ssh-gate` on the
host. Lightflow never reads another service's files and never holds the OneDrive
token; the security boundary lives in `backupctl` on the host.

One Lightflow TASK per service, made of four ordered STEPS — each step runs this
script with one action and shows up as one box in the grid:

    Step 1  stop      python3 /scripts/backup.py --stop    <service>     (on_success)
    Step 2  sync      python3 /scripts/backup.py --sync    <service>     (on_success)
    Step 3  archive   python3 /scripts/backup.py --archive <service>     (on_success)
    Step 4  start     python3 /scripts/backup.py --start   <service>     (always)

`archive` exits 75 (Lightflow's "skipped"/grey box) on non-archive days, so the
weekly archive shows grey except on ARCHIVE_WEEKDAY. `start` uses the `always`
run rule so the service is restarted even if a previous step failed.

A non-service directory on the host (e.g. a git tree owned by www-data:debian)
is mirrored with a single step — it has no stack to stop/start:

    sync-extra   python3 /scripts/backup.py --sync-extra <target>   (on_success)

`<target>` is a NAME from backupctl's EXTRA_TARGETS allow-list (the host resolves
its path; we never pass --path); the mirror lands under <remote>/extra/<target>/.

Configuration comes from Lightflow VARIABLES (injected as environment variables):

    BACKUP_HOST        backup_ops@host.containers.internal   (required)
    BACKUP_SSH_KEY     SSH private key — inline PEM value OR a file path
    BACKUP_KNOWN_HOSTS known_hosts entry — inline value OR a file path
    ONEDRIVE_REMOTE    onedrive                              (default)
    ONEDRIVE_FOLDER    /OdroidBackup                         (default)
    ARCHIVE_WEEKDAY    sunday                                (default)
    KEEP_WEEKLY        3                                     (default)
    KEEP_MONTHLY       12                                    (default)
    STOP_TIMEOUT / SYNC_TIMEOUT / ARCHIVE_TIMEOUT / START_TIMEOUT   (seconds)

Holding the SSH key (and known_hosts) inline in a secret Variable keeps the
script generic — any future device just needs its own key Variable, with no
host-side key files to provision. The key material is written to a private,
short-lived temp file only while `ssh` runs, then removed.
"""

import os
import re
import subprocess
import sys
import tempfile
import textwrap
from datetime import date

SKIP_EXIT_CODE = 75  # Lightflow renders this as a grey "skipped" box
ACTIONS = {"--stop", "--start", "--sync", "--archive", "--sync-extra"}


def env(key, default=None):
    return os.environ.get(key, default)


def materialize(value, kind):
    """Resolve a key / known_hosts VALUE to a file path that `ssh` can use.

    A Variable may hold either a filesystem PATH (legacy) or the INLINE value
    itself (preferred — generic, no host-side key files). Inline content is
    written to a private (0600) temp file; the caller removes it afterwards.
    A private key is recognised by its PEM header; a known_hosts entry by the
    whitespace separating its fields. A key whose line breaks were flattened
    to spaces by a single-line input field is re-wrapped into a valid PEM.
    Returns (path, is_temp).
    """
    # normalize line endings: a value pasted or stored with Windows CRLF
    # (or a stray \r) keeps \r inside the body, so force LF.
    text = value.replace("\r\n", "\n").replace("\r", "\n").strip()
    inline = "-----BEGIN" in text if kind == "key" else any(c.isspace() for c in text)
    if not inline:
        return value, False
    if kind == "key":
        # Some Variable fields flatten a multi-line PEM onto ONE line, turning
        # the newlines into spaces — ssh then fails with "error in libcrypto".
        # Rebuild a canonical PEM: keep the BEGIN/END markers, strip ALL
        # whitespace from the base64 body, then re-wrap at 70 columns. This is
        # a no-op on an already-correct key, so it is always safe to run.
        m = re.search(r"-----BEGIN ([A-Z0-9 ]+)-----(.*)-----END \1-----", text, re.S)
        if m:
            label = m.group(1)
            body = re.sub(r"\s+", "", m.group(2))
            text = "-----BEGIN {0}-----\n{1}\n-----END {0}-----".format(
                label, "\n".join(textwrap.wrap(body, 70))
            )
    fd, path = tempfile.mkstemp(prefix="lightflow-", suffix=f".{kind}")
    with os.fdopen(fd, "w") as f:
        f.write(text + "\n")
    os.chmod(path, 0o600)
    return path, True


def ssh_backupctl(args, timeout):
    """Run `backupctl <args> --timeout <t>` on the host via the SSH forced command."""
    host = env("BACKUP_HOST")
    if not host:
        sys.exit("error: BACKUP_HOST is not set (e.g. backup_ops@host.containers.internal)")
    key, key_tmp = materialize(env("BACKUP_SSH_KEY", "/data/.ssh/id_ed25519"), "key")
    known, known_tmp = materialize(env("BACKUP_KNOWN_HOSTS", "/data/.ssh/known_hosts"), "known_hosts")

    remote = "backupctl " + " ".join(args) + f" --timeout {timeout}"
    cmd = [
        "ssh",
        "-i", key,
        "-o", "BatchMode=yes",
        "-o", "StrictHostKeyChecking=accept-new",
        "-o", f"UserKnownHostsFile={known}",
        host,
        remote,
    ]
    print("+ " + " ".join(cmd), flush=True)
    try:
        return subprocess.run(cmd).returncode
    finally:
        for path, is_tmp in ((key, key_tmp), (known, known_tmp)):
            if is_tmp:
                try:
                    os.unlink(path)
                except OSError:
                    pass


def main():
    if len(sys.argv) < 3 or sys.argv[1] not in ACTIONS:
        sys.exit("usage: backup.py <--stop|--start|--sync|--archive|--sync-extra> <service|target>")
    action, service = sys.argv[1], sys.argv[2]
    if not service.strip():
        sys.exit("error: service name is empty")

    path = f"/media/ssd/podman/{service}/data"
    remote = f"{env('ONEDRIVE_REMOTE', 'onedrive')}:{env('ONEDRIVE_FOLDER', '/OdroidBackup')}"

    if action == "--stop":
        rc = ssh_backupctl(["--stop", service], int(env("STOP_TIMEOUT", "120")))
    elif action == "--start":
        rc = ssh_backupctl(["--start", service], int(env("START_TIMEOUT", "300")))
    elif action == "--sync":
        rc = ssh_backupctl(
            ["--sync", service, "--path", path, "--remote", remote],
            int(env("SYNC_TIMEOUT", "3600")),
        )
    elif action == "--sync-extra":
        # `service` is the extra-target NAME here; the host resolves its path
        # from backupctl's EXTRA_TARGETS allow-list, so we never pass --path.
        rc = ssh_backupctl(
            ["--sync-extra", service, "--remote", remote],
            int(env("SYNC_TIMEOUT", "3600")),
        )
    elif action == "--archive":
        weekday = env("ARCHIVE_WEEKDAY", "sunday").lower()
        if date.today().strftime("%A").lower() != weekday:
            print(f"skip: weekly archive only runs on {weekday}", flush=True)
            sys.exit(SKIP_EXIT_CODE)
        rc = ssh_backupctl(
            [
                "--archive", service, "--path", path, "--remote", remote,
                "--keep-weekly", env("KEEP_WEEKLY", "3"),
                "--keep-monthly", env("KEEP_MONTHLY", "12"),
            ],
            int(env("ARCHIVE_TIMEOUT", "3600")),
        )
    else:
        sys.exit(f"error: unknown action {action!r} (--stop|--start|--sync|--archive|--sync-extra)")

    sys.exit(rc)


if __name__ == "__main__":
    main()

self_backup.py
#!/usr/bin/env python3
"""
self_backup.py — make a consistent, *hot* snapshot of Lightflow's own
SQLite database, WITHOUT stopping the container.

Lightflow can back itself up like any other service. The only file that is
unsafe to copy while the process runs is the live SQLite DB — a plain `cp` can
catch it mid-write. SQLite's online backup API copies a *transactionally
consistent* snapshot even under concurrent writes, so the server never stops.

This step runs INSIDE the Lightflow container (uid 1000, Python stdlib only —
no extra package) and writes the snapshot into the data volume. A *second*,
SSH step then ships the data directory offsite through the host `backupctl`
gate, exactly like every other service:

    ssh backup_ops backupctl --archive lightflow \
        --path /media/ssd/podman/lightflow/data

The scripts/ and python-libs/ folders are ordinary files and are copied as-is by
that archive step (python-libs is reproducible from `pip`, so it is optional).

Restore: extract the archive, then use the snapshot file produced here
(`<data>/backup/lightflow.db`) as the database — discard the live
`lightflow.db` / `-wal` / `-shm` captured alongside it, which may be momentarily
inconsistent.
"""

import os
import sqlite3
import sys
from pathlib import Path

DATA_DIR = Path(os.environ.get("LIGHTFLOW_DATA_DIR", "/data"))
LIVE_DB = DATA_DIR / "lightflow.db"
SNAPSHOT_DIR = DATA_DIR / "backup"
SNAPSHOT_DB = SNAPSHOT_DIR / "lightflow.db"


def main() -> int:
    if not LIVE_DB.exists():
        print(f"error: live database not found at {LIVE_DB}", file=sys.stderr)
        return 1

    SNAPSHOT_DIR.mkdir(parents=True, exist_ok=True)

    # Read-only source: we never write to the live DB. The online backup API
    # streams a consistent copy page-by-page, restarting automatically if a
    # writer commits mid-copy — so the snapshot is always consistent.
    src = sqlite3.connect(f"file:{LIVE_DB}?mode=ro", uri=True)
    try:
        dst = sqlite3.connect(SNAPSHOT_DB)
        try:
            with dst:
                src.backup(dst)
        finally:
            dst.close()
    finally:
        src.close()

    size = SNAPSHOT_DB.stat().st_size
    print(f"snapshot written: {SNAPSHOT_DB} ({size} bytes)")
    return 0


if __name__ == "__main__":
    sys.exit(main())
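
After a restore, or simply to gain confidence in a snapshot, the copy can be checked with SQLite's integrity check. The sketch below is an optional addition, not part of self_backup.py: it repeats the online-backup call on a throwaway database and then runs `PRAGMA integrity_check` on the copy. The function name `snapshot_and_check` and the demo table are hypothetical.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def snapshot_and_check(live_db: Path, snapshot_db: Path) -> str:
    """Copy live_db with the online backup API, return the integrity result."""
    with closing(sqlite3.connect(f"file:{live_db}?mode=ro", uri=True)) as src:
        with closing(sqlite3.connect(snapshot_db)) as dst:
            src.backup(dst)
            # integrity_check reports problems found in the copied file
            return dst.execute("PRAGMA integrity_check").fetchone()[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        live = Path(tmp) / "live.db"
        with closing(sqlite3.connect(live)) as con:
            con.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, status TEXT)")
            con.execute("INSERT INTO runs (status) VALUES ('success')")
            con.commit()
        print(snapshot_and_check(live, Path(tmp) / "snapshot.db"))
```

A result other than `ok` would be a reason to keep the previous snapshot rather than ship the new one offsite.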

4. Create one task per service

Tasks → New task — example for vaultwarden:

  • Name: backup_vaultwarden
  • CRON: 0 3 * * * (daily 03:00) — or stagger services a few minutes apart
  • Pool: backup
  • Steps (each runs python3 backup.py <action> vaultwarden):
| # | Name | Command | Arguments | Run rule | Timeout |
| --- | --- | --- | --- | --- | --- |
| 1 | stop | python3 | --stop vaultwarden | on_success | 30 |
| 2 | sync | python3 | --sync vaultwarden | on_success | 300 |
| 3 | archive | python3 | --archive vaultwarden | on_success | 600 |
| 4 | start | python3 | --start vaultwarden | always | 180 |

archive exits 75 (grey ⬜ "skipped") on non-ARCHIVE_WEEKDAY days. start uses always so the stack is restarted even if an earlier step failed. Duplicate the task for each service (calibre, immich, n8n, planka, …), changing the name and the service argument. The 1-slot backup pool keeps them serialized.
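
How the two run rules interact can be sketched with a tiny simulation. It is an illustration only, not Lightflow's scheduler: the `run_task` function and the `not_run` label are hypothetical, and it assumes that a skipped step (exit code 75) does not block later `on_success` steps.

```python
def status_for(code: int) -> str:
    """Exit code to status, as described in this tutorial."""
    return "success" if code == 0 else "skipped" if code == 75 else "failed"


def run_task(steps, run_step):
    """steps: list of (name, run_rule); run_step(name) returns an exit code."""
    results = {}
    failed = False
    for name, rule in steps:
        if rule == "on_success" and failed:
            results[name] = "not_run"  # hypothetical label for a blocked step
            continue
        status = status_for(run_step(name))  # "always" steps always get here
        results[name] = status
        failed = failed or status == "failed"
    return results


STEPS = [
    ("stop", "on_success"),
    ("sync", "on_success"),
    ("archive", "on_success"),
    ("start", "always"),
]

if __name__ == "__main__":
    # simulate a failing sync: archive is blocked, start still runs
    exit_codes = {"stop": 0, "sync": 1, "archive": 0, "start": 0}
    print(run_task(STEPS, exit_codes.__getitem__))
```

In the simulated failure the service is stopped, the sync fails, the archive is not attempted, and `start` still brings the stack back up.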


💾 Operations

Run and monitor the backups

  • scheduled: each backup_<service> task fires on its CRON; the Auto toggle enables/disables scheduling without deleting the task
  • on demand: open a task → ▶ Run now (or ▶ Run on the Tasks page)
  • grid: 🟩 step ok — 🟥 step failed/timed out (the service is still restarted by start) — ⬜ archive skipped (not the archive weekday)
  • logs: open a task → pick a run column → pick a step → the captured backupctl output (rclone stats included) streams live while it runs
  • OneDrive layout: /OdroidBackup/<service>/data/ (mirror) and /OdroidBackup/<service>/archives/<service>-YYYY-MM-DD.tar.gz
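
The OneDrive layout in the last bullet can be expressed as two small path helpers, handy when checking by script that a run produced what it should. This is a sketch: the function names are hypothetical, and it assumes the date in the archive name is the day of the run.

```python
from datetime import date
from pathlib import PurePosixPath


def mirror_path(folder: str, service: str) -> str:
    """Directory that holds the mirrored data of a service."""
    return str(PurePosixPath(folder) / service / "data") + "/"


def archive_path(folder: str, service: str, day: date) -> str:
    """Archive file of a service for a given day (YYYY-MM-DD in the name)."""
    name = f"{service}-{day.isoformat()}.tar.gz"
    return str(PurePosixPath(folder) / service / "archives" / name)


if __name__ == "__main__":
    print(mirror_path("/OdroidBackup", "vaultwarden"))
    print(archive_path("/OdroidBackup", "vaultwarden", date.today()))
```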

First run — smoke test

Trigger the smallest service first (e.g. backup_vaultwarden) and check:

  1. the four boxes go green and the service is back up (podmanctl --list)
  2. the mirror appeared on OneDrive under /OdroidBackup/vaultwarden/data/
  3. trigger it again on a non-archive day: archive must show skipped (grey)
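
To predict what the third check should show on a given day, the weekday rule can be reproduced offline. backup.py compares `strftime("%A")` with ARCHIVE_WEEKDAY, which depends on the process locale; the sketch below is a locale-independent variant of the same test, with the hypothetical name `is_archive_day`.

```python
from datetime import date

# date.weekday() numbers the days from Monday (0) to Sunday (6)
WEEKDAYS = (
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
)


def is_archive_day(today: date, archive_weekday: str = "sunday") -> bool:
    """True when `today` falls on the configured archive weekday."""
    return WEEKDAYS[today.weekday()] == archive_weekday.strip().lower()


if __name__ == "__main__":
    today = date.today()
    expected = "runs" if is_archive_day(today) else "is skipped (grey)"
    print(f"{today.isoformat()}: the archive step {expected}")
```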

Restore a backup (manual, on the host)

Restoring stays a deliberate on-host operation through backupctl --restore and is refused over the SSH gate on purpose. Run it directly on the ODROID from the debian account.