💾 Lightflow Backup
Lightflow is a generic task runner: the backup logic lives entirely in the
scripts it runs and the host-side gate they reach over SSH. This tutorial
configures an installed Lightflow instance to drive the nightly OneDrive
backups of the Podman services (calibre, immich, kouizine, n8n, planka,
vaultwarden) — replacing the heavier Airflow stack while keeping the exact same
security model: Lightflow never reads another service's files and never holds the
OneDrive token; the boundary lives in backupctl on the host.
📋 Requirements
Info
Lightflow Backup requires the installation of:
🛤️ End-to-end path
How a nightly run travels from the scheduler all the way to OneDrive — and who enforces what at each hop:

🗂️ The four-step backup model
For each backed-up service the nightly run executes four steps, in order:
| Step | Action | Box |
|---|---|---|
| stop | podmanctl --stop <svc> (whole stack → consistent files) | 🟩 / 🟥 |
| sync | rclone sync data/ → OneDrive (mirror) | 🟩 / 🟥 |
| archive | tar\|gzip\|rclone rcat, retention, Sundays only | 🟩 / 🟥 / ⬜ (grey other days) |
| start | podmanctl --start <svc>, run rule always | 🟩 / 🟥 |
A step's exit code drives its box: 0 → success 🟩, 75 → skipped ⬜, anything
else → failed 🟥. The start step uses the always run rule, so the service
is restarted even if a previous step failed.
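The exit-code rule above can be sketched as a small lookup. This is an illustration only: the function name `box_for_exit_code` and the returned labels are assumptions, not part of Lightflow.

```python
SKIP_EXIT_CODE = 75  # the "skipped" code named in the text above


def box_for_exit_code(code: int) -> str:
    """Map a step's exit code to the box described above (hypothetical helper)."""
    if code == 0:
        return "success"   # green box
    if code == SKIP_EXIT_CODE:
        return "skipped"   # grey box
    return "failed"        # red box, for any other code


if __name__ == "__main__":
    for code in (0, 75, 1, 124):
        print(code, "->", box_for_exit_code(code))
```

Any code other than 0 or 75, including the ones a timeout or a crash would produce, falls into the failed bucket.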
🔐 Backup Gate
The host-side plumbing that the Lightflow tasks call through SSH:
- rclone (OneDrive transfers)
- backupctl (the validating gate)
- backup_ops (the SSH account)
☁️ OneDrive Remote
Install rclone and configure the OneDrive remote
# install rclone (OneDrive transfers are driven by backupctl)
sudo apt update
sudo apt install -y rclone
# the ODROID is headless: the Microsoft OAuth consent needs a graphical
# browser, and its redirect is hard-wired to localhost:53682 ON THE
# MACHINE RUNNING RCLONE. An SSH tunnel makes the desktop browser's
# localhost reach the ODROID, so everything stays on the device.
#
# 1. on the DESKTOP (Windows 11: OpenSSH is built in), open the tunnel
# and KEEP THE SESSION OPEN:
# ssh -L 53682:localhost:53682 debian@odroid
#
# 2. inside that SSH session, run the interactive configuration:
rclone config
# n) New remote
# name> onedrive
# Storage> onedrive
# client_id> (empty)
# client_secret> (empty)
# region> global
# Edit advanced config? n
# Use web browser to authenticate? y ← yes: the tunnel carries the redirect
#
# 3. rclone cannot open a browser on the ODROID: it prints the URL
# http://127.0.0.1:53682/auth?state=...
# open it in the DESKTOP browser and sign in to the Microsoft
# account — the localhost redirect flows back through the tunnel
# and rclone resumes automatically:
#
# Your type of connection> onedrive (personal or business)
# Drive to use> (select your drive → fills drive_id / drive_type)
# Keep this "onedrive" remote? y
# the live token belongs to root only: backupctl is the single consumer
sudo mkdir -p /etc/backupctl
sudo mv ~/.config/rclone/rclone.conf /etc/backupctl/rclone.conf
sudo chown root:root /etc/backupctl/rclone.conf
sudo chmod 0600 /etc/backupctl/rclone.conf
# verify the remote and create the backup folder
sudo rclone --config /etc/backupctl/rclone.conf lsd onedrive:
sudo rclone --config /etc/backupctl/rclone.conf mkdir onedrive:/OdroidBackup
The live OAuth token rotates — never re-apply an old copy
Microsoft rotates the refresh token on every use: rclone persists each rotation by rewriting /etc/backupctl/rclone.conf.
That file (root, 0600) is the single source of truth for the token. The Lightflow ONEDRIVE_REMOTE and ONEDRIVE_FOLDER Variables never hold it; they only carry the remote name and folder.
Keep a copy of the working config in Bitwarden.
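A quick way to confirm the "root, 0600" property is to inspect the file's owner and mode. The sketch below is a hypothetical helper (`is_private_to` is not part of backupctl); it assumes a POSIX host and must run with enough privilege to stat the file.

```python
import os
import stat


def is_private_to(path, uid=0):
    """Return True if `path` is owned by `uid` and its mode is exactly 0600."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)  # keep only the permission bits
    return st.st_uid == uid and mode == 0o600


if __name__ == "__main__":
    conf = "/etc/backupctl/rclone.conf"
    if os.path.exists(conf):
        print(conf, "private to root:", is_private_to(conf))
    else:
        print(conf, "not found on this machine")
```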
🎛️ Install backupctl
Install backupctl to /usr/local/bin
# download the script published by this documentation site
sudo curl -fsSL https://docs.fum-server.fr/files/backupctl.py -o /usr/local/bin/backupctl
# make it executable for everyone, writable by root only
sudo chmod 0755 /usr/local/bin/backupctl
# check the installation (backupctl refuses to run without root)
sudo backupctl --help
The script uses only the Python 3 standard library.
👤 Create the backup_ops SSH account
Create the backup_ops user and sudoers rule
# dedicated account used ONLY by the Lightflow container to reach backupctl.
# NOT added to the no-ssh group on purpose: SSH is its entire reason to exist,
# and the forced command below is the only thing it can ever execute.
sudo useradd --system --create-home --home-dir /home/backup_ops --shell /bin/bash backup_ops
# sudoers drop-in: keep the client's original command line visible to the
# gate (sshd exports it as SSH_ORIGINAL_COMMAND) and allow exactly ONE
# invocation as root — no wildcard
sudo tee /etc/sudoers.d/backupctl >/dev/null <<'EOF'
Defaults!/usr/local/bin/backupctl env_keep += "SSH_ORIGINAL_COMMAND"
backup_ops ALL=(root) NOPASSWD: /usr/local/bin/backupctl --ssh-gate
EOF
sudo chmod 0440 /etc/sudoers.d/backupctl
# validate the sudoers syntax before it can lock you out
sudo visudo -cf /etc/sudoers.d/backupctl
🔑 Authorize Lightflow's SSH key
Lightflow doesn't keep a private key file on disk: the key material lives in
the BACKUP_SSH_KEY Variable (secret) and backup.py writes it to a private,
short-lived temp file only while ssh runs. The same applies to the host-key
line in BACKUP_KNOWN_HOSTS. This keeps the script generic: a future task that
reaches another device just gets its own key Variable, with nothing to mount.
Generate a key pair, authorize it, and capture the values to paste
# generate a throwaway key pair in a temp dir, the PRIVATE key will live only
# in the BACKUP_SSH_KEY Variable, never as a file on the host or a mount
TMP="$(mktemp -d)"
ssh-keygen -t ed25519 -N "" -C "lightflow-backup" -f "${TMP}/id_ed25519"
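# make sure the .ssh directory exists first (it is not guaranteed to be
# created by useradd --create-home); install -d is safe to re-run
sudo install -d -m 0700 -o backup_ops -g backup_ops /home/backup_ops/.ssh
# authorize the PUBLIC key for backup_ops with the SAME forced command
PUBKEY="$(cat "${TMP}/id_ed25519.pub")"
sudo tee -a /home/backup_ops/.ssh/authorized_keys >/dev/null <<EOF
restrict,command="sudo /usr/local/bin/backupctl --ssh-gate" ${PUBKEY}
EOF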
sudo chmod 0600 /home/backup_ops/.ssh/authorized_keys
sudo chown backup_ops:backup_ops /home/backup_ops/.ssh/authorized_keys
# ---- value for BACKUP_KNOWN_HOSTS -------------------------------------------
# the container reaches the host as host.containers.internal, so the known_hosts
# line must carry THAT name in front of the host's own ed25519 public host key
printf 'host.containers.internal %s\n' \
"$(sudo awk '{print $1, $2}' /etc/ssh/ssh_host_ed25519_key.pub)"
# ---- value for BACKUP_SSH_KEY -----------------------------------------------
# print the PRIVATE key to copy, then wipe the temp dir so nothing lingers
cat "${TMP}/id_ed25519"
rm -rf "${TMP}"
Paste the private key into the BACKUP_SSH_KEY Variable (secret) and the host.containers.internal … line into BACKUP_KNOWN_HOSTS. No private key file is kept on the host and no .ssh volume is mounted. backupctl accepts only --stop / --start / --sync / --archive over the gate. --restore stays a manual, on-host operation.
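To make the idea of a validating gate concrete, here is a minimal sketch of how a forced command could check the client's original command line (the value sshd exports as SSH_ORIGINAL_COMMAND) against an allow-list. It is not backupctl's actual code, which this page does not show: the function `validate_gate_command`, the service-name check, and the error messages are all assumptions, and the allow-list simply mirrors the four actions named above.

```python
import shlex

# Mirrors the actions listed above; the real gate's list may differ.
ALLOWED_ACTIONS = {"--stop", "--start", "--sync", "--archive"}


def validate_gate_command(original_command):
    """Return the parsed argv if acceptable, else raise ValueError (hypothetical)."""
    argv = shlex.split(original_command)  # no shell is involved in parsing
    if len(argv) < 3 or argv[0] != "backupctl":
        raise ValueError("expected: backupctl <action> <service> ...")
    action, service = argv[1], argv[2]
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action {action!r} is refused over the gate")
    # Assumed rule: service names are plain identifiers, never paths or options.
    if not service.replace("-", "").replace("_", "").isalnum():
        raise ValueError(f"suspicious service name {service!r}")
    return argv


if __name__ == "__main__":
    print(validate_gate_command("backupctl --stop vaultwarden --timeout 120"))
    try:
        validate_gate_command("backupctl --restore vaultwarden")
    except ValueError as exc:
        print("refused:", exc)
```

Parsing with `shlex.split` and matching against a fixed set means the client's string is never handed to a shell, which is the property that makes a forced command worth having.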
⚙️ Configure Lightflow (in the UI)
1. Create the backup pool
Pools → Add pool
- Name: backup
- Slots: 1
A 1-slot pool serializes every backup step across all services: two steps can never run at once.
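The effect of a 1-slot pool can be illustrated with a semaphore of size one. This is only an analogy under stated assumptions, not Lightflow's scheduler: `backup_pool`, `run_step`, and the sleep standing in for real work are all hypothetical.

```python
import threading
import time

backup_pool = threading.Semaphore(1)  # one slot, like the "backup" pool
counter_lock = threading.Lock()
running = 0       # steps currently inside the pool
max_running = 0   # highest concurrency ever observed


def run_step(name):
    """Acquire the pool slot, do some stand-in work, then release it."""
    global running, max_running
    with backup_pool:
        with counter_lock:
            running += 1
            max_running = max(max_running, running)
        time.sleep(0.01)  # stand-in for stop / sync / archive / start
        with counter_lock:
            running -= 1


threads = [threading.Thread(target=run_step, args=(f"step-{i}",)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("max concurrent steps:", max_running)
```

Five steps are submitted at once, yet the observed concurrency never exceeds one, which is the guarantee the pool provides across services.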
2. Create the backup variables
Variables → Add variable (these become environment variables for the scripts):
| Key | Value | Secret |
|---|---|---|
| BACKUP_HOST | backup_ops@host.containers.internal | no |
| BACKUP_SSH_KEY | (paste the SSH private key contents, see above) | yes |
| BACKUP_KNOWN_HOSTS | (paste the known_hosts host-key line, see above) | no |
| ONEDRIVE_REMOTE | onedrive | no |
| ONEDRIVE_FOLDER | /OdroidBackup | no |
| ARCHIVE_WEEKDAY | sunday | no |
| KEEP_WEEKLY | 3 | no |
| KEEP_MONTHLY | 12 | no |
The OneDrive token is not stored here. It rotates in /etc/backupctl/rclone.conf on the host.
3. Create the backup scripts
#!/usr/bin/env python3
"""
backup.py — sample Lightflow task script (OneDrive backup of a Podman service).
Lightflow itself is a generic runner: this script holds the backup logic and
reaches the host EXACTLY like the old Airflow DAGs did — over the restricted
`backup_ops` SSH forced command, which runs `sudo backupctl --ssh-gate` on the
host. Lightflow never reads another service's files and never holds the OneDrive
token; the security boundary lives in `backupctl` on the host.
One Lightflow TASK per service, made of four ordered STEPS — each step runs this
script with one action and shows up as one box in the grid:
Step 1 stop python3 /scripts/backup.py --stop <service> (on_success)
Step 2 sync python3 /scripts/backup.py --sync <service> (on_success)
Step 3 archive python3 /scripts/backup.py --archive <service> (on_success)
Step 4 start python3 /scripts/backup.py --start <service> (always)
`archive` exits 75 (Lightflow's "skipped"/grey box) on non-archive days, so the
weekly archive shows grey except on ARCHIVE_WEEKDAY. `start` uses the `always`
run rule so the service is restarted even if a previous step failed.
A non-service directory on the host (e.g. a git tree owned by www-data:debian)
is mirrored with a single step — it has no stack to stop/start:
sync-extra python3 /scripts/backup.py --sync-extra <target> (on_success)
`<target>` is a NAME from backupctl's EXTRA_TARGETS allow-list (the host resolves
its path; we never pass --path); the mirror lands under <remote>/extra/<target>/.
Configuration comes from Lightflow VARIABLES (injected as environment variables):
BACKUP_HOST backup_ops@host.containers.internal (required)
BACKUP_SSH_KEY SSH private key — inline PEM value OR a file path
BACKUP_KNOWN_HOSTS known_hosts entry — inline value OR a file path
ONEDRIVE_REMOTE onedrive (default)
ONEDRIVE_FOLDER /OdroidBackup (default)
ARCHIVE_WEEKDAY sunday (default)
KEEP_WEEKLY 3 (default)
KEEP_MONTHLY 12 (default)
STOP_TIMEOUT / SYNC_TIMEOUT / ARCHIVE_TIMEOUT / START_TIMEOUT (seconds)
Holding the SSH key (and known_hosts) inline in a secret Variable keeps the
script generic — any future device just needs its own key Variable, with no
host-side key files to provision. The key material is written to a private,
short-lived temp file only while `ssh` runs, then removed.
"""
import os
import re
import subprocess
import sys
import tempfile
import textwrap
from datetime import date

SKIP_EXIT_CODE = 75  # Lightflow renders this as a grey "skipped" box
ACTIONS = {"--stop", "--start", "--sync", "--archive", "--sync-extra"}


def env(key, default=None):
    return os.environ.get(key, default)


def materialize(value, kind):
    """Resolve a key / known_hosts VALUE to a file path that `ssh` can use.

    A Variable may hold either a filesystem PATH (legacy) or the INLINE value
    itself (preferred: generic, no host-side key files). Inline content is
    written to a private (0600) temp file; the caller removes it afterwards.

    A private key is recognised by its PEM header; a known_hosts entry by the
    whitespace separating its fields. A key whose line breaks were flattened
    to spaces by a single-line input field is re-wrapped into a valid PEM.

    Returns (path, is_temp).
    """
    # normalize line endings: a value pasted or stored with Windows CRLF
    # (or a stray \r) keeps \r inside the body, so force LF.
    text = value.replace("\r\n", "\n").replace("\r", "\n").strip()
    inline = "-----BEGIN" in text if kind == "key" else any(c.isspace() for c in text)
    if not inline:
        return value, False
    if kind == "key":
        # Some Variable fields flatten a multi-line PEM onto ONE line, turning
        # the newlines into spaces; ssh then fails with "error in libcrypto".
        # Rebuild a canonical PEM: keep the BEGIN/END markers, strip ALL
        # whitespace from the base64 body, then re-wrap at 70 columns. This is
        # a no-op on an already-correct key, so it is always safe to run.
        m = re.search(r"-----BEGIN ([A-Z0-9 ]+)-----(.*)-----END \1-----", text, re.S)
        if m:
            label = m.group(1)
            body = re.sub(r"\s+", "", m.group(2))
            text = "-----BEGIN {0}-----\n{1}\n-----END {0}-----".format(
                label, "\n".join(textwrap.wrap(body, 70))
            )
    fd, path = tempfile.mkstemp(prefix="lightflow-", suffix=f".{kind}")
    with os.fdopen(fd, "w") as f:
        f.write(text + "\n")
    os.chmod(path, 0o600)
    return path, True


def ssh_backupctl(args, timeout):
    """Run `backupctl <args> --timeout <t>` on the host via the SSH forced command."""
    host = env("BACKUP_HOST")
    if not host:
        sys.exit("error: BACKUP_HOST is not set (e.g. backup_ops@host.containers.internal)")
    key, key_tmp = materialize(env("BACKUP_SSH_KEY", "/data/.ssh/id_ed25519"), "key")
    known, known_tmp = materialize(env("BACKUP_KNOWN_HOSTS", "/data/.ssh/known_hosts"), "known_hosts")
    remote = "backupctl " + " ".join(args) + f" --timeout {timeout}"
    cmd = [
        "ssh",
        "-i", key,
        "-o", "BatchMode=yes",
        "-o", "StrictHostKeyChecking=accept-new",
        "-o", f"UserKnownHostsFile={known}",
        host,
        remote,
    ]
    print("+ " + " ".join(cmd), flush=True)
    try:
        return subprocess.run(cmd).returncode
    finally:
        # always remove the temp files that hold the key / known_hosts material
        for path, is_tmp in ((key, key_tmp), (known, known_tmp)):
            if is_tmp:
                try:
                    os.unlink(path)
                except OSError:
                    pass


def main():
    if len(sys.argv) < 3 or sys.argv[1] not in ACTIONS:
        sys.exit("usage: backup.py <--stop|--start|--sync|--archive|--sync-extra> <service|target>")
    action, service = sys.argv[1], sys.argv[2]
    if not service.strip():
        sys.exit("error: service name is empty")
    path = f"/media/ssd/podman/{service}/data"
    remote = f"{env('ONEDRIVE_REMOTE', 'onedrive')}:{env('ONEDRIVE_FOLDER', '/OdroidBackup')}"
    if action == "--stop":
        rc = ssh_backupctl(["--stop", service], int(env("STOP_TIMEOUT", "120")))
    elif action == "--start":
        rc = ssh_backupctl(["--start", service], int(env("START_TIMEOUT", "300")))
    elif action == "--sync":
        rc = ssh_backupctl(
            ["--sync", service, "--path", path, "--remote", remote],
            int(env("SYNC_TIMEOUT", "3600")),
        )
    elif action == "--sync-extra":
        # `service` is the extra-target NAME here; the host resolves its path
        # from backupctl's EXTRA_TARGETS allow-list, so we never pass --path.
        rc = ssh_backupctl(
            ["--sync-extra", service, "--remote", remote],
            int(env("SYNC_TIMEOUT", "3600")),
        )
    elif action == "--archive":
        weekday = env("ARCHIVE_WEEKDAY", "sunday").lower()
        if date.today().strftime("%A").lower() != weekday:
            print(f"skip: weekly archive only runs on {weekday}", flush=True)
            sys.exit(SKIP_EXIT_CODE)
        rc = ssh_backupctl(
            [
                "--archive", service, "--path", path, "--remote", remote,
                "--keep-weekly", env("KEEP_WEEKLY", "3"),
                "--keep-monthly", env("KEEP_MONTHLY", "12"),
            ],
            int(env("ARCHIVE_TIMEOUT", "3600")),
        )
    else:
        sys.exit(f"error: unknown action {action!r} (--stop|--start|--sync|--archive|--sync-extra)")
    sys.exit(rc)


if __name__ == "__main__":
    main()
#!/usr/bin/env python3
"""
self_backup.py — make a consistent, *hot* snapshot of Lightflow's own
SQLite database, WITHOUT stopping the container.
Lightflow can back itself up like any other service. The only file that is
unsafe to copy while the process runs is the live SQLite DB — a plain `cp` can
catch it mid-write. SQLite's online backup API copies a *transactionally
consistent* snapshot even under concurrent writes, so the server never stops.
This step runs INSIDE the Lightflow container (uid 1000, Python stdlib only —
no extra package) and writes the snapshot into the data volume. A *second*,
SSH step then ships the data directory offsite through the host `backupctl`
gate, exactly like every other service:
ssh backup_ops backupctl --archive lightflow \
--path /media/ssd/podman/lightflow/data
The scripts/ and python-libs/ folders are ordinary files and are copied as-is by
that archive step (python-libs is reproducible from `pip`, so it is optional).
Restore: extract the archive, then use the snapshot file produced here
(`<data>/backup/lightflow.db`) as the database — discard the live
`lightflow.db` / `-wal` / `-shm` captured alongside it, which may be momentarily
inconsistent.
"""
import os
import sqlite3
import sys
from pathlib import Path

DATA_DIR = Path(os.environ.get("LIGHTFLOW_DATA_DIR", "/data"))
LIVE_DB = DATA_DIR / "lightflow.db"
SNAPSHOT_DIR = DATA_DIR / "backup"
SNAPSHOT_DB = SNAPSHOT_DIR / "lightflow.db"


def main() -> int:
    if not LIVE_DB.exists():
        print(f"error: live database not found at {LIVE_DB}", file=sys.stderr)
        return 1
    SNAPSHOT_DIR.mkdir(parents=True, exist_ok=True)
    # Read-only source: we never write to the live DB. The online backup API
    # streams a consistent copy page-by-page, restarting automatically if a
    # writer commits mid-copy, so the snapshot is always consistent.
    src = sqlite3.connect(f"file:{LIVE_DB}?mode=ro", uri=True)
    try:
        dst = sqlite3.connect(SNAPSHOT_DB)
        try:
            with dst:
                src.backup(dst)
        finally:
            dst.close()
    finally:
        src.close()
    size = SNAPSHOT_DB.stat().st_size
    print(f"snapshot written: {SNAPSHOT_DB} ({size} bytes)")
    return 0


if __name__ == "__main__":
    sys.exit(main())
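Before relying on a snapshot for a restore, it can be worth checking that SQLite itself considers the file sound. The sketch below is an optional extra, not part of self_backup.py: the helper name `snapshot_is_healthy` is an assumption, and it relies only on SQLite's built-in `PRAGMA integrity_check`.

```python
import sqlite3
import sys


def snapshot_is_healthy(path):
    """Return True if SQLite's integrity check reports a single 'ok' row."""
    # Open read-only so the check can never modify the snapshot.
    con = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        rows = con.execute("PRAGMA integrity_check").fetchall()
    finally:
        con.close()
    return rows == [("ok",)]


if __name__ == "__main__" and len(sys.argv) > 1:
    ok = snapshot_is_healthy(sys.argv[1])
    print("snapshot ok" if ok else "snapshot is damaged")
    sys.exit(0 if ok else 1)
```

Run it against the snapshot path (for example `<data>/backup/lightflow.db`); a non-zero exit code would turn the step's box red.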
4. Create one task per service
Tasks → New task — example for vaultwarden:
- Name: backup_vaultwarden
- CRON: 0 3 * * * (daily 03:00), or stagger services a few minutes apart
- Pool: backup
- Steps (each runs python3 backup.py <action> vaultwarden):
| # | Name | Command | Arguments | Run rule | Timeout |
|---|---|---|---|---|---|
| 1 | stop | python3 | --stop vaultwarden | on_success | 30 |
| 2 | sync | python3 | --sync vaultwarden | on_success | 300 |
| 3 | archive | python3 | --archive vaultwarden | on_success | 600 |
| 4 | start | python3 | --start vaultwarden | always | 180 |
archive exits 75 (grey ⬜ "skipped") on non-ARCHIVE_WEEKDAY days. start uses always so the stack is restarted even if an earlier step failed. Duplicate the task for each service (calibre, immich, n8n, planka, …), changing the name and the service argument. The 1-slot backup pool keeps them serialized.
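How the two run rules interact can be sketched with a small simulation. It is a model under assumptions, not Lightflow's scheduler: `run_task` is hypothetical, a skipped step (exit 75) is assumed not to block later on_success steps, and the "not run" label for a bypassed step is invented for the illustration.

```python
SKIPPED = 75


def run_task(steps):
    """Simulate one run.

    `steps` is a list of (name, run_rule, exit_code_if_run) tuples.
    Returns a dict mapping each step name to its outcome.
    """
    boxes = {}
    failed = False  # becomes True once any step has failed
    for name, rule, code in steps:
        if failed and rule != "always":
            boxes[name] = "not run"  # on_success step after a failure
            continue
        if code == 0:
            boxes[name] = "success"
        elif code == SKIPPED:
            boxes[name] = "skipped"
        else:
            boxes[name] = "failed"
            failed = True
    return boxes


if __name__ == "__main__":
    # sync fails: archive is bypassed, but start still runs
    print(run_task([
        ("stop", "on_success", 0),
        ("sync", "on_success", 1),
        ("archive", "on_success", 0),
        ("start", "always", 0),
    ]))
```

The point of the always rule is visible in the output: a failed sync never leaves the service stopped.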
💾 Operations
Run and monitor the backups
- scheduled: each backup_<service> task fires on its CRON; the Auto toggle enables/disables scheduling without deleting the task
- on demand: open a task → ▶ Run now (or ▶ Run on the Tasks page)
- grid: 🟩 step ok, 🟥 step failed/timed out (the service is still restarted by start), ⬜ archive skipped (not the archive weekday)
- logs: open a task → pick a run column → pick a step → the captured backupctl output (rclone stats included) streams live while it runs
- OneDrive layout: /OdroidBackup/<service>/data/ (mirror) and /OdroidBackup/<service>/archives/<service>-YYYY-MM-DD.tar.gz
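The OneDrive layout above can be expressed as two small path builders, which is handy when checking a run by hand. The function names `mirror_path` and `archive_path` are assumptions for this sketch; the actual paths are produced by backupctl on the host.

```python
from datetime import date


def mirror_path(folder, service):
    """Path of the mirrored data directory for a service."""
    return f"{folder.rstrip('/')}/{service}/data/"


def archive_path(folder, service, day):
    """Path of the archive for a given day, named <service>-YYYY-MM-DD.tar.gz."""
    return f"{folder.rstrip('/')}/{service}/archives/{service}-{day.isoformat()}.tar.gz"


if __name__ == "__main__":
    print(mirror_path("/OdroidBackup", "vaultwarden"))
    print(archive_path("/OdroidBackup", "vaultwarden", date(2024, 1, 7)))
```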
First run — smoke test
Trigger the smallest service first (e.g. backup_vaultwarden) and check:
- the four boxes go green and the service is back up (podmanctl --list)
- the mirror appeared on OneDrive under /OdroidBackup/vaultwarden/data/
- trigger it again on a non-archive day: archive must show skipped (grey)
Restore a backup (manual, on the host)
Restoring stays a deliberate on-host operation through backupctl --restore
and is refused over the SSH gate on purpose — run it directly on the ODROID
from the Debian account.