feat: add borg backup support and classification improvements

This commit is contained in:
Eddie Nielsen 2026-03-23 14:46:33 +00:00
parent 483e2720f1
commit e5ef50a74a
15 changed files with 1293 additions and 649 deletions

README.md (367 changed lines)

@@ -1,5 +1,5 @@
<p align="center">
<img src="images/dockervault_logo.png" width="600">
</p>
# DockerVault
@@ -15,172 +15,262 @@ No guesswork. No forgotten volumes. No broken restores.
## 📚 Contents
* [🚀 What is DockerVault?](#-what-is-dockervault)
* [⚡ Quick Start](#-quick-start)
* [🧠 How it Works](#-how-it-works)
* [🗂 Classification Model](#-classification-model)
* [💾 Borg Integration](#-borg-integration)
* [🤖 Automation Mode](#-automation-mode)
* [🔢 Exit Codes](#-exit-codes)
* [🛠 Tech Stack](#-tech-stack)
* [🔍 Example](#-example)
* [🧱 Current Features](#-current-features)
* [🔥 Roadmap](#-roadmap)
* [🧠 Philosophy](#-philosophy)
* [📜 License](#-license)
* [❤️ Credits](#-credits)

---
## 🚀 What is DockerVault?

DockerVault analyzes your `docker-compose.yml` and identifies:

* What **must** be backed up
* What can be **ignored**
* What needs **human review**

It bridges the gap between:

👉 “everything looks fine”

and

👉 “restore just failed”

---
## ⚡ Quick Start

```bash
git clone https://github.com/YOUR-USER/dockervault.git
cd dockervault
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
```
Run analysis:
```bash
python -m dockervault.cli docker-compose.yml --borg --repo /backup-repo
```
Run backup:
```bash
python -m dockervault.cli docker-compose.yml \
--run-borg \
--repo /backup-repo
```
---
## 🧠 How it Works
DockerVault parses your compose file and inspects:
* bind mounts
* volume targets
* known data paths
It then classifies them using heuristics:
* database paths → critical
* logs/cache → optional
* unknown → review
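The heuristic pass can be pictured as a small scoring function over mount targets. The sketch below is illustrative only; the keyword lists and bucket names are assumptions for demonstration, not DockerVault's exact rule set:

```python
# Illustrative sketch of the path heuristics described above.
# The hint lists are examples, not the project's real rules.

DB_HINTS = ("/var/lib/mysql", "/var/lib/postgresql", "/data/db")
NOISE_HINTS = ("/var/log", "/tmp", "/cache")

def classify_target(target: str) -> str:
    target = target.lower()
    if any(h in target for h in DB_HINTS):
        return "critical"   # database paths
    if any(h in target for h in NOISE_HINTS):
        return "optional"   # logs/cache
    return "review"         # unknown

print(classify_target("/var/lib/mysql"))   # critical
print(classify_target("/var/log/nginx"))   # optional
print(classify_target("/opt/appdata"))     # review
```

The real engine (see `dockervault/classification/` below) accumulates weighted evidence instead of returning on first match, but the shape of the decision is the same.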
---
## 🗂 Classification Model
DockerVault divides everything into three categories:
### ✅ INCLUDE
Must be backed up.
Example:
```
/var/lib/mysql
/data
/config
```
### ⚠️ REVIEW
Needs human decision.
Triggered when:
* path does not exist
* path exists but is empty
* named volumes (Docker-managed)
Example:
```
./mc-missing → /data
```
### ❌ SKIP
Safe to ignore.
Example:
```
/var/log
/tmp
/cache
```
---
## 💾 Borg Integration
DockerVault can generate and run Borg backups directly.
Example:
```bash
python -m dockervault.cli docker-compose.yml \
--run-borg \
--repo /mnt/backups/borg/dockervault
```
Generated command:
```bash
borg create --stats --progress \
/repo::hostname-2026-03-23_12-44-19 \
/path/to/data
```
### Features
* automatic archive naming (with seconds precision)
* deduplicated paths
* safe command generation
* subprocess execution
* optional passphrase support
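For reference, the default archive-name format (hostname plus a second-precision timestamp) comes down to two standard-library calls; this sketch reproduces the naming scheme shown above:

```python
# Reproduces the hostname-YYYY-MM-DD_HH-MM-SS archive-name format.
import socket
from datetime import datetime

hostname = socket.gethostname()
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
archive_name = f"{hostname}-{timestamp}"
print(archive_name)
```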
---
## 🤖 Automation Mode
Designed for cron / scripts / servers.
```bash
python -m dockervault.cli docker-compose.yml \
--run-borg \
--quiet \
--automation \
--repo /backup-repo
```
### Behavior
* no plan output
* no interactive prompts
* minimal output
* suitable for logs / CI
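A common deployment is a nightly crontab entry; the schedule, working directory, repository, and log path below are placeholders, not recommendations:

```
# m h dom mon dow  command (illustrative paths)
0 3 * * * cd /srv/stack && python -m dockervault.cli docker-compose.yml --run-borg --automation --quiet --repo /backup-repo >> /var/log/dockervault.log 2>&1
```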
---
## 🔢 Exit Codes
| Code | Meaning |
| ---- | ------------------------------------ |
| 0 | Success |
| 1 | General error |
| 2 | Missing required args |
| 3 | No include paths |
| 4 | Review required (`--fail-on-review`) |
### Fail on review
```bash
--fail-on-review
```
Stops automation if something needs human attention.
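In a wrapper script, the exit codes above can drive alerting. The snippet below is a sketch: the DockerVault invocation is stubbed with a shell function so the dispatch logic is self-contained; a real script would call `python -m dockervault.cli ... --fail-on-review` instead.

```shell
#!/bin/sh
# Stub standing in for the real dockervault invocation; returning 4
# simulates "review required" (see the exit-code table above).
run_backup() {
    return 4
}

run_backup
status=$?

case "$status" in
    0) echo "backup ok" ;;
    2) echo "missing required args" ;;
    3) echo "no include paths" ;;
    4) echo "review required - backup halted" ;;
    *) echo "backup failed (status $status)" ;;
esac
```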
---
## 🛠 Tech Stack
* Python 3.10+
* PyYAML
* BorgBackup
* CLI-first design

---
## 🔍 Example

Input:

```yaml
services:
  db:
    volumes:
      - ./db:/var/lib/mysql
  mc:
    volumes:
      - ./mc-missing:/data
  nginx:
    volumes:
      - ./logs:/var/log/nginx
```
Output:

```
INCLUDE:
  db

REVIEW:
  mc-missing

SKIP:
  logs
```

---
## 🧱 Current Features
* Docker Compose parsing
* Bind mount detection
* Intelligent classification
* Borg backup integration
* Automation mode
* Exit codes for scripting

---
## 🔥 Roadmap
* [ ] Named volume inspection (`docker volume inspect`)
* [ ] Docker API integration
* [ ] Multiple compose files support
* [ ] Email / ntfy notifications
* [ ] Web interface
* [ ] Backup history tracking
* [ ] Restore validation

---
@@ -188,43 +278,30 @@ dockervault scan ~/test-docker --json
DockerVault is built on a simple idea:

> Backups should reflect reality — not assumptions.

* No blind backups
* No hidden data
* No silent failures

Just clarity.

---
## 📜 License

GNU GPLv3

This project is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License v3.

See the full license in the [LICENSE](LICENSE) file.

---
## ❤️ Credits

Created by **Ed & NodeFox 🦊**
Built with ❤️ for Lanx
Maintained by Eddie Nielsen

Feel free to contribute, suggest improvements or fork the project.

dockervault/borg.py (new file, 115 lines)

@@ -0,0 +1,115 @@
from __future__ import annotations
import os
import shlex
import socket
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Iterable
def borg_env(passphrase: str | None = None) -> dict[str, str]:
env = os.environ.copy()
if passphrase:
env["BORG_PASSPHRASE"] = passphrase
return env
def build_archive_name(prefix: str | None = None) -> str:
"""
Build a borg archive name.
Default format:
hostname-YYYY-MM-DD_HH-MM-SS
With prefix:
prefix-hostname-YYYY-MM-DD_HH-MM-SS
"""
hostname = socket.gethostname()
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
if prefix:
return f"{prefix}-{hostname}-{timestamp}"
return f"{hostname}-{timestamp}"
def normalize_include_paths(include_paths: Iterable[str | Path]) -> list[str]:
normalized: list[str] = []
seen: set[str] = set()
for path in include_paths:
resolved = str(Path(path))
if resolved not in seen:
seen.add(resolved)
normalized.append(resolved)
return normalized
def build_borg_create_command(
repo: str,
include_paths: Iterable[str | Path],
archive_name: str | None = None,
stats: bool = True,
progress: bool = True,
) -> list[str]:
normalized_paths = normalize_include_paths(include_paths)
if not normalized_paths:
raise ValueError("No include paths provided for borg backup.")
if archive_name is None:
archive_name = build_archive_name()
command = ["borg", "create"]
if stats:
command.append("--stats")
if progress:
command.append("--progress")
command.append(f"{repo}::{archive_name}")
command.extend(normalized_paths)
return command
def command_to_shell(command: list[str]) -> str:
return " ".join(shlex.quote(part) for part in command)
def run_borg_create(
repo: str,
include_paths: Iterable[str | Path],
passphrase: str | None = None,
archive_name: str | None = None,
stats: bool = True,
progress: bool = True,
quiet: bool = False,
) -> int:
command = build_borg_create_command(
repo=repo,
include_paths=include_paths,
archive_name=archive_name,
stats=stats,
progress=progress,
)
stdout = subprocess.DEVNULL if quiet else None
stderr = subprocess.DEVNULL if quiet else None
result = subprocess.run(
command,
env=borg_env(passphrase),
stdout=stdout,
stderr=stderr,
check=False,
)
return result.returncode
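As a quick sanity check of the command shape, the builder above can be exercised like this. The helper is re-implemented inline so the snippet runs without the package installed, and the archive name is fixed for reproducibility:

```python
# Inline re-implementation of build_borg_create_command, for illustration.
import shlex

def build_borg_create_command(repo, include_paths, archive_name):
    command = ["borg", "create", "--stats", "--progress", f"{repo}::{archive_name}"]
    command.extend(include_paths)
    return command

cmd = build_borg_create_command(
    "/backup-repo", ["/data", "/config"], "host-2026-03-23_12-00-00"
)
print(" ".join(shlex.quote(part) for part in cmd))
# borg create --stats --progress /backup-repo::host-2026-03-23_12-00-00 /data /config
```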


@@ -0,0 +1,15 @@
from .engine import ClassificationEngine
from .models import (
Classification,
ClassificationResult,
MountCandidate,
RuleEvidence,
)
__all__ = [
"ClassificationEngine",
"Classification",
"ClassificationResult",
"MountCandidate",
"RuleEvidence",
]


@@ -0,0 +1,80 @@
DATABASE_PATH_KEYWORDS = [
"/var/lib/mysql",
"/var/lib/mariadb",
"/var/lib/postgresql",
"/var/lib/postgresql/data",
"/data/db",
]
CONFIG_PATH_KEYWORDS = [
"/config",
"/app/config",
"/settings",
"/etc",
]
DATA_PATH_KEYWORDS = [
"/data",
"/app/data",
"/srv/data",
"/var/lib",
]
EPHEMERAL_PATH_KEYWORDS = [
"/tmp",
"/var/tmp",
"/cache",
"/var/cache",
"/transcode",
"/run",
"/var/run",
]
LOG_PATH_KEYWORDS = [
"/logs",
"/log",
"/var/log",
]
DATABASE_IMAGE_HINTS = [
"mysql",
"mariadb",
"postgres",
"postgresql",
"mongo",
"mongodb",
"redis",
]
KNOWN_IMPORTANT_IMAGE_HINTS = [
"nextcloud",
"grafana",
"vaultwarden",
"gitea",
"portainer",
"paperless",
"immich",
"wordpress",
"nginx",
"traefik",
"minecraft",
"itzg",
]
MINECRAFT_IMAGE_HINTS = [
"minecraft",
"itzg",
]
MINECRAFT_CRITICAL_PATHS = [
"/data",
"/server",
"/minecraft",
]
MINECRAFT_IMPORTANT_PATHS = [
"/plugins",
"/config",
"/mods",
"/world",
]


@@ -0,0 +1,52 @@
from collections import defaultdict
from .models import ClassificationResult, Classification
from .rules import DEFAULT_RULES
from .utils import unique_preserve_order
class ClassificationEngine:
def __init__(self, rules=None):
self.rules = rules or DEFAULT_RULES
def classify(self, candidate):
scores = defaultdict(int)
reasons = []
tags = []
matched = []
for rule in self.rules:
results = rule(candidate)
for result in results:
scores[result.classification] += result.score
reasons.extend(result.reasons)
tags.extend(result.tags)
matched.append(result.rule_name)
if not scores:
return ClassificationResult(
candidate=candidate,
classification=Classification.UNKNOWN,
confidence=0.0,
score=0,
reasons=["No rules matched"],
tags=["unknown"],
matched_rules=[],
score_breakdown={},
)
classification, score = max(scores.items(), key=lambda item: item[1])
total_score = sum(scores.values())
confidence = score / total_score if total_score else 0.0
return ClassificationResult(
candidate=candidate,
classification=classification,
confidence=round(confidence, 2),
score=score,
reasons=reasons,
tags=unique_preserve_order(tags),
matched_rules=unique_preserve_order(matched),
score_breakdown={cls.value: value for cls, value in scores.items()},
)
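The winner-takes-most scoring in `classify` reduces to a max over summed per-classification scores, with confidence as the winner's share of the total. A standalone sketch of just that arithmetic:

```python
from collections import defaultdict

# Two pieces of evidence, mirroring RuleEvidence (classification, score) pairs.
evidence = [("critical", 40), ("important", 20)]

scores = defaultdict(int)
for classification, score in evidence:
    scores[classification] += score

winner, win_score = max(scores.items(), key=lambda item: item[1])
confidence = round(win_score / sum(scores.values()), 2)
print(winner, confidence)  # critical 0.67
```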


@@ -0,0 +1,44 @@
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional
class Classification(str, Enum):
CRITICAL = "critical"
IMPORTANT = "important"
OPTIONAL = "optional"
EPHEMERAL = "ephemeral"
UNKNOWN = "unknown"
@dataclass
class MountCandidate:
service_name: str
image: str
source: str
target: str
mount_type: str
read_only: bool = False
env: Dict[str, str] = field(default_factory=dict)
compose_project: Optional[str] = None
@dataclass
class RuleEvidence:
rule_name: str
classification: Classification
score: int
reasons: List[str] = field(default_factory=list)
tags: List[str] = field(default_factory=list)
@dataclass
class ClassificationResult:
candidate: MountCandidate
classification: Classification
confidence: float
score: int
reasons: List[str]
tags: List[str]
matched_rules: List[str]
score_breakdown: Dict[str, int] = field(default_factory=dict)


@@ -0,0 +1,73 @@
from typing import List
from .models import Classification, MountCandidate, RuleEvidence
from .defaults import *
from .utils import norm, path_contains, text_contains
def rule_minecraft(candidate: MountCandidate) -> List[RuleEvidence]:
image = norm(candidate.image)
target = norm(candidate.target)
if any(h in image for h in MINECRAFT_IMAGE_HINTS):
if any(p in target for p in MINECRAFT_CRITICAL_PATHS):
return [RuleEvidence("minecraft_critical", Classification.CRITICAL, 45,
[f"{candidate.target} looks like Minecraft world data"], ["minecraft"])]
if any(p in target for p in MINECRAFT_IMPORTANT_PATHS):
return [RuleEvidence("minecraft_important", Classification.IMPORTANT, 25,
[f"{candidate.target} looks like Minecraft config/plugins"], ["minecraft"])]
return []
def rule_database(candidate: MountCandidate) -> List[RuleEvidence]:
if path_contains(candidate.target, DATABASE_PATH_KEYWORDS):
return [RuleEvidence("db_path", Classification.CRITICAL, 40,
[f"{candidate.target} is database path"], ["database"])]
if text_contains(candidate.image, DATABASE_IMAGE_HINTS):
return [RuleEvidence("db_image", Classification.CRITICAL, 25,
[f"{candidate.image} looks like DB"], ["database"])]
return []
def rule_config(candidate: MountCandidate) -> List[RuleEvidence]:
if path_contains(candidate.target, CONFIG_PATH_KEYWORDS):
return [RuleEvidence("config", Classification.IMPORTANT, 20,
[f"{candidate.target} is config"], ["config"])]
return []
def rule_data(candidate: MountCandidate) -> List[RuleEvidence]:
if path_contains(candidate.target, DATA_PATH_KEYWORDS):
return [RuleEvidence("data", Classification.IMPORTANT, 20,
[f"{candidate.target} is data"], ["data"])]
return []
def rule_ephemeral(candidate: MountCandidate) -> List[RuleEvidence]:
if path_contains(candidate.target, EPHEMERAL_PATH_KEYWORDS):
return [RuleEvidence("ephemeral", Classification.EPHEMERAL, 35,
[f"{candidate.target} is temp/cache"], ["ephemeral"])]
return []
def rule_logs(candidate: MountCandidate) -> List[RuleEvidence]:
if path_contains(candidate.target, LOG_PATH_KEYWORDS):
return [RuleEvidence("logs", Classification.OPTIONAL, 15,
[f"{candidate.target} is logs"], ["logs"])]
return []
DEFAULT_RULES = [
rule_minecraft,
rule_database,
rule_config,
rule_data,
rule_ephemeral,
rule_logs,
]


@@ -0,0 +1,22 @@
def norm(value: str) -> str:
return (value or "").strip().lower()
def path_contains(target: str, keywords):
target = norm(target)
return any(k in target for k in keywords)
def text_contains(value: str, keywords):
value = norm(value)
return any(k in value for k in keywords)
def unique_preserve_order(values):
seen = set()
result = []
for v in values:
if v not in seen:
seen.add(v)
result.append(v)
return result


@@ -1,334 +1,292 @@
from __future__ import annotations

import argparse
import sys
from pathlib import Path
from typing import Any, Iterable

from .borg import (
    build_borg_create_command,
    command_to_shell,
    run_borg_create,
)
from .classifier import classify_compose


def _get_value(obj: Any, *names: str, default: Any = None) -> Any:
    for name in names:
        if isinstance(obj, dict) and name in obj:
            return obj[name]
        if hasattr(obj, name):
            return getattr(obj, name)
    return default


def _normalize_entries(entries: Any) -> list[dict[str, Any]]:
    if not entries:
        return []

    normalized: list[dict[str, Any]] = []

    for entry in entries:
        if isinstance(entry, dict):
            normalized.append(
                {
                    "source": (
                        entry.get("source")
                        or entry.get("path")
                        or entry.get("host_path")
                        or entry.get("src")
                    ),
                    "service": entry.get("service"),
                    "target": (
                        entry.get("target")
                        or entry.get("mount_target")
                        or entry.get("container_path")
                        or entry.get("destination")
                    ),
                    "classification": (
                        entry.get("classification")
                        or entry.get("priority")
                        or entry.get("category")
                        or entry.get("kind")
                    ),
                    "reason": entry.get("reason"),
                }
            )
            continue

        normalized.append(
            {
                "source": _get_value(entry, "source", "path", "host_path", "src"),
                "service": _get_value(entry, "service"),
                "target": _get_value(
                    entry, "target", "mount_target", "container_path", "destination"
                ),
                "classification": _get_value(
                    entry, "classification", "priority", "category", "kind"
                ),
                "reason": _get_value(entry, "reason"),
            }
        )

    return normalized


def _extract_plan_sections(
    plan: Any,
) -> tuple[list[dict[str, Any]], list[dict[str, Any]], list[dict[str, Any]]]:
    include_entries = _normalize_entries(
        _get_value(plan, "include", "include_paths", "includes", default=[])
    )
    review_entries = _normalize_entries(
        _get_value(plan, "review", "review_paths", "reviews", default=[])
    )
    skip_entries = _normalize_entries(
        _get_value(plan, "skip", "skip_paths", "skips", default=[])
    )
    return include_entries, review_entries, skip_entries


def _entry_path(entry: dict[str, Any]) -> str:
    return str(entry.get("source") or "(unknown)")


def _entry_label(entry: dict[str, Any]) -> str:
    classification = entry.get("classification") or "unknown"
    service = entry.get("service") or "unknown"
    target = entry.get("target") or "unknown"
    reason = entry.get("reason")

    label = f"[{classification}] service={service} target={target}"
    if reason:
        label += f" reason={reason}"
    return label


def _print_section(title: str, entries: Iterable[dict[str, Any]]) -> None:
    entries = list(entries)
    print(f"{title}:")
    if not entries:
        print("  - (none)")
        return
    for entry in entries:
        print(f"  - {_entry_path(entry):<40} {_entry_label(entry)}")


def _collect_include_paths(include_entries: Iterable[dict[str, Any]]) -> list[str]:
    paths: list[str] = []
    seen: set[str] = set()
    for entry in include_entries:
        path = _entry_path(entry)
        if path == "(unknown)" or path in seen:
            continue
        seen.add(path)
        paths.append(path)
    return paths


def _print_borg_plan(
    compose_path: Path,
    project_root: Path,
    include_entries: list[dict[str, Any]],
    review_entries: list[dict[str, Any]],
    skip_entries: list[dict[str, Any]],
    repo: str | None,
) -> None:
    print()
    print("Borg Backup Plan")
    print("================")
    print(f"Compose file: {compose_path}")
    print(f"Project root: {project_root}")
    print()

    _print_section("INCLUDE PATHS", include_entries)
    print()
    _print_section("REVIEW PATHS", review_entries)
    print()
    _print_section("SKIP PATHS", skip_entries)

    include_paths = _collect_include_paths(include_entries)

    if repo and include_paths:
        command = build_borg_create_command(
            repo=repo,
            include_paths=include_paths,
        )
        print()
        print("Suggested borg create command")
        print("=============================")
        print(command_to_shell(command))


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="dockervault",
        description="DockerVault - intelligent Docker backup discovery",
    )
    parser.add_argument("compose", help="Path to docker-compose.yml")
    parser.add_argument(
        "--borg",
        action="store_true",
        help="Show borg backup plan and suggested command",
    )
    parser.add_argument(
        "--run-borg",
        action="store_true",
        help="Run borg create using discovered include paths",
    )
    parser.add_argument(
        "--repo",
        help="Borg repository path, e.g. /mnt/backups/borg/dockervault",
    )
    parser.add_argument(
        "--passphrase",
        help="Optional borg passphrase",
    )
    parser.add_argument(
        "--quiet",
        action="store_true",
        help="Suppress borg stdout/stderr output during execution",
    )
    parser.add_argument(
        "--automation",
        action="store_true",
        help="Automation mode: minimal output, non-interactive behavior",
    )
    parser.add_argument(
        "--fail-on-review",
        action="store_true",
        help="Exit with code 4 if review paths are present",
    )
    return parser


def main() -> None:
    parser = build_parser()
    args = parser.parse_args()

    compose_path = Path(args.compose).expanduser().resolve()

    if not compose_path.exists():
        print(f"Error: compose file not found: {compose_path}", file=sys.stderr)
        sys.exit(1)

    try:
        plan = classify_compose(compose_path)
    except Exception as exc:
        print(f"Error: failed to classify compose file: {exc}", file=sys.stderr)
        sys.exit(1)

    include_entries, review_entries, skip_entries = _extract_plan_sections(plan)
    include_paths = _collect_include_paths(include_entries)
    project_root = compose_path.parent

    should_show_plan = args.borg or (not args.automation and not args.quiet)

    if should_show_plan:
        _print_borg_plan(
            compose_path=compose_path,
            project_root=project_root,
            include_entries=include_entries,
            review_entries=review_entries,
            skip_entries=skip_entries,
            repo=args.repo,
        )

    if args.fail_on_review and review_entries:
        if args.automation or args.quiet:
            print("REVIEW required", file=sys.stderr)
        else:
            print()
            print("Review required before automated backup can proceed.", file=sys.stderr)
        sys.exit(4)

    if args.run_borg:
        if not args.repo:
            print("Error: --run-borg requires --repo", file=sys.stderr)
            sys.exit(2)
        if not include_paths:
            print("Error: no include paths found for borg backup", file=sys.stderr)
            sys.exit(3)

        if not args.quiet:
            print()
            print("Running borg backup...")
            print("======================")

        exit_code = run_borg_create(
            repo=args.repo,
            include_paths=include_paths,
            passphrase=args.passphrase,
            quiet=args.quiet,
            stats=not args.quiet,
            progress=not args.quiet,
        )

        if exit_code != 0:
            print(f"Error: borg exited with status {exit_code}", file=sys.stderr)
            sys.exit(exit_code)

        if not args.quiet:
            print()
            print("Borg backup completed successfully.")

    sys.exit(0)


if __name__ == "__main__":
    main()


@@ -1,63 +1,31 @@
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class MountEntry:
    source: Path
    service: str = "unknown"
    target: str = "unknown"
    classification: str = "unknown"
    reason: str = ""
    exists: bool = False


@dataclass
class ValidationResult:
    missing: list[MountEntry]
    present: list[MountEntry]


@dataclass
class BorgSettings:
    repo: str
    archive_name: str
    passphrase_present: bool
    automation: bool
    auto_init_repo: bool
    encryption: str
    quiet: bool = False


@@ -1,222 +1,165 @@
from __future__ import annotations

from pathlib import Path
from typing import Any, Dict, List

import yaml

from dockervault.classification.models import MountCandidate


class DockerComposeScanner:
    def __init__(self, compose_file: str | Path):
        self.compose_file = Path(compose_file)
        self.base_dir = self.compose_file.parent

    def load_compose(self) -> Dict[str, Any]:
        with self.compose_file.open("r", encoding="utf-8") as f:
            return yaml.safe_load(f) or {}

    def scan(self) -> List[MountCandidate]:
        compose = self.load_compose()
        services = compose.get("services", {})
        project_name = compose.get("name") or self.base_dir.name
        candidates: List[MountCandidate] = []

        for service_name, service_def in services.items():
            image = service_def.get("image", "")
            env = self._normalize_environment(service_def.get("environment", {}))
            volumes = service_def.get("volumes", [])

            for volume in volumes:
                candidate = self._parse_volume(
                    service_name=service_name,
                    image=image,
                    volume=volume,
                    env=env,
                    compose_project=project_name,
                )
                if candidate:
                    candidates.append(candidate)

        return candidates

    def _normalize_environment(self, env: Any) -> Dict[str, str]:
        if isinstance(env, dict):
            return {str(k): str(v) for k, v in env.items()}
        if isinstance(env, list):
            parsed: Dict[str, str] = {}
            for item in env:
                if isinstance(item, str) and "=" in item:
                    key, value = item.split("=", 1)
                    parsed[key] = value
            return parsed
        return {}

    def _parse_volume(
        self,
        service_name: str,
        image: str,
        volume: Any,
        env: Dict[str, str],
        compose_project: str,
    ) -> MountCandidate | None:
        if isinstance(volume, str):
            return self._parse_short_syntax(
                service_name=service_name,
                image=image,
                volume=volume,
                env=env,
                compose_project=compose_project,
            )
        if isinstance(volume, dict):
            return self._parse_long_syntax(
                service_name=service_name,
                image=image,
                volume=volume,
                env=env,
                compose_project=compose_project,
            )
        return None

    def _parse_short_syntax(
        self,
        service_name: str,
        image: str,
        volume: str,
        env: Dict[str, str],
        compose_project: str,
    ) -> MountCandidate | None:
        parts = volume.split(":")
        if len(parts) == 1:
            # Anonymous volume style: "/data"
            return MountCandidate(
                service_name=service_name,
                image=image,
                source="",
                target=parts[0],
                mount_type="volume",
                read_only=False,
                env=env,
                compose_project=compose_project,
            )
        if len(parts) >= 2:
            source = parts[0]
            target = parts[1]
            options = parts[2:] if len(parts) > 2 else []
            read_only = "ro" in options
            mount_type = self._guess_mount_type(source)
            return MountCandidate(
                service_name=service_name,
                image=image,
                source=source,
                target=target,
                mount_type=mount_type,
                read_only=read_only,
                env=env,
                compose_project=compose_project,
            )
        return None

    def _parse_long_syntax(
        self,
        service_name: str,
        image: str,
        volume: Dict[str, Any],
        env: Dict[str, str],
        compose_project: str,
    ) -> MountCandidate | None:
        source = volume.get("source", "") or volume.get("src", "")
        target = volume.get("target", "") or volume.get("dst", "") or volume.get("destination", "")
        mount_type = volume.get("type", self._guess_mount_type(str(source)))
        read_only = bool(volume.get("read_only", False))
        if not target:
            return None
        return MountCandidate(
            service_name=service_name,
            image=image,
            source=str(source),
            target=str(target),
            mount_type=str(mount_type),
            read_only=read_only,
            env=env,
            compose_project=compose_project,
        )

    def _guess_mount_type(self, source: str) -> str:
        if not source:
            return "volume"
        if source.startswith("/") or source.startswith("./") or source.startswith("../"):
            return "bind"
        return "volume"
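The short-syntax split logic in the scanner can be exercised in isolation. This standalone sketch mirrors `_parse_short_syntax` and `_guess_mount_type` without the `MountCandidate` dependency (the function name here is illustrative):

```python
def parse_short_volume(volume: str) -> dict:
    # "src:dst[:options]" -> source, target, mount type, read-only flag
    parts = volume.split(":")
    if len(parts) == 1:
        # Anonymous volume style: "/data"
        return {"source": "", "target": parts[0], "type": "volume", "ro": False}
    source, target = parts[0], parts[1]
    options = parts[2:]
    # A source starting with /, ./ or ../ is a bind mount; otherwise a named volume.
    mount_type = "bind" if source.startswith(("/", "./", "../")) else "volume"
    return {"source": source, "target": target, "type": mount_type, "ro": "ro" in options}
```

For example, `parse_short_volume("./db:/var/lib/mysql")` yields a `bind` mount, while `parse_short_volume("data:/data")` yields a named `volume`.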


@@ -0,0 +1,47 @@
from dockervault.classification.engine import ClassificationEngine
from dockervault.classification.models import MountCandidate, Classification


def test_minecraft():
    engine = ClassificationEngine()
    c = MountCandidate(
        service_name="mc",
        image="itzg/minecraft-server",
        source="data",
        target="/data",
        mount_type="bind",
    )
    result = engine.classify(c)
    assert result.classification == Classification.CRITICAL


def test_database():
    engine = ClassificationEngine()
    c = MountCandidate(
        service_name="db",
        image="mysql",
        source="db",
        target="/var/lib/mysql",
        mount_type="bind",
    )
    result = engine.classify(c)
    assert result.classification == Classification.CRITICAL


def test_logs():
    engine = ClassificationEngine()
    c = MountCandidate(
        service_name="nginx",
        image="nginx",
        source="logs",
        target="/var/log/nginx",
        mount_type="bind",
    )
    result = engine.classify(c)
    assert result.classification == Classification.OPTIONAL

dockervault/validation.py (new file)

@@ -0,0 +1,44 @@
from __future__ import annotations

from pathlib import Path

from .models import MountEntry, ValidationResult


def validate_paths(include_entries: list[MountEntry], review_entries: list[MountEntry]) -> ValidationResult:
    missing: list[MountEntry] = []
    present: list[MountEntry] = []
    for entry in [*include_entries, *review_entries]:
        entry.exists = entry.source.exists()
        if entry.exists:
            present.append(entry)
        else:
            missing.append(entry)
    return ValidationResult(missing=missing, present=present)


def mkdir_target_for_missing(entry: MountEntry) -> Path:
    """
    Heuristic:
    - If the path looks like a file path (has a suffix), create its parent directory.
    - Otherwise create the directory path itself.
    """
    source = entry.source
    if source.suffix and not source.name.startswith("."):
        return source.parent
    return source


def apply_mkdir_for_missing(missing: list[MountEntry]) -> list[Path]:
    created: list[Path] = []
    for entry in missing:
        target = mkdir_target_for_missing(entry)
        if target.exists():
            continue
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
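The file-vs-directory heuristic can be checked on bare paths without a `MountEntry`; this is a minimal sketch of the same rule (the dotfile guard matters because names like `.config.yml` carry a suffix but should still be treated as files to create next to, while `.env` has no suffix at all under `pathlib`):

```python
from pathlib import Path


def mkdir_target(source: Path) -> Path:
    # A suffix suggests a file path: create the parent directory instead.
    # Dotfiles like ".env" have an empty suffix, so they fall through to the
    # directory branch; ".config.yml" has a suffix but is excluded explicitly.
    if source.suffix and not source.name.startswith("."):
        return source.parent
    return source
```

Note the heuristic is deliberately conservative: a directory named `backup.d` would be misread as a file, which is why missing paths land in review rather than being created silently in automation mode.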


@@ -0,0 +1,171 @@
from __future__ import annotations

import json
import shutil
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class NamedVolumeResolution:
    compose_name: str
    docker_name: str | None
    mountpoint: Path | None
    available: bool
    reason: str | None = None


def docker_available() -> bool:
    return shutil.which("docker") is not None


def run_docker_volume_inspect(volume_name: str) -> dict[str, Any] | None:
    if not docker_available():
        return None
    try:
        result = subprocess.run(
            ["docker", "volume", "inspect", volume_name],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return None
    if result.returncode != 0:
        return None
    try:
        data = json.loads(result.stdout)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list) or not data:
        return None
    item = data[0]
    if not isinstance(item, dict):
        return None
    return item


def infer_project_name(compose_path: Path, compose_data: dict[str, Any]) -> str:
    top_level_name = compose_data.get("name")
    if isinstance(top_level_name, str) and top_level_name.strip():
        return top_level_name.strip()
    return compose_path.parent.name


def normalize_top_level_volume_name(
    volume_key: str,
    compose_data: dict[str, Any],
) -> tuple[str | None, bool]:
    """
    Returns:
        (explicit_name_or_none, is_external)
    """
    volumes = compose_data.get("volumes", {})
    if not isinstance(volumes, dict):
        return None, False
    cfg = volumes.get(volume_key)
    if not isinstance(cfg, dict):
        return None, False
    explicit_name = cfg.get("name")
    if not isinstance(explicit_name, str):
        explicit_name = None
    external = cfg.get("external", False)
    is_external = False
    if isinstance(external, bool):
        is_external = external
    elif isinstance(external, dict):
        is_external = True
        ext_name = external.get("name")
        if isinstance(ext_name, str) and ext_name.strip():
            explicit_name = ext_name.strip()
    return explicit_name, is_external


def build_volume_candidates(
    compose_name: str,
    compose_path: Path,
    compose_data: dict[str, Any],
) -> list[str]:
    """
    Try likely Docker volume names in a sensible order.
    """
    candidates: list[str] = []
    project_name = infer_project_name(compose_path, compose_data)
    explicit_name, is_external = normalize_top_level_volume_name(compose_name, compose_data)

    # 1) explicit external/name override
    if explicit_name:
        candidates.append(explicit_name)
    # 2) external volumes often use the raw name directly
    if is_external:
        candidates.append(compose_name)
    # 3) raw compose source
    candidates.append(compose_name)
    # 4) compose-created default name: <project>_<volume>
    candidates.append(f"{project_name}_{compose_name}")

    # de-dup while preserving order
    unique: list[str] = []
    seen: set[str] = set()
    for c in candidates:
        if c not in seen:
            unique.append(c)
            seen.add(c)
    return unique


def resolve_named_volume(
    compose_name: str,
    compose_path: Path,
    compose_data: dict[str, Any],
) -> NamedVolumeResolution:
    if not docker_available():
        return NamedVolumeResolution(
            compose_name=compose_name,
            docker_name=None,
            mountpoint=None,
            available=False,
            reason="docker CLI not available",
        )
    for candidate in build_volume_candidates(compose_name, compose_path, compose_data):
        inspected = run_docker_volume_inspect(candidate)
        if not inspected:
            continue
        mountpoint = inspected.get("Mountpoint")
        if isinstance(mountpoint, str) and mountpoint.strip():
            return NamedVolumeResolution(
                compose_name=compose_name,
                docker_name=candidate,
                mountpoint=Path(mountpoint),
                available=True,
                reason=None,
            )
    return NamedVolumeResolution(
        compose_name=compose_name,
        docker_name=None,
        mountpoint=None,
        available=False,
        reason="docker volume not found or not inspectable",
    )


@@ -0,0 +1,35 @@
version: "3.9"

services:
  db:
    image: mariadb:10.11
    container_name: dv-db
    restart: unless-stopped
    environment:
      MYSQL_ROOT_PASSWORD: example
      MYSQL_DATABASE: testdb
      MYSQL_USER: test
      MYSQL_PASSWORD: test
    volumes:
      - ./db:/var/lib/mysql

  mc:
    image: itzg/minecraft-server:latest
    container_name: dv-mc
    restart: unless-stopped
    environment:
      EULA: "TRUE"
      MEMORY: "1G"
    ports:
      - "25565:25565"
    volumes:
      - ./mc-missing:/data  # <-- intentionally missing on disk

  nginx:
    image: nginx:latest
    container_name: dv-nginx
    restart: unless-stopped
    ports:
      - "8080:80"
    volumes:
      - ./logs:/var/log/nginx
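All three services in this fixture use relative bind mounts, which is what the scanner's short-syntax rule should surface. A quick standalone sanity check of that expectation (the helper is illustrative, not part of DockerVault):

```python
fixture_volumes = {
    "db": "./db:/var/lib/mysql",
    "mc": "./mc-missing:/data",
    "nginx": "./logs:/var/log/nginx",
}


def is_bind_mount(volume: str) -> bool:
    # Short syntax: a source starting with /, ./ or ../ is a bind mount.
    return volume.split(":")[0].startswith(("/", "./", "../"))


bind_services = sorted(svc for svc, vol in fixture_volumes.items() if is_bind_mount(vol))
```

The `mc` mount is the interesting one: it is a bind mount whose source directory deliberately does not exist, so it should land in the validation step's `missing` list.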