release: prepare v0.1.0

This commit is contained in:
Eddie Nielsen 2026-03-24 13:40:06 +00:00
parent 44c98f5917
commit 3cfbcfd390
3 changed files with 404 additions and 386 deletions

CHANGELOG.md (new file, 23 additions)

@@ -0,0 +1,23 @@
# Changelog
All notable changes to this project will be documented in this file.
---
## [0.1.0] - 2026-03-24
### Added
- Initial DockerVault CLI
- Recursive Docker Compose scanning
- Classification engine (critical / review / skip)
- Named volume detection and resolution
- Missing path detection
- Borg backup command generation
- Automation mode (`--automation`, `--quiet`)
- Exit codes for scripting
- Initial pytest test suite
- Project README and documentation
### Notes
- First public foundation release of DockerVault
- Focused on backup discovery for real Docker environments

README.md

@@ -6,7 +6,7 @@
> Intelligent Docker backup discovery for real systems
DockerVault scans your Docker environments and figures out what actually matters to back up — automatically.
No guesswork. No forgotten volumes. No broken restores.
@@ -14,448 +14,175 @@ No guesswork. No forgotten volumes. No broken restores.
## 📚 Contents
* 🚀 What is DockerVault?
* ⚡ Quick Start
* 🧠 How it Works
* 🗂 Classification Model
* 💾 Borg Integration
* 🤖 Automation Mode
* 🔢 Exit Codes
* 🛠 Tech Stack
* 🔍 Example
* 🔥 Future Ideas
* 📜 License
* ❤️ Credits
---
## 🚀 What is DockerVault?
DockerVault is a CLI tool that scans Docker environments and determines what actually needs to be backed up.
It understands:
- Docker Compose setups
- bind mounts
- named volumes
- service-specific data paths
Instead of guessing, DockerVault builds a structured backup plan.
---
## ⚡ Quick Start
```bash
git clone https://git.lanx.dk/ed/dockervault.git
cd dockervault
pip install -e .
```
Run a scan:

```bash
python -m dockervault.cli scan /your/docker/root --repo /backup
```
---
## 🧠 How it Works
DockerVault works in layers:
1. Scan for docker-compose.yml
2. Parse services and volumes
3. Resolve:
   - bind mounts
   - named volumes
4. Classify paths:
   - critical
   - review
   - skip
5. Generate backup plan
---
## 🗂 Classification Model
DockerVault sorts paths into:
### ✅ INCLUDE (critical)
Must be backed up.
Examples:
- /var/lib/mysql
- /var/lib/postgresql/data
- /data

### ⚠️ REVIEW
Needs human decision.
Triggered when:
- no classification rule matches (e.g. uploads, config folders)
- the path does not exist, or exists but is empty
- the volume is Docker-managed (a named volume)

Example:
```
./mc-missing → /data
```

### ❌ SKIP
Safe to ignore.
Examples:
- logs (/var/log)
- cache
- temp data (/tmp)
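A toy version of this rules-based model, using the example paths above (the helper name is hypothetical; it is not the classifier DockerVault ships):

```python
def classify_path(target: str) -> str:
    """Sort a container target path into include / review / skip."""
    t = (target or "").lower()
    # Known database/application data paths must be backed up.
    if t in {"/var/lib/mysql", "/var/lib/postgresql/data", "/data"}:
        return "include"
    # Logs, caches and temp data are safe to ignore.
    if "/log" in t or "cache" in t or "/tmp" in t:
        return "skip"
    # Everything else (uploads, config folders, ...) needs a human decision.
    return "review"
```

Anything the rules cannot place confidently falls through to `review` rather than being silently skipped.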
---
## 💾 Borg Integration
DockerVault can generate ready-to-use Borg commands:

```bash
borg create --stats --progress \
  /backup-repo::{hostname}-{now:%Y-%m-%d_%H-%M} \
  /path1 \
  /path2
```

Archive names are generated automatically, and backup paths are deduplicated before the command is built.
This makes it easy to plug into:
- cron jobs
- scripts
- automation pipelines
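A minimal sketch of how such a command could be assembled. Here the archive name is expanded in Python rather than via Borg's own `{hostname}`/`{now}` placeholders, and the function name is illustrative:

```python
import socket
from datetime import datetime


def build_borg_command(repo: str, paths: list[str]) -> list[str]:
    """Build a `borg create` argv with a timestamped archive name."""
    archive = f"{socket.gethostname()}-{datetime.now():%Y-%m-%d_%H-%M}"
    cmd = ["borg", "create", "--stats", "--progress", f"{repo}::{archive}"]
    seen: set[str] = set()
    for path in paths:
        if path not in seen:  # keep each path once, preserving order
            seen.add(path)
            cmd.append(path)
    return cmd
```

The resulting list can be passed straight to `subprocess.run` from a cron job or script.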
---
## 🤖 Automation Mode
```bash
python -m dockervault.cli scan /path --automation --quiet
```

No plan output, no interactive prompts — minimal, log-friendly output.

Designed for:
- scheduled backups
- CI/CD pipelines
- unattended systems
---
## 🔢 Exit Codes
- `0` = Success
- `1` = Missing critical paths
- `2` = General error
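In a cron or CI wrapper, the codes can be mapped back to log messages. A sketch — the wrapper function is an assumption; only the exit codes come from the list above:

```python
import subprocess

# Exit codes as documented above.
EXIT_MEANINGS = {0: "success", 1: "missing critical paths", 2: "general error"}


def run_backup(argv: list[str]) -> str:
    """Run a backup command and translate its exit code into a message."""
    result = subprocess.run(argv)
    return EXIT_MEANINGS.get(result.returncode, f"unknown exit code {result.returncode}")
```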
---
## 🛠 Tech Stack
- Python 3
- Docker Compose parsing
- Filesystem analysis
- Borg backup integration
- pytest (testing)
---
## 🔍 Example
```
DockerVault Backup Plan
=======================

INCLUDE PATHS:
- ./db [critical]
- ./mc [critical]

WARNING: Missing critical paths detected
- ./db (service=db)
```
---
## 🔥 Future Ideas
- Notifications (mail, ntfy)
- Web interface
- Backup reports
- Restore validation
- Smarter classification engine
- Docker API integration
---
## 📜 License
This project is licensed under the GNU GPL v3.
---
## ❤️ Credits

Built with ❤️ for Lanx by [NodeFox 🦊](https://nodefox.lanx.dk)

Maintained by [Eddie Nielsen](https://lanx.dk/ed)

Feel free to contribute, suggest improvements, or fork the project.


@@ -1,31 +1,299 @@
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any

import yaml

from .volume_inspector import (
    docker_available,
    infer_project_name,
    normalize_top_level_volume_name,
    run_docker_volume_inspect,
)

@dataclass
class ClassifiedEntry:
    service: str
    source: Path | str | None
    target: str
    classification: str
    reason: str
    compose: Path
    exists: bool = False
    volume_type: str | None = None

_PRIORITY = {
    "critical": 3,
    "review": 2,
    "optional": 1,
}

def _classify_target(target: str) -> tuple[str, str]:
    t = (target or "").lower()
    if t == "/var/lib/mysql":
        return "critical", f"mariadb critical path {target}"
    if t == "/var/lib/postgresql/data":
        return "critical", f"postgres critical path {target}"
    if t == "/data":
        return "critical", f"critical target path {target}"
    if "/log" in t or "cache" in t or "/tmp" in t:
        return "optional", f"ephemeral/log/cache target path {target}"
    return "review", "no rule matched"

def classify_mount(mount: dict) -> dict:
    classification, reason = _classify_target(mount.get("target", ""))
    out = dict(mount)
    out["classification"] = classification
    out["reason"] = reason
    return out

def _parse_volume_spec(spec: Any) -> dict[str, Any] | None:
    if isinstance(spec, dict):
        source = spec.get("source")
        target = spec.get("target")
        if not target:
            return None
        mount_type = spec.get("type")
        if mount_type is None:
            mount_type = (
                "volume"
                if source and not str(source).startswith((".", "/", "~"))
                else "bind"
            )
        return {
            "source": source,
            "target": target,
            "type": mount_type,
        }
    if not isinstance(spec, str):
        return None
    parts = spec.split(":")
    if len(parts) == 1:
        return {
            "source": "__anonymous_volume__",
            "target": parts[0],
            "type": "anonymous",
        }
    source = parts[0]
    target = parts[1]
    if source.startswith((".", "/", "~")):
        mount_type = "bind"
    else:
        mount_type = "volume"
    return {
        "source": source,
        "target": target,
        "type": mount_type,
    }

def _build_entry(
    *,
    service: str,
    compose_path: Path,
    source: Path | str | None,
    target: str,
    classification: str,
    reason: str,
    volume_type: str | None,
) -> ClassifiedEntry:
    exists = False
    if isinstance(source, Path):
        exists = source.exists()
    elif isinstance(source, str) and not source.startswith("__"):
        exists = Path(source).exists()
    return ClassifiedEntry(
        service=service,
        source=source,
        target=target,
        classification=classification,
        reason=reason,
        compose=compose_path,
        exists=exists,
        volume_type=volume_type,
    )

def _candidate_volume_names(
    *,
    named_volume: str,
    compose_data: dict[str, Any],
    project_name: str,
) -> list[str]:
    candidates: list[str] = []
    explicit_name, is_external = normalize_top_level_volume_name(
        named_volume,
        compose_data,
    )
    if explicit_name:
        candidates.append(explicit_name)
    if named_volume not in candidates:
        candidates.append(named_volume)
    if not is_external:
        project_prefixed = f"{project_name}_{named_volume}"
        if project_prefixed not in candidates:
            candidates.append(project_prefixed)
    return candidates

def classify_compose(compose_path: str | Path) -> list[ClassifiedEntry]:
    compose_path = Path(compose_path)
    with compose_path.open("r", encoding="utf-8") as f:
        compose_data = yaml.safe_load(f) or {}
    services = compose_data.get("services", {}) or {}
    project_name = infer_project_name(compose_path, compose_data)

    entries: list[ClassifiedEntry] = []
    for service_name, service_data in services.items():
        volumes = service_data.get("volumes", []) or []
        for raw_spec in volumes:
            parsed = _parse_volume_spec(raw_spec)
            if not parsed:
                continue
            source = parsed["source"]
            target = parsed["target"]
            mount_type = parsed["type"]

            if mount_type == "bind":
                if source and source.startswith("."):
                    resolved_source = (compose_path.parent / source).resolve()
                else:
                    resolved_source = Path(source).expanduser().resolve()
                classification, reason = _classify_target(target)
                entries.append(
                    _build_entry(
                        service=service_name,
                        compose_path=compose_path,
                        source=resolved_source,
                        target=target,
                        classification=classification,
                        reason=reason,
                        volume_type="bind",
                    )
                )
                continue

            if mount_type == "anonymous":
                entries.append(
                    _build_entry(
                        service=service_name,
                        compose_path=compose_path,
                        source="__anonymous_volume__",
                        target=target,
                        classification="review",
                        reason=f"anonymous volume for {target}",
                        volume_type="anonymous",
                    )
                )
                continue

            classification, reason = _classify_target(target)
            named_volume = source
            candidates = _candidate_volume_names(
                named_volume=named_volume,
                compose_data=compose_data,
                project_name=project_name,
            )
            if docker_available():
                resolved_source: Path | None = None
                resolved_candidate: str | None = None
                for candidate in candidates:
                    inspected = run_docker_volume_inspect(candidate)
                    if inspected and inspected.get("Mountpoint"):
                        resolved_source = Path(inspected["Mountpoint"]).resolve()
                        resolved_candidate = candidate
                        break
                if resolved_source is not None:
                    entries.append(
                        ClassifiedEntry(
                            service=service_name,
                            source=resolved_source,
                            target=target,
                            classification=classification,
                            reason=f"resolved named volume '{resolved_candidate}' for {target}",
                            compose=compose_path,
                            exists=resolved_source.exists(),
                            volume_type="volume",
                        )
                    )
                else:
                    explicit_name, _ = normalize_top_level_volume_name(
                        named_volume,
                        compose_data,
                    )
                    unresolved_name = explicit_name or named_volume
                    entries.append(
                        _build_entry(
                            service=service_name,
                            compose_path=compose_path,
                            source=f"__named_volume_unresolved__/{unresolved_name}",
                            target=target,
                            classification="review",
                            reason=f"named volume '{named_volume}' could not be resolved",
                            volume_type="volume",
                        )
                    )
            else:
                entries.append(
                    _build_entry(
                        service=service_name,
                        compose_path=compose_path,
                        source=named_volume,
                        target=target,
                        classification="review",
                        reason=f"docker CLI not available for named volume {named_volume}",
                        volume_type="volume",
                    )
                )

    deduped: dict[str, ClassifiedEntry] = {}
    for entry in entries:
        key = str(entry.source)
        existing = deduped.get(key)
        if existing is None:
            deduped[key] = entry
            continue
        if _PRIORITY[entry.classification] > _PRIORITY[existing.classification]:
            deduped[key] = entry
    return list(deduped.values())