Backups
- I mirror my Github repositories to Gitea: https://jpmens.net/2019/04/15/i-mirror-my-github-repositories-to-gitea/
- PyGithub: https://github.com/PyGithub/PyGithub
Latest revision as of 10:07, 24 September 2024
Topics for consideration
- How can I start backups automatically when I log in? The backup should not start until a network connection is available and, possibly, until my keyring has been unlocked.
- How can I start a backup run periodically? Is this necessary given that I usually log in at least once every 24 hours?
- How can I put an icon in the system menu that shows backup status/progress? Something similar to Nextcloud would be useful, with a tick for complete, circular arrows for in progress, and a cross for failed.
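A login-triggered run and a periodic run could overlap, and neither should start before the network is up. The following is a minimal sketch of a start-up guard in Python, not a finished design: it assumes a Unix system (for `fcntl.flock`), and the function names, the lock file path and the retry defaults are all hypothetical. Waiting for the keyring to unlock is not covered.

```python
import fcntl
import socket
import time


def acquire_single_instance_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if another run holds it."""
    handle = open(lock_path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    # The caller must keep this object open for the duration of the backup;
    # closing it (or exiting the process) releases the lock.
    return handle


def network_available(host, port, timeout=3.0):
    """Probe connectivity by opening a TCP connection to the backup target."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(probe, attempts=10, delay=30.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A start-up script might call `acquire_single_instance_lock(...)` first, exit quietly if it returns `None`, and then call `wait_until(lambda: network_available(host, port))` with the backup target's host and port before starting the run.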
Existing software
- tar
- Obnam
- restic
- Borg
- rdiff-backup
- deja-dup
- rclone (for backing up from/to cloud services such as Google Drive)
Writing new software
Powerful Command-Line Applications in Go - may be useful
MVP:
- Include file (one per line)
- Exclude file (one per line)
- Copy files from X to Y
- Database (SQLite? DuckDB?) to hold metadata (map hashes to file paths)
- File size and SHA-3 for deduplication
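The MVP items above could fit together roughly as follows. This is a sketch under several assumptions that are not in the notes: Python and SQLite are used only for illustration, the include and exclude files list exact file paths (no directories or patterns), files are stored in the target directory under their SHA3-256 digest, and the `files` table layout and all function names are hypothetical.

```python
import hashlib
import shutil
import sqlite3
from pathlib import Path


def sha3_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha3_256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_list(list_file):
    """Read one entry per line, ignoring blank lines."""
    lines = Path(list_file).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def open_metadata(db_path):
    """Open the metadata database, creating the (hypothetical) table if needed."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY, size INTEGER NOT NULL, sha3 TEXT NOT NULL)"
    )
    return db


def backup_file(db, source, store_dir):
    """Copy source into store_dir under its digest unless that content is already stored."""
    source = Path(source)
    digest = sha3_of_file(source)
    target = Path(store_dir) / digest
    copied = False
    if not target.exists():
        shutil.copy2(source, target)
        copied = True
    # Record the path-to-hash mapping even when the content was deduplicated.
    db.execute(
        "INSERT OR REPLACE INTO files (path, size, sha3) VALUES (?, ?, ?)",
        (str(source), source.stat().st_size, digest),
    )
    db.commit()
    return copied


def backup(db, include_file, exclude_file, store_dir):
    """Back up every included path that is not listed in the exclude file."""
    excluded = set(read_list(exclude_file))
    Path(store_dir).mkdir(parents=True, exist_ok=True)
    for entry in read_list(include_file):
        if entry not in excluded:
            backup_file(db, entry, store_dir)
```

With this layout, two included files with identical content produce two rows in `files` but only one stored copy, and a restore would look up a path's digest and copy that object back.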
Thoughts:
- Can the hash checking of files be done in parallel?
- Should we compare file size first, and only calculate hashes if the sizes are the same?
- What metadata should we store about each file?
- How to restore files?
- How to prune files?
- Deduplication is at the file level; could it also be done on chunks of files at a later date?
- Can we compress the files? Can this be done in parallel?
- What is the fastest collision-resistant algorithm for file comparison?
- How do we stop two backup processes running simultaneously?
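Two of the questions above, comparing sizes before hashing and hashing in parallel, can be explored with a short experiment. The sketch below groups files by size, hashes only the files that share a size with at least one other file, and runs the hashing in a thread pool. It assumes Python with SHA3-256; the function names and the worker count are hypothetical, and whether threads or processes are faster would need measuring.

```python
import hashlib
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def sha3_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha3_256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths, workers=4):
    """Return groups of paths that have the same size and the same SHA3-256 digest."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    # A file with a unique size cannot be a duplicate, so it is never hashed.
    candidates = [p for group in by_size.values() if len(group) > 1 for p in group]

    # Hash the remaining candidates concurrently; map() preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(sha3_of_file, candidates))

    by_content = defaultdict(list)
    for path, digest in zip(candidates, digests):
        by_content[(os.path.getsize(path), digest)].append(path)
    return [group for group in by_content.values() if len(group) > 1]
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` is one way to compare the two approaches on real data.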
Security:
- The assumption is that you trust the backup target
- If you want encryption at rest, use LUKS or similar to encrypt the underlying device
- If you want encryption in transit, use SSH or TLS