Backups: Difference between revisions

From Rixort Wiki
* [https://jpmens.net/2019/04/15/i-mirror-my-github-repositories-to-gitea/ I mirror my Github repositories to Gitea]
* [https://github.com/PyGithub/PyGithub PyGithub]
[[Category:Open Source Software]]

Latest revision as of 10:07, 24 September 2024

Topics for consideration

  • How can I start backups automatically when I log in? The backup should not start until a network connection is available and, possibly, my keyring has been unlocked.
  • How can I start a backup run periodically? Is this necessary given that I usually log in at least once every 24 hours?
  • How can I put an icon in the system menu that shows backup status/progress? Something similar to Nextcloud's would be useful: a tick for complete, circular arrows for in progress, and a cross for failed.

Existing software

  • tar
  • Obnam
  • restic
  • Borg
  • rdiff-backup
  • deja-dup
  • rclone (for backing up from/to cloud services such as Google Docs)

Writing new software

Powerful Command-Line Applications in Go - may be useful

MVP:

  • Include file (one per line)
  • Exclude file (one per line)
  • Copy files from X to Y
  • Database (SQLite? DuckDB?) to hold metadata (map hashes to file paths)
  • File size and SHA-3 for deduplication
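
One way the MVP items could fit together is sketched below, in Python for brevity rather than in whatever language the tool ends up using. Everything named here is an assumption for illustration: the helper names (read_patterns, sha3_of, open_metadata, backup), the table layout, the choice of SHA3-256, and the decision to store each unique file once under its hash in the target directory. Only exclude patterns are applied; an include file would simply supply several source roots read the same way.

```python
import fnmatch
import hashlib
import shutil
import sqlite3
from pathlib import Path


def read_patterns(list_file: Path) -> list[str]:
    """Return the non-empty lines of an include or exclude file (one per line)."""
    return [line.strip() for line in list_file.read_text().splitlines() if line.strip()]


def sha3_of(path: Path) -> str:
    """Hash a file with SHA3-256, reading it in 1 MiB blocks."""
    digest = hashlib.sha3_256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def open_metadata(db_path) -> sqlite3.Connection:
    """Open the metadata database (hypothetical schema: path, size, hash)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY, size INTEGER NOT NULL, sha3 TEXT NOT NULL)"
    )
    return conn


def backup(source: Path, target: Path, excludes: list[str], conn) -> int:
    """Copy files from source to target, one blob per unique hash.

    Returns the number of blobs actually copied.
    """
    copied = 0
    target.mkdir(parents=True, exist_ok=True)
    for path in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = path.relative_to(source).as_posix()
        # fnmatch's "*" also matches "/", so "*.tmp" excludes files at any depth.
        if any(fnmatch.fnmatch(relative, pattern) for pattern in excludes):
            continue
        digest = sha3_of(path)
        blob = target / digest
        if not blob.exists():  # content already stored: deduplicated
            shutil.copyfile(path, blob)
            copied += 1
        conn.execute(
            "INSERT OR REPLACE INTO files (path, size, sha3) VALUES (?, ?, ?)",
            (relative, path.stat().st_size, digest),
        )
    conn.commit()
    return copied
```

Restoring would then be a matter of walking the files table and copying each blob back to its recorded path.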

Thoughts:

  • Can the hash checking of files be done in parallel?
  • Should we compare file size first, and only calculate hashes if the sizes are the same?
  • What metadata should we store about each file?
  • How to restore files?
  • How to prune files?
  • Deduplication at the file level - could also do this with chunks of files at a later date?
  • Can we compress the files? Can this be done in parallel?
  • What is the fastest collision-resistant algorithm for file comparison?
  • How do we stop two backup processes running simultaneously?
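
Two of the questions above (comparing sizes first, and hashing in parallel) can be explored with a small sketch. This is Python for illustration only; the function names, the worker count, and the use of SHA3-256 are assumptions, not decisions. Files are grouped by size, and only files that share a size with at least one other file are hashed, using a thread pool since much of the time is likely spent waiting on disk reads.

```python
import hashlib
import os
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def sha3_of(path: str) -> str:
    """Hash a file with SHA3-256, reading it in 1 MiB blocks."""
    digest = hashlib.sha3_256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates(paths: list[str], workers: int = 4) -> list[list[str]]:
    """Return groups of paths with identical content, checking sizes first."""
    # Step 1: group by size. A file with a unique size cannot have a duplicate,
    # so it never needs to be hashed.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)
    candidates = [p for group in by_size.values() if len(group) > 1 for p in group]

    # Step 2: hash the remaining candidates in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(sha3_of, candidates))

    # Step 3: group by hash and keep only groups with more than one member.
    by_hash = defaultdict(list)
    for path, digest in zip(candidates, digests):
        by_hash[digest].append(path)
    return sorted(sorted(group) for group in by_hash.values() if len(group) > 1)
```

The size check only saves work when comparing files within one run; checking a new file against an existing store would need the sizes of stored files to be recorded in the metadata database as well.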

Security:

  • Assumption is that you trust the backup target
  • If you want encryption at rest, use LUKS etc. to encrypt the underlying device
  • If you want encryption in transit, use SSH or TLS

GitHub