mirror of
https://github.com/jkl1337/duplicacy.git
synced 2026-01-02 03:34:39 -06:00
some readme fixes
This commit is contained in:
16
README.md
16
README.md
@@ -18,27 +18,27 @@ NOTE: This project/repository is not affiliated with nor endorsed by Duplicacy,
|
||||
The generated preserves snapshots are backward compatible with vanilla versions of duplicacy and also do not increase the encoding size of metadata significantly. Unfortunately duplicacy does not have a formal forward-compatible snapshot versioning system, but that's not too surprising. This does mean that the data encoding is somewhat abusive of the existing format.
|
||||
|
||||
### Hard links
|
||||
The storage differs for regular files vs. every other target. Entry records contain a `Link` string field for the symlink target. When a likely hard linked file is encountered (`st_nlink > 1`) that entry is marked as a hard link root with the string "/" in the `Link` field and it is placed in an array, the index of this array will serve as a link address. Plain duplicacy only uses the `Link` field for symlinks. Files that hard link to this initial file have the index of the array of root files encoded as a base-16 integer into their `Link` field. These entries are placed in the snapshot with valid start/end chunk and offset values and all metadata is cloned so official Duplicacy will recover them as regular files with all metadata, it just will never make hard links.
|
||||
The storage differs for regular files vs. every other target. Entry records contain a `Link` string field for the symlink target. When a likely hard linked file is encountered (`st_nlink > 1`) that entry is marked as a hard link root with the string `"/"` in the `Link` field and it is placed in an array, the index of this array will serve as a link address. Plain duplicacy only uses the `Link` field for symlinks. Files that hard link to this initial file have the index of the array of root files encoded as a base-16 integer into their `Link` field. These entries are placed in the snapshot with valid start/end chunk and offset values and all metadata is cloned so official Duplicacy will recover them as regular files with all metadata, it just will never make hard links.
|
||||
For hard links to symlinks and special files, the `Link` isn't used. Instead, since these files never have content the `EndChunk/EndOffset` fields are used. A magic number (-9) is encoded in `EndChunk` for root entries and (-10) for clone/child entries. The `EndOffset` contains the index into the root entry array.
|
||||
|
||||
### Special Files
|
||||
Duplicacy simply skips special files. DuplicacyIX does not skip them. The `st_rdev` (device number) for character and block devices is stored with the lower 32-bits in `StartChunk` and the upper 32-bits in `StartOffset`, though no actual supported system uses anything bigger than 32-bits. The packing of this quantity is OS specific, but major, minor numbers are also OS specific.
|
||||
Duplicacy simply skips special files. Dupluxy does not skip them. The `st_rdev` (device number) for character and block devices is stored with the lower 32-bits in `StartChunk` and the upper 32-bits in `StartOffset`, though no actual supported system uses anything bigger than 32-bits. The packing of this quantity is OS specific, but major, minor numbers are also OS specific.
|
||||
|
||||
### File flags
|
||||
Files flags are stored in the extended attributes table with a short (2 character) OS specific key prefixed with a null-byte. Duplicacy will try to set these xattrs, however they will be ignored as the name appears to be empty with the initial null-byte.
|
||||
|
||||
|
||||
## Motivation
|
||||
Arguably system root directories are better preserved in a filesystem image format, however the line becomes blurred for home and data directories the former which tends to become a magnet for all kinds of data layout. This gives the option of a convenient random addressable cloud backup with easy partial restore while also being able to backup an nearly exact replica for use in disaster recovery. Nearly exact, the only metadata not preserved are times other than mtimes and ACLs on BSD-like systems.
|
||||
Hard links are a pain and might be better to not exist but in actual use, but things like git repos and SDKs have a tendency to use them. Often one has no choice but to deal with them, and forgoing preserving them is painful.
|
||||
Arguably system root directories are better preserved in a filesystem image format, however the line becomes blurred for home and data directories the former which tends to become a magnet for all kinds of data layout. This gives the option of a convenient random addressable cloud backup with easy partial restore while also being able to backup a nearly exact replica for use in disaster recovery. Nearly exact, the only metadata not preserved are times other than mtimes and ACLs on BSD-like systems.
|
||||
Hard links are a pain and might be better to not exist but in actual use things like git repos and SDKs have a tendency to use them. Often one has no choice but to deal with them, and forgoing preserving them is painful.
|
||||
File flags are primarily for the use case of btrfs snapshot backups, specifically with regards to compression and no-COW. The implementation applies certain flags immediately on open so that these flags apply to written blocks.
|
||||
Special files serve a couple purposes. Backup of FIFOs and sockets are primarily for preserving metadata since these files have no useful content and can always be created on the fly. The other is support for backup of overlay2 file systems. overlay2 uses character mode dev-nodes for whiteouts in addition to trusted namespace xattrs. DuplicacyIX should be able to faithfully reproduce overlay2 fs layers.
|
||||
Special files serve a couple purposes. Backup of FIFOs and sockets are primarily for preserving metadata since these files have no useful content and can always be created on the fly. The other is support for backup of overlay2 file systems. overlay2 uses character mode dev-nodes for whiteouts in addition to trusted namespace xattrs. Dupluxy should be able to faithfully reproduce overlay2 fs layers.
|
||||
|
||||
## Caveats/TODO
|
||||
* Improve command line options around processing hard links and special files, especially for restore. Right now if you don't have `mknod` capabilities and the snapshot has special device files you're going to need to use filters.
|
||||
* File flags for immutability aren't handled smartly. Specifically immutable and append only files will break badly with hardlinks, since hardlink creation is deferred to after flags application. The solution is not complicated but this is not a pressing use case.
|
||||
* Improve handling of preferences. There are preferences to enable and disable most features (not hardlinks though) with reasonable defaults but nothing is much documented. Take a look at the generated `.duplicacy/preferences` file.
|
||||
* File flags for immutability aren't handled smartly. Specifically immutable and append only files will break badly with hardlinks, since hardlink creation is deferred to after flags application.
|
||||
* Some corner cases of replacing existing files with hard links might end up breaking links if not doing a full restore. Again not a pressing use case. For the primary use of disaster recovery of large portions or an entire volume it works fine.
|
||||
* Possibly encode ACLs on Mac OS/FreeBSD. On Linux the crappy POSIX 1e ACLs that no one should use are picked up in the xattrs.
|
||||
* Possibly encode ACLs on Mac OS/FreeBSD. On Linux the crappy POSIX 1e ACLs that no one likes to use are picked up in the xattrs for free.
|
||||
|
||||
# Duplicacy: A lock-free deduplication cloud backup tool
|
||||
|
||||
|
||||
Reference in New Issue
Block a user