90 Commits

Author SHA1 Message Date
Gilbert Chen
d92b1734f4 Skip identical entries when listing chunks
The prune command can remove redundant chunks (chunks with the same chunk id
but at different subdirectory levels).  However, if the same chunk appears
multiple times in the listing returned by the storage, it will be treated as
a redundant chunk and thus removed.
2023-09-27 15:31:08 -04:00
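The distinction drawn in the commit above can be sketched as follows. This is a minimal illustration, not the project's code: the names `chunkID` and `findRedundant` are hypothetical, and it assumes that a chunk id is simply the listed path with its directory separators removed.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkID is a hypothetical helper: it assumes the chunk id is the listed
// path with directory separators removed, so "ab/cdef" and "abcdef" match.
func chunkID(path string) string {
	return strings.ReplaceAll(path, "/", "")
}

// findRedundant returns paths whose chunk id was already seen under a
// different path. A path that merely repeats in the listing is skipped.
func findRedundant(listing []string) []string {
	seenPaths := make(map[string]bool)
	seenIDs := make(map[string]bool)
	var redundant []string
	for _, path := range listing {
		if seenPaths[path] {
			continue // identical entry returned twice by the storage
		}
		seenPaths[path] = true
		id := chunkID(path)
		if seenIDs[id] {
			redundant = append(redundant, path) // same id, different level
			continue
		}
		seenIDs[id] = true
	}
	return redundant
}

func main() {
	listing := []string{"ab/cdef", "ab/cdef", "abcdef"}
	// The repeated "ab/cdef" is ignored; only "abcdef" is reported.
	fmt.Println(findRedundant(listing))
}
```

Tracking seen paths separately from seen ids is what keeps a duplicated listing entry from being mistaken for a second copy of the chunk.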
Gilbert Chen
cdf8f5a857 Check the length of 'file' before checking if it ends with '/' 2023-04-09 22:11:08 -04:00
David Zhang
df80096cdf Acquire verifiedChunksLock in saveVerifiedChunks 2023-03-12 21:00:51 -07:00
Gilbert Chen
b8c7594dbf Release the chunk used to download files when finished
Without this fix, a chunk is leaked for each snapshot checked
with `-files`.
2022-12-06 22:46:25 -05:00
Gilbert Chen
58f0d2be5a Fixed a bug that didn't preserve the version bit when copying old snapshots
The version bit should not be set to 1 when encoding a snapshot.  Instead,
it must be set to 1 on snapshot creation.

To correctly process old snapshots encoded incorrectly with version bit set
to 1, the first byte of the encoded file list is also checked.  If the first
byte is `[`, then it must be an old snapshot, since the file list in the new
snapshot format always starts with a string encoded in msgpack, the first
byte of which can't be `[`.
2022-11-22 21:31:24 -05:00
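The first-byte check described in the commit above can be sketched like this. The function name `looksLikeOldFileList` and the sample payloads are hypothetical. The only facts relied on are that `[` is byte 0x5b and that msgpack strings begin with 0xa0-0xbf, 0xd9, 0xda, or 0xdb.

```go
package main

import "fmt"

// looksLikeOldFileList reports whether an encoded file list starts with '['.
// A msgpack-encoded string never starts with '[' (0x5b), so a leading '['
// can only come from the old format.
func looksLikeOldFileList(encoded []byte) bool {
	return len(encoded) > 0 && encoded[0] == '['
}

func main() {
	// Hypothetical payloads, for illustration only.
	oldStyle := []byte(`[{"path":"a.txt"}]`)
	newStyle := []byte{0xa5, 'a', '.', 't', 'x', 't'} // msgpack fixstr, length 5

	fmt.Println(looksLikeOldFileList(oldStyle)) // true
	fmt.Println(looksLikeOldFileList(newStyle)) // false
}
```

The length guard matters for the same reason as any first-byte test: an empty file list has no first byte to inspect.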
Gilbert Chen
bc2d762e41 Add -rewrite to the check command to fix corrupted chunks
This option is useful only when erasure coding is enabled.  It will
download and re-upload chunks that contain corruption but are
generally recoverable.  It can also be used to fix chunks that
are created by 3.0.1 on arm64 machines with wrong hashes.
2022-11-15 11:47:02 -05:00
Gilbert Chen
d9f6545d63 Rewrite the backup procedure to reduce memory usage
Main changes:

* Change the listing order of files/directories so that the local and remote
  snapshots can be compared on-the-fly.

* Introduce a new struct called EntryList that maintains a list of
  files/directories, which are kept in memory when the number is low, and
  serialized into a file when there are too many.

* EntryList can also be turned into an on-disk incomplete snapshot quickly,
  to support fast-resume on next run.

* ChunkOperator can now download and upload chunks, thus replacing original
  ChunkDownloader and ChunkUploader.  The new ChunkDownloader is only used
  to prefetch chunks during the restore operation.
2021-10-24 23:34:49 -04:00
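The EntryList idea above (keep entries in memory while there are few, serialize to a file once there are many) can be illustrated with a small sketch. It is not the project's EntryList: the type `SpillList`, its methods, the `Entry` fields, the JSON-lines encoding, and the temporary file name are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"os"
)

// Entry is a hypothetical stand-in for a file or directory record.
type Entry struct {
	Path string `json:"path"`
	Size int64  `json:"size"`
}

// SpillList keeps up to limit entries in memory and writes the rest to a
// temporary file, one JSON document per line.
type SpillList struct {
	limit    int
	inMemory []Entry
	file     *os.File
	encoder  *json.Encoder
}

func NewSpillList(limit int) *SpillList { return &SpillList{limit: limit} }

// Add stores e in memory, or on disk once the in-memory limit is reached.
func (l *SpillList) Add(e Entry) error {
	if l.file == nil && len(l.inMemory) < l.limit {
		l.inMemory = append(l.inMemory, e)
		return nil
	}
	if l.file == nil {
		f, err := os.CreateTemp("", "entries-*.jsonl")
		if err != nil {
			return err
		}
		l.file = f
		l.encoder = json.NewEncoder(f)
	}
	return l.encoder.Encode(e)
}

// Spilled reports whether any entry has been written to disk.
func (l *SpillList) Spilled() bool { return l.file != nil }

// Each visits all entries in insertion order: memory first, then the file.
func (l *SpillList) Each(fn func(Entry)) error {
	for _, e := range l.inMemory {
		fn(e)
	}
	if l.file == nil {
		return nil
	}
	if _, err := l.file.Seek(0, io.SeekStart); err != nil {
		return err
	}
	dec := json.NewDecoder(l.file)
	for {
		var e Entry
		err := dec.Decode(&e)
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return err
		}
		fn(e)
	}
	// Return to the end so that later Add calls append.
	_, err := l.file.Seek(0, io.SeekEnd)
	return err
}

// Close removes the temporary file, if one was created.
func (l *SpillList) Close() error {
	if l.file == nil {
		return nil
	}
	name := l.file.Name()
	if err := l.file.Close(); err != nil {
		return err
	}
	return os.Remove(name)
}

func main() {
	list := NewSpillList(2)
	defer list.Close()
	for _, p := range []string{"a", "b", "c"} {
		if err := list.Add(Entry{Path: p}); err != nil {
			panic(err)
		}
	}
	if err := list.Each(func(e Entry) { fmt.Println(e.Path) }); err != nil {
		panic(err)
	}
}
```

With a limit of 2, the third entry goes to the temporary file, yet iteration still yields the entries in the order they were added.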
Gilbert Chen
f83e4f3c44 Fix SNAPSHOT_INACTIVE log message 2021-01-28 00:06:49 -05:00
Gilbert Chen
7f834e84f6 Don't attempt to load verified_chunks when it doesn't exist. 2020-10-09 14:22:45 -04:00
Gilbert Chen
d7c1903d5a Skip chunks already verified in previous runs for check -chunks.
This is done by storing the list of verified chunks in a file
`.duplicacy/cache/<storage>/verified_chunks`.
2020-10-08 19:59:39 -04:00
gilbertchen
1fedfd1b1a Merge branch 'master' into mac-exclude 2020-09-25 20:13:43 -04:00
Gilbert Chen
6841c989c6 Fixed a bug that caused check -chunks -persist to succeed with broken chunks
The bug was not setting the `isBroken` flag in WaitForChunk()
2020-09-24 14:53:42 -04:00
Gilbert Chen
d0b3b5dc2e Print progress logs when verifying chunks (check -chunks) 2020-09-23 09:02:53 -04:00
Gilbert Chen
73ae3f809e Revert "Add a -max-list-rate option to backup to slow down the listing"
This reverts commit 67a3103467.
2020-09-22 22:08:43 -04:00
Gilbert Chen
67a3103467 Add a -max-list-rate option to backup to slow down the listing
This option sets the maximum number of files that can be listed in one
second.
2020-09-22 08:27:09 -04:00
Gilbert Chen
b7d820195a Remove a debug log message accidentally checked in 2020-09-18 14:59:44 -04:00
Gilbert Chen
16d2c14c5a Follow-up changes for the -persist PR
* Restore/check should report an error instead of a success at the end if there
  were any errors and -persist is specified
* Don't compute the file hash before passing the file to the chunk maker; this is
  redundant as the chunk maker will produce the file hash
* Add a LOG_WERROR function to switch between LOG_WARN and LOG_ERROR dynamically
2020-09-18 11:23:35 -04:00
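Two of the points above, choosing between a warning and an error at run time and still reporting failure at the end when -persist is specified, can be sketched together. This is an illustration under assumptions, not the project's LOG_WERROR: the `Checker` type, its methods, and the log id `EXAMPLE_ID` are hypothetical.

```go
package main

import "fmt"

// Checker is a hypothetical stand-in for a restore/check run.
type Checker struct {
	Persist  bool // continue after failures instead of aborting
	failures int
}

// ReportFailure logs at warning level and returns nil when Persist is set,
// so the caller can continue. Otherwise it returns an error to abort on.
func (c *Checker) ReportFailure(id, message string) error {
	c.failures++
	if c.Persist {
		fmt.Printf("WARN %s %s\n", id, message)
		return nil
	}
	return fmt.Errorf("ERROR %s %s", id, message)
}

// Succeeded is false if any failure was recorded, even when every failure
// was downgraded to a warning.
func (c *Checker) Succeeded() bool { return c.failures == 0 }

func main() {
	c := &Checker{Persist: true}
	if err := c.ReportFailure("EXAMPLE_ID", "chunk could not be verified"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("succeeded:", c.Succeeded())
}
```

Counting failures separately from the log level is what lets a persistent run keep going without ending on a false success.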
gilbertchen
ca4d004aca Merge branch 'master' into add_persist_pr 2020-09-09 15:42:01 -04:00
Gilbert Chen
1eb1fb14a8 Don't throw an error on 0-byte chunk files with suffix '.tmp'. 2020-07-07 23:25:09 -04:00
Gilbert Chen
5d45999077 Clear the loaded content after a snapshot has been verified
The snapshot content is loaded before verifying the snapshot, but after that
it isn't used anymore so it should be released to save memory.
2020-06-10 10:08:53 -04:00
Gilbert Chen
fe854d469d Error out in the check command if there are 0-size chunks. 2020-06-02 11:37:12 -04:00
Gilbert Chen
6ca8b8dff0 Disable snapshot cache when checking chunks
Otherwise every chunk will be stored to the snapshot cache when the `-chunks`
option is specified.
2020-05-10 00:26:47 -04:00
Tet Woo Lee
4ae16dec7f add -persist in check and restore mode (for PR) 2020-05-06 18:39:52 +12:00
Gilbert Chen
22d6f3abfc Add a -chunks option to the check command to verify the integrity of chunks
This option will download and verify every chunk.  Unlike the -files option,
this option only downloads each chunk once.  There is also a new -threads
option to use multiple threads to download chunks.
2020-03-24 20:58:45 -04:00
Gilbert Chen
165152493c For the check command, -tabular should imply -all just like -stats 2019-11-24 20:45:05 -05:00
Gilbert Chen
a99f059b52 Allow a custom location for the filters file
You can now add a key 'filters' in the preferences file that points to the
path of the filters file.  If this key is not found in the preferences,
the default location '.duplicacy/filters' is used.

There is a new option '-filters' for the set command that sets this key in
the preferences, but you can also edit the file directly.
2019-11-23 15:23:26 -05:00
Gilbert Chen
90833f9d86 Implement RSA encryption
This is to support public key encryption in the backup operation.  You can use
the -key option to supply the public key to the backup command, and then the
same option to supply the private key when restoring a previous revision.

The storage must be encrypted for this to work.
2019-09-20 14:19:18 -04:00
Gilbert Chen
4da7f7b6f9 Check -files may download a chunk multiple times
This commit fixed a bug that caused 'check -files' to download the same chunk
multiple times if shared by multiple small files.
2019-06-13 14:47:21 -04:00
Gilbert Chen
9d4ac34f4b Don't compare hashes of empty files in the diff command
Empty files may or may not have a hash depending on whether the -hash option
is used during backup.
2019-06-06 12:35:34 -04:00
Gilbert Chen
458687d543 The cat command doesn't need to load the entire file into memory
It can print out each chunk as soon as it is retrieved.  This avoids
reconstructing the file in memory, which can be an issue with large files.
2019-05-03 11:33:16 -04:00
Gilbert Chen
4eb174cec5 Remove a few util functions that aren't necessary 2019-04-26 23:47:25 -04:00
Gilbert Chen
1da151f9d9 Add an additional lookup for a chunk that isn't in the chunk list
A chunk not in the chunk list may actually exist in two scenarios:
* the chunk may be a special snapshot chunk that contains the chunk sequence,
  so it may be resurrected by the chunk downloader if it had been turned into
  a fossil before
* if the API to list all chunks doesn't return the complete list due to some
  bug

This additional lookup avoids reporting the missing chunk prematurely.
2019-04-21 20:32:21 -04:00
Gilbert Chen
4b69c1162e Fix a memory issue that check -tabular uses too much memory with many revisions
The call to GetSnapshotChunks in ShowStatisticsTabular sets keepChunkHashes to
true -- this can cause too much memory consumption with hundreds of revisions.
2019-04-20 22:47:03 -04:00
Michael Cook
0762c448c4 gofmt -s 2018-12-29 13:20:10 +01:00
Michael Cook
741644b575 spelling 2018-12-29 13:04:40 +01:00
Gilbert Chen
244b797a1c Print the number of files if available in the snapshot file 2018-11-03 10:38:35 -04:00
Gilbert Chen
073292018c Don't show snapshots whose tags don't match the given one 2018-10-28 23:30:22 -04:00
Gilbert Chen
15f15aa2ca Show more statistics in the check command 2018-10-28 23:27:36 -04:00
Patrick Seal
a1efbe3b73 Add exclude_by_attribute preference 2018-09-21 21:35:40 -07:00
Gilbert Chen
22a0b222db Align snapshot times to the beginning of days when calculating the differences 2018-09-08 20:31:49 -04:00
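Why alignment matters when counting days can be shown with a short sketch. It assumes Unix timestamps in seconds, non-negative values, and UTC day boundaries; the helper names `startOfDay` and `daysBetween` are hypothetical and the commit does not say which time zone the project uses.

```go
package main

import "fmt"

const secondsPerDay = 24 * 60 * 60

// startOfDay rounds a non-negative Unix timestamp down to 00:00:00 UTC.
func startOfDay(unixSeconds int64) int64 {
	return unixSeconds - unixSeconds%secondsPerDay
}

// daysBetween counts calendar days between two timestamps after aligning
// both to the beginning of their day.
func daysBetween(earlier, later int64) int64 {
	return (startOfDay(later) - startOfDay(earlier)) / secondsPerDay
}

func main() {
	a := int64(86399) // 23:59:59 on the first day
	b := int64(86400) // 00:00:00 on the second day

	fmt.Println((b - a) / secondsPerDay) // 0: raw difference is one second
	fmt.Println(daysBetween(a, b))       // 1: the timestamps fall on different days
}
```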
Gilbert Chen
e8b8922754 Continue to check other snapshots when one snapshot has missing chunks 2018-08-06 21:20:04 -04:00
Gilbert Chen
93cc632021 Record deleted snapshots in the fossil collection and if any deleted snapshot still exists nuke the fossil collection 2018-08-04 22:59:25 -04:00
Gilbert Chen
f304b64b3f Removed a redundant call to manager.chunkOperator.Resurrect 2018-08-03 11:32:24 -04:00
Gilbert Chen
8ae7d2a97d Remove extra newline in the PRUNE_NEWSNAPSHOT log message 2018-07-26 21:24:33 -04:00
Gilbert Chen
72dfaa8b6b Fixed a bug causing a new snapshot not to be counted when deciding which fossils can be deleted 2018-07-23 22:08:08 -04:00
Gilbert Chen
f68eb13584 A few fixes for multi-threaded pruning 2018-06-05 16:09:12 -04:00
Gilbert Chen
dd53b4797e Implement multi-threaded pruning 2018-06-04 21:52:07 -04:00
Gilbert Chen
0e585e4be4 Fixed a crashing bug when showing the history of excluded files 2018-05-30 12:05:40 -04:00
Gilbert Chen
e03cd2a880 Add unreferenced fossils to the fossil collection instead of deleting them immediately 2018-05-29 12:57:38 -04:00
Frank
4dd5c43307 Add nobackup-file preference.
Directories containing a file with this name will not be backed up. I find it easier to drop a .nobackup file in directories I don't want backed up instead of maintaining a file of exclusions. This is also useful for scripts that create data in the repository but don't want it to be backed up.
2018-04-01 12:50:00 -07:00
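The marker-file rule described above can be sketched as a check over the names found in a directory. The function `hasMarker` is hypothetical and takes the directory listing as a plain slice so the example stays free of file system access. Treating an empty marker name as "feature disabled" is also an assumption.

```go
package main

import "fmt"

// hasMarker reports whether a directory listing contains a file with the
// configured marker name. An empty marker disables the check.
func hasMarker(names []string, marker string) bool {
	if marker == "" {
		return false
	}
	for _, name := range names {
		if name == marker {
			return true
		}
	}
	return false
}

func main() {
	scratch := []string{"output.bin", ".nobackup"}
	documents := []string{"notes.txt"}

	fmt.Println(hasMarker(scratch, ".nobackup"))   // true: skip this directory
	fmt.Println(hasMarker(documents, ".nobackup")) // false: back it up
}
```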