84 Commits

Author SHA1 Message Date
Gilbert Chen
d9f6545d63 Rewrite the backup procedure to reduce memory usage
Main changes:

* Change the listing order of files/directories so that the local and remote
  snapshots can be compared on-the-fly.

* Introduce a new struct called EntryList that maintains a list of
  files/directories, which are kept in memory when the number is small, and
  serialized into a file when there are too many.

* EntryList can also be turned into an on-disk incomplete snapshot quickly,
  to support fast-resume on next run.

* ChunkOperator can now download and upload chunks, thus replacing original
  ChunkDownloader and ChunkUploader.  The new ChunkDownloader is only used
  to prefetch chunks during the restore operation.
2021-10-24 23:34:49 -04:00
Gilbert Chen
f83e4f3c44 Fix SNAPSHOT_INACTIVE log message 2021-01-28 00:06:49 -05:00
Gilbert Chen
7f834e84f6 Don't attempt to load verified_chunks when it doesn't exist. 2020-10-09 14:22:45 -04:00
Gilbert Chen
d7c1903d5a Skip chunks already verified in previous runs for check -chunks.
This is done by storing the list of verified chunks in a file
`.duplicacy/cache/<storage>/verified_chunks`.
2020-10-08 19:59:39 -04:00
gilbertchen
1fedfd1b1a Merge branch 'master' into mac-exclude 2020-09-25 20:13:43 -04:00
Gilbert Chen
6841c989c6 Fixed a bug that caused check -chunks -persist to succeed with broken chunks
The bug was not setting the `isBroken` flag in WaitForChunk()
2020-09-24 14:53:42 -04:00
Gilbert Chen
d0b3b5dc2e Print progress logs when verifying chunks (check -chunks) 2020-09-23 09:02:53 -04:00
Gilbert Chen
73ae3f809e Revert "Add a -max-list-rate option to backup to slow down the listing"
This reverts commit 67a3103467.
2020-09-22 22:08:43 -04:00
Gilbert Chen
67a3103467 Add a -max-list-rate option to backup to slow down the listing
This option sets the maximum number of files that can be listed in one
second.
2020-09-22 08:27:09 -04:00
Gilbert Chen
b7d820195a Remove a debug log message accidentally checked in 2020-09-18 14:59:44 -04:00
Gilbert Chen
16d2c14c5a Follow-up changes for the -persist PR
* Restore/check should report an error instead of a success at the end if there
  were any errors and -persist is specified
* Don't compute the file hash before passing the file to the chunk maker; this is
  redundant as the chunk maker will produce the file hash
* Add a LOG_WERROR function to switch between LOG_WARN and LOG_ERROR dynamically
2020-09-18 11:23:35 -04:00
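The first and third bullets of this commit can be pictured together with a small sketch: failures are logged as warnings when -persist is set and as errors otherwise, and the run still ends in an error if anything failed. The `checker` type and its methods are hypothetical and do not mirror the real LOG_WERROR signature.

```go
package main

import "fmt"

// checker is a hypothetical stand-in for a restore/check run.
type checker struct {
	persist  bool     // corresponds to the -persist option
	failures int      // number of failures seen so far
	log      []string // collected log lines, for illustration only
}

// reportFailure logs at WARN level when persist is set and at ERROR
// level otherwise. Without persist, the first failure aborts the run.
func (c *checker) reportFailure(logID, message string) error {
	level := "ERROR"
	if c.persist {
		level = "WARN"
	}
	c.log = append(c.log, fmt.Sprintf("%s %s %s", level, logID, message))
	c.failures++
	if !c.persist {
		return fmt.Errorf("%s: %s", logID, message)
	}
	return nil
}

// finish reports an error at the end if any failure was recorded,
// even though the run was allowed to continue.
func (c *checker) finish() error {
	if c.failures > 0 {
		return fmt.Errorf("completed with %d error(s)", c.failures)
	}
	return nil
}

func main() {
	c := &checker{persist: true}
	c.reportFailure("CHUNK_CORRUPT", "chunk 1 failed verification")
	c.reportFailure("CHUNK_CORRUPT", "chunk 2 failed verification")
	fmt.Println(c.log)
	fmt.Println(c.finish())
}
```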
gilbertchen
ca4d004aca Merge branch 'master' into add_persist_pr 2020-09-09 15:42:01 -04:00
Gilbert Chen
1eb1fb14a8 Don't throw an error on 0-byte chunk files with suffix '.tmp'. 2020-07-07 23:25:09 -04:00
Gilbert Chen
5d45999077 Clear the loaded content after a snapshot has been verified
The snapshot content is loaded before verifying the snapshot, but after that
it isn't used anymore so it should be released to save memory.
2020-06-10 10:08:53 -04:00
Gilbert Chen
fe854d469d Error out in the check command if there are 0-size chunks. 2020-06-02 11:37:12 -04:00
Gilbert Chen
6ca8b8dff0 Disable snapshot cache when checking chunks
Otherwise every chunk will be stored to the snapshot cache when the `-chunks`
option is specified.
2020-05-10 00:26:47 -04:00
Tet Woo Lee
4ae16dec7f add -persist in check and restore mode (for PR) 2020-05-06 18:39:52 +12:00
Gilbert Chen
22d6f3abfc Add a -chunks option to the check command to verify the integrity of chunks
This option will download and verify every chunk.  Unlike the -files option,
this option only downloads each chunk once.  There is also a new -threads
option to use multiple threads to download chunks.
2020-03-24 20:58:45 -04:00
Gilbert Chen
165152493c For the check command, -tabular should imply -all just like -stats 2019-11-24 20:45:05 -05:00
Gilbert Chen
a99f059b52 Allow a custom location for the filters file
You can now add a key 'filters' in the preferences file that points to the
path of the filters file.  If this key is not found in the preferences,
the default location '.duplicacy/filters' is used.

There is a new option '-filters' for the set command that sets this key in
the preferences, but you can also edit the file directly.
2019-11-23 15:23:26 -05:00
Gilbert Chen
90833f9d86 Implement RSA encryption
This is to support public key encryption in the backup operation.  You can use
the -key option to supply the public key to the backup command, and then the
same option to supply the private key when restoring a previous revision.

The storage must be encrypted for this to work.
2019-09-20 14:19:18 -04:00
Gilbert Chen
4da7f7b6f9 Check -files may download a chunk multiple times
This commit fixed a bug that caused 'check -files' to download the same chunk
multiple times if shared by multiple small files.
2019-06-13 14:47:21 -04:00
Gilbert Chen
9d4ac34f4b Don't compare hashes of empty files in the diff command
Empty files may or may not have a hash depending on whether the -hash option is used
during backup.
2019-06-06 12:35:34 -04:00
Gilbert Chen
458687d543 The cat command doesn't need to load the entire file into memory
It can print out the chunk as soon as a chunk is retrieved.  This avoids
reconstructing the file in memory, which can be an issue with large files.
2019-05-03 11:33:16 -04:00
Gilbert Chen
4eb174cec5 Remove a few util functions that aren't necessary 2019-04-26 23:47:25 -04:00
Gilbert Chen
1da151f9d9 Add an additional lookup for a chunk that isn't in the chunk list
A chunk not in the chunk list may actually exist in two scenarios:
* the chunk may be a special snapshot chunk that contains the chunk sequence,
  so it may be resurrected by the chunk downloader if it had been turned into
  a fossil before
* if the API to list all chunks doesn't return the complete list due to some
  bug

This additional lookup avoids reporting the missing chunk prematurely.
2019-04-21 20:32:21 -04:00
Gilbert Chen
4b69c1162e Fix a memory issue that check -tabular uses too much memory with many revisions
The call to GetSnapshotChunks in ShowStatisticsTabular sets keepChunkHashes to
true -- this can cause too much memory consumption with hundreds of revisions.
2019-04-20 22:47:03 -04:00
Michael Cook
0762c448c4 gofmt -s 2018-12-29 13:20:10 +01:00
Michael Cook
741644b575 spelling 2018-12-29 13:04:40 +01:00
Gilbert Chen
244b797a1c Print the number of files if available in the snapshot file 2018-11-03 10:38:35 -04:00
Gilbert Chen
073292018c Don't show snapshots whose tags don't match the given one 2018-10-28 23:30:22 -04:00
Gilbert Chen
15f15aa2ca Show more statistics in the check command 2018-10-28 23:27:36 -04:00
Patrick Seal
a1efbe3b73 Add exclude_by_attribute preference 2018-09-21 21:35:40 -07:00
Gilbert Chen
22a0b222db Align snapshot times to the beginning of days when calculating the differences 2018-09-08 20:31:49 -04:00
Gilbert Chen
e8b8922754 Continue to check other snapshots when one snapshot has missing chunks 2018-08-06 21:20:04 -04:00
Gilbert Chen
93cc632021 Record deleted snapshots in the fossil collection and if any deleted snapshot still exists, nuke the fossil collection 2018-08-04 22:59:25 -04:00
Gilbert Chen
f304b64b3f Removed a redundant call to manager.chunkOperator.Resurrect 2018-08-03 11:32:24 -04:00
Gilbert Chen
8ae7d2a97d Remove extra newline in the PRUNE_NEWSNAPSHOT log message 2018-07-26 21:24:33 -04:00
Gilbert Chen
72dfaa8b6b Fixed a bug causing a new snapshot not to be counted when deciding which fossils can be deleted 2018-07-23 22:08:08 -04:00
Gilbert Chen
f68eb13584 A few fixes for multi-threaded pruning 2018-06-05 16:09:12 -04:00
Gilbert Chen
dd53b4797e Implement multi-threaded pruning 2018-06-04 21:52:07 -04:00
Gilbert Chen
0e585e4be4 Fixed a crashing bug when showing the history of excluded files 2018-05-30 12:05:40 -04:00
Gilbert Chen
e03cd2a880 Add unreferenced fossils to the fossil collection instead of deleting them immediately 2018-05-29 12:57:38 -04:00
Frank
4dd5c43307 Add nobackup-file preference.
Directories containing a file with this name will not be backed up. I find it easier to drop a .nobackup file in directories I don't want backed up instead of maintaining a file of exclusions. This is also useful for scripts that create data in the repository but don't want it to be backed up.
2018-04-01 12:50:00 -07:00
Gilbert Chen
5d2242d39d Preserve the list of chunk hashes for the latest snapshot when cleaning local snapshot cache 2018-03-27 23:34:40 -04:00
gilbertchen
13fffc2a11 Merge pull request #329 from pdf/prune_memory
Reduce memory consumption for prune operation
2018-03-20 14:32:12 -04:00
Gilbert Chen
e07226bd62 Retention policy erroneously applies to snapshots without the specified tags 2018-02-10 21:33:01 -05:00
Gilbert Chen
7230ddbef5 Clear the attributes from last snapshot after loading to save memory 2018-01-28 16:54:06 -05:00
Gilbert Chen
d330f61d25 Limit derivation key to 64 bytes since snapshot file path used as key may be longer 2018-01-20 23:52:35 -05:00
Peter Fern
57082cd1d2 Reduce memory consumption for prune operation
For non-exhaustive prune, consider only target chunks instead of mapping
all chunks in repository.
2018-01-21 10:12:09 +11:00