An HTTP(S) syncing tool with lower overhead, for OSS mirrors.
Instead of HEADing every single file, tsumugu parses directory listing HTML and downloads only files that do not seem to be up-to-date.
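To illustrate the idea, here is a minimal sketch of pulling file name, mtime and size out of a listing page without issuing any per-file request. It assumes an nginx-autoindex-style line layout; the ListingEntry type and both function names are hypothetical and are not taken from tsumugu-parser, which supports more formats.

```rust
/// One row of a directory listing (hypothetical type, not from tsumugu-parser).
#[derive(Debug, PartialEq)]
struct ListingEntry {
    href: String,
    /// Timestamp kept as raw text, e.g. "02-Sep-2024 12:34".
    mtime: String,
    /// `None` when the size column is not a number (nginx prints "-" for directories).
    size: Option<u64>,
}

/// Parse a single nginx-autoindex-style line such as
/// `<a href="a.deb">a.deb</a>   02-Sep-2024 12:34   1024`.
/// Returns `None` for lines that do not have this shape (e.g. the `../` link).
fn parse_listing_line(line: &str) -> Option<ListingEntry> {
    // Extract the link target between `href="` and the next `"`.
    let start = line.find("href=\"")? + "href=\"".len();
    let end = start + line[start..].find('"')?;
    let href = line[start..end].to_string();

    // Everything after `</a>` holds the date, time and size columns.
    let tail_start = line.find("</a>")? + "</a>".len();
    let mut cols = line[tail_start..].split_whitespace();
    let date = cols.next()?;
    let time = cols.next()?;
    let size = cols.next()?.parse::<u64>().ok();

    Some(ListingEntry { href, mtime: format!("{date} {time}"), size })
}

/// Parse a whole listing page, skipping lines that are not entries.
fn parse_listing(html: &str) -> Vec<ListingEntry> {
    html.lines().filter_map(parse_listing_line).collect()
}
```

One GET of the listing page yields size and mtime for every file in the directory, which is where the lower overhead compared to one HEAD per file comes from.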
The goal is to sync successfully from domains where lftp/rclone fails or runs into difficulties.
- Add "--include": Sync even if the file is excluded by the --exclude regex.
- Add support for supported Debian, Ubuntu, Fedora and RHEL versions to the --include regex.
  - Something like --include debian/${DEBIAN_VERSIONS}?
- Check for APT/YUM repo integrity (avoid keeping old invalid metadata files)
  - (This is experimental and may not work well)
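The precedence described in the first item (an include match wins over an exclude match) can be sketched as below. This is only an illustration: plain substring search stands in for regex matching so that the example needs nothing beyond the standard library, and matches_any and should_sync are hypothetical names, not tsumugu's API. The real rules are documented in ./docs/exclusion.md and may be more involved.

```rust
/// Stand-in for a regex match: plain substring search (an assumption made
/// for this sketch; the real tool uses regular expressions).
fn matches_any(path: &str, patterns: &[&str]) -> bool {
    patterns.iter().any(|p| path.contains(*p))
}

/// A path is synced unless an exclude pattern matches it,
/// and an include match always overrides the exclusion.
fn should_sync(path: &str, includes: &[&str], excludes: &[&str]) -> bool {
    matches_any(path, includes) || !matches_any(path, excludes)
}
```

With includes of ["dists/bookworm"] and excludes of ["dists/"], a path under dists/bookworm is still synced, a path under dists/sid is skipped, and a path matching neither list is synced as usual.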
This project uses a Cargo workspace with the following crates:
- tsumugu-net: Abstraction of HTTP client (required by tsumugu-parser) and an implementation with reqwest + tokio (used by tsumugu-cli).
- tsumugu-parser: A parser crate for various directory listing formats. Can be reused by other projects.
- tsumugu-cli: The CLI tool for syncing. For historical reasons, the crate name (and binary name) is tsumugu.
See ./tsumugu-parser/src/regex_manager/mod.rs for available variables to use in inclusion and exclusion regexes.
There has been a breaking change since 20240902: user regexes containing ^ and $ are affected.
See ./docs/exclusion.md.
Tsumugu relies on local file size and mtime to decide whether a file should be downloaded. Some file-level deduplicators, such as jdupes, ignore file mtime when deduplicating with hard links. This can be a problem for some repos: files whose local mtime is no longer correct are redownloaded on every sync.
Workarounds:
- Set --compare-size-only.
- Use filesystem-level/block-level deduplication like zfs dedup.
- Use another file-level deduplicator which considers mtime (though I don't know which would do this).
Also, if you are sure that a directory is identical to another, you can manually create a symlink for it. Tsumugu ignores symlinks during syncing.
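The size and mtime check described above, and what --compare-size-only changes, can be sketched roughly as follows. This is a simplified illustration, not tsumugu's actual code: FileInfo, needs_download and local_info are hypothetical names, and comparing mtimes for exact equality in whole seconds is an assumption (listings often carry coarser timestamps).

```rust
use std::fs;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Size and mtime of a file (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
struct FileInfo {
    size: u64,
    /// Modification time as seconds since the Unix epoch.
    mtime: u64,
}

/// Decide whether the remote file has to be fetched.
/// `local` is `None` when the file does not exist locally.
fn needs_download(local: Option<FileInfo>, remote: FileInfo, compare_size_only: bool) -> bool {
    match local {
        // Nothing on disk yet: always download.
        None => true,
        // A size mismatch means the local copy is stale in either mode.
        Some(l) if l.size != remote.size => true,
        // Sizes match and mtime is deliberately ignored.
        Some(_) if compare_size_only => false,
        // Default mode: the mtime has to match as well.
        Some(l) => l.mtime != remote.mtime,
    }
}

/// Read size and mtime of a local file; `None` if it cannot be read.
fn local_info(path: &Path) -> Option<FileInfo> {
    let meta = fs::metadata(path).ok()?;
    let mtime = meta.modified().ok()?.duration_since(UNIX_EPOCH).ok()?.as_secs();
    Some(FileInfo { size: meta.len(), mtime })
}
```

A file that a deduplicator has hard-linked to a copy with a different mtime keeps its size but fails the mtime comparison, so in the default mode it is fetched again on every run; in size-only mode it is treated as up to date.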
Special thanks to NJU Mirror for extensive testing and bug reporting.
The name "tsumugu", and current branch name "pudding", are derived from the manga A Drift Girl and a Noble Moon.
And...
Tsumugu in the appearance of a very simplified version of Hitori (Obviously I am not very good at drawing though).
The old (2020), unfinished Go version is named "traverse" and lives on the main-old branch.