Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.

base: ad1260b6c994f7c0f9c259bd39f39979f7f4ecc2

...

compare: 596b5e77c960cc57ad2e68407b298411ec5e8cb8

8 commits
11 files changed
3 contributors

Commits on Nov 3, 2021

test-genzeros: allow more than 2G zeros in Windows

d5cfd14 (tests: teach the test-tool to generate NUL bytes and
use it, 2019-02-14), add a way to generate zeroes in a portable
way without using /dev/zero (needed by HP NonStop), but uses a
long variable that is limited to 2^31 in Windows.

Use instead a (POSIX/C99) intmax_t that is at least 64bit wide
in 64-bit Windows to use in a future test.

Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>

carenas authored and gitster committed Nov 3, 2021

cbc985a

test-tool genzeros: generate large amounts of data more efficiently

In this developer's tests, producing one gigabyte worth of NULs in a
busy loop that writes out individual bytes, unbuffered, took ~27sec.
Writing chunked 256kB buffers instead only took ~0.6sec

This matters because we are about to introduce a pair of test cases that
want to be able to produce 5GB of NULs, and we cannot use `/dev/zero`
because of the HP NonStop platform's lack of support for that device.

Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>

dscho authored and gitster committed Nov 3, 2021

df7000c

test-lib: add prerequisite for 64-bit platforms

Allow tests that assume a 64-bit `size_t` to be skipped in 32-bit
platforms and regardless of the size of `long`.

This imitates the `LONG_IS_64BIT` prerequisite.

Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>

carenas authored and gitster committed Nov 3, 2021

970fa57

t1051: introduce a smudge filter test for extremely large files

The filter system allows for alterations to file contents when they're
added to the database or working tree. ("Smudge" when moving to the
working tree; "clean" when moving to the database.) This is used
natively to handle CRLF to LF conversions. It's also employed by Git-LFS
to replace large files from the working tree with small tracking files
in the repo and vice versa.

Git reads the entire smudged file into memory to convert it into a
"clean" form to be used in-core. While this is inefficient, there's a
more insidious problem on some platforms due to inconsistency between
using unsigned long and size_t for the same type of data (size of a file
in bytes). On most 64-bit platforms, unsigned long is 64 bits, and
size_t is typedef'd to unsigned long. On Windows, however, unsigned long
is only 32 bits (and therefore on 64-bit Windows, size_t is typedef'd to
unsigned long long in order to be 64 bits).

Practically speaking, this means 64-bit Windows users of Git-LFS can't
handle files larger than 2^32 bytes. Other 64-bit platforms don't suffer
this limitation.

This commit introduces a test exposing the issue; future commits make it
pass. The test simulates the way Git-LFS works by having a tiny file
checked into the repository and expanding it to a huge file on checkout.

Helped-by: Johannes Schindelin <[email protected]>
Signed-off-by: Matt Cooper <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>

vtbassmatt authored and gitster committed Nov 3, 2021

b79541a

odb: teach read_blob_entry to use size_t

There is mixed use of size_t and unsigned long to deal with sizes in the
codebase. Recall that Windows defines unsigned long as 32 bits even on
64-bit platforms, meaning that converting size_t to unsigned long narrows
the range. This mostly doesn't cause a problem since Git rarely deals
with files larger than 2^32 bytes.

But adjunct systems such as Git LFS, which use smudge/clean filters to
keep huge files out of the repository, may have huge file contents passed
through some of the functions in entry.c and convert.c. On Windows, this
results in a truncated file being written to the workdir. I traced this to
one specific use of unsigned long in write_entry (and a similar instance
in write_pc_item_to_fd for parallel checkout). That appeared to be for
the call to read_blob_entry, which expects a pointer to unsigned long.

By altering the signature of read_blob_entry to expect a size_t,
write_entry can be switched to use size_t internally (which all of its
callers and most of its callees already used). To avoid touching dozens of
additional files, read_blob_entry uses a local unsigned long to call a
chain of functions which aren't prepared to accept size_t.

Helped-by: Johannes Schindelin <[email protected]>
Signed-off-by: Matt Cooper <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>

vtbassmatt authored and gitster committed Nov 3, 2021

e9aa762

git-compat-util: introduce more size_t helpers
```
We will use them in the next commit.

Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
```
dscho authored and gitster committed Nov 3, 2021
Configuration menu
View commit details

Copy full SHA for e2ffeae

Browse repository at this point
Copy the full SHA

e2ffeae View commit details

Browse the repository at this point in the history

odb: guard against data loss checking out a huge file

This introduces an additional guard for platforms where `unsigned long`
and `size_t` are not of the same size. If the size of an object in the
database would overflow `unsigned long`, instead we now exit with an
error.

A complete fix will have to update _many_ other functions throughout the
codebase to use `size_t` instead of `unsigned long`. It will have to be
implemented at some stage.

This commit puts in a stop-gap for the time being.

Helped-by: Johannes Schindelin <[email protected]>
Signed-off-by: Matt Cooper <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>

vtbassmatt authored and gitster committed Nov 3, 2021

d6a09e7

clean/smudge: allow clean filters to process extremely large files

The filter system allows for alterations to file contents when they're
moved between the database and the worktree. We already made sure that
it is possible for smudge filters to produce contents that are larger
than `unsigned long` can represent (which matters on systems where
`unsigned long` is narrower than `size_t`, most notably 64-bit Windows).
Now we make sure that clean filters can _consume_ contents that are
larger than that.

Note that this commit only allows clean filters' _input_ to be larger
than can be represented by `unsigned long`.

This change makes only a very minute dent into the much larger project
to teach Git to use `size_t` instead of `unsigned long` wherever
appropriate.

Helped-by: Johannes Schindelin <[email protected]>
Signed-off-by: Matt Cooper <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>

vtbassmatt authored and gitster committed Nov 3, 2021

596b5e7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Comparing changes

Open a pull request

Commits on Nov 3, 2021

This comparison is taking too long to generate.

Uh oh!