-
Notifications
You must be signed in to change notification settings - Fork 27.2k
Comparing changes
Open a pull request
base repository: git/git
base: 9b6606f43d
head repository: git/git
compare: 1b4c57fa87
- 6 commits
- 9 files changed
- 2 contributors
Commits on Apr 16, 2020
-
revision: complicated pathspecs disable filters
The changed-path Bloom filters work only when we can compute an explicit Bloom filter key in advance. When a pathspec is given that allows case-insensitive checks or wildcard matching, we must disable the Bloom filter performance checks. By checking the pathspec in prepare_to_use_bloom_filters(), we avoid setting up the Bloom filter data and thus revert to the usual logic. Before this change, the following tests would fail*: t6004-rev-list-path-optim.sh (Tests 6-7) t6130-pathspec-noglob.sh (Tests 3-6) t6131-pathspec-icase.sh (Tests 3-5) *These tests would fail when using GIT_TEST_COMMIT_GRAPH and GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS except that the latter environment variable was not set up correctly to write the changed- path Bloom filters in the test suite. That will be fixed in the next change. Helped-by: Taylor Blau <[email protected]> Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 8918e37 - Browse repository at this point
Copy the full SHA 8918e37View commit details -
tests: write commit-graph with Bloom filters
The GIT_TEST_COMMIT_GRAPH environment variable updates the commit- graph file whenever "git commit" is run, ensuring that we always have an updated commit-graph throughout the test suite. The GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS environment variable was introduced to write the changed-path Bloom filters whenever "git commit-graph write" is run. However, the GIT_TEST_COMMIT_GRAPH trick doesn't launch a separate process and instead writes it directly. To expand the number of tests that have commits in the commit-graph file, add a helper method that computes the commit-graph and place that helper inside "git commit" and "git merge". In the helper method, check GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS to ensure we are writing changed-path Bloom filters whenever possible. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for b23ea97 - Browse repository at this point
Copy the full SHA b23ea97View commit details -
blame: use changed-path Bloom filters
The changed-path Bloom filters help reduce the amount of tree parsing required during history queries. Before calculating a diff, we can ask the filter if a path changed between a commit and its first parent. If the filter says "no" then we can move on without parsing trees. If the filter says "maybe" then we parse trees to discover if the answer is actually "yes" or "no". When computing a blame, there is a section in find_origin() that computes a diff between a commit and one of its parents. When this is the first parent, we can check the Bloom filters before calling diff_tree_oid(). In order to make this work with the blame machinery, we need to initialize a struct bloom_key with the initial path. But also, we need to add more keys to a list if a rename is detected. We then check to see if _any_ of these keys answer "maybe" in the diff. During development, I purposefully left out this "add a new key when a rename is detected" to see if the test suite would catch my error. That is how I discovered the issues with GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS from the previous change. With that change, we can feel some confidence in the coverage of this change. If a user requests copy detection using "git blame -C", then there are more places where the set of "important" files can expand. I do not know enough about how this happens in the blame machinery. Thus, the Bloom filter integration is explicitly disabled in this mode. A later change could expand the bloom_key data with an appropriate call (or calls) to add_bloom_key(). If we did not disable this mode, then the following tests would fail: t8003-blame-corner-cases.sh t8011-blame-split-file.sh Generally, this is a performance enhancement and should not change the behavior of 'git blame' in any way. If a repo has a commit-graph file with computed changed-path Bloom filters, then they should notice improved performance for their 'git blame' commands. Here are some example timings that I found by blaming some paths in the Linux kernel repository: git blame arch/x86/kernel/topology.c >/dev/null Before: 0.83s After: 0.24s git blame kernel/time/time.c >/dev/null Before: 0.72s After: 0.24s git blame tools/perf/ui/stdio/hist.c >/dev/null Before: 0.27s After: 0.11s I specifically looked for "deep" paths that were also edited many times. As a counterpoint, the MAINTAINERS file was edited many times but is located in the root tree. This means that the cost of computing a diff relative to the pathspec is very small. Here are the timings for that command: git blame MAINTAINERS >/dev/null Before: 20.1s After: 18.0s These timings are the best of five. The worst-case runs were on the order of 2.5 minutes for both cases. Note that the MAINTAINERS file has 18,740 lines across 17,000+ commits. This happens to be one of the cases where this change provides the least improvement. The lack of improvement for the MAINTAINERS file and the relatively modest improvement for the other examples can be easily explained. The blame machinery needs to compute line-level diffs to determine which lines were changed by each commit. That makes up a large proportion of the computation time, and this change does not attempt to improve on that section of the algorithm. The MAINTAINERS file is large and changed often, so it takes time to determine which lines were updated by which commit. In contrast, the code files are much smaller, and it takes longer to comute the line-by-line diff for a single patch on the Linux mailing lists. Outside of the "-C" integration, I believe there is little more to gain from the changed-path Bloom filters for 'git blame' after this patch. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 0906ac2 - Browse repository at this point
Copy the full SHA 0906ac2View commit details
Commits on Apr 23, 2020
-
blame: drop unused parameter from maybe_changed_path
We don't use the "parent" parameter at all (probably because the bloom filter for a commit is always defined against a single parent anyway). Signed-off-by: Jeff King <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for fe88f9f - Browse repository at this point
Copy the full SHA fe88f9fView commit details -
test-bloom: fix some whitespace issues
Signed-off-by: Jeff King <[email protected]> Reviewed-by: Taylor Blau <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 24b7d1e - Browse repository at this point
Copy the full SHA 24b7d1eView commit details -
test-bloom: check that we have expected arguments
If "test-tool bloom" is not fed a command, or if arguments are missing for some commands, it will just segfault. Let's check argc and write a friendlier usage message. Signed-off-by: Jeff King <[email protected]> Reviewed-by: Taylor Blau <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 1b4c57f - Browse repository at this point
Copy the full SHA 1b4c57fView commit details
This comparison is taking too long to generate.
Unfortunately it looks like we can’t render this comparison for you right now. It might be too big, or there might be something weird with your repository.
You can try running this command locally to see the comparison on your machine:
git diff 9b6606f43d...1b4c57fa87