Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: git/git
Failed to load repositories. Confirm that selected base ref is valid, then try again.
Loading
base: 9b6606f43d
Choose a base ref
...
head repository: git/git
Failed to load repositories. Confirm that selected head ref is valid, then try again.
Loading
compare: 1b4c57fa87
Choose a head ref
  • 6 commits
  • 9 files changed
  • 2 contributors

Commits on Apr 16, 2020

  1. revision: complicated pathspecs disable filters

    The changed-path Bloom filters work only when we can compute an
    explicit Bloom filter key in advance. When a pathspec is given
    that allows case-insensitive checks or wildcard matching, we
    must disable the Bloom filter performance checks.
    
    By checking the pathspec in prepare_to_use_bloom_filters(), we
    avoid setting up the Bloom filter data and thus revert to the
    usual logic.
    
    Before this change, the following tests would fail*:
    
    	t6004-rev-list-path-optim.sh (Tests 6-7)
    	t6130-pathspec-noglob.sh (Tests 3-6)
    	t6131-pathspec-icase.sh (Tests 3-5)
    
    *These tests would fail when using GIT_TEST_COMMIT_GRAPH and
    GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS except that the latter
    environment variable was not set up correctly to write the changed-
    path Bloom filters in the test suite. That will be fixed in the
    next change.
    
    Helped-by: Taylor Blau <[email protected]>
    Signed-off-by: Derrick Stolee <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    derrickstolee authored and gitster committed Apr 16, 2020
    Configuration menu
    Copy the full SHA
    8918e37 View commit details
    Browse the repository at this point in the history
  2. tests: write commit-graph with Bloom filters

    The GIT_TEST_COMMIT_GRAPH environment variable updates the commit-
    graph file whenever "git commit" is run, ensuring that we always
    have an updated commit-graph throughout the test suite. The
    GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS environment variable was
    introduced to write the changed-path Bloom filters whenever "git
    commit-graph write" is run. However, the GIT_TEST_COMMIT_GRAPH
    trick doesn't launch a separate process and instead writes it
    directly.
    
    To expand the number of tests that have commits in the commit-graph
    file, add a helper method that computes the commit-graph and place
    that helper inside "git commit" and "git merge".
    
    In the helper method, check GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS
    to ensure we are writing changed-path Bloom filters whenever
    possible.
    
    Signed-off-by: Derrick Stolee <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    derrickstolee authored and gitster committed Apr 16, 2020
    Configuration menu
    Copy the full SHA
    b23ea97 View commit details
    Browse the repository at this point in the history
  3. blame: use changed-path Bloom filters

    The changed-path Bloom filters help reduce the amount of tree
    parsing required during history queries. Before calculating a
    diff, we can ask the filter if a path changed between a commit
    and its first parent. If the filter says "no" then we can move
    on without parsing trees. If the filter says "maybe" then we
    parse trees to discover if the answer is actually "yes" or "no".
    
    When computing a blame, there is a section in find_origin() that
    computes a diff between a commit and one of its parents. When this
    is the first parent, we can check the Bloom filters before calling
    diff_tree_oid().
    
    In order to make this work with the blame machinery, we need to
    initialize a struct bloom_key with the initial path. But also, we
    need to add more keys to a list if a rename is detected. We then
    check to see if _any_ of these keys answer "maybe" in the diff.
    
    During development, I purposefully left out this "add a new key
    when a rename is detected" to see if the test suite would catch
    my error. That is how I discovered the issues with
    GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS from the previous change.
    With that change, we can feel some confidence in the coverage of
    this change.
    
    If a user requests copy detection using "git blame -C", then there
    are more places where the set of "important" files can expand. I
    do not know enough about how this happens in the blame machinery.
    Thus, the Bloom filter integration is explicitly disabled in this
    mode. A later change could expand the bloom_key data with an
    appropriate call (or calls) to add_bloom_key().
    
    If we did not disable this mode, then the following tests would
    fail:
    
    	t8003-blame-corner-cases.sh
    	t8011-blame-split-file.sh
    
    Generally, this is a performance enhancement and should not
    change the behavior of 'git blame' in any way. If a repo has a
    commit-graph file with computed changed-path Bloom filters, then
    they should notice improved performance for their 'git blame'
    commands.
    
    Here are some example timings that I found by blaming some paths
    in the Linux kernel repository:
    
     git blame arch/x86/kernel/topology.c >/dev/null
    
     Before: 0.83s
      After: 0.24s
    
     git blame kernel/time/time.c >/dev/null
    
     Before: 0.72s
      After: 0.24s
    
     git blame tools/perf/ui/stdio/hist.c >/dev/null
    
     Before: 0.27s
      After: 0.11s
    
    I specifically looked for "deep" paths that were also edited many
    times. As a counterpoint, the MAINTAINERS file was edited many
    times but is located in the root tree. This means that the cost of
    computing a diff relative to the pathspec is very small. Here are
    the timings for that command:
    
     git blame MAINTAINERS >/dev/null
    
     Before: 20.1s
      After: 18.0s
    
    These timings are the best of five. The worst-case runs were on the
    order of 2.5 minutes for both cases. Note that the MAINTAINERS file
    has 18,740 lines across 17,000+ commits. This happens to be one of
    the cases where this change provides the least improvement.
    
    The lack of improvement for the MAINTAINERS file and the relatively
    modest improvement for the other examples can be easily explained.
    The blame machinery needs to compute line-level diffs to determine
    which lines were changed by each commit. That makes up a large
    proportion of the computation time, and this change does not
    attempt to improve on that section of the algorithm. The
    MAINTAINERS file is large and changed often, so it takes time to
    determine which lines were updated by which commit. In contrast,
    the code files are much smaller, and it takes longer to comute
    the line-by-line diff for a single patch on the Linux mailing
    lists.
    
    Outside of the "-C" integration, I believe there is little more to
    gain from the changed-path Bloom filters for 'git blame' after this
    patch.
    
    Signed-off-by: Derrick Stolee <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    derrickstolee authored and gitster committed Apr 16, 2020
    Configuration menu
    Copy the full SHA
    0906ac2 View commit details
    Browse the repository at this point in the history

Commits on Apr 23, 2020

  1. blame: drop unused parameter from maybe_changed_path

    We don't use the "parent" parameter at all (probably because the bloom
    filter for a commit is always defined against a single parent anyway).
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Apr 23, 2020
    Configuration menu
    Copy the full SHA
    fe88f9f View commit details
    Browse the repository at this point in the history
  2. test-bloom: fix some whitespace issues

    Signed-off-by: Jeff King <[email protected]>
    Reviewed-by: Taylor Blau <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Apr 23, 2020
    Configuration menu
    Copy the full SHA
    24b7d1e View commit details
    Browse the repository at this point in the history
  3. test-bloom: check that we have expected arguments

    If "test-tool bloom" is not fed a command, or if arguments are missing
    for some commands, it will just segfault. Let's check argc and write a
    friendlier usage message.
    
    Signed-off-by: Jeff King <[email protected]>
    Reviewed-by: Taylor Blau <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Apr 23, 2020
    Configuration menu
    Copy the full SHA
    1b4c57f View commit details
    Browse the repository at this point in the history
Loading