Skip to content

Conversation

@GrabYourPitchforks
Copy link
Member

Minor perf optimizations to Array.Resize<T> and RuntimeHelpers.GetSubArray. In both of these methods it's safe to use Buffer.Memmove rather than bouncing through the more heavyweight Array.Copy. Due to the nature of how we're constructing the arrays we know these operations to be valid.

@jkotas
Copy link
Member

jkotas commented May 2, 2020

Any measurable improvements?

@GrabYourPitchforks
Copy link
Member Author

Quick benchmark, showing an approx. 10% savings in RuntimeHelpers.GetSubArray:

public class ArrayRunner
{
    private object[] _objects;

    [Params(0, 1, 4, 12, 24, 128)]
    public int ArrayLength { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _objects = Enumerable.Range(0, 128).Select(i => i.ToString()).ToArray();
    }

    [Benchmark]
    public object[] ArrSlice()
    {
        return _objects[..ArrayLength];
    }
}
Method Job Toolchain ArrayLength Mean Error StdDev Ratio RatioSD
ArrSlice Job-XTQBTF master 0 91.96 ns 1.456 ns 1.362 ns 1.00 0.00
ArrSlice Job-ECZSQW memmove 0 85.51 ns 1.713 ns 1.602 ns 0.93 0.02
ArrSlice Job-XTQBTF master 1 100.25 ns 1.599 ns 1.417 ns 1.00 0.00
ArrSlice Job-ECZSQW memmove 1 89.24 ns 1.682 ns 1.574 ns 0.89 0.02
ArrSlice Job-XTQBTF master 4 107.82 ns 1.226 ns 1.147 ns 1.00 0.00
ArrSlice Job-ECZSQW memmove 4 97.51 ns 1.557 ns 1.456 ns 0.90 0.02
ArrSlice Job-XTQBTF master 12 113.85 ns 1.978 ns 2.641 ns 1.00 0.00
ArrSlice Job-ECZSQW memmove 12 104.73 ns 1.160 ns 0.969 ns 0.91 0.03
ArrSlice Job-XTQBTF master 24 128.17 ns 1.740 ns 1.627 ns 1.00 0.00
ArrSlice Job-ECZSQW memmove 24 115.81 ns 2.310 ns 2.472 ns 0.90 0.02
ArrSlice Job-XTQBTF master 128 242.94 ns 4.920 ns 13.134 ns 1.00 0.00
ArrSlice Job-ECZSQW memmove 128 222.49 ns 4.485 ns 10.123 ns 0.91 0.06

@stephentoub
Copy link
Member

Interesting. I'd have expected a fixed overhead reduction. Is the savings scaling with length?

@GrabYourPitchforks
Copy link
Member Author

@stephentoub I would've expected a fixed-size reduction as well. Maybe something's off on my test box? Here's the benchmarks for Array.Resize<T>, which clearly shows the fixed-size overhead reduction across all test cases.

public class ArrayRunner
{
    private int[] _ints;

    [Params(0, 1, 4, 12, 24, 128)]
    public int ArrayLength { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _ints = Enumerable.Range(0, 32).ToArray();
    }

    [Benchmark]
    public int[] ArrResize()
    {
        int[] ints = _ints;
        Array.Resize(ref ints, ArrayLength);
        return ints;
    }

    [Benchmark]
    public uint[] ArrResizeCompatibleValueType()
    {
        uint[] uints = Unsafe.As<uint[]>(_ints); // safe, but avoid 'castclass' for benchmark
        Array.Resize(ref uints, ArrayLength);
        return uints;
    }
}
Method Job Toolchain ArrayLength Mean Error StdDev Median Ratio RatioSD
ArrResize Job-OGUJUP master 0 16.21 ns 0.398 ns 0.620 ns 16.08 ns 1.00 0.00
ArrResize Job-PCCOVY memmove 0 13.10 ns 0.326 ns 0.799 ns 12.94 ns 0.82 0.05
ArrResizeCompatibleValueType Job-OGUJUP master 0 48.71 ns 1.044 ns 2.927 ns 48.01 ns 1.00 0.00
ArrResizeCompatibleValueType Job-PCCOVY memmove 0 12.98 ns 0.336 ns 0.654 ns 12.81 ns 0.27 0.02
ArrResize Job-OGUJUP master 1 17.13 ns 0.477 ns 1.376 ns 16.70 ns 1.00 0.00
ArrResize Job-PCCOVY memmove 1 12.39 ns 0.318 ns 0.282 ns 12.37 ns 0.68 0.04
ArrResizeCompatibleValueType Job-OGUJUP master 1 44.54 ns 0.778 ns 0.727 ns 44.68 ns 1.00 0.00
ArrResizeCompatibleValueType Job-PCCOVY memmove 1 11.93 ns 0.272 ns 0.241 ns 11.90 ns 0.27 0.01
ArrResize Job-OGUJUP master 4 16.08 ns 0.247 ns 0.219 ns 16.06 ns 1.00 0.00
ArrResize Job-PCCOVY memmove 4 13.12 ns 0.337 ns 0.795 ns 12.84 ns 0.87 0.06
ArrResizeCompatibleValueType Job-OGUJUP master 4 44.88 ns 0.872 ns 0.773 ns 44.77 ns 1.00 0.00
ArrResizeCompatibleValueType Job-PCCOVY memmove 4 12.32 ns 0.297 ns 0.330 ns 12.24 ns 0.27 0.01
ArrResize Job-OGUJUP master 12 19.24 ns 0.432 ns 0.660 ns 19.15 ns 1.00 0.00
ArrResize Job-PCCOVY memmove 12 16.05 ns 0.396 ns 0.941 ns 15.72 ns 0.85 0.06
ArrResizeCompatibleValueType Job-OGUJUP master 12 49.75 ns 1.066 ns 1.459 ns 49.40 ns 1.00 0.00
ArrResizeCompatibleValueType Job-PCCOVY memmove 12 14.27 ns 0.312 ns 0.261 ns 14.26 ns 0.29 0.01
ArrResize Job-OGUJUP master 24 22.87 ns 0.260 ns 0.203 ns 22.83 ns 1.00 0.00
ArrResize Job-PCCOVY memmove 24 19.64 ns 0.358 ns 0.352 ns 19.57 ns 0.86 0.02
ArrResizeCompatibleValueType Job-OGUJUP master 24 53.02 ns 1.126 ns 1.205 ns 52.62 ns 1.00 0.00
ArrResizeCompatibleValueType Job-PCCOVY memmove 24 18.80 ns 0.423 ns 0.396 ns 18.86 ns 0.36 0.01
ArrResize Job-OGUJUP master 128 52.46 ns 0.915 ns 0.811 ns 52.69 ns 1.00 0.00
ArrResize Job-PCCOVY memmove 128 49.79 ns 0.918 ns 0.814 ns 49.62 ns 0.95 0.02
ArrResizeCompatibleValueType Job-OGUJUP master 128 83.35 ns 1.715 ns 2.229 ns 83.42 ns 1.00 0.00
ArrResizeCompatibleValueType Job-PCCOVY memmove 128 48.83 ns 0.905 ns 0.803 ns 48.76 ns 0.59 0.02

@jkotas
Copy link
Member

jkotas commented May 6, 2020

I guess there is noise from GC or alignment.

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems worthwhile even if just for small array sizes. Thanks.

@GrabYourPitchforks
Copy link
Member Author

Looking into why CI's on the floor. I don't think it's related to this PR but want to double-check.

@GrabYourPitchforks
Copy link
Member Author

Interesting that it's only the Mono debug builds failing. I don't see these tests failing in other PRs. Might be some weirdness in introducing GetArrayDataReference in these code paths. Still investigating.

@GrabYourPitchforks
Copy link
Member Author

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@GrabYourPitchforks
Copy link
Member Author

@marek-safar we have a few unit tests that are failing only in mono debug builds. Would you be able to assign somebody to take a look? There appears to be some weirdness in calling MemoryMarshal.GetArrayDataReference from mono debug builds.

@marek-safar
Copy link
Contributor

@vargaz could you please help them with Mono side

@vargaz
Copy link
Contributor

vargaz commented Jun 15, 2020

This looks like a problem in Mono's implementation of BulkMoveWithWriteBarrier. Our old Memmove implementation had a type argument because it used different different copy/write barrier functions for different types, but the .netcore version doesn't have this argument.
Perhaps we should move the Memmove method into runtime specific code so Mono can have its own implementation ?

@GrabYourPitchforks
Copy link
Member Author

@vargaz If you point me to the mono-preferred implementation I can send a PR to split these back out. Thanks!

@vargaz
Copy link
Contributor

vargaz commented Jun 15, 2020

#37920

@GrabYourPitchforks
Copy link
Member Author

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@GrabYourPitchforks
Copy link
Member Author

@vargaz Looks like mono debug tests are still failing, even after merging in your latest changes. See https://github.com/dotnet/runtime/pull/35733/checks.

Ideas?

@GrabYourPitchforks
Copy link
Member Author

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@GrabYourPitchforks
Copy link
Member Author

Failing leg seems to have been pulled from current CI runs, and on rerun today everything that ran was green. Going ahead with the merge.

@GrabYourPitchforks GrabYourPitchforks merged commit 09f1fa0 into dotnet:master Jun 23, 2020
@GrabYourPitchforks GrabYourPitchforks deleted the arr_copy branch June 23, 2020 23:44
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants