Skip to content

Conversation

@jborean93
Copy link
Collaborator

@jborean93 jborean93 commented Jun 7, 2024

PR Summary

Use Array.Resize when doing += on an array type. This is more efficient than the default BinaryAdd done on other enumerable types. As adding to an array is a common operation done by people new to PowerShell this should reduce some of the issues which come with that type of code.

An example of dramatic difference this can make is reflected by this code

$s = [gc]::GetTotalAllocatedBytes($true)

Measure-Command {
    & {
        $val = @()
        for ($i = 0; $i -lt 20000; $i++) {
            $val += $i
        }
    }
}

$e = [gc]::GetTotalAllocatedBytes($true);

Write-Host "Allocated Bytes $($e-$s)"

When run with the new code (TEST=true) you can see it is 8 seconds faster and requires less memory to allocate. As the arrays get larger the differences are even more profound.

jborean:~/dev/PowerShell$ TEST=true debug/pwsh -File jordan.ps1

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 795
Ticks             : 7950790
TotalDays         : 9.20230324074074E-06
TotalHours        : 0.000220855277777778
TotalMinutes      : 0.0132513166666667
TotalSeconds      : 0.795079
TotalMilliseconds : 795.079

Allocated Bytes 1604712232
jborean:~/dev/PowerShell$ debug/pwsh -File jordan.ps1

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 9
Milliseconds      : 145
Ticks             : 91453429
TotalDays         : 0.000105848876157407
TotalHours        : 0.00254037302777778
TotalMinutes      : 0.152422381666667
TotalSeconds      : 9.1453429
TotalMilliseconds : 9145.3429

Allocated Bytes 6372367600

This doesn't negate the existing performance impacts of adding to an array, it just removes extra work that wasn't needed in the first place (which was pretty inefficient) making it slower than it has to. People should still use an alternative like capturing the output from the pipeline or use List<T>.

PR Context

Partially Fixes: #23899

PR Checklist

Use Array.Resize when doing += on an array type. This is more efficient
than the default BinaryAdd done on other enumerable types. As adding to
an array is a common operation done by people new to PowerShell this
should reduce some of the issues which come with that type of code.
@fflaten
Copy link
Contributor

fflaten commented Jun 7, 2024

People should still use an alternative like capturing the output from the pipeline or use List<T>.

In scripts/functions yes, but if this really reduces time spent on += by 90% I'll likely never bother using lists in interactive sessions anymore. Wow 🙌

@daxian-dbw daxian-dbw self-assigned this Jun 10, 2024
@daxian-dbw
Copy link
Member

The Engine WG discussed this and agree with the change, and it is ready to be reviewed.

Copy link
Collaborator

@iSazonov iSazonov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: interesting, have we such issues with Collection, List (and others IEnumarable-s?

@jborean93
Copy link
Collaborator Author

nit: interesting, have we such issues with Collection, List (and others IEnumarable-s?

Somewhat yes, at least the known size types (not IEnumerable) could be improved by providing the capacity on the newly generated list but that's somewhat unrelated to fixing arrays. I didn't touch the code for other IList types because I was unsure whether it was safe to do IList.Count to get the capacity needed.

@SeeminglyScience
Copy link
Collaborator

I didn't touch the code for other IList types because I was unsure whether it was safe to do IList.Count to get the capacity needed.

I would lean toward "maybe, but out of scope for this PR" just to get the most common use case settled quickly.

@SeeminglyScience
Copy link
Collaborator

People should still use an alternative like capturing the output from the pipeline or use List<T>.

In scripts/functions yes, but if this really reduces time spent on += by 90% I'll likely never bother using lists in interactive sessions anymore. Wow 🙌

To be fair, interactively it's really hard to beat statement capture (e.g. $a = foreach ($b in $c) { $b }). Super easy, and still pretty performant

@fflaten
Copy link
Contributor

fflaten commented Jun 11, 2024

To be fair, interactively it's really hard to beat statement capture (e.g. $a = foreach ($b in $c) { $b }). Super easy, and still pretty performant

True. I'm usually a few foreach-loops deep or splitting into multiple arrays when I need += or List in which case it's easier to manage imo.

Either way, looking forward to this.

Copy link
Member

@daxian-dbw daxian-dbw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@daxian-dbw daxian-dbw merged commit 6c66879 into PowerShell:master Jun 17, 2024
@daxian-dbw daxian-dbw added the CL-Engine Indicates that a PR should be marked as an engine change in the Change Log label Jun 17, 2024
@jborean93 jborean93 deleted the array-add branch June 17, 2024 23:24
@iRon7
Copy link

iRon7 commented Aug 30, 2024

This is a great improvement to PowerShell 🎉.

Btw. a similar issue exists for using the increase assignment operator (+=) to build a string. (see: https://stackoverflow.com/a/70093215/1701026)
I wonder if it be possible to do something similar there by first translating the (fixed) string into a StringBuilder object, add the strings, and rebuild the string (.ToString()) at the moment the variable is retrieved?

Image
Stack Overflow
Like with numerics?

For example,

$string = "Hello"
$string =+ " there"
In Perl you can do

my $string = "hello"
$string .= " there"
It seems a bit verbose to have to do

$string = $string + " t...

chrisdent-de pushed a commit to chrisdent-de/PowerShell that referenced this pull request Sep 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CL-Engine Indicates that a PR should be marked as an engine change in the Change Log

Projects

None yet

Development

Successfully merging this pull request may close these issues.

+= copies array many times

6 participants