@benaadams
Member

benaadams commented Nov 30, 2019

[Image: By string length (1-64); difference in last position (lower is better)]

[Image: By difference position (0-88) in long string (lower is better)]

[Image: By difference position (0-1023) in long string (lower is better)]

        Method | Toolchain |  Length |      Mean |     Error |          Op/s | Ratio | 1/Ratio |
---------------|-----------|---------|-----------|-----------|---------------|-------|---------|
CompareOrdinal |      base |       1 |  3.312 ns | 0.0116 ns | 301,900,985.4 |  1    |      |
CompareOrdinal |      diff |       1 |  3.495 ns | 0.0155 ns | 286,163,741.3 |  1.05 |  95% |
CompareOrdinal |      base |       2 |  4.763 ns | 0.0118 ns | 209,946,368.4 |  1    |      |
CompareOrdinal |      diff |       2 |  6.114 ns | 0.0560 ns | 163,555,899.2 |  1.28 |  78% |
CompareOrdinal |      base |       3 |  6.009 ns | 0.0139 ns | 166,404,826.9 |  1    |      | 
CompareOrdinal |      diff |       3 |  5.277 ns | 0.0188 ns | 189,499,820.2 |  0.88 | 114% |
CompareOrdinal |      base |       4 |  6.251 ns | 0.0125 ns | 159,966,995.5 |  1    |      |
CompareOrdinal |      diff |       4 |  5.027 ns | 0.0231 ns | 198,923,661.0 |  0.8  | 125% |
CompareOrdinal |      base |       5 |  6.255 ns | 0.0142 ns | 159,872,536.1 |  1    |      |
CompareOrdinal |      diff |       5 |  5.402 ns | 0.0172 ns | 185,125,753.1 |  0.86 | 116% |
CompareOrdinal |      base |       6 |  6.772 ns | 0.0132 ns | 147,664,221.0 |  1    |      |
CompareOrdinal |      diff |       6 |  6.142 ns | 0.1198 ns | 162,807,596.6 |  0.9  | 111% |
CompareOrdinal |      base |       7 |  7.166 ns | 0.0251 ns | 139,554,658.6 |  1    |      |
CompareOrdinal |      diff |       7 |  5.304 ns | 0.0277 ns | 188,526,101.1 |  0.74 | 135% |
CompareOrdinal |      base |       8 |  7.524 ns | 0.0223 ns | 132,912,885.0 |  1    |      |
CompareOrdinal |      diff |       8 |  6.214 ns | 0.0167 ns | 160,939,642.2 |  0.83 | 120% |
CompareOrdinal |      base |       9 |  7.408 ns | 0.0635 ns | 134,990,065.6 |  1    |      |
CompareOrdinal |      diff |       9 |  6.437 ns | 0.0060 ns | 155,355,707.9 |  0.87 | 115% |
CompareOrdinal |      base |      10 |  7.657 ns | 0.0389 ns | 130,606,710.9 |  1    |      |
CompareOrdinal |      diff |      10 |  6.435 ns | 0.0092 ns | 155,409,282.3 |  0.84 | 119% |
CompareOrdinal |      base |      11 |  8.040 ns | 0.0725 ns | 124,376,453.7 |  1    |      |
CompareOrdinal |      diff |      11 |  6.436 ns | 0.0105 ns | 155,378,710.7 |  0.8  | 125% |
CompareOrdinal |      base |      12 |  8.037 ns | 0.0348 ns | 124,422,050.9 |  1    |      |
CompareOrdinal |      diff |      12 |  6.439 ns | 0.0162 ns | 155,304,474.2 |  0.8  | 125% |
CompareOrdinal |      base |      13 |  8.456 ns | 0.0243 ns | 118,266,085.0 |  1    |      |
CompareOrdinal |      diff |      13 |  6.443 ns | 0.0157 ns | 155,212,797.2 |  0.76 | 132% |
CompareOrdinal |      base |      14 |  6.282 ns | 0.0187 ns | 159,176,779.5 |  1    |      |
CompareOrdinal |      diff |      14 |  6.437 ns | 0.0149 ns | 155,362,814.2 |  1.02 |  98% |
CompareOrdinal |      base |      15 |  6.268 ns | 0.0358 ns | 159,540,160.0 |  1    |      |
CompareOrdinal |      diff |      15 |  6.434 ns | 0.0138 ns | 155,425,241.0 |  1.03 |  97% |
CompareOrdinal |      base |      16 |  5.502 ns | 0.0780 ns | 181,759,720.4 |  1    |      |
CompareOrdinal |      diff |      16 |  6.249 ns | 0.0231 ns | 160,017,474.2 |  1.14 |  88% |
CompareOrdinal |      base |      17 |  6.615 ns | 0.0190 ns | 151,168,076.2 |  1    |      |
CompareOrdinal |      diff |      17 |  6.246 ns | 0.0127 ns | 160,093,226.6 |  0.94 | 106% |
CompareOrdinal |      base |      18 |  7.262 ns | 0.0194 ns | 137,710,614.1 |  1    |      |
CompareOrdinal |      diff |      18 |  6.253 ns | 0.0233 ns | 159,927,722.3 |  0.86 | 116% |
CompareOrdinal |      base |      19 |  7.395 ns | 0.0198 ns | 135,220,294.5 |  1    |      |
CompareOrdinal |      diff |      19 |  6.246 ns | 0.0120 ns | 160,101,708.5 |  0.84 | 119% |
CompareOrdinal |      base |      20 |  7.644 ns | 0.0213 ns | 130,827,428.6 |  1    |      |
CompareOrdinal |      diff |      20 |  6.244 ns | 0.0119 ns | 160,157,822.4 |  0.82 | 122% |
CompareOrdinal |      base |      21 |  7.711 ns | 0.0248 ns | 129,678,130.3 |  1    |      |
CompareOrdinal |      diff |      21 |  6.254 ns | 0.0280 ns | 159,904,313.4 |  0.81 | 123% |
CompareOrdinal |      base |      22 |  8.209 ns | 0.0229 ns | 121,815,140.6 |  1    |      |
CompareOrdinal |      diff |      22 |  6.253 ns | 0.0208 ns | 159,928,685.8 |  0.76 | 132% |
CompareOrdinal |      base |      23 |  8.437 ns | 0.0276 ns | 118,523,585.3 |  1    |      |
CompareOrdinal |      diff |      23 |  6.235 ns | 0.0332 ns | 160,381,594.4 |  0.74 | 135% |
CompareOrdinal |      base |      24 |  8.850 ns | 0.0482 ns | 112,992,528.4 |  1    |      |
CompareOrdinal |      diff |      24 |  6.252 ns | 0.0331 ns | 159,955,247.3 |  0.71 | 141% |
CompareOrdinal |      base |      25 |  8.666 ns | 0.0285 ns | 115,397,471.3 |  1    |      |
CompareOrdinal |      diff |      25 |  6.248 ns | 0.0276 ns | 160,044,673.9 |  0.72 | 139% |
CompareOrdinal |      base |      26 |  6.468 ns | 0.0328 ns | 154,606,877.3 |  1    |      |
CompareOrdinal |      diff |      26 |  6.317 ns | 0.0225 ns | 158,306,290.3 |  0.98 | 102% |
CompareOrdinal |      base |      27 |  7.804 ns | 0.0459 ns | 128,139,353.8 |  1    |      |
CompareOrdinal |      diff |      27 |  6.216 ns | 0.0250 ns | 160,865,918.0 |  0.8  | 125% |
CompareOrdinal |      base |      28 |  8.024 ns | 0.0549 ns | 124,628,196.5 |  1    |      |
CompareOrdinal |      diff |      28 |  6.217 ns | 0.0149 ns | 160,861,035.8 |  0.77 | 130% |
CompareOrdinal |      base |      29 |  7.678 ns | 0.0213 ns | 130,238,615.3 |  1    |      |
CompareOrdinal |      diff |      29 |  6.235 ns | 0.0351 ns | 160,384,748.7 |  0.81 | 123% |
CompareOrdinal |      base |      30 |  8.092 ns | 0.0348 ns | 123,578,752.7 |  1    |      |
CompareOrdinal |      diff |      30 |  6.250 ns | 0.0306 ns | 159,999,814.9 |  0.77 | 130% |
CompareOrdinal |      base |      31 |  8.227 ns | 0.0454 ns | 121,556,695.3 |  1    |      |
CompareOrdinal |      diff |      31 |  6.238 ns | 0.0491 ns | 160,304,019.0 |  0.76 | 132% |
CompareOrdinal |      base |      32 |  8.688 ns | 0.0732 ns | 115,096,742.4 |  1    |      |
CompareOrdinal |      diff |      32 |  6.222 ns | 0.0533 ns | 160,713,868.3 |  0.72 | 139% |
CompareOrdinal |      base |      33 |  8.484 ns | 0.0128 ns | 117,870,166.0 |  1    |      |
CompareOrdinal |      diff |      33 |  6.592 ns | 0.0313 ns | 151,693,637.7 |  0.78 | 128% |
CompareOrdinal |      base |      34 |  8.717 ns | 0.0287 ns | 114,722,837.7 |  1    |      |
CompareOrdinal |      diff |      34 |  6.624 ns | 0.0525 ns | 150,972,499.8 |  0.76 | 132% |
CompareOrdinal |      base |      35 |  8.885 ns | 0.0558 ns | 112,555,187.4 |  1    |      |
CompareOrdinal |      diff |      35 |  6.585 ns | 0.0340 ns | 151,859,163.7 |  0.74 | 135% |
CompareOrdinal |      base |      37 |  9.368 ns | 0.0286 ns | 106,743,397.5 |  1    |      |
CompareOrdinal |      diff |      37 |  6.506 ns | 0.0737 ns | 153,699,328.9 |  0.69 | 145% |
CompareOrdinal |      base |      38 |  7.894 ns | 0.0313 ns | 126,674,121.3 |  1    |      |
CompareOrdinal |      diff |      38 |  6.398 ns | 0.0539 ns | 156,310,293.0 |  0.81 | 123% |
CompareOrdinal |      base |      39 |  7.642 ns | 0.0299 ns | 130,863,753.5 |  1    |      |
CompareOrdinal |      diff |      39 |  6.409 ns | 0.0808 ns | 156,036,825.9 |  0.84 | 119% |
CompareOrdinal |      base |      40 |  8.036 ns | 0.0337 ns | 124,437,137.6 |  1    |      |
CompareOrdinal |      diff |      40 |  6.422 ns | 0.0550 ns | 155,716,826.9 |  0.8  | 125% |
CompareOrdinal |      base |      41 |  7.243 ns | 0.0500 ns | 138,065,007.0 |  1    |      |
CompareOrdinal |      diff |      41 |  6.397 ns | 0.0623 ns | 156,331,361.9 |  0.88 | 114% |
CompareOrdinal |      base |      42 |  8.312 ns | 0.0180 ns | 120,308,418.1 |  1    |      |
CompareOrdinal |      diff |      42 |  6.425 ns | 0.0536 ns | 155,641,772.3 |  0.77 | 130% |
CompareOrdinal |      base |      43 |  8.400 ns | 0.0130 ns | 119,040,855.7 |  1    |      |
CompareOrdinal |      diff |      43 |  6.449 ns | 0.0580 ns | 155,071,546.1 |  0.77 | 130% |
CompareOrdinal |      base |      44 |  8.938 ns | 0.0231 ns | 111,877,493.2 |  1    |      |
CompareOrdinal |      diff |      44 |  6.440 ns | 0.0435 ns | 155,284,991.9 |  0.72 | 139% |
CompareOrdinal |      base |      45 |  8.925 ns | 0.0339 ns | 112,040,562.5 |  1    |      |
CompareOrdinal |      diff |      45 |  6.572 ns | 0.0421 ns | 152,153,450.1 |  0.74 | 135% |
CompareOrdinal |      base |      46 |  9.430 ns | 0.0310 ns | 106,040,924.9 |  1    |      |
CompareOrdinal |      diff |      46 |  6.562 ns | 0.0265 ns | 152,383,295.4 |  0.7  | 143% |
CompareOrdinal |      base |      47 |  9.447 ns | 0.0193 ns | 105,848,695.4 |  1    |      |
CompareOrdinal |      diff |      47 |  6.593 ns | 0.0385 ns | 151,666,514.8 |  0.7  | 143% |
CompareOrdinal |      base |      48 |  9.695 ns | 0.0478 ns | 103,145,515.4 |  1    |      |
CompareOrdinal |      diff |      48 |  6.421 ns | 0.0763 ns | 155,747,630.9 |  0.66 | 152% |
CompareOrdinal |      base |      49 |  9.891 ns | 0.0330 ns | 101,103,394.2 |  1    |      |
CompareOrdinal |      diff |      49 |  7.075 ns | 0.0324 ns | 141,341,240.6 |  0.72 | 139% |
CompareOrdinal |      base |      50 |  8.753 ns | 0.0143 ns | 114,247,034.3 |  1    |      |
CompareOrdinal |      diff |      50 |  7.094 ns | 0.0274 ns | 140,956,633.9 |  0.81 | 123% |
CompareOrdinal |      base |      51 |  7.489 ns | 0.0294 ns | 133,530,478.3 |  1    |      |
CompareOrdinal |      diff |      51 |  7.104 ns | 0.0377 ns | 140,769,956.1 |  0.95 | 105% |
CompareOrdinal |      base |      52 |  8.964 ns | 0.0104 ns | 111,558,760.4 |  1    |      |
CompareOrdinal |      diff |      52 |  7.089 ns | 0.0358 ns | 141,058,316.1 |  0.79 | 127% |
CompareOrdinal |      base |      53 |  8.543 ns | 0.0213 ns | 117,060,863.3 |  1    |      |
CompareOrdinal |      diff |      53 |  7.085 ns | 0.0356 ns | 141,141,750.9 |  0.83 | 120% |
CompareOrdinal |      base |      54 |  7.812 ns | 0.0527 ns | 128,007,289.5 |  1    |      |
CompareOrdinal |      diff |      54 |  7.094 ns | 0.0340 ns | 140,961,837.0 |  0.91 | 110% |
CompareOrdinal |      base |      55 |  9.610 ns | 0.0889 ns | 104,058,536.2 |  1    |      |
CompareOrdinal |      diff |      55 |  7.260 ns | 0.0300 ns | 137,732,735.0 |  0.76 | 132% |
CompareOrdinal |      base |      56 |  9.944 ns | 0.0589 ns | 100,566,043.8 |  1    |      |
CompareOrdinal |      diff |      56 |  7.262 ns | 0.0405 ns | 137,700,869.7 |  0.73 | 137% |
CompareOrdinal |      base |      57 |  9.928 ns | 0.0338 ns | 100,720,650.4 |  1    |      |
CompareOrdinal |      diff |      57 |  7.237 ns | 0.0308 ns | 138,188,219.8 |  0.73 | 137% |
CompareOrdinal |      base |      58 | 10.416 ns | 0.0973 ns |  96,001,992.2 |  1    |      |
CompareOrdinal |      diff |      58 |  7.219 ns | 0.0292 ns | 138,522,886.3 |  0.69 | 145% |
CompareOrdinal |      base |      59 | 10.308 ns | 0.0192 ns |  97,012,227.6 |  1    |      |
CompareOrdinal |      diff |      59 |  7.208 ns | 0.0217 ns | 138,726,321.6 |  0.7  | 143% |
CompareOrdinal |      base |      60 | 10.676 ns | 0.0529 ns |  93,665,185.6 |  1    |      |
CompareOrdinal |      diff |      60 |  7.264 ns | 0.0396 ns | 137,666,794.4 |  0.68 | 147% |
CompareOrdinal |      base |      61 | 10.764 ns | 0.0165 ns |  92,898,125.6 |  1    |      |
CompareOrdinal |      diff |      61 |  7.215 ns | 0.0352 ns | 138,594,708.9 |  0.67 | 149% |
CompareOrdinal |      base |      62 |  8.420 ns | 0.0542 ns | 118,770,530.4 |  1    |      |
CompareOrdinal |      diff |      62 |  7.235 ns | 0.0427 ns | 138,224,679.8 |  0.86 | 116% |
CompareOrdinal |      base |      63 |  9.165 ns | 0.0116 ns | 109,116,464.3 |  1    |      |
CompareOrdinal |      diff |      63 |  7.194 ns | 0.0312 ns | 139,007,668.5 |  0.78 | 128% |
CompareOrdinal |      base |      64 |  9.707 ns | 0.0061 ns | 103,018,766.5 |  1    |      |
CompareOrdinal |      diff |      64 |  7.200 ns | 0.0206 ns | 138,883,359.7 |  0.74 | 135% |
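Reading the rows above, the final percentage column tracks the inverse of the Ratio column (for example 0.66 pairs with 152% and 1.28 pairs with 78%). That relationship is inferred from the numbers, not stated anywhere in the thread, and `inverse_ratio_pct` is a hypothetical helper name. A minimal C sketch of the arithmetic:

```c
/* Hypothetical helper: converts a Ratio value (diff time / base time, as
   printed in the table) into the percentage shown in the last column.
   Assumption: the column is 100 / Ratio rounded to the nearest integer. */
static long inverse_ratio_pct(double ratio) {
    return (long)(100.0 / ratio + 0.5);
}
```

A Ratio below 1 means the diff toolchain is faster, so the percentage lands above 100.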

Coreclr PR: dotnet/coreclr#22479
Resolves: https://github.com/dotnet/coreclr/issues/22763

@benaadams
Member Author

System.Net.Security.Tests.SslClientAuthenticationOptionsTest.ClientOptions_ServerOptions_NotMutatedDuringAuthentication

System.TimeoutException : VirtualNetwork: Timeout reading the next frame.

Raised issue #404

@drieseng
Contributor

I'm just an outsider, but, as @jkotas asked in dotnet/coreclr#22479: "Could you please share the up to date perf numbers?"

@stephentoub reopened this Jan 15, 2020
@stephentoub
Member

@benaadams, did you have perf numbers here?

@benaadams force-pushed the Use-CompareOrdinalHelper-for-SpanHelpers.SequenceCompareTo- branch from e6ef143 to 7d6304c on January 15, 2020 17:39
@benaadams
Member Author

Ah, this additionally needs SequenceCompareTo to be converted to hardware intrinsics; with just Vector, the vectorized path cuts in very late on a machine that supports Avx.

I have the additional change and am just testing it.

@benaadams changed the title from "Use CompareOrdinalHelper for SpanHelpers.SequenceCompareTo" to "Use SpanHelpers.SequenceCompareTo instead of CompareOrdinalHelper" on Jan 16, 2020
@benaadams force-pushed the Use-CompareOrdinalHelper-for-SpanHelpers.SequenceCompareTo- branch 2 times, most recently from 29b89a7 to 385d550 on January 26, 2020 06:56
@benaadams
Member Author

benaadams commented Jan 26, 2020

    G_M29673_IG01:
        push     rsi
        vzeroupper 
                            
    G_M29673_IG02:
        cmp      rcx, r8
+-<     je       SHORT G_M29673_IG10             ; Equal
|                           
|   G_M29673_IG03:
|       cmp      edx, r9d
|       jle      SHORT G_M29673_IG04
|       mov      eax, r9d
|       jmp      SHORT G_M29673_IG05
|                           
|   G_M29673_IG04:
|       mov      eax, edx
|                           
|   G_M29673_IG05:
|       movsxd   r10, eax
|       xor      r11, r11
|       cmp      r10, 8
|       jge      G_M29673_IG15                     ; IntrinsicsCompare ------------>+
|       cmp      r10, 4                                                             |
|       jl       SHORT G_M29673_IG07                                                |
|                                                                                   |
|   G_M29673_IG06:                         <-----+ (long)                           |
|       mov      rax, qword ptr [rcx+2*r11]      |                                  |
|       mov      rsi, qword ptr [r8+2*r11]       |                                  |
|       xor      rsi, rax                        L                                  |
|       test     rsi, rsi                        O                                  |
|       jne      SHORT G_M29673_IG12             O ; LongDifference ------->+       |
|       add      r11, 4                          P                          |       |
|       lea      rax, [r11+4]                    |                          |       |
|       cmp      r10, rax                        |                          |       |
|       jge      SHORT G_M29673_IG06       ------+                          |       |
|                                                                           |       |
|   G_M29673_IG07:                                                          |       |
|       lea      rax, [r11+2]                                               |       |
|       cmp      r10, rax                                                   |       |
|       jl       SHORT G_M29673_IG08                                        |       |
|       mov      eax, dword ptr [rcx+2*r11]                                 |       |
|       cmp      dword ptr [r8+2*r11], eax                                  |       |
|       jne      SHORT G_M29673_IG08                                        |       |
|       add      r11, 2                                                     |       |
|                                                                           |       |
|   G_M29673_IG08:                                                          |       |
|       cmp      r11, r10                                                   |       |
+-<     jge      SHORT G_M29673_IG10             ; Equal                    |       |
|                                                                           |       |
|   G_M29673_IG09:                        <-----+ (char)                    |       |
|       lea      rax, bword ptr [rcx+2*r11]     |                           |       |
|       movzx    rsi, word  ptr [r8+2*r11]      |                           |       |
|       movzx    rax, word  ptr [rax]           L                           |       |
|       sub      eax, esi                       O                           |       |
|       test     eax, eax                       O                           |       |
|       jne      SHORT G_M29673_IG14            P ; ResultDifference --->+  |       |
|       inc      r11                            |                        |  |       |
|       cmp      r11, r10                       |                        |  |       |
|       jl       SHORT G_M29673_IG09      ------+                        |  |       |
\                                                                        |  |       |
 -> G_M29673_IG10:        ; <--- Equal                                   |  |       |
/       mov      eax, edx                                                |  |       |
|       sub      eax, r9d                                                |  |       |
|                                                                        |  |       |
|   G_M29673_IG11:                                                       |  |       |
|       vzeroupper                                                       |  |       |
|       pop      rsi                                                     |  |       |
|       ret                                                              |  |       |
|                                                                        |  |       |
|   G_M29673_IG12:                         ; <-- LongDifference  -----------+       |
|       xor      eax, eax                                                |          |
|       tzcnt    rax, rsi                                                |          |
|       sar      eax, 4                                                  |          |
|       movsxd   rax, eax                                                |          |
|       add      r11, rax                                                |          |
|                                                                        |          |
|   G_M29673_IG13:                         ; <-- OffsetDifference -------|------+   |
|       lea      rax, bword ptr [rcx+2*r11]                              |      |   |
|       movzx    r10, word  ptr [r8+2*r11]                               |      |   |
|       movzx    rax, word  ptr [rax]                                    |      |   |
|       sub      eax, r10d                                               |      |   |
|                                                                        |      |   |
|   G_M29673_IG14:                         ; <-- ResultDifference--------+      |   |
|       vzeroupper                                                              |   |
|       pop      rsi                                                            |   |
|       ret                                                                     |   |
|                                                                               |   |
|   G_M29673_IG15:                          ; <-- IntrinsicsCompare ----------------+
|       lea      rax, [r10-16]                                                  |
|       test     rax, rax                                                       |
|       jl       SHORT G_M29673_IG18                                            |
|       test     rax, rax                                                       |
|       jle      SHORT G_M29673_IG17                                            |
|                                                                               |
|   G_M29673_IG16:                        <-----+ (Vector256)                   |
|       vmovupd  ymm0, ymmword ptr[rcx+2*r11]   |                               |
|       vmovupd  ymm1, ymmword ptr[r8+2*r11]    |                               |
|       vpcmpeqw ymm0, ymm0, ymm1               L                               |
|       vpmovmskb esi, ymm0                     O                               |
|       cmp      esi, -1                        O                               |
|       jne      G_M29673_IG21                  P ; IntrinsicsDifference --->+  |
|       add      r11, 16                        |                            |  |
|       cmp      rax, r11                       |                            |  |
|       jg       SHORT G_M29673_IG16      ------+                            |  |
|                                                                            |  |
|   G_M29673_IG17:                                                           |  |
|       mov      r11, rax                                                    |  |
|       vmovupd  ymm0, ymmword ptr[rcx+2*r11]                                |  |
|       vmovupd  ymm1, ymmword ptr[r8+2*r11]                                 |  |
|       vpcmpeqw ymm0, ymm0, ymm1                                            |  |
|       vpmovmskb esi, ymm0                                                  |  |
|       cmp      esi, -1                                                     |  |
|       jne      SHORT G_M29673_IG21              ; IntrinsicsDifference --->+  |
+-<     jmp      SHORT G_M29673_IG10              ; Equal                    |  |
|                                                                            |  |
|   G_M29673_IG18:                                                           |  |
|       lea      rax, [r10-8]                                                |  |
|       test     rax, rax                                                    |  |
|       jle      SHORT G_M29673_IG20                                         |  |
|                                                                            |  |
|   G_M29673_IG19:                        <-----+ (Vector128)                |  |
|       vmovupd  xmm0, xmmword ptr [rcx+2*r11]  |                            |  |
|       vmovupd  xmm1, xmmword ptr [r8+2*r11]   |                            |  |
|       vpcmpeqw xmm0, xmm0, xmm1               L                            |  |
|       vpmovmskb esi, xmm0                     O                            |  |
|       cmp      esi, 0xFFFF                    O                            |  |
|       jne      SHORT G_M29673_IG21            P ; IntrinsicsDifference --->+  |
|       add      r11, 8                         |                            |  |
|       cmp      rax, r11                       |                            |  |
|       jg       SHORT G_M29673_IG19      ------+                            |  |
|                                                                            |  |
|   G_M29673_IG20:                                                           |  |
|       mov      r11, rax                                                    |  |
|       vmovupd  xmm0, xmmword ptr [rcx+2*r11]                               |  |
|       vmovupd  xmm1, xmmword ptr [r8+2*r11]                                |  |
|       vpcmpeqw xmm0, xmm0, xmm1                                            |  |
|       vpmovmskb esi, xmm0                                                  |  |
|       cmp      esi, 0xFFFF                                                 |  |
+-<     je       G_M29673_IG10                     ; Equal                   |  |
                                                                             |  |
    G_M29673_IG21:         ; <--------------------- IntrinsicsDifference ----+  |
        mov      eax, esi                                                       |
        not      eax                                                            |
        tzcnt    eax, eax                                                       |
        sar      eax, 1                                                         |
        movsxd   rax, eax                                                       |
        add      r11, rax                                                       |
        jmp      G_M29673_IG13                 ; OffsetDifference ------------->+
                            
    
; Total bytes of code 355, prolog size 4, PerfScore 222.98, for method Program:SequenceCompareTo(byref,int,byref,int):int
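
In the listing above, the two "difference" blocks (G_M29673_IG12 and G_M29673_IG21) turn a bit pattern into a char index before falling into OffsetDifference. The following is a minimal C sketch of that arithmetic only, not the actual C# implementation. All helper names are hypothetical, the 64-bit word is assembled by hand in little-endian order, and `equal_mask8` is a scalar stand-in for what `vpcmpeqw` followed by `vpmovmskb` produces on the 8-char (Vector128) path.

```c
#include <stdint.h>

/* Portable count-trailing-zeros, standing in for tzcnt. Requires x != 0. */
static int ctz64(uint64_t x) {
    int n = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Pack four 16-bit chars into 64 bits, lowest index in the lowest bits,
   which is what an 8-byte load yields on a little-endian machine. */
static uint64_t load4(const uint16_t *p) {
    return (uint64_t)p[0] | ((uint64_t)p[1] << 16) |
           ((uint64_t)p[2] << 32) | ((uint64_t)p[3] << 48);
}

/* LongDifference (IG12): xor the two words, count trailing zero bits,
   then shift right by 4 because each char spans 16 bits.
   Returns the index (0..3) of the first differing char, or -1 if equal. */
static int first_diff_in_long(const uint16_t *a, const uint16_t *b) {
    uint64_t x = load4(a) ^ load4(b);
    return x == 0 ? -1 : ctz64(x) >> 4;
}

/* Scalar stand-in for vpcmpeqw + vpmovmskb over 8 chars: every equal char
   contributes two set mask bits, so "all equal" is 0xFFFF. */
static uint32_t equal_mask8(const uint16_t *a, const uint16_t *b) {
    uint32_t mask = 0;
    for (int i = 0; i < 8; i++) {
        if (a[i] == b[i]) {
            mask |= (uint32_t)3 << (2 * i);
        }
    }
    return mask;
}

/* IntrinsicsDifference (IG21): invert the mask, count trailing zero bits,
   then shift right by 1 because each char owns two mask bits.
   Returns the index (0..7) of the first differing char, or -1 if equal. */
static int first_diff_from_mask(uint32_t mask) {
    if (mask == 0xFFFFu) {
        return -1;
    }
    return ctz64((uint64_t)(uint32_t)~mask) >> 1;
}

/* OffsetDifference (IG13): the result is the difference of the two chars. */
static int compare_at(const uint16_t *a, const uint16_t *b, int index) {
    return (int)a[index] - (int)b[index];
}
```

For two 8-char inputs that differ only at index 2, both `first_diff_in_long` and `first_diff_from_mask(equal_mask8(a, b))` return 2, and `compare_at` then yields the signed result. The Vector256 path in the listing follows the same pattern with a 32-bit mask covering 16 chars.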

@benaadams force-pushed the Use-CompareOrdinalHelper-for-SpanHelpers.SequenceCompareTo- branch from 385d550 to 9431416 on January 26, 2020 07:31
@benaadams
Member Author

benaadams commented Jan 27, 2020

[Image: By string length (1-64); difference in last position (lower is better)]

[Image: By difference position (0-88) in long string (lower is better)]

[Image: By difference position (0-1023) in long string (lower is better)]

@benaadams
Member Author

        Method | Toolchain |  Length |      Mean |     Error |          Op/s | Ratio | 1/Ratio |
---------------|-----------|---------|-----------|-----------|---------------|-------|---------|
CompareOrdinal |      base |       1 |  3.312 ns | 0.0116 ns | 301,900,985.4 |  1    |      |
CompareOrdinal |      diff |       1 |  3.495 ns | 0.0155 ns | 286,163,741.3 |  1.05 |  95% |
CompareOrdinal |      base |       2 |  4.763 ns | 0.0118 ns | 209,946,368.4 |  1    |      |
CompareOrdinal |      diff |       2 |  6.114 ns | 0.0560 ns | 163,555,899.2 |  1.28 |  78% |
CompareOrdinal |      base |       3 |  6.009 ns | 0.0139 ns | 166,404,826.9 |  1    |      | 
CompareOrdinal |      diff |       3 |  5.277 ns | 0.0188 ns | 189,499,820.2 |  0.88 | 114% |
CompareOrdinal |      base |       4 |  6.251 ns | 0.0125 ns | 159,966,995.5 |  1    |      |
CompareOrdinal |      diff |       4 |  5.027 ns | 0.0231 ns | 198,923,661.0 |  0.8  | 125% |
CompareOrdinal |      base |       5 |  6.255 ns | 0.0142 ns | 159,872,536.1 |  1    |      |
CompareOrdinal |      diff |       5 |  5.402 ns | 0.0172 ns | 185,125,753.1 |  0.86 | 116% |
CompareOrdinal |      base |       6 |  6.772 ns | 0.0132 ns | 147,664,221.0 |  1    |      |
CompareOrdinal |      diff |       6 |  6.142 ns | 0.1198 ns | 162,807,596.6 |  0.9  | 111% |
CompareOrdinal |      base |       7 |  7.166 ns | 0.0251 ns | 139,554,658.6 |  1    |      |
CompareOrdinal |      diff |       7 |  5.304 ns | 0.0277 ns | 188,526,101.1 |  0.74 | 135% |
CompareOrdinal |      base |       8 |  7.524 ns | 0.0223 ns | 132,912,885.0 |  1    |      |
CompareOrdinal |      diff |       8 |  6.214 ns | 0.0167 ns | 160,939,642.2 |  0.83 | 120% |
CompareOrdinal |      base |       9 |  7.408 ns | 0.0635 ns | 134,990,065.6 |  1    |      |
CompareOrdinal |      diff |       9 |  6.437 ns | 0.0060 ns | 155,355,707.9 |  0.87 | 115% |
CompareOrdinal |      base |      10 |  7.657 ns | 0.0389 ns | 130,606,710.9 |  1    |      |
CompareOrdinal |      diff |      10 |  6.435 ns | 0.0092 ns | 155,409,282.3 |  0.84 | 119% |
CompareOrdinal |      base |      11 |  8.040 ns | 0.0725 ns | 124,376,453.7 |  1    |      |
CompareOrdinal |      diff |      11 |  6.436 ns | 0.0105 ns | 155,378,710.7 |  0.8  | 125% |
CompareOrdinal |      base |      12 |  8.037 ns | 0.0348 ns | 124,422,050.9 |  1    |      |
CompareOrdinal |      diff |      12 |  6.439 ns | 0.0162 ns | 155,304,474.2 |  0.8  | 125% |
CompareOrdinal |      base |      13 |  8.456 ns | 0.0243 ns | 118,266,085.0 |  1    |      |
CompareOrdinal |      diff |      13 |  6.443 ns | 0.0157 ns | 155,212,797.2 |  0.76 | 132% |
CompareOrdinal |      base |      14 |  6.282 ns | 0.0187 ns | 159,176,779.5 |  1    |      |
CompareOrdinal |      diff |      14 |  6.437 ns | 0.0149 ns | 155,362,814.2 |  1.02 |  98% |
CompareOrdinal |      base |      15 |  6.268 ns | 0.0358 ns | 159,540,160.0 |  1    |      |
CompareOrdinal |      diff |      15 |  6.434 ns | 0.0138 ns | 155,425,241.0 |  1.03 |  97% |
CompareOrdinal |      base |      16 |  5.502 ns | 0.0780 ns | 181,759,720.4 |  1    |      |
CompareOrdinal |      diff |      16 |  6.249 ns | 0.0231 ns | 160,017,474.2 |  1.14 |  88% |
CompareOrdinal |      base |      17 |  6.615 ns | 0.0190 ns | 151,168,076.2 |  1    |      |
CompareOrdinal |      diff |      17 |  6.246 ns | 0.0127 ns | 160,093,226.6 |  0.94 | 106% |
CompareOrdinal |      base |      18 |  7.262 ns | 0.0194 ns | 137,710,614.1 |  1    |      |
CompareOrdinal |      diff |      18 |  6.253 ns | 0.0233 ns | 159,927,722.3 |  0.86 | 116% |
CompareOrdinal |      base |      19 |  7.395 ns | 0.0198 ns | 135,220,294.5 |  1    |      |
CompareOrdinal |      diff |      19 |  6.246 ns | 0.0120 ns | 160,101,708.5 |  0.84 | 119% |
CompareOrdinal |      base |      20 |  7.644 ns | 0.0213 ns | 130,827,428.6 |  1    |      |
CompareOrdinal |      diff |      20 |  6.244 ns | 0.0119 ns | 160,157,822.4 |  0.82 | 122% |
CompareOrdinal |      base |      21 |  7.711 ns | 0.0248 ns | 129,678,130.3 |  1    |      |
CompareOrdinal |      diff |      21 |  6.254 ns | 0.0280 ns | 159,904,313.4 |  0.81 | 123% |
CompareOrdinal |      base |      22 |  8.209 ns | 0.0229 ns | 121,815,140.6 |  1    |      |
CompareOrdinal |      diff |      22 |  6.253 ns | 0.0208 ns | 159,928,685.8 |  0.76 | 132% |
CompareOrdinal |      base |      23 |  8.437 ns | 0.0276 ns | 118,523,585.3 |  1    |      |
CompareOrdinal |      diff |      23 |  6.235 ns | 0.0332 ns | 160,381,594.4 |  0.74 | 135% |
CompareOrdinal |      base |      24 |  8.850 ns | 0.0482 ns | 112,992,528.4 |  1    |      |
CompareOrdinal |      diff |      24 |  6.252 ns | 0.0331 ns | 159,955,247.3 |  0.71 | 141% |
CompareOrdinal |      base |      25 |  8.666 ns | 0.0285 ns | 115,397,471.3 |  1    |      |
CompareOrdinal |      diff |      25 |  6.248 ns | 0.0276 ns | 160,044,673.9 |  0.72 | 139% |
CompareOrdinal |      base |      26 |  6.468 ns | 0.0328 ns | 154,606,877.3 |  1    |      |
CompareOrdinal |      diff |      26 |  6.317 ns | 0.0225 ns | 158,306,290.3 |  0.98 | 102% |
CompareOrdinal |      base |      27 |  7.804 ns | 0.0459 ns | 128,139,353.8 |  1    |      |
CompareOrdinal |      diff |      27 |  6.216 ns | 0.0250 ns | 160,865,918.0 |  0.8  | 125% |
CompareOrdinal |      base |      28 |  8.024 ns | 0.0549 ns | 124,628,196.5 |  1    |      |
CompareOrdinal |      diff |      28 |  6.217 ns | 0.0149 ns | 160,861,035.8 |  0.77 | 130% |
CompareOrdinal |      base |      29 |  7.678 ns | 0.0213 ns | 130,238,615.3 |  1    |      |
CompareOrdinal |      diff |      29 |  6.235 ns | 0.0351 ns | 160,384,748.7 |  0.81 | 123% |
CompareOrdinal |      base |      30 |  8.092 ns | 0.0348 ns | 123,578,752.7 |  1    |      |
CompareOrdinal |      diff |      30 |  6.250 ns | 0.0306 ns | 159,999,814.9 |  0.77 | 130% |
CompareOrdinal |      base |      31 |  8.227 ns | 0.0454 ns | 121,556,695.3 |  1    |      |
CompareOrdinal |      diff |      31 |  6.238 ns | 0.0491 ns | 160,304,019.0 |  0.76 | 132% |
CompareOrdinal |      base |      32 |  8.688 ns | 0.0732 ns | 115,096,742.4 |  1    |      |
CompareOrdinal |      diff |      32 |  6.222 ns | 0.0533 ns | 160,713,868.3 |  0.72 | 139% |
CompareOrdinal |      base |      33 |  8.484 ns | 0.0128 ns | 117,870,166.0 |  1    |      |
CompareOrdinal |      diff |      33 |  6.592 ns | 0.0313 ns | 151,693,637.7 |  0.78 | 128% |
CompareOrdinal |      base |      34 |  8.717 ns | 0.0287 ns | 114,722,837.7 |  1    |      |
CompareOrdinal |      diff |      34 |  6.624 ns | 0.0525 ns | 150,972,499.8 |  0.76 | 132% |
CompareOrdinal |      base |      35 |  8.885 ns | 0.0558 ns | 112,555,187.4 |  1    |      |
CompareOrdinal |      diff |      35 |  6.585 ns | 0.0340 ns | 151,859,163.7 |  0.74 | 135% |
CompareOrdinal |      base |      37 |  9.368 ns | 0.0286 ns | 106,743,397.5 |  1    |      |
CompareOrdinal |      diff |      37 |  6.506 ns | 0.0737 ns | 153,699,328.9 |  0.69 | 145% |
CompareOrdinal |      base |      38 |  7.894 ns | 0.0313 ns | 126,674,121.3 |  1    |      |
CompareOrdinal |      diff |      38 |  6.398 ns | 0.0539 ns | 156,310,293.0 |  0.81 | 123% |
CompareOrdinal |      base |      39 |  7.642 ns | 0.0299 ns | 130,863,753.5 |  1    |      |
CompareOrdinal |      diff |      39 |  6.409 ns | 0.0808 ns | 156,036,825.9 |  0.84 | 119% |
CompareOrdinal |      base |      40 |  8.036 ns | 0.0337 ns | 124,437,137.6 |  1    |      |
CompareOrdinal |      diff |      40 |  6.422 ns | 0.0550 ns | 155,716,826.9 |  0.8  | 125% |
CompareOrdinal |      base |      41 |  7.243 ns | 0.0500 ns | 138,065,007.0 |  1    |      |
CompareOrdinal |      diff |      41 |  6.397 ns | 0.0623 ns | 156,331,361.9 |  0.88 | 114% |
CompareOrdinal |      base |      42 |  8.312 ns | 0.0180 ns | 120,308,418.1 |  1    |      |
CompareOrdinal |      diff |      42 |  6.425 ns | 0.0536 ns | 155,641,772.3 |  0.77 | 130% |
CompareOrdinal |      base |      43 |  8.400 ns | 0.0130 ns | 119,040,855.7 |  1    |      |
CompareOrdinal |      diff |      43 |  6.449 ns | 0.0580 ns | 155,071,546.1 |  0.77 | 130% |
CompareOrdinal |      base |      44 |  8.938 ns | 0.0231 ns | 111,877,493.2 |  1    |      |
CompareOrdinal |      diff |      44 |  6.440 ns | 0.0435 ns | 155,284,991.9 |  0.72 | 139% |
CompareOrdinal |      base |      45 |  8.925 ns | 0.0339 ns | 112,040,562.5 |  1    |      |
CompareOrdinal |      diff |      45 |  6.572 ns | 0.0421 ns | 152,153,450.1 |  0.74 | 135% |
CompareOrdinal |      base |      46 |  9.430 ns | 0.0310 ns | 106,040,924.9 |  1    |      |
CompareOrdinal |      diff |      46 |  6.562 ns | 0.0265 ns | 152,383,295.4 |  0.7  | 143% |
CompareOrdinal |      base |      47 |  9.447 ns | 0.0193 ns | 105,848,695.4 |  1    |      |
CompareOrdinal |      diff |      47 |  6.593 ns | 0.0385 ns | 151,666,514.8 |  0.7  | 143% |
CompareOrdinal |      base |      48 |  9.695 ns | 0.0478 ns | 103,145,515.4 |  1    |      |
CompareOrdinal |      diff |      48 |  6.421 ns | 0.0763 ns | 155,747,630.9 |  0.66 | 152% |
CompareOrdinal |      base |      49 |  9.891 ns | 0.0330 ns | 101,103,394.2 |  1    |      |
CompareOrdinal |      diff |      49 |  7.075 ns | 0.0324 ns | 141,341,240.6 |  0.72 | 139% |
CompareOrdinal |      base |      50 |  8.753 ns | 0.0143 ns | 114,247,034.3 |  1    |      |
CompareOrdinal |      diff |      50 |  7.094 ns | 0.0274 ns | 140,956,633.9 |  0.81 | 123% |
CompareOrdinal |      base |      51 |  7.489 ns | 0.0294 ns | 133,530,478.3 |  1    |      |
CompareOrdinal |      diff |      51 |  7.104 ns | 0.0377 ns | 140,769,956.1 |  0.95 | 105% |
CompareOrdinal |      base |      52 |  8.964 ns | 0.0104 ns | 111,558,760.4 |  1    |      |
CompareOrdinal |      diff |      52 |  7.089 ns | 0.0358 ns | 141,058,316.1 |  0.79 | 127% |
CompareOrdinal |      base |      53 |  8.543 ns | 0.0213 ns | 117,060,863.3 |  1    |      |
CompareOrdinal |      diff |      53 |  7.085 ns | 0.0356 ns | 141,141,750.9 |  0.83 | 120% |
CompareOrdinal |      base |      54 |  7.812 ns | 0.0527 ns | 128,007,289.5 |  1    |      |
CompareOrdinal |      diff |      54 |  7.094 ns | 0.0340 ns | 140,961,837.0 |  0.91 | 110% |
CompareOrdinal |      base |      55 |  9.610 ns | 0.0889 ns | 104,058,536.2 |  1    |      |
CompareOrdinal |      diff |      55 |  7.260 ns | 0.0300 ns | 137,732,735.0 |  0.76 | 132% |
CompareOrdinal |      base |      56 |  9.944 ns | 0.0589 ns | 100,566,043.8 |  1    |      |
CompareOrdinal |      diff |      56 |  7.262 ns | 0.0405 ns | 137,700,869.7 |  0.73 | 137% |
CompareOrdinal |      base |      57 |  9.928 ns | 0.0338 ns | 100,720,650.4 |  1    |      |
CompareOrdinal |      diff |      57 |  7.237 ns | 0.0308 ns | 138,188,219.8 |  0.73 | 137% |
CompareOrdinal |      base |      58 | 10.416 ns | 0.0973 ns |  96,001,992.2 |  1    |      |
CompareOrdinal |      diff |      58 |  7.219 ns | 0.0292 ns | 138,522,886.3 |  0.69 | 145% |
CompareOrdinal |      base |      59 | 10.308 ns | 0.0192 ns |  97,012,227.6 |  1    |      |
CompareOrdinal |      diff |      59 |  7.208 ns | 0.0217 ns | 138,726,321.6 |  0.7  | 143% |
CompareOrdinal |      base |      60 | 10.676 ns | 0.0529 ns |  93,665,185.6 |  1    |      |
CompareOrdinal |      diff |      60 |  7.264 ns | 0.0396 ns | 137,666,794.4 |  0.68 | 147% |
CompareOrdinal |      base |      61 | 10.764 ns | 0.0165 ns |  92,898,125.6 |  1    |      |
CompareOrdinal |      diff |      61 |  7.215 ns | 0.0352 ns | 138,594,708.9 |  0.67 | 149% |
CompareOrdinal |      base |      62 |  8.420 ns | 0.0542 ns | 118,770,530.4 |  1    |      |
CompareOrdinal |      diff |      62 |  7.235 ns | 0.0427 ns | 138,224,679.8 |  0.86 | 116% |
CompareOrdinal |      base |      63 |  9.165 ns | 0.0116 ns | 109,116,464.3 |  1    |      |
CompareOrdinal |      diff |      63 |  7.194 ns | 0.0312 ns | 139,007,668.5 |  0.78 | 128% |
CompareOrdinal |      base |      64 |  9.707 ns | 0.0061 ns | 103,018,766.5 |  1    |      |
CompareOrdinal |      diff |      64 |  7.200 ns | 0.0206 ns | 138,883,359.7 |  0.74 | 135% |
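
A note for reading the table: the Ratio column appears to be the diff mean divided by the base mean, and the unlabeled last column appears to be the reciprocal of that rounded ratio, expressed as a percentage (relative throughput). This is inferred from the numbers, not stated in the PR. Taking the length 64 rows as a worked check:

$$
\text{Ratio} = \frac{7.200\ \text{ns}}{9.707\ \text{ns}} \approx 0.74, \qquad \frac{100\%}{0.74} \approx 135\%
$$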

@benaadams benaadams force-pushed the Use-CompareOrdinalHelper-for-SpanHelpers.SequenceCompareTo- branch from 9431416 to e278276 Compare January 27, 2020 18:59
@benaadams
Member Author

@stephentoub ready to go

@benaadams benaadams force-pushed the Use-CompareOrdinalHelper-for-SpanHelpers.SequenceCompareTo- branch from 61c79d9 to 1f66e28 Compare August 9, 2020 16:38
@danmoseley
Member

Rerunning failed jobs so hopefully we can merge this.

@danmoseley
Member

@jkotas you signed off on this a while back. It’s green now. Do you believe this needs further review?

@jkotas
Member

jkotas commented Aug 12, 2020

I have signed off on a much simpler version of this change. This should be reviewed by somebody with a PhD in hardware intrinsics.

@danmoseley
Member

@tannergooding knows such a person. He is a contended resource at the moment though..

@adamsitnik adamsitnik added the tenet-performance Performance related issue label Aug 14, 2020
@adamsitnik adamsitnik added this to the 5.0.0 milestone Aug 14, 2020
@GrabYourPitchforks GrabYourPitchforks added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Aug 14, 2020
@GrabYourPitchforks
Member

I've marked this NO MERGE for now since the latest iteration hasn't gone through review. Once there's a review on the latest iteration feel free to remove this label.

@tannergooding
Member

@benaadams, if you could resolve the merge conflicts then I can give this a review 😄

@benaadams
Member Author

@tannergooding #41097 is good to go, while I resolve these conflicts 😉

if (Sse2.IsSupported)
{
if (Vector.IsHardwareAccelerated && minLength >= (nuint)Vector<ushort>.Count)
// Calucate lengthToExamine here for test, rather than just testing as it used later, rather than doing it twice.
Contributor

s/Calucate/Calculate

}
else if (Vector.IsHardwareAccelerated)
{
// Calucate lengthToExamine here for test, rather than just testing as it used later, rather than doing it twice.
Contributor

s/Calucate/Calculate

@danmoseley
Member

@benaadams do you think you'll be able to resolve those conflicts? I'm keeping an eye on this because it's one of our oldest PRs 🙂

Base automatically changed from master to main March 1, 2021 09:06
@carlossanlop
Contributor

Ping @benaadams can you please address the latest comments?

@danmoseley
Member

Thanks for the PR, @benaadams. I'm going to close this; feel free to reopen if you plan to pick it up again.

@danmoseley danmoseley closed this Mar 19, 2021
MichalStrehovsky pushed a commit to MichalStrehovsky/runtime that referenced this pull request Mar 25, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 18, 2021
radical pushed a commit to radical/runtime that referenced this pull request Jul 7, 2022
- `UninstallApp()` wasn't triggering for devices
- mlaunch failures when running app didn't get detected

Resolves dotnet#402

Labels

area-System.Memory NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) tenet-performance Performance related issue
