Commit b9cd1cd
committed
Implement mb_substr_count using fast text conversion filters
The performance gain from this change depends on the text encoding and
input string size. For very small strings, other overheads tend to swamp
the performance gains to some extent, such that the speedup is less than
2x. For medium-length strings (~100 bytes or so), the speedup is
typically around 2.5x.
The greatest performance gains are for UTF-8 strings which have already
been marked as valid (using the GC flags on the zend_string object);
for those, the speedup is more than 10x in many cases.
The previous implementation first converted the haystack and needle to
wchars, then searched for matches between the two sequences of wchars.
Because we use -1 as an error marker when converting to wchars, error
markers from invalid byte sequences in the haystack would match error
markers from invalid byte sequences in the needle, even if the specific
invalid byte sequence was different. I am not sure whether this behavior
is really desirable or not, but anyways, this new implementation
follows the same behavior so as not to cause BC breaks.1 parent f9a1a90 commit b9cd1cd
File tree
5 files changed
+132
-175
lines changed- ext/mbstring
- libmbfl/mbfl
- tests
5 files changed
+132
-175
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
429 | 429 | | |
430 | 430 | | |
431 | 431 | | |
432 | | - | |
433 | | - | |
434 | | - | |
435 | | - | |
436 | | - | |
437 | | - | |
438 | | - | |
439 | | - | |
440 | | - | |
441 | | - | |
442 | | - | |
443 | | - | |
444 | | - | |
445 | | - | |
446 | | - | |
447 | | - | |
448 | | - | |
449 | | - | |
450 | | - | |
451 | | - | |
452 | | - | |
453 | | - | |
454 | | - | |
455 | | - | |
456 | | - | |
457 | | - | |
458 | | - | |
459 | | - | |
460 | | - | |
461 | | - | |
462 | | - | |
463 | | - | |
464 | | - | |
465 | | - | |
466 | | - | |
467 | | - | |
468 | | - | |
469 | | - | |
470 | | - | |
471 | | - | |
472 | | - | |
473 | | - | |
474 | | - | |
475 | | - | |
476 | | - | |
477 | | - | |
478 | | - | |
479 | | - | |
480 | | - | |
481 | | - | |
482 | | - | |
483 | | - | |
484 | | - | |
485 | | - | |
486 | | - | |
487 | | - | |
488 | | - | |
489 | | - | |
490 | | - | |
491 | | - | |
492 | | - | |
493 | | - | |
494 | | - | |
495 | | - | |
496 | | - | |
497 | | - | |
498 | | - | |
499 | | - | |
500 | | - | |
501 | | - | |
502 | | - | |
503 | | - | |
504 | | - | |
505 | | - | |
506 | | - | |
507 | | - | |
508 | | - | |
509 | | - | |
510 | | - | |
511 | | - | |
512 | | - | |
513 | | - | |
514 | | - | |
515 | | - | |
516 | | - | |
517 | | - | |
518 | | - | |
519 | | - | |
520 | | - | |
521 | | - | |
522 | | - | |
523 | | - | |
524 | | - | |
525 | | - | |
526 | | - | |
527 | | - | |
528 | | - | |
529 | | - | |
530 | | - | |
531 | | - | |
532 | | - | |
533 | | - | |
534 | | - | |
535 | | - | |
536 | | - | |
537 | | - | |
538 | | - | |
539 | | - | |
540 | | - | |
541 | | - | |
542 | | - | |
543 | | - | |
544 | | - | |
545 | | - | |
546 | | - | |
547 | | - | |
548 | | - | |
549 | | - | |
550 | | - | |
551 | | - | |
552 | | - | |
553 | | - | |
554 | | - | |
555 | | - | |
556 | | - | |
557 | | - | |
558 | | - | |
559 | | - | |
560 | | - | |
561 | | - | |
562 | | - | |
563 | | - | |
564 | 432 | | |
565 | 433 | | |
566 | 434 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
112 | 112 | | |
113 | 113 | | |
114 | 114 | | |
115 | | - | |
116 | | - | |
117 | | - | |
118 | 115 | | |
119 | 116 | | |
120 | 117 | | |
121 | 118 | | |
| 119 | + | |
122 | 120 | | |
123 | 121 | | |
124 | 122 | | |
| |||
195 | 193 | | |
196 | 194 | | |
197 | 195 | | |
198 | | - | |
199 | | - | |
200 | | - | |
201 | | - | |
202 | | - | |
203 | | - | |
204 | 196 | | |
205 | 197 | | |
206 | 198 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
394 | 394 | | |
395 | 395 | | |
396 | 396 | | |
397 | | - | |
398 | | - | |
| 397 | + | |
| 398 | + | |
| 399 | + | |
| 400 | + | |
| 401 | + | |
| 402 | + | |
399 | 403 | | |
| 404 | + | |
| 405 | + | |
400 | 406 | | |
401 | 407 | | |
402 | 408 | | |
| |||
427 | 433 | | |
428 | 434 | | |
429 | 435 | | |
| 436 | + | |
| 437 | + | |
| 438 | + | |
| 439 | + | |
| 440 | + | |
| 441 | + | |
| 442 | + | |
| 443 | + | |
| 444 | + | |
| 445 | + | |
| 446 | + | |
430 | 447 | | |
431 | 448 | | |
432 | 449 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1961 | 1961 | | |
1962 | 1962 | | |
1963 | 1963 | | |
1964 | | - | |
| 1964 | + | |
1965 | 1965 | | |
1966 | 1966 | | |
1967 | | - | |
| 1967 | + | |
1968 | 1968 | | |
1969 | 1969 | | |
1970 | 1970 | | |
| |||
2263 | 2263 | | |
2264 | 2264 | | |
2265 | 2265 | | |
2266 | | - | |
2267 | 2266 | | |
2268 | 2267 | | |
2269 | | - | |
2270 | | - | |
2271 | | - | |
| 2268 | + | |
2272 | 2269 | | |
2273 | 2270 | | |
2274 | | - | |
2275 | | - | |
| 2271 | + | |
| 2272 | + | |
2276 | 2273 | | |
2277 | 2274 | | |
2278 | 2275 | | |
2279 | 2276 | | |
2280 | | - | |
2281 | | - | |
2282 | | - | |
2283 | | - | |
| 2277 | + | |
2284 | 2278 | | |
2285 | 2279 | | |
2286 | 2280 | | |
2287 | 2281 | | |
2288 | | - | |
2289 | | - | |
| 2282 | + | |
| 2283 | + | |
2290 | 2284 | | |
2291 | 2285 | | |
2292 | 2286 | | |
2293 | | - | |
2294 | | - | |
2295 | | - | |
2296 | | - | |
2297 | | - | |
2298 | | - | |
2299 | | - | |
| 2287 | + | |
| 2288 | + | |
| 2289 | + | |
| 2290 | + | |
| 2291 | + | |
| 2292 | + | |
| 2293 | + | |
| 2294 | + | |
| 2295 | + | |
| 2296 | + | |
| 2297 | + | |
| 2298 | + | |
| 2299 | + | |
| 2300 | + | |
| 2301 | + | |
| 2302 | + | |
| 2303 | + | |
| 2304 | + | |
| 2305 | + | |
| 2306 | + | |
| 2307 | + | |
| 2308 | + | |
| 2309 | + | |
| 2310 | + | |
| 2311 | + | |
| 2312 | + | |
| 2313 | + | |
| 2314 | + | |
| 2315 | + | |
| 2316 | + | |
| 2317 | + | |
| 2318 | + | |
| 2319 | + | |
| 2320 | + | |
| 2321 | + | |
| 2322 | + | |
| 2323 | + | |
| 2324 | + | |
| 2325 | + | |
| 2326 | + | |
| 2327 | + | |
| 2328 | + | |
| 2329 | + | |
| 2330 | + | |
| 2331 | + | |
| 2332 | + | |
| 2333 | + | |
| 2334 | + | |
| 2335 | + | |
| 2336 | + | |
| 2337 | + | |
| 2338 | + | |
| 2339 | + | |
| 2340 | + | |
| 2341 | + | |
| 2342 | + | |
| 2343 | + | |
| 2344 | + | |
| 2345 | + | |
| 2346 | + | |
| 2347 | + | |
2300 | 2348 | | |
2301 | | - | |
2302 | 2349 | | |
2303 | 2350 | | |
2304 | 2351 | | |
| |||
0 commit comments