[#86787] [Ruby trunk Feature#14723] [WIP] sleepy GC — ko1@... (13 messages, 2018/05/01)
    Issue #14723 has been updated by ko1 (Koichi Sasada).
[#86790] Re: [Ruby trunk Feature#14723] [WIP] sleepy GC — Eric Wong <normalperson@...> (2018/05/01)
[#86791] Re: [Ruby trunk Feature#14723] [WIP] sleepy GC — Koichi Sasada <ko1@...> (2018/05/01)
[#86792] Re: [Ruby trunk Feature#14723] [WIP] sleepy GC — Eric Wong <normalperson@...> (2018/05/01)
[#86793] Re: [Ruby trunk Feature#14723] [WIP] sleepy GC — Koichi Sasada <ko1@...> (2018/05/01)
[#86794] Re: [Ruby trunk Feature#14723] [WIP] sleepy GC — Eric Wong <normalperson@...> (2018/05/01)
[#86814] Re: [Ruby trunk Feature#14723] [WIP] sleepy GC — Koichi Sasada <ko1@...> (2018/05/02)
[#86815] Re: [Ruby trunk Feature#14723] [WIP] sleepy GC — Eric Wong <normalperson@...> (2018/05/02)
[#86816] Re: [Ruby trunk Feature#14723] [WIP] sleepy GC — Koichi Sasada <ko1@...> (2018/05/02)
[#86847] [Ruby trunk Bug#14732] CGI.unescape returns different instance between Ruby 2.3 and 2.4 — me@... (3 messages, 2018/05/02)
    Issue #14732 has been reported by jnchito (Junichi Ito).
[#86860] [Ruby trunk Feature#14723] [WIP] sleepy GC — sam.saffron@... (6 messages, 2018/05/03)
    Issue #14723 has been updated by sam.saffron (Sam Saffron).
[#86862] Re: [Ruby trunk Feature#14723] [WIP] sleepy GC — Eric Wong <normalperson@...> (2018/05/03)
[#86935] [Ruby trunk Bug#14742] Deadlock when autoloading different constants in the same file from multiple threads — elkenny@... (5 messages, 2018/05/08)
    Issue #14742 has been reported by eugeneius (Eugene Kenny).
[#87030] [Ruby trunk Feature#14757] [PATCH] thread_pthread.c: enable thread cache by default — normalperson@... (4 messages, 2018/05/15)
    Issue #14757 has been reported by normalperson (Eric Wong).
[#87093] [Ruby trunk Feature#14767] [PATCH] gc.c: use monotonic counters for objspace_malloc_increase — ko1@... (3 messages, 2018/05/17)
    Issue #14767 has been updated by ko1 (Koichi Sasada).
[#87095] [Ruby trunk Feature#14767] [PATCH] gc.c: use monotonic counters for objspace_malloc_increase — ko1@... (9 messages, 2018/05/17)
    Issue #14767 has been updated by ko1 (Koichi Sasada).
[#87096] Re: [Ruby trunk Feature#14767] [PATCH] gc.c: use monotonic counters for objspace_malloc_increase — Eric Wong <normalperson@...> (2018/05/17)
[#87166] Re: [Ruby trunk Feature#14767] [PATCH] gc.c: use monotonic counters for objspace_malloc_increase — Eric Wong <normalperson@...> (2018/05/18)
[#87486] Re: [Ruby trunk Feature#14767] [PATCH] gc.c: use monotonic counters for objspace_malloc_increase — Eric Wong <normalperson@...> (2018/06/13)
[ruby-core:86921] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
From: samuel@...
Date: 2018-05-06 12:17:40 UTC
List: ruby-core #86921
Issue #14739 has been updated by ioquatix (Samuel Williams).
I tested async-http, a web server; it has a basic performance spec that uses `wrk` as the client.
I ran it several times and report the best result of each below. It's difficult to make a judgement: I'd like to say performance was improved, but if so, by less than 5%. However, this benchmark exercises an entire web server stack, and context switching only happens a few times per request. If I had to take a guess, not more than 4 times (accept, read request, write response). In many cases, we only context switch if the operation would block.
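That last point can be sketched in plain core Ruby. This is only an illustration of the "switch only when blocked" pattern, not the actual async-http internals, and `read_or_yield` is a made-up helper name:
```
# Illustrative sketch: a fiber-based reader that yields back to the reactor
# only when the read would actually block. Not async-http source code.
def read_or_yield(io, maxlen)
  loop do
    case chunk = io.read_nonblock(maxlen, exception: false)
    when :wait_readable
      Fiber.yield(io)   # hand control back until io becomes readable
    else
      return chunk      # data (or nil at EOF): no fiber switch needed
    end
  end
end
```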
```
# Without libcoro-fiber
Async::HTTP::Server
  simple response
Running 2m test @ http://127.0.0.1:9292/
  8 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   110.06us  647.25us   67.72ms   99.33%
    Req/Sec    12.58k     3.07k    26.94k    70.77%
  12021990 requests in 2.00m, 401.28MB read
Requests/sec: 100100.72
Transfer/sec:      3.34MB

# With libcoro-fiber
Async::HTTP::Server
  simple response
Running 2m test @ http://127.0.0.1:9292/
  8 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   106.47us  834.32us   99.45ms   99.46%
    Req/Sec    12.66k     2.95k    17.61k    71.12%
  12093398 requests in 2.00m, 403.66MB read
Requests/sec: 100694.76
Transfer/sec:      3.36MB
```
This result surprised me a little bit, but now that I think about it, it makes sense: the cost of the network I/O (read/write) and processing (parsing, generating the response, buffers, GC) far outweighs the fiber yield/resume, which is already minimised. In real-world situations, the results should lean more in favour of libcoro.
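For reference, the raw switch cost can be measured in isolation with nothing but core Ruby; a minimal sketch (the iteration count is arbitrary):
```
require 'benchmark'

# Minimal micro-benchmark of Fiber switching in isolation: each iteration
# performs one resume into the fiber and one yield back out.
N = 1_000_000

fiber = Fiber.new do
  loop { Fiber.yield }
end

puts Benchmark.measure { N.times { fiber.resume } }
```
On its own this measures only the switch itself, which is why it barely registers next to the per-request network and parsing work above.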
Just for interest, I also collected system call stats.
```
# Without libcoro
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
45.76 4.635066 2 2095278 sendto
32.47 3.288691 1 4191323 rt_sigprocmask
20.90 2.117062 1 2095611 324 recvfrom
0.67 0.068189 9741 7 poll
0.07 0.006821 1 6256 5313 openat
0.03 0.003404 1 4034 5 lstat
0.01 0.001072 1 1158 read
0.01 0.001049 1 987 close
0.01 0.000805 1 901 421 stat
0.01 0.000627 25 25 clone
0.01 0.000624 1 793 fstat
0.01 0.000521 4 124 mmap
0.00 0.000475 1 798 246 fcntl
0.00 0.000475 2 297 1 epoll_wait
0.00 0.000402 3 140 mremap
0.00 0.000386 1 346 322 epoll_ctl
0.00 0.000331 1 557 552 ioctl
0.00 0.000323 16 20 futex
0.00 0.000321 3 94 mprotect
0.00 0.000307 1 213 brk
0.00 0.000255 4 62 getdents
0.00 0.000183 1 291 getuid
0.00 0.000180 1 292 geteuid
0.00 0.000177 1 292 getegid
0.00 0.000172 1 291 getgid
0.00 0.000096 3 36 pipe2
0.00 0.000074 6 12 munmap
0.00 0.000066 11 6 2 execve
0.00 0.000052 2 23 14 accept4
0.00 0.000047 3 18 prctl
0.00 0.000047 2 27 set_robust_list
0.00 0.000045 2 19 getpid
0.00 0.000040 0 81 2 rt_sigaction
0.00 0.000028 2 16 8 access
0.00 0.000017 1 15 getcwd
0.00 0.000016 1 14 readlink
0.00 0.000016 0 241 238 newfstatat
0.00 0.000014 0 96 lseek
0.00 0.000013 1 10 chdir
0.00 0.000013 3 4 arch_prctl
0.00 0.000012 0 25 setsockopt
0.00 0.000009 0 25 getsockname
0.00 0.000007 2 4 prlimit64
0.00 0.000006 0 17 getsockopt
0.00 0.000006 3 2 getrandom
0.00 0.000004 2 2 sched_getaffinity
0.00 0.000004 4 1 clock_gettime
0.00 0.000003 2 2 write
0.00 0.000003 3 1 sigaltstack
0.00 0.000003 2 2 set_tid_address
0.00 0.000002 2 1 vfork
0.00 0.000001 1 1 wait4
0.00 0.000001 1 1 getresgid
0.00 0.000000 0 8 pipe
0.00 0.000000 0 1 dup2
0.00 0.000000 0 8 socket
0.00 0.000000 0 8 bind
0.00 0.000000 0 8 listen
0.00 0.000000 0 1 sysinfo
0.00 0.000000 0 1 getresuid
0.00 0.000000 0 8 epoll_create1
------ ----------- ----------- --------- --------- ----------------
100.00 10.128563 8400935 7448 total
# With libcoro
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
65.83 5.263501 2 2708883 sendto
32.87 2.628193 1 2709155 263 recvfrom
1.06 0.084583 16917 5 poll
0.09 0.006915 1 6232 5313 openat
0.06 0.004405 1 4034 5 lstat
0.02 0.001276 1 1123 read
0.02 0.001207 1 833 379 stat
0.01 0.000996 1 963 close
0.01 0.000510 1 785 fstat
0.01 0.000492 1 533 528 ioctl
0.00 0.000330 2 162 1 epoll_wait
0.00 0.000327 0 797 246 fcntl
0.00 0.000285 11 25 clone
0.00 0.000253 1 232 brk
0.00 0.000253 1 284 260 epoll_ctl
0.00 0.000239 2 123 mmap
0.00 0.000207 2 95 mprotect
0.00 0.000168 8 20 futex
0.00 0.000163 3 62 getdents
0.00 0.000142 0 291 getuid
0.00 0.000139 1 238 235 newfstatat
0.00 0.000133 0 292 geteuid
0.00 0.000131 0 291 getgid
0.00 0.000129 0 292 getegid
0.00 0.000080 7 12 munmap
0.00 0.000058 2 32 rt_sigprocmask
0.00 0.000057 1 88 lseek
0.00 0.000057 2 36 pipe2
0.00 0.000044 1 81 2 rt_sigaction
0.00 0.000043 3 14 readlink
0.00 0.000039 2 16 8 access
0.00 0.000036 2 22 13 accept4
0.00 0.000035 1 27 set_robust_list
0.00 0.000033 2 18 prctl
0.00 0.000028 1 19 getpid
0.00 0.000026 2 15 getcwd
0.00 0.000020 2 10 chdir
0.00 0.000013 13 1 wait4
0.00 0.000009 5 2 getrandom
0.00 0.000008 0 25 setsockopt
0.00 0.000006 3 2 write
0.00 0.000006 0 25 getsockname
0.00 0.000003 3 1 vfork
0.00 0.000003 1 6 2 execve
0.00 0.000003 1 4 arch_prctl
0.00 0.000003 2 2 set_tid_address
0.00 0.000003 1 4 prlimit64
0.00 0.000002 0 17 getsockopt
0.00 0.000002 2 1 sigaltstack
0.00 0.000001 1 1 getresuid
0.00 0.000001 1 1 getresgid
0.00 0.000001 1 2 sched_getaffinity
0.00 0.000000 0 8 pipe
0.00 0.000000 0 1 dup2
0.00 0.000000 0 8 socket
0.00 0.000000 0 8 bind
0.00 0.000000 0 8 listen
0.00 0.000000 0 1 sysinfo
0.00 0.000000 0 1 clock_gettime
0.00 0.000000 0 8 epoll_create1
------ ----------- ----------- --------- --------- ----------------
rake aborted!
Interrupt:
```
`rt_sigprocmask` is essentially gone because libcoro only invokes it when the `swapcontext` backend is used; glibc's `swapcontext` saves and restores the signal mask on every switch, which is where those calls came from.
----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71883
* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
I am interested in improving Fiber yield/resume performance.
I've used this library before (http://software.schmorp.de/pkg/libcoro.html) and handled millions of HTTP requests with it.
I'd suggest using that library.
As fibers are used in many places in Ruby (e.g. enumerators), this could be a big performance win across the board.
Here is a nice summary of what was done for RethinkDB: https://rethinkdb.com/blog/making-coroutines-fast/
Does Ruby currently reuse stacks? That is also a big performance win if it's not being done already.
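To illustrate the enumerator point above: in CRuby, external iteration via `Enumerator#next` is backed by a fiber, so the switch cost is paid on every element. A minimal sketch:
```
# Each call to #next resumes a fiber under the hood (in CRuby), so faster
# fiber switching directly benefits external iteration like this.
numbers = Enumerator.new do |y|
  i = 0
  loop { y << (i += 1) }
end

3.times { p numbers.next }  # => 1, 2, 3
```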