Skip to content

src,http2: use native-layer pipe mechanism from FileHandle instead of synchronous I/O#18936

Closed
addaleax wants to merge 11 commits intonodejs:masterfrom
addaleax:http2-file-pipe
Closed

src,http2: use native-layer pipe mechanism from FileHandle instead of synchronous I/O#18936
addaleax wants to merge 11 commits intonodejs:masterfrom
addaleax:http2-file-pipe

Conversation

@addaleax
Copy link
Member

@addaleax addaleax commented Feb 22, 2018

Hi everyone! :)

This PR resolves the issue of using synchronous I/O for file-backed HTTP/2 streams, by turning FileHandles into (currently read-only) StreamBase instances, adding a generic mechanism to pipe from a StreamBase into another StreamBase (where currently the latter one needs to know that it wants to have data written to it, which is true for HTTP/2 streams).

This could probably land as it is, but I’ve marked it as WIP because the code probably isn’t quite documented yet and I want to investigate more into the performance of these changes:

The http2/respond-with-fd benchmark shows a ~30 % (update: down to more, like, 20 %) performance drop for me. At least part of that is likely related to the way the benchmark is written: It performs tens or hundreds of reads for the same portion of the same short file in order to return data to the client. In particular, the kernel will always immediately have the data available and will never need to perform actual disk operations, so synchronous reads actually come with less overhead than asynchronous reads to begin with.

In some way, that’s realistic: A lot of the time, data will already be in the kernel cache. On the other hand, Node has committed to avoiding synchronous I/O, and we can’t really know whether disk I/O will be slow or fast in advance.

(A lot of these assumptions are based on the fact than when I add a nanosleep() call with 1 ns duration – which obviously takes longer than 1 ns, but it appears to be enough – to the pread64() syscall, the 30 % drop turns into a whopping 100 % perf win).

In any case, I’d love to hear ideas and suggestions, both about the general approach and its performance here.

/cc @nodejs/http2 @jasnell @mcollina @apapirovski @Fishrock123


Commits:

  • http2: simplify timeout tracking

    There’s no need to reset the chunk counter for every write.

  • src: make FileHandle a (readonly) StreamBase

    This enables accessing files using a more standard pattern.

    Once some more refactoring has been performed on the other existing
    StreamBase streams, this could also be used to implement fs
    streams in a more standard manner.

  • src: give StreamBases the capability to ask for data

    Add a OnStreamWantsWrite() event that allows streams to
    ask for more input data if they want some.

  • src: introduce native-layer stream piping

    Provide a way to create pipes between native StreamBase instances
    that acts more directly than a .pipe() call would.

  • http2: use native pipe instead of synchronous I/O

    This resolves the issue of using synchronous I/O for
    respondWithFile() and respondWithFD(), and enables
    scenarios in which the underlying file does not need
    to be a regular file.

  • http2: remove regular-file-only restriction

    Requiring respondWithFile() to only work with regular files
    is an artificial restriction on Node’s side and has become unnecessary.

    Offsets or lengths cannot be specified for those files,
    but that is an inherent property of other file types.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

src,http2

CI: https://ci.nodejs.org/job/node-test-commit/16439/
CI: https://ci.nodejs.org/job/node-test-commit/16441/
CI: https://ci.nodejs.org/job/node-test-commit/16447/

@addaleax addaleax added wip Issues and PRs that are still a work in progress. c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. http2 Issues or PRs related to the http2 subsystem. labels Feb 22, 2018
@nodejs-github-bot nodejs-github-bot added the lib / src Issues and PRs related to general changes in the lib or src directory. label Feb 22, 2018
@addaleax addaleax force-pushed the http2-file-pipe branch 4 times, most recently from e7b0227 to f7b044e Compare February 22, 2018 18:34
src/node_file.cc Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not std::unique_ptr?

std::unique_ptr<FileHandleReadWrap> req_wrap(FileHandleReadWrap::from_req(req));

The delete then happens automatically at the end of the block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fhinkel Done! But mostly because moving towards always using smart pointers is good. In this case, I am not sure that it actually makes the code or ownership flow cleaner (so I’ve added a comment for that).

@mcollina
Copy link
Member

IMHO files served in this way would always be in the kernel cache, and this is a very tiny read.

Can you add a realistic benchmark, ideally in another PR so that we can use it as a baseline to compare the two implementations?

Can you generate a flamegraph of before/after the change? It would be useful to understand the performance impact.

src/node_file.cc Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is creating a function here ok from a performance point of view? This is not the typical style I've used when working with libuv.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mcollina This doesn’t create a new function in C++.

This is not the typical style I've used when working with libuv.

I think that’s just because a lot of our code that deals with libuv was written before we started using C++11.

@addaleax
Copy link
Member Author

IMHO files served in this way would always be in the kernel cache, and this is a very tiny read.

That would be true for a lot of cases, yes, and it’s something I’m worried about; because I’d expect that as long as files are in the kernel cache, synchronous I/O is always going to be faster than async I/O.

Can you add a realistic benchmark, ideally in another PR so that we can use it as a baseline to compare the two implementations?

Sure, but I’d be interested in the criteria we would use. I’m wondering whether something like a combination with the benchmarks that we have for network sockets would make sense here, i.e. sending one or more large files over multiple streams and requests in parallel, and measuring throughput rather than ops/sec.

Can you generate a flamegraph of before/after the change? It would be useful to understand the performance impact.

Tbh, I’m not sure what tooling would be best suited for that, but I’m sure I can try.

@fhinkel
Copy link
Member

fhinkel commented Feb 23, 2018

I would love to see more modern C++ in our libuv related code : 💓

@addaleax
Copy link
Member Author

@jasnell
Copy link
Member

jasnell commented Feb 23, 2018

Thank you for this. It's going to take me a few days to be able to get completely through this. I'll try to have some feedback by Monday :)

@apapirovski
Copy link
Contributor

This looks great, @addaleax! In the middle of a major move so it's likely going to take me a little while to review.

@addaleax addaleax force-pushed the http2-file-pipe branch 5 times, most recently from ed538b5 to b880709 Compare February 23, 2018 22:07
@addaleax
Copy link
Member Author

A 12% perf loss at 10k requests is a bit painful

Yeah. :/ Still, I think following the general principle of “never blocking the event loop” is enough to make this worth it. Also, there’s decent perf gains in other categories, and there’s always the fact that the benchmark doesn’t represent a real-world server and we’ll just have to see how this works out in reality.

I suspect that it's still significantly faster than using the normal fs.createReadStream() approach. Can you do a quick check to confirm? Even just local benchmarking should be enough.

Yeah, it is:

$ ./node benchmark/http2/respond-with-fd.js streams=200 requests=10000 
http2/respond-with-fd.js method="pipe" benchmarker="h2load" clients=1 streams=200 requests=10000: 3,375.29
http2/respond-with-fd.js method="respondwithfd" benchmarker="h2load" clients=1 streams=200 requests=10000: 6,490.17
http2/respond-with-fd.js method="pipe" benchmarker="h2load" clients=2 streams=200 requests=10000: 3,642.84
http2/respond-with-fd.js method="respondwithfd" benchmarker="h2load" clients=2 streams=200 requests=10000: 6,463.41

(Side note: I just about got the first prototype for a fast-path StreamBase-to-StreamBase .pipe() working & passing all existing tests – so soon there might not be much of a difference at all. 😄)

addaleax added a commit to addaleax/node that referenced this pull request Mar 14, 2018
82bdf8f fixed an issue by silently modifying the `start`
option for the case when only `end` is passed, in order to perform
reads from a specified range in the file.

However, that approach does not work for non-seekable files, since
a numeric `start` option means that positioned reads will be used
to read data from the file.

This patch fixes that, and instead ends reading after a specified
size by adjusting the read buffer size.

This way we avoid re-introducing the bug that 82bdf8f fixed,
and align behaviour with the native file stream mechanism
introduced in nodejs#18936 as well.

Fixes: nodejs#19240
Refs: nodejs#18121
addaleax added 11 commits March 14, 2018 22:54
Put `HandleScope`s and `Context::Scope`s where they are used,
and don’t create one for native stream callbacks automatically.

This is slightly less convenient but means that stream listeners
that don’t actually call back into JS don’t have to pay the
(small) cost of setting these up.

PR-URL: nodejs#18936
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
This enables accessing files using a more standard pattern.

Once some more refactoring has been performed on the other existing
`StreamBase` streams, this could also be used to implement `fs`
streams in a more standard manner.

PR-URL: nodejs#18936
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Add a `OnStreamWantsWrite()` event that allows streams to
ask for more input data if they want some.

PR-URL: nodejs#18936
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Add `AsyncScope` for cases where the async_hooks `before` and
`after` callbacks should be called, to track async context,
but no actual JS is called in between and we can therefore
skip things like draining the microtask or `nextTick` queues.

PR-URL: nodejs#18936
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Provide a way to create pipes between native `StreamBase` instances
that acts more directly than a `.pipe()` call would.

PR-URL: nodejs#18936
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
This resolves the issue of using synchronous I/O for
`respondWithFile()` and `respondWithFD()`, and enables
scenarios in which the underlying file does not need
to be a regular file.

PR-URL: nodejs#18936
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Requiring `respondWithFile()` to only work with regular files
is an artificial restriction on Node’s side and has become unnecessary.

Offsets or lengths cannot be specified for those files,
but that is an inherent property of other file types.

PR-URL: nodejs#18936
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
This allows V8 to avoid preparing a execution context
for the constructor, to give a (kinda) small but noticeable
perf gain.

Benchmarks (only this commit):

    $ ./node benchmark/compare.js --new ./node --old ./node-master --filter net-c2s.js --set len=10 --set type=asc --runs 360 net | Rscript benchmark/compare.R
    [01:15:27|% 100| 1/1 files | 720/720 runs | 1/1 configs]: Done
                                             confidence improvement accuracy (*)   (**)  (***)
     net/net-c2s.js dur=5 type='asc' len=10        ***      0.69 %       ±0.31% ±0.41% ±0.53%

PR-URL: nodejs#18936
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
`requests = 1000000` took about 10 minutes per run for me
and doesn’t seem to add much value on its own.

PR-URL: nodejs#18936
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
The `StreamBase` interface changed, so that `OnStreamRead()`
and `OnStreamAlloc()` are not guaranteed to be emitted in the
same tick any more.

This means that, while it isn’t causing any trouble right now,
we should not assume that it’s safe to return a static buffer
in the HTTP parser’s `OnStreamAlloc()` method.

PR-URL: nodejs#18936
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
@addaleax
Copy link
Member Author

@addaleax addaleax added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Mar 14, 2018
@addaleax
Copy link
Member Author

Landed in 3ad7c1a...1ac1424

@addaleax addaleax closed this Mar 15, 2018
@addaleax addaleax deleted the http2-file-pipe branch March 15, 2018 11:55
addaleax added a commit that referenced this pull request Mar 15, 2018
Put `HandleScope`s and `Context::Scope`s where they are used,
and don’t create one for native stream callbacks automatically.

This is slightly less convenient but means that stream listeners
that don’t actually call back into JS don’t have to pay the
(small) cost of setting these up.

PR-URL: #18936
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
addaleax added a commit that referenced this pull request Mar 15, 2018
This enables accessing files using a more standard pattern.

Once some more refactoring has been performed on the other existing
`StreamBase` streams, this could also be used to implement `fs`
streams in a more standard manner.

PR-URL: #18936
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
addaleax added a commit that referenced this pull request Mar 15, 2018
Add a `OnStreamWantsWrite()` event that allows streams to
ask for more input data if they want some.

PR-URL: #18936
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
@targos
Copy link
Member

targos commented Mar 17, 2018

Should this be backported to v9.x-staging? If yes please follow the guide and raise a backport PR, if not let me know or add the dont-land-on label.

@addaleax
Copy link
Member Author

This depends on #18297, so I don’t think we can backport it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c++ Issues and PRs that require attention from people who are familiar with C++. http2 Issues or PRs related to the http2 subsystem. lib / src Issues and PRs related to general changes in the lib or src directory.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants