Skip to content

Conversation

@asolntsev
Copy link
Contributor

@asolntsev asolntsev commented Nov 22, 2025

User description

🔗 Related Issues

Fixes #16612

💥 What does this PR do?

This PR allows downloading very large files (I tried up to 4GB) from Selenium Grid, while consuming very low memory.

🔄 Types of changes

  • Performance improvement (backwards compatible)

PR Type

Enhancement, Performance


Description

  • Add new Grid endpoint /se/files/:name for direct file downloads without Base64 encoding

  • Optimize RemoteWebDriver.downloadFile() to stream files directly instead of loading into memory

  • Extract anonymous Contents.Supplier implementations into separate named classes for better debugging

  • Improve HTTP response handling to support binary streams and large file transfers

  • Fix flaky tests by adding proper wait conditions and increased timeouts


Diagram Walkthrough

flowchart LR
  A["Grid Node"] -->|"GET /se/files/:name"| B["New Direct Download Endpoint"]
  B -->|"Stream file"| C["FileContentSupplier"]
  C -->|"InputStream"| D["RemoteWebDriver"]
  D -->|"Files.copy"| E["Target Location"]
  F["Old Method"] -->|"Base64 + JSON"| G["Memory Issues"]
  style B fill:#90EE90
  style C fill:#90EE90
  style E fill:#87CEEB
Loading

File Walkthrough

Relevant files
Enhancement
20 files
Node.java
Add new GET endpoint for direct file downloads                     
+3/-0     
LocalNode.java
Implement direct file streaming and URL parsing logic       
+77/-16 
RemoteWebDriver.java
Optimize file download to stream directly to disk               
+10/-4   
Contents.java
Add file and stream content suppliers, deprecate memory-intensive
methods
+31/-27 
FileContentSupplier.java
New supplier for streaming file content without loading to memory
+80/-0   
InputStreamContentSupplier.java
New supplier for binary stream content with length tracking
+67/-0   
BytesContentSupplier.java
Extract bytes supplier to separate class for clarity         
+65/-0   
FileBackedOutputStreamContentSupplier.java
New supplier for file-backed output stream content             
+74/-0   
JdkHttpClient.java
Switch to InputStream response handler for large file support
+5/-2     
JdkHttpMessages.java
Handle binary streams separately from JSON responses         
+36/-8   
W3CHttpResponseCodec.java
Add binary stream detection and avoid loading large files to memory
+15/-6   
AbstractHttpCommandCodec.java
Add new GET_DOWNLOADED_FILE command mapping                           
+2/-0     
DriverCommand.java
Add GET_DOWNLOADED_FILE command constant                                 
+1/-0     
NodeStatus.java
Add toString method to prevent debug log spam                       
+6/-0     
Session.java
Add toString method to prevent debug log spam                       
+5/-0     
Urls.java
Add URL decoding utility method                                                   
+8/-2     
RequestConverter.java
Use atomic long for thread-safe length tracking                   
+6/-31   
HttpMessage.java
Add toString and contentAsString methods                                 
+9/-0     
HttpRequest.java
Improve toString to include content information                   
+5/-1     
HttpResponse.java
Improve toString to avoid loading large content                   
+4/-2     
Tests
5 files
NodeTest.java
Add wait conditions for async directory deletion                 
+7/-3     
LocalNewSessionQueueTest.java
Increase timeout to fix flaky test                                             
LocalNodeTest.java
Add wait conditions and test for filename extraction         
+55/-30 
HttpClientTestBase.java
Handle HttpTimeoutException in interrupt detection             
+3/-1     
ReverseProxyEndToEndTest.java
Remove debug logging statement                                                     
+0/-2     
Configuration changes
3 files
BUILD.bazel
Add jspecify dependency                                                                   
+1/-0     
BUILD.bazel
Add guava dependency for media type handling                         
+2/-0     
bazel.yml
Add test logs upload on failure                                                   
+8/-0     
Additional files
2 files
ResponseConverter.java +4/-6     
LocalNewSessionQueueTest.java +1/-1     

@selenium-ci selenium-ci added B-grid Everything grid and server related C-java Java Bindings B-build Includes scripting, bazel and CI integrations labels Nov 22, 2025
@asolntsev asolntsev added the I-performance Something could be faster label Nov 22, 2025
@qodo-code-review
Copy link
Contributor

qodo-code-review bot commented Nov 22, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Path parsing ambiguity

Description: The URL-decoding in extractFileName replaces spaces with '+' (urlDecode(...).replace(' ',
'+')), which can be abused to bypass exact filename matching and access unexpected files
whose real names contain spaces and plus signs, potentially enabling ambiguous or
unintended file access.
LocalNode.java [771-783]

Referred Code
private String extractFileName(HttpRequest req) {
  return extractFileName(req.getUri());
}

String extractFileName(String uri) {
  String prefix = "/se/files/";
  int index = uri.lastIndexOf(prefix);
  if (index < 0) {
    throw new IllegalArgumentException("Unexpected URL for downloading a file: " + uri);
  }
  return urlDecode(uri.substring(index + prefix.length())).replace(' ', '+');
}
Insecure direct object access

Description: The file download endpoint streams arbitrary files from the per-session downloads
directory based solely on client-supplied names without enforcing content-type
restrictions or download safeguards, allowing retrieval of any file present in that
directory (including potentially sensitive artifacts) if an attacker can write to it via
the browser.
LocalNode.java [839-851]

Referred Code
private HttpResponse getDownloadedFile(File downloadsDirectory, String fileName)
    throws IOException {
  if (fileName.isEmpty()) {
    throw new WebDriverException("Please specify file to download in URL");
  }
  File file = findDownloadedFile(downloadsDirectory, fileName);
  BasicFileAttributes attributes = readAttributes(file.toPath(), BasicFileAttributes.class);
  return new HttpResponse()
      .setHeader("Content-Type", MediaType.OCTET_STREAM.toString())
      .setHeader("Content-Length", String.valueOf(attributes.size()))
      .setHeader("Last-Modified", lastModifiedHeader(attributes.lastModifiedTime()))
      .setContent(Contents.file(file));
}
Memory exhaustion risk

Description: contentAsString reads the entire stream into memory if length ≤ 256MB, which could still
be abused to trigger high memory usage by returning large "text" responses near the
threshold; this is risky for untrusted endpoints and contradicts the streaming intent.
InputStreamContentSupplier.java [61-69]

Referred Code
public String contentAsString(Charset charset) {
  if (length > MAX_TEXT_RESPONSE_SIZE) {
    throw new UnsupportedOperationException("Cannot print out too large stream content");
  }
  try {
    return new String(stream.readAllBytes(), UTF_8);
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Info Exposure: Error when file not found exposes absolute downloads directory path and full file list in
the message, which may leak internal system details to the client.

Referred Code
  throw new WebDriverException(
      String.format(
          "Cannot find file [%s] in directory %s. Found %s files: %s.",
          filename, downloadsDirectory.getAbsolutePath(), files.size(), files));
}
if (matchingFiles.size() != 1) {
  throw new WebDriverException(
      String.format(

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Action Logging: New download endpoints and file operations (listing, streaming, deleting) do not include
explicit audit logging of user/session, action, target file, and outcome in the added code
hunks.

Referred Code
          + id
          + " — ensure downloads are enabled in the options class when requesting a session.";
  throw new WebDriverException(msg);
}
File downloadsDirectory =
    Optional.ofNullable(tempFS.getBaseDir().listFiles()).orElse(new File[] {})[0];

try {
  if (req.getMethod().equals(HttpMethod.GET) && req.getUri().endsWith("/se/files")) {
    return listDownloadedFiles(downloadsDirectory);
  }
  if (req.getMethod().equals(HttpMethod.GET)) {
    return getDownloadedFile(downloadsDirectory, extractFileName(req));
  }
  if (req.getMethod().equals(HttpMethod.DELETE)) {
    return deleteDownloadedFile(downloadsDirectory);
  }
  return getDownloadedFile(req, downloadsDirectory);
} catch (IOException e) {
  throw new UncheckedIOException(e);
}


 ... (clipped 117 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Edge Cases: Streaming download path sets Content-Length from attributes but may not handle missing
length or concurrent file changes, and GET without file name relies on URL parsing that
throws IllegalArgumentException without standardized error response mapping.

Referred Code
    throws IOException {
  if (fileName.isEmpty()) {
    throw new WebDriverException("Please specify file to download in URL");
  }
  File file = findDownloadedFile(downloadsDirectory, fileName);
  BasicFileAttributes attributes = readAttributes(file.toPath(), BasicFileAttributes.class);
  return new HttpResponse()
      .setHeader("Content-Type", MediaType.OCTET_STREAM.toString())
      .setHeader("Content-Length", String.valueOf(attributes.size()))
      .setHeader("Last-Modified", lastModifiedHeader(attributes.lastModifiedTime()))
      .setContent(Contents.file(file));
}

private String lastModifiedHeader(FileTime fileTime) {
  return HTTP_DATE_FORMAT.format(fileTime.toInstant().atZone(UTC));
}

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Path Handling: Filename extraction and matching rely on equals against names found in the downloads
directory, but there is no explicit normalization/validation beyond urlDecode to prevent
traversal or special-name attacks, requiring verification of directory confinement.

Referred Code
String extractFileName(String uri) {
  String prefix = "/se/files/";
  int index = uri.lastIndexOf(prefix);
  if (index < 0) {
    throw new IllegalArgumentException("Unexpected URL for downloading a file: " + uri);
  }
  return urlDecode(uri.substring(index + prefix.length())).replace(' ', '+');
}

/** User wants to list files that can be downloaded */
private HttpResponse listDownloadedFiles(File downloadsDirectory) {
  File[] files = Optional.ofNullable(downloadsDirectory.listFiles()).orElse(new File[] {});
  List<String> fileNames = Arrays.stream(files).map(File::getName).collect(Collectors.toList());
  List<DownloadedFile> fileInfos =
      Arrays.stream(files).map(this::getFileInfo).collect(Collectors.toList());

  Map<String, Object> data =
      Map.of(
          "names", fileNames,
          "files", fileInfos);
  Map<String, Map<String, Object>> result = Map.of("value", data);


 ... (clipped 88 lines)

Learn more about managing compliance generic rules or creating your own custom rules

  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link
Contributor

qodo-code-review bot commented Nov 22, 2025

PR Code Suggestions ✨

Latest suggestions up to 9faf649

CategorySuggestion                                                                                                                                    Impact
Possible issue
Make InputStream supplier multi-read safe

Modify InputStreamContentSupplier to buffer the incoming stream, allowing get()
to be called multiple times as required by the Contents.Supplier contract.

java/src/org/openqa/selenium/remote/http/InputStreamContentSupplier.java [26-67]

 class InputStreamContentSupplier implements Contents.Supplier {
 
-  private final InputStream stream;
+  private final com.google.common.io.FileBackedOutputStream buffer;
   private final long length;
+  private volatile boolean closed;
 
   InputStreamContentSupplier(InputStream stream, long length) {
-    this.stream = Require.nonNull("InputStream", stream);
+    Require.nonNull("InputStream", stream);
     this.length = length;
+    try {
+      // Use small memory threshold; spill to disk for large payloads
+      this.buffer = new com.google.common.io.FileBackedOutputStream(3 * 1024 * 1024, true);
+      stream.transferTo(buffer);
+    } catch (IOException e) {
+      throw new UncheckedIOException(e);
+    } finally {
+      try {
+        stream.close();
+      } catch (IOException ignore) {
+      }
+    }
   }
 
   @Override
   public InputStream get() {
-    return stream;
+    if (closed) {
+      throw new IllegalStateException("Contents.Supplier has been closed before");
+    }
+    try {
+      return buffer.asByteSource().openBufferedStream();
+    } catch (IOException e) {
+      throw new UncheckedIOException(e);
+    }
   }
 
   @Override
   public long length() {
+    if (closed) {
+      throw new IllegalStateException("Contents.Supplier has been closed before");
+    }
     return length;
   }
 
+  @Override
   public void close() {
+    closed = true;
     try {
-      stream.close();
+      buffer.reset();
     } catch (IOException ignore) {
     }
   }
 
   @Override
   public String toString() {
     return String.format("Contents.fromStream(%s bytes)", length);
   }
 
   @Override
   public String contentAsString(Charset charset) {
     throw new UnsupportedOperationException("Don't serialize binary stream - it might be large");
   }
 
   @Override
   public Reader reader(Charset charset) {
     throw new UnsupportedOperationException("Don't read binary stream  - it might be large");
   }
 }
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that the new InputStreamContentSupplier violates the Contents.Supplier contract by returning a non-reusable stream, which could lead to bugs. The proposed fix to buffer the stream is appropriate and ensures correctness.

Medium
Safely decode non-text and error responses
Suggestion Impact:The commit updated the logging to include status, content type, and content length, and switched to using a null-safe content type retrieval (requireNonNullElse with getContentType), aligning with the suggestion's direction to rely on content type rather than raw content in logs and logic.

code diff:

     if (LOG.isLoggable(Level.FINER)) {
       LOG.log(
           Level.FINER,
-          "Decoding response. Response code was: {0} and content: {1}",
-          new Object[] {encodedResponse.getStatus(), encodedResponse.getContent()});
+          "Decoding response (status was: {0}, content type: {1}, content length: {2})",
+          new Object[] {
+            encodedResponse.getStatus(),
+            encodedResponse.getContentType(),
+            encodedResponse.getContentLength()
+          });
     }
-    String contentType =
-        Objects.requireNonNullElse(encodedResponse.getHeader(HttpHeader.ContentType.getName()), "");
+    String contentType = requireNonNullElse(encodedResponse.getContentType(), "");
 

Refactor W3CHttpResponseCodec.decode to correctly handle binary content by
keeping it as a stream supplier only when appropriate, and improve error
handling for non-textual error responses to prevent crashes.

java/src/org/openqa/selenium/remote/codec/w3c/W3CHttpResponseCodec.java [75-175]

 @Override
 public Response decode(HttpResponse encodedResponse) {
   if (LOG.isLoggable(Level.FINER)) {
     LOG.log(
         Level.FINER,
         "Decoding response. Response code was: {0} and content: {1}",
-        new Object[] {encodedResponse.getStatus(), encodedResponse.getContent()});
+        new Object[] {encodedResponse.getStatus(), encodedResponse.getHeader(HttpHeader.ContentType.getName())});
   }
   String contentType =
       Objects.requireNonNullElse(encodedResponse.getHeader(HttpHeader.ContentType.getName()), "");
 
   Response response = new Response();
 
-  // Are we dealing with an error?
-  // {"error":"no such alert","message":"No tab modal was open when attempting to get the dialog
-  // text"}
   if (!encodedResponse.isSuccessful()) {
     LOG.fine("Processing an error");
-    String content = encodedResponse.contentAsString().trim();
+    boolean isBinary =
+        contentType.toLowerCase(Locale.ENGLISH).startsWith(MediaType.OCTET_STREAM.toString());
+    String content = isBinary ? "" : encodedResponse.contentAsString().trim();
+
     if (HTTP_BAD_METHOD == encodedResponse.getStatus()) {
       response.setState("unknown command");
       response.setStatus(ErrorCodes.UNKNOWN_COMMAND);
       response.setValue(content);
     } else if (HTTP_GATEWAY_TIMEOUT == encodedResponse.getStatus()
         || HTTP_BAD_GATEWAY == encodedResponse.getStatus()) {
       response.setState("unknown error");
       response.setStatus(ErrorCodes.UNHANDLED_ERROR);
       response.setValue(content);
     } else {
+      if (isBinary || content.isEmpty()) {
+        response.setState("unknown error");
+        response.setStatus(ErrorCodes.UNHANDLED_ERROR);
+        response.setValue(String.format("Non-textual error response (status %s, Content-Type: %s)", encodedResponse.getStatus(), contentType));
+        return response;
+      }
       Map<String, Object> org = json.toType(content, MAP_TYPE);
-...
+      populateResponse(encodedResponse, response, org);
+    }
+    return response;
+  }
+
   response.setState("success");
   response.setStatus(ErrorCodes.SUCCESS);
 
-  if (contentType.startsWith(MediaType.OCTET_STREAM.toString())) {
+  if (contentType.toLowerCase(Locale.ENGLISH).startsWith(MediaType.OCTET_STREAM.toString())) {
+    // Keep as stream supplier so clients can stream large files
     response.setValue(encodedResponse.getContent());
     return response;
   }
 
   String content = encodedResponse.contentAsString().trim();
   if (!content.isEmpty()) {
-    if (contentType.startsWith("application/json")) {
+    if (contentType.toLowerCase(Locale.ENGLISH).startsWith("application/json")) {
       Map<String, Object> parsed = json.toType(content, MAP_TYPE);
       if (parsed.containsKey("value")) {
+        populateResponse(encodedResponse, response, parsed);
+      } else {
+        response.setValue(parsed);
+      }
+    } else {
+      response.setValue(content);
+    }
+  }
+  return response;
+}
 
+private void populateResponse(HttpResponse encodedResponse, Response response, Map<String, Object> map) {
+  if (map.containsKey("sessionId")) {
+    Object sid = map.get("sessionId");
+    if (sid != null) {
+      response.setSessionId(String.valueOf(sid));
+    }
+  }
+  response.setValue(map.get("value"));
+}
+

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies that setting a Contents.Supplier as a Response value is incorrect for JSON-based clients and that error handling for binary content is fragile. The proposed changes improve the robustness of response decoding, especially for binary streams and error conditions.

Medium
General
Avoid logging binary response bodies

Update HttpResponse.toString() to avoid materializing binary content by checking
the Content-Type header and providing a summary for binary data instead.

java/src/org/openqa/selenium/remote/http/HttpResponse.java [61-66]

-public class HttpResponse extends HttpMessage<HttpResponse> {
+@Override
+public String toString() {
+  String contentType = getHeader(HttpHeader.ContentType.getName());
+  boolean looksText =
+      contentType == null
+          || contentType.toLowerCase(Locale.ENGLISH).startsWith("text/")
+          || contentType.toLowerCase(Locale.ENGLISH).contains("json")
+          || contentType.toLowerCase(Locale.ENGLISH).contains("xml");
 
-  public static final String HTTP_TARGET_HOST = "http.target.host";
-
-  private int status = HTTP_OK;
-
-  public boolean isSuccessful() {
-    return getStatus() >= HTTP_OK && getStatus() < 300;
+  if (!looksText) {
+    return String.format("%s (binary content, %d bytes)", getStatus(), getContent().length());
   }
 
-  public int getStatus() {
-    return status;
-  }
-
-  public HttpResponse setStatus(int status) {
-    this.status = status;
-    return this;
-  }
-...
-  @Override
-  public String toString() {
-    String content = super.toString();
-    return content.isEmpty()
-        ? String.format("%s", getStatus())
-        : String.format("%s: %s", getStatus(), content);
-  }
+  String content = super.toString();
+  return content.isEmpty()
+      ? String.format("%s", getStatus())
+      : String.format("%s: %s", getStatus(), content);
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

__

Why: The suggestion correctly points out that the new toString() implementation can cause issues with large binary content, which is a valid concern given the PR's focus on file streaming. The proposed improvement enhances robustness and prevents potential performance problems during logging.

Low
Learned
best practice
Respect supplier-declared content length

Use the declared content length from the supplier when available to set
Content-Length for non-chunked responses; only fall back to chunked transfer
when length is unknown.

java/src/org/openqa/selenium/netty/server/ResponseConverter.java [62-92]

-private static final int CHUNK_SIZE = 1024 * 1024;
-private static final ThreadLocal<byte[]> CHUNK_CACHE =
-    ThreadLocal.withInitial(() -> new byte[CHUNK_SIZE]);
-...
 byte[] buffer = CHUNK_CACHE.get();
-InputStream is = seResponse.getContent().get();
+org.openqa.selenium.remote.http.Contents.Supplier supplier = seResponse.getContent();
+InputStream is = supplier.get();
 int byteCount = is.readNBytes(buffer, 0, buffer.length);
+long declaredLength = supplier.length();
 
 DefaultHttpResponse first;
-if (byteCount < CHUNK_SIZE) {
+if (declaredLength >= 0 && declaredLength <= CHUNK_SIZE && declaredLength == byteCount) {
+  is.close();
+  first =
+      new DefaultFullHttpResponse(
+          HTTP_1_1,
+          HttpResponseStatus.valueOf(seResponse.getStatus()),
+          Unpooled.copiedBuffer(buffer, 0, byteCount));
+  first.headers().addInt(CONTENT_LENGTH, (int) declaredLength);
+  copyHeaders(seResponse, first);
+  ctx.write(first);
+} else if (byteCount < CHUNK_SIZE && declaredLength >= 0 && declaredLength == byteCount) {
   is.close();
   first =
       new DefaultFullHttpResponse(
           HTTP_1_1,
           HttpResponseStatus.valueOf(seResponse.getStatus()),
           Unpooled.copiedBuffer(buffer, 0, byteCount));
   first.headers().addInt(CONTENT_LENGTH, byteCount);
   copyHeaders(seResponse, first);
   ctx.write(first);
 } else {
   first = new DefaultHttpResponse(HTTP_1_1, HttpResponseStatus.valueOf(seResponse.getStatus()));
   first.headers().set(TRANSFER_ENCODING, CHUNKED);
   copyHeaders(seResponse, first);
   ctx.write(first);
 
-  // We need to write the first response.
   ctx.write(new DefaultHttpContent(Unpooled.copiedBuffer(buffer)));
 
   HttpChunkedInput writer = new HttpChunkedInput(new ChunkedStream(is, CHUNK_SIZE));
   ChannelFuture future = ctx.write(writer);
-  future.addListener(
-      ignored -> {
-        is.close();
-        ctx.flush();
-      });
+  future.addListener(ignored -> { is.close(); ctx.flush(); });
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

__

Why:
Relevant best practice - Guard external I/O with accurate content-length handling to avoid corrupt or truncated responses.

Low
  • Update

Previous suggestions

Suggestions up to commit e993ed2
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix incorrect filename extraction logic

Remove the incorrect .replace(' ', '+') from the extractFileName method to
correctly handle filenames containing spaces.

java/src/org/openqa/selenium/grid/node/local/LocalNode.java [775-782]

 String extractFileName(String uri) {
   String prefix = "/se/files/";
   int index = uri.lastIndexOf(prefix);
   if (index < 0) {
     throw new IllegalArgumentException("Unexpected URL for downloading a file: " + uri);
   }
-  return urlDecode(uri.substring(index + prefix.length())).replace(' ', '+');
+  return urlDecode(uri.substring(index + prefix.length()));
 }
Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a bug in the new extractFileName method where spaces in filenames are incorrectly converted to plus signs, which would cause file downloads to fail for such files.

High
Allow multiple stream reads from file

Update FileContentSupplier.get() to return a new InputStream on each call to
adhere to the Contents.Supplier contract, removing the single-use restriction.

java/src/org/openqa/selenium/remote/http/FileContentSupplier.java [38-50]

 @Override
 public synchronized InputStream get() {
-  if (inputStream != null) {
-    throw new IllegalStateException("File input stream has been opened before");
-  }
   try {
-    inputStream = Files.newInputStream(file.toPath());
+    return Files.newInputStream(file.toPath());
   } catch (IOException e) {
     throw new IllegalStateException("File not readable: " + file.getAbsolutePath(), e);
   }
-
-  return inputStream;
 }
Suggestion importance[1-10]: 8

__

Why: This suggestion correctly identifies that the get() method in the new FileContentSupplier class violates the Contents.Supplier contract by not allowing multiple invocations, which could cause issues with features like HTTP retries.

Medium
Overwrite existing files on download

Modify the downloadFile method to use StandardCopyOption.REPLACE_EXISTING with
Files.copy to allow overwriting existing files.

java/src/org/openqa/selenium/remote/RemoteWebDriver.java [729-739]

 @Override
 public void downloadFile(String fileName, Path targetLocation) throws IOException {
   requireDownloadsEnabled(capabilities);
 
   Response response = execute(DriverCommand.GET_DOWNLOADED_FILE, Map.of("name", fileName));
 
   Contents.Supplier content = (Contents.Supplier) response.getValue();
   try (InputStream fileContent = content.get()) {
-    Files.copy(new BufferedInputStream(fileContent), targetLocation.resolve(fileName));
+    Files.copy(
+        new BufferedInputStream(fileContent),
+        targetLocation.resolve(fileName),
+        StandardCopyOption.REPLACE_EXISTING);
   }
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that Files.copy will fail if the target file exists and proposes adding StandardCopyOption.REPLACE_EXISTING to fix it, which is a valid and useful improvement.

Medium
Learned
best practice
Validate filename to prevent traversal

Validate the extracted name against path traversal and reject names containing
separators or parent segments; avoid mutating decoded names unexpectedly.

java/src/org/openqa/selenium/grid/node/local/LocalNode.java [775-851]

 String extractFileName(String uri) {
   String prefix = "/se/files/";
   int index = uri.lastIndexOf(prefix);
   if (index < 0) {
     throw new IllegalArgumentException("Unexpected URL for downloading a file: " + uri);
   }
-  return urlDecode(uri.substring(index + prefix.length())).replace(' ', '+');
+  String decoded = urlDecode(uri.substring(index + prefix.length()));
+  // reject path traversal or nested paths
+  if (decoded.contains("/") || decoded.contains("\\") || decoded.contains("..")) {
+    throw new IllegalArgumentException("Invalid file name");
+  }
+  return decoded;
 }
 
 private HttpResponse getDownloadedFile(File downloadsDirectory, String fileName)
     throws IOException {
-  if (fileName.isEmpty()) {
+  if (fileName == null || fileName.isBlank()) {
     throw new WebDriverException("Please specify file to download in URL");
   }
+  // rest unchanged
   File file = findDownloadedFile(downloadsDirectory, fileName);
   BasicFileAttributes attributes = readAttributes(file.toPath(), BasicFileAttributes.class);
   return new HttpResponse()
       .setHeader("Content-Type", MediaType.OCTET_STREAM.toString())
       .setHeader("Content-Length", String.valueOf(attributes.size()))
       .setHeader("Last-Modified", lastModifiedHeader(attributes.lastModifiedTime()))
       .setContent(Contents.file(file));
 }
Suggestion importance[1-10]: 6

__

Why:
Relevant best practice - Guard external I/O and user-supplied inputs with validation to avoid crashes or security issues; validate and sanitize path-derived file names and method-specific routes.

Low

@asolntsev asolntsev self-assigned this Nov 22, 2025
@asolntsev asolntsev force-pushed the 16612-download-large-files branch 2 times, most recently from f42d85e to 2c190cc Compare November 23, 2025 12:26
@asolntsev asolntsev marked this pull request as draft November 23, 2025 12:26
@diemol
Copy link
Member

diemol commented Nov 24, 2025

Please ping when this is ready for review.

@asolntsev
Copy link
Contributor Author

Please ping when this is ready for review.

Sure. I think that the PR is already finished, but there are some tests failing.
Now I am trying to understand if these tests are just flaky, or broken by my changes...

@asolntsev asolntsev force-pushed the 16612-download-large-files branch 4 times, most recently from 1a93cbc to 9faf649 Compare November 25, 2025 21:13
@asolntsev asolntsev marked this pull request as ready for review November 25, 2025 21:52
@qodo-code-review
Copy link
Contributor

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Download Access Logging: New direct file download endpoints and code paths do not include explicit audit logging of
who downloaded which file and when, which may be required for audit trails.

Referred Code
          + id
          + " — ensure downloads are enabled in the options class when requesting a session.";
  throw new WebDriverException(msg);
}
File downloadsDirectory =
    Optional.ofNullable(tempFS.getBaseDir().listFiles()).orElse(new File[] {})[0];

try {
  if (req.getMethod().equals(HttpMethod.GET) && req.getUri().endsWith("/se/files")) {
    return listDownloadedFiles(downloadsDirectory);
  }
  if (req.getMethod().equals(HttpMethod.GET)) {
    return getDownloadedFile(downloadsDirectory, extractFileName(req));
  }
  if (req.getMethod().equals(HttpMethod.DELETE)) {
    return deleteDownloadedFile(downloadsDirectory);
  }
  return getDownloadedFile(req, downloadsDirectory);
} catch (IOException e) {
  throw new UncheckedIOException(e);
}


 ... (clipped 84 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Verbose Error Details: Error messages for missing or duplicate files include full directory paths and file
listings which could expose internal details if surfaced to end users.

Referred Code
            () -> new File[0]));
if (matchingFiles.isEmpty()) {
  List<File> files = downloadedFiles(downloadsDirectory);
  throw new WebDriverException(
      String.format(
          "Cannot find file [%s] in directory %s. Found %s files: %s.",
          filename, downloadsDirectory.getAbsolutePath(), files.size(), files));
}
if (matchingFiles.size() != 1) {
  throw new WebDriverException(
      String.format(
          "Expected there to be only 1 file. Found %s files: %s.",
          matchingFiles.size(), matchingFiles));
}
return matchingFiles.get(0);

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Path Traversal Risk: The new file download by name uses URL-decoded fileName without explicit sanitization
against path traversal, relying on exact name match which may be safe but warrants
verification.

Referred Code
private String extractFileName(HttpRequest req) {
  return extractFileName(req.getUri());
}

String extractFileName(String uri) {
  String prefix = "/se/files/";
  int index = uri.lastIndexOf(prefix);
  if (index < 0) {
    throw new IllegalArgumentException("Unexpected URL for downloading a file: " + uri);
  }
  return urlDecode(uri.substring(index + prefix.length())).replace(' ', '+');
}

/** User wants to list files that can be downloaded */
private HttpResponse listDownloadedFiles(File downloadsDirectory) {
  File[] files = Optional.ofNullable(downloadsDirectory.listFiles()).orElse(new File[] {});
  List<String> fileNames = Arrays.stream(files).map(File::getName).collect(Collectors.toList());
  List<DownloadedFile> fileInfos =
      Arrays.stream(files).map(this::getFileInfo).collect(Collectors.toList());

  Map<String, Object> data =


 ... (clipped 92 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix incorrect space encoding in filenames

Remove the incorrect .replace(' ', '+') from extractFileName as urlDecode
already handles space conversion.

java/src/org/openqa/selenium/grid/node/local/LocalNode.java [775-782]

 String extractFileName(String uri) {
   String prefix = "/se/files/";
   int index = uri.lastIndexOf(prefix);
   if (index < 0) {
     throw new IllegalArgumentException("Unexpected URL for downloading a file: " + uri);
   }
-  return urlDecode(uri.substring(index + prefix.length())).replace(' ', '+');
+  return urlDecode(uri.substring(index + prefix.length()));
 }
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a bug in extractFileName where spaces in filenames are incorrectly converted to +, which would cause file lookup to fail for names containing spaces.

Medium
Correctly handle file and directory paths

In downloadFile, check if targetLocation is a directory before resolving the
filename to correctly handle both file and directory paths as the destination.

java/src/org/openqa/selenium/remote/RemoteWebDriver.java [733-738]

 Response response = execute(DriverCommand.GET_DOWNLOADED_FILE, Map.of("name", fileName));
 
 Contents.Supplier content = (Contents.Supplier) response.getValue();
+Path destination =
+    Files.isDirectory(targetLocation) ? targetLocation.resolve(fileName) : targetLocation;
 try (InputStream fileContent = content.get()) {
-  Files.copy(new BufferedInputStream(fileContent), targetLocation.resolve(fileName));
+  Files.copy(new BufferedInputStream(fileContent), destination);
 }
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the PR's implementation of downloadFile does not adhere to its documented contract, as it fails to handle cases where targetLocation is a file path.

Medium
Learned
best practice
Close stream in try-with-resources

Use try-with-resources to ensure the InputStream from the buffer is closed after
copying, preventing potential resource leaks on exceptions.

java/src/org/openqa/selenium/netty/server/FileBackedOutputStreamContentSupplier.java [64-73]

 @Override
 public String contentAsString(Charset charset) {
-  ByteArrayOutputStream out = new ByteArrayOutputStream();
-  try {
-    buffer.asByteSource().copyTo(out);
+  try (InputStream in = buffer.asByteSource().openBufferedStream();
+       ByteArrayOutputStream out = new ByteArrayOutputStream()) {
+    in.transferTo(out);
+    return out.toString(charset);
   } catch (IOException e) {
-    throw new RuntimeException(e);
+    throw new UncheckedIOException(e);
   }
-  return out.toString(charset);
 }
  • Apply / Chat
Suggestion importance[1-10]: 6

__

Why:
Relevant best practice - Ensure resource cleanup by closing or disposing I/O resources in a finally/try-with-resources path, even on exceptions.

Low
  • More

@asolntsev
Copy link
Contributor Author

Please ping when this is ready for review.

@diemol Ping. This PR is ready for review, as well as few others.

Copy link
Member

@diemol diemol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the 2nd and 3rd code review suggestions are valid. What do you think?

@asolntsev
Copy link
Contributor Author

asolntsev commented Nov 27, 2025

I think the 2nd and 3rd code review suggestions are valid. What do you think?

I was thinking about it.

  1. "Make InputStream supplier multi-read safe"

The whole idea is to allow quickly downloading large files from Grid without wasting CPU, I/O or memory.
Currently it's used only in method RemoteWebDriver.download(String fileName, Path targetLocation). It saves the files directly to the requested target location.

If we apply the suggested solution, we would also save the same file to a temporary location, thus wasting disk space and slowing down the download process. I don't see the need for it.

  1. "Safely decode non-text and error responses"

In theory, we could do it. But again, I don't see the need for it because error responses are always written in JSON format.

  1. "Avoid logging binary response bodies"

It's already done so: the binary body is not written in HttpResponse.toString().

  1. "Respect supplier-declared content length"

Maybe it's reasonable, but this is the code I didn't touch. I prefer not to change it in this PR to avoid too big changes.

@asolntsev asolntsev force-pushed the 16612-download-large-files branch from 38cbd2e to 1b5d56f Compare November 27, 2025 17:18
@asolntsev asolntsev requested a review from diemol November 27, 2025 17:20
@asolntsev asolntsev force-pushed the 16612-download-large-files branch 2 times, most recently from 37fe5e3 to 9ad8f5d Compare November 27, 2025 17:27
@diemol
Copy link
Member

diemol commented Nov 27, 2025

Can you please resolve the conflict?

I added a new Grid endpoint "/se/files/:name" which allows downloading the file directly, without encoding it to Base64 and adding to Json. This transformation kills the performance and causes OutOfMemory errors for large files (e.g. 256+ MB).

NB! Be sure that `toString()` method of objects (HttpRequest, HttpResponse, Contents.Supplier) never returns too long string - it spam debug logs and can cause OOM during debugging.
…ier` to separate classes

It makes debugging easier. You can easily see what instances they are and where they come from.
Instead of reading the whole file to a byte array, just save given InputStream directly to the file.

Now it can download large files (I tried 4GB) while consuming very low memory.
… downloading files.

For json responses, still return `Contents.bytes` which allows re-reading its content multiple times. Just in case.
… deleted

After stopping a Grid node, the folder is deleted asynchronously (by cache removal listener). So we need to wait for it in test.
…opped

At least on my machine, stopping the node takes some time, and any checks right after `node.stop(sessionId)` often can fail.
Gr... This is extremely hard to debug test.

After hours of debugging, I came to a conclusion that we just need to increase the timeout. On my machine, `latch` gets decreased after ~1.2 seconds. So 1 second was not enough.
I don't know why, but sometimes we receive `HttpTimeoutException` instead of `InterruptedException`. Seems reasonable to consider execution as interrupted in both cases. (?)
None of `is.readNBytes` implementations returns -1. It always returns 0 or positive number.
…code`

Don't log the entire response body - just content type and length.
@asolntsev asolntsev force-pushed the 16612-download-large-files branch from 9ad8f5d to 58140f5 Compare November 27, 2025 21:35
@diemol diemol merged commit c5bb3c9 into SeleniumHQ:trunk Nov 28, 2025
38 checks passed
@asolntsev asolntsev deleted the 16612-download-large-files branch November 28, 2025 13:51
@ilya-corp
Copy link

Great work, thanks for everyone!

asolntsev added a commit to selenide/selenide that referenced this pull request Dec 12, 2025
Use file modification time for detecting if the downloading is completed.

* File size + time was added to Grid API in Selenium 4.39.0: SeleniumHQ/selenium#16589
* Also, downloading large files was significantly optimized in Selenium 4.39.0: SeleniumHQ/selenium#16627
asolntsev added a commit to selenide/selenide that referenced this pull request Dec 12, 2025
Use file modification time for detecting if the downloading is completed.

* File size + time was added to Grid API in Selenium 4.39.0: SeleniumHQ/selenium#16589
* Also, downloading large files was significantly optimized in Selenium 4.39.0: SeleniumHQ/selenium#16627
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

B-build Includes scripting, bazel and CI integrations B-grid Everything grid and server related C-java Java Bindings I-performance Something could be faster Review effort 3/5

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[🐛 Bug]: Critically slow file transfer from Node to host for large files

4 participants