feat: add tier-1 platform support for change_time rust-lang/rust#128256

Open

22 comments and reviews loaded in 1.44s

juliusl Avatar
juliusl on 2024-07-26 23:39:03 UTC · edited
juliusl Avatar
juliusl on 2024-07-26 23:39:03 UTC · edited
View on GitHub

View all comments

fix: remove en-us from doc links to support globalization

Related: #121478
r? tgross35

juliusl Avatar
juliusl on 2024-07-26 23:40:16 UTC
juliusl Avatar
juliusl on 2024-07-26 23:40:16 UTC
View on GitHub

@tgross35 unfortunately there are still cases where None would be returned, but I updated the documentation to clarify when that is the case

juliusl Avatar
juliusl on 2024-07-26 23:41:51 UTC
juliusl Avatar
juliusl on 2024-07-26 23:41:51 UTC
View on GitHub

Note: Currently the doc-url I link to that explains ChangeTime is dev blog. I have a PR out to update the official docs so that it reflects the information from there MicrosoftDocs/sdk-api#1863

tgross35 Avatar
tgross35 commented on 2024-07-27 04:06:05 UTC
library/std/src/os/windows/fs.rs
478 /// This will return `None` if the `Metadata` instance was not created using the `FILE_BASIC_INFO` type.
475 /// Returns the value of the `ChangeTime{Low,High}` field from the
476 /// [`FILE_BASIC_INFO`] struct associated with the current file handle.
477 /// [`ChangeTime`], is the last time file metadata was changed, such as
tgross35 Avatar tgross35 on 2024-07-27 03:57:49 UTC
View on GitHub
Suggested change
/// [`ChangeTime`], is the last time file metadata was changed, such as
/// [`ChangeTime`] is the last time file metadata was changed, such as

Spurious comma

library/std/src/os/windows/fs.rs
486 /// [`WIN32_FIND_DATA`]: https://learn.microsoft.com/windows/win32/api/minwinbase/ns-minwinbase-win32_find_dataw
487 /// [`FindFirstFile`]: https://learn.microsoft.com/windows/win32/api/fileapi/nf-fileapi-findfirstfilea
488 /// [`FindNextFile`]: https://learn.microsoft.com/windows/win32/api/fileapi/nf-fileapi-findnextfilea
489 /// [`ChangeTime`]: https://devblogs.microsoft.com/oldnewthing/20100709-00/?p=13463#:~:text=I%E2%80%99m%20told%20that%20the%20difference%20is%20metadata.%20The,attributes%20%28hidden%2C%20read-only%2C%20etc.%29%20or%20renaming%20the%20file.
tgross35 Avatar tgross35 on 2024-07-27 03:57:51 UTC
View on GitHub

Looks like this has some highlighting, https://devblogs.microsoft.com/oldnewthing/20100709-00/?p=13463 seems to work

library/std/src/sys/pal/windows/fs.rs
375 375 }
376
377 // FILE_BASIC_INFO contains additional fields not returned by GetFileInformationByHandle
378 let basic_info = self.basic_info()?;
tgross35 Avatar tgross35 on 2024-07-27 04:05:57 UTC
View on GitHub

I don't know what the failure conditions are here, is accessing FILE_BASIC_INFO expected to never fail? Looks like uwp already does this.

If failures are possible, you can just do self.basic_info().ok().map(|info| c::FILETIME { ... }).

tgross35 Avatar
tgross35 on 2024-07-27 04:12:29 UTC
tgross35 Avatar
tgross35 on 2024-07-27 04:12:29 UTC
View on GitHub

Cool, thanks! The above looks reasonable to me with the possible failure concern addressed.

Since this works on all platforms, is a test possible now?

ChrisDenton Avatar
ChrisDenton on 2024-07-27 17:56:35 UTC
ChrisDenton Avatar
ChrisDenton on 2024-07-27 17:56:35 UTC
View on GitHub

Hm, this adds an extra sys call for every use of metadata? I'd rather avoid that, especially as change time is more niche. Multiple sys calls were done for UWP targets out of necessity but for our tier 1 platforms we do try to be performant.

tgross35 Avatar
tgross35 on 2024-07-27 18:19:19 UTC
tgross35 Avatar
tgross35 on 2024-07-27 18:19:19 UTC
View on GitHub

Per usual with Windows

r? @ChrisDenton

juliusl Avatar
juliusl on 2024-07-30 21:09:04 UTC · edited
juliusl Avatar
juliusl on 2024-07-30 21:09:04 UTC · edited
View on GitHub

@ChrisDenton

Hm, this adds an extra sys call for every use of metadata?

Yeah, I thought about that over the weekend, going to do a bit more tinkering to see if I can find a way to avoid an extra call.

Alternatively, how do you feel about an extended_metadata() api so that the perf cost is opt-in?

juliusl Avatar
juliusl on 2024-07-31 02:51:30 UTC
juliusl Avatar
juliusl on 2024-07-31 02:51:30 UTC
View on GitHub

Okay wow this was a trip to research. So apparently cpython ran into a similar issue which eventually led windows to add this api:

https://learn.microsoft.com/windows/win32/api/winbase/nf-winbase-getfileinformationbyname

This API can return an information class type that returns a struct called FILE_STAT_BASIC_INFORMATION. This struct has all of the previous information, plus the ChangeTime I'm trying to get to.

Here's the real cool part. According to documentation, this api doesn't open the file in order to return metadata, which means that it should actually improve performance along this code path since GetFileInformationByHandle/GetFileInformationByHandleEx actually need to open/close a file handle.

This routine returns the requested information about the file specified by FileName. It does so without opening and returning a handle to the file.

Also, the struct includes the reparse tag if set, meaning the current additional syscall can be avoided when the reparse attribute is set.

For now, I'll try it out and see if I can get it working and I'll update the PR accordingly.

🚀2
juliusl Avatar
juliusl on 2024-07-31 21:31:41 UTC
juliusl Avatar
juliusl on 2024-07-31 21:31:41 UTC
View on GitHub

So slightly blocked by this PR getting merged microsoft/win32metadata#1934 in order for bindings.txt/windows-bindgen to pick up the new API. (https://github.com/microsoft/win32metadata/blob/3b275fdb05e7dbab989c00b8a86e985853c6b65e/generation/WinSDK/RecompiledIdlHeaders/um/WinBase.h#L9394)

However, I confirmed that the API already exists in winbase.h, by just running my own bindgen, which means I could at least test the functionality w/ this Draft PR.

juliusl Avatar
juliusl on 2024-08-01 00:36:09 UTC
juliusl Avatar
juliusl on 2024-08-01 00:36:09 UTC
View on GitHub

Additional prior art -- libuv/libuv#4327

riverar Avatar
riverar on 2024-08-16 16:24:03 UTC
riverar Avatar
riverar on 2024-08-16 16:24:03 UTC
View on GitHub

We can also look at getting this API in earlier than microsoft/win32metadata#1934, will file a separate issue to track that.

❤️1
ChrisDenton Avatar
ChrisDenton on 2024-09-09 14:31:54 UTC
ChrisDenton Avatar
ChrisDenton on 2024-09-09 14:31:54 UTC
View on GitHub

I've been looking into it using this. One concern is how new the API is. The vast majority of users will not have it and the API docs are still displaying a stability warning (though given Python's unusually early use of the API, I'm not sure how seriously to take that). We could fallback to NtQueryInformationByName which is supported since 1709 (for FILE_STAT_INFORMATION). We do tend to avoid NT APIs in std unless necessary but using it as fallback should be fine. Though even this still leaves Windows Server 2016 out in the cold.

Another issue is that not all filesystem drivers support it. Notably fat32 but also third party drivers or (worse) things that hook the filesystem APIs. Maybe using a FILE_BASIC_INFO fallback there would not be as bad because performance in such cases is already going to be affected.

To be clear, I'm not against it at all. Just that there's enough question that I'd rather think/ask/test a bit more.

👍1
ChrisDenton Avatar
ChrisDenton on 2024-09-09 14:32:07 UTC
ChrisDenton Avatar
ChrisDenton on 2024-09-09 14:32:07 UTC
View on GitHub

In any case, there's no reason to block fixing the links. Could you either update this PR to just do the link changes or open a new PR for that? And separately open an issue for using GetFileInfromationByName? I don't mind if you also want to experimentally implement it in a PR but I'd rather not tie it to the link changes.

👍1
riverar Avatar
riverar on 2024-09-09 17:12:11 UTC · edited
riverar Avatar
riverar on 2024-09-09 17:12:11 UTC · edited
View on GitHub

Right, GetFileInformationByName(..., FileStatBasicByNameInfo, ...) doesn't succeed on FAT32 volumes.

juliusl Avatar
juliusl on 2024-09-09 19:05:07 UTC
juliusl Avatar
juliusl on 2024-09-09 19:05:07 UTC
View on GitHub

@ChrisDenton - Created a new PR for the links, #130168, I'll start a new issue for this particular thing

👍1
bors Avatar
bors on 2024-09-12 02:35:15 UTC
bors Avatar
bors on 2024-09-12 02:35:15 UTC
View on GitHub

☔ The latest upstream changes (presumably #130253) made this pull request unmergeable. Please resolve the merge conflicts.

Dylan-DPC Avatar
Dylan-DPC on 2024-11-04 13:40:55 UTC
Dylan-DPC Avatar
Dylan-DPC on 2024-11-04 13:40:55 UTC
View on GitHub

@juliusl what's the status of this? thanks

juliusl Avatar
juliusl on 2024-11-04 21:56:15 UTC
juliusl Avatar
juliusl on 2024-11-04 21:56:15 UTC
View on GitHub

@Dylan-DPC hey there, was waiting on the windows build that includes this API to get to retail, I'll need to circle back to see what the status is but I think it's either released or is close to releasing.

Dylan-DPC Avatar
Dylan-DPC on 2025-06-29 09:07:33 UTC
Dylan-DPC Avatar
Dylan-DPC on 2025-06-29 09:07:33 UTC
View on GitHub

@juliusl any further updates on this? thanks

Dylan-DPC Avatar
Dylan-DPC on 2026-02-16 18:13:21 UTC
Dylan-DPC Avatar
Dylan-DPC on 2026-02-16 18:13:21 UTC
View on GitHub

@juliusl it's been a while, any updates on this pr? or is there anything that's blocking it? thanks

juliusl Avatar
juliusl on 2026-02-17 20:53:50 UTC
juliusl Avatar
juliusl on 2026-02-17 20:53:50 UTC
View on GitHub

@Dylan-DPC hey there, sorry for the late reply. Last I was waiting on was a new version of windows to propagate. I can check to see where that's at currently.

juliusl Avatar
juliusl on 2026-03-03 05:58:18 UTC
juliusl Avatar
juliusl on 2026-03-03 05:58:18 UTC
View on GitHub

@Dylan-DPC Okay the good news is that the API we've been looking for should be pretty common. I continued my investigation and there are a couple of things to consider before I think we can move forward.

The main issue that caused us to pause this PR for tier-1 was the concern @ChrisDenton brought up regarding the extra syscall in sys::fs::File::file_attr. Which is where the idea of using GetFileInformationByName came from. Since this API is now available I was able to benchmark compare the performance of several different solutions.

  1. Current implementation: Open + GetFileInformationByHandle
  2. windows_change_time implementation: Open + GetFileInformationByHandle + GetFileInformationByHandleEx (basic_info)
  3. GetFileInformationByName - This would mean that only Path::metadata(..) would be able to get change_time as File::file_attr is being called from an opened file-handle. It also means that calls to stat(..) would benefit from the fast path on all non-FAT32 regular files.
  4. I also tested what would happen if we used NtQueryInformationFile to populate FileAttrs. However, since there isn't a info struct that completely satisfies FileAttrs, it would mean at least 2 calls would always be used.

For now, I'll discuss just 1) 2) 4) since 3) is sort of a whole thing of it's own.

Regular files

stat (follow symlinks)

Benchmark Syscall sequence ns/iter ± vs baseline
file_open_and_nt_query_stat_info CreateFileW → NtQueryInformationFile(FileStatInformation) → NtQueryInformationFile(FileIdInformation) 74,928 8,563 -0.6%
file_open_and_get_info_by_handle CreateFileW → GetFileInformationByHandle 75,382 9,799 baseline
file_open_and_get_info_by_handle_plus_basic_info CreateFileW → GetFileInformationByHandle → GetFileInformationByHandleEx(BasicInfo) 77,026 7,586 +2.2%

lstat (do not follow symlinks)

Benchmark Syscall sequence ns/iter ± vs baseline
file_lstat_open_and_get_info_by_handle CreateFileW(OPEN_REPARSE_POINT) → GetFileInformationByHandle 73,317 5,846 baseline
file_lstat_open_and_nt_query_stat_info CreateFileW(OPEN_REPARSE_POINT) → NtQueryInformationFile(FileStatInformation) → NtQueryInformationFile(FileIdInformation) 75,959 6,358 +3.6%
file_lstat_open_and_get_info_by_handle_plus_basic_info CreateFileW(OPEN_REPARSE_POINT) → GetFileInformationByHandle → GetFileInformationByHandleEx(BasicInfo) 78,138 9,010 +6.6%

Junctions (reparse points)

stat (follow symlinks)

Benchmark Syscall sequence ns/iter ± vs baseline
junction_stat_open_and_get_info_by_handle CreateFileW (follows junction) → GetFileInformationByHandle → GetFileInformationByHandleEx(FileAttributeTagInfo) 90,427 8,581 baseline
junction_stat_open_and_nt_query_stat_info CreateFileW (follows junction) → NtQueryInformationFile(FileStatInformation) → NtQueryInformationFile(FileIdInformation) 91,684 12,094 +1.4%
junction_stat_open_and_get_info_by_handle_plus_basic_info CreateFileW (follows junction) → GetFileInformationByHandle → GetFileInformationByHandleEx(FileAttributeTagInfo) → GetFileInformationByHandleEx(BasicInfo) 94,645 14,196 +4.7%

lstat (do not follow symlinks)

Benchmark Syscall sequence ns/iter ±
junction_lstat_open_and_get_info_by_handle CreateFileW(OPEN_REPARSE_POINT) → GetFileInformationByHandle → GetFileInformationByHandleEx(FileAttributeTagInfo) 70,103 5,205

Devbox Specs:

  • CPU: Intel Core i7-8700 @ 3.20GHz (6 cores / 12 threads)
  • RAM: 32 GB

Takeaways

  • Adding BasicInfo for ChangeTime costs ~2-7% overhead on top of the current approach. This would mean 5µs on a single file, 15-50ms for 10,000 files, 150-500ms for 100,000 files, etc.
  • Calling NtQueryInformationFile directly is about the same as the current implementation, except we would get back all required fields and it would remove the conditional logic that the current implementation has today. <- (This was surprising to find out)

--

GetFileInformationByName Results

So this wouldn't be a complete solution, however it would provide change_time on files at the file path level rather than the file handle level (although that's arguably better given what the use cases for change_time are). However, it's important to note the caveats are not being supported by FAT32 and symbolic-links.

Benchmark Syscall sequence ns/iter ± vs baseline
file_get_info_by_name GetFileInformationByName 66,871 5,941 -11.3%
junction_get_info_by_name GetFileInformationByName (detects reparse) → CreateFileW → GetFileInformationByHandle → GetFileInformationByHandleEx(FileAttributeTagInfo) 149,286 11,159 +65.1%

11% is a pretty nice speed-up for regular files, however in the junction case, since we still need to open the file to get file size we get hit w/ the 65.1% slowdown.

I'll put these results in the dedicated issue I have but just wanted to link the contexts together.

--

Now with an idea of performance, designing the solution is a bit less hand wavy. I have two ideas at the moment:

  1. We implement this by keeping the implementation inside of the current metadata function. There will be overhead. From my perspective it's a tiny bit of overhead, but it's overhead.

  2. We add a an _extended set of api's to both sys::fs::File and fs::File in order to silo the overhead. For example, we'd add a sys::fs::File::extended_file_attrs, and call that from fs::MetadataExt::extended_metadata. This would mean adding 1 new function to the existing trait. I'm not aware of how the library team would feel about this, but I'm including this idea since it seems simplest.