feat: add tier-1 platform support for change_time rust-lang/rust#128256
@tgross35 unfortunately there are still cases where None would be returned, but I updated the documentation to clarify when that is the case
Note: Currently the doc-url I link to that explains ChangeTime is dev blog. I have a PR out to update the official docs so that it reflects the information from there MicrosoftDocs/sdk-api#1863
library/std/src/os/windows/fs.rs
| 478 | /// This will return `None` if the `Metadata` instance was not created using the `FILE_BASIC_INFO` type. |
|
| 475 | /// Returns the value of the `ChangeTime{Low,High}` field from the |
|
| 476 | /// [`FILE_BASIC_INFO`] struct associated with the current file handle. |
|
| 477 | /// [`ChangeTime`], is the last time file metadata was changed, such as |
| /// [`ChangeTime`], is the last time file metadata was changed, such as | |
| /// [`ChangeTime`] is the last time file metadata was changed, such as |
Spurious comma
library/std/src/os/windows/fs.rs
| 486 | /// [`WIN32_FIND_DATA`]: https://learn.microsoft.com/windows/win32/api/minwinbase/ns-minwinbase-win32_find_dataw |
|
| 487 | /// [`FindFirstFile`]: https://learn.microsoft.com/windows/win32/api/fileapi/nf-fileapi-findfirstfilea |
|
| 488 | /// [`FindNextFile`]: https://learn.microsoft.com/windows/win32/api/fileapi/nf-fileapi-findnextfilea |
|
| 489 | /// [`ChangeTime`]: https://devblogs.microsoft.com/oldnewthing/20100709-00/?p=13463#:~:text=I%E2%80%99m%20told%20that%20the%20difference%20is%20metadata.%20The,attributes%20%28hidden%2C%20read-only%2C%20etc.%29%20or%20renaming%20the%20file. |
Looks like this has some highlighting, https://devblogs.microsoft.com/oldnewthing/20100709-00/?p=13463 seems to work
library/std/src/sys/pal/windows/fs.rs
| 375 | 375 | } |
| 376 | |
|
| 377 | // FILE_BASIC_INFO contains additional fields not returned by GetFileInformationByHandle |
|
| 378 | let basic_info = self.basic_info()?; |
I don't know what the failure conditions are here, is accessing FILE_BASIC_INFO expected to never fail? Looks like uwp already does this.
If failures are possible, you can just do self.basic_info().ok().map(|info| c::FILETIME { ... }).
Cool, thanks! The above looks reasonable to me with the possible failure concern addressed.
Since this works on all platforms, is a test possible now?
Hm, this adds an extra sys call for every use of metadata? I'd rather avoid that, especially as change time is more niche. Multiple sys calls were done for UWP targets out of necessity but for our tier 1 platforms we do try to be performant.
Per usual with Windows
r? @ChrisDenton
Hm, this adds an extra sys call for every use of
metadata?
Yeah, I thought about that over the weekend, going to do a bit more tinkering to see if I can find a way to avoid an extra call.
Alternatively, how do you feel about an extended_metadata() api so that the perf cost is opt-in?
Okay wow this was a trip to research. So apparently cpython ran into a similar issue which eventually led windows to add this api:
https://learn.microsoft.com/windows/win32/api/winbase/nf-winbase-getfileinformationbyname
This API can return an information class type that returns a struct called FILE_STAT_BASIC_INFORMATION. This struct has all of the previous information, plus the ChangeTime I'm trying to get to.
Here's the real cool part. According to documentation, this api doesn't open the file in order to return metadata, which means that it should actually improve performance along this code path since GetFileInformationByHandle/GetFileInformationByHandleEx actually need to open/close a file handle.
This routine returns the requested information about the file specified by FileName. It does so without opening and returning a handle to the file.
Also, the struct includes the reparse tag if set, meaning the current additional syscall can be avoided when the reparse attribute is set.
For now, I'll try it out and see if I can get it working and I'll update the PR accordingly.
So slightly blocked by this PR getting merged microsoft/win32metadata#1934 in order for bindings.txt/windows-bindgen to pick up the new API. (https://github.com/microsoft/win32metadata/blob/3b275fdb05e7dbab989c00b8a86e985853c6b65e/generation/WinSDK/RecompiledIdlHeaders/um/WinBase.h#L9394)
However, I confirmed that the API already exists in winbase.h, by just running my own bindgen, which means I could at least test the functionality w/ this Draft PR.
Additional prior art -- libuv/libuv#4327
We can also look at getting this API in earlier than microsoft/win32metadata#1934, will file a separate issue to track that.
I've been looking into it using this. One concern is how new the API is. The vast majority of users will not have it and the API docs are still displaying a stability warning (though given Python's unusually early use of the API, I'm not sure how seriously to take that). We could fallback to NtQueryInformationByName which is supported since 1709 (for FILE_STAT_INFORMATION). We do tend to avoid NT APIs in std unless necessary but using it as fallback should be fine. Though even this still leaves Windows Server 2016 out in the cold.
Another issue is that not all filesystem drivers support it. Notably fat32 but also third party drivers or (worse) things that hook the filesystem APIs. Maybe using a FILE_BASIC_INFO fallback there would not be as bad because performance in such cases is already going to be affected.
To be clear, I'm not against it at all. Just that there's enough question that I'd rather think/ask/test a bit more.
In any case, there's no reason to block fixing the links. Could you either update this PR to just do the link changes or open a new PR for that? And separately open an issue for using GetFileInfromationByName? I don't mind if you also want to experimentally implement it in a PR but I'd rather not tie it to the link changes.
Right, GetFileInformationByName(..., FileStatBasicByNameInfo, ...) doesn't succeed on FAT32 volumes.
@ChrisDenton - Created a new PR for the links, #130168, I'll start a new issue for this particular thing
☔ The latest upstream changes (presumably #130253) made this pull request unmergeable. Please resolve the merge conflicts.
@juliusl what's the status of this? thanks
@Dylan-DPC hey there, was waiting on the windows build that includes this API to get to retail, I'll need to circle back to see what the status is but I think it's either released or is close to releasing.
@juliusl any further updates on this? thanks
@juliusl it's been a while, any updates on this pr? or is there anything that's blocking it? thanks
@Dylan-DPC hey there, sorry for the late reply. Last I was waiting on was a new version of windows to propagate. I can check to see where that's at currently.
@Dylan-DPC Okay the good news is that the API we've been looking for should be pretty common. I continued my investigation and there are a couple of things to consider before I think we can move forward.
The main issue that caused us to pause this PR for tier-1 was the concern @ChrisDenton brought up regarding the extra syscall in sys::fs::File::file_attr. Which is where the idea of using GetFileInformationByName came from. Since this API is now available I was able to benchmark compare the performance of several different solutions.
- Current implementation: Open + GetFileInformationByHandle
windows_change_timeimplementation: Open + GetFileInformationByHandle + GetFileInformationByHandleEx (basic_info)- GetFileInformationByName - This would mean that only Path::metadata(..) would be able to get
change_timeasFile::file_attris being called from an opened file-handle. It also means that calls to stat(..) would benefit from the fast path on all non-FAT32 regular files. - I also tested what would happen if we used NtQueryInformationFile to populate FileAttrs. However, since there isn't a info struct that completely satisfies FileAttrs, it would mean at least 2 calls would always be used.
For now, I'll discuss just 1) 2) 4) since 3) is sort of a whole thing of it's own.
Regular files
stat (follow symlinks)
| Benchmark | Syscall sequence | ns/iter | ± | vs baseline |
|---|---|---|---|---|
file_open_and_nt_query_stat_info |
CreateFileW → NtQueryInformationFile(FileStatInformation) → NtQueryInformationFile(FileIdInformation) | 74,928 | 8,563 | -0.6% |
file_open_and_get_info_by_handle |
CreateFileW → GetFileInformationByHandle | 75,382 | 9,799 | baseline |
file_open_and_get_info_by_handle_plus_basic_info |
CreateFileW → GetFileInformationByHandle → GetFileInformationByHandleEx(BasicInfo) | 77,026 | 7,586 | +2.2% |
lstat (do not follow symlinks)
| Benchmark | Syscall sequence | ns/iter | ± | vs baseline |
|---|---|---|---|---|
file_lstat_open_and_get_info_by_handle |
CreateFileW(OPEN_REPARSE_POINT) → GetFileInformationByHandle | 73,317 | 5,846 | baseline |
file_lstat_open_and_nt_query_stat_info |
CreateFileW(OPEN_REPARSE_POINT) → NtQueryInformationFile(FileStatInformation) → NtQueryInformationFile(FileIdInformation) | 75,959 | 6,358 | +3.6% |
file_lstat_open_and_get_info_by_handle_plus_basic_info |
CreateFileW(OPEN_REPARSE_POINT) → GetFileInformationByHandle → GetFileInformationByHandleEx(BasicInfo) | 78,138 | 9,010 | +6.6% |
Junctions (reparse points)
stat (follow symlinks)
| Benchmark | Syscall sequence | ns/iter | ± | vs baseline |
|---|---|---|---|---|
junction_stat_open_and_get_info_by_handle |
CreateFileW (follows junction) → GetFileInformationByHandle → GetFileInformationByHandleEx(FileAttributeTagInfo) | 90,427 | 8,581 | baseline |
junction_stat_open_and_nt_query_stat_info |
CreateFileW (follows junction) → NtQueryInformationFile(FileStatInformation) → NtQueryInformationFile(FileIdInformation) | 91,684 | 12,094 | +1.4% |
junction_stat_open_and_get_info_by_handle_plus_basic_info |
CreateFileW (follows junction) → GetFileInformationByHandle → GetFileInformationByHandleEx(FileAttributeTagInfo) → GetFileInformationByHandleEx(BasicInfo) | 94,645 | 14,196 | +4.7% |
lstat (do not follow symlinks)
| Benchmark | Syscall sequence | ns/iter | ± |
|---|---|---|---|
junction_lstat_open_and_get_info_by_handle |
CreateFileW(OPEN_REPARSE_POINT) → GetFileInformationByHandle → GetFileInformationByHandleEx(FileAttributeTagInfo) | 70,103 | 5,205 |
Devbox Specs:
- CPU: Intel Core i7-8700 @ 3.20GHz (6 cores / 12 threads)
- RAM: 32 GB
Takeaways
- Adding BasicInfo for ChangeTime costs ~2-7% overhead on top of the current approach. This would mean 5µs on a single file, 15-50ms for 10,000 files, 150-500ms for 100,000 files, etc.
- Calling NtQueryInformationFile directly is about the same as the current implementation, except we would get back all required fields and it would remove the conditional logic that the current implementation has today. <- (This was surprising to find out)
--
GetFileInformationByName Results
So this wouldn't be a complete solution, however it would provide change_time on files at the file path level rather than the file handle level (although that's arguably better given what the use cases for change_time are). However, it's important to note the caveats are not being supported by FAT32 and symbolic-links.
| Benchmark | Syscall sequence | ns/iter | ± | vs baseline |
|---|---|---|---|---|
file_get_info_by_name |
GetFileInformationByName | 66,871 | 5,941 | -11.3% ⭐ |
junction_get_info_by_name |
GetFileInformationByName (detects reparse) → CreateFileW → GetFileInformationByHandle → GetFileInformationByHandleEx(FileAttributeTagInfo) | 149,286 | 11,159 | +65.1% |
11% is a pretty nice speed-up for regular files, however in the junction case, since we still need to open the file to get file size we get hit w/ the 65.1% slowdown.
I'll put these results in the dedicated issue I have but just wanted to link the contexts together.
--
Now with an idea of performance, designing the solution is a bit less hand wavy. I have two ideas at the moment:
-
We implement this by keeping the implementation inside of the current metadata function. There will be overhead. From my perspective it's a tiny bit of overhead, but it's overhead.
-
We add a an
_extendedset of api's to bothsys::fs::Fileandfs::Filein order to silo the overhead. For example, we'd add asys::fs::File::extended_file_attrs, and call that fromfs::MetadataExt::extended_metadata. This would mean adding 1 new function to the existing trait. I'm not aware of how the library team would feel about this, but I'm including this idea since it seems simplest.
View all comments
fix: remove
en-usfrom doc links to support globalizationRelated: #121478
r? tgross35