-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Improve DateTime{Offset}.ParseExact{Multiple} performance for RFC1123 ("r") #18771
Conversation
|
@tarekgh or @jkotas, do you have any familiarity with the issue cited at: coreclr/src/System.Private.CoreLib/shared/System/Globalization/DateTimeParse.cs Lines 4220 to 4230 in dc8ee8c
and whether that's something I need to care about here? |
| { | ||
| uint space = source[25], g = source[26], m = source[27], t = source[28]; | ||
|
|
||
| if ((space | g | m | t) > 0x7F || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this measurably faster than 4 character equality comparisons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. Simplified.
| return false; | ||
| } | ||
|
|
||
| uint dowString = (dow0 << 24) | (dow1 << 16) | (dow2 << 8) | comma | 0x20202000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahsonkhan, Utf8Parser with "r" expects the specific casing of "Sun", "Mon", etc... it doesn't allow any other casing. Is that by design? It differs from DateTime.ParseExact in that regard, which allows any casing for the day of week and month.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Utf8Parser doesn't support "r". It has "R" and "l" for RFC 1123:
https://github.com/dotnet/corefx/blob/edac8ec5d0e09fab9d11c042dcbabb07b9e3b446/src/System.Memory/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Date.cs#L22-L29
So, it looks like its by design.
@atsushikan would know more on why we chose different characters for it.
This is to support the date format with a quote. i.e. in the date format, you'll have 'z' or 'GMT'. the quotes are not part of the real ISO format but it looks the framework at some time supported it by mistake. usually, when having quotes in the date pattern, the characters inside the quotes are treated literals (i.e. it show up in the formatted strings as it is) but when having 'z' or 'GMT' we treat it is as the date is in UTC time zone. |
@tarekgh, can you share an example? I've not been able to come up with a variant that has quotes around GMT and that successfully parses with DateTime.ParseExact(..., "r", ...) |
Oh, wait, it's to support the date format, not the input string. So I don't need to worry about that here, then. Thanks. |
… ("r")
Significantly improves the performance by porting and adapting the Utf8Parser code from corefx. This optimizes for the (default) case of a DateTimeStyles.None; specifying any other style falls back to the normal parsing support, as that requires handling things such as arbitrary whitespace anywhere in the string.
|
Any other feedback? |
|
@stephentoub please ensure you parse the following correctly: This is the special case we support. note the quotes around GMT in the formatted string. |
| // Tue, 03 Jan 2017 08:08:05 GMT | ||
|
|
||
| // The format is exactly 29 characters. | ||
| if ((uint)source.Length != 29) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
((uint)source.Length != 29) [](start = 14, length = 28)
this looks wrong as GMT can specified with quotes so it can be 31
| return false; | ||
| } | ||
|
|
||
| uint dowString = (dow0 << 24) | (dow1 << 16) | (dow2 << 8) | comma | 0x20202000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int dowString = (dow0 << 24) | (dow1 << 16) | (dow2 << 8) | comma | 0x20202000; [](start = 17, length = 79)
do you think this can work with Big Endian machines?
| } | ||
|
|
||
| // Validate that the parsed date is valid according to the calendar. | ||
| if (!parseInfo.calendar.TryToDateTime(year, month, day, hour, minute, second, 0, 0, out result.parsedDate)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!parseInfo.calendar.TryToDateTim [](start = 12, length = 36)
This is ok, but with the "r" format we always use Gregorian calendar so we can just simply do "new DateTime(...) " we just need to ensure DateTimeKind value is set correctly in the returned DateTime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that calendar.TryToDateTime is catching the exception and returns false at that time. we may do the same when doing new DateTime() or better we can add helper method to DateTime like TryGet(..., out DateTime dt)
In reply to: 200185195 [](ancestors = 200185195)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that calendar.TryToDateTime is catching the exception and returns false at that time
GregorianCalendar overrides TryToDateTime, with a bit of logic that ends up calling DateTime.TryCreate. Seems reasonable to still just use calendar.TryToDateTime then and inherit that behavior. We could avoid a virtual call by rewriting that logic here, but it doesn't really seem worth it.
@tarekgh, this is what I was asking about earlier. That example fails for me, e.g. this program: using System;
using System.Globalization;
class Program
{
static void Main() => DateTime.ParseExact("Mon, 01 Jan 0001 00:00:00 'GMT'", "r", CultureInfo.CurrentCulture);
}throws on .NET 4.7.2: Can you please clarify / share an example that parses successfully with format "r" and with quotes around GMT?
Same question. I've not been able to come up with an example where quotes are parsed successfully. |
|
@stephentoub looks you are right. you may ignore my comments then. |
Significantly improves the performance of
DateTime.ParseExact(..., "r", ...)by porting and adapting theUtf8Parsercode from corefx, for a 15-20x throughput increase (and also saving three string allocations totaling ~80bytes... which more generally we should subsequently look at getting rid of for other formats). This optimizes for the (default) case of aDateTimeStyles.None; specifying any other style falls back to the normal parsing support, as that requires handling things such as arbitrary whitespace anywhere in the string.Benchmark:
Before:
After:
cc: @ahsonkhan, @jkotas, @pjanotti, @joperezr
Contributes to https://github.com/dotnet/corefx/issues/30612