Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@stephentoub
Copy link
Member

@stephentoub stephentoub commented Jul 3, 2018

Significantly improves the performance of DateTime.ParseExact(..., "r", ...) by porting and adapting the Utf8Parser code from corefx, for a 15-20x throughput increase (and also saving three string allocations totaling ~80bytes... which more generally we should subsequently look at getting rid of for other formats). This optimizes for the (default) case of a DateTimeStyles.None; specifying any other style falls back to the normal parsing support, as that requires handling things such as arbitrary whitespace anywhere in the string.

Benchmark:

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Attributes.Jobs;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
[InProcess]
public class Benchmark
{
    private static void Main() => BenchmarkRunner.Run<Benchmark>();

    [Benchmark] public DateTime DT_ParseExactR() => DateTime.ParseExact("Wed, 15 Aug 1906 07:24:05 GMT", "r", null);
    [Benchmark] public DateTimeOffset DTO_ParseExactR() => DateTimeOffset.ParseExact("Wed, 15 Aug 1906 07:24:05 GMT", "r", null);

    private static readonly string[] s_formats = new string[] { "R" };
    [Benchmark] public DateTime DT_ParseExactMultipleR() => DateTime.ParseExact("Wed, 15 Aug 1906 07:24:05 GMT", s_formats, null);
    [Benchmark] public DateTimeOffset DTO_ParseExactMultipleR() => DateTimeOffset.ParseExact("Wed, 15 Aug 1906 07:24:05 GMT", s_formats, null);
}

Before:

                  Method |     Mean |     Error |    StdDev |  Gen 0 | Allocated |
------------------------ |---------:|----------:|----------:|-------:|----------:|
          DT_ParseExactR | 1.922 us | 0.0377 us | 0.0888 us | 0.0172 |      80 B |
         DTO_ParseExactR | 1.948 us | 0.0374 us | 0.0511 us | 0.0153 |      80 B |
  DT_ParseExactMultipleR | 1.922 us | 0.0265 us | 0.0221 us | 0.0172 |      80 B |
 DTO_ParseExactMultipleR | 1.934 us | 0.0349 us | 0.0292 us | 0.0153 |      80 B |

After:

                  Method |      Mean |     Error |    StdDev | Allocated |
------------------------ |----------:|----------:|----------:|----------:|
          DT_ParseExactR |  98.38 ns | 0.9119 ns | 0.7119 ns |       0 B |
         DTO_ParseExactR | 123.87 ns | 0.4554 ns | 0.3012 ns |       0 B |
  DT_ParseExactMultipleR | 112.05 ns | 2.3175 ns | 4.6284 ns |       0 B |
 DTO_ParseExactMultipleR | 120.27 ns | 1.3823 ns | 1.2930 ns |       0 B |

cc: @ahsonkhan, @jkotas, @pjanotti, @joperezr
Contributes to https://github.com/dotnet/corefx/issues/30612

@stephentoub
Copy link
Member Author

@tarekgh or @jkotas, do you have any familiarity with the issue cited at:

// The "r" and "u" formats incorrectly quoted 'GMT' and 'Z', respectively. We cannot
// correct this mistake for DateTime.ParseExact for compatibility reasons, but we can
// fix it for DateTimeOffset.ParseExact as DateTimeOffset has not been publically released
// with this issue.
if ((result.flags & ParseFlags.CaptureOffset) != 0)
{
if ((result.flags & ParseFlags.Rfc1123Pattern) != 0 && quotedStr == GMTName)
{
result.flags |= ParseFlags.TimeZoneUsed;
result.timeZoneOffset = TimeSpan.Zero;
}

and whether that's something I need to care about here?

{
uint space = source[25], g = source[26], m = source[27], t = source[28];

if ((space | g | m | t) > 0x7F ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this measurably faster than 4 character equality comparisons?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. Simplified.

return false;
}

uint dowString = (dow0 << 24) | (dow1 << 16) | (dow2 << 8) | comma | 0x20202000;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahsonkhan, Utf8Parser with "r" expects the specific casing of "Sun", "Mon", etc... it doesn't allow any other casing. Is that by design? It differs from DateTime.ParseExact in that regard, which allows any casing for the day of week and month.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utf8Parser doesn't support "r". It has "R" and "l" for RFC 1123:
https://github.com/dotnet/corefx/blob/edac8ec5d0e09fab9d11c042dcbabb07b9e3b446/src/System.Memory/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Date.cs#L22-L29

So, it looks like its by design.
@atsushikan would know more on why we chose different characters for it.

@tarekgh
Copy link
Member

tarekgh commented Jul 3, 2018

@stephentoub

do you have any familiarity with the issue cited at:

This is to support the date format with a quote. i.e. in the date format, you'll have 'z' or 'GMT'. the quotes are not part of the real ISO format but it looks the framework at some time supported it by mistake. usually, when having quotes in the date pattern, the characters inside the quotes are treated literals (i.e. it show up in the formatted strings as it is) but when having 'z' or 'GMT' we treat it is as the date is in UTC time zone.

@stephentoub
Copy link
Member Author

This is to support the date format with a quote

@tarekgh, can you share an example? I've not been able to come up with a variant that has quotes around GMT and that successfully parses with DateTime.ParseExact(..., "r", ...)

@stephentoub
Copy link
Member Author

stephentoub commented Jul 4, 2018

This is to support the date format with a quote

Oh, wait, it's to support the date format, not the input string. So I don't need to worry about that here, then. Thanks.

… ("r")

Significantly improves the performance by porting and adapting the Utf8Parser code from corefx.  This optimizes for the (default) case of a DateTimeStyles.None; specifying any other style falls back to the normal parsing support, as that requires handling things such as arbitrary whitespace anywhere in the string.
@stephentoub
Copy link
Member Author

Any other feedback?

@tarekgh
Copy link
Member

tarekgh commented Jul 4, 2018

@stephentoub please ensure you parse the following correctly:

        string s = "Mon, 01 Jan 0001 00:00:00 'GMT'";
        dt = DateTime.ParseExact(formatted, "r", CultureInfo.CurrentCulture);

This is the special case we support. note the quotes around GMT in the formatted string.

// Tue, 03 Jan 2017 08:08:05 GMT

// The format is exactly 29 characters.
if ((uint)source.Length != 29)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

((uint)source.Length != 29) [](start = 14, length = 28)

this looks wrong as GMT can specified with quotes so it can be 31

return false;
}

uint dowString = (dow0 << 24) | (dow1 << 16) | (dow2 << 8) | comma | 0x20202000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int dowString = (dow0 << 24) | (dow1 << 16) | (dow2 << 8) | comma | 0x20202000; [](start = 17, length = 79)

do you think this can work with Big Endian machines?

}

// Validate that the parsed date is valid according to the calendar.
if (!parseInfo.calendar.TryToDateTime(year, month, day, hour, minute, second, 0, 0, out result.parsedDate))
Copy link
Member

@tarekgh tarekgh Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (!parseInfo.calendar.TryToDateTim [](start = 12, length = 36)

This is ok, but with the "r" format we always use Gregorian calendar so we can just simply do "new DateTime(...) " we just need to ensure DateTimeKind value is set correctly in the returned DateTime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that calendar.TryToDateTime is catching the exception and returns false at that time. we may do the same when doing new DateTime() or better we can add helper method to DateTime like TryGet(..., out DateTime dt)


In reply to: 200185195 [](ancestors = 200185195)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that calendar.TryToDateTime is catching the exception and returns false at that time

GregorianCalendar overrides TryToDateTime, with a bit of logic that ends up calling DateTime.TryCreate. Seems reasonable to still just use calendar.TryToDateTime then and inherit that behavior. We could avoid a virtual call by rewriting that logic here, but it doesn't really seem worth it.

@stephentoub
Copy link
Member Author

please ensure you parse the following correctly:

@tarekgh, this is what I was asking about earlier. That example fails for me, e.g. this program:

using System;
using System.Globalization;

class Program
{
    static void Main() => DateTime.ParseExact("Mon, 01 Jan 0001 00:00:00 'GMT'", "r", CultureInfo.CurrentCulture);
}

throws on .NET 4.7.2:

Unhandled Exception: System.FormatException: String was not recognized as a valid DateTime.
   at System.DateTimeParse.ParseExact(String s, String format, DateTimeFormatInfo dtfi, DateTimeStyles style)
   at System.DateTime.ParseExact(String s, String format, IFormatProvider provider)

Can you please clarify / share an example that parses successfully with format "r" and with quotes around GMT?

this looks wrong as GMT can specified with quotes so it can be 31

Same question. I've not been able to come up with an example where quotes are parsed successfully.

@tarekgh
Copy link
Member

tarekgh commented Jul 4, 2018

@stephentoub looks you are right. you may ignore my comments then.

@stephentoub stephentoub merged commit 9f9a37d into dotnet:master Jul 5, 2018
@stephentoub stephentoub deleted the dtparse branch July 5, 2018 01:38
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants