
ISO 8601 with offset is not correctly recognized as DATE_TIME #1619

Description

@StefH

Describe the bug
ISO 8601 date-times with a positive or zero (+00:00) UTC offset are not correctly recognized as DATE_TIME: the offset is left out of the detected entity.

To Reproduce
1️⃣ Start Analyzer Docker Image

2️⃣ Send this request:

{
    "text": "'Time with timezone offset (+2 hours)': 2024-03-15T14:30:00+02:00\r\n'End of year with explicit UTC offset': 2024-12-31T23:59:59+00:00",
    "language": "en"
}

3️⃣ The text, after being processed by the Anonymizer, is:

'Time with timezone offset (+2 <DATE_TIME>)': <DATE_TIME>+02:00\r\n'End of year with explicit UTC offset': <DATE_TIME>+00:00
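To script step 2️⃣ instead of sending the request by hand, here is a minimal sketch using only the Python standard library. It assumes the Analyzer container is published on `localhost:5002` and exposes a `POST /analyze` endpoint; the URL, the port mapping, and the helper names `build_analyze_request` and `analyze` are illustrative assumptions, not part of the original report.

```python
import json
import urllib.request

# Assumption: the Analyzer container is published on localhost:5002
# (for example via `docker run -p 5002:3000 ...`); adjust to your own mapping.
ANALYZER_URL = "http://localhost:5002/analyze"

# Same text as the JSON request above; "\r\n" is a real CR LF here.
TEXT = (
    "'Time with timezone offset (+2 hours)': 2024-03-15T14:30:00+02:00\r\n"
    "'End of year with explicit UTC offset': 2024-12-31T23:59:59+00:00"
)


def build_analyze_request(text: str, language: str = "en",
                          url: str = ANALYZER_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request with the body shown above."""
    payload = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text: str, language: str = "en", url: str = ANALYZER_URL) -> list:
    """Send the request to the running container and decode the JSON response."""
    with urllib.request.urlopen(build_analyze_request(text, language, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `for r in analyze(TEXT): print(r["entity_type"], TEXT[r["start"]:r["end"]])` would list each detected entity next to the substring it covers, assuming each result carries `entity_type`, `start`, and `end` fields. That makes it easy to see exactly where each DATE_TIME span ends.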

Expected behavior
I would expect the response text to be:

'Time with timezone offset (+2 <DATE_TIME>)': <DATE_TIME>\r\n'End of year with explicit UTC offset': <DATE_TIME>
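The difference between the actual and the expected output comes down to where the DATE_TIME span ends. The sketch below is a simplified stand-in for the Anonymizer's replace step, not Presidio's own code: the helper `replace_spans` is hypothetical, and the span boundaries are chosen by hand to reproduce the observed and the expected output for the second line.

```python
def replace_spans(text, spans):
    """Replace (start, end, entity) spans with <entity> tags.

    Spans are processed from right to left so earlier offsets stay valid;
    this assumes the spans do not overlap.
    """
    for start, end, entity in sorted(spans, reverse=True):
        text = text[:start] + f"<{entity}>" + text[end:]
    return text


line = "'End of year with explicit UTC offset': 2024-12-31T23:59:59+00:00"
ts_start = line.index("2024")

# Hypothetical span ending after the seconds, matching the observed output.
span_without_offset = (ts_start, ts_start + len("2024-12-31T23:59:59"), "DATE_TIME")
# Hypothetical span that also covers the "+00:00" offset, matching the expected output.
span_with_offset = (ts_start, len(line), "DATE_TIME")

print(replace_spans(line, [span_without_offset]))
# 'End of year with explicit UTC offset': <DATE_TIME>+00:00
print(replace_spans(line, [span_with_offset]))
# 'End of year with explicit UTC offset': <DATE_TIME>
```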

Question
With the default Docker image, these DATE_TIME entities are currently detected by the SpacyRecognizer, and the detected span stops before the UTC offset.
Wouldn't it be better to extend the existing DateRecognizer with a regex for ISO 8601 date-times, including the offset?
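As a rough sketch of the suggested regex, using Python's `re` module, the expression below matches an extended-format ISO 8601 date-time (`YYYY-MM-DDThh:mm[:ss[.fraction]]`) together with an optional `Z`, `±hh:mm`, or `±hhmm` offset. The names `ISO_8601_WITH_OFFSET`, `find_iso_datetimes`, and `redact_iso_datetimes` are hypothetical, and the pattern does not check calendar validity (for example `2024-02-31` would still match).

```python
import re

# Hypothetical pattern: extended ISO 8601 date-time plus an optional offset.
ISO_8601_WITH_OFFSET = re.compile(
    r"\b\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])"  # date: YYYY-MM-DD
    r"T(?:[01]\d|2[0-3]):[0-5]\d"                          # time: Thh:mm
    r"(?::[0-5]\d(?:\.\d+)?)?"                             # optional :ss and fraction
    r"(?:Z|[+-](?:[01]\d|2[0-3]):?[0-5]\d)?"               # optional Z, +hh:mm or -hhmm
)

# Same text as the JSON request in the report; "\r\n" is a real CR LF here.
TEXT = (
    "'Time with timezone offset (+2 hours)': 2024-03-15T14:30:00+02:00\r\n"
    "'End of year with explicit UTC offset': 2024-12-31T23:59:59+00:00"
)


def find_iso_datetimes(text):
    """Return (start, end, matched_text) for every ISO 8601 date-time in text."""
    return [(m.start(), m.end(), m.group()) for m in ISO_8601_WITH_OFFSET.finditer(text)]


def redact_iso_datetimes(text, tag="<DATE_TIME>"):
    """Replace each match, offset included, with the given tag."""
    return ISO_8601_WITH_OFFSET.sub(tag, text)
```

In Presidio, a regex like this could be wrapped in a `Pattern(name=..., regex=..., score=...)` and registered through a `PatternRecognizer(supported_entity="DATE_TIME", patterns=[...])`, or added to the DateRecognizer's own pattern list; the score is a judgment call. The regex alone does not match `hours` in `(+2 hours)`, so that replacement in the expected output would still come from the SpacyRecognizer.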
