Deny list recognizer fails to detect terms that are not surrounded by whitespace #857

Description

@omri374

Describe the bug
The deny list capability only detects terms that have whitespace before and after them; a term directly followed by punctuation is missed.

To Reproduce

from presidio_analyzer import PatternRecognizer
deny_list = ["Mr", "Mrs", "Ms", "Dr", "Prof"]

deny_list_recognizer = PatternRecognizer(deny_list=deny_list, supported_entity="TITLES")

deny_list_recognizer.analyze(text="Mr Smith", entities=["TITLES"])

Would result in:

[type: TITLES, start: 0, end: 2, score: 1.0]

Running:

deny_list_recognizer.analyze(text="Mr. Smith", entities=["TITLES"])

Would result in nothing detected.

Expected behavior
Punctuation such as . , : ; ! ? immediately before or after a term should also count as a term boundary, so that the deny list term is still detected.
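
To illustrate the difference between the reported and the expected behavior, here is a minimal sketch using only Python's `re` module. It is not Presidio's actual implementation: both patterns, and the helper name `find_terms`, are assumptions made for illustration. The first pattern approximates the reported behavior (a term must be bounded by a space or by the start/end of the text); the second treats any non-word character as a boundary.

```python
import re

deny_list = ["Mr", "Mrs", "Ms", "Dr", "Prof"]

# Escape each term so characters like "." in a term are matched literally.
alternation = "|".join(re.escape(term) for term in deny_list)

# Assumed approximation of the reported behavior:
# a term only matches when bounded by a space or the start/end of the text.
space_bounded = re.compile(r"(?:^|(?<= ))(" + alternation + r")(?:(?= )|$)")

# One possible alternative: any non-word character (punctuation, whitespace)
# or the start/end of the text counts as a boundary.
nonword_bounded = re.compile(r"(?:^|(?<=\W))(" + alternation + r")(?:(?=\W)|$)")


def find_terms(pattern, text):
    """Return (term, start, end) for every deny-list match in text."""
    return [(m.group(1), m.start(1), m.end(1)) for m in pattern.finditer(text)]


print(find_terms(space_bounded, "Mr Smith"))     # matched
print(find_terms(space_bounded, "Mr. Smith"))    # missed: "." is not a space
print(find_terms(nonword_bounded, "Mr. Smith"))  # matched
```

With the second pattern, "Mr" in "Mr. Smith" is found at the same span (0 to 2) as in "Mr Smith", while a term embedded in a longer word is still not matched because letters, digits and underscores are word characters.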

Labels: analyzer, bug
