add spacy_stanza into stanza_nlp_engine as it is no longer maintained #1522

Merged
SharonHart merged 8 commits into main from omri/stanza_new_version on Feb 5, 2025

Conversation

@omri374 omri374 commented Feb 4, 2025

Collaborator

Change Description

spacy_stanza is a spaCy wrapper for Stanza. It only supports Stanza versions below 1.7, and because of changes in torch, Stanza < 1.7 no longer works unless torch is downgraded.

This change copies the spacy_stanza code into stanza_nlp_engine, since spacy_stanza is no longer maintained.
Tests were also added to test_stanza_nlp_engine.py to verify that the code works as expected.
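
One of the commits listed in this PR is titled "make stanza optional". The snippet below is a minimal sketch of how an optional-dependency guard of that kind could look. It is not the code from this PR: the helper names `optional_import` and `load_stanza` are hypothetical and only illustrate the general pattern.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed.

    Hypothetical helper, not part of Presidio.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package lands here instead of crashing at import time.
        return None


def load_stanza() -> ModuleType:
    """Import stanza lazily and fail with a readable message if it is absent.

    Hypothetical helper: the import only happens when an engine that
    needs stanza is actually constructed.
    """
    stanza = optional_import("stanza")
    if stanza is None:
        raise ImportError(
            "stanza is not installed. Install it to use the Stanza NLP engine."
        )
    return stanza
```

With a guard like this, importing the engine module does not require stanza. The error only surfaces when a caller actually asks for the Stanza-backed engine.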

@omri374 omri374 requested a review from a team as a code owner February 4, 2025 21:34
omri374 commented Feb 4, 2025

Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@omri374 omri374 requested a review from Copilot February 4, 2025 21:37

Copilot AI left a comment

Contributor

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

presidio-analyzer/presidio_analyzer/nlp_engine/stanza_nlp_engine.py:136

  • [nitpick] The function name 'tokenizer_factory' is ambiguous. It should be renamed to 'create_stanza_tokenizer' to better reflect its purpose.
def tokenizer_factory(

omri374 commented Feb 4, 2025

Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

SharonHart previously approved these changes Feb 5, 2025
@SharonHart SharonHart merged commit 65eabd4 into main Feb 5, 2025
@SharonHart SharonHart deleted the omri/stanza_new_version branch February 5, 2025 08:34
prokopidis pushed a commit to prokopidis/presidio that referenced this pull request Jun 23, 2026
…microsoft#1522)

* add spacy_stanza into stanza_nlp_engine as it is no longer maintained

* add spacy_stanza into stanza_nlp_engine as it is no longer maintained

* add spacy_stanza into stanza_nlp_engine as it is no longer maintained

* small note fix

* make stanza optional

* simplified stanza tests to reduce memory
3 participants