@devansh016
Contributor

devansh016 commented Jan 21, 2025

What?

This PR ensures that HTML entities, such as &amp;, are decoded when generating post slugs. It modifies the getEditedPostSlug selector to decode entities in the post title before cleaning it for slug creation.

Why?

The current implementation leaves entity names such as amp in slugs when saving drafts. This can cause confusion and produces unnecessarily long, SEO-unfriendly URLs. For example, a draft titled Ampersand & And gets the slug /ampersand-amp-and/, but upon publishing it correctly becomes /ampersand-and/. This PR fixes the inconsistency by decoding HTML entities before cleaning the slug.

Fixes #62543

How?

  • Added decodeEntities from @wordpress/html-entities to the getEditedPostSlug selector.
  • Updated the slug-cleaning process to decode HTML entities in the post title before passing it to cleanForSlug.

Code changes are minimal and located in packages/editor/src/store/selectors.js.
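
The decode-then-clean order described above can be illustrated with a minimal, self-contained sketch. Note that decodeBasicEntities and simpleSlug are hypothetical stand-ins written for this example; they are not the real decodeEntities or cleanForSlug, which handle many more cases.

```javascript
// Hypothetical stand-in for decodeEntities: only a few named entities.
const NAMED_ENTITIES = new Map( [
	[ 'amp', '&' ],
	[ 'lt', '<' ],
	[ 'gt', '>' ],
	[ 'quot', '"' ],
] );

function decodeBasicEntities( text ) {
	// Replace known named entities; leave unknown ones untouched.
	return text.replace( /&([a-z]+);/g, ( match, name ) =>
		NAMED_ENTITIES.has( name ) ? NAMED_ENTITIES.get( name ) : match
	);
}

// Hypothetical, simplified slugifier (assumption: ASCII-only titles).
function simpleSlug( text ) {
	return text
		.toLowerCase()
		.replace( /[^a-z0-9\s-]/g, '' ) // keep letters, digits, whitespace, hyphens
		.trim()
		.replace( /[\s-]+/g, '-' ); // collapse whitespace/hyphen runs into one hyphen
}

const rawTitle = 'Ampersand &amp; And';

// Without decoding, the entity name survives the cleanup.
const slugWithoutDecoding = simpleSlug( rawTitle );

// Decoding first turns "&amp;" into "&", which the cleanup then drops.
const slugWithDecoding = simpleSlug( decodeBasicEntities( rawTitle ) );

console.log( slugWithoutDecoding, slugWithDecoding );
```

With these stand-ins, the undecoded title keeps the amp fragment in its slug, while the decoded title does not, which mirrors the before/after behavior this PR targets.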

Testing Instructions

  1. Go to Settings > Permalinks and set the permalink structure to "Post name".
  2. Create a new post with a title containing special characters, e.g., Ampersand & And.
  3. Save the post as a draft.
  4. Check the generated URL in the editor sidebar.
    • Before: /ampersand-amp-and/
    • After: /ampersand-and/
  5. Publish the post and confirm the URL remains /ampersand-and/.

@github-actions

github-actions bot commented Jan 21, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Unlinked Accounts

The following contributors have not linked their GitHub and WordPress.org accounts: @APCgit.

Contributors, please read how to link your accounts to ensure your work is properly credited in WordPress releases.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Unlinked contributors: APCgit.

Co-authored-by: devansh016 <[email protected]>
Co-authored-by: Mamaduka <[email protected]>
Co-authored-by: t-hamano <[email protected]>
Co-authored-by: Soean <[email protected]>
Co-authored-by: 2ndkauboy <[email protected]>
Co-authored-by: annezazu <[email protected]>
Co-authored-by: marcarmengou <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

Mamaduka added the labels "[Type] Bug" (An existing feature does not function as intended) and "[Package] Block editor" (/packages/block-editor) on Jan 22, 2025
@Mamaduka
Member

Thanks for contributing, @devansh016!

The referenced issue has a PR (#62549) with an ongoing review process. Just FYI.

@devansh016
Contributor Author

> Thanks for contributing, @devansh016!
>
> The referenced issue has a PR (#62549) with an ongoing review process. Just FYI.

Thanks, @Mamaduka! I’ve reviewed the other PR. My approach is slightly different, and the previous PR has been inactive for the past couple of months.

@t-hamano
Contributor

t-hamano commented May 6, 2025

Hi, @devansh016. Thanks for the PR.

I've discovered that simply decoding HTML entities isn't enough. Besides HTML entities, there are many other characters that need to be handled.

For example, the slug of the title Hello—World (Hello&mdash;World) should be hello-world:

However, the slug becomes helloworld when the post is saved in the block editor.

In other words, as mentioned in this comment, we may need a JS equivalent of the sanitize_title_with_dashes() function.
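
To make the dash case concrete, here is a rough sketch of how a slug helper might map en and em dashes to hyphens before stripping other characters. The function slugWithDashHandling is hypothetical and written only for this illustration; it assumes, based on the expected hello-world slug above, that such dashes should become hyphens, and it is not a port of sanitize_title_with_dashes().

```javascript
// Hypothetical helper: converts dash-like characters to hyphens first,
// then strips what remains outside a-z, 0-9, whitespace, and hyphens.
function slugWithDashHandling( title ) {
	return title
		.toLowerCase()
		.replace( /[\u00a0\u2013\u2014]/g, '-' ) // nbsp, en dash, em dash -> hyphen
		.replace( /[^a-z0-9\s-]/g, '' ) // drop other unsupported characters
		.replace( /[\s-]+/g, '-' ) // collapse whitespace/hyphen runs
		.replace( /^-+|-+$/g, '' ); // trim leading/trailing hyphens
}

const title = 'Hello\u2014World'; // "Hello", em dash, "World"

// Stripping without dash handling glues the two words together.
const naiveSlug = title.toLowerCase().replace( /[^a-z0-9\s-]/g, '' );

// Handling the dash first keeps the word boundary.
const fixedSlug = slugWithDashHandling( title );

console.log( naiveSlug, fixedSlug );
```

The order matters in this sketch: the dash replacement has to run before the generic strip step, otherwise the dash is removed and the word boundary is lost.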

@Mamaduka
Member

Mamaduka commented May 7, 2025

> In other words, as mentioned in #62549 (comment), we may need a JS function equivalent to sanitize_title_with_dashes() function.

The cleanForSlug is supposed to be that function, but it seems it's not working as expected.

@APCgit

APCgit commented May 7, 2025

Try updating the slug of a post that has already been published; the result is completely different. When publishing, you can handle the slug through PHP, but when updating a published post there is no PHP filter involved, so you end up with untransformed slugs in which umlauts are handled wrongly and other characters are simply removed. It's very inconsistent. Unfortunately, I didn't find a way to filter the slug myself in the default UI. I added a custom input to handle umlauts and characters properly, but that's not how it should work.

@Mamaduka
Member

Mamaduka commented May 7, 2025

@APCgit, cleanForSlug is more like sanitize_title_with_dashes, but it also handles special accented characters.

There are a couple of reasons the method doesn't allow filtering the return value:

  • It's more generic than sanitize_title and is used in multiple places to slugify text strings.
  • As stated in the JSDoc block, the method returns an approximate slug, but eventually defers to slugs generated on the server.
  • The sanitize_title_with_dashes also doesn't apply any filters.
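
As a rough illustration of what "approximate slug" with accent handling can mean on the client, the sketch below strips combining marks via Unicode normalization. Both stripAccents and approximateSlug are hypothetical names for this example; this is not how cleanForSlug is implemented, and the server may still produce a different final slug.

```javascript
// Hypothetical helper: decompose characters (NFD) and remove the
// combining diacritical marks, so "\u00e9" (e with acute) becomes "e".
function stripAccents( text ) {
	return text.normalize( 'NFD' ).replace( /[\u0300-\u036f]/g, '' );
}

// Hypothetical approximate slugifier built on top of stripAccents.
function approximateSlug( title ) {
	return stripAccents( title )
		.toLowerCase()
		.replace( /[^a-z0-9\s-]/g, '' ) // drop anything still unsupported
		.trim()
		.replace( /[\s-]+/g, '-' ); // collapse whitespace/hyphen runs
}

// "Uber Cafe" written with U-umlaut and e-acute.
const approximate = approximateSlug( '\u00dcber Caf\u00e9' );

console.log( approximate );
```

Characters without a canonical decomposition (for example the German sharp s) are simply dropped by this sketch, which is one reason a client-side result can only approximate the slug the server eventually generates.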

@Mamaduka
Member

Mamaduka commented May 7, 2025

Thanks for the initial proposal, @devansh016!

I'm going to close this in favor of #70078.

Mamaduka closed this on May 7, 2025

Development

Successfully merging this pull request may close these issues.

The & symbol becomes amp in the draft URL

4 participants