Optimize the loading of images using stored URL metrics#884

westonruter · 2023-11-16T22:40:16Z

Summary

Fixes #872 as part of #869.

This implements the optimization of image loading based on the client-side detection (#876) that has been stored (#878).

This is fully functional and able to be tested in a WP install.

Relevant technical choices

When a response is eligible for being optimized, an output buffer filter is added to apply optimizations. This involves looking up the ilo_url_metrics post containing the stored URL metrics, and when present groupong the URL metrics into the breakpoints (480px, 600px, 782px). If all breakpoints have URL metrics collected and all share the same image as LCP element, then fetchpriority=high is set on this image while being removed from all other images. If all breakpoints don't have URL metrics collected yet, then the fetchpriority attribute added by server-side heuristics is not removed. Certain data attributes are also added to elements so we can track what operations are being performed:

Attribute	Condition
`data-ilo-added-fetchpriority`	When the `fetchpriority` attribute is added to an element.
`data-ilo-removed-fetchpriority`	When the `fetchpriority` attribute is removed from an element.
`data-ilo-fetchpriority-already-added`	When WordPress's existing server-side heuristics have already set `fetchpriority=high` on the LCP image, then the `data-ilo-removed-fetchpriority` attribute is added.
`data-ilo-added-tag`	Present on the `preload` links which are injected at the end of the `head`.

When the fetchpriority attribute is added, a data-ilo-added-fetchpriority attribute is also added so we can keep track of what operations it is doing.

Whenever there are URL metrics available, preload links are also added for the LCP image in each breakpoint. This is particularly important when different breakpoints have different LCP elements (even none at all), as in such cases the fetchpriority attribute must be removed from each breakpoint-specific LCP image element. This is also useful for when breakpoints don't have URL metrics collected yet, as otherwise fetchpriority=high could be erroneously added to the wrong image for the current viewport.

In order to apply the fetchpriority optimizations to the page and to collect the attributes from IMG elements for use in preload links, the WP_HTML_Tag_Processor class is leveraged, although it is not used directly. An ILO_HTML_Tag_Processor class is used which incorporates WP_HTML_Tag_Processor (by composition instead of extension). This processor is similar to WP_HTML_Tag_Processor except it is more constrained. It provides a generator method called open_tags(); while iterating over this generator, the breadcrumbs for the current open tag can be obtained via the get_breadcrumbs() method. Otherwise, the following methods from the underlying WP_HTML_Tag_Processor are exposed:

get_attribute()
set_attribute()
remove_attribute()
get_updated_html()

Importantly, the next_tag() and seek() methods are not exposed since in order to correctly compute breadcrumbs the processor must proceed forward through the entire document visiting all open and closing tags.

Tip: The following command deletes all ilo_url_metrics posts:

npm run wp-env run cli -- post delete --force $( npm run --silent wp-env run cli -- post list --post_type=ilo_url_metrics --format=ids 2> /dev/null )

Other changes

Besides the existing 480px breakpoint, additional breakpoints of 600px and 782px have been added. These three breakpoints correspond to the mobile, small, and medium breakpoints that are defined in Gutenberg's _breakpoints.css.
Detection and optimization by default do not run now in the Customizer preview or in response to a non-GET request (e.g. a POST request).
Breadcrumb calculation has been updated to disregard the presence of the Admin Bar. Similarly, client-side detection logic has been updated to account for JS injection of the .skip-link.screen-reader-text element, which throws off generation of breadcrumbs on the client to match breadcrumbs generated on the server. In reality, the generation of breadcrumbs client-side is fragile and in a subsequent PR I intend to rely on server-side breadcrumb generation exclusively.
Some additional preflab_ prefixes have been replaced with ilo_.
The tagName key has been replaced with tag in breadcrumbs.

Checklist

PR has either [Focus] or Infrastructure label.
PR has a [Type] label.
PR has a milestone or the no milestone label.

…erent LCP images

…or URL metrics with no LCP element

westonruter · 2023-11-18T17:30:49Z

This is now working!

When there is only one LCP image shared for all breakpoints, it adds fetchpriority=high to that element and removes it from all others.
When breakpoints have different LCP images (or some have none), fetchpriority is removed from all images and preload links are added to target the breakpoints.

Elements that have been muted get annotated with data-ilo-* attributes to easily track what it is doing.

Items to further work on:

Create a subclass of WP_HTML_Tag_Processor instead of using the ilo_walk_document() function.
Disable detection when in the Customizer preview due to the injection of inline editing controls.
Disable detection and optimization when $_SERVER['REQUEST_METHOD'] !== 'GET'.

For a future PR:

Improve handling of breadcrumb generation when there the document structure varies either due to admin bar being present or due to client-side logic. Currently breadcrumbs have to explicitly skip over the admin bar element to ignore it. Similarly, the client-side breadcrumb logic has to skip over .skip-link.screen-reader-text which is injected by wp_enqueue_block_template_skip_link(). This issue of client-side mutated elements is particularly problematic, which may require server-side tagging of elements with their breadcrumbs which would eliminate needing to also do that calculation on the client. But this would mean extraneous data- attributes on all images. When detection isn't needed, the tag processor could strip those attributes out before sending back the buffer. We may need to monitor this to see how much of a problem it is, such as by checking various themes to see what degree they inject elements to mess up breadcrumbs. As another alternative, we may consider relying on IDs and class names for breadcrumbs rather than element indices, although JS here too can cause problems by modifying the class names.
Go ahead and inject a preload link even when there is only one LCP image shared by all breakpoints?
What about LCP images that are injected with JavaScript, e.g. a dynamically-constructed carousel? The element will not be in the DOM for adding fetchpriority=high to. However detection could capture the image attributes (e.g. src and srcset) to then add a preload link. The challenge here is that there's no way to guard against preloading an image that JS no longer is showing. This isn't a problem for server-side processing because we don't store the src but rather just the breadcrumbs, so we validate the LCP element is still in the page.
Should detection be disabled when the user is logged-in (and the admin bar is shown)?

modules/images/image-loading-optimization/storage/data.php

modules/images/image-loading-optimization/optimization.php

westonruter · 2023-11-18T17:39:54Z

modules/images/image-loading-optimization/detection/detect.js

+	}
+	if (
+		children.includes(
+			document.querySelector( '.skip-link.screen-reader-text' )


This should be cached.

I'm going to leave this as-is for this PR. In the next PR I intend to eliminate client-side breadcrumb construction since it is too easy for JavaScript DOM mutations to cause client-side generated breadcrumbs from being able to apply on the server.

Once you switch to generate breadcrumbs on the server, maybe a shorter index or hash passed to the front end?

Yes, exactly. Currently I'm thinking that when detection is needed that the server would add breadcrumbs via a data attribute, for example:

<img data-ilo-breadcrumbs="html,0 body,1 main,2 figure,10 img,0" ...>

When detection is not needed, such data attributes would not be present.

Can you add a TODO comment? :)

There is already one:

performance/modules/images/image-loading-optimization/detection/detect.js

Line 136 in 0cca4dd

* @todo Eliminate this in favor of doing all breadcrumb generation exclusively on the server.

Done in the next PR: #892

… of head

…ta yet collected

modules/images/image-loading-optimization/class-ilo-html-tag-processor.php

dmsnell

@westonruter there's quite a bit in this PR! I can't say that I fully understanding it, but as far as the HTML Processor goes I'm glad you're experimenting with the lower-level tooling for your own needs. plugins are the right place to push ahead with riskier code.

it hasn't been my intention to expose which nth-child a node is compared to the parent, but I was going to build a CSS selector that queries that when prompted. maybe that's a good thing to build in from the start.

my outlook has been positive on the performance side for the eventual HTML Processor, but it's still opaque until it supports everything it needs to 🤷‍♂️

modules/images/image-loading-optimization/class-ilo-html-tag-processor.php

dmsnell · 2023-11-29T23:58:00Z

modules/images/image-loading-optimization/class-ilo-html-tag-processor.php

+				// TODO: There are quite a few more cases of optional closing tags: https://html.spec.whatwg.org/multipage/syntax.html#optional-tags
+				// Nevertheless, given WordPress's legacy of XHTML compatibility, the lack of closing tags may not be common enough to warrant worrying about any of them.
+				if ( in_array( $tag_name, self::P_CLOSING_TAGS, true ) ) {
+					$i = array_search( 'P', $this->open_stack_tags, true );


this is probably fine enough, but be aware that there are certain markers/boundaries in the stack that effectively isolate regions of HTML. for example, if we're currently inside a TEMPLATE element and encounter one of these P-closing elements, they don't close the P from the outside of the template.

<p> This template is isolated. <template><section><p>Hidden</p></section></template> The outer P remains. </p>

dmsnell · 2023-11-30T00:01:34Z

modules/images/image-loading-optimization/class-ilo-html-tag-processor.php

+					$i = array_search( $tag_name, $this->open_stack_tags, true );
+					if ( false !== $i ) {
+						array_splice( $this->open_stack_tags, $i );
+						$did_splice = true;


if encountering an element which isn't an HTML Void element, then regardless of the tag, if the self-closing flag is present, then it closes the tag. all self-closing flags on HTML elements are invalid and ignored, but all on HTML foreign elements are authoritative and obeyed.

<p>There are no empty <div/> tags. This is inside the DIV</p>. <wp-group>Is a custom element, and so <wp-group /> is self-closing while others aren't.</wp-group>

if ( WP_HTML_Processor::is_void( $tag_name ) || ( ! WP_HTML_Processor::is_html_element( $tag_name ) && $p->has_self_closing_flag ) ) { // this tag immediately closes as soon as we jump to the next tag. }

this requires having a list of all HTML elements, which the HTML Processor currently doesn't do because it doesn't support any, but we'll have to add it. this can lead to common parsing failures because the invalid self-closing flag has become more popular post-React where it's valid and normative in JSX.

Very interesting. I wasn't aware of the WP_HTML_Processor::has_self_closing_flag() method.

Nevertheless, maybe having a list of all HTML elements isn't needed here because: (1) if math or svg are ancestors, we can assume that all tags with self-closing flags will close the tag, and (2) custom elements always have hyphens in them, so if present we can also honor the self-closing tag.

Nevertheless, I just checked your example and it doesn't seem the second example with a custom element is actually true. I adapted your example to use a span instead of div (since a div closes an open p):

<p>There are no empty <span/> tags. This is inside the SPAN</p>. <wp-group>Is a custom element, and so <wp-group /> is self-closing while others aren't.</wp-group>

And Chrome renders this as:

So perhaps such self-closing foreign elements are limited to MathML and SVG contexts?

For example:

<svg>Is a custom element, and so <g /> is self-closing while others aren't.</svg>

by golly, you're right. I have been misinterpreting "foreign element" for a long time now! thankfully I haven't gotten to the place in the HTML Processor where that matters.

thank you very much for pointing this out.

dmsnell · 2023-11-30T00:32:18Z

modules/images/image-loading-optimization/optimization.php

+
+			$link_tag .= sprintf( ' %s="%s"', $name, esc_attr( $value ) );
+		}
+		$link_tag .= ">\n";


this is also a reasonable place for the Tag Processor, though obviously not essential. it simply removes the need to remember to escape, and joins together the split tag/html/php syntax.

$link_tag = new WP_HTML_Tag_Processor( '<link data-ilo-added-tag rel="preload" fetchpriority="high" as="image">' ); foreach ( array_filter( $img_attributes ) as $name => $value ) { switch ( $name ) { case 'srcset': case 'sizes': $name = "image{$name}"; break; case 'src': $name = 'href'; break; } $link_tag->set_attribute( $name, $value ); } $link_html = $link_tag->get_updated_html();

I'm hoping to have basic templating in place for 6.5, which should turn this into something potentially nicer still.

$attributes = []; foreach ( array_filter( $img_attributes ) as $name => value ) { switch ( $name ) { case 'srcset': case 'sizes': $attributes[ "image{$name}" ] = $value; continue; case 'src': $attributes['href'] = $value; continue; } $attributes[ $name ] = $value; } $link_tag = WP_HTML::render( '<link data-ilo-added-tag rel="preload" fetchpriority="high" as="image" ...attrs>', array( 'attrs' => $attributes ) );

Co-authored-by: Adam Silverstein <adamjs@google.com>

felixarntz

@westonruter This looks great so far. I left a few comments, mostly for minor optimization. My biggest concern/question is in regards to the custom tag processor class, I'm not sure why that needs to be included here.

felixarntz · 2023-12-01T20:05:46Z

modules/images/image-loading-optimization/class-ilo-html-tag-processor.php

+	const VOID_TAGS = array(
+		'AREA',
+		'BASE',
+		'BASEFONT', // Obsolete.
+		'BGSOUND', // Obsolete.
+		'BR',
+		'COL',
+		'EMBED',
+		'FRAME', // Deprecated.
+		'HR',
+		'IMG',
+		'INPUT',
+		'KEYGEN', // Obsolete.
+		'LINK',
+		'META',
+		'PARAM', // Deprecated.
+		'SOURCE',
+		'TRACK',
+		'WBR',
+	);
+
+	/**
+	 * The set of HTML tags whose presence will implicitly close a <p> element.
+	 * For example '<p>foo<h1>bar</h1>' should parse the same as '<p>foo</p><h1>bar</h1>'.
+	 *
+	 * @link https://html.spec.whatwg.org/multipage/grouping-content.html#the-p-element
+	 *
+	 * @var string[]
+	 */
+	const P_CLOSING_TAGS = array(
+		'ADDRESS',
+		'ARTICLE',
+		'ASIDE',


Why do we need these lists here? What specifically in regards to void and p closing tags is needed here, and why? Is there a bug in the WP_HTML_Tag_Processor, or crucial limitation?

It would be great if you could clarify the need for all this in relation to this module. Right now I'm wondering whether this module should need all those functionalities - like why are those special tags relevant for image optimization?

These are needed to be able to iterate over tags in the document and be able to correctly keep track of the open tag stack. The WP_HTML_Tag_Processor is intentionally very limited in what processing it supports. Namely, it does not have any sense of how tags are related to each other (e.g. that html is a parent of body). It is a streaming parser that iterates over open tags and close tags. In order to keep track of when one tag is a child of another tag, we have to know the semantics of these tags, whether they are self-closing (void) or how they relate to tags that have optional closers (e.g. here p).

The superset of functionality from WP_HTML_Tag_Processor is in WP_HTML_Processor which does have capabilities regarding document structure. However, it can't be used yet because it doesn't support all HTML tags yet. Eventually this class shouldn't be needed as WP_HTML_Processor. I've added a comment to reflect this in 0cca4dd.

And yeah, these lists of tags are all related to image optimization in that we need to know them in order to accurately construct breadcrumbs for any given tag.

Thanks, that makes sense. I wonder how this version of the tag processor is for performance.

For the plugin we don't need to worry about this too much, but for a potential core proposal we would need to spend more time on this - particularly as I'm not sure we would want two separate versions of the tag processor in core. And we will only be able to use this one throughout if it doesn't have notable negative performance implications.

It should have about the same performance as WP_HTML_Tag_Processor as it is just iterating over tags and is doing much less than WP_HTML_Processor, which is less predictable for performance.

felixarntz · 2023-12-01T20:06:50Z

modules/images/image-loading-optimization/detection/detect.js

+	}
+	if (
+		children.includes(
+			document.querySelector( '.skip-link.screen-reader-text' )


Can you add a TODO comment? :)

modules/images/image-loading-optimization/optimization.php

modules/images/image-loading-optimization/storage/data.php

modules/images/image-loading-optimization/optimization.php

westonruter · 2023-12-01T21:11:26Z

My biggest concern/question is in regards to the custom tag processor class, I'm not sure why that needs to be included here.

@felixarntz The custom tag processor is needed in order to be able to obtain breadcrumbs for any given element. Without it, we wouldn't be able to apply optimizations for elements identified during detection.

…cessor is fully implemented

…ints

westonruter · 2023-12-04T16:47:12Z

modules/images/image-loading-optimization/optimization.php

+function ilo_construct_breadcrumbs_string( array $breadcrumbs ): string {
+	return implode(
+		' ',
+		array_map(
+			static function ( $breadcrumb ) {
+				return sprintf( '%s,%s', $breadcrumb['tag'], $breadcrumb['index'] );


In another PR I'm working on, I'm going to make this into an XPath constructor.

Suggested change

function ilo_construct_breadcrumbs_string( array $breadcrumbs ): string {

return implode(

' ',

array_map(

static function ( $breadcrumb ) {

return sprintf( '%s,%s', $breadcrumb['tag'], $breadcrumb['index'] );

function ilo_construct_breadcrumbs_xpath( array $breadcrumbs ): string {

return implode(

'',

array_map(

static function ( $breadcrumb ) {

return sprintf( '/%s[%d]', $breadcrumb['tag'], $breadcrumb['index'] );

westonruter · 2023-12-04T21:34:34Z

modules/images/image-loading-optimization/optimization.php

+ * @param array<array{tag: string, index: int}> $breadcrumbs Breadcrumbs.
+ * @return string Breadcrumb string.
+ */
+function ilo_construct_breadcrumbs_string( array $breadcrumbs ): string {


Function moved to ILO_HTML_Tag_Processor method in #892

Co-authored-by: Felix Arntz <felixarntz@users.noreply.github.com>

felixarntz

@westonruter Thanks for the updates, LGTM!

westonruter · 2023-12-06T00:08:59Z

It would be nice if pull requests that aren't targeting trunk or a release branch could be merged with only a single reviewer. Getting two reviews for every review regardless seems like a bottleneck.

westonruter · 2023-12-07T17:57:15Z

@adamsilverstein or @swissspidy could either of you drop in a LGTM?

adamsilverstein

Tremendous

westonruter added 4 commits November 15, 2023 17:40

Fix prefix for output buffer template filter

4758013

Add small and medium breakpoints from Gutenberg to go along with mobile

5fb6fda

Add ilo_get_lcp_elements_by_minimum_viewport_widths()

bd1153d

Use array_filter() to simplify breakpoint merge

a990982

westonruter added [Type] Feature A new feature within an existing module [Focus] Images no milestone PRs that do not have a defined milestone for release labels Nov 16, 2023

This was referenced Nov 16, 2023

Optimization Detective: Server-applied optimizations informed by client-side detection #869

Closed

Add storage for URL metrics to optimize image loading #878

Merged

westonruter added 2 commits November 16, 2023 16:03

Remove fetchpriority from images when different breakpoints have diff…

b18a0c7

…erent LCP images

Update ilo_get_lcp_elements_by_minimum_viewport_widths() to account f…

25ea308

…or URL metrics with no LCP element

westonruter added the [Plugin] Optimization Detective Issues for the Optimization Detective plugin label Nov 17, 2023

westonruter added 2 commits November 17, 2023 17:10

WIP: Breadcrumb calculation on server

c88f358

Set appropriate fetchpriority and otherwise add preload links

c20f967

westonruter commented Nov 18, 2023

View reviewed changes

modules/images/image-loading-optimization/storage/data.php Outdated Show resolved Hide resolved

westonruter commented Nov 18, 2023

View reviewed changes

modules/images/image-loading-optimization/optimization.php Outdated Show resolved Hide resolved

westonruter commented Nov 18, 2023

View reviewed changes

modules/images/image-loading-optimization/optimization.php Outdated Show resolved Hide resolved

westonruter commented Nov 18, 2023

View reviewed changes

westonruter added 11 commits November 27, 2023 16:45

Use preg_replace() with limit 1 for injection or preload links at end…

08635fa

… of head

Improve variable naming

ec55c9a

Add todos for doing breadcrumbs exclusively on server

6e0f809

Account for HEAD closing tag possibly being upper-case

e429983

Disable on Customizer preview and non-GET responses

8974ac8

Introduce ILO_HTML_Tag_Processor

acde9b4

Allow callable not just Closure

024719d

Prevent error when no ilo_url_metrics post

38cb821

Clarify logic in ilo_optimize_template_output_buffer

234e2fe

Set attribute when server-side heuristics were correct

1585a78

Prevent setting fetchpriority on IMG when not all breakpoints have da…

292d1c4

…ta yet collected

adamsilverstein reviewed Nov 29, 2023

View reviewed changes

modules/images/image-loading-optimization/class-ilo-html-tag-processor.php Outdated Show resolved Hide resolved

dmsnell reviewed Nov 30, 2023

View reviewed changes

westonruter and others added 3 commits November 29, 2023 20:43

Fix comment typo

3d7b5fa

Co-authored-by: Adam Silverstein <adamjs@google.com>

Improve naming of void tags and add missing tags which close P

957d2c4

Remove todo related to user being logged-in

e8a36b4

felixarntz reviewed Dec 1, 2023

View reviewed changes

westonruter added 5 commits December 1, 2023 13:18

Add note about how ILO_HTML_Tag_Processor is needed until WP_HTML_Pro…

0cca4dd

…cessor is fully implemented

Move class file include to load.php

83d0362

Add TODO to remove loading attribute

605fb02

Optimize looking up LCP element by breadcrumb

74145a4

Make ilo_construct_preload_links() easier to read

2758c0d

westonruter requested a review from felixarntz December 1, 2023 22:28

Never include loading=lazy on the LCP image common across all breakpo…

9d82615

…ints

westonruter commented Dec 4, 2023

View reviewed changes

Account for the same element being LCP on different breakpoints

9bfcdc9

westonruter mentioned this pull request Dec 4, 2023

Rely on server-side breadcrumbs for detection and optimization #892

Merged

3 tasks

westonruter commented Dec 4, 2023

View reviewed changes

westonruter and others added 2 commits December 5, 2023 15:26

Clarify LCP element language in comment

185ebf6

Add comment explaining why for loop is used

1d259b5

Co-authored-by: Felix Arntz <felixarntz@users.noreply.github.com>

felixarntz approved these changes Dec 6, 2023

View reviewed changes

westonruter mentioned this pull request Dec 6, 2023

Add PHPUnit tests for Image Loading Optimization #898

Merged

3 tasks

mukeshpanchal27 approved these changes Dec 7, 2023

View reviewed changes

adamsilverstein self-requested a review December 7, 2023 18:57

adamsilverstein approved these changes Dec 7, 2023

View reviewed changes

westonruter merged commit d88cc75 into feature/image-loading-optimization Dec 7, 2023

westonruter deleted the add/image-loading-optimization-optimizing branch December 7, 2023 19:04

westonruter mentioned this pull request Mar 21, 2024

Lazy-load images which are not visible in any breakpoint #1077

Closed

Comments

Conversation

westonruter commented Nov 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Relevant technical choices

Other changes

Checklist

Uh oh!

westonruter commented Nov 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dmsnell left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

felixarntz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

westonruter commented Dec 1, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

felixarntz left a comment

Choose a reason for hiding this comment

Uh oh!

westonruter commented Nov 16, 2023 •

edited

Loading

westonruter commented Nov 18, 2023 •

edited

Loading