Skip to content

Conversation

@softhack007
Copy link
Member

@softhack007 softhack007 commented Nov 1, 2024

Reworked framerate control loop, to allow more fluent animations and support higher framerates

the problem (in a nutshell)

image

solution

  • separated fps calculation (strip.show) from framerate control (strip.service)
  • improved condition for early exit in strip.service
  • made MIN_SHOW_DELAY depend on target fps
  • strip.show now considers the complete time of effect calculation + show; old code wrongly used the time between completion of last show and start of next effect drawing, causing unexpected slowdown and sometimes leading to stuttering effects.
  • added "unlimited FPS mode" for testing (use Target FPS = 120)
  • increased warning limits for "slow strip" and "slow effects" (WLED_DEBUG)

As 8266 and esp32-c3 might suffer from higher CPU load, these chips use the previous MIN_SHOW_DELAY logic that causes more idle loops.

Comparison, Using a 64x64 "network DDP" output (esp32_wrover)

  • Akemi : old code 38 fps, new code 53 fps
  • Black Hole: old code 40 fps, new code 54 fps
  • Lissajous: old code 37 fps, new code 51 fps

* separate fps calculation (strip.show) from framerate control (strio.service)
* improved condition for early exit in strip.show
* make MIN_SHOW_DELAY depend on target fps
* strip.show consideres complete time for effect calculation + show; old code wrongly used the time between completion of last show and start of next effect drawing, causing unexpected slowdown
* add "unlimited FPS mode" for testing
* increase warning limits for "slow strip" and "slow effects"
@softhack007 softhack007 marked this pull request as draft November 1, 2024 23:40
@softhack007 softhack007 requested a review from blazoncek November 1, 2024 23:40
@softhack007 softhack007 added enhancement optimization re-working an existing feature to be faster, or use less memory Awaiting testing labels Nov 1, 2024
@softhack007 softhack007 marked this pull request as ready for review November 2, 2024 00:49
@dosipod
Copy link
Contributor

dosipod commented Nov 2, 2024

The below info is from a 32x32 matrix on 4 output with target FPS set to 120, I only tested 2d effects and I used ws but also verified some from the info page (give a small margin for errors due to ws which I kept unedited ) . Visually all effects seems faster with the new code ( too fast on smaller 32x16) . Some of the effects are very hard to notice a difference and might be more apparent on larger matrix then 32x32 but for sure it is better now .
Nov_2_2024_FPS_test.csv
FPS_TEST

@softhack007 softhack007 added this to the 0.15.1 candidate milestone Nov 2, 2024
Copy link
Contributor

@blazoncek blazoncek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not intend to review but changed my mind as I think there are flaws that need addressing.
I do not know the reasons for MIN_SHOW_DELAY so that must be explored separately.

[[maybe_unused]] uint8_t tmpMode = seg.currentMode(); // this will return old mode while in transition
delay = (*_mode[seg.mode])(); // run new/current mode
frameDelay = (*_mode[seg.mode])(); // run new/current mode
if (frameDelay < speedLimit) frameDelay = FRAMETIME; // limit effects that want to go faster than target FPS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should involve _frametime not speedLimit.
_frametime reflects desired FPS (rounded).

Copy link
Member Author

@softhack007 softhack007 Nov 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

speed limit is actually 85% of _frametime. I've added it to prevent that effects that return a custom FRAMETIME exceed user defined target FPS. 85% to have some margin for "acceptable" overspeed that can be handled.

@softhack007
Copy link
Member Author

softhack007 commented Nov 2, 2024

@blazoncek thanks for your review 👍
I'll think about your comments and will consider what can be improved.

The whole logic was quite hard to "crack", so happy to get your thoughts 😃

@willmmiles
Copy link
Member

Overall comment is that I think there's a number of otherwise independent changes all glommed together here, and it does make it a little tough to review all at once.

  • delay -> frameDelay
  • Fixing the FPS calculation with _lastServiceShow
  • Adding an "unlimited FPS" feature
  • Changes to the timing parameters for fast chips

I'm not super thrilled with the conditional logic for "fast enough" chips -- it feels to me like there's a generally safe algorithm somewhere in here, struggling to get out. In some ways, high FPS is perfectly fine for fast FX even on slow chips, since they'll be servicing the wifi stack in between every one of those fast frames. The tougher challenge is delivering stable FPS, since servicing the wifi can itself be slow. I think the trick is to balance time spent computing and publishing frames with slack time left for other tasks, so that when one of those tasks wakes up, we don't have to see an FPS glitch. I think a good algorithm for that estimation will be applicable in all environments.

Finally, maybe we want to go past millis() and use ESP.getCycleCount() for improved precision? I do get the feeling that rounding off to millisecond precision is making some of these calculations trickier than they need to be.

@DedeHai
Copy link
Collaborator

DedeHai commented Nov 2, 2024

Finally, maybe we want to go past millis() and use ESP.getCycleCount() for improved precision? I do get the feeling that rounding off to millisecond precision is making some of these calculations trickier than they need to be.

Is more precision required? 1/10th of a frame at 100FPS seems plenty accurate. or which part are you referring to that needs better precision? ESP.getCycleCount() roll over happens much more frequently and may need handling.

* keep FRAMETIME_FIXED as a fixed value
* remove WLED_FPS_SLOW and FRAMETIME_FIXED_SLOW
* explicit test "(_targetFps != FPS_UNLIMITED)" for debug messages
* don't modify _lastServiceShow in show()
* test for "fps == FPS_UNLIMITED" explicitly, so we could pick a different
 magic number later
@softhack007
Copy link
Member Author

softhack007 commented Nov 2, 2024

@willmmiles some updates

  • delay -> frameDelay

this should be gone now. I've pushed a separate commit into 0_15.

  • Fixing the FPS calculation with _lastServiceShow

Maybe I should have explained better - _lastServiceShow is not just to fix the FPS calculation. Its actually needed to have a robust scheduling, that can deliver stable framerates even when there are other things that add unknown delays - like NPB.show() taking longer every second frame, or unexpected delays in the main loop.

@softhack007
Copy link
Member Author

Finally, maybe we want to go past millis() and use ESP.getCycleCount() for improved precision? I do get the feeling that rounding off to millisecond precision is making some of these calculations trickier than they need to be.

Is more precision required? 1/10th of a frame at 100FPS seems plenty accurate. or which part are you referring to that needs better precision? ESP.getCycleCount() roll over happens much more frequently and may need handling.

I'm not sure either.

In fact as we approach the higher framerates, the "air gets thin" with milliseconds time

  • 12 ms => 83 fps
  • 11 ms => 90 fps
  • 10 ms => 100 fps
  • 9 ms => 111 fps
  • 8 ms => 120 fps
  • 7 ms => 142 fps
  • 6 ms => 166 fps
  • 5 ms => 200 fps
  • 4 ms => 250 fps
  • 3 ms => 333 fps

@willmmiles
Copy link
Member

  • Fixing the FPS calculation with _lastServiceShow

Maybe I should have explained better - _lastServiceShow is not just to fix the FPS calculation. Its actually needed to have a robust scheduling, that can deliver the requested framerates even when there are other things that add unknown delays - like NPB.show() taking longer every second frame, or unexpected delays in the main loop.

True! While the metrics fix could be done as its own commit, having those accurate metrics is a prerequisite for scheduling improvements.

@softhack007
Copy link
Member Author

softhack007 commented Nov 3, 2024

In fact as we approach the higher framerates, the "air gets thin" with milliseconds time

@willmmiles @DedeHai The fps calculation in strip.show() could be changed to use esp_timer_get_time(). Its a system call returning elapsed time in micros as uint64_t. It's only availeable on esp32, but we could keep the old code for 8266.
Accurate fps calculation will be a win for performance measurements after code changes.

The effect timing code would stay as-is. For measuring FPS, rollover would not be a problem because we can simply skip measurements where last_timestamp > current_timestamp.

The MoonModules fork is using esp_timer_get_time() already, so I could clean up the code a bit, remove float calculations, and make a separate PR.

https://github.com/MoonModules/WLED/blob/ffa5445aec9a14b39fd13f8c00d6fc9489ac92bb/wled00/FX_fcn.cpp#L2073-L2083

What do you think?

@willmmiles
Copy link
Member

In fact as we approach the higher framerates, the "air gets thin" with milliseconds time

@willmmiles @DedeHai The fps calculation in strip.show() could be changed to use esp_timer_get_time(). Its a system call returning elapsed time in micros as uint64_t. It's only availeable on esp32, but we could keep the old code for 8266. Accurate fps calculation will be a win for performance measurements after code changes.

The effect timing code would stay as-is. For measuring FPS, rollover would not be a problem because we can simply skip measurements where last_timestamp > current_timestamp.

The MoonModules fork is using esp_timer_get_time() already, so I could clean up the code a bit, remove float calculations, and make a separate PR.

https://github.com/MoonModules/WLED/blob/ffa5445aec9a14b39fd13f8c00d6fc9489ac92bb/wled00/FX_fcn.cpp#L2073-L2083

What do you think?

Personally I strongly prefer solutions that work on all supported platforms, hence the suggestion of getCycleCount(). Rollovers are not an issue for relative measurements with unsigned types; new_timestamp - old_timestamp is always correct unless you've "lapped" the clock (for a 240MHz clock with a 32-bit cycle counter, that's a delay of 17.896 seconds -- not likely for an FPS counter!). ESP.getCpuFreqMHz() returns the factor needed to convert back to microseconds (and by proxy, milliseconds).

@DedeHai
Copy link
Collaborator

DedeHai commented Nov 3, 2024

I like Will's suggestion, if only relative comprisons are needed, than using getCycleCount() is the better solution. It's very easy to write a #define or inline function to return the time in microseconds or even nanoseconds.

just FYI regarding performance measurements and FPS: I have modified code I use to increase the low-pass filter time constant of getFps(), I could add that with a define that can be set as a compile parameters, so debug builds can choose how much to average the value.

@softhack007
Copy link
Member Author

softhack007 commented Nov 3, 2024

Yes @DedeHai @willmmiles what you say makes sense. I can try work out a PR during the week.

@DedeHai I'd love to see how you modified the FPS low-pass filtering. Actually the current FPS filter makes measurements very inaccurate, so any improvement is welcome 😃

@softhack007
Copy link
Member Author

softhack007 commented Nov 3, 2024

it feels to me like there's a generally safe algorithm somewhere in here, struggling to get out.

@willmmiles you might be right - I'm also not super happy with my implementation, however it solves some problems with the old code:

  • broken metric for "time since last draw"
  • too many idle loops, so testing FPS after code improvements produces misleading results
  • the new code allows users to get the "Target FPS" they wanted, if effects and strip are fast enough. The old code was very unreliable in this respect, and would produce stuttering in worst case.

The effect scheduling current works in a synchronous way, where all effects are updated at the same time (guaranteed by FRAMETIME). So all effects are "marching in lockstep".

However effects that request a different frametime may be lucky (if requested time is a multiple of the scheduler timestep) or unlucky if they are completely off sync. Luckily almost all effects have "return FRAMETIME" so they fit into the schedule.


If we want to faithfully schedule effects with different timing needs, it could not be done in an "all effects draw if needed, then strip.show updates all LEDs" way.

With different frametimes per effect, you would let some effects draw (if their time has passed), while skipping the ones which are not due yet - the current algo does this, too. But: the next scheduling cycle must be dynamic. Its the minimum of all frame delays, i.e. min(segment.next_time - strip.now). This means additional draw cycles in the main loop, plus strip.show() would be called much more frequently.

Maybe there is a better way, I'm open for any ideas and suggestions.

* _frametime ensures that effects are not serviced too often
*  MIN_SHOW_DELAY is the minimum allowed FRAMETIME that can be requested by effects
@softhack007
Copy link
Member Author

softhack007 commented Nov 4, 2024

I've simplified the scheduler logic, and removed all estimates for "lower limits". The logic is much cleaner now

  • _frametime is used to regulate the minimal time between two runs of strip.service()
  • MIN_SHOW_DELAY is the minimum FRAMETIME that can be requested by effects

The logic is now very much in-line with the original function from WS2812FX
https://github.com/kitesurfer1404/WS2812FX/blob/ca19d320217afb970163c4771d62a183f761202a/src/WS2812FX.cpp#L81-L82


side note
We'll need to take a look at the "Android" effect - it requests FRAMETIME between 3 and 25ms for a standard 1m 60leds strip. Actually the calculation breaks completely when SEGLEN > 128, where the effect will deliver a constant "3".

https://github.com/Aircoookie/WLED/blob/70323b947745e81644b955d3d16cc2bd24059636/wled00/FX.cpp#L811

imho the effect logic of "andriod" needs optimization - actually variable framerates are not needed, if the main effect code is put into a loop that handles several pixels (number depending on speed setting and SEGLEN).

@softhack007 softhack007 dismissed blazoncek’s stale review November 5, 2024 11:18

the topic is well-explained now in the discussion

@softhack007
Copy link
Member Author

@blazoncek to be honest, I'm tired of justifying my work against your vague concerns. Did you ever bother to compile and test this PR?

To all maintainers: I want to know if the changes I proposed in the PR are something you want in AC WLED (0.15.1) or not - as is or with improvements that we can still discuss.

If nobody except myself supports this PR, I'll simply close it, and go on optimizing WLED in the MoonModules fork.

@DedeHai
Copy link
Collaborator

DedeHai commented Nov 6, 2024

just by looking at the FPS improvements, there clearly is something wrong with the current code, so I appreciate the efforts you are putting into this! I do however not understand the code or the issue enough to comment on the implementation of the improvements.

@blazoncek
Copy link
Contributor

@blazoncek to be honest, I'm tired of justifying my work against your vague concerns

You requested my review, so I gave it. Feel free to do whatever you choose. I am no longer participating in WLED development.

@DedeHai
Copy link
Collaborator

DedeHai commented Nov 6, 2024

regarding the "unstable warning": while a warning is ok (keep it as short as possible to not waste flash memory) having a high FPS setting potentially corrupt files is not acceptable IMHO. Unstable as in 'it may occasionally crash' is ok, as long as it does not happen on simple setups. Users tend to ignore all warings and then go complain on discord / github.

edit:
are the reasons known why it would crash or even corrupt files? could file access be handled in a safer way for example by disabling all other tasks while files are being written?

@softhack007
Copy link
Member Author

softhack007 commented Nov 6, 2024

edit:
are the reasons known why it would crash or even corrupt files? could file access be handled in a safer way for example by disabling all other tasks while files are being written?

Hi,
if you run lots of other thinks like alexa, IR, mqtt, HA integration etc, these could - possibly - not get enough CPU time to properly function. I don't think it can corrupt files, because the main loop() is still running, just at a lower rate with fewer idle loops. Actually, the warning was more meant like "I don't know for sure, so better have a backup". Maybe the wording is too scary.

I haven't seen any real crashes due to high FPS in over a year (with a similar patch in the MM fork), but I'm mainly using esp32 and -S3, so don't know for sure if 8266 or -C3 might get into trouble when cranking up the FPS.

@DedeHai
Copy link
Collaborator

DedeHai commented Nov 6, 2024

I run my particle system Tests on a C3 with FPS set to 99 all the time. I get eventual crashes when adding lots of segments or switching FX very fast but if I just use it 'normally' then it works. I assume the crashes are mostly due to heap exhaustion. It is a simple setup, no alexa or HA but sometimes with AR enabled.

Edit: should I run the changes from this PR on a C3 for testing? What would be worst case scenarios for crashes?

@softhack007
Copy link
Member Author

regarding the "unstable warning":

@DedeHai How about this one - more compact, and less scary. "backup" is a link to sec settings page.

image

@softhack007
Copy link
Member Author

Edit: should I run the changes from this PR on a C3 for testing? What would be worst case scenarios for crashes?

yes please - behaviour on -C3 should be almost the same as before. Worst case might be that saving presets is not working, or UI reacts very slowly.

For a test - just to check if C3 can do it - change #if defined(ARDUINO_ARCH_ESP32) && !defined(CONFIG_IDF_TARGET_ESP32C3) to #if defined(ARDUINO_ARCH_ESP32) in fx.h and fx_fcn.cpp.

* removed "use 0 for unlimited"
* added "high FPS mode is experimental" warning
* added "backup first!" warning
* added anchor #backup to sec page
Copy link
Member

@willmmiles willmmiles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is shaping up well! I think once we get the intent of MIN_SHOW_DELAY sorted, we'll be ready to go.

* MIN_SHOW_DELAY -> MIN_FRAME_DELAY
* allow up to 250 for target FPS
* minor cleanup
* added specific MIN_FRAME_DELAY for -S2
@softhack007
Copy link
Member Author

softhack007 commented Nov 8, 2024

@willmmiles The new logic would also cause very frequent updates of analog PWM outputs, if users set target FPS above 60.

I don't believe it will be a problem for esp32, however I'm not sure for 8266 (its more your area 😉).
Maybe you could cross-check if 8266 PWM can take an update each 8ms?

I'm thinking about the code in BusPwm::show():
https://github.com/Aircoookie/WLED/blob/271a07a7d6a40026b184cb2c0f96c041f5165a42/wled00/bus_manager.cpp#L596

@willmmiles
Copy link
Member

@willmmiles The new logic would also cause very frequent updates of analog PWM outputs, if users set target FPS above 60.

I don't believe it will be a problem for esp32, however I'm not sure for 8266 (its more your area 😉). Maybe you could cross-check if 8266 PWM can take an update each 8ms?

I'm thinking about the code in BusPwm::show():

https://github.com/Aircoookie/WLED/blob/271a07a7d6a40026b184cb2c0f96c041f5165a42/wled00/bus_manager.cpp#L596

Hmm, yes, I have an active PR in that area: #4165

For non-phase-aligned channels, it's basically just writing the new duty cycle to some shared memory. For phase aligned channels there's a short wait (at most 1 cycle at the analog frequency) per PWM channel updated.

So I expect it'll be fine. I'll validate with a scope over the weekend. Strictly speaking the old code could theoretically update at that rate too, if tickled the right way.

@willmmiles
Copy link
Member

Glad I ran the test! The overall logic has no issues with high frame rates, but I found a case where the phase locking can break down (resulting in flickering) if the first channel is at fully on or fully off.

@DedeHai
Copy link
Collaborator

DedeHai commented Nov 21, 2024

Finally got around to test this PR. Got my upvote!
I see a huge increase in FPS on the C3 and runs rock solid. I even lowered MIN_FRAME_DELAY to zero, polled FPS reading over websocket 10x per second or refreshing the webUI manually in quick succession, while running 2 overlapping segments on 16x16 with heavy computation, FPS set to unlimited, actual FPS was around 70, meaning FX calculation was the limiting factor. In this test, I could not get it to crash or to not save a prset or config. UI loading is very fast, on par with vanilla 0.15 if not faster, even in unlimited mode with MIN_FRAME_DELAY set to 1.
I would suggest to also use the MIN_FRAME_DELAY of 4 for the C3 like it is used for S2.

@softhack007
Copy link
Member Author

I would suggest to also use the MIN_FRAME_DELAY of 4 for the C3 like it is used for S2.

@DedeHai thanks for the testing and feedback 😃
I'll change as you suggested, so S2 and C3 will use the same MIN_FRAME_DELAY of 4 (or maybe reduce to 3 for both S2 and C3).

@netmindz
Copy link
Member

netmindz commented Dec 1, 2024

Are you happy for me to merge @DedeHai , now I've tagged 0.15.0.rc.1 ?

@DedeHai
Copy link
Collaborator

DedeHai commented Dec 1, 2024

I tested it briefly on C3 and it worked, I do not understand the code enough to say if its safe to merge without beta testing.

@softhack007
Copy link
Member Author

softhack007 commented Dec 1, 2024

@netmindz I think this is safe to merge - we have similar code in MM since a long time.

The only "but" I can find is the android effect - it might run a bit slower than before, when the speed slider is set to high values. That's actually due crazy coding of this one effect.

@netmindz netmindz merged commit a873ca6 into wled:0_15 Dec 4, 2024
20 checks passed
@softhack007 softhack007 deleted the framerate_ac015 branch December 11, 2024 20:25
netmindz added a commit that referenced this pull request Feb 22, 2025
Improved framerate control code - strip.show(), strip.service()
@netmindz netmindz mentioned this pull request May 26, 2025
25 tasks
Pluto1010 pushed a commit to Pluto1010/WLED that referenced this pull request Sep 1, 2025
Improved framerate control code - strip.show(), strip.service()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement optimization re-working an existing feature to be faster, or use less memory

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants