forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 143
Uninitialized variable use #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Charles Keepax <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
In the old txt situation we add/describe only properties that are used by the driver/hardware itself. With yaml it also filters things in a node that are used by other drivers like 'power-domains' for rk3399, so add it to 'rockchip-i2s.yaml'. Signed-off-by: Johan Jonker <[email protected]> Reviewed-by: Rob Herring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
Commit 52120e8 ("dt-bindings: display: fix panel warnings") removed the dsi unit name, but missed to remove the 'reg' property, which causes the following 'make dt_binding_check' warning: Documentation/devicetree/bindings/display/panel/leadtek,ltk500hd1829.example.dts:17.5-29.11: Warning (unit_address_vs_reg): /example-0/dsi: node has a reg or ranges property, but no unit name Fix it by removing the unneeded 'reg' property. Fixes: 52120e8 ("dt-bindings: display: fix panel warnings") Signed-off-by: Fabio Estevam <[email protected]> Signed-off-by: Sam Ravnborg <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit dcde9c0) Signed-off-by: Sam Ravnborg <[email protected]>
Commit 52120e8 ("dt-bindings: display: fix panel warnings") removed the dsi unit name, but missed to remove the 'reg' property, which causes the following 'make dt_binding_check' warning: Documentation/devicetree/bindings/display/panel/xinpeng,xpp055c272.example.dts:17.5-29.11: Warning (unit_address_vs_reg): /example-0/dsi: node has a reg or ranges property, but no unit name Fix it by removing the unneeded 'reg' property. Fixes: 52120e8 ("dt-bindings: display: fix panel warnings") Signed-off-by: Fabio Estevam <[email protected]> Signed-off-by: Sam Ravnborg <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit b1e4475) Signed-off-by: Sam Ravnborg <[email protected]>
Both port and ports names may be used in a panel-lvds binding port - for a single port ports - if there is more than one port in sub-nodes Fixes the following warning: advantech,idk-2121wr.example.dt.yaml: panel-lvds: 'port' is a required property advantech,idk-2121wr.yaml needs several ports, so uses a ports node. v2: - Use oneOf - makes the logic more obvious (Rob) - Added Fixes tag - Added port: true, ports:true v3: - Indent port/ports in required two spaces (Rob) Signed-off-by: Sam Ravnborg <[email protected]> Reviewed-by: Rob Herring <[email protected]> Cc: Rob Herring <[email protected]> Fixes: 8efef33 ("dt-bindings: display: Add idk-2121wr binding") Cc: Fabrizio Castro <[email protected]> Cc: Lad Prabhakar <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Thierry Reding <[email protected]> Cc: [email protected] Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 8089a62) Signed-off-by: Sam Ravnborg <[email protected]>
Some drivers (e.g. sun4i-drm) need this info to decide whether they need to enable dithering. Currently driver reports what panel supports and if panel supports 8 we don't get dithering enabled. Hardcode BPC to 6 for now since that's the only BPC that driver supports. Fixes: 6aa1926 ("drm/bridge: Add Analogix anx6345 support") Signed-off-by: Vasily Khoruzhick <[email protected]> Acked-by: Jernej Skrabec <[email protected]> Signed-off-by: Jernej Skrabec <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Free block_cb memory when asked to be deleted. Fixes: 978703f ("netfilter: flowtable: Add API for registering to flow table events") Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Paul Blakey <[email protected]> Reviewed-by: Oz Shlomo <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
Some Google Apex Edge TPU devices have a class code of 0 (PCI_CLASS_NOT_DEFINED). This prevents the PCI core from assigning resources for the Apex BARs because __dev_sort_resources() ignores classless devices, host bridges, and IOAPICs. On x86, firmware typically assigns those resources, so this was not a problem. But on some architectures, firmware does *not* assign BARs, and since the PCI core didn't do it either, the Apex device didn't work correctly: apex 0000:01:00.0: can't enable device: BAR 0 [mem 0x00000000-0x00003fff 64bit pref] not claimed apex 0000:01:00.0: error enabling PCI device f390d08 ("staging: gasket: apex: fixup undefined PCI class") added a quirk to fix the class code, but it was in the apex driver, and if the driver was built as a module, it was too late to help. Move the quirk to the PCI core, where it will always run early enough that the PCI core will assign resources if necessary. Link: https://lore.kernel.org/r/CAEzXK1r0Er039iERnc2KJ4jn7ySNUOG9H=Ha8TD8XroVqiZjgg@mail.gmail.com Fixes: f390d08 ("staging: gasket: apex: fixup undefined PCI class") Reported-by: Luís Mendes <[email protected]> Debugged-by: Luís Mendes <[email protected]> Tested-by: Luis Mendes <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Cc: Todd Poynor <[email protected]>
If the __copy_from_user function failed we need to call sg_remove_request in sg_write. Link: https://lore.kernel.org/r/[email protected] Acked-by: Douglas Gilbert <[email protected]> Signed-off-by: Wu Bo <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Diego Elio Pettenò <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
The firmware driver is optional, but the power driver depends on it, which needs to be reflected in Kconfig to avoid link errors: aarch64-linux-ld: drivers/soc/xilinx/zynqmp_power.o: in function `zynqmp_pm_isr': zynqmp_power.c:(.text+0x284): undefined reference to `zynqmp_pm_invoke_fn' The firmware driver can probably be allowed for compile-testing as well, so it's best to drop the dependency on the ZYNQ platform here and allow building as long as the firmware code is built-in. Fixes: ab27264 ("drivers: soc: xilinx: Add ZynqMP PM driver") Signed-off-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michal Simek <[email protected]>
The function “platform_get_irq” can log an error already. Thus omit a redundant message for the exception handling in the calling function. This issue was detected by using the Coccinelle software. Fixes: 3f68be7 ("drm/meson: Add support for HDMI encoder and DW-HDMI bridge + PHY") Signed-off-by: Markus Elfring <[email protected]> Reviewed-by: Neil Armstrong <[email protected]> Signed-off-by: Neil Armstrong <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Fix the following sparse warning: drivers/firmware/xilinx/zynqmp-debug.c:38:15: warning: symbol 'firmware_debugfs_root' was not declared. Should it be static? Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michal Simek <[email protected]>
We are racing to initialize sched->thread here, just always check the current thread. Signed-off-by: Christian König <[email protected]> Reviewed-by: Andrey Grodzovsky <[email protected]> Reviewed-by: Kent Russell <[email protected]> Link: https://patchwork.freedesktop.org/patch/361303/
As mentioned slightly out of patch context in the code, there is no reset routine for the chip. On boards where the chip is supplied by a fixed regulator, it might not even be resetted during (e.g. watchdog) reboot and can be in any state. If the device is probed with VAG enabled, the driver's probe routine will generate a loud pop sound when ANA_POWER is being programmed. Avoid this by properly disabling just the VAG bit and waiting the required power down time. Signed-off-by: Sebastian Reichel <[email protected]> Reviewed-by: Fabio Estevam <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
At the moment, PCM devices for DPCM are only created based on the
dpcm_playback/capture parameters of the DAI link, without considering
if the CPU/FE DAI is actually capable of playback/capture.
Normally the dpcm_playback/capture parameter should match the
capabilities of the CPU DAI. However, there is no way to set that
parameter from the device tree (e.g. with simple-audio-card or
qcom sound cards). dpcm_playback/capture are always both set to 1.
This causes problems when the CPU DAI does only support playback
or capture. Attemting to open that PCM device with an unsupported
stream type then results in a null pointer dereference:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000128
Internal error: Oops: 96000044 [#1] PREEMPT SMP
CPU: 3 PID: 1582 Comm: arecord Not tainted 5.7.0-rc1
pc : invalidate_paths_ep+0x30/0xe0
lr : snd_soc_dapm_dai_get_connected_widgets+0x170/0x1a8
Call trace:
invalidate_paths_ep+0x30/0xe0
snd_soc_dapm_dai_get_connected_widgets+0x170/0x1a8
dpcm_path_get+0x38/0xd0
dpcm_fe_dai_open+0x70/0x920
snd_pcm_open_substream+0x564/0x840
snd_pcm_open+0xfc/0x228
snd_pcm_capture_open+0x4c/0x78
snd_open+0xac/0x1a8
...
... because the DAI playback/capture_widget is not set in that case.
We could add checks there to fix the problem (maybe we should
anyway), but much easier is to not expose the device as
playback/capture in the first place. Attemting to use that
device would always fail later anyway.
Add checks for snd_soc_dai_stream_valid() to the DPCM case
to avoid exposing playback/capture if it is not supported.
Signed-off-by: Stephan Gerhold <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
For some reason, the MI2S DAIs do not have channels_min/max defined. This means that snd_soc_dai_stream_valid() returns false, i.e. the DAIs have neither valid playback nor capture stream. It's quite surprising that this ever worked correctly, but in 5.7-rc1 this is now failing badly: :) Commit 0e9cf4c ("ASoC: pcm: check if cpu-dai supports a given stream") introduced a check for snd_soc_dai_stream_valid() before calling hw_params(), which means that the q6i2s_hw_params() function was never called, eventually resulting in: qcom-q6afe aprsvc:q6afe:4:4: no line is assigned ... even though "qcom,sd-lines" is set in the device tree. Commit 9b5db05 ("ASoC: soc-pcm: dpcm: Only allow playback/capture if supported") now even avoids creating PCM devices if the stream is not supported, which means that it is failing even earlier with e.g.: Primary MI2S: ASoC: no backend playback stream Avoid all that trouble by adding channels_min/max for the MI2S DAIs. Fixes: 24c4cbc ("ASoC: qdsp6: q6afe: Add q6afe dai driver") Signed-off-by: Stephan Gerhold <[email protected]> Reviewed-by: Srinivas Kandagatla <[email protected]> Cc: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
As done in already existing cases, we should use le32_to_cpu macro while accessing hdr->magic. Found with sparse. Signed-off-by: Amadeusz Sławiński <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
On Baytrail/Cherrytrail, the Atom/SST driver fails miserably: [ 9.741953] intel_sst_acpi 80860F28:00: FW Version 01.0c.00.01 [ 9.832992] intel_sst_acpi 80860F28:00: FW sent error response 0x40034 [ 9.833019] intel_sst_acpi 80860F28:00: FW alloc failed ret -4 [ 9.833028] intel_sst_acpi 80860F28:00: sst_get_stream returned err -5 [ 9.833033] sst-mfld-platform sst-mfld-platform: ASoC: DAI prepare error: -5 [ 9.833037] Baytrail Audio Port: ASoC: prepare FE Baytrail Audio Port failed [ 9.853942] intel_sst_acpi 80860F28:00: FW sent error response 0x40034 [ 9.853974] intel_sst_acpi 80860F28:00: FW alloc failed ret -4 [ 9.853984] intel_sst_acpi 80860F28:00: sst_get_stream returned err -5 [ 9.853990] sst-mfld-platform sst-mfld-platform: ASoC: DAI prepare error: -5 [ 9.853994] Baytrail Audio Port: ASoC: prepare FE Baytrail Audio Port failed Commit b56be80 ("ASoC: soc-pcm: call snd_soc_dai_startup()/shutdown() once") was the initial problematic commit. Commit 1ba616b ("ASoC: soc-dai: fix DAI startup/shutdown sequence") was an attempt to fix things but it does not work on Baytrail, reverting all changes seems necessary for now. Fixes: 1ba616b ("ASoC: soc-dai: fix DAI startup/shutdown sequence") Signed-off-by: Pierre-Louis Bossart <[email protected]> Tested-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
The parent SSI of a multi-SSI setup must be fully setup, started and stopped since it is also part of the playback/capture setup. So only skip the SSI (as per commit 203cdf5 ("ASoC: rsnd: SSI parent cares SWSP bit") and commit 597b046 ("ASoC: rsnd: control SSICR::EN correctly")) if the SSI is parent outside of a multi-SSI setup. Signed-off-by: Matthias Blankertz <[email protected]> Acked-by: Kuninori Morimoto <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
The HDMI?_SEL register maps up to four stereo SSI data lanes onto the sdata[0..3] inputs of the HDMI output block. The upper half of the register contains four blocks of 4 bits, with the most significant controlling the sdata3 line and the least significant the sdata0 line. The shift calculation has an off-by-one error, causing the parent SSI to be mapped to sdata3, the first multi-SSI child to sdata0 and so forth. As the parent SSI transmits the stereo L/R channels, and the HDMI core expects it on the sdata0 line, this causes no audio to be output when playing stereo audio on a multichannel capable HDMI out, and multichannel audio has permutated channels. Fix the shift calculation to map the parent SSI to sdata0, the first child to sdata1 etc. Signed-off-by: Matthias Blankertz <[email protected]> Acked-by: Kuninori Morimoto <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
If we don't find any pcm, pcm will point at address at an offset from the the list head and not a meaningful structure. Fix this by returning correct pcm if found and NULL if not. Found with coccinelle. Signed-off-by: Amadeusz Sławiński <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
…rom Matthias Blankertz <[email protected]>: This fixes two issues in the snd-soc-rcar driver blocking multichannel HDMI audio out: The parent SSI in a multi-SSI configuration is not correctly set up and started, and the SSI->HDMI channel mapping is wrong. With these patches, the following device tree snippet can be used on an r8a7795-based platform (Salvator-X) to enable multichannel HDMI audio on HDMI0: rsnd_port1: port@1 { rsnd_endpoint1: endpoint { remote-endpoint = <&dw_hdmi0_snd_in>; dai-format = "i2s"; bitclock-master = <&rsnd_endpoint1>; frame-master = <&rsnd_endpoint1>; playback = <&ssi0 &ssi1 &ssi2 &ssi9>; }; }; With a capable receiver attached, all of 2ch (stereo), 6ch (e.g. 5.1) and 8ch audio output should work. Matthias Blankertz (2): ASoC: rsnd: Fix parent SSI start/stop in multi-SSI mode ASoC: rsnd: Fix HDMI channel mapping for multi-SSI mode sound/soc/sh/rcar/ssi.c | 8 ++++---- sound/soc/sh/rcar/ssiu.c | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) base-commit: 7111951 -- 2.26.0
Since its inception the module was meant to be disabled by default, but the original commit failed to add the relevant property. Fixes: 4aba4cf ("ARM: dts: bcm2835: Add the DSI module nodes and clocks") Signed-off-by: Nicolas Saenz Julienne <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Florian Fainelli <[email protected]>
If looking up the DT "firmware-name" property fails in q6v6_probe(), the function returns without freeing the remoteproc structure that has been allocated. Fix this by jumping to the free_rproc label, which takes care of this. Signed-off-by: Alex Elder <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bjorn Andersson <[email protected]>
If an error occurs in q6v5_probe() after the proxy power domains are attached, but before qcom_add_ipa_notify_subdev() is called, qcom_remove_ipa_notify_subdev() is called in the error path, which is a bug. Fix this by having that call be reached through a different label. Additionally, if qcom_add_sysmon_subdev() returns an error, the subdevs that had already been added will not be properly removed. Fix this by having the added subdevs (including the IPA notify one) be removed in this case. Finally, arrange for the sysmon subdev to be removed before the rest in the event rproc_add() returns an error. Have cleanup activity done in q6v5_remove() be done in the reverse order they are set up in q6v5_probe() (the same order they're done in the q6v5_probe() error path). Use a local variable for the remoteproc pointer, which is used repeatedly. Remove errant semicolons at the end of two function blocks. Signed-off-by: Alex Elder <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bjorn Andersson <[email protected]>
dma_addr_t and phys_addr_t are distinct types and must not be
mixed, as both the values and the size of the type may be
different depending on what the remote device uses.
In this driver the compiler warns when the two types are different:
drivers/remoteproc/mtk_scp.c: In function 'scp_map_memory_region':
drivers/remoteproc/mtk_scp.c:454:9: error: passing argument 3 of 'dma_alloc_coherent' from incompatible pointer type [-Werror=incompatible-pointer-types]
454 | &scp->phys_addr, GFP_KERNEL);
| ^~~~~~~~~~~~~~~
| |
| phys_addr_t * {aka unsigned int *}
In file included from drivers/remoteproc/mtk_scp.c:7:
include/linux/dma-mapping.h:642:15: note: expected 'dma_addr_t *' {aka 'long long unsigned int *'} but argument is of type 'phys_addr_t *' {aka 'unsigned int *'}
642 | dma_addr_t *dma_handle, gfp_t gfp)
Change the phys_addr member to be typed and named according
to how it is allocated.
Fixes: 63c13d6 ("remoteproc/mediatek: add SCP support for mt8183")
Signed-off-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bjorn Andersson <[email protected]>
Previously, http://wireless.kernel.org would redirect to https://wireless.wiki.kernel.org, however, this is no longer the case and most pages return 404. https is used because http://wireless.kernel.org/* redirects to https://wireless.wiki.kernel.org/* Signed-off-by: Nils ANDRÉ-CHANG <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/20200415140052.2lftkixe37llmtjl@nixos
Removing Avinash since tomorrow is his last day at Quantenna. Signed-off-by: Sergey Matyukevich <[email protected]> Signed-off-by: Igor Mitsyanko <[email protected]> Signed-off-by: Avinash Patil <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected]
As the recent regression showed, we want sometimes to turn off the audio component binding just for debugging. This patch adds the module option to control it easily without compilation. Fixes: ade49db ("ALSA: hda/hdmi - Allow audio component for AMD/ATI and Nvidia HDMI") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207223 Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Nov 14, 2025
When freeing indexed arrays, the corresponding free function should
be called for each entry of the indexed array. For example, for
for 'struct tc_act_attrs' 'tc_act_attrs_free(...)' needs to be called
for each entry.
Previously, memory leaks were reported when enabling the ASAN
analyzer.
=================================================================
==874==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
#1 0x55c98db048af in tc_act_attrs_set_options_vlan_parms ../generated/tc-user.h:2813
#2 0x55c98db048af in main ./linux/tools/net/ynl/samples/tc-filter-add.c:71
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
#1 0x55c98db04a93 in tc_act_attrs_set_options_vlan_parms ../generated/tc-user.h:2813
#2 0x55c98db04a93 in main ./linux/tools/net/ynl/samples/tc-filter-add.c:74
Direct leak of 10 byte(s) in 2 object(s) allocated from:
#0 0x7f221fd20cb5 in malloc ./debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:67
#1 0x55c98db0527d in tc_act_attrs_set_kind ../generated/tc-user.h:1622
SUMMARY: AddressSanitizer: 58 byte(s) leaked in 4 allocation(s).
The following diff illustrates the changes introduced compared to the
previous version of the code.
void tc_flower_attrs_free(struct tc_flower_attrs *obj)
{
+ unsigned int i;
+
free(obj->indev);
+ for (i = 0; i < obj->_count.act; i++)
+ tc_act_attrs_free(&obj->act[i]);
free(obj->act);
free(obj->key_eth_dst);
free(obj->key_eth_dst_mask);
Signed-off-by: Zahari Doychev <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Nov 24, 2025
When emulating an nvme device on qemu with both logical_block_size and physical_block_size set to 8 KiB, but without format, a kernel panic was triggered during the early boot stage while attempting to mount a vfat filesystem. [95553.682035] EXT4-fs (nvme0n1): unable to set blocksize [95553.684326] EXT4-fs (nvme0n1): unable to set blocksize [95553.686501] EXT4-fs (nvme0n1): unable to set blocksize [95553.696448] ISOFS: unsupported/invalid hardware sector size 8192 [95553.697117] ------------[ cut here ]------------ [95553.697567] kernel BUG at fs/buffer.c:1582! [95553.697984] Oops: invalid opcode: 0000 [#1] SMP NOPTI [95553.698602] CPU: 0 UID: 0 PID: 7212 Comm: mount Kdump: loaded Not tainted 6.18.0-rc2+ torvalds#38 PREEMPT(voluntary) [95553.699511] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [95553.700534] RIP: 0010:folio_alloc_buffers+0x1bb/0x1c0 [95553.701018] Code: 48 8b 15 e8 93 18 02 65 48 89 35 e0 93 18 02 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc <0f> 0b 90 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f [95553.702648] RSP: 0018:ffffd1b0c676f990 EFLAGS: 00010246 [95553.703132] RAX: ffff8cfc4176d820 RBX: 0000000000508c48 RCX: 0000000000000001 [95553.703805] RDX: 0000000000002000 RSI: 0000000000000000 RDI: 0000000000000000 [95553.704481] RBP: ffffd1b0c676f9c8 R08: 0000000000000000 R09: 0000000000000000 [95553.705148] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 [95553.705816] R13: 0000000000002000 R14: fffff8bc8257e800 R15: 0000000000000000 [95553.706483] FS: 000072ee77315840(0000) GS:ffff8cfdd2c8d000(0000) knlGS:0000000000000000 [95553.707248] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [95553.707782] CR2: 00007d8f2a9e5a20 CR3: 0000000039d0c006 CR4: 0000000000772ef0 [95553.708439] PKRU: 55555554 [95553.708734] Call Trace: [95553.709015] <TASK> [95553.709266] __getblk_slow+0xd2/0x230 [95553.709641] ? find_get_block_common+0x8b/0x530 [95553.710084] bdev_getblk+0x77/0xa0 [95553.710449] __bread_gfp+0x22/0x140 [95553.710810] fat_fill_super+0x23a/0xfc0 [95553.711216] ? __pfx_setup+0x10/0x10 [95553.711580] ? __pfx_vfat_fill_super+0x10/0x10 [95553.712014] vfat_fill_super+0x15/0x30 [95553.712401] get_tree_bdev_flags+0x141/0x1e0 [95553.712817] get_tree_bdev+0x10/0x20 [95553.713177] vfat_get_tree+0x15/0x20 [95553.713550] vfs_get_tree+0x2a/0x100 [95553.713910] vfs_cmd_create+0x62/0xf0 [95553.714273] __do_sys_fsconfig+0x4e7/0x660 [95553.714669] __x64_sys_fsconfig+0x20/0x40 [95553.715062] x64_sys_call+0x21ee/0x26a0 [95553.715453] do_syscall_64+0x80/0x670 [95553.715816] ? __fs_parse+0x65/0x1e0 [95553.716172] ? fat_parse_param+0x103/0x4b0 [95553.716587] ? vfs_parse_fs_param_source+0x21/0xa0 [95553.717034] ? __do_sys_fsconfig+0x3d9/0x660 [95553.717548] ? __x64_sys_fsconfig+0x20/0x40 [95553.717957] ? x64_sys_call+0x21ee/0x26a0 [95553.718360] ? do_syscall_64+0xb8/0x670 [95553.718734] ? __x64_sys_fsconfig+0x20/0x40 [95553.719141] ? x64_sys_call+0x21ee/0x26a0 [95553.719545] ? do_syscall_64+0xb8/0x670 [95553.719922] ? x64_sys_call+0x1405/0x26a0 [95553.720317] ? do_syscall_64+0xb8/0x670 [95553.720702] ? __x64_sys_close+0x3e/0x90 [95553.721080] ? x64_sys_call+0x1b5e/0x26a0 [95553.721478] ? do_syscall_64+0xb8/0x670 [95553.721841] ? irqentry_exit+0x43/0x50 [95553.722211] ? exc_page_fault+0x90/0x1b0 [95553.722681] entry_SYSCALL_64_after_hwframe+0x76/0x7e [95553.723166] RIP: 0033:0x72ee774f3afe [95553.723562] Code: 73 01 c3 48 8b 0d 0a 33 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 49 89 ca b8 af 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d da 32 0f 00 f7 d8 64 89 01 48 [95553.725188] RSP: 002b:00007ffe97148978 EFLAGS: 00000246 ORIG_RAX: 00000000000001af [95553.725892] RAX: ffffffffffffffda RBX: 00005dcfe53d0080 RCX: 000072ee774f3afe [95553.726526] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003 [95553.727176] RBP: 00007ffe97148ac0 R08: 0000000000000000 R09: 000072ee775e7ac0 [95553.727818] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [95553.728459] R13: 00005dcfe53d04b0 R14: 000072ee77670b00 R15: 00005dcfe53d1a28 [95553.729086] </TASK> The panic occurs as follows: 1. logical_block_size is 8KiB, causing {struct super_block *sb}->s_blocksize is initialized to 0. vfat_fill_super - fat_fill_super - sb_min_blocksize - sb_set_blocksize //return 0 when size is 8KiB. 2. __bread_gfp is called with size == 0, causing folio_alloc_buffers() to compute an offset equal to folio_size(folio), which triggers a BUG_ON. fat_fill_super - sb_bread - __bread_gfp // size == {struct super_block *sb}->s_blocksize == 0 - bdev_getblk - __getblk_slow - grow_buffers - grow_dev_folio - folio_alloc_buffers // size == 0 - folio_set_bh //offset == folio_size(folio) and panic To fix this issue, add proper return value checks for sb_min_blocksize(). Cc: [email protected] # v6.15 Fixes: a64e5a5 ("bdev: add back PAGE_SIZE block size validation for sb_set_blocksize()") Reviewed-by: Matthew Wilcox <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: OGAWA Hirofumi <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Yongpeng Yang <[email protected]> Link: https://patch.msgid.link/[email protected] Acked-by: OGAWA Hirofumi <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Nov 24, 2025
If the allocation of tl_hba->sh fails in tcm_loop_driver_probe() and we attempt to dereference it in tcm_loop_tpg_address_show() we will get a segfault, see below for an example. So, check tl_hba->sh before dereferencing it. Unable to allocate struct scsi_host BUG: kernel NULL pointer dereference, address: 0000000000000194 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 8356 Comm: tokio-runtime-w Not tainted 6.6.104.2-4.azl3 #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/28/2024 RIP: 0010:tcm_loop_tpg_address_show+0x2e/0x50 [tcm_loop] ... Call Trace: <TASK> configfs_read_iter+0x12d/0x1d0 [configfs] vfs_read+0x1b5/0x300 ksys_read+0x6f/0xf0 ... Cc: [email protected] Fixes: 2628b35 ("tcm_loop: Show address of tpg in configfs") Signed-off-by: Hamza Mahfooz <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Allen Pais <[email protected]> Link: https://patch.msgid.link/1762370746-6304-1-git-send-email-hamzamahfooz@linux.microsoft.com Signed-off-by: Martin K. Petersen <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Nov 24, 2025
The kernel test has reported: BUG: unable to handle page fault for address: fffba000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page *pde = 03171067 *pte = 00000000 Oops: Oops: 0002 [#1] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G T 6.18.0-rc2-00031-gec7f31b2a2d3 #1 NONE a1d066dfe789f54bc7645c7989957d2bdee593ca Tainted: [T]=RANDSTRUCT Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 EIP: memset (arch/x86/include/asm/string_32.h:168 arch/x86/lib/memcpy_32.c:17) Code: a5 8b 4d f4 83 e1 03 74 02 f3 a4 83 c4 04 5e 5f 5d 2e e9 73 41 01 00 90 90 90 3e 8d 74 26 00 55 89 e5 57 56 89 c6 89 d0 89 f7 <f3> aa 89 f0 5e 5f 5d 2e e9 53 41 01 00 cc cc cc 55 89 e5 53 57 56 EAX: 0000006b EBX: 00000015 ECX: 001fefff EDX: 0000006b ESI: fffb9000 EDI: fffba000 EBP: c611fbf0 ESP: c611fbe8 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010287 CR0: 80050033 CR2: fffba000 CR3: 0316e000 CR4: 00040690 Call Trace: poison_element (mm/mempool.c:83 mm/mempool.c:102) mempool_init_node (mm/mempool.c:142 mm/mempool.c:226) mempool_init_noprof (mm/mempool.c:250 (discriminator 1)) ? mempool_alloc_pages (mm/mempool.c:640) bio_integrity_initfn (block/bio-integrity.c:483 (discriminator 8)) ? mempool_alloc_pages (mm/mempool.c:640) do_one_initcall (init/main.c:1283) Christoph found out this is due to the poisoning code not dealing properly with CONFIG_HIGHMEM because only the first page is mapped but then the whole potentially high-order page is accessed. We could give up on HIGHMEM here, but it's straightforward to fix this with a loop that's mapping, poisoning or checking and unmapping individual pages. Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Analyzed-by: Christoph Hellwig <[email protected]> Fixes: bdfedb7 ("mm, mempool: poison elements backed by slab allocator") Cc: [email protected] Tested-by: kernel test robot <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Vlastimil Babka <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Nov 24, 2025
The validation of the set(nsh(...)) action is completely wrong. It runs through the nsh_key_put_from_nlattr() function that is the same function that validates NSH keys for the flow match and the push_nsh() action. However, the set(nsh(...)) has a very different memory layout. Nested attributes in there are doubled in size in case of the masked set(). That makes proper validation impossible. There is also confusion in the code between the 'masked' flag, that says that the nested attributes are doubled in size containing both the value and the mask, and the 'is_mask' that says that the value we're parsing is the mask. This is causing kernel crash on trying to write into mask part of the match with SW_FLOW_KEY_PUT() during validation, while validate_nsh() doesn't allocate any memory for it: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1c2383067 P4D 1c2383067 PUD 20b703067 PMD 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 8 UID: 0 Kdump: loaded Not tainted 6.17.0-rc4+ torvalds#107 PREEMPT(voluntary) RIP: 0010:nsh_key_put_from_nlattr+0x19d/0x610 [openvswitch] Call Trace: <TASK> validate_nsh+0x60/0x90 [openvswitch] validate_set.constprop.0+0x270/0x3c0 [openvswitch] __ovs_nla_copy_actions+0x477/0x860 [openvswitch] ovs_nla_copy_actions+0x8d/0x100 [openvswitch] ovs_packet_cmd_execute+0x1cc/0x310 [openvswitch] genl_family_rcv_msg_doit+0xdb/0x130 genl_family_rcv_msg+0x14b/0x220 genl_rcv_msg+0x47/0xa0 netlink_rcv_skb+0x53/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x280/0x3b0 netlink_sendmsg+0x1f7/0x430 ____sys_sendmsg+0x36b/0x3a0 ___sys_sendmsg+0x87/0xd0 __sys_sendmsg+0x6d/0xd0 do_syscall_64+0x7b/0x2c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The third issue with this process is that while trying to convert the non-masked set into masked one, validate_set() copies and doubles the size of the OVS_KEY_ATTR_NSH as if it didn't have any nested attributes. It should be copying each nested attribute and doubling them in size independently. And the process must be properly reversed during the conversion back from masked to a non-masked variant during the flow dump. In the end, the only two outcomes of trying to use this action are either validation failure or a kernel crash. And if somehow someone manages to install a flow with such an action, it will most definitely not do what it is supposed to, since all the keys and the masks are mixed up. Fixing all the issues is a complex task as it requires re-writing most of the validation code. Given that and the fact that this functionality never worked since introduction, let's just remove it altogether. It's better to re-introduce it later with a proper implementation instead of trying to fix it in stable releases. Fixes: b2d0f5d ("openvswitch: enable NSH support") Reported-by: Junvy Yang <[email protected]> Signed-off-by: Ilya Maximets <[email protected]> Acked-by: Eelco Chaudron <[email protected]> Reviewed-by: Aaron Conole <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Nov 24, 2025
When crashkernel is configured with a high reservation, shrinking its value below the low crashkernel reservation causes two issues: 1. Invalid crashkernel resource objects 2. Kernel crash if crashkernel shrinking is done twice For example, with crashkernel=200M,high, the kernel reserves 200MB of high memory and some default low memory (say 256MB). The reservation appears as: cat /proc/iomem | grep -i crash af000000-beffffff : Crash kernel 433000000-43f7fffff : Crash kernel If crashkernel is then shrunk to 50MB (echo 52428800 > /sys/kernel/kexec_crash_size), /proc/iomem still shows 256MB reserved: af000000-beffffff : Crash kernel Instead, it should show 50MB: af000000-b21fffff : Crash kernel Further shrinking crashkernel to 40MB causes a kernel crash with the following trace (x86): BUG: kernel NULL pointer dereference, address: 0000000000000038 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI <snip...> Call Trace: <TASK> ? __die_body.cold+0x19/0x27 ? page_fault_oops+0x15a/0x2f0 ? search_module_extables+0x19/0x60 ? search_bpf_extables+0x5f/0x80 ? exc_page_fault+0x7e/0x180 ? asm_exc_page_fault+0x26/0x30 ? __release_resource+0xd/0xb0 release_resource+0x26/0x40 __crash_shrink_memory+0xe5/0x110 crash_shrink_memory+0x12a/0x190 kexec_crash_size_store+0x41/0x80 kernfs_fop_write_iter+0x141/0x1f0 vfs_write+0x294/0x460 ksys_write+0x6d/0xf0 <snip...> This happens because __crash_shrink_memory()/kernel/crash_core.c incorrectly updates the crashk_res resource object even when crashk_low_res should be updated. Fix this by ensuring the correct crashkernel resource object is updated when shrinking crashkernel memory. Link: https://lkml.kernel.org/r/[email protected] Fixes: 16c6006 ("kexec: enable kexec_crash_size to support two crash kernel regions") Signed-off-by: Sourabh Jain <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Zhen Lei <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Nov 24, 2025
nvme_fc_delete_assocation() waits for pending I/O to complete before returning, and an error can cause ->ioerr_work to be queued after cancel_work_sync() had been called. Move the call to cancel_work_sync() to be after nvme_fc_delete_association() to ensure ->ioerr_work is not running when the nvme_fc_ctrl object is freed. Otherwise the following can occur: [ 1135.911754] list_del corruption, ff2d24c8093f31f8->next is NULL [ 1135.917705] ------------[ cut here ]------------ [ 1135.922336] kernel BUG at lib/list_debug.c:52! [ 1135.926784] Oops: invalid opcode: 0000 [#1] SMP NOPTI [ 1135.931851] CPU: 48 UID: 0 PID: 726 Comm: kworker/u449:23 Kdump: loaded Not tainted 6.12.0 #1 PREEMPT(voluntary) [ 1135.943490] Hardware name: Dell Inc. PowerEdge R660/0HGTK9, BIOS 2.5.4 01/16/2025 [ 1135.950969] Workqueue: 0x0 (nvme-wq) [ 1135.954673] RIP: 0010:__list_del_entry_valid_or_report.cold+0xf/0x6f [ 1135.961041] Code: c7 c7 98 68 72 94 e8 26 45 fe ff 0f 0b 48 c7 c7 70 68 72 94 e8 18 45 fe ff 0f 0b 48 89 fe 48 c7 c7 80 69 72 94 e8 07 45 fe ff <0f> 0b 48 89 d1 48 c7 c7 a0 6a 72 94 48 89 c2 e8 f3 44 fe ff 0f 0b [ 1135.979788] RSP: 0018:ff579b19482d3e50 EFLAGS: 00010046 [ 1135.985015] RAX: 0000000000000033 RBX: ff2d24c8093f31f0 RCX: 0000000000000000 [ 1135.992148] RDX: 0000000000000000 RSI: ff2d24d6bfa1d0c0 RDI: ff2d24d6bfa1d0c0 [ 1135.999278] RBP: ff2d24c8093f31f8 R08: 0000000000000000 R09: ffffffff951e2b08 [ 1136.006413] R10: ffffffff95122ac8 R11: 0000000000000003 R12: ff2d24c78697c100 [ 1136.013546] R13: fffffffffffffff8 R14: 0000000000000000 R15: ff2d24c78697c0c0 [ 1136.020677] FS: 0000000000000000(0000) GS:ff2d24d6bfa00000(0000) knlGS:0000000000000000 [ 1136.028765] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1136.034510] CR2: 00007fd207f90b80 CR3: 000000163ea22003 CR4: 0000000000f73ef0 [ 1136.041641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1136.048776] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 1136.055910] PKRU: 55555554 [ 1136.058623] Call Trace: [ 1136.061074] <TASK> [ 1136.063179] ? show_trace_log_lvl+0x1b0/0x2f0 [ 1136.067540] ? show_trace_log_lvl+0x1b0/0x2f0 [ 1136.071898] ? move_linked_works+0x4a/0xa0 [ 1136.075998] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.081744] ? __die_body.cold+0x8/0x12 [ 1136.085584] ? die+0x2e/0x50 [ 1136.088469] ? do_trap+0xca/0x110 [ 1136.091789] ? do_error_trap+0x65/0x80 [ 1136.095543] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.101289] ? exc_invalid_op+0x50/0x70 [ 1136.105127] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.110874] ? asm_exc_invalid_op+0x1a/0x20 [ 1136.115059] ? __list_del_entry_valid_or_report.cold+0xf/0x6f [ 1136.120806] move_linked_works+0x4a/0xa0 [ 1136.124733] worker_thread+0x216/0x3a0 [ 1136.128485] ? __pfx_worker_thread+0x10/0x10 [ 1136.132758] kthread+0xfa/0x240 [ 1136.135904] ? __pfx_kthread+0x10/0x10 [ 1136.139657] ret_from_fork+0x31/0x50 [ 1136.143236] ? __pfx_kthread+0x10/0x10 [ 1136.146988] ret_from_fork_asm+0x1a/0x30 [ 1136.150915] </TASK> Fixes: 19fce04 ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context") Cc: [email protected] Tested-by: Marco Patalano <[email protected]> Reviewed-by: Justin Tee <[email protected]> Signed-off-by: Ewan D. Milne <[email protected]> Signed-off-by: Keith Busch <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Nov 24, 2025
The function devl_rate_nodes_destroy is documented to "Unset parent for all rate objects". However, it was only calling the driver-specific `rate_leaf_parent_set` or `rate_node_parent_set` ops and decrementing the parent's refcount, without actually setting the `devlink_rate->parent` pointer to NULL. This leaves a dangling pointer in the `devlink_rate` struct, which cause refcount error in netdevsim[1] and mlx5[2]. In addition, this is inconsistent with the behavior of `devlink_nl_rate_parent_node_set`, where the parent pointer is correctly cleared. This patch fixes the issue by explicitly setting `devlink_rate->parent` to NULL after notifying the driver, thus fulfilling the function's documented behavior for all rate objects. [1] repro steps: echo 1 > /sys/bus/netdevsim/new_device devlink dev eswitch set netdevsim/netdevsim1 mode switchdev echo 1 > /sys/bus/netdevsim/devices/netdevsim1/sriov_numvfs devlink port function rate add netdevsim/netdevsim1/test_node devlink port function rate set netdevsim/netdevsim1/128 parent test_node echo 1 > /sys/bus/netdevsim/del_device dmesg: refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 8 PID: 1530 at lib/refcount.c:31 refcount_warn_saturate+0x42/0xe0 CPU: 8 UID: 0 PID: 1530 Comm: bash Not tainted 6.18.0-rc4+ #1 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:refcount_warn_saturate+0x42/0xe0 Call Trace: <TASK> devl_rate_leaf_destroy+0x8d/0x90 __nsim_dev_port_del+0x6c/0x70 [netdevsim] nsim_dev_reload_destroy+0x11c/0x140 [netdevsim] nsim_drv_remove+0x2b/0xb0 [netdevsim] device_release_driver_internal+0x194/0x1f0 bus_remove_device+0xc6/0x130 device_del+0x159/0x3c0 device_unregister+0x1a/0x60 del_device_store+0x111/0x170 [netdevsim] kernfs_fop_write_iter+0x12e/0x1e0 vfs_write+0x215/0x3d0 ksys_write+0x5f/0xd0 do_syscall_64+0x55/0x10f0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 [2] devlink dev eswitch set pci/0000:08:00.0 mode switchdev devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 1000 devlink port function rate add pci/0000:08:00.0/group1 devlink port function rate set pci/0000:08:00.0/32768 parent group1 modprobe -r mlx5_ib mlx5_fwctl mlx5_core dmesg: refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 7 PID: 16151 at lib/refcount.c:31 refcount_warn_saturate+0x42/0xe0 CPU: 7 UID: 0 PID: 16151 Comm: bash Not tainted 6.17.0-rc7_for_upstream_min_debug_2025_10_02_12_44 #1 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:refcount_warn_saturate+0x42/0xe0 Call Trace: <TASK> devl_rate_leaf_destroy+0x8d/0x90 mlx5_esw_offloads_devlink_port_unregister+0x33/0x60 [mlx5_core] mlx5_esw_offloads_unload_rep+0x3f/0x50 [mlx5_core] mlx5_eswitch_unload_sf_vport+0x40/0x90 [mlx5_core] mlx5_sf_esw_event+0xc4/0x120 [mlx5_core] notifier_call_chain+0x33/0xa0 blocking_notifier_call_chain+0x3b/0x50 mlx5_eswitch_disable_locked+0x50/0x110 [mlx5_core] mlx5_eswitch_disable+0x63/0x90 [mlx5_core] mlx5_unload+0x1d/0x170 [mlx5_core] mlx5_uninit_one+0xa2/0x130 [mlx5_core] remove_one+0x78/0xd0 [mlx5_core] pci_device_remove+0x39/0xa0 device_release_driver_internal+0x194/0x1f0 unbind_store+0x99/0xa0 kernfs_fop_write_iter+0x12e/0x1e0 vfs_write+0x215/0x3d0 ksys_write+0x5f/0xd0 do_syscall_64+0x53/0x1f0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: d755598 ("devlink: Allow setting parent node of rate objects") Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Carolina Jubran <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Nov 24, 2025
The mlx5_irq_alloc() function can inadvertently free the entire rmap and end up in a crash[1] when the other threads tries to access this, when request_irq() fails due to exhausted IRQ vectors. This commit modifies the cleanup to remove only the specific IRQ mapping that was just added. This prevents removal of other valid mappings and ensures precise cleanup of the failed IRQ allocation's associated glue object. Note: This error is observed when both fwctl and rds configs are enabled. [1] mlx5_core 0000:05:00.0: Successfully registered panic handler for port 1 mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to request irq. err = -28 infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while trying to test write-combining support mlx5_core 0000:05:00.0: Successfully unregistered panic handler for port 1 mlx5_core 0000:06:00.0: Successfully registered panic handler for port 1 mlx5_core 0000:06:00.0: mlx5_irq_alloc:293:(pid 66740): Failed to request irq. err = -28 infiniband mlx5_0: mlx5_ib_test_wc:290:(pid 66740): Error -28 while trying to test write-combining support mlx5_core 0000:06:00.0: Successfully unregistered panic handler for port 1 mlx5_core 0000:03:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to request irq. err = -28 mlx5_core 0000:05:00.0: mlx5_irq_alloc:293:(pid 28895): Failed to request irq. err = -28 general protection fault, probably for non-canonical address 0xe277a58fde16f291: 0000 [#1] SMP NOPTI RIP: 0010:free_irq_cpu_rmap+0x23/0x7d Call Trace: <TASK> ? show_trace_log_lvl+0x1d6/0x2f9 ? show_trace_log_lvl+0x1d6/0x2f9 ? mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core] ? __die_body.cold+0x8/0xa ? die_addr+0x39/0x53 ? exc_general_protection+0x1c4/0x3e9 ? dev_vprintk_emit+0x5f/0x90 ? asm_exc_general_protection+0x22/0x27 ? free_irq_cpu_rmap+0x23/0x7d mlx5_irq_alloc.cold+0x5d/0xf3 [mlx5_core] irq_pool_request_vector+0x7d/0x90 [mlx5_core] mlx5_irq_request+0x2e/0xe0 [mlx5_core] mlx5_irq_request_vector+0xad/0xf7 [mlx5_core] comp_irq_request_pci+0x64/0xf0 [mlx5_core] create_comp_eq+0x71/0x385 [mlx5_core] ? mlx5e_open_xdpsq+0x11c/0x230 [mlx5_core] mlx5_comp_eqn_get+0x72/0x90 [mlx5_core] ? xas_load+0x8/0x91 mlx5_comp_irqn_get+0x40/0x90 [mlx5_core] mlx5e_open_channel+0x7d/0x3c7 [mlx5_core] mlx5e_open_channels+0xad/0x250 [mlx5_core] mlx5e_open_locked+0x3e/0x110 [mlx5_core] mlx5e_open+0x23/0x70 [mlx5_core] __dev_open+0xf1/0x1a5 __dev_change_flags+0x1e1/0x249 dev_change_flags+0x21/0x5c do_setlink+0x28b/0xcc4 ? __nla_parse+0x22/0x3d ? inet6_validate_link_af+0x6b/0x108 ? cpumask_next+0x1f/0x35 ? __snmp6_fill_stats64.constprop.0+0x66/0x107 ? __nla_validate_parse+0x48/0x1e6 __rtnl_newlink+0x5ff/0xa57 ? kmem_cache_alloc_trace+0x164/0x2ce rtnl_newlink+0x44/0x6e rtnetlink_rcv_msg+0x2bb/0x362 ? __netlink_sendskb+0x4c/0x6c ? netlink_unicast+0x28f/0x2ce ? rtnl_calcit.isra.0+0x150/0x146 netlink_rcv_skb+0x5f/0x112 netlink_unicast+0x213/0x2ce netlink_sendmsg+0x24f/0x4d9 __sock_sendmsg+0x65/0x6a ____sys_sendmsg+0x28f/0x2c9 ? import_iovec+0x17/0x2b ___sys_sendmsg+0x97/0xe0 __sys_sendmsg+0x81/0xd8 do_syscall_64+0x35/0x87 entry_SYSCALL_64_after_hwframe+0x6e/0x0 RIP: 0033:0x7fc328603727 Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48 RSP: 002b:00007ffe8eb3f1a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fc328603727 RDX: 0000000000000000 RSI: 00007ffe8eb3f1f0 RDI: 000000000000000d RBP: 00007ffe8eb3f1f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 R13: 0000000000000000 R14: 00007ffe8eb3f3c8 R15: 00007ffe8eb3f3bc </TASK> ---[ end trace f43ce73c3c2b13a2 ]--- RIP: 0010:free_irq_cpu_rmap+0x23/0x7d Code: 0f 1f 80 00 00 00 00 48 85 ff 74 6b 55 48 89 fd 53 66 83 7f 06 00 74 24 31 db 48 8b 55 08 0f b7 c3 48 8b 04 c2 48 85 c0 74 09 <8b> 38 31 f6 e8 c4 0a b8 ff 83 c3 01 66 3b 5d 06 72 de b8 ff ff ff RSP: 0018:ff384881640eaca0 EFLAGS: 00010282 RAX: e277a58fde16f291 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ff2335e2e20b3600 RSI: 0000000000000000 RDI: ff2335e2e20b3400 RBP: ff2335e2e20b3400 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 00000000ffffffe4 R12: ff384881640ead88 R13: ff2335c3760751e0 R14: ff2335e2e1672200 R15: ff2335c3760751f8 FS: 00007fc32ac22480(0000) GS:ff2335e2d6e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f651ab54000 CR3: 00000029f1206003 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Kernel panic - not syncing: Fatal exception Kernel Offset: 0x1dc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) kvm-guest: disable async PF for cpu 0 Fixes: 3354822 ("net/mlx5: Use dynamic msix vectors allocation") Signed-off-by: Mohith Kumar Thummaluru<[email protected]> Tested-by: Mohith Kumar Thummaluru<[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Reviewed-by: Shay Drori <[email protected]> Signed-off-by: Pradyumn Rahar <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 1, 2025
A synchronous external abort occurs on the Renesas RZ/G3S SoC if unbind is
executed after the configuration sequence described above:
modprobe usb_f_ecm
modprobe libcomposite
modprobe configfs
cd /sys/kernel/config/usb_gadget
mkdir -p g1
cd g1
echo "0x1d6b" > idVendor
echo "0x0104" > idProduct
mkdir -p strings/0x409
echo "0123456789" > strings/0x409/serialnumber
echo "Renesas." > strings/0x409/manufacturer
echo "Ethernet Gadget" > strings/0x409/product
mkdir -p functions/ecm.usb0
mkdir -p configs/c.1
mkdir -p configs/c.1/strings/0x409
echo "ECM" > configs/c.1/strings/0x409/configuration
if [ ! -L configs/c.1/ecm.usb0 ]; then
ln -s functions/ecm.usb0 configs/c.1
fi
echo 11e20000.usb > UDC
echo 11e20000.usb > /sys/bus/platform/drivers/renesas_usbhs/unbind
The displayed trace is as follows:
Internal error: synchronous external abort: 0000000096000010 [#1] SMP
CPU: 0 UID: 0 PID: 188 Comm: sh Tainted: G M 6.17.0-rc7-next-20250922-00010-g41050493b2bd torvalds#55 PREEMPT
Tainted: [M]=MACHINE_CHECK
Hardware name: Renesas SMARC EVK version 2 based on r9a08g045s33 (DT)
pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : usbhs_sys_function_pullup+0x10/0x40 [renesas_usbhs]
lr : usbhsg_update_pullup+0x3c/0x68 [renesas_usbhs]
sp : ffff8000838b3920
x29: ffff8000838b3920 x28: ffff00000d585780 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000000 x24: ffff00000c3e3810
x23: ffff00000d5e5c80 x22: ffff00000d5e5d40 x21: 0000000000000000
x20: 0000000000000000 x19: ffff00000d5e5c80 x18: 0000000000000020
x17: 2e30303230316531 x16: 312d7968703a7968 x15: 3d454d414e5f4344
x14: 000000000000002c x13: 0000000000000000 x12: 0000000000000000
x11: ffff00000f358f38 x10: ffff00000f358db0 x9 : ffff00000b41f418
x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff6364626d
x5 : 8080808000000000 x4 : 000000004b5ccb9d x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff800083790000 x0 : ffff00000d5e5c80
Call trace:
usbhs_sys_function_pullup+0x10/0x40 [renesas_usbhs] (P)
usbhsg_pullup+0x4c/0x7c [renesas_usbhs]
usb_gadget_disconnect_locked+0x48/0xd4
gadget_unbind_driver+0x44/0x114
device_remove+0x4c/0x80
device_release_driver_internal+0x1c8/0x224
device_release_driver+0x18/0x24
bus_remove_device+0xcc/0x10c
device_del+0x14c/0x404
usb_del_gadget+0x88/0xc0
usb_del_gadget_udc+0x18/0x30
usbhs_mod_gadget_remove+0x24/0x44 [renesas_usbhs]
usbhs_mod_remove+0x20/0x30 [renesas_usbhs]
usbhs_remove+0x98/0xdc [renesas_usbhs]
platform_remove+0x20/0x30
device_remove+0x4c/0x80
device_release_driver_internal+0x1c8/0x224
device_driver_detach+0x18/0x24
unbind_store+0xb4/0xb8
drv_attr_store+0x24/0x38
sysfs_kf_write+0x7c/0x94
kernfs_fop_write_iter+0x128/0x1b8
vfs_write+0x2ac/0x350
ksys_write+0x68/0xfc
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x48/0x110
el0_svc_common.constprop.0+0xc0/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x34/0xf0
el0t_64_sync_handler+0xa0/0xe4
el0t_64_sync+0x198/0x19c
Code: 7100003f 1a9f07e1 531c6c22 f9400001 (79400021)
---[ end trace 0000000000000000 ]---
note: sh[188] exited with irqs disabled
note: sh[188] exited with preempt_count 1
The issue occurs because usbhs_sys_function_pullup(), which accesses the IP
registers, is executed after the USBHS clocks have been disabled. The
problem is reproducible on the Renesas RZ/G3S SoC starting with the
addition of module stop in the clock enable/disable APIs. With module stop
functionality enabled, a bus error is expected if a master accesses a
module whose clock has been stopped and module stop activated.
Disable the IP clocks at the end of remove.
Cc: stable <[email protected]>
Fixes: f1407d5 ("usb: renesas_usbhs: Add Renesas USBHS common code")
Signed-off-by: Claudiu Beznea <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 1, 2025
This reverts commit d02c2e4. Mauro reports that this breaks APEI notifications on his QEMU setup because the "reserved for firmware" region still needs to be writable by Linux in order to signal _back_ to the firmware after processing the reported error: | {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1 | ... | [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error | Unable to handle kernel write to read-only memory at virtual address ffff800080035018 | Mem abort info: | ESR = 0x000000009600004f | EC = 0x25: DABT (current EL), IL = 32 bits | SET = 0, FnV = 0 | EA = 0, S1PTW = 0 | FSC = 0x0f: level 3 permission fault | Data abort info: | ISV = 0, ISS = 0x0000004f, ISS2 = 0x00000000 | CM = 0, WnR = 1, TnD = 0, TagAccess = 0 | GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 | swapper pgtable: 4k pages, 52-bit VAs, pgdp=00000000505d7000 | pgd=10000000510bc003, p4d=1000000100229403, pud=100000010022a403, pmd=100000010022b403, pte=0060000139b90483 | Internal error: Oops: 000000009600004f [#1] SMP For now, revert the offending commit. We can presumably switch back to PAGE_KERNEL when bringing this back in the future. Link: https://lore.kernel.org/r/[email protected] Reported-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Will Deacon <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 1, 2025
Since commit 30f241f ("xsk: Fix immature cq descriptor production"), the descriptor number is stored in skb control block and xsk_cq_submit_addr_locked() relies on it to put the umem addrs onto pool's completion queue. skb control block shouldn't be used for this purpose as after transmit xsk doesn't have control over it and other subsystems could use it. This leads to the following kernel panic due to a NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 2 UID: 1 PID: 927 Comm: p4xsk.bin Not tainted 6.16.12+deb14-cloud-amd64 #1 PREEMPT(lazy) Debian 6.16.12-1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 RIP: 0010:xsk_destruct_skb+0xd0/0x180 [...] Call Trace: <IRQ> ? napi_complete_done+0x7a/0x1a0 ip_rcv_core+0x1bb/0x340 ip_rcv+0x30/0x1f0 __netif_receive_skb_one_core+0x85/0xa0 process_backlog+0x87/0x130 __napi_poll+0x28/0x180 net_rx_action+0x339/0x420 handle_softirqs+0xdc/0x320 ? handle_edge_irq+0x90/0x1e0 do_softirq.part.0+0x3b/0x60 </IRQ> <TASK> __local_bh_enable_ip+0x60/0x70 __dev_direct_xmit+0x14e/0x1f0 __xsk_generic_xmit+0x482/0xb70 ? __remove_hrtimer+0x41/0xa0 ? __xsk_generic_xmit+0x51/0xb70 ? _raw_spin_unlock_irqrestore+0xe/0x40 xsk_sendmsg+0xda/0x1c0 __sys_sendto+0x1ee/0x200 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x84/0x2f0 ? __pfx_pollwake+0x10/0x10 ? __rseq_handle_notify_resume+0xad/0x4c0 ? restore_fpregs_from_fpstate+0x3c/0x90 ? switch_fpu_return+0x5b/0xe0 ? do_syscall_64+0x204/0x2f0 ? do_syscall_64+0x204/0x2f0 ? do_syscall_64+0x204/0x2f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> [...] Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x1c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Instead use the skb destructor_arg pointer along with pointer tagging. As pointers are always aligned to 8B, use the bottom bit to indicate whether this a single address or an allocated struct containing several addresses. Fixes: 30f241f ("xsk: Fix immature cq descriptor production") Closes: https://lore.kernel.org/netdev/[email protected]/ Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Fernando Fernandez Mancera <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Reviewed-by: Jason Xing <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 1, 2025
[WHAT] IGT kms_cursor_legacy's long-nonblocking-modeset-vs-cursor-atomic fails with NULL pointer dereference. This can be reproduced with both an eDP panel and a DP monitors connected. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 13 UID: 0 PID: 2960 Comm: kms_cursor_lega Not tainted 6.16.0-99-custom #8 PREEMPT(voluntary) Hardware name: AMD ........ RIP: 0010:dc_stream_get_scanoutpos+0x34/0x130 [amdgpu] Code: 57 4d 89 c7 41 56 49 89 ce 41 55 49 89 d5 41 54 49 89 fc 53 48 83 ec 18 48 8b 87 a0 64 00 00 48 89 75 d0 48 c7 c6 e0 41 30 c2 <48> 8b 38 48 8b 9f 68 06 00 00 e8 8d d7 fd ff 31 c0 48 81 c3 e0 02 RSP: 0018:ffffd0f3c2bd7608 EFLAGS: 00010292 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffd0f3c2bd7668 RDX: ffffd0f3c2bd7664 RSI: ffffffffc23041e0 RDI: ffff8b32494b8000 RBP: ffffd0f3c2bd7648 R08: ffffd0f3c2bd766c R09: ffffd0f3c2bd7760 R10: ffffd0f3c2bd7820 R11: 0000000000000000 R12: ffff8b32494b8000 R13: ffffd0f3c2bd7664 R14: ffffd0f3c2bd7668 R15: ffffd0f3c2bd766c FS: 000071f631b68700(0000) GS:ffff8b399f114000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001b8105000 CR4: 0000000000f50ef0 PKRU: 55555554 Call Trace: <TASK> dm_crtc_get_scanoutpos+0xd7/0x180 [amdgpu] amdgpu_display_get_crtc_scanoutpos+0x86/0x1c0 [amdgpu] ? __pfx_amdgpu_crtc_get_scanout_position+0x10/0x10[amdgpu] amdgpu_crtc_get_scanout_position+0x27/0x50 [amdgpu] drm_crtc_vblank_helper_get_vblank_timestamp_internal+0xf7/0x400 drm_crtc_vblank_helper_get_vblank_timestamp+0x1c/0x30 drm_crtc_get_last_vbltimestamp+0x55/0x90 drm_crtc_next_vblank_start+0x45/0xa0 drm_atomic_helper_wait_for_fences+0x81/0x1f0 ... Cc: Mario Limonciello <[email protected]> Cc: Alex Deucher <[email protected]> Reviewed-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Hung <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 621e55f) Cc: [email protected]
heftig
pushed a commit
that referenced
this pull request
Dec 1, 2025
The crash in process_v2_sparse_read() for fscrypt-encrypted directories has been reported. Issue takes place for Ceph msgr2 protocol in secure mode. It can be reproduced by the steps: sudo mount -t ceph :/ /mnt/cephfs/ -o name=admin,fs=cephfs,ms_mode=secure (1) mkdir /mnt/cephfs/fscrypt-test-3 (2) cp area_decrypted.tar /mnt/cephfs/fscrypt-test-3 (3) fscrypt encrypt --source=raw_key --key=./my.key /mnt/cephfs/fscrypt-test-3 (4) fscrypt lock /mnt/cephfs/fscrypt-test-3 (5) fscrypt unlock --key=my.key /mnt/cephfs/fscrypt-test-3 (6) cat /mnt/cephfs/fscrypt-test-3/area_decrypted.tar (7) Issue has been triggered [ 408.072247] ------------[ cut here ]------------ [ 408.072251] WARNING: CPU: 1 PID: 392 at net/ceph/messenger_v2.c:865 ceph_con_v2_try_read+0x4b39/0x72f0 [ 408.072267] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore [ 408.072304] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Not tainted 6.17.0-rc7+ [ 408.072307] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014 [ 408.072310] Workqueue: ceph-msgr ceph_con_workfn [ 408.072314] RIP: 0010:ceph_con_v2_try_read+0x4b39/0x72f0 [ 408.072317] Code: c7 c1 20 f0 d4 ae 50 31 d2 48 c7 c6 60 27 d5 ae 48 c7 c7 f8 8e 6f b0 68 60 38 d5 ae e8 00 47 61 fe 48 83 c4 18 e9 ac fc ff ff <0f> 0b e9 06 fe ff ff 4c 8b 9d 98 fd ff ff 0f 84 64 e7 ff ff 89 85 [ 408.072319] RSP: 0018:ffff88811c3e7a30 EFLAGS: 00010246 [ 408.072322] RAX: ffffed1024874c6f RBX: ffffea00042c2b40 RCX: 0000000000000f38 [ 408.072324] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 408.072325] RBP: ffff88811c3e7ca8 R08: 0000000000000000 R09: 00000000000000c8 [ 408.072326] R10: 00000000000000c8 R11: 0000000000000000 R12: 00000000000000c8 [ 408.072327] R13: dffffc0000000000 R14: ffff8881243a6030 R15: 0000000000003000 [ 408.072329] FS: 0000000000000000(0000) GS:ffff88823eadf000(0000) knlGS:0000000000000000 [ 408.072331] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 408.072332] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0 [ 408.072336] PKRU: 55555554 [ 408.072337] Call Trace: [ 408.072338] <TASK> [ 408.072340] ? sched_clock_noinstr+0x9/0x10 [ 408.072344] ? __pfx_ceph_con_v2_try_read+0x10/0x10 [ 408.072347] ? _raw_spin_unlock+0xe/0x40 [ 408.072349] ? finish_task_switch.isra.0+0x15d/0x830 [ 408.072353] ? __kasan_check_write+0x14/0x30 [ 408.072357] ? mutex_lock+0x84/0xe0 [ 408.072359] ? __pfx_mutex_lock+0x10/0x10 [ 408.072361] ceph_con_workfn+0x27e/0x10e0 [ 408.072364] ? metric_delayed_work+0x311/0x2c50 [ 408.072367] process_one_work+0x611/0xe20 [ 408.072371] ? __kasan_check_write+0x14/0x30 [ 408.072373] worker_thread+0x7e3/0x1580 [ 408.072375] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 408.072378] ? __pfx_worker_thread+0x10/0x10 [ 408.072381] kthread+0x381/0x7a0 [ 408.072383] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 408.072385] ? __pfx_kthread+0x10/0x10 [ 408.072387] ? __kasan_check_write+0x14/0x30 [ 408.072389] ? recalc_sigpending+0x160/0x220 [ 408.072392] ? _raw_spin_unlock_irq+0xe/0x50 [ 408.072394] ? calculate_sigpending+0x78/0xb0 [ 408.072395] ? __pfx_kthread+0x10/0x10 [ 408.072397] ret_from_fork+0x2b6/0x380 [ 408.072400] ? __pfx_kthread+0x10/0x10 [ 408.072402] ret_from_fork_asm+0x1a/0x30 [ 408.072406] </TASK> [ 408.072407] ---[ end trace 0000000000000000 ]--- [ 408.072418] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI [ 408.072984] KASAN: null-ptr-deref in range [0x0000000000000000- 0x0000000000000007] [ 408.073350] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Tainted: G W 6.17.0-rc7+ #1 PREEMPT(voluntary) [ 408.073886] Tainted: [W]=WARN [ 408.074042] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014 [ 408.074468] Workqueue: ceph-msgr ceph_con_workfn [ 408.074694] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80 [ 408.074976] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16 84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03 [ 408.075884] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246 [ 408.076305] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000 [ 408.076909] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378 [ 408.077466] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8 [ 408.078034] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71 [ 408.078575] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378 [ 408.079159] FS: 0000000000000000(0000) GS:ffff88823eadf000(0000) knlGS:0000000000000000 [ 408.079736] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 408.080039] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0 [ 408.080376] PKRU: 55555554 [ 408.080513] Call Trace: [ 408.080630] <TASK> [ 408.080729] ceph_con_v2_try_read+0x49b9/0x72f0 [ 408.081115] ? __pfx_ceph_con_v2_try_read+0x10/0x10 [ 408.081348] ? _raw_spin_unlock+0xe/0x40 [ 408.081538] ? finish_task_switch.isra.0+0x15d/0x830 [ 408.081768] ? __kasan_check_write+0x14/0x30 [ 408.081986] ? mutex_lock+0x84/0xe0 [ 408.082160] ? __pfx_mutex_lock+0x10/0x10 [ 408.082343] ceph_con_workfn+0x27e/0x10e0 [ 408.082529] ? metric_delayed_work+0x311/0x2c50 [ 408.082737] process_one_work+0x611/0xe20 [ 408.082948] ? __kasan_check_write+0x14/0x30 [ 408.083156] worker_thread+0x7e3/0x1580 [ 408.083331] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 408.083557] ? __pfx_worker_thread+0x10/0x10 [ 408.083751] kthread+0x381/0x7a0 [ 408.083922] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 408.084139] ? __pfx_kthread+0x10/0x10 [ 408.084310] ? __kasan_check_write+0x14/0x30 [ 408.084510] ? recalc_sigpending+0x160/0x220 [ 408.084708] ? _raw_spin_unlock_irq+0xe/0x50 [ 408.084917] ? calculate_sigpending+0x78/0xb0 [ 408.085138] ? __pfx_kthread+0x10/0x10 [ 408.085335] ret_from_fork+0x2b6/0x380 [ 408.085525] ? __pfx_kthread+0x10/0x10 [ 408.085720] ret_from_fork_asm+0x1a/0x30 [ 408.085922] </TASK> [ 408.086036] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore [ 408.087778] ---[ end trace 0000000000000000 ]--- [ 408.088007] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80 [ 408.088260] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16 84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03 [ 408.089118] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246 [ 408.089357] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000 [ 408.089678] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378 [ 408.090020] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8 [ 408.090360] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71 [ 408.090687] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378 [ 408.091035] FS: 0000000000000000(0000) GS:ffff88823eadf000(0000) knlGS:0000000000000000 [ 408.091452] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 408.092015] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0 [ 408.092530] PKRU: 55555554 [ 417.112915] ================================================================== [ 417.113491] BUG: KASAN: slab-use-after-free in __mutex_lock.constprop.0+0x1522/0x1610 [ 417.114014] Read of size 4 at addr ffff888124870034 by task kworker/2:0/4951 [ 417.114587] CPU: 2 UID: 0 PID: 4951 Comm: kworker/2:0 Tainted: G D W 6.17.0-rc7+ #1 PREEMPT(voluntary) [ 417.114592] Tainted: [D]=DIE, [W]=WARN [ 417.114593] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014 [ 417.114596] Workqueue: events handle_timeout [ 417.114601] Call Trace: [ 417.114602] <TASK> [ 417.114604] dump_stack_lvl+0x5c/0x90 [ 417.114610] print_report+0x171/0x4dc [ 417.114613] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 417.114617] ? kasan_complete_mode_report_info+0x80/0x220 [ 417.114621] kasan_report+0xbd/0x100 [ 417.114625] ? __mutex_lock.constprop.0+0x1522/0x1610 [ 417.114628] ? __mutex_lock.constprop.0+0x1522/0x1610 [ 417.114630] __asan_report_load4_noabort+0x14/0x30 [ 417.114633] __mutex_lock.constprop.0+0x1522/0x1610 [ 417.114635] ? queue_con_delay+0x8d/0x200 [ 417.114638] ? __pfx___mutex_lock.constprop.0+0x10/0x10 [ 417.114641] ? __send_subscribe+0x529/0xb20 [ 417.114644] __mutex_lock_slowpath+0x13/0x20 [ 417.114646] mutex_lock+0xd4/0xe0 [ 417.114649] ? __pfx_mutex_lock+0x10/0x10 [ 417.114652] ? ceph_monc_renew_subs+0x2a/0x40 [ 417.114654] ceph_con_keepalive+0x22/0x110 [ 417.114656] handle_timeout+0x6b3/0x11d0 [ 417.114659] ? _raw_spin_unlock_irq+0xe/0x50 [ 417.114662] ? __pfx_handle_timeout+0x10/0x10 [ 417.114664] ? queue_delayed_work_on+0x8e/0xa0 [ 417.114669] process_one_work+0x611/0xe20 [ 417.114672] ? __kasan_check_write+0x14/0x30 [ 417.114676] worker_thread+0x7e3/0x1580 [ 417.114678] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 417.114682] ? __pfx_sched_setscheduler_nocheck+0x10/0x10 [ 417.114687] ? __pfx_worker_thread+0x10/0x10 [ 417.114689] kthread+0x381/0x7a0 [ 417.114692] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 417.114694] ? __pfx_kthread+0x10/0x10 [ 417.114697] ? __kasan_check_write+0x14/0x30 [ 417.114699] ? recalc_sigpending+0x160/0x220 [ 417.114703] ? _raw_spin_unlock_irq+0xe/0x50 [ 417.114705] ? calculate_sigpending+0x78/0xb0 [ 417.114707] ? __pfx_kthread+0x10/0x10 [ 417.114710] ret_from_fork+0x2b6/0x380 [ 417.114713] ? __pfx_kthread+0x10/0x10 [ 417.114715] ret_from_fork_asm+0x1a/0x30 [ 417.114720] </TASK> [ 417.125171] Allocated by task 2: [ 417.125333] kasan_save_stack+0x26/0x60 [ 417.125522] kasan_save_track+0x14/0x40 [ 417.125742] kasan_save_alloc_info+0x39/0x60 [ 417.125945] __kasan_slab_alloc+0x8b/0xb0 [ 417.126133] kmem_cache_alloc_node_noprof+0x13b/0x460 [ 417.126381] copy_process+0x320/0x6250 [ 417.126595] kernel_clone+0xb7/0x840 [ 417.126792] kernel_thread+0xd6/0x120 [ 417.126995] kthreadd+0x85c/0xbe0 [ 417.127176] ret_from_fork+0x2b6/0x380 [ 417.127378] ret_from_fork_asm+0x1a/0x30 [ 417.127692] Freed by task 0: [ 417.127851] kasan_save_stack+0x26/0x60 [ 417.128057] kasan_save_track+0x14/0x40 [ 417.128267] kasan_save_free_info+0x3b/0x60 [ 417.128491] __kasan_slab_free+0x6c/0xa0 [ 417.128708] kmem_cache_free+0x182/0x550 [ 417.128906] free_task+0xeb/0x140 [ 417.129070] __put_task_struct+0x1d2/0x4f0 [ 417.129259] __put_task_struct_rcu_cb+0x15/0x20 [ 417.129480] rcu_do_batch+0x3d3/0xe70 [ 417.129681] rcu_core+0x549/0xb30 [ 417.129839] rcu_core_si+0xe/0x20 [ 417.130005] handle_softirqs+0x160/0x570 [ 417.130190] __irq_exit_rcu+0x189/0x1e0 [ 417.130369] irq_exit_rcu+0xe/0x20 [ 417.130531] sysvec_apic_timer_interrupt+0x9f/0xd0 [ 417.130768] asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 417.131082] Last potentially related work creation: [ 417.131305] kasan_save_stack+0x26/0x60 [ 417.131484] kasan_record_aux_stack+0xae/0xd0 [ 417.131695] __call_rcu_common+0xcd/0x14b0 [ 417.131909] call_rcu+0x31/0x50 [ 417.132071] delayed_put_task_struct+0x128/0x190 [ 417.132295] rcu_do_batch+0x3d3/0xe70 [ 417.132478] rcu_core+0x549/0xb30 [ 417.132658] rcu_core_si+0xe/0x20 [ 417.132808] handle_softirqs+0x160/0x570 [ 417.132993] __irq_exit_rcu+0x189/0x1e0 [ 417.133181] irq_exit_rcu+0xe/0x20 [ 417.133353] sysvec_apic_timer_interrupt+0x9f/0xd0 [ 417.133584] asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 417.133921] Second to last potentially related work creation: [ 417.134183] kasan_save_stack+0x26/0x60 [ 417.134362] kasan_record_aux_stack+0xae/0xd0 [ 417.134566] __call_rcu_common+0xcd/0x14b0 [ 417.134782] call_rcu+0x31/0x50 [ 417.134929] put_task_struct_rcu_user+0x58/0xb0 [ 417.135143] finish_task_switch.isra.0+0x5d3/0x830 [ 417.135366] __schedule+0xd30/0x5100 [ 417.135534] schedule_idle+0x5a/0x90 [ 417.135712] do_idle+0x25f/0x410 [ 417.135871] cpu_startup_entry+0x53/0x70 [ 417.136053] start_secondary+0x216/0x2c0 [ 417.136233] common_startup_64+0x13e/0x141 [ 417.136894] The buggy address belongs to the object at ffff888124870000 which belongs to the cache task_struct of size 10504 [ 417.138122] The buggy address is located 52 bytes inside of freed 10504-byte region [ffff888124870000, ffff888124872908) [ 417.139465] The buggy address belongs to the physical page: [ 417.140016] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x124870 [ 417.140789] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 417.141519] memcg:ffff88811aa20e01 [ 417.141874] anon flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff) [ 417.142600] page_type: f5(slab) [ 417.142922] raw: 0017ffffc0000040 ffff88810094f040 0000000000000000 dead000000000001 [ 417.143554] raw: 0000000000000000 0000000000030003 00000000f5000000 ffff88811aa20e01 [ 417.143954] head: 0017ffffc0000040 ffff88810094f040 0000000000000000 dead000000000001 [ 417.144329] head: 0000000000000000 0000000000030003 00000000f5000000 ffff88811aa20e01 [ 417.144710] head: 0017ffffc0000003 ffffea0004921c01 00000000ffffffff 00000000ffffffff [ 417.145106] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008 [ 417.145485] page dumped because: kasan: bad access detected [ 417.145859] Memory state around the buggy address: [ 417.146094] ffff88812486ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 417.146439] ffff88812486ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 417.146791] >ffff888124870000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 417.147145] ^ [ 417.147387] ffff888124870080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 417.147751] ffff888124870100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 417.148123] ================================================================== First of all, we have warning in get_bvec_at() because cursor->total_resid contains zero value. And, finally, we have crash in ceph_msg_data_advance() because cursor->data is NULL. It means that get_bvec_at() receives not initialized ceph_msg_data_cursor structure because data is NULL and total_resid contains zero. Moreover, we don't have likewise issue for the case of Ceph msgr1 protocol because ceph_msg_data_cursor_init() has been called before reading sparse data. This patch adds calling of ceph_msg_data_cursor_init() in the beginning of process_v2_sparse_read() with the goal to guarantee that logic of reading sparse data works correctly for the case of Ceph msgr2 protocol. Cc: [email protected] Link: https://tracker.ceph.com/issues/73152 Signed-off-by: Viacheslav Dubeyko <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 1, 2025
…ptcp_do_fastclose(). syzbot reported divide-by-zero in __tcp_select_window() by MPTCP socket. [0] We had a similar issue for the bare TCP and fixed in commit 499350a ("tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0"). Let's apply the same fix to mptcp_do_fastclose(). [0]: Oops: divide error: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 6068 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 RIP: 0010:__tcp_select_window+0x824/0x1320 net/ipv4/tcp_output.c:3336 Code: ff ff ff 44 89 f1 d3 e0 89 c1 f7 d1 41 01 cc 41 21 c4 e9 a9 00 00 00 e8 ca 49 01 f8 e9 9c 00 00 00 e8 c0 49 01 f8 44 89 e0 99 <f7> 7c 24 1c 41 29 d4 48 bb 00 00 00 00 00 fc ff df e9 80 00 00 00 RSP: 0018:ffffc90003017640 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88807b469e40 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffc90003017730 R08: ffff888033268143 R09: 1ffff1100664d028 R10: dffffc0000000000 R11: ffffed100664d029 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 000055557faa0500(0000) GS:ffff888126135000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f64a1912ff8 CR3: 0000000072122000 CR4: 00000000003526f0 Call Trace: <TASK> tcp_select_window net/ipv4/tcp_output.c:281 [inline] __tcp_transmit_skb+0xbc7/0x3aa0 net/ipv4/tcp_output.c:1568 tcp_transmit_skb net/ipv4/tcp_output.c:1649 [inline] tcp_send_active_reset+0x2d1/0x5b0 net/ipv4/tcp_output.c:3836 mptcp_do_fastclose+0x27e/0x380 net/mptcp/protocol.c:2793 mptcp_disconnect+0x238/0x710 net/mptcp/protocol.c:3253 mptcp_sendmsg_fastopen+0x2f8/0x580 net/mptcp/protocol.c:1776 mptcp_sendmsg+0x1774/0x1980 net/mptcp/protocol.c:1855 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg+0xe5/0x270 net/socket.c:742 __sys_sendto+0x3bd/0x520 net/socket.c:2244 __do_sys_sendto net/socket.c:2251 [inline] __se_sys_sendto net/socket.c:2247 [inline] __x64_sys_sendto+0xde/0x100 net/socket.c:2247 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f66e998f749 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffff9acedb8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007f66e9be5fa0 RCX: 00007f66e998f749 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007ffff9acee10 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 R13: 00007f66e9be5fa0 R14: 00007f66e9be5fa0 R15: 0000000000000006 </TASK> Fixes: ae15506 ("mptcp: fix duplicate reset on fastclose") Reported-by: [email protected] Closes: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Matthieu Baerts (NGI0) <[email protected]> Cc: [email protected] Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 12, 2025
…em corrupted commit 986835b upstream. There's issue when file system corrupted: ------------[ cut here ]------------ kernel BUG at fs/jbd2/transaction.c:1289! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 5 UID: 0 PID: 2031 Comm: mkdir Not tainted 6.18.0-rc1-next RIP: 0010:jbd2_journal_get_create_access+0x3b6/0x4d0 RSP: 0018:ffff888117aafa30 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88811a86b000 RCX: ffffffff89a63534 RDX: 1ffff110200ec602 RSI: 0000000000000004 RDI: ffff888100763010 RBP: ffff888100763000 R08: 0000000000000001 R09: ffff888100763028 R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000000 R13: ffff88812c432000 R14: ffff88812c608000 R15: ffff888120bfc000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f91d6970c99 CR3: 00000001159c4000 CR4: 00000000000006f0 Call Trace: <TASK> __ext4_journal_get_create_access+0x42/0x170 ext4_getblk+0x319/0x6f0 ext4_bread+0x11/0x100 ext4_append+0x1e6/0x4a0 ext4_init_new_dir+0x145/0x1d0 ext4_mkdir+0x326/0x920 vfs_mkdir+0x45c/0x740 do_mkdirat+0x234/0x2f0 __x64_sys_mkdir+0xd6/0x120 do_syscall_64+0x5f/0xfa0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The above issue occurs with us in errors=continue mode when accompanied by storage failures. There have been many inconsistencies in the file system data. In the case of file system data inconsistency, for example, if the block bitmap of a referenced block is not set, it can lead to the situation where a block being committed is allocated and used again. As a result, the following condition will not be satisfied then trigger BUG_ON. Of course, it is entirely possible to construct a problematic image that can trigger this BUG_ON through specific operations. In fact, I have constructed such an image and easily reproduced this issue. Therefore, J_ASSERT() holds true only under ideal conditions, but it may not necessarily be satisfied in exceptional scenarios. Using J_ASSERT() directly in abnormal situations would cause the system to crash, which is clearly not what we want. So here we directly trigger a JBD abort instead of immediately invoking BUG_ON. Fixes: 470decc ("[PATCH] jbd2: initial copy of files from jbd") Signed-off-by: Ye Bin <[email protected]> Reviewed-by: Jan Kara <[email protected]> Message-ID: <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 12, 2025
commit 0cd8fee upstream. Fix a race between inline data destruction and block mapping. The function ext4_destroy_inline_data_nolock() changes the inode data layout by clearing EXT4_INODE_INLINE_DATA and setting EXT4_INODE_EXTENTS. At the same time, another thread may execute ext4_map_blocks(), which tests EXT4_INODE_EXTENTS to decide whether to call ext4_ext_map_blocks() or ext4_ind_map_blocks(). Without i_data_sem protection, ext4_ind_map_blocks() may receive inode with EXT4_INODE_EXTENTS flag and triggering assert. kernel BUG at fs/ext4/indirect.c:546! EXT4-fs (loop2): unmounting filesystem. invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 RIP: 0010:ext4_ind_map_blocks.cold+0x2b/0x5a fs/ext4/indirect.c:546 Call Trace: <TASK> ext4_map_blocks+0xb9b/0x16f0 fs/ext4/inode.c:681 _ext4_get_block+0x242/0x590 fs/ext4/inode.c:822 ext4_block_write_begin+0x48b/0x12c0 fs/ext4/inode.c:1124 ext4_write_begin+0x598/0xef0 fs/ext4/inode.c:1255 ext4_da_write_begin+0x21e/0x9c0 fs/ext4/inode.c:3000 generic_perform_write+0x259/0x5d0 mm/filemap.c:3846 ext4_buffered_write_iter+0x15b/0x470 fs/ext4/file.c:285 ext4_file_write_iter+0x8e0/0x17f0 fs/ext4/file.c:679 call_write_iter include/linux/fs.h:2271 [inline] do_iter_readv_writev+0x212/0x3c0 fs/read_write.c:735 do_iter_write+0x186/0x710 fs/read_write.c:861 vfs_iter_write+0x70/0xa0 fs/read_write.c:902 iter_file_splice_write+0x73b/0xc90 fs/splice.c:685 do_splice_from fs/splice.c:763 [inline] direct_splice_actor+0x10f/0x170 fs/splice.c:950 splice_direct_to_actor+0x33a/0xa10 fs/splice.c:896 do_splice_direct+0x1a9/0x280 fs/splice.c:1002 do_sendfile+0xb13/0x12c0 fs/read_write.c:1255 __do_sys_sendfile64 fs/read_write.c:1323 [inline] __se_sys_sendfile64 fs/read_write.c:1309 [inline] __x64_sys_sendfile64+0x1cf/0x210 fs/read_write.c:1309 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Fixes: c755e25 ("ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()") Cc: [email protected] # v4.11+ Signed-off-by: Alexey Nepomnyashih <[email protected]> Message-ID: <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 12, 2025
commit 3e0ae02 upstream. Rust Binder contains the following unsafe operation: // SAFETY: A `NodeDeath` is never inserted into the death list // of any node other than its owner, so it is either in this // death list or in no death list. unsafe { node_inner.death_list.remove(self) }; This operation is unsafe because when touching the prev/next pointers of a list element, we have to ensure that no other thread is also touching them in parallel. If the node is present in the list that `remove` is called on, then that is fine because we have exclusive access to that list. If the node is not in any list, then it's also ok. But if it's present in a different list that may be accessed in parallel, then that may be a data race on the prev/next pointers. And unfortunately that is exactly what is happening here. In Node::release, we: 1. Take the lock. 2. Move all items to a local list on the stack. 3. Drop the lock. 4. Iterate the local list on the stack. Combined with threads using the unsafe remove method on the original list, this leads to memory corruption of the prev/next pointers. This leads to crashes like this one: Unable to handle kernel paging request at virtual address 000bb9841bcac70e Mem abort info: ESR = 0x0000000096000044 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000 CM = 0, WnR = 1, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [000bb9841bcac70e] address between user and kernel address ranges Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP google-cdd 538c004.gcdd: context saved(CPU:1) item - log_kevents is disabled Modules linked in: ... rust_binder CPU: 1 UID: 0 PID: 2092 Comm: kworker/1:178 Tainted: G S W OE 6.12.52-android16-5-g98debd5df505-4k #1 f94a6367396c5488d635708e43ee0c888d230b0b Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: MUSTANG PVT 1.0 based on LGA (DT) Workqueue: events _RNvXs6_NtCsdfZWD8DztAw_6kernel9workqueueINtNtNtB7_4sync3arc3ArcNtNtCs8QPsHWIn21X_16rust_binder_main7process7ProcessEINtB5_15WorkItemPointerKy0_E3runB13_ [rust_binder] pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : _RNvXs3_NtCs8QPsHWIn21X_16rust_binder_main7processNtB5_7ProcessNtNtCsdfZWD8DztAw_6kernel9workqueue8WorkItem3run+0x450/0x11f8 [rust_binder] lr : _RNvXs3_NtCs8QPsHWIn21X_16rust_binder_main7processNtB5_7ProcessNtNtCsdfZWD8DztAw_6kernel9workqueue8WorkItem3run+0x464/0x11f8 [rust_binder] sp : ffffffc09b433ac0 x29: ffffffc09b433d30 x28: ffffff8821690000 x27: ffffffd40cbaa448 x26: ffffff8821690000 x25: 00000000ffffffff x24: ffffff88d0376578 x23: 0000000000000001 x22: ffffffc09b433c78 x21: ffffff88e8f9bf40 x20: ffffff88e8f9bf40 x19: ffffff882692b000 x18: ffffffd40f10bf00 x17: 00000000c006287d x16: 00000000c006287d x15: 00000000000003b0 x14: 0000000000000100 x13: 000000201cb79ae0 x12: fffffffffffffff0 x11: 0000000000000000 x10: 0000000000000001 x9 : 0000000000000000 x8 : b80bb9841bcac706 x7 : 0000000000000001 x6 : fffffffebee63f30 x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000 x2 : 0000000000004c31 x1 : ffffff88216900c0 x0 : ffffff88e8f9bf00 Call trace: _RNvXs3_NtCs8QPsHWIn21X_16rust_binder_main7processNtB5_7ProcessNtNtCsdfZWD8DztAw_6kernel9workqueue8WorkItem3run+0x450/0x11f8 [rust_binder bbc172b53665bbc815363b22e97e3f7e3fe971fc] process_scheduled_works+0x1c4/0x45c worker_thread+0x32c/0x3e8 kthread+0x11c/0x1c8 ret_from_fork+0x10/0x20 Code: 94218d85 b4000155 a94026a8 d10102a0 (f9000509) ---[ end trace 0000000000000000 ]--- Thus, modify Node::release to pop items directly off the original list. Cc: [email protected] Fixes: eafedbc ("rust_binder: add Rust Binder driver") Signed-off-by: Alice Ryhl <[email protected]> Acked-by: Miguel Ojeda <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 12, 2025
commit a51f025 upstream. Syzbot identified an issue [1] in pcl818_ai_cancel(), which stems from the fact that in case of early device detach via pcl818_detach(), subdevice dev->read_subdev may not have initialized its pointer to &struct comedi_async as intended. Thus, any such dereferencing of &s->async->cmd will lead to general protection fault and kernel crash. Mitigate this problem by removing a call to pcl818_ai_cancel() from pcl818_detach() altogether. This way, if the subdevice setups its support for async commands, everything async-related will be handled via subdevice's own ->cancel() function in comedi_device_detach_locked() even before pcl818_detach(). If no support for asynchronous commands is provided, there is no need to cancel anything either. [1] Syzbot crash: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] CPU: 1 UID: 0 PID: 6050 Comm: syz.0.18 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 RIP: 0010:pcl818_ai_cancel+0x69/0x3f0 drivers/comedi/drivers/pcl818.c:762 ... Call Trace: <TASK> pcl818_detach+0x66/0xd0 drivers/comedi/drivers/pcl818.c:1115 comedi_device_detach_locked+0x178/0x750 drivers/comedi/drivers.c:207 do_devconfig_ioctl drivers/comedi/comedi_fops.c:848 [inline] comedi_unlocked_ioctl+0xcde/0x1020 drivers/comedi/comedi_fops.c:2178 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] ... Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=fce5d9d5bd067d6fbe9b Fixes: 00aba6e ("staging: comedi: pcl818: remove 'neverending_ai' from private data") Cc: stable <[email protected]> Signed-off-by: Nikita Zhandarovich <[email protected]> Reviewed-by: Ian Abbott <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 18, 2025
At the moment - the memory allocation for fwsec-sb is created as-needed and is released after being used. Typically this is at some point well after driver load, which can cause runtime suspend/resume to initially work on driver load but then later fail on a machine that has been running for long enough with sufficiently high enough memory pressure: kworker/7:1: page allocation failure: order:5, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 7 UID: 0 PID: 875159 Comm: kworker/7:1 Not tainted 6.17.8-300.fc43.x86_64 #1 PREEMPT(lazy) Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024 Workqueue: pm pm_runtime_work Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 warn_alloc+0x163/0x190 ? __alloc_pages_direct_compact+0x1b3/0x220 __alloc_pages_slowpath.constprop.0+0x57a/0xb10 __alloc_frozen_pages_noprof+0x334/0x350 __alloc_pages_noprof+0xe/0x20 __dma_direct_alloc_pages.isra.0+0x1eb/0x330 dma_direct_alloc_pages+0x3c/0x190 dma_alloc_pages+0x29/0x130 nvkm_firmware_ctor+0x1ae/0x280 [nouveau] nvkm_falcon_fw_ctor+0x3e/0x60 [nouveau] nvkm_gsp_fwsec+0x10e/0x2c0 [nouveau] ? sysvec_apic_timer_interrupt+0xe/0x90 nvkm_gsp_fwsec_sb+0x27/0x70 [nouveau] tu102_gsp_fini+0x65/0x110 [nouveau] ? ktime_get+0x3c/0xf0 nvkm_subdev_fini+0x67/0xc0 [nouveau] nvkm_device_fini+0x94/0x140 [nouveau] nvkm_udevice_fini+0x50/0x70 [nouveau] nvkm_object_fini+0xb1/0x140 [nouveau] nvkm_object_fini+0x70/0x140 [nouveau] ? __pfx_pci_pm_runtime_suspend+0x10/0x10 nouveau_do_suspend+0xe4/0x170 [nouveau] nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau] pci_pm_runtime_suspend+0x67/0x1a0 ? __pfx_pci_pm_runtime_suspend+0x10/0x10 __rpm_callback+0x45/0x1f0 ? __pfx_pci_pm_runtime_suspend+0x10/0x10 rpm_callback+0x6d/0x80 rpm_suspend+0xe5/0x5e0 ? finish_task_switch.isra.0+0x99/0x2c0 pm_runtime_work+0x98/0xb0 process_one_work+0x18f/0x350 worker_thread+0x25a/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xf9/0x240 ? __pfx_kthread+0x10/0x10 ? __pfx_kthread+0x10/0x10 ret_from_fork+0xf1/0x110 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> The reason this happens is because the fwsec-sb firmware image only supports being booted from a contiguous coherent sysmem allocation. If a system runs into enough memory fragmentation from memory pressure, such as what can happen on systems with low amounts of memory, this can lead to a situation where it later becomes impossible to find space for a large enough contiguous allocation to hold fwsec-sb. This causes us to fail to boot the firmware image, causing the GPU to fail booting and causing the driver to fail. Since this firmware can't use non-contiguous allocations, the best solution to avoid this issue is to simply allocate the memory for fwsec-sb during initial driver-load, and reuse the memory allocation when fwsec-sb needs to be used. We then release the memory allocations on driver unload. Signed-off-by: Lyude Paul <[email protected]> Fixes: 594766c ("drm/nouveau/gsp: move booter handling to GPU-specific code") Cc: <[email protected]> # v6.16+ Reviewed-by: Timur Tabi <[email protected]> Link: https://patch.msgid.link/[email protected]
heftig
pushed a commit
that referenced
this pull request
Dec 18, 2025
Since commit a735831 ("drm/nouveau: vendor in drm_encoder_slave API") nouveau appears to be broken for all dispnv04 GPUs (before NV50). Depending on the kernel version, either having no display output and hanging in kernel for a long time, or even oopsing in the cleanup path like: Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac ... nouveau 0000:0a:00.0: drm: 0x14C5: Parsing digital output script table BUG: Unable to handle kernel data access on read at 0x00041520 Faulting instruction address: 0xc0003d0001be0844 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=8 NUMA PowerMac Modules linked in: windfarm_cpufreq_clamp windfarm_smu_sensors windfarm_smu_controls windfarm_pm112 snd_aoa_codec_onyx snd_aoa_fabric_layout snd_aoa windfarm_pid jo apple_mfi_fastcharge rndis_host cdc_ether usbnet mii snd_aoa_i2sbus snd_aoa_soundbus snd_pcm snd_timer snd soundcore rack_meter windfarm_smu_sat windfarm_max6690_s m75_sensor windfarm_core gpu_sched drm_gpuvm drm_exec drm_client_lib drm_ttm_helper ttm drm_display_helper drm_kms_helper drm drm_panel_orientation_quirks syscopyar _sys_fops i2c_algo_bit backlight uio_pdrv_genirq uio uninorth_agp agpgart zram dm_mod dax ipv6 nfsv4 dns_resolver nfs lockd grace sunrpc offb cfbfillrect cfbimgblt ont input_leds sr_mod cdrom sd_mod uas ata_generic hid_apple hid_generic usbhid hid usb_storage pata_macio sata_svw libata firewire_ohci scsi_mod firewire_core ohci ehci_pci ehci_hcd tg3 ohci_hcd libphy usbcore usb_common nls_base led_class CPU: 0 UID: 0 PID: 245 Comm: (udev-worker) Not tainted 6.14.0-09584-g7d06015d936c #7 PREEMPTLAZY Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac NIP: c0003d0001be0844 LR: c0003d0001be0830 CTR: 0000000000000000 REGS: c0000000053f70e0 TRAP: 0300 Not tainted (6.14.0-09584-g7d06015d936c) MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 24222220 XER: 00000000 DAR: 0000000000041520 DSISR: 40000000 IRQMASK: 0 \x0aGPR00: c0003d0001be0830 c0000000053f7380 c0003d0000911900 c000000007bc6800 \x0aGPR04: 0000000000000000 0000000000000000 c000000007bc6e70 0000000000000001 \x0aGPR08: 01f3040000000000 0000000000041520 0000000000000000 c0003d0000813958 \x0aGPR12: c000000000071a48 c000000000e28000 0000000000000020 0000000000000000 \x0aGPR16: 0000000000000000 0000000000f52630 0000000000000000 0000000000000000 \x0aGPR20: 0000000000000000 0000000000000000 0000000000000001 c0003d0000928528 \x0aGPR24: c0003d0000928598 0000000000000000 c000000007025480 c000000007025480 \x0aGPR28: c0000000010b4000 0000000000000000 c000000007bc1800 c000000007bc6800 NIP [c0003d0001be0844] nv_crtc_destroy+0x44/0xd4 [nouveau] LR [c0003d0001be0830] nv_crtc_destroy+0x30/0xd4 [nouveau] Call Trace: [c0000000053f7380] [c0003d0001be0830] nv_crtc_destroy+0x30/0xd4 [nouveau] (unreliable) [c0000000053f73c0] [c0003d00007f7bf4] drm_mode_config_cleanup+0x27c/0x30c [drm] [c0000000053f7490] [c0003d0001bdea50] nouveau_display_create+0x1cc/0x550 [nouveau] [c0000000053f7500] [c0003d0001bcc29c] nouveau_drm_device_init+0x1c8/0x844 [nouveau] [c0000000053f75e0] [c0003d0001bcc9ec] nouveau_drm_probe+0xd4/0x1e0 [nouveau] [c0000000053f7670] [c000000000557d24] local_pci_probe+0x50/0xa8 [c0000000053f76f0] [c000000000557fa8] pci_device_probe+0x22c/0x240 [c0000000053f7760] [c0000000005fff3c] really_probe+0x188/0x31c [c0000000053f77e0] [c000000000600204] __driver_probe_device+0x134/0x13c [c0000000053f7860] [c0000000006002c0] driver_probe_device+0x3c/0xb4 [c0000000053f78a0] [c000000000600534] __driver_attach+0x118/0x128 [c0000000053f78e0] [c0000000005fe038] bus_for_each_dev+0xa8/0xf4 [c0000000053f7950] [c0000000005ff460] driver_attach+0x2c/0x40 [c0000000053f7970] [c0000000005fea68] bus_add_driver+0x130/0x278 [c0000000053f7a00] [c00000000060117c] driver_register+0x9c/0x1a0 [c0000000053f7a80] [c00000000055623c] __pci_register_driver+0x5c/0x70 [c0000000053f7aa0] [c0003d0001c058a0] nouveau_drm_init+0x254/0x278 [nouveau] [c0000000053f7b10] [c00000000000e9bc] do_one_initcall+0x84/0x268 [c0000000053f7bf0] [c0000000001a0ba0] do_init_module+0x70/0x2d8 [c0000000053f7c70] [c0000000001a42bc] init_module_from_file+0xb4/0x108 [c0000000053f7d50] [c0000000001a4504] sys_finit_module+0x1ac/0x478 [c0000000053f7e10] [c000000000023230] system_call_exception+0x1a4/0x20c [c0000000053f7e50] [c00000000000c554] system_call_common+0xf4/0x258 --- interrupt: c00 at 0xfd5f988 NIP: 000000000fd5f988 LR: 000000000ff9b148 CTR: 0000000000000000 REGS: c0000000053f7e80 TRAP: 0c00 Not tainted (6.14.0-09584-g7d06015d936c) MSR: 100000000000d032 <HV,EE,PR,ME,IR,DR,RI> CR: 28222244 XER: 00000000 IRQMASK: 0 \x0aGPR00: 0000000000000161 00000000ffcdc2d0 00000000405db160 0000000000000020 \x0aGPR04: 000000000ffa2c9c 0000000000000000 000000000000001f 0000000000000045 \x0aGPR08: 0000000011a13770 0000000000000000 0000000000000000 0000000000000000 \x0aGPR12: 0000000000000000 0000000010249d8c 0000000000000020 0000000000000000 \x0aGPR16: 0000000000000000 0000000000f52630 0000000000000000 0000000000000000 \x0aGPR20: 0000000000000000 0000000000000000 0000000000000000 0000000011a11a70 \x0aGPR24: 0000000011a13580 0000000011a11950 0000000011a11a70 0000000000020000 \x0aGPR28: 000000000ffa2c9c 0000000000000000 000000000ffafc40 0000000011a11a70 NIP [000000000fd5f988] 0xfd5f988 LR [000000000ff9b148] 0xff9b148 --- interrupt: c00 Code: f821ffc1 418200ac e93f0000 e9290038 e9291468 eba90000 48026c0d e8410018 e93f06aa 3d290001 392982a4 79291f24 <7fdd482a> 2c3e0000 41820030 7fc3f378 ---[ end trace 0000000000000000 ]--- This is caused by the i2c encoder modules vendored into nouveau/ now depending on the equally vendored nouveau_i2c_encoder_destroy function. Trying to auto-load this modules hangs on nouveau initialization until timeout, and nouveau continues without i2c video encoders. Fix by avoiding nouveau dependency by __always_inlining that helper functions into those i2c video encoder modules. Fixes: a735831 ("drm/nouveau: vendor in drm_encoder_slave API") Signed-off-by: René Rebe <[email protected]> Reviewed-by: Lyude Paul <[email protected]> [Lyude: fixed commit reference in description] Signed-off-by: Lyude Paul <[email protected]> Link: https://patch.msgid.link/[email protected]
heftig
pushed a commit
that referenced
this pull request
Dec 18, 2025
… transaction We can't log a conflicting inode if it's a directory and it was moved from one parent directory to another parent directory in the current transaction, as this can result an attempt to have a directory with two hard links during log replay, one for the old parent directory and another for the new parent directory. The following scenario triggers that issue: 1) We have directories "dir1" and "dir2" created in a past transaction. Directory "dir1" has inode A as its parent directory; 2) We move "dir1" to some other directory; 3) We create a file with the name "dir1" in directory inode A; 4) We fsync the new file. This results in logging the inode of the new file and the inode for the directory "dir1" that was previously moved in the current transaction. So the log tree has the INODE_REF item for the new location of "dir1"; 5) We move the new file to some other directory. This results in updating the log tree to included the new INODE_REF for the new location of the file and removes the INODE_REF for the old location. This happens during the rename when we call btrfs_log_new_name(); 6) We fsync the file, and that persists the log tree changes done in the previous step (btrfs_log_new_name() only updates the log tree in memory); 7) We have a power failure; 8) Next time the fs is mounted, log replay happens and when processing the inode for directory "dir1" we find a new INODE_REF and add that link, but we don't remove the old link of the inode since we have not logged the old parent directory of the directory inode "dir1". As a result after log replay finishes when we trigger writeback of the subvolume tree's extent buffers, the tree check will detect that we have a directory a hard link count of 2 and we get a mount failure. The errors and stack traces reported in dmesg/syslog are like this: [ 3845.729764] BTRFS info (device dm-0): start tree-log replay [ 3845.730304] page: refcount:3 mapcount:0 mapping:000000005c8a3027 index:0x1d00 pfn:0x11510c [ 3845.731236] memcg:ffff9264c02f4e00 [ 3845.731751] aops:btree_aops [btrfs] ino:1 [ 3845.732300] flags: 0x17fffc00000400a(uptodate|private|writeback|node=0|zone=2|lastcpupid=0x1ffff) [ 3845.733346] raw: 017fffc00000400a 0000000000000000 dead000000000122 ffff9264d978aea8 [ 3845.734265] raw: 0000000000001d00 ffff92650e6d4738 00000003ffffffff ffff9264c02f4e00 [ 3845.735305] page dumped because: eb page dump [ 3845.735981] BTRFS critical (device dm-0): corrupt leaf: root=5 block=30408704 slot=6 ino=257, invalid nlink: has 2 expect no more than 1 for dir [ 3845.737786] BTRFS info (device dm-0): leaf 30408704 gen 10 total ptrs 17 free space 14881 owner 5 [ 3845.737789] BTRFS info (device dm-0): refs 4 lock_owner 0 current 30701 [ 3845.737792] item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160 [ 3845.737794] inode generation 3 transid 9 size 16 nbytes 16384 [ 3845.737795] block group 0 mode 40755 links 1 uid 0 gid 0 [ 3845.737797] rdev 0 sequence 2 flags 0x0 [ 3845.737798] atime 1764259517.0 [ 3845.737800] ctime 1764259517.572889464 [ 3845.737801] mtime 1764259517.572889464 [ 3845.737802] otime 1764259517.0 [ 3845.737803] item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12 [ 3845.737805] index 0 name_len 2 [ 3845.737807] item 2 key (256 DIR_ITEM 2363071922) itemoff 16077 itemsize 34 [ 3845.737808] location key (257 1 0) type 2 [ 3845.737810] transid 9 data_len 0 name_len 4 [ 3845.737811] item 3 key (256 DIR_ITEM 2676584006) itemoff 16043 itemsize 34 [ 3845.737813] location key (258 1 0) type 2 [ 3845.737814] transid 9 data_len 0 name_len 4 [ 3845.737815] item 4 key (256 DIR_INDEX 2) itemoff 16009 itemsize 34 [ 3845.737816] location key (257 1 0) type 2 [ 3845.737818] transid 9 data_len 0 name_len 4 [ 3845.737819] item 5 key (256 DIR_INDEX 3) itemoff 15975 itemsize 34 [ 3845.737820] location key (258 1 0) type 2 [ 3845.737821] transid 9 data_len 0 name_len 4 [ 3845.737822] item 6 key (257 INODE_ITEM 0) itemoff 15815 itemsize 160 [ 3845.737824] inode generation 9 transid 10 size 6 nbytes 0 [ 3845.737825] block group 0 mode 40755 links 2 uid 0 gid 0 [ 3845.737826] rdev 0 sequence 1 flags 0x0 [ 3845.737827] atime 1764259517.572889464 [ 3845.737828] ctime 1764259517.572889464 [ 3845.737830] mtime 1764259517.572889464 [ 3845.737831] otime 1764259517.572889464 [ 3845.737832] item 7 key (257 INODE_REF 256) itemoff 15801 itemsize 14 [ 3845.737833] index 2 name_len 4 [ 3845.737834] item 8 key (257 INODE_REF 258) itemoff 15787 itemsize 14 [ 3845.737836] index 2 name_len 4 [ 3845.737837] item 9 key (257 DIR_ITEM 2507850652) itemoff 15754 itemsize 33 [ 3845.737838] location key (259 1 0) type 1 [ 3845.737839] transid 10 data_len 0 name_len 3 [ 3845.737840] item 10 key (257 DIR_INDEX 2) itemoff 15721 itemsize 33 [ 3845.737842] location key (259 1 0) type 1 [ 3845.737843] transid 10 data_len 0 name_len 3 [ 3845.737844] item 11 key (258 INODE_ITEM 0) itemoff 15561 itemsize 160 [ 3845.737846] inode generation 9 transid 10 size 8 nbytes 0 [ 3845.737847] block group 0 mode 40755 links 1 uid 0 gid 0 [ 3845.737848] rdev 0 sequence 1 flags 0x0 [ 3845.737849] atime 1764259517.572889464 [ 3845.737850] ctime 1764259517.572889464 [ 3845.737851] mtime 1764259517.572889464 [ 3845.737852] otime 1764259517.572889464 [ 3845.737853] item 12 key (258 INODE_REF 256) itemoff 15547 itemsize 14 [ 3845.737855] index 3 name_len 4 [ 3845.737856] item 13 key (258 DIR_ITEM 1843588421) itemoff 15513 itemsize 34 [ 3845.737857] location key (257 1 0) type 2 [ 3845.737858] transid 10 data_len 0 name_len 4 [ 3845.737860] item 14 key (258 DIR_INDEX 2) itemoff 15479 itemsize 34 [ 3845.737861] location key (257 1 0) type 2 [ 3845.737862] transid 10 data_len 0 name_len 4 [ 3845.737863] item 15 key (259 INODE_ITEM 0) itemoff 15319 itemsize 160 [ 3845.737865] inode generation 10 transid 10 size 0 nbytes 0 [ 3845.737866] block group 0 mode 100600 links 1 uid 0 gid 0 [ 3845.737867] rdev 0 sequence 2 flags 0x0 [ 3845.737868] atime 1764259517.580874966 [ 3845.737869] ctime 1764259517.586121869 [ 3845.737870] mtime 1764259517.580874966 [ 3845.737872] otime 1764259517.580874966 [ 3845.737873] item 16 key (259 INODE_REF 257) itemoff 15306 itemsize 13 [ 3845.737874] index 2 name_len 3 [ 3845.737875] BTRFS error (device dm-0): block=30408704 write time tree block corruption detected [ 3845.739448] ------------[ cut here ]------------ [ 3845.740092] WARNING: CPU: 5 PID: 30701 at fs/btrfs/disk-io.c:335 btree_csum_one_bio+0x25a/0x270 [btrfs] [ 3845.741439] Modules linked in: btrfs dm_flakey crc32c_cryptoapi (...) [ 3845.750626] CPU: 5 UID: 0 PID: 30701 Comm: mount Tainted: G W 6.18.0-rc6-btrfs-next-218+ #1 PREEMPT(full) [ 3845.752414] Tainted: [W]=WARN [ 3845.752828] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [ 3845.754499] RIP: 0010:btree_csum_one_bio+0x25a/0x270 [btrfs] [ 3845.755460] Code: 31 f6 48 89 (...) [ 3845.758685] RSP: 0018:ffffa8d9c5677678 EFLAGS: 00010246 [ 3845.759450] RAX: 0000000000000000 RBX: ffff92650e6d4738 RCX: 0000000000000000 [ 3845.760309] RDX: 0000000000000000 RSI: ffffffff9aab45b9 RDI: ffff9264c4748000 [ 3845.761239] RBP: ffff9264d4324000 R08: 0000000000000000 R09: ffffa8d9c5677468 [ 3845.762607] R10: ffff926bdc1fffa8 R11: 0000000000000003 R12: ffffa8d9c5677680 [ 3845.764099] R13: 0000000000004000 R14: ffff9264dd624000 R15: ffff9264d978aba8 [ 3845.765094] FS: 00007f751fa5a840(0000) GS:ffff926c42a82000(0000) knlGS:0000000000000000 [ 3845.766226] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3845.766970] CR2: 0000558df1815380 CR3: 000000010ed88003 CR4: 0000000000370ef0 [ 3845.768009] Call Trace: [ 3845.768392] <TASK> [ 3845.768714] btrfs_submit_bbio+0x6ee/0x7f0 [btrfs] [ 3845.769640] ? write_one_eb+0x28e/0x340 [btrfs] [ 3845.770588] btree_write_cache_pages+0x2f0/0x550 [btrfs] [ 3845.771286] ? alloc_extent_state+0x19/0x100 [btrfs] [ 3845.771967] ? merge_next_state+0x1a/0x90 [btrfs] [ 3845.772586] ? set_extent_bit+0x233/0x8b0 [btrfs] [ 3845.773198] ? xas_load+0x9/0xc0 [ 3845.773589] ? xas_find+0x14d/0x1a0 [ 3845.773969] do_writepages+0xc6/0x160 [ 3845.774367] filemap_fdatawrite_wbc+0x48/0x60 [ 3845.775003] __filemap_fdatawrite_range+0x5b/0x80 [ 3845.775902] btrfs_write_marked_extents+0x61/0x170 [btrfs] [ 3845.776707] btrfs_write_and_wait_transaction+0x4e/0xc0 [btrfs] [ 3845.777379] ? _raw_spin_unlock_irqrestore+0x23/0x40 [ 3845.777923] btrfs_commit_transaction+0x5ea/0xd20 [btrfs] [ 3845.778551] ? _raw_spin_unlock+0x15/0x30 [ 3845.778986] ? release_extent_buffer+0x34/0x160 [btrfs] [ 3845.779659] btrfs_recover_log_trees+0x7a3/0x7c0 [btrfs] [ 3845.780416] ? __pfx_replay_one_buffer+0x10/0x10 [btrfs] [ 3845.781499] open_ctree+0x10bb/0x15f0 [btrfs] [ 3845.782194] btrfs_get_tree.cold+0xb/0x16c [btrfs] [ 3845.782764] ? fscontext_read+0x15c/0x180 [ 3845.783202] ? rw_verify_area+0x50/0x180 [ 3845.783667] vfs_get_tree+0x25/0xd0 [ 3845.784047] vfs_cmd_create+0x59/0xe0 [ 3845.784458] __do_sys_fsconfig+0x4f6/0x6b0 [ 3845.784914] do_syscall_64+0x50/0x1220 [ 3845.785340] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 3845.785980] RIP: 0033:0x7f751fc7f4aa [ 3845.786759] Code: 73 01 c3 48 (...) [ 3845.789951] RSP: 002b:00007ffcdba45dc8 EFLAGS: 00000246 ORIG_RAX: 00000000000001af [ 3845.791402] RAX: ffffffffffffffda RBX: 000055ccc8291c20 RCX: 00007f751fc7f4aa [ 3845.792688] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003 [ 3845.794308] RBP: 000055ccc8292120 R08: 0000000000000000 R09: 0000000000000000 [ 3845.795829] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 3845.797183] R13: 00007f751fe11580 R14: 00007f751fe1326c R15: 00007f751fdf8a23 [ 3845.798633] </TASK> [ 3845.799067] ---[ end trace 0000000000000000 ]--- [ 3845.800215] BTRFS: error (device dm-0) in btrfs_commit_transaction:2553: errno=-5 IO failure (Error while writing out transaction) [ 3845.801860] BTRFS warning (device dm-0 state E): Skipping commit of aborted transaction. [ 3845.802815] BTRFS error (device dm-0 state EA): Transaction aborted (error -5) [ 3845.803728] BTRFS: error (device dm-0 state EA) in cleanup_transaction:2036: errno=-5 IO failure [ 3845.805374] BTRFS: error (device dm-0 state EA) in btrfs_replay_log:2083: errno=-5 IO failure (Failed to recover log tree) [ 3845.807919] BTRFS error (device dm-0 state EA): open_ctree failed: -5 Fix this by never logging a conflicting inode that is a directory and was moved in the current transaction (its last_unlink_trans equals the current transaction) and instead fallback to a transaction commit. A test case for fstests will follow soon. Reported-by: Vyacheslav Kovalevsky <[email protected]> Link: https://lore.kernel.org/linux-btrfs/[email protected]/ CC: [email protected] # 6.1+ Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 18, 2025
[ Upstream commit 78b4d64 ] A timer that expires a vgem fence automatically in 10 seconds is now released with timer_delete_sync() from fence->ops.release() called on last dma_fence_put(). In some scenarios, it can run in IRQ context, which is not safe unless TIMER_IRQSAFE is used. One potentially risky scenario was demonstrated in Intel DRM CI trybot, BAT run on machine bat-adlp-6, while working on new IGT subtests syncobj_timeline@stress-* as user space replacements of some problematic test cases of a dma-fence-chain selftest [1]. [117.004338] ================================ [117.004340] WARNING: inconsistent lock state [117.004342] 6.17.0-rc7-CI_DRM_17270-g7644974e648c+ #1 Tainted: G S U [117.004346] -------------------------------- [117.004347] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. [117.004349] swapper/0/0 [HC1[1]:SC1[1]:HE0:SE0] takes: [117.004352] ffff888138f86aa8 ((&fence->timer)){?.-.}-{0:0}, at: __timer_delete_sync+0x4b/0x190 [117.004361] {HARDIRQ-ON-W} state was registered at: [117.004363] lock_acquire+0xc4/0x2e0 [117.004366] call_timer_fn+0x80/0x2a0 [117.004368] __run_timers+0x231/0x310 [117.004370] run_timer_softirq+0x76/0xe0 [117.004372] handle_softirqs+0xd4/0x4d0 [117.004375] __irq_exit_rcu+0x13f/0x160 [117.004377] irq_exit_rcu+0xe/0x20 [117.004379] sysvec_apic_timer_interrupt+0xa0/0xc0 [117.004382] asm_sysvec_apic_timer_interrupt+0x1b/0x20 [117.004385] cpuidle_enter_state+0x12b/0x8a0 [117.004388] cpuidle_enter+0x2e/0x50 [117.004393] call_cpuidle+0x22/0x60 [117.004395] do_idle+0x1fd/0x260 [117.004398] cpu_startup_entry+0x29/0x30 [117.004401] start_secondary+0x12d/0x160 [117.004404] common_startup_64+0x13e/0x141 [117.004407] irq event stamp: 2282669 [117.004409] hardirqs last enabled at (2282668): [<ffffffff8289db71>] _raw_spin_unlock_irqrestore+0x51/0x80 [117.004414] hardirqs last disabled at (2282669): [<ffffffff82882021>] sysvec_irq_work+0x11/0xc0 [117.004419] softirqs last enabled at (2254702): [<ffffffff8289fd00>] __do_softirq+0x10/0x18 [117.004423] softirqs last disabled at (2254725): [<ffffffff813d4ddf>] __irq_exit_rcu+0x13f/0x160 [117.004426] other info that might help us debug this: [117.004429] Possible unsafe locking scenario: [117.004432] CPU0 [117.004433] ---- [117.004434] lock((&fence->timer)); [117.004436] <Interrupt> [117.004438] lock((&fence->timer)); [117.004440] *** DEADLOCK *** [117.004443] 1 lock held by swapper/0/0: [117.004445] #0: ffffc90000003d50 ((&fence->timer)){?.-.}-{0:0}, at: call_timer_fn+0x7a/0x2a0 [117.004450] stack backtrace: [117.004453] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G S U 6.17.0-rc7-CI_DRM_17270-g7644974e648c+ #1 PREEMPT(voluntary) [117.004455] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER [117.004455] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023 [117.004456] Call Trace: [117.004456] <IRQ> [117.004457] dump_stack_lvl+0x91/0xf0 [117.004460] dump_stack+0x10/0x20 [117.004461] print_usage_bug.part.0+0x260/0x360 [117.004463] mark_lock+0x76e/0x9c0 [117.004465] ? register_lock_class+0x48/0x4a0 [117.004467] __lock_acquire+0xbc3/0x2860 [117.004469] lock_acquire+0xc4/0x2e0 [117.004470] ? __timer_delete_sync+0x4b/0x190 [117.004472] ? __timer_delete_sync+0x4b/0x190 [117.004473] __timer_delete_sync+0x68/0x190 [117.004474] ? __timer_delete_sync+0x4b/0x190 [117.004475] timer_delete_sync+0x10/0x20 [117.004476] vgem_fence_release+0x19/0x30 [vgem] [117.004478] dma_fence_release+0xc1/0x3b0 [117.004480] ? dma_fence_release+0xa1/0x3b0 [117.004481] dma_fence_chain_release+0xe7/0x130 [117.004483] dma_fence_release+0xc1/0x3b0 [117.004484] ? _raw_spin_unlock_irqrestore+0x27/0x80 [117.004485] dma_fence_chain_irq_work+0x59/0x80 [117.004487] irq_work_single+0x75/0xa0 [117.004490] irq_work_run_list+0x33/0x60 [117.004491] irq_work_run+0x18/0x40 [117.004493] __sysvec_irq_work+0x35/0x170 [117.004494] sysvec_irq_work+0x47/0xc0 [117.004496] asm_sysvec_irq_work+0x1b/0x20 [117.004497] RIP: 0010:_raw_spin_unlock_irqrestore+0x57/0x80 [117.004499] Code: 00 75 1c 65 ff 0d d9 34 68 01 74 20 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc e8 7f 9d d3 fe fb 0f 1f 44 00 00 <eb> d7 0f 1f 44 00 00 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 [117.004499] RSP: 0018:ffffc90000003cf0 EFLAGS: 00000246 [117.004500] RAX: 0000000000000000 RBX: ffff888155e94c40 RCX: 0000000000000000 [117.004501] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [117.004502] RBP: ffffc90000003d00 R08: 0000000000000000 R09: 0000000000000000 [117.004502] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246 [117.004502] R13: 0000000000000001 R14: 0000000000000246 R15: ffff888155e94c80 [117.004506] dma_fence_signal+0x49/0xb0 [117.004507] ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem] [117.004508] vgem_fence_timeout+0x12/0x20 [vgem] [117.004509] call_timer_fn+0xa1/0x2a0 [117.004512] ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem] [117.004513] __run_timers+0x231/0x310 [117.004514] ? tmigr_handle_remote+0x2ac/0x560 [117.004517] timer_expire_remote+0x46/0x70 [117.004518] tmigr_handle_remote+0x433/0x560 [117.004520] ? __run_timers+0x239/0x310 [117.004521] ? run_timer_softirq+0x21/0xe0 [117.004522] ? lock_release+0xce/0x2a0 [117.004524] run_timer_softirq+0xcf/0xe0 [117.004525] handle_softirqs+0xd4/0x4d0 [117.004526] __irq_exit_rcu+0x13f/0x160 [117.004527] irq_exit_rcu+0xe/0x20 [117.004528] sysvec_apic_timer_interrupt+0xa0/0xc0 [117.004529] </IRQ> [117.004529] <TASK> [117.004529] asm_sysvec_apic_timer_interrupt+0x1b/0x20 [117.004530] RIP: 0010:cpuidle_enter_state+0x12b/0x8a0 [117.004532] Code: 48 0f a3 05 97 ce 0e 01 0f 82 2e 03 00 00 31 ff e8 8a 41 bd fe 80 7d d0 00 0f 85 11 03 00 00 e8 8b 06 d5 fe fb 0f 1f 44 00 00 <45> 85 f6 0f 88 67 02 00 00 4d 63 ee 49 83 fd 0a 0f 83 34 06 00 00 [117.004532] RSP: 0018:ffffffff83403d88 EFLAGS: 00000246 [117.004533] RAX: 0000000000000000 RBX: ffff88888f046440 RCX: 0000000000000000 [117.004533] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [117.004534] RBP: ffffffff83403dd8 R08: 0000000000000000 R09: 0000000000000000 [117.004534] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff837cbe80 [117.004534] R13: 0000000000000004 R14: 0000000000000004 R15: 0000001ad1df466b [117.004537] ? cpuidle_enter_state+0x125/0x8a0 [117.004538] ? sched_clock_noinstr+0x9/0x10 [117.004540] cpuidle_enter+0x2e/0x50 [117.004542] call_cpuidle+0x22/0x60 [117.004542] do_idle+0x1fd/0x260 [117.004544] cpu_startup_entry+0x29/0x30 [117.004546] rest_init+0x104/0x200 [117.004548] start_kernel+0x93d/0xbd0 [117.004550] ? load_ucode_intel_bsp+0x2a/0x90 [117.004551] ? sme_unmap_bootdata+0x14/0x80 [117.004554] x86_64_start_reservations+0x18/0x30 [117.004555] x86_64_start_kernel+0xfd/0x150 [117.004556] ? soft_restart_cpu+0x14/0x14 [117.004558] common_startup_64+0x13e/0x141 [117.004560] </TASK> [117.004565] ------------[ cut here ]------------ [117.004692] WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1610 __timer_delete_sync+0x126/0x190 [117.004697] Modules linked in: vgem snd_hda_codec_intelhdmi snd_hda_codec_hdmi i915 prime_numbers ttm drm_buddy drm_display_helper cec rc_core i2c_algo_bit hid_sensor_custom hid_sensor_hub hid_generic intel_ishtp_hid hid intel_uncore_frequency intel_uncore_frequency_common x86_pkg_temp_thermal intel_powerclamp cmdlinepart ee1004 r8153_ecm spi_nor coretemp cdc_ether mei_pxp mei_hdcp usbnet mtd intel_rapl_msr wmi_bmof kvm_intel snd_hda_intel snd_intel_dspcfg processor_thermal_device_pci kvm snd_hda_codec processor_thermal_device irqbypass processor_thermal_wt_hint polyval_clmulni platform_temperature_control snd_hda_core ghash_clmulni_intel processor_thermal_rfim spi_pxa2xx_platform snd_hwdep aesni_intel processor_thermal_rapl dw_dmac snd_pcm dw_dmac_core intel_rapl_common r8152 rapl mii intel_cstate spi_pxa2xx_core i2c_i801 processor_thermal_wt_req snd_timer i2c_mux mei_me intel_ish_ipc processor_thermal_power_floor e1000e snd i2c_smbus spi_intel_pci processor_thermal_mbox mei soundcore intel_ishtp thunderbolt idma64 [117.004733] spi_intel int340x_thermal_zone igen6_edac binfmt_misc intel_skl_int3472_tps68470 intel_pmc_core tps68470_regulator video clk_tps68470 pmt_telemetry pmt_discovery nls_iso8859_1 pmt_class intel_pmc_ssram_telemetry intel_skl_int3472_discrete int3400_thermal intel_hid intel_skl_int3472_common acpi_thermal_rel intel_vsec wmi pinctrl_tigerlake acpi_tad sparse_keymap acpi_pad dm_multipath msr nvme_fabrics fuse efi_pstore nfnetlink autofs4 [117.004782] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G S U 6.17.0-rc7-CI_DRM_17270-g7644974e648c+ #1 PREEMPT(voluntary) [117.004787] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER [117.004789] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023 [117.004793] RIP: 0010:__timer_delete_sync+0x126/0x190 [117.004795] Code: 31 c0 45 31 c9 c3 cc cc cc cc 48 8b 75 d0 45 84 f6 74 63 49 c7 45 18 00 00 00 00 48 89 c7 e8 51 46 39 01 f3 90 e9 66 ff ff ff <0f> 0b e9 5f ff ff ff e8 ee e4 0c 00 49 8d 5d 28 45 31 c9 31 c9 4c [117.004801] RSP: 0018:ffffc90000003a40 EFLAGS: 00010046 [117.004804] RAX: ffffffff815093fb RBX: ffff888138f86aa8 RCX: 0000000000000000 [117.004807] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [117.004809] RBP: ffffc90000003a70 R08: 0000000000000000 R09: 0000000000000000 [117.004812] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff815093fb [117.004814] R13: ffff888138f86a80 R14: 0000000000000000 R15: 0000000000000000 [117.004817] FS: 0000000000000000(0000) GS:ffff88890b0f7000(0000) knlGS:0000000000000000 [117.004820] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [117.004823] CR2: 00005db8131eb7f0 CR3: 0000000003448000 CR4: 0000000000f52ef0 [117.004826] PKRU: 55555554 [117.004827] Call Trace: [117.004829] <IRQ> [117.004831] timer_delete_sync+0x10/0x20 [117.004833] vgem_fence_release+0x19/0x30 [vgem] [117.004836] dma_fence_release+0xc1/0x3b0 [117.004838] ? dma_fence_release+0xa1/0x3b0 [117.004841] dma_fence_chain_release+0xe7/0x130 [117.004844] dma_fence_release+0xc1/0x3b0 [117.004847] ? _raw_spin_unlock_irqrestore+0x27/0x80 [117.004850] dma_fence_chain_irq_work+0x59/0x80 [117.004853] irq_work_single+0x75/0xa0 [117.004857] irq_work_run_list+0x33/0x60 [117.004860] irq_work_run+0x18/0x40 [117.004863] __sysvec_irq_work+0x35/0x170 [117.004865] sysvec_irq_work+0x47/0xc0 [117.004868] asm_sysvec_irq_work+0x1b/0x20 [117.004871] RIP: 0010:_raw_spin_unlock_irqrestore+0x57/0x80 [117.004874] Code: 00 75 1c 65 ff 0d d9 34 68 01 74 20 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc e8 7f 9d d3 fe fb 0f 1f 44 00 00 <eb> d7 0f 1f 44 00 00 5b 41 5c 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 [117.004879] RSP: 0018:ffffc90000003cf0 EFLAGS: 00000246 [117.004882] RAX: 0000000000000000 RBX: ffff888155e94c40 RCX: 0000000000000000 [117.004884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [117.004887] RBP: ffffc90000003d00 R08: 0000000000000000 R09: 0000000000000000 [117.004890] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246 [117.004892] R13: 0000000000000001 R14: 0000000000000246 R15: ffff888155e94c80 [117.004897] dma_fence_signal+0x49/0xb0 [117.004899] ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem] [117.004902] vgem_fence_timeout+0x12/0x20 [vgem] [117.004904] call_timer_fn+0xa1/0x2a0 [117.004908] ? __pfx_vgem_fence_timeout+0x10/0x10 [vgem] [117.004910] __run_timers+0x231/0x310 [117.004913] ? tmigr_handle_remote+0x2ac/0x560 [117.004917] timer_expire_remote+0x46/0x70 [117.004919] tmigr_handle_remote+0x433/0x560 [117.004923] ? __run_timers+0x239/0x310 [117.004925] ? run_timer_softirq+0x21/0xe0 [117.004928] ? lock_release+0xce/0x2a0 [117.004931] run_timer_softirq+0xcf/0xe0 [117.004933] handle_softirqs+0xd4/0x4d0 [117.004936] __irq_exit_rcu+0x13f/0x160 [117.004938] irq_exit_rcu+0xe/0x20 [117.004940] sysvec_apic_timer_interrupt+0xa0/0xc0 [117.004943] </IRQ> [117.004944] <TASK> [117.004946] asm_sysvec_apic_timer_interrupt+0x1b/0x20 [117.004949] RIP: 0010:cpuidle_enter_state+0x12b/0x8a0 [117.004953] Code: 48 0f a3 05 97 ce 0e 01 0f 82 2e 03 00 00 31 ff e8 8a 41 bd fe 80 7d d0 00 0f 85 11 03 00 00 e8 8b 06 d5 fe fb 0f 1f 44 00 00 <45> 85 f6 0f 88 67 02 00 00 4d 63 ee 49 83 fd 0a 0f 83 34 06 00 00 [117.004961] RSP: 0018:ffffffff83403d88 EFLAGS: 00000246 [117.004963] RAX: 0000000000000000 RBX: ffff88888f046440 RCX: 0000000000000000 [117.004966] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [117.004968] RBP: ffffffff83403dd8 R08: 0000000000000000 R09: 0000000000000000 [117.004971] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff837cbe80 [117.004974] R13: 0000000000000004 R14: 0000000000000004 R15: 0000001ad1df466b [117.004978] ? cpuidle_enter_state+0x125/0x8a0 [117.004981] ? sched_clock_noinstr+0x9/0x10 [117.004985] cpuidle_enter+0x2e/0x50 [117.004989] call_cpuidle+0x22/0x60 [117.004991] do_idle+0x1fd/0x260 [117.005001] cpu_startup_entry+0x29/0x30 [117.005004] rest_init+0x104/0x200 [117.005008] start_kernel+0x93d/0xbd0 [117.005011] ? load_ucode_intel_bsp+0x2a/0x90 [117.005014] ? sme_unmap_bootdata+0x14/0x80 [117.005017] x86_64_start_reservations+0x18/0x30 [117.005020] x86_64_start_kernel+0xfd/0x150 [117.005023] ? soft_restart_cpu+0x14/0x14 [117.005026] common_startup_64+0x13e/0x141 [117.005030] </TASK> [117.005032] irq event stamp: 2282669 [117.005034] hardirqs last enabled at (2282668): [<ffffffff8289db71>] _raw_spin_unlock_irqrestore+0x51/0x80 [117.005038] hardirqs last disabled at (2282669): [<ffffffff82882021>] sysvec_irq_work+0x11/0xc0 [117.005043] softirqs last enabled at (2254702): [<ffffffff8289fd00>] __do_softirq+0x10/0x18 [117.005047] softirqs last disabled at (2254725): [<ffffffff813d4ddf>] __irq_exit_rcu+0x13f/0x160 [117.005051] ---[ end trace 0000000000000000 ]--- Make the timer IRQ safe. [1] https://patchwork.freedesktop.org/series/154987/#rev2 Fixes: 4077798 ("drm/vgem: Attach sw fences to exported vGEM dma-buf (ioctl)") Signed-off-by: Janusz Krzysztofik <[email protected]> Reviewed-by: Christian König <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Maarten Lankhorst <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 18, 2025
[ Upstream commit 163e5f2 ] When using perf record with the `--overwrite` option, a segmentation fault occurs if an event fails to open. For example: perf record -e cycles-ct -F 1000 -a --overwrite Error: cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' perf: Segmentation fault #0 0x6466b6 in dump_stack debug.c:366 #1 0x646729 in sighandler_dump_stack debug.c:378 #2 0x453fd1 in sigsegv_handler builtin-record.c:722 #3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090] #4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862 #5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943 #6 0x458090 in record__synthesize builtin-record.c:2075 #7 0x45a85a in __cmd_record builtin-record.c:2888 #8 0x45deb6 in cmd_record builtin-record.c:4374 #9 0x4e5e33 in run_builtin perf.c:349 #10 0x4e60bf in handle_internal_command perf.c:401 torvalds#11 0x4e6215 in run_argv perf.c:448 torvalds#12 0x4e653a in main perf.c:555 torvalds#13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72] torvalds#14 0x43a3ee in _start ??:0 The --overwrite option implies --tail-synthesize, which collects non-sample events reflecting the system status when recording finishes. However, when evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist is not initialized and remains NULL. The code unconditionally calls record__synthesize() in the error path, which iterates through the NULL evlist pointer and causes a segfault. To fix it, move the record__synthesize() call inside the error check block, so it's only called when there was no error during recording, ensuring that evlist is properly initialized. Fixes: 4ea648a ("perf record: Add --tail-synthesize option") Signed-off-by: Shuai Xue <[email protected]> Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 18, 2025
[ Upstream commit 23b2d2f ] When booting with KASAN enabled the following splat is encountered during probe of the k1 clock driver: UBSAN: array-index-out-of-bounds in drivers/clk/spacemit/ccu-k1.c:1044:16 index 0 is out of range for type 'clk_hw *[*]' CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc5+ #1 PREEMPT(lazy) Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2022.10spacemit 10/01/2022 Call Trace: [<ffffffff8002b628>] dump_backtrace+0x28/0x38 [<ffffffff800027d2>] show_stack+0x3a/0x50 [<ffffffff800220c2>] dump_stack_lvl+0x5a/0x80 [<ffffffff80022100>] dump_stack+0x18/0x20 [<ffffffff800164b8>] ubsan_epilogue+0x10/0x48 [<ffffffff8099034e>] __ubsan_handle_out_of_bounds+0xa6/0xa8 [<ffffffff80acbfa6>] k1_ccu_probe+0x37e/0x420 [<ffffffff80b79e6e>] platform_probe+0x56/0x98 [<ffffffff80b76a7e>] really_probe+0x9e/0x350 [<ffffffff80b76db0>] __driver_probe_device+0x80/0x138 [<ffffffff80b76f52>] driver_probe_device+0x3a/0xd0 [<ffffffff80b771c4>] __driver_attach+0xac/0x1b8 [<ffffffff80b742fc>] bus_for_each_dev+0x6c/0xc8 [<ffffffff80b76296>] driver_attach+0x26/0x38 [<ffffffff80b759ae>] bus_add_driver+0x13e/0x268 [<ffffffff80b7836a>] driver_register+0x52/0x100 [<ffffffff80b79a78>] __platform_driver_register+0x28/0x38 [<ffffffff814585da>] k1_ccu_driver_init+0x22/0x38 [<ffffffff80023a8a>] do_one_initcall+0x62/0x2a0 [<ffffffff81401c60>] do_initcalls+0x170/0x1a8 [<ffffffff81401e7a>] kernel_init_freeable+0x16a/0x1e0 [<ffffffff811f7534>] kernel_init+0x2c/0x180 [<ffffffff80025f56>] ret_from_fork_kernel+0x16/0x1d8 [<ffffffff81205336>] ret_from_fork_kernel_asm+0x16/0x18 ---[ end trace ]--- This is bogus and is simply a result of KASAN consulting the `.num` member of the struct for bounds information (as it should due to `__counted_by`) and finding 0 set by kzalloc() because it has not been initialized before the loop that fills in the array. The easy fix is to just move the line that sets `num` to before the loop that fills the array so that KASAN has the information it needs to accurately conclude that the access is valid. Fixes: 1b72c59 ("clk: spacemit: Add clock support for SpacemiT K1 SoC") Tested-by: Yanko Kaneti <[email protected]> Signed-off-by: Charles Mirabile <[email protected]> Reviewed-by: Alex Elder <[email protected]> Reviewed-by: Troy Mitchell <[email protected]> Reviewed-by: Yixun Lan <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 18, 2025
[ Upstream commit 385aab8 ] MT7996 driver can use both wed and wed_hif2 devices to offload traffic from/to the wireless NIC. In the current codebase we assume to always use the primary wed device in wed callbacks resulting in the following crash if the hw runs wed_hif2 (e.g. 6GHz link). [ 297.455876] Unable to handle kernel read from unreadable memory at virtual address 000000000000080a [ 297.464928] Mem abort info: [ 297.467722] ESR = 0x0000000096000005 [ 297.471461] EC = 0x25: DABT (current EL), IL = 32 bits [ 297.476766] SET = 0, FnV = 0 [ 297.479809] EA = 0, S1PTW = 0 [ 297.482940] FSC = 0x05: level 1 translation fault [ 297.487809] Data abort info: [ 297.490679] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 [ 297.496156] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 297.501196] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 297.506500] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000107480000 [ 297.512927] [000000000000080a] pgd=08000001097fb003, p4d=08000001097fb003, pud=08000001097fb003, pmd=0000000000000000 [ 297.523532] Internal error: Oops: 0000000096000005 [#1] SMP [ 297.715393] CPU: 2 UID: 0 PID: 45 Comm: kworker/u16:2 Tainted: G O 6.12.50 #0 [ 297.723908] Tainted: [O]=OOT_MODULE [ 297.727384] Hardware name: Banana Pi BPI-R4 (2x SFP+) (DT) [ 297.732857] Workqueue: nf_ft_offload_del nf_flow_rule_route_ipv6 [nf_flow_table] [ 297.740254] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 297.747205] pc : mt76_wed_offload_disable+0x64/0xa0 [mt76] [ 297.752688] lr : mtk_wed_flow_remove+0x58/0x80 [ 297.757126] sp : ffffffc080fe3ae0 [ 297.760430] x29: ffffffc080fe3ae0 x28: ffffffc080fe3be0 x27: 00000000deadbef7 [ 297.767557] x26: ffffff80c5ebca00 x25: 0000000000000001 x24: ffffff80c85f4c00 [ 297.774683] x23: ffffff80c1875b78 x22: ffffffc080d42cd0 x21: ffffffc080660018 [ 297.781809] x20: ffffff80c6a076d0 x19: ffffff80c6a043c8 x18: 0000000000000000 [ 297.788935] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000000 [ 297.796060] x14: 0000000000000019 x13: ffffff80c0ad8ec0 x12: 00000000fa83b2da [ 297.803185] x11: ffffff80c02700c0 x10: ffffff80c0ad8ec0 x9 : ffffff81fef96200 [ 297.810311] x8 : ffffff80c02700c0 x7 : ffffff80c02700d0 x6 : 0000000000000002 [ 297.817435] x5 : 0000000000000400 x4 : 0000000000000000 x3 : 0000000000000000 [ 297.824561] x2 : 0000000000000001 x1 : 0000000000000800 x0 : ffffff80c6a063c8 [ 297.831686] Call trace: [ 297.834123] mt76_wed_offload_disable+0x64/0xa0 [mt76] [ 297.839254] mtk_wed_flow_remove+0x58/0x80 [ 297.843342] mtk_flow_offload_cmd+0x434/0x574 [ 297.847689] mtk_wed_setup_tc_block_cb+0x30/0x40 [ 297.852295] nf_flow_offload_ipv6_hook+0x7f4/0x964 [nf_flow_table] [ 297.858466] nf_flow_rule_route_ipv6+0x438/0x4a4 [nf_flow_table] [ 297.864463] process_one_work+0x174/0x300 [ 297.868465] worker_thread+0x278/0x430 [ 297.872204] kthread+0xd8/0xdc [ 297.875251] ret_from_fork+0x10/0x20 [ 297.878820] Code: 928b5ae0 8b000273 91400a60 f943fa61 (79401421) [ 297.884901] ---[ end trace 0000000000000000 ]--- Fix the issue detecting the proper wed reference to use running wed callabacks. Fixes: 83eafc9 ("wifi: mt76: mt7996: add wed tx support") Tested-by: Daniel Pawlik <[email protected]> Tested-by: Matteo Croce <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 18, 2025
[ Upstream commit 1f73a56 ] Neither sock4 nor sock6 pointers are guaranteed to be non-NULL in vxlan_xmit_one, e.g. if the iface is brought down. This can lead to the following NULL dereference: BUG: kernel NULL pointer dereference, address: 0000000000000010 Oops: Oops: 0000 [#1] SMP NOPTI RIP: 0010:vxlan_xmit_one+0xbb3/0x1580 Call Trace: vxlan_xmit+0x429/0x610 dev_hard_start_xmit+0x55/0xa0 __dev_queue_xmit+0x6d0/0x7f0 ip_finish_output2+0x24b/0x590 ip_output+0x63/0x110 Mentioned commits changed the code path in vxlan_xmit_one and as a side effect the sock4/6 pointer validity checks in vxlan(6)_get_route were lost. Fix this by adding back checks. Since both commits being fixed were released in the same version (v6.7) and are strongly related, bundle the fixes in a single commit. Reported-by: Liang Li <[email protected]> Fixes: 6f19b2c ("vxlan: use generic function for tunnel IPv4 route lookup") Fixes: 2aceb89 ("vxlan: use generic function for tunnel IPv6 route lookup") Cc: Beniamino Galvani <[email protected]> Signed-off-by: Antoine Tenart <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Tested-by: Ido Schimmel <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 18, 2025
…ne() and smb_direct_cm_handler() [ Upstream commit 425c327 ] Namjae reported the following: I have a simple file copy test with windows 11 client, and get the following error message. [ 894.140312] ------------[ cut here ]------------ [ 894.140316] WARNING: CPU: 1 PID: 116 at fs/smb/server/transport_rdma.c:642 recv_done+0x308/0x360 [ksmbd] [ 894.140335] Modules linked in: ksmbd cmac nls_utf8 nls_ucs2_utils libarc4 nls_iso8859_1 snd_hda_codec_intelhdmi snd_hda_codec_hdmi snd_hda_codec_alc882 snd_hda_codec_realtek_lib snd_hda_codec_generic rpcrdma intel_rapl_msr rdma_ucm intel_rapl_common snd_hda_intel ib_iser snd_hda_codec intel_uncore_frequency intel_uncore_frequency_common snd_hda_core intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp snd_intel_dspcfg libiscsi snd_intel_sdw_acpi coretemp scsi_transport_iscsi snd_hwdep kvm_intel i915 snd_pcm ib_umad rdma_cm snd_seq_midi ib_ipoib kvm snd_seq_midi_event iw_cm snd_rawmidi ghash_clmulni_intel ib_cm aesni_intel snd_seq mei_hdcp drm_buddy rapl snd_seq_device eeepc_wmi asus_wmi snd_timer intel_cstate ttm snd drm_client_lib drm_display_helper sparse_keymap soundcore platform_profile mxm_wmi wmi_bmof joydev mei_me cec acpi_pad mei rc_core drm_kms_helper input_leds i2c_algo_bit mac_hid sch_fq_codel msr parport_pc ppdev lp nfsd parport auth_rpcgss binfmt_misc nfs_acl lockd grace drm sunrpc ramoops efi_pstore [ 894.140414] reed_solomon pstore_blk pstore_zone autofs4 btrfs blake2b_generic xor raid6_pq mlx5_ib ib_uverbs ib_core hid_generic uas usbhid hid r8169 i2c_i801 usb_storage i2c_mux i2c_smbus mlx5_core realtek ahci mlxfw psample libahci video wmi [last unloaded: ksmbd] [ 894.140442] CPU: 1 UID: 0 PID: 116 Comm: kworker/1:1H Tainted: G W 6.18.0-rc5+ #1 PREEMPT(voluntary) [ 894.140447] Tainted: [W]=WARN [ 894.140448] Hardware name: System manufacturer System Product Name/H110M-K, BIOS 3601 12/12/2017 [ 894.140450] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] [ 894.140476] RIP: 0010:recv_done+0x308/0x360 [ksmbd] [ 894.140487] Code: 2e f2 ff ff 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 41 8b 55 10 49 8b 75 08 b9 02 00 00 00 e8 ed f4 f2 c3 e9 59 fd ff ff <0f> 0b e9 02 ff ff ff 49 8b 74 24 28 49 8d 94 24 c8 00 00 00 bf 00 [ 894.140490] RSP: 0018:ffffa47ec03f3d78 EFLAGS: 00010293 [ 894.140492] RAX: 0000000000000001 RBX: ffff8eb84c818000 RCX: 000000010002ba00 [ 894.140494] RDX: 0000000037600001 RSI: 0000000000000083 RDI: ffff8eb92ec9ee40 [ 894.140496] RBP: ffffa47ec03f3da0 R08: 0000000000000000 R09: 0000000000000010 [ 894.140498] R10: ffff8eb801705680 R11: fefefefefefefeff R12: ffff8eb7454b8810 [ 894.140499] R13: ffff8eb746deb988 R14: ffff8eb746deb980 R15: ffff8eb84c818000 [ 894.140501] FS: 0000000000000000(0000) GS:ffff8eb9a7355000(0000) knlGS:0000000000000000 [ 894.140503] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 894.140505] CR2: 00002d9401d60018 CR3: 0000000010a40006 CR4: 00000000003726f0 [ 894.140507] Call Trace: [ 894.140509] <TASK> [ 894.140512] __ib_process_cq+0x8e/0x190 [ib_core] [ 894.140530] ib_cq_poll_work+0x2f/0x90 [ib_core] [ 894.140545] process_scheduled_works+0xd4/0x430 [ 894.140554] worker_thread+0x12a/0x270 [ 894.140558] kthread+0x10d/0x250 [ 894.140564] ? __pfx_worker_thread+0x10/0x10 [ 894.140567] ? __pfx_kthread+0x10/0x10 [ 894.140571] ret_from_fork+0x11a/0x160 [ 894.140574] ? __pfx_kthread+0x10/0x10 [ 894.140577] ret_from_fork_asm+0x1a/0x30 [ 894.140584] </TASK> [ 894.140585] ---[ end trace 0000000000000000 ]--- [ 894.154363] ------------[ cut here ]------------ [ 894.154367] WARNING: CPU: 3 PID: 5543 at fs/smb/server/transport_rdma.c:1728 smb_direct_cm_handler+0x121/0x130 [ksmbd] [ 894.154384] Modules linked in: ksmbd cmac nls_utf8 nls_ucs2_utils libarc4 nls_iso8859_1 snd_hda_codec_intelhdmi snd_hda_codec_hdmi snd_hda_codec_alc882 snd_hda_codec_realtek_lib snd_hda_codec_generic rpcrdma intel_rapl_msr rdma_ucm intel_rapl_common snd_hda_intel ib_iser snd_hda_codec intel_uncore_frequency intel_uncore_frequency_common snd_hda_core intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp snd_intel_dspcfg libiscsi snd_intel_sdw_acpi coretemp scsi_transport_iscsi snd_hwdep kvm_intel i915 snd_pcm ib_umad rdma_cm snd_seq_midi ib_ipoib kvm snd_seq_midi_event iw_cm snd_rawmidi ghash_clmulni_intel ib_cm aesni_intel snd_seq mei_hdcp drm_buddy rapl snd_seq_device eeepc_wmi asus_wmi snd_timer intel_cstate ttm snd drm_client_lib drm_display_helper sparse_keymap soundcore platform_profile mxm_wmi wmi_bmof joydev mei_me cec acpi_pad mei rc_core drm_kms_helper input_leds i2c_algo_bit mac_hid sch_fq_codel msr parport_pc ppdev lp nfsd parport auth_rpcgss binfmt_misc nfs_acl lockd grace drm sunrpc ramoops efi_pstore [ 894.154456] reed_solomon pstore_blk pstore_zone autofs4 btrfs blake2b_generic xor raid6_pq mlx5_ib ib_uverbs ib_core hid_generic uas usbhid hid r8169 i2c_i801 usb_storage i2c_mux i2c_smbus mlx5_core realtek ahci mlxfw psample libahci video wmi [last unloaded: ksmbd] [ 894.154483] CPU: 3 UID: 0 PID: 5543 Comm: kworker/3:6 Tainted: G W 6.18.0-rc5+ #1 PREEMPT(voluntary) [ 894.154487] Tainted: [W]=WARN [ 894.154488] Hardware name: System manufacturer System Product Name/H110M-K, BIOS 3601 12/12/2017 [ 894.154490] Workqueue: ib_cm cm_work_handler [ib_cm] [ 894.154499] RIP: 0010:smb_direct_cm_handler+0x121/0x130 [ksmbd] [ 894.154507] Code: e7 e8 13 b1 ef ff 44 89 e1 4c 89 ee 48 c7 c7 80 d7 59 c1 48 89 c2 e8 2e 4d ef c3 31 c0 5b 41 5c 41 5d 41 5e 5d c3 cc cc cc cc <0f> 0b eb a5 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 [ 894.154510] RSP: 0018:ffffa47ec1b27c00 EFLAGS: 00010206 [ 894.154512] RAX: ffffffffc1304e00 RBX: ffff8eb89ae50880 RCX: 0000000000000000 [ 894.154514] RDX: ffff8eb730960000 RSI: ffffa47ec1b27c60 RDI: ffff8eb7454b9400 [ 894.154515] RBP: ffffa47ec1b27c20 R08: 0000000000000002 R09: ffff8eb730b8c18b [ 894.154517] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000009 [ 894.154518] R13: ffff8eb7454b9400 R14: ffff8eb7454b8810 R15: ffff8eb815c43000 [ 894.154520] FS: 0000000000000000(0000) GS:ffff8eb9a7455000(0000) knlGS:0000000000000000 [ 894.154522] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 894.154523] CR2: 00007fe1310e99d0 CR3: 0000000010a40005 CR4: 00000000003726f0 [ 894.154525] Call Trace: [ 894.154527] <TASK> [ 894.154530] cma_cm_event_handler+0x27/0xd0 [rdma_cm] [ 894.154541] cma_ib_handler+0x99/0x2e0 [rdma_cm] [ 894.154551] cm_process_work+0x28/0xf0 [ib_cm] [ 894.154557] cm_queue_work_unlock+0x41/0xf0 [ib_cm] [ 894.154563] cm_work_handler+0x2eb/0x25b0 [ib_cm] [ 894.154568] ? pwq_activate_first_inactive+0x52/0x70 [ 894.154572] ? pwq_dec_nr_in_flight+0x244/0x330 [ 894.154575] process_scheduled_works+0xd4/0x430 [ 894.154579] worker_thread+0x12a/0x270 [ 894.154581] kthread+0x10d/0x250 [ 894.154585] ? __pfx_worker_thread+0x10/0x10 [ 894.154587] ? __pfx_kthread+0x10/0x10 [ 894.154590] ret_from_fork+0x11a/0x160 [ 894.154593] ? __pfx_kthread+0x10/0x10 [ 894.154596] ret_from_fork_asm+0x1a/0x30 [ 894.154602] </TASK> [ 894.154603] ---[ end trace 0000000000000000 ]--- [ 894.154931] ksmbd: smb_direct: disconnected [ 894.157278] ksmbd: smb_direct: disconnected I guess sc->first_error is already set and sc->status is thus unexpected, so this should avoid the WARN[_ON]_ONCE() if sc->first_error is already set and have a usable error path. While there set sc->first_error as soon as possible. v1 of this patch revealed the real problem with this message: [ 309.560973] expected[NEGOTIATE_NEEDED] != RDMA_CONNECT_RUNNING first_error=0 local=192.168.0.200:445 remote=192.168.0.100:60445 [ 309.561034] WARNING: CPU: 2 PID: 78 at transport_rdma.c:643 recv_done+0x2fa/0x3d0 [ksmbd] Some drivers (at least mlx5_ib) might post a recv completion before RDMA_CM_EVENT_ESTABLISHED, so we need to adjust our expectation in that case. Fixes: e2d5e51 ("smb: server: only turn into SMBDIRECT_SOCKET_CONNECTED when negotiation is done") Cc: Steve French <[email protected]> Cc: Tom Talpey <[email protected]> Cc: Long Li <[email protected]> Cc: Namjae Jeon <[email protected]> Cc: Paulo Alcantara <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Stefan Metzmacher <[email protected]> Acked-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 18, 2025
[ Upstream commit ccb61a3 ] kbd_led_set() can sleep, and so may not be used as the brightness_set() callback. Otherwise using this led with a trigger leads to system hangs accompanied by: BUG: scheduling while atomic: acpi_fakekeyd/2588/0x00000003 CPU: 4 UID: 0 PID: 2588 Comm: acpi_fakekeyd Not tainted 6.17.9+deb14-amd64 #1 PREEMPT(lazy) Debian 6.17.9-1 Hardware name: ASUSTeK COMPUTER INC. ASUS EXPERTBOOK B9403CVAR/B9403CVAR, BIOS B9403CVAR.311 12/24/2024 Call Trace: <TASK> [...] schedule_timeout+0xbd/0x100 __down_common+0x175/0x290 down_timeout+0x67/0x70 acpi_os_wait_semaphore+0x57/0x90 [...] asus_wmi_evaluate_method3+0x87/0x190 [asus_wmi] led_trigger_event+0x3f/0x60 [...] Fixes: 9fe44fc ("platform/x86: asus-wmi: Simplify the keyboard brightness updating process") Signed-off-by: Anton Khirnov <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Denis Benato <[email protected]> Link: https://patch.msgid.link/[email protected] Reviewed-by: Ilpo Järvinen <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
heftig
pushed a commit
that referenced
this pull request
Dec 18, 2025
[ Upstream commit d84e47e ] Since commit a735831 ("drm/nouveau: vendor in drm_encoder_slave API") nouveau appears to be broken for all dispnv04 GPUs (before NV50). Depending on the kernel version, either having no display output and hanging in kernel for a long time, or even oopsing in the cleanup path like: Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac ... nouveau 0000:0a:00.0: drm: 0x14C5: Parsing digital output script table BUG: Unable to handle kernel data access on read at 0x00041520 Faulting instruction address: 0xc0003d0001be0844 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=8 NUMA PowerMac Modules linked in: windfarm_cpufreq_clamp windfarm_smu_sensors windfarm_smu_controls windfarm_pm112 snd_aoa_codec_onyx snd_aoa_fabric_layout snd_aoa windfarm_pid jo apple_mfi_fastcharge rndis_host cdc_ether usbnet mii snd_aoa_i2sbus snd_aoa_soundbus snd_pcm snd_timer snd soundcore rack_meter windfarm_smu_sat windfarm_max6690_s m75_sensor windfarm_core gpu_sched drm_gpuvm drm_exec drm_client_lib drm_ttm_helper ttm drm_display_helper drm_kms_helper drm drm_panel_orientation_quirks syscopyar _sys_fops i2c_algo_bit backlight uio_pdrv_genirq uio uninorth_agp agpgart zram dm_mod dax ipv6 nfsv4 dns_resolver nfs lockd grace sunrpc offb cfbfillrect cfbimgblt ont input_leds sr_mod cdrom sd_mod uas ata_generic hid_apple hid_generic usbhid hid usb_storage pata_macio sata_svw libata firewire_ohci scsi_mod firewire_core ohci ehci_pci ehci_hcd tg3 ohci_hcd libphy usbcore usb_common nls_base led_class CPU: 0 UID: 0 PID: 245 Comm: (udev-worker) Not tainted 6.14.0-09584-g7d06015d936c #7 PREEMPTLAZY Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac NIP: c0003d0001be0844 LR: c0003d0001be0830 CTR: 0000000000000000 REGS: c0000000053f70e0 TRAP: 0300 Not tainted (6.14.0-09584-g7d06015d936c) MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 24222220 XER: 00000000 DAR: 0000000000041520 DSISR: 40000000 IRQMASK: 0 \x0aGPR00: c0003d0001be0830 c0000000053f7380 c0003d0000911900 c000000007bc6800 \x0aGPR04: 0000000000000000 0000000000000000 c000000007bc6e70 0000000000000001 \x0aGPR08: 01f3040000000000 0000000000041520 0000000000000000 c0003d0000813958 \x0aGPR12: c000000000071a48 c000000000e28000 0000000000000020 0000000000000000 \x0aGPR16: 0000000000000000 0000000000f52630 0000000000000000 0000000000000000 \x0aGPR20: 0000000000000000 0000000000000000 0000000000000001 c0003d0000928528 \x0aGPR24: c0003d0000928598 0000000000000000 c000000007025480 c000000007025480 \x0aGPR28: c0000000010b4000 0000000000000000 c000000007bc1800 c000000007bc6800 NIP [c0003d0001be0844] nv_crtc_destroy+0x44/0xd4 [nouveau] LR [c0003d0001be0830] nv_crtc_destroy+0x30/0xd4 [nouveau] Call Trace: [c0000000053f7380] [c0003d0001be0830] nv_crtc_destroy+0x30/0xd4 [nouveau] (unreliable) [c0000000053f73c0] [c0003d00007f7bf4] drm_mode_config_cleanup+0x27c/0x30c [drm] [c0000000053f7490] [c0003d0001bdea50] nouveau_display_create+0x1cc/0x550 [nouveau] [c0000000053f7500] [c0003d0001bcc29c] nouveau_drm_device_init+0x1c8/0x844 [nouveau] [c0000000053f75e0] [c0003d0001bcc9ec] nouveau_drm_probe+0xd4/0x1e0 [nouveau] [c0000000053f7670] [c000000000557d24] local_pci_probe+0x50/0xa8 [c0000000053f76f0] [c000000000557fa8] pci_device_probe+0x22c/0x240 [c0000000053f7760] [c0000000005fff3c] really_probe+0x188/0x31c [c0000000053f77e0] [c000000000600204] __driver_probe_device+0x134/0x13c [c0000000053f7860] [c0000000006002c0] driver_probe_device+0x3c/0xb4 [c0000000053f78a0] [c000000000600534] __driver_attach+0x118/0x128 [c0000000053f78e0] [c0000000005fe038] bus_for_each_dev+0xa8/0xf4 [c0000000053f7950] [c0000000005ff460] driver_attach+0x2c/0x40 [c0000000053f7970] [c0000000005fea68] bus_add_driver+0x130/0x278 [c0000000053f7a00] [c00000000060117c] driver_register+0x9c/0x1a0 [c0000000053f7a80] [c00000000055623c] __pci_register_driver+0x5c/0x70 [c0000000053f7aa0] [c0003d0001c058a0] nouveau_drm_init+0x254/0x278 [nouveau] [c0000000053f7b10] [c00000000000e9bc] do_one_initcall+0x84/0x268 [c0000000053f7bf0] [c0000000001a0ba0] do_init_module+0x70/0x2d8 [c0000000053f7c70] [c0000000001a42bc] init_module_from_file+0xb4/0x108 [c0000000053f7d50] [c0000000001a4504] sys_finit_module+0x1ac/0x478 [c0000000053f7e10] [c000000000023230] system_call_exception+0x1a4/0x20c [c0000000053f7e50] [c00000000000c554] system_call_common+0xf4/0x258 --- interrupt: c00 at 0xfd5f988 NIP: 000000000fd5f988 LR: 000000000ff9b148 CTR: 0000000000000000 REGS: c0000000053f7e80 TRAP: 0c00 Not tainted (6.14.0-09584-g7d06015d936c) MSR: 100000000000d032 <HV,EE,PR,ME,IR,DR,RI> CR: 28222244 XER: 00000000 IRQMASK: 0 \x0aGPR00: 0000000000000161 00000000ffcdc2d0 00000000405db160 0000000000000020 \x0aGPR04: 000000000ffa2c9c 0000000000000000 000000000000001f 0000000000000045 \x0aGPR08: 0000000011a13770 0000000000000000 0000000000000000 0000000000000000 \x0aGPR12: 0000000000000000 0000000010249d8c 0000000000000020 0000000000000000 \x0aGPR16: 0000000000000000 0000000000f52630 0000000000000000 0000000000000000 \x0aGPR20: 0000000000000000 0000000000000000 0000000000000000 0000000011a11a70 \x0aGPR24: 0000000011a13580 0000000011a11950 0000000011a11a70 0000000000020000 \x0aGPR28: 000000000ffa2c9c 0000000000000000 000000000ffafc40 0000000011a11a70 NIP [000000000fd5f988] 0xfd5f988 LR [000000000ff9b148] 0xff9b148 --- interrupt: c00 Code: f821ffc1 418200ac e93f0000 e9290038 e9291468 eba90000 48026c0d e8410018 e93f06aa 3d290001 392982a4 79291f24 <7fdd482a> 2c3e0000 41820030 7fc3f378 ---[ end trace 0000000000000000 ]--- This is caused by the i2c encoder modules vendored into nouveau/ now depending on the equally vendored nouveau_i2c_encoder_destroy function. Trying to auto-load this modules hangs on nouveau initialization until timeout, and nouveau continues without i2c video encoders. Fix by avoiding nouveau dependency by __always_inlining that helper functions into those i2c video encoder modules. Fixes: a735831 ("drm/nouveau: vendor in drm_encoder_slave API") Signed-off-by: René Rebe <[email protected]> Reviewed-by: Lyude Paul <[email protected]> [Lyude: fixed commit reference in description] Signed-off-by: Lyude Paul <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There is certainly a problem in that commit that introduced those changes. I'm not sure if this change is enough of a fix for my problem but it is a start.
Also I must add that I didn't have the time to test it yet if this fixed the problem completely. Just get sick of waiting over a year for someone else to fix. If it is wrong I'm sure internet will correct it :)