value swap macro change with xor #32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
+3
−3
May I ask what the swap value is for?
Contributor
The compiler will get rid of the temporary variable anyway; this is just a different way of writing it.
12 GB
Thanks
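For readers following the thread, here is a minimal sketch of the two swap styles being compared. The function names `swap_tmp` and `swap_xor` are made up for illustration and are not the macro touched by this pull request. One known caveat of the XOR form is that it zeroes the value when both pointers refer to the same object, so the sketch guards against that case.

```c
#include <assert.h>

/* Swap through a temporary; works even when a and b point to the same object. */
static void swap_tmp(unsigned int *a, unsigned int *b)
{
    unsigned int tmp;

    assert(a && b);
    tmp = *a;
    *a = *b;
    *b = tmp;
}

/* XOR swap; without the alias guard, swapping an object with itself would zero it. */
static void swap_xor(unsigned int *a, unsigned int *b)
{
    assert(a && b);
    if (a == b)
        return;
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
```

Both functions leave the operands exchanged; the temporary-variable form needs no special case, which is part of why optimizing compilers are usually trusted to handle it.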
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Aug 8, 2013
There is a loop in do_mlockall() that lacks a preemption point, which
means that the following can happen on non-preemptible builds of the
kernel:
> My fuzz tester keeps hitting this. Every instance shows the non-irq stack
> came in from mlockall. I'm only seeing this on one box, but that has more
> ram (8gb) than my other machines, which might explain it.
>
> Dave
>
> INFO: rcu_preempt self-detected stall on CPU { 3} (t=6500 jiffies g=470344 c=470343 q=0)
> sending NMI to all CPUs:
> NMI backtrace for cpu 3
> CPU: 3 PID: 29664 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #32
> task: ffff88023e743fc0 ti: ffff88022f6f2000 task.ti: ffff88022f6f2000
> RIP: 0010:[<ffffffff810bf7d1>] [<ffffffff810bf7d1>] trace_hardirqs_off_caller+0x21/0xb0
> RSP: 0018:ffff880244e03c30 EFLAGS: 00000046
> RAX: ffff88023e743fc0 RBX: 0000000000000001 RCX: 000000000000003c
> RDX: 000000000000000f RSI: 0000000000000004 RDI: ffffffff81033cab
> RBP: ffff880244e03c38 R08: ffff880243288a80 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000001 R12: ffff880243288a80
> R13: ffff8802437eda40 R14: 0000000000080000 R15: 000000000000d010
> FS: 00007f50ae33b740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000097f000 CR3: 0000000240fa0000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
> ffffffff810bf86d ffff880244e03c98 ffffffff81033cab 0000000000000096
> 000000000000d008 0000000300000002 0000000000000004 0000000000000003
> 0000000000002710 ffffffff81c50d00 ffffffff81c50d00 ffff880244fcde00
> Call Trace:
> <IRQ>
> [<ffffffff810bf86d>] ? trace_hardirqs_off+0xd/0x10
> [<ffffffff81033cab>] __x2apic_send_IPI_mask+0x1ab/0x1c0
> [<ffffffff81033cdc>] x2apic_send_IPI_all+0x1c/0x20
> [<ffffffff81030115>] arch_trigger_all_cpu_backtrace+0x65/0xa0
> [<ffffffff811144b1>] rcu_check_callbacks+0x331/0x8e0
> [<ffffffff8108bfa0>] ? hrtimer_run_queues+0x20/0x180
> [<ffffffff8109e905>] ? sched_clock_cpu+0xb5/0x100
> [<ffffffff81069557>] update_process_times+0x47/0x80
> [<ffffffff810bd115>] tick_sched_handle.isra.16+0x25/0x60
> [<ffffffff810bd231>] tick_sched_timer+0x41/0x60
> [<ffffffff8108ace1>] __run_hrtimer+0x81/0x4e0
> [<ffffffff810bd1f0>] ? tick_sched_do_timer+0x60/0x60
> [<ffffffff8108b93f>] hrtimer_interrupt+0xff/0x240
> [<ffffffff8102de84>] local_apic_timer_interrupt+0x34/0x60
> [<ffffffff81718c5f>] smp_apic_timer_interrupt+0x3f/0x60
> [<ffffffff817178ef>] apic_timer_interrupt+0x6f/0x80
> [<ffffffff8170e8e0>] ? retint_restore_args+0xe/0xe
> [<ffffffff8105f101>] ? __do_softirq+0xb1/0x440
> [<ffffffff8105f64d>] irq_exit+0xcd/0xe0
> [<ffffffff81718c65>] smp_apic_timer_interrupt+0x45/0x60
> [<ffffffff817178ef>] apic_timer_interrupt+0x6f/0x80
> <EOI>
> [<ffffffff8170e8e0>] ? retint_restore_args+0xe/0xe
> [<ffffffff8170b830>] ? wait_for_completion_killable+0x170/0x170
> [<ffffffff8170c853>] ? preempt_schedule_irq+0x53/0x90
> [<ffffffff8170e9f6>] retint_kernel+0x26/0x30
> [<ffffffff8107a523>] ? queue_work_on+0x43/0x90
> [<ffffffff8107c369>] schedule_on_each_cpu+0xc9/0x1a0
> [<ffffffff81167770>] ? lru_add_drain+0x50/0x50
> [<ffffffff811677c5>] lru_add_drain_all+0x15/0x20
> [<ffffffff81186965>] SyS_mlockall+0xa5/0x1a0
> [<ffffffff81716e94>] tracesys+0xdd/0xe2
This commit addresses this problem by inserting the required preemption
point.
Reported-by: Dave Jones <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
torvalds
pushed a commit
that referenced
this pull request
Sep 25, 2013
There is a loop in do_mlockall() that lacks a preemption point, which
means that the following can happen on non-preemptible builds of the
kernel. Dave Jones reports:
"My fuzz tester keeps hitting this. Every instance shows the non-irq
stack came in from mlockall. I'm only seeing this on one box, but
that has more ram (8gb) than my other machines, which might explain
it.
INFO: rcu_preempt self-detected stall on CPU { 3} (t=6500 jiffies g=470344 c=470343 q=0)
sending NMI to all CPUs:
NMI backtrace for cpu 3
CPU: 3 PID: 29664 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #32
Call Trace:
lru_add_drain_all+0x15/0x20
SyS_mlockall+0xa5/0x1a0
tracesys+0xdd/0xe2"
This commit addresses this problem by inserting the required preemption
point.
Reported-by: Dave Jones <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
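The idea of a preemption point can be illustrated in user space. The sketch below is only an analogy and not the kernel patch: kernel code usually expresses this with cond_resched(), while the example uses POSIX sched_yield(). The `struct region` list and the `lock_all_regions` function are hypothetical names invented for this illustration.

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for one mapping that has to be processed. */
struct region {
    struct region *next;
    int locked;
};

/*
 * Walk a potentially very long list. Yielding once per iteration gives
 * other runnable work a chance to run, so the loop cannot monopolize the
 * CPU no matter how many entries there are.
 * Returns the number of yield points that were passed.
 */
static size_t lock_all_regions(struct region *head)
{
    size_t yields = 0;
    struct region *r;

    for (r = head; r != NULL; r = r->next) {
        r->locked = 1;   /* the per-entry work */
        sched_yield();   /* the "preemption point" of this sketch */
        yields++;
    }
    return yields;
}
```

The more entries the list holds (for example, more mappings on a machine with more RAM), the longer the loop would otherwise run without giving up the CPU.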
johnweber
pushed a commit
to johnweber/linux
that referenced
this pull request
Oct 1, 2013
#125 IRQ #125's status is not constant across boards. IRQ #32 is IOMUXC's interrupt, which can be triggered manually at any time, so use this IRQ instead of #125 to generate the interrupt that keeps the CCM from entering low power mode by mistake.
Signed-off-by: Anson Huang <[email protected]>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Oct 1, 2013
There is a loop in do_mlockall() that lacks a preemption point, which
means that the following can happen on non-preemptible builds of the
kernel:
> My fuzz tester keeps hitting this. Every instance shows the non-irq stack
> came in from mlockall. I'm only seeing this on one box, but that has more
> ram (8gb) than my other machines, which might explain it.
>
> Dave
>
> INFO: rcu_preempt self-detected stall on CPU { 3} (t=6500 jiffies g=470344 c=470343 q=0)
> sending NMI to all CPUs:
> NMI backtrace for cpu 3
> CPU: 3 PID: 29664 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #32
> task: ffff88023e743fc0 ti: ffff88022f6f2000 task.ti: ffff88022f6f2000
> RIP: 0010:[<ffffffff810bf7d1>] [<ffffffff810bf7d1>] trace_hardirqs_off_caller+0x21/0xb0
> RSP: 0018:ffff880244e03c30 EFLAGS: 00000046
> RAX: ffff88023e743fc0 RBX: 0000000000000001 RCX: 000000000000003c
> RDX: 000000000000000f RSI: 0000000000000004 RDI: ffffffff81033cab
> RBP: ffff880244e03c38 R08: ffff880243288a80 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000001 R12: ffff880243288a80
> R13: ffff8802437eda40 R14: 0000000000080000 R15: 000000000000d010
> FS: 00007f50ae33b740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000097f000 CR3: 0000000240fa0000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
> ffffffff810bf86d ffff880244e03c98 ffffffff81033cab 0000000000000096
> 000000000000d008 0000000300000002 0000000000000004 0000000000000003
> 0000000000002710 ffffffff81c50d00 ffffffff81c50d00 ffff880244fcde00
> Call Trace:
> <IRQ>
> [<ffffffff810bf86d>] ? trace_hardirqs_off+0xd/0x10
> [<ffffffff81033cab>] __x2apic_send_IPI_mask+0x1ab/0x1c0
> [<ffffffff81033cdc>] x2apic_send_IPI_all+0x1c/0x20
> [<ffffffff81030115>] arch_trigger_all_cpu_backtrace+0x65/0xa0
> [<ffffffff811144b1>] rcu_check_callbacks+0x331/0x8e0
> [<ffffffff8108bfa0>] ? hrtimer_run_queues+0x20/0x180
> [<ffffffff8109e905>] ? sched_clock_cpu+0xb5/0x100
> [<ffffffff81069557>] update_process_times+0x47/0x80
> [<ffffffff810bd115>] tick_sched_handle.isra.16+0x25/0x60
> [<ffffffff810bd231>] tick_sched_timer+0x41/0x60
> [<ffffffff8108ace1>] __run_hrtimer+0x81/0x4e0
> [<ffffffff810bd1f0>] ? tick_sched_do_timer+0x60/0x60
> [<ffffffff8108b93f>] hrtimer_interrupt+0xff/0x240
> [<ffffffff8102de84>] local_apic_timer_interrupt+0x34/0x60
> [<ffffffff81718c5f>] smp_apic_timer_interrupt+0x3f/0x60
> [<ffffffff817178ef>] apic_timer_interrupt+0x6f/0x80
> [<ffffffff8170e8e0>] ? retint_restore_args+0xe/0xe
> [<ffffffff8105f101>] ? __do_softirq+0xb1/0x440
> [<ffffffff8105f64d>] irq_exit+0xcd/0xe0
> [<ffffffff81718c65>] smp_apic_timer_interrupt+0x45/0x60
> [<ffffffff817178ef>] apic_timer_interrupt+0x6f/0x80
> <EOI>
> [<ffffffff8170e8e0>] ? retint_restore_args+0xe/0xe
> [<ffffffff8170b830>] ? wait_for_completion_killable+0x170/0x170
> [<ffffffff8170c853>] ? preempt_schedule_irq+0x53/0x90
> [<ffffffff8170e9f6>] retint_kernel+0x26/0x30
> [<ffffffff8107a523>] ? queue_work_on+0x43/0x90
> [<ffffffff8107c369>] schedule_on_each_cpu+0xc9/0x1a0
> [<ffffffff81167770>] ? lru_add_drain+0x50/0x50
> [<ffffffff811677c5>] lru_add_drain_all+0x15/0x20
> [<ffffffff81186965>] SyS_mlockall+0xa5/0x1a0
> [<ffffffff81716e94>] tracesys+0xdd/0xe2
This commit addresses this problem by inserting the required preemption
point.
Reported-by: Dave Jones <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Oct 14, 2013
As the new x86 CPU bootup printout format code maintainer, I am taking immediate action to improve and clean (and thus indulge my OCD) the reporting of the cores when coming up online.
Fix padding to a right-hand alignment, cleanup code and bind reporting width to the max number of supported CPUs on the system, like this:
[ 0.074509] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK
[ 0.644008] smpboot: Booting Node 1, Processors: #8 #9 #10 #11 #12 #13 #14 #15 OK
[ 1.245006] smpboot: Booting Node 2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
[ 1.864005] smpboot: Booting Node 3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
[ 2.489005] smpboot: Booting Node 4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
[ 3.093005] smpboot: Booting Node 5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
[ 3.698005] smpboot: Booting Node 6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
[ 4.304005] smpboot: Booting Node 7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
[ 4.961413] Brought up 64 CPUs
and this:
[ 0.072367] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK
[ 0.686329] Brought up 8 CPUs
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Libin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Oct 14, 2013
Turn it into (for example):
[ 0.073380] x86: Booting SMP configuration:
[ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
[ 0.603005] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15
[ 1.200005] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23
[ 1.796005] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31
[ 2.393005] .... node #4, CPUs: #32 #33 #34 #35 #36 #37 #38 #39
[ 2.996005] .... node #5, CPUs: #40 #41 #42 #43 #44 #45 #46 #47
[ 3.600005] .... node #6, CPUs: #48 #49 #50 #51 #52 #53 #54 #55
[ 4.202005] .... node #7, CPUs: #56 #57 #58 #59 #60 #61 #62 #63
[ 4.811005] .... node #8, CPUs: #64 #65 #66 #67 #68 #69 #70 #71
[ 5.421006] .... node #9, CPUs: #72 #73 #74 #75 #76 #77 #78 #79
[ 6.032005] .... node #10, CPUs: #80 #81 #82 #83 #84 #85 #86 #87
[ 6.648006] .... node #11, CPUs: #88 #89 #90 #91 #92 #93 #94 #95
[ 7.262005] .... node #12, CPUs: #96 #97 #98 #99 #100 #101 #102 #103
[ 7.865005] .... node #13, CPUs: #104 #105 #106 #107 #108 #109 #110 #111
[ 8.466005] .... node #14, CPUs: #112 #113 #114 #115 #116 #117 #118 #119
[ 9.073006] .... node #15, CPUs: #120 #121 #122 #123 #124 #125 #126 #127
[ 9.679901] x86: Booted up 16 nodes, 128 CPUs
and drop useless elements.
Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening.
Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
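A division-avoiding digit count can be sketched as follows. This is an illustration written for this note and not necessarily the exact code that was submitted: it grows a running power of ten by multiplication instead of repeatedly dividing the value. The name `num_digits_sketch` and the use of `long long` to sidestep overflow are assumptions of the sketch.

```c
/*
 * Count the characters needed to print val in decimal,
 * including one for a leading minus sign.
 */
static int num_digits_sketch(int val)
{
    long long v = val;  /* widened so that negating the minimum int is safe */
    long long m = 10;   /* smallest value that needs one more digit */
    int d = 1;

    if (v < 0) {
        d++;            /* room for the minus sign */
        v = -v;
    }
    while (v >= m) {    /* multiply the bound instead of dividing the value */
        m *= 10;
        d++;
    }
    return d;
}
```

With 128 CPUs the largest label is 127, so `num_digits_sketch(127)` yields a column width of 3.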
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Oct 14, 2013
The 'driver' field of the i2c_client struct is redundant. The same data can be accessed through to_i2c_driver(client->dev.driver). The generated code for both approaches is more or less the same.
E.g. on ARM the expression client->driver->command(...) generates
...
ldr r3, [r0, #28]
ldr r3, [r3, #32]
blx r3
...
and the expression to_i2c_driver(client->dev.driver)->command(...) generates
...
ldr r3, [r0, #160]
ldr r3, [r3, #-4]
blx r3
...
Other architectures will generate similar code.
All users of the 'driver' field outside of the I2C core have already been converted. So this only leaves the core itself. This patch converts the remaining few users in the I2C core and then removes the 'driver' field from the i2c_client struct.
Signed-off-by: Lars-Peter Clausen <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
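Why the extra pointer is redundant can be shown with a small user-space sketch of the container_of pattern. All struct layouts and the `_sketch` names below are simplified stand-ins invented for this example; they are not the kernel's definitions.

```c
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct device_driver_sketch { const char *name; };
struct device_sketch { struct device_driver_sketch *driver; };

/* The bus-specific driver embeds the generic driver object. */
struct i2c_driver_sketch {
    int (*command)(int cmd);
    struct device_driver_sketch driver;
};

/* The client only keeps the generic device; no separate driver pointer. */
struct i2c_client_sketch { struct device_sketch dev; };

#define to_i2c_driver_sketch(d) \
    container_of_sketch(d, struct i2c_driver_sketch, driver)

/* Hypothetical command handler used to exercise the lookup. */
static int echo_command(int cmd)
{
    return cmd + 1;
}
```

Given a client whose `dev.driver` points at the embedded generic driver, `to_i2c_driver_sketch(client.dev.driver)->command(...)` reaches the same callback that a dedicated `driver` pointer would, at the cost of one constant offset subtraction.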
TechNexion
pushed a commit
to TechNexion/linux
that referenced
this pull request
Oct 25, 2013
commit 7ed47b7 upstream.
The ghash_update function passes a pointer to gf128mul_4k_lle which will be NULL if ghash_setkey is not called or if the most recent call to ghash_setkey failed to allocate memory. This causes an oops. Fix this up by returning an error code in the null case.
This is trivially triggered from unprivileged userspace through the AF_ALG interface by simply writing to the socket without setting a key.
The ghash_final function has a similar issue, but triggering it requires a memory allocation failure in ghash_setkey _after_ at least one successful call to ghash_update.
BUG: unable to handle kernel NULL pointer dereference at 00000670
IP: [<d88c92d4>] gf128mul_4k_lle+0x23/0x60 [gf128mul]
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ghash_generic gf128mul algif_hash af_alg nfs lockd nfs_acl sunrpc bridge ipv6 stp llc
Pid: 1502, comm: hashatron Tainted: G W 3.1.0-rc9-00085-ge9308cf #32 Bochs Bochs
EIP: 0060:[<d88c92d4>] EFLAGS: 00000202 CPU: 0
EIP is at gf128mul_4k_lle+0x23/0x60 [gf128mul]
EAX: d69db1f0 EBX: d6b8ddac ECX: 00000004 EDX: 00000000
ESI: 00000670 EDI: d6b8ddac EBP: d6b8ddc8 ESP: d6b8dda4
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process hashatron (pid: 1502, ti=d6b8c000 task=d6810000 task.ti=d6b8c000)
Stack:
00000000 d69db1f0 00000163 00000000 d6b8ddc8 c101a520 d69db1f0 d52aa000
00000ff0 d6b8dde8 d88d310f d6b8a3f8 d52aa000 00001000 d88d502c d6b8ddfc
00001000 d6b8ddf4 c11676e d69db1e8 d6b8de24 c11679ad d52aa000 00000000
Call Trace:
[<c101a520>] ? kmap_atomic_prot+0x37/0xa6
[<d88d310f>] ghash_update+0x85/0xbe [ghash_generic]
[<c11676ed>] crypto_shash_update+0x18/0x1b
[<c11679ad>] shash_ahash_update+0x22/0x36
[<c11679cc>] shash_async_update+0xb/0xd
[<d88ce0ba>] hash_sendpage+0xba/0xf2 [algif_hash]
[<c121b24c>] kernel_sendpage+0x39/0x4e
[<d88ce000>] ? 0xd88cdfff
[<c121b298>] sock_sendpage+0x37/0x3e
[<c121b261>] ? kernel_sendpage+0x4e/0x4e
[<c10b4dbc>] pipe_to_sendpage+0x56/0x61
[<c10b4e1f>] splice_from_pipe_feed+0x58/0xcd
[<c10b4d66>] ? splice_from_pipe_begin+0x10/0x10
[<c10b51f5>] __splice_from_pipe+0x36/0x55
[<c10b4d66>] ? splice_from_pipe_begin+0x10/0x10
[<c10b6383>] splice_from_pipe+0x51/0x64
[<c10b63c2>] ? default_file_splice_write+0x2c/0x2c
[<c10b63d5>] generic_splice_sendpage+0x13/0x15
[<c10b4d66>] ? splice_from_pipe_begin+0x10/0x10
[<c10b527f>] do_splice_from+0x5d/0x67
[<c10b6865>] sys_splice+0x2bf/0x363
[<c129373b>] ? sysenter_exit+0xf/0x16
[<c104dc1e>] ? trace_hardirqs_on_caller+0x10e/0x13f
[<c129370c>] sysenter_do_call+0x12/0x32
Code: 83 c4 0c 5b 5e 5f c9 c3 55 b9 04 00 00 00 89 e5 57 8d 7d e4 56 53 8d 5d e4 83 ec 18 89 45 e0 89 55 dc 0f b6 70 0f c1 e6 04 01 d6 <f3> a5 be 0f 00 00 00 4e 89 d8 e8 48 ff ff ff 8b 45 e0 89 da 0f
EIP: [<d88c92d4>] gf128mul_4k_lle+0x23/0x60 [gf128mul] SS:ESP 0068:d6b8dda4
CR2: 0000000000000670
---[ end trace 4eaa2a86a8e2da24 ]---
note: hashatron[1502] exited with preempt_count 1
BUG: scheduling while atomic: hashatron/1502/0x10000002
INFO: lockdep is turned off.
[...]
Signed-off-by: Nick Bowler <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
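The shape of the fix, returning an error when no key has been set instead of dereferencing a NULL table, can be sketched in user space. Everything here is hypothetical: `struct hash_ctx`, the toy table lookup, and the choice of `EINVAL` as the error code are assumptions of the sketch; the commit message only says that an error code is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical context: key_table stays NULL until a key has been set. */
struct hash_ctx {
    const unsigned char *key_table;  /* 256-entry lookup table */
    unsigned char state;
};

static int hash_setkey(struct hash_ctx *ctx, const unsigned char *table)
{
    assert(ctx != NULL);
    ctx->key_table = table;
    ctx->state = 0;
    return 0;
}

static int hash_update(struct hash_ctx *ctx, const unsigned char *src, size_t len)
{
    size_t i;

    /* Without this check the loop below would read through a NULL pointer. */
    if (!ctx->key_table)
        return -EINVAL;

    /* Toy mixing step standing in for the real multiplication. */
    for (i = 0; i < len; i++)
        ctx->state = ctx->key_table[(unsigned char)(ctx->state ^ src[i])];
    return 0;
}
```

A caller that writes data before setting a key now gets an error back rather than crashing the process that owns the context.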
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Oct 29, 2013
There is a defect in the imx6 LPM design. When SW tries to enter low power mode with the following sequence, the chip will enter low power mode before the A9 CPU executes the WFI instruction:
1. Set CCM_CLPCR[1:0] to 2'b00;
2. ARM CPU enters WFI;
3. ARM CPU wakes up from an interrupt event which is masked by GPC or not visible to GPC, such as an interrupt from the local timer;
4. Set CCM_CLPCR[1:0] to 2'b01 or 2'b10;
5. ARM CPU executes WFI.
Before the last step, the chip will enter WAIT mode if CCM_CLPCR[1:0] is set to 2'b01, or enter STOP mode if CCM_CLPCR[1:0] is set to 2'b10.
The patch implements a recommended workaround for this issue.
1. SW triggers irq #32 (IOMUX) to be always pending manually by setting the IOMUX_GPR1_GINT bit;
2. SW should then unmask it in GPC before setting CCM LPM;
3. SW should mask it right after CCM LPM is set (bit0-1 of CCM_CLPCR).
Signed-off-by: Shawn Guo <[email protected]>
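The ordering of the three workaround steps can be sketched against simulated registers. This is not driver code: the `struct soc_regs` layout, the bit position chosen for GINT, and the mask bit for IRQ #32 are all assumptions made for illustration; only the use of bits 0-1 of CCM_CLPCR for the low power mode comes from the text above.

```c
#include <stdint.h>

#define GPR1_GINT       (1u << 12)  /* assumed position of the GINT bit */
#define GPC_IRQ32_MASK  (1u << 0)   /* assumed mask bit for IRQ #32; 1 = masked */
#define CLPCR_LPM_BITS  0x3u        /* bits 0-1 of CCM_CLPCR */

/* Simulated registers standing in for memory-mapped hardware. */
struct soc_regs {
    uint32_t iomux_gpr1;
    uint32_t gpc_imr;
    uint32_t ccm_clpcr;
};

static void set_lpm_mode(struct soc_regs *r, uint32_t mode)
{
    /* Step 1: keep IRQ #32 permanently pending. */
    r->iomux_gpr1 |= GPR1_GINT;

    /* Step 2: unmask IRQ #32 in the GPC before touching the LPM bits. */
    r->gpc_imr &= ~GPC_IRQ32_MASK;

    /* Program the requested low power mode into bits 0-1 only. */
    r->ccm_clpcr = (r->ccm_clpcr & ~CLPCR_LPM_BITS) | (mode & CLPCR_LPM_BITS);

    /* Step 3: mask IRQ #32 again right after the mode is set. */
    r->gpc_imr |= GPC_IRQ32_MASK;
}
```

The point of the sequence is that a pending, unmasked interrupt is visible to the GPC for exactly the window in which the low power mode bits change.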
torvalds
pushed a commit
that referenced
this pull request
Nov 22, 2013
Lockdep complains about btrfs's async commit:
[ 2372.462171] [ BUG: bad unlock balance detected! ]
[ 2372.462191] 3.12.0+ #32 Tainted: G W
[ 2372.462209] -------------------------------------
[ 2372.462228] ceph-osd/14048 is trying to release lock (sb_internal) at:
[ 2372.462275] [<ffffffffa022cb10>] btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
[ 2372.462305] but there are no more locks to release!
[ 2372.462324]
[ 2372.462324] other info that might help us debug this:
[ 2372.462349] no locks held by ceph-osd/14048.
[ 2372.462367]
[ 2372.462367] stack backtrace:
[ 2372.462386] CPU: 2 PID: 14048 Comm: ceph-osd Tainted: G W 3.12.0+ #32
[ 2372.462414] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011
[ 2372.462455] ffffffffa022cb10 ffff88007490fd28 ffffffff816f094a ffff8800378aa320
[ 2372.462491] ffff88007490fd50 ffffffff810adf4c ffff8800378aa320 ffff88009af97650
[ 2372.462526] ffffffffa022cb10 ffff88007490fd88 ffffffff810b01ee ffff8800898c0000
[ 2372.462562] Call Trace:
[ 2372.462584] [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
[ 2372.462619] [<ffffffff816f094a>] dump_stack+0x45/0x56
[ 2372.462642] [<ffffffff810adf4c>] print_unlock_imbalance_bug+0xec/0x100
[ 2372.462677] [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
[ 2372.462710] [<ffffffff810b01ee>] lock_release+0x18e/0x210
[ 2372.462742] [<ffffffffa022cb36>] btrfs_commit_transaction_async+0x1d6/0x2a0 [btrfs]
[ 2372.462783] [<ffffffffa025a7ce>] btrfs_ioctl_start_sync+0x3e/0xc0 [btrfs]
[ 2372.462822] [<ffffffffa025f1d3>] btrfs_ioctl+0x4c3/0x1f70 [btrfs]
[ 2372.462849] [<ffffffff812c0321>] ? avc_has_perm+0x121/0x1b0
[ 2372.462873] [<ffffffff812c0224>] ? avc_has_perm+0x24/0x1b0
[ 2372.462897] [<ffffffff8107ecc8>] ? sched_clock_cpu+0xa8/0x100
[ 2372.462922] [<ffffffff8117b145>] do_vfs_ioctl+0x2e5/0x4e0
[ 2372.462946] [<ffffffff812c19e6>] ? file_has_perm+0x86/0xa0
[ 2372.462969] [<ffffffff8117b3c1>] SyS_ioctl+0x81/0xa0
[ 2372.462991] [<ffffffff817045a4>] tracesys+0xdd/0xe2
====================================================
This happens because we don't do the right thing when checking whether it's OK to tell lockdep that we're trying to release the rwsem. If the trans handle's type is TRANS_ATTACH, we won't acquire the freeze rwsem, but as TRANS_ATTACH fits the check (trans < TRANS_JOIN_NOLOCK), we'll release the freeze rwsem anyway, which makes lockdep complain a lot.
Reported-by: Ma Jianpeng <[email protected]>
Signed-off-by: Liu Bo <[email protected]>
Signed-off-by: Miao Xie <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
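The mismatch described above, where the release-side check accepts a handle type that never took the lock, can be reproduced with a toy model. The `T_*` values, the set of lock-taking types, and the mask-based alternative are all assumptions of this sketch; the commit message only states that TRANS_ATTACH does not acquire the freeze rwsem yet passes the (trans < TRANS_JOIN_NOLOCK) comparison.

```c
/* Hypothetical type values; only their relative order matters here. */
#define T_START        (1u << 9)
#define T_ATTACH       (1u << 10)
#define T_JOIN         (1u << 11)
#define T_JOIN_NOLOCK  (1u << 12)

/* Assumed set of types that take the freeze lock when the handle is created. */
#define T_FREEZABLE    (T_START | T_JOIN)

/* Acquire side: does this handle type take the freeze lock? */
static int takes_freeze_lock(unsigned int type)
{
    return (type & T_FREEZABLE) != 0;
}

/* Release side, ordering-based: also true for T_ATTACH, which never locked. */
static int release_check_by_order(unsigned int type)
{
    return type < T_JOIN_NOLOCK;
}

/* Release side, mask-based: one possible check that mirrors the acquire side. */
static int release_check_by_mask(unsigned int type)
{
    return (type & T_FREEZABLE) != 0;
}
```

Whatever form the real check takes, the release decision has to be derived from the same condition as the acquire decision, otherwise lock debugging reports an unbalanced unlock.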
heftig
referenced
this pull request
in zen-kernel/zen-kernel
Dec 21, 2013
commit b1a06a4 upstream.
Lockdep complains about btrfs's async commit:
[ 2372.462171] [ BUG: bad unlock balance detected! ]
[ 2372.462191] 3.12.0+ #32 Tainted: G W
[ 2372.462209] -------------------------------------
[ 2372.462228] ceph-osd/14048 is trying to release lock (sb_internal) at:
[ 2372.462275] [<ffffffffa022cb10>] btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
[ 2372.462305] but there are no more locks to release!
[ 2372.462324]
[ 2372.462324] other info that might help us debug this:
[ 2372.462349] no locks held by ceph-osd/14048.
[ 2372.462367]
[ 2372.462367] stack backtrace:
[ 2372.462386] CPU: 2 PID: 14048 Comm: ceph-osd Tainted: G W 3.12.0+ #32
[ 2372.462414] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015 11/09/2011
[ 2372.462455] ffffffffa022cb10 ffff88007490fd28 ffffffff816f094a ffff8800378aa320
[ 2372.462491] ffff88007490fd50 ffffffff810adf4c ffff8800378aa320 ffff88009af97650
[ 2372.462526] ffffffffa022cb10 ffff88007490fd88 ffffffff810b01ee ffff8800898c0000
[ 2372.462562] Call Trace:
[ 2372.462584] [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
[ 2372.462619] [<ffffffff816f094a>] dump_stack+0x45/0x56
[ 2372.462642] [<ffffffff810adf4c>] print_unlock_imbalance_bug+0xec/0x100
[ 2372.462677] [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
[ 2372.462710] [<ffffffff810b01ee>] lock_release+0x18e/0x210
[ 2372.462742] [<ffffffffa022cb36>] btrfs_commit_transaction_async+0x1d6/0x2a0 [btrfs]
[ 2372.462783] [<ffffffffa025a7ce>] btrfs_ioctl_start_sync+0x3e/0xc0 [btrfs]
[ 2372.462822] [<ffffffffa025f1d3>] btrfs_ioctl+0x4c3/0x1f70 [btrfs]
[ 2372.462849] [<ffffffff812c0321>] ? avc_has_perm+0x121/0x1b0
[ 2372.462873] [<ffffffff812c0224>] ? avc_has_perm+0x24/0x1b0
[ 2372.462897] [<ffffffff8107ecc8>] ? sched_clock_cpu+0xa8/0x100
[ 2372.462922] [<ffffffff8117b145>] do_vfs_ioctl+0x2e5/0x4e0
[ 2372.462946] [<ffffffff812c19e6>] ? file_has_perm+0x86/0xa0
[ 2372.462969] [<ffffffff8117b3c1>] SyS_ioctl+0x81/0xa0
[ 2372.462991] [<ffffffff817045a4>] tracesys+0xdd/0xe2
====================================================
This happens because we don't do the right thing when checking whether it's OK to tell lockdep that we're trying to release the rwsem. If the trans handle's type is TRANS_ATTACH, we won't acquire the freeze rwsem, but as TRANS_ATTACH fits the check (trans < TRANS_JOIN_NOLOCK), we'll release the freeze rwsem anyway, which makes lockdep complain a lot.
Reported-by: Ma Jianpeng <[email protected]>
Signed-off-by: Liu Bo <[email protected]>
Signed-off-by: Miao Xie <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
apxii
pushed a commit
to apxii/linux
that referenced
this pull request
Jan 4, 2014
Improve the comment of SW workaround for CCM lpm issue using hardware errata description to avoid confusion.
ERR007265: CCM: When improper low-power sequence is used, the SoC enters low power mode before the ARM core executes WFI.
Software workaround:
1) Software should trigger IRQ #32 (IOMUX) to be always pending by setting IOMUX_GPR1_GINT.
2) Software should then unmask IRQ #32 in GPC before setting CCM Low-Power mode.
3) Software should mask IRQ #32 right after CCM Low-Power mode is set (set bits 0-1 of CCM_CLPCR).
Signed-off-by: Anson Huang <[email protected]>
Signed-off-by: Shawn Guo <[email protected]>
johnweber
pushed a commit
to wandboard-org/linux
that referenced
this pull request
Jan 10, 2014
#125 IRQ #125's status is not constant across boards. IRQ #32 is IOMUXC's interrupt, which can be triggered manually at any time, so use this IRQ instead of #125 to generate the interrupt that keeps the CCM from entering low power mode by mistake.
Signed-off-by: Anson Huang <[email protected]>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Feb 26, 2014
When doing some NUMA tests on powerpc, I triggered an oops. I found that it is caused by page->_last_cpupid, which should be initialized to "-1 & LAST_CPUPID_MASK", not "-1". Otherwise, task_numa_fault() misses the check (last_cpupid == (-1 & LAST_CPUPID_MASK)), which finally causes an oops in task_numa_group(), since there are fewer online CPUs than possible CPUs.
PPC needs the LAST_CPUPID_NOT_IN_PAGE_FLAGS case because ppc needs to support a large physical address region, up to 2^46, but a small section size (2^24). So when NR_CPUS grows, it easily ends up in the not-in-page-flags case.
Call trace:
[ 55.978091] SMP NR_CPUS=64 NUMA PowerNV
[ 55.978118] Modules linked in:
[ 55.978145] CPU: 24 PID: 804 Comm: systemd-udevd Not tainted 3.13.0-rc1+ #32
[ 55.978183] task: c000001e2746aa80 ti: c000001e32c50000 task.ti: c000001e32c50000
[ 55.978219] NIP: c0000000000f5ad0 LR: c0000000000f5ac8 CTR: c000000000913cf0
[ 55.978256] REGS: c000001e32c53510 TRAP: 0300 Not tainted (3.13.0-rc1+)
[ 55.978286] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 28024424 XER: 20000000
[ 55.978380] CFAR: c000000000009324 DAR: 7265717569726857 DSISR: 40000000 SOFTE: 1
GPR00: c0000000000f5ac8 c000001e32c53790 c000000001f34338 0000000000000021
GPR04: 0000000000000000 0000000000000031 c000000001f74338 0000ffffffffffff
GPR08: 0000000000000001 7265717569726573 0000000000000000 0000000000000000
GPR12: 0000000028024422 c00000000ffdd800 00000000296b2e64 0000000000000020
GPR16: 0000000000000002 0000000000000003 c000001e2f8e4658 c000001e25c1c1d8
GPR20: c000001e2f8e4000 c000000001f7a858 0000000000000658 0000000040000392
GPR24: 00000000000000a8 c000001e33c1a400 00000000000001d8 c000001e25c1c000
GPR28: c000001e33c37ff0 0007837840000392 000000000000003f c000001e32c53790
[ 55.978903] NIP [c0000000000f5ad0] .task_numa_fault+0x1470/0x2370
[ 55.978934] LR [c0000000000f5ac8] .task_numa_fault+0x1468/0x2370
[ 55.978964] Call Trace:
[ 55.978978] [c000001e32c53790] [c0000000000f5ac8] .task_numa_fault+0x1468/0x2370 (unreliable)
[ 55.979036] [c000001e32c539e0] [c00000000020a820] .do_numa_page+0x480/0x4a0
[ 55.979072] [c000001e32c53b10] [c00000000020bfec] .handle_mm_fault+0x4ec/0xc90
[ 55.979123] [c000001e32c53c00] [c000000000e88c98] .do_page_fault+0x3a8/0x890
[ 55.979161] [c000001e32c53e30] [c000000000009568] handle_page_fault+0x10/0x30
[ 55.979197] Instruction dump:
[ 55.979216] 3c82fefb 3884b138 48d9cff1 60000000 48000574 3c62fef 3863af78 3c82fefb
[ 55.979277] 3884b138 48d9cfd5 60000000 e93f0100 <812902e4> 7d2907b4 5529063e 7d2a07b4
[ 55.979354] ---[ end trace 15f2510da5ae07cf ]---
Signed-off-by: Liu Ping Fan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Mar 3, 2014
I can trigger a lockdep warning:
# mount -t cgroup -o cpuset xxx /cgroup
# mkdir /cgroup/cpuset
# mkdir /cgroup/tmp
# echo 0 > /cgroup/tmp/cpuset.cpus
# echo 0 > /cgroup/tmp/cpuset.mems
# echo 1 > /cgroup/tmp/cpuset.memory_migrate
# echo $$ > /cgroup/tmp/tasks
# echo 1 > /cgroup/tmp/cpuset.mems
===============================
[ INFO: suspicious RCU usage. ]
3.14.0-rc1-0.1-default+ #32 Not tainted
-------------------------------
include/linux/cgroup.h:682 suspicious rcu_dereference_check() usage!
...
[<ffffffff81582174>] dump_stack+0x72/0x86
[<ffffffff810b8f01>] lockdep_rcu_suspicious+0x101/0x140
[<ffffffff81105ba1>] cpuset_migrate_mm+0xb1/0xe0
...
We used to hold cgroup_mutex when calling cpuset_migrate_mm(), but now we hold cpuset_mutex, which causes task_css() to complain.
This is not a false positive but a real issue. Holding cpuset_mutex won't prevent a task from migrating to another cpuset, and it won't prevent the original task->cgroup from being destroyed during this change.
Fixes: 5d21cc2 (cpuset: replace cgroup_mutex locking with cpuset internal locking)
Cc: <[email protected]> # 3.9+
Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
torvalds
pushed a commit
that referenced
this pull request
Mar 5, 2014
When doing some NUMA tests on powerpc, I triggered an oops. I found that it is caused by page->_last_cpupid, which should be initialized to "-1 & LAST_CPUPID_MASK", not "-1". Otherwise, task_numa_fault() misses the check (last_cpupid == (-1 & LAST_CPUPID_MASK)), which finally causes an oops in task_numa_group(), since there are fewer online CPUs than possible CPUs. This happens with CONFIG_SPARSE_VMEMMAP disabled.
Call trace:
SMP NR_CPUS=64 NUMA PowerNV
Modules linked in:
CPU: 24 PID: 804 Comm: systemd-udevd Not tainted 3.13.0-rc1+ #32
task: c000001e2746aa80 ti: c000001e32c50000 task.ti: c000001e32c50000
REGS: c000001e32c53510 TRAP: 0300 Not tainted (3.13.0-rc1+)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 28024424 XER: 20000000
CFAR: c000000000009324 DAR: 7265717569726857 DSISR: 40000000 SOFTE: 1
NIP .task_numa_fault+0x1470/0x2370
LR .task_numa_fault+0x1468/0x2370
Call Trace:
.task_numa_fault+0x1468/0x2370 (unreliable)
.do_numa_page+0x480/0x4a0
.handle_mm_fault+0x4ec/0xc90
.do_page_fault+0x3a8/0x890
handle_page_fault+0x10/0x30
Instruction dump:
3c82fefb 3884b138 48d9cff1 60000000 48000574 3c62fefb 3863af78 3c82fefb
3884b138 48d9cfd5 60000000 e93f0100 <812902e4> 7d2907b4 5529063e 7d2a07b4
---[ end trace 15f2510da5ae07cf ]---
Signed-off-by: Liu Ping Fan <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
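Why "-1" and "-1 & LAST_CPUPID_MASK" behave differently can be seen in a few lines. The sketch assumes the value is kept in a plain int field rather than in a bit field of the page flags, and it uses a made-up 8-bit width; `CPUPID_BITS`, `struct page_sketch` and the helper names are hypothetical.

```c
#define CPUPID_BITS 8                          /* assumed field width */
#define CPUPID_MASK ((1 << CPUPID_BITS) - 1)

/* Hypothetical page descriptor storing the value in a full-width int. */
struct page_sketch {
    int last_cpupid;
};

/* Reset that stores a plain -1: every bit of the int is set. */
static void reset_plain(struct page_sketch *p)
{
    p->last_cpupid = -1;
}

/* Reset that stores the masked sentinel, matching what readers compare against. */
static void reset_masked(struct page_sketch *p)
{
    p->last_cpupid = -1 & CPUPID_MASK;
}

/* The "never set" test described above: compare against the masked sentinel. */
static int is_unset(const struct page_sketch *p)
{
    return p->last_cpupid == (-1 & CPUPID_MASK);
}
```

After `reset_plain()` the "never set" test fails, so a caller would go on to decode the stored value as if it named a real CPU; after `reset_masked()` the test succeeds.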
zeitgeist87
pushed a commit
to zeitgeist87/linux
that referenced
this pull request
Mar 14, 2014
When doing some numa tests on powerpc, I triggered an oops bug. I find
it is caused by using page->_last_cpupid. It should be initialized as
"-1 & LAST_CPUPID_MASK", not "-1". Otherwise, in task_numa_fault(), we
miss the check (last_cpupid == (-1 & LAST_CPUPID_MASK)), and finally
cause an oops bug in task_numa_group(), since the number of online CPUs
is less than the number of possible CPUs. This happens with
CONFIG_SPARSE_VMEMMAP disabled.

Call trace:

[   55.978091] SMP NR_CPUS=64 NUMA PowerNV
[   55.978118] Modules linked in:
[   55.978145] CPU: 24 PID: 804 Comm: systemd-udevd Not tainted 3.13.0-rc1+ #32
[   55.978183] task: c000001e2746aa80 ti: c000001e32c50000 task.ti: c000001e32c50000
[   55.978219] NIP: c0000000000f5ad0 LR: c0000000000f5ac8 CTR: c000000000913cf0
[   55.978256] REGS: c000001e32c53510 TRAP: 0300 Not tainted (3.13.0-rc1+)
[   55.978286] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 28024424 XER: 20000000
[   55.978380] CFAR: c000000000009324 DAR: 7265717569726857 DSISR: 40000000 SOFTE: 1
GPR00: c0000000000f5ac8 c000001e32c53790 c000000001f34338 0000000000000021
GPR04: 0000000000000000 0000000000000031 c000000001f74338 0000ffffffffffff
GPR08: 0000000000000001 7265717569726573 0000000000000000 0000000000000000
GPR12: 0000000028024422 c00000000ffdd800 00000000296b2e64 0000000000000020
GPR16: 0000000000000002 0000000000000003 c000001e2f8e4658 c000001e25c1c1d8
GPR20: c000001e2f8e4000 c000000001f7a858 0000000000000658 0000000040000392
GPR24: 00000000000000a8 c000001e33c1a400 00000000000001d8 c000001e25c1c000
GPR28: c000001e33c37ff0 0007837840000392 000000000000003f c000001e32c53790
[   55.978903] NIP [c0000000000f5ad0] .task_numa_fault+0x1470/0x2370
[   55.978934] LR [c0000000000f5ac8] .task_numa_fault+0x1468/0x2370
[   55.978964] Call Trace:
[   55.978978] [c000001e32c53790] [c0000000000f5ac8] .task_numa_fault+0x1468/0x2370 (unreliable)
[   55.979036] [c000001e32c539e0] [c00000000020a820] .do_numa_page+0x480/0x4a0
[   55.979072] [c000001e32c53b10] [c00000000020bfec] .handle_mm_fault+0x4ec/0xc90
[   55.979123] [c000001e32c53c00] [c000000000e88c98] .do_page_fault+0x3a8/0x890
[   55.979161] [c000001e32c53e30] [c000000000009568] handle_page_fault+0x10/0x30
[   55.979197] Instruction dump:
[   55.979216] 3c82fefb 3884b138 48d9cff1 60000000 48000574 3c62fefb 3863af78 3c82fefb
[   55.979277] 3884b138 48d9cfd5 60000000 e93f0100 <812902e4> 7d2907b4 5529063e 7d2a07b4
[   55.979354] ---[ end trace 15f2510da5ae07cf ]---

Signed-off-by: Liu Ping Fan <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
brianlilly
pushed a commit
to crystalfontz/cfa_10036_kernel
that referenced
this pull request
Mar 26, 2014
commit 4729583 upstream.

I can trigger a lockdep warning:

  # mount -t cgroup -o cpuset xxx /cgroup
  # mkdir /cgroup/cpuset
  # mkdir /cgroup/tmp
  # echo 0 > /cgroup/tmp/cpuset.cpus
  # echo 0 > /cgroup/tmp/cpuset.mems
  # echo 1 > /cgroup/tmp/cpuset.memory_migrate
  # echo $$ > /cgroup/tmp/tasks
  # echo 1 > /cgroup/tmp/cpuset.mems

  ===============================
  [ INFO: suspicious RCU usage. ]
  3.14.0-rc1-0.1-default+ #32 Not tainted
  -------------------------------
  include/linux/cgroup.h:682 suspicious rcu_dereference_check() usage!
  ...
   [<ffffffff81582174>] dump_stack+0x72/0x86
   [<ffffffff810b8f01>] lockdep_rcu_suspicious+0x101/0x140
   [<ffffffff81105ba1>] cpuset_migrate_mm+0xb1/0xe0
  ...

We used to hold cgroup_mutex when calling cpuset_migrate_mm(), but now
we hold cpuset_mutex, which causes task_css() to complain.

This is not a false positive but a real issue. Holding cpuset_mutex
won't prevent a task from migrating to another cpuset, and it won't
prevent the original task->cgroup from being destroyed during this
change.

Fixes: 5d21cc2 (cpuset: replace cgroup_mutex locking with cpuset internal locking)
Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Apr 25, 2014
…ixes

WARNING: please, no spaces at the start of a line
#29: FILE: mm/memcontrol.c:689:
+ int nid = zone_to_nid(zone);$

WARNING: please, no spaces at the start of a line
#30: FILE: mm/memcontrol.c:690:
+ int zid = zone_idx(zone);$

WARNING: please, no spaces at the start of a line
#32: FILE: mm/memcontrol.c:692:
+ return mem_cgroup_zoneinfo(memcg, nid, zid);$

total: 0 errors, 3 warnings, 35 lines checked

./patches/mm-memcontrolc-introduce-helper-mem_cgroup_zoneinfo_zone.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Jianyu Zhan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ddstreet
pushed a commit
to ddstreet/linux
that referenced
this pull request
Apr 26, 2014
…ixes

WARNING: please, no spaces at the start of a line
#29: FILE: mm/memcontrol.c:689:
+ int nid = zone_to_nid(zone);$

WARNING: please, no spaces at the start of a line
#30: FILE: mm/memcontrol.c:690:
+ int zid = zone_idx(zone);$

WARNING: please, no spaces at the start of a line
#32: FILE: mm/memcontrol.c:692:
+ return mem_cgroup_zoneinfo(memcg, nid, zid);$

total: 0 errors, 3 warnings, 35 lines checked

./patches/mm-memcontrolc-introduce-helper-mem_cgroup_zoneinfo_zone.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Jianyu Zhan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
torvalds
pushed a commit
that referenced
this pull request
Apr 27, 2014
The asm-generic, big-endian version of zero_bytemask creates a mask of bytes preceding the first zero-byte by left shifting ~0ul based on the position of the first zero byte. Unfortunately, if the first (top) byte is zero, the output of prep_zero_mask has only the top bit set, resulting in undefined C behaviour as we shift left by an amount equal to the width of the type. As it happens, GCC doesn't manage to spot this through the call to fls(), but the issue remains if architectures choose to implement their shift instructions differently. An example would be arch/arm/ (AArch32), where LSL Rd, Rn, #32 results in Rd == 0x0, whilst on arch/arm64 (AArch64) LSL Xd, Xn, #64 results in Xd == Xn. Rather than check explicitly for the problematic shift, this patch adds an extra shift by 1, replacing fls with __fls. Since zero_bytemask is never called with a zero argument (has_zero() is used to check the data first), we don't need to worry about calling __fls(0), which is undefined. Cc: <[email protected]> Cc: Victor Kamensky <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
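The shift change described in this commit message can be sketched in plain C. This is an illustration under stated assumptions, not the kernel's code: `sketch_highest_bit_index()` is a hypothetical portable stand-in for `__fls()` (0-based index of the highest set bit, undefined for 0), and `sketch_zero_bytemask()` follows the "extra shift by 1" idea by starting from `~1ul` instead of `~0ul`, so the shift count never reaches the width of the type.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical stand-in for __fls(): 0-based index of the highest set
 * bit. As with __fls(), the caller must not pass 0. */
static int sketch_highest_bit_index(unsigned long x)
{
	int idx = 0;

	while ((x >>= 1) != 0)
		idx++;
	return idx;
}

/* Mask of all bits above the highest set bit of 'mask'.
 *
 * A 1-based fls() would give a shift count of SKETCH_BITS_PER_LONG when
 * only the top bit is set, and "~0ul << width" is undefined in C.
 * Using the 0-based index and pre-shifting the constant by one
 * (~1ul == ~0ul << 1) keeps the count in [0, width - 1]. */
static unsigned long sketch_zero_bytemask(unsigned long mask)
{
	return ~1ul << sketch_highest_bit_index(mask);
}
```

When only the top bit is set, the sketch shifts by `SKETCH_BITS_PER_LONG - 1`, which is well defined for an unsigned operand and yields 0, meaning no bytes precede the first zero byte.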
swarren
pushed a commit
to swarren/linux-tegra
that referenced
this pull request
Apr 28, 2014
…ixes

WARNING: please, no spaces at the start of a line
#29: FILE: mm/memcontrol.c:689:
+ int nid = zone_to_nid(zone);$

WARNING: please, no spaces at the start of a line
#30: FILE: mm/memcontrol.c:690:
+ int zid = zone_idx(zone);$

WARNING: please, no spaces at the start of a line
#32: FILE: mm/memcontrol.c:692:
+ return mem_cgroup_zoneinfo(memcg, nid, zid);$

total: 0 errors, 3 warnings, 35 lines checked

./patches/mm-memcontrolc-introduce-helper-mem_cgroup_zoneinfo_zone.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Jianyu Zhan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ddstreet
pushed a commit
to ddstreet/linux
that referenced
this pull request
May 2, 2014
…ixes

WARNING: please, no spaces at the start of a line
#29: FILE: mm/memcontrol.c:689:
+ int nid = zone_to_nid(zone);$

WARNING: please, no spaces at the start of a line
#30: FILE: mm/memcontrol.c:690:
+ int zid = zone_idx(zone);$

WARNING: please, no spaces at the start of a line
#32: FILE: mm/memcontrol.c:692:
+ return mem_cgroup_zoneinfo(memcg, nid, zid);$

total: 0 errors, 3 warnings, 35 lines checked

./patches/mm-memcontrolc-introduce-helper-mem_cgroup_zoneinfo_zone.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Jianyu Zhan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
bjackman
pushed a commit
to bjackman/linux
that referenced
this pull request
Jun 24, 2025
WARNING: Misordered MAINTAINERS entry - list file patterns in alphabetic order
#32: FILE: MAINTAINERS:15860:
 F: mm/oom_kill.c
+F: include/linux/oom.h

Cc: David Rientjes <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
intel-lab-lkp
pushed a commit
to intel-lab-lkp/linux
that referenced
this pull request
Jun 24, 2025
Without the change `perf` hangs on character devices. On my system
it's enough to run a system-wide sampler for a few seconds to get the
hangup:
$ perf record -a -g --call-graph=dwarf
$ perf report
# hung
`strace` shows that the hangup happens while reading from a character
device, `/dev/dri/renderD128`:
$ strace -y -f -p 2780484
strace: Process 2780484 attached
pread64(101</dev/dri/renderD128>, strace: Process 2780484 detached
Its call trace descends into `elfutils`:
$ gdb -p 2780484
(gdb) bt
#0 0x00007f5e508f04b7 in __libc_pread64 (fd=101, buf=0x7fff9df7edb0, count=0, offset=0)
at ../sysdeps/unix/sysv/linux/pread64.c:25
#1 0x00007f5e52b79515 in read_file () from /<<NIX>>/elfutils-0.192/lib/libelf.so.1
#2 0x00007f5e52b25666 in libdw_open_elf () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#3 0x00007f5e52b25907 in __libdw_open_file () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#4 0x00007f5e52b120a9 in dwfl_report_elf@@ELFUTILS_0.156 ()
from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#5 0x000000000068bf20 in __report_module (al=al@entry=0x7fff9df80010, ip=ip@entry=139803237033216, ui=ui@entry=0x5369b5e0)
at util/dso.h:537
#6 0x000000000068c3d1 in report_module (ip=139803237033216, ui=0x5369b5e0) at util/unwind-libdw.c:114
#7 frame_callback (state=0x535aef10, arg=0x5369b5e0) at util/unwind-libdw.c:242
#8 0x00007f5e52b261d3 in dwfl_thread_getframes () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#9 0x00007f5e52b25bdb in get_one_thread_cb () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#10 0x00007f5e52b25faa in dwfl_getthreads () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#11 0x00007f5e52b26514 in dwfl_getthread_frames () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#12 0x000000000068c6ce in unwind__get_entries (cb=cb@entry=0x5d4620 <unwind_entry>, arg=arg@entry=0x10cd5fa0,
thread=thread@entry=0x1076a290, data=data@entry=0x7fff9df80540, max_stack=max_stack@entry=127,
best_effort=best_effort@entry=false) at util/thread.h:152
#13 0x00000000005dae95 in thread__resolve_callchain_unwind (evsel=0x106006d0, thread=0x1076a290, cursor=0x10cd5fa0,
sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2939
#14 thread__resolve_callchain_unwind (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, sample=0x7fff9df80540,
max_stack=127, symbols=true) at util/machine.c:2920
#15 __thread__resolve_callchain (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, evsel@entry=0x7fff9df80440,
sample=0x7fff9df80540, parent=parent@entry=0x7fff9df804a0, root_al=root_al@entry=0x7fff9df80440, max_stack=127, symbols=true)
at util/machine.c:2970
#16 0x00000000005d0cb2 in thread__resolve_callchain (thread=<optimized out>, cursor=<optimized out>, evsel=0x7fff9df80440,
sample=<optimized out>, parent=0x7fff9df804a0, root_al=0x7fff9df80440, max_stack=127) at util/machine.h:198
#17 sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fff9df804a0,
evsel=evsel@entry=0x106006d0, al=al@entry=0x7fff9df80440, max_stack=max_stack@entry=127) at util/callchain.c:1127
#18 0x0000000000617e08 in hist_entry_iter__add (iter=iter@entry=0x7fff9df80480, al=al@entry=0x7fff9df80440, max_stack_depth=127,
arg=arg@entry=0x7fff9df81ae0) at util/hist.c:1255
#19 0x000000000045d2d0 in process_sample_event (tool=0x7fff9df81ae0, event=<optimized out>, sample=0x7fff9df80540,
evsel=0x106006d0, machine=<optimized out>) at builtin-report.c:334
#20 0x00000000005e3bb1 in perf_session__deliver_event (session=0x105ff2c0, event=0x7f5c7d735ca0, tool=0x7fff9df81ae0,
file_offset=2914716832, file_path=0x105ffbf0 "perf.data") at util/session.c:1367
#21 0x00000000005e8d93 in do_flush (oe=0x105ffa50, show_progress=false) at util/ordered-events.c:245
#22 __ordered_events__flush (oe=0x105ffa50, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:324
#23 0x00000000005e1f64 in perf_session__process_user_event (session=0x105ff2c0, event=0x7f5c7d752b18, file_offset=2914835224,
file_path=0x105ffbf0 "perf.data") at util/session.c:1419
#24 0x00000000005e47c7 in reader__read_event (rd=rd@entry=0x7fff9df81260, session=session@entry=0x105ff2c0,
prog=prog@entry=0x7fff9df81220) at util/session.c:2132
#25 0x00000000005e4b37 in reader__process_events (rd=0x7fff9df81260, session=0x105ff2c0, prog=0x7fff9df81220)
at util/session.c:2181
#26 __perf_session__process_events (session=0x105ff2c0) at util/session.c:2226
#27 perf_session__process_events (session=session@entry=0x105ff2c0) at util/session.c:2390
#28 0x0000000000460add in __cmd_report (rep=0x7fff9df81ae0) at builtin-report.c:1076
#29 cmd_report (argc=<optimized out>, argv=<optimized out>) at builtin-report.c:1827
#30 0x00000000004c5a40 in run_builtin (p=p@entry=0xd8f7f8 <commands+312>, argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0)
at perf.c:351
#31 0x00000000004c5d63 in handle_internal_command (argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:404
#32 0x0000000000442de3 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:448
#33 main (argc=<optimized out>, argv=0x7fff9df844b0) at perf.c:556
The hangup happens because nothing in `perf` or `elfutils` checks if a
mapped file is easily readable.
The change conservatively skips all non-regular files.
Signed-off-by: Sergei Trofimovich <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
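The commit message above says the change conservatively skips all non-regular files. A minimal sketch of that kind of guard, assuming a POSIX system, could look like the following. The helper name `sketch_is_regular_file()` is hypothetical and is not taken from the perf sources; the actual patch may perform the check differently.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper: returns true only when 'path' names a regular
 * file. Character devices such as /dev/dri/renderD128, directories,
 * FIFOs and sockets are rejected, and so is any path that cannot be
 * stat()ed at all. */
static bool sketch_is_regular_file(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return false;
	return S_ISREG(st.st_mode);
}
```

A caller would test the path before handing the mapping to the unwinder and skip it when the helper returns false, which avoids issuing a blocking read on a device node.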
bjackman
pushed a commit
to bjackman/linux
that referenced
this pull request
Jun 25, 2025
WARNING: Misordered MAINTAINERS entry - list file patterns in alphabetic order
#32: FILE: MAINTAINERS:15860:
 F: mm/oom_kill.c
+F: include/linux/oom.h

Cc: David Rientjes <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
heftig
pushed a commit
to archlinux/linux
that referenced
this pull request
Jun 27, 2025
[ Upstream commit eedf3e3 ]

ACPICA commit 1c28da2242783579d59767617121035dafba18c3

This was originally done in NetBSD:
NetBSD/src@b69d1ac
and is the correct alternative to the smattering of `memcpy`s I
previously contributed to this repository.

This also sidesteps the newly strict checks added in UBSAN:
llvm/llvm-project@7926744

Before this change we see the following UBSAN stack trace in Fuchsia:

#0 0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e
#1.2 0x000021982bc4af3c in ubsan_get_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:41 <libclang_rt.asan.so>+0x41f3c
#1.1 0x000021982bc4af3c in maybe_print_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:51 <libclang_rt.asan.so>+0x41f3c
#1 0x000021982bc4af3c in ~scoped_report() compiler-rt/lib/ubsan/ubsan_diag.cpp:395 <libclang_rt.asan.so>+0x41f3c
#2 0x000021982bc4bb6f in handletype_mismatch_impl() compiler-rt/lib/ubsan/ubsan_handlers.cpp:137 <libclang_rt.asan.so>+0x42b6f
#3 0x000021982bc4b723 in __ubsan_handle_type_mismatch_v1 compiler-rt/lib/ubsan/ubsan_handlers.cpp:142 <libclang_rt.asan.so>+0x42723
#4 0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e
#5 0x000021afcfdf2089 in acpi_rs_convert_aml_to_resource(struct acpi_resource*, union aml_resource*, struct acpi_rsconvert_info*) ../../third_party/acpica/source/components/resources/rsmisc.c:355 <platform-bus-x86.so>+0x6b2089
#6 0x000021afcfded169 in acpi_rs_convert_aml_to_resources(u8*, u32, u32, u8, void**) ../../third_party/acpica/source/components/resources/rslist.c:137 <platform-bus-x86.so>+0x6ad169
#7 0x000021afcfe2d24a in acpi_ut_walk_aml_resources(struct acpi_walk_state*, u8*, acpi_size, acpi_walk_aml_callback, void**) ../../third_party/acpica/source/components/utilities/utresrc.c:237 <platform-bus-x86.so>+0x6ed24a
#8 0x000021afcfde66b7 in acpi_rs_create_resource_list(union acpi_operand_object*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rscreate.c:199 <platform-bus-x86.so>+0x6a66b7
#9 0x000021afcfdf6979 in acpi_rs_get_method_data(acpi_handle, const char*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rsutils.c:770 <platform-bus-x86.so>+0x6b6979
#10 0x000021afcfdf708f in acpi_walk_resources(acpi_handle, char*, acpi_walk_resource_callback, void*) ../../third_party/acpica/source/components/resources/rsxface.c:731 <platform-bus-x86.so>+0x6b708f
#11 0x000021afcfa95dcf in acpi::acpi_impl::walk_resources(acpi::acpi_impl*, acpi_handle, const char*, acpi::Acpi::resources_callable) ../../src/devices/board/lib/acpi/acpi-impl.cc:41 <platform-bus-x86.so>+0x355dcf
#12 0x000021afcfaa8278 in acpi::device_builder::gather_resources(acpi::device_builder*, acpi::Acpi*, fidl::any_arena&, acpi::Manager*, acpi::device_builder::gather_resources_callback) ../../src/devices/board/lib/acpi/device-builder.cc:84 <platform-bus-x86.so>+0x368278
#13 0x000021afcfbddb87 in acpi::Manager::configure_discovered_devices(acpi::Manager*) ../../src/devices/board/lib/acpi/manager.cc:75 <platform-bus-x86.so>+0x49db87
#14 0x000021afcf99091d in publish_acpi_devices(acpi::Manager*, zx_device_t*, zx_device_t*) ../../src/devices/board/drivers/x86/acpi-nswalk.cc:95 <platform-bus-x86.so>+0x25091d
#15 0x000021afcf9c1d4e in x86::X86::do_init(x86::X86*) ../../src/devices/board/drivers/x86/x86.cc:60 <platform-bus-x86.so>+0x281d4e
#16 0x000021afcf9e33ad in λ(x86::X86::ddk_init::(anon class)*) ../../src/devices/board/drivers/x86/x86.cc:77 <platform-bus-x86.so>+0x2a33ad
#17 0x000021afcf9e313e in fit::internal::target<(lambda at ../../src/devices/board/drivers/x86/x86.cc:76:19), false, false, std::__2::allocator<std::byte>, void>::invoke(void*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:183 <platform-bus-x86.so>+0x2a313e
#18 0x000021afcfbab4c7 in fit::internal::function_base<16UL, false, void(), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <platform-bus-x86.so>+0x46b4c7
#19 0x000021afcfbab342 in fit::function_impl<16UL, false, void(), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/function.h:315 <platform-bus-x86.so>+0x46b342
#20 0x000021afcfcd98c3 in async::internal::retained_task::Handler(async_dispatcher_t*, async_task_t*, zx_status_t) ../../sdk/lib/async/task.cc:24 <platform-bus-x86.so>+0x5998c3
#21 0x00002290f9924616 in λ(const driver_runtime::Dispatcher::post_task::(anon class)*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/dispatcher.cc:789 <libdriver_runtime.so>+0x10a616
#22 0x00002290f9924323 in fit::internal::target<(lambda at ../../src/devices/bin/driver_runtime/dispatcher.cc:788:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int>::invoke(void*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0x10a323
#23 0x00002290f9904b76 in fit::internal::function_base<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xeab76
#24 0x00002290f9904831 in fit::callback_impl<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::operator()(fit::callback_impl<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/function.h:471 <libdriver_runtime.so>+0xea831
#25 0x00002290f98d5adc in driver_runtime::callback_request::Call(driver_runtime::callback_request*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/callback_request.h:74 <libdriver_runtime.so>+0xbbadc
#26 0x00002290f98e1e58 in driver_runtime::Dispatcher::dispatch_callback(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >) ../../src/devices/bin/driver_runtime/dispatcher.cc:1248 <libdriver_runtime.so>+0xc7e58
#27 0x00002290f98e4159 in driver_runtime::Dispatcher::dispatch_callbacks(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:1308 <libdriver_runtime.so>+0xca159
#28 0x00002290f9918414 in λ(const driver_runtime::Dispatcher::create_with_adder::(anon class)*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:353 <libdriver_runtime.so>+0xfe414
#29 0x00002290f991812d in fit::internal::target<(lambda at ../../src/devices/bin/driver_runtime/dispatcher.cc:351:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>>::invoke(void*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0xfe12d
#30 0x00002290f9906fc7 in fit::internal::function_base<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xecfc7
#31 0x00002290f9906c66 in fit::function_impl<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/function.h:315 <libdriver_runtime.so>+0xecc66
#32 0x00002290f98e73d9 in driver_runtime::Dispatcher::event_waiter::invoke_callback(driver_runtime::Dispatcher::event_waiter*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.h:543 <libdriver_runtime.so>+0xcd3d9
#33 0x00002290f98e700d in driver_runtime::Dispatcher::event_waiter::handle_event(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/dispatcher.cc:1442 <libdriver_runtime.so>+0xcd00d
#34 0x00002290f9918983 in async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event(async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>*, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/async_loop_owned_event_handler.h:59 <libdriver_runtime.so>+0xfe983
#35 0x00002290f9918b9e in async::wait_method<async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>, &async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event>::call_handler(async_dispatcher_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async/include/lib/async/cpp/wait.h:201 <libdriver_runtime.so>+0xfeb9e
#36 0x00002290f99bf509 in async_loop_dispatch_wait(async_loop_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async-loop/loop.c:394 <libdriver_runtime.so>+0x1a5509
#37 0x00002290f99b9958 in async_loop_run_once(async_loop_t*, zx_time_t) ../../sdk/lib/async-loop/loop.c:343 <libdriver_runtime.so>+0x19f958
#38 0x00002290f99b9247 in async_loop_run(async_loop_t*, zx_time_t, _Bool) ../../sdk/lib/async-loop/loop.c:301 <libdriver_runtime.so>+0x19f247
#39 0x00002290f99ba962 in async_loop_run_thread(void*) ../../sdk/lib/async-loop/loop.c:860 <libdriver_runtime.so>+0x1a0962
#40 0x000041afd176ef30 in start_c11(void*) ../../zircon/third_party/ulib/musl/pthread/pthread_create.c:63 <libc.so>+0x84f30
#41 0x000041afd18a448d in thread_trampoline(uintptr_t, uintptr_t) ../../zircon/system/ulib/runtime/thread.cc:100 <libc.so>+0x1ba48d

Link: acpica/acpica@1c28da22
Signed-off-by: Rafael J. Wysocki <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Tamir Duberstein <[email protected]>
[ rjw: Pick up the tag from Tamir ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
mj22226
pushed a commit
to mj22226/linux
that referenced
this pull request
Jul 4, 2025
[ Upstream commit eedf3e3 ]

ACPICA commit 1c28da2242783579d59767617121035dafba18c3

This was originally done in NetBSD:
NetBSD/src@b69d1ac
and is the correct alternative to the smattering of `memcpy`s I
previously contributed to this repository.

This also sidesteps the newly strict checks added in UBSAN:
llvm/llvm-project@7926744

Before this change we see the following UBSAN stack trace in Fuchsia:

#0 0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e
#1.2 0x000021982bc4af3c in ubsan_get_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:41 <libclang_rt.asan.so>+0x41f3c
#1.1 0x000021982bc4af3c in maybe_print_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:51 <libclang_rt.asan.so>+0x41f3c
#1 0x000021982bc4af3c in ~scoped_report() compiler-rt/lib/ubsan/ubsan_diag.cpp:395 <libclang_rt.asan.so>+0x41f3c
#2 0x000021982bc4bb6f in handletype_mismatch_impl() compiler-rt/lib/ubsan/ubsan_handlers.cpp:137 <libclang_rt.asan.so>+0x42b6f
#3 0x000021982bc4b723 in __ubsan_handle_type_mismatch_v1 compiler-rt/lib/ubsan/ubsan_handlers.cpp:142 <libclang_rt.asan.so>+0x42723
#4 0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e
#5 0x000021afcfdf2089 in acpi_rs_convert_aml_to_resource(struct acpi_resource*, union aml_resource*, struct acpi_rsconvert_info*) ../../third_party/acpica/source/components/resources/rsmisc.c:355 <platform-bus-x86.so>+0x6b2089
#6 0x000021afcfded169 in acpi_rs_convert_aml_to_resources(u8*, u32, u32, u8, void**) ../../third_party/acpica/source/components/resources/rslist.c:137 <platform-bus-x86.so>+0x6ad169
#7 0x000021afcfe2d24a in acpi_ut_walk_aml_resources(struct acpi_walk_state*, u8*, acpi_size, acpi_walk_aml_callback, void**) ../../third_party/acpica/source/components/utilities/utresrc.c:237 <platform-bus-x86.so>+0x6ed24a
#8 0x000021afcfde66b7 in acpi_rs_create_resource_list(union acpi_operand_object*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rscreate.c:199 <platform-bus-x86.so>+0x6a66b7
#9 0x000021afcfdf6979 in acpi_rs_get_method_data(acpi_handle, const char*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rsutils.c:770 <platform-bus-x86.so>+0x6b6979
#10 0x000021afcfdf708f in acpi_walk_resources(acpi_handle, char*, acpi_walk_resource_callback, void*) ../../third_party/acpica/source/components/resources/rsxface.c:731 <platform-bus-x86.so>+0x6b708f
#11 0x000021afcfa95dcf in acpi::acpi_impl::walk_resources(acpi::acpi_impl*, acpi_handle, const char*, acpi::Acpi::resources_callable) ../../src/devices/board/lib/acpi/acpi-impl.cc:41 <platform-bus-x86.so>+0x355dcf
#12 0x000021afcfaa8278 in acpi::device_builder::gather_resources(acpi::device_builder*, acpi::Acpi*, fidl::any_arena&, acpi::Manager*, acpi::device_builder::gather_resources_callback) ../../src/devices/board/lib/acpi/device-builder.cc:84 <platform-bus-x86.so>+0x368278
#13 0x000021afcfbddb87 in acpi::Manager::configure_discovered_devices(acpi::Manager*) ../../src/devices/board/lib/acpi/manager.cc:75 <platform-bus-x86.so>+0x49db87
#14 0x000021afcf99091d in publish_acpi_devices(acpi::Manager*, zx_device_t*, zx_device_t*) ../../src/devices/board/drivers/x86/acpi-nswalk.cc:95 <platform-bus-x86.so>+0x25091d
#15 0x000021afcf9c1d4e in x86::X86::do_init(x86::X86*) ../../src/devices/board/drivers/x86/x86.cc:60 <platform-bus-x86.so>+0x281d4e
#16 0x000021afcf9e33ad in λ(x86::X86::ddk_init::(anon class)*) ../../src/devices/board/drivers/x86/x86.cc:77 <platform-bus-x86.so>+0x2a33ad
#17 0x000021afcf9e313e in fit::internal::target<(lambda at ../../src/devices/board/drivers/x86/x86.cc:76:19), false, false, std::__2::allocator<std::byte>, void>::invoke(void*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:183 <platform-bus-x86.so>+0x2a313e
#18 0x000021afcfbab4c7 in fit::internal::function_base<16UL, false, void(), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <platform-bus-x86.so>+0x46b4c7
#19 0x000021afcfbab342 in fit::function_impl<16UL, false, void(), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/function.h:315 <platform-bus-x86.so>+0x46b342
#20 0x000021afcfcd98c3 in async::internal::retained_task::Handler(async_dispatcher_t*, async_task_t*, zx_status_t) ../../sdk/lib/async/task.cc:24 <platform-bus-x86.so>+0x5998c3
#21 0x00002290f9924616 in λ(const driver_runtime::Dispatcher::post_task::(anon class)*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/dispatcher.cc:789 <libdriver_runtime.so>+0x10a616
#22 0x00002290f9924323 in fit::internal::target<(lambda at ../../src/devices/bin/driver_runtime/dispatcher.cc:788:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int>::invoke(void*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0x10a323
#23 0x00002290f9904b76 in fit::internal::function_base<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xeab76
#24 0x00002290f9904831 in fit::callback_impl<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::operator()(fit::callback_impl<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/function.h:471 <libdriver_runtime.so>+0xea831
#25 0x00002290f98d5adc in driver_runtime::callback_request::Call(driver_runtime::callback_request*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/callback_request.h:74 <libdriver_runtime.so>+0xbbadc
#26 0x00002290f98e1e58 in driver_runtime::Dispatcher::dispatch_callback(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >) ../../src/devices/bin/driver_runtime/dispatcher.cc:1248 <libdriver_runtime.so>+0xc7e58
#27 0x00002290f98e4159 in driver_runtime::Dispatcher::dispatch_callbacks(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>)
../../src/devices/bin/driver_runtime/dispatcher.cc:1308 <libdriver_runtime.so>+0xca159 torvalds#28 0x00002290f9918414 in λ(const driver_runtime::Dispatcher::create_with_adder::(anon class)*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:353 <libdriver_runtime.so>+0xfe414 torvalds#29 0x00002290f991812d in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:351:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>>::invoke(void*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0xfe12d torvalds#30 0x00002290f9906fc7 in fit::internal::function_base<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xecfc7 torvalds#31 0x00002290f9906c66 in fit::function_impl<8UL, true, 
void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/function.h:315 <libdriver_runtime.so>+0xecc66 torvalds#32 0x00002290f98e73d9 in driver_runtime::Dispatcher::event_waiter::invoke_callback(driver_runtime::Dispatcher::event_waiter*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.h:543 <libdriver_runtime.so>+0xcd3d9 torvalds#33 0x00002290f98e700d in driver_runtime::Dispatcher::event_waiter::handle_event(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/dispatcher.cc:1442 <libdriver_runtime.so>+0xcd00d torvalds#34 0x00002290f9918983 in async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event(async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>*, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/async_loop_owned_event_handler.h:59 <libdriver_runtime.so>+0xfe983 torvalds#35 0x00002290f9918b9e in 
async::wait_method<async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>, &async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event>::call_handler(async_dispatcher_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async/include/lib/async/cpp/wait.h:201 <libdriver_runtime.so>+0xfeb9e torvalds#36 0x00002290f99bf509 in async_loop_dispatch_wait(async_loop_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async-loop/loop.c:394 <libdriver_runtime.so>+0x1a5509 torvalds#37 0x00002290f99b9958 in async_loop_run_once(async_loop_t*, zx_time_t) ../../sdk/lib/async-loop/loop.c:343 <libdriver_runtime.so>+0x19f958 torvalds#38 0x00002290f99b9247 in async_loop_run(async_loop_t*, zx_time_t, _Bool) ../../sdk/lib/async-loop/loop.c:301 <libdriver_runtime.so>+0x19f247 torvalds#39 0x00002290f99ba962 in async_loop_run_thread(void*) ../../sdk/lib/async-loop/loop.c:860 <libdriver_runtime.so>+0x1a0962 torvalds#40 0x000041afd176ef30 in start_c11(void*) ../../zircon/third_party/ulib/musl/pthread/pthread_create.c:63 <libc.so>+0x84f30 torvalds#41 0x000041afd18a448d in thread_trampoline(uintptr_t, uintptr_t) ../../zircon/system/ulib/runtime/thread.cc:100 <libc.so>+0x1ba48d Link: acpica/acpica@1c28da22 Signed-off-by: Rafael J. Wysocki <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Tamir Duberstein <[email protected]> [ rjw: Pick up the tag from Tamir ] Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
sys-i915-oscijenkins
pushed a commit
to intel-lgci-fdo-gitlab-mirror/drm.xe.kernel
that referenced
this pull request
Sep 30, 2025
Before calling ttm_bo_populate() in the CPU fault path of a bo,
we assert that the bo is not being migrated. However, for
local bos we share the reservation object with other local bos
that might be in the process of being migrated. Also some VM
operations may attach USAGE_KERNEL fences to the common
reservation object and trigger false positives from the assert.
So remove the assert and instead wait for bo idle. This may
unnecessarily wait for idle in some cases but since we're
doing this wait later in the fault path anyway we might as
well do it here as well.
This fixes warnings like:
Sep 25 14:56:23 desky kernel: ------------[ cut here ]------------
Sep 25 14:56:23 desky kernel: xe 0000:03:00.0: [drm] Assertion `dma_resv_test_signaled(tbo->base.resv, DMA_RESV_USAGE_KERNEL) || (tbo->ttm && ttm_tt_is_populated(tbo->ttm))` failed!
platform: BATTLEMAGE subplatform: 1
graphics: Xe2_HPG 20.01 step A0
media: Xe2_HPM 13.01 step A1
Sep 25 14:56:23 desky kernel: WARNING: CPU: 6 PID: 24767 at drivers/gpu/drm/xe/xe_bo.c:1748 xe_bo_fault_migrate+0x1bb/0x300 [xe]
Sep 25 14:56:23 desky kernel: Modules linked in: cpuid dm_crypt xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc xfrm_user xfr>
Sep 25 14:56:23 desky kernel: snd_soc_sdca snd_seq_midi prime_numbers coretemp snd_seq_midi_event drm_ttm_helper snd_hda_codec drm_buddy drm_exec snd_rawmidi snd_soc_core snd_hda_cor>
Sep 25 14:56:23 desky kernel: CPU: 6 UID: 1000 PID: 24767 Comm: steamwebhelper Tainted: G U W 6.17.0-rc7+ #32 PREEMPT(voluntary)
Sep 25 14:56:23 desky kernel: Tainted: [U]=USER, [W]=WARN
Sep 25 14:56:23 desky kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D36/PRO Z690-P DDR4 (MS-7D36), BIOS A.A1 10/18/2022
Sep 25 14:56:23 desky kernel: RIP: 0010:xe_bo_fault_migrate+0x1bb/0x300 [xe]
Sep 25 14:56:23 desky kernel: Code: fa 64 29 f9 48 c7 c7 40 e0 d3 c1 51 48 c7 c1 c0 e3 d3 c1 52 4c 8b 45 c0 41 50 44 8b 4d c8 4d 89 e0 48 8b 55 a8 e8 25 27 95 ef <0f> 0b 48 83 c4 40 4>
Sep 25 14:56:23 desky kernel: RSP: 0000:ffffae1ca88c7b10 EFLAGS: 00010286
Sep 25 14:56:23 desky kernel: RAX: 0000000000000000 RBX: ffff8d7cfd7e6800 RCX: 0000000000000027
Sep 25 14:56:23 desky kernel: RDX: ffff8d845019cec8 RSI: 0000000000000001 RDI: ffff8d845019cec0
Sep 25 14:56:23 desky kernel: RBP: ffffae1ca88c7bc8 R08: 0000000000000000 R09: 0000000000000000
Sep 25 14:56:23 desky kernel: R10: 0000000000000000 R11: 0000000000000004 R12: ffffffffc1db1faa
Sep 25 14:56:23 desky kernel: R13: ffffffffc1db2ab4 R14: 0000000000000001 R15: ffffae1ca88c7bd8
Sep 25 14:56:23 desky kernel: FS: 00007fb1baf31940(0000) GS:ffff8d849c870000(0000) knlGS:0000000000000000
Sep 25 14:56:23 desky kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 25 14:56:23 desky kernel: CR2: 00007fb1b2860020 CR3: 00000001705a9004 CR4: 0000000000772ef0
Sep 25 14:56:23 desky kernel: PKRU: 55555558
Sep 25 14:56:23 desky kernel: Call Trace:
Sep 25 14:56:23 desky kernel: <TASK>
Sep 25 14:56:23 desky kernel: xe_bo_cpu_fault_fastpath+0x11e/0x220 [xe]
Sep 25 14:56:23 desky kernel: xe_bo_cpu_fault+0x84/0x410 [xe]
Sep 25 14:56:23 desky kernel: ? __x64_sys_mmap+0x33/0x50
Sep 25 14:56:23 desky kernel: ? x64_sys_call+0x1b2e/0x20d0
Sep 25 14:56:23 desky kernel: ? do_syscall_64+0x9d/0x1f0
Sep 25 14:56:23 desky kernel: ? __check_object_size+0x4a/0x2e0
Sep 25 14:56:23 desky kernel: __do_fault+0x36/0x190
Sep 25 14:56:23 desky kernel: do_fault+0xcf/0x570
Sep 25 14:56:23 desky kernel: __handle_mm_fault+0x92b/0xfe0
Sep 25 14:56:23 desky kernel: ? ktime_get_mono_fast_ns+0x39/0xd0
Sep 25 14:56:23 desky kernel: handle_mm_fault+0x164/0x2c0
Sep 25 14:56:23 desky kernel: do_user_addr_fault+0x2cb/0x840
Sep 25 14:56:23 desky kernel: exc_page_fault+0x75/0x180
Sep 25 14:56:23 desky kernel: asm_exc_page_fault+0x27/0x30
Sep 25 14:56:23 desky kernel: RIP: 0033:0x7fb1bc388bb7
Sep 25 14:56:23 desky kernel: Code: 48 ff c7 48 01 fe 48 8d 54 11 80 0f 1f 84 00 00 00 00 00 c5 fe 6f 0e c5 fe 6f 56 20 c5 fe 6f 5e 40 c5 fe 6f 66 60 48 83 ee 80 <c5> fd 7f 0f c5 fd 7>
Sep 25 14:56:23 desky kernel: RSP: 002b:00007ffd7814fad8 EFLAGS: 00010207
Sep 25 14:56:23 desky kernel: RAX: 00007fb1b2860000 RBX: 0000000000000690 RCX: 00007fb1b2860000
Sep 25 14:56:23 desky kernel: RDX: 00007fb1b2860610 RSI: 0000556eda79f4c0 RDI: 00007fb1b2860020
Sep 25 14:56:23 desky kernel: RBP: 00007ffd7814fb60 R08: 0000000000000000 R09: 000000012be0e000
Sep 25 14:56:23 desky kernel: R10: 00007fb1b2860000 R11: 0000000000000246 R12: 0000556edd39a240
Sep 25 14:56:23 desky kernel: R13: 00007fb1b2dcb010 R14: 0000556eda79f420 R15: 0000000000000000
Sep 25 14:56:23 desky kernel: </TASK>
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5250
Fixes: c2ae94c ("drm/xe: Convert the CPU fault handler for exhaustive eviction")
Cc: Matthew Brost <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
intel-lab-lkp
pushed a commit
to intel-lab-lkp/linux
that referenced
this pull request
Oct 3, 2025
Before calling ttm_bo_populate() in the CPU fault path of a bo,
we assert that the bo is not being migrated. However, for
local bos we share the reservation object with other local bos
that might be in the process of being migrated. Also some VM
operations may attach USAGE_KERNEL fences to the common
reservation object and trigger false positives from the assert.
So remove the assert and instead wait for bo idle. This may
unnecessarily wait for idle in some cases but since we're
doing this wait later in the fault path anyway we might as
well do it here as well.
This fixes warnings like:
Sep 25 14:56:23 desky kernel: ------------[ cut here ]------------
Sep 25 14:56:23 desky kernel: xe 0000:03:00.0: [drm] Assertion `dma_resv_test_signaled(tbo->base.resv, DMA_RESV_USAGE_KERNEL) || (tbo->ttm && ttm_tt_is_populated(tbo->ttm))` failed!
platform: BATTLEMAGE subplatform: 1
graphics: Xe2_HPG 20.01 step A0
media: Xe2_HPM 13.01 step A1
Sep 25 14:56:23 desky kernel: WARNING: CPU: 6 PID: 24767 at drivers/gpu/drm/xe/xe_bo.c:1748 xe_bo_fault_migrate+0x1bb/0x300 [xe]
Sep 25 14:56:23 desky kernel: Modules linked in: cpuid dm_crypt xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc xfrm_user xfr>
Sep 25 14:56:23 desky kernel: snd_soc_sdca snd_seq_midi prime_numbers coretemp snd_seq_midi_event drm_ttm_helper snd_hda_codec drm_buddy drm_exec snd_rawmidi snd_soc_core snd_hda_cor>
Sep 25 14:56:23 desky kernel: CPU: 6 UID: 1000 PID: 24767 Comm: steamwebhelper Tainted: G U W 6.17.0-rc7+ #32 PREEMPT(voluntary)
Sep 25 14:56:23 desky kernel: Tainted: [U]=USER, [W]=WARN
Sep 25 14:56:23 desky kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D36/PRO Z690-P DDR4 (MS-7D36), BIOS A.A1 10/18/2022
Sep 25 14:56:23 desky kernel: RIP: 0010:xe_bo_fault_migrate+0x1bb/0x300 [xe]
Sep 25 14:56:23 desky kernel: Code: fa 64 29 f9 48 c7 c7 40 e0 d3 c1 51 48 c7 c1 c0 e3 d3 c1 52 4c 8b 45 c0 41 50 44 8b 4d c8 4d 89 e0 48 8b 55 a8 e8 25 27 95 ef <0f> 0b 48 83 c4 40 4>
Sep 25 14:56:23 desky kernel: RSP: 0000:ffffae1ca88c7b10 EFLAGS: 00010286
Sep 25 14:56:23 desky kernel: RAX: 0000000000000000 RBX: ffff8d7cfd7e6800 RCX: 0000000000000027
Sep 25 14:56:23 desky kernel: RDX: ffff8d845019cec8 RSI: 0000000000000001 RDI: ffff8d845019cec0
Sep 25 14:56:23 desky kernel: RBP: ffffae1ca88c7bc8 R08: 0000000000000000 R09: 0000000000000000
Sep 25 14:56:23 desky kernel: R10: 0000000000000000 R11: 0000000000000004 R12: ffffffffc1db1faa
Sep 25 14:56:23 desky kernel: R13: ffffffffc1db2ab4 R14: 0000000000000001 R15: ffffae1ca88c7bd8
Sep 25 14:56:23 desky kernel: FS: 00007fb1baf31940(0000) GS:ffff8d849c870000(0000) knlGS:0000000000000000
Sep 25 14:56:23 desky kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 25 14:56:23 desky kernel: CR2: 00007fb1b2860020 CR3: 00000001705a9004 CR4: 0000000000772ef0
Sep 25 14:56:23 desky kernel: PKRU: 55555558
Sep 25 14:56:23 desky kernel: Call Trace:
Sep 25 14:56:23 desky kernel: <TASK>
Sep 25 14:56:23 desky kernel: xe_bo_cpu_fault_fastpath+0x11e/0x220 [xe]
Sep 25 14:56:23 desky kernel: xe_bo_cpu_fault+0x84/0x410 [xe]
Sep 25 14:56:23 desky kernel: ? __x64_sys_mmap+0x33/0x50
Sep 25 14:56:23 desky kernel: ? x64_sys_call+0x1b2e/0x20d0
Sep 25 14:56:23 desky kernel: ? do_syscall_64+0x9d/0x1f0
Sep 25 14:56:23 desky kernel: ? __check_object_size+0x4a/0x2e0
Sep 25 14:56:23 desky kernel: __do_fault+0x36/0x190
Sep 25 14:56:23 desky kernel: do_fault+0xcf/0x570
Sep 25 14:56:23 desky kernel: __handle_mm_fault+0x92b/0xfe0
Sep 25 14:56:23 desky kernel: ? ktime_get_mono_fast_ns+0x39/0xd0
Sep 25 14:56:23 desky kernel: handle_mm_fault+0x164/0x2c0
Sep 25 14:56:23 desky kernel: do_user_addr_fault+0x2cb/0x840
Sep 25 14:56:23 desky kernel: exc_page_fault+0x75/0x180
Sep 25 14:56:23 desky kernel: asm_exc_page_fault+0x27/0x30
Sep 25 14:56:23 desky kernel: RIP: 0033:0x7fb1bc388bb7
Sep 25 14:56:23 desky kernel: Code: 48 ff c7 48 01 fe 48 8d 54 11 80 0f 1f 84 00 00 00 00 00 c5 fe 6f 0e c5 fe 6f 56 20 c5 fe 6f 5e 40 c5 fe 6f 66 60 48 83 ee 80 <c5> fd 7f 0f c5 fd 7>
Sep 25 14:56:23 desky kernel: RSP: 002b:00007ffd7814fad8 EFLAGS: 00010207
Sep 25 14:56:23 desky kernel: RAX: 00007fb1b2860000 RBX: 0000000000000690 RCX: 00007fb1b2860000
Sep 25 14:56:23 desky kernel: RDX: 00007fb1b2860610 RSI: 0000556eda79f4c0 RDI: 00007fb1b2860020
Sep 25 14:56:23 desky kernel: RBP: 00007ffd7814fb60 R08: 0000000000000000 R09: 000000012be0e000
Sep 25 14:56:23 desky kernel: R10: 00007fb1b2860000 R11: 0000000000000246 R12: 0000556edd39a240
Sep 25 14:56:23 desky kernel: R13: 00007fb1b2dcb010 R14: 0000556eda79f420 R15: 0000000000000000
Sep 25 14:56:23 desky kernel: </TASK>
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5250
Fixes: c2ae94c ("drm/xe: Convert the CPU fault handler for exhaustive eviction")
Cc: Matthew Brost <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 8f1756a)
Signed-off-by: Lucas De Marchi <[email protected]>
intel-lab-lkp
pushed a commit
to intel-lab-lkp/linux
that referenced
this pull request
Nov 10, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, #16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, #16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <[email protected]>
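The equivalence the commit relies on can be sketched in plain userspace C. Everything below is a hypothetical model, not kernel code: `model_teo_cpus`, `model_running_cpu`, `model_per_cpu_ptr()` and `model_this_cpu_ptr()` are invented stand-ins, and the "running CPU" is just a global variable. The point is only that both lookups resolve to the same slot whenever the callback runs on `dev->cpu`, so the read of `dev->cpu` can be dropped.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for the governor's per-CPU state. */
struct model_teo_cpu {
	int last_state_idx;
};

/* Hypothetical stand-in for struct cpuidle_device; only the CPU id is kept. */
struct model_device {
	unsigned int cpu;
};

/* One slot per CPU, playing the role of a per-CPU variable. */
static struct model_teo_cpu model_teo_cpus[MODEL_NR_CPUS];

/* Models "the CPU this code is running on" (an assumption of this sketch). */
static unsigned int model_running_cpu;

/* Analogue of per_cpu_ptr(): the caller names the CPU explicitly. */
static struct model_teo_cpu *model_per_cpu_ptr(unsigned int cpu)
{
	return &model_teo_cpus[cpu];
}

/* Analogue of this_cpu_ptr(): always resolves to the running CPU's slot. */
static struct model_teo_cpu *model_this_cpu_ptr(void)
{
	return &model_teo_cpus[model_running_cpu];
}

/* Reflect-style callback that looks the slot up via dev->cpu. */
static void model_reflect_per_cpu(struct model_device *dev, int state)
{
	model_per_cpu_ptr(dev->cpu)->last_state_idx = state;
}

/* Same callback using the running CPU; valid only when it runs on dev->cpu. */
static void model_reflect_this_cpu(int state)
{
	model_this_cpu_ptr()->last_state_idx = state;
}
```

If the callback could ever run on a CPU other than `dev->cpu`, the two variants would touch different slots, which is why the commit message states the "always running on the actual idle entering/exiting CPU" precondition first.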
intel-lab-lkp
pushed a commit
to intel-lab-lkp/linux
that referenced
this pull request
Nov 13, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, #16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, #16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <[email protected]>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>
hellsgod
pushed a commit
to hellsgod/linux
that referenced
this pull request
Nov 14, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, #16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, #16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <[email protected]>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>
Linux4
pushed a commit
to fp6-aosp/android_kernel_fairphone_sm7635
that referenced
this pull request
Nov 27, 2025
commit 927fa5b upstream.
KMSAN reported uninit-value access in __unix_walk_scc() [1].
In the list_for_each_entry_reverse() loop, when the vertex's index equals its scc_index, the loop uses the variable vertex as a temporary variable that points to a vertex in scc. And when the loop is finished, the variable vertex points to the list head, in this case scc, which is a local variable on the stack (more precisely, it's not even scc and might underflow the call stack of __unix_walk_scc(): container_of(&scc, struct unix_vertex, scc_entry)).
However, the variable vertex is used under the label prev_vertex. So if the edge_stack is not empty and the function jumps to the prev_vertex label, the function will access invalid data on the stack. This causes the uninit-value access issue.
Fix this by introducing a new temporary variable for the loop.
[1]
BUG: KMSAN: uninit-value in __unix_walk_scc net/unix/garbage.c:478 [inline]
BUG: KMSAN: uninit-value in unix_walk_scc net/unix/garbage.c:526 [inline]
BUG: KMSAN: uninit-value in __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
__unix_walk_scc net/unix/garbage.c:478 [inline]
unix_walk_scc net/unix/garbage.c:526 [inline]
__unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
process_one_work kernel/workqueue.c:3231 [inline]
process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
kthread+0x3c4/0x530 kernel/kthread.c:389
ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Uninit was stored to memory at:
unix_walk_scc net/unix/garbage.c:526 [inline]
__unix_gc+0x2adf/0x3c20 net/unix/garbage.c:584
process_one_work kernel/workqueue.c:3231 [inline]
process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
kthread+0x3c4/0x530 kernel/kthread.c:389
ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Local variable entries created at:
ref_tracker_free+0x48/0xf30 lib/ref_tracker.c:222
netdev_tracker_free include/linux/netdevice.h:4058 [inline]
netdev_put include/linux/netdevice.h:4075 [inline]
dev_put include/linux/netdevice.h:4101 [inline]
update_gid_event_work_handler+0xaa/0x1b0 drivers/infiniband/core/roce_gid_mgmt.c:813
CPU: 1 PID: 12763 Comm: kworker/u8:31 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Workqueue: events_unbound __unix_gc
Bug: 404256079
Fixes: 3484f06 ("af_unix: Detect Strongly Connected Components.")
Reported-by: syzkaller <[email protected]>
Signed-off-by: Shigeru Yoshida <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 0297e7d)
Signed-off-by: Lee Jones <[email protected]>
Change-Id: If09634d63e04e73d6b6e31984020752cd97e52f0
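The shape of the fix described in this commit message (give the inner loop its own cursor so the outer pointer stays valid) can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the actual net/unix/garbage.c change: `struct model_vertex`, `model_list_init()`, `model_list_add_tail()` and `sum_ids_reverse()` are hypothetical names, and the list helpers are simplified stand-ins for the kernel's `list_head` API.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (illustrative only). */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void model_list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void model_list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical vertex type; not the kernel's struct unix_vertex. */
struct model_vertex {
	int id;
	struct list_head entry;
};

/*
 * Walk the list in reverse using a loop-local cursor. Because 'saved' is
 * never reused as the iterator, it still points at a real vertex after the
 * loop, instead of at a bogus container_of(head, ...) address.
 */
static int sum_ids_reverse(struct list_head *head, struct model_vertex *saved)
{
	struct list_head *pos;
	int sum = 0;

	for (pos = head->prev; pos != head; pos = pos->prev) {
		struct model_vertex *v =
			container_of(pos, struct model_vertex, entry);
		sum += v->id;
	}

	/* Safe: 'saved' was not clobbered by the loop above. */
	return sum + saved->id;
}
```

Had `saved` been reused as the loop cursor in an entry-based iteration, it would end up derived from the list head itself once the loop finished, which is the invalid stack access KMSAN flagged.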
intel-lab-lkp
pushed a commit
to intel-lab-lkp/linux
that referenced
this pull request
Nov 27, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs
but rather just perform direct assignments.
With the Generic Entry patch [1] applied and audit turned on, the
perf bench basic syscall benchmark on kunpeng920 shows roughly a 1%
performance uplift. The change also aligns the implementation with
x86 and RISC-V.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, #272]
f8008444 str x4, [x2], #8
a9409404 ldp x4, x5, [x0, #8]
a9009424 stp x4, x5, [x1, #8]
a9418400 ldp x0, x1, [x0, #24]
a9010440 stp x0, x1, [x2, #16]
f9401060 ldr x0, [x3, #32]
f9001040 str x0, [x2, #32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, #8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, #24]
f9408a81 ldr x1, [x20, #272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
[1]: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jinjie Ruan <[email protected]>
intel-lab-lkp
pushed a commit
to intel-lab-lkp/linux
that referenced
this pull request
Dec 1, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs
but rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments()
and syscall_set_arguments() in sync.
With the Generic Entry patch [1] applied and audit turned on, the
perf bench basic syscall benchmark on kunpeng920 shows roughly
a 1% performance uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, #272]
f8008444 str x4, [x2], #8
a9409404 ldp x4, x5, [x0, #8]
a9009424 stp x4, x5, [x1, #8]
a9418400 ldp x0, x1, [x0, #24]
a9010440 stp x0, x1, [x2, #16]
f9401060 ldr x0, [x3, #32]
f9001040 str x0, [x2, #32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, #8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, #24]
f9408a81 ldr x1, [x20, #272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
[1]: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jinjie Ruan <[email protected]>
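The two styles compared in this commit message can be sketched side by side in userspace C. This is an illustration under assumptions, not the arm64 source: `struct mock_pt_regs` is a simplified stand-in that keeps only `regs[]` and `orig_x0`, and `get_args_memcpy()` / `get_args_direct()` are hypothetical names for the memcpy-based and direct-assignment variants.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for struct pt_regs (illustrative only). */
struct mock_pt_regs {
	unsigned long regs[31];
	unsigned long orig_x0;
};

/* memcpy-based variant: first argument from orig_x0, then x1..x5 as a block. */
static void get_args_memcpy(const struct mock_pt_regs *regs,
			    unsigned long *args)
{
	args[0] = regs->orig_x0;
	memcpy(&args[1], &regs->regs[1], 5 * sizeof(args[0]));
}

/*
 * Direct-assignment variant: each argument is a separate store, so after
 * inlining a compiler can drop the ones the caller never reads.
 */
static void get_args_direct(const struct mock_pt_regs *regs,
			    unsigned long *args)
{
	args[0] = regs->orig_x0;
	args[1] = regs->regs[1];
	args[2] = regs->regs[2];
	args[3] = regs->regs[3];
	args[4] = regs->regs[4];
	args[5] = regs->regs[5];
}
```

Both functions fill the same six slots; the difference is only in how much freedom the optimizer has once the helper is inlined into a caller that uses fewer than six arguments.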
thestinger
pushed a commit
to GrapheneOS/kernel_common-6.12
that referenced
this pull request
Dec 3, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, #16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, #16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <[email protected]>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>
(cherry picked from commit 0796ddf
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next)
Bug: 450671466
Change-Id: Icb6faa509da6dc282270d763f763bc943d461119
Signed-off-by: Reka Norman <[email protected]>
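As a rough illustration of the difference the commit message above describes, here is a user-space sketch. Everything in it is a hypothetical stand-in: `mock_per_cpu_ptr()` and `mock_this_cpu_ptr()` only imitate the shape of the kernel's `per_cpu_ptr()` and `this_cpu_ptr()`, and `mock_running_cpu` stands in for the per-CPU base that the kernel already has at hand for the running CPU. The point is that the first variant must read `dev->cpu` before it can locate the data, which appears to correspond to the `ldr w3, [x20, #4]` in the listing, while the second needs no such lookup.

```c
#define MOCK_NR_CPUS 4

/* Hypothetical stand-ins for struct teo_cpu and struct cpuidle_device. */
struct mock_teo_cpu {
    int last_reflected_state;
};

struct mock_cpuidle_device {
    int last_state_idx;
    unsigned int cpu;
};

static struct mock_teo_cpu mock_teo_cpus[MOCK_NR_CPUS];

/* Stand-in for the per-CPU base already known for the running CPU. */
static unsigned int mock_running_cpu;

/* Imitates per_cpu_ptr(): the caller has to supply a CPU number. */
static struct mock_teo_cpu *mock_per_cpu_ptr(unsigned int cpu)
{
    return &mock_teo_cpus[cpu];
}

/* Imitates this_cpu_ptr(): no CPU number has to be looked up. */
static struct mock_teo_cpu *mock_this_cpu_ptr(void)
{
    return &mock_teo_cpus[mock_running_cpu];
}

static void reflect_via_per_cpu_ptr(struct mock_cpuidle_device *dev, int state)
{
    /* Needs an extra load of dev->cpu to find the data. */
    struct mock_teo_cpu *cpu_data = mock_per_cpu_ptr(dev->cpu);

    dev->last_state_idx = state;
    cpu_data->last_reflected_state = state;
}

static void reflect_via_this_cpu_ptr(struct mock_cpuidle_device *dev, int state)
{
    struct mock_teo_cpu *cpu_data = mock_this_cpu_ptr();

    dev->last_state_idx = state;
    cpu_data->last_reflected_state = state;
}

/* Returns 1 when both variants update the same slot, which holds only
 * while the callback runs on the CPU that dev->cpu names. */
static int reflect_variants_share_slot(void)
{
    struct mock_cpuidle_device dev = { .last_state_idx = -1, .cpu = 2 };
    int ok;

    mock_running_cpu = 2;
    reflect_via_per_cpu_ptr(&dev, 1);
    ok = mock_teo_cpus[2].last_reflected_state == 1;
    reflect_via_this_cpu_ptr(&dev, 3);
    ok = ok && mock_teo_cpus[2].last_reflected_state == 3;
    return ok && dev.last_state_idx == 3;
}
```

The two variants are interchangeable only under the condition the commit message states, namely that the governor callbacks always run on the CPU that is entering or leaving idle.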
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 17, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With the Generic Entry patch [1] applied and audit turned on, the perf bench
basic syscall benchmark on kunpeng920 shows roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_get_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, #272]
f8008444 str x4, [x2], #8
a9409404 ldp x4, x5, [x0, #8]
a9009424 stp x4, x5, [x1, #8]
a9418400 ldp x0, x1, [x0, #24]
a9010440 stp x0, x1, [x2, #16]
f9401060 ldr x0, [x3, #32]
f9001040 str x0, [x2, #32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, #8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, #24]
f9408a81 ldr x1, [x20, #272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Kaz205
pushed a commit
to Kaz205/linux
that referenced
this pull request
Dec 18, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, #16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, #16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <[email protected]>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 19, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With the Generic Entry patch [1] applied and audit turned on, the perf bench
basic syscall benchmark on kunpeng920 shows roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_get_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, #272]
f8008444 str x4, [x2], #8
a9409404 ldp x4, x5, [x0, #8]
a9009424 stp x4, x5, [x1, #8]
a9418400 ldp x0, x1, [x0, #24]
a9010440 stp x0, x1, [x2, #16]
f9401060 ldr x0, [x3, #32]
f9001040 str x0, [x2, #32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, #8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, #24]
f9408a81 ldr x1, [x20, #272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 20, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With the Generic Entry patch [1] applied and audit turned on, the perf bench
basic syscall benchmark on kunpeng920 shows roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_get_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, #272]
f8008444 str x4, [x2], #8
a9409404 ldp x4, x5, [x0, #8]
a9009424 stp x4, x5, [x1, #8]
a9418400 ldp x0, x1, [x0, #24]
a9010440 stp x0, x1, [x2, #16]
f9401060 ldr x0, [x3, #32]
f9001040 str x0, [x2, #32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, #8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, #24]
f9408a81 ldr x1, [x20, #272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 21, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With the Generic Entry patch [1] applied and audit turned on, the perf bench
basic syscall benchmark on kunpeng920 shows roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_get_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, #272]
f8008444 str x4, [x2], #8
a9409404 ldp x4, x5, [x0, #8]
a9009424 stp x4, x5, [x1, #8]
a9418400 ldp x0, x1, [x0, #24]
a9010440 stp x0, x1, [x2, #16]
f9401060 ldr x0, [x3, #32]
f9001040 str x0, [x2, #32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, #8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, #24]
f9408a81 ldr x1, [x20, #272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 23, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With the Generic Entry patch [1] applied and audit turned on, the perf bench
basic syscall benchmark on kunpeng920 shows roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_get_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, #272]
f8008444 str x4, [x2], #8
a9409404 ldp x4, x5, [x0, #8]
a9009424 stp x4, x5, [x1, #8]
a9418400 ldp x0, x1, [x0, #24]
a9010440 stp x0, x1, [x2, #16]
f9401060 ldr x0, [x3, #32]
f9001040 str x0, [x2, #32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, #8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, #24]
f9408a81 ldr x1, [x20, #272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Kaz205
pushed a commit
to Kaz205/linux
that referenced
this pull request
Dec 26, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.
This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0: hint #0x19
ffffffc080ffcff4: stp x29, x30, [sp, #-48]!
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8: adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc: add x29, sp, #0x0
ffffffc080ffd000: stp x19, x20, [sp, #16]
ffffffc080ffd004: orr x20, xzr, x0
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008: add x0, x2, #0xc20
{
ffffffc080ffd00c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014: add x19, x19, #0xbb0
ffffffc080ffd018: ldr w3, [x20, #4]
dev->last_state_idx = state;
to
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034: hint #0x19
ffffffc080ffd038: stp x29, x30, [sp, #-48]!
ffffffc080ffd03c: add x29, sp, #0x0
ffffffc080ffd040: stp x19, x20, [sp, #16]
ffffffc080ffd044: orr x20, xzr, x0
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048: adrp x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c: stp x21, x22, [sp, #32]
struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050: add x19, x19, #0xbb0
dev->last_state_idx = state;
This saves us:
adrp x2, ffffffc0848c0000 <gicv5_global_data+0x28>
add x0, x2, #0xc20
ldr w3, [x20, #4]
Signed-off-by: Christian Loehle <[email protected]>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>