Skip to content

Conversation

@halit
Copy link

@halit halit commented May 12, 2013

No description provided.

@VdeVatman
Copy link

may I ask what is the swap value for?

@jesus-ramos
Copy link
Contributor

The compiler will get rid of the temporary variable anyway this is just a different way or writing it.

@tawseef
Copy link

tawseef commented May 14, 2013

12 GB
On 12 May 2013 23:30, "Rubén Marrero Ruiz" [email protected] wrote:

may I ask what is the value swap for?


Reply to this email directly or view it on GitHubhttps://github.com//pull/32#issuecomment-17782140
.

@VdeVatman
Copy link

Thanks

swarren pushed a commit to swarren/linux-tegra that referenced this pull request Aug 8, 2013
There is a loop in do_mlockall() that lacks a preemption point, which
means that the following can happen on non-preemptible builds of the
kernel:

> My fuzz tester keeps hitting this. Every instance shows the non-irq stack
> came in from mlockall.  I'm only seeing this on one box, but that has more
> ram (8gb) than my other machines, which might explain it.
>
> 	Dave
>
> INFO: rcu_preempt self-detected stall on CPU { 3}  (t=6500 jiffies g=470344 c=470343 q=0)
> sending NMI to all CPUs:
> NMI backtrace for cpu 3
> CPU: 3 PID: 29664 Comm: trinity-child2 Not tainted 3.11.0-rc1+ torvalds#32
> task: ffff88023e743fc0 ti: ffff88022f6f2000 task.ti: ffff88022f6f2000
> RIP: 0010:[<ffffffff810bf7d1>]  [<ffffffff810bf7d1>] trace_hardirqs_off_caller+0x21/0xb0
> RSP: 0018:ffff880244e03c30  EFLAGS: 00000046
> RAX: ffff88023e743fc0 RBX: 0000000000000001 RCX: 000000000000003c
> RDX: 000000000000000f RSI: 0000000000000004 RDI: ffffffff81033cab
> RBP: ffff880244e03c38 R08: ffff880243288a80 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000001 R12: ffff880243288a80
> R13: ffff8802437eda40 R14: 0000000000080000 R15: 000000000000d010
> FS:  00007f50ae33b740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000097f000 CR3: 0000000240fa0000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
>  ffffffff810bf86d ffff880244e03c98 ffffffff81033cab 0000000000000096
>  000000000000d008 0000000300000002 0000000000000004 0000000000000003
>  0000000000002710 ffffffff81c50d00 ffffffff81c50d00 ffff880244fcde00
> Call Trace:
>  <IRQ>
>  [<ffffffff810bf86d>] ? trace_hardirqs_off+0xd/0x10
>  [<ffffffff81033cab>] __x2apic_send_IPI_mask+0x1ab/0x1c0
>  [<ffffffff81033cdc>] x2apic_send_IPI_all+0x1c/0x20
>  [<ffffffff81030115>] arch_trigger_all_cpu_backtrace+0x65/0xa0
>  [<ffffffff811144b1>] rcu_check_callbacks+0x331/0x8e0
>  [<ffffffff8108bfa0>] ? hrtimer_run_queues+0x20/0x180
>  [<ffffffff8109e905>] ? sched_clock_cpu+0xb5/0x100
>  [<ffffffff81069557>] update_process_times+0x47/0x80
>  [<ffffffff810bd115>] tick_sched_handle.isra.16+0x25/0x60
>  [<ffffffff810bd231>] tick_sched_timer+0x41/0x60
>  [<ffffffff8108ace1>] __run_hrtimer+0x81/0x4e0
>  [<ffffffff810bd1f0>] ? tick_sched_do_timer+0x60/0x60
>  [<ffffffff8108b93f>] hrtimer_interrupt+0xff/0x240
>  [<ffffffff8102de84>] local_apic_timer_interrupt+0x34/0x60
>  [<ffffffff81718c5f>] smp_apic_timer_interrupt+0x3f/0x60
>  [<ffffffff817178ef>] apic_timer_interrupt+0x6f/0x80
>  [<ffffffff8170e8e0>] ? retint_restore_args+0xe/0xe
>  [<ffffffff8105f101>] ? __do_softirq+0xb1/0x440
>  [<ffffffff8105f64d>] irq_exit+0xcd/0xe0
>  [<ffffffff81718c65>] smp_apic_timer_interrupt+0x45/0x60
>  [<ffffffff817178ef>] apic_timer_interrupt+0x6f/0x80
>  <EOI>
>  [<ffffffff8170e8e0>] ? retint_restore_args+0xe/0xe
>  [<ffffffff8170b830>] ? wait_for_completion_killable+0x170/0x170
>  [<ffffffff8170c853>] ? preempt_schedule_irq+0x53/0x90
>  [<ffffffff8170e9f6>] retint_kernel+0x26/0x30
>  [<ffffffff8107a523>] ? queue_work_on+0x43/0x90
>  [<ffffffff8107c369>] schedule_on_each_cpu+0xc9/0x1a0
>  [<ffffffff81167770>] ? lru_add_drain+0x50/0x50
>  [<ffffffff811677c5>] lru_add_drain_all+0x15/0x20
>  [<ffffffff81186965>] SyS_mlockall+0xa5/0x1a0
>  [<ffffffff81716e94>] tracesys+0xdd/0xe2

This commit addresses this problem by inserting the required preemption
point.

Reported-by: Dave Jones <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
torvalds pushed a commit that referenced this pull request Sep 25, 2013
There is a loop in do_mlockall() that lacks a preemption point, which
means that the following can happen on non-preemptible builds of the
kernel. Dave Jones reports:

 "My fuzz tester keeps hitting this.  Every instance shows the non-irq
  stack came in from mlockall.  I'm only seeing this on one box, but
  that has more ram (8gb) than my other machines, which might explain
  it.

    INFO: rcu_preempt self-detected stall on CPU { 3}  (t=6500 jiffies g=470344 c=470343 q=0)
    sending NMI to all CPUs:
    NMI backtrace for cpu 3
    CPU: 3 PID: 29664 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #32
    Call Trace:
      lru_add_drain_all+0x15/0x20
      SyS_mlockall+0xa5/0x1a0
      tracesys+0xdd/0xe2"

This commit addresses this problem by inserting the required preemption
point.

Reported-by: Dave Jones <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
johnweber pushed a commit to johnweber/linux that referenced this pull request Oct 1, 2013
torvalds#125

IRQ torvalds#125's status is not constant on different boards, IRQ torvalds#32 is
IOMUXC's interrupt which can be triggered manually at anytime, use
this irq instead of torvalds#125 to generate interrupt for avoiding CCM enter
low power mode by mistake.

Signed-off-by: Anson Huang <[email protected]>
swarren pushed a commit to swarren/linux-tegra that referenced this pull request Oct 1, 2013
There is a loop in do_mlockall() that lacks a preemption point, which
means that the following can happen on non-preemptible builds of the
kernel:

> My fuzz tester keeps hitting this. Every instance shows the non-irq stack
> came in from mlockall.  I'm only seeing this on one box, but that has more
> ram (8gb) than my other machines, which might explain it.
>
> 	Dave
>
> INFO: rcu_preempt self-detected stall on CPU { 3}  (t=6500 jiffies g=470344 c=470343 q=0)
> sending NMI to all CPUs:
> NMI backtrace for cpu 3
> CPU: 3 PID: 29664 Comm: trinity-child2 Not tainted 3.11.0-rc1+ torvalds#32
> task: ffff88023e743fc0 ti: ffff88022f6f2000 task.ti: ffff88022f6f2000
> RIP: 0010:[<ffffffff810bf7d1>]  [<ffffffff810bf7d1>] trace_hardirqs_off_caller+0x21/0xb0
> RSP: 0018:ffff880244e03c30  EFLAGS: 00000046
> RAX: ffff88023e743fc0 RBX: 0000000000000001 RCX: 000000000000003c
> RDX: 000000000000000f RSI: 0000000000000004 RDI: ffffffff81033cab
> RBP: ffff880244e03c38 R08: ffff880243288a80 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000001 R12: ffff880243288a80
> R13: ffff8802437eda40 R14: 0000000000080000 R15: 000000000000d010
> FS:  00007f50ae33b740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000097f000 CR3: 0000000240fa0000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
>  ffffffff810bf86d ffff880244e03c98 ffffffff81033cab 0000000000000096
>  000000000000d008 0000000300000002 0000000000000004 0000000000000003
>  0000000000002710 ffffffff81c50d00 ffffffff81c50d00 ffff880244fcde00
> Call Trace:
>  <IRQ>
>  [<ffffffff810bf86d>] ? trace_hardirqs_off+0xd/0x10
>  [<ffffffff81033cab>] __x2apic_send_IPI_mask+0x1ab/0x1c0
>  [<ffffffff81033cdc>] x2apic_send_IPI_all+0x1c/0x20
>  [<ffffffff81030115>] arch_trigger_all_cpu_backtrace+0x65/0xa0
>  [<ffffffff811144b1>] rcu_check_callbacks+0x331/0x8e0
>  [<ffffffff8108bfa0>] ? hrtimer_run_queues+0x20/0x180
>  [<ffffffff8109e905>] ? sched_clock_cpu+0xb5/0x100
>  [<ffffffff81069557>] update_process_times+0x47/0x80
>  [<ffffffff810bd115>] tick_sched_handle.isra.16+0x25/0x60
>  [<ffffffff810bd231>] tick_sched_timer+0x41/0x60
>  [<ffffffff8108ace1>] __run_hrtimer+0x81/0x4e0
>  [<ffffffff810bd1f0>] ? tick_sched_do_timer+0x60/0x60
>  [<ffffffff8108b93f>] hrtimer_interrupt+0xff/0x240
>  [<ffffffff8102de84>] local_apic_timer_interrupt+0x34/0x60
>  [<ffffffff81718c5f>] smp_apic_timer_interrupt+0x3f/0x60
>  [<ffffffff817178ef>] apic_timer_interrupt+0x6f/0x80
>  [<ffffffff8170e8e0>] ? retint_restore_args+0xe/0xe
>  [<ffffffff8105f101>] ? __do_softirq+0xb1/0x440
>  [<ffffffff8105f64d>] irq_exit+0xcd/0xe0
>  [<ffffffff81718c65>] smp_apic_timer_interrupt+0x45/0x60
>  [<ffffffff817178ef>] apic_timer_interrupt+0x6f/0x80
>  <EOI>
>  [<ffffffff8170e8e0>] ? retint_restore_args+0xe/0xe
>  [<ffffffff8170b830>] ? wait_for_completion_killable+0x170/0x170
>  [<ffffffff8170c853>] ? preempt_schedule_irq+0x53/0x90
>  [<ffffffff8170e9f6>] retint_kernel+0x26/0x30
>  [<ffffffff8107a523>] ? queue_work_on+0x43/0x90
>  [<ffffffff8107c369>] schedule_on_each_cpu+0xc9/0x1a0
>  [<ffffffff81167770>] ? lru_add_drain+0x50/0x50
>  [<ffffffff811677c5>] lru_add_drain_all+0x15/0x20
>  [<ffffffff81186965>] SyS_mlockall+0xa5/0x1a0
>  [<ffffffff81716e94>] tracesys+0xdd/0xe2

This commit addresses this problem by inserting the required preemption
point.

Reported-by: Dave Jones <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
swarren pushed a commit to swarren/linux-tegra that referenced this pull request Oct 14, 2013
As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, cleanup code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  torvalds#6  torvalds#7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  torvalds#8  torvalds#9 torvalds#10 torvalds#11 torvalds#12 torvalds#13 torvalds#14 torvalds#15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: torvalds#16 torvalds#17 torvalds#18 torvalds#19 torvalds#20 torvalds#21 torvalds#22 torvalds#23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: torvalds#24 torvalds#25 torvalds#26 torvalds#27 torvalds#28 torvalds#29 torvalds#30 torvalds#31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: torvalds#32 torvalds#33 torvalds#34 torvalds#35 torvalds#36 torvalds#37 torvalds#38 torvalds#39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: torvalds#40 torvalds#41 torvalds#42 torvalds#43 torvalds#44 torvalds#45 torvalds#46 torvalds#47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: torvalds#48 torvalds#49 torvalds#50 torvalds#51 #52 #53 torvalds#54 torvalds#55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: torvalds#56 torvalds#57 #58 torvalds#59 torvalds#60 torvalds#61 torvalds#62 torvalds#63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 torvalds#6 torvalds#7 OK
 [    0.686329] Brought up 8 CPUs

Signed-off-by: Borislav Petkov <[email protected]>
Cc: Libin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
swarren pushed a commit to swarren/linux-tegra that referenced this pull request Oct 14, 2013
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   torvalds#6   torvalds#7
[    0.603005] .... node   #1, CPUs:     torvalds#8   torvalds#9  torvalds#10  torvalds#11  torvalds#12  torvalds#13  torvalds#14  torvalds#15
[    1.200005] .... node   #2, CPUs:    torvalds#16  torvalds#17  torvalds#18  torvalds#19  torvalds#20  torvalds#21  torvalds#22  torvalds#23
[    1.796005] .... node   #3, CPUs:    torvalds#24  torvalds#25  torvalds#26  torvalds#27  torvalds#28  torvalds#29  torvalds#30  torvalds#31
[    2.393005] .... node   #4, CPUs:    torvalds#32  torvalds#33  torvalds#34  torvalds#35  torvalds#36  torvalds#37  torvalds#38  torvalds#39
[    2.996005] .... node   #5, CPUs:    torvalds#40  torvalds#41  torvalds#42  torvalds#43  torvalds#44  torvalds#45  torvalds#46  torvalds#47
[    3.600005] .... node   torvalds#6, CPUs:    torvalds#48  torvalds#49  torvalds#50  torvalds#51  #52  #53  torvalds#54  torvalds#55
[    4.202005] .... node   torvalds#7, CPUs:    torvalds#56  torvalds#57  #58  torvalds#59  torvalds#60  torvalds#61  torvalds#62  torvalds#63
[    4.811005] .... node   torvalds#8, CPUs:    torvalds#64  torvalds#65  torvalds#66  torvalds#67  torvalds#68  torvalds#69  #70  torvalds#71
[    5.421006] .... node   torvalds#9, CPUs:    torvalds#72  torvalds#73  torvalds#74  torvalds#75  torvalds#76  torvalds#77  torvalds#78  torvalds#79
[    6.032005] .... node  torvalds#10, CPUs:    torvalds#80  torvalds#81  torvalds#82  torvalds#83  torvalds#84  torvalds#85  torvalds#86  torvalds#87
[    6.648006] .... node  torvalds#11, CPUs:    torvalds#88  torvalds#89  torvalds#90  torvalds#91  torvalds#92  torvalds#93  torvalds#94  torvalds#95
[    7.262005] .... node  torvalds#12, CPUs:    torvalds#96  torvalds#97  torvalds#98  torvalds#99 torvalds#100 torvalds#101 torvalds#102 torvalds#103
[    7.865005] .... node  torvalds#13, CPUs:   torvalds#104 torvalds#105 torvalds#106 torvalds#107 torvalds#108 torvalds#109 torvalds#110 torvalds#111
[    8.466005] .... node  torvalds#14, CPUs:   torvalds#112 torvalds#113 torvalds#114 torvalds#115 torvalds#116 torvalds#117 torvalds#118 torvalds#119
[    9.073006] .... node  torvalds#15, CPUs:   torvalds#120 torvalds#121 torvalds#122 torvalds#123 torvalds#124 torvalds#125 torvalds#126 torvalds#127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
swarren pushed a commit to swarren/linux-tegra that referenced this pull request Oct 14, 2013
The 'driver' field of the i2c_client struct is redundant. The same data can be
accessed through to_i2c_driver(client->dev.driver). The generated code for both
approaches in more or less the same.

E.g. on ARM the expression client->driver->command(...) generates

		...
		ldr     r3, [r0, torvalds#28]
		ldr     r3, [r3, torvalds#32]
		blx     r3
		...

and the expression to_i2c_driver(client->dev.driver)->command(...) generates

		...
		ldr     r3, [r0, torvalds#160]
    	ldr     r3, [r3, #-4]
    	blx     r3
		...

Other architectures will generate similar code.

All users of the 'driver' field outside of the I2C core have already been
converted. So this only leaves the core itself. This patch converts the
remaining few users in the I2C core and then removes the 'driver' field from the
i2c_client struct.

Signed-off-by: Lars-Peter Clausen <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
TechNexion pushed a commit to TechNexion/linux that referenced this pull request Oct 25, 2013
commit 7ed47b7 upstream.

The ghash_update function passes a pointer to gf128mul_4k_lle which will
be NULL if ghash_setkey is not called or if the most recent call to
ghash_setkey failed to allocate memory.  This causes an oops.  Fix this
up by returning an error code in the null case.

This is trivially triggered from unprivileged userspace through the
AF_ALG interface by simply writing to the socket without setting a key.

The ghash_final function has a similar issue, but triggering it requires
a memory allocation failure in ghash_setkey _after_ at least one
successful call to ghash_update.

  BUG: unable to handle kernel NULL pointer dereference at 00000670
  IP: [<d88c92d4>] gf128mul_4k_lle+0x23/0x60 [gf128mul]
  *pde = 00000000
  Oops: 0000 [#1] PREEMPT SMP
  Modules linked in: ghash_generic gf128mul algif_hash af_alg nfs lockd nfs_acl sunrpc bridge ipv6 stp llc

  Pid: 1502, comm: hashatron Tainted: G        W   3.1.0-rc9-00085-ge9308cf torvalds#32 Bochs Bochs
  EIP: 0060:[<d88c92d4>] EFLAGS: 00000202 CPU: 0
  EIP is at gf128mul_4k_lle+0x23/0x60 [gf128mul]
  EAX: d69db1f0 EBX: d6b8ddac ECX: 00000004 EDX: 00000000
  ESI: 00000670 EDI: d6b8ddac EBP: d6b8ddc8 ESP: d6b8dda4
   DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
  Process hashatron (pid: 1502, ti=d6b8c000 task=d6810000 task.ti=d6b8c000)
  Stack:
   00000000 d69db1f0 00000163 00000000 d6b8ddc8 c101a520 d69db1f0 d52aa000
   00000ff0 d6b8dde8 d88d310f d6b8a3f8 d52aa000 00001000 d88d502c d6b8ddfc
   00001000 d6b8ddf4 c11676e d69db1e8 d6b8de24 c11679ad d52aa000 00000000
  Call Trace:
   [<c101a520>] ? kmap_atomic_prot+0x37/0xa6
   [<d88d310f>] ghash_update+0x85/0xbe [ghash_generic]
   [<c11676ed>] crypto_shash_update+0x18/0x1b
   [<c11679ad>] shash_ahash_update+0x22/0x36
   [<c11679cc>] shash_async_update+0xb/0xd
   [<d88ce0ba>] hash_sendpage+0xba/0xf2 [algif_hash]
   [<c121b24c>] kernel_sendpage+0x39/0x4e
   [<d88ce000>] ? 0xd88cdfff
   [<c121b298>] sock_sendpage+0x37/0x3e
   [<c121b261>] ? kernel_sendpage+0x4e/0x4e
   [<c10b4dbc>] pipe_to_sendpage+0x56/0x61
   [<c10b4e1f>] splice_from_pipe_feed+0x58/0xcd
   [<c10b4d66>] ? splice_from_pipe_begin+0x10/0x10
   [<c10b51f5>] __splice_from_pipe+0x36/0x55
   [<c10b4d66>] ? splice_from_pipe_begin+0x10/0x10
   [<c10b6383>] splice_from_pipe+0x51/0x64
   [<c10b63c2>] ? default_file_splice_write+0x2c/0x2c
   [<c10b63d5>] generic_splice_sendpage+0x13/0x15
   [<c10b4d66>] ? splice_from_pipe_begin+0x10/0x10
   [<c10b527f>] do_splice_from+0x5d/0x67
   [<c10b6865>] sys_splice+0x2bf/0x363
   [<c129373b>] ? sysenter_exit+0xf/0x16
   [<c104dc1e>] ? trace_hardirqs_on_caller+0x10e/0x13f
   [<c129370c>] sysenter_do_call+0x12/0x32
  Code: 83 c4 0c 5b 5e 5f c9 c3 55 b9 04 00 00 00 89 e5 57 8d 7d e4 56 53 8d 5d e4 83 ec 18 89 45 e0 89 55 dc 0f b6 70 0f c1 e6 04 01 d6 <f3> a5 be 0f 00 00 00 4e 89 d8 e8 48 ff ff ff 8b 45 e0 89 da 0f
  EIP: [<d88c92d4>] gf128mul_4k_lle+0x23/0x60 [gf128mul] SS:ESP 0068:d6b8dda4
  CR2: 0000000000000670
  ---[ end trace 4eaa2a86a8e2da24 ]---
  note: hashatron[1502] exited with preempt_count 1
  BUG: scheduling while atomic: hashatron/1502/0x10000002
  INFO: lockdep is turned off.
  [...]

Signed-off-by: Nick Bowler <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
swarren pushed a commit to swarren/linux-tegra that referenced this pull request Oct 29, 2013
There is a defect in imx6 LPM design.  When SW tries to enter low power
mode with following sequence, the chip will enter low power mode before
A9 CPU execute WFI instruction:

1. Set CCM_CLPCR[1:0] to 2'b00;
2. ARM CPU enters WFI;
3. ARM CPU wakeup from an interrupt event, which is masked by GPC or not
   visible to GPC, such as interrupt from local timer;
4. Set CCM_CLPCR[1:0] to 2'b01 or 2'b10;
5. ARM CPU execute WFI.

Before the last step, the chip will enter WAIT mode if CCM_CLPCR[1:0] is
set to 2'b01, or enter STOP mode if CCM_CLPCR[1:0] is set to 2'b10.

The patch implements a recommended workaround for this issue.

1. SW triggers irq torvalds#32(IOMUX) to be always pending manually by setting
   IOMUX_GPR1_GINT bit;
2. SW should then unmask it in GPC before setting CCM LPM;
3. SW should mask it right after CCM LPM is set (bit0-1 of CCM_CLPCR).

Signed-off-by: Shawn Guo <[email protected]>
torvalds pushed a commit that referenced this pull request Nov 22, 2013
Lockdep complains about btrfs's async commit:

[ 2372.462171] [ BUG: bad unlock balance detected! ]
[ 2372.462191] 3.12.0+ #32 Tainted: G        W
[ 2372.462209] -------------------------------------
[ 2372.462228] ceph-osd/14048 is trying to release lock (sb_internal) at:
[ 2372.462275] [<ffffffffa022cb10>] btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
[ 2372.462305] but there are no more locks to release!
[ 2372.462324]
[ 2372.462324] other info that might help us debug this:
[ 2372.462349] no locks held by ceph-osd/14048.
[ 2372.462367]
[ 2372.462367] stack backtrace:
[ 2372.462386] CPU: 2 PID: 14048 Comm: ceph-osd Tainted: G        W    3.12.0+ #32
[ 2372.462414] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
[ 2372.462455]  ffffffffa022cb10 ffff88007490fd28 ffffffff816f094a ffff8800378aa320
[ 2372.462491]  ffff88007490fd50 ffffffff810adf4c ffff8800378aa320 ffff88009af97650
[ 2372.462526]  ffffffffa022cb10 ffff88007490fd88 ffffffff810b01ee ffff8800898c0000
[ 2372.462562] Call Trace:
[ 2372.462584]  [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
[ 2372.462619]  [<ffffffff816f094a>] dump_stack+0x45/0x56
[ 2372.462642]  [<ffffffff810adf4c>] print_unlock_imbalance_bug+0xec/0x100
[ 2372.462677]  [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
[ 2372.462710]  [<ffffffff810b01ee>] lock_release+0x18e/0x210
[ 2372.462742]  [<ffffffffa022cb36>] btrfs_commit_transaction_async+0x1d6/0x2a0 [btrfs]
[ 2372.462783]  [<ffffffffa025a7ce>] btrfs_ioctl_start_sync+0x3e/0xc0 [btrfs]
[ 2372.462822]  [<ffffffffa025f1d3>] btrfs_ioctl+0x4c3/0x1f70 [btrfs]
[ 2372.462849]  [<ffffffff812c0321>] ? avc_has_perm+0x121/0x1b0
[ 2372.462873]  [<ffffffff812c0224>] ? avc_has_perm+0x24/0x1b0
[ 2372.462897]  [<ffffffff8107ecc8>] ? sched_clock_cpu+0xa8/0x100
[ 2372.462922]  [<ffffffff8117b145>] do_vfs_ioctl+0x2e5/0x4e0
[ 2372.462946]  [<ffffffff812c19e6>] ? file_has_perm+0x86/0xa0
[ 2372.462969]  [<ffffffff8117b3c1>] SyS_ioctl+0x81/0xa0
[ 2372.462991]  [<ffffffff817045a4>] tracesys+0xdd/0xe2

====================================================

It's because that we don't do the right thing when checking if it's ok to
tell lockdep that we're trying to release the rwsem.

If the trans handle's type is TRANS_ATTACH, we won't acquire the freeze rwsem, but
as TRANS_ATTACH fits the check (trans < TRANS_JOIN_NOLOCK), we'll release the freeze
rwsem, which makes lockdep complains a lot.

Reported-by: Ma Jianpeng <[email protected]>
Signed-off-by: Liu Bo <[email protected]>
Signed-off-by: Miao Xie <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
heftig referenced this pull request in zen-kernel/zen-kernel Dec 21, 2013
commit b1a06a4 upstream.

Lockdep complains about btrfs's async commit:

[ 2372.462171] [ BUG: bad unlock balance detected! ]
[ 2372.462191] 3.12.0+ #32 Tainted: G        W
[ 2372.462209] -------------------------------------
[ 2372.462228] ceph-osd/14048 is trying to release lock (sb_internal) at:
[ 2372.462275] [<ffffffffa022cb10>] btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
[ 2372.462305] but there are no more locks to release!
[ 2372.462324]
[ 2372.462324] other info that might help us debug this:
[ 2372.462349] no locks held by ceph-osd/14048.
[ 2372.462367]
[ 2372.462367] stack backtrace:
[ 2372.462386] CPU: 2 PID: 14048 Comm: ceph-osd Tainted: G        W    3.12.0+ #32
[ 2372.462414] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
[ 2372.462455]  ffffffffa022cb10 ffff88007490fd28 ffffffff816f094a ffff8800378aa320
[ 2372.462491]  ffff88007490fd50 ffffffff810adf4c ffff8800378aa320 ffff88009af97650
[ 2372.462526]  ffffffffa022cb10 ffff88007490fd88 ffffffff810b01ee ffff8800898c0000
[ 2372.462562] Call Trace:
[ 2372.462584]  [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
[ 2372.462619]  [<ffffffff816f094a>] dump_stack+0x45/0x56
[ 2372.462642]  [<ffffffff810adf4c>] print_unlock_imbalance_bug+0xec/0x100
[ 2372.462677]  [<ffffffffa022cb10>] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
[ 2372.462710]  [<ffffffff810b01ee>] lock_release+0x18e/0x210
[ 2372.462742]  [<ffffffffa022cb36>] btrfs_commit_transaction_async+0x1d6/0x2a0 [btrfs]
[ 2372.462783]  [<ffffffffa025a7ce>] btrfs_ioctl_start_sync+0x3e/0xc0 [btrfs]
[ 2372.462822]  [<ffffffffa025f1d3>] btrfs_ioctl+0x4c3/0x1f70 [btrfs]
[ 2372.462849]  [<ffffffff812c0321>] ? avc_has_perm+0x121/0x1b0
[ 2372.462873]  [<ffffffff812c0224>] ? avc_has_perm+0x24/0x1b0
[ 2372.462897]  [<ffffffff8107ecc8>] ? sched_clock_cpu+0xa8/0x100
[ 2372.462922]  [<ffffffff8117b145>] do_vfs_ioctl+0x2e5/0x4e0
[ 2372.462946]  [<ffffffff812c19e6>] ? file_has_perm+0x86/0xa0
[ 2372.462969]  [<ffffffff8117b3c1>] SyS_ioctl+0x81/0xa0
[ 2372.462991]  [<ffffffff817045a4>] tracesys+0xdd/0xe2

====================================================

It's because that we don't do the right thing when checking if it's ok to
tell lockdep that we're trying to release the rwsem.

If the trans handle's type is TRANS_ATTACH, we won't acquire the freeze rwsem, but
as TRANS_ATTACH fits the check (trans < TRANS_JOIN_NOLOCK), we'll release the freeze
rwsem, which makes lockdep complains a lot.

Reported-by: Ma Jianpeng <[email protected]>
Signed-off-by: Liu Bo <[email protected]>
Signed-off-by: Miao Xie <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: Chris Mason <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
apxii pushed a commit to apxii/linux that referenced this pull request Jan 4, 2014
Improve the comment of SW workaround for CCM lpm issue using
hardware errata description to avoid confusion.

ERR007265: CCM: When improper low-power sequence is used, the SoC
enters low power mode before the ARM core executes WFI.

Software workaround:
1) Software should trigger IRQ torvalds#32 (IOMUX) to be always pending
   by setting IOMUX_GPR1_GINT.
2) Software should then unmask IRQ torvalds#32 in GPC before setting CCM
   Low-Power mode.
3) Software should mask IRQ torvalds#32 right after CCM Low-Power mode is
   set (set bits 0-1 of CCM_CLPCR).

Signed-off-by: Anson Huang <[email protected]>
Signed-off-by: Shawn Guo <[email protected]>
johnweber pushed a commit to wandboard-org/linux that referenced this pull request Jan 10, 2014
torvalds#125

IRQ torvalds#125's status is not constant on different boards, IRQ torvalds#32 is
IOMUXC's interrupt which can be triggered manually at anytime, use
this irq instead of torvalds#125 to generate interrupt for avoiding CCM enter
low power mode by mistake.

Signed-off-by: Anson Huang <[email protected]>
swarren pushed a commit to swarren/linux-tegra that referenced this pull request Feb 26, 2014
When doing some numa tests on powerpc, I triggered an oops bug.  I find it
is caused by using page->_last_cpupid.  It should be initialized as "-1 &
LAST_CPUPID_MASK", but not "-1".  Otherwise, in task_numa_fault(), we will
miss the checking (last_cpupid == (-1 & LAST_CPUPID_MASK)).  And finally
cause an oops bug in task_numa_group(), since the online cpu is less than
possible cpu.

PPC needs the LAST_CPUPID_NOT_IN_PAGE_FLAGS case because ppc needs to
support a large physical address region, up to 2^46 but small section size
(2^24).  So when NR_CPUS grows up, it is easily to cause
not-in-page-flags.

Call trace:
[   55.978091] SMP NR_CPUS=64 NUMA PowerNV
[   55.978118] Modules linked in:
[   55.978145] CPU: 24 PID: 804 Comm: systemd-udevd Not tainted
3.13.0-rc1+ torvalds#32
[   55.978183] task: c000001e2746aa80 ti: c000001e32c50000 task.ti:
c000001e32c50000
[   55.978219] NIP: c0000000000f5ad0 LR: c0000000000f5ac8 CTR:
c000000000913cf0
[   55.978256] REGS: c000001e32c53510 TRAP: 0300   Not tainted
(3.13.0-rc1+)
[   55.978286] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR:
28024424  XER: 20000000
[   55.978380] CFAR: c000000000009324 DAR: 7265717569726857 DSISR:
40000000 SOFTE: 1
GPR00: c0000000000f5ac8 c000001e32c53790 c000000001f34338
0000000000000021
GPR04: 0000000000000000 0000000000000031 c000000001f74338
0000ffffffffffff
GPR08: 0000000000000001 7265717569726573 0000000000000000
0000000000000000
GPR12: 0000000028024422 c00000000ffdd800 00000000296b2e64
0000000000000020
GPR16: 0000000000000002 0000000000000003 c000001e2f8e4658
c000001e25c1c1d8
GPR20: c000001e2f8e4000 c000000001f7a858 0000000000000658
0000000040000392
GPR24: 00000000000000a8 c000001e33c1a400 00000000000001d8
c000001e25c1c000
GPR28: c000001e33c37ff0 0007837840000392 000000000000003f
c000001e32c53790
[   55.978903] NIP [c0000000000f5ad0] .task_numa_fault+0x1470/0x2370
[   55.978934] LR [c0000000000f5ac8] .task_numa_fault+0x1468/0x2370
[   55.978964] Call Trace:
[   55.978978] [c000001e32c53790] [c0000000000f5ac8]
.task_numa_fault+0x1468/0x2370 (unreliable)
[   55.979036] [c000001e32c539e0] [c00000000020a820]
.do_numa_page+0x480/0x4a0
[   55.979072] [c000001e32c53b10] [c00000000020bfec]
.handle_mm_fault+0x4ec/0xc90
[   55.979123] [c000001e32c53c00] [c000000000e88c98]
.do_page_fault+0x3a8/0x890
[   55.979161] [c000001e32c53e30] [c000000000009568]
handle_page_fault+0x10/0x30
[   55.979197] Instruction dump:
[   55.979216] 3c82fefb 3884b138 48d9cff1 60000000 48000574 3c62fef
3863af78 3c82fefb
[   55.979277] 3884b138 48d9cfd5 60000000 e93f0100 <812902e4> 7d2907b4
5529063e 7d2a07b4
[   55.979354] ---[ end trace 15f2510da5ae07cf ]---

Signed-off-by: Liu Ping Fan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
swarren pushed a commit to swarren/linux-tegra that referenced this pull request Mar 3, 2014
I can trigger a lockdep warning:

  # mount -t cgroup -o cpuset xxx /cgroup
  # mkdir /cgroup/cpuset
  # mkdir /cgroup/tmp
  # echo 0 > /cgroup/tmp/cpuset.cpus
  # echo 0 > /cgroup/tmp/cpuset.mems
  # echo 1 > /cgroup/tmp/cpuset.memory_migrate
  # echo $$ > /cgroup/tmp/tasks
  # echo 1 > /cgruop/tmp/cpuset.mems

  ===============================
  [ INFO: suspicious RCU usage. ]
  3.14.0-rc1-0.1-default+ torvalds#32 Not tainted
  -------------------------------
  include/linux/cgroup.h:682 suspicious rcu_dereference_check() usage!
  ...
    [<ffffffff81582174>] dump_stack+0x72/0x86
    [<ffffffff810b8f01>] lockdep_rcu_suspicious+0x101/0x140
    [<ffffffff81105ba1>] cpuset_migrate_mm+0xb1/0xe0
  ...

We used to hold cgroup_mutex when calling cpuset_migrate_mm(), but now
we hold cpuset_mutex, which causes task_css() to complain.

This is not a false-positive but a real issue.

Holding cpuset_mutex won't prevent a task from migrating to another
cpuset, and it won't prevent the original task->cgroup from destroying
during this change.

Fixes: 5d21cc2 (cpuset: replace cgroup_mutex locking with cpuset internal locking)
Cc: <[email protected]> # 3.9+
Signed-off-by: Li Zefan <[email protected]>
Sigend-off-by: Tejun Heo <[email protected]>
torvalds pushed a commit that referenced this pull request Mar 5, 2014
When doing some numa tests on powerpc, I triggered an oops bug.  I find
it is caused by using page->_last_cpupid.  It should be initialized as
"-1 & LAST_CPUPID_MASK", but not "-1".  Otherwise, in task_numa_fault(),
we will miss the checking (last_cpupid == (-1 & LAST_CPUPID_MASK)).  And
finally cause an oops bug in task_numa_group(), since the online cpu is
less than possible cpu.  This happen with CONFIG_SPARSE_VMEMMAP disabled

Call trace:

  SMP NR_CPUS=64 NUMA PowerNV
  Modules linked in:
  CPU: 24 PID: 804 Comm: systemd-udevd Not tainted3.13.0-rc1+ #32
  task: c000001e2746aa80 ti: c000001e32c50000 task.ti:c000001e32c50000
  REGS: c000001e32c53510 TRAP: 0300   Not tainted(3.13.0-rc1+)
  MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR:28024424  XER: 20000000
  CFAR: c000000000009324 DAR: 7265717569726857 DSISR:40000000 SOFTE: 1
  NIP  .task_numa_fault+0x1470/0x2370
  LR  .task_numa_fault+0x1468/0x2370
  Call Trace:
   .task_numa_fault+0x1468/0x2370 (unreliable)
   .do_numa_page+0x480/0x4a0
   .handle_mm_fault+0x4ec/0xc90
   .do_page_fault+0x3a8/0x890
   handle_page_fault+0x10/0x30
  Instruction dump:
  3c82fefb 3884b138 48d9cff1 60000000 48000574 3c62fefb3863af78 3c82fefb
  3884b138 48d9cfd5 60000000 e93f0100 <812902e4> 7d2907b45529063e 7d2a07b4
  ---[ end trace 15f2510da5ae07cf ]---

Signed-off-by: Liu Ping Fan <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
zeitgeist87 pushed a commit to zeitgeist87/linux that referenced this pull request Mar 14, 2014
When doing some numa tests on powerpc, I triggered an oops bug.  I find it
is caused by using page->_last_cpupid.  It should be initialized as "-1 &
LAST_CPUPID_MASK", but not "-1".  Otherwise, in task_numa_fault(), we will
miss the checking (last_cpupid == (-1 & LAST_CPUPID_MASK)).  And finally
cause an oops bug in task_numa_group(), since the online cpu is less than
possible cpu.  This happen with CONFIG_SPARSE_VMEMMAP disabled

Call trace:
[   55.978091] SMP NR_CPUS=64 NUMA PowerNV
[   55.978118] Modules linked in:
[   55.978145] CPU: 24 PID: 804 Comm: systemd-udevd Not tainted3.13.0-rc1+ torvalds#32
[   55.978183] task: c000001e2746aa80 ti: c000001e32c50000 task.ti:c000001e32c50000
[   55.978219] NIP: c0000000000f5ad0 LR: c0000000000f5ac8 CTR:c000000000913cf0
[   55.978256] REGS: c000001e32c53510 TRAP: 0300   Not tainted(3.13.0-rc1+)
[   55.978286] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR:28024424  XER: 20000000
[   55.978380] CFAR: c000000000009324 DAR: 7265717569726857 DSISR:40000000 SOFTE: 1
GPR00: c0000000000f5ac8 c000001e32c53790 c000000001f343380000000000000021
GPR04: 0000000000000000 0000000000000031 c000000001f743380000ffffffffffff
GPR08: 0000000000000001 7265717569726573 00000000000000000000000000000000
GPR12: 0000000028024422 c00000000ffdd800 00000000296b2e640000000000000020
GPR16: 0000000000000002 0000000000000003 c000001e2f8e4658c000001e25c1c1d8
GPR20: c000001e2f8e4000 c000000001f7a858 00000000000006580000000040000392
GPR24: 00000000000000a8 c000001e33c1a400 00000000000001d8c000001e25c1c000
GPR28: c000001e33c37ff0 0007837840000392 000000000000003fc000001e32c53790
[   55.978903] NIP [c0000000000f5ad0] .task_numa_fault+0x1470/0x2370
[   55.978934] LR [c0000000000f5ac8] .task_numa_fault+0x1468/0x2370
[   55.978964] Call Trace:
[   55.978978] [c000001e32c53790] [c0000000000f5ac8].task_numa_fault+0x1468/0x2370 (unreliable)
[   55.979036] [c000001e32c539e0] [c00000000020a820].do_numa_page+0x480/0x4a0
[   55.979072] [c000001e32c53b10] [c00000000020bfec].handle_mm_fault+0x4ec/0xc90
[   55.979123] [c000001e32c53c00] [c000000000e88c98].do_page_fault+0x3a8/0x890
[   55.979161] [c000001e32c53e30] [c000000000009568]handle_page_fault+0x10/0x30
[   55.979197] Instruction dump:
[   55.979216] 3c82fefb 3884b138 48d9cff1 60000000 48000574 3c62fefb3863af78 3c82fefb
[   55.979277] 3884b138 48d9cfd5 60000000 e93f0100 <812902e4> 7d2907b45529063e 7d2a07b4
[   55.979354] ---[ end trace 15f2510da5ae07cf ]---

Signed-off-by: Liu Ping Fan <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
brianlilly pushed a commit to crystalfontz/cfa_10036_kernel that referenced this pull request Mar 26, 2014
commit 4729583 upstream.

I can trigger a lockdep warning:

  # mount -t cgroup -o cpuset xxx /cgroup
  # mkdir /cgroup/cpuset
  # mkdir /cgroup/tmp
  # echo 0 > /cgroup/tmp/cpuset.cpus
  # echo 0 > /cgroup/tmp/cpuset.mems
  # echo 1 > /cgroup/tmp/cpuset.memory_migrate
  # echo $$ > /cgroup/tmp/tasks
  # echo 1 > /cgruop/tmp/cpuset.mems

  ===============================
  [ INFO: suspicious RCU usage. ]
  3.14.0-rc1-0.1-default+ torvalds#32 Not tainted
  -------------------------------
  include/linux/cgroup.h:682 suspicious rcu_dereference_check() usage!
  ...
    [<ffffffff81582174>] dump_stack+0x72/0x86
    [<ffffffff810b8f01>] lockdep_rcu_suspicious+0x101/0x140
    [<ffffffff81105ba1>] cpuset_migrate_mm+0xb1/0xe0
  ...

We used to hold cgroup_mutex when calling cpuset_migrate_mm(), but now
we hold cpuset_mutex, which causes task_css() to complain.

This is not a false-positive but a real issue.

Holding cpuset_mutex won't prevent a task from migrating to another
cpuset, and it won't prevent the original task->cgroup from destroying
during this change.

Fixes: 5d21cc2 (cpuset: replace cgroup_mutex locking with cpuset internal locking)
Signed-off-by: Li Zefan <[email protected]>
Sigend-off-by: Tejun Heo <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
swarren pushed a commit to swarren/linux-tegra that referenced this pull request Apr 25, 2014
…ixes

WARNING: please, no spaces at the start of a line
torvalds#29: FILE: mm/memcontrol.c:689:
+       int nid = zone_to_nid(zone);$

WARNING: please, no spaces at the start of a line
torvalds#30: FILE: mm/memcontrol.c:690:
+       int zid = zone_idx(zone);$

WARNING: please, no spaces at the start of a line
torvalds#32: FILE: mm/memcontrol.c:692:
+       return mem_cgroup_zoneinfo(memcg, nid, zid);$

total: 0 errors, 3 warnings, 35 lines checked

./patches/mm-memcontrolc-introduce-helper-mem_cgroup_zoneinfo_zone.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Jianyu Zhan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ddstreet pushed a commit to ddstreet/linux that referenced this pull request Apr 26, 2014
…ixes

WARNING: please, no spaces at the start of a line
torvalds#29: FILE: mm/memcontrol.c:689:
+       int nid = zone_to_nid(zone);$

WARNING: please, no spaces at the start of a line
torvalds#30: FILE: mm/memcontrol.c:690:
+       int zid = zone_idx(zone);$

WARNING: please, no spaces at the start of a line
torvalds#32: FILE: mm/memcontrol.c:692:
+       return mem_cgroup_zoneinfo(memcg, nid, zid);$

total: 0 errors, 3 warnings, 35 lines checked

./patches/mm-memcontrolc-introduce-helper-mem_cgroup_zoneinfo_zone.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Jianyu Zhan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
torvalds pushed a commit that referenced this pull request Apr 27, 2014
The asm-generic, big-endian version of zero_bytemask creates a mask of
bytes preceding the first zero-byte by left shifting ~0ul based on the
position of the first zero byte.

Unfortunately, if the first (top) byte is zero, the output of
prep_zero_mask has only the top bit set, resulting in undefined C
behaviour as we shift left by an amount equal to the width of the type.
As it happens, GCC doesn't manage to spot this through the call to fls(),
but the issue remains if architectures choose to implement their shift
instructions differently.

An example would be arch/arm/ (AArch32), where LSL Rd, Rn, #32 results
in Rd == 0x0, whilst on arch/arm64 (AArch64) LSL Xd, Xn, #64 results in
Xd == Xn.

Rather than check explicitly for the problematic shift, this patch adds
an extra shift by 1, replacing fls with __fls. Since zero_bytemask is
never called with a zero argument (has_zero() is used to check the data
first), we don't need to worry about calling __fls(0), which is
undefined.

Cc: <[email protected]>
Cc: Victor Kamensky <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
swarren pushed a commit to swarren/linux-tegra that referenced this pull request Apr 28, 2014
…ixes

WARNING: please, no spaces at the start of a line
torvalds#29: FILE: mm/memcontrol.c:689:
+       int nid = zone_to_nid(zone);$

WARNING: please, no spaces at the start of a line
torvalds#30: FILE: mm/memcontrol.c:690:
+       int zid = zone_idx(zone);$

WARNING: please, no spaces at the start of a line
torvalds#32: FILE: mm/memcontrol.c:692:
+       return mem_cgroup_zoneinfo(memcg, nid, zid);$

total: 0 errors, 3 warnings, 35 lines checked

./patches/mm-memcontrolc-introduce-helper-mem_cgroup_zoneinfo_zone.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Jianyu Zhan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ddstreet pushed a commit to ddstreet/linux that referenced this pull request May 2, 2014
…ixes

WARNING: please, no spaces at the start of a line
torvalds#29: FILE: mm/memcontrol.c:689:
+       int nid = zone_to_nid(zone);$

WARNING: please, no spaces at the start of a line
torvalds#30: FILE: mm/memcontrol.c:690:
+       int zid = zone_idx(zone);$

WARNING: please, no spaces at the start of a line
torvalds#32: FILE: mm/memcontrol.c:692:
+       return mem_cgroup_zoneinfo(memcg, nid, zid);$

total: 0 errors, 3 warnings, 35 lines checked

./patches/mm-memcontrolc-introduce-helper-mem_cgroup_zoneinfo_zone.patch has style problems, please review.

If any of these errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Jianyu Zhan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
bjackman pushed a commit to bjackman/linux that referenced this pull request Jun 24, 2025
WARNING: Misordered MAINTAINERS entry - list file patterns in alphabetic order
torvalds#32: FILE: MAINTAINERS:15860:
 F:	mm/oom_kill.c
+F:	include/linux/oom.h

Cc: David Rientjes <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jun 24, 2025
Without the change `perf `hangs up on charaster devices. On my system
it's enough to run system-wide sampler for a few seconds to get the
hangup:

    $ perf record -a -g --call-graph=dwarf
    $ perf report
    # hung

`strace` shows that hangup happens on reading on a character device
`/dev/dri/renderD128`

    $ strace -y -f -p 2780484
    strace: Process 2780484 attached
    pread64(101</dev/dri/renderD128>, strace: Process 2780484 detached

It's call trace descends into `elfutils`:

    $ gdb -p 2780484
    (gdb) bt
    #0  0x00007f5e508f04b7 in __libc_pread64 (fd=101, buf=0x7fff9df7edb0, count=0, offset=0)
        at ../sysdeps/unix/sysv/linux/pread64.c:25
    #1  0x00007f5e52b79515 in read_file () from /<<NIX>>/elfutils-0.192/lib/libelf.so.1
    #2  0x00007f5e52b25666 in libdw_open_elf () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #3  0x00007f5e52b25907 in __libdw_open_file () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #4  0x00007f5e52b120a9 in dwfl_report_elf@@ELFUTILS_0.156 ()
       from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #5  0x000000000068bf20 in __report_module (al=al@entry=0x7fff9df80010, ip=ip@entry=139803237033216, ui=ui@entry=0x5369b5e0)
        at util/dso.h:537
    torvalds#6  0x000000000068c3d1 in report_module (ip=139803237033216, ui=0x5369b5e0) at util/unwind-libdw.c:114
    torvalds#7  frame_callback (state=0x535aef10, arg=0x5369b5e0) at util/unwind-libdw.c:242
    torvalds#8  0x00007f5e52b261d3 in dwfl_thread_getframes () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    torvalds#9  0x00007f5e52b25bdb in get_one_thread_cb () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    torvalds#10 0x00007f5e52b25faa in dwfl_getthreads () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    torvalds#11 0x00007f5e52b26514 in dwfl_getthread_frames () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    torvalds#12 0x000000000068c6ce in unwind__get_entries (cb=cb@entry=0x5d4620 <unwind_entry>, arg=arg@entry=0x10cd5fa0,
        thread=thread@entry=0x1076a290, data=data@entry=0x7fff9df80540, max_stack=max_stack@entry=127,
        best_effort=best_effort@entry=false) at util/thread.h:152
    torvalds#13 0x00000000005dae95 in thread__resolve_callchain_unwind (evsel=0x106006d0, thread=0x1076a290, cursor=0x10cd5fa0,
        sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2939
    torvalds#14 thread__resolve_callchain_unwind (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, sample=0x7fff9df80540,
        max_stack=127, symbols=true) at util/machine.c:2920
    torvalds#15 __thread__resolve_callchain (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, evsel@entry=0x7fff9df80440,
        sample=0x7fff9df80540, parent=parent@entry=0x7fff9df804a0, root_al=root_al@entry=0x7fff9df80440, max_stack=127, symbols=true)
        at util/machine.c:2970
    torvalds#16 0x00000000005d0cb2 in thread__resolve_callchain (thread=<optimized out>, cursor=<optimized out>, evsel=0x7fff9df80440,
        sample=<optimized out>, parent=0x7fff9df804a0, root_al=0x7fff9df80440, max_stack=127) at util/machine.h:198
    torvalds#17 sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fff9df804a0,
        evsel=evsel@entry=0x106006d0, al=al@entry=0x7fff9df80440, max_stack=max_stack@entry=127) at util/callchain.c:1127
    torvalds#18 0x0000000000617e08 in hist_entry_iter__add (iter=iter@entry=0x7fff9df80480, al=al@entry=0x7fff9df80440, max_stack_depth=127,
        arg=arg@entry=0x7fff9df81ae0) at util/hist.c:1255
    torvalds#19 0x000000000045d2d0 in process_sample_event (tool=0x7fff9df81ae0, event=<optimized out>, sample=0x7fff9df80540,
        evsel=0x106006d0, machine=<optimized out>) at builtin-report.c:334
    torvalds#20 0x00000000005e3bb1 in perf_session__deliver_event (session=0x105ff2c0, event=0x7f5c7d735ca0, tool=0x7fff9df81ae0,
        file_offset=2914716832, file_path=0x105ffbf0 "perf.data") at util/session.c:1367
    torvalds#21 0x00000000005e8d93 in do_flush (oe=0x105ffa50, show_progress=false) at util/ordered-events.c:245
    torvalds#22 __ordered_events__flush (oe=0x105ffa50, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:324
    torvalds#23 0x00000000005e1f64 in perf_session__process_user_event (session=0x105ff2c0, event=0x7f5c7d752b18, file_offset=2914835224,
        file_path=0x105ffbf0 "perf.data") at util/session.c:1419
    torvalds#24 0x00000000005e47c7 in reader__read_event (rd=rd@entry=0x7fff9df81260, session=session@entry=0x105ff2c0,
    --Type <RET> for more, q to quit, c to continue without paging--
    quit
        prog=prog@entry=0x7fff9df81220) at util/session.c:2132
    torvalds#25 0x00000000005e4b37 in reader__process_events (rd=0x7fff9df81260, session=0x105ff2c0, prog=0x7fff9df81220)
        at util/session.c:2181
    torvalds#26 __perf_session__process_events (session=0x105ff2c0) at util/session.c:2226
    torvalds#27 perf_session__process_events (session=session@entry=0x105ff2c0) at util/session.c:2390
    torvalds#28 0x0000000000460add in __cmd_report (rep=0x7fff9df81ae0) at builtin-report.c:1076
    torvalds#29 cmd_report (argc=<optimized out>, argv=<optimized out>) at builtin-report.c:1827
    torvalds#30 0x00000000004c5a40 in run_builtin (p=p@entry=0xd8f7f8 <commands+312>, argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0)
        at perf.c:351
    torvalds#31 0x00000000004c5d63 in handle_internal_command (argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:404
    torvalds#32 0x0000000000442de3 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:448
    torvalds#33 main (argc=<optimized out>, argv=0x7fff9df844b0) at perf.c:556

The hangup happens because nothing in` perf` or `elfutils` checks if a
mapped file is easily readable.

The change conservatively skips all non-regular files.

Signed-off-by: Sergei Trofimovich <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
bjackman pushed a commit to bjackman/linux that referenced this pull request Jun 25, 2025
WARNING: Misordered MAINTAINERS entry - list file patterns in alphabetic order
torvalds#32: FILE: MAINTAINERS:15860:
 F:	mm/oom_kill.c
+F:	include/linux/oom.h

Cc: David Rientjes <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
heftig pushed a commit to archlinux/linux that referenced this pull request Jun 27, 2025
[ Upstream commit eedf3e3 ]

ACPICA commit 1c28da2242783579d59767617121035dafba18c3

This was originally done in NetBSD:
NetBSD/src@b69d1ac
and is the correct alternative to the smattering of `memcpy`s I
previously contributed to this repository.

This also sidesteps the newly strict checks added in UBSAN:
llvm/llvm-project@7926744

Before this change we see the following UBSAN stack trace in Fuchsia:

  #0    0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e
  #1.2  0x000021982bc4af3c in ubsan_get_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:41 <libclang_rt.asan.so>+0x41f3c
  #1.1  0x000021982bc4af3c in maybe_print_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:51 <libclang_rt.asan.so>+0x41f3c
  #1    0x000021982bc4af3c in ~scoped_report() compiler-rt/lib/ubsan/ubsan_diag.cpp:395 <libclang_rt.asan.so>+0x41f3c
  #2    0x000021982bc4bb6f in handletype_mismatch_impl() compiler-rt/lib/ubsan/ubsan_handlers.cpp:137 <libclang_rt.asan.so>+0x42b6f
  #3    0x000021982bc4b723 in __ubsan_handle_type_mismatch_v1 compiler-rt/lib/ubsan/ubsan_handlers.cpp:142 <libclang_rt.asan.so>+0x42723
  #4    0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e
  #5    0x000021afcfdf2089 in acpi_rs_convert_aml_to_resource(struct acpi_resource*, union aml_resource*, struct acpi_rsconvert_info*) ../../third_party/acpica/source/components/resources/rsmisc.c:355 <platform-bus-x86.so>+0x6b2089
  #6    0x000021afcfded169 in acpi_rs_convert_aml_to_resources(u8*, u32, u32, u8, void**) ../../third_party/acpica/source/components/resources/rslist.c:137 <platform-bus-x86.so>+0x6ad169
  #7    0x000021afcfe2d24a in acpi_ut_walk_aml_resources(struct acpi_walk_state*, u8*, acpi_size, acpi_walk_aml_callback, void**) ../../third_party/acpica/source/components/utilities/utresrc.c:237 <platform-bus-x86.so>+0x6ed24a
  #8    0x000021afcfde66b7 in acpi_rs_create_resource_list(union acpi_operand_object*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rscreate.c:199 <platform-bus-x86.so>+0x6a66b7
  #9    0x000021afcfdf6979 in acpi_rs_get_method_data(acpi_handle, const char*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rsutils.c:770 <platform-bus-x86.so>+0x6b6979
  #10   0x000021afcfdf708f in acpi_walk_resources(acpi_handle, char*, acpi_walk_resource_callback, void*) ../../third_party/acpica/source/components/resources/rsxface.c:731 <platform-bus-x86.so>+0x6b708f
  torvalds#11   0x000021afcfa95dcf in acpi::acpi_impl::walk_resources(acpi::acpi_impl*, acpi_handle, const char*, acpi::Acpi::resources_callable) ../../src/devices/board/lib/acpi/acpi-impl.cc:41 <platform-bus-x86.so>+0x355dcf
  torvalds#12   0x000021afcfaa8278 in acpi::device_builder::gather_resources(acpi::device_builder*, acpi::Acpi*, fidl::any_arena&, acpi::Manager*, acpi::device_builder::gather_resources_callback) ../../src/devices/board/lib/acpi/device-builder.cc:84 <platform-bus-x86.so>+0x368278
  torvalds#13   0x000021afcfbddb87 in acpi::Manager::configure_discovered_devices(acpi::Manager*) ../../src/devices/board/lib/acpi/manager.cc:75 <platform-bus-x86.so>+0x49db87
  torvalds#14   0x000021afcf99091d in publish_acpi_devices(acpi::Manager*, zx_device_t*, zx_device_t*) ../../src/devices/board/drivers/x86/acpi-nswalk.cc:95 <platform-bus-x86.so>+0x25091d
  torvalds#15   0x000021afcf9c1d4e in x86::X86::do_init(x86::X86*) ../../src/devices/board/drivers/x86/x86.cc:60 <platform-bus-x86.so>+0x281d4e
  torvalds#16   0x000021afcf9e33ad in λ(x86::X86::ddk_init::(anon class)*) ../../src/devices/board/drivers/x86/x86.cc:77 <platform-bus-x86.so>+0x2a33ad
  torvalds#17   0x000021afcf9e313e in fit::internal::target<(lambda at../../src/devices/board/drivers/x86/x86.cc:76:19), false, false, std::__2::allocator<std::byte>, void>::invoke(void*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:183 <platform-bus-x86.so>+0x2a313e
  torvalds#18   0x000021afcfbab4c7 in fit::internal::function_base<16UL, false, void(), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <platform-bus-x86.so>+0x46b4c7
  torvalds#19   0x000021afcfbab342 in fit::function_impl<16UL, false, void(), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/function.h:315 <platform-bus-x86.so>+0x46b342
  torvalds#20   0x000021afcfcd98c3 in async::internal::retained_task::Handler(async_dispatcher_t*, async_task_t*, zx_status_t) ../../sdk/lib/async/task.cc:24 <platform-bus-x86.so>+0x5998c3
  torvalds#21   0x00002290f9924616 in λ(const driver_runtime::Dispatcher::post_task::(anon class)*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/dispatcher.cc:789 <libdriver_runtime.so>+0x10a616
  torvalds#22   0x00002290f9924323 in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:788:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int>::invoke(void*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0x10a323
  torvalds#23   0x00002290f9904b76 in fit::internal::function_base<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xeab76
  torvalds#24   0x00002290f9904831 in fit::callback_impl<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::operator()(fit::callback_impl<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/function.h:471 <libdriver_runtime.so>+0xea831
  torvalds#25   0x00002290f98d5adc in driver_runtime::callback_request::Call(driver_runtime::callback_request*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/callback_request.h:74 <libdriver_runtime.so>+0xbbadc
  torvalds#26   0x00002290f98e1e58 in driver_runtime::Dispatcher::dispatch_callback(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >) ../../src/devices/bin/driver_runtime/dispatcher.cc:1248 <libdriver_runtime.so>+0xc7e58
  torvalds#27   0x00002290f98e4159 in driver_runtime::Dispatcher::dispatch_callbacks(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:1308 <libdriver_runtime.so>+0xca159
  torvalds#28   0x00002290f9918414 in λ(const driver_runtime::Dispatcher::create_with_adder::(anon class)*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:353 <libdriver_runtime.so>+0xfe414
  torvalds#29   0x00002290f991812d in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:351:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>>::invoke(void*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0xfe12d
  torvalds#30   0x00002290f9906fc7 in fit::internal::function_base<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xecfc7
  torvalds#31   0x00002290f9906c66 in fit::function_impl<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/function.h:315 <libdriver_runtime.so>+0xecc66
  torvalds#32   0x00002290f98e73d9 in driver_runtime::Dispatcher::event_waiter::invoke_callback(driver_runtime::Dispatcher::event_waiter*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.h:543 <libdriver_runtime.so>+0xcd3d9
  torvalds#33   0x00002290f98e700d in driver_runtime::Dispatcher::event_waiter::handle_event(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/dispatcher.cc:1442 <libdriver_runtime.so>+0xcd00d
  torvalds#34   0x00002290f9918983 in async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event(async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>*, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/async_loop_owned_event_handler.h:59 <libdriver_runtime.so>+0xfe983
  torvalds#35   0x00002290f9918b9e in async::wait_method<async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>, &async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event>::call_handler(async_dispatcher_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async/include/lib/async/cpp/wait.h:201 <libdriver_runtime.so>+0xfeb9e
  torvalds#36   0x00002290f99bf509 in async_loop_dispatch_wait(async_loop_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async-loop/loop.c:394 <libdriver_runtime.so>+0x1a5509
  torvalds#37   0x00002290f99b9958 in async_loop_run_once(async_loop_t*, zx_time_t) ../../sdk/lib/async-loop/loop.c:343 <libdriver_runtime.so>+0x19f958
  torvalds#38   0x00002290f99b9247 in async_loop_run(async_loop_t*, zx_time_t, _Bool) ../../sdk/lib/async-loop/loop.c:301 <libdriver_runtime.so>+0x19f247
  torvalds#39   0x00002290f99ba962 in async_loop_run_thread(void*) ../../sdk/lib/async-loop/loop.c:860 <libdriver_runtime.so>+0x1a0962
  torvalds#40   0x000041afd176ef30 in start_c11(void*) ../../zircon/third_party/ulib/musl/pthread/pthread_create.c:63 <libc.so>+0x84f30
  torvalds#41   0x000041afd18a448d in thread_trampoline(uintptr_t, uintptr_t) ../../zircon/system/ulib/runtime/thread.cc:100 <libc.so>+0x1ba48d

Link: acpica/acpica@1c28da22
Signed-off-by: Rafael J. Wysocki <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Tamir Duberstein <[email protected]>
[ rjw: Pick up the tag from Tamir ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Jul 4, 2025
[ Upstream commit eedf3e3 ]

ACPICA commit 1c28da2242783579d59767617121035dafba18c3

This was originally done in NetBSD:
NetBSD/src@b69d1ac
and is the correct alternative to the smattering of `memcpy`s I
previously contributed to this repository.

This also sidesteps the newly strict checks added in UBSAN:
llvm/llvm-project@7926744

Before this change we see the following UBSAN stack trace in Fuchsia:

  #0    0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e
  #1.2  0x000021982bc4af3c in ubsan_get_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:41 <libclang_rt.asan.so>+0x41f3c
  #1.1  0x000021982bc4af3c in maybe_print_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:51 <libclang_rt.asan.so>+0x41f3c
  #1    0x000021982bc4af3c in ~scoped_report() compiler-rt/lib/ubsan/ubsan_diag.cpp:395 <libclang_rt.asan.so>+0x41f3c
  #2    0x000021982bc4bb6f in handletype_mismatch_impl() compiler-rt/lib/ubsan/ubsan_handlers.cpp:137 <libclang_rt.asan.so>+0x42b6f
  #3    0x000021982bc4b723 in __ubsan_handle_type_mismatch_v1 compiler-rt/lib/ubsan/ubsan_handlers.cpp:142 <libclang_rt.asan.so>+0x42723
  #4    0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e
  #5    0x000021afcfdf2089 in acpi_rs_convert_aml_to_resource(struct acpi_resource*, union aml_resource*, struct acpi_rsconvert_info*) ../../third_party/acpica/source/components/resources/rsmisc.c:355 <platform-bus-x86.so>+0x6b2089
  torvalds#6    0x000021afcfded169 in acpi_rs_convert_aml_to_resources(u8*, u32, u32, u8, void**) ../../third_party/acpica/source/components/resources/rslist.c:137 <platform-bus-x86.so>+0x6ad169
  torvalds#7    0x000021afcfe2d24a in acpi_ut_walk_aml_resources(struct acpi_walk_state*, u8*, acpi_size, acpi_walk_aml_callback, void**) ../../third_party/acpica/source/components/utilities/utresrc.c:237 <platform-bus-x86.so>+0x6ed24a
  torvalds#8    0x000021afcfde66b7 in acpi_rs_create_resource_list(union acpi_operand_object*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rscreate.c:199 <platform-bus-x86.so>+0x6a66b7
  torvalds#9    0x000021afcfdf6979 in acpi_rs_get_method_data(acpi_handle, const char*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rsutils.c:770 <platform-bus-x86.so>+0x6b6979
  torvalds#10   0x000021afcfdf708f in acpi_walk_resources(acpi_handle, char*, acpi_walk_resource_callback, void*) ../../third_party/acpica/source/components/resources/rsxface.c:731 <platform-bus-x86.so>+0x6b708f
  torvalds#11   0x000021afcfa95dcf in acpi::acpi_impl::walk_resources(acpi::acpi_impl*, acpi_handle, const char*, acpi::Acpi::resources_callable) ../../src/devices/board/lib/acpi/acpi-impl.cc:41 <platform-bus-x86.so>+0x355dcf
  torvalds#12   0x000021afcfaa8278 in acpi::device_builder::gather_resources(acpi::device_builder*, acpi::Acpi*, fidl::any_arena&, acpi::Manager*, acpi::device_builder::gather_resources_callback) ../../src/devices/board/lib/acpi/device-builder.cc:84 <platform-bus-x86.so>+0x368278
  torvalds#13   0x000021afcfbddb87 in acpi::Manager::configure_discovered_devices(acpi::Manager*) ../../src/devices/board/lib/acpi/manager.cc:75 <platform-bus-x86.so>+0x49db87
  torvalds#14   0x000021afcf99091d in publish_acpi_devices(acpi::Manager*, zx_device_t*, zx_device_t*) ../../src/devices/board/drivers/x86/acpi-nswalk.cc:95 <platform-bus-x86.so>+0x25091d
  torvalds#15   0x000021afcf9c1d4e in x86::X86::do_init(x86::X86*) ../../src/devices/board/drivers/x86/x86.cc:60 <platform-bus-x86.so>+0x281d4e
  torvalds#16   0x000021afcf9e33ad in λ(x86::X86::ddk_init::(anon class)*) ../../src/devices/board/drivers/x86/x86.cc:77 <platform-bus-x86.so>+0x2a33ad
  torvalds#17   0x000021afcf9e313e in fit::internal::target<(lambda at../../src/devices/board/drivers/x86/x86.cc:76:19), false, false, std::__2::allocator<std::byte>, void>::invoke(void*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:183 <platform-bus-x86.so>+0x2a313e
  torvalds#18   0x000021afcfbab4c7 in fit::internal::function_base<16UL, false, void(), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <platform-bus-x86.so>+0x46b4c7
  torvalds#19   0x000021afcfbab342 in fit::function_impl<16UL, false, void(), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/function.h:315 <platform-bus-x86.so>+0x46b342
  torvalds#20   0x000021afcfcd98c3 in async::internal::retained_task::Handler(async_dispatcher_t*, async_task_t*, zx_status_t) ../../sdk/lib/async/task.cc:24 <platform-bus-x86.so>+0x5998c3
  torvalds#21   0x00002290f9924616 in λ(const driver_runtime::Dispatcher::post_task::(anon class)*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/dispatcher.cc:789 <libdriver_runtime.so>+0x10a616
  torvalds#22   0x00002290f9924323 in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:788:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int>::invoke(void*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0x10a323
  torvalds#23   0x00002290f9904b76 in fit::internal::function_base<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xeab76
  torvalds#24   0x00002290f9904831 in fit::callback_impl<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::operator()(fit::callback_impl<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/function.h:471 <libdriver_runtime.so>+0xea831
  torvalds#25   0x00002290f98d5adc in driver_runtime::callback_request::Call(driver_runtime::callback_request*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/callback_request.h:74 <libdriver_runtime.so>+0xbbadc
  torvalds#26   0x00002290f98e1e58 in driver_runtime::Dispatcher::dispatch_callback(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >) ../../src/devices/bin/driver_runtime/dispatcher.cc:1248 <libdriver_runtime.so>+0xc7e58
  torvalds#27   0x00002290f98e4159 in driver_runtime::Dispatcher::dispatch_callbacks(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:1308 <libdriver_runtime.so>+0xca159
  torvalds#28   0x00002290f9918414 in λ(const driver_runtime::Dispatcher::create_with_adder::(anon class)*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:353 <libdriver_runtime.so>+0xfe414
  torvalds#29   0x00002290f991812d in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:351:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>>::invoke(void*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0xfe12d
  torvalds#30   0x00002290f9906fc7 in fit::internal::function_base<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xecfc7
  torvalds#31   0x00002290f9906c66 in fit::function_impl<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/function.h:315 <libdriver_runtime.so>+0xecc66
  torvalds#32   0x00002290f98e73d9 in driver_runtime::Dispatcher::event_waiter::invoke_callback(driver_runtime::Dispatcher::event_waiter*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.h:543 <libdriver_runtime.so>+0xcd3d9
  torvalds#33   0x00002290f98e700d in driver_runtime::Dispatcher::event_waiter::handle_event(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/dispatcher.cc:1442 <libdriver_runtime.so>+0xcd00d
  torvalds#34   0x00002290f9918983 in async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event(async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>*, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/async_loop_owned_event_handler.h:59 <libdriver_runtime.so>+0xfe983
  torvalds#35   0x00002290f9918b9e in async::wait_method<async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>, &async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event>::call_handler(async_dispatcher_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async/include/lib/async/cpp/wait.h:201 <libdriver_runtime.so>+0xfeb9e
  torvalds#36   0x00002290f99bf509 in async_loop_dispatch_wait(async_loop_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async-loop/loop.c:394 <libdriver_runtime.so>+0x1a5509
  torvalds#37   0x00002290f99b9958 in async_loop_run_once(async_loop_t*, zx_time_t) ../../sdk/lib/async-loop/loop.c:343 <libdriver_runtime.so>+0x19f958
  torvalds#38   0x00002290f99b9247 in async_loop_run(async_loop_t*, zx_time_t, _Bool) ../../sdk/lib/async-loop/loop.c:301 <libdriver_runtime.so>+0x19f247
  torvalds#39   0x00002290f99ba962 in async_loop_run_thread(void*) ../../sdk/lib/async-loop/loop.c:860 <libdriver_runtime.so>+0x1a0962
  torvalds#40   0x000041afd176ef30 in start_c11(void*) ../../zircon/third_party/ulib/musl/pthread/pthread_create.c:63 <libc.so>+0x84f30
  torvalds#41   0x000041afd18a448d in thread_trampoline(uintptr_t, uintptr_t) ../../zircon/system/ulib/runtime/thread.cc:100 <libc.so>+0x1ba48d

Link: acpica/acpica@1c28da22
Signed-off-by: Rafael J. Wysocki <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Tamir Duberstein <[email protected]>
[ rjw: Pick up the tag from Tamir ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
@torvalds torvalds closed this Sep 22, 2025
sys-i915-oscijenkins pushed a commit to intel-lgci-fdo-gitlab-mirror/drm.xe.kernel that referenced this pull request Sep 30, 2025
Before calling ttm_bo_populate() in the CPU fault path of a bo,
we assert that the bo is not being migrated. However, for
local bos we share the reservation object with other local bos
that might be in the process of being migrated. Also some VM
operations may attach USAGE_KERNEL fences to the common
reservation object and trigger false positives from the assert.

So remove the assert and instead wait for bo idle. This may
unnecessarily wait for idle in some cases but since we're
doing this wait later in the fault path anyway we might as
well do it here as well.

This fixes warnings like:
Sep 25 14:56:23 desky kernel: ------------[ cut here ]------------
Sep 25 14:56:23 desky kernel: xe 0000:03:00.0: [drm] Assertion `dma_resv_test_signaled(tbo->base.resv, DMA_RESV_USAGE_KERNEL) || (tbo->ttm && ttm_tt_is_populated(tbo->ttm))` failed!
                              platform: BATTLEMAGE subplatform: 1
                              graphics: Xe2_HPG 20.01 step A0
                              media: Xe2_HPM 13.01 step A1
Sep 25 14:56:23 desky kernel: WARNING: CPU: 6 PID: 24767 at drivers/gpu/drm/xe/xe_bo.c:1748 xe_bo_fault_migrate+0x1bb/0x300 [xe]
Sep 25 14:56:23 desky kernel: Modules linked in: cpuid dm_crypt xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc xfrm_user xfr>
Sep 25 14:56:23 desky kernel:  snd_soc_sdca snd_seq_midi prime_numbers coretemp snd_seq_midi_event drm_ttm_helper snd_hda_codec drm_buddy drm_exec snd_rawmidi snd_soc_core snd_hda_cor>
Sep 25 14:56:23 desky kernel: CPU: 6 UID: 1000 PID: 24767 Comm: steamwebhelper Tainted: G     U  W           6.17.0-rc7+ torvalds#32 PREEMPT(voluntary)
Sep 25 14:56:23 desky kernel: Tainted: [U]=USER, [W]=WARN
Sep 25 14:56:23 desky kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D36/PRO Z690-P DDR4 (MS-7D36), BIOS A.A1 10/18/2022
Sep 25 14:56:23 desky kernel: RIP: 0010:xe_bo_fault_migrate+0x1bb/0x300 [xe]
Sep 25 14:56:23 desky kernel: Code: fa 64 29 f9 48 c7 c7 40 e0 d3 c1 51 48 c7 c1 c0 e3 d3 c1 52 4c 8b 45 c0 41 50 44 8b 4d c8 4d 89 e0 48 8b 55 a8 e8 25 27 95 ef <0f> 0b 48 83 c4 40 4>
Sep 25 14:56:23 desky kernel: RSP: 0000:ffffae1ca88c7b10 EFLAGS: 00010286
Sep 25 14:56:23 desky kernel: RAX: 0000000000000000 RBX: ffff8d7cfd7e6800 RCX: 0000000000000027
Sep 25 14:56:23 desky kernel: RDX: ffff8d845019cec8 RSI: 0000000000000001 RDI: ffff8d845019cec0
Sep 25 14:56:23 desky kernel: RBP: ffffae1ca88c7bc8 R08: 0000000000000000 R09: 0000000000000000
Sep 25 14:56:23 desky kernel: R10: 0000000000000000 R11: 0000000000000004 R12: ffffffffc1db1faa
Sep 25 14:56:23 desky kernel: R13: ffffffffc1db2ab4 R14: 0000000000000001 R15: ffffae1ca88c7bd8
Sep 25 14:56:23 desky kernel: FS:  00007fb1baf31940(0000) GS:ffff8d849c870000(0000) knlGS:0000000000000000
Sep 25 14:56:23 desky kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 25 14:56:23 desky kernel: CR2: 00007fb1b2860020 CR3: 00000001705a9004 CR4: 0000000000772ef0
Sep 25 14:56:23 desky kernel: PKRU: 55555558
Sep 25 14:56:23 desky kernel: Call Trace:
Sep 25 14:56:23 desky kernel:  <TASK>
Sep 25 14:56:23 desky kernel:  xe_bo_cpu_fault_fastpath+0x11e/0x220 [xe]
Sep 25 14:56:23 desky kernel:  xe_bo_cpu_fault+0x84/0x410 [xe]
Sep 25 14:56:23 desky kernel:  ? __x64_sys_mmap+0x33/0x50
Sep 25 14:56:23 desky kernel:  ? x64_sys_call+0x1b2e/0x20d0
Sep 25 14:56:23 desky kernel:  ? do_syscall_64+0x9d/0x1f0
Sep 25 14:56:23 desky kernel:  ? __check_object_size+0x4a/0x2e0
Sep 25 14:56:23 desky kernel:  __do_fault+0x36/0x190
Sep 25 14:56:23 desky kernel:  do_fault+0xcf/0x570
Sep 25 14:56:23 desky kernel:  __handle_mm_fault+0x92b/0xfe0
Sep 25 14:56:23 desky kernel:  ? ktime_get_mono_fast_ns+0x39/0xd0
Sep 25 14:56:23 desky kernel:  handle_mm_fault+0x164/0x2c0
Sep 25 14:56:23 desky kernel:  do_user_addr_fault+0x2cb/0x840
Sep 25 14:56:23 desky kernel:  exc_page_fault+0x75/0x180
Sep 25 14:56:23 desky kernel:  asm_exc_page_fault+0x27/0x30
Sep 25 14:56:23 desky kernel: RIP: 0033:0x7fb1bc388bb7
Sep 25 14:56:23 desky kernel: Code: 48 ff c7 48 01 fe 48 8d 54 11 80 0f 1f 84 00 00 00 00 00 c5 fe 6f 0e c5 fe 6f 56 20 c5 fe 6f 5e 40 c5 fe 6f 66 60 48 83 ee 80 <c5> fd 7f 0f c5 fd 7>
Sep 25 14:56:23 desky kernel: RSP: 002b:00007ffd7814fad8 EFLAGS: 00010207
Sep 25 14:56:23 desky kernel: RAX: 00007fb1b2860000 RBX: 0000000000000690 RCX: 00007fb1b2860000
Sep 25 14:56:23 desky kernel: RDX: 00007fb1b2860610 RSI: 0000556eda79f4c0 RDI: 00007fb1b2860020
Sep 25 14:56:23 desky kernel: RBP: 00007ffd7814fb60 R08: 0000000000000000 R09: 000000012be0e000
Sep 25 14:56:23 desky kernel: R10: 00007fb1b2860000 R11: 0000000000000246 R12: 0000556edd39a240
Sep 25 14:56:23 desky kernel: R13: 00007fb1b2dcb010 R14: 0000556eda79f420 R15: 0000000000000000
Sep 25 14:56:23 desky kernel:  </TASK>

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5250
Fixes: c2ae94c ("drm/xe: Convert the CPU fault handler for exhaustive eviction")
Cc: Matthew Brost <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Oct 3, 2025
Before calling ttm_bo_populate() in the CPU fault path of a bo,
we assert that the bo is not being migrated. However, for
local bos we share the reservation object with other local bos
that might be in the process of being migrated. Also some VM
operations may attach USAGE_KERNEL fences to the common
reservation object and trigger false positives from the assert.

So remove the assert and instead wait for bo idle. This may
unnecessarily wait for idle in some cases but since we're
doing this wait later in the fault path anyway we might as
well do it here as well.

This fixes warnings like:
Sep 25 14:56:23 desky kernel: ------------[ cut here ]------------
Sep 25 14:56:23 desky kernel: xe 0000:03:00.0: [drm] Assertion `dma_resv_test_signaled(tbo->base.resv, DMA_RESV_USAGE_KERNEL) || (tbo->ttm && ttm_tt_is_populated(tbo->ttm))` failed!
                              platform: BATTLEMAGE subplatform: 1
                              graphics: Xe2_HPG 20.01 step A0
                              media: Xe2_HPM 13.01 step A1
Sep 25 14:56:23 desky kernel: WARNING: CPU: 6 PID: 24767 at drivers/gpu/drm/xe/xe_bo.c:1748 xe_bo_fault_migrate+0x1bb/0x300 [xe]
Sep 25 14:56:23 desky kernel: Modules linked in: cpuid dm_crypt xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc xfrm_user xfr>
Sep 25 14:56:23 desky kernel:  snd_soc_sdca snd_seq_midi prime_numbers coretemp snd_seq_midi_event drm_ttm_helper snd_hda_codec drm_buddy drm_exec snd_rawmidi snd_soc_core snd_hda_cor>
Sep 25 14:56:23 desky kernel: CPU: 6 UID: 1000 PID: 24767 Comm: steamwebhelper Tainted: G     U  W           6.17.0-rc7+ torvalds#32 PREEMPT(voluntary)
Sep 25 14:56:23 desky kernel: Tainted: [U]=USER, [W]=WARN
Sep 25 14:56:23 desky kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D36/PRO Z690-P DDR4 (MS-7D36), BIOS A.A1 10/18/2022
Sep 25 14:56:23 desky kernel: RIP: 0010:xe_bo_fault_migrate+0x1bb/0x300 [xe]
Sep 25 14:56:23 desky kernel: Code: fa 64 29 f9 48 c7 c7 40 e0 d3 c1 51 48 c7 c1 c0 e3 d3 c1 52 4c 8b 45 c0 41 50 44 8b 4d c8 4d 89 e0 48 8b 55 a8 e8 25 27 95 ef <0f> 0b 48 83 c4 40 4>
Sep 25 14:56:23 desky kernel: RSP: 0000:ffffae1ca88c7b10 EFLAGS: 00010286
Sep 25 14:56:23 desky kernel: RAX: 0000000000000000 RBX: ffff8d7cfd7e6800 RCX: 0000000000000027
Sep 25 14:56:23 desky kernel: RDX: ffff8d845019cec8 RSI: 0000000000000001 RDI: ffff8d845019cec0
Sep 25 14:56:23 desky kernel: RBP: ffffae1ca88c7bc8 R08: 0000000000000000 R09: 0000000000000000
Sep 25 14:56:23 desky kernel: R10: 0000000000000000 R11: 0000000000000004 R12: ffffffffc1db1faa
Sep 25 14:56:23 desky kernel: R13: ffffffffc1db2ab4 R14: 0000000000000001 R15: ffffae1ca88c7bd8
Sep 25 14:56:23 desky kernel: FS:  00007fb1baf31940(0000) GS:ffff8d849c870000(0000) knlGS:0000000000000000
Sep 25 14:56:23 desky kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 25 14:56:23 desky kernel: CR2: 00007fb1b2860020 CR3: 00000001705a9004 CR4: 0000000000772ef0
Sep 25 14:56:23 desky kernel: PKRU: 55555558
Sep 25 14:56:23 desky kernel: Call Trace:
Sep 25 14:56:23 desky kernel:  <TASK>
Sep 25 14:56:23 desky kernel:  xe_bo_cpu_fault_fastpath+0x11e/0x220 [xe]
Sep 25 14:56:23 desky kernel:  xe_bo_cpu_fault+0x84/0x410 [xe]
Sep 25 14:56:23 desky kernel:  ? __x64_sys_mmap+0x33/0x50
Sep 25 14:56:23 desky kernel:  ? x64_sys_call+0x1b2e/0x20d0
Sep 25 14:56:23 desky kernel:  ? do_syscall_64+0x9d/0x1f0
Sep 25 14:56:23 desky kernel:  ? __check_object_size+0x4a/0x2e0
Sep 25 14:56:23 desky kernel:  __do_fault+0x36/0x190
Sep 25 14:56:23 desky kernel:  do_fault+0xcf/0x570
Sep 25 14:56:23 desky kernel:  __handle_mm_fault+0x92b/0xfe0
Sep 25 14:56:23 desky kernel:  ? ktime_get_mono_fast_ns+0x39/0xd0
Sep 25 14:56:23 desky kernel:  handle_mm_fault+0x164/0x2c0
Sep 25 14:56:23 desky kernel:  do_user_addr_fault+0x2cb/0x840
Sep 25 14:56:23 desky kernel:  exc_page_fault+0x75/0x180
Sep 25 14:56:23 desky kernel:  asm_exc_page_fault+0x27/0x30
Sep 25 14:56:23 desky kernel: RIP: 0033:0x7fb1bc388bb7
Sep 25 14:56:23 desky kernel: Code: 48 ff c7 48 01 fe 48 8d 54 11 80 0f 1f 84 00 00 00 00 00 c5 fe 6f 0e c5 fe 6f 56 20 c5 fe 6f 5e 40 c5 fe 6f 66 60 48 83 ee 80 <c5> fd 7f 0f c5 fd 7>
Sep 25 14:56:23 desky kernel: RSP: 002b:00007ffd7814fad8 EFLAGS: 00010207
Sep 25 14:56:23 desky kernel: RAX: 00007fb1b2860000 RBX: 0000000000000690 RCX: 00007fb1b2860000
Sep 25 14:56:23 desky kernel: RDX: 00007fb1b2860610 RSI: 0000556eda79f4c0 RDI: 00007fb1b2860020
Sep 25 14:56:23 desky kernel: RBP: 00007ffd7814fb60 R08: 0000000000000000 R09: 000000012be0e000
Sep 25 14:56:23 desky kernel: R10: 00007fb1b2860000 R11: 0000000000000246 R12: 0000556edd39a240
Sep 25 14:56:23 desky kernel: R13: 00007fb1b2dcb010 R14: 0000556eda79f420 R15: 0000000000000000
Sep 25 14:56:23 desky kernel:  </TASK>

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5250
Fixes: c2ae94c ("drm/xe: Convert the CPU fault handler for exhaustive eviction")
Cc: Matthew Brost <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 8f1756a)
Signed-off-by: Lucas De Marchi <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Nov 10, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.

This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0:	hint	#0x19
ffffffc080ffcff4:	stp	x29, x30, [sp, #-48]!
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8:	adrp	x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc:	add	x29, sp, #0x0
ffffffc080ffd000:	stp	x19, x20, [sp, torvalds#16]
ffffffc080ffd004:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008:	add	x0, x2, #0xc20
{
ffffffc080ffd00c:	stp	x21, x22, [sp, torvalds#32]
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014:	add	x19, x19, #0xbb0
ffffffc080ffd018:	ldr	w3, [x20, #4]

	dev->last_state_idx = state;

to

static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034:	hint	#0x19
ffffffc080ffd038:	stp	x29, x30, [sp, #-48]!
ffffffc080ffd03c:	add	x29, sp, #0x0
ffffffc080ffd040:	stp	x19, x20, [sp, torvalds#16]
ffffffc080ffd044:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c:	stp	x21, x22, [sp, torvalds#32]
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050:	add	x19, x19, #0xbb0

	dev->last_state_idx = state;

This saves us:
	adrp    x2, ffffffc0848c0000 <gicv5_global_data+0x28>
	add     x0, x2, #0xc20
	ldr     w3, [x20, #4]

Signed-off-by: Christian Loehle <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Nov 13, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.

This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0:	hint	#0x19
ffffffc080ffcff4:	stp	x29, x30, [sp, #-48]!
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8:	adrp	x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc:	add	x29, sp, #0x0
ffffffc080ffd000:	stp	x19, x20, [sp, torvalds#16]
ffffffc080ffd004:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008:	add	x0, x2, #0xc20
{
ffffffc080ffd00c:	stp	x21, x22, [sp, torvalds#32]
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014:	add	x19, x19, #0xbb0
ffffffc080ffd018:	ldr	w3, [x20, #4]

	dev->last_state_idx = state;

to

static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034:	hint	#0x19
ffffffc080ffd038:	stp	x29, x30, [sp, #-48]!
ffffffc080ffd03c:	add	x29, sp, #0x0
ffffffc080ffd040:	stp	x19, x20, [sp, torvalds#16]
ffffffc080ffd044:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c:	stp	x21, x22, [sp, torvalds#32]
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050:	add	x19, x19, #0xbb0

	dev->last_state_idx = state;

This saves us:
	adrp    x2, ffffffc0848c0000 <gicv5_global_data+0x28>
	add     x0, x2, #0xc20
	ldr     w3, [x20, #4]

Signed-off-by: Christian Loehle <[email protected]>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>
hellsgod pushed a commit to hellsgod/linux that referenced this pull request Nov 14, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.

This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0:	hint	#0x19
ffffffc080ffcff4:	stp	x29, x30, [sp, #-48]!
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8:	adrp	x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc:	add	x29, sp, #0x0
ffffffc080ffd000:	stp	x19, x20, [sp, torvalds#16]
ffffffc080ffd004:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008:	add	x0, x2, #0xc20
{
ffffffc080ffd00c:	stp	x21, x22, [sp, torvalds#32]
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014:	add	x19, x19, #0xbb0
ffffffc080ffd018:	ldr	w3, [x20, #4]

	dev->last_state_idx = state;

to

static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034:	hint	#0x19
ffffffc080ffd038:	stp	x29, x30, [sp, #-48]!
ffffffc080ffd03c:	add	x29, sp, #0x0
ffffffc080ffd040:	stp	x19, x20, [sp, torvalds#16]
ffffffc080ffd044:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c:	stp	x21, x22, [sp, torvalds#32]
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050:	add	x19, x19, #0xbb0

	dev->last_state_idx = state;

This saves us:
	adrp    x2, ffffffc0848c0000 <gicv5_global_data+0x28>
	add     x0, x2, #0xc20
	ldr     w3, [x20, #4]

Signed-off-by: Christian Loehle <[email protected]>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>
Linux4 pushed a commit to fp6-aosp/android_kernel_fairphone_sm7635 that referenced this pull request Nov 27, 2025
commit 927fa5b upstream.

KMSAN reported uninit-value access in __unix_walk_scc() [1].

In the list_for_each_entry_reverse() loop, when the vertex's index
equals it's scc_index, the loop uses the variable vertex as a
temporary variable that points to a vertex in scc. And when the loop
is finished, the variable vertex points to the list head, in this case
scc, which is a local variable on the stack (more precisely, it's not
even scc and might underflow the call stack of __unix_walk_scc():
container_of(&scc, struct unix_vertex, scc_entry)).

However, the variable vertex is used under the label prev_vertex. So
if the edge_stack is not empty and the function jumps to the
prev_vertex label, the function will access invalid data on the
stack. This causes the uninit-value access issue.

Fix this by introducing a new temporary variable for the loop.

[1]
BUG: KMSAN: uninit-value in __unix_walk_scc net/unix/garbage.c:478 [inline]
BUG: KMSAN: uninit-value in unix_walk_scc net/unix/garbage.c:526 [inline]
BUG: KMSAN: uninit-value in __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
 __unix_walk_scc net/unix/garbage.c:478 [inline]
 unix_walk_scc net/unix/garbage.c:526 [inline]
 __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
 process_one_work kernel/workqueue.c:3231 [inline]
 process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
 kthread+0x3c4/0x530 kernel/kthread.c:389
 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Uninit was stored to memory at:
 unix_walk_scc net/unix/garbage.c:526 [inline]
 __unix_gc+0x2adf/0x3c20 net/unix/garbage.c:584
 process_one_work kernel/workqueue.c:3231 [inline]
 process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
 kthread+0x3c4/0x530 kernel/kthread.c:389
 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Local variable entries created at:
 ref_tracker_free+0x48/0xf30 lib/ref_tracker.c:222
 netdev_tracker_free include/linux/netdevice.h:4058 [inline]
 netdev_put include/linux/netdevice.h:4075 [inline]
 dev_put include/linux/netdevice.h:4101 [inline]
 update_gid_event_work_handler+0xaa/0x1b0 drivers/infiniband/core/roce_gid_mgmt.c:813

CPU: 1 PID: 12763 Comm: kworker/u8:31 Not tainted 6.10.0-rc4-00217-g35bb670d65fc torvalds#32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Workqueue: events_unbound __unix_gc

Bug: 404256079
Fixes: 3484f06 ("af_unix: Detect Strongly Connected Components.")
Reported-by: syzkaller <[email protected]>
Signed-off-by: Shigeru Yoshida <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit 0297e7d)
Signed-off-by: Lee Jones <[email protected]>
Change-Id: If09634d63e04e73d6b6e31984020752cd97e52f0
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Nov 27, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs
but rather just perform direct assignments.

The performance benchmarks with Generic Entry patch[1] with audit on
from perf bench basic syscall on kunpeng920 gives roughly a 1%
performance uplift and also aligns the implementation with
x86 and RISC-V.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

[1]: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jinjie Ruan <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Dec 1, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs
but rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments()
and syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance
benchmarks from perf bench basic syscall on kunpeng920 gives roughly
a 1% performance uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

[1]: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jinjie Ruan <[email protected]>
thestinger pushed a commit to GrapheneOS/kernel_common-6.12 that referenced this pull request Dec 3, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.

This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0:	hint	#0x19
ffffffc080ffcff4:	stp	x29, x30, [sp, #-48]!
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8:	adrp	x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc:	add	x29, sp, #0x0
ffffffc080ffd000:	stp	x19, x20, [sp, torvalds#16]
ffffffc080ffd004:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008:	add	x0, x2, #0xc20
{
ffffffc080ffd00c:	stp	x21, x22, [sp, torvalds#32]
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014:	add	x19, x19, #0xbb0
ffffffc080ffd018:	ldr	w3, [x20, #4]

	dev->last_state_idx = state;

to

static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034:	hint	#0x19
ffffffc080ffd038:	stp	x29, x30, [sp, #-48]!
ffffffc080ffd03c:	add	x29, sp, #0x0
ffffffc080ffd040:	stp	x19, x20, [sp, torvalds#16]
ffffffc080ffd044:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c:	stp	x21, x22, [sp, torvalds#32]
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050:	add	x19, x19, #0xbb0

	dev->last_state_idx = state;

This saves us:
	adrp    x2, ffffffc0848c0000 <gicv5_global_data+0x28>
	add     x0, x2, #0xc20
	ldr     w3, [x20, #4]

Signed-off-by: Christian Loehle <[email protected]>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>
(cherry picked from commit 0796ddf
 git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next)

Bug: 450671466
Change-Id: Icb6faa509da6dc282270d763f763bc943d461119
Signed-off-by: Reka Norman <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 17, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 17, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 17, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Dec 18, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.

This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0:	hint	#0x19
ffffffc080ffcff4:	stp	x29, x30, [sp, #-48]!
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8:	adrp	x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc:	add	x29, sp, #0x0
ffffffc080ffd000:	stp	x19, x20, [sp, torvalds#16]
ffffffc080ffd004:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008:	add	x0, x2, #0xc20
{
ffffffc080ffd00c:	stp	x21, x22, [sp, torvalds#32]
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014:	add	x19, x19, #0xbb0
ffffffc080ffd018:	ldr	w3, [x20, #4]

	dev->last_state_idx = state;

to

static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034:	hint	#0x19
ffffffc080ffd038:	stp	x29, x30, [sp, #-48]!
ffffffc080ffd03c:	add	x29, sp, #0x0
ffffffc080ffd040:	stp	x19, x20, [sp, torvalds#16]
ffffffc080ffd044:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c:	stp	x21, x22, [sp, torvalds#32]
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050:	add	x19, x19, #0xbb0

	dev->last_state_idx = state;

This saves us:
	adrp    x2, ffffffc0848c0000 <gicv5_global_data+0x28>
	add     x0, x2, #0xc20
	ldr     w3, [x20, #4]

Signed-off-by: Christian Loehle <[email protected]>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 19, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 20, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 21, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 21, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 21, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 23, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 23, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Dec 23, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.

Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.

With Generic Entry patch[1] and turn on audit, the performance benchmarks
from perf bench basic syscall on kunpeng920 gives roughly a 1% performance
uplift.

| Metric     | W/O this patch | With this patch | Change    |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec]    | 2.211 [sec]     |  ↓1.36%   |
| usecs/op   | 0.224157       | 0.221146        |  ↓1.36%   |
| ops/sec    | 4,461,157      | 4,501,409       |  ↑0.9%    |

Disassembly shows that using direct assignment causes
syscall_set_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy().  Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].

Before:
<syscall_get_arguments.constprop.0>:
       aa0103e2        mov     x2, x1
       91002003        add     x3, x0, #0x8
       f9408804        ldr     x4, [x0, torvalds#272]
       f8008444        str     x4, [x2], torvalds#8
       a9409404        ldp     x4, x5, [x0, torvalds#8]
       a9009424        stp     x4, x5, [x1, torvalds#8]
       a9418400        ldp     x0, x1, [x0, torvalds#24]
       a9010440        stp     x0, x1, [x2, torvalds#16]
       f9401060        ldr     x0, [x3, torvalds#32]
       f9001040        str     x0, [x2, torvalds#32]
       d65f03c0        ret
       d503201f        nop

After:
       a9408e82        ldp     x2, x3, [x20, torvalds#8]
       2a1603e0        mov     w0, w22
       f9400e84        ldr     x4, [x20, torvalds#24]
       f9408a81        ldr     x1, [x20, torvalds#272]
       9401c4ba        bl      ffff800080215ca8 <__audit_syscall_entry>

This also aligns the implementation with x86 and RISC-V.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Marc Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Dec 26, 2025
The cpuidle governor callbacks for update, select and reflect
are always running on the actual idle entering/exiting CPU, so
use the more optimized this_cpu_ptr() to access the internal teo
data.

This brings down the latency-critical teo_reflect() from
static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffcff0:	hint	#0x19
ffffffc080ffcff4:	stp	x29, x30, [sp, #-48]!
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffcff8:	adrp	x2, ffffffc0848c0000 <gicv5_global_data+0x28>
{
ffffffc080ffcffc:	add	x29, sp, #0x0
ffffffc080ffd000:	stp	x19, x20, [sp, torvalds#16]
ffffffc080ffd004:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd008:	add	x0, x2, #0xc20
{
ffffffc080ffd00c:	stp	x21, x22, [sp, torvalds#32]
	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
ffffffc080ffd010:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
ffffffc080ffd014:	add	x19, x19, #0xbb0
ffffffc080ffd018:	ldr	w3, [x20, #4]

	dev->last_state_idx = state;

to

static void teo_reflect(struct cpuidle_device *dev, int state)
{
ffffffc080ffd034:	hint	#0x19
ffffffc080ffd038:	stp	x29, x30, [sp, #-48]!
ffffffc080ffd03c:	add	x29, sp, #0x0
ffffffc080ffd040:	stp	x19, x20, [sp, torvalds#16]
ffffffc080ffd044:	orr	x20, xzr, x0
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd048:	adrp	x19, ffffffc083eb5000 <cpu_devices+0x78>
{
ffffffc080ffd04c:	stp	x21, x22, [sp, torvalds#32]
	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
ffffffc080ffd050:	add	x19, x19, #0xbb0

	dev->last_state_idx = state;

This saves us:
	adrp    x2, ffffffc0848c0000 <gicv5_global_data+0x28>
	add     x0, x2, #0xc20
	ldr     w3, [x20, #4]

Signed-off-by: Christian Loehle <[email protected]>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rafael J. Wysocki <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants