aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2 daysbpf: optimize bpf_map_update_elem() for map-in-map typesHEADbpf-next-6.19masterRitesh Oedayrajsingh Varma1-1/+1
Updating a BPF_MAP_TYPE_HASH_OF_MAPS or BPF_MAP_TYPE_ARRAY_OF_MAPS via bpf_map_update_elem() is very expensive. In one of our workloads, we're inserting ~1400 maps of type BPF_MAP_TYPE_ARRAY into a BPF_MAP_TYPE_ARRAY_OF_MAPS. This takes ~21 seconds on a single thread, with an average of ~15ms per call: Function Name: map_update_elem Number of calls: 1369 Total time: 21s 182ms 966µs Maximum: 47ms 937µs Average: 15ms 473µs Minimum: 7µs Profiling shows that nearly all of this time is going to synchronize_rcu(), via maybe_wait_bpf_programs() in map_update_elem(). The call to synchronize_rcu() is done to ensure that after bpf_map_update_elem() returns, no BPF programs are still looking at the old value of the map, per commit 1ae80cf31938 ("bpf: wait for running BPF programs when updating map-in-map"). As discussed on the bpf mailing list, replace synchronize_rcu() with synchronize_rcu_expedited(). This is 175x faster: it now takes an average of 88 microseconds per call, for a total of 127 milliseconds in the same benchmark: Function Name: map_update_elem Number of calls: 1439 Total time: 127ms 626µs Maximum: 445µs Average: 88µs Minimum: 10µs Link: https://lore.kernel.org/bpf/CAH6OuBR=w2kybK6u7aH_35B=Bo1PCukeMZefR=7V4Z2tJNK--Q@mail.gmail.com/ Signed-off-by: Ritesh Oedayrajsingh Varma <ritesh@superluminal.eu> Link: https://lore.kernel.org/r/20251128000422.20462-1-ritesh@superluminal.eu Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 daysbpf: make kprobe_multi_link_prog_run always_inlineMenglong Dong1-1/+1
Make kprobe_multi_link_prog_run() always inline to obtain better performance. Before this patch, the bench performance is: ./bench trig-kprobe-multi Setting up benchmark 'trig-kprobe-multi'... Benchmark 'trig-kprobe-multi' started. Iter 0 ( 95.485us): hits 62.462M/s ( 62.462M/prod), [...] Iter 1 (-80.054us): hits 62.486M/s ( 62.486M/prod), [...] Iter 2 ( 13.572us): hits 62.287M/s ( 62.287M/prod), [...] Iter 3 ( 76.961us): hits 62.293M/s ( 62.293M/prod), [...] Iter 4 (-77.698us): hits 62.394M/s ( 62.394M/prod), [...] Iter 5 (-13.399us): hits 62.319M/s ( 62.319M/prod), [...] Iter 6 ( 77.573us): hits 62.250M/s ( 62.250M/prod), [...] Summary: hits 62.338 ± 0.083M/s ( 62.338M/prod) And after this patch, the performance is: Iter 0 (454.148us): hits 66.900M/s ( 66.900M/prod), [...] Iter 1 (-435.540us): hits 68.925M/s ( 68.925M/prod), [...] Iter 2 ( 8.223us): hits 68.795M/s ( 68.795M/prod), [...] Iter 3 (-12.347us): hits 68.880M/s ( 68.880M/prod), [...] Iter 4 ( 2.291us): hits 68.767M/s ( 68.767M/prod), [...] Iter 5 ( -1.446us): hits 68.756M/s ( 68.756M/prod), [...] Iter 6 ( 13.882us): hits 68.657M/s ( 68.657M/prod), [...] Summary: hits 68.792 ± 0.087M/s ( 68.792M/prod) As we can see, the performance of kprobe-multi increase from 62M/s to 68M/s. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20251126085246.309942-1-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 daysMerge branch 'selftests-bpf-convert-test_tc_edt-sh-into-test_progs'Alexei Starovoitov4-107/+151
Alexis Lothoré says: ==================== selftests/bpf: convert test_tc_edt.sh into test_progs Hello, this is a (late) v2 to my first attempt to convert the test_tc_edt script to test_progs. This new version is way simpler, thanks to Martin's suggestion about properly using the existing network_helpers rather than reinventing the wheel. It also fixes a small bug in the measured effective rate. The converted test roughly follows the original script logic, with two veths in two namespaces, a TCP connection between a client and a server, and the client pushing a specific amount of data. Time is recorded before and after the transmission to compute the effective rate. There are two knobs driving the robustness of the test in CI: - the amount of pushed data (the higher, the more precise is the effective rate) - the tolerated error margin The original test was configured with a 20s duration and a 1% error margin. The new test is configured with 1MB of data being pushed and a 2% error margin, to: - make the duration tolerable in CI - while keeping enough margin for rate measure fluctuations depending on the CI machines load This has been run multiple times locally to ensure that those values are sane, and once in CI before sending the series, but I suggest to let it live a few days in CI to see how it really behaves. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Changes in v2: - drop custom client/server management - update bpf program now that server pushes data - fix effective rate computation - Link to v1: https://lore.kernel.org/r/20251031-tc_edt-v1-0-5d34a5823144@bootlin.com --- Alexis Lothoré (eBPF Foundation) (4): selftests/bpf: rename test_tc_edt.bpf.c section to expose program type selftests/bpf: integrate test_tc_edt into test_progs selftests/bpf: remove test_tc_edt.sh selftests/bpf: do not hardcode target rate in test_tc_edt BPF program tools/testing/selftests/bpf/Makefile | 2 - .../testing/selftests/bpf/prog_tests/test_tc_edt.c | 145 +++++++++++++++++++++ tools/testing/selftests/bpf/progs/test_tc_edt.c | 11 +- tools/testing/selftests/bpf/test_tc_edt.sh | 100 -------------- 4 files changed, 151 insertions(+), 107 deletions(-) --- base-commit: 233a075a1b27070af76d64541cf001340ecff917 change-id: 20251030-tc_edt-3ea8e8d3d14e Best regards, ==================== Link: https://patch.msgid.link/20251128-tc_edt-v2-0-26db48373e73@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 daysselftests/bpf: do not hardcode target rate in test_tc_edt BPF programAlexis Lothoré (eBPF Foundation)2-3/+4
test_tc_edt currently defines the target rate in both the userspace and BPF parts. This value could be defined once in the userspace part if we make it able to configure the BPF program before starting the test. Add a target_rate variable in the BPF part, and make the userspace part set it to the desired rate before attaching the shaping program. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20251128-tc_edt-v2-4-26db48373e73@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 daysselftests/bpf: remove test_tc_edt.shAlexis Lothoré (eBPF Foundation)2-102/+0
Now that test_tc_edt has been integrated in test_progs, remove the legacy shell script. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20251128-tc_edt-v2-3-26db48373e73@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 daysselftests/bpf: integrate test_tc_edt into test_progsAlexis Lothoré (eBPF Foundation)2-1/+145
test_tc_edt.sh uses a pair of veth and a BPF program attached to the TX veth to shape the traffic to 5MBps. It then checks that the amount of received bytes (at interface level), compared to the TX duration, indeed matches 5Mbps. Convert this test script to the test_progs framework: - keep the double veth setup, isolated in two veths - run a small tcp server, and connect client to server - push a pre-configured amount of bytes, and measure how much time has been needed to push those - ensure that this rate is in a 2% error margin around the target rate This two percent value, while being tight, is hopefully large enough to not make the test too flaky in CI, while also turning it into a small example of BPF-based shaping. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20251128-tc_edt-v2-2-26db48373e73@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 daysselftests/bpf: rename test_tc_edt.bpf.c section to expose program typeAlexis Lothoré (eBPF Foundation)2-2/+3
The test_tc_edt BPF program uses a custom section name, which works fine when manually loading it with tc, but prevents it from being loaded with libbpf. Update the program section name to "tc" to be able to manipulate it with a libbpf-based C test. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20251128-tc_edt-v2-1-26db48373e73@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 daysMerge branch 'limited-queueing-in-nmi-for-rqspinlock'Alexei Starovoitov3-76/+108
Kumar Kartikeya Dwivedi says: ==================== Limited queueing in NMI for rqspinlock Ritesh reported that he was frequently seeing timeouts in cases which should have been covered by the AA heuristics. This led to the discovery of multiple gaps in the current code that could lead to timeouts when AA heuristics could work to prevent them. More details and investigation is available in the original threads. [0][1] This set restores the ability for NMI waiters to queue in the slow path, and reduces the cases where they would attempt to trylock. However, such queueing must not happen when interrupting waiters which the NMI itself depends upon for forward progress; in those cases the trylock fallback remains, but with a single attempt to avoid aimless attempts to acquire the lock. It also closes a possible window in the lock fast path and the unlock path where NMIs landing between cmpxchg and entry creation, or entry deletion and unlock would miss the detection of an AA scenario and end up timing out. This virtually eliminates all the cases where existing heuristics can prevent timeouts and quickly recover from a deadlock. More details are available in the commit logs for each patch. [0]: https://lore.kernel.org/bpf/CAH6OuBTjG+N=+GGwcpOUbeDN563oz4iVcU3rbse68egp9wj9_A@mail.gmail.com [1]: https://lore.kernel.org/bpf/20251125203253.3287019-1-memxor@gmail.com ==================== Link: https://patch.msgid.link/20251128232802.1031906-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 daysselftests/bpf: Add success stats to rqspinlock stress testKumar Kartikeya Dwivedi1-12/+43
Add stats to observe the success and failure rate of lock acquisition attempts in various contexts. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251128232802.1031906-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 daysrqspinlock: Precede non-head waiter queueing with AA checkKumar Kartikeya Dwivedi1-0/+13
While previous commits sufficiently address the deadlocks, there are still scenarios where queueing of waiters in NMIs can exacerbate the possibility of timeouts. Consider the case below: CPU 0 <NMI> res_spin_lock(A) -> becomes non-head waiter </NMI> lock owner in CS or pending waiter spinning CPU 1 res_spin_lock(A) -> head waiter spinning on owner/pending bits In such a scenario, the non-head waiter in NMI on CPU 0 will not poll for deadlocks or timeout since it will simply queue behind previous waiter (head on CPU 1), and also not enter the trylock fallback since no rqspinlock queue waiter is active on CPU 0. In such a scenario, the transaction initiated by the head waiter on CPU 1 will timeout, signalling the NMI and ending the cyclic dependency, but it will cost 250 ms of time. Instead, the NMI on CPU 0 could simply check for the presence of an AA deadlock and only proceed with queueing on success. Add such a check right before any form of queueing is initiated. The reason the AA deadlock check is not used in conjunction with in_nmi() is that a similar case could occur due to a reentrant path in the owner's critical section, and unconditionally checking for AA before entering the queueing path avoids expensive timeouts. Non-NMI reentrancy only happens at controlled points in the slow path (with specific tracepoints which do not impede the forward progress of a waiter loop), or in the owner CS, while NMIs can land anywhere. While this check is only needed for non-head waiter queueing, checking whether we are head or not is racy without xchg_tail, and after that point, we are already queued, hence for simplicity we must invoke the check unconditionally. Note that a more contrived case could still be constructed by using two locks, and interrupting the progress of the respective owners by non-head waiters of the other lock, in an ABBA fashion, which would still not be covered by the current set of checks and conditions. It would still lead to a timeout though, and not a deadlock. An ABBA check cannot happen optimistically before the queueing, since it can be racy, and needs to be happen continuously during the waiting period, which would then require an unlinking step for queued NMI/reentrant waiters. This is beyond the scope of this patch. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251128232802.1031906-6-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 daysrqspinlock: Disable spinning for trylock fallbackKumar Kartikeya Dwivedi1-10/+8
The original trylock fallback was inherited from qspinlock, and then reused for the reentrant NMIs while the slow path is active. However, under contention, it is very unlikely for the trylock to succeed in taking the lock. In addition, a trylock also has no fairness guarantees, and thus is prone to starvation issues under extreme scenarios. The original qspinlock had no choice in terms of returning an error the caller; if the node count was breached, it had to fall back to trylock to attempt to take the lock. In case of rqspinlock, we do have the option of returning to the user. Thus, simply attempt the trylock once, and instead of spinning, return an error in case the lock cannot be taken. This ends up significantly reducing the time spent in the trylock fallback, since we no longer wait for the timeout duration trying to aimlessly acquire the lock when there's a high-probability that under contention, it won't be available to us anyway. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251128232802.1031906-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 daysrqspinlock: Use trylock fallback when per-CPU rqnode is busyKumar Kartikeya Dwivedi1-1/+1
In addition to deferring to the trylock fallback in NMIs, only do so when an rqspinlock waiter is queued on the current CPU. This is detected by noticing a non-zero node index. This allows NMI waiters to join the waiter queue if it isn't interrupting an existing rqspinlock waiter, and increase the chances of fairly obtaining the lock, performing deadlock detection as the head, and not being starved while attempting the trylock. The trylock path in particular is unlikely to succeed under contention, as it relies on the lock word becoming 0, which indicates no contention. This means that the most likely result for NMIs attempting a trylock is a timeout under contention if they don't hit an AA or ABBA case. The core problem being addressed through the fixed commit was removing the dependency edge between an NMI queue waiter and the queue waiter it is interrupting. Whenever a circular dependency forms, and with no way to break it (as non-head waiters don't poll for deadlocks or timeouts), we would enter into a deadlock. A trylock either breaks such an edge by probing for deadlocks, and finally terminating the waiting loop using a timeout. By excluding queueing on CPUs where the node index is non-zero for NMIs, this sort of dependency is broken. The CPU enters the trylock path for those cases, and falls back to deadlock checks and timeouts. However, in other case where it doesn't interrupt the CPU in the slow path while its queued on the lock, it can join the queue as a normal waiter, and avoid trylock associated starvation and subsequent timeouts. There are a few remaining cases here that matter: the NMI can still preempt the owner in its critical section, and if it queues as a non-head waiter, it can end up impeding the progress of the owner. While this won't deadlock, since the head waiter will eventually signal the NMI waiter to either stop (due to a timeout), it can still lead to long timeouts. These gaps will be addressed in subsequent commits. Note that while the node count detection approach is less conservative than simply deferring NMIs to trylock, it is going to return errors where attempts to lock B in NMI happen while waiters for lock A are in a lower context on the same CPU. However, this only occurs when the lower context is queued in the slow path, and the NMI attempt can proceed without failure in all other cases. To continue to prevent AA deadlocks (or ABBA in a similar NMI interrupting lower context pattern), we'd need a more fleshed out algorithm to unlink NMI waiters after they queue and detect such cases. However, all that complexity isn't appealing yet to reduce the failure rate in the small window inside the slow path. It is important to note that reentrancy in the slow path can also happen through trace_contention_{begin,end}, but in those cases, unlike an NMI, the forward progress of the head waiter (or the predecessor in general) is not being blocked. Fixes: 0d80e7f951be ("rqspinlock: Choose trylock fallback for NMI waiters") Reported-by: Ritesh Oedayrajsingh Varma <ritesh@superluminal.eu> Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251128232802.1031906-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 daysrqspinlock: Perform AA checks immediatelyKumar Kartikeya Dwivedi1-18/+7
Currently, while we enter the check_timeout call immediately due to the way the ts.spin is initialized, we still invoke the AA and ABBA checks in the second invocation, and only initialize the timestamp in the first one. Since each iteration is at least done with a 1ms delay, this can add delays in detection of AA deadlocks, up to a ms. Rework check_timeout() to avoid this. First, call check_deadlock_AA() while initializing the timestamps for the wait period. This also means that we only do it once per waiting period, instead of every invocation. Finally, drop check_deadlock() and call check_deadlock_ABBA() directly. To save on unnecessary ktime_get_mono_fast_ns() in case of AA deadlock, sample the time only if it returns 0. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251128232802.1031906-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 daysrqspinlock: Enclose lock/unlock within lock entry acquisitionsKumar Kartikeya Dwivedi2-37/+38
Ritesh reported that timeouts occurred frequently for rqspinlock despite reentrancy on the same lock on the same CPU in [0]. This patch closes one of the races leading to this behavior, and reduces the frequency of timeouts. We currently have a tiny window between the fast-path cmpxchg and the grabbing of the lock entry where an NMI could land, attempt the same lock that was just acquired, and end up timing out. This is not ideal. Instead, move the lock entry acquisition from the fast path to before the cmpxchg, and remove the grabbing of the lock entry in the slow path, assuming it was already taken by the fast path. The TAS fallback is invoked directly without being preceded by the typical fast path, therefore we must continue to grab the deadlock detection entry in that case. Case on lock leading to missed AA: cmpxchg lock A <NMI> ... rqspinlock acquisition of A ... timeout </NMI> grab_held_lock_entry(A) There is a similar case when unlocking the lock. If the NMI lands between the WRITE_ONCE and smp_store_release, it is possible that we end up in a situation where the NMI fails to diagnose the AA condition, leading to a timeout. Case on unlock leading to missed AA: WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL) <NMI> ... rqspinlock acquisition of A ... timeout </NMI> smp_store_release(A->locked, 0) The patch changes the order on unlock to smp_store_release() succeeded by WRITE_ONCE() of NULL. This avoids the missed AA detection described above, but may lead to a false positive if the NMI lands between these two statements, which is acceptable (and preferred over a timeout). The original intention of the reverse order on unlock was to prevent the following possible misdiagnosis of an ABBA scenario: grab entry A lock A grab entry B lock B unlock B smp_store_release(B->locked, 0) grab entry B lock B grab entry A lock A ! <detect ABBA> WRITE_ONCE(rqh->locks[rqh->cnt - 1], NULL) If the store release were is after the WRITE_ONCE, the other CPU would not observe B in the table of the CPU unlocking the lock B. However, since the threads are obviously participating in an ABBA deadlock, it is no longer appealing to use the order above since it may lead to a 250 ms timeout due to missed AA detection. [0]: https://lore.kernel.org/bpf/CAH6OuBTjG+N=+GGwcpOUbeDN563oz4iVcU3rbse68egp9wj9_A@mail.gmail.com Fixes: 0d80e7f951be ("rqspinlock: Choose trylock fallback for NMI waiters") Reported-by: Ritesh Oedayrajsingh Varma <ritesh@superluminal.eu> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251128232802.1031906-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 daysbpf: Remove runqslower toolHoyeon Lee9-413/+4
runqslower was added in commit 9c01546d26d2 "tools/bpf: Add runqslower tool to tools/bpf" as a BCC port to showcase early BPF CO-RE + libbpf workflows. runqslower continues to live in BCC (libbpf-tools), so there is no need to keep building and maintaining it. Drop tools/bpf/runqslower and remove all build hooks in tools/bpf and selftests accordingly. Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com> Link: https://lore.kernel.org/r/20251126093821.373291-1-hoyeon.lee@suse.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 daysselftests/bpf: Remove usage of lsm/file_alloc_security in selftestAmery Hung3-7/+7
file_alloc_security hook is disabled. Use other LSM hooks in selftests instead. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20251126202927.2584874-2-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 daysbpf: Disable file_alloc_security hookAmery Hung1-0/+1
A use-after-free bug may be triggered by calling bpf_inode_storage_get() in a BPF LSM program hooked to file_alloc_security. Disable the hook to prevent this from happening. The cause of the bug is shown in the trace below. In alloc_file(), a file struct is first allocated through kmem_cache_alloc(). Then, file_alloc_security hook is invoked. Since the zero initialization or assignment of f->f_inode happen after this LSM hook, a BPF program may get a dangeld inode pointer by walking the file struct. alloc_file() -> alloc_empty_file() -> f = kmem_cache_alloc() -> init_file() -> security_file_alloc() // f->f_inode not init-ed yet! -> f->f_inode = NULL; -> file_init_path() -> f->f_inode = path->dentry->d_inode Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> Reported-by: Yinhao Hu <dddddd@hust.edu.cn> Reported-by: Dongliang Mu <dzm91@hust.edu.cn> Closes: https://lore.kernel.org/bpf/1d2d1968.47cd3.19ab9528e94.Coremail.kaiyanm@hust.edu.cn/ Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20251126202927.2584874-1-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 daysMerge branch 'a-pair-of-follow-ups-for-indirect-jumps'Alexei Starovoitov3-11/+13
Anton Protopopov says: ==================== A pair of follow ups for indirect jumps Two fixes suggested by Alexei in [1]. Resending as a series, as the second patch depends on the first. [1] https://lore.kernel.org/bpf/CAADnVQK3piReoo1ja=9hgz7aJ60Y_Jjur_JMOaYV8-Mn_VyE4A@mail.gmail.com/#R ==================== Link: https://patch.msgid.link/20251128063224.1305482-1-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 daysbpf: check for insn arrays in check_ptr_alignmentAnton Protopopov1-3/+3
Do not abuse the strict_alignment_once flag, and check if the map is an instruction array inside the check_ptr_alignment() function. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Link: https://lore.kernel.org/r/20251128063224.1305482-3-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 daysbpf: force BPF_F_RDONLY_PROG on insn array creationAnton Protopopov3-8/+10
The original implementation added a hack to check_mem_access() to prevent programs from writing into insn arrays. To get rid of this hack, enforce BPF_F_RDONLY_PROG on map creation. Also fix the corresponding selftest, as the error message changes with this patch. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Link: https://lore.kernel.org/r/20251128063224.1305482-2-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 daysbpf: Fix exclusive map memory leakEdward Adam Davis1-1/+2
When excl_prog_hash is 0 and excl_prog_hash_size is non-zero, the map also needs to be freed. Otherwise, the map memory will not be reclaimed, just like the memory leak problem reported by syzbot [1]. syzbot reported: BUG: memory leak backtrace (crc 7b9fb9b4): map_create+0x322/0x11e0 kernel/bpf/syscall.c:1512 __sys_bpf+0x3556/0x3610 kernel/bpf/syscall.c:6131 Fixes: baefdbdf6812 ("bpf: Implement exclusive map creation") Reported-by: syzbot+cf08c551fecea9fd1320@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=cf08c551fecea9fd1320 Tested-by: syzbot+cf08c551fecea9fd1320@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/tencent_3F226F882CE56DCC94ACE90EED1ECCFC780A@qq.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 daysMerge branch 'general-enhancements-to-rqspinlock-stress-test'Alexei Starovoitov1-3/+117
Kumar Kartikeya Dwivedi says: ==================== General enhancements to rqspinlock stress test Three enchancements, details in commit messages. First, the CPU requirements are 2 for AA, 3 for ABBA, and 4 for ABBCCA, hence relax the check during module initialization. Second, add a per-CPU histogram to capture lock acquisition times to record which buckets these acquisitions fall into for the normal task context and NMI context. Anything below 10ms is not printed in detail, but above that displays the full breakdown for each context. Finally, make the delay of the NMI and task contexts configurable, set to 10 and 20 ms respectively by default. ==================== Link: https://patch.msgid.link/20251125020749.2421610-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 daysselftests/bpf: Make CS length configurable for rqspinlock stress testKumar Kartikeya Dwivedi1-2/+12
Allow users to configure the critical section delay for both task/normal and NMI contexts, and set to 20ms and 10ms as before by default. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251125020749.2421610-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 daysselftests/bpf: Add lock wait time stats to rqspinlock stress testKumar Kartikeya Dwivedi1-0/+104
Add statistics per-CPU broken down by context and various timing windows for the time taken to acquire an rqspinlock. Cases where all acquisitions fit into the 10ms window are skipped from printing, otherwise the full breakdown is displayed when printing the summary. This allows capturing precisely the number of times outlier attempts happened for a given lock in a given context. A critical detail is that time is captured regardless of success or failure, which is important to capture events for failed but long waiting timeout attempts. Output: [ 64.279459] rqspinlock acquisition latency histogram (ms): [ 64.279472] cpu1: total 528426 (normal 526559, nmi 1867) [ 64.279477] 0-1ms: total 524697 (normal 524697, nmi 0) [ 64.279480] 2-2ms: total 3652 (normal 1811, nmi 1841) [ 64.279482] 3-3ms: total 66 (normal 47, nmi 19) [ 64.279485] 4-4ms: total 2 (normal 1, nmi 1) [ 64.279487] 5-5ms: total 1 (normal 1, nmi 0) [ 64.279489] 6-6ms: total 1 (normal 0, nmi 1) [ 64.279490] 101-150ms: total 1 (normal 0, nmi 1) [ 64.279492] >= 251ms: total 6 (normal 2, nmi 4) ... Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251125020749.2421610-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 daysselftests/bpf: Relax CPU requirements for rqspinlock stress testKumar Kartikeya Dwivedi1-1/+1
Only require 2 CPUs for AA, 3 for ABBA, 4 for ABBCCA, which is calculated nicely by adding to the mode enum. Enables running single CPU AA tests. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251125020749.2421610-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 daysbpf: Introduce internal bpf_map_check_op_flags helper functionLeon Hwang2-23/+22
It is to unify map flags checking for lookup_elem, update_elem, lookup_batch and update_batch APIs. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20251125145857.98134-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 dayslibbpf: Fix some incorrect @param descriptions in the comment of libbpf.hJianyun Gao1-11/+16
Fix up some of missing or incorrect @param descriptions for libbpf public APIs in libbpf.h. Signed-off-by: Jianyun Gao <jianyungao89@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251118033025.11804-1-jianyungao89@gmail.com
6 daysselftests/bpf: Call bpf_get_numa_node_id() in trigger_count()Menglong Dong2-4/+6
The bench test "trig-kernel-count" can be used as a baseline comparison for fentry and other benchmarks, and the calling to bpf_get_numa_node_id() should be considered as composition of the baseline. So, let's call it in trigger_count(). Meanwhile, rename trigger_count() to trigger_kernel_count() to make it easier understand. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251116014242.151110-1-dongml2@chinatelecom.cn
6 daysdocs: bpf: map_array: Specify BPF_MAP_TYPE_PERCPU_ARRAY value size limitAlex Tran1-2/+3
Specify value size limit for BPF_MAP_TYPE_PERCPU_ARRAY which is PCPU_MIN_UNIT_SIZE (32 kb). In percpu allocator (mm: percpu), any request with a size greater than PCPU_MIN_UNIT_SIZE is rejected. Signed-off-by: Alex Tran <alex.t.tran@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251115063531.2302903-1-alex.t.tran@gmail.com
7 daysselftests/bpf: Fix htab_update/reenter_update selftest failureSaket Kumar Bhaskar2-15/+41
Since commit 31158ad02ddb ("rqspinlock: Add deadlock detection and recovery") the updated path on re-entrancy now reports deadlock via -EDEADLK instead of the previous -EBUSY. Also, the way reentrancy was exercised (via fentry/lookup_elem_raw) has been fragile because lookup_elem_raw may be inlined (find_kernel_btf_id() will return -ESRCH). To fix this fentry is attached to bpf_obj_free_fields() instead of lookup_elem_raw() and: - The htab map is made to use a BTF-described struct val with a struct bpf_timer so that check_and_free_fields() reliably calls bpf_obj_free_fields() on element replacement. - The selftest is updated to do two updates to the same key (insert + replace) in prog_test. - The selftest is updated to align with expected errno with the kernel’s current behavior. Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Link: https://lore.kernel.org/r/20251117060752.129648-1-skb99@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 daysMerge branch 'ease-bpf-signing-build-requirements'Alexei Starovoitov3-2/+11
Alan Maguire says: ==================== Ease BPF signing build requirements This series makes it easier to build bpftool and selftests with signing support, removing reliance on >= openssl v3 (supporting openssl v1) to build bpftool and not requiring latest xxd to build verification cert header in selftests. Changes since v1 [1]: - Updated patch 2 to add symlink test_progs_verification_cert to .gitignore, EXTRA_CLEANFILES (AI review bot) - Added acks to patch 1 (Song, Quentin) [1] https://lore.kernel.org/bpf/20251114222249.30122-1-alan.maguire@oracle.com/ ==================== Link: https://patch.msgid.link/20251120084754.640405-1-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 daysselftests/bpf: Allow selftests to build with older xxdAlan Maguire2-2/+5
Currently selftests require xxd with the "-n <name>" option which allows the user to specify a name not derived from the input object path. Instead of relying on this newer feature, older xxd can be used if we link our desired name ("test_progs_verification_cert") to the input object. Many distros ship xxd in vim-common package and do not have the latest xxd with -n support. Fixes: b720903e2b14d ("selftests/bpf: Enable signature verification for some lskel tests") Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/r/20251120084754.640405-3-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 daysbpftool: Allow bpftool to build with openssl < 3Alan Maguire1-0/+6
ERR_get_error_all()[1] is a openssl v3 API, so to make code compatible with openssl v1 utilize ERR_get_err_line_data instead. Since openssl is already a build requirement for the kernel (minimum requirement openssl 1.0.0), this will allow bpftool to compile where opensslv3 is not available. Signing-related BPF selftests pass with openssl v1. [1] https://docs.openssl.org/3.4/man3/ERR_get_error/ Fixes: 40863f4d6ef2 ("bpftool: Add support for signing BPF programs") Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Song Liu <song@kernel.org> Acked-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/r/20251120084754.640405-2-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 daysMerge branch 'bpf-trampoline-support-jmp-mode'Alexei Starovoitov15-65/+225
Menglong Dong says: ==================== bpf trampoline support "jmp" mode For now, the bpf trampoline is called by the "call" instruction. However, it break the RSB and introduce extra overhead in x86_64 arch. For example, we hook the function "foo" with fexit, the call and return logic will be like this: call foo -> call trampoline -> call foo-body -> return foo-body -> return foo As we can see above, there are 3 call, but 2 return, which break the RSB balance. We can pseudo a "return" here, but it's not the best choice, as it will still cause once RSB miss: call foo -> call trampoline -> call foo-body -> return foo-body -> return dummy -> return foo The "return dummy" doesn't pair the "call trampoline", which can also cause the RSB miss. Therefore, we introduce the "jmp" mode for bpf trampoline, as advised by Alexei in [1]. And the logic will become this: call foo -> jmp trampoline -> call foo-body -> return foo-body -> return foo As we can see above, the RSB is totally balanced after this series. In this series, we introduce the FTRACE_OPS_FL_JMP for ftrace to make it use the "jmp" instruction instead of "call". And we also do some adjustment to bpf_arch_text_poke() to allow us specify the old and new poke_type. For the BPF_TRAMP_F_SHARE_IPMODIFY case, we will fallback to the "call" mode, as it need to get the function address from the stack, which is not supported in "jmp" mode. Before this series, we have the following performance with the bpf benchmark: $ cd tools/testing/selftests/bpf $ ./benchs/run_bench_trigger.sh usermode-count : 890.171 ± 1.522M/s kernel-count : 409.184 ± 0.330M/s syscall-count : 26.792 ± 0.010M/s fentry : 171.242 ± 0.322M/s fexit : 80.544 ± 0.045M/s fmodret : 78.301 ± 0.065M/s rawtp : 192.906 ± 0.900M/s tp : 81.883 ± 0.209M/s kprobe : 52.029 ± 0.113M/s kprobe-multi : 62.237 ± 0.060M/s kprobe-multi-all: 4.761 ± 0.014M/s kretprobe : 23.779 ± 0.046M/s kretprobe-multi: 29.134 ± 0.012M/s kretprobe-multi-all: 3.822 ± 0.003M/ And after this series, we have the following performance: usermode-count : 890.443 ± 0.307M/s kernel-count : 416.139 ± 0.055M/s syscall-count : 31.037 ± 0.813M/s fentry : 169.549 ± 0.519M/s fexit : 136.540 ± 0.518M/s fmodret : 159.248 ± 0.188M/s rawtp : 194.475 ± 0.144M/s tp : 84.505 ± 0.041M/s kprobe : 59.951 ± 0.071M/s kprobe-multi : 63.153 ± 0.177M/s kprobe-multi-all: 4.699 ± 0.012M/s kretprobe : 23.740 ± 0.015M/s kretprobe-multi: 29.301 ± 0.022M/s kretprobe-multi-all: 3.869 ± 0.005M/s As we can see above, the performance of fexit increase from 80.544M/s to 136.540M/s, and the "fmodret" increase from 78.301M/s to 159.248M/s. Link: https://lore.kernel.org/bpf/20251117034906.32036-1-dongml2@chinatelecom.cn/ Changes since v2: * reject if the addr is already "jmp" in register_ftrace_direct() and __modify_ftrace_direct() in the 1st patch. * fix compile error in powerpc in the 5th patch. * changes in the 6th patch: - fix the compile error by wrapping the write to tr->fops->flags with CONFIG_DYNAMIC_FTRACE_WITH_JMP - reset BPF_TRAMP_F_SKIP_FRAME when the second try of modify_fentry in bpf_trampoline_update() Link: https://lore.kernel.org/bpf/20251114092450.172024-1-dongml2@chinatelecom.cn/ Changes since v1: * change the bool parameter that we add to save_args() to "u32 flags" * rename bpf_trampoline_need_jmp() to bpf_trampoline_use_jmp() * add new function parameter to bpf_arch_text_poke instead of introduce bpf_arch_text_poke_type() * rename bpf_text_poke to bpf_trampoline_update_fentry * remove the BPF_TRAMP_F_JMPED and check the current mode with the origin flags instead. Link: https://lore.kernel.org/bpf/CAADnVQLX54sVi1oaHrkSiLqjJaJdm3TQjoVrgU-LZimK6iDcSA@mail.gmail.com/[1] ==================== Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/20251118123639.688444-1-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 daysbpf: implement "jmp" mode for trampolineMenglong Dong1-17/+58
Implement the "jmp" mode for the bpf trampoline. For the ftrace_managed case, we need only to set the FTRACE_OPS_FL_JMP on the tr->fops if "jmp" is needed. For the bpf poke case, we will check the origin poke type with the "origin_flags", and current poke type with "tr->flags". The function bpf_trampoline_update_fentry() is introduced to do the job. The "jmp" mode will only be enabled with CONFIG_DYNAMIC_FTRACE_WITH_JMP enabled and BPF_TRAMP_F_SHARE_IPMODIFY is not set. With BPF_TRAMP_F_SHARE_IPMODIFY, we need to get the origin call ip from the stack, so we can't use the "jmp" mode. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20251118123639.688444-7-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 daysbpf: specify the old and new poke_type for bpf_arch_text_pokeMenglong Dong9-46/+71
In the origin logic, the bpf_arch_text_poke() assume that the old and new instructions have the same opcode. However, they can have different opcode if we want to replace a "call" insn with a "jmp" insn. Therefore, add the new function parameter "old_t" along with the "new_t", which are used to indicate the old and new poke type. Meanwhile, adjust the implement of bpf_arch_text_poke() for all the archs. "BPF_MOD_NOP" is added to make the code more readable. In bpf_arch_text_poke(), we still check if the new and old address is NULL to determine if nop insn should be used, which I think is more safe. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20251118123639.688444-6-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 daysbpf,x86: adjust the "jmp" mode for bpf trampolineMenglong Dong2-5/+23
In the origin call case, if BPF_TRAMP_F_SKIP_FRAME is not set, it means that the trampoline is not called, but "jmp". Introduce the function bpf_trampoline_use_jmp() to check if the trampoline is in "jmp" mode. Do some adjustment on the "jmp" mode for the x86_64. The main adjustment that we make is for the stack parameter passing case, as the stack alignment logic changes in the "jmp" mode without the "rip". What's more, the location of the parameters on the stack also changes. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20251118123639.688444-5-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 daysbpf: fix the usage of BPF_TRAMP_F_SKIP_FRAMEMenglong Dong2-2/+2
Some places calculate the origin_call by checking if BPF_TRAMP_F_SKIP_FRAME is set. However, it should use BPF_TRAMP_F_ORIG_STACK for this propose. Just fix them. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20251118123639.688444-4-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 daysx86/ftrace: Implement DYNAMIC_FTRACE_WITH_JMPMenglong Dong3-2/+18
Implement the DYNAMIC_FTRACE_WITH_JMP for x86_64. In ftrace_call_replace, we will use JMP32_INSN_OPCODE instead of CALL_INSN_OPCODE if the address should use "jmp". Meanwhile, adjust the direct call in the ftrace_regs_caller. The RSB is balanced in the "jmp" mode. Take the function "foo" for example: original_caller: call foo -> foo: call fentry -> fentry: [do ftrace callbacks ] move tramp_addr to stack RET -> tramp_addr tramp_addr: [..] call foo_body -> foo_body: [..] RET -> back to tramp_addr [..] RET -> back to original_caller Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20251118123639.688444-3-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 daysftrace: Introduce FTRACE_OPS_FL_JMPMenglong Dong3-1/+61
For now, the "nop" will be replaced with a "call" instruction when a function is hooked by the ftrace. However, sometimes the "call" can break the RSB and introduce extra overhead. Therefore, introduce the flag FTRACE_OPS_FL_JMP, which indicate that the ftrace_ops should be called with a "jmp" instead of "call". For now, it is only used by the direct call case. When a direct ftrace_ops is marked with FTRACE_OPS_FL_JMP, the last bit of the ops->direct_call will be set to 1. Therefore, we can tell if we should use "jmp" for the callback in ftrace_call_replace(). Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20251118123639.688444-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 daysbpf: cleanup aux->used_maps after jitAnton Protopopov1-0/+9
In commit b4ce5923e780 ("bpf, x86: add new map type: instructions array") env->used_map was copied to func[i]->aux->used_maps before jitting. Clear these fields out after jitting such that pointer to freed memory (env->used_maps is freed later) are not kept in a live data structure. The reason why the copies were initially added is explained in https://lore.kernel.org/bpf/20251105090410.1250500-1-a.s.protopopov@gmail.com Suggested-by: Alexei Starovoitov <ast@kernel.org> Fixes: b4ce5923e780 ("bpf, x86: add new map type: instructions array") Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Link: https://lore.kernel.org/r/20251124151515.2543403-1-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysMerge branch 'bpf-nested-rcu-critical-sections'Alexei Starovoitov4-27/+66
Puranjay Mohan says: ==================== bpf: Nested rcu critical sections v1: https://lore.kernel.org/bpf/20250916113622.19540-1-puranjay@kernel.org/ Changes in v1->v2: - Move the addition of new tests to a separate patch (Alexei) - Avoid incrementing active_rcu_locks at two places (Eduard) Support nested rcu critical sections by making the boolean flag active_rcu_lock a counter and use it to manage rcu critical section state. bpf_rcu_read_lock() increments this counter and bpf_rcu_read_unlock() decrements it, MEM_RCU -> PTR_UNTRUSTED transition happens when active_rcu_locks drops to 0. ==================== Link: https://patch.msgid.link/20251117200411.25563-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysselftests: bpf: Add tests for unbalanced rcu_read_lockPuranjay Mohan2-0/+42
As verifier now supports nested rcu critical sections, add new test cases to make sure unbalanced usage of rcu_read_lock()/unlock() is rejected. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251117200411.25563-3-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysbpf: support nested rcu critical sectionsPuranjay Mohan3-27/+24
Currently, nested rcu critical sections are rejected by the verifier and rcu_lock state is managed by a boolean variable. Add support for nested rcu critical sections by make active_rcu_locks a counter similar to active_preempt_locks. bpf_rcu_read_lock() increments this counter and bpf_rcu_read_unlock() decrements it, MEM_RCU -> PTR_UNTRUSTED transition happens when active_rcu_locks drops to 0. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251117200411.25563-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysbpf: test the correct stack liveness of tail callsEduard Zingerman1-0/+50
A new test is added: caller_stack_write_tail_call tests that the live stack is correctly tracked for a tail call. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu> Link: https://lore.kernel.org/r/20251119160355.1160932-5-martin.teichmann@xfel.eu Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysbpf: correct stack liveness for tail callsEduard Zingerman3-7/+34
This updates bpf_insn_successors() reflecting that control flow might jump over the instructions between tail call and function exit, verifier might assume that some writes to parent stack always happen, which is not the case. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu> Link: https://lore.kernel.org/r/20251119160355.1160932-4-martin.teichmann@xfel.eu Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysbpf: test the proper verification of tail callsMartin Teichmann2-2/+90
Three tests are added: - invalidate_pkt_pointers_by_tail_call checks that one can use the packet pointer after a tail call. This was originally possible and also poses not problems, but was made impossible by 1a4607ffba35. - invalidate_pkt_pointers_by_static_tail_call tests a corner case found by Eduard Zingerman during the discussion of the original fix, which was broken in that fix. - subprog_result_tail_call tests that precision propagation works correctly across tail calls. This did not work before. Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251119160355.1160932-3-martin.teichmann@xfel.eu Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysbpf: properly verify tail call behaviorMartin Teichmann1-3/+28
A successful ebpf tail call does not return to the caller, but to the caller-of-the-caller, often just finishing the ebpf program altogether. Any restrictions that the verifier needs to take into account - notably the fact that the tail call might have modified packet pointers - are to be checked on the caller-of-the-caller. Checking it on the caller made the verifier refuse perfectly fine programs that would use the packet pointers after a tail call, which is no problem as this code is only executed if the tail call was unsuccessful, i.e. nothing happened. This patch simulates the behavior of a tail call in the verifier. A conditional jump to the code after the tail call is added for the case of an unsucessful tail call, and a return to the caller is simulated for a successful tail call. For the successful case we assume that the tail call returns an int, as tail calls are currently only allowed in functions that return and int. We always assume that the tail call modified the packet pointers, as we do not know what the tail call did. For the unsuccessful case we know nothing happened, so we do not need to add new constraints. This approach also allows to check other problems that may occur with tail calls, namely we are now able to check that precision is properly propagated into subprograms using tail calls, as well as checking the live slots in such a subprogram. Fixes: 1a4607ffba35 ("bpf: consider that tail calls invalidate packet pointers") Link: https://lore.kernel.org/bpf/20251029105828.1488347-1-martin.teichmann@xfel.eu/ Signed-off-by: Martin Teichmann <martin.teichmann@xfel.eu> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251119160355.1160932-2-martin.teichmann@xfel.eu Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysbpf: Add a check to make static analysers happyAnton Protopopov1-1/+7
In [1] Dan Carpenter reported that the following code makes the Smatch static analyser unhappy: 17904 value = map->ops->map_lookup_elem(map, &i); 17905 if (!value) 17906 return -EINVAL; --> 17907 items[i - start] = value->xlated_off; The analyser assumes that the `value` variable may contain an error and thus it should be properly checked before the dereference. On practice this will never happen as array maps do not return error values in map_lookup_elem, but to make the Smatch and other possible analysers happy this patch adds a formal check. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/bpf/aR2BN1Ix--8tmVrN@stanley.mountain/ [1] Fixes: 493d9e0d6083 ("bpf, x86: add support for indirect jumps") Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Link: https://lore.kernel.org/r/20251119112517.1091793-1-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysselftests/bpf: Update test_tag to use sha256Xing Guo1-1/+1
commit 603b44162325 ("bpf: Update the bpf_prog_calc_tag to use SHA256") changed digest of prog_tag to SHA256 but forgot to update tests correspondingly. Fix it. Fixes: 603b44162325 ("bpf: Update the bpf_prog_calc_tag to use SHA256") Signed-off-by: Xing Guo <higuoxing@gmail.com> Link: https://lore.kernel.org/r/20251121061458.3145167-1-higuoxing@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysselftests/bpf: Improve reliability of test_perf_branches_no_hw()Matt Bobrowski2-2/+17
Currently, test_perf_branches_no_hw() relies on the busy loop within test_perf_branches_common() being slow enough to allow at least one perf event sample tick to occur before starting to tear down the backing perf event BPF program. With a relatively small fixed iteration count of 1,000,000, this is not guaranteed on modern fast CPUs, resulting in the test run to subsequently fail with the following: bpf_testmod.ko is already unloaded. Loading bpf_testmod.ko... Successfully loaded bpf_testmod.ko. test_perf_branches_common:PASS:test_perf_branches_load 0 nsec test_perf_branches_common:PASS:attach_perf_event 0 nsec test_perf_branches_common:PASS:set_affinity 0 nsec check_good_sample:PASS:output not valid 0 nsec check_good_sample:PASS:read_branches_size 0 nsec check_good_sample:PASS:read_branches_stack 0 nsec check_good_sample:PASS:read_branches_stack 0 nsec check_good_sample:PASS:read_branches_global 0 nsec check_good_sample:PASS:read_branches_global 0 nsec check_good_sample:PASS:read_branches_size 0 nsec test_perf_branches_no_hw:PASS:perf_event_open 0 nsec test_perf_branches_common:PASS:test_perf_branches_load 0 nsec test_perf_branches_common:PASS:attach_perf_event 0 nsec test_perf_branches_common:PASS:set_affinity 0 nsec check_bad_sample:FAIL:output not valid no valid sample from prog Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED Successfully unloaded bpf_testmod.ko. On a modern CPU (i.e. one with a 3.5 GHz clock rate), executing 1 million increments of a volatile integer can take significantly less than 1 millisecond. If the spin loop and detachment of the perf event BPF program elapses before the first 1 ms sampling interval elapses, the perf event will never end up firing. Fix this by bumping the loop iteration counter a little within test_perf_branches_common(), along with ensuring adding another loop termination condition which is directly influenced by the backing perf event BPF program executing. Notably, a concious decision was made to not adjust the sample_freq value as that is just not a reliable way to go about fixing the problem. It effectively still leaves the race window open. Fixes: 67306f84ca78c ("selftests/bpf: Add bpf_read_branch_records() selftest") Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20251119143540.2911424-1-mattbobrowski@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysselftests/bpf: skip test_perf_branches_hw() on unsupported platformsMatt Bobrowski1-3/+3
Gracefully skip the test_perf_branches_hw subtest on platforms that do not support LBR or require specialized perf event attributes to enable branch sampling. For example, AMD's Milan (Zen 3) supports BRS rather than traditional LBR. This requires specific configurations (attr.type = PERF_TYPE_RAW, attr.config = RETIRED_TAKEN_BRANCH_INSTRUCTIONS) that differ from the generic setup used within this test. Notably, it also probably doesn't hold much value to special case perf event configurations for selected micro architectures. Fixes: 67306f84ca78c ("selftests/bpf: Add bpf_read_branch_records() selftest") Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20251120142059.2836181-1-mattbobrowski@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysMerge branch 'bpf-arm64-indirect-jumps'Alexei Starovoitov2-2/+13
Puranjay Mohan says: ==================== bpf: arm64: Indirect jumps Changes in v1->v2: v1: https://lore.kernel.org/all/20251117004656.33292-1-puranjay@kernel.org/ - Dropped patch 3 that was ignoring relocations for .jumptables. LLVM has been fixed to not emit relocations for .jumptables, so this patch is not needed. - Added Reviewed-by: Anton Protopopov <a.s.protopopov@gmail.com> This set adds the support of indirect jumps to the arm64 JIT. It involves calling bpf_prog_update_insn_ptrs() to support instructions array map. The second piece is supporting BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=0 instruction that is trivial to implement on arm64. The final patch enables selftests on arm64: [root@localhost bpf]# ./test_progs-cpuv4 -a "*gotox*" #20/1 bpf_gotox/one-switch:OK #20/2 bpf_gotox/one-switch-non-zero-sec-offset:OK #20/3 bpf_gotox/two-switches:OK #20/4 bpf_gotox/big-jump-table:OK #20/5 bpf_gotox/static-global:OK #20/6 bpf_gotox/nonstatic-global:OK #20/7 bpf_gotox/other-sec:OK #20/8 bpf_gotox/static-global-other-sec:OK #20/9 bpf_gotox/nonstatic-global-other-sec:OK #20/10 bpf_gotox/one-jump-two-maps:OK #20/11 bpf_gotox/one-map-two-jumps:OK #20 bpf_gotox:OK #537/1 verifier_gotox/jump_table_ok:OK #537/2 verifier_gotox/jump_table_reserved_field_src_reg:OK #537/3 verifier_gotox/jump_table_reserved_field_non_zero_off:OK #537/4 verifier_gotox/jump_table_reserved_field_non_zero_imm:OK #537/5 verifier_gotox/jump_table_no_jump_table:OK #537/6 verifier_gotox/jump_table_incorrect_dst_reg_type:OK #537/7 verifier_gotox/jump_table_invalid_read_size_u32:OK #537/8 verifier_gotox/jump_table_invalid_read_size_u16:OK #537/9 verifier_gotox/jump_table_invalid_read_size_u8:OK #537/10 verifier_gotox/jump_table_misaligned_access:OK #537/11 verifier_gotox/jump_table_invalid_mem_acceess_pos:OK #537/12 verifier_gotox/jump_table_invalid_mem_acceess_neg:OK #537/13 verifier_gotox/jump_table_add_sub_ok:OK #537/14 verifier_gotox/jump_table_no_writes:OK #537/15 verifier_gotox/jump_table_use_reg_r0:OK #537/16 verifier_gotox/jump_table_use_reg_r1:OK #537/17 verifier_gotox/jump_table_use_reg_r2:OK #537/18 verifier_gotox/jump_table_use_reg_r3:OK #537/19 verifier_gotox/jump_table_use_reg_r4:OK #537/20 verifier_gotox/jump_table_use_reg_r5:OK #537/21 verifier_gotox/jump_table_use_reg_r6:OK #537/22 verifier_gotox/jump_table_use_reg_r7:OK #537/23 verifier_gotox/jump_table_use_reg_r8:OK #537/24 verifier_gotox/jump_table_use_reg_r9:OK #537/25 verifier_gotox/jump_table_outside_subprog:OK #537/26 verifier_gotox/jump_table_contains_non_unique_values:OK #537 verifier_gotox:OK Summary: 2/37 PASSED, 0 SKIPPED, 0 FAILED ==================== Link: https://patch.msgid.link/20251117130732.11107-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysselftests: bpf: Enable gotox tests from arm64Puranjay Mohan1-2/+2
arm64 JIT now supports gotox instruction and jumptables, so run tests in verifier_gotox.c for arm64. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Anton Protopopov <a.s.protopopov@gmail.com> Link: https://lore.kernel.org/r/20251117130732.11107-4-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysbpf: arm64: Add support for indirect jumpsPuranjay Mohan1-0/+4
Add support for a new instruction BPF_JMP|BPF_X|BPF_JA, SRC=0, DST=Rx, off=0, imm=0 which does an indirect jump to a location stored in Rx. The register Rx should have type PTR_TO_INSN. This new type assures that the Rx register contains a value (or a range of values) loaded from a correct jump table – map of type instruction array. ARM64 JIT supports indirect jumps to all registers through the A64_BR() macro, use it to implement this new instruction. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Anton Protopopov <a.s.protopopov@gmail.com> Acked-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20251117130732.11107-3-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysbpf: arm64: Add support for instructions arrayPuranjay Mohan1-0/+7
Add support for the instructions array map type in the arm64 JIT by calling bpf_prog_update_insn_ptrs() with the offsets that map xlated_offset to the jited_offset in the final image. arm64 JIT already has this offset array which was being used for bpf_prog_fill_jited_linfo() and can be used directly for bpf_prog_update_insn_ptrs. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Anton Protopopov <a.s.protopopov@gmail.com> Acked-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20251117130732.11107-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
10 daysMerge branch 'selftests-bpf-networking-test-cleanups'Martin KaFai Lau2-112/+77
Hoyeon Lee says: ==================== selftests/bpf: networking test cleanups This series finishes the sockaddr_storage migration in the networking selftests by removing the remaining open-coded IPv4/IPv6 wrappers (addr_port/tuple in cls_redirect, sa46 in select_reuseport). The tests now use sockaddr_storage directly. No other custom socket-address wrappers remain after this series, so the churn stops here and behavior is unchanged. ==================== Link: https://patch.msgid.link/20251121081332.2309838-1-hoyeon.lee@suse.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
10 daysselftests/bpf: Use sockaddr_storage instead of sa46 in select_reuseport testHoyeon Lee1-33/+34
The select_reuseport selftest uses a custom sa46 union to represent IPv4 and IPv6 addresses. This custom wrapper requires extra manual handling for address family and field extraction. Replace sa46 with sockaddr_storage and update the helper functions to operate on native socket structures. This simplifies the code and removes unnecessary custom address-handling logic. No functional changes intended. Reviewed-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251121081332.2309838-3-hoyeon.lee@suse.com
10 daysselftests/bpf: Use sockaddr_storage directly in cls_redirect testHoyeon Lee1-79/+43
The cls_redirect test uses a custom addr_port/tuple wrapper to represent IPv4/IPv6 addresses and ports. This custom wrapper requires extra conversion logic and specific helpers such as fill_addr_port(), which are no longer necessary when using standard socket address structures. This commit replaces addr_port/tuple with the standard sockaddr_storage so test handles address families and ports using native socket types. It removes the custom helper, eliminates redundant casts, and simplifies the setup helpers without functional changes. set_up_conn() and build_input() now take src/dst sockaddr_storage directly. Reviewed-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251121081332.2309838-2-hoyeon.lee@suse.com
11 daysbpf: Document cfi_stubs and owner fields in struct bpf_struct_opsNirbhay Sharma1-4/+6
Add missing kernel-doc documentation for the cfi_stubs and owner fields in struct bpf_struct_ops to fix the following warnings: Warning: include/linux/bpf.h:1931 struct member 'cfi_stubs' not described in 'bpf_struct_ops' Warning: include/linux/bpf.h:1931 struct member 'owner' not described in 'bpf_struct_ops' The cfi_stubs field was added in commit 2cd3e3772e41 ("x86/cfi,bpf: Fix bpf_struct_ops CFI") to provide CFI stub functions for trampolines, and the owner field is used for module reference counting. Signed-off-by: Nirbhay Sharma <nirbhay.lkd@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251120204620.59571-2-nirbhay.lkd@gmail.com
11 daysselftests/bpf: Use ASSERT_STRNEQ to factor in long slab cache namesMatt Bobrowski1-1/+2
subtest_kmem_cache_iter_check_slabinfo() fundamentally compares slab cache names parsed out from /proc/slabinfo against those stored within struct kmem_cache_result. The current problem is that the slab cache name within struct kmem_cache_result is stored within a bounded fixed-length array (sized to SLAB_NAME_MAX(32)), whereas the name parsed out from /proc/slabinfo is not. Meaning, using ASSERT_STREQ() can certainly lead to test failures, particularly when dealing with slab cache names that are longer than SLAB_NAME_MAX(32) bytes. Notably, kmem_cache_create() allows callers to create slab caches with somewhat arbitrarily sized names via its __name identifier argument, so exceeding the SLAB_NAME_MAX(32) limit that is in place now can certainly happen. Make subtest_kmem_cache_iter_check_slabinfo() more reliable by only checking up to sizeof(struct kmem_cache_result.name) - 1 using ASSERT_STRNEQ(). Fixes: a496d0cdc84d ("selftests/bpf: Add a test for kmem_cache_iter") Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://patch.msgid.link/20251118073734.4188710-1-mattbobrowski@google.com
13 daysMerge branch 'replace-bpf-memory-allocator-with-kmalloc_nolock-in-local-storage'Alexei Starovoitov3-175/+74
Amery Hung says: ==================== Replace BPF memory allocator with kmalloc_nolock() in local storage This patchset tries to simplify bpf_local_storage.c by adopting kmalloc_nolock(). This removes memory preallocation and reduces the dependency of smap in bpf_selem_free() and bpf_local_storage_free(). The later will simplify a future refactor that replaces local_storage->lock and b->lock [1]. RFC v1 tried to switch to kmalloc_nolock() unconditionally. However, as there is substantial performance loss in socket local storage due to 1) defer_free() in kfree_nolock() and 2) no kfree_rcu() batching, replacing kzalloc() is postponed until necessary improvements in mm land. Benchmark ./bench -p 1 local-storage-create --storage-type <socket,task> \ --batch-size <16,32,64> The benchmark is a microbenchmark stress-testing how fast local storage can be created. For task local storage, switching from BPF memory allocator to kmalloc_nolock() yields a small amount of improvement. For socket local storage, it remains roughly the same as nothing has changed. Socket local storage memory alloc batch creation speed creation speed diff --------------- ---- ------------------ ---- kzalloc 16 144.149 ± 0.642k/s 3.10 kmallocs/create (before) 32 144.379 ± 1.070k/s 3.08 kmallocs/create 64 144.491 ± 0.818k/s 3.13 kmallocs/create kzalloc 16 146.180 ± 1.403k/s 3.10 kmallocs/create +1.4% (not changed) 32 146.245 ± 1.272k/s 3.10 kmallocs/create +1.3% 64 145.012 ± 1.545k/s 3.10 kmallocs/create +0.4% Task local storage memory alloc batch creation speed creation speed diff --------------- ---- ------------------ ---- BPF memory 16 24.668 ± 0.121k/s 2.54 kmallocs/create allocator 32 22.899 ± 0.097k/s 2.67 kmallocs/create (before) 64 22.559 ± 0.076k/s 2.56 kmallocs/create kmalloc_nolock 16 25.796 ± 0.059k/s 2.52 kmallocs/create +4.6% (after) 32 23.412 ± 0.069k/s 2.50 kmallocs/create +2.2% 64 23.717 ± 0.108k/s 2.60 kmallocs/create +5.1% [1] https://lore.kernel.org/bpf/20251002225356.1505480-1-ameryhung@gmail.com/ v1 -> v2 - Only replace BPF memory allocator with kmalloc_nolock() Link: https://lore.kernel.org/bpf/20251112175939.2365295-1-ameryhung@gmail.com/ ==================== Link: https://patch.msgid.link/20251114201329.3275875-1-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 daysbpf: Replace bpf memory allocator with kmalloc_nolock() in local storageAmery Hung2-112/+48
Replace bpf memory allocator with kmalloc_nolock() to reduce memory wastage due to preallocation. In bpf_selem_free(), an selem now needs to wait for a RCU grace period before being freed when reuse_now == true. Therefore, rcu_barrier() should be always be called in bpf_local_storage_map_free(). In bpf_local_storage_free(), since smap->storage_ma is no longer needed to return the memory, the function is now independent from smap. Remove the outdated comment in bpf_local_storage_alloc(). We already free selem after an RCU grace period in bpf_local_storage_update() when bpf_local_storage_alloc() failed the cmpxchg since commit c0d63f309186 ("bpf: Add bpf_selem_free()"). Signed-off-by: Amery Hung <ameryhung@gmail.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20251114201329.3275875-5-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 daysbpf: Save memory alloction info in bpf_local_storageAmery Hung2-44/+9
Save the memory allocation method used for bpf_local_storage in the struct explicitly so that we don't need to go through the hassle to find out the info. When a later patch replaces BPF memory allocator with kmalloc_noloc(), bpf_local_storage_free() will no longer need smap->storage_ma to return the memory and completely remove the dependency on smap in bpf_local_storage_free(). Signed-off-by: Amery Hung <ameryhung@gmail.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20251114201329.3275875-4-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 daysbpf: Remove smap argument from bpf_selem_free()Amery Hung3-11/+11
Since selem already saves a pointer to smap, use it instead of an additional argument in bpf_selem_free(). This requires moving the SDATA(selem)->smap assignment from bpf_selem_link_map() to bpf_selem_alloc() since bpf_selem_free() may be called without the selem being linked to smap in bpf_local_storage_update(). Signed-off-by: Amery Hung <ameryhung@gmail.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20251114201329.3275875-3-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 daysbpf: Always charge/uncharge memory when allocating/unlinking storage elementsAmery Hung3-14/+12
Since commit a96a44aba556 ("bpf: bpf_sk_storage: Fix invalid wait context lockdep report"), {charge,uncharge}_mem are always true when allocating a bpf_local_storage_elem or unlinking a bpf_local_storage_elem from local storage, so drop these arguments. No functional change. Signed-off-by: Amery Hung <ameryhung@gmail.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20251114201329.3275875-2-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
13 daysselftests/bpf: Replace TCP CC string comparisons with bpf_strncmpHoyeon Lee2-28/+10
The connect4_prog and bpf_iter_setsockopt tests duplicate the same open-coded TCP congestion control string comparison logic. Since bpf_strncmp() provides the same functionality, use it instead to avoid repeated open-coded loops. This change applies only to functional BPF tests and does not affect the verifier performance benchmarks (veristat.cfg). No functional changes intended. Reviewed-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251115225550.1086693-5-hoyeon.lee@suse.com
13 daysselftests/bpf: Move common TCP helpers into bpf_tracing_net.hHoyeon Lee5-24/+14
Some BPF selftests contain identical copies of the min(), max(), before(), and after() helpers. These repeated snippets are the same across the tests and do not need to be defined separately. Move these helpers into bpf_tracing_net.h so they can be shared by TCP related BPF programs. This removes repeated code and keeps the helpers in a single place. Reviewed-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Hoyeon Lee <hoyeon.lee@suse.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251115225550.1086693-4-hoyeon.lee@suse.com
14 daysbpf: Fix invalid prog->stats access when update_effective_progs failsPu Lehui2-5/+10
Syzkaller triggers an invalid memory access issue following fault injection in update_effective_progs. The issue can be described as follows: __cgroup_bpf_detach update_effective_progs compute_effective_progs bpf_prog_array_alloc <-- fault inject purge_effective_progs /* change to dummy_bpf_prog */ array->items[index] = &dummy_bpf_prog.prog ---softirq start--- __do_softirq ... __cgroup_bpf_run_filter_skb __bpf_prog_run_save_cb bpf_prog_run stats = this_cpu_ptr(prog->stats) /* invalid memory access */ flags = u64_stats_update_begin_irqsave(&stats->syncp) ---softirq end--- static_branch_dec(&cgroup_bpf_enabled_key[atype]) The reason is that fault injection caused update_effective_progs to fail and then changed the original prog into dummy_bpf_prog.prog in purge_effective_progs. Then a softirq came, and accessing the members of dummy_bpf_prog.prog in the softirq triggers invalid mem access. To fix it, skip updating stats when stats is NULL. Fixes: 492ecee892c2 ("bpf: enable program stats") Signed-off-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/20251115102343.2200727-1-pulehui@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14bpf: don't skip other information if xlated_prog_insns is skippedAltgelt, Max (Nextron)1-11/+11
If xlated_prog_insns should not be exposed, other information (such as func_info) still can and should be filled in. Therefore, instead of directly terminating in this case, continue with the normal flow. Signed-off-by: Max Altgelt <max.altgelt@nextron-systems.com> Link: https://lore.kernel.org/r/efd00fcec5e3e247af551632726e2a90c105fbd8.camel@nextron-systems.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14selftests/bpf: Test bpf_skb_check_mtu(BPF_MTU_CHK_SEGS) when ↵Martin KaFai Lau2-1/+34
transport_header is not set Add a test to check that bpf_skb_check_mtu(BPF_MTU_CHK_SEGS) is rejected (-EINVAL) if skb->transport_header is not set. The test needs to lower the MTU of the loopback device. Thus, take this opportunity to run the test in a netns by adding "ns_" to the test name. The "serial_" prefix can then be removed. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20251112232331.1566074-2-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14bpf: Check skb->transport_header is set in bpf_skb_check_mtuMartin KaFai Lau1-3/+6
The bpf_skb_check_mtu helper needs to use skb->transport_header when the BPF_MTU_CHK_SEGS flag is used: bpf_skb_check_mtu(skb, ifindex, &mtu_len, 0, BPF_MTU_CHK_SEGS) The transport_header is not always set. There is a WARN_ON_ONCE report when CONFIG_DEBUG_NET is enabled + skb->gso_size is set + bpf_prog_test_run is used: WARNING: CPU: 1 PID: 2216 at ./include/linux/skbuff.h:3071 skb_gso_validate_network_len bpf_skb_check_mtu bpf_prog_3920e25740a41171_tc_chk_segs_flag # A test in the next patch bpf_test_run bpf_prog_test_run_skb For a normal ingress skb (not test_run), skb_reset_transport_header is performed but there is plan to avoid setting it as described in commit 2170a1f09148 ("net: no longer reset transport_header in __netif_receive_skb_core()"). This patch fixes the bpf helper by checking skb_transport_header_was_set(). The check is done just before skb->transport_header is used, to avoid breaking the existing bpf prog. The WARN_ON_ONCE is limited to bpf_prog_test_run, so targeting bpf-next. Fixes: 34b2021cc616 ("bpf: Add BPF-helper for MTU checking") Cc: Jesper Dangaard Brouer <hawk@kernel.org> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> Reported-by: Yinhao Hu <dddddd@hust.edu.cn> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20251112232331.1566074-1-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14bpf: verifier: Move desc->imm setup to sort_kfunc_descs_by_imm_off()Puranjay Mohan1-19/+35
Metadata about a kfunc call is added to the kfunc_tab in add_kfunc_call() but the call instruction itself could get removed by opt_remove_dead_code() later if it is not reachable. If the call instruction is removed, specialize_kfunc() is never called for it and the desc->imm in the kfunc_tab is never initialized for this kfunc call. In this case, sort_kfunc_descs_by_imm_off(env->prog); in do_misc_fixups() doesn't sort the table correctly. This is a problem for s390 as its JIT uses this table to find the addresses for kfuncs, and if this table is not sorted properly, JIT may fail to find addresses for valid kfunc calls. This was exposed by: commit d869d56ca848 ("bpf: verifier: refactor kfunc specialization") as before this commit, desc->imm was initialised in add_kfunc_call() which happens before dead code elimination. Move desc->imm setup down to sort_kfunc_descs_by_imm_off(), this fixes the problem and also saves us from having the same logic in add_kfunc_call() and specialize_kfunc(). Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251114154023.12801-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14selftests/bpf: Align kfuncs renamed in bpf treeMykyta Yatsenko2-3/+3
bpf_task_work_schedule_resume() and bpf_task_work_schedule_signal() have been renamed in bpf tree to bpf_task_work_schedule_resume_impl() and bpf_task_work_schedule_signal_impl() accordingly. There are few uses of these kfuncs in selftests that are not in bpf tree, so that when we port [1] into bpf-next, those BPF programs will not compile. This patch aligns those remaining callsites with the kfunc renaming. It should go on top of [1] when applying on bpf-next. 1: https://lore.kernel.org/all/20251104-implv2-v3-0-4772b9ae0e06@meta.com/ Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20251105132105.597344-1-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after 6.18-rc5+Alexei Starovoitov511-2336/+6076
Cross-merge BPF and other fixes after downstream PR. Minor conflict in kernel/bpf/helpers.c Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14Merge branch 'libbpf-fix-btf-dedup-to-support-recursive-typedef'Andrii Nakryiko2-16/+120
Paul Houssel says: ==================== libbpf: fix BTF dedup to support recursive typedef Pahole fails to encode BTF for some Go projects (e.g. Kubernetes and Podman) due to recursive type definitions that create reference loops not representable in C. These recursive typedefs trigger a failure in the BTF deduplication algorithm. This patch extends btf_dedup_struct_types() to properly handle potential recursion for BTF_KIND_TYPEDEF, similar to how recursion is already handled for BTF_KIND_STRUCT. This allows pahole to successfully generate BTF for Go binaries using recursive types without impacting existing C-based workflows. Changes in v4: fix typo found by Claude-based CI Changes in v3: 1. Patch 1: Adjusted the comment of btf_dedup_ref_type() to refer to typedef as well. 2. Patch 2: Update of the "dedup: recursive typedef" test to include a duplicated version of the types to make sure deduplication still happens in this case. Changes in v2: 1. Patch 1: Refactored code to prevent copying existing logic. Instead of adding a new function we modify the existing btf_dedup_struct_type() function to handle the BTF_KIND_TYPEDEF case. Calls to btf_hash_struct() and btf_shallow_equal_struct() are replaced with calls to functions that select btf_hash_struct() / btf_hash_typedef() based on the type. 2. Patch 2: Added tests v3: https://lore.kernel.org/lkml/cover.1763024337.git.paul.houssel@orange.com/ v2: https://lore.kernel.org/lkml/cover.1762956564.git.paul.houssel@orange.com/ v1: https://lore.kernel.org/lkml/20251107153408.159342-1-paulhoussel2@gmail.com/ ==================== Link: https://patch.msgid.link/cover.1763037045.git.paul.houssel@orange.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-11-14selftests/bpf: Add BTF dedup tests for recursive typedef definitionsPaul Houssel1-0/+65
Add several ./test_progs tests: 1. btf/dedup:recursive typedef ensures that deduplication no longer fails on recursive typedefs. 2. btf/dedup:typedef ensures that typedefs are deduplicated correctly just as they were before this patch. Signed-off-by: Paul Houssel <paul.houssel@orange.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/9fac2f744089f6090257d4c881914b79f6cd6c6a.1763037045.git.paul.houssel@orange.com
2025-11-14libbpf: Fix BTF dedup to support recursive typedef definitionsPaul Houssel1-16/+55
Handle recursive typedefs in BTF deduplication Pahole fails to encode BTF for some Go projects (e.g. Kubernetes and Podman) due to recursive type definitions that create reference loops not representable in C. These recursive typedefs trigger a failure in the BTF deduplication algorithm. This patch extends btf_dedup_ref_type() to properly handle potential recursion for BTF_KIND_TYPEDEF, similar to how recursion is already handled for BTF_KIND_STRUCT. This allows pahole to successfully generate BTF for Go binaries using recursive types without impacting existing C-based workflows. Suggested-by: Tristan d'Audibert <tristan.daudibert@gmail.com> Co-developed-by: Martin Horth <martin.horth@telecom-sudparis.eu> Co-developed-by: Ouail Derghal <ouail.derghal@imt-atlantique.fr> Co-developed-by: Guilhem Jazeron <guilhem.jazeron@inria.fr> Co-developed-by: Ludovic Paillat <ludovic.paillat@inria.fr> Co-developed-by: Robin Theveniaut <robin.theveniaut@irit.fr> Signed-off-by: Martin Horth <martin.horth@telecom-sudparis.eu> Signed-off-by: Ouail Derghal <ouail.derghal@imt-atlantique.fr> Signed-off-by: Guilhem Jazeron <guilhem.jazeron@inria.fr> Signed-off-by: Ludovic Paillat <ludovic.paillat@inria.fr> Signed-off-by: Robin Theveniaut <robin.theveniaut@irit.fr> Signed-off-by: Paul Houssel <paul.houssel@orange.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/bf00857b1e06f282aac12f6834de7396a7547ba6.1763037045.git.paul.houssel@orange.com
2025-11-14selftests/bpf: Fix failure paths in send_signal testAlexei Starovoitov1-0/+5
When test_send_signal_kern__open_and_load() fails parent closes the pipe which cases ASSERT_EQ(read(pipe_p2c...)) to fail, but child continues and enters infinite loop, while parent is stuck in wait(NULL). Other error paths have similar issue, so kill the child before waiting on it. The bug was discovered while compiling all of selftests with -O1 instead of -O2 which caused progs/test_send_signal_kern.c to fail to load. Fixes: ab8b7f0cb358 ("tools/bpf: Add self tests for bpf_send_signal_thread()") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20251113171153.2583-1-alexei.starovoitov@gmail.com
2025-11-14Merge tag 'pci-v6.18-fixes-5' of ↵Linus Torvalds5-28/+50
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Cache the ASPM L0s/L1 Supported bits early so quirks can override them if necessary (Bjorn Helgaas) - Add quirks for PA Semi and Freescale Root Ports and a HiSilicon Wi-Fi device that are reported to have broken L0s and L1 (Shawn Lin, Bjorn Helgaas) * tag 'pci-v6.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI/ASPM: Avoid L0s and L1 on Hi1105 [19e5:1105] Wi-Fi PCI/ASPM: Avoid L0s and L1 on PA Semi [1959:a002] Root Ports PCI/ASPM: Avoid L0s and L1 on Freescale [1957:0451] Root Ports PCI/ASPM: Convert quirks to override advertised link states PCI/ASPM: Add pcie_aspm_remove_cap() to override advertised link states PCI/ASPM: Cache L0s/L1 Supported so advertised link states can be overridden
2025-11-14Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds29-84/+762
Pull bpf fixes from Alexei Starovoitov: - Fix interaction between livepatch and BPF fexit programs (Song Liu) With Steven and Masami acks. - Fix stack ORC unwind from BPF kprobe_multi (Jiri Olsa) With Steven and Masami acks. - Fix out of bounds access in widen_imprecise_scalars() in the verifier (Eduard Zingerman) - Fix conflicts between MPTCP and BPF sockmap (Jiayuan Chen) - Fix net_sched storage collision with BPF data_meta/data_end (Eric Dumazet) - Add _impl suffix to BPF kfuncs with implicit args to avoid breaking them in bpf-next when KF_IMPLICIT_ARGS is added (Mykyta Yatsenko) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Test widen_imprecise_scalars() with different stack depth bpf: account for current allocated stack depth in widen_imprecise_scalars() bpf: Add bpf_prog_run_data_pointers() selftests/bpf: Add mptcp test with sockmap mptcp: Fix proto fallback detection with BPF mptcp: Disallow MPTCP subflows from sockmap selftests/bpf: Add stacktrace ips test for raw_tp selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi x86/fgraph,bpf: Fix stack ORC unwind from kprobe_multi return probe Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()" bpf: add _impl suffix for bpf_stream_vprintk() kfunc bpf:add _impl suffix for bpf_task_work_schedule* kfuncs selftests/bpf: Add tests for livepatch + bpf trampoline ftrace: bpf: Fix IPMODIFY + DIRECT in modify_ftrace_direct() ftrace: Fix BPF fexit with livepatch
2025-11-14Merge tag 'rust-fixes-6.18-2' of ↵Linus Torvalds3-3/+6
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull Rust fix from Miguel Ojeda: - Fix a Rust 1.91.0 build issue due to 'bindings.o' not containing DWARF debug information anymore by teaching gendwarfksyms to skip object files without exports * tag 'rust-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: gendwarfksyms: Skip files with no exports
2025-11-14selftests/bpf: Convert glob_match() to bpf arenaAlexei Starovoitov3-0/+304
Increase arena test coverage. Convert glob_match() to bpf arena in two steps: 1. Copy paste lib/glob.c into bpf_arena_strsearch.h Copy paste lib/globtests.c into progs/arena_strsearch.c 2. Add __arena to pointers Add __arg_arena to global functions that accept arena pointers Add cond_break to loops The test also serves as a good example of what's possible with bpf arena and how existing algorithms can be converted. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251111032931.21430-1-alexei.starovoitov@gmail.com
2025-11-14Merge tag 'nfs-for-6.18-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds9-155/+211
Pull NFS client fixes from Anna Schumaker: - Various fixes when using NFS with TLS - Localio direct-IO fixes - Fix error handling in nfs_atomic_open_v23() - Fix sysfs memory leak when nfs_client kobject add fails - Fix an incorrect parameter when calling nfs4_call_sync() - Fix a failing LTP test when using delegated timestamps * tag 'nfs-for-6.18-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Fix LTP test failures when timestamps are delegated NFSv4: Fix an incorrect parameter when calling nfs4_call_sync() NFS: sysfs: fix leak when nfs_client kobject add fails NFSv2/v3: Fix error handling in nfs_atomic_open_v23() nfs/localio: do not issue misaligned DIO out-of-order nfs/localio: Ensure DIO WRITE's IO on stable storage upon completion nfs/localio: backfill missing partial read support for misaligned DIO nfs/localio: add refcounting for each iocb IO associated with NFS pgio header nfs/localio: remove unecessary ENOTBLK handling in DIO WRITE support NFS: Check the TLS certificate fields in nfs_match_client() pnfs: Set transport security policy to RPC_XPRTSEC_NONE unless using TLS pnfs: Fix TLS logic in _nfs4_pnfs_v4_ds_connect() pnfs: Fix TLS logic in _nfs4_pnfs_v3_ds_connect()
2025-11-14Merge tag 'drm-fixes-2025-11-15' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds17-20/+102
Pull drm fixes from Dave Airlie: "Weekly fixes, amdgpu and vmwgfx making up the most of it, along with panthor and i915/xe. Seems about right for this time of development, nothing major outstanding. client: - Fix description of module parameter panthor: - Flush writes before mapping buffers vmwgfx: - Improve command validation - Improve ref counting - Fix cursor-plane support amdgpu: - Disallow P2P DMA for GC 12 DCC surfaces - ctx error handling fix - UserQ fixes - VRR fix - ISP fix - JPEG 5.0.1 fix amdkfd: - Save area check fix - Fix GPU mappings for APU after prefetch i915: - Fix PSR's pipe to vblank conversion - Disable Panel Replay on MST links xe: - New HW workarounds affecting PTL and WCL platforms * tag 'drm-fixes-2025-11-15' of https://gitlab.freedesktop.org/drm/kernel: drm/client: fix MODULE_PARM_DESC string for "active" drm/i915/dp_mst: Disable Panel Replay drm/amdkfd: Fix GPU mappings for APU after prefetch drm/amdkfd: relax checks for over allocation of save area drm/amdgpu/jpeg: Add parse_cs for JPEG5_0_1 drm/amd/amdgpu: Ensure isp_kernel_buffer_alloc() creates a new BO drm/amd/display: Allow VRR params change if unsynced with the stream drm/amdgpu: fix lock warning in amdgpu_userq_fence_driver_process drm/amdgpu: jump to the correct label on failure drm/amdgpu: disable peer-to-peer access for DCC-enabled GC12 VRAM surfaces drm/xe/xe3lpg: Extend Wa_15016589081 for xe3lpg drm/xe/xe3: Extend wa_14023061436 drm/xe/xe3: Add WA_14024681466 for Xe3_LPG drm/i915/psr: fix pipe to vblank conversion drm/panthor: Flush shmem writes before mapping buffers CPU-uncached drm/vmwgfx: Restore Guest-Backed only cursor plane support drm/vmwgfx: Use kref in vmw_bo_dirty drm/vmwgfx: Validate command header size against SVGA_CMD_MAX_DATASIZE
2025-11-14Merge tag 'mmc-v6.18-rc2' of ↵Linus Torvalds4-42/+22
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: - dw_mmc-rockchip: Fix internal phase calculation - pxamci: Simplify and fix ->probe() error handling - sdhci-of-dwcmshc: Fix strbin signal delay - wmt-sdmmc: Fix compile test default * tag 'mmc-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: dw_mmc-rockchip: Fix wrong internal phase calculate mmc: pxamci: Simplify pxamci_probe() error handling using devm APIs mmc: sdhci-of-dwcmshc: Change DLL_STRBIN_TAPNUM_DEFAULT to 0x4 mmc: wmt-sdmmc: fix compile test default
2025-11-14bpf: Handle return value of ftrace_set_filter_ip in register_fentryMenglong Dong1-1/+3
The error that returned by ftrace_set_filter_ip() in register_fentry() is not handled properly. Just fix it. Fixes: 00963a2e75a8 ("bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)") Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20251110120705.1553694-1-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14Merge tag 'pmdomain-v6.18-rc2' of ↵Linus Torvalds3-17/+27
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm Pull pmdomain fixes from Ulf Hansson: - imx: Fix reference count leak in ->remove() - samsung: Rework legacy splash-screen handover workaround - samsung: Fix potential memleak during ->probe() - arm: Fix genpd leak on provider registration failure for scmi * tag 'pmdomain-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: pmdomain: imx: Fix reference count leak in imx_gpc_remove pmdomain: samsung: Rework legacy splash-screen handover workaround pmdomain: arm: scmi: Fix genpd leak on provider registration failure pmdomain: samsung: plug potential memleak during probe
2025-11-14bpf: Add missing checks to avoid verbose verifier logEduard Zingerman1-4/+6
There are a few places where log level is not checked before calling "verbose()". This forces programs working only at BPF_LOG_LEVEL_STATS (e.g. veristat) to allocate unnecessarily large log buffers. Add missing checks. Reported-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251114200542.912386-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14Merge tag 'cxl-fixes-6.18-rc6' of ↵Linus Torvalds3-22/+28
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dave Jiang: - Fix incorrect device handle check for Generic Initiator - Fix offset calculation for extended linear cache poison injection - Fix lockdep warning for hmem_register_resource() * tag 'cxl-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: acpi/hmat: Fix lockdep warning for hmem_register_resource() cxl: Adjust offset calculation for poison injection acpi,srat: Fix incorrect device handle check for Generic Initiator
2025-11-14bpf: Prevent nesting overflow in bpf_try_get_buffersSahil Chandna1-0/+3
bpf_try_get_buffers() returns one of multiple per-CPU buffers based on a per-CPU nesting counter. This mechanism expects that buffers are not endlessly acquired before being returned. migrate_disable() ensures that a task remains on the same CPU, but it does not prevent the task from being preempted by another task on that CPU. Without disabled preemption, a task may be preempted while holding a buffer, allowing another task to run on same CPU and acquire an additional buffer. Several such preemptions can cause the per-CPU nest counter to exceed MAX_BPRINTF_NEST_LEVEL and trigger the warning in bpf_try_get_buffers(). Adding preempt_disable()/preempt_enable() around buffer acquisition and release prevents this task preemption and preserves the intended bounded nesting behavior. Reported-by: syzbot+b0cff308140f79a9c4cb@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68f6a4c8.050a0220.1be48.0011.GAE@google.com/ Fixes: 4223bf833c849 ("bpf: Remove preempt_disable in bpf_try_get_buffers") Suggested-by: Yonghong Song <yonghong.song@linux.dev> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Sahil Chandna <chandna.sahil@gmail.com> Link: https://lore.kernel.org/r/20251114064922.11650-1-chandna.sahil@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14Merge tag 'spi-fix-v6.18-rc5' of ↵Linus Torvalds3-5/+24
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A few standard fixes here, plus one more interesting one from Hans which addresses an issue where a move in when we requested GPIOs on ACPI systems caused us to stop doing pinmuxing and leave things floating that we'd really rather not have floating" * tag 'spi-fix-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: Add TODO comment about ACPI GPIO setup spi: xilinx: increase number of retries before declaring stall spi: imx: keep dma request disabled before dma transfer setup spi: Try to get ACPI GPIO IRQ earlier
2025-11-14Merge tag 'regulator-fix-v6.18-rc5' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "One simple fix for a GPIO descriptor leak in the probe error handling for the fixed regulator" * tag 'regulator-fix-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: fixed: fix GPIO descriptor leak on register failure
2025-11-14Merge tag 'sound-6.18-rc6' of ↵Linus Torvalds15-49/+125
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes. All changes are device-specific, and nothing stands out. - A regression fix for HD-audio HDMI probe - USB-audio hardening patches for issues spotted by fuzzers - ASoC fixes for TAS278x, SoundWire and Cirrus - Usual HD-audio and USB-audio quirks" * tag 'sound-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Add native DSD quirks for PureAudio DAC series ASoC: rsnd: fix OF node reference leak in rsnd_ssiu_probe() ALSA: hda/tas2781: Correct the wrong project ID ALSA: usb-audio: Fix NULL pointer dereference in snd_usb_mixer_controls_badd ASoC: SDCA: bug fix while parsing mipi-sdca-control-cn-list ALSA: usb-audio: Fix potential overflow of PCM transfer buffer ALSA: hda/tas2781: Add new quirk for HP new projects ASoC: tas2781: fix getting the wrong device number ASoC: codecs: va-macro: fix resource leak in probe error path ASoC: tas2783A: Fix issues in firmware parsing ASoC: sdw_utils: fix device reference leak in is_sdca_endpoint_present() ASoC: cs4271: Fix regulator leak on probe failure ALSA: hda/hdmi: Fix breakage at probing nvhdmi-mcp driver ASoC: da7213: Use component driver suspend/resume ALSA: usb-audio: add min_mute quirk for SteelSeries Arctis ASoC: doc: cs35l56: Update firmware filename description for B0 silicon
2025-11-14Merge tag 'block-6.18-20251114' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block fixlet from Jens Axboe: "Been sitting on this one for a week or two, planning on sending it out when there were other block changes for 6.18. But as that hasn't materialized in the second week of sitting on it, let's flush it out. A previous commit updated my git tree locations, but one was missed as it was already set to the git.kernel.org one. But the git location swap also renamed the actual tree from linux-block to just linux, let's get that last one updated too" * tag 'block-6.18-20251114' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: MAINTAINERS: correct git location for block layer tree
2025-11-14Merge tag 'io_uring-6.18-20251113' of ↵Linus Torvalds4-7/+17
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring fixes from Jens Axboe: - Use the actual segments in a request when for bvec based buffers - Fix an odd case where the iovec might get leaked for a read/write request, if it was newly allocated, overflowed the alloc cache, and hit an early error - Minor tweak to the query API added in this release, returning the number of available entries * tag 'io_uring-6.18-20251113' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs io_uring/query: return number of available queries io_uring/rw: ensure allocated iovec gets cleared for early failure
2025-11-14selftests/bpf: Test widen_imprecise_scalars() with different stack depthEduard Zingerman1-0/+53
A test case for a situation when widen_imprecise_scalars() is called with old->allocated_stack > cur->allocated_stack. Test structure: def widening_stack_size_bug(): r1 = 0 for r6 in 0..1: iterator_with_diff_stack_depth(r1) r1 = 42 def iterator_with_diff_stack_depth(r1): if r1 != 42: use 128 bytes of stack iterator based loop iterator_with_diff_stack_depth() is verified with r1 == 0 first and r1 == 42 next. Causing stack usage of 128 bytes on a first visit and 8 bytes on a second. Such arrangement triggered a KASAN error in widen_imprecise_scalars(). Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251114025730.772723-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14bpf: account for current allocated stack depth in widen_imprecise_scalars()Eduard Zingerman1-2/+4
The usage pattern for widen_imprecise_scalars() looks as follows: prev_st = find_prev_entry(env, ...); queued_st = push_stack(...); widen_imprecise_scalars(env, prev_st, queued_st); Where prev_st is an ancestor of the queued_st in the explored states tree. This ancestor is not guaranteed to have same allocated stack depth as queued_st. E.g. in the following case: def main(): for i in 1..2: foo(i) // same callsite, differnt param def foo(i): if i == 1: use 128 bytes of stack iterator based loop Here, for a second 'foo' call prev_st->allocated_stack is 128, while queued_st->allocated_stack is much smaller. widen_imprecise_scalars() needs to take this into account and avoid accessing bpf_verifier_state->frame[*]->stack out of bounds. Fixes: 2793a8b015f7 ("bpf: exact states comparison for iterator convergence checks") Reported-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251114025730.772723-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14bpf: Add bpf_prog_run_data_pointers()Eric Dumazet3-8/+24
syzbot found that cls_bpf_classify() is able to change tc_skb_cb(skb)->drop_reason triggering a warning in sk_skb_reason_drop(). WARNING: CPU: 0 PID: 5965 at net/core/skbuff.c:1192 __sk_skb_reason_drop net/core/skbuff.c:1189 [inline] WARNING: CPU: 0 PID: 5965 at net/core/skbuff.c:1192 sk_skb_reason_drop+0x76/0x170 net/core/skbuff.c:1214 struct tc_skb_cb has been added in commit ec624fe740b4 ("net/sched: Extend qdisc control block with tc control block"), which added a wrong interaction with db58ba459202 ("bpf: wire in data and data_end for cls_act_bpf"). drop_reason was added later. Add bpf_prog_run_data_pointers() helper to save/restore the net_sched storage colliding with BPF data_meta/data_end. Fixes: ec624fe740b4 ("net/sched: Extend qdisc control block with tc control block") Reported-by: syzbot <syzkaller@googlegroups.com> Closes: https://lore.kernel.org/netdev/6913437c.a70a0220.22f260.013b.GAE@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20251112125516.1563021-1-edumazet@google.com
2025-11-14Merge tag 'v6.18-p5' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: - Fix device reference leak in hisilicon * tag 'v6.18-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: hisilicon/qm - Fix device reference leak in qm_get_qos_value
2025-11-14Merge tag 'v6.18-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds3-2/+7
Pull smb client fixes from Steve French: - Multichannel reconnect channel selection fix - Fix for smbdirect (RDMA) disconnect bug - Fix for incorrect username length check - Fix memory leak in mount parm processing * tag 'v6.18-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: let smbd_disconnect_rdma_connection() turn CREATED into DISCONNECTED smb: fix invalid username check in smb3_fs_context_parse_param() cifs: client: fix memory leak in smb3_fs_context_parse_param smb: client: fix cifs_pick_channel when channel needs reconnect
2025-11-14ALSA: usb-audio: Add native DSD quirks for PureAudio DAC seriesLushih Hsieh1-0/+6
The PureAudio APA DAC and Lotus DAC5 series are USB Audio 2.0 Class devices that support native Direct Stream Digital (DSD) playback via specific vendor protocols. Without these quirks, the devices may only function in standard PCM mode, or fail to correctly report their DSD format capabilities to the ALSA framework, preventing native DSD playback under Linux. This commit adds new quirk entries for the mentioned DAC models based on their respective Vendor/Product IDs (VID:PID), for example: 0x16d0:0x0ab1 (APA DAC), 0x16d0:0xeca1 (DAC5 series), etc. The quirk ensures correct DSD format handling by setting the required SNDRV_PCM_FMTBIT_DSD_U32_BE format bit and defining the DSD-specific Audio Class 2.0 (AC2.0) endpoint configurations. This allows the ALSA DSD API to correctly address the device for high-bitrate DSD streams, bypassing the need for DoP (DSD over PCM). Test on APA DAC and Lotus DAC5 SE under Arch Linux. Tested-by: Lushih Hsieh <bruce@mail.kh.edu.tw> Signed-off-by: Lushih Hsieh <bruce@mail.kh.edu.tw> Link: https://patch.msgid.link/20251114052053.54989-1-bruce@mail.kh.edu.tw Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-11-14Merge tag 'asoc-fix-v6.18-rc5' of ↵Takashi Iwai485-2341/+4462
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.18 A small collection of fixes, all driver specific and none especially remarkable unless you have the hardware (many not even then).
2025-11-14Merge tag 'drm-xe-fixes-2025-11-13' of ↵Dave Airlie2-0/+12
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - New HW workarounds affecting PTL and WCL platforms (Nitin Gote, Tangudu Tilak Tirumalesh) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patch.msgid.link/ay2qztgonodwson6tuzcv5napjmqbgwzv27so4ybfola34guux@xgufrrmbzyws
2025-11-14Merge tag 'drm-intel-fixes-2025-11-13' of ↵Dave Airlie1-1/+6
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Fix PSR's pipe to vblank conversion (Jani) - Disable Panel Replay on MST links (Imre) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/aRXdQnitzyFcokhF@intel.com
2025-11-14Merge tag 'drm-misc-fixes-2025-11-13' of ↵Dave Airlie6-10/+46
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: client: - Fix description of module parameter panthor: - Flush writes before mapping buffers vmwgfx: - Improve command validation - Improve ref counting - Fix cursor-plane support Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20251113132317.GA451885@linux.fritz.box
2025-11-14Merge tag 'amd-drm-fixes-6.18-2025-11-12' of ↵Dave Airlie8-9/+38
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.18-2025-11-12: amdgpu: - Disallow P2P DMA for GC 12 DCC surfaces - ctx error handling fix - UserQ fixes - VRR fix - ISP fix - JPEG 5.0.1 fix amdkfd: - Save area check fix - Fix GPU mappings for APU after prefetch Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20251112200930.8788-1-alexander.deucher@amd.com
2025-11-13Merge tag 'vfio-v6.18-rc6' of https://github.com/awilliam/linux-vfioLinus Torvalds4-9/+288
Pull VFIO seftest fixes from Alex Williamson: - Fix vfio selftests to remove the expectation that the IOMMU supports a 64-bit IOVA space. These manifest both in the original set of tests introduced this development cycle in identity mapping the IOVA to buffer virtual address space, as well as the more recent boundary testing. Implement facilities for collecting the valid IOVA ranges from the backend, implement a simple IOVA allocator, and use the information for determining extents (Alex Mastro) * tag 'vfio-v6.18-rc6' of https://github.com/awilliam/linux-vfio: vfio: selftests: replace iova=vaddr with allocated iovas vfio: selftests: add iova allocator vfio: selftests: fix map limit tests to use last available iova vfio: selftests: add iova range query helpers
2025-11-13Merge tag 'hwmon-for-v6.18-rc6' of ↵Linus Torvalds1-28/+26
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - gpd-fan: Fix compilation error for non-ACPI builds, and initialize EC when loading the driver * tag 'hwmon-for-v6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (gpd-fan) initialize EC on driver load for Win 4 hwmon: (gpd-fan) Fix compilation error in non-ACPI builds
2025-11-13Merge tag 'pm-6.18-rc6' of ↵Linus Torvalds2-14/+17
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix issues related to the handling of compressed hibernation images and a recent intel_pstate driver regression: - Fix issues related to using inadequate data types and incorrect use of atomic variables in the compressed hibernation images handling code that were introduced during the 6.9 development cycle (Mario Limonciello) - Move a X86_FEATURE_IDA check from turbo_is_disabled() to the places where a new value for MSR_IA32_PERF_CTL is computed in intel_pstate to address a regression preventing users from enabling turbo frequencies post-boot (Srinivas Pandruvada)" * tag 'pm-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: Check IDA only before MSR_IA32_PERF_CTL writes PM: hibernate: Fix style issues in save_compressed_image() PM: hibernate: Use atomic64_t for compressed_size variable PM: hibernate: Emit an error when image writing fails
2025-11-13Merge tag 'acpi-6.18-rc6' of ↵Linus Torvalds3-14/+37
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix issues in the ACPI CPPC library and in the recently added parser for the ACPI MRRM table: - Limit some checks in the ACPI CPPC library to online CPUs to avoid accessing uninitialized per-CPU variables when some CPUs are offline to start with, like during boot with 'nosmt=force' (Gautham Shenoy) - Rework add_boot_memory_ranges() in the ACPI MRRM table parser to fix memory leaks and improve error handling (Kaushlendra Kumar)" * tag 'acpi-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: MRRM: Fix memory leaks and improve error handling ACPI: CPPC: Limit perf ctrs in PCC check only to online CPUs ACPI: CPPC: Perform fast check switch only for online CPUs ACPI: CPPC: Check _CPC validity for only the online CPUs ACPI: CPPC: Detect preferred core availability on online CPUs
2025-11-13selftests/bpf: retry bpf_map_update_elem() when E2BIG is returnedMatt Bobrowski1-1/+2
Executing the test_maps binary on platforms with extremely high core counts may cause intermittent assertion failures in test_update_delete() (called via test_map_parallel()). This can occur because bpf_map_update_elem() under some circumstances (specifically in this case while performing bpf_map_update_elem() with BPF_NOEXIST on a BPF_MAP_TYPE_HASH with its map_flags set to BPF_F_NO_PREALLOC) can return an E2BIG error code i.e. error -7 7 tools/testing/selftests/bpf/test_maps.c:#: void test_update_delete(unsigned int, void *): Assertion `err == 0' failed. tools/testing/selftests/bpf/test_maps.c:#: void __run_parallel(unsigned int, void (*)(unsigned int, void *), void *): Assertion `status == 0' failed. As it turns out, is_map_full() which is called from alloc_htab_elem() can take on a conservative approach when htab->use_percpu_counter is true (which is the case here because the percpu_counter is used when a BPF_MAP_TYPE_HASH is created with its map_flags set to BPF_F_NO_PREALLOC). This conservative approach prioritizes preventing over-allocation and potential issues that could arise from possibly exceeding htab->map.max_entries in highly concurrent environments, even if it means slightly under-utilizing the htab map's capacity. Given that bpf_map_update_elem() from test_update_delete() can return E2BIG, update can_retry() such that it also accounts for the E2BIG error code (specifically only when running with map_flags being set to BPF_F_NO_PREALLOC). The retry loop will allow the global count belonging to the percpu_counter to become synchronized and better reflect the current htab map's capacity. Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20251113092519.2632079-1-mattbobrowski@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-13Merge branch 'mptcp-fix-conflicts-between-mptcp-and-sockmap'Martin KaFai Lau4-2/+195
Jiayuan Chen says: ==================== mptcp: Fix conflicts between MPTCP and sockmap Overall, we encountered a warning [1] that can be triggered by running the selftest I provided. sockmap works by replacing sk_data_ready, recvmsg, sendmsg operations and implementing fast socket-level forwarding logic: 1. Users can obtain file descriptors through userspace socket()/accept() interfaces, then call BPF syscall to perform these replacements. 2. Users can also use the bpf_sock_hash_update helper (in sockops programs) to replace handlers when TCP connections enter ESTABLISHED state (BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB/BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB) However, when combined with MPTCP, an issue arises: MPTCP creates subflow sk's and performs TCP handshakes, so the BPF program obtains subflow sk's and may incorrectly replace their sk_prot. We need to reject such operations. In patch 1, we set psock_update_sk_prot to NULL in the subflow's custom sk_prot. Additionally, if the server's listening socket has MPTCP enabled and the client's TCP also uses MPTCP, we should allow the combination of subflow and sockmap. This is because the latest Golang programs have enabled MPTCP for listening sockets by default [2]. For programs already using sockmap, upgrading Golang should not cause sockmap functionality to fail. Patch 2 prevents the WARNING from occurring. Despite these patches fixing stream corruption, users of sockmap must set GODEBUG=multipathtcp=0 to disable MPTCP until sockmap fully supports it. [1] truncated warning: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380 Modules linked in: RIP: 0010:mptcp_stream_accept+0x34c/0x380 RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202 PKRU: 55555554 Call Trace: <TASK> do_accept+0xeb/0x190 ? __x64_sys_pselect6+0x61/0x80 ? _raw_spin_unlock+0x12/0x30 ? alloc_fd+0x11e/0x190 __sys_accept4+0x8c/0x100 __x64_sys_accept+0x1f/0x30 x64_sys_call+0x202f/0x20f0 do_syscall_64+0x72/0x9a0 ? switch_fpu_return+0x60/0xf0 ? irqentry_exit_to_user_mode+0xdb/0x1e0 ? irqentry_exit+0x3f/0x50 ? clear_bhb_loop+0x50/0xa0 ? clear_bhb_loop+0x50/0xa0 ? clear_bhb_loop+0x50/0xa0 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> ---[ end trace 0000000000000000 ]--- [2]: https://go-review.googlesource.com/c/go/+/607715 ==================== Link: https://patch.msgid.link/20251111060307.194196-1-jiayuan.chen@linux.dev Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2025-11-13selftests/bpf: Add mptcp test with sockmapJiayuan Chen2-0/+183
Add test cases to verify that when MPTCP falls back to plain TCP sockets, they can properly work with sockmap. Additionally, add test cases to ensure that sockmap correctly rejects MPTCP sockets as expected. Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251111060307.194196-4-jiayuan.chen@linux.dev
2025-11-13mptcp: Fix proto fallback detection with BPFJiayuan Chen1-2/+4
The sockmap feature allows bpf syscall from userspace, or based on bpf sockops, replacing the sk_prot of sockets during protocol stack processing with sockmap's custom read/write interfaces. ''' tcp_rcv_state_process() syn_recv_sock()/subflow_syn_recv_sock() tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB) bpf_skops_established <== sockops bpf_sock_map_update(sk) <== call bpf helper tcp_bpf_update_proto() <== update sk_prot ''' When the server has MPTCP enabled but the client sends a TCP SYN without MPTCP, subflow_syn_recv_sock() performs a fallback on the subflow, replacing the subflow sk's sk_prot with the native sk_prot. ''' subflow_syn_recv_sock() subflow_ulp_fallback() subflow_drop_ctx() mptcp_subflow_ops_undo_override() ''' Then, this subflow can be normally used by sockmap, which replaces the native sk_prot with sockmap's custom sk_prot. The issue occurs when the user executes accept::mptcp_stream_accept::mptcp_fallback_tcp_ops(). Here, it uses sk->sk_prot to compare with the native sk_prot, but this is incorrect when sockmap is used, as we may incorrectly set sk->sk_socket->ops. This fix uses the more generic sk_family for the comparison instead. Additionally, this also prevents a WARNING from occurring: result from ./scripts/decode_stacktrace.sh: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 337 at net/mptcp/protocol.c:68 mptcp_stream_accept \ (net/mptcp/protocol.c:4005) Modules linked in: ... PKRU: 55555554 Call Trace: <TASK> do_accept (net/socket.c:1989) __sys_accept4 (net/socket.c:2028 net/socket.c:2057) __x64_sys_accept (net/socket.c:2067) x64_sys_call (arch/x86/entry/syscall_64.c:41) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) RIP: 0033:0x7f87ac92b83d ---[ end trace 0000000000000000 ]--- Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20251111060307.194196-3-jiayuan.chen@linux.dev
2025-11-13Merge branch 'pm-sleep'Rafael J. Wysocki1-9/+13
Merge fixes for issues related to the handling of compressed hibernation images that were introduced during the 6.9 development cycle. * pm-sleep: PM: hibernate: Fix style issues in save_compressed_image() PM: hibernate: Use atomic64_t for compressed_size variable PM: hibernate: Emit an error when image writing fails
2025-11-13Merge tag 'slab-for-6.18-rc6' of ↵Linus Torvalds1-2/+6
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fix from Vlastimil Babka: - Fix memory leak of objects from remote NUMA node when bulk freeing to a cache with sheaves (Harry Yoo) * tag 'slab-for-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slub: fix memory leak in free_to_pcs_bulk()
2025-11-13Merge branches 'acpi-cppc' and 'acpi-tables'Rafael J. Wysocki3-14/+37
Merge ACPI CPPC library fixes and an ACPI MRRM table parser fix for 6.18-rc6. * acpi-cppc: ACPI: CPPC: Limit perf ctrs in PCC check only to online CPUs ACPI: CPPC: Perform fast check switch only for online CPUs ACPI: CPPC: Check _CPC validity for only the online CPUs ACPI: CPPC: Detect preferred core availability on online CPUs * acpi-tables: ACPI: MRRM: Fix memory leaks and improve error handling
2025-11-13Merge tag 'linux_kselftest-fixes-6.18-rc6' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fix from Shuah Khan: "Fixes event-filter-function.tc tracing test failure caused when a first run to sample events triggers kmem_cache_free which interferes with the rest of the test. Fix this by calling sample_events twice to eliminate the kmem_cache_free related noise from the sampling" * tag 'linux_kselftest-fixes-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/tracing: Run sample events to clear page cache events
2025-11-13Merge tag 'net-6.18-rc6' of ↵Linus Torvalds65-312/+1202
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from Bluetooth and Wireless. No known outstanding regressions. Current release - regressions: - eth: - bonding: fix mii_status when slave is down - mlx5e: fix missing error assignment in mlx5e_xfrm_add_state() Previous releases - regressions: - sched: limit try_bulk_dequeue_skb() batches - ipv4: route: prevent rt_bind_exception() from rebinding stale fnhe - af_unix: initialise scc_index in unix_add_edge() - netpoll: fix incorrect refcount handling causing incorrect cleanup - bluetooth: don't hold spin lock over sleeping functions - hsr: Fix supervision frame sending on HSRv0 - sctp: prevent possible shift out-of-bounds - tipc: fix use-after-free in tipc_mon_reinit_self(). - dsa: tag_brcm: do not mark link local traffic as offloaded - eth: virtio-net: fix incorrect flags recording in big mode Previous releases - always broken: - sched: initialize struct tc_ife to fix kernel-infoleak - wifi: - mac80211: reject address change while connecting - iwlwifi: avoid toggling links due to wrong element use - bluetooth: cancel mesh send timer when hdev removed - strparser: fix signed/unsigned mismatch bug - handshake: fix memory leak in tls_handshake_accept() Misc: - selftests: mptcp: fix some flaky tests" * tag 'net-6.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits) hsr: Follow standard for HSRv0 supervision frames hsr: Fix supervision frame sending on HSRv0 virtio-net: fix incorrect flags recording in big mode ipv4: route: Prevent rt_bind_exception() from rebinding stale fnhe wifi: iwlwifi: mld: always take beacon ies in link grading wifi: iwlwifi: mvm: fix beacon template/fixed rate wifi: iwlwifi: fix aux ROC time event iterator usage net_sched: limit try_bulk_dequeue_skb() batches selftests: mptcp: join: properly kill background tasks selftests: mptcp: connect: trunc: read all recv data selftests: mptcp: join: userspace: longer transfer selftests: mptcp: join: endpoints: longer transfer selftests: mptcp: join: rm: set backup flag selftests: mptcp: connect: fix fallback note due to OoO ethtool: fix incorrect kernel-doc style comment in ethtool.h mlx5: Fix default values in create CQ Bluetooth: btrtl: Avoid loading the config file on security chips net/mlx5e: Fix potentially misleading debug message net/mlx5e: Fix wraparound in rate limiting for values above 255 Gbps net/mlx5e: Fix maxrate wraparound in threshold between units ...
2025-11-13mm/slub: fix memory leak in free_to_pcs_bulk()Harry Yoo1-2/+6
The commit 989b09b73978 ("slab: skip percpu sheaves for remote object freeing") introduced the remote_objects array in free_to_pcs_bulk() to skip sheaves when objects from a remote node are freed. However, the array is flushed only when: 1) the array becomes full (++remote_nr >= PCS_BATCH_MAX), or 2) slab_free_hook() returns false and size becomes zero. When neither of the conditions is met, objects in the array are leaked. This resulted in a memory leak [1], where 82 GiB of memory was allocated for the maple_node cache. Flush the array after successfully freeing objects to sheaves in the do_free: path. In the meantime, move the snippet if (!size) goto flush_remote; outside the while loop for readability. Let's say all objects in the array are from a remote node: then we acquire s->cpu_sheaves->lock and try to free an object even when size is zero. This doesn't appear to be harmful, but isn't really readable. Reported-by: Tytus Rogalewski <admin@simplepod.ai> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220765 [1] Closes: https://lore.kernel.org/linux-mm/20251107094809.12e9d705b7bf4815783eb184@linux-foundation.org Closes: https://lore.kernel.org/all/aRGDTwbt2EIz2CYn@hyeyoo Fixes: 989b09b73978 ("slab: skip percpu sheaves for remote object freeing") Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20251111125331.12246-1-harry.yoo@oracle.com Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Tested-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Tytus Rogalewski <admin@simplepod.ai> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-13Merge branch 'percpu_hash-maps'Alexei Starovoitov3-2/+124
Leon Hwang says: ==================== In the discussion thread "[PATCH bpf-next v9 0/7] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps"[1], it was pointed out that missing calls to bpf_obj_free_fields() could lead to memory leaks. A selftest was added to confirm that this is indeed a real issue - the refcount of BPF_KPTR_REF field is not decremented when bpf_obj_free_fields() is missing after copy_map_value[,_long](). Further inspection of copy_map_value[,_long]() call sites revealed two locations affected by this issue: 1. pcpu_copy_value() 2. htab_map_update_elem() when used with BPF_F_LOCK Similar case happens when update local storage maps with BPF_F_LOCK. This series fixes the cases where BPF_F_LOCK is not involved by properly calling bpf_obj_free_fields() after copy_map_value[,_long](), and adds a selftest to verify the fix. The remaining cases involving BPF_F_LOCK will be addressed in a separate patch set after the series "bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps" is applied. Changes: v5 -> v6: * Update the test name to include "refcounted_kptr". * Update some local variables' name in the test (per Alexei). * v5: https://lore.kernel.org/bpf/20251104142714.99878-1-leon.hwang@linux.dev/ v4 -> v5: * Use a local variable to store the this_cpu_ptr()/per_cpu_ptr() result, and reuse it between copy_map_value[,_long]() and bpf_obj_free_fields() in patch #1 (per Andrii). * Drop patch #2 and #3, because the combination of BPF_F_LOCK with other special fields (except for BPF_SPIN_LOCK) will be disallowed on the UAPI side in the future (per Alexei). * v4: https://lore.kernel.org/bpf/20251030152451.62778-1-leon.hwang@linux.dev/ v3 -> v4: * Target bpf-next tree. * Address comments from Amery: * Drop 'bpf_obj_free_fields()' in the path of updating local storage maps without BPF_F_LOCK. * Drop the corresponding self test. * Respin the other test of local storage maps using syscall BPF programs. * v3: https://lore.kernel.org/bpf/20251026154000.34151-1-leon.hwang@linux.dev/ v2 -> v3: * Free special fields when update local storage maps without BPF_F_LOCK. * Add test to verify decrementing refcount when update cgroup local storage maps without BPF_F_LOCK. * Address review from AI bot: * Slow path with BPF_F_LOCK (around line 642-646) in 'bpf_local_storage.c'. * v2: https://lore.kernel.org/bpf/20251020164608.20536-1-leon.hwang@linux.dev/ v1 -> v2: * Add test to verify decrementing refcount when update cgroup local storage maps with BPF_F_LOCK. * Address review from AI bot: * Fast path without bucket lock (around line 610) in 'bpf_local_storage.c'. * v1: https://lore.kernel.org/bpf/20251016145801.47552-1-leon.hwang@linux.dev/ ==================== Link: https://patch.msgid.link/20251105151407.12723-1-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-13mptcp: Disallow MPTCP subflows from sockmapJiayuan Chen1-0/+8
The sockmap feature allows bpf syscall from userspace, or based on bpf sockops, replacing the sk_prot of sockets during protocol stack processing with sockmap's custom read/write interfaces. ''' tcp_rcv_state_process() subflow_syn_recv_sock() tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB) bpf_skops_established <== sockops bpf_sock_map_update(sk) <== call bpf helper tcp_bpf_update_proto() <== update sk_prot ''' Consider two scenarios: 1. When the server has MPTCP enabled and the client also requests MPTCP, the sk passed to the BPF program is a subflow sk. Since subflows only handle partial data, replacing their sk_prot is meaningless and will cause traffic disruption. 2. When the server has MPTCP enabled but the client sends a TCP SYN without MPTCP, subflow_syn_recv_sock() performs a fallback on the subflow, replacing the subflow sk's sk_prot with the native sk_prot. ''' subflow_ulp_fallback() subflow_drop_ctx() mptcp_subflow_ops_undo_override() ''' Subsequently, accept::mptcp_stream_accept::mptcp_fallback_tcp_ops() converts the subflow to plain TCP. For the first case, we should prevent it from being combined with sockmap by setting sk_prot->psock_update_sk_prot to NULL, which will be blocked by sockmap's own flow. For the second case, since subflow_syn_recv_sock() has already restored sk_prot to native tcp_prot/tcpv6_prot, no further action is needed. Fixes: cec37a6e41aa ("mptcp: Handle MP_CAPABLE options for outgoing connections") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20251111060307.194196-2-jiayuan.chen@linux.dev
2025-11-13selftests/bpf: Add test to verify freeing the special fields in pcpu mapsLeon Hwang2-0/+116
Add test to verify that updating [lru_,]percpu_hash maps decrements refcount when BPF_KPTR_REF objects are involved. The tests perform the following steps: . Call update_elem() to insert an initial value. . Use bpf_refcount_acquire() to increment the refcount. . Store the node pointer in the map value. . Add the node to a linked list. . Probe-read the refcount and verify it is *2*. . Call update_elem() again to trigger refcount decrement. . Probe-read the refcount and verify it is *1*. Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20251105151407.12723-3-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-13bpf: Free special fields when update [lru_,]percpu_hash mapsLeon Hwang1-2/+8
As [lru_,]percpu_hash maps support BPF_KPTR_{REF,PERCPU}, missing calls to 'bpf_obj_free_fields()' in 'pcpu_copy_value()' could cause the memory referenced by BPF_KPTR_{REF,PERCPU} fields to be held until the map gets freed. Fix this by calling 'bpf_obj_free_fields()' after 'copy_map_value[,_long]()' in 'pcpu_copy_value()'. Fixes: 65334e64a493 ("bpf: Support kptrs in percpu hashmap and percpu LRU hashmap") Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20251105151407.12723-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-13Merge branch 'hsr-send-correct-hsrv0-supervision-frames'Paolo Abeni2-8/+19
Felix Maurer says: ==================== hsr: Send correct HSRv0 supervision frames Hangbin recently reported that the hsr selftests were failing and noted that the entries in the node table were not merged, i.e., had 00:00:00:00:00:00 as MacAddressB forever [1]. This failure only occured with HSRv0 because it was not sending supervision frames anymore. While debugging this I found that we were not really following the HSRv0 standard for the supervision frames we sent, so I additionally made a few changes to get closer to the standard and restore a more correct behavior we had a while ago. The selftests can still fail because they take a while and run into the timeout. I did not include a change of the timeout because I have more improvements to the selftests mostly ready that change the test duration but are net-next material. [1]: https://lore.kernel.org/netdev/aMONxDXkzBZZRfE5@fedora/ ==================== Link: https://patch.msgid.link/cover.1762876095.git.fmaurer@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-13hsr: Follow standard for HSRv0 supervision framesFelix Maurer2-8/+16
For HSRv0, the path_id has the following meaning: - 0000: PRP supervision frame - 0001-1001: HSR ring identifier - 1010-1011: Frames from PRP network (A/B, with RedBoxes) - 1111: HSR supervision frame Follow the IEC 62439-3:2010 standard more closely by setting the right path_id for HSRv0 supervision frames (actually, it is correctly set when the frame is constructed, but hsr_set_path_id() overwrites it) and set a fixed HSR ring identifier of 1. The ring identifier seems to be generally unused and we ignore it anyways on reception, but some fixed identifier is definitely better than using one identifier in one direction and a wrong identifier in the other. This was also the behavior before commit f266a683a480 ("net/hsr: Better frame dispatch") which introduced the alternating path_id. This was later moved to hsr_set_path_id() in commit 451d8123f897 ("net: prp: add packet handling support"). The IEC 62439-3:2010 also contains 6 unused bytes after the MacAddressA in the HSRv0 supervision frames. Adjust a TODO comment accordingly. Fixes: f266a683a480 ("net/hsr: Better frame dispatch") Fixes: 451d8123f897 ("net: prp: add packet handling support") Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/ea0d5133cd593856b2fa673d6e2067bf1d4d1794.1762876095.git.fmaurer@redhat.com Tested-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-13hsr: Fix supervision frame sending on HSRv0Felix Maurer1-0/+3
On HSRv0, no supervision frames were sent. The supervison frames were generated successfully, but failed the check for a sufficiently long mac header, i.e., at least sizeof(struct hsr_ethhdr), in hsr_fill_frame_info() because the mac header only contained the ethernet header. Fix this by including the HSR header in the mac header when generating HSR supervision frames. Note that the mac header now also includes the TLV fields. This matches how we set the headers on rx and also the size of struct hsrv0_ethhdr_sp. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Closes: https://lore.kernel.org/netdev/aMONxDXkzBZZRfE5@fedora/ Fixes: 9cfb5e7f0ded ("net: hsr: fix hsr_init_sk() vs network/transport headers.") Signed-off-by: Felix Maurer <fmaurer@redhat.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/4354114fea9a642fe71f49aeeb6c6159d1d61840.1762876095.git.fmaurer@redhat.com Tested-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-13drm/client: fix MODULE_PARM_DESC string for "active"Randy Dunlap1-2/+2
The MODULE_PARM_DESC string for the "active" parameter is missing a space and has an extraneous trailing ']' character. Correct these. Before patch: $ modinfo -p ./drm_client_lib.ko active:Choose which drm client to start, default isfbdev] (string) After patch: $ modinfo -p ./drm_client_lib.ko active:Choose which drm client to start, default is fbdev (string) Fixes: f7b42442c4ac ("drm/log: Introduce a new boot logger to draw the kmsg on the screen") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20251112010920.2355712-1-rdunlap@infradead.org
2025-11-13Merge tag 'erofs-for-6.18-rc6-fixes' of ↵Linus Torvalds2-4/+8
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Add Chunhai Guo as a EROFS reviewer to get more eyes from interested industry vendors - Fix infinite loop caused by incomplete crafted zstd-compressed data (thanks to Robert again!) * tag 'erofs-for-6.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: avoid infinite loop due to incomplete zstd-compressed data MAINTAINERS: erofs: add myself as reviewer
2025-11-13Merge tag 'v6.18-rc5-smb-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds2-2/+17
Pull smb server fixes from Steve French: - Fix smbdirect (RDMA) disconnect hang bug - Fix potential Denial of Service when connection limit exceeded - Fix smbdirect (RDMA) connection (potentially accessing freed memory) bug * tag 'v6.18-rc5-smb-server-fixes' of git://git.samba.org/ksmbd: smb: server: let smb_direct_disconnect_rdma_connection() turn CREATED into DISCONNECTED ksmbd: close accepted socket when per-IP limit rejects connection smb: server: rdma: avoid unmapping posted recv on accept failure
2025-11-13PCI/ASPM: Avoid L0s and L1 on Hi1105 [19e5:1105] Wi-FiShawn Lin1-0/+1
This Wi-Fi advertises the L0s and L1 capabilities but actually it doesn't support them. This is confirmed by HiSilicon team in actual productization. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/1762916319-139532-1-git-send-email-shawn.lin@rock-chips.com
2025-11-13virtio-net: fix incorrect flags recording in big modeXuan Zhuo1-5/+11
The purpose of commit 703eec1b2422 ("virtio_net: fixing XDP for fully checksummed packets handling") is to record the flags in advance, as their value may be overwritten in the XDP case. However, the flags recorded under big mode are incorrect, because in big mode, the passed buf does not point to the rx buffer, but rather to the page of the submitted buffer. This commit fixes this issue. For the small mode, the commit c11a49d58ad2 ("virtio_net: Fix mismatched buf address when unmapping for small packets") fixed it. Tested-by: Alyssa Ross <hi@alyssa.is> Fixes: 703eec1b2422 ("virtio_net: fixing XDP for fully checksummed packets handling") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20251111090828.23186-1-xuanzhuo@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-12Merge tag 'nfsd-6.18-3' of ↵Linus Torvalds6-28/+58
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: "Address recently reported issues or issues found at the recent NFS bake-a-thon held in Raleigh, NC. Issues reported with v6.18-rc: - Address a kernel build issue - Reorder SEQUENCE processing to avoid spurious NFS4ERR_SEQ_MISORDERED Issues that need expedient stable backports: - Close a refcount leak exposure - Report support for NFSv4.2 CLONE correctly - Fix oops during COPY_NOTIFY processing - Prevent rare crash after XDR encoding failure - Prevent crash due to confused or malicious NFSv4.1 client" * tag 'nfsd-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: Revert "SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it" nfsd: ensure SEQUENCE replay sends a valid reply. NFSD: Never cache a COMPOUND when the SEQUENCE operation fails NFSD: Skip close replay processing if XDR encoding fails NFSD: free copynotify stateid in nfs4_free_ol_stateid() nfsd: add missing FATTR4_WORD2_CLONE_BLKSIZE from supported attributes nfsd: fix refcount leak in nfsd_set_fh_dentry()
2025-11-12Merge tag 'dma-mapping-6.18-2025-11-12' of ↵Linus Torvalds2-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: - two minor fixes for DMA API infrastructure: restoring proper structure padding used in benchmark tests (Qinxin Xia) and global DMA_BIT_MASK macro rework to make it a bit more clang friendly (James Clark) * tag 'dma-mapping-6.18-2025-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-mapping: Allow use of DMA_BIT_MASK(64) in global scope dma-mapping: benchmark: Restore padding to ensure uABI remained consistent
2025-11-12Merge tag 'loongarch-fixes-6.18-1' of ↵Linus Torvalds24-82/+59
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: - Fix a Rust build error - Fix exception/interrupt, memory management, perf event, hardware breakpoint, kexec and KVM bugs * tag 'loongarch-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: KVM: Fix max supported vCPUs set with EIOINTC LoongArch: KVM: Skip PMU checking on vCPU context switch LoongArch: KVM: Restore guest PMU if it is enabled LoongArch: KVM: Add delay until timer interrupt injected LoongArch: KVM: Set page with write attribute if dirty track disabled LoongArch: kexec: Print out debugging message if required LoongArch: kexec: Initialize the kexec_buf structure LoongArch: Use correct accessor to read FWPC/MWPC LoongArch: Refine the init_hw_perf_events() function LoongArch: Remove __GFP_HIGHMEM masking in pud_alloc_one() LoongArch: Let {pte,pmd}_modify() record the status of _PAGE_DIRTY LoongArch: Consolidate max_pfn & max_low_pfn calculation LoongArch: Consolidate early_ioremap()/ioremap_prot() LoongArch: Use physical addresses for CSR_MERRENTRY/CSR_TLBRENTRY LoongArch: Clarify 3 MSG interrupt features rust: Add -fno-isolate-erroneous-paths-dereference to bindgen_skip_c_flags
2025-11-12Merge tag 'alpha-fixes-v6.18-rc5' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha Pull alpha fix from Matt Turner: "Add Magnus as a maintainer of the alpha port" * tag 'alpha-fixes-v6.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha: MAINTAINERS: Add Magnus Lindholm as maintainer for alpha port
2025-11-12PCI/ASPM: Avoid L0s and L1 on PA Semi [1959:a002] Root PortsBjorn Helgaas1-0/+1
Christian reported that f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms") broke booting on the A-EON AmigaOne X1000. Override the L0s and L1 Support advertised in Link Capabilities by the X1000 Root Ports ([1959:a002]) so we don't try to enable those states. Fixes: f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms") Fixes: df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms") Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Link: https://lore.kernel.org/r/a41d2ca1-fcd9-c416-b111-a958e92e94bf@xenosoft.de Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-11-12PCI/ASPM: Avoid L0s and L1 on Freescale [1957:0451] Root PortsBjorn Helgaas1-0/+1
Christian reported that f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms") broke booting on the A-EON X5000. Override the L0s and L1 Support advertised in Link Capabilities by the X5000 Root Ports ([1957:0451]) so we don't try to enable those states. Fixes: f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms") Fixes: df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms") Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Link: https://lore.kernel.org/r/db5c95a1-cf3e-46f9-8045-a1b04908051a@xenosoft.de Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Shawn Lin <shawn.lin@rock-chips.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20251110222929.2140564-5-helgaas@kernel.org
2025-11-12PCI/ASPM: Convert quirks to override advertised link statesBjorn Helgaas1-20/+19
Existing quirks to disable ASPM L0s and L1 use pci_disable_link_state(), which disables ASPM states and prevents their use in the future. But since they are FINAL quirks, they happen after ASPM has already been enabled. Here's a typical call path: pci_host_probe pci_scan_root_bus_bridge pci_scan_child_bus pci_scan_slot pci_scan_single_device pci_device_add pci_fixup_device(pci_fixup_header) # HEADER quirks pcie_aspm_init_link_state pcie_config_aspm_path pcie_config_aspm_link pcie_config_aspm_dev # ASPM may be enabled pci_bus_add_devices pci_bus_add_devices pci_fixup_device(pci_fixup_final) # FINAL quirks quirk_disable_aspm_l0s pci_disable_link_state(dev, PCIE_LINK_STATE_L0S) Sometimes enabling ASPM can make the link non-functional, so if we know ASPM is broken on a device, we shouldn't enable it at all, even temporarily. Convert the existing quirks to use pcie_aspm_remove_cap() instead, which overrides the ASPM Support advertised in PCIe Link Capabilities, and make them HEADER quirks so they run before pcie_aspm_init_link_state() has a chance to enable ASPM. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Shawn Lin <shawn.lin@rock-chips.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20251110222929.2140564-4-helgaas@kernel.org
2025-11-12PCI/ASPM: Add pcie_aspm_remove_cap() to override advertised link statesBjorn Helgaas2-0/+15
Add pcie_aspm_remove_cap(). A quirk can use this to prevent use of ASPM L0s or L1 link states, even if the device advertised support for them. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Shawn Lin <shawn.lin@rock-chips.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20251110222929.2140564-3-helgaas@kernel.org
2025-11-12PCI/ASPM: Cache L0s/L1 Supported so advertised link states can be overriddenBjorn Helgaas3-8/+13
Defective devices sometimes advertise support for ASPM L0s or L1 states even if they don't work correctly. Cache the L0s Supported and L1 Supported bits early in enumeration so HEADER quirks can override the ASPM states advertised in Link Capabilities before pcie_aspm_cap_init() enables ASPM. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Shawn Lin <shawn.lin@rock-chips.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Link: https://patch.msgid.link/20251110222929.2140564-2-helgaas@kernel.org
2025-11-13ASoC: rsnd: fix OF node reference leak in rsnd_ssiu_probe()Haotian Zhang1-2/+1
rsnd_ssiu_probe() leaks an OF node reference obtained by rsnd_ssiu_of_node(). The node reference is acquired but never released across all return paths. Fix it by declaring the device node with the __free(device_node) cleanup construct to ensure automatic release when the variable goes out of scope. Fixes: 4e7788fb8018 ("ASoC: rsnd: add SSIU BUSIF support") Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Link: https://patch.msgid.link/20251112065709.1522-1-vulab@iscas.ac.cn Signed-off-by: Mark Brown <broonie@kernel.org>
2025-11-12acpi/hmat: Fix lockdep warning for hmem_register_resource()Dave Jiang1-21/+25
The following lockdep splat was observed while kernel auto-online a CXL memory region: ====================================================== WARNING: possible circular locking dependency detected 6.17.0djtest+ #53 Tainted: G W ------------------------------------------------------ systemd-udevd/3334 is trying to acquire lock: ffffffff90346188 (hmem_resource_lock){+.+.}-{4:4}, at: hmem_register_resource+0x31/0x50 but task is already holding lock: ffffffff90338890 ((node_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x2e/0x70 which lock already depends on the new lock. [..] Chain exists of: hmem_resource_lock --> mem_hotplug_lock --> (node_chain).rwsem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock((node_chain).rwsem); lock(mem_hotplug_lock); lock((node_chain).rwsem); lock(hmem_resource_lock); The lock ordering can cause potential deadlock. There are instances where hmem_resource_lock is taken after (node_chain).rwsem, and vice versa. Split out the target update section of hmat_register_target() so that hmat_callback() only envokes that section instead of attempt to register hmem devices that it does not need to. [ dj: Fix up comment to be closer to 80cols. (Jonathan) ] Fixes: cf8741ac57ed ("ACPI: NUMA: HMAT: Register "soft reserved" memory as an "hmem" device") Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Tested-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20251105235115.85062-3-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-11-12hwmon: (gpd-fan) initialize EC on driver load for Win 4Cryolitia PukNgae1-27/+25
The original implement will re-init the EC when it reports a zero value, and it's a workaround for the black box buggy firmware. Now a contributer test and report that, the bug is that, the firmware won't initialize the EC on boot, so the EC ramains in unusable status. And it won't need to re-init it during runtime. The original implement is not perfect, any write command will be ignored until we first read it. Just re-init it unconditionally when the driver load could work. Fixes: 0ab88e239439 ("hwmon: add GPD devices sensor driver") Co-developed-by: kylon <3252255+kylon@users.noreply.github.com> Signed-off-by: kylon <3252255+kylon@users.noreply.github.com> Link: https://github.com/Cryolitia/gpd-fan-driver/pull/20 Signed-off-by: Cryolitia PukNgae <cryolitia@uniontech.com> Link: https://lore.kernel.org/r/20251030-win4-v1-1-c374dcb86985@uniontech.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-11-12hwmon: (gpd-fan) Fix compilation error in non-ACPI buildsGopi Krishna Menon1-1/+1
Building gpd-fan driver without CONFIG_ACPI results in the following build errors: drivers/hwmon/gpd-fan.c: In function ‘gpd_ecram_read’: drivers/hwmon/gpd-fan.c:228:9: error: implicit declaration of function ‘outb’ [-Werror=implicit-function-declaration] 228 | outb(0x2E, addr_port); | ^~~~ drivers/hwmon/gpd-fan.c:241:16: error: implicit declaration of function ‘inb’ [-Werror=implicit-function-declaration] 241 | *val = inb(data_port); The definitions for inb() and outb() come from <linux/io.h> (specifically through <asm/io.h>), which is implicitly included via <acpi_io.h>. When CONFIG_ACPI is not set, <acpi_io.h> is not included resulting in <linux/io.h> to be omitted as well. Since the driver does not depend on ACPI, remove <linux/acpi.h> and add <linux/io.h> directly to fix the compilation errors. Signed-off-by: Gopi Krishna Menon <krishnagopi487@gmail.com> Link: https://lore.kernel.org/r/20251024202042.752160-1-krishnagopi487@gmail.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-11-12bpf: Adjust return value for queue destruction in rqspinlockKumar Kartikeya Dwivedi1-1/+1
Return -ETIMEDOUT whenever non-head waiters are signalled by head, and fix oversight in commit 7bd6e5ce5be6 ("rqspinlock: Disable queue destruction for deadlocks"). We no longer signal on deadlocks. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Reviewed-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20251111013827.1853484-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-12Merge tag 'wireless-2025-11-12' of ↵Jakub Kicinski9-41/+117
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Couple more fixes: - mwl8k: work around FW expecting a DSSS element in beacons - ath11k: report correct TX status - iwlwifi: avoid toggling links due to wrong element use - iwlwifi: fix beacon template rate on older devices - iwlwifi: fix loop iterator being used after loop - mac80211: disallow address changes while using the address - mac80211: avoid bad rate warning in monitor/sniffer mode - hwsim: fix potential NULL deref (on monitor injection) * tag 'wireless-2025-11-12' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: iwlwifi: mld: always take beacon ies in link grading wifi: iwlwifi: mvm: fix beacon template/fixed rate wifi: iwlwifi: fix aux ROC time event iterator usage wifi: mwl8k: inject DSSS Parameter Set element into beacons if missing wifi: mac80211_hwsim: Fix possible NULL dereference wifi: mac80211: skip rate verification for not captured PSDUs wifi: mac80211: reject address change while connecting wifi: ath11k: zero init info->status in wmi_process_mgmt_tx_comp() ==================== Link: https://patch.msgid.link/20251112114621.15716-5-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12cpufreq: intel_pstate: Check IDA only before MSR_IA32_PERF_CTL writesSrinivas Pandruvada1-5/+4
Commit ac4e04d9e378 ("cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode") introduced a check for feature X86_FEATURE_IDA to verify turbo mode support. Although this is the correct way to check for turbo mode support, it causes issues on some platforms that disable turbo during OS boot, but enable it later [1]. Before adding this feature check, users were able to get turbo mode frequencies by writing 0 to /sys/devices/system/cpu/intel_pstate/no_turbo post-boot. To restore the old behavior on the affected systems while still addressing the unchecked MSR issue on some Skylake-X systems, check X86_FEATURE_IDA only immediately before updates of MSR_IA32_PERF_CTL that may involve setting the Turbo Engage Bit (bit 32). Fixes: ac4e04d9e378 ("cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode") Reported-by: Aaron Rainbolt <arainbolt@kfocus.org> Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2122531 [1] Tested-by: Aaron Rainbolt <arainbolt@kfocus.org> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject adjustment, changelog edits ] Link: https://patch.msgid.link/20251111010840.141490-1-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-12io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecsCaleb Sander Mateos1-7/+9
io_buffer_register_bvec() currently uses blk_rq_nr_phys_segments() as the number of bvecs in the request. However, bvecs may be split into multiple segments depending on the queue limits. Thus, the number of segments may overestimate the number of bvecs. For ublk devices, the only current users of io_buffer_register_bvec(), virt_boundary_mask, seg_boundary_mask, max_segments, and max_segment_size can all be set arbitrarily by the ublk server process. Set imu->nr_bvecs based on the number of bvecs the rq_for_each_bvec() loop actually yields. However, continue using blk_rq_nr_phys_segments() as an upper bound on the number of bvecs when allocating imu to avoid needing to iterate the bvecs a second time. Link: https://lore.kernel.org/io-uring/20251111191530.1268875-1-csander@purestorage.com/ Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs") Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-12vfio: selftests: replace iova=vaddr with allocated iovasAlex Mastro2-5/+12
vfio_dma_mapping_test and vfio_pci_driver_test currently use iova=vaddr as part of DMA mapping operations. However, not all IOMMUs support the same virtual address width as the processor. For instance, older Intel consumer platforms only support 39-bits of IOMMU address space. On such platforms, using the virtual address as the IOVA fails. Make the tests more robust by using iova_allocator to vend IOVAs, which queries legally accessible IOVAs from the underlying IOMMUFD or VFIO container. Reviewed-by: David Matlack <dmatlack@google.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Alex Mastro <amastro@fb.com> Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-4-7960244642c5@fb.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-12vfio: selftests: add iova allocatorAlex Mastro2-1/+84
Add struct iova_allocator, which gives tests a convenient way to generate legally-accessible IOVAs to map. This allocator traverses the sorted available IOVA ranges linearly, requires power-of-two size allocations, and does not support freeing iova allocations. The assumption is that tests are not IOVA space-bounded, and will not need to recycle IOVAs. This is based on Alex Williamson's patch series for adding an IOVA allocator [1]. [1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/ Reviewed-by: David Matlack <dmatlack@google.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Alex Mastro <amastro@fb.com> Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-3-7960244642c5@fb.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-12vfio: selftests: fix map limit tests to use last available iovaAlex Mastro1-2/+13
Use the newly available vfio_pci_iova_ranges() to determine the last legal IOVA, and use this as the basis for vfio_dma_map_limit_test tests. Fixes: de8d1f2fd5a5 ("vfio: selftests: add end of address space DMA map/unmap tests") Reviewed-by: David Matlack <dmatlack@google.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Alex Mastro <amastro@fb.com> Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-2-7960244642c5@fb.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-12vfio: selftests: add iova range query helpersAlex Mastro2-1/+179
VFIO selftests need to map IOVAs from legally accessible ranges, which could vary between hardware. Tests in vfio_dma_mapping_test.c are making excessively strong assumptions about which IOVAs can be mapped. Add vfio_iommu_iova_ranges(), which queries IOVA ranges from the IOMMUFD or VFIO container associated with the device. The queried ranges are normalized to IOMMUFD's iommu_iova_range representation so that handling of IOVA ranges up the stack can be implementation-agnostic. iommu_iova_range and vfio_iova_range are equivalent, so bias to using the new interface's struct. Query IOMMUFD's ranges with IOMMU_IOAS_IOVA_RANGES. Query VFIO container's ranges with VFIO_IOMMU_GET_INFO and VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE. The underlying vfio_iommu_type1_info buffer-related functionality has been kept generic so the same helpers can be used to query other capability chain information, if needed. Reviewed-by: David Matlack <dmatlack@google.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Alex Mastro <amastro@fb.com> Link: https://lore.kernel.org/r/20251111-iova-ranges-v3-1-7960244642c5@fb.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-12ipv4: route: Prevent rt_bind_exception() from rebinding stale fnheChuang Wang1-0/+5
The sit driver's packet transmission path calls: sit_tunnel_xmit() -> update_or_create_fnhe(), which lead to fnhe_remove_oldest() being called to delete entries exceeding FNHE_RECLAIM_DEPTH+random. The race window is between fnhe_remove_oldest() selecting fnheX for deletion and the subsequent kfree_rcu(). During this time, the concurrent path's __mkroute_output() -> find_exception() can fetch the soon-to-be-deleted fnheX, and rt_bind_exception() then binds it with a new dst using a dst_hold(). When the original fnheX is freed via RCU, the dst reference remains permanently leaked. CPU 0 CPU 1 __mkroute_output() find_exception() [fnheX] update_or_create_fnhe() fnhe_remove_oldest() [fnheX] rt_bind_exception() [bind dst] RCU callback [fnheX freed, dst leak] This issue manifests as a device reference count leak and a warning in dmesg when unregistering the net device: unregister_netdevice: waiting for sitX to become free. Usage count = N Ido Schimmel provided the simple test validation method [1]. The fix clears 'oldest->fnhe_daddr' before calling fnhe_flush_routes(). Since rt_bind_exception() checks this field, setting it to zero prevents the stale fnhe from being reused and bound to a new dst just before it is freed. [1] ip netns add ns1 ip -n ns1 link set dev lo up ip -n ns1 address add 192.0.2.1/32 dev lo ip -n ns1 link add name dummy1 up type dummy ip -n ns1 route add 192.0.2.2/32 dev dummy1 ip -n ns1 link add name gretap1 up arp off type gretap \ local 192.0.2.1 remote 192.0.2.2 ip -n ns1 route add 198.51.0.0/16 dev gretap1 taskset -c 0 ip netns exec ns1 mausezahn gretap1 \ -A 198.51.100.1 -B 198.51.0.0/16 -t udp -p 1000 -c 0 -q & taskset -c 2 ip netns exec ns1 mausezahn gretap1 \ -A 198.51.100.1 -B 198.51.0.0/16 -t udp -p 1000 -c 0 -q & sleep 10 ip netns pids ns1 | xargs kill ip netns del ns1 Cc: stable@vger.kernel.org Fixes: 67d6d681e15b ("ipv4: make exception cache less predictible") Signed-off-by: Chuang Wang <nashuiliang@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251111064328.24440-1-nashuiliang@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12drm/i915/dp_mst: Disable Panel ReplayImre Deak1-0/+4
Disable Panel Replay on MST links until it's properly implemented. For instance the required VSC SDP is not programmed on MST and FEC is not enabled if Panel Replay is enabled. Fixes: 3257e55d3ea7 ("drm/i915/panelreplay: enable/disable panel replay") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15174 Cc: Jouni Högander <jouni.hogander@intel.com> Cc: Animesh Manna <animesh.manna@intel.com> Cc: stable@vger.kernel.org # v6.8+ Reviewed-by: Jouni Högander <jouni.hogander@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patch.msgid.link/20251107124141.911895-1-imre.deak@intel.com (cherry picked from commit e109f644b871df8440c886a69cdce971ed533088) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-11-12ALSA: hda/tas2781: Correct the wrong project IDBaojun Xu1-3/+3
The project hardware ID should be ALC287_FIXUP_TXNW2781_I2C, not ALC287_FIXUP_TAS2781_I2C for HP Lampass projects. Fixes: 7a39c723b747 ("ALSA: hda/tas2781: Add new quirk for HP new projects") Signed-off-by: Baojun Xu <baojun.xu@ti.com> Link: https://patch.msgid.link/20251112092609.15865-1-baojun.xu@ti.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-11-12Merge tag 'iwlwifi-fixes-2025-11-12' of ↵Johannes Berg4-26/+20
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next Miri Korenblit says: ==================== iwlwifi fixes: - avoid link toggling - fix beacon template rate - don't use iterator outside the loop ==================== Link: https://patch.msgid.link/DM3PPF63A6024A9E52FF4A7B23F283B7FC7A3CCA@DM3PPF63A6024A9.namprd11.prod.outlook.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-12wifi: iwlwifi: mld: always take beacon ies in link gradingMiri Korenblit1-6/+1
One of the factors of a link's grade is the channel load, which is calculated from the AP's bss load element. The current code takes this element from the beacon for an active link, and from bss->ies for an inactive link. bss->ies is set to either the beacon's ies or to the probe response ones, with preference to the probe response (meaning that if there was even one probe response, the ies of it will be stored in bss->ies and won't be overiden by the beacon ies). The probe response can be very old, i.e. from the connection time, where a beacon is updated before each link selection (which is triggered only after a passive scan). In such case, the bss load element in the probe response will not include the channel load caused by the STA, where the beacon will. This will cause the inactive link to always have a lower channel load, and therefore an higher grade than the active link's one. This causes repeated link switches, causing the throughput to drop. Fix this by always taking the ies from the beacon, as those are for sure new. Fixes: d1e879ec600f ("wifi: iwlwifi: add iwlmld sub-driver") Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20251110145652.b493dbb1853a.I058ba7309c84159f640cc9682d1bda56dd56a536@changeid Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-11-12wifi: iwlwifi: mvm: fix beacon template/fixed rateJohannes Berg2-13/+12
During the development of the rate changes, I evidently made some changes that shouldn't have been there; beacon templates with rate_n_flags are only in old versions, so no changes to them should have been necessary, and evidently broke on some devices. This also would have broken fixed (injection) rates, it would seem. Restore the old handling of this. Fixes: dabc88cb3b78 ("wifi: iwlwifi: handle v3 rates") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220558 Reviewed-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20251008112044.3bb8ea849d8d.I90f4d2b2c1f62eaedaf304a61d2ab9e50c491c2d@changeid Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-11-12wifi: iwlwifi: fix aux ROC time event iterator usageJunjie Cao1-7/+7
The list_for_each_entry() iterator must not be used outside the loop. Even though we break and check for NULL, doing so still violates kernel iteration rules and triggers Coccinelle's use_after_iter.cocci warning. Cache the matched entry in aux_roc_te and use it consistently after the loop. This follows iterator best practices, resolves the warning, and makes the code more maintainable. Signed-off-by: Junjie Cao <junjie.cao@intel.com> Link: https://patch.msgid.link/20251016014919.383565-1-junjie.cao@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-11-11drm/amdkfd: Fix GPU mappings for APU after prefetchHarish Kasiviswanathan1-0/+2
Fix the following corner case:- Consider a 2M huge page SVM allocation, followed by prefetch call for the first 4K page. The whole range is initially mapped with single PTE. After the prefetch, this range gets split to first page + rest of the pages. Currently, the first page mapping is not updated on MI300A (APU) since page hasn't migrated. However, after range split PTE mapping it not valid. Fix this by forcing page table update for the whole range when prefetch is called. Calling prefetch on APU doesn't improve performance. If all it deteriotes. However, functionality has to be supported. v2: Use apu_prefer_gtt as this issue doesn't apply to APUs with carveout VRAM v3: Simplify by setting the flag for all ASICs as it doesn't affect dGPU v4: Remove v2 and v3 changes. Force update_mapping when range is split at a size that is not aligned to prange granularity Suggested-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Philip Yang<Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 076470b9f6f8d9c7c8ca73a9f054942a686f9ba7)
2025-11-11drm/amdkfd: relax checks for over allocation of save areaJonathan Kim1-6/+6
Over allocation of save area is not fatal, only under allocation is. ROCm has various components that independently claim authority over save area size. Unless KFD decides to claim single authority, relax size checks. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 15bd4958fe38e763bc17b607ba55155254a01f55) Cc: stable@vger.kernel.org
2025-11-11drm/amdgpu/jpeg: Add parse_cs for JPEG5_0_1Sathishkumar S1-0/+1
enable parse_cs callback for JPEG5_0_1. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 547985579932c1de13f57f8bcf62cd9361b9d3d3) Cc: stable@vger.kernel.org
2025-11-11drm/amd/amdgpu: Ensure isp_kernel_buffer_alloc() creates a new BOSultan Alsawaf1-0/+2
When the BO pointer provided to amdgpu_bo_create_kernel() points to non-NULL, amdgpu_bo_create_kernel() takes it as a hint to pin that address rather than allocate a new BO. This functionality is never desired for allocating ISP buffers. A new BO should always be created when isp_kernel_buffer_alloc() is called, per the description for isp_kernel_buffer_alloc(). Ensure this by zeroing *bo right before the amdgpu_bo_create_kernel() call. Fixes: 55d42f616976 ("drm/amd/amdgpu: Add helper functions for isp buffers") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 73c8c29baac7f0c7e703d92eba009008cbb5228e)
2025-11-11drm/amd/display: Allow VRR params change if unsynced with the streamIvan Lipski1-0/+11
[Why] When changing resolution (e.g., 4K → FHD) in mirror/clone mode with certain monitors, the monitor blanks and loses connection due to an early exit in vrr_settings_require_update(). The function only checks if VRR state, fixed refresh target, or min/max refresh rate range has changed. During mode changes, if the calculated min/max refresh values remain the same even though the stream's v_total changed, the function returns early without updating vrr_params.adjust.v_total_min/max, leaving the monitor's VRR timing parameters unsynced with the new mode, causing it to blank out. [How] Explicitly adjust VRR parameters to the stream's nominal v_total when VRR is supported, but inactive. Fixes: 6d31602a9f57 ("drm/amd/display: more liberal vmin/vmax update for freesync") Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 607df8248a011524211ee34850345305a1913f9e)
2025-11-11drm/amdgpu: fix lock warning in amdgpu_userq_fence_driver_processJesse.Zhang1-2/+3
Fix a potential deadlock caused by inconsistent spinlock usage between interrupt and process contexts in the userq fence driver. The issue occurs when amdgpu_userq_fence_driver_process() is called from both: - Interrupt context: gfx_v11_0_eop_irq() -> amdgpu_userq_fence_driver_process() - Process context: amdgpu_eviction_fence_suspend_worker() -> amdgpu_userq_fence_driver_force_completion() -> amdgpu_userq_fence_driver_process() In interrupt context, the spinlock was acquired without disabling interrupts, leaving it in {IN-HARDIRQ-W} state. When the same lock is acquired in process context, the kernel detects inconsistent locking since the process context acquisition would enable interrupts while holding a lock previously acquired in interrupt context. Kernel log shows: [ 4039.310790] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. [ 4039.310804] kworker/7:2/409 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 4039.310818] ffff9284e1bed000 (&fence_drv->fence_list_lock){?...}-{3:3}, [ 4039.310993] {IN-HARDIRQ-W} state was registered at: [ 4039.311004] lock_acquire+0xc6/0x300 [ 4039.311018] _raw_spin_lock+0x39/0x80 [ 4039.311031] amdgpu_userq_fence_driver_process.part.0+0x30/0x180 [amdgpu] [ 4039.311146] amdgpu_userq_fence_driver_process+0x17/0x30 [amdgpu] [ 4039.311257] gfx_v11_0_eop_irq+0x132/0x170 [amdgpu] Fix by using spin_lock_irqsave()/spin_unlock_irqrestore() to properly manage interrupt state regardless of calling context. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ded3ad780cf97a04927773c4600823b84f7f3cc2) Cc: stable@vger.kernel.org
2025-11-11drm/amdgpu: jump to the correct label on failurePierre-Eric Pelloux-Prayer1-1/+1
drm_sched_entity_init wasn't called yet, so the only thing to do is to release allocated memory. This doesn't fix any bug since entity is zero allocated and drm_sched_entity_fini does nothing in this case. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ec49374ccb8da86b465beaf09c367f3dfd648d8f)
2025-11-11drm/amdgpu: disable peer-to-peer access for DCC-enabled GC12 VRAM surfacesVitaly Prosyak1-0/+12
Certain multi-GPU configurations (especially GFX12) may hit data corruption when a DCC-compressed VRAM surface is shared across GPUs using peer-to-peer (P2P) DMA transfers. Such surfaces rely on device-local metadata and cannot be safely accessed through a remote GPU’s page tables. Attempting to import a DCC-enabled surface through P2P leads to incorrect rendering or GPU faults. This change disables P2P for DCC-enabled VRAM buffers that are contiguous and allocated on GFX12+ hardware. In these cases, the importer falls back to the standard system-memory path, avoiding invalid access to compressed surfaces. Future work could consider optional migration (VRAM→System→VRAM) if a performance regression is observed when `attach->peer2peer = false`. Tested on: - Dual RX 9700 XT (Navi4x) setup - GNOME and Wayland compositor scenarios - Confirmed no corruption after disabling P2P under these conditions v2: Remove check TTM_PL_VRAM & TTM_PL_FLAG_CONTIGUOUS. v3: simplify for upsteam and fix ip version check (Alex) Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9dff2bb709e6fbd97e263fd12bf12802d2b5a0cf) Cc: stable@vger.kernel.org
2025-11-11net_sched: limit try_bulk_dequeue_skb() batchesEric Dumazet1-7/+10
After commit 100dfa74cad9 ("inet: dev_queue_xmit() llist adoption") I started seeing many qdisc requeues on IDPF under high TX workload. $ tc -s qd sh dev eth1 handle 1: ; sleep 1; tc -s qd sh dev eth1 handle 1: qdisc mq 1: root Sent 43534617319319 bytes 268186451819 pkt (dropped 0, overlimits 0 requeues 3532840114) backlog 1056Kb 6675p requeues 3532840114 qdisc mq 1: root Sent 43554665866695 bytes 268309964788 pkt (dropped 0, overlimits 0 requeues 3537737653) backlog 781164b 4822p requeues 3537737653 This is caused by try_bulk_dequeue_skb() being only limited by BQL budget. perf record -C120-239 -e qdisc:qdisc_dequeue sleep 1 ; perf script ... netperf 75332 [146] 2711.138269: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1292 skbaddr=0xff378005a1e9f200 netperf 75332 [146] 2711.138953: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1213 skbaddr=0xff378004d607a500 netperf 75330 [144] 2711.139631: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1233 skbaddr=0xff3780046be20100 netperf 75333 [147] 2711.140356: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1093 skbaddr=0xff37800514845b00 netperf 75337 [151] 2711.141037: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1353 skbaddr=0xff37800460753300 netperf 75337 [151] 2711.141877: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1367 skbaddr=0xff378004e72c7b00 netperf 75330 [144] 2711.142643: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x80150000 parent=0x10013 txq_state=0x0 packets=1202 skbaddr=0xff3780045bd60000 ... This is bad because : 1) Large batches hold one victim cpu for a very long time. 2) Driver often hit their own TX ring limit (all slots are used). 3) We call dev_requeue_skb() 4) Requeues are using a FIFO (q->gso_skb), breaking qdisc ability to implement FQ or priority scheduling. 5) dequeue_skb() gets packets from q->gso_skb one skb at a time with no xmit_more support. This is causing many spinlock games between the qdisc and the device driver. Requeues were supposed to be very rare, lets keep them this way. Limit batch sizes to /proc/sys/net/core/dev_weight (default 64) as __qdisc_run() was designed to use. Fixes: 5772e9a3463b ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/20251109161215.2574081-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11MAINTAINERS: Add Magnus Lindholm as maintainer for alpha portMagnus Lindholm1-0/+1
Acked-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Acked-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Magnus Lindholm <linmag7@gmail.com> Signed-off-by: Matt Turner <mattst88@gmail.com>
2025-11-11Merge branch 'selftests-mptcp-join-fix-some-flaky-tests'Jakub Kicinski4-51/+80
Matthieu Baerts says: ==================== selftests: mptcp: join: fix some flaky tests When looking at the recent CI results on NIPA and MPTCP CIs, a few MPTCP Join tests are marked as unstable. Here are some fixes for that. - Patch 1: a small fix for mptcp_connect.sh, printing a note as initially intended. For >=v5.13. - Patch 2: avoid unexpected reset when closing subflows. For >= 5.13. - Patches 3-4: longer transfer when not waiting for the end. For >=5.18. - Patch 5: read all received data when expecting a reset. For >= v6.1. - Patch 6: a fix to properly kill background tasks. For >= v6.5. ==================== Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-0-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11selftests: mptcp: join: properly kill background tasksMatthieu Baerts (NGI0)2-9/+30
The 'run_tests' function is executed in the background, but killing its associated PID would not kill the children tasks running in the background. To properly kill all background tasks, 'kill -- -PID' could be used, but this requires kill from procps-ng. Instead, all children tasks are listed using 'ps', and 'kill' is called with all PIDs of this group. Fixes: 31ee4ad86afd ("selftests: mptcp: join: stop transfer when check is done (part 1)") Cc: stable@vger.kernel.org Fixes: 04b57c9e096a ("selftests: mptcp: join: stop transfer when check is done (part 2)") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-6-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11selftests: mptcp: connect: trunc: read all recv dataMatthieu Baerts (NGI0)1-5/+13
MPTCP Join "fastclose server" selftest is sometimes failing because the client output file doesn't have the expected size, e.g. 296B instead of 1024B. When looking at a packet trace when this happens, the server sent the expected 1024B in two parts -- 100B, then 924B -- then the MP_FASTCLOSE. It is then strange to see the client only receiving 296B, which would mean it only got a part of the second packet. The problem is then not on the networking side, but rather on the data reception side. When mptcp_connect is launched with '-f -1', it means the connection might stop before having sent everything, because a reset has been received. When this happens, the program was directly stopped. But it is also possible there are still some data to read, simply because the previous 'read' step was done with a buffer smaller than the pending data, see do_rnd_read(). In this case, it is important to read what's left in the kernel buffers before stopping without error like before. SIGPIPE is now ignored, not to quit the app before having read everything. Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-5-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11selftests: mptcp: join: userspace: longer transferMatthieu Baerts (NGI0)1-5/+5
In rare cases, when the test environment is very slow, some userspace tests can fail because some expected events have not been seen. Because the tests are expecting a long on-going connection, and they are not waiting for the end of the transfer, it is fine to make the connection longer. This connection will be killed at the end, after the verifications, so making it longer doesn't change anything, apart from avoid it to end before the end of the verifications To play it safe, all userspace tests not waiting for the end of the transfer are now sharing a longer file (128KB) at slow speed. Fixes: 4369c198e599 ("selftests: mptcp: test userspace pm out of transfer") Cc: stable@vger.kernel.org Fixes: b2e2248f365a ("selftests: mptcp: userspace pm create id 0 subflow") Fixes: e3b47e460b4b ("selftests: mptcp: userspace pm remove initial subflow") Fixes: b9fb176081fb ("selftests: mptcp: userspace pm send RM_ADDR for ID 0") Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-4-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11selftests: mptcp: join: endpoints: longer transferMatthieu Baerts (NGI0)1-4/+4
In rare cases, when the test environment is very slow, some userspace tests can fail because some expected events have not been seen. Because the tests are expecting a long on-going connection, and they are not waiting for the end of the transfer, it is fine to make the connection longer. This connection will be killed at the end, after the verifications, so making it longer doesn't change anything, apart from avoid it to end before the end of the verifications To play it safe, all endpoints tests not waiting for the end of the transfer are now sharing a longer file (128KB) at slow speed. Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case") Cc: stable@vger.kernel.org Fixes: e274f7154008 ("selftests: mptcp: add subflow limits test-cases") Fixes: b5e2fb832f48 ("selftests: mptcp: add explicit test case for remove/readd") Fixes: e06959e9eebd ("selftests: mptcp: join: test for flush/re-add endpoints") Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-3-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11selftests: mptcp: join: rm: set backup flagMatthieu Baerts (NGI0)1-27/+27
Some of these 'remove' tests rarely fail because a subflow has been reset instead of cleanly removed. This can happen when one extra subflow which has never carried data is being closed (FIN) on one side, while the other is sending data for the first time. To avoid such subflows to be used right at the end, the backup flag has been added. With that, data will be only carried on the initial subflow. Fixes: d2c4333a801c ("selftests: mptcp: add testcases for removing addrs") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-2-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11selftests: mptcp: connect: fix fallback note due to OoOMatthieu Baerts (NGI0)1-1/+1
The "fallback due to TCP OoO" was never printed because the stat_ooo_now variable was checked twice: once in the parent if-statement, and one in the child one. The second condition was then always true then, and the 'else' branch was never taken. The idea is that when there are more ACK + MP_CAPABLE than expected, the test either fails if there was no out of order packets, or a notice is printed. Fixes: 69ca3d29a755 ("mptcp: update selftest for fallback due to OoO") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251110-net-mptcp-sft-join-unstable-v1-1-a4332c714e10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11Merge tag 'for-net-2025-11-11' of ↵Jakub Kicinski9-82/+158
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_conn: Fix not cleaning up PA_LINK connections - hci_event: Fix not handling PA Sync Lost event - MGMT: cancel mesh send timer when hdev removed - 6lowpan: reset link-local header on ipv6 recv path - 6lowpan: fix BDADDR_LE vs ADDR_LE_DEV address type confusion - L2CAP: export l2cap_chan_hold for modules - 6lowpan: Don't hold spin lock over sleeping functions - 6lowpan: add missing l2cap_chan_lock() - btusb: reorder cleanup in btusb_disconnect to avoid UAF - btrtl: Avoid loading the config file on security chips * tag 'for-net-2025-11-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: btrtl: Avoid loading the config file on security chips Bluetooth: hci_event: Fix not handling PA Sync Lost event Bluetooth: hci_conn: Fix not cleaning up PA_LINK connections Bluetooth: 6lowpan: add missing l2cap_chan_lock() Bluetooth: 6lowpan: Don't hold spin lock over sleeping functions Bluetooth: L2CAP: export l2cap_chan_hold for modules Bluetooth: 6lowpan: fix BDADDR_LE vs ADDR_LE_DEV address type confusion Bluetooth: 6lowpan: reset link-local header on ipv6 recv path Bluetooth: btusb: reorder cleanup in btusb_disconnect to avoid UAF Bluetooth: MGMT: cancel mesh send timer when hdev removed ==================== Link: https://patch.msgid.link/20251111141357.1983153-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11ethtool: fix incorrect kernel-doc style comment in ethtool.hKriish Sharma1-1/+1
Building documentation produced the following warning: WARNING: ./include/linux/ethtool.h:495 This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * IEEE 802.3ck/df defines 16 bins for FEC histogram plus one more for This comment was not intended to be parsed as kernel-doc, so replace the '/**' with '/*' to silence the warning and align with normal comment style in header files. No functional changes. Signed-off-by: Kriish Sharma <kriish.sharma2006@gmail.com> Link: https://patch.msgid.link/20251110182545.2112596-1-kriish.sharma2006@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-11gendwarfksyms: Skip files with no exportsSami Tolvanen3-3/+6
Starting with Rust 1.91.0 (released 2025-10-30), in upstream commit ab91a63d403b ("Ignore intrinsic calls in cross-crate-inlining cost model") [1][2], `bindings.o` stops containing DWARF debug information because the `Default` implementations contained `write_bytes()` calls which are now ignored in that cost model (note that `CLIPPY=1` does not reproduce it). This means `gendwarfksyms` complains: RUSTC L rust/bindings.o error: gendwarfksyms: process_module: dwarf_get_units failed: no debugging information? There are several alternatives that would work here: conditionally skipping in the cases needed (but that is subtle and brittle), forcing DWARF generation with e.g. a dummy `static` (ugly and we may need to do it in several crates), skipping the call to the tool in the Kbuild command when there are no exports (fine) or teaching the tool to do so itself (simple and clean). Thus do the last one: don't attempt to process files if we have no symbol versions to calculate. [ I used the commit log of my patch linked below since it explained the root issue and expanded it a bit more to summarize the alternatives. - Miguel ] Cc: stable@vger.kernel.org # Needed in 6.17.y. Reported-by: Haiyue Wang <haiyuewa@163.com> Closes: https://lore.kernel.org/rust-for-linux/b8c1c73d-bf8b-4bf2-beb1-84ffdcd60547@163.com/ Suggested-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/rust-for-linux/CANiq72nKC5r24VHAp9oUPR1HVPqT+=0ab9N0w6GqTF-kJOeiSw@mail.gmail.com/ Link: https://github.com/rust-lang/rust/commit/ab91a63d403b0105cacd72809cd292a72984ed99 [1] Link: https://github.com/rust-lang/rust/pull/145910 [2] Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Haiyue Wang <haiyuewa@163.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://patch.msgid.link/20251110131913.1789896-1-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-11Merge tag 'arm64-fixes' of ↵Linus Torvalds15-82/+165
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "There's more here than I would ideally like at this stage, but there's been a steady trickle of fixes and some of them took a few rounds of review. The bulk of the changes are fixing some fallout from the recent BBM level two support which allows the linear map to be split from block to page mappings at runtime, but inadvertently led to sleeping in atomic context on some paths where the linear map was already mapped with page granularity. The fix is simply to avoid splitting in those cases but the implementation of that is a little involved. The other interesting fix is addressing a catastophic performance issue with our per-cpu atomics discovered by Paul in the SRCU locking code but which took some interactions with the hardware folks to resolve. Summary: - Avoid sleeping in atomic context when changing linear map permissions for DEBUG_PAGEALLOC or KFENCE - Rework printing of Spectre mitigation status to avoid hardlockup when enabling per-task mitigations on the context-switch path - Reject kernel modules when instruction patching fails either due to the DWARF-based SCS patching or because of an alternatives callback residing outside of the core kernel text - Propagate error when updating kernel memory permissions in kprobes - Drop pointless, incorrect message when enabling the ACPI SPCR console - Use value-returning LSE instructions for per-cpu atomics to reduce latency in SRCU locking routines" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Reject modules with internal alternative callbacks arm64: Fail module loading if dynamic SCS patching fails arm64: proton-pack: Fix hard lockup due to print in scheduler context arm64: proton-pack: Drop print when !CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY arm64: mm: Tidy up force_pte_mapping() arm64: mm: Optimize range_split_to_ptes() arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic context arm64: kprobes: check the return value of set_memory_rox() arm64: acpi: Drop message logging SPCR default console Revert "ACPI: Suppress misleading SPCR console message when SPCR table is absent" arm64: Use load LSE atomics for the non-return per-CPU atomic operations
2025-11-11Merge tag 'for-6.18-rc5-tag' of ↵Linus Torvalds4-34/+34
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix new inode name tracking in tree-log - fix conventional zone and stripe calculations in zoned mode - fix bio reference counts on error paths in relocation and scrub * tag 'for-6.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: release root after error in data_reloc_print_warning_inode() btrfs: scrub: put bio after errors in scrub_raid56_parity_stripe() btrfs: do not update last_log_commit when logging inode due to a new name btrfs: zoned: fix stripe width calculation btrfs: zoned: fix conventional zone capacity calculation
2025-11-11Merge tag 'mm-hotfixes-stable-2025-11-10-19-30' of ↵Linus Torvalds30-152/+424
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "26 hotfixes. 22(!) are cc:stable, 22 are MM. - address some Kexec Handover issues (Pasha Tatashin) - fix handling of large folios which are mapped outside i_size (Kiryl Shutsemau) - fix some DAMON time issues on 32-bit machines (Quanmin Yan) Plus the usual shower of singletons" * tag 'mm-hotfixes-stable-2025-11-10-19-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (26 commits) kho: warn and exit when unpreserved page wasn't preserved kho: fix unpreservation of higher-order vmalloc preservations kho: fix out-of-bounds access of vmalloc chunk MAINTAINERS: add Chris and Kairui as the swap maintainer mm/secretmem: fix use-after-free race in fault handler mm/huge_memory: initialise the tags of the huge zero folio nilfs2: avoid having an active sc_timer before freeing sci scripts/decode_stacktrace.sh: fix build ID and PC source parsing mm/damon/sysfs: change next_update_jiffies to a global variable mm/damon/stat: change last_refresh_jiffies to a global variable maple_tree: fix tracepoint string pointers codetag: debug: handle existing CODETAG_EMPTY in mark_objexts_empty for slabobj_ext mm/mremap: honour writable bit in mremap pte batching gcov: add support for GCC 15 mm/mm_init: fix hash table order logging in alloc_large_system_hash() mm/truncate: unmap large folio on split failure mm/memory: do not populate page table entries beyond i_size fs/proc: fix uaf in proc_readdir_de() mm/huge_memory: preserve PG_has_hwpoisoned if a folio is split to >0 order ksm: use range-walk function to jump over holes in scan_get_next_rmap_item ...
2025-11-11smb: client: let smbd_disconnect_rdma_connection() turn CREATED into ↵Stefan Metzmacher1-0/+3
DISCONNECTED When smbd_disconnect_rdma_connection() turns SMBDIRECT_SOCKET_CREATED into SMBDIRECT_SOCKET_ERROR, we'll have the situation that smbd_disconnect_rdma_work() will set SMBDIRECT_SOCKET_DISCONNECTING and call rdma_disconnect(), which likely fails as we never reached the RDMA_CM_EVENT_ESTABLISHED. it means that wait_event(sc->status_wait, sc->status == SMBDIRECT_SOCKET_DISCONNECTED) in smbd_destroy() will hang forever in SMBDIRECT_SOCKET_DISCONNECTING never reaching SMBDIRECT_SOCKET_DISCONNECTED. So we directly go from SMBDIRECT_SOCKET_CREATED to SMBDIRECT_SOCKET_DISCONNECTED. Fixes: ffbfc73e84eb ("smb: client: let smbd_disconnect_rdma_connection() set SMBDIRECT_SOCKET_ERROR...") Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-11-11ALSA: usb-audio: Fix NULL pointer dereference in snd_usb_mixer_controls_baddHaein Lee1-0/+2
In snd_usb_create_streams(), for UAC version 3 devices, the Interface Association Descriptor (IAD) is retrieved via usb_ifnum_to_if(). If this call fails, a fallback routine attempts to obtain the IAD from the next interface and sets a BADD profile. However, snd_usb_mixer_controls_badd() assumes that the IAD retrieved from usb_ifnum_to_if() is always valid, without performing a NULL check. This can lead to a NULL pointer dereference when usb_ifnum_to_if() fails to find the interface descriptor. This patch adds a NULL pointer check after calling usb_ifnum_to_if() in snd_usb_mixer_controls_badd() to prevent the dereference. This issue was discovered by syzkaller, which triggered the bug by sending a crafted USB device descriptor. Fixes: 17156f23e93c ("ALSA: usb: add UAC3 BADD profiles support") Signed-off-by: Haein Lee <lhi0729@kaist.ac.kr> Link: https://patch.msgid.link/vwhzmoba9j2f.vwhzmob9u9e2.g6@dooray.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-11-11mmc: dw_mmc-rockchip: Fix wrong internal phase calculateShawn Lin1-2/+2
ciu clock is 2 times of io clock, but the sample clk used is derived from io clock provided to the card. So we should use io clock to calculate the phase. Fixes: 59903441f5e4 ("mmc: dw_mmc-rockchip: Add internal phase support") Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Acked-by: Heiko Stuebner <heiko@sntech.de> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-11-11mmc: pxamci: Simplify pxamci_probe() error handling using devm APIsRakuram Eswaran1-38/+18
This patch refactors pxamci_probe() to use devm-managed resource allocation (e.g. devm_dma_request_chan) and dev_err_probe() for improved readability and automatic cleanup on probe failure. It also removes redundant NULL assignments and manual resource release logic from pxamci_probe(), and eliminates the corresponding release calls from pxamci_remove(). Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202510041841.pRlunIfl-lkp@intel.com/ Fixes: 58c40f3faf742c ("mmc: pxamci: Use devm_mmc_alloc_host() helper") Suggested-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Rakuram Eswaran <rakuram.e96@gmail.com> Reviewed-by: Khalid Aziz <khalid@kernel.org> Acked-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-11-11smb: fix invalid username check in smb3_fs_context_parse_param()Yiqi Sun1-1/+1
Since the maximum return value of strnlen(..., CIFS_MAX_USERNAME_LEN) is CIFS_MAX_USERNAME_LEN, length check in smb3_fs_context_parse_param() is always FALSE and invalid. Fix the comparison in if statement. Signed-off-by: Yiqi Sun <sunyiqixm@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-11-11smb: server: let smb_direct_disconnect_rdma_connection() turn CREATED into ↵Stefan Metzmacher1-0/+3
DISCONNECTED When smb_direct_disconnect_rdma_connection() turns SMBDIRECT_SOCKET_CREATED into SMBDIRECT_SOCKET_ERROR, we'll have the situation that smb_direct_disconnect_rdma_work() will set SMBDIRECT_SOCKET_DISCONNECTING and call rdma_disconnect(), which likely fails as we never reached the RDMA_CM_EVENT_ESTABLISHED. it means that wait_event(sc->status_wait, sc->status == SMBDIRECT_SOCKET_DISCONNECTED) in free_transport() will hang forever in SMBDIRECT_SOCKET_DISCONNECTING never reaching SMBDIRECT_SOCKET_DISCONNECTED. So we directly go from SMBDIRECT_SOCKET_CREATED to SMBDIRECT_SOCKET_DISCONNECTED. Fixes: b3fd52a0d85c ("smb: server: let smb_direct_disconnect_rdma_connection() set SMBDIRECT_SOCKET_ERROR...") Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-11-11mlx5: Fix default values in create CQAkiva Goldberger8-48/+44
Currently, CQs without a completion function are assigned the mlx5_add_cq_to_tasklet function by default. This is problematic since only user CQs created through the mlx5_ib driver are intended to use this function. Additionally, all CQs that will use doorbells instead of polling for completions must call mlx5_cq_arm. However, the default CQ creation flow leaves a valid value in the CQ's arm_db field, allowing FW to send interrupts to polling-only CQs in certain corner cases. These two factors would allow a polling-only kernel CQ to be triggered by an EQ interrupt and call a completion function intended only for user CQs, causing a null pointer exception. Some areas in the driver have prevented this issue with one-off fixes but did not address the root cause. This patch fixes the described issue by adding defaults to the create CQ flow. It adds a default dummy completion function to protect against null pointer exceptions, and it sets an invalid command sequence number by default in kernel CQs to prevent the FW from sending an interrupt to the CQ until it is armed. User CQs are responsible for their own initialization values. Callers of mlx5_core_create_cq are responsible for changing the completion function and arming the CQ per their needs. Fixes: cdd04f4d4d71 ("net/mlx5: Add support to create SQ and CQ for ASO") Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Leon Romanovsky <leon@kernel.org> Link: https://patch.msgid.link/1762681743-1084694-1-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-11Bluetooth: btrtl: Avoid loading the config file on security chipsMax Chou1-11/+13
For chips with security enabled, it's only possible to load firmware with a valid signature pattern. If key_id is not zero, it indicates a security chip, and the driver will not load the config file. - Example log for a security chip. Bluetooth: hci0: RTL: examining hci_ver=0c hci_rev=000a lmp_ver=0c lmp_subver=8922 Bluetooth: hci0: RTL: rom_version status=0 version=1 Bluetooth: hci0: RTL: btrtl_initialize: key id 1 Bluetooth: hci0: RTL: loading rtl_bt/rtl8922au_fw.bin Bluetooth: hci0: RTL: cfg_sz 0, total sz 71301 Bluetooth: hci0: RTL: fw version 0x41c0c905 - Example log for a normal chip. Bluetooth: hci0: RTL: examining hci_ver=0c hci_rev=000a lmp_ver=0c lmp_subver=8922 Bluetooth: hci0: RTL: rom_version status=0 version=1 Bluetooth: hci0: RTL: btrtl_initialize: key id 0 Bluetooth: hci0: RTL: loading rtl_bt/rtl8922au_fw.bin Bluetooth: hci0: RTL: loading rtl_bt/rtl8922au_config.bin Bluetooth: hci0: RTL: cfg_sz 6, total sz 71307 Bluetooth: hci0: RTL: fw version 0x41c0c905 Tested-by: Hilda Wu <hildawu@realtek.com> Signed-off-by: Nial Ni <niall_ni@realsil.com.cn> Signed-off-by: Max Chou <max.chou@realtek.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-11-11Merge branch 'mlx5e-misc-fixes-2025-11-09'Paolo Abeni3-7/+31
Tariq Toukan says: ==================== mlx5e misc fixes 2025-11-09 This patchset provides misc bug fixes from the team to the mlx5 Eth driver. ==================== Link: https://patch.msgid.link/1762681073-1084058-1-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-11net/mlx5e: Fix potentially misleading debug messageGal Pressman1-2/+16
Change the debug message to print the correct units instead of always assuming Gbps, as the value can be in either 100 Mbps or 1 Gbps units. Fixes: 5da8bc3effb6 ("net/mlx5e: DCBNL, Add debug messages log") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1762681073-1084058-6-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-11net/mlx5e: Fix wraparound in rate limiting for values above 255 GbpsGal Pressman1-1/+9
Add validation to reject rates exceeding 255 Gbps that would overflow the 8 bits max bandwidth field. Fixes: d8880795dabf ("net/mlx5e: Implement DCBNL IEEE max rate") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1762681073-1084058-5-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-11net/mlx5e: Fix maxrate wraparound in threshold between unitsGal Pressman1-2/+3
The previous calculation used roundup() which caused an overflow for rates between 25.5Gbps and 26Gbps. For example, a rate of 25.6Gbps would result in using 100Mbps units with value of 256, which would overflow the 8 bits field. Simplify the upper_limit_mbps calculation by removing the unnecessary roundup, and adjust the comparison to use <= to correctly handle the boundary condition. Fixes: d8880795dabf ("net/mlx5e: Implement DCBNL IEEE max rate") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1762681073-1084058-4-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-11net/mlx5e: Trim the length of the num_doorbell errorCosmin Ratiu1-1/+1
When trying to set num_doorbells to a value greater than the max number of channels, the error message was going over the netlink limit of 80 chars, truncating the most important part of the message, the number of channels. Fix that by trimming the length a bit. Fixes: 11bbcfb7668c ("net/mlx5e: Use the 'num_doorbells' devlink param") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1762681073-1084058-3-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-11net/mlx5e: Fix missing error assignment in mlx5e_xfrm_add_state()Carolina Jubran1-1/+2
Assign the return value of mlx5_eswitch_block_mode() to 'err' before checking it to avoid returning an uninitialized error code. Fixes: 22239eb258bc ("net/mlx5e: Prevent tunnel reformat when tunnel mode not allowed") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202510271649.uwsIxD6O-lkp@intel.com/ Closes: http://lore.kernel.org/linux-rdma/aPIEK4rLB586FdDt@stanley.mountain/ Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1762681073-1084058-2-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-11Merge branch 'net-sched-initialize-struct-tc_ife-to-fix-kernel-infoleak'Paolo Abeni2-10/+14
Ranganath says: ==================== net: sched: initialize struct tc_ife to fix kernel-infoleak This series addresses the uninitialization of the struct which has 2 bytes of padding. And copying this uninitialized data to userspace can leak info from kernel memory. This series ensures all members and padding are cleared prior to begin copied. This change silences the KMSAN report and prevents potential information leaks from the kernel memory. v3: https://lore.kernel.org/lkml/20251106195635.2438-1-vnranganath.20@gmail.com/#t v2: https://lore.kernel.org/r/20251101-infoleak-v2-0-01a501d41c09@gmail.com v1: https://lore.kernel.org/r/20251031-infoleak-v1-1-9f7250ee33aa@gmail.com Signed-off-by: Ranganath V N <vnranganath.20@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> ==================== Link: https://patch.msgid.link/20251109091336.9277-1-vnranganath.20@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-11net: sched: act_ife: initialize struct tc_ife to fix KMSAN kernel-infoleakRanganath V N1-5/+7
Fix a KMSAN kernel-infoleak detected by the syzbot . [net?] KMSAN: kernel-infoleak in __skb_datagram_iter In tcf_ife_dump(), the variable 'opt' was partially initialized using a designatied initializer. While the padding bytes are reamined uninitialized. nla_put() copies the entire structure into a netlink message, these uninitialized bytes leaked to userspace. Initialize the structure with memset before assigning its fields to ensure all members and padding are cleared prior to beign copied. This change silences the KMSAN report and prevents potential information leaks from the kernel memory. This fix has been tested and validated by syzbot. This patch closes the bug reported at the following syzkaller link and ensures no infoleak. Reported-by: syzbot+0c85cae3350b7d486aee@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0c85cae3350b7d486aee Tested-by: syzbot+0c85cae3350b7d486aee@syzkaller.appspotmail.com Fixes: ef6980b6becb ("introduce IFE action") Signed-off-by: Ranganath V N <vnranganath.20@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251109091336.9277-3-vnranganath.20@gmail.com Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>