Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git


Merge tag 'lkmm.2023.02.15a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull LKMM (Linux Kernel Memory Model) updates from Paul McKenney:
"Documentation updates.

Add read-modify-write sequences, which means that stronger primitives
more consistently result in stronger ordering, while still remaining
in the envelope of the hardware that supports Linux.

Address, data, and control dependencies used to ignore data that was
stored in temporaries. This update extends these dependency chains to
include unmarked intra-thread stores and loads. Note that these
unmarked stores and loads should not be concurrently accessed from
multiple threads, and doing so will cause LKMM to flag such accesses
as data races."

* tag 'lkmm.2023.02.15a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  tools: memory-model: Make plain accesses carry dependencies
  Documentation: Fixed a typo in atomic_t.txt
  tools: memory-model: Add rmw-sequences to the LKMM
  locking/memory-barriers.txt: Improve documentation for writel() example

+89 -14
+1 -1
Documentation/atomic_t.txt
···
 
 Specifically 'simple' cmpxchg() loops are expected to not starve one another
 indefinitely. However, this is not evident on LL/SC architectures, because
-while an LL/SC architecure 'can/should/must' provide forward progress
+while an LL/SC architecture 'can/should/must' provide forward progress
 guarantees between competing LL/SC sections, such a guarantee does not
 transfer to cmpxchg() implemented using LL/SC. Consider:
 
+10 -10
Documentation/memory-barriers.txt
···
 
 These are for use with consistent memory to guarantee the ordering
 of writes or reads of shared memory accessible to both the CPU and a
-DMA capable device.
+DMA capable device. See Documentation/core-api/dma-api.rst file for more
+information about consistent memory.
 
 For example, consider a device driver that shares memory with a device
 and uses a descriptor status value to indicate if the descriptor belongs
···
 	/* assign ownership */
 	desc->status = DEVICE_OWN;
 
-	/* notify device of new descriptors */
+	/* Make descriptor status visible to the device followed by
+	 * notify device of new descriptor
+	 */
 	writel(DESC_NOTIFY, doorbell);
 }
 
-The dma_rmb() allows us guarantee the device has released ownership
+The dma_rmb() allows us to guarantee that the device has released ownership
 before we read the data from the descriptor, and the dma_wmb() allows
 us to guarantee the data is written to the descriptor before the device
 can see it now has ownership. The dma_mb() implies both a dma_rmb() and
-a dma_wmb(). Note that, when using writel(), a prior wmb() is not needed
-to guarantee that the cache coherent memory writes have completed before
-writing to the MMIO region. The cheaper writel_relaxed() does not provide
-this guarantee and must not be used here.
+a dma_wmb().
 
-See the subsection "Kernel I/O barrier effects" for more information on
-relaxed I/O accessors and the Documentation/core-api/dma-api.rst file for
-more information on consistent memory.
+Note that the dma_*() barriers do not provide any ordering guarantees for
+accesses to MMIO regions. See the later "KERNEL I/O BARRIER EFFECTS"
+subsection for more information about I/O accessors and MMIO ordering.
 
 (*) pmem_wmb();
 
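The descriptor hand-off described in the memory-barriers.txt hunk above can be sketched as a userspace approximation. This is only an illustration under stated assumptions: the struct layout, the constant values, and the function name process_desc() are hypothetical, and dma_rmb(), dma_wmb(), and writel() are replaced by C11-fence stand-ins so that the sketch compiles outside the kernel. The real primitives are architecture-specific and are not equivalent to these fences.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical values and layout, chosen only for this sketch. */
#define DEVICE_OWN  1u
#define DESC_NOTIFY 0x1u

struct desc {
	uint32_t status;
	uint32_t data;
};

/*
 * Userspace stand-ins (assumptions): the real dma_rmb(), dma_wmb() and
 * writel() are kernel primitives; C11 fences only approximate them.
 */
static inline void dma_rmb(void) { atomic_thread_fence(memory_order_acquire); }
static inline void dma_wmb(void) { atomic_thread_fence(memory_order_release); }

static inline void writel(uint32_t val, volatile uint32_t *addr)
{
	/* Approximates: prior coherent-memory writes complete before the MMIO write. */
	atomic_thread_fence(memory_order_seq_cst);
	*addr = val;
}

/* Returns 1 if the descriptor was processed and handed back, 0 otherwise. */
static int process_desc(struct desc *desc, volatile uint32_t *doorbell,
			uint32_t write_data, uint32_t *read_data)
{
	assert(desc && doorbell && read_data);

	if (desc->status == DEVICE_OWN)
		return 0;

	/* do not read data until we own descriptor */
	dma_rmb();

	/* read/modify data */
	*read_data = desc->data;
	desc->data = write_data;

	/* flush modifications before status update */
	dma_wmb();

	/* assign ownership */
	desc->status = DEVICE_OWN;

	/* make the status visible to the device, then notify it */
	writel(DESC_NOTIFY, doorbell);
	return 1;
}
```

The sketch keeps the ordering in one place: the read barrier sits between the ownership check and the data read, the write barrier sits between the data write and the status update, and the doorbell write comes last.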
+38 -1
tools/memory-model/Documentation/explanation.txt
··· 1007 1007 where the rmw relation links the read and write events making up each 1008 1008 atomic update. This is what the LKMM's "atomic" axiom says. 1009 1009 1010 + Atomic rmw updates play one more role in the LKMM: They can form "rmw 1011 + sequences". An rmw sequence is simply a bunch of atomic updates where 1012 + each update reads from the previous one. Written using events, it 1013 + looks like this: 1014 + 1015 + Z0 ->rf Y1 ->rmw Z1 ->rf ... ->rf Yn ->rmw Zn, 1016 + 1017 + where Z0 is some store event and n can be any number (even 0, in the 1018 + degenerate case). We write this relation as: Z0 ->rmw-sequence Zn. 1019 + Note that this implies Z0 and Zn are stores to the same variable. 1020 + 1021 + Rmw sequences have a special property in the LKMM: They can extend the 1022 + cumul-fence relation. That is, if we have: 1023 + 1024 + U ->cumul-fence X -> rmw-sequence Y 1025 + 1026 + then also U ->cumul-fence Y. Thinking about this in terms of the 1027 + operational model, U ->cumul-fence X says that the store U propagates 1028 + to each CPU before the store X does. Then the fact that X and Y are 1029 + linked by an rmw sequence means that U also propagates to each CPU 1030 + before Y does. In an analogous way, rmw sequences can also extend 1031 + the w-post-bounded relation defined below in the PLAIN ACCESSES AND 1032 + DATA RACES section. 1033 + 1034 + (The notion of rmw sequences in the LKMM is similar to, but not quite 1035 + the same as, that of release sequences in the C11 memory model. They 1036 + were added to the LKMM to fix an obscure bug; without them, atomic 1037 + updates with full-barrier semantics did not always guarantee ordering 1038 + at least as strong as atomic updates with release-barrier semantics.) 1039 + 1010 1040 1011 1041 THE PRESERVED PROGRAM ORDER RELATION: ppo 1012 1042 ----------------------------------------- ··· 2575 2545 them. 
2576 2546 2577 2547 Although we said that plain accesses are not linked by the ppo 2578 - relation, they do contribute to it indirectly. Namely, when there is 2548 + relation, they do contribute to it indirectly. Firstly, when there is 2579 2549 an address dependency from a marked load R to a plain store W, 2580 2550 followed by smp_wmb() and then a marked store W', the LKMM creates a 2581 2551 ppo link from R to W'. The reasoning behind this is perhaps a little ··· 2583 2553 for this source code in which W' could execute before R. Just as with 2584 2554 pre-bounding by address dependencies, it is possible for the compiler 2585 2555 to undermine this relation if sufficient care is not taken. 2556 + 2557 + Secondly, plain accesses can carry dependencies: If a data dependency 2558 + links a marked load R to a store W, and the store is read by a load R' 2559 + from the same thread, then the data loaded by R' depends on the data 2560 + loaded originally by R. Thus, if R' is linked to any access X by a 2561 + dependency, R is also linked to access X by the same dependency, even 2562 + if W' or R' (or both!) are plain. 2586 2563 2587 2564 There are a few oddball fences which need special treatment: 2588 2565 smp_mb__before_atomic(), smp_mb__after_atomic(), and
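The shape of an rmw sequence, as described in the explanation.txt text added above, can be illustrated with a small single-threaded sketch. It rests on assumptions: C11 atomic_fetch_add() stands in for a kernel atomic update, the names run_rmw_chain() and CHAIN_LEN are hypothetical, and the C11 memory model is not the LKMM. In a real concurrent program the updates would typically come from different CPUs; a single thread is used here only so that each update deterministically reads from the previous one.

```c
#include <assert.h>
#include <stdatomic.h>

#define CHAIN_LEN 3	/* hypothetical n: number of atomic updates */

/*
 * Build the chain Z0 ->rf Y1 ->rmw Z1 ->rf ... ->rf Yn ->rmw Zn on one
 * variable.  reads[i] records the value read by update i+1 (event Y),
 * which is the value written by the previous store (event Z).
 * Returns the value left behind by the last store, Zn.
 */
static int run_rmw_chain(atomic_int *x, int reads[CHAIN_LEN])
{
	assert(x && reads);

	atomic_store(x, 10);		/* Z0: some store event */

	for (int i = 0; i < CHAIN_LEN; i++) {
		/* Y reads the previous store; Z writes the old value plus 1 */
		reads[i] = atomic_fetch_add(x, 1);
	}

	return atomic_load(x);
}
```

Each value returned by atomic_fetch_add() is the value stored by the preceding event in the chain, which is the "reads from the previous one" condition, and all stores hit the same variable, as the text notes for Z0 and Zn.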
+6
tools/memory-model/linux-kernel.bell
··· 82 82 let Marked = (~M) | IW | Once | Release | Acquire | domain(rmw) | range(rmw) | 83 83 LKR | LKW | UL | LF | RL | RU 84 84 let Plain = M \ Marked 85 + 86 + (* Redefine dependencies to include those carried through plain accesses *) 87 + let carry-dep = (data ; rfi)* 88 + let addr = carry-dep ; addr 89 + let ctrl = carry-dep ; ctrl 90 + let data = carry-dep ; data
+3 -2
tools/memory-model/linux-kernel.cat
···
 
 (* Propagation: Ordering from release operations and strong fences. *)
 let A-cumul(r) = (rfe ; [Marked])? ; r
+let rmw-sequence = (rf ; rmw)*
 let cumul-fence = [Marked] ; (A-cumul(strong-fence | po-rel) | wmb |
-	po-unlock-lock-po) ; [Marked]
+	po-unlock-lock-po) ; [Marked] ; rmw-sequence
 let prop = [Marked] ; (overwrite & ext)? ; cumul-fence* ;
 	[Marked] ; rfe? ; [Marked]
 
···
 let w-pre-bounded = [Marked] ; (addr | fence)?
 let r-pre-bounded = [Marked] ; (addr | nonrw-fence |
 	([R4rmb] ; fencerel(Rmb) ; [~Noreturn]))?
-let w-post-bounded = fence? ; [Marked]
+let w-post-bounded = fence? ; [Marked] ; rmw-sequence
 let r-post-bounded = (nonrw-fence | ([~Noreturn] ; fencerel(Rmb) ; [R4rmb]))? ;
 	[Marked]
 
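To make the effect of appending rmw-sequence to cumul-fence more concrete, the following sketch evaluates the two definitions on a toy execution using boolean matrices. Everything here is an assumption made for illustration: the four events, the single pre-existing cumul-fence link from U to X, and the helper names are hypothetical, all events are treated as marked (so the [Marked] filters are omitted), and this is not how herd7 evaluates cat files.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Events of a hypothetical toy execution; every event is assumed marked. */
enum { EV_U, EV_X, EV_Y1, EV_Z1, NR_EVENTS };

typedef bool rel[NR_EVENTS][NR_EVENTS];

/* out = a ; b (relational composition); out may alias a or b. */
static void rel_seq(rel a, rel b, rel out)
{
	rel tmp;

	memset(tmp, 0, sizeof(tmp));
	for (int i = 0; i < NR_EVENTS; i++)
		for (int k = 0; k < NR_EVENTS; k++)
			for (int j = 0; j < NR_EVENTS; j++)
				if (a[i][k] && b[k][j])
					tmp[i][j] = true;
	memcpy(out, tmp, sizeof(tmp));
}

/* out = r* (reflexive-transitive closure, Warshall's algorithm). */
static void rel_star(rel r, rel out)
{
	rel tmp;

	memcpy(tmp, r, sizeof(tmp));
	for (int i = 0; i < NR_EVENTS; i++)
		tmp[i][i] = true;
	for (int k = 0; k < NR_EVENTS; k++)
		for (int i = 0; i < NR_EVENTS; i++)
			for (int j = 0; j < NR_EVENTS; j++)
				if (tmp[i][k] && tmp[k][j])
					tmp[i][j] = true;
	memcpy(out, tmp, sizeof(tmp));
}

/* Fills ext with (old cumul-fence) ; rmw-sequence for the toy execution. */
static void extended_cumul_fence(rel ext)
{
	rel base = {{false}}, rf = {{false}}, rmw = {{false}}, step, seq;

	assert(ext);

	base[EV_U][EV_X] = true;	/* assumed: U ->cumul-fence X */
	rf[EV_X][EV_Y1] = true;		/* the atomic update reads from X */
	rmw[EV_Y1][EV_Z1] = true;	/* read and write halves of the update */

	rel_seq(rf, rmw, step);		/* rf ; rmw */
	rel_star(step, seq);		/* rmw-sequence = (rf ; rmw)* */
	rel_seq(base, seq, ext);	/* cumul-fence ; rmw-sequence */
}
```

Because the closure is reflexive, the original link from U to X survives, and the new link from U to Z1 appears because X ->rmw-sequence Z1 holds in this toy execution.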
+31
tools/memory-model/litmus-tests/dep+plain.litmus
···
+C dep+plain
+
+(*
+ * Result: Never
+ *
+ * This litmus test demonstrates that in LKMM, plain accesses
+ * carry dependencies much like accesses to registers:
+ * The data stored to *z1 and *z2 by P0() originates from P0()'s
+ * READ_ONCE(), and therefore using that data to compute the
+ * conditional of P0()'s if-statement creates a control dependency
+ * from that READ_ONCE() to P0()'s WRITE_ONCE().
+ *)
+
+{}
+
+P0(int *x, int *y, int *z1, int *z2)
+{
+	int a = READ_ONCE(*x);
+	*z1 = a;
+	*z2 = *z1;
+	if (*z2 == 1)
+		WRITE_ONCE(*y, 1);
+}
+
+P1(int *x, int *y)
+{
+	int r = smp_load_acquire(y);
+	smp_store_release(x, r);
+}
+
+exists (x=1 /\ y=1)