Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

radix_tree_tag_get() is not as safe as the docs make out [ver #2]

radix_tree_tag_get() is not safe to use concurrently with radix_tree_tag_set()
or radix_tree_tag_clear(). The problem is that the double tag_get() in
radix_tree_tag_get():

if (!tag_get(node, tag, offset))
saw_unset_tag = 1;
if (height == 1) {
int ret = tag_get(node, tag, offset);

may see the value change due to the action of set/clear. RCU is no protection
against this as no pointers are being changed, no nodes are being replaced
according to a COW protocol - set/clear alter the node directly.

The documentation in linux/radix-tree.h, however, says that
radix_tree_tag_get() is an exception to the rule that "any function modifying
the tree or tags (...) must exclude other modifications, and exclude any
functions reading the tree".

The problem is that the next statement in radix_tree_tag_get() checks that the
tag doesn't vary over time:

BUG_ON(ret && saw_unset_tag);

This has been seen happening in FS-Cache:

https://www.redhat.com/archives/linux-cachefs/2010-April/msg00013.html

To this end, remove the BUG_ON() from radix_tree_tag_get() and note in various
comments that the value of the tag may change whilst the RCU read lock is held,
and thus that the return value of radix_tree_tag_get() may not be relied upon
unless radix_tree_tag_set/clear() and radix_tree_delete() are excluded from
running concurrently with it.

Reported-by: Romain DEGEZ <romain.degez@smartjog.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by

David Howells and committed by
Linus Torvalds
ce82653d d3e06e2b

+13 -6
+7
include/linux/radix-tree.h
··· 121 121 * (Note, rcu_assign_pointer and rcu_dereference are not needed to control 122 122 * access to data items when inserting into or looking up from the radix tree) 123 123 * 124 + * Note that the value returned by radix_tree_tag_get() may not be relied upon 125 + * if only the RCU read lock is held. Functions to set/clear tags and to 126 + * delete nodes running concurrently with it may affect its result such that 127 + * two consecutive reads in the same locked section may return different 128 + * values. If reliability is required, modification functions must also be 129 + * excluded from concurrency. 130 + * 124 131 * radix_tree_tagged is able to be called without locking or RCU. 125 132 */ 126 133
+6 -6
lib/radix-tree.c
··· 555 555 * 556 556 * 0: tag not present or not set 557 557 * 1: tag set 558 + * 559 + * Note that the return value of this function may not be relied on, even if 560 + * the RCU lock is held, unless tag modification and node deletion are excluded 561 + * from concurrency. 558 562 */ 559 563 int radix_tree_tag_get(struct radix_tree_root *root, 560 564 unsigned long index, unsigned int tag) ··· 599 595 */ 600 596 if (!tag_get(node, tag, offset)) 601 597 saw_unset_tag = 1; 602 - if (height == 1) { 603 - int ret = tag_get(node, tag, offset); 604 - 605 - BUG_ON(ret && saw_unset_tag); 606 - return !!ret; 607 - } 598 + if (height == 1) 599 + return !!tag_get(node, tag, offset); 608 600 node = rcu_dereference_raw(node->slots[offset]); 609 601 shift -= RADIX_TREE_MAP_SHIFT; 610 602 height--;