Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs mount updates from Al Viro:

- the remaining rudiments of mount hash conflict handling are gone now -
  we do not allow multiple mounts with the same parent/mountpoint to be
  hashed at the same time.

- 'struct mount' changes:
    - mnt_umounting is gone
    - mnt_slave_list/mnt_slave is an hlist now
    - overmounts are kept track of by an explicit pointer in mount
    - a bunch of flags moved out of mnt_flags to a new field, with
      only namespace_sem for protection
    - mnt_expiry is protected by mount_lock now (instead of
      namespace_sem)
    - MNT_LOCKED is used only for mounts that need to remain attached
      to their parents to prevent mountpoint exposure - no more
      overloading it for absolute roots
    - all mnt_list uses are transient now - it's used only to
      represent temporary sets during umount_tree()

- mount refcounting change: children no longer pin parents for any
  mounts, whether they'd passed through umount_tree() or not

- 'struct mountpoint' changes:
    - refcount is no more; what matters is ->m_list emptiness
    - instead of temporarily bumping the refcount, we insert a new
      object (pinned_mountpoint) into ->m_list
    - new calling conventions for lock_mount() and friends

- do_move_mount()/attach_recursive_mnt() seriously cleaned up

- globals in fs/pnode.c are gone

- propagate_mnt(), change_mnt_propagation() and propagate_umount()
  cleaned up (in the last case - pretty much completely rewritten).

- freeing of emptied mnt_namespace is done in namespace_unlock(). For
  one thing, there are subtle ordering requirements there; for another,
  it simplifies cleanups.

- assorted cleanups

- restore the machinery for long-term mounts from accumulated bitrot.
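The 'struct mountpoint' items above drop the m_count refcount. Instead, a mountpoint's lifetime follows from whether ->m_list is empty. Below is a user-space C toy model of that rule, for illustration only. It assumes a hand-rolled intrusive list that imitates the shape of the kernel's hlist_head/hlist_node. The names toy_node, toy_head, toy_mountpoint and toy_pin, and the helper functions, are all hypothetical and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical intrusive list shaped like hlist_node/hlist_head. */
struct toy_node {
	struct toy_node *next;
	struct toy_node **pprev;	/* address of the pointer pointing at us */
};

struct toy_head {
	struct toy_node *first;
};

static void toy_add_head(struct toy_node *n, struct toy_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

static void toy_del(struct toy_node *n)
{
	assert(n->pprev != NULL);	/* must currently be on a list */
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* No refcount: users (mounts, temporary pins) live on m_list. */
struct toy_mountpoint {
	struct toy_head m_list;
};

/* Stand-in for a pinned_mountpoint-like entry. */
struct toy_pin {
	struct toy_node node;
	struct toy_mountpoint *mp;
};

static void toy_pin_mountpoint(struct toy_mountpoint *mp, struct toy_pin *pin)
{
	toy_add_head(&pin->node, &mp->m_list);
	pin->mp = mp;
}

/* Drop a pin; true means m_list is now empty, so mp could be freed. */
static bool toy_unpin_mountpoint(struct toy_pin *pin)
{
	struct toy_mountpoint *mp = pin->mp;

	toy_del(&pin->node);
	pin->mp = NULL;
	return mp->m_list.first == NULL;
}
```

Pinning adds one more entry to ->m_list. In the kernel, mounts sitting on a mountpoint are linked into that same list through mnt_mp_list. "Last user gone" therefore becomes an emptiness test rather than a counter reaching zero.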

This is going to get a followup come next cycle, when the change of
vfs_parse_fs_string() calling conventions goes into -next

* tag 'pull-mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (48 commits)
statmount_mnt_basic(): simplify the logics for group id
invent_group_ids(): zero ->mnt_group_id always implies !IS_MNT_SHARED()
get rid of CL_SHARE_TO_SLAVE
take freeing of emptied mnt_namespace to namespace_unlock()
copy_tree(): don't link the mounts via mnt_list
change_mnt_propagation(): move ->mnt_master assignment into MS_SLAVE case
mnt_slave_list/mnt_slave: turn into hlist_head/hlist_node
turn do_make_slave() into transfer_propagation()
do_make_slave(): choose new master sanely
change_mnt_propagation(): do_make_slave() is a no-op unless IS_MNT_SHARED()
change_mnt_propagation() cleanups, step 1
propagate_mnt(): fix comment and convert to kernel-doc, while we are at it
propagate_mnt(): get rid of last_dest
fs/pnode.c: get rid of globals
propagate_one(): fold into the sole caller
propagate_one(): separate the "what should be the master for this copy" part
propagate_one(): separate the "do we need secondary here?" logics
propagate_mnt(): handle all peer groups in the same loop
propagate_one(): get rid of dest_master
mount: separate the flags accessed only under namespace_sem
...

+1229 -820
+484
Documentation/filesystems/propagate_umount.txt
Notes on propagate_umount()

Umount propagation starts with a set of mounts we are already going to
take out. Ideally, we would like to add all downstream cognates to
that set - anything with the same mountpoint as one of the removed
mounts and with a parent that would receive events from the parent of that
mount. However, there are some constraints the resulting set must
satisfy.

It is convenient to define several properties of sets of mounts:

1) A set S of mounts is non-shifting if for any mount X belonging
to S all subtrees mounted strictly inside of X (i.e. not overmounting
the root of X) contain only elements of S.

2) A set S is non-revealing if all locked mounts that belong to S have
parents that also belong to S.

3) A set S is closed if it contains all children of its elements.

The set of mounts taken out by umount(2) must be non-shifting and
non-revealing; the first constraint is what allows us to reparent
any remaining mounts and the second is what prevents the exposure
of any concealed mountpoints.

propagate_umount() takes the original set as an argument and tries to
extend that set. The original set is a full subtree and its root is
unlocked; what matters is that it's closed and non-revealing.
The resulting set may not be closed; there might still be mounts outside
of that set, but only on top of stacks of root-overmounting elements
of the set. They can be reparented to the place where the bottom of
the stack is attached to a mount that will survive. NOTE: doing that
will violate a constraint on having no more than one mount with
the same parent/mountpoint pair; however, the caller (umount_tree())
will immediately remedy that - it may keep an unmounted element attached
to its parent, but only if the parent itself is unmounted.
Since all conflicts created by reparenting have a common parent *not* in
the set and one side of the conflict (bottom of the stack of overmounts)
is in the set, it will be resolved. However, we rely upon umount_tree()
doing that pretty much immediately after the call of propagate_umount().

The algorithm is based on two statements:
	1) for any set S, there is a maximal non-shifting subset of S
and it can be calculated in O(#S) time.
	2) for any non-shifting set S, there is a maximal non-revealing
subset of S. That subset is also non-shifting and it can be calculated
in O(#S) time.

Finding candidates.

We are given a closed set U and we want to find all mounts that have
the same mountpoint as some mount m in U *and* whose parent receives
propagation from the parent of the same mount m. A naive implementation
would be
	S = {}
	for each m in U
		add m to S
		p = parent(m)
		for each q in Propagation(p) - {p}
			child = look_up(q, mountpoint(m))
			if child
				add child to S
but that can lead to excessive work - there might be propagation among the
subtrees of U, in which case we'd end up examining the same candidates
many times. Since propagation is transitive, the same will happen to
everything downstream of that candidate and it's not hard to construct
cases where the approach above takes time quadratic in the actual
number of candidates.

Note that if we run into a candidate we'd already seen, it must've been
added on an earlier iteration of the outer loop - all additions made
during one iteration of the outer loop have different parents. So
if we find a child already added to the set, we know that everything
in Propagation(parent(child)) with the same mountpoint has already been
added.
	S = {}
	for each m in U
		if m in S
			continue
		add m to S
		p = parent(m)
		q = propagation_next(p, p)
		while q
			child = look_up(q, mountpoint(m))
			if child
				if child in S
					q = skip_them(q, p)
					continue
				add child to S
			q = propagation_next(q, p)
where
	skip_them(q, p)
		keep walking Propagation(p) from q until we find something
		not in Propagation(q)

would get rid of that problem, but we need a sane implementation of
skip_them(). That's not hard to do - split propagation_next() into
"down into mnt_slave_list" and "forward-and-up" parts, with the
skip_them() being "repeat the forward-and-up part until we get NULL
or something that isn't a peer of the one we are skipping".

Note that there can be no absolute roots among the extra candidates -
they all come from mount lookups. An absolute root in the original
set is _currently_ impossible, but it might be worth protecting
against.

Maximal non-shifting subsets.

Let's call a mount m in a set S forbidden in that set if there is a
subtree mounted strictly inside m and containing mounts that do not
belong to S.

The set is non-shifting when none of its elements are forbidden in it.

If mount m is forbidden in a set S, it is forbidden in any subset S' it
belongs to. In other words, it can't belong to any of the non-shifting
subsets of S. If we had a way to find a forbidden mount or show that
there's none, we could use it to find the maximal non-shifting subset
simply by finding and removing them until none remain.

Suppose mount m is forbidden in S; then any mounts forbidden in S - {m}
must have been forbidden in S itself. Indeed, since m has descendants
that do not belong to S, any subtree that fits into S will fit into
S - {m} as well.
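The definitions above can be made concrete with a deliberately naive user-space C check of "forbidden" and "non-shifting". It runs over a hypothetical toy mount tree. struct tmount, tmount_attach(), subtree_in_s(), is_forbidden() and is_non_shifting() are illustrative names, not kernel API. This is the per-element check the notes describe as too costly, not the linear algorithm.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_CHILDREN 8

/* Hypothetical toy mount; not the kernel's struct mount. */
struct tmount {
	struct tmount *parent;		/* NULL for an absolute root */
	bool on_parent_root;		/* overmounts the root of its parent */
	bool in_s;			/* membership in the set S */
	struct tmount *children[TOY_MAX_CHILDREN];
	int nr_children;
};

static void tmount_attach(struct tmount *child, struct tmount *parent,
			  bool on_root)
{
	assert(parent->nr_children < TOY_MAX_CHILDREN);
	child->parent = parent;
	child->on_parent_root = on_root;
	parent->children[parent->nr_children++] = child;
}

/* True if m and all of its descendants belong to S. */
static bool subtree_in_s(const struct tmount *m)
{
	if (!m->in_s)
		return false;
	for (int i = 0; i < m->nr_children; i++)
		if (!subtree_in_s(m->children[i]))
			return false;
	return true;
}

/*
 * m is forbidden in S if some subtree mounted strictly inside m (a child
 * that does not overmount m's root, plus everything below that child)
 * contains a mount outside S.
 */
static bool is_forbidden(const struct tmount *m)
{
	for (int i = 0; i < m->nr_children; i++) {
		const struct tmount *c = m->children[i];

		if (!c->on_parent_root && !subtree_in_s(c))
			return true;
	}
	return false;
}

/* S, given as an array of its elements, is non-shifting iff none is forbidden. */
static bool is_non_shifting(struct tmount **s, int n)
{
	for (int i = 0; i < n; i++)
		if (is_forbidden(s[i]))
			return false;
	return true;
}
```

Consider A and B in S, with B mounted strictly inside A and C mounted strictly inside B but outside S. Then both A and B are forbidden. A mount outside S that only overmounts A's root forbids nothing; that is the case left to reparenting.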

So in principle we could go through elements of S, checking if they
are forbidden in S and removing the ones that are. Removals will
not invalidate the checks done for earlier mounts - if they were not
forbidden at the time we checked, they won't become forbidden later.
It's too costly to be practical, but there is a similar approach that
is linear in the size of S.

Let's say that mount x in a set S is forbidden by mount y, if
	* both x and y belong to S.
	* there is a chain of mounts starting at x and leaving S
	  immediately after passing through y, with the first
	  mountpoint strictly inside x.
Note 1: x may be equal to y - that's the case when something not
belonging to S is mounted strictly inside x.
Note 2: if y does not belong to S, it can't forbid anything in S.
Note 3: if y has no children outside of S, it can't forbid anything in S.

It's easy to show that mount x is forbidden in S if and only if x is
forbidden in S by some mount y. And it's easy to find all mounts in S
forbidden by a given mount.

Consider the following operation:
	Trim(S, m) = S - {x : x is forbidden by m in S}

Note that if m does not belong to S or has no children outside of S we
are guaranteed that Trim(S, m) is equal to S.

The following is true: if x is forbidden by y in Trim(S, m), it was
already forbidden by y in S.

Proof: Suppose x is forbidden by y in Trim(S, m). Then there is a
chain of mounts (x_0 = x, ..., x_k = y, x_{k+1} = r), such that x_{k+1}
is the first element that doesn't belong to Trim(S, m) and the
mountpoint of x_1 is strictly inside x. If mount r belongs to S, it must
have been removed by Trim(S, m), i.e. it was forbidden in S by m.
Then there was a mount chain from r to some child of m that stayed in
S all the way until m, but that's impossible since x belongs to Trim(S, m)
and prepending (x_0, ..., x_k) to that chain demonstrates that x is also
forbidden in S by m, and thus can't belong to Trim(S, m).
Therefore r cannot belong to S and our chain demonstrates that
x is forbidden by y in S. QED.

Corollary: no mount is forbidden by m in Trim(S, m). Indeed, any
such mount would have been forbidden by m in S and thus would have been
in the part of S removed in Trim(S, m).

Corollary: no mount is forbidden by m in Trim(Trim(S, m), n). Indeed,
any such mount would have to have been forbidden by m in Trim(S, m), which
is impossible.

Corollary: after
	S = Trim(S, x_1)
	S = Trim(S, x_2)
	...
	S = Trim(S, x_k)
no mount remaining in S will be forbidden by any of x_1, ..., x_k.

The following will reduce S to its maximal non-shifting subset:
	visited = {}
	while S contains elements not belonging to visited
		let m be an arbitrary such element of S
		S = Trim(S, m)
		add m to visited

S never grows, so the number of elements of S not belonging to visited
decreases by at least one on each iteration. When the loop terminates,
all mounts remaining in S belong to visited. It's easy to see that at
the beginning of each iteration no mount remaining in S will be forbidden
by any element of visited. In other words, no mount remaining in S will
be forbidden, i.e. the final value of S will be non-shifting. It will be
the maximal non-shifting subset, since we were removing only forbidden
elements.

There are two difficulties in implementing the above in linear
time, both due to the fact that Trim() might need to remove more than one
element.
A naive implementation of Trim() is vulnerable to running into a
long chain of mounts, each mounted on top of its parent's root. Nothing in
that chain is forbidden, so nothing gets removed from it. We need to
recognize such chains and avoid walking them again on subsequent calls of
Trim(), otherwise we will end up with worst-case time quadratic in
the number of elements in S. Another difficulty is in implementing the
outer loop - we need to iterate through all elements of a shrinking set.
That would be trivial if we never removed more than one element at a time
(linked list, with list_for_each_entry_safe as the iterator), but we may
need to remove more than one entry, possibly including the ones we have
already visited.

Let's start with a naive algorithm for Trim():

Trim_one(m)
	found = false
	for each n in children(m)
		if n not in S
			found = true
			if (mountpoint(n) != root(m))
				remove m from S
				break
	if found
		Trim_ancestors(m)

Trim_ancestors(m)
	for (; parent(m) in S; m = parent(m)) {
		if (mountpoint(m) != root(parent(m)))
			remove parent(m) from S
	}

If m belongs to S, Trim_one(m) will replace S with Trim(S, m).
Proof:
	Consider the chains excluding elements from Trim(S, m). The last
two elements in such a chain are m and some child of m that does not belong
to S. If m has no such children, Trim(S, m) is equal to S.
	m itself is removed if and only if the chain has exactly two
elements, i.e. when the last element does not overmount the root of m.
In other words, that happens when m has a child not in S that does not
overmount the root of m.
	All other elements to remove will be ancestors of m, such that
the entire descent chain from them to m is contained in S. Let
(x_0, x_1, ..., x_k = m) be the longest such chain.
x_i needs to be removed if and only if x_{i+1} does not overmount its
root. It's easy to see that Trim_ancestors(m) will iterate through that
chain from x_k to x_1 and that it will remove exactly the elements that
need to be removed.

Note that if the loop in Trim_ancestors() walks into an already
visited element, we are guaranteed that remaining iterations will see
only elements that had already been visited and remove none of them.
That's the weakness that makes it vulnerable to long chains of full
overmounts.

That is easy to deal with if we can afford to set marks on
elements of S; we would mark all elements already visited by
Trim_ancestors() and have it bail out as soon as it sees an already
marked element.

The problems with iterating through the set can be dealt with in
several ways, depending upon the representation we choose for our set.
One useful observation is that we are given a closed subset in S - the
original set passed to propagate_umount(). Its elements can neither
forbid anything nor be forbidden by anything - all their descendants
belong to S, so they cannot occur anywhere in any excluding chain.
In other words, the elements of that subset will remain in S until
the end and Trim_one(m) is a no-op for all m from that subset.

That suggests keeping S as a disjoint union of a closed set U
('will be unmounted, no matter what') and the set of all elements of
S that do not belong to U. That set ('candidates') is all we need
to iterate through. Let's represent it as a subset in a cyclic list,
consisting of all list elements that are marked as candidates (initially -
all of them). Then we could have Trim_ancestors() only remove the mark,
leaving the elements on the list.
Then Trim_one() would never remove anything other than its argument
from the containing list, allowing us to use list_for_each_entry_safe()
as the iterator.

Assuming that representation, we get the following:

	list_for_each_entry_safe(m, ..., Candidates, ...)
		Trim_one(m)
where
	Trim_one(m)
		if (m is not marked as a candidate)
			strip the "seen by Trim_ancestors" mark from m
			remove m from the Candidates list
			return

		remove_this = false
		found = false
		for each n in children(m)
			if n not in S
				found = true
				if (mountpoint(n) != root(m))
					remove_this = true
					break
		if found
			Trim_ancestors(m)
		if remove_this
			strip the "seen by Trim_ancestors" mark from m
			strip the "candidate" mark from m
			remove m from the Candidates list

	Trim_ancestors(m)
		for (p = parent(m); p is marked as candidate ; m = p, p = parent(p)) {
			if m is marked as seen by Trim_ancestors
				return
			mark m as seen by Trim_ancestors
			if (mountpoint(m) != root(p))
				strip the "candidate" mark from p
		}

The terminating condition in the loop in Trim_ancestors() is correct,
since that loop will never run into p belonging to U - p is always
an ancestor of the argument of Trim_one() and since U is closed, the argument
of Trim_one() would also have to belong to U. But Trim_one() is never
called for elements of U. In other words, p belongs to S if and only
if it belongs to candidates.

Time complexity:
* we get no more than O(#S) calls of Trim_one()
* the loop over children in Trim_one() never looks at the same child
  twice through all the calls.
* iterations of that loop for children in S are no more than O(#S)
  in the worst case
* at most two children that are not elements of S are considered per
  call of Trim_one().
* the loop in Trim_ancestors() sets its mark once per iteration and
  no element of S has it set more than once.

In the end we may have some elements excluded from S by
Trim_ancestors() still stuck on the list. We could do a separate
loop removing them from the list (also no worse than O(#S) time),
but it's easier to leave that until the next phase - there we will
iterate through the candidates anyway.

The caller has already removed all elements of U from their parents'
lists of children, which means that checking if a child belongs to S is
equivalent to checking if it's marked as a candidate; we'll never see
the elements of U in the loop over children in Trim_one().

What's more, if we see that children(m) is empty and m is not
locked, we can immediately move m into the committed subset (remove
from the parent's list of children, etc.). That's one fewer mount we'll
have to look into when we check the list of children of its parent *and*
when we get to building the non-revealing subset.

Maximal non-revealing subsets

If S is not a non-revealing subset, there is a locked element x in S
such that the parent of x is not in S.

Obviously, no non-revealing subset of S may contain x. Removing such
elements one by one will end with the maximal non-revealing
subset (possibly an empty one). Note that removal of an element will
require removal of all its locked children, etc.

If the set had been non-shifting, it will remain non-shifting after
such removals.
Proof: suppose S was non-shifting, x is a locked element of S, the parent
of x is not in S and S - {x} is not non-shifting. Then there is an element
m in S - {x} and a subtree mounted strictly inside m, such that the subtree
contains an element not in S - {x}.
Since S is non-shifting, everything in that subtree must belong to S.
But that means that this subtree must contain x somewhere *and* that the
parent of x either belongs to that subtree or is equal to m. Either way
it must belong to S. Contradiction.

// same representation as for finding maximal non-shifting subsets:
// S is a disjoint union of a non-revealing set U (the ones we are committed
// to unmount) and a set of candidates, represented as a subset of list
// elements that have "is a candidate" mark on them.
// Elements of U are removed from their parents' lists of children.
// In the end candidates becomes empty and the maximal non-revealing
// non-shifting subset of S is now in U
while (Candidates list is non-empty)
	handle_locked(first(Candidates))

handle_locked(m)
	if m is not marked as a candidate
		strip the "seen by Trim_ancestors" mark from m
		remove m from the list
		return
	cutoff = m
	for (p = m; p in candidates; p = parent(p)) {
		strip the "seen by Trim_ancestors" mark from p
		strip the "candidate" mark from p
		remove p from the Candidates list
		if (!locked(p))
			cutoff = parent(p)
	}
	if p in U
		cutoff = p
	while m != cutoff
		remove m from children(parent(m))
		add m to U
		m = parent(m)

Let (x_0, ..., x_n = m) be the maximal chain of descent of m within S.
* If it contains some elements of U, let x_k be the last one of those.
  Then the union of U with {x_{k+1}, ..., x_n} is obviously non-revealing.
* otherwise if all its elements are locked, then none of {x_0, ..., x_n}
  may be elements of a non-revealing subset of S.
* otherwise let x_k be the first unlocked element of the chain. Then none
  of {x_0, ..., x_{k-1}} may be an element of a non-revealing subset of
  S and the union of U and {x_k, ..., x_n} is non-revealing.
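The handle_locked() pseudocode above can be transcribed into a user-space C sketch under some simplifying assumptions. The candidate mark, U membership, the locked state and presence on the Candidates list are plain booleans in a hypothetical struct tmount. NULL stands in for the parent of an absolute root. The children-list and "seen by Trim_ancestors" bookkeeping are omitted. The hypothetical driver trim_to_non_revealing() walks an array instead of repeatedly taking the first list entry. Every call takes its argument off the list, so the effect is the same.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy mount; all fields are illustrative assumptions. */
struct tmount {
	struct tmount *parent;	/* NULL stands in for "no parent" */
	bool locked;		/* must stay attached to its parent */
	bool candidate;		/* the "candidate" mark */
	bool on_list;		/* still linked into the Candidates list */
	bool in_u;		/* committed to be unmounted */
};

static void handle_locked(struct tmount *m)
{
	struct tmount *cutoff = m, *p;

	if (!m->candidate) {		/* excluded earlier, just unlink */
		m->on_list = false;
		return;
	}
	/* Walk up the chain of candidates, taking each one off the list. */
	for (p = m; p && p->candidate; p = p->parent) {
		p->candidate = false;
		p->on_list = false;
		if (!p->locked)
			cutoff = p->parent;	/* the topmost unlocked one wins */
	}
	if (p && p->in_u)
		cutoff = p;		/* chain hangs off U: keep all of it */
	/* Commit everything from m up to, but not including, cutoff. */
	for (; m != cutoff; m = m->parent)
		m->in_u = true;
}

/* Hypothetical driver: process entries until none remain on the list. */
static void trim_to_non_revealing(struct tmount **list, int n)
{
	for (int i = 0; i < n; i++)
		if (list[i]->on_list)
			handle_locked(list[i]);
}
```

The three cases match the bullets above:
- A locked chain hanging off U is committed whole.
- A locked chain hanging off a mount outside S is dropped.
- With an unlocked mount at the top, the chain from that mount down is committed.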

handle_locked(m) finds which of these cases applies and adjusts Candidates
and U accordingly. U remains non-revealing, the union of Candidates and
U still contains any non-revealing subset of S, and after the call of
handle_locked(m), m is guaranteed not to be in the Candidates list. So
having it called for each element of S would suffice to empty Candidates,
leaving U the maximal non-revealing subset of S.

However, handle_locked(m) is a no-op when m belongs to U, so it's enough
to have it called for elements of the Candidates list until none remain.

Time complexity: the number of calls of handle_locked() is limited by
#Candidates; each iteration of the first loop in handle_locked() removes
an element from the list, so their total number of executions is also
limited by #Candidates; the number of iterations in the second loop is no
greater than the number of iterations of the first loop.


Reparenting

After we have calculated the final set, we still need to deal with
reparenting - if an element of the final set has a child not in it,
we need to reparent such a child.

Such children can only be root-overmounting (otherwise the set wouldn't
be non-shifting) and their parents cannot belong to the original set,
since the original is guaranteed to be closed.


Putting all of that together

The plan is to
* find all candidates
* trim down to the maximal non-shifting subset
* trim down to the maximal non-revealing subset
* reparent anything that needs to be reparented
* return the resulting set to the caller

For the 2nd and 3rd steps we want to separate the set into a growing
non-revealing subset, initially containing the original set ("U" in
terms of the pseudocode above) and everything we are still not sure about
("candidates").
It means that for the output of the 1st step we'd like the extra
candidates kept separate from the stuff already in the original set.
For the 4th step we would like the additions to U kept separate from the
original set.

So let's go for
* original set ("set"). Linkage via mnt_list
* undecided candidates ("candidates"). Subset of a list,
  consisting of all its elements marked with a new flag (T_UMOUNT_CANDIDATE).
  Initially all elements of the list will be marked that way; in the
  end the list will become empty and no mounts will remain marked with
  that flag.
* Reuse T_MARKED for "has already been seen by trim_ancestors()".
* anything in U that hadn't been in the original set - elements of
  candidates will gradually be either discarded or moved there. In other
  words, it's the candidates we have already decided to unmount. Its role
  is reasonably close to the old "to_umount", so let's use that name.
  Linkage via mnt_list.

For gather_candidates() we'll need to maintain both candidates (S -
set) and the intersection of S with set. Use T_UMOUNT_CANDIDATE for
all elements we encounter, putting the ones not already in the original
set into the list of candidates. When we are done, strip that flag from
all elements of the original set. That gives a cheap way to check
if an element belongs to S (in gather_candidates) and to candidates
itself (at later stages). Call that predicate is_candidate(); it would
be m->mnt_t_flags & T_UMOUNT_CANDIDATE.

All elements of the original set are marked with MNT_UMOUNT and we'll
need the same for elements added when joining the contents of to_umount
to set in the end. Let's set MNT_UMOUNT at the time we add an element
to to_umount; that's close to what the old 'umount_one' is doing, so
let's keep that name.
It also gives us another predicate we need -
"belongs to the union of set and to_umount"; will_be_unmounted() for now.

Removals from the candidates list should strip both T_MARKED and
T_UMOUNT_CANDIDATE; call it remove_from_candidates_list().
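Combining the representation just described with the Trim_one()/Trim_ancestors() pseudocode gives roughly the following user-space C sketch. The flag values T_MARKED and T_UMOUNT_CANDIDATE come from the fs/mount.h hunk. struct tmount, its on_list boolean (a stand-in for membership in the Candidates list) and the lower-case helpers are hypothetical. As the notes require, elements of U never appear among anyone's children, so a child without the candidate mark is outside S.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as in the fs/mount.h hunk; only these two are modelled. */
enum { T_MARKED = 4, T_UMOUNT_CANDIDATE = 8 };

#define TOY_MAX_CHILDREN 8

/* Hypothetical toy mount; members of U are never linked as children. */
struct tmount {
	struct tmount *parent;		/* NULL: no parent */
	bool on_parent_root;		/* overmounts the root of its parent */
	int t_flags;			/* T_MARKED / T_UMOUNT_CANDIDATE */
	bool on_list;			/* still on the Candidates list */
	struct tmount *children[TOY_MAX_CHILDREN];
	int nr_children;
};

static void tmount_attach(struct tmount *child, struct tmount *parent,
			  bool on_root)
{
	assert(parent->nr_children < TOY_MAX_CHILDREN);
	child->parent = parent;
	child->on_parent_root = on_root;
	parent->children[parent->nr_children++] = child;
}

static bool is_candidate(const struct tmount *m)
{
	return m->t_flags & T_UMOUNT_CANDIDATE;
}

static void remove_from_candidates_list(struct tmount *m)
{
	m->t_flags &= ~(T_MARKED | T_UMOUNT_CANDIDATE);
	m->on_list = false;
}

static void trim_ancestors(struct tmount *m)
{
	for (struct tmount *p = m->parent; p && is_candidate(p);
	     m = p, p = p->parent) {
		if (m->t_flags & T_MARKED)
			return;		/* rest of the chain already walked */
		m->t_flags |= T_MARKED;
		if (!m->on_parent_root)	/* m sits strictly inside p */
			p->t_flags &= ~T_UMOUNT_CANDIDATE;
	}
}

static void trim_one(struct tmount *m)
{
	bool found = false, remove_this = false;

	if (!is_candidate(m)) {		/* excluded by an earlier trim_ancestors() */
		remove_from_candidates_list(m);
		return;
	}
	for (int i = 0; i < m->nr_children; i++) {
		const struct tmount *n = m->children[i];

		if (!is_candidate(n)) {	/* child outside S */
			found = true;
			if (!n->on_parent_root) {
				remove_this = true;
				break;
			}
		}
	}
	if (found)
		trim_ancestors(m);
	if (remove_this)
		remove_from_candidates_list(m);
}
```

For a stack of root overmounts, every candidate stays in place but T_MARKED is set on the walked mount. That mark lets a later trim_ancestors() call stop early instead of walking the same chain again.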
+18 -3
drivers/gpu/drm/i915/gem/i915_gemfs.c
···
 
 #include <linux/fs.h>
 #include <linux/mount.h>
+#include <linux/fs_context.h>
 
 #include "i915_drv.h"
 #include "i915_gemfs.h"
 #include "i915_utils.h"
 
+static int add_param(struct fs_context *fc, const char *key, const char *val)
+{
+	return vfs_parse_fs_string(fc, key, val, strlen(val));
+}
+
 void i915_gemfs_init(struct drm_i915_private *i915)
 {
-	char huge_opt[] = "huge=within_size"; /* r/w */
 	struct file_system_type *type;
+	struct fs_context *fc;
 	struct vfsmount *gemfs;
+	int ret;
 
 	/*
 	 * By creating our own shmemfs mountpoint, we can pass in
···
 	if (!type)
 		goto err;
 
-	gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);
-	if (IS_ERR(gemfs))
+	fc = fs_context_for_mount(type, SB_KERNMOUNT);
+	if (IS_ERR(fc))
+		goto err;
+	ret = add_param(fc, "source", "tmpfs");
+	if (!ret)
+		ret = add_param(fc, "huge", "within_size");
+	if (!ret)
+		gemfs = fc_mount_longterm(fc);
+	put_fs_context(fc);
+	if (ret)
 		goto err;
 
 	i915->mm.gemfs = gemfs;
+18 -3
drivers/gpu/drm/v3d/v3d_gemfs.c
···
 
 #include <linux/fs.h>
 #include <linux/mount.h>
+#include <linux/fs_context.h>
 
 #include "v3d_drv.h"
 
+static int add_param(struct fs_context *fc, const char *key, const char *val)
+{
+	return vfs_parse_fs_string(fc, key, val, strlen(val));
+}
+
 void v3d_gemfs_init(struct v3d_dev *v3d)
 {
-	char huge_opt[] = "huge=within_size";
 	struct file_system_type *type;
+	struct fs_context *fc;
 	struct vfsmount *gemfs;
+	int ret;
 
 	/*
 	 * By creating our own shmemfs mountpoint, we can pass in
···
 	if (!type)
 		goto err;
 
-	gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);
-	if (IS_ERR(gemfs))
+	fc = fs_context_for_mount(type, SB_KERNMOUNT);
+	if (IS_ERR(fc))
+		goto err;
+	ret = add_param(fc, "source", "tmpfs");
+	if (!ret)
+		ret = add_param(fc, "huge", "within_size");
+	if (!ret)
+		gemfs = fc_mount_longterm(fc);
+	put_fs_context(fc);
+	if (ret)
 		goto err;
 
 	v3d->gemfs = gemfs;
+1 -1
fs/hugetlbfs/inode.c
···
 	} else {
 		struct hugetlbfs_fs_context *ctx = fc->fs_private;
 		ctx->hstate = h;
-		mnt = fc_mount(fc);
+		mnt = fc_mount_longterm(fc);
 		put_fs_context(fc);
 	}
 	if (IS_ERR(mnt))
+31 -9
fs/mount.h
···
 	struct hlist_node m_hash;
 	struct dentry *m_dentry;
 	struct hlist_head m_list;
-	int m_count;
 };
 
 struct mount {
···
 	struct list_head mnt_list;
 	struct list_head mnt_expire; /* link in fs-specific expiry list */
 	struct list_head mnt_share; /* circular list of shared mounts */
-	struct list_head mnt_slave_list;/* list of slave mounts */
-	struct list_head mnt_slave; /* slave list entry */
+	struct hlist_head mnt_slave_list;/* list of slave mounts */
+	struct hlist_node mnt_slave; /* slave list entry */
 	struct mount *mnt_master; /* slave is on master->mnt_slave_list */
 	struct mnt_namespace *mnt_ns; /* containing namespace */
 	struct mountpoint *mnt_mp; /* where is it mounted */
···
 		struct hlist_node mnt_mp_list; /* list mounts with the same mountpoint */
 		struct hlist_node mnt_umount;
 	};
-	struct list_head mnt_umounting; /* list entry for umount propagation */
 #ifdef CONFIG_FSNOTIFY
 	struct fsnotify_mark_connector __rcu *mnt_fsnotify_marks;
 	__u32 mnt_fsnotify_mask;
 	struct list_head to_notify; /* need to queue notification */
 	struct mnt_namespace *prev_ns; /* previous namespace (NULL if none) */
 #endif
+	int mnt_t_flags; /* namespace_sem-protected flags */
 	int mnt_id; /* mount identifier, reused */
 	u64 mnt_id_unique; /* mount ID unique until reboot */
 	int mnt_group_id; /* peer group identifier */
 	int mnt_expiry_mark; /* true if marked for expiry */
 	struct hlist_head mnt_pins;
 	struct hlist_head mnt_stuck_children;
+	struct mount *overmount; /* mounted on ->mnt_root */
 } __randomize_layout;
+
+enum {
+	T_SHARED = 1, /* mount is shared */
+	T_UNBINDABLE = 2, /* mount is unbindable */
+	T_MARKED = 4, /* internal mark for propagate_... */
+	T_UMOUNT_CANDIDATE = 8, /* for propagate_umount */
+
+	/*
+	 * T_SHARED_MASK is the set of flags that should be cleared when a
+	 * mount becomes shared. Currently, this is only the flag that says a
+	 * mount cannot be bind mounted, since this is how we create a mount
+	 * that shares events with another mount. If you add a new T_*
+	 * flag, consider how it interacts with shared mounts.
+	 */
+	T_SHARED_MASK = T_UNBINDABLE,
+};
 
 #define MNT_NS_INTERNAL ERR_PTR(-EINVAL) /* distinct from any mnt_namespace */
 
···
 	return container_of(mnt, struct mount, mnt);
 }
 
-static inline int mnt_has_parent(struct mount *mnt)
+static inline int mnt_has_parent(const struct mount *mnt)
 {
 	return mnt != mnt->mnt_parent;
 }
···
 
 extern const struct seq_operations mounts_op;
 
-extern bool __is_local_mountpoint(struct dentry *dentry);
-static inline bool is_local_mountpoint(struct dentry *dentry)
+extern bool __is_local_mountpoint(const struct dentry *dentry);
+static inline bool is_local_mountpoint(const struct dentry *dentry)
 {
 	if (!d_mountpoint(dentry))
 		return false;
···
 	return ns->seq == 0;
 }
 
+static inline bool anon_ns_root(const struct mount *m)
+{
+	struct mnt_namespace *ns = READ_ONCE(m->mnt_ns);
+
+	return !IS_ERR_OR_NULL(ns) && is_anon_ns(ns) && m == ns->root;
+}
+
 static inline bool mnt_ns_attached(const struct mount *mnt)
 {
 	return !RB_EMPTY_NODE(&mnt->mnt_node);
···
 	return RB_EMPTY_ROOT(&ns->mounts);
 }
 
-static inline void move_from_ns(struct mount *mnt, struct list_head *dt_list)
+static inline void move_from_ns(struct mount *mnt)
 {
 	struct mnt_namespace *ns = mnt->mnt_ns;
 	WARN_ON(!mnt_ns_attached(mnt));
···
 		ns->mnt_first_node = rb_next(&mnt->mnt_node);
 	rb_erase(&mnt->mnt_node, &ns->mounts);
 	RB_CLEAR_NODE(&mnt->mnt_node);
-	list_add_tail(&mnt->mnt_list, dt_list);
 }
 
 bool has_locked_children(struct mount *mnt, struct dentry *dentry);
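The struct mountpoint hunk above drops m_count. Per the merge description, a mountpoint now lives exactly as long as its ->m_list is non-empty, and a caller that needs to hold it temporarily links a struct pinned_mountpoint into that same list instead of bumping a counter. The userspace sketch below models only that lifetime rule, under loudly stated assumptions: hnode/hhead and every toy_* type and function are hypothetical stand-ins invented for illustration (no locking, no RCU, no dentry handling), not kernel APIs. Only the emptiness test is meant to mirror what maybe_free_mountpoint() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for hlist_head/hlist_node (no RCU, no locking). */
struct hnode {
	struct hnode *next;
	struct hnode **pprev;
};

struct hhead {
	struct hnode *first;
};

static void hadd_head(struct hnode *n, struct hhead *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

static void hdel(struct hnode *n)
{
	assert(n->pprev != NULL);	/* catch deleting a node twice */
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

static bool hempty(const struct hhead *h)
{
	return h->first == NULL;
}

/* Toy mountpoint: no refcount; it stays alive while m_list is non-empty. */
struct toy_mountpoint {
	struct hhead m_list;	/* attached mounts and pins share this list */
	bool freed;		/* set instead of really freeing, so tests can see it */
};

/* Models struct pinned_mountpoint: a pin is just one more m_list node. */
struct toy_pin {
	struct hnode node;
	struct toy_mountpoint *mp;
};

/* Models a mount hanging off a mountpoint through mnt_mp_list. */
struct toy_mount {
	struct hnode mnt_mp_list;
};

/* Same test as maybe_free_mountpoint(): free only when nobody is on m_list. */
static void toy_maybe_free(struct toy_mountpoint *mp)
{
	if (hempty(&mp->m_list))
		mp->freed = true;
}

static void toy_pin_mountpoint(struct toy_pin *p, struct toy_mountpoint *mp)
{
	hadd_head(&p->node, &mp->m_list);
	p->mp = mp;
}

/* Like unpin_mountpoint(): a pin that was never attached (mp == NULL) is a no-op. */
static void toy_unpin_mountpoint(struct toy_pin *p)
{
	if (p->mp) {
		hdel(&p->node);
		toy_maybe_free(p->mp);
	}
}

static void toy_attach(struct toy_mount *m, struct toy_mountpoint *mp)
{
	hadd_head(&m->mnt_mp_list, &mp->m_list);
}

static void toy_detach(struct toy_mount *m, struct toy_mountpoint *mp)
{
	hdel(&m->mnt_mp_list);
	toy_maybe_free(mp);
}
```

Because pins and mounts share one list, "is anyone still using this mountpoint?" reduces to a single emptiness check, with no separate counter to keep in sync. The kernel version also clears DCACHE_MOUNTED on the dentry and frees the structure. The sketch replaces both with a flag.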
+292 -419
fs/namespace.c
··· 79 79 static DECLARE_RWSEM(namespace_sem); 80 80 static HLIST_HEAD(unmounted); /* protected by namespace_sem */ 81 81 static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */ 82 + static struct mnt_namespace *emptied_ns; /* protected by namespace_sem */ 82 83 static DEFINE_SEQLOCK(mnt_ns_tree_lock); 83 84 84 85 #ifdef CONFIG_FSNOTIFY ··· 381 380 INIT_LIST_HEAD(&mnt->mnt_list); 382 381 INIT_LIST_HEAD(&mnt->mnt_expire); 383 382 INIT_LIST_HEAD(&mnt->mnt_share); 384 - INIT_LIST_HEAD(&mnt->mnt_slave_list); 385 - INIT_LIST_HEAD(&mnt->mnt_slave); 383 + INIT_HLIST_HEAD(&mnt->mnt_slave_list); 384 + INIT_HLIST_NODE(&mnt->mnt_slave); 386 385 INIT_HLIST_NODE(&mnt->mnt_mp_list); 387 - INIT_LIST_HEAD(&mnt->mnt_umounting); 388 386 INIT_HLIST_HEAD(&mnt->mnt_stuck_children); 389 387 RB_CLEAR_NODE(&mnt->mnt_node); 390 388 mnt->mnt.mnt_idmap = &nop_mnt_idmap; ··· 894 894 * namespace not just a mount that happens to have some specified 895 895 * parent mount. 896 896 */ 897 - bool __is_local_mountpoint(struct dentry *dentry) 897 + bool __is_local_mountpoint(const struct dentry *dentry) 898 898 { 899 899 struct mnt_namespace *ns = current->nsproxy->mnt_ns; 900 900 struct mount *mnt, *n; ··· 911 911 return is_covered; 912 912 } 913 913 914 - static struct mountpoint *lookup_mountpoint(struct dentry *dentry) 914 + struct pinned_mountpoint { 915 + struct hlist_node node; 916 + struct mountpoint *mp; 917 + }; 918 + 919 + static bool lookup_mountpoint(struct dentry *dentry, struct pinned_mountpoint *m) 915 920 { 916 921 struct hlist_head *chain = mp_hash(dentry); 917 922 struct mountpoint *mp; 918 923 919 924 hlist_for_each_entry(mp, chain, m_hash) { 920 925 if (mp->m_dentry == dentry) { 921 - mp->m_count++; 922 - return mp; 926 + hlist_add_head(&m->node, &mp->m_list); 927 + m->mp = mp; 928 + return true; 923 929 } 924 930 } 925 - return NULL; 931 + return false; 926 932 } 927 933 928 - static struct mountpoint *get_mountpoint(struct dentry *dentry) 934 + static int 
get_mountpoint(struct dentry *dentry, struct pinned_mountpoint *m) 929 935 { 930 - struct mountpoint *mp, *new = NULL; 936 + struct mountpoint *mp __free(kfree) = NULL; 937 + bool found; 931 938 int ret; 932 939 933 940 if (d_mountpoint(dentry)) { 934 941 /* might be worth a WARN_ON() */ 935 942 if (d_unlinked(dentry)) 936 - return ERR_PTR(-ENOENT); 943 + return -ENOENT; 937 944 mountpoint: 938 945 read_seqlock_excl(&mount_lock); 939 - mp = lookup_mountpoint(dentry); 946 + found = lookup_mountpoint(dentry, m); 940 947 read_sequnlock_excl(&mount_lock); 941 - if (mp) 942 - goto done; 948 + if (found) 949 + return 0; 943 950 } 944 951 945 - if (!new) 946 - new = kmalloc(sizeof(struct mountpoint), GFP_KERNEL); 947 - if (!new) 948 - return ERR_PTR(-ENOMEM); 949 - 952 + if (!mp) 953 + mp = kmalloc(sizeof(struct mountpoint), GFP_KERNEL); 954 + if (!mp) 955 + return -ENOMEM; 950 956 951 957 /* Exactly one processes may set d_mounted */ 952 958 ret = d_set_mounted(dentry); ··· 962 956 goto mountpoint; 963 957 964 958 /* The dentry is not available as a mountpoint? */ 965 - mp = ERR_PTR(ret); 966 959 if (ret) 967 - goto done; 960 + return ret; 968 961 969 962 /* Add the new mountpoint to the hash table */ 970 963 read_seqlock_excl(&mount_lock); 971 - new->m_dentry = dget(dentry); 972 - new->m_count = 1; 973 - hlist_add_head(&new->m_hash, mp_hash(dentry)); 974 - INIT_HLIST_HEAD(&new->m_list); 964 + mp->m_dentry = dget(dentry); 965 + hlist_add_head(&mp->m_hash, mp_hash(dentry)); 966 + INIT_HLIST_HEAD(&mp->m_list); 967 + hlist_add_head(&m->node, &mp->m_list); 968 + m->mp = no_free_ptr(mp); 975 969 read_sequnlock_excl(&mount_lock); 976 - 977 - mp = new; 978 - new = NULL; 979 - done: 980 - kfree(new); 981 - return mp; 970 + return 0; 982 971 } 983 972 984 973 /* 985 974 * vfsmount lock must be held. Additionally, the caller is responsible 986 975 * for serializing calls for given disposal list. 
987 976 */ 988 - static void __put_mountpoint(struct mountpoint *mp, struct list_head *list) 977 + static void maybe_free_mountpoint(struct mountpoint *mp, struct list_head *list) 989 978 { 990 - if (!--mp->m_count) { 979 + if (hlist_empty(&mp->m_list)) { 991 980 struct dentry *dentry = mp->m_dentry; 992 - BUG_ON(!hlist_empty(&mp->m_list)); 993 981 spin_lock(&dentry->d_lock); 994 982 dentry->d_flags &= ~DCACHE_MOUNTED; 995 983 spin_unlock(&dentry->d_lock); ··· 993 993 } 994 994 } 995 995 996 - /* called with namespace_lock and vfsmount lock */ 997 - static void put_mountpoint(struct mountpoint *mp) 996 + /* 997 + * locks: mount_lock [read_seqlock_excl], namespace_sem [excl] 998 + */ 999 + static void unpin_mountpoint(struct pinned_mountpoint *m) 998 1000 { 999 - __put_mountpoint(mp, &ex_mountpoints); 1001 + if (m->mp) { 1002 + hlist_del(&m->node); 1003 + maybe_free_mountpoint(m->mp, &ex_mountpoints); 1004 + } 1000 1005 } 1001 1006 1002 1007 static inline int check_mnt(struct mount *mnt) ··· 1043 1038 } 1044 1039 1045 1040 /* 1046 - * vfsmount lock must be held for write 1041 + * locks: mount_lock[write_seqlock] 1047 1042 */ 1048 - static struct mountpoint *unhash_mnt(struct mount *mnt) 1043 + static void __umount_mnt(struct mount *mnt, struct list_head *shrink_list) 1049 1044 { 1050 1045 struct mountpoint *mp; 1046 + struct mount *parent = mnt->mnt_parent; 1047 + if (unlikely(parent->overmount == mnt)) 1048 + parent->overmount = NULL; 1051 1049 mnt->mnt_parent = mnt; 1052 1050 mnt->mnt_mountpoint = mnt->mnt.mnt_root; 1053 1051 list_del_init(&mnt->mnt_child); ··· 1058 1050 hlist_del_init(&mnt->mnt_mp_list); 1059 1051 mp = mnt->mnt_mp; 1060 1052 mnt->mnt_mp = NULL; 1061 - return mp; 1053 + maybe_free_mountpoint(mp, shrink_list); 1062 1054 } 1063 1055 1064 1056 /* 1065 - * vfsmount lock must be held for write 1057 + * locks: mount_lock[write_seqlock], namespace_sem[excl] (for ex_mountpoints) 1066 1058 */ 1067 1059 static void umount_mnt(struct mount *mnt) 1068 1060 { 
1069 - put_mountpoint(unhash_mnt(mnt)); 1061 + __umount_mnt(mnt, &ex_mountpoints); 1070 1062 } 1071 1063 1072 1064 /* ··· 1076 1068 struct mountpoint *mp, 1077 1069 struct mount *child_mnt) 1078 1070 { 1079 - mp->m_count++; 1080 - mnt_add_count(mnt, 1); /* essentially, that's mntget */ 1081 1071 child_mnt->mnt_mountpoint = mp->m_dentry; 1082 1072 child_mnt->mnt_parent = mnt; 1083 1073 child_mnt->mnt_mp = mp; 1084 1074 hlist_add_head(&child_mnt->mnt_mp_list, &mp->m_list); 1085 1075 } 1086 1076 1087 - /** 1088 - * mnt_set_mountpoint_beneath - mount a mount beneath another one 1089 - * 1090 - * @new_parent: the source mount 1091 - * @top_mnt: the mount beneath which @new_parent is mounted 1092 - * @new_mp: the new mountpoint of @top_mnt on @new_parent 1093 - * 1094 - * Remove @top_mnt from its current mountpoint @top_mnt->mnt_mp and 1095 - * parent @top_mnt->mnt_parent and mount it on top of @new_parent at 1096 - * @new_mp. And mount @new_parent on the old parent and old 1097 - * mountpoint of @top_mnt. 1098 - * 1099 - * Context: This function expects namespace_lock() and lock_mount_hash() 1100 - * to have been acquired in that order. 
1101 - */ 1102 - static void mnt_set_mountpoint_beneath(struct mount *new_parent, 1103 - struct mount *top_mnt, 1104 - struct mountpoint *new_mp) 1077 + static void make_visible(struct mount *mnt) 1105 1078 { 1106 - struct mount *old_top_parent = top_mnt->mnt_parent; 1107 - struct mountpoint *old_top_mp = top_mnt->mnt_mp; 1108 - 1109 - mnt_set_mountpoint(old_top_parent, old_top_mp, new_parent); 1110 - mnt_change_mountpoint(new_parent, new_mp, top_mnt); 1111 - } 1112 - 1113 - 1114 - static void __attach_mnt(struct mount *mnt, struct mount *parent) 1115 - { 1079 + struct mount *parent = mnt->mnt_parent; 1080 + if (unlikely(mnt->mnt_mountpoint == parent->mnt.mnt_root)) 1081 + parent->overmount = mnt; 1116 1082 hlist_add_head_rcu(&mnt->mnt_hash, 1117 1083 m_hash(&parent->mnt, mnt->mnt_mountpoint)); 1118 1084 list_add_tail(&mnt->mnt_child, &parent->mnt_mounts); ··· 1098 1116 * @parent: the parent 1099 1117 * @mnt: the new mount 1100 1118 * @mp: the new mountpoint 1101 - * @beneath: whether to mount @mnt beneath or on top of @parent 1102 1119 * 1103 - * If @beneath is false, mount @mnt at @mp on @parent. Then attach @mnt 1120 + * Mount @mnt at @mp on @parent. Then attach @mnt 1104 1121 * to @parent's child mount list and to @mount_hashtable. 1105 1122 * 1106 - * If @beneath is true, remove @mnt from its current parent and 1107 - * mountpoint and mount it on @mp on @parent, and mount @parent on the 1108 - * old parent and old mountpoint of @mnt. Finally, attach @parent to 1109 - * @mnt_hashtable and @parent->mnt_parent->mnt_mounts. 1110 - * 1111 - * Note, when __attach_mnt() is called @mnt->mnt_parent already points 1123 + * Note, when make_visible() is called @mnt->mnt_parent already points 1112 1124 * to the correct parent. 1113 1125 * 1114 1126 * Context: This function expects namespace_lock() and lock_mount_hash() 1115 1127 * to have been acquired in that order. 
1116 1128 */ 1117 1129 static void attach_mnt(struct mount *mnt, struct mount *parent, 1118 - struct mountpoint *mp, bool beneath) 1130 + struct mountpoint *mp) 1119 1131 { 1120 - if (beneath) 1121 - mnt_set_mountpoint_beneath(mnt, parent, mp); 1122 - else 1123 - mnt_set_mountpoint(parent, mp, mnt); 1124 - /* 1125 - * Note, @mnt->mnt_parent has to be used. If @mnt was mounted 1126 - * beneath @parent then @mnt will need to be attached to 1127 - * @parent's old parent, not @parent. IOW, @mnt->mnt_parent 1128 - * isn't the same mount as @parent. 1129 - */ 1130 - __attach_mnt(mnt, mnt->mnt_parent); 1132 + mnt_set_mountpoint(parent, mp, mnt); 1133 + make_visible(mnt); 1131 1134 } 1132 1135 1133 1136 void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp, struct mount *mnt) 1134 1137 { 1135 1138 struct mountpoint *old_mp = mnt->mnt_mp; 1136 - struct mount *old_parent = mnt->mnt_parent; 1137 1139 1138 1140 list_del_init(&mnt->mnt_child); 1139 1141 hlist_del_init(&mnt->mnt_mp_list); 1140 1142 hlist_del_init_rcu(&mnt->mnt_hash); 1141 1143 1142 - attach_mnt(mnt, parent, mp, false); 1144 + attach_mnt(mnt, parent, mp); 1143 1145 1144 - put_mountpoint(old_mp); 1145 - mnt_add_count(old_parent, -1); 1146 + maybe_free_mountpoint(old_mp, &ex_mountpoints); 1146 1147 } 1147 1148 1148 1149 static inline struct mount *node_to_mount(struct rb_node *node) ··· 1162 1197 mnt_notify_add(mnt); 1163 1198 } 1164 1199 1165 - /* 1166 - * vfsmount lock must be held for write 1167 - */ 1168 - static void commit_tree(struct mount *mnt) 1169 - { 1170 - struct mount *parent = mnt->mnt_parent; 1171 - struct mount *m; 1172 - LIST_HEAD(head); 1173 - struct mnt_namespace *n = parent->mnt_ns; 1174 - 1175 - BUG_ON(parent == mnt); 1176 - 1177 - list_add_tail(&head, &mnt->mnt_list); 1178 - while (!list_empty(&head)) { 1179 - m = list_first_entry(&head, typeof(*m), mnt_list); 1180 - list_del(&m->mnt_list); 1181 - 1182 - mnt_add_to_ns(n, m); 1183 - } 1184 - n->nr_mounts += n->pending_mounts; 
1185 - n->pending_mounts = 0; 1186 - 1187 - __attach_mnt(mnt, parent); 1188 - touch_mnt_namespace(n); 1189 - } 1190 - 1191 1200 static struct mount *next_mnt(struct mount *p, struct mount *root) 1192 1201 { 1193 1202 struct list_head *next = p->mnt_mounts.next; ··· 1186 1247 prev = p->mnt_mounts.prev; 1187 1248 } 1188 1249 return p; 1250 + } 1251 + 1252 + /* 1253 + * vfsmount lock must be held for write 1254 + */ 1255 + static void commit_tree(struct mount *mnt) 1256 + { 1257 + struct mnt_namespace *n = mnt->mnt_parent->mnt_ns; 1258 + 1259 + if (!mnt_ns_attached(mnt)) { 1260 + for (struct mount *m = mnt; m; m = next_mnt(m, mnt)) 1261 + if (unlikely(mnt_ns_attached(m))) 1262 + m = skip_mnt_tree(m); 1263 + else 1264 + mnt_add_to_ns(n, m); 1265 + n->nr_mounts += n->pending_mounts; 1266 + n->pending_mounts = 0; 1267 + } 1268 + 1269 + make_visible(mnt); 1270 + touch_mnt_namespace(n); 1189 1271 } 1190 1272 1191 1273 /** ··· 1256 1296 } 1257 1297 EXPORT_SYMBOL(fc_mount); 1258 1298 1299 + struct vfsmount *fc_mount_longterm(struct fs_context *fc) 1300 + { 1301 + struct vfsmount *mnt = fc_mount(fc); 1302 + if (!IS_ERR(mnt)) 1303 + real_mount(mnt)->mnt_ns = MNT_NS_INTERNAL; 1304 + return mnt; 1305 + } 1306 + EXPORT_SYMBOL(fc_mount_longterm); 1307 + 1259 1308 struct vfsmount *vfs_kern_mount(struct file_system_type *type, 1260 1309 int flags, const char *name, 1261 1310 void *data) ··· 1306 1337 if (!mnt) 1307 1338 return ERR_PTR(-ENOMEM); 1308 1339 1309 - if (flag & (CL_SLAVE | CL_PRIVATE | CL_SHARED_TO_SLAVE)) 1340 + mnt->mnt.mnt_flags = READ_ONCE(old->mnt.mnt_flags) & 1341 + ~MNT_INTERNAL_FLAGS; 1342 + 1343 + if (flag & (CL_SLAVE | CL_PRIVATE)) 1310 1344 mnt->mnt_group_id = 0; /* not a peer of original */ 1311 1345 else 1312 1346 mnt->mnt_group_id = old->mnt_group_id; ··· 1320 1348 goto out_free; 1321 1349 } 1322 1350 1323 - mnt->mnt.mnt_flags = old->mnt.mnt_flags; 1324 - mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL); 1351 + if (mnt->mnt_group_id) 1352 + 
set_mnt_shared(mnt); 1325 1353 1326 1354 atomic_inc(&sb->s_active); 1327 1355 mnt->mnt.mnt_idmap = mnt_idmap_get(mnt_idmap(&old->mnt)); ··· 1334 1362 list_add_tail(&mnt->mnt_instance, &sb->s_mounts); 1335 1363 unlock_mount_hash(); 1336 1364 1337 - if ((flag & CL_SLAVE) || 1338 - ((flag & CL_SHARED_TO_SLAVE) && IS_MNT_SHARED(old))) { 1339 - list_add(&mnt->mnt_slave, &old->mnt_slave_list); 1365 + if (flag & CL_PRIVATE) // we are done with it 1366 + return mnt; 1367 + 1368 + if (peers(mnt, old)) 1369 + list_add(&mnt->mnt_share, &old->mnt_share); 1370 + 1371 + if ((flag & CL_SLAVE) && old->mnt_group_id) { 1372 + hlist_add_head(&mnt->mnt_slave, &old->mnt_slave_list); 1340 1373 mnt->mnt_master = old; 1341 - CLEAR_MNT_SHARED(mnt); 1342 - } else if (!(flag & CL_PRIVATE)) { 1343 - if ((flag & CL_MAKE_SHARED) || IS_MNT_SHARED(old)) 1344 - list_add(&mnt->mnt_share, &old->mnt_share); 1345 - if (IS_MNT_SLAVE(old)) 1346 - list_add(&mnt->mnt_slave, &old->mnt_slave); 1374 + } else if (IS_MNT_SLAVE(old)) { 1375 + hlist_add_behind(&mnt->mnt_slave, &old->mnt_slave); 1347 1376 mnt->mnt_master = old->mnt_master; 1348 - } else { 1349 - CLEAR_MNT_SHARED(mnt); 1350 1377 } 1351 - if (flag & CL_MAKE_SHARED) 1352 - set_mnt_shared(mnt); 1353 - 1354 - /* stick the duplicate mount on the same expiry list 1355 - * as the original if that was on one */ 1356 - if (flag & CL_EXPIRE) { 1357 - if (!list_empty(&old->mnt_expire)) 1358 - list_add(&mnt->mnt_expire, &old->mnt_expire); 1359 - } 1360 - 1361 1378 return mnt; 1362 1379 1363 1380 out_free: ··· 1439 1478 rcu_read_unlock(); 1440 1479 1441 1480 list_del(&mnt->mnt_instance); 1481 + if (unlikely(!list_empty(&mnt->mnt_expire))) 1482 + list_del(&mnt->mnt_expire); 1442 1483 1443 1484 if (unlikely(!list_empty(&mnt->mnt_mounts))) { 1444 1485 struct mount *p, *tmp; 1445 1486 list_for_each_entry_safe(p, tmp, &mnt->mnt_mounts, mnt_child) { 1446 - __put_mountpoint(unhash_mnt(p), &list); 1487 + __umount_mnt(p, &list); 1447 1488 hlist_add_head(&p->mnt_umount, 
&mnt->mnt_stuck_children); 1448 1489 } 1449 1490 } ··· 1642 1679 int may_umount_tree(struct vfsmount *m) 1643 1680 { 1644 1681 struct mount *mnt = real_mount(m); 1645 - int actual_refs = 0; 1646 - int minimum_refs = 0; 1647 - struct mount *p; 1648 - BUG_ON(!m); 1682 + bool busy = false; 1649 1683 1650 1684 /* write lock needed for mnt_get_count */ 1651 1685 lock_mount_hash(); 1652 - for (p = mnt; p; p = next_mnt(p, mnt)) { 1653 - actual_refs += mnt_get_count(p); 1654 - minimum_refs += 2; 1686 + for (struct mount *p = mnt; p; p = next_mnt(p, mnt)) { 1687 + if (mnt_get_count(p) > (p == mnt ? 2 : 1)) { 1688 + busy = true; 1689 + break; 1690 + } 1655 1691 } 1656 1692 unlock_mount_hash(); 1657 1693 1658 - if (actual_refs > minimum_refs) 1659 - return 0; 1660 - 1661 - return 1; 1694 + return !busy; 1662 1695 } 1663 1696 1664 1697 EXPORT_SYMBOL(may_umount_tree); ··· 1730 1771 } 1731 1772 #endif 1732 1773 1774 + static void free_mnt_ns(struct mnt_namespace *); 1733 1775 static void namespace_unlock(void) 1734 1776 { 1735 1777 struct hlist_head head; 1736 1778 struct hlist_node *p; 1737 1779 struct mount *m; 1780 + struct mnt_namespace *ns = emptied_ns; 1738 1781 LIST_HEAD(list); 1739 1782 1740 1783 hlist_move_list(&unmounted, &head); 1741 1784 list_splice_init(&ex_mountpoints, &list); 1785 + emptied_ns = NULL; 1742 1786 1743 1787 if (need_notify_mnt_list()) { 1744 1788 /* ··· 1754 1792 up_read(&namespace_sem); 1755 1793 } else { 1756 1794 up_write(&namespace_sem); 1795 + } 1796 + if (unlikely(ns)) { 1797 + /* Make sure we notice when we leak mounts. 
*/ 1798 + VFS_WARN_ON_ONCE(!mnt_ns_empty(ns)); 1799 + free_mnt_ns(ns); 1757 1800 } 1758 1801 1759 1802 shrink_dentry_list(&list); ··· 1832 1865 for (p = mnt; p; p = next_mnt(p, mnt)) { 1833 1866 p->mnt.mnt_flags |= MNT_UMOUNT; 1834 1867 if (mnt_ns_attached(p)) 1835 - move_from_ns(p, &tmp_list); 1836 - else 1837 - list_move(&p->mnt_list, &tmp_list); 1868 + move_from_ns(p); 1869 + list_add_tail(&p->mnt_list, &tmp_list); 1838 1870 } 1839 1871 1840 1872 /* Hide the mounts from mnt_mounts */ ··· 1862 1896 1863 1897 disconnect = disconnect_mount(p, how); 1864 1898 if (mnt_has_parent(p)) { 1865 - mnt_add_count(p->mnt_parent, -1); 1866 1899 if (!disconnect) { 1867 1900 /* Don't forget about p */ 1868 1901 list_add_tail(&p->mnt_child, &p->mnt_parent->mnt_mounts); ··· 1938 1973 * all race cases, but it's a slowpath. 1939 1974 */ 1940 1975 lock_mount_hash(); 1941 - if (mnt_get_count(mnt) != 2) { 1976 + if (!list_empty(&mnt->mnt_mounts) || mnt_get_count(mnt) != 2) { 1942 1977 unlock_mount_hash(); 1943 1978 return -EBUSY; 1944 1979 } ··· 1984 2019 namespace_lock(); 1985 2020 lock_mount_hash(); 1986 2021 1987 - /* Recheck MNT_LOCKED with the locks held */ 2022 + /* Repeat the earlier racy checks, now that we are holding the locks */ 1988 2023 retval = -EINVAL; 2024 + if (!check_mnt(mnt)) 2025 + goto out; 2026 + 1989 2027 if (mnt->mnt.mnt_flags & MNT_LOCKED) 2028 + goto out; 2029 + 2030 + if (!mnt_has_parent(mnt)) /* not the absolute root */ 1990 2031 goto out; 1991 2032 1992 2033 event++; 1993 2034 if (flags & MNT_DETACH) { 1994 - if (mnt_ns_attached(mnt) || !list_empty(&mnt->mnt_list)) 1995 - umount_tree(mnt, UMOUNT_PROPAGATE); 2035 + umount_tree(mnt, UMOUNT_PROPAGATE); 1996 2036 retval = 0; 1997 2037 } else { 1998 2038 smp_mb(); // paired with __legitimize_mnt() 1999 2039 shrink_submounts(mnt); 2000 2040 retval = -EBUSY; 2001 2041 if (!propagate_mount_busy(mnt, 2)) { 2002 - if (mnt_ns_attached(mnt) || !list_empty(&mnt->mnt_list)) 2003 - umount_tree(mnt, 
UMOUNT_PROPAGATE|UMOUNT_SYNC); 2042 + umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC); 2004 2043 retval = 0; 2005 2044 } 2006 2045 } ··· 2026 2057 */ 2027 2058 void __detach_mounts(struct dentry *dentry) 2028 2059 { 2029 - struct mountpoint *mp; 2060 + struct pinned_mountpoint mp = {}; 2030 2061 struct mount *mnt; 2031 2062 2032 2063 namespace_lock(); 2033 2064 lock_mount_hash(); 2034 - mp = lookup_mountpoint(dentry); 2035 - if (!mp) 2065 + if (!lookup_mountpoint(dentry, &mp)) 2036 2066 goto out_unlock; 2037 2067 2038 2068 event++; 2039 - while (!hlist_empty(&mp->m_list)) { 2040 - mnt = hlist_entry(mp->m_list.first, struct mount, mnt_mp_list); 2069 + while (mp.node.next) { 2070 + mnt = hlist_entry(mp.node.next, struct mount, mnt_mp_list); 2041 2071 if (mnt->mnt.mnt_flags & MNT_UMOUNT) { 2042 2072 umount_mnt(mnt); 2043 2073 hlist_add_head(&mnt->mnt_umount, &unmounted); 2044 2074 } 2045 2075 else umount_tree(mnt, UMOUNT_CONNECTED); 2046 2076 } 2047 - put_mountpoint(mp); 2077 + unpin_mountpoint(&mp); 2048 2078 out_unlock: 2049 2079 unlock_mount_hash(); 2050 2080 namespace_unlock(); ··· 2227 2259 return dst_mnt; 2228 2260 2229 2261 src_parent = src_root; 2230 - dst_mnt->mnt_mountpoint = src_root->mnt_mountpoint; 2231 2262 2232 2263 list_for_each_entry(src_root_child, &src_root->mnt_mounts, mnt_child) { 2233 2264 if (!is_subdir(src_root_child->mnt_mountpoint, dentry)) ··· 2261 2294 if (IS_ERR(dst_mnt)) 2262 2295 goto out; 2263 2296 lock_mount_hash(); 2264 - list_add_tail(&dst_mnt->mnt_list, &res->mnt_list); 2265 - attach_mnt(dst_mnt, dst_parent, src_parent->mnt_mp, false); 2297 + if (src_mnt->mnt.mnt_flags & MNT_LOCKED) 2298 + dst_mnt->mnt.mnt_flags |= MNT_LOCKED; 2299 + if (unlikely(flag & CL_EXPIRE)) { 2300 + /* stick the duplicate mount on the same expiry 2301 + * list as the original if that was on one */ 2302 + if (!list_empty(&src_mnt->mnt_expire)) 2303 + list_add(&dst_mnt->mnt_expire, 2304 + &src_mnt->mnt_expire); 2305 + } 2306 + attach_mnt(dst_mnt, dst_parent, 
src_parent->mnt_mp); 2266 2307 unlock_mount_hash(); 2267 2308 } 2268 2309 } ··· 2343 2368 kfree(paths); 2344 2369 } 2345 2370 2346 - static void free_mnt_ns(struct mnt_namespace *); 2347 2371 static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *, bool); 2348 - 2349 - static inline bool must_dissolve(struct mnt_namespace *mnt_ns) 2350 - { 2351 - /* 2352 - * This mount belonged to an anonymous mount namespace 2353 - * but was moved to a non-anonymous mount namespace and 2354 - * then unmounted. 2355 - */ 2356 - if (unlikely(!mnt_ns)) 2357 - return false; 2358 - 2359 - /* 2360 - * This mount belongs to a non-anonymous mount namespace 2361 - * and we know that such a mount can never transition to 2362 - * an anonymous mount namespace again. 2363 - */ 2364 - if (!is_anon_ns(mnt_ns)) { 2365 - /* 2366 - * A detached mount either belongs to an anonymous mount 2367 - * namespace or a non-anonymous mount namespace. It 2368 - * should never belong to something purely internal. 2369 - */ 2370 - VFS_WARN_ON_ONCE(mnt_ns == MNT_NS_INTERNAL); 2371 - return false; 2372 - } 2373 - 2374 - return true; 2375 - } 2376 2372 2377 2373 void dissolve_on_fput(struct vfsmount *mnt) 2378 2374 { 2379 - struct mnt_namespace *ns; 2380 2375 struct mount *m = real_mount(mnt); 2381 2376 2377 + /* 2378 + * m used to be the root of anon namespace; if it still is one, 2379 + * we need to dissolve the mount tree and free that namespace. 2380 + * Let's try to avoid taking namespace_sem if we can determine 2381 + * that there's nothing to do without it - rcu_read_lock() is 2382 + * enough to make anon_ns_root() memory-safe and once m has 2383 + * left its namespace, it's no longer our concern, since it will 2384 + * never become a root of anon ns again. 
2385 + */ 2386 + 2382 2387 scoped_guard(rcu) { 2383 - if (!must_dissolve(READ_ONCE(m->mnt_ns))) 2388 + if (!anon_ns_root(m)) 2384 2389 return; 2385 2390 } 2386 2391 2387 2392 scoped_guard(namespace_lock, &namespace_sem) { 2388 - ns = m->mnt_ns; 2389 - if (!must_dissolve(ns)) 2393 + if (!anon_ns_root(m)) 2390 2394 return; 2391 2395 2392 - /* 2393 - * After must_dissolve() we know that this is a detached 2394 - * mount in an anonymous mount namespace. 2395 - * 2396 - * Now when mnt_has_parent() reports that this mount 2397 - * tree has a parent, we know that this anonymous mount 2398 - * tree has been moved to another anonymous mount 2399 - * namespace. 2400 - * 2401 - * So when closing this file we cannot unmount the mount 2402 - * tree. This will be done when the file referring to 2403 - * the root of the anonymous mount namespace will be 2404 - * closed (It could already be closed but it would sync 2405 - * on @namespace_sem and wait for us to finish.). 2406 - */ 2407 - if (mnt_has_parent(m)) 2408 - return; 2409 - 2396 + emptied_ns = m->mnt_ns; 2410 2397 lock_mount_hash(); 2411 2398 umount_tree(m, UMOUNT_CONNECTED); 2412 2399 unlock_mount_hash(); 2413 2400 } 2414 - 2415 - /* Make sure we notice when we leak mounts. */ 2416 - VFS_WARN_ON_ONCE(!mnt_ns_empty(ns)); 2417 - free_mnt_ns(ns); 2418 2401 } 2419 2402 2420 2403 static bool __has_locked_children(struct mount *mnt, struct dentry *dentry) ··· 2451 2518 * loops get created. 
2452 2519 */ 2453 2520 if (!check_mnt(old_mnt)) { 2454 - if (!is_mounted(&old_mnt->mnt) || 2455 - !is_anon_ns(old_mnt->mnt_ns) || 2456 - mnt_has_parent(old_mnt)) 2521 + if (!anon_ns_root(old_mnt)) 2457 2522 return ERR_PTR(-EINVAL); 2458 2523 2459 2524 if (!check_for_nsfs_mounts(old_mnt)) ··· 2495 2564 if (flags & MNT_NOEXEC) 2496 2565 flags |= MNT_LOCK_NOEXEC; 2497 2566 /* Don't allow unprivileged users to reveal what is under a mount */ 2498 - if (list_empty(&p->mnt_expire)) 2567 + if (list_empty(&p->mnt_expire) && p != mnt) 2499 2568 flags |= MNT_LOCKED; 2500 2569 p->mnt.mnt_flags = flags; 2501 2570 } ··· 2516 2585 struct mount *p; 2517 2586 2518 2587 for (p = mnt; p; p = recurse ? next_mnt(p, mnt) : NULL) { 2519 - if (!p->mnt_group_id && !IS_MNT_SHARED(p)) { 2588 + if (!p->mnt_group_id) { 2520 2589 int err = mnt_alloc_group_id(p); 2521 2590 if (err) { 2522 2591 cleanup_group_ids(mnt, p); ··· 2552 2621 } 2553 2622 2554 2623 enum mnt_tree_flags_t { 2555 - MNT_TREE_MOVE = BIT(0), 2556 - MNT_TREE_BENEATH = BIT(1), 2557 - MNT_TREE_PROPAGATION = BIT(2), 2624 + MNT_TREE_BENEATH = BIT(0), 2625 + MNT_TREE_PROPAGATION = BIT(1), 2558 2626 }; 2559 2627 2560 2628 /** 2561 2629 * attach_recursive_mnt - attach a source mount tree 2562 2630 * @source_mnt: mount tree to be attached 2563 - * @top_mnt: mount that @source_mnt will be mounted on or mounted beneath 2631 + * @dest_mnt: mount that @source_mnt will be mounted on 2564 2632 * @dest_mp: the mountpoint @source_mnt will be mounted at 2565 - * @flags: modify how @source_mnt is supposed to be attached 2566 2633 * 2567 2634 * NOTE: in the table below explains the semantics when a source mount 2568 2635 * of a given type is attached to a destination mount of a given type. ··· 2623 2694 * Otherwise a negative error code is returned. 
2624 2695 */ 2625 2696 static int attach_recursive_mnt(struct mount *source_mnt, 2626 - struct mount *top_mnt, 2627 - struct mountpoint *dest_mp, 2628 - enum mnt_tree_flags_t flags) 2697 + struct mount *dest_mnt, 2698 + struct mountpoint *dest_mp) 2629 2699 { 2630 2700 struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns; 2631 2701 HLIST_HEAD(tree_list); 2632 - struct mnt_namespace *ns = top_mnt->mnt_ns; 2633 - struct mountpoint *smp; 2634 - struct mount *child, *dest_mnt, *p; 2702 + struct mnt_namespace *ns = dest_mnt->mnt_ns; 2703 + struct pinned_mountpoint root = {}; 2704 + struct mountpoint *shorter = NULL; 2705 + struct mount *child, *p; 2706 + struct mount *top; 2635 2707 struct hlist_node *n; 2636 2708 int err = 0; 2637 - bool moving = flags & MNT_TREE_MOVE, beneath = flags & MNT_TREE_BENEATH; 2709 + bool moving = mnt_has_parent(source_mnt); 2638 2710 2639 2711 /* 2640 2712 * Preallocate a mountpoint in case the new mounts need to be 2641 2713 * mounted beneath mounts on the same mountpoint. 2642 2714 */ 2643 - smp = get_mountpoint(source_mnt->mnt.mnt_root); 2644 - if (IS_ERR(smp)) 2645 - return PTR_ERR(smp); 2715 + for (top = source_mnt; unlikely(top->overmount); top = top->overmount) { 2716 + if (!shorter && is_mnt_ns_file(top->mnt.mnt_root)) 2717 + shorter = top->mnt_mp; 2718 + } 2719 + err = get_mountpoint(top->mnt.mnt_root, &root); 2720 + if (err) 2721 + return err; 2646 2722 2647 2723 /* Is there space to add these mounts to the mount namespace? 
*/ 2648 2724 if (!moving) { ··· 2655 2721 if (err) 2656 2722 goto out; 2657 2723 } 2658 - 2659 - if (beneath) 2660 - dest_mnt = top_mnt->mnt_parent; 2661 - else 2662 - dest_mnt = top_mnt; 2663 2724 2664 2725 if (IS_MNT_SHARED(dest_mnt)) { 2665 2726 err = invent_group_ids(source_mnt, true); ··· 2672 2743 } 2673 2744 2674 2745 if (moving) { 2675 - if (beneath) 2676 - dest_mp = smp; 2677 - unhash_mnt(source_mnt); 2678 - attach_mnt(source_mnt, top_mnt, dest_mp, beneath); 2746 + umount_mnt(source_mnt); 2679 2747 mnt_notify_add(source_mnt); 2680 - touch_mnt_namespace(source_mnt->mnt_ns); 2748 + /* if the mount is moved, it should no longer be expired 2749 + * automatically */ 2750 + list_del_init(&source_mnt->mnt_expire); 2681 2751 } else { 2682 2752 if (source_mnt->mnt_ns) { 2683 - LIST_HEAD(head); 2684 - 2685 2753 /* move from anon - the caller will destroy */ 2754 + emptied_ns = source_mnt->mnt_ns; 2686 2755 for (p = source_mnt; p; p = next_mnt(p, source_mnt)) 2687 - move_from_ns(p, &head); 2688 - list_del_init(&head); 2756 + move_from_ns(p); 2689 2757 } 2690 - if (beneath) 2691 - mnt_set_mountpoint_beneath(source_mnt, top_mnt, smp); 2692 - else 2693 - mnt_set_mountpoint(dest_mnt, dest_mp, source_mnt); 2694 - commit_tree(source_mnt); 2695 2758 } 2759 + 2760 + mnt_set_mountpoint(dest_mnt, dest_mp, source_mnt); 2761 + /* 2762 + * Now the original copy is in the same state as the secondaries - 2763 + * its root attached to mountpoint, but not hashed and all mounts 2764 + * in it are either in our namespace or in no namespace at all. 2765 + * Add the original to the list of copies and deal with the 2766 + * rest of work for all of them uniformly. 
2767 + */ 2768 + hlist_add_head(&source_mnt->mnt_hash, &tree_list); 2696 2769 2697 2770 hlist_for_each_entry_safe(child, n, &tree_list, mnt_hash) { 2698 2771 struct mount *q; ··· 2702 2771 /* Notice when we are propagating across user namespaces */ 2703 2772 if (child->mnt_parent->mnt_ns->user_ns != user_ns) 2704 2773 lock_mnt_tree(child); 2705 - child->mnt.mnt_flags &= ~MNT_LOCKED; 2706 2774 q = __lookup_mnt(&child->mnt_parent->mnt, 2707 2775 child->mnt_mountpoint); 2708 - if (q) 2709 - mnt_change_mountpoint(child, smp, q); 2776 + if (q) { 2777 + struct mountpoint *mp = root.mp; 2778 + struct mount *r = child; 2779 + while (unlikely(r->overmount)) 2780 + r = r->overmount; 2781 + if (unlikely(shorter) && child != source_mnt) 2782 + mp = shorter; 2783 + mnt_change_mountpoint(r, mp, q); 2784 + } 2710 2785 commit_tree(child); 2711 2786 } 2712 - put_mountpoint(smp); 2787 + unpin_mountpoint(&root); 2713 2788 unlock_mount_hash(); 2714 2789 2715 2790 return 0; ··· 2732 2795 ns->pending_mounts = 0; 2733 2796 2734 2797 read_seqlock_excl(&mount_lock); 2735 - put_mountpoint(smp); 2798 + unpin_mountpoint(&root); 2736 2799 read_sequnlock_excl(&mount_lock); 2737 2800 2738 2801 return err; ··· 2772 2835 * Return: Either the target mountpoint on the top mount or the top 2773 2836 * mount's mountpoint. 
  */
-static struct mountpoint *do_lock_mount(struct path *path, bool beneath)
+static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bool beneath)
 {
 	struct vfsmount *mnt = path->mnt;
 	struct dentry *dentry;
-	struct mountpoint *mp = ERR_PTR(-ENOENT);
 	struct path under = {};
+	int err = -ENOENT;

 	for (;;) {
 		struct mount *m = real_mount(mnt);
···
 			path->dentry = dget(mnt->mnt_root);
 			continue;	// got overmounted
 		}
-		mp = get_mountpoint(dentry);
-		if (IS_ERR(mp))
+		err = get_mountpoint(dentry, pinned);
+		if (err)
 			break;
 		if (beneath) {
 			/*
···
 			 */
 			path_put(&under);
 		}
-		return mp;
+		return 0;
 	}
 	namespace_unlock();
 	inode_unlock(dentry->d_inode);
 	if (beneath)
 		path_put(&under);
-	return mp;
+	return err;
 }

-static inline struct mountpoint *lock_mount(struct path *path)
+static inline int lock_mount(struct path *path, struct pinned_mountpoint *m)
 {
-	return do_lock_mount(path, false);
+	return do_lock_mount(path, m, false);
 }

-static void unlock_mount(struct mountpoint *where)
+static void unlock_mount(struct pinned_mountpoint *m)
 {
-	inode_unlock(where->m_dentry->d_inode);
+	inode_unlock(m->mp->m_dentry->d_inode);
 	read_seqlock_excl(&mount_lock);
-	put_mountpoint(where);
+	unpin_mountpoint(m);
 	read_sequnlock_excl(&mount_lock);
 	namespace_unlock();
 }
···
 	    d_is_dir(mnt->mnt.mnt_root))
 		return -ENOTDIR;

-	return attach_recursive_mnt(mnt, p, mp, 0);
+	return attach_recursive_mnt(mnt, p, mp);
 }

 /*
···
 		goto out_unlock;
 	}

-	lock_mount_hash();
 	for (m = mnt; m; m = (recurse ? next_mnt(m, mnt) : NULL))
 		change_mnt_propagation(m, type);
-	unlock_mount_hash();

 out_unlock:
 	namespace_unlock();
···

 static struct mount *__do_loopback(struct path *old_path, int recurse)
 {
-	struct mount *mnt = ERR_PTR(-EINVAL), *old = real_mount(old_path->mnt);
+	struct mount *old = real_mount(old_path->mnt);

 	if (IS_MNT_UNBINDABLE(old))
-		return mnt;
+		return ERR_PTR(-EINVAL);

 	if (!may_copy_tree(old_path))
-		return mnt;
+		return ERR_PTR(-EINVAL);

 	if (!recurse && __has_locked_children(old, old_path->dentry))
-		return mnt;
+		return ERR_PTR(-EINVAL);

 	if (recurse)
-		mnt = copy_tree(old, old_path->dentry, CL_COPY_MNT_NS_FILE);
+		return copy_tree(old, old_path->dentry, CL_COPY_MNT_NS_FILE);
 	else
-		mnt = clone_mnt(old, old_path->dentry, 0);
-
-	if (!IS_ERR(mnt))
-		mnt->mnt.mnt_flags &= ~MNT_LOCKED;
-
-	return mnt;
+		return clone_mnt(old, old_path->dentry, 0);
 }

 /*
···
 {
 	struct path old_path;
 	struct mount *mnt = NULL, *parent;
-	struct mountpoint *mp;
+	struct pinned_mountpoint mp = {};
 	int err;
 	if (!old_name || !*old_name)
 		return -EINVAL;
···
 	if (mnt_ns_loop(old_path.dentry))
 		goto out;

-	mp = lock_mount(path);
-	if (IS_ERR(mp)) {
-		err = PTR_ERR(mp);
+	err = lock_mount(path, &mp);
+	if (err)
 		goto out;
-	}

 	parent = real_mount(path->mnt);
 	if (!check_mnt(parent))
···
 		goto out2;
 	}

-	err = graft_tree(mnt, parent, mp);
+	err = graft_tree(mnt, parent, mp.mp);
 	if (err) {
 		lock_mount_hash();
 		umount_tree(mnt, UMOUNT_SYNC);
 		unlock_mount_hash();
 	}
 out2:
-	unlock_mount(mp);
+	unlock_mount(&mp);
 out:
 	path_put(&old_path);
 	return err;
···
 		goto out;

 	if (IS_MNT_SLAVE(from)) {
-		struct mount *m = from->mnt_master;
-
-		list_add(&to->mnt_slave, &from->mnt_slave);
-		to->mnt_master = m;
+		hlist_add_behind(&to->mnt_slave, &from->mnt_slave);
+		to->mnt_master = from->mnt_master;
 	}

 	if (IS_MNT_SHARED(from)) {
 		to->mnt_group_id = from->mnt_group_id;
 		list_add(&to->mnt_share, &from->mnt_share);
-		lock_mount_hash();
 		set_mnt_shared(to);
-		unlock_mount_hash();
 	}

 	err = 0;
···
 		read_sequnlock_excl(&mount_lock);
 	}
 	return unlikely(!no_child);
+}
+
+/*
+ * Check if there is a possibly empty chain of descent from p1 to p2.
+ * Locks: namespace_sem (shared) or mount_lock (read_seqlock_excl).
+ */
+static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
+{
+	while (p2 != p1 && mnt_has_parent(p2))
+		p2 = p2->mnt_parent;
+	return p2 == p1;
 }

 /**
···
 	if (parent_mnt_to == current->nsproxy->mnt_ns->root)
 		return -EINVAL;

-	for (struct mount *p = mnt_from; mnt_has_parent(p); p = p->mnt_parent)
-		if (p == mnt_to)
-			return -EINVAL;
+	if (mount_is_ancestor(mnt_to, mnt_from))
+		return -EINVAL;

 	/*
 	 * If the parent mount propagates to the child mount this would
···
 	struct mount *p;
 	struct mount *old;
 	struct mount *parent;
-	struct mountpoint *mp, *old_mp;
+	struct pinned_mountpoint mp;
 	int err;
-	bool attached, beneath = flags & MNT_TREE_BENEATH;
+	bool beneath = flags & MNT_TREE_BENEATH;

-	mp = do_lock_mount(new_path, beneath);
-	if (IS_ERR(mp))
-		return PTR_ERR(mp);
+	err = do_lock_mount(new_path, &mp, beneath);
+	if (err)
+		return err;

 	old = real_mount(old_path->mnt);
 	p = real_mount(new_path->mnt);
 	parent = old->mnt_parent;
-	attached = mnt_has_parent(old);
-	if (attached)
-		flags |= MNT_TREE_MOVE;
-	old_mp = old->mnt_mp;
 	ns = old->mnt_ns;

 	err = -EINVAL;
-	/* The thing moved must be mounted... */
-	if (!is_mounted(&old->mnt))
-		goto out;

 	if (check_mnt(old)) {
 		/* if the source is in our namespace... */
···
 		/* ... and the target should be in our namespace */
 		if (!check_mnt(p))
 			goto out;
+		/* parent of the source should not be shared */
+		if (IS_MNT_SHARED(parent))
+			goto out;
 	} else {
 		/*
 		 * otherwise the source must be the root of some anon namespace.
-		 * AV: check for mount being root of an anon namespace is worth
-		 * an inlined predicate...
 		 */
-		if (!is_anon_ns(ns) || mnt_has_parent(old))
+		if (!anon_ns_root(old))
 			goto out;
 		/*
 		 * Bail out early if the target is within the same namespace -
···
 	if (d_is_dir(new_path->dentry) !=
 	    d_is_dir(old_path->dentry))
 		goto out;
-	/*
-	 * Don't move a mount residing in a shared parent.
-	 */
-	if (attached && IS_MNT_SHARED(parent))
-		goto out;

 	if (beneath) {
-		err = can_move_mount_beneath(old_path, new_path, mp);
+		err = can_move_mount_beneath(old_path, new_path, mp.mp);
 		if (err)
 			goto out;

 		err = -EINVAL;
 		p = p->mnt_parent;
-		flags |= MNT_TREE_BENEATH;
 	}

 	/*
···
 	err = -ELOOP;
 	if (!check_for_nsfs_mounts(old))
 		goto out;
-	for (; mnt_has_parent(p); p = p->mnt_parent)
-		if (p == old)
-			goto out;
-
-	err = attach_recursive_mnt(old, real_mount(new_path->mnt), mp, flags);
-	if (err)
+	if (mount_is_ancestor(old, p))
 		goto out;

-	/* if the mount is moved, it should no longer be expire
-	 * automatically */
-	list_del_init(&old->mnt_expire);
-	if (attached)
-		put_mountpoint(old_mp);
+	err = attach_recursive_mnt(old, p, mp.mp);
 out:
-	unlock_mount(mp);
-	if (!err) {
-		if (attached) {
-			mntput_no_expire(parent);
-		} else {
-			/* Make sure we notice when we leak mounts. */
-			VFS_WARN_ON_ONCE(!mnt_ns_empty(ns));
-			free_mnt_ns(ns);
-		}
-	}
+	unlock_mount(&mp);
 	return err;
 }

···
 		unsigned int mnt_flags)
 {
 	struct vfsmount *mnt;
-	struct mountpoint *mp;
+	struct pinned_mountpoint mp = {};
 	struct super_block *sb = fc->root->d_sb;
 	int error;

···

 	mnt_warn_timestamp_expiry(mountpoint, mnt);

-	mp = lock_mount(mountpoint);
-	if (IS_ERR(mp)) {
-		mntput(mnt);
-		return PTR_ERR(mp);
+	error = lock_mount(mountpoint, &mp);
+	if (!error) {
+		error = do_add_mount(real_mount(mnt), mp.mp,
+				     mountpoint, mnt_flags);
+		unlock_mount(&mp);
 	}
-	error = do_add_mount(real_mount(mnt), mp, mountpoint, mnt_flags);
-	unlock_mount(mp);
 	if (error < 0)
 		mntput(mnt);
 	return error;
···
 int finish_automount(struct vfsmount *m, const struct path *path)
 {
 	struct dentry *dentry = path->dentry;
-	struct mountpoint *mp;
+	struct pinned_mountpoint mp = {};
 	struct mount *mnt;
 	int err;

···
 		err = 0;
 		goto discard_locked;
 	}
-	mp = get_mountpoint(dentry);
-	if (IS_ERR(mp)) {
-		err = PTR_ERR(mp);
+	err = get_mountpoint(dentry, &mp);
+	if (err)
 		goto discard_locked;
-	}

-	err = do_add_mount(mnt, mp, path, path->mnt->mnt_flags | MNT_SHRINKABLE);
-	unlock_mount(mp);
+	err = do_add_mount(mnt, mp.mp, path,
+			   path->mnt->mnt_flags | MNT_SHRINKABLE);
+	unlock_mount(&mp);
 	if (unlikely(err))
 		goto discard;
 	return 0;
···
 	namespace_unlock();
 	inode_unlock(dentry->d_inode);
 discard:
-	/* remove m from any expiration list it may be on */
-	if (!list_empty(&mnt->mnt_expire)) {
-		namespace_lock();
-		list_del_init(&mnt->mnt_expire);
-		namespace_unlock();
-	}
 	mntput(m);
 	return err;
 }
···
  */
 void mnt_set_expiry(struct vfsmount *mnt, struct list_head *expiry_list)
 {
-	namespace_lock();
-
+	read_seqlock_excl(&mount_lock);
 	list_add_tail(&real_mount(mnt)->mnt_expire, expiry_list);
-
-	namespace_unlock();
+	read_sequnlock_excl(&mount_lock);
 }
 EXPORT_SYMBOL(mnt_set_expiry);

···
 	/* First pass: copy the tree topology */
 	copy_flags = CL_COPY_UNBINDABLE | CL_EXPIRE;
 	if (user_ns != ns->user_ns)
-		copy_flags |= CL_SHARED_TO_SLAVE;
+		copy_flags |= CL_SLAVE;
 	new = copy_tree(old, old->mnt.mnt_root, copy_flags);
 	if (IS_ERR(new)) {
 		namespace_unlock();
···
 {
 	struct path new, old, root;
 	struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent, *ex_parent;
-	struct mountpoint *old_mp, *root_mp;
+	struct pinned_mountpoint old_mp = {};
 	int error;

 	if (!may_mount())
···
 		goto out2;

 	get_fs_root(current->fs, &root);
-	old_mp = lock_mount(&old);
-	error = PTR_ERR(old_mp);
-	if (IS_ERR(old_mp))
+	error = lock_mount(&old, &old_mp);
+	if (error)
 		goto out3;

 	error = -EINVAL;
···
 	if (!path_mounted(&root))
 		goto out4; /* not a mountpoint */
 	if (!mnt_has_parent(root_mnt))
-		goto out4; /* not attached */
+		goto out4; /* absolute root */
 	if (!path_mounted(&new))
 		goto out4; /* not a mountpoint */
 	if (!mnt_has_parent(new_mnt))
-		goto out4; /* not attached */
+		goto out4; /* absolute root */
 	/* make sure we can reach put_old from new_root */
 	if (!is_path_reachable(old_mnt, old.dentry, &new))
 		goto out4;
···
 		goto out4;
 	lock_mount_hash();
 	umount_mnt(new_mnt);
-	root_mp = unhash_mnt(root_mnt); /* we'll need its mountpoint */
 	if (root_mnt->mnt.mnt_flags & MNT_LOCKED) {
 		new_mnt->mnt.mnt_flags |= MNT_LOCKED;
 		root_mnt->mnt.mnt_flags &= ~MNT_LOCKED;
 	}
-	/* mount old root on put_old */
-	attach_mnt(root_mnt, old_mnt, old_mp, false);
 	/* mount new_root on / */
-	attach_mnt(new_mnt, root_parent, root_mp, false);
-	mnt_add_count(root_parent, -1);
+	attach_mnt(new_mnt, root_parent, root_mnt->mnt_mp);
+	umount_mnt(root_mnt);
+	/* mount old root on put_old */
+	attach_mnt(root_mnt, old_mnt, old_mp.mp);
 	touch_mnt_namespace(current->nsproxy->mnt_ns);
 	/* A moved mount should not expire automatically */
 	list_del_init(&new_mnt->mnt_expire);
-	put_mountpoint(root_mp);
 	unlock_mount_hash();
 	mnt_notify_add(root_mnt);
 	mnt_notify_add(new_mnt);
 	chroot_fs_refs(&root, &new);
 	error = 0;
 out4:
-	unlock_mount(old_mp);
-	if (!error)
-		mntput_no_expire(ex_parent);
+	unlock_mount(&old_mp);
 out3:
 	path_put(&root);
 out2:
···
 	err = -EINVAL;
 	lock_mount_hash();

-	/* Ensure that this isn't anything purely vfs internal. */
-	if (!is_mounted(&mnt->mnt))
-		goto out;
-
-	/*
-	 * If this is an attached mount make sure it's located in the callers
-	 * mount namespace. If it's not don't let the caller interact with it.
-	 *
-	 * If this mount doesn't have a parent it's most often simply a
-	 * detached mount with an anonymous mount namespace. IOW, something
-	 * that's simply not attached yet. But there are apparently also users
-	 * that do change mount properties on the rootfs itself. That obviously
That obviously 4949 - * neither has a parent nor is it a detached mount so we cannot 4950 - * unconditionally check for detached mounts. 4951 - */ 4952 - if ((mnt_has_parent(mnt) || !is_anon_ns(mnt->mnt_ns)) && !check_mnt(mnt)) 5048 + if (!anon_ns_root(mnt) && !check_mnt(mnt)) 4953 5049 goto out; 4954 5050 4955 5051 /* ··· 5298 5424 s->sm.mnt_parent_id_old = m->mnt_parent->mnt_id; 5299 5425 s->sm.mnt_attr = mnt_to_attr_flags(&m->mnt); 5300 5426 s->sm.mnt_propagation = mnt_to_propagation_flags(m); 5301 - s->sm.mnt_peer_group = IS_MNT_SHARED(m) ? m->mnt_group_id : 0; 5427 + s->sm.mnt_peer_group = m->mnt_group_id; 5302 5428 s->sm.mnt_master = IS_MNT_SLAVE(m) ? m->mnt_master->mnt_group_id : 0; 5303 5429 } 5304 5430 ··· 6102 6228 6103 6229 root.mnt = mnt; 6104 6230 root.dentry = mnt->mnt_root; 6105 - mnt->mnt_flags |= MNT_LOCKED; 6106 6231 6107 6232 set_fs_pwd(current->fs, &root); 6108 6233 set_fs_root(current->fs, &root); ··· 6149 6276 if (!refcount_dec_and_test(&ns->ns.count)) 6150 6277 return; 6151 6278 namespace_lock(); 6279 + emptied_ns = ns; 6152 6280 lock_mount_hash(); 6153 6281 umount_tree(ns->root, 0); 6154 6282 unlock_mount_hash(); 6155 6283 namespace_unlock(); 6156 - free_mnt_ns(ns); 6157 6284 } 6158 6285 6159 6286 struct vfsmount *kern_mount(struct file_system_type *type)
+364 -359
fs/pnode.c
···

 static inline struct mount *first_slave(struct mount *p)
 {
-	return list_entry(p->mnt_slave_list.next, struct mount, mnt_slave);
-}
-
-static inline struct mount *last_slave(struct mount *p)
-{
-	return list_entry(p->mnt_slave_list.prev, struct mount, mnt_slave);
+	return hlist_entry(p->mnt_slave_list.first, struct mount, mnt_slave);
 }

 static inline struct mount *next_slave(struct mount *p)
 {
-	return list_entry(p->mnt_slave.next, struct mount, mnt_slave);
+	return hlist_entry(p->mnt_slave.next, struct mount, mnt_slave);
 }

 static struct mount *get_peer_under_root(struct mount *mnt,
···
 	return 0;
 }

-static int do_make_slave(struct mount *mnt)
+static inline bool will_be_unmounted(struct mount *m)
 {
-	struct mount *master, *slave_mnt;
+	return m->mnt.mnt_flags & MNT_UMOUNT;
+}

-	if (list_empty(&mnt->mnt_share)) {
-		if (IS_MNT_SHARED(mnt)) {
-			mnt_release_group_id(mnt);
-			CLEAR_MNT_SHARED(mnt);
-		}
-		master = mnt->mnt_master;
-		if (!master) {
-			struct list_head *p = &mnt->mnt_slave_list;
-			while (!list_empty(p)) {
-				slave_mnt = list_first_entry(p,
-						struct mount, mnt_slave);
-				list_del_init(&slave_mnt->mnt_slave);
-				slave_mnt->mnt_master = NULL;
-			}
-			return 0;
-		}
-	} else {
+static struct mount *propagation_source(struct mount *mnt)
+{
+	do {
 		struct mount *m;
-		/*
-		 * slave 'mnt' to a peer mount that has the
-		 * same root dentry. If none is available then
-		 * slave it to anything that is available.
-		 */
-		for (m = master = next_peer(mnt); m != mnt; m = next_peer(m)) {
-			if (m->mnt.mnt_root == mnt->mnt.mnt_root) {
-				master = m;
-				break;
-			}
+		for (m = next_peer(mnt); m != mnt; m = next_peer(m)) {
+			if (!will_be_unmounted(m))
+				return m;
 		}
-		list_del_init(&mnt->mnt_share);
-		mnt->mnt_group_id = 0;
-		CLEAR_MNT_SHARED(mnt);
+		mnt = mnt->mnt_master;
+	} while (mnt && will_be_unmounted(mnt));
+	return mnt;
+}
+
+static void transfer_propagation(struct mount *mnt, struct mount *to)
+{
+	struct hlist_node *p = NULL, *n;
+	struct mount *m;
+
+	hlist_for_each_entry_safe(m, n, &mnt->mnt_slave_list, mnt_slave) {
+		m->mnt_master = to;
+		if (!to)
+			hlist_del_init(&m->mnt_slave);
+		else
+			p = &m->mnt_slave;
 	}
-	list_for_each_entry(slave_mnt, &mnt->mnt_slave_list, mnt_slave)
-		slave_mnt->mnt_master = master;
-	list_move(&mnt->mnt_slave, &master->mnt_slave_list);
-	list_splice(&mnt->mnt_slave_list, master->mnt_slave_list.prev);
-	INIT_LIST_HEAD(&mnt->mnt_slave_list);
-	mnt->mnt_master = master;
-	return 0;
+	if (p)
+		hlist_splice_init(&mnt->mnt_slave_list, p, &to->mnt_slave_list);
 }

 /*
- * vfsmount lock must be held for write
+ * EXCL[namespace_sem]
  */
 void change_mnt_propagation(struct mount *mnt, int type)
 {
+	struct mount *m = mnt->mnt_master;
+
 	if (type == MS_SHARED) {
 		set_mnt_shared(mnt);
 		return;
 	}
-	do_make_slave(mnt);
-	if (type != MS_SLAVE) {
-		list_del_init(&mnt->mnt_slave);
+	if (IS_MNT_SHARED(mnt)) {
+		m = propagation_source(mnt);
+		if (list_empty(&mnt->mnt_share)) {
+			mnt_release_group_id(mnt);
+		} else {
+			list_del_init(&mnt->mnt_share);
+			mnt->mnt_group_id = 0;
+		}
+		CLEAR_MNT_SHARED(mnt);
+		transfer_propagation(mnt, m);
+	}
+	hlist_del_init(&mnt->mnt_slave);
+	if (type == MS_SLAVE) {
+		mnt->mnt_master = m;
+		if (m)
+			hlist_add_head(&mnt->mnt_slave, &m->mnt_slave_list);
+	} else {
 		mnt->mnt_master = NULL;
 		if (type == MS_UNBINDABLE)
-			mnt->mnt.mnt_flags |= MNT_UNBINDABLE;
+			mnt->mnt_t_flags |= T_UNBINDABLE;
 		else
-			mnt->mnt.mnt_flags &= ~MNT_UNBINDABLE;
+			mnt->mnt_t_flags &= ~T_UNBINDABLE;
+	}
+}
+
+static struct mount *__propagation_next(struct mount *m,
+					 struct mount *origin)
+{
+	while (1) {
+		struct mount *master = m->mnt_master;
+
+		if (master == origin->mnt_master) {
+			struct mount *next = next_peer(m);
+			return (next == origin) ? NULL : next;
+		} else if (m->mnt_slave.next)
+			return next_slave(m);
+
+		/* back at master */
+		m = master;
 	}
 }

···
 					 struct mount *origin)
 {
 	/* are there any slaves of this mount? */
-	if (!IS_MNT_NEW(m) && !list_empty(&m->mnt_slave_list))
+	if (!IS_MNT_NEW(m) && !hlist_empty(&m->mnt_slave_list))
 		return first_slave(m);

-	while (1) {
-		struct mount *master = m->mnt_master;
-
-		if (master == origin->mnt_master) {
-			struct mount *next = next_peer(m);
-			return (next == origin) ? NULL : next;
-		} else if (m->mnt_slave.next != &master->mnt_slave_list)
-			return next_slave(m);
-
-		/* back at master */
-		m = master;
-	}
+	return __propagation_next(m, origin);
 }

 static struct mount *skip_propagation_subtree(struct mount *m,
 						struct mount *origin)
 {
 	/*
-	 * Advance m such that propagation_next will not return
-	 * the slaves of m.
+	 * Advance m past everything that gets propagation from it.
 	 */
-	if (!IS_MNT_NEW(m) && !list_empty(&m->mnt_slave_list))
-		m = last_slave(m);
+	struct mount *p = __propagation_next(m, origin);

-	return m;
+	while (p && peers(m, p))
+		p = __propagation_next(p, origin);
+
+	return p;
 }

 static struct mount *next_group(struct mount *m, struct mount *origin)
···
 	while (1) {
 		while (1) {
 			struct mount *next;
-			if (!IS_MNT_NEW(m) && !list_empty(&m->mnt_slave_list))
+			if (!IS_MNT_NEW(m) && !hlist_empty(&m->mnt_slave_list))
 				return first_slave(m);
 			next = next_peer(m);
 			if (m->mnt_group_id == origin->mnt_group_id) {
···
 		/* m is the last peer */
 		while (1) {
 			struct mount *master = m->mnt_master;
-			if (m->mnt_slave.next != &master->mnt_slave_list)
+			if (m->mnt_slave.next)
 				return next_slave(m);
 			m = next_peer(master);
 			if (master->mnt_group_id == origin->mnt_group_id)
···
 	}
 }

-/* all accesses are serialized by namespace_sem */
-static struct mount *last_dest, *first_source, *last_source, *dest_master;
-static struct hlist_head *list;
-
-static inline bool peers(const struct mount *m1, const struct mount *m2)
+static bool need_secondary(struct mount *m, struct mountpoint *dest_mp)
 {
-	return m1->mnt_group_id == m2->mnt_group_id && m1->mnt_group_id;
-}
-
-static int propagate_one(struct mount *m, struct mountpoint *dest_mp)
-{
-	struct mount *child;
-	int type;
 	/* skip ones added by this propagate_mnt() */
 	if (IS_MNT_NEW(m))
-		return 0;
+		return false;
 	/* skip if mountpoint isn't visible in m */
 	if (!is_subdir(dest_mp->m_dentry, m->mnt.mnt_root))
-		return 0;
+		return false;
 	/* skip if m is in the anon_ns */
 	if (is_anon_ns(m->mnt_ns))
-		return 0;
-
-	if (peers(m, last_dest)) {
-		type = CL_MAKE_SHARED;
-	} else {
-		struct mount *n, *p;
-		bool done;
-		for (n = m; ; n = p) {
-			p = n->mnt_master;
-			if (p == dest_master || IS_MNT_MARKED(p))
-				break;
-		}
-		do {
-			struct mount *parent = last_source->mnt_parent;
-			if (peers(last_source, first_source))
-				break;
-			done = parent->mnt_master == p;
-			if (done && peers(n, parent))
-				break;
-			last_source = last_source->mnt_master;
-		} while (!done);
-
-		type = CL_SLAVE;
-		/* beginning of peer group among the slaves? */
-		if (IS_MNT_SHARED(m))
-			type |= CL_MAKE_SHARED;
-	}
-
-	child = copy_tree(last_source, last_source->mnt.mnt_root, type);
-	if (IS_ERR(child))
-		return PTR_ERR(child);
-	read_seqlock_excl(&mount_lock);
-	mnt_set_mountpoint(m, dest_mp, child);
-	if (m->mnt_master != dest_master)
-		SET_MNT_MARK(m->mnt_master);
-	read_sequnlock_excl(&mount_lock);
-	last_dest = m;
-	last_source = child;
-	hlist_add_head(&child->mnt_hash, list);
-	return count_mounts(m->mnt_ns, child);
+		return false;
+	return true;
 }

-/*
- * mount 'source_mnt' under the destination 'dest_mnt' at
- * dentry 'dest_dentry'. And propagate that mount to
- * all the peer and slave mounts of 'dest_mnt'.
- * Link all the new mounts into a propagation tree headed at
- * source_mnt. Also link all the new mounts using ->mnt_list
- * headed at source_mnt's ->mnt_list
+static struct mount *find_master(struct mount *m,
+				struct mount *last_copy,
+				struct mount *original)
+{
+	struct mount *p;
+
+	// ascend until there's a copy for something with the same master
+	for (;;) {
+		p = m->mnt_master;
+		if (!p || IS_MNT_MARKED(p))
+			break;
+		m = p;
+	}
+	while (!peers(last_copy, original)) {
+		struct mount *parent = last_copy->mnt_parent;
+		if (parent->mnt_master == p) {
+			if (!peers(parent, m))
+				last_copy = last_copy->mnt_master;
+			break;
+		}
+		last_copy = last_copy->mnt_master;
+	}
+	return last_copy;
+}
+
+/**
+ * propagate_mnt() - create secondary copies for tree attachment
+ * @dest_mnt: destination mount.
+ * @dest_mp: destination mountpoint.
+ * @source_mnt: source mount.
+ * @tree_list: list of secondaries to be attached.
  *
- * @dest_mnt: destination mount.
- * @dest_dentry: destination dentry.
- * @source_mnt: source mount.
- * @tree_list : list of heads of trees to be attached.
+ * Create secondary copies for attaching a tree with root @source_mnt
+ * at mount @dest_mnt with mountpoint @dest_mp. Link all new mounts
+ * into a propagation graph. Set mountpoints for all secondaries,
+ * link their roots into @tree_list via ->mnt_hash.
  */
 int propagate_mnt(struct mount *dest_mnt, struct mountpoint *dest_mp,
-		    struct mount *source_mnt, struct hlist_head *tree_list)
+		  struct mount *source_mnt, struct hlist_head *tree_list)
 {
-	struct mount *m, *n;
-	int ret = 0;
+	struct mount *m, *n, *copy, *this;
+	int err = 0, type;

-	/*
-	 * we don't want to bother passing tons of arguments to
-	 * propagate_one(); everything is serialized by namespace_sem,
-	 * so globals will do just fine.
-	 */
-	last_dest = dest_mnt;
-	first_source = source_mnt;
-	last_source = source_mnt;
-	list = tree_list;
-	dest_master = dest_mnt->mnt_master;
+	if (dest_mnt->mnt_master)
+		SET_MNT_MARK(dest_mnt->mnt_master);

-	/* all peers of dest_mnt, except dest_mnt itself */
-	for (n = next_peer(dest_mnt); n != dest_mnt; n = next_peer(n)) {
-		ret = propagate_one(n, dest_mp);
-		if (ret)
-			goto out;
-	}
-
-	/* all slave groups */
-	for (m = next_group(dest_mnt, dest_mnt); m;
-			m = next_group(m, dest_mnt)) {
-		/* everything in that slave group */
-		n = m;
+	/* iterate over peer groups, depth first */
+	for (m = dest_mnt; m && !err; m = next_group(m, dest_mnt)) {
+		if (m == dest_mnt) { // have one for dest_mnt itself
+			copy = source_mnt;
+			type = CL_MAKE_SHARED;
+			n = next_peer(m);
+			if (n == m)
+				continue;
+		} else {
+			type = CL_SLAVE;
+			/* beginning of peer group among the slaves? */
+			if (IS_MNT_SHARED(m))
+				type |= CL_MAKE_SHARED;
+			n = m;
+		}
 		do {
-			ret = propagate_one(n, dest_mp);
-			if (ret)
-				goto out;
-			n = next_peer(n);
-		} while (n != m);
+			if (!need_secondary(n, dest_mp))
+				continue;
+			if (type & CL_SLAVE) // first in this peer group
+				copy = find_master(n, copy, source_mnt);
+			this = copy_tree(copy, copy->mnt.mnt_root, type);
+			if (IS_ERR(this)) {
+				err = PTR_ERR(this);
+				break;
+			}
+			read_seqlock_excl(&mount_lock);
+			mnt_set_mountpoint(n, dest_mp, this);
+			read_sequnlock_excl(&mount_lock);
+			if (n->mnt_master)
+				SET_MNT_MARK(n->mnt_master);
+			copy = this;
+			hlist_add_head(&this->mnt_hash, tree_list);
+			err = count_mounts(n->mnt_ns, this);
+			if (err)
+				break;
+			type = CL_MAKE_SHARED;
+		} while ((n = next_peer(n)) != m);
 	}
-out:
-	read_seqlock_excl(&mount_lock);
+
 	hlist_for_each_entry(n, tree_list, mnt_hash) {
 		m = n->mnt_parent;
-		if (m->mnt_master != dest_mnt->mnt_master)
+		if (m->mnt_master)
 			CLEAR_MNT_MARK(m->mnt_master);
 	}
-	read_sequnlock_excl(&mount_lock);
-	return ret;
-}
-
-static struct mount *find_topper(struct mount *mnt)
-{
-	/* If there is exactly one mount covering mnt completely return it. */
-	struct mount *child;
-
-	if (!list_is_singular(&mnt->mnt_mounts))
-		return NULL;
-
-	child = list_first_entry(&mnt->mnt_mounts, struct mount, mnt_child);
-	if (child->mnt_mountpoint != mnt->mnt.mnt_root)
-		return NULL;
-
-	return child;
+	if (dest_mnt->mnt_master)
+		CLEAR_MNT_MARK(dest_mnt->mnt_master);
+	return err;
 }

 /*
···
  */
 int propagate_mount_busy(struct mount *mnt, int refcnt)
 {
-	struct mount *m, *child, *topper;
 	struct mount *parent = mnt->mnt_parent;
-
-	if (mnt == parent)
-		return do_refcount_check(mnt, refcnt);

 	/*
 	 * quickly check if the current mount can be unmounted.
···
 	if (!list_empty(&mnt->mnt_mounts) || do_refcount_check(mnt, refcnt))
 		return 1;

-	for (m = propagation_next(parent, parent); m;
+	if (mnt == parent)
+		return 0;
+
+	for (struct mount *m = propagation_next(parent, parent); m;
 			m = propagation_next(m, parent)) {
-		int count = 1;
-		child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint);
+		struct list_head *head;
+		struct mount *child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint);
+
 		if (!child)
 			continue;

-		/* Is there exactly one mount on the child that covers
-		 * it completely whose reference should be ignored?
-		 */
-		topper = find_topper(child);
-		if (topper)
-			count += 1;
-		else if (!list_empty(&child->mnt_mounts))
-			continue;
-
-		if (do_refcount_check(child, count))
+		head = &child->mnt_mounts;
+		if (!list_empty(head)) {
+			/*
+			 * a mount that covers child completely wouldn't prevent
+			 * it being pulled out; any other would.
+			 */
+			if (!list_is_singular(head) || !child->overmount)
+				continue;
+		}
+		if (do_refcount_check(child, 1))
 			return 1;
 	}
 	return 0;
···
 	}
 }

-static void umount_one(struct mount *mnt, struct list_head *to_umount)
+static inline bool is_candidate(struct mount *m)
 {
-	CLEAR_MNT_MARK(mnt);
-	mnt->mnt.mnt_flags |= MNT_UMOUNT;
-	list_del_init(&mnt->mnt_child);
-	list_del_init(&mnt->mnt_umounting);
-	move_from_ns(mnt, to_umount);
+	return m->mnt_t_flags & T_UMOUNT_CANDIDATE;
+}
+
+static void umount_one(struct mount *m, struct list_head *to_umount)
+{
+	m->mnt.mnt_flags |= MNT_UMOUNT;
+	list_del_init(&m->mnt_child);
+	move_from_ns(m);
+	list_add_tail(&m->mnt_list, to_umount);
+}
+
+static void remove_from_candidate_list(struct mount *m)
+{
+	m->mnt_t_flags &= ~(T_MARKED | T_UMOUNT_CANDIDATE);
+	list_del_init(&m->mnt_list);
+}
+
+static void gather_candidates(struct list_head *set,
+			      struct list_head *candidates)
+{
+	struct mount *m, *p, *q;
+
+	list_for_each_entry(m, set, mnt_list) {
+		if (is_candidate(m))
+			continue;
+		m->mnt_t_flags |= T_UMOUNT_CANDIDATE;
+		p = m->mnt_parent;
+		q = propagation_next(p, p);
+		while (q) {
+			struct mount *child = __lookup_mnt(&q->mnt,
+							   m->mnt_mountpoint);
+			if (child) {
+				/*
+				 * We might've already run into this one. That
+				 * must've happened on earlier iteration of the
+				 * outer loop; in that case we can skip those
+				 * parents that get propagation from q - there
+				 * will be nothing new on those as well.
+				 */
+				if (is_candidate(child)) {
+					q = skip_propagation_subtree(q, p);
+					continue;
+				}
+				child->mnt_t_flags |= T_UMOUNT_CANDIDATE;
+				if (!will_be_unmounted(child))
+					list_add(&child->mnt_list, candidates);
+			}
+			q = propagation_next(q, p);
+		}
+	}
+	list_for_each_entry(m, set, mnt_list)
+		m->mnt_t_flags &= ~T_UMOUNT_CANDIDATE;
 }

 /*
- * NOTE: unmounting 'mnt' naturally propagates to all other mounts its
- * parent propagates to.
+ * We know that some child of @m can't be unmounted. In all places where the
+ * chain of descent of @m has child not overmounting the root of parent,
+ * the parent can't be unmounted either.
  */
-static bool __propagate_umount(struct mount *mnt,
-			       struct list_head *to_umount,
-			       struct list_head *to_restore)
+static void trim_ancestors(struct mount *m)
 {
-	bool progress = false;
-	struct mount *child;
+	struct mount *p;

-	/*
-	 * The state of the parent won't change if this mount is
-	 * already unmounted or marked as without children.
-	 */
-	if (mnt->mnt.mnt_flags & (MNT_UMOUNT | MNT_MARKED))
-		goto out;
-
-	/* Verify topper is the only grandchild that has not been
-	 * speculatively unmounted.
-	 */
-	list_for_each_entry(child, &mnt->mnt_mounts, mnt_child) {
-		if (child->mnt_mountpoint == mnt->mnt.mnt_root)
-			continue;
-		if (!list_empty(&child->mnt_umounting) && IS_MNT_MARKED(child))
-			continue;
-		/* Found a mounted child */
-		goto children;
-	}
-
-	/* Mark mounts that can be unmounted if not locked */
-	SET_MNT_MARK(mnt);
-	progress = true;
-
-	/* If a mount is without children and not locked umount it. */
-	if (!IS_MNT_LOCKED(mnt)) {
-		umount_one(mnt, to_umount);
-	} else {
-children:
-		list_move_tail(&mnt->mnt_umounting, to_restore);
-	}
-out:
-	return progress;
-}
-
-static void umount_list(struct list_head *to_umount,
-			struct list_head *to_restore)
-{
-	struct mount *mnt, *child, *tmp;
-	list_for_each_entry(mnt, to_umount, mnt_list) {
-		list_for_each_entry_safe(child, tmp, &mnt->mnt_mounts, mnt_child) {
-			/* topper? */
-			if (child->mnt_mountpoint == mnt->mnt.mnt_root)
-				list_move_tail(&child->mnt_umounting, to_restore);
-			else
-				umount_one(child, to_umount);
-		}
-	}
-}
-
-static void restore_mounts(struct list_head *to_restore)
-{
-	/* Restore mounts to a clean working state */
-	while (!list_empty(to_restore)) {
-		struct mount *mnt, *parent;
-		struct mountpoint *mp;
-
-		mnt = list_first_entry(to_restore, struct mount, mnt_umounting);
-		CLEAR_MNT_MARK(mnt);
-		list_del_init(&mnt->mnt_umounting);
-
-		/* Should this mount be reparented? */
-		mp = mnt->mnt_mp;
-		parent = mnt->mnt_parent;
-		while (parent->mnt.mnt_flags & MNT_UMOUNT) {
-			mp = parent->mnt_mp;
-			parent = parent->mnt_parent;
-		}
-		if (parent != mnt->mnt_parent) {
-			mnt_change_mountpoint(parent, mp, mnt);
-			mnt_notify_add(mnt);
-		}
-	}
-}
-
-static void cleanup_umount_visitations(struct list_head *visited)
-{
-	while (!list_empty(visited)) {
-		struct mount *mnt =
-			list_first_entry(visited, struct mount, mnt_umounting);
-		list_del_init(&mnt->mnt_umounting);
+	for (p = m->mnt_parent; is_candidate(p); m = p, p = p->mnt_parent) {
+		if (IS_MNT_MARKED(m))	// all candidates beneath are overmounts
+			return;
+		SET_MNT_MARK(m);
+		if (m != p->overmount)
+			p->mnt_t_flags &= ~T_UMOUNT_CANDIDATE;
 	}
 }

 /*
- * collect all mounts that receive propagation from the mount in @list,
- * and return these additional mounts in the same list.
- * @list: the list of mounts to be unmounted.
+ * Find and exclude all umount candidates forbidden by @m
+ * (see Documentation/filesystems/propagate_umount.txt)
+ * If we can immediately tell that @m is OK to unmount (unlocked
+ * and all children are already committed to unmounting) commit
+ * to unmounting it.
+ * Only @m itself might be taken from the candidates list;
+ * anything found by trim_ancestors() is marked non-candidate
+ * and left on the list.
549 + */ 550 + static void trim_one(struct mount *m, struct list_head *to_umount) 551 + { 552 + bool remove_this = false, found = false, umount_this = false; 553 + struct mount *n; 554 + 555 + if (!is_candidate(m)) { // trim_ancestors() left it on list 556 + remove_from_candidate_list(m); 557 + return; 558 + } 559 + 560 + list_for_each_entry(n, &m->mnt_mounts, mnt_child) { 561 + if (!is_candidate(n)) { 562 + found = true; 563 + if (n != m->overmount) { 564 + remove_this = true; 565 + break; 566 + } 567 + } 568 + } 569 + if (found) { 570 + trim_ancestors(m); 571 + } else if (!IS_MNT_LOCKED(m) && list_empty(&m->mnt_mounts)) { 572 + remove_this = true; 573 + umount_this = true; 574 + } 575 + if (remove_this) { 576 + remove_from_candidate_list(m); 577 + if (umount_this) 578 + umount_one(m, to_umount); 579 + } 580 + } 581 + 582 + static void handle_locked(struct mount *m, struct list_head *to_umount) 583 + { 584 + struct mount *cutoff = m, *p; 585 + 586 + if (!is_candidate(m)) { // trim_ancestors() left it on list 587 + remove_from_candidate_list(m); 588 + return; 589 + } 590 + for (p = m; is_candidate(p); p = p->mnt_parent) { 591 + remove_from_candidate_list(p); 592 + if (!IS_MNT_LOCKED(p)) 593 + cutoff = p->mnt_parent; 594 + } 595 + if (will_be_unmounted(p)) 596 + cutoff = p; 597 + while (m != cutoff) { 598 + umount_one(m, to_umount); 599 + m = m->mnt_parent; 600 + } 601 + } 602 + 603 + /* 604 + * @m is not to going away, and it overmounts the top of a stack of mounts 605 + * that are going away. We know that all of those are fully overmounted 606 + * by the one above (@m being the topmost of the chain), so @m can be slid 607 + * in place where the bottom of the stack is attached. 
550 608 * 551 - * vfsmount lock must be held for write 609 + * NOTE: here we temporarily violate a constraint - two mounts end up with 610 + * the same parent and mountpoint; that will be remedied as soon as we 611 + * return from propagate_umount() - its caller (umount_tree()) will detach 612 + * the stack from the parent it (and now @m) is attached to. umount_tree() 613 + * might choose to keep unmounted pieces stuck to each other, but it always 614 + * detaches them from the mounts that remain in the tree. 552 615 */ 553 - int propagate_umount(struct list_head *list) 616 + static void reparent(struct mount *m) 554 617 { 555 - struct mount *mnt; 556 - LIST_HEAD(to_restore); 557 - LIST_HEAD(to_umount); 558 - LIST_HEAD(visited); 618 + struct mount *p = m; 619 + struct mountpoint *mp; 559 620 560 - /* Find candidates for unmounting */ 561 - list_for_each_entry_reverse(mnt, list, mnt_list) { 562 - struct mount *parent = mnt->mnt_parent; 563 - struct mount *m; 621 + do { 622 + mp = p->mnt_mp; 623 + p = p->mnt_parent; 624 + } while (will_be_unmounted(p)); 564 625 565 - /* 566 - * If this mount has already been visited it is known that it's 567 - * entire peer group and all of their slaves in the propagation 568 - * tree for the mountpoint has already been visited and there is 569 - * no need to visit them again. 570 - */ 571 - if (!list_empty(&mnt->mnt_umounting)) 572 - continue; 626 + mnt_change_mountpoint(p, mp, m); 627 + mnt_notify_add(m); 628 + } 573 629 574 - list_add_tail(&mnt->mnt_umounting, &visited); 575 - for (m = propagation_next(parent, parent); m; 576 - m = propagation_next(m, parent)) { 577 - struct mount *child = __lookup_mnt(&m->mnt, 578 - mnt->mnt_mountpoint); 579 - if (!child) 580 - continue; 630 + /** 631 + * propagate_umount - apply propagation rules to the set of mounts for umount() 632 + * @set: the list of mounts to be unmounted. 
633 + * 634 + * Collect all mounts that receive propagation from the mount in @set and have 635 + * no obstacles to being unmounted. Add these additional mounts to the set. 636 + * 637 + * See Documentation/filesystems/propagate_umount.txt if you do anything in 638 + * this area. 639 + * 640 + * Locks held: 641 + * mount_lock (write_seqlock), namespace_sem (exclusive). 642 + */ 643 + void propagate_umount(struct list_head *set) 644 + { 645 + struct mount *m, *p; 646 + LIST_HEAD(to_umount); // committed to unmounting 647 + LIST_HEAD(candidates); // undecided umount candidates 581 648 582 - if (!list_empty(&child->mnt_umounting)) { 583 - /* 584 - * If the child has already been visited it is 585 - * know that it's entire peer group and all of 586 - * their slaves in the propgation tree for the 587 - * mountpoint has already been visited and there 588 - * is no need to visit this subtree again. 589 - */ 590 - m = skip_propagation_subtree(m, parent); 591 - continue; 592 - } else if (child->mnt.mnt_flags & MNT_UMOUNT) { 593 - /* 594 - * We have come across a partially unmounted 595 - * mount in a list that has not been visited 596 - * yet. Remember it has been visited and 597 - * continue about our merry way. 598 - */ 599 - list_add_tail(&child->mnt_umounting, &visited); 600 - continue; 601 - } 649 + // collect all candidates 650 + gather_candidates(set, &candidates); 602 651 603 - /* Check the child and parents while progress is made */ 604 - while (__propagate_umount(child, 605 - &to_umount, &to_restore)) { 606 - /* Is the parent a umount candidate? */ 607 - child = child->mnt_parent; 608 - if (list_empty(&child->mnt_umounting)) 609 - break; 610 - } 611 - } 652 + // reduce the set until it's non-shifting 653 + list_for_each_entry_safe(m, p, &candidates, mnt_list) 654 + trim_one(m, &to_umount); 655 + 656 + // ... 
and non-revealing 657 + while (!list_empty(&candidates)) { 658 + m = list_first_entry(&candidates,struct mount, mnt_list); 659 + handle_locked(m, &to_umount); 612 660 } 613 661 614 - umount_list(&to_umount, &to_restore); 615 - restore_mounts(&to_restore); 616 - cleanup_umount_visitations(&visited); 617 - list_splice_tail(&to_umount, list); 662 + // now to_umount consists of all acceptable candidates 663 + // deal with reparenting of remaining overmounts on those 664 + list_for_each_entry(m, &to_umount, mnt_list) { 665 + if (m->overmount) 666 + reparent(m->overmount); 667 + } 618 668 619 - return 0; 669 + // and fold them into the set 670 + list_splice_tail_init(&to_umount, set); 620 671 }
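To make the last two phases of the new propagate_umount() easier to follow, here is a userspace toy model of the control flow of handle_locked() and reparent() from the fs/pnode.c diff. It is a sketch under stated assumptions, not kernel code: `struct toy_mount` and the `toy_*` functions are hypothetical, list membership on the candidates list is replaced by a per-mount `candidate` flag, will_be_unmounted() (whose definition is not part of this diff) is modelled as a plain `doomed` field that the umount_one() stand-in sets, mountpoints are plain integer ids, and there is no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for struct mount, keeping only the fields that
 * handle_locked() and reparent() look at.  No locking, no real lists.
 */
struct toy_mount {
	struct toy_mount *parent;	/* models mnt_parent */
	int mp;				/* models mnt_mp, as a mountpoint id */
	bool candidate;			/* models T_UMOUNT_CANDIDATE */
	bool locked;			/* models MNT_LOCKED */
	bool doomed;			/* models will_be_unmounted() (assumed) */
};

/* Models umount_one(): commit @m to unmounting. */
static void toy_umount_one(struct toy_mount *m)
{
	m->doomed = true;
}

/* Same control flow as handle_locked(). */
static void toy_handle_locked(struct toy_mount *m)
{
	struct toy_mount *cutoff = m, *p;

	if (!m->candidate)
		return;			/* kernel: just drop it from the list */
	for (p = m; p->candidate; p = p->parent) {
		p->candidate = false;	/* remove_from_candidate_list() */
		if (!p->locked)		/* p may be detached from its parent */
			cutoff = p->parent;
	}
	if (p->doomed)			/* first non-candidate ancestor goes too */
		cutoff = p;
	while (m != cutoff) {		/* unmount m and ancestors below cutoff */
		assert(m != NULL);
		toy_umount_one(m);
		m = m->parent;
	}
}

/* Same control flow as reparent(): slide @m down past the doomed stack. */
static void toy_reparent(struct toy_mount *m)
{
	struct toy_mount *p = m;
	int mp;

	do {
		mp = p->mp;
		p = p->parent;
	} while (p->doomed);
	/* models mnt_change_mountpoint(p, mp, m) */
	m->parent = p;
	m->mp = mp;
}
```

With a locked `m` under an unlocked `a` under a locked `b` under a surviving `top`, only `m` and `a` are unmounted: `a` may be detached from `b`, `m` stays attached to `a` inside the unmounted piece, and `b` stays attached to `top`. If `top` is itself going away, the whole chain goes.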
+17 -10
fs/pnode.h
···
 #include <linux/list.h>
 #include "mount.h"

-#define IS_MNT_SHARED(m) ((m)->mnt.mnt_flags & MNT_SHARED)
+#define IS_MNT_SHARED(m) ((m)->mnt_t_flags & T_SHARED)
 #define IS_MNT_SLAVE(m) ((m)->mnt_master)
 #define IS_MNT_NEW(m) (!(m)->mnt_ns)
-#define CLEAR_MNT_SHARED(m) ((m)->mnt.mnt_flags &= ~MNT_SHARED)
-#define IS_MNT_UNBINDABLE(m) ((m)->mnt.mnt_flags & MNT_UNBINDABLE)
-#define IS_MNT_MARKED(m) ((m)->mnt.mnt_flags & MNT_MARKED)
-#define SET_MNT_MARK(m) ((m)->mnt.mnt_flags |= MNT_MARKED)
-#define CLEAR_MNT_MARK(m) ((m)->mnt.mnt_flags &= ~MNT_MARKED)
+#define CLEAR_MNT_SHARED(m) ((m)->mnt_t_flags &= ~T_SHARED)
+#define IS_MNT_UNBINDABLE(m) ((m)->mnt_t_flags & T_UNBINDABLE)
+#define IS_MNT_MARKED(m) ((m)->mnt_t_flags & T_MARKED)
+#define SET_MNT_MARK(m) ((m)->mnt_t_flags |= T_MARKED)
+#define CLEAR_MNT_MARK(m) ((m)->mnt_t_flags &= ~T_MARKED)
 #define IS_MNT_LOCKED(m) ((m)->mnt.mnt_flags & MNT_LOCKED)

 #define CL_EXPIRE		0x01
···
 #define CL_COPY_UNBINDABLE	0x04
 #define CL_MAKE_SHARED		0x08
 #define CL_PRIVATE		0x10
-#define CL_SHARED_TO_SLAVE	0x20
 #define CL_COPY_MNT_NS_FILE	0x40

+/*
+ * EXCL[namespace_sem]
+ */
 static inline void set_mnt_shared(struct mount *mnt)
 {
-	mnt->mnt.mnt_flags &= ~MNT_SHARED_MASK;
-	mnt->mnt.mnt_flags |= MNT_SHARED;
+	mnt->mnt_t_flags &= ~T_SHARED_MASK;
+	mnt->mnt_t_flags |= T_SHARED;
+}
+
+static inline bool peers(const struct mount *m1, const struct mount *m2)
+{
+	return m1->mnt_group_id == m2->mnt_group_id && m1->mnt_group_id;
 }

 void change_mnt_propagation(struct mount *, int);
 int propagate_mnt(struct mount *, struct mountpoint *, struct mount *,
 		struct hlist_head *);
-int propagate_umount(struct list_head *);
+void propagate_umount(struct list_head *);
 int propagate_mount_busy(struct mount *, int);
 void propagate_mount_unlock(struct mount *);
 void mnt_release_group_id(struct mount *);
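The fs/pnode.h hunk moves the shared, unbindable and marked bits from mnt.mnt_flags into the new mnt_t_flags field and adds peers(). The sketch below reproduces the bodies of set_mnt_shared() and peers() over a hypothetical stripped-down `struct toy_mount`. The T_* bit values are not shown in this diff, so the ones here are made up, and `T_SHARED_MASK = T_UNBINDABLE` is an assumption carried over from the MNT_SHARED_MASK comment that this commit removes from include/linux/mount.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values: the real T_* definitions are not in this diff. */
enum {
	T_SHARED	= 1 << 0,
	T_UNBINDABLE	= 1 << 1,
	T_MARKED	= 1 << 2,
	/* assumed to mirror the removed MNT_SHARED_MASK = MNT_UNBINDABLE */
	T_SHARED_MASK	= T_UNBINDABLE,
};

/* Becoming shared must never clear the shared bit itself. */
static_assert((T_SHARED_MASK & T_SHARED) == 0, "mask must not include T_SHARED");

/* Hypothetical stand-in for struct mount with only the fields used here. */
struct toy_mount {
	unsigned int mnt_t_flags;	/* flags protected by namespace_sem */
	int mnt_group_id;		/* 0 means "not in any peer group" */
};

#define IS_MNT_SHARED(m)	((m)->mnt_t_flags & T_SHARED)
#define IS_MNT_UNBINDABLE(m)	((m)->mnt_t_flags & T_UNBINDABLE)

/* Same body as set_mnt_shared(): drop bits incompatible with sharing. */
static inline void set_mnt_shared(struct toy_mount *mnt)
{
	mnt->mnt_t_flags &= ~T_SHARED_MASK;
	mnt->mnt_t_flags |= T_SHARED;
}

/* Same body as peers(): equal group ids, and the id must be non-zero. */
static inline bool peers(const struct toy_mount *m1, const struct toy_mount *m2)
{
	return m1->mnt_group_id == m2->mnt_group_id && m1->mnt_group_id;
}
```

The `&& m1->mnt_group_id` term matters: two mounts that belong to no peer group both have group id 0, and peers() must not report them as peers.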
+3 -15
include/linux/mount.h
···
 	MNT_SHRINKABLE	= 0x100,
 	MNT_WRITE_HOLD	= 0x200,

-	MNT_SHARED	= 0x1000, /* if the vfsmount is a shared mount */
-	MNT_UNBINDABLE	= 0x2000, /* if the vfsmount is a unbindable mount */
-
 	MNT_INTERNAL	= 0x4000,

 	MNT_LOCK_ATIME		= 0x040000,
···
 	MNT_LOCKED		= 0x800000,
 	MNT_DOOMED		= 0x1000000,
 	MNT_SYNC_UMOUNT		= 0x2000000,
-	MNT_MARKED		= 0x4000000,
 	MNT_UMOUNT		= 0x8000000,

-	/*
-	 * MNT_SHARED_MASK is the set of flags that should be cleared when a
-	 * mount becomes shared. Currently, this is only the flag that says a
-	 * mount cannot be bind mounted, since this is how we create a mount
-	 * that shares events with another mount. If you add a new MNT_*
-	 * flag, consider how it interacts with shared mounts.
-	 */
-	MNT_SHARED_MASK	= MNT_UNBINDABLE,
 	MNT_USER_SETTABLE_MASK  = MNT_NOSUID | MNT_NODEV | MNT_NOEXEC
 				  | MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME
 				  | MNT_READONLY | MNT_NOSYMFOLLOW,
 	MNT_ATIME_MASK = MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME,

-	MNT_INTERNAL_FLAGS = MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL |
-			     MNT_DOOMED | MNT_SYNC_UMOUNT | MNT_MARKED |
-			     MNT_LOCKED,
+	MNT_INTERNAL_FLAGS = MNT_WRITE_HOLD | MNT_INTERNAL | MNT_DOOMED |
+			     MNT_SYNC_UMOUNT | MNT_LOCKED
 };

 struct vfsmount {
···
 void mnt_put_write_access(struct vfsmount *mnt);

 extern struct vfsmount *fc_mount(struct fs_context *fc);
+extern struct vfsmount *fc_mount_longterm(struct fs_context *fc);
 extern struct vfsmount *vfs_create_mount(struct fs_context *fc);
 extern struct vfsmount *vfs_kern_mount(struct file_system_type *type,
 				      int flags, const char *name,
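The include/linux/mount.h hunk shrinks MNT_INTERNAL_FLAGS because MNT_SHARED and MNT_MARKED no longer live in mnt_flags (MNT_UNBINDABLE also leaves the enum, but it was never part of the internal mask). The check below uses the bit values copied from the hunk to make the before/after difference concrete. The `OLD_*` names and `strip_internal()` are hypothetical labels for illustration, not kernel identifiers.

```c
#include <assert.h>

/* Bit values copied from the include/linux/mount.h hunk; OLD_* names are
 * hypothetical labels for bits this commit removes from mnt_flags. */
enum {
	MNT_SHRINKABLE		= 0x100,
	MNT_WRITE_HOLD		= 0x200,
	MNT_INTERNAL		= 0x4000,
	MNT_LOCKED		= 0x800000,
	MNT_DOOMED		= 0x1000000,
	MNT_SYNC_UMOUNT		= 0x2000000,
	OLD_MNT_SHARED		= 0x1000,
	OLD_MNT_UNBINDABLE	= 0x2000,
	OLD_MNT_MARKED		= 0x4000000,
};

/* MNT_INTERNAL_FLAGS before and after the change. */
#define OLD_MNT_INTERNAL_FLAGS (OLD_MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL | \
				MNT_DOOMED | MNT_SYNC_UMOUNT | OLD_MNT_MARKED | \
				MNT_LOCKED)
#define NEW_MNT_INTERNAL_FLAGS (MNT_WRITE_HOLD | MNT_INTERNAL | MNT_DOOMED | \
				MNT_SYNC_UMOUNT | MNT_LOCKED)

/* Exactly the shared and marked bits leave the internal mask... */
static_assert((OLD_MNT_INTERNAL_FLAGS & ~NEW_MNT_INTERNAL_FLAGS) ==
	      (OLD_MNT_SHARED | OLD_MNT_MARKED), "unexpected mask delta");
/* ...while the unbindable bit was never in it. */
static_assert((OLD_MNT_INTERNAL_FLAGS & OLD_MNT_UNBINDABLE) == 0,
	      "unbindable was not internal");

/* Hypothetical helper: drop the kernel-internal bits from a flags word. */
static unsigned int strip_internal(unsigned int flags)
{
	return flags & ~(unsigned int)NEW_MNT_INTERNAL_FLAGS;
}
```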
+1 -1
ipc/mqueue.c
···
 	put_user_ns(fc->user_ns);
 	fc->user_ns = get_user_ns(ctx->ipc_ns->user_ns);

-	mnt = fc_mount(fc);
+	mnt = fc_mount_longterm(fc);
 	put_fs_context(fc);
 	return mnt;
 }