Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs updates from Miklos Szeredi:
"This work from Amir introduces the inodes index feature, which
provides:

- hardlinks are not broken on copy up

- infrastructure for overlayfs NFS export

This also fixes constant st_ino for samefs case for lower hardlinks"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (33 commits)
ovl: mark parent impure and restore timestamp on ovl_link_up()
ovl: document copying layers restrictions with inodes index
ovl: cleanup orphan index entries
ovl: persistent overlay inode nlink for indexed inodes
ovl: implement index dir copy up
ovl: move copy up lock out
ovl: rearrange copy up
ovl: add flag for upper in ovl_entry
ovl: use struct copy_up_ctx as function argument
ovl: base tmpfile in workdir too
ovl: factor out ovl_copy_up_inode() helper
ovl: extract helper to get temp file in copy up
ovl: defer upper dir lock to tempfile link
ovl: hash overlay non-dir inodes by copy up origin
ovl: cleanup bad and stale index entries on mount
ovl: lookup index entry for copy up origin
ovl: verify index dir matches upper dir
ovl: verify upper root dir matches lower root dir
ovl: introduce the inodes index dir feature
ovl: generalize ovl_create_workdir()
...

+1465 -392
+34
Documentation/filesystems/overlayfs.txt
···
 top, lower2 the middle and lower3 the bottom layer.


+Sharing and copying layers
+--------------------------
+
+Lower layers may be shared among several overlay mounts and that is indeed
+a very common practice. An overlay mount may use the same lower layer
+path as another overlay mount and it may use a lower layer path that is
+beneath or above the path of another overlay lower layer path.
+
+Using an upper layer path and/or a workdir path that are already used by
+another overlay mount is not allowed and will fail with EBUSY. Using
+partially overlapping paths is not allowed but will not fail with EBUSY.
+
+Mounting an overlay using an upper layer path, where the upper layer path
+was previously used by another mounted overlay in combination with a
+different lower layer path, is allowed, unless the "inodes index" feature
+is enabled.
+
+With the "inodes index" feature, on the first time mount, an NFS file
+handle of the lower layer root directory, along with the UUID of the lower
+filesystem, are encoded and stored in the "trusted.overlay.origin" extended
+attribute on the upper layer root directory. On subsequent mount attempts,
+the lower root directory file handle and lower filesystem UUID are compared
+to the stored origin in upper root directory. On failure to verify the
+lower root origin, mount will fail with ESTALE. An overlayfs mount with
+"inodes index" enabled will fail with EOPNOTSUPP if the lower filesystem
+does not support NFS export, lower filesystem does not have a valid UUID or
+if the upper filesystem does not support extended attributes.
+
+It is quite a common practice to copy overlay layers to a different
+directory tree on the same or different underlying filesystem, and even
+to a different machine. With the "inodes index" feature, trying to mount
+the copied layers will fail the verification of the lower root file handle.
+
+
 Non-standard behavior
 ---------------------
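The mount-time origin check described in the overlayfs.txt hunk can be modelled in user space. What follows is only a sketch of the comparison rule (a differing UUID or file handle gives ESTALE), not the kernel's code: `struct lower_origin`, `MODEL_FH_MAX` and `verify_lower_origin()` are hypothetical names invented for this illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_FH_MAX 128	/* hypothetical bound, not the kernel's limit */

/* Hypothetical model of what the origin xattr records for the lower root. */
struct lower_origin {
	unsigned char uuid[16];		/* UUID of the lower filesystem */
	unsigned char fh[MODEL_FH_MAX];	/* opaque file handle bytes */
	size_t fh_len;
};

/*
 * Compare the origin stored on the upper root with the origin of the lower
 * root seen at this mount. stored == NULL models the first mount, where
 * nothing has been recorded yet. Returns 0 on match, -ESTALE on mismatch.
 */
int verify_lower_origin(const struct lower_origin *stored,
			const struct lower_origin *current)
{
	if (!stored)
		return 0;
	if (memcmp(stored->uuid, current->uuid, sizeof(stored->uuid)) != 0)
		return -ESTALE;
	if (stored->fh_len > MODEL_FH_MAX || stored->fh_len != current->fh_len)
		return -ESTALE;
	if (memcmp(stored->fh, current->fh, stored->fh_len) != 0)
		return -ESTALE;
	return 0;
}
```

In this model, copying the layers to another filesystem changes the UUID and, most likely, the file handle, so the comparison fails in the way the documentation describes for copied layers.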
+20
fs/overlayfs/Kconfig
···
 	  Note, that redirects are not backward compatible. That is, mounting
 	  an overlay which has redirects on a kernel that doesn't support this
 	  feature will have unexpected results.
+
+config OVERLAY_FS_INDEX
+	bool "Overlayfs: turn on inodes index feature by default"
+	depends on OVERLAY_FS
+	help
+	  If this config option is enabled then overlay filesystems will use
+	  the inodes index dir to map lower inodes to upper inodes by default.
+	  In this case it is still possible to turn off index globally with the
+	  "index=off" module option or on a filesystem instance basis with the
+	  "index=off" mount option.
+
+	  The inodes index feature prevents breaking of lower hardlinks on copy
+	  up.
+
+	  Note, that the inodes index feature is read-only backward compatible.
+	  That is, mounting an overlay which has an index dir on a kernel that
+	  doesn't support this feature read-only, will not have any negative
+	  outcomes. However, mounting the same overlay with an old kernel
+	  read-write and then mounting it again with a new kernel, will have
+	  unexpected results.
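The per-instance override mentioned in the Kconfig help text is passed as part of the mount data string. As a rough sketch, the helper below assembles such a string. The function name and the paths used with it are hypothetical. The Kconfig text above names only "index=off", so "index=on" is assumed here as its counterpart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build the option string for an overlay mount (hypothetical helper).
 * Returns 0 on success, -1 if the buffer is too small.
 */
int ovl_build_mount_opts(char *buf, size_t size, const char *lower,
			 const char *upper, const char *work, bool index)
{
	int n;

	assert(buf && lower && upper && work);
	n = snprintf(buf, size,
		     "lowerdir=%s,upperdir=%s,workdir=%s,index=%s",
		     lower, upper, work, index ? "on" : "off");
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

The resulting string would be handed to mount(2) as its data argument, for example `mount("overlay", "/merged", "overlay", 0, opts)`, which needs the appropriate privileges and is therefore left out of the sketch.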
+271 -141
fs/overlayfs/copy_up.c
··· 233 233 return err; 234 234 } 235 235 236 - static struct ovl_fh *ovl_encode_fh(struct dentry *lower, uuid_t *uuid) 236 + struct ovl_fh *ovl_encode_fh(struct dentry *lower, bool is_upper) 237 237 { 238 238 struct ovl_fh *fh; 239 239 int fh_type, fh_len, dwords; 240 240 void *buf; 241 241 int buflen = MAX_HANDLE_SZ; 242 + uuid_t *uuid = &lower->d_sb->s_uuid; 242 243 243 244 buf = kmalloc(buflen, GFP_TEMPORARY); 244 245 if (!buf) ··· 272 271 fh->magic = OVL_FH_MAGIC; 273 272 fh->type = fh_type; 274 273 fh->flags = OVL_FH_FLAG_CPU_ENDIAN; 274 + /* 275 + * When we will want to decode an overlay dentry from this handle 276 + * and all layers are on the same fs, if we get a disconncted real 277 + * dentry when we decode fid, the only way to tell if we should assign 278 + * it to upperdentry or to lowerstack is by checking this flag. 279 + */ 280 + if (is_upper) 281 + fh->flags |= OVL_FH_FLAG_PATH_UPPER; 275 282 fh->len = fh_len; 276 283 fh->uuid = *uuid; 277 284 memcpy(fh->fid, buf, buflen); ··· 292 283 static int ovl_set_origin(struct dentry *dentry, struct dentry *lower, 293 284 struct dentry *upper) 294 285 { 295 - struct super_block *sb = lower->d_sb; 296 286 const struct ovl_fh *fh = NULL; 297 287 int err; 298 288 ··· 300 292 * so we can use the overlay.origin xattr to distignuish between a copy 301 293 * up and a pure upper inode. 
302 294 */ 303 - if (sb->s_export_op && sb->s_export_op->fh_to_dentry && 304 - !uuid_is_null(&sb->s_uuid)) { 305 - fh = ovl_encode_fh(lower, &sb->s_uuid); 295 + if (ovl_can_decode_fh(lower->d_sb)) { 296 + fh = ovl_encode_fh(lower, false); 306 297 if (IS_ERR(fh)) 307 298 return PTR_ERR(fh); 308 299 } ··· 316 309 return err; 317 310 } 318 311 319 - static int ovl_copy_up_locked(struct dentry *workdir, struct dentry *upperdir, 320 - struct dentry *dentry, struct path *lowerpath, 321 - struct kstat *stat, const char *link, 322 - struct kstat *pstat, bool tmpfile) 312 + struct ovl_copy_up_ctx { 313 + struct dentry *parent; 314 + struct dentry *dentry; 315 + struct path lowerpath; 316 + struct kstat stat; 317 + struct kstat pstat; 318 + const char *link; 319 + struct dentry *destdir; 320 + struct qstr destname; 321 + struct dentry *workdir; 322 + bool tmpfile; 323 + bool origin; 324 + }; 325 + 326 + static int ovl_link_up(struct ovl_copy_up_ctx *c) 323 327 { 324 - struct inode *wdir = workdir->d_inode; 325 - struct inode *udir = upperdir->d_inode; 326 - struct dentry *newdentry = NULL; 327 - struct dentry *upper = NULL; 328 - struct dentry *temp = NULL; 329 328 int err; 329 + struct dentry *upper; 330 + struct dentry *upperdir = ovl_dentry_upper(c->parent); 331 + struct inode *udir = d_inode(upperdir); 332 + 333 + /* Mark parent "impure" because it may now contain non-pure upper */ 334 + err = ovl_set_impure(c->parent, upperdir); 335 + if (err) 336 + return err; 337 + 338 + err = ovl_set_nlink_lower(c->dentry); 339 + if (err) 340 + return err; 341 + 342 + inode_lock_nested(udir, I_MUTEX_PARENT); 343 + upper = lookup_one_len(c->dentry->d_name.name, upperdir, 344 + c->dentry->d_name.len); 345 + err = PTR_ERR(upper); 346 + if (!IS_ERR(upper)) { 347 + err = ovl_do_link(ovl_dentry_upper(c->dentry), udir, upper, 348 + true); 349 + dput(upper); 350 + 351 + if (!err) { 352 + /* Restore timestamps on parent (best effort) */ 353 + ovl_set_timestamps(upperdir, &c->pstat); 354 + 
ovl_dentry_set_upper_alias(c->dentry); 355 + } 356 + } 357 + inode_unlock(udir); 358 + ovl_set_nlink_upper(c->dentry); 359 + 360 + return err; 361 + } 362 + 363 + static int ovl_install_temp(struct ovl_copy_up_ctx *c, struct dentry *temp, 364 + struct dentry **newdentry) 365 + { 366 + int err; 367 + struct dentry *upper; 368 + struct inode *udir = d_inode(c->destdir); 369 + 370 + upper = lookup_one_len(c->destname.name, c->destdir, c->destname.len); 371 + if (IS_ERR(upper)) 372 + return PTR_ERR(upper); 373 + 374 + if (c->tmpfile) 375 + err = ovl_do_link(temp, udir, upper, true); 376 + else 377 + err = ovl_do_rename(d_inode(c->workdir), temp, udir, upper, 0); 378 + 379 + if (!err) 380 + *newdentry = dget(c->tmpfile ? upper : temp); 381 + dput(upper); 382 + 383 + return err; 384 + } 385 + 386 + static int ovl_get_tmpfile(struct ovl_copy_up_ctx *c, struct dentry **tempp) 387 + { 388 + int err; 389 + struct dentry *temp; 330 390 const struct cred *old_creds = NULL; 331 391 struct cred *new_creds = NULL; 332 392 struct cattr cattr = { 333 393 /* Can't properly set mode on creation because of the umask */ 334 - .mode = stat->mode & S_IFMT, 335 - .rdev = stat->rdev, 336 - .link = link 394 + .mode = c->stat.mode & S_IFMT, 395 + .rdev = c->stat.rdev, 396 + .link = c->link 337 397 }; 338 398 339 - err = security_inode_copy_up(dentry, &new_creds); 399 + err = security_inode_copy_up(c->dentry, &new_creds); 340 400 if (err < 0) 341 401 goto out; 342 402 343 403 if (new_creds) 344 404 old_creds = override_creds(new_creds); 345 405 346 - if (tmpfile) 347 - temp = ovl_do_tmpfile(upperdir, stat->mode); 348 - else 349 - temp = ovl_lookup_temp(workdir); 350 - err = 0; 351 - if (IS_ERR(temp)) { 352 - err = PTR_ERR(temp); 353 - temp = NULL; 406 + if (c->tmpfile) { 407 + temp = ovl_do_tmpfile(c->workdir, c->stat.mode); 408 + if (IS_ERR(temp)) 409 + goto temp_err; 410 + } else { 411 + temp = ovl_lookup_temp(c->workdir); 412 + if (IS_ERR(temp)) 413 + goto temp_err; 414 + 415 + err = 
ovl_create_real(d_inode(c->workdir), temp, &cattr, 416 + NULL, true); 417 + if (err) { 418 + dput(temp); 419 + goto out; 420 + } 354 421 } 355 - 356 - if (!err && !tmpfile) 357 - err = ovl_create_real(wdir, temp, &cattr, NULL, true); 358 - 422 + err = 0; 423 + *tempp = temp; 424 + out: 359 425 if (new_creds) { 360 426 revert_creds(old_creds); 361 427 put_cred(new_creds); 362 428 } 363 429 364 - if (err) 365 - goto out; 430 + return err; 366 431 367 - if (S_ISREG(stat->mode)) { 432 + temp_err: 433 + err = PTR_ERR(temp); 434 + goto out; 435 + } 436 + 437 + static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp) 438 + { 439 + int err; 440 + 441 + if (S_ISREG(c->stat.mode)) { 368 442 struct path upperpath; 369 443 370 - ovl_path_upper(dentry, &upperpath); 444 + ovl_path_upper(c->dentry, &upperpath); 371 445 BUG_ON(upperpath.dentry != NULL); 372 446 upperpath.dentry = temp; 373 447 374 - if (tmpfile) { 375 - inode_unlock(udir); 376 - err = ovl_copy_up_data(lowerpath, &upperpath, 377 - stat->size); 378 - inode_lock_nested(udir, I_MUTEX_PARENT); 379 - } else { 380 - err = ovl_copy_up_data(lowerpath, &upperpath, 381 - stat->size); 382 - } 383 - 448 + err = ovl_copy_up_data(&c->lowerpath, &upperpath, c->stat.size); 384 449 if (err) 385 - goto out_cleanup; 450 + return err; 386 451 } 387 452 388 - err = ovl_copy_xattr(lowerpath->dentry, temp); 453 + err = ovl_copy_xattr(c->lowerpath.dentry, temp); 389 454 if (err) 390 - goto out_cleanup; 455 + return err; 391 456 392 457 inode_lock(temp->d_inode); 393 - err = ovl_set_attr(temp, stat); 458 + err = ovl_set_attr(temp, &c->stat); 394 459 inode_unlock(temp->d_inode); 395 460 if (err) 396 - goto out_cleanup; 461 + return err; 397 462 398 463 /* 399 464 * Store identifier of lower inode in upper inode xattr to ··· 474 395 * Don't set origin when we are breaking the association with a lower 475 396 * hard link. 
476 397 */ 477 - if (S_ISDIR(stat->mode) || stat->nlink == 1) { 478 - err = ovl_set_origin(dentry, lowerpath->dentry, temp); 398 + if (c->origin) { 399 + err = ovl_set_origin(c->dentry, c->lowerpath.dentry, temp); 479 400 if (err) 480 - goto out_cleanup; 401 + return err; 481 402 } 482 403 483 - upper = lookup_one_len(dentry->d_name.name, upperdir, 484 - dentry->d_name.len); 485 - if (IS_ERR(upper)) { 486 - err = PTR_ERR(upper); 487 - upper = NULL; 488 - goto out_cleanup; 489 - } 404 + return 0; 405 + } 490 406 491 - if (tmpfile) 492 - err = ovl_do_link(temp, udir, upper, true); 493 - else 494 - err = ovl_do_rename(wdir, temp, udir, upper, 0); 407 + static int ovl_copy_up_locked(struct ovl_copy_up_ctx *c) 408 + { 409 + struct inode *udir = c->destdir->d_inode; 410 + struct dentry *newdentry = NULL; 411 + struct dentry *temp = NULL; 412 + int err; 413 + 414 + err = ovl_get_tmpfile(c, &temp); 415 + if (err) 416 + goto out; 417 + 418 + err = ovl_copy_up_inode(c, temp); 495 419 if (err) 496 420 goto out_cleanup; 497 421 498 - newdentry = dget(tmpfile ? upper : temp); 499 - ovl_dentry_update(dentry, newdentry); 500 - ovl_inode_update(d_inode(dentry), d_inode(newdentry)); 422 + if (c->tmpfile) { 423 + inode_lock_nested(udir, I_MUTEX_PARENT); 424 + err = ovl_install_temp(c, temp, &newdentry); 425 + inode_unlock(udir); 426 + } else { 427 + err = ovl_install_temp(c, temp, &newdentry); 428 + } 429 + if (err) 430 + goto out_cleanup; 501 431 502 - /* Restore timestamps on parent (best effort) */ 503 - ovl_set_timestamps(upperdir, pstat); 432 + ovl_inode_update(d_inode(c->dentry), newdentry); 504 433 out: 505 434 dput(temp); 506 - dput(upper); 507 435 return err; 508 436 509 437 out_cleanup: 510 - if (!tmpfile) 511 - ovl_cleanup(wdir, temp); 438 + if (!c->tmpfile) 439 + ovl_cleanup(d_inode(c->workdir), temp); 512 440 goto out; 513 441 } 514 442 ··· 528 442 * is possible that the copy up will lock the old parent. 
At that point 529 443 * the file will have already been copied up anyway. 530 444 */ 531 - static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry, 532 - struct path *lowerpath, struct kstat *stat) 445 + static int ovl_do_copy_up(struct ovl_copy_up_ctx *c) 533 446 { 534 - DEFINE_DELAYED_CALL(done); 535 - struct dentry *workdir = ovl_workdir(dentry); 536 447 int err; 537 - struct kstat pstat; 538 - struct path parentpath; 539 - struct dentry *lowerdentry = lowerpath->dentry; 540 - struct dentry *upperdir; 541 - const char *link = NULL; 542 - struct ovl_fs *ofs = dentry->d_sb->s_fs_info; 448 + struct ovl_fs *ofs = c->dentry->d_sb->s_fs_info; 449 + bool indexed = false; 543 450 544 - if (WARN_ON(!workdir)) 451 + if (ovl_indexdir(c->dentry->d_sb) && !S_ISDIR(c->stat.mode) && 452 + c->stat.nlink > 1) 453 + indexed = true; 454 + 455 + if (S_ISDIR(c->stat.mode) || c->stat.nlink == 1 || indexed) 456 + c->origin = true; 457 + 458 + if (indexed) { 459 + c->destdir = ovl_indexdir(c->dentry->d_sb); 460 + err = ovl_get_index_name(c->lowerpath.dentry, &c->destname); 461 + if (err) 462 + return err; 463 + } else { 464 + /* 465 + * Mark parent "impure" because it may now contain non-pure 466 + * upper 467 + */ 468 + err = ovl_set_impure(c->parent, c->destdir); 469 + if (err) 470 + return err; 471 + } 472 + 473 + /* Should we copyup with O_TMPFILE or with workdir? 
*/ 474 + if (S_ISREG(c->stat.mode) && ofs->tmpfile) { 475 + c->tmpfile = true; 476 + err = ovl_copy_up_locked(c); 477 + } else { 478 + err = -EIO; 479 + if (lock_rename(c->workdir, c->destdir) != NULL) { 480 + pr_err("overlayfs: failed to lock workdir+upperdir\n"); 481 + } else { 482 + err = ovl_copy_up_locked(c); 483 + unlock_rename(c->workdir, c->destdir); 484 + } 485 + } 486 + 487 + if (indexed) { 488 + if (!err) 489 + ovl_set_flag(OVL_INDEX, d_inode(c->dentry)); 490 + kfree(c->destname.name); 491 + } else if (!err) { 492 + struct inode *udir = d_inode(c->destdir); 493 + 494 + /* Restore timestamps on parent (best effort) */ 495 + inode_lock(udir); 496 + ovl_set_timestamps(c->destdir, &c->pstat); 497 + inode_unlock(udir); 498 + 499 + ovl_dentry_set_upper_alias(c->dentry); 500 + } 501 + 502 + return err; 503 + } 504 + 505 + static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry, 506 + int flags) 507 + { 508 + int err; 509 + DEFINE_DELAYED_CALL(done); 510 + struct path parentpath; 511 + struct ovl_copy_up_ctx ctx = { 512 + .parent = parent, 513 + .dentry = dentry, 514 + .workdir = ovl_workdir(dentry), 515 + }; 516 + 517 + if (WARN_ON(!ctx.workdir)) 545 518 return -EROFS; 546 519 547 - ovl_do_check_copy_up(lowerdentry); 548 - 549 - ovl_path_upper(parent, &parentpath); 550 - upperdir = parentpath.dentry; 551 - 552 - /* Mark parent "impure" because it may now contain non-pure upper */ 553 - err = ovl_set_impure(parent, upperdir); 520 + ovl_path_lower(dentry, &ctx.lowerpath); 521 + err = vfs_getattr(&ctx.lowerpath, &ctx.stat, 522 + STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT); 554 523 if (err) 555 524 return err; 556 525 557 - err = vfs_getattr(&parentpath, &pstat, 526 + ovl_path_upper(parent, &parentpath); 527 + ctx.destdir = parentpath.dentry; 528 + ctx.destname = dentry->d_name; 529 + 530 + err = vfs_getattr(&parentpath, &ctx.pstat, 558 531 STATX_ATIME | STATX_MTIME, AT_STATX_SYNC_AS_STAT); 559 532 if (err) 560 533 return err; 561 534 562 - if 
(S_ISLNK(stat->mode)) { 563 - link = vfs_get_link(lowerdentry, &done); 564 - if (IS_ERR(link)) 565 - return PTR_ERR(link); 535 + /* maybe truncate regular file. this has no effect on dirs */ 536 + if (flags & O_TRUNC) 537 + ctx.stat.size = 0; 538 + 539 + if (S_ISLNK(ctx.stat.mode)) { 540 + ctx.link = vfs_get_link(ctx.lowerpath.dentry, &done); 541 + if (IS_ERR(ctx.link)) 542 + return PTR_ERR(ctx.link); 566 543 } 544 + ovl_do_check_copy_up(ctx.lowerpath.dentry); 567 545 568 - /* Should we copyup with O_TMPFILE or with workdir? */ 569 - if (S_ISREG(stat->mode) && ofs->tmpfile) { 570 - err = ovl_copy_up_start(dentry); 571 - /* err < 0: interrupted, err > 0: raced with another copy-up */ 572 - if (unlikely(err)) { 573 - pr_debug("ovl_copy_up_start(%pd2) = %i\n", dentry, err); 574 - if (err > 0) 575 - err = 0; 576 - goto out_done; 577 - } 578 - 579 - inode_lock_nested(upperdir->d_inode, I_MUTEX_PARENT); 580 - err = ovl_copy_up_locked(workdir, upperdir, dentry, lowerpath, 581 - stat, link, &pstat, true); 582 - inode_unlock(upperdir->d_inode); 546 + err = ovl_copy_up_start(dentry); 547 + /* err < 0: interrupted, err > 0: raced with another copy-up */ 548 + if (unlikely(err)) { 549 + if (err > 0) 550 + err = 0; 551 + } else { 552 + if (!ovl_dentry_upper(dentry)) 553 + err = ovl_do_copy_up(&ctx); 554 + if (!err && !ovl_dentry_has_upper_alias(dentry)) 555 + err = ovl_link_up(&ctx); 583 556 ovl_copy_up_end(dentry); 584 - goto out_done; 585 557 } 586 - 587 - err = -EIO; 588 - if (lock_rename(workdir, upperdir) != NULL) { 589 - pr_err("overlayfs: failed to lock workdir+upperdir\n"); 590 - goto out_unlock; 591 - } 592 - if (ovl_dentry_upper(dentry)) { 593 - /* Raced with another copy-up? Nothing to do, then... 
*/ 594 - err = 0; 595 - goto out_unlock; 596 - } 597 - 598 - err = ovl_copy_up_locked(workdir, upperdir, dentry, lowerpath, 599 - stat, link, &pstat, false); 600 - out_unlock: 601 - unlock_rename(workdir, upperdir); 602 - out_done: 603 558 do_delayed_call(&done); 604 559 605 560 return err; ··· 654 527 while (!err) { 655 528 struct dentry *next; 656 529 struct dentry *parent; 657 - struct path lowerpath; 658 - struct kstat stat; 659 - enum ovl_path_type type = ovl_path_type(dentry); 660 530 661 - if (OVL_TYPE_UPPER(type)) 531 + /* 532 + * Check if copy-up has happened as well as for upper alias (in 533 + * case of hard links) is there. 534 + * 535 + * Both checks are lockless: 536 + * - false negatives: will recheck under oi->lock 537 + * - false positives: 538 + * + ovl_dentry_upper() uses memory barriers to ensure the 539 + * upper dentry is up-to-date 540 + * + ovl_dentry_has_upper_alias() relies on locking of 541 + * upper parent i_rwsem to prevent reordering copy-up 542 + * with rename. 543 + */ 544 + if (ovl_dentry_upper(dentry) && 545 + ovl_dentry_has_upper_alias(dentry)) 662 546 break; 663 547 664 548 next = dget(dentry); ··· 677 539 for (;;) { 678 540 parent = dget_parent(next); 679 541 680 - type = ovl_path_type(parent); 681 - if (OVL_TYPE_UPPER(type)) 542 + if (ovl_dentry_upper(parent)) 682 543 break; 683 544 684 545 dput(next); 685 546 next = parent; 686 547 } 687 548 688 - ovl_path_lower(next, &lowerpath); 689 - err = vfs_getattr(&lowerpath, &stat, 690 - STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT); 691 - /* maybe truncate regular file. this has no effect on dirs */ 692 - if (flags & O_TRUNC) 693 - stat.size = 0; 694 - if (!err) 695 - err = ovl_copy_up_one(parent, next, &lowerpath, &stat); 549 + err = ovl_copy_up_one(parent, next, flags); 696 550 697 551 dput(parent); 698 552 dput(next);
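The decision made at the top of ovl_do_copy_up() in the copy_up.c diff above (when to copy up into the index dir and when to record an origin) can be restated as a small user-space model. This is a sketch for reading the diff, not kernel code. `struct copy_up_plan` and `plan_copy_up()` are hypothetical names, and `have_indexdir` stands in for a non-NULL ovl_indexdir().

```c
#include <stdbool.h>

/* Hypothetical summary of the two flags ovl_do_copy_up() derives. */
struct copy_up_plan {
	bool indexed;	/* copy up into the index dir under an index name */
	bool origin;	/* store the origin xattr on the upper inode */
};

struct copy_up_plan plan_copy_up(bool have_indexdir, bool is_dir,
				 unsigned int nlink)
{
	struct copy_up_plan p = { false, false };

	/* Only non-directory lower hardlinks go through the index. */
	if (have_indexdir && !is_dir && nlink > 1)
		p.indexed = true;

	/*
	 * Without an index, a lower hardlink is broken on copy up, so no
	 * origin is recorded for it.
	 */
	if (is_dir || nlink == 1 || p.indexed)
		p.origin = true;

	return p;
}
```

With the index disabled, a file with `nlink == 2` yields neither flag, which matches the pre-existing behaviour that the removed `S_ISDIR(stat->mode) || stat->nlink == 1` test implemented.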
+41 -11
fs/overlayfs/dir.c
··· 24 24 MODULE_PARM_DESC(ovl_redirect_max, 25 25 "Maximum length of absolute redirect xattr value"); 26 26 27 - void ovl_cleanup(struct inode *wdir, struct dentry *wdentry) 27 + int ovl_cleanup(struct inode *wdir, struct dentry *wdentry) 28 28 { 29 29 int err; 30 30 ··· 39 39 pr_err("overlayfs: cleanup of '%pd2' failed (%i)\n", 40 40 wdentry, err); 41 41 } 42 + 43 + return err; 42 44 } 43 45 44 46 struct dentry *ovl_lookup_temp(struct dentry *workdir) ··· 156 154 struct dentry *newdentry, bool hardlink) 157 155 { 158 156 ovl_dentry_version_inc(dentry->d_parent); 159 - ovl_dentry_update(dentry, newdentry); 157 + ovl_dentry_set_upper_alias(dentry); 160 158 if (!hardlink) { 161 - ovl_inode_update(inode, d_inode(newdentry)); 159 + ovl_inode_update(inode, newdentry); 162 160 ovl_copyattr(newdentry->d_inode, inode); 163 161 } else { 164 - WARN_ON(ovl_inode_real(inode, NULL) != d_inode(newdentry)); 162 + WARN_ON(ovl_inode_real(inode) != d_inode(newdentry)); 163 + dput(newdentry); 165 164 inc_nlink(inode); 166 165 } 167 166 d_instantiate(dentry, inode); ··· 591 588 struct dentry *new) 592 589 { 593 590 int err; 591 + bool locked = false; 594 592 struct inode *inode; 595 593 596 594 err = ovl_want_write(old); ··· 602 598 if (err) 603 599 goto out_drop_write; 604 600 601 + err = ovl_nlink_start(old, &locked); 602 + if (err) 603 + goto out_drop_write; 604 + 605 605 inode = d_inode(old); 606 606 ihold(inode); 607 607 ··· 613 605 if (err) 614 606 iput(inode); 615 607 608 + ovl_nlink_end(old, locked); 616 609 out_drop_write: 617 610 ovl_drop_write(old); 618 611 out: 619 612 return err; 613 + } 614 + 615 + static bool ovl_matches_upper(struct dentry *dentry, struct dentry *upper) 616 + { 617 + return d_inode(ovl_dentry_upper(dentry)) == d_inode(upper); 620 618 } 621 619 622 620 static int ovl_remove_and_whiteout(struct dentry *dentry, bool is_dir) ··· 660 646 err = -ESTALE; 661 647 if ((opaquedir && upper != opaquedir) || 662 648 (!opaquedir && ovl_dentry_upper(dentry) && 663 - 
upper != ovl_dentry_upper(dentry))) { 649 + !ovl_matches_upper(dentry, upper))) { 664 650 goto out_dput_upper; 665 651 } 666 652 ··· 721 707 722 708 err = -ESTALE; 723 709 if ((opaquedir && upper != opaquedir) || 724 - (!opaquedir && upper != ovl_dentry_upper(dentry))) 710 + (!opaquedir && !ovl_matches_upper(dentry, upper))) 725 711 goto out_dput_upper; 726 712 727 713 if (is_dir) ··· 749 735 750 736 static int ovl_do_remove(struct dentry *dentry, bool is_dir) 751 737 { 752 - enum ovl_path_type type; 753 738 int err; 739 + bool locked = false; 754 740 const struct cred *old_cred; 755 741 756 742 err = ovl_want_write(dentry); ··· 761 747 if (err) 762 748 goto out_drop_write; 763 749 764 - type = ovl_path_type(dentry); 750 + err = ovl_nlink_start(dentry, &locked); 751 + if (err) 752 + goto out_drop_write; 765 753 766 754 old_cred = ovl_override_creds(dentry->d_sb); 767 755 if (!ovl_lower_positive(dentry)) ··· 777 761 else 778 762 drop_nlink(dentry->d_inode); 779 763 } 764 + ovl_nlink_end(dentry, locked); 780 765 out_drop_write: 781 766 ovl_drop_write(dentry); 782 767 out: ··· 900 883 unsigned int flags) 901 884 { 902 885 int err; 886 + bool locked = false; 903 887 struct dentry *old_upperdir; 904 888 struct dentry *new_upperdir; 905 889 struct dentry *olddentry; ··· 942 924 goto out_drop_write; 943 925 if (!overwrite) { 944 926 err = ovl_copy_up(new); 927 + if (err) 928 + goto out_drop_write; 929 + } else { 930 + err = ovl_nlink_start(new, &locked); 945 931 if (err) 946 932 goto out_drop_write; 947 933 } ··· 1007 985 goto out_unlock; 1008 986 1009 987 err = -ESTALE; 1010 - if (olddentry != ovl_dentry_upper(old)) 988 + if (!ovl_matches_upper(old, olddentry)) 1011 989 goto out_dput_old; 1012 990 1013 991 newdentry = lookup_one_len(new->d_name.name, new_upperdir, ··· 1020 998 new_opaque = ovl_dentry_is_opaque(new); 1021 999 1022 1000 err = -ESTALE; 1023 - if (ovl_dentry_upper(new)) { 1001 + if (d_inode(new) && ovl_dentry_upper(new)) { 1024 1002 if (opaquedir) { 1025 
1003 if (newdentry != opaquedir) 1026 1004 goto out_dput; 1027 1005 } else { 1028 - if (newdentry != ovl_dentry_upper(new)) 1006 + if (!ovl_matches_upper(new, newdentry)) 1029 1007 goto out_dput; 1030 1008 } 1031 1009 } else { ··· 1068 1046 if (cleanup_whiteout) 1069 1047 ovl_cleanup(old_upperdir->d_inode, newdentry); 1070 1048 1049 + if (overwrite && d_inode(new)) { 1050 + if (new_is_dir) 1051 + clear_nlink(d_inode(new)); 1052 + else 1053 + drop_nlink(d_inode(new)); 1054 + } 1055 + 1071 1056 ovl_dentry_version_inc(old->d_parent); 1072 1057 ovl_dentry_version_inc(new->d_parent); 1073 1058 ··· 1086 1057 unlock_rename(new_upperdir, old_upperdir); 1087 1058 out_revert_creds: 1088 1059 revert_creds(old_cred); 1060 + ovl_nlink_end(new, locked); 1089 1061 out_drop_write: 1090 1062 ovl_drop_write(old); 1091 1063 out:
+194 -25
fs/overlayfs/inode.c
··· 12 12 #include <linux/cred.h> 13 13 #include <linux/xattr.h> 14 14 #include <linux/posix_acl.h> 15 + #include <linux/ratelimit.h> 15 16 #include "overlayfs.h" 16 17 17 18 int ovl_setattr(struct dentry *dentry, struct iattr *attr) ··· 97 96 98 97 WARN_ON_ONCE(stat->dev != lowerstat.dev); 99 98 /* 100 - * Lower hardlinks are broken on copy up to different 99 + * Lower hardlinks may be broken on copy up to different 101 100 * upper files, so we cannot use the lower origin st_ino 102 101 * for those different files, even for the same fs case. 102 + * With inodes index enabled, it is safe to use st_ino 103 + * of an indexed hardlinked origin. The index validates 104 + * that the upper hardlink is not broken. 103 105 */ 104 - if (is_dir || lowerstat.nlink == 1) 106 + if (is_dir || lowerstat.nlink == 1 || 107 + ovl_test_flag(OVL_INDEX, d_inode(dentry))) 105 108 stat->ino = lowerstat.ino; 106 109 } 107 110 stat->dev = dentry->d_sb->s_dev; ··· 131 126 if (is_dir && OVL_TYPE_MERGE(type)) 132 127 stat->nlink = 1; 133 128 129 + /* 130 + * Return the overlay inode nlinks for indexed upper inodes. 131 + * Overlay inode nlink counts the union of the upper hardlinks 132 + * and non-covered lower hardlinks. It does not include the upper 133 + * index hardlink. 
134 + */ 135 + if (!is_dir && ovl_test_flag(OVL_INDEX, d_inode(dentry))) 136 + stat->nlink = dentry->d_inode->i_nlink; 137 + 134 138 out: 135 139 revert_creds(old_cred); 136 140 ··· 148 134 149 135 int ovl_permission(struct inode *inode, int mask) 150 136 { 151 - bool is_upper; 152 - struct inode *realinode = ovl_inode_real(inode, &is_upper); 137 + struct inode *upperinode = ovl_inode_upper(inode); 138 + struct inode *realinode = upperinode ?: ovl_inode_lower(inode); 153 139 const struct cred *old_cred; 154 140 int err; 155 141 ··· 168 154 return err; 169 155 170 156 old_cred = ovl_override_creds(inode->i_sb); 171 - if (!is_upper && !special_file(realinode->i_mode) && mask & MAY_WRITE) { 157 + if (!upperinode && 158 + !special_file(realinode->i_mode) && mask & MAY_WRITE) { 172 159 mask &= ~(MAY_WRITE | MAY_APPEND); 173 160 /* Make sure mounter can read file for copy up later */ 174 161 mask |= MAY_READ; ··· 301 286 302 287 struct posix_acl *ovl_get_acl(struct inode *inode, int type) 303 288 { 304 - struct inode *realinode = ovl_inode_real(inode, NULL); 289 + struct inode *realinode = ovl_inode_real(inode); 305 290 const struct cred *old_cred; 306 291 struct posix_acl *acl; 307 292 ··· 315 300 return acl; 316 301 } 317 302 318 - static bool ovl_open_need_copy_up(int flags, enum ovl_path_type type, 319 - struct dentry *realdentry) 303 + static bool ovl_open_need_copy_up(struct dentry *dentry, int flags) 320 304 { 321 - if (OVL_TYPE_UPPER(type)) 305 + if (ovl_dentry_upper(dentry) && 306 + ovl_dentry_has_upper_alias(dentry)) 322 307 return false; 323 308 324 - if (special_file(realdentry->d_inode->i_mode)) 309 + if (special_file(d_inode(dentry)->i_mode)) 325 310 return false; 326 311 327 312 if (!(OPEN_FMODE(flags) & FMODE_WRITE) && !(flags & O_TRUNC)) ··· 333 318 int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags) 334 319 { 335 320 int err = 0; 336 - struct path realpath; 337 - enum ovl_path_type type; 338 321 339 - type = ovl_path_real(dentry, 
&realpath); 340 - if (ovl_open_need_copy_up(file_flags, type, realpath.dentry)) { 322 + if (ovl_open_need_copy_up(dentry, file_flags)) { 341 323 err = ovl_want_write(dentry); 342 324 if (!err) { 343 325 err = ovl_copy_up_flags(dentry, file_flags); ··· 452 440 } 453 441 } 454 442 443 + /* 444 + * With inodes index enabled, an overlay inode nlink counts the union of upper 445 + * hardlinks and non-covered lower hardlinks. During the lifetime of a non-pure 446 + * upper inode, the following nlink modifying operations can happen: 447 + * 448 + * 1. Lower hardlink copy up 449 + * 2. Upper hardlink created, unlinked or renamed over 450 + * 3. Lower hardlink whiteout or renamed over 451 + * 452 + * For the first, copy up case, the union nlink does not change, whether the 453 + * operation succeeds or fails, but the upper inode nlink may change. 454 + * Therefore, before copy up, we store the union nlink value relative to the 455 + * lower inode nlink in the index inode xattr trusted.overlay.nlink. 456 + * 457 + * For the second, upper hardlink case, the union nlink should be incremented 458 + * or decremented IFF the operation succeeds, aligned with nlink change of the 459 + * upper inode. Therefore, before link/unlink/rename, we store the union nlink 460 + * value relative to the upper inode nlink in the index inode. 461 + * 462 + * For the last, lower cover up case, we simplify things by preceding the 463 + * whiteout or cover up with copy up. This makes sure that there is an index 464 + * upper inode where the nlink xattr can be stored before the copied up upper 465 + * entry is unlink. 
466 + */ 467 + #define OVL_NLINK_ADD_UPPER (1 << 0) 468 + 469 + /* 470 + * On-disk format for indexed nlink: 471 + * 472 + * nlink relative to the upper inode - "U[+-]NUM" 473 + * nlink relative to the lower inode - "L[+-]NUM" 474 + */ 475 + 476 + static int ovl_set_nlink_common(struct dentry *dentry, 477 + struct dentry *realdentry, const char *format) 478 + { 479 + struct inode *inode = d_inode(dentry); 480 + struct inode *realinode = d_inode(realdentry); 481 + char buf[13]; 482 + int len; 483 + 484 + len = snprintf(buf, sizeof(buf), format, 485 + (int) (inode->i_nlink - realinode->i_nlink)); 486 + 487 + return ovl_do_setxattr(ovl_dentry_upper(dentry), 488 + OVL_XATTR_NLINK, buf, len, 0); 489 + } 490 + 491 + int ovl_set_nlink_upper(struct dentry *dentry) 492 + { 493 + return ovl_set_nlink_common(dentry, ovl_dentry_upper(dentry), "U%+i"); 494 + } 495 + 496 + int ovl_set_nlink_lower(struct dentry *dentry) 497 + { 498 + return ovl_set_nlink_common(dentry, ovl_dentry_lower(dentry), "L%+i"); 499 + } 500 + 501 + unsigned int ovl_get_nlink(struct dentry *lowerdentry, 502 + struct dentry *upperdentry, 503 + unsigned int fallback) 504 + { 505 + int nlink_diff; 506 + int nlink; 507 + char buf[13]; 508 + int err; 509 + 510 + if (!lowerdentry || !upperdentry || d_inode(lowerdentry)->i_nlink == 1) 511 + return fallback; 512 + 513 + err = vfs_getxattr(upperdentry, OVL_XATTR_NLINK, &buf, sizeof(buf) - 1); 514 + if (err < 0) 515 + goto fail; 516 + 517 + buf[err] = '\0'; 518 + if ((buf[0] != 'L' && buf[0] != 'U') || 519 + (buf[1] != '+' && buf[1] != '-')) 520 + goto fail; 521 + 522 + err = kstrtoint(buf + 1, 10, &nlink_diff); 523 + if (err < 0) 524 + goto fail; 525 + 526 + nlink = d_inode(buf[0] == 'L' ? 
lowerdentry : upperdentry)->i_nlink; 527 + nlink += nlink_diff; 528 + 529 + if (nlink <= 0) 530 + goto fail; 531 + 532 + return nlink; 533 + 534 + fail: 535 + pr_warn_ratelimited("overlayfs: failed to get index nlink (%pd2, err=%i)\n", 536 + upperdentry, err); 537 + return fallback; 538 + } 539 + 455 540 struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev) 456 541 { 457 542 struct inode *inode; ··· 562 453 563 454 static int ovl_inode_test(struct inode *inode, void *data) 564 455 { 565 - return ovl_inode_real(inode, NULL) == data; 456 + return inode->i_private == data; 566 457 } 567 458 568 459 static int ovl_inode_set(struct inode *inode, void *data) 569 460 { 570 - inode->i_private = (void *) (((unsigned long) data) | OVL_ISUPPER_MASK); 461 + inode->i_private = data; 571 462 return 0; 572 463 } 573 464 574 - struct inode *ovl_get_inode(struct super_block *sb, struct inode *realinode) 575 - 465 + static bool ovl_verify_inode(struct inode *inode, struct dentry *lowerdentry, 466 + struct dentry *upperdentry) 576 467 { 468 + struct inode *lowerinode = lowerdentry ? d_inode(lowerdentry) : NULL; 469 + 470 + /* Lower (origin) inode must match, even if NULL */ 471 + if (ovl_inode_lower(inode) != lowerinode) 472 + return false; 473 + 474 + /* 475 + * Allow non-NULL __upperdentry in inode even if upperdentry is NULL. 476 + * This happens when finding a lower alias for a copied up hard link. 477 + */ 478 + if (upperdentry && ovl_inode_upper(inode) != d_inode(upperdentry)) 479 + return false; 480 + 481 + return true; 482 + } 483 + 484 + struct inode *ovl_get_inode(struct dentry *dentry, struct dentry *upperdentry) 485 + { 486 + struct dentry *lowerdentry = ovl_dentry_lower(dentry); 487 + struct inode *realinode = upperdentry ? 
d_inode(upperdentry) : NULL; 577 488 struct inode *inode; 578 489 579 - inode = iget5_locked(sb, (unsigned long) realinode, 580 - ovl_inode_test, ovl_inode_set, realinode); 581 - if (inode && inode->i_state & I_NEW) { 582 - ovl_fill_inode(inode, realinode->i_mode, realinode->i_rdev); 583 - set_nlink(inode, realinode->i_nlink); 584 - unlock_new_inode(inode); 585 - } 490 + if (!realinode) 491 + realinode = d_inode(lowerdentry); 586 492 493 + if (!S_ISDIR(realinode->i_mode) && 494 + (upperdentry || (lowerdentry && ovl_indexdir(dentry->d_sb)))) { 495 + struct inode *key = d_inode(lowerdentry ?: upperdentry); 496 + unsigned int nlink; 497 + 498 + inode = iget5_locked(dentry->d_sb, (unsigned long) key, 499 + ovl_inode_test, ovl_inode_set, key); 500 + if (!inode) 501 + goto out_nomem; 502 + if (!(inode->i_state & I_NEW)) { 503 + /* 504 + * Verify that the underlying files stored in the inode 505 + * match those in the dentry. 506 + */ 507 + if (!ovl_verify_inode(inode, lowerdentry, upperdentry)) { 508 + iput(inode); 509 + inode = ERR_PTR(-ESTALE); 510 + goto out; 511 + } 512 + 513 + dput(upperdentry); 514 + goto out; 515 + } 516 + 517 + nlink = ovl_get_nlink(lowerdentry, upperdentry, 518 + realinode->i_nlink); 519 + set_nlink(inode, nlink); 520 + } else { 521 + inode = new_inode(dentry->d_sb); 522 + if (!inode) 523 + goto out_nomem; 524 + } 525 + ovl_fill_inode(inode, realinode->i_mode, realinode->i_rdev); 526 + ovl_inode_init(inode, upperdentry, lowerdentry); 527 + 528 + if (upperdentry && ovl_is_impuredir(upperdentry)) 529 + ovl_set_flag(OVL_IMPURE, inode); 530 + 531 + if (inode->i_state & I_NEW) 532 + unlock_new_inode(inode); 533 + out: 587 534 return inode; 535 + 536 + out_nomem: 537 + inode = ERR_PTR(-ENOMEM); 538 + goto out; 588 539 }
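The relative nlink encoding described in the inode.c comment ("U[+-]NUM" and "L[+-]NUM") can be tried outside the kernel. What follows is a minimal userspace sketch, not the kernel code: `nlink_encode()` and `nlink_decode()` are hypothetical names, and `strtoll()` stands in for `kstrtoint()`. As in `ovl_get_nlink()`, a malformed value or a non-positive result yields the caller's fallback.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Userspace sketch (hypothetical helper, not kernel code): store the union
 * nlink relative to a real inode's nlink, as "U%+i" or "L%+i".
 * Returns the string length, or -1 if the value does not fit in buf.
 */
static int nlink_encode(char *buf, size_t size, char base,
			unsigned int union_nlink, unsigned int real_nlink)
{
	int len;

	assert(base == 'U' || base == 'L');
	len = snprintf(buf, size, "%c%+i", base,
		       (int)(union_nlink - real_nlink));

	return (len < 0 || (size_t)len >= size) ? -1 : len;
}

/*
 * Decode "U[+-]NUM" or "L[+-]NUM" into an absolute nlink. 'L' values are
 * relative to lower_nlink, 'U' values to upper_nlink. Malformed input or a
 * non-positive result returns fallback.
 */
static unsigned int nlink_decode(const char *buf, unsigned int lower_nlink,
				 unsigned int upper_nlink,
				 unsigned int fallback)
{
	char *end;
	long long diff, nlink;

	if ((buf[0] != 'L' && buf[0] != 'U') ||
	    (buf[1] != '+' && buf[1] != '-') ||
	    buf[2] < '0' || buf[2] > '9')
		return fallback;

	errno = 0;
	diff = strtoll(buf + 2, &end, 10);
	if (errno || *end != '\0' || diff > INT_MAX)
		return fallback;
	if (buf[1] == '-')
		diff = -diff;

	nlink = (long long)(buf[0] == 'L' ? lower_nlink : upper_nlink) + diff;
	if (nlink <= 0 || nlink > UINT_MAX)
		return fallback;

	return (unsigned int)nlink;
}
```

The 13-byte buffer used in the diff is enough for one prefix letter, a sign, up to ten digits and the terminating NUL. Storing a difference rather than an absolute count means the stored value stays valid when the real inode's nlink changes together with the union nlink.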
+304 -66
fs/overlayfs/namei.c
··· 88 88 return 1; 89 89 } 90 90 91 - static struct dentry *ovl_get_origin(struct dentry *dentry, 92 - struct vfsmount *mnt) 91 + static struct ovl_fh *ovl_get_origin_fh(struct dentry *dentry) 93 92 { 94 93 int res; 95 94 struct ovl_fh *fh = NULL; 96 - struct dentry *origin = NULL; 97 - int bytes; 98 95 99 96 res = vfs_getxattr(dentry, OVL_XATTR_ORIGIN, NULL, 0); 100 97 if (res < 0) { ··· 103 106 if (res == 0) 104 107 return NULL; 105 108 106 - fh = kzalloc(res, GFP_TEMPORARY); 109 + fh = kzalloc(res, GFP_TEMPORARY); 107 110 if (!fh) 108 111 return ERR_PTR(-ENOMEM); 109 112 ··· 126 129 (fh->flags & OVL_FH_FLAG_BIG_ENDIAN) != OVL_FH_FLAG_CPU_ENDIAN) 127 130 goto out; 128 131 129 - bytes = (fh->len - offsetof(struct ovl_fh, fid)); 132 + return fh; 133 + 134 + out: 135 + kfree(fh); 136 + return NULL; 137 + 138 + fail: 139 + pr_warn_ratelimited("overlayfs: failed to get origin (%i)\n", res); 140 + goto out; 141 + invalid: 142 + pr_warn_ratelimited("overlayfs: invalid origin (%*phN)\n", res, fh); 143 + goto out; 144 + } 145 + 146 + static struct dentry *ovl_get_origin(struct dentry *dentry, 147 + struct vfsmount *mnt) 148 + { 149 + struct dentry *origin = NULL; 150 + struct ovl_fh *fh = ovl_get_origin_fh(dentry); 151 + int bytes; 152 + 153 + if (IS_ERR_OR_NULL(fh)) 154 + return (struct dentry *)fh; 130 155 131 156 /* 132 157 * Make sure that the stored uuid matches the uuid of the lower ··· 157 138 if (!uuid_equal(&fh->uuid, &mnt->mnt_sb->s_uuid)) 158 139 goto out; 159 140 141 + bytes = (fh->len - offsetof(struct ovl_fh, fid)); 160 142 origin = exportfs_decode_fh(mnt, (struct fid *)fh->fid, 161 143 bytes >> 2, (int)fh->type, 162 144 ovl_acceptable, NULL); ··· 169 149 } 170 150 171 151 if (ovl_dentry_weird(origin) || 172 - ((d_inode(origin)->i_mode ^ d_inode(dentry)->i_mode) & S_IFMT)) { 173 - dput(origin); 174 - origin = NULL; 152 + ((d_inode(origin)->i_mode ^ d_inode(dentry)->i_mode) & S_IFMT)) 175 153 goto invalid; 176 - } 177 154 178 155 out: 179 156 kfree(fh); 180 
157 return origin; 181 158 182 - fail: 183 - pr_warn_ratelimited("overlayfs: failed to get origin (%i)\n", res); 184 - goto out; 185 159 invalid: 186 - pr_warn_ratelimited("overlayfs: invalid origin (%*phN)\n", res, fh); 160 + pr_warn_ratelimited("overlayfs: invalid origin (%pd2)\n", origin); 161 + dput(origin); 162 + origin = NULL; 187 163 goto out; 188 164 } 189 165 ··· 285 269 } 286 270 287 271 288 - static int ovl_check_origin(struct dentry *dentry, struct dentry *upperdentry, 272 + static int ovl_check_origin(struct dentry *upperdentry, 273 + struct path *lowerstack, unsigned int numlower, 289 274 struct path **stackp, unsigned int *ctrp) 290 275 { 291 - struct super_block *same_sb = ovl_same_sb(dentry->d_sb); 292 - struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata; 293 276 struct vfsmount *mnt; 294 - struct dentry *origin; 277 + struct dentry *origin = NULL; 278 + int i; 295 279 296 - if (!same_sb || !roe->numlower) 280 + 281 + for (i = 0; i < numlower; i++) { 282 + mnt = lowerstack[i].mnt; 283 + origin = ovl_get_origin(upperdentry, mnt); 284 + if (IS_ERR(origin)) 285 + return PTR_ERR(origin); 286 + 287 + if (origin) 288 + break; 289 + } 290 + 291 + if (!origin) 297 292 return 0; 298 293 299 - /* 300 - * Since all layers are on the same fs, we use the first layer for 301 - * decoding the file handle. We may get a disconnected dentry, 302 - * which is fine, because we only need to hold the origin inode in 303 - * cache and use its inode number. We may even get a connected dentry, 304 - * that is not under the first layer's root. That is also fine for 305 - * using it's inode number - it's the same as if we held a reference 306 - * to a dentry in first layer that was moved under us. 
307 - */ 308 - mnt = roe->lowerstack[0].mnt; 309 - 310 - origin = ovl_get_origin(upperdentry, mnt); 311 - if (IS_ERR_OR_NULL(origin)) 312 - return PTR_ERR(origin); 313 - 314 - BUG_ON(*stackp || *ctrp); 315 - *stackp = kmalloc(sizeof(struct path), GFP_TEMPORARY); 294 + BUG_ON(*ctrp); 295 + if (!*stackp) 296 + *stackp = kmalloc(sizeof(struct path), GFP_TEMPORARY); 316 297 if (!*stackp) { 317 298 dput(origin); 318 299 return -ENOMEM; ··· 318 305 *ctrp = 1; 319 306 320 307 return 0; 308 + } 309 + 310 + /* 311 + * Verify that @fh matches the origin file handle stored in OVL_XATTR_ORIGIN. 312 + * Return 0 on match, -ESTALE on mismatch, < 0 on error. 313 + */ 314 + static int ovl_verify_origin_fh(struct dentry *dentry, const struct ovl_fh *fh) 315 + { 316 + struct ovl_fh *ofh = ovl_get_origin_fh(dentry); 317 + int err = 0; 318 + 319 + if (!ofh) 320 + return -ENODATA; 321 + 322 + if (IS_ERR(ofh)) 323 + return PTR_ERR(ofh); 324 + 325 + if (fh->len != ofh->len || memcmp(fh, ofh, fh->len)) 326 + err = -ESTALE; 327 + 328 + kfree(ofh); 329 + return err; 330 + } 331 + 332 + /* 333 + * Verify that an inode matches the origin file handle stored in upper inode. 334 + * 335 + * If @set is true and there is no stored file handle, encode and store origin 336 + * file handle in OVL_XATTR_ORIGIN. 337 + * 338 + * Return 0 on match, -ESTALE on mismatch, < 0 on error. 
339 + */ 340 + int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt, 341 + struct dentry *origin, bool is_upper, bool set) 342 + { 343 + struct inode *inode; 344 + struct ovl_fh *fh; 345 + int err; 346 + 347 + fh = ovl_encode_fh(origin, is_upper); 348 + err = PTR_ERR(fh); 349 + if (IS_ERR(fh)) 350 + goto fail; 351 + 352 + err = ovl_verify_origin_fh(dentry, fh); 353 + if (set && err == -ENODATA) 354 + err = ovl_do_setxattr(dentry, OVL_XATTR_ORIGIN, fh, fh->len, 0); 355 + if (err) 356 + goto fail; 357 + 358 + out: 359 + kfree(fh); 360 + return err; 361 + 362 + fail: 363 + inode = d_inode(origin); 364 + pr_warn_ratelimited("overlayfs: failed to verify origin (%pd2, ino=%lu, err=%i)\n", 365 + origin, inode ? inode->i_ino : 0, err); 366 + goto out; 367 + } 368 + 369 + /* 370 + * Verify that an index entry name matches the origin file handle stored in 371 + * OVL_XATTR_ORIGIN and that origin file handle can be decoded to lower path. 372 + * Return 0 on match, -ESTALE on mismatch or stale origin, < 0 on error. 
373 + */ 374 + int ovl_verify_index(struct dentry *index, struct path *lowerstack, 375 + unsigned int numlower) 376 + { 377 + struct ovl_fh *fh = NULL; 378 + size_t len; 379 + struct path origin = { }; 380 + struct path *stack = &origin; 381 + unsigned int ctr = 0; 382 + int err; 383 + 384 + if (!d_inode(index)) 385 + return 0; 386 + 387 + err = -EISDIR; 388 + if (d_is_dir(index)) 389 + goto fail; 390 + 391 + err = -EINVAL; 392 + if (index->d_name.len < sizeof(struct ovl_fh)*2) 393 + goto fail; 394 + 395 + err = -ENOMEM; 396 + len = index->d_name.len / 2; 397 + fh = kzalloc(len, GFP_TEMPORARY); 398 + if (!fh) 399 + goto fail; 400 + 401 + err = -EINVAL; 402 + if (hex2bin((u8 *)fh, index->d_name.name, len) || len != fh->len) 403 + goto fail; 404 + 405 + err = ovl_verify_origin_fh(index, fh); 406 + if (err) 407 + goto fail; 408 + 409 + err = ovl_check_origin(index, lowerstack, numlower, &stack, &ctr); 410 + if (!err && !ctr) 411 + err = -ESTALE; 412 + if (err) 413 + goto fail; 414 + 415 + /* Check if index is orphan and don't warn before cleaning it */ 416 + if (d_inode(index)->i_nlink == 1 && 417 + ovl_get_nlink(index, origin.dentry, 0) == 0) 418 + err = -ENOENT; 419 + 420 + dput(origin.dentry); 421 + out: 422 + kfree(fh); 423 + return err; 424 + 425 + fail: 426 + pr_warn_ratelimited("overlayfs: failed to verify index (%pd2, err=%i)\n", 427 + index, err); 428 + goto out; 429 + } 430 + 431 + /* 432 + * Lookup in indexdir for the index entry of a lower real inode or a copy up 433 + * origin inode. The index entry name is the hex representation of the lower 434 + * inode file handle. 435 + * 436 + * If the index dentry in negative, then either no lower aliases have been 437 + * copied up yet, or aliases have been copied up in older kernels and are 438 + * not indexed. 
439 + * 440 + * If the index dentry for a copy up origin inode is positive, but points 441 + * to an inode different than the upper inode, then either the upper inode 442 + * has been copied up and not indexed or it was indexed, but since then 443 + * index dir was cleared. Either way, that index cannot be used to indentify 444 + * the overlay inode. 445 + */ 446 + int ovl_get_index_name(struct dentry *origin, struct qstr *name) 447 + { 448 + int err; 449 + struct ovl_fh *fh; 450 + char *n, *s; 451 + 452 + fh = ovl_encode_fh(origin, false); 453 + if (IS_ERR(fh)) 454 + return PTR_ERR(fh); 455 + 456 + err = -ENOMEM; 457 + n = kzalloc(fh->len * 2, GFP_TEMPORARY); 458 + if (n) { 459 + s = bin2hex(n, fh, fh->len); 460 + *name = (struct qstr) QSTR_INIT(n, s - n); 461 + err = 0; 462 + } 463 + kfree(fh); 464 + 465 + return err; 466 + 467 + } 468 + 469 + static struct dentry *ovl_lookup_index(struct dentry *dentry, 470 + struct dentry *upper, 471 + struct dentry *origin) 472 + { 473 + struct ovl_fs *ofs = dentry->d_sb->s_fs_info; 474 + struct dentry *index; 475 + struct inode *inode; 476 + struct qstr name; 477 + int err; 478 + 479 + err = ovl_get_index_name(origin, &name); 480 + if (err) 481 + return ERR_PTR(err); 482 + 483 + index = lookup_one_len_unlocked(name.name, ofs->indexdir, name.len); 484 + if (IS_ERR(index)) { 485 + pr_warn_ratelimited("overlayfs: failed inode index lookup (ino=%lu, key=%*s, err=%i);\n" 486 + "overlayfs: mount with '-o index=off' to disable inodes index.\n", 487 + d_inode(origin)->i_ino, name.len, name.name, 488 + err); 489 + goto out; 490 + } 491 + 492 + if (d_is_negative(index)) { 493 + if (upper && d_inode(origin)->i_nlink > 1) { 494 + pr_warn_ratelimited("overlayfs: hard link with origin but no index (ino=%lu).\n", 495 + d_inode(origin)->i_ino); 496 + goto fail; 497 + } 498 + 499 + dput(index); 500 + index = NULL; 501 + } else if (upper && d_inode(index) != d_inode(upper)) { 502 + inode = d_inode(index); 503 + pr_warn_ratelimited("overlayfs: 
wrong index found (index ino: %lu, upper ino: %lu).\n", 504 + d_inode(index)->i_ino, 505 + d_inode(upper)->i_ino); 506 + goto fail; 507 + } 508 + 509 + out: 510 + kfree(name.name); 511 + return index; 512 + 513 + fail: 514 + dput(index); 515 + index = ERR_PTR(-EIO); 516 + goto out; 321 517 } 322 518 323 519 /* ··· 560 338 struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata; 561 339 struct path *stack = NULL; 562 340 struct dentry *upperdir, *upperdentry = NULL; 341 + struct dentry *index = NULL; 563 342 unsigned int ctr = 0; 564 343 struct inode *inode = NULL; 565 344 bool upperopaque = false; 566 - bool upperimpure = false; 567 345 char *upperredirect = NULL; 568 346 struct dentry *this; 569 347 unsigned int i; ··· 581 359 return ERR_PTR(-ENAMETOOLONG); 582 360 583 361 old_cred = ovl_override_creds(dentry->d_sb); 584 - upperdir = ovl_upperdentry_dereference(poe); 362 + upperdir = ovl_dentry_upper(dentry->d_parent); 585 363 if (upperdir) { 586 364 err = ovl_lookup_layer(upperdir, &d, &upperdentry); 587 365 if (err) ··· 594 372 } 595 373 if (upperdentry && !d.is_dir) { 596 374 BUG_ON(!d.stop || d.redirect); 597 - err = ovl_check_origin(dentry, upperdentry, 598 - &stack, &ctr); 375 + /* 376 + * Lookup copy up origin by decoding origin file handle. 377 + * We may get a disconnected dentry, which is fine, 378 + * because we only need to hold the origin inode in 379 + * cache and use its inode number. We may even get a 380 + * connected dentry, that is not under any of the lower 381 + * layers root. That is also fine for using it's inode 382 + * number - it's the same as if we held a reference 383 + * to a dentry in lower layer that was moved under us. 
384 + */ 385 + err = ovl_check_origin(upperdentry, roe->lowerstack, 386 + roe->numlower, &stack, &ctr); 599 387 if (err) 600 388 goto out; 601 389 } ··· 618 386 poe = roe; 619 387 } 620 388 upperopaque = d.opaque; 621 - if (upperdentry && d.is_dir) 622 - upperimpure = ovl_is_impuredir(upperdentry); 623 389 } 624 390 625 391 if (!d.stop && poe->numlower) { ··· 658 428 } 659 429 } 660 430 431 + /* Lookup index by lower inode and verify it matches upper inode */ 432 + if (ctr && !d.is_dir && ovl_indexdir(dentry->d_sb)) { 433 + struct dentry *origin = stack[0].dentry; 434 + 435 + index = ovl_lookup_index(dentry, upperdentry, origin); 436 + if (IS_ERR(index)) { 437 + err = PTR_ERR(index); 438 + index = NULL; 439 + goto out_put; 440 + } 441 + } 442 + 661 443 oe = ovl_alloc_entry(ctr); 662 444 err = -ENOMEM; 663 445 if (!oe) 664 446 goto out_put; 665 447 448 + oe->opaque = upperopaque; 449 + memcpy(oe->lowerstack, stack, sizeof(struct path) * ctr); 450 + dentry->d_fsdata = oe; 451 + 452 + if (upperdentry) 453 + ovl_dentry_set_upper_alias(dentry); 454 + else if (index) 455 + upperdentry = dget(index); 456 + 666 457 if (upperdentry || ctr) { 667 - struct dentry *realdentry; 668 - struct inode *realinode; 669 - 670 - realdentry = upperdentry ? 
upperdentry : stack[0].dentry; 671 - realinode = d_inode(realdentry); 672 - 673 - err = -ENOMEM; 674 - if (upperdentry && !d_is_dir(upperdentry)) { 675 - inode = ovl_get_inode(dentry->d_sb, realinode); 676 - } else { 677 - inode = ovl_new_inode(dentry->d_sb, realinode->i_mode, 678 - realinode->i_rdev); 679 - if (inode) 680 - ovl_inode_init(inode, realinode, !!upperdentry); 681 - } 682 - if (!inode) 458 + inode = ovl_get_inode(dentry, upperdentry); 459 + err = PTR_ERR(inode); 460 + if (IS_ERR(inode)) 683 461 goto out_free_oe; 684 - ovl_copyattr(realdentry->d_inode, inode); 462 + 463 + OVL_I(inode)->redirect = upperredirect; 464 + if (index) 465 + ovl_set_flag(OVL_INDEX, inode); 685 466 } 686 467 687 468 revert_creds(old_cred); 688 - oe->opaque = upperopaque; 689 - oe->impure = upperimpure; 690 - oe->redirect = upperredirect; 691 - oe->__upperdentry = upperdentry; 692 - memcpy(oe->lowerstack, stack, sizeof(struct path) * ctr); 469 + dput(index); 693 470 kfree(stack); 694 471 kfree(d.redirect); 695 - dentry->d_fsdata = oe; 696 472 d_add(dentry, inode); 697 473 698 474 return NULL; 699 475 700 476 out_free_oe: 477 + dentry->d_fsdata = NULL; 701 478 kfree(oe); 702 479 out_put: 480 + dput(index); 703 481 for (i = 0; i < ctr; i++) 704 482 dput(stack[i].dentry); 705 483 kfree(stack); ··· 737 499 return oe->opaque; 738 500 739 501 /* Negative upper -> positive lower */ 740 - if (!oe->__upperdentry) 502 + if (!ovl_dentry_upper(dentry)) 741 503 return true; 742 504 743 505 /* Positive upper -> have to look up lower to see whether it exists */
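The namei.c hunks name each index entry after the hex representation of the origin file handle (`ovl_get_index_name()`), and `ovl_verify_index()` converts the name back before comparing it with the stored handle. The round trip can be sketched in userspace as below. `sketch_bin2hex()` and `sketch_hex2bin()` are hypothetical stand-ins for the kernel's `bin2hex()` and `hex2bin()`, written here only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-in for bin2hex(): writes 2 * len lowercase hex characters (no NUL)
 * and returns a pointer just past the last character written.
 */
static char *sketch_bin2hex(char *dst, const unsigned char *src, size_t len)
{
	static const char digits[] = "0123456789abcdef";

	assert(dst && src);
	while (len--) {
		*dst++ = digits[*src >> 4];
		*dst++ = digits[*src++ & 0x0f];
	}
	return dst;
}

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int sketch_hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Stand-in for hex2bin(): converts 2 * count hex characters into count
 * bytes. Returns 0 on success, -1 on the first invalid digit.
 */
static int sketch_hex2bin(unsigned char *dst, const char *src, size_t count)
{
	while (count--) {
		int hi = sketch_hex_val(*src++);
		int lo;

		if (hi < 0)
			return -1;
		lo = sketch_hex_val(*src++);
		if (lo < 0)
			return -1;
		*dst++ = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}
```

Because every handle byte becomes two name characters, a name is only a candidate index entry if decoding half its length yields a handle whose own length field equals that half, which is the `len != fh->len` test in `ovl_verify_index()`.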
+39 -19
fs/overlayfs/overlayfs.h
··· 25 25 #define OVL_XATTR_REDIRECT OVL_XATTR_PREFIX "redirect" 26 26 #define OVL_XATTR_ORIGIN OVL_XATTR_PREFIX "origin" 27 27 #define OVL_XATTR_IMPURE OVL_XATTR_PREFIX "impure" 28 + #define OVL_XATTR_NLINK OVL_XATTR_PREFIX "nlink" 29 + 30 + enum ovl_flag { 31 + OVL_IMPURE, 32 + OVL_INDEX, 33 + }; 28 34 29 35 /* 30 36 * The tuple (fh,uuid) is a universal unique identifier for a copy up origin, ··· 44 38 /* CPU byte order required for fid decoding: */ 45 39 #define OVL_FH_FLAG_BIG_ENDIAN (1 << 0) 46 40 #define OVL_FH_FLAG_ANY_ENDIAN (1 << 1) 41 + /* Is the real inode encoded in fid an upper inode? */ 42 + #define OVL_FH_FLAG_PATH_UPPER (1 << 2) 47 43 48 44 #define OVL_FH_FLAG_ALL (OVL_FH_FLAG_BIG_ENDIAN | OVL_FH_FLAG_ANY_ENDIAN) 49 45 ··· 67 59 uuid_t uuid; /* uuid of filesystem */ 68 60 u8 fid[0]; /* file identifier */ 69 61 } __packed; 70 - 71 - #define OVL_ISUPPER_MASK 1UL 72 62 73 63 static inline int ovl_do_rmdir(struct inode *dir, struct dentry *dentry) 74 64 { ··· 181 175 return ret; 182 176 } 183 177 184 - static inline struct inode *ovl_inode_real(struct inode *inode, bool *is_upper) 185 - { 186 - unsigned long x = (unsigned long) READ_ONCE(inode->i_private); 187 - 188 - if (is_upper) 189 - *is_upper = x & OVL_ISUPPER_MASK; 190 - 191 - return (struct inode *) (x & ~OVL_ISUPPER_MASK); 192 - } 193 - 194 178 /* util.c */ 195 179 int ovl_want_write(struct dentry *dentry); 196 180 void ovl_drop_write(struct dentry *dentry); 197 181 struct dentry *ovl_workdir(struct dentry *dentry); 198 182 const struct cred *ovl_override_creds(struct super_block *sb); 199 183 struct super_block *ovl_same_sb(struct super_block *sb); 184 + bool ovl_can_decode_fh(struct super_block *sb); 185 + struct dentry *ovl_indexdir(struct super_block *sb); 200 186 struct ovl_entry *ovl_alloc_entry(unsigned int numlower); 201 187 bool ovl_dentry_remote(struct dentry *dentry); 202 188 bool ovl_dentry_weird(struct dentry *dentry); ··· 199 201 struct dentry *ovl_dentry_upper(struct dentry 
*dentry); 200 202 struct dentry *ovl_dentry_lower(struct dentry *dentry); 201 203 struct dentry *ovl_dentry_real(struct dentry *dentry); 204 + struct inode *ovl_inode_upper(struct inode *inode); 205 + struct inode *ovl_inode_lower(struct inode *inode); 206 + struct inode *ovl_inode_real(struct inode *inode); 202 207 struct ovl_dir_cache *ovl_dir_cache(struct dentry *dentry); 203 208 void ovl_set_dir_cache(struct dentry *dentry, struct ovl_dir_cache *cache); 204 209 bool ovl_dentry_is_opaque(struct dentry *dentry); 205 - bool ovl_dentry_is_impure(struct dentry *dentry); 206 210 bool ovl_dentry_is_whiteout(struct dentry *dentry); 207 211 void ovl_dentry_set_opaque(struct dentry *dentry); 212 + bool ovl_dentry_has_upper_alias(struct dentry *dentry); 213 + void ovl_dentry_set_upper_alias(struct dentry *dentry); 208 214 bool ovl_redirect_dir(struct super_block *sb); 209 215 const char *ovl_dentry_get_redirect(struct dentry *dentry); 210 216 void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect); 211 - void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry); 212 - void ovl_inode_init(struct inode *inode, struct inode *realinode, 213 - bool is_upper); 214 - void ovl_inode_update(struct inode *inode, struct inode *upperinode); 217 + void ovl_inode_init(struct inode *inode, struct dentry *upperdentry, 218 + struct dentry *lowerdentry); 219 + void ovl_inode_update(struct inode *inode, struct dentry *upperdentry); 215 220 void ovl_dentry_version_inc(struct dentry *dentry); 216 221 u64 ovl_dentry_version_get(struct dentry *dentry); 217 222 bool ovl_is_whiteout(struct dentry *dentry); ··· 226 225 const char *name, const void *value, size_t size, 227 226 int xerr); 228 227 int ovl_set_impure(struct dentry *dentry, struct dentry *upperdentry); 228 + void ovl_set_flag(unsigned long flag, struct inode *inode); 229 + bool ovl_test_flag(unsigned long flag, struct inode *inode); 230 + bool ovl_inuse_trylock(struct dentry *dentry); 231 + void 
ovl_inuse_unlock(struct dentry *dentry); 232 + int ovl_nlink_start(struct dentry *dentry, bool *locked); 233 + void ovl_nlink_end(struct dentry *dentry, bool locked); 229 234 230 235 static inline bool ovl_is_impuredir(struct dentry *dentry) 231 236 { ··· 240 233 241 234 242 235 /* namei.c */ 236 + int ovl_verify_origin(struct dentry *dentry, struct vfsmount *mnt, 237 + struct dentry *origin, bool is_upper, bool set); 238 + int ovl_verify_index(struct dentry *index, struct path *lowerstack, 239 + unsigned int numlower); 240 + int ovl_get_index_name(struct dentry *origin, struct qstr *name); 243 241 int ovl_path_next(int idx, struct dentry *dentry, struct path *path); 244 242 struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags); 245 243 bool ovl_lower_positive(struct dentry *dentry); ··· 257 245 int ovl_check_d_type_supported(struct path *realpath); 258 246 void ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt, 259 247 struct dentry *dentry, int level); 248 + int ovl_indexdir_cleanup(struct dentry *dentry, struct vfsmount *mnt, 249 + struct path *lowerstack, unsigned int numlower); 260 250 261 251 /* inode.c */ 252 + int ovl_set_nlink_upper(struct dentry *dentry); 253 + int ovl_set_nlink_lower(struct dentry *dentry); 254 + unsigned int ovl_get_nlink(struct dentry *lowerdentry, 255 + struct dentry *upperdentry, 256 + unsigned int fallback); 262 257 int ovl_setattr(struct dentry *dentry, struct iattr *attr); 263 258 int ovl_getattr(const struct path *path, struct kstat *stat, 264 259 u32 request_mask, unsigned int flags); ··· 281 262 bool ovl_is_private_xattr(const char *name); 282 263 283 264 struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev); 284 - struct inode *ovl_get_inode(struct super_block *sb, struct inode *realinode); 265 + struct inode *ovl_get_inode(struct dentry *dentry, struct dentry *upperdentry); 285 266 static inline void ovl_copyattr(struct inode *from, struct inode *to) 286 267 
{ 287 268 to->i_uid = from->i_uid; ··· 303 284 int ovl_create_real(struct inode *dir, struct dentry *newdentry, 304 285 struct cattr *attr, 305 286 struct dentry *hardlink, bool debug); 306 - void ovl_cleanup(struct inode *dir, struct dentry *dentry); 287 + int ovl_cleanup(struct inode *dir, struct dentry *dentry); 307 288 308 289 /* copy_up.c */ 309 290 int ovl_copy_up(struct dentry *dentry); 310 291 int ovl_copy_up_flags(struct dentry *dentry, int flags); 311 292 int ovl_copy_xattr(struct dentry *old, struct dentry *new); 312 293 int ovl_set_attr(struct dentry *upper, struct kstat *stat); 294 + struct ovl_fh *ovl_encode_fh(struct dentry *lower, bool is_upper);
+27 -9
fs/overlayfs/ovl_entry.h
···
 	char *workdir;
 	bool default_permissions;
 	bool redirect_dir;
+	bool index;
 };
 
 /* private information held for overlayfs's superblock */
···
 	struct vfsmount *upper_mnt;
 	unsigned numlower;
 	struct vfsmount **lower_mnt;
+	/* workbasedir is the path at workdir= mount option */
+	struct dentry *workbasedir;
+	/* workdir is the 'work' directory under workbasedir */
 	struct dentry *workdir;
+	/* index directory listing overlay inodes by origin file handle */
+	struct dentry *indexdir;
 	long namelen;
 	/* pathnames of lower and upper dirs, for show_options */
 	struct ovl_config config;
···
 	const struct cred *creator_cred;
 	bool tmpfile;
 	bool noxattr;
-	wait_queue_head_t copyup_wq;
 	/* sb common to all layers */
 	struct super_block *same_sb;
 };
 
 /* private information held for every overlayfs dentry */
 struct ovl_entry {
-	struct dentry *__upperdentry;
-	struct ovl_dir_cache *cache;
 	union {
 		struct {
-			u64 version;
-			const char *redirect;
+			unsigned long has_upper;
 			bool opaque;
-			bool impure;
-			bool copying;
 		};
 		struct rcu_head rcu;
 	};
···
 
 struct ovl_entry *ovl_alloc_entry(unsigned int numlower);
 
-static inline struct dentry *ovl_upperdentry_dereference(struct ovl_entry *oe)
+struct ovl_inode {
+	struct ovl_dir_cache *cache;
+	const char *redirect;
+	u64 version;
+	unsigned long flags;
+	struct inode vfs_inode;
+	struct dentry *__upperdentry;
+	struct inode *lower;
+
+	/* synchronize copy up and more */
+	struct mutex lock;
+};
+
+static inline struct ovl_inode *OVL_I(struct inode *inode)
 {
-	return lockless_dereference(oe->__upperdentry);
+	return container_of(inode, struct ovl_inode, vfs_inode);
+}
+
+static inline struct dentry *ovl_upperdentry_dereference(struct ovl_inode *oi)
+{
+	return lockless_dereference(oi->__upperdentry);
 }
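The ovl_entry.h hunk moves per-inode state into `struct ovl_inode`, which embeds the VFS inode, and `OVL_I()` recovers the outer structure from a pointer to the embedded member. The pointer arithmetic behind that can be sketched in userspace as follows. All names carrying a `sketch` suffix are hypothetical, and the macro is a simplified version of the kernel's `container_of()` without its type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): step back from a member to its enclosing struct. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the VFS inode; only the embedding matters here. */
struct inode_sketch {
	unsigned long i_ino;
};

/* Stand-in for struct ovl_inode: private fields around an embedded inode. */
struct ovl_inode_sketch {
	unsigned long flags;
	struct inode_sketch vfs_inode;
	void *upperdentry;
};

/* Mirrors the shape of OVL_I(): map the embedded inode back to its owner. */
static inline struct ovl_inode_sketch *OVL_I_sketch(struct inode_sketch *inode)
{
	assert(inode);
	return sketch_container_of(inode, struct ovl_inode_sketch, vfs_inode);
}
```

This only works for inodes that really live inside the outer structure, which is presumably why the same merge adds `.alloc_inode` and `.destroy_inode` callbacks to the overlayfs super operations.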
+50
fs/overlayfs/readdir.c
···
 		ovl_cleanup(dir, dentry);
 	}
 }
+
+int ovl_indexdir_cleanup(struct dentry *dentry, struct vfsmount *mnt,
+			 struct path *lowerstack, unsigned int numlower)
+{
+	int err;
+	struct inode *dir = dentry->d_inode;
+	struct path path = { .mnt = mnt, .dentry = dentry };
+	LIST_HEAD(list);
+	struct ovl_cache_entry *p;
+	struct ovl_readdir_data rdd = {
+		.ctx.actor = ovl_fill_merge,
+		.dentry = NULL,
+		.list = &list,
+		.root = RB_ROOT,
+		.is_lowest = false,
+	};
+
+	err = ovl_dir_read(&path, &rdd);
+	if (err)
+		goto out;
+
+	inode_lock_nested(dir, I_MUTEX_PARENT);
+	list_for_each_entry(p, &list, l_node) {
+		struct dentry *index;
+
+		if (p->name[0] == '.') {
+			if (p->len == 1)
+				continue;
+			if (p->len == 2 && p->name[1] == '.')
+				continue;
+		}
+		index = lookup_one_len(p->name, dentry, p->len);
+		if (IS_ERR(index)) {
+			err = PTR_ERR(index);
+			break;
+		}
+		if (ovl_verify_index(index, lowerstack, numlower)) {
+			err = ovl_cleanup(dir, index);
+			if (err)
+				break;
+		}
+		dput(index);
+	}
+	inode_unlock(dir);
+out:
+	ovl_cache_free(&list);
+	if (err)
+		pr_err("overlayfs: failed index dir cleanup (%i)\n", err);
+	return err;
+}
+213 -34
fs/overlayfs/super.c
··· 34 34 MODULE_PARM_DESC(ovl_redirect_dir_def, 35 35 "Default to on or off for the redirect_dir feature"); 36 36 37 + static bool ovl_index_def = IS_ENABLED(CONFIG_OVERLAY_FS_INDEX); 38 + module_param_named(index, ovl_index_def, bool, 0644); 39 + MODULE_PARM_DESC(ovl_index_def, 40 + "Default to on or off for the inodes index feature"); 41 + 37 42 static void ovl_dentry_release(struct dentry *dentry) 38 43 { 39 44 struct ovl_entry *oe = dentry->d_fsdata; ··· 46 41 if (oe) { 47 42 unsigned int i; 48 43 49 - dput(oe->__upperdentry); 50 - kfree(oe->redirect); 51 44 for (i = 0; i < oe->numlower; i++) 52 45 dput(oe->lowerstack[i].dentry); 53 46 kfree_rcu(oe, rcu); ··· 168 165 .d_weak_revalidate = ovl_dentry_weak_revalidate, 169 166 }; 170 167 168 + static struct kmem_cache *ovl_inode_cachep; 169 + 170 + static struct inode *ovl_alloc_inode(struct super_block *sb) 171 + { 172 + struct ovl_inode *oi = kmem_cache_alloc(ovl_inode_cachep, GFP_KERNEL); 173 + 174 + oi->cache = NULL; 175 + oi->redirect = NULL; 176 + oi->version = 0; 177 + oi->flags = 0; 178 + oi->__upperdentry = NULL; 179 + oi->lower = NULL; 180 + mutex_init(&oi->lock); 181 + 182 + return &oi->vfs_inode; 183 + } 184 + 185 + static void ovl_i_callback(struct rcu_head *head) 186 + { 187 + struct inode *inode = container_of(head, struct inode, i_rcu); 188 + 189 + kmem_cache_free(ovl_inode_cachep, OVL_I(inode)); 190 + } 191 + 192 + static void ovl_destroy_inode(struct inode *inode) 193 + { 194 + struct ovl_inode *oi = OVL_I(inode); 195 + 196 + dput(oi->__upperdentry); 197 + kfree(oi->redirect); 198 + mutex_destroy(&oi->lock); 199 + 200 + call_rcu(&inode->i_rcu, ovl_i_callback); 201 + } 202 + 171 203 static void ovl_put_super(struct super_block *sb) 172 204 { 173 205 struct ovl_fs *ufs = sb->s_fs_info; 174 206 unsigned i; 175 207 208 + dput(ufs->indexdir); 176 209 dput(ufs->workdir); 210 + ovl_inuse_unlock(ufs->workbasedir); 211 + dput(ufs->workbasedir); 212 + if (ufs->upper_mnt) 213 + 
ovl_inuse_unlock(ufs->upper_mnt->mnt_root);
 	mntput(ufs->upper_mnt);
 	for (i = 0; i < ufs->numlower; i++)
 		mntput(ufs->lower_mnt[i]);
···
 	return err;
 }

+/* Will this overlay be forced to mount/remount ro? */
+static bool ovl_force_readonly(struct ovl_fs *ufs)
+{
+	return (!ufs->upper_mnt || !ufs->workdir);
+}
+
 /**
  * ovl_show_options
  *
···
 	if (ufs->config.redirect_dir != ovl_redirect_dir_def)
 		seq_printf(m, ",redirect_dir=%s",
 			   ufs->config.redirect_dir ? "on" : "off");
+	if (ufs->config.index != ovl_index_def)
+		seq_printf(m, ",index=%s",
+			   ufs->config.index ? "on" : "off");
 	return 0;
 }

···
 {
 	struct ovl_fs *ufs = sb->s_fs_info;

-	if (!(*flags & MS_RDONLY) && (!ufs->upper_mnt || !ufs->workdir))
+	if (!(*flags & MS_RDONLY) && ovl_force_readonly(ufs))
 		return -EROFS;

 	return 0;
 }

 static const struct super_operations ovl_super_operations = {
+	.alloc_inode	= ovl_alloc_inode,
+	.destroy_inode	= ovl_destroy_inode,
+	.drop_inode	= generic_delete_inode,
 	.put_super	= ovl_put_super,
 	.sync_fs	= ovl_sync_fs,
 	.statfs		= ovl_statfs,
 	.show_options	= ovl_show_options,
 	.remount_fs	= ovl_remount,
-	.drop_inode	= generic_delete_inode,
 };

 enum {
···
 	OPT_DEFAULT_PERMISSIONS,
 	OPT_REDIRECT_DIR_ON,
 	OPT_REDIRECT_DIR_OFF,
+	OPT_INDEX_ON,
+	OPT_INDEX_OFF,
 	OPT_ERR,
 };

···
 	{OPT_DEFAULT_PERMISSIONS,	"default_permissions"},
 	{OPT_REDIRECT_DIR_ON,		"redirect_dir=on"},
 	{OPT_REDIRECT_DIR_OFF,		"redirect_dir=off"},
+	{OPT_INDEX_ON,			"index=on"},
+	{OPT_INDEX_OFF,			"index=off"},
 	{OPT_ERR,			NULL}
 };

···
 			config->redirect_dir = false;
 			break;

+		case OPT_INDEX_ON:
+			config->index = true;
+			break;
+
+		case OPT_INDEX_OFF:
+			config->index = false;
+			break;
+
 		default:
 			pr_err("overlayfs: unrecognized mount option \"%s\" or missing value\n", p);
 			return -EINVAL;
···
 }

 #define OVL_WORKDIR_NAME "work"
+#define OVL_INDEXDIR_NAME "index"

-static struct dentry *ovl_workdir_create(struct vfsmount *mnt,
-					 struct dentry *dentry)
+static struct dentry *ovl_workdir_create(struct super_block *sb,
+					 struct ovl_fs *ufs,
+					 struct dentry *dentry,
+					 const char *name, bool persist)
 {
 	struct inode *dir = dentry->d_inode;
+	struct vfsmount *mnt = ufs->upper_mnt;
 	struct dentry *work;
 	int err;
 	bool retried = false;
+	bool locked = false;

 	err = mnt_want_write(mnt);
 	if (err)
-		return ERR_PTR(err);
+		goto out_err;

 	inode_lock_nested(dir, I_MUTEX_PARENT);
+	locked = true;
+
 retry:
-	work = lookup_one_len(OVL_WORKDIR_NAME, dentry,
-			      strlen(OVL_WORKDIR_NAME));
+	work = lookup_one_len(name, dentry, strlen(name));

 	if (!IS_ERR(work)) {
 		struct iattr attr = {
···
 			err = -EEXIST;
 			if (retried)
 				goto out_dput;
+
+			if (persist)
+				goto out_unlock;

 			retried = true;
 			ovl_workdir_cleanup(dir, mnt, work, 0);
···
 		inode_unlock(work->d_inode);
 		if (err)
 			goto out_dput;
+	} else {
+		err = PTR_ERR(work);
+		goto out_err;
 	}
 out_unlock:
-	inode_unlock(dir);
 	mnt_drop_write(mnt);
+	if (locked)
+		inode_unlock(dir);

 	return work;

 out_dput:
 	dput(work);
-	work = ERR_PTR(err);
+out_err:
+	pr_warn("overlayfs: failed to create directory %s/%s (errno: %i); mounting read-only\n",
+		ufs->config.workdir, name, -err);
+	sb->s_flags |= MS_RDONLY;
+	work = NULL;
 	goto out_unlock;
 }

···
 	if (ovl_dentry_remote(path->dentry))
 		*remote = true;

+	/*
+	 * The inodes index feature needs to encode and decode file
+	 * handles, so it requires that all layers support them.
+	 */
+	if (ofs->config.index && !ovl_can_decode_fh(path->dentry->d_sb)) {
+		ofs->config.index = false;
+		pr_warn("overlayfs: fs on '%s' does not support file handles, falling back to index=off.\n", name);
+	}
+
 	return 0;

 out_put:
···
 			size_t size, int flags)
 {
 	struct dentry *workdir = ovl_workdir(dentry);
-	struct inode *realinode = ovl_inode_real(inode, NULL);
+	struct inode *realinode = ovl_inode_real(inode);
 	struct posix_acl *acl = NULL;
 	int err;

···

 	err = ovl_xattr_set(dentry, handler->name, value, size, flags);
 	if (!err)
-		ovl_copyattr(ovl_inode_real(inode, NULL), inode);
+		ovl_copyattr(ovl_inode_real(inode), inode);

 	return err;

···
 	struct path upperpath = { };
 	struct path workpath = { };
 	struct dentry *root_dentry;
-	struct inode *realinode;
 	struct ovl_entry *oe;
 	struct ovl_fs *ufs;
 	struct path *stack = NULL;
···
 	if (!ufs)
 		goto out;

-	init_waitqueue_head(&ufs->copyup_wq);
 	ufs->config.redirect_dir = ovl_redirect_dir_def;
+	ufs->config.index = ovl_index_def;
 	err = ovl_parse_opt((char *) data, &ufs->config);
 	if (err)
 		goto out_free_config;
···
 		if (err)
 			goto out_put_upperpath;

+		err = -EBUSY;
+		if (!ovl_inuse_trylock(upperpath.dentry)) {
+			pr_err("overlayfs: upperdir is in-use by another mount\n");
+			goto out_put_upperpath;
+		}
+
 		err = ovl_mount_dir(ufs->config.workdir, &workpath);
 		if (err)
-			goto out_put_upperpath;
+			goto out_unlock_upperdentry;

 		err = -EINVAL;
 		if (upperpath.mnt != workpath.mnt) {
···
 			pr_err("overlayfs: workdir and upperdir must be separate subtrees\n");
 			goto out_put_workpath;
 		}
+
+		err = -EBUSY;
+		if (!ovl_inuse_trylock(workpath.dentry)) {
+			pr_err("overlayfs: workdir is in-use by another mount\n");
+			goto out_put_workpath;
+		}
+
+		ufs->workbasedir = workpath.dentry;
 		sb->s_stack_depth = upperpath.mnt->mnt_sb->s_stack_depth;
 	}
 	err = -ENOMEM;
 	lowertmp = kstrdup(ufs->config.lowerdir, GFP_KERNEL);
 	if (!lowertmp)
-		goto out_put_workpath;
+		goto out_unlock_workdentry;

 	err = -EINVAL;
 	stacklen = ovl_split_lowerdirs(lowertmp);
···
 			pr_err("overlayfs: failed to clone upperpath\n");
 			goto out_put_lowerpath;
 		}
+
 		/* Don't inherit atime flags */
 		ufs->upper_mnt->mnt_flags &= ~(MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME);

 		sb->s_time_gran = ufs->upper_mnt->mnt_sb->s_time_gran;

-		ufs->workdir = ovl_workdir_create(ufs->upper_mnt, workpath.dentry);
-		err = PTR_ERR(ufs->workdir);
-		if (IS_ERR(ufs->workdir)) {
-			pr_warn("overlayfs: failed to create directory %s/%s (errno: %i); mounting read-only\n",
-				ufs->config.workdir, OVL_WORKDIR_NAME, -err);
-			sb->s_flags |= MS_RDONLY;
-			ufs->workdir = NULL;
-		}
-
+		ufs->workdir = ovl_workdir_create(sb, ufs, workpath.dentry,
+						 OVL_WORKDIR_NAME, false);
 		/*
 		 * Upper should support d_type, else whiteouts are visible.
 		 * Given workdir and upper are on same fs, we can do
···
 			} else {
 				vfs_removexattr(ufs->workdir, OVL_XATTR_OPAQUE);
 			}
+
+			/* Check if upper/work fs supports file handles */
+			if (ufs->config.index &&
+			    !ovl_can_decode_fh(ufs->workdir->d_sb)) {
+				ufs->config.index = false;
+				pr_warn("overlayfs: upper fs does not support file handles, falling back to index=off.\n");
+			}
 		}
 	}

···
 	else if (ufs->upper_mnt->mnt_sb != ufs->same_sb)
 		ufs->same_sb = NULL;

+	if (!(ovl_force_readonly(ufs)) && ufs->config.index) {
+		/* Verify lower root is upper root origin */
+		err = ovl_verify_origin(upperpath.dentry, ufs->lower_mnt[0],
+					stack[0].dentry, false, true);
+		if (err) {
+			pr_err("overlayfs: failed to verify upper root origin\n");
+			goto out_put_lower_mnt;
+		}
+
+		ufs->indexdir = ovl_workdir_create(sb, ufs, workpath.dentry,
+						  OVL_INDEXDIR_NAME, true);
+		err = PTR_ERR(ufs->indexdir);
+		if (IS_ERR(ufs->indexdir))
+			goto out_put_lower_mnt;
+
+		if (ufs->indexdir) {
+			/* Verify upper root is index dir origin */
+			err = ovl_verify_origin(ufs->indexdir, ufs->upper_mnt,
+						upperpath.dentry, true, true);
+			if (err)
+				pr_err("overlayfs: failed to verify index dir origin\n");
+
+			/* Cleanup bad/stale/orphan index entries */
+			if (!err)
+				err = ovl_indexdir_cleanup(ufs->indexdir,
+							   ufs->upper_mnt,
+							   stack, numlower);
+		}
+		if (err || !ufs->indexdir)
+			pr_warn("overlayfs: try deleting index dir or mounting with '-o index=off' to disable inodes index.\n");
+		if (err)
+			goto out_put_indexdir;
+	}
+
+	/* Show index=off/on in /proc/mounts for any of the reasons above */
+	if (!ufs->indexdir)
+		ufs->config.index = false;
+
 	if (remote)
 		sb->s_d_op = &ovl_reval_dentry_operations;
 	else
···

 	ufs->creator_cred = cred = prepare_creds();
 	if (!cred)
-		goto out_put_lower_mnt;
+		goto out_put_indexdir;

 	/* Never override disk quota limits or use reserved space */
 	cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
···
 	mntput(upperpath.mnt);
 	for (i = 0; i < numlower; i++)
 		mntput(stack[i].mnt);
-	path_put(&workpath);
+	mntput(workpath.mnt);
 	kfree(lowertmp);

 	if (upperpath.dentry) {
-		oe->__upperdentry = upperpath.dentry;
-		oe->impure = ovl_is_impuredir(upperpath.dentry);
+		oe->has_upper = true;
+		if (ovl_is_impuredir(upperpath.dentry))
+			ovl_set_flag(OVL_IMPURE, d_inode(root_dentry));
 	}
 	for (i = 0; i < numlower; i++) {
 		oe->lowerstack[i].dentry = stack[i].dentry;
···

 	root_dentry->d_fsdata = oe;

-	realinode = d_inode(ovl_dentry_real(root_dentry));
-	ovl_inode_init(d_inode(root_dentry), realinode, !!upperpath.dentry);
-	ovl_copyattr(realinode, d_inode(root_dentry));
+	ovl_inode_init(d_inode(root_dentry), upperpath.dentry,
+		       ovl_dentry_lower(root_dentry));

 	sb->s_root = root_dentry;

···
 	kfree(oe);
 out_put_cred:
 	put_cred(ufs->creator_cred);
+out_put_indexdir:
+	dput(ufs->indexdir);
 out_put_lower_mnt:
 	for (i = 0; i < ufs->numlower; i++)
 		mntput(ufs->lower_mnt[i]);
···
 	kfree(stack);
 out_free_lowertmp:
 	kfree(lowertmp);
+out_unlock_workdentry:
+	ovl_inuse_unlock(workpath.dentry);
 out_put_workpath:
 	path_put(&workpath);
+out_unlock_upperdentry:
+	ovl_inuse_unlock(upperpath.dentry);
 out_put_upperpath:
 	path_put(&upperpath);
 out_free_config:
···
 };
 MODULE_ALIAS_FS("overlay");

+static void ovl_inode_init_once(void *foo)
+{
+	struct ovl_inode *oi = foo;
+
+	inode_init_once(&oi->vfs_inode);
+}
+
 static int __init ovl_init(void)
 {
-	return register_filesystem(&ovl_fs_type);
+	int err;
+
+	ovl_inode_cachep = kmem_cache_create("ovl_inode",
+					     sizeof(struct ovl_inode), 0,
+					     (SLAB_RECLAIM_ACCOUNT|
+					      SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+					     ovl_inode_init_once);
+	if (ovl_inode_cachep == NULL)
+		return -ENOMEM;
+
+	err = register_filesystem(&ovl_fs_type);
+	if (err)
+		kmem_cache_destroy(ovl_inode_cachep);
+
+	return err;
 }

 static void __exit ovl_exit(void)
 {
 	unregister_filesystem(&ovl_fs_type);
+
+	/*
+	 * Make sure all delayed rcu free inodes are flushed before we
+	 * destroy cache.
+	 */
+	rcu_barrier();
+	kmem_cache_destroy(ovl_inode_cachep);
+
 }

 module_init(ovl_init);
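Editorial sketch (not part of the commit): the super.c changes above add the "index=on" and "index=off" mount options next to the existing "redirect_dir" pair, with unrecognized options rejected with -EINVAL. The following user-space C approximates that behaviour for illustration only. The names demo_config, demo_opt_is and demo_parse_opt are hypothetical, and the kernel itself uses its match_token() table rather than the string comparisons shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct ovl_config. */
struct demo_config {
	bool redirect_dir;
	bool index;
};

/* True if the len-byte token at opt is exactly the string name. */
static bool demo_opt_is(const char *opt, size_t len, const char *name)
{
	return strlen(name) == len && strncmp(opt, name, len) == 0;
}

/*
 * Parse a comma-separated option string such as "redirect_dir=off,index=on".
 * Options that are not mentioned keep whatever default the caller preset.
 * Returns 0 on success, -EINVAL on an unrecognized option.
 */
static int demo_parse_opt(const char *opts, struct demo_config *config)
{
	assert(opts && config);

	while (*opts) {
		size_t len = strcspn(opts, ",");

		if (demo_opt_is(opts, len, "index=on"))
			config->index = true;
		else if (demo_opt_is(opts, len, "index=off"))
			config->index = false;
		else if (demo_opt_is(opts, len, "redirect_dir=on"))
			config->redirect_dir = true;
		else if (demo_opt_is(opts, len, "redirect_dir=off"))
			config->redirect_dir = false;
		else if (len != 0)
			return -EINVAL;

		/* Step over the token and the comma that follows it, if any. */
		opts += len;
		if (*opts == ',')
			opts++;
	}
	return 0;
}
```

As in the diff, the caller is expected to load the defaults into the config before parsing, so an option string that never mentions "index" leaves the default in place.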
+268 -87
fs/overlayfs/util.c
···
 #include <linux/slab.h>
 #include <linux/cred.h>
 #include <linux/xattr.h>
+#include <linux/exportfs.h>
+#include <linux/uuid.h>
+#include <linux/namei.h>
+#include <linux/ratelimit.h>
 #include "overlayfs.h"
 #include "ovl_entry.h"

···
 	return ofs->same_sb;
 }

+bool ovl_can_decode_fh(struct super_block *sb)
+{
+	return (sb->s_export_op && sb->s_export_op->fh_to_dentry &&
+		!uuid_is_null(&sb->s_uuid));
+}
+
+struct dentry *ovl_indexdir(struct super_block *sb)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+
+	return ofs->indexdir;
+}
+
 struct ovl_entry *ovl_alloc_entry(unsigned int numlower)
 {
 	size_t size = offsetof(struct ovl_entry, lowerstack[numlower]);
···
 	struct ovl_entry *oe = dentry->d_fsdata;
 	enum ovl_path_type type = 0;

-	if (oe->__upperdentry) {
+	if (ovl_dentry_upper(dentry)) {
 		type = __OVL_PATH_UPPER;

 		/*
···
 void ovl_path_upper(struct dentry *dentry, struct path *path)
 {
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
-	struct ovl_entry *oe = dentry->d_fsdata;

 	path->mnt = ofs->upper_mnt;
-	path->dentry = ovl_upperdentry_dereference(oe);
+	path->dentry = ovl_dentry_upper(dentry);
 }

 void ovl_path_lower(struct dentry *dentry, struct path *path)
···

 struct dentry *ovl_dentry_upper(struct dentry *dentry)
 {
-	struct ovl_entry *oe = dentry->d_fsdata;
-
-	return ovl_upperdentry_dereference(oe);
-}
-
-static struct dentry *__ovl_dentry_lower(struct ovl_entry *oe)
-{
-	return oe->numlower ? oe->lowerstack[0].dentry : NULL;
+	return ovl_upperdentry_dereference(OVL_I(d_inode(dentry)));
 }

 struct dentry *ovl_dentry_lower(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;

-	return __ovl_dentry_lower(oe);
+	return oe->numlower ? oe->lowerstack[0].dentry : NULL;
 }

 struct dentry *ovl_dentry_real(struct dentry *dentry)
 {
-	struct ovl_entry *oe = dentry->d_fsdata;
-	struct dentry *realdentry;
-
-	realdentry = ovl_upperdentry_dereference(oe);
-	if (!realdentry)
-		realdentry = __ovl_dentry_lower(oe);
-
-	return realdentry;
+	return ovl_dentry_upper(dentry) ?: ovl_dentry_lower(dentry);
 }
+
+struct inode *ovl_inode_upper(struct inode *inode)
+{
+	struct dentry *upperdentry = ovl_upperdentry_dereference(OVL_I(inode));
+
+	return upperdentry ? d_inode(upperdentry) : NULL;
+}
+
+struct inode *ovl_inode_lower(struct inode *inode)
+{
+	return OVL_I(inode)->lower;
+}
+
+struct inode *ovl_inode_real(struct inode *inode)
+{
+	return ovl_inode_upper(inode) ?: ovl_inode_lower(inode);
+}
+

 struct ovl_dir_cache *ovl_dir_cache(struct dentry *dentry)
 {
-	struct ovl_entry *oe = dentry->d_fsdata;
-
-	return oe->cache;
+	return OVL_I(d_inode(dentry))->cache;
 }

 void ovl_set_dir_cache(struct dentry *dentry, struct ovl_dir_cache *cache)
 {
-	struct ovl_entry *oe = dentry->d_fsdata;
-
-	oe->cache = cache;
+	OVL_I(d_inode(dentry))->cache = cache;
 }

 bool ovl_dentry_is_opaque(struct dentry *dentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;
 	return oe->opaque;
-}
-
-bool ovl_dentry_is_impure(struct dentry *dentry)
-{
-	struct ovl_entry *oe = dentry->d_fsdata;
-
-	return oe->impure;
 }

 bool ovl_dentry_is_whiteout(struct dentry *dentry)
···
 	oe->opaque = true;
 }

+/*
+ * For hard links it's possible for ovl_dentry_upper() to return positive, while
+ * there's no actual upper alias for the inode.  Copy up code needs to know
+ * about the existence of the upper alias, so it can't use ovl_dentry_upper().
+ */
+bool ovl_dentry_has_upper_alias(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	return oe->has_upper;
+}
+
+void ovl_dentry_set_upper_alias(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	oe->has_upper = true;
+}
+
 bool ovl_redirect_dir(struct super_block *sb)
 {
 	struct ovl_fs *ofs = sb->s_fs_info;
···

 const char *ovl_dentry_get_redirect(struct dentry *dentry)
 {
-	struct ovl_entry *oe = dentry->d_fsdata;
-
-	return oe->redirect;
+	return OVL_I(d_inode(dentry))->redirect;
 }

 void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect)
 {
-	struct ovl_entry *oe = dentry->d_fsdata;
+	struct ovl_inode *oi = OVL_I(d_inode(dentry));

-	kfree(oe->redirect);
-	oe->redirect = redirect;
+	kfree(oi->redirect);
+	oi->redirect = redirect;
 }

-void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
+void ovl_inode_init(struct inode *inode, struct dentry *upperdentry,
+		    struct dentry *lowerdentry)
 {
-	struct ovl_entry *oe = dentry->d_fsdata;
+	if (upperdentry)
+		OVL_I(inode)->__upperdentry = upperdentry;
+	if (lowerdentry)
+		OVL_I(inode)->lower = d_inode(lowerdentry);

-	WARN_ON(!inode_is_locked(upperdentry->d_parent->d_inode));
-	WARN_ON(oe->__upperdentry);
+	ovl_copyattr(d_inode(upperdentry ?: lowerdentry), inode);
+}
+
+void ovl_inode_update(struct inode *inode, struct dentry *upperdentry)
+{
+	struct inode *upperinode = d_inode(upperdentry);
+
+	WARN_ON(OVL_I(inode)->__upperdentry);
+
 	/*
-	 * Make sure upperdentry is consistent before making it visible to
-	 * ovl_upperdentry_dereference().
+	 * Make sure upperdentry is consistent before making it visible
 	 */
 	smp_wmb();
-	oe->__upperdentry = upperdentry;
-}
-
-void ovl_inode_init(struct inode *inode, struct inode *realinode, bool is_upper)
-{
-	WRITE_ONCE(inode->i_private, (unsigned long) realinode |
-		   (is_upper ? OVL_ISUPPER_MASK : 0));
-}
-
-void ovl_inode_update(struct inode *inode, struct inode *upperinode)
-{
-	WARN_ON(!upperinode);
-	WARN_ON(!inode_unhashed(inode));
-	WRITE_ONCE(inode->i_private,
-		   (unsigned long) upperinode | OVL_ISUPPER_MASK);
-	if (!S_ISDIR(upperinode->i_mode))
+	OVL_I(inode)->__upperdentry = upperdentry;
+	if (!S_ISDIR(upperinode->i_mode) && inode_unhashed(inode)) {
+		inode->i_private = upperinode;
 		__insert_inode_hash(inode, (unsigned long) upperinode);
+	}
 }

 void ovl_dentry_version_inc(struct dentry *dentry)
 {
-	struct ovl_entry *oe = dentry->d_fsdata;
+	struct inode *inode = d_inode(dentry);

-	WARN_ON(!inode_is_locked(dentry->d_inode));
-	oe->version++;
+	WARN_ON(!inode_is_locked(inode));
+	OVL_I(inode)->version++;
 }

 u64 ovl_dentry_version_get(struct dentry *dentry)
 {
-	struct ovl_entry *oe = dentry->d_fsdata;
+	struct inode *inode = d_inode(dentry);

-	WARN_ON(!inode_is_locked(dentry->d_inode));
-	return oe->version;
+	WARN_ON(!inode_is_locked(inode));
+	return OVL_I(inode)->version;
 }

 bool ovl_is_whiteout(struct dentry *dentry)
···

 int ovl_copy_up_start(struct dentry *dentry)
 {
-	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
-	struct ovl_entry *oe = dentry->d_fsdata;
+	struct ovl_inode *oi = OVL_I(d_inode(dentry));
 	int err;

-	spin_lock(&ofs->copyup_wq.lock);
-	err = wait_event_interruptible_locked(ofs->copyup_wq, !oe->copying);
-	if (!err) {
-		if (oe->__upperdentry)
-			err = 1; /* Already copied up */
-		else
-			oe->copying = true;
+	err = mutex_lock_interruptible(&oi->lock);
+	if (!err && ovl_dentry_has_upper_alias(dentry)) {
+		err = 1; /* Already copied up */
+		mutex_unlock(&oi->lock);
 	}
-	spin_unlock(&ofs->copyup_wq.lock);

 	return err;
 }

 void ovl_copy_up_end(struct dentry *dentry)
 {
-	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
-	struct ovl_entry *oe = dentry->d_fsdata;
-
-	spin_lock(&ofs->copyup_wq.lock);
-	oe->copying = false;
-	wake_up_locked(&ofs->copyup_wq);
-	spin_unlock(&ofs->copyup_wq.lock);
+	mutex_unlock(&OVL_I(d_inode(dentry))->lock);
 }

 bool ovl_check_dir_xattr(struct dentry *dentry, const char *name)
···
 int ovl_set_impure(struct dentry *dentry, struct dentry *upperdentry)
 {
 	int err;
-	struct ovl_entry *oe = dentry->d_fsdata;

-	if (oe->impure)
+	if (ovl_test_flag(OVL_IMPURE, d_inode(dentry)))
 		return 0;

 	/*
···
 	err = ovl_check_setxattr(dentry, upperdentry, OVL_XATTR_IMPURE,
 				 "y", 1, 0);
 	if (!err)
-		oe->impure = true;
+		ovl_set_flag(OVL_IMPURE, d_inode(dentry));

 	return err;
+}
+
+void ovl_set_flag(unsigned long flag, struct inode *inode)
+{
+	set_bit(flag, &OVL_I(inode)->flags);
+}
+
+bool ovl_test_flag(unsigned long flag, struct inode *inode)
+{
+	return test_bit(flag, &OVL_I(inode)->flags);
+}
+
+/**
+ * Caller must hold a reference to inode to prevent it from being freed while
+ * it is marked inuse.
+ */
+bool ovl_inuse_trylock(struct dentry *dentry)
+{
+	struct inode *inode = d_inode(dentry);
+	bool locked = false;
+
+	spin_lock(&inode->i_lock);
+	if (!(inode->i_state & I_OVL_INUSE)) {
+		inode->i_state |= I_OVL_INUSE;
+		locked = true;
+	}
+	spin_unlock(&inode->i_lock);
+
+	return locked;
+}
+
+void ovl_inuse_unlock(struct dentry *dentry)
+{
+	if (dentry) {
+		struct inode *inode = d_inode(dentry);
+
+		spin_lock(&inode->i_lock);
+		WARN_ON(!(inode->i_state & I_OVL_INUSE));
+		inode->i_state &= ~I_OVL_INUSE;
+		spin_unlock(&inode->i_lock);
+	}
+}
+
+/* Called must hold OVL_I(inode)->oi_lock */
+static void ovl_cleanup_index(struct dentry *dentry)
+{
+	struct inode *dir = ovl_indexdir(dentry->d_sb)->d_inode;
+	struct dentry *lowerdentry = ovl_dentry_lower(dentry);
+	struct dentry *upperdentry = ovl_dentry_upper(dentry);
+	struct dentry *index = NULL;
+	struct inode *inode;
+	struct qstr name;
+	int err;
+
+	err = ovl_get_index_name(lowerdentry, &name);
+	if (err)
+		goto fail;
+
+	inode = d_inode(upperdentry);
+	if (inode->i_nlink != 1) {
+		pr_warn_ratelimited("overlayfs: cleanup linked index (%pd2, ino=%lu, nlink=%u)\n",
+				    upperdentry, inode->i_ino, inode->i_nlink);
+		/*
+		 * We either have a bug with persistent union nlink or a lower
+		 * hardlink was added while overlay is mounted. Adding a lower
+		 * hardlink and then unlinking all overlay hardlinks would drop
+		 * overlay nlink to zero before all upper inodes are unlinked.
+		 * As a safety measure, when that situation is detected, set
+		 * the overlay nlink to the index inode nlink minus one for the
+		 * index entry itself.
+		 */
+		set_nlink(d_inode(dentry), inode->i_nlink - 1);
+		ovl_set_nlink_upper(dentry);
+		goto out;
+	}
+
+	inode_lock_nested(dir, I_MUTEX_PARENT);
+	/* TODO: whiteout instead of cleanup to block future open by handle */
+	index = lookup_one_len(name.name, ovl_indexdir(dentry->d_sb), name.len);
+	err = PTR_ERR(index);
+	if (!IS_ERR(index))
+		err = ovl_cleanup(dir, index);
+	inode_unlock(dir);
+	if (err)
+		goto fail;
+
+out:
+	dput(index);
+	return;
+
+fail:
+	pr_err("overlayfs: cleanup index of '%pd2' failed (%i)\n", dentry, err);
+	goto out;
+}
+
+/*
+ * Operations that change overlay inode and upper inode nlink need to be
+ * synchronized with copy up for persistent nlink accounting.
+ */
+int ovl_nlink_start(struct dentry *dentry, bool *locked)
+{
+	struct ovl_inode *oi = OVL_I(d_inode(dentry));
+	const struct cred *old_cred;
+	int err;
+
+	if (!d_inode(dentry) || d_is_dir(dentry))
+		return 0;
+
+	/*
+	 * With inodes index is enabled, we store the union overlay nlink
+	 * in an xattr on the index inode. When whiting out lower hardlinks
+	 * we need to decrement the overlay persistent nlink, but before the
+	 * first copy up, we have no upper index inode to store the xattr.
+	 *
+	 * As a workaround, before whiteout/rename over of a lower hardlink,
+	 * copy up to create the upper index. Creating the upper index will
+	 * initialize the overlay nlink, so it could be dropped if unlink
+	 * or rename succeeds.
+	 *
+	 * TODO: implement metadata only index copy up when called with
+	 *       ovl_copy_up_flags(dentry, O_PATH).
+	 */
+	if (ovl_indexdir(dentry->d_sb) && !ovl_dentry_has_upper_alias(dentry) &&
+	    d_inode(ovl_dentry_lower(dentry))->i_nlink > 1) {
+		err = ovl_copy_up(dentry);
+		if (err)
+			return err;
+	}
+
+	err = mutex_lock_interruptible(&oi->lock);
+	if (err)
+		return err;
+
+	if (!ovl_test_flag(OVL_INDEX, d_inode(dentry)))
+		goto out;
+
+	old_cred = ovl_override_creds(dentry->d_sb);
+	/*
+	 * The overlay inode nlink should be incremented/decremented IFF the
+	 * upper operation succeeds, along with nlink change of upper inode.
+	 * Therefore, before link/unlink/rename, we store the union nlink
+	 * value relative to the upper inode nlink in an upper inode xattr.
+	 */
+	err = ovl_set_nlink_upper(dentry);
+	revert_creds(old_cred);
+
+out:
+	if (err)
+		mutex_unlock(&oi->lock);
+	else
+		*locked = true;
+
+	return err;
+}
+
+void ovl_nlink_end(struct dentry *dentry, bool locked)
+{
+	if (locked) {
+		if (ovl_test_flag(OVL_INDEX, d_inode(dentry)) &&
+		    d_inode(dentry)->i_nlink == 0) {
+			const struct cred *old_cred;
+
+			old_cred = ovl_override_creds(dentry->d_sb);
+			ovl_cleanup_index(dentry);
+			revert_creds(old_cred);
+		}
+
+		mutex_unlock(&OVL_I(d_inode(dentry))->lock);
+	}
 }
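Editorial sketch (not part of the commit): ovl_inuse_trylock() and ovl_inuse_unlock() above claim exclusive ownership of an upper or work directory by setting the I_OVL_INUSE bit in i_state under i_lock, and a second mount that finds the bit already set fails with EBUSY. The user-space C below illustrates the same test-and-set idea. It is an assumption-laden analogue: it swaps the kernel spinlock for C11 atomics, and demo_inode, DEMO_INUSE and the demo_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same bit position as I_OVL_INUSE in the diff; the name is illustrative. */
#define DEMO_INUSE (1UL << 14)

/* Hypothetical stand-in for the i_state word of struct inode. */
struct demo_inode {
	atomic_ulong state;
};

/*
 * Try to mark the inode in use. Returns true if this caller set the bit,
 * false if another owner already holds it. atomic_fetch_or() does the test
 * and the set in one step, which the kernel code achieves by holding i_lock.
 */
static bool demo_inuse_trylock(struct demo_inode *inode)
{
	unsigned long old = atomic_fetch_or(&inode->state, DEMO_INUSE);

	return !(old & DEMO_INUSE);
}

/* Release ownership; the assert plays the role of the kernel's WARN_ON(). */
static void demo_inuse_unlock(struct demo_inode *inode)
{
	unsigned long old = atomic_fetch_and(&inode->state, ~DEMO_INUSE);

	assert(old & DEMO_INUSE);
	(void)old;
}
```

A mount path built on this would call demo_inuse_trylock() for the upper directory first and the work directory second, and release them in the reverse order on failure, which matches the order of the out_unlock_workdentry and out_unlock_upperdentry labels in the super.c hunk.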
+4
include/linux/fs.h
···
  *			wb stat updates to grab mapping->tree_lock.  See
  *			inode_switch_wb_work_fn() for details.
  *
+ * I_OVL_INUSE		Used by overlayfs to get exclusive ownership on upper
+ *			and work dirs among overlayfs mounts.
+ *
  * Q: What is the difference between I_WILL_FREE and I_FREEING?
  */
 #define I_DIRTY_SYNC		(1 << 0)
···
 #define __I_DIRTY_TIME_EXPIRED	12
 #define I_DIRTY_TIME_EXPIRED	(1 << __I_DIRTY_TIME_EXPIRED)
 #define I_WB_SWITCH		(1 << 13)
+#define I_OVL_INUSE		(1 << 14)

 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
 #define I_DIRTY_ALL (I_DIRTY | I_DIRTY_TIME)