Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs updates from Miklos Szeredi:
"In addition to bug fixes and cleanups there are two new features from
Amir:

- Consistent inode number support for the case when layers are not
all on the same filesystem (feature is dubbed "xino").

- Optimize overlayfs file handle decoding. This one touches the
exportfs interface to allow detecting the disconnected directory
case"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: update documentation w.r.t "xino" feature
ovl: add support for "xino" mount and config options
ovl: consistent d_ino for non-samefs with xino
ovl: consistent i_ino for non-samefs with xino
ovl: constant st_ino for non-samefs with xino
ovl: allocate anon bdev per unique lower fs
ovl: factor out ovl_map_dev_ino() helper
ovl: cleanup ovl_update_time()
ovl: add WARN_ON() for non-dir redirect cases
ovl: cleanup setting OVL_INDEX
ovl: set d->is_dir and d->opaque for last path element
ovl: Do not check for redirect if this is last layer
ovl: lookup in inode cache first when decoding lower file handle
ovl: do not try to reconnect a disconnected origin dentry
ovl: disambiguate ovl_encode_fh()
ovl: set lower layer st_dev only if setting lower st_ino
ovl: fix lookup with middle layer opaque dir and absolute path redirects
ovl: Set d->last properly during lookup
ovl: set i_ino to the value of st_ino for NFS export

+510 -172
+33 -6
Documentation/filesystems/overlayfs.txt
···
 filesystem for various technical reasons. The expectation is that
 many use cases will be able to ignore these differences.

-This approach is 'hybrid' because the objects that appear in the
-filesystem do not all appear to belong to that filesystem. In many
-cases an object accessed in the union will be indistinguishable
+
+Overlay objects
+---------------
+
+The overlay filesystem approach is 'hybrid', because the objects that
+appear in the filesystem do not always appear to belong to that filesystem.
+In many cases, an object accessed in the union will be indistinguishable
 from accessing the corresponding object from the original filesystem.
 This is most obvious from the 'st_dev' field returned by stat(2).

···
 make the overlay mount more compliant with filesystem scanners and
 overlay objects will be distinguishable from the corresponding
 objects in the original filesystem.
+
+On 64bit systems, even if all overlay layers are not on the same
+underlying filesystem, the same compliant behavior could be achieved
+with the "xino" feature. The "xino" feature composes a unique object
+identifier from the real object st_ino and an underlying fsid index.
+If all underlying filesystems support NFS file handles and export file
+handles with 32bit inode number encoding (e.g. ext4), overlay filesystem
+will use the high inode number bits for fsid. Even when the underlying
+filesystem uses 64bit inode numbers, users can still enable the "xino"
+feature with the "-o xino=on" overlay mount option. That is useful for the
+case of underlying filesystems like xfs and tmpfs, which use 64bit inode
+numbers, but are very unlikely to use the high inode number bit.
+

 Upper and Lower
 ---------------
···
 ---------------------

 The copy_up operation essentially creates a new, identical file and
-moves it over to the old name. The new file may be on a different
-filesystem, so both st_dev and st_ino of the file may change.
+moves it over to the old name. Any open files referring to this inode
+will access the old data.

-Any open files referring to this inode will access the old data.
+The new file may be on a different filesystem, so both st_dev and st_ino
+of the real file may change. The values of st_dev and st_ino returned by
+stat(2) on an overlay object are often not the same as the real file
+stat(2) values to prevent the values from changing on copy_up.
+
+Unless "xino" feature is enabled, when overlay layers are not all on the
+same underlying filesystem, the value of st_dev may be different for two
+non-directory objects in the same overlay filesystem and the value of
+st_ino for directory objects may be non persistent and could change even
+while the overlay filesystem is still mounted.

 Unless "inode index" feature is enabled, if a file with multiple hard
 links is copied up, then this will "break" the link. Changes will not be
···

 Unless "redirect_dir" feature is enabled, rename(2) on a lower or merged
 directory will fail with EXDEV.
+

 Changes to underlying filesystems
 ---------------------------------
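The documentation change above says that "xino" composes one identifier from the real st_ino and an fsid index held in the high inode number bits. A minimal userspace sketch of that bit layout follows. The names `xino_compose` and `xino_fsid` are hypothetical and exist only for this illustration; they are not kernel functions, and the number of reserved bits is simply passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): fold a layer's fsid into the
 * high xino_bits of a real inode number. The upper layer has fsid 0,
 * so its inode numbers pass through unchanged.
 *
 * Returns false when the real inode number already uses the reserved
 * high bits; *out then holds the unmodified real inode number.
 * The caller is assumed to pass an fsid that fits in xino_bits.
 */
static bool xino_compose(uint64_t real_ino, unsigned int fsid,
                         unsigned int xino_bits, uint64_t *out)
{
        unsigned int shift;

        assert(xino_bits > 0 && xino_bits < 64);
        shift = 64 - xino_bits;

        *out = real_ino;
        if (real_ino >> shift)
                return false;   /* high bits in use: cannot map */

        *out = real_ino | ((uint64_t)fsid << shift);
        return true;
}

/* Recover the fsid from a composed inode number. */
static unsigned int xino_fsid(uint64_t ino, unsigned int xino_bits)
{
        assert(xino_bits > 0 && xino_bits < 64);
        return (unsigned int)(ino >> (64 - xino_bits));
}
```

With two reserved bits, inode 42 on the layer with fsid 1 would be reported as `(1 << 62) | 42`, while inode 42 on the upper layer (fsid 0) stays 42, so the two no longer collide under a single st_dev.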
+9
fs/exportfs/expfs.c
···
         if (IS_ERR_OR_NULL(result))
                 return ERR_PTR(-ESTALE);

+        /*
+         * If no acceptance criteria was specified by caller, a disconnected
+         * dentry is also acceptable. Callers may use this mode to query if
+         * file handle is stale or to get a reference to an inode without
+         * risking the high overhead caused by directory reconnect.
+         */
+        if (!acceptable)
+                return result;
+
         if (d_is_dir(result)) {
                 /*
                  * This request is for a directory.
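The hunk above makes the acceptance callback optional: when the caller passes none, the decoded object is returned as-is, even if it is disconnected. The following is a toy userspace model of that calling convention, not the exportfs code. `struct node`, `decode_node` and `accept_any` are made-up names, and the "reconnect" step of the real function is reduced to a plain rejection to keep the sketch short.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a decoded dentry (hypothetical type). */
struct node {
        bool connected; /* reachable from the root of the export? */
        bool is_dir;
};

typedef bool (*accept_fn)(void *ctx, const struct node *n);

/* Example predicate that accepts every connected node. */
static bool accept_any(void *ctx, const struct node *n)
{
        (void)ctx;
        (void)n;
        return true;
}

/*
 * Return the node if the caller's criteria are met, NULL otherwise.
 * A NULL predicate means "no acceptance criteria": the node is handed
 * back immediately, connected or not, and no costly work is attempted.
 */
static const struct node *decode_node(const struct node *n,
                                      accept_fn acceptable, void *ctx)
{
        if (!n)
                return NULL;            /* models a stale handle */

        if (!acceptable)
                return n;               /* disconnected is fine too */

        /* Simplification: the real code would try to reconnect here. */
        if (!n->connected)
                return NULL;

        return acceptable(ctx, n) ? n : NULL;
}
```

A caller that only wants to know whether a handle is stale, or wants a reference to the inode, would pass a NULL predicate; a caller that needs a connected result passes a real one.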
+17
fs/overlayfs/Kconfig
···
           case basis with the "nfs_export=on" mount option.

           Say N unless you fully understand the consequences.
+
+config OVERLAY_FS_XINO_AUTO
+        bool "Overlayfs: auto enable inode number mapping"
+        default n
+        depends on OVERLAY_FS
+        help
+          If this config option is enabled then overlay filesystems will use
+          unused high bits in underlying filesystem inode numbers to map all
+          inodes to a unified address space. The mapped 64bit inode numbers
+          might not be compatible with applications that expect 32bit inodes.
+
+          If compatibility with applications that expect 32bit inodes is not an
+          issue, then it is safe and recommended to say Y here.
+
+          For more information, see Documentation/filesystems/overlayfs.txt
+
+          If unsure, say N.
+3 -3
fs/overlayfs/copy_up.c
···
         return err;
 }

-struct ovl_fh *ovl_encode_fh(struct dentry *real, bool is_upper)
+struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
 {
         struct ovl_fh *fh;
         int fh_type, fh_len, dwords;
···
          * up and a pure upper inode.
          */
         if (ovl_can_decode_fh(lower->d_sb)) {
-                fh = ovl_encode_fh(lower, false);
+                fh = ovl_encode_real_fh(lower, false);
                 if (IS_ERR(fh))
                         return PTR_ERR(fh);
         }
···
         const struct ovl_fh *fh;
         int err;

-        fh = ovl_encode_fh(upper, true);
+        fh = ovl_encode_real_fh(upper, true);
         if (IS_ERR(fh))
                 return PTR_ERR(fh);

+40 -35
fs/overlayfs/export.c
···
                 goto fail;

         /* Encode an upper or lower file handle */
-        fh = ovl_encode_fh(enc_lower ? ovl_dentry_lower(dentry) :
-                           ovl_dentry_upper(dentry), !enc_lower);
+        fh = ovl_encode_real_fh(enc_lower ? ovl_dentry_lower(dentry) :
+                                ovl_dentry_upper(dentry), !enc_lower);
         err = PTR_ERR(fh);
         if (IS_ERR(fh))
                 goto fail;
···
         return OVL_FILEID;
 }

-static int ovl_encode_inode_fh(struct inode *inode, u32 *fid, int *max_len,
-                               struct inode *parent)
+static int ovl_encode_fh(struct inode *inode, u32 *fid, int *max_len,
+                         struct inode *parent)
 {
         struct dentry *dentry;
         int type;
···
         if (d_is_dir(upper ?: lower))
                 return ERR_PTR(-EIO);

-        inode = ovl_get_inode(sb, dget(upper), lower, index, !!lower);
+        inode = ovl_get_inode(sb, dget(upper), lowerpath, index, !!lower);
         if (IS_ERR(inode)) {
                 dput(upper);
                 return ERR_CAST(inode);
         }
-
-        if (index)
-                ovl_set_flag(OVL_INDEX, inode);

         dentry = d_find_any_alias(inode);
         if (!dentry) {
···
         if (!ofs->upper_mnt)
                 return ERR_PTR(-EACCES);

-        upper = ovl_decode_fh(fh, ofs->upper_mnt);
+        upper = ovl_decode_real_fh(fh, ofs->upper_mnt, true);
         if (IS_ERR_OR_NULL(upper))
                 return upper;

···
         struct ovl_path *stack = &origin;
         struct dentry *dentry = NULL;
         struct dentry *index = NULL;
-        struct inode *inode = NULL;
-        bool is_deleted = false;
+        struct inode *inode;
         int err;

-        /* First lookup indexed upper by fh */
+        /* First lookup overlay inode in inode cache by origin fh */
+        err = ovl_check_origin_fh(ofs, fh, false, NULL, &stack);
+        if (err)
+                return ERR_PTR(err);
+
+        if (!d_is_dir(origin.dentry) ||
+            !(origin.dentry->d_flags & DCACHE_DISCONNECTED)) {
+                inode = ovl_lookup_inode(sb, origin.dentry, false);
+                err = PTR_ERR(inode);
+                if (IS_ERR(inode))
+                        goto out_err;
+                if (inode) {
+                        dentry = d_find_any_alias(inode);
+                        iput(inode);
+                        if (dentry)
+                                goto out;
+                }
+        }
+
+        /* Then lookup indexed upper/whiteout by origin fh */
         if (ofs->indexdir) {
                 index = ovl_get_index_fh(ofs, fh);
                 err = PTR_ERR(index);
                 if (IS_ERR(index)) {
-                        if (err != -ESTALE)
-                                return ERR_PTR(err);
-
-                        /* Found a whiteout index - treat as deleted inode */
-                        is_deleted = true;
                         index = NULL;
+                        goto out_err;
                 }
         }

-        /* Then try to get upper dir by index */
+        /* Then try to get a connected upper dir by index */
         if (index && d_is_dir(index)) {
                 struct dentry *upper = ovl_index_upper(ofs, index);

···
                 goto out;
         }

-        /* Then lookup origin by fh */
-        err = ovl_check_origin_fh(ofs, fh, NULL, &stack);
-        if (err) {
-                goto out_err;
-        } else if (index) {
+        /* Otherwise, get a connected non-upper dir or disconnected non-dir */
+        if (d_is_dir(origin.dentry) &&
+            (origin.dentry->d_flags & DCACHE_DISCONNECTED)) {
+                dput(origin.dentry);
+                origin.dentry = NULL;
+                err = ovl_check_origin_fh(ofs, fh, true, NULL, &stack);
+                if (err)
+                        goto out_err;
+        }
+        if (index) {
                 err = ovl_verify_origin(index, origin.dentry, false);
                 if (err)
                         goto out_err;
-        } else if (is_deleted) {
-                /* Lookup deleted non-dir by origin inode */
-                if (!d_is_dir(origin.dentry))
-                        inode = ovl_lookup_inode(sb, origin.dentry, false);
-                err = -ESTALE;
-                if (!inode || atomic_read(&inode->i_count) == 1)
-                        goto out_err;
-
-                /* Deleted but still open? */
-                index = dget(ovl_i_dentry_upper(inode));
         }

         dentry = ovl_get_dentry(sb, NULL, &origin, index);
···
 out:
         dput(origin.dentry);
         dput(index);
-        iput(inode);
         return dentry;

 out_err:
···
 }

 const struct export_operations ovl_export_operations = {
-        .encode_fh      = ovl_encode_inode_fh,
+        .encode_fh      = ovl_encode_fh,
         .fh_to_dentry   = ovl_fh_to_dentry,
         .fh_to_parent   = ovl_fh_to_parent,
         .get_name       = ovl_get_name,
+120 -66
fs/overlayfs/inode.c
···
 #include "overlayfs.h"


-static dev_t ovl_get_pseudo_dev(struct dentry *dentry)
-{
-        struct ovl_entry *oe = dentry->d_fsdata;
-
-        return oe->lowerstack[0].layer->pseudo_dev;
-}
-
 int ovl_setattr(struct dentry *dentry, struct iattr *attr)
 {
         int err;
···
         return err;
 }

+static int ovl_map_dev_ino(struct dentry *dentry, struct kstat *stat,
+                           struct ovl_layer *lower_layer)
+{
+        bool samefs = ovl_same_sb(dentry->d_sb);
+        unsigned int xinobits = ovl_xino_bits(dentry->d_sb);
+
+        if (samefs) {
+                /*
+                 * When all layers are on the same fs, all real inode
+                 * number are unique, so we use the overlay st_dev,
+                 * which is friendly to du -x.
+                 */
+                stat->dev = dentry->d_sb->s_dev;
+                return 0;
+        } else if (xinobits) {
+                unsigned int shift = 64 - xinobits;
+                /*
+                 * All inode numbers of underlying fs should not be using the
+                 * high xinobits, so we use high xinobits to partition the
+                 * overlay st_ino address space. The high bits holds the fsid
+                 * (upper fsid is 0). This way overlay inode numbers are unique
+                 * and all inodes use overlay st_dev. Inode numbers are also
+                 * persistent for a given layer configuration.
+                 */
+                if (stat->ino >> shift) {
+                        pr_warn_ratelimited("overlayfs: inode number too big (%pd2, ino=%llu, xinobits=%d)\n",
+                                            dentry, stat->ino, xinobits);
+                } else {
+                        if (lower_layer)
+                                stat->ino |= ((u64)lower_layer->fsid) << shift;
+
+                        stat->dev = dentry->d_sb->s_dev;
+                        return 0;
+                }
+        }
+
+        /* The inode could not be mapped to a unified st_ino address space */
+        if (S_ISDIR(dentry->d_inode->i_mode)) {
+                /*
+                 * Always use the overlay st_dev for directories, so 'find
+                 * -xdev' will scan the entire overlay mount and won't cross the
+                 * overlay mount boundaries.
+                 *
+                 * If not all layers are on the same fs the pair {real st_ino;
+                 * overlay st_dev} is not unique, so use the non persistent
+                 * overlay st_ino for directories.
+                 */
+                stat->dev = dentry->d_sb->s_dev;
+                stat->ino = dentry->d_inode->i_ino;
+        } else if (lower_layer && lower_layer->fsid) {
+                /*
+                 * For non-samefs setup, if we cannot map all layers st_ino
+                 * to a unified address space, we need to make sure that st_dev
+                 * is unique per lower fs. Upper layer uses real st_dev and
+                 * lower layers use the unique anonymous bdev assigned to the
+                 * lower fs.
+                 */
+                stat->dev = lower_layer->fs->pseudo_dev;
+        }
+
+        return 0;
+}
+
 int ovl_getattr(const struct path *path, struct kstat *stat,
                 u32 request_mask, unsigned int flags)
 {
···
         const struct cred *old_cred;
         bool is_dir = S_ISDIR(dentry->d_inode->i_mode);
         bool samefs = ovl_same_sb(dentry->d_sb);
+        struct ovl_layer *lower_layer = NULL;
         int err;

         type = ovl_path_real(dentry, &realpath);
···
                 goto out;

         /*
-         * For non-dir or same fs, we use st_ino of the copy up origin, if we
-         * know it. This guaranties constant st_dev/st_ino across copy up.
+         * For non-dir or same fs, we use st_ino of the copy up origin.
+         * This guaranties constant st_dev/st_ino across copy up.
+         * With xino feature and non-samefs, we use st_ino of the copy up
+         * origin masked with high bits that represent the layer id.
          *
-         * If filesystem supports NFS export ops, this also guaranties
+         * If lower filesystem supports NFS file handles, this also guaranties
          * persistent st_ino across mount cycle.
          */
-        if (!is_dir || samefs) {
-                if (OVL_TYPE_ORIGIN(type)) {
+        if (!is_dir || samefs || ovl_xino_bits(dentry->d_sb)) {
+                if (!OVL_TYPE_UPPER(type)) {
+                        lower_layer = ovl_layer_lower(dentry);
+                } else if (OVL_TYPE_ORIGIN(type)) {
                         struct kstat lowerstat;
                         u32 lowermask = STATX_INO | (!is_dir ? STATX_NLINK : 0);

···
                          */
                         if (ovl_test_flag(OVL_INDEX, d_inode(dentry)) ||
                             (!ovl_verify_lower(dentry->d_sb) &&
-                             (is_dir || lowerstat.nlink == 1)))
+                             (is_dir || lowerstat.nlink == 1))) {
                                 stat->ino = lowerstat.ino;
-
-                        if (samefs)
-                                WARN_ON_ONCE(stat->dev != lowerstat.dev);
-                        else
-                                stat->dev = ovl_get_pseudo_dev(dentry);
+                                lower_layer = ovl_layer_lower(dentry);
+                        }
                 }
-                if (samefs) {
-                        /*
-                         * When all layers are on the same fs, all real inode
-                         * number are unique, so we use the overlay st_dev,
-                         * which is friendly to du -x.
-                         */
-                        stat->dev = dentry->d_sb->s_dev;
-                } else if (!OVL_TYPE_UPPER(type)) {
-                        /*
-                         * For non-samefs setup, to make sure that st_dev/st_ino
-                         * pair is unique across the system, we use a unique
-                         * anonymous st_dev for lower layer inode.
-                         */
-                        stat->dev = ovl_get_pseudo_dev(dentry);
-                }
-        } else {
-                /*
-                 * Always use the overlay st_dev for directories, so 'find
-                 * -xdev' will scan the entire overlay mount and won't cross the
-                 * overlay mount boundaries.
-                 *
-                 * If not all layers are on the same fs the pair {real st_ino;
-                 * overlay st_dev} is not unique, so use the non persistent
-                 * overlay st_ino for directories.
-                 */
-                stat->dev = dentry->d_sb->s_dev;
-                stat->ino = dentry->d_inode->i_ino;
         }
+
+        err = ovl_map_dev_ino(dentry, stat, lower_layer);
+        if (err)
+                goto out;

         /*
          * It's probably not worth it to count subdirs to get the
···

 int ovl_update_time(struct inode *inode, struct timespec *ts, int flags)
 {
-        struct dentry *alias;
-        struct path upperpath;
+        if (flags & S_ATIME) {
+                struct ovl_fs *ofs = inode->i_sb->s_fs_info;
+                struct path upperpath = {
+                        .mnt = ofs->upper_mnt,
+                        .dentry = ovl_upperdentry_dereference(OVL_I(inode)),
+                };

-        if (!(flags & S_ATIME))
-                return 0;
-
-        alias = d_find_any_alias(inode);
-        if (!alias)
-                return 0;
-
-        ovl_path_upper(alias, &upperpath);
-        if (upperpath.dentry) {
-                touch_atime(&upperpath);
-                inode->i_atime = d_inode(upperpath.dentry)->i_atime;
+                if (upperpath.dentry) {
+                        touch_atime(&upperpath);
+                        inode->i_atime = d_inode(upperpath.dentry)->i_atime;
+                }
         }
-
-        dput(alias);
-
         return 0;
 }

···
 #endif
 }

-static void ovl_fill_inode(struct inode *inode, umode_t mode, dev_t rdev)
+static void ovl_fill_inode(struct inode *inode, umode_t mode, dev_t rdev,
+                           unsigned long ino, int fsid)
 {
-        inode->i_ino = get_next_ino();
+        int xinobits = ovl_xino_bits(inode->i_sb);
+
+        /*
+         * When NFS export is enabled and d_ino is consistent with st_ino
+         * (samefs or i_ino has enough bits to encode layer), set the same
+         * value used for d_ino to i_ino, because nfsd readdirplus compares
+         * d_ino values to i_ino values of child entries. When called from
+         * ovl_new_inode(), ino arg is 0, so i_ino will be updated to real
+         * upper inode i_ino on ovl_inode_init() or ovl_inode_update().
+         */
+        if (inode->i_sb->s_export_op &&
+            (ovl_same_sb(inode->i_sb) || xinobits)) {
+                inode->i_ino = ino;
+                if (xinobits && fsid && !(ino >> (64 - xinobits)))
+                        inode->i_ino |= (unsigned long)fsid << (64 - xinobits);
+        } else {
+                inode->i_ino = get_next_ino();
+        }
         inode->i_mode = mode;
         inode->i_flags |= S_NOCMTIME;
 #ifdef CONFIG_FS_POSIX_ACL
···

         inode = new_inode(sb);
         if (inode)
-                ovl_fill_inode(inode, mode, rdev);
+                ovl_fill_inode(inode, mode, rdev, 0, 0);

         return inode;
 }
···
 }

 struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
-                            struct dentry *lowerdentry, struct dentry *index,
+                            struct ovl_path *lowerpath, struct dentry *index,
                             unsigned int numlower)
 {
         struct inode *realinode = upperdentry ? d_inode(upperdentry) : NULL;
         struct inode *inode;
+        struct dentry *lowerdentry = lowerpath ? lowerpath->dentry : NULL;
         bool bylower = ovl_hash_bylower(sb, upperdentry, lowerdentry, index);
+        int fsid = bylower ? lowerpath->layer->fsid : 0;
         bool is_dir;
+        unsigned long ino = 0;

         if (!realinode)
                 realinode = d_inode(lowerdentry);
···
                 if (!is_dir)
                         nlink = ovl_get_nlink(lowerdentry, upperdentry, nlink);
                 set_nlink(inode, nlink);
+                ino = key->i_ino;
         } else {
                 /* Lower hardlink that will be broken on copy up */
                 inode = new_inode(sb);
                 if (!inode)
                         goto out_nomem;
         }
-        ovl_fill_inode(inode, realinode->i_mode, realinode->i_rdev);
+        ovl_fill_inode(inode, realinode->i_mode, realinode->i_rdev, ino, fsid);
         ovl_inode_init(inode, upperdentry, lowerdentry);

         if (upperdentry && ovl_is_impuredir(upperdentry))
                 ovl_set_flag(OVL_IMPURE, inode);
+
+        if (index)
+                ovl_set_flag(OVL_INDEX, inode);

         /* Check for non-merge dir that may have whiteouts */
         if (is_dir) {
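The new ovl_map_dev_ino() helper in this file picks the reported st_dev/st_ino pair in three steps: same filesystem, xino mapping, then a fallback that differs for directories and non-directories. The sketch below restates that decision order as plain userspace C so it can be read and tested outside the kernel. All type and function names here (`struct map_ctx`, `struct stat_id`, `map_dev_ino`) are hypothetical, and kernel objects are replaced by plain integers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reported identity of an object (stand-in for kstat dev/ino). */
struct stat_id {
        uint64_t dev;
        uint64_t ino;
};

/* Per-mount facts the decision depends on (hypothetical struct). */
struct map_ctx {
        bool samefs;            /* all layers on one filesystem */
        unsigned int xino_bits; /* 0 = xino disabled, else 1..63 */
        uint64_t overlay_dev;   /* st_dev of the overlay mount */
};

/*
 * st enters holding the real dev/ino and leaves holding the values
 * the overlay would report. fsid is 0 for the upper layer.
 * overlay_ino models the overlay's own non persistent inode number,
 * pseudo_dev the anonymous device assigned to the lower filesystem.
 */
static void map_dev_ino(const struct map_ctx *c, struct stat_id *st,
                        bool is_dir, uint64_t overlay_ino,
                        unsigned int fsid, uint64_t pseudo_dev)
{
        if (c->samefs) {
                /* Real inode numbers are already unique. */
                st->dev = c->overlay_dev;
                return;
        }

        if (c->xino_bits) {
                unsigned int shift = 64 - c->xino_bits;

                if (!(st->ino >> shift)) {
                        /* High bits are free: store the fsid there. */
                        st->ino |= (uint64_t)fsid << shift;
                        st->dev = c->overlay_dev;
                        return;
                }
                /* Inode number too big: fall through to the fallback. */
        }

        if (is_dir) {
                /* Directories always report the overlay st_dev. */
                st->dev = c->overlay_dev;
                st->ino = overlay_ino;
        } else if (fsid) {
                /* Lower non-dir: unique st_dev per lower filesystem. */
                st->dev = pseudo_dev;
        }
        /* Upper non-dir keeps its real st_dev and st_ino. */
}
```

The fallback branch is what the documentation means by st_dev differing between two non-directory objects and directory st_ino being non persistent when xino is not in effect.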
+48 -19
fs/overlayfs/namei.c
···
                         if (s == next)
                                 goto invalid;
                 }
+                /*
+                 * One of the ancestor path elements in an absolute path
+                 * lookup in ovl_lookup_layer() could have been opaque and
+                 * that will stop further lookup in lower layers (d->stop=true)
+                 * But we have found an absolute redirect in descendant path
+                 * element and that should force continue lookup in lower
+                 * layers (reset d->stop).
+                 */
+                d->stop = false;
         } else {
                 if (strchr(buf, '/') != NULL)
                         goto invalid;
···
         goto out;
 }

-struct dentry *ovl_decode_fh(struct ovl_fh *fh, struct vfsmount *mnt)
+struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
+                                  bool connected)
 {
         struct dentry *real;
         int bytes;
···
         bytes = (fh->len - offsetof(struct ovl_fh, fid));
         real = exportfs_decode_fh(mnt, (struct fid *)fh->fid,
                                   bytes >> 2, (int)fh->type,
-                                  ovl_acceptable, mnt);
+                                  connected ? ovl_acceptable : NULL, mnt);
         if (IS_ERR(real)) {
                 /*
                  * Treat stale file handle to lower file as "origin unknown".
···
 {
         struct dentry *this;
         int err;
+        bool last_element = !post[0];

         this = lookup_one_len_unlocked(name, base, namelen);
         if (IS_ERR(this)) {
···
                 d->stop = true;
                 if (d->is_dir)
                         goto put_and_out;
+
+                /*
+                 * NB: handle failure to lookup non-last element when non-dir
+                 * redirects become possible
+                 */
+                WARN_ON(!last_element);
                 goto out;
         }
-        d->is_dir = true;
-        if (!d->last && ovl_is_opaquedir(this)) {
-                d->stop = d->opaque = true;
+        if (last_element)
+                d->is_dir = true;
+        if (d->last)
+                goto out;
+
+        if (ovl_is_opaquedir(this)) {
+                d->stop = true;
+                if (last_element)
+                        d->opaque = true;
                 goto out;
         }
         err = ovl_check_redirect(this, d, prelen, post);
···
 }


-int ovl_check_origin_fh(struct ovl_fs *ofs, struct ovl_fh *fh,
+int ovl_check_origin_fh(struct ovl_fs *ofs, struct ovl_fh *fh, bool connected,
                         struct dentry *upperdentry, struct ovl_path **stackp)
 {
         struct dentry *origin = NULL;
         int i;

         for (i = 0; i < ofs->numlower; i++) {
-                origin = ovl_decode_fh(fh, ofs->lower_layers[i].mnt);
+                origin = ovl_decode_real_fh(fh, ofs->lower_layers[i].mnt,
+                                            connected);
                 if (origin)
                         break;
         }
···
         if (IS_ERR_OR_NULL(fh))
                 return PTR_ERR(fh);

-        err = ovl_check_origin_fh(ofs, fh, upperdentry, stackp);
+        err = ovl_check_origin_fh(ofs, fh, false, upperdentry, stackp);
         kfree(fh);

         if (err) {
···
         struct ovl_fh *fh;
         int err;

-        fh = ovl_encode_fh(real, is_upper);
+        fh = ovl_encode_real_fh(real, is_upper);
         err = PTR_ERR(fh);
         if (IS_ERR(fh))
                 goto fail;
···
         if (IS_ERR_OR_NULL(fh))
                 return ERR_CAST(fh);

-        upper = ovl_decode_fh(fh, ofs->upper_mnt);
+        upper = ovl_decode_real_fh(fh, ofs->upper_mnt, true);
         kfree(fh);

         if (IS_ERR_OR_NULL(upper))
···

         /* Check if non-dir index is orphan and don't warn before cleaning it */
         if (!d_is_dir(index) && d_inode(index)->i_nlink == 1) {
-                err = ovl_check_origin_fh(ofs, fh, index, &stack);
+                err = ovl_check_origin_fh(ofs, fh, false, index, &stack);
                 if (err)
                         goto fail;

···
         struct ovl_fh *fh;
         int err;

-        fh = ovl_encode_fh(origin, false);
+        fh = ovl_encode_real_fh(origin, false);
         if (IS_ERR(fh))
                 return PTR_ERR(fh);

···
                 .is_dir = false,
                 .opaque = false,
                 .stop = false,
-                .last = !poe->numlower,
+                .last = ofs->config.redirect_follow ? false : !poe->numlower,
                 .redirect = NULL,
         };

···
         for (i = 0; !d.stop && i < poe->numlower; i++) {
                 struct ovl_path lower = poe->lowerstack[i];

-                d.last = i == poe->numlower - 1;
+                if (!ofs->config.redirect_follow)
+                        d.last = i == poe->numlower - 1;
+                else
+                        d.last = lower.layer->idx == roe->numlower;
+
                 err = ovl_lookup_layer(lower.dentry, &d, &this);
                 if (err)
                         goto out_put;
···
                 upperdentry = dget(index);

         if (upperdentry || ctr) {
-                if (ctr)
-                        origin = stack[0].dentry;
-                inode = ovl_get_inode(dentry->d_sb, upperdentry, origin, index,
+                inode = ovl_get_inode(dentry->d_sb, upperdentry, stack, index,
                                       ctr);
                 err = PTR_ERR(inode);
                 if (IS_ERR(inode))
                         goto out_free_oe;

+                /*
+                 * NB: handle redirected hard links when non-dir redirects
+                 * become possible
+                 */
+                WARN_ON(OVL_I(inode)->redirect);
                 OVL_I(inode)->redirect = upperredirect;
-                if (index)
-                        ovl_set_flag(OVL_INDEX, inode);
         }

         revert_creds(old_cred);
+14 -5
fs/overlayfs/overlayfs.h
···
 struct dentry *ovl_workdir(struct dentry *dentry);
 const struct cred *ovl_override_creds(struct super_block *sb);
 struct super_block *ovl_same_sb(struct super_block *sb);
-bool ovl_can_decode_fh(struct super_block *sb);
+int ovl_can_decode_fh(struct super_block *sb);
 struct dentry *ovl_indexdir(struct super_block *sb);
 bool ovl_index_all(struct super_block *sb);
 bool ovl_verify_lower(struct super_block *sb);
···
 enum ovl_path_type ovl_path_real(struct dentry *dentry, struct path *path);
 struct dentry *ovl_dentry_upper(struct dentry *dentry);
 struct dentry *ovl_dentry_lower(struct dentry *dentry);
+struct ovl_layer *ovl_layer_lower(struct dentry *dentry);
 struct dentry *ovl_dentry_real(struct dentry *dentry);
 struct dentry *ovl_i_dentry_upper(struct inode *inode);
 struct inode *ovl_inode_upper(struct inode *inode);
···
         return ovl_check_dir_xattr(dentry, OVL_XATTR_IMPURE);
 }

+static inline unsigned int ovl_xino_bits(struct super_block *sb)
+{
+        struct ovl_fs *ofs = sb->s_fs_info;
+
+        return ofs->xino_bits;
+}
+

 /* namei.c */
 int ovl_check_fh_len(struct ovl_fh *fh, int fh_len);
-struct dentry *ovl_decode_fh(struct ovl_fh *fh, struct vfsmount *mnt);
-int ovl_check_origin_fh(struct ovl_fs *ofs, struct ovl_fh *fh,
+struct dentry *ovl_decode_real_fh(struct ovl_fh *fh, struct vfsmount *mnt,
+                                  bool connected);
+int ovl_check_origin_fh(struct ovl_fs *ofs, struct ovl_fh *fh, bool connected,
                         struct dentry *upperdentry, struct ovl_path **stackp);
 int ovl_verify_set_fh(struct dentry *dentry, const char *name,
                       struct dentry *real, bool is_upper, bool set);
···
 struct inode *ovl_lookup_inode(struct super_block *sb, struct dentry *real,
                                bool is_upper);
 struct inode *ovl_get_inode(struct super_block *sb, struct dentry *upperdentry,
-                            struct dentry *lowerdentry, struct dentry *index,
+                            struct ovl_path *lowerpath, struct dentry *index,
                             unsigned int numlower);
 static inline void ovl_copyattr(struct inode *from, struct inode *to)
 {
···
 int ovl_copy_up_flags(struct dentry *dentry, int flags);
 int ovl_copy_xattr(struct dentry *old, struct dentry *new);
 int ovl_set_attr(struct dentry *upper, struct kstat *stat);
-struct ovl_fh *ovl_encode_fh(struct dentry *real, bool is_upper);
+struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper);
 int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
                    struct dentry *upper);

+16 -5
fs/overlayfs/ovl_entry.h
···
         const char *redirect_mode;
         bool index;
         bool nfs_export;
+        int xino;
+};
+
+struct ovl_sb {
+        struct super_block *sb;
+        dev_t pseudo_dev;
 };

 struct ovl_layer {
         struct vfsmount *mnt;
-        dev_t pseudo_dev;
-        /* Index of this layer in fs root (upper == 0) */
+        struct ovl_sb *fs;
+        /* Index of this layer in fs root (upper idx == 0) */
         int idx;
+        /* One fsid per unique underlying sb (upper fsid == 0) */
+        int fsid;
 };

 struct ovl_path {
···
 /* private information held for overlayfs's superblock */
 struct ovl_fs {
         struct vfsmount *upper_mnt;
-        unsigned numlower;
+        unsigned int numlower;
+        /* Number of unique lower sb that differ from upper sb */
+        unsigned int numlowerfs;
         struct ovl_layer *lower_layers;
+        struct ovl_sb *lower_fs;
         /* workbasedir is the path at workdir= mount option */
         struct dentry *workbasedir;
         /* workdir is the 'work' directory under workbasedir */
···
         const struct cred *creator_cred;
         bool tmpfile;
         bool noxattr;
-        /* sb common to all layers */
-        struct super_block *same_sb;
         /* Did we take the inuse lock? */
         bool upperdir_locked;
         bool workdir_locked;
+        /* Inode numbers in all layers do not use the high xino_bits */
+        unsigned int xino_bits;
 };

 /* private information held for every overlayfs dentry */
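The struct comments above state the numbering rule: one fsid per unique underlying superblock, with the upper filesystem taking fsid 0, and numlowerfs counting the unique lower superblocks that differ from the upper one. The code that assigns these ids is not visible in this excerpt, so the following is only a sketch of one way to satisfy that rule. `struct layer`, `assign_fsids` and `MAX_FS` are hypothetical, and superblocks are modeled as opaque pointers.

```c
#include <stddef.h>

#define MAX_FS 16       /* arbitrary cap for this sketch */

/* Simplified layer: the superblock it lives on and its assigned fsid. */
struct layer {
        const void *sb;
        int fsid;
};

/*
 * Give every lower layer an fsid: 0 when it shares the upper
 * superblock, otherwise 1..n with one value per unique superblock.
 * Returns n, the number of unique lower superblocks that differ from
 * the upper one, or -1 if more than MAX_FS are needed.
 */
static int assign_fsids(struct layer *layers, size_t nlayers,
                        const void *upper_sb)
{
        const void *seen[MAX_FS];
        int nseen = 0;

        for (size_t i = 0; i < nlayers; i++) {
                int fsid = 0;

                if (layers[i].sb != upper_sb) {
                        int j;

                        /* Reuse the id of a superblock seen before. */
                        for (j = 0; j < nseen; j++)
                                if (seen[j] == layers[i].sb)
                                        break;
                        if (j == nseen) {
                                if (nseen == MAX_FS)
                                        return -1;
                                seen[nseen++] = layers[i].sb;
                        }
                        fsid = j + 1;
                }
                layers[i].fsid = fsid;
        }
        return nseen;
}
```

Keying per-filesystem state such as the anonymous device on this fsid, rather than on the layer index, is what lets several layers on the same lower filesystem share one pseudo_dev.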
+39 -6
fs/overlayfs/readdir.c
···
 	if (!rdd->dentry)
 		return false;

+	/* Always recalc d_ino when remapping lower inode numbers */
+	if (ovl_xino_bits(rdd->dentry->d_sb))
+		return true;
+
 	/* Always recalc d_ino for parent */
 	if (strcmp(p->name, "..") == 0)
 		return true;
···
 	return cache;
 }

+/* Map inode number to lower fs unique range */
+static u64 ovl_remap_lower_ino(u64 ino, int xinobits, int fsid,
+			       const char *name, int namelen)
+{
+	if (ino >> (64 - xinobits)) {
+		pr_warn_ratelimited("overlayfs: d_ino too big (%.*s, ino=%llu, xinobits=%d)\n",
+				    namelen, name, ino, xinobits);
+		return ino;
+	}
+
+	return ino | ((u64)fsid) << (64 - xinobits);
+}
+
 /*
  * Set d_ino for upper entries. Non-upper entries should always report
  * the uppermost real inode ino and should not call this function.
···
 	struct dentry *this = NULL;
 	enum ovl_path_type type;
 	u64 ino = p->real_ino;
+	int xinobits = ovl_xino_bits(dir->d_sb);
 	int err = 0;

-	if (!ovl_same_sb(dir->d_sb))
+	if (!ovl_same_sb(dir->d_sb) && !xinobits)
 		goto out;

 	if (p->name[0] == '.') {
···

 		WARN_ON_ONCE(dir->d_sb->s_dev != stat.dev);
 		ino = stat.ino;
+	} else if (xinobits && !OVL_TYPE_UPPER(type)) {
+		ino = ovl_remap_lower_ino(ino, xinobits,
+					  ovl_layer_lower(this)->fsid,
+					  p->name, p->len);
 	}

 out:
···
 	struct ovl_dir_cache *cache;
 	struct dir_context ctx;
 	u64 parent_ino;
+	int fsid;
+	int xinobits;
 };

 static int ovl_fill_real(struct dir_context *ctx, const char *name,
···
 		container_of(ctx, struct ovl_readdir_translate, ctx);
 	struct dir_context *orig_ctx = rdt->orig_ctx;

-	if (rdt->parent_ino && strcmp(name, "..") == 0)
+	if (rdt->parent_ino && strcmp(name, "..") == 0) {
 		ino = rdt->parent_ino;
-	else if (rdt->cache) {
+	} else if (rdt->cache) {
 		struct ovl_cache_entry *p;

 		p = ovl_cache_entry_find(&rdt->cache->root, name, namelen);
 		if (p)
 			ino = p->ino;
+	} else if (rdt->xinobits) {
+		ino = ovl_remap_lower_ino(ino, rdt->xinobits, rdt->fsid,
+					  name, namelen);
 	}

 	return orig_ctx->actor(orig_ctx, name, namelen, offset, ino, d_type);
···
 	int err;
 	struct ovl_dir_file *od = file->private_data;
 	struct dentry *dir = file->f_path.dentry;
+	struct ovl_layer *lower_layer = ovl_layer_lower(dir);
 	struct ovl_readdir_translate rdt = {
 		.ctx.actor = ovl_fill_real,
 		.orig_ctx = ctx,
+		.xinobits = ovl_xino_bits(dir->d_sb),
 	};
+
+	if (rdt.xinobits && lower_layer)
+		rdt.fsid = lower_layer->fsid;

 	if (OVL_TYPE_MERGE(ovl_path_type(dir->d_parent))) {
 		struct kstat stat;
···
 		 * dir is impure then need to adjust d_ino for copied up
 		 * entries.
 		 */
-		if (ovl_same_sb(dentry->d_sb) &&
-		    (ovl_test_flag(OVL_IMPURE, d_inode(dentry)) ||
-		     OVL_TYPE_MERGE(ovl_path_type(dentry->d_parent)))) {
+		if (ovl_xino_bits(dentry->d_sb) ||
+		    (ovl_same_sb(dentry->d_sb) &&
+		     (ovl_test_flag(OVL_IMPURE, d_inode(dentry)) ||
+		      OVL_TYPE_MERGE(ovl_path_type(dentry->d_parent))))) {
 			return ovl_iterate_real(file, ctx);
 		}
 		return iterate_dir(od->realfile, ctx);
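The bit arithmetic in ovl_remap_lower_ino() can be checked outside the kernel. The following is a minimal user-space sketch, not the kernel function: the name remap_lower_ino is made up for illustration, pr_warn_ratelimited() is replaced by fprintf(), and callers are assumed to pass 0 < xinobits < 64.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * User-space sketch of the remapping done in the readdir.c hunk above.
 * Hypothetical helper; assumes 0 < xinobits < 64.
 */
static uint64_t remap_lower_ino(uint64_t ino, int xinobits, int fsid)
{
	/* The top xinobits bits must be free, otherwise leave ino alone. */
	if (ino >> (64 - xinobits)) {
		fprintf(stderr, "ino %llu too big for xinobits=%d\n",
			(unsigned long long)ino, xinobits);
		return ino;
	}

	/* Place the fsid in the top xinobits bits of the inode number. */
	return ino | ((uint64_t)fsid << (64 - xinobits));
}
```

With xinobits = 2 and fsid = 1, inode 42 maps to 0x400000000000002a. An inode number that already uses one of the top two bits is returned unchanged, which mirrors the warn-and-fall-back path in the patch.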
+137 -22
fs/overlayfs/super.c
···
 #include <linux/statfs.h>
 #include <linux/seq_file.h>
 #include <linux/posix_acl_xattr.h>
+#include <linux/exportfs.h>
 #include "overlayfs.h"

 MODULE_AUTHOR("Miklos Szeredi <miklos@szeredi.hu>");
···
 module_param_named(nfs_export, ovl_nfs_export_def, bool, 0644);
 MODULE_PARM_DESC(ovl_nfs_export_def,
 		 "Default to on or off for the NFS export feature");
+
+static bool ovl_xino_auto_def = IS_ENABLED(CONFIG_OVERLAY_FS_XINO_AUTO);
+module_param_named(xino_auto, ovl_xino_auto_def, bool, 0644);
+MODULE_PARM_DESC(ovl_xino_auto_def,
+		 "Auto enable xino feature");

 static void ovl_entry_stack_free(struct ovl_entry *oe)
 {
···
 	if (ofs->upperdir_locked)
 		ovl_inuse_unlock(ofs->upper_mnt->mnt_root);
 	mntput(ofs->upper_mnt);
-	for (i = 0; i < ofs->numlower; i++) {
+	for (i = 0; i < ofs->numlower; i++)
 		mntput(ofs->lower_layers[i].mnt);
-		free_anon_bdev(ofs->lower_layers[i].pseudo_dev);
-	}
+	for (i = 0; i < ofs->numlowerfs; i++)
+		free_anon_bdev(ofs->lower_fs[i].pseudo_dev);
 	kfree(ofs->lower_layers);
+	kfree(ofs->lower_fs);

 	kfree(ofs->config.lowerdir);
 	kfree(ofs->config.upperdir);
···
 	return ovl_redirect_dir_def ? "on" : "off";
 }

+enum {
+	OVL_XINO_OFF,
+	OVL_XINO_AUTO,
+	OVL_XINO_ON,
+};
+
+static const char * const ovl_xino_str[] = {
+	"off",
+	"auto",
+	"on",
+};
+
+static inline int ovl_xino_def(void)
+{
+	return ovl_xino_auto_def ? OVL_XINO_AUTO : OVL_XINO_OFF;
+}
+
 /**
  * ovl_show_options
  *
···
 	if (ofs->config.nfs_export != ovl_nfs_export_def)
 		seq_printf(m, ",nfs_export=%s", ofs->config.nfs_export ?
 						"on" : "off");
+	if (ofs->config.xino != ovl_xino_def())
+		seq_printf(m, ",xino=%s", ovl_xino_str[ofs->config.xino]);
 	return 0;
 }

···
 	OPT_INDEX_OFF,
 	OPT_NFS_EXPORT_ON,
 	OPT_NFS_EXPORT_OFF,
+	OPT_XINO_ON,
+	OPT_XINO_OFF,
+	OPT_XINO_AUTO,
 	OPT_ERR,
 };

···
 	{OPT_INDEX_OFF,			"index=off"},
 	{OPT_NFS_EXPORT_ON,		"nfs_export=on"},
 	{OPT_NFS_EXPORT_OFF,		"nfs_export=off"},
+	{OPT_XINO_ON,			"xino=on"},
+	{OPT_XINO_OFF,			"xino=off"},
+	{OPT_XINO_AUTO,			"xino=auto"},
 	{OPT_ERR,			NULL}
 };

···

 		case OPT_NFS_EXPORT_OFF:
 			config->nfs_export = false;
+			break;
+
+		case OPT_XINO_ON:
+			config->xino = OVL_XINO_ON;
+			break;
+
+		case OPT_XINO_OFF:
+			config->xino = OVL_XINO_OFF;
+			break;
+
+		case OPT_XINO_AUTO:
+			config->xino = OVL_XINO_AUTO;
 			break;

 		default:
···
 static int ovl_lower_dir(const char *name, struct path *path,
 			 struct ovl_fs *ofs, int *stack_depth, bool *remote)
 {
+	int fh_type;
 	int err;

 	err = ovl_mount_dir_noesc(name, path);
···
 	 * The inodes index feature and NFS export need to encode and decode
 	 * file handles, so they require that all layers support them.
 	 */
+	fh_type = ovl_can_decode_fh(path->dentry->d_sb);
 	if ((ofs->config.nfs_export ||
-	     (ofs->config.index && ofs->config.upperdir)) &&
-	    !ovl_can_decode_fh(path->dentry->d_sb)) {
+	     (ofs->config.index && ofs->config.upperdir)) && !fh_type) {
 		ofs->config.index = false;
 		ofs->config.nfs_export = false;
 		pr_warn("overlayfs: fs on '%s' does not support file handles, falling back to index=off,nfs_export=off.\n",
 			name);
 	}
+
+	/* Check if lower fs has 32bit inode numbers */
+	if (fh_type != FILEID_INO32_GEN)
+		ofs->xino_bits = 0;

 	return 0;

···
 {
 	struct vfsmount *mnt = ofs->upper_mnt;
 	struct dentry *temp;
+	int fh_type;
 	int err;

 	err = mnt_want_write(mnt);
···
 	}

 	/* Check if upper/work fs supports file handles */
-	if (ofs->config.index &&
-	    !ovl_can_decode_fh(ofs->workdir->d_sb)) {
+	fh_type = ovl_can_decode_fh(ofs->workdir->d_sb);
+	if (ofs->config.index && !fh_type) {
 		ofs->config.index = false;
 		pr_warn("overlayfs: upper fs does not support file handles, falling back to index=off.\n");
 	}
+
+	/* Check if upper fs has 32bit inode numbers */
+	if (fh_type != FILEID_INO32_GEN)
+		ofs->xino_bits = 0;

 	/* NFS export of r/w mount depends on index */
 	if (ofs->config.nfs_export && !ofs->config.index) {
···
 	return err;
 }

+/* Get a unique fsid for the layer */
+static int ovl_get_fsid(struct ovl_fs *ofs, struct super_block *sb)
+{
+	unsigned int i;
+	dev_t dev;
+	int err;
+
+	/* fsid 0 is reserved for upper fs even with non upper overlay */
+	if (ofs->upper_mnt && ofs->upper_mnt->mnt_sb == sb)
+		return 0;
+
+	for (i = 0; i < ofs->numlowerfs; i++) {
+		if (ofs->lower_fs[i].sb == sb)
+			return i + 1;
+	}
+
+	err = get_anon_bdev(&dev);
+	if (err) {
+		pr_err("overlayfs: failed to get anonymous bdev for lowerpath\n");
+		return err;
+	}
+
+	ofs->lower_fs[ofs->numlowerfs].sb = sb;
+	ofs->lower_fs[ofs->numlowerfs].pseudo_dev = dev;
+	ofs->numlowerfs++;
+
+	return ofs->numlowerfs;
+}
+
 static int ovl_get_lower_layers(struct ovl_fs *ofs, struct path *stack,
 				unsigned int numlower)
 {
···
 				    GFP_KERNEL);
 	if (ofs->lower_layers == NULL)
 		goto out;
+
+	ofs->lower_fs = kcalloc(numlower, sizeof(struct ovl_sb),
+				GFP_KERNEL);
+	if (ofs->lower_fs == NULL)
+		goto out;
+
 	for (i = 0; i < numlower; i++) {
 		struct vfsmount *mnt;
-		dev_t dev;
+		int fsid;

-		err = get_anon_bdev(&dev);
-		if (err) {
-			pr_err("overlayfs: failed to get anonymous bdev for lowerpath\n");
+		err = fsid = ovl_get_fsid(ofs, stack[i].mnt->mnt_sb);
+		if (err < 0)
 			goto out;
-		}

 		mnt = clone_private_mount(&stack[i]);
 		err = PTR_ERR(mnt);
 		if (IS_ERR(mnt)) {
 			pr_err("overlayfs: failed to clone lowerpath\n");
-			free_anon_bdev(dev);
 			goto out;
 		}
+
 		/*
 		 * Make lower layers R/O. That way fchmod/fchown on lower file
 		 * will fail instead of modifying lower fs.
···
 		mnt->mnt_flags |= MNT_READONLY | MNT_NOATIME;

 		ofs->lower_layers[ofs->numlower].mnt = mnt;
-		ofs->lower_layers[ofs->numlower].pseudo_dev = dev;
 		ofs->lower_layers[ofs->numlower].idx = i + 1;
+		ofs->lower_layers[ofs->numlower].fsid = fsid;
+		if (fsid) {
+			ofs->lower_layers[ofs->numlower].fs =
+				&ofs->lower_fs[fsid - 1];
+		}
 		ofs->numlower++;
-
-		/* Check if all lower layers are on same sb */
-		if (i == 0)
-			ofs->same_sb = mnt->mnt_sb;
-		else if (ofs->same_sb != mnt->mnt_sb)
-			ofs->same_sb = NULL;
 	}
+
+	/*
+	 * When all layers on same fs, overlay can use real inode numbers.
+	 * With mount option "xino=on", mounter declares that there are enough
+	 * free high bits in underlying fs to hold the unique fsid.
+	 * If overlayfs does encounter underlying inodes using the high xino
+	 * bits reserved for fsid, it emits a warning and uses the original
+	 * inode number.
+	 */
+	if (!ofs->numlowerfs || (ofs->numlowerfs == 1 && !ofs->upper_mnt)) {
+		ofs->xino_bits = 0;
+		ofs->config.xino = OVL_XINO_OFF;
+	} else if (ofs->config.xino == OVL_XINO_ON && !ofs->xino_bits) {
+		/*
+		 * This is a roundup of number of bits needed for numlowerfs+1
+		 * (i.e. ilog2(numlowerfs+1 - 1) + 1). fsid 0 is reserved for
+		 * upper fs even with non upper overlay.
+		 */
+		BUILD_BUG_ON(ilog2(OVL_MAX_STACK) > 31);
+		ofs->xino_bits = ilog2(ofs->numlowerfs) + 1;
+	}
+
+	if (ofs->xino_bits) {
+		pr_info("overlayfs: \"xino\" feature enabled using %d upper inode bits.\n",
+			ofs->xino_bits);
+	}
+
 	err = 0;
 out:
 	return err;
···

 	ofs->config.index = ovl_index_def;
 	ofs->config.nfs_export = ovl_nfs_export_def;
+	ofs->config.xino = ovl_xino_def();
 	err = ovl_parse_opt((char *) data, &ofs->config);
 	if (err)
 		goto out_err;
···

 	sb->s_stack_depth = 0;
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
+	/* Assume underlaying fs uses 32bit inodes unless proven otherwise */
+	if (ofs->config.xino != OVL_XINO_OFF)
+		ofs->xino_bits = BITS_PER_LONG - 32;
+
 	if (ofs->config.upperdir) {
 		if (!ofs->config.workdir) {
 			pr_err("overlayfs: missing 'workdir'\n");
···
 	/* If the upper fs is nonexistent, we mark overlayfs r/o too */
 	if (!ofs->upper_mnt)
 		sb->s_flags |= SB_RDONLY;
-	else if (ofs->upper_mnt->mnt_sb != ofs->same_sb)
-		ofs->same_sb = NULL;

 	if (!(ovl_force_readonly(ofs)) && ofs->config.index) {
 		err = ovl_get_indexdir(ofs, oe, &upperpath);
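The fsid bookkeeping in ovl_get_fsid() and the "roundup" used for xino_bits under xino=on can be illustrated with a small user-space sketch. Everything here is an assumption for illustration only: struct fsid_table, MAX_LOWER_FS, get_fsid(), ilog2_u32() and xino_bits_for() are hypothetical names, superblocks are stood in for by opaque pointers, and the anonymous bdev allocation is left out.

```c
#include <stddef.h>

#define MAX_LOWER_FS 8	/* hypothetical bound, only for this sketch */

struct fsid_table {
	const void *sb[MAX_LOWER_FS];	/* stands in for struct super_block * */
	unsigned int numlowerfs;
	const void *upper_sb;		/* NULL when there is no upper layer */
};

/* Return 0 for the upper fs, a 1-based index per unique lower fs, -1 if full. */
static int get_fsid(struct fsid_table *t, const void *sb)
{
	unsigned int i;

	if (t->upper_sb && t->upper_sb == sb)
		return 0;

	/* Layers that share a filesystem share an fsid. */
	for (i = 0; i < t->numlowerfs; i++) {
		if (t->sb[i] == sb)
			return (int)(i + 1);
	}

	if (t->numlowerfs == MAX_LOWER_FS)
		return -1;

	t->sb[t->numlowerfs++] = sb;
	return (int)t->numlowerfs;
}

/* Floor of log2(n) for n >= 1; a plain-C stand-in for the kernel's ilog2(). */
static int ilog2_u32(unsigned int n)
{
	int log = -1;

	while (n) {
		n >>= 1;
		log++;
	}
	return log;
}

/* Bits needed to encode fsid values 0..numlowerfs (numlowerfs >= 1). */
static int xino_bits_for(unsigned int numlowerfs)
{
	return ilog2_u32(numlowerfs) + 1;
}
```

With an upper layer and two distinct lower filesystems the fsids are 0, 1 and 2, so two high bits are enough. In the patch this count is only computed for xino=on when xino_bits has already been cleared; when every layer reports FILEID_INO32_GEN, the default of BITS_PER_LONG - 32 is kept instead.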
+34 -5
fs/overlayfs/util.c
···
 {
 	struct ovl_fs *ofs = sb->s_fs_info;

-	return ofs->same_sb;
+	if (!ofs->numlowerfs)
+		return ofs->upper_mnt->mnt_sb;
+	else if (ofs->numlowerfs == 1 && !ofs->upper_mnt)
+		return ofs->lower_fs[0].sb;
+	else
+		return NULL;
 }

-bool ovl_can_decode_fh(struct super_block *sb)
+/*
+ * Check if underlying fs supports file handles and try to determine encoding
+ * type, in order to deduce maximum inode number used by fs.
+ *
+ * Return 0 if file handles are not supported.
+ * Return 1 (FILEID_INO32_GEN) if fs uses the default 32bit inode encoding.
+ * Return -1 if fs uses a non default encoding with unknown inode size.
+ */
+int ovl_can_decode_fh(struct super_block *sb)
 {
-	return (sb->s_export_op && sb->s_export_op->fh_to_dentry &&
-		!uuid_is_null(&sb->s_uuid));
+	if (!sb->s_export_op || !sb->s_export_op->fh_to_dentry ||
+	    uuid_is_null(&sb->s_uuid))
+		return 0;
+
+	return sb->s_export_op->encode_fh ? -1 : FILEID_INO32_GEN;
 }

 struct dentry *ovl_indexdir(struct super_block *sb)
···
 	return oe->numlower ? oe->lowerstack[0].dentry : NULL;
 }

+struct ovl_layer *ovl_layer_lower(struct dentry *dentry)
+{
+	struct ovl_entry *oe = dentry->d_fsdata;
+
+	return oe->numlower ? oe->lowerstack[0].layer : NULL;
+}
+
 struct dentry *ovl_dentry_real(struct dentry *dentry)
 {
 	return ovl_dentry_upper(dentry) ?: ovl_dentry_lower(dentry);
···
 void ovl_inode_init(struct inode *inode, struct dentry *upperdentry,
 		    struct dentry *lowerdentry)
 {
+	struct inode *realinode = d_inode(upperdentry ?: lowerdentry);
+
 	if (upperdentry)
 		OVL_I(inode)->__upperdentry = upperdentry;
 	if (lowerdentry)
 		OVL_I(inode)->lower = igrab(d_inode(lowerdentry));

-	ovl_copyattr(d_inode(upperdentry ?: lowerdentry), inode);
+	ovl_copyattr(realinode, inode);
+	if (!inode->i_ino)
+		inode->i_ino = realinode->i_ino;
 }

 void ovl_inode_update(struct inode *inode, struct dentry *upperdentry)
···
 	smp_wmb();
 	OVL_I(inode)->__upperdentry = upperdentry;
 	if (inode_unhashed(inode)) {
+		if (!inode->i_ino)
+			inode->i_ino = upperinode->i_ino;
 		inode->i_private = upperinode;
 		__insert_inode_hash(inode, (unsigned long) upperinode);
 	}
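The change of ovl_can_decode_fh() from bool to a three-valued int is easy to misread, since callers such as ovl_lower_dir() now test it in two ways: `!fh_type` for "no file handles at all" and `fh_type != FILEID_INO32_GEN` for "cannot assume 32bit inode numbers". Below is a minimal sketch of that logic with mocked-up types. The structs mock_export_ops and mock_sb and the function can_decode_fh() are hypothetical stand-ins, and FILEID_INO32_GEN is defined as 1 only because the comment in the patch says so.

```c
#include <stdbool.h>
#include <stddef.h>

#define FILEID_INO32_GEN 1	/* value taken from the comment in the patch */

/* Mock of the parts of struct export_operations that the check looks at. */
struct mock_export_ops {
	bool has_fh_to_dentry;
	bool has_encode_fh;
};

/* Mock of the parts of struct super_block that the check looks at. */
struct mock_sb {
	const struct mock_export_ops *export_op;
	bool uuid_is_null;
};

/*
 * 0: no file handle support.
 * FILEID_INO32_GEN: default encoding, 32bit inode numbers can be assumed.
 * -1: custom encode_fh, inode number width unknown.
 */
static int can_decode_fh(const struct mock_sb *sb)
{
	if (!sb->export_op || !sb->export_op->has_fh_to_dentry ||
	    sb->uuid_is_null)
		return 0;

	return sb->export_op->has_encode_fh ? -1 : FILEID_INO32_GEN;
}
```

Both 0 and -1 make the callers clear xino_bits, while only 0 makes them fall back to index=off and nfs_export=off. That is why the return value cannot stay a bool.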