Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

NFSD: Track SCSI Persistent Registration Fencing per Client with xarray

When a client holding pNFS SCSI layouts becomes unresponsive, the
server revokes access by preempting the client's SCSI persistent
reservation key. A layout recall is issued for each layout the
client holds; if the client fails to respond, each recall triggers
a fence operation. The first preempt for a given device succeeds
and removes the client's key registration. Subsequent preempts for
the same device fail because the key is no longer registered.
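The failure mode above can be modeled in userspace. This is a hypothetical sketch: `struct fake_lun`, `lun_preempt()` and the `STS_*` codes are invented stand-ins, not kernel or SCSI APIs. It only shows why one fence per recalled layout, with no per-device tracking, produces one successful preempt followed by failures.

```c
#include <stdint.h>

/*
 * Hypothetical userspace model of one SCSI LUN's registration list.
 * STS_* values are invented stand-ins, not kernel or SCSI status codes.
 */
#define MAX_KEYS 8
#define STS_OK 0
#define STS_NOT_REGISTERED 1

struct fake_lun {
	uint64_t keys[MAX_KEYS];
	int nkeys;
};

static void lun_register(struct fake_lun *lun, uint64_t key)
{
	if (lun->nkeys < MAX_KEYS)
		lun->keys[lun->nkeys++] = key;
}

/* Drop @victim's registration; fails once it is no longer registered. */
static int lun_preempt(struct fake_lun *lun, uint64_t victim)
{
	for (int i = 0; i < lun->nkeys; i++) {
		if (lun->keys[i] == victim) {
			lun->keys[i] = lun->keys[--lun->nkeys];
			return STS_OK;
		}
	}
	return STS_NOT_REGISTERED;
}

/* One fence per unanswered layout recall, with no per-device tracking. */
static int fence_without_tracking(struct fake_lun *lun, uint64_t client_key,
				  int nr_layouts)
{
	int failures = 0;

	for (int i = 0; i < nr_layouts; i++)
		if (lun_preempt(lun, client_key) != STS_OK)
			failures++;
	return failures;
}
```

With the server key and one client key registered and three layouts recalled on the same device, the first preempt removes the client's key and the remaining two report failure.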

Update the NFS server to track SCSI persistent registration fencing
per client and per device, using an xarray (cl_dev_fences)
embedded in struct nfs4_client.

Each xarray entry is indexed by the dev_t of a block device
registered by the client. The entry carries a flag, implemented as
the xarray mark NFSD_MDS_PR_FENCED, that records whether this
device has already been fenced for the corresponding client.

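When the server issues a persistent registration key to a client,
it creates a new xarray entry at the dev_t index with the fenced
flag initialized to 0. If an entry for that device already exists,
because the client is re-registering after an earlier fence, the
flag is cleared instead. Entries are inserted only on first
registration, so the xarray is bounded by the number of devices the
client has accessed.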
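The insert-or-clear behavior can be sketched in userspace as follows. This is a hypothetical model, not the kernel code: a small fixed array stands in for the xarray, a `bool` stands in for the NFSD_MDS_PR_FENCED mark, and `struct client_fences`, `fence_lookup()` and `fence_insert()` are invented names. The re-registration case follows the comment on nfsd4_scsi_fence_insert() in the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical model of cl_dev_fences: one entry per dev_t with a
 * "fenced" flag. A fixed array stands in for the kernel's xarray.
 */
#define MAX_DEVS 16

struct dev_fence {
	dev_t dev;
	bool fenced;
};

struct client_fences {
	struct dev_fence ent[MAX_DEVS];
	size_t n;
};

static struct dev_fence *fence_lookup(struct client_fences *cf, dev_t dev)
{
	for (size_t i = 0; i < cf->n; i++)
		if (cf->ent[i].dev == dev)
			return &cf->ent[i];
	return NULL;
}

/*
 * First registration adds an entry with the flag clear. Re-registration
 * clears the flag on the existing entry instead of adding a duplicate.
 */
static int fence_insert(struct client_fences *cf, dev_t dev)
{
	struct dev_fence *e = fence_lookup(cf, dev);

	if (e) {
		e->fenced = false;
		return 0;
	}
	if (cf->n == MAX_DEVS)
		return -ENOMEM; /* the model's analogue of an allocation failure */
	cf->ent[cf->n].dev = dev;
	cf->ent[cf->n].fenced = false;
	cf->n++;
	return 0;
}
```

Registering the same device twice leaves a single entry, which is what keeps the table bounded by the number of distinct devices.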

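Before performing a fence via nfsd4_scsi_fence_client, the server
checks the corresponding entry using the device's dev_t. If the
fenced flag is already set, the fence operation is skipped;
otherwise, the flag is set to 1 and fencing proceeds. If the
preempt fails in a way that means the command could not have
reached the device (a negative errno, or PR_STS_PATH_FAILED,
PR_STS_PATH_FAST_FAILED or PR_STS_RETRY_PATH_FAILURE), the flag is
cleared so that a later fence can retry.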
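A minimal userspace sketch of the check-then-fence step, assuming a fixed array in place of the xarray and a counter in place of the real pr_preempt() call. `fence_set()` and `fence_client()` here are hypothetical names; in the kernel the test-and-set runs under xa_lock, while this model is single-threaded. As with xarray marks, a device with no entry cannot be marked, so the fence always proceeds for it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of cl_dev_fences: one fenced flag per dev_t entry. */
#define MAX_DEVS 16

struct dev_fence {
	dev_t dev;
	bool fenced;
};

struct client_fences {
	struct dev_fence ent[MAX_DEVS];
	size_t n;
};

/*
 * Test-and-set: returns true when @dev was already fenced, meaning the
 * preempt is skipped. With no entry there is nothing to mark, so the
 * fence always proceeds.
 */
static bool fence_set(struct client_fences *cf, dev_t dev)
{
	for (size_t i = 0; i < cf->n; i++) {
		if (cf->ent[i].dev == dev) {
			bool skip = cf->ent[i].fenced;

			cf->ent[i].fenced = true;
			return skip;
		}
	}
	return false;
}

/* Only the first fence per (client, device) reaches the preempt step. */
static void fence_client(struct client_fences *cf, dev_t dev, int *preempts)
{
	if (fence_set(cf, dev))
		return;
	(*preempts)++; /* stand-in for the pr_preempt() call */
}
```

If three recalls for the same device time out, only one preempt is issued. The other two return early instead of failing against a key that is already gone.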

The xarray is destroyed when the nfs4_client is released in
__destroy_client.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

Authored by Dai Ngo and committed by Chuck Lever.
a0ed7975 7b546bd8

3 files changed, 81 insertions(+)

fs/nfsd/blocklayout.c (+72)
@@ ... @@
 #endif /* CONFIG_NFSD_BLOCKLAYOUT */
 
 #ifdef CONFIG_NFSD_SCSILAYOUT
+
+#define NFSD_MDS_PR_FENCED XA_MARK_0
+
+/*
+ * Clear the fence flag if the device already has an entry. This occurs
+ * when a client re-registers after a previous fence, allowing new
+ * layouts for this device.
+ *
+ * Insert only on first registration. This bounds cl_dev_fences to the
+ * count of devices this client has accessed, preventing unbounded growth.
+ */
+static inline int nfsd4_scsi_fence_insert(struct nfs4_client *clp,
+					  dev_t device)
+{
+	struct xarray *xa = &clp->cl_dev_fences;
+	int ret;
+
+	xa_lock(xa);
+	ret = __xa_insert(xa, device, XA_ZERO_ENTRY, GFP_KERNEL);
+	if (ret == -EBUSY) {
+		__xa_clear_mark(xa, device, NFSD_MDS_PR_FENCED);
+		ret = 0;
+	}
+	xa_unlock(xa);
+	return ret;
+}
+
+static inline bool nfsd4_scsi_fence_set(struct nfs4_client *clp, dev_t device)
+{
+	struct xarray *xa = &clp->cl_dev_fences;
+	bool skip;
+
+	xa_lock(xa);
+	skip = xa_get_mark(xa, device, NFSD_MDS_PR_FENCED);
+	if (!skip)
+		__xa_set_mark(xa, device, NFSD_MDS_PR_FENCED);
+	xa_unlock(xa);
+	return skip;
+}
+
+static inline void nfsd4_scsi_fence_clear(struct nfs4_client *clp, dev_t device)
+{
+	xa_clear_mark(&clp->cl_dev_fences, device, NFSD_MDS_PR_FENCED);
+}
+
 #define NFSD_MDS_PR_KEY		0x0100000000000000ULL
 
 /*
@@ ... @@
 		goto out_free_dev;
 	}
 
+	ret = nfsd4_scsi_fence_insert(clp, sb->s_bdev->bd_dev);
+	if (ret < 0)
+		goto out_free_dev;
+
 	ret = ops->pr_register(sb->s_bdev, 0, NFSD_MDS_PR_KEY, true);
 	if (ret) {
 		pr_err("pNFS: failed to register key for device %s.\n",
@@ ... @@
 	struct block_device *bdev = file->nf_file->f_path.mnt->mnt_sb->s_bdev;
 	int status;
 
+	if (nfsd4_scsi_fence_set(clp, bdev->bd_dev))
+		return;
+
 	status = bdev->bd_disk->fops->pr_ops->pr_preempt(bdev, NFSD_MDS_PR_KEY,
 			nfsd4_scsi_pr_key(clp),
 			PR_EXCLUSIVE_ACCESS_REG_ONLY, true);
+	/*
+	 * Reset to allow retry only when the command could not have
+	 * reached the device. Negative status means a local error
+	 * (e.g., -ENOMEM) prevented the command from being sent.
+	 * PR_STS_PATH_FAILED, PR_STS_PATH_FAST_FAILED, and
+	 * PR_STS_RETRY_PATH_FAILURE indicate transport path failures
+	 * before device delivery.
+	 *
+	 * For all other errors, the command may have reached the device
+	 * and the preempt may have succeeded. Avoid resetting, since
+	 * retrying a successful preempt returns PR_STS_IOERR or
+	 * PR_STS_RESERVATION_CONFLICT, which would cause an infinite
+	 * retry loop.
+	 */
+	if (status < 0 ||
+	    status == PR_STS_PATH_FAILED ||
+	    status == PR_STS_PATH_FAST_FAILED ||
+	    status == PR_STS_RETRY_PATH_FAILURE)
+		nfsd4_scsi_fence_clear(clp, bdev->bd_dev);
+
 	trace_nfsd_pnfs_fence(clp, bdev->bd_disk->disk_name, status);
 }
 
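The reset decision in nfsd4_scsi_fence_client() is a pure function of the preempt status, so it can be sketched on its own. This is a hypothetical userspace model: the `MODEL_STS_*` enumerators are placeholders named after the kernel's PR_STS_* codes, and their numeric values are not the kernel's. Only the classification logic mirrors the patch.

```c
#include <stdbool.h>

/*
 * Placeholder status codes named after the kernel's PR_STS_* values.
 * The numbers are arbitrary and do NOT match the kernel's definitions.
 */
enum model_pr_status {
	MODEL_STS_SUCCESS = 0,
	MODEL_STS_IOERR,
	MODEL_STS_RESERVATION_CONFLICT,
	MODEL_STS_RETRY_PATH_FAILURE,
	MODEL_STS_PATH_FAST_FAILED,
	MODEL_STS_PATH_FAILED,
};

/*
 * True when the fenced flag should be cleared so a later fence can retry:
 * a negative errno (command never sent) or a transport path failure
 * (command never reached the device). Any other status may mean the
 * preempt already took effect, so the flag stays set.
 */
static bool fence_should_reset(int status)
{
	return status < 0 ||
	       status == MODEL_STS_PATH_FAILED ||
	       status == MODEL_STS_PATH_FAST_FAILED ||
	       status == MODEL_STS_RETRY_PATH_FAILURE;
}
```

Keeping the flag set on I/O errors and reservation conflicts is the conservative choice. Retrying a preempt that already succeeded would keep returning those errors, so clearing on them would loop forever.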

fs/nfsd/nfs4state.c (+6)
@@ ... @@
 #ifdef CONFIG_NFSD_PNFS
 	INIT_LIST_HEAD(&clp->cl_lo_states);
 #endif
+#ifdef CONFIG_NFSD_SCSILAYOUT
+	xa_init(&clp->cl_dev_fences);
+#endif
 	INIT_LIST_HEAD(&clp->async_copies);
 	spin_lock_init(&clp->async_lock);
 	spin_lock_init(&clp->cl_lock);
@@ ... @@
 	svc_xprt_put(clp->cl_cb_conn.cb_xprt);
 	atomic_add_unless(&nn->nfs4_client_count, -1, 0);
 	nfsd4_dec_courtesy_client_count(nn, clp);
+#ifdef CONFIG_NFSD_SCSILAYOUT
+	xa_destroy(&clp->cl_dev_fences);
+#endif
 	free_client(clp);
 	wake_up_all(&expiry_wq);
 }

fs/nfsd/state.h (+3)
@@ ... @@
 
 	struct nfsd4_cb_recall_any	*cl_ra;
 	time64_t		cl_ra_time;
+#ifdef CONFIG_NFSD_SCSILAYOUT
+	struct xarray		cl_dev_fences;
+#endif
 };
 
 /* struct nfs4_client_reset