Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'block-6.1-2022-10-20' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

- NVMe pull request via Christoph:
    - fix nvme-hwmon for DMA non-coherent architectures (Serge Semin)
    - add an nvme-hwmon maintainer (Christoph Hellwig)
    - fix error pointer dereference in error handling (Dan Carpenter)
    - fix invalid memory reference in nvmet_subsys_attr_qid_max_show
      (Daniel Wagner)
    - don't limit the DMA segment size in nvme-apple (Russell King)
    - fix workqueue MEM_RECLAIM flushing dependency (Sagi Grimberg)
    - disable write zeroes on various Kingston SSDs (Xander Li)

- fix a memory leak with block device tracing (Ye)

- flexible-array fix for ublk (Yushan)

- document the ublk recovery feature from this merge window
  (ZiyangZhang)

- remove dead bfq variable in struct (Yuwei)

- fix a NULL pointer dereference in blk_mq_clear_rq_mapping() on the
  error handling path (Yu)

- add an IRQ safety check for the cached bio freeing (Pavel)

- drbd: only clone the bio when a backing device is present (Christoph)

* tag 'block-6.1-2022-10-20' of git://git.kernel.dk/linux:
blktrace: remove unnecessary stop block trace in 'blk_trace_shutdown'
blktrace: fix possible memleak in '__blk_trace_remove'
blktrace: introduce 'blk_trace_{start,stop}' helper
bio: safeguard REQ_ALLOC_CACHE bio put
block, bfq: remove unused variable for bfq_queue
drbd: only clone bio if we have a backing device
ublk_drv: use flexible-array member instead of zero-length array
nvmet: fix invalid memory reference in nvmet_subsys_attr_qid_max_show
nvmet: fix workqueue MEM_RECLAIM flushing dependency
nvme-hwmon: kmalloc the NVME SMART log buffer
nvme-hwmon: consistently ignore errors from nvme_hwmon_init
nvme: add Guenter as nvme-hwmon maintainer
nvme-apple: don't limit DMA segment size
nvme-pci: disable write zeroes on various Kingston SSD
nvme: fix error pointer dereference in error handling
Documentation: document ublk user recovery feature
blk-mq: fix null pointer dereference in blk_mq_clear_rq_mapping()

14 files changed, 135 insertions(+), 76 deletions(-)
Documentation/block/ublk.rst (+36)
```diff
@@ -144,6 +144,42 @@
   For retrieving device info via ``ublksrv_ctrl_dev_info``. It is the server's
   responsibility to save IO target specific info in userspace.
 
+- ``UBLK_CMD_START_USER_RECOVERY``
+
+  This command is valid if ``UBLK_F_USER_RECOVERY`` feature is enabled. This
+  command is accepted after the old process has exited, ublk device is quiesced
+  and ``/dev/ublkc*`` is released. User should send this command before he starts
+  a new process which re-opens ``/dev/ublkc*``. When this command returns, the
+  ublk device is ready for the new process.
+
+- ``UBLK_CMD_END_USER_RECOVERY``
+
+  This command is valid if ``UBLK_F_USER_RECOVERY`` feature is enabled. This
+  command is accepted after ublk device is quiesced and a new process has
+  opened ``/dev/ublkc*`` and get all ublk queues be ready. When this command
+  returns, ublk device is unquiesced and new I/O requests are passed to the
+  new process.
+
+- user recovery feature description
+
+  Two new features are added for user recovery: ``UBLK_F_USER_RECOVERY`` and
+  ``UBLK_F_USER_RECOVERY_REISSUE``.
+
+  With ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublk server's io
+  handler) is dying, ublk does not delete ``/dev/ublkb*`` during the whole
+  recovery stage and ublk device ID is kept. It is ublk server's
+  responsibility to recover the device context by its own knowledge.
+  Requests which have not been issued to userspace are requeued. Requests
+  which have been issued to userspace are aborted.
+
+  With ``UBLK_F_USER_RECOVERY_REISSUE`` set, after one ubq_daemon(ublk
+  server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
+  requests which have been issued to userspace are requeued and will be
+  re-issued to the new process after handling ``UBLK_CMD_END_USER_RECOVERY``.
+  ``UBLK_F_USER_RECOVERY_REISSUE`` is designed for backends who tolerate
+  double-write since the driver may issue the same I/O request twice. It
+  might be useful to a read-only FS or a VM backend.
+
 Data plane
 ----------
 
```
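The recovery rules documented in this hunk can be condensed into a small decision model: which control command is accepted in which state, and what happens to a request when a ubq_daemon dies. This is a sketch only. The `MODEL_*` flags, `struct model_dev`, and all functions are hypothetical, do not use the real uapi values or driver internals, and assume that the reissue flag is always set together with the basic recovery flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits for this model only (not the uapi values). */
#define MODEL_F_USER_RECOVERY         (1u << 0)
#define MODEL_F_USER_RECOVERY_REISSUE (1u << 1)

/* Hypothetical snapshot of the state the two recovery commands care about. */
struct model_dev {
	unsigned int flags;
	bool old_process_exited;
	bool quiesced;
	bool char_dev_open;	/* /dev/ublkc* currently opened by a process */
	bool all_queues_ready;	/* the new process has readied every queue */
};

/* START_USER_RECOVERY: old process gone, device quiesced, /dev/ublkc* released. */
static bool can_start_recovery(const struct model_dev *d)
{
	return (d->flags & MODEL_F_USER_RECOVERY) && d->old_process_exited &&
	       d->quiesced && !d->char_dev_open;
}

/* END_USER_RECOVERY: device quiesced, /dev/ublkc* re-opened, all queues ready. */
static bool can_end_recovery(const struct model_dev *d)
{
	return (d->flags & MODEL_F_USER_RECOVERY) && d->quiesced &&
	       d->char_dev_open && d->all_queues_ready;
}

enum req_fate { REQ_REQUEUE, REQ_ABORT };

/* Fate of one request when a ubq_daemon dies while recovery is enabled. */
static enum req_fate fate_on_daemon_death(unsigned int flags,
					  bool issued_to_userspace)
{
	assert(flags & MODEL_F_USER_RECOVERY);
	if (!issued_to_userspace)
		return REQ_REQUEUE;	/* never reached the old server */
	return (flags & MODEL_F_USER_RECOVERY_REISSUE) ? REQ_REQUEUE : REQ_ABORT;
}
```

With only the recovery flag set, `fate_on_daemon_death(MODEL_F_USER_RECOVERY, true)` yields `REQ_ABORT`; adding the reissue flag turns the same call into `REQ_REQUEUE`, which is why the text limits that mode to backends that tolerate a duplicated write.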
MAINTAINERS (+6)
```diff
@@ -14713,6 +14713,12 @@
 F:	drivers/nvme/target/fabrics-cmd-auth.c
 F:	include/linux/nvme-auth.h
 
+NVM EXPRESS HARDWARE MONITORING SUPPORT
+M:	Guenter Roeck <linux@roeck-us.net>
+L:	linux-nvme@lists.infradead.org
+S:	Supported
+F:	drivers/nvme/host/hwmon.c
+
 NVM EXPRESS FC TRANSPORT DRIVERS
 M:	James Smart <james.smart@broadcom.com>
 L:	linux-nvme@lists.infradead.org
```
block/bfq-iosched.h (-4)
```diff
@@ -369,11 +369,7 @@
 	unsigned long split_time; /* time of last split */
 
 	unsigned long first_IO_time; /* time of first I/O for this queue */
-
 	unsigned long creation_time; /* when this queue is created */
-
-	/* max service rate measured so far */
-	u32 max_service_rate;
 
 	/*
 	 * Pointer to the waker queue for this queue, i.e., to the
```
block/bio.c (+1 -1)
```diff
@@ -741,7 +741,7 @@
 			return;
 	}
 
-	if (bio->bi_opf & REQ_ALLOC_CACHE) {
+	if ((bio->bi_opf & REQ_ALLOC_CACHE) && !WARN_ON_ONCE(in_interrupt())) {
 		struct bio_alloc_cache *cache;
 
 		bio_uninit(bio);
```
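The patched condition couples a feature test with a context test that both warns and disables the fast path: a bio flagged `REQ_ALLOC_CACHE` goes back to the per-CPU cache only outside interrupt context, and is freed normally otherwise. The userspace model below keeps just that shape. `in_irq_ctx` stands in for `in_interrupt()`, `warn_on_once()` for `WARN_ON_ONCE()`, and a singly linked free list for `bio_alloc_cache`; all of these names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bio that may belong to a lockless cache. */
struct obj {
	bool from_cache;	/* models REQ_ALLOC_CACHE in bio->bi_opf */
	struct obj *next;
};

static struct obj *cache_head;	/* models the per-CPU bio_alloc_cache list */
static unsigned int cache_count;
static bool in_irq_ctx;		/* models in_interrupt() */
static unsigned int warnings;	/* how many warnings were actually emitted */

/* Models WARN_ON_ONCE(): returns cond every time, reports only once. */
static bool warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warnings++;
	}
	return cond;
}

/* Same shape as the patched test in bio_put(): recycle only from a safe
 * context, otherwise fall back to a plain free. */
static void obj_put(struct obj *o)
{
	assert(o != NULL);
	if (o->from_cache && !warn_on_once(in_irq_ctx)) {
		o->next = cache_head;
		cache_head = o;
		cache_count++;
		return;
	}
	free(o);
}
```

Because `WARN_ON_ONCE()` still evaluates to its condition after the first report, every later put from interrupt context keeps taking the safe free path; only the log message is rate-limited.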
block/blk-mq.c (+5 -2)
```diff
@@ -3112,8 +3112,11 @@
 	struct page *page;
 	unsigned long flags;
 
-	/* There is no need to clear a driver tags own mapping */
-	if (drv_tags == tags)
+	/*
+	 * There is no need to clear mapping if driver tags is not initialized
+	 * or the mapping belongs to the driver tags.
+	 */
+	if (!drv_tags || drv_tags == tags)
 		return;
 
 	list_for_each_entry(page, &tags->page_list, lru) {
```
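Before the fix, the only early exit was the equality test, so a `drv_tags` that was never set up (NULL) fell through to code that dereferences it. A minimal model of the patched guard follows; `struct tags` and `clear_rq_mapping()` are hypothetical simplifications, and the counter stands in for the loop over `tags->page_list` that clears entries in the driver tags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified stand-in for struct blk_mq_tags. */
struct tags {
	int cleared;	/* counts how often this mapping was cleared */
};

/* Models the patched early return: nothing to do if the driver tags were
 * never initialized, or if the mapping belongs to the driver tags. */
static void clear_rq_mapping(struct tags *drv_tags, struct tags *tags)
{
	if (!drv_tags || drv_tags == tags)
		return;

	assert(tags != NULL);
	drv_tags->cleared++;	/* the real code touches drv_tags->rqs[] here */
}
```

Testing for NULL first matters: with only `drv_tags == tags`, a NULL `drv_tags` compared unequal to a valid `tags` and reached the dereference.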
drivers/block/drbd/drbd_req.c (+6 -8)
```diff
@@ -30,11 +30,6 @@
 		return NULL;
 	memset(req, 0, sizeof(*req));
 
-	req->private_bio = bio_alloc_clone(device->ldev->backing_bdev, bio_src,
-					   GFP_NOIO, &drbd_io_bio_set);
-	req->private_bio->bi_private = req;
-	req->private_bio->bi_end_io = drbd_request_endio;
-
 	req->rq_state = (bio_data_dir(bio_src) == WRITE ? RQ_WRITE : 0)
 		      | (bio_op(bio_src) == REQ_OP_WRITE_ZEROES ? RQ_ZEROES : 0)
 		      | (bio_op(bio_src) == REQ_OP_DISCARD ? RQ_UNMAP : 0);
@@ -1219,9 +1214,12 @@
 	/* Update disk stats */
 	req->start_jif = bio_start_io_acct(req->master_bio);
 
-	if (!get_ldev(device)) {
-		bio_put(req->private_bio);
-		req->private_bio = NULL;
+	if (get_ldev(device)) {
+		req->private_bio = bio_alloc_clone(device->ldev->backing_bdev,
+						   bio, GFP_NOIO,
+						   &drbd_io_bio_set);
+		req->private_bio->bi_private = req;
+		req->private_bio->bi_end_io = drbd_request_endio;
 	}
 
 	/* process discards always from our submitter thread */
```
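The old code cloned the bio from `device->ldev->backing_bdev` when the request was created, before anything checked for a backing device, and dropped the clone again if `get_ldev()` failed. The patch inverts that order: take the reference first, then clone. The sketch below models only that ordering; every `model_*` type and function is hypothetical, and `malloc()` stands in for `bio_alloc_clone()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, much-simplified stand-ins for the drbd objects. */
struct model_backing { int refs; };
struct model_device { struct model_backing *ldev; };	/* NULL when diskless */
struct model_clone { struct model_backing *bdev; };
struct model_request { struct model_clone *private_bio; };

/* Models get_ldev(): take a reference only if a backing device exists. */
static bool model_get_ldev(struct model_device *dev)
{
	if (!dev->ldev)
		return false;
	dev->ldev->refs++;
	return true;
}

/* Patched ordering: the clone is created only after the reference is held,
 * so a diskless device never dereferences dev->ldev. */
static int model_request_prepare(struct model_device *dev,
				 struct model_request *req)
{
	assert(dev && req);
	req->private_bio = NULL;
	if (model_get_ldev(dev)) {
		req->private_bio = malloc(sizeof(*req->private_bio));
		if (!req->private_bio) {
			dev->ldev->refs--;	/* undo the reference on failure */
			return -1;
		}
		req->private_bio->bdev = dev->ldev;
	}
	return 0;
}
```

For a diskless device the request simply carries no private bio, and no allocation or `ldev` access happens at all.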
drivers/block/ublk_drv.c (+1 -1)
```diff
@@ -124,7 +124,7 @@
 	bool force_abort;
 	unsigned short nr_io_ready;	/* how many ios setup */
 	struct ublk_device *dev;
-	struct ublk_io ios[0];
+	struct ublk_io ios[];
 };
 
 #define UBLK_DAEMON_MONITOR_PERIOD	(5 * HZ)
```
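The change swaps the GNU zero-length array `ios[0]` for a C99 flexible array member `ios[]`, the form the kernel prefers because the compiler then knows the trailing array has no fixed bound. Allocation works the same way either way: header size plus `n` elements. The sketch uses hypothetical `model_*` names; in-kernel code would normally compute the size with the overflow-checked `struct_size()` helper instead of the open-coded expression.

```c
#include <assert.h>
#include <stdlib.h>

struct model_io { int tag; };

/* Trailing flexible array member, as in the patched struct ublk_queue. */
struct model_queue {
	unsigned short nr_ios;
	struct model_io ios[];
};

/* Allocate a queue with n trailing elements, tagging each one. */
static struct model_queue *model_queue_alloc(unsigned short n)
{
	struct model_queue *q =
		calloc(1, sizeof(*q) + (size_t)n * sizeof(q->ios[0]));

	if (!q)
		return NULL;
	q->nr_ios = n;
	for (unsigned short i = 0; i < n; i++)
		q->ios[i].tag = i;
	return q;
}
```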
drivers/nvme/host/apple.c (+2)
```diff
@@ -1039,6 +1039,8 @@
 					 dma_max_mapping_size(anv->dev) >> 9);
 	anv->ctrl.max_segments = NVME_MAX_SEGS;
 
+	dma_set_max_seg_size(anv->dev, 0xffffffff);
+
 	/*
 	 * Enable NVMMU and linear submission queues.
 	 * While we could keep those disabled and pretend this is slightly
```
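For a device that never declares a limit, `dma_get_max_seg_size()` falls back to 64 KiB. The patch declares 0xffffffff for the Apple controller, since the NVMe controller itself imposes no such per-segment limit. The helper below only illustrates the arithmetic involved: how many segments a contiguous buffer needs under a given per-segment cap. `segments_needed()` is a hypothetical name, and the model ignores alignment and IOMMU merging.

```c
#include <assert.h>
#include <stdint.h>

/* Segments needed for a contiguous buffer of len bytes when no single
 * segment may exceed max_seg bytes (ceiling division). */
static uint64_t segments_needed(uint64_t len, uint64_t max_seg)
{
	assert(max_seg > 0);
	return (len + max_seg - 1) / max_seg;
}
```

A 1 MiB transfer needs 16 segments under a 64 KiB cap but only one under 0xffffffff. Because the driver also bounds the segment count (`max_segments`), a small per-segment cap effectively bounds request size as well.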
drivers/nvme/host/core.c (+6 -2)
```diff
@@ -3262,8 +3262,12 @@
 		return ret;
 
 	if (!ctrl->identified && !nvme_discovery_ctrl(ctrl)) {
+		/*
+		 * Do not return errors unless we are in a controller reset,
+		 * the controller works perfectly fine without hwmon.
+		 */
 		ret = nvme_hwmon_init(ctrl);
-		if (ret < 0)
+		if (ret == -EINTR)
 			return ret;
 	}
 
@@ -4846,7 +4850,7 @@
 	return 0;
 
 out_cleanup_admin_q:
-	blk_mq_destroy_queue(ctrl->fabrics_q);
+	blk_mq_destroy_queue(ctrl->admin_q);
 out_free_tagset:
 	blk_mq_free_tag_set(ctrl->admin_tagset);
 	return ret;
```
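The first hunk turns hwmon into a best-effort feature: a failed `nvme_hwmon_init()` no longer aborts controller initialization, except for `-EINTR`, which the new comment ties to a controller reset (nvme reports `-EINTR` for a synchronous command that was cancelled). The policy can be modeled as follows; `optional_feature_init()`, `ctrl_finish_init()`, and the `fake_hwmon_result` hook are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical test hook: what the optional sub-feature's init returns. */
static int fake_hwmon_result;

static int optional_feature_init(void)
{
	return fake_hwmon_result;
}

/* Mirrors the patched policy: the controller works without hwmon, so only
 * an interrupted init (-EINTR) is propagated to the caller. */
static int ctrl_finish_init(void)
{
	int ret = optional_feature_init();

	if (ret == -EINTR)
		return ret;
	return 0;
}
```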
drivers/nvme/host/hwmon.c (+22 -10)
```diff
@@ -12,7 +12,7 @@
 
 struct nvme_hwmon_data {
 	struct nvme_ctrl *ctrl;
-	struct nvme_smart_log log;
+	struct nvme_smart_log *log;
 	struct mutex read_lock;
 };
 
@@ -60,14 +60,14 @@
 static int nvme_hwmon_get_smart_log(struct nvme_hwmon_data *data)
 {
 	return nvme_get_log(data->ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
-			   NVME_CSI_NVM, &data->log, sizeof(data->log), 0);
+			   NVME_CSI_NVM, data->log, sizeof(*data->log), 0);
 }
 
 static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
 			   u32 attr, int channel, long *val)
 {
 	struct nvme_hwmon_data *data = dev_get_drvdata(dev);
-	struct nvme_smart_log *log = &data->log;
+	struct nvme_smart_log *log = data->log;
 	int temp;
 	int err;
 
@@ -163,7 +163,7 @@
 	case hwmon_temp_max:
 	case hwmon_temp_min:
 		if ((!channel && data->ctrl->wctemp) ||
-		    (channel && data->log.temp_sensor[channel - 1])) {
+		    (channel && data->log->temp_sensor[channel - 1])) {
 			if (data->ctrl->quirks &
 			    NVME_QUIRK_NO_TEMP_THRESH_CHANGE)
 				return 0444;
@@ -176,7 +176,7 @@
 		break;
 	case hwmon_temp_input:
 	case hwmon_temp_label:
-		if (!channel || data->log.temp_sensor[channel - 1])
+		if (!channel || data->log->temp_sensor[channel - 1])
 			return 0444;
 		break;
 	default:
@@ -230,7 +230,13 @@
 
 	data = kzalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
-		return 0;
+		return -ENOMEM;
+
+	data->log = kzalloc(sizeof(*data->log), GFP_KERNEL);
+	if (!data->log) {
+		err = -ENOMEM;
+		goto err_free_data;
+	}
 
 	data->ctrl = ctrl;
 	mutex_init(&data->read_lock);
@@ -238,8 +244,7 @@
 	err = nvme_hwmon_get_smart_log(data);
 	if (err) {
 		dev_warn(dev, "Failed to read smart log (error %d)\n", err);
-		kfree(data);
-		return err;
+		goto err_free_log;
 	}
 
 	hwmon = hwmon_device_register_with_info(dev, "nvme",
@@ -247,11 +252,17 @@
 						NULL);
 	if (IS_ERR(hwmon)) {
 		dev_warn(dev, "Failed to instantiate hwmon device\n");
-		kfree(data);
-		return PTR_ERR(hwmon);
+		err = PTR_ERR(hwmon);
+		goto err_free_log;
 	}
 	ctrl->hwmon_device = hwmon;
 	return 0;
+
+err_free_log:
+	kfree(data->log);
+err_free_data:
+	kfree(data);
+	return err;
 }
 
 void nvme_hwmon_exit(struct nvme_ctrl *ctrl)
@@ -262,6 +273,7 @@
 
 		hwmon_device_unregister(ctrl->hwmon_device);
 		ctrl->hwmon_device = NULL;
+		kfree(data->log);
 		kfree(data);
 	}
 }
```
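Embedded in `struct nvme_hwmon_data`, the SMART log buffer could share cache lines with neighbouring fields. On a DMA non-coherent architecture, the cache maintenance done when mapping that buffer can then clobber those fields. Allocating the buffer separately with `kmalloc()`/`kzalloc()` gives it its own suitably aligned memory. The userspace sketch below mirrors the patched ownership and the goto-based unwind. `calloc()` stands in for `kzalloc()` and does not model cache-line alignment; `struct smart_log`, `struct hwmon_data`, and the `fail_step` hook are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real buffer is struct nvme_smart_log. */
struct smart_log { unsigned char raw[512]; };

struct hwmon_data {
	void *ctrl;
	struct smart_log *log;	/* separate allocation, never embedded */
};

static int fail_step;	/* test hook: 1 = log alloc fails, 2 = log read fails */

static int hwmon_init(void *ctrl, struct hwmon_data **out)
{
	struct hwmon_data *data;
	int err;

	data = calloc(1, sizeof(*data));
	if (!data)
		return -ENOMEM;

	data->log = fail_step == 1 ? NULL : calloc(1, sizeof(*data->log));
	if (!data->log) {
		err = -ENOMEM;
		goto err_free_data;
	}

	data->ctrl = ctrl;
	if (fail_step == 2) {	/* models nvme_hwmon_get_smart_log() failing */
		err = -EIO;
		goto err_free_log;
	}

	*out = data;
	return 0;

err_free_log:
	free(data->log);
err_free_data:
	free(data);
	return err;
}

/* Matches the patched nvme_hwmon_exit(): free the buffer, then the struct. */
static void hwmon_exit(struct hwmon_data *data)
{
	free(data->log);
	free(data);
}
```

The labels unwind in reverse order of allocation, so each failure point jumps to exactly the cleanup it needs.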
drivers/nvme/host/pci.c (+10)
```diff
@@ -3511,6 +3511,16 @@
 		.driver_data = NVME_QUIRK_NO_DEEPEST_PS, },
 	{ PCI_DEVICE(0x2646, 0x2263), /* KINGSTON A2000 NVMe SSD */
 		.driver_data = NVME_QUIRK_NO_DEEPEST_PS, },
+	{ PCI_DEVICE(0x2646, 0x5018), /* KINGSTON OM8SFP4xxxxP OS21012 NVMe SSD */
+		.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
+	{ PCI_DEVICE(0x2646, 0x5016), /* KINGSTON OM3PGP4xxxxP OS21011 NVMe SSD */
+		.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
+	{ PCI_DEVICE(0x2646, 0x501A), /* KINGSTON OM8PGP4xxxxP OS21005 NVMe SSD */
+		.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
+	{ PCI_DEVICE(0x2646, 0x501B), /* KINGSTON OM8PGP4xxxxQ OS21005 NVMe SSD */
+		.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
+	{ PCI_DEVICE(0x2646, 0x501E), /* KINGSTON OM3PGP4xxxxQ OS21011 NVMe SSD */
+		.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
 	{ PCI_DEVICE(0x1e4B, 0x1001), /* MAXIO MAP1001 */
 		.driver_data = NVME_QUIRK_BOGUS_NID, },
 	{ PCI_DEVICE(0x1e4B, 0x1002), /* MAXIO MAP1002 */
```
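Each table entry pairs a PCI vendor/device ID with a quirk bitmask carried in `driver_data`. When a device probes, the first matching entry supplies the quirks. The lookup below is a simplified, hypothetical model: the quirk bit values are invented for this sketch and differ from the kernel's `NVME_QUIRK_*` definitions, and real PCI ID matching also supports subsystem IDs and wildcards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk bits for this sketch only. */
#define QUIRK_NO_DEEPEST_PS        (1u << 0)
#define QUIRK_DISABLE_WRITE_ZEROES (1u << 1)

struct id_entry {
	uint16_t vendor, device;
	unsigned int quirks;
};

/* A few entries from the table above (vendor 0x2646 is Kingston). */
static const struct id_entry id_table[] = {
	{ 0x2646, 0x2263, QUIRK_NO_DEEPEST_PS },
	{ 0x2646, 0x5018, QUIRK_DISABLE_WRITE_ZEROES },
	{ 0x2646, 0x5016, QUIRK_DISABLE_WRITE_ZEROES },
	{ 0x2646, 0x501A, QUIRK_DISABLE_WRITE_ZEROES },
	{ 0x2646, 0x501B, QUIRK_DISABLE_WRITE_ZEROES },
	{ 0x2646, 0x501E, QUIRK_DISABLE_WRITE_ZEROES },
};

/* First match wins; an unlisted device gets no quirks. */
static unsigned int lookup_quirks(uint16_t vendor, uint16_t device)
{
	for (size_t i = 0; i < sizeof(id_table) / sizeof(id_table[0]); i++)
		if (id_table[i].vendor == vendor && id_table[i].device == device)
			return id_table[i].quirks;
	return 0;
}
```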
drivers/nvme/target/configfs.c (-4)
```diff
@@ -1290,11 +1290,7 @@
 static ssize_t nvmet_subsys_attr_qid_max_store(struct config_item *item,
 					       const char *page, size_t cnt)
 {
-	struct nvmet_port *port = to_nvmet_port(item);
 	u16 qid_max;
-
-	if (nvmet_is_port_enabled(port, __func__))
-		return -EACCES;
 
 	if (sscanf(page, "%hu\n", &qid_max) != 1)
 		return -EINVAL;
```
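The removed lines called `to_nvmet_port()` on the `config_item` of a subsystem attribute. A `container_of()`-style helper just subtracts the offset of the member within the named type. Applied to an item embedded in a different structure, it produces a pointer that does not point at any port, and reading through it is the invalid memory reference. The sketch shows that the subtraction only round-trips for the correct enclosing type. The struct layouts are hypothetical, and this `container_of` is a userspace copy without the kernel's type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace version of the kernel's container_of() (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct config_item { const char *name; };

/* Hypothetical layouts: the embedded item sits at different offsets. */
struct model_port   { int enabled; long pad[4]; struct config_item item; };
struct model_subsys { unsigned short max_qid; struct config_item item; };

/* Correct conversion for an item that really lives in a model_subsys. */
static struct model_subsys *to_model_subsys(struct config_item *item)
{
	return container_of(item, struct model_subsys, item);
}
```

Because the member offsets differ, converting a subsystem's item with the port helper would subtract the wrong amount; the fix removes that conversion instead of trying to make it work.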
drivers/nvme/target/core.c (+1 -1)
```diff
@@ -1176,7 +1176,7 @@
 	 * reset the keep alive timer when the controller is enabled.
 	 */
 	if (ctrl->kato)
-		mod_delayed_work(system_wq, &ctrl->ka_work, ctrl->kato * HZ);
+		mod_delayed_work(nvmet_wq, &ctrl->ka_work, ctrl->kato * HZ);
 }
 
 static void nvmet_clear_ctrl(struct nvmet_ctrl *ctrl)
```
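The keep-alive work was being re-armed on `system_wq` (the "events" queue, which is not a `WQ_MEM_RECLAIM` queue), while other nvmet work that flushes it runs on `nvmet_wq`, assumed here to be allocated with `WQ_MEM_RECLAIM`. The workqueue core warns when a reclaim-capable worker waits on a non-reclaim queue, and keeping `ka_work` on `nvmet_wq` removes that mismatch. The rule is modeled below in simplified form; `struct wq_model` and `flush_allowed()` are hypothetical, and the real check also considers tasks running with `PF_MEMALLOC`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: only the attribute that matters for the check. */
struct wq_model {
	const char *name;
	bool mem_reclaim;	/* allocated with WQ_MEM_RECLAIM */
};

/* A worker on a WQ_MEM_RECLAIM queue must not wait on a queue without it:
 * only reclaim queues have a rescuer thread that guarantees forward
 * progress under memory pressure. */
static bool flush_allowed(const struct wq_model *flusher,
			  const struct wq_model *target)
{
	return !flusher->mem_reclaim || target->mem_reclaim;
}
```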
kernel/trace/blktrace.c (+39 -43)
```diff
@@ -346,8 +346,40 @@
 	mutex_unlock(&blk_probe_mutex);
 }
 
+static int blk_trace_start(struct blk_trace *bt)
+{
+	if (bt->trace_state != Blktrace_setup &&
+	    bt->trace_state != Blktrace_stopped)
+		return -EINVAL;
+
+	blktrace_seq++;
+	smp_mb();
+	bt->trace_state = Blktrace_running;
+	raw_spin_lock_irq(&running_trace_lock);
+	list_add(&bt->running_list, &running_trace_list);
+	raw_spin_unlock_irq(&running_trace_lock);
+	trace_note_time(bt);
+
+	return 0;
+}
+
+static int blk_trace_stop(struct blk_trace *bt)
+{
+	if (bt->trace_state != Blktrace_running)
+		return -EINVAL;
+
+	bt->trace_state = Blktrace_stopped;
+	raw_spin_lock_irq(&running_trace_lock);
+	list_del_init(&bt->running_list);
+	raw_spin_unlock_irq(&running_trace_lock);
+	relay_flush(bt->rchan);
+
+	return 0;
+}
+
 static void blk_trace_cleanup(struct request_queue *q, struct blk_trace *bt)
 {
+	blk_trace_stop(bt);
 	synchronize_rcu();
 	blk_trace_free(q, bt);
 	put_probe_ref();
@@ -362,8 +394,7 @@
 	if (!bt)
 		return -EINVAL;
 
-	if (bt->trace_state != Blktrace_running)
-		blk_trace_cleanup(q, bt);
+	blk_trace_cleanup(q, bt);
 
 	return 0;
 }
@@ -658,7 +689,6 @@
 
 static int __blk_trace_startstop(struct request_queue *q, int start)
 {
-	int ret;
 	struct blk_trace *bt;
 
 	bt = rcu_dereference_protected(q->blk_trace,
@@ -666,36 +696,10 @@
 	if (bt == NULL)
 		return -EINVAL;
 
-	/*
-	 * For starting a trace, we can transition from a setup or stopped
-	 * trace. For stopping a trace, the state must be running
-	 */
-	ret = -EINVAL;
-	if (start) {
-		if (bt->trace_state == Blktrace_setup ||
-		    bt->trace_state == Blktrace_stopped) {
-			blktrace_seq++;
-			smp_mb();
-			bt->trace_state = Blktrace_running;
-			raw_spin_lock_irq(&running_trace_lock);
-			list_add(&bt->running_list, &running_trace_list);
-			raw_spin_unlock_irq(&running_trace_lock);
-
-			trace_note_time(bt);
-			ret = 0;
-		}
-	} else {
-		if (bt->trace_state == Blktrace_running) {
-			bt->trace_state = Blktrace_stopped;
-			raw_spin_lock_irq(&running_trace_lock);
-			list_del_init(&bt->running_list);
-			raw_spin_unlock_irq(&running_trace_lock);
-			relay_flush(bt->rchan);
-			ret = 0;
-		}
-	}
-
-	return ret;
+	if (start)
+		return blk_trace_start(bt);
+	else
+		return blk_trace_stop(bt);
 }
 
 int blk_trace_startstop(struct request_queue *q, int start)
@@ -772,10 +776,8 @@
 void blk_trace_shutdown(struct request_queue *q)
 {
 	if (rcu_dereference_protected(q->blk_trace,
-				      lockdep_is_held(&q->debugfs_mutex))) {
-		__blk_trace_startstop(q, 0);
+				      lockdep_is_held(&q->debugfs_mutex)))
 		__blk_trace_remove(q);
-	}
 }
 
 #ifdef CONFIG_BLK_CGROUP
@@ -1614,13 +1616,7 @@
 	if (bt == NULL)
 		return -EINVAL;
 
-	if (bt->trace_state == Blktrace_running) {
-		bt->trace_state = Blktrace_stopped;
-		raw_spin_lock_irq(&running_trace_lock);
-		list_del_init(&bt->running_list);
-		raw_spin_unlock_irq(&running_trace_lock);
-		relay_flush(bt->rchan);
-	}
+	blk_trace_stop(bt);
 
 	put_probe_ref();
 	synchronize_rcu();
```
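The refactor moves the open-coded state checks into two helpers with explicit preconditions: start only from setup or stopped, stop only from running. That lets `blk_trace_cleanup()` call `blk_trace_stop()` unconditionally, so `__blk_trace_remove()` can drop its "not running" guard, the check that skipped cleanup for a running trace and leaked it. The model below keeps only the state transitions. All `model_*` names are hypothetical, and locking, the running list, and relay buffers are reduced to a counter and a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum model_state { MODEL_SETUP, MODEL_RUNNING, MODEL_STOPPED };

/* Hypothetical model of struct blk_trace: the state, a counter standing in
 * for membership of running_trace_list, and a "freed" marker. */
struct model_trace {
	enum model_state state;
	int on_running_list;
	bool freed;
};

/* Like blk_trace_start(): allowed from setup or stopped. */
static int model_start(struct model_trace *bt)
{
	if (bt->state != MODEL_SETUP && bt->state != MODEL_STOPPED)
		return -EINVAL;
	bt->state = MODEL_RUNNING;
	bt->on_running_list++;
	return 0;
}

/* Like blk_trace_stop(): allowed only from running. */
static int model_stop(struct model_trace *bt)
{
	if (bt->state != MODEL_RUNNING)
		return -EINVAL;
	bt->state = MODEL_STOPPED;
	bt->on_running_list--;
	return 0;
}

/* Like the patched blk_trace_cleanup(): stop first (an -EINVAL for a trace
 * that is not running is deliberately ignored), then release. */
static void model_cleanup(struct model_trace *bt)
{
	model_stop(bt);
	assert(bt->on_running_list == 0);
	bt->freed = true;
}
```

Cleaning up a running trace now takes it off the running list before releasing it, which is also why `blk_trace_shutdown()` no longer needs its own stop call.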