Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"A smaller collection of fixes that should go into -rc1. This contains:

 - A fix from Christoph, fixing a regression with the WRITE_SAME and
   partial completions. Caused a BUG() on ppc.

 - Fixup for __blk_mq_stop_hw_queues(), it should be static. From
   Colin.

 - Removal of dmesg error messages on elevator switching, when invoked
   from sysfs. From me.

 - Fix for blk-stat, using this_cpu_ptr() in a section only protected
   by rcu_read_lock(). This breaks when PREEMPT_RCU is enabled. From
   me.

 - Two fixes for BFQ from Paolo, one fixing a crash and one updating
   the documentation.

 - An error handling lightnvm memory leak, from Rakesh.

 - The previous blk-mq hot unplug lock reversal depends on the CPU
   hotplug rework that isn't in mainline yet. This caused a lockdep
   splat when people unplugged CPUs with blk-mq devices. From Wanpeng.

 - A regression fix for DIF/DIX on blk-mq. From Wen"

* 'for-linus' of git://git.kernel.dk/linux-block:
block: handle partial completions for special payload requests
blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op
blk-stat: don't use this_cpu_ptr() in a preemptable section
elevator: remove redundant warnings on IO scheduler switch
block, bfq: stress that low_latency must be off to get max throughput
block, bfq: use pointer entity->sched_data only if set
nvme: lightnvm: fix memory leak
blk-mq: make __blk_mq_stop_hw_queues static
lightnvm: remove unused rq parameter of nvme_nvm_rqtocmd() to kill warning
block/mq: fix potential deadlock during cpu hotplug

+65 -35
+16 -1
Documentation/block/bfq-iosched.txt
···
 groups (switching back to time distribution when needed to keep
 throughput high).
 
+In its default configuration, BFQ privileges latency over
+throughput. So, when needed for achieving a lower latency, BFQ builds
+schedules that may lead to a lower throughput. If your main or only
+goal, for a given device, is to achieve the maximum-possible
+throughput at all times, then do switch off all low-latency heuristics
+for that device, by setting low_latency to 0. Full details in Section 3.
+
 On average CPUs, the current version of BFQ can handle devices
 performing at most ~30K IOPS; at most ~50 KIOPS on faster CPUs. As a
 reference, 30-50 KIOPS correspond to very high bandwidths with
···
 real-time applications are privileged and experience a lower latency,
 as explained in more detail in the description of how BFQ works.
 
-DO NOT enable this mode if you need full control on bandwidth
+DISABLE this mode if you need full control on bandwidth
 distribution. In fact, if it is enabled, then BFQ automatically
 increases the bandwidth share of privileged applications, as the main
 means to guarantee a lower latency to them.
+
+In addition, as already highlighted at the beginning of this document,
+DISABLE this mode if your only goal is to achieve a high throughput.
+In fact, privileging the I/O of some application over the rest may
+entail a lower throughput. To achieve the highest-possible throughput
+on a non-rotational device, setting slice_idle to 0 may be needed too
+(at the cost of giving up any strong guarantee on fairness and low
+latency).
 
 timeout_sync
 ------------
+5
block/bfq-iosched.c
···
  * rotational or flash-based devices, and to get the job done quickly
  * for applications consisting in many I/O-bound processes.
  *
+ * NOTE: if the main or only goal, with a given device, is to achieve
+ * the maximum-possible throughput at all times, then do switch off
+ * all low-latency heuristics for that device, by setting low_latency
+ * to 0.
+ *
  * BFQ is described in [1], where also a reference to the initial, more
  * theoretical paper on BFQ can be found. The interested reader can find
  * in the latter paper full details on the main algorithm, as well as
+11 -2
block/bfq-wf2q.c
···
 bool __bfq_deactivate_entity(struct bfq_entity *entity, bool ins_into_idle_tree)
 {
 	struct bfq_sched_data *sd = entity->sched_data;
-	struct bfq_service_tree *st = bfq_entity_service_tree(entity);
-	int is_in_service = entity == sd->in_service_entity;
+	struct bfq_service_tree *st;
+	bool is_in_service;
 
 	if (!entity->on_st) /* entity never activated, or already inactive */
 		return false;
+
+	/*
+	 * If we get here, then entity is active, which implies that
+	 * bfq_group_set_parent has already been invoked for the group
+	 * represented by entity. Therefore, the field
+	 * entity->sched_data has been set, and we can safely use it.
+	 */
+	st = bfq_entity_service_tree(entity);
+	is_in_service = entity == sd->in_service_entity;
 
 	if (is_in_service)
 		bfq_calc_finish(entity, entity->service);
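The bfq-wf2q.c change defers every dereference of entity->sched_data until after the on_st check. The following is a minimal userspace sketch of that ordering, not the BFQ code: the toy_ types, the was_in_service out-parameter, and everything the function does after the guard are hypothetical and exist only to make the pattern observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; field names echo the diff, the types are hypothetical. */
struct toy_sched_data {
	const void *in_service_entity;
};

struct toy_entity {
	bool on_st;				/* true once the entity was activated */
	struct toy_sched_data *sched_data;	/* may be NULL before activation */
};

/*
 * Returns false for an entity that was never activated. Otherwise marks
 * it inactive and reports through *was_in_service whether it was the
 * entity currently being served.
 */
static bool toy_deactivate(struct toy_entity *entity, bool *was_in_service)
{
	/* Copying the pointer is harmless even when it is NULL. */
	struct toy_sched_data *sd = entity->sched_data;

	if (!entity->on_st)
		return false;

	/* Past the guard, sd is assumed to be set, so it can be dereferenced. */
	*was_in_service = (sd->in_service_entity == entity);
	entity->on_st = false;
	return true;
}
```

Calling toy_deactivate() on an entity with on_st == false and sched_data == NULL returns false without touching the pointer, which is the situation the original initializers could not survive.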
+12 -12
block/blk-core.c
···
 		return false;
 	}
 
-	WARN_ON_ONCE(req->rq_flags & RQF_SPECIAL_PAYLOAD);
-
 	req->__data_len -= total_bytes;
 
 	/* update sector only for requests with clear definition of sector */
···
 		req->cmd_flags |= req->bio->bi_opf & REQ_FAILFAST_MASK;
 	}
 
-	/*
-	 * If total number of sectors is less than the first segment
-	 * size, something has gone terribly wrong.
-	 */
-	if (blk_rq_bytes(req) < blk_rq_cur_bytes(req)) {
-		blk_dump_rq_flags(req, "request botched");
-		req->__data_len = blk_rq_cur_bytes(req);
-	}
+	if (!(req->rq_flags & RQF_SPECIAL_PAYLOAD)) {
+		/*
+		 * If total number of sectors is less than the first segment
+		 * size, something has gone terribly wrong.
+		 */
+		if (blk_rq_bytes(req) < blk_rq_cur_bytes(req)) {
+			blk_dump_rq_flags(req, "request botched");
+			req->__data_len = blk_rq_cur_bytes(req);
+		}
 
-	/* recalculate the number of segments */
-	blk_recalc_rq_segments(req);
+		/* recalculate the number of segments */
+		blk_recalc_rq_segments(req);
+	}
 
 	return true;
 }
+5 -5
block/blk-mq.c
···
 }
 EXPORT_SYMBOL(blk_mq_stop_hw_queue);
 
-void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync)
+static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync)
 {
 	struct blk_mq_hw_ctx *hctx;
 	int i;
···
 
 	blk_queue_bounce(q, &bio);
 
+	blk_queue_split(q, &bio, q->bio_split);
+
 	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
 		bio_io_error(bio);
 		return BLK_QC_T_NONE;
 	}
-
-	blk_queue_split(q, &bio, q->bio_split);
 
 	if (!is_flush_fua && !blk_queue_nomerges(q) &&
 	    blk_attempt_plug_merge(q, bio, &request_count, &same_queue_rq))
···
 
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
 
-	mutex_lock(&all_q_mutex);
 	get_online_cpus();
+	mutex_lock(&all_q_mutex);
 
 	list_add_tail(&q->all_q_node, &all_q_list);
 	blk_mq_add_queue_tag_set(set, q);
 	blk_mq_map_swqueue(q, cpu_online_mask);
 
-	put_online_cpus();
 	mutex_unlock(&all_q_mutex);
+	put_online_cpus();
 
 	if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
 		int ret;
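The last blk-mq.c hunk swaps the acquisition order so that get_online_cpus() comes before mutex_lock(&all_q_mutex), with the releases in reverse. As a rough illustration of why a consistent order matters, here is a toy rank check in plain C. It is far simpler than lockdep, which tracks a dependency graph; the ranks, the tracker type, and register_queue() are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: a lower rank must be taken before a higher one. */
enum lock_rank {
	RANK_CPU_HOTPLUG = 1,	/* stands in for get_online_cpus() */
	RANK_ALL_Q_MUTEX = 2,	/* stands in for all_q_mutex */
};

#define MAX_HELD 8

struct lock_tracker {
	enum lock_rank held[MAX_HELD];
	int depth;
};

/* Record an acquisition; return false if it would invert the rank order. */
static bool tracker_acquire(struct lock_tracker *t, enum lock_rank rank)
{
	if (t->depth == MAX_HELD)
		return false;
	if (t->depth > 0 && t->held[t->depth - 1] >= rank)
		return false;
	t->held[t->depth++] = rank;
	return true;
}

/* Release the most recently acquired lock (strict LIFO, as in the diff). */
static void tracker_release(struct lock_tracker *t)
{
	if (t->depth > 0)
		t->depth--;
}

/* Mirrors the order in the new code: hotplug first, then the list mutex. */
static bool register_queue(struct lock_tracker *t)
{
	if (!tracker_acquire(t, RANK_CPU_HOTPLUG))
		return false;
	if (!tracker_acquire(t, RANK_ALL_Q_MUTEX)) {
		tracker_release(t);
		return false;
	}
	/* ... the list and mapping updates would go here ... */
	tracker_release(t);	/* all_q_mutex */
	tracker_release(t);	/* hotplug */
	return true;
}
```

With a zero-initialized tracker, register_queue() succeeds and leaves nothing held, whereas taking RANK_ALL_Q_MUTEX first and then RANK_CPU_HOTPLUG is rejected, which is the shape of the ordering the removed lines used.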
+10 -7
block/blk-stat.c
···
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(cb, &q->stats->callbacks, list) {
-		if (blk_stat_is_active(cb)) {
-			bucket = cb->bucket_fn(rq);
-			if (bucket < 0)
-				continue;
-			stat = &this_cpu_ptr(cb->cpu_stat)[bucket];
-			__blk_stat_add(stat, value);
-		}
+		if (!blk_stat_is_active(cb))
+			continue;
+
+		bucket = cb->bucket_fn(rq);
+		if (bucket < 0)
+			continue;
+
+		stat = &get_cpu_ptr(cb->cpu_stat)[bucket];
+		__blk_stat_add(stat, value);
+		put_cpu_ptr(cb->cpu_stat);
 	}
 	rcu_read_unlock();
 }
+1 -4
block/elevator.c
···
 
 	strlcpy(elevator_name, name, sizeof(elevator_name));
 	e = elevator_get(strstrip(elevator_name), true);
-	if (!e) {
-		printk(KERN_ERR "elevator: type %s not found\n", elevator_name);
+	if (!e)
 		return -EINVAL;
-	}
 
 	if (q->elevator &&
 	    !strcmp(elevator_name, q->elevator->type->elevator_name)) {
···
 	if (!ret)
 		return count;
 
-	printk(KERN_ERR "elevator: switch to %s failed\n", name);
 	return ret;
 }
 
+5 -4
drivers/nvme/host/lightnvm.c
···
 
 		if (unlikely(elba > nvmdev->total_secs)) {
 			pr_err("nvm: L2P data from device is out of bounds!\n");
-			return -EINVAL;
+			ret = -EINVAL;
+			goto out;
 		}
 
 		/* Transform physical address to target address space */
···
 	return ret;
 }
 
-static inline void nvme_nvm_rqtocmd(struct request *rq, struct nvm_rq *rqd,
-				struct nvme_ns *ns, struct nvme_nvm_command *c)
+static inline void nvme_nvm_rqtocmd(struct nvm_rq *rqd, struct nvme_ns *ns,
+				    struct nvme_nvm_command *c)
 {
 	c->ph_rw.opcode = rqd->opcode;
 	c->ph_rw.nsid = cpu_to_le32(ns->ns_id);
···
 	if (!cmd)
 		return -ENOMEM;
 
-	nvme_nvm_rqtocmd(rq, rqd, ns, cmd);
+	nvme_nvm_rqtocmd(rqd, ns, cmd);
 
 	rq = nvme_alloc_request(q, (struct nvme_command *)cmd, 0, NVME_QID_ANY);
 	if (IS_ERR(rq)) {
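The first lightnvm.c hunk replaces an early return with "ret = -EINVAL; goto out;", the usual way to make every error exit pass through one cleanup point. Below is a small userspace sketch of that pattern. It assumes the out label frees a buffer allocated earlier in the function; the hunk does not show this. check_table(), buf_alloc(), buf_free(), and the live_buffers counter are hypothetical names used only here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical counter so a caller can observe whether buffers leak. */
static int live_buffers;

static void *buf_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_buffers++;
	return p;
}

static void buf_free(void *p)
{
	if (p) {
		free(p);
		live_buffers--;
	}
}

/*
 * Copy count values into a scratch buffer, rejecting any value above
 * limit. Every exit after the allocation goes through "out", so the
 * buffer is released on success and on error alike.
 */
static int check_table(const unsigned long *vals, size_t count,
		       unsigned long limit)
{
	unsigned long *entries;
	int ret = 0;
	size_t i;

	entries = buf_alloc(count * sizeof(*entries));
	if (!entries)
		return -ENOMEM;

	for (i = 0; i < count; i++) {
		if (vals[i] > limit) {
			/* A bare "return -EINVAL" here would leak entries. */
			ret = -EINVAL;
			goto out;
		}
		entries[i] = vals[i];
	}
out:
	buf_free(entries);
	return ret;
}
```

After both a successful call and a call that hits the out-of-bounds branch, live_buffers is back to zero, which is the property the early return broke.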