Merge branch 'for-linus' of git://git.kernel.dk/linux-block

+71

Documentation/block/cfq-iosched.txt

···
 to IOPS mode and starts providing fairness in terms of number of requests
 dispatched. Note that this mode switching takes effect only for group
 scheduling. For non-cgroup users nothing should change.
+
+CFQ IO scheduler Idling Theory
+===============================
+Idling on a queue is primarily about waiting for the next request to come
+on same queue after completion of a request. In this process CFQ will not
+dispatch requests from other cfq queues even if requests are pending there.
+
+The rationale behind idling is that it can cut down on number of seeks
+on rotational media. For example, if a process is doing dependent
+sequential reads (next read will come on only after completion of previous
+one), then not dispatching request from other queue should help as we
+did not move the disk head and kept on dispatching sequential IO from
+one queue.
+
+CFQ has following service trees and various queues are put on these trees.
+
+	sync-idle	sync-noidle	async
+
+All cfq queues doing synchronous sequential IO go on to sync-idle tree.
+On this tree we idle on each queue individually.
+
+All synchronous non-sequential queues go on sync-noidle tree. Also any
+request which are marked with REQ_NOIDLE go on this service tree. On this
+tree we do not idle on individual queues instead idle on the whole group
+of queues or the tree. So if there are 4 queues waiting for IO to dispatch
+we will idle only once last queue has dispatched the IO and there is
+no more IO on this service tree.
+
+All async writes go on async service tree. There is no idling on async
+queues.
+
+CFQ has some optimizations for SSDs and if it detects a non-rotational
+media which can support higher queue depth (multiple requests at in
+flight at a time), then it cuts down on idling of individual queues and
+all the queues move to sync-noidle tree and only tree idle remains. This
+tree idling provides isolation with buffered write queues on async tree.
+
+FAQ
+===
+Q1. Why to idle at all on queues marked with REQ_NOIDLE.
+
+A1. We only do tree idle (all queues on sync-noidle tree) on queues marked
+    with REQ_NOIDLE. This helps in providing isolation with all the sync-idle
+    queues. Otherwise in presence of many sequential readers, other
+    synchronous IO might not get fair share of disk.
+
+    For example, if there are 10 sequential readers doing IO and they get
+    100ms each. If a REQ_NOIDLE request comes in, it will be scheduled
+    roughly after 1 second. If after completion of REQ_NOIDLE request we
+    do not idle, and after a couple of milli seconds a another REQ_NOIDLE
+    request comes in, again it will be scheduled after 1second. Repeat it
+    and notice how a workload can lose its disk share and suffer due to
+    multiple sequential readers.
+
+    fsync can generate dependent IO where bunch of data is written in the
+    context of fsync, and later some journaling data is written. Journaling
+    data comes in only after fsync has finished its IO (atleast for ext4
+    that seemed to be the case). Now if one decides not to idle on fsync
+    thread due to REQ_NOIDLE, then next journaling write will not get
+    scheduled for another second. A process doing small fsync, will suffer
+    badly in presence of multiple sequential readers.
+
+    Hence doing tree idling on threads using REQ_NOIDLE flag on requests
+    provides isolation from multiple sequential readers and at the same
+    time we do not idle on individual threads.
+
+Q2. When to specify REQ_NOIDLE
+A2. I would think whenever one is doing synchronous write and not expecting
+    more writes to be dispatched from same context soon, should be able
+    to specify REQ_NOIDLE on writes and that probably should work well for
+    most of the cases.
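
The arithmetic in A1 can be sketched as a tiny model. This is not CFQ code: the function names are hypothetical, and the model assumes every sequential reader consumes its full time slice before the sync-noidle tree is served again.

```c
/*
 * Hypothetical model, not kernel code: worst-case delay before a request
 * on the sync-noidle tree is served when `readers` sequential readers
 * each consume a full slice of `slice_ms` milliseconds first.
 */
static unsigned int noidle_wait_ms(unsigned int readers, unsigned int slice_ms)
{
	return readers * slice_ms;
}

/*
 * Total time for a chain of `nr_reqs` dependent requests (for example
 * fsync data followed by journal writes) if the scheduler never idles on
 * the sync-noidle tree, so each request waits a full round of readers.
 */
static unsigned int dependent_chain_ms(unsigned int nr_reqs,
				       unsigned int readers,
				       unsigned int slice_ms)
{
	return nr_reqs * noidle_wait_ms(readers, slice_ms);
}
```

With the figures from the FAQ, `noidle_wait_ms(10, 100)` gives the "roughly 1 second" delay, and a chain of dependent requests pays that delay once per request unless tree idling keeps the service tree active.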

+6 -3

Documentation/kernel-parameters.txt

···
 			it is equivalent to "nosmp", which also disables
 			the IO APIC.

-	max_loop=	[LOOP] Maximum number of loopback devices that can
-			be mounted
-			Format: <1-256>
+	max_loop=	[LOOP] The number of loop block devices that get
+	(loop.max_loop)	unconditionally pre-created at init time. The default
+			number is configured by BLK_DEV_LOOP_MIN_COUNT. Instead
+			of statically allocating a predefined number, loop
+			devices can be requested on-demand with the
+			/dev/loop-control interface.

 	mcatest=	[IA-64]


+10

block/Kconfig

···

 	  If unsure, say Y.

+config BLK_DEV_BSGLIB
+	bool "Block layer SG support v4 helper lib"
+	default n
+	select BLK_DEV_BSG
+	help
+	  Subsystems will normally enable this if needed. Users will not
+	  normally need to manually enable this.
+
+	  If unsure, say N.
+
 config BLK_DEV_INTEGRITY
 	bool "Block layer data integrity support"
 	---help---

+1

block/Makefile

···
 			blk-iopoll.o blk-lib.o ioctl.o genhd.o scsi_ioctl.o

 obj-$(CONFIG_BLK_DEV_BSG)	+= bsg.o
+obj-$(CONFIG_BLK_DEV_BSGLIB)	+= bsg-lib.o
 obj-$(CONFIG_BLK_CGROUP)	+= blk-cgroup.o
 obj-$(CONFIG_BLK_DEV_THROTTLING)	+= blk-throttle.o
 obj-$(CONFIG_IOSCHED_NOOP)	+= noop-iosched.o

+6 -2

block/blk-core.c

···
 int blk_insert_cloned_request(struct request_queue *q, struct request *rq)
 {
 	unsigned long flags;
+	int where = ELEVATOR_INSERT_BACK;

 	if (blk_rq_check_limits(q, rq))
 		return -EIO;
···
 	 */
 	BUG_ON(blk_queued_rq(rq));

-	add_acct_request(q, rq, ELEVATOR_INSERT_BACK);
+	if (rq->cmd_flags & (REQ_FLUSH|REQ_FUA))
+		where = ELEVATOR_INSERT_FLUSH;
+
+	add_acct_request(q, rq, where);
 	spin_unlock_irqrestore(q->queue_lock, flags);

 	return 0;
···
  * %false - we are done with this request
  * %true - still buffers pending for this request
  **/
-static bool __blk_end_bidi_request(struct request *rq, int error,
+bool __blk_end_bidi_request(struct request *rq, int error,
 				   unsigned int nr_bytes, unsigned int bidi_bytes)
 {
 	if (blk_update_bidi_request(rq, error, nr_bytes, bidi_bytes))
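
The insertion-point change in `blk_insert_cloned_request()` can be read as a small decision function. The sketch below is a user-space model only: the flag bits and the two position constants are stand-ins with made-up values, not the kernel's `REQ_*` or `ELEVATOR_INSERT_*` definitions.

```c
/* Stand-in values for illustration; the real ones live in kernel headers. */
enum insert_pos { INSERT_BACK = 0, INSERT_FLUSH = 1 };

#define F_FLUSH (1u << 0)	/* stand-in for REQ_FLUSH */
#define F_FUA   (1u << 1)	/* stand-in for REQ_FUA */

/*
 * Mirrors the logic added in the hunk above: a cloned request carrying a
 * flush or FUA flag is routed to the flush insertion point, everything
 * else is appended at the back as before.
 */
static enum insert_pos cloned_insert_position(unsigned int cmd_flags)
{
	enum insert_pos where = INSERT_BACK;

	if (cmd_flags & (F_FLUSH | F_FUA))
		where = INSERT_FLUSH;
	return where;
}
```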

+19 -6

block/blk-flush.c

···
 {
 	unsigned int policy = 0;

+	if (blk_rq_sectors(rq))
+		policy |= REQ_FSEQ_DATA;
+
 	if (fflags & REQ_FLUSH) {
 		if (rq->cmd_flags & REQ_FLUSH)
 			policy |= REQ_FSEQ_PREFLUSH;
-		if (blk_rq_sectors(rq))
-			policy |= REQ_FSEQ_DATA;
 		if (!(fflags & REQ_FUA) && (rq->cmd_flags & REQ_FUA))
 			policy |= REQ_FSEQ_POSTFLUSH;
 	}
···

 	/* make @rq a normal request */
 	rq->cmd_flags &= ~REQ_FLUSH_SEQ;
-	rq->end_io = NULL;
+	rq->end_io = rq->flush.saved_end_io;
 }

 /**
···
 	unsigned int fflags = q->flush_flags;	/* may change, cache */
 	unsigned int policy = blk_flush_policy(fflags, rq);

-	BUG_ON(rq->end_io);
-	BUG_ON(!rq->bio || rq->bio != rq->biotail);
-
 	/*
 	 * @policy now records what operations need to be done. Adjust
 	 * REQ_FLUSH and FUA for the driver.
···
 		rq->cmd_flags &= ~REQ_FUA;

 	/*
+	 * An empty flush handed down from a stacking driver may
+	 * translate into nothing if the underlying device does not
+	 * advertise a write-back cache. In this case, simply
+	 * complete the request.
+	 */
+	if (!policy) {
+		__blk_end_bidi_request(rq, 0, 0, 0);
+		return;
+	}
+
+	BUG_ON(!rq->bio || rq->bio != rq->biotail);
+
+	/*
 	 * If there's data but flush is not necessary, the request can be
 	 * processed directly without going through flush machinery. Queue
 	 * for normal execution.
···
 	if ((policy & REQ_FSEQ_DATA) &&
 	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
 		list_add_tail(&rq->queuelist, &q->queue_head);
+		blk_run_queue_async(q);
 		return;
 	}

···
 	memset(&rq->flush, 0, sizeof(rq->flush));
 	INIT_LIST_HEAD(&rq->flush.list);
 	rq->cmd_flags |= REQ_FLUSH_SEQ;
+	rq->flush.saved_end_io = rq->end_io; /* Usually NULL */
 	rq->end_io = flush_data_end_io;

 	blk_flush_complete_seq(rq, REQ_FSEQ_ACTIONS & ~policy, 0);
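
To see what the reordered `blk_flush_policy()` computes, here is a user-space model of the post-patch logic. The bit values and names (`F_FLUSH`, `SEQ_DATA`, and so on) are placeholders chosen for this sketch, and the request is reduced to its flags and sector count.

```c
/* Placeholder bits standing in for REQ_FLUSH / REQ_FUA. */
#define F_FLUSH (1u << 0)
#define F_FUA   (1u << 1)

/* Placeholder bits standing in for the REQ_FSEQ_* steps. */
#define SEQ_PREFLUSH  (1u << 0)
#define SEQ_DATA      (1u << 1)
#define SEQ_POSTFLUSH (1u << 2)

/*
 * Model of the policy after this change: the data step is recorded
 * whenever the request carries sectors, regardless of whether the queue
 * advertises a write-back cache. Flush steps are only added when the
 * queue itself has the flush capability set.
 */
static unsigned int flush_policy(unsigned int queue_flags,
				 unsigned int rq_flags,
				 unsigned int rq_sectors)
{
	unsigned int policy = 0;

	if (rq_sectors)
		policy |= SEQ_DATA;

	if (queue_flags & F_FLUSH) {
		if (rq_flags & F_FLUSH)
			policy |= SEQ_PREFLUSH;
		if (!(queue_flags & F_FUA) && (rq_flags & F_FUA))
			policy |= SEQ_POSTFLUSH;
	}
	return policy;
}
```

An empty flush sent to a queue without a write-back cache yields a policy of 0 in this model, which is the case the new `if (!policy)` branch completes immediately.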

+8

block/blk-softirq.c

···
 	} else
 		ccpu = cpu;

+	/*
+	 * If current CPU and requested CPU are in the same group, running
+	 * softirq in current CPU. One might concern this is just like
+	 * QUEUE_FLAG_SAME_FORCE, but actually not. blk_complete_request() is
+	 * running in interrupt handler, and currently I/O controller doesn't
+	 * support multiple interrupts, so current CPU is unique actually. This
+	 * avoids IPI sending from current CPU to the first CPU of a group.
+	 */
 	if (ccpu == cpu || ccpu == group_cpu) {
 		struct list_head *list;
 do_local:

+2 -2

block/blk-throttle.c

···
 static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
 {
 	bool rw = bio_data_dir(bio);
-	bool sync = bio->bi_rw & REQ_SYNC;
+	bool sync = rw_is_sync(bio->bi_rw);

 	/* Charge the bio to the group */
 	tg->bytes_disp[rw] += bio->bi_size;
···

 	if (tg_no_rule_group(tg, rw)) {
 		blkiocg_update_dispatch_stats(&tg->blkg, bio->bi_size,
-				rw, bio->bi_rw & REQ_SYNC);
+				rw, rw_is_sync(bio->bi_rw));
 		rcu_read_unlock();
 		return 0;
 	}
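
The practical difference between the two checks shows up for reads. The sketch below assumes that `rw_is_sync()` treats every read as synchronous and a write as synchronous only when the sync flag is set, which is my understanding of the helper; the flag bits are placeholders, not the kernel's values.

```c
#include <stdbool.h>

#define F_WRITE (1u << 0)	/* placeholder for REQ_WRITE */
#define F_SYNC  (1u << 4)	/* placeholder for REQ_SYNC */

/* Old check from the removed lines: only the sync bit is consulted. */
static bool sync_by_flag_only(unsigned int rw)
{
	return (rw & F_SYNC) != 0;
}

/*
 * Assumed behaviour of rw_is_sync(): reads are always synchronous,
 * writes only when explicitly flagged.
 */
static bool sync_like_rw_is_sync(unsigned int rw)
{
	return !(rw & F_WRITE) || (rw & F_SYNC);
}
```

Under that assumption, a plain read without the sync bit was accounted as async by the old expression and is accounted as sync after the change.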

+2

block/blk.h

···
 		      struct bio *bio);
 void blk_dequeue_request(struct request *rq);
 void __blk_queue_free_tags(struct request_queue *q);
+bool __blk_end_bidi_request(struct request *rq, int error,
+			    unsigned int nr_bytes, unsigned int bidi_bytes);

 void blk_rq_timed_out_timer(unsigned long data);
 void blk_delete_timer(struct request *);

+298

block/bsg-lib.c

···
+/*
+ * BSG helper library
+ *
+ * Copyright (C) 2008 James Smart, Emulex Corporation
+ * Copyright (C) 2011 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2011 Mike Christie
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ */
+#include <linux/slab.h>
+#include <linux/blkdev.h>
+#include <linux/delay.h>
+#include <linux/scatterlist.h>
+#include <linux/bsg-lib.h>
+#include <linux/module.h>
+#include <scsi/scsi_cmnd.h>
+
+/**
+ * bsg_destroy_job - routine to teardown/delete a bsg job
+ * @job: bsg_job that is to be torn down
+ */
+static void bsg_destroy_job(struct bsg_job *job)
+{
+	put_device(job->dev);	/* release reference for the request */
+
+	kfree(job->request_payload.sg_list);
+	kfree(job->reply_payload.sg_list);
+	kfree(job);
+}
+
+/**
+ * bsg_job_done - completion routine for bsg requests
+ * @job: bsg_job that is complete
+ * @result: job reply result
+ * @reply_payload_rcv_len: length of payload recvd
+ *
+ * The LLD should call this when the bsg job has completed.
+ */
+void bsg_job_done(struct bsg_job *job, int result,
+		  unsigned int reply_payload_rcv_len)
+{
+	struct request *req = job->req;
+	struct request *rsp = req->next_rq;
+	int err;
+
+	err = job->req->errors = result;
+	if (err < 0)
+		/* we're only returning the result field in the reply */
+		job->req->sense_len = sizeof(u32);
+	else
+		job->req->sense_len = job->reply_len;
+	/* we assume all request payload was transferred, residual == 0 */
+	req->resid_len = 0;
+
+	if (rsp) {
+		WARN_ON(reply_payload_rcv_len > rsp->resid_len);
+
+		/* set reply (bidi) residual */
+		rsp->resid_len -= min(reply_payload_rcv_len, rsp->resid_len);
+	}
+	blk_complete_request(req);
+}
+EXPORT_SYMBOL_GPL(bsg_job_done);
+
+/**
+ * bsg_softirq_done - softirq done routine for destroying the bsg requests
+ * @rq: BSG request that holds the job to be destroyed
+ */
+static void bsg_softirq_done(struct request *rq)
+{
+	struct bsg_job *job = rq->special;
+
+	blk_end_request_all(rq, rq->errors);
+	bsg_destroy_job(job);
+}
+
+static int bsg_map_buffer(struct bsg_buffer *buf, struct request *req)
+{
+	size_t sz = (sizeof(struct scatterlist) * req->nr_phys_segments);
+
+	BUG_ON(!req->nr_phys_segments);
+
+	buf->sg_list = kzalloc(sz, GFP_KERNEL);
+	if (!buf->sg_list)
+		return -ENOMEM;
+	sg_init_table(buf->sg_list, req->nr_phys_segments);
+	buf->sg_cnt = blk_rq_map_sg(req->q, req, buf->sg_list);
+	buf->payload_len = blk_rq_bytes(req);
+	return 0;
+}
+
+/**
+ * bsg_create_job - create the bsg_job structure for the bsg request
+ * @dev: device that is being sent the bsg request
+ * @req: BSG request that needs a job structure
+ */
+static int bsg_create_job(struct device *dev, struct request *req)
+{
+	struct request *rsp = req->next_rq;
+	struct request_queue *q = req->q;
+	struct bsg_job *job;
+	int ret;
+
+	BUG_ON(req->special);
+
+	job = kzalloc(sizeof(struct bsg_job) + q->bsg_job_size, GFP_KERNEL);
+	if (!job)
+		return -ENOMEM;
+
+	req->special = job;
+	job->req = req;
+	if (q->bsg_job_size)
+		job->dd_data = (void *)&job[1];
+	job->request = req->cmd;
+	job->request_len = req->cmd_len;
+	job->reply = req->sense;
+	job->reply_len = SCSI_SENSE_BUFFERSIZE;	/* Size of sense buffer
+						 * allocated */
+	if (req->bio) {
+		ret = bsg_map_buffer(&job->request_payload, req);
+		if (ret)
+			goto failjob_rls_job;
+	}
+	if (rsp && rsp->bio) {
+		ret = bsg_map_buffer(&job->reply_payload, rsp);
+		if (ret)
+			goto failjob_rls_rqst_payload;
+	}
+	job->dev = dev;
+	/* take a reference for the request */
+	get_device(job->dev);
+	return 0;
+
+failjob_rls_rqst_payload:
+	kfree(job->request_payload.sg_list);
+failjob_rls_job:
+	kfree(job);
+	return -ENOMEM;
+}
+
+/*
+ * bsg_goose_queue - restart queue in case it was stopped
+ * @q: request q to be restarted
+ */
+void bsg_goose_queue(struct request_queue *q)
+{
+	if (!q)
+		return;
+
+	blk_run_queue_async(q);
+}
+EXPORT_SYMBOL_GPL(bsg_goose_queue);
+
+/**
+ * bsg_request_fn - generic handler for bsg requests
+ * @q: request queue to manage
+ *
+ * On error the create_bsg_job function should return a -Exyz error value
+ * that will be set to the req->errors.
+ *
+ * Drivers/subsys should pass this to the queue init function.
+ */
+void bsg_request_fn(struct request_queue *q)
+{
+	struct device *dev = q->queuedata;
+	struct request *req;
+	struct bsg_job *job;
+	int ret;
+
+	if (!get_device(dev))
+		return;
+
+	while (1) {
+		req = blk_fetch_request(q);
+		if (!req)
+			break;
+		spin_unlock_irq(q->queue_lock);
+
+		ret = bsg_create_job(dev, req);
+		if (ret) {
+			req->errors = ret;
+			blk_end_request_all(req, ret);
+			spin_lock_irq(q->queue_lock);
+			continue;
+		}
+
+		job = req->special;
+		ret = q->bsg_job_fn(job);
+		spin_lock_irq(q->queue_lock);
+		if (ret)
+			break;
+	}
+
+	spin_unlock_irq(q->queue_lock);
+	put_device(dev);
+	spin_lock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL_GPL(bsg_request_fn);
+
+/**
+ * bsg_setup_queue - Create and add the bsg hooks so we can receive requests
+ * @dev: device to attach bsg device to
+ * @q: request queue setup by caller
+ * @name: device to give bsg device
+ * @job_fn: bsg job handler
+ * @dd_job_size: size of LLD data needed for each job
+ *
+ * The caller should have setup the reuqest queue with bsg_request_fn
+ * as the request_fn.
+ */
+int bsg_setup_queue(struct device *dev, struct request_queue *q,
+		    char *name, bsg_job_fn *job_fn, int dd_job_size)
+{
+	int ret;
+
+	q->queuedata = dev;
+	q->bsg_job_size = dd_job_size;
+	q->bsg_job_fn = job_fn;
+	queue_flag_set_unlocked(QUEUE_FLAG_BIDI, q);
+	blk_queue_softirq_done(q, bsg_softirq_done);
+	blk_queue_rq_timeout(q, BLK_DEFAULT_SG_TIMEOUT);
+
+	ret = bsg_register_queue(q, dev, name, NULL);
+	if (ret) {
+		printk(KERN_ERR "%s: bsg interface failed to "
+		       "initialize - register queue\n", dev->kobj.name);
+		return ret;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(bsg_setup_queue);
+
+/**
+ * bsg_remove_queue - Deletes the bsg dev from the q
+ * @q: the request_queue that is to be torn down.
+ *
+ * Notes:
+ * Before unregistering the queue empty any requests that are blocked
+ */
+void bsg_remove_queue(struct request_queue *q)
+{
+	struct request *req; /* block request */
+	int counts; /* totals for request_list count and starved */
+
+	if (!q)
+		return;
+
+	/* Stop taking in new requests */
+	spin_lock_irq(q->queue_lock);
+	blk_stop_queue(q);
+
+	/* drain all requests in the queue */
+	while (1) {
+		/* need the lock to fetch a request
+		 * this may fetch the same reqeust as the previous pass
+		 */
+		req = blk_fetch_request(q);
+		/* save requests in use and starved */
+		counts = q->rq.count[0] + q->rq.count[1] +
+			 q->rq.starved[0] + q->rq.starved[1];
+		spin_unlock_irq(q->queue_lock);
+		/* any requests still outstanding? */
+		if (counts == 0)
+			break;
+
+		/* This may be the same req as the previous iteration,
+		 * always send the blk_end_request_all after a prefetch.
+		 * It is not okay to not end the request because the
+		 * prefetch started the request.
+		 */
+		if (req) {
+			/* return -ENXIO to indicate that this queue is
+			 * going away
+			 */
+			req->errors = -ENXIO;
+			blk_end_request_all(req, -ENXIO);
+		}
+
+		msleep(200); /* allow bsg to possibly finish */
+		spin_lock_irq(q->queue_lock);
+	}
+	bsg_unregister_queue(q);
+}
+EXPORT_SYMBOL_GPL(bsg_remove_queue);
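
`bsg_create_job()` allocates the job and the driver's private area in one block and points `dd_data` just past the job structure. A user-space sketch of that layout follows; `struct job` and `job_alloc()` are hypothetical stand-ins, with `calloc()` playing the role of `kzalloc()`.

```c
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct bsg_job. */
struct job {
	void *dd_data;	/* driver-private area, or NULL if none requested */
	int result;
};

/*
 * Allocate a zeroed job with `dd_size` bytes of private data placed
 * directly behind it, mirroring the single-allocation pattern above.
 * The private area inherits the alignment of struct job; a driver that
 * needs stricter alignment would have to account for that itself.
 */
static struct job *job_alloc(size_t dd_size)
{
	struct job *job = calloc(1, sizeof(*job) + dd_size);

	if (!job)
		return NULL;
	if (dd_size)
		job->dd_data = &job[1];	/* first byte after the struct */
	return job;
}
```

Because both pieces share one allocation, a single `free()` (or `kfree()` in the kernel) releases the job and its private data together, which is what `bsg_destroy_job()` relies on.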

+21

block/cfq-iosched.c

···
 	unsigned long slice_end;
 	long slice_resid;

+	/* pending metadata requests */
+	int meta_pending;
 	/* number of requests that are on the dispatch list or inside driver */
 	int dispatched;

···
 	if (rq_is_sync(rq1) != rq_is_sync(rq2))
 		return rq_is_sync(rq1) ? rq1 : rq2;

+	if ((rq1->cmd_flags ^ rq2->cmd_flags) & REQ_META)
+		return rq1->cmd_flags & REQ_META ? rq1 : rq2;
+
 	s1 = blk_rq_pos(rq1);
 	s2 = blk_rq_pos(rq2);

···

 	hlist_del_init(&cfqg->cfqd_node);

+	BUG_ON(cfqd->nr_blkcg_linked_grps <= 0);
+	cfqd->nr_blkcg_linked_grps--;
+
 	/*
 	 * Put the reference taken at the time of creation so that when all
 	 * queues are gone, group can be destroyed.
···
 	cfqq->cfqd->rq_queued--;
 	cfq_blkiocg_update_io_remove_stats(&(RQ_CFQG(rq))->blkg,
 					rq_data_dir(rq), rq_is_sync(rq));
+	if (rq->cmd_flags & REQ_META) {
+		WARN_ON(!cfqq->meta_pending);
+		cfqq->meta_pending--;
+	}
 }

 static int cfq_merge(struct request_queue *q, struct request **req,
···
 		return true;

 	/*
+	 * So both queues are sync. Let the new request get disk time if
+	 * it's a metadata request and the current queue is doing regular IO.
+	 */
+	if ((rq->cmd_flags & REQ_META) && !cfqq->meta_pending)
+		return true;
+
+	/*
 	 * Allow an RT request to pre-empt an ongoing non-RT cfqq timeslice.
 	 */
 	if (cfq_class_rt(new_cfqq) && !cfq_class_rt(cfqq))
···
 	struct cfq_io_context *cic = RQ_CIC(rq);

 	cfqd->rq_queued++;
+	if (rq->cmd_flags & REQ_META)
+		cfqq->meta_pending++;

 	cfq_update_io_thinktime(cfqd, cfqq, cic);
 	cfq_update_io_seektime(cfqd, cfqq, rq);
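
The XOR test added to `cfq_choose_req()` fires only when exactly one of the two requests is flagged as metadata. A stand-alone sketch of that tie-break, with a hypothetical `struct rq` and a placeholder flag bit instead of the kernel's `REQ_META`:

```c
#include <stddef.h>

#define F_META (1u << 0)	/* placeholder for REQ_META */

/* Hypothetical, minimal stand-in for struct request. */
struct rq {
	unsigned int cmd_flags;
};

/*
 * Returns the metadata request when exactly one of the two carries the
 * flag. Returns NULL when both or neither do, in which case the caller
 * would fall through to the position-based comparison.
 */
static const struct rq *prefer_meta(const struct rq *a, const struct rq *b)
{
	if ((a->cmd_flags ^ b->cmd_flags) & F_META)
		return (a->cmd_flags & F_META) ? a : b;
	return NULL;
}
```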

+4 -4

block/genhd.c

···
 		cpu = part_stat_lock();
 		part_round_stats(cpu, hd);
 		part_stat_unlock();
-		seq_printf(seqf, "%4d %7d %s %lu %lu %llu "
-			   "%u %lu %lu %llu %u %u %u %u\n",
+		seq_printf(seqf, "%4d %7d %s %lu %lu %lu "
+			   "%u %lu %lu %lu %u %u %u %u\n",
 			   MAJOR(part_devt(hd)), MINOR(part_devt(hd)),
 			   disk_name(gp, hd->partno, buf),
 			   part_stat_read(hd, ios[READ]),
 			   part_stat_read(hd, merges[READ]),
-			   (unsigned long long)part_stat_read(hd, sectors[READ]),
+			   part_stat_read(hd, sectors[READ]),
 			   jiffies_to_msecs(part_stat_read(hd, ticks[READ])),
 			   part_stat_read(hd, ios[WRITE]),
 			   part_stat_read(hd, merges[WRITE]),
-			   (unsigned long long)part_stat_read(hd, sectors[WRITE]),
+			   part_stat_read(hd, sectors[WRITE]),
 			   jiffies_to_msecs(part_stat_read(hd, ticks[WRITE])),
 			   part_in_flight(hd),
 			   jiffies_to_msecs(part_stat_read(hd, io_ticks)),

+16 -1

drivers/block/Kconfig

···

 	  Most users will answer N here.

+config BLK_DEV_LOOP_MIN_COUNT
+	int "Number of loop devices to pre-create at init time"
+	depends on BLK_DEV_LOOP
+	default 8
+	help
+	  Static number of loop devices to be unconditionally pre-created
+	  at init time.
+
+	  This default value can be overwritten on the kernel command
+	  line or with module-parameter loop.max_loop.
+
+	  The historic default is 8. If a late 2011 version of losetup(8)
+	  is used, it can be set to 0, since needed loop devices can be
+	  dynamically allocated with the /dev/loop-control interface.
+
 config BLK_DEV_CRYPTOLOOP
 	tristate "Cryptoloop Support"
 	select CRYPTO
···
 	  in another domain which drives the actual block device.

 config XEN_BLKDEV_BACKEND
-	tristate "Block-device backend driver"
+	tristate "Xen block-device backend driver"
 	depends on XEN_BACKEND
 	help
 	  The block-device backend driver allows the kernel to export its
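
For readers who want to try the on-demand path that the help text refers to, a minimal user-space sketch follows. It assumes a kernel that provides `/dev/loop-control`; the helper name `loop_get_free_index()` is made up for this example, and the ioctl number is defined locally as a fallback in case `<linux/loop.h>` is not included.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Value of LOOP_CTL_GET_FREE as defined in <linux/loop.h>. */
#ifndef LOOP_CTL_GET_FREE
#define LOOP_CTL_GET_FREE 0x4C82
#endif

/*
 * Ask the control device at `ctl_path` for the index of a free loop
 * device, letting the kernel create one if none is unbound.
 * Returns the index (N in /dev/loopN) on success, or -1 on failure
 * with errno set by open() or ioctl().
 */
static int loop_get_free_index(const char *ctl_path)
{
	int fd, idx;

	fd = open(ctl_path, O_RDWR);
	if (fd < 0)
		return -1;

	idx = ioctl(fd, LOOP_CTL_GET_FREE);
	close(fd);
	return idx;
}
```

A caller would typically pass `"/dev/loop-control"` and then open `/dev/loopN` with the returned index; opening the control node usually requires root privileges.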

+2 -2

drivers/block/drbd/drbd_nl.c

···

 	/* silently ignore cpu mask on UP kernel */
 	if (nr_cpu_ids > 1 && sc.cpu_mask[0] != 0) {
-		err = __bitmap_parse(sc.cpu_mask, 32, 0,
+		err = bitmap_parse(sc.cpu_mask, 32,
 				cpumask_bits(new_cpu_mask), nr_cpu_ids);
 		if (err) {
-			dev_warn(DEV, "__bitmap_parse() failed with %d\n", err);
+			dev_warn(DEV, "bitmap_parse() failed with %d\n", err);
 			retcode = ERR_CPU_MASK_PARSE;
 			goto fail;
 		}

+207 -92

drivers/block/loop.c

···
 #include <linux/kthread.h>
 #include <linux/splice.h>
 #include <linux/sysfs.h>
-
+#include <linux/miscdevice.h>
 #include <asm/uaccess.h>

-static LIST_HEAD(loop_devices);
-static DEFINE_MUTEX(loop_devices_mutex);
+static DEFINE_IDR(loop_index_idr);
+static DEFINE_MUTEX(loop_index_mutex);

 static int max_part;
 static int part_shift;
···
 static ssize_t loop_attr_show(struct device *dev, char *page,
 			      ssize_t (*callback)(struct loop_device *, char *))
 {
-	struct loop_device *l, *lo = NULL;
+	struct gendisk *disk = dev_to_disk(dev);
+	struct loop_device *lo = disk->private_data;

-	mutex_lock(&loop_devices_mutex);
-	list_for_each_entry(l, &loop_devices, lo_list)
-		if (disk_to_dev(l->lo_disk) == dev) {
-			lo = l;
-			break;
-		}
-	mutex_unlock(&loop_devices_mutex);
-
-	return lo ? callback(lo, page) : -EIO;
+	return callback(lo, page);
 }

 #define LOOP_ATTR_RO(_name)						\
···
 	ssize_t ret;
 	char *p = NULL;

-	mutex_lock(&lo->lo_ctl_mutex);
+	spin_lock_irq(&lo->lo_lock);
 	if (lo->lo_backing_file)
 		p = d_path(&lo->lo_backing_file->f_path, buf, PAGE_SIZE - 1);
-	mutex_unlock(&lo->lo_ctl_mutex);
+	spin_unlock_irq(&lo->lo_lock);

 	if (IS_ERR_OR_NULL(p))
 		ret = PTR_ERR(p);
···

 	kthread_stop(lo->lo_thread);

+	spin_lock_irq(&lo->lo_lock);
 	lo->lo_backing_file = NULL;
+	spin_unlock_irq(&lo->lo_lock);

 	loop_release_xfer(lo);
 	lo->transfer = NULL;
···

 static int lo_open(struct block_device *bdev, fmode_t mode)
 {
-	struct loop_device *lo = bdev->bd_disk->private_data;
+	struct loop_device *lo;
+	int err = 0;
+
+	mutex_lock(&loop_index_mutex);
+	lo = bdev->bd_disk->private_data;
+	if (!lo) {
+		err = -ENXIO;
+		goto out;
+	}

 	mutex_lock(&lo->lo_ctl_mutex);
 	lo->lo_refcnt++;
 	mutex_unlock(&lo->lo_ctl_mutex);
-
-	return 0;
+out:
+	mutex_unlock(&loop_index_mutex);
+	return err;
 }

 static int lo_release(struct gendisk *disk, fmode_t mode)
···
 	return 0;
 }

+static int unregister_transfer_cb(int id, void *ptr, void *data)
+{
+	struct loop_device *lo = ptr;
+	struct loop_func_table *xfer = data;
+
+	mutex_lock(&lo->lo_ctl_mutex);
+	if (lo->lo_encryption == xfer)
+		loop_release_xfer(lo);
+	mutex_unlock(&lo->lo_ctl_mutex);
+	return 0;
+}
+
 int loop_unregister_transfer(int number)
 {
 	unsigned int n = number;
-	struct loop_device *lo;
 	struct loop_func_table *xfer;

 	if (n == 0 || n >= MAX_LO_CRYPT || (xfer = xfer_funcs[n]) == NULL)
 		return -EINVAL;

 	xfer_funcs[n] = NULL;
-
-	list_for_each_entry(lo, &loop_devices, lo_list) {
-		mutex_lock(&lo->lo_ctl_mutex);
-
-		if (lo->lo_encryption == xfer)
-			loop_release_xfer(lo);
-
-		mutex_unlock(&lo->lo_ctl_mutex);
-	}
-
+	idr_for_each(&loop_index_idr, &unregister_transfer_cb, xfer);
 	return 0;
 }

 EXPORT_SYMBOL(loop_register_transfer);
 EXPORT_SYMBOL(loop_unregister_transfer);

-static struct loop_device *loop_alloc(int i)
+static int loop_add(struct loop_device **l, int i)
 {
 	struct loop_device *lo;
 	struct gendisk *disk;
+	int err;

 	lo = kzalloc(sizeof(*lo), GFP_KERNEL);
-	if (!lo)
+	if (!lo) {
+		err = -ENOMEM;
 		goto out;
+	}
+
+	err = idr_pre_get(&loop_index_idr, GFP_KERNEL);
+	if (err < 0)
+		goto out_free_dev;
+
+	if (i >= 0) {
+		int m;
+
+		/* create specific i in the index */
+		err = idr_get_new_above(&loop_index_idr, lo, i, &m);
+		if (err >= 0 && i != m) {
+			idr_remove(&loop_index_idr, m);
+			err = -EEXIST;
+		}
+	} else if (i == -1) {
+		int m;
+
+		/* get next free nr */
+		err = idr_get_new(&loop_index_idr, lo, &m);
+		if (err >= 0)
+			i = m;
+	} else {
+		err = -EINVAL;
+	}
+	if (err < 0)
+		goto out_free_dev;

 	lo->lo_queue = blk_alloc_queue(GFP_KERNEL);
 	if (!lo->lo_queue)
···
 	disk->private_data	= lo;
 	disk->queue		= lo->lo_queue;
 	sprintf(disk->disk_name, "loop%d", i);
-	return lo;
+	add_disk(disk);
+	*l = lo;
+	return lo->lo_number;

 out_free_queue:
 	blk_cleanup_queue(lo->lo_queue);
 out_free_dev:
 	kfree(lo);
 out:
-	return NULL;
+	return err;
 }

-static void loop_free(struct loop_device *lo)
+static void loop_remove(struct loop_device *lo)
 {
+	del_gendisk(lo->lo_disk);
 	blk_cleanup_queue(lo->lo_queue);
 	put_disk(lo->lo_disk);
-	list_del(&lo->lo_list);
 	kfree(lo);
 }

-static struct loop_device *loop_init_one(int i)
+static int find_free_cb(int id, void *ptr, void *data)
 {
-	struct loop_device *lo;
+	struct loop_device *lo = ptr;
+	struct loop_device **l = data;

-	list_for_each_entry(lo, &loop_devices, lo_list) {
-		if (lo->lo_number == i)
-			return lo;
+	if (lo->lo_state == Lo_unbound) {
+		*l = lo;
+		return 1;
 	}
-
-	lo = loop_alloc(i);
-	if (lo) {
-		add_disk(lo->lo_disk);
-		list_add_tail(&lo->lo_list, &loop_devices);
-	}
-	return lo;
+	return 0;
 }

-static void loop_del_one(struct loop_device *lo)
+static int loop_lookup(struct loop_device **l, int i)
 {
-	del_gendisk(lo->lo_disk);
-	loop_free(lo);
+	struct loop_device *lo;
+	int ret = -ENODEV;
+
+	if (i < 0) {
+		int err;
+
+		err = idr_for_each(&loop_index_idr, &find_free_cb, &lo);
+		if (err == 1) {
+			*l = lo;
+			ret = lo->lo_number;
+		}
+		goto out;
+	}
+
+	/* lookup and return a specific i */
+	lo = idr_find(&loop_index_idr, i);
+	if (lo) {
+		*l = lo;
+		ret = lo->lo_number;
+	}
+out:
+	return ret;
 }

 static struct kobject *loop_probe(dev_t dev, int *part, void *data)
 {
 	struct loop_device *lo;
 	struct kobject *kobj;
+	int err;

-	mutex_lock(&loop_devices_mutex);
-	lo = loop_init_one(MINOR(dev) >> part_shift);
-	kobj = lo ? get_disk(lo->lo_disk) : ERR_PTR(-ENOMEM);
-	mutex_unlock(&loop_devices_mutex);
+	mutex_lock(&loop_index_mutex);
+	err = loop_lookup(&lo, MINOR(dev) >> part_shift);
+	if (err < 0)
+		err = loop_add(&lo, MINOR(dev) >> part_shift);
+	if (err < 0)
+		kobj = ERR_PTR(err);
+	else
+		kobj = get_disk(lo->lo_disk);
+	mutex_unlock(&loop_index_mutex);

 	*part = 0;
 	return kobj;
 }

+static long loop_control_ioctl(struct file *file, unsigned int cmd,
+			       unsigned long parm)
+{
+	struct loop_device *lo;
+	int ret = -ENOSYS;
+
+	mutex_lock(&loop_index_mutex);
+	switch (cmd) {
+	case LOOP_CTL_ADD:
+		ret = loop_lookup(&lo, parm);
+		if (ret >= 0) {
+			ret = -EEXIST;
+			break;
+		}
+		ret = loop_add(&lo, parm);
+		break;
+	case LOOP_CTL_REMOVE:
+		ret = loop_lookup(&lo, parm);
+		if (ret < 0)
+			break;
+		mutex_lock(&lo->lo_ctl_mutex);
+		if (lo->lo_state != Lo_unbound) {
+			ret = -EBUSY;
+			mutex_unlock(&lo->lo_ctl_mutex);
+			break;
+		}
+		if (lo->lo_refcnt > 0) {
+			ret = -EBUSY;
+			mutex_unlock(&lo->lo_ctl_mutex);
+			break;
+		}
+		lo->lo_disk->private_data = NULL;
+		mutex_unlock(&lo->lo_ctl_mutex);
+		idr_remove(&loop_index_idr, lo->lo_number);
+		loop_remove(lo);
+		break;
+	case LOOP_CTL_GET_FREE:
+		ret = loop_lookup(&lo, -1);
+		if (ret >= 0)
+			break;
+		ret = loop_add(&lo, -1);
+	}
+	mutex_unlock(&loop_index_mutex);
+
+	return ret;
+}
+
+static const struct file_operations loop_ctl_fops = {
+	.open		= nonseekable_open,
+	.unlocked_ioctl	= loop_control_ioctl,
+	.compat_ioctl	= loop_control_ioctl,
+	.owner		= THIS_MODULE,
+	.llseek		= noop_llseek,
+};
+
+static struct miscdevice loop_misc = {
+	.minor		= LOOP_CTRL_MINOR,
+	.name		= "loop-control",
+	.fops		= &loop_ctl_fops,
+};
+
+MODULE_ALIAS_MISCDEV(LOOP_CTRL_MINOR);
+MODULE_ALIAS("devname:loop-control");
+
 static int __init loop_init(void)
 {
 	int i, nr;
 	unsigned long range;
-	struct loop_device *lo, *next;
+	struct loop_device *lo;
+	int err;

-	/*
-	 * loop module now has a feature to instantiate underlying device
-	 * structure on-demand, provided that there is an access dev node.
-	 * However, this will not work well with user space tool that doesn't
-	 * know about such "feature". In order to not break any existing
-	 * tool, we do the following:
-	 *
-	 * (1) if max_loop is specified, create that many upfront, and this
-	 * also becomes a hard limit.
-	 * (2) if max_loop is not specified, create 8 loop device on module
-	 * load, user can further extend loop device by create dev node
-	 * themselves and have kernel automatically instantiate actual
-	 * device on-demand.
-	 */
+	err = misc_register(&loop_misc);
+	if (err < 0)
+		return err;

 	part_shift = 0;
 	if (max_part > 0) {
···
 	if (max_loop > 1UL << (MINORBITS - part_shift))
 		return -EINVAL;

+	/*
+	 * If max_loop is specified, create that many devices upfront.
+	 * This also becomes a hard limit. If max_loop is not specified,
+	 * create CONFIG_BLK_DEV_LOOP_MIN_COUNT loop devices at module
+	 * init time. Loop devices can be requested on-demand with the
+	 * /dev/loop-control interface, or be instantiated by accessing
+	 * a 'dead' device node.
+	 */
 	if (max_loop) {
 		nr = max_loop;
 		range = max_loop << part_shift;
 	} else {
-		nr = 8;
+		nr = CONFIG_BLK_DEV_LOOP_MIN_COUNT;
 		range = 1UL << MINORBITS;
 	}

 	if (register_blkdev(LOOP_MAJOR, "loop"))
 		return -EIO;

-	for (i = 0; i < nr; i++) {
-		lo = loop_alloc(i);
-		if (!lo)
-			goto Enomem;
-		list_add_tail(&lo->lo_list, &loop_devices);
-	}
-
-	/* point of no return */
-
-	list_for_each_entry(lo, &loop_devices, lo_list)
-		add_disk(lo->lo_disk);
-
 	blk_register_region(MKDEV(LOOP_MAJOR, 0), range,
 				  THIS_MODULE, loop_probe, NULL, NULL);

+	/* pre-create number of devices given by config or max_loop */
+	mutex_lock(&loop_index_mutex);
+	for (i = 0; i < nr; i++)
+		loop_add(&lo, i);
+	mutex_unlock(&loop_index_mutex);
+
 	printk(KERN_INFO "loop: module loaded\n");
 	return 0;
+}

-Enomem:
-	printk(KERN_INFO "loop: out of memory\n");
+static int loop_exit_cb(int id, void *ptr, void *data)
+{
+	struct loop_device *lo = ptr;

-	list_for_each_entry_safe(lo, next, &loop_devices, lo_list)
-		loop_free(lo);
-
-	unregister_blkdev(LOOP_MAJOR, "loop");
-	return -ENOMEM;
+	loop_remove(lo);
+	return 0;
 }

 static void __exit loop_exit(void)
 {
 	unsigned long range;
-	struct loop_device *lo, *next;

 	range = max_loop ? max_loop << part_shift : 1UL << MINORBITS;

-	list_for_each_entry_safe(lo, next, &loop_devices, lo_list)
-		loop_del_one(lo);
+	idr_for_each(&loop_index_idr, &loop_exit_cb, NULL);
+	idr_remove_all(&loop_index_idr);
+	idr_destroy(&loop_index_idr);

 	blk_unregister_region(MKDEV(LOOP_MAJOR, 0), range);
 	unregister_blkdev(LOOP_MAJOR, "loop");
+
+	misc_deregister(&loop_misc);
 }

 module_init(loop_init);

+1

drivers/block/swim3.c

··· 1184 1184 { 1185 1185 .compatible = "swim3" 1186 1186 }, 1187 + { /* end of list */ } 1187 1188 }; 1188 1189 1189 1190 static struct macio_driver swim3_driver =

+3 -3

drivers/block/xen-blkfront.c

··· 123 123 #define BLKIF_MINOR_EXT(dev) ((dev)&(~EXTENDED)) 124 124 #define EMULATED_HD_DISK_MINOR_OFFSET (0) 125 125 #define EMULATED_HD_DISK_NAME_OFFSET (EMULATED_HD_DISK_MINOR_OFFSET / 256) 126 - #define EMULATED_SD_DISK_MINOR_OFFSET (EMULATED_HD_DISK_MINOR_OFFSET + (4 * 16)) 127 - #define EMULATED_SD_DISK_NAME_OFFSET (EMULATED_HD_DISK_NAME_OFFSET + 4) 126 + #define EMULATED_SD_DISK_MINOR_OFFSET (0) 127 + #define EMULATED_SD_DISK_NAME_OFFSET (EMULATED_SD_DISK_MINOR_OFFSET / 256) 128 128 129 129 #define DEV_NAME "xvd" /* name in /dev */ 130 130 ··· 529 529 minor = BLKIF_MINOR_EXT(info->vdevice); 530 530 nr_parts = PARTS_PER_EXT_DISK; 531 531 offset = minor / nr_parts; 532 - if (xen_hvm_domain() && offset <= EMULATED_HD_DISK_NAME_OFFSET + 4) 532 + if (xen_hvm_domain() && offset < EMULATED_HD_DISK_NAME_OFFSET + 4) 533 533 printk(KERN_WARNING "blkfront: vdevice 0x%x might conflict with " 534 534 "emulated IDE disks,\n\t choose an xvd device name" 535 535 "from xvde on\n", info->vdevice);

+7 -1

drivers/cdrom/cdrom.c

··· 1929 1929 goto out; 1930 1930 1931 1931 s->manufact.len = buf[0] << 8 | buf[1]; 1932 - if (s->manufact.len < 0 || s->manufact.len > 2048) { 1932 + if (s->manufact.len < 0) { 1933 1933 cdinfo(CD_WARNING, "Received invalid manufacture info length" 1934 1934 " (%d)\n", s->manufact.len); 1935 1935 ret = -EIO; 1936 1936 } else { 1937 + if (s->manufact.len > 2048) { 1938 + cdinfo(CD_WARNING, "Received invalid manufacture info " 1939 + "length (%d): truncating to 2048\n", 1940 + s->manufact.len); 1941 + s->manufact.len = 2048; 1942 + } 1937 1943 memcpy(s->manufact.value, &buf[4], s->manufact.len); 1938 1944 } 1939 1945
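
The hunk above changes an oversized manufacturer-info length from a hard error into a truncating copy. A minimal user-space sketch of that clamp follows. The function name `copy_manufact`, the `MANUFACT_MAX` macro and the `buf_len` parameter are hypothetical and not part of the driver; the second bound against `buf_len` is an extra precaution that does not appear in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MANUFACT_MAX 2048	/* destination capacity, the limit used in the hunk */

/*
 * Copy the manufacturer payload out of a raw response buffer.
 * Layout taken from the hunk: bytes 0-1 hold a big-endian length,
 * the payload starts at byte 4.
 * Returns the number of bytes actually copied.
 */
static size_t copy_manufact(unsigned char dst[MANUFACT_MAX],
			    const unsigned char *buf, size_t buf_len)
{
	size_t len;

	assert(dst != NULL && buf != NULL && buf_len >= 4);

	len = ((size_t)buf[0] << 8) | buf[1];
	if (len > MANUFACT_MAX)
		len = MANUFACT_MAX;	/* truncate instead of failing */
	if (len > buf_len - 4)
		len = buf_len - 4;	/* extra guard, not in the diff */

	memcpy(dst, &buf[4], len);
	return len;
}
```

With a length field of 0x1000 (4096) the sketch copies 2048 bytes; the pre-patch check would have rejected the same response with -EIO.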

+4 -3

include/linux/blk_types.h

··· 125 125 __REQ_SYNC, /* request is sync (sync write or read) */ 126 126 __REQ_META, /* metadata io request */ 127 127 __REQ_DISCARD, /* request to discard sectors */ 128 + __REQ_SECURE, /* secure discard (used with __REQ_DISCARD) */ 129 + 128 130 __REQ_NOIDLE, /* don't anticipate more IO after this one */ 131 + __REQ_FUA, /* forced unit access */ 132 + __REQ_FLUSH, /* request for cache flush */ 129 133 130 134 /* bio only flags */ 131 135 __REQ_RAHEAD, /* read ahead, can fail anytime */ ··· 139 135 /* request only flags */ 140 136 __REQ_SORTED, /* elevator knows about this request */ 141 137 __REQ_SOFTBARRIER, /* may not be passed by ioscheduler */ 142 - __REQ_FUA, /* forced unit access */ 143 138 __REQ_NOMERGE, /* don't touch this for merging */ 144 139 __REQ_STARTED, /* drive already may have started this one */ 145 140 __REQ_DONTPREP, /* don't call prep for this one */ ··· 149 146 __REQ_PREEMPT, /* set for "ide_preempt" requests */ 150 147 __REQ_ALLOCED, /* request came from our alloc pool */ 151 148 __REQ_COPY_USER, /* contains copies of user pages */ 152 - __REQ_FLUSH, /* request for cache flush */ 153 149 __REQ_FLUSH_SEQ, /* request for flush sequence */ 154 150 __REQ_IO_STAT, /* account I/O stat */ 155 151 __REQ_MIXED_MERGE, /* merge of different types, fail separately */ 156 - __REQ_SECURE, /* secure discard (used with __REQ_DISCARD) */ 157 152 __REQ_NR_BITS, /* stops here */ 158 153 }; 159 154

+5

include/linux/blkdev.h

··· 30 30 struct blk_trace; 31 31 struct request; 32 32 struct sg_io_hdr; 33 + struct bsg_job; 33 34 34 35 #define BLKDEV_MIN_RQ 4 35 36 #define BLKDEV_MAX_RQ 128 /* Default maximum */ ··· 118 117 struct { 119 118 unsigned int seq; 120 119 struct list_head list; 120 + rq_end_io_fn *saved_end_io; 121 121 } flush; 122 122 }; 123 123 ··· 211 209 typedef void (softirq_done_fn)(struct request *); 212 210 typedef int (dma_drain_needed_fn)(struct request *); 213 211 typedef int (lld_busy_fn) (struct request_queue *q); 212 + typedef int (bsg_job_fn) (struct bsg_job *); 214 213 215 214 enum blk_eh_timer_return { 216 215 BLK_EH_NOT_HANDLED, ··· 378 375 struct mutex sysfs_lock; 379 376 380 377 #if defined(CONFIG_BLK_DEV_BSG) 378 + bsg_job_fn *bsg_job_fn; 379 + int bsg_job_size; 381 380 struct bsg_class_device bsg_dev; 382 381 #endif 383 382

+3 -2

include/linux/blktrace_api.h

··· 14 14 enum blktrace_cat { 15 15 BLK_TC_READ = 1 << 0, /* reads */ 16 16 BLK_TC_WRITE = 1 << 1, /* writes */ 17 - BLK_TC_BARRIER = 1 << 2, /* barrier */ 17 + BLK_TC_FLUSH = 1 << 2, /* flush */ 18 18 BLK_TC_SYNC = 1 << 3, /* sync IO */ 19 19 BLK_TC_SYNCIO = BLK_TC_SYNC, 20 20 BLK_TC_QUEUE = 1 << 4, /* queueing/merging */ ··· 28 28 BLK_TC_META = 1 << 12, /* metadata */ 29 29 BLK_TC_DISCARD = 1 << 13, /* discard requests */ 30 30 BLK_TC_DRV_DATA = 1 << 14, /* binary per-driver data */ 31 + BLK_TC_FUA = 1 << 15, /* fua requests */ 31 32 32 - BLK_TC_END = 1 << 15, /* only 16-bits, reminder */ 33 + BLK_TC_END = 1 << 15, /* we've run out of bits! */ 33 34 }; 34 35 35 36 #define BLK_TC_SHIFT (16)

+73

include/linux/bsg-lib.h

··· 1 + /* 2 + * BSG helper library 3 + * 4 + * Copyright (C) 2008 James Smart, Emulex Corporation 5 + * Copyright (C) 2011 Red Hat, Inc. All rights reserved. 6 + * Copyright (C) 2011 Mike Christie 7 + * 8 + * This program is free software; you can redistribute it and/or modify 9 + * it under the terms of the GNU General Public License as published by 10 + * the Free Software Foundation; either version 2 of the License, or 11 + * (at your option) any later version. 12 + * 13 + * This program is distributed in the hope that it will be useful, 14 + * but WITHOUT ANY WARRANTY; without even the implied warranty of 15 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 16 + * GNU General Public License for more details. 17 + * 18 + * You should have received a copy of the GNU General Public License 19 + * along with this program; if not, write to the Free Software 20 + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA 21 + * 22 + */ 23 + #ifndef _BLK_BSG_ 24 + #define _BLK_BSG_ 25 + 26 + #include <linux/blkdev.h> 27 + 28 + struct request; 29 + struct device; 30 + struct scatterlist; 31 + struct request_queue; 32 + 33 + struct bsg_buffer { 34 + unsigned int payload_len; 35 + int sg_cnt; 36 + struct scatterlist *sg_list; 37 + }; 38 + 39 + struct bsg_job { 40 + struct device *dev; 41 + struct request *req; 42 + 43 + /* Transport/driver specific request/reply structs */ 44 + void *request; 45 + void *reply; 46 + 47 + unsigned int request_len; 48 + unsigned int reply_len; 49 + /* 50 + * On entry : reply_len indicates the buffer size allocated for 51 + * the reply. 52 + * 53 + * Upon completion : the message handler must set reply_len 54 + * to indicates the size of the reply to be returned to the 55 + * caller. 
56 + */ 57 + 58 + /* DMA payloads for the request/response */ 59 + struct bsg_buffer request_payload; 60 + struct bsg_buffer reply_payload; 61 + 62 + void *dd_data; /* Used for driver-specific storage */ 63 + }; 64 + 65 + void bsg_job_done(struct bsg_job *job, int result, 66 + unsigned int reply_payload_rcv_len); 67 + int bsg_setup_queue(struct device *dev, struct request_queue *q, char *name, 68 + bsg_job_fn *job_fn, int dd_job_size); 69 + void bsg_request_fn(struct request_queue *q); 70 + void bsg_remove_queue(struct request_queue *q); 71 + void bsg_goose_queue(struct request_queue *q); 72 + 73 + #endif

+4 -1

include/linux/loop.h

··· 64 64 65 65 struct request_queue *lo_queue; 66 66 struct gendisk *lo_disk; 67 - struct list_head lo_list; 68 67 }; 69 68 70 69 #endif /* __KERNEL__ */ ··· 160 161 #define LOOP_CHANGE_FD 0x4C06 161 162 #define LOOP_SET_CAPACITY 0x4C07 162 163 164 + /* /dev/loop-control interface */ 165 + #define LOOP_CTL_ADD 0x4C80 166 + #define LOOP_CTL_REMOVE 0x4C81 167 + #define LOOP_CTL_GET_FREE 0x4C82 163 168 #endif
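
The three ioctl numbers added above are meant to be issued on the new /dev/loop-control misc device. A minimal user-space sketch of requesting a free device index is shown below. The helper name `loop_get_free_index` and its `ctl_path` parameter are hypothetical, and the constants are copied from this hunk so the sketch builds without `<linux/loop.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Values copied from the hunk above. */
#ifndef LOOP_CTL_GET_FREE
#define LOOP_CTL_ADD		0x4C80
#define LOOP_CTL_REMOVE		0x4C81
#define LOOP_CTL_GET_FREE	0x4C82
#endif

/*
 * Ask the control node for the index of an unbound loop device.
 * Returns the index N (as in /dev/loopN), or -1 with errno set.
 */
static int loop_get_free_index(const char *ctl_path)
{
	int fd, index, saved_errno;

	assert(ctl_path != NULL);

	fd = open(ctl_path, O_RDWR);
	if (fd < 0)
		return -1;

	index = ioctl(fd, LOOP_CTL_GET_FREE);
	saved_errno = errno;	/* close() may overwrite errno */
	close(fd);
	errno = saved_errno;

	return index;
}
```

A caller would typically pass "/dev/loop-control" and then open /dev/loopN for the returned N. Going by the `loop_control_ioctl()` hunk earlier in this diff, `LOOP_CTL_GET_FREE` first looks for an existing free device and only adds a new one when none is found.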

+1

include/linux/miscdevice.h

··· 40 40 #define BTRFS_MINOR 234 41 41 #define AUTOFS_MINOR 235 42 42 #define MAPPER_CTRL_MINOR 236 43 + #define LOOP_CTRL_MINOR 237 43 44 #define MISC_DYNAMIC_MINOR 255 44 45 45 46 struct device;

+11 -9

include/trace/events/block.h

··· 8 8 #include <linux/blkdev.h> 9 9 #include <linux/tracepoint.h> 10 10 11 + #define RWBS_LEN 8 12 + 11 13 DECLARE_EVENT_CLASS(block_rq_with_error, 12 14 13 15 TP_PROTO(struct request_queue *q, struct request *rq), ··· 21 19 __field( sector_t, sector ) 22 20 __field( unsigned int, nr_sector ) 23 21 __field( int, errors ) 24 - __array( char, rwbs, 6 ) 22 + __array( char, rwbs, RWBS_LEN ) 25 23 __dynamic_array( char, cmd, blk_cmd_buf_len(rq) ) 26 24 ), 27 25 ··· 106 104 __field( sector_t, sector ) 107 105 __field( unsigned int, nr_sector ) 108 106 __field( unsigned int, bytes ) 109 - __array( char, rwbs, 6 ) 107 + __array( char, rwbs, RWBS_LEN ) 110 108 __array( char, comm, TASK_COMM_LEN ) 111 109 __dynamic_array( char, cmd, blk_cmd_buf_len(rq) ) 112 110 ), ··· 185 183 __field( dev_t, dev ) 186 184 __field( sector_t, sector ) 187 185 __field( unsigned int, nr_sector ) 188 - __array( char, rwbs, 6 ) 186 + __array( char, rwbs, RWBS_LEN ) 189 187 __array( char, comm, TASK_COMM_LEN ) 190 188 ), 191 189 ··· 224 222 __field( sector_t, sector ) 225 223 __field( unsigned, nr_sector ) 226 224 __field( int, error ) 227 - __array( char, rwbs, 6 ) 225 + __array( char, rwbs, RWBS_LEN) 228 226 ), 229 227 230 228 TP_fast_assign( ··· 251 249 __field( dev_t, dev ) 252 250 __field( sector_t, sector ) 253 251 __field( unsigned int, nr_sector ) 254 - __array( char, rwbs, 6 ) 252 + __array( char, rwbs, RWBS_LEN ) 255 253 __array( char, comm, TASK_COMM_LEN ) 256 254 ), 257 255 ··· 323 321 __field( dev_t, dev ) 324 322 __field( sector_t, sector ) 325 323 __field( unsigned int, nr_sector ) 326 - __array( char, rwbs, 6 ) 324 + __array( char, rwbs, RWBS_LEN ) 327 325 __array( char, comm, TASK_COMM_LEN ) 328 326 ), 329 327 ··· 458 456 __field( dev_t, dev ) 459 457 __field( sector_t, sector ) 460 458 __field( sector_t, new_sector ) 461 - __array( char, rwbs, 6 ) 459 + __array( char, rwbs, RWBS_LEN ) 462 460 __array( char, comm, TASK_COMM_LEN ) 463 461 ), 464 462 ··· 500 498 __field( unsigned 
int, nr_sector ) 501 499 __field( dev_t, old_dev ) 502 500 __field( sector_t, old_sector ) 503 - __array( char, rwbs, 6 ) 501 + __array( char, rwbs, RWBS_LEN) 504 502 ), 505 503 506 504 TP_fast_assign( ··· 544 542 __field( unsigned int, nr_sector ) 545 543 __field( dev_t, old_dev ) 546 544 __field( sector_t, old_sector ) 547 - __array( char, rwbs, 6 ) 545 + __array( char, rwbs, RWBS_LEN) 548 546 ), 549 547 550 548 TP_fast_assign(

+16 -5

kernel/trace/blktrace.c

··· 206 206 what |= MASK_TC_BIT(rw, RAHEAD); 207 207 what |= MASK_TC_BIT(rw, META); 208 208 what |= MASK_TC_BIT(rw, DISCARD); 209 + what |= MASK_TC_BIT(rw, FLUSH); 210 + what |= MASK_TC_BIT(rw, FUA); 209 211 210 212 pid = tsk->pid; 211 213 if (act_log_check(bt, what, sector, pid)) ··· 1056 1054 goto out; 1057 1055 } 1058 1056 1057 + if (tc & BLK_TC_FLUSH) 1058 + rwbs[i++] = 'F'; 1059 + 1059 1060 if (tc & BLK_TC_DISCARD) 1060 1061 rwbs[i++] = 'D'; 1061 1062 else if (tc & BLK_TC_WRITE) ··· 1068 1063 else 1069 1064 rwbs[i++] = 'N'; 1070 1065 1066 + if (tc & BLK_TC_FUA) 1067 + rwbs[i++] = 'F'; 1071 1068 if (tc & BLK_TC_AHEAD) 1072 1069 rwbs[i++] = 'A'; 1073 - if (tc & BLK_TC_BARRIER) 1074 - rwbs[i++] = 'B'; 1075 1070 if (tc & BLK_TC_SYNC) 1076 1071 rwbs[i++] = 'S'; 1077 1072 if (tc & BLK_TC_META) ··· 1137 1132 1138 1133 static int blk_log_action_classic(struct trace_iterator *iter, const char *act) 1139 1134 { 1140 - char rwbs[6]; 1135 + char rwbs[RWBS_LEN]; 1141 1136 unsigned long long ts = iter->ts; 1142 1137 unsigned long nsec_rem = do_div(ts, NSEC_PER_SEC); 1143 1138 unsigned secs = (unsigned long)ts; ··· 1153 1148 1154 1149 static int blk_log_action(struct trace_iterator *iter, const char *act) 1155 1150 { 1156 - char rwbs[6]; 1151 + char rwbs[RWBS_LEN]; 1157 1152 const struct blk_io_trace *t = te_blk_io_trace(iter->ent); 1158 1153 1159 1154 fill_rwbs(rwbs, t); ··· 1566 1561 } mask_maps[] = { 1567 1562 { BLK_TC_READ, "read" }, 1568 1563 { BLK_TC_WRITE, "write" }, 1569 - { BLK_TC_BARRIER, "barrier" }, 1564 + { BLK_TC_FLUSH, "flush" }, 1570 1565 { BLK_TC_SYNC, "sync" }, 1571 1566 { BLK_TC_QUEUE, "queue" }, 1572 1567 { BLK_TC_REQUEUE, "requeue" }, ··· 1578 1573 { BLK_TC_META, "meta" }, 1579 1574 { BLK_TC_DISCARD, "discard" }, 1580 1575 { BLK_TC_DRV_DATA, "drv_data" }, 1576 + { BLK_TC_FUA, "fua" }, 1581 1577 }; 1582 1578 1583 1579 static int blk_trace_str2mask(const char *str) ··· 1794 1788 { 1795 1789 int i = 0; 1796 1790 1791 + if (rw & REQ_FLUSH) 1792 + rwbs[i++] = 
'F'; 1793 + 1797 1794 if (rw & WRITE) 1798 1795 rwbs[i++] = 'W'; 1799 1796 else if (rw & REQ_DISCARD) ··· 1806 1797 else 1807 1798 rwbs[i++] = 'N'; 1808 1799 1800 + if (rw & REQ_FUA) 1801 + rwbs[i++] = 'F'; 1809 1802 if (rw & REQ_RAHEAD) 1810 1803 rwbs[i++] = 'A'; 1811 1804 if (rw & REQ_SYNC)
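
The rwbs changes above can be read as a small string builder: an optional leading 'F' for flush, one operation letter, then optional modifier letters including a second 'F' for FUA. The sketch below models that ordering in user space. The `SK_*` flag values and the name `sketch_fill_rwbs` are hypothetical and are not the kernel's `REQ_*` bits, the 'R' branch is an assumption because the diff elides it, and the modifier order after 'A' follows the `fill_rwbs()` hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RWBS_LEN 8

/* Illustrative flag bits for this sketch only. */
enum sketch_flag {
	SK_WRITE   = 1 << 0,
	SK_DISCARD = 1 << 1,
	SK_FLUSH   = 1 << 2,
	SK_FUA     = 1 << 3,
	SK_RAHEAD  = 1 << 4,
	SK_SYNC    = 1 << 5,
	SK_META    = 1 << 6,
};

static void sketch_fill_rwbs(char rwbs[RWBS_LEN], unsigned int flags,
			     unsigned int bytes)
{
	int i = 0;

	assert(rwbs != NULL);

	if (flags & SK_FLUSH)
		rwbs[i++] = 'F';	/* flush comes before the operation */

	if (flags & SK_WRITE)
		rwbs[i++] = 'W';
	else if (flags & SK_DISCARD)
		rwbs[i++] = 'D';
	else if (bytes)
		rwbs[i++] = 'R';	/* assumed: this branch is elided in the diff */
	else
		rwbs[i++] = 'N';

	if (flags & SK_FUA)
		rwbs[i++] = 'F';	/* FUA comes after the operation */
	if (flags & SK_RAHEAD)
		rwbs[i++] = 'A';
	if (flags & SK_SYNC)
		rwbs[i++] = 'S';
	if (flags & SK_META)
		rwbs[i++] = 'M';

	rwbs[i] = '\0';
}
```

With every flag set, the branches shown produce six characters plus the terminator, which would not fit the old 6-byte array; that is consistent with the switch to `RWBS_LEN`.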
