Linux kernel mirror (for testing): git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

ASoC: Merge up fixes

Some new SOF changes depend on the fixes there.

+6890 -2851
+5
.mailmap
··· 20 20 Adam Radford <aradford@gmail.com> 21 21 Adriana Reus <adi.reus@gmail.com> <adriana.reus@intel.com> 22 22 Adrian Bunk <bunk@stusta.de> 23 + Ajay Kaher <ajay.kaher@broadcom.com> <akaher@vmware.com> 23 24 Akhil P Oommen <quic_akhilpo@quicinc.com> <akhilpo@codeaurora.org> 24 25 Alan Cox <alan@lxorguk.ukuu.org.uk> 25 26 Alan Cox <root@hraefn.swansea.linux.org.uk> ··· 37 36 Alexei Starovoitov <ast@kernel.org> <alexei.starovoitov@gmail.com> 38 37 Alexei Starovoitov <ast@kernel.org> <ast@fb.com> 39 38 Alexei Starovoitov <ast@kernel.org> <ast@plumgrid.com> 39 + Alexey Makhalov <alexey.amakhalov@broadcom.com> <amakhalov@vmware.com> 40 40 Alex Hung <alexhung@gmail.com> <alex.hung@canonical.com> 41 41 Alex Shi <alexs@kernel.org> <alex.shi@intel.com> 42 42 Alex Shi <alexs@kernel.org> <alex.shi@linaro.org> ··· 112 110 Brian Avery <b.avery@hp.com> 113 111 Brian King <brking@us.ibm.com> 114 112 Brian Silverman <bsilver16384@gmail.com> <brian.silverman@bluerivertech.com> 113 + Bryan Tan <bryan-bt.tan@broadcom.com> <bryantan@vmware.com> 115 114 Cai Huoqing <cai.huoqing@linux.dev> <caihuoqing@baidu.com> 116 115 Can Guo <quic_cang@quicinc.com> <cang@codeaurora.org> 117 116 Carl Huang <quic_cjhuang@quicinc.com> <cjhuang@codeaurora.org> ··· 532 529 Roman Gushchin <roman.gushchin@linux.dev> <guro@fb.com> 533 530 Roman Gushchin <roman.gushchin@linux.dev> <guroan@gmail.com> 534 531 Roman Gushchin <roman.gushchin@linux.dev> <klamm@yandex-team.ru> 532 + Ronak Doshi <ronak.doshi@broadcom.com> <doshir@vmware.com> 535 533 Muchun Song <muchun.song@linux.dev> <songmuchun@bytedance.com> 536 534 Muchun Song <muchun.song@linux.dev> <smuchun@gmail.com> 537 535 Ross Zwisler <zwisler@kernel.org> <ross.zwisler@linux.intel.com> ··· 655 651 Viresh Kumar <vireshk@kernel.org> <viresh.linux@gmail.com> 656 652 Viresh Kumar <viresh.kumar@linaro.org> <viresh.kumar@linaro.org> 657 653 Viresh Kumar <viresh.kumar@linaro.org> <viresh.kumar@linaro.com> 654 + Vishnu Dasa <vishnu.dasa@broadcom.com> <vdasa@vmware.com> 658 655 Vivek Aknurwar <quic_viveka@quicinc.com> <viveka@codeaurora.org> 659 656 Vivien Didelot <vivien.didelot@gmail.com> <vivien.didelot@savoirfairelinux.com> 660 657 Vlad Dogaru <ddvlad@gmail.com> <vlad.dogaru@intel.com>
+1 -1
Documentation/admin-guide/kernel-parameters.txt
··· 6599 6599 To turn off having tracepoints sent to printk, 6600 6600 echo 0 > /proc/sys/kernel/tracepoint_printk 6601 6601 Note, echoing 1 into this file without the 6602 - tracepoint_printk kernel cmdline option has no effect. 6602 + tp_printk kernel cmdline option has no effect. 6603 6603 6604 6604 The tp_printk_stop_on_boot (see below) can also be used 6605 6605 to stop the printing of events to console at
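For reference, the runtime toggle described above is just a procfs write. A minimal userspace sketch in C (illustrative only; per the corrected text it has no effect unless the kernel was booted with the tp_printk command-line option):

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
            /* Re-enable tracepoint output to printk at run time. */
            int fd = open("/proc/sys/kernel/tracepoint_printk", O_WRONLY);

            if (fd < 0)
                    return 1;
            write(fd, "1", 1);
            close(fd);
            return 0;
    }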
+2 -2
Documentation/admin-guide/mm/zswap.rst
··· 155 155 156 156 Some users cannot tolerate the swapping that comes with zswap store failures 157 157 and zswap writebacks. Swapping can be disabled entirely (without disabling 158 - zswap itself) on a cgroup-basis as follows: 158 + zswap itself) on a cgroup-basis as follows:: 159 159 160 160 echo 0 > /sys/fs/cgroup/<cgroup-name>/memory.zswap.writeback 161 161 ··· 166 166 When there is a sizable amount of cold memory residing in the zswap pool, it 167 167 can be advantageous to proactively write these cold pages to swap and reclaim 168 168 the memory for other use cases. By default, the zswap shrinker is disabled. 169 - User can enable it as follows: 169 + User can enable it as follows:: 170 170 171 171 echo Y > /sys/module/zswap/parameters/shrinker_enabled 172 172
+2
Documentation/dev-tools/testing-overview.rst
··· 104 104 KASAN and can be used in production. See Documentation/dev-tools/kfence.rst 105 105 * lockdep is a locking correctness validator. See 106 106 Documentation/locking/lockdep-design.rst 107 + * Runtime Verification (RV) supports checking specific behaviours for a given 108 + subsystem. See Documentation/trace/rv/runtime-verification.rst 107 109 * There are several other pieces of debug instrumentation in the kernel, many 108 110 of which can be found in lib/Kconfig.debug 109 111
-2
Documentation/devicetree/bindings/clock/keystone-gate.txt
··· 1 - Status: Unstable - ABI compatibility may be broken in the future 2 - 3 1 Binding for Keystone gate control driver which uses PSC controller IP. 4 2 5 3 This binding uses the common clock binding[1].
-2
Documentation/devicetree/bindings/clock/keystone-pll.txt
··· 1 - Status: Unstable - ABI compatibility may be broken in the future 2 - 3 1 Binding for keystone PLLs. The main PLL IP typically has a multiplier, 4 2 a divider and a post divider. The additional PLL IPs like ARMPLL, DDRPLL 5 3 and PAPLL are controlled by the memory mapped register where as the Main
-2
Documentation/devicetree/bindings/clock/ti/adpll.txt
··· 1 1 Binding for Texas Instruments ADPLL clock. 2 2 3 - Binding status: Unstable - ABI compatibility may be broken in the future 4 - 5 3 This binding uses the common clock binding[1]. It assumes a 6 4 register-mapped ADPLL with two to three selectable input clocks 7 5 and three to four children.
-2
Documentation/devicetree/bindings/clock/ti/apll.txt
··· 1 1 Binding for Texas Instruments APLL clock. 2 2 3 - Binding status: Unstable - ABI compatibility may be broken in the future 4 - 5 3 This binding uses the common clock binding[1]. It assumes a 6 4 register-mapped APLL with usually two selectable input clocks 7 5 (reference clock and bypass clock), with analog phase locked
-2
Documentation/devicetree/bindings/clock/ti/autoidle.txt
··· 1 1 Binding for Texas Instruments autoidle clock. 2 2 3 - Binding status: Unstable - ABI compatibility may be broken in the future 4 - 5 3 This binding uses the common clock binding[1]. It assumes a register mapped 6 4 clock which can be put to idle automatically by hardware based on the usage 7 5 and a configuration bit setting. Autoidle clock is never an individual
-2
Documentation/devicetree/bindings/clock/ti/clockdomain.txt
··· 1 1 Binding for Texas Instruments clockdomain. 2 2 3 - Binding status: Unstable - ABI compatibility may be broken in the future 4 - 5 3 This binding uses the common clock binding[1] in consumer role. 6 4 Every clock on TI SoC belongs to one clockdomain, but software 7 5 only needs this information for specific clocks which require
-2
Documentation/devicetree/bindings/clock/ti/composite.txt
··· 1 1 Binding for TI composite clock. 2 2 3 - Binding status: Unstable - ABI compatibility may be broken in the future 4 - 5 3 This binding uses the common clock binding[1]. It assumes a 6 4 register-mapped composite clock with multiple different sub-types; 7 5
-2
Documentation/devicetree/bindings/clock/ti/divider.txt
··· 1 1 Binding for TI divider clock 2 2 3 - Binding status: Unstable - ABI compatibility may be broken in the future 4 - 5 3 This binding uses the common clock binding[1]. It assumes a 6 4 register-mapped adjustable clock rate divider that does not gate and has 7 5 only one input clock or parent. By default the value programmed into
-2
Documentation/devicetree/bindings/clock/ti/dpll.txt
··· 1 1 Binding for Texas Instruments DPLL clock. 2 2 3 - Binding status: Unstable - ABI compatibility may be broken in the future 4 - 5 3 This binding uses the common clock binding[1]. It assumes a 6 4 register-mapped DPLL with usually two selectable input clocks 7 5 (reference clock and bypass clock), with digital phase locked
-2
Documentation/devicetree/bindings/clock/ti/fapll.txt
··· 1 1 Binding for Texas Instruments FAPLL clock. 2 2 3 - Binding status: Unstable - ABI compatibility may be broken in the future 4 - 5 3 This binding uses the common clock binding[1]. It assumes a 6 4 register-mapped FAPLL with usually two selectable input clocks 7 5 (reference clock and bypass clock), and one or more child
-2
Documentation/devicetree/bindings/clock/ti/fixed-factor-clock.txt
··· 1 1 Binding for TI fixed factor rate clock sources. 2 2 3 - Binding status: Unstable - ABI compatibility may be broken in the future 4 - 5 3 This binding uses the common clock binding[1], and also uses the autoidle 6 4 support from TI autoidle clock [2]. 7 5
-2
Documentation/devicetree/bindings/clock/ti/gate.txt
··· 1 1 Binding for Texas Instruments gate clock. 2 2 3 - Binding status: Unstable - ABI compatibility may be broken in the future 4 - 5 3 This binding uses the common clock binding[1]. This clock is 6 4 quite much similar to the basic gate-clock [2], however, 7 5 it supports a number of additional features. If no register
-2
Documentation/devicetree/bindings/clock/ti/interface.txt
··· 1 1 Binding for Texas Instruments interface clock. 2 2 3 - Binding status: Unstable - ABI compatibility may be broken in the future 4 - 5 3 This binding uses the common clock binding[1]. This clock is 6 4 quite much similar to the basic gate-clock [2], however, 7 5 it supports a number of additional features, including
-2
Documentation/devicetree/bindings/clock/ti/mux.txt
··· 1 1 Binding for TI mux clock. 2 2 3 - Binding status: Unstable - ABI compatibility may be broken in the future 4 - 5 3 This binding uses the common clock binding[1]. It assumes a 6 4 register-mapped multiplexer with multiple input clock signals or 7 5 parents, one of which can be selected as output. This clock does not
+2
Documentation/devicetree/bindings/dts-coding-style.rst
··· 144 144 #dma-cells = <1>; 145 145 clocks = <&clock_controller 0>, <&clock_controller 1>; 146 146 clock-names = "bus", "host"; 147 + #address-cells = <1>; 148 + #size-cells = <1>; 147 149 vendor,custom-property = <2>; 148 150 status = "disabled"; 149 151
+4
Documentation/devicetree/bindings/net/bluetooth/qualcomm-bluetooth.yaml
··· 94 94 95 95 local-bd-address: true 96 96 97 + qcom,local-bd-address-broken: 98 + type: boolean 99 + description: 100 + boot firmware is incorrectly passing the address in big-endian order 97 101 98 102 required: 99 103 - compatible
-3
Documentation/devicetree/bindings/remoteproc/ti,davinci-rproc.txt
··· 1 1 TI Davinci DSP devices 2 2 ======================= 3 3 4 - Binding status: Unstable - Subject to changes for DT representation of clocks 5 - and resets 6 - 7 4 The TI Davinci family of SoCs usually contains a TI DSP Core sub-system that 8 5 is used to offload some of the processor-intensive tasks or algorithms, for 9 6 achieving various system level goals.
+1 -1
Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-dcfg.yaml
··· 51 51 ranges: true 52 52 53 53 patternProperties: 54 - "^clock-controller@[0-9a-z]+$": 54 + "^clock-controller@[0-9a-f]+$": 55 55 $ref: /schemas/clock/fsl,flexspi-clock.yaml# 56 56 57 57 required:
+1 -1
Documentation/devicetree/bindings/soc/fsl/fsl,layerscape-scfg.yaml
··· 41 41 ranges: true 42 42 43 43 patternProperties: 44 - "^interrupt-controller@[a-z0-9]+$": 44 + "^interrupt-controller@[a-f0-9]+$": 45 45 $ref: /schemas/interrupt-controller/fsl,ls-extirq.yaml# 46 46 47 47 required:
+6
Documentation/devicetree/bindings/sound/rt5645.txt
··· 20 20 a GPIO spec for the external headphone detect pin. If jd-mode = 0, 21 21 we will get the JD status by getting the value of hp-detect-gpios. 22 22 23 + - cbj-sleeve-gpios: 24 + a GPIO spec to control the external combo jack circuit to tie the sleeve/ring2 25 + contacts to the ground or floating. It could avoid some electric noise from the 26 + active speaker jacks. 27 + 23 28 - realtek,in2-differential 24 29 Boolean. Indicate MIC2 input are differential, rather than single-ended. 25 30 ··· 73 68 compatible = "realtek,rt5650"; 74 69 reg = <0x1a>; 75 70 hp-detect-gpios = <&gpio 19 0>; 71 + cbj-sleeve-gpios = <&gpio 20 0>; 76 72 interrupt-parent = <&gpio>; 77 73 interrupts = <7 IRQ_TYPE_EDGE_FALLING>; 78 74 realtek,dmic-en = "true";
+1 -1
Documentation/devicetree/bindings/timer/arm,arch_timer_mmio.yaml
··· 60 60 be implemented in an always-on power domain." 61 61 62 62 patternProperties: 63 - '^frame@[0-9a-z]*$': 63 + '^frame@[0-9a-f]+$': 64 64 type: object 65 65 additionalProperties: false 66 66 description: A timer node has up to 8 frame sub-nodes, each with the following properties.
+34 -4
Documentation/devicetree/bindings/ufs/qcom,ufs.yaml
··· 27 27 - qcom,msm8996-ufshc 28 28 - qcom,msm8998-ufshc 29 29 - qcom,sa8775p-ufshc 30 + - qcom,sc7180-ufshc 30 31 - qcom,sc7280-ufshc 32 + - qcom,sc8180x-ufshc 31 33 - qcom,sc8280xp-ufshc 32 34 - qcom,sdm845-ufshc 33 35 - qcom,sm6115-ufshc 36 + - qcom,sm6125-ufshc 34 37 - qcom,sm6350-ufshc 35 38 - qcom,sm8150-ufshc 36 39 - qcom,sm8250-ufshc ··· 45 42 - const: jedec,ufs-2.0 46 43 47 44 clocks: 48 - minItems: 8 45 + minItems: 7 49 46 maxItems: 11 50 47 51 48 clock-names: 52 - minItems: 8 49 + minItems: 7 53 50 maxItems: 11 54 51 55 52 dma-coherent: true ··· 120 117 compatible: 121 118 contains: 122 119 enum: 120 + - qcom,sc7180-ufshc 121 + then: 122 + properties: 123 + clocks: 124 + minItems: 7 125 + maxItems: 7 126 + clock-names: 127 + items: 128 + - const: core_clk 129 + - const: bus_aggr_clk 130 + - const: iface_clk 131 + - const: core_clk_unipro 132 + - const: ref_clk 133 + - const: tx_lane0_sync_clk 134 + - const: rx_lane0_sync_clk 135 + reg: 136 + maxItems: 1 137 + reg-names: 138 + maxItems: 1 139 + 140 + - if: 141 + properties: 142 + compatible: 143 + contains: 144 + enum: 123 145 - qcom,msm8998-ufshc 124 146 - qcom,sa8775p-ufshc 125 147 - qcom,sc7280-ufshc 148 + - qcom,sc8180x-ufshc 126 149 - qcom,sc8280xp-ufshc 127 150 - qcom,sm8250-ufshc 128 151 - qcom,sm8350-ufshc ··· 244 215 contains: 245 216 enum: 246 217 - qcom,sm6115-ufshc 218 + - qcom,sm6125-ufshc 247 219 then: 248 220 properties: 249 221 clocks: ··· 278 248 reg: 279 249 maxItems: 1 280 250 clocks: 281 - minItems: 8 251 + minItems: 7 282 252 maxItems: 8 283 253 else: 284 254 properties: ··· 286 256 minItems: 1 287 257 maxItems: 2 288 258 clocks: 289 - minItems: 8 259 + minItems: 7 290 260 maxItems: 11 291 261 292 262 unevaluatedProperties: false
+1
Documentation/networking/devlink/index.rst
··· 67 67 devlink-selftests 68 68 devlink-trap 69 69 devlink-linecard 70 + devlink-eswitch-attr 70 71 71 72 Driver-specific documentation 72 73 -----------------------------
+1
Documentation/networking/representors.rst
··· 1 1 .. SPDX-License-Identifier: GPL-2.0 2 + .. _representors: 2 3 3 4 ============================= 4 5 Network Function Representors
+24 -18
Documentation/virt/kvm/x86/amd-memory-encryption.rst
··· 46 46 Hence, the ASID for the SEV-enabled guests must be from 1 to a maximum value 47 47 defined in the CPUID 0x8000001f[ecx] field. 48 48 49 - SEV Key Management 50 - ================== 49 + The KVM_MEMORY_ENCRYPT_OP ioctl 50 + =============================== 51 51 52 - The SEV guest key management is handled by a separate processor called the AMD 53 - Secure Processor (AMD-SP). Firmware running inside the AMD-SP provides a secure 54 - key management interface to perform common hypervisor activities such as 55 - encrypting bootstrap code, snapshot, migrating and debugging the guest. For more 56 - information, see the SEV Key Management spec [api-spec]_ 57 - 58 - The main ioctl to access SEV is KVM_MEMORY_ENCRYPT_OP. If the argument 59 - to KVM_MEMORY_ENCRYPT_OP is NULL, the ioctl returns 0 if SEV is enabled 60 - and ``ENOTTY`` if it is disabled (on some older versions of Linux, 61 - the ioctl runs normally even with a NULL argument, and therefore will 62 - likely return ``EFAULT``). If non-NULL, the argument to KVM_MEMORY_ENCRYPT_OP 63 - must be a struct kvm_sev_cmd:: 52 + The main ioctl to access SEV is KVM_MEMORY_ENCRYPT_OP, which operates on 53 + the VM file descriptor. If the argument to KVM_MEMORY_ENCRYPT_OP is NULL, 54 + the ioctl returns 0 if SEV is enabled and ``ENOTTY`` if it is disabled 55 + (on some older versions of Linux, the ioctl tries to run normally even 56 + with a NULL argument, and therefore will likely return ``EFAULT`` instead 57 + of zero if SEV is enabled). If non-NULL, the argument to 58 + KVM_MEMORY_ENCRYPT_OP must be a struct kvm_sev_cmd:: 64 59 65 60 struct kvm_sev_cmd { 66 61 __u32 id; ··· 82 87 The KVM_SEV_INIT command is used by the hypervisor to initialize the SEV platform 83 88 context. In a typical workflow, this command should be the first command issued. 84 89 85 - The firmware can be initialized either by using its own non-volatile storage or 86 - the OS can manage the NV storage for the firmware using the module parameter 87 - ``init_ex_path``. If the file specified by ``init_ex_path`` does not exist or 88 - is invalid, the OS will create or override the file with output from PSP. 89 90 90 91 Returns: 0 on success, -negative on error 91 92 ··· 424 433 issued by the hypervisor to make the guest ready for execution. 425 434 426 435 Returns: 0 on success, -negative on error 436 + 437 + Firmware Management 438 + =================== 439 + 440 + The SEV guest key management is handled by a separate processor called the AMD 441 + Secure Processor (AMD-SP). Firmware running inside the AMD-SP provides a secure 442 + key management interface to perform common hypervisor activities such as 443 + encrypting bootstrap code, snapshot, migrating and debugging the guest. For more 444 + information, see the SEV Key Management spec [api-spec]_ 445 + 446 + The AMD-SP firmware can be initialized either by using its own non-volatile 447 + storage or the OS can manage the NV storage for the firmware using 448 + parameter ``init_ex_path`` of the ``ccp`` module. If the file specified 449 + by ``init_ex_path`` does not exist or is invalid, the OS will create or 450 + override the file with PSP non-volatile storage. 427 451 428 452 References 429 453 ==========
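To make the probe behaviour documented above concrete, a minimal userspace sketch follows; it assumes a vm_fd obtained from KVM_CREATE_VM and the definitions from <linux/kvm.h>, and the helper name is illustrative, not part of the KVM API:

    #include <stddef.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /*
     * With a NULL argument, KVM_MEMORY_ENCRYPT_OP reports whether SEV is
     * enabled: 0 means enabled, ENOTTY means disabled.  As noted above,
     * some older kernels instead fail with EFAULT when SEV is enabled.
     */
    static int sev_enabled(int vm_fd)
    {
            return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, NULL) == 0;
    }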
+9 -10
Documentation/virt/kvm/x86/msr.rst
··· 193 193 Asynchronous page fault (APF) control MSR. 194 194 195 195 Bits 63-6 hold 64-byte aligned physical address of a 64 byte memory area 196 - which must be in guest RAM and must be zeroed. This memory is expected 197 - to hold a copy of the following structure:: 196 + which must be in guest RAM. This memory is expected to hold the 197 + following structure:: 198 198 199 199 struct kvm_vcpu_pv_apf_data { 200 200 /* Used for 'page not present' events delivered via #PF */ ··· 204 204 __u32 token; 205 205 206 206 __u8 pad[56]; 207 - __u32 enabled; 208 207 }; 209 208 210 209 Bits 5-4 of the MSR are reserved and should be zero. Bit 0 is set to 1 ··· 231 232 as regular page fault, guest must reset 'flags' to '0' before it does 232 233 something that can generate normal page fault. 233 234 234 - Bytes 5-7 of 64 byte memory location ('token') will be written to by the 235 + Bytes 4-7 of 64 byte memory location ('token') will be written to by the 235 236 hypervisor at the time of APF 'page ready' event injection. The content 236 - of these bytes is a token which was previously delivered as 'page not 237 - present' event. The event indicates the page in now available. Guest is 238 - supposed to write '0' to 'token' when it is done handling 'page ready' 239 - event and to write 1' to MSR_KVM_ASYNC_PF_ACK after clearing the location; 240 - writing to the MSR forces KVM to re-scan its queue and deliver the next 241 - pending notification. 237 + of these bytes is a token which was previously delivered in CR2 as 238 + 'page not present' event. The event indicates the page is now available. 239 + Guest is supposed to write '0' to 'token' when it is done handling 240 + 'page ready' event and to write '1' to MSR_KVM_ASYNC_PF_ACK after 241 + clearing the location; writing to the MSR forces KVM to re-scan its 242 + queue and deliver the next pending notification. 242 243 243 244 Note, MSR_KVM_ASYNC_PF_INT MSR specifying the interrupt vector for 'page 244 245 ready' APF delivery needs to be written to before enabling APF mechanism
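To make the 'page ready' handshake above concrete, a minimal guest-side sketch (variable and function names are illustrative; it assumes the uapi definitions from <asm/kvm_para.h> and an area already registered through MSR_KVM_ASYNC_PF_EN, and is not the kernel's actual handler):

    #include <linux/compiler.h>
    #include <asm/kvm_para.h>
    #include <asm/msr.h>

    static struct kvm_vcpu_pv_apf_data apf_data __aligned(64);

    static void kvm_apf_ack_page_ready(void)
    {
            apf_data.token = 0;              /* done handling 'page ready' */
            wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1); /* KVM re-scans its queue */
    }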
+26 -23
MAINTAINERS
··· 14019 14019 14020 14020 MELLANOX ETHERNET DRIVER (mlx5e) 14021 14021 M: Saeed Mahameed <saeedm@nvidia.com> 14022 + M: Tariq Toukan <tariqt@nvidia.com> 14022 14023 L: netdev@vger.kernel.org 14023 14024 S: Supported 14024 14025 W: http://www.mellanox.com ··· 14087 14086 MELLANOX MLX5 core VPI driver 14088 14087 M: Saeed Mahameed <saeedm@nvidia.com> 14089 14088 M: Leon Romanovsky <leonro@nvidia.com> 14089 + M: Tariq Toukan <tariqt@nvidia.com> 14090 14090 L: netdev@vger.kernel.org 14091 14091 L: linux-rdma@vger.kernel.org 14092 14092 S: Supported ··· 16733 16731 16734 16732 PARAVIRT_OPS INTERFACE 16735 16733 M: Juergen Gross <jgross@suse.com> 16736 - R: Ajay Kaher <akaher@vmware.com> 16737 - R: Alexey Makhalov <amakhalov@vmware.com> 16738 - R: VMware PV-Drivers Reviewers <pv-drivers@vmware.com> 16734 + R: Ajay Kaher <ajay.kaher@broadcom.com> 16735 + R: Alexey Makhalov <alexey.amakhalov@broadcom.com> 16736 + R: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com> 16739 16737 L: virtualization@lists.linux.dev 16740 16738 L: x86@kernel.org 16741 16739 S: Supported ··· 22443 22441 W: https://kernsec.org/wiki/index.php/Linux_Kernel_Integrity 22444 22442 Q: https://patchwork.kernel.org/project/linux-integrity/list/ 22445 22443 T: git git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd.git 22444 + F: Documentation/devicetree/bindings/tpm/ 22446 22445 F: drivers/char/tpm/ 22447 22446 22448 22447 TPS546D24 DRIVER ··· 23668 23665 F: drivers/misc/vmw_balloon.c 23669 23666 23670 23667 VMWARE HYPERVISOR INTERFACE 23671 - M: Ajay Kaher <akaher@vmware.com> 23672 - M: Alexey Makhalov <amakhalov@vmware.com> 23673 - R: VMware PV-Drivers Reviewers <pv-drivers@vmware.com> 23668 + M: Ajay Kaher <ajay.kaher@broadcom.com> 23669 + M: Alexey Makhalov <alexey.amakhalov@broadcom.com> 23670 + R: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com> 23674 23671 L: virtualization@lists.linux.dev 23675 23672 L: x86@kernel.org 23676 23673 S: Supported ··· 23679 23676 F: arch/x86/kernel/cpu/vmware.c 23680 23677 23681 23678 VMWARE PVRDMA DRIVER 23682 - M: Bryan Tan <bryantan@vmware.com> 23683 - M: Vishnu Dasa <vdasa@vmware.com> 23684 - R: VMware PV-Drivers Reviewers <pv-drivers@vmware.com> 23679 + M: Bryan Tan <bryan-bt.tan@broadcom.com> 23680 + M: Vishnu Dasa <vishnu.dasa@broadcom.com> 23681 + R: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com> 23685 23682 L: linux-rdma@vger.kernel.org 23686 23683 S: Supported 23687 23684 F: drivers/infiniband/hw/vmw_pvrdma/ 23688 23685 23689 23686 VMWARE PVSCSI DRIVER 23690 - M: Vishal Bhakta <vbhakta@vmware.com> 23691 - R: VMware PV-Drivers Reviewers <pv-drivers@vmware.com> 23687 + M: Vishal Bhakta <vishal.bhakta@broadcom.com> 23688 + R: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com> 23692 23689 L: linux-scsi@vger.kernel.org 23693 23690 S: Supported 23694 23691 F: drivers/scsi/vmw_pvscsi.c 23695 23692 F: drivers/scsi/vmw_pvscsi.h 23696 23693 23697 23694 VMWARE VIRTUAL PTP CLOCK DRIVER 23698 - M: Jeff Sipek <jsipek@vmware.com> 23699 - R: Ajay Kaher <akaher@vmware.com> 23700 - R: Alexey Makhalov <amakhalov@vmware.com> 23701 - R: VMware PV-Drivers Reviewers <pv-drivers@vmware.com> 23695 + M: Nick Shi <nick.shi@broadcom.com> 23696 + R: Ajay Kaher <ajay.kaher@broadcom.com> 23697 + R: Alexey Makhalov <alexey.amakhalov@broadcom.com> 23698 + R: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com> 23702 23699 L: netdev@vger.kernel.org 23703 23700 S: 
Supported 23704 23701 F: drivers/ptp/ptp_vmw.c 23705 23702 23706 23703 VMWARE VMCI DRIVER 23707 - M: Bryan Tan <bryantan@vmware.com> 23708 - M: Vishnu Dasa <vdasa@vmware.com> 23709 - R: VMware PV-Drivers Reviewers <pv-drivers@vmware.com> 23704 + M: Bryan Tan <bryan-bt.tan@broadcom.com> 23705 + M: Vishnu Dasa <vishnu.dasa@broadcom.com> 23706 + R: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com> 23710 23707 L: linux-kernel@vger.kernel.org 23711 23708 S: Supported 23712 23709 F: drivers/misc/vmw_vmci/ ··· 23721 23718 F: drivers/input/mouse/vmmouse.h 23722 23719 23723 23720 VMWARE VMXNET3 ETHERNET DRIVER 23724 - M: Ronak Doshi <doshir@vmware.com> 23725 - R: VMware PV-Drivers Reviewers <pv-drivers@vmware.com> 23721 + M: Ronak Doshi <ronak.doshi@broadcom.com> 23722 + R: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com> 23726 23723 L: netdev@vger.kernel.org 23727 23724 S: Supported 23728 23725 F: drivers/net/vmxnet3/ 23729 23726 23730 23727 VMWARE VSOCK VMCI TRANSPORT DRIVER 23731 - M: Bryan Tan <bryantan@vmware.com> 23732 - M: Vishnu Dasa <vdasa@vmware.com> 23733 - R: VMware PV-Drivers Reviewers <pv-drivers@vmware.com> 23728 + M: Bryan Tan <bryan-bt.tan@broadcom.com> 23729 + M: Vishnu Dasa <vishnu.dasa@broadcom.com> 23730 + R: Broadcom internal kernel review list <bcm-kernel-feedback-list@broadcom.com> 23734 23731 L: linux-kernel@vger.kernel.org 23735 23732 S: Supported 23736 23733 F: net/vmw_vsock/vmci_transport*
+1 -1
Makefile
··· 2 2 VERSION = 6 3 3 PATCHLEVEL = 9 4 4 SUBLEVEL = 0 5 - EXTRAVERSION = -rc2 5 + EXTRAVERSION = -rc3 6 6 NAME = Hurr durr I'ma ninja sloth 7 7 8 8 # *DOCUMENTATION*
+2
arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi
··· 944 944 vddrf-supply = <&pp1300_l2c>; 945 945 vddch0-supply = <&pp3300_l10c>; 946 946 max-speed = <3200000>; 947 + 948 + qcom,local-bd-address-broken; 947 949 }; 948 950 }; 949 951
+16 -13
arch/arm64/kernel/head.S
··· 291 291 blr x2 292 292 0: 293 293 mov_q x0, HCR_HOST_NVHE_FLAGS 294 + 295 + /* 296 + * Compliant CPUs advertise their VHE-onlyness with 297 + * ID_AA64MMFR4_EL1.E2H0 < 0. HCR_EL2.E2H can be 298 + * RES1 in that case. Publish the E2H bit early so that 299 + * it can be picked up by the init_el2_state macro. 300 + * 301 + * Fruity CPUs seem to have HCR_EL2.E2H set to RAO/WI, but 302 + * don't advertise it (they predate this relaxation). 303 + */ 304 + mrs_s x1, SYS_ID_AA64MMFR4_EL1 305 + tbz x1, #(ID_AA64MMFR4_EL1_E2H0_SHIFT + ID_AA64MMFR4_EL1_E2H0_WIDTH - 1), 1f 306 + 307 + orr x0, x0, #HCR_E2H 308 + 1: 294 309 msr hcr_el2, x0 295 310 isb 296 311 ··· 318 303 319 304 mov_q x1, INIT_SCTLR_EL1_MMU_OFF 320 305 321 - /* 322 - * Compliant CPUs advertise their VHE-onlyness with 323 - * ID_AA64MMFR4_EL1.E2H0 < 0. HCR_EL2.E2H can be 324 - * RES1 in that case. 325 - * 326 - * Fruity CPUs seem to have HCR_EL2.E2H set to RES1, but 327 - * don't advertise it (they predate this relaxation). 328 - */ 329 - mrs_s x0, SYS_ID_AA64MMFR4_EL1 330 - ubfx x0, x0, #ID_AA64MMFR4_EL1_E2H0_SHIFT, #ID_AA64MMFR4_EL1_E2H0_WIDTH 331 - tbnz x0, #(ID_AA64MMFR4_EL1_E2H0_SHIFT + ID_AA64MMFR4_EL1_E2H0_WIDTH - 1), 1f 332 - 333 306 mrs x0, hcr_el2 334 307 and x0, x0, #HCR_E2H 335 308 cbz x0, 2f 336 - 1: 309 + 337 310 /* Set a sane SCTLR_EL1, the VHE way */ 338 311 pre_disable_mmu_workaround 339 312 msr_s SYS_SCTLR_EL12, x1
+1 -4
arch/arm64/kernel/ptrace.c
··· 761 761 { 762 762 unsigned int vq; 763 763 bool active; 764 - bool fpsimd_only; 765 764 enum vec_type task_type; 766 765 767 766 memset(header, 0, sizeof(*header)); ··· 776 777 case ARM64_VEC_SVE: 777 778 if (test_tsk_thread_flag(target, TIF_SVE_VL_INHERIT)) 778 779 header->flags |= SVE_PT_VL_INHERIT; 779 - fpsimd_only = !test_tsk_thread_flag(target, TIF_SVE); 780 780 break; 781 781 case ARM64_VEC_SME: 782 782 if (test_tsk_thread_flag(target, TIF_SME_VL_INHERIT)) 783 783 header->flags |= SVE_PT_VL_INHERIT; 784 - fpsimd_only = false; 785 784 break; 786 785 default: 787 786 WARN_ON_ONCE(1); ··· 787 790 } 788 791 789 792 if (active) { 790 - if (fpsimd_only) { 793 + if (target->thread.fp_type == FP_STATE_FPSIMD) { 791 794 header->flags |= SVE_PT_REGS_FPSIMD; 792 795 } else { 793 796 header->flags |= SVE_PT_REGS_SVE;
+5 -8
arch/arm64/kvm/arm.c
··· 2597 2597 if (err) 2598 2598 goto out_hyp; 2599 2599 2600 - if (is_protected_kvm_enabled()) { 2601 - kvm_info("Protected nVHE mode initialized successfully\n"); 2602 - } else if (in_hyp_mode) { 2603 - kvm_info("VHE mode initialized successfully\n"); 2604 - } else { 2605 - char mode = cpus_have_final_cap(ARM64_KVM_HVHE) ? 'h' : 'n'; 2606 - kvm_info("Hyp mode (%cVHE) initialized successfully\n", mode); 2607 - } 2600 + kvm_info("%s%sVHE mode initialized successfully\n", 2601 + in_hyp_mode ? "" : (is_protected_kvm_enabled() ? 2602 + "Protected " : "Hyp "), 2603 + in_hyp_mode ? "" : (cpus_have_final_cap(ARM64_KVM_HVHE) ? 2604 + "h" : "n")); 2608 2605 2609 2606 /* 2610 2607 * FIXME: Do something reasonable if kvm_init() fails after pKVM
+2 -1
arch/arm64/kvm/hyp/nvhe/tlb.c
··· 154 154 /* Switch to requested VMID */ 155 155 __tlb_switch_to_guest(mmu, &cxt, false); 156 156 157 - __flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0); 157 + __flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 158 + TLBI_TTL_UNKNOWN); 158 159 159 160 dsb(ish); 160 161 __tlbi(vmalle1is);
+15 -8
arch/arm64/kvm/hyp/pgtable.c
··· 528 528 529 529 kvm_clear_pte(ctx->ptep); 530 530 dsb(ishst); 531 - __tlbi_level(vae2is, __TLBI_VADDR(ctx->addr, 0), ctx->level); 531 + __tlbi_level(vae2is, __TLBI_VADDR(ctx->addr, 0), TLBI_TTL_UNKNOWN); 532 532 } else { 533 533 if (ctx->end - ctx->addr < granule) 534 534 return -EINVAL; ··· 843 843 * Perform the appropriate TLB invalidation based on the 844 844 * evicted pte value (if any). 845 845 */ 846 - if (kvm_pte_table(ctx->old, ctx->level)) 847 - kvm_tlb_flush_vmid_range(mmu, ctx->addr, 848 - kvm_granule_size(ctx->level)); 849 - else if (kvm_pte_valid(ctx->old)) 846 + if (kvm_pte_table(ctx->old, ctx->level)) { 847 + u64 size = kvm_granule_size(ctx->level); 848 + u64 addr = ALIGN_DOWN(ctx->addr, size); 849 + 850 + kvm_tlb_flush_vmid_range(mmu, addr, size); 851 + } else if (kvm_pte_valid(ctx->old)) { 850 852 kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, 851 853 ctx->addr, ctx->level); 854 + } 852 855 } 853 856 854 857 if (stage2_pte_is_counted(ctx->old)) ··· 899 896 if (kvm_pte_valid(ctx->old)) { 900 897 kvm_clear_pte(ctx->ptep); 901 898 902 - if (!stage2_unmap_defer_tlb_flush(pgt)) 903 - kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, 904 - ctx->addr, ctx->level); 899 + if (kvm_pte_table(ctx->old, ctx->level)) { 900 + kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, 901 + TLBI_TTL_UNKNOWN); 902 + } else if (!stage2_unmap_defer_tlb_flush(pgt)) { 903 + kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, 904 + ctx->level); 905 + } 905 906 } 906 907 907 908 mm_ops->put_page(ctx->ptep);
+2 -1
arch/arm64/kvm/hyp/vhe/tlb.c
··· 171 171 /* Switch to requested VMID */ 172 172 __tlb_switch_to_guest(mmu, &cxt); 173 173 174 - __flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 0); 174 + __flush_s2_tlb_range_op(ipas2e1is, start, pages, stride, 175 + TLBI_TTL_UNKNOWN); 175 176 176 177 dsb(ish); 177 178 __tlbi(vmalle1is);
+1 -1
arch/arm64/kvm/mmu.c
··· 1637 1637 fault_ipa = kvm_vcpu_get_fault_ipa(vcpu); 1638 1638 is_iabt = kvm_vcpu_trap_is_iabt(vcpu); 1639 1639 1640 - if (esr_fsc_is_permission_fault(esr)) { 1640 + if (esr_fsc_is_translation_fault(esr)) { 1641 1641 /* Beyond sanitised PARange (which is the IPA limit) */ 1642 1642 if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) { 1643 1643 kvm_inject_size_fault(vcpu);
+5 -1
arch/nios2/kernel/prom.c
··· 21 21 22 22 void __init early_init_devtree(void *params) 23 23 { 24 - __be32 *dtb = (u32 *)__dtb_start; 24 + __be32 __maybe_unused *dtb = (u32 *)__dtb_start; 25 + 25 26 #if defined(CONFIG_NIOS2_DTB_AT_PHYS_ADDR) 26 27 if (be32_to_cpup((__be32 *)CONFIG_NIOS2_DTB_PHYS_ADDR) == 27 28 OF_DT_HEADER) { ··· 31 30 return; 32 31 } 33 32 #endif 33 + 34 + #ifdef CONFIG_NIOS2_DTB_SOURCE_BOOL 34 35 if (be32_to_cpu((__be32) *dtb) == OF_DT_HEADER) 35 36 params = (void *)__dtb_start; 37 + #endif 36 38 37 39 early_init_dt_scan(params); 38 40 }
+1 -2
arch/powerpc/include/asm/vdso/gettimeofday.h
··· 4 4 5 5 #ifndef __ASSEMBLY__ 6 6 7 - #include <asm/page.h> 8 7 #include <asm/vdso/timebase.h> 9 8 #include <asm/barrier.h> 10 9 #include <asm/unistd.h> ··· 94 95 static __always_inline 95 96 const struct vdso_data *__arch_get_timens_vdso_data(const struct vdso_data *vd) 96 97 { 97 - return (void *)vd + PAGE_SIZE; 98 + return (void *)vd + (1U << CONFIG_PAGE_SHIFT); 98 99 } 99 100 #endif 100 101
+1 -1
arch/riscv/Makefile
··· 151 151 endif 152 152 153 153 vdso-install-y += arch/riscv/kernel/vdso/vdso.so.dbg 154 - vdso-install-$(CONFIG_COMPAT) += arch/riscv/kernel/compat_vdso/compat_vdso.so.dbg:../compat_vdso/compat_vdso.so 154 + vdso-install-$(CONFIG_COMPAT) += arch/riscv/kernel/compat_vdso/compat_vdso.so.dbg 155 155 156 156 ifneq ($(CONFIG_XIP_KERNEL),y) 157 157 ifeq ($(CONFIG_RISCV_M_MODE)$(CONFIG_ARCH_CANAAN),yy)
+6
arch/riscv/include/asm/pgtable.h
··· 593 593 return ptep_test_and_clear_young(vma, address, ptep); 594 594 } 595 595 596 + #define pgprot_nx pgprot_nx 597 + static inline pgprot_t pgprot_nx(pgprot_t _prot) 598 + { 599 + return __pgprot(pgprot_val(_prot) & ~_PAGE_EXEC); 600 + } 601 + 596 602 #define pgprot_noncached pgprot_noncached 597 603 static inline pgprot_t pgprot_noncached(pgprot_t _prot) 598 604 {
+2 -1
arch/riscv/include/asm/syscall_wrapper.h
··· 36 36 ulong) \ 37 37 __attribute__((alias(__stringify(___se_##prefix##name)))); \ 38 38 __diag_pop(); \ 39 - static long noinline ___se_##prefix##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ 39 + static long noinline ___se_##prefix##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ 40 + __used; \ 40 41 static long ___se_##prefix##name(__MAP(x,__SC_LONG,__VA_ARGS__)) 41 42 42 43 #define SC_RISCV_REGS_TO_ARGS(x, ...) \
+2 -2
arch/riscv/include/asm/uaccess.h
··· 319 319 320 320 #define __get_kernel_nofault(dst, src, type, err_label) \ 321 321 do { \ 322 - long __kr_err; \ 322 + long __kr_err = 0; \ 323 323 \ 324 324 __get_user_nocheck(*((type *)(dst)), (type *)(src), __kr_err); \ 325 325 if (unlikely(__kr_err)) \ ··· 328 328 329 329 #define __put_kernel_nofault(dst, src, type, err_label) \ 330 330 do { \ 331 - long __kr_err; \ 331 + long __kr_err = 0; \ 332 332 \ 333 333 __put_user_nocheck(*((type *)(src)), (type *)(dst), __kr_err); \ 334 334 if (unlikely(__kr_err)) \
+1 -1
arch/riscv/include/uapi/asm/auxvec.h
··· 34 34 #define AT_L3_CACHEGEOMETRY 47 35 35 36 36 /* entries in ARCH_DLINFO */ 37 - #define AT_VECTOR_SIZE_ARCH 9 37 + #define AT_VECTOR_SIZE_ARCH 10 38 38 #define AT_MINSIGSTKSZ 51 39 39 40 40 #endif /* _UAPI_ASM_RISCV_AUXVEC_H */
+1 -1
arch/riscv/kernel/compat_vdso/Makefile
··· 74 74 rm $@.tmp 75 75 76 76 # actual build commands 77 - quiet_cmd_compat_vdsoas = VDSOAS $@ 77 + quiet_cmd_compat_vdsoas = VDSOAS $@ 78 78 cmd_compat_vdsoas = $(COMPAT_CC) $(a_flags) $(COMPAT_CC_FLAGS) -c -o $@ $<
+8
arch/riscv/kernel/patch.c
··· 80 80 */ 81 81 lockdep_assert_held(&text_mutex); 82 82 83 + preempt_disable(); 84 + 83 85 if (across_pages) 84 86 patch_map(addr + PAGE_SIZE, FIX_TEXT_POKE1); 85 87 ··· 93 91 94 92 if (across_pages) 95 93 patch_unmap(FIX_TEXT_POKE1); 94 + 95 + preempt_enable(); 96 96 97 97 return 0; 98 98 } ··· 126 122 if (!riscv_patch_in_stop_machine) 127 123 lockdep_assert_held(&text_mutex); 128 124 125 + preempt_disable(); 126 + 129 127 if (across_pages) 130 128 patch_map(addr + PAGE_SIZE, FIX_TEXT_POKE1); 131 129 ··· 139 133 140 134 if (across_pages) 141 135 patch_unmap(FIX_TEXT_POKE1); 136 + 137 + preempt_enable(); 142 138 143 139 return ret; 144 140 }
+1 -4
arch/riscv/kernel/process.c
··· 27 27 #include <asm/vector.h> 28 28 #include <asm/cpufeature.h> 29 29 30 - register unsigned long gp_in_global __asm__("gp"); 31 - 32 30 #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK) 33 31 #include <linux/stackprotector.h> 34 32 unsigned long __stack_chk_guard __read_mostly; ··· 35 37 36 38 extern asmlinkage void ret_from_fork(void); 37 39 38 - void arch_cpu_idle(void) 40 + void noinstr arch_cpu_idle(void) 39 41 { 40 42 cpu_do_idle(); 41 43 } ··· 205 207 if (unlikely(args->fn)) { 206 208 /* Kernel thread */ 207 209 memset(childregs, 0, sizeof(struct pt_regs)); 208 - childregs->gp = gp_in_global; 209 210 /* Supervisor/Machine, irqs on: */ 210 211 childregs->status = SR_PP | SR_PIE; 211 212
+8 -7
arch/riscv/kernel/signal.c
··· 119 119 struct __sc_riscv_v_state __user *state = sc_vec; 120 120 void __user *datap; 121 121 122 + /* 123 + * Mark the vstate as clean prior performing the actual copy, 124 + * to avoid getting the vstate incorrectly clobbered by the 125 + * discarded vector state. 126 + */ 127 + riscv_v_vstate_set_restore(current, regs); 128 + 122 129 /* Copy everything of __sc_riscv_v_state except datap. */ 123 130 err = __copy_from_user(&current->thread.vstate, &state->v_state, 124 131 offsetof(struct __riscv_v_ext_state, datap)); ··· 140 133 * Copy the whole vector content from user space datap. Use 141 134 * copy_from_user to prevent information leak. 142 135 */ 143 - err = copy_from_user(current->thread.vstate.datap, datap, riscv_v_vsize); 144 - if (unlikely(err)) 145 - return err; 146 - 147 - riscv_v_vstate_set_restore(current, regs); 148 - 149 - return err; 136 + return copy_from_user(current->thread.vstate.datap, datap, riscv_v_vsize); 150 137 } 151 138 #else 152 139 #define save_v_state(task, regs) (0)
+1 -1
arch/riscv/kernel/traps.c
··· 122 122 print_vma_addr(KERN_CONT " in ", instruction_pointer(regs)); 123 123 pr_cont("\n"); 124 124 __show_regs(regs); 125 - dump_instr(KERN_EMERG, regs); 125 + dump_instr(KERN_INFO, regs); 126 126 } 127 127 128 128 force_sig_fault(signo, code, (void __user *)addr);
+1
arch/riscv/kernel/vdso/Makefile
··· 37 37 38 38 # Disable -pg to prevent insert call site 39 39 CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) 40 + CFLAGS_REMOVE_hwprobe.o = $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS) 40 41 41 42 # Disable profiling and instrumentation for VDSO code 42 43 GCOV_PROFILE := n
+31 -6
arch/riscv/kvm/aia_aplic.c
··· 137 137 raw_spin_lock_irqsave(&irqd->lock, flags); 138 138 139 139 sm = irqd->sourcecfg & APLIC_SOURCECFG_SM_MASK; 140 - if (!pending && 141 - ((sm == APLIC_SOURCECFG_SM_LEVEL_HIGH) || 142 - (sm == APLIC_SOURCECFG_SM_LEVEL_LOW))) 140 + if (sm == APLIC_SOURCECFG_SM_INACTIVE) 143 141 goto skip_write_pending; 142 + 143 + if (sm == APLIC_SOURCECFG_SM_LEVEL_HIGH || 144 + sm == APLIC_SOURCECFG_SM_LEVEL_LOW) { 145 + if (!pending) 146 + goto skip_write_pending; 147 + if ((irqd->state & APLIC_IRQ_STATE_INPUT) && 148 + sm == APLIC_SOURCECFG_SM_LEVEL_LOW) 149 + goto skip_write_pending; 150 + if (!(irqd->state & APLIC_IRQ_STATE_INPUT) && 151 + sm == APLIC_SOURCECFG_SM_LEVEL_HIGH) 152 + goto skip_write_pending; 153 + } 144 154 145 155 if (pending) 146 156 irqd->state |= APLIC_IRQ_STATE_PENDING; ··· 197 187 198 188 static bool aplic_read_input(struct aplic *aplic, u32 irq) 199 189 { 200 - bool ret; 201 - unsigned long flags; 190 + u32 sourcecfg, sm, raw_input, irq_inverted; 202 191 struct aplic_irq *irqd; 192 + unsigned long flags; 193 + bool ret = false; 203 194 204 195 if (!irq || aplic->nr_irqs <= irq) 205 196 return false; 206 197 irqd = &aplic->irqs[irq]; 207 198 208 199 raw_spin_lock_irqsave(&irqd->lock, flags); 209 - ret = (irqd->state & APLIC_IRQ_STATE_INPUT) ? true : false; 200 + 201 + sourcecfg = irqd->sourcecfg; 202 + if (sourcecfg & APLIC_SOURCECFG_D) 203 + goto skip; 204 + 205 + sm = sourcecfg & APLIC_SOURCECFG_SM_MASK; 206 + if (sm == APLIC_SOURCECFG_SM_INACTIVE) 207 + goto skip; 208 + 209 + raw_input = (irqd->state & APLIC_IRQ_STATE_INPUT) ? 1 : 0; 210 + irq_inverted = (sm == APLIC_SOURCECFG_SM_LEVEL_LOW || 211 + sm == APLIC_SOURCECFG_SM_EDGE_FALL) ? 1 : 0; 212 + ret = !!(raw_input ^ irq_inverted); 213 + 214 + skip: 210 215 raw_spin_unlock_irqrestore(&irqd->lock, flags); 211 216 212 217 return ret;
+1 -1
arch/riscv/kvm/vcpu_onereg.c
··· 986 986 987 987 static inline unsigned long num_isa_ext_regs(const struct kvm_vcpu *vcpu) 988 988 { 989 - return copy_isa_ext_reg_indices(vcpu, NULL);; 989 + return copy_isa_ext_reg_indices(vcpu, NULL); 990 990 } 991 991 992 992 static int copy_sbi_ext_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
+2 -2
arch/riscv/mm/tlbflush.c
··· 99 99 local_flush_tlb_range_asid(d->start, d->size, d->stride, d->asid); 100 100 } 101 101 102 - static void __flush_tlb_range(struct cpumask *cmask, unsigned long asid, 102 + static void __flush_tlb_range(const struct cpumask *cmask, unsigned long asid, 103 103 unsigned long start, unsigned long size, 104 104 unsigned long stride) 105 105 { ··· 200 200 201 201 void flush_tlb_kernel_range(unsigned long start, unsigned long end) 202 202 { 203 - __flush_tlb_range((struct cpumask *)cpu_online_mask, FLUSH_TLB_NO_ASID, 203 + __flush_tlb_range(cpu_online_mask, FLUSH_TLB_NO_ASID, 204 204 start, end - start, PAGE_SIZE); 205 205 } 206 206
+22 -22
arch/s390/include/asm/atomic.h
··· 15 15 #include <asm/barrier.h> 16 16 #include <asm/cmpxchg.h> 17 17 18 - static inline int arch_atomic_read(const atomic_t *v) 18 + static __always_inline int arch_atomic_read(const atomic_t *v) 19 19 { 20 20 return __atomic_read(v); 21 21 } 22 22 #define arch_atomic_read arch_atomic_read 23 23 24 - static inline void arch_atomic_set(atomic_t *v, int i) 24 + static __always_inline void arch_atomic_set(atomic_t *v, int i) 25 25 { 26 26 __atomic_set(v, i); 27 27 } 28 28 #define arch_atomic_set arch_atomic_set 29 29 30 - static inline int arch_atomic_add_return(int i, atomic_t *v) 30 + static __always_inline int arch_atomic_add_return(int i, atomic_t *v) 31 31 { 32 32 return __atomic_add_barrier(i, &v->counter) + i; 33 33 } 34 34 #define arch_atomic_add_return arch_atomic_add_return 35 35 36 - static inline int arch_atomic_fetch_add(int i, atomic_t *v) 36 + static __always_inline int arch_atomic_fetch_add(int i, atomic_t *v) 37 37 { 38 38 return __atomic_add_barrier(i, &v->counter); 39 39 } 40 40 #define arch_atomic_fetch_add arch_atomic_fetch_add 41 41 42 - static inline void arch_atomic_add(int i, atomic_t *v) 42 + static __always_inline void arch_atomic_add(int i, atomic_t *v) 43 43 { 44 44 __atomic_add(i, &v->counter); 45 45 } ··· 50 50 #define arch_atomic_fetch_sub(_i, _v) arch_atomic_fetch_add(-(int)(_i), _v) 51 51 52 52 #define ATOMIC_OPS(op) \ 53 - static inline void arch_atomic_##op(int i, atomic_t *v) \ 53 + static __always_inline void arch_atomic_##op(int i, atomic_t *v) \ 54 54 { \ 55 55 __atomic_##op(i, &v->counter); \ 56 56 } \ 57 - static inline int arch_atomic_fetch_##op(int i, atomic_t *v) \ 57 + static __always_inline int arch_atomic_fetch_##op(int i, atomic_t *v) \ 58 58 { \ 59 59 return __atomic_##op##_barrier(i, &v->counter); \ 60 60 } ··· 74 74 75 75 #define arch_atomic_xchg(v, new) (arch_xchg(&((v)->counter), new)) 76 76 77 - static inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new) 77 + static __always_inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new) 78 78 { 79 79 return __atomic_cmpxchg(&v->counter, old, new); 80 80 } ··· 82 82 83 83 #define ATOMIC64_INIT(i) { (i) } 84 84 85 - static inline s64 arch_atomic64_read(const atomic64_t *v) 85 + static __always_inline s64 arch_atomic64_read(const atomic64_t *v) 86 86 { 87 87 return __atomic64_read(v); 88 88 } 89 89 #define arch_atomic64_read arch_atomic64_read 90 90 91 - static inline void arch_atomic64_set(atomic64_t *v, s64 i) 91 + static __always_inline void arch_atomic64_set(atomic64_t *v, s64 i) 92 92 { 93 93 __atomic64_set(v, i); 94 94 } 95 95 #define arch_atomic64_set arch_atomic64_set 96 96 97 - static inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v) 97 + static __always_inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v) 98 98 { 99 99 return __atomic64_add_barrier(i, (long *)&v->counter) + i; 100 100 } 101 101 #define arch_atomic64_add_return arch_atomic64_add_return 102 102 103 - static inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v) 103 + static __always_inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v) 104 104 { 105 105 return __atomic64_add_barrier(i, (long *)&v->counter); 106 106 } 107 107 #define arch_atomic64_fetch_add arch_atomic64_fetch_add 108 108 109 - static inline void arch_atomic64_add(s64 i, atomic64_t *v) 109 + static __always_inline void arch_atomic64_add(s64 i, atomic64_t *v) 110 110 { 111 111 __atomic64_add(i, (long *)&v->counter); 112 112 } ··· 114 114 115 115 #define arch_atomic64_xchg(v, new) (arch_xchg(&((v)->counter), new)) 116 116 
117 - static inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new) 117 + static __always_inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new) 118 118 { 119 119 return __atomic64_cmpxchg((long *)&v->counter, old, new); 120 120 } 121 121 #define arch_atomic64_cmpxchg arch_atomic64_cmpxchg 122 122 123 - #define ATOMIC64_OPS(op) \ 124 - static inline void arch_atomic64_##op(s64 i, atomic64_t *v) \ 125 - { \ 126 - __atomic64_##op(i, (long *)&v->counter); \ 127 - } \ 128 - static inline long arch_atomic64_fetch_##op(s64 i, atomic64_t *v) \ 129 - { \ 130 - return __atomic64_##op##_barrier(i, (long *)&v->counter); \ 123 + #define ATOMIC64_OPS(op) \ 124 + static __always_inline void arch_atomic64_##op(s64 i, atomic64_t *v) \ 125 + { \ 126 + __atomic64_##op(i, (long *)&v->counter); \ 127 + } \ 128 + static __always_inline long arch_atomic64_fetch_##op(s64 i, atomic64_t *v) \ 129 + { \ 130 + return __atomic64_##op##_barrier(i, (long *)&v->counter); \ 131 131 } 132 132 133 133 ATOMIC64_OPS(and)
+11 -11
arch/s390/include/asm/atomic_ops.h
··· 8 8 #ifndef __ARCH_S390_ATOMIC_OPS__ 9 9 #define __ARCH_S390_ATOMIC_OPS__ 10 10 11 - static inline int __atomic_read(const atomic_t *v) 11 + static __always_inline int __atomic_read(const atomic_t *v) 12 12 { 13 13 int c; 14 14 ··· 18 18 return c; 19 19 } 20 20 21 - static inline void __atomic_set(atomic_t *v, int i) 21 + static __always_inline void __atomic_set(atomic_t *v, int i) 22 22 { 23 23 asm volatile( 24 24 " st %1,%0\n" 25 25 : "=R" (v->counter) : "d" (i)); 26 26 } 27 27 28 - static inline s64 __atomic64_read(const atomic64_t *v) 28 + static __always_inline s64 __atomic64_read(const atomic64_t *v) 29 29 { 30 30 s64 c; 31 31 ··· 35 35 return c; 36 36 } 37 37 38 - static inline void __atomic64_set(atomic64_t *v, s64 i) 38 + static __always_inline void __atomic64_set(atomic64_t *v, s64 i) 39 39 { 40 40 asm volatile( 41 41 " stg %1,%0\n" ··· 45 45 #ifdef CONFIG_HAVE_MARCH_Z196_FEATURES 46 46 47 47 #define __ATOMIC_OP(op_name, op_type, op_string, op_barrier) \ 48 - static inline op_type op_name(op_type val, op_type *ptr) \ 48 + static __always_inline op_type op_name(op_type val, op_type *ptr) \ 49 49 { \ 50 50 op_type old; \ 51 51 \ ··· 96 96 #else /* CONFIG_HAVE_MARCH_Z196_FEATURES */ 97 97 98 98 #define __ATOMIC_OP(op_name, op_string) \ 99 - static inline int op_name(int val, int *ptr) \ 99 + static __always_inline int op_name(int val, int *ptr) \ 100 100 { \ 101 101 int old, new; \ 102 102 \ ··· 122 122 #undef __ATOMIC_OPS 123 123 124 124 #define __ATOMIC64_OP(op_name, op_string) \ 125 - static inline long op_name(long val, long *ptr) \ 125 + static __always_inline long op_name(long val, long *ptr) \ 126 126 { \ 127 127 long old, new; \ 128 128 \ ··· 154 154 155 155 #endif /* CONFIG_HAVE_MARCH_Z196_FEATURES */ 156 156 157 - static inline int __atomic_cmpxchg(int *ptr, int old, int new) 157 + static __always_inline int __atomic_cmpxchg(int *ptr, int old, int new) 158 158 { 159 159 asm volatile( 160 160 " cs %[old],%[new],%[ptr]" ··· 164 164 return old; 165 165 } 166 166 167 - static inline bool __atomic_cmpxchg_bool(int *ptr, int old, int new) 167 + static __always_inline bool __atomic_cmpxchg_bool(int *ptr, int old, int new) 168 168 { 169 169 int old_expected = old; 170 170 ··· 176 176 return old == old_expected; 177 177 } 178 178 179 - static inline long __atomic64_cmpxchg(long *ptr, long old, long new) 179 + static __always_inline long __atomic64_cmpxchg(long *ptr, long old, long new) 180 180 { 181 181 asm volatile( 182 182 " csg %[old],%[new],%[ptr]" ··· 186 186 return old; 187 187 } 188 188 189 - static inline bool __atomic64_cmpxchg_bool(long *ptr, long old, long new) 189 + static __always_inline bool __atomic64_cmpxchg_bool(long *ptr, long old, long new) 190 190 { 191 191 long old_expected = old; 192 192
+18 -18
arch/s390/include/asm/preempt.h
··· 12 12 #define PREEMPT_NEED_RESCHED 0x80000000 13 13 #define PREEMPT_ENABLED (0 + PREEMPT_NEED_RESCHED) 14 14 15 - static inline int preempt_count(void) 15 + static __always_inline int preempt_count(void) 16 16 { 17 17 return READ_ONCE(S390_lowcore.preempt_count) & ~PREEMPT_NEED_RESCHED; 18 18 } 19 19 20 - static inline void preempt_count_set(int pc) 20 + static __always_inline void preempt_count_set(int pc) 21 21 { 22 22 int old, new; 23 23 ··· 29 29 old, new) != old); 30 30 } 31 31 32 - static inline void set_preempt_need_resched(void) 32 + static __always_inline void set_preempt_need_resched(void) 33 33 { 34 34 __atomic_and(~PREEMPT_NEED_RESCHED, &S390_lowcore.preempt_count); 35 35 } 36 36 37 - static inline void clear_preempt_need_resched(void) 37 + static __always_inline void clear_preempt_need_resched(void) 38 38 { 39 39 __atomic_or(PREEMPT_NEED_RESCHED, &S390_lowcore.preempt_count); 40 40 } 41 41 42 - static inline bool test_preempt_need_resched(void) 42 + static __always_inline bool test_preempt_need_resched(void) 43 43 { 44 44 return !(READ_ONCE(S390_lowcore.preempt_count) & PREEMPT_NEED_RESCHED); 45 45 } 46 46 47 - static inline void __preempt_count_add(int val) 47 + static __always_inline void __preempt_count_add(int val) 48 48 { 49 49 /* 50 50 * With some obscure config options and CONFIG_PROFILE_ALL_BRANCHES ··· 59 59 __atomic_add(val, &S390_lowcore.preempt_count); 60 60 } 61 61 62 - static inline void __preempt_count_sub(int val) 62 + static __always_inline void __preempt_count_sub(int val) 63 63 { 64 64 __preempt_count_add(-val); 65 65 } 66 66 67 - static inline bool __preempt_count_dec_and_test(void) 67 + static __always_inline bool __preempt_count_dec_and_test(void) 68 68 { 69 69 return __atomic_add(-1, &S390_lowcore.preempt_count) == 1; 70 70 } 71 71 72 - static inline bool should_resched(int preempt_offset) 72 + static __always_inline bool should_resched(int preempt_offset) 73 73 { 74 74 return unlikely(READ_ONCE(S390_lowcore.preempt_count) == 75 75 preempt_offset); ··· 79 79 80 80 #define PREEMPT_ENABLED (0) 81 81 82 - static inline int preempt_count(void) 82 + static __always_inline int preempt_count(void) 83 83 { 84 84 return READ_ONCE(S390_lowcore.preempt_count); 85 85 } 86 86 87 - static inline void preempt_count_set(int pc) 87 + static __always_inline void preempt_count_set(int pc) 88 88 { 89 89 S390_lowcore.preempt_count = pc; 90 90 } 91 91 92 - static inline void set_preempt_need_resched(void) 92 + static __always_inline void set_preempt_need_resched(void) 93 93 { 94 94 } 95 95 96 - static inline void clear_preempt_need_resched(void) 96 + static __always_inline void clear_preempt_need_resched(void) 97 97 { 98 98 } 99 99 100 - static inline bool test_preempt_need_resched(void) 100 + static __always_inline bool test_preempt_need_resched(void) 101 101 { 102 102 return false; 103 103 } 104 104 105 - static inline void __preempt_count_add(int val) 105 + static __always_inline void __preempt_count_add(int val) 106 106 { 107 107 S390_lowcore.preempt_count += val; 108 108 } 109 109 110 - static inline void __preempt_count_sub(int val) 110 + static __always_inline void __preempt_count_sub(int val) 111 111 { 112 112 S390_lowcore.preempt_count -= val; 113 113 } 114 114 115 - static inline bool __preempt_count_dec_and_test(void) 115 + static __always_inline bool __preempt_count_dec_and_test(void) 116 116 { 117 117 return !--S390_lowcore.preempt_count && tif_need_resched(); 118 118 } 119 119 120 - static inline bool should_resched(int preempt_offset) 120 + static 
__always_inline bool should_resched(int preempt_offset) 121 121 { 122 122 return unlikely(preempt_count() == preempt_offset && 123 123 tif_need_resched());
+1
arch/s390/kernel/entry.S
··· 635 635 SYM_DATA_END(daton_psw) 636 636 637 637 .section .rodata, "a" 638 + .balign 8 638 639 #define SYSCALL(esame,emu) .quad __s390x_ ## esame 639 640 SYM_DATA_START(sys_call_table) 640 641 #include "asm/syscall_table.h"
+7 -3
arch/s390/kernel/perf_pai_crypto.c
··· 90 90 event->cpu); 91 91 struct paicrypt_map *cpump = mp->mapptr; 92 92 93 - cpump->event = NULL; 94 93 static_branch_dec(&pai_key); 95 94 mutex_lock(&pai_reserve_mutex); 96 95 debug_sprintf_event(cfm_dbg, 5, "%s event %#llx cpu %d users %d" ··· 355 356 356 357 static void paicrypt_stop(struct perf_event *event, int flags) 357 358 { 358 - if (!event->attr.sample_period) /* Counting */ 359 + struct paicrypt_mapptr *mp = this_cpu_ptr(paicrypt_root.mapptr); 360 + struct paicrypt_map *cpump = mp->mapptr; 361 + 362 + if (!event->attr.sample_period) { /* Counting */ 359 363 paicrypt_read(event); 360 - else /* Sampling */ 364 + } else { /* Sampling */ 361 365 perf_sched_cb_dec(event->pmu); 366 + cpump->event = NULL; 367 + } 362 368 event->hw.state = PERF_HES_STOPPED; 363 369 } 364 370
+7 -3
arch/s390/kernel/perf_pai_ext.c
··· 122 122 123 123 free_page(PAI_SAVE_AREA(event)); 124 124 mutex_lock(&paiext_reserve_mutex); 125 - cpump->event = NULL; 126 125 if (refcount_dec_and_test(&cpump->refcnt)) /* Last reference gone */ 127 126 paiext_free(mp); 128 127 paiext_root_free(); ··· 361 362 362 363 static void paiext_stop(struct perf_event *event, int flags) 363 364 { 364 - if (!event->attr.sample_period) /* Counting */ 365 + struct paiext_mapptr *mp = this_cpu_ptr(paiext_root.mapptr); 366 + struct paiext_map *cpump = mp->mapptr; 367 + 368 + if (!event->attr.sample_period) { /* Counting */ 365 369 paiext_read(event); 366 - else /* Sampling */ 370 + } else { /* Sampling */ 367 371 perf_sched_cb_dec(event->pmu); 372 + cpump->event = NULL; 373 + } 368 374 event->hw.state = PERF_HES_STOPPED; 369 375 } 370 376
+1 -1
arch/s390/mm/fault.c
··· 75 75 if (!IS_ENABLED(CONFIG_PGSTE)) 76 76 return KERNEL_FAULT; 77 77 gmap = (struct gmap *)S390_lowcore.gmap; 78 - if (regs->cr1 == gmap->asce) 78 + if (gmap && gmap->asce == regs->cr1) 79 79 return GMAP_FAULT; 80 80 return KERNEL_FAULT; 81 81 }
+93
arch/x86/coco/core.c
··· 3 3 * Confidential Computing Platform Capability checks 4 4 * 5 5 * Copyright (C) 2021 Advanced Micro Devices, Inc. 6 + * Copyright (C) 2024 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved. 6 7 * 7 8 * Author: Tom Lendacky <thomas.lendacky@amd.com> 8 9 */ 9 10 10 11 #include <linux/export.h> 11 12 #include <linux/cc_platform.h> 13 + #include <linux/string.h> 14 + #include <linux/random.h> 12 15 16 + #include <asm/archrandom.h> 13 17 #include <asm/coco.h> 14 18 #include <asm/processor.h> 15 19 16 20 enum cc_vendor cc_vendor __ro_after_init = CC_VENDOR_NONE; 17 21 u64 cc_mask __ro_after_init; 22 + 23 + static struct cc_attr_flags { 24 + __u64 host_sev_snp : 1, 25 + __resv : 63; 26 + } cc_flags; 18 27 19 28 static bool noinstr intel_cc_platform_has(enum cc_attr attr) 20 29 { ··· 98 89 case CC_ATTR_GUEST_SEV_SNP: 99 90 return sev_status & MSR_AMD64_SEV_SNP_ENABLED; 100 91 92 + case CC_ATTR_HOST_SEV_SNP: 93 + return cc_flags.host_sev_snp; 94 + 101 95 default: 102 96 return false; 103 97 } ··· 160 148 } 161 149 } 162 150 EXPORT_SYMBOL_GPL(cc_mkdec); 151 + 152 + static void amd_cc_platform_clear(enum cc_attr attr) 153 + { 154 + switch (attr) { 155 + case CC_ATTR_HOST_SEV_SNP: 156 + cc_flags.host_sev_snp = 0; 157 + break; 158 + default: 159 + break; 160 + } 161 + } 162 + 163 + void cc_platform_clear(enum cc_attr attr) 164 + { 165 + switch (cc_vendor) { 166 + case CC_VENDOR_AMD: 167 + amd_cc_platform_clear(attr); 168 + break; 169 + default: 170 + break; 171 + } 172 + } 173 + 174 + static void amd_cc_platform_set(enum cc_attr attr) 175 + { 176 + switch (attr) { 177 + case CC_ATTR_HOST_SEV_SNP: 178 + cc_flags.host_sev_snp = 1; 179 + break; 180 + default: 181 + break; 182 + } 183 + } 184 + 185 + void cc_platform_set(enum cc_attr attr) 186 + { 187 + switch (cc_vendor) { 188 + case CC_VENDOR_AMD: 189 + amd_cc_platform_set(attr); 190 + break; 191 + default: 192 + break; 193 + } 194 + } 195 + 196 + __init void cc_random_init(void) 197 + { 198 + /* 199 + * The seed is 32 bytes (in units of longs), which is 256 bits, which 200 + * is the security level that the RNG is targeting. 201 + */ 202 + unsigned long rng_seed[32 / sizeof(long)]; 203 + size_t i, longs; 204 + 205 + if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) 206 + return; 207 + 208 + /* 209 + * Since the CoCo threat model includes the host, the only reliable 210 + * source of entropy that can be neither observed nor manipulated is 211 + * RDRAND. Usually, RDRAND failure is considered tolerable, but since 212 + * CoCo guests have no other unobservable source of entropy, it's 213 + * important to at least ensure the RNG gets some initial random seeds. 214 + */ 215 + for (i = 0; i < ARRAY_SIZE(rng_seed); i += longs) { 216 + longs = arch_get_random_longs(&rng_seed[i], ARRAY_SIZE(rng_seed) - i); 217 + 218 + /* 219 + * A zero return value means that the guest doesn't have RDRAND 220 + * or the CPU is physically broken, and in both cases that 221 + * means most crypto inside of the CoCo instance will be 222 + * broken, defeating the purpose of CoCo in the first place. So 223 + * just panic here because it's absolutely unsafe to continue 224 + * executing. 225 + */ 226 + if (longs == 0) 227 + panic("RDRAND is defective."); 228 + } 229 + add_device_randomness(rng_seed, sizeof(rng_seed)); 230 + memzero_explicit(rng_seed, sizeof(rng_seed)); 231 + }
+4 -4
arch/x86/events/intel/ds.c
··· 1237 1237 struct pmu *pmu = event->pmu; 1238 1238 1239 1239 /* 1240 - * Make sure we get updated with the first PEBS 1241 - * event. It will trigger also during removal, but 1242 - * that does not hurt: 1240 + * Make sure we get updated with the first PEBS event. 1241 + * During removal, ->pebs_data_cfg is still valid for 1242 + * the last PEBS event. Don't clear it. 1243 1243 */ 1244 - if (cpuc->n_pebs == 1) 1244 + if ((cpuc->n_pebs == 1) && add) 1245 1245 cpuc->pebs_data_cfg = PEBS_UPDATE_DS_SW; 1246 1246 1247 1247 if (needed_cb != pebs_needs_sched_cb(cpuc)) {
+2 -2
arch/x86/include/asm/alternative.h
··· 117 117 extern void callthunks_patch_module_calls(struct callthunk_sites *sites, 118 118 struct module *mod); 119 119 extern void *callthunks_translate_call_dest(void *dest); 120 - extern int x86_call_depth_emit_accounting(u8 **pprog, void *func); 120 + extern int x86_call_depth_emit_accounting(u8 **pprog, void *func, void *ip); 121 121 #else 122 122 static __always_inline void callthunks_patch_builtin_calls(void) {} 123 123 static __always_inline void ··· 128 128 return dest; 129 129 } 130 130 static __always_inline int x86_call_depth_emit_accounting(u8 **pprog, 131 - void *func) 131 + void *func, void *ip) 132 132 { 133 133 return 0; 134 134 }
+2
arch/x86/include/asm/coco.h
··· 22 22 23 23 u64 cc_mkenc(u64 val); 24 24 u64 cc_mkdec(u64 val); 25 + void cc_random_init(void); 25 26 #else 26 27 #define cc_vendor (CC_VENDOR_NONE) 27 28 ··· 35 34 { 36 35 return val; 37 36 } 37 + static inline void cc_random_init(void) { } 38 38 #endif 39 39 40 40 #endif /* _ASM_X86_COCO_H */
+2
arch/x86/include/asm/cpufeature.h
··· 33 33 CPUID_7_EDX, 34 34 CPUID_8000_001F_EAX, 35 35 CPUID_8000_0021_EAX, 36 + CPUID_LNX_5, 37 + NR_CPUID_WORDS, 36 38 }; 37 39 38 40 #define X86_CAP_FMT_NUM "%d:%d"
+2 -2
arch/x86/include/asm/sev.h
··· 228 228 void snp_accept_memory(phys_addr_t start, phys_addr_t end); 229 229 u64 snp_get_unsupported_features(u64 status); 230 230 u64 sev_get_status(void); 231 - void kdump_sev_callback(void); 232 231 void sev_show_status(void); 233 232 #else 234 233 static inline void sev_es_ist_enter(struct pt_regs *regs) { } ··· 257 258 static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { } 258 259 static inline u64 snp_get_unsupported_features(u64 status) { return 0; } 259 260 static inline u64 sev_get_status(void) { return 0; } 260 - static inline void kdump_sev_callback(void) { } 261 261 static inline void sev_show_status(void) { } 262 262 #endif 263 263 ··· 268 270 int rmp_make_private(u64 pfn, u64 gpa, enum pg_level level, u32 asid, bool immutable); 269 271 int rmp_make_shared(u64 pfn, enum pg_level level); 270 272 void snp_leak_pages(u64 pfn, unsigned int npages); 273 + void kdump_sev_callback(void); 271 274 #else 272 275 static inline bool snp_probe_rmptable_info(void) { return false; } 273 276 static inline int snp_lookup_rmpentry(u64 pfn, bool *assigned, int *level) { return -ENODEV; } ··· 281 282 } 282 283 static inline int rmp_make_shared(u64 pfn, enum pg_level level) { return -ENODEV; } 283 284 static inline void snp_leak_pages(u64 pfn, unsigned int npages) {} 285 + static inline void kdump_sev_callback(void) { } 284 286 #endif 285 287 286 288 #endif
+23
arch/x86/include/uapi/asm/kvm.h
··· 694 694 695 695 struct kvm_sev_cmd { 696 696 __u32 id; 697 + __u32 pad0; 697 698 __u64 data; 698 699 __u32 error; 699 700 __u32 sev_fd; ··· 705 704 __u32 policy; 706 705 __u64 dh_uaddr; 707 706 __u32 dh_len; 707 + __u32 pad0; 708 708 __u64 session_uaddr; 709 709 __u32 session_len; 710 + __u32 pad1; 710 711 }; 711 712 712 713 struct kvm_sev_launch_update_data { 713 714 __u64 uaddr; 714 715 __u32 len; 716 + __u32 pad0; 715 717 }; 716 718 717 719 718 720 struct kvm_sev_launch_secret { 719 721 __u64 hdr_uaddr; 720 722 __u32 hdr_len; 723 + __u32 pad0; 721 724 __u64 guest_uaddr; 722 725 __u32 guest_len; 726 + __u32 pad1; 723 727 __u64 trans_uaddr; 724 728 __u32 trans_len; 729 + __u32 pad2; 725 730 }; 726 731 727 732 struct kvm_sev_launch_measure { 728 733 __u64 uaddr; 729 734 __u32 len; 735 + __u32 pad0; 730 736 }; 731 737 732 738 struct kvm_sev_guest_status { ··· 746 738 __u64 src_uaddr; 747 739 __u64 dst_uaddr; 748 740 __u32 len; 741 + __u32 pad0; 749 742 }; 750 743 751 744 struct kvm_sev_attestation_report { 752 745 __u8 mnonce[16]; 753 746 __u64 uaddr; 754 747 __u32 len; 748 + __u32 pad0; 755 749 }; 756 750 757 751 struct kvm_sev_send_start { 758 752 __u32 policy; 753 + __u32 pad0; 759 754 __u64 pdh_cert_uaddr; 760 755 __u32 pdh_cert_len; 756 + __u32 pad1; 761 757 __u64 plat_certs_uaddr; 762 758 __u32 plat_certs_len; 759 + __u32 pad2; 763 760 __u64 amd_certs_uaddr; 764 761 __u32 amd_certs_len; 762 + __u32 pad3; 765 763 __u64 session_uaddr; 766 764 __u32 session_len; 765 + __u32 pad4; 767 766 }; 768 767 769 768 struct kvm_sev_send_update_data { 770 769 __u64 hdr_uaddr; 771 770 __u32 hdr_len; 771 + __u32 pad0; 772 772 __u64 guest_uaddr; 773 773 __u32 guest_len; 774 + __u32 pad1; 774 775 __u64 trans_uaddr; 775 776 __u32 trans_len; 777 + __u32 pad2; 776 778 }; 777 779 778 780 struct kvm_sev_receive_start { ··· 790 772 __u32 policy; 791 773 __u64 pdh_uaddr; 792 774 __u32 pdh_len; 775 + __u32 pad0; 793 776 __u64 session_uaddr; 794 777 __u32 session_len; 778 + __u32 pad1; 795 779 }; 796 780 797 781 struct kvm_sev_receive_update_data { 798 782 __u64 hdr_uaddr; 799 783 __u32 hdr_len; 784 + __u32 pad0; 800 785 __u64 guest_uaddr; 801 786 __u32 guest_len; 787 + __u32 pad1; 802 788 __u64 trans_uaddr; 803 789 __u32 trans_len; 790 + __u32 pad2; 804 791 }; 805 792 806 793 #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
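Note on the hunk above: the added pad fields turn the compiler's implicit alignment padding into named members, so the SEV command structs no longer depend on per-ABI padding rules (a __u64 after a __u32 is packed differently on i386 and x86-64) and no unnamed, possibly uninitialized bytes cross the ioctl boundary. A small standalone C illustration with hypothetical struct names (not part of this merge):

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    struct implicit_pad {          /* gap before uaddr is invisible compiler padding */
            uint32_t len;
            uint64_t uaddr;
    };

    struct explicit_pad {          /* the gap is a named field callers must zero */
            uint32_t len;
            uint32_t pad0;
            uint64_t uaddr;
    };

    int main(void)
    {
            /* on x86-64 both structs match; on i386 only the explicit one
             * keeps uaddr at offset 8 */
            printf("sizeof: %zu vs %zu\n",
                   sizeof(struct implicit_pad), sizeof(struct explicit_pad));
            printf("offsetof(uaddr): %zu vs %zu\n",
                   offsetof(struct implicit_pad, uaddr),
                   offsetof(struct explicit_pad, uaddr));
            return 0;
    }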
-1
arch/x86/include/uapi/asm/kvm_para.h
··· 142 142 __u32 token; 143 143 144 144 __u8 pad[56]; 145 - __u32 enabled; 146 145 }; 147 146 148 147 #define KVM_PV_EOI_BIT 0
+2 -2
arch/x86/kernel/callthunks.c
··· 314 314 return !bcmp(pad, insn_buff, tmpl_size); 315 315 } 316 316 317 - int x86_call_depth_emit_accounting(u8 **pprog, void *func) 317 + int x86_call_depth_emit_accounting(u8 **pprog, void *func, void *ip) 318 318 { 319 319 unsigned int tmpl_size = SKL_TMPL_SIZE; 320 320 u8 insn_buff[MAX_PATCH_LEN]; ··· 327 327 return 0; 328 328 329 329 memcpy(insn_buff, skl_call_thunk_template, tmpl_size); 330 - apply_relocation(insn_buff, tmpl_size, *pprog, 330 + apply_relocation(insn_buff, tmpl_size, ip, 331 331 skl_call_thunk_template, tmpl_size); 332 332 333 333 memcpy(*pprog, insn_buff, tmpl_size);
+23 -15
arch/x86/kernel/cpu/amd.c
··· 345 345 #endif 346 346 } 347 347 348 + static void bsp_determine_snp(struct cpuinfo_x86 *c) 349 + { 350 + #ifdef CONFIG_ARCH_HAS_CC_PLATFORM 351 + cc_vendor = CC_VENDOR_AMD; 352 + 353 + if (cpu_has(c, X86_FEATURE_SEV_SNP)) { 354 + /* 355 + * RMP table entry format is not architectural and is defined by the 356 + * per-processor PPR. Restrict SNP support on the known CPU models 357 + * for which the RMP table entry format is currently defined for. 358 + */ 359 + if (!cpu_has(c, X86_FEATURE_HYPERVISOR) && 360 + c->x86 >= 0x19 && snp_probe_rmptable_info()) { 361 + cc_platform_set(CC_ATTR_HOST_SEV_SNP); 362 + } else { 363 + setup_clear_cpu_cap(X86_FEATURE_SEV_SNP); 364 + cc_platform_clear(CC_ATTR_HOST_SEV_SNP); 365 + } 366 + } 367 + #endif 368 + } 369 + 348 370 static void bsp_init_amd(struct cpuinfo_x86 *c) 349 371 { 350 372 if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) { ··· 474 452 break; 475 453 } 476 454 477 - if (cpu_has(c, X86_FEATURE_SEV_SNP)) { 478 - /* 479 - * RMP table entry format is not architectural and it can vary by processor 480 - * and is defined by the per-processor PPR. Restrict SNP support on the 481 - * known CPU model and family for which the RMP table entry format is 482 - * currently defined for. 483 - */ 484 - if (!boot_cpu_has(X86_FEATURE_ZEN3) && 485 - !boot_cpu_has(X86_FEATURE_ZEN4) && 486 - !boot_cpu_has(X86_FEATURE_ZEN5)) 487 - setup_clear_cpu_cap(X86_FEATURE_SEV_SNP); 488 - else if (!snp_probe_rmptable_info()) 489 - setup_clear_cpu_cap(X86_FEATURE_SEV_SNP); 490 - } 491 - 455 + bsp_determine_snp(c); 492 456 return; 493 457 494 458 warn:
+3 -1
arch/x86/kernel/cpu/mce/core.c
··· 2500 2500 return -EINVAL; 2501 2501 2502 2502 b = &per_cpu(mce_banks_array, s->id)[bank]; 2503 - 2504 2503 if (!b->init) 2505 2504 return -ENODEV; 2506 2505 2507 2506 b->ctl = new; 2507 + 2508 + mutex_lock(&mce_sysfs_mutex); 2508 2509 mce_restart(); 2510 + mutex_unlock(&mce_sysfs_mutex); 2509 2511 2510 2512 return size; 2511 2513 }
+1 -1
arch/x86/kernel/cpu/mtrr/generic.c
··· 108 108 (boot_cpu_data.x86 >= 0x0f))) 109 109 return; 110 110 111 - if (cpu_feature_enabled(X86_FEATURE_SEV_SNP)) 111 + if (cc_platform_has(CC_ATTR_HOST_SEV_SNP)) 112 112 return; 113 113 114 114 rdmsr(MSR_AMD64_SYSCFG, lo, hi);
+2 -1
arch/x86/kernel/cpu/resctrl/internal.h
··· 78 78 else 79 79 cpu = cpumask_any_but(mask, exclude_cpu); 80 80 81 - if (!IS_ENABLED(CONFIG_NO_HZ_FULL)) 81 + /* Only continue if tick_nohz_full_mask has been initialized. */ 82 + if (!tick_nohz_full_enabled()) 82 83 return cpu; 83 84 84 85 /* If the CPU picked isn't marked nohz_full nothing more needs doing. */
+6 -5
arch/x86/kernel/kvm.c
··· 65 65 66 66 early_param("no-steal-acc", parse_no_stealacc); 67 67 68 + static DEFINE_PER_CPU_READ_MOSTLY(bool, async_pf_enabled); 68 69 static DEFINE_PER_CPU_DECRYPTED(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64); 69 70 DEFINE_PER_CPU_DECRYPTED(struct kvm_steal_time, steal_time) __aligned(64) __visible; 70 71 static int has_steal_clock = 0; ··· 245 244 { 246 245 u32 flags = 0; 247 246 248 - if (__this_cpu_read(apf_reason.enabled)) { 247 + if (__this_cpu_read(async_pf_enabled)) { 249 248 flags = __this_cpu_read(apf_reason.flags); 250 249 __this_cpu_write(apf_reason.flags, 0); 251 250 } ··· 296 295 297 296 inc_irq_stat(irq_hv_callback_count); 298 297 299 - if (__this_cpu_read(apf_reason.enabled)) { 298 + if (__this_cpu_read(async_pf_enabled)) { 300 299 token = __this_cpu_read(apf_reason.token); 301 300 kvm_async_pf_task_wake(token); 302 301 __this_cpu_write(apf_reason.token, 0); ··· 363 362 wrmsrl(MSR_KVM_ASYNC_PF_INT, HYPERVISOR_CALLBACK_VECTOR); 364 363 365 364 wrmsrl(MSR_KVM_ASYNC_PF_EN, pa); 366 - __this_cpu_write(apf_reason.enabled, 1); 365 + __this_cpu_write(async_pf_enabled, true); 367 366 pr_debug("setup async PF for cpu %d\n", smp_processor_id()); 368 367 } 369 368 ··· 384 383 385 384 static void kvm_pv_disable_apf(void) 386 385 { 387 - if (!__this_cpu_read(apf_reason.enabled)) 386 + if (!__this_cpu_read(async_pf_enabled)) 388 387 return; 389 388 390 389 wrmsrl(MSR_KVM_ASYNC_PF_EN, 0); 391 - __this_cpu_write(apf_reason.enabled, 0); 390 + __this_cpu_write(async_pf_enabled, false); 392 391 393 392 pr_debug("disable async PF for cpu %d\n", smp_processor_id()); 394 393 }
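Note on the hunk above: the async-PF enable flag moves out of apf_reason, which lives in a page shared with the hypervisor, into a guest-private per-CPU variable. A minimal kernel-context sketch of the per-CPU flag pattern used here (hypothetical names, assumes <linux/percpu.h>; not part of this merge):

    static DEFINE_PER_CPU_READ_MOSTLY(bool, example_enabled);

    static void example_enable_on_this_cpu(void)
    {
            __this_cpu_write(example_enabled, true);
    }

    static bool example_enabled_on_this_cpu(void)
    {
            return __this_cpu_read(example_enabled);
    }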
+2
arch/x86/kernel/setup.c
··· 35 35 #include <asm/bios_ebda.h> 36 36 #include <asm/bugs.h> 37 37 #include <asm/cacheinfo.h> 38 + #include <asm/coco.h> 38 39 #include <asm/cpu.h> 39 40 #include <asm/efi.h> 40 41 #include <asm/gart.h> ··· 992 991 * memory size. 993 992 */ 994 993 mem_encrypt_setup_arch(); 994 + cc_random_init(); 995 995 996 996 efi_fake_memmap(); 997 997 efi_find_mirror();
-10
arch/x86/kernel/sev.c
··· 2284 2284 } 2285 2285 device_initcall(snp_init_platform_device); 2286 2286 2287 - void kdump_sev_callback(void) 2288 - { 2289 - /* 2290 - * Do wbinvd() on remote CPUs when SNP is enabled in order to 2291 - * safely do SNP_SHUTDOWN on the local CPU. 2292 - */ 2293 - if (cpu_feature_enabled(X86_FEATURE_SEV_SNP)) 2294 - wbinvd(); 2295 - } 2296 - 2297 2287 void sev_show_status(void) 2298 2288 { 2299 2289 int i;
+1
arch/x86/kvm/Kconfig
··· 122 122 default y 123 123 depends on KVM_AMD && X86_64 124 124 depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m) 125 + select ARCH_HAS_CC_PLATFORM 125 126 help 126 127 Provides support for launching Encrypted VMs (SEV) and Encrypted VMs 127 128 with Encrypted State (SEV-ES) on AMD processors.
+27 -17
arch/x86/kvm/cpuid.c
··· 189 189 return 0; 190 190 } 191 191 192 - static struct kvm_hypervisor_cpuid kvm_get_hypervisor_cpuid(struct kvm_vcpu *vcpu, 193 - const char *sig) 192 + static struct kvm_hypervisor_cpuid __kvm_get_hypervisor_cpuid(struct kvm_cpuid_entry2 *entries, 193 + int nent, const char *sig) 194 194 { 195 195 struct kvm_hypervisor_cpuid cpuid = {}; 196 196 struct kvm_cpuid_entry2 *entry; 197 197 u32 base; 198 198 199 199 for_each_possible_hypervisor_cpuid_base(base) { 200 - entry = kvm_find_cpuid_entry(vcpu, base); 200 + entry = cpuid_entry2_find(entries, nent, base, KVM_CPUID_INDEX_NOT_SIGNIFICANT); 201 201 202 202 if (entry) { 203 203 u32 signature[3]; ··· 217 217 return cpuid; 218 218 } 219 219 220 - static struct kvm_cpuid_entry2 *__kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcpu, 221 - struct kvm_cpuid_entry2 *entries, int nent) 220 + static struct kvm_hypervisor_cpuid kvm_get_hypervisor_cpuid(struct kvm_vcpu *vcpu, 221 + const char *sig) 222 + { 223 + return __kvm_get_hypervisor_cpuid(vcpu->arch.cpuid_entries, 224 + vcpu->arch.cpuid_nent, sig); 225 + } 226 + 227 + static struct kvm_cpuid_entry2 *__kvm_find_kvm_cpuid_features(struct kvm_cpuid_entry2 *entries, 228 + int nent, u32 kvm_cpuid_base) 229 + { 230 + return cpuid_entry2_find(entries, nent, kvm_cpuid_base | KVM_CPUID_FEATURES, 231 + KVM_CPUID_INDEX_NOT_SIGNIFICANT); 232 + } 233 + 234 + static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcpu) 222 235 { 223 236 u32 base = vcpu->arch.kvm_cpuid.base; 224 237 225 238 if (!base) 226 239 return NULL; 227 240 228 - return cpuid_entry2_find(entries, nent, base | KVM_CPUID_FEATURES, 229 - KVM_CPUID_INDEX_NOT_SIGNIFICANT); 230 - } 231 - 232 - static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcpu) 233 - { 234 - return __kvm_find_kvm_cpuid_features(vcpu, vcpu->arch.cpuid_entries, 235 - vcpu->arch.cpuid_nent); 241 + return __kvm_find_kvm_cpuid_features(vcpu->arch.cpuid_entries, 242 + vcpu->arch.cpuid_nent, base); 236 243 } 237 244 238 245 void kvm_update_pv_runtime(struct kvm_vcpu *vcpu) ··· 273 266 int nent) 274 267 { 275 268 struct kvm_cpuid_entry2 *best; 269 + struct kvm_hypervisor_cpuid kvm_cpuid; 276 270 277 271 best = cpuid_entry2_find(entries, nent, 1, KVM_CPUID_INDEX_NOT_SIGNIFICANT); 278 272 if (best) { ··· 300 292 cpuid_entry_has(best, X86_FEATURE_XSAVEC))) 301 293 best->ebx = xstate_required_size(vcpu->arch.xcr0, true); 302 294 303 - best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent); 304 - if (kvm_hlt_in_guest(vcpu->kvm) && best && 305 - (best->eax & (1 << KVM_FEATURE_PV_UNHALT))) 306 - best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT); 295 + kvm_cpuid = __kvm_get_hypervisor_cpuid(entries, nent, KVM_SIGNATURE); 296 + if (kvm_cpuid.base) { 297 + best = __kvm_find_kvm_cpuid_features(entries, nent, kvm_cpuid.base); 298 + if (kvm_hlt_in_guest(vcpu->kvm) && best) 299 + best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT); 300 + } 307 301 308 302 if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) { 309 303 best = cpuid_entry2_find(entries, nent, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+2
arch/x86/kvm/reverse_cpuid.h
··· 102 102 */ 103 103 static __always_inline void reverse_cpuid_check(unsigned int x86_leaf) 104 104 { 105 + BUILD_BUG_ON(NR_CPUID_WORDS != NCAPINTS); 105 106 BUILD_BUG_ON(x86_leaf == CPUID_LNX_1); 106 107 BUILD_BUG_ON(x86_leaf == CPUID_LNX_2); 107 108 BUILD_BUG_ON(x86_leaf == CPUID_LNX_3); 108 109 BUILD_BUG_ON(x86_leaf == CPUID_LNX_4); 110 + BUILD_BUG_ON(x86_leaf == CPUID_LNX_5); 109 111 BUILD_BUG_ON(x86_leaf >= ARRAY_SIZE(reverse_cpuid)); 110 112 BUILD_BUG_ON(reverse_cpuid[x86_leaf].function == 0); 111 113 }
+35 -25
arch/x86/kvm/svm/sev.c
··· 84 84 }; 85 85 86 86 /* Called with the sev_bitmap_lock held, or on shutdown */ 87 - static int sev_flush_asids(int min_asid, int max_asid) 87 + static int sev_flush_asids(unsigned int min_asid, unsigned int max_asid) 88 88 { 89 - int ret, asid, error = 0; 89 + int ret, error = 0; 90 + unsigned int asid; 90 91 91 92 /* Check if there are any ASIDs to reclaim before performing a flush */ 92 93 asid = find_next_bit(sev_reclaim_asid_bitmap, nr_asids, min_asid); ··· 117 116 } 118 117 119 118 /* Must be called with the sev_bitmap_lock held */ 120 - static bool __sev_recycle_asids(int min_asid, int max_asid) 119 + static bool __sev_recycle_asids(unsigned int min_asid, unsigned int max_asid) 121 120 { 122 121 if (sev_flush_asids(min_asid, max_asid)) 123 122 return false; ··· 144 143 145 144 static int sev_asid_new(struct kvm_sev_info *sev) 146 145 { 147 - int asid, min_asid, max_asid, ret; 146 + /* 147 + * SEV-enabled guests must use asid from min_sev_asid to max_sev_asid. 148 + * SEV-ES-enabled guest can use from 1 to min_sev_asid - 1. 149 + * Note: min ASID can end up larger than the max if basic SEV support is 150 + * effectively disabled by disallowing use of ASIDs for SEV guests. 151 + */ 152 + unsigned int min_asid = sev->es_active ? 1 : min_sev_asid; 153 + unsigned int max_asid = sev->es_active ? min_sev_asid - 1 : max_sev_asid; 154 + unsigned int asid; 148 155 bool retry = true; 156 + int ret; 157 + 158 + if (min_asid > max_asid) 159 + return -ENOTTY; 149 160 150 161 WARN_ON(sev->misc_cg); 151 162 sev->misc_cg = get_current_misc_cg(); ··· 170 157 171 158 mutex_lock(&sev_bitmap_lock); 172 159 173 - /* 174 - * SEV-enabled guests must use asid from min_sev_asid to max_sev_asid. 175 - * SEV-ES-enabled guest can use from 1 to min_sev_asid - 1. 176 - */ 177 - min_asid = sev->es_active ? 1 : min_sev_asid; 178 - max_asid = sev->es_active ? 
min_sev_asid - 1 : max_sev_asid; 179 160 again: 180 161 asid = find_next_zero_bit(sev_asid_bitmap, max_asid + 1, min_asid); 181 162 if (asid > max_asid) { ··· 186 179 187 180 mutex_unlock(&sev_bitmap_lock); 188 181 189 - return asid; 182 + sev->asid = asid; 183 + return 0; 190 184 e_uncharge: 191 185 sev_misc_cg_uncharge(sev); 192 186 put_misc_cg(sev->misc_cg); ··· 195 187 return ret; 196 188 } 197 189 198 - static int sev_get_asid(struct kvm *kvm) 190 + static unsigned int sev_get_asid(struct kvm *kvm) 199 191 { 200 192 struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; 201 193 ··· 255 247 { 256 248 struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; 257 249 struct sev_platform_init_args init_args = {0}; 258 - int asid, ret; 250 + int ret; 259 251 260 252 if (kvm->created_vcpus) 261 253 return -EINVAL; 262 254 263 - ret = -EBUSY; 264 255 if (unlikely(sev->active)) 265 - return ret; 256 + return -EINVAL; 266 257 267 258 sev->active = true; 268 259 sev->es_active = argp->id == KVM_SEV_ES_INIT; 269 - asid = sev_asid_new(sev); 270 - if (asid < 0) 260 + ret = sev_asid_new(sev); 261 + if (ret) 271 262 goto e_no_asid; 272 - sev->asid = asid; 273 263 274 264 init_args.probe = false; 275 265 ret = sev_platform_init(&init_args); ··· 293 287 294 288 static int sev_bind_asid(struct kvm *kvm, unsigned int handle, int *error) 295 289 { 290 + unsigned int asid = sev_get_asid(kvm); 296 291 struct sev_data_activate activate; 297 - int asid = sev_get_asid(kvm); 298 292 int ret; 299 293 300 294 /* activate ASID on the given handle */ ··· 2246 2240 goto out; 2247 2241 } 2248 2242 2249 - sev_asid_count = max_sev_asid - min_sev_asid + 1; 2250 - WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV, sev_asid_count)); 2243 + if (min_sev_asid <= max_sev_asid) { 2244 + sev_asid_count = max_sev_asid - min_sev_asid + 1; 2245 + WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV, sev_asid_count)); 2246 + } 2251 2247 sev_supported = true; 2252 2248 2253 2249 /* SEV-ES support requested? */ ··· 2280 2272 out: 2281 2273 if (boot_cpu_has(X86_FEATURE_SEV)) 2282 2274 pr_info("SEV %s (ASIDs %u - %u)\n", 2283 - sev_supported ? "enabled" : "disabled", 2275 + sev_supported ? min_sev_asid <= max_sev_asid ? "enabled" : 2276 + "unusable" : 2277 + "disabled", 2284 2278 min_sev_asid, max_sev_asid); 2285 2279 if (boot_cpu_has(X86_FEATURE_SEV_ES)) 2286 2280 pr_info("SEV-ES %s (ASIDs %u - %u)\n", ··· 2330 2320 */ 2331 2321 static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va) 2332 2322 { 2333 - int asid = to_kvm_svm(vcpu->kvm)->sev_info.asid; 2323 + unsigned int asid = sev_get_asid(vcpu->kvm); 2334 2324 2335 2325 /* 2336 2326 * Note! The address must be a kernel address, as regular page walk ··· 2648 2638 void pre_sev_run(struct vcpu_svm *svm, int cpu) 2649 2639 { 2650 2640 struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu); 2651 - int asid = sev_get_asid(svm->vcpu.kvm); 2641 + unsigned int asid = sev_get_asid(svm->vcpu.kvm); 2652 2642 2653 2643 /* Assign the asid allocated with this SEV guest */ 2654 2644 svm->asid = asid; ··· 3184 3174 unsigned long pfn; 3185 3175 struct page *p; 3186 3176 3187 - if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) 3177 + if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP)) 3188 3178 return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); 3189 3179 3190 3180 /*
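Note on the hunk above: SEV-ES guests allocate ASIDs from [1, min_sev_asid - 1] and plain SEV guests from [min_sev_asid, max_sev_asid]; if firmware reserves every ASID for SEV-ES, the plain-SEV range is empty and the driver now reports it as unusable instead of misbehaving. A standalone C sketch of that partitioning with hypothetical firmware values (not kernel code):

    #include <stdbool.h>
    #include <stdio.h>

    static bool asid_range(bool es_active, unsigned int min_sev_asid,
                           unsigned int max_sev_asid,
                           unsigned int *lo, unsigned int *hi)
    {
            *lo = es_active ? 1 : min_sev_asid;
            *hi = es_active ? min_sev_asid - 1 : max_sev_asid;
            return *lo <= *hi;      /* false: no ASIDs left for this guest type */
    }

    int main(void)
    {
            unsigned int lo, hi;

            /* e.g. firmware reports min_sev_asid = 510, max_sev_asid = 509 */
            if (asid_range(true, 510, 509, &lo, &hi))
                    printf("SEV-ES ASIDs: %u-%u\n", lo, hi);  /* 1-509 */
            if (!asid_range(false, 510, 509, &lo, &hi))
                    printf("plain SEV unusable\n");
            return 0;
    }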
+5 -5
arch/x86/kvm/trace.h
··· 735 735 * Tracepoint for nested #vmexit because of interrupt pending 736 736 */ 737 737 TRACE_EVENT(kvm_invlpga, 738 - TP_PROTO(__u64 rip, int asid, u64 address), 738 + TP_PROTO(__u64 rip, unsigned int asid, u64 address), 739 739 TP_ARGS(rip, asid, address), 740 740 741 741 TP_STRUCT__entry( 742 - __field( __u64, rip ) 743 - __field( int, asid ) 744 - __field( __u64, address ) 742 + __field( __u64, rip ) 743 + __field( unsigned int, asid ) 744 + __field( __u64, address ) 745 745 ), 746 746 747 747 TP_fast_assign( ··· 750 750 __entry->address = address; 751 751 ), 752 752 753 - TP_printk("rip: 0x%016llx asid: %d address: 0x%016llx", 753 + TP_printk("rip: 0x%016llx asid: %u address: 0x%016llx", 754 754 __entry->rip, __entry->asid, __entry->address) 755 755 ); 756 756
+5 -1
arch/x86/lib/retpoline.S
··· 228 228 #else /* !CONFIG_MITIGATION_SRSO */ 229 229 /* Dummy for the alternative in CALL_UNTRAIN_RET. */ 230 230 SYM_CODE_START(srso_alias_untrain_ret) 231 - RET 231 + ANNOTATE_UNRET_SAFE 232 + ANNOTATE_NOENDBR 233 + ret 234 + int3 232 235 SYM_FUNC_END(srso_alias_untrain_ret) 236 + __EXPORT_THUNK(srso_alias_untrain_ret) 233 237 #define JMP_SRSO_UNTRAIN_RET "ud2" 234 238 #endif /* CONFIG_MITIGATION_SRSO */ 235 239
+1
arch/x86/mm/numa_32.c
··· 24 24 25 25 #include <linux/memblock.h> 26 26 #include <linux/init.h> 27 + #include <asm/pgtable_areas.h> 27 28 28 29 #include "numa_internal.h" 29 30
+35 -14
arch/x86/mm/pat/memtype.c
··· 947 947 memtype_free(paddr, paddr + size); 948 948 } 949 949 950 + static int get_pat_info(struct vm_area_struct *vma, resource_size_t *paddr, 951 + pgprot_t *pgprot) 952 + { 953 + unsigned long prot; 954 + 955 + VM_WARN_ON_ONCE(!(vma->vm_flags & VM_PAT)); 956 + 957 + /* 958 + * We need the starting PFN and cachemode used for track_pfn_remap() 959 + * that covered the whole VMA. For most mappings, we can obtain that 960 + * information from the page tables. For COW mappings, we might now 961 + * suddenly have anon folios mapped and follow_phys() will fail. 962 + * 963 + * Fallback to using vma->vm_pgoff, see remap_pfn_range_notrack(), to 964 + * detect the PFN. If we need the cachemode as well, we're out of luck 965 + * for now and have to fail fork(). 966 + */ 967 + if (!follow_phys(vma, vma->vm_start, 0, &prot, paddr)) { 968 + if (pgprot) 969 + *pgprot = __pgprot(prot); 970 + return 0; 971 + } 972 + if (is_cow_mapping(vma->vm_flags)) { 973 + if (pgprot) 974 + return -EINVAL; 975 + *paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT; 976 + return 0; 977 + } 978 + WARN_ON_ONCE(1); 979 + return -EINVAL; 980 + } 981 + 950 982 /* 951 983 * track_pfn_copy is called when vma that is covering the pfnmap gets 952 984 * copied through copy_page_range(). ··· 989 957 int track_pfn_copy(struct vm_area_struct *vma) 990 958 { 991 959 resource_size_t paddr; 992 - unsigned long prot; 993 960 unsigned long vma_size = vma->vm_end - vma->vm_start; 994 961 pgprot_t pgprot; 995 962 996 963 if (vma->vm_flags & VM_PAT) { 997 - /* 998 - * reserve the whole chunk covered by vma. We need the 999 - * starting address and protection from pte. 1000 - */ 1001 - if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) { 1002 - WARN_ON_ONCE(1); 964 + if (get_pat_info(vma, &paddr, &pgprot)) 1003 965 return -EINVAL; 1004 - } 1005 - pgprot = __pgprot(prot); 966 + /* reserve the whole chunk covered by vma. */ 1006 967 return reserve_pfn_range(paddr, vma_size, &pgprot, 1); 1007 968 } 1008 969 ··· 1070 1045 unsigned long size, bool mm_wr_locked) 1071 1046 { 1072 1047 resource_size_t paddr; 1073 - unsigned long prot; 1074 1048 1075 1049 if (vma && !(vma->vm_flags & VM_PAT)) 1076 1050 return; ··· 1077 1053 /* free the chunk starting from pfn or the whole chunk */ 1078 1054 paddr = (resource_size_t)pfn << PAGE_SHIFT; 1079 1055 if (!paddr && !size) { 1080 - if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) { 1081 - WARN_ON_ONCE(1); 1056 + if (get_pat_info(vma, &paddr, NULL)) 1082 1057 return; 1083 - } 1084 - 1085 1058 size = vma->vm_end - vma->vm_start; 1086 1059 } 1087 1060 free_pfn_range(paddr, size);
+8 -11
arch/x86/net/bpf_jit_comp.c
··· 480 480 static int emit_rsb_call(u8 **pprog, void *func, void *ip) 481 481 { 482 482 OPTIMIZER_HIDE_VAR(func); 483 - x86_call_depth_emit_accounting(pprog, func); 483 + ip += x86_call_depth_emit_accounting(pprog, func, ip); 484 484 return emit_patch(pprog, func, ip, 0xE8); 485 485 } 486 486 ··· 1972 1972 1973 1973 /* call */ 1974 1974 case BPF_JMP | BPF_CALL: { 1975 - int offs; 1975 + u8 *ip = image + addrs[i - 1]; 1976 1976 1977 1977 func = (u8 *) __bpf_call_base + imm32; 1978 1978 if (tail_call_reachable) { 1979 1979 RESTORE_TAIL_CALL_CNT(bpf_prog->aux->stack_depth); 1980 - if (!imm32) 1981 - return -EINVAL; 1982 - offs = 7 + x86_call_depth_emit_accounting(&prog, func); 1983 - } else { 1984 - if (!imm32) 1985 - return -EINVAL; 1986 - offs = x86_call_depth_emit_accounting(&prog, func); 1980 + ip += 7; 1987 1981 } 1988 - if (emit_call(&prog, func, image + addrs[i - 1] + offs)) 1982 + if (!imm32) 1983 + return -EINVAL; 1984 + ip += x86_call_depth_emit_accounting(&prog, func, ip); 1985 + if (emit_call(&prog, func, ip)) 1989 1986 return -EINVAL; 1990 1987 break; 1991 1988 } ··· 2832 2835 * Direct-call fentry stub, as such it needs accounting for the 2833 2836 * __fentry__ call. 2834 2837 */ 2835 - x86_call_depth_emit_accounting(&prog, NULL); 2838 + x86_call_depth_emit_accounting(&prog, NULL, image); 2836 2839 } 2837 2840 EMIT1(0x55); /* push rbp */ 2838 2841 EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
+18 -8
arch/x86/virt/svm/sev.c
··· 77 77 { 78 78 u64 val; 79 79 80 - if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) 80 + if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP)) 81 81 return 0; 82 82 83 83 rdmsrl(MSR_AMD64_SYSCFG, val); ··· 98 98 { 99 99 u64 val; 100 100 101 - if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) 101 + if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP)) 102 102 return 0; 103 103 104 104 rdmsrl(MSR_AMD64_SYSCFG, val); ··· 174 174 u64 rmptable_size; 175 175 u64 val; 176 176 177 - if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) 177 + if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP)) 178 178 return 0; 179 179 180 180 if (!amd_iommu_snp_en) 181 - return 0; 181 + goto nosnp; 182 182 183 183 if (!probed_rmp_size) 184 184 goto nosnp; ··· 225 225 return 0; 226 226 227 227 nosnp: 228 - setup_clear_cpu_cap(X86_FEATURE_SEV_SNP); 228 + cc_platform_clear(CC_ATTR_HOST_SEV_SNP); 229 229 return -ENOSYS; 230 230 } 231 231 ··· 246 246 { 247 247 struct rmpentry *large_entry, *entry; 248 248 249 - if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) 249 + if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP)) 250 250 return ERR_PTR(-ENODEV); 251 251 252 252 entry = get_rmpentry(pfn); ··· 363 363 unsigned long paddr = pfn << PAGE_SHIFT; 364 364 int ret; 365 365 366 - if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) 366 + if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP)) 367 367 return -ENODEV; 368 368 369 369 if (!pfn_valid(pfn)) ··· 472 472 unsigned long paddr = pfn << PAGE_SHIFT; 473 473 int ret, level; 474 474 475 - if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) 475 + if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP)) 476 476 return -ENODEV; 477 477 478 478 level = RMP_TO_PG_LEVEL(state->pagesize); ··· 558 558 spin_unlock(&snp_leaked_pages_list_lock); 559 559 } 560 560 EXPORT_SYMBOL_GPL(snp_leak_pages); 561 + 562 + void kdump_sev_callback(void) 563 + { 564 + /* 565 + * Do wbinvd() on remote CPUs when SNP is enabled in order to 566 + * safely do SNP_SHUTDOWN on the local CPU. 567 + */ 568 + if (cc_platform_has(CC_ATTR_HOST_SEV_SNP)) 569 + wbinvd(); 570 + }
+66 -18
block/bdev.c
··· 583 583 mutex_unlock(&bdev->bd_holder_lock); 584 584 bd_clear_claiming(whole, holder); 585 585 mutex_unlock(&bdev_lock); 586 - 587 - if (hops && hops->get_holder) 588 - hops->get_holder(holder); 589 586 } 590 587 591 588 /** ··· 605 608 static void bd_end_claim(struct block_device *bdev, void *holder) 606 609 { 607 610 struct block_device *whole = bdev_whole(bdev); 608 - const struct blk_holder_ops *hops = bdev->bd_holder_ops; 609 611 bool unblock = false; 610 612 611 613 /* ··· 626 630 if (!whole->bd_holders) 627 631 whole->bd_holder = NULL; 628 632 mutex_unlock(&bdev_lock); 629 - 630 - if (hops && hops->put_holder) 631 - hops->put_holder(holder); 632 633 633 634 /* 634 635 * If this was the last claim, remove holder link and unblock evpoll if ··· 769 776 770 777 static bool bdev_writes_blocked(struct block_device *bdev) 771 778 { 772 - return bdev->bd_writers == -1; 779 + return bdev->bd_writers < 0; 773 780 } 774 781 775 782 static void bdev_block_writes(struct block_device *bdev) 776 783 { 777 - bdev->bd_writers = -1; 784 + bdev->bd_writers--; 778 785 } 779 786 780 787 static void bdev_unblock_writes(struct block_device *bdev) 781 788 { 782 - bdev->bd_writers = 0; 789 + bdev->bd_writers++; 783 790 } 784 791 785 792 static bool bdev_may_open(struct block_device *bdev, blk_mode_t mode) ··· 806 813 bdev->bd_writers++; 807 814 } 808 815 816 + static inline bool bdev_unclaimed(const struct file *bdev_file) 817 + { 818 + return bdev_file->private_data == BDEV_I(bdev_file->f_mapping->host); 819 + } 820 + 809 821 static void bdev_yield_write_access(struct file *bdev_file) 810 822 { 811 823 struct block_device *bdev; ··· 818 820 if (bdev_allow_write_mounted) 819 821 return; 820 822 823 + if (bdev_unclaimed(bdev_file)) 824 + return; 825 + 821 826 bdev = file_bdev(bdev_file); 822 - /* Yield exclusive or shared write access. 
*/ 823 - if (bdev_file->f_mode & FMODE_WRITE) { 824 - if (bdev_writes_blocked(bdev)) 825 - bdev_unblock_writes(bdev); 826 - else 827 - bdev->bd_writers--; 828 - } 827 + 828 + if (bdev_file->f_mode & FMODE_WRITE_RESTRICTED) 829 + bdev_unblock_writes(bdev); 830 + else if (bdev_file->f_mode & FMODE_WRITE) 831 + bdev->bd_writers--; 829 832 } 830 833 831 834 /** ··· 906 907 bdev_file->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT; 907 908 if (bdev_nowait(bdev)) 908 909 bdev_file->f_mode |= FMODE_NOWAIT; 910 + if (mode & BLK_OPEN_RESTRICT_WRITES) 911 + bdev_file->f_mode |= FMODE_WRITE_RESTRICTED; 909 912 bdev_file->f_mapping = bdev->bd_inode->i_mapping; 910 913 bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping); 911 914 bdev_file->private_data = holder; ··· 1013 1012 } 1014 1013 EXPORT_SYMBOL(bdev_file_open_by_path); 1015 1014 1015 + static inline void bd_yield_claim(struct file *bdev_file) 1016 + { 1017 + struct block_device *bdev = file_bdev(bdev_file); 1018 + void *holder = bdev_file->private_data; 1019 + 1020 + lockdep_assert_held(&bdev->bd_disk->open_mutex); 1021 + 1022 + if (WARN_ON_ONCE(IS_ERR_OR_NULL(holder))) 1023 + return; 1024 + 1025 + if (!bdev_unclaimed(bdev_file)) 1026 + bd_end_claim(bdev, holder); 1027 + } 1028 + 1016 1029 void bdev_release(struct file *bdev_file) 1017 1030 { 1018 1031 struct block_device *bdev = file_bdev(bdev_file); ··· 1051 1036 bdev_yield_write_access(bdev_file); 1052 1037 1053 1038 if (holder) 1054 - bd_end_claim(bdev, holder); 1039 + bd_yield_claim(bdev_file); 1055 1040 1056 1041 /* 1057 1042 * Trigger event checking and tell drivers to flush MEDIA_CHANGE ··· 1070 1055 put_no_open: 1071 1056 blkdev_put_no_open(bdev); 1072 1057 } 1058 + 1059 + /** 1060 + * bdev_fput - yield claim to the block device and put the file 1061 + * @bdev_file: open block device 1062 + * 1063 + * Yield claim on the block device and put the file. Ensure that the 1064 + * block device can be reclaimed before the file is closed which is a 1065 + * deferred operation. 1066 + */ 1067 + void bdev_fput(struct file *bdev_file) 1068 + { 1069 + if (WARN_ON_ONCE(bdev_file->f_op != &def_blk_fops)) 1070 + return; 1071 + 1072 + if (bdev_file->private_data) { 1073 + struct block_device *bdev = file_bdev(bdev_file); 1074 + struct gendisk *disk = bdev->bd_disk; 1075 + 1076 + mutex_lock(&disk->open_mutex); 1077 + bdev_yield_write_access(bdev_file); 1078 + bd_yield_claim(bdev_file); 1079 + /* 1080 + * Tell release we already gave up our hold on the 1081 + * device and if write restrictions are available that 1082 + * we already gave up write access to the device. 1083 + */ 1084 + bdev_file->private_data = BDEV_I(bdev_file->f_mapping->host); 1085 + mutex_unlock(&disk->open_mutex); 1086 + } 1087 + 1088 + fput(bdev_file); 1089 + } 1090 + EXPORT_SYMBOL(bdev_fput); 1073 1091 1074 1092 /** 1075 1093 * lookup_bdev() - Look up a struct block_device by name.
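Note on the hunk above: bdev_fput() lets a holder drop its exclusive claim and restricted write access synchronously, instead of waiting for the deferred fput() of the file to release them. A kernel-context sketch of the intended pairing (hypothetical helper names; the bdev_file_open_by_path() argument list is an assumption based on this series, not a definitive API reference):

    static struct file *example_claim_bdev(const char *path, void *holder)
    {
            return bdev_file_open_by_path(path,
                            BLK_OPEN_READ | BLK_OPEN_WRITE | BLK_OPEN_RESTRICT_WRITES,
                            holder, NULL);
    }

    static void example_release_bdev(struct file *bdev_file)
    {
            if (!IS_ERR_OR_NULL(bdev_file))
                    bdev_fput(bdev_file);   /* yields claim and write access, then fput() */
    }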
+3 -2
block/ioctl.c
··· 96 96 unsigned long arg) 97 97 { 98 98 uint64_t range[2]; 99 - uint64_t start, len; 99 + uint64_t start, len, end; 100 100 struct inode *inode = bdev->bd_inode; 101 101 int err; 102 102 ··· 117 117 if (len & 511) 118 118 return -EINVAL; 119 119 120 - if (start + len > bdev_nr_bytes(bdev)) 120 + if (check_add_overflow(start, len, &end) || 121 + end > bdev_nr_bytes(bdev)) 121 122 return -EINVAL; 122 123 123 124 filemap_invalidate_lock(inode->i_mapping);
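Note on the hunk above: with a 64-bit start and len coming from userspace, start + len can wrap around and slip past the size check, so the range is now validated with check_add_overflow(). A standalone C illustration using the compiler builtin that the kernel helper wraps (hypothetical limit values, not kernel code):

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    static bool range_ok_naive(uint64_t start, uint64_t len, uint64_t limit)
    {
            return start + len <= limit;            /* wraps for a huge len */
    }

    static bool range_ok_checked(uint64_t start, uint64_t len, uint64_t limit)
    {
            uint64_t end;

            if (__builtin_add_overflow(start, len, &end))   /* what check_add_overflow() uses */
                    return false;
            return end <= limit;
    }

    int main(void)
    {
            uint64_t start = 512, len = UINT64_MAX - 256, limit = 1ULL << 30;

            printf("naive:   %d\n", range_ok_naive(start, len, limit));    /* 1: wrongly accepted */
            printf("checked: %d\n", range_ok_checked(start, len, limit));  /* 0: rejected */
            return 0;
    }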
+10 -12
drivers/acpi/thermal.c
··· 662 662 { 663 663 int result; 664 664 665 - tz->thermal_zone = thermal_zone_device_register_with_trips("acpitz", 666 - trip_table, 667 - trip_count, 668 - tz, 669 - &acpi_thermal_zone_ops, 670 - NULL, 671 - passive_delay, 672 - tz->polling_frequency * 100); 665 + if (trip_count) 666 + tz->thermal_zone = thermal_zone_device_register_with_trips( 667 + "acpitz", trip_table, trip_count, tz, 668 + &acpi_thermal_zone_ops, NULL, passive_delay, 669 + tz->polling_frequency * 100); 670 + else 671 + tz->thermal_zone = thermal_tripless_zone_device_register( 672 + "acpitz", tz, &acpi_thermal_zone_ops, NULL); 673 + 673 674 if (IS_ERR(tz->thermal_zone)) 674 675 return PTR_ERR(tz->thermal_zone); 675 676 ··· 902 901 trip++; 903 902 } 904 903 905 - if (trip == trip_table) { 904 + if (trip == trip_table) 906 905 pr_warn(FW_BUG "No valid trip points!\n"); 907 - result = -ENODEV; 908 - goto free_memory; 909 - } 910 906 911 907 result = acpi_thermal_register_thermal_zone(tz, trip_table, 912 908 trip - trip_table,
-1
drivers/ata/ahci_st.c
··· 30 30 #define ST_AHCI_OOBR_CIMAX_SHIFT 0 31 31 32 32 struct st_ahci_drv_data { 33 - struct platform_device *ahci; 34 33 struct reset_control *pwr; 35 34 struct reset_control *sw_rst; 36 35 struct reset_control *pwr_rst;
-3
drivers/ata/pata_macio.c
··· 1371 1371 .suspend = pata_macio_pci_suspend, 1372 1372 .resume = pata_macio_pci_resume, 1373 1373 #endif 1374 - .driver = { 1375 - .owner = THIS_MODULE, 1376 - }, 1377 1374 }; 1378 1375 MODULE_DEVICE_TABLE(pci, pata_macio_pci_match); 1379 1376
+4 -1
drivers/ata/sata_gemini.c
··· 200 200 pclk = sg->sata0_pclk; 201 201 else 202 202 pclk = sg->sata1_pclk; 203 - clk_enable(pclk); 203 + ret = clk_enable(pclk); 204 + if (ret) 205 + return ret; 206 + 204 207 msleep(10); 205 208 206 209 /* Do not keep clocking a bridge that is not online */
+31 -32
drivers/ata/sata_mv.c
··· 787 787 }, 788 788 }; 789 789 790 - static const struct pci_device_id mv_pci_tbl[] = { 791 - { PCI_VDEVICE(MARVELL, 0x5040), chip_504x }, 792 - { PCI_VDEVICE(MARVELL, 0x5041), chip_504x }, 793 - { PCI_VDEVICE(MARVELL, 0x5080), chip_5080 }, 794 - { PCI_VDEVICE(MARVELL, 0x5081), chip_508x }, 795 - /* RocketRAID 1720/174x have different identifiers */ 796 - { PCI_VDEVICE(TTI, 0x1720), chip_6042 }, 797 - { PCI_VDEVICE(TTI, 0x1740), chip_6042 }, 798 - { PCI_VDEVICE(TTI, 0x1742), chip_6042 }, 799 - 800 - { PCI_VDEVICE(MARVELL, 0x6040), chip_604x }, 801 - { PCI_VDEVICE(MARVELL, 0x6041), chip_604x }, 802 - { PCI_VDEVICE(MARVELL, 0x6042), chip_6042 }, 803 - { PCI_VDEVICE(MARVELL, 0x6080), chip_608x }, 804 - { PCI_VDEVICE(MARVELL, 0x6081), chip_608x }, 805 - 806 - { PCI_VDEVICE(ADAPTEC2, 0x0241), chip_604x }, 807 - 808 - /* Adaptec 1430SA */ 809 - { PCI_VDEVICE(ADAPTEC2, 0x0243), chip_7042 }, 810 - 811 - /* Marvell 7042 support */ 812 - { PCI_VDEVICE(MARVELL, 0x7042), chip_7042 }, 813 - 814 - /* Highpoint RocketRAID PCIe series */ 815 - { PCI_VDEVICE(TTI, 0x2300), chip_7042 }, 816 - { PCI_VDEVICE(TTI, 0x2310), chip_7042 }, 817 - 818 - { } /* terminate list */ 819 - }; 820 - 821 790 static const struct mv_hw_ops mv5xxx_ops = { 822 791 .phy_errata = mv5_phy_errata, 823 792 .enable_leds = mv5_enable_leds, ··· 4272 4303 static int mv_pci_device_resume(struct pci_dev *pdev); 4273 4304 #endif 4274 4305 4306 + static const struct pci_device_id mv_pci_tbl[] = { 4307 + { PCI_VDEVICE(MARVELL, 0x5040), chip_504x }, 4308 + { PCI_VDEVICE(MARVELL, 0x5041), chip_504x }, 4309 + { PCI_VDEVICE(MARVELL, 0x5080), chip_5080 }, 4310 + { PCI_VDEVICE(MARVELL, 0x5081), chip_508x }, 4311 + /* RocketRAID 1720/174x have different identifiers */ 4312 + { PCI_VDEVICE(TTI, 0x1720), chip_6042 }, 4313 + { PCI_VDEVICE(TTI, 0x1740), chip_6042 }, 4314 + { PCI_VDEVICE(TTI, 0x1742), chip_6042 }, 4315 + 4316 + { PCI_VDEVICE(MARVELL, 0x6040), chip_604x }, 4317 + { PCI_VDEVICE(MARVELL, 0x6041), chip_604x }, 4318 + { PCI_VDEVICE(MARVELL, 0x6042), chip_6042 }, 4319 + { PCI_VDEVICE(MARVELL, 0x6080), chip_608x }, 4320 + { PCI_VDEVICE(MARVELL, 0x6081), chip_608x }, 4321 + 4322 + { PCI_VDEVICE(ADAPTEC2, 0x0241), chip_604x }, 4323 + 4324 + /* Adaptec 1430SA */ 4325 + { PCI_VDEVICE(ADAPTEC2, 0x0243), chip_7042 }, 4326 + 4327 + /* Marvell 7042 support */ 4328 + { PCI_VDEVICE(MARVELL, 0x7042), chip_7042 }, 4329 + 4330 + /* Highpoint RocketRAID PCIe series */ 4331 + { PCI_VDEVICE(TTI, 0x2300), chip_7042 }, 4332 + { PCI_VDEVICE(TTI, 0x2310), chip_7042 }, 4333 + 4334 + { } /* terminate list */ 4335 + }; 4275 4336 4276 4337 static struct pci_driver mv_pci_driver = { 4277 4338 .name = DRV_NAME, ··· 4314 4315 #endif 4315 4316 4316 4317 }; 4318 + MODULE_DEVICE_TABLE(pci, mv_pci_tbl); 4317 4319 4318 4320 /** 4319 4321 * mv_print_info - Dump key info to kernel log for perusal. ··· 4487 4487 MODULE_AUTHOR("Brett Russ"); 4488 4488 MODULE_DESCRIPTION("SCSI low-level driver for Marvell SATA controllers"); 4489 4489 MODULE_LICENSE("GPL v2"); 4490 - MODULE_DEVICE_TABLE(pci, mv_pci_tbl); 4491 4490 MODULE_VERSION(DRV_VERSION); 4492 4491 MODULE_ALIAS("platform:" DRV_NAME); 4493 4492
+2 -4
drivers/ata/sata_sx4.c
··· 957 957 958 958 offset -= (idx * window_size); 959 959 idx++; 960 - dist = ((long) (window_size - (offset + size))) >= 0 ? size : 961 - (long) (window_size - offset); 960 + dist = min(size, window_size - offset); 962 961 memcpy_fromio(psource, dimm_mmio + offset / 4, dist); 963 962 964 963 psource += dist; ··· 1004 1005 readl(mmio + PDC_DIMM_WINDOW_CTLR); 1005 1006 offset -= (idx * window_size); 1006 1007 idx++; 1007 - dist = ((long)(s32)(window_size - (offset + size))) >= 0 ? size : 1008 - (long) (window_size - offset); 1008 + dist = min(size, window_size - offset); 1009 1009 memcpy_toio(dimm_mmio + offset / 4, psource, dist); 1010 1010 writel(0x01, mmio + PDC_GENERAL_CTLR); 1011 1011 readl(mmio + PDC_GENERAL_CTLR);
+23 -3
drivers/base/core.c
··· 44 44 static void __fw_devlink_link_to_consumers(struct device *dev); 45 45 static bool fw_devlink_drv_reg_done; 46 46 static bool fw_devlink_best_effort; 47 + static struct workqueue_struct *device_link_wq; 47 48 48 49 /** 49 50 * __fwnode_link_add - Create a link between two fwnode_handles. ··· 534 533 /* 535 534 * It may take a while to complete this work because of the SRCU 536 535 * synchronization in device_link_release_fn() and if the consumer or 537 - * supplier devices get deleted when it runs, so put it into the "long" 538 - * workqueue. 536 + * supplier devices get deleted when it runs, so put it into the 537 + * dedicated workqueue. 539 538 */ 540 - queue_work(system_long_wq, &link->rm_work); 539 + queue_work(device_link_wq, &link->rm_work); 541 540 } 541 + 542 + /** 543 + * device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate 544 + */ 545 + void device_link_wait_removal(void) 546 + { 547 + /* 548 + * devlink removal jobs are queued in the dedicated work queue. 549 + * To be sure that all removal jobs are terminated, ensure that any 550 + * scheduled work has run to completion. 551 + */ 552 + flush_workqueue(device_link_wq); 553 + } 554 + EXPORT_SYMBOL_GPL(device_link_wait_removal); 542 555 543 556 static struct class devlink_class = { 544 557 .name = "devlink", ··· 4179 4164 sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj); 4180 4165 if (!sysfs_dev_char_kobj) 4181 4166 goto char_kobj_err; 4167 + device_link_wq = alloc_workqueue("device_link_wq", 0, 0); 4168 + if (!device_link_wq) 4169 + goto wq_err; 4182 4170 4183 4171 return 0; 4184 4172 4173 + wq_err: 4174 + kobject_put(sysfs_dev_char_kobj); 4185 4175 char_kobj_err: 4186 4176 kobject_put(sysfs_dev_block_kobj); 4187 4177 block_kobj_err:
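Note on the hunk above: device link teardown moves from system_long_wq to a private workqueue so device_link_wait_removal() can flush only the pending link removals. A minimal kernel-context sketch of the same dedicated-workqueue pattern (hypothetical names, assumes <linux/workqueue.h> and <linux/slab.h>; not part of this merge):

    static struct workqueue_struct *example_wq;

    struct example_item {
            struct work_struct rm_work;
    };

    static void example_rm_fn(struct work_struct *work)
    {
            kfree(container_of(work, struct example_item, rm_work));
    }

    static int __init example_init(void)
    {
            example_wq = alloc_workqueue("example_wq", 0, 0);
            return example_wq ? 0 : -ENOMEM;
    }

    static void example_defer_free(struct example_item *item)
    {
            INIT_WORK(&item->rm_work, example_rm_fn);
            queue_work(example_wq, &item->rm_work);
    }

    static void example_wait_removal(void)
    {
            flush_workqueue(example_wq);    /* all queued removals have completed */
    }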
+3 -3
drivers/base/regmap/regcache-maple.c
··· 112 112 unsigned long *entry, *lower, *upper; 113 113 unsigned long lower_index, lower_last; 114 114 unsigned long upper_index, upper_last; 115 - int ret; 115 + int ret = 0; 116 116 117 117 lower = NULL; 118 118 upper = NULL; ··· 145 145 upper_index = max + 1; 146 146 upper_last = mas.last; 147 147 148 - upper = kmemdup(&entry[max + 1], 148 + upper = kmemdup(&entry[max - mas.index + 1], 149 149 ((mas.last - max) * 150 150 sizeof(unsigned long)), 151 151 map->alloc_flags); ··· 244 244 unsigned long lmin = min; 245 245 unsigned long lmax = max; 246 246 unsigned int r, v, sync_start; 247 - int ret; 247 + int ret = 0; 248 248 bool sync_needed = false; 249 249 250 250 map->cache_bypass = true;
+37
drivers/base/regmap/regmap.c
··· 2839 2839 EXPORT_SYMBOL_GPL(regmap_read); 2840 2840 2841 2841 /** 2842 + * regmap_read_bypassed() - Read a value from a single register direct 2843 + * from the device, bypassing the cache 2844 + * 2845 + * @map: Register map to read from 2846 + * @reg: Register to be read from 2847 + * @val: Pointer to store read value 2848 + * 2849 + * A value of zero will be returned on success, a negative errno will 2850 + * be returned in error cases. 2851 + */ 2852 + int regmap_read_bypassed(struct regmap *map, unsigned int reg, unsigned int *val) 2853 + { 2854 + int ret; 2855 + bool bypass, cache_only; 2856 + 2857 + if (!IS_ALIGNED(reg, map->reg_stride)) 2858 + return -EINVAL; 2859 + 2860 + map->lock(map->lock_arg); 2861 + 2862 + bypass = map->cache_bypass; 2863 + cache_only = map->cache_only; 2864 + map->cache_bypass = true; 2865 + map->cache_only = false; 2866 + 2867 + ret = _regmap_read(map, reg, val); 2868 + 2869 + map->cache_bypass = bypass; 2870 + map->cache_only = cache_only; 2871 + 2872 + map->unlock(map->lock_arg); 2873 + 2874 + return ret; 2875 + } 2876 + EXPORT_SYMBOL_GPL(regmap_read_bypassed); 2877 + 2878 + /** 2842 2879 * regmap_raw_read() - Read raw data from the device 2843 2880 * 2844 2881 * @map: Register map to read from
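Note on the hunk above: regmap_read_bypassed() performs a hardware read even while the map is in cached or cache-only mode. A kernel-context sketch of a caller (hypothetical register offset and bit, assumes <linux/regmap.h> and <linux/bits.h>; not part of this merge):

    #define EXAMPLE_STATUS_REG      0x04    /* hypothetical device register */
    #define EXAMPLE_READY_BIT       BIT(0)

    static int example_check_ready(struct regmap *map)
    {
            unsigned int status;
            int ret;

            /* goes to the device even if regcache_cache_only(map, true) is in effect */
            ret = regmap_read_bypassed(map, EXAMPLE_STATUS_REG, &status);
            if (ret)
                    return ret;

            return (status & EXAMPLE_READY_BIT) ? 0 : -EBUSY;
    }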
+2 -2
drivers/block/null_blk/main.c
··· 1965 1965 1966 1966 out_ida_free: 1967 1967 ida_free(&nullb_indexes, nullb->index); 1968 - out_cleanup_zone: 1969 - null_free_zoned_dev(dev); 1970 1968 out_cleanup_disk: 1971 1969 put_disk(nullb->disk); 1970 + out_cleanup_zone: 1971 + null_free_zoned_dev(dev); 1972 1972 out_cleanup_tags: 1973 1973 if (nullb->tag_set == &nullb->__tag_set) 1974 1974 blk_mq_free_tag_set(nullb->tag_set);
+6 -2
drivers/bluetooth/btqca.c
··· 826 826 827 827 int qca_set_bdaddr(struct hci_dev *hdev, const bdaddr_t *bdaddr) 828 828 { 829 + bdaddr_t bdaddr_swapped; 829 830 struct sk_buff *skb; 830 831 int err; 831 832 832 - skb = __hci_cmd_sync_ev(hdev, EDL_WRITE_BD_ADDR_OPCODE, 6, bdaddr, 833 - HCI_EV_VENDOR, HCI_INIT_TIMEOUT); 833 + baswap(&bdaddr_swapped, bdaddr); 834 + 835 + skb = __hci_cmd_sync_ev(hdev, EDL_WRITE_BD_ADDR_OPCODE, 6, 836 + &bdaddr_swapped, HCI_EV_VENDOR, 837 + HCI_INIT_TIMEOUT); 834 838 if (IS_ERR(skb)) { 835 839 err = PTR_ERR(skb); 836 840 bt_dev_err(hdev, "QCA Change address cmd failed (%d)", err);
+9 -10
drivers/bluetooth/hci_qca.c
··· 7 7 * 8 8 * Copyright (C) 2007 Texas Instruments, Inc. 9 9 * Copyright (c) 2010, 2012, 2018 The Linux Foundation. All rights reserved. 10 - * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved. 11 10 * 12 11 * Acknowledgements: 13 12 * This file is based on hci_ll.c, which was... ··· 225 226 struct qca_power *bt_power; 226 227 u32 init_speed; 227 228 u32 oper_speed; 229 + bool bdaddr_property_broken; 228 230 const char *firmware_name; 229 231 }; 230 232 ··· 1843 1843 const char *firmware_name = qca_get_firmware_name(hu); 1844 1844 int ret; 1845 1845 struct qca_btsoc_version ver; 1846 + struct qca_serdev *qcadev; 1846 1847 const char *soc_name; 1847 1848 1848 1849 ret = qca_check_speeds(hu); ··· 1905 1904 case QCA_WCN6750: 1906 1905 case QCA_WCN6855: 1907 1906 case QCA_WCN7850: 1907 + set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks); 1908 1908 1909 - /* Set BDA quirk bit for reading BDA value from fwnode property 1910 - * only if that property exist in DT. 1911 - */ 1912 - if (fwnode_property_present(dev_fwnode(hdev->dev.parent), "local-bd-address")) { 1913 - set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks); 1914 - bt_dev_info(hdev, "setting quirk bit to read BDA from fwnode later"); 1915 - } else { 1916 - bt_dev_dbg(hdev, "local-bd-address` is not present in the devicetree so not setting quirk bit for BDA"); 1917 - } 1909 + qcadev = serdev_device_get_drvdata(hu->serdev); 1910 + if (qcadev->bdaddr_property_broken) 1911 + set_bit(HCI_QUIRK_BDADDR_PROPERTY_BROKEN, &hdev->quirks); 1918 1912 1919 1913 hci_set_aosp_capable(hdev); 1920 1914 ··· 2290 2294 &qcadev->oper_speed); 2291 2295 if (!qcadev->oper_speed) 2292 2296 BT_DBG("UART will pick default operating speed"); 2297 + 2298 + qcadev->bdaddr_property_broken = device_property_read_bool(&serdev->dev, 2299 + "qcom,local-bd-address-broken"); 2293 2300 2294 2301 if (data) 2295 2302 qcadev->btsoc_type = data->soc_type;
+1 -1
drivers/crypto/ccp/sev-dev.c
··· 1090 1090 void *arg = &data; 1091 1091 int cmd, rc = 0; 1092 1092 1093 - if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) 1093 + if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP)) 1094 1094 return -ENODEV; 1095 1095 1096 1096 sev = psp->sev_data;
+5 -1
drivers/firewire/ohci.c
··· 2060 2060 2061 2061 ohci->generation = generation; 2062 2062 reg_write(ohci, OHCI1394_IntEventClear, OHCI1394_busReset); 2063 + if (param_debug & OHCI_PARAM_DEBUG_BUSRESETS) 2064 + reg_write(ohci, OHCI1394_IntMaskSet, OHCI1394_busReset); 2063 2065 2064 2066 if (ohci->quirks & QUIRK_RESET_PACKET) 2065 2067 ohci->request_generation = generation; ··· 2127 2125 return IRQ_NONE; 2128 2126 2129 2127 /* 2130 - * busReset and postedWriteErr must not be cleared yet 2128 + * busReset and postedWriteErr events must not be cleared yet 2131 2129 * (OHCI 1.1 clauses 7.2.3.2 and 13.2.8.1) 2132 2130 */ 2133 2131 reg_write(ohci, OHCI1394_IntEventClear, 2134 2132 event & ~(OHCI1394_busReset | OHCI1394_postedWriteErr)); 2135 2133 log_irqs(ohci, event); 2134 + if (event & OHCI1394_busReset) 2135 + reg_write(ohci, OHCI1394_IntMaskClear, OHCI1394_busReset); 2136 2136 2137 2137 if (event & OHCI1394_selfIDComplete) 2138 2138 queue_work(selfid_workqueue, &ohci->bus_reset_work);
+32 -16
drivers/gpio/gpiolib-cdev.c
··· 728 728 GPIO_V2_LINE_EVENT_FALLING_EDGE; 729 729 } 730 730 731 + static inline char *make_irq_label(const char *orig) 732 + { 733 + char *new; 734 + 735 + if (!orig) 736 + return NULL; 737 + 738 + new = kstrdup_and_replace(orig, '/', ':', GFP_KERNEL); 739 + if (!new) 740 + return ERR_PTR(-ENOMEM); 741 + 742 + return new; 743 + } 744 + 745 + static inline void free_irq_label(const char *label) 746 + { 747 + kfree(label); 748 + } 749 + 731 750 #ifdef CONFIG_HTE 732 751 733 752 static enum hte_return process_hw_ts_thread(void *p) ··· 1034 1015 { 1035 1016 unsigned long irqflags; 1036 1017 int ret, level, irq; 1018 + char *label; 1037 1019 1038 1020 /* try hardware */ 1039 1021 ret = gpiod_set_debounce(line->desc, debounce_period_us); ··· 1057 1037 if (irq < 0) 1058 1038 return -ENXIO; 1059 1039 1040 + label = make_irq_label(line->req->label); 1041 + if (IS_ERR(label)) 1042 + return -ENOMEM; 1043 + 1060 1044 irqflags = IRQF_TRIGGER_FALLING | IRQF_TRIGGER_RISING; 1061 1045 ret = request_irq(irq, debounce_irq_handler, irqflags, 1062 - line->req->label, line); 1063 - if (ret) 1046 + label, line); 1047 + if (ret) { 1048 + free_irq_label(label); 1064 1049 return ret; 1050 + } 1065 1051 line->irq = irq; 1066 1052 } else { 1067 1053 ret = hte_edge_setup(line, GPIO_V2_LINE_FLAG_EDGE_BOTH); ··· 1107 1081 return lc->attrs[i].attr.debounce_period_us; 1108 1082 } 1109 1083 return 0; 1110 - } 1111 - 1112 - static inline char *make_irq_label(const char *orig) 1113 - { 1114 - return kstrdup_and_replace(orig, '/', ':', GFP_KERNEL); 1115 - } 1116 - 1117 - static inline void free_irq_label(const char *label) 1118 - { 1119 - kfree(label); 1120 1084 } 1121 1085 1122 1086 static void edge_detector_stop(struct line *line) ··· 1174 1158 irqflags |= IRQF_ONESHOT; 1175 1159 1176 1160 label = make_irq_label(line->req->label); 1177 - if (!label) 1178 - return -ENOMEM; 1161 + if (IS_ERR(label)) 1162 + return PTR_ERR(label); 1179 1163 1180 1164 /* Request a thread to read the events */ 1181 1165 ret = request_threaded_irq(irq, edge_irq_handler, edge_irq_thread, ··· 2233 2217 goto out_free_le; 2234 2218 2235 2219 label = make_irq_label(le->label); 2236 - if (!label) { 2237 - ret = -ENOMEM; 2220 + if (IS_ERR(label)) { 2221 + ret = PTR_ERR(label); 2238 2222 goto out_free_le; 2239 2223 } 2240 2224
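Note on the hunk above: make_irq_label() now returns ERR_PTR(-ENOMEM) on allocation failure, so NULL stays a valid "no label" result and callers can tell the two apart with IS_ERR(). A minimal kernel-context sketch of that convention (hypothetical helper, assumes <linux/err.h> and <linux/slab.h>; not part of this merge):

    static char *example_copy_label(const char *orig)
    {
            char *new;

            if (!orig)
                    return NULL;                    /* valid: no label requested */

            new = kstrdup(orig, GFP_KERNEL);
            if (!new)
                    return ERR_PTR(-ENOMEM);        /* distinguishable from NULL */

            return new;
    }

    static int example_use_label(const char *orig)
    {
            char *label = example_copy_label(orig);

            if (IS_ERR(label))
                    return PTR_ERR(label);
            /* ... use label, which may legitimately be NULL ... */
            kfree(label);
            return 0;
    }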
+3
drivers/gpio/gpiolib.c
··· 1175 1175 1176 1176 list_for_each_entry_srcu(gdev, &gpio_devices, list, 1177 1177 srcu_read_lock_held(&gpio_devices_srcu)) { 1178 + if (!device_is_registered(&gdev->dev)) 1179 + continue; 1180 + 1178 1181 guard(srcu)(&gdev->srcu); 1179 1182 1180 1183 gc = srcu_dereference(gdev->chip, &gdev->srcu);
+2 -2
drivers/gpu/drm/display/drm_dp_dual_mode_helper.c
··· 52 52 * @adapter: I2C adapter for the DDC bus 53 53 * @offset: register offset 54 54 * @buffer: buffer for return data 55 - * @size: sizo of the buffer 55 + * @size: size of the buffer 56 56 * 57 57 * Reads @size bytes from the DP dual mode adaptor registers 58 58 * starting at @offset. ··· 116 116 * @adapter: I2C adapter for the DDC bus 117 117 * @offset: register offset 118 118 * @buffer: buffer for write data 119 - * @size: sizo of the buffer 119 + * @size: size of the buffer 120 120 * 121 121 * Writes @size bytes to the DP dual mode adaptor registers 122 122 * starting at @offset.
+6 -1
drivers/gpu/drm/drm_prime.c
··· 582 582 { 583 583 struct drm_gem_object *obj = dma_buf->priv; 584 584 585 - if (!obj->funcs->get_sg_table) 585 + /* 586 + * drm_gem_map_dma_buf() requires obj->get_sg_table(), but drivers 587 + * that implement their own ->map_dma_buf() do not. 588 + */ 589 + if (dma_buf->ops->map_dma_buf == drm_gem_map_dma_buf && 590 + !obj->funcs->get_sg_table) 586 591 return -ENOSYS; 587 592 588 593 return drm_gem_pin(obj);
+1
drivers/gpu/drm/i915/Makefile
··· 118 118 gt/intel_ggtt_fencing.o \ 119 119 gt/intel_gt.o \ 120 120 gt/intel_gt_buffer_pool.o \ 121 + gt/intel_gt_ccs_mode.o \ 121 122 gt/intel_gt_clock_utils.o \ 122 123 gt/intel_gt_debugfs.o \ 123 124 gt/intel_gt_engines_debugfs.o \
-9
drivers/gpu/drm/i915/display/intel_display.c
··· 2709 2709 */ 2710 2710 intel_de_write(dev_priv, PIPESRC(pipe), 2711 2711 PIPESRC_WIDTH(width - 1) | PIPESRC_HEIGHT(height - 1)); 2712 - 2713 - if (!crtc_state->enable_psr2_su_region_et) 2714 - return; 2715 - 2716 - width = drm_rect_width(&crtc_state->psr2_su_area); 2717 - height = drm_rect_height(&crtc_state->psr2_su_area); 2718 - 2719 - intel_de_write(dev_priv, PIPE_SRCSZ_ERLY_TPT(pipe), 2720 - PIPESRC_WIDTH(width - 1) | PIPESRC_HEIGHT(height - 1)); 2721 2712 } 2722 2713 2723 2714 static bool intel_pipe_is_interlaced(const struct intel_crtc_state *crtc_state)
+1
drivers/gpu/drm/i915/display/intel_display_device.h
··· 47 47 #define HAS_DPT(i915) (DISPLAY_VER(i915) >= 13) 48 48 #define HAS_DSB(i915) (DISPLAY_INFO(i915)->has_dsb) 49 49 #define HAS_DSC(__i915) (DISPLAY_RUNTIME_INFO(__i915)->has_dsc) 50 + #define HAS_DSC_MST(__i915) (DISPLAY_VER(__i915) >= 12 && HAS_DSC(__i915)) 50 51 #define HAS_FBC(i915) (DISPLAY_RUNTIME_INFO(i915)->fbc_mask != 0) 51 52 #define HAS_FPGA_DBG_UNCLAIMED(i915) (DISPLAY_INFO(i915)->has_fpga_dbg) 52 53 #define HAS_FW_BLC(i915) (DISPLAY_VER(i915) >= 3)
+2
drivers/gpu/drm/i915/display/intel_display_types.h
··· 1423 1423 1424 1424 u32 psr2_man_track_ctl; 1425 1425 1426 + u32 pipe_srcsz_early_tpt; 1427 + 1426 1428 struct drm_rect psr2_su_area; 1427 1429 1428 1430 /* Variable Refresh Rate state */
+7 -4
drivers/gpu/drm/i915/display/intel_dp.c
··· 499 499 /* The values must be in increasing order */ 500 500 static const int mtl_rates[] = { 501 501 162000, 216000, 243000, 270000, 324000, 432000, 540000, 675000, 502 - 810000, 1000000, 1350000, 2000000, 502 + 810000, 1000000, 2000000, 503 503 }; 504 504 static const int icl_rates[] = { 505 505 162000, 216000, 270000, 324000, 432000, 540000, 648000, 810000, ··· 1422 1422 if (DISPLAY_VER(dev_priv) >= 12) 1423 1423 return true; 1424 1424 1425 - if (DISPLAY_VER(dev_priv) == 11 && encoder->port != PORT_A) 1425 + if (DISPLAY_VER(dev_priv) == 11 && encoder->port != PORT_A && 1426 + !intel_crtc_has_type(pipe_config, INTEL_OUTPUT_DP_MST)) 1426 1427 return true; 1427 1428 1428 1429 return false; ··· 1918 1917 dsc_max_bpp = min(dsc_max_bpp, pipe_bpp - 1); 1919 1918 1920 1919 for (i = 0; i < ARRAY_SIZE(valid_dsc_bpp); i++) { 1921 - if (valid_dsc_bpp[i] < dsc_min_bpp || 1922 - valid_dsc_bpp[i] > dsc_max_bpp) 1920 + if (valid_dsc_bpp[i] < dsc_min_bpp) 1921 + continue; 1922 + if (valid_dsc_bpp[i] > dsc_max_bpp) 1923 1923 break; 1924 1924 1925 1925 ret = dsc_compute_link_config(intel_dp, ··· 6559 6557 intel_connector->get_hw_state = intel_ddi_connector_get_hw_state; 6560 6558 else 6561 6559 intel_connector->get_hw_state = intel_connector_get_hw_state; 6560 + intel_connector->sync_state = intel_dp_connector_sync_state; 6562 6561 6563 6562 if (!intel_edp_init_connector(intel_dp, intel_connector)) { 6564 6563 intel_dp_aux_fini(intel_dp);
+1 -1
drivers/gpu/drm/i915/display/intel_dp_mst.c
··· 1355 1355 return 0; 1356 1356 } 1357 1357 1358 - if (DISPLAY_VER(dev_priv) >= 10 && 1358 + if (HAS_DSC_MST(dev_priv) && 1359 1359 drm_dp_sink_supports_dsc(intel_connector->dp.dsc_dpcd)) { 1360 1360 /* 1361 1361 * TBD pass the connector BPC,
+56 -22
drivers/gpu/drm/i915/display/intel_psr.c
··· 1994 1994 1995 1995 void intel_psr2_program_trans_man_trk_ctl(const struct intel_crtc_state *crtc_state) 1996 1996 { 1997 + struct intel_crtc *crtc = to_intel_crtc(crtc_state->uapi.crtc); 1997 1998 struct drm_i915_private *dev_priv = to_i915(crtc_state->uapi.crtc->dev); 1998 1999 enum transcoder cpu_transcoder = crtc_state->cpu_transcoder; 1999 2000 struct intel_encoder *encoder; ··· 2014 2013 2015 2014 intel_de_write(dev_priv, PSR2_MAN_TRK_CTL(cpu_transcoder), 2016 2015 crtc_state->psr2_man_track_ctl); 2016 + 2017 + if (!crtc_state->enable_psr2_su_region_et) 2018 + return; 2019 + 2020 + intel_de_write(dev_priv, PIPE_SRCSZ_ERLY_TPT(crtc->pipe), 2021 + crtc_state->pipe_srcsz_early_tpt); 2017 2022 } 2018 2023 2019 2024 static void psr2_man_trk_ctl_calc(struct intel_crtc_state *crtc_state, ··· 2056 2049 } 2057 2050 exit: 2058 2051 crtc_state->psr2_man_track_ctl = val; 2052 + } 2053 + 2054 + static u32 psr2_pipe_srcsz_early_tpt_calc(struct intel_crtc_state *crtc_state, 2055 + bool full_update) 2056 + { 2057 + int width, height; 2058 + 2059 + if (!crtc_state->enable_psr2_su_region_et || full_update) 2060 + return 0; 2061 + 2062 + width = drm_rect_width(&crtc_state->psr2_su_area); 2063 + height = drm_rect_height(&crtc_state->psr2_su_area); 2064 + 2065 + return PIPESRC_WIDTH(width - 1) | PIPESRC_HEIGHT(height - 1); 2059 2066 } 2060 2067 2061 2068 static void clip_area_update(struct drm_rect *overlap_damage_area, ··· 2116 2095 * cursor fully when cursor is in SU area. 2117 2096 */ 2118 2097 static void 2119 - intel_psr2_sel_fetch_et_alignment(struct intel_crtc_state *crtc_state, 2120 - struct intel_plane_state *cursor_state) 2098 + intel_psr2_sel_fetch_et_alignment(struct intel_atomic_state *state, 2099 + struct intel_crtc *crtc) 2121 2100 { 2122 - struct drm_rect inter; 2101 + struct intel_crtc_state *crtc_state = intel_atomic_get_new_crtc_state(state, crtc); 2102 + struct intel_plane_state *new_plane_state; 2103 + struct intel_plane *plane; 2104 + int i; 2123 2105 2124 - if (!crtc_state->enable_psr2_su_region_et || 2125 - !cursor_state->uapi.visible) 2106 + if (!crtc_state->enable_psr2_su_region_et) 2126 2107 return; 2127 2108 2128 - inter = crtc_state->psr2_su_area; 2129 - if (!drm_rect_intersect(&inter, &cursor_state->uapi.dst)) 2130 - return; 2109 + for_each_new_intel_plane_in_state(state, plane, new_plane_state, i) { 2110 + struct drm_rect inter; 2131 2111 2132 - clip_area_update(&crtc_state->psr2_su_area, &cursor_state->uapi.dst, 2133 - &crtc_state->pipe_src); 2112 + if (new_plane_state->uapi.crtc != crtc_state->uapi.crtc) 2113 + continue; 2114 + 2115 + if (plane->id != PLANE_CURSOR) 2116 + continue; 2117 + 2118 + if (!new_plane_state->uapi.visible) 2119 + continue; 2120 + 2121 + inter = crtc_state->psr2_su_area; 2122 + if (!drm_rect_intersect(&inter, &new_plane_state->uapi.dst)) 2123 + continue; 2124 + 2125 + clip_area_update(&crtc_state->psr2_su_area, &new_plane_state->uapi.dst, 2126 + &crtc_state->pipe_src); 2127 + } 2134 2128 } 2135 2129 2136 2130 /* ··· 2188 2152 { 2189 2153 struct drm_i915_private *dev_priv = to_i915(state->base.dev); 2190 2154 struct intel_crtc_state *crtc_state = intel_atomic_get_new_crtc_state(state, crtc); 2191 - struct intel_plane_state *new_plane_state, *old_plane_state, 2192 - *cursor_plane_state = NULL; 2155 + struct intel_plane_state *new_plane_state, *old_plane_state; 2193 2156 struct intel_plane *plane; 2194 2157 bool full_update = false; 2195 2158 int i, ret; ··· 2273 2238 damaged_area.x2 += new_plane_state->uapi.dst.x1 - src.x1; 2274 2239 2275 2240 
clip_area_update(&crtc_state->psr2_su_area, &damaged_area, &crtc_state->pipe_src); 2276 - 2277 - /* 2278 - * Cursor plane new state is stored to adjust su area to cover 2279 - * cursor are fully. 2280 - */ 2281 - if (plane->id == PLANE_CURSOR) 2282 - cursor_plane_state = new_plane_state; 2283 2241 } 2284 2242 2285 2243 /* ··· 2301 2273 if (ret) 2302 2274 return ret; 2303 2275 2304 - /* Adjust su area to cover cursor fully as necessary */ 2305 - if (cursor_plane_state) 2306 - intel_psr2_sel_fetch_et_alignment(crtc_state, cursor_plane_state); 2276 + /* 2277 + * Adjust su area to cover cursor fully as necessary (early 2278 + * transport). This needs to be done after 2279 + * drm_atomic_add_affected_planes to ensure visible cursor is added into 2280 + * affected planes even when cursor is not updated by itself. 2281 + */ 2282 + intel_psr2_sel_fetch_et_alignment(state, crtc); 2307 2283 2308 2284 intel_psr2_sel_fetch_pipe_alignment(crtc_state); 2309 2285 ··· 2370 2338 2371 2339 skip_sel_fetch_set_loop: 2372 2340 psr2_man_trk_ctl_calc(crtc_state, full_update); 2341 + crtc_state->pipe_srcsz_early_tpt = 2342 + psr2_pipe_srcsz_early_tpt_calc(crtc_state, full_update); 2373 2343 return 0; 2374 2344 } 2375 2345
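The new psr2_pipe_srcsz_early_tpt_calc() above encodes the selective-update area into PIPE_SRCSZ_ERLY_TPT with the same minus-one packing PIPESRC uses. Below is a small user-space sketch of that encoding, assuming the usual layout of width-1 in bits 31:16 and height-1 in bits 15:0; the pipesrc_* helpers are stand-ins for i915's PIPESRC_WIDTH()/PIPESRC_HEIGHT() macros, not the real definitions.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for i915's PIPESRC_WIDTH()/PIPESRC_HEIGHT() field packing. */
static uint32_t pipesrc_width(uint32_t w)  { return (w & 0xffff) << 16; }
static uint32_t pipesrc_height(uint32_t h) { return  h & 0xffff; }

/* Mirrors psr2_pipe_srcsz_early_tpt_calc(): 0 means "no early transport". */
static uint32_t srcsz_early_tpt(int width, int height, int et_enabled, int full_update)
{
	if (!et_enabled || full_update)
		return 0;
	return pipesrc_width(width - 1) | pipesrc_height(height - 1);
}

int main(void)
{
	/* A 1920x32 SU region encodes as (1919 << 16) | 31. */
	assert(srcsz_early_tpt(1920, 32, 1, 0) == ((1919u << 16) | 31u));
	/* A full update always disables the early-transport size. */
	assert(srcsz_early_tpt(1920, 32, 1, 1) == 0);
	printf("0x%08x\n", srcsz_early_tpt(1920, 32, 1, 0));
	return 0;
}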
+3
drivers/gpu/drm/i915/gt/gen8_ppgtt.c
··· 961 961 struct i915_vma *vma; 962 962 int ret; 963 963 964 + if (!intel_gt_needs_wa_16018031267(vm->gt)) 965 + return 0; 966 + 964 967 /* The memory will be used only by GPU. */ 965 968 obj = i915_gem_object_create_lmem(i915, PAGE_SIZE, 966 969 I915_BO_ALLOC_VOLATILE |
+17
drivers/gpu/drm/i915/gt/intel_engine_cs.c
··· 908 908 info->engine_mask &= ~BIT(GSC0); 909 909 } 910 910 911 + /* 912 + * Do not create the command streamer for CCS slices beyond the first. 913 + * All the workload submitted to the first engine will be shared among 914 + * all the slices. 915 + * 916 + * Once the user will be allowed to customize the CCS mode, then this 917 + * check needs to be removed. 918 + */ 919 + if (IS_DG2(gt->i915)) { 920 + u8 first_ccs = __ffs(CCS_MASK(gt)); 921 + 922 + /* Mask off all the CCS engine */ 923 + info->engine_mask &= ~GENMASK(CCS3, CCS0); 924 + /* Put back in the first CCS engine */ 925 + info->engine_mask |= BIT(_CCS(first_ccs)); 926 + } 927 + 911 928 return info->engine_mask; 912 929 } 913 930
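The DG2 hunk above reduces the engine mask to a single compute command streamer: find the first fused-in CCS with __ffs(), clear CCS0..CCS3, then put that one engine back. A stand-alone sketch of the same bit arithmetic follows, with hypothetical bit positions 0..3 standing in for the CCS0..CCS3 engine ids (the real driver uses separate engine-info and CCS fuse masks).

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit positions for CCS0..CCS3 within the engine mask. */
#define CCS0 0
#define CCS3 3
#define GENMASK(h, l) (((~0u) << (l)) & (~0u >> (31 - (h))))

static uint32_t keep_first_ccs(uint32_t engine_mask)
{
	uint32_t ccs_mask = engine_mask & GENMASK(CCS3, CCS0);
	uint32_t first_ccs;

	if (!ccs_mask)				/* demo guard; DG2 always has a CCS */
		return engine_mask;

	first_ccs = __builtin_ctz(ccs_mask);	/* __ffs() equivalent */
	engine_mask &= ~GENMASK(CCS3, CCS0);	/* mask off all the CCS engines */
	engine_mask |= 1u << first_ccs;		/* put back the first CCS engine */
	return engine_mask;
}

int main(void)
{
	/* CCS1 and CCS3 present (0xa): only CCS1 survives. */
	assert(keep_first_ccs(0xa) == 0x2);
	printf("0x%x\n", keep_first_ccs(0xa));
	return 0;
}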
+6
drivers/gpu/drm/i915/gt/intel_gt.c
··· 1024 1024 return I915_MAP_WC; 1025 1025 } 1026 1026 1027 + bool intel_gt_needs_wa_16018031267(struct intel_gt *gt) 1028 + { 1029 + /* Wa_16018031267, Wa_16018063123 */ 1030 + return IS_GFX_GT_IP_RANGE(gt, IP_VER(12, 55), IP_VER(12, 71)); 1031 + } 1032 + 1027 1033 bool intel_gt_needs_wa_22016122933(struct intel_gt *gt) 1028 1034 { 1029 1035 return MEDIA_VER_FULL(gt->i915) == IP_VER(13, 0) && gt->type == GT_MEDIA;
+5 -4
drivers/gpu/drm/i915/gt/intel_gt.h
··· 82 82 ##__VA_ARGS__); \ 83 83 } while (0) 84 84 85 - #define NEEDS_FASTCOLOR_BLT_WABB(engine) ( \ 86 - IS_GFX_GT_IP_RANGE(engine->gt, IP_VER(12, 55), IP_VER(12, 71)) && \ 87 - engine->class == COPY_ENGINE_CLASS && engine->instance == 0) 88 - 89 85 static inline bool gt_is_root(struct intel_gt *gt) 90 86 { 91 87 return !gt->info.id; 92 88 } 93 89 90 + bool intel_gt_needs_wa_16018031267(struct intel_gt *gt); 94 91 bool intel_gt_needs_wa_22016122933(struct intel_gt *gt); 92 + 93 + #define NEEDS_FASTCOLOR_BLT_WABB(engine) ( \ 94 + intel_gt_needs_wa_16018031267(engine->gt) && \ 95 + engine->class == COPY_ENGINE_CLASS && engine->instance == 0) 95 96 96 97 static inline struct intel_gt *uc_to_gt(struct intel_uc *uc) 97 98 {
+39
drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
··· 1 + // SPDX-License-Identifier: MIT 2 + /* 3 + * Copyright © 2024 Intel Corporation 4 + */ 5 + 6 + #include "i915_drv.h" 7 + #include "intel_gt.h" 8 + #include "intel_gt_ccs_mode.h" 9 + #include "intel_gt_regs.h" 10 + 11 + void intel_gt_apply_ccs_mode(struct intel_gt *gt) 12 + { 13 + int cslice; 14 + u32 mode = 0; 15 + int first_ccs = __ffs(CCS_MASK(gt)); 16 + 17 + if (!IS_DG2(gt->i915)) 18 + return; 19 + 20 + /* Build the value for the fixed CCS load balancing */ 21 + for (cslice = 0; cslice < I915_MAX_CCS; cslice++) { 22 + if (CCS_MASK(gt) & BIT(cslice)) 23 + /* 24 + * If available, assign the cslice 25 + * to the first available engine... 26 + */ 27 + mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs); 28 + 29 + else 30 + /* 31 + * ... otherwise, mark the cslice as 32 + * unavailable if no CCS dispatches here 33 + */ 34 + mode |= XEHP_CCS_MODE_CSLICE(cslice, 35 + XEHP_CCS_MODE_CSLICE_MASK); 36 + } 37 + 38 + intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode); 39 + }
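intel_gt_apply_ccs_mode() packs one 3-bit selector per compute slice into XEHP_CCS_MODE (XEHP_CCS_MODE_CSLICE_WIDTH = ilog2(0x7 + 1) = 3), pointing every fused-in slice at the first CCS engine and marking the rest unavailable. Below is a user-space rendition of that computation with I915_MAX_CCS taken as 4; the local CCS_MODE_* macros mirror the register.h definitions but are stand-ins, not the kernel headers.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define I915_MAX_CCS			4
#define CCS_MODE_CSLICE_MASK		0x7	/* 3 bits per cslice */
#define CCS_MODE_CSLICE_WIDTH		3	/* ilog2(MASK + 1) */
#define CCS_MODE_CSLICE(cslice, ccs)	((uint32_t)(ccs) << ((cslice) * CCS_MODE_CSLICE_WIDTH))

static uint32_t build_ccs_mode(uint32_t ccs_mask)
{
	uint32_t mode = 0;
	int first_ccs = __builtin_ctz(ccs_mask);
	int cslice;

	for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
		if (ccs_mask & (1u << cslice))
			/* route this cslice to the first available engine */
			mode |= CCS_MODE_CSLICE(cslice, first_ccs);
		else
			/* no CCS dispatches here: mark the cslice unavailable */
			mode |= CCS_MODE_CSLICE(cslice, CCS_MODE_CSLICE_MASK);
	}
	return mode;
}

int main(void)
{
	/*
	 * Only cslice 0 fused in: cslice 0 -> CCS0 (0), cslices 1..3 -> 0x7,
	 * i.e. (0x7 << 3) | (0x7 << 6) | (0x7 << 9) = 0xff8.
	 */
	assert(build_ccs_mode(0x1) == 0xff8);
	printf("0x%03x\n", build_ccs_mode(0x1));
	return 0;
}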
+13
drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
··· 1 + /* SPDX-License-Identifier: MIT */ 2 + /* 3 + * Copyright © 2024 Intel Corporation 4 + */ 5 + 6 + #ifndef __INTEL_GT_CCS_MODE_H__ 7 + #define __INTEL_GT_CCS_MODE_H__ 8 + 9 + struct intel_gt; 10 + 11 + void intel_gt_apply_ccs_mode(struct intel_gt *gt); 12 + 13 + #endif /* __INTEL_GT_CCS_MODE_H__ */
+6
drivers/gpu/drm/i915/gt/intel_gt_regs.h
··· 1477 1477 #define ECOBITS_PPGTT_CACHE4B (0 << 8) 1478 1478 1479 1479 #define GEN12_RCU_MODE _MMIO(0x14800) 1480 + #define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1) 1480 1481 #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0) 1482 + 1483 + #define XEHP_CCS_MODE _MMIO(0x14804) 1484 + #define XEHP_CCS_MODE_CSLICE_MASK REG_GENMASK(2, 0) /* CCS0-3 + rsvd */ 1485 + #define XEHP_CCS_MODE_CSLICE_WIDTH ilog2(XEHP_CCS_MODE_CSLICE_MASK + 1) 1486 + #define XEHP_CCS_MODE_CSLICE(cslice, ccs) (ccs << (cslice * XEHP_CCS_MODE_CSLICE_WIDTH)) 1481 1487 1482 1488 #define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168) 1483 1489 #define CHV_FGT_DISABLE_SS0 (1 << 10)
+28 -2
drivers/gpu/drm/i915/gt/intel_workarounds.c
··· 10 10 #include "intel_engine_regs.h" 11 11 #include "intel_gpu_commands.h" 12 12 #include "intel_gt.h" 13 + #include "intel_gt_ccs_mode.h" 13 14 #include "intel_gt_mcr.h" 14 15 #include "intel_gt_print.h" 15 16 #include "intel_gt_regs.h" ··· 52 51 * registers belonging to BCS, VCS or VECS should be implemented in 53 52 * xcs_engine_wa_init(). Workarounds for registers not belonging to a specific 54 53 * engine's MMIO range but that are part of of the common RCS/CCS reset domain 55 - * should be implemented in general_render_compute_wa_init(). 54 + * should be implemented in general_render_compute_wa_init(). The settings 55 + * about the CCS load balancing should be added in ccs_engine_wa_mode(). 56 56 * 57 57 * - GT workarounds: the list of these WAs is applied whenever these registers 58 58 * revert to their default values: on GPU reset, suspend/resume [1]_, etc. ··· 2856 2854 wa_write_clr(wal, GEN8_GARBCNTL, GEN12_BUS_HASH_CTL_BIT_EXC); 2857 2855 } 2858 2856 2857 + static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct i915_wa_list *wal) 2858 + { 2859 + struct intel_gt *gt = engine->gt; 2860 + 2861 + if (!IS_DG2(gt->i915)) 2862 + return; 2863 + 2864 + /* 2865 + * Wa_14019159160: This workaround, along with others, leads to 2866 + * significant challenges in utilizing load balancing among the 2867 + * CCS slices. Consequently, an architectural decision has been 2868 + * made to completely disable automatic CCS load balancing. 2869 + */ 2870 + wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE); 2871 + 2872 + /* 2873 + * After having disabled automatic load balancing we need to 2874 + * assign all slices to a single CCS. We will call it CCS mode 1 2875 + */ 2876 + intel_gt_apply_ccs_mode(gt); 2877 + } 2878 + 2859 2879 /* 2860 2880 * The workarounds in this function apply to shared registers in 2861 2881 * the general render reset domain that aren't tied to a ··· 3028 3004 * to a single RCS/CCS engine's workaround list since 3029 3005 * they're reset as part of the general render domain reset. 3030 3006 */ 3031 - if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE) 3007 + if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE) { 3032 3008 general_render_compute_wa_init(engine, wal); 3009 + ccs_engine_wa_mode(engine, wal); 3010 + } 3033 3011 3034 3012 if (engine->class == COMPUTE_CLASS) 3035 3013 ccs_engine_wa_init(engine, wal);
+3 -3
drivers/gpu/drm/nouveau/nouveau_uvmm.c
··· 812 812 struct drm_gpuva_op_unmap *u = r->unmap; 813 813 struct nouveau_uvma *uvma = uvma_from_va(u->va); 814 814 u64 addr = uvma->va.va.addr; 815 - u64 range = uvma->va.va.range; 815 + u64 end = uvma->va.va.addr + uvma->va.va.range; 816 816 817 817 if (r->prev) 818 818 addr = r->prev->va.addr + r->prev->va.range; 819 819 820 820 if (r->next) 821 - range = r->next->va.addr - addr; 821 + end = r->next->va.addr; 822 822 823 - op_unmap_range(u, addr, range); 823 + op_unmap_range(u, addr, end - addr); 824 824 } 825 825 826 826 static int
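The nouveau change replaces a carried range with a carried end address, so that trimming the front of the VA for r->prev no longer overshoots the original end when r->next is absent. A small numeric sketch of the before/after arithmetic (the addresses are made up for illustration):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct va { uint64_t addr, range; };

/* Old computation: kept the original range even after advancing addr. */
static uint64_t old_unmap_range(struct va v, const struct va *prev, const struct va *next,
				uint64_t *addr_out)
{
	uint64_t addr = v.addr, range = v.range;

	if (prev)
		addr = prev->addr + prev->range;
	if (next)
		range = next->addr - addr;
	*addr_out = addr;
	return range;
}

/* Fixed computation: track the end address and derive the range last. */
static uint64_t new_unmap_range(struct va v, const struct va *prev, const struct va *next,
				uint64_t *addr_out)
{
	uint64_t addr = v.addr, end = v.addr + v.range;

	if (prev)
		addr = prev->addr + prev->range;
	if (next)
		end = next->addr;
	*addr_out = addr;
	return end - addr;
}

int main(void)
{
	/* Original VA [0x1000, +0x4000); a prev remap keeps [0x1000, +0x1000). */
	struct va v = { 0x1000, 0x4000 }, prev = { 0x1000, 0x1000 };
	uint64_t addr;

	/* Old math unmapped 0x4000 bytes starting at 0x2000: 0x1000 too many. */
	assert(old_unmap_range(v, &prev, NULL, &addr) == 0x4000 && addr == 0x2000);
	/* Fixed math unmaps exactly up to the original end at 0x5000. */
	assert(new_unmap_range(v, &prev, NULL, &addr) == 0x3000 && addr == 0x2000);
	printf("ok\n");
	return 0;
}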
+1 -1
drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c
··· 420 420 return ret; 421 421 } else { 422 422 ret = nvkm_memory_map(gr->attrib_cb, 0, chan->vmm, chan->attrib_cb, 423 - &args, sizeof(args));; 423 + &args, sizeof(args)); 424 424 if (ret) 425 425 return ret; 426 426 }
+3 -3
drivers/gpu/drm/panfrost/panfrost_gpu.c
··· 441 441 442 442 gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present); 443 443 ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_PWRTRANS_LO, 444 - val, !val, 1, 1000); 444 + val, !val, 1, 2000); 445 445 if (ret) 446 446 dev_err(pfdev->dev, "shader power transition timeout"); 447 447 448 448 gpu_write(pfdev, TILER_PWROFF_LO, pfdev->features.tiler_present); 449 449 ret = readl_relaxed_poll_timeout(pfdev->iomem + TILER_PWRTRANS_LO, 450 - val, !val, 1, 1000); 450 + val, !val, 1, 2000); 451 451 if (ret) 452 452 dev_err(pfdev->dev, "tiler power transition timeout"); 453 453 454 454 gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present); 455 455 ret = readl_poll_timeout(pfdev->iomem + L2_PWRTRANS_LO, 456 - val, !val, 0, 1000); 456 + val, !val, 0, 2000); 457 457 if (ret) 458 458 dev_err(pfdev->dev, "l2 power transition timeout"); 459 459 }
+10 -1
drivers/gpu/drm/xe/xe_device.c
··· 193 193 { 194 194 struct xe_device *xe = to_xe_device(dev); 195 195 196 + if (xe->preempt_fence_wq) 197 + destroy_workqueue(xe->preempt_fence_wq); 198 + 196 199 if (xe->ordered_wq) 197 200 destroy_workqueue(xe->ordered_wq); 198 201 ··· 261 258 INIT_LIST_HEAD(&xe->pinned.external_vram); 262 259 INIT_LIST_HEAD(&xe->pinned.evicted); 263 260 261 + xe->preempt_fence_wq = alloc_ordered_workqueue("xe-preempt-fence-wq", 0); 264 262 xe->ordered_wq = alloc_ordered_workqueue("xe-ordered-wq", 0); 265 263 xe->unordered_wq = alloc_workqueue("xe-unordered-wq", 0, 0); 266 - if (!xe->ordered_wq || !xe->unordered_wq) { 264 + if (!xe->ordered_wq || !xe->unordered_wq || 265 + !xe->preempt_fence_wq) { 266 + /* 267 + * Cleanup done in xe_device_destroy via 268 + * drmm_add_action_or_reset register above 269 + */ 267 270 drm_err(&xe->drm, "Failed to allocate xe workqueues\n"); 268 271 err = -ENOMEM; 269 272 goto err;
+3
drivers/gpu/drm/xe/xe_device_types.h
··· 363 363 /** @ufence_wq: user fence wait queue */ 364 364 wait_queue_head_t ufence_wq; 365 365 366 + /** @preempt_fence_wq: used to serialize preempt fences */ 367 + struct workqueue_struct *preempt_fence_wq; 368 + 366 369 /** @ordered_wq: used to serialize compute mode resume */ 367 370 struct workqueue_struct *ordered_wq; 368 371
+7 -72
drivers/gpu/drm/xe/xe_exec.c
··· 94 94 * Unlock all 95 95 */ 96 96 97 + /* 98 + * Add validation and rebinding to the drm_exec locking loop, since both can 99 + * trigger eviction which may require sleeping dma_resv locks. 100 + */ 97 101 static int xe_exec_fn(struct drm_gpuvm_exec *vm_exec) 98 102 { 99 103 struct xe_vm *vm = container_of(vm_exec->vm, struct xe_vm, gpuvm); 100 - struct drm_gem_object *obj; 101 - unsigned long index; 102 - int num_fences; 103 - int ret; 104 104 105 - ret = drm_gpuvm_validate(vm_exec->vm, &vm_exec->exec); 106 - if (ret) 107 - return ret; 108 - 109 - /* 110 - * 1 fence slot for the final submit, and 1 more for every per-tile for 111 - * GPU bind and 1 extra for CPU bind. Note that there are potentially 112 - * many vma per object/dma-resv, however the fence slot will just be 113 - * re-used, since they are largely the same timeline and the seqno 114 - * should be in order. In the case of CPU bind there is dummy fence used 115 - * for all CPU binds, so no need to have a per-tile slot for that. 116 - */ 117 - num_fences = 1 + 1 + vm->xe->info.tile_count; 118 - 119 - /* 120 - * We don't know upfront exactly how many fence slots we will need at 121 - * the start of the exec, since the TTM bo_validate above can consume 122 - * numerous fence slots. Also due to how the dma_resv_reserve_fences() 123 - * works it only ensures that at least that many fence slots are 124 - * available i.e if there are already 10 slots available and we reserve 125 - * two more, it can just noop without reserving anything. With this it 126 - * is quite possible that TTM steals some of the fence slots and then 127 - * when it comes time to do the vma binding and final exec stage we are 128 - * lacking enough fence slots, leading to some nasty BUG_ON() when 129 - * adding the fences. Hence just add our own fences here, after the 130 - * validate stage. 131 - */ 132 - drm_exec_for_each_locked_object(&vm_exec->exec, index, obj) { 133 - ret = dma_resv_reserve_fences(obj->resv, num_fences); 134 - if (ret) 135 - return ret; 136 - } 137 - 138 - return 0; 105 + /* The fence slot added here is intended for the exec sched job. */ 106 + return xe_vm_validate_rebind(vm, &vm_exec->exec, 1); 139 107 } 140 108 141 109 int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file) ··· 120 152 struct drm_exec *exec = &vm_exec.exec; 121 153 u32 i, num_syncs = 0, num_ufence = 0; 122 154 struct xe_sched_job *job; 123 - struct dma_fence *rebind_fence; 124 155 struct xe_vm *vm; 125 156 bool write_locked, skip_retry = false; 126 157 ktime_t end = 0; ··· 257 290 goto err_exec; 258 291 } 259 292 260 - /* 261 - * Rebind any invalidated userptr or evicted BOs in the VM, non-compute 262 - * VM mode only. 263 - */ 264 - rebind_fence = xe_vm_rebind(vm, false); 265 - if (IS_ERR(rebind_fence)) { 266 - err = PTR_ERR(rebind_fence); 267 - goto err_put_job; 268 - } 269 - 270 - /* 271 - * We store the rebind_fence in the VM so subsequent execs don't get 272 - * scheduled before the rebinds of userptrs / evicted BOs is complete. 
273 - */ 274 - if (rebind_fence) { 275 - dma_fence_put(vm->rebind_fence); 276 - vm->rebind_fence = rebind_fence; 277 - } 278 - if (vm->rebind_fence) { 279 - if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, 280 - &vm->rebind_fence->flags)) { 281 - dma_fence_put(vm->rebind_fence); 282 - vm->rebind_fence = NULL; 283 - } else { 284 - dma_fence_get(vm->rebind_fence); 285 - err = drm_sched_job_add_dependency(&job->drm, 286 - vm->rebind_fence); 287 - if (err) 288 - goto err_put_job; 289 - } 290 - } 291 - 292 - /* Wait behind munmap style rebinds */ 293 + /* Wait behind rebinds */ 293 294 if (!xe_vm_in_lr_mode(vm)) { 294 295 err = drm_sched_job_add_resv_dependencies(&job->drm, 295 296 xe_vm_resv(vm),
+5
drivers/gpu/drm/xe/xe_exec_queue_types.h
··· 148 148 const struct xe_ring_ops *ring_ops; 149 149 /** @entity: DRM sched entity for this exec queue (1 to 1 relationship) */ 150 150 struct drm_sched_entity *entity; 151 + /** 152 + * @tlb_flush_seqno: The seqno of the last rebind tlb flush performed 153 + * Protected by @vm's resv. Unused if @vm == NULL. 154 + */ 155 + u64 tlb_flush_seqno; 151 156 /** @lrc: logical ring context for this exec queue */ 152 157 struct xe_lrc lrc[]; 153 158 };
+1 -2
drivers/gpu/drm/xe/xe_gt_pagefault.c
··· 100 100 { 101 101 struct xe_bo *bo = xe_vma_bo(vma); 102 102 struct xe_vm *vm = xe_vma_vm(vma); 103 - unsigned int num_shared = 2; /* slots for bind + move */ 104 103 int err; 105 104 106 - err = xe_vm_prepare_vma(exec, vma, num_shared); 105 + err = xe_vm_lock_vma(exec, vma); 107 106 if (err) 108 107 return err; 109 108
-1
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
··· 61 61 INIT_LIST_HEAD(&gt->tlb_invalidation.pending_fences); 62 62 spin_lock_init(&gt->tlb_invalidation.pending_lock); 63 63 spin_lock_init(&gt->tlb_invalidation.lock); 64 - gt->tlb_invalidation.fence_context = dma_fence_context_alloc(1); 65 64 INIT_DELAYED_WORK(&gt->tlb_invalidation.fence_tdr, 66 65 xe_gt_tlb_fence_timeout); 67 66
-7
drivers/gpu/drm/xe/xe_gt_types.h
··· 177 177 * xe_gt_tlb_fence_timeout after the timeut interval is over. 178 178 */ 179 179 struct delayed_work fence_tdr; 180 - /** @tlb_invalidation.fence_context: context for TLB invalidation fences */ 181 - u64 fence_context; 182 - /** 183 - * @tlb_invalidation.fence_seqno: seqno to TLB invalidation fences, protected by 184 - * tlb_invalidation.lock 185 - */ 186 - u32 fence_seqno; 187 180 /** @tlb_invalidation.lock: protects TLB invalidation fences */ 188 181 spinlock_t lock; 189 182 } tlb_invalidation;
+1 -1
drivers/gpu/drm/xe/xe_preempt_fence.c
··· 49 49 struct xe_exec_queue *q = pfence->q; 50 50 51 51 pfence->error = q->ops->suspend(q); 52 - queue_work(system_unbound_wq, &pfence->preempt_work); 52 + queue_work(q->vm->xe->preempt_fence_wq, &pfence->preempt_work); 53 53 return true; 54 54 } 55 55
+20 -5
drivers/gpu/drm/xe/xe_pt.c
··· 1135 1135 spin_lock_irq(&gt->tlb_invalidation.lock); 1136 1136 dma_fence_init(&ifence->base.base, &invalidation_fence_ops, 1137 1137 &gt->tlb_invalidation.lock, 1138 - gt->tlb_invalidation.fence_context, 1139 - ++gt->tlb_invalidation.fence_seqno); 1138 + dma_fence_context_alloc(1), 1); 1140 1139 spin_unlock_irq(&gt->tlb_invalidation.lock); 1141 1140 1142 1141 INIT_LIST_HEAD(&ifence->base.link); ··· 1235 1236 err = xe_pt_prepare_bind(tile, vma, entries, &num_entries); 1236 1237 if (err) 1237 1238 goto err; 1239 + 1240 + err = dma_resv_reserve_fences(xe_vm_resv(vm), 1); 1241 + if (!err && !xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm) 1242 + err = dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1); 1243 + if (err) 1244 + goto err; 1245 + 1238 1246 xe_tile_assert(tile, num_entries <= ARRAY_SIZE(entries)); 1239 1247 1240 1248 xe_vm_dbg_print_entries(tile_to_xe(tile), entries, num_entries); ··· 1260 1254 * non-faulting LR, in particular on user-space batch buffer chaining, 1261 1255 * it needs to be done here. 1262 1256 */ 1263 - if ((rebind && !xe_vm_in_lr_mode(vm) && !vm->batch_invalidate_tlb) || 1264 - (!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) { 1257 + if ((!rebind && xe_vm_has_scratch(vm) && xe_vm_in_preempt_fence_mode(vm))) { 1265 1258 ifence = kzalloc(sizeof(*ifence), GFP_KERNEL); 1266 1259 if (!ifence) 1267 1260 return ERR_PTR(-ENOMEM); 1261 + } else if (rebind && !xe_vm_in_lr_mode(vm)) { 1262 + /* We bump also if batch_invalidate_tlb is true */ 1263 + vm->tlb_flush_seqno++; 1268 1264 } 1269 1265 1270 1266 rfence = kzalloc(sizeof(*rfence), GFP_KERNEL); ··· 1305 1297 } 1306 1298 1307 1299 /* add shared fence now for pagetable delayed destroy */ 1308 - dma_resv_add_fence(xe_vm_resv(vm), fence, !rebind && 1300 + dma_resv_add_fence(xe_vm_resv(vm), fence, rebind || 1309 1301 last_munmap_rebind ? 1310 1302 DMA_RESV_USAGE_KERNEL : 1311 1303 DMA_RESV_USAGE_BOOKKEEP); ··· 1584 1576 struct dma_fence *fence = NULL; 1585 1577 struct invalidation_fence *ifence; 1586 1578 struct xe_range_fence *rfence; 1579 + int err; 1587 1580 1588 1581 LLIST_HEAD(deferred); 1589 1582 ··· 1601 1592 xe_vm_dbg_print_entries(tile_to_xe(tile), entries, num_entries); 1602 1593 xe_pt_calc_rfence_interval(vma, &unbind_pt_update, entries, 1603 1594 num_entries); 1595 + 1596 + err = dma_resv_reserve_fences(xe_vm_resv(vm), 1); 1597 + if (!err && !xe_vma_has_no_bo(vma) && !xe_vma_bo(vma)->vm) 1598 + err = dma_resv_reserve_fences(xe_vma_bo(vma)->ttm.base.resv, 1); 1599 + if (err) 1600 + return ERR_PTR(err); 1604 1601 1605 1602 ifence = kzalloc(sizeof(*ifence), GFP_KERNEL); 1606 1603 if (!ifence)
+4 -7
drivers/gpu/drm/xe/xe_ring_ops.c
··· 219 219 { 220 220 u32 dw[MAX_JOB_SIZE_DW], i = 0; 221 221 u32 ppgtt_flag = get_ppgtt_flag(job); 222 - struct xe_vm *vm = job->q->vm; 223 222 struct xe_gt *gt = job->q->gt; 224 223 225 - if (vm && vm->batch_invalidate_tlb) { 224 + if (job->ring_ops_flush_tlb) { 226 225 dw[i++] = preparser_disable(true); 227 226 i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc), 228 227 seqno, true, dw, i); ··· 269 270 struct xe_gt *gt = job->q->gt; 270 271 struct xe_device *xe = gt_to_xe(gt); 271 272 bool decode = job->q->class == XE_ENGINE_CLASS_VIDEO_DECODE; 272 - struct xe_vm *vm = job->q->vm; 273 273 274 274 dw[i++] = preparser_disable(true); 275 275 ··· 280 282 i = emit_aux_table_inv(gt, VE0_AUX_INV, dw, i); 281 283 } 282 284 283 - if (vm && vm->batch_invalidate_tlb) 285 + if (job->ring_ops_flush_tlb) 284 286 i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc), 285 287 seqno, true, dw, i); 286 288 287 289 dw[i++] = preparser_disable(false); 288 290 289 - if (!vm || !vm->batch_invalidate_tlb) 291 + if (!job->ring_ops_flush_tlb) 290 292 i = emit_store_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc), 291 293 seqno, dw, i); 292 294 ··· 315 317 struct xe_gt *gt = job->q->gt; 316 318 struct xe_device *xe = gt_to_xe(gt); 317 319 bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK); 318 - struct xe_vm *vm = job->q->vm; 319 320 u32 mask_flags = 0; 320 321 321 322 dw[i++] = preparser_disable(true); ··· 324 327 mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS; 325 328 326 329 /* See __xe_pt_bind_vma() for a discussion on TLB invalidations. */ 327 - i = emit_pipe_invalidate(mask_flags, vm && vm->batch_invalidate_tlb, dw, i); 330 + i = emit_pipe_invalidate(mask_flags, job->ring_ops_flush_tlb, dw, i); 328 331 329 332 /* hsdes: 1809175790 */ 330 333 if (has_aux_ccs(xe))
+10
drivers/gpu/drm/xe/xe_sched_job.c
··· 250 250 251 251 void xe_sched_job_arm(struct xe_sched_job *job) 252 252 { 253 + struct xe_exec_queue *q = job->q; 254 + struct xe_vm *vm = q->vm; 255 + 256 + if (vm && !xe_sched_job_is_migration(q) && !xe_vm_in_lr_mode(vm) && 257 + (vm->batch_invalidate_tlb || vm->tlb_flush_seqno != q->tlb_flush_seqno)) { 258 + xe_vm_assert_held(vm); 259 + q->tlb_flush_seqno = vm->tlb_flush_seqno; 260 + job->ring_ops_flush_tlb = true; 261 + } 262 + 253 263 drm_sched_job_arm(&job->drm); 254 264 } 255 265
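Together with the xe_pt.c and xe_vm_types.h hunks, the TLB-flush decision reduces to comparing a per-VM sequence number, bumped on every rebind, against the last value the exec queue flushed at. Here is a stand-alone model of that check, assuming a single queue and ignoring the dma-resv locking that protects the real counters:

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct vm_state    { uint64_t tlb_flush_seqno; bool batch_invalidate_tlb; };
struct queue_state { uint64_t tlb_flush_seqno; };

/* Called when a rebind is issued for the VM (see __xe_pt_bind_vma()). */
static void vm_rebind(struct vm_state *vm)
{
	vm->tlb_flush_seqno++;
}

/* Mirrors the check in xe_sched_job_arm(): flush once per new seqno. */
static bool job_needs_tlb_flush(struct vm_state *vm, struct queue_state *q)
{
	if (vm->batch_invalidate_tlb || vm->tlb_flush_seqno != q->tlb_flush_seqno) {
		q->tlb_flush_seqno = vm->tlb_flush_seqno;
		return true;
	}
	return false;
}

int main(void)
{
	struct vm_state vm = { 0, false };
	struct queue_state q = { 0 };

	assert(!job_needs_tlb_flush(&vm, &q));	/* nothing rebound yet */
	vm_rebind(&vm);
	assert(job_needs_tlb_flush(&vm, &q));	/* first job after rebind flushes */
	assert(!job_needs_tlb_flush(&vm, &q));	/* later jobs on same seqno do not */
	printf("ok\n");
	return 0;
}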
+2
drivers/gpu/drm/xe/xe_sched_job_types.h
··· 39 39 } user_fence; 40 40 /** @migrate_flush_flags: Additional flush flags for migration jobs */ 41 41 u32 migrate_flush_flags; 42 + /** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */ 43 + bool ring_ops_flush_tlb; 42 44 /** @batch_addr: batch buffer address of job */ 43 45 u64 batch_addr[]; 44 46 };
+67 -43
drivers/gpu/drm/xe/xe_vm.c
··· 482 482 return 0; 483 483 } 484 484 485 + /** 486 + * xe_vm_validate_rebind() - Validate buffer objects and rebind vmas 487 + * @vm: The vm for which we are rebinding. 488 + * @exec: The struct drm_exec with the locked GEM objects. 489 + * @num_fences: The number of fences to reserve for the operation, not 490 + * including rebinds and validations. 491 + * 492 + * Validates all evicted gem objects and rebinds their vmas. Note that 493 + * rebindings may cause evictions and hence the validation-rebind 494 + * sequence is rerun until there are no more objects to validate. 495 + * 496 + * Return: 0 on success, negative error code on error. In particular, 497 + * may return -EINTR or -ERESTARTSYS if interrupted, and -EDEADLK if 498 + * the drm_exec transaction needs to be restarted. 499 + */ 500 + int xe_vm_validate_rebind(struct xe_vm *vm, struct drm_exec *exec, 501 + unsigned int num_fences) 502 + { 503 + struct drm_gem_object *obj; 504 + unsigned long index; 505 + int ret; 506 + 507 + do { 508 + ret = drm_gpuvm_validate(&vm->gpuvm, exec); 509 + if (ret) 510 + return ret; 511 + 512 + ret = xe_vm_rebind(vm, false); 513 + if (ret) 514 + return ret; 515 + } while (!list_empty(&vm->gpuvm.evict.list)); 516 + 517 + drm_exec_for_each_locked_object(exec, index, obj) { 518 + ret = dma_resv_reserve_fences(obj->resv, num_fences); 519 + if (ret) 520 + return ret; 521 + } 522 + 523 + return 0; 524 + } 525 + 485 526 static int xe_preempt_work_begin(struct drm_exec *exec, struct xe_vm *vm, 486 527 bool *done) 487 528 { 488 529 int err; 489 530 490 - /* 491 - * 1 fence for each preempt fence plus a fence for each tile from a 492 - * possible rebind 493 - */ 494 - err = drm_gpuvm_prepare_vm(&vm->gpuvm, exec, vm->preempt.num_exec_queues + 495 - vm->xe->info.tile_count); 531 + err = drm_gpuvm_prepare_vm(&vm->gpuvm, exec, 0); 496 532 if (err) 497 533 return err; 498 534 ··· 543 507 return 0; 544 508 } 545 509 546 - err = drm_gpuvm_prepare_objects(&vm->gpuvm, exec, vm->preempt.num_exec_queues); 510 + err = drm_gpuvm_prepare_objects(&vm->gpuvm, exec, 0); 547 511 if (err) 548 512 return err; 549 513 ··· 551 515 if (err) 552 516 return err; 553 517 554 - return drm_gpuvm_validate(&vm->gpuvm, exec); 518 + /* 519 + * Add validation and rebinding to the locking loop since both can 520 + * cause evictions which may require blocing dma_resv locks. 521 + * The fence reservation here is intended for the new preempt fences 522 + * we attach at the end of the rebind work. 
523 + */ 524 + return xe_vm_validate_rebind(vm, exec, vm->preempt.num_exec_queues); 555 525 } 556 526 557 527 static void preempt_rebind_work_func(struct work_struct *w) 558 528 { 559 529 struct xe_vm *vm = container_of(w, struct xe_vm, preempt.rebind_work); 560 530 struct drm_exec exec; 561 - struct dma_fence *rebind_fence; 562 531 unsigned int fence_count = 0; 563 532 LIST_HEAD(preempt_fences); 564 533 ktime_t end = 0; ··· 609 568 if (err) 610 569 goto out_unlock; 611 570 612 - rebind_fence = xe_vm_rebind(vm, true); 613 - if (IS_ERR(rebind_fence)) { 614 - err = PTR_ERR(rebind_fence); 571 + err = xe_vm_rebind(vm, true); 572 + if (err) 615 573 goto out_unlock; 616 - } 617 574 618 - if (rebind_fence) { 619 - dma_fence_wait(rebind_fence, false); 620 - dma_fence_put(rebind_fence); 621 - } 622 - 623 - /* Wait on munmap style VM unbinds */ 575 + /* Wait on rebinds and munmap style VM unbinds */ 624 576 wait = dma_resv_wait_timeout(xe_vm_resv(vm), 625 577 DMA_RESV_USAGE_KERNEL, 626 578 false, MAX_SCHEDULE_TIMEOUT); ··· 807 773 struct xe_sync_entry *syncs, u32 num_syncs, 808 774 bool first_op, bool last_op); 809 775 810 - struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker) 776 + int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker) 811 777 { 812 - struct dma_fence *fence = NULL; 778 + struct dma_fence *fence; 813 779 struct xe_vma *vma, *next; 814 780 815 781 lockdep_assert_held(&vm->lock); 816 782 if (xe_vm_in_lr_mode(vm) && !rebind_worker) 817 - return NULL; 783 + return 0; 818 784 819 785 xe_vm_assert_held(vm); 820 786 list_for_each_entry_safe(vma, next, &vm->rebind_list, ··· 822 788 xe_assert(vm->xe, vma->tile_present); 823 789 824 790 list_del_init(&vma->combined_links.rebind); 825 - dma_fence_put(fence); 826 791 if (rebind_worker) 827 792 trace_xe_vma_rebind_worker(vma); 828 793 else 829 794 trace_xe_vma_rebind_exec(vma); 830 795 fence = xe_vm_bind_vma(vma, NULL, NULL, 0, false, false); 831 796 if (IS_ERR(fence)) 832 - return fence; 797 + return PTR_ERR(fence); 798 + dma_fence_put(fence); 833 799 } 834 800 835 - return fence; 801 + return 0; 836 802 } 837 803 838 804 static void xe_vma_free(struct xe_vma *vma) ··· 1038 1004 } 1039 1005 1040 1006 /** 1041 - * xe_vm_prepare_vma() - drm_exec utility to lock a vma 1007 + * xe_vm_lock_vma() - drm_exec utility to lock a vma 1042 1008 * @exec: The drm_exec object we're currently locking for. 1043 1009 * @vma: The vma for witch we want to lock the vm resv and any attached 1044 1010 * object's resv. 1045 - * @num_shared: The number of dma-fence slots to pre-allocate in the 1046 - * objects' reservation objects. 1047 1011 * 1048 1012 * Return: 0 on success, negative error code on error. In particular 1049 1013 * may return -EDEADLK on WW transaction contention and -EINTR if 1050 1014 * an interruptible wait is terminated by a signal. 
1051 1015 */ 1052 - int xe_vm_prepare_vma(struct drm_exec *exec, struct xe_vma *vma, 1053 - unsigned int num_shared) 1016 + int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma) 1054 1017 { 1055 1018 struct xe_vm *vm = xe_vma_vm(vma); 1056 1019 struct xe_bo *bo = xe_vma_bo(vma); 1057 1020 int err; 1058 1021 1059 1022 XE_WARN_ON(!vm); 1060 - if (num_shared) 1061 - err = drm_exec_prepare_obj(exec, xe_vm_obj(vm), num_shared); 1062 - else 1063 - err = drm_exec_lock_obj(exec, xe_vm_obj(vm)); 1064 - if (!err && bo && !bo->vm) { 1065 - if (num_shared) 1066 - err = drm_exec_prepare_obj(exec, &bo->ttm.base, num_shared); 1067 - else 1068 - err = drm_exec_lock_obj(exec, &bo->ttm.base); 1069 - } 1023 + 1024 + err = drm_exec_lock_obj(exec, xe_vm_obj(vm)); 1025 + if (!err && bo && !bo->vm) 1026 + err = drm_exec_lock_obj(exec, &bo->ttm.base); 1070 1027 1071 1028 return err; 1072 1029 } ··· 1069 1044 1070 1045 drm_exec_init(&exec, 0, 0); 1071 1046 drm_exec_until_all_locked(&exec) { 1072 - err = xe_vm_prepare_vma(&exec, vma, 0); 1047 + err = xe_vm_lock_vma(&exec, vma); 1073 1048 drm_exec_retry_on_contention(&exec); 1074 1049 if (XE_WARN_ON(err)) 1075 1050 break; ··· 1614 1589 XE_WARN_ON(vm->pt_root[id]); 1615 1590 1616 1591 trace_xe_vm_free(vm); 1617 - dma_fence_put(vm->rebind_fence); 1618 1592 kfree(vm); 1619 1593 } 1620 1594 ··· 2536 2512 2537 2513 lockdep_assert_held_write(&vm->lock); 2538 2514 2539 - err = xe_vm_prepare_vma(exec, vma, 1); 2515 + err = xe_vm_lock_vma(exec, vma); 2540 2516 if (err) 2541 2517 return err; 2542 2518
+5 -3
drivers/gpu/drm/xe/xe_vm.h
··· 207 207 208 208 int xe_vm_userptr_check_repin(struct xe_vm *vm); 209 209 210 - struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker); 210 + int xe_vm_rebind(struct xe_vm *vm, bool rebind_worker); 211 211 212 212 int xe_vm_invalidate_vma(struct xe_vma *vma); 213 213 ··· 242 242 243 243 int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id); 244 244 245 - int xe_vm_prepare_vma(struct drm_exec *exec, struct xe_vma *vma, 246 - unsigned int num_shared); 245 + int xe_vm_lock_vma(struct drm_exec *exec, struct xe_vma *vma); 246 + 247 + int xe_vm_validate_rebind(struct xe_vm *vm, struct drm_exec *exec, 248 + unsigned int num_fences); 247 249 248 250 /** 249 251 * xe_vm_resv() - Return's the vm's reservation object
+5 -3
drivers/gpu/drm/xe/xe_vm_types.h
··· 177 177 */ 178 178 struct list_head rebind_list; 179 179 180 - /** @rebind_fence: rebind fence from execbuf */ 181 - struct dma_fence *rebind_fence; 182 - 183 180 /** 184 181 * @destroy_work: worker to destroy VM, needed as a dma_fence signaling 185 182 * from an irq context can be last put and the destroy needs to be able ··· 261 264 bool capture_once; 262 265 } error_capture; 263 266 267 + /** 268 + * @tlb_flush_seqno: Required TLB flush seqno for the next exec. 269 + * protected by the vm resv. 270 + */ 271 + u64 tlb_flush_seqno; 264 272 /** @batch_invalidate_tlb: Always invalidate TLB before batch start */ 265 273 bool batch_invalidate_tlb; 266 274 /** @xef: XE file handle for tracking this VM's drm client */
+1 -1
drivers/i2c/busses/i2c-pxa.c
··· 324 324 decode_bits(KERN_DEBUG "ISR", isr_bits, ARRAY_SIZE(isr_bits), val); 325 325 } 326 326 327 + #ifdef CONFIG_I2C_PXA_SLAVE 327 328 static const struct bits icr_bits[] = { 328 329 PXA_BIT(ICR_START, "START", NULL), 329 330 PXA_BIT(ICR_STOP, "STOP", NULL), ··· 343 342 PXA_BIT(ICR_UR, "UR", "ur"), 344 343 }; 345 344 346 - #ifdef CONFIG_I2C_PXA_SLAVE 347 345 static void decode_ICR(unsigned int val) 348 346 { 349 347 decode_bits(KERN_DEBUG "ICR", icr_bits, ARRAY_SIZE(icr_bits), val);
+3 -1
drivers/iommu/amd/init.c
··· 3228 3228 static void iommu_snp_enable(void) 3229 3229 { 3230 3230 #ifdef CONFIG_KVM_AMD_SEV 3231 - if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP)) 3231 + if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP)) 3232 3232 return; 3233 3233 /* 3234 3234 * The SNP support requires that IOMMU must be enabled, and is ··· 3236 3236 */ 3237 3237 if (no_iommu || iommu_default_passthrough()) { 3238 3238 pr_err("SNP: IOMMU disabled or configured in passthrough mode, SNP cannot be supported.\n"); 3239 + cc_platform_clear(CC_ATTR_HOST_SEV_SNP); 3239 3240 return; 3240 3241 } 3241 3242 3242 3243 amd_iommu_snp_en = check_feature(FEATURE_SNP); 3243 3244 if (!amd_iommu_snp_en) { 3244 3245 pr_err("SNP: IOMMU SNP feature not enabled, SNP cannot be supported.\n"); 3246 + cc_platform_clear(CC_ATTR_HOST_SEV_SNP); 3245 3247 return; 3246 3248 } 3247 3249
+1 -1
drivers/mtd/devices/block2mtd.c
··· 209 209 210 210 if (dev->bdev_file) { 211 211 invalidate_mapping_pages(dev->bdev_file->f_mapping, 0, -1); 212 - fput(dev->bdev_file); 212 + bdev_fput(dev->bdev_file); 213 213 } 214 214 215 215 kfree(dev);
+5 -1
drivers/net/dsa/mv88e6xxx/chip.c
··· 5503 5503 .family = MV88E6XXX_FAMILY_6250, 5504 5504 .name = "Marvell 88E6020", 5505 5505 .num_databases = 64, 5506 - .num_ports = 4, 5506 + /* Ports 2-4 are not routed to pins 5507 + * => usable ports 0, 1, 5, 6 5508 + */ 5509 + .num_ports = 7, 5507 5510 .num_internal_phys = 2, 5511 + .invalid_port_mask = BIT(2) | BIT(3) | BIT(4), 5508 5512 .max_vid = 4095, 5509 5513 .port_base_addr = 0x8, 5510 5514 .phy_base_addr = 0x0,
+1 -1
drivers/net/dsa/sja1105/sja1105_mdio.c
··· 94 94 return tmp & 0xffff; 95 95 } 96 96 97 - int sja1110_pcs_mdio_write_c45(struct mii_bus *bus, int phy, int reg, int mmd, 97 + int sja1110_pcs_mdio_write_c45(struct mii_bus *bus, int phy, int mmd, int reg, 98 98 u16 val) 99 99 { 100 100 struct sja1105_mdio_private *mdio_priv = bus->priv;
+12 -4
drivers/net/ethernet/broadcom/genet/bcmgenet.c
··· 3280 3280 } 3281 3281 3282 3282 /* Returns a reusable dma control register value */ 3283 - static u32 bcmgenet_dma_disable(struct bcmgenet_priv *priv) 3283 + static u32 bcmgenet_dma_disable(struct bcmgenet_priv *priv, bool flush_rx) 3284 3284 { 3285 3285 unsigned int i; 3286 3286 u32 reg; ··· 3304 3304 bcmgenet_umac_writel(priv, 1, UMAC_TX_FLUSH); 3305 3305 udelay(10); 3306 3306 bcmgenet_umac_writel(priv, 0, UMAC_TX_FLUSH); 3307 + 3308 + if (flush_rx) { 3309 + reg = bcmgenet_rbuf_ctrl_get(priv); 3310 + bcmgenet_rbuf_ctrl_set(priv, reg | BIT(0)); 3311 + udelay(10); 3312 + bcmgenet_rbuf_ctrl_set(priv, reg); 3313 + udelay(10); 3314 + } 3307 3315 3308 3316 return dma_ctrl; 3309 3317 } ··· 3376 3368 3377 3369 bcmgenet_set_hw_addr(priv, dev->dev_addr); 3378 3370 3379 - /* Disable RX/TX DMA and flush TX queues */ 3380 - dma_ctrl = bcmgenet_dma_disable(priv); 3371 + /* Disable RX/TX DMA and flush TX and RX queues */ 3372 + dma_ctrl = bcmgenet_dma_disable(priv, true); 3381 3373 3382 3374 /* Reinitialize TDMA and RDMA and SW housekeeping */ 3383 3375 ret = bcmgenet_init_dma(priv); ··· 4243 4235 bcmgenet_hfb_create_rxnfc_filter(priv, rule); 4244 4236 4245 4237 /* Disable RX/TX DMA and flush TX queues */ 4246 - dma_ctrl = bcmgenet_dma_disable(priv); 4238 + dma_ctrl = bcmgenet_dma_disable(priv, false); 4247 4239 4248 4240 /* Reinitialize TDMA and RDMA and SW housekeeping */ 4249 4241 ret = bcmgenet_init_dma(priv);
+9 -2
drivers/net/ethernet/freescale/fec_main.c
··· 2454 2454 fep->link = 0; 2455 2455 fep->full_duplex = 0; 2456 2456 2457 - phy_dev->mac_managed_pm = true; 2458 - 2459 2457 phy_attached_info(phy_dev); 2460 2458 2461 2459 return 0; ··· 2465 2467 struct net_device *ndev = platform_get_drvdata(pdev); 2466 2468 struct fec_enet_private *fep = netdev_priv(ndev); 2467 2469 bool suppress_preamble = false; 2470 + struct phy_device *phydev; 2468 2471 struct device_node *node; 2469 2472 int err = -ENXIO; 2470 2473 u32 mii_speed, holdtime; 2471 2474 u32 bus_freq; 2475 + int addr; 2472 2476 2473 2477 /* 2474 2478 * The i.MX28 dual fec interfaces are not equal. ··· 2583 2583 if (err) 2584 2584 goto err_out_free_mdiobus; 2585 2585 of_node_put(node); 2586 + 2587 + /* find all the PHY devices on the bus and set mac_managed_pm to true */ 2588 + for (addr = 0; addr < PHY_MAX_ADDR; addr++) { 2589 + phydev = mdiobus_get_phy(fep->mii_bus, addr); 2590 + if (phydev) 2591 + phydev->mac_managed_pm = true; 2592 + } 2586 2593 2587 2594 mii_cnt++; 2588 2595
+2
drivers/net/ethernet/intel/e1000e/hw.h
··· 628 628 u32 id; 629 629 u32 reset_delay_us; /* in usec */ 630 630 u32 revision; 631 + u32 retry_count; 631 632 632 633 enum e1000_media_type media_type; 633 634 ··· 645 644 bool polarity_correction; 646 645 bool speed_downgraded; 647 646 bool autoneg_wait_to_complete; 647 + bool retry_enabled; 648 648 }; 649 649 650 650 struct e1000_nvm_info {
+26 -12
drivers/net/ethernet/intel/e1000e/ich8lan.c
··· 222 222 if (hw->mac.type >= e1000_pch_lpt) { 223 223 /* Only unforce SMBus if ME is not active */ 224 224 if (!(er32(FWSM) & E1000_ICH_FWSM_FW_VALID)) { 225 + /* Switching PHY interface always returns MDI error 226 + * so disable retry mechanism to avoid wasting time 227 + */ 228 + e1000e_disable_phy_retry(hw); 229 + 225 230 /* Unforce SMBus mode in PHY */ 226 231 e1e_rphy_locked(hw, CV_SMB_CTRL, &phy_reg); 227 232 phy_reg &= ~CV_SMB_CTRL_FORCE_SMBUS; 228 233 e1e_wphy_locked(hw, CV_SMB_CTRL, phy_reg); 234 + 235 + e1000e_enable_phy_retry(hw); 229 236 230 237 /* Unforce SMBus mode in MAC */ 231 238 mac_reg = er32(CTRL_EXT); ··· 317 310 goto out; 318 311 } 319 312 313 + /* There is no guarantee that the PHY is accessible at this time 314 + * so disable retry mechanism to avoid wasting time 315 + */ 316 + e1000e_disable_phy_retry(hw); 317 + 320 318 /* The MAC-PHY interconnect may be in SMBus mode. If the PHY is 321 319 * inaccessible and resetting the PHY is not blocked, toggle the 322 320 * LANPHYPC Value bit to force the interconnect to PCIe mode. ··· 392 380 break; 393 381 } 394 382 383 + e1000e_enable_phy_retry(hw); 384 + 395 385 hw->phy.ops.release(hw); 396 386 if (!ret_val) { 397 387 ··· 462 448 phy->autoneg_mask = AUTONEG_ADVERTISE_SPEED_DEFAULT; 463 449 464 450 phy->id = e1000_phy_unknown; 451 + 452 + if (hw->mac.type == e1000_pch_mtp) { 453 + phy->retry_count = 2; 454 + e1000e_enable_phy_retry(hw); 455 + } 465 456 466 457 ret_val = e1000_init_phy_workarounds_pchlan(hw); 467 458 if (ret_val) ··· 1165 1146 if (ret_val) 1166 1147 goto out; 1167 1148 1168 - /* Force SMBus mode in PHY */ 1169 - ret_val = e1000_read_phy_reg_hv_locked(hw, CV_SMB_CTRL, &phy_reg); 1170 - if (ret_val) 1171 - goto release; 1172 - phy_reg |= CV_SMB_CTRL_FORCE_SMBUS; 1173 - e1000_write_phy_reg_hv_locked(hw, CV_SMB_CTRL, phy_reg); 1174 - 1175 - /* Force SMBus mode in MAC */ 1176 - mac_reg = er32(CTRL_EXT); 1177 - mac_reg |= E1000_CTRL_EXT_FORCE_SMBUS; 1178 - ew32(CTRL_EXT, mac_reg); 1179 - 1180 1149 /* Si workaround for ULP entry flow on i127/rev6 h/w. Enable 1181 1150 * LPLU and disable Gig speed when entering ULP 1182 1151 */ ··· 1320 1313 /* Toggle LANPHYPC Value bit */ 1321 1314 e1000_toggle_lanphypc_pch_lpt(hw); 1322 1315 1316 + /* Switching PHY interface always returns MDI error 1317 + * so disable retry mechanism to avoid wasting time 1318 + */ 1319 + e1000e_disable_phy_retry(hw); 1320 + 1323 1321 /* Unforce SMBus mode in PHY */ 1324 1322 ret_val = e1000_read_phy_reg_hv_locked(hw, CV_SMB_CTRL, &phy_reg); 1325 1323 if (ret_val) { ··· 1344 1332 } 1345 1333 phy_reg &= ~CV_SMB_CTRL_FORCE_SMBUS; 1346 1334 e1000_write_phy_reg_hv_locked(hw, CV_SMB_CTRL, phy_reg); 1335 + 1336 + e1000e_enable_phy_retry(hw); 1347 1337 1348 1338 /* Unforce SMBus mode in MAC */ 1349 1339 mac_reg = er32(CTRL_EXT);
+18
drivers/net/ethernet/intel/e1000e/netdev.c
··· 6623 6623 struct e1000_hw *hw = &adapter->hw; 6624 6624 u32 ctrl, ctrl_ext, rctl, status, wufc; 6625 6625 int retval = 0; 6626 + u16 smb_ctrl; 6626 6627 6627 6628 /* Runtime suspend should only enable wakeup for link changes */ 6628 6629 if (runtime) ··· 6697 6696 if (retval) 6698 6697 return retval; 6699 6698 } 6699 + 6700 + /* Force SMBUS to allow WOL */ 6701 + /* Switching PHY interface always returns MDI error 6702 + * so disable retry mechanism to avoid wasting time 6703 + */ 6704 + e1000e_disable_phy_retry(hw); 6705 + 6706 + e1e_rphy(hw, CV_SMB_CTRL, &smb_ctrl); 6707 + smb_ctrl |= CV_SMB_CTRL_FORCE_SMBUS; 6708 + e1e_wphy(hw, CV_SMB_CTRL, smb_ctrl); 6709 + 6710 + e1000e_enable_phy_retry(hw); 6711 + 6712 + /* Force SMBus mode in MAC */ 6713 + ctrl_ext = er32(CTRL_EXT); 6714 + ctrl_ext |= E1000_CTRL_EXT_FORCE_SMBUS; 6715 + ew32(CTRL_EXT, ctrl_ext); 6700 6716 } 6701 6717 6702 6718 /* Ensure that the appropriate bits are set in LPI_CTRL
+114 -70
drivers/net/ethernet/intel/e1000e/phy.c
··· 107 107 return e1e_wphy(hw, M88E1000_PHY_GEN_CONTROL, 0); 108 108 } 109 109 110 + void e1000e_disable_phy_retry(struct e1000_hw *hw) 111 + { 112 + hw->phy.retry_enabled = false; 113 + } 114 + 115 + void e1000e_enable_phy_retry(struct e1000_hw *hw) 116 + { 117 + hw->phy.retry_enabled = true; 118 + } 119 + 110 120 /** 111 121 * e1000e_read_phy_reg_mdic - Read MDI control register 112 122 * @hw: pointer to the HW structure ··· 128 118 **/ 129 119 s32 e1000e_read_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 *data) 130 120 { 121 + u32 i, mdic = 0, retry_counter, retry_max; 131 122 struct e1000_phy_info *phy = &hw->phy; 132 - u32 i, mdic = 0; 123 + bool success; 133 124 134 125 if (offset > MAX_PHY_REG_ADDRESS) { 135 126 e_dbg("PHY Address %d is out of range\n", offset); 136 127 return -E1000_ERR_PARAM; 137 128 } 138 129 130 + retry_max = phy->retry_enabled ? phy->retry_count : 0; 131 + 139 132 /* Set up Op-code, Phy Address, and register offset in the MDI 140 133 * Control register. The MAC will take care of interfacing with the 141 134 * PHY to retrieve the desired data. 142 135 */ 143 - mdic = ((offset << E1000_MDIC_REG_SHIFT) | 144 - (phy->addr << E1000_MDIC_PHY_SHIFT) | 145 - (E1000_MDIC_OP_READ)); 136 + for (retry_counter = 0; retry_counter <= retry_max; retry_counter++) { 137 + success = true; 146 138 147 - ew32(MDIC, mdic); 139 + mdic = ((offset << E1000_MDIC_REG_SHIFT) | 140 + (phy->addr << E1000_MDIC_PHY_SHIFT) | 141 + (E1000_MDIC_OP_READ)); 148 142 149 - /* Poll the ready bit to see if the MDI read completed 150 - * Increasing the time out as testing showed failures with 151 - * the lower time out 152 - */ 153 - for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) { 154 - udelay(50); 155 - mdic = er32(MDIC); 156 - if (mdic & E1000_MDIC_READY) 157 - break; 158 - } 159 - if (!(mdic & E1000_MDIC_READY)) { 160 - e_dbg("MDI Read PHY Reg Address %d did not complete\n", offset); 161 - return -E1000_ERR_PHY; 162 - } 163 - if (mdic & E1000_MDIC_ERROR) { 164 - e_dbg("MDI Read PHY Reg Address %d Error\n", offset); 165 - return -E1000_ERR_PHY; 166 - } 167 - if (FIELD_GET(E1000_MDIC_REG_MASK, mdic) != offset) { 168 - e_dbg("MDI Read offset error - requested %d, returned %d\n", 169 - offset, FIELD_GET(E1000_MDIC_REG_MASK, mdic)); 170 - return -E1000_ERR_PHY; 171 - } 172 - *data = (u16)mdic; 143 + ew32(MDIC, mdic); 173 144 174 - /* Allow some time after each MDIC transaction to avoid 175 - * reading duplicate data in the next MDIC transaction. 176 - */ 177 - if (hw->mac.type == e1000_pch2lan) 178 - udelay(100); 179 - return 0; 145 + /* Poll the ready bit to see if the MDI read completed 146 + * Increasing the time out as testing showed failures with 147 + * the lower time out 148 + */ 149 + for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) { 150 + usleep_range(50, 60); 151 + mdic = er32(MDIC); 152 + if (mdic & E1000_MDIC_READY) 153 + break; 154 + } 155 + if (!(mdic & E1000_MDIC_READY)) { 156 + e_dbg("MDI Read PHY Reg Address %d did not complete\n", 157 + offset); 158 + success = false; 159 + } 160 + if (mdic & E1000_MDIC_ERROR) { 161 + e_dbg("MDI Read PHY Reg Address %d Error\n", offset); 162 + success = false; 163 + } 164 + if (FIELD_GET(E1000_MDIC_REG_MASK, mdic) != offset) { 165 + e_dbg("MDI Read offset error - requested %d, returned %d\n", 166 + offset, FIELD_GET(E1000_MDIC_REG_MASK, mdic)); 167 + success = false; 168 + } 169 + 170 + /* Allow some time after each MDIC transaction to avoid 171 + * reading duplicate data in the next MDIC transaction. 
172 + */ 173 + if (hw->mac.type == e1000_pch2lan) 174 + usleep_range(100, 150); 175 + 176 + if (success) { 177 + *data = (u16)mdic; 178 + return 0; 179 + } 180 + 181 + if (retry_counter != retry_max) { 182 + e_dbg("Perform retry on PHY transaction...\n"); 183 + mdelay(10); 184 + } 185 + } 186 + 187 + return -E1000_ERR_PHY; 180 188 } 181 189 182 190 /** ··· 207 179 **/ 208 180 s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 data) 209 181 { 182 + u32 i, mdic = 0, retry_counter, retry_max; 210 183 struct e1000_phy_info *phy = &hw->phy; 211 - u32 i, mdic = 0; 184 + bool success; 212 185 213 186 if (offset > MAX_PHY_REG_ADDRESS) { 214 187 e_dbg("PHY Address %d is out of range\n", offset); 215 188 return -E1000_ERR_PARAM; 216 189 } 217 190 191 + retry_max = phy->retry_enabled ? phy->retry_count : 0; 192 + 218 193 /* Set up Op-code, Phy Address, and register offset in the MDI 219 194 * Control register. The MAC will take care of interfacing with the 220 195 * PHY to retrieve the desired data. 221 196 */ 222 - mdic = (((u32)data) | 223 - (offset << E1000_MDIC_REG_SHIFT) | 224 - (phy->addr << E1000_MDIC_PHY_SHIFT) | 225 - (E1000_MDIC_OP_WRITE)); 197 + for (retry_counter = 0; retry_counter <= retry_max; retry_counter++) { 198 + success = true; 226 199 227 - ew32(MDIC, mdic); 200 + mdic = (((u32)data) | 201 + (offset << E1000_MDIC_REG_SHIFT) | 202 + (phy->addr << E1000_MDIC_PHY_SHIFT) | 203 + (E1000_MDIC_OP_WRITE)); 228 204 229 - /* Poll the ready bit to see if the MDI read completed 230 - * Increasing the time out as testing showed failures with 231 - * the lower time out 232 - */ 233 - for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) { 234 - udelay(50); 235 - mdic = er32(MDIC); 236 - if (mdic & E1000_MDIC_READY) 237 - break; 238 - } 239 - if (!(mdic & E1000_MDIC_READY)) { 240 - e_dbg("MDI Write PHY Reg Address %d did not complete\n", offset); 241 - return -E1000_ERR_PHY; 242 - } 243 - if (mdic & E1000_MDIC_ERROR) { 244 - e_dbg("MDI Write PHY Red Address %d Error\n", offset); 245 - return -E1000_ERR_PHY; 246 - } 247 - if (FIELD_GET(E1000_MDIC_REG_MASK, mdic) != offset) { 248 - e_dbg("MDI Write offset error - requested %d, returned %d\n", 249 - offset, FIELD_GET(E1000_MDIC_REG_MASK, mdic)); 250 - return -E1000_ERR_PHY; 205 + ew32(MDIC, mdic); 206 + 207 + /* Poll the ready bit to see if the MDI read completed 208 + * Increasing the time out as testing showed failures with 209 + * the lower time out 210 + */ 211 + for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) { 212 + usleep_range(50, 60); 213 + mdic = er32(MDIC); 214 + if (mdic & E1000_MDIC_READY) 215 + break; 216 + } 217 + if (!(mdic & E1000_MDIC_READY)) { 218 + e_dbg("MDI Write PHY Reg Address %d did not complete\n", 219 + offset); 220 + success = false; 221 + } 222 + if (mdic & E1000_MDIC_ERROR) { 223 + e_dbg("MDI Write PHY Reg Address %d Error\n", offset); 224 + success = false; 225 + } 226 + if (FIELD_GET(E1000_MDIC_REG_MASK, mdic) != offset) { 227 + e_dbg("MDI Write offset error - requested %d, returned %d\n", 228 + offset, FIELD_GET(E1000_MDIC_REG_MASK, mdic)); 229 + success = false; 230 + } 231 + 232 + /* Allow some time after each MDIC transaction to avoid 233 + * reading duplicate data in the next MDIC transaction. 
234 + */ 235 + if (hw->mac.type == e1000_pch2lan) 236 + usleep_range(100, 150); 237 + 238 + if (success) 239 + return 0; 240 + 241 + if (retry_counter != retry_max) { 242 + e_dbg("Perform retry on PHY transaction...\n"); 243 + mdelay(10); 244 + } 251 245 } 252 246 253 - /* Allow some time after each MDIC transaction to avoid 254 - * reading duplicate data in the next MDIC transaction. 255 - */ 256 - if (hw->mac.type == e1000_pch2lan) 257 - udelay(100); 258 - 259 - return 0; 247 + return -E1000_ERR_PHY; 260 248 } 261 249 262 250 /**
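The MDIC helpers now retry a failed transaction up to phy->retry_count extra times, but only while retries are enabled; the ich8lan.c and netdev.c hunks switch retries off around SMBus/MDI interface toggles where errors are expected. A tiny model of that control flow, using a made-up flaky transaction that succeeds on its second attempt:

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct phy { bool retry_enabled; unsigned int retry_count; };

/* Pretend MDIC access that fails the first 'fail_first' attempts. */
static bool fake_mdic(unsigned int attempt, unsigned int fail_first)
{
	return attempt >= fail_first;
}

/* Mirrors the retry structure of e1000e_read/write_phy_reg_mdic(). */
static int mdic_with_retry(const struct phy *phy, unsigned int fail_first,
			   unsigned int *attempts)
{
	unsigned int retry_max = phy->retry_enabled ? phy->retry_count : 0;
	unsigned int retry_counter;

	for (retry_counter = 0; retry_counter <= retry_max; retry_counter++) {
		(*attempts)++;
		if (fake_mdic(retry_counter, fail_first))
			return 0;	/* success */
		/* the driver inserts an mdelay(10) here before retrying */
	}
	return -1;			/* -E1000_ERR_PHY */
}

int main(void)
{
	struct phy mtp = { .retry_enabled = true, .retry_count = 2 };	/* pch_mtp setup */
	struct phy off = { .retry_enabled = false, .retry_count = 2 };
	unsigned int n = 0;

	assert(mdic_with_retry(&mtp, 1, &n) == 0 && n == 2);	/* succeeds on retry */
	n = 0;
	assert(mdic_with_retry(&off, 1, &n) == -1 && n == 1);	/* no retries when disabled */
	printf("ok\n");
	return 0;
}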
+2
drivers/net/ethernet/intel/e1000e/phy.h
··· 51 51 s32 e1000e_write_phy_reg_bm2(struct e1000_hw *hw, u32 offset, u16 data); 52 52 void e1000_power_up_phy_copper(struct e1000_hw *hw); 53 53 void e1000_power_down_phy_copper(struct e1000_hw *hw); 54 + void e1000e_disable_phy_retry(struct e1000_hw *hw); 55 + void e1000e_enable_phy_retry(struct e1000_hw *hw); 54 56 s32 e1000e_read_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 *data); 55 57 s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 data); 56 58 s32 e1000_read_phy_reg_hv(struct e1000_hw *hw, u32 offset, u16 *data);
+1
drivers/net/ethernet/intel/i40e/i40e.h
··· 955 955 struct rcu_head rcu; /* to avoid race with update stats on free */ 956 956 char name[I40E_INT_NAME_STR_LEN]; 957 957 bool arm_wb_state; 958 + bool in_busy_poll; 958 959 int irq_num; /* IRQ assigned to this q_vector */ 959 960 } ____cacheline_internodealigned_in_smp; 960 961
+11 -2
drivers/net/ethernet/intel/i40e/i40e_main.c
··· 1253 1253 int bkt; 1254 1254 int cnt = 0; 1255 1255 1256 - hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist) 1257 - ++cnt; 1256 + hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist) { 1257 + if (f->state == I40E_FILTER_NEW || 1258 + f->state == I40E_FILTER_ACTIVE) 1259 + ++cnt; 1260 + } 1258 1261 1259 1262 return cnt; 1260 1263 } ··· 3913 3910 wr32(hw, I40E_PFINT_ITRN(I40E_TX_ITR, vector - 1), 3914 3911 q_vector->tx.target_itr >> 1); 3915 3912 q_vector->tx.current_itr = q_vector->tx.target_itr; 3913 + 3914 + /* Set ITR for software interrupts triggered after exiting 3915 + * busy-loop polling. 3916 + */ 3917 + wr32(hw, I40E_PFINT_ITRN(I40E_SW_ITR, vector - 1), 3918 + I40E_ITR_20K); 3916 3919 3917 3920 wr32(hw, I40E_PFINT_RATEN(vector - 1), 3918 3921 i40e_intrl_usec_to_reg(vsi->int_rate_limit));
+3
drivers/net/ethernet/intel/i40e/i40e_register.h
··· 333 333 #define I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT 3 334 334 #define I40E_PFINT_DYN_CTLN_ITR_INDX_MASK I40E_MASK(0x3, I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT) 335 335 #define I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT 5 336 + #define I40E_PFINT_DYN_CTLN_INTERVAL_MASK I40E_MASK(0xFFF, I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT) 336 337 #define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_SHIFT 24 337 338 #define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_MASK I40E_MASK(0x1, I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_SHIFT) 339 + #define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_SHIFT 25 340 + #define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_MASK I40E_MASK(0x3, I40E_PFINT_DYN_CTLN_SW_ITR_INDX_SHIFT) 338 341 #define I40E_PFINT_ICR0 0x00038780 /* Reset: CORER */ 339 342 #define I40E_PFINT_ICR0_INTEVENT_SHIFT 0 340 343 #define I40E_PFINT_ICR0_INTEVENT_MASK I40E_MASK(0x1, I40E_PFINT_ICR0_INTEVENT_SHIFT)
+61 -21
drivers/net/ethernet/intel/i40e/i40e_txrx.c
··· 2630 2630 return failure ? budget : (int)total_rx_packets; 2631 2631 } 2632 2632 2633 - static inline u32 i40e_buildreg_itr(const int type, u16 itr) 2633 + /** 2634 + * i40e_buildreg_itr - build a value for writing to I40E_PFINT_DYN_CTLN register 2635 + * @itr_idx: interrupt throttling index 2636 + * @interval: interrupt throttling interval value in usecs 2637 + * @force_swint: force software interrupt 2638 + * 2639 + * The function builds a value for I40E_PFINT_DYN_CTLN register that 2640 + * is used to update interrupt throttling interval for specified ITR index 2641 + * and optionally enforces a software interrupt. If the @itr_idx is equal 2642 + * to I40E_ITR_NONE then no interval change is applied and only @force_swint 2643 + * parameter is taken into account. If the interval change and enforced 2644 + * software interrupt are not requested then the built value just enables 2645 + * appropriate vector interrupt. 2646 + **/ 2647 + static u32 i40e_buildreg_itr(enum i40e_dyn_idx itr_idx, u16 interval, 2648 + bool force_swint) 2634 2649 { 2635 2650 u32 val; 2636 2651 ··· 2659 2644 * an event in the PBA anyway so we need to rely on the automask 2660 2645 * to hold pending events for us until the interrupt is re-enabled 2661 2646 * 2662 - * The itr value is reported in microseconds, and the register 2663 - * value is recorded in 2 microsecond units. For this reason we 2664 - * only need to shift by the interval shift - 1 instead of the 2665 - * full value. 2647 + * We have to shift the given value as it is reported in microseconds 2648 + * and the register value is recorded in 2 microsecond units. 2666 2649 */ 2667 - itr &= I40E_ITR_MASK; 2650 + interval >>= 1; 2668 2651 2652 + /* 1. Enable vector interrupt 2653 + * 2. Update the interval for the specified ITR index 2654 + * (I40E_ITR_NONE in the register is used to indicate that 2655 + * no interval update is requested) 2656 + */ 2669 2657 val = I40E_PFINT_DYN_CTLN_INTENA_MASK | 2670 - (type << I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT) | 2671 - (itr << (I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT - 1)); 2658 + FIELD_PREP(I40E_PFINT_DYN_CTLN_ITR_INDX_MASK, itr_idx) | 2659 + FIELD_PREP(I40E_PFINT_DYN_CTLN_INTERVAL_MASK, interval); 2660 + 2661 + /* 3. Enforce software interrupt trigger if requested 2662 + * (These software interrupts rate is limited by ITR2 that is 2663 + * set to 20K interrupts per second) 2664 + */ 2665 + if (force_swint) 2666 + val |= I40E_PFINT_DYN_CTLN_SWINT_TRIG_MASK | 2667 + I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_MASK | 2668 + FIELD_PREP(I40E_PFINT_DYN_CTLN_SW_ITR_INDX_MASK, 2669 + I40E_SW_ITR); 2672 2670 2673 2671 return val; 2674 2672 } 2675 - 2676 - /* a small macro to shorten up some long lines */ 2677 - #define INTREG I40E_PFINT_DYN_CTLN 2678 2673 2679 2674 /* The act of updating the ITR will cause it to immediately trigger. 
In order 2680 2675 * to prevent this from throwing off adaptive update statistics we defer the ··· 2704 2679 static inline void i40e_update_enable_itr(struct i40e_vsi *vsi, 2705 2680 struct i40e_q_vector *q_vector) 2706 2681 { 2682 + enum i40e_dyn_idx itr_idx = I40E_ITR_NONE; 2707 2683 struct i40e_hw *hw = &vsi->back->hw; 2708 - u32 intval; 2684 + u16 interval = 0; 2685 + u32 itr_val; 2709 2686 2710 2687 /* If we don't have MSIX, then we only need to re-enable icr0 */ 2711 2688 if (!test_bit(I40E_FLAG_MSIX_ENA, vsi->back->flags)) { ··· 2729 2702 */ 2730 2703 if (q_vector->rx.target_itr < q_vector->rx.current_itr) { 2731 2704 /* Rx ITR needs to be reduced, this is highest priority */ 2732 - intval = i40e_buildreg_itr(I40E_RX_ITR, 2733 - q_vector->rx.target_itr); 2705 + itr_idx = I40E_RX_ITR; 2706 + interval = q_vector->rx.target_itr; 2734 2707 q_vector->rx.current_itr = q_vector->rx.target_itr; 2735 2708 q_vector->itr_countdown = ITR_COUNTDOWN_START; 2736 2709 } else if ((q_vector->tx.target_itr < q_vector->tx.current_itr) || ··· 2739 2712 /* Tx ITR needs to be reduced, this is second priority 2740 2713 * Tx ITR needs to be increased more than Rx, fourth priority 2741 2714 */ 2742 - intval = i40e_buildreg_itr(I40E_TX_ITR, 2743 - q_vector->tx.target_itr); 2715 + itr_idx = I40E_TX_ITR; 2716 + interval = q_vector->tx.target_itr; 2744 2717 q_vector->tx.current_itr = q_vector->tx.target_itr; 2745 2718 q_vector->itr_countdown = ITR_COUNTDOWN_START; 2746 2719 } else if (q_vector->rx.current_itr != q_vector->rx.target_itr) { 2747 2720 /* Rx ITR needs to be increased, third priority */ 2748 - intval = i40e_buildreg_itr(I40E_RX_ITR, 2749 - q_vector->rx.target_itr); 2721 + itr_idx = I40E_RX_ITR; 2722 + interval = q_vector->rx.target_itr; 2750 2723 q_vector->rx.current_itr = q_vector->rx.target_itr; 2751 2724 q_vector->itr_countdown = ITR_COUNTDOWN_START; 2752 2725 } else { 2753 2726 /* No ITR update, lowest priority */ 2754 - intval = i40e_buildreg_itr(I40E_ITR_NONE, 0); 2755 2727 if (q_vector->itr_countdown) 2756 2728 q_vector->itr_countdown--; 2757 2729 } 2758 2730 2759 - if (!test_bit(__I40E_VSI_DOWN, vsi->state)) 2760 - wr32(hw, INTREG(q_vector->reg_idx), intval); 2731 + /* Do not update interrupt control register if VSI is down */ 2732 + if (test_bit(__I40E_VSI_DOWN, vsi->state)) 2733 + return; 2734 + 2735 + /* Update ITR interval if necessary and enforce software interrupt 2736 + * if we are exiting busy poll. 2737 + */ 2738 + if (q_vector->in_busy_poll) { 2739 + itr_val = i40e_buildreg_itr(itr_idx, interval, true); 2740 + q_vector->in_busy_poll = false; 2741 + } else { 2742 + itr_val = i40e_buildreg_itr(itr_idx, interval, false); 2743 + } 2744 + wr32(hw, I40E_PFINT_DYN_CTLN(q_vector->reg_idx), itr_val); 2761 2745 } 2762 2746 2763 2747 /** ··· 2883 2845 */ 2884 2846 if (likely(napi_complete_done(napi, work_done))) 2885 2847 i40e_update_enable_itr(vsi, q_vector); 2848 + else 2849 + q_vector->in_busy_poll = true; 2886 2850 2887 2851 return min(work_done, budget - 1); 2888 2852 }
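The rewritten i40e_buildreg_itr() packs the ITR index and the halved interval into I40E_PFINT_DYN_CTLN with FIELD_PREP, optionally adding the software-interrupt trigger bits. Below is a user-space rendition of the non-forced path, using the shift/mask values from the i40e_register.h hunk; the local DYN_CTLN_* macros are stand-ins, and I40E_ITR_NONE is taken as 3 here.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Field positions from i40e_register.h (INTENA is bit 0). */
#define DYN_CTLN_INTENA		(1u << 0)
#define DYN_CTLN_ITR_INDX(x)	(((uint32_t)(x) & 0x3) << 3)
#define DYN_CTLN_INTERVAL(x)	(((uint32_t)(x) & 0xfff) << 5)

#define I40E_RX_ITR	0
#define I40E_ITR_NONE	3	/* "no interval update" index (assumed value) */

/* Mirrors i40e_buildreg_itr() without the software-interrupt path. */
static uint32_t buildreg_itr(int itr_idx, uint16_t interval_usecs)
{
	/* the register counts the interval in 2 usec units */
	uint16_t interval = interval_usecs >> 1;

	return DYN_CTLN_INTENA | DYN_CTLN_ITR_INDX(itr_idx) |
	       DYN_CTLN_INTERVAL(interval);
}

int main(void)
{
	/* RX ITR of 20 usecs: INTENA | (0 << 3) | (10 << 5) = 0x141. */
	assert(buildreg_itr(I40E_RX_ITR, 20) == 0x141);
	/* No update requested: only INTENA plus the ITR_NONE index. */
	assert(buildreg_itr(I40E_ITR_NONE, 0) == (0x1 | (3u << 3)));
	printf("0x%x\n", buildreg_itr(I40E_RX_ITR, 20));
	return 0;
}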
+1
drivers/net/ethernet/intel/i40e/i40e_txrx.h
··· 68 68 /* these are indexes into ITRN registers */ 69 69 #define I40E_RX_ITR I40E_IDX_ITR0 70 70 #define I40E_TX_ITR I40E_IDX_ITR1 71 + #define I40E_SW_ITR I40E_IDX_ITR2 71 72 72 73 /* Supported RSS offloads */ 73 74 #define I40E_DEFAULT_RSS_HENA ( \
+22 -23
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
··· 1624 1624 { 1625 1625 struct i40e_hw *hw = &pf->hw; 1626 1626 struct i40e_vf *vf; 1627 - int i, v; 1628 1627 u32 reg; 1628 + int i; 1629 1629 1630 1630 /* If we don't have any VFs, then there is nothing to reset */ 1631 1631 if (!pf->num_alloc_vfs) ··· 1636 1636 return false; 1637 1637 1638 1638 /* Begin reset on all VFs at once */ 1639 - for (v = 0; v < pf->num_alloc_vfs; v++) { 1640 - vf = &pf->vf[v]; 1639 + for (vf = &pf->vf[0]; vf < &pf->vf[pf->num_alloc_vfs]; ++vf) { 1641 1640 /* If VF is being reset no need to trigger reset again */ 1642 1641 if (!test_bit(I40E_VF_STATE_RESETTING, &vf->vf_states)) 1643 - i40e_trigger_vf_reset(&pf->vf[v], flr); 1642 + i40e_trigger_vf_reset(vf, flr); 1644 1643 } 1645 1644 1646 1645 /* HW requires some time to make sure it can flush the FIFO for a VF ··· 1648 1649 * the VFs using a simple iterator that increments once that VF has 1649 1650 * finished resetting. 1650 1651 */ 1651 - for (i = 0, v = 0; i < 10 && v < pf->num_alloc_vfs; i++) { 1652 + for (i = 0, vf = &pf->vf[0]; i < 10 && vf < &pf->vf[pf->num_alloc_vfs]; ++i) { 1652 1653 usleep_range(10000, 20000); 1653 1654 1654 1655 /* Check each VF in sequence, beginning with the VF to fail 1655 1656 * the previous check. 1656 1657 */ 1657 - while (v < pf->num_alloc_vfs) { 1658 - vf = &pf->vf[v]; 1658 + while (vf < &pf->vf[pf->num_alloc_vfs]) { 1659 1659 if (!test_bit(I40E_VF_STATE_RESETTING, &vf->vf_states)) { 1660 1660 reg = rd32(hw, I40E_VPGEN_VFRSTAT(vf->vf_id)); 1661 1661 if (!(reg & I40E_VPGEN_VFRSTAT_VFRD_MASK)) ··· 1664 1666 /* If the current VF has finished resetting, move on 1665 1667 * to the next VF in sequence. 1666 1668 */ 1667 - v++; 1669 + ++vf; 1668 1670 } 1669 1671 } 1670 1672 ··· 1674 1676 /* Display a warning if at least one VF didn't manage to reset in 1675 1677 * time, but continue on with the operation. 1676 1678 */ 1677 - if (v < pf->num_alloc_vfs) 1679 + if (vf < &pf->vf[pf->num_alloc_vfs]) 1678 1680 dev_err(&pf->pdev->dev, "VF reset check timeout on VF %d\n", 1679 - pf->vf[v].vf_id); 1681 + vf->vf_id); 1680 1682 usleep_range(10000, 20000); 1681 1683 1682 1684 /* Begin disabling all the rings associated with VFs, but do not wait 1683 1685 * between each VF. 1684 1686 */ 1685 - for (v = 0; v < pf->num_alloc_vfs; v++) { 1687 + for (vf = &pf->vf[0]; vf < &pf->vf[pf->num_alloc_vfs]; ++vf) { 1686 1688 /* On initial reset, we don't have any queues to disable */ 1687 - if (pf->vf[v].lan_vsi_idx == 0) 1689 + if (vf->lan_vsi_idx == 0) 1688 1690 continue; 1689 1691 1690 1692 /* If VF is reset in another thread just continue */ 1691 1693 if (test_bit(I40E_VF_STATE_RESETTING, &vf->vf_states)) 1692 1694 continue; 1693 1695 1694 - i40e_vsi_stop_rings_no_wait(pf->vsi[pf->vf[v].lan_vsi_idx]); 1696 + i40e_vsi_stop_rings_no_wait(pf->vsi[vf->lan_vsi_idx]); 1695 1697 } 1696 1698 1697 1699 /* Now that we've notified HW to disable all of the VF rings, wait 1698 1700 * until they finish. 
1699 1701 */ 1700 - for (v = 0; v < pf->num_alloc_vfs; v++) { 1702 + for (vf = &pf->vf[0]; vf < &pf->vf[pf->num_alloc_vfs]; ++vf) { 1701 1703 /* On initial reset, we don't have any queues to disable */ 1702 - if (pf->vf[v].lan_vsi_idx == 0) 1704 + if (vf->lan_vsi_idx == 0) 1703 1705 continue; 1704 1706 1705 1707 /* If VF is reset in another thread just continue */ 1706 1708 if (test_bit(I40E_VF_STATE_RESETTING, &vf->vf_states)) 1707 1709 continue; 1708 1710 1709 - i40e_vsi_wait_queues_disabled(pf->vsi[pf->vf[v].lan_vsi_idx]); 1711 + i40e_vsi_wait_queues_disabled(pf->vsi[vf->lan_vsi_idx]); 1710 1712 } 1711 1713 1712 1714 /* Hw may need up to 50ms to finish disabling the RX queues. We ··· 1715 1717 mdelay(50); 1716 1718 1717 1719 /* Finish the reset on each VF */ 1718 - for (v = 0; v < pf->num_alloc_vfs; v++) { 1720 + for (vf = &pf->vf[0]; vf < &pf->vf[pf->num_alloc_vfs]; ++vf) { 1719 1721 /* If VF is reset in another thread just continue */ 1720 1722 if (test_bit(I40E_VF_STATE_RESETTING, &vf->vf_states)) 1721 1723 continue; 1722 1724 1723 - i40e_cleanup_reset_vf(&pf->vf[v]); 1725 + i40e_cleanup_reset_vf(vf); 1724 1726 } 1725 1727 1726 1728 i40e_flush(hw); ··· 3137 3139 /* Allow to delete VF primary MAC only if it was not set 3138 3140 * administratively by PF or if VF is trusted. 3139 3141 */ 3140 - if (ether_addr_equal(addr, vf->default_lan_addr.addr) && 3141 - i40e_can_vf_change_mac(vf)) 3142 - was_unimac_deleted = true; 3143 - else 3144 - continue; 3142 + if (ether_addr_equal(addr, vf->default_lan_addr.addr)) { 3143 + if (i40e_can_vf_change_mac(vf)) 3144 + was_unimac_deleted = true; 3145 + else 3146 + continue; 3147 + } 3145 3148 3146 3149 if (i40e_del_mac_filter(vsi, al->list[i].addr)) { 3147 3150 ret = -EINVAL;
+5 -5
drivers/net/ethernet/intel/ice/ice_common.c
··· 1002 1002 */ 1003 1003 int ice_init_hw(struct ice_hw *hw) 1004 1004 { 1005 - struct ice_aqc_get_phy_caps_data *pcaps __free(kfree); 1006 - void *mac_buf __free(kfree); 1005 + struct ice_aqc_get_phy_caps_data *pcaps __free(kfree) = NULL; 1006 + void *mac_buf __free(kfree) = NULL; 1007 1007 u16 mac_buf_len; 1008 1008 int status; 1009 1009 ··· 3272 3272 return status; 3273 3273 3274 3274 if (li->link_info & ICE_AQ_MEDIA_AVAILABLE) { 3275 - struct ice_aqc_get_phy_caps_data *pcaps __free(kfree); 3275 + struct ice_aqc_get_phy_caps_data *pcaps __free(kfree) = NULL; 3276 3276 3277 3277 pcaps = kzalloc(sizeof(*pcaps), GFP_KERNEL); 3278 3278 if (!pcaps) ··· 3420 3420 int 3421 3421 ice_set_fc(struct ice_port_info *pi, u8 *aq_failures, bool ena_auto_link_update) 3422 3422 { 3423 - struct ice_aqc_get_phy_caps_data *pcaps __free(kfree); 3423 + struct ice_aqc_get_phy_caps_data *pcaps __free(kfree) = NULL; 3424 3424 struct ice_aqc_set_phy_cfg_data cfg = { 0 }; 3425 3425 struct ice_hw *hw; 3426 3426 int status; ··· 3561 3561 ice_cfg_phy_fec(struct ice_port_info *pi, struct ice_aqc_set_phy_cfg_data *cfg, 3562 3562 enum ice_fec_mode fec) 3563 3563 { 3564 - struct ice_aqc_get_phy_caps_data *pcaps __free(kfree); 3564 + struct ice_aqc_get_phy_caps_data *pcaps __free(kfree) = NULL; 3565 3565 struct ice_hw *hw; 3566 3566 int status; 3567 3567
+1 -1
drivers/net/ethernet/intel/ice/ice_ethtool.c
··· 941 941 struct ice_netdev_priv *np = netdev_priv(netdev); 942 942 struct ice_vsi *orig_vsi = np->vsi, *test_vsi; 943 943 struct ice_pf *pf = orig_vsi->back; 944 + u8 *tx_frame __free(kfree) = NULL; 944 945 u8 broadcast[ETH_ALEN], ret = 0; 945 946 int num_frames, valid_frames; 946 947 struct ice_tx_ring *tx_ring; 947 948 struct ice_rx_ring *rx_ring; 948 - u8 *tx_frame __free(kfree); 949 949 int i; 950 950 951 951 netdev_info(netdev, "loopback test\n");
+8 -10
drivers/net/ethernet/intel/ice/ice_vf_vsi_vlan_ops.c
··· 26 26 struct ice_vsi_vlan_ops *vlan_ops; 27 27 struct ice_pf *pf = vsi->back; 28 28 29 + /* setup inner VLAN ops */ 30 + vlan_ops = &vsi->inner_vlan_ops; 31 + 29 32 if (ice_is_dvm_ena(&pf->hw)) { 30 - vlan_ops = &vsi->outer_vlan_ops; 31 - 32 - /* setup outer VLAN ops */ 33 - vlan_ops->set_port_vlan = ice_vsi_set_outer_port_vlan; 34 - vlan_ops->clear_port_vlan = ice_vsi_clear_outer_port_vlan; 35 - 36 - /* setup inner VLAN ops */ 37 - vlan_ops = &vsi->inner_vlan_ops; 38 33 vlan_ops->add_vlan = noop_vlan_arg; 39 34 vlan_ops->del_vlan = noop_vlan_arg; 40 35 vlan_ops->ena_stripping = ice_vsi_ena_inner_stripping; 41 36 vlan_ops->dis_stripping = ice_vsi_dis_inner_stripping; 42 37 vlan_ops->ena_insertion = ice_vsi_ena_inner_insertion; 43 38 vlan_ops->dis_insertion = ice_vsi_dis_inner_insertion; 44 - } else { 45 - vlan_ops = &vsi->inner_vlan_ops; 46 39 40 + /* setup outer VLAN ops */ 41 + vlan_ops = &vsi->outer_vlan_ops; 42 + vlan_ops->set_port_vlan = ice_vsi_set_outer_port_vlan; 43 + vlan_ops->clear_port_vlan = ice_vsi_clear_outer_port_vlan; 44 + } else { 47 45 vlan_ops->set_port_vlan = ice_vsi_set_inner_port_vlan; 48 46 vlan_ops->clear_port_vlan = ice_vsi_clear_inner_port_vlan; 49 47 }
+2 -2
drivers/net/ethernet/intel/idpf/idpf_txrx.c
··· 2941 2941 rx_ptype = le16_get_bits(rx_desc->ptype_err_fflags0, 2942 2942 VIRTCHNL2_RX_FLEX_DESC_ADV_PTYPE_M); 2943 2943 2944 + skb->protocol = eth_type_trans(skb, rxq->vport->netdev); 2945 + 2944 2946 decoded = rxq->vport->rx_ptype_lkup[rx_ptype]; 2945 2947 /* If we don't know the ptype we can't do anything else with it. Just 2946 2948 * pass it up the stack as-is. ··· 2952 2950 2953 2951 /* process RSS/hash */ 2954 2952 idpf_rx_hash(rxq, skb, rx_desc, &decoded); 2955 - 2956 - skb->protocol = eth_type_trans(skb, rxq->vport->netdev); 2957 2953 2958 2954 if (le16_get_bits(rx_desc->hdrlen_flags, 2959 2955 VIRTCHNL2_RX_FLEX_DESC_ADV_RSC_M))
+2
drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
··· 160 160 continue; 161 161 lmac_bmap = cgx_get_lmac_bmap(rvu_cgx_pdata(cgx, rvu)); 162 162 for_each_set_bit(iter, &lmac_bmap, rvu->hw->lmac_per_cgx) { 163 + if (iter >= MAX_LMAC_COUNT) 164 + continue; 163 165 lmac = cgx_get_lmacid(rvu_cgx_pdata(cgx, rvu), 164 166 iter); 165 167 rvu->pf2cgxlmac_map[pf] = cgxlmac_id_to_bmap(cgx, lmac);
+1 -1
drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
··· 1657 1657 struct npc_coalesced_kpu_prfl *img_data = NULL; 1658 1658 int i = 0, rc = -EINVAL; 1659 1659 void __iomem *kpu_prfl_addr; 1660 - u16 offset; 1660 + u32 offset; 1661 1661 1662 1662 img_data = (struct npc_coalesced_kpu_prfl __force *)rvu->kpu_prfl_addr; 1663 1663 if (le64_to_cpu(img_data->signature) == KPU_SIGN &&
+1 -1
drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
··· 1933 1933 * mcam entries are enabled to receive the packets. Hence disable the 1934 1934 * packet I/O. 1935 1935 */ 1936 - if (err == EIO) 1936 + if (err == -EIO) 1937 1937 goto err_disable_rxtx; 1938 1938 else if (err) 1939 1939 goto err_tx_stop_queues;
+8 -2
drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c
··· 14 14 #include <linux/module.h> 15 15 #include <linux/phy.h> 16 16 #include <linux/platform_device.h> 17 + #include <linux/rtnetlink.h> 17 18 #include <linux/skbuff.h> 18 19 19 20 #include "mlxbf_gige.h" ··· 493 492 { 494 493 struct mlxbf_gige *priv = platform_get_drvdata(pdev); 495 494 496 - writeq(0, priv->base + MLXBF_GIGE_INT_EN); 497 - mlxbf_gige_clean_port(priv); 495 + rtnl_lock(); 496 + netif_device_detach(priv->netdev); 497 + 498 + if (netif_running(priv->netdev)) 499 + dev_close(priv->netdev); 500 + 501 + rtnl_unlock(); 498 502 } 499 503 500 504 static const struct acpi_device_id __maybe_unused mlxbf_gige_acpi_match[] = {
+1 -1
drivers/net/ethernet/microsoft/mana/mana_en.c
··· 601 601 602 602 *alloc_size = mtu + MANA_RXBUF_PAD + *headroom; 603 603 604 - *datasize = ALIGN(mtu + ETH_HLEN, MANA_RX_DATA_ALIGN); 604 + *datasize = mtu + ETH_HLEN; 605 605 } 606 606 607 607 static int mana_pre_alloc_rxbufs(struct mana_port_context *mpc, int new_mtu)
+36 -4
drivers/net/ethernet/realtek/r8169_main.c
··· 1314 1314 RTL_W8(tp, IBCR0, RTL_R8(tp, IBCR0) & ~0x01); 1315 1315 } 1316 1316 1317 + static void rtl_dash_loop_wait(struct rtl8169_private *tp, 1318 + const struct rtl_cond *c, 1319 + unsigned long usecs, int n, bool high) 1320 + { 1321 + if (!tp->dash_enabled) 1322 + return; 1323 + rtl_loop_wait(tp, c, usecs, n, high); 1324 + } 1325 + 1326 + static void rtl_dash_loop_wait_high(struct rtl8169_private *tp, 1327 + const struct rtl_cond *c, 1328 + unsigned long d, int n) 1329 + { 1330 + rtl_dash_loop_wait(tp, c, d, n, true); 1331 + } 1332 + 1333 + static void rtl_dash_loop_wait_low(struct rtl8169_private *tp, 1334 + const struct rtl_cond *c, 1335 + unsigned long d, int n) 1336 + { 1337 + rtl_dash_loop_wait(tp, c, d, n, false); 1338 + } 1339 + 1317 1340 static void rtl8168dp_driver_start(struct rtl8169_private *tp) 1318 1341 { 1319 1342 r8168dp_oob_notify(tp, OOB_CMD_DRIVER_START); 1320 - rtl_loop_wait_high(tp, &rtl_dp_ocp_read_cond, 10000, 10); 1343 + rtl_dash_loop_wait_high(tp, &rtl_dp_ocp_read_cond, 10000, 10); 1321 1344 } 1322 1345 1323 1346 static void rtl8168ep_driver_start(struct rtl8169_private *tp) 1324 1347 { 1325 1348 r8168ep_ocp_write(tp, 0x01, 0x180, OOB_CMD_DRIVER_START); 1326 1349 r8168ep_ocp_write(tp, 0x01, 0x30, r8168ep_ocp_read(tp, 0x30) | 0x01); 1327 - rtl_loop_wait_high(tp, &rtl_ep_ocp_read_cond, 10000, 30); 1350 + rtl_dash_loop_wait_high(tp, &rtl_ep_ocp_read_cond, 10000, 30); 1328 1351 } 1329 1352 1330 1353 static void rtl8168_driver_start(struct rtl8169_private *tp) ··· 1361 1338 static void rtl8168dp_driver_stop(struct rtl8169_private *tp) 1362 1339 { 1363 1340 r8168dp_oob_notify(tp, OOB_CMD_DRIVER_STOP); 1364 - rtl_loop_wait_low(tp, &rtl_dp_ocp_read_cond, 10000, 10); 1341 + rtl_dash_loop_wait_low(tp, &rtl_dp_ocp_read_cond, 10000, 10); 1365 1342 } 1366 1343 1367 1344 static void rtl8168ep_driver_stop(struct rtl8169_private *tp) ··· 1369 1346 rtl8168ep_stop_cmac(tp); 1370 1347 r8168ep_ocp_write(tp, 0x01, 0x180, OOB_CMD_DRIVER_STOP); 1371 1348 r8168ep_ocp_write(tp, 0x01, 0x30, r8168ep_ocp_read(tp, 0x30) | 0x01); 1372 - rtl_loop_wait_low(tp, &rtl_ep_ocp_read_cond, 10000, 10); 1349 + rtl_dash_loop_wait_low(tp, &rtl_ep_ocp_read_cond, 10000, 10); 1373 1350 } 1374 1351 1375 1352 static void rtl8168_driver_stop(struct rtl8169_private *tp) ··· 5163 5140 struct pci_dev *pdev = tp->pci_dev; 5164 5141 struct mii_bus *new_bus; 5165 5142 int ret; 5143 + 5144 + /* On some boards with this chip version the BIOS is buggy and misses 5145 + * to reset the PHY page selector. This results in the PHY ID read 5146 + * accessing registers on a different page, returning a more or 5147 + * less random value. Fix this by resetting the page selector first. 5148 + */ 5149 + if (tp->mac_version == RTL_GIGA_MAC_VER_25 || 5150 + tp->mac_version == RTL_GIGA_MAC_VER_26) 5151 + r8169_mdio_write(tp, 0x1f, 0); 5166 5152 5167 5153 new_bus = devm_mdiobus_alloc(&pdev->dev); 5168 5154 if (!new_bus)
+14 -10
drivers/net/ethernet/renesas/ravb_main.c
··· 1324 1324 int q = napi - priv->napi; 1325 1325 int mask = BIT(q); 1326 1326 int quota = budget; 1327 + bool unmask; 1327 1328 1328 1329 /* Processing RX Descriptor Ring */ 1329 1330 /* Clear RX interrupt */ 1330 1331 ravb_write(ndev, ~(mask | RIS0_RESERVED), RIS0); 1331 - if (ravb_rx(ndev, &quota, q)) 1332 - goto out; 1332 + unmask = !ravb_rx(ndev, &quota, q); 1333 1333 1334 1334 /* Processing TX Descriptor Ring */ 1335 1335 spin_lock_irqsave(&priv->lock, flags); ··· 1338 1338 ravb_tx_free(ndev, q, true); 1339 1339 netif_wake_subqueue(ndev, q); 1340 1340 spin_unlock_irqrestore(&priv->lock, flags); 1341 + 1342 + /* Receive error message handling */ 1343 + priv->rx_over_errors = priv->stats[RAVB_BE].rx_over_errors; 1344 + if (info->nc_queues) 1345 + priv->rx_over_errors += priv->stats[RAVB_NC].rx_over_errors; 1346 + if (priv->rx_over_errors != ndev->stats.rx_over_errors) 1347 + ndev->stats.rx_over_errors = priv->rx_over_errors; 1348 + if (priv->rx_fifo_errors != ndev->stats.rx_fifo_errors) 1349 + ndev->stats.rx_fifo_errors = priv->rx_fifo_errors; 1350 + 1351 + if (!unmask) 1352 + goto out; 1341 1353 1342 1354 napi_complete(napi); 1343 1355 ··· 1364 1352 } 1365 1353 spin_unlock_irqrestore(&priv->lock, flags); 1366 1354 1367 - /* Receive error message handling */ 1368 - priv->rx_over_errors = priv->stats[RAVB_BE].rx_over_errors; 1369 - if (info->nc_queues) 1370 - priv->rx_over_errors += priv->stats[RAVB_NC].rx_over_errors; 1371 - if (priv->rx_over_errors != ndev->stats.rx_over_errors) 1372 - ndev->stats.rx_over_errors = priv->rx_over_errors; 1373 - if (priv->rx_fifo_errors != ndev->stats.rx_fifo_errors) 1374 - ndev->stats.rx_fifo_errors = priv->rx_fifo_errors; 1375 1355 out: 1376 1356 return budget - quota; 1377 1357 }
+31 -9
drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
··· 92 92 u32 prio, u32 queue) 93 93 { 94 94 void __iomem *ioaddr = hw->pcsr; 95 - u32 base_register; 96 - u32 value; 95 + u32 clear_mask = 0; 96 + u32 ctrl2, ctrl3; 97 + int i; 97 98 98 - base_register = (queue < 4) ? GMAC_RXQ_CTRL2 : GMAC_RXQ_CTRL3; 99 - if (queue >= 4) 99 + ctrl2 = readl(ioaddr + GMAC_RXQ_CTRL2); 100 + ctrl3 = readl(ioaddr + GMAC_RXQ_CTRL3); 101 + 102 + /* The software must ensure that the same priority 103 + * is not mapped to multiple Rx queues 104 + */ 105 + for (i = 0; i < 4; i++) 106 + clear_mask |= ((prio << GMAC_RXQCTRL_PSRQX_SHIFT(i)) & 107 + GMAC_RXQCTRL_PSRQX_MASK(i)); 108 + 109 + ctrl2 &= ~clear_mask; 110 + ctrl3 &= ~clear_mask; 111 + 112 + /* First assign new priorities to a queue, then 113 + * clear them from others queues 114 + */ 115 + if (queue < 4) { 116 + ctrl2 |= (prio << GMAC_RXQCTRL_PSRQX_SHIFT(queue)) & 117 + GMAC_RXQCTRL_PSRQX_MASK(queue); 118 + 119 + writel(ctrl2, ioaddr + GMAC_RXQ_CTRL2); 120 + writel(ctrl3, ioaddr + GMAC_RXQ_CTRL3); 121 + } else { 100 122 queue -= 4; 101 123 102 - value = readl(ioaddr + base_register); 103 - 104 - value &= ~GMAC_RXQCTRL_PSRQX_MASK(queue); 105 - value |= (prio << GMAC_RXQCTRL_PSRQX_SHIFT(queue)) & 124 + ctrl3 |= (prio << GMAC_RXQCTRL_PSRQX_SHIFT(queue)) & 106 125 GMAC_RXQCTRL_PSRQX_MASK(queue); 107 - writel(value, ioaddr + base_register); 126 + 127 + writel(ctrl3, ioaddr + GMAC_RXQ_CTRL3); 128 + writel(ctrl2, ioaddr + GMAC_RXQ_CTRL2); 129 + } 108 130 } 109 131 110 132 static void dwmac4_tx_queue_priority(struct mac_device_info *hw,
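The dwmac4 change above (and the matching dwxgmac2 hunk that follows) ensures a priority is steered to exactly one RX queue by clearing its bits from every per-queue field in both control registers before setting the field for the target queue. A compile-and-run sketch of that clear-then-set mask arithmetic, using an illustrative 8-bit-per-queue layout rather than the real GMAC register format:

#include <stdint.h>
#include <stdio.h>

#define FIELD_SHIFT(q)	((q) * 8)
#define FIELD_MASK(q)	(0xffu << FIELD_SHIFT(q))

/* prio: bitmap of priorities to steer; queue: 0..7 across two 4-field words */
static void map_prio(uint32_t *ctrl2, uint32_t *ctrl3, uint32_t prio, unsigned int queue)
{
	uint32_t clear = 0;
	unsigned int i;

	/* a priority must not stay mapped to more than one queue */
	for (i = 0; i < 4; i++)
		clear |= (prio << FIELD_SHIFT(i)) & FIELD_MASK(i);

	*ctrl2 &= ~clear;	/* queues 0..3 */
	*ctrl3 &= ~clear;	/* queues 4..7 */

	if (queue < 4)
		*ctrl2 |= (prio << FIELD_SHIFT(queue)) & FIELD_MASK(queue);
	else
		*ctrl3 |= (prio << FIELD_SHIFT(queue - 4)) & FIELD_MASK(queue - 4);
}

int main(void)
{
	uint32_t ctrl2 = 0, ctrl3 = 0;

	map_prio(&ctrl2, &ctrl3, 0x3, 1);	/* priorities 0,1 -> queue 1 */
	map_prio(&ctrl2, &ctrl3, 0x3, 5);	/* remapped       -> queue 5 */
	printf("ctrl2=%08x ctrl3=%08x\n", (unsigned int)ctrl2, (unsigned int)ctrl3);
	return 0;
}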
+31 -7
drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
··· 105 105 u32 queue) 106 106 { 107 107 void __iomem *ioaddr = hw->pcsr; 108 - u32 value, reg; 108 + u32 clear_mask = 0; 109 + u32 ctrl2, ctrl3; 110 + int i; 109 111 110 - reg = (queue < 4) ? XGMAC_RXQ_CTRL2 : XGMAC_RXQ_CTRL3; 111 - if (queue >= 4) 112 + ctrl2 = readl(ioaddr + XGMAC_RXQ_CTRL2); 113 + ctrl3 = readl(ioaddr + XGMAC_RXQ_CTRL3); 114 + 115 + /* The software must ensure that the same priority 116 + * is not mapped to multiple Rx queues 117 + */ 118 + for (i = 0; i < 4; i++) 119 + clear_mask |= ((prio << XGMAC_PSRQ_SHIFT(i)) & 120 + XGMAC_PSRQ(i)); 121 + 122 + ctrl2 &= ~clear_mask; 123 + ctrl3 &= ~clear_mask; 124 + 125 + /* First assign new priorities to a queue, then 126 + * clear them from others queues 127 + */ 128 + if (queue < 4) { 129 + ctrl2 |= (prio << XGMAC_PSRQ_SHIFT(queue)) & 130 + XGMAC_PSRQ(queue); 131 + 132 + writel(ctrl2, ioaddr + XGMAC_RXQ_CTRL2); 133 + writel(ctrl3, ioaddr + XGMAC_RXQ_CTRL3); 134 + } else { 112 135 queue -= 4; 113 136 114 - value = readl(ioaddr + reg); 115 - value &= ~XGMAC_PSRQ(queue); 116 - value |= (prio << XGMAC_PSRQ_SHIFT(queue)) & XGMAC_PSRQ(queue); 137 + ctrl3 |= (prio << XGMAC_PSRQ_SHIFT(queue)) & 138 + XGMAC_PSRQ(queue); 117 139 118 - writel(value, ioaddr + reg); 140 + writel(ctrl3, ioaddr + XGMAC_RXQ_CTRL3); 141 + writel(ctrl2, ioaddr + XGMAC_RXQ_CTRL2); 142 + } 119 143 } 120 144 121 145 static void dwxgmac2_tx_queue_prio(struct mac_device_info *hw, u32 prio,
+5 -3
drivers/net/ethernet/wangxun/txgbe/txgbe_phy.c
··· 20 20 #include "txgbe_phy.h" 21 21 #include "txgbe_hw.h" 22 22 23 + #define TXGBE_I2C_CLK_DEV_NAME "i2c_dw" 24 + 23 25 static int txgbe_swnodes_register(struct txgbe *txgbe) 24 26 { 25 27 struct txgbe_nodes *nodes = &txgbe->nodes; ··· 573 571 char clk_name[32]; 574 572 struct clk *clk; 575 573 576 - snprintf(clk_name, sizeof(clk_name), "i2c_dw.%d", 577 - pci_dev_id(pdev)); 574 + snprintf(clk_name, sizeof(clk_name), "%s.%d", 575 + TXGBE_I2C_CLK_DEV_NAME, pci_dev_id(pdev)); 578 576 579 577 clk = clk_register_fixed_rate(NULL, clk_name, NULL, 0, 156250000); 580 578 if (IS_ERR(clk)) ··· 636 634 637 635 info.parent = &pdev->dev; 638 636 info.fwnode = software_node_fwnode(txgbe->nodes.group[SWNODE_I2C]); 639 - info.name = "i2c_designware"; 637 + info.name = TXGBE_I2C_CLK_DEV_NAME; 640 638 info.id = pci_dev_id(pdev); 641 639 642 640 info.res = &DEFINE_RES_IRQ(pdev->irq);
+24 -7
drivers/net/phy/micrel.c
··· 2431 2431 struct lan8814_ptp_rx_ts *rx_ts, *tmp; 2432 2432 int txcfg = 0, rxcfg = 0; 2433 2433 int pkt_ts_enable; 2434 + int tx_mod; 2434 2435 2435 2436 ptp_priv->hwts_tx_type = config->tx_type; 2436 2437 ptp_priv->rx_filter = config->rx_filter; ··· 2478 2477 lanphy_write_page_reg(ptp_priv->phydev, 5, PTP_RX_TIMESTAMP_EN, pkt_ts_enable); 2479 2478 lanphy_write_page_reg(ptp_priv->phydev, 5, PTP_TX_TIMESTAMP_EN, pkt_ts_enable); 2480 2479 2481 - if (ptp_priv->hwts_tx_type == HWTSTAMP_TX_ONESTEP_SYNC) 2480 + tx_mod = lanphy_read_page_reg(ptp_priv->phydev, 5, PTP_TX_MOD); 2481 + if (ptp_priv->hwts_tx_type == HWTSTAMP_TX_ONESTEP_SYNC) { 2482 2482 lanphy_write_page_reg(ptp_priv->phydev, 5, PTP_TX_MOD, 2483 - PTP_TX_MOD_TX_PTP_SYNC_TS_INSERT_); 2483 + tx_mod | PTP_TX_MOD_TX_PTP_SYNC_TS_INSERT_); 2484 + } else if (ptp_priv->hwts_tx_type == HWTSTAMP_TX_ON) { 2485 + lanphy_write_page_reg(ptp_priv->phydev, 5, PTP_TX_MOD, 2486 + tx_mod & ~PTP_TX_MOD_TX_PTP_SYNC_TS_INSERT_); 2487 + } 2484 2488 2485 2489 if (config->rx_filter != HWTSTAMP_FILTER_NONE) 2486 2490 lan8814_config_ts_intr(ptp_priv->phydev, true); ··· 2543 2537 } 2544 2538 } 2545 2539 2546 - static void lan8814_get_sig_rx(struct sk_buff *skb, u16 *sig) 2540 + static bool lan8814_get_sig_rx(struct sk_buff *skb, u16 *sig) 2547 2541 { 2548 2542 struct ptp_header *ptp_header; 2549 2543 u32 type; ··· 2553 2547 ptp_header = ptp_parse_header(skb, type); 2554 2548 skb_pull_inline(skb, ETH_HLEN); 2555 2549 2550 + if (!ptp_header) 2551 + return false; 2552 + 2556 2553 *sig = (__force u16)(ntohs(ptp_header->sequence_id)); 2554 + return true; 2557 2555 } 2558 2556 2559 2557 static bool lan8814_match_rx_skb(struct kszphy_ptp_priv *ptp_priv, ··· 2569 2559 bool ret = false; 2570 2560 u16 skb_sig; 2571 2561 2572 - lan8814_get_sig_rx(skb, &skb_sig); 2562 + if (!lan8814_get_sig_rx(skb, &skb_sig)) 2563 + return ret; 2573 2564 2574 2565 /* Iterate over all RX timestamps and match it with the received skbs */ 2575 2566 spin_lock_irqsave(&ptp_priv->rx_ts_lock, flags); ··· 2845 2834 return 0; 2846 2835 } 2847 2836 2848 - static void lan8814_get_sig_tx(struct sk_buff *skb, u16 *sig) 2837 + static bool lan8814_get_sig_tx(struct sk_buff *skb, u16 *sig) 2849 2838 { 2850 2839 struct ptp_header *ptp_header; 2851 2840 u32 type; ··· 2853 2842 type = ptp_classify_raw(skb); 2854 2843 ptp_header = ptp_parse_header(skb, type); 2855 2844 2845 + if (!ptp_header) 2846 + return false; 2847 + 2856 2848 *sig = (__force u16)(ntohs(ptp_header->sequence_id)); 2849 + return true; 2857 2850 } 2858 2851 2859 2852 static void lan8814_match_tx_skb(struct kszphy_ptp_priv *ptp_priv, ··· 2871 2856 2872 2857 spin_lock_irqsave(&ptp_priv->tx_queue.lock, flags); 2873 2858 skb_queue_walk_safe(&ptp_priv->tx_queue, skb, skb_tmp) { 2874 - lan8814_get_sig_tx(skb, &skb_sig); 2859 + if (!lan8814_get_sig_tx(skb, &skb_sig)) 2860 + continue; 2875 2861 2876 2862 if (memcmp(&skb_sig, &seq_id, sizeof(seq_id))) 2877 2863 continue; ··· 2926 2910 2927 2911 spin_lock_irqsave(&ptp_priv->rx_queue.lock, flags); 2928 2912 skb_queue_walk_safe(&ptp_priv->rx_queue, skb, skb_tmp) { 2929 - lan8814_get_sig_rx(skb, &skb_sig); 2913 + if (!lan8814_get_sig_rx(skb, &skb_sig)) 2914 + continue; 2930 2915 2931 2916 if (memcmp(&skb_sig, &rx_ts->seq_id, sizeof(rx_ts->seq_id))) 2932 2917 continue;
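The micrel hunk turns the sequence-id getters into predicates because the PTP header parse can fail for frames that are not valid PTP; callers now skip such frames instead of matching on an uninitialized signature. A tiny standalone illustration of the guard, where the header struct and parser are simplified stand-ins for the kernel helpers:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct ptp_hdr {
	uint16_t sequence_id;
};

/* stand-in for the kernel parser: returns NULL when the frame is not PTP */
static const struct ptp_hdr *parse_hdr(const void *frame)
{
	return frame;
}

static bool get_sig(const void *frame, uint16_t *sig)
{
	const struct ptp_hdr *hdr = parse_hdr(frame);

	if (!hdr)
		return false;	/* caller skips this frame rather than matching garbage */

	*sig = hdr->sequence_id;
	return true;
}

int main(void)
{
	struct ptp_hdr h = { .sequence_id = 42 };
	uint16_t sig;

	if (get_sig(&h, &sig))
		printf("sequence id %u\n", (unsigned int)sig);
	if (!get_sig(NULL, &sig))
		printf("unparseable frame skipped\n");
	return 0;
}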
+2
drivers/net/usb/ax88179_178a.c
··· 1273 1273 1274 1274 if (is_valid_ether_addr(mac)) { 1275 1275 eth_hw_addr_set(dev->net, mac); 1276 + if (!is_local_ether_addr(mac)) 1277 + dev->net->addr_assign_type = NET_ADDR_PERM; 1276 1278 } else { 1277 1279 netdev_info(dev->net, "invalid MAC address, using random\n"); 1278 1280 eth_hw_addr_random(dev->net);
+1
drivers/net/xen-netfront.c
··· 285 285 return NULL; 286 286 } 287 287 skb_add_rx_frag(skb, 0, page, 0, 0, PAGE_SIZE); 288 + skb_mark_for_recycle(skb); 288 289 289 290 /* Align ip header to a 16 bytes boundary */ 290 291 skb_reserve(skb, NET_IP_ALIGN);
+32 -9
drivers/nvme/host/core.c
··· 2076 2076 bool vwc = ns->ctrl->vwc & NVME_CTRL_VWC_PRESENT; 2077 2077 struct queue_limits lim; 2078 2078 struct nvme_id_ns_nvm *nvm = NULL; 2079 + struct nvme_zone_info zi = {}; 2079 2080 struct nvme_id_ns *id; 2080 2081 sector_t capacity; 2081 2082 unsigned lbaf; ··· 2089 2088 if (id->ncap == 0) { 2090 2089 /* namespace not allocated or attached */ 2091 2090 info->is_removed = true; 2092 - ret = -ENODEV; 2091 + ret = -ENXIO; 2093 2092 goto out; 2094 2093 } 2094 + lbaf = nvme_lbaf_index(id->flbas); 2095 2095 2096 2096 if (ns->ctrl->ctratt & NVME_CTRL_ATTR_ELBAS) { 2097 2097 ret = nvme_identify_ns_nvm(ns->ctrl, info->nsid, &nvm); ··· 2100 2098 goto out; 2101 2099 } 2102 2100 2101 + if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && 2102 + ns->head->ids.csi == NVME_CSI_ZNS) { 2103 + ret = nvme_query_zone_info(ns, lbaf, &zi); 2104 + if (ret < 0) 2105 + goto out; 2106 + } 2107 + 2103 2108 blk_mq_freeze_queue(ns->disk->queue); 2104 - lbaf = nvme_lbaf_index(id->flbas); 2105 2109 ns->head->lba_shift = id->lbaf[lbaf].ds; 2106 2110 ns->head->nuse = le64_to_cpu(id->nuse); 2107 2111 capacity = nvme_lba_to_sect(ns->head, le64_to_cpu(id->nsze)); ··· 2120 2112 capacity = 0; 2121 2113 nvme_config_discard(ns, &lim); 2122 2114 if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && 2123 - ns->head->ids.csi == NVME_CSI_ZNS) { 2124 - ret = nvme_update_zone_info(ns, lbaf, &lim); 2125 - if (ret) { 2126 - blk_mq_unfreeze_queue(ns->disk->queue); 2127 - goto out; 2128 - } 2129 - } 2115 + ns->head->ids.csi == NVME_CSI_ZNS) 2116 + nvme_update_zone_info(ns, &lim, &zi); 2130 2117 ret = queue_limits_commit_update(ns->disk->queue, &lim); 2131 2118 if (ret) { 2132 2119 blk_mq_unfreeze_queue(ns->disk->queue); ··· 2204 2201 } 2205 2202 2206 2203 if (!ret && nvme_ns_head_multipath(ns->head)) { 2204 + struct queue_limits *ns_lim = &ns->disk->queue->limits; 2207 2205 struct queue_limits lim; 2208 2206 2209 2207 blk_mq_freeze_queue(ns->head->disk->queue); ··· 2216 2212 set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info)); 2217 2213 nvme_mpath_revalidate_paths(ns); 2218 2214 2215 + /* 2216 + * queue_limits mixes values that are the hardware limitations 2217 + * for bio splitting with what is the device configuration. 2218 + * 2219 + * For NVMe the device configuration can change after e.g. a 2220 + * Format command, and we really want to pick up the new format 2221 + * value here. But we must still stack the queue limits to the 2222 + * least common denominator for multipathing to split the bios 2223 + * properly. 2224 + * 2225 + * To work around this, we explicitly set the device 2226 + * configuration to those that we just queried, but only stack 2227 + * the splitting limits in to make sure we still obey possibly 2228 + * lower limitations of other controllers. 2229 + */ 2219 2230 lim = queue_limits_start_update(ns->head->disk->queue); 2231 + lim.logical_block_size = ns_lim->logical_block_size; 2232 + lim.physical_block_size = ns_lim->physical_block_size; 2233 + lim.io_min = ns_lim->io_min; 2234 + lim.io_opt = ns_lim->io_opt; 2220 2235 queue_limits_stack_bdev(&lim, ns->disk->part0, 0, 2221 2236 ns->head->disk->disk_name); 2222 2237 ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
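The multipath part of the nvme/core hunk copies the freshly queried block layout of the namespace into the head's limits verbatim, while the limits that govern bio splitting are still stacked to the most restrictive value across paths. A rough sketch of that "copy some fields, stack the rest" idea, with invented field names rather than the kernel's queue_limits:

#include <stdio.h>

struct limits {
	unsigned int logical_block_size;	/* device configuration */
	unsigned int io_opt;			/* device configuration */
	unsigned int max_sectors;		/* a splitting limit */
};

static unsigned int min_u(unsigned int a, unsigned int b) { return a < b ? a : b; }

static void stack_limits(struct limits *head, const struct limits *path)
{
	/* configuration values: take the path's freshly queried values as-is */
	head->logical_block_size = path->logical_block_size;
	head->io_opt = path->io_opt;

	/* splitting limit: obey the most restrictive path */
	head->max_sectors = min_u(head->max_sectors, path->max_sectors);
}

int main(void)
{
	struct limits head = { 512, 0, 2048 };
	struct limits path = { 4096, 65536, 1024 };

	stack_limits(&head, &path);
	printf("lbs=%u io_opt=%u max_sectors=%u\n",
	       head.logical_block_size, head.io_opt, head.max_sectors);
	return 0;
}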
+2 -2
drivers/nvme/host/fc.c
··· 2428 2428 * controller. Called after last nvme_put_ctrl() call 2429 2429 */ 2430 2430 static void 2431 - nvme_fc_nvme_ctrl_freed(struct nvme_ctrl *nctrl) 2431 + nvme_fc_free_ctrl(struct nvme_ctrl *nctrl) 2432 2432 { 2433 2433 struct nvme_fc_ctrl *ctrl = to_fc_ctrl(nctrl); 2434 2434 ··· 3384 3384 .reg_read32 = nvmf_reg_read32, 3385 3385 .reg_read64 = nvmf_reg_read64, 3386 3386 .reg_write32 = nvmf_reg_write32, 3387 - .free_ctrl = nvme_fc_nvme_ctrl_freed, 3387 + .free_ctrl = nvme_fc_free_ctrl, 3388 3388 .submit_async_event = nvme_fc_submit_async_event, 3389 3389 .delete_ctrl = nvme_fc_delete_ctrl, 3390 3390 .get_address = nvmf_get_address,
+10 -2
drivers/nvme/host/nvme.h
··· 1036 1036 } 1037 1037 #endif /* CONFIG_NVME_MULTIPATH */ 1038 1038 1039 + struct nvme_zone_info { 1040 + u64 zone_size; 1041 + unsigned int max_open_zones; 1042 + unsigned int max_active_zones; 1043 + }; 1044 + 1039 1045 int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector, 1040 1046 unsigned int nr_zones, report_zones_cb cb, void *data); 1041 - int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf, 1042 - struct queue_limits *lim); 1047 + int nvme_query_zone_info(struct nvme_ns *ns, unsigned lbaf, 1048 + struct nvme_zone_info *zi); 1049 + void nvme_update_zone_info(struct nvme_ns *ns, struct queue_limits *lim, 1050 + struct nvme_zone_info *zi); 1043 1051 #ifdef CONFIG_BLK_DEV_ZONED 1044 1052 blk_status_t nvme_setup_zone_mgmt_send(struct nvme_ns *ns, struct request *req, 1045 1053 struct nvme_command *cmnd,
+20 -13
drivers/nvme/host/zns.c
··· 35 35 return 0; 36 36 } 37 37 38 - int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf, 39 - struct queue_limits *lim) 38 + int nvme_query_zone_info(struct nvme_ns *ns, unsigned lbaf, 39 + struct nvme_zone_info *zi) 40 40 { 41 41 struct nvme_effects_log *log = ns->head->effects; 42 42 struct nvme_command c = { }; ··· 89 89 goto free_data; 90 90 } 91 91 92 - ns->head->zsze = 93 - nvme_lba_to_sect(ns->head, le64_to_cpu(id->lbafe[lbaf].zsze)); 94 - if (!is_power_of_2(ns->head->zsze)) { 92 + zi->zone_size = le64_to_cpu(id->lbafe[lbaf].zsze); 93 + if (!is_power_of_2(zi->zone_size)) { 95 94 dev_warn(ns->ctrl->device, 96 - "invalid zone size:%llu for namespace:%u\n", 97 - ns->head->zsze, ns->head->ns_id); 95 + "invalid zone size: %llu for namespace: %u\n", 96 + zi->zone_size, ns->head->ns_id); 98 97 status = -ENODEV; 99 98 goto free_data; 100 99 } 100 + zi->max_open_zones = le32_to_cpu(id->mor) + 1; 101 + zi->max_active_zones = le32_to_cpu(id->mar) + 1; 101 102 102 - blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, ns->queue); 103 - lim->zoned = 1; 104 - lim->max_open_zones = le32_to_cpu(id->mor) + 1; 105 - lim->max_active_zones = le32_to_cpu(id->mar) + 1; 106 - lim->chunk_sectors = ns->head->zsze; 107 - lim->max_zone_append_sectors = ns->ctrl->max_zone_append; 108 103 free_data: 109 104 kfree(id); 110 105 return status; 106 + } 107 + 108 + void nvme_update_zone_info(struct nvme_ns *ns, struct queue_limits *lim, 109 + struct nvme_zone_info *zi) 110 + { 111 + lim->zoned = 1; 112 + lim->max_open_zones = zi->max_open_zones; 113 + lim->max_active_zones = zi->max_active_zones; 114 + lim->max_zone_append_sectors = ns->ctrl->max_zone_append; 115 + lim->chunk_sectors = ns->head->zsze = 116 + nvme_lba_to_sect(ns->head, zi->zone_size); 117 + blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, ns->queue); 111 118 } 112 119 113 120 static void *nvme_zns_alloc_report_buffer(struct nvme_ns *ns,
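Across the nvme.h and zns.c hunks the zone handling is split into a query step, issued while the queue can still do I/O, and an apply step that only copies cached values into the queue limits once the queue is frozen, so no command is sent under the freeze. A rough sketch of that two-phase shape with invented names and a stubbed query:

#include <stdio.h>

struct zone_info {
	unsigned long long zone_size;
	unsigned int max_open, max_active;
};

struct queue_limits {
	unsigned long long chunk_sectors;
	unsigned int max_open_zones, max_active_zones;
};

/* phase 1: may block / do I/O, runs before the queue is frozen */
static int query_zone_info(struct zone_info *zi)
{
	zi->zone_size = 0x10000;	/* pretend this came from the device */
	zi->max_open = 12;
	zi->max_active = 12;
	return 0;
}

/* phase 2: plain memory writes, safe with the queue frozen */
static void apply_zone_info(struct queue_limits *lim, const struct zone_info *zi)
{
	lim->chunk_sectors = zi->zone_size;
	lim->max_open_zones = zi->max_open;
	lim->max_active_zones = zi->max_active;
}

int main(void)
{
	struct zone_info zi;
	struct queue_limits lim = { 0 };

	if (query_zone_info(&zi))	/* before freeze */
		return 1;
	/* freeze_queue(); */
	apply_zone_info(&lim, &zi);	/* under freeze */
	/* unfreeze_queue(); */
	printf("chunk=%llu open=%u active=%u\n",
	       lim.chunk_sectors, lim.max_open_zones, lim.max_active_zones);
	return 0;
}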
+47
drivers/nvme/target/configfs.c
··· 1613 1613 return ERR_PTR(-EINVAL); 1614 1614 } 1615 1615 1616 + if (sysfs_streq(name, nvmet_disc_subsys->subsysnqn)) { 1617 + pr_err("can't create subsystem using unique discovery NQN\n"); 1618 + return ERR_PTR(-EINVAL); 1619 + } 1620 + 1616 1621 subsys = nvmet_subsys_alloc(name, NVME_NQN_NVME); 1617 1622 if (IS_ERR(subsys)) 1618 1623 return ERR_CAST(subsys); ··· 2164 2159 2165 2160 static struct config_group nvmet_hosts_group; 2166 2161 2162 + static ssize_t nvmet_root_discovery_nqn_show(struct config_item *item, 2163 + char *page) 2164 + { 2165 + return snprintf(page, PAGE_SIZE, "%s\n", nvmet_disc_subsys->subsysnqn); 2166 + } 2167 + 2168 + static ssize_t nvmet_root_discovery_nqn_store(struct config_item *item, 2169 + const char *page, size_t count) 2170 + { 2171 + struct list_head *entry; 2172 + size_t len; 2173 + 2174 + len = strcspn(page, "\n"); 2175 + if (!len || len > NVMF_NQN_FIELD_LEN - 1) 2176 + return -EINVAL; 2177 + 2178 + down_write(&nvmet_config_sem); 2179 + list_for_each(entry, &nvmet_subsystems_group.cg_children) { 2180 + struct config_item *item = 2181 + container_of(entry, struct config_item, ci_entry); 2182 + 2183 + if (!strncmp(config_item_name(item), page, len)) { 2184 + pr_err("duplicate NQN %s\n", config_item_name(item)); 2185 + up_write(&nvmet_config_sem); 2186 + return -EINVAL; 2187 + } 2188 + } 2189 + memset(nvmet_disc_subsys->subsysnqn, 0, NVMF_NQN_FIELD_LEN); 2190 + memcpy(nvmet_disc_subsys->subsysnqn, page, len); 2191 + up_write(&nvmet_config_sem); 2192 + 2193 + return len; 2194 + } 2195 + 2196 + CONFIGFS_ATTR(nvmet_root_, discovery_nqn); 2197 + 2198 + static struct configfs_attribute *nvmet_root_attrs[] = { 2199 + &nvmet_root_attr_discovery_nqn, 2200 + NULL, 2201 + }; 2202 + 2167 2203 static const struct config_item_type nvmet_root_type = { 2204 + .ct_attrs = nvmet_root_attrs, 2168 2205 .ct_owner = THIS_MODULE, 2169 2206 }; 2170 2207
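The new discovery_nqn store in the configfs hunk trims the input at the first newline, bounds it to the NQN field size, and rejects empty or oversized strings before copying; the duplicate-subsystem check and locking are shown in the hunk itself. A small userspace sketch of just the input validation, with an illustrative length constant rather than the NVMe one:

#include <stdio.h>
#include <string.h>

#define NQN_FIELD_LEN 223	/* illustrative bound, not the spec constant */

static int store_nqn(char *dst, const char *page)
{
	size_t len = strcspn(page, "\n");	/* accept and strip a trailing newline */

	if (!len || len > NQN_FIELD_LEN - 1)
		return -1;

	memset(dst, 0, NQN_FIELD_LEN);
	memcpy(dst, page, len);
	return (int)len;
}

int main(void)
{
	char nqn[NQN_FIELD_LEN];

	if (store_nqn(nqn, "nqn.2014-08.org.nvmexpress.discovery\n") > 0)
		printf("stored: %s\n", nqn);
	return 0;
}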
+7
drivers/nvme/target/core.c
··· 1541 1541 } 1542 1542 1543 1543 down_read(&nvmet_config_sem); 1544 + if (!strncmp(nvmet_disc_subsys->subsysnqn, subsysnqn, 1545 + NVMF_NQN_SIZE)) { 1546 + if (kref_get_unless_zero(&nvmet_disc_subsys->ref)) { 1547 + up_read(&nvmet_config_sem); 1548 + return nvmet_disc_subsys; 1549 + } 1550 + } 1544 1551 list_for_each_entry(p, &port->subsystems, entry) { 1545 1552 if (!strncmp(p->subsys->subsysnqn, subsysnqn, 1546 1553 NVMF_NQN_SIZE)) {
+10 -7
drivers/nvme/target/fc.c
··· 1115 1115 } 1116 1116 1117 1117 static bool 1118 - nvmet_fc_assoc_exits(struct nvmet_fc_tgtport *tgtport, u64 association_id) 1118 + nvmet_fc_assoc_exists(struct nvmet_fc_tgtport *tgtport, u64 association_id) 1119 1119 { 1120 1120 struct nvmet_fc_tgt_assoc *a; 1121 + bool found = false; 1121 1122 1123 + rcu_read_lock(); 1122 1124 list_for_each_entry_rcu(a, &tgtport->assoc_list, a_list) { 1123 - if (association_id == a->association_id) 1124 - return true; 1125 + if (association_id == a->association_id) { 1126 + found = true; 1127 + break; 1128 + } 1125 1129 } 1130 + rcu_read_unlock(); 1126 1131 1127 - return false; 1132 + return found; 1128 1133 } 1129 1134 1130 1135 static struct nvmet_fc_tgt_assoc * ··· 1169 1164 ran = ran << BYTES_FOR_QID_SHIFT; 1170 1165 1171 1166 spin_lock_irqsave(&tgtport->lock, flags); 1172 - rcu_read_lock(); 1173 - if (!nvmet_fc_assoc_exits(tgtport, ran)) { 1167 + if (!nvmet_fc_assoc_exists(tgtport, ran)) { 1174 1168 assoc->association_id = ran; 1175 1169 list_add_tail_rcu(&assoc->a_list, &tgtport->assoc_list); 1176 1170 done = true; 1177 1171 } 1178 - rcu_read_unlock(); 1179 1172 spin_unlock_irqrestore(&tgtport->lock, flags); 1180 1173 } while (!done); 1181 1174
+12
drivers/of/dynamic.c
··· 9 9 10 10 #define pr_fmt(fmt) "OF: " fmt 11 11 12 + #include <linux/device.h> 12 13 #include <linux/of.h> 13 14 #include <linux/spinlock.h> 14 15 #include <linux/slab.h> ··· 667 666 void of_changeset_destroy(struct of_changeset *ocs) 668 667 { 669 668 struct of_changeset_entry *ce, *cen; 669 + 670 + /* 671 + * When a device is deleted, the device links to/from it are also queued 672 + * for deletion. Until these device links are freed, the devices 673 + * themselves aren't freed. If the device being deleted is due to an 674 + * overlay change, this device might be holding a reference to a device 675 + * node that will be freed. So, wait until all already pending device 676 + * links are deleted before freeing a device node. This ensures we don't 677 + * free any device node that has a non-zero reference count. 678 + */ 679 + device_link_wait_removal(); 670 680 671 681 list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node) 672 682 __of_changeset_entry_destroy(ce);
+8
drivers/of/module.c
··· 16 16 ssize_t csize; 17 17 ssize_t tsize; 18 18 19 + /* 20 + * Prevent a kernel oops in vsnprintf() -- it only allows passing a 21 + * NULL ptr when the length is also 0. Also filter out the negative 22 + * lengths... 23 + */ 24 + if ((len > 0 && !str) || len < 0) 25 + return -EINVAL; 26 + 19 27 /* Name & Type */ 20 28 /* %p eats all alphanum characters, so %c must be used here */ 21 29 csize = snprintf(str, len, "of:N%pOFn%c%s", np, 'T',
+4
drivers/perf/riscv_pmu.c
··· 313 313 u64 event_config = 0; 314 314 uint64_t cmask; 315 315 316 + /* driver does not support branch stack sampling */ 317 + if (has_branch_stack(event)) 318 + return -EOPNOTSUPP; 319 + 316 320 hwc->flags = 0; 317 321 mapped_event = rvpmu->event_map(event, &event_config); 318 322 if (mapped_event < 0) {
+1 -1
drivers/pwm/core.c
··· 443 443 if (IS_ERR(pwm)) 444 444 return pwm; 445 445 446 - if (args->args_count > 1) 446 + if (args->args_count > 0) 447 447 pwm->args.period = args->args[0]; 448 448 449 449 pwm->args.polarity = PWM_POLARITY_NORMAL;
+7
drivers/regulator/tps65132-regulator.c
··· 267 267 }; 268 268 MODULE_DEVICE_TABLE(i2c, tps65132_id); 269 269 270 + static const struct of_device_id __maybe_unused tps65132_of_match[] = { 271 + { .compatible = "ti,tps65132" }, 272 + {}, 273 + }; 274 + MODULE_DEVICE_TABLE(of, tps65132_of_match); 275 + 270 276 static struct i2c_driver tps65132_i2c_driver = { 271 277 .driver = { 272 278 .name = "tps65132", 273 279 .probe_type = PROBE_PREFER_ASYNCHRONOUS, 280 + .of_match_table = of_match_ptr(tps65132_of_match), 274 281 }, 275 282 .probe = tps65132_probe, 276 283 .id_table = tps65132_id,
+1 -1
drivers/scsi/libsas/sas_expander.c
··· 135 135 136 136 static inline void *alloc_smp_req(int size) 137 137 { 138 - u8 *p = kzalloc(size, GFP_KERNEL); 138 + u8 *p = kzalloc(ALIGN(size, ARCH_DMA_MINALIGN), GFP_KERNEL); 139 139 if (p) 140 140 p[0] = SMP_REQUEST; 141 141 return p;
+10 -10
drivers/scsi/myrb.c
··· 1775 1775 1776 1776 name = myrb_devstate_name(ldev_info->state); 1777 1777 if (name) 1778 - ret = snprintf(buf, 32, "%s\n", name); 1778 + ret = snprintf(buf, 64, "%s\n", name); 1779 1779 else 1780 - ret = snprintf(buf, 32, "Invalid (%02X)\n", 1780 + ret = snprintf(buf, 64, "Invalid (%02X)\n", 1781 1781 ldev_info->state); 1782 1782 } else { 1783 1783 struct myrb_pdev_state *pdev_info = sdev->hostdata; ··· 1796 1796 else 1797 1797 name = myrb_devstate_name(pdev_info->state); 1798 1798 if (name) 1799 - ret = snprintf(buf, 32, "%s\n", name); 1799 + ret = snprintf(buf, 64, "%s\n", name); 1800 1800 else 1801 - ret = snprintf(buf, 32, "Invalid (%02X)\n", 1801 + ret = snprintf(buf, 64, "Invalid (%02X)\n", 1802 1802 pdev_info->state); 1803 1803 } 1804 1804 return ret; ··· 1886 1886 1887 1887 name = myrb_raidlevel_name(ldev_info->raid_level); 1888 1888 if (!name) 1889 - return snprintf(buf, 32, "Invalid (%02X)\n", 1889 + return snprintf(buf, 64, "Invalid (%02X)\n", 1890 1890 ldev_info->state); 1891 - return snprintf(buf, 32, "%s\n", name); 1891 + return snprintf(buf, 64, "%s\n", name); 1892 1892 } 1893 - return snprintf(buf, 32, "Physical Drive\n"); 1893 + return snprintf(buf, 64, "Physical Drive\n"); 1894 1894 } 1895 1895 static DEVICE_ATTR_RO(raid_level); 1896 1896 ··· 1903 1903 unsigned char status; 1904 1904 1905 1905 if (sdev->channel < myrb_logical_channel(sdev->host)) 1906 - return snprintf(buf, 32, "physical device - not rebuilding\n"); 1906 + return snprintf(buf, 64, "physical device - not rebuilding\n"); 1907 1907 1908 1908 status = myrb_get_rbld_progress(cb, &rbld_buf); 1909 1909 1910 1910 if (rbld_buf.ldev_num != sdev->id || 1911 1911 status != MYRB_STATUS_SUCCESS) 1912 - return snprintf(buf, 32, "not rebuilding\n"); 1912 + return snprintf(buf, 64, "not rebuilding\n"); 1913 1913 1914 - return snprintf(buf, 32, "rebuilding block %u of %u\n", 1914 + return snprintf(buf, 64, "rebuilding block %u of %u\n", 1915 1915 rbld_buf.ldev_size - rbld_buf.blocks_left, 1916 1916 rbld_buf.ldev_size); 1917 1917 }
+12 -12
drivers/scsi/myrs.c
··· 947 947 948 948 name = myrs_devstate_name(ldev_info->dev_state); 949 949 if (name) 950 - ret = snprintf(buf, 32, "%s\n", name); 950 + ret = snprintf(buf, 64, "%s\n", name); 951 951 else 952 - ret = snprintf(buf, 32, "Invalid (%02X)\n", 952 + ret = snprintf(buf, 64, "Invalid (%02X)\n", 953 953 ldev_info->dev_state); 954 954 } else { 955 955 struct myrs_pdev_info *pdev_info; ··· 958 958 pdev_info = sdev->hostdata; 959 959 name = myrs_devstate_name(pdev_info->dev_state); 960 960 if (name) 961 - ret = snprintf(buf, 32, "%s\n", name); 961 + ret = snprintf(buf, 64, "%s\n", name); 962 962 else 963 - ret = snprintf(buf, 32, "Invalid (%02X)\n", 963 + ret = snprintf(buf, 64, "Invalid (%02X)\n", 964 964 pdev_info->dev_state); 965 965 } 966 966 return ret; ··· 1066 1066 ldev_info = sdev->hostdata; 1067 1067 name = myrs_raid_level_name(ldev_info->raid_level); 1068 1068 if (!name) 1069 - return snprintf(buf, 32, "Invalid (%02X)\n", 1069 + return snprintf(buf, 64, "Invalid (%02X)\n", 1070 1070 ldev_info->dev_state); 1071 1071 1072 1072 } else 1073 1073 name = myrs_raid_level_name(MYRS_RAID_PHYSICAL); 1074 1074 1075 - return snprintf(buf, 32, "%s\n", name); 1075 + return snprintf(buf, 64, "%s\n", name); 1076 1076 } 1077 1077 static DEVICE_ATTR_RO(raid_level); 1078 1078 ··· 1086 1086 unsigned char status; 1087 1087 1088 1088 if (sdev->channel < cs->ctlr_info->physchan_present) 1089 - return snprintf(buf, 32, "physical device - not rebuilding\n"); 1089 + return snprintf(buf, 64, "physical device - not rebuilding\n"); 1090 1090 1091 1091 ldev_info = sdev->hostdata; 1092 1092 ldev_num = ldev_info->ldev_num; ··· 1098 1098 return -EIO; 1099 1099 } 1100 1100 if (ldev_info->rbld_active) { 1101 - return snprintf(buf, 32, "rebuilding block %zu of %zu\n", 1101 + return snprintf(buf, 64, "rebuilding block %zu of %zu\n", 1102 1102 (size_t)ldev_info->rbld_lba, 1103 1103 (size_t)ldev_info->cfg_devsize); 1104 1104 } else 1105 - return snprintf(buf, 32, "not rebuilding\n"); 1105 + return snprintf(buf, 64, "not rebuilding\n"); 1106 1106 } 1107 1107 1108 1108 static ssize_t rebuild_store(struct device *dev, ··· 1190 1190 unsigned short ldev_num; 1191 1191 1192 1192 if (sdev->channel < cs->ctlr_info->physchan_present) 1193 - return snprintf(buf, 32, "physical device - not checking\n"); 1193 + return snprintf(buf, 64, "physical device - not checking\n"); 1194 1194 1195 1195 ldev_info = sdev->hostdata; 1196 1196 if (!ldev_info) ··· 1198 1198 ldev_num = ldev_info->ldev_num; 1199 1199 myrs_get_ldev_info(cs, ldev_num, ldev_info); 1200 1200 if (ldev_info->cc_active) 1201 - return snprintf(buf, 32, "checking block %zu of %zu\n", 1201 + return snprintf(buf, 64, "checking block %zu of %zu\n", 1202 1202 (size_t)ldev_info->cc_lba, 1203 1203 (size_t)ldev_info->cfg_devsize); 1204 1204 else 1205 - return snprintf(buf, 32, "not checking\n"); 1205 + return snprintf(buf, 64, "not checking\n"); 1206 1206 } 1207 1207 1208 1208 static ssize_t consistency_check_store(struct device *dev,
+1 -1
drivers/scsi/sd.c
··· 3920 3920 3921 3921 error = device_add_disk(dev, gd, NULL); 3922 3922 if (error) { 3923 - put_device(&sdkp->disk_dev); 3923 + device_unregister(&sdkp->disk_dev); 3924 3924 put_disk(gd); 3925 3925 goto out; 3926 3926 }
+6 -8
drivers/spi/spi-fsl-lpspi.c
··· 852 852 fsl_lpspi->base = devm_platform_get_and_ioremap_resource(pdev, 0, &res); 853 853 if (IS_ERR(fsl_lpspi->base)) { 854 854 ret = PTR_ERR(fsl_lpspi->base); 855 - goto out_controller_put; 855 + return ret; 856 856 } 857 857 fsl_lpspi->base_phys = res->start; 858 858 859 859 irq = platform_get_irq(pdev, 0); 860 860 if (irq < 0) { 861 861 ret = irq; 862 - goto out_controller_put; 862 + return ret; 863 863 } 864 864 865 865 ret = devm_request_irq(&pdev->dev, irq, fsl_lpspi_isr, 0, 866 866 dev_name(&pdev->dev), fsl_lpspi); 867 867 if (ret) { 868 868 dev_err(&pdev->dev, "can't get irq%d: %d\n", irq, ret); 869 - goto out_controller_put; 869 + return ret; 870 870 } 871 871 872 872 fsl_lpspi->clk_per = devm_clk_get(&pdev->dev, "per"); 873 873 if (IS_ERR(fsl_lpspi->clk_per)) { 874 874 ret = PTR_ERR(fsl_lpspi->clk_per); 875 - goto out_controller_put; 875 + return ret; 876 876 } 877 877 878 878 fsl_lpspi->clk_ipg = devm_clk_get(&pdev->dev, "ipg"); 879 879 if (IS_ERR(fsl_lpspi->clk_ipg)) { 880 880 ret = PTR_ERR(fsl_lpspi->clk_ipg); 881 - goto out_controller_put; 881 + return ret; 882 882 } 883 883 884 884 /* enable the clock */ 885 885 ret = fsl_lpspi_init_rpm(fsl_lpspi); 886 886 if (ret) 887 - goto out_controller_put; 887 + return ret; 888 888 889 889 ret = pm_runtime_get_sync(fsl_lpspi->dev); 890 890 if (ret < 0) { ··· 945 945 pm_runtime_dont_use_autosuspend(fsl_lpspi->dev); 946 946 pm_runtime_put_sync(fsl_lpspi->dev); 947 947 pm_runtime_disable(fsl_lpspi->dev); 948 - out_controller_put: 949 - spi_controller_put(controller); 950 948 951 949 return ret; 952 950 }
+2
drivers/spi/spi-pci1xxxx.c
··· 725 725 spi_bus->spi_int[iter] = devm_kzalloc(&pdev->dev, 726 726 sizeof(struct pci1xxxx_spi_internal), 727 727 GFP_KERNEL); 728 + if (!spi_bus->spi_int[iter]) 729 + return -ENOMEM; 728 730 spi_sub_ptr = spi_bus->spi_int[iter]; 729 731 spi_sub_ptr->spi_host = devm_spi_alloc_host(dev, sizeof(struct spi_controller)); 730 732 if (!spi_sub_ptr->spi_host)
+2 -3
drivers/spi/spi-s3c64xx.c
··· 430 430 struct s3c64xx_spi_driver_data *sdd = spi_controller_get_devdata(host); 431 431 432 432 if (sdd->rx_dma.ch && sdd->tx_dma.ch) 433 - return xfer->len > sdd->fifo_depth; 433 + return xfer->len >= sdd->fifo_depth; 434 434 435 435 return false; 436 436 } ··· 826 826 return status; 827 827 } 828 828 829 - if (!is_polling(sdd) && (xfer->len > fifo_len) && 829 + if (!is_polling(sdd) && xfer->len >= fifo_len && 830 830 sdd->rx_dma.ch && sdd->tx_dma.ch) { 831 831 use_dma = 1; 832 - 833 832 } else if (xfer->len >= fifo_len) { 834 833 tx_buf = xfer->tx_buf; 835 834 rx_buf = xfer->rx_buf;
+5 -9
drivers/thermal/gov_power_allocator.c
··· 606 606 607 607 /* There might be no cooling devices yet. */ 608 608 if (!num_actors) { 609 - ret = -EINVAL; 609 + ret = 0; 610 610 goto clean_state; 611 611 } 612 612 ··· 679 679 return -ENOMEM; 680 680 681 681 get_governor_trips(tz, params); 682 - if (!params->trip_max) { 683 - dev_warn(&tz->device, "power_allocator: missing trip_max\n"); 684 - kfree(params); 685 - return -EINVAL; 686 - } 687 682 688 683 ret = check_power_actors(tz, params); 689 684 if (ret < 0) { ··· 709 714 else 710 715 params->sustainable_power = tz->tzp->sustainable_power; 711 716 712 - estimate_pid_constants(tz, tz->tzp->sustainable_power, 713 - params->trip_switch_on, 714 - params->trip_max->temperature); 717 + if (params->trip_max) 718 + estimate_pid_constants(tz, tz->tzp->sustainable_power, 719 + params->trip_switch_on, 720 + params->trip_max->temperature); 715 721 716 722 reset_pid_controller(params); 717 723
+7 -2
drivers/ufs/core/ufshcd.c
··· 3217 3217 3218 3218 /* MCQ mode */ 3219 3219 if (is_mcq_enabled(hba)) { 3220 - err = ufshcd_clear_cmd(hba, lrbp->task_tag); 3220 + /* successfully cleared the command, retry if needed */ 3221 + if (ufshcd_clear_cmd(hba, lrbp->task_tag) == 0) 3222 + err = -EAGAIN; 3221 3223 hba->dev_cmd.complete = NULL; 3222 3224 return err; 3223 3225 } ··· 9793 9791 9794 9792 /* UFS device & link must be active before we enter in this function */ 9795 9793 if (!ufshcd_is_ufs_dev_active(hba) || !ufshcd_is_link_active(hba)) { 9796 - ret = -EINVAL; 9794 + /* Wait err handler finish or trigger err recovery */ 9795 + if (!ufshcd_eh_in_progress(hba)) 9796 + ufshcd_force_error_recovery(hba); 9797 + ret = -EBUSY; 9797 9798 goto enable_scaling; 9798 9799 } 9799 9800
+1 -1
fs/aio.c
··· 1202 1202 spin_lock_irqsave(&ctx->wait.lock, flags); 1203 1203 list_for_each_entry_safe(curr, next, &ctx->wait.head, w.entry) 1204 1204 if (avail >= curr->min_nr) { 1205 - list_del_init_careful(&curr->w.entry); 1206 1205 wake_up_process(curr->w.private); 1206 + list_del_init_careful(&curr->w.entry); 1207 1207 } 1208 1208 spin_unlock_irqrestore(&ctx->wait.lock, flags); 1209 1209 }
+3
fs/bcachefs/Makefile
··· 17 17 btree_journal_iter.o \ 18 18 btree_key_cache.o \ 19 19 btree_locking.o \ 20 + btree_node_scan.o \ 20 21 btree_trans_commit.o \ 21 22 btree_update.o \ 22 23 btree_update_interior.o \ ··· 38 37 error.o \ 39 38 extents.o \ 40 39 extent_update.o \ 40 + eytzinger.o \ 41 41 fs.o \ 42 42 fs-common.o \ 43 43 fs-ioctl.o \ ··· 69 67 quota.o \ 70 68 rebalance.o \ 71 69 recovery.o \ 70 + recovery_passes.o \ 72 71 reflink.o \ 73 72 replicas.o \ 74 73 sb-clean.o \
+26 -21
fs/bcachefs/alloc_background.c
··· 1713 1713 if (ret) 1714 1714 goto out; 1715 1715 1716 - if (BCH_ALLOC_V4_NEED_INC_GEN(&a->v)) { 1717 - a->v.gen++; 1718 - SET_BCH_ALLOC_V4_NEED_INC_GEN(&a->v, false); 1719 - goto write; 1720 - } 1721 - 1722 - if (a->v.journal_seq > c->journal.flushed_seq_ondisk) { 1723 - if (c->curr_recovery_pass > BCH_RECOVERY_PASS_check_alloc_info) { 1724 - bch2_trans_inconsistent(trans, 1725 - "clearing need_discard but journal_seq %llu > flushed_seq %llu\n" 1726 - "%s", 1727 - a->v.journal_seq, 1728 - c->journal.flushed_seq_ondisk, 1729 - (bch2_bkey_val_to_text(&buf, c, k), buf.buf)); 1716 + if (a->v.dirty_sectors) { 1717 + if (bch2_trans_inconsistent_on(c->curr_recovery_pass > BCH_RECOVERY_PASS_check_alloc_info, 1718 + trans, "attempting to discard bucket with dirty data\n%s", 1719 + (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) 1730 1720 ret = -EIO; 1731 - } 1732 1721 goto out; 1733 1722 } 1734 1723 1735 1724 if (a->v.data_type != BCH_DATA_need_discard) { 1736 - if (c->curr_recovery_pass > BCH_RECOVERY_PASS_check_alloc_info) { 1737 - bch2_trans_inconsistent(trans, 1738 - "bucket incorrectly set in need_discard btree\n" 1739 - "%s", 1740 - (bch2_bkey_val_to_text(&buf, c, k), buf.buf)); 1741 - ret = -EIO; 1725 + if (data_type_is_empty(a->v.data_type) && 1726 + BCH_ALLOC_V4_NEED_INC_GEN(&a->v)) { 1727 + a->v.gen++; 1728 + SET_BCH_ALLOC_V4_NEED_INC_GEN(&a->v, false); 1729 + goto write; 1742 1730 } 1743 1731 1732 + if (bch2_trans_inconsistent_on(c->curr_recovery_pass > BCH_RECOVERY_PASS_check_alloc_info, 1733 + trans, "bucket incorrectly set in need_discard btree\n" 1734 + "%s", 1735 + (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) 1736 + ret = -EIO; 1737 + goto out; 1738 + } 1739 + 1740 + if (a->v.journal_seq > c->journal.flushed_seq_ondisk) { 1741 + if (bch2_trans_inconsistent_on(c->curr_recovery_pass > BCH_RECOVERY_PASS_check_alloc_info, 1742 + trans, "clearing need_discard but journal_seq %llu > flushed_seq %llu\n%s", 1743 + a->v.journal_seq, 1744 + c->journal.flushed_seq_ondisk, 1745 + (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) 1746 + ret = -EIO; 1744 1747 goto out; 1745 1748 } 1746 1749 ··· 1838 1835 if (ret) 1839 1836 goto err; 1840 1837 1838 + BUG_ON(a->v.dirty_sectors); 1841 1839 SET_BCH_ALLOC_V4_NEED_DISCARD(&a->v, false); 1842 1840 a->v.data_type = alloc_data_type(a->v, a->v.data_type); 1843 1841 ··· 1946 1942 goto out; 1947 1943 1948 1944 BUG_ON(a->v.data_type != BCH_DATA_cached); 1945 + BUG_ON(a->v.dirty_sectors); 1949 1946 1950 1947 if (!a->v.cached_sectors) 1951 1948 bch_err(c, "invalidating empty bucket, confused");
+3 -1
fs/bcachefs/alloc_foreground.c
··· 188 188 static inline unsigned open_buckets_reserved(enum bch_watermark watermark) 189 189 { 190 190 switch (watermark) { 191 - case BCH_WATERMARK_reclaim: 191 + case BCH_WATERMARK_interior_updates: 192 192 return 0; 193 + case BCH_WATERMARK_reclaim: 194 + return OPEN_BUCKETS_COUNT / 6; 193 195 case BCH_WATERMARK_btree: 194 196 case BCH_WATERMARK_btree_copygc: 195 197 return OPEN_BUCKETS_COUNT / 4;
+2 -1
fs/bcachefs/alloc_types.h
··· 22 22 x(copygc) \ 23 23 x(btree) \ 24 24 x(btree_copygc) \ 25 - x(reclaim) 25 + x(reclaim) \ 26 + x(interior_updates) 26 27 27 28 enum bch_watermark { 28 29 #define x(name) BCH_WATERMARK_##name,
+174 -12
fs/bcachefs/backpointers.c
··· 8 8 #include "btree_update.h" 9 9 #include "btree_update_interior.h" 10 10 #include "btree_write_buffer.h" 11 + #include "checksum.h" 11 12 #include "error.h" 12 13 13 14 #include <linux/mm.h> ··· 30 29 if (p.ptr.cached) 31 30 continue; 32 31 33 - bch2_extent_ptr_to_bp(c, btree_id, level, k, p, 34 - &bucket2, &bp2); 32 + bch2_extent_ptr_to_bp(c, btree_id, level, k, p, entry, &bucket2, &bp2); 35 33 if (bpos_eq(bucket, bucket2) && 36 34 !memcmp(&bp, &bp2, sizeof(bp))) 37 35 return true; ··· 44 44 struct printbuf *err) 45 45 { 46 46 struct bkey_s_c_backpointer bp = bkey_s_c_to_backpointer(k); 47 + 48 + /* these will be caught by fsck */ 49 + if (!bch2_dev_exists2(c, bp.k->p.inode)) 50 + return 0; 51 + 47 52 struct bpos bucket = bp_pos_to_bucket(c, bp.k->p); 48 53 int ret = 0; 49 54 ··· 383 378 backpointer_to_missing_alloc, 384 379 "backpointer for nonexistent alloc key: %llu:%llu:0\n%s", 385 380 alloc_iter.pos.inode, alloc_iter.pos.offset, 386 - (bch2_bkey_val_to_text(&buf, c, alloc_k), buf.buf))) { 381 + (bch2_bkey_val_to_text(&buf, c, k), buf.buf))) { 387 382 ret = bch2_btree_delete_at(trans, bp_iter, 0); 388 383 goto out; 389 384 } ··· 419 414 struct bkey_buf last_flushed; 420 415 }; 421 416 417 + static int drop_dev_and_update(struct btree_trans *trans, enum btree_id btree, 418 + struct bkey_s_c extent, unsigned dev) 419 + { 420 + struct bkey_i *n = bch2_bkey_make_mut_noupdate(trans, extent); 421 + int ret = PTR_ERR_OR_ZERO(n); 422 + if (ret) 423 + return ret; 424 + 425 + bch2_bkey_drop_device(bkey_i_to_s(n), dev); 426 + return bch2_btree_insert_trans(trans, btree, n, 0); 427 + } 428 + 429 + static int check_extent_checksum(struct btree_trans *trans, 430 + enum btree_id btree, struct bkey_s_c extent, 431 + enum btree_id o_btree, struct bkey_s_c extent2, unsigned dev) 432 + { 433 + struct bch_fs *c = trans->c; 434 + struct bkey_ptrs_c ptrs = bch2_bkey_ptrs_c(extent); 435 + const union bch_extent_entry *entry; 436 + struct extent_ptr_decoded p; 437 + struct printbuf buf = PRINTBUF; 438 + void *data_buf = NULL; 439 + struct bio *bio = NULL; 440 + size_t bytes; 441 + int ret = 0; 442 + 443 + if (bkey_is_btree_ptr(extent.k)) 444 + return false; 445 + 446 + bkey_for_each_ptr_decode(extent.k, ptrs, p, entry) 447 + if (p.ptr.dev == dev) 448 + goto found; 449 + BUG(); 450 + found: 451 + if (!p.crc.csum_type) 452 + return false; 453 + 454 + bytes = p.crc.compressed_size << 9; 455 + 456 + struct bch_dev *ca = bch_dev_bkey_exists(c, dev); 457 + if (!bch2_dev_get_ioref(ca, READ)) 458 + return false; 459 + 460 + data_buf = kvmalloc(bytes, GFP_KERNEL); 461 + if (!data_buf) { 462 + ret = -ENOMEM; 463 + goto err; 464 + } 465 + 466 + bio = bio_alloc(ca->disk_sb.bdev, 1, REQ_OP_READ, GFP_KERNEL); 467 + bio->bi_iter.bi_sector = p.ptr.offset; 468 + bch2_bio_map(bio, data_buf, bytes); 469 + ret = submit_bio_wait(bio); 470 + if (ret) 471 + goto err; 472 + 473 + prt_str(&buf, "extents pointing to same space, but first extent checksum bad:"); 474 + prt_printf(&buf, "\n %s ", bch2_btree_id_str(btree)); 475 + bch2_bkey_val_to_text(&buf, c, extent); 476 + prt_printf(&buf, "\n %s ", bch2_btree_id_str(o_btree)); 477 + bch2_bkey_val_to_text(&buf, c, extent2); 478 + 479 + struct nonce nonce = extent_nonce(extent.k->version, p.crc); 480 + struct bch_csum csum = bch2_checksum(c, p.crc.csum_type, nonce, data_buf, bytes); 481 + if (fsck_err_on(bch2_crc_cmp(csum, p.crc.csum), 482 + c, dup_backpointer_to_bad_csum_extent, 483 + "%s", buf.buf)) 484 + ret = drop_dev_and_update(trans, btree, extent, dev) ?: 1; 485 + fsck_err: 
486 + err: 487 + if (bio) 488 + bio_put(bio); 489 + kvfree(data_buf); 490 + percpu_ref_put(&ca->io_ref); 491 + printbuf_exit(&buf); 492 + return ret; 493 + } 494 + 422 495 static int check_bp_exists(struct btree_trans *trans, 423 496 struct extents_to_bp_state *s, 424 497 struct bpos bucket, ··· 504 421 struct bkey_s_c orig_k) 505 422 { 506 423 struct bch_fs *c = trans->c; 507 - struct btree_iter bp_iter = { NULL }; 424 + struct btree_iter bp_iter = {}; 425 + struct btree_iter other_extent_iter = {}; 508 426 struct printbuf buf = PRINTBUF; 509 427 struct bkey_s_c bp_k; 510 428 struct bkey_buf tmp; ··· 513 429 514 430 bch2_bkey_buf_init(&tmp); 515 431 432 + if (!bch2_dev_bucket_exists(c, bucket)) { 433 + prt_str(&buf, "extent for nonexistent device:bucket "); 434 + bch2_bpos_to_text(&buf, bucket); 435 + prt_str(&buf, "\n "); 436 + bch2_bkey_val_to_text(&buf, c, orig_k); 437 + bch_err(c, "%s", buf.buf); 438 + return -BCH_ERR_fsck_repair_unimplemented; 439 + } 440 + 516 441 if (bpos_lt(bucket, s->bucket_start) || 517 442 bpos_gt(bucket, s->bucket_end)) 518 443 return 0; 519 - 520 - if (!bch2_dev_bucket_exists(c, bucket)) 521 - goto missing; 522 444 523 445 bp_k = bch2_bkey_get_iter(trans, &bp_iter, BTREE_ID_backpointers, 524 446 bucket_pos_to_bp(c, bucket, bp.bucket_offset), ··· 551 461 ret = -BCH_ERR_transaction_restart_write_buffer_flush; 552 462 goto out; 553 463 } 554 - goto missing; 464 + 465 + goto check_existing_bp; 555 466 } 556 467 out: 557 468 err: 558 469 fsck_err: 470 + bch2_trans_iter_exit(trans, &other_extent_iter); 559 471 bch2_trans_iter_exit(trans, &bp_iter); 560 472 bch2_bkey_buf_exit(&tmp, c); 561 473 printbuf_exit(&buf); 562 474 return ret; 475 + check_existing_bp: 476 + /* Do we have a backpointer for a different extent? */ 477 + if (bp_k.k->type != KEY_TYPE_backpointer) 478 + goto missing; 479 + 480 + struct bch_backpointer other_bp = *bkey_s_c_to_backpointer(bp_k).v; 481 + 482 + struct bkey_s_c other_extent = 483 + bch2_backpointer_get_key(trans, &other_extent_iter, bp_k.k->p, other_bp, 0); 484 + ret = bkey_err(other_extent); 485 + if (ret == -BCH_ERR_backpointer_to_overwritten_btree_node) 486 + ret = 0; 487 + if (ret) 488 + goto err; 489 + 490 + if (!other_extent.k) 491 + goto missing; 492 + 493 + if (bch2_extents_match(orig_k, other_extent)) { 494 + printbuf_reset(&buf); 495 + prt_printf(&buf, "duplicate versions of same extent, deleting smaller\n "); 496 + bch2_bkey_val_to_text(&buf, c, orig_k); 497 + prt_str(&buf, "\n "); 498 + bch2_bkey_val_to_text(&buf, c, other_extent); 499 + bch_err(c, "%s", buf.buf); 500 + 501 + if (other_extent.k->size <= orig_k.k->size) { 502 + ret = drop_dev_and_update(trans, other_bp.btree_id, other_extent, bucket.inode); 503 + if (ret) 504 + goto err; 505 + goto out; 506 + } else { 507 + ret = drop_dev_and_update(trans, bp.btree_id, orig_k, bucket.inode); 508 + if (ret) 509 + goto err; 510 + goto missing; 511 + } 512 + } 513 + 514 + ret = check_extent_checksum(trans, other_bp.btree_id, other_extent, bp.btree_id, orig_k, bucket.inode); 515 + if (ret < 0) 516 + goto err; 517 + if (ret) { 518 + ret = 0; 519 + goto missing; 520 + } 521 + 522 + ret = check_extent_checksum(trans, bp.btree_id, orig_k, other_bp.btree_id, other_extent, bucket.inode); 523 + if (ret < 0) 524 + goto err; 525 + if (ret) { 526 + ret = 0; 527 + goto out; 528 + } 529 + 530 + printbuf_reset(&buf); 531 + prt_printf(&buf, "duplicate extents pointing to same space on dev %llu\n ", bucket.inode); 532 + bch2_bkey_val_to_text(&buf, c, orig_k); 533 + prt_str(&buf, "\n "); 534 + 
bch2_bkey_val_to_text(&buf, c, other_extent); 535 + bch_err(c, "%s", buf.buf); 536 + ret = -BCH_ERR_fsck_repair_unimplemented; 537 + goto err; 563 538 missing: 539 + printbuf_reset(&buf); 564 540 prt_printf(&buf, "missing backpointer for btree=%s l=%u ", 565 541 bch2_btree_id_str(bp.btree_id), bp.level); 566 542 bch2_bkey_val_to_text(&buf, c, orig_k); 567 - prt_printf(&buf, "\nbp pos "); 568 - bch2_bpos_to_text(&buf, bp_iter.pos); 543 + prt_printf(&buf, "\n got: "); 544 + bch2_bkey_val_to_text(&buf, c, bp_k); 545 + 546 + struct bkey_i_backpointer n_bp_k; 547 + bkey_backpointer_init(&n_bp_k.k_i); 548 + n_bp_k.k.p = bucket_pos_to_bp(trans->c, bucket, bp.bucket_offset); 549 + n_bp_k.v = bp; 550 + prt_printf(&buf, "\n want: "); 551 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&n_bp_k.k_i)); 569 552 570 553 if (fsck_err(c, ptr_to_missing_backpointer, "%s", buf.buf)) 571 554 ret = bch2_bucket_backpointer_mod(trans, bucket, bp, orig_k, true); ··· 665 502 if (p.ptr.cached) 666 503 continue; 667 504 668 - bch2_extent_ptr_to_bp(c, btree, level, 669 - k, p, &bucket_pos, &bp); 505 + bch2_extent_ptr_to_bp(c, btree, level, k, p, entry, &bucket_pos, &bp); 670 506 671 507 ret = check_bp_exists(trans, s, bucket_pos, bp, k); 672 508 if (ret)
+26 -6
fs/bcachefs/backpointers.h
··· 90 90 return bch2_trans_update_buffered(trans, BTREE_ID_backpointers, &bp_k.k_i); 91 91 } 92 92 93 - static inline enum bch_data_type bkey_ptr_data_type(enum btree_id btree_id, unsigned level, 94 - struct bkey_s_c k, struct extent_ptr_decoded p) 93 + static inline enum bch_data_type bch2_bkey_ptr_data_type(struct bkey_s_c k, 94 + struct extent_ptr_decoded p, 95 + const union bch_extent_entry *entry) 95 96 { 96 - return level ? BCH_DATA_btree : 97 - p.has_ec ? BCH_DATA_stripe : 98 - BCH_DATA_user; 97 + switch (k.k->type) { 98 + case KEY_TYPE_btree_ptr: 99 + case KEY_TYPE_btree_ptr_v2: 100 + return BCH_DATA_btree; 101 + case KEY_TYPE_extent: 102 + case KEY_TYPE_reflink_v: 103 + return p.has_ec ? BCH_DATA_stripe : BCH_DATA_user; 104 + case KEY_TYPE_stripe: { 105 + const struct bch_extent_ptr *ptr = &entry->ptr; 106 + struct bkey_s_c_stripe s = bkey_s_c_to_stripe(k); 107 + 108 + BUG_ON(ptr < s.v->ptrs || 109 + ptr >= s.v->ptrs + s.v->nr_blocks); 110 + 111 + return ptr >= s.v->ptrs + s.v->nr_blocks - s.v->nr_redundant 112 + ? BCH_DATA_parity 113 + : BCH_DATA_user; 114 + } 115 + default: 116 + BUG(); 117 + } 99 118 } 100 119 101 120 static inline void bch2_extent_ptr_to_bp(struct bch_fs *c, 102 121 enum btree_id btree_id, unsigned level, 103 122 struct bkey_s_c k, struct extent_ptr_decoded p, 123 + const union bch_extent_entry *entry, 104 124 struct bpos *bucket_pos, struct bch_backpointer *bp) 105 125 { 106 - enum bch_data_type data_type = bkey_ptr_data_type(btree_id, level, k, p); 126 + enum bch_data_type data_type = bch2_bkey_ptr_data_type(k, p, entry); 107 127 s64 sectors = level ? btree_sectors(c) : k.k->size; 108 128 u32 bucket_offset; 109 129
+6 -2
fs/bcachefs/bcachefs.h
··· 209 209 #include "fifo.h" 210 210 #include "nocow_locking_types.h" 211 211 #include "opts.h" 212 - #include "recovery_types.h" 212 + #include "recovery_passes_types.h" 213 213 #include "sb-errors_types.h" 214 214 #include "seqmutex.h" 215 215 #include "time_stats.h" ··· 456 456 457 457 #include "alloc_types.h" 458 458 #include "btree_types.h" 459 + #include "btree_node_scan_types.h" 459 460 #include "btree_write_buffer_types.h" 460 461 #include "buckets_types.h" 461 462 #include "buckets_waiting_for_journal_types.h" ··· 615 614 */ 616 615 617 616 #define BCH_FS_FLAGS() \ 617 + x(new_fs) \ 618 618 x(started) \ 619 619 x(may_go_rw) \ 620 620 x(rw) \ ··· 798 796 u64 features; 799 797 u64 compat; 800 798 unsigned long errors_silent[BITS_TO_LONGS(BCH_SB_ERR_MAX)]; 799 + u64 btrees_lost_data; 801 800 } sb; 802 801 803 802 ··· 813 810 814 811 /* snapshot.c: */ 815 812 struct snapshot_table __rcu *snapshots; 816 - size_t snapshot_table_size; 817 813 struct mutex snapshot_table_lock; 818 814 struct rw_semaphore snapshot_create_lock; 819 815 ··· 1105 1103 u64 journal_entries_base_seq; 1106 1104 struct journal_keys journal_keys; 1107 1105 struct list_head journal_iters; 1106 + 1107 + struct find_btree_nodes found_btree_nodes; 1108 1108 1109 1109 u64 last_bucket_seq_cleanup; 1110 1110
+1
fs/bcachefs/bcachefs_format.h
··· 818 818 struct bch_sb_field field; 819 819 __le64 recovery_passes_required[2]; 820 820 __le64 errors_silent[8]; 821 + __le64 btrees_lost_data; 821 822 }; 822 823 823 824 struct bch_sb_field_downgrade_entry {
+10 -4
fs/bcachefs/bset.c
··· 134 134 printbuf_exit(&buf); 135 135 } 136 136 137 - #ifdef CONFIG_BCACHEFS_DEBUG 138 - 139 - void __bch2_verify_btree_nr_keys(struct btree *b) 137 + struct btree_nr_keys bch2_btree_node_count_keys(struct btree *b) 140 138 { 141 139 struct bset_tree *t; 142 140 struct bkey_packed *k; 143 - struct btree_nr_keys nr = { 0 }; 141 + struct btree_nr_keys nr = {}; 144 142 145 143 for_each_bset(b, t) 146 144 bset_tree_for_each_key(b, t, k) 147 145 if (!bkey_deleted(k)) 148 146 btree_keys_account_key_add(&nr, t - b->set, k); 147 + return nr; 148 + } 149 + 150 + #ifdef CONFIG_BCACHEFS_DEBUG 151 + 152 + void __bch2_verify_btree_nr_keys(struct btree *b) 153 + { 154 + struct btree_nr_keys nr = bch2_btree_node_count_keys(b); 149 155 150 156 BUG_ON(memcmp(&nr, &b->nr, sizeof(nr))); 151 157 }
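bch2_btree_node_count_keys() factors the key-counting loop out of the debug-only verifier so other callers (the read path in btree_io.c below) can recompute a node's key counts. A minimal model of the same pattern, using a simplified fake_key type, follows.

    /* Minimal model of the factoring above: one function counts live keys,
     * and the debug check just compares against the cached totals.
     * Types and names here are simplified stand-ins. */
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct fake_key { bool deleted; };

    static size_t count_live_keys(const struct fake_key *keys, size_t nr)
    {
    	size_t live = 0;

    	for (size_t i = 0; i < nr; i++)
    		if (!keys[i].deleted)
    			live++;
    	return live;
    }

    int main(void)
    {
    	struct fake_key keys[] = { {false}, {true}, {false}, {false} };
    	size_t cached_nr = 3;	/* what the node claims to contain */

    	/* the debug verifier reduces to a recount + compare */
    	assert(count_live_keys(keys, 4) == cached_nr);
    	return 0;
    }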
+2
fs/bcachefs/bset.h
··· 458 458 459 459 /* Accounting: */ 460 460 461 + struct btree_nr_keys bch2_btree_node_count_keys(struct btree *); 462 + 461 463 static inline void btree_keys_account_key(struct btree_nr_keys *n, 462 464 unsigned bset, 463 465 struct bkey_packed *k,
+4 -1
fs/bcachefs/btree_cache.c
··· 808 808 prt_printf(&buf, "\nmax "); 809 809 bch2_bpos_to_text(&buf, b->data->max_key); 810 810 811 - bch2_fs_inconsistent(c, "%s", buf.buf); 811 + bch2_fs_topology_error(c, "%s", buf.buf); 812 + 812 813 printbuf_exit(&buf); 813 814 } 814 815 ··· 1135 1134 b = btree_cache_find(bc, k); 1136 1135 if (!b) 1137 1136 return; 1137 + 1138 + BUG_ON(b == btree_node_root(trans->c, b)); 1138 1139 wait_on_io: 1139 1140 /* not allowed to wait on io with btree locks held: */ 1140 1141
+246 -260
fs/bcachefs/btree_gc.c
··· 7 7 #include "bcachefs.h" 8 8 #include "alloc_background.h" 9 9 #include "alloc_foreground.h" 10 + #include "backpointers.h" 10 11 #include "bkey_methods.h" 11 12 #include "bkey_buf.h" 12 13 #include "btree_journal_iter.h" 13 14 #include "btree_key_cache.h" 14 15 #include "btree_locking.h" 16 + #include "btree_node_scan.h" 15 17 #include "btree_update_interior.h" 16 18 #include "btree_io.h" 17 19 #include "btree_gc.h" ··· 26 24 #include "journal.h" 27 25 #include "keylist.h" 28 26 #include "move.h" 29 - #include "recovery.h" 27 + #include "recovery_passes.h" 30 28 #include "reflink.h" 31 29 #include "replicas.h" 32 30 #include "super-io.h" ··· 42 40 43 41 #define DROP_THIS_NODE 10 44 42 #define DROP_PREV_NODE 11 43 + #define DID_FILL_FROM_SCAN 12 45 44 46 45 static struct bkey_s unsafe_bkey_s_c_to_s(struct bkey_s_c k) 47 46 { ··· 71 68 { 72 69 BUG_ON(gc_pos_cmp(new_pos, c->gc_pos) <= 0); 73 70 __gc_pos_set(c, new_pos); 74 - } 75 - 76 - /* 77 - * Missing: if an interior btree node is empty, we need to do something - 78 - * perhaps just kill it 79 - */ 80 - static int bch2_gc_check_topology(struct bch_fs *c, 81 - struct btree *b, 82 - struct bkey_buf *prev, 83 - struct bkey_buf cur, 84 - bool is_last) 85 - { 86 - struct bpos node_start = b->data->min_key; 87 - struct bpos node_end = b->data->max_key; 88 - struct bpos expected_start = bkey_deleted(&prev->k->k) 89 - ? node_start 90 - : bpos_successor(prev->k->k.p); 91 - struct printbuf buf1 = PRINTBUF, buf2 = PRINTBUF; 92 - int ret = 0; 93 - 94 - if (cur.k->k.type == KEY_TYPE_btree_ptr_v2) { 95 - struct bkey_i_btree_ptr_v2 *bp = bkey_i_to_btree_ptr_v2(cur.k); 96 - 97 - if (!bpos_eq(expected_start, bp->v.min_key)) { 98 - bch2_topology_error(c); 99 - 100 - if (bkey_deleted(&prev->k->k)) { 101 - prt_printf(&buf1, "start of node: "); 102 - bch2_bpos_to_text(&buf1, node_start); 103 - } else { 104 - bch2_bkey_val_to_text(&buf1, c, bkey_i_to_s_c(prev->k)); 105 - } 106 - bch2_bkey_val_to_text(&buf2, c, bkey_i_to_s_c(cur.k)); 107 - 108 - if (__fsck_err(c, 109 - FSCK_CAN_FIX| 110 - FSCK_CAN_IGNORE| 111 - FSCK_NO_RATELIMIT, 112 - btree_node_topology_bad_min_key, 113 - "btree node with incorrect min_key at btree %s level %u:\n" 114 - " prev %s\n" 115 - " cur %s", 116 - bch2_btree_id_str(b->c.btree_id), b->c.level, 117 - buf1.buf, buf2.buf) && should_restart_for_topology_repair(c)) { 118 - bch_info(c, "Halting mark and sweep to start topology repair pass"); 119 - ret = bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_check_topology); 120 - goto err; 121 - } else { 122 - set_bit(BCH_FS_initial_gc_unfixed, &c->flags); 123 - } 124 - } 125 - } 126 - 127 - if (is_last && !bpos_eq(cur.k->k.p, node_end)) { 128 - bch2_topology_error(c); 129 - 130 - printbuf_reset(&buf1); 131 - printbuf_reset(&buf2); 132 - 133 - bch2_bkey_val_to_text(&buf1, c, bkey_i_to_s_c(cur.k)); 134 - bch2_bpos_to_text(&buf2, node_end); 135 - 136 - if (__fsck_err(c, FSCK_CAN_FIX|FSCK_CAN_IGNORE|FSCK_NO_RATELIMIT, 137 - btree_node_topology_bad_max_key, 138 - "btree node with incorrect max_key at btree %s level %u:\n" 139 - " %s\n" 140 - " expected %s", 141 - bch2_btree_id_str(b->c.btree_id), b->c.level, 142 - buf1.buf, buf2.buf) && 143 - should_restart_for_topology_repair(c)) { 144 - bch_info(c, "Halting mark and sweep to start topology repair pass"); 145 - ret = bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_check_topology); 146 - goto err; 147 - } else { 148 - set_bit(BCH_FS_initial_gc_unfixed, &c->flags); 149 - } 150 - } 151 - 152 - bch2_bkey_buf_copy(prev, c, cur.k); 153 - err: 
154 - fsck_err: 155 - printbuf_exit(&buf2); 156 - printbuf_exit(&buf1); 157 - return ret; 158 71 } 159 72 160 73 static void btree_ptr_to_v2(struct btree *b, struct bkey_i_btree_ptr_v2 *dst) ··· 131 212 struct bkey_i_btree_ptr_v2 *new; 132 213 int ret; 133 214 215 + if (c->opts.verbose) { 216 + struct printbuf buf = PRINTBUF; 217 + 218 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&b->key)); 219 + prt_str(&buf, " -> "); 220 + bch2_bpos_to_text(&buf, new_min); 221 + 222 + bch_info(c, "%s(): %s", __func__, buf.buf); 223 + printbuf_exit(&buf); 224 + } 225 + 134 226 new = kmalloc_array(BKEY_BTREE_PTR_U64s_MAX, sizeof(u64), GFP_KERNEL); 135 227 if (!new) 136 228 return -BCH_ERR_ENOMEM_gc_repair_key; ··· 166 236 { 167 237 struct bkey_i_btree_ptr_v2 *new; 168 238 int ret; 239 + 240 + if (c->opts.verbose) { 241 + struct printbuf buf = PRINTBUF; 242 + 243 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&b->key)); 244 + prt_str(&buf, " -> "); 245 + bch2_bpos_to_text(&buf, new_max); 246 + 247 + bch_info(c, "%s(): %s", __func__, buf.buf); 248 + printbuf_exit(&buf); 249 + } 169 250 170 251 ret = bch2_journal_key_delete(c, b->c.btree_id, b->c.level + 1, b->key.k.p); 171 252 if (ret) ··· 209 268 return 0; 210 269 } 211 270 212 - static int btree_repair_node_boundaries(struct bch_fs *c, struct btree *b, 213 - struct btree *prev, struct btree *cur) 271 + static int btree_check_node_boundaries(struct bch_fs *c, struct btree *b, 272 + struct btree *prev, struct btree *cur, 273 + struct bpos *pulled_from_scan) 214 274 { 215 275 struct bpos expected_start = !prev 216 276 ? b->data->min_key 217 277 : bpos_successor(prev->key.k.p); 218 - struct printbuf buf1 = PRINTBUF, buf2 = PRINTBUF; 278 + struct printbuf buf = PRINTBUF; 219 279 int ret = 0; 220 280 221 - if (!prev) { 222 - prt_printf(&buf1, "start of node: "); 223 - bch2_bpos_to_text(&buf1, b->data->min_key); 224 - } else { 225 - bch2_bkey_val_to_text(&buf1, c, bkey_i_to_s_c(&prev->key)); 281 + BUG_ON(b->key.k.type == KEY_TYPE_btree_ptr_v2 && 282 + !bpos_eq(bkey_i_to_btree_ptr_v2(&b->key)->v.min_key, 283 + b->data->min_key)); 284 + 285 + if (bpos_eq(expected_start, cur->data->min_key)) 286 + return 0; 287 + 288 + prt_printf(&buf, " at btree %s level %u:\n parent: ", 289 + bch2_btree_id_str(b->c.btree_id), b->c.level); 290 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&b->key)); 291 + 292 + if (prev) { 293 + prt_printf(&buf, "\n prev: "); 294 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&prev->key)); 226 295 } 227 296 228 - bch2_bkey_val_to_text(&buf2, c, bkey_i_to_s_c(&cur->key)); 297 + prt_str(&buf, "\n next: "); 298 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&cur->key)); 229 299 230 - if (prev && 231 - bpos_gt(expected_start, cur->data->min_key) && 232 - BTREE_NODE_SEQ(cur->data) > BTREE_NODE_SEQ(prev->data)) { 233 - /* cur overwrites prev: */ 300 + if (bpos_lt(expected_start, cur->data->min_key)) { /* gap */ 301 + if (b->c.level == 1 && 302 + bpos_lt(*pulled_from_scan, cur->data->min_key)) { 303 + ret = bch2_get_scanned_nodes(c, b->c.btree_id, 0, 304 + expected_start, 305 + bpos_predecessor(cur->data->min_key)); 306 + if (ret) 307 + goto err; 234 308 235 - if (mustfix_fsck_err_on(bpos_ge(prev->data->min_key, 236 - cur->data->min_key), c, 237 - btree_node_topology_overwritten_by_next_node, 238 - "btree node overwritten by next node at btree %s level %u:\n" 239 - " node %s\n" 240 - " next %s", 241 - bch2_btree_id_str(b->c.btree_id), b->c.level, 242 - buf1.buf, buf2.buf)) { 243 - ret = DROP_PREV_NODE; 244 - goto out; 309 + *pulled_from_scan = 
cur->data->min_key; 310 + ret = DID_FILL_FROM_SCAN; 311 + } else { 312 + if (mustfix_fsck_err(c, btree_node_topology_bad_min_key, 313 + "btree node with incorrect min_key%s", buf.buf)) 314 + ret = set_node_min(c, cur, expected_start); 245 315 } 246 - 247 - if (mustfix_fsck_err_on(!bpos_eq(prev->key.k.p, 248 - bpos_predecessor(cur->data->min_key)), c, 249 - btree_node_topology_bad_max_key, 250 - "btree node with incorrect max_key at btree %s level %u:\n" 251 - " node %s\n" 252 - " next %s", 253 - bch2_btree_id_str(b->c.btree_id), b->c.level, 254 - buf1.buf, buf2.buf)) 255 - ret = set_node_max(c, prev, 256 - bpos_predecessor(cur->data->min_key)); 257 - } else { 258 - /* prev overwrites cur: */ 259 - 260 - if (mustfix_fsck_err_on(bpos_ge(expected_start, 261 - cur->data->max_key), c, 262 - btree_node_topology_overwritten_by_prev_node, 263 - "btree node overwritten by prev node at btree %s level %u:\n" 264 - " prev %s\n" 265 - " node %s", 266 - bch2_btree_id_str(b->c.btree_id), b->c.level, 267 - buf1.buf, buf2.buf)) { 268 - ret = DROP_THIS_NODE; 269 - goto out; 316 + } else { /* overlap */ 317 + if (prev && BTREE_NODE_SEQ(cur->data) > BTREE_NODE_SEQ(prev->data)) { /* cur overwrites prev */ 318 + if (bpos_ge(prev->data->min_key, cur->data->min_key)) { /* fully? */ 319 + if (mustfix_fsck_err(c, btree_node_topology_overwritten_by_next_node, 320 + "btree node overwritten by next node%s", buf.buf)) 321 + ret = DROP_PREV_NODE; 322 + } else { 323 + if (mustfix_fsck_err(c, btree_node_topology_bad_max_key, 324 + "btree node with incorrect max_key%s", buf.buf)) 325 + ret = set_node_max(c, prev, 326 + bpos_predecessor(cur->data->min_key)); 327 + } 328 + } else { 329 + if (bpos_ge(expected_start, cur->data->max_key)) { /* fully? */ 330 + if (mustfix_fsck_err(c, btree_node_topology_overwritten_by_prev_node, 331 + "btree node overwritten by prev node%s", buf.buf)) 332 + ret = DROP_THIS_NODE; 333 + } else { 334 + if (mustfix_fsck_err(c, btree_node_topology_bad_min_key, 335 + "btree node with incorrect min_key%s", buf.buf)) 336 + ret = set_node_min(c, cur, expected_start); 337 + } 270 338 } 271 - 272 - if (mustfix_fsck_err_on(!bpos_eq(expected_start, cur->data->min_key), c, 273 - btree_node_topology_bad_min_key, 274 - "btree node with incorrect min_key at btree %s level %u:\n" 275 - " prev %s\n" 276 - " node %s", 277 - bch2_btree_id_str(b->c.btree_id), b->c.level, 278 - buf1.buf, buf2.buf)) 279 - ret = set_node_min(c, cur, expected_start); 280 339 } 281 - out: 340 + err: 282 341 fsck_err: 283 - printbuf_exit(&buf2); 284 - printbuf_exit(&buf1); 342 + printbuf_exit(&buf); 285 343 return ret; 286 344 } 287 345 288 346 static int btree_repair_node_end(struct bch_fs *c, struct btree *b, 289 - struct btree *child) 347 + struct btree *child, struct bpos *pulled_from_scan) 290 348 { 291 - struct printbuf buf1 = PRINTBUF, buf2 = PRINTBUF; 349 + struct printbuf buf = PRINTBUF; 292 350 int ret = 0; 293 351 294 - bch2_bkey_val_to_text(&buf1, c, bkey_i_to_s_c(&child->key)); 295 - bch2_bpos_to_text(&buf2, b->key.k.p); 352 + if (bpos_eq(child->key.k.p, b->key.k.p)) 353 + return 0; 296 354 297 - if (mustfix_fsck_err_on(!bpos_eq(child->key.k.p, b->key.k.p), c, 298 - btree_node_topology_bad_max_key, 299 - "btree node with incorrect max_key at btree %s level %u:\n" 300 - " %s\n" 301 - " expected %s", 302 - bch2_btree_id_str(b->c.btree_id), b->c.level, 303 - buf1.buf, buf2.buf)) { 304 - ret = set_node_max(c, child, b->key.k.p); 305 - if (ret) 306 - goto err; 355 + prt_printf(&buf, "at btree %s level %u:\n parent: ", 356 + 
bch2_btree_id_str(b->c.btree_id), b->c.level); 357 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&b->key)); 358 + 359 + prt_str(&buf, "\n child: "); 360 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&child->key)); 361 + 362 + if (mustfix_fsck_err(c, btree_node_topology_bad_max_key, 363 + "btree node with incorrect max_key%s", buf.buf)) { 364 + if (b->c.level == 1 && 365 + bpos_lt(*pulled_from_scan, b->key.k.p)) { 366 + ret = bch2_get_scanned_nodes(c, b->c.btree_id, 0, 367 + bpos_successor(child->key.k.p), b->key.k.p); 368 + if (ret) 369 + goto err; 370 + 371 + *pulled_from_scan = b->key.k.p; 372 + ret = DID_FILL_FROM_SCAN; 373 + } else { 374 + ret = set_node_max(c, child, b->key.k.p); 375 + } 307 376 } 308 377 err: 309 378 fsck_err: 310 - printbuf_exit(&buf2); 311 - printbuf_exit(&buf1); 379 + printbuf_exit(&buf); 312 380 return ret; 313 381 } 314 382 315 - static int bch2_btree_repair_topology_recurse(struct btree_trans *trans, struct btree *b) 383 + static int bch2_btree_repair_topology_recurse(struct btree_trans *trans, struct btree *b, 384 + struct bpos *pulled_from_scan) 316 385 { 317 386 struct bch_fs *c = trans->c; 318 387 struct btree_and_journal_iter iter; 319 388 struct bkey_s_c k; 320 389 struct bkey_buf prev_k, cur_k; 321 390 struct btree *prev = NULL, *cur = NULL; 322 - bool have_child, dropped_children = false; 391 + bool have_child, new_pass = false; 323 392 struct printbuf buf = PRINTBUF; 324 393 int ret = 0; 325 394 326 395 if (!b->c.level) 327 396 return 0; 328 - again: 329 - prev = NULL; 330 - have_child = dropped_children = false; 397 + 331 398 bch2_bkey_buf_init(&prev_k); 332 399 bch2_bkey_buf_init(&cur_k); 400 + again: 401 + cur = prev = NULL; 402 + have_child = new_pass = false; 333 403 bch2_btree_and_journal_iter_init_node_iter(trans, &iter, b); 334 404 iter.prefetch = true; 335 405 ··· 367 415 b->c.level - 1, 368 416 buf.buf)) { 369 417 bch2_btree_node_evict(trans, cur_k.k); 370 - ret = bch2_journal_key_delete(c, b->c.btree_id, 371 - b->c.level, cur_k.k->k.p); 372 418 cur = NULL; 419 + ret = bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_scan_for_btree_nodes) ?: 420 + bch2_journal_key_delete(c, b->c.btree_id, 421 + b->c.level, cur_k.k->k.p); 373 422 if (ret) 374 423 break; 375 424 continue; ··· 380 427 if (ret) 381 428 break; 382 429 383 - ret = btree_repair_node_boundaries(c, b, prev, cur); 430 + if (bch2_btree_node_is_stale(c, cur)) { 431 + bch_info(c, "btree node %s older than nodes found by scanning", buf.buf); 432 + six_unlock_read(&cur->c.lock); 433 + bch2_btree_node_evict(trans, cur_k.k); 434 + ret = bch2_journal_key_delete(c, b->c.btree_id, 435 + b->c.level, cur_k.k->k.p); 436 + cur = NULL; 437 + if (ret) 438 + break; 439 + continue; 440 + } 441 + 442 + ret = btree_check_node_boundaries(c, b, prev, cur, pulled_from_scan); 443 + if (ret == DID_FILL_FROM_SCAN) { 444 + new_pass = true; 445 + ret = 0; 446 + } 384 447 385 448 if (ret == DROP_THIS_NODE) { 386 449 six_unlock_read(&cur->c.lock); ··· 414 445 prev = NULL; 415 446 416 447 if (ret == DROP_PREV_NODE) { 448 + bch_info(c, "dropped prev node"); 417 449 bch2_btree_node_evict(trans, prev_k.k); 418 450 ret = bch2_journal_key_delete(c, b->c.btree_id, 419 451 b->c.level, prev_k.k->k.p); ··· 422 452 break; 423 453 424 454 bch2_btree_and_journal_iter_exit(&iter); 425 - bch2_bkey_buf_exit(&prev_k, c); 426 - bch2_bkey_buf_exit(&cur_k, c); 427 455 goto again; 428 456 } else if (ret) 429 457 break; ··· 433 465 434 466 if (!ret && !IS_ERR_OR_NULL(prev)) { 435 467 BUG_ON(cur); 436 - ret = 
btree_repair_node_end(c, b, prev); 468 + ret = btree_repair_node_end(c, b, prev, pulled_from_scan); 469 + if (ret == DID_FILL_FROM_SCAN) { 470 + new_pass = true; 471 + ret = 0; 472 + } 437 473 } 438 474 439 475 if (!IS_ERR_OR_NULL(prev)) ··· 451 479 goto err; 452 480 453 481 bch2_btree_and_journal_iter_exit(&iter); 482 + 483 + if (new_pass) 484 + goto again; 485 + 454 486 bch2_btree_and_journal_iter_init_node_iter(trans, &iter, b); 455 487 iter.prefetch = true; 456 488 ··· 471 495 if (ret) 472 496 goto err; 473 497 474 - ret = bch2_btree_repair_topology_recurse(trans, cur); 498 + ret = bch2_btree_repair_topology_recurse(trans, cur, pulled_from_scan); 475 499 six_unlock_read(&cur->c.lock); 476 500 cur = NULL; 477 501 ··· 479 503 bch2_btree_node_evict(trans, cur_k.k); 480 504 ret = bch2_journal_key_delete(c, b->c.btree_id, 481 505 b->c.level, cur_k.k->k.p); 482 - dropped_children = true; 506 + new_pass = true; 483 507 } 484 508 485 509 if (ret) ··· 506 530 six_unlock_read(&cur->c.lock); 507 531 508 532 bch2_btree_and_journal_iter_exit(&iter); 509 - bch2_bkey_buf_exit(&prev_k, c); 510 - bch2_bkey_buf_exit(&cur_k, c); 511 533 512 - if (!ret && dropped_children) 534 + if (!ret && new_pass) 513 535 goto again; 514 536 537 + BUG_ON(!ret && bch2_btree_node_check_topology(trans, b)); 538 + 539 + bch2_bkey_buf_exit(&prev_k, c); 540 + bch2_bkey_buf_exit(&cur_k, c); 515 541 printbuf_exit(&buf); 516 542 return ret; 517 543 } ··· 521 543 int bch2_check_topology(struct bch_fs *c) 522 544 { 523 545 struct btree_trans *trans = bch2_trans_get(c); 524 - struct btree *b; 525 - unsigned i; 546 + struct bpos pulled_from_scan = POS_MIN; 526 547 int ret = 0; 527 548 528 - for (i = 0; i < btree_id_nr_alive(c) && !ret; i++) { 549 + for (unsigned i = 0; i < btree_id_nr_alive(c) && !ret; i++) { 529 550 struct btree_root *r = bch2_btree_id_root(c, i); 551 + bool reconstructed_root = false; 530 552 531 - if (!r->alive) 532 - continue; 553 + if (r->error) { 554 + ret = bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_scan_for_btree_nodes); 555 + if (ret) 556 + break; 557 + reconstruct_root: 558 + bch_info(c, "btree root %s unreadable, must recover from scan", bch2_btree_id_str(i)); 533 559 534 - b = r->b; 535 - if (btree_node_fake(b)) 536 - continue; 560 + r->alive = false; 561 + r->error = 0; 562 + 563 + if (!bch2_btree_has_scanned_nodes(c, i)) { 564 + mustfix_fsck_err(c, btree_root_unreadable_and_scan_found_nothing, 565 + "no nodes found for btree %s, continue?", bch2_btree_id_str(i)); 566 + bch2_btree_root_alloc_fake(c, i, 0); 567 + } else { 568 + bch2_btree_root_alloc_fake(c, i, 1); 569 + ret = bch2_get_scanned_nodes(c, i, 0, POS_MIN, SPOS_MAX); 570 + if (ret) 571 + break; 572 + } 573 + 574 + bch2_shoot_down_journal_keys(c, i, 1, BTREE_MAX_DEPTH, POS_MIN, SPOS_MAX); 575 + reconstructed_root = true; 576 + } 577 + 578 + struct btree *b = r->b; 537 579 538 580 btree_node_lock_nopath_nofail(trans, &b->c, SIX_LOCK_read); 539 - ret = bch2_btree_repair_topology_recurse(trans, b); 581 + ret = bch2_btree_repair_topology_recurse(trans, b, &pulled_from_scan); 540 582 six_unlock_read(&b->c.lock); 541 583 542 584 if (ret == DROP_THIS_NODE) { 543 - bch_err(c, "empty btree root - repair unimplemented"); 544 - ret = -BCH_ERR_fsck_repair_unimplemented; 585 + bch2_btree_node_hash_remove(&c->btree_cache, b); 586 + mutex_lock(&c->btree_cache.lock); 587 + list_move(&b->list, &c->btree_cache.freeable); 588 + mutex_unlock(&c->btree_cache.lock); 589 + 590 + r->b = NULL; 591 + 592 + if (!reconstructed_root) 593 + goto reconstruct_root; 
594 + 595 + bch_err(c, "empty btree root %s", bch2_btree_id_str(i)); 596 + bch2_btree_root_alloc_fake(c, i, 0); 597 + r->alive = false; 598 + ret = 0; 545 599 } 546 600 } 547 - 601 + fsck_err: 548 602 bch2_trans_put(trans); 549 - 550 603 return ret; 551 604 } 552 605 ··· 600 591 bkey_for_each_ptr_decode(k->k, ptrs_c, p, entry_c) { 601 592 struct bch_dev *ca = bch_dev_bkey_exists(c, p.ptr.dev); 602 593 struct bucket *g = PTR_GC_BUCKET(ca, &p.ptr); 603 - enum bch_data_type data_type = bch2_bkey_ptr_data_type(*k, &entry_c->ptr); 594 + enum bch_data_type data_type = bch2_bkey_ptr_data_type(*k, p, entry_c); 604 595 605 596 if (fsck_err_on(!g->gen_valid, 606 597 c, ptr_to_missing_alloc_key, ··· 666 657 continue; 667 658 668 659 if (fsck_err_on(bucket_data_type(g->data_type) && 669 - bucket_data_type(g->data_type) != data_type, c, 660 + bucket_data_type(g->data_type) != 661 + bucket_data_type(data_type), c, 670 662 ptr_bucket_data_type_mismatch, 671 663 "bucket %u:%zu different types of data in same bucket: %s, %s\n" 672 664 "while marking %s", ··· 708 698 } 709 699 710 700 if (do_update) { 711 - struct bkey_ptrs ptrs; 712 - union bch_extent_entry *entry; 713 - struct bch_extent_ptr *ptr; 714 - struct bkey_i *new; 715 - 716 701 if (is_root) { 717 702 bch_err(c, "cannot update btree roots yet"); 718 703 ret = -EINVAL; 719 704 goto err; 720 705 } 721 706 722 - new = kmalloc(bkey_bytes(k->k), GFP_KERNEL); 707 + struct bkey_i *new = kmalloc(bkey_bytes(k->k), GFP_KERNEL); 723 708 if (!new) { 724 709 ret = -BCH_ERR_ENOMEM_gc_repair_key; 725 710 bch_err_msg(c, ret, "allocating new key"); ··· 729 724 * btree node isn't there anymore, the read path will 730 725 * sort it out: 731 726 */ 732 - ptrs = bch2_bkey_ptrs(bkey_i_to_s(new)); 727 + struct bkey_ptrs ptrs = bch2_bkey_ptrs(bkey_i_to_s(new)); 733 728 bkey_for_each_ptr(ptrs, ptr) { 734 729 struct bch_dev *ca = bch_dev_bkey_exists(c, ptr->dev); 735 730 struct bucket *g = PTR_GC_BUCKET(ca, ptr); ··· 737 732 ptr->gen = g->gen; 738 733 } 739 734 } else { 740 - bch2_bkey_drop_ptrs(bkey_i_to_s(new), ptr, ({ 741 - struct bch_dev *ca = bch_dev_bkey_exists(c, ptr->dev); 742 - struct bucket *g = PTR_GC_BUCKET(ca, ptr); 743 - enum bch_data_type data_type = bch2_bkey_ptr_data_type(*k, ptr); 735 + struct bkey_ptrs ptrs; 736 + union bch_extent_entry *entry; 737 + restart_drop_ptrs: 738 + ptrs = bch2_bkey_ptrs(bkey_i_to_s(new)); 739 + bkey_for_each_ptr_decode(bkey_i_to_s(new).k, ptrs, p, entry) { 740 + struct bch_dev *ca = bch_dev_bkey_exists(c, p.ptr.dev); 741 + struct bucket *g = PTR_GC_BUCKET(ca, &p.ptr); 742 + enum bch_data_type data_type = bch2_bkey_ptr_data_type(bkey_i_to_s_c(new), p, entry); 744 743 745 - (ptr->cached && 746 - (!g->gen_valid || gen_cmp(ptr->gen, g->gen) > 0)) || 747 - (!ptr->cached && 748 - gen_cmp(ptr->gen, g->gen) < 0) || 749 - gen_cmp(g->gen, ptr->gen) > BUCKET_GC_GEN_MAX || 750 - (g->data_type && 751 - g->data_type != data_type); 752 - })); 744 + if ((p.ptr.cached && 745 + (!g->gen_valid || gen_cmp(p.ptr.gen, g->gen) > 0)) || 746 + (!p.ptr.cached && 747 + gen_cmp(p.ptr.gen, g->gen) < 0) || 748 + gen_cmp(g->gen, p.ptr.gen) > BUCKET_GC_GEN_MAX || 749 + (g->data_type && 750 + g->data_type != data_type)) { 751 + bch2_bkey_drop_ptr(bkey_i_to_s(new), &entry->ptr); 752 + goto restart_drop_ptrs; 753 + } 754 + } 753 755 again: 754 756 ptrs = bch2_bkey_ptrs(bkey_i_to_s(new)); 755 757 bkey_extent_entry_for_each(ptrs, entry) { ··· 786 774 } 787 775 } 788 776 789 - ret = bch2_journal_key_insert_take(c, btree_id, level, new); 790 - if (ret) { 791 - 
kfree(new); 792 - goto err; 793 - } 794 - 795 777 if (level) 796 778 bch2_btree_node_update_key_early(trans, btree_id, level - 1, *k, new); 797 779 ··· 797 791 printbuf_reset(&buf); 798 792 bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(new)); 799 793 bch_info(c, "new key %s", buf.buf); 794 + } 795 + 796 + ret = bch2_journal_key_insert_take(c, btree_id, level, new); 797 + if (ret) { 798 + kfree(new); 799 + goto err; 800 800 } 801 801 802 802 *k = bkey_i_to_s_c(new); ··· 831 819 BUG_ON(bch2_journal_seq_verify && 832 820 k->k->version.lo > atomic64_read(&c->journal.seq)); 833 821 834 - ret = bch2_check_fix_ptrs(trans, btree_id, level, is_root, k); 835 - if (ret) 836 - goto err; 837 - 838 822 if (fsck_err_on(k->k->version.lo > atomic64_read(&c->key_version), c, 839 823 bkey_version_in_future, 840 824 "key version number higher than recorded: %llu > %llu", ··· 839 831 atomic64_set(&c->key_version, k->k->version.lo); 840 832 } 841 833 834 + ret = bch2_check_fix_ptrs(trans, btree_id, level, is_root, k); 835 + if (ret) 836 + goto err; 837 + 842 838 ret = commit_do(trans, NULL, NULL, 0, 843 - bch2_key_trigger(trans, btree_id, level, old, unsafe_bkey_s_c_to_s(*k), BTREE_TRIGGER_GC)); 839 + bch2_key_trigger(trans, btree_id, level, old, 840 + unsafe_bkey_s_c_to_s(*k), BTREE_TRIGGER_GC)); 844 841 fsck_err: 845 842 err: 846 843 bch_err_fn(c, ret); ··· 854 841 855 842 static int btree_gc_mark_node(struct btree_trans *trans, struct btree *b, bool initial) 856 843 { 857 - struct bch_fs *c = trans->c; 858 844 struct btree_node_iter iter; 859 845 struct bkey unpacked; 860 846 struct bkey_s_c k; 861 - struct bkey_buf prev, cur; 862 847 int ret = 0; 848 + 849 + ret = bch2_btree_node_check_topology(trans, b); 850 + if (ret) 851 + return ret; 863 852 864 853 if (!btree_node_type_needs_gc(btree_node_type(b))) 865 854 return 0; 866 855 867 856 bch2_btree_node_iter_init_from_start(&iter, b); 868 - bch2_bkey_buf_init(&prev); 869 - bch2_bkey_buf_init(&cur); 870 - bkey_init(&prev.k->k); 871 857 872 858 while ((k = bch2_btree_node_iter_peek_unpack(&iter, b, &unpacked)).k) { 873 859 ret = bch2_gc_mark_key(trans, b->c.btree_id, b->c.level, false, 874 860 &k, initial); 875 861 if (ret) 876 - break; 862 + return ret; 877 863 878 864 bch2_btree_node_iter_advance(&iter, b); 879 - 880 - if (b->c.level) { 881 - bch2_bkey_buf_reassemble(&cur, c, k); 882 - 883 - ret = bch2_gc_check_topology(c, b, &prev, cur, 884 - bch2_btree_node_iter_end(&iter)); 885 - if (ret) 886 - break; 887 - } 888 865 } 889 866 890 - bch2_bkey_buf_exit(&cur, c); 891 - bch2_bkey_buf_exit(&prev, c); 892 - return ret; 867 + return 0; 893 868 } 894 869 895 870 static int bch2_gc_btree(struct btree_trans *trans, enum btree_id btree_id, ··· 926 925 struct bch_fs *c = trans->c; 927 926 struct btree_and_journal_iter iter; 928 927 struct bkey_s_c k; 929 - struct bkey_buf cur, prev; 928 + struct bkey_buf cur; 930 929 struct printbuf buf = PRINTBUF; 931 930 int ret = 0; 932 931 932 + ret = bch2_btree_node_check_topology(trans, b); 933 + if (ret) 934 + return ret; 935 + 933 936 bch2_btree_and_journal_iter_init_node_iter(trans, &iter, b); 934 - bch2_bkey_buf_init(&prev); 935 937 bch2_bkey_buf_init(&cur); 936 - bkey_init(&prev.k->k); 937 938 938 939 while ((k = bch2_btree_and_journal_iter_peek(&iter)).k) { 939 940 BUG_ON(bpos_lt(k.k->p, b->data->min_key)); ··· 946 943 if (ret) 947 944 goto fsck_err; 948 945 949 - if (b->c.level) { 950 - bch2_bkey_buf_reassemble(&cur, c, k); 951 - k = bkey_i_to_s_c(cur.k); 952 - 953 - bch2_btree_and_journal_iter_advance(&iter); 954 - 955 
- ret = bch2_gc_check_topology(c, b, 956 - &prev, cur, 957 - !bch2_btree_and_journal_iter_peek(&iter).k); 958 - if (ret) 959 - goto fsck_err; 960 - } else { 961 - bch2_btree_and_journal_iter_advance(&iter); 962 - } 946 + bch2_btree_and_journal_iter_advance(&iter); 963 947 } 964 948 965 949 if (b->c.level > target_depth) { ··· 1005 1015 } 1006 1016 fsck_err: 1007 1017 bch2_bkey_buf_exit(&cur, c); 1008 - bch2_bkey_buf_exit(&prev, c); 1009 1018 bch2_btree_and_journal_iter_exit(&iter); 1010 1019 printbuf_exit(&buf); 1011 1020 return ret; ··· 1021 1032 int ret = 0; 1022 1033 1023 1034 b = bch2_btree_id_root(c, btree_id)->b; 1024 - 1025 - if (btree_node_fake(b)) 1026 - return 0; 1027 1035 1028 1036 six_lock_read(&b->c.lock, NULL, NULL); 1029 1037 printbuf_reset(&buf);
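A central change in btree_gc.c is that topology repair can now fill holes between child nodes from the btree-node-scan pass (tracked via pulled_from_scan and the DID_FILL_FROM_SCAN return) instead of only clamping min/max keys. The sketch below models just the gap-versus-overlap decision in btree_check_node_boundaries(), with integer positions standing in for struct bpos and the overlap handling heavily simplified.

    /* User-space sketch of the boundary decision introduced above: given the
     * expected start of the next child and its actual min_key, classify the
     * mismatch.  Positions are plain integers here instead of struct bpos. */
    #include <stdio.h>

    enum boundary_action {
    	BOUNDARY_OK,
    	FILL_GAP_FROM_SCAN,	/* hole between children: recover from scanned nodes */
    	FIX_MIN_KEY,		/* overlap: rewrite the child's min_key */
    };

    static enum boundary_action classify(unsigned expected_start,
    				     unsigned child_min,
    				     unsigned level,
    				     unsigned pulled_from_scan)
    {
    	if (expected_start == child_min)
    		return BOUNDARY_OK;
    	if (expected_start < child_min)		/* gap */
    		return level == 1 && pulled_from_scan < child_min
    			? FILL_GAP_FROM_SCAN
    			: FIX_MIN_KEY;
    	return FIX_MIN_KEY;			/* overlap (real code may drop a node instead) */
    }

    int main(void)
    {
    	printf("%d\n", classify(10, 20, 1, 0));	/* gap at level 1 -> fill from scan */
    	printf("%d\n", classify(10, 5, 1, 0));	/* overlap -> fix min_key */
    	return 0;
    }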
+12 -6
fs/bcachefs/btree_io.c
··· 654 654 */ 655 655 bch2_bset_set_no_aux_tree(b, b->set); 656 656 bch2_btree_build_aux_trees(b); 657 + b->nr = bch2_btree_node_count_keys(b); 657 658 658 659 struct bkey_s_c k; 659 660 struct bkey unpacked; ··· 1264 1263 return retry_read; 1265 1264 fsck_err: 1266 1265 if (ret == -BCH_ERR_btree_node_read_err_want_retry || 1267 - ret == -BCH_ERR_btree_node_read_err_must_retry) 1266 + ret == -BCH_ERR_btree_node_read_err_must_retry) { 1268 1267 retry_read = 1; 1269 - else 1268 + } else { 1270 1269 set_btree_node_read_error(b); 1270 + bch2_btree_lost_data(c, b->c.btree_id); 1271 + } 1271 1272 goto out; 1272 1273 } 1273 1274 ··· 1330 1327 1331 1328 if (!can_retry) { 1332 1329 set_btree_node_read_error(b); 1330 + bch2_btree_lost_data(c, b->c.btree_id); 1333 1331 break; 1334 1332 } 1335 1333 } ··· 1530 1526 ret = -1; 1531 1527 } 1532 1528 1533 - if (ret) 1529 + if (ret) { 1534 1530 set_btree_node_read_error(b); 1535 - else if (*saw_error) 1531 + bch2_btree_lost_data(c, b->c.btree_id); 1532 + } else if (*saw_error) 1536 1533 bch2_btree_node_rewrite_async(c, b); 1537 1534 1538 1535 for (i = 0; i < ra->nr; i++) { ··· 1662 1657 1663 1658 prt_str(&buf, "btree node read error: no device to read from\n at "); 1664 1659 bch2_btree_pos_to_text(&buf, c, b); 1665 - bch_err(c, "%s", buf.buf); 1660 + bch_err_ratelimited(c, "%s", buf.buf); 1666 1661 1667 1662 if (c->recovery_passes_explicit & BIT_ULL(BCH_RECOVERY_PASS_check_topology) && 1668 1663 c->curr_recovery_pass > BCH_RECOVERY_PASS_check_topology) 1669 1664 bch2_fatal_error(c); 1670 1665 1671 1666 set_btree_node_read_error(b); 1667 + bch2_btree_lost_data(c, b->c.btree_id); 1672 1668 clear_btree_node_read_in_flight(b); 1673 1669 wake_up_bit(&b->flags, BTREE_NODE_read_in_flight); 1674 1670 printbuf_exit(&buf); ··· 1866 1860 } else { 1867 1861 ret = bch2_trans_do(c, NULL, NULL, 0, 1868 1862 bch2_btree_node_update_key_get_iter(trans, b, &wbio->key, 1869 - BCH_WATERMARK_reclaim| 1863 + BCH_WATERMARK_interior_updates| 1870 1864 BCH_TRANS_COMMIT_journal_reclaim| 1871 1865 BCH_TRANS_COMMIT_no_enospc| 1872 1866 BCH_TRANS_COMMIT_no_check_rw,
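The read-error paths above now call bch2_btree_lost_data() whenever a node is permanently unreadable, which presumably records the affected btree in the new btrees_lost_data superblock field added elsewhere in this series. A toy version of that bookkeeping, assuming a simple per-btree bitmask (the helper and mask here are illustrative, not the real API), is sketched below.

    /* Sketch of the bookkeeping the new bch2_btree_lost_data() calls imply:
     * each btree that hit an unreadable node gets a bit set in a persistent
     * mask.  The mask and helper here are illustrative, not the real API. */
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t btrees_lost_data;

    static void mark_btree_lost_data(unsigned btree_id)
    {
    	btrees_lost_data |= 1ULL << btree_id;
    }

    int main(void)
    {
    	mark_btree_lost_data(3);	/* e.g. an unreadable node in btree 3 */
    	mark_btree_lost_data(7);

    	printf("lost-data mask: 0x%llx\n",
    	       (unsigned long long) btrees_lost_data);
    	return 0;
    }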
+46 -6
fs/bcachefs/btree_iter.c
··· 927 927 if (ret) 928 928 goto err; 929 929 } else { 930 - bch2_bkey_buf_unpack(&tmp, c, l->b, 931 - bch2_btree_node_iter_peek(&l->iter, l->b)); 930 + struct bkey_packed *k = bch2_btree_node_iter_peek(&l->iter, l->b); 931 + if (!k) { 932 + struct printbuf buf = PRINTBUF; 933 + 934 + prt_str(&buf, "node not found at pos "); 935 + bch2_bpos_to_text(&buf, path->pos); 936 + prt_str(&buf, " within parent node "); 937 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&l->b->key)); 938 + 939 + bch2_fs_fatal_error(c, "%s", buf.buf); 940 + printbuf_exit(&buf); 941 + ret = -BCH_ERR_btree_need_topology_repair; 942 + goto err; 943 + } 944 + 945 + bch2_bkey_buf_unpack(&tmp, c, l->b, k); 932 946 933 947 if ((flags & BTREE_ITER_PREFETCH) && 934 948 c->opts.btree_node_prefetch) { ··· 975 961 bch2_bkey_buf_exit(&tmp, c); 976 962 return ret; 977 963 } 978 - 979 964 980 965 static int bch2_btree_path_traverse_all(struct btree_trans *trans) 981 966 { ··· 2803 2790 struct btree_transaction_stats *s = btree_trans_stats(trans); 2804 2791 s->max_mem = max(s->max_mem, new_bytes); 2805 2792 2793 + if (trans->used_mempool) { 2794 + if (trans->mem_bytes >= new_bytes) 2795 + goto out_change_top; 2796 + 2797 + /* No more space from mempool item, need malloc new one */ 2798 + new_mem = kmalloc(new_bytes, GFP_NOWAIT|__GFP_NOWARN); 2799 + if (unlikely(!new_mem)) { 2800 + bch2_trans_unlock(trans); 2801 + 2802 + new_mem = kmalloc(new_bytes, GFP_KERNEL); 2803 + if (!new_mem) 2804 + return ERR_PTR(-BCH_ERR_ENOMEM_trans_kmalloc); 2805 + 2806 + ret = bch2_trans_relock(trans); 2807 + if (ret) { 2808 + kfree(new_mem); 2809 + return ERR_PTR(ret); 2810 + } 2811 + } 2812 + memcpy(new_mem, trans->mem, trans->mem_top); 2813 + trans->used_mempool = false; 2814 + mempool_free(trans->mem, &c->btree_trans_mem_pool); 2815 + goto out_new_mem; 2816 + } 2817 + 2806 2818 new_mem = krealloc(trans->mem, new_bytes, GFP_NOWAIT|__GFP_NOWARN); 2807 2819 if (unlikely(!new_mem)) { 2808 2820 bch2_trans_unlock(trans); ··· 2836 2798 if (!new_mem && new_bytes <= BTREE_TRANS_MEM_MAX) { 2837 2799 new_mem = mempool_alloc(&c->btree_trans_mem_pool, GFP_KERNEL); 2838 2800 new_bytes = BTREE_TRANS_MEM_MAX; 2801 + memcpy(new_mem, trans->mem, trans->mem_top); 2802 + trans->used_mempool = true; 2839 2803 kfree(trans->mem); 2840 2804 } 2841 2805 ··· 2851 2811 if (ret) 2852 2812 return ERR_PTR(ret); 2853 2813 } 2854 - 2814 + out_new_mem: 2855 2815 trans->mem = new_mem; 2856 2816 trans->mem_bytes = new_bytes; 2857 2817 ··· 2859 2819 trace_and_count(c, trans_restart_mem_realloced, trans, _RET_IP_, new_bytes); 2860 2820 return ERR_PTR(btree_trans_restart(trans, BCH_ERR_transaction_restart_mem_realloced)); 2861 2821 } 2862 - 2822 + out_change_top: 2863 2823 p = trans->mem + trans->mem_top; 2864 2824 trans->mem_top += size; 2865 2825 memset(p, 0, size); ··· 3133 3093 if (paths_allocated != trans->_paths_allocated) 3134 3094 kvfree_rcu_mightsleep(paths_allocated); 3135 3095 3136 - if (trans->mem_bytes == BTREE_TRANS_MEM_MAX) 3096 + if (trans->used_mempool) 3137 3097 mempool_free(trans->mem, &c->btree_trans_mem_pool); 3138 3098 else 3139 3099 kfree(trans->mem);
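The trans-memory growth path above gains explicit handling for the case where trans->mem came from the btree_trans_mem_pool: a bigger buffer is allocated, the used prefix copied over, and the pooled item returned. The user-space model below (malloc/free standing in for kmalloc and mempool_free) shows the same copy-out-of-the-pool step.

    /* User-space model of the allocation path added above: when the current
     * buffer came from a fixed-size pool and a larger one is needed, allocate
     * a fresh buffer, copy the used prefix, and return the pooled one. */
    #include <stdlib.h>
    #include <string.h>

    #define POOL_ITEM_SIZE 256

    struct trans_mem {
    	void	*mem;
    	size_t	bytes;
    	size_t	top;		/* bytes currently in use */
    	int	used_pool;	/* came from the fixed-size pool? */
    };

    static int grow(struct trans_mem *t, size_t new_bytes)
    {
    	if (t->used_pool) {
    		if (t->bytes >= new_bytes)
    			return 0;		/* pool item is still big enough */

    		void *n = malloc(new_bytes);
    		if (!n)
    			return -1;
    		memcpy(n, t->mem, t->top);	/* preserve what is in use */
    		free(t->mem);			/* stands in for mempool_free() */
    		t->mem = n;
    		t->bytes = new_bytes;
    		t->used_pool = 0;
    		return 0;
    	}

    	void *n = realloc(t->mem, new_bytes);
    	if (!n)
    		return -1;
    	t->mem = n;
    	t->bytes = new_bytes;
    	return 0;
    }

    int main(void)
    {
    	struct trans_mem t = {
    		.mem = malloc(POOL_ITEM_SIZE),
    		.bytes = POOL_ITEM_SIZE,
    		.top = 64,
    		.used_pool = 1,
    	};

    	int ret = grow(&t, 4096);	/* too big for the pool item: copies out */
    	free(t.mem);
    	return ret ? 1 : 0;
    }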
+43 -5
fs/bcachefs/btree_journal_iter.c
··· 261 261 return bch2_journal_key_insert(c, id, level, &whiteout); 262 262 } 263 263 264 + bool bch2_key_deleted_in_journal(struct btree_trans *trans, enum btree_id btree, 265 + unsigned level, struct bpos pos) 266 + { 267 + struct journal_keys *keys = &trans->c->journal_keys; 268 + size_t idx = bch2_journal_key_search(keys, btree, level, pos); 269 + 270 + if (!trans->journal_replay_not_finished) 271 + return false; 272 + 273 + return (idx < keys->size && 274 + keys->data[idx].btree_id == btree && 275 + keys->data[idx].level == level && 276 + bpos_eq(keys->data[idx].k->k.p, pos) && 277 + bkey_deleted(&keys->data[idx].k->k)); 278 + } 279 + 264 280 void bch2_journal_key_overwritten(struct bch_fs *c, enum btree_id btree, 265 281 unsigned level, struct bpos pos) 266 282 { ··· 379 363 380 364 struct bkey_s_c bch2_btree_and_journal_iter_peek(struct btree_and_journal_iter *iter) 381 365 { 382 - struct bkey_s_c btree_k, journal_k, ret; 366 + struct bkey_s_c btree_k, journal_k = bkey_s_c_null, ret; 383 367 384 368 if (iter->prefetch && iter->journal.level) 385 369 btree_and_journal_iter_prefetch(iter); ··· 391 375 bpos_lt(btree_k.k->p, iter->pos)) 392 376 bch2_journal_iter_advance_btree(iter); 393 377 394 - while ((journal_k = bch2_journal_iter_peek(&iter->journal)).k && 395 - bpos_lt(journal_k.k->p, iter->pos)) 396 - bch2_journal_iter_advance(&iter->journal); 378 + if (iter->trans->journal_replay_not_finished) 379 + while ((journal_k = bch2_journal_iter_peek(&iter->journal)).k && 380 + bpos_lt(journal_k.k->p, iter->pos)) 381 + bch2_journal_iter_advance(&iter->journal); 397 382 398 383 ret = journal_k.k && 399 384 (!btree_k.k || bpos_le(journal_k.k->p, btree_k.k->p)) ··· 452 435 453 436 bch2_btree_node_iter_init_from_start(&node_iter, b); 454 437 __bch2_btree_and_journal_iter_init_node_iter(trans, iter, b, node_iter, b->data->min_key); 455 - list_add(&iter->journal.list, &trans->c->journal_iters); 438 + if (trans->journal_replay_not_finished && 439 + !test_bit(BCH_FS_may_go_rw, &trans->c->flags)) 440 + list_add(&iter->journal.list, &trans->c->journal_iters); 456 441 } 457 442 458 443 /* sort and dedup all keys in the journal: */ ··· 566 547 567 548 bch_verbose(c, "Journal keys: %zu read, %zu after sorting and compacting", nr_read, keys->nr); 568 549 return 0; 550 + } 551 + 552 + void bch2_shoot_down_journal_keys(struct bch_fs *c, enum btree_id btree, 553 + unsigned level_min, unsigned level_max, 554 + struct bpos start, struct bpos end) 555 + { 556 + struct journal_keys *keys = &c->journal_keys; 557 + size_t dst = 0; 558 + 559 + move_gap(keys, keys->nr); 560 + 561 + darray_for_each(*keys, i) 562 + if (!(i->btree_id == btree && 563 + i->level >= level_min && 564 + i->level <= level_max && 565 + bpos_ge(i->k->k.p, start) && 566 + bpos_le(i->k->k.p, end))) 567 + keys->data[dst++] = *i; 568 + keys->nr = keys->gap = dst; 569 569 }
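bch2_shoot_down_journal_keys() compacts the sorted journal-keys array in place, keeping only entries that fall outside the given btree/level/position range; the topology-repair code uses it to discard keys for a root it is about to reconstruct from scanned nodes. A self-contained version of the same filter, with plain integers for positions, is below.

    /* Self-contained model of the in-place filter used by
     * bch2_shoot_down_journal_keys() above: keep only entries that fall
     * outside the given range, compacting the array as we go. */
    #include <stdio.h>

    struct fake_jkey { int btree; int level; int pos; };

    static size_t shoot_down(struct fake_jkey *keys, size_t nr,
    			 int btree, int lvl_min, int lvl_max,
    			 int start, int end)
    {
    	size_t dst = 0;

    	for (size_t i = 0; i < nr; i++) {
    		struct fake_jkey *k = &keys[i];
    		int in_range = k->btree == btree &&
    			       k->level >= lvl_min && k->level <= lvl_max &&
    			       k->pos >= start && k->pos <= end;

    		if (!in_range)
    			keys[dst++] = *k;	/* keep */
    	}
    	return dst;
    }

    int main(void)
    {
    	struct fake_jkey keys[] = {
    		{ 0, 1, 10 }, { 0, 1, 50 }, { 1, 0, 20 },
    	};
    	size_t nr = shoot_down(keys, 3, 0, 1, 4, 0, 100);

    	printf("%zu keys kept\n", nr);	/* only the btree-1 key survives */
    	return 0;
    }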
+6 -2
fs/bcachefs/btree_journal_iter.h
··· 40 40 unsigned, struct bkey_i *); 41 41 int bch2_journal_key_delete(struct bch_fs *, enum btree_id, 42 42 unsigned, struct bpos); 43 - void bch2_journal_key_overwritten(struct bch_fs *, enum btree_id, 44 - unsigned, struct bpos); 43 + bool bch2_key_deleted_in_journal(struct btree_trans *, enum btree_id, unsigned, struct bpos); 44 + void bch2_journal_key_overwritten(struct bch_fs *, enum btree_id, unsigned, struct bpos); 45 45 46 46 void bch2_btree_and_journal_iter_advance(struct btree_and_journal_iter *); 47 47 struct bkey_s_c bch2_btree_and_journal_iter_peek(struct btree_and_journal_iter *); ··· 65 65 void bch2_journal_entries_free(struct bch_fs *); 66 66 67 67 int bch2_journal_keys_sort(struct bch_fs *); 68 + 69 + void bch2_shoot_down_journal_keys(struct bch_fs *, enum btree_id, 70 + unsigned, unsigned, 71 + struct bpos, struct bpos); 68 72 69 73 #endif /* _BCACHEFS_BTREE_JOURNAL_ITER_H */
+495
fs/bcachefs/btree_node_scan.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + #include "bcachefs.h" 4 + #include "btree_cache.h" 5 + #include "btree_io.h" 6 + #include "btree_journal_iter.h" 7 + #include "btree_node_scan.h" 8 + #include "btree_update_interior.h" 9 + #include "buckets.h" 10 + #include "error.h" 11 + #include "journal_io.h" 12 + #include "recovery_passes.h" 13 + 14 + #include <linux/kthread.h> 15 + #include <linux/sort.h> 16 + 17 + struct find_btree_nodes_worker { 18 + struct closure *cl; 19 + struct find_btree_nodes *f; 20 + struct bch_dev *ca; 21 + }; 22 + 23 + static void found_btree_node_to_text(struct printbuf *out, struct bch_fs *c, const struct found_btree_node *n) 24 + { 25 + prt_printf(out, "%s l=%u seq=%u cookie=%llx ", bch2_btree_id_str(n->btree_id), n->level, n->seq, n->cookie); 26 + bch2_bpos_to_text(out, n->min_key); 27 + prt_str(out, "-"); 28 + bch2_bpos_to_text(out, n->max_key); 29 + 30 + if (n->range_updated) 31 + prt_str(out, " range updated"); 32 + if (n->overwritten) 33 + prt_str(out, " overwritten"); 34 + 35 + for (unsigned i = 0; i < n->nr_ptrs; i++) { 36 + prt_char(out, ' '); 37 + bch2_extent_ptr_to_text(out, c, n->ptrs + i); 38 + } 39 + } 40 + 41 + static void found_btree_nodes_to_text(struct printbuf *out, struct bch_fs *c, found_btree_nodes nodes) 42 + { 43 + printbuf_indent_add(out, 2); 44 + darray_for_each(nodes, i) { 45 + found_btree_node_to_text(out, c, i); 46 + prt_newline(out); 47 + } 48 + printbuf_indent_sub(out, 2); 49 + } 50 + 51 + static void found_btree_node_to_key(struct bkey_i *k, const struct found_btree_node *f) 52 + { 53 + struct bkey_i_btree_ptr_v2 *bp = bkey_btree_ptr_v2_init(k); 54 + 55 + set_bkey_val_u64s(&bp->k, sizeof(struct bch_btree_ptr_v2) / sizeof(u64) + f->nr_ptrs); 56 + bp->k.p = f->max_key; 57 + bp->v.seq = cpu_to_le64(f->cookie); 58 + bp->v.sectors_written = 0; 59 + bp->v.flags = 0; 60 + bp->v.min_key = f->min_key; 61 + SET_BTREE_PTR_RANGE_UPDATED(&bp->v, f->range_updated); 62 + memcpy(bp->v.start, f->ptrs, sizeof(struct bch_extent_ptr) * f->nr_ptrs); 63 + } 64 + 65 + static bool found_btree_node_is_readable(struct btree_trans *trans, 66 + const struct found_btree_node *f) 67 + { 68 + struct { __BKEY_PADDED(k, BKEY_BTREE_PTR_VAL_U64s_MAX); } k; 69 + 70 + found_btree_node_to_key(&k.k, f); 71 + 72 + struct btree *b = bch2_btree_node_get_noiter(trans, &k.k, f->btree_id, f->level, false); 73 + bool ret = !IS_ERR_OR_NULL(b); 74 + if (ret) 75 + six_unlock_read(&b->c.lock); 76 + 77 + /* 78 + * We might update this node's range; if that happens, we need the node 79 + * to be re-read so the read path can trim keys that are no longer in 80 + * this node 81 + */ 82 + if (b != btree_node_root(trans->c, b)) 83 + bch2_btree_node_evict(trans, &k.k); 84 + return ret; 85 + } 86 + 87 + static int found_btree_node_cmp_cookie(const void *_l, const void *_r) 88 + { 89 + const struct found_btree_node *l = _l; 90 + const struct found_btree_node *r = _r; 91 + 92 + return cmp_int(l->btree_id, r->btree_id) ?: 93 + cmp_int(l->level, r->level) ?: 94 + cmp_int(l->cookie, r->cookie); 95 + } 96 + 97 + /* 98 + * Given two found btree nodes, if their sequence numbers are equal, take the 99 + * one that's readable: 100 + */ 101 + static int found_btree_node_cmp_time(const struct found_btree_node *l, 102 + const struct found_btree_node *r) 103 + { 104 + return cmp_int(l->seq, r->seq); 105 + } 106 + 107 + static int found_btree_node_cmp_pos(const void *_l, const void *_r) 108 + { 109 + const struct found_btree_node *l = _l; 110 + const struct found_btree_node *r = _r; 111 + 112 + 
return cmp_int(l->btree_id, r->btree_id) ?: 113 + -cmp_int(l->level, r->level) ?: 114 + bpos_cmp(l->min_key, r->min_key) ?: 115 + -found_btree_node_cmp_time(l, r); 116 + } 117 + 118 + static void try_read_btree_node(struct find_btree_nodes *f, struct bch_dev *ca, 119 + struct bio *bio, struct btree_node *bn, u64 offset) 120 + { 121 + struct bch_fs *c = container_of(f, struct bch_fs, found_btree_nodes); 122 + 123 + bio_reset(bio, ca->disk_sb.bdev, REQ_OP_READ); 124 + bio->bi_iter.bi_sector = offset; 125 + bch2_bio_map(bio, bn, PAGE_SIZE); 126 + 127 + submit_bio_wait(bio); 128 + if (bch2_dev_io_err_on(bio->bi_status, ca, BCH_MEMBER_ERROR_read, 129 + "IO error in try_read_btree_node() at %llu: %s", 130 + offset, bch2_blk_status_to_str(bio->bi_status))) 131 + return; 132 + 133 + if (le64_to_cpu(bn->magic) != bset_magic(c)) 134 + return; 135 + 136 + rcu_read_lock(); 137 + struct found_btree_node n = { 138 + .btree_id = BTREE_NODE_ID(bn), 139 + .level = BTREE_NODE_LEVEL(bn), 140 + .seq = BTREE_NODE_SEQ(bn), 141 + .cookie = le64_to_cpu(bn->keys.seq), 142 + .min_key = bn->min_key, 143 + .max_key = bn->max_key, 144 + .nr_ptrs = 1, 145 + .ptrs[0].type = 1 << BCH_EXTENT_ENTRY_ptr, 146 + .ptrs[0].offset = offset, 147 + .ptrs[0].dev = ca->dev_idx, 148 + .ptrs[0].gen = *bucket_gen(ca, sector_to_bucket(ca, offset)), 149 + }; 150 + rcu_read_unlock(); 151 + 152 + if (bch2_trans_run(c, found_btree_node_is_readable(trans, &n))) { 153 + mutex_lock(&f->lock); 154 + if (BSET_BIG_ENDIAN(&bn->keys) != CPU_BIG_ENDIAN) { 155 + bch_err(c, "try_read_btree_node() can't handle endian conversion"); 156 + f->ret = -EINVAL; 157 + goto unlock; 158 + } 159 + 160 + if (darray_push(&f->nodes, n)) 161 + f->ret = -ENOMEM; 162 + unlock: 163 + mutex_unlock(&f->lock); 164 + } 165 + } 166 + 167 + static int read_btree_nodes_worker(void *p) 168 + { 169 + struct find_btree_nodes_worker *w = p; 170 + struct bch_fs *c = container_of(w->f, struct bch_fs, found_btree_nodes); 171 + struct bch_dev *ca = w->ca; 172 + void *buf = (void *) __get_free_page(GFP_KERNEL); 173 + struct bio *bio = bio_alloc(NULL, 1, 0, GFP_KERNEL); 174 + unsigned long last_print = jiffies; 175 + 176 + if (!buf || !bio) { 177 + bch_err(c, "read_btree_nodes_worker: error allocating bio/buf"); 178 + w->f->ret = -ENOMEM; 179 + goto err; 180 + } 181 + 182 + for (u64 bucket = ca->mi.first_bucket; bucket < ca->mi.nbuckets; bucket++) 183 + for (unsigned bucket_offset = 0; 184 + bucket_offset + btree_sectors(c) <= ca->mi.bucket_size; 185 + bucket_offset += btree_sectors(c)) { 186 + if (time_after(jiffies, last_print + HZ * 30)) { 187 + u64 cur_sector = bucket * ca->mi.bucket_size + bucket_offset; 188 + u64 end_sector = ca->mi.nbuckets * ca->mi.bucket_size; 189 + 190 + bch_info(ca, "%s: %2u%% done", __func__, 191 + (unsigned) div64_u64(cur_sector * 100, end_sector)); 192 + last_print = jiffies; 193 + } 194 + 195 + try_read_btree_node(w->f, ca, bio, buf, 196 + bucket * ca->mi.bucket_size + bucket_offset); 197 + } 198 + err: 199 + bio_put(bio); 200 + free_page((unsigned long) buf); 201 + percpu_ref_get(&ca->io_ref); 202 + closure_put(w->cl); 203 + kfree(w); 204 + return 0; 205 + } 206 + 207 + static int read_btree_nodes(struct find_btree_nodes *f) 208 + { 209 + struct bch_fs *c = container_of(f, struct bch_fs, found_btree_nodes); 210 + struct closure cl; 211 + int ret = 0; 212 + 213 + closure_init_stack(&cl); 214 + 215 + for_each_online_member(c, ca) { 216 + struct find_btree_nodes_worker *w = kmalloc(sizeof(*w), GFP_KERNEL); 217 + struct task_struct *t; 218 + 219 + if (!w) { 
220 + percpu_ref_put(&ca->io_ref); 221 + ret = -ENOMEM; 222 + goto err; 223 + } 224 + 225 + percpu_ref_get(&ca->io_ref); 226 + closure_get(&cl); 227 + w->cl = &cl; 228 + w->f = f; 229 + w->ca = ca; 230 + 231 + t = kthread_run(read_btree_nodes_worker, w, "read_btree_nodes/%s", ca->name); 232 + ret = IS_ERR_OR_NULL(t); 233 + if (ret) { 234 + percpu_ref_put(&ca->io_ref); 235 + closure_put(&cl); 236 + f->ret = ret; 237 + bch_err(c, "error starting kthread: %i", ret); 238 + break; 239 + } 240 + } 241 + err: 242 + closure_sync(&cl); 243 + return f->ret ?: ret; 244 + } 245 + 246 + static void bubble_up(struct found_btree_node *n, struct found_btree_node *end) 247 + { 248 + while (n + 1 < end && 249 + found_btree_node_cmp_pos(n, n + 1) > 0) { 250 + swap(n[0], n[1]); 251 + n++; 252 + } 253 + } 254 + 255 + static int handle_overwrites(struct bch_fs *c, 256 + struct found_btree_node *start, 257 + struct found_btree_node *end) 258 + { 259 + struct found_btree_node *n; 260 + again: 261 + for (n = start + 1; 262 + n < end && 263 + n->btree_id == start->btree_id && 264 + n->level == start->level && 265 + bpos_lt(n->min_key, start->max_key); 266 + n++) { 267 + int cmp = found_btree_node_cmp_time(start, n); 268 + 269 + if (cmp > 0) { 270 + if (bpos_cmp(start->max_key, n->max_key) >= 0) 271 + n->overwritten = true; 272 + else { 273 + n->range_updated = true; 274 + n->min_key = bpos_successor(start->max_key); 275 + n->range_updated = true; 276 + bubble_up(n, end); 277 + goto again; 278 + } 279 + } else if (cmp < 0) { 280 + BUG_ON(bpos_cmp(n->min_key, start->min_key) <= 0); 281 + 282 + start->max_key = bpos_predecessor(n->min_key); 283 + start->range_updated = true; 284 + } else { 285 + struct printbuf buf = PRINTBUF; 286 + 287 + prt_str(&buf, "overlapping btree nodes with same seq! halting\n "); 288 + found_btree_node_to_text(&buf, c, start); 289 + prt_str(&buf, "\n "); 290 + found_btree_node_to_text(&buf, c, n); 291 + bch_err(c, "%s", buf.buf); 292 + printbuf_exit(&buf); 293 + return -1; 294 + } 295 + } 296 + 297 + return 0; 298 + } 299 + 300 + int bch2_scan_for_btree_nodes(struct bch_fs *c) 301 + { 302 + struct find_btree_nodes *f = &c->found_btree_nodes; 303 + struct printbuf buf = PRINTBUF; 304 + size_t dst; 305 + int ret = 0; 306 + 307 + if (f->nodes.nr) 308 + return 0; 309 + 310 + mutex_init(&f->lock); 311 + 312 + ret = read_btree_nodes(f); 313 + if (ret) 314 + return ret; 315 + 316 + if (!f->nodes.nr) { 317 + bch_err(c, "%s: no btree nodes found", __func__); 318 + ret = -EINVAL; 319 + goto err; 320 + } 321 + 322 + if (0 && c->opts.verbose) { 323 + printbuf_reset(&buf); 324 + prt_printf(&buf, "%s: nodes found:\n", __func__); 325 + found_btree_nodes_to_text(&buf, c, f->nodes); 326 + bch2_print_string_as_lines(KERN_INFO, buf.buf); 327 + } 328 + 329 + sort(f->nodes.data, f->nodes.nr, sizeof(f->nodes.data[0]), found_btree_node_cmp_cookie, NULL); 330 + 331 + dst = 0; 332 + darray_for_each(f->nodes, i) { 333 + struct found_btree_node *prev = dst ? 
f->nodes.data + dst - 1 : NULL; 334 + 335 + if (prev && 336 + prev->cookie == i->cookie) { 337 + if (prev->nr_ptrs == ARRAY_SIZE(prev->ptrs)) { 338 + bch_err(c, "%s: found too many replicas for btree node", __func__); 339 + ret = -EINVAL; 340 + goto err; 341 + } 342 + prev->ptrs[prev->nr_ptrs++] = i->ptrs[0]; 343 + } else { 344 + f->nodes.data[dst++] = *i; 345 + } 346 + } 347 + f->nodes.nr = dst; 348 + 349 + sort(f->nodes.data, f->nodes.nr, sizeof(f->nodes.data[0]), found_btree_node_cmp_pos, NULL); 350 + 351 + if (0 && c->opts.verbose) { 352 + printbuf_reset(&buf); 353 + prt_printf(&buf, "%s: nodes after merging replicas:\n", __func__); 354 + found_btree_nodes_to_text(&buf, c, f->nodes); 355 + bch2_print_string_as_lines(KERN_INFO, buf.buf); 356 + } 357 + 358 + dst = 0; 359 + darray_for_each(f->nodes, i) { 360 + if (i->overwritten) 361 + continue; 362 + 363 + ret = handle_overwrites(c, i, &darray_top(f->nodes)); 364 + if (ret) 365 + goto err; 366 + 367 + BUG_ON(i->overwritten); 368 + f->nodes.data[dst++] = *i; 369 + } 370 + f->nodes.nr = dst; 371 + 372 + if (c->opts.verbose) { 373 + printbuf_reset(&buf); 374 + prt_printf(&buf, "%s: nodes found after overwrites:\n", __func__); 375 + found_btree_nodes_to_text(&buf, c, f->nodes); 376 + bch2_print_string_as_lines(KERN_INFO, buf.buf); 377 + } 378 + 379 + eytzinger0_sort(f->nodes.data, f->nodes.nr, sizeof(f->nodes.data[0]), found_btree_node_cmp_pos, NULL); 380 + err: 381 + printbuf_exit(&buf); 382 + return ret; 383 + } 384 + 385 + static int found_btree_node_range_start_cmp(const void *_l, const void *_r) 386 + { 387 + const struct found_btree_node *l = _l; 388 + const struct found_btree_node *r = _r; 389 + 390 + return cmp_int(l->btree_id, r->btree_id) ?: 391 + -cmp_int(l->level, r->level) ?: 392 + bpos_cmp(l->max_key, r->min_key); 393 + } 394 + 395 + #define for_each_found_btree_node_in_range(_f, _search, _idx) \ 396 + for (size_t _idx = eytzinger0_find_gt((_f)->nodes.data, (_f)->nodes.nr, \ 397 + sizeof((_f)->nodes.data[0]), \ 398 + found_btree_node_range_start_cmp, &search); \ 399 + _idx < (_f)->nodes.nr && \ 400 + (_f)->nodes.data[_idx].btree_id == _search.btree_id && \ 401 + (_f)->nodes.data[_idx].level == _search.level && \ 402 + bpos_lt((_f)->nodes.data[_idx].min_key, _search.max_key); \ 403 + _idx = eytzinger0_next(_idx, (_f)->nodes.nr)) 404 + 405 + bool bch2_btree_node_is_stale(struct bch_fs *c, struct btree *b) 406 + { 407 + struct find_btree_nodes *f = &c->found_btree_nodes; 408 + 409 + struct found_btree_node search = { 410 + .btree_id = b->c.btree_id, 411 + .level = b->c.level, 412 + .min_key = b->data->min_key, 413 + .max_key = b->key.k.p, 414 + }; 415 + 416 + for_each_found_btree_node_in_range(f, search, idx) 417 + if (f->nodes.data[idx].seq > BTREE_NODE_SEQ(b->data)) 418 + return true; 419 + return false; 420 + } 421 + 422 + bool bch2_btree_has_scanned_nodes(struct bch_fs *c, enum btree_id btree) 423 + { 424 + struct found_btree_node search = { 425 + .btree_id = btree, 426 + .level = 0, 427 + .min_key = POS_MIN, 428 + .max_key = SPOS_MAX, 429 + }; 430 + 431 + for_each_found_btree_node_in_range(&c->found_btree_nodes, search, idx) 432 + return true; 433 + return false; 434 + } 435 + 436 + int bch2_get_scanned_nodes(struct bch_fs *c, enum btree_id btree, 437 + unsigned level, struct bpos node_min, struct bpos node_max) 438 + { 439 + struct find_btree_nodes *f = &c->found_btree_nodes; 440 + 441 + int ret = bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_scan_for_btree_nodes); 442 + if (ret) 443 + return ret; 444 + 445 + if 
(c->opts.verbose) { 446 + struct printbuf buf = PRINTBUF; 447 + 448 + prt_printf(&buf, "recovering %s l=%u ", bch2_btree_id_str(btree), level); 449 + bch2_bpos_to_text(&buf, node_min); 450 + prt_str(&buf, " - "); 451 + bch2_bpos_to_text(&buf, node_max); 452 + 453 + bch_info(c, "%s(): %s", __func__, buf.buf); 454 + printbuf_exit(&buf); 455 + } 456 + 457 + struct found_btree_node search = { 458 + .btree_id = btree, 459 + .level = level, 460 + .min_key = node_min, 461 + .max_key = node_max, 462 + }; 463 + 464 + for_each_found_btree_node_in_range(f, search, idx) { 465 + struct found_btree_node n = f->nodes.data[idx]; 466 + 467 + n.range_updated |= bpos_lt(n.min_key, node_min); 468 + n.min_key = bpos_max(n.min_key, node_min); 469 + 470 + n.range_updated |= bpos_gt(n.max_key, node_max); 471 + n.max_key = bpos_min(n.max_key, node_max); 472 + 473 + struct { __BKEY_PADDED(k, BKEY_BTREE_PTR_VAL_U64s_MAX); } tmp; 474 + 475 + found_btree_node_to_key(&tmp.k, &n); 476 + 477 + struct printbuf buf = PRINTBUF; 478 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&tmp.k)); 479 + bch_verbose(c, "%s(): recovering %s", __func__, buf.buf); 480 + printbuf_exit(&buf); 481 + 482 + BUG_ON(bch2_bkey_invalid(c, bkey_i_to_s_c(&tmp.k), BKEY_TYPE_btree, 0, NULL)); 483 + 484 + ret = bch2_journal_key_insert(c, btree, level + 1, &tmp.k); 485 + if (ret) 486 + return ret; 487 + } 488 + 489 + return 0; 490 + } 491 + 492 + void bch2_find_btree_nodes_exit(struct find_btree_nodes *f) 493 + { 494 + darray_exit(&f->nodes); 495 + }
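When two found nodes cover overlapping ranges, handle_overwrites() above keeps the node with the higher sequence number and either clips or discards the older one; equal sequence numbers on overlapping ranges are treated as unrecoverable. The toy program below reproduces that rule for the two-node case, with integer key ranges standing in for struct bpos.

    /* Toy version of the overlap resolution in handle_overwrites() above:
     * when two found nodes cover overlapping key ranges, the older one is
     * either marked overwritten or has its range clipped.  Ranges are plain
     * integers here and only the two-node case is shown. */
    #include <stdbool.h>
    #include <stdio.h>

    struct found_node {
    	int	min_key, max_key;
    	int	seq;		/* higher == newer */
    	bool	overwritten;
    };

    static void resolve(struct found_node *a, struct found_node *b)
    {
    	if (b->min_key > a->max_key)
    		return;				/* no overlap */

    	if (a->seq > b->seq) {			/* a is newer: clip or kill b */
    		if (a->max_key >= b->max_key)
    			b->overwritten = true;
    		else
    			b->min_key = a->max_key + 1;
    	} else if (a->seq < b->seq) {		/* b is newer: clip a */
    		a->max_key = b->min_key - 1;
    	}
    	/* equal seq on overlapping ranges is treated as fatal in the real code */
    }

    int main(void)
    {
    	struct found_node a = { 0, 100, 5, false };
    	struct found_node b = { 50, 80, 3, false };

    	resolve(&a, &b);
    	printf("b overwritten: %d\n", b.overwritten);	/* fully covered by newer a */
    	return 0;
    }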
+11
fs/bcachefs/btree_node_scan.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _BCACHEFS_BTREE_NODE_SCAN_H 3 + #define _BCACHEFS_BTREE_NODE_SCAN_H 4 + 5 + int bch2_scan_for_btree_nodes(struct bch_fs *); 6 + bool bch2_btree_node_is_stale(struct bch_fs *, struct btree *); 7 + bool bch2_btree_has_scanned_nodes(struct bch_fs *, enum btree_id); 8 + int bch2_get_scanned_nodes(struct bch_fs *, enum btree_id, unsigned, struct bpos, struct bpos); 9 + void bch2_find_btree_nodes_exit(struct find_btree_nodes *); 10 + 11 + #endif /* _BCACHEFS_BTREE_NODE_SCAN_H */
+30
fs/bcachefs/btree_node_scan_types.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _BCACHEFS_BTREE_NODE_SCAN_TYPES_H 3 + #define _BCACHEFS_BTREE_NODE_SCAN_TYPES_H 4 + 5 + #include "darray.h" 6 + 7 + struct found_btree_node { 8 + bool range_updated:1; 9 + bool overwritten:1; 10 + u8 btree_id; 11 + u8 level; 12 + u32 seq; 13 + u64 cookie; 14 + 15 + struct bpos min_key; 16 + struct bpos max_key; 17 + 18 + unsigned nr_ptrs; 19 + struct bch_extent_ptr ptrs[BCH_REPLICAS_MAX]; 20 + }; 21 + 22 + typedef DARRAY(struct found_btree_node) found_btree_nodes; 23 + 24 + struct find_btree_nodes { 25 + int ret; 26 + struct mutex lock; 27 + found_btree_nodes nodes; 28 + }; 29 + 30 + #endif /* _BCACHEFS_BTREE_NODE_SCAN_TYPES_H */
+3 -2
fs/bcachefs/btree_trans_commit.c
··· 318 318 !(i->flags & BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE) && 319 319 test_bit(JOURNAL_REPLAY_DONE, &trans->c->journal.flags) && 320 320 i->k->k.p.snapshot && 321 - bch2_snapshot_is_internal_node(trans->c, i->k->k.p.snapshot)); 321 + bch2_snapshot_is_internal_node(trans->c, i->k->k.p.snapshot) > 0); 322 322 } 323 323 324 324 static __always_inline int bch2_trans_journal_res_get(struct btree_trans *trans, ··· 887 887 int ret, unsigned long trace_ip) 888 888 { 889 889 struct bch_fs *c = trans->c; 890 + enum bch_watermark watermark = flags & BCH_WATERMARK_MASK; 890 891 891 892 switch (ret) { 892 893 case -BCH_ERR_btree_insert_btree_node_full: ··· 906 905 * flag 907 906 */ 908 907 if ((flags & BCH_TRANS_COMMIT_journal_reclaim) && 909 - (flags & BCH_WATERMARK_MASK) != BCH_WATERMARK_reclaim) { 908 + watermark < BCH_WATERMARK_reclaim) { 910 909 ret = -BCH_ERR_journal_reclaim_would_deadlock; 911 910 break; 912 911 }
+6
fs/bcachefs/btree_update.c
··· 38 38 struct bkey_i *update; 39 39 int ret; 40 40 41 + if (unlikely(trans->journal_replay_not_finished)) 42 + return 0; 43 + 41 44 update = bch2_bkey_make_mut_noupdate(trans, k); 42 45 ret = PTR_ERR_OR_ZERO(update); 43 46 if (ret) ··· 71 68 { 72 69 struct bch_fs *c = trans->c; 73 70 int ret; 71 + 72 + if (unlikely(trans->journal_replay_not_finished)) 73 + return 0; 74 74 75 75 ret = bch2_key_has_snapshot_overwrites(trans, iter->btree_id, insert->k.p) ?: 76 76 bch2_key_has_snapshot_overwrites(trans, iter->btree_id, k.k->p);
+147 -65
fs/bcachefs/btree_update_interior.c
··· 2 2 3 3 #include "bcachefs.h" 4 4 #include "alloc_foreground.h" 5 + #include "bkey_buf.h" 5 6 #include "bkey_methods.h" 6 7 #include "btree_cache.h" 7 8 #include "btree_gc.h" ··· 19 18 #include "journal.h" 20 19 #include "journal_reclaim.h" 21 20 #include "keylist.h" 21 + #include "recovery_passes.h" 22 22 #include "replicas.h" 23 23 #include "super-io.h" 24 24 #include "trace.h" 25 25 26 26 #include <linux/random.h> 27 + 28 + const char * const bch2_btree_update_modes[] = { 29 + #define x(t) #t, 30 + BCH_WATERMARKS() 31 + #undef x 32 + NULL 33 + }; 27 34 28 35 static int bch2_btree_insert_node(struct btree_update *, struct btree_trans *, 29 36 btree_path_idx_t, struct btree *, struct keylist *); ··· 53 44 return path_idx; 54 45 } 55 46 56 - /* Debug code: */ 57 - 58 47 /* 59 48 * Verify that child nodes correctly span parent node's range: 60 49 */ 61 - static void btree_node_interior_verify(struct bch_fs *c, struct btree *b) 50 + int bch2_btree_node_check_topology(struct btree_trans *trans, struct btree *b) 62 51 { 63 - #ifdef CONFIG_BCACHEFS_DEBUG 64 - struct bpos next_node = b->data->min_key; 65 - struct btree_node_iter iter; 52 + struct bch_fs *c = trans->c; 53 + struct bpos node_min = b->key.k.type == KEY_TYPE_btree_ptr_v2 54 + ? bkey_i_to_btree_ptr_v2(&b->key)->v.min_key 55 + : b->data->min_key; 56 + struct btree_and_journal_iter iter; 66 57 struct bkey_s_c k; 67 - struct bkey_s_c_btree_ptr_v2 bp; 68 - struct bkey unpacked; 69 - struct printbuf buf1 = PRINTBUF, buf2 = PRINTBUF; 58 + struct printbuf buf = PRINTBUF; 59 + struct bkey_buf prev; 60 + int ret = 0; 70 61 71 - BUG_ON(!b->c.level); 62 + BUG_ON(b->key.k.type == KEY_TYPE_btree_ptr_v2 && 63 + !bpos_eq(bkey_i_to_btree_ptr_v2(&b->key)->v.min_key, 64 + b->data->min_key)); 72 65 73 - if (!test_bit(JOURNAL_REPLAY_DONE, &c->journal.flags)) 74 - return; 66 + if (!b->c.level) 67 + return 0; 75 68 76 - bch2_btree_node_iter_init_from_start(&iter, b); 69 + bch2_bkey_buf_init(&prev); 70 + bkey_init(&prev.k->k); 71 + bch2_btree_and_journal_iter_init_node_iter(trans, &iter, b); 77 72 78 - while (1) { 79 - k = bch2_btree_node_iter_peek_unpack(&iter, b, &unpacked); 73 + while ((k = bch2_btree_and_journal_iter_peek(&iter)).k) { 80 74 if (k.k->type != KEY_TYPE_btree_ptr_v2) 81 - break; 82 - bp = bkey_s_c_to_btree_ptr_v2(k); 75 + goto out; 83 76 84 - if (!bpos_eq(next_node, bp.v->min_key)) { 85 - bch2_dump_btree_node(c, b); 86 - bch2_bpos_to_text(&buf1, next_node); 87 - bch2_bpos_to_text(&buf2, bp.v->min_key); 88 - panic("expected next min_key %s got %s\n", buf1.buf, buf2.buf); 77 + struct bkey_s_c_btree_ptr_v2 bp = bkey_s_c_to_btree_ptr_v2(k); 78 + 79 + struct bpos expected_min = bkey_deleted(&prev.k->k) 80 + ? 
node_min 81 + : bpos_successor(prev.k->k.p); 82 + 83 + if (!bpos_eq(expected_min, bp.v->min_key)) { 84 + bch2_topology_error(c); 85 + 86 + printbuf_reset(&buf); 87 + prt_str(&buf, "end of prev node doesn't match start of next node\n"), 88 + prt_printf(&buf, " in btree %s level %u node ", 89 + bch2_btree_id_str(b->c.btree_id), b->c.level); 90 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&b->key)); 91 + prt_str(&buf, "\n prev "); 92 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(prev.k)); 93 + prt_str(&buf, "\n next "); 94 + bch2_bkey_val_to_text(&buf, c, k); 95 + 96 + need_fsck_err(c, btree_node_topology_bad_min_key, "%s", buf.buf); 97 + goto topology_repair; 89 98 } 90 99 91 - bch2_btree_node_iter_advance(&iter, b); 92 - 93 - if (bch2_btree_node_iter_end(&iter)) { 94 - if (!bpos_eq(k.k->p, b->key.k.p)) { 95 - bch2_dump_btree_node(c, b); 96 - bch2_bpos_to_text(&buf1, b->key.k.p); 97 - bch2_bpos_to_text(&buf2, k.k->p); 98 - panic("expected end %s got %s\n", buf1.buf, buf2.buf); 99 - } 100 - break; 101 - } 102 - 103 - next_node = bpos_successor(k.k->p); 100 + bch2_bkey_buf_reassemble(&prev, c, k); 101 + bch2_btree_and_journal_iter_advance(&iter); 104 102 } 105 - #endif 103 + 104 + if (bkey_deleted(&prev.k->k)) { 105 + bch2_topology_error(c); 106 + 107 + printbuf_reset(&buf); 108 + prt_str(&buf, "empty interior node\n"); 109 + prt_printf(&buf, " in btree %s level %u node ", 110 + bch2_btree_id_str(b->c.btree_id), b->c.level); 111 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&b->key)); 112 + 113 + need_fsck_err(c, btree_node_topology_empty_interior_node, "%s", buf.buf); 114 + goto topology_repair; 115 + } else if (!bpos_eq(prev.k->k.p, b->key.k.p)) { 116 + bch2_topology_error(c); 117 + 118 + printbuf_reset(&buf); 119 + prt_str(&buf, "last child node doesn't end at end of parent node\n"); 120 + prt_printf(&buf, " in btree %s level %u node ", 121 + bch2_btree_id_str(b->c.btree_id), b->c.level); 122 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&b->key)); 123 + prt_str(&buf, "\n last key "); 124 + bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(prev.k)); 125 + 126 + need_fsck_err(c, btree_node_topology_bad_max_key, "%s", buf.buf); 127 + goto topology_repair; 128 + } 129 + out: 130 + fsck_err: 131 + bch2_btree_and_journal_iter_exit(&iter); 132 + bch2_bkey_buf_exit(&prev, c); 133 + printbuf_exit(&buf); 134 + return ret; 135 + topology_repair: 136 + if ((c->recovery_passes_explicit & BIT_ULL(BCH_RECOVERY_PASS_check_topology)) && 137 + c->curr_recovery_pass > BCH_RECOVERY_PASS_check_topology) { 138 + bch2_inconsistent_error(c); 139 + ret = -BCH_ERR_btree_need_topology_repair; 140 + } else { 141 + ret = bch2_run_explicit_recovery_pass(c, BCH_RECOVERY_PASS_check_topology); 142 + } 143 + goto out; 106 144 } 107 145 108 146 /* Calculate ideal packed bkey format for new btree nodes: */ ··· 310 254 struct open_buckets obs = { .nr = 0 }; 311 255 struct bch_devs_list devs_have = (struct bch_devs_list) { 0 }; 312 256 enum bch_watermark watermark = flags & BCH_WATERMARK_MASK; 313 - unsigned nr_reserve = watermark > BCH_WATERMARK_reclaim 257 + unsigned nr_reserve = watermark < BCH_WATERMARK_reclaim 314 258 ? BTREE_NODE_RESERVE 315 259 : 0; 316 260 int ret; ··· 694 638 * which may require allocations as well. 
695 639 */ 696 640 ret = commit_do(trans, &as->disk_res, &journal_seq, 697 - BCH_WATERMARK_reclaim| 641 + BCH_WATERMARK_interior_updates| 698 642 BCH_TRANS_COMMIT_no_enospc| 699 643 BCH_TRANS_COMMIT_no_check_rw| 700 644 BCH_TRANS_COMMIT_journal_reclaim, ··· 853 797 mutex_lock(&c->btree_interior_update_lock); 854 798 list_add_tail(&as->unwritten_list, &c->btree_interior_updates_unwritten); 855 799 856 - BUG_ON(as->mode != BTREE_INTERIOR_NO_UPDATE); 800 + BUG_ON(as->mode != BTREE_UPDATE_none); 857 801 BUG_ON(!btree_node_dirty(b)); 858 802 BUG_ON(!b->c.level); 859 803 860 - as->mode = BTREE_INTERIOR_UPDATING_NODE; 804 + as->mode = BTREE_UPDATE_node; 861 805 as->b = b; 862 806 863 807 set_btree_node_write_blocked(b); ··· 880 824 lockdep_assert_held(&c->btree_interior_update_lock); 881 825 882 826 child->b = NULL; 883 - child->mode = BTREE_INTERIOR_UPDATING_AS; 827 + child->mode = BTREE_UPDATE_update; 884 828 885 829 bch2_journal_pin_copy(&c->journal, &as->journal, &child->journal, 886 830 bch2_update_reparent_journal_pin_flush); ··· 891 835 struct bkey_i *insert = &b->key; 892 836 struct bch_fs *c = as->c; 893 837 894 - BUG_ON(as->mode != BTREE_INTERIOR_NO_UPDATE); 838 + BUG_ON(as->mode != BTREE_UPDATE_none); 895 839 896 840 BUG_ON(as->journal_u64s + jset_u64s(insert->k.u64s) > 897 841 ARRAY_SIZE(as->journal_entries)); ··· 905 849 mutex_lock(&c->btree_interior_update_lock); 906 850 list_add_tail(&as->unwritten_list, &c->btree_interior_updates_unwritten); 907 851 908 - as->mode = BTREE_INTERIOR_UPDATING_ROOT; 852 + as->mode = BTREE_UPDATE_root; 909 853 mutex_unlock(&c->btree_interior_update_lock); 910 854 } 911 855 ··· 1083 1027 struct bch_fs *c = as->c; 1084 1028 u64 start_time = as->start_time; 1085 1029 1086 - BUG_ON(as->mode == BTREE_INTERIOR_NO_UPDATE); 1030 + BUG_ON(as->mode == BTREE_UPDATE_none); 1087 1031 1088 1032 if (as->took_gc_lock) 1089 1033 up_read(&as->c->gc_lock); ··· 1128 1072 unsigned journal_flags = watermark|JOURNAL_RES_GET_CHECK; 1129 1073 1130 1074 if ((flags & BCH_TRANS_COMMIT_journal_reclaim) && 1131 - watermark != BCH_WATERMARK_reclaim) 1075 + watermark < BCH_WATERMARK_reclaim) 1132 1076 journal_flags |= JOURNAL_RES_GET_NONBLOCK; 1133 1077 1134 1078 ret = drop_locks_do(trans, ··· 1179 1123 as->c = c; 1180 1124 as->start_time = start_time; 1181 1125 as->ip_started = _RET_IP_; 1182 - as->mode = BTREE_INTERIOR_NO_UPDATE; 1126 + as->mode = BTREE_UPDATE_none; 1127 + as->watermark = watermark; 1183 1128 as->took_gc_lock = true; 1184 1129 as->btree_id = path->btree_id; 1185 1130 as->update_level = update_level; ··· 1225 1168 */ 1226 1169 if (bch2_err_matches(ret, ENOSPC) && 1227 1170 (flags & BCH_TRANS_COMMIT_journal_reclaim) && 1228 - watermark != BCH_WATERMARK_reclaim) { 1171 + watermark < BCH_WATERMARK_reclaim) { 1229 1172 ret = -BCH_ERR_journal_reclaim_would_deadlock; 1230 1173 goto err; 1231 1174 } ··· 1437 1380 if (bkey_deleted(k)) 1438 1381 continue; 1439 1382 1383 + uk = bkey_unpack_key(b, k); 1384 + 1385 + if (b->c.level && 1386 + u64s < n1_u64s && 1387 + u64s + k->u64s >= n1_u64s && 1388 + bch2_key_deleted_in_journal(trans, b->c.btree_id, b->c.level, uk.p)) 1389 + n1_u64s += k->u64s; 1390 + 1440 1391 i = u64s >= n1_u64s; 1441 1392 u64s += k->u64s; 1442 - uk = bkey_unpack_key(b, k); 1443 1393 if (!i) 1444 1394 n1_pos = uk.p; 1445 1395 bch2_bkey_format_add_key(&format[i], &uk); ··· 1505 1441 1506 1442 bch2_verify_btree_nr_keys(n[i]); 1507 1443 1508 - if (b->c.level) 1509 - btree_node_interior_verify(as->c, n[i]); 1444 + BUG_ON(bch2_btree_node_check_topology(trans, 
n[i])); 1510 1445 } 1511 1446 } 1512 1447 ··· 1536 1473 1537 1474 __bch2_btree_insert_keys_interior(as, trans, path, b, node_iter, keys); 1538 1475 1539 - btree_node_interior_verify(as->c, b); 1476 + BUG_ON(bch2_btree_node_check_topology(trans, b)); 1540 1477 } 1541 1478 } 1542 1479 ··· 1551 1488 u64 start_time = local_clock(); 1552 1489 int ret = 0; 1553 1490 1491 + bch2_verify_btree_nr_keys(b); 1554 1492 BUG_ON(!parent && (b != btree_node_root(c, b))); 1555 1493 BUG_ON(parent && !btree_node_intent_locked(trans->paths + path, b->c.level + 1)); 1494 + 1495 + ret = bch2_btree_node_check_topology(trans, b); 1496 + if (ret) 1497 + return ret; 1556 1498 1557 1499 bch2_btree_interior_update_will_free_node(as, b); 1558 1500 ··· 1778 1710 goto split; 1779 1711 } 1780 1712 1781 - btree_node_interior_verify(c, b); 1713 + ret = bch2_btree_node_check_topology(trans, b); 1714 + if (ret) { 1715 + bch2_btree_node_unlock_write(trans, path, b); 1716 + return ret; 1717 + } 1782 1718 1783 1719 bch2_btree_insert_keys_interior(as, trans, path, b, keys); 1784 1720 ··· 1800 1728 1801 1729 bch2_btree_node_unlock_write(trans, path, b); 1802 1730 1803 - btree_node_interior_verify(c, b); 1731 + BUG_ON(bch2_btree_node_check_topology(trans, b)); 1804 1732 return 0; 1805 1733 split: 1806 1734 /* ··· 1890 1818 { 1891 1819 struct bch_fs *c = trans->c; 1892 1820 struct btree *b = bch2_btree_id_root(c, trans->paths[path].btree_id)->b; 1821 + 1822 + if (btree_node_fake(b)) 1823 + return bch2_btree_split_leaf(trans, path, flags); 1824 + 1893 1825 struct btree_update *as = 1894 - bch2_btree_update_start(trans, trans->paths + path, 1895 - b->c.level, true, flags); 1826 + bch2_btree_update_start(trans, trans->paths + path, b->c.level, true, flags); 1896 1827 if (IS_ERR(as)) 1897 1828 return PTR_ERR(as); 1898 1829 ··· 2466 2391 bch2_btree_set_root_inmem(c, b); 2467 2392 } 2468 2393 2469 - static int __bch2_btree_root_alloc(struct btree_trans *trans, enum btree_id id) 2394 + static int __bch2_btree_root_alloc_fake(struct btree_trans *trans, enum btree_id id, unsigned level) 2470 2395 { 2471 2396 struct bch_fs *c = trans->c; 2472 2397 struct closure cl; ··· 2485 2410 2486 2411 set_btree_node_fake(b); 2487 2412 set_btree_node_need_rewrite(b); 2488 - b->c.level = 0; 2413 + b->c.level = level; 2489 2414 b->c.btree_id = id; 2490 2415 2491 2416 bkey_btree_ptr_init(&b->key); ··· 2512 2437 return 0; 2513 2438 } 2514 2439 2515 - void bch2_btree_root_alloc(struct bch_fs *c, enum btree_id id) 2440 + void bch2_btree_root_alloc_fake(struct bch_fs *c, enum btree_id id, unsigned level) 2516 2441 { 2517 - bch2_trans_run(c, __bch2_btree_root_alloc(trans, id)); 2442 + bch2_trans_run(c, __bch2_btree_root_alloc_fake(trans, id, level)); 2443 + } 2444 + 2445 + static void bch2_btree_update_to_text(struct printbuf *out, struct btree_update *as) 2446 + { 2447 + prt_printf(out, "%ps: btree=%s watermark=%s mode=%s nodes_written=%u cl.remaining=%u journal_seq=%llu\n", 2448 + (void *) as->ip_started, 2449 + bch2_btree_id_str(as->btree_id), 2450 + bch2_watermarks[as->watermark], 2451 + bch2_btree_update_modes[as->mode], 2452 + as->nodes_written, 2453 + closure_nr_remaining(&as->cl), 2454 + as->journal.seq); 2518 2455 } 2519 2456 2520 2457 void bch2_btree_updates_to_text(struct printbuf *out, struct bch_fs *c) ··· 2535 2448 2536 2449 mutex_lock(&c->btree_interior_update_lock); 2537 2450 list_for_each_entry(as, &c->btree_interior_update_list, list) 2538 - prt_printf(out, "%ps: mode=%u nodes_written=%u cl.remaining=%u journal_seq=%llu\n", 2539 - (void *) 
as->ip_started, 2540 - as->mode, 2541 - as->nodes_written, 2542 - closure_nr_remaining(&as->cl), 2543 - as->journal.seq); 2451 + bch2_btree_update_to_text(out, as); 2544 2452 mutex_unlock(&c->btree_interior_update_lock); 2545 2453 } 2546 2454
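Editor's note: the new bch2_btree_node_check_topology() above enforces an interval-tiling invariant on interior nodes: the first child starts at the node's min_key, each later child starts at the successor of the previous child's max key, and the last child ends at the node's max key. A minimal standalone sketch of that check, using plain integers instead of struct bpos and invented names (struct child_range, check_topology), not the bcachefs code itself:

#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified child range: [min, max], both inclusive. */
struct child_range {
        unsigned long long min;
        unsigned long long max;
};

/*
 * Check that consecutive children tile the parent's range with no gaps or
 * overlaps: the first child starts at parent_min, each following child
 * starts at the successor of the previous child's end, and the last child
 * ends at parent_max. Same idea as the check in the hunk above, minus
 * bpos/snapshot details and the fsck error reporting.
 */
static bool check_topology(unsigned long long parent_min,
                           unsigned long long parent_max,
                           const struct child_range *children, size_t nr)
{
        unsigned long long expected_min = parent_min;

        if (!nr)
                return false;           /* empty interior node */

        for (size_t i = 0; i < nr; i++) {
                if (children[i].min != expected_min)
                        return false;   /* gap or overlap with previous child */
                expected_min = children[i].max + 1;
        }

        return children[nr - 1].max == parent_max;
}

int main(void)
{
        struct child_range ok[]  = { { 0, 9 }, { 10, 19 }, { 20, 31 } };
        struct child_range gap[] = { { 0, 9 }, { 11, 31 } };

        printf("%d %d\n",
               check_topology(0, 31, ok,  3),   /* 1: ranges tile [0,31] */
               check_topology(0, 31, gap, 2));  /* 0: key 10 is uncovered */
        return 0;
}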
+18 -10
fs/bcachefs/btree_update_interior.h
··· 10 10 11 11 #define BTREE_UPDATE_JOURNAL_RES (BTREE_UPDATE_NODES_MAX * (BKEY_BTREE_PTR_U64s_MAX + 1)) 12 12 13 + int bch2_btree_node_check_topology(struct btree_trans *, struct btree *); 14 + 15 + #define BTREE_UPDATE_MODES() \ 16 + x(none) \ 17 + x(node) \ 18 + x(root) \ 19 + x(update) 20 + 21 + enum btree_update_mode { 22 + #define x(n) BTREE_UPDATE_##n, 23 + BTREE_UPDATE_MODES() 24 + #undef x 25 + }; 26 + 13 27 /* 14 28 * Tracks an in progress split/rewrite of a btree node and the update to the 15 29 * parent node: ··· 51 37 struct list_head list; 52 38 struct list_head unwritten_list; 53 39 54 - /* What kind of update are we doing? */ 55 - enum { 56 - BTREE_INTERIOR_NO_UPDATE, 57 - BTREE_INTERIOR_UPDATING_NODE, 58 - BTREE_INTERIOR_UPDATING_ROOT, 59 - BTREE_INTERIOR_UPDATING_AS, 60 - } mode; 61 - 40 + enum btree_update_mode mode; 41 + enum bch_watermark watermark; 62 42 unsigned nodes_written:1; 63 43 unsigned took_gc_lock:1; 64 44 ··· 62 54 struct disk_reservation disk_res; 63 55 64 56 /* 65 - * BTREE_INTERIOR_UPDATING_NODE: 57 + * BTREE_UPDATE_node: 66 58 * The update that made the new nodes visible was a regular update to an 67 59 * existing interior node - @b. We can't write out the update to @b 68 60 * until the new nodes we created are finished writing, so we block @b ··· 171 163 struct bkey_i *, unsigned, bool); 172 164 173 165 void bch2_btree_set_root_for_read(struct bch_fs *, struct btree *); 174 - void bch2_btree_root_alloc(struct bch_fs *, enum btree_id); 166 + void bch2_btree_root_alloc_fake(struct bch_fs *, enum btree_id, unsigned); 175 167 176 168 static inline unsigned btree_update_reserve_required(struct bch_fs *c, 177 169 struct btree *b)
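Editor's note: BTREE_UPDATE_MODES() above is an x-macro: the list of variants is written once and expanded twice, once into enum btree_update_mode here and once into the bch2_btree_update_modes string table in the .c file, so the enum and its names cannot drift apart. A self-contained sketch of the same pattern, with illustrative names rather than the bcachefs ones:

#include <stdio.h>

/* List the variants exactly once. */
#define UPDATE_MODES()  \
        x(none)         \
        x(node)         \
        x(root)         \
        x(update)

/* First expansion: an enum. */
enum update_mode {
#define x(n)    MODE_##n,
        UPDATE_MODES()
#undef x
        MODE_NR
};

/* Second expansion: a matching string table. */
static const char * const update_mode_strs[] = {
#define x(n)    #n,
        UPDATE_MODES()
#undef x
        NULL
};

int main(void)
{
        enum update_mode m = MODE_root;

        printf("mode %d = %s\n", (int) m, update_mode_strs[m]);  /* "mode 2 = root" */
        return 0;
}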
+14
fs/bcachefs/btree_write_buffer.c
··· 11 11 #include "journal_reclaim.h" 12 12 13 13 #include <linux/prefetch.h> 14 + #include <linux/sort.h> 14 15 15 16 static int bch2_btree_write_buffer_journal_flush(struct journal *, 16 17 struct journal_entry_pin *, u64); ··· 45 44 #else 46 45 return __wb_key_ref_cmp(l, r); 47 46 #endif 47 + } 48 + 49 + static int wb_key_seq_cmp(const void *_l, const void *_r) 50 + { 51 + const struct btree_write_buffered_key *l = _l; 52 + const struct btree_write_buffered_key *r = _r; 53 + 54 + return cmp_int(l->journal_seq, r->journal_seq); 48 55 } 49 56 50 57 /* Compare excluding idx, the low 24 bits: */ ··· 365 356 * we can skip those here. 366 357 */ 367 358 trace_and_count(c, write_buffer_flush_slowpath, trans, slowpath, wb->flushing.keys.nr); 359 + 360 + sort(wb->flushing.keys.data, 361 + wb->flushing.keys.nr, 362 + sizeof(wb->flushing.keys.data[0]), 363 + wb_key_seq_cmp, NULL); 368 364 369 365 darray_for_each(wb->flushing.keys, i) { 370 366 if (!i->journal_seq)
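Editor's note: the write buffer slowpath above now sorts wb->flushing.keys by journal_seq via wb_key_seq_cmp() before the flush loop, so keys are processed in journal order. A hedged userspace sketch of the same comparator idea, using qsort() and a subtraction-free three-way compare in place of the kernel's sort() and cmp_int(); struct wb_key here is an invented stand-in:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Illustrative stand-in for a write-buffered key: only the field we sort on. */
struct wb_key {
        uint64_t journal_seq;
        uint64_t key;
};

/*
 * Three-way compare that avoids overflow from subtracting u64s - the same
 * idea as the kernel's cmp_int() used by wb_key_seq_cmp in the patch.
 */
static int wb_key_seq_cmp(const void *_l, const void *_r)
{
        const struct wb_key *l = _l, *r = _r;

        return (l->journal_seq > r->journal_seq) -
               (l->journal_seq < r->journal_seq);
}

int main(void)
{
        struct wb_key keys[] = {
                { .journal_seq = 42, .key = 3 },
                { .journal_seq =  7, .key = 1 },
                { .journal_seq = 19, .key = 2 },
        };

        /* Flush oldest journal sequence numbers first. */
        qsort(keys, 3, sizeof(keys[0]), wb_key_seq_cmp);

        for (int i = 0; i < 3; i++)
                printf("seq %llu\n", (unsigned long long) keys[i].journal_seq);
        return 0;
}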
+7 -5
fs/bcachefs/buckets.c
··· 525 525 "different types of data in same bucket: %s, %s", 526 526 bch2_data_type_str(g->data_type), 527 527 bch2_data_type_str(data_type))) { 528 + BUG(); 528 529 ret = -EIO; 529 530 goto err; 530 531 } ··· 629 628 bch2_data_type_str(ptr_data_type), 630 629 (printbuf_reset(&buf), 631 630 bch2_bkey_val_to_text(&buf, c, k), buf.buf)); 631 + BUG(); 632 632 ret = -EIO; 633 633 goto err; 634 634 } ··· 817 815 static int bch2_trigger_pointer(struct btree_trans *trans, 818 816 enum btree_id btree_id, unsigned level, 819 817 struct bkey_s_c k, struct extent_ptr_decoded p, 820 - s64 *sectors, 821 - unsigned flags) 818 + const union bch_extent_entry *entry, 819 + s64 *sectors, unsigned flags) 822 820 { 823 821 bool insert = !(flags & BTREE_TRIGGER_OVERWRITE); 824 822 struct bpos bucket; 825 823 struct bch_backpointer bp; 826 824 827 - bch2_extent_ptr_to_bp(trans->c, btree_id, level, k, p, &bucket, &bp); 825 + bch2_extent_ptr_to_bp(trans->c, btree_id, level, k, p, entry, &bucket, &bp); 828 826 *sectors = insert ? bp.bucket_len : -((s64) bp.bucket_len); 829 827 830 828 if (flags & BTREE_TRIGGER_TRANSACTIONAL) { ··· 853 851 if (flags & BTREE_TRIGGER_GC) { 854 852 struct bch_fs *c = trans->c; 855 853 struct bch_dev *ca = bch_dev_bkey_exists(c, p.ptr.dev); 856 - enum bch_data_type data_type = bkey_ptr_data_type(btree_id, level, k, p); 854 + enum bch_data_type data_type = bch2_bkey_ptr_data_type(k, p, entry); 857 855 858 856 percpu_down_read(&c->mark_lock); 859 857 struct bucket *g = PTR_GC_BUCKET(ca, &p.ptr); ··· 981 979 982 980 bkey_for_each_ptr_decode(k.k, ptrs, p, entry) { 983 981 s64 disk_sectors; 984 - ret = bch2_trigger_pointer(trans, btree_id, level, k, p, &disk_sectors, flags); 982 + ret = bch2_trigger_pointer(trans, btree_id, level, k, p, entry, &disk_sectors, flags); 985 983 if (ret < 0) 986 984 return ret; 987 985
+1
fs/bcachefs/buckets.h
··· 226 226 fallthrough; 227 227 case BCH_WATERMARK_btree_copygc: 228 228 case BCH_WATERMARK_reclaim: 229 + case BCH_WATERMARK_interior_updates: 229 230 break; 230 231 } 231 232
+1 -1
fs/bcachefs/chardev.c
··· 7 7 #include "chardev.h" 8 8 #include "journal.h" 9 9 #include "move.h" 10 - #include "recovery.h" 10 + #include "recovery_passes.h" 11 11 #include "replicas.h" 12 12 #include "super.h" 13 13 #include "super-io.h"
+10 -2
fs/bcachefs/data_update.c
··· 14 14 #include "move.h" 15 15 #include "nocow_locking.h" 16 16 #include "rebalance.h" 17 + #include "snapshot.h" 17 18 #include "subvolume.h" 18 19 #include "trace.h" 19 20 ··· 510 509 unsigned ptrs_locked = 0; 511 510 int ret = 0; 512 511 512 + /* 513 + * fs is corrupt we have a key for a snapshot node that doesn't exist, 514 + * and we have to check for this because we go rw before repairing the 515 + * snapshots table - just skip it, we can move it later. 516 + */ 517 + if (unlikely(k.k->p.snapshot && !bch2_snapshot_equiv(c, k.k->p.snapshot))) 518 + return -BCH_ERR_data_update_done; 519 + 513 520 bch2_bkey_buf_init(&m->k); 514 521 bch2_bkey_buf_reassemble(&m->k, c, k); 515 522 m->btree_id = btree_id; ··· 580 571 move_ctxt_wait_event(ctxt, 581 572 (locked = bch2_bucket_nocow_trylock(&c->nocow_locks, 582 573 PTR_BUCKET_POS(c, &p.ptr), 0)) || 583 - (!atomic_read(&ctxt->read_sectors) && 584 - !atomic_read(&ctxt->write_sectors))); 574 + list_empty(&ctxt->ios)); 585 575 586 576 if (!locked) 587 577 bch2_bucket_nocow_lock(&c->nocow_locks,
+2 -1
fs/bcachefs/errcode.h
··· 252 252 x(BCH_ERR_nopromote, nopromote_in_flight) \ 253 253 x(BCH_ERR_nopromote, nopromote_no_writes) \ 254 254 x(BCH_ERR_nopromote, nopromote_enomem) \ 255 - x(0, need_inode_lock) 255 + x(0, need_inode_lock) \ 256 + x(0, invalid_snapshot_node) 256 257 257 258 enum bch_errcode { 258 259 BCH_ERR_START = 2048,
+4 -2
fs/bcachefs/error.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 2 2 #include "bcachefs.h" 3 3 #include "error.h" 4 - #include "recovery.h" 4 + #include "journal.h" 5 + #include "recovery_passes.h" 5 6 #include "super.h" 6 7 #include "thread_with_file.h" 7 8 ··· 17 16 return false; 18 17 case BCH_ON_ERROR_ro: 19 18 if (bch2_fs_emergency_read_only(c)) 20 - bch_err(c, "inconsistency detected - emergency read only"); 19 + bch_err(c, "inconsistency detected - emergency read only at journal seq %llu", 20 + journal_cur_seq(&c->journal)); 21 21 return true; 22 22 case BCH_ON_ERROR_panic: 23 23 panic(bch2_fmt(c, "panic after error"));
+6
fs/bcachefs/error.h
··· 32 32 33 33 int bch2_topology_error(struct bch_fs *); 34 34 35 + #define bch2_fs_topology_error(c, ...) \ 36 + ({ \ 37 + bch_err(c, "btree topology error: " __VA_ARGS__); \ 38 + bch2_topology_error(c); \ 39 + }) 40 + 35 41 #define bch2_fs_inconsistent(c, ...) \ 36 42 ({ \ 37 43 bch_err(c, __VA_ARGS__); \
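Editor's note: bch2_fs_topology_error() above is written as a GNU statement expression, so the macro both logs and evaluates to the return value of the inner bch2_topology_error() call. A small sketch of that pattern with invented names (log_and_count, count_error), assuming GCC/Clang extensions as the kernel does:

#include <stdio.h>

static int errors;

static int count_error(void)
{
        return ++errors;
}

/*
 * GNU statement expression: the ({ ... }) block is an expression whose
 * value is its last statement, so callers can use the macro where a
 * plain function call would go and still get the logging side effect.
 */
#define log_and_count(fmt, ...)                                         \
({                                                                      \
        fprintf(stderr, "error: " fmt "\n", ##__VA_ARGS__);             \
        count_error();                                                  \
})

int main(void)
{
        int n = log_and_count("bad node at %u", 42);    /* logs, then n == 1 */

        printf("errors so far: %d\n", n);
        return 0;
}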
+35 -26
fs/bcachefs/extents.c
··· 189 189 enum bkey_invalid_flags flags, 190 190 struct printbuf *err) 191 191 { 192 + struct bkey_s_c_btree_ptr_v2 bp = bkey_s_c_to_btree_ptr_v2(k); 192 193 int ret = 0; 193 194 194 - bkey_fsck_err_on(bkey_val_u64s(k.k) > BKEY_BTREE_PTR_VAL_U64s_MAX, c, err, 195 - btree_ptr_v2_val_too_big, 195 + bkey_fsck_err_on(bkey_val_u64s(k.k) > BKEY_BTREE_PTR_VAL_U64s_MAX, 196 + c, err, btree_ptr_v2_val_too_big, 196 197 "value too big (%zu > %zu)", 197 198 bkey_val_u64s(k.k), BKEY_BTREE_PTR_VAL_U64s_MAX); 199 + 200 + bkey_fsck_err_on(bpos_ge(bp.v->min_key, bp.k->p), 201 + c, err, btree_ptr_v2_min_key_bad, 202 + "min_key > key"); 198 203 199 204 ret = bch2_bkey_ptrs_invalid(c, k, flags, err); 200 205 fsck_err: ··· 978 973 return bkey_deleted(k.k); 979 974 } 980 975 976 + void bch2_extent_ptr_to_text(struct printbuf *out, struct bch_fs *c, const struct bch_extent_ptr *ptr) 977 + { 978 + struct bch_dev *ca = c && ptr->dev < c->sb.nr_devices && c->devs[ptr->dev] 979 + ? bch_dev_bkey_exists(c, ptr->dev) 980 + : NULL; 981 + 982 + if (!ca) { 983 + prt_printf(out, "ptr: %u:%llu gen %u%s", ptr->dev, 984 + (u64) ptr->offset, ptr->gen, 985 + ptr->cached ? " cached" : ""); 986 + } else { 987 + u32 offset; 988 + u64 b = sector_to_bucket_and_offset(ca, ptr->offset, &offset); 989 + 990 + prt_printf(out, "ptr: %u:%llu:%u gen %u", 991 + ptr->dev, b, offset, ptr->gen); 992 + if (ptr->cached) 993 + prt_str(out, " cached"); 994 + if (ptr->unwritten) 995 + prt_str(out, " unwritten"); 996 + if (ca && ptr_stale(ca, ptr)) 997 + prt_printf(out, " stale"); 998 + } 999 + } 1000 + 981 1001 void bch2_bkey_ptrs_to_text(struct printbuf *out, struct bch_fs *c, 982 1002 struct bkey_s_c k) 983 1003 { ··· 1018 988 prt_printf(out, " "); 1019 989 1020 990 switch (__extent_entry_type(entry)) { 1021 - case BCH_EXTENT_ENTRY_ptr: { 1022 - const struct bch_extent_ptr *ptr = entry_to_ptr(entry); 1023 - struct bch_dev *ca = c && ptr->dev < c->sb.nr_devices && c->devs[ptr->dev] 1024 - ? bch_dev_bkey_exists(c, ptr->dev) 1025 - : NULL; 1026 - 1027 - if (!ca) { 1028 - prt_printf(out, "ptr: %u:%llu gen %u%s", ptr->dev, 1029 - (u64) ptr->offset, ptr->gen, 1030 - ptr->cached ? " cached" : ""); 1031 - } else { 1032 - u32 offset; 1033 - u64 b = sector_to_bucket_and_offset(ca, ptr->offset, &offset); 1034 - 1035 - prt_printf(out, "ptr: %u:%llu:%u gen %u", 1036 - ptr->dev, b, offset, ptr->gen); 1037 - if (ptr->cached) 1038 - prt_str(out, " cached"); 1039 - if (ptr->unwritten) 1040 - prt_str(out, " unwritten"); 1041 - if (ca && ptr_stale(ca, ptr)) 1042 - prt_printf(out, " stale"); 1043 - } 991 + case BCH_EXTENT_ENTRY_ptr: 992 + bch2_extent_ptr_to_text(out, c, entry_to_ptr(entry)); 1044 993 break; 1045 - } 994 + 1046 995 case BCH_EXTENT_ENTRY_crc32: 1047 996 case BCH_EXTENT_ENTRY_crc64: 1048 997 case BCH_EXTENT_ENTRY_crc128: {
+1 -24
fs/bcachefs/extents.h
··· 596 596 return ret; 597 597 } 598 598 599 - static inline unsigned bch2_bkey_ptr_data_type(struct bkey_s_c k, const struct bch_extent_ptr *ptr) 600 - { 601 - switch (k.k->type) { 602 - case KEY_TYPE_btree_ptr: 603 - case KEY_TYPE_btree_ptr_v2: 604 - return BCH_DATA_btree; 605 - case KEY_TYPE_extent: 606 - case KEY_TYPE_reflink_v: 607 - return BCH_DATA_user; 608 - case KEY_TYPE_stripe: { 609 - struct bkey_s_c_stripe s = bkey_s_c_to_stripe(k); 610 - 611 - BUG_ON(ptr < s.v->ptrs || 612 - ptr >= s.v->ptrs + s.v->nr_blocks); 613 - 614 - return ptr >= s.v->ptrs + s.v->nr_blocks - s.v->nr_redundant 615 - ? BCH_DATA_parity 616 - : BCH_DATA_user; 617 - } 618 - default: 619 - BUG(); 620 - } 621 - } 622 - 623 599 unsigned bch2_bkey_nr_ptrs(struct bkey_s_c); 624 600 unsigned bch2_bkey_nr_ptrs_allocated(struct bkey_s_c); 625 601 unsigned bch2_bkey_nr_ptrs_fully_allocated(struct bkey_s_c); ··· 676 700 void bch2_extent_ptr_set_cached(struct bkey_s, struct bch_extent_ptr *); 677 701 678 702 bool bch2_extent_normalize(struct bch_fs *, struct bkey_s); 703 + void bch2_extent_ptr_to_text(struct printbuf *out, struct bch_fs *, const struct bch_extent_ptr *); 679 704 void bch2_bkey_ptrs_to_text(struct printbuf *, struct bch_fs *, 680 705 struct bkey_s_c); 681 706 int bch2_bkey_ptrs_invalid(struct bch_fs *, struct bkey_s_c,
+234
fs/bcachefs/eytzinger.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + #include "eytzinger.h" 4 + 5 + /** 6 + * is_aligned - is this pointer & size okay for word-wide copying? 7 + * @base: pointer to data 8 + * @size: size of each element 9 + * @align: required alignment (typically 4 or 8) 10 + * 11 + * Returns true if elements can be copied using word loads and stores. 12 + * The size must be a multiple of the alignment, and the base address must 13 + * be if we do not have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS. 14 + * 15 + * For some reason, gcc doesn't know to optimize "if (a & mask || b & mask)" 16 + * to "if ((a | b) & mask)", so we do that by hand. 17 + */ 18 + __attribute_const__ __always_inline 19 + static bool is_aligned(const void *base, size_t size, unsigned char align) 20 + { 21 + unsigned char lsbits = (unsigned char)size; 22 + 23 + (void)base; 24 + #ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS 25 + lsbits |= (unsigned char)(uintptr_t)base; 26 + #endif 27 + return (lsbits & (align - 1)) == 0; 28 + } 29 + 30 + /** 31 + * swap_words_32 - swap two elements in 32-bit chunks 32 + * @a: pointer to the first element to swap 33 + * @b: pointer to the second element to swap 34 + * @n: element size (must be a multiple of 4) 35 + * 36 + * Exchange the two objects in memory. This exploits base+index addressing, 37 + * which basically all CPUs have, to minimize loop overhead computations. 38 + * 39 + * For some reason, on x86 gcc 7.3.0 adds a redundant test of n at the 40 + * bottom of the loop, even though the zero flag is still valid from the 41 + * subtract (since the intervening mov instructions don't alter the flags). 42 + * Gcc 8.1.0 doesn't have that problem. 43 + */ 44 + static void swap_words_32(void *a, void *b, size_t n) 45 + { 46 + do { 47 + u32 t = *(u32 *)(a + (n -= 4)); 48 + *(u32 *)(a + n) = *(u32 *)(b + n); 49 + *(u32 *)(b + n) = t; 50 + } while (n); 51 + } 52 + 53 + /** 54 + * swap_words_64 - swap two elements in 64-bit chunks 55 + * @a: pointer to the first element to swap 56 + * @b: pointer to the second element to swap 57 + * @n: element size (must be a multiple of 8) 58 + * 59 + * Exchange the two objects in memory. This exploits base+index 60 + * addressing, which basically all CPUs have, to minimize loop overhead 61 + * computations. 62 + * 63 + * We'd like to use 64-bit loads if possible. If they're not, emulating 64 + * one requires base+index+4 addressing which x86 has but most other 65 + * processors do not. If CONFIG_64BIT, we definitely have 64-bit loads, 66 + * but it's possible to have 64-bit loads without 64-bit pointers (e.g. 67 + * x32 ABI). Are there any cases the kernel needs to worry about? 68 + */ 69 + static void swap_words_64(void *a, void *b, size_t n) 70 + { 71 + do { 72 + #ifdef CONFIG_64BIT 73 + u64 t = *(u64 *)(a + (n -= 8)); 74 + *(u64 *)(a + n) = *(u64 *)(b + n); 75 + *(u64 *)(b + n) = t; 76 + #else 77 + /* Use two 32-bit transfers to avoid base+index+4 addressing */ 78 + u32 t = *(u32 *)(a + (n -= 4)); 79 + *(u32 *)(a + n) = *(u32 *)(b + n); 80 + *(u32 *)(b + n) = t; 81 + 82 + t = *(u32 *)(a + (n -= 4)); 83 + *(u32 *)(a + n) = *(u32 *)(b + n); 84 + *(u32 *)(b + n) = t; 85 + #endif 86 + } while (n); 87 + } 88 + 89 + /** 90 + * swap_bytes - swap two elements a byte at a time 91 + * @a: pointer to the first element to swap 92 + * @b: pointer to the second element to swap 93 + * @n: element size 94 + * 95 + * This is the fallback if alignment doesn't allow using larger chunks. 
96 + */ 97 + static void swap_bytes(void *a, void *b, size_t n) 98 + { 99 + do { 100 + char t = ((char *)a)[--n]; 101 + ((char *)a)[n] = ((char *)b)[n]; 102 + ((char *)b)[n] = t; 103 + } while (n); 104 + } 105 + 106 + /* 107 + * The values are arbitrary as long as they can't be confused with 108 + * a pointer, but small integers make for the smallest compare 109 + * instructions. 110 + */ 111 + #define SWAP_WORDS_64 (swap_r_func_t)0 112 + #define SWAP_WORDS_32 (swap_r_func_t)1 113 + #define SWAP_BYTES (swap_r_func_t)2 114 + #define SWAP_WRAPPER (swap_r_func_t)3 115 + 116 + struct wrapper { 117 + cmp_func_t cmp; 118 + swap_func_t swap; 119 + }; 120 + 121 + /* 122 + * The function pointer is last to make tail calls most efficient if the 123 + * compiler decides not to inline this function. 124 + */ 125 + static void do_swap(void *a, void *b, size_t size, swap_r_func_t swap_func, const void *priv) 126 + { 127 + if (swap_func == SWAP_WRAPPER) { 128 + ((const struct wrapper *)priv)->swap(a, b, (int)size); 129 + return; 130 + } 131 + 132 + if (swap_func == SWAP_WORDS_64) 133 + swap_words_64(a, b, size); 134 + else if (swap_func == SWAP_WORDS_32) 135 + swap_words_32(a, b, size); 136 + else if (swap_func == SWAP_BYTES) 137 + swap_bytes(a, b, size); 138 + else 139 + swap_func(a, b, (int)size, priv); 140 + } 141 + 142 + #define _CMP_WRAPPER ((cmp_r_func_t)0L) 143 + 144 + static int do_cmp(const void *a, const void *b, cmp_r_func_t cmp, const void *priv) 145 + { 146 + if (cmp == _CMP_WRAPPER) 147 + return ((const struct wrapper *)priv)->cmp(a, b); 148 + return cmp(a, b, priv); 149 + } 150 + 151 + static inline int eytzinger0_do_cmp(void *base, size_t n, size_t size, 152 + cmp_r_func_t cmp_func, const void *priv, 153 + size_t l, size_t r) 154 + { 155 + return do_cmp(base + inorder_to_eytzinger0(l, n) * size, 156 + base + inorder_to_eytzinger0(r, n) * size, 157 + cmp_func, priv); 158 + } 159 + 160 + static inline void eytzinger0_do_swap(void *base, size_t n, size_t size, 161 + swap_r_func_t swap_func, const void *priv, 162 + size_t l, size_t r) 163 + { 164 + do_swap(base + inorder_to_eytzinger0(l, n) * size, 165 + base + inorder_to_eytzinger0(r, n) * size, 166 + size, swap_func, priv); 167 + } 168 + 169 + void eytzinger0_sort_r(void *base, size_t n, size_t size, 170 + cmp_r_func_t cmp_func, 171 + swap_r_func_t swap_func, 172 + const void *priv) 173 + { 174 + int i, c, r; 175 + 176 + /* called from 'sort' without swap function, let's pick the default */ 177 + if (swap_func == SWAP_WRAPPER && !((struct wrapper *)priv)->swap) 178 + swap_func = NULL; 179 + 180 + if (!swap_func) { 181 + if (is_aligned(base, size, 8)) 182 + swap_func = SWAP_WORDS_64; 183 + else if (is_aligned(base, size, 4)) 184 + swap_func = SWAP_WORDS_32; 185 + else 186 + swap_func = SWAP_BYTES; 187 + } 188 + 189 + /* heapify */ 190 + for (i = n / 2 - 1; i >= 0; --i) { 191 + for (r = i; r * 2 + 1 < n; r = c) { 192 + c = r * 2 + 1; 193 + 194 + if (c + 1 < n && 195 + eytzinger0_do_cmp(base, n, size, cmp_func, priv, c, c + 1) < 0) 196 + c++; 197 + 198 + if (eytzinger0_do_cmp(base, n, size, cmp_func, priv, r, c) >= 0) 199 + break; 200 + 201 + eytzinger0_do_swap(base, n, size, swap_func, priv, r, c); 202 + } 203 + } 204 + 205 + /* sort */ 206 + for (i = n - 1; i > 0; --i) { 207 + eytzinger0_do_swap(base, n, size, swap_func, priv, 0, i); 208 + 209 + for (r = 0; r * 2 + 1 < i; r = c) { 210 + c = r * 2 + 1; 211 + 212 + if (c + 1 < i && 213 + eytzinger0_do_cmp(base, n, size, cmp_func, priv, c, c + 1) < 0) 214 + c++; 215 + 216 + if 
(eytzinger0_do_cmp(base, n, size, cmp_func, priv, r, c) >= 0) 217 + break; 218 + 219 + eytzinger0_do_swap(base, n, size, swap_func, priv, r, c); 220 + } 221 + } 222 + } 223 + 224 + void eytzinger0_sort(void *base, size_t n, size_t size, 225 + cmp_func_t cmp_func, 226 + swap_func_t swap_func) 227 + { 228 + struct wrapper w = { 229 + .cmp = cmp_func, 230 + .swap = swap_func, 231 + }; 232 + 233 + return eytzinger0_sort_r(base, n, size, _CMP_WRAPPER, SWAP_WRAPPER, &w); 234 + }
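Editor's note: eytzinger0_sort() above leaves the array ordered so that an in-order walk of the implicit tree (children of element i at 2i+1 and 2i+2) visits elements in ascending order; the heapsort reaches that via the inorder_to_eytzinger0() index mapping. As an illustration only, assuming nothing beyond the layout rule itself, this sketch builds the same ordering from an already-sorted array by an in-order walk (fill is an invented helper):

#include <stdio.h>
#include <stddef.h>

/*
 * Place sorted[] into eytz[] so that visiting the implicit tree rooted at
 * index 0 in order (left subtree, node, right subtree) yields sorted order.
 */
static size_t fill(const int *sorted, int *eytz, size_t n, size_t i, size_t pos)
{
        if (i >= n)
                return pos;

        pos = fill(sorted, eytz, n, 2 * i + 1, pos);    /* left subtree */
        eytz[i] = sorted[pos++];                        /* this node */
        return fill(sorted, eytz, n, 2 * i + 2, pos);   /* right subtree */
}

int main(void)
{
        int sorted[] = { 1, 2, 3, 4, 5, 6, 7 };
        int eytz[7];

        fill(sorted, eytz, 7, 0, 0);

        /* Prints 4 2 6 1 3 5 7: the root is the median, children sit adjacent. */
        for (int i = 0; i < 7; i++)
                printf("%d ", eytz[i]);
        printf("\n");
        return 0;
}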
+37 -26
fs/bcachefs/eytzinger.h
··· 5 5 #include <linux/bitops.h> 6 6 #include <linux/log2.h> 7 7 8 - #include "util.h" 8 + #ifdef EYTZINGER_DEBUG 9 + #define EYTZINGER_BUG_ON(cond) BUG_ON(cond) 10 + #else 11 + #define EYTZINGER_BUG_ON(cond) 12 + #endif 9 13 10 14 /* 11 15 * Traversal for trees in eytzinger layout - a full binary tree layed out in an 12 - * array 13 - */ 14 - 15 - /* 16 - * One based indexing version: 16 + * array. 17 17 * 18 - * With one based indexing each level of the tree starts at a power of two - 19 - * good for cacheline alignment: 18 + * Consider using an eytzinger tree any time you would otherwise be doing binary 19 + * search over an array. Binary search is a worst case scenario for branch 20 + * prediction and prefetching, but in an eytzinger tree every node's children 21 + * are adjacent in memory, thus we can prefetch children before knowing the 22 + * result of the comparison, assuming multiple nodes fit on a cacheline. 23 + * 24 + * Two variants are provided, for one based indexing and zero based indexing. 25 + * 26 + * Zero based indexing is more convenient, but one based indexing has better 27 + * alignment and thus better performance because each new level of the tree 28 + * starts at a power of two, and thus if element 0 was cacheline aligned, each 29 + * new level will be as well. 20 30 */ 21 31 22 32 static inline unsigned eytzinger1_child(unsigned i, unsigned child) 23 33 { 24 - EBUG_ON(child > 1); 34 + EYTZINGER_BUG_ON(child > 1); 25 35 26 36 return (i << 1) + child; 27 37 } ··· 68 58 69 59 static inline unsigned eytzinger1_next(unsigned i, unsigned size) 70 60 { 71 - EBUG_ON(i > size); 61 + EYTZINGER_BUG_ON(i > size); 72 62 73 63 if (eytzinger1_right_child(i) <= size) { 74 64 i = eytzinger1_right_child(i); ··· 84 74 85 75 static inline unsigned eytzinger1_prev(unsigned i, unsigned size) 86 76 { 87 - EBUG_ON(i > size); 77 + EYTZINGER_BUG_ON(i > size); 88 78 89 79 if (eytzinger1_left_child(i) <= size) { 90 80 i = eytzinger1_left_child(i) + 1; ··· 111 101 unsigned shift = __fls(size) - b; 112 102 int s; 113 103 114 - EBUG_ON(!i || i > size); 104 + EYTZINGER_BUG_ON(!i || i > size); 115 105 116 106 i ^= 1U << b; 117 107 i <<= 1; ··· 136 126 unsigned shift; 137 127 int s; 138 128 139 - EBUG_ON(!i || i > size); 129 + EYTZINGER_BUG_ON(!i || i > size); 140 130 141 131 /* 142 132 * sign bit trick: ··· 174 164 175 165 static inline unsigned eytzinger0_child(unsigned i, unsigned child) 176 166 { 177 - EBUG_ON(child > 1); 167 + EYTZINGER_BUG_ON(child > 1); 178 168 179 169 return (i << 1) + 1 + child; 180 170 } ··· 241 231 (_i) != -1; \ 242 232 (_i) = eytzinger0_next((_i), (_size))) 243 233 244 - typedef int (*eytzinger_cmp_fn)(const void *l, const void *r, size_t size); 245 - 246 234 /* return greatest node <= @search, or -1 if not found */ 247 235 static inline ssize_t eytzinger0_find_le(void *base, size_t nr, size_t size, 248 - eytzinger_cmp_fn cmp, const void *search) 236 + cmp_func_t cmp, const void *search) 249 237 { 250 238 unsigned i, n = 0; 251 239 ··· 252 244 253 245 do { 254 246 i = n; 255 - n = eytzinger0_child(i, cmp(search, base + i * size, size) >= 0); 247 + n = eytzinger0_child(i, cmp(base + i * size, search) <= 0); 256 248 } while (n < nr); 257 249 258 250 if (n & 1) { 259 251 /* @i was greater than @search, return previous node: */ 260 - 261 - if (i == eytzinger0_first(nr)) 262 - return -1; 263 - 264 252 return eytzinger0_prev(i, nr); 265 253 } else { 266 254 return i; 267 255 } 256 + } 257 + 258 + static inline ssize_t eytzinger0_find_gt(void *base, size_t nr, size_t size, 
259 + cmp_func_t cmp, const void *search) 260 + { 261 + ssize_t idx = eytzinger0_find_le(base, nr, size, cmp, search); 262 + return eytzinger0_next(idx, size); 268 263 } 269 264 270 265 #define eytzinger0_find(base, nr, size, _cmp, search) \ ··· 280 269 int _res; \ 281 270 \ 282 271 while (_i < _nr && \ 283 - (_res = _cmp(_search, _base + _i * _size, _size))) \ 272 + (_res = _cmp(_search, _base + _i * _size))) \ 284 273 _i = eytzinger0_child(_i, _res > 0); \ 285 274 _i; \ 286 275 }) 287 276 288 - void eytzinger0_sort(void *, size_t, size_t, 289 - int (*cmp_func)(const void *, const void *, size_t), 290 - void (*swap_func)(void *, void *, size_t)); 277 + void eytzinger0_sort_r(void *, size_t, size_t, 278 + cmp_r_func_t, swap_r_func_t, const void *); 279 + void eytzinger0_sort(void *, size_t, size_t, cmp_func_t, swap_func_t); 291 280 292 281 #endif /* _EYTZINGER_H */
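Editor's note: the header comment above explains the payoff of the layout - both children of a node are adjacent in memory, so they can be prefetched before the comparison result is known. A standalone sketch of a one-based eytzinger lower-bound search over ints (eytzinger1_lower_bound is an invented name, not one of the helpers above); the bit trick at the end recovers the answer from the loop's overshoot:

#include <stdio.h>

/*
 * One-based eytzinger search: children of node k live at 2k and 2k + 1.
 * Returns the index of the first element >= search, or 0 if every element
 * is smaller.
 */
static unsigned eytzinger1_lower_bound(const int *t, unsigned n, int search)
{
        unsigned k = 1;

        while (k <= n) {
                if (2 * k <= n)                 /* left child exists: prefetch the pair */
                        __builtin_prefetch(&t[2 * k]);
                k = 2 * k + (t[k] < search);
        }

        /*
         * The loop descends past the leaves; the trailing 1-bits of k count
         * how many times we went right past the answer, so shifting them
         * (plus one) away recovers the answer's index.
         */
        return k >> __builtin_ffs(~k);
}

int main(void)
{
        /* Eytzinger (one-based) layout of the sorted array 1..7; t[0] unused. */
        int t[] = { 0, 4, 2, 6, 1, 3, 5, 7 };

        printf("%u %u %u\n",
               eytzinger1_lower_bound(t, 7, 5),         /* 6: t[6] == 5 */
               eytzinger1_lower_bound(t, 7, 8),         /* 0: no element >= 8 */
               eytzinger1_lower_bound(t, 7, 0));        /* 4: t[4] == 1 */
        return 0;
}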
+2 -2
fs/bcachefs/fs-io-direct.c
··· 536 536 if (likely(!dio->iter.count) || dio->op.error) 537 537 break; 538 538 539 - bio_reset(bio, NULL, REQ_OP_WRITE); 539 + bio_reset(bio, NULL, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE); 540 540 } 541 541 out: 542 542 return bch2_dio_write_done(dio); ··· 618 618 619 619 bio = bio_alloc_bioset(NULL, 620 620 bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS), 621 - REQ_OP_WRITE, 621 + REQ_OP_WRITE | REQ_SYNC | REQ_IDLE, 622 622 GFP_KERNEL, 623 623 &c->dio_write_bioset); 624 624 dio = container_of(bio, struct dio_write, op.wbio.bio);
+1
fs/bcachefs/fs.c
··· 1997 1997 return dget(sb->s_root); 1998 1998 1999 1999 err_put_super: 2000 + __bch2_fs_stop(c); 2000 2001 deactivate_locked_super(sb); 2001 2002 return ERR_PTR(bch2_err_class(ret)); 2002 2003 }
+224 -40
fs/bcachefs/fsck.c
··· 12 12 #include "fsck.h" 13 13 #include "inode.h" 14 14 #include "keylist.h" 15 - #include "recovery.h" 15 + #include "recovery_passes.h" 16 16 #include "snapshot.h" 17 17 #include "super.h" 18 18 #include "xattr.h" ··· 63 63 u32 *snapshot, u64 *inum) 64 64 { 65 65 struct bch_subvolume s; 66 - int ret; 67 - 68 - ret = bch2_subvolume_get(trans, subvol, false, 0, &s); 66 + int ret = bch2_subvolume_get(trans, subvol, false, 0, &s); 69 67 70 68 *snapshot = le32_to_cpu(s.snapshot); 71 69 *inum = le64_to_cpu(s.inode); ··· 156 158 157 159 bch2_trans_iter_init(trans, &iter, BTREE_ID_dirents, pos, BTREE_ITER_INTENT); 158 160 159 - ret = bch2_hash_delete_at(trans, bch2_dirent_hash_desc, 160 - &dir_hash_info, &iter, 161 - BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 161 + ret = bch2_btree_iter_traverse(&iter) ?: 162 + bch2_hash_delete_at(trans, bch2_dirent_hash_desc, 163 + &dir_hash_info, &iter, 164 + BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE); 162 165 bch2_trans_iter_exit(trans, &iter); 163 166 err: 164 167 bch_err_fn(c, ret); ··· 168 169 169 170 /* Get lost+found, create if it doesn't exist: */ 170 171 static int lookup_lostfound(struct btree_trans *trans, u32 snapshot, 171 - struct bch_inode_unpacked *lostfound) 172 + struct bch_inode_unpacked *lostfound, 173 + u64 reattaching_inum) 172 174 { 173 175 struct bch_fs *c = trans->c; 174 176 struct qstr lostfound_str = QSTR("lost+found"); ··· 184 184 return ret; 185 185 186 186 subvol_inum root_inum = { .subvol = le32_to_cpu(st.master_subvol) }; 187 - u32 subvol_snapshot; 188 187 189 - ret = subvol_lookup(trans, le32_to_cpu(st.master_subvol), 190 - &subvol_snapshot, &root_inum.inum); 191 - bch_err_msg(c, ret, "looking up root subvol"); 188 + struct bch_subvolume subvol; 189 + ret = bch2_subvolume_get(trans, le32_to_cpu(st.master_subvol), 190 + false, 0, &subvol); 191 + bch_err_msg(c, ret, "looking up root subvol %u for snapshot %u", 192 + le32_to_cpu(st.master_subvol), snapshot); 192 193 if (ret) 193 194 return ret; 195 + 196 + if (!subvol.inode) { 197 + struct btree_iter iter; 198 + struct bkey_i_subvolume *subvol = bch2_bkey_get_mut_typed(trans, &iter, 199 + BTREE_ID_subvolumes, POS(0, le32_to_cpu(st.master_subvol)), 200 + 0, subvolume); 201 + ret = PTR_ERR_OR_ZERO(subvol); 202 + if (ret) 203 + return ret; 204 + 205 + subvol->v.inode = cpu_to_le64(reattaching_inum); 206 + bch2_trans_iter_exit(trans, &iter); 207 + } 208 + 209 + root_inum.inum = le64_to_cpu(subvol.inode); 194 210 195 211 struct bch_inode_unpacked root_inode; 196 212 struct bch_hash_info root_hash_info; 197 213 u32 root_inode_snapshot = snapshot; 198 214 ret = lookup_inode(trans, root_inum.inum, &root_inode, &root_inode_snapshot); 199 - bch_err_msg(c, ret, "looking up root inode"); 215 + bch_err_msg(c, ret, "looking up root inode %llu for subvol %u", 216 + root_inum.inum, le32_to_cpu(st.master_subvol)); 200 217 if (ret) 201 218 return ret; 202 219 ··· 309 292 snprintf(name_buf, sizeof(name_buf), "%llu", inode->bi_inum); 310 293 } 311 294 312 - ret = lookup_lostfound(trans, dirent_snapshot, &lostfound); 295 + ret = lookup_lostfound(trans, dirent_snapshot, &lostfound, inode->bi_inum); 313 296 if (ret) 314 297 return ret; 315 298 ··· 378 361 ret = reattach_inode(trans, &inode, le32_to_cpu(s.v->snapshot)); 379 362 bch_err_msg(c, ret, "reattaching inode %llu", inode.bi_inum); 380 363 return ret; 364 + } 365 + 366 + static int reconstruct_subvol(struct btree_trans *trans, u32 snapshotid, u32 subvolid, u64 inum) 367 + { 368 + struct bch_fs *c = trans->c; 369 + 370 + if (!bch2_snapshot_is_leaf(c, 
snapshotid)) { 371 + bch_err(c, "need to reconstruct subvol, but have interior node snapshot"); 372 + return -BCH_ERR_fsck_repair_unimplemented; 373 + } 374 + 375 + /* 376 + * If inum isn't set, that means we're being called from check_dirents, 377 + * not check_inodes - the root of this subvolume doesn't exist or we 378 + * would have found it there: 379 + */ 380 + if (!inum) { 381 + struct btree_iter inode_iter = {}; 382 + struct bch_inode_unpacked new_inode; 383 + u64 cpu = raw_smp_processor_id(); 384 + 385 + bch2_inode_init_early(c, &new_inode); 386 + bch2_inode_init_late(&new_inode, bch2_current_time(c), 0, 0, S_IFDIR|0755, 0, NULL); 387 + 388 + new_inode.bi_subvol = subvolid; 389 + 390 + int ret = bch2_inode_create(trans, &inode_iter, &new_inode, snapshotid, cpu) ?: 391 + bch2_btree_iter_traverse(&inode_iter) ?: 392 + bch2_inode_write(trans, &inode_iter, &new_inode); 393 + bch2_trans_iter_exit(trans, &inode_iter); 394 + if (ret) 395 + return ret; 396 + 397 + inum = new_inode.bi_inum; 398 + } 399 + 400 + bch_info(c, "reconstructing subvol %u with root inode %llu", subvolid, inum); 401 + 402 + struct bkey_i_subvolume *new_subvol = bch2_trans_kmalloc(trans, sizeof(*new_subvol)); 403 + int ret = PTR_ERR_OR_ZERO(new_subvol); 404 + if (ret) 405 + return ret; 406 + 407 + bkey_subvolume_init(&new_subvol->k_i); 408 + new_subvol->k.p.offset = subvolid; 409 + new_subvol->v.snapshot = cpu_to_le32(snapshotid); 410 + new_subvol->v.inode = cpu_to_le64(inum); 411 + ret = bch2_btree_insert_trans(trans, BTREE_ID_subvolumes, &new_subvol->k_i, 0); 412 + if (ret) 413 + return ret; 414 + 415 + struct btree_iter iter; 416 + struct bkey_i_snapshot *s = bch2_bkey_get_mut_typed(trans, &iter, 417 + BTREE_ID_snapshots, POS(0, snapshotid), 418 + 0, snapshot); 419 + ret = PTR_ERR_OR_ZERO(s); 420 + bch_err_msg(c, ret, "getting snapshot %u", snapshotid); 421 + if (ret) 422 + return ret; 423 + 424 + u32 snapshot_tree = le32_to_cpu(s->v.tree); 425 + 426 + s->v.subvol = cpu_to_le32(subvolid); 427 + SET_BCH_SNAPSHOT_SUBVOL(&s->v, true); 428 + bch2_trans_iter_exit(trans, &iter); 429 + 430 + struct bkey_i_snapshot_tree *st = bch2_bkey_get_mut_typed(trans, &iter, 431 + BTREE_ID_snapshot_trees, POS(0, snapshot_tree), 432 + 0, snapshot_tree); 433 + ret = PTR_ERR_OR_ZERO(st); 434 + bch_err_msg(c, ret, "getting snapshot tree %u", snapshot_tree); 435 + if (ret) 436 + return ret; 437 + 438 + if (!st->v.master_subvol) 439 + st->v.master_subvol = cpu_to_le32(subvolid); 440 + 441 + bch2_trans_iter_exit(trans, &iter); 442 + return 0; 443 + } 444 + 445 + static int reconstruct_inode(struct btree_trans *trans, u32 snapshot, u64 inum, u64 size, unsigned mode) 446 + { 447 + struct bch_fs *c = trans->c; 448 + struct bch_inode_unpacked new_inode; 449 + 450 + bch2_inode_init_early(c, &new_inode); 451 + bch2_inode_init_late(&new_inode, bch2_current_time(c), 0, 0, mode|0755, 0, NULL); 452 + new_inode.bi_size = size; 453 + new_inode.bi_inum = inum; 454 + 455 + return __bch2_fsck_write_inode(trans, &new_inode, snapshot); 456 + } 457 + 458 + static int reconstruct_reg_inode(struct btree_trans *trans, u32 snapshot, u64 inum) 459 + { 460 + struct btree_iter iter = {}; 461 + 462 + bch2_trans_iter_init(trans, &iter, BTREE_ID_extents, SPOS(inum, U64_MAX, snapshot), 0); 463 + struct bkey_s_c k = bch2_btree_iter_peek_prev(&iter); 464 + bch2_trans_iter_exit(trans, &iter); 465 + int ret = bkey_err(k); 466 + if (ret) 467 + return ret; 468 + 469 + return reconstruct_inode(trans, snapshot, inum, k.k->p.offset << 9, S_IFREG); 381 470 } 382 471 383 472 
struct snapshots_seen_entry { ··· 1187 1064 if (ret && !bch2_err_matches(ret, ENOENT)) 1188 1065 goto err; 1189 1066 1067 + if (ret && (c->sb.btrees_lost_data & BIT_ULL(BTREE_ID_subvolumes))) { 1068 + ret = reconstruct_subvol(trans, k.k->p.snapshot, u.bi_subvol, u.bi_inum); 1069 + goto do_update; 1070 + } 1071 + 1190 1072 if (fsck_err_on(ret, 1191 1073 c, inode_bi_subvol_missing, 1192 1074 "inode %llu:%u bi_subvol points to missing subvolume %u", ··· 1209 1081 do_update = true; 1210 1082 } 1211 1083 } 1212 - 1084 + do_update: 1213 1085 if (do_update) { 1214 1086 ret = __bch2_fsck_write_inode(trans, &u, iter->pos.snapshot); 1215 1087 bch_err_msg(c, ret, "in fsck updating inode"); ··· 1258 1130 i->count = count2; 1259 1131 1260 1132 if (i->count != count2) { 1261 - bch_err(c, "fsck counted i_sectors wrong for inode %llu:%u: got %llu should be %llu", 1262 - w->last_pos.inode, i->snapshot, i->count, count2); 1133 + bch_err_ratelimited(c, "fsck counted i_sectors wrong for inode %llu:%u: got %llu should be %llu", 1134 + w->last_pos.inode, i->snapshot, i->count, count2); 1263 1135 return -BCH_ERR_internal_fsck_err; 1264 1136 } 1265 1137 ··· 1499 1371 goto err; 1500 1372 } 1501 1373 1502 - ret = extent_ends_at(c, extent_ends, seen, k); 1503 - if (ret) 1504 - goto err; 1505 - 1506 1374 extent_ends->last_pos = k.k->p; 1507 1375 err: 1508 1376 return ret; ··· 1562 1438 goto err; 1563 1439 1564 1440 if (k.k->type != KEY_TYPE_whiteout) { 1441 + if (!i && (c->sb.btrees_lost_data & BIT_ULL(BTREE_ID_inodes))) { 1442 + ret = reconstruct_reg_inode(trans, k.k->p.snapshot, k.k->p.inode) ?: 1443 + bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc); 1444 + if (ret) 1445 + goto err; 1446 + 1447 + inode->last_pos.inode--; 1448 + ret = -BCH_ERR_transaction_restart_nested; 1449 + goto err; 1450 + } 1451 + 1565 1452 if (fsck_err_on(!i, c, extent_in_missing_inode, 1566 1453 "extent in missing inode:\n %s", 1567 1454 (printbuf_reset(&buf), ··· 1638 1503 } 1639 1504 1640 1505 i->seen_this_pos = true; 1506 + } 1507 + 1508 + if (k.k->type != KEY_TYPE_whiteout) { 1509 + ret = extent_ends_at(c, extent_ends, s, k); 1510 + if (ret) 1511 + goto err; 1641 1512 } 1642 1513 out: 1643 1514 err: ··· 1725 1584 return count2; 1726 1585 1727 1586 if (i->count != count2) { 1728 - bch_err(c, "fsck counted subdirectories wrong: got %llu should be %llu", 1729 - i->count, count2); 1587 + bch_err_ratelimited(c, "fsck counted subdirectories wrong for inum %llu:%u: got %llu should be %llu", 1588 + w->last_pos.inode, i->snapshot, i->count, count2); 1730 1589 i->count = count2; 1731 1590 if (i->inode.bi_nlink == i->count) 1732 1591 continue; ··· 1923 1782 u32 parent_subvol = le32_to_cpu(d.v->d_parent_subvol); 1924 1783 u32 target_subvol = le32_to_cpu(d.v->d_child_subvol); 1925 1784 u32 parent_snapshot; 1785 + u32 new_parent_subvol = 0; 1926 1786 u64 parent_inum; 1927 1787 struct printbuf buf = PRINTBUF; 1928 1788 int ret = 0; ··· 1931 1789 ret = subvol_lookup(trans, parent_subvol, &parent_snapshot, &parent_inum); 1932 1790 if (ret && !bch2_err_matches(ret, ENOENT)) 1933 1791 return ret; 1792 + 1793 + if (ret || 1794 + (!ret && !bch2_snapshot_is_ancestor(c, parent_snapshot, d.k->p.snapshot))) { 1795 + int ret2 = find_snapshot_subvol(trans, d.k->p.snapshot, &new_parent_subvol); 1796 + if (ret2 && !bch2_err_matches(ret, ENOENT)) 1797 + return ret2; 1798 + } 1799 + 1800 + if (ret && 1801 + !new_parent_subvol && 1802 + (c->sb.btrees_lost_data & BIT_ULL(BTREE_ID_subvolumes))) { 1803 + /* 1804 + * Couldn't find a subvol for dirent's 
snapshot - but we lost 1805 + * subvols, so we need to reconstruct: 1806 + */ 1807 + ret = reconstruct_subvol(trans, d.k->p.snapshot, parent_subvol, 0); 1808 + if (ret) 1809 + return ret; 1810 + 1811 + parent_snapshot = d.k->p.snapshot; 1812 + } 1934 1813 1935 1814 if (fsck_err_on(ret, c, dirent_to_missing_parent_subvol, 1936 1815 "dirent parent_subvol points to missing subvolume\n%s", ··· 1961 1798 "dirent not visible in parent_subvol (not an ancestor of subvol snap %u)\n%s", 1962 1799 parent_snapshot, 1963 1800 (bch2_bkey_val_to_text(&buf, c, d.s_c), buf.buf))) { 1964 - u32 new_parent_subvol; 1965 - ret = find_snapshot_subvol(trans, d.k->p.snapshot, &new_parent_subvol); 1966 - if (ret) 1967 - goto err; 1801 + if (!new_parent_subvol) { 1802 + bch_err(c, "could not find a subvol for snapshot %u", d.k->p.snapshot); 1803 + return -BCH_ERR_fsck_repair_unimplemented; 1804 + } 1968 1805 1969 1806 struct bkey_i_dirent *new_dirent = bch2_bkey_make_mut_typed(trans, iter, &d.s_c, 0, dirent); 1970 1807 ret = PTR_ERR_OR_ZERO(new_dirent); ··· 2010 1847 2011 1848 ret = lookup_inode(trans, target_inum, &subvol_root, &target_snapshot); 2012 1849 if (ret && !bch2_err_matches(ret, ENOENT)) 2013 - return ret; 1850 + goto err; 2014 1851 2015 - if (fsck_err_on(parent_subvol != subvol_root.bi_parent_subvol, 1852 + if (ret) { 1853 + bch_err(c, "subvol %u points to missing inode root %llu", target_subvol, target_inum); 1854 + ret = -BCH_ERR_fsck_repair_unimplemented; 1855 + ret = 0; 1856 + goto err; 1857 + } 1858 + 1859 + if (fsck_err_on(!ret && parent_subvol != subvol_root.bi_parent_subvol, 2016 1860 c, inode_bi_parent_wrong, 2017 1861 "subvol root %llu has wrong bi_parent_subvol: got %u, should be %u", 2018 1862 target_inum, ··· 2027 1857 subvol_root.bi_parent_subvol = parent_subvol; 2028 1858 ret = __bch2_fsck_write_inode(trans, &subvol_root, target_snapshot); 2029 1859 if (ret) 2030 - return ret; 1860 + goto err; 2031 1861 } 2032 1862 2033 1863 ret = check_dirent_target(trans, iter, d, &subvol_root, 2034 1864 target_snapshot); 2035 1865 if (ret) 2036 - return ret; 1866 + goto err; 2037 1867 out: 2038 1868 err: 2039 1869 fsck_err: ··· 2050 1880 struct snapshots_seen *s) 2051 1881 { 2052 1882 struct bch_fs *c = trans->c; 2053 - struct bkey_s_c_dirent d; 2054 1883 struct inode_walker_entry *i; 2055 1884 struct printbuf buf = PRINTBUF; 2056 1885 struct bpos equiv; ··· 2088 1919 *hash_info = bch2_hash_info_init(c, &dir->inodes.data[0].inode); 2089 1920 dir->first_this_inode = false; 2090 1921 1922 + if (!i && (c->sb.btrees_lost_data & BIT_ULL(BTREE_ID_inodes))) { 1923 + ret = reconstruct_inode(trans, k.k->p.snapshot, k.k->p.inode, 0, S_IFDIR) ?: 1924 + bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc); 1925 + if (ret) 1926 + goto err; 1927 + 1928 + dir->last_pos.inode--; 1929 + ret = -BCH_ERR_transaction_restart_nested; 1930 + goto err; 1931 + } 1932 + 2091 1933 if (fsck_err_on(!i, c, dirent_in_missing_dir_inode, 2092 1934 "dirent in nonexisting directory:\n%s", 2093 1935 (printbuf_reset(&buf), ··· 2133 1953 if (k.k->type != KEY_TYPE_dirent) 2134 1954 goto out; 2135 1955 2136 - d = bkey_s_c_to_dirent(k); 1956 + struct bkey_s_c_dirent d = bkey_s_c_to_dirent(k); 2137 1957 2138 1958 if (d.v->d_type == DT_SUBVOL) { 2139 1959 ret = check_dirent_to_subvol(trans, iter, d); ··· 2278 2098 2279 2099 if (mustfix_fsck_err_on(ret, c, root_subvol_missing, 2280 2100 "root subvol missing")) { 2281 - struct bkey_i_subvolume root_subvol; 2101 + struct bkey_i_subvolume *root_subvol = 2102 + bch2_trans_kmalloc(trans, 
sizeof(*root_subvol)); 2103 + ret = PTR_ERR_OR_ZERO(root_subvol); 2104 + if (ret) 2105 + goto err; 2282 2106 2283 2107 snapshot = U32_MAX; 2284 2108 inum = BCACHEFS_ROOT_INO; 2285 2109 2286 - bkey_subvolume_init(&root_subvol.k_i); 2287 - root_subvol.k.p.offset = BCACHEFS_ROOT_SUBVOL; 2288 - root_subvol.v.flags = 0; 2289 - root_subvol.v.snapshot = cpu_to_le32(snapshot); 2290 - root_subvol.v.inode = cpu_to_le64(inum); 2291 - ret = bch2_btree_insert_trans(trans, BTREE_ID_subvolumes, &root_subvol.k_i, 0); 2110 + bkey_subvolume_init(&root_subvol->k_i); 2111 + root_subvol->k.p.offset = BCACHEFS_ROOT_SUBVOL; 2112 + root_subvol->v.flags = 0; 2113 + root_subvol->v.snapshot = cpu_to_le32(snapshot); 2114 + root_subvol->v.inode = cpu_to_le64(inum); 2115 + ret = bch2_btree_insert_trans(trans, BTREE_ID_subvolumes, &root_subvol->k_i, 0); 2292 2116 bch_err_msg(c, ret, "writing root subvol"); 2293 2117 if (ret) 2294 2118 goto err;
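Editor's note: reconstruct_reg_inode() above recovers a missing inode's size by peeking the file's last extent and taking k.k->p.offset << 9 - bcachefs extent keys are positioned at the end of the extent in 512-byte sectors, so shifting the end offset by 9 gives a byte size. Just that arithmetic as a sketch, with an illustrative value rather than a real btree lookup:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint64_t last_extent_end_sector = 24;           /* e.g. last extent covers sectors [16, 24) */
        uint64_t bi_size = last_extent_end_sector << 9; /* sectors -> bytes */

        printf("reconstructed size: %llu bytes\n",
               (unsigned long long) bi_size);           /* 12288 */
        return 0;
}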
+1 -1
fs/bcachefs/inode.c
··· 552 552 prt_printf(out, "bi_sectors=%llu", inode->bi_sectors); 553 553 prt_newline(out); 554 554 555 - prt_newline(out); 556 555 prt_printf(out, "bi_version=%llu", inode->bi_version); 556 + prt_newline(out); 557 557 558 558 #define x(_name, _bits) \ 559 559 prt_printf(out, #_name "=%llu", (u64) inode->_name); \
+2
fs/bcachefs/io_misc.c
··· 264 264 ret = 0; 265 265 err: 266 266 bch2_logged_op_finish(trans, op_k); 267 + bch_err_fn(c, ret); 267 268 return ret; 268 269 } 269 270 ··· 477 476 break; 478 477 } 479 478 err: 479 + bch_err_fn(c, ret); 480 480 bch2_logged_op_finish(trans, op_k); 481 481 bch2_trans_iter_exit(trans, &iter); 482 482 return ret;
+1 -2
fs/bcachefs/journal_seq_blacklist.c
··· 95 95 return ret ?: bch2_blacklist_table_initialize(c); 96 96 } 97 97 98 - static int journal_seq_blacklist_table_cmp(const void *_l, 99 - const void *_r, size_t size) 98 + static int journal_seq_blacklist_table_cmp(const void *_l, const void *_r) 100 99 { 101 100 const struct journal_seq_blacklist_table_entry *l = _l; 102 101 const struct journal_seq_blacklist_table_entry *r = _r;
+3 -4
fs/bcachefs/logged_ops.c
··· 37 37 const struct bch_logged_op_fn *fn = logged_op_fn(k.k->type); 38 38 struct bkey_buf sk; 39 39 u32 restart_count = trans->restart_count; 40 - int ret; 41 40 42 41 if (!fn) 43 42 return 0; ··· 44 45 bch2_bkey_buf_init(&sk); 45 46 bch2_bkey_buf_reassemble(&sk, c, k); 46 47 47 - ret = drop_locks_do(trans, (bch2_fs_lazy_rw(c), 0)) ?: 48 - fn->resume(trans, sk.k) ?: trans_was_restarted(trans, restart_count); 48 + fn->resume(trans, sk.k); 49 49 50 50 bch2_bkey_buf_exit(&sk, c); 51 - return ret; 51 + 52 + return trans_was_restarted(trans, restart_count); 52 53 } 53 54 54 55 int bch2_resume_logged_ops(struct bch_fs *c)
+1 -27
fs/bcachefs/mean_and_variance_test.c
··· 136 136 d, mean, stddev, weighted_mean, weighted_stddev); 137 137 } 138 138 139 - static void mean_and_variance_test_2(struct kunit *test) 140 - { 141 - s64 d[] = { 100, 10, 10, 10, 10, 10, 10 }; 142 - s64 mean[] = { 10, 10, 10, 10, 10, 10, 10 }; 143 - s64 stddev[] = { 9, 9, 9, 9, 9, 9, 9 }; 144 - s64 weighted_mean[] = { 32, 27, 22, 19, 17, 15, 14 }; 145 - s64 weighted_stddev[] = { 38, 35, 31, 27, 24, 21, 18 }; 146 - 147 - do_mean_and_variance_test(test, 10, 6, ARRAY_SIZE(d), 2, 148 - d, mean, stddev, weighted_mean, weighted_stddev); 149 - } 150 - 151 139 /* Test behaviour where we switch from one steady state to another: */ 152 - static void mean_and_variance_test_3(struct kunit *test) 140 + static void mean_and_variance_test_2(struct kunit *test) 153 141 { 154 142 s64 d[] = { 100, 100, 100, 100, 100 }; 155 143 s64 mean[] = { 22, 32, 40, 46, 50 }; 156 144 s64 stddev[] = { 32, 39, 42, 44, 45 }; 157 - s64 weighted_mean[] = { 32, 49, 61, 71, 78 }; 158 - s64 weighted_stddev[] = { 38, 44, 44, 41, 38 }; 159 - 160 - do_mean_and_variance_test(test, 10, 6, ARRAY_SIZE(d), 2, 161 - d, mean, stddev, weighted_mean, weighted_stddev); 162 - } 163 - 164 - static void mean_and_variance_test_4(struct kunit *test) 165 - { 166 - s64 d[] = { 100, 100, 100, 100, 100 }; 167 - s64 mean[] = { 10, 11, 12, 13, 14 }; 168 - s64 stddev[] = { 9, 13, 15, 17, 19 }; 169 145 s64 weighted_mean[] = { 32, 49, 61, 71, 78 }; 170 146 s64 weighted_stddev[] = { 38, 44, 44, 41, 38 }; 171 147 ··· 206 230 KUNIT_CASE(mean_and_variance_weighted_advanced_test), 207 231 KUNIT_CASE(mean_and_variance_test_1), 208 232 KUNIT_CASE(mean_and_variance_test_2), 209 - KUNIT_CASE(mean_and_variance_test_3), 210 - KUNIT_CASE(mean_and_variance_test_4), 211 233 {} 212 234 }; 213 235
+4
fs/bcachefs/opts.c
··· 7 7 #include "disk_groups.h" 8 8 #include "error.h" 9 9 #include "opts.h" 10 + #include "recovery_passes.h" 10 11 #include "super-io.h" 11 12 #include "util.h" 12 13 ··· 205 204 .min = _min, .max = _max 206 205 #define OPT_STR(_choices) .type = BCH_OPT_STR, \ 207 206 .min = 0, .max = ARRAY_SIZE(_choices), \ 207 + .choices = _choices 208 + #define OPT_STR_NOLIMIT(_choices) .type = BCH_OPT_STR, \ 209 + .min = 0, .max = U64_MAX, \ 208 210 .choices = _choices 209 211 #define OPT_FN(_fn) .type = BCH_OPT_FN, .fn = _fn 210 212
+8 -3
fs/bcachefs/opts.h
··· 362 362 OPT_FS|OPT_MOUNT, \ 363 363 OPT_BOOL(), \ 364 364 BCH2_NO_SB_OPT, false, \ 365 - NULL, "Don't replay the journal") \ 366 - x(keep_journal, u8, \ 365 + NULL, "Exit recovery immediately prior to journal replay")\ 366 + x(recovery_pass_last, u8, \ 367 + OPT_FS|OPT_MOUNT, \ 368 + OPT_STR_NOLIMIT(bch2_recovery_passes), \ 369 + BCH2_NO_SB_OPT, 0, \ 370 + NULL, "Exit recovery after specified pass") \ 371 + x(retain_recovery_info, u8, \ 367 372 0, \ 368 373 OPT_BOOL(), \ 369 374 BCH2_NO_SB_OPT, false, \ 370 - NULL, "Don't free journal entries/keys after startup")\ 375 + NULL, "Don't free journal entries/keys, scanned btree nodes after startup")\ 371 376 x(read_entire_journal, u8, \ 372 377 0, \ 373 378 OPT_BOOL(), \
+101 -303
fs/bcachefs/recovery.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 2 2 3 3 #include "bcachefs.h" 4 - #include "backpointers.h" 5 - #include "bkey_buf.h" 6 4 #include "alloc_background.h" 7 - #include "btree_gc.h" 5 + #include "bkey_buf.h" 8 6 #include "btree_journal_iter.h" 7 + #include "btree_node_scan.h" 9 8 #include "btree_update.h" 10 9 #include "btree_update_interior.h" 11 10 #include "btree_io.h" 12 11 #include "buckets.h" 13 12 #include "dirent.h" 14 - #include "ec.h" 15 13 #include "errcode.h" 16 14 #include "error.h" 17 15 #include "fs-common.h" 18 - #include "fsck.h" 19 16 #include "journal_io.h" 20 17 #include "journal_reclaim.h" 21 18 #include "journal_seq_blacklist.h" 22 - #include "lru.h" 23 19 #include "logged_ops.h" 24 20 #include "move.h" 25 21 #include "quota.h" 26 22 #include "rebalance.h" 27 23 #include "recovery.h" 24 + #include "recovery_passes.h" 28 25 #include "replicas.h" 29 26 #include "sb-clean.h" 30 27 #include "sb-downgrade.h" 31 28 #include "snapshot.h" 32 - #include "subvolume.h" 33 29 #include "super-io.h" 34 30 35 31 #include <linux/sort.h> 36 32 #include <linux/stat.h> 37 33 38 34 #define QSTR(n) { { { .len = strlen(n) } }, .name = n } 35 + 36 + void bch2_btree_lost_data(struct bch_fs *c, enum btree_id btree) 37 + { 38 + u64 b = BIT_ULL(btree); 39 + 40 + if (!(c->sb.btrees_lost_data & b)) { 41 + bch_err(c, "flagging btree %s lost data", bch2_btree_id_str(btree)); 42 + 43 + mutex_lock(&c->sb_lock); 44 + bch2_sb_field_get(c->disk_sb.sb, ext)->btrees_lost_data |= cpu_to_le64(b); 45 + bch2_write_super(c); 46 + mutex_unlock(&c->sb_lock); 47 + } 48 + } 39 49 40 50 static bool btree_id_is_alloc(enum btree_id id) 41 51 { ··· 62 52 } 63 53 64 54 /* for -o reconstruct_alloc: */ 65 - static void do_reconstruct_alloc(struct bch_fs *c) 55 + static void bch2_reconstruct_alloc(struct bch_fs *c) 66 56 { 67 57 bch2_journal_log_msg(c, "dropping alloc info"); 68 58 bch_info(c, "dropping and reconstructing all alloc info"); ··· 97 87 98 88 c->recovery_passes_explicit |= bch2_recovery_passes_from_stable(le64_to_cpu(ext->recovery_passes_required[0])); 99 89 100 - struct journal_keys *keys = &c->journal_keys; 101 - size_t src, dst; 102 90 103 - move_gap(keys, keys->nr); 104 - 105 - for (src = 0, dst = 0; src < keys->nr; src++) 106 - if (!btree_id_is_alloc(keys->data[src].btree_id)) 107 - keys->data[dst++] = keys->data[src]; 108 - keys->nr = keys->gap = dst; 91 + bch2_shoot_down_journal_keys(c, BTREE_ID_alloc, 92 + 0, BTREE_MAX_DEPTH, POS_MIN, SPOS_MAX); 93 + bch2_shoot_down_journal_keys(c, BTREE_ID_backpointers, 94 + 0, BTREE_MAX_DEPTH, POS_MIN, SPOS_MAX); 95 + bch2_shoot_down_journal_keys(c, BTREE_ID_need_discard, 96 + 0, BTREE_MAX_DEPTH, POS_MIN, SPOS_MAX); 97 + bch2_shoot_down_journal_keys(c, BTREE_ID_freespace, 98 + 0, BTREE_MAX_DEPTH, POS_MIN, SPOS_MAX); 99 + bch2_shoot_down_journal_keys(c, BTREE_ID_bucket_gens, 100 + 0, BTREE_MAX_DEPTH, POS_MIN, SPOS_MAX); 109 101 } 110 102 111 103 /* ··· 198 186 return cmp_int(l->journal_seq, r->journal_seq); 199 187 } 200 188 201 - static int bch2_journal_replay(struct bch_fs *c) 189 + int bch2_journal_replay(struct bch_fs *c) 202 190 { 203 191 struct journal_keys *keys = &c->journal_keys; 204 192 DARRAY(struct journal_key *) keys_sorted = { 0 }; ··· 206 194 u64 start_seq = c->journal_replay_seq_start; 207 195 u64 end_seq = c->journal_replay_seq_start; 208 196 struct btree_trans *trans = bch2_trans_get(c); 197 + bool immediate_flush = false; 209 198 int ret = 0; 210 199 211 200 if (keys->nr) { ··· 227 214 */ 228 215 darray_for_each(*keys, k) { 229 216 
cond_resched(); 217 + 218 + /* 219 + * k->allocated means the key wasn't read in from the journal, 220 + * rather it was from early repair code 221 + */ 222 + if (k->allocated) 223 + immediate_flush = true; 230 224 231 225 /* Skip fastpath if we're low on space in the journal */ 232 226 ret = c->journal.watermark ? -1 : ··· 286 266 bch2_trans_put(trans); 287 267 trans = NULL; 288 268 289 - if (!c->opts.keep_journal) 269 + if (!c->opts.retain_recovery_info && 270 + c->recovery_pass_done >= BCH_RECOVERY_PASS_journal_replay) 290 271 bch2_journal_keys_put_initial(c); 291 272 292 273 replay_now_at(j, j->replay_journal_seq_end); 293 274 j->replay_journal_seq = 0; 294 275 295 276 bch2_journal_set_replay_done(j); 277 + 278 + /* if we did any repair, flush it immediately */ 279 + if (immediate_flush) { 280 + bch2_journal_flush_all_pins(&c->journal); 281 + ret = bch2_journal_meta(&c->journal); 282 + } 296 283 297 284 if (keys->nr) 298 285 bch2_journal_log_msg(c, "journal replay finished"); ··· 450 423 451 424 static int read_btree_roots(struct bch_fs *c) 452 425 { 453 - unsigned i; 454 426 int ret = 0; 455 427 456 - for (i = 0; i < btree_id_nr_alive(c); i++) { 428 + for (unsigned i = 0; i < btree_id_nr_alive(c); i++) { 457 429 struct btree_root *r = bch2_btree_id_root(c, i); 458 430 459 431 if (!r->alive) ··· 461 435 if (btree_id_is_alloc(i) && c->opts.reconstruct_alloc) 462 436 continue; 463 437 464 - if (r->error) { 465 - __fsck_err(c, 466 - btree_id_is_alloc(i) 467 - ? FSCK_CAN_IGNORE : 0, 468 - btree_root_bkey_invalid, 469 - "invalid btree root %s", 470 - bch2_btree_id_str(i)); 471 - if (i == BTREE_ID_alloc) 438 + if (mustfix_fsck_err_on((ret = r->error), 439 + c, btree_root_bkey_invalid, 440 + "invalid btree root %s", 441 + bch2_btree_id_str(i)) || 442 + mustfix_fsck_err_on((ret = r->error = bch2_btree_root_read(c, i, &r->key, r->level)), 443 + c, btree_root_read_error, 444 + "error reading btree root %s l=%u: %s", 445 + bch2_btree_id_str(i), r->level, bch2_err_str(ret))) { 446 + if (btree_id_is_alloc(i)) { 447 + c->recovery_passes_explicit |= BIT_ULL(BCH_RECOVERY_PASS_check_allocations); 448 + c->recovery_passes_explicit |= BIT_ULL(BCH_RECOVERY_PASS_check_alloc_info); 449 + c->recovery_passes_explicit |= BIT_ULL(BCH_RECOVERY_PASS_check_lrus); 450 + c->recovery_passes_explicit |= BIT_ULL(BCH_RECOVERY_PASS_check_extents_to_backpointers); 451 + c->recovery_passes_explicit |= BIT_ULL(BCH_RECOVERY_PASS_check_alloc_to_lru_refs); 472 452 c->sb.compat &= ~(1ULL << BCH_COMPAT_alloc_info); 473 - } 453 + r->error = 0; 454 + } else if (!(c->recovery_passes_explicit & BIT_ULL(BCH_RECOVERY_PASS_scan_for_btree_nodes))) { 455 + bch_info(c, "will run btree node scan"); 456 + c->recovery_passes_explicit |= BIT_ULL(BCH_RECOVERY_PASS_scan_for_btree_nodes); 457 + c->recovery_passes_explicit |= BIT_ULL(BCH_RECOVERY_PASS_check_topology); 458 + } 474 459 475 - ret = bch2_btree_root_read(c, i, &r->key, r->level); 476 - if (ret) { 477 - fsck_err(c, 478 - btree_root_read_error, 479 - "error reading btree root %s", 480 - bch2_btree_id_str(i)); 481 - if (btree_id_is_alloc(i)) 482 - c->sb.compat &= ~(1ULL << BCH_COMPAT_alloc_info); 483 460 ret = 0; 461 + bch2_btree_lost_data(c, i); 484 462 } 485 463 } 486 464 487 - for (i = 0; i < BTREE_ID_NR; i++) { 465 + for (unsigned i = 0; i < BTREE_ID_NR; i++) { 488 466 struct btree_root *r = bch2_btree_id_root(c, i); 489 467 490 - if (!r->b) { 468 + if (!r->b && !r->error) { 491 469 r->alive = false; 492 470 r->level = 0; 493 - bch2_btree_root_alloc(c, i); 471 + 
bch2_btree_root_alloc_fake(c, i, 0); 494 472 } 495 473 } 496 474 fsck_err: 497 - return ret; 498 - } 499 - 500 - static int bch2_initialize_subvolumes(struct bch_fs *c) 501 - { 502 - struct bkey_i_snapshot_tree root_tree; 503 - struct bkey_i_snapshot root_snapshot; 504 - struct bkey_i_subvolume root_volume; 505 - int ret; 506 - 507 - bkey_snapshot_tree_init(&root_tree.k_i); 508 - root_tree.k.p.offset = 1; 509 - root_tree.v.master_subvol = cpu_to_le32(1); 510 - root_tree.v.root_snapshot = cpu_to_le32(U32_MAX); 511 - 512 - bkey_snapshot_init(&root_snapshot.k_i); 513 - root_snapshot.k.p.offset = U32_MAX; 514 - root_snapshot.v.flags = 0; 515 - root_snapshot.v.parent = 0; 516 - root_snapshot.v.subvol = cpu_to_le32(BCACHEFS_ROOT_SUBVOL); 517 - root_snapshot.v.tree = cpu_to_le32(1); 518 - SET_BCH_SNAPSHOT_SUBVOL(&root_snapshot.v, true); 519 - 520 - bkey_subvolume_init(&root_volume.k_i); 521 - root_volume.k.p.offset = BCACHEFS_ROOT_SUBVOL; 522 - root_volume.v.flags = 0; 523 - root_volume.v.snapshot = cpu_to_le32(U32_MAX); 524 - root_volume.v.inode = cpu_to_le64(BCACHEFS_ROOT_INO); 525 - 526 - ret = bch2_btree_insert(c, BTREE_ID_snapshot_trees, &root_tree.k_i, NULL, 0) ?: 527 - bch2_btree_insert(c, BTREE_ID_snapshots, &root_snapshot.k_i, NULL, 0) ?: 528 - bch2_btree_insert(c, BTREE_ID_subvolumes, &root_volume.k_i, NULL, 0); 529 - bch_err_fn(c, ret); 530 - return ret; 531 - } 532 - 533 - static int __bch2_fs_upgrade_for_subvolumes(struct btree_trans *trans) 534 - { 535 - struct btree_iter iter; 536 - struct bkey_s_c k; 537 - struct bch_inode_unpacked inode; 538 - int ret; 539 - 540 - k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_inodes, 541 - SPOS(0, BCACHEFS_ROOT_INO, U32_MAX), 0); 542 - ret = bkey_err(k); 543 - if (ret) 544 - return ret; 545 - 546 - if (!bkey_is_inode(k.k)) { 547 - bch_err(trans->c, "root inode not found"); 548 - ret = -BCH_ERR_ENOENT_inode; 549 - goto err; 550 - } 551 - 552 - ret = bch2_inode_unpack(k, &inode); 553 - BUG_ON(ret); 554 - 555 - inode.bi_subvol = BCACHEFS_ROOT_SUBVOL; 556 - 557 - ret = bch2_inode_write(trans, &iter, &inode); 558 - err: 559 - bch2_trans_iter_exit(trans, &iter); 560 - return ret; 561 - } 562 - 563 - /* set bi_subvol on root inode */ 564 - noinline_for_stack 565 - static int bch2_fs_upgrade_for_subvolumes(struct bch_fs *c) 566 - { 567 - int ret = bch2_trans_do(c, NULL, NULL, BCH_TRANS_COMMIT_lazy_rw, 568 - __bch2_fs_upgrade_for_subvolumes(trans)); 569 - bch_err_fn(c, ret); 570 - return ret; 571 - } 572 - 573 - const char * const bch2_recovery_passes[] = { 574 - #define x(_fn, ...) 
#_fn, 575 - BCH_RECOVERY_PASSES() 576 - #undef x 577 - NULL 578 - }; 579 - 580 - static int bch2_check_allocations(struct bch_fs *c) 581 - { 582 - return bch2_gc(c, true, c->opts.norecovery); 583 - } 584 - 585 - static int bch2_set_may_go_rw(struct bch_fs *c) 586 - { 587 - struct journal_keys *keys = &c->journal_keys; 588 - 589 - /* 590 - * After we go RW, the journal keys buffer can't be modified (except for 591 - * setting journal_key->overwritten: it will be accessed by multiple 592 - * threads 593 - */ 594 - move_gap(keys, keys->nr); 595 - 596 - set_bit(BCH_FS_may_go_rw, &c->flags); 597 - 598 - if (keys->nr || c->opts.fsck || !c->sb.clean) 599 - return bch2_fs_read_write_early(c); 600 - return 0; 601 - } 602 - 603 - struct recovery_pass_fn { 604 - int (*fn)(struct bch_fs *); 605 - unsigned when; 606 - }; 607 - 608 - static struct recovery_pass_fn recovery_pass_fns[] = { 609 - #define x(_fn, _id, _when) { .fn = bch2_##_fn, .when = _when }, 610 - BCH_RECOVERY_PASSES() 611 - #undef x 612 - }; 613 - 614 - u64 bch2_recovery_passes_to_stable(u64 v) 615 - { 616 - static const u8 map[] = { 617 - #define x(n, id, ...) [BCH_RECOVERY_PASS_##n] = BCH_RECOVERY_PASS_STABLE_##n, 618 - BCH_RECOVERY_PASSES() 619 - #undef x 620 - }; 621 - 622 - u64 ret = 0; 623 - for (unsigned i = 0; i < ARRAY_SIZE(map); i++) 624 - if (v & BIT_ULL(i)) 625 - ret |= BIT_ULL(map[i]); 626 - return ret; 627 - } 628 - 629 - u64 bch2_recovery_passes_from_stable(u64 v) 630 - { 631 - static const u8 map[] = { 632 - #define x(n, id, ...) [BCH_RECOVERY_PASS_STABLE_##n] = BCH_RECOVERY_PASS_##n, 633 - BCH_RECOVERY_PASSES() 634 - #undef x 635 - }; 636 - 637 - u64 ret = 0; 638 - for (unsigned i = 0; i < ARRAY_SIZE(map); i++) 639 - if (v & BIT_ULL(i)) 640 - ret |= BIT_ULL(map[i]); 641 475 return ret; 642 476 } 643 477 ··· 573 687 return false; 574 688 } 575 689 576 - u64 bch2_fsck_recovery_passes(void) 577 - { 578 - u64 ret = 0; 579 - 580 - for (unsigned i = 0; i < ARRAY_SIZE(recovery_pass_fns); i++) 581 - if (recovery_pass_fns[i].when & PASS_FSCK) 582 - ret |= BIT_ULL(i); 583 - return ret; 584 - } 585 - 586 - static bool should_run_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass) 587 - { 588 - struct recovery_pass_fn *p = recovery_pass_fns + pass; 589 - 590 - if (c->opts.norecovery && pass > BCH_RECOVERY_PASS_snapshots_read) 591 - return false; 592 - if (c->recovery_passes_explicit & BIT_ULL(pass)) 593 - return true; 594 - if ((p->when & PASS_FSCK) && c->opts.fsck) 595 - return true; 596 - if ((p->when & PASS_UNCLEAN) && !c->sb.clean) 597 - return true; 598 - if (p->when & PASS_ALWAYS) 599 - return true; 600 - return false; 601 - } 602 - 603 - static int bch2_run_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass) 604 - { 605 - struct recovery_pass_fn *p = recovery_pass_fns + pass; 606 - int ret; 607 - 608 - if (!(p->when & PASS_SILENT)) 609 - bch2_print(c, KERN_INFO bch2_log_msg(c, "%s..."), 610 - bch2_recovery_passes[pass]); 611 - ret = p->fn(c); 612 - if (ret) 613 - return ret; 614 - if (!(p->when & PASS_SILENT)) 615 - bch2_print(c, KERN_CONT " done\n"); 616 - 617 - return 0; 618 - } 619 - 620 - static int bch2_run_recovery_passes(struct bch_fs *c) 621 - { 622 - int ret = 0; 623 - 624 - while (c->curr_recovery_pass < ARRAY_SIZE(recovery_pass_fns)) { 625 - if (should_run_recovery_pass(c, c->curr_recovery_pass)) { 626 - unsigned pass = c->curr_recovery_pass; 627 - 628 - ret = bch2_run_recovery_pass(c, c->curr_recovery_pass); 629 - if (bch2_err_matches(ret, BCH_ERR_restart_recovery) || 630 - (ret && 
c->curr_recovery_pass < pass)) 631 - continue; 632 - if (ret) 633 - break; 634 - 635 - c->recovery_passes_complete |= BIT_ULL(c->curr_recovery_pass); 636 - } 637 - c->curr_recovery_pass++; 638 - c->recovery_pass_done = max(c->recovery_pass_done, c->curr_recovery_pass); 639 - } 640 - 641 - return ret; 642 - } 643 - 644 - int bch2_run_online_recovery_passes(struct bch_fs *c) 645 - { 646 - int ret = 0; 647 - 648 - for (unsigned i = 0; i < ARRAY_SIZE(recovery_pass_fns); i++) { 649 - struct recovery_pass_fn *p = recovery_pass_fns + i; 650 - 651 - if (!(p->when & PASS_ONLINE)) 652 - continue; 653 - 654 - ret = bch2_run_recovery_pass(c, i); 655 - if (bch2_err_matches(ret, BCH_ERR_restart_recovery)) { 656 - i = c->curr_recovery_pass; 657 - continue; 658 - } 659 - if (ret) 660 - break; 661 - } 662 - 663 - return ret; 664 - } 665 - 666 690 int bch2_fs_recovery(struct bch_fs *c) 667 691 { 668 692 struct bch_sb_field_clean *clean = NULL; ··· 605 809 goto err; 606 810 } 607 811 608 - if (c->opts.fsck && c->opts.norecovery) { 609 - bch_err(c, "cannot select both norecovery and fsck"); 610 - ret = -EINVAL; 611 - goto err; 612 - } 812 + if (c->opts.norecovery) 813 + c->opts.recovery_pass_last = BCH_RECOVERY_PASS_journal_replay - 1; 613 814 614 815 if (!c->opts.nochanges) { 615 816 mutex_lock(&c->sb_lock); 817 + struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext); 616 818 bool write_sb = false; 617 - 618 - struct bch_sb_field_ext *ext = 619 - bch2_sb_field_get_minsize(&c->disk_sb, ext, sizeof(*ext) / sizeof(u64)); 620 - if (!ext) { 621 - ret = -BCH_ERR_ENOSPC_sb; 622 - mutex_unlock(&c->sb_lock); 623 - goto err; 624 - } 625 819 626 820 if (BCH_SB_HAS_TOPOLOGY_ERRORS(c->disk_sb.sb)) { 627 821 ext->recovery_passes_required[0] |= ··· 671 885 goto err; 672 886 } 673 887 674 - if (!c->sb.clean || c->opts.fsck || c->opts.keep_journal) { 888 + if (!c->sb.clean || c->opts.fsck || c->opts.retain_recovery_info) { 675 889 struct genradix_iter iter; 676 890 struct journal_replay **i; 677 891 ··· 751 965 c->journal_replay_seq_end = blacklist_seq - 1; 752 966 753 967 if (c->opts.reconstruct_alloc) 754 - do_reconstruct_alloc(c); 968 + bch2_reconstruct_alloc(c); 755 969 756 970 zero_out_btree_mem_ptr(&c->journal_keys); 757 971 ··· 803 1017 804 1018 clear_bit(BCH_FS_fsck_running, &c->flags); 805 1019 1020 + /* fsync if we fixed errors */ 1021 + if (test_bit(BCH_FS_errors_fixed, &c->flags)) { 1022 + bch2_journal_flush_all_pins(&c->journal); 1023 + bch2_journal_meta(&c->journal); 1024 + } 1025 + 806 1026 /* If we fixed errors, verify that fs is actually clean now: */ 807 1027 if (IS_ENABLED(CONFIG_BCACHEFS_DEBUG) && 808 1028 test_bit(BCH_FS_errors_fixed, &c->flags) && ··· 843 1051 } 844 1052 845 1053 mutex_lock(&c->sb_lock); 1054 + struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext); 846 1055 bool write_sb = false; 847 1056 848 1057 if (BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb) != le16_to_cpu(c->disk_sb.sb->version)) { ··· 857 1064 write_sb = true; 858 1065 } 859 1066 860 - if (!test_bit(BCH_FS_error, &c->flags)) { 861 - struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext); 862 - if (ext && 863 - (!bch2_is_zero(ext->recovery_passes_required, sizeof(ext->recovery_passes_required)) || 864 - !bch2_is_zero(ext->errors_silent, sizeof(ext->errors_silent)))) { 865 - memset(ext->recovery_passes_required, 0, sizeof(ext->recovery_passes_required)); 866 - memset(ext->errors_silent, 0, sizeof(ext->errors_silent)); 867 - write_sb = true; 868 - } 1067 + if (!test_bit(BCH_FS_error, 
&c->flags) && 1068 + !bch2_is_zero(ext->errors_silent, sizeof(ext->errors_silent))) { 1069 + memset(ext->errors_silent, 0, sizeof(ext->errors_silent)); 1070 + write_sb = true; 1071 + } 1072 + 1073 + if (c->opts.fsck && 1074 + !test_bit(BCH_FS_error, &c->flags) && 1075 + c->recovery_pass_done == BCH_RECOVERY_PASS_NR - 1 && 1076 + ext->btrees_lost_data) { 1077 + ext->btrees_lost_data = 0; 1078 + write_sb = true; 869 1079 } 870 1080 871 1081 if (c->opts.fsck && ··· 909 1113 out: 910 1114 bch2_flush_fsck_errs(c); 911 1115 912 - if (!c->opts.keep_journal && 913 - test_bit(JOURNAL_REPLAY_DONE, &c->journal.flags)) 1116 + if (!c->opts.retain_recovery_info) { 914 1117 bch2_journal_keys_put_initial(c); 1118 + bch2_find_btree_nodes_exit(&c->found_btree_nodes); 1119 + } 915 1120 kfree(clean); 916 1121 917 1122 if (!ret && ··· 938 1141 int ret; 939 1142 940 1143 bch_notice(c, "initializing new filesystem"); 1144 + set_bit(BCH_FS_new_fs, &c->flags); 941 1145 942 1146 mutex_lock(&c->sb_lock); 943 1147 c->disk_sb.sb->compat[0] |= cpu_to_le64(1ULL << BCH_COMPAT_extents_above_btree_updates_done); ··· 953 1155 } 954 1156 mutex_unlock(&c->sb_lock); 955 1157 956 - c->curr_recovery_pass = ARRAY_SIZE(recovery_pass_fns); 1158 + c->curr_recovery_pass = BCH_RECOVERY_PASS_NR; 957 1159 set_bit(BCH_FS_may_go_rw, &c->flags); 958 1160 959 1161 for (unsigned i = 0; i < BTREE_ID_NR; i++) 960 - bch2_btree_root_alloc(c, i); 1162 + bch2_btree_root_alloc_fake(c, i, 0); 961 1163 962 1164 for_each_member_device(c, ca) 963 1165 bch2_dev_usage_init(ca); ··· 1028 1230 if (ret) 1029 1231 goto err; 1030 1232 1031 - c->recovery_pass_done = ARRAY_SIZE(recovery_pass_fns) - 1; 1233 + c->recovery_pass_done = BCH_RECOVERY_PASS_NR - 1; 1032 1234 1033 1235 if (enabled_qtypes(c)) { 1034 1236 ret = bch2_fs_quota_read(c);
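The bch2_btree_lost_data() added at the top of this file records which btrees lost data by setting a bit in the superblock ext section under sb_lock and writing the superblock once per btree. A minimal sketch of that flag-once-persistently pattern follows; my_fs, my_sb_ext and my_write_super() are placeholder names, not bcachefs API, and the unlocked pre-check is fine because setting an already-set bit is idempotent.

/* Sketch: persistently record a "lost data" bit, writing the superblock only once per bit. */
struct my_sb_ext {
	__le64			lost_data;		/* one bit per btree, on-disk */
};

struct my_fs {
	struct mutex		sb_lock;
	u64			lost_data_cached;	/* CPU-endian copy for cheap checks */
	struct my_sb_ext	*ext;
};

static void my_flag_lost_data(struct my_fs *c, unsigned btree)
{
	u64 b = BIT_ULL(btree);

	if (c->lost_data_cached & b)		/* already recorded: skip the sb write */
		return;

	mutex_lock(&c->sb_lock);
	c->ext->lost_data |= cpu_to_le64(b);
	c->lost_data_cached |= b;
	my_write_super(c);			/* placeholder: persist the superblock */
	mutex_unlock(&c->sb_lock);
}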
+2 -30
fs/bcachefs/recovery.h
··· 2 2 #ifndef _BCACHEFS_RECOVERY_H 3 3 #define _BCACHEFS_RECOVERY_H 4 4 5 - extern const char * const bch2_recovery_passes[]; 5 + void bch2_btree_lost_data(struct bch_fs *, enum btree_id); 6 6 7 - u64 bch2_recovery_passes_to_stable(u64 v); 8 - u64 bch2_recovery_passes_from_stable(u64 v); 9 - 10 - /* 11 - * For when we need to rewind recovery passes and run a pass we skipped: 12 - */ 13 - static inline int bch2_run_explicit_recovery_pass(struct bch_fs *c, 14 - enum bch_recovery_pass pass) 15 - { 16 - if (c->recovery_passes_explicit & BIT_ULL(pass)) 17 - return 0; 18 - 19 - bch_info(c, "running explicit recovery pass %s (%u), currently at %s (%u)", 20 - bch2_recovery_passes[pass], pass, 21 - bch2_recovery_passes[c->curr_recovery_pass], c->curr_recovery_pass); 22 - 23 - c->recovery_passes_explicit |= BIT_ULL(pass); 24 - 25 - if (c->curr_recovery_pass >= pass) { 26 - c->curr_recovery_pass = pass; 27 - c->recovery_passes_complete &= (1ULL << pass) >> 1; 28 - return -BCH_ERR_restart_recovery; 29 - } else { 30 - return 0; 31 - } 32 - } 33 - 34 - int bch2_run_online_recovery_passes(struct bch_fs *); 35 - u64 bch2_fsck_recovery_passes(void); 7 + int bch2_journal_replay(struct bch_fs *); 36 8 37 9 int bch2_fs_recovery(struct bch_fs *); 38 10 int bch2_fs_initialize(struct bch_fs *);
+249
fs/bcachefs/recovery_passes.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + #include "bcachefs.h" 4 + #include "alloc_background.h" 5 + #include "backpointers.h" 6 + #include "btree_gc.h" 7 + #include "btree_node_scan.h" 8 + #include "ec.h" 9 + #include "fsck.h" 10 + #include "inode.h" 11 + #include "journal.h" 12 + #include "lru.h" 13 + #include "logged_ops.h" 14 + #include "rebalance.h" 15 + #include "recovery.h" 16 + #include "recovery_passes.h" 17 + #include "snapshot.h" 18 + #include "subvolume.h" 19 + #include "super.h" 20 + #include "super-io.h" 21 + 22 + const char * const bch2_recovery_passes[] = { 23 + #define x(_fn, ...) #_fn, 24 + BCH_RECOVERY_PASSES() 25 + #undef x 26 + NULL 27 + }; 28 + 29 + static int bch2_check_allocations(struct bch_fs *c) 30 + { 31 + return bch2_gc(c, true, false); 32 + } 33 + 34 + static int bch2_set_may_go_rw(struct bch_fs *c) 35 + { 36 + struct journal_keys *keys = &c->journal_keys; 37 + 38 + /* 39 + * After we go RW, the journal keys buffer can't be modified (except for 40 + * setting journal_key->overwritten: it will be accessed by multiple 41 + * threads 42 + */ 43 + move_gap(keys, keys->nr); 44 + 45 + set_bit(BCH_FS_may_go_rw, &c->flags); 46 + 47 + if (keys->nr || c->opts.fsck || !c->sb.clean) 48 + return bch2_fs_read_write_early(c); 49 + return 0; 50 + } 51 + 52 + struct recovery_pass_fn { 53 + int (*fn)(struct bch_fs *); 54 + unsigned when; 55 + }; 56 + 57 + static struct recovery_pass_fn recovery_pass_fns[] = { 58 + #define x(_fn, _id, _when) { .fn = bch2_##_fn, .when = _when }, 59 + BCH_RECOVERY_PASSES() 60 + #undef x 61 + }; 62 + 63 + static const u8 passes_to_stable_map[] = { 64 + #define x(n, id, ...) [BCH_RECOVERY_PASS_##n] = BCH_RECOVERY_PASS_STABLE_##n, 65 + BCH_RECOVERY_PASSES() 66 + #undef x 67 + }; 68 + 69 + static enum bch_recovery_pass_stable bch2_recovery_pass_to_stable(enum bch_recovery_pass pass) 70 + { 71 + return passes_to_stable_map[pass]; 72 + } 73 + 74 + u64 bch2_recovery_passes_to_stable(u64 v) 75 + { 76 + u64 ret = 0; 77 + for (unsigned i = 0; i < ARRAY_SIZE(passes_to_stable_map); i++) 78 + if (v & BIT_ULL(i)) 79 + ret |= BIT_ULL(passes_to_stable_map[i]); 80 + return ret; 81 + } 82 + 83 + u64 bch2_recovery_passes_from_stable(u64 v) 84 + { 85 + static const u8 map[] = { 86 + #define x(n, id, ...) 
[BCH_RECOVERY_PASS_STABLE_##n] = BCH_RECOVERY_PASS_##n, 87 + BCH_RECOVERY_PASSES() 88 + #undef x 89 + }; 90 + 91 + u64 ret = 0; 92 + for (unsigned i = 0; i < ARRAY_SIZE(map); i++) 93 + if (v & BIT_ULL(i)) 94 + ret |= BIT_ULL(map[i]); 95 + return ret; 96 + } 97 + 98 + /* 99 + * For when we need to rewind recovery passes and run a pass we skipped: 100 + */ 101 + int bch2_run_explicit_recovery_pass(struct bch_fs *c, 102 + enum bch_recovery_pass pass) 103 + { 104 + if (c->recovery_passes_explicit & BIT_ULL(pass)) 105 + return 0; 106 + 107 + bch_info(c, "running explicit recovery pass %s (%u), currently at %s (%u)", 108 + bch2_recovery_passes[pass], pass, 109 + bch2_recovery_passes[c->curr_recovery_pass], c->curr_recovery_pass); 110 + 111 + c->recovery_passes_explicit |= BIT_ULL(pass); 112 + 113 + if (c->curr_recovery_pass >= pass) { 114 + c->curr_recovery_pass = pass; 115 + c->recovery_passes_complete &= (1ULL << pass) >> 1; 116 + return -BCH_ERR_restart_recovery; 117 + } else { 118 + return 0; 119 + } 120 + } 121 + 122 + int bch2_run_explicit_recovery_pass_persistent(struct bch_fs *c, 123 + enum bch_recovery_pass pass) 124 + { 125 + enum bch_recovery_pass_stable s = bch2_recovery_pass_to_stable(pass); 126 + 127 + mutex_lock(&c->sb_lock); 128 + struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext); 129 + 130 + if (!test_bit_le64(s, ext->recovery_passes_required)) { 131 + __set_bit_le64(s, ext->recovery_passes_required); 132 + bch2_write_super(c); 133 + } 134 + mutex_unlock(&c->sb_lock); 135 + 136 + return bch2_run_explicit_recovery_pass(c, pass); 137 + } 138 + 139 + static void bch2_clear_recovery_pass_required(struct bch_fs *c, 140 + enum bch_recovery_pass pass) 141 + { 142 + enum bch_recovery_pass_stable s = bch2_recovery_pass_to_stable(pass); 143 + 144 + mutex_lock(&c->sb_lock); 145 + struct bch_sb_field_ext *ext = bch2_sb_field_get(c->disk_sb.sb, ext); 146 + 147 + if (test_bit_le64(s, ext->recovery_passes_required)) { 148 + __clear_bit_le64(s, ext->recovery_passes_required); 149 + bch2_write_super(c); 150 + } 151 + mutex_unlock(&c->sb_lock); 152 + } 153 + 154 + u64 bch2_fsck_recovery_passes(void) 155 + { 156 + u64 ret = 0; 157 + 158 + for (unsigned i = 0; i < ARRAY_SIZE(recovery_pass_fns); i++) 159 + if (recovery_pass_fns[i].when & PASS_FSCK) 160 + ret |= BIT_ULL(i); 161 + return ret; 162 + } 163 + 164 + static bool should_run_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass) 165 + { 166 + struct recovery_pass_fn *p = recovery_pass_fns + pass; 167 + 168 + if (c->recovery_passes_explicit & BIT_ULL(pass)) 169 + return true; 170 + if ((p->when & PASS_FSCK) && c->opts.fsck) 171 + return true; 172 + if ((p->when & PASS_UNCLEAN) && !c->sb.clean) 173 + return true; 174 + if (p->when & PASS_ALWAYS) 175 + return true; 176 + return false; 177 + } 178 + 179 + static int bch2_run_recovery_pass(struct bch_fs *c, enum bch_recovery_pass pass) 180 + { 181 + struct recovery_pass_fn *p = recovery_pass_fns + pass; 182 + int ret; 183 + 184 + if (!(p->when & PASS_SILENT)) 185 + bch2_print(c, KERN_INFO bch2_log_msg(c, "%s..."), 186 + bch2_recovery_passes[pass]); 187 + ret = p->fn(c); 188 + if (ret) 189 + return ret; 190 + if (!(p->when & PASS_SILENT)) 191 + bch2_print(c, KERN_CONT " done\n"); 192 + 193 + return 0; 194 + } 195 + 196 + int bch2_run_online_recovery_passes(struct bch_fs *c) 197 + { 198 + int ret = 0; 199 + 200 + for (unsigned i = 0; i < ARRAY_SIZE(recovery_pass_fns); i++) { 201 + struct recovery_pass_fn *p = recovery_pass_fns + i; 202 + 203 + if (!(p->when & PASS_ONLINE)) 
204 + continue; 205 + 206 + ret = bch2_run_recovery_pass(c, i); 207 + if (bch2_err_matches(ret, BCH_ERR_restart_recovery)) { 208 + i = c->curr_recovery_pass; 209 + continue; 210 + } 211 + if (ret) 212 + break; 213 + } 214 + 215 + return ret; 216 + } 217 + 218 + int bch2_run_recovery_passes(struct bch_fs *c) 219 + { 220 + int ret = 0; 221 + 222 + while (c->curr_recovery_pass < ARRAY_SIZE(recovery_pass_fns)) { 223 + if (c->opts.recovery_pass_last && 224 + c->curr_recovery_pass > c->opts.recovery_pass_last) 225 + break; 226 + 227 + if (should_run_recovery_pass(c, c->curr_recovery_pass)) { 228 + unsigned pass = c->curr_recovery_pass; 229 + 230 + ret = bch2_run_recovery_pass(c, c->curr_recovery_pass); 231 + if (bch2_err_matches(ret, BCH_ERR_restart_recovery) || 232 + (ret && c->curr_recovery_pass < pass)) 233 + continue; 234 + if (ret) 235 + break; 236 + 237 + c->recovery_passes_complete |= BIT_ULL(c->curr_recovery_pass); 238 + } 239 + 240 + c->recovery_pass_done = max(c->recovery_pass_done, c->curr_recovery_pass); 241 + 242 + if (!test_bit(BCH_FS_error, &c->flags)) 243 + bch2_clear_recovery_pass_required(c, c->curr_recovery_pass); 244 + 245 + c->curr_recovery_pass++; 246 + } 247 + 248 + return ret; 249 + }
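The new recovery_passes.c keeps the pass functions and their names in tables generated from the single BCH_RECOVERY_PASSES() x-macro list, and bch2_run_explicit_recovery_pass() can move curr_recovery_pass backwards so bch2_run_recovery_passes() re-runs earlier passes. A self-contained userspace sketch of that control flow is below; the pass names, the RESTART convention and the rewind trigger are invented for illustration.

/* rewind_sketch.c: x-macro pass table plus a runner that can rewind */
#include <stdio.h>

#define PASSES()		\
	x(alloc_read)		\
	x(check_snapshots)	\
	x(check_inodes)

enum pass {
#define x(n) PASS_##n,
	PASSES()
#undef x
	PASS_NR
};

static const char * const pass_names[] = {
#define x(n) #n,
	PASSES()
#undef x
};

static unsigned curr_pass;

#define RESTART 1	/* a pass asked the runner to rewind */

/* Hypothetical pass body: check_inodes asks to re-run check_snapshots once. */
static int run_one(unsigned p)
{
	static int rewound;

	printf("running %s\n", pass_names[p]);
	if (p == PASS_check_inodes && !rewound) {
		rewound = 1;
		curr_pass = PASS_check_snapshots;	/* rewind */
		return RESTART;
	}
	return 0;
}

int main(void)
{
	while (curr_pass < PASS_NR) {
		unsigned prev = curr_pass;
		int ret = run_one(curr_pass);

		if (ret == RESTART || curr_pass < prev)
			continue;		/* resume from the rewound pass */
		curr_pass++;
	}
	return 0;
}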
+17
fs/bcachefs/recovery_passes.h
··· 1 + #ifndef _BCACHEFS_RECOVERY_PASSES_H 2 + #define _BCACHEFS_RECOVERY_PASSES_H 3 + 4 + extern const char * const bch2_recovery_passes[]; 5 + 6 + u64 bch2_recovery_passes_to_stable(u64 v); 7 + u64 bch2_recovery_passes_from_stable(u64 v); 8 + 9 + u64 bch2_fsck_recovery_passes(void); 10 + 11 + int bch2_run_explicit_recovery_pass(struct bch_fs *, enum bch_recovery_pass); 12 + int bch2_run_explicit_recovery_pass_persistent(struct bch_fs *, enum bch_recovery_pass); 13 + 14 + int bch2_run_online_recovery_passes(struct bch_fs *); 15 + int bch2_run_recovery_passes(struct bch_fs *); 16 + 17 + #endif /* _BCACHEFS_RECOVERY_PASSES_H */
+7 -4
fs/bcachefs/recovery_types.h fs/bcachefs/recovery_passes_types.h
··· 1 1 /* SPDX-License-Identifier: GPL-2.0 */ 2 - #ifndef _BCACHEFS_RECOVERY_TYPES_H 3 - #define _BCACHEFS_RECOVERY_TYPES_H 2 + #ifndef _BCACHEFS_RECOVERY_PASSES_TYPES_H 3 + #define _BCACHEFS_RECOVERY_PASSES_TYPES_H 4 4 5 5 #define PASS_SILENT BIT(0) 6 6 #define PASS_FSCK BIT(1) ··· 13 13 * must never change: 14 14 */ 15 15 #define BCH_RECOVERY_PASSES() \ 16 + x(scan_for_btree_nodes, 37, 0) \ 16 17 x(check_topology, 4, 0) \ 17 18 x(alloc_read, 0, PASS_ALWAYS) \ 18 19 x(stripes_read, 1, PASS_ALWAYS) \ ··· 32 31 x(check_alloc_to_lru_refs, 15, PASS_ONLINE|PASS_FSCK) \ 33 32 x(fs_freespace_init, 16, PASS_ALWAYS|PASS_SILENT) \ 34 33 x(bucket_gens_init, 17, 0) \ 34 + x(reconstruct_snapshots, 38, 0) \ 35 35 x(check_snapshot_trees, 18, PASS_ONLINE|PASS_FSCK) \ 36 36 x(check_snapshots, 19, PASS_ONLINE|PASS_FSCK) \ 37 37 x(check_subvols, 20, PASS_ONLINE|PASS_FSCK) \ 38 38 x(check_subvol_children, 35, PASS_ONLINE|PASS_FSCK) \ 39 39 x(delete_dead_snapshots, 21, PASS_ONLINE|PASS_FSCK) \ 40 40 x(fs_upgrade_for_subvolumes, 22, 0) \ 41 - x(resume_logged_ops, 23, PASS_ALWAYS) \ 42 41 x(check_inodes, 24, PASS_FSCK) \ 43 42 x(check_extents, 25, PASS_FSCK) \ 44 43 x(check_indirect_extents, 26, PASS_FSCK) \ ··· 48 47 x(check_subvolume_structure, 36, PASS_ONLINE|PASS_FSCK) \ 49 48 x(check_directory_structure, 30, PASS_ONLINE|PASS_FSCK) \ 50 49 x(check_nlinks, 31, PASS_FSCK) \ 50 + x(resume_logged_ops, 23, PASS_ALWAYS) \ 51 51 x(delete_dead_inodes, 32, PASS_FSCK|PASS_UNCLEAN) \ 52 52 x(fix_reflink_p, 33, 0) \ 53 53 x(set_fs_needs_rebalance, 34, 0) \ ··· 58 56 #define x(n, id, when) BCH_RECOVERY_PASS_##n, 59 57 BCH_RECOVERY_PASSES() 60 58 #undef x 59 + BCH_RECOVERY_PASS_NR 61 60 }; 62 61 63 62 /* But we also need stable identifiers that can be used in the superblock */ ··· 68 65 #undef x 69 66 }; 70 67 71 - #endif /* _BCACHEFS_RECOVERY_TYPES_H */ 68 + #endif /* _BCACHEFS_RECOVERY_PASSES_TYPES_H */
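Note that BCH_RECOVERY_PASSES() carries two orderings: list position, which can be re-sorted (as scan_for_btree_nodes, reconstruct_snapshots and resume_logged_ops are here), and the fixed second column that becomes the stable on-disk id. Converting a pass bitmask between the two numberings is just a table lookup per set bit; a standalone sketch with made-up pass names:

/* stable_ids.c: translate a pass bitmask from in-memory order to stable on-disk ids */
#include <stdint.h>
#include <stdio.h>

#define PASSES()			\
	x(scan_nodes,		37)	\
	x(check_topology,	 4)	\
	x(alloc_read,		 0)

enum pass {
#define x(n, id) PASS_##n,
	PASSES()
#undef x
	PASS_NR
};

enum pass_stable {
#define x(n, id) PASS_STABLE_##n = id,
	PASSES()
#undef x
};

static const uint8_t to_stable[] = {
#define x(n, id) [PASS_##n] = PASS_STABLE_##n,
	PASSES()
#undef x
};

static uint64_t passes_to_stable(uint64_t v)
{
	uint64_t ret = 0;

	for (unsigned i = 0; i < PASS_NR; i++)
		if (v & (1ULL << i))
			ret |= 1ULL << to_stable[i];
	return ret;
}

int main(void)
{
	/* in-memory bit 0 (scan_nodes) maps to on-disk bit 37 */
	printf("%#llx\n", (unsigned long long)passes_to_stable(1ULL << PASS_scan_nodes));
	return 0;
}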
+1 -2
fs/bcachefs/reflink.c
··· 185 185 } else { 186 186 bkey_error_init(update); 187 187 update->k.p = p.k->p; 188 - update->k.p.offset = next_idx; 189 - update->k.size = next_idx - *idx; 188 + update->k.size = p.k->size; 190 189 set_bkey_val_u64s(&update->k, 0); 191 190 } 192 191
+12 -7
fs/bcachefs/replicas.c
··· 6 6 #include "replicas.h" 7 7 #include "super-io.h" 8 8 9 + #include <linux/sort.h> 10 + 9 11 static int bch2_cpu_replicas_to_sb_replicas(struct bch_fs *, 10 12 struct bch_replicas_cpu *); 11 13 12 14 /* Some (buggy!) compilers don't allow memcmp to be passed as a pointer */ 13 - static int bch2_memcmp(const void *l, const void *r, size_t size) 15 + static int bch2_memcmp(const void *l, const void *r, const void *priv) 14 16 { 17 + size_t size = (size_t) priv; 15 18 return memcmp(l, r, size); 16 19 } 17 20 ··· 42 39 43 40 static void bch2_cpu_replicas_sort(struct bch_replicas_cpu *r) 44 41 { 45 - eytzinger0_sort(r->entries, r->nr, r->entry_size, bch2_memcmp, NULL); 42 + eytzinger0_sort_r(r->entries, r->nr, r->entry_size, 43 + bch2_memcmp, NULL, (void *)(size_t)r->entry_size); 46 44 } 47 45 48 46 static void bch2_replicas_entry_v0_to_text(struct printbuf *out, ··· 232 228 233 229 verify_replicas_entry(search); 234 230 235 - #define entry_cmp(_l, _r, size) memcmp(_l, _r, entry_size) 231 + #define entry_cmp(_l, _r) memcmp(_l, _r, entry_size) 236 232 idx = eytzinger0_find(r->entries, r->nr, r->entry_size, 237 233 entry_cmp, search); 238 234 #undef entry_cmp ··· 828 824 { 829 825 unsigned i; 830 826 831 - sort_cmp_size(cpu_r->entries, 832 - cpu_r->nr, 833 - cpu_r->entry_size, 834 - bch2_memcmp, NULL); 827 + sort_r(cpu_r->entries, 828 + cpu_r->nr, 829 + cpu_r->entry_size, 830 + bch2_memcmp, NULL, 831 + (void *)(size_t)cpu_r->entry_size); 835 832 836 833 for (i = 0; i < cpu_r->nr; i++) { 837 834 struct bch_replicas_entry_v1 *e =
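The replicas code drops the bcachefs-local sort_cmp_size() (deleted from util.c further down) in favor of the kernel's sort_r(), which hands the comparator an opaque priv pointer; since cmp_r_func_t comparators no longer receive the element size, it is carried through priv instead. A sketch of that comparator-with-context shape; struct my_table and its fields are hypothetical.

/* Sketch: sorting runtime-sized entries with sort_r() and a priv-carried size. */
#include <linux/sort.h>
#include <linux/string.h>

struct my_table {
	size_t	entry_size;		/* bytes per entry, only known at runtime */
	size_t	nr;
	char	*entries;
};

static int entry_cmp(const void *l, const void *r, const void *priv)
{
	size_t size = (size_t) priv;	/* context passed through priv */

	return memcmp(l, r, size);
}

static void my_table_sort(struct my_table *t)
{
	sort_r(t->entries, t->nr, t->entry_size,
	       entry_cmp, NULL,			/* NULL swap: use the generic one */
	       (void *)(size_t) t->entry_size);
}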
+1 -1
fs/bcachefs/sb-downgrade.c
··· 7 7 8 8 #include "bcachefs.h" 9 9 #include "darray.h" 10 - #include "recovery.h" 10 + #include "recovery_passes.h" 11 11 #include "sb-downgrade.h" 12 12 #include "sb-errors.h" 13 13 #include "super-io.h"
+6 -1
fs/bcachefs/sb-errors_types.h
··· 265 265 x(subvol_children_bad, 257) \ 266 266 x(subvol_loop, 258) \ 267 267 x(subvol_unreachable, 259) \ 268 - x(btree_node_bkey_bad_u64s, 260) 268 + x(btree_node_bkey_bad_u64s, 260) \ 269 + x(btree_node_topology_empty_interior_node, 261) \ 270 + x(btree_ptr_v2_min_key_bad, 262) \ 271 + x(btree_root_unreadable_and_scan_found_nothing, 263) \ 272 + x(snapshot_node_missing, 264) \ 273 + x(dup_backpointer_to_bad_csum_extent, 265) 269 274 270 275 enum bch_sb_error_id { 271 276 #define x(t, n) BCH_FSCK_ERR_##t = n,
+192 -16
fs/bcachefs/snapshot.c
··· 8 8 #include "errcode.h" 9 9 #include "error.h" 10 10 #include "fs.h" 11 + #include "recovery_passes.h" 11 12 #include "snapshot.h" 12 13 13 14 #include <linux/random.h> ··· 94 93 95 94 static bool __bch2_snapshot_is_ancestor_early(struct snapshot_table *t, u32 id, u32 ancestor) 96 95 { 97 - while (id && id < ancestor) 98 - id = __snapshot_t(t, id)->parent; 96 + while (id && id < ancestor) { 97 + const struct snapshot_t *s = __snapshot_t(t, id); 98 + id = s ? s->parent : 0; 99 + } 99 100 return id == ancestor; 100 101 } 101 102 ··· 113 110 static inline u32 get_ancestor_below(struct snapshot_table *t, u32 id, u32 ancestor) 114 111 { 115 112 const struct snapshot_t *s = __snapshot_t(t, id); 113 + if (!s) 114 + return 0; 116 115 117 116 if (s->skip[2] <= ancestor) 118 117 return s->skip[2]; ··· 132 127 rcu_read_lock(); 133 128 struct snapshot_table *t = rcu_dereference(c->snapshots); 134 129 135 - if (unlikely(c->recovery_pass_done <= BCH_RECOVERY_PASS_check_snapshots)) { 130 + if (unlikely(c->recovery_pass_done < BCH_RECOVERY_PASS_check_snapshots)) { 136 131 ret = __bch2_snapshot_is_ancestor_early(t, id, ancestor); 137 132 goto out; 138 133 } ··· 156 151 static noinline struct snapshot_t *__snapshot_t_mut(struct bch_fs *c, u32 id) 157 152 { 158 153 size_t idx = U32_MAX - id; 159 - size_t new_size; 160 154 struct snapshot_table *new, *old; 161 155 162 - new_size = max(16UL, roundup_pow_of_two(idx + 1)); 156 + size_t new_bytes = kmalloc_size_roundup(struct_size(new, s, idx + 1)); 157 + size_t new_size = (new_bytes - sizeof(*new)) / sizeof(new->s[0]); 163 158 164 - new = kvzalloc(struct_size(new, s, new_size), GFP_KERNEL); 159 + new = kvzalloc(new_bytes, GFP_KERNEL); 165 160 if (!new) 166 161 return NULL; 167 162 163 + new->nr = new_size; 164 + 168 165 old = rcu_dereference_protected(c->snapshots, true); 169 166 if (old) 170 - memcpy(new->s, 171 - rcu_dereference_protected(c->snapshots, true)->s, 172 - sizeof(new->s[0]) * c->snapshot_table_size); 167 + memcpy(new->s, old->s, sizeof(old->s[0]) * old->nr); 173 168 174 169 rcu_assign_pointer(c->snapshots, new); 175 - c->snapshot_table_size = new_size; 176 - kvfree_rcu_mightsleep(old); 170 + kvfree_rcu(old, rcu); 177 171 178 - return &rcu_dereference_protected(c->snapshots, true)->s[idx]; 172 + return &rcu_dereference_protected(c->snapshots, 173 + lockdep_is_held(&c->snapshot_table_lock))->s[idx]; 179 174 } 180 175 181 176 static inline struct snapshot_t *snapshot_t_mut(struct bch_fs *c, u32 id) 182 177 { 183 178 size_t idx = U32_MAX - id; 179 + struct snapshot_table *table = 180 + rcu_dereference_protected(c->snapshots, 181 + lockdep_is_held(&c->snapshot_table_lock)); 184 182 185 183 lockdep_assert_held(&c->snapshot_table_lock); 186 184 187 - if (likely(idx < c->snapshot_table_size)) 188 - return &rcu_dereference_protected(c->snapshots, true)->s[idx]; 185 + if (likely(table && idx < table->nr)) 186 + return &table->s[idx]; 189 187 190 188 return __snapshot_t_mut(c, id); 191 189 } ··· 575 567 u32 subvol_id; 576 568 577 569 ret = bch2_snapshot_tree_master_subvol(trans, root_id, &subvol_id); 570 + bch_err_fn(c, ret); 571 + 572 + if (bch2_err_matches(ret, ENOENT)) { /* nothing to be done here */ 573 + ret = 0; 574 + goto err; 575 + } 576 + 578 577 if (ret) 579 578 goto err; 580 579 ··· 739 724 u32 parent_id = bch2_snapshot_parent_early(c, k.k->p.offset); 740 725 u32 real_depth; 741 726 struct printbuf buf = PRINTBUF; 742 - bool should_have_subvol; 743 727 u32 i, id; 744 728 int ret = 0; 745 729 ··· 784 770 } 785 771 } 786 772 787 - 
should_have_subvol = BCH_SNAPSHOT_SUBVOL(&s) && 773 + bool should_have_subvol = BCH_SNAPSHOT_SUBVOL(&s) && 788 774 !BCH_SNAPSHOT_DELETED(&s); 789 775 790 776 if (should_have_subvol) { ··· 882 868 BTREE_ITER_PREFETCH, k, 883 869 NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 884 870 check_snapshot(trans, &iter, k))); 871 + bch_err_fn(c, ret); 872 + return ret; 873 + } 874 + 875 + static int check_snapshot_exists(struct btree_trans *trans, u32 id) 876 + { 877 + struct bch_fs *c = trans->c; 878 + 879 + if (bch2_snapshot_equiv(c, id)) 880 + return 0; 881 + 882 + u32 tree_id; 883 + int ret = bch2_snapshot_tree_create(trans, id, 0, &tree_id); 884 + if (ret) 885 + return ret; 886 + 887 + struct bkey_i_snapshot *snapshot = bch2_trans_kmalloc(trans, sizeof(*snapshot)); 888 + ret = PTR_ERR_OR_ZERO(snapshot); 889 + if (ret) 890 + return ret; 891 + 892 + bkey_snapshot_init(&snapshot->k_i); 893 + snapshot->k.p = POS(0, id); 894 + snapshot->v.tree = cpu_to_le32(tree_id); 895 + snapshot->v.btime.lo = cpu_to_le64(bch2_current_time(c)); 896 + 897 + return bch2_btree_insert_trans(trans, BTREE_ID_snapshots, &snapshot->k_i, 0) ?: 898 + bch2_mark_snapshot(trans, BTREE_ID_snapshots, 0, 899 + bkey_s_c_null, bkey_i_to_s(&snapshot->k_i), 0) ?: 900 + bch2_snapshot_set_equiv(trans, bkey_i_to_s_c(&snapshot->k_i)); 901 + } 902 + 903 + /* Figure out which snapshot nodes belong in the same tree: */ 904 + struct snapshot_tree_reconstruct { 905 + enum btree_id btree; 906 + struct bpos cur_pos; 907 + snapshot_id_list cur_ids; 908 + DARRAY(snapshot_id_list) trees; 909 + }; 910 + 911 + static void snapshot_tree_reconstruct_exit(struct snapshot_tree_reconstruct *r) 912 + { 913 + darray_for_each(r->trees, i) 914 + darray_exit(i); 915 + darray_exit(&r->trees); 916 + darray_exit(&r->cur_ids); 917 + } 918 + 919 + static inline bool same_snapshot(struct snapshot_tree_reconstruct *r, struct bpos pos) 920 + { 921 + return r->btree == BTREE_ID_inodes 922 + ? 
r->cur_pos.offset == pos.offset 923 + : r->cur_pos.inode == pos.inode; 924 + } 925 + 926 + static inline bool snapshot_id_lists_have_common(snapshot_id_list *l, snapshot_id_list *r) 927 + { 928 + darray_for_each(*l, i) 929 + if (snapshot_list_has_id(r, *i)) 930 + return true; 931 + return false; 932 + } 933 + 934 + static void snapshot_id_list_to_text(struct printbuf *out, snapshot_id_list *s) 935 + { 936 + bool first = true; 937 + darray_for_each(*s, i) { 938 + if (!first) 939 + prt_char(out, ' '); 940 + first = false; 941 + prt_printf(out, "%u", *i); 942 + } 943 + } 944 + 945 + static int snapshot_tree_reconstruct_next(struct bch_fs *c, struct snapshot_tree_reconstruct *r) 946 + { 947 + if (r->cur_ids.nr) { 948 + darray_for_each(r->trees, i) 949 + if (snapshot_id_lists_have_common(i, &r->cur_ids)) { 950 + int ret = snapshot_list_merge(c, i, &r->cur_ids); 951 + if (ret) 952 + return ret; 953 + goto out; 954 + } 955 + darray_push(&r->trees, r->cur_ids); 956 + darray_init(&r->cur_ids); 957 + } 958 + out: 959 + r->cur_ids.nr = 0; 960 + return 0; 961 + } 962 + 963 + static int get_snapshot_trees(struct bch_fs *c, struct snapshot_tree_reconstruct *r, struct bpos pos) 964 + { 965 + if (!same_snapshot(r, pos)) 966 + snapshot_tree_reconstruct_next(c, r); 967 + r->cur_pos = pos; 968 + return snapshot_list_add_nodup(c, &r->cur_ids, pos.snapshot); 969 + } 970 + 971 + int bch2_reconstruct_snapshots(struct bch_fs *c) 972 + { 973 + struct btree_trans *trans = bch2_trans_get(c); 974 + struct printbuf buf = PRINTBUF; 975 + struct snapshot_tree_reconstruct r = {}; 976 + int ret = 0; 977 + 978 + for (unsigned btree = 0; btree < BTREE_ID_NR; btree++) { 979 + if (btree_type_has_snapshots(btree)) { 980 + r.btree = btree; 981 + 982 + ret = for_each_btree_key(trans, iter, btree, POS_MIN, 983 + BTREE_ITER_ALL_SNAPSHOTS|BTREE_ITER_PREFETCH, k, ({ 984 + get_snapshot_trees(c, &r, k.k->p); 985 + })); 986 + if (ret) 987 + goto err; 988 + 989 + snapshot_tree_reconstruct_next(c, &r); 990 + } 991 + } 992 + 993 + darray_for_each(r.trees, t) { 994 + printbuf_reset(&buf); 995 + snapshot_id_list_to_text(&buf, t); 996 + 997 + darray_for_each(*t, id) { 998 + if (fsck_err_on(!bch2_snapshot_equiv(c, *id), 999 + c, snapshot_node_missing, 1000 + "snapshot node %u from tree %s missing", *id, buf.buf)) { 1001 + if (t->nr > 1) { 1002 + bch_err(c, "cannot reconstruct snapshot trees with multiple nodes"); 1003 + ret = -BCH_ERR_fsck_repair_unimplemented; 1004 + goto err; 1005 + } 1006 + 1007 + ret = commit_do(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc, 1008 + check_snapshot_exists(trans, *id)); 1009 + if (ret) 1010 + goto err; 1011 + } 1012 + } 1013 + } 1014 + fsck_err: 1015 + err: 1016 + bch2_trans_put(trans); 1017 + snapshot_tree_reconstruct_exit(&r); 1018 + printbuf_exit(&buf); 885 1019 bch_err_fn(c, ret); 886 1020 return ret; 887 1021 } ··· 1844 1682 POS_MIN, 0, k, 1845 1683 (set_is_ancestor_bitmap(c, k.k->p.offset), 0))); 1846 1684 bch_err_fn(c, ret); 1685 + 1686 + /* 1687 + * It's important that we check if we need to reconstruct snapshots 1688 + * before going RW, so we mark that pass as required in the superblock - 1689 + * otherwise, we could end up deleting keys with missing snapshot nodes 1690 + * instead 1691 + */ 1692 + BUG_ON(!test_bit(BCH_FS_new_fs, &c->flags) && 1693 + test_bit(BCH_FS_may_go_rw, &c->flags)); 1694 + 1695 + if (bch2_err_matches(ret, EIO) || 1696 + (c->sb.btrees_lost_data & BIT_ULL(BTREE_ID_snapshots))) 1697 + ret = bch2_run_explicit_recovery_pass_persistent(c, BCH_RECOVERY_PASS_reconstruct_snapshots); 
1698 + 1847 1699 return ret; 1848 1700 } 1849 1701
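Together with the subvolume_types.h hunk below (which adds rcu and nr to struct snapshot_table), __snapshot_t_mut() now grows the snapshot table RCU-style: allocate a bigger flex-array, copy the old entries, publish with rcu_assign_pointer(), and free the old table with kvfree_rcu() once readers drain, so lockless readers always see a table whose nr bounds their index. A reduced sketch of that grow-only resize; struct foo_table and foo_fs are placeholders.

/* Sketch: RCU-published resizable flex-array table (grow-only). */
#include <linux/mutex.h>
#include <linux/overflow.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/string.h>

struct foo_table {
	struct rcu_head	rcu;
	size_t		nr;
	u64		entries[];
};

struct foo_fs {
	struct mutex		table_lock;	/* serializes writers */
	struct foo_table __rcu	*table;
};

/* Caller holds table_lock; readers use rcu_read_lock() + rcu_dereference(). */
static int foo_table_grow(struct foo_fs *c, size_t min_nr)
{
	struct foo_table *old = rcu_dereference_protected(c->table,
					lockdep_is_held(&c->table_lock));
	struct foo_table *new;
	size_t bytes = kmalloc_size_roundup(struct_size(new, entries, min_nr));

	new = kvzalloc(bytes, GFP_KERNEL);
	if (!new)
		return -ENOMEM;

	new->nr = (bytes - sizeof(*new)) / sizeof(new->entries[0]);
	if (old)
		memcpy(new->entries, old->entries,
		       old->nr * sizeof(old->entries[0]));

	rcu_assign_pointer(c->table, new);
	kvfree_rcu(old, rcu);			/* freed once existing readers finish */
	return 0;
}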
+50 -39
fs/bcachefs/snapshot.h
··· 33 33 34 34 static inline struct snapshot_t *__snapshot_t(struct snapshot_table *t, u32 id) 35 35 { 36 - return &t->s[U32_MAX - id]; 36 + u32 idx = U32_MAX - id; 37 + 38 + return likely(t && idx < t->nr) 39 + ? &t->s[idx] 40 + : NULL; 37 41 } 38 42 39 43 static inline const struct snapshot_t *snapshot_t(struct bch_fs *c, u32 id) ··· 48 44 static inline u32 bch2_snapshot_tree(struct bch_fs *c, u32 id) 49 45 { 50 46 rcu_read_lock(); 51 - id = snapshot_t(c, id)->tree; 47 + const struct snapshot_t *s = snapshot_t(c, id); 48 + id = s ? s->tree : 0; 52 49 rcu_read_unlock(); 53 50 54 51 return id; ··· 57 52 58 53 static inline u32 __bch2_snapshot_parent_early(struct bch_fs *c, u32 id) 59 54 { 60 - return snapshot_t(c, id)->parent; 55 + const struct snapshot_t *s = snapshot_t(c, id); 56 + return s ? s->parent : 0; 61 57 } 62 58 63 59 static inline u32 bch2_snapshot_parent_early(struct bch_fs *c, u32 id) ··· 72 66 73 67 static inline u32 __bch2_snapshot_parent(struct bch_fs *c, u32 id) 74 68 { 75 - #ifdef CONFIG_BCACHEFS_DEBUG 76 - u32 parent = snapshot_t(c, id)->parent; 69 + const struct snapshot_t *s = snapshot_t(c, id); 70 + if (!s) 71 + return 0; 77 72 78 - if (parent && 79 - snapshot_t(c, id)->depth != snapshot_t(c, parent)->depth + 1) 73 + u32 parent = s->parent; 74 + if (IS_ENABLED(CONFIG_BCACHEFS_DEBUG) && 75 + parent && 76 + s->depth != snapshot_t(c, parent)->depth + 1) 80 77 panic("id %u depth=%u parent %u depth=%u\n", 81 78 id, snapshot_t(c, id)->depth, 82 79 parent, snapshot_t(c, parent)->depth); 83 80 84 81 return parent; 85 - #else 86 - return snapshot_t(c, id)->parent; 87 - #endif 88 82 } 89 83 90 84 static inline u32 bch2_snapshot_parent(struct bch_fs *c, u32 id) ··· 122 116 123 117 static inline u32 __bch2_snapshot_equiv(struct bch_fs *c, u32 id) 124 118 { 125 - return snapshot_t(c, id)->equiv; 119 + const struct snapshot_t *s = snapshot_t(c, id); 120 + return s ? s->equiv : 0; 126 121 } 127 122 128 123 static inline u32 bch2_snapshot_equiv(struct bch_fs *c, u32 id) ··· 140 133 return id == bch2_snapshot_equiv(c, id); 141 134 } 142 135 143 - static inline bool bch2_snapshot_is_internal_node(struct bch_fs *c, u32 id) 136 + static inline int bch2_snapshot_is_internal_node(struct bch_fs *c, u32 id) 144 137 { 145 - const struct snapshot_t *s; 146 - bool ret; 147 - 148 138 rcu_read_lock(); 149 - s = snapshot_t(c, id); 150 - ret = s->children[0]; 139 + const struct snapshot_t *s = snapshot_t(c, id); 140 + int ret = s ? 
s->children[0] : -BCH_ERR_invalid_snapshot_node; 151 141 rcu_read_unlock(); 152 142 153 143 return ret; 154 144 } 155 145 156 - static inline u32 bch2_snapshot_is_leaf(struct bch_fs *c, u32 id) 146 + static inline int bch2_snapshot_is_leaf(struct bch_fs *c, u32 id) 157 147 { 158 - return !bch2_snapshot_is_internal_node(c, id); 159 - } 160 - 161 - static inline u32 bch2_snapshot_sibling(struct bch_fs *c, u32 id) 162 - { 163 - const struct snapshot_t *s; 164 - u32 parent = __bch2_snapshot_parent(c, id); 165 - 166 - if (!parent) 167 - return 0; 168 - 169 - s = snapshot_t(c, __bch2_snapshot_parent(c, id)); 170 - if (id == s->children[0]) 171 - return s->children[1]; 172 - if (id == s->children[1]) 173 - return s->children[0]; 174 - return 0; 148 + int ret = bch2_snapshot_is_internal_node(c, id); 149 + if (ret < 0) 150 + return ret; 151 + return !ret; 175 152 } 176 153 177 154 static inline u32 bch2_snapshot_depth(struct bch_fs *c, u32 parent) ··· 209 218 210 219 static inline int snapshot_list_add(struct bch_fs *c, snapshot_id_list *s, u32 id) 211 220 { 212 - int ret; 213 - 214 221 BUG_ON(snapshot_list_has_id(s, id)); 215 - ret = darray_push(s, id); 222 + int ret = darray_push(s, id); 216 223 if (ret) 217 224 bch_err(c, "error reallocating snapshot_id_list (size %zu)", s->size); 218 225 return ret; 226 + } 227 + 228 + static inline int snapshot_list_add_nodup(struct bch_fs *c, snapshot_id_list *s, u32 id) 229 + { 230 + int ret = snapshot_list_has_id(s, id) 231 + ? 0 232 + : darray_push(s, id); 233 + if (ret) 234 + bch_err(c, "error reallocating snapshot_id_list (size %zu)", s->size); 235 + return ret; 236 + } 237 + 238 + static inline int snapshot_list_merge(struct bch_fs *c, snapshot_id_list *dst, snapshot_id_list *src) 239 + { 240 + darray_for_each(*src, i) { 241 + int ret = snapshot_list_add_nodup(c, dst, *i); 242 + if (ret) 243 + return ret; 244 + } 245 + 246 + return 0; 219 247 } 220 248 221 249 int bch2_snapshot_lookup(struct btree_trans *trans, u32 id, ··· 248 238 249 239 int bch2_check_snapshot_trees(struct bch_fs *); 250 240 int bch2_check_snapshots(struct bch_fs *); 241 + int bch2_reconstruct_snapshots(struct bch_fs *); 251 242 252 243 int bch2_snapshot_node_set_deleted(struct btree_trans *, u32); 253 244 void bch2_delete_dead_snapshots_work(struct work_struct *); ··· 260 249 struct bpos pos) 261 250 { 262 251 if (!btree_type_has_snapshots(id) || 263 - bch2_snapshot_is_leaf(trans->c, pos.snapshot)) 252 + bch2_snapshot_is_leaf(trans->c, pos.snapshot) > 0) 264 253 return 0; 265 254 266 255 return __bch2_key_has_snapshot_overwrites(trans, id, pos);
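Several of these helpers change from bool to int so a missing table entry can surface as a negative error instead of being silently treated as false; call sites then have to test for a strictly positive return (as bch2_key_has_snapshot_overwrites() now does with "> 0") rather than plain truthiness. A tiny standalone illustration of why the comparison matters; the helper and error value are made up.

/* tristate.c: bool-style truthiness vs tri-state int (<0 error, 0 no, >0 yes) */
#include <stdio.h>

#define ERR_BAD_NODE	(-22)	/* stand-in for a real error code */

static int is_leaf(int id)
{
	if (id < 0)
		return ERR_BAD_NODE;	/* lookup failed: report it, don't guess */
	return id % 2;			/* hypothetical: odd ids are leaves */
}

int main(void)
{
	int bad = -1;

	/* Wrong: a negative error is truthy, so the failure looks like "leaf". */
	printf("truthy test: %d\n", is_leaf(bad) ? 1 : 0);

	/* Right: only a strictly positive return means "yes". */
	printf("> 0 test:    %d\n", is_leaf(bad) > 0 ? 1 : 0);
	return 0;
}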
+72
fs/bcachefs/subvolume.c
··· 595 595 return ret; 596 596 } 597 597 598 + int bch2_initialize_subvolumes(struct bch_fs *c) 599 + { 600 + struct bkey_i_snapshot_tree root_tree; 601 + struct bkey_i_snapshot root_snapshot; 602 + struct bkey_i_subvolume root_volume; 603 + int ret; 604 + 605 + bkey_snapshot_tree_init(&root_tree.k_i); 606 + root_tree.k.p.offset = 1; 607 + root_tree.v.master_subvol = cpu_to_le32(1); 608 + root_tree.v.root_snapshot = cpu_to_le32(U32_MAX); 609 + 610 + bkey_snapshot_init(&root_snapshot.k_i); 611 + root_snapshot.k.p.offset = U32_MAX; 612 + root_snapshot.v.flags = 0; 613 + root_snapshot.v.parent = 0; 614 + root_snapshot.v.subvol = cpu_to_le32(BCACHEFS_ROOT_SUBVOL); 615 + root_snapshot.v.tree = cpu_to_le32(1); 616 + SET_BCH_SNAPSHOT_SUBVOL(&root_snapshot.v, true); 617 + 618 + bkey_subvolume_init(&root_volume.k_i); 619 + root_volume.k.p.offset = BCACHEFS_ROOT_SUBVOL; 620 + root_volume.v.flags = 0; 621 + root_volume.v.snapshot = cpu_to_le32(U32_MAX); 622 + root_volume.v.inode = cpu_to_le64(BCACHEFS_ROOT_INO); 623 + 624 + ret = bch2_btree_insert(c, BTREE_ID_snapshot_trees, &root_tree.k_i, NULL, 0) ?: 625 + bch2_btree_insert(c, BTREE_ID_snapshots, &root_snapshot.k_i, NULL, 0) ?: 626 + bch2_btree_insert(c, BTREE_ID_subvolumes, &root_volume.k_i, NULL, 0); 627 + bch_err_fn(c, ret); 628 + return ret; 629 + } 630 + 631 + static int __bch2_fs_upgrade_for_subvolumes(struct btree_trans *trans) 632 + { 633 + struct btree_iter iter; 634 + struct bkey_s_c k; 635 + struct bch_inode_unpacked inode; 636 + int ret; 637 + 638 + k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_inodes, 639 + SPOS(0, BCACHEFS_ROOT_INO, U32_MAX), 0); 640 + ret = bkey_err(k); 641 + if (ret) 642 + return ret; 643 + 644 + if (!bkey_is_inode(k.k)) { 645 + bch_err(trans->c, "root inode not found"); 646 + ret = -BCH_ERR_ENOENT_inode; 647 + goto err; 648 + } 649 + 650 + ret = bch2_inode_unpack(k, &inode); 651 + BUG_ON(ret); 652 + 653 + inode.bi_subvol = BCACHEFS_ROOT_SUBVOL; 654 + 655 + ret = bch2_inode_write(trans, &iter, &inode); 656 + err: 657 + bch2_trans_iter_exit(trans, &iter); 658 + return ret; 659 + } 660 + 661 + /* set bi_subvol on root inode */ 662 + int bch2_fs_upgrade_for_subvolumes(struct bch_fs *c) 663 + { 664 + int ret = bch2_trans_do(c, NULL, NULL, BCH_TRANS_COMMIT_lazy_rw, 665 + __bch2_fs_upgrade_for_subvolumes(trans)); 666 + bch_err_fn(c, ret); 667 + return ret; 668 + } 669 + 598 670 int bch2_fs_subvolumes_init(struct bch_fs *c) 599 671 { 600 672 INIT_WORK(&c->snapshot_delete_work, bch2_delete_dead_snapshots_work);
+3
fs/bcachefs/subvolume.h
··· 37 37 int bch2_subvolume_unlink(struct btree_trans *, u32); 38 38 int bch2_subvolume_create(struct btree_trans *, u64, u32, u32, u32 *, u32 *, bool); 39 39 40 + int bch2_initialize_subvolumes(struct bch_fs *); 41 + int bch2_fs_upgrade_for_subvolumes(struct bch_fs *); 42 + 40 43 int bch2_fs_subvolumes_init(struct bch_fs *); 41 44 42 45 #endif /* _BCACHEFS_SUBVOLUME_H */
+2
fs/bcachefs/subvolume_types.h
··· 20 20 }; 21 21 22 22 struct snapshot_table { 23 + struct rcu_head rcu; 24 + size_t nr; 23 25 #ifndef RUST_BINDGEN 24 26 DECLARE_FLEX_ARRAY(struct snapshot_t, s); 25 27 #else
+10 -3
fs/bcachefs/super-io.c
··· 8 8 #include "journal.h" 9 9 #include "journal_sb.h" 10 10 #include "journal_seq_blacklist.h" 11 - #include "recovery.h" 11 + #include "recovery_passes.h" 12 12 #include "replicas.h" 13 13 #include "quota.h" 14 14 #include "sb-clean.h" ··· 143 143 { 144 144 kfree(sb->bio); 145 145 if (!IS_ERR_OR_NULL(sb->s_bdev_file)) 146 - fput(sb->s_bdev_file); 146 + bdev_fput(sb->s_bdev_file); 147 147 kfree(sb->holder); 148 148 kfree(sb->sb_name); 149 149 ··· 527 527 memset(c->sb.errors_silent, 0, sizeof(c->sb.errors_silent)); 528 528 529 529 struct bch_sb_field_ext *ext = bch2_sb_field_get(src, ext); 530 - if (ext) 530 + if (ext) { 531 531 le_bitvector_to_cpu(c->sb.errors_silent, (void *) ext->errors_silent, 532 532 sizeof(c->sb.errors_silent) * 8); 533 + c->sb.btrees_lost_data = le64_to_cpu(ext->btrees_lost_data); 534 + } 533 535 534 536 for_each_member_device(c, ca) { 535 537 struct bch_member m = bch2_sb_member_get(src, ca->dev_idx); ··· 1164 1162 1165 1163 kfree(errors_silent); 1166 1164 } 1165 + 1166 + prt_printf(out, "Btrees with missing data:"); 1167 + prt_tab(out); 1168 + prt_bitflags(out, __bch2_btree_ids, le64_to_cpu(e->btrees_lost_data)); 1169 + prt_newline(out); 1167 1170 } 1168 1171 1169 1172 static const struct bch_sb_field_ops bch_sb_field_ops_ext = {
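The new ext-section dump line renders btrees_lost_data through prt_bitflags(), which boils down to walking the set bits and indexing a NULL-terminated name table. A standalone sketch of that rendering; the names below are illustrative, not the real __bch2_btree_ids list.

/* bitflags.c: print the set bits of a mask using a NULL-terminated name table */
#include <stdint.h>
#include <stdio.h>

static const char * const btree_names[] = {
	"extents", "inodes", "dirents", "alloc", NULL,
};

static void print_bitflags(const char * const names[], uint64_t mask)
{
	int printed = 0;

	for (unsigned i = 0; names[i]; i++)
		if (mask & (1ULL << i))
			printf("%s%s", printed++ ? "," : "", names[i]);
	printf("\n");
}

int main(void)
{
	print_bitflags(btree_names, (1ULL << 1) | (1ULL << 3));	/* inodes,alloc */
	return 0;
}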
+14 -2
fs/bcachefs/super.c
··· 15 15 #include "btree_gc.h" 16 16 #include "btree_journal_iter.h" 17 17 #include "btree_key_cache.h" 18 + #include "btree_node_scan.h" 18 19 #include "btree_update_interior.h" 19 20 #include "btree_io.h" 20 21 #include "btree_write_buffer.h" ··· 366 365 !test_bit(BCH_FS_emergency_ro, &c->flags) && 367 366 test_bit(BCH_FS_started, &c->flags) && 368 367 test_bit(BCH_FS_clean_shutdown, &c->flags) && 369 - !c->opts.norecovery) { 368 + c->recovery_pass_done >= BCH_RECOVERY_PASS_journal_replay) { 370 369 BUG_ON(c->journal.last_empty_seq != journal_cur_seq(&c->journal)); 371 370 BUG_ON(atomic_read(&c->btree_cache.dirty)); 372 371 BUG_ON(atomic_long_read(&c->btree_key_cache.nr_dirty)); ··· 511 510 512 511 int bch2_fs_read_write(struct bch_fs *c) 513 512 { 514 - if (c->opts.norecovery) 513 + if (c->opts.recovery_pass_last && 514 + c->opts.recovery_pass_last < BCH_RECOVERY_PASS_journal_replay) 515 515 return -BCH_ERR_erofs_norecovery; 516 516 517 517 if (c->opts.nochanges) ··· 537 535 for (i = 0; i < BCH_TIME_STAT_NR; i++) 538 536 bch2_time_stats_exit(&c->times[i]); 539 537 538 + bch2_find_btree_nodes_exit(&c->found_btree_nodes); 540 539 bch2_free_pending_node_rewrites(c); 541 540 bch2_fs_sb_errors_exit(c); 542 541 bch2_fs_counters_exit(c); ··· 562 559 bch2_io_clock_exit(&c->io_clock[READ]); 563 560 bch2_fs_compress_exit(c); 564 561 bch2_journal_keys_put_initial(c); 562 + bch2_find_btree_nodes_exit(&c->found_btree_nodes); 565 563 BUG_ON(atomic_read(&c->journal_keys.ref)); 566 564 bch2_fs_btree_write_buffer_exit(c); 567 565 percpu_free_rwsem(&c->mark_lock); ··· 1019 1015 for_each_online_member(c, ca) 1020 1016 bch2_members_v2_get_mut(c->disk_sb.sb, ca->dev_idx)->last_mount = cpu_to_le64(now); 1021 1017 1018 + struct bch_sb_field_ext *ext = 1019 + bch2_sb_field_get_minsize(&c->disk_sb, ext, sizeof(*ext) / sizeof(u64)); 1022 1020 mutex_unlock(&c->sb_lock); 1021 + 1022 + if (!ext) { 1023 + bch_err(c, "insufficient space in superblock for sb_field_ext"); 1024 + ret = -BCH_ERR_ENOSPC_sb; 1025 + goto err; 1026 + } 1023 1027 1024 1028 for_each_rw_member(c, ca) 1025 1029 bch2_dev_allocator_add(c, ca);
-143
fs/bcachefs/util.c
··· 707 707 } 708 708 } 709 709 710 - static int alignment_ok(const void *base, size_t align) 711 - { 712 - return IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) || 713 - ((unsigned long)base & (align - 1)) == 0; 714 - } 715 - 716 - static void u32_swap(void *a, void *b, size_t size) 717 - { 718 - u32 t = *(u32 *)a; 719 - *(u32 *)a = *(u32 *)b; 720 - *(u32 *)b = t; 721 - } 722 - 723 - static void u64_swap(void *a, void *b, size_t size) 724 - { 725 - u64 t = *(u64 *)a; 726 - *(u64 *)a = *(u64 *)b; 727 - *(u64 *)b = t; 728 - } 729 - 730 - static void generic_swap(void *a, void *b, size_t size) 731 - { 732 - char t; 733 - 734 - do { 735 - t = *(char *)a; 736 - *(char *)a++ = *(char *)b; 737 - *(char *)b++ = t; 738 - } while (--size > 0); 739 - } 740 - 741 - static inline int do_cmp(void *base, size_t n, size_t size, 742 - int (*cmp_func)(const void *, const void *, size_t), 743 - size_t l, size_t r) 744 - { 745 - return cmp_func(base + inorder_to_eytzinger0(l, n) * size, 746 - base + inorder_to_eytzinger0(r, n) * size, 747 - size); 748 - } 749 - 750 - static inline void do_swap(void *base, size_t n, size_t size, 751 - void (*swap_func)(void *, void *, size_t), 752 - size_t l, size_t r) 753 - { 754 - swap_func(base + inorder_to_eytzinger0(l, n) * size, 755 - base + inorder_to_eytzinger0(r, n) * size, 756 - size); 757 - } 758 - 759 - void eytzinger0_sort(void *base, size_t n, size_t size, 760 - int (*cmp_func)(const void *, const void *, size_t), 761 - void (*swap_func)(void *, void *, size_t)) 762 - { 763 - int i, c, r; 764 - 765 - if (!swap_func) { 766 - if (size == 4 && alignment_ok(base, 4)) 767 - swap_func = u32_swap; 768 - else if (size == 8 && alignment_ok(base, 8)) 769 - swap_func = u64_swap; 770 - else 771 - swap_func = generic_swap; 772 - } 773 - 774 - /* heapify */ 775 - for (i = n / 2 - 1; i >= 0; --i) { 776 - for (r = i; r * 2 + 1 < n; r = c) { 777 - c = r * 2 + 1; 778 - 779 - if (c + 1 < n && 780 - do_cmp(base, n, size, cmp_func, c, c + 1) < 0) 781 - c++; 782 - 783 - if (do_cmp(base, n, size, cmp_func, r, c) >= 0) 784 - break; 785 - 786 - do_swap(base, n, size, swap_func, r, c); 787 - } 788 - } 789 - 790 - /* sort */ 791 - for (i = n - 1; i > 0; --i) { 792 - do_swap(base, n, size, swap_func, 0, i); 793 - 794 - for (r = 0; r * 2 + 1 < i; r = c) { 795 - c = r * 2 + 1; 796 - 797 - if (c + 1 < i && 798 - do_cmp(base, n, size, cmp_func, c, c + 1) < 0) 799 - c++; 800 - 801 - if (do_cmp(base, n, size, cmp_func, r, c) >= 0) 802 - break; 803 - 804 - do_swap(base, n, size, swap_func, r, c); 805 - } 806 - } 807 - } 808 - 809 - void sort_cmp_size(void *base, size_t num, size_t size, 810 - int (*cmp_func)(const void *, const void *, size_t), 811 - void (*swap_func)(void *, void *, size_t size)) 812 - { 813 - /* pre-scale counters for performance */ 814 - int i = (num/2 - 1) * size, n = num * size, c, r; 815 - 816 - if (!swap_func) { 817 - if (size == 4 && alignment_ok(base, 4)) 818 - swap_func = u32_swap; 819 - else if (size == 8 && alignment_ok(base, 8)) 820 - swap_func = u64_swap; 821 - else 822 - swap_func = generic_swap; 823 - } 824 - 825 - /* heapify */ 826 - for ( ; i >= 0; i -= size) { 827 - for (r = i; r * 2 + size < n; r = c) { 828 - c = r * 2 + size; 829 - if (c < n - size && 830 - cmp_func(base + c, base + c + size, size) < 0) 831 - c += size; 832 - if (cmp_func(base + r, base + c, size) >= 0) 833 - break; 834 - swap_func(base + r, base + c, size); 835 - } 836 - } 837 - 838 - /* sort */ 839 - for (i = n - size; i > 0; i -= size) { 840 - swap_func(base, base + i, size); 841 - for 
(r = 0; r * 2 + size < i; r = c) { 842 - c = r * 2 + size; 843 - if (c < i - size && 844 - cmp_func(base + c, base + c + size, size) < 0) 845 - c += size; 846 - if (cmp_func(base + r, base + c, size) >= 0) 847 - break; 848 - swap_func(base + r, base + c, size); 849 - } 850 - } 851 - } 852 - 853 710 #if 0 854 711 void eytzinger1_test(void) 855 712 {
+10 -4
fs/bcachefs/util.h
··· 631 631 memset(s + bytes, c, rem); 632 632 } 633 633 634 - void sort_cmp_size(void *base, size_t num, size_t size, 635 - int (*cmp_func)(const void *, const void *, size_t), 636 - void (*swap_func)(void *, void *, size_t)); 637 - 638 634 /* just the memmove, doesn't update @_nr */ 639 635 #define __array_insert_item(_array, _nr, _pos) \ 640 636 memmove(&(_array)[(_pos) + 1], \ ··· 791 795 static inline void __set_bit_le64(size_t bit, __le64 *addr) 792 796 { 793 797 addr[bit / 64] |= cpu_to_le64(BIT_ULL(bit % 64)); 798 + } 799 + 800 + static inline void __clear_bit_le64(size_t bit, __le64 *addr) 801 + { 802 + addr[bit / 64] &= ~cpu_to_le64(BIT_ULL(bit % 64)); 803 + } 804 + 805 + static inline bool test_bit_le64(size_t bit, __le64 *addr) 806 + { 807 + return (addr[bit / 64] & cpu_to_le64(BIT_ULL(bit % 64))) != 0; 794 808 } 795 809 796 810 #endif /* _BCACHEFS_UTIL_H */
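test_bit_le64() and __clear_bit_le64() round out the little-endian u64 bitmap helpers used for fields like ext->recovery_passes_required: bit b lives in word b/64 at position b%64, and each word is stored little-endian so the on-disk layout does not depend on host byte order. A standalone sketch of just the index math on host-endian words (the cpu_to_le64()/le64_to_cpu() conversion step is deliberately left out):

/* bitmap64.c: bit b of a multi-word bitmap -> word b/64, bit b%64 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void set_bit64(size_t bit, uint64_t *addr)
{
	addr[bit / 64] |= 1ULL << (bit % 64);
}

static void clear_bit64(size_t bit, uint64_t *addr)
{
	addr[bit / 64] &= ~(1ULL << (bit % 64));	/* ~ (bitwise not) keeps the other bits */
}

static int test_bit64(size_t bit, const uint64_t *addr)
{
	return (addr[bit / 64] >> (bit % 64)) & 1;
}

int main(void)
{
	uint64_t map[2] = { 0, 0 };

	set_bit64(3, map);
	set_bit64(70, map);		/* lands in map[1], bit 6 */
	assert(test_bit64(3, map) && test_bit64(70, map));

	clear_bit64(70, map);
	assert(test_bit64(3, map) && !test_bit64(70, map));
	return 0;
}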
+1 -1
fs/cramfs/inode.c
··· 495 495 sb->s_mtd = NULL; 496 496 } else if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV) && sb->s_bdev) { 497 497 sync_blockdev(sb->s_bdev); 498 - fput(sb->s_bdev_file); 498 + bdev_fput(sb->s_bdev_file); 499 499 } 500 500 kfree(sbi); 501 501 }
+4 -4
fs/ext4/super.c
··· 5668 5668 brelse(sbi->s_sbh); 5669 5669 if (sbi->s_journal_bdev_file) { 5670 5670 invalidate_bdev(file_bdev(sbi->s_journal_bdev_file)); 5671 - fput(sbi->s_journal_bdev_file); 5671 + bdev_fput(sbi->s_journal_bdev_file); 5672 5672 } 5673 5673 out_fail: 5674 5674 invalidate_bdev(sb->s_bdev); ··· 5913 5913 out_bh: 5914 5914 brelse(bh); 5915 5915 out_bdev: 5916 - fput(bdev_file); 5916 + bdev_fput(bdev_file); 5917 5917 return ERR_PTR(errno); 5918 5918 } 5919 5919 ··· 5952 5952 out_journal: 5953 5953 jbd2_journal_destroy(journal); 5954 5954 out_bdev: 5955 - fput(bdev_file); 5955 + bdev_fput(bdev_file); 5956 5956 return ERR_PTR(errno); 5957 5957 } 5958 5958 ··· 7327 7327 kill_block_super(sb); 7328 7328 7329 7329 if (bdev_file) 7330 - fput(bdev_file); 7330 + bdev_fput(bdev_file); 7331 7331 } 7332 7332 7333 7333 static struct file_system_type ext4_fs_type = {
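The fput() to bdev_fput() conversions in this series (cramfs above; ext4 here; f2fs, jfs, reiserfs and romfs below) all close block-device files obtained from the bdev_file_open_by_*() helpers; bdev_fput() releases the exclusive claim on the device right away instead of leaving it to a deferred fput. A minimal open/use/close sketch of the pairing; the path and holder are placeholders.

/* Sketch: open a block device as a file, use it, release it with bdev_fput(). */
#include <linux/blkdev.h>
#include <linux/err.h>
#include <linux/printk.h>

static int probe_some_bdev(void *holder)
{
	struct file *bdev_file;
	struct block_device *bdev;

	bdev_file = bdev_file_open_by_path("/dev/sdX",		/* placeholder path */
					   BLK_OPEN_READ, holder, NULL);
	if (IS_ERR(bdev_file))
		return PTR_ERR(bdev_file);

	bdev = file_bdev(bdev_file);
	pr_info("opened %pg\n", bdev);

	/* ... read a superblock, etc ... */

	bdev_fput(bdev_file);	/* not plain fput(): drops the holder claim right away */
	return 0;
}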
+1 -1
fs/f2fs/super.c
··· 1558 1558 1559 1559 for (i = 0; i < sbi->s_ndevs; i++) { 1560 1560 if (i > 0) 1561 - fput(FDEV(i).bdev_file); 1561 + bdev_fput(FDEV(i).bdev_file); 1562 1562 #ifdef CONFIG_BLK_DEV_ZONED 1563 1563 kvfree(FDEV(i).blkz_seq); 1564 1564 #endif
+2 -2
fs/jfs/jfs_logmgr.c
··· 1141 1141 lbmLogShutdown(log); 1142 1142 1143 1143 close: /* close external log device */ 1144 - fput(bdev_file); 1144 + bdev_fput(bdev_file); 1145 1145 1146 1146 free: /* free log descriptor */ 1147 1147 mutex_unlock(&jfs_log_mutex); ··· 1485 1485 bdev_file = log->bdev_file; 1486 1486 rc = lmLogShutdown(log); 1487 1487 1488 - fput(bdev_file); 1488 + bdev_fput(bdev_file); 1489 1489 1490 1490 kfree(log); 1491 1491
+2 -5
fs/namei.c
··· 4050 4050 case 0: case S_IFREG: 4051 4051 error = vfs_create(idmap, path.dentry->d_inode, 4052 4052 dentry, mode, true); 4053 + if (!error) 4054 + security_path_post_mknod(idmap, dentry); 4053 4055 break; 4054 4056 case S_IFCHR: case S_IFBLK: 4055 4057 error = vfs_mknod(idmap, path.dentry->d_inode, ··· 4062 4060 dentry, mode, 0); 4063 4061 break; 4064 4062 } 4065 - 4066 - if (error) 4067 - goto out2; 4068 - 4069 - security_path_post_mknod(idmap, dentry); 4070 4063 out2: 4071 4064 done_path_create(&path, dentry); 4072 4065 if (retry_estale(error, lookup_flags)) {
+2 -5
fs/nfsd/nfs4state.c
··· 3042 3042 nfsd4_cb_recall_any_release(struct nfsd4_callback *cb) 3043 3043 { 3044 3044 struct nfs4_client *clp = cb->cb_clp; 3045 - struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id); 3046 3045 3047 - spin_lock(&nn->client_lock); 3048 3046 clear_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags); 3049 - put_client_renew_locked(clp); 3050 - spin_unlock(&nn->client_lock); 3047 + drop_client(clp); 3051 3048 } 3052 3049 3053 3050 static int ··· 6613 6616 list_add(&clp->cl_ra_cblist, &cblist); 6614 6617 6615 6618 /* release in nfsd4_cb_recall_any_release */ 6616 - atomic_inc(&clp->cl_rpc_users); 6619 + kref_get(&clp->cl_nfsdfs.cl_ref); 6617 6620 set_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags); 6618 6621 clp->cl_ra_time = ktime_get_boottime_seconds(); 6619 6622 }
+1 -1
fs/reiserfs/journal.c
··· 2589 2589 static void release_journal_dev(struct reiserfs_journal *journal) 2590 2590 { 2591 2591 if (journal->j_bdev_file) { 2592 - fput(journal->j_bdev_file); 2592 + bdev_fput(journal->j_bdev_file); 2593 2593 journal->j_bdev_file = NULL; 2594 2594 } 2595 2595 }
+1 -1
fs/romfs/super.c
··· 594 594 #ifdef CONFIG_ROMFS_ON_BLOCK 595 595 if (sb->s_bdev) { 596 596 sync_blockdev(sb->s_bdev); 597 - fput(sb->s_bdev_file); 597 + bdev_fput(sb->s_bdev_file); 598 598 } 599 599 #endif 600 600 }
+4 -2
fs/smb/client/cached_dir.c
··· 417 417 { 418 418 struct cached_fid *cfid = container_of(ref, struct cached_fid, 419 419 refcount); 420 + int rc; 420 421 421 422 spin_lock(&cfid->cfids->cfid_list_lock); 422 423 if (cfid->on_list) { ··· 431 430 cfid->dentry = NULL; 432 431 433 432 if (cfid->is_open) { 434 - SMB2_close(0, cfid->tcon, cfid->fid.persistent_fid, 433 + rc = SMB2_close(0, cfid->tcon, cfid->fid.persistent_fid, 435 434 cfid->fid.volatile_fid); 436 - atomic_dec(&cfid->tcon->num_remote_opens); 435 + if (rc != -EBUSY && rc != -EAGAIN) 436 + atomic_dec(&cfid->tcon->num_remote_opens); 437 437 } 438 438 439 439 free_cached_dir(cfid);
+6
fs/smb/client/cifs_debug.c
··· 250 250 spin_lock(&cifs_tcp_ses_lock); 251 251 list_for_each_entry(server, &cifs_tcp_ses_list, tcp_ses_list) { 252 252 list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) { 253 + if (cifs_ses_exiting(ses)) 254 + continue; 253 255 list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { 254 256 spin_lock(&tcon->open_file_lock); 255 257 list_for_each_entry(cfile, &tcon->openFileList, tlist) { ··· 678 676 } 679 677 #endif /* CONFIG_CIFS_STATS2 */ 680 678 list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) { 679 + if (cifs_ses_exiting(ses)) 680 + continue; 681 681 list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { 682 682 atomic_set(&tcon->num_smbs_sent, 0); 683 683 spin_lock(&tcon->stat_lock); ··· 759 755 } 760 756 #endif /* STATS2 */ 761 757 list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) { 758 + if (cifs_ses_exiting(ses)) 759 + continue; 762 760 list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { 763 761 i++; 764 762 seq_printf(m, "\n%d) %s", i, tcon->tree_name);
+11
fs/smb/client/cifsfs.c
··· 156 156 struct workqueue_struct *fileinfo_put_wq; 157 157 struct workqueue_struct *cifsoplockd_wq; 158 158 struct workqueue_struct *deferredclose_wq; 159 + struct workqueue_struct *serverclose_wq; 159 160 __u32 cifs_lock_secret; 160 161 161 162 /* ··· 1889 1888 goto out_destroy_cifsoplockd_wq; 1890 1889 } 1891 1890 1891 + serverclose_wq = alloc_workqueue("serverclose", 1892 + WQ_FREEZABLE|WQ_MEM_RECLAIM, 0); 1893 + if (!serverclose_wq) { 1894 + rc = -ENOMEM; 1895 + goto out_destroy_serverclose_wq; 1896 + } 1897 + 1892 1898 rc = cifs_init_inodecache(); 1893 1899 if (rc) 1894 1900 goto out_destroy_deferredclose_wq; ··· 1970 1962 destroy_workqueue(decrypt_wq); 1971 1963 out_destroy_cifsiod_wq: 1972 1964 destroy_workqueue(cifsiod_wq); 1965 + out_destroy_serverclose_wq: 1966 + destroy_workqueue(serverclose_wq); 1973 1967 out_clean_proc: 1974 1968 cifs_proc_clean(); 1975 1969 return rc; ··· 2001 1991 destroy_workqueue(cifsoplockd_wq); 2002 1992 destroy_workqueue(decrypt_wq); 2003 1993 destroy_workqueue(fileinfo_put_wq); 1994 + destroy_workqueue(serverclose_wq); 2004 1995 destroy_workqueue(cifsiod_wq); 2005 1996 cifs_proc_clean(); 2006 1997 }
+15 -4
fs/smb/client/cifsglob.h
··· 442 442 /* set fid protocol-specific info */ 443 443 void (*set_fid)(struct cifsFileInfo *, struct cifs_fid *, __u32); 444 444 /* close a file */ 445 - void (*close)(const unsigned int, struct cifs_tcon *, 445 + int (*close)(const unsigned int, struct cifs_tcon *, 446 446 struct cifs_fid *); 447 447 /* close a file, returning file attributes and timestamps */ 448 - void (*close_getattr)(const unsigned int xid, struct cifs_tcon *tcon, 448 + int (*close_getattr)(const unsigned int xid, struct cifs_tcon *tcon, 449 449 struct cifsFileInfo *pfile_info); 450 450 /* send a flush request to the server */ 451 451 int (*flush)(const unsigned int, struct cifs_tcon *, struct cifs_fid *); ··· 1281 1281 struct cached_fids *cfids; 1282 1282 /* BB add field for back pointer to sb struct(s)? */ 1283 1283 #ifdef CONFIG_CIFS_DFS_UPCALL 1284 - struct list_head dfs_ses_list; 1285 1284 struct delayed_work dfs_cache_work; 1286 1285 #endif 1287 1286 struct delayed_work query_interfaces; /* query interfaces workqueue job */ ··· 1439 1440 bool swapfile:1; 1440 1441 bool oplock_break_cancelled:1; 1441 1442 bool status_file_deleted:1; /* file has been deleted */ 1443 + bool offload:1; /* offload final part of _put to a wq */ 1442 1444 unsigned int oplock_epoch; /* epoch from the lease break */ 1443 1445 __u32 oplock_level; /* oplock/lease level from the lease break */ 1444 1446 int count; ··· 1448 1448 struct cifs_search_info srch_inf; 1449 1449 struct work_struct oplock_break; /* work for oplock breaks */ 1450 1450 struct work_struct put; /* work for the final part of _put */ 1451 + struct work_struct serverclose; /* work for serverclose */ 1451 1452 struct delayed_work deferred; 1452 1453 bool deferred_close_scheduled; /* Flag to indicate close is scheduled */ 1453 1454 char *symlink_target; ··· 1805 1804 struct TCP_Server_Info *server; 1806 1805 struct cifs_ses *ses; 1807 1806 struct cifs_tcon *tcon; 1808 - struct list_head dfs_ses_list; 1809 1807 }; 1810 1808 1811 1809 static inline void __free_dfs_info_param(struct dfs_info3_param *param) ··· 2105 2105 extern struct workqueue_struct *fileinfo_put_wq; 2106 2106 extern struct workqueue_struct *cifsoplockd_wq; 2107 2107 extern struct workqueue_struct *deferredclose_wq; 2108 + extern struct workqueue_struct *serverclose_wq; 2108 2109 extern __u32 cifs_lock_secret; 2109 2110 2110 2111 extern mempool_t *cifs_sm_req_poolp; ··· 2324 2323 struct smb2_file_link_info link_info; 2325 2324 struct kvec ea_iov; 2326 2325 }; 2326 + 2327 + static inline bool cifs_ses_exiting(struct cifs_ses *ses) 2328 + { 2329 + bool ret; 2330 + 2331 + spin_lock(&ses->ses_lock); 2332 + ret = ses->ses_status == SES_EXITING; 2333 + spin_unlock(&ses->ses_lock); 2334 + return ret; 2335 + } 2327 2336 2328 2337 #endif /* _CIFS_GLOB_H */
+10 -10
fs/smb/client/cifsproto.h
··· 725 725 void cifs_put_tcon_super(struct super_block *sb); 726 726 int cifs_wait_for_server_reconnect(struct TCP_Server_Info *server, bool retry); 727 727 728 - /* Put references of @ses and @ses->dfs_root_ses */ 728 + /* Put references of @ses and its children */ 729 729 static inline void cifs_put_smb_ses(struct cifs_ses *ses) 730 730 { 731 - struct cifs_ses *rses = ses->dfs_root_ses; 731 + struct cifs_ses *next; 732 732 733 - __cifs_put_smb_ses(ses); 734 - if (rses) 735 - __cifs_put_smb_ses(rses); 733 + do { 734 + next = ses->dfs_root_ses; 735 + __cifs_put_smb_ses(ses); 736 + } while ((ses = next)); 736 737 } 737 738 738 - /* Get an active reference of @ses and @ses->dfs_root_ses. 739 + /* Get an active reference of @ses and its children. 739 740 * 740 741 * NOTE: make sure to call this function when incrementing reference count of 741 742 * @ses to ensure that any DFS root session attached to it (@ses->dfs_root_ses) 742 743 * will also get its reference count incremented. 743 744 * 744 - * cifs_put_smb_ses() will put both references, so call it when you're done. 745 + * cifs_put_smb_ses() will put all references, so call it when you're done. 745 746 */ 746 747 static inline void cifs_smb_ses_inc_refcount(struct cifs_ses *ses) 747 748 { 748 749 lockdep_assert_held(&cifs_tcp_ses_lock); 749 750 750 - ses->ses_count++; 751 - if (ses->dfs_root_ses) 752 - ses->dfs_root_ses->ses_count++; 751 + for (; ses; ses = ses->dfs_root_ses) 752 + ses->ses_count++; 753 753 } 754 754 755 755 static inline bool dfs_src_pathname_equal(const char *s1, const char *s2)
+2 -4
fs/smb/client/cifssmb.c
··· 5854 5854 parm_data->list.EA_flags = 0; 5855 5855 /* we checked above that name len is less than 255 */ 5856 5856 parm_data->list.name_len = (__u8)name_len; 5857 - /* EA names are always ASCII */ 5858 - if (ea_name) 5859 - strncpy(parm_data->list.name, ea_name, name_len); 5860 - parm_data->list.name[name_len] = '\0'; 5857 + /* EA names are always ASCII and NUL-terminated */ 5858 + strscpy(parm_data->list.name, ea_name ?: "", name_len + 1); 5861 5859 parm_data->list.value_len = cpu_to_le16(ea_value_len); 5862 5860 /* caller ensures that ea_value_len is less than 64K but 5863 5861 we need to ensure that it fits within the smb */
+100 -57
fs/smb/client/connect.c
··· 175 175 176 176 spin_lock(&cifs_tcp_ses_lock); 177 177 list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) { 178 + if (cifs_ses_exiting(ses)) 179 + continue; 178 180 spin_lock(&ses->chan_lock); 179 181 for (i = 0; i < ses->chan_count; i++) { 180 182 if (!ses->chans[i].server) ··· 234 232 235 233 spin_lock(&cifs_tcp_ses_lock); 236 234 list_for_each_entry_safe(ses, nses, &pserver->smb_ses_list, smb_ses_list) { 237 - /* check if iface is still active */ 235 + spin_lock(&ses->ses_lock); 236 + if (ses->ses_status == SES_EXITING) { 237 + spin_unlock(&ses->ses_lock); 238 + continue; 239 + } 240 + spin_unlock(&ses->ses_lock); 241 + 238 242 spin_lock(&ses->chan_lock); 239 243 if (cifs_ses_get_chan_index(ses, server) == 240 244 CIFS_INVAL_CHAN_INDEX) { ··· 1868 1860 ctx->sectype != ses->sectype) 1869 1861 return 0; 1870 1862 1863 + if (ctx->dfs_root_ses != ses->dfs_root_ses) 1864 + return 0; 1865 + 1871 1866 /* 1872 1867 * If an existing session is limited to less channels than 1873 1868 * requested, it should not be reused ··· 1974 1963 return rc; 1975 1964 } 1976 1965 1977 - /** 1978 - * cifs_free_ipc - helper to release the session IPC tcon 1979 - * @ses: smb session to unmount the IPC from 1980 - * 1981 - * Needs to be called everytime a session is destroyed. 1982 - * 1983 - * On session close, the IPC is closed and the server must release all tcons of the session. 1984 - * No need to send a tree disconnect here. 1985 - * 1986 - * Besides, it will make the server to not close durable and resilient files on session close, as 1987 - * specified in MS-SMB2 3.3.5.6 Receiving an SMB2 LOGOFF Request. 1988 - */ 1989 - static int 1990 - cifs_free_ipc(struct cifs_ses *ses) 1991 - { 1992 - struct cifs_tcon *tcon = ses->tcon_ipc; 1993 - 1994 - if (tcon == NULL) 1995 - return 0; 1996 - 1997 - tconInfoFree(tcon); 1998 - ses->tcon_ipc = NULL; 1999 - return 0; 2000 - } 2001 - 2002 1966 static struct cifs_ses * 2003 1967 cifs_find_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx) 2004 1968 { ··· 2005 2019 void __cifs_put_smb_ses(struct cifs_ses *ses) 2006 2020 { 2007 2021 struct TCP_Server_Info *server = ses->server; 2022 + struct cifs_tcon *tcon; 2008 2023 unsigned int xid; 2009 2024 size_t i; 2025 + bool do_logoff; 2010 2026 int rc; 2011 2027 2012 - spin_lock(&ses->ses_lock); 2013 - if (ses->ses_status == SES_EXITING) { 2014 - spin_unlock(&ses->ses_lock); 2015 - return; 2016 - } 2017 - spin_unlock(&ses->ses_lock); 2018 - 2019 - cifs_dbg(FYI, "%s: ses_count=%d\n", __func__, ses->ses_count); 2020 - cifs_dbg(FYI, 2021 - "%s: ses ipc: %s\n", __func__, ses->tcon_ipc ? ses->tcon_ipc->tree_name : "NONE"); 2022 - 2023 2028 spin_lock(&cifs_tcp_ses_lock); 2024 - if (--ses->ses_count > 0) { 2029 + spin_lock(&ses->ses_lock); 2030 + cifs_dbg(FYI, "%s: id=0x%llx ses_count=%d ses_status=%u ipc=%s\n", 2031 + __func__, ses->Suid, ses->ses_count, ses->ses_status, 2032 + ses->tcon_ipc ? 
ses->tcon_ipc->tree_name : "none"); 2033 + if (ses->ses_status == SES_EXITING || --ses->ses_count > 0) { 2034 + spin_unlock(&ses->ses_lock); 2025 2035 spin_unlock(&cifs_tcp_ses_lock); 2026 2036 return; 2027 2037 } 2028 - spin_lock(&ses->ses_lock); 2029 - if (ses->ses_status == SES_GOOD) 2030 - ses->ses_status = SES_EXITING; 2031 - spin_unlock(&ses->ses_lock); 2032 - spin_unlock(&cifs_tcp_ses_lock); 2033 - 2034 2038 /* ses_count can never go negative */ 2035 2039 WARN_ON(ses->ses_count < 0); 2036 2040 2037 - spin_lock(&ses->ses_lock); 2038 - if (ses->ses_status == SES_EXITING && server->ops->logoff) { 2039 - spin_unlock(&ses->ses_lock); 2040 - cifs_free_ipc(ses); 2041 + spin_lock(&ses->chan_lock); 2042 + cifs_chan_clear_need_reconnect(ses, server); 2043 + spin_unlock(&ses->chan_lock); 2044 + 2045 + do_logoff = ses->ses_status == SES_GOOD && server->ops->logoff; 2046 + ses->ses_status = SES_EXITING; 2047 + tcon = ses->tcon_ipc; 2048 + ses->tcon_ipc = NULL; 2049 + spin_unlock(&ses->ses_lock); 2050 + spin_unlock(&cifs_tcp_ses_lock); 2051 + 2052 + /* 2053 + * On session close, the IPC is closed and the server must release all 2054 + * tcons of the session. No need to send a tree disconnect here. 2055 + * 2056 + * Besides, it will make the server to not close durable and resilient 2057 + * files on session close, as specified in MS-SMB2 3.3.5.6 Receiving an 2058 + * SMB2 LOGOFF Request. 2059 + */ 2060 + tconInfoFree(tcon); 2061 + if (do_logoff) { 2041 2062 xid = get_xid(); 2042 2063 rc = server->ops->logoff(xid, ses); 2043 2064 if (rc) 2044 2065 cifs_server_dbg(VFS, "%s: Session Logoff failure rc=%d\n", 2045 2066 __func__, rc); 2046 2067 _free_xid(xid); 2047 - } else { 2048 - spin_unlock(&ses->ses_lock); 2049 - cifs_free_ipc(ses); 2050 2068 } 2051 2069 2052 2070 spin_lock(&cifs_tcp_ses_lock); ··· 2363 2373 * need to lock before changing something in the session. 2364 2374 */ 2365 2375 spin_lock(&cifs_tcp_ses_lock); 2376 + if (ctx->dfs_root_ses) 2377 + cifs_smb_ses_inc_refcount(ctx->dfs_root_ses); 2366 2378 ses->dfs_root_ses = ctx->dfs_root_ses; 2367 - if (ses->dfs_root_ses) 2368 - ses->dfs_root_ses->ses_count++; 2369 2379 list_add(&ses->smb_ses_list, &server->smb_ses_list); 2370 2380 spin_unlock(&cifs_tcp_ses_lock); 2371 2381 ··· 3316 3326 cifs_put_smb_ses(mnt_ctx->ses); 3317 3327 else if (mnt_ctx->server) 3318 3328 cifs_put_tcp_session(mnt_ctx->server, 0); 3329 + mnt_ctx->ses = NULL; 3330 + mnt_ctx->tcon = NULL; 3331 + mnt_ctx->server = NULL; 3319 3332 mnt_ctx->cifs_sb->mnt_cifs_flags &= ~CIFS_MOUNT_POSIX_PATHS; 3320 3333 free_xid(mnt_ctx->xid); 3321 3334 } ··· 3597 3604 bool isdfs; 3598 3605 int rc; 3599 3606 3600 - INIT_LIST_HEAD(&mnt_ctx.dfs_ses_list); 3601 - 3602 3607 rc = dfs_mount_share(&mnt_ctx, &isdfs); 3603 3608 if (rc) 3604 3609 goto error; ··· 3627 3636 return rc; 3628 3637 3629 3638 error: 3630 - dfs_put_root_smb_sessions(&mnt_ctx.dfs_ses_list); 3631 3639 cifs_mount_put_conns(&mnt_ctx); 3632 3640 return rc; 3633 3641 } ··· 3641 3651 goto error; 3642 3652 3643 3653 rc = cifs_mount_get_tcon(&mnt_ctx); 3654 + if (!rc) { 3655 + /* 3656 + * Prevent superblock from being created with any missing 3657 + * connections. 
3658 + */ 3659 + if (WARN_ON(!mnt_ctx.server)) 3660 + rc = -EHOSTDOWN; 3661 + else if (WARN_ON(!mnt_ctx.ses)) 3662 + rc = -EACCES; 3663 + else if (WARN_ON(!mnt_ctx.tcon)) 3664 + rc = -ENOENT; 3665 + } 3644 3666 if (rc) 3645 3667 goto error; 3646 3668 ··· 3990 3988 } 3991 3989 3992 3990 static struct cifs_tcon * 3993 - cifs_construct_tcon(struct cifs_sb_info *cifs_sb, kuid_t fsuid) 3991 + __cifs_construct_tcon(struct cifs_sb_info *cifs_sb, kuid_t fsuid) 3994 3992 { 3995 3993 int rc; 3996 3994 struct cifs_tcon *master_tcon = cifs_sb_master_tcon(cifs_sb); 3997 3995 struct cifs_ses *ses; 3998 3996 struct cifs_tcon *tcon = NULL; 3999 3997 struct smb3_fs_context *ctx; 3998 + char *origin_fullpath = NULL; 4000 3999 4001 4000 ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); 4002 4001 if (ctx == NULL) ··· 4021 4018 ctx->sign = master_tcon->ses->sign; 4022 4019 ctx->seal = master_tcon->seal; 4023 4020 ctx->witness = master_tcon->use_witness; 4021 + ctx->dfs_root_ses = master_tcon->ses->dfs_root_ses; 4024 4022 4025 4023 rc = cifs_set_vol_auth(ctx, master_tcon->ses); 4026 4024 if (rc) { ··· 4041 4037 goto out; 4042 4038 } 4043 4039 4040 + #ifdef CONFIG_CIFS_DFS_UPCALL 4041 + spin_lock(&master_tcon->tc_lock); 4042 + if (master_tcon->origin_fullpath) { 4043 + spin_unlock(&master_tcon->tc_lock); 4044 + origin_fullpath = dfs_get_path(cifs_sb, cifs_sb->ctx->source); 4045 + if (IS_ERR(origin_fullpath)) { 4046 + tcon = ERR_CAST(origin_fullpath); 4047 + origin_fullpath = NULL; 4048 + cifs_put_smb_ses(ses); 4049 + goto out; 4050 + } 4051 + } else { 4052 + spin_unlock(&master_tcon->tc_lock); 4053 + } 4054 + #endif 4055 + 4044 4056 tcon = cifs_get_tcon(ses, ctx); 4045 4057 if (IS_ERR(tcon)) { 4046 4058 cifs_put_smb_ses(ses); 4047 4059 goto out; 4048 4060 } 4061 + 4062 + #ifdef CONFIG_CIFS_DFS_UPCALL 4063 + if (origin_fullpath) { 4064 + spin_lock(&tcon->tc_lock); 4065 + tcon->origin_fullpath = origin_fullpath; 4066 + spin_unlock(&tcon->tc_lock); 4067 + origin_fullpath = NULL; 4068 + queue_delayed_work(dfscache_wq, &tcon->dfs_cache_work, 4069 + dfs_cache_get_ttl() * HZ); 4070 + } 4071 + #endif 4049 4072 4050 4073 #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY 4051 4074 if (cap_unix(ses)) ··· 4082 4051 out: 4083 4052 kfree(ctx->username); 4084 4053 kfree_sensitive(ctx->password); 4054 + kfree(origin_fullpath); 4085 4055 kfree(ctx); 4086 4056 4087 4057 return tcon; 4058 + } 4059 + 4060 + static struct cifs_tcon * 4061 + cifs_construct_tcon(struct cifs_sb_info *cifs_sb, kuid_t fsuid) 4062 + { 4063 + struct cifs_tcon *ret; 4064 + 4065 + cifs_mount_lock(); 4066 + ret = __cifs_construct_tcon(cifs_sb, fsuid); 4067 + cifs_mount_unlock(); 4068 + return ret; 4088 4069 } 4089 4070 4090 4071 struct cifs_tcon *
+24 -27
fs/smb/client/dfs.c
··· 66 66 } 67 67 68 68 /* 69 - * Track individual DFS referral servers used by new DFS mount. 70 - * 71 - * On success, their lifetime will be shared by final tcon (dfs_ses_list). 72 - * Otherwise, they will be put by dfs_put_root_smb_sessions() in cifs_mount(). 69 + * Get an active reference of @ses so that next call to cifs_put_tcon() won't 70 + * release it as any new DFS referrals must go through its IPC tcon. 73 71 */ 74 - static int add_root_smb_session(struct cifs_mount_ctx *mnt_ctx) 72 + static void add_root_smb_session(struct cifs_mount_ctx *mnt_ctx) 75 73 { 76 74 struct smb3_fs_context *ctx = mnt_ctx->fs_ctx; 77 - struct dfs_root_ses *root_ses; 78 75 struct cifs_ses *ses = mnt_ctx->ses; 79 76 80 77 if (ses) { 81 - root_ses = kmalloc(sizeof(*root_ses), GFP_KERNEL); 82 - if (!root_ses) 83 - return -ENOMEM; 84 - 85 - INIT_LIST_HEAD(&root_ses->list); 86 - 87 78 spin_lock(&cifs_tcp_ses_lock); 88 79 cifs_smb_ses_inc_refcount(ses); 89 80 spin_unlock(&cifs_tcp_ses_lock); 90 - root_ses->ses = ses; 91 - list_add_tail(&root_ses->list, &mnt_ctx->dfs_ses_list); 92 81 } 93 - /* Select new DFS referral server so that new referrals go through it */ 94 82 ctx->dfs_root_ses = ses; 95 - return 0; 96 83 } 97 84 98 85 static inline int parse_dfs_target(struct smb3_fs_context *ctx, ··· 172 185 continue; 173 186 } 174 187 175 - if (is_refsrv) { 176 - rc = add_root_smb_session(mnt_ctx); 177 - if (rc) 178 - goto out; 179 - } 188 + if (is_refsrv) 189 + add_root_smb_session(mnt_ctx); 180 190 181 191 rc = ref_walk_advance(rw); 182 192 if (!rc) { ··· 216 232 struct smb3_fs_context *ctx = mnt_ctx->fs_ctx; 217 233 struct cifs_tcon *tcon; 218 234 char *origin_fullpath; 235 + bool new_tcon = true; 219 236 int rc; 220 237 221 238 origin_fullpath = dfs_get_path(cifs_sb, ctx->source); ··· 224 239 return PTR_ERR(origin_fullpath); 225 240 226 241 rc = dfs_referral_walk(mnt_ctx); 242 + if (!rc) { 243 + /* 244 + * Prevent superblock from being created with any missing 245 + * connections. 246 + */ 247 + if (WARN_ON(!mnt_ctx->server)) 248 + rc = -EHOSTDOWN; 249 + else if (WARN_ON(!mnt_ctx->ses)) 250 + rc = -EACCES; 251 + else if (WARN_ON(!mnt_ctx->tcon)) 252 + rc = -ENOENT; 253 + } 227 254 if (rc) 228 255 goto out; 229 256 ··· 244 247 if (!tcon->origin_fullpath) { 245 248 tcon->origin_fullpath = origin_fullpath; 246 249 origin_fullpath = NULL; 250 + } else { 251 + new_tcon = false; 247 252 } 248 253 spin_unlock(&tcon->tc_lock); 249 254 250 - if (list_empty(&tcon->dfs_ses_list)) { 251 - list_replace_init(&mnt_ctx->dfs_ses_list, &tcon->dfs_ses_list); 255 + if (new_tcon) { 252 256 queue_delayed_work(dfscache_wq, &tcon->dfs_cache_work, 253 257 dfs_cache_get_ttl() * HZ); 254 - } else { 255 - dfs_put_root_smb_sessions(&mnt_ctx->dfs_ses_list); 256 258 } 257 259 258 260 out: ··· 294 298 if (rc) 295 299 return rc; 296 300 297 - ctx->dfs_root_ses = mnt_ctx->ses; 298 301 /* 299 302 * If called with 'nodfs' mount option, then skip DFS resolving. Otherwise unconditionally 300 303 * try to get an DFS referral (even cached) to determine whether it is an DFS mount. ··· 319 324 320 325 *isdfs = true; 321 326 add_root_smb_session(mnt_ctx); 322 - return __dfs_mount_share(mnt_ctx); 327 + rc = __dfs_mount_share(mnt_ctx); 328 + dfs_put_root_smb_sessions(mnt_ctx); 329 + return rc; 323 330 } 324 331 325 332 /* Update dfs referral path of superblock */
+21 -12
fs/smb/client/dfs.h
··· 7 7 #define _CIFS_DFS_H 8 8 9 9 #include "cifsglob.h" 10 + #include "cifsproto.h" 10 11 #include "fs_context.h" 12 + #include "dfs_cache.h" 11 13 #include "cifs_unicode.h" 12 14 #include <linux/namei.h> 13 15 ··· 116 114 ref_walk_tit(rw)); 117 115 } 118 116 119 - struct dfs_root_ses { 120 - struct list_head list; 121 - struct cifs_ses *ses; 122 - }; 123 - 124 117 int dfs_parse_target_referral(const char *full_path, const struct dfs_info3_param *ref, 125 118 struct smb3_fs_context *ctx); 126 119 int dfs_mount_share(struct cifs_mount_ctx *mnt_ctx, bool *isdfs); ··· 130 133 { 131 134 struct smb3_fs_context *ctx = mnt_ctx->fs_ctx; 132 135 struct cifs_sb_info *cifs_sb = mnt_ctx->cifs_sb; 136 + struct cifs_ses *rses = ctx->dfs_root_ses ?: mnt_ctx->ses; 133 137 134 - return dfs_cache_find(mnt_ctx->xid, ctx->dfs_root_ses, cifs_sb->local_nls, 138 + return dfs_cache_find(mnt_ctx->xid, rses, cifs_sb->local_nls, 135 139 cifs_remap(cifs_sb), path, ref, tl); 136 140 } 137 141 138 - static inline void dfs_put_root_smb_sessions(struct list_head *head) 142 + /* 143 + * cifs_get_smb_ses() already guarantees an active reference of 144 + * @ses->dfs_root_ses when a new session is created, so we need to put extra 145 + * references of all DFS root sessions that were used across the mount process 146 + * in dfs_mount_share(). 147 + */ 148 + static inline void dfs_put_root_smb_sessions(struct cifs_mount_ctx *mnt_ctx) 139 149 { 140 - struct dfs_root_ses *root, *tmp; 150 + const struct smb3_fs_context *ctx = mnt_ctx->fs_ctx; 151 + struct cifs_ses *ses = ctx->dfs_root_ses; 152 + struct cifs_ses *cur; 141 153 142 - list_for_each_entry_safe(root, tmp, head, list) { 143 - list_del_init(&root->list); 144 - cifs_put_smb_ses(root->ses); 145 - kfree(root); 154 + if (!ses) 155 + return; 156 + 157 + for (cur = ses; cur; cur = cur->dfs_root_ses) { 158 + if (cur->dfs_root_ses) 159 + cifs_put_smb_ses(cur->dfs_root_ses); 146 160 } 161 + cifs_put_smb_ses(ses); 147 162 } 148 163 149 164 #endif /* _CIFS_DFS_H */
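The hunks above change DFS root-session lifetime handling to a simple parent chain: taking a reference on a session also pins every ancestor reachable through dfs_root_ses, and the final put walks that same chain. A minimal stand-alone sketch of the pattern follows; the names (struct ses, get_ses_chain, put_ses_chain) are illustrative placeholders, not the CIFS API.

	#include <stdio.h>
	#include <stdlib.h>

	struct ses {
		int refcount;
		struct ses *dfs_root_ses;	/* parent referral server, or NULL */
		const char *name;
	};

	/* Taking a reference pins the whole parent chain. */
	static void get_ses_chain(struct ses *s)
	{
		for (; s; s = s->dfs_root_ses)
			s->refcount++;
	}

	static void put_one(struct ses *s)
	{
		if (--s->refcount == 0) {
			printf("freeing %s\n", s->name);
			free(s);
		}
	}

	/* Dropping a reference drops one reference per level of the chain. */
	static void put_ses_chain(struct ses *s)
	{
		struct ses *next;

		do {
			next = s->dfs_root_ses;	/* grab parent before s may be freed */
			put_one(s);
		} while ((s = next));
	}

	static struct ses *new_ses(const char *name, struct ses *root)
	{
		struct ses *s = calloc(1, sizeof(*s));

		s->name = name;
		s->dfs_root_ses = root;
		s->refcount = 1;
		if (root)
			get_ses_chain(root);	/* child pins the whole root chain */
		return s;
	}

	int main(void)
	{
		struct ses *root = new_ses("root", NULL);
		struct ses *leaf = new_ses("leaf", root);

		put_ses_chain(root);	/* mount code drops its own root reference */
		put_ses_chain(leaf);	/* final put frees leaf, then root */
		return 0;
	}

The detail mirrored from the patch is that the parent pointer is saved before each put, since the current node may already be freed by the time the loop advances.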
+24 -29
fs/smb/client/dfs_cache.c
··· 1172 1172 return ret; 1173 1173 } 1174 1174 1175 - /* Refresh dfs referral of tcon and mark it for reconnect if needed */ 1176 - static int __refresh_tcon(const char *path, struct cifs_ses *ses, bool force_refresh) 1175 + /* Refresh dfs referral of @ses and mark it for reconnect if needed */ 1176 + static void __refresh_ses_referral(struct cifs_ses *ses, bool force_refresh) 1177 1177 { 1178 1178 struct TCP_Server_Info *server = ses->server; 1179 1179 DFS_CACHE_TGT_LIST(old_tl); ··· 1181 1181 bool needs_refresh = false; 1182 1182 struct cache_entry *ce; 1183 1183 unsigned int xid; 1184 + char *path = NULL; 1184 1185 int rc = 0; 1185 1186 1186 1187 xid = get_xid(); 1188 + 1189 + mutex_lock(&server->refpath_lock); 1190 + if (server->leaf_fullpath) { 1191 + path = kstrdup(server->leaf_fullpath + 1, GFP_ATOMIC); 1192 + if (!path) 1193 + rc = -ENOMEM; 1194 + } 1195 + mutex_unlock(&server->refpath_lock); 1196 + if (!path) 1197 + goto out; 1187 1198 1188 1199 down_read(&htable_rw_lock); 1189 1200 ce = lookup_cache_entry(path); ··· 1229 1218 free_xid(xid); 1230 1219 dfs_cache_free_tgts(&old_tl); 1231 1220 dfs_cache_free_tgts(&new_tl); 1232 - return rc; 1221 + kfree(path); 1233 1222 } 1234 1223 1235 - static int refresh_tcon(struct cifs_tcon *tcon, bool force_refresh) 1224 + static inline void refresh_ses_referral(struct cifs_ses *ses) 1236 1225 { 1237 - struct TCP_Server_Info *server = tcon->ses->server; 1238 - struct cifs_ses *ses = tcon->ses; 1226 + __refresh_ses_referral(ses, false); 1227 + } 1239 1228 1240 - mutex_lock(&server->refpath_lock); 1241 - if (server->leaf_fullpath) 1242 - __refresh_tcon(server->leaf_fullpath + 1, ses, force_refresh); 1243 - mutex_unlock(&server->refpath_lock); 1244 - return 0; 1229 + static inline void force_refresh_ses_referral(struct cifs_ses *ses) 1230 + { 1231 + __refresh_ses_referral(ses, true); 1245 1232 } 1246 1233 1247 1234 /** ··· 1280 1271 */ 1281 1272 cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_USE_PREFIX_PATH; 1282 1273 1283 - return refresh_tcon(tcon, true); 1274 + force_refresh_ses_referral(tcon->ses); 1275 + return 0; 1284 1276 } 1285 1277 1286 1278 /* Refresh all DFS referrals related to DFS tcon */ 1287 1279 void dfs_cache_refresh(struct work_struct *work) 1288 1280 { 1289 - struct TCP_Server_Info *server; 1290 - struct dfs_root_ses *rses; 1291 1281 struct cifs_tcon *tcon; 1292 1282 struct cifs_ses *ses; 1293 1283 1294 1284 tcon = container_of(work, struct cifs_tcon, dfs_cache_work.work); 1295 - ses = tcon->ses; 1296 - server = ses->server; 1297 1285 1298 - mutex_lock(&server->refpath_lock); 1299 - if (server->leaf_fullpath) 1300 - __refresh_tcon(server->leaf_fullpath + 1, ses, false); 1301 - mutex_unlock(&server->refpath_lock); 1302 - 1303 - list_for_each_entry(rses, &tcon->dfs_ses_list, list) { 1304 - ses = rses->ses; 1305 - server = ses->server; 1306 - mutex_lock(&server->refpath_lock); 1307 - if (server->leaf_fullpath) 1308 - __refresh_tcon(server->leaf_fullpath + 1, ses, false); 1309 - mutex_unlock(&server->refpath_lock); 1310 - } 1286 + for (ses = tcon->ses; ses; ses = ses->dfs_root_ses) 1287 + refresh_ses_referral(ses); 1311 1288 1312 1289 queue_delayed_work(dfscache_wq, &tcon->dfs_cache_work, 1313 1290 atomic_read(&dfs_cache_ttl) * HZ);
+15
fs/smb/client/dir.c
··· 189 189 int disposition; 190 190 struct TCP_Server_Info *server = tcon->ses->server; 191 191 struct cifs_open_parms oparms; 192 + int rdwr_for_fscache = 0; 192 193 193 194 *oplock = 0; 194 195 if (tcon->ses->server->oplocks) ··· 200 199 free_dentry_path(page); 201 200 return PTR_ERR(full_path); 202 201 } 202 + 203 + /* If we're caching, we need to be able to fill in around partial writes. */ 204 + if (cifs_fscache_enabled(inode) && (oflags & O_ACCMODE) == O_WRONLY) 205 + rdwr_for_fscache = 1; 203 206 204 207 #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY 205 208 if (tcon->unix_ext && cap_unix(tcon->ses) && !tcon->broken_posix_open && ··· 281 276 desired_access |= GENERIC_READ; /* is this too little? */ 282 277 if (OPEN_FMODE(oflags) & FMODE_WRITE) 283 278 desired_access |= GENERIC_WRITE; 279 + if (rdwr_for_fscache == 1) 280 + desired_access |= GENERIC_READ; 284 281 285 282 disposition = FILE_OVERWRITE_IF; 286 283 if ((oflags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL)) ··· 311 304 if (!tcon->unix_ext && (mode & S_IWUGO) == 0) 312 305 create_options |= CREATE_OPTION_READONLY; 313 306 307 + retry_open: 314 308 oparms = (struct cifs_open_parms) { 315 309 .tcon = tcon, 316 310 .cifs_sb = cifs_sb, ··· 325 317 rc = server->ops->open(xid, &oparms, oplock, buf); 326 318 if (rc) { 327 319 cifs_dbg(FYI, "cifs_create returned 0x%x\n", rc); 320 + if (rc == -EACCES && rdwr_for_fscache == 1) { 321 + desired_access &= ~GENERIC_READ; 322 + rdwr_for_fscache = 2; 323 + goto retry_open; 324 + } 328 325 goto out; 329 326 } 327 + if (rdwr_for_fscache == 2) 328 + cifs_invalidate_cache(inode, FSCACHE_INVAL_DIO_WRITE); 330 329 331 330 #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY 332 331 /*
+95 -16
fs/smb/client/file.c
··· 206 206 */ 207 207 } 208 208 209 - static inline int cifs_convert_flags(unsigned int flags) 209 + static inline int cifs_convert_flags(unsigned int flags, int rdwr_for_fscache) 210 210 { 211 211 if ((flags & O_ACCMODE) == O_RDONLY) 212 212 return GENERIC_READ; 213 213 else if ((flags & O_ACCMODE) == O_WRONLY) 214 - return GENERIC_WRITE; 214 + return rdwr_for_fscache == 1 ? (GENERIC_READ | GENERIC_WRITE) : GENERIC_WRITE; 215 215 else if ((flags & O_ACCMODE) == O_RDWR) { 216 216 /* GENERIC_ALL is too much permission to request 217 217 can cause unnecessary access denied on create */ ··· 348 348 int create_options = CREATE_NOT_DIR; 349 349 struct TCP_Server_Info *server = tcon->ses->server; 350 350 struct cifs_open_parms oparms; 351 + int rdwr_for_fscache = 0; 351 352 352 353 if (!server->ops->open) 353 354 return -ENOSYS; 354 355 355 - desired_access = cifs_convert_flags(f_flags); 356 + /* If we're caching, we need to be able to fill in around partial writes. */ 357 + if (cifs_fscache_enabled(inode) && (f_flags & O_ACCMODE) == O_WRONLY) 358 + rdwr_for_fscache = 1; 359 + 360 + desired_access = cifs_convert_flags(f_flags, rdwr_for_fscache); 356 361 357 362 /********************************************************************* 358 363 * open flag mapping table: ··· 394 389 if (f_flags & O_DIRECT) 395 390 create_options |= CREATE_NO_BUFFER; 396 391 392 + retry_open: 397 393 oparms = (struct cifs_open_parms) { 398 394 .tcon = tcon, 399 395 .cifs_sb = cifs_sb, ··· 406 400 }; 407 401 408 402 rc = server->ops->open(xid, &oparms, oplock, buf); 409 - if (rc) 403 + if (rc) { 404 + if (rc == -EACCES && rdwr_for_fscache == 1) { 405 + desired_access = cifs_convert_flags(f_flags, 0); 406 + rdwr_for_fscache = 2; 407 + goto retry_open; 408 + } 410 409 return rc; 410 + } 411 + if (rdwr_for_fscache == 2) 412 + cifs_invalidate_cache(inode, FSCACHE_INVAL_DIO_WRITE); 411 413 412 414 /* TODO: Add support for calling posix query info but with passing in fid */ 413 415 if (tcon->unix_ext) ··· 459 445 } 460 446 461 447 static void cifsFileInfo_put_work(struct work_struct *work); 448 + void serverclose_work(struct work_struct *work); 462 449 463 450 struct cifsFileInfo *cifs_new_fileinfo(struct cifs_fid *fid, struct file *file, 464 451 struct tcon_link *tlink, __u32 oplock, ··· 506 491 cfile->tlink = cifs_get_tlink(tlink); 507 492 INIT_WORK(&cfile->oplock_break, cifs_oplock_break); 508 493 INIT_WORK(&cfile->put, cifsFileInfo_put_work); 494 + INIT_WORK(&cfile->serverclose, serverclose_work); 509 495 INIT_DELAYED_WORK(&cfile->deferred, smb2_deferred_work_close); 510 496 mutex_init(&cfile->fh_mutex); 511 497 spin_lock_init(&cfile->file_info_lock); ··· 598 582 cifsFileInfo_put_final(cifs_file); 599 583 } 600 584 585 + void serverclose_work(struct work_struct *work) 586 + { 587 + struct cifsFileInfo *cifs_file = container_of(work, 588 + struct cifsFileInfo, serverclose); 589 + 590 + struct cifs_tcon *tcon = tlink_tcon(cifs_file->tlink); 591 + 592 + struct TCP_Server_Info *server = tcon->ses->server; 593 + int rc = 0; 594 + int retries = 0; 595 + int MAX_RETRIES = 4; 596 + 597 + do { 598 + if (server->ops->close_getattr) 599 + rc = server->ops->close_getattr(0, tcon, cifs_file); 600 + else if (server->ops->close) 601 + rc = server->ops->close(0, tcon, &cifs_file->fid); 602 + 603 + if (rc == -EBUSY || rc == -EAGAIN) { 604 + retries++; 605 + msleep(250); 606 + } 607 + } while ((rc == -EBUSY || rc == -EAGAIN) && (retries < MAX_RETRIES) 608 + ); 609 + 610 + if (retries == MAX_RETRIES) 611 + pr_warn("Serverclose failed %d 
times, giving up\n", MAX_RETRIES); 612 + 613 + if (cifs_file->offload) 614 + queue_work(fileinfo_put_wq, &cifs_file->put); 615 + else 616 + cifsFileInfo_put_final(cifs_file); 617 + } 618 + 601 619 /** 602 620 * cifsFileInfo_put - release a reference of file priv data 603 621 * ··· 672 622 struct cifs_fid fid = {}; 673 623 struct cifs_pending_open open; 674 624 bool oplock_break_cancelled; 625 + bool serverclose_offloaded = false; 675 626 676 627 spin_lock(&tcon->open_file_lock); 677 628 spin_lock(&cifsi->open_file_lock); 678 629 spin_lock(&cifs_file->file_info_lock); 630 + 631 + cifs_file->offload = offload; 679 632 if (--cifs_file->count > 0) { 680 633 spin_unlock(&cifs_file->file_info_lock); 681 634 spin_unlock(&cifsi->open_file_lock); ··· 720 667 if (!tcon->need_reconnect && !cifs_file->invalidHandle) { 721 668 struct TCP_Server_Info *server = tcon->ses->server; 722 669 unsigned int xid; 670 + int rc = 0; 723 671 724 672 xid = get_xid(); 725 673 if (server->ops->close_getattr) 726 - server->ops->close_getattr(xid, tcon, cifs_file); 674 + rc = server->ops->close_getattr(xid, tcon, cifs_file); 727 675 else if (server->ops->close) 728 - server->ops->close(xid, tcon, &cifs_file->fid); 676 + rc = server->ops->close(xid, tcon, &cifs_file->fid); 729 677 _free_xid(xid); 678 + 679 + if (rc == -EBUSY || rc == -EAGAIN) { 680 + // Server close failed, hence offloading it as an async op 681 + queue_work(serverclose_wq, &cifs_file->serverclose); 682 + serverclose_offloaded = true; 683 + } 730 684 } 731 685 732 686 if (oplock_break_cancelled) ··· 741 681 742 682 cifs_del_pending_open(&open); 743 683 744 - if (offload) 745 - queue_work(fileinfo_put_wq, &cifs_file->put); 746 - else 747 - cifsFileInfo_put_final(cifs_file); 684 + // if serverclose has been offloaded to wq (on failure), it will 685 + // handle offloading put as well. If serverclose not offloaded, 686 + // we need to handle offloading put here. 687 + if (!serverclose_offloaded) { 688 + if (offload) 689 + queue_work(fileinfo_put_wq, &cifs_file->put); 690 + else 691 + cifsFileInfo_put_final(cifs_file); 692 + } 748 693 } 749 694 750 695 int cifs_open(struct inode *inode, struct file *file) ··· 899 834 use_cache: 900 835 fscache_use_cookie(cifs_inode_cookie(file_inode(file)), 901 836 file->f_mode & FMODE_WRITE); 902 - if (file->f_flags & O_DIRECT && 903 - (!((file->f_flags & O_ACCMODE) != O_RDONLY) || 904 - file->f_flags & O_APPEND)) 905 - cifs_invalidate_cache(file_inode(file), 906 - FSCACHE_INVAL_DIO_WRITE); 837 + if (!(file->f_flags & O_DIRECT)) 838 + goto out; 839 + if ((file->f_flags & (O_ACCMODE | O_APPEND)) == O_RDONLY) 840 + goto out; 841 + cifs_invalidate_cache(file_inode(file), FSCACHE_INVAL_DIO_WRITE); 907 842 908 843 out: 909 844 free_dentry_path(page); ··· 968 903 int disposition = FILE_OPEN; 969 904 int create_options = CREATE_NOT_DIR; 970 905 struct cifs_open_parms oparms; 906 + int rdwr_for_fscache = 0; 971 907 972 908 xid = get_xid(); 973 909 mutex_lock(&cfile->fh_mutex); ··· 1032 966 } 1033 967 #endif /* CONFIG_CIFS_ALLOW_INSECURE_LEGACY */ 1034 968 1035 - desired_access = cifs_convert_flags(cfile->f_flags); 969 + /* If we're caching, we need to be able to fill in around partial writes. 
*/ 970 + if (cifs_fscache_enabled(inode) && (cfile->f_flags & O_ACCMODE) == O_WRONLY) 971 + rdwr_for_fscache = 1; 972 + 973 + desired_access = cifs_convert_flags(cfile->f_flags, rdwr_for_fscache); 1036 974 1037 975 /* O_SYNC also has bit for O_DSYNC so following check picks up either */ 1038 976 if (cfile->f_flags & O_SYNC) ··· 1048 978 if (server->ops->get_lease_key) 1049 979 server->ops->get_lease_key(inode, &cfile->fid); 1050 980 981 + retry_open: 1051 982 oparms = (struct cifs_open_parms) { 1052 983 .tcon = tcon, 1053 984 .cifs_sb = cifs_sb, ··· 1074 1003 /* indicate that we need to relock the file */ 1075 1004 oparms.reconnect = true; 1076 1005 } 1006 + if (rc == -EACCES && rdwr_for_fscache == 1) { 1007 + desired_access = cifs_convert_flags(cfile->f_flags, 0); 1008 + rdwr_for_fscache = 2; 1009 + goto retry_open; 1010 + } 1077 1011 1078 1012 if (rc) { 1079 1013 mutex_unlock(&cfile->fh_mutex); ··· 1086 1010 cifs_dbg(FYI, "oplock: %d\n", oplock); 1087 1011 goto reopen_error_exit; 1088 1012 } 1013 + 1014 + if (rdwr_for_fscache == 2) 1015 + cifs_invalidate_cache(inode, FSCACHE_INVAL_DIO_WRITE); 1089 1016 1090 1017 #ifdef CONFIG_CIFS_ALLOW_INSECURE_LEGACY 1091 1018 reopen_success:
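The open paths above all follow the same retry shape: when fscache is active and the caller asked for a write-only handle, read access is requested as well so cached data can be filled in around partial writes; if the server rejects that with -EACCES, the open is retried without the extra read access and the local cache is invalidated instead. A stand-alone sketch of that flow is below; the helpers are mock-ups, not the CIFS functions.

	#include <errno.h>
	#include <stdbool.h>
	#include <stdio.h>

	#define ACC_READ  0x1
	#define ACC_WRITE 0x2

	/* Pretend the server denies read access on this file. */
	static int server_open(int access)
	{
		return (access & ACC_READ) ? -EACCES : 0;
	}

	static int open_for_write(bool cache_enabled)
	{
		int rdwr_for_fscache = cache_enabled ? 1 : 0;
		int access = ACC_WRITE | (rdwr_for_fscache == 1 ? ACC_READ : 0);
		int rc;

	retry_open:
		rc = server_open(access);
		if (rc == -EACCES && rdwr_for_fscache == 1) {
			access = ACC_WRITE;	/* drop the extra read access */
			rdwr_for_fscache = 2;
			goto retry_open;
		}
		if (rc)
			return rc;
		if (rdwr_for_fscache == 2)
			printf("cache invalidated: write-only handle cannot fill reads\n");
		return 0;
	}

	int main(void)
	{
		return open_for_write(true);
	}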
+3 -3
fs/smb/client/fs_context.c
··· 37 37 #include "rfc1002pdu.h" 38 38 #include "fs_context.h" 39 39 40 - static DEFINE_MUTEX(cifs_mount_mutex); 40 + DEFINE_MUTEX(cifs_mount_mutex); 41 41 42 42 static const match_table_t cifs_smb_version_tokens = { 43 43 { Smb_1, SMB1_VERSION_STRING }, ··· 783 783 784 784 if (err) 785 785 return err; 786 - mutex_lock(&cifs_mount_mutex); 786 + cifs_mount_lock(); 787 787 ret = smb3_get_tree_common(fc); 788 - mutex_unlock(&cifs_mount_mutex); 788 + cifs_mount_unlock(); 789 789 return ret; 790 790 } 791 791
+12
fs/smb/client/fs_context.h
··· 304 304 #define MAX_CACHED_FIDS 16 305 305 extern char *cifs_sanitize_prepath(char *prepath, gfp_t gfp); 306 306 307 + extern struct mutex cifs_mount_mutex; 308 + 309 + static inline void cifs_mount_lock(void) 310 + { 311 + mutex_lock(&cifs_mount_mutex); 312 + } 313 + 314 + static inline void cifs_mount_unlock(void) 315 + { 316 + mutex_unlock(&cifs_mount_mutex); 317 + } 318 + 307 319 #endif
+6
fs/smb/client/fscache.h
··· 109 109 __cifs_readahead_to_fscache(inode, pos, len); 110 110 } 111 111 112 + static inline bool cifs_fscache_enabled(struct inode *inode) 113 + { 114 + return fscache_cookie_enabled(cifs_inode_cookie(inode)); 115 + } 116 + 112 117 #else /* CONFIG_CIFS_FSCACHE */ 113 118 static inline 114 119 void cifs_fscache_fill_coherency(struct inode *inode, ··· 129 124 static inline void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool update) {} 130 125 static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode) { return NULL; } 131 126 static inline void cifs_invalidate_cache(struct inode *inode, unsigned int flags) {} 127 + static inline bool cifs_fscache_enabled(struct inode *inode) { return false; } 132 128 133 129 static inline int cifs_fscache_query_occupancy(struct inode *inode, 134 130 pgoff_t first, unsigned int nr_pages,
+5 -1
fs/smb/client/ioctl.c
··· 247 247 spin_lock(&cifs_tcp_ses_lock); 248 248 list_for_each_entry(server_it, &cifs_tcp_ses_list, tcp_ses_list) { 249 249 list_for_each_entry(ses_it, &server_it->smb_ses_list, smb_ses_list) { 250 - if (ses_it->Suid == out.session_id) { 250 + spin_lock(&ses_it->ses_lock); 251 + if (ses_it->ses_status != SES_EXITING && 252 + ses_it->Suid == out.session_id) { 251 253 ses = ses_it; 252 254 /* 253 255 * since we are using the session outside the crit ··· 257 255 * so increment its refcount 258 256 */ 259 257 cifs_smb_ses_inc_refcount(ses); 258 + spin_unlock(&ses_it->ses_lock); 260 259 found = true; 261 260 goto search_end; 262 261 } 262 + spin_unlock(&ses_it->ses_lock); 263 263 } 264 264 } 265 265 search_end:
+2 -6
fs/smb/client/misc.c
··· 138 138 atomic_set(&ret_buf->num_local_opens, 0); 139 139 atomic_set(&ret_buf->num_remote_opens, 0); 140 140 ret_buf->stats_from_time = ktime_get_real_seconds(); 141 - #ifdef CONFIG_CIFS_DFS_UPCALL 142 - INIT_LIST_HEAD(&ret_buf->dfs_ses_list); 143 - #endif 144 141 145 142 return ret_buf; 146 143 } ··· 153 156 atomic_dec(&tconInfoAllocCount); 154 157 kfree(tcon->nativeFileSystem); 155 158 kfree_sensitive(tcon->password); 156 - #ifdef CONFIG_CIFS_DFS_UPCALL 157 - dfs_put_root_smb_sessions(&tcon->dfs_ses_list); 158 - #endif 159 159 kfree(tcon->origin_fullpath); 160 160 kfree(tcon); 161 161 } ··· 481 487 /* look up tcon based on tid & uid */ 482 488 spin_lock(&cifs_tcp_ses_lock); 483 489 list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) { 490 + if (cifs_ses_exiting(ses)) 491 + continue; 484 492 list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { 485 493 if (tcon->tid != buf->Tid) 486 494 continue;
+2 -2
fs/smb/client/smb1ops.c
··· 753 753 cinode->can_cache_brlcks = CIFS_CACHE_WRITE(cinode); 754 754 } 755 755 756 - static void 756 + static int 757 757 cifs_close_file(const unsigned int xid, struct cifs_tcon *tcon, 758 758 struct cifs_fid *fid) 759 759 { 760 - CIFSSMBClose(xid, tcon, fid->netfid); 760 + return CIFSSMBClose(xid, tcon, fid->netfid); 761 761 } 762 762 763 763 static int
+4
fs/smb/client/smb2misc.c
··· 622 622 /* look up tcon based on tid & uid */ 623 623 spin_lock(&cifs_tcp_ses_lock); 624 624 list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) { 625 + if (cifs_ses_exiting(ses)) 626 + continue; 625 627 list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { 626 628 spin_lock(&tcon->open_file_lock); 627 629 cifs_stats_inc( ··· 699 697 /* look up tcon based on tid & uid */ 700 698 spin_lock(&cifs_tcp_ses_lock); 701 699 list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) { 700 + if (cifs_ses_exiting(ses)) 701 + continue; 702 702 list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { 703 703 704 704 spin_lock(&tcon->open_file_lock);
+8 -5
fs/smb/client/smb2ops.c
··· 1412 1412 memcpy(cfile->fid.create_guid, fid->create_guid, 16); 1413 1413 } 1414 1414 1415 - static void 1415 + static int 1416 1416 smb2_close_file(const unsigned int xid, struct cifs_tcon *tcon, 1417 1417 struct cifs_fid *fid) 1418 1418 { 1419 - SMB2_close(xid, tcon, fid->persistent_fid, fid->volatile_fid); 1419 + return SMB2_close(xid, tcon, fid->persistent_fid, fid->volatile_fid); 1420 1420 } 1421 1421 1422 - static void 1422 + static int 1423 1423 smb2_close_getattr(const unsigned int xid, struct cifs_tcon *tcon, 1424 1424 struct cifsFileInfo *cfile) 1425 1425 { ··· 1430 1430 rc = __SMB2_close(xid, tcon, cfile->fid.persistent_fid, 1431 1431 cfile->fid.volatile_fid, &file_inf); 1432 1432 if (rc) 1433 - return; 1433 + return rc; 1434 1434 1435 1435 inode = d_inode(cfile->dentry); 1436 1436 ··· 1459 1459 1460 1460 /* End of file and Attributes should not have to be updated on close */ 1461 1461 spin_unlock(&inode->i_lock); 1462 + return rc; 1462 1463 } 1463 1464 1464 1465 static int ··· 2481 2480 2482 2481 spin_lock(&cifs_tcp_ses_lock); 2483 2482 list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) { 2483 + if (cifs_ses_exiting(ses)) 2484 + continue; 2484 2485 list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { 2485 2486 if (tcon->tid == le32_to_cpu(shdr->Id.SyncId.TreeId)) { 2486 2487 spin_lock(&tcon->tc_lock); ··· 3916 3913 strcat(message, "W"); 3917 3914 } 3918 3915 if (!new_oplock) 3919 - strncpy(message, "None", sizeof(message)); 3916 + strscpy(message, "None"); 3920 3917 3921 3918 cinode->oplock = new_oplock; 3922 3919 cifs_dbg(FYI, "%s Lease granted on inode %p\n", message,
+1 -1
fs/smb/client/smb2pdu.c
··· 3628 3628 memcpy(&pbuf->network_open_info, 3629 3629 &rsp->network_open_info, 3630 3630 sizeof(pbuf->network_open_info)); 3631 + atomic_dec(&tcon->num_remote_opens); 3631 3632 } 3632 3633 3633 - atomic_dec(&tcon->num_remote_opens); 3634 3634 close_exit: 3635 3635 SMB2_close_free(&rqst); 3636 3636 free_rsp_buf(resp_buftype, rsp);
+1 -1
fs/smb/client/smb2transport.c
··· 659 659 } 660 660 spin_unlock(&server->srv_lock); 661 661 if (!is_binding && !server->session_estab) { 662 - strncpy(shdr->Signature, "BSRSPYL", 8); 662 + strscpy(shdr->Signature, "BSRSPYL"); 663 663 return 0; 664 664 } 665 665
+2 -1
fs/smb/server/ksmbd_netlink.h
··· 167 167 __u16 force_uid; 168 168 __u16 force_gid; 169 169 __s8 share_name[KSMBD_REQ_MAX_SHARE_NAME]; 170 - __u32 reserved[112]; /* Reserved room */ 170 + __u32 reserved[111]; /* Reserved room */ 171 + __u32 payload_sz; 171 172 __u32 veto_list_sz; 172 173 __s8 ____payload[]; 173 174 };
+6 -1
fs/smb/server/mgmt/share_config.c
··· 158 158 share->name = kstrdup(name, GFP_KERNEL); 159 159 160 160 if (!test_share_config_flag(share, KSMBD_SHARE_FLAG_PIPE)) { 161 - share->path = kstrdup(ksmbd_share_config_path(resp), 161 + int path_len = PATH_MAX; 162 + 163 + if (resp->payload_sz) 164 + path_len = resp->payload_sz - resp->veto_list_sz; 165 + 166 + share->path = kstrndup(ksmbd_share_config_path(resp), path_len, 162 167 GFP_KERNEL); 163 168 if (share->path) 164 169 share->path_sz = strlen(share->path);
+5 -5
fs/smb/server/smb2ops.c
··· 228 228 conn->cli_cap & SMB2_GLOBAL_CAP_ENCRYPTION) 229 229 conn->vals->capabilities |= SMB2_GLOBAL_CAP_ENCRYPTION; 230 230 231 + if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_ENCRYPTION || 232 + (!(server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_ENCRYPTION_OFF) && 233 + conn->cli_cap & SMB2_GLOBAL_CAP_ENCRYPTION)) 234 + conn->vals->capabilities |= SMB2_GLOBAL_CAP_ENCRYPTION; 235 + 231 236 if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB3_MULTICHANNEL) 232 237 conn->vals->capabilities |= SMB2_GLOBAL_CAP_MULTI_CHANNEL; 233 238 } ··· 282 277 if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_LEASES) 283 278 conn->vals->capabilities |= SMB2_GLOBAL_CAP_LEASING | 284 279 SMB2_GLOBAL_CAP_DIRECTORY_LEASING; 285 - 286 - if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_ENCRYPTION || 287 - (!(server_conf.flags & KSMBD_GLOBAL_FLAG_SMB2_ENCRYPTION_OFF) && 288 - conn->cli_cap & SMB2_GLOBAL_CAP_ENCRYPTION)) 289 - conn->vals->capabilities |= SMB2_GLOBAL_CAP_ENCRYPTION; 290 280 291 281 if (server_conf.flags & KSMBD_GLOBAL_FLAG_SMB3_MULTICHANNEL) 292 282 conn->vals->capabilities |= SMB2_GLOBAL_CAP_MULTI_CHANNEL;
+2 -1
fs/smb/server/smb2pdu.c
··· 5857 5857 if (!file_info->ReplaceIfExists) 5858 5858 flags = RENAME_NOREPLACE; 5859 5859 5860 - smb_break_all_levII_oplock(work, fp, 0); 5861 5860 rc = ksmbd_vfs_rename(work, &fp->filp->f_path, new_name, flags); 5861 + if (!rc) 5862 + smb_break_all_levII_oplock(work, fp, 0); 5862 5863 out: 5863 5864 kfree(new_name); 5864 5865 return rc;
+37
fs/smb/server/transport_ipc.c
··· 65 65 struct hlist_node ipc_table_hlist; 66 66 67 67 void *response; 68 + unsigned int msg_sz; 68 69 }; 69 70 70 71 static struct delayed_work ipc_timer_work; ··· 276 275 } 277 276 278 277 memcpy(entry->response, payload, sz); 278 + entry->msg_sz = sz; 279 279 wake_up_interruptible(&entry->wait); 280 280 ret = 0; 281 281 break; ··· 455 453 return ret; 456 454 } 457 455 456 + static int ipc_validate_msg(struct ipc_msg_table_entry *entry) 457 + { 458 + unsigned int msg_sz = entry->msg_sz; 459 + 460 + if (entry->type == KSMBD_EVENT_RPC_REQUEST) { 461 + struct ksmbd_rpc_command *resp = entry->response; 462 + 463 + msg_sz = sizeof(struct ksmbd_rpc_command) + resp->payload_sz; 464 + } else if (entry->type == KSMBD_EVENT_SPNEGO_AUTHEN_REQUEST) { 465 + struct ksmbd_spnego_authen_response *resp = entry->response; 466 + 467 + msg_sz = sizeof(struct ksmbd_spnego_authen_response) + 468 + resp->session_key_len + resp->spnego_blob_len; 469 + } else if (entry->type == KSMBD_EVENT_SHARE_CONFIG_REQUEST) { 470 + struct ksmbd_share_config_response *resp = entry->response; 471 + 472 + if (resp->payload_sz) { 473 + if (resp->payload_sz < resp->veto_list_sz) 474 + return -EINVAL; 475 + 476 + msg_sz = sizeof(struct ksmbd_share_config_response) + 477 + resp->payload_sz; 478 + } 479 + } 480 + 481 + return entry->msg_sz != msg_sz ? -EINVAL : 0; 482 + } 483 + 458 484 static void *ipc_msg_send_request(struct ksmbd_ipc_msg *msg, unsigned int handle) 459 485 { 460 486 struct ipc_msg_table_entry entry; ··· 507 477 ret = wait_event_interruptible_timeout(entry.wait, 508 478 entry.response != NULL, 509 479 IPC_WAIT_TIMEOUT); 480 + if (entry.response) { 481 + ret = ipc_validate_msg(&entry); 482 + if (ret) { 483 + kvfree(entry.response); 484 + entry.response = NULL; 485 + } 486 + } 510 487 out: 511 488 down_write(&ipc_msg_table_lock); 512 489 hash_del(&entry.ipc_table_hlist);
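The ipc_validate_msg() addition recomputes the expected message size from the length fields carried inside each response type and rejects replies whose received size does not match, so a malformed or truncated userspace reply cannot be over-read. A simplified stand-alone version of the share-config case follows; the structure is a stand-in, not the ksmbd wire format.

	#include <stddef.h>
	#include <stdio.h>

	struct share_config_response {
		unsigned int payload_sz;	/* path + veto list, in bytes */
		unsigned int veto_list_sz;
		/* payload bytes follow the header on the wire */
	};

	static int validate_share_config(const struct share_config_response *resp,
					 size_t received_sz)
	{
		size_t expected;

		if (!resp->payload_sz)
			return 0;		/* nothing to cross-check */
		if (resp->payload_sz < resp->veto_list_sz)
			return -1;		/* inner fields are inconsistent */
		expected = sizeof(*resp) + resp->payload_sz;
		return received_sz == expected ? 0 : -1;
	}

	int main(void)
	{
		struct share_config_response hdr = { .payload_sz = 8, .veto_list_sz = 4 };

		printf("complete:  %d\n", validate_share_config(&hdr, sizeof(hdr) + 8));
		printf("truncated: %d\n", validate_share_config(&hdr, sizeof(hdr)));
		return 0;
	}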
+3 -21
fs/super.c
··· 1515 1515 return error; 1516 1516 } 1517 1517 1518 - static void fs_bdev_super_get(void *data) 1519 - { 1520 - struct super_block *sb = data; 1521 - 1522 - spin_lock(&sb_lock); 1523 - sb->s_count++; 1524 - spin_unlock(&sb_lock); 1525 - } 1526 - 1527 - static void fs_bdev_super_put(void *data) 1528 - { 1529 - struct super_block *sb = data; 1530 - 1531 - put_super(sb); 1532 - } 1533 - 1534 1518 const struct blk_holder_ops fs_holder_ops = { 1535 1519 .mark_dead = fs_bdev_mark_dead, 1536 1520 .sync = fs_bdev_sync, 1537 1521 .freeze = fs_bdev_freeze, 1538 1522 .thaw = fs_bdev_thaw, 1539 - .get_holder = fs_bdev_super_get, 1540 - .put_holder = fs_bdev_super_put, 1541 1523 }; 1542 1524 EXPORT_SYMBOL_GPL(fs_holder_ops); 1543 1525 ··· 1544 1562 * writable from userspace even for a read-only block device. 1545 1563 */ 1546 1564 if ((mode & BLK_OPEN_WRITE) && bdev_read_only(bdev)) { 1547 - fput(bdev_file); 1565 + bdev_fput(bdev_file); 1548 1566 return -EACCES; 1549 1567 } 1550 1568 ··· 1555 1573 if (atomic_read(&bdev->bd_fsfreeze_count) > 0) { 1556 1574 if (fc) 1557 1575 warnf(fc, "%pg: Can't mount, blockdev is frozen", bdev); 1558 - fput(bdev_file); 1576 + bdev_fput(bdev_file); 1559 1577 return -EBUSY; 1560 1578 } 1561 1579 spin_lock(&sb_lock); ··· 1675 1693 generic_shutdown_super(sb); 1676 1694 if (bdev) { 1677 1695 sync_blockdev(bdev); 1678 - fput(sb->s_bdev_file); 1696 + bdev_fput(sb->s_bdev_file); 1679 1697 } 1680 1698 } 1681 1699
+1
fs/vboxsf/file.c
··· 218 218 .release = vboxsf_file_release, 219 219 .fsync = noop_fsync, 220 220 .splice_read = filemap_splice_read, 221 + .setlease = simple_nosetlease, 221 222 }; 222 223 223 224 const struct inode_operations vboxsf_reg_iops = {
+5 -4
fs/vboxsf/super.c
··· 151 151 if (!sbi->nls) { 152 152 vbg_err("vboxsf: Count not load '%s' nls\n", nls_name); 153 153 err = -EINVAL; 154 - goto fail_free; 154 + goto fail_destroy_idr; 155 155 } 156 156 } 157 157 158 - sbi->bdi_id = ida_simple_get(&vboxsf_bdi_ida, 0, 0, GFP_KERNEL); 158 + sbi->bdi_id = ida_alloc(&vboxsf_bdi_ida, GFP_KERNEL); 159 159 if (sbi->bdi_id < 0) { 160 160 err = sbi->bdi_id; 161 161 goto fail_free; ··· 221 221 vboxsf_unmap_folder(sbi->root); 222 222 fail_free: 223 223 if (sbi->bdi_id >= 0) 224 - ida_simple_remove(&vboxsf_bdi_ida, sbi->bdi_id); 224 + ida_free(&vboxsf_bdi_ida, sbi->bdi_id); 225 225 if (sbi->nls) 226 226 unload_nls(sbi->nls); 227 + fail_destroy_idr: 227 228 idr_destroy(&sbi->ino_idr); 228 229 kfree(sbi); 229 230 return err; ··· 269 268 270 269 vboxsf_unmap_folder(sbi->root); 271 270 if (sbi->bdi_id >= 0) 272 - ida_simple_remove(&vboxsf_bdi_ida, sbi->bdi_id); 271 + ida_free(&vboxsf_bdi_ida, sbi->bdi_id); 273 272 if (sbi->nls) 274 273 unload_nls(sbi->nls); 275 274
-3
fs/vboxsf/utils.c
··· 440 440 { 441 441 const char *in; 442 442 char *out; 443 - size_t out_len; 444 443 size_t out_bound_len; 445 444 size_t in_bound_len; 446 445 ··· 447 448 in_bound_len = utf8_len; 448 449 449 450 out = name; 450 - out_len = 0; 451 451 /* Reserve space for terminating 0 */ 452 452 out_bound_len = name_bound_len - 1; 453 453 ··· 467 469 468 470 out += nb; 469 471 out_bound_len -= nb; 470 - out_len += nb; 471 472 } 472 473 473 474 *out = 0;
+1 -1
fs/xfs/xfs_buf.c
··· 2030 2030 fs_put_dax(btp->bt_daxdev, btp->bt_mount); 2031 2031 /* the main block device is closed by kill_block_super */ 2032 2032 if (btp->bt_bdev != btp->bt_mount->m_super->s_bdev) 2033 - fput(btp->bt_bdev_file); 2033 + bdev_fput(btp->bt_bdev_file); 2034 2034 kfree(btp); 2035 2035 } 2036 2036
+13 -2
fs/xfs/xfs_inode.c
··· 1301 1301 */ 1302 1302 if (unlikely((tdp->i_diflags & XFS_DIFLAG_PROJINHERIT) && 1303 1303 tdp->i_projid != sip->i_projid)) { 1304 - error = -EXDEV; 1305 - goto error_return; 1304 + /* 1305 + * Project quota setup skips special files which can 1306 + * leave inodes in a PROJINHERIT directory without a 1307 + * project ID set. We need to allow links to be made 1308 + * to these "project-less" inodes because userspace 1309 + * expects them to succeed after project ID setup, 1310 + * but everything else should be rejected. 1311 + */ 1312 + if (!special_file(VFS_I(sip)->i_mode) || 1313 + sip->i_projid != 0) { 1314 + error = -EXDEV; 1315 + goto error_return; 1316 + } 1306 1317 } 1307 1318 1308 1319 if (!resblks) {
+3 -3
fs/xfs/xfs_super.c
··· 485 485 mp->m_logdev_targp = mp->m_ddev_targp; 486 486 /* Handle won't be used, drop it */ 487 487 if (logdev_file) 488 - fput(logdev_file); 488 + bdev_fput(logdev_file); 489 489 } 490 490 491 491 return 0; ··· 497 497 xfs_free_buftarg(mp->m_ddev_targp); 498 498 out_close_rtdev: 499 499 if (rtdev_file) 500 - fput(rtdev_file); 500 + bdev_fput(rtdev_file); 501 501 out_close_logdev: 502 502 if (logdev_file) 503 - fput(logdev_file); 503 + bdev_fput(logdev_file); 504 504 return error; 505 505 } 506 506
+1 -1
include/kvm/arm_pmu.h
··· 86 86 */ 87 87 #define kvm_pmu_update_vcpu_events(vcpu) \ 88 88 do { \ 89 - if (!has_vhe() && kvm_vcpu_has_pmu(vcpu)) \ 89 + if (!has_vhe() && kvm_arm_support_pmu_v3()) \ 90 90 vcpu->arch.pmu.events = *kvm_get_pmu_events(); \ 91 91 } while (0) 92 92
+1 -10
include/linux/blkdev.h
··· 1505 1505 * Thaw the file system mounted on the block device. 1506 1506 */ 1507 1507 int (*thaw)(struct block_device *bdev); 1508 - 1509 - /* 1510 - * If needed, get a reference to the holder. 1511 - */ 1512 - void (*get_holder)(void *holder); 1513 - 1514 - /* 1515 - * Release the holder. 1516 - */ 1517 - void (*put_holder)(void *holder); 1518 1508 }; 1519 1509 1520 1510 /* ··· 1575 1585 1576 1586 int bdev_freeze(struct block_device *bdev); 1577 1587 int bdev_thaw(struct block_device *bdev); 1588 + void bdev_fput(struct file *bdev_file); 1578 1589 1579 1590 struct io_comp_batch { 1580 1591 struct request *req_list;
+15 -1
include/linux/bpf.h
··· 1574 1574 enum bpf_link_type type; 1575 1575 const struct bpf_link_ops *ops; 1576 1576 struct bpf_prog *prog; 1577 - struct work_struct work; 1577 + /* rcu is used before freeing, work can be used to schedule that 1578 + * RCU-based freeing before that, so they never overlap 1579 + */ 1580 + union { 1581 + struct rcu_head rcu; 1582 + struct work_struct work; 1583 + }; 1578 1584 }; 1579 1585 1580 1586 struct bpf_link_ops { 1581 1587 void (*release)(struct bpf_link *link); 1588 + /* deallocate link resources callback, called without RCU grace period 1589 + * waiting 1590 + */ 1582 1591 void (*dealloc)(struct bpf_link *link); 1592 + /* deallocate link resources callback, called after RCU grace period; 1593 + * if underlying BPF program is sleepable we go through tasks trace 1594 + * RCU GP and then "classic" RCU GP 1595 + */ 1596 + void (*dealloc_deferred)(struct bpf_link *link); 1583 1597 int (*detach)(struct bpf_link *link); 1584 1598 int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog, 1585 1599 struct bpf_prog *old_prog);
+12
include/linux/cc_platform.h
··· 90 90 * Examples include TDX Guest. 91 91 */ 92 92 CC_ATTR_HOTPLUG_DISABLED, 93 + 94 + /** 95 + * @CC_ATTR_HOST_SEV_SNP: AMD SNP enabled on the host. 96 + * 97 + * The host kernel is running with the necessary features 98 + * enabled to run SEV-SNP guests. 99 + */ 100 + CC_ATTR_HOST_SEV_SNP, 93 101 }; 94 102 95 103 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM ··· 115 107 * * FALSE - Specified Confidential Computing attribute is not active 116 108 */ 117 109 bool cc_platform_has(enum cc_attr attr); 110 + void cc_platform_set(enum cc_attr attr); 111 + void cc_platform_clear(enum cc_attr attr); 118 112 119 113 #else /* !CONFIG_ARCH_HAS_CC_PLATFORM */ 120 114 121 115 static inline bool cc_platform_has(enum cc_attr attr) { return false; } 116 + static inline void cc_platform_set(enum cc_attr attr) { } 117 + static inline void cc_platform_clear(enum cc_attr attr) { } 122 118 123 119 #endif /* CONFIG_ARCH_HAS_CC_PLATFORM */ 124 120
+1
include/linux/device.h
··· 1247 1247 void device_link_remove(void *consumer, struct device *supplier); 1248 1248 void device_links_supplier_sync_state_pause(void); 1249 1249 void device_links_supplier_sync_state_resume(void); 1250 + void device_link_wait_removal(void); 1250 1251 1251 1252 /* Create alias, so I can be autoloaded. */ 1252 1253 #define MODULE_ALIAS_CHARDEV(major,minor) \
-1
include/linux/energy_model.h
··· 245 245 * max utilization to the allowed CPU capacity before calculating 246 246 * effective performance. 247 247 */ 248 - max_util = map_util_perf(max_util); 249 248 max_util = min(max_util, allowed_cpu_cap); 250 249 251 250 /*
+2
include/linux/fs.h
··· 121 121 #define FMODE_PWRITE ((__force fmode_t)0x10) 122 122 /* File is opened for execution with sys_execve / sys_uselib */ 123 123 #define FMODE_EXEC ((__force fmode_t)0x20) 124 + /* File writes are restricted (block device specific) */ 125 + #define FMODE_WRITE_RESTRICTED ((__force fmode_t)0x40) 124 126 /* 32bit hashes as llseek() offset (for directories) */ 125 127 #define FMODE_32BITHASH ((__force fmode_t)0x200) 126 128 /* 64bit hashes as llseek() offset (for directories) */
-1
include/linux/io_uring_types.h
··· 294 294 295 295 struct io_submit_state submit_state; 296 296 297 - struct io_buffer_list *io_bl; 298 297 struct xarray io_bl_xa; 299 298 300 299 struct io_hash_table cancel_table_locked;
+8
include/linux/regmap.h
··· 1230 1230 int regmap_raw_write_async(struct regmap *map, unsigned int reg, 1231 1231 const void *val, size_t val_len); 1232 1232 int regmap_read(struct regmap *map, unsigned int reg, unsigned int *val); 1233 + int regmap_read_bypassed(struct regmap *map, unsigned int reg, unsigned int *val); 1233 1234 int regmap_raw_read(struct regmap *map, unsigned int reg, 1234 1235 void *val, size_t val_len); 1235 1236 int regmap_noinc_read(struct regmap *map, unsigned int reg, ··· 1735 1734 1736 1735 static inline int regmap_read(struct regmap *map, unsigned int reg, 1737 1736 unsigned int *val) 1737 + { 1738 + WARN_ONCE(1, "regmap API is disabled"); 1739 + return -EINVAL; 1740 + } 1741 + 1742 + static inline int regmap_read_bypassed(struct regmap *map, unsigned int reg, 1743 + unsigned int *val) 1738 1744 { 1739 1745 WARN_ONCE(1, "regmap API is disabled"); 1740 1746 return -EINVAL;
+2 -2
include/linux/secretmem.h
··· 13 13 /* 14 14 * Using folio_mapping() is quite slow because of the actual call 15 15 * instruction. 16 - * We know that secretmem pages are not compound and LRU so we can 16 + * We know that secretmem pages are not compound, so we can 17 17 * save a couple of cycles here. 18 18 */ 19 - if (folio_test_large(folio) || !folio_test_lru(folio)) 19 + if (folio_test_large(folio)) 20 20 return false; 21 21 22 22 mapping = (struct address_space *)
+3 -4
include/linux/stackdepot.h
··· 44 44 union handle_parts { 45 45 depot_stack_handle_t handle; 46 46 struct { 47 - /* pool_index is offset by 1 */ 48 - u32 pool_index : DEPOT_POOL_INDEX_BITS; 49 - u32 offset : DEPOT_OFFSET_BITS; 50 - u32 extra : STACK_DEPOT_EXTRA_BITS; 47 + u32 pool_index_plus_1 : DEPOT_POOL_INDEX_BITS; 48 + u32 offset : DEPOT_OFFSET_BITS; 49 + u32 extra : STACK_DEPOT_EXTRA_BITS; 51 50 }; 52 51 }; 53 52
+9 -2
include/linux/timecounter.h
··· 22 22 * 23 23 * @read: returns the current cycle value 24 24 * @mask: bitmask for two's complement 25 - * subtraction of non 64 bit counters, 25 + * subtraction of non-64-bit counters, 26 26 * see CYCLECOUNTER_MASK() helper macro 27 27 * @mult: cycle to nanosecond multiplier 28 28 * @shift: cycle to nanosecond divisor (power of two) ··· 35 35 }; 36 36 37 37 /** 38 - * struct timecounter - layer above a %struct cyclecounter which counts nanoseconds 38 + * struct timecounter - layer above a &struct cyclecounter which counts nanoseconds 39 39 * Contains the state needed by timecounter_read() to detect 40 40 * cycle counter wrap around. Initialize with 41 41 * timecounter_init(). Also used to convert cycle counts into the ··· 66 66 * @cycles: Cycles 67 67 * @mask: bit mask for maintaining the 'frac' field 68 68 * @frac: pointer to storage for the fractional nanoseconds. 69 + * 70 + * Returns: cycle counter cycles converted to nanoseconds 69 71 */ 70 72 static inline u64 cyclecounter_cyc2ns(const struct cyclecounter *cc, 71 73 u64 cycles, u64 mask, u64 *frac) ··· 81 79 82 80 /** 83 81 * timecounter_adjtime - Shifts the time of the clock. 82 + * @tc: The &struct timecounter to adjust 84 83 * @delta: Desired change in nanoseconds. 85 84 */ 86 85 static inline void timecounter_adjtime(struct timecounter *tc, s64 delta) ··· 110 107 * 111 108 * In other words, keeps track of time since the same epoch as 112 109 * the function which generated the initial time stamp. 110 + * 111 + * Returns: nanoseconds since the initial time stamp 113 112 */ 114 113 extern u64 timecounter_read(struct timecounter *tc); 115 114 ··· 128 123 * 129 124 * This allows conversion of cycle counter values which were generated 130 125 * in the past. 126 + * 127 + * Returns: cycle counter converted to nanoseconds since the initial time stamp 131 128 */ 132 129 extern u64 timecounter_cyc2time(const struct timecounter *tc, 133 130 u64 cycle_tstamp);
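As a concrete illustration of the mult/shift conversion that the cyclecounter kernel-doc above describes: with mult = 2^31 and shift = 27, one cycle converts to 2^31 / 2^27 = 16 ns, i.e. a 62.5 MHz counter. A sketch of a cyclecounter set up that way; all values are illustrative, a real driver derives them from its clock rate:

#include <linux/timecounter.h>

static u64 example_cc_read(const struct cyclecounter *cc)
{
	return 0;	/* a real driver reads its hardware counter here */
}

static const struct cyclecounter example_cc = {
	.read  = example_cc_read,
	.mask  = CYCLECOUNTER_MASK(48),	/* 48-bit wide hardware counter */
	.mult  = 1U << 31,		/* 2^31 */
	.shift = 27,			/* cycles * 2^31 >> 27 == cycles * 16 ns */
};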
+42 -7
include/linux/timekeeping.h
··· 22 22 const struct timezone *tz); 23 23 24 24 /* 25 - * ktime_get() family: read the current time in a multitude of ways, 25 + * ktime_get() family - read the current time in a multitude of ways. 26 26 * 27 27 * The default time reference is CLOCK_MONOTONIC, starting at 28 28 * boot time but not counting the time spent in suspend. 29 29 * For other references, use the functions with "real", "clocktai", 30 30 * "boottime" and "raw" suffixes. 31 31 * 32 - * To get the time in a different format, use the ones wit 32 + * To get the time in a different format, use the ones with 33 33 * "ns", "ts64" and "seconds" suffix. 34 34 * 35 35 * See Documentation/core-api/timekeeping.rst for more details. ··· 74 74 75 75 /** 76 76 * ktime_get_real - get the real (wall-) time in ktime_t format 77 + * 78 + * Returns: real (wall) time in ktime_t format 77 79 */ 78 80 static inline ktime_t ktime_get_real(void) 79 81 { ··· 88 86 } 89 87 90 88 /** 91 - * ktime_get_boottime - Returns monotonic time since boot in ktime_t format 89 + * ktime_get_boottime - Get monotonic time since boot in ktime_t format 92 90 * 93 91 * This is similar to CLOCK_MONTONIC/ktime_get, but also includes the 94 92 * time spent in suspend. 93 + * 94 + * Returns: monotonic time since boot in ktime_t format 95 95 */ 96 96 static inline ktime_t ktime_get_boottime(void) 97 97 { ··· 106 102 } 107 103 108 104 /** 109 - * ktime_get_clocktai - Returns the TAI time of day in ktime_t format 105 + * ktime_get_clocktai - Get the TAI time of day in ktime_t format 106 + * 107 + * Returns: the TAI time of day in ktime_t format 110 108 */ 111 109 static inline ktime_t ktime_get_clocktai(void) 112 110 { ··· 150 144 151 145 /** 152 146 * ktime_mono_to_real - Convert monotonic time to clock realtime 147 + * @mono: monotonic time to convert 148 + * 149 + * Returns: time converted to realtime clock 153 150 */ 154 151 static inline ktime_t ktime_mono_to_real(ktime_t mono) 155 152 { 156 153 return ktime_mono_to_any(mono, TK_OFFS_REAL); 157 154 } 158 155 156 + /** 157 + * ktime_get_ns - Get the current time in nanoseconds 158 + * 159 + * Returns: current time converted to nanoseconds 160 + */ 159 161 static inline u64 ktime_get_ns(void) 160 162 { 161 163 return ktime_to_ns(ktime_get()); 162 164 } 163 165 166 + /** 167 + * ktime_get_real_ns - Get the current real/wall time in nanoseconds 168 + * 169 + * Returns: current real time converted to nanoseconds 170 + */ 164 171 static inline u64 ktime_get_real_ns(void) 165 172 { 166 173 return ktime_to_ns(ktime_get_real()); 167 174 } 168 175 176 + /** 177 + * ktime_get_boottime_ns - Get the monotonic time since boot in nanoseconds 178 + * 179 + * Returns: current boottime converted to nanoseconds 180 + */ 169 181 static inline u64 ktime_get_boottime_ns(void) 170 182 { 171 183 return ktime_to_ns(ktime_get_boottime()); 172 184 } 173 185 186 + /** 187 + * ktime_get_clocktai_ns - Get the current TAI time of day in nanoseconds 188 + * 189 + * Returns: current TAI time converted to nanoseconds 190 + */ 174 191 static inline u64 ktime_get_clocktai_ns(void) 175 192 { 176 193 return ktime_to_ns(ktime_get_clocktai()); 177 194 } 178 195 196 + /** 197 + * ktime_get_raw_ns - Get the raw monotonic time in nanoseconds 198 + * 199 + * Returns: current raw monotonic time converted to nanoseconds 200 + */ 179 201 static inline u64 ktime_get_raw_ns(void) 180 202 { 181 203 return ktime_to_ns(ktime_get_raw()); ··· 258 224 259 225 extern void timekeeping_inject_sleeptime64(const struct timespec64 *delta); 260 226 261 - /* 262 - * struct 
ktime_timestanps - Simultaneous mono/boot/real timestamps 227 + /** 228 + * struct ktime_timestamps - Simultaneous mono/boot/real timestamps 263 229 * @mono: Monotonic timestamp 264 230 * @boot: Boottime timestamp 265 231 * @real: Realtime timestamp ··· 276 242 * @cycles: Clocksource counter value to produce the system times 277 243 * @real: Realtime system time 278 244 * @raw: Monotonic raw system time 279 - * @clock_was_set_seq: The sequence number of clock was set events 245 + * @cs_id: Clocksource ID 246 + * @clock_was_set_seq: The sequence number of clock-was-set events 280 247 * @cs_was_changed_seq: The sequence number of clocksource change events 281 248 */ 282 249 struct system_time_snapshot {
+10 -2
include/linux/timer.h
··· 22 22 #define __TIMER_LOCKDEP_MAP_INITIALIZER(_kn) 23 23 #endif 24 24 25 - /** 25 + /* 26 26 * @TIMER_DEFERRABLE: A deferrable timer will work normally when the 27 27 * system is busy, but will not cause a CPU to come out of idle just 28 28 * to service it; instead, the timer will be serviced when the CPU ··· 140 140 * or not. Callers must ensure serialization wrt. other operations done 141 141 * to this timer, eg. interrupt contexts, or other CPUs on SMP. 142 142 * 143 - * return value: 1 if the timer is pending, 0 if not. 143 + * Returns: 1 if the timer is pending, 0 if not. 144 144 */ 145 145 static inline int timer_pending(const struct timer_list * timer) 146 146 { ··· 175 175 * See timer_delete_sync() for detailed explanation. 176 176 * 177 177 * Do not use in new code. Use timer_delete_sync() instead. 178 + * 179 + * Returns: 180 + * * %0 - The timer was not pending 181 + * * %1 - The timer was pending and deactivated 178 182 */ 179 183 static inline int del_timer_sync(struct timer_list *timer) 180 184 { ··· 192 188 * See timer_delete() for detailed explanation. 193 189 * 194 190 * Do not use in new code. Use timer_delete() instead. 191 + * 192 + * Returns: 193 + * * %0 - The timer was not pending 194 + * * %1 - The timer was pending and deactivated 195 195 */ 196 196 static inline int del_timer(struct timer_list *timer) 197 197 {
+28
include/linux/udp.h
··· 150 150 } 151 151 } 152 152 153 + DECLARE_STATIC_KEY_FALSE(udp_encap_needed_key); 154 + #if IS_ENABLED(CONFIG_IPV6) 155 + DECLARE_STATIC_KEY_FALSE(udpv6_encap_needed_key); 156 + #endif 157 + 158 + static inline bool udp_encap_needed(void) 159 + { 160 + if (static_branch_unlikely(&udp_encap_needed_key)) 161 + return true; 162 + 163 + #if IS_ENABLED(CONFIG_IPV6) 164 + if (static_branch_unlikely(&udpv6_encap_needed_key)) 165 + return true; 166 + #endif 167 + 168 + return false; 169 + } 170 + 153 171 static inline bool udp_unexpected_gso(struct sock *sk, struct sk_buff *skb) 154 172 { 155 173 if (!skb_is_gso(skb)) ··· 179 161 180 162 if (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST && 181 163 !udp_test_bit(ACCEPT_FRAGLIST, sk)) 164 + return true; 165 + 166 + /* GSO packets lacking the SKB_GSO_UDP_TUNNEL/_CSUM bits might still 167 + * land in a tunnel as the socket check in udp_gro_receive cannot be 168 + * foolproof. 169 + */ 170 + if (udp_encap_needed() && 171 + READ_ONCE(udp_sk(sk)->encap_rcv) && 172 + !(skb_shinfo(skb)->gso_type & 173 + (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM))) 182 174 return true; 183 175 184 176 return false;
+9
include/net/bluetooth/hci.h
··· 176 176 */ 177 177 HCI_QUIRK_USE_BDADDR_PROPERTY, 178 178 179 + /* When this quirk is set, the Bluetooth Device Address provided by 180 + * the 'local-bd-address' fwnode property is incorrectly specified in 181 + * big-endian order. 182 + * 183 + * This quirk can be set before hci_register_dev is called or 184 + * during the hdev->setup vendor callback. 185 + */ 186 + HCI_QUIRK_BDADDR_PROPERTY_BROKEN, 187 + 179 188 /* When this quirk is set, the duplicate filtering during 180 189 * scanning is based on Bluetooth devices addresses. To allow 181 190 * RSSI based updates, restart scanning if needed.
-1
include/net/mana/mana.h
··· 39 39 #define COMP_ENTRY_SIZE 64 40 40 41 41 #define RX_BUFFERS_PER_QUEUE 512 42 - #define MANA_RX_DATA_ALIGN 64 43 42 44 43 #define MAX_SEND_BUFFERS_PER_QUEUE 256 45 44
+2
include/sound/cs35l56.h
··· 267 267 bool fw_patched; 268 268 bool secured; 269 269 bool can_hibernate; 270 + bool fw_owns_asp1; 270 271 bool cal_data_valid; 271 272 s8 cal_index; 272 273 struct cirrus_amp_cal_data cal_data; ··· 284 283 extern const unsigned int cs35l56_tx_input_values[CS35L56_NUM_INPUT_SRC]; 285 284 286 285 int cs35l56_set_patch(struct cs35l56_base *cs35l56_base); 286 + int cs35l56_init_asp1_regs_for_driver_control(struct cs35l56_base *cs35l56_base); 287 287 int cs35l56_force_sync_asp1_registers_from_cache(struct cs35l56_base *cs35l56_base); 288 288 int cs35l56_mbox_send(struct cs35l56_base *cs35l56_base, unsigned int command); 289 289 int cs35l56_firmware_shutdown(struct cs35l56_base *cs35l56_base);
+1 -1
include/sound/tas2781-tlv.h
··· 15 15 #ifndef __TAS2781_TLV_H__ 16 16 #define __TAS2781_TLV_H__ 17 17 18 - static const DECLARE_TLV_DB_SCALE(dvc_tlv, -10000, 100, 0); 18 + static const __maybe_unused DECLARE_TLV_DB_SCALE(dvc_tlv, -10000, 100, 0); 19 19 static const DECLARE_TLV_DB_SCALE(amp_vol_tlv, 1100, 50, 0); 20 20 21 21 #endif
+1 -7
include/vdso/datapage.h
··· 19 19 #include <vdso/time32.h> 20 20 #include <vdso/time64.h> 21 21 22 - #ifdef CONFIG_ARM64 23 - #include <asm/page-def.h> 24 - #else 25 - #include <asm/page.h> 26 - #endif 27 - 28 22 #ifdef CONFIG_ARCH_HAS_VDSO_DATA 29 23 #include <asm/vdso/data.h> 30 24 #else ··· 126 132 */ 127 133 union vdso_data_store { 128 134 struct vdso_data data[CS_BASES]; 129 - u8 page[PAGE_SIZE]; 135 + u8 page[1U << CONFIG_PAGE_SHIFT]; 130 136 }; 131 137 132 138 /*
+1 -1
init/initramfs.c
··· 367 367 if (S_ISREG(mode)) { 368 368 int ml = maybe_link(); 369 369 if (ml >= 0) { 370 - int openflags = O_WRONLY|O_CREAT; 370 + int openflags = O_WRONLY|O_CREAT|O_LARGEFILE; 371 371 if (ml != 1) 372 372 openflags |= O_TRUNC; 373 373 wfile = filp_open(collected, openflags, mode);
+19 -12
io_uring/io_uring.c
··· 147 147 static void io_queue_sqe(struct io_kiocb *req); 148 148 149 149 struct kmem_cache *req_cachep; 150 + static struct workqueue_struct *iou_wq __ro_after_init; 150 151 151 152 static int __read_mostly sysctl_io_uring_disabled; 152 153 static int __read_mostly sysctl_io_uring_group = -1; ··· 351 350 err: 352 351 kfree(ctx->cancel_table.hbs); 353 352 kfree(ctx->cancel_table_locked.hbs); 354 - kfree(ctx->io_bl); 355 353 xa_destroy(&ctx->io_bl_xa); 356 354 kfree(ctx); 357 355 return NULL; ··· 1982 1982 err = -EBADFD; 1983 1983 if (!io_file_can_poll(req)) 1984 1984 goto fail; 1985 - err = -ECANCELED; 1986 - if (io_arm_poll_handler(req, issue_flags) != IO_APOLL_OK) 1987 - goto fail; 1988 - return; 1985 + if (req->file->f_flags & O_NONBLOCK || 1986 + req->file->f_mode & FMODE_NOWAIT) { 1987 + err = -ECANCELED; 1988 + if (io_arm_poll_handler(req, issue_flags) != IO_APOLL_OK) 1989 + goto fail; 1990 + return; 1991 + } else { 1992 + req->flags &= ~REQ_F_APOLL_MULTISHOT; 1993 + } 1989 1994 } 1990 1995 1991 1996 if (req->flags & REQ_F_FORCE_ASYNC) { ··· 2931 2926 io_napi_free(ctx); 2932 2927 kfree(ctx->cancel_table.hbs); 2933 2928 kfree(ctx->cancel_table_locked.hbs); 2934 - kfree(ctx->io_bl); 2935 2929 xa_destroy(&ctx->io_bl_xa); 2936 2930 kfree(ctx); 2937 2931 } ··· 3165 3161 * noise and overhead, there's no discernable change in runtime 3166 3162 * over using system_wq. 3167 3163 */ 3168 - queue_work(system_unbound_wq, &ctx->exit_work); 3164 + queue_work(iou_wq, &ctx->exit_work); 3169 3165 } 3170 3166 3171 3167 static int io_uring_release(struct inode *inode, struct file *file) ··· 3447 3443 ptr = ctx->sq_sqes; 3448 3444 break; 3449 3445 case IORING_OFF_PBUF_RING: { 3446 + struct io_buffer_list *bl; 3450 3447 unsigned int bgid; 3451 3448 3452 3449 bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT; 3453 - rcu_read_lock(); 3454 - ptr = io_pbuf_get_address(ctx, bgid); 3455 - rcu_read_unlock(); 3456 - if (!ptr) 3457 - return ERR_PTR(-EINVAL); 3450 + bl = io_pbuf_get_bl(ctx, bgid); 3451 + if (IS_ERR(bl)) 3452 + return bl; 3453 + ptr = bl->buf_ring; 3454 + io_put_bl(ctx, bl); 3458 3455 break; 3459 3456 } 3460 3457 default: ··· 4189 4184 sizeof_field(struct io_kiocb, cmd.data), NULL); 4190 4185 io_buf_cachep = KMEM_CACHE(io_buffer, 4191 4186 SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT); 4187 + 4188 + iou_wq = alloc_workqueue("iou_exit", WQ_UNBOUND, 64); 4192 4189 4193 4190 #ifdef CONFIG_SYSCTL 4194 4191 register_sysctl_init("kernel", kernel_io_uring_disabled_table);
+40 -76
io_uring/kbuf.c
··· 17 17 18 18 #define IO_BUFFER_LIST_BUF_PER_PAGE (PAGE_SIZE / sizeof(struct io_uring_buf)) 19 19 20 - #define BGID_ARRAY 64 21 - 22 20 /* BIDs are addressed by a 16-bit field in a CQE */ 23 21 #define MAX_BIDS_PER_BGID (1 << 16) 24 22 ··· 38 40 int inuse; 39 41 }; 40 42 41 - static struct io_buffer_list *__io_buffer_get_list(struct io_ring_ctx *ctx, 42 - struct io_buffer_list *bl, 43 - unsigned int bgid) 43 + static inline struct io_buffer_list *__io_buffer_get_list(struct io_ring_ctx *ctx, 44 + unsigned int bgid) 44 45 { 45 - if (bl && bgid < BGID_ARRAY) 46 - return &bl[bgid]; 47 - 48 46 return xa_load(&ctx->io_bl_xa, bgid); 49 47 } 50 48 ··· 49 55 { 50 56 lockdep_assert_held(&ctx->uring_lock); 51 57 52 - return __io_buffer_get_list(ctx, ctx->io_bl, bgid); 58 + return __io_buffer_get_list(ctx, bgid); 53 59 } 54 60 55 61 static int io_buffer_add_list(struct io_ring_ctx *ctx, ··· 61 67 * always under the ->uring_lock, but the RCU lookup from mmap does. 62 68 */ 63 69 bl->bgid = bgid; 64 - smp_store_release(&bl->is_ready, 1); 65 - 66 - if (bgid < BGID_ARRAY) 67 - return 0; 68 - 70 + atomic_set(&bl->refs, 1); 69 71 return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL)); 70 72 } 71 73 ··· 198 208 return ret; 199 209 } 200 210 201 - static __cold int io_init_bl_list(struct io_ring_ctx *ctx) 202 - { 203 - struct io_buffer_list *bl; 204 - int i; 205 - 206 - bl = kcalloc(BGID_ARRAY, sizeof(struct io_buffer_list), GFP_KERNEL); 207 - if (!bl) 208 - return -ENOMEM; 209 - 210 - for (i = 0; i < BGID_ARRAY; i++) { 211 - INIT_LIST_HEAD(&bl[i].buf_list); 212 - bl[i].bgid = i; 213 - } 214 - 215 - smp_store_release(&ctx->io_bl, bl); 216 - return 0; 217 - } 218 - 219 211 /* 220 212 * Mark the given mapped range as free for reuse 221 213 */ ··· 266 294 return i; 267 295 } 268 296 297 + void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl) 298 + { 299 + if (atomic_dec_and_test(&bl->refs)) { 300 + __io_remove_buffers(ctx, bl, -1U); 301 + kfree_rcu(bl, rcu); 302 + } 303 + } 304 + 269 305 void io_destroy_buffers(struct io_ring_ctx *ctx) 270 306 { 271 307 struct io_buffer_list *bl; 272 308 struct list_head *item, *tmp; 273 309 struct io_buffer *buf; 274 310 unsigned long index; 275 - int i; 276 - 277 - for (i = 0; i < BGID_ARRAY; i++) { 278 - if (!ctx->io_bl) 279 - break; 280 - __io_remove_buffers(ctx, &ctx->io_bl[i], -1U); 281 - } 282 311 283 312 xa_for_each(&ctx->io_bl_xa, index, bl) { 284 313 xa_erase(&ctx->io_bl_xa, bl->bgid); 285 - __io_remove_buffers(ctx, bl, -1U); 286 - kfree_rcu(bl, rcu); 314 + io_put_bl(ctx, bl); 287 315 } 288 316 289 317 /* ··· 461 489 462 490 io_ring_submit_lock(ctx, issue_flags); 463 491 464 - if (unlikely(p->bgid < BGID_ARRAY && !ctx->io_bl)) { 465 - ret = io_init_bl_list(ctx); 466 - if (ret) 467 - goto err; 468 - } 469 - 470 492 bl = io_buffer_get_list(ctx, p->bgid); 471 493 if (unlikely(!bl)) { 472 494 bl = kzalloc(sizeof(*bl), GFP_KERNEL_ACCOUNT); ··· 473 507 if (ret) { 474 508 /* 475 509 * Doesn't need rcu free as it was never visible, but 476 - * let's keep it consistent throughout. Also can't 477 - * be a lower indexed array group, as adding one 478 - * where lookup failed cannot happen. 510 + * let's keep it consistent throughout. 
479 511 */ 480 - if (p->bgid >= BGID_ARRAY) 481 - kfree_rcu(bl, rcu); 482 - else 483 - WARN_ON_ONCE(1); 512 + kfree_rcu(bl, rcu); 484 513 goto err; 485 514 } 486 515 } ··· 640 679 if (reg.ring_entries >= 65536) 641 680 return -EINVAL; 642 681 643 - if (unlikely(reg.bgid < BGID_ARRAY && !ctx->io_bl)) { 644 - int ret = io_init_bl_list(ctx); 645 - if (ret) 646 - return ret; 647 - } 648 - 649 682 bl = io_buffer_get_list(ctx, reg.bgid); 650 683 if (bl) { 651 684 /* if mapped buffer ring OR classic exists, don't allow */ ··· 688 733 if (!bl->is_buf_ring) 689 734 return -EINVAL; 690 735 691 - __io_remove_buffers(ctx, bl, -1U); 692 - if (bl->bgid >= BGID_ARRAY) { 693 - xa_erase(&ctx->io_bl_xa, bl->bgid); 694 - kfree_rcu(bl, rcu); 695 - } 736 + xa_erase(&ctx->io_bl_xa, bl->bgid); 737 + io_put_bl(ctx, bl); 696 738 return 0; 697 739 } 698 740 ··· 719 767 return 0; 720 768 } 721 769 722 - void *io_pbuf_get_address(struct io_ring_ctx *ctx, unsigned long bgid) 770 + struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx, 771 + unsigned long bgid) 723 772 { 724 773 struct io_buffer_list *bl; 774 + bool ret; 725 775 726 - bl = __io_buffer_get_list(ctx, smp_load_acquire(&ctx->io_bl), bgid); 727 - 728 - if (!bl || !bl->is_mmap) 729 - return NULL; 730 776 /* 731 - * Ensure the list is fully setup. Only strictly needed for RCU lookup 732 - * via mmap, and in that case only for the array indexed groups. For 733 - * the xarray lookups, it's either visible and ready, or not at all. 777 + * We have to be a bit careful here - we're inside mmap and cannot grab 778 + * the uring_lock. This means the buffer_list could be simultaneously 779 + * going away, if someone is trying to be sneaky. Look it up under rcu 780 + * so we know it's not going away, and attempt to grab a reference to 781 + * it. If the ref is already zero, then fail the mapping. If successful, 782 + * the caller will call io_put_bl() to drop the the reference at at the 783 + * end. This may then safely free the buffer_list (and drop the pages) 784 + * at that point, vm_insert_pages() would've already grabbed the 785 + * necessary vma references. 734 786 */ 735 - if (!smp_load_acquire(&bl->is_ready)) 736 - return NULL; 787 + rcu_read_lock(); 788 + bl = xa_load(&ctx->io_bl_xa, bgid); 789 + /* must be a mmap'able buffer ring and have pages */ 790 + ret = false; 791 + if (bl && bl->is_mmap) 792 + ret = atomic_inc_not_zero(&bl->refs); 793 + rcu_read_unlock(); 737 794 738 - return bl->buf_ring; 795 + if (ret) 796 + return bl; 797 + 798 + return ERR_PTR(-EINVAL); 739 799 } 740 800 741 801 /*
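These refcounting changes guard the userspace mmap of a kernel-allocated provided-buffer ring. Roughly, the flow being protected looks like the sketch below, using the raw offsets named in the io_uring.c hunk above; registration and error handling are trimmed and the helper name is made up:

#include <stdint.h>
#include <sys/mman.h>
#include <linux/io_uring.h>

/* Map the provided-buffer ring for buffer group 'bgid' that the kernel
 * allocated when the ring was registered with IORING_REGISTER_PBUF_RING
 * and IOU_PBUF_RING_MMAP. While this mmap() runs, the kernel side pins
 * the buffer_list via the io_pbuf_get_bl()/io_put_bl() pair shown above.
 */
static void *map_pbuf_ring(int ring_fd, unsigned int bgid, size_t ring_size)
{
	uint64_t off = IORING_OFF_PBUF_RING |
		       ((uint64_t)bgid << IORING_OFF_PBUF_SHIFT);

	return mmap(NULL, ring_size, PROT_READ | PROT_WRITE,
		    MAP_SHARED, ring_fd, (off_t)off);
}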
+5 -3
io_uring/kbuf.h
··· 25 25 __u16 head; 26 26 __u16 mask; 27 27 28 + atomic_t refs; 29 + 28 30 /* ring mapped provided buffers */ 29 31 __u8 is_buf_ring; 30 32 /* ring mapped provided buffers, but mmap'ed by application */ 31 33 __u8 is_mmap; 32 - /* bl is visible from an RCU point of view for lookup */ 33 - __u8 is_ready; 34 34 }; 35 35 36 36 struct io_buffer { ··· 61 61 62 62 bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags); 63 63 64 - void *io_pbuf_get_address(struct io_ring_ctx *ctx, unsigned long bgid); 64 + void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl); 65 + struct io_buffer_list *io_pbuf_get_bl(struct io_ring_ctx *ctx, 66 + unsigned long bgid); 65 67 66 68 static inline bool io_kbuf_recycle_ring(struct io_kiocb *req) 67 69 {
+8 -1
io_uring/rw.c
··· 937 937 ret = __io_read(req, issue_flags); 938 938 939 939 /* 940 + * If the file doesn't support proper NOWAIT, then disable multishot 941 + * and stay in single shot mode. 942 + */ 943 + if (!io_file_supports_nowait(req)) 944 + req->flags &= ~REQ_F_APOLL_MULTISHOT; 945 + 946 + /* 940 947 * If we get -EAGAIN, recycle our buffer and just let normal poll 941 948 * handling arm it. 942 949 */ ··· 962 955 /* 963 956 * Any successful return value will keep the multishot read armed. 964 957 */ 965 - if (ret > 0) { 958 + if (ret > 0 && req->flags & REQ_F_APOLL_MULTISHOT) { 966 959 /* 967 960 * Put our buffer and post a CQE. If we fail to post a CQE, then 968 961 * jump to the termination path. This request is then done.
+32 -3
kernel/bpf/syscall.c
··· 3024 3024 atomic64_inc(&link->refcnt); 3025 3025 } 3026 3026 3027 + static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu) 3028 + { 3029 + struct bpf_link *link = container_of(rcu, struct bpf_link, rcu); 3030 + 3031 + /* free bpf_link and its containing memory */ 3032 + link->ops->dealloc_deferred(link); 3033 + } 3034 + 3035 + static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu) 3036 + { 3037 + if (rcu_trace_implies_rcu_gp()) 3038 + bpf_link_defer_dealloc_rcu_gp(rcu); 3039 + else 3040 + call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp); 3041 + } 3042 + 3027 3043 /* bpf_link_free is guaranteed to be called from process context */ 3028 3044 static void bpf_link_free(struct bpf_link *link) 3029 3045 { 3046 + bool sleepable = false; 3047 + 3030 3048 bpf_link_free_id(link->id); 3031 3049 if (link->prog) { 3050 + sleepable = link->prog->sleepable; 3032 3051 /* detach BPF program, clean up used resources */ 3033 3052 link->ops->release(link); 3034 3053 bpf_prog_put(link->prog); 3035 3054 } 3036 - /* free bpf_link and its containing memory */ 3037 - link->ops->dealloc(link); 3055 + if (link->ops->dealloc_deferred) { 3056 + /* schedule BPF link deallocation; if underlying BPF program 3057 + * is sleepable, we need to first wait for RCU tasks trace 3058 + * sync, then go through "classic" RCU grace period 3059 + */ 3060 + if (sleepable) 3061 + call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp); 3062 + else 3063 + call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp); 3064 + } 3065 + if (link->ops->dealloc) 3066 + link->ops->dealloc(link); 3038 3067 } 3039 3068 3040 3069 static void bpf_link_put_deferred(struct work_struct *work) ··· 3573 3544 3574 3545 static const struct bpf_link_ops bpf_raw_tp_link_lops = { 3575 3546 .release = bpf_raw_tp_link_release, 3576 - .dealloc = bpf_raw_tp_link_dealloc, 3547 + .dealloc_deferred = bpf_raw_tp_link_dealloc, 3577 3548 .show_fdinfo = bpf_raw_tp_link_show_fdinfo, 3578 3549 .fill_link_info = bpf_raw_tp_link_fill_link_info, 3579 3550 };
+3
kernel/bpf/verifier.c
··· 18379 18379 } 18380 18380 if (!env->prog->jit_requested) { 18381 18381 verbose(env, "JIT is required to use arena\n"); 18382 + fdput(f); 18382 18383 return -EOPNOTSUPP; 18383 18384 } 18384 18385 if (!bpf_jit_supports_arena()) { 18385 18386 verbose(env, "JIT doesn't support arena\n"); 18387 + fdput(f); 18386 18388 return -EOPNOTSUPP; 18387 18389 } 18388 18390 env->prog->aux->arena = (void *)map; 18389 18391 if (!bpf_arena_get_user_vm_start(env->prog->aux->arena)) { 18390 18392 verbose(env, "arena's user address must be set via map_extra or mmap()\n"); 18393 + fdput(f); 18391 18394 return -EINVAL; 18392 18395 } 18393 18396 }
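The three added fdput() calls all close the same hole: an early return taken after fdget() leaked the map file reference. The general shape of the fix, reduced to a skeleton; check_example_map() and the 'ok' condition are illustrative only:

#include <linux/file.h>
#include <linux/errno.h>

static int check_example_map(int ufd, bool ok)
{
	struct fd f = fdget(ufd);

	if (!f.file)
		return -EBADF;

	if (!ok) {
		fdput(f);	/* the put that was missing on early returns */
		return -EOPNOTSUPP;
	}

	/* ... use f.file ... */
	fdput(f);
	return 0;
}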
+15 -3
kernel/time/tick-sched.c
··· 697 697 698 698 /** 699 699 * tick_nohz_update_jiffies - update jiffies when idle was interrupted 700 + * @now: current ktime_t 700 701 * 701 702 * Called from interrupt entry when the CPU was idle 702 703 * ··· 795 794 * This time is measured via accounting rather than sampling, 796 795 * and is as accurate as ktime_get() is. 797 796 * 798 - * This function returns -1 if NOHZ is not enabled. 797 + * Return: -1 if NOHZ is not enabled, else total idle time of the @cpu 799 798 */ 800 799 u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time) 801 800 { ··· 821 820 * This time is measured via accounting rather than sampling, 822 821 * and is as accurate as ktime_get() is. 823 822 * 824 - * This function returns -1 if NOHZ is not enabled. 823 + * Return: -1 if NOHZ is not enabled, else total iowait time of @cpu 825 824 */ 826 825 u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time) 827 826 { ··· 1288 1287 1289 1288 /** 1290 1289 * tick_nohz_idle_got_tick - Check whether or not the tick handler has run 1290 + * 1291 + * Return: %true if the tick handler has run, otherwise %false 1291 1292 */ 1292 1293 bool tick_nohz_idle_got_tick(void) 1293 1294 { ··· 1308 1305 * stopped, it returns the next hrtimer. 1309 1306 * 1310 1307 * Called from power state control code with interrupts disabled 1308 + * 1309 + * Return: the next expiration time 1311 1310 */ 1312 1311 ktime_t tick_nohz_get_next_hrtimer(void) 1313 1312 { ··· 1325 1320 * The return value of this function and/or the value returned by it through the 1326 1321 * @delta_next pointer can be negative which must be taken into account by its 1327 1322 * callers. 1323 + * 1324 + * Return: the expected length of the current sleep 1328 1325 */ 1329 1326 ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next) 1330 1327 { ··· 1364 1357 /** 1365 1358 * tick_nohz_get_idle_calls_cpu - return the current idle calls counter value 1366 1359 * for a particular CPU. 1360 + * @cpu: target CPU number 1367 1361 * 1368 1362 * Called from the schedutil frequency scaling governor in scheduler context. 1363 + * 1364 + * Return: the current idle calls counter value for @cpu 1369 1365 */ 1370 1366 unsigned long tick_nohz_get_idle_calls_cpu(int cpu) 1371 1367 { ··· 1381 1371 * tick_nohz_get_idle_calls - return the current idle calls counter value 1382 1372 * 1383 1373 * Called from the schedutil frequency scaling governor in scheduler context. 1374 + * 1375 + * Return: the current idle calls counter value for the current CPU 1384 1376 */ 1385 1377 unsigned long tick_nohz_get_idle_calls(void) 1386 1378 { ··· 1571 1559 1572 1560 /** 1573 1561 * tick_setup_sched_timer - setup the tick emulation timer 1574 - * @mode: tick_nohz_mode to setup for 1562 + * @hrtimer: whether to use the hrtimer or not 1575 1563 */ 1576 1564 void tick_setup_sched_timer(bool hrtimer) 1577 1565 {
+1 -1
kernel/time/tick-sched.h
··· 46 46 * @next_tick: Next tick to be fired when in dynticks mode. 47 47 * @idle_jiffies: jiffies at the entry to idle for idle time accounting 48 48 * @idle_waketime: Time when the idle was interrupted 49 + * @idle_sleeptime_seq: sequence counter for data consistency 49 50 * @idle_entrytime: Time when the idle call was entered 50 - * @nohz_mode: Mode - one state of tick_nohz_mode 51 51 * @last_jiffies: Base jiffies snapshot when next event was last computed 52 52 * @timer_expires_base: Base time clock monotonic for @timer_expires 53 53 * @timer_expires: Anticipated timer expiration time (in case sched tick is stopped)
+11 -11
kernel/time/timer.c
··· 64 64 65 65 /* 66 66 * The timer wheel has LVL_DEPTH array levels. Each level provides an array of 67 - * LVL_SIZE buckets. Each level is driven by its own clock and therefor each 67 + * LVL_SIZE buckets. Each level is driven by its own clock and therefore each 68 68 * level has a different granularity. 69 69 * 70 - * The level granularity is: LVL_CLK_DIV ^ lvl 70 + * The level granularity is: LVL_CLK_DIV ^ level 71 71 * The level clock frequency is: HZ / (LVL_CLK_DIV ^ level) 72 72 * 73 73 * The array level of a newly armed timer depends on the relative expiry 74 74 * time. The farther the expiry time is away the higher the array level and 75 - * therefor the granularity becomes. 75 + * therefore the granularity becomes. 76 76 * 77 77 * Contrary to the original timer wheel implementation, which aims for 'exact' 78 78 * expiry of the timers, this implementation removes the need for recascading ··· 207 207 * struct timer_base - Per CPU timer base (number of base depends on config) 208 208 * @lock: Lock protecting the timer_base 209 209 * @running_timer: When expiring timers, the lock is dropped. To make 210 - * sure not to race agains deleting/modifying a 210 + * sure not to race against deleting/modifying a 211 211 * currently running timer, the pointer is set to the 212 212 * timer, which expires at the moment. If no timer is 213 213 * running, the pointer is NULL. ··· 737 737 } 738 738 739 739 /* 740 - * fixup_init is called when: 740 + * timer_fixup_init is called when: 741 741 * - an active object is initialized 742 742 */ 743 743 static bool timer_fixup_init(void *addr, enum debug_obj_state state) ··· 761 761 } 762 762 763 763 /* 764 - * fixup_activate is called when: 764 + * timer_fixup_activate is called when: 765 765 * - an active object is activated 766 766 * - an unknown non-static object is activated 767 767 */ ··· 783 783 } 784 784 785 785 /* 786 - * fixup_free is called when: 786 + * timer_fixup_free is called when: 787 787 * - an active object is freed 788 788 */ 789 789 static bool timer_fixup_free(void *addr, enum debug_obj_state state) ··· 801 801 } 802 802 803 803 /* 804 - * fixup_assert_init is called when: 804 + * timer_fixup_assert_init is called when: 805 805 * - an untracked/uninit-ed object is found 806 806 */ 807 807 static bool timer_fixup_assert_init(void *addr, enum debug_obj_state state) ··· 914 914 * @key: lockdep class key of the fake lock used for tracking timer 915 915 * sync lock dependencies 916 916 * 917 - * init_timer_key() must be done to a timer prior calling *any* of the 917 + * init_timer_key() must be done to a timer prior to calling *any* of the 918 918 * other timer functions. 919 919 */ 920 920 void init_timer_key(struct timer_list *timer, ··· 1417 1417 * If @shutdown is set then the lock has to be taken whether the 1418 1418 * timer is pending or not to protect against a concurrent rearm 1419 1419 * which might hit between the lockless pending check and the lock 1420 - * aquisition. By taking the lock it is ensured that such a newly 1420 + * acquisition. By taking the lock it is ensured that such a newly 1421 1421 * enqueued timer is dequeued and cannot end up with 1422 1422 * timer->function == NULL in the expiry code. 1423 1423 * ··· 2306 2306 2307 2307 /* 2308 2308 * When timer base is not set idle, undo the effect of 2309 - * tmigr_cpu_deactivate() to prevent inconsitent states - active 2309 + * tmigr_cpu_deactivate() to prevent inconsistent states - active 2310 2310 * timer base but inactive timer migration hierarchy. 
2311 2311 * 2312 2312 * When timer base was already marked idle, nothing will be
+31 -1
kernel/time/timer_migration.c
··· 751 751 752 752 first_childevt = evt = data->evt; 753 753 754 + /* 755 + * Walking the hierarchy is required in any case when a 756 + * remote expiry was done before. This ensures to not lose 757 + * already queued events in non active groups (see section 758 + * "Required event and timerqueue update after a remote 759 + * expiry" in the documentation at the top). 760 + * 761 + * The two call sites which are executed without a remote expiry 762 + * before, are not prevented from propagating changes through 763 + * the hierarchy by the return: 764 + * - When entering this path by tmigr_new_timer(), @evt->ignore 765 + * is never set. 766 + * - tmigr_inactive_up() takes care of the propagation by 767 + * itself and ignores the return value. But an immediate 768 + * return is possible if there is a parent, sparing group 769 + * locking at this level, because the upper walking call to 770 + * the parent will take care about removing this event from 771 + * within the group and update next_expiry accordingly. 772 + * 773 + * However if there is no parent, ie: the hierarchy has only a 774 + * single level so @group is the top level group, make sure the 775 + * first event information of the group is updated properly and 776 + * also handled properly, so skip this fast return path. 777 + */ 778 + if (evt->ignore && !remote && group->parent) 779 + return true; 780 + 754 781 raw_spin_lock(&group->lock); 755 782 756 783 childstate.state = 0; ··· 789 762 * queue when the expiry time changed only or when it could be ignored. 790 763 */ 791 764 if (timerqueue_node_queued(&evt->nextevt)) { 792 - if ((evt->nextevt.expires == nextexp) && !evt->ignore) 765 + if ((evt->nextevt.expires == nextexp) && !evt->ignore) { 766 + /* Make sure not to miss a new CPU event with the same expiry */ 767 + evt->cpu = first_childevt->cpu; 793 768 goto check_toplvl; 769 + } 794 770 795 771 if (!timerqueue_del(&group->events, &evt->nextevt)) 796 772 WRITE_ONCE(group->next_expiry, KTIME_MAX);
+5 -5
kernel/trace/bpf_trace.c
··· 2728 2728 2729 2729 static const struct bpf_link_ops bpf_kprobe_multi_link_lops = { 2730 2730 .release = bpf_kprobe_multi_link_release, 2731 - .dealloc = bpf_kprobe_multi_link_dealloc, 2731 + .dealloc_deferred = bpf_kprobe_multi_link_dealloc, 2732 2732 .fill_link_info = bpf_kprobe_multi_link_fill_link_info, 2733 2733 }; 2734 2734 ··· 3157 3157 3158 3158 umulti_link = container_of(link, struct bpf_uprobe_multi_link, link); 3159 3159 bpf_uprobe_unregister(&umulti_link->path, umulti_link->uprobes, umulti_link->cnt); 3160 + if (umulti_link->task) 3161 + put_task_struct(umulti_link->task); 3162 + path_put(&umulti_link->path); 3160 3163 } 3161 3164 3162 3165 static void bpf_uprobe_multi_link_dealloc(struct bpf_link *link) ··· 3167 3164 struct bpf_uprobe_multi_link *umulti_link; 3168 3165 3169 3166 umulti_link = container_of(link, struct bpf_uprobe_multi_link, link); 3170 - if (umulti_link->task) 3171 - put_task_struct(umulti_link->task); 3172 - path_put(&umulti_link->path); 3173 3167 kvfree(umulti_link->uprobes); 3174 3168 kfree(umulti_link); 3175 3169 } ··· 3242 3242 3243 3243 static const struct bpf_link_ops bpf_uprobe_multi_link_lops = { 3244 3244 .release = bpf_uprobe_multi_link_release, 3245 - .dealloc = bpf_uprobe_multi_link_dealloc, 3245 + .dealloc_deferred = bpf_uprobe_multi_link_dealloc, 3246 3246 .fill_link_info = bpf_uprobe_multi_link_fill_link_info, 3247 3247 }; 3248 3248
+2 -2
lib/stackdepot.c
··· 330 330 stack = current_pool + pool_offset; 331 331 332 332 /* Pre-initialize handle once. */ 333 - stack->handle.pool_index = pool_index + 1; 333 + stack->handle.pool_index_plus_1 = pool_index + 1; 334 334 stack->handle.offset = pool_offset >> DEPOT_STACK_ALIGN; 335 335 stack->handle.extra = 0; 336 336 INIT_LIST_HEAD(&stack->hash_list); ··· 441 441 const int pools_num_cached = READ_ONCE(pools_num); 442 442 union handle_parts parts = { .handle = handle }; 443 443 void *pool; 444 - u32 pool_index = parts.pool_index - 1; 444 + u32 pool_index = parts.pool_index_plus_1 - 1; 445 445 size_t offset = parts.offset << DEPOT_STACK_ALIGN; 446 446 struct stack_record *stack; 447 447
+4
mm/memory.c
··· 5973 5973 goto out; 5974 5974 pte = ptep_get(ptep); 5975 5975 5976 + /* Never return PFNs of anon folios in COW mappings. */ 5977 + if (vm_normal_folio(vma, address, pte)) 5978 + goto unlock; 5979 + 5976 5980 if ((flags & FOLL_WRITE) && !pte_write(pte)) 5977 5981 goto unlock; 5978 5982
+45 -29
mm/vmalloc.c
··· 989 989 return atomic_long_read(&nr_vmalloc_pages); 990 990 } 991 991 992 + static struct vmap_area *__find_vmap_area(unsigned long addr, struct rb_root *root) 993 + { 994 + struct rb_node *n = root->rb_node; 995 + 996 + addr = (unsigned long)kasan_reset_tag((void *)addr); 997 + 998 + while (n) { 999 + struct vmap_area *va; 1000 + 1001 + va = rb_entry(n, struct vmap_area, rb_node); 1002 + if (addr < va->va_start) 1003 + n = n->rb_left; 1004 + else if (addr >= va->va_end) 1005 + n = n->rb_right; 1006 + else 1007 + return va; 1008 + } 1009 + 1010 + return NULL; 1011 + } 1012 + 992 1013 /* Look up the first VA which satisfies addr < va_end, NULL if none. */ 993 1014 static struct vmap_area * 994 1015 __find_vmap_area_exceed_addr(unsigned long addr, struct rb_root *root) ··· 1046 1025 static struct vmap_node * 1047 1026 find_vmap_area_exceed_addr_lock(unsigned long addr, struct vmap_area **va) 1048 1027 { 1049 - struct vmap_node *vn, *va_node = NULL; 1050 - struct vmap_area *va_lowest; 1028 + unsigned long va_start_lowest; 1029 + struct vmap_node *vn; 1051 1030 int i; 1052 1031 1053 - for (i = 0; i < nr_vmap_nodes; i++) { 1032 + repeat: 1033 + for (i = 0, va_start_lowest = 0; i < nr_vmap_nodes; i++) { 1054 1034 vn = &vmap_nodes[i]; 1055 1035 1056 1036 spin_lock(&vn->busy.lock); 1057 - va_lowest = __find_vmap_area_exceed_addr(addr, &vn->busy.root); 1058 - if (va_lowest) { 1059 - if (!va_node || va_lowest->va_start < (*va)->va_start) { 1060 - if (va_node) 1061 - spin_unlock(&va_node->busy.lock); 1037 + *va = __find_vmap_area_exceed_addr(addr, &vn->busy.root); 1062 1038 1063 - *va = va_lowest; 1064 - va_node = vn; 1065 - continue; 1066 - } 1067 - } 1039 + if (*va) 1040 + if (!va_start_lowest || (*va)->va_start < va_start_lowest) 1041 + va_start_lowest = (*va)->va_start; 1068 1042 spin_unlock(&vn->busy.lock); 1069 1043 } 1070 1044 1071 - return va_node; 1072 - } 1045 + /* 1046 + * Check if found VA exists, it might have gone away. In this case we 1047 + * repeat the search because a VA has been removed concurrently and we 1048 + * need to proceed to the next one, which is a rare case. 1049 + */ 1050 + if (va_start_lowest) { 1051 + vn = addr_to_node(va_start_lowest); 1073 1052 1074 - static struct vmap_area *__find_vmap_area(unsigned long addr, struct rb_root *root) 1075 - { 1076 - struct rb_node *n = root->rb_node; 1053 + spin_lock(&vn->busy.lock); 1054 + *va = __find_vmap_area(va_start_lowest, &vn->busy.root); 1077 1055 1078 - addr = (unsigned long)kasan_reset_tag((void *)addr); 1056 + if (*va) 1057 + return vn; 1079 1058 1080 - while (n) { 1081 - struct vmap_area *va; 1082 - 1083 - va = rb_entry(n, struct vmap_area, rb_node); 1084 - if (addr < va->va_start) 1085 - n = n->rb_left; 1086 - else if (addr >= va->va_end) 1087 - n = n->rb_right; 1088 - else 1089 - return va; 1059 + spin_unlock(&vn->busy.lock); 1060 + goto repeat; 1090 1061 } 1091 1062 1092 1063 return NULL; ··· 2355 2342 struct vmap_node *vn; 2356 2343 struct vmap_area *va; 2357 2344 int i, j; 2345 + 2346 + if (unlikely(!vmap_initialized)) 2347 + return NULL; 2358 2348 2359 2349 /* 2360 2350 * An addr_to_node_id(addr) converts an address to a node index
+5 -5
net/9p/client.c
··· 1583 1583 received = rsize; 1584 1584 } 1585 1585 1586 - p9_debug(P9_DEBUG_9P, "<<< RREAD count %d\n", count); 1586 + p9_debug(P9_DEBUG_9P, "<<< RREAD count %d\n", received); 1587 1587 1588 1588 if (non_zc) { 1589 1589 int n = copy_to_iter(dataptr, received, to); ··· 1609 1609 int total = 0; 1610 1610 *err = 0; 1611 1611 1612 - p9_debug(P9_DEBUG_9P, ">>> TWRITE fid %d offset %llu count %zd\n", 1613 - fid->fid, offset, iov_iter_count(from)); 1614 - 1615 1612 while (iov_iter_count(from)) { 1616 1613 int count = iov_iter_count(from); 1617 1614 int rsize = fid->iounit; ··· 1619 1622 1620 1623 if (count < rsize) 1621 1624 rsize = count; 1625 + 1626 + p9_debug(P9_DEBUG_9P, ">>> TWRITE fid %d offset %llu count %d (/%d)\n", 1627 + fid->fid, offset, rsize, count); 1622 1628 1623 1629 /* Don't bother zerocopy for small IO (< 1024) */ 1624 1630 if (clnt->trans_mod->zc_request && rsize > 1024) { ··· 1650 1650 written = rsize; 1651 1651 } 1652 1652 1653 - p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count); 1653 + p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", written); 1654 1654 1655 1655 p9_req_put(clnt, req); 1656 1656 iov_iter_revert(from, count - written - iov_iter_count(from));
-1
net/9p/trans_fd.c
··· 95 95 * @unsent_req_list: accounting for requests that haven't been sent 96 96 * @rreq: read request 97 97 * @wreq: write request 98 - * @req: current request being processed (if any) 99 98 * @tmp_buf: temporary buffer to read in header 100 99 * @rc: temporary fcall for reading current frame 101 100 * @wpos: write position for current frame
+1 -1
net/ax25/ax25_dev.c
··· 105 105 spin_lock_bh(&ax25_dev_lock); 106 106 107 107 #ifdef CONFIG_AX25_DAMA_SLAVE 108 - ax25_ds_del_timer(ax25_dev); 108 + timer_shutdown_sync(&ax25_dev->dama.slave_timer); 109 109 #endif 110 110 111 111 /*
+3 -3
net/bluetooth/hci_core.c
··· 2874 2874 cancel_delayed_work_sync(&hdev->ncmd_timer); 2875 2875 atomic_set(&hdev->cmd_cnt, 1); 2876 2876 2877 - hci_cmd_sync_cancel_sync(hdev, -err); 2877 + hci_cmd_sync_cancel_sync(hdev, err); 2878 2878 } 2879 2879 2880 2880 /* Suspend HCI device */ ··· 2894 2894 return 0; 2895 2895 2896 2896 /* Cancel potentially blocking sync operation before suspend */ 2897 - hci_cancel_cmd_sync(hdev, -EHOSTDOWN); 2897 + hci_cancel_cmd_sync(hdev, EHOSTDOWN); 2898 2898 2899 2899 hci_req_sync_lock(hdev); 2900 2900 ret = hci_suspend_sync(hdev); ··· 4210 4210 4211 4211 err = hci_send_frame(hdev, skb); 4212 4212 if (err < 0) { 4213 - hci_cmd_sync_cancel_sync(hdev, err); 4213 + hci_cmd_sync_cancel_sync(hdev, -err); 4214 4214 return; 4215 4215 } 4216 4216
+40 -24
net/bluetooth/hci_debugfs.c
··· 218 218 { 219 219 struct hci_dev *hdev = data; 220 220 221 - if (val == 0 || val > hdev->conn_info_max_age) 222 - return -EINVAL; 223 - 224 221 hci_dev_lock(hdev); 222 + if (val == 0 || val > hdev->conn_info_max_age) { 223 + hci_dev_unlock(hdev); 224 + return -EINVAL; 225 + } 226 + 225 227 hdev->conn_info_min_age = val; 226 228 hci_dev_unlock(hdev); 227 229 ··· 248 246 { 249 247 struct hci_dev *hdev = data; 250 248 251 - if (val == 0 || val < hdev->conn_info_min_age) 252 - return -EINVAL; 253 - 254 249 hci_dev_lock(hdev); 250 + if (val == 0 || val < hdev->conn_info_min_age) { 251 + hci_dev_unlock(hdev); 252 + return -EINVAL; 253 + } 254 + 255 255 hdev->conn_info_max_age = val; 256 256 hci_dev_unlock(hdev); 257 257 ··· 571 567 { 572 568 struct hci_dev *hdev = data; 573 569 574 - if (val == 0 || val % 2 || val > hdev->sniff_max_interval) 575 - return -EINVAL; 576 - 577 570 hci_dev_lock(hdev); 571 + if (val == 0 || val % 2 || val > hdev->sniff_max_interval) { 572 + hci_dev_unlock(hdev); 573 + return -EINVAL; 574 + } 575 + 578 576 hdev->sniff_min_interval = val; 579 577 hci_dev_unlock(hdev); 580 578 ··· 601 595 { 602 596 struct hci_dev *hdev = data; 603 597 604 - if (val == 0 || val % 2 || val < hdev->sniff_min_interval) 605 - return -EINVAL; 606 - 607 598 hci_dev_lock(hdev); 599 + if (val == 0 || val % 2 || val < hdev->sniff_min_interval) { 600 + hci_dev_unlock(hdev); 601 + return -EINVAL; 602 + } 603 + 608 604 hdev->sniff_max_interval = val; 609 605 hci_dev_unlock(hdev); 610 606 ··· 858 850 { 859 851 struct hci_dev *hdev = data; 860 852 861 - if (val < 0x0006 || val > 0x0c80 || val > hdev->le_conn_max_interval) 862 - return -EINVAL; 863 - 864 853 hci_dev_lock(hdev); 854 + if (val < 0x0006 || val > 0x0c80 || val > hdev->le_conn_max_interval) { 855 + hci_dev_unlock(hdev); 856 + return -EINVAL; 857 + } 858 + 865 859 hdev->le_conn_min_interval = val; 866 860 hci_dev_unlock(hdev); 867 861 ··· 888 878 { 889 879 struct hci_dev *hdev = data; 890 880 891 - if (val < 0x0006 || val > 0x0c80 || val < hdev->le_conn_min_interval) 892 - return -EINVAL; 893 - 894 881 hci_dev_lock(hdev); 882 + if (val < 0x0006 || val > 0x0c80 || val < hdev->le_conn_min_interval) { 883 + hci_dev_unlock(hdev); 884 + return -EINVAL; 885 + } 886 + 895 887 hdev->le_conn_max_interval = val; 896 888 hci_dev_unlock(hdev); 897 889 ··· 1002 990 { 1003 991 struct hci_dev *hdev = data; 1004 992 1005 - if (val < 0x0020 || val > 0x4000 || val > hdev->le_adv_max_interval) 1006 - return -EINVAL; 1007 - 1008 993 hci_dev_lock(hdev); 994 + if (val < 0x0020 || val > 0x4000 || val > hdev->le_adv_max_interval) { 995 + hci_dev_unlock(hdev); 996 + return -EINVAL; 997 + } 998 + 1009 999 hdev->le_adv_min_interval = val; 1010 1000 hci_dev_unlock(hdev); 1011 1001 ··· 1032 1018 { 1033 1019 struct hci_dev *hdev = data; 1034 1020 1035 - if (val < 0x0020 || val > 0x4000 || val < hdev->le_adv_min_interval) 1036 - return -EINVAL; 1037 - 1038 1021 hci_dev_lock(hdev); 1022 + if (val < 0x0020 || val > 0x4000 || val < hdev->le_adv_min_interval) { 1023 + hci_dev_unlock(hdev); 1024 + return -EINVAL; 1025 + } 1026 + 1039 1027 hdev->le_adv_max_interval = val; 1040 1028 hci_dev_unlock(hdev); 1041 1029
+25
net/bluetooth/hci_event.c
··· 3208 3208 if (test_bit(HCI_ENCRYPT, &hdev->flags)) 3209 3209 set_bit(HCI_CONN_ENCRYPT, &conn->flags); 3210 3210 3211 + /* "Link key request" completed ahead of "connect request" completes */ 3212 + if (ev->encr_mode == 1 && !test_bit(HCI_CONN_ENCRYPT, &conn->flags) && 3213 + ev->link_type == ACL_LINK) { 3214 + struct link_key *key; 3215 + struct hci_cp_read_enc_key_size cp; 3216 + 3217 + key = hci_find_link_key(hdev, &ev->bdaddr); 3218 + if (key) { 3219 + set_bit(HCI_CONN_ENCRYPT, &conn->flags); 3220 + 3221 + if (!(hdev->commands[20] & 0x10)) { 3222 + conn->enc_key_size = HCI_LINK_KEY_SIZE; 3223 + } else { 3224 + cp.handle = cpu_to_le16(conn->handle); 3225 + if (hci_send_cmd(hdev, HCI_OP_READ_ENC_KEY_SIZE, 3226 + sizeof(cp), &cp)) { 3227 + bt_dev_err(hdev, "sending read key size failed"); 3228 + conn->enc_key_size = HCI_LINK_KEY_SIZE; 3229 + } 3230 + } 3231 + 3232 + hci_encrypt_cfm(conn, ev->status); 3233 + } 3234 + } 3235 + 3211 3236 /* Get remote features */ 3212 3237 if (conn->type == ACL_LINK) { 3213 3238 struct hci_cp_read_remote_features cp;
+8 -2
net/bluetooth/hci_sync.c
··· 617 617 bt_dev_dbg(hdev, "err 0x%2.2x", err); 618 618 619 619 if (hdev->req_status == HCI_REQ_PEND) { 620 - hdev->req_result = err; 620 + /* req_result is __u32 so error must be positive to be properly 621 + * propagated. 622 + */ 623 + hdev->req_result = err < 0 ? -err : err; 621 624 hdev->req_status = HCI_REQ_CANCELED; 622 625 623 626 wake_up_interruptible(&hdev->req_wait_q); ··· 3419 3416 if (ret < 0 || !bacmp(&ba, BDADDR_ANY)) 3420 3417 return; 3421 3418 3422 - bacpy(&hdev->public_addr, &ba); 3419 + if (test_bit(HCI_QUIRK_BDADDR_PROPERTY_BROKEN, &hdev->quirks)) 3420 + baswap(&hdev->public_addr, &ba); 3421 + else 3422 + bacpy(&hdev->public_addr, &ba); 3423 3423 } 3424 3424 3425 3425 struct hci_init_stage {
+6
net/bridge/netfilter/ebtables.c
··· 1111 1111 struct ebt_table_info *newinfo; 1112 1112 struct ebt_replace tmp; 1113 1113 1114 + if (len < sizeof(tmp)) 1115 + return -EINVAL; 1114 1116 if (copy_from_sockptr(&tmp, arg, sizeof(tmp)) != 0) 1115 1117 return -EFAULT; 1116 1118 ··· 1425 1423 { 1426 1424 struct ebt_replace hlp; 1427 1425 1426 + if (len < sizeof(hlp)) 1427 + return -EINVAL; 1428 1428 if (copy_from_sockptr(&hlp, arg, sizeof(hlp))) 1429 1429 return -EFAULT; 1430 1430 ··· 2356 2352 { 2357 2353 struct compat_ebt_replace hlp; 2358 2354 2355 + if (len < sizeof(hlp)) 2356 + return -EINVAL; 2359 2357 if (copy_from_sockptr(&hlp, arg, sizeof(hlp))) 2360 2358 return -EFAULT; 2361 2359
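The same minimal-length check recurs below for arp_tables, ip_tables and ip6_tables: reject a setsockopt payload shorter than the fixed header before copy_from_sockptr() reads sizeof(tmp) bytes from it. Reduced to a skeleton; struct example_replace stands in for the per-family replace header:

#include <linux/sockptr.h>
#include <linux/errno.h>

struct example_replace {
	unsigned int size;	/* stand-in for the real replace header */
};

static int example_do_replace(sockptr_t arg, unsigned int len)
{
	struct example_replace tmp;

	if (len < sizeof(tmp))		/* the check the patches add */
		return -EINVAL;
	if (copy_from_sockptr(&tmp, arg, sizeof(tmp)) != 0)
		return -EFAULT;

	/* ... validate tmp.size against the remaining len, then proceed ... */
	return 0;
}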
+1 -1
net/core/dev.c
··· 429 429 * PP consumers must pay attention to run APIs in the appropriate context 430 430 * (e.g. NAPI context). 431 431 */ 432 - static DEFINE_PER_CPU_ALIGNED(struct page_pool *, system_page_pool); 432 + static DEFINE_PER_CPU(struct page_pool *, system_page_pool); 433 433 434 434 #ifdef CONFIG_LOCKDEP 435 435 /*
+2 -1
net/core/gro.c
··· 192 192 } 193 193 194 194 merge: 195 - /* sk owenrship - if any - completely transferred to the aggregated packet */ 195 + /* sk ownership - if any - completely transferred to the aggregated packet */ 196 196 skb->destructor = NULL; 197 + skb->sk = NULL; 197 198 delta_truesize = skb->truesize; 198 199 if (offset > headlen) { 199 200 unsigned int eat = offset - headlen;
+6
net/core/sock_map.c
··· 411 411 struct sock *sk; 412 412 int err = 0; 413 413 414 + if (irqs_disabled()) 415 + return -EOPNOTSUPP; /* locks here are hardirq-unsafe */ 416 + 414 417 spin_lock_bh(&stab->lock); 415 418 sk = *psk; 416 419 if (!sk_test || sk_test == sk) ··· 935 932 struct bpf_shtab_bucket *bucket; 936 933 struct bpf_shtab_elem *elem; 937 934 int ret = -ENOENT; 935 + 936 + if (irqs_disabled()) 937 + return -EOPNOTSUPP; /* locks here are hardirq-unsafe */ 938 938 939 939 hash = sock_hash_bucket_hash(key, key_size); 940 940 bucket = sock_hash_select_bucket(htab, hash);
+6 -7
net/hsr/hsr_device.c
··· 132 132 { 133 133 struct hsr_priv *hsr; 134 134 struct hsr_port *port; 135 - char designation; 135 + const char *designation = NULL; 136 136 137 137 hsr = netdev_priv(dev); 138 - designation = '\0'; 139 138 140 139 hsr_for_each_port(hsr, port) { 141 140 if (port->type == HSR_PT_MASTER) 142 141 continue; 143 142 switch (port->type) { 144 143 case HSR_PT_SLAVE_A: 145 - designation = 'A'; 144 + designation = "Slave A"; 146 145 break; 147 146 case HSR_PT_SLAVE_B: 148 - designation = 'B'; 147 + designation = "Slave B"; 149 148 break; 150 149 default: 151 - designation = '?'; 150 + designation = "Unknown"; 152 151 } 153 152 if (!is_slave_up(port->dev)) 154 - netdev_warn(dev, "Slave %c (%s) is not up; please bring it up to get a fully working HSR network\n", 153 + netdev_warn(dev, "%s (%s) is not up; please bring it up to get a fully working HSR network\n", 155 154 designation, port->dev->name); 156 155 } 157 156 158 - if (designation == '\0') 157 + if (!designation) 159 158 netdev_warn(dev, "No slave devices configured\n"); 160 159 161 160 return 0;
+20 -10
net/ipv4/inet_connection_sock.c
··· 203 203 kuid_t sk_uid, bool relax, 204 204 bool reuseport_cb_ok, bool reuseport_ok) 205 205 { 206 - if (sk->sk_family == AF_INET && ipv6_only_sock(sk2)) 207 - return false; 206 + if (ipv6_only_sock(sk2)) { 207 + if (sk->sk_family == AF_INET) 208 + return false; 209 + 210 + #if IS_ENABLED(CONFIG_IPV6) 211 + if (ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)) 212 + return false; 213 + #endif 214 + } 208 215 209 216 return inet_bind_conflict(sk, sk2, sk_uid, relax, 210 217 reuseport_cb_ok, reuseport_ok); ··· 294 287 struct sock_reuseport *reuseport_cb; 295 288 struct inet_bind_hashbucket *head2; 296 289 struct inet_bind2_bucket *tb2; 290 + bool conflict = false; 297 291 bool reuseport_cb_ok; 298 292 299 293 rcu_read_lock(); ··· 307 299 308 300 spin_lock(&head2->lock); 309 301 310 - inet_bind_bucket_for_each(tb2, &head2->chain) 311 - if (inet_bind2_bucket_match_addr_any(tb2, net, port, l3mdev, sk)) 312 - break; 302 + inet_bind_bucket_for_each(tb2, &head2->chain) { 303 + if (!inet_bind2_bucket_match_addr_any(tb2, net, port, l3mdev, sk)) 304 + continue; 313 305 314 - if (tb2 && inet_bhash2_conflict(sk, tb2, uid, relax, reuseport_cb_ok, 315 - reuseport_ok)) { 316 - spin_unlock(&head2->lock); 317 - return true; 306 + if (!inet_bhash2_conflict(sk, tb2, uid, relax, reuseport_cb_ok, reuseport_ok)) 307 + continue; 308 + 309 + conflict = true; 310 + break; 318 311 } 319 312 320 313 spin_unlock(&head2->lock); 321 - return false; 314 + 315 + return conflict; 322 316 } 323 317 324 318 /*
+5
net/ipv4/ip_gre.c
··· 280 280 tpi->flags | TUNNEL_NO_KEY, 281 281 iph->saddr, iph->daddr, 0); 282 282 } else { 283 + if (unlikely(!pskb_may_pull(skb, 284 + gre_hdr_len + sizeof(*ershdr)))) 285 + return PACKET_REJECT; 286 + 283 287 ershdr = (struct erspan_base_hdr *)(skb->data + gre_hdr_len); 284 288 ver = ershdr->ver; 289 + iph = ip_hdr(skb); 285 290 tunnel = ip_tunnel_lookup(itn, skb->dev->ifindex, 286 291 tpi->flags | TUNNEL_KEY, 287 292 iph->saddr, iph->daddr, tpi->key);
+4
net/ipv4/netfilter/arp_tables.c
··· 956 956 void *loc_cpu_entry; 957 957 struct arpt_entry *iter; 958 958 959 + if (len < sizeof(tmp)) 960 + return -EINVAL; 959 961 if (copy_from_sockptr(&tmp, arg, sizeof(tmp)) != 0) 960 962 return -EFAULT; 961 963 ··· 1256 1254 void *loc_cpu_entry; 1257 1255 struct arpt_entry *iter; 1258 1256 1257 + if (len < sizeof(tmp)) 1258 + return -EINVAL; 1259 1259 if (copy_from_sockptr(&tmp, arg, sizeof(tmp)) != 0) 1260 1260 return -EFAULT; 1261 1261
+4
net/ipv4/netfilter/ip_tables.c
··· 1108 1108 void *loc_cpu_entry; 1109 1109 struct ipt_entry *iter; 1110 1110 1111 + if (len < sizeof(tmp)) 1112 + return -EINVAL; 1111 1113 if (copy_from_sockptr(&tmp, arg, sizeof(tmp)) != 0) 1112 1114 return -EFAULT; 1113 1115 ··· 1494 1492 void *loc_cpu_entry; 1495 1493 struct ipt_entry *iter; 1496 1494 1495 + if (len < sizeof(tmp)) 1496 + return -EINVAL; 1497 1497 if (copy_from_sockptr(&tmp, arg, sizeof(tmp)) != 0) 1498 1498 return -EFAULT; 1499 1499
+7
net/ipv4/udp.c
··· 582 582 } 583 583 584 584 DEFINE_STATIC_KEY_FALSE(udp_encap_needed_key); 585 + EXPORT_SYMBOL(udp_encap_needed_key); 586 + 587 + #if IS_ENABLED(CONFIG_IPV6) 588 + DEFINE_STATIC_KEY_FALSE(udpv6_encap_needed_key); 589 + EXPORT_SYMBOL(udpv6_encap_needed_key); 590 + #endif 591 + 585 592 void udp_encap_enable(void) 586 593 { 587 594 static_branch_inc(&udp_encap_needed_key);
+13 -10
net/ipv4/udp_offload.c
··· 449 449 NAPI_GRO_CB(p)->count++; 450 450 p->data_len += skb->len; 451 451 452 - /* sk owenrship - if any - completely transferred to the aggregated packet */ 452 + /* sk ownership - if any - completely transferred to the aggregated packet */ 453 453 skb->destructor = NULL; 454 + skb->sk = NULL; 454 455 p->truesize += skb->truesize; 455 456 p->len += skb->len; 456 457 ··· 552 551 unsigned int off = skb_gro_offset(skb); 553 552 int flush = 1; 554 553 555 - /* we can do L4 aggregation only if the packet can't land in a tunnel 556 - * otherwise we could corrupt the inner stream 554 + /* We can do L4 aggregation only if the packet can't land in a tunnel 555 + * otherwise we could corrupt the inner stream. Detecting such packets 556 + * cannot be foolproof and the aggregation might still happen in some 557 + * cases. Such packets should be caught in udp_unexpected_gso later. 557 558 */ 558 559 NAPI_GRO_CB(skb)->is_flist = 0; 559 560 if (!sk || !udp_sk(sk)->gro_receive) { 561 + /* If the packet was locally encapsulated in a UDP tunnel that 562 + * wasn't detected above, do not GRO. 563 + */ 564 + if (skb->encapsulation) 565 + goto out; 566 + 560 567 if (skb->dev->features & NETIF_F_GRO_FRAGLIST) 561 568 NAPI_GRO_CB(skb)->is_flist = sk ? !udp_test_bit(GRO_ENABLED, sk) : 1; 562 569 ··· 728 719 skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4); 729 720 skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count; 730 721 731 - if (skb->ip_summed == CHECKSUM_UNNECESSARY) { 732 - if (skb->csum_level < SKB_MAX_CSUM_LEVEL) 733 - skb->csum_level++; 734 - } else { 735 - skb->ip_summed = CHECKSUM_UNNECESSARY; 736 - skb->csum_level = 0; 737 - } 722 + __skb_incr_checksum_unnecessary(skb); 738 723 739 724 return 0; 740 725 }
+7 -7
net/ipv6/ip6_fib.c
··· 651 651 if (!w) { 652 652 /* New dump: 653 653 * 654 - * 1. hook callback destructor. 655 - */ 656 - cb->args[3] = (long)cb->done; 657 - cb->done = fib6_dump_done; 658 - 659 - /* 660 - * 2. allocate and initialize walker. 654 + * 1. allocate and initialize walker. 661 655 */ 662 656 w = kzalloc(sizeof(*w), GFP_ATOMIC); 663 657 if (!w) 664 658 return -ENOMEM; 665 659 w->func = fib6_dump_node; 666 660 cb->args[2] = (long)w; 661 + 662 + /* 2. hook callback destructor. 663 + */ 664 + cb->args[3] = (long)cb->done; 665 + cb->done = fib6_dump_done; 666 + 667 667 } 668 668 669 669 arg.skb = skb;
+3
net/ipv6/ip6_gre.c
··· 528 528 struct ip6_tnl *tunnel; 529 529 u8 ver; 530 530 531 + if (unlikely(!pskb_may_pull(skb, sizeof(*ershdr)))) 532 + return PACKET_REJECT; 533 + 531 534 ipv6h = ipv6_hdr(skb); 532 535 ershdr = (struct erspan_base_hdr *)skb->data; 533 536 ver = ershdr->ver;
+4
net/ipv6/netfilter/ip6_tables.c
··· 1125 1125 void *loc_cpu_entry; 1126 1126 struct ip6t_entry *iter; 1127 1127 1128 + if (len < sizeof(tmp)) 1129 + return -EINVAL; 1128 1130 if (copy_from_sockptr(&tmp, arg, sizeof(tmp)) != 0) 1129 1131 return -EFAULT; 1130 1132 ··· 1503 1501 void *loc_cpu_entry; 1504 1502 struct ip6t_entry *iter; 1505 1503 1504 + if (len < sizeof(tmp)) 1505 + return -EINVAL; 1506 1506 if (copy_from_sockptr(&tmp, arg, sizeof(tmp)) != 0) 1507 1507 return -EFAULT; 1508 1508
+1 -1
net/ipv6/udp.c
··· 447 447 goto try_again; 448 448 } 449 449 450 - DEFINE_STATIC_KEY_FALSE(udpv6_encap_needed_key); 450 + DECLARE_STATIC_KEY_FALSE(udpv6_encap_needed_key); 451 451 void udpv6_encap_enable(void) 452 452 { 453 453 static_branch_inc(&udpv6_encap_needed_key);
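Together with the net/ipv4/udp.c hunk near the top of this diff, the net/ipv6/udp.c change above leaves exactly one definition of udpv6_encap_needed_key and turns the IPv6 reference into a declaration. A compact sketch of that define-versus-declare split, using plain variables as stand-ins for the static key macros:

    #include <stdio.h>

    /* The "owning" file (kernel: DEFINE_STATIC_KEY_FALSE + EXPORT_SYMBOL)
     * provides the one and only definition.
     */
    int encap_needed;

    /* Every other file only declares it (kernel: DECLARE_STATIC_KEY_FALSE);
     * a second definition would clash once both objects land in one image.
     */
    extern int encap_needed;

    static void encap_enable(void)
    {
        encap_needed++;        /* kernel: static_branch_inc() */
    }

    int main(void)
    {
        encap_enable();
        printf("encap_needed=%d\n", encap_needed);
        return 0;
    }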
+1 -7
net/ipv6/udp_offload.c
··· 174 174 skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4); 175 175 skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count; 176 176 177 - if (skb->ip_summed == CHECKSUM_UNNECESSARY) { 178 - if (skb->csum_level < SKB_MAX_CSUM_LEVEL) 179 - skb->csum_level++; 180 - } else { 181 - skb->ip_summed = CHECKSUM_UNNECESSARY; 182 - skb->csum_level = 0; 183 - } 177 + __skb_incr_checksum_unnecessary(skb); 184 178 185 179 return 0; 186 180 }
-2
net/mptcp/protocol.c
··· 3937 3937 mptcp_set_state(newsk, TCP_CLOSE); 3938 3938 } 3939 3939 } else { 3940 - MPTCP_INC_STATS(sock_net(ssk), 3941 - MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK); 3942 3940 tcpfallback: 3943 3941 newsk->sk_kern_sock = kern; 3944 3942 lock_sock(newsk);
+4
net/mptcp/sockopt.c
··· 1493 1493 struct mptcp_subflow_context *subflow; 1494 1494 int space, cap; 1495 1495 1496 + /* bpf can land here with a wrong sk type */ 1497 + if (sk->sk_protocol == IPPROTO_TCP) 1498 + return -EINVAL; 1499 + 1496 1500 if (sk->sk_userlocks & SOCK_RCVBUF_LOCK) 1497 1501 cap = sk->sk_rcvbuf >> 1; 1498 1502 else
+2
net/mptcp/subflow.c
··· 905 905 return child; 906 906 907 907 fallback: 908 + if (fallback) 909 + SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK); 908 910 mptcp_subflow_drop_ctx(child); 909 911 return child; 910 912 }
+34 -16
net/netfilter/nf_tables_api.c
··· 1209 1209 return true; 1210 1210 1211 1211 list_for_each_entry(trans, &nft_net->commit_list, list) { 1212 - if ((trans->msg_type == NFT_MSG_NEWCHAIN || 1213 - trans->msg_type == NFT_MSG_DELCHAIN) && 1214 - trans->ctx.table == ctx->table && 1215 - nft_trans_chain_update(trans)) 1212 + if (trans->ctx.table == ctx->table && 1213 + ((trans->msg_type == NFT_MSG_NEWCHAIN && 1214 + nft_trans_chain_update(trans)) || 1215 + (trans->msg_type == NFT_MSG_DELCHAIN && 1216 + nft_is_base_chain(trans->ctx.chain)))) 1216 1217 return true; 1217 1218 } 1218 1219 ··· 2449 2448 if (nla[NFTA_CHAIN_HOOK]) { 2450 2449 struct nft_stats __percpu *stats = NULL; 2451 2450 struct nft_chain_hook hook = {}; 2451 + 2452 + if (table->flags & __NFT_TABLE_F_UPDATE) 2453 + return -EINVAL; 2452 2454 2453 2455 if (flags & NFT_CHAIN_BINDING) 2454 2456 return -EOPNOTSUPP; ··· 8297 8293 return err; 8298 8294 } 8299 8295 8296 + /* call under rcu_read_lock */ 8300 8297 static const struct nf_flowtable_type *__nft_flowtable_type_get(u8 family) 8301 8298 { 8302 8299 const struct nf_flowtable_type *type; 8303 8300 8304 - list_for_each_entry(type, &nf_tables_flowtables, list) { 8301 + list_for_each_entry_rcu(type, &nf_tables_flowtables, list) { 8305 8302 if (family == type->family) 8306 8303 return type; 8307 8304 } ··· 8314 8309 { 8315 8310 const struct nf_flowtable_type *type; 8316 8311 8312 + rcu_read_lock(); 8317 8313 type = __nft_flowtable_type_get(family); 8318 - if (type != NULL && try_module_get(type->owner)) 8314 + if (type != NULL && try_module_get(type->owner)) { 8315 + rcu_read_unlock(); 8319 8316 return type; 8317 + } 8318 + rcu_read_unlock(); 8320 8319 8321 8320 lockdep_nfnl_nft_mutex_not_held(); 8322 8321 #ifdef CONFIG_MODULES ··· 10464 10455 struct nft_trans *trans, *next; 10465 10456 LIST_HEAD(set_update_list); 10466 10457 struct nft_trans_elem *te; 10458 + int err = 0; 10467 10459 10468 10460 if (action == NFNL_ABORT_VALIDATE && 10469 10461 nf_tables_validate(net) < 0) 10470 - return -EAGAIN; 10462 + err = -EAGAIN; 10471 10463 10472 10464 list_for_each_entry_safe_reverse(trans, next, &nft_net->commit_list, 10473 10465 list) { ··· 10660 10650 nf_tables_abort_release(trans); 10661 10651 } 10662 10652 10663 - if (action == NFNL_ABORT_AUTOLOAD) 10664 - nf_tables_module_autoload(net); 10665 - else 10666 - nf_tables_module_autoload_cleanup(net); 10667 - 10668 - return 0; 10653 + return err; 10669 10654 } 10670 10655 10671 10656 static int nf_tables_abort(struct net *net, struct sk_buff *skb, ··· 10673 10668 gc_seq = nft_gc_seq_begin(nft_net); 10674 10669 ret = __nf_tables_abort(net, action); 10675 10670 nft_gc_seq_end(nft_net, gc_seq); 10671 + 10672 + WARN_ON_ONCE(!list_empty(&nft_net->commit_list)); 10673 + 10674 + /* module autoload needs to happen after GC sequence update because it 10675 + * temporarily releases and grabs mutex again. 
10676 + */ 10677 + if (action == NFNL_ABORT_AUTOLOAD) 10678 + nf_tables_module_autoload(net); 10679 + else 10680 + nf_tables_module_autoload_cleanup(net); 10681 + 10676 10682 mutex_unlock(&nft_net->commit_mutex); 10677 10683 10678 10684 return ret; ··· 11489 11473 11490 11474 gc_seq = nft_gc_seq_begin(nft_net); 11491 11475 11492 - if (!list_empty(&nft_net->commit_list) || 11493 - !list_empty(&nft_net->module_list)) 11494 - __nf_tables_abort(net, NFNL_ABORT_NONE); 11476 + WARN_ON_ONCE(!list_empty(&nft_net->commit_list)); 11477 + 11478 + if (!list_empty(&nft_net->module_list)) 11479 + nf_tables_module_autoload_cleanup(net); 11495 11480 11496 11481 __nft_release_tables(net); 11497 11482 ··· 11584 11567 unregister_netdevice_notifier(&nf_tables_flowtable_notifier); 11585 11568 nft_chain_filter_fini(); 11586 11569 nft_chain_route_fini(); 11570 + nf_tables_trans_destroy_flush_work(); 11587 11571 unregister_pernet_subsys(&nf_tables_net_ops); 11588 11572 cancel_work_sync(&trans_gc_work); 11589 11573 cancel_work_sync(&trans_destroy_work);
+1 -1
net/rds/rdma.c
··· 302 302 } 303 303 ret = PTR_ERR(trans_private); 304 304 /* Trigger connection so that its ready for the next retry */ 305 - if (ret == -ENODEV) 305 + if (ret == -ENODEV && cp) 306 306 rds_conn_connect_if_down(cp->cp_conn); 307 307 goto out; 308 308 }
+5 -5
net/sched/act_skbmod.c
··· 241 241 struct tcf_skbmod *d = to_skbmod(a); 242 242 unsigned char *b = skb_tail_pointer(skb); 243 243 struct tcf_skbmod_params *p; 244 - struct tc_skbmod opt = { 245 - .index = d->tcf_index, 246 - .refcnt = refcount_read(&d->tcf_refcnt) - ref, 247 - .bindcnt = atomic_read(&d->tcf_bindcnt) - bind, 248 - }; 244 + struct tc_skbmod opt; 249 245 struct tcf_t t; 250 246 247 + memset(&opt, 0, sizeof(opt)); 248 + opt.index = d->tcf_index; 249 + opt.refcnt = refcount_read(&d->tcf_refcnt) - ref, 250 + opt.bindcnt = atomic_read(&d->tcf_bindcnt) - bind; 251 251 spin_lock_bh(&d->tcf_lock); 252 252 opt.action = d->tcf_action; 253 253 p = rcu_dereference_protected(d->skbmod_p,
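The act_skbmod.c hunk above switches from a designated initializer to memset() before filling the structure that is later copied out to userspace. A standalone sketch of why clearing the whole object (padding bytes included) matters — struct demo_opt is only an illustrative stand-in for struct tc_skbmod:

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    struct demo_opt {
        uint8_t  index;     /* padding bytes typically follow this member */
        uint32_t refcnt;
        uint32_t bindcnt;
    };

    static void fill_opt(struct demo_opt *opt, uint32_t refcnt, uint32_t bindcnt)
    {
        memset(opt, 0, sizeof(*opt));   /* clears members *and* padding */
        opt->index = 1;
        opt->refcnt = refcnt;
        opt->bindcnt = bindcnt;
    }

    int main(void)
    {
        struct demo_opt opt;

        fill_opt(&opt, 2, 0);
        /* the whole object, padding included, is now well defined */
        printf("%zu bytes ready to copy out\n", sizeof(opt));
        return 0;
    }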
+1 -1
net/sched/sch_api.c
··· 809 809 notify = !sch->q.qlen && !WARN_ON_ONCE(!n && 810 810 !qdisc_is_offloaded); 811 811 /* TODO: perform the search on a per txq basis */ 812 - sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid)); 812 + sch = qdisc_lookup_rcu(qdisc_dev(sch), TC_H_MAJ(parentid)); 813 813 if (sch == NULL) { 814 814 WARN_ON_ONCE(parentid != TC_H_ROOT); 815 815 break;
+1 -9
net/sunrpc/svcsock.c
··· 1206 1206 * MSG_SPLICE_PAGES is used exclusively to reduce the number of 1207 1207 * copy operations in this path. Therefore the caller must ensure 1208 1208 * that the pages backing @xdr are unchanging. 1209 - * 1210 - * Note that the send is non-blocking. The caller has incremented 1211 - * the reference count on each page backing the RPC message, and 1212 - * the network layer will "put" these pages when transmission is 1213 - * complete. 1214 - * 1215 - * This is safe for our RPC services because the memory backing 1216 - * the head and tail components is never kmalloc'd. These always 1217 - * come from pages in the svc_rqst::rq_pages array. 1218 1209 */ 1219 1210 static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp, 1220 1211 rpc_fraghdr marker, unsigned int *sentp) ··· 1235 1244 iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec, 1236 1245 1 + count, sizeof(marker) + rqstp->rq_res.len); 1237 1246 ret = sock_sendmsg(svsk->sk_sock, &msg); 1247 + page_frag_free(buf); 1238 1248 if (ret < 0) 1239 1249 return ret; 1240 1250 *sentp += ret;
+2 -1
net/vmw_vsock/virtio_transport.c
··· 120 120 if (!skb) 121 121 break; 122 122 123 - virtio_transport_deliver_tap_pkt(skb); 124 123 reply = virtio_vsock_skb_reply(skb); 125 124 sgs = vsock->out_sgs; 126 125 sg_init_one(sgs[out_sg], virtio_vsock_hdr(skb), ··· 168 169 virtio_vsock_skb_queue_head(&vsock->send_pkt_queue, skb); 169 170 break; 170 171 } 172 + 173 + virtio_transport_deliver_tap_pkt(skb); 171 174 172 175 if (reply) { 173 176 struct virtqueue *rx_vq = vsock->vqs[VSOCK_VQ_RX];
+1 -1
scripts/kernel-doc
··· 1541 1541 save_struct_actual($2); 1542 1542 1543 1543 push_parameter($2, "$type $1", $arg, $file, $declaration_name); 1544 - } elsif ($param =~ m/(.*?):(\d+)/) { 1544 + } elsif ($param =~ m/(.*?):(\w+)/) { 1545 1545 if ($type ne "") { # skip unnamed bit-fields 1546 1546 save_struct_actual($1); 1547 1547 push_parameter($1, "$type:$2", $arg, $file, $declaration_name)
+2 -2
security/security.c
··· 1793 1793 EXPORT_SYMBOL(security_path_mknod); 1794 1794 1795 1795 /** 1796 - * security_path_post_mknod() - Update inode security field after file creation 1796 + * security_path_post_mknod() - Update inode security after reg file creation 1797 1797 * @idmap: idmap of the mount 1798 1798 * @dentry: new file 1799 1799 * 1800 - * Update inode security field after a file has been created. 1800 + * Update inode security field after a regular file has been created. 1801 1801 */ 1802 1802 void security_path_post_mknod(struct mnt_idmap *idmap, struct dentry *dentry) 1803 1803 {
+7 -5
security/selinux/selinuxfs.c
··· 2123 2123 .kill_sb = sel_kill_sb, 2124 2124 }; 2125 2125 2126 - static struct vfsmount *selinuxfs_mount __ro_after_init; 2127 2126 struct path selinux_null __ro_after_init; 2128 2127 2129 2128 static int __init init_sel_fs(void) ··· 2144 2145 return err; 2145 2146 } 2146 2147 2147 - selinux_null.mnt = selinuxfs_mount = kern_mount(&sel_fs_type); 2148 - if (IS_ERR(selinuxfs_mount)) { 2148 + selinux_null.mnt = kern_mount(&sel_fs_type); 2149 + if (IS_ERR(selinux_null.mnt)) { 2149 2150 pr_err("selinuxfs: could not mount!\n"); 2150 - err = PTR_ERR(selinuxfs_mount); 2151 - selinuxfs_mount = NULL; 2151 + err = PTR_ERR(selinux_null.mnt); 2152 + selinux_null.mnt = NULL; 2153 + return err; 2152 2154 } 2155 + 2153 2156 selinux_null.dentry = d_hash_and_lookup(selinux_null.mnt->mnt_root, 2154 2157 &null_name); 2155 2158 if (IS_ERR(selinux_null.dentry)) { 2156 2159 pr_err("selinuxfs: could not lookup null!\n"); 2157 2160 err = PTR_ERR(selinux_null.dentry); 2158 2161 selinux_null.dentry = NULL; 2162 + return err; 2159 2163 } 2160 2164 2161 2165 return err;
+7 -1
sound/oss/dmasound/dmasound_paula.c
··· 725 725 dmasound_deinit(); 726 726 } 727 727 728 - static struct platform_driver amiga_audio_driver = { 728 + /* 729 + * amiga_audio_remove() lives in .exit.text. For drivers registered via 730 + * module_platform_driver_probe() this is ok because they cannot get unbound at 731 + * runtime. So mark the driver struct with __refdata to prevent modpost 732 + * triggering a section mismatch warning. 733 + */ 734 + static struct platform_driver amiga_audio_driver __refdata = { 729 735 .remove_new = __exit_p(amiga_audio_remove), 730 736 .driver = { 731 737 .name = "amiga-audio",
+2 -5
sound/pci/emu10k1/emu10k1_callback.c
··· 255 255 /* check if sample is finished playing (non-looping only) */ 256 256 if (bp != best + V_OFF && bp != best + V_FREE && 257 257 (vp->reg.sample_mode & SNDRV_SFNT_SAMPLE_SINGLESHOT)) { 258 - val = snd_emu10k1_ptr_read(hw, CCCA_CURRADDR, vp->ch) - 64; 258 + val = snd_emu10k1_ptr_read(hw, CCCA_CURRADDR, vp->ch); 259 259 if (val >= vp->reg.loopstart) 260 260 bp = best + V_OFF; 261 261 } ··· 362 362 363 363 map = (hw->silent_page.addr << hw->address_mode) | (hw->address_mode ? MAP_PTI_MASK1 : MAP_PTI_MASK0); 364 364 365 - addr = vp->reg.start + 64; 365 + addr = vp->reg.start; 366 366 temp = vp->reg.parm.filterQ; 367 367 ccca = (temp << 28) | addr; 368 368 if (vp->apitch < 0xe400) ··· 429 429 430 430 /* Q & current address (Q 4bit value, MSB) */ 431 431 CCCA, ccca, 432 - 433 - /* cache */ 434 - CCR, REG_VAL_PUT(CCR_CACHEINVALIDSIZE, 64), 435 432 436 433 /* reset volume */ 437 434 VTFT, vtarget | vp->ftarget,
+6
sound/pci/hda/cs35l41_hda_property.c
··· 108 108 { "10431F12", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 }, 109 109 { "10431F1F", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, -1, 0, 0, 0, 0 }, 110 110 { "10431F62", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 }, 111 + { "10433A60", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 }, 111 112 { "17AA386F", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, -1, -1, 0, 0, 0 }, 113 + { "17AA3877", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 }, 114 + { "17AA3878", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 }, 112 115 { "17AA38A9", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 2, -1, 0, 0, 0 }, 113 116 { "17AA38AB", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 2, -1, 0, 0, 0 }, 114 117 { "17AA38B4", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 }, ··· 499 496 { "CSC3551", "10431F12", generic_dsd_config }, 500 497 { "CSC3551", "10431F1F", generic_dsd_config }, 501 498 { "CSC3551", "10431F62", generic_dsd_config }, 499 + { "CSC3551", "10433A60", generic_dsd_config }, 502 500 { "CSC3551", "17AA386F", generic_dsd_config }, 501 + { "CSC3551", "17AA3877", generic_dsd_config }, 502 + { "CSC3551", "17AA3878", generic_dsd_config }, 503 503 { "CSC3551", "17AA38A9", generic_dsd_config }, 504 504 { "CSC3551", "17AA38AB", generic_dsd_config }, 505 505 { "CSC3551", "17AA38B4", generic_dsd_config },
+4
sound/pci/hda/cs35l56_hda.c
··· 644 644 ret = cs35l56_wait_for_firmware_boot(&cs35l56->base); 645 645 if (ret) 646 646 goto err_powered_up; 647 + 648 + regcache_cache_only(cs35l56->base.regmap, false); 647 649 } 648 650 649 651 /* Disable auto-hibernate so that runtime_pm has control */ ··· 1003 1001 ret = cs35l56_wait_for_firmware_boot(&cs35l56->base); 1004 1002 if (ret) 1005 1003 goto err; 1004 + 1005 + regcache_cache_only(cs35l56->base.regmap, false); 1006 1006 1007 1007 ret = cs35l56_set_patch(&cs35l56->base); 1008 1008 if (ret)
+11 -2
sound/pci/hda/cs35l56_hda_i2c.c
··· 56 56 {} 57 57 }; 58 58 59 + static const struct acpi_device_id cs35l56_acpi_hda_match[] = { 60 + { "CSC3554", 0 }, 61 + { "CSC3556", 0 }, 62 + { "CSC3557", 0 }, 63 + {} 64 + }; 65 + MODULE_DEVICE_TABLE(acpi, cs35l56_acpi_hda_match); 66 + 59 67 static struct i2c_driver cs35l56_hda_i2c_driver = { 60 68 .driver = { 61 - .name = "cs35l56-hda", 62 - .pm = &cs35l56_hda_pm_ops, 69 + .name = "cs35l56-hda", 70 + .acpi_match_table = cs35l56_acpi_hda_match, 71 + .pm = &cs35l56_hda_pm_ops, 63 72 }, 64 73 .id_table = cs35l56_hda_i2c_id, 65 74 .probe = cs35l56_hda_i2c_probe,
+11 -2
sound/pci/hda/cs35l56_hda_spi.c
··· 56 56 {} 57 57 }; 58 58 59 + static const struct acpi_device_id cs35l56_acpi_hda_match[] = { 60 + { "CSC3554", 0 }, 61 + { "CSC3556", 0 }, 62 + { "CSC3557", 0 }, 63 + {} 64 + }; 65 + MODULE_DEVICE_TABLE(acpi, cs35l56_acpi_hda_match); 66 + 59 67 static struct spi_driver cs35l56_hda_spi_driver = { 60 68 .driver = { 61 - .name = "cs35l56-hda", 62 - .pm = &cs35l56_hda_pm_ops, 69 + .name = "cs35l56-hda", 70 + .acpi_match_table = cs35l56_acpi_hda_match, 71 + .pm = &cs35l56_hda_pm_ops, 63 72 }, 64 73 .id_table = cs35l56_hda_spi_id, 65 74 .probe = cs35l56_hda_spi_probe,
+57 -3
sound/pci/hda/patch_realtek.c
··· 6875 6875 comp_generic_fixup(cdc, action, "i2c", "CLSA0101", "-%s:00-cs35l41-hda.%d", 2); 6876 6876 } 6877 6877 6878 + static void cs35l56_fixup_i2c_two(struct hda_codec *cdc, const struct hda_fixup *fix, int action) 6879 + { 6880 + comp_generic_fixup(cdc, action, "i2c", "CSC3556", "-%s:00-cs35l56-hda.%d", 2); 6881 + } 6882 + 6883 + static void cs35l56_fixup_i2c_four(struct hda_codec *cdc, const struct hda_fixup *fix, int action) 6884 + { 6885 + comp_generic_fixup(cdc, action, "i2c", "CSC3556", "-%s:00-cs35l56-hda.%d", 4); 6886 + } 6887 + 6888 + static void cs35l56_fixup_spi_two(struct hda_codec *cdc, const struct hda_fixup *fix, int action) 6889 + { 6890 + comp_generic_fixup(cdc, action, "spi", "CSC3556", "-%s:00-cs35l56-hda.%d", 2); 6891 + } 6892 + 6878 6893 static void cs35l56_fixup_spi_four(struct hda_codec *cdc, const struct hda_fixup *fix, int action) 6879 6894 { 6880 6895 comp_generic_fixup(cdc, action, "spi", "CSC3556", "-%s:00-cs35l56-hda.%d", 4); 6896 + } 6897 + 6898 + static void alc285_fixup_asus_ga403u(struct hda_codec *cdc, const struct hda_fixup *fix, int action) 6899 + { 6900 + /* 6901 + * The same SSID has been re-used in different hardware, they have 6902 + * different codecs and the newer GA403U has a ALC285. 6903 + */ 6904 + if (cdc->core.vendor_id == 0x10ec0285) 6905 + cs35l56_fixup_i2c_two(cdc, fix, action); 6906 + else 6907 + alc_fixup_inv_dmic(cdc, fix, action); 6881 6908 } 6882 6909 6883 6910 static void tas2781_fixup_i2c(struct hda_codec *cdc, ··· 7463 7436 ALC256_FIXUP_ACER_SFG16_MICMUTE_LED, 7464 7437 ALC256_FIXUP_HEADPHONE_AMP_VOL, 7465 7438 ALC245_FIXUP_HP_SPECTRE_X360_EU0XXX, 7439 + ALC285_FIXUP_CS35L56_SPI_2, 7440 + ALC285_FIXUP_CS35L56_I2C_2, 7441 + ALC285_FIXUP_CS35L56_I2C_4, 7442 + ALC285_FIXUP_ASUS_GA403U, 7466 7443 }; 7467 7444 7468 7445 /* A special fixup for Lenovo C940 and Yoga Duet 7; ··· 9674 9643 .type = HDA_FIXUP_FUNC, 9675 9644 .v.func = alc245_fixup_hp_spectre_x360_eu0xxx, 9676 9645 }, 9646 + [ALC285_FIXUP_CS35L56_SPI_2] = { 9647 + .type = HDA_FIXUP_FUNC, 9648 + .v.func = cs35l56_fixup_spi_two, 9649 + }, 9650 + [ALC285_FIXUP_CS35L56_I2C_2] = { 9651 + .type = HDA_FIXUP_FUNC, 9652 + .v.func = cs35l56_fixup_i2c_two, 9653 + }, 9654 + [ALC285_FIXUP_CS35L56_I2C_4] = { 9655 + .type = HDA_FIXUP_FUNC, 9656 + .v.func = cs35l56_fixup_i2c_four, 9657 + }, 9658 + [ALC285_FIXUP_ASUS_GA403U] = { 9659 + .type = HDA_FIXUP_FUNC, 9660 + .v.func = alc285_fixup_asus_ga403u, 9661 + }, 9677 9662 }; 9678 9663 9679 9664 static const struct snd_pci_quirk alc269_fixup_tbl[] = { ··· 10143 10096 SND_PCI_QUIRK(0x1043, 0x1a83, "ASUS UM5302LA", ALC294_FIXUP_CS35L41_I2C_2), 10144 10097 SND_PCI_QUIRK(0x1043, 0x1a8f, "ASUS UX582ZS", ALC245_FIXUP_CS35L41_SPI_2), 10145 10098 SND_PCI_QUIRK(0x1043, 0x1b11, "ASUS UX431DA", ALC294_FIXUP_ASUS_COEF_1B), 10146 - SND_PCI_QUIRK(0x1043, 0x1b13, "Asus U41SV", ALC269_FIXUP_INV_DMIC), 10099 + SND_PCI_QUIRK(0x1043, 0x1b13, "ASUS U41SV/GA403U", ALC285_FIXUP_ASUS_GA403U), 10147 10100 SND_PCI_QUIRK(0x1043, 0x1b93, "ASUS G614JVR/JIR", ALC245_FIXUP_CS35L41_SPI_2), 10148 10101 SND_PCI_QUIRK(0x1043, 0x1bbd, "ASUS Z550MA", ALC255_FIXUP_ASUS_MIC_NO_PRESENCE), 10149 10102 SND_PCI_QUIRK(0x1043, 0x1c03, "ASUS UM3406HA", ALC287_FIXUP_CS35L41_I2C_2), ··· 10151 10104 SND_PCI_QUIRK(0x1043, 0x1c33, "ASUS UX5304MA", ALC245_FIXUP_CS35L41_SPI_2), 10152 10105 SND_PCI_QUIRK(0x1043, 0x1c43, "ASUS UX8406MA", ALC245_FIXUP_CS35L41_SPI_2), 10153 10106 SND_PCI_QUIRK(0x1043, 0x1c62, "ASUS GU603", ALC289_FIXUP_ASUS_GA401), 10107 + SND_PCI_QUIRK(0x1043, 0x1c63, "ASUS 
GU605M", ALC285_FIXUP_CS35L56_SPI_2), 10154 10108 SND_PCI_QUIRK(0x1043, 0x1c92, "ASUS ROG Strix G15", ALC285_FIXUP_ASUS_G533Z_PINS), 10155 10109 SND_PCI_QUIRK(0x1043, 0x1c9f, "ASUS G614JU/JV/JI", ALC285_FIXUP_ASUS_HEADSET_MIC), 10156 10110 SND_PCI_QUIRK(0x1043, 0x1caf, "ASUS G634JY/JZ/JI/JG", ALC285_FIXUP_ASUS_SPI_REAR_SPEAKERS), ··· 10163 10115 SND_PCI_QUIRK(0x1043, 0x1d42, "ASUS Zephyrus G14 2022", ALC289_FIXUP_ASUS_GA401), 10164 10116 SND_PCI_QUIRK(0x1043, 0x1d4e, "ASUS TM420", ALC256_FIXUP_ASUS_HPE), 10165 10117 SND_PCI_QUIRK(0x1043, 0x1da2, "ASUS UP6502ZA/ZD", ALC245_FIXUP_CS35L41_SPI_2), 10118 + SND_PCI_QUIRK(0x1043, 0x1df3, "ASUS UM5606", ALC285_FIXUP_CS35L56_I2C_4), 10166 10119 SND_PCI_QUIRK(0x1043, 0x1e02, "ASUS UX3402ZA", ALC245_FIXUP_CS35L41_SPI_2), 10167 10120 SND_PCI_QUIRK(0x1043, 0x1e11, "ASUS Zephyrus G15", ALC289_FIXUP_ASUS_GA502), 10168 10121 SND_PCI_QUIRK(0x1043, 0x1e12, "ASUS UM3402", ALC287_FIXUP_CS35L41_I2C_2), 10169 10122 SND_PCI_QUIRK(0x1043, 0x1e51, "ASUS Zephyrus M15", ALC294_FIXUP_ASUS_GU502_PINS), 10170 10123 SND_PCI_QUIRK(0x1043, 0x1e5e, "ASUS ROG Strix G513", ALC294_FIXUP_ASUS_G513_PINS), 10124 + SND_PCI_QUIRK(0x1043, 0x1e63, "ASUS H7606W", ALC285_FIXUP_CS35L56_I2C_2), 10125 + SND_PCI_QUIRK(0x1043, 0x1e83, "ASUS GA605W", ALC285_FIXUP_CS35L56_I2C_2), 10171 10126 SND_PCI_QUIRK(0x1043, 0x1e8e, "ASUS Zephyrus G15", ALC289_FIXUP_ASUS_GA401), 10172 10127 SND_PCI_QUIRK(0x1043, 0x1ee2, "ASUS UM6702RA/RC", ALC287_FIXUP_CS35L41_I2C_2), 10173 10128 SND_PCI_QUIRK(0x1043, 0x1c52, "ASUS Zephyrus G15 2022", ALC289_FIXUP_ASUS_GA401), ··· 10184 10133 SND_PCI_QUIRK(0x1043, 0x3a30, "ASUS G814JVR/JIR", ALC245_FIXUP_CS35L41_SPI_2), 10185 10134 SND_PCI_QUIRK(0x1043, 0x3a40, "ASUS G814JZR", ALC245_FIXUP_CS35L41_SPI_2), 10186 10135 SND_PCI_QUIRK(0x1043, 0x3a50, "ASUS G834JYR/JZR", ALC245_FIXUP_CS35L41_SPI_2), 10187 - SND_PCI_QUIRK(0x1043, 0x3a60, "ASUS G634JYR/JZR", ALC245_FIXUP_CS35L41_SPI_2), 10136 + SND_PCI_QUIRK(0x1043, 0x3a60, "ASUS G634JYR/JZR", ALC285_FIXUP_ASUS_SPI_REAR_SPEAKERS), 10188 10137 SND_PCI_QUIRK(0x1043, 0x831a, "ASUS P901", ALC269_FIXUP_STEREO_DMIC), 10189 10138 SND_PCI_QUIRK(0x1043, 0x834a, "ASUS S101", ALC269_FIXUP_STEREO_DMIC), 10190 10139 SND_PCI_QUIRK(0x1043, 0x8398, "ASUS P1005", ALC269_FIXUP_STEREO_DMIC), ··· 10210 10159 SND_PCI_QUIRK(0x10ec, 0x1254, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK), 10211 10160 SND_PCI_QUIRK(0x10ec, 0x12cc, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK), 10212 10161 SND_PCI_QUIRK(0x10ec, 0x12f6, "Intel Reference board", ALC295_FIXUP_CHROME_BOOK), 10213 - SND_PCI_QUIRK(0x10f7, 0x8338, "Panasonic CF-SZ6", ALC269_FIXUP_HEADSET_MODE), 10162 + SND_PCI_QUIRK(0x10f7, 0x8338, "Panasonic CF-SZ6", ALC269_FIXUP_ASPIRE_HEADSET_MIC), 10214 10163 SND_PCI_QUIRK(0x144d, 0xc109, "Samsung Ativ book 9 (NP900X3G)", ALC269_FIXUP_INV_DMIC), 10215 10164 SND_PCI_QUIRK(0x144d, 0xc169, "Samsung Notebook 9 Pen (NP930SBE-K01US)", ALC298_FIXUP_SAMSUNG_AMP), 10216 10165 SND_PCI_QUIRK(0x144d, 0xc176, "Samsung Notebook 9 Pro (NP930MBE-K04US)", ALC298_FIXUP_SAMSUNG_AMP), ··· 10384 10333 SND_PCI_QUIRK(0x17aa, 0x3869, "Lenovo Yoga7 14IAL7", ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN), 10385 10334 SND_PCI_QUIRK(0x17aa, 0x386f, "Legion 7i 16IAX7", ALC287_FIXUP_CS35L41_I2C_2), 10386 10335 SND_PCI_QUIRK(0x17aa, 0x3870, "Lenovo Yoga 7 14ARB7", ALC287_FIXUP_YOGA7_14ARB7_I2C), 10336 + SND_PCI_QUIRK(0x17aa, 0x3877, "Lenovo Legion 7 Slim 16ARHA7", ALC287_FIXUP_CS35L41_I2C_2), 10337 + SND_PCI_QUIRK(0x17aa, 0x3878, "Lenovo Legion 7 Slim 16ARHA7", 
ALC287_FIXUP_CS35L41_I2C_2), 10387 10338 SND_PCI_QUIRK(0x17aa, 0x387d, "Yoga S780-16 pro Quad AAC", ALC287_FIXUP_TAS2781_I2C), 10388 10339 SND_PCI_QUIRK(0x17aa, 0x387e, "Yoga S780-16 pro Quad YC", ALC287_FIXUP_TAS2781_I2C), 10389 10340 SND_PCI_QUIRK(0x17aa, 0x3881, "YB9 dual power mode2 YC", ALC287_FIXUP_TAS2781_I2C), ··· 10456 10403 SND_PCI_QUIRK(0x1d05, 0x1147, "TongFang GMxTGxx", ALC269_FIXUP_NO_SHUTUP), 10457 10404 SND_PCI_QUIRK(0x1d05, 0x115c, "TongFang GMxTGxx", ALC269_FIXUP_NO_SHUTUP), 10458 10405 SND_PCI_QUIRK(0x1d05, 0x121b, "TongFang GMxAGxx", ALC269_FIXUP_NO_SHUTUP), 10406 + SND_PCI_QUIRK(0x1d05, 0x1387, "TongFang GMxIXxx", ALC2XX_FIXUP_HEADSET_MIC), 10459 10407 SND_PCI_QUIRK(0x1d72, 0x1602, "RedmiBook", ALC255_FIXUP_XIAOMI_HEADSET_MIC), 10460 10408 SND_PCI_QUIRK(0x1d72, 0x1701, "XiaomiNotebook Pro", ALC298_FIXUP_DELL1_MIC_NO_PRESENCE), 10461 10409 SND_PCI_QUIRK(0x1d72, 0x1901, "RedmiBook 14", ALC256_FIXUP_ASUS_HEADSET_MIC),
+21 -7
sound/soc/codecs/cs35l41.c
··· 1093 1093 static int cs35l41_dsp_init(struct cs35l41_private *cs35l41) 1094 1094 { 1095 1095 struct wm_adsp *dsp; 1096 + uint32_t dsp1rx5_src; 1096 1097 int ret; 1097 1098 1098 1099 dsp = &cs35l41->dsp; ··· 1113 1112 return ret; 1114 1113 } 1115 1114 1116 - ret = regmap_write(cs35l41->regmap, CS35L41_DSP1_RX5_SRC, 1117 - CS35L41_INPUT_SRC_VPMON); 1118 - if (ret < 0) { 1119 - dev_err(cs35l41->dev, "Write INPUT_SRC_VPMON failed: %d\n", ret); 1115 + switch (cs35l41->hw_cfg.bst_type) { 1116 + case CS35L41_INT_BOOST: 1117 + case CS35L41_SHD_BOOST_ACTV: 1118 + dsp1rx5_src = CS35L41_INPUT_SRC_VPMON; 1119 + break; 1120 + case CS35L41_EXT_BOOST: 1121 + case CS35L41_SHD_BOOST_PASS: 1122 + dsp1rx5_src = CS35L41_INPUT_SRC_VBSTMON; 1123 + break; 1124 + default: 1125 + dev_err(cs35l41->dev, "wm_halo_init failed - Invalid Boost Type: %d\n", 1126 + cs35l41->hw_cfg.bst_type); 1120 1127 goto err_dsp; 1121 1128 } 1122 - ret = regmap_write(cs35l41->regmap, CS35L41_DSP1_RX6_SRC, 1123 - CS35L41_INPUT_SRC_CLASSH); 1129 + 1130 + ret = regmap_write(cs35l41->regmap, CS35L41_DSP1_RX5_SRC, dsp1rx5_src); 1124 1131 if (ret < 0) { 1125 - dev_err(cs35l41->dev, "Write INPUT_SRC_CLASSH failed: %d\n", ret); 1132 + dev_err(cs35l41->dev, "Write DSP1RX5_SRC: %d failed: %d\n", dsp1rx5_src, ret); 1133 + goto err_dsp; 1134 + } 1135 + ret = regmap_write(cs35l41->regmap, CS35L41_DSP1_RX6_SRC, CS35L41_INPUT_SRC_VBSTMON); 1136 + if (ret < 0) { 1137 + dev_err(cs35l41->dev, "Write CS35L41_INPUT_SRC_VBSTMON failed: %d\n", ret); 1126 1138 goto err_dsp; 1127 1139 } 1128 1140 ret = regmap_write(cs35l41->regmap, CS35L41_DSP1_RX7_SRC,
-2
sound/soc/codecs/cs35l56-sdw.c
··· 188 188 goto out; 189 189 } 190 190 191 - regcache_cache_only(cs35l56->base.regmap, false); 192 - 193 191 ret = cs35l56_init(cs35l56); 194 192 if (ret < 0) { 195 193 regcache_cache_only(cs35l56->base.regmap, true);
+55 -30
sound/soc/codecs/cs35l56-shared.c
··· 41 41 static const struct reg_default cs35l56_reg_defaults[] = { 42 42 /* no defaults for OTP_MEM - first read populates cache */ 43 43 44 - { CS35L56_ASP1_ENABLES1, 0x00000000 }, 45 - { CS35L56_ASP1_CONTROL1, 0x00000028 }, 46 - { CS35L56_ASP1_CONTROL2, 0x18180200 }, 47 - { CS35L56_ASP1_CONTROL3, 0x00000002 }, 48 - { CS35L56_ASP1_FRAME_CONTROL1, 0x03020100 }, 49 - { CS35L56_ASP1_FRAME_CONTROL5, 0x00020100 }, 50 - { CS35L56_ASP1_DATA_CONTROL1, 0x00000018 }, 51 - { CS35L56_ASP1_DATA_CONTROL5, 0x00000018 }, 52 - 53 - /* no defaults for ASP1TX mixer */ 44 + /* 45 + * No defaults for ASP1 control or ASP1TX mixer. See 46 + * cs35l56_populate_asp1_register_defaults() and 47 + * cs35l56_sync_asp1_mixer_widgets_with_firmware(). 48 + */ 54 49 55 50 { CS35L56_SWIRE_DP3_CH1_INPUT, 0x00000018 }, 56 51 { CS35L56_SWIRE_DP3_CH2_INPUT, 0x00000019 }, ··· 206 211 } 207 212 } 208 213 214 + static const struct reg_sequence cs35l56_asp1_defaults[] = { 215 + REG_SEQ0(CS35L56_ASP1_ENABLES1, 0x00000000), 216 + REG_SEQ0(CS35L56_ASP1_CONTROL1, 0x00000028), 217 + REG_SEQ0(CS35L56_ASP1_CONTROL2, 0x18180200), 218 + REG_SEQ0(CS35L56_ASP1_CONTROL3, 0x00000002), 219 + REG_SEQ0(CS35L56_ASP1_FRAME_CONTROL1, 0x03020100), 220 + REG_SEQ0(CS35L56_ASP1_FRAME_CONTROL5, 0x00020100), 221 + REG_SEQ0(CS35L56_ASP1_DATA_CONTROL1, 0x00000018), 222 + REG_SEQ0(CS35L56_ASP1_DATA_CONTROL5, 0x00000018), 223 + }; 224 + 225 + /* 226 + * The firmware can have control of the ASP so we don't provide regmap 227 + * with defaults for these registers, to prevent a regcache_sync() from 228 + * overwriting the firmware settings. But if the machine driver hooks up 229 + * the ASP it means the driver is taking control of the ASP, so then the 230 + * registers are populated with the defaults. 231 + */ 232 + int cs35l56_init_asp1_regs_for_driver_control(struct cs35l56_base *cs35l56_base) 233 + { 234 + if (!cs35l56_base->fw_owns_asp1) 235 + return 0; 236 + 237 + cs35l56_base->fw_owns_asp1 = false; 238 + 239 + return regmap_multi_reg_write(cs35l56_base->regmap, cs35l56_asp1_defaults, 240 + ARRAY_SIZE(cs35l56_asp1_defaults)); 241 + } 242 + EXPORT_SYMBOL_NS_GPL(cs35l56_init_asp1_regs_for_driver_control, SND_SOC_CS35L56_SHARED); 243 + 209 244 /* 210 245 * The firmware boot sequence can overwrite the ASP1 config registers so that 211 246 * they don't match regmap's view of their values. 
Rewrite the values from the ··· 243 218 */ 244 219 int cs35l56_force_sync_asp1_registers_from_cache(struct cs35l56_base *cs35l56_base) 245 220 { 246 - struct reg_sequence asp1_regs[] = { 247 - { .reg = CS35L56_ASP1_ENABLES1 }, 248 - { .reg = CS35L56_ASP1_CONTROL1 }, 249 - { .reg = CS35L56_ASP1_CONTROL2 }, 250 - { .reg = CS35L56_ASP1_CONTROL3 }, 251 - { .reg = CS35L56_ASP1_FRAME_CONTROL1 }, 252 - { .reg = CS35L56_ASP1_FRAME_CONTROL5 }, 253 - { .reg = CS35L56_ASP1_DATA_CONTROL1 }, 254 - { .reg = CS35L56_ASP1_DATA_CONTROL5 }, 255 - }; 221 + struct reg_sequence asp1_regs[ARRAY_SIZE(cs35l56_asp1_defaults)]; 256 222 int i, ret; 257 223 258 - /* Read values from regmap cache into a write sequence */ 224 + if (cs35l56_base->fw_owns_asp1) 225 + return 0; 226 + 227 + memcpy(asp1_regs, cs35l56_asp1_defaults, sizeof(asp1_regs)); 228 + 229 + /* Read current values from regmap cache into the write sequence */ 259 230 for (i = 0; i < ARRAY_SIZE(asp1_regs); ++i) { 260 231 ret = regmap_read(cs35l56_base->regmap, asp1_regs[i].reg, &asp1_regs[i].def); 261 232 if (ret) ··· 329 308 reg = CS35L56_DSP1_HALO_STATE; 330 309 331 310 /* 332 - * This can't be a regmap_read_poll_timeout() because cs35l56 will NAK 333 - * I2C until it has booted which would terminate the poll 311 + * The regmap must remain in cache-only until the chip has 312 + * booted, so use a bypassed read of the status register. 334 313 */ 335 - poll_ret = read_poll_timeout(regmap_read, read_ret, 314 + poll_ret = read_poll_timeout(regmap_read_bypassed, read_ret, 336 315 (val < 0xFFFF) && (val >= CS35L56_HALO_STATE_BOOT_DONE), 337 316 CS35L56_HALO_STATE_POLL_US, 338 317 CS35L56_HALO_STATE_TIMEOUT_US, ··· 384 363 return; 385 364 386 365 cs35l56_wait_control_port_ready(); 387 - regcache_cache_only(cs35l56_base->regmap, false); 366 + 367 + /* Leave in cache-only. This will be revoked when the chip has rebooted. */ 388 368 } 389 369 EXPORT_SYMBOL_NS_GPL(cs35l56_system_reset, SND_SOC_CS35L56_SHARED); 390 370 ··· 600 578 cs35l56_issue_wake_event(cs35l56_base); 601 579 602 580 out_sync: 603 - regcache_cache_only(cs35l56_base->regmap, false); 604 - 605 581 ret = cs35l56_wait_for_firmware_boot(cs35l56_base); 606 582 if (ret) { 607 583 dev_err(cs35l56_base->dev, "Hibernate wake failed: %d\n", ret); 608 584 goto err; 609 585 } 586 + 587 + regcache_cache_only(cs35l56_base->regmap, false); 610 588 611 589 ret = cs35l56_mbox_send(cs35l56_base, CS35L56_MBOX_CMD_PREVENT_AUTO_HIBERNATE); 612 590 if (ret) ··· 707 685 708 686 int cs35l56_get_calibration(struct cs35l56_base *cs35l56_base) 709 687 { 710 - u64 silicon_uid; 688 + u64 silicon_uid = 0; 711 689 int ret; 712 690 713 691 /* Driver can't apply calibration to a secured part, so skip */ ··· 780 758 * devices so the REVID needs to be determined before waiting for the 781 759 * firmware to boot. 
782 760 */ 783 - ret = regmap_read(cs35l56_base->regmap, CS35L56_REVID, &revid); 761 + ret = regmap_read_bypassed(cs35l56_base->regmap, CS35L56_REVID, &revid); 784 762 if (ret < 0) { 785 763 dev_err(cs35l56_base->dev, "Get Revision ID failed\n"); 786 764 return ret; ··· 791 769 if (ret) 792 770 return ret; 793 771 794 - ret = regmap_read(cs35l56_base->regmap, CS35L56_DEVID, &devid); 772 + ret = regmap_read_bypassed(cs35l56_base->regmap, CS35L56_DEVID, &devid); 795 773 if (ret < 0) { 796 774 dev_err(cs35l56_base->dev, "Get Device ID failed\n"); 797 775 return ret; ··· 809 787 } 810 788 811 789 cs35l56_base->type = devid & 0xFF; 790 + 791 + /* Silicon is now identified and booted so exit cache-only */ 792 + regcache_cache_only(cs35l56_base->regmap, false); 812 793 813 794 ret = regmap_read(cs35l56_base->regmap, CS35L56_DSP_RESTRICT_STS1, &secured); 814 795 if (ret) {
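Several cs35l56 hunks above keep the register map in cache-only mode until the firmware has booted, and use bypassed reads for the few identity/status registers that must come from the hardware anyway. A toy model of that split, purely illustrative and not the kernel regmap API:

    #include <stdio.h>
    #include <stdbool.h>

    struct toy_map {
        bool cache_only;
        unsigned int cache[4];
        unsigned int hw[4];
    };

    static unsigned int toy_read(struct toy_map *m, unsigned int reg)
    {
        /* normal read: served from the cache while cache_only is set */
        return m->cache_only ? m->cache[reg] : m->hw[reg];
    }

    static unsigned int toy_read_bypassed(struct toy_map *m, unsigned int reg)
    {
        /* always goes to the hardware, regardless of cache_only */
        return m->hw[reg];
    }

    int main(void)
    {
        struct toy_map m = { .cache_only = true, .hw = { 0xabcd } };

        printf("cached    = %#x\n", toy_read(&m, 0));          /* 0 */
        printf("bypassed  = %#x\n", toy_read_bypassed(&m, 0)); /* 0xabcd */

        m.cache_only = false;   /* chip booted: leave cache-only */
        printf("normal    = %#x\n", toy_read(&m, 0));          /* 0xabcd */
        return 0;
    }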
+25 -1
sound/soc/codecs/cs35l56.c
··· 455 455 { 456 456 struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(codec_dai->component); 457 457 unsigned int val; 458 + int ret; 458 459 459 460 dev_dbg(cs35l56->base.dev, "%s: %#x\n", __func__, fmt); 461 + 462 + ret = cs35l56_init_asp1_regs_for_driver_control(&cs35l56->base); 463 + if (ret) 464 + return ret; 460 465 461 466 switch (fmt & SND_SOC_DAIFMT_CLOCK_PROVIDER_MASK) { 462 467 case SND_SOC_DAIFMT_CBC_CFC: ··· 536 531 unsigned int rx_mask, int slots, int slot_width) 537 532 { 538 533 struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(dai->component); 534 + int ret; 535 + 536 + ret = cs35l56_init_asp1_regs_for_driver_control(&cs35l56->base); 537 + if (ret) 538 + return ret; 539 539 540 540 if ((slots == 0) || (slot_width == 0)) { 541 541 dev_dbg(cs35l56->base.dev, "tdm config cleared\n"); ··· 589 579 struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(dai->component); 590 580 unsigned int rate = params_rate(params); 591 581 u8 asp_width, asp_wl; 582 + int ret; 583 + 584 + ret = cs35l56_init_asp1_regs_for_driver_control(&cs35l56->base); 585 + if (ret) 586 + return ret; 592 587 593 588 asp_wl = params_width(params); 594 589 if (cs35l56->asp_slot_width) ··· 650 635 int clk_id, unsigned int freq, int dir) 651 636 { 652 637 struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(dai->component); 653 - int freq_id; 638 + int freq_id, ret; 639 + 640 + ret = cs35l56_init_asp1_regs_for_driver_control(&cs35l56->base); 641 + if (ret) 642 + return ret; 654 643 655 644 if (freq == 0) { 656 645 cs35l56->sysclk_set = false; ··· 1423 1404 cs35l56->base.cal_index = -1; 1424 1405 cs35l56->speaker_id = -ENOENT; 1425 1406 1407 + /* Assume that the firmware owns ASP1 until we know different */ 1408 + cs35l56->base.fw_owns_asp1 = true; 1409 + 1426 1410 dev_set_drvdata(cs35l56->base.dev, cs35l56); 1427 1411 1428 1412 cs35l56_fill_supply_names(cs35l56->supplies); ··· 1554 1532 return ret; 1555 1533 1556 1534 dev_dbg(cs35l56->base.dev, "Firmware rebooted after soft reset\n"); 1535 + 1536 + regcache_cache_only(cs35l56->base.regmap, false); 1557 1537 } 1558 1538 1559 1539 /* Disable auto-hibernate so that runtime_pm has control */
+25
sound/soc/codecs/rt5645.c
··· 444 444 struct regmap *regmap; 445 445 struct i2c_client *i2c; 446 446 struct gpio_desc *gpiod_hp_det; 447 + struct gpio_desc *gpiod_cbj_sleeve; 447 448 struct snd_soc_jack *hp_jack; 448 449 struct snd_soc_jack *mic_jack; 449 450 struct snd_soc_jack *btn_jack; ··· 3187 3186 regmap_update_bits(rt5645->regmap, RT5645_IN1_CTRL2, 3188 3187 RT5645_CBJ_MN_JD, 0); 3189 3188 3189 + if (rt5645->gpiod_cbj_sleeve) 3190 + gpiod_set_value(rt5645->gpiod_cbj_sleeve, 1); 3191 + 3190 3192 msleep(600); 3191 3193 regmap_read(rt5645->regmap, RT5645_IN1_CTRL3, &val); 3192 3194 val &= 0x7; ··· 3206 3202 snd_soc_dapm_disable_pin(dapm, "Mic Det Power"); 3207 3203 snd_soc_dapm_sync(dapm); 3208 3204 rt5645->jack_type = SND_JACK_HEADPHONE; 3205 + if (rt5645->gpiod_cbj_sleeve) 3206 + gpiod_set_value(rt5645->gpiod_cbj_sleeve, 0); 3209 3207 } 3210 3208 if (rt5645->pdata.level_trigger_irq) 3211 3209 regmap_update_bits(rt5645->regmap, RT5645_IRQ_CTRL2, ··· 3235 3229 if (rt5645->pdata.level_trigger_irq) 3236 3230 regmap_update_bits(rt5645->regmap, RT5645_IRQ_CTRL2, 3237 3231 RT5645_JD_1_1_MASK, RT5645_JD_1_1_INV); 3232 + 3233 + if (rt5645->gpiod_cbj_sleeve) 3234 + gpiod_set_value(rt5645->gpiod_cbj_sleeve, 0); 3238 3235 } 3239 3236 3240 3237 return rt5645->jack_type; ··· 4021 4012 return ret; 4022 4013 } 4023 4014 4015 + rt5645->gpiod_cbj_sleeve = devm_gpiod_get_optional(&i2c->dev, "cbj-sleeve", 4016 + GPIOD_OUT_LOW); 4017 + 4018 + if (IS_ERR(rt5645->gpiod_cbj_sleeve)) { 4019 + ret = PTR_ERR(rt5645->gpiod_cbj_sleeve); 4020 + dev_info(&i2c->dev, "failed to initialize gpiod, ret=%d\n", ret); 4021 + if (ret != -ENOENT) 4022 + return ret; 4023 + } 4024 + 4024 4025 for (i = 0; i < ARRAY_SIZE(rt5645->supplies); i++) 4025 4026 rt5645->supplies[i].supply = rt5645_supply_names[i]; 4026 4027 ··· 4278 4259 cancel_delayed_work_sync(&rt5645->jack_detect_work); 4279 4260 cancel_delayed_work_sync(&rt5645->rcclock_work); 4280 4261 4262 + if (rt5645->gpiod_cbj_sleeve) 4263 + gpiod_set_value(rt5645->gpiod_cbj_sleeve, 0); 4264 + 4281 4265 regulator_bulk_disable(ARRAY_SIZE(rt5645->supplies), rt5645->supplies); 4282 4266 } 4283 4267 ··· 4296 4274 0); 4297 4275 msleep(20); 4298 4276 regmap_write(rt5645->regmap, RT5645_RESET, 0); 4277 + 4278 + if (rt5645->gpiod_cbj_sleeve) 4279 + gpiod_set_value(rt5645->gpiod_cbj_sleeve, 0); 4299 4280 } 4300 4281 4301 4282 static int __maybe_unused rt5645_sys_suspend(struct device *dev)
+4 -4
sound/soc/codecs/rt715-sdca.c
··· 316 316 return 0; 317 317 } 318 318 319 - static const DECLARE_TLV_DB_SCALE(in_vol_tlv, -17625, 375, 0); 319 + static const DECLARE_TLV_DB_SCALE(in_vol_tlv, -1725, 75, 0); 320 320 static const DECLARE_TLV_DB_SCALE(mic_vol_tlv, 0, 1000, 0); 321 321 322 322 static int rt715_sdca_get_volsw(struct snd_kcontrol *kcontrol, ··· 477 477 RT715_SDCA_FU_VOL_CTRL, CH_01), 478 478 SDW_SDCA_CTL(FUN_MIC_ARRAY, RT715_SDCA_FU_ADC7_27_VOL, 479 479 RT715_SDCA_FU_VOL_CTRL, CH_02), 480 - 0x2f, 0x7f, 0, 480 + 0x2f, 0x3f, 0, 481 481 rt715_sdca_set_amp_gain_get, rt715_sdca_set_amp_gain_put, 482 482 in_vol_tlv), 483 483 RT715_SDCA_EXT_TLV("FU02 Capture Volume", ··· 485 485 RT715_SDCA_FU_VOL_CTRL, CH_01), 486 486 rt715_sdca_set_amp_gain_4ch_get, 487 487 rt715_sdca_set_amp_gain_4ch_put, 488 - in_vol_tlv, 4, 0x7f), 488 + in_vol_tlv, 4, 0x3f), 489 489 RT715_SDCA_EXT_TLV("FU06 Capture Volume", 490 490 SDW_SDCA_CTL(FUN_MIC_ARRAY, RT715_SDCA_FU_ADC10_11_VOL, 491 491 RT715_SDCA_FU_VOL_CTRL, CH_01), 492 492 rt715_sdca_set_amp_gain_4ch_get, 493 493 rt715_sdca_set_amp_gain_4ch_put, 494 - in_vol_tlv, 4, 0x7f), 494 + in_vol_tlv, 4, 0x3f), 495 495 /* MIC Boost Control */ 496 496 RT715_SDCA_BOOST_EXT_TLV("FU0E Boost", 497 497 SDW_SDCA_CTL(FUN_MIC_ARRAY, RT715_SDCA_FU_DMIC_GAIN_EN,
+1
sound/soc/codecs/rt715-sdw.c
··· 111 111 case 0x839d: 112 112 case 0x83a7: 113 113 case 0x83a9: 114 + case 0x752001: 114 115 case 0x752039: 115 116 return true; 116 117 default:
+20 -7
sound/soc/codecs/rt722-sdca.c
··· 1330 1330 .capture = { 1331 1331 .stream_name = "DP6 DMic Capture", 1332 1332 .channels_min = 1, 1333 - .channels_max = 2, 1333 + .channels_max = 4, 1334 1334 .rates = RT722_STEREO_RATES, 1335 1335 .formats = RT722_FORMATS, 1336 1336 }, ··· 1439 1439 int loop_check, chk_cnt = 100, ret; 1440 1440 unsigned int calib_status = 0; 1441 1441 1442 - /* Read eFuse */ 1443 - rt722_sdca_index_write(rt722, RT722_VENDOR_SPK_EFUSE, RT722_DC_CALIB_CTRL, 1444 - 0x4808); 1442 + /* Config analog bias */ 1443 + rt722_sdca_index_write(rt722, RT722_VENDOR_REG, RT722_ANALOG_BIAS_CTL3, 1444 + 0xa081); 1445 + /* GE related settings */ 1446 + rt722_sdca_index_write(rt722, RT722_VENDOR_HDA_CTL, RT722_GE_RELATED_CTL2, 1447 + 0xa009); 1445 1448 /* Button A, B, C, D bypass mode */ 1446 1449 rt722_sdca_index_write(rt722, RT722_VENDOR_HDA_CTL, RT722_UMP_HID_CTL4, 1447 1450 0xcf00); ··· 1478 1475 if ((calib_status & 0x0040) == 0x0) 1479 1476 break; 1480 1477 } 1481 - /* Release HP-JD, EN_CBJ_TIE_GL/R open, en_osw gating auto done bit */ 1482 - rt722_sdca_index_write(rt722, RT722_VENDOR_REG, RT722_DIGITAL_MISC_CTRL4, 1483 - 0x0010); 1484 1478 /* Set ADC09 power entity floating control */ 1485 1479 rt722_sdca_index_write(rt722, RT722_VENDOR_HDA_CTL, RT722_ADC0A_08_PDE_FLOAT_CTL, 1486 1480 0x2a12); ··· 1490 1490 /* Set DAC03 and HP power entity floating control */ 1491 1491 rt722_sdca_index_write(rt722, RT722_VENDOR_HDA_CTL, RT722_DAC03_HP_PDE_FLOAT_CTL, 1492 1492 0x4040); 1493 + rt722_sdca_index_write(rt722, RT722_VENDOR_HDA_CTL, RT722_ENT_FLOAT_CTRL_1, 1494 + 0x4141); 1495 + rt722_sdca_index_write(rt722, RT722_VENDOR_HDA_CTL, RT722_FLOAT_CTRL_1, 1496 + 0x0101); 1493 1497 /* Fine tune PDE40 latency */ 1494 1498 regmap_write(rt722->regmap, 0x2f58, 0x07); 1499 + regmap_write(rt722->regmap, 0x2f03, 0x06); 1500 + /* MIC VRefo */ 1501 + rt722_sdca_index_update_bits(rt722, RT722_VENDOR_REG, 1502 + RT722_COMBO_JACK_AUTO_CTL1, 0x0200, 0x0200); 1503 + rt722_sdca_index_update_bits(rt722, RT722_VENDOR_REG, 1504 + RT722_VREFO_GAT, 0x4000, 0x4000); 1505 + /* Release HP-JD, EN_CBJ_TIE_GL/R open, en_osw gating auto done bit */ 1506 + rt722_sdca_index_write(rt722, RT722_VENDOR_REG, RT722_DIGITAL_MISC_CTRL4, 1507 + 0x0010); 1495 1508 } 1496 1509 1497 1510 int rt722_sdca_io_init(struct device *dev, struct sdw_slave *slave)
+3
sound/soc/codecs/rt722-sdca.h
··· 69 69 #define RT722_COMBO_JACK_AUTO_CTL2 0x46 70 70 #define RT722_COMBO_JACK_AUTO_CTL3 0x47 71 71 #define RT722_DIGITAL_MISC_CTRL4 0x4a 72 + #define RT722_VREFO_GAT 0x63 72 73 #define RT722_FSM_CTL 0x67 73 74 #define RT722_SDCA_INTR_REC 0x82 74 75 #define RT722_SW_CONFIG1 0x8a ··· 128 127 #define RT722_UMP_HID_CTL6 0x66 129 128 #define RT722_UMP_HID_CTL7 0x67 130 129 #define RT722_UMP_HID_CTL8 0x68 130 + #define RT722_FLOAT_CTRL_1 0x70 131 + #define RT722_ENT_FLOAT_CTRL_1 0x76 131 132 132 133 /* Parameter & Verb control 01 (0x1a)(NID:20h) */ 133 134 #define RT722_HIDDEN_REG_SW_RESET (0x1 << 14)
+1
sound/soc/codecs/wsa881x.c
··· 1155 1155 pdev->prop.sink_ports = GENMASK(WSA881X_MAX_SWR_PORTS, 0); 1156 1156 pdev->prop.sink_dpn_prop = wsa_sink_dpn_prop; 1157 1157 pdev->prop.scp_int1_mask = SDW_SCP_INT1_BUS_CLASH | SDW_SCP_INT1_PARITY; 1158 + pdev->prop.clk_stop_mode1 = true; 1158 1159 gpiod_direction_output(wsa881x->sd_n, !wsa881x->sd_n_val); 1159 1160 1160 1161 wsa881x->regmap = devm_regmap_init_sdw(pdev, &wsa881x_regmap_config);
+1 -1
sound/soc/intel/avs/icl.c
··· 66 66 struct avs_icl_memwnd2 { 67 67 union { 68 68 struct avs_icl_memwnd2_desc slot_desc[AVS_ICL_MEMWND2_SLOTS_COUNT]; 69 - u8 rsvd[PAGE_SIZE]; 69 + u8 rsvd[SZ_4K]; 70 70 }; 71 71 u8 slot_array[AVS_ICL_MEMWND2_SLOTS_COUNT][SZ_4K]; 72 72 } __packed;
+2
sound/soc/intel/avs/topology.c
··· 1582 1582 if (!le32_to_cpu(dw->priv.size)) 1583 1583 return 0; 1584 1584 1585 + w->no_wname_in_kcontrol_name = true; 1586 + 1585 1587 if (w->ignore_suspend && !AVS_S0IX_SUPPORTED) { 1586 1588 dev_info_once(comp->dev, "Device does not support S0IX, check BIOS settings\n"); 1587 1589 w->ignore_suspend = false;
+13 -11
sound/soc/intel/boards/bytcr_rt5640.c
··· 636 636 BYT_RT5640_USE_AMCR0F28), 637 637 }, 638 638 { 639 - .matches = { 640 - DMI_EXACT_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), 641 - DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "T100TA"), 642 - }, 643 - .driver_data = (void *)(BYT_RT5640_IN1_MAP | 644 - BYT_RT5640_JD_SRC_JD2_IN4N | 645 - BYT_RT5640_OVCD_TH_2000UA | 646 - BYT_RT5640_OVCD_SF_0P75 | 647 - BYT_RT5640_MCLK_EN), 648 - }, 649 - { 639 + /* Asus T100TAF, unlike other T100TA* models this one has a mono speaker */ 650 640 .matches = { 651 641 DMI_EXACT_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), 652 642 DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "T100TAF"), ··· 648 658 BYT_RT5640_MONO_SPEAKER | 649 659 BYT_RT5640_DIFF_MIC | 650 660 BYT_RT5640_SSP0_AIF2 | 661 + BYT_RT5640_MCLK_EN), 662 + }, 663 + { 664 + /* Asus T100TA and T100TAM, must come after T100TAF (mono spk) match */ 665 + .matches = { 666 + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), 667 + DMI_MATCH(DMI_PRODUCT_NAME, "T100TA"), 668 + }, 669 + .driver_data = (void *)(BYT_RT5640_IN1_MAP | 670 + BYT_RT5640_JD_SRC_JD2_IN4N | 671 + BYT_RT5640_OVCD_TH_2000UA | 672 + BYT_RT5640_OVCD_SF_0P75 | 651 673 BYT_RT5640_MCLK_EN), 652 674 }, 653 675 {
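The bytcr_rt5640.c comment above spells out why the T100TAF entry has to precede the looser T100TA one in a first-match-wins quirk table with substring matching. A standalone sketch of that ordering rule (the table walk and strings are illustrative only):

    #include <stdio.h>
    #include <string.h>

    struct quirk {
        const char *product;   /* matched as a substring, like DMI_MATCH() */
        const char *config;
    };

    static const struct quirk quirks[] = {
        { "T100TAF", "mono speaker"   },   /* specific model first */
        { "T100TA",  "stereo speaker" },   /* also matches T100TAF, so last */
        { NULL, NULL }
    };

    static const char *lookup(const char *product)
    {
        const struct quirk *q;

        for (q = quirks; q->product; q++)
            if (strstr(product, q->product))
                return q->config;          /* first hit wins */
        return "default";
    }

    int main(void)
    {
        printf("T100TAF -> %s\n", lookup("T100TAF"));   /* mono speaker */
        printf("T100TAM -> %s\n", lookup("T100TAM"));   /* stereo speaker */
        return 0;
    }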
+4 -4
sound/soc/sof/amd/acp.c
··· 704 704 goto unregister_dev; 705 705 } 706 706 707 + ret = acp_init(sdev); 708 + if (ret < 0) 709 + goto free_smn_dev; 710 + 707 711 sdev->ipc_irq = pci->irq; 708 712 ret = request_threaded_irq(sdev->ipc_irq, acp_irq_handler, acp_irq_thread, 709 713 IRQF_SHARED, "AudioDSP", sdev); ··· 716 712 sdev->ipc_irq); 717 713 goto free_smn_dev; 718 714 } 719 - 720 - ret = acp_init(sdev); 721 - if (ret < 0) 722 - goto free_ipc_irq; 723 715 724 716 /* scan SoundWire capabilities exposed by DSDT */ 725 717 ret = acp_sof_scan_sdw_devices(sdev, chip->sdw_acpi_dev_addr);
+11 -7
sound/soc/sof/core.c
··· 339 339 ret = snd_sof_probe(sdev); 340 340 if (ret < 0) { 341 341 dev_err(sdev->dev, "failed to probe DSP %d\n", ret); 342 - sof_ops_free(sdev); 343 - return ret; 342 + goto err_sof_probe; 344 343 } 345 344 346 345 /* check machine info */ ··· 350 351 } 351 352 352 353 ret = sof_select_ipc_and_paths(sdev); 353 - if (!ret && plat_data->ipc_type != base_profile->ipc_type) { 354 + if (ret) { 355 + goto err_machine_check; 356 + } else if (plat_data->ipc_type != base_profile->ipc_type) { 354 357 /* IPC type changed, re-initialize the ops */ 355 358 sof_ops_free(sdev); 356 359 357 360 ret = validate_sof_ops(sdev); 358 361 if (ret < 0) { 359 362 snd_sof_remove(sdev); 363 + snd_sof_remove_late(sdev); 360 364 return ret; 361 365 } 362 366 } 363 367 368 + return 0; 369 + 364 370 err_machine_check: 365 - if (ret) { 366 - snd_sof_remove(sdev); 367 - sof_ops_free(sdev); 368 - } 371 + snd_sof_remove(sdev); 372 + err_sof_probe: 373 + snd_sof_remove_late(sdev); 374 + sof_ops_free(sdev); 369 375 370 376 return ret; 371 377 }
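The sof/core.c hunk above reshapes the probe error handling into the usual stacked-goto unwind, where each label undoes only what was already set up, in reverse order of setup. A minimal standalone sketch of the idiom with placeholder init/undo steps:

    #include <stdio.h>

    static int step_a(void) { puts("a: init"); return 0; }
    static int step_b(void) { puts("b: init"); return -1; }   /* simulate failure */
    static void undo_a(void) { puts("a: undo"); }

    static int setup(void)
    {
        int ret;

        ret = step_a();
        if (ret)
            return ret;        /* nothing to undo yet */

        ret = step_b();
        if (ret)
            goto err_a;        /* only step_a needs undoing */

        return 0;

    err_a:
        undo_a();
        return ret;
    }

    int main(void)
    {
        printf("setup = %d\n", setup());
        return 0;
    }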
+18
sound/soc/sof/debug.c
··· 311 311 312 312 int snd_sof_dbg_init(struct snd_sof_dev *sdev) 313 313 { 314 + struct snd_sof_pdata *plat_data = sdev->pdata; 314 315 struct snd_sof_dsp_ops *ops = sof_ops(sdev); 315 316 const struct snd_sof_debugfs_map *map; 317 + struct dentry *fw_profile; 316 318 int i; 317 319 int err; 318 320 319 321 /* use "sof" as top level debugFS dir */ 320 322 sdev->debugfs_root = debugfs_create_dir("sof", NULL); 323 + 324 + /* expose firmware/topology prefix/names for test purposes */ 325 + fw_profile = debugfs_create_dir("fw_profile", sdev->debugfs_root); 326 + 327 + debugfs_create_str("fw_path", 0444, fw_profile, 328 + (char **)&plat_data->fw_filename_prefix); 329 + debugfs_create_str("fw_lib_path", 0444, fw_profile, 330 + (char **)&plat_data->fw_lib_prefix); 331 + debugfs_create_str("tplg_path", 0444, fw_profile, 332 + (char **)&plat_data->tplg_filename_prefix); 333 + debugfs_create_str("fw_name", 0444, fw_profile, 334 + (char **)&plat_data->fw_filename); 335 + debugfs_create_str("tplg_name", 0444, fw_profile, 336 + (char **)&plat_data->tplg_filename); 337 + debugfs_create_u32("ipc_type", 0444, fw_profile, 338 + (u32 *)&plat_data->ipc_type); 321 339 322 340 /* init dfsentry list */ 323 341 INIT_LIST_HEAD(&sdev->dfsentry_list);
+24 -8
sound/soc/sof/intel/lnl.c
··· 32 32 }; 33 33 34 34 /* this helps allows the DSP to setup DMIC/SSP */ 35 - static int hdac_bus_offload_dmic_ssp(struct hdac_bus *bus) 35 + static int hdac_bus_offload_dmic_ssp(struct hdac_bus *bus, bool enable) 36 36 { 37 37 int ret; 38 38 39 - ret = hdac_bus_eml_enable_offload(bus, true, AZX_REG_ML_LEPTR_ID_INTEL_SSP, true); 39 + ret = hdac_bus_eml_enable_offload(bus, true, 40 + AZX_REG_ML_LEPTR_ID_INTEL_SSP, enable); 40 41 if (ret < 0) 41 42 return ret; 42 43 43 - ret = hdac_bus_eml_enable_offload(bus, true, AZX_REG_ML_LEPTR_ID_INTEL_DMIC, true); 44 + ret = hdac_bus_eml_enable_offload(bus, true, 45 + AZX_REG_ML_LEPTR_ID_INTEL_DMIC, enable); 44 46 if (ret < 0) 45 47 return ret; 46 48 ··· 57 55 if (ret < 0) 58 56 return ret; 59 57 60 - return hdac_bus_offload_dmic_ssp(sof_to_bus(sdev)); 58 + return hdac_bus_offload_dmic_ssp(sof_to_bus(sdev), true); 59 + } 60 + 61 + static void lnl_hda_dsp_remove(struct snd_sof_dev *sdev) 62 + { 63 + int ret; 64 + 65 + ret = hdac_bus_offload_dmic_ssp(sof_to_bus(sdev), false); 66 + if (ret < 0) 67 + dev_warn(sdev->dev, 68 + "Failed to disable offload for DMIC/SSP: %d\n", ret); 69 + 70 + hda_dsp_remove(sdev); 61 71 } 62 72 63 73 static int lnl_hda_dsp_resume(struct snd_sof_dev *sdev) ··· 80 66 if (ret < 0) 81 67 return ret; 82 68 83 - return hdac_bus_offload_dmic_ssp(sof_to_bus(sdev)); 69 + return hdac_bus_offload_dmic_ssp(sof_to_bus(sdev), true); 84 70 } 85 71 86 72 static int lnl_hda_dsp_runtime_resume(struct snd_sof_dev *sdev) ··· 91 77 if (ret < 0) 92 78 return ret; 93 79 94 - return hdac_bus_offload_dmic_ssp(sof_to_bus(sdev)); 80 + return hdac_bus_offload_dmic_ssp(sof_to_bus(sdev), true); 95 81 } 96 82 97 83 static int lnl_dsp_post_fw_run(struct snd_sof_dev *sdev) ··· 118 104 /* common defaults */ 119 105 memcpy(&sof_lnl_ops, &sof_hda_common_ops, sizeof(struct snd_sof_dsp_ops)); 120 106 121 - /* probe */ 122 - if (!sdev->dspless_mode_selected) 107 + /* probe/remove */ 108 + if (!sdev->dspless_mode_selected) { 123 109 sof_lnl_ops.probe = lnl_hda_dsp_probe; 110 + sof_lnl_ops.remove = lnl_hda_dsp_remove; 111 + } 124 112 125 113 /* shutdown */ 126 114 sof_lnl_ops.shutdown = hda_dsp_shutdown;
+3
sound/soc/sof/intel/pci-lnl.c
··· 35 35 .default_fw_path = { 36 36 [SOF_IPC_TYPE_4] = "intel/sof-ipc4/lnl", 37 37 }, 38 + .default_lib_path = { 39 + [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/lnl", 40 + }, 38 41 .default_tplg_path = { 39 42 [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg", 40 43 },
+1
sound/soc/sof/ipc3-pcm.c
··· 434 434 .trigger = sof_ipc3_pcm_trigger, 435 435 .dai_link_fixup = sof_ipc3_pcm_dai_link_fixup, 436 436 .reset_hw_params_during_stop = true, 437 + .d0i3_supported_in_s0ix = true, 437 438 };
+82 -39
sound/soc/sof/ipc4-pcm.c
··· 37 37 snd_pcm_sframes_t delay; 38 38 }; 39 39 40 + /** 41 + * struct sof_ipc4_pcm_stream_priv - IPC4 specific private data 42 + * @time_info: pointer to time info struct if it is supported, otherwise NULL 43 + * @chain_dma_allocated: indicates the ChainDMA allocation state 44 + */ 45 + struct sof_ipc4_pcm_stream_priv { 46 + struct sof_ipc4_timestamp_info *time_info; 47 + 48 + bool chain_dma_allocated; 49 + }; 50 + 51 + static inline struct sof_ipc4_timestamp_info * 52 + sof_ipc4_sps_to_time_info(struct snd_sof_pcm_stream *sps) 53 + { 54 + struct sof_ipc4_pcm_stream_priv *stream_priv = sps->private; 55 + 56 + return stream_priv->time_info; 57 + } 58 + 40 59 static int sof_ipc4_set_multi_pipeline_state(struct snd_sof_dev *sdev, u32 state, 41 60 struct ipc4_pipeline_set_state_data *trigger_list) 42 61 { ··· 272 253 */ 273 254 274 255 static int sof_ipc4_chain_dma_trigger(struct snd_sof_dev *sdev, 275 - int direction, 256 + struct snd_sof_pcm *spcm, int direction, 276 257 struct snd_sof_pcm_stream_pipeline_list *pipeline_list, 277 258 int state, int cmd) 278 259 { 279 260 struct sof_ipc4_fw_data *ipc4_data = sdev->private; 261 + struct sof_ipc4_pcm_stream_priv *stream_priv; 280 262 bool allocate, enable, set_fifo_size; 281 263 struct sof_ipc4_msg msg = {{ 0 }}; 282 - int i; 264 + int ret, i; 265 + 266 + stream_priv = spcm->stream[direction].private; 283 267 284 268 switch (state) { 285 269 case SOF_IPC4_PIPE_RUNNING: /* Allocate and start chained dma */ ··· 303 281 set_fifo_size = false; 304 282 break; 305 283 case SOF_IPC4_PIPE_RESET: /* Disable and free chained DMA. */ 284 + 285 + /* ChainDMA can only be reset if it has been allocated */ 286 + if (!stream_priv->chain_dma_allocated) 287 + return 0; 288 + 306 289 allocate = false; 307 290 enable = false; 308 291 set_fifo_size = false; ··· 365 338 if (enable) 366 339 msg.primary |= SOF_IPC4_GLB_CHAIN_DMA_ENABLE_MASK; 367 340 368 - return sof_ipc_tx_message_no_reply(sdev->ipc, &msg, 0); 341 + ret = sof_ipc_tx_message_no_reply(sdev->ipc, &msg, 0); 342 + /* Update the ChainDMA allocation state */ 343 + if (!ret) 344 + stream_priv->chain_dma_allocated = allocate; 345 + 346 + return ret; 369 347 } 370 348 371 349 static int sof_ipc4_trigger_pipelines(struct snd_soc_component *component, ··· 410 378 * trigger function that handles the rest for the substream. 
411 379 */ 412 380 if (pipeline->use_chain_dma) 413 - return sof_ipc4_chain_dma_trigger(sdev, substream->stream, 381 + return sof_ipc4_chain_dma_trigger(sdev, spcm, substream->stream, 414 382 pipeline_list, state, cmd); 415 383 416 384 /* allocate memory for the pipeline data */ ··· 484 452 * Invalidate the stream_start_offset to make sure that it is 485 453 * going to be updated if the stream resumes 486 454 */ 487 - time_info = spcm->stream[substream->stream].private; 455 + time_info = sof_ipc4_sps_to_time_info(&spcm->stream[substream->stream]); 488 456 if (time_info) 489 457 time_info->stream_start_offset = SOF_IPC4_INVALID_STREAM_POSITION; 490 458 ··· 738 706 static void sof_ipc4_pcm_free(struct snd_sof_dev *sdev, struct snd_sof_pcm *spcm) 739 707 { 740 708 struct snd_sof_pcm_stream_pipeline_list *pipeline_list; 709 + struct sof_ipc4_pcm_stream_priv *stream_priv; 741 710 int stream; 742 711 743 712 for_each_pcm_streams(stream) { 744 713 pipeline_list = &spcm->stream[stream].pipeline_list; 745 714 kfree(pipeline_list->pipelines); 746 715 pipeline_list->pipelines = NULL; 716 + 717 + stream_priv = spcm->stream[stream].private; 718 + kfree(stream_priv->time_info); 747 719 kfree(spcm->stream[stream].private); 748 720 spcm->stream[stream].private = NULL; 749 721 } ··· 757 721 { 758 722 struct snd_sof_pcm_stream_pipeline_list *pipeline_list; 759 723 struct sof_ipc4_fw_data *ipc4_data = sdev->private; 760 - struct sof_ipc4_timestamp_info *stream_info; 724 + struct sof_ipc4_pcm_stream_priv *stream_priv; 725 + struct sof_ipc4_timestamp_info *time_info; 761 726 bool support_info = true; 762 727 u32 abi_version; 763 728 u32 abi_offset; ··· 786 749 return -ENOMEM; 787 750 } 788 751 789 - if (!support_info) 790 - continue; 791 - 792 - stream_info = kzalloc(sizeof(*stream_info), GFP_KERNEL); 793 - if (!stream_info) { 752 + stream_priv = kzalloc(sizeof(*stream_priv), GFP_KERNEL); 753 + if (!stream_priv) { 794 754 sof_ipc4_pcm_free(sdev, spcm); 795 755 return -ENOMEM; 796 756 } 797 757 798 - spcm->stream[stream].private = stream_info; 758 + spcm->stream[stream].private = stream_priv; 759 + 760 + if (!support_info) 761 + continue; 762 + 763 + time_info = kzalloc(sizeof(*time_info), GFP_KERNEL); 764 + if (!time_info) { 765 + sof_ipc4_pcm_free(sdev, spcm); 766 + return -ENOMEM; 767 + } 768 + 769 + stream_priv->time_info = time_info; 799 770 } 800 771 801 772 return 0; 802 773 } 803 774 804 - static void sof_ipc4_build_time_info(struct snd_sof_dev *sdev, struct snd_sof_pcm_stream *spcm) 775 + static void sof_ipc4_build_time_info(struct snd_sof_dev *sdev, struct snd_sof_pcm_stream *sps) 805 776 { 806 777 struct sof_ipc4_copier *host_copier = NULL; 807 778 struct sof_ipc4_copier *dai_copier = NULL; 808 779 struct sof_ipc4_llp_reading_slot llp_slot; 809 - struct sof_ipc4_timestamp_info *info; 780 + struct sof_ipc4_timestamp_info *time_info; 810 781 struct snd_soc_dapm_widget *widget; 811 782 struct snd_sof_dai *dai; 812 783 int i; 813 784 814 785 /* find host & dai to locate info in memory window */ 815 - for_each_dapm_widgets(spcm->list, i, widget) { 786 + for_each_dapm_widgets(sps->list, i, widget) { 816 787 struct snd_sof_widget *swidget = widget->dobj.private; 817 788 818 789 if (!swidget) ··· 840 795 return; 841 796 } 842 797 843 - info = spcm->private; 844 - info->host_copier = host_copier; 845 - info->dai_copier = dai_copier; 846 - info->llp_offset = offsetof(struct sof_ipc4_fw_registers, llp_gpdma_reading_slots) + 847 - sdev->fw_info_box.offset; 798 + time_info = sof_ipc4_sps_to_time_info(sps); 799 + 
time_info->host_copier = host_copier; 800 + time_info->dai_copier = dai_copier; 801 + time_info->llp_offset = offsetof(struct sof_ipc4_fw_registers, 802 + llp_gpdma_reading_slots) + sdev->fw_info_box.offset; 848 803 849 804 /* find llp slot used by current dai */ 850 805 for (i = 0; i < SOF_IPC4_MAX_LLP_GPDMA_READING_SLOTS; i++) { 851 - sof_mailbox_read(sdev, info->llp_offset, &llp_slot, sizeof(llp_slot)); 806 + sof_mailbox_read(sdev, time_info->llp_offset, &llp_slot, sizeof(llp_slot)); 852 807 if (llp_slot.node_id == dai_copier->data.gtw_cfg.node_id) 853 808 break; 854 809 855 - info->llp_offset += sizeof(llp_slot); 810 + time_info->llp_offset += sizeof(llp_slot); 856 811 } 857 812 858 813 if (i < SOF_IPC4_MAX_LLP_GPDMA_READING_SLOTS) 859 814 return; 860 815 861 816 /* if no llp gpdma slot is used, check aggregated sdw slot */ 862 - info->llp_offset = offsetof(struct sof_ipc4_fw_registers, llp_sndw_reading_slots) + 863 - sdev->fw_info_box.offset; 817 + time_info->llp_offset = offsetof(struct sof_ipc4_fw_registers, 818 + llp_sndw_reading_slots) + sdev->fw_info_box.offset; 864 819 for (i = 0; i < SOF_IPC4_MAX_LLP_SNDW_READING_SLOTS; i++) { 865 - sof_mailbox_read(sdev, info->llp_offset, &llp_slot, sizeof(llp_slot)); 820 + sof_mailbox_read(sdev, time_info->llp_offset, &llp_slot, sizeof(llp_slot)); 866 821 if (llp_slot.node_id == dai_copier->data.gtw_cfg.node_id) 867 822 break; 868 823 869 - info->llp_offset += sizeof(llp_slot); 824 + time_info->llp_offset += sizeof(llp_slot); 870 825 } 871 826 872 827 if (i < SOF_IPC4_MAX_LLP_SNDW_READING_SLOTS) 873 828 return; 874 829 875 830 /* check EVAD slot */ 876 - info->llp_offset = offsetof(struct sof_ipc4_fw_registers, llp_evad_reading_slot) + 877 - sdev->fw_info_box.offset; 878 - sof_mailbox_read(sdev, info->llp_offset, &llp_slot, sizeof(llp_slot)); 831 + time_info->llp_offset = offsetof(struct sof_ipc4_fw_registers, 832 + llp_evad_reading_slot) + sdev->fw_info_box.offset; 833 + sof_mailbox_read(sdev, time_info->llp_offset, &llp_slot, sizeof(llp_slot)); 879 834 if (llp_slot.node_id != dai_copier->data.gtw_cfg.node_id) 880 - info->llp_offset = 0; 835 + time_info->llp_offset = 0; 881 836 } 882 837 883 838 static int sof_ipc4_pcm_hw_params(struct snd_soc_component *component, ··· 894 849 if (!spcm) 895 850 return -EINVAL; 896 851 897 - time_info = spcm->stream[substream->stream].private; 852 + time_info = sof_ipc4_sps_to_time_info(&spcm->stream[substream->stream]); 898 853 /* delay calculation is not supported by current fw_reg ABI */ 899 854 if (!time_info) 900 855 return 0; ··· 909 864 910 865 static int sof_ipc4_get_stream_start_offset(struct snd_sof_dev *sdev, 911 866 struct snd_pcm_substream *substream, 912 - struct snd_sof_pcm_stream *stream, 867 + struct snd_sof_pcm_stream *sps, 913 868 struct sof_ipc4_timestamp_info *time_info) 914 869 { 915 870 struct sof_ipc4_copier *host_copier = time_info->host_copier; ··· 963 918 struct sof_ipc4_timestamp_info *time_info; 964 919 struct sof_ipc4_llp_reading_slot llp; 965 920 snd_pcm_uframes_t head_cnt, tail_cnt; 966 - struct snd_sof_pcm_stream *stream; 921 + struct snd_sof_pcm_stream *sps; 967 922 u64 dai_cnt, host_cnt, host_ptr; 968 923 struct snd_sof_pcm *spcm; 969 924 int ret; ··· 972 927 if (!spcm) 973 928 return -EOPNOTSUPP; 974 929 975 - stream = &spcm->stream[substream->stream]; 976 - time_info = stream->private; 930 + sps = &spcm->stream[substream->stream]; 931 + time_info = sof_ipc4_sps_to_time_info(sps); 977 932 if (!time_info) 978 933 return -EOPNOTSUPP; 979 934 ··· 983 938 * the statistics is 
complete. And it will not change after the first initiailization. 984 939 */ 985 940 if (time_info->stream_start_offset == SOF_IPC4_INVALID_STREAM_POSITION) { 986 - ret = sof_ipc4_get_stream_start_offset(sdev, substream, stream, time_info); 941 + ret = sof_ipc4_get_stream_start_offset(sdev, substream, sps, time_info); 987 942 if (ret < 0) 988 943 return -EOPNOTSUPP; 989 944 } ··· 1075 1030 { 1076 1031 struct snd_soc_pcm_runtime *rtd = snd_soc_substream_to_rtd(substream); 1077 1032 struct sof_ipc4_timestamp_info *time_info; 1078 - struct snd_sof_pcm_stream *stream; 1079 1033 struct snd_sof_pcm *spcm; 1080 1034 1081 1035 spcm = snd_sof_find_spcm_dai(component, rtd); 1082 1036 if (!spcm) 1083 1037 return 0; 1084 1038 1085 - stream = &spcm->stream[substream->stream]; 1086 - time_info = stream->private; 1039 + time_info = sof_ipc4_sps_to_time_info(&spcm->stream[substream->stream]); 1087 1040 /* 1088 1041 * Report the stored delay value calculated in the pointer callback. 1089 1042 * In the unlikely event that the calculation was skipped/aborted, the
+6 -7
sound/soc/sof/pcm.c
··· 312 312 ipc_first = true; 313 313 break; 314 314 case SNDRV_PCM_TRIGGER_SUSPEND: 315 - if (sdev->system_suspend_target == SOF_SUSPEND_S0IX && 315 + /* 316 + * If DSP D0I3 is allowed during S0iX, set the suspend_ignored flag for 317 + * D0I3-compatible streams to keep the firmware pipeline running 318 + */ 319 + if (pcm_ops && pcm_ops->d0i3_supported_in_s0ix && 320 + sdev->system_suspend_target == SOF_SUSPEND_S0IX && 316 321 spcm->stream[substream->stream].d0i3_compatible) { 317 - /* 318 - * trap the event, not sending trigger stop to 319 - * prevent the FW pipelines from being stopped, 320 - * and mark the flag to ignore the upcoming DAPM 321 - * PM events. 322 - */ 323 322 spcm->stream[substream->stream].suspend_ignored = true; 324 323 return 0; 325 324 }
+2
sound/soc/sof/sof-audio.h
··· 117 117 * triggers. The FW keeps the host DMA running in this case and 118 118 * therefore the host must do the same and should stop the DMA during 119 119 * hw_free. 120 + * @d0i3_supported_in_s0ix: Allow DSP D0I3 during S0iX 120 121 */ 121 122 struct sof_ipc_pcm_ops { 122 123 int (*hw_params)(struct snd_soc_component *component, struct snd_pcm_substream *substream, ··· 137 136 bool reset_hw_params_during_stop; 138 137 bool ipc_first_on_start; 139 138 bool platform_stop_during_hw_free; 139 + bool d0i3_supported_in_s0ix; 140 140 }; 141 141 142 142 /**
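The new d0i3_supported_in_s0ix flag lets an IPC implementation declare that the DSP may stay in D0I3 while the system is in S0iX, which is what the trigger path in sound/soc/sof/pcm.c above now checks before marking a D0I3-compatible stream as suspend_ignored. A minimal sketch of how a platform's PCM ops might opt in (the instance name and the other field values here are illustrative, not taken from the commit):

    /* hypothetical example; assumes struct sof_ipc_pcm_ops from "sof-audio.h" is in scope */
    static const struct sof_ipc_pcm_ops example_ipc4_pcm_ops = {
    	.ipc_first_on_start     = true,  /* illustrative value */
    	.d0i3_supported_in_s0ix = true,  /* keep D0I3-capable streams alive over S0iX */
    };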
+3 -4
sound/soc/tegra/tegra186_dspk.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0-only 2 + // SPDX-FileCopyrightText: Copyright (c) 2020-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 2 3 // 3 4 // tegra186_dspk.c - Tegra186 DSPK driver 4 - // 5 - // Copyright (c) 2020 NVIDIA CORPORATION. All rights reserved. 6 5 7 6 #include <linux/clk.h> 8 7 #include <linux/device.h> ··· 240 241 return -EINVAL; 241 242 } 242 243 243 - cif_conf.client_bits = TEGRA_ACIF_BITS_24; 244 - 245 244 switch (params_format(params)) { 246 245 case SNDRV_PCM_FORMAT_S16_LE: 247 246 cif_conf.audio_bits = TEGRA_ACIF_BITS_16; 247 + cif_conf.client_bits = TEGRA_ACIF_BITS_16; 248 248 break; 249 249 case SNDRV_PCM_FORMAT_S32_LE: 250 250 cif_conf.audio_bits = TEGRA_ACIF_BITS_32; 251 + cif_conf.client_bits = TEGRA_ACIF_BITS_24; 251 252 break; 252 253 default: 253 254 dev_err(dev, "unsupported format!\n");
+6 -6
sound/soc/ti/davinci-mcasp.c
··· 2417 2417 2418 2418 mcasp_reparent_fck(pdev); 2419 2419 2420 - ret = devm_snd_soc_register_component(&pdev->dev, &davinci_mcasp_component, 2421 - &davinci_mcasp_dai[mcasp->op_mode], 1); 2422 - 2423 - if (ret != 0) 2424 - goto err; 2425 - 2426 2420 ret = davinci_mcasp_get_dma_type(mcasp); 2427 2421 switch (ret) { 2428 2422 case PCM_EDMA: ··· 2442 2448 dev_err(&pdev->dev, "register PCM failed: %d\n", ret); 2443 2449 goto err; 2444 2450 } 2451 + 2452 + ret = devm_snd_soc_register_component(&pdev->dev, &davinci_mcasp_component, 2453 + &davinci_mcasp_dai[mcasp->op_mode], 1); 2454 + 2455 + if (ret != 0) 2456 + goto err; 2445 2457 2446 2458 no_audio: 2447 2459 ret = davinci_mcasp_init_gpiochip(mcasp);
+3 -3
sound/usb/line6/driver.c
··· 202 202 struct urb *urb; 203 203 204 204 /* create message: */ 205 - msg = kmalloc(sizeof(struct message), GFP_ATOMIC); 205 + msg = kzalloc(sizeof(struct message), GFP_ATOMIC); 206 206 if (msg == NULL) 207 207 return -ENOMEM; 208 208 ··· 688 688 int ret; 689 689 690 690 /* initialize USB buffers: */ 691 - line6->buffer_listen = kmalloc(LINE6_BUFSIZE_LISTEN, GFP_KERNEL); 691 + line6->buffer_listen = kzalloc(LINE6_BUFSIZE_LISTEN, GFP_KERNEL); 692 692 if (!line6->buffer_listen) 693 693 return -ENOMEM; 694 694 ··· 697 697 return -ENOMEM; 698 698 699 699 if (line6->properties->capabilities & LINE6_CAP_CONTROL_MIDI) { 700 - line6->buffer_message = kmalloc(LINE6_MIDI_MESSAGE_MAXLEN, GFP_KERNEL); 700 + line6->buffer_message = kzalloc(LINE6_MIDI_MESSAGE_MAXLEN, GFP_KERNEL); 701 701 if (!line6->buffer_message) 702 702 return -ENOMEM; 703 703
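Switching these allocations from kmalloc() to kzalloc() means the listen and message buffers start out zero-filled rather than containing stale heap data. As a hedged, standalone illustration of the semantics (assumes <linux/slab.h>; LINE6_BUFSIZE_LISTEN is the constant used in the hunk above):

    buf = kmalloc(LINE6_BUFSIZE_LISTEN, GFP_KERNEL);
    if (buf)
    	memset(buf, 0, LINE6_BUFSIZE_LISTEN);
    /* ...is roughly equivalent to the single call... */
    buf = kzalloc(LINE6_BUFSIZE_LISTEN, GFP_KERNEL);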
+2
tools/include/linux/btf_ids.h
··· 3 3 #ifndef _LINUX_BTF_IDS_H 4 4 #define _LINUX_BTF_IDS_H 5 5 6 + #include <linux/types.h> /* for u32 */ 7 + 6 8 struct btf_id_set { 7 9 u32 cnt; 8 10 u32 ids[];
+2 -2
tools/testing/selftests/kvm/aarch64/arch_timer.c
··· 135 135 136 136 irq_iter = READ_ONCE(shared_data->nr_iter); 137 137 __GUEST_ASSERT(config_iter + 1 == irq_iter, 138 - "config_iter + 1 = 0x%lx, irq_iter = 0x%lx.\n" 139 - " Guest timer interrupt was not trigged within the specified\n" 138 + "config_iter + 1 = 0x%x, irq_iter = 0x%x.\n" 139 + " Guest timer interrupt was not triggered within the specified\n" 140 140 " interval, try to increase the error margin by [-e] option.\n", 141 141 config_iter + 1, irq_iter); 142 142 }
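Besides the "trigged" to "triggered" spelling fix, the format strings now use %x because the iteration counters compared here are 32-bit values; %lx expects an unsigned long and warns under -Wformat on LP64 builds. A hedged standalone illustration (the variable name is made up):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
    	uint32_t irq_iter = 42;

    	/* matching conversion specifier for a 32-bit value */
    	printf("irq_iter = 0x%x\n", irq_iter);
    	return 0;
    }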
+11
tools/testing/selftests/kvm/include/x86_64/processor.h
··· 1037 1037 void vcpu_set_cpuid_property(struct kvm_vcpu *vcpu, 1038 1038 struct kvm_x86_cpu_property property, 1039 1039 uint32_t value); 1040 + void vcpu_set_cpuid_maxphyaddr(struct kvm_vcpu *vcpu, uint8_t maxphyaddr); 1040 1041 1041 1042 void vcpu_clear_cpuid_entry(struct kvm_vcpu *vcpu, uint32_t function); 1043 + 1044 + static inline bool vcpu_cpuid_has(struct kvm_vcpu *vcpu, 1045 + struct kvm_x86_cpu_feature feature) 1046 + { 1047 + struct kvm_cpuid_entry2 *entry; 1048 + 1049 + entry = __vcpu_get_cpuid_entry(vcpu, feature.function, feature.index); 1050 + return *((&entry->eax) + feature.reg) & BIT(feature.bit); 1051 + } 1052 + 1042 1053 void vcpu_set_or_clear_cpuid_feature(struct kvm_vcpu *vcpu, 1043 1054 struct kvm_x86_cpu_feature feature, 1044 1055 bool set);
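vcpu_cpuid_has() looks up the vCPU's CPUID entry for the feature's leaf and index and tests the corresponding register bit, so callers can assert on what is currently advertised to the guest; the new kvm_pv_test.c case below uses it to verify that KVM_FEATURE_PV_UNHALT appears and then disappears. A hedged usage sketch (the surrounding vCPU setup is assumed from the selftest framework):

    /* after vcpu creation and CPUID setup */
    if (vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT))
    	pr_info("KVM_FEATURE_PV_UNHALT advertised to the guest\n");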
+1 -1
tools/testing/selftests/kvm/riscv/arch_timer.c
··· 60 60 irq_iter = READ_ONCE(shared_data->nr_iter); 61 61 __GUEST_ASSERT(config_iter + 1 == irq_iter, 62 62 "config_iter + 1 = 0x%x, irq_iter = 0x%x.\n" 63 - " Guest timer interrupt was not trigged within the specified\n" 63 + " Guest timer interrupt was not triggered within the specified\n" 64 64 " interval, try to increase the error margin by [-e] option.\n", 65 65 config_iter + 1, irq_iter); 66 66 }
+39
tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
··· 133 133 } 134 134 } 135 135 136 + static void test_pv_unhalt(void) 137 + { 138 + struct kvm_vcpu *vcpu; 139 + struct kvm_vm *vm; 140 + struct kvm_cpuid_entry2 *ent; 141 + u32 kvm_sig_old; 142 + 143 + pr_info("testing KVM_FEATURE_PV_UNHALT\n"); 144 + 145 + TEST_REQUIRE(KVM_CAP_X86_DISABLE_EXITS); 146 + 147 + /* KVM_PV_UNHALT test */ 148 + vm = vm_create_with_one_vcpu(&vcpu, guest_main); 149 + vcpu_set_cpuid_feature(vcpu, X86_FEATURE_KVM_PV_UNHALT); 150 + 151 + TEST_ASSERT(vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT), 152 + "Enabling X86_FEATURE_KVM_PV_UNHALT had no effect"); 153 + 154 + /* Make sure KVM clears vcpu->arch.kvm_cpuid */ 155 + ent = vcpu_get_cpuid_entry(vcpu, KVM_CPUID_SIGNATURE); 156 + kvm_sig_old = ent->ebx; 157 + ent->ebx = 0xdeadbeef; 158 + vcpu_set_cpuid(vcpu); 159 + 160 + vm_enable_cap(vm, KVM_CAP_X86_DISABLE_EXITS, KVM_X86_DISABLE_EXITS_HLT); 161 + ent = vcpu_get_cpuid_entry(vcpu, KVM_CPUID_SIGNATURE); 162 + ent->ebx = kvm_sig_old; 163 + vcpu_set_cpuid(vcpu); 164 + 165 + TEST_ASSERT(!vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT), 166 + "KVM_FEATURE_PV_UNHALT is set with KVM_CAP_X86_DISABLE_EXITS"); 167 + 168 + /* FIXME: actually test KVM_FEATURE_PV_UNHALT feature */ 169 + 170 + kvm_vm_free(vm); 171 + } 172 + 136 173 int main(void) 137 174 { 138 175 struct kvm_vcpu *vcpu; ··· 188 151 189 152 enter_guest(vcpu); 190 153 kvm_vm_free(vm); 154 + 155 + test_pv_unhalt(); 191 156 }
+1 -1
tools/testing/selftests/mm/vm_util.h
··· 3 3 #include <stdbool.h> 4 4 #include <sys/mman.h> 5 5 #include <err.h> 6 - #include <string.h> /* ffsl() */ 6 + #include <strings.h> /* ffsl() */ 7 7 #include <unistd.h> /* _SC_PAGESIZE */ 8 8 9 9 #define BIT_ULL(nr) (1ULL << (nr))
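The header swap matters because ffsl() is declared in <strings.h> (as an extension alongside POSIX ffs()), not in <string.h>. A hedged standalone check:

    #include <stdio.h>
    #include <strings.h>   /* ffsl() */

    int main(void)
    {
    	printf("%d\n", ffsl(0x8));  /* prints 4: index of the lowest set bit, 1-based */
    	return 0;
    }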
+721 -72
tools/testing/selftests/net/bind_wildcard.c
··· 6 6 7 7 #include "../kselftest_harness.h" 8 8 9 - struct in6_addr in6addr_v4mapped_any = { 9 + static const __u32 in4addr_any = INADDR_ANY; 10 + static const __u32 in4addr_loopback = INADDR_LOOPBACK; 11 + static const struct in6_addr in6addr_v4mapped_any = { 10 12 .s6_addr = { 11 13 0, 0, 0, 0, 12 14 0, 0, 0, 0, ··· 16 14 0, 0, 0, 0 17 15 } 18 16 }; 19 - 20 - struct in6_addr in6addr_v4mapped_loopback = { 17 + static const struct in6_addr in6addr_v4mapped_loopback = { 21 18 .s6_addr = { 22 19 0, 0, 0, 0, 23 20 0, 0, 0, 0, ··· 25 24 } 26 25 }; 27 26 27 + #define NR_SOCKETS 8 28 + 28 29 FIXTURE(bind_wildcard) 29 30 { 30 - struct sockaddr_in addr4; 31 - struct sockaddr_in6 addr6; 31 + int fd[NR_SOCKETS]; 32 + socklen_t addrlen[NR_SOCKETS]; 33 + union { 34 + struct sockaddr addr; 35 + struct sockaddr_in addr4; 36 + struct sockaddr_in6 addr6; 37 + } addr[NR_SOCKETS]; 32 38 }; 33 39 34 40 FIXTURE_VARIANT(bind_wildcard) 35 41 { 36 - const __u32 addr4_const; 37 - const struct in6_addr *addr6_const; 38 - int expected_errno; 42 + sa_family_t family[2]; 43 + const void *addr[2]; 44 + bool ipv6_only[2]; 45 + 46 + /* 6 bind() calls below follow two bind() for the defined 2 addresses: 47 + * 48 + * 0.0.0.0 49 + * 127.0.0.1 50 + * :: 51 + * ::1 52 + * ::ffff:0.0.0.0 53 + * ::ffff:127.0.0.1 54 + */ 55 + int expected_errno[NR_SOCKETS]; 56 + int expected_reuse_errno[NR_SOCKETS]; 39 57 }; 40 58 59 + /* (IPv4, IPv4) */ 60 + FIXTURE_VARIANT_ADD(bind_wildcard, v4_any_v4_local) 61 + { 62 + .family = {AF_INET, AF_INET}, 63 + .addr = {&in4addr_any, &in4addr_loopback}, 64 + .expected_errno = {0, EADDRINUSE, 65 + EADDRINUSE, EADDRINUSE, 66 + EADDRINUSE, 0, 67 + EADDRINUSE, EADDRINUSE}, 68 + .expected_reuse_errno = {0, 0, 69 + EADDRINUSE, EADDRINUSE, 70 + EADDRINUSE, 0, 71 + EADDRINUSE, EADDRINUSE}, 72 + }; 73 + 74 + FIXTURE_VARIANT_ADD(bind_wildcard, v4_local_v4_any) 75 + { 76 + .family = {AF_INET, AF_INET}, 77 + .addr = {&in4addr_loopback, &in4addr_any}, 78 + .expected_errno = {0, EADDRINUSE, 79 + EADDRINUSE, EADDRINUSE, 80 + EADDRINUSE, 0, 81 + EADDRINUSE, EADDRINUSE}, 82 + .expected_reuse_errno = {0, 0, 83 + EADDRINUSE, EADDRINUSE, 84 + EADDRINUSE, 0, 85 + EADDRINUSE, EADDRINUSE}, 86 + }; 87 + 88 + /* (IPv4, IPv6) */ 41 89 FIXTURE_VARIANT_ADD(bind_wildcard, v4_any_v6_any) 42 90 { 43 - .addr4_const = INADDR_ANY, 44 - .addr6_const = &in6addr_any, 45 - .expected_errno = EADDRINUSE, 91 + .family = {AF_INET, AF_INET6}, 92 + .addr = {&in4addr_any, &in6addr_any}, 93 + .expected_errno = {0, EADDRINUSE, 94 + EADDRINUSE, EADDRINUSE, 95 + EADDRINUSE, 0, 96 + EADDRINUSE, EADDRINUSE}, 97 + .expected_reuse_errno = {0, 0, 98 + EADDRINUSE, EADDRINUSE, 99 + EADDRINUSE, EADDRINUSE, 100 + EADDRINUSE, EADDRINUSE}, 101 + }; 102 + 103 + FIXTURE_VARIANT_ADD(bind_wildcard, v4_any_v6_any_only) 104 + { 105 + .family = {AF_INET, AF_INET6}, 106 + .addr = {&in4addr_any, &in6addr_any}, 107 + .ipv6_only = {false, true}, 108 + .expected_errno = {0, 0, 109 + EADDRINUSE, EADDRINUSE, 110 + EADDRINUSE, EADDRINUSE, 111 + EADDRINUSE, EADDRINUSE}, 112 + .expected_reuse_errno = {0, 0, 113 + EADDRINUSE, EADDRINUSE, 114 + EADDRINUSE, EADDRINUSE, 115 + EADDRINUSE, EADDRINUSE}, 46 116 }; 47 117 48 118 FIXTURE_VARIANT_ADD(bind_wildcard, v4_any_v6_local) 49 119 { 50 - .addr4_const = INADDR_ANY, 51 - .addr6_const = &in6addr_loopback, 52 - .expected_errno = 0, 120 + .family = {AF_INET, AF_INET6}, 121 + .addr = {&in4addr_any, &in6addr_loopback}, 122 + .expected_errno = {0, 0, 123 + EADDRINUSE, EADDRINUSE, 124 + EADDRINUSE, EADDRINUSE, 125 + EADDRINUSE, 
EADDRINUSE}, 126 + .expected_reuse_errno = {0, 0, 127 + EADDRINUSE, EADDRINUSE, 128 + EADDRINUSE, EADDRINUSE, 129 + EADDRINUSE, EADDRINUSE}, 53 130 }; 54 131 55 132 FIXTURE_VARIANT_ADD(bind_wildcard, v4_any_v6_v4mapped_any) 56 133 { 57 - .addr4_const = INADDR_ANY, 58 - .addr6_const = &in6addr_v4mapped_any, 59 - .expected_errno = EADDRINUSE, 134 + .family = {AF_INET, AF_INET6}, 135 + .addr = {&in4addr_any, &in6addr_v4mapped_any}, 136 + .expected_errno = {0, EADDRINUSE, 137 + EADDRINUSE, EADDRINUSE, 138 + EADDRINUSE, 0, 139 + EADDRINUSE, EADDRINUSE}, 140 + .expected_reuse_errno = {0, 0, 141 + EADDRINUSE, EADDRINUSE, 142 + EADDRINUSE, 0, 143 + EADDRINUSE, EADDRINUSE}, 60 144 }; 61 145 62 146 FIXTURE_VARIANT_ADD(bind_wildcard, v4_any_v6_v4mapped_local) 63 147 { 64 - .addr4_const = INADDR_ANY, 65 - .addr6_const = &in6addr_v4mapped_loopback, 66 - .expected_errno = EADDRINUSE, 148 + .family = {AF_INET, AF_INET6}, 149 + .addr = {&in4addr_any, &in6addr_v4mapped_loopback}, 150 + .expected_errno = {0, EADDRINUSE, 151 + EADDRINUSE, EADDRINUSE, 152 + EADDRINUSE, 0, 153 + EADDRINUSE, EADDRINUSE}, 154 + .expected_reuse_errno = {0, 0, 155 + EADDRINUSE, EADDRINUSE, 156 + EADDRINUSE, 0, 157 + EADDRINUSE, EADDRINUSE}, 67 158 }; 68 159 69 160 FIXTURE_VARIANT_ADD(bind_wildcard, v4_local_v6_any) 70 161 { 71 - .addr4_const = INADDR_LOOPBACK, 72 - .addr6_const = &in6addr_any, 73 - .expected_errno = EADDRINUSE, 162 + .family = {AF_INET, AF_INET6}, 163 + .addr = {&in4addr_loopback, &in6addr_any}, 164 + .expected_errno = {0, EADDRINUSE, 165 + EADDRINUSE, EADDRINUSE, 166 + EADDRINUSE, 0, 167 + EADDRINUSE, EADDRINUSE}, 168 + .expected_reuse_errno = {0, 0, 169 + EADDRINUSE, EADDRINUSE, 170 + EADDRINUSE, EADDRINUSE, 171 + EADDRINUSE, EADDRINUSE}, 172 + }; 173 + 174 + FIXTURE_VARIANT_ADD(bind_wildcard, v4_local_v6_any_only) 175 + { 176 + .family = {AF_INET, AF_INET6}, 177 + .addr = {&in4addr_loopback, &in6addr_any}, 178 + .ipv6_only = {false, true}, 179 + .expected_errno = {0, 0, 180 + EADDRINUSE, EADDRINUSE, 181 + EADDRINUSE, EADDRINUSE, 182 + EADDRINUSE, EADDRINUSE}, 183 + .expected_reuse_errno = {0, 0, 184 + EADDRINUSE, EADDRINUSE, 185 + EADDRINUSE, EADDRINUSE, 186 + EADDRINUSE, EADDRINUSE}, 74 187 }; 75 188 76 189 FIXTURE_VARIANT_ADD(bind_wildcard, v4_local_v6_local) 77 190 { 78 - .addr4_const = INADDR_LOOPBACK, 79 - .addr6_const = &in6addr_loopback, 80 - .expected_errno = 0, 191 + .family = {AF_INET, AF_INET6}, 192 + .addr = {&in4addr_loopback, &in6addr_loopback}, 193 + .expected_errno = {0, 0, 194 + EADDRINUSE, EADDRINUSE, 195 + EADDRINUSE, EADDRINUSE, 196 + EADDRINUSE, EADDRINUSE}, 197 + .expected_reuse_errno = {0, 0, 198 + EADDRINUSE, EADDRINUSE, 199 + EADDRINUSE, EADDRINUSE, 200 + EADDRINUSE, EADDRINUSE}, 81 201 }; 82 202 83 203 FIXTURE_VARIANT_ADD(bind_wildcard, v4_local_v6_v4mapped_any) 84 204 { 85 - .addr4_const = INADDR_LOOPBACK, 86 - .addr6_const = &in6addr_v4mapped_any, 87 - .expected_errno = EADDRINUSE, 205 + .family = {AF_INET, AF_INET6}, 206 + .addr = {&in4addr_loopback, &in6addr_v4mapped_any}, 207 + .expected_errno = {0, EADDRINUSE, 208 + EADDRINUSE, EADDRINUSE, 209 + EADDRINUSE, 0, 210 + EADDRINUSE, EADDRINUSE}, 211 + .expected_reuse_errno = {0, 0, 212 + EADDRINUSE, EADDRINUSE, 213 + EADDRINUSE, 0, 214 + EADDRINUSE, EADDRINUSE}, 88 215 }; 89 216 90 217 FIXTURE_VARIANT_ADD(bind_wildcard, v4_local_v6_v4mapped_local) 91 218 { 92 - .addr4_const = INADDR_LOOPBACK, 93 - .addr6_const = &in6addr_v4mapped_loopback, 94 - .expected_errno = EADDRINUSE, 219 + .family = {AF_INET, AF_INET6}, 220 + .addr = 
{&in4addr_loopback, &in6addr_v4mapped_loopback}, 221 + .expected_errno = {0, EADDRINUSE, 222 + EADDRINUSE, EADDRINUSE, 223 + EADDRINUSE, 0, 224 + EADDRINUSE, EADDRINUSE}, 225 + .expected_reuse_errno = {0, 0, 226 + EADDRINUSE, EADDRINUSE, 227 + EADDRINUSE, 0, 228 + EADDRINUSE, EADDRINUSE}, 95 229 }; 230 + 231 + /* (IPv6, IPv4) */ 232 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_any_v4_any) 233 + { 234 + .family = {AF_INET6, AF_INET}, 235 + .addr = {&in6addr_any, &in4addr_any}, 236 + .expected_errno = {0, EADDRINUSE, 237 + EADDRINUSE, EADDRINUSE, 238 + EADDRINUSE, EADDRINUSE, 239 + EADDRINUSE, EADDRINUSE}, 240 + .expected_reuse_errno = {0, 0, 241 + EADDRINUSE, EADDRINUSE, 242 + EADDRINUSE, EADDRINUSE, 243 + EADDRINUSE, EADDRINUSE}, 244 + }; 245 + 246 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_any_only_v4_any) 247 + { 248 + .family = {AF_INET6, AF_INET}, 249 + .addr = {&in6addr_any, &in4addr_any}, 250 + .ipv6_only = {true, false}, 251 + .expected_errno = {0, 0, 252 + EADDRINUSE, EADDRINUSE, 253 + EADDRINUSE, EADDRINUSE, 254 + EADDRINUSE, EADDRINUSE}, 255 + .expected_reuse_errno = {0, 0, 256 + EADDRINUSE, EADDRINUSE, 257 + EADDRINUSE, EADDRINUSE, 258 + EADDRINUSE, EADDRINUSE}, 259 + }; 260 + 261 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_any_v4_local) 262 + { 263 + .family = {AF_INET6, AF_INET}, 264 + .addr = {&in6addr_any, &in4addr_loopback}, 265 + .expected_errno = {0, EADDRINUSE, 266 + EADDRINUSE, EADDRINUSE, 267 + EADDRINUSE, EADDRINUSE, 268 + EADDRINUSE, EADDRINUSE}, 269 + .expected_reuse_errno = {0, 0, 270 + EADDRINUSE, EADDRINUSE, 271 + EADDRINUSE, EADDRINUSE, 272 + EADDRINUSE, EADDRINUSE}, 273 + }; 274 + 275 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_any_only_v4_local) 276 + { 277 + .family = {AF_INET6, AF_INET}, 278 + .addr = {&in6addr_any, &in4addr_loopback}, 279 + .ipv6_only = {true, false}, 280 + .expected_errno = {0, 0, 281 + EADDRINUSE, EADDRINUSE, 282 + EADDRINUSE, EADDRINUSE, 283 + EADDRINUSE, EADDRINUSE}, 284 + .expected_reuse_errno = {0, 0, 285 + EADDRINUSE, EADDRINUSE, 286 + EADDRINUSE, EADDRINUSE, 287 + EADDRINUSE, EADDRINUSE}, 288 + }; 289 + 290 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_local_v4_any) 291 + { 292 + .family = {AF_INET6, AF_INET}, 293 + .addr = {&in6addr_loopback, &in4addr_any}, 294 + .expected_errno = {0, 0, 295 + EADDRINUSE, EADDRINUSE, 296 + EADDRINUSE, EADDRINUSE, 297 + EADDRINUSE, EADDRINUSE}, 298 + .expected_reuse_errno = {0, 0, 299 + EADDRINUSE, EADDRINUSE, 300 + EADDRINUSE, EADDRINUSE, 301 + EADDRINUSE, EADDRINUSE}, 302 + }; 303 + 304 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_local_v4_local) 305 + { 306 + .family = {AF_INET6, AF_INET}, 307 + .addr = {&in6addr_loopback, &in4addr_loopback}, 308 + .expected_errno = {0, 0, 309 + EADDRINUSE, EADDRINUSE, 310 + EADDRINUSE, EADDRINUSE, 311 + EADDRINUSE, EADDRINUSE}, 312 + .expected_reuse_errno = {0, 0, 313 + EADDRINUSE, EADDRINUSE, 314 + EADDRINUSE, EADDRINUSE, 315 + EADDRINUSE, EADDRINUSE}, 316 + }; 317 + 318 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_v4mapped_any_v4_any) 319 + { 320 + .family = {AF_INET6, AF_INET}, 321 + .addr = {&in6addr_v4mapped_any, &in4addr_any}, 322 + .expected_errno = {0, EADDRINUSE, 323 + EADDRINUSE, EADDRINUSE, 324 + EADDRINUSE, 0, 325 + EADDRINUSE, EADDRINUSE}, 326 + .expected_reuse_errno = {0, 0, 327 + EADDRINUSE, EADDRINUSE, 328 + EADDRINUSE, 0, 329 + EADDRINUSE, EADDRINUSE}, 330 + }; 331 + 332 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_v4mapped_any_v4_local) 333 + { 334 + .family = {AF_INET6, AF_INET}, 335 + .addr = {&in6addr_v4mapped_any, &in4addr_loopback}, 336 + .expected_errno = {0, EADDRINUSE, 337 
+ EADDRINUSE, EADDRINUSE, 338 + EADDRINUSE, 0, 339 + EADDRINUSE, EADDRINUSE}, 340 + .expected_reuse_errno = {0, 0, 341 + EADDRINUSE, EADDRINUSE, 342 + EADDRINUSE, 0, 343 + EADDRINUSE, EADDRINUSE}, 344 + }; 345 + 346 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_v4mapped_local_v4_any) 347 + { 348 + .family = {AF_INET6, AF_INET}, 349 + .addr = {&in6addr_v4mapped_loopback, &in4addr_any}, 350 + .expected_errno = {0, EADDRINUSE, 351 + EADDRINUSE, EADDRINUSE, 352 + EADDRINUSE, 0, 353 + EADDRINUSE, EADDRINUSE}, 354 + .expected_reuse_errno = {0, 0, 355 + EADDRINUSE, EADDRINUSE, 356 + EADDRINUSE, 0, 357 + EADDRINUSE, EADDRINUSE}, 358 + }; 359 + 360 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_v4mapped_local_v4_local) 361 + { 362 + .family = {AF_INET6, AF_INET}, 363 + .addr = {&in6addr_v4mapped_loopback, &in4addr_loopback}, 364 + .expected_errno = {0, EADDRINUSE, 365 + EADDRINUSE, EADDRINUSE, 366 + EADDRINUSE, 0, 367 + EADDRINUSE, EADDRINUSE}, 368 + .expected_reuse_errno = {0, 0, 369 + EADDRINUSE, EADDRINUSE, 370 + EADDRINUSE, 0, 371 + EADDRINUSE, EADDRINUSE}, 372 + }; 373 + 374 + /* (IPv6, IPv6) */ 375 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_any_v6_any) 376 + { 377 + .family = {AF_INET6, AF_INET6}, 378 + .addr = {&in6addr_any, &in6addr_any}, 379 + .expected_errno = {0, EADDRINUSE, 380 + EADDRINUSE, EADDRINUSE, 381 + EADDRINUSE, EADDRINUSE, 382 + EADDRINUSE, EADDRINUSE}, 383 + .expected_reuse_errno = {0, 0, 384 + EADDRINUSE, EADDRINUSE, 385 + EADDRINUSE, EADDRINUSE, 386 + EADDRINUSE, EADDRINUSE}, 387 + }; 388 + 389 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_any_only_v6_any) 390 + { 391 + .family = {AF_INET6, AF_INET6}, 392 + .addr = {&in6addr_any, &in6addr_any}, 393 + .ipv6_only = {true, false}, 394 + .expected_errno = {0, EADDRINUSE, 395 + 0, EADDRINUSE, 396 + EADDRINUSE, EADDRINUSE, 397 + EADDRINUSE, EADDRINUSE}, 398 + .expected_reuse_errno = {0, 0, 399 + EADDRINUSE, EADDRINUSE, 400 + EADDRINUSE, EADDRINUSE, 401 + EADDRINUSE, EADDRINUSE}, 402 + }; 403 + 404 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_any_v6_any_only) 405 + { 406 + .family = {AF_INET6, AF_INET6}, 407 + .addr = {&in6addr_any, &in6addr_any}, 408 + .ipv6_only = {false, true}, 409 + .expected_errno = {0, EADDRINUSE, 410 + EADDRINUSE, EADDRINUSE, 411 + EADDRINUSE, EADDRINUSE, 412 + EADDRINUSE, EADDRINUSE}, 413 + .expected_reuse_errno = {0, 0, 414 + EADDRINUSE, EADDRINUSE, 415 + EADDRINUSE, EADDRINUSE, 416 + EADDRINUSE, EADDRINUSE}, 417 + }; 418 + 419 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_any_only_v6_any_only) 420 + { 421 + .family = {AF_INET6, AF_INET6}, 422 + .addr = {&in6addr_any, &in6addr_any}, 423 + .ipv6_only = {true, true}, 424 + .expected_errno = {0, EADDRINUSE, 425 + 0, EADDRINUSE, 426 + EADDRINUSE, EADDRINUSE, 427 + EADDRINUSE, EADDRINUSE}, 428 + .expected_reuse_errno = {0, 0, 429 + 0, EADDRINUSE, 430 + EADDRINUSE, EADDRINUSE, 431 + EADDRINUSE, EADDRINUSE}, 432 + }; 433 + 434 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_any_v6_local) 435 + { 436 + .family = {AF_INET6, AF_INET6}, 437 + .addr = {&in6addr_any, &in6addr_loopback}, 438 + .expected_errno = {0, EADDRINUSE, 439 + EADDRINUSE, EADDRINUSE, 440 + EADDRINUSE, EADDRINUSE, 441 + EADDRINUSE, EADDRINUSE}, 442 + .expected_reuse_errno = {0, 0, 443 + EADDRINUSE, EADDRINUSE, 444 + EADDRINUSE, EADDRINUSE, 445 + EADDRINUSE, EADDRINUSE}, 446 + }; 447 + 448 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_any_only_v6_local) 449 + { 450 + .family = {AF_INET6, AF_INET6}, 451 + .addr = {&in6addr_any, &in6addr_loopback}, 452 + .ipv6_only = {true, false}, 453 + .expected_errno = {0, EADDRINUSE, 454 + 0, EADDRINUSE, 455 
+ EADDRINUSE, EADDRINUSE, 456 + EADDRINUSE, EADDRINUSE}, 457 + .expected_reuse_errno = {0, 0, 458 + 0, EADDRINUSE, 459 + EADDRINUSE, EADDRINUSE, 460 + EADDRINUSE, EADDRINUSE}, 461 + }; 462 + 463 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_any_v6_v4mapped_any) 464 + { 465 + .family = {AF_INET6, AF_INET6}, 466 + .addr = {&in6addr_any, &in6addr_v4mapped_any}, 467 + .expected_errno = {0, EADDRINUSE, 468 + EADDRINUSE, EADDRINUSE, 469 + EADDRINUSE, EADDRINUSE, 470 + EADDRINUSE, EADDRINUSE}, 471 + .expected_reuse_errno = {0, 0, 472 + EADDRINUSE, EADDRINUSE, 473 + EADDRINUSE, EADDRINUSE, 474 + EADDRINUSE, EADDRINUSE}, 475 + }; 476 + 477 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_any_only_v6_v4mapped_any) 478 + { 479 + .family = {AF_INET6, AF_INET6}, 480 + .addr = {&in6addr_any, &in6addr_v4mapped_any}, 481 + .ipv6_only = {true, false}, 482 + .expected_errno = {0, 0, 483 + EADDRINUSE, EADDRINUSE, 484 + EADDRINUSE, EADDRINUSE, 485 + EADDRINUSE, EADDRINUSE}, 486 + .expected_reuse_errno = {0, 0, 487 + EADDRINUSE, EADDRINUSE, 488 + EADDRINUSE, EADDRINUSE, 489 + EADDRINUSE, EADDRINUSE}, 490 + }; 491 + 492 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_any_v6_v4mapped_local) 493 + { 494 + .family = {AF_INET6, AF_INET6}, 495 + .addr = {&in6addr_any, &in6addr_v4mapped_loopback}, 496 + .expected_errno = {0, EADDRINUSE, 497 + EADDRINUSE, EADDRINUSE, 498 + EADDRINUSE, EADDRINUSE, 499 + EADDRINUSE, EADDRINUSE}, 500 + .expected_reuse_errno = {0, 0, 501 + EADDRINUSE, EADDRINUSE, 502 + EADDRINUSE, EADDRINUSE, 503 + EADDRINUSE, EADDRINUSE}, 504 + }; 505 + 506 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_any_only_v6_v4mapped_local) 507 + { 508 + .family = {AF_INET6, AF_INET6}, 509 + .addr = {&in6addr_any, &in6addr_v4mapped_loopback}, 510 + .ipv6_only = {true, false}, 511 + .expected_errno = {0, 0, 512 + EADDRINUSE, EADDRINUSE, 513 + EADDRINUSE, EADDRINUSE, 514 + EADDRINUSE, EADDRINUSE}, 515 + .expected_reuse_errno = {0, 0, 516 + EADDRINUSE, EADDRINUSE, 517 + EADDRINUSE, EADDRINUSE, 518 + EADDRINUSE, EADDRINUSE}, 519 + }; 520 + 521 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_local_v6_any) 522 + { 523 + .family = {AF_INET6, AF_INET6}, 524 + .addr = {&in6addr_loopback, &in6addr_any}, 525 + .expected_errno = {0, EADDRINUSE, 526 + 0, EADDRINUSE, 527 + EADDRINUSE, EADDRINUSE, 528 + EADDRINUSE, EADDRINUSE}, 529 + .expected_reuse_errno = {0, 0, 530 + EADDRINUSE, EADDRINUSE, 531 + EADDRINUSE, EADDRINUSE, 532 + EADDRINUSE, EADDRINUSE}, 533 + }; 534 + 535 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_local_v6_any_only) 536 + { 537 + .family = {AF_INET6, AF_INET6}, 538 + .addr = {&in6addr_loopback, &in6addr_any}, 539 + .ipv6_only = {false, true}, 540 + .expected_errno = {0, EADDRINUSE, 541 + 0, EADDRINUSE, 542 + EADDRINUSE, EADDRINUSE, 543 + EADDRINUSE, EADDRINUSE}, 544 + .expected_reuse_errno = {0, 0, 545 + 0, EADDRINUSE, 546 + EADDRINUSE, EADDRINUSE, 547 + EADDRINUSE, EADDRINUSE}, 548 + }; 549 + 550 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_local_v6_v4mapped_any) 551 + { 552 + .family = {AF_INET6, AF_INET6}, 553 + .addr = {&in6addr_loopback, &in6addr_v4mapped_any}, 554 + .expected_errno = {0, 0, 555 + EADDRINUSE, EADDRINUSE, 556 + EADDRINUSE, EADDRINUSE, 557 + EADDRINUSE, EADDRINUSE}, 558 + .expected_reuse_errno = {0, 0, 559 + EADDRINUSE, EADDRINUSE, 560 + EADDRINUSE, EADDRINUSE, 561 + EADDRINUSE, EADDRINUSE}, 562 + }; 563 + 564 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_local_v6_v4mapped_local) 565 + { 566 + .family = {AF_INET6, AF_INET6}, 567 + .addr = {&in6addr_loopback, &in6addr_v4mapped_loopback}, 568 + .expected_errno = {0, 0, 569 + EADDRINUSE, 
EADDRINUSE, 570 + EADDRINUSE, EADDRINUSE, 571 + EADDRINUSE, EADDRINUSE}, 572 + .expected_reuse_errno = {0, 0, 573 + EADDRINUSE, EADDRINUSE, 574 + EADDRINUSE, EADDRINUSE, 575 + EADDRINUSE, EADDRINUSE}, 576 + }; 577 + 578 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_v4mapped_any_v6_any) 579 + { 580 + .family = {AF_INET6, AF_INET6}, 581 + .addr = {&in6addr_v4mapped_any, &in6addr_any}, 582 + .expected_errno = {0, EADDRINUSE, 583 + EADDRINUSE, EADDRINUSE, 584 + EADDRINUSE, 0, 585 + EADDRINUSE, EADDRINUSE}, 586 + .expected_reuse_errno = {0, 0, 587 + EADDRINUSE, EADDRINUSE, 588 + EADDRINUSE, EADDRINUSE, 589 + EADDRINUSE, EADDRINUSE}, 590 + }; 591 + 592 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_v4mapped_any_v6_any_only) 593 + { 594 + .family = {AF_INET6, AF_INET6}, 595 + .addr = {&in6addr_v4mapped_any, &in6addr_any}, 596 + .ipv6_only = {false, true}, 597 + .expected_errno = {0, 0, 598 + EADDRINUSE, EADDRINUSE, 599 + EADDRINUSE, EADDRINUSE, 600 + EADDRINUSE, EADDRINUSE}, 601 + .expected_reuse_errno = {0, 0, 602 + EADDRINUSE, EADDRINUSE, 603 + EADDRINUSE, EADDRINUSE, 604 + EADDRINUSE, EADDRINUSE}, 605 + }; 606 + 607 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_v4mapped_any_v6_local) 608 + { 609 + .family = {AF_INET6, AF_INET6}, 610 + .addr = {&in6addr_v4mapped_any, &in6addr_loopback}, 611 + .expected_errno = {0, 0, 612 + EADDRINUSE, EADDRINUSE, 613 + EADDRINUSE, EADDRINUSE, 614 + EADDRINUSE, EADDRINUSE}, 615 + .expected_reuse_errno = {0, 0, 616 + EADDRINUSE, EADDRINUSE, 617 + EADDRINUSE, EADDRINUSE, 618 + EADDRINUSE, EADDRINUSE}, 619 + }; 620 + 621 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_v4mapped_any_v6_v4mapped_local) 622 + { 623 + .family = {AF_INET6, AF_INET6}, 624 + .addr = {&in6addr_v4mapped_any, &in6addr_v4mapped_loopback}, 625 + .expected_errno = {0, EADDRINUSE, 626 + EADDRINUSE, EADDRINUSE, 627 + EADDRINUSE, 0, 628 + EADDRINUSE, EADDRINUSE}, 629 + .expected_reuse_errno = {0, 0, 630 + EADDRINUSE, EADDRINUSE, 631 + EADDRINUSE, 0, 632 + EADDRINUSE, EADDRINUSE}, 633 + }; 634 + 635 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_v4mapped_loopback_v6_any) 636 + { 637 + .family = {AF_INET6, AF_INET6}, 638 + .addr = {&in6addr_v4mapped_loopback, &in6addr_any}, 639 + .expected_errno = {0, EADDRINUSE, 640 + EADDRINUSE, EADDRINUSE, 641 + EADDRINUSE, 0, 642 + EADDRINUSE, EADDRINUSE}, 643 + .expected_reuse_errno = {0, 0, 644 + EADDRINUSE, EADDRINUSE, 645 + EADDRINUSE, EADDRINUSE, 646 + EADDRINUSE, EADDRINUSE}, 647 + }; 648 + 649 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_v4mapped_loopback_v6_any_only) 650 + { 651 + .family = {AF_INET6, AF_INET6}, 652 + .addr = {&in6addr_v4mapped_loopback, &in6addr_any}, 653 + .ipv6_only = {false, true}, 654 + .expected_errno = {0, 0, 655 + EADDRINUSE, EADDRINUSE, 656 + EADDRINUSE, EADDRINUSE, 657 + EADDRINUSE, EADDRINUSE}, 658 + .expected_reuse_errno = {0, 0, 659 + EADDRINUSE, EADDRINUSE, 660 + EADDRINUSE, EADDRINUSE, 661 + EADDRINUSE, EADDRINUSE}, 662 + }; 663 + 664 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_v4mapped_loopback_v6_local) 665 + { 666 + .family = {AF_INET6, AF_INET6}, 667 + .addr = {&in6addr_v4mapped_loopback, &in6addr_loopback}, 668 + .expected_errno = {0, 0, 669 + EADDRINUSE, EADDRINUSE, 670 + EADDRINUSE, EADDRINUSE, 671 + EADDRINUSE, EADDRINUSE}, 672 + .expected_reuse_errno = {0, 0, 673 + EADDRINUSE, EADDRINUSE, 674 + EADDRINUSE, EADDRINUSE, 675 + EADDRINUSE, EADDRINUSE}, 676 + }; 677 + 678 + FIXTURE_VARIANT_ADD(bind_wildcard, v6_v4mapped_loopback_v6_v4mapped_any) 679 + { 680 + .family = {AF_INET6, AF_INET6}, 681 + .addr = {&in6addr_v4mapped_loopback, &in6addr_v4mapped_any}, 682 + 
.expected_errno = {0, EADDRINUSE, 683 + EADDRINUSE, EADDRINUSE, 684 + EADDRINUSE, 0, 685 + EADDRINUSE, EADDRINUSE}, 686 + .expected_reuse_errno = {0, 0, 687 + EADDRINUSE, EADDRINUSE, 688 + EADDRINUSE, 0, 689 + EADDRINUSE, EADDRINUSE}, 690 + }; 691 + 692 + static void setup_addr(FIXTURE_DATA(bind_wildcard) *self, int i, 693 + int family, const void *addr_const) 694 + { 695 + if (family == AF_INET) { 696 + struct sockaddr_in *addr4 = &self->addr[i].addr4; 697 + const __u32 *addr4_const = addr_const; 698 + 699 + addr4->sin_family = AF_INET; 700 + addr4->sin_port = htons(0); 701 + addr4->sin_addr.s_addr = htonl(*addr4_const); 702 + 703 + self->addrlen[i] = sizeof(struct sockaddr_in); 704 + } else { 705 + struct sockaddr_in6 *addr6 = &self->addr[i].addr6; 706 + const struct in6_addr *addr6_const = addr_const; 707 + 708 + addr6->sin6_family = AF_INET6; 709 + addr6->sin6_port = htons(0); 710 + addr6->sin6_addr = *addr6_const; 711 + 712 + self->addrlen[i] = sizeof(struct sockaddr_in6); 713 + } 714 + } 96 715 97 716 FIXTURE_SETUP(bind_wildcard) 98 717 { 99 - self->addr4.sin_family = AF_INET; 100 - self->addr4.sin_port = htons(0); 101 - self->addr4.sin_addr.s_addr = htonl(variant->addr4_const); 718 + setup_addr(self, 0, variant->family[0], variant->addr[0]); 719 + setup_addr(self, 1, variant->family[1], variant->addr[1]); 102 720 103 - self->addr6.sin6_family = AF_INET6; 104 - self->addr6.sin6_port = htons(0); 105 - self->addr6.sin6_addr = *variant->addr6_const; 721 + setup_addr(self, 2, AF_INET, &in4addr_any); 722 + setup_addr(self, 3, AF_INET, &in4addr_loopback); 723 + 724 + setup_addr(self, 4, AF_INET6, &in6addr_any); 725 + setup_addr(self, 5, AF_INET6, &in6addr_loopback); 726 + setup_addr(self, 6, AF_INET6, &in6addr_v4mapped_any); 727 + setup_addr(self, 7, AF_INET6, &in6addr_v4mapped_loopback); 106 728 } 107 729 108 730 FIXTURE_TEARDOWN(bind_wildcard) 109 731 { 732 + int i; 733 + 734 + for (i = 0; i < NR_SOCKETS; i++) 735 + close(self->fd[i]); 110 736 } 111 737 112 - void bind_sockets(struct __test_metadata *_metadata, 113 - FIXTURE_DATA(bind_wildcard) *self, 114 - int expected_errno, 115 - struct sockaddr *addr1, socklen_t addrlen1, 116 - struct sockaddr *addr2, socklen_t addrlen2) 738 + void bind_socket(struct __test_metadata *_metadata, 739 + FIXTURE_DATA(bind_wildcard) *self, 740 + const FIXTURE_VARIANT(bind_wildcard) *variant, 741 + int i, int reuse) 117 742 { 118 - int fd[2]; 119 743 int ret; 120 744 121 - fd[0] = socket(addr1->sa_family, SOCK_STREAM, 0); 122 - ASSERT_GT(fd[0], 0); 745 + self->fd[i] = socket(self->addr[i].addr.sa_family, SOCK_STREAM, 0); 746 + ASSERT_GT(self->fd[i], 0); 123 747 124 - ret = bind(fd[0], addr1, addrlen1); 125 - ASSERT_EQ(ret, 0); 126 - 127 - ret = getsockname(fd[0], addr1, &addrlen1); 128 - ASSERT_EQ(ret, 0); 129 - 130 - ((struct sockaddr_in *)addr2)->sin_port = ((struct sockaddr_in *)addr1)->sin_port; 131 - 132 - fd[1] = socket(addr2->sa_family, SOCK_STREAM, 0); 133 - ASSERT_GT(fd[1], 0); 134 - 135 - ret = bind(fd[1], addr2, addrlen2); 136 - if (expected_errno) { 137 - ASSERT_EQ(ret, -1); 138 - ASSERT_EQ(errno, expected_errno); 139 - } else { 748 + if (i < 2 && variant->ipv6_only[i]) { 749 + ret = setsockopt(self->fd[i], SOL_IPV6, IPV6_V6ONLY, &(int){1}, sizeof(int)); 140 750 ASSERT_EQ(ret, 0); 141 751 } 142 752 143 - close(fd[1]); 144 - close(fd[0]); 753 + if (i < 2 && reuse) { 754 + ret = setsockopt(self->fd[i], SOL_SOCKET, reuse, &(int){1}, sizeof(int)); 755 + ASSERT_EQ(ret, 0); 756 + } 757 + 758 + self->addr[i].addr4.sin_port = 
self->addr[0].addr4.sin_port; 759 + 760 + ret = bind(self->fd[i], &self->addr[i].addr, self->addrlen[i]); 761 + 762 + if (reuse) { 763 + if (variant->expected_reuse_errno[i]) { 764 + ASSERT_EQ(ret, -1); 765 + ASSERT_EQ(errno, variant->expected_reuse_errno[i]); 766 + } else { 767 + ASSERT_EQ(ret, 0); 768 + } 769 + } else { 770 + if (variant->expected_errno[i]) { 771 + ASSERT_EQ(ret, -1); 772 + ASSERT_EQ(errno, variant->expected_errno[i]); 773 + } else { 774 + ASSERT_EQ(ret, 0); 775 + } 776 + } 777 + 778 + if (i == 0) { 779 + ret = getsockname(self->fd[0], &self->addr[0].addr, &self->addrlen[0]); 780 + ASSERT_EQ(ret, 0); 781 + } 145 782 } 146 783 147 - TEST_F(bind_wildcard, v4_v6) 784 + TEST_F(bind_wildcard, plain) 148 785 { 149 - bind_sockets(_metadata, self, variant->expected_errno, 150 - (struct sockaddr *)&self->addr4, sizeof(self->addr4), 151 - (struct sockaddr *)&self->addr6, sizeof(self->addr6)); 786 + int i; 787 + 788 + for (i = 0; i < NR_SOCKETS; i++) 789 + bind_socket(_metadata, self, variant, i, 0); 152 790 } 153 791 154 - TEST_F(bind_wildcard, v6_v4) 792 + TEST_F(bind_wildcard, reuseaddr) 155 793 { 156 - bind_sockets(_metadata, self, variant->expected_errno, 157 - (struct sockaddr *)&self->addr6, sizeof(self->addr6), 158 - (struct sockaddr *)&self->addr4, sizeof(self->addr4)); 794 + int i; 795 + 796 + for (i = 0; i < NR_SOCKETS; i++) 797 + bind_socket(_metadata, self, variant, i, SO_REUSEADDR); 798 + } 799 + 800 + TEST_F(bind_wildcard, reuseport) 801 + { 802 + int i; 803 + 804 + for (i = 0; i < NR_SOCKETS; i++) 805 + bind_socket(_metadata, self, variant, i, SO_REUSEPORT); 159 806 } 160 807 161 808 TEST_HARNESS_MAIN
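The rewritten fixture now binds eight sockets per variant and tabulates the expected errno for each, both with and without SO_REUSEADDR/SO_REUSEPORT, instead of checking a single IPv4/IPv6 pair. The core behaviour being exercised is that an unrestricted IPv6 wildcard bind also claims the IPv4 port space unless IPV6_V6ONLY is set. A hedged, self-contained sketch of that conflict (the port number and error handling are illustrative only):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>

    int main(void)
    {
    	struct sockaddr_in6 a6 = { .sin6_family = AF_INET6,
    				   .sin6_addr = in6addr_any,
    				   .sin6_port = htons(7777) };
    	struct sockaddr_in a4 = { .sin_family = AF_INET,
    				  .sin_port = htons(7777) };
    	int fd6, fd4;

    	a4.sin_addr.s_addr = htonl(INADDR_ANY);

    	fd6 = socket(AF_INET6, SOCK_STREAM, 0);
    	bind(fd6, (struct sockaddr *)&a6, sizeof(a6));

    	fd4 = socket(AF_INET, SOCK_STREAM, 0);
    	if (bind(fd4, (struct sockaddr *)&a4, sizeof(a4)) < 0)
    		perror("second bind");  /* EADDRINUSE, as in the v6_any_v4_any variant above */

    	return 0;
    }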
+9
tools/testing/selftests/net/mptcp/mptcp_connect.sh
··· 383 383 local stat_cookierx_last 384 384 local stat_csum_err_s 385 385 local stat_csum_err_c 386 + local stat_tcpfb_last_l 386 387 stat_synrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX") 387 388 stat_ackrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX") 388 389 stat_cookietx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent") 389 390 stat_cookierx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv") 390 391 stat_csum_err_s=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtDataCsumErr") 391 392 stat_csum_err_c=$(mptcp_lib_get_counter "${connector_ns}" "MPTcpExtDataCsumErr") 393 + stat_tcpfb_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableFallbackACK") 392 394 393 395 timeout ${timeout_test} \ 394 396 ip netns exec ${listener_ns} \ ··· 459 457 local stat_cookietx_now 460 458 local stat_cookierx_now 461 459 local stat_ooo_now 460 + local stat_tcpfb_now_l 462 461 stat_synrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX") 463 462 stat_ackrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX") 464 463 stat_cookietx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent") 465 464 stat_cookierx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv") 466 465 stat_ooo_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtTCPOFOQueue") 466 + stat_tcpfb_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableFallbackACK") 467 467 468 468 expect_synrx=$((stat_synrx_last_l)) 469 469 expect_ackrx=$((stat_ackrx_last_l)) ··· 510 506 mptcp_lib_pr_fail "client got ${csum_err_c_nr} data checksum error[s]" 511 507 retc=1 512 508 fi 509 + fi 510 + 511 + if [ ${stat_ooo_now} -eq 0 ] && [ ${stat_tcpfb_last_l} -ne ${stat_tcpfb_now_l} ]; then 512 + mptcp_lib_pr_fail "unexpected fallback to TCP" 513 + rets=1 513 514 fi 514 515 515 516 if [ $cookies -eq 2 ];then
+3 -1
tools/testing/selftests/net/mptcp/mptcp_join.sh
··· 729 729 [ -n "$_flags" ]; flags="flags $_flags" 730 730 shift 731 731 elif [ $1 = "dev" ]; then 732 - [ -n "$2" ]; dev="dev $1" 732 + [ -n "$2" ]; dev="dev $2" 733 733 shift 734 734 elif [ $1 = "id" ]; then 735 735 _id=$2 ··· 3610 3610 local tests_pid=$! 3611 3611 3612 3612 wait_mpj $ns2 3613 + pm_nl_check_endpoint "creation" \ 3614 + $ns2 10.0.2.2 id 2 flags subflow dev ns2eth2 3613 3615 chk_subflow_nr "before delete" 2 3614 3616 chk_mptcp_info subflows 1 subflows 1 3615 3617
+1 -1
tools/testing/selftests/net/reuseaddr_conflict.c
··· 109 109 fd1 = open_port(0, 1); 110 110 if (fd1 >= 0) 111 111 error(1, 0, "Was allowed to create an ipv4 reuseport on an already bound non-reuseport socket with no ipv6"); 112 - fprintf(stderr, "Success"); 112 + fprintf(stderr, "Success\n"); 113 113 return 0; 114 114 }
+2 -8
tools/testing/selftests/net/udpgro_fwd.sh
··· 244 244 create_vxlan_pair 245 245 ip netns exec $NS_DST ethtool -K veth$DST generic-receive-offload on 246 246 ip netns exec $NS_DST ethtool -K veth$DST rx-gro-list on 247 - run_test "GRO frag list over UDP tunnel" $OL_NET$DST 1 1 247 + run_test "GRO frag list over UDP tunnel" $OL_NET$DST 10 10 248 248 cleanup 249 249 250 250 # use NAT to circumvent GRO FWD check ··· 258 258 # load arp cache before running the test to reduce the amount of 259 259 # stray traffic on top of the UDP tunnel 260 260 ip netns exec $NS_SRC $PING -q -c 1 $OL_NET$DST_NAT >/dev/null 261 - run_test "GRO fwd over UDP tunnel" $OL_NET$DST_NAT 1 1 $OL_NET$DST 262 - cleanup 263 - 264 - create_vxlan_pair 265 - run_bench "UDP tunnel fwd perf" $OL_NET$DST 266 - ip netns exec $NS_DST ethtool -K veth$DST rx-udp-gro-forwarding on 267 - run_bench "UDP tunnel GRO fwd perf" $OL_NET$DST 261 + run_test "GRO fwd over UDP tunnel" $OL_NET$DST_NAT 10 10 $OL_NET$DST 268 262 cleanup 269 263 done 270 264