Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase

As currently defined, initial_bytes is monotonically decreasing and
precedes dirty_bytes when reading from the saving file descriptor.
The transition from initial_bytes to dirty_bytes is unidirectional and
irreversible.

The initial_bytes are considered critical data that is highly
recommended to be transferred to the target as part of PRE_COPY; without
this data, the PRE_COPY phase would be ineffective.

This patch addresses the case where a new chunk of critical data is
introduced during the PRE_COPY phase and the driver would like to report
an entirely new value for the initial_bytes.

For that, we extend the VFIO_MIG_GET_PRECOPY_INFO ioctl with an output
flag named VFIO_PRECOPY_INFO_REINIT to allow drivers to report a new
initial_bytes value during the PRE_COPY phase.

Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations don't
assign info.flags before copy_to_user(). This effectively echoes
userspace-provided flags back as output, preventing the field from being
used to report new reliable data from the drivers.
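
The echo effect described above can be reproduced in a small userspace
sketch. Nothing here is driver code from this patch: struct precopy_info
and PRECOPY_INFO_REINIT are local stand-ins for the uAPI definitions, both
handlers are hypothetical, and memcpy() stands in for copy_from_user() and
copy_to_user().

```c
#include <stdint.h>
#include <string.h>

#define PRECOPY_INFO_REINIT (1u << 0) /* mirrors VFIO_PRECOPY_INFO_REINIT */

/* Local stand-in for struct vfio_precopy_info (same field order). */
struct precopy_info {
	uint32_t argsz;
	uint32_t flags;
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

/*
 * Hypothetical legacy handler: copies the request in, fills the byte
 * counters, and copies the result out without ever writing info.flags.
 */
static void legacy_get_precopy_info(const struct precopy_info *in,
				    struct precopy_info *out)
{
	struct precopy_info info;

	memcpy(&info, in, sizeof(info));	/* stands in for copy_from_user() */
	info.initial_bytes = 4096;
	info.dirty_bytes = 0;
	memcpy(out, &info, sizeof(info));	/* stands in for copy_to_user() */
}

/* Hypothetical opted-in handler: flags is always assigned by the driver. */
static void v2_get_precopy_info(const struct precopy_info *in,
				struct precopy_info *out, int reinit)
{
	struct precopy_info info;

	memcpy(&info, in, sizeof(info));
	info.flags = reinit ? PRECOPY_INFO_REINIT : 0;
	info.initial_bytes = 4096;
	info.dirty_bytes = 0;
	memcpy(out, &info, sizeof(info));
}
```

With the legacy handler, whatever garbage the caller left in flags comes
back unchanged, so a set bit 0 cannot be trusted. With the second handler
the output depends only on driver state.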

Reliable use of the new VFIO_PRECOPY_INFO_REINIT flag requires userspace
to explicitly opt in by enabling the
VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 device feature.
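
A rough sketch of what the opt-in could look like from userspace, using
the existing VFIO_DEVICE_FEATURE ioctl. The helper names are hypothetical,
and the sketch assumes the feature carries no payload on SET, which the
patch comment suggests but does not spell out. The fallback #define uses
the value 12 from this patch for headers that predate it.

```c
#include <linux/vfio.h>
#include <string.h>
#include <sys/ioctl.h>

#ifndef VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12 /* value from this patch */
#endif

/* Hypothetical helper: build a payload-less feature request for op. */
static void fill_precopy_v2_request(struct vfio_device_feature *feature,
				    __u32 op)
{
	memset(feature, 0, sizeof(*feature));
	feature->argsz = sizeof(*feature);
	feature->flags = op | VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2;
}

/*
 * Hypothetical helper: opt in to the v2 precopy info behaviour.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int enable_precopy_info_v2(int device_fd)
{
	struct vfio_device_feature feature;

	fill_precopy_v2_request(&feature, VFIO_DEVICE_FEATURE_SET);
	return ioctl(device_fd, VFIO_DEVICE_FEATURE, &feature);
}
```

A caller would likely treat a failure here as "flags is not a valid
output field" and ignore VFIO_PRECOPY_INFO_REINIT entirely.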

When the caller opts in, the driver may report an entirely new
value for initial_bytes. The new value may be larger or smaller than
the previous one, and it may include or discard the previously unread
initial_bytes, depending on driver logic and state.
The presence of the VFIO_PRECOPY_INFO_REINIT output flag set by the
driver indicates that new initial data is present on the stream.

Once the caller sees this flag, the initial_bytes value should be
re-evaluated relative to the readiness state for transition to
STOP_COPY.
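
One way userspace might re-evaluate readiness is sketched below. This is
an illustration under assumptions, not code from this patch or from any
VMM: struct precopy_info mirrors the uAPI layout, and struct
precopy_tracker, ready_for_stop_copy() and the dirty_threshold policy are
all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRECOPY_INFO_REINIT (1u << 0) /* mirrors VFIO_PRECOPY_INFO_REINIT */

/* Local stand-in for struct vfio_precopy_info (same field order). */
struct precopy_info {
	uint32_t argsz;
	uint32_t flags;
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

/* Hypothetical per-device state kept by the migration loop. */
struct precopy_tracker {
	bool initial_done;	/* has initial_bytes reached zero? */
};

/*
 * Decide whether leaving PRE_COPY for STOP_COPY looks reasonable, given
 * the latest result of a precopy info query.
 */
static bool ready_for_stop_copy(struct precopy_tracker *t,
				const struct precopy_info *info,
				uint64_t dirty_threshold)
{
	/* New initial data invalidates any earlier "initial done" verdict. */
	if (info->flags & PRECOPY_INFO_REINIT)
		t->initial_done = false;

	if (info->initial_bytes == 0)
		t->initial_done = true;

	return t->initial_done && info->dirty_bytes <= dirty_threshold;
}
```

The point of the tracker is that a caller which cached "initial data
fully read" must drop that conclusion when the flag appears, rather than
assuming the transition to dirty_bytes is irreversible.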

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/20260317161753.18964-2-yishaih@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>

Authored by Yishai Hadas and committed by Alex Williamson
d7140b5d 4f42d716

+24
include/uapi/linux/vfio.h
···
  * The initial_bytes field indicates the amount of initial precopy
  * data available from the device. This field should have a non-zero initial
  * value and decrease as migration data is read from the device.
+ * The presence of the VFIO_PRECOPY_INFO_REINIT output flag indicates
+ * that new initial data is present on the stream.
+ * The new initial data may result, for example, from device reconfiguration
+ * during migration that requires additional initialization data.
+ * In that case initial_bytes may report a non-zero value irrespective of
+ * any previously reported values, which progresses towards zero as precopy
+ * data is read from the data stream. dirty_bytes is also reset
+ * to zero and represents the state change of the device relative to the new
+ * initial_bytes.
+ * VFIO_PRECOPY_INFO_REINIT can be reported only after userspace opts in to
+ * VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2. Without this opt-in, the flags field
+ * of struct vfio_precopy_info is reserved for bug-compatibility reasons.
+ *
  * It is recommended to leave PRE_COPY for STOP_COPY only after this field
  * reaches zero. Leaving PRE_COPY earlier might make things slower.
  *
···
 struct vfio_precopy_info {
 	__u32 argsz;
 	__u32 flags;
+#define VFIO_PRECOPY_INFO_REINIT (1 << 0) /* output - new initial data is present */
 	__aligned_u64 initial_bytes;
 	__aligned_u64 dirty_bytes;
 };
···
 	__u32 nr_ranges;
 	struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
 };
+
+/*
+ * Enables the migration precopy_info_v2 behaviour.
+ *
+ * VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
+ *
+ * On SET, enables the v2 pre_copy_info behaviour, where the
+ * vfio_precopy_info.flags is a valid output field.
+ */
+#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
 
 /* -------- API for Type1 VFIO IOMMU -------- */