Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/ras: Introduce the DRM RAS infrastructure over generic netlink

Introduce the DRM RAS infrastructure over generic netlink.

The new interface allows drivers to expose RAS nodes and their
associated error counters to userspace in a structured and extensible
way. Each drm_ras node can register its own set of error counters, which
are then discoverable and queryable through netlink operations. This
lays the groundwork for reporting and managing hardware error states
in a unified manner across different DRM drivers.

Currently, only error-counter nodes are supported, but the interface
can be extended to other node types later.

Registration is not tied to any DRM device node, so accel devices
can use it as well.

It uses the new, mandatory YAML description format stored in
Documentation/netlink/specs/. This forces a single generic netlink
family namespace, "drm-ras", for the whole of DRM, although multiple
endpoints (nodes) are supported within that family.
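Because every driver shares the one "drm-ras" family, user space first has to translate that name into the numeric family ID that the kernel assigns when the family is registered. The sketch below (not part of this patch) builds the standard CTRL_CMD_GETFAMILY request for the generic netlink controller. build_getfamily_req() and RAS_FAMILY_NAME are illustrative names; sending the buffer on an AF_NETLINK/NETLINK_GENERIC socket and extracting CTRL_ATTR_FAMILY_ID from the reply are left out.

```c
#include <assert.h>
#include <linux/genetlink.h>
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

/* Same string as DRM_RAS_FAMILY_NAME in include/uapi/drm/drm_ras.h. */
#define RAS_FAMILY_NAME "drm-ras"

/*
 * Fill @buf (@cap bytes, 4-byte aligned) with a CTRL_CMD_GETFAMILY request
 * for the "drm-ras" family. Returns the total message length.
 */
static size_t build_getfamily_req(void *buf, size_t cap, uint32_t seq)
{
	struct nlmsghdr *nlh = buf;
	struct genlmsghdr *genl = NLMSG_DATA(nlh);
	struct nlattr *nla;
	size_t len = NLMSG_HDRLEN + GENL_HDRLEN;
	/* sizeof() of the literal includes the terminating NUL */
	size_t attr_len = NLA_HDRLEN + sizeof(RAS_FAMILY_NAME);

	assert(len + NLA_ALIGN(attr_len) <= cap);
	memset(buf, 0, cap);

	nlh->nlmsg_type = GENL_ID_CTRL;	/* the controller has a fixed ID */
	nlh->nlmsg_flags = NLM_F_REQUEST;
	nlh->nlmsg_seq = seq;

	genl->cmd = CTRL_CMD_GETFAMILY;
	genl->version = 1;

	/* Single attribute: the family name to look up */
	nla = (struct nlattr *)((char *)buf + len);
	nla->nla_type = CTRL_ATTR_FAMILY_NAME;
	nla->nla_len = attr_len;
	memcpy((char *)nla + NLA_HDRLEN, RAS_FAMILY_NAME, sizeof(RAS_FAMILY_NAME));

	len += NLA_ALIGN(attr_len);
	nlh->nlmsg_len = len;
	return len;
}
```

Tools such as ynl perform this lookup automatically, so it only matters for hand-written clients.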

Any modification to this API needs to be applied to
Documentation/netlink/specs/drm_ras.yaml before regenerating the
code:

$ tools/net/ynl/pyynl/ynl_gen_c.py --spec \
Documentation/netlink/specs/drm_ras.yaml --mode uapi --header \
-o include/uapi/drm/drm_ras.h

$ tools/net/ynl/pyynl/ynl_gen_c.py --spec \
Documentation/netlink/specs/drm_ras.yaml --mode kernel \
--header -o drivers/gpu/drm/drm_ras_nl.h

$ tools/net/ynl/pyynl/ynl_gen_c.py --spec \
Documentation/netlink/specs/drm_ras.yaml \
--mode kernel --source -o drivers/gpu/drm/drm_ras_nl.c

Cc: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: netdev@vger.kernel.org
Co-developed-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/20260304074412.464435-8-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

+853
+103
Documentation/gpu/drm-ras.rst
@@ -0,0 +1,103 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+============================
+DRM RAS over Generic Netlink
+============================
+
+The DRM RAS (Reliability, Availability, Serviceability) interface provides a
+standardized way for GPU/accelerator drivers to expose error counters and
+other reliability nodes to user space via Generic Netlink. This allows
+diagnostic tools, monitoring daemons, or test infrastructure to query hardware
+health in a uniform way across different DRM drivers.
+
+Key Goals:
+
+* Provide a standardized RAS solution for GPU and accelerator drivers, enabling
+  data center monitoring and reliability operations.
+* Implement a single drm-ras Generic Netlink family to meet modern Netlink YAML
+  specifications and centralize all RAS-related communication in one namespace.
+* Support a basic error counter interface, addressing the immediate, essential
+  monitoring needs.
+* Offer a flexible, future-proof interface that can be extended to support
+  additional types of RAS data in the future.
+* Allow multiple nodes per driver, enabling drivers to register separate
+  nodes for different IP blocks, sub-blocks, or other logical subdivisions
+  as applicable.
+
+Nodes
+=====
+
+Nodes are logical abstractions representing an error type or error source within
+the device. Currently, only error counter nodes are supported.
+
+Drivers are responsible for registering and unregistering nodes via the
+`drm_ras_node_register()` and `drm_ras_node_unregister()` APIs.
+
+Node Management
+---------------
+
+.. kernel-doc:: drivers/gpu/drm/drm_ras.c
+   :doc: DRM RAS Node Management
+.. kernel-doc:: drivers/gpu/drm/drm_ras.c
+   :internal:
+
+Generic Netlink Usage
+=====================
+
+The interface is implemented as a Generic Netlink family named ``drm-ras``.
+User space tools can:
+
+* List registered nodes with the ``list-nodes`` command.
+* List all error counters in a node with the ``get-error-counter`` command with ``node-id``
+  as a parameter.
+* Query specific error counter values with the ``get-error-counter`` command, using both
+  ``node-id`` and ``error-id`` as parameters.
+
+YAML-based Interface
+--------------------
+
+The interface is described in a YAML specification ``Documentation/netlink/specs/drm_ras.yaml``.
+
+This YAML is used to auto-generate user space bindings via
+``tools/net/ynl/pyynl/ynl_gen_c.py``, and drives the structure of netlink
+attributes and operations.
+
+Usage Notes
+-----------
+
+* User space must first enumerate nodes to obtain their IDs.
+* Node IDs are then used for all further queries, such as error counters.
+* Individual error counters are selected by their Error ID.
+* Query parameters should be defined as part of the uAPI to ensure user interface stability.
+* The interface supports future extension by adding new node types and
+  additional attributes.
+
+Example: List nodes using ynl
+
+.. code-block:: bash
+
+   sudo ynl --family drm_ras --dump list-nodes
+   [{'device-name': '0000:03:00.0',
+     'node-id': 0,
+     'node-name': 'correctable-errors',
+     'node-type': 'error-counter'},
+    {'device-name': '0000:03:00.0',
+     'node-id': 1,
+     'node-name': 'uncorrectable-errors',
+     'node-type': 'error-counter'}]
+
+Example: List all error counters using ynl
+
+.. code-block:: bash
+
+   sudo ynl --family drm_ras --dump get-error-counter --json '{"node-id":0}'
+   [{'error-id': 1, 'error-name': 'error_name1', 'error-value': 0},
+    {'error-id': 2, 'error-name': 'error_name2', 'error-value': 0}]
+
+Example: Query an error counter for a given node
+
+.. code-block:: bash
+
+   sudo ynl --family drm_ras --do get-error-counter --json '{"node-id":0, "error-id":1}'
+   {'error-id': 1, 'error-name': 'error_name1', 'error-value': 0}
+
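The ynl examples print already-decoded replies. On the wire, each list-nodes reply is a genl header followed by a flat stream of netlink attributes. The following is a minimal, hypothetical userspace decoder for that stream: the attribute IDs are copied from the uapi header added by this patch, while append_attr(), parse_node_attrs() and struct ras_node_info are illustrative names, not an existing library API.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

/* Attribute IDs copied from include/uapi/drm/drm_ras.h (node-attrs set). */
enum {
	DRM_RAS_A_NODE_ATTRS_NODE_ID = 1,
	DRM_RAS_A_NODE_ATTRS_DEVICE_NAME,
	DRM_RAS_A_NODE_ATTRS_NODE_NAME,
	DRM_RAS_A_NODE_ATTRS_NODE_TYPE,
};

/* Decoded view of one list-nodes reply; strings point into the buffer. */
struct ras_node_info {
	uint32_t node_id;
	uint32_t node_type;
	const char *device_name;
	const char *node_name;
};

/* Append one attribute at @off in @buf; returns the next aligned offset. */
static size_t append_attr(void *buf, size_t off, uint16_t type,
			  const void *data, uint16_t data_len)
{
	struct nlattr *nla = (struct nlattr *)((char *)buf + off);

	assert(off % NLA_ALIGNTO == 0);	/* attributes start 4-byte aligned */
	nla->nla_type = type;
	nla->nla_len = NLA_HDRLEN + data_len;
	memcpy((char *)nla + NLA_HDRLEN, data, data_len);
	return off + NLA_ALIGN(nla->nla_len);
}

/*
 * Decode the attribute stream that follows the genlmsghdr of one reply.
 * Unknown attribute types are skipped. Returns 0, or -1 if malformed.
 */
static int parse_node_attrs(const void *payload, size_t len,
			    struct ras_node_info *out)
{
	const char *p = payload;

	memset(out, 0, sizeof(*out));
	while (len >= (size_t)NLA_HDRLEN) {
		const struct nlattr *nla = (const struct nlattr *)p;
		const char *data = p + NLA_HDRLEN;
		size_t alen = nla->nla_len;
		size_t dlen;
		int type = nla->nla_type & NLA_TYPE_MASK;

		if (alen < (size_t)NLA_HDRLEN || alen > len)
			return -1;
		dlen = alen - NLA_HDRLEN;

		switch (type) {
		case DRM_RAS_A_NODE_ATTRS_NODE_ID:
		case DRM_RAS_A_NODE_ATTRS_NODE_TYPE:
			if (dlen != sizeof(uint32_t))
				return -1;
			memcpy(type == DRM_RAS_A_NODE_ATTRS_NODE_ID ?
			       &out->node_id : &out->node_type, data, dlen);
			break;
		case DRM_RAS_A_NODE_ATTRS_DEVICE_NAME:
		case DRM_RAS_A_NODE_ATTRS_NODE_NAME:
			/* nla_put_string() sends the terminating NUL too */
			if (dlen == 0 || data[dlen - 1] != '\0')
				return -1;
			if (type == DRM_RAS_A_NODE_ATTRS_DEVICE_NAME)
				out->device_name = data;
			else
				out->node_name = data;
			break;
		default:
			break;	/* unknown attribute: ignore */
		}

		/* Attributes are padded to 4 bytes; the last may be unpadded */
		alen = NLA_ALIGN(alen);
		if (alen > len)
			alen = len;
		p += alen;
		len -= alen;
	}
	return 0;
}
```

Skipping unknown attribute types, instead of rejecting them, keeps such a client working if later kernels add attributes as the usage notes anticipate.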
+1
Documentation/gpu/index.rst
@@ -9,6 +9,7 @@
    drm-mm
    drm-kms
    drm-kms-helpers
+   drm-ras
    drm-uapi
    drm-usage-stats
    driver-uapi
+115
Documentation/netlink/specs/drm_ras.yaml
@@ -0,0 +1,115 @@
+# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+---
+name: drm-ras
+protocol: genetlink
+uapi-header: drm/drm_ras.h
+
+doc: >-
+  DRM RAS (Reliability, Availability, Serviceability) over Generic Netlink.
+  Provides a standardized mechanism for DRM drivers to register "nodes"
+  representing hardware/software components capable of reporting error counters.
+  Userspace tools can query the list of nodes or individual error counters
+  via the Generic Netlink interface.
+
+definitions:
+  -
+    type: enum
+    name: node-type
+    value-start: 1
+    entries: [error-counter]
+    doc: >-
+      Type of the node. Currently, only error-counter nodes are
+      supported, which expose reliability counters for a hardware/software
+      component.
+
+attribute-sets:
+  -
+    name: node-attrs
+    attributes:
+      -
+        name: node-id
+        type: u32
+        doc: >-
+          Unique identifier for the node.
+          Assigned dynamically by the DRM RAS core upon registration.
+      -
+        name: device-name
+        type: string
+        doc: >-
+          Device name chosen by the driver at registration.
+          Can be a PCI BDF, UUID, or module name if unique.
+      -
+        name: node-name
+        type: string
+        doc: >-
+          Node name chosen by the driver at registration.
+          Can be an IP block name, or any name that identifies the
+          RAS node inside the device.
+      -
+        name: node-type
+        type: u32
+        doc: Type of this node, identifying its function.
+        enum: node-type
+  -
+    name: error-counter-attrs
+    attributes:
+      -
+        name: node-id
+        type: u32
+        doc: Node ID targeted by this error counter operation.
+      -
+        name: error-id
+        type: u32
+        doc: Unique identifier for a specific error counter within a node.
+      -
+        name: error-name
+        type: string
+        doc: Name of the error.
+      -
+        name: error-value
+        type: u32
+        doc: Current value of the requested error counter.
+
+operations:
+  list:
+    -
+      name: list-nodes
+      doc: >-
+        Retrieve the full list of currently registered DRM RAS nodes.
+        Each node includes its dynamically assigned ID, name, and type.
+        **Important:** User space must call this operation first to obtain
+        the node IDs. These IDs are required for all subsequent
+        operations on nodes, such as querying error counters.
+      attribute-set: node-attrs
+      flags: [admin-perm]
+      dump:
+        reply:
+          attributes:
+            - node-id
+            - device-name
+            - node-name
+            - node-type
+    -
+      name: get-error-counter
+      doc: >-
+        Retrieve error counters for a given node.
+        The response includes the ID, the name, and the current
+        value of each counter.
+      attribute-set: error-counter-attrs
+      flags: [admin-perm]
+      do:
+        request:
+          attributes:
+            - node-id
+            - error-id
+        reply:
+          attributes: &errorinfo
+            - error-id
+            - error-name
+            - error-value
+      dump:
+        request:
+          attributes:
+            - node-id
+        reply:
+          attributes: *errorinfo
+10
drivers/gpu/drm/Kconfig
@@ -130,6 +130,16 @@
 	  Smaller QR code are easier to read, but will contain less debugging
 	  data. Default is 40.
 
+config DRM_RAS
+	bool "DRM RAS support"
+	depends on DRM
+	depends on NET
+	help
+	  Enables the DRM RAS (Reliability, Availability and Serviceability)
+	  support for DRM drivers. This provides a Generic Netlink interface
+	  for error reporting and queries.
+	  If in doubt, say "N".
+
 config DRM_DEBUG_DP_MST_TOPOLOGY_REFS
 	bool "Enable refcount backtrace history in the DP MST helpers"
 	depends on STACKTRACE_SUPPORT
+1
drivers/gpu/drm/Makefile
@@ -95,6 +95,7 @@
 drm-$(CONFIG_DRM_PANIC) += drm_panic.o
 drm-$(CONFIG_DRM_DRAW) += drm_draw.o
 drm-$(CONFIG_DRM_PANIC_SCREEN_QR_CODE) += drm_panic_qr.o
+drm-$(CONFIG_DRM_RAS) += drm_ras.o drm_ras_nl.o drm_ras_genl_family.o
 obj-$(CONFIG_DRM) += drm.o
 
 obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
+6
drivers/gpu/drm/drm_drv.c
@@ -53,6 +53,7 @@
 #include <drm/drm_panic.h>
 #include <drm/drm_print.h>
 #include <drm/drm_privacy_screen_machine.h>
+#include <drm/drm_ras_genl_family.h>
 
 #include "drm_crtc_internal.h"
 #include "drm_internal.h"
@@ -1224,6 +1223,7 @@
 
 static void drm_core_exit(void)
 {
+	drm_ras_genl_family_unregister();
 	drm_privacy_screen_lookup_exit();
 	drm_panic_exit();
 	accel_core_exit();
@@ -1262,6 +1260,10 @@
 	drm_panic_init();
 
 	drm_privacy_screen_lookup_init();
+
+	ret = drm_ras_genl_family_register();
+	if (ret < 0)
+		goto error;
 
 	drm_core_init_complete = true;
 
+354
drivers/gpu/drm/drm_ras.c
@@ -0,0 +1,354 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <linux/xarray.h>
+#include <net/genetlink.h>
+
+#include <drm/drm_ras.h>
+
+#include "drm_ras_nl.h"
+
+/**
+ * DOC: DRM RAS Node Management
+ *
+ * This module provides the infrastructure to manage RAS (Reliability,
+ * Availability, and Serviceability) nodes for DRM drivers. Each
+ * DRM driver may register one or more RAS nodes, which represent
+ * logical components capable of reporting error counters and other
+ * reliability metrics.
+ *
+ * The nodes are stored in a global xarray `drm_ras_xa` to allow
+ * efficient lookup by ID. Nodes can be registered or unregistered
+ * dynamically at runtime.
+ *
+ * A Generic Netlink family `drm-ras` exposes two main operations to
+ * userspace:
+ *
+ * 1. LIST_NODES: Dump all currently registered RAS nodes.
+ *    The user receives an array of node IDs, names, and types.
+ *
+ * 2. GET_ERROR_COUNTER: Get error counters of a given node.
+ *    Userspace must provide the Node ID and, optionally, an Error ID.
+ *    Returns all counters of the node if only the Node ID is provided,
+ *    or the single requested counter otherwise.
+ *
+ * Node registration:
+ *
+ * - drm_ras_node_register(): Registers a new node and assigns
+ *   it a unique ID in the xarray.
+ * - drm_ras_node_unregister(): Removes a previously registered
+ *   node from the xarray.
+ *
+ * Node type:
+ *
+ * - ERROR_COUNTER:
+ *     + Currently, only error counters are supported.
+ *     + The driver must implement the query_error_counter() callback to provide
+ *       the name and the value of the error counter.
+ *     + The driver must provide an error_counter_range.last value informing the
+ *       last valid error ID.
+ *     + The driver can provide an error_counter_range.first value informing the
+ *       first valid error ID.
+ *     + The error counters in the driver don't need to be contiguous, but the
+ *       driver must return -ENOENT from query_error_counter() as an indication
+ *       that the ID should be skipped and not listed in the netlink API.
+ *
+ * Netlink handlers:
+ *
+ * - drm_ras_nl_list_nodes_dumpit(): Implements the LIST_NODES
+ *   operation, iterating over the xarray.
+ * - drm_ras_nl_get_error_counter_dumpit(): Implements the GET_ERROR_COUNTER dumpit
+ *   operation, fetching all counters from a specific node.
+ * - drm_ras_nl_get_error_counter_doit(): Implements the GET_ERROR_COUNTER doit
+ *   operation, fetching a counter value from a specific node.
+ */
+
+static DEFINE_XARRAY_ALLOC(drm_ras_xa);
+
+/*
+ * The netlink callback context carries dump state across multiple dumpit calls
+ */
+struct drm_ras_ctx {
+	/* Which xarray id to restart the dump from */
+	unsigned long restart;
+};
+
+/**
+ * drm_ras_nl_list_nodes_dumpit() - Dump all registered RAS nodes
+ * @skb: Netlink message buffer
+ * @cb: Callback context for multi-part dumps
+ *
+ * Iterates over all registered RAS nodes in the global xarray and appends
+ * their attributes (ID, device name, node name, type) to @skb.
+ * Uses @cb->ctx to track progress in case the message buffer fills up, allowing
+ * multi-part dump support. On buffer overflow, updates the context to resume
+ * from the node that did not fit on the next invocation.
+ *
+ * Return: 0 once all nodes have been dumped, -EMSGSIZE if @skb filled up
+ *         before the dump completed (it resumes from @cb->ctx on the next
+ *         invocation), or another negative error code on failure.
+ */
+int drm_ras_nl_list_nodes_dumpit(struct sk_buff *skb,
+				 struct netlink_callback *cb)
+{
+	const struct genl_info *info = genl_info_dump(cb);
+	struct drm_ras_ctx *ctx = (void *)cb->ctx;
+	struct drm_ras_node *node;
+	struct nlattr *hdr;
+	unsigned long id;
+	int ret = 0;
+
+	xa_for_each_start(&drm_ras_xa, id, node, ctx->restart) {
+		hdr = genlmsg_iput(skb, info);
+		if (!hdr) {
+			ret = -EMSGSIZE;
+			break;
+		}
+
+		ret = nla_put_u32(skb, DRM_RAS_A_NODE_ATTRS_NODE_ID, node->id);
+		if (ret) {
+			genlmsg_cancel(skb, hdr);
+			break;
+		}
+
+		ret = nla_put_string(skb, DRM_RAS_A_NODE_ATTRS_DEVICE_NAME,
+				     node->device_name);
+		if (ret) {
+			genlmsg_cancel(skb, hdr);
+			break;
+		}
+
+		ret = nla_put_string(skb, DRM_RAS_A_NODE_ATTRS_NODE_NAME,
+				     node->node_name);
+		if (ret) {
+			genlmsg_cancel(skb, hdr);
+			break;
+		}
+
+		ret = nla_put_u32(skb, DRM_RAS_A_NODE_ATTRS_NODE_TYPE,
+				  node->type);
+		if (ret) {
+			genlmsg_cancel(skb, hdr);
+			break;
+		}
+
+		genlmsg_end(skb, hdr);
+	}
+
+	if (ret == -EMSGSIZE)
+		ctx->restart = id;
+
+	return ret;
+}
+
+static int get_node_error_counter(u32 node_id, u32 error_id,
+				  const char **name, u32 *value)
+{
+	struct drm_ras_node *node;
+
+	node = xa_load(&drm_ras_xa, node_id);
+	if (!node || !node->query_error_counter)
+		return -ENOENT;
+
+	if (error_id < node->error_counter_range.first ||
+	    error_id > node->error_counter_range.last)
+		return -EINVAL;
+
+	return node->query_error_counter(node, error_id, name, value);
+}
+
+static int msg_reply_value(struct sk_buff *msg, u32 error_id,
+			   const char *error_name, u32 value)
+{
+	int ret;
+
+	ret = nla_put_u32(msg, DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID, error_id);
+	if (ret)
+		return ret;
+
+	ret = nla_put_string(msg, DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_NAME,
+			     error_name);
+	if (ret)
+		return ret;
+
+	return nla_put_u32(msg, DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_VALUE,
+			   value);
+}
+
+static int doit_reply_value(struct genl_info *info, u32 node_id,
+			    u32 error_id)
+{
+	struct sk_buff *msg;
+	struct nlattr *hdr;
+	const char *error_name;
+	u32 value;
+	int ret;
+
+	ret = get_node_error_counter(node_id, error_id,
+				     &error_name, &value);
+	if (ret)
+		return ret;
+
+	msg = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	hdr = genlmsg_iput(msg, info);
+	if (!hdr) {
+		nlmsg_free(msg);
+		return -EMSGSIZE;
+	}
+
+	ret = msg_reply_value(msg, error_id, error_name, value);
+	if (ret) {
+		genlmsg_cancel(msg, hdr);
+		nlmsg_free(msg);
+		return ret;
+	}
+
+	genlmsg_end(msg, hdr);
+
+	return genlmsg_reply(msg, info);
+}
+
+/**
+ * drm_ras_nl_get_error_counter_dumpit() - Dump all Error Counters
+ * @skb: Netlink message buffer
+ * @cb: Callback context for multi-part dumps
+ *
+ * Iterates over all error counters in a given Node and appends
+ * their attributes (ID, name, value) to the given netlink message buffer.
+ * Uses @cb->ctx to track progress in case the message buffer fills up, allowing
+ * multi-part dump support. On buffer overflow, updates the context to resume
+ * from the error counter that did not fit on the next invocation.
+ *
+ * Return: 0 once all error counters have been dumped, -EMSGSIZE if @skb
+ *         filled up before the dump completed (it resumes from @cb->ctx on
+ *         the next invocation), or another negative error code on failure.
+ */
+int drm_ras_nl_get_error_counter_dumpit(struct sk_buff *skb,
+					struct netlink_callback *cb)
+{
+	const struct genl_info *info = genl_info_dump(cb);
+	struct drm_ras_ctx *ctx = (void *)cb->ctx;
+	struct drm_ras_node *node;
+	struct nlattr *hdr;
+	const char *error_name;
+	u32 node_id, error_id, value;
+	int ret = 0;
+
+	if (!info->attrs || GENL_REQ_ATTR_CHECK(info, DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID))
+		return -EINVAL;
+
+	node_id = nla_get_u32(info->attrs[DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID]);
+
+	node = xa_load(&drm_ras_xa, node_id);
+	if (!node)
+		return -ENOENT;
+
+	for (error_id = max(node->error_counter_range.first, ctx->restart);
+	     error_id <= node->error_counter_range.last;
+	     error_id++) {
+		ret = get_node_error_counter(node_id, error_id,
+					     &error_name, &value);
+		/*
+		 * For non-contiguous ranges, the driver returns -ENOENT as an
+		 * indication to skip this ID when listing all errors.
+		 */
+		if (ret == -ENOENT)
+			continue;
+		if (ret)
+			return ret;
+
+		hdr = genlmsg_iput(skb, info);
+		if (!hdr) {
+			ret = -EMSGSIZE;
+			break;
+		}
+
+		ret = msg_reply_value(skb, error_id, error_name, value);
+		if (ret) {
+			genlmsg_cancel(skb, hdr);
+			break;
+		}
+
+		genlmsg_end(skb, hdr);
+	}
+
+	if (ret == -EMSGSIZE)
+		ctx->restart = error_id;
+
+	/* A skipped (-ENOENT) final ID must not fail the whole dump */
+	return ret == -ENOENT ? 0 : ret;
+}
+
+/**
+ * drm_ras_nl_get_error_counter_doit() - Query an error counter of a node
+ * @skb: Netlink message buffer
+ * @info: Generic Netlink info containing attributes of the request
+ *
+ * Extracts the node ID and error ID from the netlink attributes and
+ * retrieves the current value of the corresponding error counter. Sends the
+ * result back to the requesting user via the standard Genl reply.
+ *
+ * Return: 0 on success, or negative errno on failure.
+ */
+int drm_ras_nl_get_error_counter_doit(struct sk_buff *skb,
+				      struct genl_info *info)
+{
+	u32 node_id, error_id;
+
+	if (!info->attrs ||
+	    GENL_REQ_ATTR_CHECK(info, DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID) ||
+	    GENL_REQ_ATTR_CHECK(info, DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID))
+		return -EINVAL;
+
+	node_id = nla_get_u32(info->attrs[DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID]);
+	error_id = nla_get_u32(info->attrs[DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID]);
+
+	return doit_reply_value(info, node_id, error_id);
+}
+
+/**
+ * drm_ras_node_register() - Register a new RAS node
+ * @node: Node structure to register
+ *
+ * Adds the given RAS node to the global node xarray and assigns it a unique
+ * ID. @node->device_name, @node->node_name and @node->type must be valid.
+ *
+ * Return: 0 on success, or negative errno on failure.
+ */
+int drm_ras_node_register(struct drm_ras_node *node)
+{
+	if (!node->device_name || !node->node_name)
+		return -EINVAL;
+
+	/* Currently, only Error Counter Endpoints are supported */
+	if (node->type != DRM_RAS_NODE_TYPE_ERROR_COUNTER)
+		return -EINVAL;
+
+	/* Mandatory entries for Error Counter Node */
+	if (node->type == DRM_RAS_NODE_TYPE_ERROR_COUNTER &&
+	    (!node->error_counter_range.last || !node->query_error_counter))
+		return -EINVAL;
+
+	return xa_alloc(&drm_ras_xa, &node->id, node, xa_limit_32b, GFP_KERNEL);
+}
+EXPORT_SYMBOL(drm_ras_node_register);
+
+/**
+ * drm_ras_node_unregister() - Unregister a previously registered node
+ * @node: Node structure to unregister
+ *
+ * Removes the given node from the global node xarray using its ID.
+ */
+void drm_ras_node_unregister(struct drm_ras_node *node)
+{
+	xa_erase(&drm_ras_xa, node->id);
+}
+EXPORT_SYMBOL(drm_ras_node_unregister);
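Both dumpit handlers keep a restart cursor in cb->ctx so that a listing larger than one skb is emitted over several calls. The userspace simulation below sketches only that control flow for the error-counter dump: mock_query(), struct mock_skb, struct dump_ctx and dump_once() are hypothetical stand-ins, the "skb" holds three replies per call, and IDs 4, 7 and 10 play the role of unsupported counters. It also shows why an -ENOENT from the last ID in the range has to be mapped to 0 instead of being returned as the result of the final pass.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical driver query: IDs 4, 7 and 10 are holes in the range. */
static int mock_query(uint32_t id, uint32_t *val)
{
	if (id == 4 || id == 7 || id == 10)
		return -ENOENT;
	*val = id * 10;
	return 0;
}

/* Stand-in for an skb: room for at most @cap replies per dumpit call. */
struct mock_skb {
	uint32_t ids[8];
	uint32_t vals[8];
	unsigned int used;
	unsigned int cap;
};

/* Stand-in for struct drm_ras_ctx kept in cb->ctx between calls. */
struct dump_ctx {
	unsigned long restart;
};

/*
 * One dumpit pass over [first, last] with the same control flow as
 * drm_ras_nl_get_error_counter_dumpit(): skip -ENOENT, stop with -EMSGSIZE
 * when full and record the ID to resume from. Returns 0 when finished.
 * Assumes last < UINT32_MAX so the loop counter cannot wrap.
 */
static int dump_once(struct mock_skb *skb, struct dump_ctx *ctx,
		     uint32_t first, uint32_t last)
{
	uint32_t id, val;
	int ret = 0;

	assert(skb->cap <= sizeof(skb->ids) / sizeof(skb->ids[0]));
	for (id = ctx->restart > first ? (uint32_t)ctx->restart : first;
	     id <= last; id++) {
		ret = mock_query(id, &val);
		if (ret == -ENOENT)
			continue;
		if (ret)
			return ret;
		if (skb->used == skb->cap) {
			ret = -EMSGSIZE;	/* this ID did not fit */
			break;
		}
		skb->ids[skb->used] = id;
		skb->vals[skb->used++] = val;
	}

	if (ret == -EMSGSIZE)
		ctx->restart = id;	/* retry this ID on the next call */

	/* A hole at the very end of the range must not fail the dump */
	return ret == -ENOENT ? 0 : ret;
}
```

With IDs 1 to 10 this yields 1, 2, 3, 5, 6, 8, 9 over three calls, each call resuming exactly at the ID that did not fit.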
+42
drivers/gpu/drm/drm_ras_genl_family.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#include <drm/drm_ras_genl_family.h>
+#include "drm_ras_nl.h"
+
+/* Track family registration so that drm_core_exit() can be called at any time */
+static bool registered;
+
+/**
+ * drm_ras_genl_family_register() - Register drm-ras genl family
+ *
+ * Only to be called once, from drm_core_init().
+ */
+int drm_ras_genl_family_register(void)
+{
+	int ret;
+
+	registered = false;
+
+	ret = genl_register_family(&drm_ras_nl_family);
+	if (ret)
+		return ret;
+
+	registered = true;
+	return 0;
+}
+
+/**
+ * drm_ras_genl_family_unregister() - Unregister drm-ras genl family
+ *
+ * To be called once, from drm_core_exit(). Safe even if registration failed.
+ */
+void drm_ras_genl_family_unregister(void)
+{
+	if (registered) {
+		genl_unregister_family(&drm_ras_nl_family);
+		registered = false;
+	}
+}
+56
drivers/gpu/drm/drm_ras_nl.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+/* Do not edit directly, auto-generated from: */
+/*	Documentation/netlink/specs/drm_ras.yaml */
+/* YNL-GEN kernel source */
+/* To regenerate run: tools/net/ynl/ynl-regen.sh */
+
+#include <net/netlink.h>
+#include <net/genetlink.h>
+
+#include "drm_ras_nl.h"
+
+#include <uapi/drm/drm_ras.h>
+
+/* DRM_RAS_CMD_GET_ERROR_COUNTER - do */
+static const struct nla_policy drm_ras_get_error_counter_do_nl_policy[DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID + 1] = {
+	[DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID] = { .type = NLA_U32, },
+	[DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID] = { .type = NLA_U32, },
+};
+
+/* DRM_RAS_CMD_GET_ERROR_COUNTER - dump */
+static const struct nla_policy drm_ras_get_error_counter_dump_nl_policy[DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID + 1] = {
+	[DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID] = { .type = NLA_U32, },
+};
+
+/* Ops table for drm_ras */
+static const struct genl_split_ops drm_ras_nl_ops[] = {
+	{
+		.cmd = DRM_RAS_CMD_LIST_NODES,
+		.dumpit = drm_ras_nl_list_nodes_dumpit,
+		.flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP,
+	},
+	{
+		.cmd = DRM_RAS_CMD_GET_ERROR_COUNTER,
+		.doit = drm_ras_nl_get_error_counter_doit,
+		.policy = drm_ras_get_error_counter_do_nl_policy,
+		.maxattr = DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID,
+		.flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+	},
+	{
+		.cmd = DRM_RAS_CMD_GET_ERROR_COUNTER,
+		.dumpit = drm_ras_nl_get_error_counter_dumpit,
+		.policy = drm_ras_get_error_counter_dump_nl_policy,
+		.maxattr = DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID,
+		.flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP,
+	},
+};
+
+struct genl_family drm_ras_nl_family __ro_after_init = {
+	.name = DRM_RAS_FAMILY_NAME,
+	.version = DRM_RAS_FAMILY_VERSION,
+	.netnsok = true,
+	.parallel_ops = true,
+	.module = THIS_MODULE,
+	.split_ops = drm_ras_nl_ops,
+	.n_split_ops = ARRAY_SIZE(drm_ras_nl_ops),
+};
+24
drivers/gpu/drm/drm_ras_nl.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/* Do not edit directly, auto-generated from: */
+/*	Documentation/netlink/specs/drm_ras.yaml */
+/* YNL-GEN kernel header */
+/* To regenerate run: tools/net/ynl/ynl-regen.sh */
+
+#ifndef _LINUX_DRM_RAS_GEN_H
+#define _LINUX_DRM_RAS_GEN_H
+
+#include <net/netlink.h>
+#include <net/genetlink.h>
+
+#include <uapi/drm/drm_ras.h>
+
+int drm_ras_nl_list_nodes_dumpit(struct sk_buff *skb,
+				 struct netlink_callback *cb);
+int drm_ras_nl_get_error_counter_doit(struct sk_buff *skb,
+				      struct genl_info *info);
+int drm_ras_nl_get_error_counter_dumpit(struct sk_buff *skb,
+					struct netlink_callback *cb);
+
+extern struct genl_family drm_ras_nl_family;
+
+#endif /* _LINUX_DRM_RAS_GEN_H */
+75
include/drm/drm_ras.h
@@ -0,0 +1,75 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef __DRM_RAS_H__
+#define __DRM_RAS_H__
+
+#include <uapi/drm/drm_ras.h>
+
+/**
+ * struct drm_ras_node - A DRM RAS Node
+ */
+struct drm_ras_node {
+	/** @id: Unique identifier for the node. Dynamically assigned. */
+	u32 id;
+	/**
+	 * @device_name: Human-readable name of the device. Given by the driver.
+	 */
+	const char *device_name;
+	/** @node_name: Human-readable name of the node. Given by the driver. */
+	const char *node_name;
+	/** @type: Type of the node (enum drm_ras_node_type). */
+	enum drm_ras_node_type type;
+
+	/* Error-Counter Related Callback and Variables */
+
+	/** @error_counter_range: Range of valid Error IDs for this node. */
+	struct {
+		/** @first: First valid Error ID. */
+		u32 first;
+		/** @last: Last valid Error ID. Mandatory entry. */
+		u32 last;
+	} error_counter_range;
+
+	/**
+	 * @query_error_counter:
+	 *
+	 * This callback is used by drm-ras to query a specific error counter.
+	 * Used for input check and to iterate all error counters in a node.
+	 *
+	 * Driver should expect query_error_counter() to be called with
+	 * error_id from `error_counter_range.first` to
+	 * `error_counter_range.last`.
+	 *
+	 * The @query_error_counter is a mandatory callback for
+	 * error_counter_node.
+	 *
+	 * Returns: 0 on success,
+	 *          -ENOENT when error_id is not supported as an indication that
+	 *          drm_ras should silently skip this entry. Used for
+	 *          supporting non-contiguous error ranges.
+	 *          Driver is responsible for maintaining the list of
+	 *          supported error IDs in the range of first to last.
+	 *          Other negative values on errors that should terminate the
+	 *          netlink query.
+	 */
+	int (*query_error_counter)(struct drm_ras_node *node, u32 error_id,
+				   const char **name, u32 *val);
+
+	/** @priv: Driver private data */
+	void *priv;
+};
+
+struct drm_device;
+
+#if IS_ENABLED(CONFIG_DRM_RAS)
+int drm_ras_node_register(struct drm_ras_node *node);
+void drm_ras_node_unregister(struct drm_ras_node *node);
+#else
+static inline int drm_ras_node_register(struct drm_ras_node *node) { return 0; }
+static inline void drm_ras_node_unregister(struct drm_ras_node *node) { }
+#endif
+
+#endif
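From a driver's point of view, struct drm_ras_node asks for three things: names for the device and node, a non-zero error_counter_range.last, and a query_error_counter() that returns -ENOENT for IDs inside the range that the hardware does not implement. The userspace mock below is a hedged illustration of that contract: struct ras_node is a trimmed copy of struct drm_ras_node, the counter table and my_query() are made up, and check_node() and query() merely repeat the checks that drm_ras_node_register() and get_node_error_counter() perform in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the uapi enum value (include/uapi/drm/drm_ras.h). */
#define DRM_RAS_NODE_TYPE_ERROR_COUNTER 1

/* Trimmed userspace copy of struct drm_ras_node from include/drm/drm_ras.h. */
struct ras_node {
	uint32_t id;
	const char *device_name;
	const char *node_name;
	int type;
	struct {
		uint32_t first;
		uint32_t last;
	} error_counter_range;
	int (*query_error_counter)(struct ras_node *node, uint32_t error_id,
				   const char **name, uint32_t *val);
	void *priv;
};

/* Hypothetical driver data: IDs 1, 2 and 5 exist, 3 and 4 do not. */
struct my_counter {
	uint32_t id;
	const char *name;
	uint32_t value;
};

static const struct my_counter my_counters[] = {
	{ 1, "core-parity", 0 },
	{ 2, "sram-ecc", 3 },
	{ 5, "link-crc", 7 },
};

static int my_query(struct ras_node *node, uint32_t error_id,
		    const char **name, uint32_t *val)
{
	size_t i;

	(void)node;
	assert(name && val);
	for (i = 0; i < sizeof(my_counters) / sizeof(my_counters[0]); i++) {
		if (my_counters[i].id == error_id) {
			*name = my_counters[i].name;
			*val = my_counters[i].value;
			return 0;
		}
	}
	return -ENOENT;	/* hole in the range: drm_ras skips this ID */
}

/* Same checks as drm_ras_node_register(), minus the xarray allocation. */
static int check_node(const struct ras_node *node)
{
	if (!node->device_name || !node->node_name)
		return -EINVAL;
	if (node->type != DRM_RAS_NODE_TYPE_ERROR_COUNTER)
		return -EINVAL;
	if (!node->error_counter_range.last || !node->query_error_counter)
		return -EINVAL;
	return 0;
}

/* Same range check as get_node_error_counter() before calling the driver. */
static int query(struct ras_node *node, uint32_t id,
		 const char **name, uint32_t *val)
{
	if (id < node->error_counter_range.first ||
	    id > node->error_counter_range.last)
		return -EINVAL;
	return node->query_error_counter(node, id, name, val);
}
```

In a real driver the counter table could be reached through node->priv rather than a global.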
+17
include/drm/drm_ras_genl_family.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2026 Intel Corporation
+ */
+
+#ifndef __DRM_RAS_GENL_FAMILY_H__
+#define __DRM_RAS_GENL_FAMILY_H__
+
+#if IS_ENABLED(CONFIG_DRM_RAS)
+int drm_ras_genl_family_register(void);
+void drm_ras_genl_family_unregister(void);
+#else
+static inline int drm_ras_genl_family_register(void) { return 0; }
+static inline void drm_ras_genl_family_unregister(void) { }
+#endif
+
+#endif
+49
include/uapi/drm/drm_ras.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/* Do not edit directly, auto-generated from: */
+/*	Documentation/netlink/specs/drm_ras.yaml */
+/* YNL-GEN uapi header */
+/* To regenerate run: tools/net/ynl/ynl-regen.sh */
+
+#ifndef _UAPI_LINUX_DRM_RAS_H
+#define _UAPI_LINUX_DRM_RAS_H
+
+#define DRM_RAS_FAMILY_NAME	"drm-ras"
+#define DRM_RAS_FAMILY_VERSION	1
+
+/*
+ * Type of the node. Currently, only error-counter nodes are supported, which
+ * expose reliability counters for a hardware/software component.
+ */
+enum drm_ras_node_type {
+	DRM_RAS_NODE_TYPE_ERROR_COUNTER = 1,
+};
+
+enum {
+	DRM_RAS_A_NODE_ATTRS_NODE_ID = 1,
+	DRM_RAS_A_NODE_ATTRS_DEVICE_NAME,
+	DRM_RAS_A_NODE_ATTRS_NODE_NAME,
+	DRM_RAS_A_NODE_ATTRS_NODE_TYPE,
+
+	__DRM_RAS_A_NODE_ATTRS_MAX,
+	DRM_RAS_A_NODE_ATTRS_MAX = (__DRM_RAS_A_NODE_ATTRS_MAX - 1)
+};
+
+enum {
+	DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID = 1,
+	DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID,
+	DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_NAME,
+	DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_VALUE,
+
+	__DRM_RAS_A_ERROR_COUNTER_ATTRS_MAX,
+	DRM_RAS_A_ERROR_COUNTER_ATTRS_MAX = (__DRM_RAS_A_ERROR_COUNTER_ATTRS_MAX - 1)
+};
+
+enum {
+	DRM_RAS_CMD_LIST_NODES = 1,
+	DRM_RAS_CMD_GET_ERROR_COUNTER,
+
+	__DRM_RAS_CMD_MAX,
+	DRM_RAS_CMD_MAX = (__DRM_RAS_CMD_MAX - 1)
+};
+
+#endif /* _UAPI_LINUX_DRM_RAS_H */
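With the constants from the uapi header, a hand-written client can issue GET_ERROR_COUNTER either as a do request (node-id plus error-id) or, with NLM_F_DUMP and only node-id, as a dump of every counter in the node, matching the two request shapes in drm_ras.yaml. The sketch below only builds the request bytes. The constant values are copied from include/uapi/drm/drm_ras.h, build_get_error_counter() and put_u32_attr() are hypothetical helpers, and family_id is assumed to have been resolved for "drm-ras" through the generic netlink controller beforehand.

```c
#include <assert.h>
#include <linux/genetlink.h>
#include <linux/netlink.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from include/uapi/drm/drm_ras.h in this patch. */
#define DRM_RAS_FAMILY_VERSION 1
#define DRM_RAS_CMD_GET_ERROR_COUNTER 2
#define DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID 1
#define DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID 2

/* Append a u32 attribute at @off; returns the next 4-byte aligned offset. */
static size_t put_u32_attr(void *buf, size_t off, uint16_t type, uint32_t v)
{
	struct nlattr *nla = (struct nlattr *)((char *)buf + off);

	nla->nla_type = type;
	nla->nla_len = NLA_HDRLEN + sizeof(v);
	memcpy((char *)nla + NLA_HDRLEN, &v, sizeof(v));
	return off + NLA_ALIGN(nla->nla_len);
}

/*
 * Build a GET_ERROR_COUNTER request in @buf (4-byte aligned, @cap bytes).
 * A non-NULL @error_id gives a "do" request for one counter; NULL gives a
 * "dump" of every counter in @node_id. Returns the message length.
 */
static size_t build_get_error_counter(void *buf, size_t cap, uint16_t family_id,
				      uint32_t node_id, const uint32_t *error_id)
{
	struct nlmsghdr *nlh = buf;
	struct genlmsghdr *genl = NLMSG_DATA(nlh);
	size_t len = NLMSG_HDRLEN + GENL_HDRLEN;

	/* Largest request: headers plus two u32 attributes */
	assert(cap >= len + 2 * NLA_ALIGN(NLA_HDRLEN + sizeof(uint32_t)));
	memset(buf, 0, cap);

	nlh->nlmsg_type = family_id;	/* resolved "drm-ras" family ID */
	nlh->nlmsg_flags = NLM_F_REQUEST | (error_id ? 0 : NLM_F_DUMP);

	genl->cmd = DRM_RAS_CMD_GET_ERROR_COUNTER;
	genl->version = DRM_RAS_FAMILY_VERSION;

	len = put_u32_attr(buf, len, DRM_RAS_A_ERROR_COUNTER_ATTRS_NODE_ID, node_id);
	if (error_id)
		len = put_u32_attr(buf, len, DRM_RAS_A_ERROR_COUNTER_ATTRS_ERROR_ID,
				   *error_id);

	nlh->nlmsg_len = len;
	return len;
}
```

Both operations carry the admin-perm flag (GENL_ADMIN_PERM), so the sending process needs CAP_NET_ADMIN, which is why the ynl examples run under sudo.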