Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'mauro' into docs-mw

Mauro says:

This patch series changes how the kdoc parser handles macro replacements.

Instead of heavily relying on regular expressions that can sometimes
be very complex, it uses a C lexical tokenizer. This ensures that
BEGIN/END blocks on functions and structs are properly handled,
even when nested.
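To illustrate why a tokenizer copes with nesting where a plain regex struggles, here is a minimal sketch. It is not the code from this series: ``TOKEN_RE`` and ``find_macro_call`` are hypothetical names, and the token table is far smaller than the one in c_lex.py. It only shows the idea of counting BEGIN/END tokens while treating string literals as opaque.

```python
import re

# Hypothetical, heavily reduced token table (assumption: the real
# tokenizer in c_lex.py distinguishes many more token kinds).
TOKEN_RE = re.compile(r"""
    (?P<STRING>"(?:\\.|[^"\\])*")
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<BEGIN>[\[\(\{])
  | (?P<END>[\]\)\}])
  | (?P<OTHER>\s+|.)
""", re.VERBOSE | re.DOTALL)


def find_macro_call(source, macro):
    """Return the first ``macro(...)`` text with balanced delimiters, or None."""
    out = []     # token values collected for the current candidate
    depth = 0    # nesting depth across (), [] and {}

    for m in TOKEN_RE.finditer(source):
        kind, value = m.lastgroup, m.group()

        if not out:
            # Still looking for the macro name.
            if kind == "NAME" and value == macro:
                out.append(value)
            continue

        if depth == 0 and kind != "BEGIN":
            if value.isspace():
                out.append(value)
            else:
                # Not a call: restart, possibly on this very token.
                out = [value] if (kind == "NAME" and value == macro) else []
            continue

        out.append(value)
        if kind == "BEGIN":
            depth += 1
        elif kind == "END":
            depth -= 1
            if depth == 0:
                return "".join(out)

    return None
```

Because the ``")"`` inside a string literal arrives as a single STRING token, it never changes the depth counter, which is the property a character-level regex has trouble guaranteeing.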

Comparing the output before and after the patch series, for both man
pages and rst, showed only:
- whitespace differences;
- struct_group macros are now shown as inner anonymous structs,
as they should be.

Also, I didn't notice any relevant change in the documentation build
time. In that regard, right now, every time a CMatch replacement
rule is applied, it does:

for each transform:
- tokenize the source code;
- handle CMatch;
- convert tokens back to a string.

A possible optimization would be to do, instead:

- tokenize the source code;
- for each transform, handle CMatch;
- convert tokens back to a string.
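The difference between the two flows can be sketched as follows. This is only an illustration under stated assumptions: ``tokenize``, ``drop_token``, ``apply_current`` and ``apply_optimized`` are hypothetical names, tokens are plain strings rather than CToken objects, and the transforms are trivial token filters instead of CMatch rules.

```python
import re

# Lossless toy tokenizer: words, whitespace runs, or any single character.
TOKEN_RE = re.compile(r"\w+|\s+|.", re.DOTALL)

# Counter used only to make the number of tokenizer runs visible.
calls = {"tokenize": 0}


def tokenize(source):
    calls["tokenize"] += 1
    return TOKEN_RE.findall(source)


def drop_token(name):
    """Build a transform that removes every token equal to ``name``."""
    return lambda tokens: [t for t in tokens if t != name]


def apply_current(source, transforms):
    """Flow described as the current one: a full round trip per transform."""
    for xform in transforms:
        source = "".join(xform(tokenize(source)))
    return source


def apply_optimized(source, transforms):
    """Flow described as the possible optimization: tokenize and join once."""
    tokens = tokenize(source)
    for xform in transforms:
        tokens = xform(tokens)
    return "".join(tokens)
```

With two transforms, ``apply_current`` runs the tokenizer twice and ``apply_optimized`` once, while both return the same string. The saving grows with the number of transforms, which is consistent with the message treating it as a follow-up rather than a blocker.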

For now, I opted not to do it, because:

- it would be too many changes at once;
- the docs build is taking ~3:30 minutes, which is
about the same time it was taking before the changes;
- there is a very dirty hack inside function_xforms:
(KernRe(r"_noprof"), ""). This is meant to change
function prototypes instead of function arguments.

So, if ok for you, I would prefer to merge this one first. We can later
optimize kdoc_parser to avoid multiple token <-> string conversions.

-

One important aspect of this series is that it introduces unit tests
for kernel-doc. I used them a lot during the development of this series,
to ensure that the changes I was making were producing the expected
results. The tests are in two separate files that can be executed directly.

Alternatively, there is a run.py script that runs all of them (and
any other Python script matching tools/unittests/test_*.py):

$ tools/unittests/run.py
test_cmatch:
TestSearch:
test_search_acquires_multiple: OK
test_search_acquires_nested_paren: OK
test_search_acquires_simple: OK
test_search_must_hold: OK
test_search_must_hold_shared: OK
test_search_no_false_positive: OK
test_search_no_function: OK
test_search_no_macro_remains: OK
TestSubMultipleMacros:
test_acquires_multiple: OK
test_acquires_nested_paren: OK
test_acquires_simple: OK
test_mixed_macros: OK
test_must_hold: OK
test_must_hold_shared: OK
test_no_false_positive: OK
test_no_function: OK
test_no_macro_remains: OK
TestSubSimple:
test_rise_early_greedy: OK
test_rise_multiple_greedy: OK
test_strip_multiple_acquires: OK
test_sub_count_parameter: OK
test_sub_mixed_placeholders: OK
test_sub_multiple_placeholders: OK
test_sub_no_placeholder: OK
test_sub_single_placeholder: OK
test_sub_with_capture: OK
test_sub_zero_placeholder: OK
TestSubWithLocalXforms:
test_functions_with_acquires_and_releases: OK
test_raw_struct_group: OK
test_raw_struct_group_tagged: OK
test_struct_group: OK
test_struct_group_attr: OK
test_struct_group_tagged_with_private: OK
test_struct_kcov: OK
test_vars_stackdepot: OK

test_tokenizer:
TestPublicPrivate:
test_balanced_inner_private: OK
test_balanced_non_greddy_private: OK
test_balanced_private: OK
test_no private: OK
test_unbalanced_inner_private: OK
test_unbalanced_private: OK
test_unbalanced_struct_group_tagged_with_private: OK
test_unbalanced_two_struct_group_tagged_first_with_private: OK
test_unbalanced_without_end_of_line: OK
TestTokenizer:
test_basic_tokens: OK
test_depth_counters: OK
test_mismatch_error: OK

Ran 47 tests
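
For readers unfamiliar with the layout of such test files, the sketch below shows the general shape using only Python's standard ``unittest`` module. It does not use the series' ``unittest_helper`` module, whose API is not shown here; ``count_depth``, ``TestDepth`` and ``run_tests`` are hypothetical names made up for the illustration.

```python
import io
import unittest


def count_depth(text):
    """Hypothetical function under test: maximum ``(`` nesting depth."""
    depth = best = 0
    for ch in text:
        if ch == "(":
            depth += 1
            best = max(best, depth)
        elif ch == ")" and depth > 0:
            # A close without a matching open is ignored.
            depth -= 1
    return best


class TestDepth(unittest.TestCase):
    def test_flat(self):
        self.assertEqual(count_depth("f(a, b)"), 1)

    def test_nested(self):
        self.assertEqual(count_depth("f(a, (b, (c)))"), 3)

    def test_unbalanced_close(self):
        self.assertEqual(count_depth(") f(a)"), 1)


def run_tests():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDepth)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=2)
    return runner.run(suite)
```

A file meant to be executed directly would typically end with an ``if __name__ == "__main__":`` guard calling ``unittest.main()``. The explicit ``run_tests()`` here just keeps the sketch usable from another script.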

+2470 -343
+6
Documentation/doc-guide/kernel-doc.rst
···
 ``/*`` comment marker. They may optionally include comments between the
 ``:`` and the ending ``*/`` marker.
 
+When ``private:`` is used on nested structs, it propagates only to inner
+structs/unions.
+
+
 Example::
 
   /**
···
         union {
                 struct {
                         int memb1;
+                        /* private: hides memb2 from documentation */
                         int memb2;
                 };
+                /* Everything here is public again, as private scope finished */
                 struct {
                         void *memb3;
                         int memb4;
+2
Documentation/tools/python.rst
···
    feat
    kdoc
    kabi
+
+   unittest
+24
Documentation/tools/unittest.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Python unittest
+===============
+
+Checking consistency of python modules can be complex. Sometimes, it is
+useful to define a set of unit tests to help checking them.
+
+While the actual test implementation is usecase dependent, Python already
+provides a standard way to add unit tests by using ``import unittest``.
+
+Using such class, requires setting up a test suite. Also, the default format
+is a little bit ackward. To improve it and provide a more uniform way to
+report errors, some unittest classes and functions are defined.
+
+
+Unittest helper module
+======================
+
+.. automodule:: lib.python.unittest_helper
+    :members:
+    :show-inheritance:
+    :undoc-members:
+655
tools/lib/python/kdoc/c_lex.py
··· 1 + #!/usr/bin/env python3 2 + # SPDX-License-Identifier: GPL-2.0 3 + # Copyright(c) 2025: Mauro Carvalho Chehab <mchehab@kernel.org>. 4 + 5 + """ 6 + Regular expression ancillary classes. 7 + 8 + Those help caching regular expressions and do matching for kernel-doc. 9 + 10 + Please notice that the code here may rise exceptions to indicate bad 11 + usage inside kdoc to indicate problems at the replace pattern. 12 + 13 + Other errors are logged via log instance. 14 + """ 15 + 16 + import logging 17 + import re 18 + 19 + from copy import copy 20 + 21 + from .kdoc_re import KernRe 22 + 23 + log = logging.getLogger(__name__) 24 + 25 + def tokenizer_set_log(logger, prefix = ""): 26 + """ 27 + Replace the module‑level logger with a LoggerAdapter that 28 + prepends *prefix* to every message. 29 + """ 30 + global log 31 + 32 + class PrefixAdapter(logging.LoggerAdapter): 33 + """ 34 + Ancillary class to set prefix on all message logs. 35 + """ 36 + def process(self, msg, kwargs): 37 + return f"{prefix}{msg}", kwargs 38 + 39 + # Wrap the provided logger in our adapter 40 + log = PrefixAdapter(logger, {"prefix": prefix}) 41 + 42 + class CToken(): 43 + """ 44 + Data class to define a C token. 45 + """ 46 + 47 + # Tokens that can be used by the parser. Works like an C enum. 48 + 49 + COMMENT = 0 #: A standard C or C99 comment, including delimiter. 50 + STRING = 1 #: A string, including quotation marks. 51 + CHAR = 2 #: A character, including apostophes. 52 + NUMBER = 3 #: A number. 53 + PUNC = 4 #: A puntuation mark: / ``,`` / ``.``. 54 + BEGIN = 5 #: A begin character: ``{`` / ``[`` / ``(``. 55 + END = 6 #: A end character: ``}`` / ``]`` / ``)``. 56 + CPP = 7 #: A preprocessor macro. 57 + HASH = 8 #: The hash character - useful to handle other macros. 58 + OP = 9 #: A C operator (add, subtract, ...). 59 + STRUCT = 10 #: A ``struct`` keyword. 60 + UNION = 11 #: An ``union`` keyword. 61 + ENUM = 12 #: A ``struct`` keyword. 62 + TYPEDEF = 13 #: A ``typedef`` keyword. 
63 + NAME = 14 #: A name. Can be an ID or a type. 64 + SPACE = 15 #: Any space characters, including new lines 65 + ENDSTMT = 16 #: End of an statement (``;``). 66 + 67 + BACKREF = 17 #: Not a valid C sequence, but used at sub regex patterns. 68 + 69 + MISMATCH = 255 #: an error indicator: should never happen in practice. 70 + 71 + # Dict to convert from an enum interger into a string. 72 + _name_by_val = {v: k for k, v in dict(vars()).items() if isinstance(v, int)} 73 + 74 + # Dict to convert from string to an enum-like integer value. 75 + _name_to_val = {k: v for v, k in _name_by_val.items()} 76 + 77 + @staticmethod 78 + def to_name(val): 79 + """Convert from an integer value from CToken enum into a string""" 80 + 81 + return CToken._name_by_val.get(val, f"UNKNOWN({val})") 82 + 83 + @staticmethod 84 + def from_name(name): 85 + """Convert a string into a CToken enum value""" 86 + if name in CToken._name_to_val: 87 + return CToken._name_to_val[name] 88 + 89 + return CToken.MISMATCH 90 + 91 + 92 + def __init__(self, kind, value=None, pos=0, 93 + brace_level=0, paren_level=0, bracket_level=0): 94 + self.kind = kind 95 + self.value = value 96 + self.pos = pos 97 + self.level = (bracket_level, paren_level, brace_level) 98 + 99 + def __repr__(self): 100 + name = self.to_name(self.kind) 101 + if isinstance(self.value, str): 102 + value = '"' + self.value + '"' 103 + else: 104 + value = self.value 105 + 106 + return f"CToken(CToken.{name}, {value}, {self.pos}, {self.level})" 107 + 108 + #: Regexes to parse C code, transforming it into tokens. 
109 + RE_SCANNER_LIST = [ 110 + # 111 + # Note that \s\S is different than .*, as it also catches \n 112 + # 113 + (CToken.COMMENT, r"//[^\n]*|/\*[\s\S]*?\*/"), 114 + 115 + (CToken.STRING, r'"(?:\\.|[^"\\])*"'), 116 + (CToken.CHAR, r"'(?:\\.|[^'\\])'"), 117 + 118 + (CToken.NUMBER, r"0[xX][\da-fA-F]+[uUlL]*|0[0-7]+[uUlL]*|" 119 + r"\d+(?:\.\d*)?(?:[eE][+-]?\d+)?[fFlL]*"), 120 + 121 + (CToken.ENDSTMT, r"(?:\s+;|;)"), 122 + 123 + (CToken.PUNC, r"[,\.]"), 124 + 125 + (CToken.BEGIN, r"[\[\(\{]"), 126 + 127 + (CToken.END, r"[\]\)\}]"), 128 + 129 + (CToken.CPP, r"#\s*(?:define|include|ifdef|ifndef|if|else|elif|endif|undef|pragma)\b"), 130 + 131 + (CToken.HASH, r"#"), 132 + 133 + (CToken.OP, r"\+\+|\-\-|\->|==|\!=|<=|>=|&&|\|\||<<|>>|\+=|\-=|\*=|/=|%=" 134 + r"|&=|\|=|\^=|[=\+\-\*/%<>&\|\^~!\?\:]"), 135 + 136 + (CToken.STRUCT, r"\bstruct\b"), 137 + (CToken.UNION, r"\bunion\b"), 138 + (CToken.ENUM, r"\benum\b"), 139 + (CToken.TYPEDEF, r"\btypedef\b"), 140 + 141 + (CToken.NAME, r"[A-Za-z_]\w*"), 142 + 143 + (CToken.SPACE, r"\s+"), 144 + 145 + (CToken.BACKREF, r"\\\d+"), 146 + 147 + (CToken.MISMATCH,r"."), 148 + ] 149 + 150 + def fill_re_scanner(token_list): 151 + """Ancillary routine to convert RE_SCANNER_LIST into a finditer regex""" 152 + re_tokens = [] 153 + 154 + for kind, pattern in token_list: 155 + name = CToken.to_name(kind) 156 + re_tokens.append(f"(?P<{name}>{pattern})") 157 + 158 + return KernRe("|".join(re_tokens), re.MULTILINE | re.DOTALL) 159 + 160 + #: Handle C continuation lines. 161 + RE_CONT = KernRe(r"\\\n") 162 + 163 + RE_COMMENT_START = KernRe(r'/\*\s*') 164 + 165 + #: tokenizer regex. Will be filled at the first CTokenizer usage. 166 + RE_SCANNER = fill_re_scanner(RE_SCANNER_LIST) 167 + 168 + 169 + class CTokenizer(): 170 + """ 171 + Scan C statements and definitions and produce tokens. 172 + 173 + When converted to string, it drops comments and handle public/private 174 + values, respecting depth. 
175 + """ 176 + 177 + # This class is inspired and follows the basic concepts of: 178 + # https://docs.python.org/3/library/re.html#writing-a-tokenizer 179 + 180 + def __init__(self, source=None, log=None): 181 + """ 182 + Create a regular expression to handle RE_SCANNER_LIST. 183 + 184 + While I generally don't like using regex group naming via: 185 + (?P<name>...) 186 + 187 + in this particular case, it makes sense, as we can pick the name 188 + when matching a code via RE_SCANNER. 189 + """ 190 + 191 + self.tokens = [] 192 + 193 + if not source: 194 + return 195 + 196 + if isinstance(source, list): 197 + self.tokens = source 198 + return 199 + 200 + # 201 + # While we could just use _tokenize directly via interator, 202 + # As we'll need to use the tokenizer several times inside kernel-doc 203 + # to handle macro transforms, cache the results on a list, as 204 + # re-using it is cheaper than having to parse everytime. 205 + # 206 + for tok in self._tokenize(source): 207 + self.tokens.append(tok) 208 + 209 + def _tokenize(self, source): 210 + """ 211 + Iterator that parses ``source``, splitting it into tokens, as defined 212 + at ``self.RE_SCANNER_LIST``. 213 + 214 + The interactor returns a CToken class object. 215 + """ 216 + 217 + # Handle continuation lines. Note that kdoc_parser already has a 218 + # logic to do that. Still, let's keep it for completeness, as we might 219 + # end re-using this tokenizer outsize kernel-doc some day - or we may 220 + # eventually remove from there as a future cleanup. 
221 + source = RE_CONT.sub("", source) 222 + 223 + brace_level = 0 224 + paren_level = 0 225 + bracket_level = 0 226 + 227 + for match in RE_SCANNER.finditer(source): 228 + kind = CToken.from_name(match.lastgroup) 229 + pos = match.start() 230 + value = match.group() 231 + 232 + if kind == CToken.MISMATCH: 233 + log.error(f"Unexpected token '{value}' on pos {pos}:\n\t'{source}'") 234 + elif kind == CToken.BEGIN: 235 + if value == '(': 236 + paren_level += 1 237 + elif value == '[': 238 + bracket_level += 1 239 + else: # value == '{' 240 + brace_level += 1 241 + 242 + elif kind == CToken.END: 243 + if value == ')' and paren_level > 0: 244 + paren_level -= 1 245 + elif value == ']' and bracket_level > 0: 246 + bracket_level -= 1 247 + elif brace_level > 0: # value == '}' 248 + brace_level -= 1 249 + 250 + yield CToken(kind, value, pos, 251 + brace_level, paren_level, bracket_level) 252 + 253 + def __str__(self): 254 + out="" 255 + show_stack = [True] 256 + 257 + for i, tok in enumerate(self.tokens): 258 + if tok.kind == CToken.BEGIN: 259 + show_stack.append(show_stack[-1]) 260 + 261 + elif tok.kind == CToken.END: 262 + prev = show_stack[-1] 263 + if len(show_stack) > 1: 264 + show_stack.pop() 265 + 266 + if not prev and show_stack[-1]: 267 + # 268 + # Try to preserve indent 269 + # 270 + out += "\t" * (len(show_stack) - 1) 271 + 272 + out += str(tok.value) 273 + continue 274 + 275 + elif tok.kind == CToken.COMMENT: 276 + comment = RE_COMMENT_START.sub("", tok.value) 277 + 278 + if comment.startswith("private:"): 279 + show_stack[-1] = False 280 + show = False 281 + elif comment.startswith("public:"): 282 + show_stack[-1] = True 283 + 284 + continue 285 + 286 + if not show_stack[-1]: 287 + continue 288 + 289 + if i < len(self.tokens) - 1: 290 + next_tok = self.tokens[i + 1] 291 + 292 + # Do some cleanups before ";" 293 + 294 + if tok.kind == CToken.SPACE and next_tok.kind == CToken.ENDSTMT: 295 + continue 296 + 297 + if tok.kind == CToken.ENDSTMT and next_tok.kind == 
tok.kind: 298 + continue 299 + 300 + out += str(tok.value) 301 + 302 + return out 303 + 304 + 305 + class CTokenArgs: 306 + """ 307 + Ancillary class to help using backrefs from sub matches. 308 + 309 + If the highest backref contain a "+" at the last element, 310 + the logic will be greedy, picking all other delims. 311 + 312 + This is needed to parse struct_group macros with end with ``MEMBERS...``. 313 + """ 314 + def __init__(self, sub_str): 315 + self.sub_groups = set() 316 + self.max_group = -1 317 + self.greedy = None 318 + 319 + for m in KernRe(r'\\(\d+)([+]?)').finditer(sub_str): 320 + group = int(m.group(1)) 321 + if m.group(2) == "+": 322 + if self.greedy and self.greedy != group: 323 + raise ValueError("There are multiple greedy patterns!") 324 + self.greedy = group 325 + 326 + self.sub_groups.add(group) 327 + self.max_group = max(self.max_group, group) 328 + 329 + if self.greedy: 330 + if self.greedy != self.max_group: 331 + raise ValueError("Greedy pattern is not the last one!") 332 + 333 + sub_str = KernRe(r'(\\\d+)[+]').sub(r"\1", sub_str) 334 + 335 + self.sub_str = sub_str 336 + self.sub_tokeninzer = CTokenizer(sub_str) 337 + 338 + def groups(self, new_tokenizer): 339 + """ 340 + Create replacement arguments for backrefs like: 341 + 342 + ``\0``, ``\1``, ``\2``, ...``\n`` 343 + 344 + It also accepts a ``+`` character to the highest backref. When used, 345 + it means in practice to ignore delimins after it, being greedy. 346 + 347 + The logic is smart enough to only go up to the maximum required 348 + argument, even if there are more. 349 + 350 + If there is a backref for an argument above the limit, it will 351 + raise an exception. Please notice that, on C, square brackets 352 + don't have any separator on it. Trying to use ``\1``..``\n`` for 353 + brackets also raise an exception. 
354 + """ 355 + 356 + level = (0, 0, 0) 357 + 358 + if self.max_group < 0: 359 + return level, [] 360 + 361 + tokens = new_tokenizer.tokens 362 + 363 + # 364 + # Fill \0 with the full token contents 365 + # 366 + groups_list = [ [] ] 367 + 368 + if 0 in self.sub_groups: 369 + inner_level = 0 370 + 371 + for i in range(0, len(tokens)): 372 + tok = tokens[i] 373 + 374 + if tok.kind == CToken.BEGIN: 375 + inner_level += 1 376 + 377 + # 378 + # Discard first begin 379 + # 380 + if not groups_list[0]: 381 + continue 382 + elif tok.kind == CToken.END: 383 + inner_level -= 1 384 + if inner_level < 0: 385 + break 386 + 387 + if inner_level: 388 + groups_list[0].append(tok) 389 + 390 + if not self.max_group: 391 + return level, groups_list 392 + 393 + delim = None 394 + 395 + # 396 + # Ignore everything before BEGIN. The value of begin gives the 397 + # delimiter to be used for the matches 398 + # 399 + for i in range(0, len(tokens)): 400 + tok = tokens[i] 401 + if tok.kind == CToken.BEGIN: 402 + if tok.value == "{": 403 + delim = ";" 404 + elif tok.value == "(": 405 + delim = "," 406 + else: 407 + self.log.error(fr"Can't handle \1..\n on {sub_str}") 408 + 409 + level = tok.level 410 + break 411 + 412 + pos = 1 413 + groups_list.append([]) 414 + 415 + inner_level = 0 416 + for i in range(i + 1, len(tokens)): 417 + tok = tokens[i] 418 + 419 + if tok.kind == CToken.BEGIN: 420 + inner_level += 1 421 + if tok.kind == CToken.END: 422 + inner_level -= 1 423 + if inner_level < 0: 424 + break 425 + 426 + if tok.kind in [CToken.PUNC, CToken.ENDSTMT] and delim == tok.value: 427 + pos += 1 428 + if self.greedy and pos > self.max_group: 429 + pos -= 1 430 + else: 431 + groups_list.append([]) 432 + 433 + if pos > self.max_group: 434 + break 435 + 436 + continue 437 + 438 + groups_list[pos].append(tok) 439 + 440 + if pos < self.max_group: 441 + log.error(fr"{self.sub_str} groups are up to {pos} instead of {self.max_group}") 442 + 443 + return level, groups_list 444 + 445 + def 
tokens(self, new_tokenizer): 446 + level, groups = self.groups(new_tokenizer) 447 + 448 + new = CTokenizer() 449 + 450 + for tok in self.sub_tokeninzer.tokens: 451 + if tok.kind == CToken.BACKREF: 452 + group = int(tok.value[1:]) 453 + 454 + for group_tok in groups[group]: 455 + new_tok = copy(group_tok) 456 + 457 + new_level = [0, 0, 0] 458 + 459 + for i in range(0, len(level)): 460 + new_level[i] = new_tok.level[i] + level[i] 461 + 462 + new_tok.level = tuple(new_level) 463 + 464 + new.tokens += [ new_tok ] 465 + else: 466 + new.tokens += [ tok ] 467 + 468 + return new.tokens 469 + 470 + 471 + class CMatch: 472 + """ 473 + Finding nested delimiters is hard with regular expressions. It is 474 + even harder on Python with its normal re module, as there are several 475 + advanced regular expressions that are missing. 476 + 477 + This is the case of this pattern:: 478 + 479 + '\\bSTRUCT_GROUP(\\(((?:(?>[^)(]+)|(?1))*)\\))[^;]*;' 480 + 481 + which is used to properly match open/close parentheses of the 482 + string search STRUCT_GROUP(), 483 + 484 + Add a class that counts pairs of delimiters, using it to match and 485 + replace nested expressions. 486 + 487 + The original approach was suggested by: 488 + 489 + https://stackoverflow.com/questions/5454322/python-how-to-match-nested-parentheses-with-regex 490 + 491 + Although I re-implemented it to make it more generic and match 3 types 492 + of delimiters. The logic checks if delimiters are paired. If not, it 493 + will ignore the search string. 494 + """ 495 + 496 + 497 + def __init__(self, regex, delim="("): 498 + self.regex = KernRe("^" + regex + r"\b") 499 + self.start_delim = delim 500 + 501 + def _search(self, tokenizer): 502 + """ 503 + Finds paired blocks for a regex that ends with a delimiter. 
504 + 505 + The suggestion of using finditer to match pairs came from: 506 + https://stackoverflow.com/questions/5454322/python-how-to-match-nested-parentheses-with-regex 507 + but I ended using a different implementation to align all three types 508 + of delimiters and seek for an initial regular expression. 509 + 510 + The algorithm seeks for open/close paired delimiters and places them 511 + into a stack, yielding a start/stop position of each match when the 512 + stack is zeroed. 513 + 514 + The algorithm should work fine for properly paired lines, but will 515 + silently ignore end delimiters that precede a start delimiter. 516 + This should be OK for kernel-doc parser, as unaligned delimiters 517 + would cause compilation errors. So, we don't need to raise exceptions 518 + to cover such issues. 519 + """ 520 + 521 + start = None 522 + started = False 523 + 524 + import sys 525 + 526 + stack = [] 527 + 528 + for i, tok in enumerate(tokenizer.tokens): 529 + if start is None: 530 + if tok.kind == CToken.NAME and self.regex.match(tok.value): 531 + start = i 532 + stack.append((start, tok.level)) 533 + started = False 534 + 535 + continue 536 + 537 + if not started: 538 + if tok.kind == CToken.SPACE: 539 + continue 540 + 541 + if tok.kind == CToken.BEGIN and tok.value == self.start_delim: 542 + started = True 543 + continue 544 + 545 + # Name only token without BEGIN/END 546 + if i > start: 547 + i -= 1 548 + yield start, i 549 + start = None 550 + 551 + if tok.kind == CToken.END and tok.level == stack[-1][1]: 552 + start, level = stack.pop() 553 + 554 + yield start, i 555 + start = None 556 + 557 + # 558 + # If an END zeroing levels is not there, return remaining stuff 559 + # This is meant to solve cases where the caller logic might be 560 + # picking an incomplete block. 
561 + # 562 + if start and stack: 563 + if started: 564 + s = str(tokenizer) 565 + log.warning(f"can't find a final end at {s}") 566 + 567 + yield start, len(tokenizer.tokens) 568 + 569 + def search(self, source): 570 + """ 571 + This is similar to re.search: 572 + 573 + It matches a regex that it is followed by a delimiter, 574 + returning occurrences only if all delimiters are paired. 575 + """ 576 + 577 + if isinstance(source, CTokenizer): 578 + tokenizer = source 579 + is_token = True 580 + else: 581 + tokenizer = CTokenizer(source) 582 + is_token = False 583 + 584 + for start, end in self._search(tokenizer): 585 + new_tokenizer = CTokenizer(tokenizer.tokens[start:end + 1]) 586 + 587 + if is_token: 588 + yield new_tokenizer 589 + else: 590 + yield str(new_tokenizer) 591 + 592 + def sub(self, sub_str, source, count=0): 593 + """ 594 + This is similar to re.sub: 595 + 596 + It matches a regex that it is followed by a delimiter, 597 + replacing occurrences only if all delimiters are paired. 598 + 599 + if the sub argument contains:: 600 + 601 + r'\0' 602 + 603 + it will work just like re: it places there the matched paired data 604 + with the delimiter stripped. 605 + 606 + If count is different than zero, it will replace at most count 607 + items. 608 + """ 609 + if isinstance(source, CTokenizer): 610 + is_token = True 611 + tokenizer = source 612 + else: 613 + is_token = False 614 + tokenizer = CTokenizer(source) 615 + 616 + # Detect if sub_str contains sub arguments 617 + 618 + args_match = CTokenArgs(sub_str) 619 + 620 + new_tokenizer = CTokenizer() 621 + pos = 0 622 + n = 0 623 + 624 + # 625 + # NOTE: the code below doesn't consider overlays at sub. 626 + # We may need to add some extra unit tests to check if those 627 + # would cause problems. 
When replacing by "", this should not 628 + # be a problem, but other transformations could be problematic 629 + # 630 + for start, end in self._search(tokenizer): 631 + new_tokenizer.tokens += tokenizer.tokens[pos:start] 632 + 633 + new = CTokenizer(tokenizer.tokens[start:end + 1]) 634 + 635 + new_tokenizer.tokens += args_match.tokens(new) 636 + 637 + pos = end + 1 638 + 639 + n += 1 640 + if count and n >= count: 641 + break 642 + 643 + new_tokenizer.tokens += tokenizer.tokens[pos:] 644 + 645 + if not is_token: 646 + return str(new_tokenizer) 647 + 648 + return new_tokenizer 649 + 650 + def __repr__(self): 651 + """ 652 + Returns a displayable version of the class init. 653 + """ 654 + 655 + return f'CMatch("{self.regex.regex.pattern}")'
+20 -15
tools/lib/python/kdoc/kdoc_parser.py
···
 import re
 from pprint import pformat
 
-from kdoc.kdoc_re import NestedMatch, KernRe
+from kdoc.c_lex import CTokenizer, tokenizer_set_log
+from kdoc.kdoc_re import KernRe
 from kdoc.kdoc_item import KdocItem
 
 #
···
     """
     Remove ``struct``/``enum`` members that have been marked "private".
     """
-    # First look for a "public:" block that ends a private region, then
-    # handle the "private until the end" case.
-    #
-    text = KernRe(r'/\*\s*private:.*?/\*\s*public:.*?\*/', flags=re.S).sub('', text)
-    text = KernRe(r'/\*\s*private:.*', flags=re.S).sub('', text)
-    #
-    # We needed the comments to do the above, but now we can take them out.
-    #
-    return KernRe(r'\s*/\*.*?\*/\s*', flags=re.S).sub('', text).strip()
+
+    tokens = CTokenizer(text)
+    return str(tokens)
 
 class state:
     """
···
         self.fname = fname
         self.config = config
         self.xforms = xforms
+
+        tokenizer_set_log(self.config.log, f"{self.fname}: CMatch: ")
 
         # Initial state for the state machines
         self.state = state.NORMAL
···
         #
         # Do the basic parse to get the pieces of the declaration.
         #
+        proto = trim_private_members(proto)
         struct_parts = self.split_struct_proto(proto)
         if not struct_parts:
             self.emit_msg(ln, f"{proto} error: Cannot parse struct or union!")
···
         #
         # Go through the list of members applying all of our transformations.
         #
-        members = trim_private_members(members)
         members = self.xforms.apply("struct", members)
 
         #
···
         # Strip preprocessor directives. Note that this depends on the
         # trailing semicolon we added in process_proto_type().
         #
+        proto = trim_private_members(proto)
         proto = KernRe(r'#\s*((define|ifdef|if)\s+|endif)[^;]*;', flags=re.S).sub('', proto)
         #
         # Parse out the name and members of the enum. Typedef form first.
···
         r = KernRe(r'typedef\s+enum\s*\{(.*)\}\s*(\w*)\s*;')
         if r.search(proto):
             declaration_name = r.group(2)
-            members = trim_private_members(r.group(1))
+            members = r.group(1)
         #
         # Failing that, look for a straight enum
         #
···
             r = KernRe(r'enum\s+(\w*)\s*\{(.*)\}')
             if r.match(proto):
                 declaration_name = r.group(1)
-                members = trim_private_members(r.group(2))
+                members = r.group(2)
             #
             # OK, this isn't going to work.
             #
···
         member_set = set()
         members = KernRe(r'\([^;)]*\)').sub('', members)
         for arg in members.split(','):
-            if not arg:
-                continue
             arg = KernRe(r'^\s*(\w+).*').sub(r'\1', arg)
+            if not arg.strip():
+                continue
+
             self.entry.parameterlist.append(arg)
             if arg not in self.entry.parameterdescs:
                 self.entry.parameterdescs[arg] = self.undescribed
···
             elif doc_content.search(line):
                 self.emit_msg(ln, f"Incorrect use of kernel-doc format: {line}")
                 self.state = state.PROTO
+
+                #
+                # Don't let it add partial comments at the code, as breaks the
+                # logic meant to remove comments from prototypes.
+                #
+                self.process_proto_type(ln, "/**\n" + line)
             # else ... ??
 
     def process_inline_text(self, ln, line):
-201
tools/lib/python/kdoc/kdoc_re.py
··· 140 140 """ 141 141 142 142 return self.last_match.groups() 143 - 144 - #: Nested delimited pairs (brackets and parenthesis) 145 - DELIMITER_PAIRS = { 146 - '{': '}', 147 - '(': ')', 148 - '[': ']', 149 - } 150 - 151 - #: compiled delimiters 152 - RE_DELIM = KernRe(r'[\{\}\[\]\(\)]') 153 - 154 - 155 - class NestedMatch: 156 - """ 157 - Finding nested delimiters is hard with regular expressions. It is 158 - even harder on Python with its normal re module, as there are several 159 - advanced regular expressions that are missing. 160 - 161 - This is the case of this pattern:: 162 - 163 - '\\bSTRUCT_GROUP(\\(((?:(?>[^)(]+)|(?1))*)\\))[^;]*;' 164 - 165 - which is used to properly match open/close parentheses of the 166 - string search STRUCT_GROUP(), 167 - 168 - Add a class that counts pairs of delimiters, using it to match and 169 - replace nested expressions. 170 - 171 - The original approach was suggested by: 172 - 173 - https://stackoverflow.com/questions/5454322/python-how-to-match-nested-parentheses-with-regex 174 - 175 - Although I re-implemented it to make it more generic and match 3 types 176 - of delimiters. The logic checks if delimiters are paired. If not, it 177 - will ignore the search string. 178 - """ 179 - 180 - # TODO: make NestedMatch handle multiple match groups 181 - # 182 - # Right now, regular expressions to match it are defined only up to 183 - # the start delimiter, e.g.: 184 - # 185 - # \bSTRUCT_GROUP\( 186 - # 187 - # is similar to: STRUCT_GROUP\((.*)\) 188 - # except that the content inside the match group is delimiter-aligned. 189 - # 190 - # The content inside parentheses is converted into a single replace 191 - # group (e.g. r`\0'). 
-    #
-    # It would be nice to change such definition to support multiple
-    # match groups, allowing a regex equivalent to:
-    #
-    #   FOO\((.*), (.*), (.*)\)
-    #
-    # it is probably easier to define it not as a regular expression, but
-    # with some lexical definition like:
-    #
-    #   FOO(arg1, arg2, arg3)
-
-    def __init__(self, regex):
-        self.regex = KernRe(regex)
-
-    def _search(self, line):
-        """
-        Finds paired blocks for a regex that ends with a delimiter.
-
-        The suggestion of using finditer to match pairs came from:
-        https://stackoverflow.com/questions/5454322/python-how-to-match-nested-parentheses-with-regex
-        but I ended using a different implementation to align all three types
-        of delimiters and seek for an initial regular expression.
-
-        The algorithm seeks for open/close paired delimiters and places them
-        into a stack, yielding a start/stop position of each match when the
-        stack is zeroed.
-
-        The algorithm should work fine for properly paired lines, but will
-        silently ignore end delimiters that precede a start delimiter.
-        This should be OK for kernel-doc parser, as unaligned delimiters
-        would cause compilation errors. So, we don't need to raise exceptions
-        to cover such issues.
-        """
-
-        stack = []
-
-        for match_re in self.regex.finditer(line):
-            start = match_re.start()
-            offset = match_re.end()
-            string_char = None
-            escape = False
-
-            d = line[offset - 1]
-            if d not in DELIMITER_PAIRS:
-                continue
-
-            end = DELIMITER_PAIRS[d]
-            stack.append(end)
-
-            for match in RE_DELIM.finditer(line[offset:]):
-                pos = match.start() + offset
-
-                d = line[pos]
-
-                if escape:
-                    escape = False
-                    continue
-
-                if string_char:
-                    if d == '\\':
-                        escape = True
-                    elif d == string_char:
-                        string_char = None
-
-                    continue
-
-                if d in ('"', "'"):
-                    string_char = d
-                    continue
-
-                if d in DELIMITER_PAIRS:
-                    end = DELIMITER_PAIRS[d]
-
-                    stack.append(end)
-                    continue
-
-                # Does the end delimiter match what is expected?
-                if stack and d == stack[-1]:
-                    stack.pop()
-
-                    if not stack:
-                        yield start, offset, pos + 1
-                        break
-
-    def search(self, line):
-        """
-        This is similar to re.search:
-
-        It matches a regex that it is followed by a delimiter,
-        returning occurrences only if all delimiters are paired.
-        """
-
-        for t in self._search(line):
-
-            yield line[t[0]:t[2]]
-
-    def sub(self, sub, line, count=0):
-        """
-        This is similar to re.sub:
-
-        It matches a regex that it is followed by a delimiter,
-        replacing occurrences only if all delimiters are paired.
-
-        if the sub argument contains::
-
-            r'\0'
-
-        it will work just like re: it places there the matched paired data
-        with the delimiter stripped.
-
-        If count is different than zero, it will replace at most count
-        items.
-        """
-        out = ""
-
-        cur_pos = 0
-        n = 0
-
-        for start, end, pos in self._search(line):
-            out += line[cur_pos:start]
-
-            # Value, ignoring start/end delimiters
-            value = line[end:pos - 1]
-
-            # replaces \0 at the sub string, if \0 is used there
-            new_sub = sub
-            new_sub = new_sub.replace(r'\0', value)
-
-            out += new_sub
-
-            # Drop end ';' if any
-            if pos < len(line) and line[pos] == ';':
-                pos += 1
-
-            cur_pos = pos
-            n += 1
-
-            if count and count >= n:
-                break
-
-        # Append the remaining string
-        l = len(line)
-        out += line[cur_pos:l]
-
-        return out
-
-    def __repr__(self):
-        """
-        Returns a displayable version of the class init.
-        """
-
-        return f'NestedMatch("{self.regex.regex.pattern}")'
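For readers who want the idea behind the removed `_search()` without the surrounding class, here is a minimal standalone sketch of stack-based delimiter pairing. It is not the kernel code: `find_paired` is a hypothetical name, the contents of `DELIMITER_PAIRS` are assumed (its definition is not part of this hunk), and a plain character scan stands in for `RE_DELIM`.

```python
import re

# Assumed mapping: the hunk above uses DELIMITER_PAIRS but does not show it.
DELIMITER_PAIRS = {"(": ")", "[": "]", "{": "}"}


def find_paired(pattern, line):
    """Yield (start, args_start, end) for each match followed by a balanced block.

    `pattern` must end with an opening delimiter, e.g. r"__acquires\\s*\\(".
    """
    for m in re.finditer(pattern, line):
        opener = line[m.end() - 1]
        if opener not in DELIMITER_PAIRS:
            continue

        stack = [DELIMITER_PAIRS[opener]]   # closing delimiters still expected
        quote = None                        # set while inside a string literal
        pos = m.end()

        while pos < len(line) and stack:
            ch = line[pos]
            if quote:
                if ch == "\\":
                    pos += 1                # skip the escaped character
                elif ch == quote:
                    quote = None
            elif ch in "\"'":
                quote = ch
            elif ch in DELIMITER_PAIRS:
                stack.append(DELIMITER_PAIRS[ch])
            elif ch == stack[-1]:
                stack.pop()
            pos += 1

        # Only report the match if every opened delimiter was closed
        if not stack:
            yield m.start(), m.end(), pos


line = "static int f(void) __acquires((a, b));"
for start, _, end in find_paired(r"__acquires\s*\(", line):
    print(line[start:end])      # prints: __acquires((a, b))
```

As in the removed code, a closing delimiter that does not match the top of the stack is silently ignored, and an unbalanced block yields nothing.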
+110 -127
tools/lib/python/kdoc/xforms_lists.py
··· 4 4 5 5 import re 6 6 7 - from kdoc.kdoc_re import KernRe, NestedMatch 7 + from kdoc.kdoc_re import KernRe 8 + from kdoc.c_lex import CMatch, CTokenizer 8 9 9 - struct_args_pattern = r'([^,)]+)' 10 + struct_args_pattern = r"([^,)]+)" 11 + 10 12 11 13 class CTransforms: 12 14 """ ··· 17 15 into something we can parse and generate kdoc for. 18 16 """ 19 17 18 + # 19 + # NOTE: 20 + # Due to performance reasons, place CMatch rules before KernRe, 21 + # as this avoids running the C parser every time. 22 + # 23 + 20 24 #: Transforms for structs and unions. 21 25 struct_xforms = [ 22 - # Strip attributes 23 - (KernRe(r"__attribute__\s*\(\([a-z0-9,_\*\s\(\)]*\)\)", flags=re.I | re.S, cache=False), ' '), 24 - (KernRe(r'\s*__aligned\s*\([^;]*\)', re.S), ' '), 25 - (KernRe(r'\s*__counted_by\s*\([^;]*\)', re.S), ' '), 26 - (KernRe(r'\s*__counted_by_(le|be)\s*\([^;]*\)', re.S), ' '), 27 - (KernRe(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ' '), 28 - (KernRe(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ' '), 29 - (KernRe(r'\s*__packed\s*', re.S), ' '), 30 - (KernRe(r'\s*CRYPTO_MINALIGN_ATTR', re.S), ' '), 31 - (KernRe(r'\s*__private', re.S), ' '), 32 - (KernRe(r'\s*__rcu', re.S), ' '), 33 - (KernRe(r'\s*____cacheline_aligned_in_smp', re.S), ' '), 34 - (KernRe(r'\s*____cacheline_aligned', re.S), ' '), 35 - (KernRe(r'\s*__cacheline_group_(begin|end)\([^\)]+\);'), ''), 36 - # 37 - # Unwrap struct_group macros based on this definition: 38 - # __struct_group(TAG, NAME, ATTRS, MEMBERS...) 39 - # which has variants like: struct_group(NAME, MEMBERS...) 40 - # Only MEMBERS arguments require documentation. 41 - # 42 - # Parsing them happens on two steps: 43 - # 44 - # 1. drop struct group arguments that aren't at MEMBERS, 45 - # storing them as STRUCT_GROUP(MEMBERS) 46 - # 47 - # 2. remove STRUCT_GROUP() ancillary macro. 
48 - # 49 - # The original logic used to remove STRUCT_GROUP() using an 50 - # advanced regex: 51 - # 52 - # \bSTRUCT_GROUP(\(((?:(?>[^)(]+)|(?1))*)\))[^;]*; 53 - # 54 - # with two patterns that are incompatible with 55 - # Python re module, as it has: 56 - # 57 - # - a recursive pattern: (?1) 58 - # - an atomic grouping: (?>...) 59 - # 60 - # I tried a simpler version: but it didn't work either: 61 - # \bSTRUCT_GROUP\(([^\)]+)\)[^;]*; 62 - # 63 - # As it doesn't properly match the end parenthesis on some cases. 64 - # 65 - # So, a better solution was crafted: there's now a NestedMatch 66 - # class that ensures that delimiters after a search are properly 67 - # matched. So, the implementation to drop STRUCT_GROUP() will be 68 - # handled in separate. 69 - # 70 - (KernRe(r'\bstruct_group\s*\(([^,]*,)', re.S), r'STRUCT_GROUP('), 71 - (KernRe(r'\bstruct_group_attr\s*\(([^,]*,){2}', re.S), r'STRUCT_GROUP('), 72 - (KernRe(r'\bstruct_group_tagged\s*\(([^,]*),([^,]*),', re.S), r'struct \1 \2; STRUCT_GROUP('), 73 - (KernRe(r'\b__struct_group\s*\(([^,]*,){3}', re.S), r'STRUCT_GROUP('), 74 - # 75 - # Replace macros 76 - # 77 - # TODO: use NestedMatch for FOO($1, $2, ...) matches 78 - # 79 - # it is better to also move those to the NestedMatch logic, 80 - # to ensure that parentheses will be properly matched. 
81 - # 82 - (KernRe(r'__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)', re.S), 83 - r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'), 84 - (KernRe(r'DECLARE_PHY_INTERFACE_MASK\s*\(([^\)]+)\)', re.S), 85 - r'DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)'), 86 - (KernRe(r'DECLARE_BITMAP\s*\(' + struct_args_pattern + r',\s*' + struct_args_pattern + r'\)', 87 - re.S), r'unsigned long \1[BITS_TO_LONGS(\2)]'), 88 - (KernRe(r'DECLARE_HASHTABLE\s*\(' + struct_args_pattern + r',\s*' + struct_args_pattern + r'\)', 89 - re.S), r'unsigned long \1[1 << ((\2) - 1)]'), 90 - (KernRe(r'DECLARE_KFIFO\s*\(' + struct_args_pattern + r',\s*' + struct_args_pattern + 91 - r',\s*' + struct_args_pattern + r'\)', re.S), r'\2 *\1'), 92 - (KernRe(r'DECLARE_KFIFO_PTR\s*\(' + struct_args_pattern + r',\s*' + 93 - struct_args_pattern + r'\)', re.S), r'\2 *\1'), 94 - (KernRe(r'(?:__)?DECLARE_FLEX_ARRAY\s*\(' + struct_args_pattern + r',\s*' + 95 - struct_args_pattern + r'\)', re.S), r'\1 \2[]'), 96 - (KernRe(r'DEFINE_DMA_UNMAP_ADDR\s*\(' + struct_args_pattern + r'\)', re.S), r'dma_addr_t \1'), 97 - (KernRe(r'DEFINE_DMA_UNMAP_LEN\s*\(' + struct_args_pattern + r'\)', re.S), r'__u32 \1'), 98 - (KernRe(r'VIRTIO_DECLARE_FEATURES\(([\w_]+)\)'), r'union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }'), 26 + (CMatch("__attribute__"), ""), 27 + (CMatch("__aligned"), ""), 28 + (CMatch("__counted_by"), ""), 29 + (CMatch("__counted_by_(le|be)"), ""), 30 + (CMatch("__guarded_by"), ""), 31 + (CMatch("__pt_guarded_by"), ""), 32 + (CMatch("__packed"), ""), 33 + (CMatch("CRYPTO_MINALIGN_ATTR"), ""), 34 + (CMatch("__private"), ""), 35 + (CMatch("__rcu"), ""), 36 + (CMatch("____cacheline_aligned_in_smp"), ""), 37 + (CMatch("____cacheline_aligned"), ""), 38 + (CMatch("__cacheline_group_(?:begin|end)"), ""), 39 + (CMatch("__ETHTOOL_DECLARE_LINK_MODE_MASK"), r"DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)"), 40 + (CMatch("DECLARE_PHY_INTERFACE_MASK",),r"DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)"), 41 + 
(CMatch("DECLARE_BITMAP"), r"unsigned long \1[BITS_TO_LONGS(\2)]"), 42 + (CMatch("DECLARE_HASHTABLE"), r"unsigned long \1[1 << ((\2) - 1)]"), 43 + (CMatch("DECLARE_KFIFO"), r"\2 *\1"), 44 + (CMatch("DECLARE_KFIFO_PTR"), r"\2 *\1"), 45 + (CMatch("(?:__)?DECLARE_FLEX_ARRAY"), r"\1 \2[]"), 46 + (CMatch("DEFINE_DMA_UNMAP_ADDR"), r"dma_addr_t \1"), 47 + (CMatch("DEFINE_DMA_UNMAP_LEN"), r"__u32 \1"), 48 + (CMatch("VIRTIO_DECLARE_FEATURES"), r"union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }"), 49 + (CMatch("__cond_acquires"), ""), 50 + (CMatch("__cond_releases"), ""), 51 + (CMatch("__acquires"), ""), 52 + (CMatch("__releases"), ""), 53 + (CMatch("__must_hold"), ""), 54 + (CMatch("__must_not_hold"), ""), 55 + (CMatch("__must_hold_shared"), ""), 56 + (CMatch("__cond_acquires_shared"), ""), 57 + (CMatch("__acquires_shared"), ""), 58 + (CMatch("__releases_shared"), ""), 59 + (CMatch("__attribute__"), ""), 99 60 100 - (NestedMatch(r"__cond_acquires\s*\("), ""), 101 - (NestedMatch(r"__cond_releases\s*\("), ""), 102 - (NestedMatch(r"__acquires\s*\("), ""), 103 - (NestedMatch(r"__releases\s*\("), ""), 104 - (NestedMatch(r"__must_hold\s*\("), ""), 105 - (NestedMatch(r"__must_not_hold\s*\("), ""), 106 - (NestedMatch(r"__must_hold_shared\s*\("), ""), 107 - (NestedMatch(r"__cond_acquires_shared\s*\("), ""), 108 - (NestedMatch(r"__acquires_shared\s*\("), ""), 109 - (NestedMatch(r"__releases_shared\s*\("), ""), 110 - (NestedMatch(r'\bSTRUCT_GROUP\('), r'\0'), 61 + # 62 + # Macro __struct_group() creates an union with an anonymous 63 + # and a non-anonymous struct, depending on the parameters. We only 64 + # need one of those at kernel-doc, as we won't be documenting the same 65 + # members twice. 66 + # 67 + (CMatch("struct_group"), r"struct { \2+ };"), 68 + (CMatch("struct_group_attr"), r"struct { \3+ };"), 69 + (CMatch("struct_group_tagged"), r"struct { \3+ };"), 70 + (CMatch("__struct_group"), r"struct { \4+ };"), 111 71 ] 112 72 113 73 #: Transforms for function prototypes. 
114 74 function_xforms = [ 115 - (KernRe(r"^static +"), ""), 116 - (KernRe(r"^extern +"), ""), 117 - (KernRe(r"^asmlinkage +"), ""), 118 - (KernRe(r"^inline +"), ""), 119 - (KernRe(r"^__inline__ +"), ""), 120 - (KernRe(r"^__inline +"), ""), 121 - (KernRe(r"^__always_inline +"), ""), 122 - (KernRe(r"^noinline +"), ""), 123 - (KernRe(r"^__FORTIFY_INLINE +"), ""), 124 - (KernRe(r"__init +"), ""), 125 - (KernRe(r"__init_or_module +"), ""), 126 - (KernRe(r"__exit +"), ""), 127 - (KernRe(r"__deprecated +"), ""), 128 - (KernRe(r"__flatten +"), ""), 129 - (KernRe(r"__meminit +"), ""), 130 - (KernRe(r"__must_check +"), ""), 131 - (KernRe(r"__weak +"), ""), 132 - (KernRe(r"__sched +"), ""), 133 - (KernRe(r"_noprof"), ""), 134 - (KernRe(r"__always_unused *"), ""), 135 - (KernRe(r"__printf\s*\(\s*\d*\s*,\s*\d*\s*\) +"), ""), 136 - (KernRe(r"__(?:re)?alloc_size\s*\(\s*\d+\s*(?:,\s*\d+\s*)?\) +"), ""), 137 - (KernRe(r"__diagnose_as\s*\(\s*\S+\s*(?:,\s*\d+\s*)*\) +"), ""), 138 - (KernRe(r"DECL_BUCKET_PARAMS\s*\(\s*(\S+)\s*,\s*(\S+)\s*\)"), r"\1, \2"), 139 - (KernRe(r"__no_context_analysis\s*"), ""), 140 - (KernRe(r"__attribute_const__ +"), ""), 141 - (KernRe(r"__attribute__\s*\(\((?:[\w\s]+(?:\([^)]*\))?\s*,?)+\)\)\s+"), ""), 75 + (CMatch("static"), ""), 76 + (CMatch("extern"), ""), 77 + (CMatch("asmlinkage"), ""), 78 + (CMatch("inline"), ""), 79 + (CMatch("__inline__"), ""), 80 + (CMatch("__inline"), ""), 81 + (CMatch("__always_inline"), ""), 82 + (CMatch("noinline"), ""), 83 + (CMatch("__FORTIFY_INLINE"), ""), 84 + (CMatch("__init"), ""), 85 + (CMatch("__init_or_module"), ""), 86 + (CMatch("__exit"), ""), 87 + (CMatch("__deprecated"), ""), 88 + (CMatch("__flatten"), ""), 89 + (CMatch("__meminit"), ""), 90 + (CMatch("__must_check"), ""), 91 + (CMatch("__weak"), ""), 92 + (CMatch("__sched"), ""), 93 + (CMatch("__always_unused"), ""), 94 + (CMatch("__printf"), ""), 95 + (CMatch("__(?:re)?alloc_size"), ""), 96 + (CMatch("__diagnose_as"), ""), 97 + (CMatch("DECL_BUCKET_PARAMS"), 
r"\1, \2"), 98 + (CMatch("__no_context_analysis"), ""), 99 + (CMatch("__attribute_const__"), ""), 100 + (CMatch("__attribute__"), ""), 101 + 102 + # 103 + # HACK: this is similar to process_export() hack. It is meant to 104 + # drop _noproof from function name. See for instance: 105 + # ahash_request_alloc kernel-doc declaration at include/crypto/hash.h. 106 + # 107 + (KernRe("_noprof"), ""), 142 108 ] 143 109 144 110 #: Transforms for variable prototypes. 145 111 var_xforms = [ 146 - (KernRe(r"__read_mostly"), ""), 147 - (KernRe(r"__ro_after_init"), ""), 148 - (KernRe(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ""), 149 - (KernRe(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ""), 150 - (KernRe(r"LIST_HEAD\(([\w_]+)\)"), r"struct list_head \1"), 112 + (CMatch("__read_mostly"), ""), 113 + (CMatch("__ro_after_init"), ""), 114 + (CMatch("__guarded_by"), ""), 115 + (CMatch("__pt_guarded_by"), ""), 116 + (CMatch("LIST_HEAD"), r"struct list_head \1"), 117 + 151 118 (KernRe(r"(?://.*)$"), ""), 152 119 (KernRe(r"(?:/\*.*\*/)"), ""), 153 120 (KernRe(r";$"), ""), ··· 129 158 "var": var_xforms, 130 159 } 131 160 132 - def apply(self, xforms_type, text): 161 + def apply(self, xforms_type, source): 133 162 """ 134 - Apply a set of transforms to a block of text. 163 + Apply a set of transforms to a block of source. 164 + 165 + As tokenizer is used here, this function also remove comments 166 + at the end. 135 167 """ 136 168 if xforms_type not in self.xforms: 137 - return text 169 + return source 170 + 171 + if isinstance(source, str): 172 + source = CTokenizer(source) 138 173 139 174 for search, subst in self.xforms[xforms_type]: 140 - text = search.sub(subst, text) 141 - return text 175 + # 176 + # KernRe only accept strings. 177 + # 178 + if isinstance(search, KernRe): 179 + source = str(source) 180 + 181 + source = search.sub(subst, source) 182 + return str(source)
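The `CMatch` replacement strings above use `\1`, `\2`, ... for individual macro arguments and `\N+` for "argument N and everything after it". The implementation of `CMatch.sub()` is not part of this hunk, so the sketch below is only an inference from the rules and unit tests in this series: `split_args` and `expand` are hypothetical helpers, string literals containing commas are not handled, and the `ValueError` checks exercised by the tests are omitted.

```python
import re


def split_args(text):
    """Split a macro argument string at top-level commas only."""
    args, depth, current = [], 0, ""
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            args.append(current.strip())
            current = ""
        else:
            current += ch
    args.append(current.strip())
    return args


def expand(template, arg_text):
    r"""Replace \0, \N and \N+ in `template` using the arguments in `arg_text`."""
    args = split_args(arg_text)

    def repl(m):
        n = int(m.group(1))
        if n == 0:
            return arg_text                   # \0: the whole argument list
        if m.group(2):
            return ", ".join(args[n - 1:])    # \N+: argument N and the rest
        return args[n - 1] if n <= len(args) else ""

    return re.sub(r"\\(\d+)(\+?)", repl, template)


print(expand(r"unsigned long \1[BITS_TO_LONGS(\2)]", "mask, NBITS"))
print(expand(r"struct { \2+ };", "hdr, int a, b; char c;"))
```

The second call hints at why a greedy form is useful for `struct_group()`: member declarations such as `int a, b;` contain commas of their own, so the members cannot be addressed as a single numbered argument.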
+353
tools/lib/python/unittest_helper.py
··· 1 + #!/usr/bin/env python3 2 + # SPDX-License-Identifier: GPL-2.0 3 + # Copyright(c) 2025-2026: Mauro Carvalho Chehab <mchehab@kernel.org>. 4 + # 5 + # pylint: disable=C0103,R0912,R0914,E1101 6 + 7 + """ 8 + Provides helper functions and classes execute python unit tests. 9 + 10 + Those help functions provide a nice colored output summary of each 11 + executed test and, when a test fails, it shows the different in diff 12 + format when running in verbose mode, like:: 13 + 14 + $ tools/unittests/nested_match.py -v 15 + ... 16 + Traceback (most recent call last): 17 + File "/new_devel/docs/tools/unittests/nested_match.py", line 69, in test_count_limit 18 + self.assertEqual(replaced, "bar(a); bar(b); foo(c)") 19 + ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 20 + AssertionError: 'bar(a) foo(b); foo(c)' != 'bar(a); bar(b); foo(c)' 21 + - bar(a) foo(b); foo(c) 22 + ? ^^^^ 23 + + bar(a); bar(b); foo(c) 24 + ? ^^^^^ 25 + ... 26 + 27 + It also allows filtering what tests will be executed via ``-k`` parameter. 28 + 29 + Typical usage is to do:: 30 + 31 + from unittest_helper import run_unittest 32 + ... 33 + 34 + if __name__ == "__main__": 35 + run_unittest(__file__) 36 + 37 + If passing arguments is needed, on a more complex scenario, it can be 38 + used like on this example:: 39 + 40 + from unittest_helper import TestUnits, run_unittest 41 + ... 42 + env = {'sudo': ""} 43 + ... 
44 + if __name__ == "__main__": 45 + runner = TestUnits() 46 + base_parser = runner.parse_args() 47 + base_parser.add_argument('--sudo', action='store_true', 48 + help='Enable tests requiring sudo privileges') 49 + 50 + args = base_parser.parse_args() 51 + 52 + # Update module-level flag 53 + if args.sudo: 54 + env['sudo'] = "1" 55 + 56 + # Run tests with customized arguments 57 + runner.run(__file__, parser=base_parser, args=args, env=env) 58 + """ 59 + 60 + import argparse 61 + import atexit 62 + import os 63 + import re 64 + import unittest 65 + import sys 66 + 67 + from unittest.mock import patch 68 + 69 + 70 + class Summary(unittest.TestResult): 71 + """ 72 + Overrides ``unittest.TestResult`` class to provide a nice colored 73 + summary. When in verbose mode, displays actual/expected difference in 74 + unified diff format. 75 + """ 76 + def __init__(self, *args, **kwargs): 77 + super().__init__(*args, **kwargs) 78 + 79 + #: Dictionary to store organized test results. 80 + self.test_results = {} 81 + 82 + #: max length of the test names. 
83 + self.max_name_length = 0 84 + 85 + def startTest(self, test): 86 + super().startTest(test) 87 + test_id = test.id() 88 + parts = test_id.split(".") 89 + 90 + # Extract module, class, and method names 91 + if len(parts) >= 3: 92 + module_name = parts[-3] 93 + else: 94 + module_name = "" 95 + if len(parts) >= 2: 96 + class_name = parts[-2] 97 + else: 98 + class_name = "" 99 + 100 + method_name = parts[-1] 101 + 102 + # Build the hierarchical structure 103 + if module_name not in self.test_results: 104 + self.test_results[module_name] = {} 105 + 106 + if class_name not in self.test_results[module_name]: 107 + self.test_results[module_name][class_name] = [] 108 + 109 + # Track maximum test name length for alignment 110 + display_name = f"{method_name}:" 111 + 112 + self.max_name_length = max(len(display_name), self.max_name_length) 113 + 114 + def _record_test(self, test, status): 115 + test_id = test.id() 116 + parts = test_id.split(".") 117 + if len(parts) >= 3: 118 + module_name = parts[-3] 119 + else: 120 + module_name = "" 121 + if len(parts) >= 2: 122 + class_name = parts[-2] 123 + else: 124 + class_name = "" 125 + method_name = parts[-1] 126 + self.test_results[module_name][class_name].append((method_name, status)) 127 + 128 + def addSuccess(self, test): 129 + super().addSuccess(test) 130 + self._record_test(test, "OK") 131 + 132 + def addFailure(self, test, err): 133 + super().addFailure(test, err) 134 + self._record_test(test, "FAIL") 135 + 136 + def addError(self, test, err): 137 + super().addError(test, err) 138 + self._record_test(test, "ERROR") 139 + 140 + def addSkip(self, test, reason): 141 + super().addSkip(test, reason) 142 + self._record_test(test, f"SKIP ({reason})") 143 + 144 + def printResults(self): 145 + """ 146 + Print results using colors if tty. 
147 + """ 148 + # Check for ANSI color support 149 + use_color = sys.stdout.isatty() 150 + COLORS = { 151 + "OK": "\033[32m", # Green 152 + "FAIL": "\033[31m", # Red 153 + "SKIP": "\033[1;33m", # Yellow 154 + "PARTIAL": "\033[33m", # Orange 155 + "EXPECTED_FAIL": "\033[36m", # Cyan 156 + "reset": "\033[0m", # Reset to default terminal color 157 + } 158 + if not use_color: 159 + for c in COLORS: 160 + COLORS[c] = "" 161 + 162 + # Calculate maximum test name length 163 + if not self.test_results: 164 + return 165 + try: 166 + lengths = [] 167 + for module in self.test_results.values(): 168 + for tests in module.values(): 169 + for test_name, _ in tests: 170 + lengths.append(len(test_name) + 1) # +1 for colon 171 + max_length = max(lengths) + 2 # Additional padding 172 + except ValueError: 173 + sys.exit("Test list is empty") 174 + 175 + # Print results 176 + for module_name, classes in self.test_results.items(): 177 + print(f"{module_name}:") 178 + for class_name, tests in classes.items(): 179 + print(f" {class_name}:") 180 + for test_name, status in tests: 181 + # Get base status without reason for SKIP 182 + if status.startswith("SKIP"): 183 + status_code = status.split()[0] 184 + else: 185 + status_code = status 186 + color = COLORS.get(status_code, "") 187 + print( 188 + f" {test_name + ':':<{max_length}}{color}{status}{COLORS['reset']}" 189 + ) 190 + print() 191 + 192 + # Print summary 193 + print(f"\nRan {self.testsRun} tests", end="") 194 + if hasattr(self, "timeTaken"): 195 + print(f" in {self.timeTaken:.3f}s", end="") 196 + print() 197 + 198 + if not self.wasSuccessful(): 199 + print(f"\n{COLORS['FAIL']}FAILED (", end="") 200 + failures = getattr(self, "failures", []) 201 + errors = getattr(self, "errors", []) 202 + if failures: 203 + print(f"failures={len(failures)}", end="") 204 + if errors: 205 + if failures: 206 + print(", ", end="") 207 + print(f"errors={len(errors)}", end="") 208 + print(f"){COLORS['reset']}") 209 + 210 + 211 + def 
flatten_suite(suite): 212 + """Flatten test suite hierarchy.""" 213 + tests = [] 214 + for item in suite: 215 + if isinstance(item, unittest.TestSuite): 216 + tests.extend(flatten_suite(item)) 217 + else: 218 + tests.append(item) 219 + return tests 220 + 221 + 222 + class TestUnits: 223 + """ 224 + Helper class to set verbosity level. 225 + 226 + This class discover test files, import its unittest classes and 227 + executes the test on it. 228 + """ 229 + def parse_args(self): 230 + """Returns a parser for command line arguments.""" 231 + parser = argparse.ArgumentParser(description="Test runner with regex filtering") 232 + parser.add_argument("-v", "--verbose", action="count", default=1) 233 + parser.add_argument("-f", "--failfast", action="store_true") 234 + parser.add_argument("-k", "--keyword", 235 + help="Regex pattern to filter test methods") 236 + return parser 237 + 238 + def run(self, caller_file=None, pattern=None, 239 + suite=None, parser=None, args=None, env=None): 240 + """ 241 + Execute all tests from the unity test file. 242 + 243 + It contains several optional parameters: 244 + 245 + ``caller_file``: 246 + - name of the file that contains test. 247 + 248 + typical usage is to place __file__ at the caller test, e.g.:: 249 + 250 + if __name__ == "__main__": 251 + TestUnits().run(__file__) 252 + 253 + ``pattern``: 254 + - optional pattern to match multiple file names. Defaults 255 + to basename of ``caller_file``. 256 + 257 + ``suite``: 258 + - an unittest suite initialized by the caller using 259 + ``unittest.TestLoader().discover()``. 260 + 261 + ``parser``: 262 + - an argparse parser. If not defined, this helper will create 263 + one. 264 + 265 + ``args``: 266 + - an ``argparse.Namespace`` data filled by the caller. 267 + 268 + ``env``: 269 + - environment variables that will be passed to the test suite 270 + 271 + At least ``caller_file`` or ``suite`` must be used, otherwise a 272 + ``TypeError`` will be raised. 
273 + """ 274 + if not args: 275 + if not parser: 276 + parser = self.parse_args() 277 + args = parser.parse_args() 278 + 279 + if not caller_file and not suite: 280 + raise TypeError("Either caller_file or suite is needed at TestUnits") 281 + 282 + verbose = args.verbose 283 + 284 + if not env: 285 + env = os.environ.copy() 286 + 287 + env["VERBOSE"] = f"{verbose}" 288 + 289 + patcher = patch.dict(os.environ, env) 290 + patcher.start() 291 + # ensure it gets stopped after 292 + atexit.register(patcher.stop) 293 + 294 + 295 + if verbose >= 2: 296 + unittest.TextTestRunner(verbosity=verbose).run = lambda suite: suite 297 + 298 + # Load ONLY tests from the calling file 299 + if not suite: 300 + if not pattern: 301 + pattern = caller_file 302 + 303 + loader = unittest.TestLoader() 304 + suite = loader.discover(start_dir=os.path.dirname(caller_file), 305 + pattern=os.path.basename(caller_file)) 306 + 307 + # Flatten the suite for environment injection 308 + tests_to_inject = flatten_suite(suite) 309 + 310 + # Filter tests by method name if -k specified 311 + if args.keyword: 312 + try: 313 + pattern = re.compile(args.keyword) 314 + filtered_suite = unittest.TestSuite() 315 + for test in tests_to_inject: # Use the pre-flattened list 316 + method_name = test.id().split(".")[-1] 317 + if pattern.search(method_name): 318 + filtered_suite.addTest(test) 319 + suite = filtered_suite 320 + except re.error as e: 321 + sys.stderr.write(f"Invalid regex pattern: {e}\n") 322 + sys.exit(1) 323 + else: 324 + # Maintain original suite structure if no keyword filtering 325 + suite = unittest.TestSuite(tests_to_inject) 326 + 327 + if verbose >= 2: 328 + resultclass = None 329 + else: 330 + resultclass = Summary 331 + 332 + runner = unittest.TextTestRunner(verbosity=args.verbose, 333 + resultclass=resultclass, 334 + failfast=args.failfast) 335 + result = runner.run(suite) 336 + if resultclass: 337 + result.printResults() 338 + 339 + sys.exit(not result.wasSuccessful()) 340 + 341 + 342 + 
+def run_unittest(fname):
+    """
+    Basic usage of TestUnits class.
+
+    Use it when there's no need to pass any extra arguments to the
+    tests. The recommended way is to place this at the end of each
+    unittest module::
+
+        if __name__ == "__main__":
+            run_unittest(__file__)
+    """
+    TestUnits().run(fname)
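The `-k` filtering in `TestUnits.run()` boils down to flattening the suite and keeping the tests whose method name matches a regex. A self-contained sketch of just that step, using only the standard `unittest` and `re` modules; `filter_by_method` and the `Demo` test class are hypothetical names made up for illustration:

```python
import re
import unittest


def flatten(suite):
    """Return the individual test cases held in a (possibly nested) suite."""
    tests = []
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            tests.extend(flatten(item))
        else:
            tests.append(item)
    return tests


def filter_by_method(suite, keyword):
    """Keep only the tests whose method name matches the `keyword` regex."""
    regex = re.compile(keyword)
    kept = unittest.TestSuite()
    for test in flatten(suite):
        if regex.search(test.id().split(".")[-1]):
            kept.addTest(test)
    return kept


class Demo(unittest.TestCase):      # hypothetical tests, for illustration only
    def test_search_simple(self):
        self.assertTrue(True)

    def test_sub_simple(self):
        self.assertTrue(True)


suite = unittest.TestLoader().loadTestsFromTestCase(Demo)
selected = filter_by_method(suite, r"^test_sub")
print([t.id().split(".")[-1] for t in flatten(selected)])   # ['test_sub_simple']
```

Matching on the last component of `test.id()` means the filter sees only the method name, not the module or class, which is also how the helper above behaves.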
+17
tools/unittests/run.py
+#!/bin/env python3
+import os
+import unittest
+import sys
+
+TOOLS_DIR=os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
+sys.path.insert(0, TOOLS_DIR)
+
+from lib.python.unittest_helper import TestUnits
+
+if __name__ == "__main__":
+    loader = unittest.TestLoader()
+
+    suite = loader.discover(start_dir=os.path.join(TOOLS_DIR, "unittests"),
+                            pattern="test*.py")
+
+    TestUnits().run("", suite=suite)
+821
tools/unittests/test_cmatch.py
··· 1 + #!/usr/bin/env python3 2 + # SPDX-License-Identifier: GPL-2.0 3 + # Copyright(c) 2026: Mauro Carvalho Chehab <mchehab@kernel.org>. 4 + # 5 + # pylint: disable=C0413,R0904 6 + 7 + 8 + """ 9 + Unit tests for kernel-doc CMatch. 10 + """ 11 + 12 + import os 13 + import re 14 + import sys 15 + import unittest 16 + 17 + 18 + # Import Python modules 19 + 20 + SRC_DIR = os.path.dirname(os.path.realpath(__file__)) 21 + sys.path.insert(0, os.path.join(SRC_DIR, "../lib/python")) 22 + 23 + from kdoc.c_lex import CMatch 24 + from kdoc.kdoc_re import KernRe 25 + from unittest_helper import run_unittest 26 + 27 + # 28 + # Override unittest.TestCase to better compare diffs ignoring whitespaces 29 + # 30 + class TestCaseDiff(unittest.TestCase): 31 + """ 32 + Disable maximum limit on diffs and add a method to better 33 + handle diffs with whitespace differences. 34 + """ 35 + 36 + @classmethod 37 + def setUpClass(cls): 38 + """Ensure that there won't be limit for diffs""" 39 + cls.maxDiff = None 40 + 41 + 42 + # 43 + # Tests doing with different macros 44 + # 45 + 46 + class TestSearch(TestCaseDiff): 47 + """ 48 + Test search mechanism 49 + """ 50 + 51 + def test_search_acquires_simple(self): 52 + line = "__acquires(ctx) foo();" 53 + result = ", ".join(CMatch("__acquires").search(line)) 54 + self.assertEqual(result, "__acquires(ctx)") 55 + 56 + def test_search_acquires_multiple(self): 57 + line = "__acquires(ctx) __acquires(other) bar();" 58 + result = ", ".join(CMatch("__acquires").search(line)) 59 + self.assertEqual(result, "__acquires(ctx), __acquires(other)") 60 + 61 + def test_search_acquires_nested_paren(self): 62 + line = "__acquires((ctx1, ctx2)) baz();" 63 + result = ", ".join(CMatch("__acquires").search(line)) 64 + self.assertEqual(result, "__acquires((ctx1, ctx2))") 65 + 66 + def test_search_must_hold(self): 67 + line = "__must_hold(&lock) do_something();" 68 + result = ", ".join(CMatch("__must_hold").search(line)) 69 + self.assertEqual(result, 
"__must_hold(&lock)") 70 + 71 + def test_search_must_hold_shared(self): 72 + line = "__must_hold_shared(RCU) other();" 73 + result = ", ".join(CMatch("__must_hold_shared").search(line)) 74 + self.assertEqual(result, "__must_hold_shared(RCU)") 75 + 76 + def test_search_no_false_positive(self): 77 + line = "call__acquires(foo); // should stay intact" 78 + result = ", ".join(CMatch(r"__acquires").search(line)) 79 + self.assertEqual(result, "") 80 + 81 + def test_search_no_macro_remains(self): 82 + line = "do_something_else();" 83 + result = ", ".join(CMatch("__acquires").search(line)) 84 + self.assertEqual(result, "") 85 + 86 + def test_search_no_function(self): 87 + line = "something" 88 + result = ", ".join(CMatch(line).search(line)) 89 + self.assertEqual(result, "") 90 + 91 + # 92 + # Override unittest.TestCase to better compare diffs ignoring whitespaces 93 + # 94 + class TestCaseDiff(unittest.TestCase): 95 + """ 96 + Disable maximum limit on diffs and add a method to better 97 + handle diffs with whitespace differences. 98 + """ 99 + 100 + @classmethod 101 + def setUpClass(cls): 102 + """Ensure that there won't be limit for diffs""" 103 + cls.maxDiff = None 104 + 105 + def assertLogicallyEqual(self, a, b): 106 + """ 107 + Compare two results ignoring multiple whitespace differences. 108 + 109 + This is useful to check more complex matches picked from examples. 110 + On a plus side, we also don't need to use dedent. 111 + Please notice that line breaks still need to match. We might 112 + remove it at the regex, but this way, checking the diff is easier. 
113 + """ 114 + a = re.sub(r"[\t ]+", " ", a.strip()) 115 + b = re.sub(r"[\t ]+", " ", b.strip()) 116 + 117 + a = re.sub(r"\s+\n", "\n", a) 118 + b = re.sub(r"\s+\n", "\n", b) 119 + 120 + a = re.sub(" ;", ";", a) 121 + b = re.sub(" ;", ";", b) 122 + 123 + self.assertEqual(a, b) 124 + 125 + # 126 + # Tests doing with different macros 127 + # 128 + 129 + class TestSubMultipleMacros(TestCaseDiff): 130 + """ 131 + Tests doing with different macros. 132 + 133 + Here, we won't use assertLogicallyEqual. Instead, we'll check if each 134 + of the expected patterns are present at the answer. 135 + """ 136 + 137 + def test_acquires_simple(self): 138 + """Simple replacement test with __acquires""" 139 + line = "__acquires(ctx) foo();" 140 + result = CMatch(r"__acquires").sub("REPLACED", line) 141 + 142 + self.assertEqual("REPLACED foo();", result) 143 + 144 + def test_acquires_multiple(self): 145 + """Multiple __acquires""" 146 + line = "__acquires(ctx) __acquires(other) bar();" 147 + result = CMatch(r"__acquires").sub("REPLACED", line) 148 + 149 + self.assertEqual("REPLACED REPLACED bar();", result) 150 + 151 + def test_acquires_nested_paren(self): 152 + """__acquires with nested pattern""" 153 + line = "__acquires((ctx1, ctx2)) baz();" 154 + result = CMatch(r"__acquires").sub("REPLACED", line) 155 + 156 + self.assertEqual("REPLACED baz();", result) 157 + 158 + def test_must_hold(self): 159 + """__must_hold with a pointer""" 160 + line = "__must_hold(&lock) do_something();" 161 + result = CMatch(r"__must_hold").sub("REPLACED", line) 162 + 163 + self.assertNotIn("__must_hold(", result) 164 + self.assertIn("do_something();", result) 165 + 166 + def test_must_hold_shared(self): 167 + """__must_hold with an upercase defined value""" 168 + line = "__must_hold_shared(RCU) other();" 169 + result = CMatch(r"__must_hold_shared").sub("REPLACED", line) 170 + 171 + self.assertNotIn("__must_hold_shared(", result) 172 + self.assertIn("other();", result) 173 + 174 + def 
test_no_false_positive(self): 175 + """ 176 + Ensure that unrelated text containing similar patterns is preserved 177 + """ 178 + line = "call__acquires(foo); // should stay intact" 179 + result = CMatch(r"\b__acquires").sub("REPLACED", line) 180 + 181 + self.assertLogicallyEqual(result, "call__acquires(foo);") 182 + 183 + def test_mixed_macros(self): 184 + """Add a mix of macros""" 185 + line = "__acquires(ctx) __releases(ctx) __must_hold(&lock) foo();" 186 + 187 + result = CMatch(r"__acquires").sub("REPLACED", line) 188 + result = CMatch(r"__releases").sub("REPLACED", result) 189 + result = CMatch(r"__must_hold").sub("REPLACED", result) 190 + 191 + self.assertNotIn("__acquires(", result) 192 + self.assertNotIn("__releases(", result) 193 + self.assertNotIn("__must_hold(", result) 194 + 195 + self.assertIn("foo();", result) 196 + 197 + def test_no_macro_remains(self): 198 + """Ensures that unmatched macros are untouched""" 199 + line = "do_something_else();" 200 + result = CMatch(r"__acquires").sub("REPLACED", line) 201 + 202 + self.assertEqual(result, line) 203 + 204 + def test_no_function(self): 205 + """Ensures that no functions will remain untouched""" 206 + line = "something" 207 + result = CMatch(line).sub("REPLACED", line) 208 + 209 + self.assertEqual(result, line) 210 + 211 + # 212 + # Check if the diff is logically equivalent. To simplify, the tests here 213 + # use a single macro name for all replacements. 214 + # 215 + 216 + class TestSubSimple(TestCaseDiff): 217 + """ 218 + Test argument replacements. 219 + 220 + Here, the function name can be anything. So, we picked __attribute__(), 221 + to mimic a macro found at the Kernel, but none of the replacements her 222 + has any relationship with the Kernel usage. 
223 + """ 224 + 225 + MACRO = "__attribute__" 226 + 227 + @classmethod 228 + def setUpClass(cls): 229 + """Define a CMatch to be used for all tests""" 230 + cls.matcher = CMatch(cls.MACRO) 231 + 232 + def test_sub_with_capture(self): 233 + """Test all arguments replacement with a single arg""" 234 + line = f"{self.MACRO}(&ctx)\nfoo();" 235 + 236 + result = self.matcher.sub(r"ACQUIRED(\0)", line) 237 + 238 + self.assertLogicallyEqual("ACQUIRED(&ctx)\nfoo();", result) 239 + 240 + def test_sub_zero_placeholder(self): 241 + """Test all arguments replacement with a multiple args""" 242 + line = f"{self.MACRO}(arg1, arg2)\nbar();" 243 + 244 + result = self.matcher.sub(r"REPLACED(\0)", line) 245 + 246 + self.assertLogicallyEqual("REPLACED(arg1, arg2)\nbar();", result) 247 + 248 + def test_sub_single_placeholder(self): 249 + """Single replacement rule for \1""" 250 + line = f"{self.MACRO}(ctx, boo)\nfoo();" 251 + result = self.matcher.sub(r"ACQUIRED(\1)", line) 252 + 253 + self.assertLogicallyEqual("ACQUIRED(ctx)\nfoo();", result) 254 + 255 + def test_sub_multiple_placeholders(self): 256 + """Replacement rule for both \1 and \2""" 257 + line = f"{self.MACRO}(arg1, arg2)\nbar();" 258 + result = self.matcher.sub(r"REPLACE(\1, \2)", line) 259 + 260 + self.assertLogicallyEqual("REPLACE(arg1, arg2)\nbar();", result) 261 + 262 + def test_sub_mixed_placeholders(self): 263 + """Replacement rule for \0, \1 and additional text""" 264 + line = f"{self.MACRO}(foo, bar)\nbaz();" 265 + result = self.matcher.sub(r"ALL(\0) FIRST(\1)", line) 266 + 267 + self.assertLogicallyEqual("ALL(foo, bar) FIRST(foo)\nbaz();", result) 268 + 269 + def test_sub_no_placeholder(self): 270 + """Replacement without placeholders""" 271 + line = f"{self.MACRO}(arg)\nfoo();" 272 + result = self.matcher.sub(r"NO_BACKREFS()", line) 273 + 274 + self.assertLogicallyEqual("NO_BACKREFS()\nfoo();", result) 275 + 276 + def test_sub_count_parameter(self): 277 + """Verify that the algorithm stops after the requested 
        count"""
        line = f"{self.MACRO}(a1) x();\n{self.MACRO}(a2) y();"
        result = self.matcher.sub(r"ONLY_FIRST(\1) ", line, count=1)

        self.assertLogicallyEqual(f"ONLY_FIRST(a1) x();\n{self.MACRO}(a2) y();",
                                  result)

    def test_strip_multiple_acquires(self):
        """Check if spaces between removed delimiters will be dropped"""
        line = f"int {self.MACRO}(1) {self.MACRO}(2 ) {self.MACRO}(3) foo;"
        result = self.matcher.sub("", line)

        self.assertLogicallyEqual(result, "int foo;")

    def test_rise_early_greedy(self):
        line = f"{self.MACRO}(a, b, c, d);"
        sub = r"\1, \2+, \3"

        with self.assertRaises(ValueError):
            result = self.matcher.sub(sub, line)

    def test_rise_multiple_greedy(self):
        line = f"{self.MACRO}(a, b, c, d);"
        sub = r"\1, \2+, \3+"

        with self.assertRaises(ValueError):
            result = self.matcher.sub(sub, line)

#
# Test replacements with slashrefs
#


class TestSubWithLocalXforms(TestCaseDiff):
    """
    Test different usecase patterns found at the Kernel.

    Here, replacements using both CMatch and KernRe can be tested,
    as it will import the actual replacement rules used by kernel-doc.
    """

    struct_xforms = [
        (CMatch("__attribute__"), ' '),
        (CMatch('__aligned'), ' '),
        (CMatch('__counted_by'), ' '),
        (CMatch('__counted_by_(le|be)'), ' '),
        (CMatch('__guarded_by'), ' '),
        (CMatch('__pt_guarded_by'), ' '),

        (CMatch('__cacheline_group_(begin|end)'), ''),

        (CMatch('struct_group'), r'\2'),
        (CMatch('struct_group_attr'), r'\3'),
        (CMatch('struct_group_tagged'), r'struct \1 { \3+ } \2;'),
        (CMatch('__struct_group'), r'\4'),

        (CMatch('__ETHTOOL_DECLARE_LINK_MODE_MASK'), r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'),
        (CMatch('DECLARE_PHY_INTERFACE_MASK',), r'DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)'),
        (CMatch('DECLARE_BITMAP'), r'unsigned long \1[BITS_TO_LONGS(\2)]'),

        (CMatch('DECLARE_HASHTABLE'), r'unsigned long \1[1 << ((\2) - 1)]'),
        (CMatch('DECLARE_KFIFO'), r'\2 *\1'),
        (CMatch('DECLARE_KFIFO_PTR'), r'\2 *\1'),
        (CMatch('(?:__)?DECLARE_FLEX_ARRAY'), r'\1 \2[]'),
        (CMatch('DEFINE_DMA_UNMAP_ADDR'), r'dma_addr_t \1'),
        (CMatch('DEFINE_DMA_UNMAP_LEN'), r'__u32 \1'),
        (CMatch('VIRTIO_DECLARE_FEATURES'), r'union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }'),
    ]

    function_xforms = [
        (CMatch('__printf'), ""),
        (CMatch('__(?:re)?alloc_size'), ""),
        (CMatch("__diagnose_as"), ""),
        (CMatch("DECL_BUCKET_PARAMS"), r"\1, \2"),

        (CMatch("__cond_acquires"), ""),
        (CMatch("__cond_releases"), ""),
        (CMatch("__acquires"), ""),
        (CMatch("__releases"), ""),
        (CMatch("__must_hold"), ""),
        (CMatch("__must_not_hold"), ""),
        (CMatch("__must_hold_shared"), ""),
        (CMatch("__cond_acquires_shared"), ""),
        (CMatch("__acquires_shared"), ""),
        (CMatch("__releases_shared"), ""),
        (CMatch("__attribute__"), ""),
    ]

    var_xforms = [
        (CMatch('__guarded_by'), ""),
        (CMatch('__pt_guarded_by'), ""),
        (CMatch("LIST_HEAD"), r"struct list_head \1"),
    ]

    #: Transforms main dictionary used at apply_transforms().
    xforms = {
        "struct": struct_xforms,
        "func": function_xforms,
        "var": var_xforms,
    }

    @classmethod
    def apply_transforms(cls, xform_type, text):
        """
        Mimic the behavior of kdoc_parser.apply_transforms() method.

        For each (search, subst) pair in the list selected by
        ``xform_type``, apply ``search.sub()`` to the text.

        There are two parameters:

        - ``xform_type``
            Can be ``func``, ``struct`` or ``var``;
        - ``text``
            The text where the sub patterns from CTransforms will be applied.
        """
        for search, subst in cls.xforms.get(xform_type):
            text = search.sub(subst, text)

        return text.strip()

        cls.matcher = CMatch(r"struct_group[\w\_]*")

    def test_struct_group(self):
        """
        Test struct_group using a pattern from
        drivers/net/ethernet/asix/ax88796c_main.h.
        """
        line = """
            struct tx_pkt_info {
                struct_group(tx_overhead,
                    struct tx_sop_header sop;
                    struct tx_segment_header seg;
                );
                struct tx_eop_header eop;
                u16 pkt_len;
                u16 seq_num;
            };
        """
        expected = """
            struct tx_pkt_info {
                struct tx_sop_header sop;
                struct tx_segment_header seg;
                struct tx_eop_header eop;
                u16 pkt_len;
                u16 seq_num;
            };
        """

        result = self.apply_transforms("struct", line)
        self.assertLogicallyEqual(result, expected)

    def test_struct_group_attr(self):
        """
        Test two struct_group_attr using patterns from fs/smb/client/cifspdu.h.
        """
        line = """
            typedef struct smb_com_open_rsp {
                struct smb_hdr hdr; /* wct = 34 BB */
                __u8 AndXCommand;
                __u8 AndXReserved;
                __le16 AndXOffset;
                __u8 OplockLevel;
                __u16 Fid;
                __le32 CreateAction;
                struct_group_attr(common_attributes,,
                    __le64 CreationTime;
                    __le64 LastAccessTime;
                    __le64 LastWriteTime;
                    __le64 ChangeTime;
                    __le32 FileAttributes;
                );
                __le64 AllocationSize;
                __le64 EndOfFile;
                __le16 FileType;
                __le16 DeviceState;
                __u8 DirectoryFlag;
                __u16 ByteCount; /* bct = 0 */
            } OPEN_RSP;
            typedef struct {
                struct_group_attr(common_attributes,,
                    __le64 CreationTime;
                    __le64 LastAccessTime;
                    __le64 LastWriteTime;
                    __le64 ChangeTime;
                    __le32 Attributes;
                );
                __u32 Pad1;
                __le64 AllocationSize;
                __le64 EndOfFile;
                __le32 NumberOfLinks;
                __u8 DeletePending;
                __u8 Directory;
                __u16 Pad2;
                __le32 EASize;
                __le32 FileNameLength;
                union {
                    char __pad;
                    DECLARE_FLEX_ARRAY(char, FileName);
                };
            } FILE_ALL_INFO; /* level 0x107 QPathInfo */
        """
        expected = """
            typedef struct smb_com_open_rsp {
                struct smb_hdr hdr;
                __u8 AndXCommand;
                __u8 AndXReserved;
                __le16 AndXOffset;
                __u8 OplockLevel;
                __u16 Fid;
                __le32 CreateAction;
                __le64 CreationTime;
                __le64 LastAccessTime;
                __le64 LastWriteTime;
                __le64 ChangeTime;
                __le32 FileAttributes;
                __le64 AllocationSize;
                __le64 EndOfFile;
                __le16 FileType;
                __le16 DeviceState;
                __u8 DirectoryFlag;
                __u16 ByteCount;
            } OPEN_RSP;
            typedef struct {
                __le64 CreationTime;
                __le64 LastAccessTime;
                __le64 LastWriteTime;
                __le64 ChangeTime;
                __le32 Attributes;
                __u32 Pad1;
                __le64 AllocationSize;
                __le64 EndOfFile;
                __le32 NumberOfLinks;
                __u8
                DeletePending;
                __u8 Directory;
                __u16 Pad2;
                __le32 EASize;
                __le32 FileNameLength;
                union {
                    char __pad;
                    char FileName[];
                };
            } FILE_ALL_INFO;
        """

        result = self.apply_transforms("struct", line)
        self.assertLogicallyEqual(result, expected)

    def test_raw_struct_group(self):
        """
        Test a __struct_group pattern from include/uapi/cxl/features.h.
        """
        line = """
            struct cxl_mbox_get_sup_feats_out {
                __struct_group(cxl_mbox_get_sup_feats_out_hdr, hdr, /* empty */,
                    __le16 num_entries;
                    __le16 supported_feats;
                    __u8 reserved[4];
                );
                struct cxl_feat_entry ents[] __counted_by_le(num_entries);
            } __attribute__ ((__packed__));
        """
        expected = """
            struct cxl_mbox_get_sup_feats_out {
                __le16 num_entries;
                __le16 supported_feats;
                __u8 reserved[4];
                struct cxl_feat_entry ents[];
            };
        """

        result = self.apply_transforms("struct", line)
        self.assertLogicallyEqual(result, expected)

    def test_raw_struct_group_tagged(self):
        r"""
        Test cxl_regs with struct_group_tagged patterns from drivers/cxl/cxl.h.

        NOTE:

            This one has actually a violation from what kernel-doc would
            expect: Kernel-doc regex expects only 3 members, but this is
            actually defined as::

                #define struct_group_tagged(TAG, NAME, MEMBERS...)

            The replace expression there is::

                struct \1 { \3 } \2;

            but it should be really something like::

                struct \1 { \3 \4 \5 \6 \7 \8 ... } \2;

            a later fix would be needed to address it.

        """
        line = """
            struct cxl_regs {
                struct_group_tagged(cxl_component_regs, component,
                    void __iomem *hdm_decoder;
                    void __iomem *ras;
                );


                /* This is actually a violation: too much commas */
                struct_group_tagged(cxl_device_regs, device_regs,
                    void __iomem *status, *mbox, *memdev;
                );

                struct_group_tagged(cxl_pmu_regs, pmu_regs,
                    void __iomem *pmu;
                );

                struct_group_tagged(cxl_rch_regs, rch_regs,
                    void __iomem *dport_aer;
                );

                struct_group_tagged(cxl_rcd_regs, rcd_regs,
                    void __iomem *rcd_pcie_cap;
                );
            };
        """
        expected = """
            struct cxl_regs {
                struct cxl_component_regs {
                    void __iomem *hdm_decoder;
                    void __iomem *ras;
                } component;

                struct cxl_device_regs {
                    void __iomem *status, *mbox, *memdev;
                } device_regs;

                struct cxl_pmu_regs {
                    void __iomem *pmu;
                } pmu_regs;

                struct cxl_rch_regs {
                    void __iomem *dport_aer;
                } rch_regs;

                struct cxl_rcd_regs {
                    void __iomem *rcd_pcie_cap;
                } rcd_regs;
            };
        """

        result = self.apply_transforms("struct", line)
        self.assertLogicallyEqual(result, expected)

    def test_struct_group_tagged_with_private(self):
        """
        Replace struct_group_tagged with private, using the same regex
        for the replacement as what happens in xforms_lists.py.

        As the private removal happens outside NestedGroup class, we manually
        dropped the remaining part of the struct, to simulate what happens
        at kdoc_parser.

        Taken from include/net/page_pool/types.h
        """
        line = """
            struct page_pool_params {
                struct_group_tagged(page_pool_params_slow, slow,
                    struct net_device *netdev;
                    unsigned int queue_idx;
                    unsigned int flags;
                    /* private: only under "slow" struct */
                    unsigned int ignored;
                );
                /* Struct below shall not be ignored */
                struct_group_tagged(page_pool_params_fast, fast,
                    unsigned int order;
                    unsigned int pool_size;
                    int nid;
                    struct device *dev;
                    struct napi_struct *napi;
                    enum dma_data_direction dma_dir;
                    unsigned int max_len;
                    unsigned int offset;
                );
            };
        """
        expected = """
            struct page_pool_params {
                struct page_pool_params_slow {
                    struct net_device *netdev;
                    unsigned int queue_idx;
                    unsigned int flags;
                } slow;
                struct page_pool_params_fast {
                    unsigned int order;
                    unsigned int pool_size;
                    int nid;
                    struct device *dev;
                    struct napi_struct *napi;
                    enum dma_data_direction dma_dir;
                    unsigned int max_len;
                    unsigned int offset;
                } fast;
            };
        """

        result = self.apply_transforms("struct", line)
        self.assertLogicallyEqual(result, expected)

    def test_struct_kcov(self):
        """
        """
        line = """
            struct kcov {
                refcount_t refcount;
                spinlock_t lock;
                enum kcov_mode mode __guarded_by(&lock);
                unsigned int size __guarded_by(&lock);
                void *area __guarded_by(&lock);
                struct task_struct *t __guarded_by(&lock);
                bool remote;
                unsigned int remote_size;
                int sequence;
            };
        """
        expected = """
        """

        result = self.apply_transforms("struct", line)
        self.assertLogicallyEqual(result, expected)


    def test_struct_kcov(self):
        """
        Test a struct from kernel/kcov.c.
        """
        line = """
            struct kcov {
                refcount_t refcount;
                spinlock_t lock;
                enum kcov_mode mode __guarded_by(&lock);
                unsigned int size __guarded_by(&lock);
                void *area __guarded_by(&lock);
                struct task_struct *t __guarded_by(&lock);
                bool remote;
                unsigned int remote_size;
                int sequence;
            };
        """
        expected = """
            struct kcov {
                refcount_t refcount;
                spinlock_t lock;
                enum kcov_mode mode;
                unsigned int size;
                void *area;
                struct task_struct *t;
                bool remote;
                unsigned int remote_size;
                int sequence;
            };
        """

        result = self.apply_transforms("struct", line)
        self.assertLogicallyEqual(result, expected)

    def test_vars_stackdepot(self):
        """
        Test guarded_by on vars from lib/stackdepot.c.
        """
        line = """
            size_t pool_offset __guarded_by(&pool_lock) = DEPOT_POOL_SIZE;
            __guarded_by(&pool_lock) LIST_HEAD(free_stacks);
            void **stack_pools __pt_guarded_by(&pool_lock);
        """
        expected = """
            size_t pool_offset = DEPOT_POOL_SIZE;
            struct list_head free_stacks;
            void **stack_pools;
        """

        result = self.apply_transforms("var", line)
        self.assertLogicallyEqual(result, expected)

    def test_functions_with_acquires_and_releases(self):
        """
        Test removal of lock annotations (__acquires, __releases and
        their conditional variants) from function prototypes.
        """
        line = """
            bool prepare_report_consumer(unsigned long *flags,
                                         const struct access_info *ai,
                                         struct other_info *other_info) \
                                        __cond_acquires(true, &report_lock);

            int tcp_sigpool_start(unsigned int id, struct tcp_sigpool *c) \
                                  __cond_acquires(0, RCU_BH);

            bool undo_report_consumer(unsigned long *flags,
                                      const struct access_info *ai,
                                      struct other_info *other_info) \
                                     __cond_releases(true, &report_lock);

            void debugfs_enter_cancellation(struct file *file,
                                            struct debugfs_cancellation *c) \
                                           __acquires(cancellation);

            void debugfs_leave_cancellation(struct file *file,
                                            struct debugfs_cancellation *c) \
                                           __releases(cancellation);

            acpi_cpu_flags acpi_os_acquire_lock(acpi_spinlock lockp) \
                                               __acquires(lockp);

            void acpi_os_release_lock(acpi_spinlock lockp,
                                      acpi_cpu_flags not_used) \
                                     __releases(lockp)
        """
        expected = """
            bool prepare_report_consumer(unsigned long *flags,
                                         const struct access_info *ai,
                                         struct other_info *other_info);

            int tcp_sigpool_start(unsigned int id, struct tcp_sigpool *c);

            bool undo_report_consumer(unsigned long *flags,
                                      const struct access_info *ai,
                                      struct other_info *other_info);

            void debugfs_enter_cancellation(struct file *file,
                                            struct debugfs_cancellation *c);

            void debugfs_leave_cancellation(struct file *file,
                                            struct debugfs_cancellation *c);

            acpi_cpu_flags acpi_os_acquire_lock(acpi_spinlock lockp);

            void acpi_os_release_lock(acpi_spinlock lockp,
                                      acpi_cpu_flags not_used)
        """

        result = self.apply_transforms("func", line)
        self.assertLogicallyEqual(result, expected)

#
# Run all tests
#
if __name__ == "__main__":
    run_unittest(__file__)
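The placeholder rules exercised by TestSubSimple and by the struct_group_tagged transform (`\0` for the whole argument list, `\N` for the N-th argument, `\N+` for the N-th argument and everything after it) can be illustrated with a small stand-alone sketch. It does not import or reproduce `CMatch`: `split_args()` and `expand()` are hypothetical helpers, and the rule that a greedy `\N+` must be unique and the highest-numbered placeholder is only an assumption inferred from the two `test_rise_*` cases.

```python
import re

# Matches \0, \1, \2+ ... inside a replacement template
PLACEHOLDER_RE = re.compile(r"\\(\d+)(\+?)")


def split_args(text):
    """Split a macro argument string on top-level commas only."""
    args, current, depth = [], [], 0
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    args.append("".join(current).strip())
    return args


def expand(template, args):
    """Expand placeholders in template (hypothetical helper, not CMatch)."""
    refs = [(int(n), bool(plus)) for n, plus in PLACEHOLDER_RE.findall(template)]
    greedy = [n for n, plus in refs if plus]
    highest = max((n for n, _ in refs), default=0)

    # Assumption: a greedy reference consumes all remaining arguments, so it
    # must be unique and no higher-numbered placeholder may be present.
    if len(greedy) > 1 or (greedy and greedy[0] != highest):
        raise ValueError("greedy placeholder must be the last one")

    def repl(match):
        n, plus = int(match.group(1)), match.group(2)
        if n == 0:
            return ", ".join(args)          # \0: every argument
        if plus:
            return ", ".join(args[n - 1:])  # \N+: argument N and the rest
        return args[n - 1] if n <= len(args) else ""

    return PLACEHOLDER_RE.sub(repl, template)


if __name__ == "__main__":
    members = split_args("cxl_device_regs, device_regs, "
                         "void __iomem *status, *mbox, *memdev;")
    print(expand(r"struct \1 { \3+ } \2;", members))
```

Under these assumptions, the commas inside the member list of a struct_group_tagged() call are rejoined by `\3+`, which is the case the NOTE in test_raw_struct_group_tagged describes.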
+462
tools/unittests/test_tokenizer.py
#!/usr/bin/env python3

"""
Unit tests for the C tokenizer (CTokenizer) class.
"""


import os
import re
import unittest
import sys

from unittest.mock import MagicMock

SRC_DIR = os.path.dirname(os.path.realpath(__file__))
sys.path.insert(0, os.path.join(SRC_DIR, "../lib/python"))

from kdoc.c_lex import CToken, CTokenizer
from unittest_helper import run_unittest

#
# List of tests.
#
# The code will dynamically generate one test for each key on this dictionary.
#
def tokens_to_list(tokens):
    tuples = []

    for tok in tokens:
        if tok.kind == CToken.SPACE:
            continue

        tuples += [(tok.kind, tok.value, tok.level)]

    return tuples


def make_tokenizer_test(name, data):
    """
    Create a test named ``name`` using parameters given by ``data`` dict.
    """

    def test(self):
        """In-lined lambda-like function to run the test"""

        #
        # Check if logger is working
        #
        if "log_level" in data:
            with self.assertLogs('kdoc.c_lex', level='ERROR') as cm:
                tokenizer = CTokenizer(data["source"])

            return

        #
        # Check if tokenizer is producing expected results
        #
        tokens = CTokenizer(data["source"]).tokens

        result = tokens_to_list(tokens)
        expected = tokens_to_list(data["expected"])

        self.assertEqual(result, expected, msg=f"{name}")

    return test

#: Tokenizer tests.
TESTS_TOKENIZER = {
    "__run__": make_tokenizer_test,

    "basic_tokens": {
        "source": """
            int a; // comment
            float b = 1.23;
        """,
        "expected": [
            CToken(CToken.NAME, "int"),
            CToken(CToken.NAME, "a"),
            CToken(CToken.ENDSTMT, ";"),
            CToken(CToken.COMMENT, "// comment"),
            CToken(CToken.NAME, "float"),
            CToken(CToken.NAME, "b"),
            CToken(CToken.OP, "="),
            CToken(CToken.NUMBER, "1.23"),
            CToken(CToken.ENDSTMT, ";"),
        ],
    },

    "depth_counters": {
        "source": """
            struct X {
                int arr[10];
                func(a[0], (b + c));
            }
        """,
        "expected": [
            CToken(CToken.STRUCT, "struct"),
            CToken(CToken.NAME, "X"),
            CToken(CToken.BEGIN, "{", brace_level=1),

            CToken(CToken.NAME, "int", brace_level=1),
            CToken(CToken.NAME, "arr", brace_level=1),
            CToken(CToken.BEGIN, "[", brace_level=1, bracket_level=1),
            CToken(CToken.NUMBER, "10", brace_level=1, bracket_level=1),
            CToken(CToken.END, "]", brace_level=1),
            CToken(CToken.ENDSTMT, ";", brace_level=1),
            CToken(CToken.NAME, "func", brace_level=1),
            CToken(CToken.BEGIN, "(", brace_level=1, paren_level=1),
            CToken(CToken.NAME, "a", brace_level=1, paren_level=1),
            CToken(CToken.BEGIN, "[", brace_level=1, paren_level=1, bracket_level=1),
            CToken(CToken.NUMBER, "0", brace_level=1, paren_level=1, bracket_level=1),
            CToken(CToken.END, "]", brace_level=1, paren_level=1),
            CToken(CToken.PUNC, ",", brace_level=1, paren_level=1),
            CToken(CToken.BEGIN, "(", brace_level=1, paren_level=2),
            CToken(CToken.NAME, "b", brace_level=1, paren_level=2),
            CToken(CToken.OP, "+", brace_level=1, paren_level=2),
            CToken(CToken.NAME, "c", brace_level=1, paren_level=2),
            CToken(CToken.END, ")", brace_level=1, paren_level=1),
            CToken(CToken.END, ")", brace_level=1),
            CToken(CToken.ENDSTMT, ";", brace_level=1),
            CToken(CToken.END, "}"),
        ],
    },

    "mismatch_error": {
        "source": "int a$ = 5;",  # $ is illegal
        "log_level": "ERROR",
    },
}

def make_private_test(name, data):
    """
    Create a test named ``name`` using parameters given by ``data`` dict.
    """

    def test(self):
        """In-lined lambda-like function to run the test"""
        tokens = CTokenizer(data["source"])
        result = str(tokens)

        #
        # Avoid whitespace false positives. A plain ``\s+`` is used instead
        # of the possessive ``\s++``, which needs Python 3.11 or newer.
        #
        result = re.sub(r"\s+", " ", result).strip()
        expected = re.sub(r"\s+", " ", data["trimmed"]).strip()

        msg = f"failed when parsing this source:\n{data['source']}"
        self.assertEqual(result, expected, msg=msg)

    return test

#: Tests to check if CTokenizer is handling properly public/private comments.
TESTS_PRIVATE = {
    #
    # Simplest case: no private. Ensure that trimming won't affect struct
    #
    "__run__": make_private_test,
    "no private": {
        "source": """
            struct foo {
                int a;
                int b;
                int c;
            };
        """,
        "trimmed": """
            struct foo {
                int a;
                int b;
                int c;
            };
        """,
    },

    #
    # Play "by the books" by always having a public in place
    #

    "balanced_private": {
        "source": """
            struct foo {
                int a;
                /* private: */
                int b;
                /* public: */
                int c;
            };
        """,
        "trimmed": """
            struct foo {
                int a;
                int c;
            };
        """,
    },

    "balanced_non_greddy_private": {
        "source": """
            struct foo {
                int a;
                /* private: */
                int b;
                /* public: */
                int c;
                /* private: */
                int d;
                /* public: */
                int e;

            };
        """,
        "trimmed": """
            struct foo {
                int a;
                int c;
                int e;
            };
        """,
    },

    "balanced_inner_private": {
        "source":
        """
            struct foo {
                struct {
                    int a;
                    /* private: ignore below */
                    int b;
                    /* public: but this should not be ignored */
                };
                int b;
            };
        """,
        "trimmed": """
            struct foo {
                struct {
                    int a;
                };
                int b;
            };
        """,
    },

    #
    # Test what happens if there's no public after private place
    #

    "unbalanced_private": {
        "source": """
            struct foo {
                int a;
                /* private: */
                int b;
                int c;
            };
        """,
        "trimmed": """
            struct foo {
                int a;
            };
        """,
    },

    "unbalanced_inner_private": {
        "source": """
            struct foo {
                struct {
                    int a;
                    /* private: ignore below */
                    int b;
                    /* but this should not be ignored */
                };
                int b;
            };
        """,
        "trimmed": """
            struct foo {
                struct {
                    int a;
                };
                int b;
            };
        """,
    },

    "unbalanced_struct_group_tagged_with_private": {
        "source": """
            struct page_pool_params {
                struct_group_tagged(page_pool_params_fast, fast,
                    unsigned int order;
                    unsigned int pool_size;
                    int nid;
                    struct device *dev;
                    struct napi_struct *napi;
                    enum dma_data_direction dma_dir;
                    unsigned int max_len;
                    unsigned int offset;
                };
                struct_group_tagged(page_pool_params_slow, slow,
                    struct net_device *netdev;
                    unsigned int queue_idx;
                    unsigned int flags;
                    /* private: used by test code only */
                    void (*init_callback)(netmem_ref netmem, void *arg);
                    void *init_arg;
                };
            };
        """,
        "trimmed": """
            struct page_pool_params {
                struct_group_tagged(page_pool_params_fast, fast,
                    unsigned int order;
                    unsigned int pool_size;
                    int nid;
                    struct device *dev;
                    struct napi_struct *napi;
                    enum dma_data_direction dma_dir;
                    unsigned int max_len;
                    unsigned int offset;
                };
                struct_group_tagged(page_pool_params_slow, slow,
                    struct net_device *netdev;
                    unsigned int queue_idx;
                    unsigned int flags;
                };
            };
        """,
    },

    "unbalanced_two_struct_group_tagged_first_with_private": {
        "source": """
            struct page_pool_params {
                struct_group_tagged(page_pool_params_slow, slow,
                    struct net_device *netdev;
                    unsigned int queue_idx;
                    unsigned int flags;
                    /* private: used by test code only */
                    void (*init_callback)(netmem_ref netmem, void *arg);
                    void *init_arg;
                };
                struct_group_tagged(page_pool_params_fast, fast,
                    unsigned int order;
                    unsigned int pool_size;
                    int nid;
                    struct device *dev;
                    struct napi_struct *napi;
                    enum dma_data_direction dma_dir;
                    unsigned int max_len;
                    unsigned int offset;
                };
            };
        """,
        "trimmed": """
            struct page_pool_params {
                struct_group_tagged(page_pool_params_slow, slow,
                    struct net_device *netdev;
                    unsigned int queue_idx;
                    unsigned int flags;
                };
                struct_group_tagged(page_pool_params_fast, fast,
                    unsigned int order;
                    unsigned int pool_size;
                    int nid;
                    struct device *dev;
                    struct napi_struct *napi;
                    enum dma_data_direction dma_dir;
                    unsigned int max_len;
                    unsigned int offset;
                };
            };
        """,
    },
    "unbalanced_without_end_of_line": {
        "source": """ \
            struct page_pool_params { \
                struct_group_tagged(page_pool_params_slow, slow, \
                    struct net_device *netdev; \
                    unsigned int queue_idx; \
                    unsigned int flags;
                    /* private: used by test code only */
                    void (*init_callback)(netmem_ref netmem, void *arg); \
                    void *init_arg; \
                }; \
                struct_group_tagged(page_pool_params_fast, fast, \
                    unsigned int order; \
                    unsigned int pool_size; \
                    int nid; \
                    struct device *dev; \
                    struct napi_struct *napi; \
                    enum dma_data_direction dma_dir; \
                    unsigned int max_len; \
                    unsigned int offset; \
                }; \
            };
        """,
        "trimmed": """
            struct page_pool_params {
                struct_group_tagged(page_pool_params_slow, slow,
                    struct net_device *netdev;
                    unsigned int queue_idx;
                    unsigned int flags;
                };
                struct_group_tagged(page_pool_params_fast, fast,
                    unsigned int order;
                    unsigned int pool_size;
                    int nid;
                    struct device *dev;
                    struct napi_struct *napi;
                    enum dma_data_direction dma_dir;
                    unsigned int max_len;
                    unsigned int offset;
                };
            };
        """,
    },
}

#: Dict containing all test groups for CTokenizer
TESTS = {
    "TestPublicPrivate": TESTS_PRIVATE,
    "TestTokenizer": TESTS_TOKENIZER,
}

def setUp(self):
    self.maxDiff = None

def build_test_class(group_name, table):
    """
    Dynamically creates a class instance using type() as a generator
    for a new class derived from unittest.TestCase.

    We're opting to do it inside a function to avoid the risk of
    changing the globals() dictionary.
    """

    class_dict = {
        "setUp": setUp
    }

    run = table["__run__"]

    for test_name, data in table.items():
        if test_name == "__run__":
            continue

        class_dict[f"test_{test_name}"] = run(test_name, data)

    cls = type(group_name, (unittest.TestCase,), class_dict)

    return cls.__name__, cls

#
# Create classes and add them to the global dictionary
#
for group, table in TESTS.items():
    t = build_test_class(group, table)
    globals()[t[0]] = t[1]

#
# main
#
if __name__ == "__main__":
    run_unittest(__file__)
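The TESTS_PRIVATE table encodes a scoping rule: a `/* private: */` comment hides members until either a `/* public: */` comment or the closing brace of the block in which the marker appeared, and comments themselves never reach the output. A minimal stand-alone sketch of that rule is shown here. It is not the `CTokenizer` implementation: `trim_private()` and `TOKEN_RE` are hypothetical names, and the mini-lexer only distinguishes comments, braces and plain text.

```python
import re

# Hypothetical mini-lexer: block comment, line comment, brace, other text
TOKEN_RE = re.compile(r"/\*.*?\*/|//[^\n]*|[{}]|[^{}/]+|/", re.DOTALL)


def trim_private(source):
    """Drop comments and any members hidden by a private: marker."""
    out = []
    depth = 0
    private_depth = None  # brace depth at which private: was seen

    for tok in TOKEN_RE.findall(source):
        if tok.startswith(("/*", "//")):
            body = tok.lstrip("/*").strip()
            if body.startswith("private:"):
                private_depth = depth
            elif body.startswith("public:"):
                private_depth = None
            continue  # comments are never emitted

        if tok == "{":
            depth += 1
        elif tok == "}":
            depth -= 1
            # Leaving the block that held private: ends the hidden region
            if private_depth is not None and depth < private_depth:
                private_depth = None

        if private_depth is None:
            out.append(tok)

    # Collapse whitespace, as make_private_test() does before comparing
    return re.sub(r"\s+", " ", "".join(out)).strip()


if __name__ == "__main__":
    src = """
    struct foo {
        struct {
            int a;
            /* private: ignore below */
            int b;
            /* but this should not be ignored */
        };
        int b;
    };
    """
    print(trim_private(src))
```

Tracking the brace depth at which the marker was seen is what lets the "unbalanced" cases recover: the outer `int b;` survives even though no `/* public: */` comment follows the private marker.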