docs: kdoc_parser: avoid tokenizing structs every time

Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git


Most of the rules inside CTransforms are of type CMatch, so don't
re-parse the source code for every rule: tokenize it once and only
convert it back to a string when a KernRe rule needs one.

Doing this doesn't change the output, but makes kdoc almost
as fast as before the tokenizer patches:

# Before tokenizer patches
$ time ./scripts/kernel-doc . -man >original 2>&1

real 0m42.933s
user 0m36.523s
sys 0m1.145s

# After tokenizer patches
$ time ./scripts/kernel-doc . -man >before 2>&1

real 1m29.853s
user 1m23.974s
sys 0m1.237s

# After this patch
$ time ./scripts/kernel-doc . -man >after 2>&1

real 0m48.579s
user 0m45.938s
sys 0m0.988s
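To put "almost as fast" into numbers, this small sketch converts the `real` times above into seconds and compares them. It uses only the figures quoted in this message; the helper name to_seconds is illustrative.

```python
import re


def to_seconds(t):
    """Convert a bash `time` value such as '1m29.853s' into seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    if m is None:
        raise ValueError(f"unexpected time format: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# "real" times reported above.
original = to_seconds("0m42.933s")  # before the tokenizer patches
before = to_seconds("1m29.853s")    # after the tokenizer patches (file "before")
after = to_seconds("0m48.579s")     # after this patch (file "after")

print(f"tokenizer slowdown: {before / original:.2f}x")
print(f"speedup from this patch: {before / after:.2f}x")
print(f"remaining overhead vs. original: {(after / original - 1) * 100:.0f}%")
```

With these figures it reports a 2.09x slowdown from the tokenizer patches, a 1.85x speedup from this patch, and about 13% remaining overhead compared with the original run.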

$ diff -s before after
Files before and after are identical

Manually checked the differences between original and after
with:

$ diff -U0 -prBw original after|grep -v Warning|grep -v "@@"|less

They're due to:
- whitespace fixes;
- struct_group is now handled better;
- several man pages that were badly generated from broken inline
  kernel-doc markups are now fixed.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Message-ID: <1cc2a4286ebf7d4b2d03fcaf42a1ba9fa09004b9.1773770483.git.mchehab+huawei@kernel.org>

Authored by Mauro Carvalho Chehab and committed by Jonathan Corbet 2 months ago: 79d881be 12aa7753

2 changed files, +24 -7


tools/lib/python/kdoc/kdoc_parser.py (-1)

@@ -737,7 +737,6 @@
         #
         # Go through the list of members applying all of our transformations.
         #
-        members = trim_private_members(members)
         members = self.xforms.apply("struct", members)
 
         #

tools/lib/python/kdoc/xforms_lists.py (+24 -6)

@@ -5,7 +5,7 @@
 import re
 
 from kdoc.kdoc_re import KernRe
-from kdoc.c_lex import CMatch
+from kdoc.c_lex import CMatch, CTokenizer
 
 struct_args_pattern = r'([^,)]+)'
 
@@ -15,6 +15,12 @@
     structure member prefixes, and macro invocations and variables
     into something we can parse and generate kdoc for.
     """
+
+    #
+    # NOTE:
+    # Due to performance reasons, place CMatch rules before KernRe,
+    # as this avoids running the C parser every time.
+    #
 
     #: Transforms for structs and unions.
     struct_xforms = [
@@ -130,13 +124,25 @@
         "var": var_xforms,
     }
 
-    def apply(self, xforms_type, text):
+    def apply(self, xforms_type, source):
         """
-        Apply a set of transforms to a block of text.
+        Apply a set of transforms to a block of source.
+
+        As tokenizer is used here, this function also remove comments
+        at the end.
         """
         if xforms_type not in self.xforms:
-            return text
+            return source
+
+        if isinstance(source, str):
+            source = CTokenizer(source)
 
         for search, subst in self.xforms[xforms_type]:
-            text = search.sub(subst, text)
-        return text
+            #
+            # KernRe only accept strings.
+            #
+            if isinstance(search, KernRe):
+                source = str(source)
+
+            source = search.sub(subst, source)
+        return str(source)
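The patched apply() keeps the tokenized source across consecutive CMatch rules and only turns it into a string when a KernRe rule needs one, which is why the NOTE asks for CMatch rules to come first. The sketch below illustrates that effect under one stated assumption: that a CMatch rule handed a plain string has to tokenize it again (the c_lex module is not shown here). Tokens, TokenRule, RegexRule and apply_rules are hypothetical stand-ins for CTokenizer, CMatch, KernRe and CTransforms.apply, not the kdoc API.

```python
import re


class Tokens:
    """Hypothetical stand-in for CTokenizer: splits C source into tokens."""

    count = 0  # how many times source text has been tokenized

    def __init__(self, text):
        Tokens.count += 1
        self.toks = re.findall(r"\w+|\S", text)

    def __str__(self):
        return " ".join(self.toks)


class TokenRule:
    """Hypothetical stand-in for a CMatch rule: replaces one identifier token."""

    def __init__(self, name):
        self.name = name

    def sub(self, subst, source):
        # Assumption: a plain string must be tokenized again before matching.
        if isinstance(source, str):
            source = Tokens(source)
        toks = [subst if t == self.name else t for t in source.toks]
        source.toks = [t for t in toks if t]  # an empty substitution deletes the token
        return source


class RegexRule:
    """Hypothetical stand-in for a KernRe rule: works on strings only."""

    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def sub(self, subst, source):
        return self.regex.sub(subst, source)


def apply_rules(rules, source):
    """Mirrors the patched apply(): tokenize once, stringify only for regex rules."""
    if isinstance(source, str):
        source = Tokens(source)
    for search, subst in rules:
        if isinstance(search, RegexRule):
            source = str(source)
        source = search.sub(subst, source)
    return str(source)


src = "__aligned int foo; __packed char bar;"
ordered = [(TokenRule("__aligned"), ""), (TokenRule("__packed"), ""),
           (RegexRule(r"\bint\b"), "long")]
interleaved = [(TokenRule("__aligned"), ""), (RegexRule(r"\bint\b"), "long"),
               (TokenRule("__packed"), "")]

for label, rules in (("ordered", ordered), ("interleaved", interleaved)):
    Tokens.count = 0
    out = apply_rules(rules, src)
    print(f"{label}: {out!r}, tokenized {Tokens.count} time(s)")
```

Both orderings produce the same text, but the interleaved list tokenizes twice because the token rule after the regex rule receives a string again. With many rules, each CMatch rule placed after a KernRe rule pays that cost.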
