code complexity & repetition analysis tool

initial commit

Owais Jamil c599846d

+306
+21
.gitignore
+ # Generated by Cargo
+ # will have compiled files and executables
+ debug
+ target
+
+ # These are backup files generated by rustfmt
+ **/*.rs.bk
+
+ # MSVC Windows builds of rustc generate these, which store debugging information
+ *.pdb
+
+ # Generated by cargo mutants
+ # Contains mutation testing data
+ **/mutants.out*/
+
+ # RustRover
+ # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+ # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+ # and can be added to the global gitignore or merged into this file. For a more nuclear
+ # option (not recommended) you can uncomment the following to ignore the entire idea folder.
+ #.idea/
+11
Cargo.lock
+ # This file is automatically @generated by Cargo.
+ # It is not intended for manual editing.
+ version = 4
+
+ [[package]]
+ name = "mccabre"
+ version = "0.1.0"
+
+ [[package]]
+ name = "mccabre-core"
+ version = "0.1.0"
+3
Cargo.toml
+ [workspace]
+ resolver = "2"
+ members = ["crates/cli", "crates/core"]
+231
README
+ ===============================================================================
+ Code Complexity & Clone Detection Tool
+ ===============================================================================
+
+ An extensible CLI for measuring code complexity and detecting repeated code
+ fragments. Designed to be language-agnostic with low-overhead.
+
+ This document outlines:
+ - Goals and metrics
+ - Algorithms used
+ - Project structure
+ - Beyond-MVP roadmap
+ - Outputs
+ - Philosophy
+
+ ===============================================================================
+ GOALS
+
+ - Provide actionable complexity and duplication insights without heavy parsing.
+ - Support any programming language
+ - Keep runtime fast: linear or near-linear where possible.
+ - Keep architecture modular so deeper analysis can be plugged in later.
+ - Produce both human-readable and machine-readable reports.
+
+ ===============================================================================
+ SCOPE
+
+ The MVP focuses on high-signal, low-complexity algorithms that require only
+ tokenization—not full AST parsing. This keeps the implementation small and
+ the performance excellent.
+
+ ===============================================================================
+ Algorithms
+ -------------------------------------------------------------------------------
+ 1. Cyclomatic Complexity (McCabe)
+    - Measures independent control-flow paths in a function.
+    - Works by building a simple CFG (control-flow graph) from tokens or indentation.
+    - Formula:
+        CC = E - N + 2P
+      where:
+        E = number of edges
+        N = number of nodes
+        P = number of connected components (usually 1)
+
+    Purpose:
+    - Flags overly complex functions.
+    - Well understood by developers.
+ -------------------------------------------------------------------------------
+ 2. Token-based Clone Detection (Rabin-Karp Rolling Hash)
+    - Detects repeated token sequences across files.
+    - Uses a rolling hash window (e.g., 20–50 tokens).
+    - Language-agnostic; extremely fast.
+
+    Purpose:
+    - Quickly identifies copy/paste blocks and boilerplate.
+ -------------------------------------------------------------------------------
+ 3. Lines of Code (LOC)
+    - Counts executable or logical lines.
+    - Required for future Maintainability Index calculations.
+
+    Purpose:
+    - Baseline size metric.
+ -------------------------------------------------------------------------------
+ 4. Halstead Complexity Metrics
+    - Based on counts of operators and operands.
+    - Computes vocabulary, volume, difficulty, and effort.
+
+    Purpose:
+    - Complements Cyclomatic Complexity with lexical complexity.
+ ===============================================================================
+ FUTURE
+ -------------------------------------------------------------------------------
+ A. AST-Based Clone Detection
+    - Compare abstract syntax trees or subtrees.
+    - Identifies clones resilient to renamed variables or formatting changes.
+    - Requires per-language AST adapters.
+
+ B. Cognitive Complexity
+    - Scores human comprehension cost.
+    - Rewards flattened control flow, penalizes deep nesting.
+    - Requires AST-level traversal.
+
+ C. Maintainability Index
+    - Combines Cyclomatic, Halstead, and LOC into a single number.
+    - Good for dashboards and longitudinal tracking.
+
+ D. Semantic Clone Detection
+    - Goes beyond syntax: identifies logically equivalent code.
+    - Requires control/data-flow analysis.
+
+ E. Dependency Metrics
+    - Coupling, fan-in/fan-out, depth of inheritance.
+    - Requires language-specific type-resolution or symbol graph extraction.
+
+ F. Hotspot Analysis
+    - Combine Git history + complexity metrics.
+    - Identify files that change often AND are complex.
+
+ G. Incremental Mode
+    - Cache hashes/graphs.
+    - Analyze only changed files and touched boundaries.
+
+ H. Rich Reports
+    - HTML dashboards
+    - SVG graphs (CFG visualization)
+    - JSON with stable schema for CI systems
+
+ ===============================================================================
+ CRATES
+
+ --------------------------------------------------------------------------------
+ core
+   tokenizer
+     - minimal language-agnostic tokenizer
+     - operator/operand extractor (Halstead)
+     - whitespace, comments, string handling
+   complexity
+     cyclomatic
+       - CFG builder
+       - node/edge counter
+     halstead
+       - operator+operand tables
+     loc
+       - physical and logical LOC
+   cloner
+     - rolling hash (Rabin-Karp)
+     - w-shingling and fingerprint index
+     - min token length filter
+   reporter
+     - JSON output
+     - plaintext/ANSI output
+     - sorting, filtering, severity thresholds
+   loader
+     - config loading
+     - file walker + globbing
+     - ignore rules
+
+ --------------------------------------------------------------------------------
+ cli
+   - commands: analyze, clones, complexity, dump-config
+   - flags: --json, --threshold, --min-tokens, --sort, --path
+
+ ===============================================================================
+ OUTPUT FORMATS
+ --------------------------------------------------------------------------------
+ Plaintext:
+
+ FILE: src/server/routes.rs
+   Cyclomatic: 14 (warning)
+   Halstead Volume: 312.9
+   LOC: 128
+
+ FILE: src/server/auth.rs
+   Cyclomatic: 5
+   Halstead Volume: 102.3
+   LOC: 41
+
+ CLONES:
+   - HashMatch #7
+     - src/server/routes.rs:41-78
+     - src/server/router.rs:12-49
+     Length: 32 tokens
+
+ --------------------------------------------------------------------------------
+ JSON:
+
+ {
+   "files": [
+     {
+       "path": "src/server/routes.rs",
+       "cyclomatic": 14,
+       "halstead": { "volume": 312.9, "difficulty": 12.8 },
+       "loc": 128
+     }
+   ],
+   "clones": [
+     {
+       "id": 7,
+       "length": 32,
+       "locations": [
+         { "file": "src/server/routes.rs", "start": 41, "end": 78 },
+         { "file": "src/server/router.rs", "start": 12, "end": 49 }
+       ]
+     }
+   ]
+ }
+
+ ===============================================================================
+ DESIGN PRINCIPLES
+
+ 1. Zero language assumptions in the MVP.
+ 2. Pluggable architecture (token → AST → CFG → semantic).
+ 3. High performance by default.
+ 4. No global state; everything streamed and incremental.
+ 5. Developer-friendly reporting and clear severity levels.
+ 6. Configurable thresholds
+
+ ===============================================================================
+ SUMMARY
+
+ MVP:
+ - LOC
+ - Cyclomatic Complexity
+ - Rabin-Karp Clone Detection
+ - Halstead Complexity
+
+ Beyond MVP:
+ - AST-based clones
+ - Cognitive Complexity
+ - Maintainability Index
+ - Coupling metrics
+ - Call graphs and data-flow graphs
+ - Hotspot analysis
+ - Incremental scanning
+
+ ===============================================================================
+ REFERENCES
+ Cyclomatic Complexity (McCabe, 1976):
+ https://www.literateprogramming.com/mccabe.pdf
+ https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication500-235.pdf
+
+ AST clone detection:
+ https://www.cs.ucdavis.edu/~su/publications/icse07.pdf
+
+ LSH-based clone detection:
+ https://journals.riverpublishers.com/index.php/JCSANDM/article/download/18799/18101/68821
+ https://www.sei.cmu.edu/library/code-similarity-detection-using-syntax-agnostic-locality-sensitive-hashing/
+
+ Halstead:
+ https://www.researchgate.net/publication/317317114_Software_Complexity_Analysis_Using_Halstead_Metrics
+ https://nvlpubs.nist.gov/nistpubs/TechnicalNotes/NIST.TN.1990.pdf
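The Rabin-Karp step described in the README can be illustrated with a short Rust sketch. It is not code from this commit, whose crates still contain only template code. The names `token_hash`, `find_clones`, and `BASE`, the use of pre-split string tokens, and the choice of wrapping 64-bit arithmetic as the hash modulus are all assumptions made for illustration. Each window hash is updated in constant time, and candidate matches are compared token by token so that hash collisions are not reported as clones.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Multiplier for the polynomial hash (an arbitrary odd constant, assumed here).
const BASE: u64 = 1_000_003;

/// Hypothetical helper: map one token to a 64-bit value.
fn token_hash(token: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    token.hash(&mut hasher);
    hasher.finish()
}

/// Returns groups of start offsets whose `window`-token sequences are identical.
/// Only groups with at least two members are returned, sorted by offset.
fn find_clones(tokens: &[&str], window: usize) -> Vec<Vec<usize>> {
    if window == 0 || tokens.len() < window {
        return Vec::new();
    }
    let hashes: Vec<u64> = tokens.iter().map(|t| token_hash(t)).collect();

    // BASE^(window - 1): the weight of the token that leaves the window.
    let mut high = 1u64;
    for _ in 1..window {
        high = high.wrapping_mul(BASE);
    }

    // Hash of the first window, computed directly.
    let mut rolling = 0u64;
    for &h in &hashes[..window] {
        rolling = rolling.wrapping_mul(BASE).wrapping_add(h);
    }

    // Fingerprint index: window hash -> start offsets with that hash.
    let mut index: HashMap<u64, Vec<usize>> = HashMap::new();
    index.entry(rolling).or_default().push(0);

    // Slide the window: drop the outgoing token, shift, add the incoming one.
    for start in 1..=hashes.len() - window {
        let outgoing = hashes[start - 1];
        let incoming = hashes[start + window - 1];
        rolling = rolling
            .wrapping_sub(outgoing.wrapping_mul(high))
            .wrapping_mul(BASE)
            .wrapping_add(incoming);
        index.entry(rolling).or_default().push(start);
    }

    // Verify candidates by comparing the actual tokens, which rules out collisions.
    let mut groups = Vec::new();
    for starts in index.into_values() {
        let mut by_content: HashMap<&[&str], Vec<usize>> = HashMap::new();
        for s in starts {
            by_content.entry(&tokens[s..s + window]).or_default().push(s);
        }
        groups.extend(by_content.into_values().filter(|g| g.len() > 1));
    }
    groups.sort();
    groups
}
```

For example, calling `find_clones` on the tokens of `let x = 1 ; foo ( ) ; let x = 1 ;` with a window of 5 reports the two `let x = 1 ;` runs at token offsets 0 and 9. A real tokenizer would also need to map token offsets back to file and line ranges, as in the README's output examples.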
+6
crates/cli/Cargo.toml
+ [package]
+ name = "mccabre"
+ version = "0.1.0"
+ edition = "2024"
+
+ [dependencies]
+3
crates/cli/src/main.rs
+ fn main() {
+     println!("Hello, world!");
+ }
+6
crates/core/Cargo.toml
+ [package]
+ name = "mccabre-core"
+ version = "0.1.0"
+ edition = "2024"
+
+ [dependencies]
+14
crates/core/src/lib.rs
+ pub fn add(left: u64, right: u64) -> u64 {
+     left + right
+ }
+
+ #[cfg(test)]
+ mod tests {
+     use super::*;
+
+     #[test]
+     fn it_works() {
+         let result = add(2, 2);
+         assert_eq!(result, 4);
+     }
+ }
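The `add` function above is the `cargo new` template, so the formula from the README, CC = E - N + 2P, has no counterpart in this commit yet. A minimal sketch of that formula follows, assuming a hypothetical `Cfg` type whose nodes are numbered `0..node_count` and whose edges are directed pairs. Neither the type nor its methods come from the repository. Connected components are counted with a small union-find over the edges, treated as undirected.

```rust
/// Hypothetical control-flow graph: nodes are `0..node_count`,
/// edges are directed `(from, to)` pairs. Not part of the repository.
struct Cfg {
    node_count: usize,
    edges: Vec<(usize, usize)>,
}

impl Cfg {
    /// P: number of connected components, ignoring edge direction.
    fn components(&self) -> usize {
        // Union-find with path halving.
        fn find(parent: &mut [usize], mut x: usize) -> usize {
            while parent[x] != x {
                let grandparent = parent[parent[x]];
                parent[x] = grandparent;
                x = grandparent;
            }
            x
        }

        let mut parent: Vec<usize> = (0..self.node_count).collect();
        let mut count = self.node_count;
        for &(a, b) in &self.edges {
            let root_a = find(&mut parent, a);
            let root_b = find(&mut parent, b);
            if root_a != root_b {
                // Merging two components reduces the count by one.
                parent[root_a] = root_b;
                count -= 1;
            }
        }
        count
    }

    /// CC = E - N + 2P, reordered as E + 2P - N so the unsigned
    /// arithmetic never goes below zero.
    fn cyclomatic(&self) -> usize {
        self.edges.len() + 2 * self.components() - self.node_count
    }
}
```

An `if`/`else` modeled as four nodes (condition, two branches, join) and four edges gives 4 - 4 + 2 = 2, matching the usual rule that each decision point adds one to the complexity.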
+1
docs/.gitignore
+ book
+6
docs/book.toml
+ [book]
+ authors = ["Owais Jamil"]
+ language = "en"
+ multilingual = false
+ src = "src"
+ title = "Mccabre Code Analyzer"
+3
docs/src/SUMMARY.md
+ # Summary
+
+ - [Chapter 1](./chapter_1.md)
+1
docs/src/chapter_1.md
+ # Chapter 1