@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge

Remarkup Code-block: parse language specifier in markdown

Summary:
We add support for code blocks whose language is specified in the style of the
"flavored Markdown" used by GitLab, GitHub, Stack Overflow and others.

So we now support this syntax (to avoid confusion, see it online in the Diff):

lang=text
```php
$asd = 1;
```

Before this change, this was the only supported syntax in Remarkup, with an explicit "lang=":

lang=text
```lang=php
$asd = 1;
```

This change introduces a minor risk of eating legitimate Remarkup content, since Remarkup allows
a multi-line block to be written this way:

lang=text
```$asd = 1;
$asd = 2;```

The above example still works, but hardcore Remarkup users may run into a problem
when they use a code block to list programming languages.

In short, this can be problematic since "cpp" will be eaten from this list:

COUNTEREXAMPLE
```cpp
php
python
```

Relying on the above example is not socially nice anyway, because it does not work in GitLab, GitHub or Stack Overflow.

If your first line is eaten:

Just *add* a newline at the top to get a valid raw Markdown list (suggested, valid in Remarkup + Markdown):

lang=text
```
cpp
php
python
```

Or, just add "text" to specify that as language (suggested, valid in Remarkup + Markdown):

lang=text
```text
cpp
php
python
```

Or, just *remove* a newline from the bottom to get a valid raw Remarkup list (Remarkup-only):

lang=text
```cpp
php
python```

Or, just specify that you are writing in the language "text" (Remarkup-only):

lang=text
```lang=text
cpp
php
python```

To reduce impact and help you, the logic of this strict implementation is:

- must have backticks
- must not have any valid remarkup option, like lang=, counterexample, etc.
- must not have content on the same line as the closing backticks
- must have a known language in our proposed subset

If everything is OK, we remove that language from the content, since it would otherwise be displayed.
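
The checks above can be sketched as a stand-alone function. This is only an illustration under simplifying assumptions, not the code from this change: `detect_flavored_language()` and `$known` are hypothetical names, and the option check is reduced to "contains `=` or is the bare word counterexample", whereas the real rule relies on the existing option parser. The fence is built with `str_repeat()` purely so the sample contains no literal run of three backticks.

```php
<?php
// Illustrative sketch only; detect_flavored_language() and $known are
// hypothetical names, not part of this change.

function detect_flavored_language(string $text, array $known): ?string {
  // Three backticks, built indirectly to keep this sample easy to paste.
  $fence = str_repeat('`', 3);

  // Must start with backticks; capture the rest of the first line.
  $matches = null;
  if (!preg_match('/^\s*'.$fence.'(.*)/', $text, $matches)) {
    return null;
  }
  $header = $matches[1];

  // Must not be a Remarkup option. Simplified assumption: anything with
  // "=" or the bare word "counterexample" counts as an option.
  if (strpos($header, '=') !== false ||
      strtolower(trim($header)) === 'counterexample') {
    return null;
  }

  // Must not have content on the same line as the closing backticks.
  $body = preg_replace('/'.$fence.'\s*$/', '', $text);
  $lines = explode("\n", $body);
  if (end($lines) !== '') {
    return null;
  }

  // Need the header line plus at least one line of content.
  while ($lines && end($lines) === '') {
    array_pop($lines);
  }
  if (count($lines) < 2) {
    return null;
  }

  // Must be a known language (languages are stored as array keys).
  return isset($known[$header]) ? $header : null;
}

$known = array('cpp' => 1, 'php' => 1, 'python' => 1);
$fence = str_repeat('`', 3);

// Known language, closing fence on its own line.
var_dump(detect_flavored_language($fence."php\n\$asd = 1;\n".$fence, $known));
// Explicit Remarkup option: left to the normal option handling.
var_dump(detect_flavored_language($fence."lang=php\n\$asd = 1;\n".$fence, $known));
// Content shares a line with the closing fence: kept as plain content.
var_dump(detect_flavored_language($fence."cpp\nphp\npython".$fence, $known));
```

With this sketch the first call yields "php" and the other two yield NULL, which matches the intent that only the strict flavored form is treated as a language header.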

Interestingly, this could improve performance when rendering README files or snippets from external
websites, since in that case we do not need to guess the language using our deep dark magic.

Closes T15481

Test Plan:
We added some nice unit tests. Ensure that this test passes:

PhutilRemarkupEngineTestCase::testEngine

Optionally, look at these, before and after:

https://we.phorge.it/P16

Change the test plan slightly every time, to make sure it is not in your cache.

Reviewers: O1 Blessed Committers, avivey

Reviewed By: O1 Blessed Committers, avivey

Subscribers: avivey, speck, tobiaswiese, Matthew, Cigaryno

Maniphest Tasks: T15481

Differential Revision: https://we.phorge.it/D25299

+154 -4
+98 -4
src/infrastructure/markup/blockrule/PhutilRemarkupCodeBlockRule.php
···
   }
 
   public function markupText($text, $children) {
-    if (preg_match('/^\s*```/', $text)) {
+    // Header/footer eventually useful to be nice with "flavored markdown".
+    // When it starts with ```stuff    the header is 'stuff' (->language)
+    // When it ends with      stuff``` the footer is 'stuff' (->garbage)
+    $header_line = null;
+    $footer_line = null;
+
+    $matches = null;
+    if (preg_match('/^\s*```(.*)/', $text, $matches)) {
+      if (isset($matches[1])) {
+        $header_line = $matches[1];
+      }
+
       // If this is a ```-style block, trim off the backticks and any leading
       // blank line.
       $text = preg_replace('/^\s*```(\s*\n)?/', '', $text);
···
     }
 
     $lines = explode("\n", $text);
+
+    // If we have a flavored header, it has sense to look for the footer.
+    if ($header_line !== null && $lines) {
+      $footer_line = $lines[last_key($lines)];
+    }
+
+    // Strip final empty lines
     while ($lines && !strlen(last($lines))) {
       unset($lines[last_key($lines)]);
     }
···
 
     $parser = new PhutilSimpleOptions();
     $custom = $parser->parse(head($lines));
+    $valid_options = null;
     if ($custom) {
-      $valid = true;
+      $valid_options = true;
       foreach ($custom as $key => $value) {
         if (!array_key_exists($key, $options)) {
-          $valid = false;
+          $valid_options = false;
           break;
         }
       }
-      if ($valid) {
+      if ($valid_options) {
         array_shift($lines);
         $options = $custom + $options;
+      }
+    }
+
+    // Parse flavored markdown strictly to don't eat legitimate Remarkup.
+    // Proceed only if we tried to parse options and we failed
+    // (no options also mean no language).
+    // For example this is not a valid option: ```php
+    // Proceed only if the footer exists and it is not: blabla```
+    // Accept only 2 lines or more. First line: header; then content.
+    if (
+      $valid_options === false &&
+      $header_line !== null &&
+      $footer_line === '' &&
+      count($lines) > 1
+    ) {
+      if (self::isKnownLanguageCode($header_line)) {
+        array_shift($lines);
+        $options['lang'] = $header_line;
       }
     }
 
···
       PhutilSafeHTML::applyFunction(
         'rtrim',
         $engine->highlightSource($options['lang'], $text)));
+  }
+
+  /**
+   * Check if a language code can be used in a generic flavored markdown.
+   * @param string $lang Language code
+   * @return bool
+   */
+  private static function isKnownLanguageCode($lang) {
+    $languages = self::knownLanguageCodes();
+    return isset($languages[$lang]);
+  }
+
+  /**
+   * Get the available languages for a generic flavored markdown.
+   * @return array Languages as array keys. Ignore the value.
+   */
+  private static function knownLanguageCodes() {
+    // This is a friendly subset from https://pygments.org/languages/
+    static $map = array(
+      'arduino' => 1,
+      'assembly' => 1,
+      'awk' => 1,
+      'bash' => 1,
+      'bat' => 1,
+      'c' => 1,
+      'cmake' => 1,
+      'cobol' => 1,
+      'cpp' => 1,
+      'css' => 1,
+      'csharp' => 1,
+      'dart' => 1,
+      'delphi' => 1,
+      'fortran' => 1,
+      'go' => 1,
+      'groovy' => 1,
+      'haskell' => 1,
+      'java' => 1,
+      'javascript' => 1,
+      'kotlin' => 1,
+      'lisp' => 1,
+      'lua' => 1,
+      'matlab' => 1,
+      'make' => 1,
+      'perl' => 1,
+      'php' => 1,
+      'powershell' => 1,
+      'python' => 1,
+      'r' => 1,
+      'ruby' => 1,
+      'rust' => 1,
+      'scala' => 1,
+      'sh' => 1,
+      'sql' => 1,
+      'typescript' => 1,
+      'vba' => 1,
+    );
+    return $map;
   }
 
 }
+2
src/infrastructure/markup/remarkup/__tests__/PhutilRemarkupEngineTestCase.php
···
 
 /**
  * Test cases for @{class:PhutilRemarkupEngine}.
+ * @TODO: This unit is not always triggered when you need it.
+ * https://we.phorge.it/T15500
  */
 final class PhutilRemarkupEngineTestCase extends PhutilTestCase {
 
+7
src/infrastructure/markup/remarkup/__tests__/remarkup/tick-block-flavored.txt
···
+```cpp
+code
+```
+~~~~~~~~~~
+<div class="remarkup-code-block" data-code-lang="cpp" data-sigil="remarkup-code-block"><pre class="remarkup-code">code</pre></div>
+~~~~~~~~~~
+code
+18
src/infrastructure/markup/remarkup/__tests__/remarkup/tick-block-multi-flavored-comment.txt
···
+```#comment
+code
+
+#more comment
+more code```
+
+~~~~~~~~~~
+<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">#comment
+code
+
+#more comment
+more code</pre></div>
+~~~~~~~~~~
+#comment
+code
+
+#more comment
+more code
+9
src/infrastructure/markup/remarkup/__tests__/remarkup/tick-block-multi-flavored-empty.txt
···
+```
+cpp
+second line```
+~~~~~~~~~~
+<div class="remarkup-code-block" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code">cpp
+second line</pre></div>
+~~~~~~~~~~
+cpp
+second line
+20
src/infrastructure/markup/remarkup/__tests__/remarkup/tick-block-multi-flavored.txt
···
+```cpp
+code
+
+more code
+
+more code
+```
+
+~~~~~~~~~~
+<div class="remarkup-code-block" data-code-lang="cpp" data-sigil="remarkup-code-block"><pre class="remarkup-code">code
+
+more code
+
+more code</pre></div>
+~~~~~~~~~~
+code
+
+more code
+
+more code
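
Each fixture added here appears to hold three sections separated by a `~~~~~~~~~~` line: the Remarkup input, the expected HTML, and the expected plain-text output. A minimal sketch of reading that layout follows. `split_fixture()` is a hypothetical helper written for illustration, and the three-part layout is inferred from the fixtures in this diff rather than taken from the test case source.

```php
<?php
// Hypothetical helper; the three-section layout is an assumption inferred
// from the fixture files shown in this diff.

function split_fixture(string $contents): array {
  $parts = explode("\n~~~~~~~~~~\n", $contents);
  if (count($parts) !== 3) {
    throw new InvalidArgumentException(
      'Expected input, HTML and text sections.');
  }
  list($input, $html, $text) = $parts;
  return array('input' => $input, 'html' => $html, 'text' => $text);
}

// Rebuild the content of tick-block-flavored.txt line by line.
$tick = str_repeat('`', 3);
$fixture = implode("\n", array(
  $tick.'cpp',
  'code',
  $tick,
  '~~~~~~~~~~',
  '<div class="remarkup-code-block" data-code-lang="cpp" '.
    'data-sigil="remarkup-code-block">'.
    '<pre class="remarkup-code">code</pre></div>',
  '~~~~~~~~~~',
  'code',
));

$parts = split_fixture($fixture);
echo $parts['text'], "\n";
```

Reading the fixtures this way makes the intent of each case easier to check: the `data-code-lang` attribute in the HTML section shows whether the header line was taken as a language ("cpp") or left as content ("text").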