feat: add article about logging · hauleth.dev/blog@c917b53

+479

1 changed file

expand all

content

post

+479

content/post/log-all-the-things.md

··· 1 + +++ 2 + title = "Log all the things" 3 + date = 2021-10-13 4 + draft = true 5 + 6 + [taxonomies] 7 + tags = [ 8 + "elixir", 9 + "programming", 10 + "observability" 11 + ] 12 + +++ 13 + 14 + In Elixir 1.11 landed set of new features that allows for more powerful logging 15 + by utilising Erlang's [`logger`][erl-log] features. Here I will try to descibe 16 + new possibilities and how You can use them to improve your logs. 17 + 18 +  19 + 20 + ## New log levels {#levels} 21 + 22 + Elixir gained 4 new log levels to total 8 (from most verbose to least verbose): 23 + 24 + - debug 25 + - info 26 + - **notice** * 27 + - warning (renamed from warn) 28 + - error 29 + - **critical** * 30 + - **alert** * 31 + - **emergency** * 32 + 33 + <small>* new levels</small> 34 + 35 + This allow to provide finer graded verbosity control, however due to 36 + compatibility reasons, in Elixir backends we need to translate these levels back 37 + to "old" set of 4. The current table looks like: 38 + 39 + | Call level | What Elixir backend will see | 40 + | -- | -- | 41 + | `debug` | `debug` | 42 + | `info` | `info` | 43 + | `notice` | **`info`** * | 44 + | `warning` (or `warn`) | `warn` | 45 + | `error` | `error` | 46 + | `critical` | **`error`** * | 47 + | `alert` | **`error`** * | 48 + | `emergency` | **`error`** * | 49 + 50 + <small>* "translated" messages</small> 51 + 52 + However we can set verbosity to all levels. This may be confusing during the 53 + transition period, but we cannot change the behaviour until Elixir 2 (which is 54 + not happening any time soon). 55 + 56 + Usage of the new levels is "obvious": 57 + 58 + ```elixir 59 + Logger.notice("Hello") 60 + ``` 61 + 62 + Will produce message with `notice` level of verbosity. 63 + 64 + Additionally the `logger.level` option in configuration supports 2 additional 65 + verbosity levels that you can use in your config: 66 + 67 + - `:all` - all messages will be logged, logically exactly the same as `:debug` 68 + - `:none` - no messages will be logged 69 + 70 + ## Per module log level {#per-module-level} 71 + 72 + This is change that can be quite handy during debugging sessions. With this 73 + change we have 4 new functions in `Logger` module: 74 + 75 + - [`get_module_level/1`](https://hexdocs.pm/logger/Logger.html#get_module_level/1) 76 + - [`put_module_level/2`](https://hexdocs.pm/logger/Logger.html#put_module_level/2) 77 + - [`delete_module_level/1`](https://hexdocs.pm/logger/Logger.html#delete_module_level/1) 78 + - [`delete_all_module_level/0`](https://hexdocs.pm/logger/Logger.html#delete_all_module_level/0) 79 + 80 + These allow us to manipulate verbosity level on per-module basis. What is 81 + non-obvious and is super handy is that it allows both lowering **and raising** 82 + verbosity for given module. This mean that: 83 + 84 + ```elixir 85 + require Logger 86 + 87 + Logger.configure(level: :error) 88 + 89 + defmodule Foo do 90 + def run do 91 + Logger.debug("I am still there") 92 + end 93 + end 94 + 95 + Foo.run() # Does not log anything 96 + 97 + # Set `debug` level for `Foo` module only 98 + Logger.put_module_level(Foo, :debug) 99 + Foo.run() 100 + # `I am still there` is logged 101 + Logger.debug("I will not be printed") 102 + # Nothing got logged as top-level verbositi is still set to `:error` 103 + ``` 104 + 105 + Of course it will not work if you decide to use [compile time purging](https://hexdocs.pm/logger/Logger.html#module-application-configuration). 106 + 107 + ## Logger handlers {#handlers} 108 + 109 + --- 110 + 111 + **Warning!** This is not fully implemented in both - Erlang and Elixir. Writing 112 + your own handlers without additional knowledge can cause overload problems. 113 + 114 + --- 115 + 116 + Erlang together with their logging implementation needed to provide a way to 117 + ingest these logs somehow. This is done via Erlang logger handlers (in this 118 + article called *handlers* in contrast to Elixir backends called *backends* 119 + there). 120 + 121 + Handlers are modules that export at least 1 function `log/2` that takes 2 122 + arguments: 123 + 124 + - `log_event` which is a map with 3 fields: 125 + - `:level` - verbosity level 126 + - `:msg` - tuple describing message: 127 + - `{:io.format(), [term()]}` - format string and list of terms that should 128 + be passed to `:io_lib.format/2` function 129 + - `{:report, map() | keyword()}` - report that can be formatted into string 130 + by `report_cb/{1,2}` set in metadata map (see below) 131 + - `{:string, :unicode.chardata()}` - raw string that should be printed as 132 + a message 133 + - `:meta` - map containing all metadata for given event. All keys should be 134 + atoms and values can be anything. Some keys have special meaning, and some 135 + of them will be populated automatically by the `Logger` macros and functions. 136 + These are: 137 + - `:pid` - PID of the process that fired log event 138 + - `:gl` - group leader of the process that fired log event 139 + - `:mfa` - tuple in form of `{module(), name :: atom(), arity :: non_neg_integer()}` 140 + that describe function that fired log event 141 + - `:file` - filename of file that defines the code that fired log event 142 + - `:line` - line in the given file where the log event was fired 143 + - `:domain` - list of atoms that can be used to describe log events 144 + hierarchy which then can be used for filtering. All events fired using 145 + `Logger` macros and functions will have `:elixir` prepended to their 146 + domain list. 147 + - `:report_cb` - function that will be used to format `{:report, map() | keyword()}` 148 + messages. This can be either 1-ary function, that takes report and returns 149 + `{:io.format(), [term()]}` leaving truncation and further formatting up to 150 + the main formatter, or 2-ary function that takes report and configuration 151 + map `%{depth: pos_integer() | :unlimited, chars_limit: pos_integer() | 152 + :unlimited, single_line: boolean()}` and returns already formatted 153 + `:unicode.chardata()`. More about it can be found in [separate section](#structured-logging). 154 + 155 + Return value of this function is ignored. If there will be any exception raised 156 + when calling this function, then it will be captured and failing handler will be 157 + removed. This is important, as if such handler is the only one, then you can be 158 + left without any logging handler and miss logs. 159 + 160 + The important thing about Erlang handlers and Elixir backends is that Erlang 161 + handlers functions are called **within caller process** while Elixir backends 162 + are called in separate process. This mean that wrongly written Erlang handler 163 + can cause quite substantial load on application. 164 + 165 + To read on other, optional, callbacks that can be defined by Erlang handler, that 166 + will not be covered there, I suggest looking into [Erlang documentation](https://erlang.org/doc/man/logger.html#formatter-callback-functions). 167 + 168 + ## Structured logging {#structured-logging} 169 + 170 + One of the biggest new features in the Elixir 1.11 is support for structured 171 + logging. This mean that the log message do not need to be free-form string, but 172 + instead we can pass structure, that can provide more machine-readable data for 173 + processing in log aggregators. In Elixir 1.11 is is simple as passing map as a 174 + first argument to the `Logger` macros: 175 + 176 + ```elixir 177 + Logger.info(%{ 178 + status: :completed, 179 + response: :ok 180 + }) 181 + ``` 182 + 183 + This will produce message that looks like: 184 + 185 + ```log 186 + 14:08:46.849 [info] [response: :ok, status: :completed] 187 + ``` 188 + 189 + As we can see, the map (called *report*) is formatted as a keyword list. This is 190 + default way to present the report data. Unfortunately we cannot access the 191 + metadata from the Elixir backends, but we have 2 ways to make these messages 192 + more readable for the human operator: 193 + 194 + 1. Utilise [`Logger`'s translators](https://hexdocs.pm/logger/Logger.Translator.html) 195 + 1. Using `:report_cb` field in metadata 196 + 197 + 1st option is described quite good in Elixir documentation and is available 198 + since Elixir 1.0 as it was used to translate `error_logger` messages in old 199 + Erlang versions. Here I will describe the 2nd option which provide way for 200 + **caller** to define how report should be formatted into human-readable string. 201 + 202 + `:report_cb` accepts 2 kind of functions as an argument: 203 + 204 + - 1-ary function, that takes report as an argument and should return tuple 205 + in form of `{:io.format(), [term()]}` that will be later formatted 206 + respectively by the formatters. 207 + - 2-ary function that takes report and configuration map as an arguments and 208 + should return formatted string. 209 + 210 + 1st option is much easier for most use cases, as it do not force you to worry 211 + about handling width, depth, and multiline logs, as it will all be handled for 212 + you. 213 + 214 + For example, instead of doing: 215 + 216 + ```elixir 217 + Logger.info("Started HTTP server on http://localhost:8080") 218 + ``` 219 + 220 + We can do: 221 + 222 + ```elixir 223 + Logger.info( 224 + %{ 225 + protocol: :http, 226 + port: 8080, 227 + address: "localhost", 228 + endpoint: MyEndpoint, 229 + handler: Plug.Cowboy 230 + }, 231 + report_cb: &__MODULE__.report_cb/1 232 + ) 233 + 234 + # … 235 + 236 + def report_cb(%{protocol: protocol, port: port, address: address}) do 237 + {"Started ~s server on ~s://~s:~B", [protocol, protocol, address, port]} 238 + end 239 + ``` 240 + 241 + While the second entry seems much more verbose, with proper handler, it can 242 + provide much more detailed output. Just imagine that we would have handler that 243 + output JSON data and what information we could contain in such message: 244 + 245 + ```json 246 + { 247 + "msg": "Started HTTP server on http://localhost:8080", 248 + "metadata": { 249 + "mfa": "MyMod.start/2", 250 + "file": "foo.ex", 251 + "line": 42 252 + } 253 + } 254 + ``` 255 + 256 + Now our log aggregation service need to parse `msg` field to extract all 257 + information that is contained there, like port, address, and protocol. With 258 + structured logging we can have that message available already there while 259 + presenting the "human readable" form as well: 260 + 261 + ```json 262 + { 263 + "text": "Started HTTP server on http://localhost:8080", 264 + "msg": { 265 + "address": "localhost", 266 + "port": 8080, 267 + "protocol": "http", 268 + "endpoint": "MyEndpoint", 269 + "handler": "Plug.Cowboy" 270 + }, 271 + "metadata": { 272 + "mfa": "MyMod.start/2", 273 + "file": "foo.ex", 274 + "line": 42 275 + } 276 + } 277 + ``` 278 + 279 + Additionally you can see there that we can have more information available in 280 + the structured log that would otherwise needed to be crammed somewhere into the 281 + text message, even if it is not important in "regular" Ops observability. 282 + 283 + This can raise a question - why not use metadata for such functionality, like it 284 + is available in [`LoggerJSON`][] or [`Ink`][]? The reason is that their reason 285 + existence is different. Metadata meant for "meta" stuff like location, tracing 286 + ID, but not for the information about the message itself. It is best shown on 287 + example. For this use Elixir's implementation of `GenServer` wrapper that 288 + produces error log entry on unknown message handled by default `handle_info/2`: 289 + 290 + ```elixir 291 + Logger.error( 292 + # Report 293 + %{ 294 + label: {GenServer, :no_handle_info}, 295 + report: %{ 296 + module: __MODULE__, 297 + message: msg, 298 + name: proc 299 + } 300 + }, 301 + # Metadata 302 + %{ 303 + domain: [:otp, :elixir], 304 + error_logger: %{tag: :error_msg}, 305 + report_cb: &GenServer.format_report/1 306 + } 307 + ) 308 + ``` 309 + 310 + As we can see there, the report contains informations like: 311 + 312 + - `:label` - that describes type of the event 313 + - `:report` - content of the "main" event 314 + - `:module` - module that created the event, it is important to notice, that 315 + it is also present in metadata (as part of `:mfa` key), but their meaning is 316 + different. Module name here is meant for the operator to know the name of 317 + the implementor that failed to handle message, while `:mfa` is meant to 318 + describe the location of the code that fired the event. 319 + - `:message` - the message itself that hasn't been handled. Notice, that it is 320 + not stringified in any way there, it is simply passed "as is" to the 321 + report. It is meant to be stringified later by the `:report_cb` function. 322 + - `:name` - name of the process. Remember, similarly to `:module`, the PID of 323 + the current process is part of the metadata, so in theory we could use value 324 + from there, but their meaning is different (additionally this one may be an 325 + atom in case if the process is locally registered with name). 326 + 327 + Metadata on the other hand contains information that will be useful for 328 + filtering or formatting of the event. 329 + 330 + The rule of thumb you can follow is: 331 + 332 + > If it is thing that you will want to filter on, then it propably should be 333 + > part of the metadata. If you want to aggregate information or just display 334 + > them, it should be part of the message report. 335 + 336 + ## Log filtering 337 + 338 + Finally we come to first feature that is not directly accessible from the Elixir 339 + `Logger` API (yet). Erlang's `logger` have powerful functionality for filtering 340 + log messages which allows us to dynamically decide which message should, or 341 + should not be logged. These even can alter messages on the fly. 342 + 343 + Currently that functionality is available only via `:logger` module. It can be 344 + used like: 345 + 346 + ```elixir 347 + defmodule MyFilter do 348 + def filter(log_event, opts) do 349 + # … 350 + end 351 + end 352 + 353 + :logger.add_primary_filter(:my_filter, {&MyFilter.filter/2, opts}) 354 + # Or 355 + :logger.add_handler_filter(handler_id, :my_filter, {&MyFilter.filter/2, opts}) 356 + ``` 357 + 358 + Few important things that need to be remembered when writing such filters: 359 + 360 + - It is best practice to make such functions public and define filters using 361 + remote function capture, like `&__MODULE__.process_disabled/2` (so not 362 + anonymous functions either). It will make such filter much easier for VM to 363 + handle (it is bigger topic why it is that, I may to cover it in another post). 364 + - Filters are ran **within the same process that fired log event**, so it is 365 + important to make such filters as fast as possible, and do not do any heavy 366 + work there. 367 + 368 + Filters can be used for 2 different things: 369 + 370 + - preventing some messages from being logged 371 + - modifying a message 372 + 373 + While the former is much more common, I will try to describe both use cases 374 + there, as the latter is also quite useful. 375 + 376 + Filters are defined as 2-ary functions where 1st argument is log event, and 377 + second argument is any term that can be used as a configuration for filter. 378 + Filter should return one of these 3 values: 379 + 380 + - `:stop` - which will immediately discard message and do not run any additional 381 + filters. 382 + - `:ignore` - which mean that given filter didn't recognise the given message 383 + and leaves it up to other filters to decide on the action. If all filters 384 + return `:ignore` then `:filter_default` option for the handler will be taken. 385 + By default it is `:log`, which mean that message will be logged, but default 386 + handler has it set to `:stop` by default, which mean, that non-matching 387 + messages will be discarded. 388 + - Just log event (possibly modified) that will cause next filter to be called 389 + with altered message. The message returned by the last filter (or in case of 390 + `:ignore` return, previous filters) will be the message passed to handler. 391 + 392 + ### Preventing some messages from being logged 393 + 394 + Most common use-case for filters will probably be rejecting messages that aren't 395 + important for us. [Erlang even prepared some useful filters][logger_filters]: 396 + 397 + - `domain` - allow filtering by metadata `:domain` field (remember as I said 398 + that metadata is for filtering?). It supports multiple possible relations 399 + between the log domain and defined domain. 400 + - `level` - allow filtering (in or out) messages depending on their level, in 401 + both directions. So it will allow you to filter messages with higher level for 402 + some handlers. Just remember, that it will not receive messages that will not 403 + pass primary/module level. 404 + - `progress` - filters all reports from `supervisor` and 405 + `application_controller`. Simply, reduces startup/process shutdown chatter 406 + that often is meaningless for most time. 407 + - `remote_gl` - filters messages coming from group leader on another node. 408 + Useful when you want to discard/log messages coming from other nodes in 409 + cluster. 410 + 411 + ### Modifying a message 412 + 413 + Sometimes (hopefully rarely) there is need to alter messages in the system. For 414 + example we may need to prevent sensitive information from being logged. When 415 + using "old" Elixir approach you could abuse translators, but that was error 416 + prone, as first successful translator was breaking pipeline, so you couldn't 417 + just smash one on top and then keep rest working as is. With "new" approach and 418 + structured logging you can just traverse the report and replace all occurences 419 + of the unsafe data with anonymised data. For example: 420 + 421 + ```elixir 422 + def filter_out_password(%{msg: {:report, report}} = event, _opts) do 423 + %{event | msg: {:report, replace(report)}} 424 + end 425 + 426 + @filtered "[FILTERED]" 427 + 428 + defp replace(%{password: _} = map) do 429 + for {k, v} <- %{map | password: @filtered}, into: %{} do 430 + {k, replace(v)} 431 + end 432 + end 433 + 434 + defp replace(%{"password" => _} = map) do 435 + for {k, v} <- %{map | "password" => @filtered}, into: %{} do 436 + {k, replace(v)} 437 + end 438 + end 439 + 440 + defp replace(list) when is_list(list) do 441 + for elem <- list do 442 + case elem do 443 + {:password, _} -> {:password, @filtered} 444 + {"password", _} -> {"password", @filtered} 445 + {k, v} -> {k, replace(v)} 446 + other -> replace(other) 447 + end 448 + end 449 + end 450 + 451 + defp replace(other), do: other 452 + ``` 453 + 454 + This snippet will replace all occurences of `:password` or `"password"` with 455 + filtered out value. 456 + 457 + There is disadvantage of such approach though - it will make all messages with 458 + such fields allowed in case if your filter has `:filter_default` set to `:stop`. 459 + That mean, that if you want to make some of them rejected anyway, then you will 460 + need to manually add additional step to reject messages that do not fit into 461 + your patterns. Alternatively you can use `filter_default: :log` and then use 462 + opt-out logging. There currently is no way to alter the message and make other 463 + filters decide whether log it or not (as of OTP 24). 464 + 465 + ## Summary 466 + 467 + New features and possibilities with relation to logging in Elixir 1.11 can be 468 + overwhelming. Fortunately all of the new features are optional and provided in 469 + addition to "good 'ol `Logger.info("logging")`". But for the people who works on 470 + the observability in BEAM (EEF Observability WG, Sentry, Logflare, etc.) it 471 + brings a lot of new powerful capabilities. 472 + 473 + I am thrilled to see what will people create using all that power. 474 + 475 + [erl-log]: https://erlang.org/doc/man/logger.html 476 + [syslog]: https://en.wikipedia.org/wiki/Syslog#Severity_level 477 + [`LoggerJSON`]: https://github.com/Nebo15/logger_json 478 + [`Ink`]: https://hex.pm/packages/ink 479 + [logger_filters]: https://erlang.org/doc/man/logger_filters.html

Configure Feed

Configure Feed