Mirror of https://github.com/roostorg/coop
import binomialTest from '@stdlib/stats-binomial-test';
import _ from 'lodash';

import { RuleAlarmStatus } from '../moderationConfigService/index.js';

const { sum } = _;

/**
 * Determine whether we should flag the rule as being 'in alarm', meaning that
 * its most-recent pass rate looks anomalous.
 *
 * I'm not exactly sure what the math should be here, but it seems like there
 * are two ways to view a rule's pass rate.
 *
 * One is to imagine that there's some static probability of a rule passing, so
 * the number of passes you'd expect to observe would be given by a binomial
 * distribution, which you're sampling (historically and in the latest period).
 * Then, you can look at how many passes were in the latest period and ask (in
 * various ways) if that's consistent w/ the pass rate observed historically.
 *
 * The second view is to see a rule's pass rate as a variable that changes over
 * time, with an independent "seasonal" component (i.e., different cohorts of
 * users regularly come onto the platform at different times of day/days of the
 * week) and a "trend" component (e.g., the pass rate may be going down over
 * time as bad users get deleted). Then, you forecast the next period's pass
 * rate, with a confidence interval on the forecast, and use the error between
 * the forecast and the actual pass rate to detect anomalies.
 *
 * The second approach feels more accurate, but also obviously more complicated.
 * The statistics are beyond me, and I couldn't find any off-the-shelf packages
 * in JS for decomposing the historical data into these trend + seasonal +
 * residual components (they're all in Python or R).
 *
 * My hacky approach was gonna be to check if the difference between this hour's
 * pass rate and last hour's pass rate was abnormally large (by estimating a
 * normal distribution for the rule's hour-by-hour _pass rate change_, from the
 * samples), and then mark the rule as in alarm if the pass rate went up a lot.
 * The idea was that, by comparing the current hour to the last hour (when
 * finding the pass rate change), we'd be assuming that the best estimate for
 * this hour's pass rate -- if things aren't anomalous -- is last hour's pass
 * rate, which thereby accounts for "seasonal" effects that we might miss if we
 * used some longer-run average as the starting point for estimating what this
 * period's pass rate "should be" (a la the 'static pass rate' model).
 *
 * But one issue is that the last hour's pass rate could also be abnormally low
 * by chance, which we wouldn't have alerted on (since we only alert if it looks
 * like the pass rate is unusually _high_), so then simple regression to the
 * mean in the next period would result in a big difference that could trigger
 * an alert. Relatedly, some rules, especially at some hours, really don't run
 * that many times, so using the hour-by-hour pass rates is not very reliable or
 * accurate (doubly so if we start onboarding smaller users), even though
 * pass rates are very nice in their ability to abstract away changes in the
 * absolute amount of usage at each hour, which we don't care about (and which
 * does change a lot). Worse, many rules don't pass at all in a given hour,
 * which totally messes this approach up.
 *
 * There's also a bigger issue with looking at the pass rate as a time series,
 * which is that, if we go into alert mode when the pass rate jumps, we don't
 * want to then automatically go out of alert mode while the anomaly is still
 * occurring (i.e., we don't want to too-quickly 'learn' the anomalous value as
 * the 'correct' value, which could happen if we use the last value as the basis
 * for our prediction of the next value and the anomaly lasts for over an hour).
 *
 * So, for now, I'm sticking with the more naive option (1), but, to kinda-sorta
 * try to capture part of this seasonality idea, while avoiding the trap of an
 * anomalous most-recent hour(s), I'm very slightly weighting more recent data
 * more heavily when calculating the historical pass rate. This is kinda like
 * "exponential smoothing" in time series analysis.
 *
 * @param stats An array where each item represents one time period, with newer
 * time periods first. Within an item, the keys represent how many passes and
 * rule executions there were in that time period.
 * @param confidence A value between 0 and 1 that represents how confident we
 * have to be before we consider the rule in alarm.
 */
export default function getRuleAlarmStatus(
  stats: { passes: number; runs: number }[],
  confidence = 0.995,
): RuleAlarmStatus {
  const [lastPeriod, ...priorPeriods] = stats;

  // alpha is the exponential smoothing factor (i.e., the weights exponentially
  // decay such that each period gets (1 - alpha) times the weight of the next,
  // more-recent one). alpha is quite low because, again, we don't want to
  // learn an anomaly as the new 'real' value, and because any 'seasonal'
  // changes should be small (so we don't actually need to give much more
  // weight to recent data). A .02 smoothing factor gives each prior period 98%
  // as much weight as the next (i.e., more-recent) period.
  //
  // NB: We're throwing away the _number of weighted passes and runs_, and just
  // looking at the weighted _pass rate_. If we ever want to do a statistical
  // test that relies on the absolute number of (weighted) prior runs to derive
  // some uncertainty bounds, we'll need to scale the weights to sum to 1 first.
  const alpha = 0.02;
  const weights = priorPeriods.map((_, i) => (1 - alpha) ** i);
  const historicalWeightedPassRate =
    sum(priorPeriods.map((it, i) => it.passes * weights[i])) /
    sum(priorPeriods.map((it, i) => it.runs * weights[i]));
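A quick toy illustration of the weighting above (the period counts are made up, and the helpers are renamed so the sketch is self-contained): with alpha = 0.02, the most recent prior period gets weight 1, the next-oldest 0.98, then 0.9604, and so on.

```typescript
// Hypothetical toy data showing how the exponentially decayed weights
// combine into a single weighted pass rate, mirroring the computation above.
const toyAlpha = 0.02;
const toyPriorPeriods = [
  { passes: 3, runs: 1000 }, // most recent prior period (weight 1)
  { passes: 2, runs: 900 }, // one period older (weight 0.98)
  { passes: 4, runs: 1100 }, // two periods older (weight 0.9604)
];
const toyWeights = toyPriorPeriods.map((_, i) => (1 - toyAlpha) ** i);
const toySum = (xs: number[]) => xs.reduce((a, b) => a + b, 0);
const toyWeightedPassRate =
  toySum(toyPriorPeriods.map((it, i) => it.passes * toyWeights[i])) /
  toySum(toyPriorPeriods.map((it, i) => it.runs * toyWeights[i]));
// = (3*1 + 2*0.98 + 4*0.9604) / (1000*1 + 900*0.98 + 1100*0.9604)
// ≈ 8.8016 / 2938.44 ≈ 0.003
```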

  // We've got a few options for the kind of test we can do here:
  //
  // 1) we could just assume that `historicalWeightedPassRate` is the
  // underlying pass rate for the rule (even though it's actually just
  // an estimate, which might be off if we don't have much historical data),
  // and then do an exact binomial test against lastPeriod's sample.
  //
  // 2) we could find the difference in pass rates between lastPeriod and the
  // historical data, and assume that that difference is normally distributed.
  // this would account for the sample size of the historical data.
  // (See https://www.itl.nist.gov/div898/handbook/prc/section3/prc33.htm)
  // the issue is that our binomial distributions are hella skewed (because
  // the pass rate for some rules is very, very close to zero), so assuming
  // the pass rate difference is normally distributed isn't strictly right
  // iiuc. there is (luckily) an off-the-shelf heuristic for when it's good
  // enough, which is when there are at least 10 passes in both samples.
  //
  // 3) we could run one of the exact tests that work around the fact that our
  // historical proportion is just an estimate, using different approaches.
  // see https://stats.stackexchange.com/a/551617/277172. the issue here is
  // that these are complicated and/or slow.
  //
  // For now, I just stick with the simplest option (1), but return early if it
  // looks like we might not have enough historical data. The threshold for
  // "enough" is hard to find because, if we want our estimated pass rate to be
  // off by no more than 10%, 95% of the time, the number of raw runs we'd need
  // could vary _wildly_ based on the rule's true pass rate. Our rule pass rates
  // seem to be as low as 1 in 100,000, which would require 250,000 runs to have
  // a reliable estimate, which is gonna be too many for smaller users and
  // is unnecessary for rules that pass much more often (e.g., a rule that
  // passed 5% of the time, which is the highest pass rate I can imagine, would
  // only need ~8000 runs, and ~4000 if we only care about our estimate being
  // too high, since we're only alerting when the threshold is exceeded, not
  // when the pass rate is anomalously low). However, if we say that the rule
  // must have been run at least 4000 times, and passed more than twice or been
  // run 125,000+ times, then this covers both extremes, I think. So, we require
  // that + at least 24 periods of data, to hopefully capture some seasonality.
  const priorPasses = sum(priorPeriods.map((it) => it.passes));
  const priorRuns = sum(priorPeriods.map((it) => it.runs));
  const hasAdequateData =
    priorPeriods.length >= 24 &&
    priorRuns >= 4000 &&
    (priorPasses > 2 || priorRuns > 125000);

  if (!hasAdequateData) {
    return RuleAlarmStatus.INSUFFICIENT_DATA;
  }

  // Even with our heuristics above, there are some cases where we're
  // incorrectly flagging things as anomalous because, when the sample sizes are
  // big (as w/ our large users), then even very small errors in the
  // estimate used for the pass rate, or small seasonal effects, can lead us to
  // log an anomaly incorrectly.
  //
  // For example: a rule has run almost 2 million times and has a weighted pass
  // rate of 0.017752. Now, in the latest period, if the rule runs many times
  // (say 8,000+), that's a big enough sample size that we'd expect the pass
  // rate to be pretty close to .017752. Specifically, with 8,000 runs, the
  // expected value is 8000*.017752 = 142 successes, but 99.5% of the time,
  // we'll have 171 passes (pass rate: .021375) or fewer. In real life, we'll
  // semi-routinely get 172-182 passes, and this won't represent anything
  // anomalous. Instead, it just means that our pass rate estimate was still a
  // little off or some seasonal thing is going on in this hour. So, to prevent
  // these kinds of false positives, we make sure that the pass rate in this
  // period is at least 25% higher than the historical weighted pass rate.
  if (lastPeriod.passes / lastPeriod.runs < historicalWeightedPassRate * 1.25) {
    return RuleAlarmStatus.OK;
  }

  const testResults = binomialTest(lastPeriod.passes, lastPeriod.runs, {
    p: historicalWeightedPassRate,
    alternative: 'greater',
    alpha: 1 - confidence,
  });

  // It's an anomaly if we rejected the null hypothesis.
  return testResults.rejected ? RuleAlarmStatus.ALARM : RuleAlarmStatus.OK;
}
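As a rough sanity check of the worked example in the comments above (8,000 runs at a pass rate of 0.017752, with ~171 passes at the 99.5th percentile), the normal approximation to the binomial gets close to the same bound. This is only a back-of-the-envelope check, not the exact binomial test the function runs:

```typescript
// Normal approximation to Binomial(n = 8000, p = 0.017752): the one-sided
// 99.5% upper bound should land near the 171 passes quoted in the comments.
const n = 8000;
const p = 0.017752;
const mean = n * p; // ≈ 142 expected passes
const sd = Math.sqrt(n * p * (1 - p)); // ≈ 11.8
const z995 = 2.576; // one-sided 99.5% quantile of the standard normal
const upperBound = mean + z995 * sd; // ≈ 172, close to the exact 171
```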