Mirror of https://github.com/roostorg/coop github.com/roostorg/coop
import binomialTest from '@stdlib/stats-binomial-test';
import _ from 'lodash';

import { RuleAlarmStatus } from '../moderationConfigService/index.js';

const { sum } = _;

/**
 * Determine whether we should flag the rule as being 'in alarm', meaning that
 * its most-recent pass rate looks anomalous.
 *
 * I'm not exactly sure what the math should be here, but it seems like there
 * are two ways to view a rule's pass rate.
 *
 * One is to imagine that there's some static probability of a rule passing, so
 * the number of passes you'd expect to observe would be given by a binomial
 * distribution, which you're sampling (historically and in the latest period).
 * Then, you can look at how many passes were in the latest period and ask (in
 * various ways) if that's consistent w/ the pass rate observed historically.
 *
 * The second view is to see a rule's pass rate as a variable that changes over
 * time, with an independent "seasonal" component (i.e., different cohorts of
 * users regularly come onto the platform at different times of day/days of the
 * week) and a "trend" component (e.g., the pass rate may be going down over
 * time as bad users get deleted). Then, you forecast the next period's pass
 * rate, with a confidence interval on the forecast, and use the error between
 * the forecast and the actual pass rate to detect anomalies.
 *
 * The second approach feels more accurate, but also obviously more complicated.
 * The statistics are beyond me, and I couldn't find any off-the-shelf packages
 * in JS for decomposing the historical data into these trend + seasonal +
 * residual components (they're all in Python or R).
 *
 * My hacky approach was gonna be to check if the difference between this hour's
 * pass rate and last hour's pass rate was abnormally large (by estimating a
 * normal distribution for the rule's hour-by-hour _pass rate change_, from the
 * samples), and then mark the rule as in alarm if the pass rate went up a lot.
 * The idea was that, by comparing the current hour to the last hour (when
 * finding the pass rate change), we'd be assuming that the best estimate for
 * this hour's pass rate -- if things aren't anomalous -- is last hour's pass
 * rate, which thereby accounts for "seasonal" effects that we might miss if we
 * used some longer-run average as the starting point for estimating what this
 * period's pass rate "should be" (a la the 'static pass rate' model).
 *
 * But one issue is that the last hour's pass rate could also be abnormally low
 * by chance, which we wouldn't have alerted on (since we only alert if it looks
 * like the pass rate is unusually _high_), so then simple regression to the
 * mean in the next period would result in a big difference that could trigger
 * an alert. Relatedly, some rules, especially at some hours, really don't run
 * that many times, so using the hour-by-hour pass rates is not very reliable/
 * accurate (doubly so if we start onboarding smaller users), even though
 * pass rates are very nice in their ability to abstract away changes in the
 * absolute amount of usage at each hour, which we don't care about (and which
 * does change a lot). Still, many rules don't pass at all in a given hour,
 * which totally messes this approach up.
 *
 * There's also a bigger issue with looking at the pass rate as a time series,
 * which is that, if we go into alert mode when the pass rate jumps, we don't
 * want to then automatically go out of alert mode while the anomaly is still
 * occurring (i.e., we don't want to too-quickly 'learn' the anomalous value as
 * the 'correct' value, which could happen if we use the last value as the basis
 * for our prediction of the next value, and the anomaly lasts for over an
 * hour).
 *
 * So, for now, I'm sticking with the more naive option (1), but, to kinda-sorta
 * try to capture part of this seasonality idea, while avoiding the trap of an
 * anomalous most-recent hour(s), I'm very-slightly weighting more recent data
 * more heavily when calculating the historical pass rate. This is kinda like
 * "exponential smoothing" in time series analysis.
 *
 * @param stats An array where each item represents one time period, with newer
 * time periods first. Within an item, the keys represent how many passes and
 * rule executions there were in that time period.
 * @param confidence A value between 0 and 1 that represents how confident we
 * have to be before we consider the rule in alarm.
 */
export default function getRuleAlarmStatus(
  stats: { passes: number; runs: number }[],
  confidence = 0.995,
): RuleAlarmStatus {
  const [lastPeriod, ...priorPeriods] = stats;

  // alpha is the exponential smoothing factor (i.e., the weights exponentially
  // decay such that each period gets (1 - alpha) times the weight of the prior
  // one). alpha is quite low because, again, we don't want to learn an anomaly
  // as the new 'real' value, and because any 'seasonal' changes should be small
  // (so we don't actually need to give much more weight to recent data). A .02
  // smoothing factor gives each prior period 98% as much weight as the
  // next (i.e., more-recent) period.
  //
  // NB: We're throwing away the _number of weighted passes and runs_, and just
  // looking at the weighted _pass rate_. If we ever want to do a statistical
  // test that relies on the absolute number of (weighted) prior runs to derive
  // some uncertainty bounds, we'll need to scale the weights to sum to 1 first.
  const alpha = 0.02;
  const weights = priorPeriods.map((_, i) => (1 - alpha) ** i);
  const historicalWeightedPassRate =
    sum(priorPeriods.map((it, i) => it.passes * weights[i])) /
    sum(priorPeriods.map((it, i) => it.runs * weights[i]));

  // We've got a few options for the kind of test we can do here:
  //
  // 1) we could just assume that historicalWeightedPassRate is
  //    the underlying pass rate for the rule (even though it's actually just
  //    an estimate, which might be off if we don't have much historical data),
  //    and then do an exact binomial test against lastPeriod's sample.
  //
  // 2) we can find the difference in pass rates between lastPeriod and the
  //    historical data, and assume that that difference is normally
  //    distributed. this would account for the sample size of the historical
  //    data.
  //    (See https://www.itl.nist.gov/div898/handbook/prc/section3/prc33.htm)
  //    the issue is that our binomial distributions are hella skewed (because
  //    the pass rate for some rules is very, very close to zero), so assuming
  //    the pass rate difference is normally distributed isn't strictly right
  //    iiuc. there is (luckily) an off-the-shelf heuristic for when it's good
  //    enough, which is when there are at least 10 passes in both samples.
  //
  // 3) we can run one of the exact tests that work around the fact that our
  //    historical proportion is just an estimate, using different approaches.
  //    see https://stats.stackexchange.com/a/551617/277172. the issue here is
  //    that these are complicated and/or slow.
  //
  // For now, i just stick with the simplest option (1), but return early (with
  // INSUFFICIENT_DATA) if it looks like we might not have enough historical
  // data. The threshold for "enough" is hard to find because, if we want our
  // estimated pass rate to be off by no more than 10%, 95% of the time, the
  // number of raw runs we'd need could vary _wildly_ based on the rule's true
  // pass rate. Our rule pass rates seem to be as low as 1 in 100,000, which
  // would require 250,000 runs to have a reliable estimate, which is gonna be
  // too many for smaller users and is unnecessary for rules that pass much
  // more often (e.g., a rule that passed 5% of the time, which is the highest
  // pass rate I can imagine, would only need ~8000 runs, and ~4000 if we only
  // care about our estimate being too high, since we're only alerting when the
  // threshold is exceeded, not when the pass rate is anomalously low).
  // However, if we say that the rule must have been run at least 4000 times,
  // and passed more than twice or been run more than 125,000 times, then this
  // covers both extremes, I think. So, we require that + at least 24 periods
  // of data, to hopefully capture some seasonality.
  const priorPasses = sum(priorPeriods.map((it) => it.passes));
  const priorRuns = sum(priorPeriods.map((it) => it.runs));
  const hasAdequateData =
    priorPeriods.length >= 24 &&
    priorRuns >= 4000 &&
    (priorPasses > 2 || priorRuns > 125000);

  if (!hasAdequateData) {
    return RuleAlarmStatus.INSUFFICIENT_DATA;
  }

  // Even with our heuristics above, there are some cases where we're
  // incorrectly flagging things as anomalous because, when the sample sizes
  // are big (as w/ our large users), then even very small errors in the
  // estimate used for the pass rate, or small seasonal effects, can lead us to
  // log an anomaly incorrectly.
  //
  // For example: a rule has run almost 2 million times and has a weighted pass
  // rate of 0.017752. Now, in the latest period, if the rule runs many times
  // (say 8,000+), that's a big enough sample size that we'd expect the pass
  // rate to be pretty close to .017752. Specifically, with 8,000 runs, the
  // expected value is 8000*.017752 = 142 successes, but 99.5% of the time,
  // we'll have 171 passes (pass rate: .021375) or fewer. In real life, we'll
  // semi-routinely get 172-182 passes, and this won't represent anything
  // anomalous. Instead, it just means that our pass rate estimate was still a
  // little off or some seasonal thing is going on in this hour. So, to prevent
  // these kinds of false positives, we make sure that the pass rate in this
  // period is at least 25% higher than the historical weighted pass rate.
  if (lastPeriod.passes / lastPeriod.runs < historicalWeightedPassRate * 1.25) {
    return RuleAlarmStatus.OK;
  }

  const testResults = binomialTest(lastPeriod.passes, lastPeriod.runs, {
    p: historicalWeightedPassRate,
    alternative: 'greater',
    alpha: 1 - confidence,
  });

  // It's an anomaly if we rejected the null hypothesis.
  return testResults.rejected ? RuleAlarmStatus.ALARM : RuleAlarmStatus.OK;
}
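To make the two core computations above concrete, here is a simplified, self-contained sketch of (a) the exponentially weighted historical pass rate and (b) a one-sided exact binomial tail test. `weightedPassRate` mirrors the weighting in `getRuleAlarmStatus`; `binomUpperTailP` is a hand-rolled, hypothetical stand-in for `@stdlib/stats-binomial-test`, included only so the example runs without dependencies (it assumes `0 < p < 1` and moderate `n`).

```typescript
type PeriodStats = { passes: number; runs: number };

// Exponentially weighted pass rate over prior periods (newest first): each
// older period gets (1 - alpha) times the weight of the more-recent one,
// matching the smoothing in getRuleAlarmStatus above.
function weightedPassRate(priorPeriods: PeriodStats[], alpha = 0.02): number {
  const weights = priorPeriods.map((_, i) => (1 - alpha) ** i);
  const passes = priorPeriods.reduce((acc, p, i) => acc + p.passes * weights[i], 0);
  const runs = priorPeriods.reduce((acc, p, i) => acc + p.runs * weights[i], 0);
  return passes / runs;
}

// P(X >= k) for X ~ Binomial(n, p): the one-sided ('greater') exact test's
// p-value. Sums the pmf directly, in log space to avoid overflow.
function binomUpperTailP(k: number, n: number, p: number): number {
  const logFact = (m: number): number => {
    let s = 0;
    for (let j = 2; j <= m; j++) s += Math.log(j);
    return s;
  };
  let total = 0;
  for (let x = k; x <= n; x++) {
    const logPmf =
      logFact(n) - logFact(x) - logFact(n - x) +
      x * Math.log(p) + (n - x) * Math.log(1 - p);
    total += Math.exp(logPmf);
  }
  return total;
}

// 24 identical prior periods at a 2% pass rate; the weights cancel, so the
// weighted rate is exactly the raw rate here.
const prior: PeriodStats[] = Array.from({ length: 24 }, () => ({ passes: 20, runs: 1000 }));
const rate = weightedPassRate(prior);

// 40 passes in 1000 runs is far above the ~20 expected: a tiny p-value.
const pHigh = binomUpperTailP(40, 1000, rate);
// 22 passes in 1000 runs is within normal variation: a large p-value.
const pNormal = binomUpperTailP(22, 1000, rate);
```

At `confidence = 0.995`, the rule would be flagged only when the p-value falls below `alpha = 0.005`, so the first case alarms and the second does not (ignoring, for brevity, the 25% pass-rate guard applied before the test in the real function).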