mul_u64_u64_div_u64: make it precise always

Patch series "mul_u64_u64_div_u64: new implementation", v3.

This provides an implementation for mul_u64_u64_div_u64() that always
produces exact results.

This patch (of 2):

Library facilities must always return exact results. If the caller may be
contented with approximations then it should do the approximation on its
own.

In this particular case the comment in the code says "the algorithm
... below might lose some precision". Well, if you try it with e.g.:

a = 18446462598732840960
b = 18446462598732840960
c = 18446462598732840961

then the produced answer is 0 whereas the exact answer should be
18446462598732840959. This is _some_ precision lost indeed!

Let's reimplement this function so it always produces the exact result
regardless of its inputs while preserving existing fast paths when
possible.

Uwe said:

: My personal interest is to get the calculations in pwm drivers right.
: This function is used in several drivers below drivers/pwm/ . With the
: errors in mul_u64_u64_div_u64(), pwm consumers might not get the
: settings they request. Although I have to admit that I'm not aware it
: breaks real use cases (because typically the periods used are too short
: to make the involved multiplications overflow), but I pretty sure am
: not aware of all usages and it breaks testing.
:
: Another justification is commits like
: https://git.kernel.org/tip/77baa5bafcbe1b2a15ef9c37232c21279c95481c,
: where people start to work around the precision shortcomings of
: mul_u64_u64_div_u64().

Link: https://lkml.kernel.org/r/20240707190648.1982714-1-nico@fluxnic.net
Link: https://lkml.kernel.org/r/20240707190648.1982714-2-nico@fluxnic.net
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Tested-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Tested-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

authored by

Nicolas Pitre and committed by

Andrew Morton 2 years ago b29a62d8 431c1646

+66 -44

1 changed file

expand all

lib

math

div64.c

+66 -44

lib/math/div64.c

··· 186 186 #ifndef mul_u64_u64_div_u64 187 187 u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 c) 188 188 { 189 - u64 res = 0, div, rem; 190 - int shift; 189 + if (ilog2(a) + ilog2(b) <= 62) 190 + return div64_u64(a * b, c); 191 191 192 - /* can a * b overflow ? */ 193 - if (ilog2(a) + ilog2(b) > 62) { 192 + #if defined(__SIZEOF_INT128__) 193 + 194 + /* native 64x64=128 bits multiplication */ 195 + u128 prod = (u128)a * b; 196 + u64 n_lo = prod, n_hi = prod >> 64; 197 + 198 + #else 199 + 200 + /* perform a 64x64=128 bits multiplication manually */ 201 + u32 a_lo = a, a_hi = a >> 32, b_lo = b, b_hi = b >> 32; 202 + u64 x, y, z; 203 + 204 + x = (u64)a_lo * b_lo; 205 + y = (u64)a_lo * b_hi + (u32)(x >> 32); 206 + z = (u64)a_hi * b_hi + (u32)(y >> 32); 207 + y = (u64)a_hi * b_lo + (u32)y; 208 + z += (u32)(y >> 32); 209 + x = (y << 32) + (u32)x; 210 + 211 + u64 n_lo = x, n_hi = z; 212 + 213 + #endif 214 + 215 + int shift = __builtin_ctzll(c); 216 + 217 + /* try reducing the fraction in case the dividend becomes <= 64 bits */ 218 + if ((n_hi >> shift) == 0) { 219 + u64 n = (n_lo >> shift) | (n_hi << (64 - shift)); 220 + 221 + return div64_u64(n, c >> shift); 194 222 /* 195 - * Note that the algorithm after the if block below might lose 196 - * some precision and the result is more exact for b > a. So 197 - * exchange a and b if a is bigger than b. 198 - * 199 - * For example with a = 43980465100800, b = 100000000, c = 1000000000 200 - * the below calculation doesn't modify b at all because div == 0 201 - * and then shift becomes 45 + 26 - 62 = 9 and so the result 202 - * becomes 4398035251080. However with a and b swapped the exact 203 - * result is calculated (i.e. 4398046510080). 223 + * The remainder value if needed would be: 224 + * res = div64_u64_rem(n, c >> shift, &rem); 225 + * rem = (rem << shift) + (n_lo - (n << shift)); 204 226 */ 205 - if (a > b) 206 - swap(a, b); 207 - 208 - /* 209 - * (b * a) / c is equal to 210 - * 211 - * (b / c) * a + 212 - * (b % c) * a / c 213 - * 214 - * if nothing overflows. Can the 1st multiplication 215 - * overflow? Yes, but we do not care: this can only 216 - * happen if the end result can't fit in u64 anyway. 217 - * 218 - * So the code below does 219 - * 220 - * res = (b / c) * a; 221 - * b = b % c; 222 - */ 223 - div = div64_u64_rem(b, c, &rem); 224 - res = div * a; 225 - b = rem; 226 - 227 - shift = ilog2(a) + ilog2(b) - 62; 228 - if (shift > 0) { 229 - /* drop precision */ 230 - b >>= shift; 231 - c >>= shift; 232 - if (!c) 233 - return res; 234 - } 235 227 } 236 228 237 - return res + div64_u64(a * b, c); 229 + if (n_hi >= c) { 230 + /* overflow: result is unrepresentable in a u64 */ 231 + return -1; 232 + } 233 + 234 + /* Do the full 128 by 64 bits division */ 235 + 236 + shift = __builtin_clzll(c); 237 + c <<= shift; 238 + 239 + int p = 64 + shift; 240 + u64 res = 0; 241 + bool carry; 242 + 243 + do { 244 + carry = n_hi >> 63; 245 + shift = carry ? 1 : __builtin_clzll(n_hi); 246 + if (p < shift) 247 + break; 248 + p -= shift; 249 + n_hi <<= shift; 250 + n_hi |= n_lo >> (64 - shift); 251 + n_lo <<= shift; 252 + if (carry || (n_hi >= c)) { 253 + n_hi -= c; 254 + res |= 1ULL << p; 255 + } 256 + } while (n_hi); 257 + /* The remainder value if needed would be n_hi << p */ 258 + 259 + return res; 238 260 } 239 261 EXPORT_SYMBOL(mul_u64_u64_div_u64); 240 262 #endif

Configure Feed

Configure Feed