I have a C function:

```
int64_t fn(int64_t a, int32_t b, int32_t c, int32_t d)
{
/* should return (a * b * c)/d */
}
```

It is possible for a to be near INT64_MAX, but for the final result not to overflow, for instance if b = 1, c = d = 40. However, I am having trouble figuring out how to compute this so that I never lose data to rounding (by doing the division first) or have an intermediate result overflow.

If I had access to a large enough datatype to fit the whole product of a, b, and c, I would just do the math in that type and then truncate, but is there some way I can do this without big integers?

### Answer1:

Write `a = q*d + r`

with `|r| < |d|`

(I'm assuming `d != 0`

, otherwise the computation is meaningless anyway). Then `(a*b*c)/d = q*b*c + (r*b*c)/d`

. If `q*b*c`

overflows, the entire computation would overflow anyway, so either you don't care, or you have to check for overflow. `r*b*c`

might still overflow, so we again use the same method to avoid overflow,

```
int64_t q = a/d, r = a%d;
int64_t part1 = q*b*c;
int64_t q1 = (r*b)/d, r1 = (r*b)%d;
return part1 + q1*c + (r1*c)/d;
```

### Answer2:

I would suggest finding the greatest common divisor of d with each of a, b, and c, dividing out the common factors as you go:

```
common = gcd(a,d) // You can implement GCD using Euclid's algorithm
a=a/common
d=d/common
common = gcd(b,d)
b=b/common
d=d/common
common = gcd(c,d)
c=c/common
d=d/common
```

Then calculate `a*b*c/d`

with all the common factors removed. Euclid's GCD algorithm
runs in logarithmic time, so this should be fairly efficient.

### Answer3:

It's easy to see that some inputs would produce outputs that cannot be represented by the return type `int64_t`

. For example, `fn(INT64_MAX, 2, 1, 1)`

. However, the following approach <strike>will</strike> should allow you to return a correct answer for any combination of inputs that does in fact fit within the range of `int64_t`

.

```
int64_t fn(int64_t a, int32_t b, int32_t c, int32_t d)
{
/* find the integer and remainder portions of a/d */
int64_t leftI = a / d;
int64_t leftR = a % d;
/* multiply the integer portion of the result by b and c */
int64_t resultI = leftI * b * c;
/* multiply the remainder portion by b */
int64_t resultR = leftR * b;
resultI = resultI + (resultR / d) * c;
/* multiply the remainder portion by c */
resultR = (resultR % d) * c;
return resultI + (resultR / d);
}
```

### Answer4:

If you are working on x86_64 then the asm supports 128 bit integers:

```
int64_t fn(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
asm (
"mulq %1\n" // a *= b
"movq %%rbx, %%rdx\n"// rbx = upper 64 bit of the multiplication
"mulq %2\n" // multiply the lower 64 bits by c
"push %%rax\n" // temporarily save the lowest 64 bits on the stack
"mov %%rcx, %%rdx\n" // rcx = upper 64 bits of the multiplication
"movq %%rax, %%rbx\n"//
"mulq %2\n" // multiply the upper 64 bits by c
"addq %%rax, %%rcx\n"// combine the middle 64 bits
"addcq %%rdx, $0\n" // transfer carry tp the higest 64 bits if present
"divq %3\n" // divide the upper 128 (of 192) bits by d
"mov %%rbx, %%rax\n" // rbx = result
"pop %%rax\n"
"divq %3\n" // divide remainder:lower 64 bits by d
: "+a" (a) // assigns a to rax register as in/out
, "+b" (b) // assigns b to rbx register
: "g" (c) // assigns c to random register
, "g" (d) // assigns d to random register
: "edx", "rdx" // tells the compiler that edx/rdx will be used internally, but does not need any input
);
// b now holds the upper 64 bit if (a * b * c / d) > UINT64_MAX
return a;
}
```

Please note that all input integers have to be the same length. Working length will be double the input. Works with unsigned only.

The native `div`

and `mul`

instructions on x86 work on double-length exactly to allow for overflows. Sadly I am unaware of a compiler intrinsic to make use of them.