84664

How can I multiply and divide 64-bit ints accurately?

I have a C function:

int64_t fn(int64_t a, int32_t b, int32_t c, int32_t d) { /* should return (a * b * c)/d */ }

It is possible for a to be near INT64_MAX, but for the final result not to overflow, for instance if b = 1, c = d = 40. However, I am having trouble figuring out how to compute this so that I never lose data to rounding (by doing the division first) or have an intermediate result overflow.

If I had access to a large enough datatype to fit the whole product of a, b, and c, I would just do the math in that type and then truncate, but is there some way I can do this without big integers?

Answer1:

Write a = q*d + r with |r| < |d| (I'm assuming d != 0, otherwise the computation is meaningless anyway). Then (a*b*c)/d = q*b*c + (r*b*c)/d. If q*b*c overflows, the entire computation would overflow anyway, so either you don't care, or you have to check for overflow. r*b*c might still overflow, so we again use the same method to avoid overflow,

int64_t q = a/d, r = a%d; int64_t part1 = q*b*c; int64_t q1 = (r*b)/d, r1 = (r*b)%d; return part1 + q1*c + (r1*c)/d;

Answer2:

I would suggest finding the greatest common divisor of d with each of a, b, and c, dividing out the common factors as you go:

common = gcd(a,d) // You can implement GCD using Euclid's algorithm a=a/common d=d/common common = gcd(b,d) b=b/common d=d/common common = gcd(c,d) c=c/common d=d/common

Then calculate a*b*c/d with all the common factors removed. Euclid's GCD algorithm runs in logarithmic time, so this should be fairly efficient.

Answer3:

It's easy to see that some inputs would produce outputs that cannot be represented by the return type int64_t. For example, fn(INT64_MAX, 2, 1, 1). However, the following approach <strike>will</strike> should allow you to return a correct answer for any combination of inputs that does in fact fit within the range of int64_t.

int64_t fn(int64_t a, int32_t b, int32_t c, int32_t d) { /* find the integer and remainder portions of a/d */ int64_t leftI = a / d; int64_t leftR = a % d; /* multiply the integer portion of the result by b and c */ int64_t resultI = leftI * b * c; /* multiply the remainder portion by b */ int64_t resultR = leftR * b; resultI = resultI + (resultR / d) * c; /* multiply the remainder portion by c */ resultR = (resultR % d) * c; return resultI + (resultR / d); }

Answer4:

If you are working on x86_64 then the asm supports 128 bit integers:

int64_t fn(uint64_t a, uint64_t b, uint64_t c, uint64_t d) { asm ( "mulq %1\n" // a *= b "movq %%rbx, %%rdx\n"// rbx = upper 64 bit of the multiplication "mulq %2\n" // multiply the lower 64 bits by c "push %%rax\n" // temporarily save the lowest 64 bits on the stack "mov %%rcx, %%rdx\n" // rcx = upper 64 bits of the multiplication "movq %%rax, %%rbx\n"// "mulq %2\n" // multiply the upper 64 bits by c "addq %%rax, %%rcx\n"// combine the middle 64 bits "addcq %%rdx, $0\n" // transfer carry tp the higest 64 bits if present "divq %3\n" // divide the upper 128 (of 192) bits by d "mov %%rbx, %%rax\n" // rbx = result "pop %%rax\n" "divq %3\n" // divide remainder:lower 64 bits by d : "+a" (a) // assigns a to rax register as in/out , "+b" (b) // assigns b to rbx register : "g" (c) // assigns c to random register , "g" (d) // assigns d to random register : "edx", "rdx" // tells the compiler that edx/rdx will be used internally, but does not need any input ); // b now holds the upper 64 bit if (a * b * c / d) > UINT64_MAX return a; }

Please note that all input integers have to be the same length. Working length will be double the input. Works with unsigned only.

The native div and mul instructions on x86 work on double-length exactly to allow for overflows. Sadly I am unaware of a compiler intrinsic to make use of them.

Recommend

  • How to properly declare a computed property, when calculation uses background threads?
  • Stored procedure throws error in SQL Server 2012 but works fine in SQL Server 2000
  • Multithreaded application, am I doing it right?
  • Python Prime Number Generator
  • Ways to implement recipricals on Verilog
  • Speed of “gcloud docker push”
  • Performning sudzc webservice using GCD
  • Nth HCF of two number
  • What is the maximum number of threads we can run in NSOperationQueue simultaneously [duplicate]
  • Android SQLite column and index best practice
  • How to get country name in android?
  • spoj factorial (time limit exceeded error). How can i improve my solution? [closed]
  • Resize rectangle in Paper.js
  • Dividing a number into m parts uniformly randomly
  • Apache Spark: How to structure code of a Spark Application (especially when using Broadcasts)
  • Download an asynchronous multiple images in UITableViewView?
  • Necessary to pin_ptr on C++ CLR Value types, why?
  • Removing levels from data frame read from CSV file - R
  • Max of several columns
  • Mysql query to determine if the given datetime is included in the datetime interval
  • Distribute Range of Numbers between each threads
  • Perspective projection, 4 points
  • Less Conflicting Session Manager for Zope 2
  • R Split data.frame using a column that represents and on/off switch
  • Does Mobilefirst provide a provision to access web services directly?
  • WPF - CanExecute dosn't fire when raising Commands from a UserControl
  • azure media services - The request body is too large and exceeds the maximum permissible limit
  • Sencha Touch 2.0 Controller refs attribute not working?
  • Play WS (2.2.1): post/put large request
  • Get data from AJAX - How to
  • Initializer list vs. initialization method
  • Installing Hadoop, Java Exception about illegal characters at index 7?
  • Spring security and special characters
  • Different response to non-authenticated users and AJAX calls
  • Bug in WPF DataGrid
  • Incrementing object id automatically JS constructor (static method and variable)
  • Trying to switch camera back to front but getting exception
  • Free memory of cv::Mat loaded using FileStorage API
  • Angular 2 constructor injection vs direct access
  • Programmatically clearing map cache