Rotate left on a 64-bit word byte array in JavaCard


I am trying to perform a rotate left (ROTL) operation on an arbitrary amount of rotations on a 64-bit word currently represented as a byte array of 8 bytes in a JavaCard smart card.

The ugly way around is to hard-code all 64 possible permutations of ROTL on a 64-bit word represented as an 8 byte array but that will simply bloat the entire codebase up.

How do I make it leaner so that I can on-the-fly do any arbitrary amount of ROTL operations on demand on a 64-bit word (in byte array) with the use of byte and short types only due to JavaCard not recognizing more complex things like int or long and so on without need for hardcoding all ROTL64 permutations.


The following method performs rotate right for any kind of array in a buffer, <em>without requiring an additional output or temporary array</em>:

/** * Rotates the indicated bytes in the given buffer to the right by a certain * number of bits. * * @param buf * the buffer in which the bits need to be rotated * @param off * the offset in the buffer where the rotation needs to start * @param len * the amount of bytes * @param rot * the amount to rotate (any 31 bit value allowed) */ public static void rotr(byte[] buf, short off, short len, short rot) { if (len == 0) { // nothing to rotate (avoid division by 0) return; } final short lenBits = (short) (len * BYTE_SIZE); // doesn't always work for edge cases close to MIN_SHORT / MAX_SHORT rot = (short) ((rot + lenBits) % lenBits); // reused variables for byte and bit shift short shift, i; byte t1, t2; // --- byte shift shift = (short) (rot / BYTE_SIZE); // only shift when actually required if (shift != 0) { // values will never be used, src == start at the beginning short start = -1, src = -1, dest; // compiler is too stupid to see t1 will be assigned anyway t1 = 0; // go over all the bytes, but in stepwise fashion for (i = 0; i < len; i++) { // if we get back to the start // ... then we need to continue one position to the right if (src == start) { start++; t1 = buf[(short) (off + (++src))]; } // calculate next location by stepping by the shift amount // ... modulus the length of course dest = (short) ((src + shift) % len); // save value, this will be the next one to be stored t2 = buf[(short) (off + dest)]; // store value, doing the actual shift buf[(short) (off + dest)] = t1; // do the step src = dest; // we're going to store t1, not t2 t1 = t2; } } // --- bit shift shift = (short) (rot % BYTE_SIZE); // only shift when actually required if (shift != 0) { // t1 holds previous byte, at other side t1 = buf[(short) (off + len - 1)]; for (i = 0; i < len; i++) { t2 = buf[(short) (off + i)]; // take bits from previous byte and this byte together buf[(short) (off + i)] = (byte) ((t1 << (BYTE_SIZE - shift)) | ((t2 & BYTE_MASK) >> shift)); // current byte is now previous byte t1 = t2; } } } private static final short BYTE_MASK = 0xFF; private static final short BYTE_SIZE = 8;

A drawback is that it requires two passes over all the data: one of the byte shift and one for the bit shift. It skips them when they are not required (if you know that the skipping is never performed then you can easily remove those checks).

Here is the left-rotate. The left rotate is somewhat harder to implement itself - requiring a few more calculations, so the cost of the additional method call may be offset against that. If you rotate by using a literal then you can just use the rotr function of course, or calculate the rotation amount yourself.

public static void rotl(byte[] buf, short off, short len, short bits) { final short lenBits = (short) (len * BYTE_SIZE); bits = (short) ((bits + lenBits) % lenBits); // we don't care if we pass 0 or lenBits, rotr will adjust rotr(buf, off, len, (short) (lenBits - bits)); }

Note: A previous version required rotates over 64 bit, was more time constant and didn't have the offset included. It also required a 64 bit specific array with loop constants (which is now replaced by the much more generic inner if statement in the for loop of the byte shift). See the edits for other versions.

<hr />

Rotating gets much easier when an output buffer is available: this implementation can consist of just the initialization part and the last 4 lines of code. Crypto code often only shifts by a constant size and by odd numbers, so just the last 4 lines can then be used, skipping the optimizations in case no (bit) shift needs to be performed.

I've deliberately used a slightly different interface for that which assumes 64 bits for the rotate, just to show a slightly different implementation.

public static void rotr64(byte[] inBuf, short inOff, byte[] outBuf, short outOff, short rot) { short byteRot = (short) ((rot & 0b00111000) >> 3); short bitRot = (short) (rot & 0b00000111); if (bitRot == 0) { if (byteRot == 0) { // --- no rotation return; } // --- only byte rotation for (short i = 0; i < LONG_BYTES; i++) { outBuf[(short) (outOff + (i + byteRot) % LONG_BYTES)] = inBuf[(short) (inOff + i)]; } } else { // --- bit- and possibly byte rotation // note: also works for all other situations, but slower // put the last byte in t_lo short t = (short) (inBuf[inOff + LONG_BYTES - 1] & BYTE_MASK); for (short i = 0; i < LONG_BYTES; i++) { // shift t_lo into t_hi and add the next byte into t_lo t = (short) (t << BYTE_SIZE | (inBuf[(short) (inOff + i)] & BYTE_MASK)); // find the byte to receive the shifted value within the short outBuf[(short) (outOff + (i + byteRot) % LONG_BYTES)] = (byte) (t >> bitRot); } } } private static final int LONG_BYTES = 8; private static final short BYTE_MASK = 0xFF; private static final short BYTE_SIZE = 8;

and it can be further simplified if the offset is always set to zero.

Here is the left rotate, in case you need a generic function for that:

public static void rotl64(byte[] inBuf, short inOff, byte[] outBuf, short outOff, short rot) { rotr64(inBuf, inOff, outBuf, outOff, (short) (64 - rot & 0b00111111)); } <hr />

Everything is tested against random input (a million runs or so, which takes less than a second on Java SE) although I do not offer warranty on the testing; please test yourself.


A very simple implementation which receives the four shorts in separate parameters

public static void rotateRight64(short x3, short x2, short x1, short x0, short rotAmount, short[] out) { assert out.length() == 4; rotAmount &= (1 << 6) - 1; // limit the range to 0..63 if (rotAmount >= 16) rotateRight64(x0, x3, x2, x1, rotAmount - 16, out); else { out[0] = (short)((x0 >>> rotAmount) | (x1 << (16-rotAmount))); out[1] = (short)((x1 >>> rotAmount) | (x2 << (16-rotAmount))); out[2] = (short)((x2 >>> rotAmount) | (x3 << (16-rotAmount))); out[3] = (short)((x3 >>> rotAmount) | (x0 << (16-rotAmount))); } }

It's a rotate right but it's easy to do a rotate left by rotating right 64 - rotAmount

Alternatively it can be done like this without the coarse shifting step

public static void rotateRight(short[] out, short[] in, short rotAmount) // in ror rotAmount { assert out.length() == 4 && in.length() == 4 && rotAmount >= 0 && rotAmount < 64; const short shift = (short)(rotAmount % 16); const short limbshift = (short)(rotAmount / 16); short tmp = in[0]; for (short i = 0; i < 4; ++i) { short index = (short)((i + limbshift) % 4); out[i] = (short)((in[index] >>> shift) | (in[index + 1] << (16 - shift))); } }

This way it can be easily changed to an arbitrary-precision shift/rotate

If the input array is byte then you can change short[4] to byte[8] and change all the constants from 16 → 8 and from 4 → 8. In fact they can be generalized without problem, I'm just hard coding to keep it simple and easier to understand


  • StepLDA without Cross Validation
  • Given N marbles and M floors, find the algorithm to find the lowest floor from which marble would br
  • SQL Transpose Rows to undefined number of columns
  • Visual Studio Code: DesignHostManager failed
  • How to perform a left join in SQLALchemy?
  • Adding support for a memory type similar to __shared__ in CUDA using clang-llvm compiler
  • TypeScript Mapped Types: Get element type of array
  • I have a SQLite syntax error in DELETE statement
  • Unselect column after pasting data
  • How can I substitute my own custom dynamic scaffolding methods
  • Edge-case: When (only) reversing order of template parameters in specialization, can non-specialized
  • Imagemagick set interline spacing?
  • Populate checkbox from database
  • parallelize process in missForest package
  • Sort by a column in a union query in SqlAlchemy SQLite
  • How to format code on aptana 3?
  • Updating and removing unique join relationships in CakePHP
  • Count New Lines in Text File
  • MySQL performance when updating row with FK
  • Plotting densities in R
  • ThreadStatic in asynchronous ASP.NET Web API
  • Mysql - How to search for 26 records that each begins with the letter of the alphabet?
  • Does Mobilefirst provide a provision to access web services directly?
  • How to view images from protected folder with php?
  • Display images in Django
  • preg_replace Double Spaces to tab (\\t) at the beginning of a line
  • formatting the colorbar ticklabels with SymLogNorm normalization in matplotlib
  • java.lang.NoClassDefFoundError: com.parse.Parse$Configuration$Builder on below Lollipop versions
  • Resize panoramic image to fixed size
  • How to add a column to a Pandas dataframe made of arrays of the n-preceding values of another column
  • How to extract text from Word files using C#?
  • Convert array of 8 bytes to signed long in C++
  • Importing jscolor library in angular 2
  • Codeigniter doesn't let me update entry, because some fields must be unique
  • Understanding cpu registers
  • Are Kotlin's Float, Int etc optimised to built-in types in the JVM? [duplicate]
  • Does armcc optimizes non-volatile variables with -O0?
  • Recursive/Hierarchical Query Using Postgres
  • Running Map reduces the dimensions of the matrices
  • Conditional In-Line CSS for IE and Others?