What is the availability of 'vector long long'?

I'm testing on an old PowerMac G5, which is a Power4 machine. The build is failing:

    $ make
    ...
    g++ -DNDEBUG -g2 -O3 -mcpu=power4 -maltivec -c ppc-simd.cpp
    ppc-crypto.h:36: error: use of 'long long' in AltiVec types is invalid
    make: *** [ppc-simd.o] Error 1

The failure is due to:

typedef __vector unsigned long long uint64x2_p8;

I'm having trouble determining when I should make the typedef available. With -mcpu=power4 -maltivec the machine reports 64-bit availability:

    $ gcc -mcpu=power4 -maltivec -dM -E - </dev/null | sort | egrep -i -E 'power|ARCH'
    #define _ARCH_PPC 1
    #define _ARCH_PPC64 1
    #define __POWERPC__ 1

The OpenPOWER | 6.1. Vector Data Types manual has good information on vector data types, but it does not discuss when the vector long long types are available.

What is the availability of __vector unsigned long long? When can I use the typedef?

Answer1:

TL;DR: it looks like POWER7 is the minimum requirement for 64-bit element size with AltiVec. This is part of VSX (Vector Scalar Extension), which Wikipedia confirms first appeared in POWER7.


It's very likely that gcc knows what it's doing, and enables 64-bit element-size vector intrinsics with the lowest necessary -mcpu= requirement.

    #include <altivec.h>

    auto vec32(void) {
        // compiles with your options: Power4
        return vec_splats((int) 1);
    }

    // gcc error: use of 'long long' in AltiVec types is invalid without -mvsx
    vector long long vec64(void) {
        return vec_splats((long long) 1);
    }

(With auto instead of vector long long, the 2nd function compiles to returning in two 64-bit integer registers.)

Adding -mvsx lets the 2nd function compile. Using -mcpu=power7 also works, but power6 doesn't.
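Given that, one way to decide when to expose the typedef is to key it on VSX itself rather than on the `_ARCH_PPC64` style macros. The following is a minimal sketch, not a tested recipe for every compiler. It assumes the compiler defines `__VSX__` whenever VSX is enabled (GCC does this for `-mvsx`, which `-mcpu=power7` turns on). `PPC_HAS_VEC64` and `vec64_available` are hypothetical names made up for this example.

```cpp
// PPC_HAS_VEC64 is a hypothetical macro name chosen for this sketch.
// Assumption: the compiler defines __VSX__ when VSX is enabled
// (GCC does so with -mvsx, or with -mcpu=power7 and later).
#if defined(__VSX__)
# define PPC_HAS_VEC64 1
// Same typedef as in the question, now only seen when it is valid.
typedef __vector unsigned long long uint64x2_p8;
#else
# define PPC_HAS_VEC64 0
#endif

// Hypothetical helper: reports which path the preprocessor selected.
// Returns 1 when the 64-bit element typedef is available, else 0.
int vec64_available() {
#if PPC_HAS_VEC64
    return 1;
#else
    return 0;
#endif
}
```

With `-mcpu=power4 -maltivec` the typedef is skipped and the rest of the header still compiles. Code that needs `uint64x2_p8` can then be wrapped in `#if PPC_HAS_VEC64`.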

source + asm on Godbolt (PowerPC64 gcc6.3)

    # with auto without VSX:
    vec64():                    # -O3 -mcpu=power4 -maltivec -mregnames
        li %r4,1
        li %r3,1
        blr

    vec64():                    # -O3 -mcpu=power7 -maltivec -mregnames
    .LCF2:
    0:  addis 2,12,.TOC.-.LCF2@ha
        addi 2,2,.TOC.-.LCF2@l
        addis %r9,%r2,.LC0@toc@ha
        addi %r9,%r9,.LC0@toc@l         # PC-relative addressing for static constant, I think.
        lxvd2x %vs34,0,%r9              # vector load?
        xxpermdi %vs34,%vs34,%vs34,2
        blr

    .LC0:                       # in .rodata
        .quad 1
        .quad 1

And BTW, vec_splats (splat scalar) with a constant compiles to a single instruction. But with a runtime variable (e.g. a function arg), it compiles to an integer store / vector load / vector-splat (like the vec_splat intrinsic). Apparently there isn't a single instruction for int->vec.

The vec_splat_s32 and related intrinsics only accept a small (5-bit) constant, so they only compile in cases where the compiler can use the corresponding splat-immediate instruction.
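To make the 5-bit limit concrete, here is a small sketch. A 5-bit signed immediate covers the range -16 to 15, so only literals in that range are candidates for `vec_splat_s32`. `fits_splat_imm5`, `splat_one` and `splat_any` are hypothetical names for this example. The AltiVec part is only compiled when `__ALTIVEC__` is defined, and the `#undef` lines assume `altivec.h` may define `vector`, `pixel` and `bool` as macros.

```cpp
#if defined(__ALTIVEC__)
# include <altivec.h>
// altivec.h may define these as macros; undefine them for C++ compatibility.
# undef vector
# undef pixel
# undef bool
#endif

// Hypothetical helper for this sketch: true when v fits a 5-bit signed
// immediate, i.e. the range [-16, 15].
constexpr bool fits_splat_imm5(int v) {
    return v >= -16 && v <= 15;
}

static_assert(fits_splat_imm5(1), "1 fits in 5 signed bits");
static_assert(!fits_splat_imm5(16), "16 does not fit in 5 signed bits");

#if defined(__ALTIVEC__)
// Small literal: eligible for the splat-immediate form.
__vector signed int splat_one() {
    return vec_splat_s32(1);
}

// Runtime value: use vec_splats and let the compiler choose the sequence.
__vector signed int splat_any(int x) {
    return vec_splats(x);
}
#endif
```

For anything outside that range, or for a value only known at run time, `vec_splats` is the one to reach for.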

This Intel SSE to PowerPC AltiVec migration looks mostly good, but it gets that wrong (it claims that vec_splats splats a signed byte).
