
What is the availability of 'vector long long'?

I'm testing on an old PowerMac G5, which is a Power4 machine. The build is failing:

$ make
...
g++ -DNDEBUG -g2 -O3 -mcpu=power4 -maltivec -c ppc-simd.cpp
ppc-crypto.h:36: error: use of 'long long' in AltiVec types is invalid
make: *** [ppc-simd.o] Error 1

The failure is due to:

typedef __vector unsigned long long uint64x2_p8;

I'm having trouble determining when I should make the typedef available. With -mcpu=power4 -maltivec the machine reports 64-bit availability:

$ gcc -mcpu=power4 -maltivec -dM -E - </dev/null | sort | egrep -i -E 'power|ARCH'
#define _ARCH_PPC 1
#define _ARCH_PPC64 1
#define __POWERPC__ 1

The OpenPOWER | 6.1. Vector Data Types manual has good information on vector data types, but it does not discuss when the vector long long types are available.

What is the availability of __vector unsigned long long? When can I use the typedef?

Answer1:

TL;DR: it looks like POWER7 is the minimum requirement for 64-bit element size with AltiVec. This is part of VSX (Vector Scalar Extension), which Wikipedia confirms first appeared in POWER7.

<hr>

It's very likely that gcc knows what it's doing, and enables 64-bit element-size vector intrinsics with the lowest necessary -mcpu= requirement.

#include <altivec.h>

auto vec32(void) {
    // compiles with your options: Power4
    return vec_splats((int) 1);
}

// gcc error: use of 'long long' in AltiVec types is invalid without -mvsx
vector long long vec64(void) {
    return vec_splats((long long) 1);
}

(With auto instead of vector long long, the 2nd function compiles to returning in two 64-bit integer registers.)

Adding -mvsx lets the 2nd function compile. Using -mcpu=power7 also works, but power6 doesn't.

source + asm on Godbolt (PowerPC64 gcc6.3)

# with auto without VSX:
vec64():     # -O3 -mcpu=power4 -maltivec -mregnames
    li %r4,1
    li %r3,1
    blr

vec64():     # -O3 -mcpu=power7 -maltivec -mregnames
.LCF2:
0:  addis 2,12,.TOC.-.LCF2@ha
    addi 2,2,.TOC.-.LCF2@l
    addis %r9,%r2,.LC0@toc@ha
    addi %r9,%r9,.LC0@toc@l     # PC-relative addressing for static constant, I think.
    lxvd2x %vs34,0,%r9          # vector load?
    xxpermdi %vs34,%vs34,%vs34,2
    blr

.LC0:        # in .rodata
    .quad 1
    .quad 1

<hr>

And BTW, vec_splats (splat scalar) with a constant compiles to a single instruction. But with a runtime variable (e.g. a function arg), it compiles to an integer store / vector load / vector-splat (like the vec_splat intrinsic). Apparently there isn't a single instruction for int->vec.

The vec_splat_s32 and related intrinsics only accept a small (5-bit) constant, so they only compile in cases where the compiler can use the corresponding splat-immediate instruction.
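To make the "small (5-bit) constant" limit concrete: a 5-bit signed immediate covers -16 through 15. Here is a small sketch of that range check; `fits_splat_immediate()` is a hypothetical helper for illustration, not something provided by `<altivec.h>`.

```cpp
// Sketch: does a constant fit the 5-bit signed immediate used by the
// splat-immediate instructions (vspltisb / vspltish / vspltisw)?
// fits_splat_immediate() is a hypothetical helper, not an AltiVec API.
constexpr bool fits_splat_immediate(long long value) {
    return value >= -16 && value <= 15;   // 5-bit two's complement range
}

static_assert(fits_splat_immediate(1),   "1 fits the immediate field");
static_assert(fits_splat_immediate(-16), "-16 is the smallest value that fits");
static_assert(!fits_splat_immediate(16), "16 is out of range");
```

So something like `vec_splat_s32(1)` is fine, while a constant outside that range has to go through `vec_splats` (or a load) instead.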

This Intel SSE to PowerPC AltiVec migration guide looks mostly good, but it gets that wrong (it claims that vec_splats splats a signed byte).
