How are vertices transformed in WebGL in indexed and non-indexed geometries?


I am trying to digest these two links:

<a href="https://www.khronos.org/opengl/wiki/Rendering_Pipeline_Overview" rel="nofollow">https://www.khronos.org/opengl/wiki/Rendering_Pipeline_Overview</a> <a href="https://www.khronos.org/opengl/wiki/Vertex_Shader" rel="nofollow">https://www.khronos.org/opengl/wiki/Vertex_Shader</a>

The pipeline overview says that the vertex shader runs before primitive assembly.

The second one mentions this:


A vertex shader is (usually) invariant with its input. That is, within a single Drawing Command, <strong>two vertex shader invocations that get the exact same input attributes will return binary identical results</strong>. Because of this, if OpenGL can detect that a vertex shader invocation is being given the same inputs as a previous invocation, it is allowed to reuse the results of the previous invocation, instead of wasting valuable time executing something that it already knows the answer to.

OpenGL implementations generally do not do this by actually comparing the input values (that would take far too long). Instead, this optimization typically only happens when using indexed rendering functions. <strong>If a particular index is specified more than once (within the same Instanced Rendering), then this vertex is guaranteed to result in the exact same input data.</strong>

Therefore, implementations employ a cache on the results of vertex shaders. <strong>If an index/instance pair comes up again, and the result is still in the cache</strong>, then the vertex shader is not executed again. <strong>Thus, there can be fewer vertex shader invocations than there are vertices specified</strong>.


So if I have two quads with two triangles each, the first indexed and the second non-indexed:


verts: { 0 1 2 3 } tris: { 0 1 2 } { 1 2 3 }


verts: { 0 1 2 3 4 5 } tris: { 0 1 2 } { 3 4 5 }

and perhaps a vertex shader that looks like this:

uniform mat4 mvm;
uniform mat4 pm;
attribute vec3 position;
void main () {
  vec4 res;
  // Deliberately redundant loop to make the vertex shader expensive.
  for ( int i = 0; i < 256; i++ ) {
    res = pm * mvm * vec4(position, 1.);
  }
  gl_Position = res;
}

<strong>Should I care that one has 4 vertices while the other one has 6?</strong> Does this even hold from GPU to GPU: will one invoke the vertex shader 4 times and the other 6 times? How is this affected by the cache:


If an index/instance pair comes up again, <strong>and the result is still in the cache</strong>...


How is the number of primitives related to performance here? In both cases I have the same number of primitives.

In the case of a very simple fragment shader, but an expensive vertex shader:

void main(){ gl_FragColor = vec4(1.); }

And a tessellated quad (100x100 segments), can I say that the indexed version <strong>will</strong> run faster, or <strong>can</strong> run faster, or maybe say <strong>nothing</strong>?


As with everything on GPUs, according to the spec you can say nothing. It's up to the driver and the GPU. In practice, though, the 4-vertex version in your example will run faster than the 6-vertex one pretty much everywhere.
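To make the 4-versus-6 comparison concrete, here is a minimal sketch of the two layouts from the question as typed arrays. The corner coordinates and the `deindex` helper are made up for illustration; only the index pattern { 0 1 2 } { 1 2 3 } comes from the question. The draw calls are shown as comments because they need a live WebGL context `gl`.

```javascript
// Four unique corners of a quad (x, y, z per vertex). Values are illustrative.
const positions = new Float32Array([
  -1, -1, 0, // vertex 0
   1, -1, 0, // vertex 1
  -1,  1, 0, // vertex 2
   1,  1, 0, // vertex 3
]);

// Two triangles that share vertices 1 and 2, as in the question.
const indices = new Uint16Array([0, 1, 2, 1, 2, 3]);

// Hypothetical helper: expand indexed geometry into a flat, non-indexed array
// by copying each referenced vertex once per index.
function deindex(positions, indices, itemSize = 3) {
  const out = new Float32Array(indices.length * itemSize);
  for (let i = 0; i < indices.length; i++) {
    for (let c = 0; c < itemSize; c++) {
      out[i * itemSize + c] = positions[indices[i] * itemSize + c];
    }
  }
  return out;
}

const flat = deindex(positions, indices); // 6 vertices, 18 floats

// Indexed draw (4 unique vertices, 6 indices):
//   gl.drawElements(gl.TRIANGLES, indices.length, gl.UNSIGNED_SHORT, 0);
// Non-indexed draw (6 vertices, shared corners duplicated):
//   gl.drawArrays(gl.TRIANGLES, 0, flat.length / 3);
```

With the non-indexed array the GPU has no cheap way to know that two of the six vertices are duplicates, so it will most likely run the vertex shader six times. With the indexed version it is at least allowed to reuse the results for indices 1 and 2.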

Search for "vertex order optimization" and lots of articles come up:

<a href="https://tomforsyth1000.github.io/papers/fast_vert_cache_opt.html" rel="nofollow">Linear-Speed Vertex Cache Optimisation</a>

<a href="http://gameangst.com/?p=9" rel="nofollow">Triangle Order Optimization</a>

<a href="https://github.com/GPUOpen-Tools/amd-tootle" rel="nofollow">AMD Triangle Order Optimization Tool</a>

<a href="http://gfx.cs.princeton.edu/pubs/Nehab_2006_TOO/index.php" rel="nofollow">Triangle Order Optimization for Graphics Hardware Computation Culling</a>
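What those articles optimize can be illustrated with a toy model. The sketch below assumes a simple FIFO cache of post-transform results keyed by index; real hardware differs and the cache size of 32 is an arbitrary guess, so treat the numbers as an illustration of the idea rather than a prediction for any particular GPU. The function names are hypothetical.

```javascript
// Build triangle indices for a grid of segs x segs quads,
// which has (segs + 1) * (segs + 1) unique vertices.
function gridIndices(segs) {
  const stride = segs + 1;
  const idx = [];
  for (let y = 0; y < segs; y++) {
    for (let x = 0; x < segs; x++) {
      const a = y * stride + x; // top-left
      const b = a + 1;          // top-right
      const c = a + stride;     // bottom-left
      const d = c + 1;          // bottom-right
      idx.push(a, b, c, b, d, c);
    }
  }
  return idx;
}

// Count vertex shader runs under an assumed FIFO cache of `cacheSize` entries.
// A hit reuses the cached result; a miss runs the shader and evicts the oldest entry.
function countShaderRuns(indices, cacheSize) {
  const cache = [];
  let runs = 0;
  for (const i of indices) {
    if (cache.includes(i)) continue; // cache hit, no shader run
    runs++;
    cache.push(i);
    if (cache.length > cacheSize) cache.shift();
  }
  return runs;
}

const gridIdx = gridIndices(100);                 // the 100x100 quad from the question
const nonIndexedRuns = gridIdx.length;            // one run per emitted vertex
const uniqueVertices = 101 * 101;                 // lower bound with a perfect cache
const cachedRuns = countShaderRuns(gridIdx, 32);  // assumed cache size

console.log({ nonIndexedRuns, uniqueVertices, cachedRuns });
```

In this model the row-by-row triangle order lands somewhere between the two extremes, because vertices shared with the next row have already been evicted by the time they come up again. Reordering the triangles so that shared vertices are reused while still cached is what the tools above try to do.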

Unrelated, but another example of the spec vs. reality: according to the spec, depth testing happens AFTER the fragment shader runs (otherwise you couldn't set gl_FragDepth in the fragment shader). In reality, though, as long as the results are the same, the driver/GPU can do whatever it wants. So for fragment shaders that don't set gl_FragDepth or discard fragments, the depth test is done first and the shader only runs if the test passes.

