I want to measure broadcast message latency over our message broker on a 1 Gbps LAN.
Messages are transmitted in a publish/subscribe fashion: one publisher, many consumers. The producer timestamps each message using the system clock (DateTime.Now in C#), and consumers measure latency by subtracting the timestamp on the message from DateTime.Now.
double latency = (DateTime.Now - msg.NMSTimestamp).TotalMilliseconds;
All of the boxes on our LAN sync their time via NTP once an hour yet I'm seeing significant latency and even negative times in the range of +/- 1 second. I read that NTP should provide ~5ms accuracy in a LAN environment.
Is my measurement strategy fundamentally flawed? Is there another explanation for the negative latency? If I were only seeing large latencies I'd suspect our message queue was slow, but the negative ones really have me confused.
What do your negative values look like in milliseconds? If they are within 5 ms, that's normal for NTP, as you know. There could even be up to 10 ms of difference between computers if one computer was 5 ms ahead of true time and another was 5 ms behind. Beyond that, I would guess that there's some rounding error, lookahead/lookbehind error, or sync error somewhere in your system. There are many hardware and implementation details you have little control over that can produce inaccuracies. Generally, system clocks are accurate enough at the millisecond level when polled by DateTime.Now, but hardware details such as CPU throttling under load, pipelining, and cache thrashing can introduce enough error to be significant at that level.
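To make the 5 ms / 10 ms arithmetic concrete, here is a minimal sketch of a sanity check a consumer could apply to each measurement. The class and method names are hypothetical, and the 5 ms per-host bound is an assumption taken from the NTP figure quoted in the question, not something measured.

```csharp
using System;

// Hypothetical helper: decides whether a negative latency reading can be
// explained by NTP clock error alone.
public static class LatencyCheck
{
    // Assumed per-host clock error in milliseconds (the ~5 ms LAN figure).
    public const double AssumedClockErrorMs = 5.0;

    // Worst case between two hosts: one is ahead by the bound, the other
    // is behind by the bound, so the errors add up.
    public static double MaxPairwiseOffsetMs(double perHostErrorMs)
    {
        return 2.0 * perHostErrorMs;
    }

    // Returns true when the measured latency is not more negative than
    // the worst-case pairwise clock offset.
    public static bool IsExplainedByClockError(double measuredLatencyMs, double perHostErrorMs)
    {
        return measuredLatencyMs >= -MaxPairwiseOffsetMs(perHostErrorMs);
    }
}
```

With these assumptions, a reading of -8 ms would pass the check, while a reading of -1000 ms would not and points at something other than ordinary NTP error.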
If possible, set up your computers to synchronize with the NTP server at least a second apart from each other. If all computers try to sync on the hour every hour, the NTP server will be flooded, and crowding and packet scheduling will make the reported time less accurate. I think this is the most likely cause of what's going on. Also, make sure your network is as efficient as possible by reducing cable runs (100 m, about 328 ft, is the specified maximum for twisted-pair Ethernet, and in an EMI-noisy environment runs as short as 40 feet can cause serious problems), replacing hubs with switches, and minimizing wireless network use.
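One way to plan such a staggered schedule is sketched below. It only computes a per-host offset from the top of the hour; how that offset is applied to each machine's time service is configuration-specific and not shown. The class name, method name, and host names are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical planner: gives every host its own sync offset so that no
// two hosts contact the NTP server at the same moment.
public static class SyncStagger
{
    // Sorts the host names so the plan is deterministic, then spaces the
    // hosts spacingSeconds apart, starting at offset zero.
    public static Dictionary<string, TimeSpan> Plan(IEnumerable<string> hosts, double spacingSeconds)
    {
        List<string> sorted = hosts
            .Distinct()
            .OrderBy(h => h, StringComparer.Ordinal)
            .ToList();

        var plan = new Dictionary<string, TimeSpan>();
        for (int i = 0; i < sorted.Count; i++)
        {
            plan[sorted[i]] = TimeSpan.FromSeconds(i * spacingSeconds);
        }
        return plan;
    }
}
```

For example, SyncStagger.Plan(new[] { "b", "a", "c" }, 2) assigns offsets of 0, 2, and 4 seconds to "a", "b", and "c" respectively.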
I have a handful of incidents logged of negative network latency measured by the same clock.
Windows often steps the clock to the new time rather than slewing it (adjusting it gradually), so you see these whenever a sync happens.
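For intervals measured on a single machine, a monotonic timer sidesteps this problem because it is not affected when the wall clock is reset. A minimal sketch using System.Diagnostics.Stopwatch follows; the MonotonicTimer wrapper is a hypothetical name. Note that this only helps for same-machine measurements, since Stopwatch readings from different machines cannot be compared.

```csharp
using System;
using System.Diagnostics;

// Hypothetical wrapper: times an action with Stopwatch, which does not
// follow wall-clock adjustments made by a time sync.
public static class MonotonicTimer
{
    // Runs the given action and returns the elapsed time in milliseconds.
    public static double MeasureMs(Action work)
    {
        Stopwatch sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
```

A call such as MonotonicTimer.MeasureMs(() => SendAndWaitForEcho()) would time a round trip on the publisher side, where SendAndWaitForEcho stands in for whatever request/reply mechanism the broker offers.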
Windows does not guarantee 5 ms clock resolution; the only guarantee is the legacy timer rate of 18.2 ticks per second (about 55 ms per tick). My machine provides an epsilon of 15 ms.
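The granularity on a given machine can be checked empirically. The sketch below spins on DateTime.UtcNow and records the smallest step it observes between two distinct readings; ClockResolution and SmallestObservedStepMs are hypothetical names, and the result will vary by OS version and runtime.

```csharp
using System;

// Hypothetical probe: estimates how finely the system clock advances as
// seen through DateTime.UtcNow.
public static class ClockResolution
{
    // Busy-waits until the clock value has changed `samples` times and
    // returns the smallest positive step seen, in milliseconds.
    public static double SmallestObservedStepMs(int samples)
    {
        double smallest = double.MaxValue;
        DateTime previous = DateTime.UtcNow;
        int seen = 0;

        while (seen < samples)
        {
            DateTime current = DateTime.UtcNow;
            if (current != previous)
            {
                double step = (current - previous).TotalMilliseconds;
                // Ignore backward steps, which a clock reset could cause.
                if (step > 0 && step < smallest)
                {
                    smallest = step;
                }
                previous = current;
                seen++;
            }
        }
        return smallest;
    }
}
```

Any latency figure smaller than the value this returns is below what DateTime-based timestamps can resolve on that machine.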