Feb 28, 2026

Log 003: Gaze Desynchronization

Gaze Failures

Noticed a significant desynchronization issue when piping gaze vectors through the VLM framework during rapid head movements: the 7-DoF manipulator started lagging by approximately 400ms. Root cause seems to be frame dropping in our explicit cue alignment module. The gaze tracker fires at its full rate (~300Hz) while the visual encoder only processes at 30fps, so roughly ten gaze samples arrive per encoded frame and the alignment buffer overflows silently.

This is particularly bad during fast saccades, where the target object and the gaze signal need to be tightly coupled for shared autonomy to function correctly. The robot ends up acting on stale gaze data, which produces completely incorrect intent estimates.

Root Cause

The explicit cue alignment module uses a standard FIFO queue. Under high load (rapid head movement + concurrent VLM inference), the queue fills and frames get dropped without any notification propagating upstream, while the consumer keeps reading the oldest entries still queued. The result: gaze latency silently inflates from ~12ms to ~400ms.
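
To make the failure mode concrete, here is a toy simulation of a bounded FIFO fed at 300Hz and drained at 30fps. It is a sketch under stated assumptions, not the actual module: it assumes the queue rejects incoming samples once full (which is what the stale-data symptom suggests), that the encoder pops exactly one gaze sample per frame, and that the queue holds 12 entries. The capacity is a hypothetical value picked only so the toy numbers land near the observed lag; the function name is also made up for illustration.

```python
from collections import deque

GAZE_HZ = 300   # gaze tracker rate, from the log
FRAME_HZ = 30   # visual encoder rate, from the log
TICKS_PER_FRAME = GAZE_HZ // FRAME_HZ  # 10 gaze samples arrive per encoder frame


def simulate_tail_drop_fifo(capacity, n_frames):
    """Simulate a bounded FIFO that rejects new samples when full.

    Time advances in ticks of 1/GAZE_HZ seconds. Returns the latency (ms) of
    the sample consumed on each encoder frame and the number of dropped samples.
    """
    queue = deque()
    dropped = 0
    latencies_ms = []
    for tick in range(n_frames * TICKS_PER_FRAME):
        # Producer: one gaze sample per tick, stamped with its arrival tick.
        if len(queue) < capacity:
            queue.append(tick)
        else:
            dropped += 1  # silent drop: nothing is reported upstream
        # Consumer (assumption): the encoder pops the oldest sample once per frame.
        if tick % TICKS_PER_FRAME == TICKS_PER_FRAME - 1:
            sample_tick = queue.popleft()
            latencies_ms.append((tick - sample_tick) * 1000 / GAZE_HZ)
    return latencies_ms, dropped


if __name__ == "__main__":
    latencies, dropped = simulate_tail_drop_fifo(capacity=12, n_frames=60)
    print(f"first frame: {latencies[0]:.1f} ms, last frame: {latencies[-1]:.1f} ms")
    print(f"samples dropped without notification: {dropped}")
```

With these assumed numbers the consumed sample's age climbs frame by frame and plateaus just under 400ms, because every slot in the queue costs one full encoder frame of delay. The absolute values depend entirely on the assumed capacity and consumption pattern, so treat this as an illustration of the mechanism rather than a reproduction of the measurement.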

Next Steps

Re-implementing the buffer as a ring (circular) buffer rather than a standard queue. When full, a ring buffer overwrites the oldest frame rather than rejecting the newest, which maintains real-time temporal alignment at the cost of some historical fidelity. The model doesn't need a full gaze history for single-step intent prediction, so this is an acceptable tradeoff.
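
A minimal sketch of what such a buffer could look like, built on Python's `collections.deque` with `maxlen`, which discards the oldest item when a new one is appended to a full deque. The class name `GazeRingBuffer`, its methods, and the `(timestamp_ms, gaze_vector)` sample layout are all hypothetical and not taken from the actual module. The overwrite counter is an addition of this sketch so that drops stop being silent.

```python
from collections import deque


class GazeRingBuffer:
    """Fixed-capacity buffer that overwrites the oldest gaze sample when full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # deque(maxlen=...) drops from the opposite end when appending to a full deque.
        self._samples = deque(maxlen=capacity)
        self.overwritten = 0  # exposed so upstream code can observe drops

    def push(self, timestamp_ms, gaze_vector):
        """Store a sample, evicting the oldest one if the buffer is full."""
        if len(self._samples) == self._samples.maxlen:
            self.overwritten += 1
        self._samples.append((timestamp_ms, gaze_vector))

    def latest(self):
        """Return the newest (timestamp_ms, gaze_vector) pair, or None if empty."""
        return self._samples[-1] if self._samples else None

    def nearest(self, frame_timestamp_ms):
        """Return the buffered sample closest in time to a camera frame, or None."""
        if not self._samples:
            return None
        return min(self._samples, key=lambda s: abs(s[0] - frame_timestamp_ms))
```

Under this design the worst-case staleness is bounded by how far the consumer reads back, not by queue depth: `latest()` always returns the most recent sample, and `nearest()` can pair a camera frame with the gaze sample closest to its timestamp, provided both streams share a clock (an assumption here).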

Initial tests are promising. Will benchmark end-to-end latency next week.