Virtual reality (VR) is one of the most demanding human-in-the-loop applications from a latency standpoint. The latency between the physical movement of a user’s head and updated photons from a head mounted display reaching their eyes is one of the most critical factors in providing a high quality experience. Human sensory systems can detect very small relative delays in parts of the visual or, especially, audio fields, but when absolute delays are below approximately 20 milliseconds they are generally imperceptible. Interactive 3D systems today typically have latencies that are several times that figure, but alternate configurations of the same hardware components can allow that target to be reached.
- John Carmack
Latency is one of the biggest challenges facing OculusVR, as well as the virtual reality space as a whole. Latency can lead to a detached gaming experience and can also contribute to a players motion sickness/ dizziness, so its extremely important to know where to find it and how it can be eradicated.
There are now a few answers to these questions on the AltDevBlogADay website, where John Carmack recently took the time address a few of them and spell out some VR truths to the unitiated.
It’s a meaty post, but here’s the jist of it:
- A total system latency of 50 milliseconds will feel responsive, but still noticeable laggy.
- 20 milliseconds or less will provide the minimum level of latency deemed acceptable.
- Extrapolation of sensor data can be used to mitigate some system latency, but even with a sophisticated model of the motion of the human head, there will be artifacts as movements are initiated and changed.
- Filtering and communication are constant delays, but the discretely packetized nature of most sensor updates introduces a variable latency, or “jitter” as the sensor data is used for a video frame rate that differs from the sensor frame rate.
- Early LCDs were notorious for “ghosting” during scrolling or animation, still showing traces of old images many tens of milliseconds after the image was changed, but significant progress has been made in the last two decades. The transition times for LCD pixels vary based on the start and end values being transitioned between, but a good panel today will have a switching time around ten milliseconds, and optimized displays for active 3D and gaming can have switching times less than half that.
- Some less common display technologies have speed advantages over LCD panels; OLED pixels can have switching times well under a millisecond, and laser displays are as instantaneous as CRTs.
- A subtle latency point is that most displays present an image incrementally as it is scanned out from the computer, which has the effect that the bottom of the screen changes 16 milliseconds later than the top of the screen on a 60 fps display.
- This is rarely a problem on a static display, but on a head mounted display it can cause the world to appear to shear left and right, or “waggle” as the head is rotated, because the source image was generated for an instant in time, but different parts are presented at different times. This effect is usually masked by switching times on LCD HMDs, but it is obvious with fast OLED HMDs.
- An attractive direction for stereoscopic rendering is to have each GPU on a dual GPU system render one eye, which would deliver maximum performance and minimum latency, at the expense of requiring the application to maintain buffers across two independent rendering contexts.
- The downside to preventing GPU buffering is that throughput performance may drop, resulting in more dropped frames under heavily loaded conditions.
- Much of the work in the simulation task does not depend directly on the user input, or would be insensitive to a frame of latency in it. If the user processing is done last (LATE FRAME SCHEDULING), and the input is sampled just before it is needed, rather than stored off at the beginning of the frame, the total latency can be reduced.
- The drawback to late frame scheduling is that it introduces a tight scheduling requirement that usually requires busy waiting to meet, wasting power. If your frame rate is determined by the video retrace rather than an arbitrary time slice, assistance from the graphics driver in accurately determining the current scanout position is helpful.
- An alternate way of accomplishing a similar, or slightly greater latency reduction Is to allow the rendering code to modify the parameters delivered to it by the game code, based on a newer sampling of user input (VIEW BYPASS)..At the simplest level, the user input can be used to calculate a delta from the previous sampling to the current one, which can be used to modify the view matrix that the game submitted to the rendering code.
- Delta processing in this way is minimally intrusive, but there will often be situations where the user input should not affect the rendering, such as cinematic cut scenes or when the player has died. It can be argued that a game designed from scratch for virtual reality should avoid those situations, because a non-responsive view in a HMD is disorienting and unpleasant, but conventional game design has many such cases.
- If you had perfect knowledge of how long the rendering of a frame would take, some additional amount of latency could be saved by late frame scheduling the entire rendering task, but this is not practical due to the wide variability in frame rendering times. However, a post processing task on the rendered image (TIME WARPING) can be counted on to complete in a fairly predictable amount of time, and can be late scheduled more easily.
- After drawing a frame with the best information at your disposal, possibly with bypassed view parameters, instead of displaying it directly, developers can fetch the latest user input, generate updated view parameters, and calculate a transformation that warps the rendered image into a position that approximates where it would be with the updated parameters. Using that transform, then can warp the rendered image into an updated form on the screen that reflects the new input.
- View bypass and time warping are complementary techniques that can be applied independently or together. Time warping can warp from a source image at an arbitrary view time / location to any other one, but artifacts from internal parallax and screen edge clamping are reduced by using the most recent source image possible, which view bypass rendering helps provide.
- Actions that require simulation state changes, like flipping a switch or firing a weapon, still need to go through the full pipeline for 32 – 48 milliseconds of latency based on what scan line the result winds up displaying on the screen, and translational information may not be completely faithfully represented below the 16 – 32 milliseconds of the view bypass rendering, but the critical head orientation feedback can be provided in 2 – 18 milliseconds on a 60 hz display. In conjunction with low latency sensors and displays, this will generally be perceived as immediate. Continuous time warping opens up the possibility of latencies below 3 milliseconds, which may cross largely unexplored thresholds in human / computer interactivity!
- Conventional computer interfaces are generally not as latency demanding as virtual reality, but sensitive users can tell the difference in mouse response down to the same 20 milliseconds or so, making it worthwhile to apply these techniques even in applications without a VR focus.
- A particularly interesting application is in cloud gaming, where a simple client appliance or application forwards control information to a remote server, which streams back real time video of the game. This offers significant convenience benefits for users, but the inherent network and compression latencies makes it a lower quality experience for action oriented titles. View bypass and time warping can both be performed on the server, regaining a substantial fraction of the latency imposed by the network. If the cloud gaming client was made more sophisticated, time warping could be performed locally, which could theoretically reduce the latency to the same levels as local applications, but it would probably be prudent to restrict the total amount of time warping to perhaps 30 or 40 milliseconds to limit the distance from the source images.
Check out the full post here, and if you manage to read it all the way to the bottom give yourself a big pat on the back from me. The class test is tomorrow.