General Discussion

Author: Niklas Hypki

Today, HMDs have become powerful devices with impressive tracking capabilities, and to a certain extent, Sutherland’s ideas of seamlessly experiencing and walking through virtual worlds have become reality.

To top it off, HMDs with built-in eye tracking sensors enable VEs and user interfaces that automatically react to the user’s eye movements. As a result, tracked eye movements can be transferred to virtual agents and VR buttons can be selected using fixations.

The research results presented in this thesis show that we have reached a point where we can use VR as a research method to expand our knowledge of visual perception. In particular, when investigating the connections between vision and locomotion, VR eye tracking offers great potential for gaining a better understanding of how we perceive and explore the world around us.

The investigation into how VR can be used to observe eye movements during natural behaviours was driven by two goals. First, we wanted to gain more insight into typical gaze patterns to increase our knowledge of human visual perception; second, we hoped that knowledge of these patterns might be used for VR applications.

As a first step, we developed and utilised a method for measuring the latency of eye tracking in HMDs. Our results make it possible to precisely match the eye tracking signal to the stimuli displayed on the screen. They also enable a more informed assessment of whether a specific VR setup can be used for gaze-contingent experiments. We followed this with Study [II], in which gaze, movement and orientation data were recorded during several natural tasks. Subsequently, we tested how well future waypoints could be predicted based on this information. Here, we were particularly interested in whether eye tracking data improves prediction accuracy. In Study [III], eye and head movements during visual search tasks in VR were recorded to gain a broader understanding of how our gaze is guided in this common scenario.

The following general discussion first examines the current methodological limitations of VR eye tracking and provides an outlook on possible directions for the future development of the method. Next, the prediction of our movement based on gaze data is revisited and new proposals for improving the method are summarised. In addition, the influence that salient objects which may capture the gaze while walking have on the prediction of movement is discussed. The discussion then reflects upon the findings on gaze behaviour during searching, specifically with regard to the factors that influence the interaction between bottom-up and top-down guidance of coordinated head and eye movements. This is followed by a summary of the current limitations of VR that should be taken into account when designing future psychophysical experiments, as well as a brief outlook on the future of the method. Finally, the discussion concludes by reflecting on the most important features of this work.

VR Eye Tracking Latency

To transfer human behaviour to virtual worlds, HMDs record data from a wide range of sensors. This means that the VR equipment simultaneously captures multiple facets of user behaviour in great detail, providing a range of opportunities to observe human behaviour. When recording eye movements for psychophysical studies, it is important to have accurate information about the latency of the measuring devices in order to be able to associate the tracked movements with the stimuli displayed in the VE. Moreover, experiments with gaze-contingent stimuli require latencies that are small enough to go unnoticed by participants when moving their eyes across the screen.

In Study [I], we developed a method that can be used to accurately measure the eye tracking delay and end-to-end latency in VR headsets with built-in eye tracking. By simultaneously recording eye movement data based on VOG, EOG and the light emitted by the HMD using a photodiode, we measured the latency of the sensors of different headsets. We measured eye tracking delay (the time from the occurrence of an eye movement to the availability of the corresponding data) and end-to-end latency (the time from an eye movement to a corresponding change in the screen content). Delays ranged from 15 to 52 ms; latencies ranged from 45 to 81 ms. Compared to the Varjo VR-1 and the HTC Vive Pro Eye, the Fove-0 was the fastest device and seemed best suited for time-critical psychophysical research.
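In its simplest form, such a comparison amounts to estimating the lag between two simultaneously recorded signals. A minimal sketch of this idea is given below (it assumes both signals have been resampled to a common time grid of equal length; it is an illustration, not the exact analysis pipeline of Study [I]):

```python
import numpy as np

def estimate_delay_ms(reference, delayed, fs):
    """Estimate how much `delayed` lags behind `reference` (in ms) via
    cross-correlation of the two z-scored signals; both are assumed to be
    sampled on the same time grid (equal length) at `fs` Hz."""
    a = (reference - reference.mean()) / reference.std()
    b = (delayed - delayed.mean()) / delayed.std()
    corr = np.correlate(b, a, mode="full")       # lags from -(N-1) to +(N-1)
    lags = np.arange(-len(a) + 1, len(a))
    return 1000.0 * lags[np.argmax(corr)] / fs   # positive value: `delayed` lags

# Example usage (hypothetical variable names):
# delay_ms = estimate_delay_ms(eog_1khz, hmd_gaze_1khz, fs=1000)
```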

The end-to-end latency (from an eye movement to a visible change on the screen) is the result of a chain of different processes (see Figure 1). Initially, latency can be caused by the camera directly when the eyes are recorded. Then, the eye tracking algorithm that interprets the recorded image can add further delays (especially if temporal filters are applied to smooth the signal by averaging the direction of gaze over several samples). The process of reading in the eye movement data can also lead to further time loss (e.g. due to the limited bandwidth of the connection). The time required to perform all these steps results in a specific eye tracking frequency. This frequency indicates how often this series of steps can be performed per second. At the same time, the virtual image is created at a certain rendering frequency. If both frequencies are not perfectly synchronised, this can lead to additional end-to-end latency. The total latency can also be increased by the rendering itself (e.g. when particularly complex scenes have to be generated, which increases the time required to create a frame and thus reduces the frame rate). Finally, latency can increase by a few milliseconds due to the latency of the screen when displaying the image. Interestingly, in our comparison the Fove-0's display had the lowest refresh rate (70 Hz vs 90 Hz in the Varjo VR-1 and the Vive Pro Eye), which should give the Vive Pro Eye and the Varjo VR-1 a display-latency advantage of about 3 ms per frame.
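As a back-of-the-envelope illustration of the display's contribution alone (using nominal refresh rates, not measured latencies):

```python
# Nominal display refresh contribution only (illustrative, not measured values):
for name, hz in [("Fove-0", 70), ("Vive Pro Eye", 90), ("Varjo VR-1", 90)]:
    print(f"{name}: up to {1000 / hz:.1f} ms per displayed frame")
# 1000/70 ≈ 14.3 ms vs 1000/90 ≈ 11.1 ms, i.e. roughly a 3 ms advantage
# per frame for the 90 Hz panels.
```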

With this in mind, our results contribute to a better understanding of which step in this chain causes how many frames of latency. It is therefore possible to combine our latency measurements with the results of other studies to draw further conclusions. In an earlier study with the HTC Vive (the predecessor of the Vive Pro Eye, which also uses an OLED display and the same position tracking system but is not equipped with eye tracking sensors), Niehorster, Li, and Lappe (2017) measured the end-to-end latency between the physical movement of the HMD and a corresponding change on the display. They found a latency of only 22 ms, or 2 frames at 90 Hz. Combined with our measurements of the latency between receiving gaze data and updating gaze-dependent screen content (see Table [table:overview]), this suggests that motion tracking can be completed in under 1 frame, while rendering and displaying a frame based on these data can take up to 2 frames. The remaining latency is therefore probably caused by the eye tracking itself. Like us, Sipatchin, Wahl, and Rifai (2021) also measured the time lag of the Vive Pro Eye. In contrast to our experiment, they used an infrared light source as an artificial eye instead of tracking real eye movements. With this setup, they found an end-to-end latency of 5 to 6 frames at 90 Hz. In our study, we measured an eye tracking delay of 4 to 5 frames from saccade onset to the recording of gaze data and an end-to-end latency of 7 frames at 90 Hz from saccade onset to measured changes on the display. As a result, it seems that recording an image of the user's eyes, detecting the pupil and transforming these measures into eye tracking data takes multiple frames. A direct comparison between the two studies shows that this latency varies depending on whether a real eye is tracked or a pupil reflection is simulated (see Figure 1).

Figure 1: Components of end-to-end latency in VR eye tracking in the HTC Vive Pro Eye. Latency from an eye movement to a display change (black) takes 7 frames, while the eye tracking delay from an eye movement to data collection (red) takes 4–5 frames (Stein et al. 2021). End-to-end latency when using an artificial eye (blue) was 5–6 frames (Sipatchin, Wahl, and Rifai 2021). Latency from moving the HMD to a change on the display (green) takes 2 frames (Niehorster, Li, and Lappe 2017).

In a realistic gaze-dependent experiment, detecting the onset of saccades could increase the overall latency even further. However, in the latency measurement studies presented here, each eye movement could be detected from one sample to the next. End-to-end latency is too high for gaze-contingent rendering if, after a saccade, the user perceives stimuli that have not yet been adjusted to the current eye position. Only a few empirical studies are available for determining the maximum acceptable latency of a gaze-contingent setup. Loschky and Wolverton (2007) found that user performance in a blur detection task did not change for latencies of up to 60 ms. In addition, Loschky and McConkie (2000) reported that users fixated slightly longer at a latency of 45 ms compared to a latency of 15 ms. Other recommendations for maximum end-to-end latency are based on unpublished results and expert opinions. Overall, the maximum latency for GCDs recommended in previous articles varies between 20 and 80 ms (Loschky and McConkie 2000; Loschky and Wolverton 2007; Albert et al. 2017; Melnyk et al. 2025; Carmack 2013). As a result, a clearly defined maximum end-to-end latency is not available. At the same time, this range of recommended values fits well with the expected duration of saccadic suppression for saccades of typical amplitude (Volkmann 1986; Stevenson et al. 1986).
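To relate these recommendations to saccade kinematics, a commonly used rule of thumb approximates saccade duration as roughly 2.2 ms per degree of amplitude plus 21 ms; the following sketch (an approximation, not a value measured in our studies) illustrates how quickly the latencies reported above can outlast a saccade:

```python
def saccade_duration_ms(amplitude_deg):
    # Rule-of-thumb main-sequence approximation: ~2.2 ms per degree plus ~21 ms.
    return 2.2 * amplitude_deg + 21

for amplitude in (5, 10, 20, 40):
    print(f"{amplitude:>2} deg saccade: ~{saccade_duration_ms(amplitude):.0f} ms")
# ~32, ~43, ~65 and ~109 ms respectively: an end-to-end latency of 45-81 ms can
# therefore outlast the saccade itself for small and medium amplitudes.
```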

Our results suggest that among the HMDs we tested, the Fove-0 is best suited to present gaze-contingent stimuli in VR without the user perceiving this manipulation. To be able to conduct gaze-contingent psychophysical experiments in VR, however, the exact parameters, such as detection thresholds as a function of saccade amplitude and parameters governing the detectability of the stimulus manipulation, such as contrast or the degradation gradient, would first have to be systematically determined for the specific task. Furthermore, even if previous measurements suggest that head movements can be detected relatively quickly, the latency of head position and orientation tracking also needs to be taken into account in order to display gaze-contingent stimuli in VR.

Eye tracking devices used mainly for research, such as the Eyelink 1000, achieve low latencies through a combination of very high recording frequencies, efficient algorithms for analysing the eye tracking camera images and fast forwarding of the eye tracking data via network interfaces. In VR eye tracking, recording frequency has also increased in some more recent devices, such as the Fove-480 (480 Hz) and the Varjo XR-4 (200 Hz). In both HMDs the recorded images of the eyes are forwarded to the computer, where they are analysed to generate gaze direction data. On the one hand, the improved temporal resolution can be used to recognise saccades earlier and to represent their trajectory in more detail; on the other hand, the precision of eye tracking can be improved by applying a temporal filter to the eye tracking signal (Holmqvist et al. 2011).

At the same time, there are new approaches for increasing the recording frequency in mobile eye tracking by combining several recording methods. In addition to good data quality and new algorithms for merging data from different sources, energy consumption, waste heat and the weight of the additional sensors also play a role. For example, experiments combining VOG and photosensor oculography with the help of machine learning algorithms showed interesting results that could lead to VR eye tracking hardware with higher temporal resolution in the future (Katrychuk, Griffith, and Komogortsev 2019; Rigas, Raffle, and Komogortsev 2017; Palmero et al. 2023): As with limbal eye trackers, the photosensor provides a measurement of the location of the white sclera of the eye, which reflects more infrared light than the iris. With today's sensors, this data has high spatial and temporal resolution. In addition, it is possible to forward only data from sensor input that has changed since the previous sample. This event-based approach reduces the amount of data so that transmission can be faster. This data stream is then fused with the lower-resolution video recordings to create a data set with high temporal and spatial resolution.

Another area of development is new algorithms that can predict future eye movements. Komogortsev, Ryu, and Koh (2009) predicted saccade amplitudes based on the first two gaze samples using a 120 Hz eye tracker. Although their model was good enough to predict the saccade direction with a high degree of certainty, it was not able to predict the exact landing position, as it produced an average prediction error of about 5°. Furthermore, Arabadzhiyska et al. (2017) predicted the saccade landing point during an eye movement using the data recorded at the start of the saccade at 300 Hz. Their prediction results in an average error of less than 4° if it is made on the basis of data up to the middle of the saccade. At 80 % of the saccade, the error even drops to less than 1°. Interestingly, the best prediction according to their model is based exclusively on the most recent data point of the saccade. Based on their prediction, Arabadzhiyska et al. (2017) were able to reduce the rate at which participants detected the manipulation in an experiment with foveated rendering.
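The basic idea behind such landing-point extrapolation can be illustrated with a greatly simplified sketch; it assumes a stereotyped normalised displacement profile obtained from previously recorded saccades and is not the model used by Arabadzhiyska et al. (2017):

```python
import numpy as np

def predict_landing(samples, profile):
    """Rough sketch: extrapolate the landing point of an ongoing saccade.

    samples : (n, 2) array of gaze positions (deg) recorded since saccade onset.
    profile : 1-D array with the mean normalised displacement profile (0..1)
              of previously recorded saccades, one entry per sample of a
              typical full saccade.
    """
    start = samples[0]
    displacement = samples[-1] - start
    travelled = np.linalg.norm(displacement)
    if travelled == 0:
        return samples[-1]
    direction = displacement / travelled
    # Strong simplification: assume the saccade has progressed as far through
    # the profile as we have samples for.
    progress = profile[min(len(samples), len(profile)) - 1]
    amplitude = travelled / max(progress, 1e-3)
    return start + amplitude * direction

# Example with a synthetic raised-cosine profile and the first half of a
# simulated 10 deg rightward saccade that follows the same profile:
profile = (1 - np.cos(np.linspace(0, np.pi, 20))) / 2
samples = np.column_stack((10 * profile[:10], np.zeros(10)))
print(predict_landing(samples, profile))   # ~[10, 0] (deg)
```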

One other approach is to use machine learning models trained on large eye movement datasets to predict the onset of a saccade based on a series of recent incoming eye movement samples (Rolff, Steinicke, and Frintrop 2022; Rolff et al. 2022, 2023). So far, however, the models predict saccades with an average error of more than 0.12 s and are therefore not suitable for replacing conventional saccade detection algorithms. In the future, however, it may be possible to adaptively adjust the saccade threshold based on such predictions in order to reduce the time required for an eye tracking system to detect eye movement.

Since the publication of our study, in addition to updated eye tracking headsets connected to a computer, a number of stand-alone HMDs with inside-out positional tracking, controller and hand tracking capabilities and eye trackers have come to market (for example the HoloLens 2, the Apple Vision Pro, the Neo 2 Eye, Neo 3 Pro and Neo 4 Enterprise from Pico, the Varjo Aero, the HTC Vive Focus 3 and the Meta Quest Pro). For updated HMDs connected to a computer, such as the Fove-480 or the Varjo XR-4, our method of measuring latency by simultaneously recording eye movements using EOG can still be used. To measure the latency of stand-alone devices, however, a different synchronisation method is required to determine the latency between VOG and EOG.

To date, researchers have tended to favour the Quest Pro (Aziz et al. 2024; Wei, Bloemers, and Rovira 2023). The Vive Pro Eye has also been widely used to record fixation positions in many different scenarios (Moreno-Arjonilla et al. 2024). In contrast to the Fove-0, both headsets offer support for the use of controllers, additional trackers, large-area position tracking and a wireless mode. Even though end-to-end latency is comparatively high, the accuracy and precision of the eye tracking sensors in the Vive Pro Eye appear to be good enough to record fixation points in VR, at least in the central FOV (Sipatchin, Wahl, and Rifai 2021; Schuetz and Fiehler 2022).

Within this context, our research showed that the latency of VR eye trackers can be reliably determined using a direct comparison with EOG. We found that the end-to-end latency between an eye movement and a display change varies depending on the device. In some cases, the latency even appears to be longer than the expected duration of saccadic suppression for saccades with typical amplitudes. When conducting a gaze-dependent experiment with such latency, participants would see the visual stimuli jump from one position to another after each saccade. In principle, the latency measurement method we developed can also be applied to newly introduced devices, although a new type of signal synchronisation is required to determine the latency of stand-alone HMDs. Since our measurements indicate which processing steps contribute most to the overall latency, it may be possible to reduce it by filtering gaze data less heavily or by detecting pupils and reflections more quickly. Nevertheless, VR eye tracking can currently be used to transfer our eye movements to avatars, record fixation points in virtual space and even track travel-gaze-fixations while a user is walking. Moreover, VR eye tracking can be used for foveated rendering. In such cases, only the central part of the current FOV is rendered in high detail, allowing virtual worlds to be simulated more efficiently. However, as mentioned before, the latency of some current HMDs is in a range where it seems likely that users can perceive that the method is applied. To reduce the detection rate, the resolution degradation factor can be carefully adjusted to increase the foveated area. In addition, algorithms for predicting the start or progression of a saccade can help to make detection of the manipulation less likely.

The results from our study provide valuable information for analysing the behaviour being tracked in order to synchronise recorded eye movements and presented stimuli. In addition, our analysis provides an estimate of how much faster VR eye tracking would need to become in order to display stimuli in a gaze-contingent manner. Together with other studies that provide valuable insights into other parameters of data quality, such as the accuracy and precision of eye tracking and the latency of head tracking, our study contributes to a better understanding of VR eye tracking for applications in psychophysical research, which could ultimately help to enable gaze-contingent HMDs.

Next Steps in Gaze-based Locomotion Prediction

When walking, our gaze usually precedes our progress by about two steps (Jonathan Samir Matthis, Yates, and Hayhoe 2018; Hollands et al. 1995; Jonathan S. Matthis and Fajen 2014). In contrast to grasping, for example, the execution phase between planning an action and reaching the visual target is quite long. The planning phase therefore potentially involves the selection of an action goal, which is reached comparatively far in the future.

Study [II] made it possible to observe how gaze precedes complex actions in a series of typical locomotor patterns. Typical gaze shifts during locomotion serve to capture visual information for the execution of upcoming movements and to monitor ongoing behaviour. We therefore assumed that eye behaviour would contain information about the selection of the next waypoint and that it should thus be possible to predict future locomotion waypoints based on these data. Indeed, we showed that combining this unsorted stream of gaze information with the corresponding position data made it possible to predict walking targets 2.5 s into the future with an average error of 66 cm.
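A minimal sketch of such a sequence-to-point model is shown below; the sampling rate, feature set and layer sizes are illustrative assumptions rather than the exact configuration used in Study [II]:

```python
import tensorflow as tf

# Assumed input: 2.5 s of data at 30 Hz with 8 features per sample (e.g. head
# position, head orientation and gaze direction); output: the walker's x/z
# floor position 2.5 s later. All sizes are illustrative assumptions.
SAMPLES, FEATURES = 75, 8

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SAMPLES, FEATURES)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(2),
])
model.compile(optimizer="adam", loss="mse")

# X: (n_sequences, SAMPLES, FEATURES), y: (n_sequences, 2) future waypoints
# model.fit(X, y, epochs=50, validation_split=0.2)
# Training the same model with and without the gaze features quantifies how
# much eye movements contribute to the prediction accuracy.
```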

We also found that eye movements enable a more accurate prediction of movement behaviour, especially in situations involving varying walking speeds. These could be decision-making situations, such as turning around or selecting an object with which to interact. Our results therefore suggest that our gaze plays an important role, especially when modifying an ongoing movement. Since a decision such as turning left or right, or turning around, has a strong impact on the selection of future walking destinations, it is precisely these phases (in which walking speed changes) that are particularly interesting for applications that use these locomotion predictions.

This result supports earlier theories postulating that gaze is an important part of planning our walking trajectories. Moreover, it appears that eye movements associated with planning our future locomotion make up a sufficiently large proportion of the total amount of gaze data to contribute to a significant improvement in prediction in LSTM models without further filtering. It seems that the connections between our gaze and walking patterns during different tasks that were observed in previous studies are strong enough to systematically and automatically predict future walking actions using gaze and motion data. This means that this method of data processing could be used in various applications, for example, in improving redirected walking to reduce the number of user resets (Jeon et al. 2025). In addition, using a similar approach, it might even be possible to gain other information from gaze data to achieve such aims as decoding user preferences in product design (Palacios-Ibáñez et al. 2023).

Further Improvements

Since the publication of our research, two other studies have confirmed our main results. Bremer and Lappe (2024) found that gaze data improved predictions of the future pathway in two environments in which participants navigated using a joystick. Moreover, just like in our experiment, Y. Kim et al. (2024) found that gaze can improve the prediction of future waypoints in a walking task. However, in contrast to our model, which used 2.5 s of gaze and movement data to predict waypoints 2.5 s in the future, their model used 3 s of past data to predict waypoints 2 s in the future, which resulted in overall higher prediction errors (between 113 and 126 cm) compared to our study (66 cm).

One reason for this could be the 0.5 s shorter input stream for each prediction. The association between waypoint fixation and future waypoint peaks approximately 1.5 s before a waypoint is reached (Jonathan Samir Matthis, Barton, and Fajen 2017). Thus, the proportion of relevant information for the prediction of locomotion in the data stream from 2.5 s before reaching a waypoint could be greater than in a data stream from 3 s before reaching the waypoint. However, a post-hoc analysis of our data from Study [II] does not suggest this: a series of new LSTM models trained with input data of 0.5 s, 1 s, 1.5 s, 2 s and 2.5 s, each predicting the movement positions for the same time span into the future, yielded average errors between 62 and 68 cm. Thus, the total error range did not change dramatically with input-length changes of 0.5 s. Furthermore, in this analysis, the largest error occurred at 0.5 s and the smallest error at 2.5 s. It therefore seems more likely that the trained LSTM already reflects the typical information distribution during natural walking. Other reasons for the larger prediction error in the experiment by Y. Kim et al. (2024) could be the smaller amount of training data or differences in data quality resulting from different HMDs and the lower sampling frequency of the eye tracker used in their study.

Nevertheless, it might be interesting to systematically investigate in future experiments how the prediction error changes under identical conditions with even longer input lengths. Such an analysis was carried out by Bremer and Lappe (2024), who investigated prediction errors as a function of how far into the future their models predicted locomotion behaviour. They found that the prediction errors increased overall for longer predictions. With an LSTM model, however, the error was relatively constant between 1 s and 2 s, before it increased again beyond approximately 2 s into the future. With Bremer and Lappe's (2024) prediction model based on a transformer architecture, the trend of increasing prediction errors for longer predictions appeared to be smaller.

Based on our results, some suggestions were made to further improve the prediction of locomotion. For example, Mayor, Calleja, and Fuentes-Hurtado (2024) investigated the use of quaternions instead of directional angles with a range of 0°–360°. They argued that jumps from 360° to 1° for a rotation of only 1° would be especially difficult for a machine learning algorithm to learn. Quaternions, on the other hand, contain the same information without this problem. In their study, they found this to be an advantage, especially when predicting the movement position 2.5 seconds in the future. They go on to suggest that research into other machine learning algorithms could lead to further improvements. Indeed, in a direct comparison of different architectures trained on the same dataset, Bremer and Lappe (2024) showed that changing the network architecture to a transformer model can also reduce the prediction error.
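The wrap-around problem described above is easy to illustrate numerically; the yaw-about-y quaternion convention used here is an assumption for illustration:

```python
import numpy as np

# A 1° turn across the 0°/360° boundary looks like a 359° jump to a naive
# squared-error loss on the angle itself.
prev_deg, curr_deg = 359.5, 0.5
naive_error = abs(curr_deg - prev_deg)                        # 359.0, misleading
wrapped_error = abs((curr_deg - prev_deg + 180) % 360 - 180)  # 1.0, true rotation

# The same rotations expressed as unit quaternions (w, x, y, z), yaw about y:
def yaw_quat(deg):
    half = np.radians(deg) / 2.0
    return np.array([np.cos(half), 0.0, np.sin(half), 0.0])

q_prev, q_curr = yaw_quat(prev_deg), yaw_quat(curr_deg)
# Angular distance between the two rotations (handles the q / -q ambiguity):
angle = 2.0 * np.degrees(np.arccos(abs(np.dot(q_prev, q_curr))))
print(naive_error, wrapped_error, round(angle, 1))            # 359.0 1.0 1.0
```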

Saliency-guided Eye Movements During Walking

As mentioned in the previous chapters, our gaze behaviour during locomotion usually comprises different types of fixations and eye movements. While visual salience is often described as an important factor in screen-based visual search tasks, its influence often seems to be diminished when the ongoing task demands collecting additional visual information. Nevertheless, we kept the VE that we used to record the gaze data sparse and ensured that only objects related to one of the instructed tasks were present.

Thus, our recorded gaze data most likely consists of multiple types of fixations. First, there are travel-gaze-fixations that are associated with gathering just-in-time information to find the next waypoints (which are typically about 1.5 s in the future). Second, there are long-term planning travel-gaze-fixations that are on objects unrelated to the current destination, but are potentially helpful for future waypoints. Then there are other travel-gaze-fixations that are not directly related to locomotion and are used to remember properties of objects that are relevant for future tasks. Finally, there might be a small proportion of gaze movements related to monitoring grasping actions when interacting with the VE.

In walking experiments on flat terrain we usually spend slightly more than half of our fixations on the ground (Pelz and Rothkopf 2007; Jonathan Samir Matthis, Yates, and Hayhoe 2018; Patla and Vickers 2003). In their study, Y. Kim et al. (2024) added a condition in which the amount of distracting and moving objects in the background was varied. They found that as visual distractions increased, participants performed more gaze fixations towards the background while solving the three different locomotion tasks. Normally, our travel-gaze-fixations are shorter than 0.6 s (Patla and Vickers 2003) and tend to be directed in our current direction of motion (Hollands, Patla, and Vickers 2002; Rothkopf, Ballard, and Hayhoe 2016). This facilitates getting environmental and self-motion information from the optic flow, which could be helpful for directing our feet to the planned landing positions (Patla and Vickers 2003). In an environment with moving objects, our gaze must compensate for the visible movement of the objects to enable continuous fixation on the same target. When we walk and look at moving objects that are close to us, our gaze must compensate for both the movement of the objects and our own progress. However, in the study by Y. Kim et al. (2024), the objects were presented in the background. As a result, although our gaze continues to compensate for the movement of the target, it does not have to compensate for locomotion, as the objects remain in the background no matter how far we move. Therefore, it may be more difficult to extract information about future waypoints from the gaze, which could explain why the prediction error increased under conditions of a moving background.

Looking at salient objects that guide our eye movements as we walk raises new questions, such as where we normally focus our attention while walking. Previous research suggests that, particularly in direct comparison to situations without locomotion, the periphery contains relevant information (Franchak and Adolph 2010; Graybiel, Jokl, and Trapp 1955; Hassan et al. 2002; Geruschat, Turano, and Stahl 1998). Walking seems to be a highly automated action that we can perform without much cognitive effort, so it is likely that we have some reserve capacity to process other visual content as we move forward. Thus, even if our gaze is directed towards a salient non-task-relevant object, we may still be focusing covertly on walking-relevant elements of the scene. This is supported by previous experiments suggesting that our visual perception and eye movements are (at least to a certain extent) influenced by walking (Cao, Chen, and Haendel 2020; Haegens et al. 2011; Barnes, Davidson, and Alais 2025; Davidson et al. 2023; Davidson, Verstraten, and Alais 2024). This process is not perfect, as shown by Jonathan S. Matthis and Fajen (2014), who suggest that there is a certain time window in which we can absorb the visual information required for each step. If this information is not available, our steps tend to be less precise. It could therefore be interesting to investigate whether shifting attention during this critical time window has an influence on locomotion. Following the approach of Davidson, Verstraten, and Alais (2024), one could also investigate whether the direction of our attention can be assigned to three-dimensional positions relative to the head position and whether and how this relationship is modulated by our steps.

Although there are still several open questions regarding the influence of distracting objects during walking, this does not necessarily mean that predicting locomotion using gaze data is only possible in sparse environments in which the participants focus exclusively on walking tasks. Indeed, Bremer and Lappe (2024) found that gaze data can be valuable for predicting movement in various contexts, even when the link between gaze direction and future waypoints was particularly weak because participants were solving a visual search task while moving forward in the VE.

As an alternative approach to improve locomotion prediction, Y. Kim et al. (2024) suggested adding additional gait sensors to collect data that are not influenced by salient stimuli and are therefore able to improve predictions. Another approach to improving predictions is to classify gaze data in terms of the information they are expected to contribute to the prediction. This could be done, for example, following the approach of Bremer, Stein, and Lappe (2022), who trained an LSTM model that was able to distinguish fixations directed at future waypoints from fixations directed at other features of a scene. Similarly, Sharma et al. (2024) were able to distinguish between fixations on targets and non-targets when searching for icons on desktop backgrounds and finding tools in a cluttered workshop, using a combination of eye tracking and EEG data.

In summary, although prediction accuracy varies slightly depending on the task and environment, gaze data can generally be used to improve prediction models for walking behaviour. By combining data from multiple locomotion tasks, we achieved an average prediction error of 66 cm.

Since the publication of our results, there have been well-founded suggestions as to how this accuracy could be further improved in the future. New findings about salient objects in the background that can distract our eye movements during locomotion may suggest that important information for movement prediction is encoded in gaze movements related to fixations on objects near us. However, further research is needed to confirm this hypothesis. More generally, the influence of salient background objects raises further questions about where we look while planning upcoming behaviour and how visual information and task requirements jointly control our eye movements. Our study not only shows that predicting walking goals based on gaze data is a potentially useful method for various VR applications, but also provides a clear example of how VR research can be used to better understand how we plan our behaviour.

Gaze Guidance

In Study [III], we observed the trajectories of eye and head movements in a head-free search in a VE. Unsurprisingly, the results showed that some eye movements during searching are guided by the salience of visual stimuli. Especially initial saccades at the start of a search seemed to be guided by information from the original FOV. Thus, salient targets visible in the periphery of the FOV were found more quickly than less salient objects, and initial eye movements landed closer to salient targets than to less salient ones.

We additionally observed how top-down head movement strategies became an influential factor once the search space expanded naturally, that is, when the target was not found in the initial FOV and head movements were required to complete the task. Our results therefore indicate that saliency plays a subordinate role in this second search phase. Interestingly, the same stimuli that produced pop-out effects when they were visible in the FOV at the beginning of the search did not have this capability when they were brought into the FOV by turning the head. Search behaviour was no longer influenced solely by the salience of visual stimuli and expected eye movement biases. Instead, the decision to make a particular head movement seemed to depend largely on the known arrangement of stimuli.

Peripheral targets that only became visible during the search through head movements were reached in similar search times, regardless of their saliency. In addition, initially hidden targets (that could only be seen when they were in the central FOV when the head movement towards them was almost complete) were found in a similar amount of time. This supports the conclusion that peripheral information was not used to guide the gaze. After head movements brought a set of potential targets close to the centre of the FOV, the search within this central set seemed to be again facilitated by salience.

Bottom-Up & Top-Down Factors

Our gaze can be guided at different stages by two different principles: by bottom-up visual features that are directly related to a visible salient object and by more top-down cognitive planning that is not always related to something currently in our FOV. The influence of salience often seems to be reduced when eye movements are driven by task demands.

Schütz, Braun, and Gegenfurtner (2011) assumed that bottom-up guidance is off duty whenever we perform a task. However, this does not mean that bottom-up guidance is always the most important factor when no instructions are given. Nyström and Holmqvist (2008) compared fixations on images of three different categories and found that edge density and contrast at the fixated location did not influence the initial fixation or the time course of viewing, even when no task was given. Thus, Wolfe and Horowitz (2017) hypothesised that observers intuitively follow an internal task, even without specific instructions or when asked to view the presented scene freely.

These ideas imply some kind of modular structure with a bottom-up and a top-down pathway that can replace each other in different situations. This aligns with the results of Lin, Franconeri, and Enns (2008), who found that suddenly appearing (and therefore salient) objects during an ongoing search strongly attract attention, especially if they approach from the periphery or threaten to collide with us. In other words, switching between bottom-up and top-down control can take place during a task, based on visual input. Naturally, this means that information from the bottom-up path is continuously available to our control system, even if it is not currently the main guiding factor. This is consistent with the results of a few other experiments in which bottom-up mechanisms seem to retain the capability to distract us during head-free searches in natural VEs. For example, blurring salient regions in a visual scene enables an overall better search performance for less salient targets (Lukashova-Sanz and Wahl 2021).

Conversely, Bektaş et al. (2019) reported that a gaze-contingent reduction in peripheral detail accuracy could improve visual search performance.

To better understand this, Einhäuser, Rutishauser, and Koch (2008) set up a study on how both factors are weighted against each other. In their study, the influence of visual input as a guiding factor was first established using high- and low-contrast stimuli or flickering targets. Then the bottom-up influence was overridden by setting a simple search task for a clearly visible bullseye target. In the condition where participants initially preferred salient, high-contrast parts of the screen, they then developed a preference for low-contrast stimuli. This suggests that instead of completely overriding one channel of information, we are sometimes guided by a combined stream of bottom-up and top-down information. However, when a less salient search target was used, contrast was not a relevant guiding factor. In the condition where flicker was initially established as a guiding factor, the salient bullseye search target also neutralised the original influence of flicker. Therefore, Einhäuser, Rutishauser, and Koch (2008) concluded that the extent to which one factor overrides the influence of another also depends on the overridden feature and on the salience of the objects relevant to the new task, while the initial salience features in the environment appear to play a subordinate role.

The weighting of top-down and bottom-up factors can also be shifted by non-visual changes, as a study by Foulsham and Underwood (2007) shows. They found that objects with higher salience were fixated earlier and more frequently than objects with lower salience while participants memorised different complex scenes. Once the task was changed to searching for a target defined by a category or an example, salience was no longer the decisive guiding factor. A similar result was presented by Longstaffe, Hood, and Gilchrist (2014), in whose study participants searched a room with a series of illuminated buttons on the floor to find a target that changed colour when pressed. Some buttons flashed while others remained lit. Participants chose the flashing buttons more often, even though they knew that doing so had no effect on the search results. At the same time, they were able to choose a sensible search sequence and avoid selecting the same buttons twice. Nevertheless, in a condition where memorisation was not necessary because the already selected targets were visually marked, participants were less likely to explore flashing locations. Thus, in this task, top-down guidance was not able to completely override the influence of bottom-up information. Interestingly, in these experiments, salient targets were chosen even more often when a memorisation task was performed simultaneously. This suggests that our memory capacity can also influence how highly we weight salient visual stimuli as a guiding factor.

In summary, it can be said that various guiding factors are integrated with each other in a common module. One possible outcome is that one factor neutralises all others. Tatler et al. (2011) assumed that this is particularly the case when non-visual factors such as time pressure or behavioural costs come into play. They argued that this would allow particularly important fixations to be carried out without distraction and with particular precision in time-critical situations such as driving or walking. However, given the findings of Longstaffe, Hood, and Gilchrist (2014), their assumption that salience prevails particularly in situations with low cognitive load appears overly simplistic. Nevertheless, it is possible that top-down factors such as instructions and working memory load modulate the effect of bottom-up factors.

Timing

The influence of different guiding factors on our eye movements changes depending on the visual content and over time. Schütz, Trommershäuser, and Gegenfurtner (2012) observed that saccades with short latency were mainly guided by salience, while saccades with long latency could also be influenced by value information.

Similar conclusions can also be drawn from a screen-based study by Wolf and Lappe (2021), in which participants were instructed to look at either the centre of a picture, a salient object, or a position between the two options. Their results showed that quickly executed saccades (triggered between 80 ms and 250 ms after the appearance of visual stimuli) are susceptible to involuntary distortions caused by salient objects or the centre of the image. However, saccades with a longer latency and without any instructions on where to look frequently landed on salient stimuli. This suggests an interaction between the guidance of the instructions and the timing of the execution of the eye movement.

This aligns with a range of findings from Zoest, Donk, and Theeuwes (2004), in which saccades with latencies below 250 ms were stimulus-driven, whereas later eye movements were more goal-driven. This suggests that visual input is likely used for bottom-up guidance as soon as it is available. However, some visual features are decoded relatively late in the visual pathway and are therefore only available with a certain latency. For example, Palmer et al. (2019) found that it takes 200–300 ms to utilise colour information in a guided search. Then again, there are some search tasks that we can solve surprisingly quickly: it only takes 150 ms to recognise whether an animal is visible in an image or not (Thorpe, Fize, and Marlot 1996). However, given that top-down processes take over after about 250 ms, it seems that even with these highly efficient search processes there is only a very limited time frame in which bottom-up factors such as salience can have an effect when an instruction is given.

One explanation of the potential mechanisms leading to the reduced influence of bottom-up salience was given by Donk and Zoest (2008). Based on two experiments, they concluded that salience may be represented in the visual system only for a limited period of time after the visual input is received.

In both experiments, participants were asked to find the most salient object among two salient objects and many distractors. Again, performance varied with timing. Initially, participants were able to detect the most salient object. However, in trials with saccade latencies longer than 300 ms, participants were not able to distinguish the most salient object from the other salient object. In a second experiment, the presentation time of the stimulus was systematically varied. If objects were visible for longer than 83 ms, the average proportion of correct answers fell by about 10%. From all of these results, Donk and Zoest (2008) concluded that the relative salience between objects is represented in the neural response map via latency: following this idea, neurons representing the salience map with receptive fields containing highly salient objects would fire earlier than neurons with receptive fields containing less salient stimuli. After this latency difference has elapsed, the neurons corresponding to the location of the less salient object also fire, and the two objects may no longer be distinguishable in terms of their relative salience. Instead, the saliency map would then only contain information about the locations of all salient objects in the FOV.

Similarly, such temporal effects could potentially explain some of the typical observations made when shifting from bottom-up to top-down control. For example, Wolf and Lappe (2020) found that this shift did not simply depend on the duration of the task, but rather on how long salient objects that initially appeared distracting had already been visible. They concluded that the visually available bottom-up saliency must first be suppressed to allow the influence of top-down factors. In theory, this suppression mechanism could be related to the loss of relative salience information and the resulting, more spread-out saliency signal.

In many situations, the first saccade could already be carried out while relative salience is still represented in the visual system. Thus, in many arrangements of objects in our FOV the first saccade is guided by bottom-up factors.

This fits well with observations by Brouwer, Franz, and Gegenfurtner (2009), who found that when we look at an object that we want to grasp, the first saccade tends to go towards the centre of gravity of the target object. The following second fixation, however, is often more task-related and brings areas relevant for grasping into foveal vision.

In line with the findings of Donk and Zoest (2008), Heusden, Donk, and Olivers (2021) found that relative salience appears to be represented long enough to guide the first saccade, even when it is directed towards peripheral targets, where the planning and execution phases are somewhat prolonged. Moreover, they found that in this scenario the whole process of switching from bottom-up to more top-down guidance seems to start up to 50 ms later. One could therefore argue that the weighting of the guiding factors is also influenced by the actual sequence of eye movements based on the structure of the scene.

Head Movements

Many processes in our visual system are organised retinotopically. This is also the case for the neurons processing bottom-up salience signals (Bichot, Rossi, and Desimone 2005). As a result, salient objects can pop out from the FOV during searching and trigger eye movements towards them. In line with many experiments on guidance during searching within the FOV, Study [III] also showed that initial saccades at the start of a search are guided by salient visual stimuli.

At the start of a head movement, no visual information about the head movement target in the extended FOV is available. Therefore, we need to rely on non-visual information when planning and initiating the head movement. In general, contextual guidance or the use of memorised or learned content is an established principle for visual guidance. For example, there is experimental evidence that top-down information about scenes can be used to refine the search space in a scene (Kanan et al. 2009; Hwang, Wang, and Pomplun 2011; Neider and Zelinsky 2006). In addition, many of the regions involved in the selection and generation of saccades are sensitive to the expectation of reward (Jovancevic-Misic and Hayhoe 2009). For example, in monkeys, saccade-related areas in the cortex (lateral intraparietal area, frontal eye field, supplementary eye field and dorsolateral prefrontal cortex) all show sensitivity to rewards (Platt and Glimcher 1999; Stuphorn, Taylor, and Schall 2000; Dorris and Glimcher 2004; Sugrue, Corrado, and Newsome 2004). As a result, it makes sense to assume that we rely more heavily on memorised content to plan our head movements while the FOV is being updated.

The use of a non-visual signal to plan our head movements suggests that this movement plan is not represented retinotopically in the visual system. This raises the question of which coordinate system we use to plan our movements instead. Crowe, Vorgia, and Brenner (2025) investigated search time differences for search displays that were occluded with a movable aperture. In separate conditions participants could use a mouse to either move the aperture or the presented stimuli behind it. They found that the search time was shorter when participants moved the aperture instead of the stimulus. This suggests that the non-visible search space is not organised in absolute world-coordinates, but relative to the hand movements. This was also the case when an additional visible reference was present and even when the aperture and the search display were constantly wobbling.

However, if the assignment between mouse movements and the movement on the screen was reversed, the search time pattern inverted. The search then became faster when the search display was moved instead of the aperture. This suggests that the congruence between the moving direction and the visual change was more important for the speed of the search than which object is moved.

Head movements result in a reliable shift of the current visual input. Therefore, the representation of the surrounding objects could also be organised relative to the movement signal. In our experiments, it seemed as if this head movement strategy was based mainly on the structure of the scene.

This fits well with the results of Shioiri et al. (2018), who found that participants are able to quickly learn the spatial arrangement of displays placed around them while performing a search task. As a result, the area in which we can move our eyes is determined by head movements that are guided by top-down factors. Consequently, the influence of top-down factors increases as soon as scenarios are considered in which head movements are necessary to reach a target. Using 360° scenes, Haskins et al. (2020) compared eye movements between two experimental conditions. In the first, simulated head movements were presented; in the second, participants were free to move their heads as they explored the scene. They found that during active viewing, participants used shorter, more exploratory fixations, while the influence of salience decreased and the effect of meaningful scene areas increased.

Generally, once a head movement has started, we could in theory use visual information that becomes available through the shift of the FOV to alter the ongoing head movement. In our experiments, this did not seem to be the case. However, previous experiments found that individuals use extraretinal cues such as neck proprioception, information about the neck movement and vestibular information to perceive visual input while turning their heads (Crowell et al. 1998), which might explain behavioural differences between simulated and actual head turns. Then again, the image on the retina is not completely stabilised during head movements (Steinman and Collewijn 1980; Skavenski et al. 1979), which suggests that the guidance of head movements is more likely based on a combination of top-down factors, such as memory and previously learned structures.

One could therefore argue that at the start of a search, bottom-up salience guides our first saccade towards the first object to inspect. If the search target is not in the initial FOV, we need to expand the search space by moving our head to obtain more information in the extended FOV. During this process, we typically follow a saccade-and-catch-up strategy in which a rapid eye movement towards potential targets in the periphery is made first while a head movement is initiated simultaneously. Once the head movement has caught up and the previous saccade target is close to central vision, another saccade follows to inspect new objects located in peripheral regions of the newly established FOV. This means that no bottom-up visual information is available when planning a head movement. Although head movements are slow and visual stimuli can be stabilised to a certain extent on the retina even during continuous head rotation, it appears, at least in our experiments, that when searching a scene with a known structure, we do not change our planned head movement trajectory based on the information received during the shift of the FOV. Once a new FOV was established and the head movement had ended, bottom-up information seemed to be able to guide gaze again.

This leads us to conclude that our attentional system has both bottom-up and top-down information available at the same time, so that the weight given to different guiding factors can change rapidly. It seems that both bottom-up and top-down guidance can modulate each other to a certain degree. For example, our eye movements may initially be strongly action-related, but can then be distracted by a salient stimulus. Conversely, we can also give ourselves instructions in an environment with many salient stimuli in order to find relevant objects in our surroundings. Bottom-up guidance becomes available earlier than the influence of top-down factors, the latter usually becoming more important after a few hundred milliseconds when specific instructions are given. After this delay, we often make a second, more top-down-driven saccade. The delay can be longer when salient objects are located in the periphery. Finally, it is clear that no visual information is available as a guiding factor for the initial planning of head movements towards the extended FOV. In line with this, our results suggest that these movements are planned based on the structure of the scene. In our experiments, it also appeared that head movements were not influenced by information that may have become visible during their execution. As soon as head movements are necessary to reach a target, top-down factors gain importance overall, as top-down-controlled head movements determine the area of our environment in which we can perform eye movements.

Limitations and Outlook for VR Research

The experiments in this thesis show that it is possible to collect behavioural data using VR that meets scientific standards. Nevertheless, it is reasonable to ask how VR measurements compare to data previously collected in simplified environments using desktop setups and whether there are systematic differences between the two. There are several experiments which suggest that eye movement patterns are similar when performing the same task in the real world and in realistic simulations (S. Kim, Nussbaum, and Ulman 2018; Drewes, Feder, and Einhäuser 2021; Gulhan, Durant, and Zanker 2021). Then again, studies have shown that VR alters the perception of walking speed (Banton et al. 2005) and can influence walking dynamics when walking on a treadmill (Sloot, van der Krogt, and Harlaar 2014). Moreover, there are several other potential reasons, which researchers should be aware of, why users may not act naturally in VR even under ideal conditions. First, wearing an HMD feels different from not wearing one. Although the devices are becoming lighter with every generation, the additional weight of more than 700 g on the head is currently still clearly noticeable.

The extra weight on the head and joints can make many natural movements more strenuous, which can lead to unnatural compensatory movements.

Second, on current devices, the FOV is more limited than in reality, which could potentially alter visual perception for example during locomotion.

In light of these limitations, it is important to distinguish between undesirable side effects of VR and differences that arise because the task is closer to natural human behaviour. For example, Wu (2025) conducted a real-life search task in which participants walked around a room to find objects that were presented on a screen. All movements of the head and eyes were tracked throughout the whole task. In their experiment, Wu (2025) reported an average fixation duration of about 200 ms, which is similar to results from other, simpler studies.

However, the reported average saccade amplitude was 10.4°, which is higher than in 2D natural image search with average saccade lengths of 5–6°.

They also found several environment fixations in which participants did not look at task-relevant objects, but at other objects, for example, when navigating the room.

Similar conclusions can be drawn from our data in Study [II]. In a direct comparison between VR and simple desktop studies, it would be easy to dismiss such differences as VR bias. In this situation, it is important to consider carefully whether the deviation in the behavioural data could also be a result of observing more natural behaviours in VR, which may describe typical human behaviour even better than the results of desktop studies.

Finally, some differences in measured behaviour may result from using different measuring equipment. Compared to some desktop-based eye trackers such as the Eyelink 1000, which has established itself as the gold standard in eye tracking research, VR eye trackers are still rather imprecise. Therefore, when conducting a VR experiment, more data is required to achieve a similar level of precision by averaging the information about eye movements across multiple trials. The overall lower accuracy means that it is not possible to distinguish eye movements with very small amplitudes, such as fixational eye movements. Our latency measurements in chapter [I] showed that measuring eye movements usually takes several frames. In addition, due to its significantly lower spatial and temporal resolution, the currently available VR eye tracking hardware cannot be readily used for online measurement of saccade dynamics. It also does not seem possible to use it to detect saccade onsets quickly enough to present VEs with gaze-contingent experimental stimuli. This would require the positions of the virtual objects to be adjusted to the gaze, which in turn would require low-latency eye and head tracking. Nevertheless, it is possible to use HMDs to observe eye and gaze paths of saccades and pursuit movements, as well as gaze points and fixations, if the data are analysed retrospectively and averaged across several trials.

When doing so, we should consider that the gaze signal resulting from combining eye and head tracking suffers from imperfections that must be taken into account when planning VR experiments and interpreting their results. Based on recordings during the pilot phase with the HTC Vive Pro Eye, we suspect that eye and head tracking are not recorded perfectly synchronously. For example, when horizontal head movements are recorded while we fixate a point, our eye movements compensate for the head movements so that the fixation target remains clearly visible. If we calculate the difference between eye and head velocities in such recordings, we often find residual gaze velocities of over 30°/s with current VR hardware. Then again, perfect stabilisation at speeds close to 0°/s seems unlikely based on previous results (Harris 1994). Nevertheless, with the current setup we cannot distinguish between actual retinal slip and additional errors caused by asynchrony between eye and head tracking. This makes it difficult or even impossible to identify fixated objects based on an eye velocity threshold, as is often done in desktop eye tracking experiments.
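To make the combination of eye and head data explicit, the following sketch derives the angular velocity of the gaze direction in world coordinates from the eye-in-head vector and the head orientation; frames whose residual velocity exceeds a threshold during a nominal fixation could then be flagged for inspection. All names are illustrative, and the 30°/s value echoes the observation above rather than a recommended criterion.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def gaze_world_velocity(eye_dir_head, head_quat, t):
    """Angular velocity (deg/s) of the gaze direction in world coordinates.

    eye_dir_head : (N, 3) unit gaze vectors in head coordinates
    head_quat    : (N, 4) head orientation quaternions (x, y, z, w)
    t            : (N,) sample timestamps in seconds
    """
    # Rotate the eye-in-head direction into world coordinates frame by frame.
    gaze_world = R.from_quat(head_quat).apply(eye_dir_head)
    gaze_world /= np.linalg.norm(gaze_world, axis=1, keepdims=True)

    # Angle between successive gaze directions divided by the sample interval.
    cos_ang = np.clip(np.sum(gaze_world[1:] * gaze_world[:-1], axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos_ang)) / np.diff(t)

# During a nominal fixation, frames exceeding 30 deg/s may reflect either true
# retinal slip or eye/head tracking asynchrony:
# suspect = gaze_world_velocity(eye_dir_head, head_quat, t) > 30.0
```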

One method for determining gaze points and gaze patterns in VR is to calculate intersections between the gaze ray and the positions of objects in the VE. When the same object is looked at for multiple consecutive frames, we can assume that it was fixated, even if the velocity of the combined gaze signal wrongly suggests that the image on the retina was not perfectly stable.

In summary, as a scientific method, VR is no longer in its infancy and offers great potential for systematically observing complex movement sequences (such as searching for an object in a room). Although some previously established methods need to be adjusted due to flaws in the currently available hardware, our tests show that the combined gaze signal provides a solid overview of where and for how long a person looked in a sequence, especially when combined with information about the VE. All in all, this makes VR eye tracking particularly interesting as a method for observing eye movements during actions and enables experiments in which participants can explore their surroundings more naturally than in experiments in front of a desk. In such situations, various processes simultaneously influence our behaviour, and VR can therefore provide new insights into our perception and behavioural control. High temporal resolution and the combination of detailed participant behaviour with an exact description of the stimuli presented in the experiment make it possible to describe the timing of, and connection between, perception and behaviour much more accurately than before. The extensive amount of data also enables new evaluation methods, such as machine learning, to automatically classify the collected data.
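As a concrete illustration of the gaze-ray intersection approach described above, the following sketch labels, frame by frame, which object (approximated by a bounding sphere) the gaze ray hits and then groups consecutive frames on the same object into fixation-like dwells. The object representation, the helper names and the 100 ms minimum dwell duration are assumptions for illustration only.

```python
import numpy as np

def hit_object(origin, direction, centers, radii):
    """Index of the nearest object whose bounding sphere is intersected by the
    gaze ray (origin + s * direction, unit direction, s > 0), or -1 on a miss."""
    rel = centers - origin                          # eye-to-centre vectors
    along = rel @ direction                         # distance along the gaze ray
    perp = np.linalg.norm(rel - np.outer(along, direction), axis=1)
    hit = (perp <= radii) & (along > 0)             # inside sphere, in front of eye
    return int(np.argmin(np.where(hit, along, np.inf))) if hit.any() else -1

def dwell_fixations(hits, t, min_dur=0.1):
    """Group consecutive frames on the same object into fixation-like dwells,
    returning (object index, start time, end time) tuples."""
    dwells, start = [], 0
    for i in range(1, len(hits) + 1):
        if i == len(hits) or hits[i] != hits[start]:
            if hits[start] >= 0 and t[i - 1] - t[start] >= min_dur:
                dwells.append((hits[start], t[start], t[i - 1]))
            start = i
    return dwells
```

Per frame, `hit_object` would be applied to the current gaze origin and direction; `dwell_fixations` then operates on the resulting sequence of object labels.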

Conclusions

The rapid development of VR technology in recent years has led to the combination and refinement of several different technologies for tracking human behaviour. At the same time, the possibilities for presenting visual stimuli that respond to our movements have expanded.

In this thesis, we showed how observing eye, head and walking movements in VR can improve our understanding of how we perceive and explore our surroundings. As a first step, we evaluated VR eye tracking latency to be able to use it as a new method in psychophysics, and found that eye tracking latency ranged between 45 and 81 ms across devices.

Second, we showed that eye and head movements add predictive value to locomotion forecasting, which fits well with previous knowledge about the planning and execution of walking movements. In our study, we predicted future movement behaviour in VR. Such prediction models could be used to minimise errors in motion tracking by making plausibility assumptions based on the predicted motion. Moreover, similar models could compensate for system latency in software by basing tracking on a mixture of measured and predicted data. Our predictions can also be used to create immersive VEs that automatically respond to intended behaviour and are thus able to keep users away from obstacles. Since we relied exclusively on egocentric movement data, the same method could also be applied in many real-world contexts in which controllable environmental parameters can react proactively to human movements. For example, when learning new movement sequences, this could prevent errors at an early stage and avoid both damage to work materials and the practice of incorrect movement patterns. Such predictions could also be helpful in surveillance systems for hazardous areas: these safety systems would no longer just react when a person enters a hazardous area, but could issue an early warning or automatically eliminate potential hazards based on movement predictions from egocentric data.

Finally, we used a combination of VR eye and head tracking to expand the classic visual search paradigm. In two experiments on search in the extended FOV, we observed how head and eye movements interact to form our gaze trajectory. We found that head movements in particular tended to be guided by top-down features and that gaze was not immediately influenced by stimuli entering the FOV through a head movement. This description of search behaviour suggests that salience plays a rather subordinate role when searching in real and virtual environments.
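As a rough illustration of the measured-plus-predicted blending mentioned above, the sketch below extrapolates the most recent head position over an assumed system latency with a constant-velocity model and mixes it with the latest measurement. This is a hypothetical stand-in, not the prediction model used in our studies.

```python
import numpy as np

def latency_compensated_position(positions, t, latency_s=0.05, blend=0.5):
    """Blend the latest measured position with a constant-velocity
    extrapolation over the assumed system latency.

    positions : (N, 3) measured head positions (most recent last)
    t         : (N,) timestamps in seconds
    latency_s : assumed end-to-end system latency (placeholder)
    blend     : 0 = trust the measurement only, 1 = trust the prediction only
    """
    velocity = (positions[-1] - positions[-2]) / (t[-1] - t[-2])
    predicted = positions[-1] + velocity * latency_s
    return (1.0 - blend) * positions[-1] + blend * predicted
```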

On this basis, new experiments using VR eye tracking can be developed to describe the entire search process and its various subtasks in greater detail. To this day, numerous questions remain open about the extent to which the distribution of attention during walking or memory retrieval interacts with the control of search processes, especially when a target enters our FOV through self-motion. VR and eye tracking technology have made significant advances over the past decade. Ivan Sutherland envisioned many of these developments when he contemplated the possibilities of the ultimate display. Although his vision of a space in which a computer can control the existence of matter has not (yet) been realised, the technology that has continuously evolved towards his ideas can already help us to observe more accurately how we process and perceive our environment. This is because the rapid development of VR hardware has been accompanied by increasingly detailed behavioural measurements. As a result, HMDs are useful tools for improving, refining and expanding a wide range of existing experiments and theories. This makes VR a method with great potential for improving our understanding of how we see, perceive and act.

References

Albert, Rachel, Anjul Patney, David Luebke, and Joohwan Kim. 2017. “Latency Requirements for Foveated Rendering in Virtual Reality.” ACM Transaction of Applied Perception 14 (4). https://doi.org/10.1145/3127589.
Arabadzhiyska, Elena, Okan Tarhan Tursun, Karol Myszkowski, Hans-Peter Seidel, and Piotr Didyk. 2017. “Saccade Landing Position Prediction for Gaze-Contingent Rendering.” ACM Trans. Graph. 36 (4). https://doi.org/10.1145/3072959.3073642.
Aziz, Samantha, Dillon J Lohr, Lee Friedman, and Oleg Komogortsev. 2024. “Evaluation of Eye Tracking Signal Quality for Virtual Reality Applications: A Case Study in the Meta Quest Pro.” In Proceedings of the 2024 Symposium on Eye Tracking Research and Applications. ETRA ’24. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3649902.3653347.
Banton, Tom, Jeanine Stefanucci, Frank Durgin, Adam Fass, and Dennis Proffitt. 2005. “The Perception of Walking Speed in a Virtual Environment.” Presence 14 (4): 394–406. https://doi.org/10.1162/105474605774785262.
Barnes, Lydia, Matthew J. Davidson, and David Alais. 2025. “The Speed and Phase of Locomotion Dictate Saccade Probability and Simultaneous Low-Frequency Power Spectra.” Attention, Perception, & Psychophysics 87 (1): 245–60. https://doi.org/10.3758/s13414-024-02932-4.
Bektaş, Kenan, Arzu Çöltekin, Jens Krüger, Andrew T. Duchowski, and Sara Irina Fabrikant. 2019. “GeoGCD: Improved Visual Search via Gaze-Contingent Display.” In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications. ETRA ’19. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3317959.3321488.
Bichot, Narcisse P., Andrew F. Rossi, and Robert Desimone. 2005. “Parallel and Serial Neural Mechanisms for Visual Search in Macaque Area V4.” Science 308 (5721): 529–34. https://doi.org/10.1126/science.1109676.
Bremer, Gianni, and Markus Lappe. 2024. “Predicting Locomotion Intention Using Eye Movements and EEG with LSTM and Transformers.” In 2024 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 21–30. https://doi.org/10.1109/ISMAR62088.2024.00016.
Bremer, Gianni, Niklas Stein, and Markus Lappe. 2022. “Do They Look Where They Go? Gaze Classification During Walking.” In NeurIPS 2022 Workshop on Gaze Meets ML. https://openreview.net/forum?id=XP0k6ToFK7t.
Brouwer, Anne-Marie, Volker H. Franz, and Karl R. Gegenfurtner. 2009. “Differences in Fixations Between Grasping and Viewing Objects.” Journal of Vision 9 (1): 18–18. https://doi.org/10.1167/9.1.18.
Cao, Liyu, Xinyu Chen, and Barbara F. Haendel. 2020. “Overground Walking Decreases Alpha Activity and Entrains Eye Movements in Humans.” Frontiers in Human Neuroscience, December. https://www.proquest.com/scholarly-journals/overground-walking-decreases-alpha-activity/docview/2471936104/se-2.
Carmack, John. 2013. “Latency Mitigation Strategies (by John Carmack).” https://danluu.com/latency-mitigation/.
Graybiel, Ashton, Ernst Jokl, and Claude Trapp. 1955. “Notes: Russian Studies of Vision in Relation to Physical Activity and Sports.” Research Quarterly. American Association for Health, Physical Education and Recreation 26 (4): 480–85. https://doi.org/10.1080/10671188.1955.10612840.
Crowe, Emily M., Danai T. Vorgia, and Eli Brenner. 2025. “Congruency Between Viewers’ Movements and the Region of the Display Being Sampled Speeds up Search Through an Aperture.” Perception 54 (4): 226–38. https://doi.org/10.1177/03010066251314181.
Crowell, James A., Martin S. Banks, Krishna V. Shenoy, and Richard A. Andersen. 1998. “Visual Self-Motion Perception During Head Turns.” Nature Neuroscience 1 (8): 732–37. https://doi.org/10.1038/3732.
Davidson, Matthew J., Robert Tobin Keys, Brian Szekely, Paul MacNeilage, Frans Verstraten, and David Alais. 2023. “Continuous Peripersonal Tracking Accuracy Is Limited by the Speed and Phase of Locomotion.” Scientific Reports 13 (1): 14864. https://doi.org/10.1038/s41598-023-40655-y.
Davidson, Matthew J., Frans A. J. Verstraten, and David Alais. 2024. “Walking Modulates Visual Detection Performance According to Stride Cycle Phase.” Nature Communications 15 (1): 2027. https://doi.org/10.1038/s41467-024-45780-4.
Donk, Mieke, and Wieske van Zoest. 2008. “Effects of Salience Are Short-Lived.” Psychological Science 19 (7): 733–39. https://doi.org/10.1111/j.1467-9280.2008.02149.x.
Dorris, Michael C., and Paul W. Glimcher. 2004. “Activity in Posterior Parietal Cortex Is Correlated with the Relative Subjective Desirability of Action.” Neuron 44 (2): 365–78. https://doi.org/10.1016/j.neuron.2004.09.009.
Drewes, Jan, Sascha Feder, and Wolfgang Einhäuser. 2021. “Gaze During Locomotion in Virtual Reality and the Real World.” Frontiers in Neuroscience 15. https://doi.org/10.3389/fnins.2021.656913.
Einhäuser, Wolfgang, Ueli Rutishauser, and Christof Koch. 2008. “Task-Demands Can Immediately Reverse the Effects of Sensory-Driven Saliency in Complex Visual Stimuli.” Journal of Vision 8 (2): 2–2. https://doi.org/10.1167/8.2.2.
Foulsham, Tom, and Geoffrey Underwood. 2007. “How Does the Purpose of Inspection Influence the Potency of Visual Salience in Scene Perception?” Perception 36 (8): 1123–38. https://doi.org/10.1068/p5659.
Franchak, John M., and Karen E. Adolph. 2010. “Visually Guided Navigation: Head-Mounted Eye-Tracking of Natural Locomotion in Children and Adults.” Vision Research 50 (24): 2766–74. https://doi.org/10.1016/j.visres.2010.09.024.
Geruschat, Duane R., Kathleen A. Turano, and Julie W. Stahl. 1998. “Traditional Measures of Mobility Performance and Retinitis Pigmentosa.” Optometry and Vision Science 75 (7). https://doi.org/10.1097/00006324-199807000-00022.
Gulhan, Doga, Szonya Durant, and Johannes M. Zanker. 2021. “Similarity of Gaze Patterns Across Physical and Virtual Versions of an Installation Artwork.” Scientific Reports 11 (1): 18913. https://doi.org/10.1038/s41598-021-91904-x.
Haegens, Saskia, Verónica Nácher, Rogelio Luna, Ranulfo Romo, and Ole Jensen. 2011. “Alpha-Oscillations in the Monkey Sensorimotor Network Influence Discrimination Performance by Rhythmical Inhibition of Neuronal Spiking.” Proceedings of the National Academy of Sciences 108 (48): 19377–82. https://doi.org/10.1073/pnas.1117190108.
Harris, Laurence R. 1994. “Visual Motion Caused by Movements of the Eye, Head and Body.” Visual Detection of Motion 1: 397–435. https://www.yorku.ca/harris/pubs/1994visualmotioncausedbymovementsofeyeheadandbody.pdf.
Haskins, A. J., J. Mentch, T. L. Botch, and C. E. Robertson. 2020. “Active Vision in Immersive, 360° Real-World Environments.” Scientific Reports 10: 14304. https://doi.org/10.1038/s41598-020-71125-4.
Hassan, Shirin E., Jan E. Lovie-Kitchin, and Russell L. Woods. 2002. “Vision and Mobility Performance of Subjects with Age-Related Macular Degeneration.” Optometry and Vision Science 79 (11). https://doi.org/10.1097/00006324-200211000-00007.
Heusden, Elle van, Mieke Donk, and Christian N. L. Olivers. 2021. “The Dynamics of Saliency-Driven and Goal-Driven Visual Selection as a Function of Eccentricity.” Journal of Vision 21 (3): 2–2. https://doi.org/10.1167/jov.21.3.2.
Hollands, Mark A, Dilwyn E Marple-Horvat, Sebastian Henkes, and Andrew K Rowan. 1995. “Human Eye Movements During Visually Guided Stepping.” Journal of Motor Behavior 27 (2): 155–63. https://doi.org/10.1080/00222895.1995.9941707.
Hollands, Mark A, Aftab E Patla, and Joan N Vickers. 2002. “‘Look Where You’re Going!’: Gaze Behaviour Associated with Maintaining and Changing the Direction of Locomotion.” Experimental Brain Research 143 (2): 221–30. https://doi.org/10.1007/s00221-001-0983-7.
Holmqvist, Kenneth, Marcus Nyström, Richard Andersson, Richard Dewhurst, Halszka Jarodzka, and Joost van de Weijer. 2011. Eye Tracking: A Comprehensive Guide to Methods and Measures. United Kingdom: Oxford University Press.
Hwang, Alex D., Hsueh-Cheng Wang, and Marc Pomplun. 2011. “Semantic Guidance of Eye Movements in Real-World Scenes.” Vision Research 51 (10): 1192–1205. https://doi.org/10.1016/j.visres.2011.03.010.
Jeon, Sang-Bin, Jaeho Jung, Jinhyung Park, and In-Kwon Lee. 2025. “F-RDW: Redirected Walking with Forecasting Future Position.” IEEE Transactions on Visualization and Computer Graphics 31 (4): 1970–84. https://doi.org/10.1109/TVCG.2024.3376080.
Jovancevic-Misic, Jelena, and Mary Hayhoe. 2009. “Adaptive Gaze Control in Natural Environments.” Journal of Neuroscience 29 (19): 6234–38. https://doi.org/10.1523/JNEUROSCI.5570-08.2009.
Kanan, Christopher, Mathew H. Tong, Lingyun Zhang, and Garrison W. Cottrell. 2009. “SUN: Top-down Saliency Using Natural Statistics.” Visual Cognition 17 (6-7): 979–1003. https://doi.org/10.1080/13506280902771138.
Katrychuk, Dmytro, Henry K. Griffith, and Oleg V. Komogortsev. 2019. “Power-Efficient and Shift-Robust Eye-Tracking Sensor for Portable VR Headsets.” In Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications. ETRA ’19. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3314111.3319821.
Kim, Sunwook, Maury A. Nussbaum, and Sophia Ulman. 2018. “Impacts of Using a Head-Worn Display on Gait Performance During Level Walking and Obstacle Crossing.” Journal of Electromyography and Kinesiology 39: 142–48. https://doi.org/10.1016/j.jelekin.2018.02.007.
Kim, YoungIn, Seokhyun Hwang, Jeongseok Oh, and Seungjun Kim. 2024. “GaitWay: Gait Data-Based VR Locomotion Prediction System Robust to Visual Distraction.” In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. CHI EA ’24. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3613905.3651073.
Komogortsev, Oleg V., Young Sam Ryu, and Do Hyong Koh. 2009. “Quick Models for Saccade Amplitude Prediction.” Journal of Eye Movement Research 3 (1): 1–13. https://doi.org/10.16910/jemr.3.1.1.
Lin, Jeffrey Y., Steven Franconeri, and James T. Enns. 2008. “Objects on a Collision Path with the Observer Demand Attention.” Psychological Science 19 (7): 686–92. https://doi.org/10.1111/j.1467-9280.2008.02143.x.
Longstaffe, Kate A., Bruce M. Hood, and Iain D. Gilchrist. 2014. “The Influence of Cognitive Load on Spatial Search Performance.” Attention, Perception, & Psychophysics 76 (1): 49–63. https://doi.org/10.3758/s13414-013-0575-1.
Loschky, Lester C., and George W. McConkie. 2000. “User Performance with Gaze Contingent Multiresolutional Displays.” In Proceedings of the 2000 Symposium on Eye Tracking Research & Applications, 97–103. ETRA ’00. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/355017.355032.
Loschky, Lester C., and Gary S. Wolverton. 2007. “How Late Can You Update Gaze-Contingent Multiresolutional Displays Without Detection?” ACM Transactions on Multimedia Computing, Communications, and Applications 3 (4). https://doi.org/10.1145/1314303.1314310.
Lukashova-Sanz, Olga, and Siegfried Wahl. 2021. “Saliency-Aware Subtle Augmentation Improves Human Visual Search Performance in VR.” Brain Sciences 11 (3). https://doi.org/10.3390/brainsci11030283.
Matthis, Jonathan Samir, Sean L. Barton, and Brett R. Fajen. 2017. “The Critical Phase for Visual Control of Human Walking over Complex Terrain.” Proceedings of the National Academy of Sciences 114 (32): E6720–29. https://doi.org/10.1073/pnas.1611699114.
Matthis, Jonathan Samir, Jacob L. Yates, and Mary M. Hayhoe. 2018. “Gaze and the Control of Foot Placement When Walking in Natural Terrain.” Current Biology 28 (8): 1224–1233.e5. https://doi.org/10.1016/j.cub.2018.03.008.
Matthis, Jonathan S, and Brett R Fajen. 2014. “Visual Control of Foot Placement When Walking over Complex Terrain.” Journal of Experimental Psychology: Human Perception and Performance 40 (1): 106–15. https://doi.org/10.1037/a0033101.
Mayor, Jesus, Pablo Calleja, and Felix Fuentes-Hurtado. 2024. “Long Short-Term Memory Prediction of User’s Locomotion in Virtual Reality.” Virtual Reality 28 (1): 65. https://doi.org/10.1007/s10055-024-00962-9.
Melnyk, Kateryna, Lee Friedman, Dmytro Katrychuk, and Oleg Komogortsev. 2025. “Gaze Prediction as a Function of Eye Movement Type and Individual Differences.” In Proceedings of the 2025 Symposium on Eye Tracking Research and Applications. ETRA ’25. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3715669.3723116.
Moreno-Arjonilla, Jesús, Alfonso López-Ruiz, J. Roberto Jiménez-Pérez, José E. Callejas-Aguilera, and Juan M. Jurado. 2024. “Eye-Tracking on Virtual Reality: A Survey.” Virtual Reality 28 (1): 38. https://doi.org/10.1007/s10055-023-00903-y.
Neider, Mark B., and Gregory J. Zelinsky. 2006. “Scene Context Guides Eye Movements During Visual Search.” Vision Research 46 (5): 614–21. https://doi.org/10.1016/j.visres.2005.08.025.
Niehorster, Diederick C, Li Li, and Markus Lappe. 2017. “The Accuracy and Precision of Position and Orientation Tracking in the HTC Vive Virtual Reality System for Scientific Research.” I-Perception 8 (3): 2041669517708205. https://doi.org/10.1177/2041669517708205.
Nyström, Marcus, and Kenneth Holmqvist. 2008. “Semantic Override of Low-Level Features in Image Viewing–Both Initially and Overall.” Journal of Eye Movement Research 2 (2): 1–11. https://doi.org/10.16910/jemr.2.2.2.
Palacios-Ibáñez, Almudena, Javier Marín-Morales, Manuel Contero, and Mariano Alcañiz. 2023. “Predicting Decision-Making in Virtual Environments: An Eye Movement Analysis with Household Products.” Applied Sciences 13 (12). https://doi.org/10.3390/app13127124.
Palmer, Evan M., Michael J. Van Wert, Todd S. Horowitz, and Jeremy M. Wolfe. 2019. “Measuring the Time Course of Selection During Visual Search.” Attention, Perception, & Psychophysics 81 (1): 47–60. https://doi.org/10.3758/s13414-018-1596-6.
Palmero, Cristina, Oleg V. Komogortsev, Sergio Escalera, and Sachin S. Talathi. 2023. “Multi-Rate Sensor Fusion for Unconstrained Near-Eye Gaze Estimation.” In Proceedings of the 2023 Symposium on Eye Tracking Research and Applications. ETRA ’23. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3588015.3588407.
Patla, Aftab E., and Joan N. Vickers. 2003. “How Far Ahead Do We Look When Required to Step on Specific Locations in the Travel Path During Locomotion?” Experimental Brain Research 148 (1): 133–38. https://doi.org/10.1007/s00221-002-1246-y.
Pelz, Jeff B., and Constantin Rothkopf. 2007. “Oculomotor Behavior in Natural and Man-Made Environments.” In Eye Movements, edited by Roger P. G. Van Gompel, Martin H. Fischer, Wayne S. Murray, and Robin L. Hill, 661–76. Oxford: Elsevier. https://doi.org/10.1016/B978-008044980-7/50033-1.
Platt, Michael L., and Paul W. Glimcher. 1999. “Neural Correlates of Decision Variables in Parietal Cortex.” Nature 400 (6741): 233–38. https://doi.org/10.1038/22268.
Rigas, Ioannis, Hayes Raffle, and Oleg V. Komogortsev. 2017. “Hybrid PS-V Technique: A Novel Sensor Fusion Approach for Fast Mobile Eye-Tracking with Sensor-Shift Aware Correction.” https://arxiv.org/abs/1707.05411.
Rolff, Tim, Susanne Schmidt, Frank Steinicke, and Simone Frintrop. 2023. “A Deep Learning Architecture for Egocentric Time-to-Saccade Prediction Using Weibull Mixture-Models and Historic Priors.” In Proceedings of the 2023 Symposium on Eye Tracking Research and Applications. ETRA ’23. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3588015.3588408.
Rolff, Tim, Niklas Stein, Markus Lappe, Frank Steinicke, and Simone Frintrop. 2022. “Metrics for Time-to-Event Prediction of Gaze Events.” In NeurIPS 2022 Workshop on Gaze Meets ML. https://openreview.net/forum?id=snClL0drE-A.
Rolff, Tim, Frank Steinicke, and Simone Frintrop. 2022. “When Do Saccades Begin? Prediction of Saccades as a Time-to-Event Problem.” In 2022 Symposium on Eye Tracking Research and Applications. ETRA ’22. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3517031.3529627.
Rothkopf, Constantin A., Dana H. Ballard, and Mary M. Hayhoe. 2016. “Task and Context Determine Where You Look.” Journal of Vision 7 (14): 16–16. https://doi.org/10.1167/7.14.16.
Schuetz, Immo, and Katja Fiehler. 2022. “Eye Tracking in Virtual Reality: Vive Pro Eye Spatial Accuracy, Precision, and Calibration Reliability.” Journal of Eye Movement Research 15 (3). https://doi.org/10.16910/jemr.15.3.3.
Schütz, Alexander C., Doris I. Braun, and Karl R. Gegenfurtner. 2011. “Eye Movements and Perception: A Selective Review.” Journal of Vision 11 (5): 9–9. https://doi.org/10.1167/11.5.9.
Schütz, Alexander C., Julia Trommershäuser, and Karl R. Gegenfurtner. 2012. “Dynamic Integration of Information about Salience and Value for Saccadic Eye Movements.” Proceedings of the National Academy of Sciences 109 (19): 7547–52. https://doi.org/10.1073/pnas.1115638109.
Sharma, Mansi, Camilo Andrés Martínez Martínez, Benedikt Emanuel Wirth, Antonio Krüger, and Philipp Müller. 2024. “Distinguishing Target and Non-Target Fixations with EEG and Eye Tracking in Realistic Visual Scenes.” In Proceedings of the 26th International Conference on Multimodal Interaction, 459–68. ICMI ’24. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3678957.3685728.
Shioiri, Satoshi, Masayuki Kobayashi, Kazumichi Matsumiya, and Ichiro Kuriki. 2018. “Spatial Representations of the Viewer’s Surroundings.” Scientific Reports 8 (1): 7171. https://doi.org/10.1038/s41598-018-25433-5.
Sipatchin, Alexandra, Siegfried Wahl, and Katharina Rifai. 2021. “Eye-Tracking for Clinical Ophthalmology with Virtual Reality (VR): A Case Study of the HTC Vive Pro Eye’s Usability.” Healthcare 9 (2). https://doi.org/10.3390/healthcare9020180.
Skavenski, A. A., R. M. Hansen, R. M. Steinman, and B. J. Winterson. 1979. “Quality of Retinal Image Stabilization During Small Natural and Artificial Body Rotations in Man.” Vision Research 19 (6): 675–83. https://doi.org/10.1016/0042-6989(79)90243-8.
Sloot, L. H., M. M. van der Krogt, and J. Harlaar. 2014. “Effects of Adding a Virtual Reality Environment to Different Modes of Treadmill Walking.” Gait & Posture 39 (3): 939–45. https://doi.org/10.1016/j.gaitpost.2013.12.005.
Stein, Niklas, Diederick C Niehorster, Tamara Watson, Frank Steinicke, Katharina Rifai, Siegfried Wahl, and Markus Lappe. 2021. “A Comparison of Eye Tracking Latencies Among Several Commercial Head-Mounted Displays.” I-Perception 12 (1). https://doi.org/10.1177/2041669520983338.
Steinman, Robert M., and Han Collewijn. 1980. “Binocular Retinal Image Motion During Active Head Rotation.” Vision Research 20 (5): 415–29. https://doi.org/10.1016/0042-6989(80)90032-2.
Stevenson, S. B., F. C. Volkmann, J. P. Kelly, and L. A. Riggs. 1986. “Dependence of Visual Suppression on the Amplitudes of Saccades and Blinks.” Vision Research 26 (11): 1815–24. https://doi.org/10.1016/0042-6989(86)90133-1.
Stuphorn, Veit, Tracy L. Taylor, and Jeffrey D. Schall. 2000. “Performance Monitoring by the Supplementary Eye Field.” Nature 408 (6814): 857–60. https://doi.org/10.1038/35048576.
Sugrue, Leo P., Greg S. Corrado, and William T. Newsome. 2004. “Matching Behavior and the Representation of Value in the Parietal Cortex.” Science 304 (5678): 1782–87. https://doi.org/10.1126/science.1094765.
Tatler, Benjamin W., Mary M. Hayhoe, Michael F. Land, and Dana H. Ballard. 2011. “Eye Guidance in Natural Vision: Reinterpreting Salience.” Journal of Vision 11 (5): 5–5. https://doi.org/10.1167/11.5.5.
Thorpe, Simon, Denis Fize, and Catherine Marlot. 1996. “Speed of Processing in the Human Visual System.” Nature 381 (6582): 520–22. https://doi.org/10.1038/381520a0.
Volkmann, Frances C. 1986. “Human Visual Suppression.” Vision Research 26 (9): 1401–16. https://doi.org/10.1016/0042-6989(86)90164-1.
Wei, Shu, Desmond Bloemers, and Aitor Rovira. 2023. “A Preliminary Study of the Eye Tracker in the Meta Quest Pro.” In Proceedings of the 2023 ACM International Conference on Interactive Media Experiences, 216–21. IMX ’23. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3573381.3596467.
Wolf, Christian, and Markus Lappe. 2020. “Top-down Control of Saccades Requires Inhibition of Suddenly Appearing Stimuli.” Attention, Perception, & Psychophysics 82 (8): 3863–77. https://doi.org/10.3758/s13414-020-02101-3.
Wolf, Christian, and Markus Lappe. 2021. “Salient Objects Dominate the Central Fixation Bias When Orienting Toward Images.” Journal of Vision 21 (8): 23–23. https://doi.org/10.1167/jov.21.8.23.
Wolfe, Jeremy M, and Todd S Horowitz. 2017. “Five Factors That Guide Attention in Visual Search.” Nature Human Behaviour 1 (3): 0058. https://doi.org/10.1038/s41562-017-0058.
Wu, Tiffany C., and John K. Tsotsos. 2025. “Real-World Visual Search Goes Beyond Eye Movements: Active Searchers Select 3D Scene Viewpoints Too.” PLOS ONE 20 (7): 1–30. https://doi.org/10.1371/journal.pone.0319719.
Zoest, Wieske van, Mieke Donk, and Jan Theeuwes. 2004. “The Role of Stimulus-Driven and Goal-Driven Control in Saccadic Visual Selection.” Journal of Experimental Psychology: Human Perception and Performance 30 (4): 749–59. https://doi.org/10.1037/0096-1523.30.4.749.