Eye Tracking-based LSTM for Locomotion Prediction in VR

Authors: Niklas Stein*, Gianni Bremer* & Markus Lappe (*shared co-first authorship)

This chapter was published at the International Conference on Virtual Reality and 3D User Interfaces (IEEE VR) in 2022

Abstract

Virtual Reality (VR) allows users to perform natural movements such as hand movements, turning the head and natural walking in virtual environments. While such movements enable seamless natural interaction, they come with the need for a large tracking space, particularly in the case of walking. To optimise the use of the available physical space, prediction models for upcoming behaviour are helpful. In this study, we examined whether a user’s eye movements tracked by current VR hardware can improve such predictions. Eighteen participants walked through a virtual environment while performing different tasks, including walking in curved paths, avoiding or approaching objects and conducting a search. The recorded position, orientation and eye tracking features from 2.5 s segments of the data were used to train an LSTM model to predict the user’s position 2.5 s into the future. We found that future positions can be predicted with an average error of 66 cm. The benefit of eye movement data depended on the task and environment. In particular, situations with changes in walking speed benefited from the inclusion of eye data. We conclude that a model utilizing eye tracking data can improve VR applications in which path predictions are helpful.

User perspective of the experiment while walking to a target and avoiding an obstacle.

Introduction

Walking is the most natural and immersive way of moving through a virtual environment (VE) (Steinicke et al. 2013). It enhances presence and allows users to unconsciously acquire spatial knowledge (Usoh et al. 1999; Langbehn, Lubos, and Steinicke 2018). However, walking creates a set of physical problems for VR applications. Collisions with physical walls, objects or other users in real space must always be prevented. Therefore, large VEs require either large tracking areas or methods such as redirected walking (RDW). With methods that estimate intentions or predict future actions of walking users, RDW and other applications, such as programmable interaction patterns of avatars or automatic level design based on user behaviour, could become more effective.

RDW is the subtle, unnoticed redirection of the user to change her trajectory in real space while keeping her perception of movement in virtual space unchanged. RDW is limited by the user’s manipulation threshold (Steinicke et al. 2009). If the visually presented and physical walking paths diverge too much, the user notices the manipulation. Although the detection threshold can be slowly adapted over time through learning, the general limitation remains (Bölling et al. 2019).

The implementation of RDW requires applying these manipulations automatically using RDW controllers. Scripted steering controllers have a predefined set of rules based on given information about the VE and the physical space (Razzaque, Kohn, and Whitton 2001). As a result, RDW manipulations are applied whenever a participant reaches a predefined position. Accordingly, this type of controller needs to be carefully readjusted whenever the VE or the available physical space changes. In the past, several methods to automatize this readjustment process have been developed (Zmuda et al. 2013). The available virtual walking paths can be generated automatically from the environmental data (Zank and Kunz 2017). Equipped with predefined probability scores for each path and a given skeleton map of the virtual and physical space, the controller then chooses the best manipulation from a predefined set during walking. Generalised controllers follow a different approach. Instead of using information about the VE, algorithms such as steer-to-center, steer-to-orbit, steer to way-points, or steer in figure-eight patterns are designed to work in any VE by steering users along specified physical paths (Nilsson et al. 2018).

RDW controllers can greatly benefit from predictions or assumptions about the upcoming behaviour of the user. Given predictions of the user’s virtual path, manipulations can be applied earlier and with a smaller divergence between real and virtual paths. One source for the prediction in unknown environments can be recurring behaviour patterns. For example, assumptions such as typical walking paths (Hutton and Suma 2016) have proven useful in this context. Besides general assumptions, predictions of upcoming locomotor activities may also be gained from observing the user’s prior behaviour and actions, such as the trajectory of the user’s path over the immediate past and the orientation of her head and body during that time.

Another potential source of information is the user’s gaze pattern. When walking, humans typically use specific gaze strategies to inspect their future path, monitor their next target and avoid obstacles (Hollands, Patla, and Vickers 2002; Hayhoe and Ballard 2005; Hart and Einhäuser 2012). Therefore, the pattern of gaze during walking contains valuable information about the user’s locomotor intentions. Gaze tracking has recently been incorporated into commercially available HMDs. Although their current quality does not match that of research-grade eye trackers (Stein et al. 2021; Sipatchin, Wahl, and Rifai 2021; Lohr, Friedman, and Komogortsev 2019), it still appears possible to use gaze information to improve locomotor predictions of VR users (Gandrud and Interrante 2016; Zank and Kunz 2016a, 2016b; Bremer, Stein, and Lappe 2021). Since gaze behaviour depends to some degree on the task (for example, obstacle avoidance differs from walking straight to a target; Rothkopf, Ballard, and Hayhoe 2016), the usefulness of gaze data for walking prediction might likewise depend on the situation.

In the present study, we explore how combinations of position, orientation and gaze data can be used to predict a user’s future position during walking in VR, and how the prediction quality depends on the different data features and task demands. Our prediction model uses an artificial neural network for time series prediction, the long short-term memory (LSTM) architecture (Hochreiter and Schmidhuber 1997), which can be fitted to the relevant data without prior assumptions and has been used successfully, for example, for prediction of public pedestrian traffic (Becker et al. 2018) as well as in RDW (Cho, Lee, and Lee 2018).

Related Previous Work

Gaze Behaviour During Walking

Eye movements are tightly coupled to other motor actions because we need to move our eyes to targets of interest to collect the visual information needed for good action control (Michael F. Land and Tatler 2009). Usually, eye movements to a behavioural target precede any other motor action (Michael F. Land and Hayhoe 2001; Hayhoe and Ballard 2005). Therefore, they present a rich signal for the estimation of action intention (Belardinelli, Stepper, and Butz 2016; Gandrud and Interrante 2016; Zank and Kunz 2016a, 2016b; Bremer, Stein, and Lappe 2021). However, during walking this does not mean that people lock their gaze on a future target, for example a door, at all times. Instead, during walking, eye movements serve to identify both targets and obstacles. For example, walkers often look at the ground a few steps ahead for safe placement of the feet, particularly in uneven terrain (Hollands et al. 1995; Hollands and Marple-Horvat 1996; Calow and Lappe 2008; Hart and Einhäuser 2012; Matthis, Yates, and Hayhoe 2018). This involves not only a direction of gaze towards the ground but also a pitch of the head (Marigold and Patla 2008). When approaching a goal, however, walkers typically direct their gaze towards the goal (Hollands, Patla, and Vickers 2002; Durant and Zanker 2020).

Eye movements are also linked to changes of direction during walking (Hollands, Patla, and Vickers 2002). When walking in a curve, for example, walkers typically direct their gaze towards the inside of the curve (Grasso et al. 1998; Imai et al. 2001). Furthermore, eye movements are involved in deciding between alternative targets (Zank and Kunz 2016a; Wiener, De Condappa, and Holscher 2011) and in searching for targets among distractors (Kit 2014). Thus, eye movements during walking depend on task demands (Tatler and Tatler 2013).

In summary, eye movements during natural behaviour may be a useful, though not straightforward, signal for predicting a user’s intention and future action which could be extracted in deep learning approaches for action prediction. Their usefulness is potentially different for different tasks. For the present investigation, we chose three tasks to cover a set of general locomotor scenarios in which eye movements likely play a role: searching for a target amongst distractors in a room, walking along a curved path and avoiding an obstacle.

Locomotor Prediction

Different methods for the prediction of future trajectories of users in VR have been proposed in the past. One approach relies on the analysis of possible or probable paths in the current environment (Zmuda et al. 2013; Zank and Kunz 2017). Available virtual walking paths are generated automatically from the environmental data and probability scores for each path along with a skeleton map of the virtual and physical space allow, for example, to choose an optimal manipulation for RDW. This approach, however, is tied to the specific environment and needs environmental information.

A different approach focuses on observation of the user’s prior actions to predict the future trajectory. Zank and Kunz (2016a) used eye tracking data to predict the next locomotion target. In their experiment, two predefined target positions were presented and the user was either instructed to go to one of the targets or freely choose one of them. Based on the chosen targets, they also compared different models that use previous movements to calculate probabilities for the two targets based on assumptions about human walking behaviour (Zank and Kunz 2016b; Arechavaleta et al. 2008; Fink, Foo, and Warren 2007) and graph representations of the environment (Nescher, Huang, and Kunz 2014). In particular, in a narrow T-shaped corridor with little open space, models using eye data were able to provide accurate predictions earlier than models without eye data. Later in a trial and in cases with open space, the overall prediction accuracy was higher and including eye data had no benefit for the prediction at those times.

Gandrud and Interrante (2016) also implemented a binary choice between two walking targets: Users walked along a virtual hallway, at the end of which they had to decide between two targets to approach. In the experiment they measured head direction, gaze direction and the position relative to the virtual hallway midline and compared these three measures for predicting the chosen target. They found that both head orientation and gaze orientation had the potential to be useful in predicting a person’s future direction of locomotion.

The use of eye data in these approaches assumes that gaze directly precedes the direction of human walking. Indeed, there is evidence supporting this notion (Hollands, Patla, and Vickers 2002; Brument et al. 2019; Michael F. Land and Tatler 2009; Tuhkanen et al. 2019). However, since eye movements during walking also depend on task demands (Tatler and Tatler 2013), more complex processing of gaze data could allow an even better prediction. Moreover, these pioneering previous studies only distinguished between discrete walking decisions. Further scenarios with fewer restrictions need to be evaluated to advance the use of behavioural measures for locomotor predictions.

A promising current approach to predict future positions is the use of deep learning models. These models can be fitted to the relevant data without prior assumptions. They have been successful, for example, with respect to public pedestrian traffic (Becker et al. 2018; Yu et al. 2020) or for the prediction of future gaze directions (Feng, Liu, and Wei 2020; Xu et al. 2018; Hu et al. 2021, 2020; Cornia et al. 2018). With respect to walking prediction, Cho, Lee, and Lee (2018) presented a preliminary study in which they implemented a deep learning model for locomotion prediction in the context of RDW. They used head position and orientation to train an LSTM model (Hochreiter and Schmidhuber 1997) to predict the user’s position 100 frames (about 1 second) into the future while the user navigated a maze. They report that the prediction worked well for two example users. However, their model is limited to the pre-defined maze map they used and did not include gaze data.

Scope of this Study

In the present work, we created a scenario in which participants fulfilled a range of typical VR tasks, including walking in curved paths, avoiding or approaching objects and conducting a search. Based on the walking and gaze data, we trained and evaluated different indoor path prediction models with an LSTM architecture that had no information about the used VE. We were especially interested in whether a user’s eye movements during walking could be a useful addition to the model. Therefore, we analysed the stability of the eye tracking over the duration of the study, described typical data patterns during different tasks and evaluated under which circumstances eye tracking data contributes to a better prediction of future position.

Methods

Participants

Eighteen participants (8 female) completed the experiment. The participants’ age ranged from 20 to 47 years (\(M = 27, SD = 6.34\)). Participants gave written informed consent and the experimental procedures were approved by the Ethics Committee of the Department of Psychology and Sports Science of the University of Münster. Apart from the two authors who participated in the experiment (N.S. and G.B.), participants were naïve to the purpose of the experiment. Two additional participants who were initially recruited had to be excluded because of failed recordings.

Materials

The virtual environment was displayed in an HTC Vive Pro Eye with a resolution of 1440×1600 pixels per eye at a frame rate of 90 Hz and a nominal FOV of 110°. The experiment was conducted at the VR laboratory of the Department of Psychology and Sports Science of the University of Münster. An area of 6 × 11 m was tracked using 6 Vive Lighthouses 2.0. During the experiment, all positional tracking data were Kalman filtered.

The experiment was run using Unity3D on an MSI GE63VR 7RF Raider notebook equipped with an NVIDIA GTX 1070 graphics card. The notebook was carried in a backpack and supplied with power by a cable from the ceiling. A Vive tracker was attached to the backpack to record the body orientation and a Vive controller was used as input device. Eye tracking data were recorded using the integrated Tobii eye tracker of the Vive Pro Eye with a nominal accuracy of 0.5°–1.1° within a FOV of 20° at an output frequency of 120 Hz and a trackable FOV of 110°.

Eye Tracking Procedure and Questionnaires

At the beginning of the experiment, eye tracking calibration was done using the calibration method provided by Tobii Software.

In the last decade, eye tracking cameras have been included in several commercially available VR headsets (Clay, König, and Koenig 2019). Different aspects like latency, accuracy and precision of eye tracking data in VR have been evaluated (Lohr, Friedman, and Komogortsev 2019). Under optimal conditions, the HTC Vive Pro Eye showed eye tracking data delays of around 50 ms and an accuracy below 1° at central gaze positions, but up to 10° at very peripheral positions (27°). Eye tracking precision varied from 1.4° to 3.5° (Stein et al. 2021; Sipatchin, Wahl, and Rifai 2021). This makes the Vive Pro Eye suitable for a rough online estimation of gaze in VR, although it is not clear how side effects such as slippage through head movements during natural walking affect the measurements. To evaluate possible slippage of the HMD on the head, and therefore the stability of the eye tracking calibration, 16 participants completed an additional, custom-made, simple calibration procedure with 5 fixation positions before and after the experiment.

The simulator sickness questionnaire (Kennedy et al. 1993) (SSQ) was completed before and after each session. Additionally, the participants completed the Slater-Usoh-Steed questionnaire for immersion (Slater, Usoh, and Steed 1994) (SUS) after the session. Both questionnaires were translated into German. The authors did not participate in the questionnaires.

Tasks and Virtual Rooms

The virtual environment was divided into three rooms: the search room (for an example see Figure [fig:FigureCa] in section 4), the transition corridor (example in Figure 4) and the obstacle room (for an example see Figure 1), in which all participants completed ten trials comprising different tasks. To elicit plenty of natural walking in a limited physical space, the search room and the obstacle room were mapped onto the same physical space in an impossible-room scenario (Suma et al. 2012). Whenever a participant moved through the transition corridor and reached the door at its far end, an entry to the room opened and the interior changed. Participants were asked to maintain a natural walking speed during the experiment while performing the following instructed tasks.

Search Room — Free Exploration

In the search room, participants had to look for a search object among six identical-looking distractor objects. One object was placed in the centre; the others formed a hexagon around it (see Figure [fig:FigureCa]). All objects had a random yaw orientation and a distance of 2 m to their nearest neighbours. Whenever an object was reached, the participants could test whether it was the search object by holding the controller close to it while pressing the trigger on the back of the controller. The result was signalled by a red or green light above the object. In addition, a sound was played if the search object had been found.

During the task, the participants were free to decide which object they wanted to go to next and did not know which object was the search object beforehand. The target position was individually pseudo-randomised for each participant. After completing the task, participants walked through the transition corridor to the obstacle room.

Transition Corridor — Curved Path

The transition corridor connected the two other rooms (see Figure 4) and followed a curved path with a radius of 5.5 m. Participants had to walk through the corridor to pass between the rooms. Data from the transition corridor were obtained to investigate walking along a curved path. Since the participants went back and forth between the rooms, ten right curves (search room to obstacle room) and nine left curves (obstacle room to search room) were recorded for each participant.

Obstacle Room — Straight Path and Obstacle Avoidance

In the obstacle room, participants were instructed to walk to a target object while avoiding an obstacle that might be positioned along the way. For each 4 m walk, the participants first positioned themselves on a white field in front of a pole with a red button. Pushing the button with the controller made the pole and button disappear and the target object and the obstacle (a chair) appear. The task had 4 different conditions: obstacle centred, obstacle 30 cm to the left, obstacle 30 cm to the right and no obstacle. During each visit to the obstacle room, these 4 conditions were run in pseudo-randomised order. Figure 1 shows the button, the obstacle and the target at example positions and a typical walking path with footsteps. Note that in the real experiment, the three objects were never visible at the same time. The obstacle was placed in the middle between the button and the target. After selecting the target with the controller (by pressing the trigger at the back while holding it close to the target), the target and obstacle disappeared and a new red button appeared at a new location in the room. The participants then went to the new position and repeated the procedure. After four walks in each trial, the participants returned to the search room via the transition corridor.

Data

All raw data files are freely available from https://osf.io/b43uv/. For the analysis, data were sampled at intervals of 50 ms. Periods with missing data and periods in which the participant remained standing without any locomotion (threshold = 0.15 m/s) were removed. Such periods of prolonged standing occurred, for example, when the participants did not immediately start walking at the beginning of the experiment. Naturally, the eye tracker missed data whenever the participant briefly blinked. In this case, missing eye tracking values were linearly extrapolated from the data before the blink.
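The preprocessing steps described above can be sketched as follows. Only the 50 ms sampling interval, the 0.15 m/s standing threshold and the linear extrapolation over blinks come from the text; function names, the array layout and the exact extrapolation rule are illustrative assumptions.

```python
import numpy as np

def remove_standing(positions, dt=0.05, threshold=0.15):
    """Drop samples in which the horizontal speed falls below the
    locomotion threshold of 0.15 m/s.

    positions: (N, 2) array of x/z head positions sampled every 50 ms.
    Returns only the samples in which the user was walking.
    """
    velocity = np.diff(positions, axis=0) / dt              # m/s between samples
    speed = np.linalg.norm(velocity, axis=1)
    moving = np.concatenate([[True], speed >= threshold])   # keep the first sample
    return positions[moving]

def fill_blinks(gaze, valid):
    """Bridge blink gaps by linear extrapolation from the samples
    recorded just before the blink (one plausible reading of the text).

    gaze:  (N,) array of gaze angles (e.g. yaw) in degrees.
    valid: (N,) boolean array, False during blinks.
    """
    gaze = gaze.copy()
    for i in range(len(gaze)):
        if not valid[i]:
            if i >= 2 and valid[i - 1] and valid[i - 2]:
                gaze[i] = 2 * gaze[i - 1] - gaze[i - 2]     # linear extrapolation
            elif i >= 1:
                gaze[i] = gaze[i - 1]                       # fallback: hold last value
    return gaze
```

In practice such filters would run per recording session, before the data are cut into 2.5 s input segments.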

For the prediction model, we distinguished three categories of data:

  1. Positional data from the Vive’s infrared tracking system.

  2. Orientation angles from the IMU.

  3. Eye tracking data from the Vive Pro Eye’s eye tracking system.

Prediction Model

Features and Labels

The model aimed to predict the user’s location in the VR environment 2.5 s into the future. Thus, the output of our prediction model was defined as the direction vector from the current head position (\(X^H_t, Y^H_t, Z^H_t\)) to the position 2.5 s into the future (\(X^H_{t+2.5 s}, Y^H_{t+2.5 s}, Z^H_{t+2.5 s}\)).

Since our prediction model should be based on user behaviour without information about the environment, it needed to use a coordinate system in a reference-frame attached to the user and not to the environment. For the present study, we used a head-fixed coordinate system, which has shown the best results previously (details in Bremer, Stein, and Lappe (2021)).

To set up this coordinate system, we used the average head orientation in the horizontal plane of the input sequence (\(\overline{\Phi^H_{t-i}}, \overline{\Theta^H_{t-i}}, \overline{\Psi^H_{t-i}}\)) to create a fixed reference yaw angle that was used to describe each input-output-pair. This reference angle, along with its orthogonal directions in the horizontal and vertical planes, provided the axes of the head-fixed coordinate system. Thus, the label direction vectors (\(\vec{F}_t\)) were rotated using the reference yaw angle. Lower case letters are used to express variables in the new coordinate system (e.g., \(f, \psi, \theta\)).

\[\begin{split} \vec{F}_t = (F^X_t,F^Z_t) = (X^H_{t+2.5 s} - X^H_t, Z^H_{t+2.5 s} - Z^H_t) \\ f^x_t=\cos(-\overline{\Psi^H_{t-i}}) F^X_t - \sin(-\overline{\Psi^H_{t-i}}) F^Z_t\\ f^z_t=\sin(-\overline{\Psi^H_{t-i}}) F^X_t + \cos(-\overline{\Psi^H_{t-i}}) F^Z_t \end{split}\]

The index \(i\) in these equations represents the different steps in the input sequence. A total of 7 input features were selected for our model. First, the current two-dimensional velocity of the head in the horizontal plane (\(V^X_{t-i}, V^Z_{t-i}\)) was calculated. Height was not used, as there is no evidence for the relevance of this information with regard to the direction of motion on a plane. These input velocities were rotated using the reference yaw angle to convert them to the head-fixed coordinate system. \[\begin{split} v^x_{t-i} = \cos(-\overline{\Psi^H_{t-i}}) V^X_{t-i} - \sin(-\overline{\Psi^H_{t-i}}) V^Z_{t-i}\\ v^z_{t-i} = \sin(-\overline{\Psi^H_{t-i}}) V^X_{t-i} + \cos(-\overline{\Psi^H_{t-i}}) V^Z_{t-i} \end{split}\]

Second, we added the yaw and pitch angle of the head (\(\Psi^H_t, \Theta^H_t\)) and the direction of eye gaze (\(\Psi^E_t, \Theta^E_t\)) to the list of the features. Both features might be informative since humans usually orient their head to the target and direct their gaze to the floor along the future locomotor path. Third, we added the body tracker yaw (\(\Psi^B_t\)), which was captured by the additional Vive tracker in the subject’s backpack. This provided the model with information about body orientation. The reference angles were subtracted from all angles. Before the final scaling, the angles were wrapped to the range between -180° and 180°, with 0° indicating the reference direction.

\[\begin{split} \psi^H_{t-i} = \Psi^H_{t-i} - \overline{\Psi^H_{t-i}} \\ \theta^H_{t-i} = \Theta^H_{t-i} - \overline{\Theta^H_{t-i}} \\ \psi^B_{t-i} = \Psi^B_{t-i} - \overline{\Psi^H_{t-i}} \\ \psi^E_{t-i} = \Psi^E_{t-i} - \overline{\Psi^H_{t-i}} \\ \theta^E_{t-i} = \Theta^E_{t-i} - \overline{\Theta^H_{t-i}} \end{split}\] The input length was set to 2.5 s (50 samples per input), i.e. the prediction was based on the present data and 2.5 s of past information.
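The label and angle transformations above amount to a planar rotation by the negative reference yaw plus an angle-wrapping step. A minimal sketch, assuming angles in radians for the rotation and degrees for the wrapping (function names are ours, not the authors'):

```python
import numpy as np

def to_head_fixed(F, mean_yaw):
    """Rotate a world-frame XZ vector into the head-fixed frame.

    F:        (2,) vector (F_X, F_Z), e.g. the 2.5 s displacement label
              or a head velocity sample.
    mean_yaw: mean head yaw of the input sequence, in radians.
    Implements the rotation by -mean_yaw from the equations above.
    """
    c, s = np.cos(-mean_yaw), np.sin(-mean_yaw)
    fx = c * F[0] - s * F[1]
    fz = s * F[0] + c * F[1]
    return np.array([fx, fz])

def normalise_angle(angle_deg):
    """Wrap an angle difference (e.g. psi - mean_psi) into [-180, 180),
    so that 0 degrees indicates the reference direction."""
    return (angle_deg + 180.0) % 360.0 - 180.0
```

The same rotation is applied to the velocity features and the label vector, so the model only ever sees motion relative to the mean heading of each input sequence.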

To compensate for possible asymmetries resulting from the architecture of the VE (for example the positioning of the room entrance might have led the participants to favour right or left curves in the search room), every second data pair was mirrored (left-right).

Processing

The data went through two LSTM layers with 64 hidden units, a dropout layer (\(p = 0.3\)) and a dense output layer for the predicted coordinates.

We chose Adam as the optimiser with a learning rate of 0.003 (Kingma and Ba 2014). A weight decay of \(10^{-4}\) was added to prevent overfitting. The model was trained for 20 epochs using the mean squared error as the criterion and a batch size of 64.
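Under the stated hyperparameters, the architecture could look as follows in PyTorch. This is a sketch of the description above, not the authors' code; the placement of the dropout layer and the use of only the last time step are assumptions:

```python
import torch
import torch.nn as nn

class PathLSTM(nn.Module):
    """Sketch of the described model: two LSTM layers with 64 hidden
    units, dropout p = 0.3, and a dense layer producing the 2D
    displacement prediction."""

    def __init__(self, n_features=7, hidden=64, n_out=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.dropout = nn.Dropout(p=0.3)
        self.head = nn.Linear(hidden, n_out)

    def forward(self, x):                     # x: (batch, 50 samples, 7 features)
        out, _ = self.lstm(x)                 # (batch, 50, 64)
        last = self.dropout(out[:, -1, :])    # use the last time step only
        return self.head(last)                # (batch, 2) predicted displacement

model = PathLSTM()
# Training setup as reported: Adam, lr 0.003, weight decay 1e-4, MSE loss,
# batch size 64, 20 epochs (the loop itself is omitted here).
optimiser = torch.optim.Adam(model.parameters(), lr=0.003, weight_decay=1e-4)
criterion = nn.MSELoss()
```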

To obtain a single comprehensible value for estimating the quality of the prediction, we computed the mean distance error (MDE), the distance between the true position and the prediction. To evaluate the influence of eye tracking and IMU data, we also created models without these data and compared them to the main model, which included all features. The model without IMU data also omitted the eye tracking features, because a tracking system equipped with an eye tracker but without an IMU seemed an unlikely use case.
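The MDE follows directly from the predicted and true displacement vectors; a minimal sketch:

```python
import numpy as np

def mean_distance_error(pred, true):
    """Mean Euclidean distance between predicted and true 2D
    displacement vectors, i.e. the MDE used to score the models.

    pred, true: (N, 2) arrays of XZ displacements.
    """
    return float(np.mean(np.linalg.norm(pred - true, axis=1)))
```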

We implemented leave-3-out cross-validation at the group level. One participant’s data were used as validation data and two participants’ data were used as test data. The data of all remaining participants were used to train the model. The validation data were used for hyperparameter optimisation. All data were z-standardised with scalers fitted to the training data.
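A sketch of this group-level scheme, assuming a simple rotation of participants into the validation and test roles (the authors' exact fold assignment is not specified) and z-standardisation fitted on training data only:

```python
import numpy as np

def leave_3_out_splits(participant_ids):
    """Per fold, hold out 3 participants: 1 for validation, 2 for
    testing; the remainder trains the model."""
    ids = list(participant_ids)
    folds = []
    for k in range(0, len(ids), 3):
        held = ids[k:k + 3]
        if len(held) < 3:
            break
        val, test = held[0], held[1:]
        train = [p for p in ids if p not in held]
        folds.append((train, [val], test))
    return folds

def zscore_fit(train):
    """Fit a z-standardisation scaler on training data only and return
    a function that applies it to any split."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    return lambda x: (x - mu) / sd
```

With 18 participants this yields 6 folds of 15 training, 1 validation and 2 test participants each.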

Results

Participants took on average 14 minutes to complete the experiment.

Evaluation of Eye Tracking Stability

The eye tracking calibration analysis revealed an average Euclidean eye tracking error of 1.99° before and 2.07° after the experiment (no significant difference, \(p>.05, t(15) = -0.31\)) across all tested participants. Thus, the overall error stayed small over the course of the experiment. Within-subject tests of fixation positions before and after the experiment revealed a significant mean difference of 1.38° (\(p<0.001, t(15) = 5.37\)). Thus, individually, the eye tracking changed non-systematically by a small amount, probably due to HMD slippage during natural walking. Centred fixation positions showed standard deviations of up to 1.6°. The more peripheral fixation targets at \(\pm\) 15° had standard deviations of up to 2.2° (see Figure 2).
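The within-subject comparison reported above corresponds to a standard paired t-test over per-participant calibration errors; a minimal sketch (illustrative, not the authors' analysis script):

```python
import numpy as np

def paired_t(before, after):
    """Paired t-statistic for per-participant calibration errors
    measured before vs. after the experiment (df = n - 1)."""
    d = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, n - 1
```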


Comparison of custom eye tracking calibration before (left) and after (right) the experiment. The five black asterisks represent the fixation targets. Red crosses show mean fixation positions across all participants (coloured dots) and two standard deviations (ellipses).

Overall Quality of Prediction

The full data set contained 156,076 input-output pairs to train the model. In the 2.5 s labels, the participants travelled an average distance of 165 cm with a mean walking speed of 0.72 m/s. The prediction model using all features (position, orientation and gaze) produced an MDE of 66 cm for a 2.5 s prediction. A model using only position and orientation performed slightly worse, with an MDE of 68 cm. A model that used only position data produced an MDE of 78 cm. For comparison, a linear regression model considering all features reached an MDE of 93 cm (\(SD = 8\) cm) and an extrapolation benchmark based on the most recent positions reached an MDE of 131 cm (\(SD = 16\) cm). Figure 3 shows examples of predicted paths from the full model in the search room and the transition corridor.

Two examples of 15 seconds of sequential path predictions. The gray line depicts the real path that the participant walked. The blue line shows the path predicted by the model. Prediction at each data point was based on the preceding 2.5 seconds of movement.
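The extrapolation benchmark mentioned above can be approximated by projecting the most recent velocity 2.5 s ahead; a sketch under the assumption of constant-velocity extrapolation (the text does not specify the exact benchmark):

```python
import numpy as np

def extrapolation_baseline(recent_positions, horizon=2.5, dt=0.05):
    """Predict the displacement `horizon` seconds ahead from the most
    recent velocity sample (constant-velocity assumption).

    recent_positions: (N, 2) array of XZ positions, 50 ms apart.
    Returns the predicted (2,) displacement vector.
    """
    v = (recent_positions[-1] - recent_positions[-2]) / dt   # latest velocity, m/s
    return v * horizon                                       # predicted displacement
```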

Using the testing method proposed by Nadeau and Bengio (2003) for cross-validated data (alpha level = 0.05), the difference between the LSTM model and the linear model was significant (\(t(5) = -11.08, p < 0.001\)). The Benjamini-Hochberg correction (Benjamini and Hochberg 1995) was used to test the differences between the three LSTM models. The model using only position data produced a significantly larger error than the model using all features (\(t(5) = -6.99, p < 0.01\)) and than the model using only positional and IMU features (\(t(5) = -4.92, p < 0.01\)). The model using all features produced a significantly smaller error than the model using only positional and IMU data (\(t(5) = -3.01, p < 0.05\)). Thus, eye tracking significantly improved prediction quality, even if only by a small amount (2.78%). Further analysis showed that using a GRU model did not lead to a significantly better prediction (details in Bremer, Stein, and Lappe (2021)).
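The corrected resampled t-test of Nadeau and Bengio inflates the fold variance to account for the overlap between training sets across folds; a sketch of the statistic (our implementation, with `n_train` and `n_test` as the per-fold sample counts):

```python
import numpy as np

def corrected_resampled_t(diffs, n_train, n_test):
    """Nadeau-Bengio corrected resampled t-test for cross-validated
    performance differences.

    diffs: per-fold differences in the metric (e.g. MDE) between two models.
    The usual variance term 1/J is inflated by n_test / n_train.
    """
    d = np.asarray(diffs, dtype=float)
    J = d.size
    var = d.var(ddof=1)
    t = d.mean() / np.sqrt((1.0 / J + n_test / n_train) * var)
    return t, J - 1
```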

Analysis of the Different Rooms

Next, we compared the prediction performances of the model in the three rooms to test whether the improvement gained by eye tracking depends on the task and behaviour of the participants.

Search Room

In the search room, participants looked for a search object amongst six distractors. Each object had to be closely approached to check whether it was the search object. On average, the participants walked at a speed of 0.6 m/s and found the search object on the 4th attempt (\(SD = 0.6\)). Typically, after entering the room, participants walked to the object closest to the entrance and next approached one of the outer objects (either clockwise or anti-clockwise). In most trials, participants then tested the object in the centre and afterwards continued with the remaining objects (see Figure [fig:FigureCa]).

For the model trained with all data, the prediction error in the search room was higher than the average error across all rooms (see Table 1). One reason could be the quantity of training data, since only 14.36 % of the data originated in the search room. However, training the model specifically on the search room’s data improved the prediction only marginally (MDE of 84 cm). This suggests that the lower prediction quality in the search room might be related to the particular task performed, which might be more difficult to predict than the tasks in the other rooms.

Table 1: 2.5 s prediction error for different rooms.

                              Test Data
Training Data   Eye Data   Search    Corridor   Obstacle
All             yes        88 cm     68 cm      59 cm
All             no         90 cm     68 cm      62 cm
Search          yes        84 cm     124 cm     103 cm
Search          no         86 cm     120 cm     101 cm
Corridor        yes        116 cm    61 cm      103 cm
Corridor        no         114 cm    60 cm      97 cm
Obstacle        yes        106 cm    106 cm     57 cm
Obstacle        no         105 cm    111 cm     59 cm
All*            yes        88 cm     75 cm      70 cm

*The number of samples in the transition corridor and the obstacle room was artificially lowered to match the number of samples in the search room.

The inclusion of eye data improved the prediction in the search room by a small margin, both in the model trained with all the data and in the model specifically trained on the search room data. Thus, eye movement data provided a benefit for the prediction in the search task.

Results of the experiment’s three parts (all participants). (a) Orthographic projection of the search room with its 7 objects. Each black dot represents a position that one of the participants occupied in that room. The transparency represents path frequency (more frequently walked paths are shown darker). The black spot at the top right shows the entrance to the room. (b) Bird’s eye view of the transition corridor. The black line is the average walking path of all participants (standard deviation in gray). The blue lines show the mean viewing direction at the respective position.
Where did participants look in the obstacle room? The coloured lines represent the relative prevalence of looking at the target, ground, wall, or obstacle as participants walked from the buzzer to the target. Conditions with and without obstacle are indicated by continuous and dashed lines, respectively. Coloured areas between the lines identify phases in which participants looked to the respective object more often when the obstacle was absent than when it was present. Shaded areas indicate phases in which participants looked to the respective object more often when the obstacle was present than when it was absent.

Transition Corridor

In the transition corridor, participants had to walk along a curve to proceed from one room to the other. Participants passed the transition corridor at a mean speed of 0.9 m/s.

As the participants went around the curve, they looked towards the inner wall and eventually at the door (see Figure 4). Hence, gaze was directed to the inside of the curve at almost all times.

The full model trained on all data achieved a prediction error of 68 cm in the transition corridor (see Table 1). Training a model specifically on the transition corridor data, without including data from the other rooms, reduced the error to 61 cm, a noticeable advantage. Note that 28.56 % of the full model data came from the transition corridor.

Eye tracking, on the other hand, did not prove useful in the transition corridor. Whether trained on all data or only on the transition corridor data, the model with eye tracking data gave slightly worse results than the model without it.

Obstacle Room

In the obstacle room, participants had to walk to a target object while avoiding an obstacle along the way (see Figure 1). Participants moved at an average speed of 0.62 m/s through the obstacle room. They passed 90 % of all right obstacles on the right and 87 % of all left obstacles on the left. When the obstacle was placed in the centre, 68 % of all obstacles were passed on the right. For the model, this bias should not matter since half of the data were mirrored.

Every fourth walk contained no obstacle, so participants could walk straight to the target. Comparing gaze data in these trials to trials that contained an obstacle allowed us to examine differences in eye movements depending on whether the obstacle was present. Figure 6 shows the proportion of gazes (gaze prevalence) that were directed to the target (blue), obstacle (green), ground (orange), or walls (pink) as a function of time from the start of the walk. Conditions with an obstacle are plotted as solid lines, conditions without an obstacle as dashed lines. The figure shows that, within the first second of the walk, participants looked mostly at the ground. Then, when no obstacle was present, they looked at the target object and kept their gaze there most of the time. When an obstacle was present, participants looked less frequently to the target, partially because they looked at the obstacle, but also because they looked more often to the ground than when no obstacle was present. Overall, over the course of walking to the target, our participants looked progressively more often at the target and less often at the ground. Right after the start, half of all gazes were directed to the obstacle when an obstacle was present. Gazes to the obstacle became less frequent after three seconds; at that time, in many trials, the obstacle had been passed and was no longer visible.
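Gaze prevalence of this kind can be computed by binning gaze samples over time and taking, per bin, the fraction of samples whose gaze ray hit each object class. A minimal sketch under an assumed data layout (the function name, field layout, and default bin size are illustrative, not from the study's analysis code):

```python
from collections import Counter

def gaze_prevalence(samples, bin_size=0.5):
    """samples: iterable of (time_s, label) gaze samples, where label
    is the object class hit by the gaze ray ('target', 'obstacle',
    'ground' or 'wall').  Returns {bin_start: {label: fraction}}."""
    bins = {}
    for t, label in samples:
        start = bin_size * int(t // bin_size)
        bins.setdefault(start, []).append(label)
    return {start: {lab: n / len(labels)
                    for lab, n in Counter(labels).items()}
            for start, labels in bins.items()}

# Example: within the first half-second, three ground samples
# and one target sample.
prev = gaze_prevalence([(0.1, 'ground'), (0.2, 'ground'),
                        (0.3, 'ground'), (0.4, 'target')])
# prev[0.0]['ground'] == 0.75
```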

The MDE in the obstacle room (59 cm) was smaller than the mean error over all rooms (see Table 1). When the model was trained specifically on obstacle room data, the error was even smaller (57 cm). 57.08 % of the data originated in the obstacle room.

The inclusion of eye data provided a small advantage to predictions, both when data from all rooms was used for training and when only the obstacle room was used.

Comparison Between Rooms

When comparing the rooms, the best prediction result was achieved in the obstacle room, followed by the transition corridor and the search room. While there were differences in the amount of data collected in the different rooms, the prediction results still held when the amount of data was artificially reduced to the same number of observations in all rooms (see Table 1). Training the model on only a single room’s data slightly improved the predictions in that room at the expense of prediction accuracy in the other rooms.

Dependence of Eye Tracking Benefit on Locomotor and Gaze Behaviour

Eye movements provided the biggest benefits in the search and obstacle rooms. This indicates that task and behaviour can influence the importance of eye tracking data for the model.

Figure 7a shows the improvement in prediction error for a model with eye movements compared to a model without eye movements as a function of the mean acceleration in the 2.5 s input data segment. Colours indicate the different rooms, transparency shows the amount of data that was available. Prediction quality improved when acceleration was larger, most notably in the obstacle room. This shows that eye movements were particularly useful when the data segment contained high acceleration. Figure 7b shows how the prediction error varies with the mean acceleration in the 2.5 s data segment that was to be predicted. Here, prediction quality was best when the user decelerated in the obstacle room. This suggests that eye movements are most useful for prediction in situations in which the user is likely to stop, presumably because they are close to the target. The search room and the transition corridor did not show such dependencies. However, in these rooms, large accelerations and decelerations were less likely to occur, because they were not part of the typical task-related behaviour there.

Besides the acceleration in the walking data, gaze data were also useful for identifying situations in which eye tracking reduces the prediction error. Figure 7c shows the improvement in prediction error for a model with eye movements compared to a model without eye movements as a function of the gaze distance in the 2.5 s input data segment, i.e. the distance to the object at which the user currently looks. In the obstacle room, prediction quality benefited most from eye tracking when the gaze target was close. This tendency was also present in the search room, but not in the transition corridor.

Added value of eye tracking in the model. (a) shows the model advantage as a function of the acceleration in the input sequence position data. A Gaussian rolling window (standard normal distribution) with a width of 0.25 m/s² was used to smooth the data. (b) shows the model advantage as a function of the acceleration in the true output paths, using the same rolling window. (c) shows the model advantage as a function of the gaze length, i.e. the distance between the observer and the point where their gaze hits the world, at the moment of prediction. Here, a Gaussian rolling window with a width of 1.25 m was used. The transparency of the line colours indicates the number of observations that factored into each data point.
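The Gaussian rolling window referred to in this caption amounts to a kernel-weighted rolling average of the y-values along the x-axis. A minimal sketch; the exact kernel parameterisation is not specified here, so mapping the window width to two standard deviations is an assumption:

```python
from math import exp

def gaussian_smooth(x, y, window):
    """Gaussian-weighted rolling average of y over x.
    Assumption: `window` spans two standard deviations of the
    kernel (sigma = window / 2); the original parameterisation
    may differ."""
    sigma = window / 2.0
    smoothed = []
    for xi in x:
        weights = [exp(-((xj - xi) ** 2) / (2 * sigma ** 2)) for xj in x]
        total = sum(weights)
        smoothed.append(sum(w * yj for w, yj in zip(weights, y)) / total)
    return smoothed

# Symmetric data stays symmetric: the middle of a linear ramp
# is unchanged by the symmetric kernel.
out = gaussian_smooth([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], 1.0)
# out[1] == 1.0
```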

Questionnaires

The mean SSQ total simulator sickness score increased from 7.48 (\(SD = 7.70\)) before the experiment to 20.36 (\(SD = 34.55\)) after the experiment. However, mean values around 20 should not automatically be attributed to a bad simulator in novice VR users (Bimberg, Weissker, and Kulik 2020). Therefore, and because we did not notice a high occurrence of motion sickness when talking to the participants after the experiment, we interpret the increase as a typical result of an experiment in which novice users performed a 14-minute physical task (walking with a VR backpack). Moreover, such physical activity leads to increased sweating, which is also recorded as a sickness symptom in the SSQ.

The results from the SUS questionnaire indicated that the users perceived the presented VE as immersive. Participants scored an average of 4.99 (\(SD = 1.28\), \(min = 2.83\), \(max = 7\)).

Discussion

We investigated whether data obtained from eye tracking devices in current VR hardware can be used to enhance an LSTM model of locomotor path prediction for natural walking in VR. The model was trained on position, orientation and eye tracking data from a free walking scenario, in which users performed different tasks in differently structured rooms, and predicted future walking positions. The full model produced good predictions that exceeded those of linear regression and simple extrapolation models.

To evaluate the use of eye tracking in this model, we compared it against models that used only position and orientation, or only position data, for training. The model including eye tracking significantly improved prediction compared to those models, albeit only by a small amount overall. We evaluated the impact of eye tracking data in the model specifically for the different rooms, tasks, and walking behaviours. We found eye tracking benefits in search and obstacle avoidance tasks, and especially in situations in which users changed their walking velocity.

Quality of Eye Tracking in VR

A prerequisite for the predictive utility of eye data in VR is the quality and stability of the eye tracker. Different currently available eye trackers have different temporal resolutions, making them more or less usable for studies needing temporal accuracy in the millisecond range (Stein et al. 2021). For the present study, temporal accuracy was less critical since we averaged data over 50 ms intervals and based our prediction on data segments of 2.5 s. Since locomotor behaviour in walking is usually smooth, a small temporal lag of the eye tracker would not be detrimental in our prediction scenario. However, during extended periods of walking, on average 14 minutes in our study, spatial accuracy of the eye tracking might deteriorate, for example, if the HMD slips on the head during user movement.

To check eye tracking quality, we measured fixations to a standard set of five fixation targets before and after the experiment. We evaluated both the average error before and after the experiment and the within-subject change of the fixation positions before and after. There was an average error of 2° in the eye tracking data within our 30° FOV, which did not increase significantly after the experiment. The average within-subject change of the fixation positions was 1.38°. Therefore, we consider the stability of the eye tracking suitable for our purpose of measuring general directions of gaze in walking experiments. While this is encouraging for the use of eye tracking in VR, we believe that applications that require eye positions should include such a before-and-after calibration check.
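The angular error between a measured gaze direction and the direction to a fixation target follows from the dot product of the two direction vectors. A minimal sketch (illustrative, not the actual calibration-check code used in the study):

```python
from math import acos, degrees, sqrt

def angular_error_deg(gaze_dir, target_dir):
    """Angle in degrees between a measured 3D gaze direction and
    the direction to a fixation target (vectors need not be
    normalised)."""
    dot = sum(g * t for g, t in zip(gaze_dir, target_dir))
    norm = (sqrt(sum(g * g for g in gaze_dir))
            * sqrt(sum(t * t for t in target_dir)))
    # Clamp to avoid acos domain errors from floating-point noise.
    return degrees(acos(max(-1.0, min(1.0, dot / norm))))

# A gaze ray about 2 degrees off a straight-ahead target:
# angular_error_deg((0.0, 0.0349, 0.9994), (0.0, 0.0, 1.0)) ≈ 2.0
```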

Gaze and Locomotor Behaviour in VR

Gaze behaviour during locomotion in the real world shows certain characteristics, like looking on the ground in front in cluttered environments (Patla and Vickers 1997; Calow and Lappe 2008; Hart and Einhäuser 2012; Matthis, Yates, and Hayhoe 2018), looking at the target in approach (Hollands, Patla, and Vickers 2002; Durant and Zanker 2020), or looking towards the inside of a curve (Grasso et al. 1998; Imai et al. 2001). The observed behaviour in the different rooms is consistent with these specifics, suggesting that participants acted naturally in these virtual tasks.

The walking speed in the transition corridor was higher than in the other rooms. This might be explained by the lack of task-relevant object interaction, which allowed the participants to pass through quickly. In the search and obstacle rooms, participants walked more slowly and interacted with objects more deliberately. In the obstacle room, participants glanced more frequently at the ground in the first 3 s of their way to the target when an obstacle was present, consistent with looks to the ground in natural behaviour when negotiating a path containing obstacles (Patla and Vickers 1997; Calow and Lappe 2008; Hart and Einhäuser 2012; Matthis, Yates, and Hayhoe 2018). As the obstacle blocked the view towards the target, participants may have had to look sideways towards their path. Since gaze precedes the path (Hollands et al. 1995; Matthis, Yates, and Hayhoe 2018), it is also possible that the curves needed to avoid the obstacle necessitated smaller gaze shifts here. In the search room, the systematic search patterns shown in Figure [fig:FigureCa] revealed that short distances were preferred over arbitrary trial and error.

Use of Eye Tracking for Prediction

Our results show that eye tracking can help to estimate future actions. Gandrud and Interrante (2016) previously showed that gaze data are useful when predicting choice between two potential walking targets in a VR hallway. Likewise, Zank and Kunz (2016a) identified eye data as especially advantageous for binary walking path predictions in long, narrow environments. We expanded their results by following a more general approach, not limited to binary predictions. This provides the potential to be applicable to a broader range of environments and applications. Users were instructed to walk freely in the VE. Our aim was to predict their future position in space. Therefore, a direct comparison of the performance of our model to the prediction methods in those previous papers is not possible with the current data set. However, it is possible to compare whether eye tracking data were a significantly useful addition for the prediction in the different studies. Eye movements provide useful information when approaching a target (Hollands, Patla, and Vickers 2002; Durant and Zanker 2020). This fits well with the results of the binary prediction studies. However, when walking in a curve with no task, the head orientation is likely to include the same information as the gaze (Grasso et al. 1998; Imai et al. 2001). Accordingly, gaze provided no benefit for the model using only data from the transition corridor.

In the obstacle room, an advantage from the inclusion of eye data was observed for decelerating paths and paths following accelerated motion. It seems possible that the presence of the obstacle influenced the motion and gaze patterns in a way that eye data improved model performance. However, since the obstacle was only looked at for a short time (see Figure 6), it is not clear if this alone has caused this effect. The advantage could also be caused by a change in gaze behaviour when stopping in front of the target. However, the obstacle room was the only room where stopping was part of the task. Therefore, we cannot clearly differentiate if eye movements were most beneficial for predictions during stop-and-go, obstacle avoidance or a combination of both tasks.

The link between gaze and locomotor activity in natural behaviour and VR suggests that eye tracking might be useful for predicting upcoming user actions. The results from our LSTM prediction model confirm that the addition of eye tracking data can provide a significant benefit in predicting future walking paths. Overall, the MDE of the best model including gaze data was 66 cm for a 2.5 s prediction. The average distance walked during 2.5 s was 165 cm. It remains to be evaluated in future studies how effective this prediction is for different applications.

Possible Application Scenarios

For walking in VR, a particular application scenario is RDW, in which users are steered along a physical trajectory that differs from the virtual trajectory to optimally use the physical space available for tracking (Razzaque, Kohn, and Whitton 2001). In dynamic RDW, one has to decide in which direction and by how much the user should be redirected at any given point in time. Knowledge of where the user most likely intends to go can be advantageous for implementing RDW quickly and effectively (Nilsson et al. 2018). Since our prediction model worked well in different rooms and tasks and since it only uses data directly from the user, it can be considered a useful tool for a generalised RDW controller. Future user studies are needed to evaluate whether an RDW controller using eye tracking data for redirection outperforms previous approaches regarding the needed number of resets and the user experience in different physical and VR scenarios.

A very simple way to include our prediction in RDW is to focus on whether a user, at a particular point in time, should be redirected leftward or rightward. In such a case, a simple left/right prediction of the user’s intent could be sufficient. To quantify the quality of such a prediction in our model, we took the predicted future positions at each point in time and subjected them to a simple left/right discrimination. The results showed that the model could predict whether the user will turn left or turn right with an accuracy of 84.8 %. A model explicitly trained to distinguish between left curves and right curves would likely achieve even better accuracy. Additionally, there might be some potential for synergistic effects with other methods, such as following behaviour (Nguyen, Wüest, and Kunz 2020) to improve performance even further.
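One way to reduce a predicted future position to such a left/right decision is to take the sign of the 2D cross product between the current heading and the displacement to the predicted position. A minimal sketch (a hypothetical implementation, not the exact discrimination procedure used in the study):

```python
def predicted_turn(position, heading, predicted_position):
    """Classify a predicted future position as 'left' or 'right'
    of the user's current heading via the sign of the 2D cross
    product.  All arguments are (x, y) tuples; heading is a
    direction vector."""
    dx = predicted_position[0] - position[0]
    dy = predicted_position[1] - position[1]
    cross = heading[0] * dy - heading[1] * dx
    # Positive cross product: displacement is counter-clockwise
    # of the heading, i.e. to the left.
    return 'left' if cross > 0 else 'right'

# Heading along +x; a predicted position at (1, 1) lies to the left:
# predicted_turn((0, 0), (1, 0), (1, 1)) == 'left'
```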

Our method of locomotion prediction might also be useful in other scenarios. For example, the control of non-player characters could benefit from user prediction to avoid collisions. Moreover, valid predictions of upcoming behaviour could help determine the objects and locations on which computational resources should be focused, for example to increase their responsiveness or level of detail.

An important feature of our method is that it relies exclusively on egocentric data and does not need information about the environment. This makes the method very versatile. By using inside-out tracking, head-worn IMUs and eye trackers, the data for prediction can be gathered by sensors worn by the user, not only in VR. For example, similar models could likely be trained to provide rough predictions of locomotion intentions in augmented reality scenarios. Moreover, the anticipation of human actions, such as walking, can also play a key role in the development of assistive robots (Koppula and Saxena 2015).

Limitations

Although we tried to create an experiment with a representative sample of typical VR walking tasks, our data set is limited to our selection of rooms and tasks. This could compromise the performance of our model in new contexts, as movements associated with our tasks could be disproportionately represented in the data set. This effect is already observable in our results, since the models trained without data from some rooms produced worse results in those rooms than the full model (see Table 1). While the overall smaller amount of data might have decreased prediction accuracy, the discrepancy between known and unknown environments indicates that the transfer from one of our rooms to the others was only successful to a limited extent. This suggests that the performance of our model is likely to drop when it is used with tasks that were not included in the training phase. However, even these higher errors are still far below the distances travelled. Therefore, we think that our model’s path prediction can be used in new situations as long as an error of about 1 m relative to the real path is tolerable for the application.

Another limitation is concerned with the predicted variable we used in our model. Minimizing the error between the real and the predicted end position of a 2.5 s subset of data gave us valuable information about the range of MDE and the incremental value of gaze data for the prediction. However, it does not include information on the deviation between the predicted and the true trajectory during the walk. Prediction of the full trajectory could be valuable for some application scenarios and therefore should be considered in extended path prediction models in the future.

Conclusion

Eye tracking data from current VR hardware improves deep learning path prediction for natural walking in VR. Eye tracking benefits appear especially in situations in which the user interacts with the virtual environment during locomotion. Including eye tracking data while considering the user’s task and behaviour is a useful tool for deep learning path prediction.

Declaration of Conflicting Interests

The author(s) declare no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

This work was supported by the German Research Foundation (DFG La 952-4-3, La 952-7) and has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 951910.

References

Arechavaleta, Gustavo, Jean-Paul Laumond, Halim Hicheur, and Alain Berthoz. 2008. “An Optimality Principle Governing Human Walking.” IEEE Transactions on Robotics 24 (1): 5–14. https://doi.org/10.1109/TRO.2008.915449.
Becker, Stefan, Ronny Hug, Wolfgang Hübner, and Michael Arens. 2018. “An Evaluation of Trajectory Prediction Approaches and Notes on the Trajnet Benchmark.” arXiv:1805.07663. https://doi.org/10.48550/arXiv.1805.07663.
Belardinelli, Anna, Madeleine Y. Stepper, and Martin V. Butz. 2016. “It’s in the Eyes: Planning Precise Manual Actions Before Execution.” Journal of Vision 16 (1): 18–18. https://doi.org/10.1167/16.1.18.
Benjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society: Series B (Methodological) 57 (1): 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.
Bimberg, Pauline, Tim Weissker, and Alexander Kulik. 2020. “On the Usage of the Simulator Sickness Questionnaire for Virtual Reality Research.” In 2020 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 464–67. https://doi.org/10.1109/VRW50115.2020.00098.
Bölling, Luke, Niklas Stein, Frank Steinicke, and Markus Lappe. 2019. “Shrinking Circles: Adaptation to Increased Curvature Gain in Redirected Walking.” IEEE Transactions on Visualization and Computer Graphics 25 (5): 2032–39. https://doi.org/10.1109/TVCG.2019.2899228.
Bremer, Gianni, Niklas Stein, and Markus Lappe. 2021. “Predicting Future Position from Natural Walking and Eye Movements with Machine Learning.” In 2021 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR), 19–28. IEEE. https://doi.org/10.1109/AIVR52153.2021.00013.
Brument, Hugo, Iana Podkosova, Hannes Kaufmann, Anne Hélène Olivier, and Ferran Argelaguet. 2019. “Virtual Vs. Physical Navigation in VR: Study of Gaze and Body Segments Temporal Reorientation Behaviour.” In 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), 680–89. IEEE. https://doi.org/10.1109/VR.2019.8797721.
Calow, D., and M. Lappe. 2008. “Efficient Encoding of Natural Optic Flow.” Network: Computation in Neural Systems 19 (3): 183–212. https://doi.org/10.1080/09548980802368764.
Cho, Yong-Hun, Dong-Yong Lee, and In-Kwon Lee. 2018. “Path Prediction Using LSTM Network for Redirected Walking.” In 2018 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), 527–28. IEEE. https://doi.org/10.1109/VR.2018.8446442.
Clay, Viviane, Peter König, and Sabine Koenig. 2019. “Eye Tracking in Virtual Reality.” Journal of Eye Movement Research 12 (1). https://doi.org/10.16910/jemr.12.1.3.
Cornia, Marcella, Lorenzo Baraldi, Giuseppe Serra, and Rita Cucchiara. 2018. “Predicting Human Eye Fixations via an LSTM-Based Saliency Attentive Model.” IEEE Transactions on Image Processing 27 (10): 5142–54. https://doi.org/10.1109/TIP.2018.2851672.
Durant, Szonya, and Johannes M Zanker. 2020. “The Combined Effect of Eye Movements Improve Head Centred Local Motion Information During Walking.” PloS One 15 (1): e0228345. https://doi.org/10.1371/journal.pone.0228345.
Feng, Xianglong, Yao Liu, and Sheng Wei. 2020. “LiveDeep: Online Viewport Prediction for Live Virtual Reality Streaming Using Lifelong Deep Learning.” In 2020 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), 800–808. https://doi.org/10.1109/VR46266.2020.00104.
Fink, Philip W, Patrick S Foo, and William H Warren. 2007. “Obstacle Avoidance During Walking in Real and Virtual Environments.” ACM Transactions on Applied Perception (TAP) 4 (1): 2–es. https://doi.org/10.1145/1227134.1227136.
Gandrud, Jonathan, and Victoria Interrante. 2016. “Predicting Destination Using Head Orientation and Gaze Direction During Locomotion in VR.” In ACM Symposium on Applied Perception, SAP 2016, 31–38. Association for Computing Machinery, Inc. https://doi.org/10.1145/2931002.2931010.
Grasso, Renato, Pascal Prévost, Yuri P Ivanenko, and Alain Berthoz. 1998. “Eye-Head Coordination for the Steering of Locomotion in Humans: An Anticipatory Synergy.” Neuroscience Letters 253 (2): 115–18. https://doi.org/10.1016/S0304-3940(98)00625-9.
Hart, Bernard Marius ’t, and Wolfgang Einhäuser. 2012. “Mind the Step: Complementary Effects of an Implicit Task on Eye and Head Movements in Real-Life Gaze Allocation.” Experimental Brain Research 223 (2): 233–49. https://doi.org/10.1007/s00221-012-3254-x.
Hayhoe, Mary M., and D Ballard. 2005. “Eye Movements in Natural Behavior.” Trends Cogn. Sci. 9 (4): 188–94. https://doi.org/10.1016/j.tics.2005.02.009.
Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. “Long Short-Term Memory.” Neural Computation 9 (8): 1735–80. https://doi.org/10.1162/neco.1997.9.8.1735.
Hollands, Mark A, and D E Marple-Horvat. 1996. “Visually Guided Stepping Under Conditions of Step Cycle-Related Denial of Visual Information.” Experimental Brain Research 109 (2): 343–56. https://doi.org/10.1007/BF00231792.
Hollands, Mark A, Dilwyn E Marple-Horvat, Sebastian Henkes, and Andrew K Rowan. 1995. “Human Eye Movements During Visually Guided Stepping.” Journal of Motor Behavior 27 (2): 155–63. https://doi.org/10.1080/00222895.1995.9941707.
Hollands, Mark A, Aftab E Patla, and Joan N Vickers. 2002. “‘Look Where You’re Going!’: Gaze Behaviour Associated with Maintaining and Changing the Direction of Locomotion.” Experimental Brain Research 143 (2): 221–30. https://doi.org/10.1007/s00221-001-0983-7.
Hu, Zhiming, Andreas Bulling, Sheng Li, and Guoping Wang. 2021. “FixationNet: Forecasting Eye Fixations in Task-Oriented Virtual Environments.” IEEE Transactions on Visualization and Computer Graphics 27 (5): 2681–90. https://doi.org/10.1109/TVCG.2021.3067779.
Hu, Zhiming, Sheng Li, Congyi Zhang, Kangrui Yi, Guoping Wang, and Dinesh Manocha. 2020. “DGaze: CNN-Based Gaze Prediction in Dynamic Scenes.” IEEE Transactions on Visualization and Computer Graphics 26 (5): 1902–11. https://doi.org/10.1109/TVCG.2020.2973473.
Hutton, Courtney, and Evan Suma. 2016. “A Realistic Walking Model for Enhancing Redirection in Virtual Reality.” In 2016 IEEE Virtual Reality (VR), 183–84. IEEE. https://doi.org/10.1109/VR.2016.7504714.
Imai, Takao, Steven T. Moore, Theodore Raphan, and Bernard Cohen. 2001. “Interaction of the Body, Head, and Eyes During Walking and Turning.” Experimental Brain Research 136 (1): 1–18. https://doi.org/10.1007/s002210000533.
Kennedy, Robert S, Norman E Lane, Kevin S Berbaum, and Michael G Lilienthal. 1993. “Simulator Sickness Questionnaire: An Enhanced Method for Quantifying Simulator Sickness.” The International Journal of Aviation Psychology 3 (3): 203–20. https://doi.org/10.1207/s15327108ijap0303_3.
Kingma, Diederik P, and Jimmy Ba. 2014. “Adam: A Method for Stochastic Optimization.” arXiv Preprint arXiv:1412.6980. https://hdl.handle.net/11245/1.505367.
Kit, Dmitry, Leor Katz, and Brian Sullivan. 2014. “Eye Movements, Visual Search and Scene Memory, in an Immersive Virtual Environment.” PLOS ONE 9 (4): 1–11. https://doi.org/10.1371/journal.pone.0094362.
Koppula, Hema S, and Ashutosh Saxena. 2015. “Anticipating Human Activities Using Object Affordances for Reactive Robotic Response.” IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (1): 14–29. https://doi.org/10.1109/TPAMI.2015.2430335.
Land, Michael F, and Mary M. Hayhoe. 2001. “In What Ways Do Eye Movements Contribute to Everyday Activities?” Vision Research 41 (25-26): 3559–65. https://doi.org/10.1016/S0042-6989(01)00102-X.
Land, Michael F., and Benjamin W. Tatler. 2009. “Locomotion on Foot.” In, 100–115. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198570943.003.0006.
Langbehn, Eike, Paul Lubos, and Frank Steinicke. 2018. “Evaluation of Locomotion Techniques for Room-Scale VR: Joystick, Teleportation, and Redirected Walking.” In Proceedings of the Virtual Reality International Conference-Laval Virtual, 1–9. https://doi.org/10.1145/3234253.3234291.
Lohr, Dillon J, Lee Friedman, and Oleg V Komogortsev. 2019. “Evaluating the Data Quality of Eye Tracking Signals from a Virtual Reality System: Case Study Using SMI’s Eye-Tracking HTC Vive.” arXiv Preprint arXiv:1912.02083. https://doi.org/10.48550/arXiv.1912.02083.
Marigold, Daniel S, and Aftab E Patla. 2008. “Visual Information from the Lower Visual Field Is Important for Walking Across Multi-Surface Terrain.” Experimental Brain Research 188 (1): 23–31. https://doi.org/10.1007/s00221-008-1335-7.
Matthis, Jonathan Samir, Jacob L. Yates, and Mary M. Hayhoe. 2018. “Gaze and the Control of Foot Placement When Walking in Natural Terrain.” Current Biology 28 (8): 1224–1233.e5. https://doi.org/10.1016/j.cub.2018.03.008.
Nadeau, Claude, and Yoshua Bengio. 2003. “Inference for the Generalization Error.” Machine Learning 52 (3): 239–81. https://doi.org/10.1023/A:1024068626366.
Nescher, Thomas, Ying-Yin Huang, and Andreas Kunz. 2014. “Planning Redirection Techniques for Optimal Free Walking Experience Using Model Predictive Control.” In 2014 IEEE Symposium on 3D User Interfaces (3DUI), 111–18. IEEE. https://doi.org/10.1109/3DUI.2014.6798851.
Nguyen, Anh, Pascal Wüest, and Andreas Kunz. 2020. “Human Following Behavior in Virtual Reality.” In 26th ACM Symposium on Virtual Reality Software and Technology, 1–3. https://doi.org/10.1145/3385956.3422099.
Nilsson, Niels Christian, Tabitha Peck, Gerd Bruder, Eri Hodgson, Stefania Serafin, Mary Whitton, Frank Steinicke, and Evan Suma Rosenberg. 2018. “15 Years of Research on Redirected Walking in Immersive Virtual Environments.” IEEE Computer Graphics and Applications 38 (2): 44–56. https://doi.org/10.1109/MCG.2018.111125628.
Patla, Aftab E, and Joan N Vickers. 1997. “Where and When Do We Look as We Approach and Step over an Obstacle in the Travel Path?” Neuroreport 8 (17): 3661–65. https://doi.org/10.1097/00001756-199712010-00002.
Razzaque, Sharif, Zachariah Kohn, and Mary C. Whitton. 2001. “Redirected Walking.” In Eurographics 2001 - Short Presentations. Eurographics Association. https://doi.org/10.2312/egs.20011036.
Rothkopf, Constantin A., Dana H. Ballard, and Mary M. Hayhoe. 2007. “Task and Context Determine Where You Look.” Journal of Vision 7 (14): 16–16. https://doi.org/10.1167/7.14.16.
Sipatchin, Alexandra, Siegfried Wahl, and Katharina Rifai. 2021. “Eye-Tracking for Clinical Ophthalmology with Virtual Reality (VR): A Case Study of the HTC Vive Pro Eye’s Usability.” Healthcare 9 (2). https://doi.org/10.3390/healthcare9020180.
Slater, Mel, Martin Usoh, and Anthony Steed. 1994. “Depth of Presence in Virtual Environments.” Presence: Teleoperators & Virtual Environments 3 (2): 130–44. https://doi.org/10.1162/pres.1994.3.2.130.
Stein, Niklas, Diederick C Niehorster, Tamara Watson, Frank Steinicke, Katharina Rifai, Siegfried Wahl, and Markus Lappe. 2021. “A Comparison of Eye Tracking Latencies Among Several Commercial Head-Mounted Displays.” I-Perception 12 (1). https://doi.org/10.1177/204166952098333.
Steinicke, Frank, Gerd Bruder, Jason Jerald, Harald Frenz, and Markus Lappe. 2009. “Estimation of Detection Thresholds for Redirected Walking Techniques.” IEEE Transactions on Visualization and Computer Graphics 16 (1): 17–27. https://doi.org/10.1109/TVCG.2009.62.
Steinicke, Frank, Yon Visell, Jennifer Campos, and Anatole Lécuyer. 2013. Human Walking in Virtual Environments. Vol. 2. Springer. https://doi.org/10.1007/978-1-4419-8432-6.
Suma, Evan A., Zachary Lipps, Samantha Finkelstein, David M. Krum, and Mark Bolas. 2012. “Impossible Spaces: Maximizing Natural Walking in Virtual Environments with Self-Overlapping Architecture.” IEEE Transactions on Visualization and Computer Graphics 18 (4): 555–64. https://doi.org/10.1109/TVCG.2012.47.
Tatler, Benjamin W., and Sarah L. Tatler. 2013. “The Influence of Instructions on Object Memory in a Real-World Setting.” Journal of Vision 13 (2): 5–5. https://doi.org/10.1167/13.2.5.
Tuhkanen, Samuel, Jami Pekkanen, Paavo Rinkkala, Callum Mole, Richard M Wilkie, and Otto Lappi. 2019. “Humans Use Predictive Gaze Strategies to Target Waypoints for Steering.” Scientific Reports 9 (1): 1–18. https://doi.org/10.1038/s41598-019-44723-0.
Usoh, Martin, Kevin Arthur, Mary C Whitton, Rui Bastos, Anthony Steed, Mel Slater, and Frederick P Brooks Jr. 1999. “Walking > Walking-in-place > Flying, in Virtual Environments.” In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, 359–64. https://doi.org/10.1145/311535.311589.
Wiener, Jan, Olivier De Condappa, and Christoph Hölscher. 2011. “Do You Have to Look Where You Go? Gaze Behaviour During Spatial Decision Making.” Proceedings of the Annual Meeting of the Cognitive Science Society 33. https://escholarship.org/uc/item/9n91h72n.
Xu, Yanyu, Yanbing Dong, Junru Wu, Zhengzhong Sun, Zhiru Shi, Jingyi Yu, and Shenghua Gao. 2018. “Gaze Prediction in Dynamic 360° Immersive Videos.” In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5333–42. https://doi.org/10.1109/CVPR.2018.00559.
Yu, Cunjun, Xiao Ma, Jiawei Ren, Haiyu Zhao, and Shuai Yi. 2020. “Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction.” In European Conference on Computer Vision, 507–23. Springer. https://doi.org/10.1007/978-3-030-58610-2_30.
Zank, Markus, and Andreas Kunz. 2016a. “Eye Tracking for Locomotion Prediction in Redirected Walking.” In 2016 IEEE Symposium on 3D User Interfaces (3DUI), 49–58. IEEE. https://doi.org/10.1109/3DUI.2016.7460030.
Zank, Markus, and Andreas Kunz. 2016b. “Where Are You Going? Using Human Locomotion Models for Target Estimation.” The Visual Computer 32 (10): 1323–35. https://doi.org/10.1007/s00371-016-1229-9.
Zank, Markus, and Andreas Kunz. 2017. “Optimized Graph Extraction and Locomotion Prediction for Redirected Walking.” In 2017 IEEE Symposium on 3D User Interfaces (3DUI), 120–29. IEEE. https://doi.org/10.1109/3DUI.2017.7893328.
Zmuda, Michael A, Joshua L Wonser, Eric R Bachmann, and Eric Hodgson. 2013. “Optimizing Constrained-Environment Redirected Walking Instructions Using Search Techniques.” IEEE Transactions on Visualization and Computer Graphics 19 (11): 1872–84. https://doi.org/10.1109/TVCG.2013.88.