
Face-tracking in Interaction Design for Mobile 3D Environments

Keywords

UX Research, Interaction Design, 3D, Spatial

Type

Coursework for Towards a New Science of Design

Role

Experiment Design, Data Analysis

Team

Xinyi Tang, Robert Nilsen

Year

2023

Can we utilize body language to facilitate more intuitive interactions?

demo video

Background

An observation: We lean and squint when we can’t see well
An assumption: body language indicates needs
Proposition: Can we use subconscious body language for more intuitive interactions?

Project Scoping

Interactions in virtual exhibitions on mobile devices
As museum lovers, Robert and I decided to approach the massive problem of multimodal interaction design through the smaller context of virtual exhibitions.

Though virtual exhibitions are more common nowadays, they are still generally inconvenient to operate and offer a compromised experience. For example, the common design of fixed-point movement limits the exploratory nature of the space. Joystick controls are also often difficult to operate, especially for users who are unfamiliar with game controllers.

While touch-based user interfaces are common, emerging technologies like eye- and face-tracked UIs show potential for enhancing usability and user experience in virtual spaces. Understanding the impact of body tracking on navigating an exhibition and viewing individual artworks is crucial. Although these technologies are slowly becoming more prevalent in VR applications, there is limited knowledge about their effectiveness in mobile virtual environments.

With this specific context, new questions arise:
1) What are the major factors influencing the viewing experience?
2) What are viewers' intuitive reactions to those factors?


Stage 1: Pre-study

What’s intuitive under what conditions?
Our initial observations and research suggested that visual distance and image complexity are two major factors influencing virtual art-viewing experiences in 3D environments. Therefore, we designed the first study to examine participants' facial and physical expressions when exposed to visually distant and highly complex artworks.

We hypothesized that certain expressions, such as squinting and leaning, would naturally occur as viewers attempted to gain a clearer view or understanding of the displayed art. The insights from this preliminary study informed the design of interactions for the second stage of our research.

Experiment design: test images
Image Matrices
- Considering the two variables (distance and complexity), we developed two matrices, each containing eight base images.
- We defined four levels to quantitatively measure the degree of each variable.
- Consequently, we generated a total of 64 images for the study (2 matrices × 8 base images × 4 levels).

Randomized Image Sets
- For each participant, we generated a set of 16 images covering all degrees of both variables
- Each base image appears only once
- Each degree of each variable appears twice, with different base images
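To make the counterbalancing concrete, here is a minimal Python sketch of how such a per-participant set could be drawn. It assumes the two matrices share no base images and that presentation order was also shuffled; the `Stimulus` type and the function names are hypothetical, not the tooling we actually used. Pairing a shuffled list of levels (each level repeated twice) with the eight base images of each matrix satisfies all three constraints at once.

```python
import random
from dataclasses import dataclass

# Parameters taken from the description above.
VARIABLES = ("distance", "complexity")  # one image matrix per variable
BASES_PER_MATRIX = 8                     # eight base images per matrix
LEVELS = (1, 2, 3, 4)                    # four degrees per variable
REPEATS_PER_LEVEL = BASES_PER_MATRIX // len(LEVELS)  # = 2


@dataclass(frozen=True)
class Stimulus:
    variable: str  # which matrix the image comes from
    base: int      # index of the base image within that matrix
    level: int     # degree of the manipulated variable


def full_pool() -> list[Stimulus]:
    """Every variant of every base image: 2 x 8 x 4 = 64 stimuli."""
    return [
        Stimulus(v, b, lvl)
        for v in VARIABLES
        for b in range(BASES_PER_MATRIX)
        for lvl in LEVELS
    ]


def participant_set(rng: random.Random) -> list[Stimulus]:
    """Draw 16 stimuli: each base image once, each level twice per variable."""
    trials = []
    for v in VARIABLES:
        # Two copies of each level, shuffled, then paired with the base images
        # in index order, so no base image is shown twice.
        levels = [lvl for lvl in LEVELS for _ in range(REPEATS_PER_LEVEL)]
        rng.shuffle(levels)
        trials.extend(Stimulus(v, b, lvl) for b, lvl in enumerate(levels))
    rng.shuffle(trials)  # assumption: presentation order is also randomized
    return trials


if __name__ == "__main__":
    rng = random.Random(42)
    print(len(full_pool()))
    for s in participant_set(rng):
        print(s)
```

Seeding a separate `random.Random` per participant would make each set reproducible for later analysis.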

Spatial setup & data collection

Data Analysis (12 participants)
We included qualitative data and interpretations in our participant analyses. One major conclusion drawn from these was that leaning in and squinting were reactions to visually distant artworks.

Image size vs. word count
- Word count: the number of words a participant said about each image
- Word count increased with image size: the larger the image, the higher the word count
- On average, participants had more to say about larger images

Image size vs. viewing time
- Viewing time: how long a participant viewed a specific image (until they indicated they were finished)
- Viewing time increased with image size: the larger the image, the longer the viewing time
- On average, participants spent more time viewing larger images

Image size vs. Lean Percentage
- Lean Percentage: the percentage change in how far a participant leaned toward the computer screen
- Lean percentage decreased with image size: the larger the image, the lower the lean percentage
- On average, participants leaned in significantly more for smaller images
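The lean percentage could be computed along the lines of the Python sketch below. It rests on assumptions: it supposes the setup produced a stream of face-to-screen distance readings for each image, treats the median of the first few readings as the participant's resting baseline, and measures the lean at the closest point reached. The function name and the `baseline_samples` parameter are hypothetical.

```python
from statistics import median


def lean_percentage(distances: list[float], baseline_samples: int = 10) -> float:
    """Percentage reduction in face-to-screen distance relative to a resting baseline.

    Hypothetical definition: the baseline is the median of the first
    `baseline_samples` readings (participant sitting normally), and the lean
    is measured at the closest point reached while viewing the image.
    0 means no lean; larger values mean the participant moved closer.
    """
    if len(distances) < baseline_samples:
        raise ValueError("not enough samples to establish a baseline")
    baseline = median(distances[:baseline_samples])
    closest = min(distances)  # never above the baseline median
    return 100.0 * (baseline - closest) / baseline
```

For example, a participant resting at a distance of 60 who leans to a closest distance of 51 would score 15%. Dividing by the baseline makes the measure unit-free, so readings from different setups stay comparable.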

Overlay percentage vs. word count
- Word count: the number of words a participant said about each image
- Word count increased at overlay percentages between 0.25 and 0.75, compared with full-size images

Overlay percentage vs. viewing time
- Viewing time: how long a participant viewed a specific image (until they indicated they were finished)
- Viewing time increased at overlay percentages between 0.25 and 0.75

Overlay percentage vs. Lean Percentage
- Lean Percentage: the percentage change in how far a participant leaned toward the computer screen
- Lean percentage decreased slightly at overlay percentages between 0.25 and 0.75

Pre-study conclusions
Average Normalized Lean Percentage
- Image size was a relatively good predictor of whether someone leaned in toward the computer screen. The smaller the image, the farther the participant was likely to lean in.
- Across overlay percentages, participants generally leaned in slightly more at high confusion values (0.5). Smaller image sizes produced more leaning in than overlay images did.
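One way to produce an average normalized lean percentage is sketched below in Python. The normalization scheme is an assumption (each participant's leans divided by that participant's largest lean), chosen so that participants who move a lot overall do not dominate the averages; the record layout and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_normalized_lean(
    records: list[tuple[str, int, float]],
) -> dict[int, float]:
    """Average normalized lean percentage per image-size level.

    `records` holds (participant_id, size_level, lean_percentage) tuples.
    Assumed normalization: each participant's values are divided by that
    participant's largest lean, putting everyone on a 0-1 scale.
    """
    max_by_participant: dict[str, float] = defaultdict(float)
    for pid, _, lean in records:
        max_by_participant[pid] = max(max_by_participant[pid], lean)

    by_level: dict[int, list[float]] = defaultdict(list)
    for pid, level, lean in records:
        peak = max_by_participant[pid]
        # A participant who never leaned contributes zeros instead of dividing by 0.
        by_level[level].append(lean / peak if peak > 0 else 0.0)

    # Levels are returned in ascending order for easier plotting.
    return {level: mean(values) for level, values in sorted(by_level.items())}
```

The same grouping could be applied per overlay percentage by swapping the size level for the overlay value in each record.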
Squint occurrence
- Image size was a clear predictor of whether participants squinted when viewing a certain image. The smaller the image, the more likely a participant squinted when attempting to describe it.
- Across overlay percentages, participants generally squinted more at high confusion values (0.5). Smaller image sizes produced more squinting than overlay images did.
General Pre-Study Conclusions
Visually distant artworks (smaller image sizes) primarily provoked squinting, leaning in, and frowning.

Stage 2: Study Test at the Staten Island Museum

Pre-study conclusions informed the design of the Stage 2 study
To integrate these reactions into a user interface while staying within our technical capabilities, we chose squinting and leaning as interface triggers.


Reference & existing work
Existing work on eye expression and VR
In this study, squinting was feasible for 100% of participants, suggesting that it is a common and practical facial expression. Squinting ranked second in preference after blinking, suggesting that this gesture has the potential to be used more often.
Common Interaction Modes for Mobile Device
The common interaction modes all reference physical interaction, suggesting that interaction design grounded in physical habits or intuitions is more acceptable. The paper did not identify a single best interaction mode, suggesting a need for exploration and potential for personalization.

Floor plan of interaction space

Study setup

Data collection (10 participants)

Factorial analysis
- Overall, participants found the face-tracking experience more enjoyable and appealing.
- However, on average, it was perceived as less intuitive and efficient, potentially due to bugs and face-recognition failures.
- The greater variance in the second experience suggests that personal preferences in modality can vary widely.

Preference

Results & Conclusions

Pre-study conclusions:
- Leaning suggests the need for a better view of the artwork.
- Squinting indicates a need for more information and visual detail.
We incorporated these into the UI by:
- Leaning UI Incorporation: Allowing the user to teleport to the “best angle” for viewing the painting or photograph by leaning in toward the mobile device.
- Squinting UI Incorporation: Allowing the user to squint to instantly bring up more information about the artwork and a high-quality image in a visual pop-up.
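The two triggers could be driven by a small per-frame state machine like the Python sketch below. It assumes the face-tracking layer already supplies, for every frame, a squint score between 0 and 1 and a lean percentage; the `GestureTrigger` class, the dwell time, the cooldown, and all thresholds are hypothetical placeholders rather than values from our prototype. Requiring the gesture to be held briefly helps avoid firing on blinks or momentary head movements, and the cooldown keeps a sustained squint from reopening the pop-up repeatedly.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class Action(Enum):
    TELEPORT_TO_BEST_ANGLE = auto()  # triggered by leaning in
    SHOW_ARTWORK_DETAILS = auto()    # triggered by squinting


@dataclass
class GestureTrigger:
    """Fires an action once a signal stays at or above a threshold for a dwell time.

    All thresholds and timings are hypothetical placeholders.
    """
    action: Action
    threshold: float
    dwell_s: float = 0.4     # how long the gesture must be held
    cooldown_s: float = 2.0  # ignore the gesture for a while after firing
    _held_since: Optional[float] = field(default=None, init=False)
    _last_fired: float = field(default=float("-inf"), init=False)

    def update(self, value: float, now_s: float) -> Optional[Action]:
        """Feed one frame's signal value; returns the action when it fires."""
        if value < self.threshold:
            self._held_since = None  # gesture released: reset the dwell timer
            return None
        if self._held_since is None:
            self._held_since = now_s  # gesture just started
        held_long_enough = now_s - self._held_since >= self.dwell_s
        cooled_down = now_s - self._last_fired >= self.cooldown_s
        if held_long_enough and cooled_down:
            self._last_fired = now_s
            self._held_since = None  # require a fresh hold before firing again
            return self.action
        return None
```

For instance, `GestureTrigger(Action.SHOW_ARTWORK_DETAILS, threshold=0.6)` would open the pop-up once the squint score stays at or above 0.6 for 0.4 seconds, and a second instance with a lean-percentage threshold would handle teleporting.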
After incorporating these into the design and interface of the mobile VR environment, we observed that:
1. Spatial confusion is common in 3D environments. It needs to be addressed for a better user experience.
2. Individual preferences for modality vary. Customizability is desirable for more personalized and intuitive interactions.
3. Novelty could be a double-edged sword: it might make the experience more engaging and encourage art viewing for some, or it might become entertaining in itself and distract others from the actual artworks.
4. Calibration is needed for more reliable eye tracking.
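As an illustration of what per-user calibration might involve, the Python sketch below derives a personal squint threshold from a few seconds of neutral-face readings. The mean-plus-k-standard-deviations rule, the clamping bounds, and the function name are all hypothetical assumptions, not something we tested in the study.

```python
from statistics import mean, pstdev


def calibrate_threshold(
    neutral_scores: list[float],
    k: float = 3.0,
    floor: float = 0.3,
    ceiling: float = 0.9,
) -> float:
    """Per-user squint threshold from a short neutral-face recording.

    Hypothetical approach: the threshold sits k standard deviations above the
    user's resting squint score, clamped to [floor, ceiling] so that noisy or
    unusual baselines still produce a usable value.
    """
    if not neutral_scores:
        raise ValueError("calibration needs at least one sample")
    raw = mean(neutral_scores) + k * pstdev(neutral_scores)
    return min(max(raw, floor), ceiling)
```

A user whose eyes naturally read as slightly narrowed would get a higher threshold than someone with a near-zero resting score, which is the kind of individual difference the face-recognition failures above hint at.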

Other findings:

Limitations

Pre-study limitations