Text Input
In visionOS, users can look at an input field and start talking to enter text. Based on Apple's presentation, it looks like this will always replace any text that was previously entered into the input field.
I wanted to explore whether it would be possible to allow audio input wherever the caret (the blinking line) is currently positioned, so that it could also be used for composing longer pieces of text and for switching back and forth between keyboard and audio input on the fly.
Here’s how that could work.
The caret in visionOS could have multiple states.
You would still have the traditional blinking state that shows you where your current focus is. Traditionally, carets blink to make it easier for your eye to find them, and that's still true in visionOS.
In addition to that, you would have a “gaze” caret that would follow your eye in order to move your focus to a different position inside the text. Its color is de-emphasized and it has a shadow to indicate that it’s not yet been placed into the input field. Once you tap your fingers together, the caret would be placed into the spot you’re looking at.
And then there’s a third state - the “active” caret - which appears when you’re looking at where the focussed caret is currently positioned. In this state, meaning as long as you’re looking at it, the caret is not blinking, because it doesn’t need to call attention to itself.
Instead, it shows a little animated indicator that responds in real time to the sounds of your environment that are picked up via your microphone, to indicate that it’s ready for your voice input. Once you start talking, the words are transcribed in real time and entered into the input field at that position. To prevent accidental input from conversations in your environment, the system could use voice detection to only transcribe when it hears your voice.
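To make the three caret states and the dictation behavior more concrete, here is a minimal sketch in Python. It is not visionOS API. All names (CaretState, visible_carets, on_pinch, insert_transcript) are hypothetical, text positions are simplified to character indices, and the is_user_voice flag stands in for whatever voice detection the system would provide. It also assumes that the placed caret keeps blinking while the gaze caret previews a new position.

```python
from enum import Enum


class CaretState(Enum):
    BLINKING = "blinking"  # placed caret, user is looking elsewhere
    GAZE = "gaze"          # de-emphasized preview caret that follows the eyes
    ACTIVE = "active"      # placed caret, user is looking directly at it


def visible_carets(placed_index, gaze_index):
    """Return the carets to draw as (state, index) pairs.

    placed_index: where the caret currently sits in the text.
    gaze_index: the text index the user is looking at, or None when
    the gaze is outside the input field.
    """
    if gaze_index is None:
        return [(CaretState.BLINKING, placed_index)]
    if gaze_index == placed_index:
        return [(CaretState.ACTIVE, placed_index)]
    # Assumption: the placed caret stays visible next to the gaze preview.
    return [(CaretState.BLINKING, placed_index), (CaretState.GAZE, gaze_index)]


def on_pinch(placed_index, gaze_index):
    """Tapping the fingers together places the caret at the gaze position."""
    return placed_index if gaze_index is None else gaze_index


def insert_transcript(text, caret, words, is_user_voice):
    """Insert transcribed words at the caret and return (new_text, new_caret).

    Speech that is not attributed to the user is ignored, so nearby
    conversations do not end up in the input field.
    """
    if not is_user_voice or not words:
        return text, caret
    new_text = text[:caret] + words + text[caret:]
    return new_text, caret + len(words)
```

For example, insert_transcript("Hello world", 6, "big ", True) returns ("Hello big world", 10), leaving the existing text on both sides of the caret untouched. The same call with is_user_voice set to False returns the text and caret unchanged.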
Draggable Elements
There is no hover state on visionOS.
And while there have never been hover states on mobile either, there’s a difference between iOS and visionOS.
One part of the input method in visionOS, eye tracking, does physically allow for hover states, since the device knows about your selection (where you’re looking) before you perform the action. Apple just decided not to communicate your eye position to apps & websites for privacy reasons.
But the OS has that information.
This is interesting, because it means that apps & websites can’t design their own hover states, which they could use to indicate what interactions are possible for the element you’re hovering over. But the system can automatically apply its own style to that element, which it does with a subtle glow for tappable elements.
And since the system knows not just what you’re looking at but also if the element you’re looking at is clickable, or scrollable, or draggable, etc., there's an opportunity here to define how visionOS could visually indicate the type of possible interaction of the object you’re looking at.
The equivalent of macOS cursor styles for a gaze-based input method.
On macOS, the cursor can change its appearance to communicate the type of possible interaction of the object behind it, for example dragging. Since there is no traditional cursor on visionOS, the only other place to display such an indication is the affected element itself.
Here’s what this could look like.
To indicate that a UI element is draggable, the system could display a drag handle, similar to the one below every window in visionOS, below the element you’re looking at.
Once you look directly at this drag handle, and tap and hold your fingers, the element would move with your eyes, which act as the cursor. Once you spread your fingers again, the element would be placed into the new position.
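The drag interaction described above can be sketched as a small state holder. This is an illustration in Python, not a visionOS API. DragSession and its method names are hypothetical, positions are plain (x, y) tuples, and only dragging via the handle is modeled.

```python
from dataclasses import dataclass


@dataclass
class DragSession:
    position: tuple          # current (x, y) of the element
    dragging: bool = False   # True while the fingers are held together

    def pinch_down(self, looking_at_handle):
        # A drag only starts if the user is looking at the drag handle.
        if looking_at_handle:
            self.dragging = True

    def gaze_moved(self, gaze_point):
        # While dragging, the element follows the eyes, which act as the cursor.
        if self.dragging:
            self.position = gaze_point

    def pinch_up(self):
        # Spreading the fingers drops the element at its current position.
        self.dragging = False
```

A session that starts with pinch_down(True) moves the element on every gaze_moved call until pinch_up is called. Gaze movement before the pinch or after the release leaves the position unchanged.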
In addition to using the drag handle, elements should most likely also allow the user to drag them around by looking at and dragging the element itself. In the future, this might even be the more common way. But I can imagine that a new input method, such as the one in visionOS, might have to help users get accustomed to certain patterns by displaying dedicated UI controls.