Just as the keyboard alone became an inadequate tool for interacting with the computer once the console was replaced by the two-dimensional desktop, the mouse and keyboard in their current state are insufficient for the navigation and interaction needs of an environment that is three-dimensional in both function and appearance. However, it is not apparent how one should handle input into a three-dimensional operating environment (3D-OE). The input method must be both convenient for the user and as intuitive as possible. In the same way that the mouse/keyboard system extends the keyboard standing on its own, the input methods of a 3D-OE should try to extend the intuition, if not the physical realization, of current input methods.

While we are sure that the ideas that follow nowhere near exhaust the possibilities for input into a 3D-OE, we do believe that they illustrate some reasonable paths that input can take. We choose to ignore methods such as virtual reality systems, which require more physical commitment from the user than two hands. While these may be useful for research, games, and other applications, they would not be a reasonable way of interacting with an operating system. They demand too much of the user, who must be immersed in the computer and is therefore less sensitive to external stimuli. This demand for complete attention reduces the usefulness of the computer as a tool, and complete immersion would have negative social impacts in the long run.

The mouse/keyboard system easily captures the full potential of a two-dimensional operating environment (2D-OE). In a simplified view, one can say that the mouse is responsible for most navigation and interaction and the keyboard is used for editing text.[1] The reason that the mouse is sufficient for interacting with a 2D world is clear: the movement of the mouse on a desk can be described using the same XY coordinate system that describes the desktop. It is equally clear that using the mouse in this traditional way cannot even begin to exploit the possibilities of a 3D-OE. To interact with a 3D system, the roles of the input device, whether a mouse or something else, must change. The questions then become "How should input devices act and what should they be?" and "Can their behavior be changed and still remain intuitive to the user of a 2D-OE?"

3D interfaces are quite common in first-person shooter (FPS) games. The player controls a character as if looking through its eyes, looking around with the mouse and moving through the world with keyboard controls. The mouse is essentially mapped to pitch[2] and yaw[3] coordinates, so moving the mouse left and right swings the view left and right, much as if a person stood in one spot and looked left and right, while moving it forward and back tilts the view up and down.
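
The mouse-to-view mapping described above can be sketched in a few lines. This is only an illustration, not code from any particular game: the names `View` and `apply_mouse_look`, the sensitivity constant, the pitch limit, and the sign conventions are all assumptions made for the example.

```python
from dataclasses import dataclass

SENSITIVITY = 0.1   # degrees of rotation per mouse count (assumed value)
PITCH_LIMIT = 89.0  # assumed limit so the view cannot flip over the top


@dataclass
class View:
    yaw: float = 0.0    # degrees, rotation around the Y axis
    pitch: float = 0.0  # degrees, rotation around the X axis


def apply_mouse_look(view: View, dx: float, dy: float) -> View:
    """Return the view after a mouse movement of (dx, dy) counts.

    Assumed conventions: positive dx is mouse-right and turns the view
    right; positive dy is mouse-toward-the-user and tilts the view down.
    """
    yaw = (view.yaw + dx * SENSITIVITY) % 360.0
    pitch = view.pitch - dy * SENSITIVITY
    # Clamp pitch so the user cannot look past straight up or down.
    pitch = max(-PITCH_LIMIT, min(PITCH_LIMIT, pitch))
    return View(yaw=yaw, pitch=pitch)
```

Note that the mouse contributes only two rotational values here; position in the world is untouched, which is why the keyboard is needed for movement.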

Since most FPS games do not involve characters with the ability to fly, movement is limited to a single plane: keyboard keys are mapped to forward, back, left and right, so that pressing the forward key moves the player forward in the direction they are looking, along the floor they are standing on. By adding a single jump button, this system allows a player to use a standard mouse and keyboard to move around fairly freely in three dimensions. A few games, most notably Descent (in which the player flies a spacecraft around underground mines fighting robots), allow true 3D movement. By adding four extra keys, for moving up and down and for rolling[4], they give the player six-degrees-of-freedom movement.[5]
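
As a rough sketch of the planar movement just described, the function below turns a set of pressed movement keys into a displacement on the floor plane. The function name, the key names, and the axis conventions (Y up, yaw 0 facing into the screen along -Z, positive yaw turning right) are assumptions for illustration only.

```python
import math


def planar_step(yaw_deg: float, keys: set, speed: float = 1.0) -> tuple:
    """Return the (dx, dz) floor-plane displacement for the pressed keys.

    Pitch is deliberately ignored, so looking up or down never lifts the
    player off the floor. Diagonal input is not normalised in this sketch.
    """
    forward = ("forward" in keys) - ("back" in keys)   # -1, 0 or 1
    strafe = ("right" in keys) - ("left" in keys)      # -1, 0 or 1
    yaw = math.radians(yaw_deg)
    # Unit vectors for "ahead" and "to the right" at the current yaw.
    fx, fz = math.sin(yaw), -math.cos(yaw)
    rx, rz = math.cos(yaw), math.sin(yaw)
    dx = (forward * fx + strafe * rx) * speed
    dz = (forward * fz + strafe * rz) * speed
    return dx, dz
```

For example, `planar_step(0, {"forward"})` moves the player straight into the screen, while the same key at a yaw of 90 degrees moves the player along the X axis instead.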

A better method of full 3D movement can be achieved with most joysticks. Moving the stick itself provides basic yaw/pitch control. Many joysticks also have a twist axis on the handle, giving roll movement, and a "hat" under the thumb, a small control that can be pushed in different directions to give XY axis movement. While this still lacks a Z axis control, it allows mostly complete 3D movement with only one hand, as compared to the standard two-handed mouse/keyboard setup commonly used for FPS games, and it uses semi-common and relatively cheap hardware. True 6DOF devices do exist, but they are generally used only in professional settings and can cost upwards of $400. Joysticks, by contrast, are not uncommon among computer gamers and can be found for under $50.
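
The joystick mapping described above could look roughly like the following. The `Motion6` record and the `joystick_to_motion` function are hypothetical names, and the axis values are assumed to arrive already normalised (stick and twist in the range -1 to 1, hat as -1, 0 or 1 per direction); real joystick APIs differ in how they report these.

```python
from dataclasses import dataclass


@dataclass
class Motion6:
    """One six-degrees-of-freedom movement command."""
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    roll: float = 0.0
    pitch: float = 0.0
    yaw: float = 0.0


def joystick_to_motion(stick_x, stick_y, twist, hat_x, hat_y) -> Motion6:
    """Map assumed joystick readings onto a movement command."""
    return Motion6(
        x=hat_x,        # hat left/right slides along the X axis
        y=hat_y,        # hat up/down slides along the Y axis
        z=0.0,          # no control is left over for the Z axis
        roll=twist,     # twisting the handle rolls the view
        pitch=stick_y,  # stick forward/back tilts the view
        yaw=stick_x,    # stick left/right turns the view
    )
```

Five of the six fields are driven by one hand; the constant `z` makes the missing axis explicit.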

Interacting with a 3D interface is different from playing a game, however. The main difference is that, in addition to moving around in the interface world, the user needs a way to interact with individual elements in the world. This requires an accurate position indicator that is independent of the viewpoint movement, a role probably best fulfilled by the traditional mouse cursor from current 2D interfaces. Different options exist depending on what equipment the user has attached to his computer. For all configurations we consider, a normal 2D monitor, keyboard and mouse are assumed present.

The most basic scenario involves nothing extra, just a standard keyboard and mouse. This should be the default configuration and, at present, is what many people would use. In this situation, no part of either device can be reserved for 3D movement. The mouse is needed to interact with the elements in the world via the mouse cursor, and many applications may use the entire keyboard, so no keys can be reserved for movement. Few options provide the necessary controls under these constraints, but two are considered here.

First, there could be elements of the interface with which the user interacts in order to move about. This leaves the mouse in complete control of the cursor. An example of this type of configuration is having four (or six) arrows on screen, represented as icons; clicking on an arrow moves the user in that direction. To change the viewpoint, there may be more arrow icons, or pushing the mouse cursor against the edge of the screen could rotate the view in that direction. This method would allow the user to navigate his way around the system and has the added advantage of being relatively easy to figure out. However, it has the disadvantage of being somewhat cumbersome: to move or change viewpoint, the user would have to move the cursor to these arrows or to the edge of the screen. This would make using the system much more laborious and would discourage users whose goal is to use the computer in the most efficient manner.

Another possibility is to have different modes of operation, with the input devices behaving differently depending on the current mode. For example, separate interface and movement modes would allow the user to move around more easily in movement mode, by mapping the mouse to the viewpoint and keyboard keys to movement commands, as in most FPS games. In interface mode, the mouse would control the cursor and keyboard input would be processed by the current application. Though this method also seems less intuitive than using a mouse in a traditional 2D-OE, using multiple modes to increase functionality is not a new idea. An example of physically switching modes is the almost intuitive ability to switch between the mouse and the keyboard in current systems. Mode switching is also common in the text editor vim, whose two most used modes are "Normal" mode, where keystrokes are interpreted as commands, and "Insert" mode, where they are interpreted as text to be added to the file. While both of these systems can seem unintuitive at first, users quickly adapt to them. If implemented in a simple manner, the duality between movement and interface modes in a 3D-OE could be adapted to just as readily.
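
A minimal sketch of such a two-mode scheme follows. The class name `InputRouter`, the returned tuples, and the choice of Scroll Lock as the toggle are all assumptions for illustration; a mouse button or key chord could serve as the switch instead, which would avoid taking a key away from applications.

```python
from enum import Enum


class Mode(Enum):
    INTERFACE = "interface"
    MOVEMENT = "movement"


TOGGLE_KEY = "scroll_lock"  # hypothetical choice of mode switch


class InputRouter:
    """Decide where each input event goes, based on the current mode."""

    def __init__(self):
        self.mode = Mode.INTERFACE

    def on_key(self, key):
        if key == TOGGLE_KEY:
            # Flip between the two modes; the key itself is consumed.
            if self.mode is Mode.INTERFACE:
                self.mode = Mode.MOVEMENT
            else:
                self.mode = Mode.INTERFACE
            return ("mode", self.mode.value)
        if self.mode is Mode.MOVEMENT:
            return ("move", key)          # e.g. forward/back/left/right
        return ("application", key)       # passed on to the focused program

    def on_mouse(self, dx, dy):
        # The same physical motion drives the view or the cursor.
        target = "viewpoint" if self.mode is Mode.MOVEMENT else "cursor"
        return (target, dx, dy)
```

Usage: a fresh `InputRouter()` routes `on_key("w")` to the application; after `on_key("scroll_lock")` the same key press is treated as a movement command, and mouse motion turns the viewpoint rather than moving the cursor.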

Another scenario involves using a second mouse in addition to the first.[6] The second mouse could be mapped to 3D movement while the first retains control over the cursor, allowing 3D movement and interface interaction at the same time. The movement would be limited, however, since the keyboard would need to remain in interface mode, passing commands to applications. This could be helped somewhat by using buttons on the second mouse to change how its movement is interpreted (e.g., by default the mouse moves the viewpoint in yaw/pitch motions, but holding down the right mouse button makes mouse motion be interpreted as XY axis movement).
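
The button-modified interpretation suggested in the parenthesis above might be sketched as follows. The function name, the dictionary keys, and the sign convention (pushing the mouse away from the user reports a negative dy) are assumptions, and the sketch presumes the system can tell the two mice apart.

```python
def interpret_second_mouse(dx, dy, right_button_held):
    """Interpret motion of the second (movement) mouse.

    By default the motion rotates the viewpoint; while the right button
    is held, the same motion slides the viewpoint along the X and Y axes.
    """
    if right_button_held:
        return {"translate_x": dx, "translate_y": -dy}
    return {"yaw": dx, "pitch": -dy}
```

Even with the modifier, this covers only four of the six degrees of freedom; further buttons would be needed for Z axis movement and roll.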

A third scenario uses a joystick for movement. Many joysticks provide fairly complete 3D movement without the aid of a keyboard, so a joystick gives good 3D movement without resorting to tricks such as those needed with a second mouse. It also leaves the mouse and keyboard in complete control of the interface, so cumbersome on-screen buttons or separate interface/movement modes do not have to be used.

The second mouse and joystick methods have similar disadvantages. Both would seem to work best with the movement device in the off-hand (the hand normally on the keyboard when using the mouse). This would result in the user having to move both hands around a lot. Though this happens to some degree in current systems, the off-hand resting on the keyboard acts both as an anchor for when the mouse hand returns to the keyboard and as a tool for issuing commands that supplement those given by the mouse. For power users, moving the off-hand to a movement device might eliminate many of the benefits that shortcut keys provide.[7] Another problem is that most mice and joysticks are made to fit the right hand. While left-handed people have always had a problem with this, they can adapt to using their right hand for the mouse or for a joystick in games. Trying to use two mice, or a mouse and a joystick, at the same time, however, requires a left-handed or handedness-neutral device in the left hand. The third disadvantage is that the extra device would take up extra desk space, a potential problem for people who already have barely enough room for the various components of their computers.

A final method of dealing with input into a 3D-OE would be to develop completely new tools. To be most convenient, these tools should be one-handed, as mice currently are. They should also try to limit their movement to two dimensions, i.e., they should not leave the surface of the desk; movement off the desk would be less comfortable for the user and harder to control precisely. There are many forms these devices could take, and it will take much thought and experimentation to develop a tool that is convenient and takes full advantage of the possibilities of a 3D-OE. While such tools may be the preferred method of interaction in the future, for the present it would be most beneficial to focus on methods that involve existing hardware that can be easily attached to a standard computer.

Movement in a three-dimensional system is necessarily more complex than movement in a two-dimensional system. Thus the mouse, which was able to act as a tool for both navigation and interaction in the two-dimensional system, cannot as easily perform both functions in the three-dimensional system. This can be addressed in several ways: the mouse can be given dual modes, it can be supplemented by another tool, or completely new tools can be created. Each of these methods has its own advantages and disadvantages. Ideally, a 3D-OE should be able to handle more than one of these input methods, and the user should be able to choose the one most suited to his needs. Eventually, common use will dictate which of these is the "best way". Until then, we can only try to come up with ideas and to understand them both as technology and as tools for human interaction.


Footnotes
  1. Of course, those who frequently use computers know that much of the manipulation and navigation done with the mouse can be done (or at least closely mirrored) using the keyboard.
  2. Rotation around the X axis.
  3. Rotation around the Y axis.
  4. Rotation about the Z axis. The Z axis points out of the screen.
  5. Six-degrees-of-freedom, or 6DOF, consists of X, Y, Z, roll, pitch and yaw.
  6. Many interfaces currently deal with a second mouse by giving both mice control over the cursor. In terms of adding more software functionality, this isn't very helpful.
  7. Shortcut keys can usually be used with one hand on the keyboard and the other on the mouse.