Computer vision enables camera data to be used in user interfaces to analyze the 3-D context and automatically detect the user's intentions. Using cameras as an input modality enables single-handed operation, in which the user's actions are recognized without interacting with the screen or keypad. In this context, we have built a mobile application prototype that determines the user's position and gaze in real time, a technique that enables the display of true three-dimensional objects even on a typical 2-D LCD screen. We define a series of interaction methods in which the user's motion and camera input realistically control the viewpoint on a 3-D scene. Head movement and gaze can be used to interact with hidden objects in a natural manner, simply by looking at them. We describe the embedded implementation at the system level, highlighting the application development challenges and trade-offs that must be dealt with on battery-powered mobile devices. The implementation includes a parallel pipeline that reduces the application's latency.
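The abstract does not say exactly how the tracked head position drives rendering. One common way to make a flat display act as a window onto a 3-D scene is head-coupled (off-axis) perspective projection. The following is a minimal sketch of that technique under stated assumptions, not the prototype's implementation. It assumes a screen centred at the origin in the z = 0 plane, a head position in metres supplied by some face tracker (z > 0 in front of the screen), and OpenGL `glFrustum` conventions. The names `head_coupled_frustum` and `frustum_matrix` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Frustum:
    left: float
    right: float
    bottom: float
    top: float
    near: float
    far: float


def head_coupled_frustum(eye, screen_w, screen_h, near=0.01, far=10.0):
    """Asymmetric frustum through the physical screen edges as seen from `eye`.

    Assumption: the screen is centred at the origin in the z = 0 plane and
    `eye` = (x, y, z) is the tracked head position in metres, z > 0.
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("eye must be in front of the screen (z > 0)")
    # Similar triangles: scale the screen edges (relative to the eye)
    # from distance ez down to the near-plane distance.
    scale = near / ez
    return Frustum(
        left=(-screen_w / 2 - ex) * scale,
        right=(screen_w / 2 - ex) * scale,
        bottom=(-screen_h / 2 - ey) * scale,
        top=(screen_h / 2 - ey) * scale,
        near=near,
        far=far,
    )


def frustum_matrix(f):
    """glFrustum-style projection matrix as row-major nested lists.

    The view transform (not shown) would translate the scene by -eye so
    that the virtual camera sits at the tracked head position.
    """
    l, r, b, t, n, fa = f.left, f.right, f.bottom, f.top, f.near, f.far
    return [
        [2 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, -(fa + n) / (fa - n), -2 * fa * n / (fa - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]
```

With the head centred, the frustum is symmetric. Moving the head to the right skews the frustum so that both of its horizontal edges lie to the left of the eye. This reveals more of the scene behind the left edge of the display, much as looking through a real window would.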