Virtual Concert Performance - Synthetic Animated Musicians Playing in an Acoustically Simulated Room

Rami Hänninen, Lauri Savioja, and Tapio Takala

Department of Computer Science
Helsinki University of Technology
Otakaari 1, FIN02150 Espoo, Finland




This paper describes a system that implements a virtual visual and acoustic model of a closed space, such as a concert hall. The system features an animated musician playing a flute, physically modeled flute sound synthesis, visual and acoustical room models, auralization, and a real-time graphical user interface. The user can move in the modeled space and experience the visual and aural environment.

Keywords: Virtual Reality, Animation, Physical Modeling, Room Acoustics, Auralization

Figure 1. DIVA system overview and information flowgraph.

1 Introduction

We have developed a software and hardware system called DIVA (Digital Virtual Acoustics) for producing virtual audiovisual performances in real time. The whole processing chain, from sound synthesis through sound propagation in a room to auralization at the listener's ears, is implemented and combined with synchronized animation of music players and lighting of the hall.

The final result is a fully synthetic concert in which an animated musician plays a physically modeled instrument in an acoustically modeled hall. Instrument fingerings for the musician model are generated automatically, and inverse kinematics calculations synchronize the finger motions so that each note is performed exactly on an animated instrument model.

The acoustics of the enclosing room are simulated from a geometric model and material information of the room. The sound signal from the instrument and concert hall models is auralized to the listener, who can move freely in the virtual concert hall.

This paper describes the overall structure of the system, and the MIDI-based animation control of virtual musicians. Detailed discussion of physical instrument modeling, as well as auralization and room acoustics simulation can be found in other papers [Huopaniemi et al. 1994, Välimäki et al. 1996, Savioja et al. 1996].

2 Overview and Related Work

Modern computer graphics and signal processing systems can create realistic audiovisual representations for Virtual Reality [see e.g. Kalawsky 1993, Begault 1994]. In animation, motion has been generated from MIDI music by procedural control [Lytle 1990], and sound effects have been generated by physically based simulation [Takala and Hahn 1992]. However, only a few systems [Cook 1995] have combined these with musician modeling; the closest to our work is the virtual performer of [Katayose et al. 1993].

The system components and the overall information flow are shown in figure 1. The upper half of the figure shows the audio stream while the lower half shows the visual stream.

Our sound sources are physical synthesis models of instruments. Several approaches to physical modeling can be found in [Smith 1996, Välimäki and Takala 1996]. We use the waveguide method, currently implemented for flute and plucked string instruments such as acoustic guitar, banjo and mandolin [Välimäki et al. 1995].
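As a rough illustration of the waveguide idea, the sketch below implements the Karplus-Strong plucked string, which is commonly regarded as the simplest special case of a digital waveguide string: a delay line whose length sets the pitch, with a lossy averaging filter in the feedback loop. It is not the DIVA instrument model. The function name `pluck_string`, the `damping` value, and the noise-burst excitation are assumptions chosen for illustration.

```python
import random


def pluck_string(frequency_hz, duration_s, sample_rate=44100, damping=0.996, seed=0):
    """Karplus-Strong plucked string (illustrative, not the DIVA model).

    A delay line of one pitch period is filled with noise (the "pluck") and
    then recirculated through a two-point average scaled by `damping`.
    """
    rng = random.Random(seed)
    # Delay-line length in samples determines the fundamental frequency.
    period = max(2, int(round(sample_rate / frequency_hz)))
    delay_line = [rng.uniform(-1.0, 1.0) for _ in range(period)]

    n_samples = int(duration_s * sample_rate)
    output = []
    idx = 0
    for _ in range(n_samples):
        current = delay_line[idx]
        following = delay_line[(idx + 1) % period]
        output.append(current)
        # Lossy low-pass feedback: high partials decay faster than low ones.
        delay_line[idx] = damping * 0.5 * (current + following)
        idx = (idx + 1) % period
    return output


tone = pluck_string(440.0, 0.5)
```

The resulting list of samples decays over time, as a plucked string does. A full waveguide model would add, among other things, fractional-delay tuning and a body filter.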

Concert hall geometry, including visual and acoustic material properties, defines the 3D model of the virtual space. We use a hybrid method to calculate the room's response. Early reflections are computed binaurally with the image-source method. The reverberant part of the impulse response is obtained by the ray-tracing method. Diffraction effects at low frequencies could be taken into account with another model based on signal propagation in a three-dimensional waveguide mesh. These latter techniques, however, are not applicable to real-time performance, but only to animated movies prepared off-line. Real-time performance calls for coarse simplifications, for example taking only the direct sound and first reflections into account as echoes, and modeling late reverberation with digital filter structures [Schroeder 1962].
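The real-time simplification described above (direct sound plus first reflections) can be sketched with the image-source method in its most basic form. The example assumes an axis-aligned rectangular room with one corner at the origin, a single frequency-independent reflection coefficient for all walls, and 1/r distance attenuation. These are simplifications made here for illustration; the actual hall geometry and material model are not specified at this level in the paper. All names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def first_order_reflections(source, listener, room_size, reflection_coeff=0.8):
    """Return sorted (delay_s, gain) pairs for the direct sound and the six
    first-order image sources of a box-shaped room (illustrative sketch)."""
    paths = []

    def add_path(position, gain_factor):
        distance = math.dist(position, listener)
        delay = distance / SPEED_OF_SOUND
        gain = gain_factor / max(distance, 1e-9)  # 1/r spreading loss
        paths.append((delay, gain))

    add_path(source, 1.0)  # direct sound
    for axis in range(3):
        for wall in (0.0, room_size[axis]):
            # Mirror the source across the wall plane to get its image.
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]
            add_path(image, reflection_coeff)
    return sorted(paths)


echoes = first_order_reflections(
    source=(2.0, 3.0, 1.5), listener=(7.0, 4.0, 1.5), room_size=(10.0, 8.0, 5.0)
)
```

Each pair can then drive a delayed, attenuated copy of the source signal. Higher-order reflections would be obtained by mirroring the images again, with visibility checks for non-rectangular rooms.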

Modeling and synthesis of binaural hearing (auralization) is an important part of immersive audio environments. Simple means for giving a directional sensation of sound are the interaural amplitude and time differences (IAD and ITD), but they cannot resolve the front-back confusion. The head-related transfer function (HRTF) gives better performance but is computationally more demanding. Our approach is to process only the direct sound with a filter approximation of the HRTF at the respective direction, and to use IAD and ITD for other reflections (image sources). For late reverberation we use a recursive filter structure consisting of comb and allpass filters [Huopaniemi et al. 1994].
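A recursive late-reverberation structure of the kind mentioned above can be sketched as parallel feedback comb filters followed by allpass filters in series, in the spirit of [Schroeder 1962]. The delay lengths and gains below are illustrative placeholders, not the values used in DIVA, and the function names are hypothetical.

```python
def comb_filter(signal, delay, feedback):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay]."""
    out = [0.0] * len(signal)
    for n, x in enumerate(signal):
        out[n] = x + (feedback * out[n - delay] if n >= delay else 0.0)
    return out


def allpass_filter(signal, delay, gain):
    """Schroeder allpass: y[n] = -g*x[n] + x[n - D] + g*y[n - D]."""
    out = [0.0] * len(signal)
    for n, x in enumerate(signal):
        delayed_in = signal[n - delay] if n >= delay else 0.0
        delayed_out = out[n - delay] if n >= delay else 0.0
        out[n] = -gain * x + delayed_in + gain * delayed_out
    return out


def schroeder_reverb(signal,
                     comb_delays=(1116, 1188, 1277, 1356),  # assumed values
                     comb_feedback=0.84,                     # assumed value
                     allpass_delays=(225, 556),              # assumed values
                     allpass_gain=0.5):                      # assumed value
    """Parallel combs averaged together, then allpass filters in series."""
    wet = [0.0] * len(signal)
    for delay in comb_delays:
        for n, y in enumerate(comb_filter(signal, delay, comb_feedback)):
            wet[n] += y / len(comb_delays)
    for delay in allpass_delays:
        wet = allpass_filter(wet, delay, allpass_gain)
    return wet


impulse = [1.0] + [0.0] * 4999
tail = schroeder_reverb(impulse)
```

The comb filters create the decaying echo pattern, and the allpass stages increase echo density without changing the magnitude spectrum. Mutually unrelated delay lengths help avoid coinciding echoes.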

3 Animation Control from MIDI

The MIDI-to-movement mapper unit uses musician and instrument models to convert MIDI events into movement goals. An inverse kinematics system then transforms the goals to arm and finger movements.

The instrument definition lists the possible fingering combinations that an instrument supports for each note. As there may be alternative fingerings for the same note, the mapper first makes a sequence of compatible grips to play the music. Then each grip is replaced with the set of associated elementary movement goals. They are passed to an inverse kinematics system that generates a path through all given position and temporal movement goals in 3D space.
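One way to build such a grip sequence is dynamic programming over the alternative fingerings. The sketch below is an assumption about how "compatible" could be scored, not the mapper's actual criterion: it counts the fingers that change state between consecutive grips and picks the sequence with the smallest total. The fingering table, with four keys per grip indexed by MIDI note number, is entirely hypothetical.

```python
def transition_cost(grip_a, grip_b):
    """Number of fingers that change state between two grips (assumed metric)."""
    return sum(a != b for a, b in zip(grip_a, grip_b))


def choose_grips(notes, fingerings):
    """Pick one grip per note so that total finger movement is minimal.

    Viterbi-style dynamic programming: for every grip of the current note,
    keep only the cheapest sequence that ends in that grip.
    """
    best = {grip: (0, [grip]) for grip in fingerings[notes[0]]}
    for note in notes[1:]:
        new_best = {}
        for grip in fingerings[note]:
            cost, path = min(
                ((c + transition_cost(prev, grip), p) for prev, (c, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[grip] = (cost, path + [grip])
        best = new_best
    return min(best.values(), key=lambda item: item[0])


# Hypothetical fingering table: 1 = key closed, 0 = key open.
FINGERINGS = {
    60: [(1, 1, 1, 1)],
    62: [(1, 1, 1, 0), (0, 1, 1, 1)],
    64: [(0, 0, 1, 1)],
}
total_cost, grips = choose_grips([60, 62, 64], FINGERINGS)
```

In this example both fingerings of note 62 are equally cheap to reach from note 60, but only the second leads cheaply to note 64, so looking ahead over the whole phrase matters. Each chosen grip would then be expanded into its elementary movement goals.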

Our human instrumentalist model (figure 2) consists of a static body with movable arms and fingers. Each arm has 27 degrees of freedom in total. The inverse kinematics, i.e. determining the intermediate arm joint positions from the finger positions, is solved iteratively by using the Jacobian matrix method [Watt and Watt 1992]. As the inverse kinematics is computationally very demanding, the finger positions are precomputed, and only their synchronization and rendering are done in real time.
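The iterative Jacobian approach can be illustrated on a toy problem. The sketch below solves inverse kinematics for a planar two-joint arm, far simpler than the 27 degrees of freedom per arm described above. It uses a damped least-squares update, which is one common variant of Jacobian-based iteration and is an assumption here; the paper does not state which variant DIVA uses. All names, link lengths, and parameter values are illustrative.

```python
import math


def forward(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector position of a planar two-link arm."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def solve_ik(target, l1=1.0, l2=1.0, initial_angles=(0.0, math.pi / 2),
             damping=0.1, iterations=100, tolerance=1e-6):
    """Iterate joint angles toward `target` with damped least squares:
    d_theta = J^T (J J^T + damping^2 I)^-1 * error."""
    t1, t2 = initial_angles
    for _ in range(iterations):
        x, y = forward(t1, t2, l1, l2)
        ex, ey = target[0] - x, target[1] - y
        if math.hypot(ex, ey) < tolerance:
            break
        # Jacobian of (x, y) with respect to (theta1, theta2).
        j11 = -l1 * math.sin(t1) - l2 * math.sin(t1 + t2)
        j12 = -l2 * math.sin(t1 + t2)
        j21 = l1 * math.cos(t1) + l2 * math.cos(t1 + t2)
        j22 = l2 * math.cos(t1 + t2)
        # A = J J^T + damping^2 I is a symmetric 2x2 matrix; invert it directly.
        a11 = j11 * j11 + j12 * j12 + damping ** 2
        a12 = j11 * j21 + j12 * j22
        a22 = j21 * j21 + j22 * j22 + damping ** 2
        det = a11 * a22 - a12 * a12
        vx = (a22 * ex - a12 * ey) / det
        vy = (-a12 * ex + a11 * ey) / det
        # Joint update d_theta = J^T v.
        t1 += j11 * vx + j21 * vy
        t2 += j12 * vx + j22 * vy
    return t1, t2


angles = solve_ik((1.2, 0.8))
```

The damping term keeps the update bounded near singular poses, such as a fully stretched arm, at the cost of slower convergence. With many joints the same update applies, but the Jacobian becomes a larger matrix and each iteration more expensive, which is consistent with precomputing the finger positions.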

Finally, the interactive animation control unit provides a graphical user interface to the DIVA system. Through this interface, the user moves a virtual viewer and listener around in a concert hall.

Figure 2: A virtual musician

4 Experiments and Future Work

Our implementation is distributed to run on parallel processors connected by Ethernet. Currently we use one Silicon Graphics Power Challenge workstation for real-time visualization and the user interface, and another for image-source calculations. A TMS320C40-based signal processor system performs the auralization. We are investigating the possibility of using high-speed asynchronous transfer mode (ATM) networks for virtual audio reality applications.

We have an animated video presentation of our system (figure 2 is a snapshot from the video). An animated flutist plays in the model of a real concert hall, while the camera moves around, demonstrating room acoustics at different listening positions.

For a more immersive experience, we are planning to build a virtual studio environment, similar to the CAVE system [Cruz-Neira et al. 1992], where the user may freely move and control the virtual world with a gesture sensor, for example to conduct a virtual orchestra.


[Begault 1994] D. Begault. 3-D sound for virtual reality and multimedia. Academic Press, Cambridge, MA.

[Cook 1995] P. R. Cook. Integration of physical modeling for synthesis and animation. Proc. ICMC'95, pp. 525-528.

[Cruz-Neira et al. 1992] C. Cruz-Neira, D. Sandin, T. DeFanti, R. Kenyon, J. Hart. The Cave - Audio Visual Experience Automatic Virtual Environment. Communications of ACM, vol.35, no.6, pp. 64-72.

[Huopaniemi et al. 1994] J. Huopaniemi, M. Karjalainen, V. Välimäki, T. Huotilainen. Virtual instruments in virtual rooms - a real-time binaural room simulation environment for physical models of musical instruments. Proc. ICMC'94, pp. 455-462.

[Kalawsky 1993] R. Kalawsky. The Science of Virtual Reality and Virtual Environments. Addison-Wesley.

[Katayose et al. 1993] H. Katayose, T. Kanamori, K. Kamei, Y. Nagashima, K. Sato, S. Inokuchi, S. Simura. Virtual Performer. Proc. ICMC'93, pp. 138-145.

[Lytle 1990] W. Lytle. Driving Computer Graphics Animation from a Musical Score. Scientific Excellence in Supercomputing, The IBM 1990 Contest Prize Papers, Vol.2, pp.644 (Cornell National Supercomputer Facility, Ithaca, NY, USA).

[Savioja et al. 1996] L. Savioja, J. Huopaniemi, T. Huotilainen, and T. Takala. Real-Time Virtual Audio Reality. Proc. ICMC'96.

[Schroeder 1962] M. R. Schroeder. Natural-sounding artificial reverberation. J. Acoust. Soc. Am., vol. 10, pp. 219-223.

[Smith 1996] J. O. Smith. Physical modeling synthesis update. To be published in: Computer Music Journal, vol. 20, no. 2.

[Takala and Hahn 1992] T. Takala and J. Hahn. Sound Rendering. Proc. SIGGRAPH'92, Computer Graphics, vol. 26, no. 2, pp. 211-220.

[Välimäki and Takala 1996] V. Välimäki and T. Takala. Virtual musical instruments - natural sound using physical models. In: Organised Sound, vol. 1.

[Välimäki et al. 1995] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Jánosy. Physical Modeling of Plucked String Instruments with Application to Real-Time Sound Synthesis. 98th AES Convention, Preprint 3956, Also in: J. Audio Eng. Soc. (1996).

[Välimäki et al. 1996] V. Välimäki, R. Hänninen, and M. Karjalainen. An Improved Digital Waveguide Model of a Flute - Implementation Issues. Proc. ICMC'96.

[Watt and Watt 1992] A. Watt and M. Watt. Advanced Animation and Rendering Techniques - Theory and Practice. Addison-Wesley.