Multimedia Computing
--------------------
Thomas Little
Boston University

``Multimedia'' in the context of computing has come to imply the integration of audio, video, and images with more traditional types of data such as text and numerics. It is an application-oriented technology that caters to the multi-sensory nature of humans and is based on the evolving ability of computers to store, transmit, and convey diverse types of information. ``Multimedia Computing'' is defined as the manipulation and presentation of such media in a computer system. Applications of multimedia computing exist in all facets of society, including business, education, manufacturing, law, medicine, and entertainment; many of these applications were inconceivable prior to the introduction of this technology. Table 1 shows a few application domains and their media components. Increasingly, these capabilities are the norm as all computer applications become multimedia applications and multimedia computing is absorbed by mainstream computing. The remainder of this article describes the essence of multimedia computing and the computer and networking components required to support multimedia applications.

Table 1. Multimedia Applications and Media Supported

Application                     Typical Media Supported
Office Automation               Images, Text, Spreadsheets, Mail
Medical Information Systems     Video (Telephony), Images, Text
Geography                       Images, Graphics
Education/Training              Audio, Video, Images, Text
Command and Control             Audio (Telephony), Images
Weather                         Images, Numeric Data, Text
Banking                         Numeric Data, Text, Images
Travel Agents                   Audio, Video, Images, Text
Advertising                     Audio, Video, Images, Text
Electronic Mail                 Audio, Images, Text
Engineering, CAD/CAM            Numeric Data, Text
Consumer Electronic Catalogs    Audio, Video, Text
Home Video Distribution         Audio, Video
Real Estate                     Audio, Video, Images, Text
Library                         Images, Text
Legal Information Systems       Images, Text
Tourist Information             Audio, Video, Text
Newsprint Publication           Images, Text
Dictionaries                    Images, Text
Electronic Collaboration        Audio, Video, Text

The following four examples illustrate multimedia applications in detail: on-line news, distance learning, interactive gaming, and video-on-demand.

On-Line News

On-line news is the multimedia analog to the printed newspaper. Through a Web browser, a reader can browse through pages of the newspaper, read articles, and view pictures or audio/video presentations, as shown in the example of Figure 1. In addition, the user can perform index or relational-type database queries or searches to locate specific articles or advertisements. The user may also participate in chat groups and provide feedback to the editors. Synchronization between media is required for elements of text, image, audio, and video within the context of a page. Other requirements of the application include the ability to format the data for display (e.g., fonts), panning, zooming, sequence control (stopping and starting of streaming video), and database navigation.

Figure 1: An Example of an Electronic News Service from the Web (http://nt.excite.com122/page.html)

Distance Education

Distance education enables students at remote locations to participate in live instruction via videoconferencing, to collaborate on projects through shared ``whiteboards,'' or to replay instruction that has been pre-recorded or pre-orchestrated. Figure 2 illustrates an example of a multimedia distance learning application using the Web as a basis.
In this example a student can browse through a database consisting of course material in various formats (images, audio and video recordings, and textual information). Alternatively, the student can issue queries to the database while reading text or viewing illustrations and audio/video presentations.

Figure 2: Electronic Distance Learning Application

Interactive Gaming

Interactive games present perhaps the greatest demands on the multimedia delivery system due to the requirement for real-time, three-dimensional imaging coupled with interactions among multiple players. SwineOnline (Figure 3) is an example of a Web-enabled game involving the raising of pigs by participants in a virtual state fair. Each participant is responsible for interacting with and nurturing the virtual pet as it grows in weight. Characteristic of this application is the need for low-latency interactions and support for a very large number of interacting players.

Figure 3: Interactive Gaming with SwineOnline (http://swineonline.tvisions.com)

Video-on-Demand

Video-on-demand (VOD) defines networked multimedia applications with a focus on full-screen video. Examples include movies in the home delivered from a central video server. Although early attempts at VOD failed due to expensive system components, limited service offerings, and unsatisfactory revenue-generation models, a resurgence in this technology is expected as residential broadband networks are installed and the lessons learned from the many successful Web-based applications are applied to video-rich content.

General Requirements for Multimedia Applications

The examples in the previous section belong to a class of distributed multimedia information system (DMIS) applications that define multimedia technology. In a DMIS, there are many unique engineering challenges for both computer component designers and system integrators. Due to the large volume of multimedia data and its inherent time dependencies, the components of such a system include high-speed networks, massive data servers, and specialized presentation devices, which must be suitably selected and interconnected. A special case of the DMIS is the standalone workstation with multimedia capabilities, including CD-ROM. Although not as rich in capabilities, workstations with multimedia support are the norm for any new unit. However, the limited data universe of CD-ROMs does not match the universal appeal and reach of Web- and Internet-based data. In the rest of this section we elaborate on the requirements of the components that comprise a DMIS.

The major system-wide requirement of a DMIS is the ability to integrate, in real time, multimedia data retrieved from distributed databases. This mode of integration differs drastically from the very important problem of systems integration, which concerns unifying heterogeneous operating systems, networks, instruction sets, and data formats. Data integration, composition, or fusion describes the assembly of multimedia data elements into presentation form according to the temporal and spatial characteristics of the data. This requirement establishes the need for system components capable of performing real-time data retrieval, delivery, and presentation.

Spatial integration of multimedia data is unique to each medium and describes the assembly of objects in space (e.g., on a workstation display) at certain points in time.
For pictorial representations such as still images and graphics, integration operations include overlay and mosaic, and require processing such as scaling, cropping, color conversion, and position registration. For audio data, spatial integration is performed by superposition, or mixing, of signals. Other ``spatial'' audio operations include gain, rate, and tone adjustment. For example, videoconferencing uses signal mixing to prioritize one speaker's voice among many, with gain or tone differences applied via signal processing to signify ``distance.'' Similarly, temporal integration describes the presentation of time-dependent data such as a sequence of video frames, which nominally occurs at a rate of 30 per second for NTSC video, and requires specific video decoding hardware and scheduling for the retrieval and display of data elements. These special characteristics of multimedia data require a detailed evaluation of the individual system components for their suitability for building a DMIS.

The other technological requirements of a DMIS deal with workstation technology, communication protocols, bandwidth, internetworking, data storage, application interfaces and authoring tools, and information modeling and retrieval. We now briefly overview the requirements in these domains.

Workstation technology. For multimedia applications we need appropriate multisensory input/output (I/O) devices. For presentation of data to a user, today's high-performance workstations with high-resolution monitors and audio output can be used as presentation devices. The output device must allow presentation of both the visual (text, graphics, video) and aural (voice, music) components of the application. For data capture, additional specialized devices are required depending on the type of data. For example, still images can be captured using a scanner, voice can be captured with a microphone and digitizer, text can be input via a keyboard, and video can be handled with a camera and digitizer. Early multimedia systems have revealed that conventional devices for user interaction, such as the mouse and keyboard, are not suitable for many multimedia applications such as games. Multiple-axis joysticks, foot pedals, ``data gloves,'' and eye-motion tracking systems represent the next generation of I/O devices suitable for multimedia applications.

Other requirements of a workstation include the need to compress and decompress data for transmission and storage, and the need to handle the large data rates of live video. New video formats such as High-Definition Television (HDTV) will further tax the data handling of the multimedia system with their increased data rates. Evolving standards for image and video compression include JPEG (Joint Photographic Experts Group) for still-image compression and MPEG (Moving Picture Experts Group) for motion-image compression. Workstation add-on boards are available for real-time image compression with compression ratios varying from 30:1 (VHS quality) to 500:1. Other add-on boards for multimedia workstations include TV tuners accepting cable or antenna input, and video encoders that produce NTSC or PAL output for recording on tape.

Video and audio playout are now common functions of most workstations. Audio playout is supported by a hardware device, while video in various compression formats is handled by software decompression in conjunction with standard video display drivers. High-end solutions achieve video decompression with an add-on board, although with recent advances in bus and processor speeds and advanced instruction sets (e.g., Intel's MMX), these boards must be inexpensive to be viable.
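
To make the temporal side of this concrete, the following minimal sketch shows a software playout loop that schedules frames against the nominal NTSC rate of 30 frames/s and drops frames that miss their presentation deadlines (one common policy, not the only one). It is illustrative only: decode_frame and display_frame are hypothetical stand-ins for a software decompressor and a display driver, not calls to any particular API.

    import time

    FRAME_PERIOD = 1.0 / 30.0   # nominal NTSC presentation interval

    def decode_frame(index):
        """Hypothetical stand-in for a software video decompressor."""
        return f"frame-{index}"  # a real decoder would return pixel data

    def display_frame(frame):
        """Hypothetical stand-in for handing a frame to the display driver."""
        pass

    def play(num_frames=300):
        start = time.monotonic()
        for i in range(num_frames):
            deadline = start + i * FRAME_PERIOD
            # If decoding has fallen behind schedule, drop this frame
            # rather than delaying every subsequent frame.
            if time.monotonic() > deadline + FRAME_PERIOD:
                continue
            frame = decode_frame(i)
            delay = deadline - time.monotonic()
            if delay > 0:
                time.sleep(delay)   # wait for the frame's presentation time
            display_frame(frame)

    if __name__ == "__main__":
        play()

Whether decompression is performed by an add-on board or in software, the playout structure is essentially the same: retrieval and decoding are paced by the presentation deadlines of the individual frames.
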
Communication Protocols. Interactive multimedia traffic places stringent real-time service demands on a communication system. Existing protocols (e.g., TCP/IP) for data communication are not ideal for such traffic since they were not designed for time-critical delivery of data. They provide error-free service, whereas many types of multimedia traffic can tolerate transmission errors due to corruption or packet loss without retransmission or correction. In fact, to meet real-time delivery requirements, late packets can be discarded to meet the deadlines of others. This assumes that such losses can be tolerated, as is the case for some interactive voice and video applications, since occasional dropped packets do not seriously degrade the service. The result of this characterization of the delivery requirements is that lightweight transmission protocols can be employed: ones that do not provide retransmission, since retransmission can introduce undesirable delay. Multimedia applications require high performance in terms of predictable end-to-end delays, a feature not provided by most existing protocols or operating systems. However, existing protocols can be used for multimedia applications if the application can tolerate long delays without guaranteed service provision.

New protocols for the real-time traffic required by multimedia applications have been proposed at several levels of the OSI Reference Model to provide delay-bounded service for continuous-media traffic. If a specific delay or throughput cannot be achieved under current conditions, then the connection is not allowed. In this manner, the traffic on the network is limited to provide guaranteed performance for all of the admitted connections. The transport mechanism can then prioritize traffic classes based on type and achieve performance specifications for individual classes. ATM (Asynchronous Transfer Mode) promises to provide a flexible communication mechanism for variable Quality of Service using variable-bandwidth channels in a form of packet switching. This technique achieves a single network interface to communication channels for each media type, adaptability to an application's bandwidth requirements, flexibility in handling different data types, and a common signaling structure. However, implementations of ATM are expensive, and it has yet to integrate the local-area (LAN) and wide-area (WAN) environments or to successfully support multipoint applications. In contrast, conventional TCP/IP internetworks can be adapted for guaranteed service by the use of protocols such as RSVP or by overprovisioning with respect to the applications supported.

Communication bandwidth. Multimedia data, especially interactive data, require enormous transmission rates. A summary of the data storage and communication requirements for various multimedia applications is shown in Table 2. As can be noted from this table, a single audio/video videoconference connection-pair requires 150 Mb/s without compression. As mentioned, compression can reduce this bandwidth requirement with acceptable signal degradation. Because of the enormous bandwidth necessary to support multiple sessions, high-speed networks are needed for this type of traffic.
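
The 150 Mb/s figure can be reproduced from the raw signal parameters. The short sketch below does the arithmetic, assuming the same frame format used in the storage estimates later in this article (512 x 400 pixels, 3 color components of 8 bits each, 30 frames/s), and shows how representative compression ratios bring the rate down toward the compressed entries of Table 2.

    # Back-of-the-envelope bandwidth for one uncompressed digital video stream.
    # Assumed parameters: 512 x 400 pixels, 3 x 8 bits per pixel, 30 frames/s.
    width, height = 512, 400
    bits_per_pixel = 3 * 8
    frames_per_second = 30

    uncompressed = width * height * bits_per_pixel * frames_per_second
    print(f"Uncompressed: {uncompressed / 1e6:.0f} Mb/s")      # about 147 Mb/s

    # Representative compression ratios: roughly 20:1 and 100:1 land near the
    # MPEG-2 and MPEG-1 rates listed in Table 2.
    for ratio in (20, 100):
        print(f"{ratio}:1 compression: {uncompressed / ratio / 1e6:.1f} Mb/s")
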
Bandwidth availability is currently the most significant issue for ubiquitous deployment of multimedia applications and is the source of frequent complaints about the usability of the Web. Bandwidth availability is also the key to the viability of interactive TV or VOD consumption in the home. For residential networks, Hybrid Fiber-Coaxial (HFC) solutions, via the cable TV plant, and Asymmetric Digital Subscriber Line (ADSL) solutions, via the telephony path, are proposed to provide the aggregate bandwidth for these applications.

Table 2: Bandwidth requirements of high-end media delivery

Medium                                  Nominal Bandwidth (size/delay tolerated)
Text file                               60 Kb/s
Image file                              400 Kb/s
MPEG-1 compressed video/audio           1.5 Mb/s
MPEG-2 compressed video/audio           7 Mb/s
Internet streaming compressed video     28 Kb/s
Uncompressed video/audio                150 Mb/s

Internetworking. In order to gain access to the many public and private databases, a DMIS must extend beyond the simple Local Area Network (LAN) environment. Currently, there are thousands of publicly available online databases. High-speed networks can link such geographically dispersed data stores and the users requiring broadband services. Both LAN and Metropolitan Area Network (MAN) technologies are being developed that are well suited for such interconnections.

Data Storage. Like communication bandwidth, the data storage requirement for multimedia data types is very large. For example, the storage of 10,000 full-screen color still images (3 colors/pixel x 8 bits/color x 1200 x 1200 pixels = 35 Mb each) requires 350 Gb, or about 43 Gbytes. Similarly, for digital video storage applications, a video archive of 500 movies of 120-minute duration requires 531 Tb, or about 66 Tbytes (3 colors/pixel x 8 bits/color x 512 x 400 pixels x 30 frames/s = 147 Mb/s). With a compression ratio of 20:1, this can be reduced to a ``mere'' 3.3 Tbytes. This volume is particularly problematic when multimedia applications require random-access storage, for which streaming devices such as magnetic tape are not suitable. Furthermore, due to the limited data transfer rates of many storage devices, especially optical disk drives, these devices often have inadequate access bandwidth to satisfy a large number of user streams. For example, MPEG-2 compressed video requires a transfer rate of 6.2 Mb/s. Although this rate is readily achievable from a typical magnetic disk drive or a large database server, it is not typical of a CD-ROM. The recent DVD standard addresses both the capacity and bandwidth limitations of the CD-ROM; however, it is intended to support only a single stream.

Application Interfaces and Authoring Tools. Additional important problems faced by multimedia application developers are the design of user interfaces and the authoring of content for the applications. Tools for developing such interfaces include window systems and application programmer's interfaces (APIs), which permit a developer a full range of access to the utilities available in a DMIS. Substantial improvement in window-based interface models and toolkits has been achieved, particularly in the domain of the Web. These tools allow the rapid development of user interfaces for any application, often use an object-oriented approach, and are becoming de facto standards.

Multimedia content is diverse in its data types. To support the orchestration of content, it must be accompanied by instructions on how it should be interpreted by the workstation.
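
As a purely illustrative sketch, such interpretation instructions might resemble the following timeline, here expressed as Python data: each entry binds a media element to a screen region (spatial layout) and to a start time and duration (temporal presentation). The format and file names are invented for illustration and do not correspond to any particular authoring standard.

    # Hypothetical orchestration script: spatial layout plus temporal schedule.
    # Media names and screen regions (x, y, width, height) are illustrative only.
    presentation = [
        {"media": "title.txt",    "region": (0, 0, 640, 60),     "start": 0.0, "duration": 30.0},
        {"media": "lecture.mpg",  "region": (0, 60, 480, 360),   "start": 0.0, "duration": 30.0},
        {"media": "slide1.gif",   "region": (480, 60, 160, 120), "start": 5.0, "duration": 10.0},
        {"media": "narration.au", "region": None,                "start": 5.0, "duration": 25.0},
    ]

    def schedule(script):
        """Print the playout order, as a presentation engine might derive it."""
        for item in sorted(script, key=lambda e: e["start"]):
            print(f"t={item['start']:4.1f}s  start {item['media']} "
                  f"in region {item['region']} for {item['duration']}s")

    schedule(presentation)
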
Authoring tools provide a means for technical and non-technical content developers to quickly produce multimedia works of this kind, dealing with both spatial layout and temporal presentation. Most of these tools produce proprietary data representations that are playable only with their own components. Increasingly, however, tools are being developed that yield standards-based output such as HyTime, SGML, or other scripting languages supported on a wider scale through open Web-based systems. Executable content represents a means for scripting information to be coupled directly with the content to be delivered. Most early Web-based content-delivery approaches relied on both the workstation and the server participating in interactions with the content. Executable content (e.g., programs written in Java) allows a tighter coupling of the content with the program that is required for its presentation. In this manner, the content and the program can be transferred from a server to a workstation, and subsequent interactions do not require participation by the server.

Information Modeling and Retrieval. New approaches to accessing information have also been developed that facilitate operation in novel ways. For example, high-end workstations can provide 3D access to information via the use of large display devices and oscillating-aperture 3D glasses. For database applications, the trend is to move away from traditional relational query-type interfaces that require substantial knowledge of the content and structure of the stored information. Object-oriented and hypermedia models are increasingly popular for managing very large collections of multimedia data items as ``digital libraries.'' On the Web, the hypermedia or hypertext paradigm is fundamental, but it is often coupled with conventional relational database components. In this paradigm, data or documents are interconnected as a network. This representation facilitates extensive cross-referencing of related items in a mode that allows a user to effectively browse through the data by following links connecting associated topics or keywords.

Summary and Future

In summary, multimedia computing is here and is being absorbed into mainstream computing; however, for large-scale multimedia applications beyond the ``desktop,'' there is still a need for significant advancements in high-speed networking, storage servers, and low-cost presentation devices for consumer multimedia delivery. The pace of development of multimedia technology is furious. Trends indicate that processing units will become faster, display devices will become cheaper, memory devices will become larger, and high-bandwidth network access will become ubiquitous. The end result will be the further impact of multimedia technology on society. It is becoming less of a novelty and more of a practical necessity.