Network Considerations for Distributed Multimedia Object Composition and Communication

Thomas D.C. Little (tdlittle@sunrise.acs.syr.edu)
Arif Ghafoor (ghafoor@top.cis.syr.edu)
Department of Electrical and Computer Engineering
121 Link Hall, Syracuse University
Syracuse, New York 13244-1240
(315) 443-4454, (315) 443-4936 fax

Abstract

A multimedia application must manage distributed data consisting of text, video, audio, and graphics maintained in remote, heterogeneous databases. Problems of synchronization, real-time communication, and format conversion arise in this kind of system when distributed data are composed into complex multimedia objects for final interaction and presentation to a user. Inter-object relationships are specified in terms of a set of timing and spatial integration requirements which dictate the performance characteristics of a supporting distributed system. Described in this paper are the state-of-the-art techniques proposed for multimedia object communication and integration necessary to maintain these relationships and compose multimedia objects. Accordingly, an architecture for distributed multimedia object management and composition is presented which provides a framework for supporting future distributed multimedia information systems (DMIS).

1 Introduction

Recent developments in high-speed communications technology have resulted in interesting new applications for Distributed Computing Systems (DCS). A DCS is generally comprised of groups of workstations and shared I/O devices interconnected by Local Area Networks (LANs). The advent of fiber optics and the Broadband Integrated Services Digital Network (B-ISDN) extends the potential for these systems beyond the LAN environment by providing high-speed, low bit-error-rate communication channels to remote sites and by allowing switched voice and video transmission. One primary beneficiary of this technology is the Distributed Multimedia Information System (DMIS) [1]. Figure 1 describes the components of a DMIS and the distribution of functions carried out by the various system components.

Figure 1. Distributed Multimedia Information System

There are many potential multimedia applications. A typical one is the electronic catalog, in which a consumer can browse or query a set of listed catalog items by listening to audio descriptions, reading prices, and viewing video demonstrations of a merchant's products. This type of service requires data of three types: audio, video, and numeric. Other multimedia applications exist in the areas of medicine, geography, and education. Table 1 summarizes these application areas with their characteristics. Few of these applications use a single medium by itself without some interaction with other media; data of different media types are combined for presentation to the user. The process of combining data in this manner is commonly known as integration, composition, or fusion, and is shown pictorially in Figure 2.
Table 1. Multimedia Applications and Characteristics

Application                     Media                               Functions
Office Automation               Images, Text, Spreadsheets, Mail    Composition, Filing, Communication
Medical Information Systems     Video (Telephony), Images, Text     Data Acquisition, Communication, Filing
Geography                       Images, Graphics                    Data Acquisition, Storage, Image Manipulation
Education/Training              Audio, Video, Images, Text          Browsing, Interactivity
Command and Control             Audio (Telephony), Images           Data Acquisition, Communication
Weather                         Images, Numeric Data, Text          Data Acquisition, Simulation, Data Integration
Banking                         Numeric Data, Text, Images          Image Archiving
Travel Agents                   Audio, Video, Images, Text          Video Browsing, Communication
Advertising                     Video, Images                       Image Composition, Enhancement
Electronic Mail                 Audio, Images, Text                 Communication
Engineering, CAD/CAM            Numeric Data, Text                  Cooperative Work
Consumer Electronic Catalogs    Audio, Video, Text                  Video Browsing
Home Video Distribution         Audio, Video                        Video Browsing
Real Estate                     Audio, Video, Images, Text          Video Browsing, Communication
Library                         Image, Text                         Database Browsing, Query
Legal Information Systems       Image, Text                         Database Query
Tourist Information             Audio, Video, Text                  Video Browsing
Newsprint Publication           Image, Text                         Image and Text Composition
Dictionaries                    Image, Text                         Database Browsing, Query
Electronic Collaboration        Audio, Video, Text                  Videoconferencing, Concurrency Control, Communication

Composition generally takes spatial and temporal forms [2]. Temporal composition refers to the synchronization of multiple streams of information consisting of objects of varying granularity [3]. These objects can be continuous types (video, audio), discrete types (images, text), or combinations of both. Spatial composition is concerned with the combination of objects in space, such as image overlay, or text with image.

Figure 2. (a) Spatial Composition. (b) Temporal Composition

For most applications, such as medical information systems and interactive electronic collaboration (Table 1), data are obtained from many dispersed sources. Data are created or fetched from individual locations, or sources, and communicated to users at interactive terminals, or sinks. In the future, an increasing amount of information will be provided by geographically dispersed private and public database organizations [4]. The composition process requires some component of the system to assemble these data, based on both spatial and temporal constraints, for ultimate presentation to the user(s); this process is well suited to cooperating distributed multimedia database servers. The partitioning of this composition process across the set of distributed system resources is one of the issues addressed in this paper. Other important topics discussed include requirements for distributed multimedia object composition, possible approaches to the composition problem, and an assessment of future multimedia communication technology.

The remainder of this paper is organized as follows: Section 2 defines the terminology and issues related to providing temporal data integration in a distributed environment. Section 3 reviews current technological approaches to these issues and assesses the state of the art. Section 4 presents a unified model for synchronization at three levels, based on the approaches of Section 3.
Section 5 concludes the paper with a course for future efforts.

2 Object Composition and Integration

An object in our context is any unit of data, whether complex or simple, that can be distributed throughout a DMIS or, alternately, presented to a user in some desirable manner. This definition encompasses a spectrum of object complexities including composite multimedia documents, text-annotated images, simple numeric data, and encoded audio. Table 2 summarizes the units of information associated with the various media types. Note that for each media type, multiple levels of decomposition result in different units of information. For example, the basic unit for the text type is the character, but increasingly complex objects of word, sentence, paragraph, and document can be composed from the atomic or base types.

Table 2. Information Units for Various Media Types [5]

Media Type      Units of Information (atomic to composite)
Text            character, word, sentence, paragraph, document
Image           pixel, image
Motion Video    pixel, raster, image, segment, film
Graphic         vector, shape, drawing

The size of objects varies extensively with the application. Textual objects such as words, sentences, paragraphs, or pages comprise some of the smallest objects, whereas color images form the largest. The size of objects can be unbounded; however, generally the largest objects are composed from the sequences of still image objects which make up color video. Table 3 indicates typical uncompressed object sizes in bits per object for various object types taken from the application areas of Table 1. The sizes indicate a requirement for both massive data storage and high-bandwidth communication for utilization of these objects in a DMIS. With suitable approaches, however, data compression can reduce these object sizes significantly. For still images, a compression ratio of three-to-one has been reported without loss of information [8]. Greater compression ratios, exceeding twelve-to-one, are possible using lossy approaches without significant image degradation [8].
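The entries in Table 3 below follow directly from dimensions, sample sizes, and durations. As a minimal sketch of this arithmetic (the helper functions and names are ours, for illustration only):

```python
# Back-of-the-envelope uncompressed object sizes, as in Table 3.
# These helpers are illustrative only; names and units are our own.

def image_bits(width_px, height_px, bits_per_pixel):
    """Uncompressed raster image size in bits."""
    return width_px * height_px * bits_per_pixel

def audio_bits(sample_rate_hz, seconds, bits_per_sample, channels=1):
    """Uncompressed PCM audio size in bits."""
    return sample_rate_hz * seconds * bits_per_sample * channels

# Digital chest X-ray: 1024 x 1024 x 12 b/pixel -> about 13 Mb
print(image_bits(1024, 1024, 12) / 1e6)             # 12.58 Mb

# 5 s of stereo CD-quality audio: 44 kS/s x 2 ch x 16 b/S -> about 7 Mb
print(audio_bits(44_000, 5, 16, channels=2) / 1e6)  # 7.04 Mb

# 5 s of medium-resolution color video at 30 f/s -> about 735 Mb
print(image_bits(512, 400, 24) * 30 * 5 / 1e6)      # 737.28 Mb
```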
Table 3. Typical Uncompressed Object Sizes

Geography [6]
  Multispectral Scan Image (corrected)      3548 x 2983 pixels x 6 b/pixel            64 Mb

Medicine [7]
  Digital Chest X-ray                       1024 x 1024 pixels x 12 b/pixel           13 Mb
  Emission Computed Tomography              128 x 128 pixels x 16 b/pixel             260 Kb
  Nuclear Medicine                          256 x 256 pixels x 16 b/pixel             1 Mb
  Nuclear Magnetic Resonance                512 x 512 pixels x 16 b/pixel             4.2 Mb
  Ultrasound                                512 x 512 pixels x 8 b/pixel              2.1 Mb

Videotelephony [8]
  Still, Medium Resolution (b/w)            512 x 400 pixels x 8 b/pixel              1.6 Mb
  Still, Medium Resolution (color)          512 x 400 pixels x 24 b/pixel             4.9 Mb
  Still, High Resolution (color)            1024 x 1024 pixels x 24 b/pixel           25 Mb
  5 s Motion Video, Medium Res. (b/w)       1.6 Mb/f x 30 f/s x 5 s                   240 Mb
  5 s Motion Video, Medium Res. (color)     4.9 Mb/f x 30 f/s x 5 s                   735 Mb

Office Automation
  VT100 ASCII Text Screen                   80 c/l x 24 l x 8 b/c                     16 Kb
  8.5" x 11" Page ASCII Text (Courier)      66 c/l x 55 l x 8 b/c                     29 Kb
  Scanned 8.5" x 11" Page (b/w)             8.5" x 11" x (300 pix/in.)^2 x 8 b/pix    67 Mb
  Scanned 8.5" x 11" Page (color)           8.5" x 11" x (300 pix/in.)^2 x 24 b/pix   200 Mb
  5 s Telephone Quality Audio               7000 S/s x 5 s x 8 b/S                    280 Kb
  5 s Stereo CD Quality Audio               44 KS/s x 2 ch x 5 s x 16 b/S             7 Mb

2.1 Multimedia Object Types: Persistent Versus Non-Persistent

Objects can be classified in terms of their presentation and application lifetimes. A persistent object is one that can exist for the duration of the application, in a persistent store such as a database. A non-persistent object is created dynamically and discarded when obsolete. For presentation, a transient object is defined as an object that is presented for a short duration without manipulation. The display of a series of audio or video frames represents transient presentation of objects, whether created dynamically or retrieved from a database. Objects are static during presentation if they exist for an extended period for their possible manipulation. A still image is an example of a static object. Henceforth, we use the terms static and transient to describe presentation lifetimes of objects, while persistence expresses their storage life in a database.

2.2 Multimedia Object Composition: Spatial Versus Temporal

Spatial composition involves assembling data by overlaying or linking multiple objects into a single entity, as in the composition of the textual and image information of Figure 2 (a). For such objects, the order in which the composition is performed is of no significance, since there are no specific temporal relationships between the data elements and there are no transient data. Spatial composition techniques must consider the size, rotation, and placement of participating objects. If an overlaying operation is performed, various merging operations can be used, such as Exclusive-ORing of pixel elements. Performance of spatial composition assumes an underlying data model supporting the aggregation of objects.
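As a minimal sketch of one such merging operation (assuming 8-bit grayscale rasters stored as nested lists; the function name is ours):

```python
# Sketch of spatial composition by Exclusive-ORing pixel elements.
# Assumes two equally sized 8-bit grayscale rasters; illustrative only.

def xor_overlay(image_a, image_b):
    """Merge two rasters pixel by pixel with bitwise XOR."""
    return [
        [pa ^ pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]

base    = [[0x00, 0xFF], [0x80, 0x7F]]   # background image
overlay = [[0xFF, 0xFF], [0x00, 0x0F]]   # object to be merged

print(xor_overlay(base, overlay))        # [[255, 0], [128, 112]]
```

One property of this particular choice is that applying the same overlay a second time restores the original image, so the merge is trivially reversible.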
Standardization efforts have resulted in both logical and layout representations for multimedia documents, described by the Office Document Architecture (ODA) [11]. These representations, as well as other approaches [12, 13], develop a hierarchy of objects using the object-oriented paradigm. Similarly, a methodology has been defined for the exchange of temporal relationships occurring within documents [14], including parallel, sequential, and independent relationships, which describe concurrent, serial, and arbitrary temporal orderings, respectively.

For temporal composition there exists a time ordering assigned to the presentation of the elements of the multimedia object. Consider a multimedia slide presentation in which a series of verbal annotations coincides with a similar series of visual elements such as slides. In this example, the verbal annotations, or audio, must accompany the image presented. The presentation of the annotations is sequential, as it is for the images. Points of synchronization correspond to the change of image segment and the end of a verbal annotation: an example of rather coarse synchronization between objects. A multimedia system must preserve the timing relationships between the elements of the object presentation at these points of synchronization by the process of temporal composition. Figure 2 (b) shows a pictorial representation of sequences of audio and video elements presented continuously as they are generated. For this kind of presentation, the video component consists of a sequence of frames, each displayed for 1/30 of a second to maintain the nominal video display rate of 30 frames per second. However, during the display of one video frame, the corresponding sequence of audio samples does not comprise a logical information unit to which we can apply synchronization points. Some other form of coordination is necessary to satisfy a tight synchronization requirement (e.g., lip synchronization) between the two streams during playback. Specific coarse synchronization points between these types of multimedia objects do not exist except at the beginning and end of the data sequences; rather, an ongoing form of synchronization is necessary. This type of temporal composition is called continuous [5], stream, or isochronous [15] synchronization.
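To make the granularity mismatch concrete: at the nominal rates above, one video frame spans the interval of only a few hundred audio samples, too few to constitute a meaningful logical unit. A minimal sketch of this arithmetic (rates taken from this paper's own figures; the function name is ours):

```python
# How many audio samples elapse during one video frame?
# Illustrative arithmetic only; rates as used elsewhere in this paper.

def samples_per_frame(audio_rate_hz, video_rate_fps):
    return audio_rate_hz / video_rate_fps

print(samples_per_frame(7_000, 30))    # ~233 samples per 1/30 s frame (telephone quality)
print(samples_per_frame(44_000, 30))   # ~1467 samples per frame, per channel (CD quality)
```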
2.3 Temporal Composition: Continuous Versus Synthetic

In this paper the problem of temporal object composition is specifically considered. Based on the multimedia application areas presented, two kinds of temporal relationships which can occur between objects are defined. We call temporal relationships between objects which occur as streams time-continuous relationships (or simply, continuous relationships). The relationship between the audio and video streams generated by recording the image and voice of a speaker is continuous (or isochronous [15]), since these streams are produced, communicated, and presented in finite sample sizes and fixed time intervals. Artificially created temporal relationships formed between objects which do not require continuous stream synchronization are called synthetic relationships [16]. Synchronized images and text composed into a multimedia display fit this category of temporal relationship. Synthetic-relationship composition primarily arises in applications which rely on composition from stored, pre-orchestrated presentations with less stringent synchronization and delivery requirements, while continuous relationships are critical for real-time applications with strict needs for inter-object coordination and communication. In the following sections we elaborate on the distinctions between continuous and synthetic composition.

Figure 3. Visual Telephony [17]

2.3.1 Time-Continuous Composition

Continuous temporal relationships occur among objects which are acquired simultaneously. The most illustrative example of a multimedia application exhibiting continuous composition is visual telephony, where audio and video are acquired, encoded, and transmitted between remote sites over a network. Figure 3 indicates the components of a simple visual telephony system. Here, a transmitting station acquires data, signaling, and analog audio and video signals in real time, encodes them, and sends them via a high-speed network such as B-ISDN to a receiver which performs the inverse function. To minimize communication cost, some form of compression and decompression is performed on the transmitted data; the coding can be variable or fixed bit rate, and intra- or inter-frame. Network bandwidth must be guaranteed to meet the variable rates associated with the video signal, with a hard bound on delay. Also, variations in packet arrival time (jitter) introduced by independent routing paths, packet loss, and packet buffering must be eliminated via synchronization by the network or receiver. A detailed discussion of the communication requirements for this type of composition is presented in Section 3.

Figure 4. Medical Diagnosis and Teleconferencing System (MDTS)

2.3.2 Synthetic Composition

Examples of composition based on synthetic temporal relationships occur whenever complex temporally related objects are created and stored in a database. Storage of objects is facilitated by a database management system (DBMS) capable of storing multimedia data (objects) and their attributes. In addition to maintaining the data comprising the objects, the DBMS must also maintain the information necessary to perform temporal composition for both continuous and synthetic synchronization. Specification of the temporal relationships can be modeled using the Object Composition Petri Net (OCPN) [2], which allows subsequent storage of coarse synchronization information at the object level (see Table 2). Figure 4 shows a hypothetical medical diagnosis and teleconferencing system (MDTS) which employs synthetic relationships among component objects. In this example it is assumed that the application requires synchronization of verbal utterances with video segments and diagnostic images, which constitutes a subset of the possible synthetic relationships and synchronization requirements. An OCPN for these objects captures the synthetic relationships and indicates synchronization points. For a given application and its data, by employing the OCPN, a database schema can be constructed for maintaining the object hierarchy and temporal relationships for storage and subsequent retrieval [2]. Figure 5 shows a segment of the OCPN for the MDTS. Parallel streams of audio and video data are synchronized at various points with text and image objects, as specified by the OCPN for this example.

Figure 5. OCPN for Subset of MDTS

Temporal composition of two objects can occur based on either sequential or parallel time relationships. There are thirteen ways in which two objects can relate in time [18]. However, it has been shown that seven relationships are sufficient to describe composition based on any temporal relationship between pairs of objects [2], as enumerated in the sketch below.
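The remaining six of the thirteen relations in [18] are the inverses of the asymmetric ones. A minimal sketch of how the seven relations might be represented and recognized (names and code are our own illustration, not the model of [2]):

```python
# The seven temporal relations between object intervals a and b; the
# other six of the thirteen relations in [18] are their inverses.
# Representation and names are our own illustrative choices.

from enum import Enum

class TR(Enum):
    BEFORE   = "a ends before b starts"
    MEETS    = "a ends exactly when b starts"
    OVERLAPS = "a starts first; a and b overlap"
    DURING   = "a lies strictly inside b"
    STARTS   = "a and b start together; a ends first"
    FINISHES = "a and b end together; a starts later"
    EQUALS   = "a and b start and end together"

def classify(a_start, a_end, b_start, b_end):
    """Assumes one of the seven forward relations holds for (a, b);
    for the six inverse relations, swap the two intervals first."""
    if a_end < b_start:                        return TR.BEFORE
    if a_end == b_start:                       return TR.MEETS
    if a_start == b_start and a_end == b_end:  return TR.EQUALS
    if a_start == b_start and a_end < b_end:   return TR.STARTS
    if a_end == b_end and a_start > b_start:   return TR.FINISHES
    if a_start > b_start and a_end < b_end:    return TR.DURING
    return TR.OVERLAPS

print(classify(0, 5, 5, 9))   # TR.MEETS
print(classify(2, 6, 0, 9))   # TR.DURING
print(classify(0, 4, 0, 9))   # TR.STARTS
```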
These seven relationships are shown in Figure 6 (a). The OCPN can capture the semantics of any of these temporal relations for the purpose of specifying the timing and display requirements of various objects, as demonstrated in Figure 6 (b).

Figure 6. (a) Temporal Relationships. (b) Corresponding OCPNs

These relationships and OCPN models have been shown to be sufficient for specifying the temporal relationships of complex multimedia interactions constructed by pairing related objects [2].

2.4 Multiple Levels of Integration

Three levels of integration are defined for the provision of multimedia services in a DMIS [3]: the human-interface, service, and physical levels. At the human-interface level, integration consists of presentation and interaction with the user through multiple I/O devices. Service-level integration of multiple media describes the set of services offered by the DMIS to provide object composition functionality to multimedia applications; interactions between objects and applications are achieved by an interface at the service level. Physical-level integration describes the consolidation of the data comprising multimedia objects onto communication and storage media. This integration consists of multiplexing for physical channels, or clustering for physical storage. The OCPN modeling technique describes the composition of multiple media for objects at the human-interface and service levels of integration. To characterize the remaining level required for a DMIS, we introduce quality of service (QOS) to describe composition service and physical channel features.

2.5 Quality of Service for Object Composition

An important parameter for supporting distributed applications over a network is the QOS which characterizes communication services. For multimedia object communication, a QOS can be defined in terms of a tuple indicating target levels of performance. Specification of performance through this tuple influences the development of a DMIS. To develop this tuple, some definitions are introduced.

We assume that those media which occur as continuous streams are transformed into sequences of discrete values to provide a uniform mechanism for inter-media synchronization. This assumption is consistent with the process of digitizing analog inputs such as audio and video. Single instances of discrete values, or aggregations of these values, can be called objects. Whether continuous or synthetic relationships exist among synchronized entities, an end-to-end delay exists between the source and destination. This delay depends on the characteristics of the underlying network. A sampling delay is introduced in the digitization of analog data streams such as real-time video and voice. For stored objects, overheads such as query evaluation and the seek and access delays of the storage devices are associated with object retrieval. Remote data access introduces communication delays due to network transmission, packetization, buffering, and depacketization. Finally, delay associated with presentation is incurred at the sink. Additional delays can be introduced by compression and decompression of data at the source and sink, respectively. The relationship between these delays is summarized in Figure 7. The presentation rate of a sequence of objects, such as video frames, is nominally equal to the rate at which they are recorded; however, this rate can be increased or decreased under certain circumstances.
We define the presentation speed ratio for a sequence of objects to be the ratio of the actual to the nominal object presentation rate. Similarly, object utilization describes the ratio of the actual presentation rate to the available delivery rate of a sequence of objects. When utilization is equal to unity, all objects are presented; when it decreases, some objects can be discarded in order to maintain synchronization between two object streams. Ideally, both object speed ratio and utilization are equal to unity. However, there is a tradeoff between these parameters which depends on the policy established by the composition service; degradation of either parameter can reduce system load, as discussed in Section 3.2. Object skew refers to the difference in presentation times between two synchronized objects at their synchronization instants, averaged over n synchronization points. The instantaneous difference is called object jitter.

Figure 7. End-To-End Delays

In Figure 8 the characteristics of object skew, jitter, utilization, and speed ratio are shown. If we assume that stream A represents the nominal presentation of a sequence of objects, then the characteristics of the presented stream A' can be determined; these are shown in Table 4.

Figure 8. Skew, Jitter, Utilization, Speed

Table 4. Parameter Values

Another important communication parameter is the reliability of communication services, which can be viewed at multiple levels: per bit, frame, packet, channel, or connection. Reliability, expressed in terms of bit error rate (BER) and packet error rate (PER), represents the number of errors per unit time for bits and packets, respectively. The detected error rate of a communication channel depends on factors including the transmission medium, the check-summing algorithm, and the expected rate of packet loss from buffer overflows. Packet and bit errors can have very different consequences, depending on the data transmitted. For example, an error in digitized voice can result in an audible 'click' in a phone connection, while for inter-frame coded video, lost bits or packets can interrupt image display for several seconds [19].

The level of error provision also impacts time performance. To provide an error-free service at any level of synchronization (bit, packet, etc.), error control protocols are required. These protocols must provide error detection and retransmission and/or correction, each reducing the transmission performance and the ability to meet real-time performance specifications.

Based on these definitions, the QOS for multimedia communication can be defined as a tuple of the form (speed ratio, utilization, average delay, maximum jitter, maximum BER, maximum PER). This tuple can specify the requirements necessary to provide a multimedia service. For example, video telephony requires that voice and video be properly synchronized with moderate reliability. For the video stream this can be reflected through a QOS tuple with the following characteristics: speed ratio = 1.0, utilization = 1.0, average delay = 0.25 s, maximum jitter = 10 ms, maximum BER = 10^-2, and maximum PER = 10^-3 [20].
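A minimal sketch of such a tuple and a conformance check against measured stream statistics (the type and field names are ours; the example values are the video-telephony figures quoted above from [20]):

```python
# Sketch of the QOS tuple defined above and a simple conformance check.
# Field names are our own; the example values are the video-telephony
# figures quoted in the text from [20].

from typing import NamedTuple

class QOS(NamedTuple):
    speed_ratio: float    # actual / nominal presentation rate
    utilization: float    # actual presentation rate / delivery rate
    avg_delay_s: float    # average end-to-end delay bound
    max_jitter_s: float   # maximum instantaneous presentation-time difference
    max_ber: float        # maximum bit error rate
    max_per: float        # maximum packet error rate

VIDEO_TELEPHONY = QOS(1.0, 1.0, 0.25, 0.010, 1e-2, 1e-3)

def conforms(measured: QOS, required: QOS) -> bool:
    """True if measured statistics meet the required QOS targets."""
    return (measured.speed_ratio >= required.speed_ratio and
            measured.utilization >= required.utilization and
            measured.avg_delay_s <= required.avg_delay_s and
            measured.max_jitter_s <= required.max_jitter_s and
            measured.max_ber <= required.max_ber and
            measured.max_per <= required.max_per)

print(conforms(QOS(1.0, 0.96, 0.20, 0.008, 1e-3, 1e-4), VIDEO_TELEPHONY))
# False: utilization below 1.0 means frames were dropped
```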
3 Temporal Object Composition in a Distributed System

A high QOS for object composition in a DMIS places strong requirements on both the hardware and software associated with the server, network, and workstation components. These requirements can be partitioned among several more or less independent layers of functionality having different synchronization characteristics. The major layers of functionality of a DMIS have been defined in [2]; they include the user interface, the scheduler for the object composition process, the object manager, and network services. The relationship between these functional layers is shown in Figure 9. Each layer provides services to the preceding layer as the hierarchy is traversed from top to bottom. The functional requirements of each layer with respect to temporal composition are discussed next.

Figure 9. Layers of Functionality

3.1 User Interaction

Ultimate functionality and utility to the user depend on the transparency of data and communication access to the distributed object. The user should have, for example, the ability to (1) set up and use coordinated (synchronized) multimedia channels (for example, visual telephony) by simple means, and (2) browse through sets of various objects within a database without regard for their location or media type. User interaction can take the form of queries, sequential stopping and starting, spatial panning and zooming, animation, and various editing operations.

The user interaction layer is characterized as the coarsest level of synchronization in the DMIS. Synchronization points occur at invocations and terminations of application programs, object selection or query, pause operations, or any other user-object interaction. These synchronization events occur randomly, driven by user interaction, and cannot be predicted by the system. Synchronization of this kind has implications for the process of scheduling object presentation for concurrent real-time activities at the workstation.

3.2 Object Presentation Scheduler

The activities of presenting multimedia objects to the end user are inherently concurrent, while contemporary output devices (workstations) are not. Virtual, real-time concurrency is provided for these devices by multitasking and facilitated by a scheduler. With respect to synchronization, the scheduler must manage concurrent processes with real-time requirements. These requirements encompass both user interaction and the continuous and synthetic temporal relationships that can exist between objects.

The scheduler for the presentation of objects must have access to a schema specifying the temporal relationships of the objects currently undergoing presentation, the types of media, the desired QOS, and the location, relative to the display site, of the selected objects and their components within the decentralized database. The various media types have different communication requirements (see Section 3.4) which can vary with the desired QOS. From this information, decisions can be made to provide suitable scheduling capability. For objects with synthetic relationships, temporal relationships are known a priori by virtue of their database storage, and with this information the scheduling task is significantly simplified [16]. A similar argument holds for scheduling synchronization of continuous objects; these types have predictable scheduling requirements. However, scheduling the presentation of objects influenced by intermittent activities introduced by the user is more difficult.

Figure 10. Synchronization Anomalies

Scheduling for synchronization of objects is based on temporal constraints including deadline, minimum delay, and maximum delay. Minimum delay defines the least amount of time before presentation of an object is to occur, or between the presentation times of two objects.
Similarly, maximum delay indicates the greatest amount of time before presentation is to be initiated, or between the presentation instants of two objects. The time instant identified by the end of these intervals is called the deadline. These temporal constraints can be derived from the QOS tuple supplied to the scheduler. An error condition results in the event that a constraint is violated (i.e., a schedule cannot be met due to heavy load). One policy for handling this condition is to degrade the QOS of an object in order to satisfy the schedule. The degradation can take the form of dropping frames (a reduction in object utilization) or of decreasing the object speed ratio. The approach taken in [5] is to use restricted blocking: if, during the presentation of two media (such as the audio and video of Figure 10), a synchronization point is reached but one stream is delayed, the policy is to perform some appropriate alternative action while the other stream catches up. For the example in Figure 10, the alternative is to "hold" the last frame of the video stream while the audio stream is presented normally. This practice ultimately results in object skew, dropped frames, or increased object speed for one of the data streams as the scheduler handles the burst of delayed frames when they arrive.

3.3 Object Management

A DMIS consists of data storage, processing, and communication components. In the layered view of Figure 9, the underlying communication system provides services to the upper layers of the DMIS. Such services must support unicast and multicast connections with remote sites, global naming of objects, concurrency control, resolution of heterogeneity, and so on. Operations and interface primitives between layers must be defined to facilitate management and control of synchronization over established connections [15]. This control consists of establishing a quality of service for a multimedia connection and maintaining or modifying the QOS of that connection. Maintenance of QOS can include adjusting the skew between synchronized data streams and informing the application of the state of synchronization.

Synthetic composition is intended to be performed at the level of complex stream-type or static objects. This level of the hierarchy represents the intermediate level of synchronization, between stream synchronization and user or external event synchronization. Spatial registration, synchronization, and QOS information must be maintained for each object to perform composition. It is the role of the object manager to maintain this information, including the objects themselves.

To specify the temporal relationship between two objects, four parameters must be specified and maintained, as indicated by the synchronization tuple (ta, tb, td, TR) [2]: ta and tb define the time durations of the individual objects, td expresses the delay between them, and TR gives their temporal relationship (see Figures 6 (a) and (b)). By combining objects in a pairwise fashion, a hierarchy of temporally related objects can be created and managed in a distributed multimedia database, as sketched below. The granularity of synchronization for synthetic composition is determined by the size of the objects in the hierarchy; however, this technique is not intended for providing continuous-relationship composition. It is designed for a coarse level of synchronization at the complex-object level rather than at the frame level (see Table 2). An example of object synchronization is provided in Section 4.3.
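A minimal sketch of pairwise composition with the (ta, tb, td, TR) tuple (the class layout is our own; TR names reuse the seven relations of Section 2.3):

```python
# Sketch of pairwise temporal composition using the synchronization
# tuple (ta, tb, td, TR) of [2]. The class layout is our own; a
# composite may itself appear as a component, yielding a hierarchy.

from dataclasses import dataclass
from typing import Union

@dataclass
class Object:
    name: str
    duration_s: float            # ta or tb for this object

@dataclass
class Composite:
    a: Union["Composite", Object]
    b: Union["Composite", Object]
    td_s: float                  # delay td between a and b
    tr: str                      # one of the seven relations, e.g. "during"

# A verbal annotation that starts 1.5 s into an X-ray image display:
xray  = Object("chest_xray_image", duration_s=10.0)
voice = Object("verbal_annotation", duration_s=6.0)
slide = Composite(a=voice, b=xray, td_s=1.5, tr="during")

# Pairing composites builds arbitrarily deep hierarchies:
title = Object("title_text", duration_s=2.0)
scene = Composite(a=title, b=slide, td_s=0.0, tr="before")
```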
3.4 Network Layer for Distributed Multimedia Objects

In a distributed system, the communications subsystem plays a central role in organization and operation; the network provides basic services to the system as a whole. With respect to temporal object composition, this subsystem must be able to provide synchronization down to a fine level, in what is called stream synchronization.

For multimedia data types, a unique set of requirements is imposed on the communication component of a DMIS by the size and characteristics of multimedia objects (Table 3). For some types, delayed data are of little or no use to a multimedia application: voice and video require real-time delivery, whereas text and images merely require timely delivery. With respect to transmission reliability, multimedia types also have differing requirements for PER and BER. Voice and video data can suffer errors in transmission without major degradation in service, depending on their coding algorithms. For data transfer (text and numeric), errors cannot be tolerated at the destination, so error detection, correction, and recovery schemes are used to provide reliable communication for these types. In effect, error recovery protocols are sacrificed to provide real-time delivery for perishable objects, while real-time delivery is traded off to provide reliability for numeric data.

A set of traffic classes can be identified by bandwidth requirement, delay distribution, and end-to-end packet loss. We ignore the distinction between high- and low-bandwidth traffic (e.g., interactive versus bulk traffic) since ample bandwidth is necessary in either case. Critical to the network component of a DMIS are the communication delay distribution and reliability. Three types of network performance characteristics [21] can be identified for types of multimedia objects; a sketch mapping media types to these classes follows the list.

(1) Delay-sensitive, non-blocking service: used for communication requiring high data rates, short delay, small delay variation, and zero packet loss due to contention. Video telephony requires this kind of service, since video data is transient in nature and therefore perishable. Although this service guarantees no loss due to blocking, it does not guarantee freedom from packet errors.

(2) Delay-sensitive, blocking service: used for communication requiring high data rates that can tolerate moderate delay, jitter, and blocking. This service has a relaxed delay distribution and a nonzero probability of lost packets, but with a bound on the number lost consecutively. Voice and video traffic can use this kind of service. Bulk information transport that can tolerate errors is also provided by this service, without a minimum-delay requirement.

(3) Delay-insensitive, error-free service: used to provide error-free transfer of data by means of a retransmission policy for blocked, discarded, or erroneous packets. Traffic of this class is guaranteed zero end-to-end errors and is specified by a minimum average throughput and a maximum average delay. This service is intended for any traffic requiring error-free transfer, such as file transfers.

These three classes characterize most traffic possible in a DMIS. However, it is intended that QOS be dynamic according to the requirements of the application.
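A minimal sketch of a default mapping from media type to these classes (the table is our own reading of the class descriptions above; a real DMIS would choose the class dynamically from the QOS tuple):

```python
# Sketch of a default mapping from media type to the three service
# classes of [21] described above. The mapping is our own illustration.

from enum import Enum, auto

class ServiceClass(Enum):
    DELAY_SENSITIVE_NON_BLOCKING = auto()   # class (1): e.g., video telephony
    DELAY_SENSITIVE_BLOCKING     = auto()   # class (2): e.g., one-way voice/video
    DELAY_INSENSITIVE_ERROR_FREE = auto()   # class (3): e.g., file transfer

DEFAULT_CLASS = {
    "interactive_video": ServiceClass.DELAY_SENSITIVE_NON_BLOCKING,
    "interactive_voice": ServiceClass.DELAY_SENSITIVE_NON_BLOCKING,
    "stored_video":      ServiceClass.DELAY_SENSITIVE_BLOCKING,
    "stored_audio":      ServiceClass.DELAY_SENSITIVE_BLOCKING,
    "text":              ServiceClass.DELAY_INSENSITIVE_ERROR_FREE,
    "numeric_data":      ServiceClass.DELAY_INSENSITIVE_ERROR_FREE,
    "image":             ServiceClass.DELAY_INSENSITIVE_ERROR_FREE,
}

print(DEFAULT_CLASS["interactive_video"].name)  # DELAY_SENSITIVE_NON_BLOCKING
```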
Communication protocols must therefore offer a spectrum of reliability levels and real-time characteristics to support the wide range of QOS levels indicated by each media type and dictated by individual multimedia applications. In Sections 4.1 and 4.2 we review techniques to provide synchronization, that is, maintenance of delay variation as dictated by QOS.

4 Technological Assessment: Multimedia Services

In this section we discuss solutions to the problems of object synchronization at three levels and present a model for distributed object composition. This model consists of a series of layers representing object composition protocols for imposing various types of synchronization, as proposed in [22, 15, 2]. These levels of synchronization deal with user interaction, object composition and management, and stream synchronization at the network layer (see Figure 9), and are described below.

4.1 Low-Level Synchronization Protocols

At a very low level, synchronization of multimedia data requires synchronization of data streams with very tight skew, jitter, and delay requirements (we defer discussion of reliability). Audio and video have these requirements. The low level must deal with methods of maintaining synchronization of data streams from disks, communication channels, and real-time inputs.

A proposed approach to the provision of a flexible communication mechanism for variable QOS uses variable-bandwidth channels provided by packet switching schemes [23, 24]. Rather than using fixed, dedicated, low-bandwidth (circuit-switched) channels with independent signaling channels, a high-bandwidth virtual circuit (VC) can be used to support different data transfer rates. B-ISDN is proposed to support the data transfer operations necessary for distributed multimedia applications, such as transmission of audio, video, and data. The proposed transport mechanism for B-ISDN, called Asynchronous Transfer Mode (ATM), seeks to provide [25]:

(1) a single network interface to communication channels for each media type: audio, video, image, and text;
(2) adaptability to an application's bandwidth requirements;
(3) flexibility for handling different data types;
(4) a common signaling structure.

The ATM transport technique uses a multiplexing scheme in which data is organized into units called cells. Each fixed-length cell contains a header, which includes media access, connection, and priority control information, as well as a data field. Cell length has not been standardized but is proposed to be less than 100 octets. Calls are associated with cells by the contents of the header; therefore, no channel bandwidth is occupied in a virtual circuit except during actual transmission of an ATM cell. The implication of this multiplexing strategy is that dynamic bandwidth allocation is possible as applications require varying communication performance dictated by differing data types or QOS. Each media type can utilize the same transport mechanism irrespective of the data type's bandwidth requirements.

With a transport mechanism like ATM, anomalies occur which make synchronization at the receiver difficult. These include packet loss, delay due to buffering and independent communication paths, and clock variation between the sender and receiver. Several approaches have been proposed to resolve these problems of synchronization at the stream level as well as the object level.
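Before turning to those approaches, here is a minimal sketch of the cell-based multiplexing just described. The field names and sizes are our own assumptions, since the text notes that the cell format had not yet been standardized beyond a length of less than 100 octets:

```python
# Sketch of an ATM-style fixed-length cell. Field layout and sizes are
# illustrative assumptions only; the text notes that the cell format
# had not been standardized beyond a length of less than 100 octets.

CELL_OCTETS    = 64          # assumed fixed cell length (< 100 octets)
HEADER_OCTETS  = 5           # assumed header size
PAYLOAD_OCTETS = CELL_OCTETS - HEADER_OCTETS

def make_cell(vci: int, priority: int, payload: bytes) -> bytes:
    """Build one fixed-length cell: header (VCI, priority, length) + padded payload."""
    assert len(payload) <= PAYLOAD_OCTETS
    header = vci.to_bytes(3, "big") + bytes([priority, len(payload)])
    return header + payload.ljust(PAYLOAD_OCTETS, b"\x00")

# A call occupies bandwidth only while its cells are on the wire:
cell = make_cell(vci=0x0102, priority=1, payload=b"audio samples...")
assert len(cell) == CELL_OCTETS
```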
In [22], the approach to stream synchronization is to establish a single multimedia virtual circuit (MVC), whereby synchronized media are multiplexed onto a single virtual circuit (VC) with inherently variable bandwidth. A similar approach is proposed in [3]. The rationale for this approach is twofold. First, since a single VC is used, the network guarantees that packets are received in the same order that they are sent, without incurring the delay variations (jitter) caused by multiple VC connections; transmission of the multiplexed media is serialized. Second, by using a single VC, connection establishment can be performed prior to data transfer without the additional signaling channels required by fixed-bandwidth circuits.

An MVC can be viewed as a broadband digital pipe consisting of multiple channels, one per medium or data stream, as shown in Figure 11. The exact number and characteristics (QOS) of the channels comprising an MVC depend on the application's requirements and its request at connection set-up. To support connection establishment and control, the interface between the network and transport layers provides the functions create, listen, modify, and clear, and the interface between the transport and session layers provides create, add, modify, and delete. These allow connection management, including specification of service class, priority, flow control, and other QOS parameters.

Figure 11. Multimedia Virtual Circuit (MVC)

Priority and service class describe the tolerance to delay and the peak and average bandwidth requirements. As channels are needed in a connection, the add function allows additional media to be multiplexed over the same MVC with the desired QOS. For this scheme to provide synchronization at the receiver, it is assumed that packets arrive in sequence.

Two levels of packet multiplexing and demultiplexing are defined in [22]. At the sender, streams of data (packets) from the multiple channels of a multimedia application are multiplexed onto a single MVC. Streams of packets from multiple MVCs are then multiplexed onto the network for transmission. The inverse operations are performed at the receiver. Different media of an application possess different delivery requirements, which can be represented by a priority assignment. To maintain temporal synchronization within an MVC, all channel priorities are raised to the level of the medium with the highest priority. MVCs with the same priority form a virtual circuit group (VCG). The approach to ensuring synchronization requires that (1) transmission priority is given to packets of the MVC in the VCG with the highest priority, and (2) a round-robin scheme is used for packets within a VCG (see the sketch following this discussion).

This scheme has some limitations [15]. First, by promoting the priorities of media channels within an MVC, inefficiencies result in the form of lost bandwidth. For example, the QOS for a text data channel can require error recovery while this is unnecessary for an audio channel; if both require the same priority and error protocol, unnecessary handshaking is performed. Second, the MVC model assumes point-to-point connections, with no provision for applications requiring multiple sources or destinations, such as shared windows, teleconferencing, and distributed objects.
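A minimal sketch of round-robin service within one MVC (the data structures are our own; the inter-VCG rule, giving transmission priority to the highest-priority VCG, is omitted):

```python
# Sketch of the intra-MVC multiplexing of [22]: the channels of one
# application share an MVC, with packets served round-robin so that
# transmission of the media is serialized. Structures are our own.

from collections import deque

class MVC:
    """One multimedia virtual circuit: per-medium channels, one priority."""
    def __init__(self, priority, channel_names):
        self.priority = priority          # all channels promoted to this level
        self.queues = deque((name, deque()) for name in channel_names)

    def submit(self, channel, packet):
        for name, q in self.queues:
            if name == channel:
                q.append(packet)

    def next_packet(self):
        """Round-robin over channels; returns None if all queues are empty."""
        for _ in range(len(self.queues)):
            name, q = self.queues[0]
            self.queues.rotate(-1)        # move the served channel to the back
            if q:
                return name, q.popleft()
        return None

# Audio and video multiplexed onto one MVC, transmission serialized:
mvc = MVC(priority=1, channel_names=["audio", "video"])
mvc.submit("audio", "a0"); mvc.submit("audio", "a1"); mvc.submit("video", "v0")
print([mvc.next_packet() for _ in range(3)])
# [('audio', 'a0'), ('video', 'v0'), ('audio', 'a1')]
```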
We can identify four types of connections for object retrieval and composition in a DMIS, each shown in Figure 12: (a) single source to single destination, (b) single source to multiple destinations, (c) multiple sources to a single destination, and (d) multiple sources to multiple destinations.

Figure 12. Multimedia Connections in a DMIS

Case (a) is a point-to-point connection for which a master-slave relationship exists between a single multimedia server and a sink; a single MVC can serve this connection. Case (b) represents a shared-object environment where a single object is displayed simultaneously to various users. This mode is necessary for collaborative or teleconferencing environments and requires concurrency and consistency control mechanisms. The MVC approach can handle this case via multicasting, or by establishing channels to each destination under a single controller at the source. Case (c) represents a distributed-object environment, for which complete composition is performed at the sink. This case is also handled by independent MVC channels, assuming that there are no dependencies between data from the different sources. If there are data dependencies between the channels, the MVC model is not directly applicable; however, by using an intermediate site, the dependencies can be resolved and an independent connection established with the single destination. Case (d), shown in Figure 12 (d), is the general case of distributed object composition in a shared-object environment, in which dependent objects are assembled from distributed sites and sent via a single MVC to the final destination(s).

In [15], continuous data streams are proposed to be exchanged on independent channels for each medium, rather than on a multiplexed MVC. Synchronization of related media is performed at coarse intervals, with the requirement that short-term synchronization drift (skew) be kept small. This is based on the assumption that bounded skew can be provided between temporally related data streams, for a short period of time, using appropriate buffering. At coarse synchronization points, the rate of change of skew and the instantaneous jitter are evaluated to provide corrective resynchronization via feedback to the buffering scheme. Frames in this context are called physical synchronization frames (PSF) [15] and refer to a physical unit of communication, such as the packet. A coarser grain of synchronization is described by logical synchronization frames (LSF). The relationship between PSFs and LSFs is shown in Figure 13. LSFs indicate discernible units of information (Table 2) which are comprised of sequences of PSFs. The PSFs serve synchronization within the communications mechanism, while the LSFs serve the controlling application. Furthermore, by specifying the relationship between LSFs and PSFs, a degree of QOS can be indicated for synchronization: a one-to-one relationship implies tight synchronization at the control level, while an n-to-one relationship indicates control-level resynchronization every n PSFs.

Figure 13. Physical and Logical Synchronization Frames

4.2 High-Level Synchronization Protocols

At the level of object composition and management, synchronization points are specified by LSFs [15]. These units represent the finest units that an application can manipulate; that is, stopping the presentation of a data stream in the middle of an LSF can still result in presentation of the remaining PSFs to the end of the LSF.
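A minimal sketch of the n-to-one PSF-to-LSF mapping just described (the grouping function is ours; n would be chosen from the desired QOS):

```python
# Sketch of the n-to-1 PSF/LSF relationship of [15]: the application
# sees logical synchronization frames (LSFs), each carrying n physical
# synchronization frames (PSFs). Helper names are our own.

def to_lsfs(psfs, n):
    """Group a PSF sequence into LSFs of n PSFs each (last may be short)."""
    return [psfs[i:i + n] for i in range(0, len(psfs), n)]

# n = 1 gives tight control-level synchronization (one LSF per PSF);
# larger n resynchronizes the controlling application every n PSFs.
packets = [f"psf{i}" for i in range(7)]
print(to_lsfs(packets, 1))   # 7 LSFs: tight synchronization
print(to_lsfs(packets, 3))   # resynchronize every 3 PSFs

# Stopping mid-LSF still presents the remaining PSFs of that LSF:
lsfs = to_lsfs(packets, 3)
stop_after_psf = 4                       # request arrives inside lsfs[1]
presented = lsfs[0] + lsfs[1]            # playout completes the whole LSF
print(presented)                         # ['psf0', ..., 'psf5']
```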
For representation of synchronization beyond the low level, object synchronization using an OCPN can be applied. At the level of complex multimedia objects representing orchestrations of independent media, temporal relationships must be established and maintained. Object modeling is required to identify classes of related presentation elements for the development of a logical database schema and their subsequent storage in a database. For instance, the multimedia objects associated with a conference call or a multimedia document must be classified for these applications. In Figure 14 a hierarchical ordering is assigned to the sub-objects comprising a full-motion video (film) object. Here, an object hierarchy is defined to which object synchronization can be applied; stream synchronization can be assigned to a low level, with units such as frames, rasters, or pixels.

Figure 14. Hierarchical Levels of Objects

Provided with object classes, individual relationships between independent media can be established using the OCPN modeling tool. From the OCPN, a synchronization schema is created, representing the required synchronization at the object level. The relationship between objects and LSFs depends on the degree of synchronization the application desires. Synchronization at this high level is facilitated by interacting processes dedicated to presenting individual objects, scheduled by a suitable policy based on QOS. An algorithm proposed in [2] can be used to retrieve objects from database storage and present them based on their temporal relationships. The main features of this algorithm, sketched below, are:

(1) It identifies temporal relationships from a database.
(2) It creates process threads for component objects (sub-objects) with concurrent temporal relationships.
(3) It supports arbitrarily nested complex objects through recursion.
(4) It defines synchronization points at the start and termination of component object presentation, as defined by the OCPN for the object hierarchy.

Details of this algorithm can be found in [2].
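The following is a minimal sketch in the spirit of features (2) through (4), not the algorithm of [2] itself: recursive descent over a nested object hierarchy, with a thread per concurrent component (all names are ours; td delay offsets are omitted for brevity):

```python
# Sketch in the spirit of the retrieval/presentation algorithm of [2]:
# recursive descent over a nested object hierarchy with a thread per
# concurrent component. A simplification of ours, not the algorithm
# itself; database retrieval and td delay offsets are omitted.

import threading, time
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    name: str
    duration_s: float

@dataclass
class Node:
    a: Union["Node", Leaf]
    b: Union["Node", Leaf]
    tr: str                               # one of the seven temporal relations

def present(item):
    """Present a leaf object or, recursively, a composed pair."""
    if isinstance(item, Leaf):
        print(f"start  {item.name}")      # synchronization point (feature 4)
        time.sleep(item.duration_s)       # stand-in for actual playout
        print(f"finish {item.name}")      # synchronization point (feature 4)
    elif item.tr in ("before", "meets"):  # sequential relations
        present(item.a)
        present(item.b)
    else:                                 # concurrent relations (feature 2)
        t = threading.Thread(target=present, args=(item.a,))
        t.start()                         # one thread per concurrent component
        present(item.b)                   # nested composites recurse (feature 3)
        t.join()                          # resynchronize at termination

present(Node(Leaf("audio", 0.1),
             Node(Leaf("video", 0.05), Leaf("text", 0.05), "meets"),
             "equals"))
```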
In the following section, an example of a multi-level synchronization scheme is presented.

4.3 Integrated Object Synchronization Model

Based on Figure 9 and the description of synchronization protocols given in the previous sections, a multilayered model for synchronization is proposed. This model unifies the currently known synchronization approaches at the different levels of object composition. The model can be mapped onto the OSI reference model, as shown in Figure 15.

Figure 15. Comparison of OSI and Integrated Synchronization Models

The model consists of four layers corresponding to specific layers of the OSI reference model: the application, window manager, object composition, and transport support layers. The application and window management layers, corresponding to the OSI application layer, provide the total functionality of the user interaction level of Figure 9; the multimedia application is expected to exist within a graphical user interface with this functionality. The object composition layer is responsible for providing synchronization at the level of objects, with an algorithm suitable to support real-time presentation of concurrent media to the application. This layer corresponds to the object presentation and scheduling layers of Figure 9. The transport support layer provides services to support stream synchronization and other network services.

As mentioned earlier, the interfaces between the synchronization layers must provide the ability to monitor and maintain synchronization at the higher levels. At the application layer this means providing primitives to support user functionality such as starting and stopping a presentation. At the transport layer it requires functionality to support control of skew, jitter, utilization, reliability, and other QOS parameters. In Figure 16, abstract interface primitives between the synchronization layers are indicated.

Figure 16. Interfaces Between Synchronization Levels

In response to synchronization feedback from the transport mechanism, the object composition layer can take corrective action to resynchronize two streams of data, as described in Section 4.1. This action can indicate a change in QOS to decrease object utilization or to increase speed to perform skew correction. In Figure 17, stream B undergoes skew correction by dropping a frame (a reduction in utilization). This is representative of possible resynchronization techniques within the transport mechanism layer.

Figure 17. Skew Correction

In Figure 18, a complete example of the integrated synchronization model is provided, demonstrating the layered synchronization approach for temporal object composition. The example represents a medical information system incorporating audio, video, text, and graphics. Multimedia objects are synchronized at three levels as follows:

(1) At the level of user interaction, at starting and stopping points, indicated in Figure 18 (a) as a set of queries through an information network, shown in Figure 18 (b). Figure 18 (c) indicates the hierarchy of database elements forming the multimedia presentation associated with a topic in the browsing model.

(2) At the object level, between audio verbalizations and video segments, as shown in Figure 18 (d). The various relationships between objects of the presentation are illustrated with their coarse synchronization points and corresponding OCPN.

(3) At the stream synchronization level, between audio packets and video frames, shown in Figure 18 (e). At this level, sets of PSFs are mapped to the LSFs comprising the presentation objects.

Figure 18. OCPN, LSF, PSF Relationship

This example indicates the wide applicability of the integrated model to the temporal composition problems of multimedia applications.

5 Conclusion

Presented in this paper is a review of current techniques and approaches to the problem of temporal object composition for a distributed multimedia information system. An integrated model combining these approaches to the problems of low-level and high-level synchronization and object management is also proposed.

Further research on the general problem of object composition is needed. In addition to the numerous communication research areas associated with B-ISDN, various other issues must be addressed before distributed multimedia applications gain wide acceptance. These include the investigation of a spatial composition mechanism, the use of the object-oriented paradigm for inter-object and inter-machine communication, the investigation of heterogeneity issues, distributed load sharing and the assignment of the composition process based on data distribution [26, 16], and the development of scheduling algorithms for desired QOS.
Presently, research is being conducted at the Multimedia Information Laboratory at Syracuse University towards the development of a DMIS using the integrated synchronization model for a distributed system. The approach is to develop a working multimedia applications platform on an extensible system of heterogeneous workstations and file servers, implementing a distributed data version initially, but with the goal of fully distributed operation [27]. Results of this work will appear in a later publication.

6 References

[1] Garcia-Luna-Aceves, J.J., "Towards Computer-Based Multimedia Information Systems," Computer Message System - 85, R. Uhlig, Ed., Elsevier-North-Holland, Amsterdam, 1985, pp. 61-77.
[2] Little, T.D.C., Ghafoor, A., "Synchronization and Storage Models for Multimedia Objects," IEEE Journal on Selected Areas in Communications, Vol. 8, No. 3, April 1990, pp. 413-427.
[3] Sventek, J.S., "An Architecture for Supporting Multi-Media Integration," Proc. IEEE Computer Society Office Automation Symposium, April 1987, pp. 46-56.
[4] Berra, P.B., Chen, C.Y.R., Ghafoor, A., Lin, C.C., Little, T.D.C., Shin, C., "Multiuser Multimedia Application Development System Project," Technical Report, The New York State Center for Advanced Technology in Computer Applications and Software Engineering (CASE), Syracuse University, October 1989.
[5] Steinmetz, R., "Synchronization Properties in Multimedia Systems," IEEE Journal on Selected Areas in Communications, Vol. 8, No. 3, April 1990, pp. 401-412.
[6] Zobrist, A.L., Nagy, G., "Pictorial Information Processing of Landsat Data for Geographic Analysis," IEEE Computer, Vol. 14, No. 11, November 1981, pp. 34-41.
[7] Perry, J.R., et al., "Performance Features for a PACS Display Console," IEEE Computer, Vol. 16, No. 8, August 1983, pp. 51-56.
[8] Ahlgren, D.R., Crosbie, J., Eriqat, D., "Compression of Digitized Images for Transmission and Storage Applications," Proc. SPIE Vol. 901, Image Processing, Analysis, Measurement, and Quality, Los Angeles, CA, January 1988, pp. 105-113.
[9] Poggio, A., et al., "CCWS: A Computer-Based Multimedia Information System," IEEE Computer, Vol. 18, No. 10, October 1985, pp. 92-103.
[11] International Organization for Standardization, ISO Document No. 8613, ISO, Geneva, March 1988.
[12] Mohan, L., Kashyap, R.L., "An Object-Oriented Knowledge Representation for Spatial Information," IEEE Transactions on Software Engineering, Vol. 14, No. 5, May 1988, pp. 675-681.
[13] Oosterom, P.V., Bos, J.V.D., "An Object-Oriented Approach to the Design of Geographic Information Systems," Computers and Graphics, Vol. 13, No. 4, 1989, pp. 408-418.
[14] Postel, J., Finn, G., Katz, A., Reynolds, J., "An Experimental Multimedia Mail System," ACM Transactions on Office Information Systems, Vol. 6, No. 1, January 1988, pp. 63-81.
[15] Nicolaou, C., "An Architecture for Real-Time Multimedia Communication Systems," IEEE Journal on Selected Areas in Communications, Vol. 8, No. 3, April 1990, pp. 391-400.
[16] Little, T.D.C., "Synchronization of Distributed Multimedia Objects," Ph.D. Dissertation, in preparation, Syracuse University.
[17] Liou, M.L., "Visual Telephony as an ISDN Application," IEEE Communications Magazine, Vol. 28, No. 2, February 1990, pp. 30-38.
[18] Hamblin, C.L., "Instants and Intervals," Proc. of the 1st Conf. of the Intl. Society for the Study of Time, J.T. Fraser et al., Eds., Springer-Verlag, New York, 1972, pp. 324-331.
[19] Abate, J.E., et al., "AT&T's New Approach to the Synchronization of Telecommunication Networks," IEEE Communications Magazine, Vol. 27, No. 4, April 1989, pp. 35-45.
[20] Hehmann, B.B., Salmony, M.G., Stuttgen, H.J., "Transport Services for Multimedia Applications on Broadband Networks," Computer Communications, Vol. 13, No. 4, May 1990, pp. 197-203.
[21] Lazar, A.A., Temple, A., Gidron, R., "An Architecture for Integrated Networks that Guarantees Quality of Service," to appear in Intl. Journal of Digital and Analog Cabled Systems.
[22] Leung, W.H., et al., "A Software Architecture for Workstations Supporting Multimedia Conferencing in Packet Switching Networks," IEEE Journal on Selected Areas in Communications, Vol. 8, No. 3, April 1990, pp. 380-390.
[23] International Telegraph and Telephone Consultative Committee (CCITT), SG XVIII Draft Recommendation I.121, 1988.
[24] National Standards Organization, T1S1 Technical Subcommittee, "Broadband Aspects of ISDN," December 1988.
[25] Wernik, M., et al., "Supporting Multimedia Applications in Asynchronous Transfer Mode Networks," Proc. 2nd IEEE COMSOC Intl. Multimedia Communication Workshop (Multimedia '89), Ottawa, April 1989.
[26] Ghafoor, A., Berra, P., Chen, R., "A Distributed Multimedia Database System," Proc. Workshop on the Future Trends of Distributed Computing Systems in the 1990s, Hong Kong, September 1988, pp. 461-469.
[27] Berra, P.B., Chen, C.Y.R., Ghafoor, A., Lin, C.C., Little, T.D.C., Shin, C., "An Architecture for Distributed Multimedia Database Systems," Computer Communications, Vol. 13, No. 4, May 1990, pp. 217-231.

Acknowledgment

We wish to thank the anonymous reviewers for their helpful comments in the development of this paper, and to acknowledge the support of the New York State Center for Advanced Technology in Computer Applications and Software Engineering (CASE) at Syracuse University.

Biographical Information

Thomas D.C. Little (S'82-M'83-S'87) received the B.S. degree in biomedical engineering from Rensselaer Polytechnic Institute, Troy, New York, in 1983 and the M.S. degree in electrical engineering from Syracuse University, Syracuse, New York, in 1989. He is currently a candidate for the Ph.D. degree in computer engineering at Syracuse University. Since 1983 he has been involved with applications for embedded, real-time computer systems, including automatic call routing and autonomous ocean sensing. He is presently working for the CASE Center at Syracuse University on the development of a distributed multimedia information system. His current research interests include distributed systems, multimedia database management, and real-time system design. Mr. Little is a member of the IEEE Computer and Communications Societies and the Association for Computing Machinery.

Arif Ghafoor (S'84-M'84-SM'89) received the B.S. degree in electrical engineering from the University of Engineering and Technology, Lahore, Pakistan, in 1976, and the M.S., M.Phil., and Ph.D. degrees from Columbia University in 1977, 1980, and 1985, respectively. He joined Syracuse University in September 1984 as an Assistant Professor. He is a consultant to several companies, including Bell Labs and General Electric, in the area of telecommunications. His current research interests include the design and analysis of parallel and distributed systems and optical information processing. Dr. Ghafoor is a member of Eta Kappa Nu.