Spatio-Temporal Composition of Distributed Multimedia Objects for Value-Added Networks

Thomas D.C. Little, Syracuse University
Arif Ghafoor, Purdue University

Abstract - The main requirement for distributed multimedia information systems (DMISs) is the integration of data with complex spatial and temporal relationships prior to presentation to the end-user. These data are often distributed to multiple remote locations in an open system providing public and private database access. Rather than the management of homogeneous data types, multimedia applications require special techniques for retrieval and presentation necessary to provide timely delivery of perishable, heterogeneous data, in spite of delays introduced by the network interconnection of distribution sites. The integration of composite data can be performed locally at workstations, or in a hierarchical form within the network as a value-added service in order to reduce communication cost. In this paper we investigate the impact of the distribution of the data and various multimedia object composition architectures on the characteristics of the DMIS.

Keywords: distributed multimedia information system, value-added network, spatial & temporal integration, synchronization, communications protocols.

1 Introduction

Advances in workstation and network technologies and the desire to improve communication have created great interest in multimedia applications which require the processing, storage, and transmission of various data types including audio, video, text, graphics, and images. Multimedia is an emerging application-oriented technology embracing many computer disciplines including interface design, networks, and databases, and presents many interesting challenges for research. In a distributed multimedia information system (DMIS), an important requirement is the integration, or composition, of multimedia objects retrieved from databases distributed across a network.
We define multimedia objects to be aggregate units comprised of data elements taken from the aforementioned data classes (text, image, etc.). Such integration depends on both the temporal and spatial characteristics of the multimedia elements. Temporal integration requires evaluation of the temporal relationships among component elements and scheduling their retrieval and presentation to satisfy these relations. The relations can be continuous, such as those that exist for live audio and video, or synthetic, consisting of arbitrary temporal constraints on any multimedia data type [1]. For example, in Fig. 1, various data elements are retrieved from storage and presented in a serial fashion at the times indicated.

Spatial integration of multimedia data is unique to each medium and describes the assembly of objects in space, e.g., on a workstation display, at certain points in time. For pictorial representations such as still images and graphics, integration operations include overlay and mosaic, and require processing such as scaling, cropping, color conversion, and position registration. For audio data, integration consists of superposition, or mixing, of signals. Other "spatial" audio operations include gain and tone adjustment, which are useful in videoconferencing applications to prioritize a speaker's voice amongst many. Gain or tone differences can signify "distance" among participants via signal distortion techniques [2]. A typical spatial composition for pictorial data is shown in Fig. 2, constructed using scaling and mosaic-forming operations.

The primary issue addressed in this paper is the investigation of the overall process necessary to perform spatial and temporal data integration over a network to support a DMIS.
We find that temporal integration can be most suitably achieved at the workstation with respect to delays introduced through the network, while spatial composition is most effectively performed in a hierarchical fashion, as dictated by the underlying network support and processing resources, in order to reduce the volume of transmitted data. The resulting composition methodology is unique in its combination of both spatial and temporal integration as a network service.

The remainder of this paper is organized as follows. In Section 2, database organizations and data distributions are investigated. Sections 3 and 4 provide a discussion of the spatial and temporal composition functions, respectively, and their integration into the network architecture. In Section 5, a mapping of the composition process onto the network resources is described as a value-added service. Section 6 concludes the paper.

2 Database Organizations and Distributed Object Composition Architectures

A number of common database management systems (DBMSs) can be identified for multimedia applications. These systems can be organized as the centralized, master-slave, and federated types, as shown in Fig. 3 (a-c). In the simplest case, a multimedia information system resides on a single server system incorporating DBMS functionality for each medium. In this case, data integration is done entirely at the server. However, access to remote databases expands the potential applications for multimedia services, requiring a multiple database organization such as Fig. 3 (b) and (c), which is appropriate for large-scale multimedia applications. Here, a data server is defined as an intelligent multimedia database machine capable of performing complex multimedia database operations such as composition, browsing, linking, pattern matching, etc., in addition to standard DBMS operations such as selection, projection, and join.
The most suitable database organizations are the federated and master-slave types as shown in Fig. 3 (b) and (c). In either case, the users must have the ability to query across the universe of available data. This requires global naming of data, and resolution of heterogeneity between DBMSs, hardware, data formats, etc. After data elements are identified and located, the spatial and temporal integration of accessed data can be viewed as an added service for the creation of complex multimedia objects. What remains is the specification of the composition function and the task of distributing it over the set of computing elements in the DMIS in a manner appropriate for the DBMSs.

In a centralized environment where the data are not distributed (e.g., Fig. 3 (a)), the composition of the objects is performed solely by a single server. However, the centralized environment is not rich enough to support most multimedia applications since only local data are accessible. On the other hand, the availability of a large number of databases, accessible over a network, provides the potential for large-scale multimedia applications. For such a DMIS, multiple database servers can perform spatial integration to reduce the load on the destination workstation as well as the required network bandwidth for object communication. In particular, spatial operations such as generation of mosaics, cropping, and color conversions, which are common in window-based environments, when performed at the server sites, can significantly reduce superfluous data transfer. The same benefit is not exhibited by integration at remote servers when temporal integration is considered, since no data reduction occurs between source and destination. For strictly temporal composition, without spatial integration requirements, a point-to-point virtual connection between source and destination is most suitable for maintaining synchronization for continuous communication and presentation of objects.
No additional processing can be performed within the network at some intermediate site.

Based on the assumption of a distributed DBMS organization, five types of connections for object retrieval and composition in a DMIS can be identified. These composition architectures, shown in Fig. 4, include (a) single source to single destination, (b) single source to multiple destinations, (c) multiple sources to single destination, (d) multiple sources to a single destination with composition at an intermediate site, and (e) multiple sources to multiple destinations.

Case (a) is a point-to-point connection for which a client-server relationship exists between a single multimedia server and the workstation. Case (b) represents a shared object architecture in which a single object is displayed simultaneously to various users via multicasting. This mode is necessary to enable Computer-Supported Cooperative Work (CSCW) [3], or teleconferencing. Additional requirements include concurrency and consistency control mechanisms for shared objects. Case (c) represents a distributed object environment, for which complete composition is performed at the sink in a multidrop fashion. This case can be handled by independent network connections between the sink and each source, and poses a challenge to control intermedia synchronization by the workstation. Fig. 4 (d) shows a scenario in which objects are composed at an intermediate site after arriving from distributed sources and are sent via a single connection to the final destination. By using a single connection, data sequencing ensures strict ordering and thereby provides intermedia synchronization [4]. Case (e) defines the general multicast and multidrop case of distributed object composition in a shared object environment.

As mentioned above, the process of data composition can be distributed within the network to minimize various system costs.
In particular, the criterion for the selection of a composition locus is a function of various system costs, including communication, storage, and processing, and the desired performance characteristics of the system such as reliability and quality of communication service. The problem is analogous to the optimization of queries in a distributed DBMS [5]. In the remainder of this paper we investigate the composition issues for both spatial and temporal integration and comment on the various considerations for the selection of a composition architecture.

3 Spatial Composition

Current data composition schemes incorporating video data are mostly analog in nature. For example, an image mosaic can be generated by the capture and placement of video stills from the analog output of a videodisk player. Analog schemes differ from the composition of video, images, audio, and text in a completely digital domain. In the digital realm, we can utilize digital storage, processing, and communication technology, gain open system applicability, and achieve device independence in an integrated environment.

The task of composing data spatially as shown in Fig. 2 can be specified using an object-oriented paradigm and implemented by various DMIS components. Many data representations for composite multimedia data have been proposed based on the object-oriented paradigm (e.g., [6]). In this paper we concentrate on the task of composition rather than the discussion of a complete object-oriented spatial data representation. We assume a simple layout structure based on ODA [7], as shown for an example in Fig. 5 (a-b). If data are composed at the workstation, user interaction can be most readily incorporated into the composition process. For example, choice of window placement, resolution of occlusion, evaluation of viewspace coordinates, scaling, etc., can be handled locally without dialogue with a remote composition service. Similarly, object modification is most readily achieved by the workstation.
However, some objects may be large enough to exceed the local storage capacity of the workstation, and modification must be performed on fragments of the original object, stored elsewhere.

Composition at the workstation requires transmission of complete multimedia objects from the database, such as full-size color video or full-resolution images. At the destination, much of this resolution is not utilized after scaling, cropping, color mapping, or other spatial manipulation. A remote service, by contrast, can perform these spatial operations prior to data transmission, reducing the transmission overhead and freeing the task from the workstation. Additionally, specialized hardware can be maintained at the composition sites (e.g., array processors, video special effects devices) which is optimized for high-speed spatial manipulation.

To provide a value-added service facilitating these kinds of spatial operations, we describe a set of spatial manipulation primitives that can be implemented as standard functions at various sites within the DMIS. These primitives can be partitioned into two classes: unary and binary operations. Unary operations are applied to single data objects, e.g., the low-pass filtering of an audio segment, or the cropping of an image. Binary operations combine pairs of objects, creating their composition. Examples of these operations are summarized in Table 1 with their approximate processing costs. We envision spatial composition to be a multiphase process consisting of unary operations for adjusting the data elements followed by binary operations composing the adjusted elements into final form. Further, we define a logical entity, called the Spatial Translator (ST), to be distributed throughout the network to perform these operations.
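The multiphase unary/binary structure just described can be sketched in code. This is a minimal illustration, not the paper's ST specification: the function and variable names are our own, and "images" are simple lists of pixel rows.

```python
# Hypothetical sketch of Spatial Translator (ST) primitives; names are
# illustrative, not from the paper. Images are lists of pixel rows.

def crop(img, x, y, w, h):
    """Unary: extract a w-by-h region at (x, y) -- reduces data volume."""
    return [row[x:x + w] for row in img[y:y + h]]

def scale_down(img, factor):
    """Unary: subsample every factor-th pixel -- reduces data volume."""
    return [row[::factor] for row in img[::factor]]

def overlay(bg, fg, x, y):
    """Binary: opaque superposition of fg onto bg at (x, y)."""
    out = [row[:] for row in bg]
    for j, row in enumerate(fg):
        out[y + j][x:x + len(row)] = row
    return out

# Multiphase composition: unary adjustment first, then binary merging.
background = [[0] * 8 for _ in range(8)]            # 8x8 "image" of zeros
source = [[1] * 6 for _ in range(6)]                # 6x6 "image" of ones
subimage = scale_down(crop(source, 0, 0, 6, 6), 3)  # 2x2 after adjustment
composite = overlay(background, subimage, 3, 3)
```

The two-phase shape mirrors the text: unary operations shrink or adjust elements near their source; binary operations merge the adjusted elements into the final form.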
Note that these operations constitute the overall translation from storage to workstation display, a simplification of the many intermediate transformations between the various coordinate systems and canonical representations usually indicated in graphics programming standards (e.g., GKS, PHIGS). More detailed analysis of spatial operations with respect to communication and processing requirements is given below.

By distributing the operations with a uniform interface throughout the DMIS, spatial data composition can be tailored to the requirements of the individual objects. As data are retrieved, spatial operations can be invoked at remote servers through the uniform interface with the goal of reducing data traffic. This issue is discussed in Section 5.

3.1 Communication and Computational Requirements for Spatial Operations

As mentioned above, spatial operations are basically transformations on multimedia objects to achieve desired functionality such as cropping, filtering, overlaying, etc. Some transformations may result in data reduction, while in other cases the new data elements can be larger than the old (e.g., colorization, scale-up, etc.). If the criterion to select the server or workstation for composition is based on reduction of network traffic, then clearly we want to perform unary spatial transformations at the server site only when a data reduction occurs. However, to minimize processing at the workstation, we may want to perform even the data-increasing transformations at the server as well.

Considering binary spatial manipulation, suppose the criterion to select a site for composition is based on reducing network traffic and minimizing workstation processing load. Let R1 and R2 represent two objects to be merged spatially in some arbitrary way, with characteristic data sizes of |R1| and |R2|. The binary spatial merge for some transformation bg, Rd = bg(R1, R2), will result in a final display object Rd with size |Rd|.
Clearly max(|R1|, |R2|) ≤ |Rd| ≤ |R1| + |R2|, since the merge represents in the worst case the union of the two objects (e.g., overlay, abut, mix). This relationship is valid for any type of data object including text, image, video, and audio objects, assuming the absence of data compression. If |Rd| = max(|R1|, |R2|), then a potential reduction of min(|R1|, |R2|) in data traffic to the workstation results by performing composition at the server. On the other hand, if |Rd| = |R1| + |R2|, no savings result with such composition. Within this range, the choice of a site for composition depends on the penalty associated with the composition processing versus the necessary data communication bandwidth, and the composition requirements of other related objects. The overall savings in data transmission is determined by considering all operations required in forming the final composite object, including unary and binary operations.

As an example, consider the multimedia object of Fig. 2. The data comprising the object are assumed to exist in several distributed servers. Each image is stored in 24-bit, full-color format (8 bits per color, 3 colors per pixel). The subimages in the figure are created by cropping their corresponding (1200 x 925 pixel) source images to a size of (120 x 120 pixel). Each of these is superimposed (opaque) onto the background image along with the ASCII text (1989 char x 8 bits/char) with a selected font. The final composite object is entirely in bit-map form and is the size of the cropped background image (1100 x 825 pixels). If composition is performed at the workstation, the raw, unprocessed images and text would need to be transmitted there (1200 x 925 pixels x 24 bits/pixel x 5 images + 1989 char x 8 bits/char = 133,215,912 bits). If composition is performed prior to transmission, then the data transmission requirement is 1100 x 825 pixels x 24 bits/pixel = 21,780,000 bits.
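Recomputing the two transmission scenarios directly from the element sizes stated above (no compression, no storage or communication overhead) makes the comparison explicit:

```python
# Traffic comparison for the Fig. 2 example: composing at the workstation
# (transmit all raw sources) versus composing at a remote server before
# transmission (transmit only the final composite).

BITS_PER_PIXEL = 24  # 8 bits per color, 3 colors per pixel

# Workstation composition: five full 1200 x 925 source images plus ASCII text.
raw_bits = 1200 * 925 * BITS_PER_PIXEL * 5 + 1989 * 8

# Remote composition: only the final 1100 x 825 composite is transmitted.
composed_bits = 1100 * 825 * BITS_PER_PIXEL

savings = raw_bits - composed_bits
reduction = savings / raw_bits  # fraction of traffic avoided
```

With the stated sizes this yields a reduction of roughly 84%, the figure quoted in the text.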
The savings in data transmission for remote composition is 111,435,912 bits, or a traffic reduction of 84%. Note that in these calculations we have assumed neither any data compression nor the overheads associated with storage and communication. However, the percentage savings will remain unchanged with compression. Additional benefit is gained with this scheme if specialized hardware at the server site is employed. This example illustrates the reduction of data possible during the process of data composition at some intermediate site in the network as data are retrieved from a database, merged, and sent to the final presentation site.

The spatial processing requirements depend on various factors, which include the size and type of object considered, the type of composition function, the algorithm used, and the implementation, whether parallel or serial. Table 1 summarizes processing approximations for some of the various spatial operations on images, audio, and text. Either time or number of operations can be applied as costs. For example, the move cost is related to memory access time, i.e., Pm equals two memory access cycles (one read, one write).

For a given composite multimedia object, the spatial formatting requirements can be quantified by using the cost estimates for the various spatial transformations. By chaining the cost estimates as described by the spatial hierarchy, an overall processing cost can be estimated. The evaluation of such processing requirements is important in determining the real-time computational performance of the workstation, and the distribution of composition operations.

4 Temporal Composition

The time-dependent characteristic of multimedia data motivates the necessity to synchronize data objects in time. This requirement extends to the synchronization of a time sequence of static objects, such as still images and text, and to continuous streams of audio and video.
A DMIS must satisfy this requirement in the presence of random network delays that are due to the inherently asynchronous nature of a packet network [1,8,9] and storage device latencies. This problem is particularly acute since several streams of different origin can require synchronization to each other. For continuous streams of data, the problem is to ensure the proper playout time of each data element in spite of random network delays, as illustrated in Fig. 6. Synchronization of this type has been typically applied to singular streams of packetized audio and video [8] but can be generalized to multiple streams and non-stream data (e.g., still images and text). An important factor for synchronization is the determination of the delay and buffering required to achieve a target level of packet loss for various network delay distributions. This delay, called a control time T, can be found for audio and video streams [8,9] given a target packet loss probability. The same principle can be applied to non-stream data. A difference, however, is that in the former analyses, it is assumed that packets are generated at a rate equal to the consumption rate, and that the capacity of the communication channel is never exceeded. When arbitrary sequences of multimedia objects are presented, it is possible to specify concurrency in presentation such that channel capacity is exceeded for some intervals. However, due to the flexibility in object retrieval for stored-data applications, the times for data retrieval can be reorganized such that the channel capacity is not exceeded. Of course, this is not possible for live data sources. In essence, database sources give us more freedom in the control of the time at which data are acquired by the application, as shown in Fig. 7.
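The determination of a control time T from a delay distribution and a target loss probability can be illustrated with a small sketch. The quantile view below follows the cited stream-synchronization analyses only in spirit; the exponential delay model and all parameter values are illustrative assumptions, not taken from [8,9].

```python
# Hedged sketch: choose a control time T so that the fraction of packets
# arriving later than T stays below a target loss probability. The delay
# model (exponential, 20 ms mean) is purely illustrative.
import random

random.seed(1)
target_loss = 0.01  # at most 1% of packets may arrive after T

# Illustrative end-to-end network delay samples, in seconds.
delays = [random.expovariate(1 / 0.020) for _ in range(10_000)]

# T is the (1 - target_loss) quantile of the observed delay distribution:
# deferring playout by T lets all but ~1% of packets arrive in time.
T = sorted(delays)[int((1 - target_loss) * len(delays)) - 1]

late = sum(d > T for d in delays) / len(delays)  # realized late fraction
```

A heavier-tailed delay distribution or a stricter loss target pushes T, and hence the required buffering, upward.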
Beyond the problem of compensating for random network delays, synchronization of multiple streams can be controlled by the destination workstation, or by any other intermediate server within the network prior to the delivery of the data to the destination. The temporal specification needs to be known to the synchronization controller irrespective of the site of the controller. A Petri-net-based approach to specifying the temporal requirements for multimedia objects can be used for this purpose [1,10]. If synchronization is performed at the destination (e.g., Fig. 4 (c)), then the workstation must evaluate the temporal specification of objects and carry out a synchronization dialogue with the remote servers prior to and during data transfer. If synchronization is performed at any other intermediate server (e.g., Fig. 4 (d)), the workstation does not need to evaluate the temporal specification. However, due to network latencies, retransmission (if applicable), workstation performance limitations, etc., significant skew can be introduced among these synchronized streams, and an intermediate server has little control over the final result at the workstation. Therefore, it is difficult to rely on an intermediate server to provide fine synchronization as is required for audio/video streams, which generally require a skew of less than 150 ms.

4.1 Multimedia Synchronization Service

A typical application might use the synchronization service for pre-orchestrated presentations, teleconferencing, or CSCW. Typically an object is identified from a database through browsing or querying operations. Once identified, the object can be retrieved, composed, and presented. Since both spatial and temporal composition specifications must be met, an important operation in this scheme is the decomposition of multimedia data into classes for independent transfer.
This separation is desirable since network performance can be increased by isolating unique data traffic classes and by using different transfer protocols tailored to each class [11]. In essence, such transport protocols can provide different levels of guaranteed service for each data type based on the data's tolerance to packet delay and loss; e.g., an image object requires error-free service while audio objects can tolerate errors.

We present two communication protocols to perform synchronization as a value-added network service between source and destination [12]. These protocols, called the Application Synchronization Protocol (ASP) and the Network Synchronization Protocol (NSP), allow the communication of complex multimedia presentations from distributed sources for playout at a single site. The protocols utilize a Petri-net-based temporal specification of a multimedia object [12]. This model basically specifies the precedence relations among all subobjects in the form of a partial and strict ordering. In the former case, objects have simultaneous playout deadlines and require concurrent presentation, while in the latter case, presentation must be strictly sequential in time (e.g., Fig. 1).

The purpose of the ASP is to set up and initiate data transfer as specified by temporal requirements on an end-to-end basis. Spatial requirements of an object affect the ASP since they must be passed during connection set-up to the remote sites. The ASP takes as input a selected object representing the aggregation of a complex multimedia presentation requiring synchronization, and returns independent streams of synchronized data traffic which can then be routed to specific output devices for presentation at the workstation. The interface also provides control over the quality and cost of transmission service by negotiating a target packet loss probability and delay with the underlying guaranteed-service data transport mechanism (such as [11]).
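The ASP's role as just described — take an aggregate object, negotiate per-stream service quality by medium, and return independent streams — can be sketched as follows. All names and tolerance values are illustrative stubs, not the protocol's actual interface.

```python
# Hypothetical sketch of the ASP interface: decompose an aggregate object
# into per-medium streams with negotiated loss targets. Illustrative only.

# Per-medium tolerance to packet loss used when negotiating with the
# transport layer: image and text need error-free service, audio and
# video can tolerate some loss (values are assumptions).
LOSS_TOLERANCE = {"image": 0.0, "text": 0.0, "audio": 0.01, "video": 0.01}

def asp_setup(components):
    """components: list of (medium, source_site) pairs in one object."""
    streams = []
    for medium, site in components:
        streams.append({
            "medium": medium,
            "source": site,
            "target_loss": LOSS_TOLERANCE[medium],  # negotiated with the NSP
        })
    return streams  # independent streams, routed to output devices

streams = asp_setup([("video", "DS1"), ("audio", "DS2"), ("image", "DS3")])
```

Each returned stream then maps to its own network connection, which is what lets the transport layer apply a different guaranteed-service level per traffic class.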
The NSP in turn provides a data transfer facility with a predicted end-to-end delay characteristic based on the specified probability of late packets. The ASP and NSP do not specifically provide a mechanism for reconstructing late or lost packets; rather, they rely on selecting appropriate quality of service parameters for each medium's tolerance to delay and loss. These protocols allow synchronization of independent network connections, unlike other approaches to synchronization [4] that require sequencing of synchronized data onto a single virtual circuit (e.g., Fig. 4 (d)). For the ASP, it is assumed that all synchronization is performed at the destination. Intermediate nodes are considered only if some intermediate spatial composition function is needed; otherwise a point-to-point connection is implied with corresponding end-to-end properties.

In summary, the ASP and NSP provide synchronization as a network value-added service for multimedia objects and involve the following steps:

(1) Retrieval of the spatio-temporal relationships describing the components comprising the complex multimedia object.
(2) Evaluation of the precedence relationships in the Petri net, thereby creating a playout schedule.
(3) Decomposition of the schedule into subschedules based upon the different traffic classes represented and the locations of stored data.
(4) Determination of the overall control time required to maintain synchronization among the traffic classes through interaction with the NSP and cooperating sites.
(5) Provision of synchronous data transfer.

The combination of the ASP and NSP with the composition architecture provides a value-added service by the network, as we discuss in the following section.

5 Multimedia Composition as a Value-Added Network Service

In this section, we investigate a combined approach to performing both temporal and spatial composition for a DMIS as a service within the network.
In the future, we envision that the heterogeneity of the network in terms of speed and topology will force the overall composition process to be hierarchical, in the sense that multiple data servers (DSs) and intermediate sites, or composition servers (CSs), collaborate to compose the requested objects both spatially and temporally. Generally, an object model consists of information describing the various operations necessary for spatial composition and intermedia timing, and is stored in the network at a central site. At the time a session is established by a user, this information is identified from the central site, and the object hierarchy is decomposed and mapped onto the set of servers (i.e., DSs, CSs, and the workstation).

The problem of assignment of a composition locus for a given multimedia object is analogous to query optimization for distributed databases [5]; however, it differs in several ways. First, a sequence of spatial operations often cannot be permuted, and therefore little optimization can be achieved through reorganizing the sequence of spatial operations. Some spatial transformations, e.g., scale-up, increase data volume and are optimal if performed closer to the data's destination rather than near its source. Second, the optimization technique assumes homogeneity in processing and communication costs and therefore does not account for specialized hardware for each medium, nor does it consider the necessity for load balancing when long-lived database transactions are present (e.g., movies).

As mentioned earlier, temporal composition always requires control by the workstation, since independent media cannot be combined in any reasonable manner at intermediate sites in the network en route to the playout destination, except to provide sequencing [4]. On the other hand, spatial composition can be performed at either the composition servers or at the destination workstation.
The choice is dependent on the characteristics of the objects to compose, the workstation storage and computational capability, and the bandwidth of the network. If remote composition is dictated, the composition is delegated to at least one CS, which we call the primary CS. The secondary servers consist of data sources. The choice of primary server should take into account the following considerations:

(i) The locality of objects.
(ii) The amount of spatial processing required on the selected objects.
(iii) The spatial composition capability of each CS (e.g., array processor).
(iv) The current CS loading.

The first factor means that the composition server with the closest proximity to the largest percentage of data is optimal [5], and is therefore most suitable to be the primary server. Similarly, the spatial processing consideration can be weighed in determining the primary server. By analyzing the spatial organization of a complex object, the mapping of the composition function to a specific CS or set of CSs can be done based upon its requirements and the capabilities of the servers. Also, the load on a CS can affect the primary server selection.

With respect to the workstation and network components, the assignment of spatial operations must consider the utilization of bandwidth in an optimal way. This can dictate reducing objects at the CS (e.g., scale, crop, filter), enlarging objects at the WS (e.g., scale, colorize, format text), and building windows at the WS (e.g., dealing with occlusion). If minimization of the workstation computational requirement is desired, then all spatial operations must be performed at the CS, including the management of windows (e.g., occlusion). Clearly there is a tradeoff between traffic on the network, computation at the workstation, and user control over presentation. When objects are composed at the CSs, the user loses the ability to control the assembly of data since this operation is performed prior to the reception of the data.
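One way to operationalize the primary-server selection factors is a simple weighted score over candidate CSs. The weights, field names, and candidate values below are purely illustrative assumptions, not a rule from the paper.

```python
# Hypothetical scoring rule combining the primary-CS selection factors.
# Weights and fields are illustrative; a real system would calibrate them.

def score(cs):
    # (i) locality: fraction of the object's data stored near this CS
    # (ii)/(iii) capability: suitability for the required spatial operations
    # (iv) loading: a heavily loaded server is penalized
    return 0.5 * cs["data_locality"] + 0.3 * cs["capability"] - 0.2 * cs["load"]

candidates = [
    {"name": "CS1", "data_locality": 0.8, "capability": 0.6, "load": 0.9},
    {"name": "CS2", "data_locality": 0.6, "capability": 0.9, "load": 0.2},
]
primary = max(candidates, key=score)  # choose the best-scoring CS
```

Here the lightly loaded, more capable CS2 wins despite CS1 holding more of the data, illustrating how the factors trade off against one another.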
A compromise solution is to allow specification of the object hierarchy such that some objects must be assembled at the WS and others can be distributed to various CSs. Based on the above considerations, we show the combined operation scenario for both spatial and temporal object composition, using the ASP and ST entities, in the following example.

5.1 Example: The Electronic Magazine

The Electronic Magazine is a multimedia application analogous to a printed magazine. A "reader" may browse through "pages" of the magazine, reading articles, viewing pictures, and watching audio/video presentations, as shown in Fig. 8. We model the "pages" of this application as objects that are browsed sequentially. In addition, the user may perform queries or searches to locate specific articles or advertisements. Spatial and temporal composition is required for elements of text, image, audio, and video, within the context of a page, as indicated by the spatial and temporal specifications of Figs. 9 and 10. These data are assumed to be distributed across a high-speed network and require remote data access and composition.

The operation scenario is as follows. After a page is selected for presentation at the workstation, a composition server is identified by consulting a master name server which maintains a global table of objects, names, locations, and other characteristics, as well as global state information regarding the availability of resources and load at each CS. As mentioned above, this information is used to select a primary CS for each medium. The object's temporal specification (Fig. 10) is used by the WS to establish an ASP session. The ASP then initiates individual concurrent connections to the primary CSs using the NSP on a point-to-point basis. The primary CSs establish NSP connections with DSs as required for the indicated spatial composition.
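The per-medium deadline generation for this example (video per frame at 30 frames/s over a 30 s presentation, audio per 10 s interval, text and image per object) can be sketched as follows. The control-time values are illustrative stand-ins for what the NSP connection establishment would return.

```python
# Sketch of subschedule generation for the Electronic Magazine page:
# one deadline list per medium, plus an overall control time T_o taken as
# the maximum of the per-medium control times (values here are stubs).

FPS, DURATION = 30, 30.0

S_video = [i / FPS for i in range(int(FPS * DURATION))]  # 0, 0.033, ... 29.966
S_audio = [0.0, 10.0, 20.0]   # one deadline per 10 s audio interval
S_text = [0.0, 10.0]          # per-object deadlines
S_image = [0.0, 10.0]

# Illustrative per-medium control times (seconds), as the NSP might return.
T = {"audio": 0.3, "image": 0.5, "text": 0.1, "video": 0.4}
T_o = max(T.values())  # overall control time returned to the ASP
```

Shifting every subschedule by T_o gives the overall start time, after which the feasibility of the combined schedule has already been assured by connection establishment.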
Load is effectively balanced by placing new sessions onto the CSs with the least load, when appropriate in light of the other primary server selection criteria. Assuming the DMIS architecture shown in Fig. 11, a mapping of spatial operations onto the set of CSs is shown in Fig. 12. In general, a mapping requires optimization of some cost function such as communication bandwidth.

A synopsis of the evaluation of the temporal requirements by the ASP and NSP is as follows. The temporal specification is interpreted to generate a set of deadlines for each class of multimedia data (e.g., audio, video, text) [12]. Assuming video is synchronized per frame (30 f/s), audio per 10-s interval, and text and image per object, the temporal specification (Fig. 10) generates the subschedules Svideo = (0, 0.033, 0.066, 0.099, ..., 29.966), Saudio = (0, 10, 20), Stext = (0, 10), and Simage = (0, 10). For each subschedule, an NSP connection establishment procedure is invoked, resulting in a set of control times Taudio, Timage, Ttext, and Tvideo, which represent the aggregate end-to-end delays over the virtual channels between data source and destination for each medium (dashed lines of Fig. 11). The maximum control time To is found and returned to the ASP for generation of the overall start time. Data transfer is provided by the underlying communication mechanism, which interprets the derived schedule, assured to be feasible by the evaluation of the connection establishment procedure of the ASP and NSP.

6 Conclusion

The composition of data objects in a DMIS is an important technical problem, considering the complexity of supporting time-dependent media and heterogeneous data in an open systems environment. Object composition requires consideration of both the temporal and spatial characteristics of multimedia elements. Temporal integration corresponds to evaluation of the temporal relationships between component elements and scheduling their retrieval to satisfy these relationships.
Spatial integration of multimedia data is unique to each medium and applies operations such as the overlaying and scaling of images and the mixing and dubbing of audio. We have shown an approach to partitioning the composition process onto the resources of a network based on the communication and computation requirements for the composition of objects. Two components for temporal and spatial composition, called the Application Synchronization Protocol and the Spatial Translator, respectively, encompass the composition function in support of a DMIS. The composition methodology is unique in its integration of both spatial and temporal composition in a network as a value-added service.

Acknowledgements

We thank the reviewers for their constructive comments. This work was supported in part by the New York State Center for Advanced Technology in Computer Applications and Software Engineering (CASE) at Syracuse University.

7 References

[1] Little, T.D.C., Ghafoor, A., "Network Considerations for Distributed Multimedia Object Composition and Communication," IEEE Network, Vol. 4, No. 6, Nov. 1990, pp. 32-49.
[2] Ludwig, L.F., Pincever, N., Cohen, M., "Extending the Notion of a Window System to Audio," IEEE Computer, Vol. 23, No. 8, Aug. 1990, pp. 66-72.
[3] Greif, I., Ed., Computer Supported Cooperative Work: A Book of Readings, Morgan Kaufmann, San Mateo, CA, 1988.
[4] Leung, W.H., Baumgartner, T.J., Hwang, Y.H., Morgan, M.J., Tu, S.C., "A Software Architecture for Workstations Supporting Multimedia Conferencing in Packet Switching Networks," IEEE J. on Selected Areas in Comm., Vol. 8, No. 3, Apr. 1990, pp. 380-390.
[5] Chu, W.W., Hurley, P., "Optimal Query Processing for Distributed Database Systems," IEEE Trans. on Computers, Vol. C-31, No. 9, Sept. 1982, pp. 835-850.
[6] Woelk, D., Kim, W., Luther, W., "An Object-Oriented Approach to Multimedia Databases," Proc. of ACM SIGMOD Conf., Washington, D.C., May 1986, pp. 311-325.
[7] International Organization for Standardization, ISO Document No. 8613, ISO, Geneva, Mar. 1988.
[8] De Prycker, M., Ryckebusch, M., Barri, P., "Terminal Synchronization in Asynchronous Networks," Proc. ICC '87, Seattle, WA, June 1987, pp. 800-807.
[9] Barberis, G., Pazzaglia, D., "Analysis and Optimal Design of a Packet-Voice Receiver," IEEE Trans. on Comm., Vol. COM-28, No. 2, Feb. 1980, pp. 217-227.
[10] Stotts, P.D., Furuta, R., "Petri-Net-Based Hypertext: Document Structure with Browsing Semantics," ACM Trans. on Office Automation Systems, Vol. 7, No. 1, Jan. 1989, pp. 3-29.
[11] Lazar, A.A., Temple, A., Gidron, R., "MAGNET II: A Metropolitan Area Network Based on Asynchronous Time Sharing," IEEE J. on Selected Areas in Comm., Vol. 8, No. 8, Oct. 1990, pp. 1582-1594.
[12] Little, T.D.C., Ghafoor, A., "Multimedia Synchronization Protocols for Broadband Integrated Services," to be published in IEEE J. on Selected Areas in Comm., 1991.