Spatio-Temporal Composition of Distributed Multimedia Objects for Value-Added Networks

Thomas D.C. Little, Syracuse University
Arif Ghafoor, Purdue University

Abstract - The main requirement for distributed multimedia information systems (DMISs) is the integration of data with complex spatial and temporal relationships prior to presentation to the end-user. These data are often distributed to multiple remote locations in an open system providing public and private database access. Rather than the management of homogeneous data types, multimedia applications require special techniques for retrieval and presentation necessary to provide timely delivery of perishable, heterogeneous data, in spite of delays introduced by the network interconnection of distribution sites. The integration of composite data can be performed locally at workstations, or in a hierarchical form within the network as a value-added service in order to reduce communication cost. In this paper we investigate the impact of the distribution of the data and various multimedia object composition architectures on the characteristics of the DMIS.

Keywords: distributed multimedia information system, value-added network, spatial & temporal integration, synchronization, communications protocols.

1 Introduction

Advances in workstation and network technologies and the desire to improve communication have created great interest in multimedia applications which require the processing, storage, and transmission of various data types including audio, video, text, graphics, and images. Multimedia is an emerging application-oriented technology embracing many computer disciplines including interface design, networks, and databases, and presents many interesting challenges for research. In a distributed multimedia information system (DMIS), an important requirement is the integration, or composition, of multimedia objects retrieved from databases distributed across a network.
We define multimedia objects to be aggregate units comprised of data elements taken from the aforementioned data classes (text, image, etc.). Such integration depends on both the temporal and spatial characteristics of the multimedia elements. Temporal integration requires evaluation of the temporal relationships among component elements and scheduling their retrieval and presentation to satisfy these relations. The relations can be continuous, such as those that exist for live audio and video, or synthetic, consisting of arbitrary temporal constraints on any multimedia data type [1]. For example, in Fig. 1, various data elements are retrieved from storage and presented in a serial fashion at the times indicated.

Spatial integration of multimedia data is unique to each medium and describes the assembly of objects in space, e.g., on a workstation display, at certain points in time. For pictorial representations such as still images and graphics, integration operations include overlay and mosaic, and require processing such as scaling, cropping, color conversion, and position registration. For audio data, integration consists of superposition, or mixing, of signals. Other "spatial" audio operations include gain and tone adjustment, which are useful in videoconferencing applications to prioritize a speaker's voice amongst many. Gain or tone differences can signify "distance" among participants via signal distortion techniques [2]. A typical spatial composition for pictorial data is shown in Fig. 2, constructed using scaling and mosaic-forming operations.

The primary issue addressed in this paper is the investigation of the overall process necessary to perform spatial and temporal data integration over a network to support a DMIS.
We find that temporal integration can be most suitably achieved at the workstation with respect to delays introduced through the network, while spatial composition is most effectively performed in a hierarchical fashion, as dictated by the underlying network support and processing resources, in order to reduce the volume of transmitted data. The resulting composition methodology is unique in its combination of both spatial and temporal integration as a network service.

The remainder of this paper is organized as follows. In Section 2, database organizations and data distributions are investigated. Sections 3 and 4 provide a discussion of the spatial and temporal composition functions, respectively, and their integration into the network architecture. In Section 5, a mapping of the composition process onto the network resources is described as a value-added service. Section 6 concludes the paper.

2 Database Organizations and Distributed Object Composition Architectures

A number of common database management systems (DBMSs) can be identified for multimedia applications. These systems can be organized as the centralized, master-slave, and federated types, as shown in Fig. 3 (a-c). In the simplest case, a multimedia information system resides on a single server system incorporating DBMS functionality for each medium. In this case, data integration is done entirely at the server. However, access to remote databases expands the potential applications for multimedia services, requiring a multiple database organization such as Fig. 3 (b) and (c), which is appropriate for large-scale multimedia applications. Here, a data server is defined as an intelligent multimedia database machine capable of performing complex multimedia database operations such as composition, browsing, linking, pattern matching, etc., in addition to standard DBMS operations such as selection, projection, and join.
The most suitable database organizations are the federated and master-slave types as shown in Fig. 3 (b) and (c). In either case, the users must have the ability to query across the universe of available data. This requires global naming of data, and resolution of heterogeneity between DBMSs, hardware, data formats, etc. After data elements are identified and located, the spatial and temporal integration of accessed data can be viewed as an added service for the creation of complex multimedia objects. What remains is the specification of the composition function and the task of distributing it over the set of computing elements in the DMIS in a manner appropriate for the DBMSs.

In a centralized environment where the data are not distributed (e.g., Fig. 3 (a)), the composition of the objects is performed solely by a single server. However, the centralized environment is not rich enough to support most multimedia applications since only local data are accessible. On the other hand, the availability of a large number of databases, accessible over a network, provides the potential for large-scale multimedia applications. For such a DMIS, multiple database servers can perform spatial integration to reduce the load on the destination workstation as well as the required network bandwidth for object communication. In particular, spatial operations such as generation of mosaics, cropping, and color conversions, which are common in window-based environments, when performed at the server sites, can significantly reduce superfluous data transfer. The same benefit is not exhibited by integration at remote servers when temporal integration is considered, since no data reduction occurs between source and destination. For strictly temporal composition, without spatial integration requirements, a point-to-point virtual connection between source and destination is most suitable for maintaining synchronization for continuous communication and presentation of objects.
No additional processing can be performed within the network at some intermediate site.

Based on the assumption of a distributed DBMS organization, five types of connections for object retrieval and composition in a DMIS can be identified. These composition architectures, shown in Fig. 4, include (a) single source to single destination, (b) single source to multiple destinations, (c) multiple sources to single destination, (d) multiple sources to a single destination with composition at an intermediate site, and (e) multiple sources to multiple destinations.

Case (a) is a point-to-point connection for which a client-server relationship exists between a single multimedia server and the workstation. Case (b) represents a shared object architecture in which a single object is displayed simultaneously to various users via multicasting. This mode is necessary to enable Computer-Supported Cooperative Work (CSCW) [3], or teleconferencing. Additional requirements include concurrency and consistency control mechanisms for shared objects. Case (c) represents a distributed object environment, for which complete composition is performed at the sink in a multidrop fashion. This case can be handled by independent network connections between the sink and each source, and poses a challenge to control intermedia synchronization by the workstation. Fig. 4 (d) shows a scenario in which objects are composed at an intermediate site after arriving from distributed sources and are sent via a single connection to the final destination. By using a single connection, data sequencing ensures strict ordering and thereby provides intermedia synchronization [4]. Case (e) defines the general multicast and multidrop case of distributed object composition in a shared object environment.

As mentioned above, the process of data composition can be distributed within the network to minimize various system costs.
In particular, the criterion for the selection of a composition locus is a function of various system costs, including communication, storage, and processing, and the desired performance characteristics of the system such as reliability and quality of communication service. The problem is analogous to the optimization of queries in a distributed DBMS [5]. In the remainder of this paper we investigate the composition issues for both spatial and temporal integration and comment on the various considerations for the selection of a composition architecture.

3 Spatial Composition

Current data composition schemes incorporating video data are mostly analog in nature. For example, an image mosaic can be generated by the capture and placement of video stills from the analog output of a videodisk player. Analog schemes differ from the composition of video, images, audio, and text in a completely digital domain. In the digital realm, we can utilize digital storage, processing, and communication technology, gain open system applicability, and achieve device independence in an integrated environment.

The task of composing data spatially as shown in Fig. 2 can be specified using an object-oriented paradigm and implemented by various DMIS components. Many data representations for composite multimedia data have been proposed based on the object-oriented paradigm (e.g., [6]). In this paper we concentrate on the task of composition rather than the discussion of a complete object-oriented spatial data representation. We assume a simple layout structure based on ODA [7], as shown for an example in Fig. 5 (a-b). If data are composed at the workstation, user interaction can be most readily incorporated into the composition process. For example, choice of window placement, resolution of occlusion, evaluation of viewspace coordinates, scaling, etc., can be handled locally without dialogue with a remote composition service. Similarly, object modification is most readily achieved by the workstation.
However, some objects may be large enough to exceed the local storage capacity of the workstation, and modification must be performed on fragments of the original object, stored elsewhere.

Composition at the workstation requires transmission of complete multimedia objects from the database, such as full-size color video or full-resolution images. At the destination, much of this resolution is not utilized after scaling, cropping, color mapping, or other spatial manipulation. A remote service, by contrast, can perform these spatial operations prior to data transmission, reducing the transmission overhead and freeing the task from the workstation. Additionally, specialized hardware can be maintained at the composition sites (e.g., array processors, video special effects devices) which is optimized for high-speed spatial manipulation.

To provide a value-added service facilitating these kinds of spatial operations, we describe a set of spatial manipulation primitives that can be implemented as standard functions at various sites within the DMIS. These primitives can be partitioned into two classes: unary and binary operations. Unary operations are applied to single data objects, e.g., the low-pass filtering of an audio segment, or the cropping of an image. Binary operations combine pairs of objects, creating their composition. Examples of these operations are summarized in Table 1 with their approximate processing costs. We envision spatial composition to be a multiphase process consisting of unary operations for adjusting the data elements followed by binary operations composing the adjusted elements into final form. Further, we define a logical entity, called the Spatial Translator (ST), to be distributed throughout the network to perform these operations.
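The multiphase unary/binary structure just described can be sketched in code. This is a minimal illustration, not the paper's ST specification: the function and variable names are our own, and "images" are simple lists of pixel rows.

```python
# Hypothetical sketch of Spatial Translator (ST) primitives; names are
# illustrative, not from the paper. Images are lists of pixel rows.

def crop(img, x, y, w, h):
    """Unary: extract a w-by-h region at (x, y) -- reduces data volume."""
    return [row[x:x + w] for row in img[y:y + h]]

def scale_down(img, factor):
    """Unary: subsample every factor-th pixel -- reduces data volume."""
    return [row[::factor] for row in img[::factor]]

def overlay(bg, fg, x, y):
    """Binary: opaque superposition of fg onto bg at (x, y)."""
    out = [row[:] for row in bg]
    for j, row in enumerate(fg):
        out[y + j][x:x + len(row)] = row
    return out

# Multiphase composition: unary adjustment first, then binary merging.
background = [[0] * 8 for _ in range(8)]            # 8x8 "image" of zeros
source = [[1] * 6 for _ in range(6)]                # 6x6 "image" of ones
subimage = scale_down(crop(source, 0, 0, 6, 6), 3)  # 2x2 after adjustment
composite = overlay(background, subimage, 3, 3)
```

The two-phase shape mirrors the text: unary operations shrink or adjust elements near their source; binary operations merge the adjusted elements into the final form.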
Note that these operations constitute the overall translation from storage to workstation display, a simplification of the many intermediate transformations between the various coordinate systems and canonical representations usually indicated in graphics programming standards (e.g., GKS, PHIGS). More detailed analysis of spatial operations with respect to communication and processing requirements is given below.

By distributing the operations with a uniform interface throughout the DMIS, spatial data composition can be tailored to the requirements of the individual objects. As data are retrieved, spatial operations can be invoked at remote servers through the uniform interface with the goal of reducing data traffic. This issue is discussed in Section 5.

3.1 Communication and Computational Requirements for Spatial Operations

As mentioned above, spatial operations are basically transformations on multimedia objects to achieve desired functionality such as cropping, filtering, overlaying, etc. Some transformations may result in data reduction, while in other cases the new data elements can be larger than the old (e.g., colorization, scale-up, etc.). If the criterion to select the server or workstation for composition is based on reduction of network traffic, then clearly we want to perform unary spatial transformations at the server site only when a data reduction occurs. However, to minimize processing at the workstation, we may want to perform even the data-increasing transformations at the server as well.

Considering binary spatial manipulation, suppose the criterion to select a site for composition is based on reducing network traffic and minimizing workstation processing load. Let R1 and R2 represent two objects to be merged spatially in some arbitrary way, with characteristic data sizes of |R1| and |R2|. The binary spatial merge for some transformation bg, Rd = bg(R1, R2), will result in a final display object Rd with size |Rd|.
Clearly max(|R1|, |R2|) ≤ |Rd| ≤ |R1| + |R2|, since the merge represents in the worst case the union of the two objects (e.g., overlay, abut, mix). This relationship is valid for any type of data object including text, image, video, and audio objects, assuming the absence of data compression. If |Rd| = max(|R1|, |R2|), then a potential reduction of min(|R1|, |R2|) in data traffic to the workstation results by performing composition at the server. On the other hand, if |Rd| = |R1| + |R2|, no savings result with such composition. Within this range, the choice of a site for composition depends on the penalty associated with the composition processing versus the necessary data communication bandwidth, and the composition requirements of other related objects. The overall savings in data transmission is determined by considering all operations required in forming the final composite object, including unary and binary operations.

As an example, consider the multimedia object of Fig. 2. The data comprising the object are assumed to exist in several distributed servers. Each image is stored in 24-bit, full-color format (8 bits per color, 3 colors per pixel). The subimages in the figure are created by cropping their corresponding (1200 x 925 pixel) source images to a size of (120 x 120 pixel). Each of these is superimposed (opaque) onto the background image along with the ASCII text (1989 char x 8 bits/char) with a selected font. The final composite object is entirely in bit-map form and is the size of the cropped background image (1100 x 825 pixels). If composition is performed at the workstation, the raw, unprocessed images and text would need to be transmitted there (1200 x 925 pixels x 24 bits/pixel x 5 images + 1989 char x 8 bits/char = 133,215,912 bits). If composition is performed prior to transmission, then the data transmission requirement is 1100 x 825 pixels x 24 bits/pixel = 21,780,000 bits.
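Recomputing the two transmission scenarios directly from the element sizes stated above (no compression, no storage or communication overhead) makes the comparison explicit:

```python
# Traffic comparison for the Fig. 2 example: composing at the workstation
# (transmit all raw sources) versus composing at a remote server before
# transmission (transmit only the final composite).

BITS_PER_PIXEL = 24  # 8 bits per color, 3 colors per pixel

# Workstation composition: five full 1200 x 925 source images plus ASCII text.
raw_bits = 1200 * 925 * BITS_PER_PIXEL * 5 + 1989 * 8

# Remote composition: only the final 1100 x 825 composite is transmitted.
composed_bits = 1100 * 825 * BITS_PER_PIXEL

savings = raw_bits - composed_bits
reduction = savings / raw_bits  # fraction of traffic avoided
```

With the stated sizes this yields a reduction of roughly 84%, the figure quoted in the text.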
The savings in data transmission for remote composition is 111,435,912 bits, or a traffic reduction of 84%. Note that in these calculations we have assumed neither any data compression nor the overheads associated with storage and communication. However, the percentage savings will remain unchanged with compression. Additional benefit is gained with this scheme if specialized hardware at the server site is employed. This example illustrates the reduction of data possible during the process of data composition at some intermediate site in the network as data are retrieved from a database, merged, and sent to the final presentation site.

The spatial processing requirements depend on various factors, which include the size and type of object considered, the type of composition function, the algorithm used, and the implementation, whether parallel or serial. Table 1 summarizes processing approximations for some of the various spatial operations on images, audio, and text. Either time or number of operations can be applied as costs. For example, the move cost is related to memory access time, i.e., Pm equals two memory access cycles (one read, one write).

For a given composite multimedia object, the spatial formatting requirements can be quantified by using the cost estimates for the various spatial transformations. By chaining the cost estimates as described by the spatial hierarchy, an overall processing cost can be estimated. The evaluation of such processing requirements is important in determining the real-time computational performance of the workstation, and the distribution of composition operations.

4 Temporal Composition

The time-dependent characteristic of multimedia data motivates the necessity to synchronize data objects in time. This requirement extends to the synchronization of a time sequence of static objects, such as still images and text, and to continuous streams of audio and video.
A DMIS must satisfy this requirement in the presence of random network delays that are due to the inherently asynchronous nature of a packet network [1,8,9] and storage device latencies. This problem is particularly acute since several streams of different origin can require synchronization to each other. For continuous streams of data, the problem is to ensure the proper playout time of each data element in spite of random network delays, as illustrated in Fig. 6. Synchronization of this type has been typically applied to singular streams of packetized audio and video [8] but can be generalized to multiple streams and non-stream data (e.g., still images and text). An important factor for synchronization is the determination of the delay and buffering required to achieve a target level of packet loss for various network delay distributions. This delay, called a control time T, can be found for audio and video streams [8,9] given a target packet loss probability. The same principle can be applied to non-stream data. A difference, however, is that in the former analyses, it is assumed that packets are generated at a rate equal to the consumption rate, and that the capacity of the communication channel is never exceeded. When arbitrary sequences of multimedia objects are presented, it is possible to specify concurrency in presentation such that channel capacity is exceeded for some intervals. However, due to the flexibility in object retrieval for stored-data applications, the times for data retrieval can be reorganized such that the channel capacity is not exceeded. Of course, this is not possible for live data sources. In essence, database sources give us more freedom in the control of the time at which data are acquired by the application, as shown in Fig. 7.
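The determination of a control time T from a delay distribution and a target loss probability can be illustrated with a small sketch. The quantile view below follows the cited stream-synchronization analyses only in spirit; the exponential delay model and all parameter values are illustrative assumptions, not taken from [8,9].

```python
# Hedged sketch: choose a control time T so that the fraction of packets
# arriving later than T stays below a target loss probability. The delay
# model (exponential, 20 ms mean) is purely illustrative.
import random

random.seed(1)
target_loss = 0.01  # at most 1% of packets may arrive after T

# Illustrative end-to-end network delay samples, in seconds.
delays = [random.expovariate(1 / 0.020) for _ in range(10_000)]

# T is the (1 - target_loss) quantile of the observed delay distribution:
# deferring playout by T lets all but ~1% of packets arrive in time.
T = sorted(delays)[int((1 - target_loss) * len(delays)) - 1]

late = sum(d > T for d in delays) / len(delays)  # realized late fraction
```

A heavier-tailed delay distribution or a stricter loss target pushes T, and hence the required buffering, upward.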
Beyond the problem of compensating for random network delays, synchronization of multiple streams can be controlled by the destination workstation, or by any other intermediate server within the network prior to the delivery of the data to the destination. The temporal specification needs to be known to the synchronization controller irrespective of the site of the controller. A Petri-net-based approach to specifying the temporal requirements for multimedia objects can be used for this purpose [1,10]. If synchronization is performed at the destination (e.g., Fig. 4 (c)), then the workstation must evaluate the temporal specification of objects and carry out a synchronization dialogue with the remote servers prior to and during data transfer. If synchronization is performed at any other intermediate server (e.g., Fig. 4 (d)), the workstation does not need to evaluate the temporal specification. However, due to network latencies, retransmission (if applicable), workstation performance limitations, etc., significant skew can be introduced among these synchronized streams, and an intermediate server has little control over the final result at the workstation. Therefore, it is difficult to rely on an intermediate server to provide fine synchronization as is required for audio/video streams, which generally require a skew of less than 150 ms.

4.1 Multimedia Synchronization Service

A typical application might use the synchronization service for pre-orchestrated presentations, teleconferencing, or CSCW. Typically an object is identified from a database through browsing or querying operations. Once identified, the object can be retrieved, composed, and presented. Since both spatial and temporal composition specifications must be met, an important operation in this scheme is the decomposition of multimedia data into classes for independent transfer.
This separation is desirable since network performance can be increased by isolating unique data traffic classes and by using different transfer protocols tailored to each class [11]. In essence, such transport protocols can provide different levels of guaranteed service for each data type based on the data's tolerance to packet delay and loss; e.g., an image object requires error-free service while audio objects can tolerate errors.

We present two communication protocols to perform synchronization as a value-added network service between source and destination [12]. These protocols, called the Application Synchronization Protocol (ASP) and the Network Synchronization Protocol (NSP), allow the communication of complex multimedia presentations from distributed sources for playout at a single site. The protocols utilize a Petri-net-based temporal specification of a multimedia object [12]. This model basically specifies the precedence relations among all subobjects in the form of a partial and strict ordering. In the former case, objects have simultaneous playout deadlines and require concurrent presentation, while in the latter case, presentation must be strictly sequential in time (e.g., Fig. 1).

The purpose of the ASP is to set up and initiate data transfer as specified by temporal requirements on an end-to-end basis. Spatial requirements of an object affect the ASP since they must be passed during connection set-up to the remote sites. The ASP takes as input a selected object representing the aggregation of a complex multimedia presentation requiring synchronization, and returns independent streams of synchronized data traffic which can then be routed to specific output devices for presentation at the workstation. The interface also provides control over the quality and cost of transmission service by negotiating a target packet loss probability and delay with the underlying guaranteed-service data transport mechanism (such as [11]).
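The ASP's role as just described — take an aggregate object, negotiate per-stream service quality by medium, and return independent streams — can be sketched as follows. All names and tolerance values are illustrative stubs, not the protocol's actual interface.

```python
# Hypothetical sketch of the ASP interface: decompose an aggregate object
# into per-medium streams with negotiated loss targets. Illustrative only.

# Per-medium tolerance to packet loss used when negotiating with the
# transport layer: image and text need error-free service, audio and
# video can tolerate some loss (values are assumptions).
LOSS_TOLERANCE = {"image": 0.0, "text": 0.0, "audio": 0.01, "video": 0.01}

def asp_setup(components):
    """components: list of (medium, source_site) pairs in one object."""
    streams = []
    for medium, site in components:
        streams.append({
            "medium": medium,
            "source": site,
            "target_loss": LOSS_TOLERANCE[medium],  # negotiated with the NSP
        })
    return streams  # independent streams, routed to output devices

streams = asp_setup([("video", "DS1"), ("audio", "DS2"), ("image", "DS3")])
```

Each returned stream then maps to its own network connection, which is what lets the transport layer apply a different guaranteed-service level per traffic class.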
The NSP in turn provides a data transfer facility with a predicted end-to-end delay characteristic based on the specified probability of late packets. The ASP and NSP do not specifically provide a mechanism for reconstructing late or lost packets; rather, they rely on selecting appropriate quality of service parameters for each medium's tolerance to delay and loss. These protocols allow synchronization of independent network connections, unlike other approaches to synchronization [4] that require sequencing of synchronized data onto a single virtual circuit (e.g., Fig. 4 (d)). For the ASP, it is assumed that all synchronization is performed at the destination. Intermediate nodes are considered only if some intermediate spatial composition function is needed; otherwise a point-to-point connection is implied with corresponding end-to-end properties.

In summary, the ASP and NSP provide synchronization as a network value-added service for multimedia objects and involve the following steps:

(1) Retrieval of the spatio-temporal relationships describing the components comprising the complex multimedia object.
(2) Evaluation of the precedence relationships in the Petri net, thereby creating a playout schedule.
(3) Decomposition of the schedule into subschedules based upon the different traffic classes represented and the locations of stored data.
(4) Determination of the overall control time required to maintain synchronization among the traffic classes through interaction with the NSP and cooperating sites.
(5) Provision of synchronous data transfer.

The combination of the ASP and NSP with the composition architecture provides a value-added service by the network, as we discuss in the following section.

5 Multimedia Composition as a Value-Added Network Service

In this section, we investigate a combined approach to performing both temporal and spatial composition for a DMIS as a service within the network.
In the future, we envision that the heterogeneity of the network in terms of speed and topology will force the overall composition process to be hierarchical, in the sense that multiple data servers (DSs) and intermediate sites, or composition servers (CSs), collaborate to compose the requested objects both spatially and temporally. Generally, an object model consists of information describing the various operations necessary for spatial composition and intermedia timing, and is stored in the network at a central site. At the time a session is established by a user, this information is identified from the central site, and the object hierarchy is decomposed and mapped onto the set of servers (i.e., DSs, CSs, and the workstation).

The problem of assignment of a composition locus for a given multimedia object is analogous to query optimization for distributed databases [5]; however, it differs in several ways. First, a sequence of spatial operations often cannot be permuted, and therefore little optimization can be achieved through reorganizing the sequence of spatial operations. Some spatial transformations, e.g., scale-up, increase data volume and are optimal if performed closer to the data's destination rather than near its source. Second, the optimization technique assumes homogeneity in processing and communication costs and therefore does not account for specialized hardware for each medium, nor does it consider the necessity for load balancing when long-lived database transactions are present (e.g., movies).

As mentioned earlier, temporal composition always requires control by the workstation, since independent media cannot be combined in any reasonable manner at intermediate sites in the network en route to the playout destination, except to provide sequencing [4]. On the other hand, spatial composition can be performed at either the composition servers or at the destination workstation.
The choice is dependent on the characteristics of the objects to compose, the workstation storage and computational capability, and the bandwidth of the network. If remote composition is dictated, the composition is delegated to at least one CS, which we call the primary CS. The secondary servers consist of data sources. The choice of primary server should take into account the following considerations:

(i) The locality of objects.
(ii) The amount of spatial processing required on the selected objects.
(iii) The spatial composition capability of each CS (e.g., array processor).
(iv) The current CS loading.

The first factor means that the composition server with the closest proximity to the largest percentage of data is optimal [5], and is therefore most suitable to be the primary server. Similarly, the spatial processing consideration can be weighed in determining the primary server. By analyzing the spatial organization of a complex object, the mapping of the composition function to a specific CS or set of CSs can be done based upon its requirements and the capabilities of the servers. Also, the load on a CS can affect the primary server selection.

With respect to the workstation and network components, the assignment of spatial operations must consider the utilization of bandwidth in an optimal way. This can dictate reducing objects at the CS (e.g., scale, crop, filter), enlarging objects at the WS (e.g., scale, colorize, format text), and building windows at the WS (e.g., dealing with occlusion). If minimization of the workstation computational requirement is desired, then all spatial operations must be performed at the CS, including the management of windows (e.g., occlusion). Clearly there is a tradeoff between traffic on the network, computation at the workstation, and user control over presentation. When objects are composed at the CSs, the user loses the ability to control the assembly of data since this operation is performed prior to the reception of the data.
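One way to operationalize the primary-server selection factors is a simple weighted score over candidate CSs. The weights, field names, and candidate values below are purely illustrative assumptions, not a rule from the paper.

```python
# Hypothetical scoring rule combining the primary-CS selection factors.
# Weights and fields are illustrative; a real system would calibrate them.

def score(cs):
    # (i) locality: fraction of the object's data stored near this CS
    # (ii)/(iii) capability: suitability for the required spatial operations
    # (iv) loading: a heavily loaded server is penalized
    return 0.5 * cs["data_locality"] + 0.3 * cs["capability"] - 0.2 * cs["load"]

candidates = [
    {"name": "CS1", "data_locality": 0.8, "capability": 0.6, "load": 0.9},
    {"name": "CS2", "data_locality": 0.6, "capability": 0.9, "load": 0.2},
]
primary = max(candidates, key=score)  # choose the best-scoring CS
```

Here the lightly loaded, more capable CS2 wins despite CS1 holding more of the data, illustrating how the factors trade off against one another.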
A compromise solution is to allow specification of the object hierarchy such that some objects must be assembled at the WS and others can be distributed to various CSs. Based on the above considerations, we show the combined operation scenario for both spatial and temporal object composition, using the ASP and ST entities, in the following example.

5.1 Example: The Electronic Magazine

The Electronic Magazine is a multimedia application analogous to a printed magazine. A "reader" may browse through "pages" of the magazine, reading articles, viewing pictures, and watching audio/video presentations, as shown in Fig. 8. We model the "pages" of this application as objects that are browsed sequentially. In addition, the user may perform queries or searches to locate specific articles or advertisements. Spatial and temporal composition is required for elements of text, image, audio, and video, within the context of a page, as indicated by the spatial and temporal specifications of Figs. 9 and 10. These data are assumed to be distributed across a high-speed network and require remote data access and composition.

The operation scenario is as follows. After a page is selected for presentation at the workstation, a composition server is identified by consulting a master name server which maintains a global table of objects, names, locations, and other characteristics, as well as global state information regarding the availability of resources and load at each CS. As mentioned above, this information is used to select a primary CS for each medium. The object's temporal specification (Fig. 10) is used by the WS to establish an ASP session. The ASP then initiates individual concurrent connections to the primary CSs using the NSP on a point-to-point basis. The primary CSs establish NSP connections with DSs as required for the indicated spatial composition.
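The per-medium deadline generation for this example (video per frame at 30 frames/s over a 30 s presentation, audio per 10 s interval, text and image per object) can be sketched as follows. The control-time values are illustrative stand-ins for what the NSP connection establishment would return.

```python
# Sketch of subschedule generation for the Electronic Magazine page:
# one deadline list per medium, plus an overall control time T_o taken as
# the maximum of the per-medium control times (values here are stubs).

FPS, DURATION = 30, 30.0

S_video = [i / FPS for i in range(int(FPS * DURATION))]  # 0, 0.033, ... 29.966
S_audio = [0.0, 10.0, 20.0]   # one deadline per 10 s audio interval
S_text = [0.0, 10.0]          # per-object deadlines
S_image = [0.0, 10.0]

# Illustrative per-medium control times (seconds), as the NSP might return.
T = {"audio": 0.3, "image": 0.5, "text": 0.1, "video": 0.4}
T_o = max(T.values())  # overall control time returned to the ASP
```

Shifting every subschedule by T_o gives the overall start time, after which the feasibility of the combined schedule has already been assured by connection establishment.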
Load is effectively balanced by placing new sessions onto the CSs with the least load, when appropriate in light of the other primary server selection criteria. Assuming the DMIS architecture shown in Fig. 11, a mapping of spatial operations onto the set of CSs is shown in Fig. 12. In general, a mapping requires optimization of some cost function such as communication bandwidth.

A synopsis of the evaluation of the temporal requirements by the ASP and NSP is as follows. The temporal specification is interpreted to generate a set of deadlines for each class of multimedia data (e.g., audio, video, text) [12]. Assuming video is synchronized per frame (30 f/s), audio per 10-s interval, and text and image per object, the temporal specification (Fig. 10) generates the subschedules Svideo = (0, 0.033, 0.066, 0.099, ..., 29.966), Saudio = (0, 10, 20), Stext = (0, 10), and Simage = (0, 10). For each subschedule, an NSP connection establishment procedure is invoked, resulting in a set of control times Taudio, Timage, Ttext, and Tvideo, which represent the aggregate end-to-end delays over the virtual channels between data source and destination for each medium (dashed lines of Fig. 11). The maximum control time To is found and returned to the ASP for generation of the overall start time. Data transfer is provided by the underlying communication mechanism, which interprets the derived schedule, assured to be feasible by the evaluation of the connection establishment procedure of the ASP and NSP.

6 Conclusion

The composition of data objects in a DMIS is an important technical problem, considering the complexity of supporting time-dependent media and heterogeneous data in an open systems environment. Object composition requires consideration of both the temporal and spatial characteristics of multimedia elements. Temporal integration corresponds to evaluation of the temporal relationships between component elements and scheduling their retrieval to satisfy these relationships.
Spatial integration of multimedia data is unique to each medium and applies operations such as the overlaying and scaling of images and the mixing and dubbing of audio. We have shown an approach to partitioning the composition process onto the resources of a network based on the communication and computation requirements for the composition of objects. Two components for temporal and spatial composition, called the Application Synchronization Protocol and the Spatial Translator, respectively, encompass the composition function in support of a DMIS. The composition methodology is unique in its integration of both spatial and temporal composition in a network as a value-added service.

Acknowledgements

We thank the reviewers for their constructive comments. This work was supported in part by the New York State Center for Advanced Technology in Computer Applications and Software Engineering (CASE) at Syracuse University.

7 References

[1] Little, T.D.C., Ghafoor, A., "Network Considerations for Distributed Multimedia Object Composition and Communication," IEEE Network, Vol. 4, No. 6, Nov. 1990, pp. 32-49.
[2] Ludwig, L.F., Pincever, N., Cohen, M., "Extending the Notion of a Window System to Audio," IEEE Computer, Vol. 23, No. 8, Aug. 1990, pp. 66-72.
[3] Greif, I., Ed., Computer Supported Cooperative Work: A Book of Readings, Morgan Kaufmann, San Mateo, CA, 1988.
[4] Leung, W.H., Baumgartner, T.J., Hwang, Y.H., Morgan, M.J., Tu, S.C., "A Software Architecture for Workstations Supporting Multimedia Conferencing in Packet Switching Networks," IEEE J. on Selected Areas in Comm., Vol. 8, No. 3, Apr. 1990, pp. 380-390.
[5] Chu, W.W., Hurley, P., "Optimal Query Processing for Distributed Database Systems," IEEE Trans. on Computers, Vol. C-31, No. 9, Sept. 1982, pp. 835-850.
[6] Woelk, D., Kim, W., Luther, W., "An Object-Oriented Approach to Multimedia Databases," Proc. of ACM SIGMOD Conf., Washington, D.C., May 1986, pp. 311-325.
[7] International Organization for Standardization, ISO Document No. 8613, ISO, Geneva, Mar. 1988.
[8] De Prycker, M., Ryckebusch, M., Barri, P., "Terminal Synchronization in Asynchronous Networks," Proc. ICC '87, Seattle, WA, June 1987, pp. 800-807.
[9] Barberis, G., Pazzaglia, D., "Analysis and Optimal Design of a Packet-Voice Receiver," IEEE Trans. on Comm., Vol. COM-28, No. 2, Feb. 1980, pp. 217-227.
[10] Stotts, P.D., Furuta, R., "Petri-Net-Based Hypertext: Document Structure with Browsing Semantics," ACM Trans. on Office Automation Systems, Vol. 7, No. 1, Jan. 1989, pp. 3-29.
[11] Lazar, A.A., Temple, A., Gidron, R., "MAGNET II: A Metropolitan Area Network Based on Asynchronous Time Sharing," IEEE J. on Selected Areas in Comm., Vol. 8, No. 8, Oct. 1990, pp. 1582-1594.
[12] Little, T.D.C., Ghafoor, A., "Multimedia Synchronization Protocols for Broadband Integrated Services," to be published in IEEE J. on Selected Areas in Comm., 1991.