A Load Balancing Algorithm for a Distributed Multimedia Game Server Architecture
Dugki Min1, Eunmi Choi2, Donghoon Lee1, Byungseok Park1
1 Department of Computer Science and Engineering, Konkuk University, Mojin-dong, Kwangjin-ku, Seoul, 133-701, Korea, {dkmin, dhlee, bspark}@pluto.konkuk.ac.kr, Phone: +82-2-450-3490
2 School of Computer Science and Electronic Engineering, Handong University, Heunghae-eub, Puk-ku, Pohang, Kyungbuk, 791-940, Korea, [email protected]
Abstract
Network game servers, in which a huge number of users can play a common game through an interconnecting network, can be designed in a distributed system architecture with the benefits of scalability, reliability, and cost-effectiveness. Although many load balancing algorithms have been proposed by researchers in the areas of distributed systems and dynamic scientific particle simulation, they cannot be applied to interactive network game applications. Dynamic load balancing in interactive network games should take into account the geographical relationship among game units and the short response time of frequent user interactions. In this paper, we propose a load balancing algorithm for interactive multimedia network games and present simulation results from a simulator of a distributed game system implemented with the proposed load balancing algorithm.

1. Introduction
Multimedia network games have become popular. Unlike arcade games or stand-alone PC games, the fun of a network game comes from the participants in the game; participants’ various tricks, techniques, strategies, and cooperation are sources of unexpected, exciting situations and fun. In order to support a huge number of game players, a multimedia network game server should have a scalable architecture. Developing a powerful and scalable network game server on a distributed system is a feasible and practical solution due to the availability of powerful microprocessors at low cost, significant advances in communication technology, and the maturity of software technology.

In designing a distributed system that supports a multimedia network game application, the primary goal is to achieve efficiency, scalability, and robustness of the system. As the number of users increases, the distributed system should achieve high system performance by virtue of enormous distributed computing resources. Also, when any processor component suddenly fails, the game system should still be accessible. In order to achieve these objectives, however, the architecture of the distributed system should be carefully designed because of the inherent limitations of distributed systems. These limitations include the effect of distributed resources, non-negligible communication overhead, the higher likelihood of component failures in the system, and the lack of common memory and a system-wide clock.

In this paper, we study an efficient and scalable distributed game server architecture that can accommodate a huge number of users. To keep the game server system efficient and scalable against dynamic changes in users’ actions and in size, this paper focuses on the issue of load balancing for efficient execution of a dynamic distributed load. The load balancing issue is not a new problem in the discipline of distributed systems. However, the load balancing algorithms previously proposed in the related literature [1,2,3,4,5,7,8] are applicable to batch-style parallel processing applications [1,2,3] or distributed operating systems [4,5], but are not appropriate for distributed interactive applications. This paper proposes a dynamic load balancing algorithm suited to a distributed interactive game server whose game units have geographical relationships with one another and change their positions over time under the control of game players’ interactions.

The next section describes the scalable distributed server architecture for network games. In Section 3, the dynamic load balancing algorithm proposed in this paper is presented. Section 4 shows our simulation results on the performance of the load balancing algorithm. Concluding remarks are given in Section 5.

2. Distributed Game Server Architecture
There are currently two popular kinds of network game server architecture: one is the Starcraft style, and the other is the Ultima Online style. A Starcraft-style game opens a huge number of small game sessions, in each of which around 20 users can play. In this style of game, several thousand game sessions are active at a time. It provides small spaces or maps where a unit can reach the boundary of the space within an hour. In contrast, an Ultima Online-style game has more than a thousand users in each game session. Its space is huge; a game unit can hardly reach the boundary even within a day. The style of game we are concerned with in this research is the Ultima Online style. Due to the huge number of users in a game session, one of the major concerns in this style of network game is scalability.

Figure 1 illustrates the overall distributed system architecture for the Ultima Online-style network game. The servers in the system are interconnected by two types of communication networks: one is the service network through which the game client applications access the servers, and the other is the communication backbone network interconnecting the servers. The distributed game server architecture is composed of two types of servers: a master server and a number of game processing servers. The master server is the representative of the game server system; it distributes initial connections and controls the amount of operating game processing power. All game client applications initially connect to the master server and are then redirected to appropriate game processing servers. The master server manages the game processing servers by checking, adding, and removing them according to the load situation and the load balancing algorithm employed. It also manages the global clock in order to synchronize the local clocks of the game processing servers.

Figure 1. The Distributed Game Server Architecture (components shown: clients, public network, service network, master server, game processing servers, server backbone network)

The game processing servers are the major game servers that process the actions of game units. Each game processing server is composed of four manager components: the unit action manager, the player connection manager, the load balancing manager, and the boundary interaction manager. The unit action manager processes all events generated by game units, such as shooting bullets, changing a unit’s location, and destroying a planet. Note that there are two kinds of units: a ‘player’ unit, which is created and controlled by a game player, and a ‘non-player’ unit, which acts by itself according to autonomous routines. The player connection manager controls connections from game client applications. It checks the connections periodically to confirm that the players are communicating normally. It also redirects event notifications to the corresponding clients. The load balancing manager performs a distributed load balancing algorithm in order to reduce the performance degradation caused by the dynamic geographical change of game units in the game space. When the load is higher than the maximum threshold, the load balancing algorithm is performed to move the excess units to other game processing servers according to the policies. The boundary interaction manager manages interactions among game units located in the boundary areas. When a game unit performs an action and needs to change its state, this information should be sent to all the interacting game units, whether they are on the same processor or on neighbor processors. In order to perform boundary interactions efficiently, each processor keeps a record of the state information of its game units and sends the record to its neighbor processors.

3. The Dynamic Load Balancing Algorithm
Our dynamic load balancing algorithm for interactive network games is developed with two considerations. One is how to divide the game space into subpartitions that are assignable to a number of distributed processors. The strategy of space partitioning is closely related to the performance of the dynamic load balancing algorithm. Thus, it should be simple and effective enough to minimize the average turnaround time of user interactions and to make the control of load distribution easy to distribute. The other is how to distribute the control of the load balancing task. Distributing the control of load balancing is related to the scalability of the load balancing algorithm. To be scalable regardless of the number of users or the number of processors, load balancing decisions should be made on each processor with partial information collected from a small number of processors.

3.1 Space Partitioning Method
As shown in Figure 2, our algorithm divides the game space into vertical strips along the x axis. The game units in each subpartition are assigned to a processor. Suppose that each subpartition is indexed with its processor number. A game unit interacts with its neighbor units located within a given range; we call this range the interaction distance of units. In order to balance the loads of processors, the partition lines are repositioned according to dynamic changes in the density of units. Because of this partitioning scheme, the algorithm can also be called a dynamic load partitioning algorithm. In contrast, static partitioning divides the space into subpartitions of fixed size, and the subpartitions are not modified. Dynamic partitioning pays the cost of re-partitioning the load, while static partitioning pays the cost of processor idleness due to load imbalance.

The light and dark gray bands at the boundaries of P_i illustrate the areas whose units are exported to and imported from, respectively, the neighbor processors P_{i-1} and P_{i+1}. The width of the bands is greater than or equal to the interaction distance of units. Since the units in the export areas of P_i can interact with the units in the import areas of P_i or P_{i+1}, P_i maintains up-to-date unit information for the import areas.

Figure 2. Space Partitioning
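The strip partitioning of Section 3.1 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the simulator's code: the function names (owner_of, export_band, rebalance_lines), the representation of partition lines as a sorted list of x coordinates, and the equal-count repositioning rule are all hypothetical. In particular, the paper shifts load between neighbor nodes in chunks (Section 3.2) rather than recomputing every line at once.

```python
from bisect import bisect_right


def owner_of(x, lines):
    """Index of the strip (processor) that owns x coordinate x.

    `lines` is the sorted list of interior partition lines, so
    len(lines) + 1 processors share the space.
    """
    return bisect_right(lines, x)


def export_band(xs, lines, i, interaction_distance):
    """Units of processor i lying within interaction_distance of one of
    its partition lines; neighbors need up-to-date copies of these."""
    left = lines[i - 1] if i > 0 else None
    right = lines[i] if i < len(lines) else None
    band = []
    for x in xs:
        if owner_of(x, lines) != i:
            continue
        near_left = left is not None and x - left <= interaction_distance
        near_right = right is not None and right - x <= interaction_distance
        if near_left or near_right:
            band.append(x)
    return band


def rebalance_lines(xs, n_processors):
    """Reposition partition lines so each strip holds about the same
    number of units (assumed rule: cut the sorted x values evenly)."""
    ordered = sorted(xs)
    n = len(ordered)
    if n < n_processors:
        raise ValueError("need at least one unit per processor")
    lines = []
    for k in range(1, n_processors):
        cut = k * n // n_processors
        # Place the line midway between the two units it separates.
        lines.append((ordered[cut - 1] + ordered[cut]) / 2)
    return lines
```

For example, with partition lines at x = 10 and x = 20 and an interaction distance of 2, export_band([1, 9, 11, 15, 19, 21], [10, 20], 1, 2) selects the units at 11 and 19, which are the ones the two neighbors of the middle strip would have to track.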
3.2 The Dynamic Load Balancing Algorithm
In this subsection, we present our load balancing algorithm. The routine employed for message sending, i.e. send_message(), is a nonblocking send, while the message receiving routine, i.e. receive_message(), is a blocking receive. Some parts of the load balancing algorithm are given as pseudo-code to help readers understand the algorithm clearly. For the sake of convenience, we call a game processing server a node.

A node can be in one of five states. A node is in the NORMAL state when it can operate normally with a moderate load. When the load is larger than the overload_threshold value, the node enters the OVERLOADED state. Once a node is in the OVERLOADED state, it becomes a sender and tries to transfer its excess units to other nodes. When an overloaded node receives messages indicating that all its neighbor nodes are overloaded, the node enters the LOCKED state, because it cannot reduce its load immediately. If the load of a node is smaller than the underload_threshold, the node enters the UNDERLOADED state. A node in the UNDERLOADED state becomes a receiver, which can receive more load from overloaded nodes. The last state is the UNKNOWN state, which is initially assigned to a node before its load has been checked.

Each node periodically checks its current load and updates its state by comparing the load to the threshold values. If it is overloaded, it sends load_checking messages to its neighbor nodes in order to check whether it can migrate excess units to them. Since nodes communicate with one another via message passing, the major operation to be performed by a node is decided by the kind of message the node receives. If a node receives a load_checking message, it checks its current load and sends back another message, tagged ack_load_checking, which carries the current load information. If a node receives an ack_load_checking message and the sender node can accommodate a chunk of additional units without leaving its NORMAL state, the receiver node can migrate a chunk of units to the sender node. The chunk size is the amount of load transferred in one migration step. The chunk of units to be transferred is selected according to the selection policy. In our algorithm, the units to be transferred are easily selected by sorting the units by x coordinate. Figure 4 shows the processing of ack_load_checking in pseudo-code.

If a node finds that neither neighbor node can accept its load transfer, the node has to request remote help to find a proper remote node that can receive its excess load. Thus, the node sends remote_help messages to its neighbor nodes. Since a neighbor node is already overloaded, it forwards the remote_help message to its own neighbor node. The remote_help message is thus forwarded in two directions from the original sender node, node by node, in order to find an available node. If there is an available node that can help the original target node, it sends an ack_remote_help message back to the neighbor node by which the remote_help message was forwarded, rather than sending it directly to the target node (the original sender node). This is because, in our algorithm, all nodes communicate only with their neighbor nodes in order to preserve distributed control. Once the ack_remote_help message is generated, cascading migrations occur, starting from the available node and proceeding to the target node. We accept the overhead of cascading in order to keep the system from staying in an unstable state due to a cluster of highly overloaded nodes. Figure 5 shows this remote help process in pseudo-code form.

Whenever units are migrated from one node to another, the receiver node requests the migration from the sender node by sending an ask_unit_migration message. Units are transferred to other nodes not only for load balancing but also when they move across node boundaries. When a unit crosses the boundary of a node, the node sends a unit_move message to its neighbor node to hand over the moving unit. Another major operation is to handle units and events in queues. Whenever events are generated or arrive from other nodes, they are enqueued and processed. If they need to be propagated to other nodes, they are stored in separate send buffers for each sender node. After all the units in the queue have been processed, the events in the send buffers are transferred to the target nodes via a unit_move message.
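The node states, the chunk selection by x coordinate, and the remote help search described in Section 3.2 can be sketched in Python as follows. The state names, the two thresholds, and the chunk size come from the text; everything else is an assumption made for illustration: the function names, the acceptance test (a neighbor may take a chunk if it stays at or below overload_threshold), the tie-break toward the least loaded neighbor, and the modeling of nodes as a simple list ordered along the x axis. Message passing is replaced by plain function calls, so this is not the pseudo-code of Figures 4 and 5.

```python
from enum import Enum


class State(Enum):
    UNKNOWN = 0      # load not checked yet
    NORMAL = 1
    OVERLOADED = 2   # sender: tries to migrate units
    UNDERLOADED = 3  # receiver: can take more load
    LOCKED = 4       # overloaded, and all neighbors are overloaded too


def classify(load, underload_threshold, overload_threshold):
    """Periodic state update from the node's current load."""
    if load > overload_threshold:
        return State.OVERLOADED
    if load < underload_threshold:
        return State.UNDERLOADED
    return State.NORMAL


def can_accept_chunk(neighbor_load, chunk_size, overload_threshold):
    """Assumed acceptance test: the neighbor stays at or below the
    overload threshold after receiving one chunk."""
    return neighbor_load + chunk_size <= overload_threshold


def select_chunk(unit_xs, chunk_size, toward_right):
    """Pick the chunk_size units closest to the shared boundary by
    sorting on x: the largest x values when the receiver is the right
    neighbor, the smallest when it is the left neighbor."""
    if chunk_size <= 0:
        return []
    ordered = sorted(unit_xs)
    return ordered[-chunk_size:] if toward_right else ordered[:chunk_size]


def on_neighbor_acks(state, neighbor_loads, chunk_size, overload_threshold):
    """Decide what an OVERLOADED node does once the ack_load_checking
    replies (one load value per neighbor) have arrived.

    Returns (new_state, index_of_receiving_neighbor_or_None).
    """
    if state is not State.OVERLOADED:
        return state, None
    candidates = [i for i, load in enumerate(neighbor_loads)
                  if can_accept_chunk(load, chunk_size, overload_threshold)]
    if not candidates:
        # No neighbor can help: lock and fall back to remote_help.
        return State.LOCKED, None
    # Assumed tie-break: prefer the least loaded neighbor.
    best = min(candidates, key=lambda i: neighbor_loads[i])
    return State.OVERLOADED, best


def remote_help_path(loads, origin, chunk_size, overload_threshold):
    """Simulate remote_help forwarding along a line of nodes.

    The request travels hop by hop in both directions from `origin`;
    the nearest node that can accept a chunk answers. Returns the chain
    of nodes from the helper back to origin (the route of the cascading
    migrations), or None if no node can help.
    """
    for distance in range(1, len(loads)):
        for node in (origin - distance, origin + distance):
            if 0 <= node < len(loads) and can_accept_chunk(
                    loads[node], chunk_size, overload_threshold):
                step = 1 if node < origin else -1
                return list(range(node, origin + step, step))
    return None
```

With hypothetical thresholds of 250 and 340 units and the chunk size of 27 reported in the conclusion, a node whose two neighbors report loads of 345 and 350 becomes LOCKED, and remote_help_path then finds the nearest node further along the line that still has room; each node on the returned path passes one chunk to the next.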
4. Simulation Results
We developed a simulator in Java to test the performance of the dynamic load balancing algorithm proposed in this paper. The concurrent processing of a number of game processing servers is implemented with Java threads. Of the game server architecture described in Section 2, the load balancing manager and the boundary interaction manager are fully implemented on each game processing thread. Each game processing server has several queues: a unit event queue for the synchronous event processing in a game unit, a unit notification queue that keeps the newly generated events for the game units on the same processor, and a server event queue that keeps the pending messages from other processors. At each clock tick, one event is processed at a time. A processing cycle is composed of the load balancing routine, processing of the server event queue, processing of the unit event queue, and processing of the unit notification queue.

In order to measure the performance of the dynamic load balancing algorithm, we perform a set of experiments using workloads artificially generated by user threads. The user threads generate game units and control the activities and motions of the units. The number of game units increases continuously up to a certain value set at the beginning of the simulation. After reaching that value, the number of game units fluctuates dynamically around it during the rest of the simulation. This value can be changed during the simulation using the GUI monitor of the simulator. The motion of a game unit is controlled by a vector whose direction and magnitude are randomly changed by the user thread. To test load imbalance situations, several sink areas are located on the map; game units are more likely to stay within a sink area than to leave it. As the major performance metric, the average turnaround time of user interactions is measured. A total of 1500 game units are offered to the system, which has five game processing servers.

4.1 Comparison to Static Partitioning
In order to show how well the dynamic load balancing algorithm performs, we compare it to static partitioning. Static partitioning is a strategy that statically assigns a fixed portion of units to a node for the entire simulation time. This strategy allows neither moving units across the boundaries of the allocated node nor repartitioning the load dynamically. Such static partitioning is useful for measuring the effect of load imbalance on the average turnaround time. The ideal static partitioning gives all partitions the same number of units so that the load is perfectly balanced. In this case, there are no performance-degrading factors such as load imbalance or the overhead of dynamic load balancing.

Figure 8 shows the performance of the dynamic load balancing algorithm compared to that of static partitioning strategies with various load imbalances. We define load imbalance to be the difference between the maximum and minimum number of units per processor compared to the average. The dots on the static partitioning line indicate the average turnaround times of four cases that statically assign 1500 units to five nodes so that the load imbalances are 0.0, 0.98, 1.15, and 1.3, respectively. The flat line represents the turnaround time of dynamic load balancing with the optimal setting given in Section 4.1. Due to the overhead of load balancing, the dynamic load balancing algorithm performs worse than the static partitioning strategies whose load imbalance is lower than around 0.1. But as the load imbalance of static partitioning increases, dynamic load balancing shows a much better turnaround time.

Figure 8. Dynamic Partitioning vs. Static Partitioning
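The load imbalance measure used in this comparison can be written out as a small Python function. The sketch assumes that "compared to the average" means dividing the difference between the maximum and minimum per-processor unit counts by the mean count; the function name load_imbalance and the uneven distribution in the example are invented for illustration and are not taken from the experiments.

```python
def load_imbalance(units_per_processor):
    """(max - min) / mean of the per-processor unit counts."""
    if not units_per_processor:
        raise ValueError("need at least one processor")
    mean = sum(units_per_processor) / len(units_per_processor)
    if mean == 0:
        return 0.0
    return (max(units_per_processor) - min(units_per_processor)) / mean


# 1500 units on five nodes, as in the experiments; the uneven split
# below is an invented example, not one of the paper's measured cases.
balanced = [300, 300, 300, 300, 300]
uneven = [450, 350, 300, 250, 150]
print(load_imbalance(balanced))  # 0.0
print(load_imbalance(uneven))    # 1.0
```

Under this reading, an imbalance of 0.98 with an average of 300 units per node corresponds to a gap of about 294 units between the most and least loaded nodes.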
4.2 Other Characteristics
We performed two more sets of experiments to study other characteristics of the dynamic load balancing algorithm. First, we changed the offered system load from 500 to 2000 units. Figure 9 plots the average turnaround time vs. the offered system load for the dynamic load balancing algorithm. As the offered system load increases up to 1750 units, the average turnaround time increases linearly. Beyond that load, the system becomes unstable and the turnaround time increases exponentially.

Figure 9. Effects of Offered System Load

Next, we performed another set of experiments to test the dynamic load balancing algorithm in load imbalance situations. To simulate the load imbalance that can occur in normal network games, we place a number of sink areas on the map, where game units are more likely to stay within the sink areas than to leave. Figure 10 gives three captured images that show the cases we ran in our experiments, with one, two, and three sink areas, respectively. Each picture shows a near-optimal balancing state achieved after a long execution of dynamic load balancing.

Including the case with no sink area, Figure 11 plots the turnaround time for the four cases with 0 to 3 sink areas. The case without sink areas shows the best performance, while the three load imbalance cases with one or more sink areas show similar performance.

Figure 11. Turnaround Time vs. Number of Sink Areas

5. Conclusion
In this paper, we designed a distributed multimedia game server architecture that serves interactive action units. The architecture is composed of a master server and several game processing servers. Each game processing server has four manager components: the unit action manager, the player connection manager, the load balancing manager, and the boundary interaction manager. These managers work together to process dynamically generated action units. The entire working space of the game is partitioned into sub-areas, each of which is assigned to a game processing server.

Among the game processing servers, load imbalance needs to be avoided to achieve good system performance. In order to balance the loads of the servers, we proposed a dynamic load balancing algorithm. Based on the load balancing algorithm, we implemented a distributed game server system in Java and measured several aspects of system performance. Using the average turnaround time as the major performance metric, we studied the effects of the frequency of load balancing, the chunk size of migration, the partitioning pattern, various system loads, and the number of sink areas. The optimal performance is achieved when load balancing is performed once per 50 time units with a chunk size of 27. Compared to static partitioning, our dynamic partitioning achieved better results as the load of the system became less balanced.
References
[1] D. M. Nicol and J. H. Saltz, “An Analysis of Scatter Decomposition,” IEEE Trans. Comput., vol. 39, no. 11, pp. 1337-1345, Nov. 1990.
[2] S. B. Baden, “Programming Abstractions for Dynamically Partitioning and Coordinating Localized Scientific Calculations Running on Multiprocessors,” SIAM J. Sci. Stat. Comput., vol. 12, no. 1, pp. 145-157, Jan. 1991.
[3] R. D. Williams, “Performance of dynamic load balancing algorithms for unstructured mesh calculations,” Concurrency: Practice and Experience, vol. 3, no. 5, pp. 457-481, Oct. 1991.
[4] R. Chow and T. Johnson, Distributed Operating Systems and Algorithms, Addison-Wesley, 1997.
[5] M. Singhal and N. G. Shivaratri, Advanced Concepts in Operating Systems. New York: McGraw-Hill, 1994.
[6] D. L. Eager, E. D. Lazowska, and J. Zahorjan, “A Comparison of Receiver-Initiated and Sender-Initiated Adaptive Load Sharing,” Performance Evaluation, North-Holland, vol. 6, no. 1, pp. 53-68, Mar. 1986.
[7] P. Krueger and R. Finkel, “An Adaptive Load Balancing Algorithm for a Multicomputer,” Technical Report 539, University of Wisconsin-Madison, Apr. 1984.