Generation of Feasible Integer Solutions on a Massively Parallel Computer

Utku Koc (Northwestern University, Evanston, IL, USA; MEF University, Istanbul, Turkey; corresponding author), Sanjay Mehrotra (Northwestern University, Evanston, IL, USA)
Abstract: We present an approach to parallelize the generation of feasible solutions of mixed integer linear programs in distributed memory high performance computing environments. The approach combines a parallel framework with the feasibility pump (FP) as the rounding heuristic. The proposed approach runs multiple FP instances with different starting solutions concurrently, while allowing them to share information. The starting solutions for the multiple subroutines are created by rounding the k most fractional variables of an optimal solution of the continuous relaxation. Our computational results on the COR@L, MIPLIB 2003, and MIPLIB 2010 test sets suggest that the improvement resulting from parallelization using our approach is statistically significant. Furthermore, running multiple short FP algorithms in parallel can significantly outperform running a single long version even if both algorithms are given the same amount of CPU time. This suggests that the benefits of parallelization are also due to information sharing.

Keywords: Mixed Integer Programming, Parallel Optimization, Feasibility Pump
1. Introduction
In this study we consider the problem of generating high quality feasible solutions for unstructured Mixed Integer Linear Programs (MILPs) in a parallel computational environment. MILP is extensively studied in the literature; we refer the interested reader to [1] for a recent review. Generating high quality feasible solutions quickly is important in practice, because the availability of feasible solutions with close to optimal objective value may help reduce the number of nodes in the branch and bound (B&B) tree in a branch and cut algorithm. In this study, we propose a scheme that can use multiple heuristics with various parameter settings in parallel. Specifically, we empirically investigate the use of the Feasibility Pump (FP) to find feasible solutions for unstructured MILPs in a parallel framework.

The motivation of this study is the emerging computing environments. The clock speed of high-end processors has been more or less stable for the past few years. Computer technology is now mainly focused on increasing the number of processors and the amount of memory. With this in mind, we move to a new era of developing parallel algorithms for a variety of problems for desktop and high performance computing. From a practical point of view, it is important to solve a problem or identify a good solution within a reasonable amount of wall-clock time, de-emphasizing the CPU-time used.

I This study is supported by ONR (grant no:) and DoE (grant no:).
II The main part of the study was conducted while Utku Koc was a post doctoral fellow at Northwestern University.
∗ Corresponding author. Email addresses: [email protected] (Utku Koc), [email protected] (Sanjay Mehrotra)
Preprint submitted to Elsevier, July 18, 2016
For MILPs, a way to use the power of parallel computing is to search the branch and bound tree in parallel. Koch et al. [2] discuss that the speed up of a B&B algorithm is around 20,000 compared to a sequential run, even if a million cores are used to search the B&B tree. They discuss that the disproportionality in the performance is mainly due to communication overhead, idle time for initial tasks or termination (ramp-up and ramp-down), the performance effect of redundant work (some nodes may not have been evaluated if fewer processors are used), and idle time due to latency/contention/starvation.

The FP algorithm was first proposed by Fischetti et al. [3]. An extension to general MILPs is proposed by Bertacco et al. [4]. By a modification of the objective function, Achterberg and Berthold [5] found better feasible solutions (Objective FP). Fischetti and Salvagnin proposed a different rounding heuristic by using constraint propagation techniques after rounding some of the variables [6]. Baena and Castro [7] extended the FP so that the integer point is obtained by rounding a point on the (feasible) line segment between the computed feasible point and the analytic center for the relaxed LP. In this study, we provide a parallel framework in which multiple feasibility heuristics starting from different solutions can communicate and share information. Recently, Huang and Mehrotra studied a combination of different types of random walks and FP in which the FP algorithm is used as the rounding procedure for interior random points. They generate feasible solutions for MILPs [8] and Mixed Integer Convex Programs (MICPs) [9].

This paper has multiple contributions to the literature: 1) we assess the value of parallelization independent of the increase in the CPU-time, and 2) we provide a parallel framework that can use multiple parameters for FP type heuristics. Each parallel subroutine uses a different rounding scheme so that the most fractional variables are rounded in an enumerative fashion independently. This study is the first of its kind in terms of using many cores to generate feasible integer solutions in parallel using enumeration in a distributed memory environment. Our computational experiments suggest that running multiple algorithms for a short amount of time in parallel can significantly outperform running a single long version even if both algorithms are given the same amount of CPU clock time. Thus, the benefits of parallelization are not only due to the increase in the CPU-time (given the same amount of wall clock time) but also due to multiple algorithms running in parallel and sharing information along the course of the algorithms.

We present computational results describing our experience with the use of the FP heuristic in the parallel subroutines. The original FP algorithm starts from the rounded solution of an optimal solution of the continuous relaxation. In this study, all possible rounded points from the k most fractional variables are enumerated and 2^k subroutines are run in parallel.

The rest of the paper is organized as follows: we describe our parallel heuristic framework for the use of multiple heuristics in Section 2. Details of the rounding procedure are given in Section 3. Section 4 gives the implementation details of the proposed algorithms. The computational results and our experience regarding the use of massively parallel systems are discussed in Section 5. Finally, we conclude in Section 6.
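As an illustration, the enumeration of starting points sketched above (round the k most fractional variables both ways, yielding 2^k candidates) could look as follows. This is our own minimal sketch; the fractionality measure (distance of the fractional part from 0.5) and the function names are assumptions, not the paper's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Enumerate all 2^k roundings of the k most fractional entries of an LP
// solution x; every other integer-constrained entry is rounded to the
// nearest integer. Returns one candidate starting vector per subroutine.
std::vector<std::vector<double>> enumerate_roundings(
    const std::vector<double>& x, const std::vector<int>& int_idx, int k) {
  // Sort integer-constrained indices by |frac - 0.5| ascending, so the
  // k most fractional entries come first (assumed fractionality measure).
  std::vector<int> order(int_idx);
  auto fractionality = [&](int j) {
    double f = x[j] - std::floor(x[j]);
    return std::fabs(f - 0.5);
  };
  std::sort(order.begin(), order.end(),
            [&](int a, int b) { return fractionality(a) < fractionality(b); });
  k = std::min<int>(k, (int)order.size());

  std::vector<std::vector<double>> starts;
  for (unsigned long mask = 0; mask < (1ul << k); ++mask) {
    std::vector<double> s(x);
    for (int j : int_idx) s[j] = std::floor(x[j] + 0.5);  // nearest rounding
    for (int b = 0; b < k; ++b)                           // enumerated part
      s[order[b]] = ((mask >> b) & 1) ? std::ceil(x[order[b]])
                                      : std::floor(x[order[b]]);
    starts.push_back(s);
  }
  return starts;  // 2^k candidate starting solutions
}
```

With k chosen from the number of available cores, each of the 2^k candidates would seed one parallel subroutine.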
2. A Concurrent Framework for Finding Feasible Solutions for MILPs

In this section we describe our concurrent framework to generate feasible solutions for MILPs. In our approach, we run multiple feasibility heuristics in parallel. We refer to the algorithms running on different processors as the subroutines. Each parallel subroutine uses a different random number seed with a different starting solution. Also, one may run different feasibility heuristics in parallel. Note that, even if all the subroutines start from the same solution and run independently, the final integer solutions may still be different. This is because multiple instances can take different paths (due to the inherent randomness) in the course of the parallel subroutines. Whenever one of the subroutines finds a feasible solution, it broadcasts the objective function value to the others. Then, all subroutines continue their search with a new and better objective cut off constraint. Thus, the information gained in one of the subroutines is shared with the rest to enhance their search. This is an important feature of our concurrent optimization approach. All subroutines update themselves as soon as the first feasible solution is found. All parallel instances restart their search (with the new collective information) at the time a better solution is found. In other words, all subroutines continue as if they found a better solution which is fed by the others. In this study, as a proof of concept, we use FP as the rounding procedure at the subroutines of our concurrent feasibility heuristic.

Regarding the communication during the run time, one may use the so called master/slave topology. In this paradigm, the master controls the overall course of the algorithm. Slave programs, on the other hand, follow the commands from the master, run the instances of the heuristic, and return integer solution(s) to the master, if any. The role of the master includes distributing inputs to and collecting results from the slaves. When one of the slaves finds an integer solution, it sends the solution to the master, along with the objective function value. Moreover, any combination of parameter settings, rounding methods, and anti cycling rules is also valid. The main algorithm that runs at the master is presented in Algorithm 2.1.

Algorithm 2.1 Parallel Feasibility Pump Running in Master
Input: a MILP min{c^T x : Ax ≥ b, x ∈ R^n, x_j integer ∀ j ∈ I}, number of slaves each heuristic will run
Output: an integer solution to the above MILP
1: Spawn slaves
2: Set LB = min{c^T x : Ax ≥ b, x ∈ R^n}, UB = ∞ and RHS = UB − ε
3: while termination criteria not met do
4:   Inform slaves about new RHS
5:   Collect results
6:   if one of the slaves returns an integer solution then
7:     Update UB = minimum of the slave objective values
8:     Update RHS = UB − ε
9:   end if
10: end while
11: Exit all the slaves and return the best integer solution found so far

We illustrate the algorithm running at the slaves in Algorithm 2.2. Each slave uses a different random number seed and may run a different variant of a heuristic. At each iteration of Algorithm 2.2, the slave subroutine receives some information from the master (if any). It then updates itself with the new information and creates a starting solution, depending on the type of heuristic it is running. The heuristic subroutine continues until a predetermined criterion is met or the master provides new information. Whenever an integer solution is identified, it is shared with the rest of the concurrent subroutines by means of the master.

Algorithm 2.2 Parallel Heuristic Subroutine Running in Slaves
Input: a MILP, RHS
Output: an integer solution to the MILP
1: Listen to master for the type of heuristic that will be run
2: while not killed by the master do
3:   Listen to master for parameters and information (RHS)
4:   Update RHS of the objective cutoff constraint
5:   Get information from master (LP optimum x*_lp)
6:   Update with respect to the heuristic variant
7:   Create a starting solution x
8:   Run heuristic starting from x
9:   Broadcast best integer solution
10: end while
The variants of the heuristic subroutines differ in Steps 6-8 of Algorithm 2.2. The update procedure, the generation of starting solutions, and the running conditions of the heuristics depend on the heuristic itself and the information provided by the master. Next, we define the variants of the heuristic subroutines and the feasibility pump algorithm running in the slaves in detail.

3. Variants of FP Heuristic

In this section we describe the details of the rounding subroutine, as well as the generation of the starting solutions for rounding. We start with the details of the basic FP algorithm as the rounding procedure.

3.1. Basic and Objective FP Algorithms

The FP heuristic was first proposed by Fischetti et al. [3] for 0-1 MILPs. The FP algorithm starts from a solution x and searches for another solution x̂ that is as close as possible to a rounded solution x̃ of x by solving an l1 norm minimization problem of the form:

  min ∆(x, x̃) = Σ_{j∈I} |x_j − x̃_j|    (1)
  s.t. Ax ≥ b                            (2)
       c^T x ≤ RHS                       (3)
       x ∈ R^n, x_j ∈ Z ∀ j ∈ I,         (4)

where (1) is the l1 norm distance, (2) and (4) are the constraint set defined by the original MILP, and (3) is the objective cut off constraint.

Two decisions are made in this heuristic: the starting solutions and the rounding procedure. Moreover, one needs to define an iterative version that moves from one starting solution to the next. In other words, one needs to define how x and x̃ are calculated at each iteration. Note that the above model focuses on feasibility with no consideration of the quality of the solution.

Using a normalized convex combination of the original objective function and the above l1 norm objective, one can generate better quality solutions (Objective FP) [5]. The idea is to focus more on the objective value quality in the beginning of the algorithm, and on feasibility at the later stages, by controlling the parameter α ∈ (0, 1). For this purpose, the objective function (1) of the above MILP is replaced by

  ((1 − α)/||∆||) ∆(x, x̃^{k−1}) + (α/||c||) c^T x,    (5)

Algorithm 3.1 Objective Feasibility Pump for MILP
Input: a MILP min{c^T x : Ax ≥ b, x_j integer ∀ j ∈ I}
Output: an integer solution to the above MILP
1: Initialize k = 0, LB = min{c^T x : Ax ≥ b}, UB := ∞, RHS = UB − ε
2: Set x^k := arg min{c^T x : Ax ≥ b, c^T x ≤ RHS}
3: if x^k is integer then
4:   return x^k
5: else
6:   let x̃^k := [x^k] (= rounding of x^k)
7: end if
8: while termination criteria not met do
9:   k := k + 1
10:  α = α × α_r
11:  compute x^k := arg min{((1 − α)/||∆||) ∆(x, x̃^{k−1}) + (α/||c||) c^T x : Ax ≥ b, c^T x ≤ RHS}
12:  if x^k is integer then
13:    set UB := c^T x^k and RHS := UB − ε and go to step 2
14:  end if
15:  if ∃ j ∈ I : [x^k_j] ≠ x̃^k_j then
16:    set x̃ := [x^k]
17:  else
18:    flip entries x̃^k_j (j ∈ I) randomly
19:  end if
20: end while
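To make (1) and (5) concrete, the following sketch computes the l1 distance over the integer index set and the blended Objective-FP value, with the geometric α update of Algorithm 3.1. The helper names are ours, and the choice of ||∆|| (the Euclidean norm of the all-ones coefficient vector on the |I| distance terms) is an assumed normalization; the paper does not spell out this detail.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// l1 distance of (1): sum over integer-constrained indices of |x_j - xt_j|.
double fp_distance(const std::vector<double>& x, const std::vector<double>& xt,
                   const std::vector<int>& int_idx) {
  double d = 0.0;
  for (int j : int_idx) d += std::fabs(x[j] - xt[j]);
  return d;
}

// Blended Objective-FP value of (5) at a point x:
//   ((1 - alpha)/||Delta||) * Delta(x, xt) + (alpha/||c||) * c^T x,
// where ||c|| is the Euclidean norm of the objective vector and ||Delta||
// is taken as sqrt(|I|), the norm of the unit distance coefficients
// (an assumption made for this sketch).
double objective_fp_value(const std::vector<double>& x,
                          const std::vector<double>& xt,
                          const std::vector<int>& int_idx,
                          const std::vector<double>& c, double alpha) {
  double cnorm = 0.0, ctx = 0.0;
  for (size_t j = 0; j < c.size(); ++j) {
    cnorm += c[j] * c[j];
    ctx += c[j] * x[j];
  }
  cnorm = std::sqrt(cnorm);
  double dnorm = std::sqrt((double)int_idx.size());
  return (1.0 - alpha) / dnorm * fp_distance(x, xt, int_idx) +
         alpha / cnorm * ctx;
}

// Geometric decay of alpha (step 10 of Algorithm 3.1): alpha <- alpha * ar.
double decay_alpha(double alpha, double ar) { return alpha * ar; }
```

As α decays toward zero over the iterations, the blended value is dominated by the distance term, shifting the emphasis from objective quality to feasibility.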
where ∆ is the l1 norm distance, c is the original objective vector, and || · || is the Euclidean norm. The parameter α is reduced gradually at each iteration of the Objective FP algorithm provided in Algorithm 3.1. We refer to the process of solving the problem of minimizing the convex combination defined in (5) as an FP iteration. The original FP algorithm starts from an optimal solution of the relaxation problem and rounds it to the nearest integer. We refer to a solution x̂ as an integer solution if x̂_j is integer for all j ∈ I. If an FP iteration terminates with an integer solution, we have a feasible solution for the original MILP. The objective function value of this solution is fed back to the model as an artificial objective cutoff constraint c^T x ≤ UB − ε, where UB is the objective function value of the best incumbent solution so far and ε is the improvement coefficient. The objective cutoff constraint is used to find solutions with improved objective. If the objective of the problem is known to be integer, then ε = 1; otherwise one needs to set ε to a small tolerance (we use ε = 0.1). If the solution of the FP iteration is not integer, the original FP algorithm continues from this solution and rounds it to another integer solution. In other words, the next iteration starts from an optimal solution of the l1 norm minimization problem with the same rounding scheme. The algorithm terminates if an optimal solution for the MILP is found or the time/iteration limit is reached. Depending on the choice of the starting solutions and the rounding scheme, multiple FP variants can be defined.

Note that the above algorithm may cycle. In the original implementation by Fischetti et al. [3], whenever a cycle is heuristically detected, a random perturbation is applied by skipping Step 15 and directly moving to Step 18. In Step 18 of the algorithm, flipping an entry means changing the rounding value of the entry. If x̃^k_j < x^k_j and x̃^k_j is to be flipped, we increase x̃^k_j by one. Similarly, if x̃^k_j > x^k_j, flipping corresponds to decreasing the value of x̃^k_j by one. In the cycle breaking perturbation, a random number of indexes among the most fractional entries are flipped. Note that, depending on the flipping (rounding) scheme, the output of the algorithm can significantly change. Additionally, using different random number streams may have a significant impact on the performance of the solutions generated by the FP algorithm.
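The flip rule and the random cycle-breaking perturbation described above can be sketched as follows. The function names are ours, and drawing the number of flipped entries uniformly is an assumption standing in for the "random number of indexes among the most fractional entries" rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Flip one rounded entry toward the other side of its fractional value:
// increase xt_j by one if it was rounded down, decrease it if rounded up.
void flip_entry(std::vector<double>& xt, const std::vector<double>& x, int j) {
  if (xt[j] < x[j]) xt[j] += 1.0;
  else              xt[j] -= 1.0;
}

// Cycle-breaking perturbation: flip a random number of entries among the
// most fractional ones (fractionality measured by |frac - 0.5|, assumed).
void perturb(std::vector<double>& xt, const std::vector<double>& x,
             const std::vector<int>& int_idx, std::mt19937& rng) {
  std::vector<int> order(int_idx);
  auto frac = [&](int j) {
    double f = x[j] - std::floor(x[j]);
    return std::fabs(f - 0.5);
  };
  std::sort(order.begin(), order.end(),
            [&](int a, int b) { return frac(a) < frac(b); });
  std::uniform_int_distribution<int> how_many(1, (int)order.size());
  int t = how_many(rng);                       // random number of flips
  for (int i = 0; i < t; ++i) flip_entry(xt, x, order[i]);
}
```

Because the flip count is drawn from the slave's own random stream, different seeds lead the parallel subroutines down different search paths after a cycle, which is exactly the diversity the framework exploits.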
4. Implementation Details

We now describe the implementation details for our concurrent optimization framework and FP. The details of the computational environment are also given.

4.1. Parallel FP Implementation

For the master/slave paradigm, we use MPI (Message Passing Interface) to ensure scalability. The communication between the slaves is done through the master. We use the Mersenne twister random number generator at each slave. In order to ensure that each slave uses a different random number stream, the seed for each generator is fed by the master. The seeds are generated using a linear congruential method. Recall that whenever a feasible integer solution is found, the objective cut off constraint needs to be updated at all parallel processors. In our implementation, each slave updates itself and then sends the objective function value to the master process, which in turn broadcasts the best of the slave objectives to the other processors. The master process checks the slaves for feasible solutions in a predetermined sequence. The broadcast of the objective function value is done in the same sequence. Due to the communication lag, a slave may have already found a solution that is better than the broadcasted one. In this case, the slave does not update the objective cutoff constraint.
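The seeding scheme described here can be sketched as follows. The linear congruential constants below are the classic MINSTD values, used purely as an illustrative assumption; the paper does not state which constants were used.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Master-side seed generation: a linear congruential generator produces one
// seed per slave; each slave then runs its own Mersenne twister stream.
std::vector<std::uint32_t> make_slave_seeds(std::uint32_t s0, int n_slaves) {
  const std::uint64_t a = 48271, m = 2147483647;  // MINSTD constants (assumed)
  std::vector<std::uint32_t> seeds;
  std::uint64_t s = s0;
  for (int i = 0; i < n_slaves; ++i) {
    s = (a * s) % m;                      // linear congruential step
    seeds.push_back((std::uint32_t)s);
  }
  return seeds;
}
```

Each slave i would then construct its generator as std::mt19937 rng(seeds[i]), giving every subroutine a distinct, reproducible stream.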
phase is handled by the master algorithm, internal im-
The implementation allows any combination of mix-
provement is also disabled. All other parameters are
ing multiple FP variants in a parallel setting. Multiple
used at the default values. Note that if FP implemen-
termination criteria are implemented, however, we share
tation hits its internal iteration limits, the algorithm re-
the results with a wall clock time limit. Wall clock time
sets those as long as the limits imposed by the master 6
is not met. Moreover, when a feasible solution is found,
For our tests, we used 74 problems from the COR@L
the solution is polished by fixing all the integer variables
library [10], 28 problems from the MIPLIB 2003 library
and re-optimizing on the continuous variables, if any.
[11], and 84 feasible problems from the MIPLIB 2010
Stage 3 - Communication: As soon as a feasible so-
benchmark set [12]. As some of the problems are du-
lution is found by one of the parallel subroutines, it is
plicate in the test sets, the total number of problems in
shared by the master. The master then updates all slaves
our test bed is 180. The details on the test problems are
with the new incumbent. After each rounding iteration,
provided in the Appendix.
slaves check if there exists a new incumbent solution and update objective cut off constraint, if necessary.
5. Computational Results
4.3. Computational Environment and Test Bed
In order to assess the value of parallelization irrespec-
All the algorithms are coded in C++. Computations
tive of the increase in the CPU-time, we run the algo-
are performed on Northwestern University high perfor-
rithm in an increasingly parallel environment using 1, 2,
mance computing (HPC) system referred to as QUEST.
4, 8, 16, 32, 64, 128, 256, and 512 parallel subroutines.
At the time this study is conducted QUEST clusters
The amount of time one wants to spend on heuristics to
had 252 Intel Westmere X5650 (2.66 GHz) nodes (3052
generated feasible solutions depend on the user/solver
cores), 68 Intel Sandybridge E2670 (2.6 GHz) nodes
settings. In our runs the time limit for each problem
(1088 cores) and 110 Intel IvyBridge E5-2680 (2.8
is calculated depending on the solution time of the first
GHz) nodes (2200 cores). This computations in this
continuous relaxation. In a set of preliminary experi-
study is carried out in the Westmere cluster. Each node
ments, we calculated the time to solve the first relax-
has at least 4GBs of memory per core for all the nodes.
ation at different times of the day and different days of
In practice, as the number of cores needed increase, it is
the week. This is done to get an understanding of how
reasonable to share the nodes with other users in HPC
the load of the HPC system effects the results even for
systems. To access the resources in a reasonable time,
a single LP relaxation. We then averaged the solution
we allow to share the nodes with other users in all ex-
times (referred to as t). 90% of the problems (162 out
periments. We point out that, depending on internal and
of 180) have t < 6 seconds. We run these problems for
external factors, controlling the process of the parallel
up to 2560t wall clock time limit. 13 of the remaining
implementations is an issue in a shared machine. The
18 problems are run with 256t time limit. The limits
mapping of the nodes and cores may take some time de-
are selected in such a way that maximum time to run
pending on system settings. We used MPI 3 standards
each problem is limited to four hours. The remaining
to employ the master/slave paradigm. Cplex 12.5 with
five problems are not included in the analysis as 256t is
a coin-OR interface is used for solving the linear pro-
more that four hours. This was needed to ensure that
gramming relaxations at the slaves.
we get the resources on QUEST in a timely manner. We 7
recorded the results at 10t, 20t, 40t, . . ., 2560t for t < 6,
discussion on our experience on parallel HPC systems
and 1t, 2t, 4t, . . ., 256t for t ≥ 6. The idea is to under-
in Section 5.1.
stand the trade off between the wall clock time limit and
The total amount of resources used by an algorithm
the number of processors. We tested both basic and ob-
can be calculated by multiplying the time spent and
jective FP algorithms in our analysis. Table 1 and 2 sum-
number of processors. Each cell uses the same amount
marizes the results for 10 parallelization levels (1, 2, 4,
of resources with its up-right and down-left cell. Mov-
. . .,512) and nine time levels (10t, 20t, 40t, . . ., 2560t).
ing in the down-left direction in the table represents the
Each cell represent the number of problems for which
use of same resources with more processors and less
the algorithm finds a feasible solution at a given paral-
time. Note that the numbers in each cell in columns with
lelization level and time limit. The numbers in parenthe-
more that 80t is comparable with its up-right and down-
sis represent the number of problems for which an op-
left cell. For a general conclusion observe that the first
timal solution is found. Note that if an algorithm finds
row represents a single processor (a classical serial al-
a solution that is better than or equal to the best known
gorithm). Note that even if it is given a long time (i.e.,
solution (reported at library web pages), it is considered
2560t) a serial basic FP algorithm can find a solution
as optimal in this analysis. There are cases for which we
for 138 problems, 35 being optimal. When 32 proces-
find solutions that are better than the best known values
sors are used with 80t time limit (32 times less), the run
reported in the library web pages, though they seem to
finds a solution for 144 problems (34 being optimal).
be outdated. We start the analysis with basic FP algo-
The amount of resources used by both implementations
rithm.
is the same (32 processors × 80t = 1 processor × 2560t
Observe from Table 1 that as time increases (i.e.,
). The number of problems for which feasible solutions
moving right at each row) the number of problems for
are found increases with decreased time and increased
which a feasible (optimal) solution is found increases.
processors. In order for a clearer understanding, we ig-
We observe that in most of the cases, increasing the par-
nore the problems for which a serial algorithm can find
allelization level (i.e., moving down at each column)
a solution in 80t time limit. These problems are likely
provides at least the same results if not better. How-
no to benefit from parallelization. We refer to this situa-
ever, there are cases in which using the same amount
tion as anchoring at a single processor, 80t. Considering
of time with more processors result in finding less so-
both the number of problems for which a feasible and
lutions. This may be due to two reasons: 1) increasing
optimal solution is found through several parallelization
the parallelization level increases the time to map the
levels, we conclude that parallel version of the basic FP
processes to different processors, and 2) the runs for dif-
algorithm linearly scales, for time values greater than
ferent parallelization levels are taken at different times
80t and parallelization level less than 512.
and environments. Both reasons are due to the compu-
The results for objective FP are provided in Table 2.
tational and parallel environment. We provide a detailed
Comparing the results with Table 1 indicates that the ba8
Table 1: Number of problems for which basic FP finds a feasible (optimal) solution (total = 162)

procs |    10t |    20t |     40t |     80t |    160t |    320t |    640t |   1280t |   2560t
    1 |  69(6) |  92(9) | 109(11) | 116(14) | 121(17) | 128(20) | 130(28) | 132(33) | 138(35)
    2 | 90(10) | 114(14)| 120(17) | 128(25) | 133(33) | 136(38) | 136(45) | 140(48) | 141(50)
    4 | 99(10) | 116(13)| 129(21) | 133(29) | 137(37) | 141(42) | 142(52) | 143(55) | 144(56)
    8 | 106(16)| 127(19)| 133(23) | 137(33) | 139(41) | 140(48) | 146(60) | 146(61) | 148(64)
   16 | 92(10) | 123(16)| 133(23) | 137(31) | 142(44) | 145(54) | 148(61) | 149(63) | 150(65)
   32 | 117(15)| 130(17)| 137(27) | 144(34) | 144(45) | 145(51) | 149(61) | 150(65) | 150(68)
   64 | 112(16)| 132(20)| 138(28) | 143(38) | 144(44) | 148(58) | 149(67) | 149(68) | 150(73)
  128 | 117(14)| 133(18)| 138(23) | 143(30) | 145(34) | 147(46) | 149(58) | 150(66) | 151(73)
  256 | 124(15)| 134(18)| 142(22) | 145(22) | 148(30) | 150(39) | 151(48) | 151(58) | 151(70)
  512 | 126(13)| 135(15)| 143(19) | 145(24) | 148(32) | 151(41) | 151(44) | 151(51) | 151(58)
Table 2: Number of problems for which Objective FP finds a feasible (optimal) solution (total = 162)

procs |    10t |    20t |     40t |     80t |    160t |    320t |    640t |   1280t |   2560t
    1 |  37(2) |  66(6) |  89(10) | 105(12) | 112(15) | 116(18) | 120(23) | 123(26) | 126(30)
    2 |  52(7) |  83(11)| 103(17) | 117(24) | 124(29) | 128(36) | 131(40) | 134(48) | 136(51)
    4 | 72(12) |  98(20)| 119(26) | 129(32) | 136(45) | 138(50) | 139(53) | 141(56) | 143(59)
    8 | 74(17) | 103(26)| 122(33) | 131(40) | 137(45) | 138(53) | 140(55) | 142(61) | 145(62)
   16 | 77(18) | 109(25)| 125(35) | 134(44) | 139(50) | 142(57) | 144(60) | 146(65) | 147(69)
   32 | 78(19) | 108(27)| 124(34) | 134(46) | 140(54) | 143(60) | 145(63) | 148(67) | 148(69)
   64 | 77(19) | 110(31)| 128(41) | 138(49) | 144(60) | 146(65) | 146(66) | 149(70) | 149(70)
  128 | 81(18) | 111(29)| 129(41) | 139(45) | 145(56) | 147(67) | 147(70) | 149(73) | 151(76)
  256 | 84(19) | 112(31)| 129(38) | 139(47) | 144(52) | 147(63) | 147(68) | 149(73) | 150(73)
  512 | 85(17) | 112(29)| 132(43) | 142(49) | 145(53) | 146(56) | 147(68) | 150(76) | 150(79)
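The equal-resource comparison applied to these tables (a cell's resource use is the product of its processor count and time multiplier, so down-left and up-right neighbors consume identical resources) amounts to a one-line check:

```cpp
#include <cassert>

// Resource use of a table cell = processors x time multiplier. Cells on the
// same anti-diagonal (down-left vs. up-right neighbors) use equal resources:
// doubling the processor count while halving the time keeps the product.
long resources(long procs, long time_mult) { return procs * time_mult; }
```

For example, resources(32, 80) equals resources(1, 2560), which is the comparison made in the text between the 32-processor 80t run and the serial 2560t run.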
Comparing the results with Table 1 indicates that the basic FP algorithm finds feasible solutions for more problems at almost all parallelization and time levels. This is due to the fact that the objective FP algorithm searches for higher quality solutions in terms of the objective function value, while basic FP focuses only on feasibility. In terms of the number of problems for which each algorithm finds an optimal solution, objective FP seems to provide better results. However, the comparison cannot be generalized across parallelization and time levels. Similar to the results for basic FP, moving in the down-left direction in Table 2 provides better or equivalent results for time values greater than 80t and parallelization levels up to 512. Consider the results anchored at a single processor, 80t. The number of problems for which a feasible (optimal) solution is found is 105 (12). Increasing the time limit 32 fold increases this number to 126 (30). However, using the same amount of resources but in parallel with 32 cores, the numbers increase to 134 (46). On examining the table, we conclude that in terms of the number of problems for which a feasible (optimal) solution is found, objective FP scales linearly for time values greater than 80t and parallelization levels less than 512.

Figures 1 and 2 show how the number of problems for which basic and objective FP find a feasible solution changes with respect to different time and parallelization levels. For a clearer understanding, we included the time values starting from 40t. Observe from the figures that the slope in the parallelization level direction is steeper than in the time increase direction. This is due to the fact that the effect of parallelization is larger than that of time. Also, the effect of parallelization is larger at the lower parallelization levels.

Figure 1: Number found with respect to time and parallelization level for Basic FP

Figure 2: Number found with respect to time and parallelization level for Objective FP
rection is more than the time increase direction. This is due to the fact that the effects of parallelization is more than that of time. Also, the effects of parallelization is more in the lower levels. Continuing with the 13 larger problems which are run 10
for up to a 256t time limit, the results in Tables 3 and 4 provide the number of problems for which a feasible (optimal) solution is found for basic and objective FP, respectively. The results show a similar trend as that for the small problems.

Considering the quality of the solutions generated at each time limit and parallelization level, we perform a pairwise comparison of all methods with equal resources using the Wilcoxon signed rank test on percentage gap values. Tables 5 and 6 present the significance of the difference between multiple parallelization and time levels for the basic FP and objective FP algorithms, respectively. The columns of the table correspond to the time difference and the rows correspond to the processor difference. For example, the value at row 2-4 and column 80t-40t of Table 5 (α = −0.95) represents that the performance difference between two processors with 80t and four processors with 40t is significant with α = 0.95. A positive sign represents that the algorithm with more processors (and less time) is better, whereas a negative value states that the algorithm with more time (and fewer processors) provides statistically better solutions. Empty cells represent that the performance difference is not significant. Recall that both algorithms use the same resources; thus, the performance difference is due to parallelization irrespective of the increase in the CPU-time. Note that empty cells and positive values are in favor of parallelization. Observe that using two cores instead of one is statistically significant at all time limits for both basic and objective FP. The significance of parallelization increases with time up to 64 cores for basic FP and up to 128 cores for objective FP. The value of parallelization of FP in the way described here decreases as the level of parallelization increases, diminishing after 128 processors. Also, the value of parallelization increases with time.

5.1. Additional Computational Experience

In this section, we share additional computational experience on running basic and objective FP in a distributed memory environment. Computers have become commodities rather than technological equipment, and massive parallelization in a distributed memory setting is the current trend. However, even the simplest applications encounter serious implementation and practical issues. As the number of processors increases, it is more difficult to have a dedicated set of nodes that can only be accessed by a single user. Thus, one needs to accept sharing the computing resources with other users that may run various types of programs with different requirements. Some applications focus more on memory while others rely on CPU usage. Professional programs focus more on optimizing performance at a lower coding level and may squeeze the use of memory and CPU. Sharing a node with such applications may result in underutilization on one side, resulting in an unfair distribution of the resources. Depending on the underlying system and message passing structure, the time to allocate nodes and map the processors varies. This also depends on the programs running on all nodes.

The iteration count of the first feasible solution would be the same across runs for the same slave if there were no interruption from other slaves (no communication). However, the course of our parallel algorithm depends on the time when the slaves find a solution (especially the first solution) rather than the iteration count of the algorithms running in the slaves. The slaves, on the
Table 3: Number of problems for which basic FP finds a feasible (optimal) solution for large problems (total = 13); columns: number of processors, rows: time limit

        1      2      4      8      16     32     64     128    256    512
1t      1(0)   2(0)   4(0)   3(0)   3(0)   4(0)   4(0)   4(0)   4(0)   4(0)
2t      3(0)   4(0)   4(0)   4(0)   4(0)   4(0)   4(0)   4(0)   4(0)   4(0)
4t      4(0)   7(0)   7(0)   7(0)   8(0)   8(0)   11(0)  11(0)  11(0)  11(0)
8t      6(0)   7(0)   8(0)   10(0)  12(0)  12(1)  12(0)  12(0)  12(0)  12(0)
16t     8(0)   9(1)   10(0)  11(0)  12(1)  13(1)  12(1)  13(1)  13(1)  13(1)
32t     8(0)   11(1)  11(1)  12(0)  13(1)  13(1)  13(1)  13(1)  13(1)  13(1)
64t     9(0)   11(1)  12(1)  12(1)  13(1)  13(1)  13(1)  13(1)  13(2)  13(2)
128t    9(0)   12(1)  12(1)  12(1)  13(1)  13(1)  13(1)  13(1)  13(3)  13(3)
256t    9(0)   12(1)  12(1)  12(1)  13(1)  13(1)  13(1)  13(2)  13(3)  13(3)
Table 4: Number of problems for which objective FP finds a feasible (optimal) solution for large problems (total = 13); columns: number of processors, rows: time limit

        1      2      4      8      16     32     64     128    256    512
1t      0(0)   0(0)   0(0)   1(0)   1(0)   1(0)   1(0)   1(0)   1(0)   1(0)
2t      2(0)   2(0)   2(0)   2(0)   3(0)   3(0)   6(0)   6(0)   6(0)   6(0)
4t      2(0)   4(0)   4(0)   5(0)   7(0)   8(0)   8(0)   8(0)   8(1)   8(0)
8t      4(0)   5(0)   5(0)   6(0)   7(0)   8(0)   9(0)   9(0)   9(1)   9(1)
16t     6(0)   9(0)   8(2)   10(2)  11(2)  12(2)  12(2)  12(2)  12(3)  12(3)
32t     7(0)   9(0)   10(2)  11(2)  12(2)  13(2)  13(3)  13(2)  13(4)  13(4)
64t     9(0)   12(1)  12(2)  12(2)  12(3)  13(2)  13(3)  13(3)  13(4)  13(4)
128t    10(0)  12(1)  12(2)  12(2)  12(3)  13(3)  13(3)  13(4)  13(4)  13(4)
256t    10(0)  12(1)  12(2)  12(2)  12(3)  13(3)  13(3)  13(4)  13(4)  13(5)
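The pairwise comparisons reported in Tables 5 and 6 rest on the Wilcoxon signed rank test applied to paired percentage-gap values. As a rough illustration of how such a comparison can be carried out (the paper does not describe its statistical tooling; the function below, its normal approximation, and the toy gap vectors are our own), the test can be sketched as:

```python
import math

def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test with a normal approximation.

    Returns (W, p): W is the smaller of the positive/negative rank sums
    and p a two-sided p-value. Zero differences are discarded, and tied
    |differences| share average ranks (the tie correction to the variance
    is omitted for brevity). The normal approximation is only reasonable
    for roughly 10 or more nonzero pairs.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank |d| in ascending order, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    w = min(w_plus, w_minus)
    # Null distribution of W: mean n(n+1)/4, variance n(n+1)(2n+1)/24.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, min(p, 1.0)

# Hypothetical percentage gaps for two configurations with equal total
# resources, e.g. 2 processors at 80t versus 4 processors at 40t.
gaps_2proc_80t = [2.1, 0.0, 5.4, 1.3, 0.7, 3.2]
gaps_4proc_40t = [1.8, 0.0, 4.9, 1.6, 0.5, 2.7]
w, p = wilcoxon_signed_rank(gaps_2proc_80t, gaps_4proc_40t)
print(w, p)  # significant at level alpha when p < 1 - alpha
```

A difference reported as significant at α = 0.95 in Tables 5 and 6 corresponds to a p-value below 0.05 in a test of this form.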
Table 5: Significance of parallelization for basic FP on problems with t < 6 (rows: time-limit pairs; columns: processor-count pairs; empty cells: difference not significant)

             1-2     2-4     4-8     8-16    16-32   32-64   64-128  128-256  256-512
20t-10t      0.95   -0.995   0.999  -0.999  -0.999  -0.999  -0.999  -0.999   -0.995
40t-20t      0.995  -0.999  -0.95   -0.999  -0.999  -0.999  -0.999  -0.999   -0.9
80t-40t      0.999  -0.95   -0.95   -0.999  -0.999  -0.999  -0.999  -0.999   -0.95
(The cell positions of the remaining rows, 160t-80t through 2560t-1280t, could not be recovered from the extraction; their values, in source order: 0.999 | 0.999 0.95 0.95 -0.999 -0.99 -0.999 -0.999 -0.999 -0.999 | 0.999 0.999 0.999 | 0.999 0.999 0.999 | 0.999 0.999 0.999 | -0.999 -0.999 -0.999 | -0.999 -0.999 -0.999 | -0.95 -0.999 -0.999 | -0.999 -0.999 -0.999 -0.999 -0.999 -0.995)
Table 6: Significance of parallelization for objective FP on problems with t < 6 (rows: time-limit pairs; columns: processor-count pairs; empty cells: difference not significant)

             1-2     2-4     4-8     8-16    16-32   32-64   64-128  128-256  256-512
20t-10t      0.9    -0.9    -0.999  -0.999  -0.999  -0.999  -0.999  -0.999   -0.999
40t-20t      0.995   0.95   -0.99   -0.999  -0.999  -0.999  -0.999  -0.99    -0.999
80t-40t      0.999   0.99   -0.95   -0.995  -0.995  -0.999  -0.999  -0.999   -0.999
(The cell positions of the remaining rows, 160t-80t through 2560t-1280t, could not be recovered from the extraction; their values, in source order: 0.999 0.999 | 0.999 0.999 | 0.999 0.999 0.95 | 0.999 0.999 0.95 0.9 | 0.999 0.999 0.995 0.9 | -0.9 -0.995 -0.999 -0.999 -0.999 | -0.999 -0.999 -0.999 | -0.95 -0.999 -0.999 | -0.95 -0.995 | -0.9 -0.99)
other hand, are affected by the computing resources used across different runs, depending on the programs running on different nodes. Although the slaves follow the same track of iterations, the time to generate a particular solution varies across replications. Thus, a specific set of iterations may be interrupted by a solution fed by other slaves. This suggests that the results depend on the state of the computing resources and may not replicate, as this state cannot be reproduced when the resources are shared with other users. We also note that, due to the synchronization in parallel implementations and depending on how the processors are given priority, multiple runs may terminate with slightly different solutions even in a shared memory setting.

6. Conclusion

It has already been shown that FP is a useful heuristic for MILP, as it usually finds feasible solutions for practical problems in a reasonable computational time [3, 4, 5]. None of the studies related to the use of FP, however, employs parallelization. In this study, we tested FP further in a highly scalable parallel framework.

We note that starting FP from multiple rounded points in parallel outperforms using the same starting solution (that is, an optimal solution of the continuous relaxation) and running with different random number streams. There is significant value in starting from multiple rounded points in the presence of parallelization. Extensive computational tests indicate that the value of increasing the level of parallelization is statistically significant for up to 128 cores.

There are several other heuristics for finding feasible solutions for MILP problems that could be used as part of a parallel implementation. Among them, Pivot-and-Complement [13] performs simplex-like pivots to bring slack variables into the basis and move integer variables out of the basis; this approach is further extended by Balas et al. [14]. Another heuristic for 0-1 MILPs is OCTANE, which uses enumeration techniques on extended facets of the octahedron [15]. Fischetti and Lodi propose a local search algorithm [16] to improve an incumbent solution. A heuristic called Relaxation Induced Neighborhood Search (RINS) solves suitably small sub-MILPs to improve an incumbent solution [17]. The use of random walks in the FP setting was investigated in [8, 9]. Using these heuristics in a parallel framework is considered a topic for future research.
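The timing sensitivity discussed in Section 5.1 comes from slaves asynchronously sharing incumbent solutions: which improvement arrives first depends on scheduling, even though the best value ultimately kept does not. The sketch below mimics that mechanism in a shared-memory setting. It is purely illustrative and not the paper's distributed-memory implementation; `SharedIncumbent`, `fp_worker`, and the toy objective are our own stand-ins.

```python
import random
import threading

class SharedIncumbent:
    """Thread-safe store for the best solution found by any worker."""

    def __init__(self):
        self._lock = threading.Lock()
        self.best_value = float("inf")
        self.best_point = None

    def offer(self, value, point):
        # Keep the offered solution only if it improves the incumbent.
        with self._lock:
            if value < self.best_value:
                self.best_value, self.best_point = value, point
                return True
            return False

def fp_worker(seed, shared, iterations=200):
    """Stand-in for one FP slave on a toy 0-1 minimization problem.

    A real slave would alternate LP projections and roundings; here each
    'iteration' just samples a random 0-1 point so the incumbent-sharing
    mechanism can be exercised.
    """
    rng = random.Random(seed)
    for _ in range(iterations):
        point = tuple(rng.randint(0, 1) for _ in range(8))
        value = sum((i + 1) * v for i, v in enumerate(point))  # toy objective
        shared.offer(value, point)

shared = SharedIncumbent()
workers = [threading.Thread(target=fp_worker, args=(s, shared)) for s in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(shared.best_value, shared.best_point)
```

In this sketch the final incumbent value is deterministic (it is the minimum over all offered values), but the moment at which each improvement lands, and hence which intermediate incumbents a worker observes, depends on thread scheduling, the shared-memory analogue of the replication issues described above.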
[1] A. Lodi, Mixed integer programming computation, in: M. Jünger, T. M. Liebling, D. Naddef, G. L. Nemhauser, W. R. Pulleyblank, G. Reinelt, G. Rinaldi, L. A. Wolsey (Eds.), 50 Years of Integer Programming 1958-2008, Springer Berlin Heidelberg, 2010, pp. 619–645.
[2] T. Koch, T. Ralphs, Y. Shinano, Could we use a million cores to solve an integer program?, Mathematical Methods of Operations Research 76 (1) (2012) 67–93. doi:10.1007/s00186-012-0390-9.
[3] M. Fischetti, F. Glover, A. Lodi, The feasibility pump, Mathematical Programming 104 (1) (2005) 91–104.
[4] L. Bertacco, M. Fischetti, A. Lodi, A feasibility pump heuristic for general mixed-integer problems, Discrete Optimization 4 (1) (2007) 63–76.
[5] T. Achterberg, T. Berthold, Improving the feasibility pump, Discrete Optimization 4 (1) (2007) 77–86.
[6] M. Fischetti, D. Salvagnin, Feasibility pump 2.0, Mathematical Programming Computation 1 (2009) 201–222.
[7] D. Baena, J. Castro, Using the analytic center in the feasibility pump, Operations Research Letters 39 (5) (2011) 310–317.
[8] K.-L. Huang, S. Mehrotra, An empirical evaluation of walk-and-round heuristics for mixed integer linear programs, Computational Optimization and Applications 55 (3) (2013) 545–570.
[9] K.-L. Huang, S. Mehrotra, An empirical evaluation of a walk-relax-round heuristic for mixed integer convex programs, Computational Optimization and Applications 60 (3) (2015) 559–585. doi:10.1007/s10589-014-9693-5.
[10] COR@L: Computational optimization research at Lehigh. http://coral.ie.lehigh.edu/mip-instances/.
[11] T. Achterberg, T. Koch, A. Martin, MIPLIB 2003, Operations Research Letters 34 (4) (2006) 361–372. doi:10.1016/j.orl.2005.07.009.
[12] T. Koch, T. Achterberg, E. Andersen, O. Bastert, T. Berthold, R. E. Bixby, E. Danna, G. Gamrath, A. M. Gleixner, S. Heinz, A. Lodi, H. Mittelmann, T. Ralphs, D. Salvagnin, D. E. Steffy, K. Wolter, MIPLIB 2010, Mathematical Programming Computation 3 (2) (2011) 103–163. doi:10.1007/s12532-011-0025-9.
[13] E. Balas, C. H. Martin, Pivot-and-complement: A heuristic for 0-1 programming, Management Science 26 (1) (1980) 86–96.
[14] E. Balas, S. Schmieta, C. Wallace, Pivot and shift - a mixed integer programming heuristic, Discrete Optimization 1 (1) (2004) 3–12.
[15] E. Balas, S. Ceria, M. Dawande, F. Margot, G. Pataki, OCTANE: A new heuristic for pure 0-1 programs, Operations Research 49 (2) (2001) 207–225.
[16] M. Fischetti, A. Lodi, Local branching, Mathematical Programming 98 (1) (2003) 23–47.
[17] E. Danna, E. Rothberg, C. Le Pape, Exploring relaxation induced neighborhoods to improve MIP solutions, Mathematical Programming 102 (1) (2005) 71–90.
APPENDIX

Table 7: Problem Profiles of COR@L Library

Problem          Lower Bound  Upper Bound  Rows   Columns  Conti  Bin   Int
neos5            15           15           63     63       10     53    0
neos-584851      -11          -11          661    445      40     405   0
neos-881765      0            +inf         278    712      0      712   0
neos-905856      -8           +inf         403    686      0      686   0
neos-1121679     -inf         16           6      62       12     50    0
neos-1211578     -77          -77          356    260      130    130   0
neos-1228986     -123         -123         356    260      130    130   0
neos-1337489     -77          -77          356    260      130    130   0
neos-1420205     40           40           383    231      0      126   105
neos-1430701     -78          -77          668    312      156    156   0
neos-1440447     -100         -100         561    260      130    130   0
neos-1460246     2325         2606         306    285      19     266   0
neos-1480121     43           43           363    222      152    70    0
ran14x18 1       3667.46      3736         284    504      252    252   0
rlp1             -1           15           68     461      11     450   0
roy              3208.957     3208.957     162    149      99     50    0
neos14           74333.34     74333.34     552    792      656    136   0
neos15           77895.21     81525.14     552    792      632    160   0
neos-1489999     354          354          1046   534      2      532   0
neos-911880      49.85281     54.83        83     888      48     840   0
neos-504815      2296.22      2296.22      1067   674      554    120   0
bienst1          46.75        46.75        576    505      477    28    0
bienst2          54.6         54.6         576    505      470    35    0
mcf2             65.66667     65.66667     664    521      465    56    0
neos-911970      50.572       54.76        107    888      48     840   0
neos-631517      11275806     11503309     351    1090     231    859   0
neos-1439395     -182         -180.33      775    364      182    182   0
22433            21477        21477        198    429      198    231   0
neos-1620807     6            6            1340   231      0      231   0
neos-1346382     -178         -176         796    520      260    260   0
neos-1426635     -178         -176         796    520      260    260   0
23588            8090         8090         137    368      137    231   0
neos-512201      513.57       513.57       1335   838      688    150   0
neos-504674      3635.87      3635.87      1344   844      694    150   0
neos-582605      -inf         1            1240   1265     865    400   0
neos-1225589     1.23E+09     1.23E+09     675    1300     650    650   0
neos-631164      10948328     11315752     406    1282     245    1037  0
neos-955215      446.5        446.5        723    1302     672    630   0
neos-1440460     -180         -179.25      989    468      234    234   0
neos-1056905     -inf         30           900    463      43     420   0
neos-1595230     9            9            1750   490      0      490   0
neos-1429461     -102         -101.25      1096   520      260    260   0
neos-1467067     -103.667     -103         1084   1196     598    598   0
neos-522351      17891.08     17891.08     1705   1524     1284   240   0
neos-538867      122          122          1170   792      0      792   0
neos-1200887     -74          -74          633    234      117    117   0
neos-1616732     -inf         159          1999   200      0      200   0
neos-825075      -272         -272         328    800      0      800   0
Table 7: Problem Profiles of COR@L Library (continued)

Problem          Lower Bound  Upper Bound  Rows   Columns  Conti  Bin   Int
neos-933815      759.7285     766          947    1728     888    840   0
neos-538916      134          134          1314   864      0      864   0
neos-906865      3175         3175         1634   1184     784    400   0
neos-863472      9.94541      11.69        523    588      56     532   0
binkar10 1       6742.2       6742.2       1026   2298     2128   170   0
neos11           9            9            2706   1220     320    900   0
neos-503737      50           52           500    2850     350    2500  0
neos-593853      1.17E+09     1.17E+09     1606   2400     1200   1200  0
neos-598183      18429.98     18429.98     992    1696     1260   436   0
neos-603073      16790.24     16790.24     992    1696     1260   436   0
neos-848150      0            +inf         731    949      0      949   0
neos-892255      14           14           2137   1800     0      1800  0
neos-933364      760.4215     766          1006   1728     888    840   0
neos-934184      760.4215     766          1006   1728     888    840   0
neos-942323      15           17           754    732      48     684   0
neos-1311124     -182         -181         1643   1092     546    546   0
neos-1426662     -52          -44          1914   832      416    416   0
neos-1427181     -104         -102         1786   832      416    416   0
neos-1427261     -130         -127         2226   1040     520    520   0
neos-1429185     -78          -76          1346   624      312    312   0
neos-1436709     -129         -128         1417   676      338    338   0
neos-1437164     8            8            187    2256     0      2256  0
neos-1440457     -180         -179         1952   936      468    468   0
neos-1442119     -182         -181         1524   728      364    364   0
neos-1442657     -156         -154.5       1310   624      312    312   0
neos-1603512     0            5            555    730      1      729   0
Table 8: Problem Profiles of MIPLIB 2003 Library

Problem     Objective  Rows   Cols   Nonzeros  Conti  Bin    Int
10teams     924        230    2025   12150     225    1800   0
a1c1s1      11503.44   3312   3648   10178     3456   192    0
aflow30a    1158       479    842    2091      421    421    0
aflow40b    1168       1442   2728   6783      1364   1364   0
cap6000     -2451380   2176   6000   48243     0      6000   0
danoint     65.66      664    521    3232      465    56     0
disctom     -5000      399    10000  30000     0      10000  0
fixnet6     3983       478    878    1756      500    378    0
gesa2       25779900   1392   1224   5064      816    240    168
gesa2-o     25779900   1248   1224   3672      504    384    336
liu         1172       2178   1156   10626     67     1089   0
manna81     -13164     6480   3321   12960     0      18     3303
mas74       11801.2    13     151    1706      1      150    0
mas76       40005.1    12     151    1640      1      150    0
mkc         -563.846   3411   5325   17038     2      5323   0
modglob     20740500   291    422    968       324    98     0
noswot      -41        182    128    735       28     75     25
opt1217     -16        64     769    1542      1      768    0
p2756       3124       755    2756   8937      0      2756   0
Table 8: Problem Profiles of MIPLIB 2003 Library (continued)

Problem     Objective  Rows   Cols   Nonzeros  Conti  Bin    Int
pp08aCUTS   7350       246    240    839       176    64     0
pp08a       7350       136    240    480       176    64     0
rout        1077.56    291    556    2431      241    300    15
set1ch      54537.8    492    712    1412      472    240    0
seymour     423        4944   1372   33549     0      1372   0
timtab1     764772     171    397    829       226    64     107
timtab2     1096560    294    675    1482      381    113    181
tr12-30     130596     750    1080   2508      720    360    0
vpm2        13.75      234    378    917       210    168    0
Table 9: Problem Profiles of MIPLIB 2010 Library

Problem           Rows    Columns  Nonzeros  Int   Bin     Conti
30n20b8           576     18380    109706    7344  11036   0
acc-tight5        3052    1339     16134     0     1339    0
aflow40b          1442    2728     6783      0     1364    1364
air04             823     8904     72965     0     8904    0
app1-2            53467   26871    199175    0     13300   13571
bab5              4964    21600    155520    0     21600   0
beasleyC3         1750    2500     5000      0     1250    1250
biella1           1203    7328     71489     0     6110    1218
bienst2           576     505      2184      0     35      470
binkar10 1        1026    2298     4496      0     170     2128
bnatt350          4923    3150     19061     0     3150    0
core2536-691      2539    15293    177739    0     15284   9
cov1075           637     120      14280     0     120     0
csched010         351     1758     6376      0     1457    301
danoint           664     521      3232      0     56      465
dfn-gwin-UUM      158     938      2632      90    0       848
eil33-2           32      4516     44243     0     4516    0
eilB101           100     2818     24120     0     2818    0
enlight13         169     338      962       169   169     0
ex9               40962   10404    517112    0     10404   0
glass4            396     322      1815      0     302     20
gmu-35-40         424     1205     4843      0     1200    5
iis-100-0-cov     3831    100      22986     0     100     0
iis-bupa-cov      4803    345      38392     0     345     0
iis-pima-cov      7201    768      71941     0     768     0
lectsched-4-obj   14163   7901     82428     236   7665    0
m100n500k4r1      100     500      2000      0     500     0
macrophage        3164    2260     9492      0     2260    0
map18             328818  164547   549920    0     146     164401
map20             328818  164547   549920    0     146     164401
mcsched           2107    1747     8088      14    1731    2
mik-250-1-100-1   151     251      5351      150   100     1
mine-166-5        8429    830      19412     0     830     0
mine-90-10        6270    900      15407     0     900     0
msc98-ip          15850   21143    92918     53    20237   853
mzzv11            9499    10240    134603    251   9989    0
Table 9: Problem Profiles of MIPLIB 2010 Library (continued)

Problem           Rows    Columns  Nonzeros  Int   Bin     Conti
n3div36           4484    22120    340740    0     22120   0
n3seq24           6044    119856   3232340   0     119856  0
n4-3              1236    3596     14036     174   3422    0
neos-1109824      28979   1520     89528     0     1520    0
neos-1337307      5687    2840     30799     0     2840    0
neos-1396125      1494    1161     5511      0     129     1032
neos13            20852   1827     253842    0     1815    12
neos-1601936      3131    4446     72500     0     3906    540
neos18            11402   3312     24614     0     3312    0
neos-476283       10015   11915    3945693   0     5588    6327
neos-686190       3664    3660     18085     60    3600    0
neos-849702       1041    1737     19308     0     1737    0
neos-916792       1909    1474     134442    0     717     757
neos-934278       11495   23123    125577    0     19955   3168
net12             14021   14115    80384     0     1603    12512
newdano           576     505      2184      0     56      449
noswot            182     128      735       25    75      28
ns1208400         4289    2883     81746     0     2880    3
ns1688347         4191    2685     66908     0     2685    0
ns1830653         2932    1629     100933    0     1458    171
opm2-z7-s2        31798   2023     79762     0     2023    0
pg5 34            225     2600     7700      0     100     2500
pigeon-10         931     490      8150      0     400     90
pw-myciel4        8164    1059     17779     1     1058    0
qiu               1192    840      3432      0     48      792
rail507           509     63019    468878    0     63009   10
ran16x16          288     512      1024      0     256     256
reblock67         2523    670      7495      0     670     0
rmatr100-p10      7260    7359     21877     0     100     7259
rmatr100-p5       8685    8784     26152     0     100     8684
rmine6            7078    1096     18084     0     1096    0
rocII-4-11        21738   9234     243106    0     9086    148
rococoC10-001000  1293    3117     11751     124   2993    0
roll3000          2295    1166     29386     492   246     428
satellites1-25    5996    9013     59023     0     8509    504
sp98ic            825     10894    316317    0     10894   0
sp98ir            1531    1680     71704     809   871     0
tanglegram1       68342   34759    205026    0     34759   0
tanglegram2       8980    4714     26940     0     4714    0
timtab1           171     397      829       107   64      226
triptim1          15706   30055    515436    9597  20451   7
unitcal 7         48939   25755    127595    0     2856    22899
vpphard           47280   51471    372305    0     51471   0
zib54-UUE         1809    5150     15288     0     81      5069
ns1758913         624166  17956    1283444   0     17822   134
netdiversion      119589  129180   615282    0     129180  0
mspp16            561657  29280    27678735  0     29280   0
bley xl1          175620  5831     869391    0     5831    0