
MULTI-AGENT SOKOBAN PLANNING

USING SAT ENCODING


REPORT- RAVI KURIL
ravikuril.du.or@gmail.com
MAY 2018

Contents
1 Introduction
  1.1 Motivation
  1.2 Objective
2 Problem Statement
3 Sokoban encoding approach 1
  3.1 Methodology
  3.2 Constraint Definition
    3.2.1 Player movement constraints
    3.2.2 Box push movement constraints
    3.2.3 Player Head-on constraint
    3.2.4 Player’s single place existence constraints
    3.2.5 Box’s single place existence constraints
    3.2.6 Player to Player collision constraints
    3.2.7 Box to Box collision constraints
    3.2.8 Box to player collision constraints
    3.2.9 Obstacle collision constraints
  3.3 Implementation
4 Sokoban Encoding Approach 2
  4.1 Constraint Definition
    4.1.1 Player movement constraints
    4.1.2 Box movement constraints
    4.1.3 Box push movement constraints
    4.1.4 Player Head-on constraint
    4.1.5 Player’s single place existence constraints
    4.1.6 Box’s single place existence constraints
    4.1.7 Player to Player collision constraints
    4.1.8 Box to Box collision constraints
    4.1.9 Box to player collision constraints
    4.1.10 Obstacle collision constraints
  4.2 Implementation
  4.3 Strategy 2: Exponential jumps encoding Approach
    4.3.1 Methodology
  4.4 Strategy 3: Neighborhood Encoding Approach
    4.4.1 Methodology
  4.5 Strategy 4: Neighborhood Encoding Approach with Relaxed Jump
    4.5.1 Methodology
5 Experiment and results
    5.0.1 Machine configuration and Time limit
    5.0.2 Results
    5.0.3 Planner comparison
    5.0.4 PBL interface removal
  5.1 Approach 1 conclusion
    5.1.1 Results
    5.1.2 Results
    5.1.3 Results of relaxed jump
6 Future work

1 Introduction
1.1 Motivation
In multi-agent classical planning, multiple agents plan together to achieve a
common goal. Such problems are solved using planners, which are of two types:
state-space based and reduction based [1]. State-space based planners are fast
in terms of planning time, but most of them focus on the speed of the solution,
so plan optimality is not guaranteed. Planning as satisfiability [2] is a
reduction-based planning approach which reduces the problem to a propositional
formula and uses a SAT solver to solve the reduced form. Recent advances in SAT
solvers make this approach competitive with state-space based planners in terms
of planning time.
As a step towards a general-purpose classical planner for MA-STRIPS planning
problems, one specific problem is taken here and an encoding is introduced for
it. A successful result of this approach is the driving force for the work on a
general-purpose planner.

1.2 Objective
The objective of this work is to develop an efficient encoding scheme that
reduces the Sokoban puzzle to CNF and to solve it using a state-of-the-art SAT
solver.

2 Problem Statement
Sokoban is a transport puzzle. In single-player Sokoban there is an N*M grid on
which B boxes are located, together with X obstacle tiles. A box can move from
its location only if the player pushes it. The objective of the problem is to
reach the goal configuration of the boxes without colliding with the obstacles.
In multi-player Sokoban the configuration is the same, with the addition of R
players whose locations on the tiles are given. A player can push any box that
is adjacent to it, provided the push is possible. The goal locations are given,
and the goal of the problem is to get all the boxes onto goal locations.

3 Sokoban encoding approach 1:


3.1 Methodology
Let P be the set of players, with R players in total, and let $p_i^t$ denote
the location of the i-th player at time t, for all $p_i \in P$. Initially all
players are on some locations: $(p_1^0, p_2^0, p_3^0, \ldots, p_R^0)$ are the
locations of the players at time step 0, $(p_1^t, p_2^t, p_3^t, \ldots, p_R^t)$
are their locations at time step t, and $(p_1^T, p_2^T, p_3^T, \ldots, p_R^T)$
are their locations at time step T, when all goals are achieved. The plan
length is therefore T time steps. $(p_1^0, p_1^1, p_1^2, \ldots, p_1^T)$ is the
T-step plan for player 1; similarly, $(p_i^0, p_i^1, p_i^2, \ldots, p_i^T)$ is
the T-step plan for any player $p_i \in P$. All steps of the players are
represented as Boolean variables, and a truth assignment of these variables is
searched for. A valid plan of T steps exists iff U is satisfiable:

\[ U = I(0) \wedge \left( \bigwedge_{k=1}^{T-1} P(k) \wedge Tr(k-1, k) \right) \wedge G(T) \]

where I(0) is the set of initial-configuration literals forming CNF
(conjunctive normal form) clauses, P(k) is the set of player movement clauses
at time step k, Tr(k-1, k) is the set of box movement clauses from time step
k-1 to k, and G(T) is the set of goal clauses at time step T.
All clauses are generated, converted into DIMACS format and given to the SAT
solver.
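
The mapping from propositions to DIMACS can be sketched as follows. This is a minimal illustration, not the report's encoder: the class `VarPool`, the function `to_dimacs` and the tuple layout `(kind, index, x, y, t)` are assumptions introduced here.

```python
from itertools import count


class VarPool:
    """Maps (kind, index, x, y, t) tuples to positive DIMACS variable ids.

    kind is "p" for a player proposition or "b" for a box proposition.
    This numbering scheme is a hypothetical choice for illustration.
    """

    def __init__(self):
        self._ids = {}
        self._next = count(1)

    def var(self, kind, idx, x, y, t):
        key = (kind, idx, x, y, t)
        if key not in self._ids:
            self._ids[key] = next(self._next)
        return self._ids[key]

    def __len__(self):
        return len(self._ids)


def to_dimacs(clauses, num_vars):
    """Serialize a list of integer clauses in DIMACS CNF format."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    for clause in clauses:
        lines.append(" ".join(str(lit) for lit in clause) + " 0")
    return "\n".join(lines) + "\n"


pool = VarPool()
# I(0): player 1 starts at (0, 0), box 1 starts at (1, 0).
clauses = [[pool.var("p", 1, 0, 0, 0)], [pool.var("b", 1, 1, 0, 0)]]
# G(T) with T = 1: box 1 must be at (2, 0).
clauses.append([pool.var("b", 1, 2, 0, 1)])
dimacs_text = to_dimacs(clauses, len(pool))
```

The resulting text can be written to a file and passed to any solver that reads DIMACS CNF; the movement and collision clauses of the following sections would be appended to `clauses` in the same way.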

3.2 Constraint Definition


Constraints are the restrictions applied to the problem; they depend on the
nature of the problem. The constraints for the Sokoban problem are defined
below.

3.2.1 Player movement constraints


In one time step a player can move to one of its neighboring cells or stay at
its location. The neighbors of a cell are the cells to its left, right, up and
down.

\[ p_i(x, y, t) \Rightarrow p_i(x, y, t+1) \vee p_i(x+1, y, t+1) \vee p_i(x-1, y, t+1) \vee p_i(x, y+1, t+1) \vee p_i(x, y-1, t+1) \]

for all $p_i \in P$.
Let |P| be the total number of players and N*M the grid size. For time step t
there are $(4t) \cdot |P| \cdot N \cdot M$ clauses, where $t \in \mathbb{Z}$,
$t \in [0, T-1]$. The total number of clauses for time step T is
$(4T) \cdot |P| \cdot N \cdot M$.
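
The movement implication above is already a single disjunction once the arrow is eliminated, so it can be emitted directly as a clause. The sketch below is one possible way to do that; `make_var` and `player_move_clauses` are hypothetical names, players and coordinates are zero-based, and out-of-grid or obstacle neighbors are simply dropped from the clause. It emits one clause per player, free cell and time step, which need not match the way the clause count above was obtained.

```python
def make_var(width, height, horizon):
    """Return var(i, x, y, t): a unique positive id per (player, cell, time)."""
    def var(i, x, y, t):
        return 1 + ((i * (horizon + 1) + t) * height + y) * width + x
    return var


def player_move_clauses(var, num_players, width, height, horizon,
                        obstacles=frozenset()):
    """Clauses of the form: -p_i(x,y,t) OR p_i(successor cells, t+1)."""
    clauses = []
    for i in range(num_players):
        for t in range(horizon):
            for x in range(width):
                for y in range(height):
                    if (x, y) in obstacles:
                        continue
                    clause = [-var(i, x, y, t)]
                    # Stay, right, left, up, down (same order as the formula).
                    for nx, ny in ((x, y), (x + 1, y), (x - 1, y),
                                   (x, y + 1), (x, y - 1)):
                        inside = 0 <= nx < width and 0 <= ny < height
                        if inside and (nx, ny) not in obstacles:
                            clause.append(var(i, nx, ny, t + 1))
                    clauses.append(clause)
    return clauses


var = make_var(width=2, height=2, horizon=1)
move_clauses = player_move_clauses(var, num_players=1, width=2, height=2,
                                   horizon=1)
```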

3.2.2 Box push movement constraints


Let R be the number of players in the system. A box can be pushed in four
possible directions.

Push Right:
A box is pushed to the right and the agent takes its place.
\[ b_i(x, y, t+1) \Rightarrow b_i(x, y, t) \vee \bigvee_{j=1}^{R} \big( p_j(x-2, y, t) \wedge b_i(x-1, y, t) \wedge p_j(x-1, y, t+1) \big) \]
for all $b_i \in B$.

Push Left:
A box is pushed to the left and the agent takes its place.
\[ b_i(x, y, t+1) \Rightarrow b_i(x, y, t) \vee \bigvee_{j=1}^{R} \big( p_j(x+2, y, t) \wedge b_i(x+1, y, t) \wedge p_j(x+1, y, t+1) \big) \]
for all $b_i \in B$.

Push Up:
A box is pushed upwards and the agent takes its place.
\[ b_i(x, y, t+1) \Rightarrow b_i(x, y, t) \vee \bigvee_{j=1}^{R} \big( p_j(x, y-2, t) \wedge b_i(x, y-1, t) \wedge p_j(x, y-1, t+1) \big) \]
for all $b_i \in B$.

Push Down:
A box is pushed downwards and the agent takes its place.
\[ b_i(x, y, t+1) \Rightarrow b_i(x, y, t) \vee \bigvee_{j=1}^{R} \big( p_j(x, y+2, t) \wedge b_i(x, y+1, t) \wedge p_j(x, y+1, t+1) \big) \]
for all $b_i \in B$.

The total number of propositions generated in one time step is
$4 \cdot |B| \cdot N \cdot M$.

The presence of obstacles reduces the total number of propositions. The
reduction depends on the placement of the obstacles.
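
The push formulas are implications whose right-hand side is a disjunction of conjunctions, so they are not yet in CNF. The sketch below converts one such formula by plain distribution of OR over AND, which makes the growth in clause count with R visible. It is an illustration under assumptions: the function name and the variable ids are hypothetical, and plain distribution is not necessarily how the encoder's CNF converter works.

```python
from itertools import product


def push_implication_to_cnf(head, stay, push_terms):
    """CNF of: head => stay OR term_1 OR ... OR term_R.

    Each term is a tuple of positive literals joined by AND. Distributing
    OR over AND picks one literal from every term, so up to
    len(term_1) * ... * len(term_R) clauses are produced (3**R here).
    """
    clauses = set()
    for choice in product(*push_terms):
        clauses.add(frozenset((-head, stay) + choice))
    return clauses


# Hypothetical variable ids for one box, one cell and R = 2 players
# (push right):
#   1: b(x,y,t+1)    2: b(x,y,t)       3: b(x-1,y,t)
#   4: p1(x-2,y,t)   5: p1(x-1,y,t+1)
#   6: p2(x-2,y,t)   7: p2(x-1,y,t+1)
push_terms = [(4, 3, 5), (6, 3, 7)]
cnf = push_implication_to_cnf(head=1, stay=2, push_terms=push_terms)
num_clauses = len(cnf)
```

Because every additional player multiplies the number of clauses by three under this conversion, a formula per box, cell and direction becomes expensive quickly, which is one plausible reason to look for a more compact box encoding.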

3.2.3 Player Head-on constraint


Two players that occupy neighboring cells must not move into each other's cell
in the same time step, i.e. they cannot swap places head-on.

\[ \neg \big( p_1(x, y, t) \wedge p_2(x, y+1, t) \wedge p_1(x, y+1, t+1) \wedge p_2(x, y, t+1) \big) \]

The total number of clauses at time T is
$\binom{|P|}{2} \cdot T \cdot \binom{N \cdot M}{Adj_2}$, where $Adj_2$ denotes
two cells which are neighbors.

3.2.4 Player’s single place existence constraints


This constraint ensures that a player does not appear at more than one location
on the grid at any time step.

\[ \neg \big( p_i(x_1, y_1, t) \wedge p_i(x_2, y_2, t) \big) \]

for all $p_i \in P$, every time step t and every pair of distinct cells
$(x_1, y_1) \neq (x_2, y_2)$.

The total number of clauses at time step t is
$\frac{(N \cdot M - X)(N \cdot M - X - 1)}{2} \cdot t \cdot |P|$, where X is
the number of obstacles present on the grid.

3.2.5 Box’s single place existence constraints


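A box cannot be at more than one place on the grid at any time point.

\[ \neg \big( b_i(x_1, y_1, t) \wedge b_i(x_2, y_2, t) \big) \]

for all $b_i \in B$, every time step t and every pair of distinct cells
$(x_1, y_1) \neq (x_2, y_2)$.

The total number of clauses at time step t is
$\frac{(N \cdot M - X)(N \cdot M - X - 1)}{2} \cdot t \cdot |B|$, where X is
the number of obstacles present on the grid.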
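
The single place existence constraints for players and boxes are both pairwise at-most-one encodings over the free cells. A minimal sketch, assuming a hypothetical 7*7 grid with 5 obstacles and consecutive variable ids for the free cells of one object at one time step, reproduces the per-object, per-step factor of the clause counts above:

```python
from itertools import combinations


def at_most_one(cell_vars):
    """Pairwise encoding: for every two cells, not both are occupied."""
    return [[-a, -b] for a, b in combinations(cell_vars, 2)]


# Hypothetical instance: N*M = 7*7 grid with X = 5 obstacle cells.
free_cells = 7 * 7 - 5
cell_vars = list(range(1, free_cells + 1))  # one variable per free cell
amo_clauses = at_most_one(cell_vars)

# (N*M - X)(N*M - X - 1) / 2 clauses for one object at one time step.
expected = free_cells * (free_cells - 1) // 2
```

The quadratic growth in the number of free cells is what makes these constraints a large share of the formula on bigger grids.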

3.2.6 Player to Player collision constraints
Two players cannot be on the same cell.

\[ \neg \big( p_i(x, y, t) \wedge p_j(x, y, t) \big) \]

for all $p_i \in P$, $p_j \in P$ with $i \neq j$ and every time step t.

The total number of clauses at time T is
$\binom{|P|}{2} \cdot N \cdot M \cdot T$.




3.2.7 Box to Box collision constraints


Two boxes cannot be on the same cell.

\[ \neg \big( b_i(x, y, t) \wedge b_j(x, y, t) \big) \]

for all $b_i \in B$, $b_j \in B$ with $i \neq j$ and every time step t.

The total number of clauses at time T is
$\binom{|B|}{2} \cdot N \cdot M \cdot T$.




3.2.8 Box to player collision constraints


A player and a box cannot be on the same cell.

\[ \neg \big( b_i(x, y, t) \wedge p_j(x, y, t) \big) \]

for all $b_i \in B$, $p_j \in P$ and every time step t.

3.2.9 Obstacle collision constraints


Player with obstacle: A player cannot be on an obstacle location.
$\neg p(x_o, y_o, t)$, where $(x_o, y_o)$ is the obstacle location.
Box with obstacle: A box cannot be on an obstacle location.
$\neg b(x_o, y_o, t)$, where $(x_o, y_o)$ is the obstacle location.

3.3 Implementation
The proposed encoder module is implemented in Python. Within the module, the
PBL interface (a Boolean algebra/propositional logic library written in Python)
is used to convert Boolean formulas to CNF. The Glucose-syrup 4.1 SAT solver is
used as a black box for checking satisfiability.

4 Sokoban Encoding Approach 2:
In the second approach the box encoding is changed with respect to the previous
encoding. As a result, fewer clauses are generated.

4.1 Constraint Definition


4.1.1 Player movement constraints
Same as first encoding approach.

4.1.2 Box movement constraints


Same as first encoding approach.

4.1.3 Box push movement constraints


Let R be the number of players in the system. A box can be pushed in four
possible directions. Each formula states that if a box changes its cell between
time steps t and t+1, then some player was directly behind it at time t and
occupies the box's old cell at time t+1.

Push Right:
A box is pushed to the right and the agent takes its place.
\[ b_i(x, y, t) \wedge b_i(x+1, y, t+1) \Rightarrow \bigvee_{j=1}^{R} \big( p_j(x-1, y, t) \wedge p_j(x, y, t+1) \big) \]
for all $b_i \in B$.

Push Left:
A box is pushed to the left and the agent takes its place.
\[ b_i(x, y, t) \wedge b_i(x-1, y, t+1) \Rightarrow \bigvee_{j=1}^{R} \big( p_j(x+1, y, t) \wedge p_j(x, y, t+1) \big) \]
for all $b_i \in B$.

Push Up:
A box is pushed upwards and the agent takes its place.
\[ b_i(x, y, t) \wedge b_i(x, y+1, t+1) \Rightarrow \bigvee_{j=1}^{R} \big( p_j(x, y-1, t) \wedge p_j(x, y, t+1) \big) \]
for all $b_i \in B$.

Push Down:
A box is pushed downwards and the agent takes its place.
\[ b_i(x, y, t) \wedge b_i(x, y-1, t+1) \Rightarrow \bigvee_{j=1}^{R} \big( p_j(x, y+1, t) \wedge p_j(x, y, t+1) \big) \]
for all $b_i \in B$.

As in the first approach, a box occupies at most one cell at any time step:
\[ \neg \big( b_i(x_1, y_1, t) \wedge b_i(x_2, y_2, t) \big) \]
for all $b_i \in B$ and every time step t.

Total clauses generated in this step is 4*|B|*N*M*

The presence of obstacles reduces the total number of clauses. The reduction
depends on the placement of the obstacles.

4.1.4 Player Head-on constraint
Same as first encoding approach.

4.1.5 Player’s single place existence constraints


Same as first encoding approach.

4.1.6 Box’s single place existence constraints


Same as first encoding approach.

4.1.7 Player to Player collision constraints


Same as first encoding approach.

4.1.8 Box to Box collision constraints


Same as first encoding approach.

4.1.9 Box to player collision constraints


Same as first encoding approach.

4.1.10 Obstacle collision constraints


Same as first encoding approach.

4.2 Implementation
The improved version of the proposed encoder module is implemented in C++. In
the clause generation module, propositional formulae are solved (converted to
CNF) by a pattern generator module.

4.3 Strategy 2: Exponential jumps encoding Approach


4.3.1 Methodology
In the previous solving strategy, constraints are generated linearly, one time
step after another, until a solution is found or the given time window expires.
In this strategy the SAT solver is invoked only at exponentially spaced time
steps. If no solution is found, constraints are generated up to the next
exponential time step and the SAT solver is called again. Once a solution is
found, binary jumps over the time points between that time step and the first
time step are used to determine the minimum number of steps needed to achieve
the solution. Whenever a binary jump moves towards the first step, the clauses
of the later time steps are removed from the system.
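
The jump schedule described above can be sketched as an exponential probe followed by a binary search. This is a simplified illustration: `minimal_horizon` and the callback `is_satisfiable` are hypothetical names, the callback stands in for "generate clauses up to t and call the SAT solver", and the sketch assumes that a problem solvable in t steps is also solvable in t+1 steps (plausible here because players may stay in place).

```python
def minimal_horizon(is_satisfiable, max_horizon):
    """Smallest t in [1, max_horizon] with is_satisfiable(t), else None."""
    low, high = 0, 1  # invariant: every horizon <= low is unsatisfiable
    # Exponential phase: probe 1, 2, 4, 8, ... (capped at max_horizon).
    while not is_satisfiable(high):
        low = high
        if high >= max_horizon:
            return None
        high = min(high * 2, max_horizon)
    # Binary phase: shrink the interval (low, high] to a single horizon.
    while high - low > 1:
        mid = (low + high) // 2
        if is_satisfiable(mid):
            high = mid  # jump towards the first step
        else:
            low = mid
    return high


probes = []


def toy_oracle(t):
    """Stand-in for a solver call; pretends the shortest plan has 13 steps."""
    probes.append(t)
    return t >= 13


best = minimal_horizon(toy_oracle, max_horizon=64)
```

In this toy run the solver stand-in is called 8 times instead of 13, at the price of building the larger formula for horizon 16 once.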

4.4 Strategy 3: Neighborhood Encoding Approach
4.4.1 Methodology
In the neighborhood encoding approach the main focus is to reduce the number of
clauses so that the overall solution time improves. The main idea is that, in
the initial time steps, movement clauses are generated for an agent only for
the cells in its neighborhood, that is, the cells the agent can have reached by
that time step. This continues until the time step at which the agent can reach
all locations of the grid. From that time step on, movement clauses are
generated for all grid locations.
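
One way to compute the cells for which movement clauses are needed at each time step is a breadth-first expansion from the agent's start cell. The sketch below is an assumption about how such a neighborhood could be derived, not the report's implementation; `reachable_by_step` is a hypothetical name, and boxes and other agents are ignored, so the result over-approximates the truly reachable cells.

```python
def reachable_by_step(start, width, height, obstacles=frozenset()):
    """Return layers, where layers[t] is the set of cells reachable in <= t steps.

    The list ends at the first step where the set stops growing; from then on
    clauses would be generated for every reachable cell of the grid.
    """
    free = {(x, y) for x in range(width) for y in range(height)}
    free -= set(obstacles)
    reached = {start}
    layers = [set(reached)]
    while True:
        grown = set(reached)
        for x, y in reached:
            for cell in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if cell in free:
                    grown.add(cell)
        if grown == reached:
            return layers
        reached = grown
        layers.append(set(reached))


# Hypothetical 3*3 grid without obstacles, agent starting in a corner.
layers = reachable_by_step(start=(0, 0), width=3, height=3)
layer_sizes = [len(layer) for layer in layers]
```

At time step t, movement clauses would then be emitted only for the cells in `layers[min(t, len(layers) - 1)]` instead of for the whole grid.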

4.5 Strategy 4: Neighborhood Encoding Approach with Relaxed Jump
This strategy reduces the overall solution time by calling the SAT solver on the
time point when there is a valid assignment of the boxes on the goals.

4.5.1 Methodology
Let $P_0 = (p_1(x_1, y_1, 0), p_2(x_2, y_2, 0), \ldots, p_R(x_u, y_v, 0))$ be
the initial configuration of the agents, where R is the total number of
players. The box configuration at time step zero is
$B_0 = (b_1(x_1, y_1, 0), b_2(x_2, y_2, 0), \ldots, b_B(x_m, y_n, 0))$, and
$X = ((x_1, y_1), (x_2, y_2), \ldots, (x_m, y_n))$ are the locations of the
boxes. $G = ((x_{u_1}, y_{v_1}), (x_{u_2}, y_{v_2}), \ldots, (x_{u_n}, y_{v_n}))$
is the set of goal locations. Clauses are generated sequentially until a time
step $t \in \mathbb{Z}$, $t \in [0, T-1]$, is found for which $X \equiv G$.
The SAT solver is invoked only from this t onwards.

5 Experiment and results


5.0.1 Machine configuration and Time limit
Each problem is allocated an execution time window of 1800 seconds. The machine
configuration is an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (4 cores, 8
threads) with 32 GB of memory.

5.0.2 Results
All 20 problems provided by the CoDMAP (Competition of Distributed and
Multiagent Planners) competition are selected for the experiments, and the
results on them are given below.

CoDMAP problems result

Problem   Grid    (Agents,   Plan    Plan        Total time   Total planner   Total clause
name              Boxes)     found   iteration   (sec)        time (sec)      generation (sec)
p01       7*7     2,2        YES     8           76.248       9.192           66.808
p01-1     7*7     2,2        YES     8           78.199       9.428           68.77
p02       12*6    2,3        T.O     -           -            -               -
p02-1     12*6    2,3        T.O     -           -            -               -
p03       7*7     2,3        YES     8           93.9         22.9            70.9
p03-1     7*7     2,3        YES     10          117.0        28.3            88.78
p04       7*11    3,4        T.O     -           1800         -               -
p04-1     7*11    3,4        T.O     -           1800         -               -
p05       10*9    3,4        T.O     -           1800         -               -
p05-1     10*9    3,4        T.O     -           1800         -               -
p06       10*9    2,4        YES     13          644.1        205.8           438.3
p06-1     10*9    3,4        T.O     -           1800         -               -
p07       9*8     2,3        T.O     -           1800         -               -
p07-1     9*8     3,3        T.O     -           1800         -               -
p08       12*10   3,2        T.O     -           1800         -               -
p08-1     12*10   3,3        T.O     -           1800         -               -
p09       15*7    3,4        T.O     -           1800         -               -
p09-1     15*7    3,4        T.O     -           1800         -               -
p10       29*17   4,1        T.O     -           1800         -               -
p10-1     15*7    3,4        T.O     -           1800         -               -

NOTE: T.O denotes a time out.

Total planner time is the time taken by the planner to solve the problem;
clause generation time is not included in it. Clause generation time is
composed of the clause generation time and the propositional formulae solving
time.

Observations:
1. For each propositional formula the PBL interface takes a significantly large
amount of time.
2. For the initial iterations some irrelevant clauses are generated.

5.0.3 Planner comparison


This section compares the proposed encoder with state-space based and
reduction-based planners.

NOTE:
1. Madagascar is a reduction-based planner.
2. Mp and MpC are the two versions of Madagascar.
3. ADP and ADP legacy are the fastest planners in the CoDMAP competition.
Both are state-space based planners.

Observations:
1. The proposed encoder gives an optimal plan.
2. The proposed encoder is much slower than the fastest planner.
3. The fastest planner is state-space based.

5.0.4 PBL interface removal
Propositions for the Sokoban problem were being solved by the PBL interface. A
pattern was identified between the propositions and the solved CNF. Using
backtracking, a tool is able to generate the CNF of a given formula directly.
This allows the PBL interface to be removed from the encoder.

Results: The problems which were solved previously are considered here.

Observations:
1. For smaller grid sizes the solution time is reduced by 80 to 95%.
2. All approaches use the pattern-based CNF generator module.

5.1 Approach 1 conclusion


The proposed encoder is optimal but not fast enough when compared with the
state-space based planners. The proposed encoder is slower because it generates
a large number of clauses, so the solver takes more time to solve them. The
structure of the propositions can be changed, without changing the meaning they
capture, so that fewer clauses result. The next approach follows this idea.

5.1.1 Results
The results for the encoding with the exponential jump strategy are given
below. They are compared with the state-of-the-art Madagascar planner; the ADP
planner, which is the fastest planner in the CoDMAP competition; the BlackBOX
planner, which is the first reduction-based classical planner; and SATPLAN,
which is a single-agent classical planner.

NOTATIONS:
1. O.S. is optimal steps for solving the problem.
2. T.O. is Time Out
3. *Time is in seconds (user+sys mode of process execution)
4. ** Time in range
5. ***INCOM means that the conversion from MA-PDDL to PDDL form is not
compatible with the planner.

Test Cases:
Test 1:
A1:(0,0) A2:(14,0) [Agents]
B1:(1,1) B2:(13,1) [Boxes]
G1:(14,14) G2:(0,14) [Goals]
Test 2:
A1:(0,0) A2:(29,0) [Agents]
B1:(1,1) B2:(28,1) [Boxes]
G1:(29,29) G2:(0,29) [Goals]

NOTE: CoDMAP provides a script that converts a multi-agent (MA) domain to a
plain PDDL domain. Using this script, single-agent planners can also be run on
the problems.
Observations:
1. The proposed approach is practical for small numbers of time steps. As the
jump expands, the problem size increases exponentially, and in some cases the
problem becomes unsolvable with the mentioned computing power.

2. The approach generates clauses for all grid locations, some of which are
beyond the reach of an agent.

5.1.2 Results
The results of the proposed approach with encoding scheme 2 are given below.
Both problems are taken from the CoDMAP problem set.

Observations:
1. The overall solution time is reduced by 81.03% for p01 and by 82.79% for
p01-1.
2. The proposed approach is scalable with the grid size.

5.1.3 Results of relaxed jump


The results of the strategy proposed above are given below.

Observations:
1. By applying the constraint generation strategies and the solver invocation
strategy, the proposed encoding module becomes somewhat competitive with the
search-based planners.
2. A few problems on which the other planners time out are solved by our
encoder within the time limit.

6 Future work
The proposed encoding schemes and solver strategies have shown performance
competitive with the search-based solvers. The reduction-based solver is fast
enough, but it is not competitive when the problem size is large. For such
instances an incremental strategy can be applied to reduce the overall solver
time. In an incremental strategy the solver reuses the clauses already learned
in previous iterations, which is the main reason why an incremental solver
strategy is suitable for larger problems.

References
[1] Pavel Surynek, Ariel Felner, Roni Stern and Eli Boyarski. Efficient SAT
Approach to Multi-Agent Path Finding under the Sum of Costs Objective.
Workshop, IJCAI, 2016.
[2] Henry Kautz and Bart Selman. Planning as Satisfiability. Proceedings of the
Tenth European Conference on Artificial Intelligence (ECAI’92), John Wiley,
1992.
[3] Henry Kautz and Bart Selman. BLACKBOX: A New Approach to the Application of
Theorem Proving to Problem Solving. Working notes of the AIPS-98 Workshop on
Planning as Combinatorial Search, Pittsburgh, PA, 1998.
