Planning for Connected Agents in a Partially Known Environment

The Connected Multi-Agent Path Finding (CMAPF) problem asks for a plan to move a group of agents in a graph while respecting a connectivity constraint. We study a generalization of CMAPF in which the graph is not entirely known in advance, but is discovered by the agents during their mission. We present a framework introducing this notion and study the problem of searching for a strategy to reach a configuration in this setting. We prove the problem to be PSPACE-complete when requiring all agents to be connected at all times, and NEXPTIME-complete in the decentralized case, regardless of whether we consider a bound on the length of the execution.


Introduction
The coordination of mobile agents is at the heart of many real-world problems such as traffic control [1], robotics [2,3], aviation [4] and more [5,6]. Some of these problems have multiple aspects which make them complex: (1) Some systems are multi-agent, that is, the behaviors of agents influence one another and these influences must be taken into consideration when computing missions; this can be due, for instance, to collisions [7], sensor interference [8,9], etc.; (2) Some missions must ensure connectivity, that is, ensure periodic or constant connection to a station/agent to share acquired information [10]; (3) The environment may be only partially known, and the agents may discover it during the mission [11,12]. Several works have considered problems containing these three aspects. For instance, several algorithmic approaches have been investigated to solve the coordination of multirobot exploration [13,14,15]. Our objective in this paper is to present a framework to study the theoretical complexity of planning problems with respect to these three aspects.
The theoretical complexity of some related problems has been studied in the literature. Multi-Agent Path Finding (MAPF) is an important framework introduced to study collision-free navigation of agents in warehouses (see [7,16]). This problem was intensively studied and gave rise to a popular algorithm known as Conflict-Based Search (CBS) [17]. An extension of MAPF with connectivity constraints, called Connected MAPF (CMAPF), was studied as well [18]. The complexity of CMAPF and algorithmic solutions were studied in [19,20]. However, CMAPF only addresses the multi-agent and connectivity aspects, and not the partial knowledge of the environment. The latter aspect is considered in the Canadian Traveler Problem (CTP), a well-known problem for studying the navigation of an agent in a partially known graph [21]. While the initial framework was for a single agent, CTP has been extended to multiple agents in the settings of packet routing [22], multirobot exploration [23] and more [24]. While a notion of communication was considered in [25,26], it is limited to settings where all agents can receive information at all times or only designated agents can send information. In contrast, we are interested in studying the setting where agents' ability to communicate depends on their positions in the graph, and in establishing theoretical complexity results for the resulting problems.
In this paper, we study the theoretical complexity of generating plans for a group of agents to reach a given target configuration. More precisely, we analyze the impact of enforcing or ignoring: (A) connectivity; (B) collision; (C) a bound on the length of the execution. For (A), we consider either fully-connected strategies, requiring that the agents remain connected at all times during the mission, or decentralized strategies, allowing agents to disconnect and reconnect. For (B), in some applications, collisions can be handled by a local collision avoidance system [18], and one can thus abstract away and ignore collisions in graph-based planning algorithms. For (C), by providing a bound on the execution length, we can study the complexity of the decision problem associated with the optimization problem. Our results are summarized in Table 1. Interestingly, the PSPACE algorithm for the connected problem with a bounded execution is subtle and relies on a variant of Savitch's theorem [27] that we present here. Additionally, the PSPACE-completeness holds even in the case in which agents can always communicate, thus the hardness of the problem already comes from the incomplete knowledge of the movement graph. For the decentralized case, we prove the NEXPTIME-hardness in the bounded and unbounded cases by two separate reductions from the True Dependency Quantified Boolean Formula problem (TDQBF) [28], thus showing that the problem becomes significantly harder in this case.
Let us compare our results with known complexity results. In the fully known environment, the CMAPF problem is PSPACE-complete in the connected and unbounded case [19], while it is NP-complete in the bounded case (with the bound given in unary) [18]. Thus, the partial knowledge of the environment does not render the problem harder in terms of complexity. In contrast, recall that in MAPF (without connectivity), one can check the existence of a solution in polynomial time [29], while the bounded problem is NP-hard [30], so the hardness is due to connectivity constraints. Some algorithms were presented for CMAPF in [19] that can scale up to about ten agents. Since both problems are similar and belong to PSPACE, one can hope that these approaches can be extended to the partial knowledge case. On the other hand, our results show that the complexity of the decentralized case is significantly higher. While some tools and algorithms are available for Decentralized POMDPs, which are in the same complexity class, the scalability is limited and the development of efficient algorithms for this case seems more challenging (see [31] for a survey).
Some proofs are omitted due to space constraints; they are provided in the long version [32].

Figure 1 gives an example of a topological graph. For instance, an agent can go from 1 to 3 in one step. Two agents at vertices 1 and 5 can communicate, but two agents at 1 and 4 cannot.
We suppose that each vertex has a self-loop movement edge and a self-loop communication edge. This respectively represents the ability of an agent to stay at their location and to communicate with a nearby agent (i.e. at the same location/vertex).
This means that each agent $i$ can move from their vertex $c_i$ in $c$ to their vertex $c'_i$ in $c'$ in one step.

Definition 3 (Execution). An execution $\pi$ is a sequence of configurations $\langle \pi[0], \pi[1], \ldots \rangle$. We denote the length of $\pi$ by $|\pi|$, and write $\pi[0..t-1]$ to denote the sub-execution $\langle \pi[0], \ldots, \pi[t-1] \rangle$ of $\pi$. Moreover, last($\pi$) denotes the last configuration of $\pi$.
We say that a configuration $c$ is connected iff the vertices occupied by the agents induce a connected graph for the relation $E_c$, i.e. the graph $\langle V_a, E_c \cap (V_a \times V_a) \rangle$ is connected, where $V_a = \{c_1, \ldots, c_n\}$. An execution is said to be connected iff all its configurations are connected. The Connected Multi-Agent Path Finding problem consists in finding a connected execution for the agents from a given initial configuration to a target configuration. We summarize below the known complexity results for the CMAPF problem.

Theorem 1 ([18]). The problem of deciding whether, in a given instance $(G, c_s, c_t, k)$ where $k$ is given in unary, there is a connected execution from $c_s$ to $c_t$ of length at most $k$ is NP-complete.

Theorem 2 ([19]). The problem of deciding whether, for a given instance $(G, c_s, c_t, \infty)$, there exists a connected execution from $c_s$ to $c_t$ is PSPACE-complete.
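The connectivity condition on configurations can be checked with a standard graph search. The following Python function is a minimal sketch under assumed encodings that are not part of the original formalization: a configuration is a tuple of vertices, and `comm_edges` is a set of undirected communication edges given as pairs. Self-loop communication edges are implicit, since two agents on the same vertex occupy a single element of $V_a$.

```python
from collections import deque

def is_connected(config, comm_edges):
    """Return True iff the occupied vertices V_a induce a connected
    graph for the (undirected) communication relation E_c."""
    occupied = set(config)
    if not occupied:
        return True
    # Adjacency restricted to occupied vertices: E_c ∩ (V_a × V_a).
    adj = {v: set() for v in occupied}
    for u, v in comm_edges:
        if u in occupied and v in occupied:
            adj[u].add(v)
            adj[v].add(u)
    # Breadth-first search from an arbitrary occupied vertex.
    start = next(iter(occupied))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen == occupied

# Hypothetical communication edge, chosen to agree with the statement
# that agents at vertices 1 and 5 can communicate but 1 and 4 cannot.
COMM = {(1, 5)}
print(is_connected((1, 5), COMM), is_connected((1, 4), COMM))
```

An execution is then connected iff `is_connected` holds for each of its configurations.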
We do not consider collision constraints which would forbid agents from sharing the same vertex. For CMAPF with perfect knowledge, executions are not harder to compute with or without collision constraints [19,20]. The same holds in our case, and we discuss, in Section 6, how to incorporate these constraints in our setting.

Dependency Quantified Boolean Formula. A Dependency QBF (DQBF) is a formula in which dependencies of existential variables over universal variables are explicitly specified. A DQBF is of the form $\forall y_1, \ldots, y_n \, \exists x_1(O_{x_1}) \ldots \exists x_n(O_{x_n}) \, \psi$, where each $O_{x_i}$, the dependency set of $x_i$, is a subset of the universally quantified variables, and $\psi$ is a Boolean formula in CNF over $x_1, \ldots, x_n, y_1, \ldots, y_n$. It is worth noting that a QBF can be seen as a special case of a DQBF. The True DQBF (TDQBF) problem is the problem of deciding whether a given DQBF holds. Formally, a DQBF $\varphi$ holds iff there exists a collection of Skolem functions $A = (A_{x_i} : \{0,1\}^{O_{x_i}} \to \{0,1\})_{i=1..n}$ such that replacing each existential variable $x_i$ by a Boolean formula representing $A_{x_i}$ turns $\psi$ into a tautology. TDQBF is NEXPTIME-complete [28], and will be used to prove NEXPTIME lower bounds in Section 5.
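To make the Skolem-function semantics concrete, the following Python sketch decides tiny DQBF instances by brute force. The encoding is an assumption made for illustration: `dependencies` maps each existential variable to the list of universal variables it may depend on, and a CNF clause is a list of `(variable, polarity)` literals. The enumeration is doubly exponential and only meant to illustrate the definition.

```python
from itertools import product

def dqbf_holds(universals, dependencies, clauses):
    """Brute-force TDQBF: search for Skolem functions that make the
    CNF `clauses` true under every valuation of the universals."""
    exist = list(dependencies)
    # For each existential x, list every function from valuations of
    # its dependency set O_x to {False, True}, stored as a table.
    tables = []
    for x in exist:
        inputs = list(product([False, True], repeat=len(dependencies[x])))
        tables.append([dict(zip(inputs, outs))
                       for outs in product([False, True], repeat=len(inputs))])
    for choice in product(*tables):
        ok = True
        for ys in product([False, True], repeat=len(universals)):
            val = dict(zip(universals, ys))
            for x, table in zip(exist, choice):
                # x only sees the universal variables in O_x.
                val[x] = table[tuple(val[y] for y in dependencies[x])]
            if not all(any(val[v] == pol for v, pol in clause)
                       for clause in clauses):
                ok = False
                break
        if ok:
            return True
    return False

# x1 may observe y1 only; the formula asks for x1 <-> y1.
copy_y1 = [[("x1", False), ("y1", True)], [("x1", True), ("y1", False)]]
print(dqbf_holds(["y1", "y2"], {"x1": ["y1"]}, copy_y1))
```

Requiring instead that `x1` equals `y2` while it only depends on `y1` yields a false DQBF, which is the kind of information restriction exploited by the reductions.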

Modeling Imperfect Information
To formalize CMAPF in the imperfect information setting, let us show how to represent the initial knowledge of the agents, and how the information they have evolves during the execution. Agents initially know the exact set of vertices, but only have a lower and an upper approximation of the actual graph: they know that some (communication or movement) edges are certain (they must be present), while some are uncertain (they may be absent).
We assume that the communication edges of the actual graph are undirected, whereas the movement edges of the actual graph are not necessarily undirected. A strategy $\sigma_i$ for agent $i$ tells where to go next after a given execution $\pi$. Formally:

Definition 5. A strategy $\sigma_i$ for agent $i$ maps any execution $\pi$ to a vertex such that $(v, \sigma_i(\pi)) \in E_m$, where $v$ is the vertex at which agent $i$ is in the last configuration of $\pi$.
A joint strategy $\sigma$ is a tuple $\langle \sigma_1, \ldots, \sigma_n \rangle$ where $\sigma_i$ is a strategy for agent $i$. The outcome of a joint strategy $\sigma$ starting from configuration $c_s$ is the execution $\pi$ defined by induction as follows: $\pi[0]$ is $c_s$, and for $t \geq 1$, $\pi[t]$ is the configuration in which agent $i$ is at vertex $\sigma_i(\pi[0..t-1])$.

In the context of imperfect information, the behaviors of the agents only depend on their observations, as in imperfect information games [33]. The strategies, as defined above, do not necessarily take observations of the agents into account. We will now formalize observations and uniform strategies, that is, those respecting the observations of the agents.
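The inductive definition of the outcome translates directly into a simulation loop. The sketch below assumes, for illustration only, that each strategy is a Python function taking the execution so far (a tuple of configurations) and returning the next vertex of its agent; it does not check that the returned vertex is reachable through a movement edge.

```python
def outcome(strategies, start, steps):
    """Return [pi[0], ..., pi[steps]] where pi[0] = start and agent i
    is at strategies[i](pi[0..t-1]) in configuration pi[t]."""
    execution = [tuple(start)]
    for _ in range(steps):
        prefix = tuple(execution)  # pi[0..t-1], read-only for strategies
        execution.append(tuple(sigma(prefix) for sigma in strategies))
    return execution

# Hypothetical strategies: agent 0 walks a -> b -> c and then idles,
# agent 1 always stays where it is.
path = {"a": "b", "b": "c", "c": "c"}
walk = lambda pi: path[pi[-1][0]]
stay = lambda pi: pi[-1][1]
print(outcome([walk, stay], ("a", "z"), 3))
```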
In our setting, at any time, each agent observes all movement edges adjacent (both incoming and outgoing) to the vertex $v$ they occupy. Moreover, they observe the presence or absence of a communication edge between $v$ and $v'$ if $v'$ is occupied by another agent with which there is a direct or indirect communication (via other agents). Intuitively, during an execution, at each step, each agent updates their knowledge about the graph with the observations they receive. Moreover, they share all their knowledge with all agents with which they are connected at each step.
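One synchronous observe-then-share step can be sketched as follows. This is a simplified illustration rather than the formalization given in the appendix: knowledge is assumed to be a dictionary from uncertain movement edges to their observed status, the observation of communication edges is omitted, and all function and parameter names are hypothetical.

```python
def step_knowledge(config, knowledge, actual_moves, uncertain_moves, comm_edges):
    """Update each agent's knowledge after arriving in `config`."""
    n = len(config)
    new = [dict(k) for k in knowledge]
    # 1. Local observation: status of every uncertain movement edge
    #    incident (incoming or outgoing) to the occupied vertex.
    for i, v in enumerate(config):
        for e in uncertain_moves:
            if v in e:
                new[i][e] = e in actual_moves

    def linked(a, b):
        return a == b or (a, b) in comm_edges or (b, a) in comm_edges

    # 2. Group agents that communicate directly or indirectly, by
    #    propagating the smallest label along communication links.
    label = list(range(n))
    for _ in range(n):
        for i in range(n):
            for j in range(n):
                if linked(config[i], config[j]):
                    label[i] = label[j] = min(label[i], label[j])
    # 3. Sharing: every agent in a group receives the merged knowledge.
    for g in set(label):
        members = [i for i in range(n) if label[i] == g]
        merged = {}
        for i in members:
            merged.update(new[i])
        for i in members:
            new[i] = dict(merged)
    return new
```

For instance, with three agents at `a`, `b`, `c`, a single communication edge between `a` and `b`, and an uncertain edge `(a, x)` that is absent, the first two agents learn that the edge is absent while the third learns nothing.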
The observation of adjacent movement edges has been a recurrent practice in theoretical works [21,34] as well as robotics [35,36], and our formalism is inspired by these works.
The knowledge of an agent at any time corresponds intuitively to a pair of graphs as in Definition 4. For agent $i$ and execution $\pi$, let us denote by $k_i(\pi)_G$ the knowledge of agent $i$ about the graph after observing the execution $\pi$ in the actual graph $G$. Given such knowledge $K = k_i(\pi)_G$, the agent can deduce an under-approximation and an over-approximation of the actual graph; let us denote these by $\underline{G}_K$ and $\overline{G}_K$ respectively. In particular, if $K$ is the knowledge the agents have initially, then $\underline{G}_K = G_1$ and $\overline{G}_K = G_2$. We present a representation of the knowledge using predicates in the appendix, where a detailed formalization of $k_i(\pi)_G$ can be found.
In the rest of the paper, we assume all considered strategies to be uniform, that is, they comply with the knowledge of the agents: a strategy prescribes the same move after all executions that the agent cannot distinguish with their observations. Formally, if $|\pi| = |\pi'|$ and $k_i(\pi)_G = k_i(\pi')_G$, then $\sigma_i(\pi) = \sigma_i(\pi')$.

Decision Problems
We consider the decision problem of reaching a configuration $c_t$ from a configuration $c_s$ in fewer than $k$ steps, using uniform strategies. For two configurations $c_s, c_t$, let us call a topological graph $G$ $(c_s, c_t)$-admissible if there is an execution from $c_s$ to $c_t$ in $G$; this execution is not necessarily connected.

Definition 6. We say that an instance $(G_1, G_2, c_s, c_t, k)$ is positive if there exists a joint strategy $\sigma$ such that in all $(c_s, c_t)$-admissible graphs $G$ satisfying $G_1 \subseteq G \subseteq G_2$, the outcome of $\sigma$ starting in $c_s$ ends in $c_t$ in less than $k$ steps.
Observe that the above problem requires that a strategy ensures the reachability of the target configuration only for graphs that are $(c_s, c_t)$-admissible and compatible with the initial knowledge. In fact, intuitively, we would like the strategy to work under all possible graphs $G$ with $G_1 \subseteq G \subseteq G_2$. However, requiring a strategy to ensure reachability in a non-admissible graph does not make sense, since even a strategy with full information would fail. We thus require the strategies to make their best efforts, that is, to ensure the objective unless it is physically impossible.

Example 2. Consider the example of Figure 2. If both bridges (i.e. movement edges $(s_2, s_4)$ and $(s_3, s_5)$) are absent in the actual graph, the graph is not $(s_1, s_6)$-admissible and there cannot be a strategy ensuring reachability. The admissible graphs contain either $(s_2, s_4)$, or $(s_3, s_5)$, or possibly both. Note that this instance is negative for $k < 6$ (that is, it does not admit a solution). Indeed, consider a strategy that moves the agent, for instance, to $s_2$. In the graph where $(s_2, s_4)$ is absent, the agent would need to come back to $s_1$ and take the alternative path, which requires an execution of total length 6; and the situation is symmetric if the first move is towards $s_3$. The instance is nonetheless positive for $k \geq 6$ with the described strategy. However, if the edges $(s_2, s_1)$ and $(s_3, s_1)$ were not present, then the instance would be negative. In fact, once the agent moves to $s_2$ or $s_3$, they get stuck if the graph only contains the other bridge.
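The set of admissible graphs in the bridge example can be enumerated mechanically. The Python sketch below handles a single agent, so that a configuration is just a vertex. The edge sets are a hypothetical reconstruction chosen to agree with the textual description of the example (the figure itself is not reproduced here), and the function names are illustrative.

```python
from itertools import combinations

def reachable(edges, source, target):
    """Directed reachability by depth-first search."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for w in adj.get(u, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return target in seen

def admissible_graphs(certain, uncertain, source, target):
    """Yield the sets of uncertain edges whose addition to the certain
    edges gives a graph G, G_1 ⊆ G ⊆ G_2, with a path source -> target."""
    uncertain = sorted(uncertain)
    for r in range(len(uncertain) + 1):
        for extra in combinations(uncertain, r):
            if reachable(set(certain) | set(extra), source, target):
                yield frozenset(extra)

# Assumed edge sets: the two bridges are the only uncertain edges.
CERTAIN = {("s1", "s2"), ("s2", "s1"), ("s1", "s3"), ("s3", "s1"),
           ("s4", "s6"), ("s5", "s6")}
BRIDGES = {("s2", "s4"), ("s3", "s5")}
for extra in admissible_graphs(CERTAIN, BRIDGES, "s1", "s6"):
    print(sorted(extra))
```

Under these assumptions, three of the four candidate graphs are admissible; the one without any bridge is excluded, as stated in the example.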
We now define the connected version of Definition 6. For two configurations $c_s, c_t$, we say that a topological graph $G$ is $(c_s, c_t)$-c-admissible if there is a connected execution from $c_s$ to $c_t$. We will often omit the pair of configurations, which will be clear from the context, and write simply admissible or c-admissible.

Definition 7. We say that an instance $(G_1, G_2, c_s, c_t, k)$ is c-positive if there exists a joint strategy $\sigma$ such that in all $(c_s, c_t)$-c-admissible graphs $G$ satisfying $G_1 \subseteq G \subseteq G_2$, the outcome of $\sigma$ starting in $c_s$ is connected and ends in $c_t$ in less than $k$ steps.
In the connected case, agents cannot visit a disconnected configuration. Hence, the considered strategies only visit configurations that are certainly connected. Observe that agents can make observations about the presence or absence of communication edges while being connected and use this information later.

Example 3. Let us illustrate the above property on the example of Figure 3. Assume there are two agents, the starting and goal configurations are $c_s = \langle s_1, s_6 \rangle$ and $c_t = \langle s_5, s_7 \rangle$, and the only uncertainty is about the movement edges $(s_3, s_5)$ and $(s_4, s_5)$. Here, Agent 2 could immediately move to her target $s_7$; however, she could also cooperate with Agent 1 and lower the total completion time. Indeed, from their start configuration $\langle s_1, s_6 \rangle$, the agents first move to $\langle s_2, s_4 \rangle$ where Agent 2 observes whether $(s_4, s_5)$ is present. Assume the edge is present. Then, they follow the sequence $\langle s_4, s_6 \rangle \cdot \langle s_5, s_7 \rangle$; and otherwise $\langle s_3, s_6 \rangle \cdot \langle s_5, s_7 \rangle$. Thus, in order to minimize the length of the execution, the agents do not always take their shortest paths but might help other agents by obtaining information about the graph. Consider now the same example in which the communication edge $(s_3, s_6)$ is uncertain. If this edge is absent then Agent 2 cannot help Agent 1 achieve the target faster since if the former moves to $s_4$ and $(s_4, s_5)$ is absent, then, in order to maintain connectivity, the next configurations should be $\langle s_4, s_6 \rangle \cdot \langle s_2, s_7 \rangle \cdot \langle s_3, s_7 \rangle \cdot \langle s_5, s_7 \rangle$. An execution of the same size is obtained when Agent 2 moves to $s_7$ in the first step.
For both Definitions 6 and 7 above, let us call a joint strategy a witness if it witnesses the fact that the given instance is positive, and respectively, c-positive.
We instantiate the Connected MAPF problem in four different settings. The four following decision problems are defined depending on whether we consider the connectivity requirement and whether the bound is finite. Note that the bounded problems are the decision problems associated to the optimization problems.

Bounded Decentralized Reachability. Is given
As we will see, the encoding of the integer k does not change the overall complexity. Lower bounds are all obtained directly for the unary encoding; the lower bounds for the binary encoding follow. Concerning the upper bounds, we explain for each case how to design an algorithm with k encoded in binary.

Connected Reachability
We first address the case where agents must be connected at each step of the execution. In this case, agents share their knowledge at all times and thus the group of agents can be considered as a single agent playing against the environment.

Unbounded Case
We first focus on the existence of an unbounded connected strategy. Interestingly, we show that verifying the existence of a connected strategy in a partially known environment is not harder than in a perfectly known environment.

Bounded Case
We now study the existence of a bounded connected strategy. We show that this problem is PSPACE-complete even when the communication graph is complete. Detailed proofs of Lemmas 1 and 2 and of Theorem 5 can be found in the supplementary material.

Theorem 4. The bounded connected reachability problem is in PSPACE when the bound is given in binary.
Proof. Let us first prove the upper bound when k is given in unary. As APTIME = PSPACE [37], we give an alternating algorithm that runs in polynomial time, as follows. At each step, the existential player chooses the next connected configuration to which the agents move, and the universal player chooses the information about the newly discovered edges. After k steps, the algorithm accepts if the target configuration has been reached, or if the revealed edges imply that the graph is not c-admissible. The number of steps is bounded by k, which is polynomial, thus the algorithm runs in polynomial time.
There is one subtlety to prove the correctness. The alternating algorithm actually corresponds to a slight variant of our setting which can be seen as a game. In our setting, the environment chooses a graph G with G 1 ⊆ G ⊆ G 2 at the beginning, and the agents discover the graph G as they move. In contrast, in the alternating algorithm, the universal player reveals the graph step by step; therefore the environment might adapt the graph to the moves of the existential player.

Lemma 1. The alternating algorithm is correct.
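The alternation between the two players can be sketched as a recursive AND/OR search. The Python code below is an illustration restricted to a single agent, for which connectivity is trivial and c-admissibility coincides with admissibility. It assumes a hypothetical edge set matching the description of the bridge example (Example 2), and all names are illustrative. The universal player fixes the status of the uncertain edges incident to the current vertex, and the existential player then picks a move.

```python
from itertools import product

def has_path(edges, source, target):
    """Directed reachability by depth-first search."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return target in seen

def wins(v, known, steps_left, certain, uncertain, source, target):
    """True iff the agent at v can guarantee reaching target within
    steps_left moves against an adaptive environment."""
    fresh = [e for e in sorted(uncertain) if v in e and e not in known]
    for bits in product([True, False], repeat=len(fresh)):  # universal
        info = {**known, **dict(zip(fresh, bits))}
        # Most permissive graph still consistent with the revealed edges.
        optimistic = set(certain) | {e for e in uncertain if info.get(e, True)}
        if not has_path(optimistic, source, target):
            continue  # no admissible graph remains: branch accepted
        if v == target:
            continue
        if steps_left == 0:
            return False
        present = set(certain) | {e for e in uncertain if info.get(e, False)}
        moves = {v} | {b for a, b in present if a == v}  # staying is allowed
        if not any(wins(w, info, steps_left - 1, certain, uncertain,
                        source, target) for w in moves):  # existential
            return False
    return True

CERTAIN = {("s1", "s2"), ("s2", "s1"), ("s1", "s3"), ("s3", "s1"),
           ("s4", "s6"), ("s5", "s6")}
BRIDGES = {("s2", "s4"), ("s3", "s5")}
print(wins("s1", {}, 5, CERTAIN, BRIDGES, "s1", "s6"))
print(wins("s1", {}, 4, CERTAIN, BRIDGES, "s1", "s6"))
```

On this assumed instance, five moves (an execution with six configurations) suffice while four do not, in line with the threshold discussed in Example 2.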
When k is given in binary, the previous algorithm does not run in polynomial time. However, observe that the number of alternations can be bounded by a polynomial because there is only a polynomial number of steps in which the universal player reveals new information to the coalition of agents. In fact, the universal player is only useful when some agent is at a vertex that has not been seen before, and this can only happen a linear number of times. Furthermore, the previous algorithm runs in polynomial space.
When k is given in binary, our problem is in STA(poly(n), *, poly(n)), where STA(s(n), t(n), a(n)) is the set of problems decided in space O(s(n)) and time O(t(n)) with O(a(n)) alternations. Our problem is in PSPACE thanks to the following generalization of Savitch's theorem, which we prove:

Lemma 2. STA(poly(n), *, poly(n)) ⊆ PSPACE.
Intuitively, this lemma is proved by guessing the computations between each universal choice by a PSPACE oracle, which yields an overall PSPACE algorithm.
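For background, the classical Savitch recursion, which Lemma 2 generalizes, decides reachability within $2^k$ steps by guessing a midpoint and recursing on both halves. The Python sketch below shows the classical technique only, not the variant with alternations proved in the long version; `step` and `vertices` are hypothetical parameters describing a one-step relation and its domain.

```python
def reach(step, u, v, k, vertices):
    """True iff v is reachable from u in at most 2**k steps of `step`.
    The recursion depth is k and each frame stores a constant number
    of vertices, so the space used is polynomial even when the number
    of steps 2**k is exponential."""
    if k == 0:
        return u == v or step(u, v)
    # Try every midpoint m: u reaches m and m reaches v, each within
    # 2**(k-1) steps; the space of the first call is reused.
    return any(reach(step, u, m, k - 1, vertices) and
               reach(step, m, v, k - 1, vertices)
               for m in vertices)

# Toy instance: a directed path 0 -> 1 -> ... -> 7.
succ = lambda a, b: b == a + 1
print(reach(succ, 0, 7, 3, range(8)), reach(succ, 0, 7, 2, range(8)))
```

This trades time for space, which is what allows a bound k encoded in binary to be handled without storing an exponentially long execution.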

Theorem 5. The bounded connected reachability problem with complete connectivity is PSPACE-hard.
Our reduction actually builds an undirected movement graph. Thus, PSPACE-hardness holds already for undirected movement graphs. Note that in our current setting, pairs of uncertain edges of the form (u, v) and (v, u) are treated separately, but the lower bound proof still holds when they are seen as one.

Decentralized Reachability
We now tackle the case where agents are allowed to be disconnected; at each configuration, they share their knowledge with all agents to which they are connected. This case is harder because agents no longer follow a centralized strategy and they must cooperate to exchange information at the right moment to reach their targets.

Unbounded Case
Theorem 6. The unbounded decentralized reachability problem is NEXPTIME-complete.
Proof Sketch. For the upper bound, an NEXPTIME algorithm consists in guessing uniform strategies for all agents and checking whether the joint strategy is a witness. Such a strategy has exponential size since it is a function of the sets of knowledge of the agents and the current vertex. One can enumerate all graphs G between G 1 and G 2 , and execute the joint strategy on G to check that it ensures the reachability of the target. Moreover, the executions to be checked have at most exponential length. In fact, executions can be seen as paths in a meta-graph where vertices are configurations augmented with the sets of knowledge of the agents. This meta-graph is of exponential size, so it is sufficient to consider executions of exponential length. The overall non-deterministic algorithm is thus in exponential time.

Figure 4. Gadgets in the reduction from DQBF to unbounded decentralized reachability. (a) Gadget for variable z: both edges (v z , ⊤ z ) and (v z , ⊥ z ) are certain (resp. uncertain) if z is existential (resp. universal).
For the lower bound, we reduce from TDQBF and construct the graphs G 1 and G 2 as follows. For each variable z, we create the gadget depicted in Figure 4a.
We create the observation gadget for all existential variables x and for all (universal) variables y ∈ O x , depicted in Figure 4b. For convenience, we write O for the pair (x, y) corresponding to observation of y by x.
Finally, we create the clause gadget, depicted in Figure 4c. A vertex γ i certainly communicates with ⊤ z iff z ∈ γ i , and with ⊥ z iff ¬z ∈ γ i . Moreover, the vertex v γ communicates with all ⊤ z and ⊥ z for all variables z.
We give some intuitions about the reduction; the full proof is in the supplementary material. Assuming the DQBF φ holds, one can build a witness strategy as follows. The agents starting at a universal s z are forced to follow a path not deleted by the environment, which determines the value of z. The agents starting at s O can observe the valuation of their respective universal variables (by visiting v y ), and then inform their respective existential variables thanks to the communication edge between v O and v x . Thus, agents starting at an existential s z are aware of the values of all variables in O z and choose an appropriate value so as to satisfy φ. The agents starting at s i idle at γ i at step 3. At this point, the agent that starts at s γ also idles at v γ , and all agents that start at some s z are at ⊤ z or ⊥ z . Thanks to the communication edges, and because φ is true, each γ i communicates with at least one ⊤ z or ⊥ z , thus also with the agent at v γ . Therefore, the latter agent knows about all available edges in the clause gadget and can successfully go to t γ .

Bounded Case
In the bounded case, the problem is NEXPTIME-complete independently of the encoding of the bound. Moreover, the hardness holds even for undirected graphs.

Theorem 7. The bounded decentralized reachability problem is NEXPTIME-complete. NEXPTIME-hardness holds for undirected graphs.

Discussion
Additional Results. We present results obtained by a simple observation/modification.
Unbounded Reachability and Undirected Graphs. Both the unbounded connected and unbounded decentralized reachability problems become trivial on undirected graphs. This is because we only require reachability for (c-)admissible graphs. In the decentralized case, each agent can run a DFS independently, and eventually reach their target in at most 2|V| steps, and a similar search can be done by the set of agents in the connected case.
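The depth-first exploration mentioned for the decentralized case can be sketched for one agent as follows. The code assumes, for illustration, a hypothetical callback `neighbors(v)` that returns the vertices adjacent to `v` through movement edges actually present, which the agent only learns once it stands on `v`. Each tree edge is traversed at most twice (once forward, once when backtracking), which gives the linear bound on the number of moves.

```python
def dfs_walk(neighbors, start, target):
    """Physically explore an undirected graph by DFS. Return the list
    of successive positions ending at target, or None if the target
    is not reachable (the graph is then not admissible)."""
    walk = [start]
    visited = {start}
    stack = [start]  # current DFS branch, last element = position
    while stack:
        v = stack[-1]
        if v == target:
            return walk
        nxt = next((w for w in neighbors(v) if w not in visited), None)
        if nxt is None:
            stack.pop()  # dead end: move back along the tree edge
            if stack:
                walk.append(stack[-1])
        else:
            visited.add(nxt)
            stack.append(nxt)
            walk.append(nxt)
    return None

# Toy undirected graph with edges a-b, a-c, c-d.
ADJ = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
print(dfs_walk(lambda v: ADJ[v], "a", "d"))
```

On the toy graph the agent first explores the dead end `b`, returns to `a`, and then reaches `d` in 4 moves, below the bound 2|V| = 8.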
Base Station. Several works consider a designated base vertex to which all agents must stay connected during the execution [19,20,38]. This concept is only relevant in the connected case. Our results also hold with this additional constraint. In fact, the lower bound of Theorem 3 follows from [19], which proves the bound also with a base. In Theorem 5, we can add the base vertex as an isolated vertex so that the reduction is still valid.
Collisions. We did not require the paths to be collision-free in the results presented in this paper. However, this property is already ensured by our proofs or can be obtained by simple modifications. The lower bound proof of Theorem 3 relies on Theorem 2 from [19], which holds with collision constraints as well, so this is also true for our case. The proof of Theorem 5 does not generate collision-free paths, as the groups of occurrence agents start and finish at the same location and follow almost the same path. This proof can be adapted to prevent collisions by delaying each occurrence agent by 3 steps with respect to the previous one. This can be achieved easily by extending the movement paths and shifting the start and target locations of each agent 3 vertices behind those of the previous agent. The proof of the lower bound of Theorem 6 features a construction ensuring a collision-free strategy. Indeed, the observation agents only need to take turns to visit the universal variables. Thus, the result holds with collision constraints as well. The algorithms of Theorems 3, 5, and 7 can be adapted by restricting all considered configurations to collision-free ones, while c-admissibility of a graph with collision constraints can be checked using Theorem 2.
Graph Classes. The MAPF and CMAPF problems have been studied for different classes of graphs (planar, grid, . . . ). The proof of the lower bound in Theorem 3 relies on the proof for unbounded reachability in [20], thus the PSPACE-hardness result on planar graphs also carries over to our problem. Planar QBF is known to be PSPACE-complete [39], and the construction of Theorem 5 is such that when applied to a planar QBF, the resulting graph is planar. Our PSPACE-completeness result thus holds on planar graphs.

Related Work. Different definitions of robust plans [40][41][42] have been studied. A k-robust plan guarantees the reachability of the target in the event of at most k delays. A p-robust plan executes without a conflict with probability at least p. Our framework does not consider delayed agents but focuses on synchronous executions with imperfect knowledge of the area.
The problem of MAPF with a dynamic environment has multiple formulations. Adversarial Cooperative Path-Finding [43] treats the obstacles as agents that reason in order to prevent the cooperating agents from reaching their goal. The problem in which the dynamics of the environment are predictable was considered in [44]. Additionally, when obstacles have unknown dynamics, one can estimate their movements and plan to minimize the probability of a collision [45], or predict their movements [46] and plan the movement of the agents online [47]. In our setting, the environment is static, thus all observations are fixed.
MAPF with Uncertainty (MAPFU) asks for a plan which guarantees that mishaps, localization and sensing errors do not impact the proper execution of the plan. This problem can be solved by temporal logic [48], POMDPs [49], replanning [50][51][52], interaction regions [1,53], and belief space planning [54][55][56]. Nebel et al. [57] studied the MAPF problem with an uncertainty on the destination of the agents and lack of communication. The asynchronous movement of the agents, studied in those papers, cannot be expressed in our framework as we require the agents to follow a universal clock to execute their plan.

Perspectives. We proposed a setting for CMAPF in the imperfect case and studied the theoretical complexity of the reachability problem. The first natural question is to find classes of graphs (e.g. grid graphs) on which the reachability problem is easier to solve, as was done for MAPF in [58,59], and CMAPF in [20]. Another possible direction is to study the coverage of all vertices [20]. An alternative way to handle non-admissible graphs is to require that agents return to their starting configuration if the graph is discovered not to be admissible. We believe that such variants should be as hard as reachability. Furthermore, there are several possible generalizations that could be considered by introducing dynamic environments (instead of static), faulty sensing of agents, robustness, uncertainty, etc.