Jointly Learning Predicates and Actions Enables Zero-Shot Skill Composition

Brown University, Robotics and AI Institute (RAI)
“Learned robot skills that predict both actions and their symbolic effects, enabling skill composition via symbolic planning.”

Abstract

Learning from Demonstration (LfD) enables robots to learn complex behaviors from expert examples, yet existing approaches often fail to generalize to new compositions of known skills without retraining. Modern generative policies model distributions over action trajectories alone and are therefore unable to reason about the symbolic outcomes required for robust composition. We propose that skills should jointly model action trajectories and the symbolic outcomes they induce. To this end, we introduce Predicate-Action Skills (PACTS), a class of closed-loop visuomotor policies that model skills as a joint generative process over action and predicate-belief trajectories, producing coherent action–outcome rollouts within a single model. Jointly generating actions and predicates enables PACTS to learn internal representations that improve both action generation and predicate classification. Furthermore, we demonstrate zero-shot composition of learned skills via planning by leveraging online predicate predictions from PACTS as a symbolic interface for sequencing and monitoring execution.

PACTS teaser image.

Predicate-Action Skills. Conditioned on current observations $\mathbf{o}$, we model skills as a joint generative process over an action trajectory $\mathbf{x}$ and a predicate-belief trajectory $\mathbf{z}$ by learning the coupled distribution $p(\mathbf{x},\mathbf{z}\mid \mathbf{o})$. Starting from noise $(\mathbf{x}_T,\mathbf{z}_T)$, our model iteratively refines both modalities to produce temporally coherent action–outcome rollouts $(\mathbf{x}_0,\mathbf{z}_0)$. The resulting predicate-belief trajectory $\mathbf{z}_0$ provides an online symbolic interface for monitoring skill execution and for planning-based skill composition using off-the-shelf planners.
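The iterative refinement from noise to a joint rollout can be sketched as a DDPM-style reverse loop over a stacked action/predicate array. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear noise schedule, the horizon and dimensions, the sigmoid used to turn the predicate channels into beliefs, and the `toy_denoiser` stand-in for the learned network are all hypothetical.

```python
import numpy as np

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumption; the source does not specify one)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_denoiser(xz, obs, t):
    """Hypothetical stand-in for the learned network.

    A trained model would predict the noise in `xz` given the observation
    `obs` and step `t`; this stub ignores both and predicts zero noise.
    """
    return np.zeros_like(xz)

def sample_joint(obs, denoiser, horizon=8, action_dim=2,
                 num_predicates=3, num_steps=50, seed=0):
    """Refine (x_T, z_T) into (x_0, z_0) with one shared reverse process."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)

    # Start from noise; actions and predicate logits share one array so a
    # single denoiser call updates both modalities together.
    xz = rng.standard_normal((horizon, action_dim + num_predicates))

    for t in reversed(range(num_steps)):
        eps_hat = denoiser(xz, obs, t)
        # Standard DDPM posterior mean for a noise-prediction model.
        mean = (xz - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(xz.shape) if t > 0 else np.zeros_like(xz)
        xz = mean + np.sqrt(betas[t]) * noise

    actions = xz[:, :action_dim]
    # Assumption: predicate channels are logits squashed into [0, 1] beliefs.
    beliefs = 1.0 / (1.0 + np.exp(-xz[:, action_dim:]))
    return actions, beliefs
```

Calling `sample_joint(obs, toy_denoiser)` returns an action array of shape `(horizon, action_dim)` and a belief array of shape `(horizon, num_predicates)`; with a trained denoiser in place of the stub, the two would be refined jointly rather than independently.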

Explainer Video

Online Action and Predicate Prediction
(Simulation Environments)

We evaluate PACTS in simulated manipulation environments where the policy executes actions while simultaneously predicting predicate beliefs online. Each video shows the learned policy rollout together with the predicted symbolic conditions and the simulator ground truth. These rollouts illustrate the alignment between the generated predicate-belief trace and the outcomes induced by the robot’s own actions.


PushBarrier

Kitchen

Coffee Preparation

From Monolithic Demonstrations to Composable Skills
(Real-world Environment)

To train PACTS in the real world, we built an annotation and skill-segmentation toolkit that converts long-horizon demonstrations into skill-centric datasets. Users sparsely annotate keyframes where predicates change, such as when a cube enters the container. The toolkit propagates those labels into dense per-timestep predicate traces and aligns them with the action stream. The result is a dataset of paired action and predicate label trajectories for each skill, ready for joint action–outcome learning.
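The label-propagation step can be illustrated with a simple forward fill. The sketch below is not the toolkit's actual code: the function names, the dictionary-based annotation format, and the rule that a predicate keeps its value until the next annotated keyframe are assumptions made for illustration.

```python
from typing import Any, Dict, List, Sequence, Tuple

def densify_predicates(keyframes: Dict[int, Dict[str, bool]],
                       num_steps: int,
                       initial: Dict[str, bool]) -> List[Dict[str, bool]]:
    """Forward-fill sparse keyframe annotations into per-timestep labels.

    `keyframes` maps a timestep to the predicates that change at that step.
    Assumption: a predicate holds its value until the next annotation.
    """
    state = dict(initial)
    dense = []
    for t in range(num_steps):
        if t in keyframes:
            state.update(keyframes[t])
        dense.append(dict(state))  # copy so later updates do not alias
    return dense

def pair_with_actions(actions: Sequence[Any],
                      dense: List[Dict[str, bool]]) -> List[Tuple[Any, Dict[str, bool]]]:
    """Align the action stream with the dense predicate trace, step by step."""
    if len(actions) != len(dense):
        raise ValueError("action stream and predicate trace differ in length")
    return list(zip(actions, dense))

# Hypothetical example: the cube enters the container at timestep 3.
dense_labels = densify_predicates(
    keyframes={3: {"cube_in_container": True}},
    num_steps=5,
    initial={"cube_in_container": False},
)
paired = pair_with_actions(["a0", "a1", "a2", "a3", "a4"], dense_labels)
```

Here a single keyframe annotation is enough to label all five timesteps, which is what keeps the annotation effort sparse.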


1. Long-horizon Task Demonstration

2. Keyframe-level Predicate Annotation

3. Generated Skill-Centric Dataset

Planning with Learned Predicate-Action Skills
(Real-world Environment)

PACTS exposes learned skills through predicate beliefs, so an off-the-shelf symbolic planner can sequence them to reach new goals. Given a symbolic goal, the planner selects a sequence of learned skills. During execution, PACTS predicts whether each skill’s expected effects have been achieved, which lets the system monitor progress, advance to the next skill, or replan when needed.
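The plan, execute, monitor, and replan cycle can be sketched as follows. This is an illustrative sketch only: the breadth-first search stands in for an off-the-shelf planner, the `Skill` structure with STRIPS-style preconditions and effects is an assumption, and `run_skill` is a hypothetical interface that returns the set of predicates believed true after a skill runs (for example, beliefs thresholded into booleans).

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Optional, Set, Tuple

@dataclass(frozen=True)
class Skill:
    """Hypothetical symbolic description of a learned skill."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    del_effects: FrozenSet[str] = frozenset()

def plan(state: Iterable[str], goal: FrozenSet[str],
         skills: List[Skill]) -> Optional[List[Skill]]:
    """Breadth-first search over symbolic states (planner stand-in)."""
    start = frozenset(state)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        current, path = queue.popleft()
        if goal <= current:
            return path
        for skill in skills:
            if skill.preconditions <= current:
                nxt = (current - skill.del_effects) | skill.add_effects
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [skill]))
    return None

def execute(state: Iterable[str], goal: Iterable[str], skills: List[Skill],
            run_skill: Callable[[Skill, FrozenSet[str]], Set[str]],
            max_replans: int = 3) -> Tuple[bool, List[str]]:
    """Run planned skills, checking predicted effects after each one."""
    state, goal = frozenset(state), frozenset(goal)
    executed: List[str] = []
    for _ in range(max_replans + 1):
        steps = plan(state, goal, skills)
        if steps is None:
            return False, executed
        for skill in steps:
            state = frozenset(run_skill(skill, state))
            executed.append(skill.name)
            if not skill.add_effects <= state:
                break  # expected effects not observed: replan from here
        if goal <= state:
            return True, executed
    return False, executed
```

In this sketch a skill whose expected effects do not appear in the predicted state triggers a replan from the currently believed state, rather than a blind continuation of the original plan.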


In this real-world cube-packing task, demonstrations contain the full behavior of packing all cubes. PACTS learns individual packing skills from segmented demonstrations and recomposes them at test time according to symbolic goals. The same learned skill library can therefore serve new symbolic goals without retraining.