Prior Reinforce: Goal-Conditioned Dynamic Manipulation with Limited Trials

IROS 2026
1Tsinghua University, 2Shanghai Qi Zhi Institute, 3Spirit AI

Abstract

Embodied robots have achieved strong performance in many real-world manipulation tasks, yet agile dynamic manipulation remains challenging due to high sensitivity to motion parameters and sparse outcome-level feedback. Tasks such as shooting a basketball into a hoop require precise control of fast open-loop motions, where small trajectory variations can lead to large outcome deviations, making data-efficient adaptation difficult for existing methods that rely on large-scale interaction, reward engineering, or accurate dynamic modeling. We propose Prior Reinforce (P.R.), a simple and practical framework for goal-conditioned dynamic manipulation. The method first learns a structured motion manifold from a small set of demonstrations using a conditional diffusion model, and then adapts motions toward new goals through feedback-driven optimization in a low-dimensional condition space. By separating motion generation from outcome-driven adaptation, the framework enables efficient refinement using only a small number of real-world trials under noisy perception. Experiments on multiple real-world dynamic manipulation tasks demonstrate that P.R. reliably achieves new goals within as few as ten total trials while remaining robust to perception noise and hardware uncertainty, suggesting a practical approach for low-trial real-world robot adaptation.

Tasks

We select three real-world, goal-conditioned agile dynamic tasks: Basketball Shot, Curling, and Fishing Rod Swinging. Together they cover several typical agile dynamic behaviors seen in daily life: projectile motion, sliding with friction, and the movement of soft deformable objects.

Prior Reinforce Framework

Overview of Prior Reinforce, which consists of two stages: Motion Pattern Learning and Iterative Rollout & Adaptation. In the first stage, the agent learns the motion pattern and a rough correlation between action and outcome from a small set of demonstration priors. In the second stage, the agent iteratively refines its action plan based on real-world trials until it reaches a newly specified goal.
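To make the second stage concrete, here is a minimal sketch of outcome-driven adaptation in a one-dimensional condition space. It is a toy under stated assumptions, not the authors' implementation: `generate_motion` is a hypothetical linear stand-in for the learned conditional diffusion model, `rollout` replaces the real robot with an ideal projectile plus assumed Gaussian perception noise, and the fixed-gain proportional update is a swapped-in rule, since this page does not specify the optimizer used in P.R.

```python
import math
import random

G = 9.81  # gravitational acceleration, m/s^2


def generate_motion(condition: float) -> float:
    """Hypothetical stand-in for the learned motion generator.

    Maps a scalar condition to a release speed in m/s. In the source this
    role is played by a conditional diffusion model; this linear map is a toy.
    """
    return 2.0 + 3.0 * condition


def rollout(speed: float, rng: random.Random, angle_deg: float = 45.0) -> float:
    """Toy trial: landing distance of an ideal projectile, observed with noise."""
    angle = math.radians(angle_deg)
    distance = speed ** 2 * math.sin(2 * angle) / G
    return distance + rng.gauss(0.0, 0.02)  # assumed perception noise (metres)


def adapt(goal: float, condition: float, trials: int = 10,
          gain: float = 0.25, seed: int = 0):
    """Refine the condition from outcome feedback over a fixed trial budget.

    Uses a fixed-gain proportional update (an assumption, not the paper's
    optimizer). Returns the final condition and the (condition, outcome) log.
    """
    rng = random.Random(seed)
    history = []
    for _ in range(trials):
        outcome = rollout(generate_motion(condition), rng)
        history.append((condition, outcome))
        error = outcome - goal      # signed miss distance
        condition -= gain * error   # overshoot lowers the condition, undershoot raises it
    return condition, history


if __name__ == "__main__":
    final_condition, log = adapt(goal=3.0, condition=0.5)
    for trial, (c, d) in enumerate(log, start=1):
        print(f"trial {trial:2d}: condition={c:.3f}, observed distance={d:.3f} m")
```

Only the scalar condition is updated between trials, while the motion generator stays fixed. This mirrors the separation between motion generation and outcome-driven adaptation described above, and it is why the search can fit in a small trial budget in this toy setting.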

Real-world Results

We evaluate Prior Reinforce on three agile dynamic tasks: Basketball Shot (along with two modified versions with changed hardware), Curling, and Fishing Rod Swinging, with the perception signal provided either by a human observer or by a vision-language model (VLM). The results show that in all cases we tested, P.R. reaches a new, unseen goal in as few as 10 real-world trials in total, achieving human-level performance and efficiency. Please see our video for experiment visualizations.

Citation

If you find our work inspiring or helpful, please consider citing us. Thank you!
              
@misc{hu2026priorreinforcegoalconditioneddynamic,
  title={Prior Reinforce: Goal-Conditioned Dynamic Manipulation with Limited Trials},
  author={Yihang Hu and Pingyue Sheng and Yuyang Liu and Shengjie Wang and Yang Gao},
  year={2026},
  eprint={2505.21916},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2505.21916},
}