ADAP

Abstract

Embodied robots can already handle many real-world manipulation tasks. However, some real-world tasks (e.g., shooting a basketball into a hoop) are highly agile and demand high execution precision, which poses additional challenges for methods designed primarily for quasi-static manipulation. As a result, such tasks typically call for extra effort in costly data collection, laborious reward design, or complex motion planning. For humans, however, these tasks are far less challenging. For example, a novice basketball player typically needs only about 10 attempts to make their first successful shot, by roughly imitating a motion prior and then iteratively adjusting the motion based on past outcomes. Inspired by this human learning paradigm, we propose the Prior Reinforce (P.R.) algorithm, a simple and scalable approach that iteratively refines its action plan through a few real-world trials within a learned prior motion pattern until it reaches a specified goal. Experiments demonstrate that Prior Reinforce can learn and accomplish a wide range of goal-conditioned agile dynamic tasks directly in the real world with human-level precision and efficiency, for example throwing a basketball into the hoop in fewer than 10 trials.

Tasks

We select three real-world goal-conditioned agile dynamic tasks, Basketball Shot, Curling, and Fishing Rod Swinging, to cover typical agile dynamic behaviors commonly seen in daily life: projectile motion, sliding with friction, and the movement of soft deformable objects.

Prior Reinforce Framework

Overview of our method, Prior Reinforce, which consists of two stages: Motion Pattern Learning and Iterative Rollout & Adaption. In the first stage, the agent learns the motion pattern and the rough correlation between action and outcome from the few provided action demonstrations (priors). In the second stage, the agent iteratively refines its action plan based on real-world trials until it reaches a newly specified goal.
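The page does not specify the model class, the update rule, or how actions are parameterized, so the following is only a minimal sketch of the two-stage idea under stated assumptions, not the authors' implementation. It assumes a scalar action (e.g., a launch speed), a linear action-to-outcome prior fitted by least squares to the demonstrations (stage 1), and a fixed-slope ("chord method") correction after each trial that reuses the prior's slope as the local sensitivity (stage 2). The names `fit_linear_prior`, `refine_to_goal`, and `simulated_throw` are hypothetical, and `simulated_throw` is a toy stand-in for a real-world rollout.

```python
"""Toy sketch of a two-stage "learn a prior, then refine by trials" loop (hypothetical)."""

from typing import Callable, List, Tuple


def fit_linear_prior(demos: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Stage 1 (assumed form): least-squares fit of outcome ~ slope * action + bias."""
    n = len(demos)
    mean_a = sum(a for a, _ in demos) / n
    mean_o = sum(o for _, o in demos) / n
    cov = sum((a - mean_a) * (o - mean_o) for a, o in demos)
    var = sum((a - mean_a) ** 2 for a, _ in demos)
    slope = cov / var
    bias = mean_o - slope * mean_a
    return slope, bias


def refine_to_goal(
    trial: Callable[[float], float],
    slope: float,
    bias: float,
    goal: float,
    tol: float = 0.05,
    max_trials: int = 10,
) -> List[Tuple[float, float]]:
    """Stage 2 (assumed form): roll out, observe the outcome, correct the action."""
    action = (goal - bias) / slope  # invert the rough prior to get a first guess
    history: List[Tuple[float, float]] = []
    for _ in range(max_trials):
        outcome = trial(action)  # one rollout (a real-world trial in practice)
        history.append((action, outcome))
        error = goal - outcome
        if abs(error) <= tol:
            break  # close enough to the goal: stop trying
        # Chord-method update: keep the prior's slope as the action sensitivity.
        action += error / slope
    return history


def simulated_throw(speed: float) -> float:
    """Hypothetical system: 45-degree launch range v^2 / g, reduced by a 10% loss."""
    return 0.9 * speed ** 2 / 9.81


if __name__ == "__main__":
    # Demonstrations are sampled from the toy system at a few speeds.
    demos = [(v, simulated_throw(v)) for v in (4.0, 5.0, 6.0)]
    slope, bias = fit_linear_prior(demos)
    history = refine_to_goal(simulated_throw, slope, bias, goal=5.0)
    for i, (a, o) in enumerate(history, 1):
        print(f"trial {i}: action={a:.3f}, outcome={o:.3f}")
```

Because the toy system is quadratic while the prior is linear, the first guess overshoots the 5.0 m goal, and the corrections then shrink the error on each trial; in this setup the loop stops after five simulated trials. A secant update that re-estimates the slope from the last two trials would be a natural alternative to the fixed-slope step.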

Real-world Results

We evaluate ADAP on three agile dynamic tasks: Basketball Shot (along with two variants with modified hardware), Curling, and Fishing Rod Swinging. On all of these tasks, ADAP reaches a new, unseen goal in fewer than 10 real-world trials in total, achieving human-level performance and efficiency. Please see our video for experiment visualizations.

Citation

Please cite us if you find our work inspiring or helpful :) Thank you!

Prior Reinforce: Mastering Agile Tasks with Limited Trials
