Lecture 15: Variable Impedance
Contents:
• Variable Impedance Optimisation
• Exploiting Natural Dynamics in Explosive Movements
• Periodic and rhythmic tasks
• Impedance transfer across heterogeneous systems
Lecture 14: RLSC - Prof. Sethu Vijayakumar
[Figure: arm reaching towards a target; control signals $u$, arm states $x = [q; \dot{q}]$]
Redundancy is a fundamental feature of the human motor system that arises from the fact that there are more degrees of freedom available to control a movement than are strictly necessary to achieve the task goal (Bernstein, 1967).
Redundancy at the various levels:
o Task -> End Effector Trajectory (Min. Jerk, Min. Energy, etc.)
o End Effector -> Joint Angles (Inverse Kinematics)
o Joint Angles -> Joint Torques (Inverse Dynamics)
o Joint Torques -> Joint Stiffness (Variable Impedance)
Impedance = Stiffness + Damping
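To make the stiffness-plus-damping picture concrete, here is a minimal sketch of a single joint driven only by a spring-damper torque. The inertia, gains, time step and the function names `impedance_torque` and `simulate` are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def impedance_torque(q, dq, q_eq, stiffness, damping):
    """Spring-damper torque: tau = K * (q_eq - q) - D * dq."""
    return stiffness * (q_eq - q) - damping * dq

def simulate(stiffness, damping, inertia=0.05, q_eq=1.0, dt=1e-3, steps=3000):
    """Semi-implicit Euler simulation of one joint under the impedance torque."""
    q, dq = 0.0, 0.0
    trajectory = np.empty(steps)
    for i in range(steps):
        tau = impedance_torque(q, dq, q_eq, stiffness, damping)
        dq += dt * tau / inertia   # update velocity first
        q += dt * dq               # then position, using the new velocity
        trajectory[i] = q
    return trajectory

# Same damping, two stiffness settings (hypothetical values).
stiff = simulate(stiffness=20.0, damping=2.0)
soft = simulate(stiffness=2.0, damping=2.0)
```

With these assumed numbers the stiffer joint reaches the equilibrium position much sooner, while the softer one yields more readily to disturbances; that trade-off is what variable impedance control modulates.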
This capability is crucial for safe, yet precise human-robot interaction and wearable exoskeletons.
HAL Exoskeleton, Cyberdyne Inc., Japan
KUKA 7 DOF arm with Schunk 7 DOF hand @ Univ. of Edinburgh
Variable Stiffness Actuator: $\tau = \tau(q, u)$, $K = K(q, u)$
MACCEPA: Van Ham et al., 2007
DLR Hand Arm System: Grebenstein et al., 2011
… and an optimization framework
Open Loop OC:
TASK -> Trajectory planning (min. jerk, min. time, …) -> Solve IK -> Inv. dyn. model -> Controller (feedback gains, constraints, …) -> PLANT
OFC:
TASK -> Optimise cost function (e.g. minimum energy) -> Optimal controller -> PLANT
Task & constraints are intuitively encoded
Given: start & end states, fixed time horizon T and system dynamics
$$dx = f(x, u)\,dt + F(x, u)\,d\omega$$
(how the system reacts (∆x) to forces (u)).
And assuming some cost function:
$$v^{\pi}(t, x) = E\left[ h(x(T)) + \int_t^T l(s, x(s), \pi(s, x(s)))\,ds \right]$$
where $h$ is the final cost and $l$ is the running cost.
Apply Statistical Optimization techniques to find optimal control commands
Aim: find control law π∗ that minimizes vπ (0, x0).
Analytic Methods
• Linear Quadratic Regulator (LQR)
• Linear Quadratic Gaussian (LQG)
Local Optimization Methods
• iLQG, iLDP
• Differential Dynamic Programming (DDP)
Inference based methods
• AICO, PI^2, …
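As an illustration of the analytic case, the following is a minimal sketch of a discrete-time, finite-horizon LQR solved by backward Riccati recursion. The double-integrator plant, the cost weights and the function name `finite_horizon_lqr` are assumptions chosen for the example; the lecture's plants are non-linear and would need the local methods listed above.

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, Q_final, horizon):
    """Backward Riccati recursion; returns gains K[t] such that u_t = -K[t] @ x_t."""
    P = Q_final
    gains = []
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # reorder so gains[0] is applied at the first time step

dt = 0.01
A = np.array([[1.0, dt], [0.0, 1.0]])  # double integrator, x = [position, velocity]
B = np.array([[0.0], [dt]])
Q = np.zeros((2, 2))                   # no running state cost
R = np.array([[1e-4]])                 # running control cost
Q_final = np.diag([1e3, 1e1])          # final cost: end at the origin, nearly at rest
gains = finite_horizon_lqr(A, B, Q, R, Q_final, horizon=100)

x = np.array([1.0, 0.0])               # start one unit away from the target
for K in gains:
    x = A @ x + B @ (-K @ x)           # closed-loop rollout
```

The time-varying gains play the role of the feedback law π; here the target is the origin, so the final cost penalises the remaining distance and velocity at time T.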
[Figure: OFC block diagram. Cost function (incl. target) and dynamics model -> OFC -> OFC law (L, x, u) -> feedback controller -> u + δu -> plant (robot) -> x, fed back to the controller.]
Assume knowledge of actuator dynamics
Assume knowledge of cost being optimized
Explosive Movement Tasks (e.g., throwing)
Periodic Movement Tasks and Temporal Optimization (e.g. walking, brachiation)
Learning dynamics (OFC-LD)
David Braun, Matthew Howard and Sethu Vijayakumar, Exploiting Variable Stiffness for Explosive Movement Tasks, Proc. Robotics: Science and Systems (R:SS), Los Angeles (2011)
Highly dynamic tasks, explosive movements
The two main ingredients:
• Compliant actuators with VARIABLE JOINT STIFFNESS
$$\tau = \tau(q, u), \qquad K = K(q, u)$$
MACCEPA: Van Ham et al., 2007; DLR Hand Arm System: Grebenstein et al., 2011
• Torque/stiffness optimisation
Model of the system dynamics:
$$\dot{x} = f(x, u), \qquad u \in \mathcal{U}$$
Control objective:
$$J = -d + w\,\frac{1}{2}\int_0^T \|F\|^2\,dt \rightarrow \min.$$
Optimal control solution:
$$u(t, x) = u^*(t) + L^*(t)\,(x - x^*(t))$$
iLQG: Li & Todorov 2007; DDP: Jacobson & Mayne 1970
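The locally linear feedback law and the throwing objective can be sketched as follows. This is a minimal sketch assuming a discretised nominal trajectory and a cost of the form J = -d + w * 0.5 * ∫||F||² dt; the array shapes, the sign convention and the function names `feedback_control` and `throwing_cost` are assumptions made for illustration.

```python
import numpy as np

def feedback_control(t_index, x, u_nominal, L_gains, x_nominal):
    """u(t, x) = u*(t) + L*(t) (x - x*(t)) on a discretised nominal trajectory.

    Assumed shapes: u_nominal (T, m), L_gains (T, m, n), x_nominal (T, n).
    """
    return u_nominal[t_index] + L_gains[t_index] @ (x - x_nominal[t_index])

def throwing_cost(distance, spring_force, dt, effort_weight):
    """J = -d + w * 0.5 * integral(||F||^2 dt), with the integral as a Riemann sum.

    spring_force is assumed to have shape (T, n_springs).
    """
    effort = 0.5 * np.sum(spring_force ** 2) * dt
    return -distance + effort_weight * effort

# Hypothetical two-step nominal trajectory for a system with 2 states and 1 control.
u_nominal = np.array([[1.0], [2.0]])
x_nominal = np.array([[0.0, 0.0], [1.0, 0.0]])
L_gains = np.array([[[-2.0, -1.0]], [[-2.0, -1.0]]])

# A state 0.5 ahead of the nominal position reduces the command from 2.0 to 1.0.
u_corrected = feedback_control(1, np.array([1.5, 0.0]), u_nominal, L_gains, x_nominal)
```

On the nominal trajectory the correction term vanishes and the open-loop command u*(t) is applied unchanged; the gains only act on deviations.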
2-link ball throwing - MACCEPA
stiffness modulation
speed: 20 rad/s; distance thrown: 5.2 m
Benefits of Stiffness Modulation: Quantitative evidence of improved task performance (distance thrown) with temporal stiffness modulation as opposed to fixed (optimal) stiffness control
Exploiting Natural Dynamics:
a) optimization suggests power amplification through pumping energy
b) benefit of passive stiffness vs. active stiffness control
Behaviour Optimization: Simultaneous stiffness and torque optimization of a VIA that reflects strategies used in human explosive movement tasks:
a) performance-effort trade-off: $J = -d + w\,\frac{1}{2}\int_0^T \|F\|^2\,dt$
b) qualitatively similar stiffness pattern
c) strategy change in task execution
Scalability to more complex hardware
Aim: Modelling and control with emphasis on physically realizable optimal impedance control with more complex state and actuation constraints
Goal:
a) demonstrate the applicability of the optimal variable stiffness control methodology to real-world problems,
b) provide experimental evidence that supports the numerical predictions obtained by simulations,
c) illustrate scalability of our approach
Ball throwing with the DLR HASy
DLR HASy: State-of-the-art research platform for variable stiffness control.
• Restricted to a 2-DoF system (shoulder and elbow rotation)
• Max motor-side speed: 8 rad/s
• Max torque: 67 Nm
• Stiffness range: 50 – 800 Nm/rad
• Speed for stiffness change: 0.33 s/range
DLR-FSJ
[Figure: schematic representation of the DLR-FSJ]
Motor-side positions: $q_2 = [\theta, \sigma]^T \in \mathbb{R}^4$
Constraint: $\Phi_{\min}(\sigma) \le \Phi \le \Phi_{\max}(\sigma)$
Dealing with Complex Constraints
$$M_{11}(q_1)\,\ddot{q}_1 + C_{11}(q_1, \dot{q}_1)\,\dot{q}_1 + G_1(q_1) = \tau_1(q_1, q_2)$$
$$\ddot{q}_2 + 2\beta\,\dot{q}_2 + \kappa^2 q_2 = \kappa^2 u$$
Incorporating the constraints:
1. Range constraints: $\Phi(q_1, q_2) \in [\Phi_{\min}(q_2), \Phi_{\max}(q_2)]$, $u \in [u_{\min}, u_{\max}]$
2. Rate/effort limitations: $\kappa \in [0, \kappa_{\max}]$
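The effect of the command range and the motor-side dynamics can be sketched in a few lines. This is a minimal sketch assuming the motor-side dynamics take the second-order form q2'' + 2β q2' + κ² q2 = κ² u and that the range constraint on u is enforced by simple clipping; the numerical values and the function name `simulate_motor` are hypothetical.

```python
import numpy as np

def simulate_motor(u_command, u_min, u_max, kappa, beta, dt=1e-3, steps=5000):
    """Integrate q2'' + 2*beta*q2' + kappa^2*q2 = kappa^2*u with u clipped to its range."""
    u = np.clip(u_command, u_min, u_max)   # range constraint on the command
    q2, dq2 = 0.0, 0.0
    for _ in range(steps):
        ddq2 = kappa ** 2 * (u - q2) - 2.0 * beta * dq2
        dq2 += dt * ddq2                   # semi-implicit Euler step
        q2 += dt * dq2
    return q2

# A command outside [-1, 1] saturates: the motor settles at the limit, not at 2.0.
saturated = simulate_motor(2.0, -1.0, 1.0, kappa=10.0, beta=10.0)
feasible = simulate_motor(0.5, -1.0, 1.0, kappa=10.0, beta=10.0)
```

In this form κ sets how fast the motor position follows the command, which is why a bound on κ acts as a rate limitation.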
DLR – FSJ: optimisation with state constraints
[Figure: variable stiffness vs. fixed stiffness; spring length vs. stiffness modulation]
[Figure: variable stiffness vs. fixed stiffness; spring length and stiffness modulation, plotted against time]
Implementation on the DLR HASy
Ball throwing with DLR HASy; motor velocity limited to: 2 rad/s, 3 rad/s
Jun Nakanishi, Konrad Rawlik and Sethu Vijayakumar, Stiffness and Temporal Optimization in Periodic Movements: An Optimal Control Approach , Proc. IEEE Intl Conf on Intelligent Robots and Systems (IROS ‘11) , San Francisco (2011).
Periodic Movement Control: Issues
Representation
• what is a suitable representation of periodic movement (trajectories, goal)?
Choice of cost function
• how to design a cost function for periodic movement?
Exploitation of natural dynamics
• how to exploit resonance for energy efficient control?
• optimize frequency (temporal aspect)
• stiffness tuning
Periodic Movement Representation
Dynamical system with Fourier basis functions, parameterised by the Fourier coefficients
• scaling of frequency, amplitude and offset is possible
• efficient approximation method to compute Fourier coefficients [Kuhl and Giardina 1982]
• orthogonality properties of basis functions
• cf. Fourier series expansion
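A truncated Fourier series is one way to see how such a representation supports scaling of frequency, amplitude and offset. The sketch below estimates the coefficients by plain discrete projection onto the basis functions, which relies on their orthogonality; it is not the Kuhl and Giardina approximation cited above, and the function names and test signal are assumptions.

```python
import numpy as np

def fourier_coefficients(samples, n_harmonics):
    """Estimate a0, a_k, b_k from uniform samples covering exactly one period."""
    n = len(samples)
    phase = 2.0 * np.pi * np.arange(n) / n
    harmonics = range(1, n_harmonics + 1)
    a0 = samples.mean()
    a = np.array([2.0 * np.mean(samples * np.cos(k * phase)) for k in harmonics])
    b = np.array([2.0 * np.mean(samples * np.sin(k * phase)) for k in harmonics])
    return a0, a, b

def evaluate(a0, a, b, phase, amplitude=1.0, offset=0.0):
    """Evaluate the series at the given phases; phase = omega * t sets the frequency."""
    k = np.arange(1, len(a) + 1)[:, None]
    series = a @ np.cos(k * phase) + b @ np.sin(k * phase)
    return offset + amplitude * (a0 + series)

# Hypothetical periodic reference: constant term plus first and second harmonics.
phase = 2.0 * np.pi * np.arange(200) / 200
samples = 0.3 + np.sin(phase) + 0.5 * np.cos(2.0 * phase)
a0, a, b = fourier_coefficients(samples, n_harmonics=3)
reconstruction = evaluate(a0, a, b, phase)
```

Changing `amplitude`, `offset` or the rate at which `phase` advances rescales the movement without re-estimating the coefficients.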
Cost Function for Periodic Movements
Optimization criterion
Terminal cost
• ensures periodicity of the trajectory
Running cost
• tracking performance and control cost
Another View of Cost Function
• Running cost: tracking performance and control cost
• Augmented plant dynamics with Fourier series based DMPs
• Reformulated running cost
• Find the control and parameters such that the plant dynamics (1) behave like (2) and (3) while minimising the control cost
Temporal Optimization
How do we find the right temporal duration in which to optimize a movement?
Solutions:
• Fix temporal parameters ... not optimal
• Time stationary cost ... cannot deal with sequential tasks, e.g. via points
• Chain ‘first exit time’ controllers ... linear duration cost, not optimal
• Canonical Time Formulation
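A toy problem shows why fixing the duration in advance is generally not optimal. The sketch below assumes a single integrator x' = u that must cover a given distance, with cost ∫u² dt + c·T; for a fixed duration T the minimum-effort control is the constant u = d/T, so the total cost is d²/T + c·T. The weights and the function name `reach_cost` are illustrative assumptions, not the lecture's formulation.

```python
import numpy as np

def reach_cost(duration, distance=1.0, time_weight=4.0):
    """Effort plus time cost for covering `distance` in `duration` at constant velocity."""
    u = distance / duration                   # minimum-effort control for x' = u
    return u ** 2 * duration + time_weight * duration

durations = np.linspace(0.1, 2.0, 191)        # candidate durations, 0.01 s apart
costs = np.array([reach_cost(T) for T in durations])
best_T = durations[np.argmin(costs)]          # analytically d / sqrt(c) = 0.5 here
```

Any duration other than the minimiser pays either extra effort (too short) or extra time cost (too long), which motivates treating duration as an optimisation variable.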
Canonical Time Formulation
Dynamics and cost are first written in real time (n.b. the time variable represents real time).
Introduce a change of time, after which dynamics and cost are expressed in canonical time (n.b. the time variable now represents canonical time).
Konrad Rawlik, Marc Toussaint and Sethu Vijayakumar, An Approximate Inference Approach to Temporal Optimization in Optimal Control, Proc. Advances in Neural Information Processing Systems (NIPS '10), Vancouver, Canada (2010).
AICO-T algorithm
• Use approximate inference methods
• EM algorithm
– E-Step: solve OC problem with fixed β
– M-Step: optimise β with fixed controls
Spatiotemporal Optimization • 2 DoF arm, reaching task
• 2 DoF arm, via point task
Optimization of Impedance Profiles
• Plant dynamics
• Reference trajectory
• Optimization criterion
• Optimal feedback controller
• EM-like iterative procedure to obtain … and …
Temporal optimization: time scaling
• optimize … to yield optimal … or …
Temporal Optimization in Brachiation
• Optimize the joint torque and movement duration
• Cost function: gripper position
• Time-scaling: canonical time
• Find the optimal control using iLQG and update the time scaling in turn until convergence [Rawlik, Toussaint and Vijayakumar, 2010]
Temporal Optimization of Swing Locomotion
• vary T = 1.3–1.55 s and compare required joint torque
• significant reduction of joint torque with …
Optimized Brachiating Manoeuvre
Swing-up and locomotion
Variable Impedance Biped (BLUE: Bipedal Locomotion @ UoE)
Walking is a bouncing gait
• Energy from the swing is stored during stance
• Three primary uses of springs
– “pogo stick” principle
– Return springs
– Foot pad shock absorption
• Co-contraction lets us change the effective stiffness of joints
– Variable stiffness: flexible gait, more behaviours, more efficient
Umberger 2007
BLUE: Bipedal Locomotion @ University of Edinburgh
• Sagittal plane biped capable of independently varying joint position, stiffness and damping
• ¾ scale biped
– Hip rotation height 700 mm
• A platform to explore the effect of varying stiffness and damping on locomotion
– Efficient walking at different speeds
– Able to stand as well as walk efficiently
– Different terrain, disturbances etc.
miniBLUE
• ½ scale biped
• printed
– Lightweight
– Rapid manufacture
• Large stiffness range
– Down to zero stiffness
• Non-backdrivable drive motors
– Can store energy in springs without requiring opposing motor torque
BLUE: Preliminary Results
Approximate non-linear functions with a combination of multiple weighted linear models
$$w_{ii} = \exp\!\left(-\tfrac{1}{2}(x_i - x_q)^T D_k (x_i - x_q)\right)$$
$$\beta_k = (X^T W_k X)^{-1} X^T W_k Y$$
$$\hat{y}_k = x_q^T \beta_k$$
$$\hat{y} = \sum_k w_k \hat{y}_k \Big/ \sum_k w_k$$
Solve this problem for high dimensional space: LWPR
Sethu Vijayakumar, Aaron D'Souza and Stefan Schaal, Online Learning in High Dimensions, Neural Computation, vol. 17, pp. 2602-34 (2005)
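The blending of weighted linear models can be sketched with plain locally weighted regression. This is not LWPR itself: there are no projections and no incremental updates, the receptive-field centres are fixed by hand, one shared distance metric D is used for all models, and a small ridge term is added for numerical stability. All of these, and the function name `lwr_predict`, are assumptions made for illustration.

```python
import numpy as np

def lwr_predict(X, Y, centres, D, x_query, ridge=1e-8):
    """Blend local linear models, each fitted by weighted least squares around a centre."""
    Xb = np.hstack([X, np.ones((len(X), 1))])        # append a bias column
    xq = np.append(x_query, 1.0)
    numerator, denominator = 0.0, 0.0
    for c in centres:
        diff = X - c
        w = np.exp(-0.5 * np.einsum("ij,jk,ik->i", diff, D, diff))  # training weights
        W = np.diag(w)
        beta = np.linalg.solve(Xb.T @ W @ Xb + ridge * np.eye(Xb.shape[1]),
                               Xb.T @ W @ Y)          # beta_k = (X^T W X)^-1 X^T W Y
        dq = x_query - c
        wk = np.exp(-0.5 * dq @ D @ dq)               # activation of model k at the query
        numerator += wk * (xq @ beta)                 # w_k * y_hat_k
        denominator += wk
    return numerator / denominator

# Hypothetical 1-D data and three hand-placed receptive fields.
X = np.linspace(-2.0, 2.0, 81)[:, None]
Y = np.sin(X[:, 0])
centres = np.array([[-1.0], [0.0], [1.0]])
D = np.array([[4.0]])
prediction = lwr_predict(X, Y, centres, D, np.array([0.3]))
```

Each local model is only trusted near its centre, and the normalised weights interpolate between neighbouring models.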
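Locally Weighted Projection Regression (LWPR) for dynamics learning (Vijayakumar et al., 2005).
Input $[q, \dot{q}, u]$; learned output $\tilde{f}(q, \dot{q}, u) \approx \ddot{q}$ together with $\Phi(q, \dot{q}, u)$.
The dynamics
$$dx = f(x, u)\,dt + F(x, u)\,d\omega$$
are replaced by the learned model
$$dx = \tilde{f}(x, u)\,dt + \Phi(x, u)\,d\omega$$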
• OFC-LD uses LWPR learned dynamics for optimization (Mitrovic et al., 2010a)
• Key ingredient: Ability to learn both the dynamics and the associated uncertainty (Mitrovic et al., 2010b)
Djordje Mitrovic, Stefan Klanke and Sethu Vijayakumar, Adaptive Optimal Feedback Control with Learned Internal Dynamics Models, From Motor Learning to Interaction Learning in Robots, SCI 264, pp. 65-84, Springer-Verlag (2010).
Reproduces the “trial-to-trial” variability in the uncontrolled manifold, i.e., exhibits the minimum intervention principle that is characteristic of human motor control.
[Figure: KUKA LWR, Simulink model; minimum intervention principle]
High accuracy while remaining compliant and energy efficient.
Djordje Mitrovic, Stefan Klanke and Sethu Vijayakumar, Learning Impedance Control of Antagonistic Systems based on Stochastic Optimisation Principles, International Journal of Robotics Research, Vol. 30, No. 5, pp. 556-573 (2011).
Constant Unidirectional Force Field
Can predict the “ideal observer” adaptation behaviour under complex force fields due to the ability to work with adaptive dynamics
Velocity-dependent Divergent Force Field
Cost Function:
Djordje Mitrovic, Stefan Klanke, Rieko Osu, Mitsuo Kawato and Sethu Vijayakumar, A Computational Model of Limb Impedance Control based on Principles of Internal Model Uncertainty, PLoS ONE, Vol. 5, No. 10 (2010).
OFC-LD is computationally more efficient than iLQG, because we can compute the required partial derivatives analytically from the learned model
Optimized co-contraction profiles are quite different from how humans use their antagonistic musculoskeletal system. So what is missing?
[Figure: muscle plots for 2 joints and 6 antagonistic muscles; minimal co-contraction remains. Constant force field: online adaptation. Overshoot: online re-anneal.]
Djordje Mitrovic, Stefan Klanke, Sethu Vijayakumar, Adaptive Optimal Control for Redundantly Actuated Arms, Proc. Tenth International Conference on the Simulation of Adaptive Behavior (SAB '08), Osaka, Japan (2008)
Focus: Signal Dependent Noise (SDN)
$$\sigma(u) = \sigma_{\text{isotonic}}\,|u_1 - u_2|^{n} + \sigma_{\text{isometric}}\,|u_1 + u_2|^{m}, \qquad \xi \sim \mathcal{N}(0, I_2)$$
See: Osu et al., 2004; Gribble et al., 2003
Stochastic OFC-LD
Deterministic OFC-LD
Assume knowledge of actuator dynamics
Assume knowledge of cost to be optimized
Routes to Impedance Behaviour Imitation
[Figure: hardware platforms: Edinburgh SEA, MACCEPA, KUKA Lightweight Arm LWR-III, DLR VIA, Shadow Hand, IIT actuator]
‘Ideal’ VSA:
• $u = (q_0, k)^T$
• stiffness (k), eq. pos. (q0) directly controllable
Edinburgh SEA:
• $u = (\,\cdot\,, \,\cdot\,)^T$
• biomorphic, antagonistic design
• coupled stiffness and eq. pos.
MACCEPA:
• $u = (m_1, m_2)^T$
• (nearly) de-coupled stiffness and eq. pos. control
Direct Transfer: Feed EMG directly to motors
Impedance Transfer: Pre-process EMG, track stiffness and equilibrium position
Matthew Howard, David Braun and Sethu Vijayakumar, Constraint-based Equilibrium and Stiffness Control of Variable Stiffness Actuators, Proc. IEEE International Conference on Robotics and Automation (ICRA 2011), Shanghai (2011).
Transfer ball hitting task across different VIAs:
• Very different command sequences due to different actuation
• Optimal impedance control strategy very similar across plants
• Direct imitation: lower velocity at time of impact, less powerful hit
• Apprenticeship learning: movement is optimised to robot dynamics, ball is hit further
M. Howard, D. Mitrovic & S. Vijayakumar, Transferring Impedance Control Strategies Between Heterogeneous Systems via Apprenticeship Learning, Proc. IEEE-RAS International Conference on Humanoid Robots, Nashville, TN, USA (2010).
Model-based transfer of human behaviour has relied on a model of the demonstrator’s dynamics. In most practical settings, such models fail to capture:
• the complex, non-linear dynamics of the human musculoskeletal system
• inconsistencies between modelling assumptions and the configuration and placement of measurement apparatus
Takeshi Mori, Matthew Howard and Sethu Vijayakumar, Model Free Apprenticeship Learning for Transfer of Human Impedance Behaviour, Proc. 11th IEEE-RAS International Conference on Humanoid Robots, Bled, Slovenia (2011).
Original: Monte Carlo method and model-based method on MWAL
Requires: (human) dynamics model
Model-free: LSTDf and LSPIf combined on MWAL
Requires: exploratory data instead of using dynamics model
Optimization methods
• Need to exploit plant (actuator) dynamics
▪ Direct policy methods allow this
• Are effective when one has a good estimate of the cost functions that need to be optimized
Imitation and Transfer methods
• Should not naively mimic impedance profiles across heterogeneous systems
• Transfer at the level of objectives is most appropriate