Lecture 15: Variable Impedance

Author: Randell Cameron

Contents:
• Variable Impedance Optimisation Exploiting Natural Dynamics in Explosive movements
• Periodic and rhythmic tasks
• Impedance transfer across heterogeneous systems

Lecture 14: RLSC - Prof. Sethu Vijayakumar

[Figure: arm reaching towards a target; labels: control signals u, arm states $x = [q; \dot{q}]$, target]

Redundancy is a fundamental feature of the human motor system that arises from the fact that there are more degrees of freedom available to control a movement than are strictly necessary to achieve the task goal (Bernstein, 1967).

Redundancy at the various levels:
o Task -> End Effector Trajectory (Min. Jerk, Min. Energy etc.)
o End Effector -> Joint Angles (Inverse Kinematics)
o Joint Angles -> Joint Torques (Inverse Dynamics)
o Joint Torques -> Joint Stiffness (Variable Impedance)

Impedance: Stiffness + Damping

This capability is crucial for safe, yet precise human-robot interactions and wearable exoskeletons.

[Figure: HAL Exoskeleton, Cyberdyne Inc., Japan]

[Figure: KUKA 7 DOF arm with Schunk 7 DOF hand @ Univ. of Edinburgh]



Variable Stiffness Actuator: $\tau = \tau(q, u)$, $K = K(q, u)$

MACCEPA: Van Ham et al., 2007

DLR Hand Arm System: Grebenstein et al., 2011



… and an optimization framework

Open loop: TASK → Trajectory planning (min. jerk, min. time, …) → Solve IK → Inv. dyn. model → Controller (feedback gains, constraints, …) → PLANT

OC / OFC: TASK → Optimise cost function (e.g. minimum energy) → Optimal controller → PLANT

Task & constraints are intuitively encoded.

Given:
• Start & end states,
• fixed-time horizon T and
• system dynamics $dx = f(x, u)\,dt + F(x, u)\,d\omega$, i.e. how the system reacts (∆x) to forces (u)

And assuming some cost function:

$v^{\pi}(t, x) = E\left[\, h(x(T)) + \int_t^T l(\tau, x(\tau), \pi(\tau, x(\tau)))\,d\tau \right]$

where $h$ is the final cost and $l$ is the running cost.

Apply Statistical Optimization techniques to find optimal control commands

Aim: find control law $\pi^*$ that minimizes $v^{\pi}(0, x_0)$.
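As a rough illustration of the cost $v^{\pi}(0, x_0)$, the following Python sketch estimates it by averaging Euler-Maruyama rollouts of $dx = f\,dt + F\,d\omega$. The scalar plant, the proportional policy, the cost weights and all function names are assumptions made for this example; none of them come from the lecture.

```python
import numpy as np

def expected_cost(policy, f, F, h, l, x0, T, dt=0.01, n_rollouts=200, seed=0):
    """Monte Carlo estimate of E[h(x(T)) + integral of l dt] starting at t = 0."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    total = 0.0
    for _ in range(n_rollouts):
        x = np.array(x0, dtype=float)
        cost = 0.0
        for k in range(n_steps):
            t = k * dt
            u = policy(t, x)
            cost += l(t, x, u) * dt                           # running cost
            dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)   # Brownian increment
            x = x + f(x, u) * dt + F(x, u) * dw               # dx = f dt + F dw
        total += cost + h(x)                                  # add final cost
    return total / n_rollouts

# Hypothetical toy problem: drive a noisy scalar integrator from 0 to 1.
f = lambda x, u: u                                  # drift
F = lambda x, u: 0.1                                # constant noise gain
h = lambda x: 10.0 * float((x - 1.0) @ (x - 1.0))   # final cost
l = lambda t, x, u: 0.01 * float(u @ u)             # running (effort) cost
feedback = lambda t, x: 2.0 * (1.0 - x)             # simple proportional law
passive = lambda t, x: np.zeros_like(x)             # do nothing

v_feedback = expected_cost(feedback, f, F, h, l, x0=[0.0], T=2.0)
v_passive = expected_cost(passive, f, F, h, l, x0=[0.0], T=2.0)
print(v_feedback, v_passive)
```

Comparing the two estimates gives a quick check that a candidate control law lowers the expected cost; an optimal control method would search over $\pi$ rather than compare two fixed laws.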



Analytic Methods
• Linear Quadratic Regulator (LQR)
• Linear Quadratic Gaussian (LQG)

Local Optimization Methods
• iLQG, iLDP
• Differential Dynamic Programming (DDP)

Inference based methods
• AICO, PI^2, …
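To make the analytic case concrete, here is a minimal sketch of a finite-horizon, discrete-time LQR solved by backward Riccati recursion. The double-integrator plant, the weights and the function name `finite_horizon_lqr` are illustrative assumptions, not the lecture's setup. The resulting law $u_k = -K_k x_k$ is the simplest instance of a time-varying feedback law (nominal command and nominal state both zero).

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, Qf, N):
    """Return gains K_0..K_{N-1} minimising sum(x'Qx + u'Ru) + x_N' Qf x_N."""
    P = Qf.copy()
    gains = []
    for _ in range(N):                      # sweep backwards from the horizon
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)       # Riccati update
        gains.append(K)
    return gains[::-1]                      # reorder so gains[k] applies at step k

# Hypothetical plant: double integrator sampled at dt, state = [position, velocity].
dt, N = 0.01, 300
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.zeros((2, 2))                        # no running state cost
R = np.array([[1e-3]])                      # effort weight
Qf = np.diag([100.0, 10.0])                 # terminal cost pulls the state to the origin

gains = finite_horizon_lqr(A, B, Q, R, Qf, N)
x = np.array([1.0, 0.0])
for K in gains:
    u = -K @ x                              # time-varying linear feedback
    x = A @ x + B @ u
print(x)
```

Local methods such as iLQG and DDP repeat a backward pass of this kind around a nominal trajectory of a non-linear plant; that outer iteration is not shown here.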

[Figure: OFC block diagram. Inputs: cost function (incl. target) and dynamics model → OFC → OFC law (L, x) → feedback controller → command u + δu → plant (robot) → state x, fed back to the controller]

Assume knowledge of actuator dynamics
Assume knowledge of cost being optimized
• Explosive Movement Tasks (e.g., throwing)
• Periodic Movement Tasks and Temporal Optimization (e.g. walking, brachiation)
• Learning dynamics (OFC-LD)

David Braun, Matthew Howard and Sethu Vijayakumar, Exploiting Variable Stiffness for Explosive Movement Tasks, Proc. Robotics: Science and Systems (R:SS), Los Angeles (2011)

Highly dynamic tasks, explosive movements

The two main ingredients:

Compliant Actuators
• VARIABLE JOINT STIFFNESS: $\tau = \tau(q, u)$, $K = K(q, u)$
• MACCEPA: Van Ham et al., 2007
• DLR Hand Arm System: Grebenstein et al., 2011

Torque/Stiffness Opt.
• Model of the system dynamics: $\dot{x} = f(x, u)$, $u \in \mathcal{U}$
• Control objective: $J = -d + \frac{w}{2}\int_0^T \|F\|^2\,dt \to \min.$
• Optimal control solution: $u(t, x) = u^*(t) + L^*(t)\,(x - x^*(t))$
• iLQG: Li & Todorov 2007; DDP: Jacobson & Mayne 1970
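The control objective on this slide trades distance thrown against effort; read as $J = -d + \frac{w}{2}\int_0^T \|F\|^2\,dt$, it can be evaluated for a candidate throw as sketched below. The drag-free ballistic model for $d$, the rectangle-rule integration and both function names are assumptions for illustration and are not taken from the cited paper.

```python
import numpy as np

def ballistic_range(px, py, vx, vy, g=9.81):
    """Landing x-position of a point mass released at (px, py >= 0); ground at y = 0."""
    t_flight = (vy + np.sqrt(vy**2 + 2.0 * g * py)) / g
    return px + vx * t_flight

def throwing_cost(d, F, dt, w):
    """J = -d + (w/2) * integral of ||F||^2 dt, approximated by a rectangle rule."""
    effort = np.sum(np.square(np.asarray(F, dtype=float))) * dt
    return -d + 0.5 * w * effort

# Hypothetical release state and a constant unit force applied for 1 s.
d = ballistic_range(px=0.0, py=0.0, vx=5.0, vy=5.0)
F = np.ones(100)
J = throwing_cost(d, F, dt=0.01, w=2.0)
print(d, J)
```

A larger weight $w$ penalises force more heavily, which shifts the optimum towards shorter but cheaper throws.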

2-link ball throwing - MACCEPA

stiffness modulation

speed: 20 rad/s distance thrown: 5.2m

Benefits of Stiffness Modulation: Quantitative evidence of improved task performance (distance thrown) with temporal stiffness modulation as opposed to fixed (optimal) stiffness control

Exploiting Natural Dynamics: a) optimization suggests power amplification through pumping energy b) benefit of passive stiffness vs. active stiffness control

Behaviour Optimization: Simultaneous stiffness and torque optimization of a VIA actuator that reflects strategies used in human explosive movement tasks:
a) performance-effort trade-off: $J = -d + \frac{w}{2}\int_0^T \|F\|^2\,dt$
b) qualitatively similar stiffness pattern
c) strategy change in task execution

Scalability to more complex hardware

Aim: Modelling and control with emphasis on physically realizable optimal impedance control with more complex state and actuation constraints

Goal:
a) demonstrate the applicability of the optimal variable stiffness control methodology to real-world problems,
b) provide experimental evidence that supports the numerical predictions obtained by simulations,
c) illustrate scalability of our approach

Ball throwing with the DLR HASy

DLR HASy: State-of-the-art research platform for variable stiffness control. Restricted to a 2-dof system (shoulder and elbow rotation).
• Max motor side speed: 8 rad/s
• Max torque: 67 Nm
• Stiffness range: 50 – 800 Nm/rad
• Speed for stiffness change: 0.33 s/range

DLR - FSJ

Schematic representation of the DLR-FSJ

Motor-side positions: $q_2 = [\theta, \sigma]^T \in \mathbb{R}^4$

Constraint: $\phi_{\min}(\sigma) \le \phi \le \phi_{\max}(\sigma)$

Dealing with Complex Constraints

$M_{11}(q_1)\,\ddot{q}_1 + C_{11}(q_1, \dot{q}_1)\,\dot{q}_1 + G_1(q_1) = \tau_1(q_1, q_2)$
$\ddot{q}_2 + 2\beta\,\dot{q}_2 + \kappa^2 q_2 = \kappa^2 u$

Incorporating the constraints:
1. Range constraints: $\Phi(q_1, q_2) \in [\Phi_{\min}(q_2), \Phi_{\max}(q_2)]$, $u \in [u_{\min}, u_{\max}]$
2. Rate/effort limitations: $\kappa \in [0, \kappa_{\max}]$
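The motor-side dynamics on this slide, read as $\ddot{q}_2 + 2\beta\dot{q}_2 + \kappa^2 q_2 = \kappa^2 u$, act as a second-order filter between the command $u$ and the motor position $q_2$. The sketch below simulates a step response for a single motor, with the command clipped to $[u_{\min}, u_{\max}]$. The choice $\beta = \kappa$ (critical damping), the numerical values and the function name are assumptions made for illustration.

```python
import numpy as np

def simulate_motor(u_cmd, kappa, beta, dt, u_min, u_max, q0=0.0):
    """Integrate q'' + 2*beta*q' + kappa^2 * q = kappa^2 * u with semi-implicit Euler."""
    q, dq = q0, 0.0
    traj = []
    for u in u_cmd:
        u = min(max(u, u_min), u_max)                 # range constraint on the command
        ddq = kappa**2 * (u - q) - 2.0 * beta * dq    # motor-side acceleration
        dq += ddq * dt
        q += dq * dt
        traj.append(q)
    return np.array(traj)

dt = 0.001
step = np.ones(2000)                                  # 2 s unit step command
fast = simulate_motor(step, kappa=20.0, beta=20.0, dt=dt, u_min=-0.8, u_max=0.8)
slow = simulate_motor(step, kappa=5.0, beta=5.0, dt=dt, u_min=-0.8, u_max=0.8)
print(fast[-1], slow[200], fast[200])
```

Bounding $\kappa$ from above limits how quickly the motor position can follow the command, which is one way to read the rate/effort limitation $\kappa \in [0, \kappa_{\max}]$.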

DLR – FSJ: optimisation with state constraints

[Figure: spring length vs. stiffness modulation; panels: variable stiffness, fixed stiffness]

[Figure: spring length and stiffness modulation plotted against time; panels: variable stiffness, fixed stiffness]

Implementation on the DLR HASy

Ball throwing with DLR HASy, motor velocity limited to: 2 rad/s, 3 rad/s

Periodic Movement Tasks and Temporal Optimization (e.g. walking, brachiation)

Jun Nakanishi, Konrad Rawlik and Sethu Vijayakumar, Stiffness and Temporal Optimization in Periodic Movements: An Optimal Control Approach , Proc. IEEE Intl Conf on Intelligent Robots and Systems (IROS ‘11) , San Francisco (2011).

Periodic Movement Control: Issues

Representation
• what is a suitable representation of periodic movement (trajectories, goal)?

Choice of cost function
• how to design a cost function for periodic movement?

Exploitation of natural dynamics
• how to exploit resonance for energy efficient control?
• optimize frequency (temporal aspect)
• stiffness tuning

Periodic Movement Representation

Dynamical system with Fourier basis functions [equation image: parameters, Fourier basis functions, Fourier coefficients]

• scaling of frequency, amplitude and offset is possible
• efficient approximation method to compute Fourier coefficients [Kuhl and Giardina 1982]
• orthogonality properties of basis functions
• cf. Fourier series expansion
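A minimal sketch of representing one period of a movement with Fourier basis functions is given below. It fits the coefficients by ordinary least squares, which is a stand-in and not the Kuhl and Giardina approximation named on the slide. The sample trajectory, the number of harmonics and the function names are assumptions for illustration.

```python
import numpy as np

def fourier_design(phase, n_harmonics):
    """Columns: 1, cos(phase), sin(phase), cos(2*phase), sin(2*phase), ..."""
    cols = [np.ones_like(phase)]
    for n in range(1, n_harmonics + 1):
        cols += [np.cos(n * phase), np.sin(n * phase)]
    return np.stack(cols, axis=1)

def fit_fourier(y, phase, n_harmonics):
    """Least-squares Fourier coefficients of samples y taken at the given phases."""
    coeffs, *_ = np.linalg.lstsq(fourier_design(phase, n_harmonics), y, rcond=None)
    return coeffs

def evaluate(coeffs, phase, amplitude=1.0, offset=0.0):
    """Reconstruct the trajectory, optionally rescaled in amplitude and shifted."""
    n_harmonics = (len(coeffs) - 1) // 2
    return amplitude * (fourier_design(phase, n_harmonics) @ coeffs) + offset

# Hypothetical periodic joint trajectory sampled over one period.
phase = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
y = 0.5 + np.sin(phase) + 0.3 * np.cos(2.0 * phase)
coeffs = fit_fourier(y, phase, n_harmonics=3)
y_hat = evaluate(coeffs, phase)
print(np.round(coeffs, 3))
```

Because the trajectory is a function of phase, replaying it with phase = ω·t for a different ω rescales the frequency without refitting, and the `amplitude` and `offset` arguments cover the other two scalings listed above.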

Cost Function for Periodic Movements

Optimization criterion

Terminal cost
• ensures periodicity of the trajectory

Running cost
• tracking performance and control cost

Another View of Cost Function

• Running cost: tracking performance and control cost
• Augmented plant dynamics with Fourier series based DMPs
• Reformulated running cost
• Find the control and parameters such that the plant dynamics (1) behave like (2) and (3) while minimizing the control cost

Temporal Optimization

How do we find the right temporal duration in which to optimize a movement?

Solutions:
• Fix temporal parameters ... not optimal
• Time stationary cost ... cannot deal with sequential tasks, e.g. via points
• Chain ‘first exit time’ controllers ... Linear duration cost, not optimal
• Canonical Time Formulation
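A toy example shows why fixing the duration is generally not optimal. For a double integrator moving a distance $d$ from rest to rest in time $T$, the minimum-effort command is $u(t) = \frac{6d}{T^2}(1 - \frac{2t}{T})$ with effort $\int_0^T u^2\,dt = 12 d^2 / T^3$. Adding a linear time penalty $wT$ gives a cost with a unique best duration $T^* = (36 d^2 / w)^{1/4}$. The plant, the penalty and the names below are assumptions for illustration; this is a plain one-dimensional search, not the canonical time formulation or AICO-T.

```python
import numpy as np

def optimal_control(t, T, d=1.0):
    """Minimum-effort rest-to-rest command for a double integrator."""
    return 6.0 * d / T**2 * (1.0 - 2.0 * t / T)

def effort(T, d=1.0):
    """Closed-form integral of u(t)^2 over [0, T] for the command above."""
    return 12.0 * d**2 / T**3

def total_cost(T, d=1.0, w=1.0):
    """Effort plus a linear penalty on movement duration."""
    return effort(T, d) + w * T

d, w = 1.0, 1.0
durations = np.linspace(0.5, 6.0, 5501)                 # candidate durations, 1 ms grid
T_grid = durations[np.argmin(total_cost(durations, d, w))]
T_closed = (36.0 * d**2 / w) ** 0.25                    # stationary point of the cost
print(T_grid, T_closed)
```

Shorter durations demand sharply more effort and longer ones pay the time penalty, so any fixed $T$ away from $T^*$ costs more than necessary.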

Canonical Time Formulation

Dynamics: [equation image]
Cost: [equation image]
n.b. the time variable in these expressions represents real time

Introduce change of time: [equation image]
n.b. the time variable now represents canonical time

Konrad Rawlik, Marc Toussaint and Sethu Vijayakumar, An Approximate Inference Approach to Temporal Optimization in Optimal Control, Proc. Advances in Neural Information Processing Systems (NIPS '10), Vancouver, Canada (2010).

AICO-T algorithm

• Use approximate inference methods
• EM algorithm
  • E-Step: solve OC problem with fixed β
  • M-Step: optimise β with fixed controls

Spatiotemporal Optimization • 2 DoF arm, reaching task

• 2 DoF arm, via point task

Optimization of Impedance Profiles

• Plant dynamics [equation image]
• Reference trajectory [equation image]
• Optimization criterion [equation image]
• Optimal feedback controller [equation image]

EM-like iterative procedure to obtain … and …

Temporal optimization: time scaling
• optimize … to yield optimal … or …

Temporal Optimization in Brachiation

• Optimize the joint torque and movement duration
• Cost function: gripper position
• Time-scaling: canonical time
• Find the optimal control using iLQG and update the time-scaling in turn until convergence [Rawlik, Toussaint and Vijayakumar, 2010]

Temporal Optimization of Swing Locomotion
• vary T = 1.3~1.55 (sec) and compare required joint torque
• significant reduction of joint torque with the optimized movement duration

Optimized Brachiating Manoeuvre Swing-up and locomotion

Variable Impedance Biped (BLUE: Bipedal Locomotion @ UoE)

Walking is a bouncing gait
• Energy from the swing is stored during stance
• Three primary uses of springs
  – “pogo stick” principle
  – Return springs
  – Foot pad shock absorption
• Co-contraction lets us change the effective stiffness of joints
  – Variable stiffness: flexible gait, more behaviours, more efficient

Umberger 2007

BLUE: Bipedal Locomotion @ University of Edinburgh
• Sagittal plane biped capable of independently varying joint position, stiffness and damping
• ¾ scale biped
  – Hip rotation height 700 mm
• A platform to explore the effect of varying stiffness and damping on locomotion
  – Efficient walking at different speeds
  – Able to stand as well as walk efficiently
  – Different terrain, disturbances etc.

miniBLUE
• ½ scale biped
• printed
  – Lightweight
  – Rapid manufacture
• Large stiffness range
  – Down to zero stiffness
• Non-backdrivable drive motors
  – Can store energy in springs without requiring opposing motor torque

BLUE: Preliminary Results

Learning dynamics (OFC-LD)

Approximate non-linear functions with a combination of multiple weighted linear models:

$w_{ii} = \exp\left(-\tfrac{1}{2}(x_i - x_q)^T D_k (x_i - x_q)\right)$
$\beta_k = (X^T W_k X)^{-1} X^T W_k Y$
$\hat{y}_k = x_q^T \beta_k$
$\hat{y} = \sum_k w_k \hat{y}_k \,/\, \sum_k w_k$

Solve this problem for high dimensional space: LWPR

Sethu Vijayakumar, Aaron D'Souza and Stefan Schaal, Incremental Online Learning in High Dimensions, Neural Computation, vol. 17, pp. 2602-34 (2005)
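The weighted least-squares fit and the normalised blending of local predictions can be sketched in a few lines of NumPy. This is plain locally weighted regression in batch form, not LWPR itself: it has no projection directions and no incremental updates. Placing each local model at a fixed centre $c_k$, the added bias column, the small ridge term and all names are assumptions made for this example.

```python
import numpy as np

def gaussian_weight(x, c, D):
    """w = exp(-0.5 * (x - c)^T D (x - c)); works for one point or a batch of rows."""
    diff = x - c
    return np.exp(-0.5 * np.sum((diff @ D) * diff, axis=-1))

def fit_local_model(X, Y, c, D, ridge=1e-8):
    """Weighted least squares: beta = (X^T W X)^-1 X^T W Y, with a bias column."""
    w = gaussian_weight(X, c, D)                       # one weight per sample
    Xb = np.hstack([X, np.ones((len(X), 1))])
    XtW = Xb.T * w                                     # X^T W without forming W
    return np.linalg.solve(XtW @ Xb + ridge * np.eye(Xb.shape[1]), XtW @ Y)

def predict(x_q, centers, metrics, betas):
    """Blend local linear predictions: y = sum(w_k * y_k) / sum(w_k)."""
    xb = np.append(x_q, 1.0)
    w = np.array([gaussian_weight(x_q, c, D) for c, D in zip(centers, metrics)])
    y_k = np.array([xb @ b for b in betas])
    return float(np.sum(w * y_k) / np.sum(w))

# Hypothetical 1-D example: learn sin(x) from samples with 25 local models.
X = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
Y = np.sin(X[:, 0])
centers = [np.array([c]) for c in np.linspace(0.0, 2.0 * np.pi, 25)]
metrics = [np.array([[16.0]]) for _ in centers]        # kernel width 0.25
betas = [fit_local_model(X, Y, c, D) for c, D in zip(centers, metrics)]
y_pred = predict(np.array([1.0]), centers, metrics, betas)
print(y_pred, np.sin(1.0))
```

Each local model only needs to be accurate near its own centre, which is what lets a set of simple linear models approximate a non-linear function.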

Locally Weighted Projection Regression (LWPR) for dynamics learning (Vijayakumar et al., 2005).

Learned mapping: $[q, \dot{q}, u] \mapsto \ddot{q} = \tilde{f}(q, \dot{q}, u)$, together with $\Phi(q, \dot{q}, u)$

Analytic model: $dx = f(x, u)\,dt + F(x, u)\,d\omega$

Learned model: $dx = \tilde{f}(x, u)\,dt + \Phi(x, u)\,d\omega$

• OFC-LD uses LWPR learned dynamics for optimization (Mitrovic et al., 2010a)
• Key ingredient: Ability to learn both the dynamics and the associated uncertainty (Mitrovic et al., 2010b)

Djordje Mitrovic, Stefan Klanke and Sethu Vijayakumar, Adaptive Optimal Feedback Control with Learned Internal Dynamics Models, From Motor Learning to Interaction Learning in Robots, SCI 264, pp. 65-84, Springer-Verlag (2010).

Minimum intervention principle

Reproduces the “trial-to-trial” variability in the uncontrolled manifold, i.e., exhibits the minimum intervention principle that is characteristic of human motor control.

[Figure panels: KUKA LWR; Simulink Model]

High accuracy while remaining compliant and energy efficient.

Djordje Mitrovic, Stefan Klanke and Sethu Vijayakumar, Learning Impedance Control of Antagonistic Systems based on Stochastic Optimisation Principles, International Journal of Robotics Research, Vol. 30, No. 5, pp. 556-573 (2011).

Constant Unidirectional Force Field

Can predict the “ideal observer” adaptation behaviour under complex force fields due to the ability to work with adaptive dynamics

Velocity-dependent Divergent Force Field

Cost Function:

Djordje Mitrovic, Stefan Klanke, Rieko Osu, Mitsuo Kawato and Sethu Vijayakumar, A Computational Model of Limb Impedance Control based on Principles of Internal Model Uncertainty, PLoS ONE, Vol. 5, No. 10 (2010).

OFC-LD is computationally more efficient than iLQG, because we can compute the required partial derivatives analytically from the learned model

Optimized co-contraction profiles are quite different from how humans use their antagonistic musculoskeletal system. So what is missing?

Muscle plots: minimal co-contraction remains

2 joints and 6 antagonistic muscles

Constant force field: online adaptation!

Overshoot → online re-anneal

Djordje Mitrovic, Stefan Klanke, Sethu Vijayakumar, Adaptive Optimal Control for Redundantly Actuated Arms, Proc. Tenth International Conference on the Simulation of Adaptive Behavior (SAB '08), Osaka, Japan (2008)

Focus: Signal Dependent Noise (SDN)

$\sigma(u) = \sigma_{\text{isotonic}}\,|u_1 - u_2|^n + \sigma_{\text{isometric}}\,|u_1 + u_2|^m$, $\xi \sim N(0, I_2)$

See: Osu et al., 2004; Gribble et al., 2003

Stochastic OFC-LD

Deterministic OFC-LD

Assume knowledge of actuator dynamics
Assume knowledge of cost to be optimized
• Routes to Impedance Behaviour Imitation

[Figure panels: Edinburgh SEA; MACCEPA; KUKA Lightweight Arm LWR-III; DLR VIA; Shadow Hand; IIT actuator]

‘Ideal’ VSA:
• $u = (q_0, k)^T$
• stiffness (k), eq. pos. (q0) directly controllable

Edinburgh SEA:
• $u = (\alpha, \beta)^T$
• biomorphic, antagonistic design
• coupled stiffness and eq. pos.

MACCEPA:
• $u = (m_1, m_2)^T$
• (nearly) de-coupled stiffness and eq. pos. control

Direct Transfer: Feed EMG directly to motors

Impedance Transfer: Pre-process EMG, track stiffness and equilibrium position

Matthew Howard, David Braun and Sethu Vijayakumar, Constraint-based Equilibrium and Stiffness Control of Variable Stiffness Actuators, Proc. IEEE International Conference on Robotics and Automation (ICRA 2011), Shanghai (2011).

Transfer ball hitting task across different VIAs:
• Very different command sequences due to different actuation
• Optimal impedance control strategy very similar across plants

• Direct imitation: lower velocity at time of impact, less powerful hit
• Apprenticeship learning: movement is optimised to robot dynamics, ball is hit further

M. Howard, D. Mitrovic & S. Vijayakumar, Transferring Impedance Control Strategies Between Heterogeneous Systems via Apprenticeship Learning, Proc. IEEE-RAS International Conference on Humanoid Robots, Nashville, TN, USA (2010).



Model-based transfer of human behavior has relied on a model of the demonstrator’s dynamics. In most practical settings, such models fail to capture:
• the complex, non-linear dynamics of the human musculoskeletal system
• inconsistencies between modeling assumptions and the configuration and placement of measurement apparatus

Takeshi Mori, Matthew Howard and Sethu Vijayakumar, Model Free Apprenticeship Learning for Transfer of Human Impedance Behaviour, Proc. 11th IEEE-RAS International Conference on Humanoid Robots, Bled, Slovenia (2011).



Original
• Monte Carlo method and model-based method on MWAL
• Requires: (human) dynamics model ef

Model-free
• LSTDf and LSPIf combined on MWAL
• Requires: exploratory data aD instead of using dynamics model



Optimization methods
• Need to exploit plant (actuator) dynamics
  ▪ Direct policy methods allow this
• Are effective when one has a good estimate of the cost functions that need to be optimized

Imitation and Transfer methods
• Should not naively mimic impedance profiles across heterogeneous systems
• Transfer at the level of objectives is most appropriate