Computational Methods for Sustainable Energy J. Zico Kolter
August 5, 2013 IJCAI
Outline
• Introduction to sustainable energy and the smart grid
• Three highlighted topics:
  – Power and demand forecasting
  – Energy disaggregation
  – Control in the smart grid
• Final thoughts
“Sustainable energy”
“Sustainable development is development that meets the needs of the present without compromising the ability of future generations to meet their own needs.” – UN Report “Our Common Future”, 1987
U.S. energy consumption
[Figure: average power (TW, 0 to 3.5) versus year (1850 to 2000), broken down into coal; natural gas; petroleum; hydro / nuclear / biomass; and wind / solar / geothermal. Data: U.S. Energy Information Administration]
U.S. energy consumption
[Figure: Lawrence Livermore National Laboratory energy flow chart, "Estimated U.S. Energy Use in 2010: ~98.0 Quads". Labeled totals, in quads:
  Sources: Solar 0.11, Nuclear 8.44, Hydro 2.51, Wind 0.92, Geothermal 0.21, Natural Gas 24.65, Coal 20.82, Biomass 4.29, Petroleum 35.97, plus net electricity imports
  Electricity Generation 39.49
  End-use sectors: Residential 11.79, Commercial 8.71, Industrial 23.27, Transportation 27.45
  Outputs: Rejected Energy 56.13, Energy Services 41.88]
Source: LLNL 2011. Data is based on DOE/EIA-0384(2010), October 2011. If this information or a reproduction of it is used, credit must be given to the Lawrence Livermore National Laboratory and the Department of Energy, under whose auspices the work was performed. Distributed electricity represents only retail electricity sales and does not include self-generation. EIA reports flows for hydro, wind, solar and geothermal in BTU-equivalent values by assuming a typical fossil fuel plant "heat rate." (see EIA report for explanation of change to geothermal in 2010). The efficiency of electricity production is calculated as the total retail electricity delivered divided by the primary energy input into electricity generation. End use efficiency is estimated as 80% for the residential, commercial and industrial sectors, and as 25% for the transportation sector. Totals may not equal sum of components due to independent rounding. LLNL-MI-410527
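The flow chart reports annual energy in quads, while the earlier consumption plot reports average power in TW. A rough conversion sketch in Python, assuming 1 quad = 10^15 BTU, 1 BTU ≈ 1055.06 J, and a 365-day year (these constants and the function name are assumptions, not taken from the slides):

```python
# Convert annual energy use in quads to average power in terawatts.
# Assumed constants (not from the slides): 1 quad = 1e15 BTU, 1 BTU ~ 1055.06 J.
JOULES_PER_QUAD = 1e15 * 1055.06
SECONDS_PER_YEAR = 365 * 24 * 3600


def quads_per_year_to_terawatts(quads):
    """Average power in TW corresponding to `quads` of energy used over one year."""
    watts = quads * JOULES_PER_QUAD / SECONDS_PER_YEAR
    return watts / 1e12


average_power_tw = quads_per_year_to_terawatts(98.0)
print(round(average_power_tw, 2))
```

Under these assumptions, ~98 quads per year corresponds to roughly 3.3 TW of average power, which is consistent with the top of the TW-scale plot.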
U.S. petroleum production
[Figure: U.S. petroleum production (TW, 0 to 0.7) versus year (1900 to 2000). Data: U.S. Energy Information Administration]
Atmospheric carbon dioxide
[Figure: atmospheric CO2 concentration (ppm, 260 to 400) versus year (1000 to 2000). Data: NOAA and Etheridge et al., 1998]
Atmospheric carbon dioxide
[Figure: atmospheric CO2 concentration (ppm, 150 to 400) versus thousands of years before present (600 to 0). Data: Barnola et al., 2003; Siegenthaler et al., 2005]
For additional discussion...
[Figure: stacked energy estimates from http://www.withouthotair.com
  Consumption: Car 40 kWh/d; Jet flights 30 kWh/d; Heating, cooling 37 kWh/d; Light 4 kWh/d; Gadgets 5; Food, farming, fertilizer 15 kWh/d; Stuff 48+ kWh/d; Transporting stuff 12 kWh/d; “Defence” 4
  Production: Wind 20 kWh/d; Solar heating 13 kWh/d; PV, 10 m2/p 5; PV farm (200 m2/p) 50 kWh/d; Biomass (food, biofuel, wood, waste incin’n, landfill gas) 24 kWh/d; Hydro 1.5 kWh/d; Shallow offshore wind 16 kWh/d; Deep offshore wind 32 kWh/d; Wave 4 kWh/d; Tide 11 kWh/d; Geothermal 1 kWh/d]
Why computation/AI?
You: need to supply power to the country
• 5,500 power plants (925 GW capacity)
• 83m residential and 5m commercial/industrial buildings (768 GW peak demand)
• 172k miles of transmission lines
• 49 GW of installed wind/solar capacity
• 30m installed smart meters
Data: U.S. Energy Information Administration, 2010 U.S. Census, Institute for Electric Efficiency, Argonne National Laboratory
Outline
• Introduction to sustainable energy and the smart grid
• Three highlighted topics:
  – Power generation and demand forecasting
  – Energy disaggregation
  – Control in the smart grid
• Final thoughts
Pittsburgh electricity consumption
[Figure: hourly demand (GW) versus hour of day for Feb 9, Jul 13, and Oct 10. Data: PJM, http://www.pjm.com]
Electricity forecasting
• One of the most common tasks in energy system scheduling is forecasting how much electricity a region will consume
• A forecast lets us plan in advance how to allocate generation (especially important for slow-starting generators)
• Generation must still be rescheduled in real time to make up for forecast errors, but the forecast gives a good baseline
A natural supervised learning setup
• Electricity forecasting is naturally formulated as a multi-output regression problem
  $\hat{y}_t = f(x_t)$
  where
  – $\hat{y}_t \in \mathbb{R}^{24}$ = predicted consumption over the next 24 hours, starting at time $t$
  – $x_t \in \mathbb{R}^k$ = features that can help predict consumption over the next 24 hours
Possible features: hour of day
[Figure: hourly demand (GW) versus hour of day for Feb 9, Jul 13, and Oct 10]
Possible features: previous day’s power
[Figure: hourly demand (GW) versus hour of day for Feb 12, 2008 and Feb 13, 2008]
Possible features: temperature
[Figure: peak hourly demand (GW) versus high temperature (F)]
(Multiple) linear regression
• Model predicted consumption as a linear model
  $\hat{y}_t = \Theta^T x_t$
  where
  – $\hat{y}_t \in \mathbb{R}^{24}$ = predicted consumption over the next 24 hours, starting at time $t$
  – $x_t \in \mathbb{R}^k$ = {hour of day, previous 24 hours of power, previous 24 hours and next 24 hours of temperature (+ non-linear features)}
  – $\Theta \in \mathbb{R}^{k \times 24}$ = regression parameters
• MATLAB code for electricity forecasting
  % file format is:
  data = load('pjm_load_data.txt');
  % columns as used below: 1 = time in seconds, 2 = power, 3 = temperature
  % form output and feature vectors
  Y = hankel(data(25:end-23,2), data(end-23:end,2));
  X_power = hankel(data(1:end-47,2), data(end-47:end-24,2));
  X_temp = hankel(data(1:end-47,3), data(end-47:end,3));
  hour = mod(data(1:size(X_power,1),1)/3600,24);
  X_hour_of_day = sparse(1:size(hour,1), hour+1, ...
                         ones(size(hour,1),1));
  % multiple linear regression
  X = [X_power X_temp X_temp.^2 X_temp.^3 X_hour_of_day];
  Theta = X \ Y;
  Y_pred = X*Theta;
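The `hankel` calls in the listing stack sliding windows of the series: each row of `X_power` holds 24 consecutive hours of load and the matching row of `Y` holds the 24 hours that follow. A minimal Python sketch of that construction on a toy series (function names are illustrative, not from the slides; the temperature and hour-of-day features are omitted):

```python
def sliding_windows(series, start, width, count):
    """Return `count` rows; row r is series[start + r : start + r + width]."""
    return [series[start + r:start + r + width] for r in range(count)]


def build_forecast_pairs(load, history=24, horizon=24):
    """Pair each window of `history` past values with the next `horizon` values."""
    count = len(load) - history - horizon + 1
    X = sliding_windows(load, 0, history, count)
    Y = sliding_windows(load, history, horizon, count)
    return X, Y


# Toy example with short windows so the rows are easy to read.
toy_load = list(range(10))
X_toy, Y_toy = build_forecast_pairs(toy_load, history=3, horizon=2)
for past, future in zip(X_toy, Y_toy):
    print(past, "->", future)
```

With `history=24` and `horizon=24`, a series of N hourly values yields N − 47 rows, which matches the row count implied by the index ranges in the MATLAB listing.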
[Figure: four panels of actual versus predicted power (GW) by hour of day]
Predictions for several days
[Figure: root mean squared error (GW, 0 to 0.25) versus prediction horizon (hours) for four feature sets: all features; hour of day only; hour of day + previous power; hour of day + temperature]
Errors omitting various features
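A sketch of how an error-versus-horizon curve like this can be computed, assuming the usual definition of root mean squared error taken separately for each forecast hour (the function name and toy numbers are illustrative, not from the slides):

```python
import math


def rmse_per_horizon(Y_true, Y_pred):
    """RMSE for each forecast hour; rows are forecasts, columns are horizons."""
    n_rows = len(Y_true)
    n_horizons = len(Y_true[0])
    errors = []
    for h in range(n_horizons):
        squared = sum((Y_pred[r][h] - Y_true[r][h]) ** 2 for r in range(n_rows))
        errors.append(math.sqrt(squared / n_rows))
    return errors


# Toy example: two forecasts, two horizons (values in GW).
toy_true = [[1.0, 2.0], [3.0, 4.0]]
toy_pred = [[1.0, 3.0], [3.0, 3.0]]
toy_rmse = rmse_per_horizon(toy_true, toy_pred)
print(toy_rmse)
```

Repeating the fit with a subset of the feature columns and recomputing this curve gives one line per feature set.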
From demo to state-of-the-art
• With a few additions, this is a state-of-the-art, deployed system:
  – PJM Manual 19: Load Forecasting and Analysis. PJM. Available at: http://www.pjm.com/~/media/documents/manuals/m19.ashx
  – Additions: a neural network, additional features for weekdays/weekends/holidays, and a more nuanced treatment of temperature (heating/cooling degree days)
• There is a huge existing literature on the topic: Soliman, S. A. and Al-Kandari, A. M. (2010). Electrical Load Forecasting: Modeling and Model Construction. Elsevier.
Renewable generation forecasting
• Can apply the exact same methodology to forecast the generation of uncertain sources like wind or solar power
• Some work in the area, but a much more recent topic:
  – A. Costa, et al. A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744, 2008.
  – C. Monteiro, et al. Wind power forecasting: state-of-the-art 2009. Technical report, Argonne National Laboratory (ANL), 2009.
  – Kaggle Global Energy Forecasting Competition, 2013. http://gefcom.org
Research directions for AI
• How do we make multiple predictions across spatially similar regions?
• How do we deal with uncertainty in the predictions?
• How can we use these predictions to schedule generation optimally? (More on this later.)
Large-scale probabilistic forecasting
• Some of our recent work on the topic: M. Wytock, J.Z. Kolter. Sparse Gaussian conditional random fields: Algorithms, theory, and application to energy forecasting. ICML, 2013.
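• Algorithm: sparse Gaussian conditional random field (SGCRF)
[Figure: graphical model with output nodes $y_1, y_2, y_3, \ldots, y_p$ and input nodes $x_1, x_2, x_3, \ldots, x_n$]
• Mathematical formulation:
  $p(y \mid x) \propto \exp\left(-y^T \Lambda y - 2 y^T \Theta x\right), \quad \Lambda \in \mathbb{R}^{p \times p}, \; \Theta \in \mathbb{R}^{p \times n}$
• Train using maximum likelihood estimation with $\ell_1$ regularization on $\Lambda$ and $\Theta$:
  $\underset{\Lambda, \Theta}{\text{minimize}} \;\; -\log p(Y \mid X) + \lambda \left(\|\Lambda\|_1 + \|\Theta\|_1\right)$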
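A short derivation sketch of what this formulation implies, assuming $\Lambda$ is symmetric positive definite (an assumption not stated on the slide). Completing the square in $y$ shows that the conditional distribution is Gaussian:

```latex
\begin{align*}
-y^T \Lambda y - 2 y^T \Theta x
  &= -\left(y + \Lambda^{-1}\Theta x\right)^T \Lambda \left(y + \Lambda^{-1}\Theta x\right)
     + x^T \Theta^T \Lambda^{-1} \Theta x \\
\Rightarrow \quad y \mid x
  &\sim \mathcal{N}\!\left(-\Lambda^{-1}\Theta x,\; \tfrac{1}{2}\Lambda^{-1}\right)
\end{align*}
```

Under this reading, $\Theta$ determines how the inputs shift the mean of the outputs, and a zero entry in $\Lambda$ means the corresponding pair of outputs is conditionally independent given everything else, which is the structure the $\ell_1$ penalty encourages.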
Performance on wind forecasting
[Figure: MSE (0.2 to 0.5) versus regularization parameter λ (log scale, 10^0 to 10^-2) for SGCRF and least squares (LS)]
• Least squares here uses highly tuned features and placed 5th in the Kaggle Global Energy Forecasting Competition.
Performance on load forecasting
[Figure: MSE (0.04 to 0.12) versus regularization parameter λ (log scale, 10^-1 to 10^-3) for SGCRF and the PJM forecast]
• “PJM” is the solution deployed at the utility.
Summary: energy forecasting
• Application: Predicting future demand on the electrical grid, or future generation of renewable sources
• Algorithm: Supervised learning techniques for forecasting; recent work involving large-scale probabilistic modeling
Outline
• Introduction to sustainable energy and the smart grid
• Three highlighted topics:
  – Power generation and demand forecasting
  – Energy disaggregation
  – Control in the smart grid
• Final thoughts
Energy disaggregation
  Refrigerator    5.64
  Washer/Dryer   10.23
  Lighting       15.20
  Computer        9.40
  ...
[Figure: power (watts, 0 to 5000) versus time (17:40 to 18:10); two panels]
A slightly simpler problem...
• Given the power trace of a single device, determine whether or not it is a given type of device (e.g. a refrigerator)
• Often used as a sub-step of a full energy disaggregation algorithm
[Figure: power (watts, 0 to 250) versus time (seconds, 0 to 7000)]
Power signal for refrigerator
[Figure: power (watts, 0 to 250) versus time (seconds, 0 to 7000)]
Constructing features from power signal
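One plausible reading of this feature construction, given that the later scatter plot uses power (watts) and duration (seconds) as axes, is to summarize each "on" cycle of the device by its mean power and its length. The thresholding rule, function name, and toy trace below are assumptions for illustration; the slide shows only the figure.

```python
def on_event_features(power, threshold, dt=1.0):
    """Return (mean_power, duration) for each contiguous run above `threshold`.

    `power` is a list of samples in watts and `dt` is the sampling period in seconds.
    """
    features = []
    run = []
    for p in power:
        if p > threshold:
            run.append(p)
        elif run:
            features.append((sum(run) / len(run), len(run) * dt))
            run = []
    if run:  # trace ended while the device was still on
        features.append((sum(run) / len(run), len(run) * dt))
    return features


# Toy trace: two on-cycles separated by idle periods.
toy_trace = [0, 0, 150, 160, 170, 0, 0, 200, 200, 0]
toy_features = on_event_features(toy_trace, threshold=50)
print(toy_features)
```

Each tuple is one point in a power-versus-duration feature space, which is the input a device classifier would consume.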
Refrigerator vs. other devices
[Figure: scatter plot of duration (seconds, 500 to 2500) versus power (watts, 120 to 240) for refrigerators and other devices]
• MATLAB code for classifying devices (using YALMIP optimization library to solve SVM)
  % file format:
  data = load('device_signals.txt');
  X = data(:,1:2);   % two features per example
  y = data(:,3);     % labels (+1/-1, as the hinge loss below assumes)
  m = size(X,1);
  % construct kernels and outputs
  X = (X - repmat(mean(X),m,1)) ./ repmat(std(X),m,1);   % standardize each feature
  sig = 1.0; C = 100;
  K = exp(-sqdist(X', X')/(2*sig^2)) + 1e-2*eye(m);      % Gaussian kernel plus a small ridge
  % solve SVM
  a = sdpvar(m,1);
  solvesdp([], a'*K*a + C*sum(max(0,1-y.*(K*a))), ...
           sdpsettings('solver', 'sedumi'));
  Y_pred = sign(K*double(a));
[Figure: classification result plotted as duration (seconds) versus power (watts)]
Kernelized SVM, Gaussian kernel (σ = 1.0)
[Figure: classification result plotted as duration (seconds) versus power (watts)]
Kernelized SVM, Gaussian kernel (σ = 0.2)
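To see why the two bandwidths behave so differently, it helps to look at the kernel values themselves. The sketch below computes the Gaussian kernel $K_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$ in plain Python for two toy points one unit apart (features are assumed to be standardized first, as in the MATLAB listing; the function name and points are illustrative):

```python
import math


def gaussian_kernel_matrix(X, sigma):
    """K[i][j] = exp(-||X[i] - X[j]||^2 / (2 * sigma^2)) for a list of feature rows."""
    m = len(X)
    K = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return K


# Two standardized points one unit apart.
points = [[0.0, 0.0], [1.0, 0.0]]
wide = gaussian_kernel_matrix(points, sigma=1.0)
narrow = gaussian_kernel_matrix(points, sigma=0.2)
print(wide[0][1], narrow[0][1])
```

With σ = 1.0 the two points still influence each other noticeably, while with σ = 0.2 the kernel value is close to zero, so each training point affects only its immediate neighborhood and the fitted boundary can become much more irregular.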
Standard approach for energy disaggregation
[Figure: power (watts, 0 to 5000) versus time (17:40 to 18:10)]
• State of the art (for 20+ years, e.g. Hart 1992): classify edges in the power signal, then integrate to determine the breakdown of energy.
• A recent survey: M. Zeifman and K. Roth. Nonintrusive appliance load monitoring: Review and outlook. IEEE Transactions on Consumer Electronics, 57(1):76–84, 2011.
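A minimal sketch of the edge-based idea, not Hart's full algorithm: find large step changes in the aggregate signal, pair each rising edge with a later falling edge of similar size, and integrate power over the interval in between. The step threshold, matching tolerance, function names, and toy signal are all assumptions for illustration.

```python
def detect_edges(power, min_step):
    """Return (index, delta) for each sample-to-sample change of at least `min_step` watts."""
    edges = []
    for t in range(1, len(power)):
        delta = power[t] - power[t - 1]
        if abs(delta) >= min_step:
            edges.append((t, delta))
    return edges


def pair_edges(edges, dt=1.0, tolerance=20.0):
    """Match each falling edge to the earliest unmatched rising edge of similar size.

    Returns a list of (step_watts, energy_joules), one per matched on/off event.
    """
    pending = []  # rising edges still waiting for a matching falling edge
    events = []
    for t, delta in edges:
        if delta > 0:
            pending.append((t, delta))
            continue
        for k, (t_on, step) in enumerate(pending):
            if abs(step + delta) <= tolerance:
                events.append((step, step * (t - t_on) * dt))
                del pending[k]
                break
    return events


# Toy aggregate signal: a 150 W device and a 1000 W device on a 100 W baseline.
toy_signal = [100, 100, 250, 250, 250, 1250, 1250, 250, 100, 100]
toy_edges = detect_edges(toy_signal, min_step=50)
toy_events = pair_edges(toy_edges)
print(toy_edges)
print(toy_events)
```

A single missed or mismatched edge corrupts the energy assigned to that device for the whole interval, which is one way to see why this approach is sensitive to errors.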
Research directions for AI
• Probabilistic disaggregation using source separation techniques (the standard classification approach is very sensitive to errors)
• Unsupervised/semi-supervised learning of appliance models
• Determining the best feedback methods for giving information to users
Disaggregation with Factorial HMMs
• Some of our recent work on the topic: J.Z. Kolter and T. Jaakkola. Approximate inference in additive factorial HMMs. AISTATS, 2012.
[Figure: factorial hidden Markov model (FHMM) with two hidden chains, $x^{(1)}_{t-1}, x^{(1)}_t, x^{(1)}_{t+1}$ and $x^{(2)}_{t-1}, x^{(2)}_t, x^{(2)}_{t+1}$, and observed outputs $y_{t-1}, y_t, y_{t+1}$, shown alongside a power trace (0 to 5000, 17:40 to 18:10)]
[Figure: FHMM graphical model with hidden chains $x^{(1)}$, $x^{(2)}$ and observed outputs $y$ at times $t-1$, $t$, $t+1$]
• Challenge: Inference (determining the most likely discrete states given observed output) is intractable for FHMMs
• Algorithmic work: New methods for approximate inference in FHMMs, based upon convex relaxations
• The trick: look at a probabilistic formulation in terms of the differences in total power
• Write inference as an optimization problem
  $\underset{\mu}{\text{minimize}} \;\; \sum_{t=1}^{T-1} \left( (y_{t+1} - y_t) - \sum_i \theta_i^T \mu_t^{(i)} \right)^2$
  $\text{subject to} \;\; \mu_t^{(i)} \in \{0, 1\}, \; \forall i, t; \qquad \mu_{1:T-1}^{(i)} \text{ ``valid''}, \; \forall i$
  where the $\mu$ variables are indicators of state changes and the $\theta$ parameters denote mean power outputs
• Key property: if only one device changes state at any given time, we can relax the problem to a linear program; solving the optimization problem typically results in integral solutions
Performance
[Figure: four panels of power (watts, 0 to 5000) versus time (17:40 to 18:10), broken down by circuit (unassigned, bath GFI, microwave, furnace, kitchen outlets, kitchen outlets, washer dryer). Panels: true breakdown; our approach; event-based; structured mean field]
Performance
Circuit            Our Method    Previous approx    Event-based
Microwave          98% / 66%     97% / 4%           98% / 28%
Bath GFI           83% / 71%     50% / 9%           23% / 21%
Kitchen Outlets    38% / 13%     10% / 48%          57% / 15%
Furnace            92% / 71%     13% / 15%          25% / 71%
Kitchen Outlets    45% / 16%     13% / 24%          27% / 11%
Washer / Dryer     99% / 73%     89% / 77%          95% / 64%
Total              87% / 60%     36% / 45%          49% / 53%
Performance on example circuits in a home over two weeks
All data available at: http://redd.csail.mit.edu
Summary: energy disaggregation
• Application: Understand breakdowns of power from smart meters
• Algorithm: Supervised learning for classifying devices, hidden Markov models and approximate inference approaches for source separation in factorial HMMs
• Additional work: A great deal of follow-on work, using the REDD data set, or in other domains (e.g. conference paper at IJCAI on water disaggregation)
Outline
• Introduction to sustainable energy and the smart grid
• Three highlighted topics:
  – Power generation and demand forecasting
  – Energy disaggregation
  – Control in the smart grid
• Final thoughts
The challenge of electrical grid control
You: need to supply power to the country
• 5,500 power plants (925 GW capacity)
• 83m residential and 5m commercial/industrial buildings (768 GW peak demand)
• 172k miles of transmission lines
• 49 GW of installed wind/solar capacity
• 30m installed smart meters
Data: U.S. Energy Information Administration, 2010 U.S. Census, Institute for Electric Efficiency, Argonne National Laboratory
• Generators have very different costs, greenhouse gas emissions, and ramp rates
• Transmission lines have different physical properties and capacities
• Power can’t be “routed” like packets; it obeys the laws of physics
• Wind and solar provide “free” power (at least at operation time), but are non-dispatchable, intermittent, and can’t currently be stored economically
• Emerging ability to also control load through “demand response”
Everything you ever wanted to know about power systems but were afraid to ask
• Voltage: electric potential (energy per unit charge), 1 volt = 1 joule/coulomb
• Current: flow of charge, 1 ampere = 1 coulomb/second
• Ohm’s law: $i = v/R$
• Power: $p = vi$
• Alternating current (AC) systems: $v(t) = \hat{v} \cos(\omega t + \theta)$, where $\hat{v}$ = voltage magnitude, $\omega$ = (angular) frequency, $\theta$ = phase/voltage angle
• Represent using complex numbers: $v = \hat{v} e^{j\theta}$, $j = \sqrt{-1}$
• Ohm’s law: $i = Y v$, where $Y = 1/(R + jX)$ is admittance, $X = \omega L - \frac{1}{\omega C}$ is reactance, $L$ is inductance, and $C$ is capacitance
• Power: $s = v \bar{i}$ ($\bar{\cdot}$ is complex conjugate), which has both real and imaginary components, called real and reactive power respectively
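These phasor relations map directly onto complex arithmetic. A small Python sketch using the standard `cmath` module, with illustrative component values that are assumptions rather than values from the slides (120 V source, 60 Hz, R = 10 Ω, L = 0.05 H, C = 100 µF in series):

```python
import cmath
import math


def phasor(magnitude, angle_rad):
    """Complex phasor v = magnitude * exp(j * angle)."""
    return cmath.rect(magnitude, angle_rad)


def series_admittance(R, L, C, omega):
    """Y = 1 / (R + jX) with reactance X = omega*L - 1/(omega*C)."""
    X = omega * L - 1.0 / (omega * C)
    return 1.0 / complex(R, X)


omega = 2 * math.pi * 60            # angular frequency for 60 Hz
v = phasor(120.0, 0.0)              # voltage phasor
Y = series_admittance(R=10.0, L=0.05, C=1e-4, omega=omega)
i = Y * v                           # Ohm's law in phasor form
s = v * i.conjugate()               # complex power

print("real power (W):", s.real)
print("reactive power (var):", s.imag)
```

For these values the reactance is negative (the capacitor dominates), so the reactive power comes out negative, while the real power equals $|i|^2 R$, the power dissipated in the resistor.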
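• Power network: $i \in \mathbb{C}^n$, $v \in \mathbb{C}^n$
• Ohm’s law + Kirchhoff’s current law: $i = Y v$ where
  $Y_{k\ell} = \begin{cases} -\frac{1}{R_{k\ell} + jX_{k\ell}} & k \neq \ell \;\; (0 \text{ if } k, \ell \text{ not connected}) \\ \sum_{s \neq k} \frac{1}{R_{sk} + jX_{sk}} & k = \ell \end{cases}$
• Flow over line via Ohm’s law: $i_{k\ell} = \frac{v_k - v_\ell}{R_{k\ell} + jX_{k\ell}}$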
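A minimal sketch of assembling the network admittance matrix from a list of lines, following the definition of $Y_{k\ell}$ given here. The three-node network, its line parameters, and the voltages are made up for illustration, and nodes are indexed from 0:

```python
def admittance_matrix(n, lines):
    """Build the n x n admittance matrix from lines given as (k, l, R, X)."""
    Y = [[0j] * n for _ in range(n)]
    for k, l, R, X in lines:
        y = 1.0 / complex(R, X)     # admittance of this line
        Y[k][l] -= y                # off-diagonal: minus the line admittance
        Y[l][k] -= y
        Y[k][k] += y                # diagonal: sum over lines touching the node
        Y[l][l] += y
    return Y


# Hypothetical 3-node chain: 0 -- 1 -- 2 (nodes 0 and 2 are not connected).
lines = [(0, 1, 0.01, 0.1), (1, 2, 0.02, 0.2)]
Y = admittance_matrix(3, lines)

# Node voltages (per unit), chosen arbitrarily for the example.
v = [1.0 + 0.0j, 0.98 - 0.02j, 0.97 - 0.05j]
i = [sum(Y[k][l] * v[l] for l in range(3)) for k in range(3)]

# Node 0 has a single line, so its injected current equals the flow on line 0-1.
flow_01 = (v[0] - v[1]) / complex(0.01, 0.1)
print(i[0], flow_01)
```

Each row of `Y` sums to zero here because the sketch includes no shunt elements, so equal voltages at every node produce no current.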
• Power: $s = \mathrm{diag}(v)\,\bar{i} = \mathrm{diag}(v)\,\bar{Y}\bar{v}$
• Power flow: given some known powers/voltages, solve the (non-linear) equation $s = \mathrm{diag}(v)\,\bar{Y}\bar{v}$
• Optimal power flow (OPF): solve some optimization problem (e.g. minimize generation cost) subject to the power flow constraint
• But non-linear equations are nasty, so we can simplify (assume voltage magnitudes are equal, transmission lines have no resistance, just reactance, and the small angle approximation holds) to get
  $p = B\theta, \quad B_{k\ell} = \begin{cases} -\frac{1}{X_{k\ell}} & k \neq \ell \\ \sum_{s \neq k} \frac{1}{X_{sk}} & k = \ell \end{cases}$
  called (in the worst naming convention ever) DC power flow
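A minimal sketch of solving the DC power flow equations $p = B\theta$ for a made-up three-bus network. Because $B$ is singular (adding a constant to every angle changes nothing), one bus is chosen as a reference with its angle fixed to zero. The network, injections, and helper names are assumptions for illustration; the small Gaussian elimination routine is there only to keep the example dependency-free.

```python
def dc_b_matrix(n, lines):
    """Build B from lines given as (k, l, X): off-diagonal -1/X, diagonal sum of 1/X."""
    B = [[0.0] * n for _ in range(n)]
    for k, l, X in lines:
        b = 1.0 / X
        B[k][l] -= b
        B[l][k] -= b
        B[k][k] += b
        B[l][l] += b
    return B


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def dc_power_flow(B, p, slack=0):
    """Solve p = B theta with the reference (slack) bus angle fixed to zero."""
    n = len(p)
    keep = [k for k in range(n) if k != slack]
    A = [[B[r][c] for c in keep] for r in keep]
    reduced = solve_linear(A, [p[r] for r in keep])
    theta = [0.0] * n
    for k, value in zip(keep, reduced):
        theta[k] = value
    return theta


# Hypothetical 3-bus triangle, every line with reactance 0.1 (per unit).
lines = [(0, 1, 0.1), (1, 2, 0.1), (0, 2, 0.1)]
p = [1.5, -0.5, -1.0]               # net injections; they sum to zero
B = dc_b_matrix(3, lines)
theta = dc_power_flow(B, p)
flows = {(k, l): (theta[k] - theta[l]) / X for k, l, X in lines}
print(theta)
print(flows)
```

The resulting line flows are what a DC optimal power flow problem would compare against line capacities.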
An example of DC OPF
• An example optimal power flow optimization problem
  $\underset{p^G, \theta}{\text{minimize}} \;\; \sum_{i=1}^n c_i(p^G_i)$
  $\text{subject to} \;\; B\theta = p^G - p^L$
  $\underline{p}^G \le p^G \le \bar{p}^G$
  $|B_{ij}(\theta_i - \theta_j)| \le \bar{F}_{ij}$
  where $p^G, \theta \in \mathbb{R}^n$ are optimization variables; $B \in \mathbb{R}^{n \times n}$ is the DC approximate admittance matrix; $p^L \in \mathbb{R}^n$ is a vector of loads at each node; $\underline{p}^G, \bar{p}^G \in \mathbb{R}^n$ are generator lower and upper bounds; and $\bar{F}_{ij}$ is the power capacity of the transmission line between nodes $i$ and $j$
IEEE 30 bus test system
• MATLAB code for DC optimal power flow
  % load electrical network data from file
  [B, p, gen, base_mva] = load_cdf_dc('ieee30cdf.txt');
  n = size(B,1);
  p_load = max(-p,0);
  p_gen = sdpvar(n,1);
  theta = sdpvar(n,1);
  % set up costs and constraints
  cost1 = p_gen(gen(1)) + p_gen(gen(1))^2;
  cost2 = 0.5*p_gen(gen(2)) + 2*p_gen(gen(2))^2;
  const = [B*theta == p_gen - p_load;
           theta(1) == 0;
           p_gen(setdiff(1:n,gen)) == 0;
           abs(B(1,2)*(theta(1) - theta(2)))