AMOS 2014 Talk: Categorising meteorological events as inputs to machine learning based solar forecasts

[Image: cirrus in gravity wave bands over Hobart]

So I’m in Hobart, Tasmania. A beautiful city, I might add, with some picturesque upper-level cirrus arranged in awesome gravity wave bands. I’m not here as a cloud tourist, though; I’m here to present some interesting research I’m completing with an undergraduate student (Sonya Wellby) at the Australian Meteorological and Oceanographic Society's Annual Conference (AMOS 2014).


In the course of producing solar forecasts for our ARENA USASEC grant project (I really do need to write up a blog post on that at some point), we’ve discovered something interesting about including weather data in our machine learning algorithms: it doesn’t seem to help at all.


This is quite strange when you consider that the entire challenge of solar energy prediction comes down to clouds, and clouds are driven by recognizable weather phenomena. So including information like temperature, wind speed/direction, surface pressure, etc. should, in theory, help improve the forecast.


But this is not what we’ve found. In our single-site Support Vector Machine model (SVM with a linear loss function) for 10-minute interval data/forecasts, we see an increase in Mean Bias Error and Root Mean Squared Error and a decrease in the correlation between predictions and observations:


For hourly data/forecast intervals, we see no improvement in the forecasts with the inclusion of weather data.

[Figure: 10-minute interval forecasts/data (1) and 60-minute interval forecasts/data (2). Red line is persistence, black line the SVM without weather data, green line the SVM with weather data. All forecasts were produced using data from one PV site.]
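For concreteness, here’s a minimal sketch of how such a comparison could be run. scikit-learn’s LinearSVR stands in for our SVM, and the file name, lag features, and column names are all hypothetical, not our actual pipeline:

```python
import numpy as np
import pandas as pd
from sklearn.svm import LinearSVR

def evaluate(pred, obs):
    """The three skill measures discussed above."""
    mbe = np.mean(pred - obs)                   # Mean Bias Error
    rmse = np.sqrt(np.mean((pred - obs) ** 2))  # Root Mean Squared Error
    corr = np.corrcoef(pred, obs)[0, 1]         # prediction/observation correlation
    return mbe, rmse, corr

# Hypothetical 10-minute PV data with lagged power and weather columns.
df = pd.read_csv("pv_site.csv", parse_dates=["time"])
solar_cols = ["power_lag1", "power_lag2", "clear_sky_index"]
weather_cols = ["temperature", "wind_speed", "wind_dir", "pressure"]
train, test = df.iloc[:-1000], df.iloc[-1000:]

for cols, label in [(solar_cols, "SVM, no weather"),
                    (solar_cols + weather_cols, "SVM + weather")]:
    model = LinearSVR().fit(train[cols], train["power"])
    print(label, evaluate(model.predict(test[cols]), test["power"].values))

# Persistence baseline: the forecast is just the most recent observation.
print("persistence", evaluate(test["power_lag1"].values, test["power"].values))
```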

So what is going on here?

My hypothesis is that the weather data is full of too many small fluctuations and seemingly random signals for the machine-learning algorithm to see the “Big Picture” (thanks to Sonya for that phrase). That is to say, it doesn’t recognize the overall synoptic or mesoscale event, which we meteorologists are trained to interpret. It has no physical understanding of the data it is seeing; it sees only a lack of direct relationships, and therefore assigns a diminishing weight to that data.
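One way to probe this (a hypothetical diagnostic, reusing the names from the sketch above) is to look at the weights a linear SVM assigns to each standardised input. If the hypothesis holds, the raw weather features should land near the bottom of the list:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Standardise first so the coefficient magnitudes are comparable.
feature_names = solar_cols + weather_cols
pipe = make_pipeline(StandardScaler(), LinearSVR())
pipe.fit(train[feature_names], train["power"])

# Rank features by absolute learned weight, largest first.
coefs = pipe.named_steps["linearsvr"].coef_
for name, w in sorted(zip(feature_names, coefs), key=lambda t: -abs(t[1])):
    print(f"{name:16s} {w:+.4f}")  # weather rows expected near the bottom
```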

So what can we do?

Well, our approach is to remove the individual feature vectors of weather data and replace them with a single feature vector that signals whether or not a “significant” weather event is going to occur.
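In feature-matrix terms the change is small. Here is a sketch, again with the hypothetical column names from above; what counts as “significant” is defined next:

```python
def build_features(df, event_flag):
    # Keep the solar/power-history inputs; leave the raw weather columns out.
    X = df[solar_cols].copy()
    # A single 0/1 column replaces temperature, wind, pressure, and friends.
    X["significant_event"] = event_flag.astype(int)
    return X
```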

“Significant” is defined here as the types of weather events that result in large-scale ramp events for collectives of PV systems. Currently, we are using data from 30 PV systems in Canberra to identify large ramp events.
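A rough sketch of that identification step (the 20% threshold and the column layout are illustrative placeholders, not our operational values):

```python
import pandas as pd

def flag_ramp_events(fleet: pd.DataFrame, capacity_kw: float,
                     threshold: float = 0.20) -> pd.Series:
    """True where the collective PV output ramps hard in one interval."""
    total = fleet.sum(axis=1)  # aggregate output of the ~30 systems (kW)
    ramp = total.diff()        # change per 10-minute interval
    return ramp.abs() > threshold * capacity_kw

# Usage: event_flag = flag_ramp_events(fleet_df, capacity_kw=150.0)
```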

Major events so far are fog clearing, morning cloud dissipation after easterly surges, the departure of a low-pressure system/cold front, and thunderstorm events. We’re identifying more, but at this stage I’ll refer you to our talk, which I’m posting here with audio (how cool are we?).

Check out our #AMOS2014 talk here, complete with audio transcript.

[Download Talk]

See the slides:

Listen to the audio:
