5. Example: Playing around with some quality controlled PV data

[under construction]

Following on from the previous example, which used the Engerer2 and DIRINT models in the anusolar R package, I'd like to demonstrate how we can use the package to analyse some power output data as well.

Firstly, you should download my 100 site, quality controlled PV power output dataset from Canberra.  This is a beta version of a research grade dataset that I hope to make available in the near future.

Here's what you need to do...

Download the beta version dataset at this link

Take the above data and place it in:

/data/CBR/

within your cloned/forked version of the anusolar R package.

Once this is done, you can work with the data by first creating a macro metadata array for CBR (Canberra), using the quality control flag (qc = TRUE).  Then you can load data using the year, month and day, for any dates between 1 January 2013 and 1 July 2015. 

Loading in some data!

cbr.qc = ar.in("CBR",qc=T)
qcdf = read.pvo.qc(cbr.qc,2015,1,1) # YYYY,MM,DD e.g. here: 1 jan 2015

Now we can look at the quality controlled data.frame object, qcdf:

Where $gtms, $ltms are UTC and local time in POSIX format.

$kwr is raw power output data from PV system, normalised to installed capacity (kW/kWp)
$qwr is quality controlled power output
$kpd is the KPV calculation for the QC data
$kwu is the re-rated power output (adjusting for soiling and shading - note this is experimental)
$kwc is the clear sky power output curve (modelled)
$kws is the statistical clear sky power output (see QC chapter of PhD)

Important information about the data

Now, regarding the above, here are a few things for consideration. 

Firstly, $qwr and $kwu were created using the quality control algorithm published in my thesis (see part 2).

Next, $kwr is divided by the rated capacity of the PV system. To get back to kW, you'll need to use the metadata array, cbr.qc as follows:

kw_actual = qcdf$kwr * cbr.qc$ar

where cbr.qc$ar is the PV array rating in Watts (see the "Working with Data" post for more information on variable names). Because the rating is in Watts, the result is in W; divide by 1000 to get kW.
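One thing to watch when rescaling: the plotting code later in this post indexes qcdf$qwr[,i] by site, which suggests one column per site. If $kwr is laid out the same way (an assumption on my part), multiplying it directly by a per-site vector recycles the ratings down the rows rather than across the columns. Here is a minimal sketch of a column-wise rescale with sweep(), using made-up numbers in place of the real data and assuming the ratings are in the same order as the columns:

```r
# Synthetic stand-ins for qcdf$kwr (rows = time steps, columns = sites)
# and cbr.qc$ar (array rating in W, one value per site). Values are made up.
kwr <- matrix(c(0.50, 0.80,
                0.25, 0.40,
                0.10, 0.20),
              nrow = 3, byrow = TRUE)
ar_watts <- c(2000, 5000)

# Multiply each column by its own site's rating, then convert W to kW
kw_actual <- sweep(kwr, 2, ar_watts, "*") / 1000
print(kw_actual)
```

With the real objects, kwr and ar_watts would be replaced by qcdf$kwr and cbr.qc$ar, after checking that the column order matches the metadata order.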

$kpd and $kwc values aren't perfect and could use some further refinement.  The KPV approach is sensitive to misalignment of the simulated PV system.  I recommend that you instead use the $kws variable for the clear sky curve (based on the Lonij et al. 2013 approach).

Now, to the good stuff! The power output values from these PV sites are generally very good after the quality control algorithm, so $qwr is a great place to start trying out some science.  The one exception is site #19249, which should be omitted (thanks to Aloysius Aruputera of SERIS for spotting this one).  The dataset is also relatively complete (for distributed PV data anyway!), meaning that up to 85-90 sites are available at any given time for analysis.

Thanks to my collaborators at SERIS for completing this data availability analysis.
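If you'd like to check availability yourself, here is a rough sketch of dropping the flagged site and counting how many sites report at each time step. It runs on a tiny synthetic matrix rather than the real data. It assumes missing values are stored as NA (the na.rm=T used with rowMeans later in this post hints at that) and that the site IDs come in the same order as the columns. The site_id vector and the other IDs in it are hypothetical; look up the real column name in the metadata.

```r
# Synthetic stand-in for qcdf$qwr: 4 time steps x 3 sites, NA = missing
qwr <- matrix(c(0.2, NA,  0.3,
                0.5, 0.4, 0.6,
                NA,  NA,  0.7,
                0.1, 0.2, 0.3),
              nrow = 4, byrow = TRUE)

# Hypothetical site IDs, assumed to be in the same order as the columns
site_id <- c(10001, 19249, 10003)

# Drop the site flagged above as unreliable
qwr_ok <- qwr[, site_id != 19249, drop = FALSE]

# Number of sites reporting at each time step
n_avail <- rowSums(!is.na(qwr_ok))
print(n_avail)
```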


If you want to see the individual site metadata, you can call the variable cbr.qc.  You can also see this data by navigating to:

/data/CBR/meta/qc.sites_info.csv

These sites are located all around Canberra; here's a map for reference (from Engerer and Hansard 2015):

Red dots = PV sites, Black lines = Canberra suburbs/localities


This allows you to do all sorts of fun things. For starters, you can plot some data and get a feel for what type of variability was present:

Data from the first 10 sites in the dataset...


Here we can see a partly cloudy day present with some fairly regular, low amplitude variability.  This type of variability would smooth out quite nicely in the mean:

Here again with the mean from all sites in black - check out that geographic smoothing!
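To put a number on that smoothing, one option is to compare the step-to-step changes at individual sites with those of the all-site mean. The sketch below does this on purely synthetic data: a shared daily shape plus independent noise per site. That is a simplification, since real cloud-driven variability is partly correlated between nearby sites, so treat it as an illustration of the calculation rather than a result.

```r
set.seed(1)  # reproducible synthetic data
n_steps <- 200
n_sites <- 10

# Made-up normalised output: a shared smooth daily shape plus
# independent cloud-like noise at each site (not real PV data)
shape <- sin(seq(0, pi, length.out = n_steps))
noise <- matrix(rnorm(n_steps * n_sites, sd = 0.1), nrow = n_steps)
qwr <- shape + noise  # shape is recycled down each column

# Step-to-step changes for each site and for the all-site mean
site_ramps <- apply(qwr, 2, diff)
mean_ramps <- diff(rowMeans(qwr, na.rm = TRUE))

# Variability of a typical single site vs. the all-site mean
sd_site <- mean(apply(site_ramps, 2, sd))
sd_mean <- sd(mean_ramps)
cat("typical site:", round(sd_site, 3), " all-site mean:", round(sd_mean, 3), "\n")
```

With the real data you would use qcdf$qwr in place of the synthetic qwr matrix.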


Or what about a different day? Wellby and Engerer 2016, my publication on critical collective ramp events, tells us that 19 February 2014 was a negative ramp event day. Let's check it out:

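qcdf = read.pvo.qc(cbr.qc,2014,2,19) # use the cbr.qc metadata array created earlier
plot(qcdf$qwr[,1]~qcdf$ltms,type='l',ylim=c(0,1.2)) # first site vs local time
for(i in 2:10){lines(qcdf$qwr[,i]~qcdf$ltms,col=i)} # add sites 2 to 10
mndf = rowMeans(qcdf$qwr,na.rm=T) # mean across all sites at each time step
lines(mndf~qcdf$ltms,lwd=3,col='black') # overlay the all-site mean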

19 February 2014, a gnarly negative critical collective ramp event


Wowzers! Check that out! The mean power output from all of the PV sites around Canberra drops like a rock due to a thunderstorm event.  That's awesome - and combines my two favourite things: energy & meteorology.
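If you want to find a drop like this programmatically, a simple starting point is to look for the most negative change in the all-site mean over a fixed window. This is only a rough sketch on made-up numbers. The window length is an arbitrary choice, and this is not the event definition used in Wellby and Engerer 2016.

```r
# Made-up all-site mean (normalised output) at equally spaced time steps
mndf <- c(0.70, 0.72, 0.71, 0.69, 0.40, 0.15, 0.10, 0.12, 0.20)

# Change over a window of 2 time steps (window length is an arbitrary choice)
lag_steps <- 2
ramp <- diff(mndf, lag = lag_steps)

# Most negative ramp and the time step at which it starts
i_start <- which.min(ramp)
cat("largest drop:", ramp[i_start], "starting at step", i_start, "\n")
```

With the real data, mndf would come from rowMeans(qcdf$qwr,na.rm=T) as in the plotting code, and qcdf$ltms[i_start] would give the local time at which the drop begins.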

 

 

 
