8. Simulating PV System Power Output with SPMQ

If you've made it to post #8 (after doing all of the other 7 R manual entries!), then you are ready to become an anusolar R-package jedi master.  Well, sort of - I mean, you'll be simulating PV system power output with the SPMQ function.  That's pretty rad, right?

The basis for the SPMQ function is the Sandia Performance Model and Inverter Performance Model created by the incredible folks at Sandia National Laboratories. These models, while more than a decade old, still form the basis of modern PV simulation tools like SAM, the NREL tool for simulating PV system performance.

I review the modelling process that I've adopted for simulating PV system power output in my KPV paper (Engerer and Mills 2014), so read that manuscript to find out more about how SPMQ works.  I've also uploaded the two papers that outline the Sandia Performance and Inverter Models here:  King et al. 2004 and King et al. 2007.  The equations laid out in these reports make up the core of the SPMQ function's R code.

How do these models and this function work?

The general idea behind both models is fairly simple.  Given several predictor variables, we can simulate PV module and inverter power output for a given value of radiation (plus temperature and wind speed).  These predictor variables are empirical coefficients, meaning that they are determined via testing of various PV module and inverter types.  The coefficients are then stored in a library, which is published with each updated version of NREL's SAM model.
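To make that concrete, here's a minimal sketch (mine, not package code - SPMQ handles all of this internally) of one well-known piece of the Sandia Performance Model, the module temperature equation from King et al. 2004. The default a and b below are the open-rack glass/polymer values quoted in that report; in practice the coefficients come from the module database:

# minimal sketch of the Sandia module temperature model (King et al. 2004):
#   Tm = E * exp(a + b*WS) + Ta
sandia_module_temp = function(E, Ta, WS, a = -3.56, b = -0.075) {
  # E:  plane-of-array irradiance (W/m^2)
  # Ta: ambient air temperature (deg C)
  # WS: wind speed (m/s) -- note swx$wsp is in km/h, so divide by 3.6 first
  E * exp(a + b * WS) + Ta
}

sandia_module_temp(E = 800, Ta = 25, WS = 10/3.6) # e.g. 800 W/m^2, 25 C, 10 km/h breeze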

You can see the current version of the databases in the /code/rpkg/support/SPM/ directory, where there are four files (a quick way to peek at them in R is sketched after this list):

  • SPMDb.csv, the Sandia Module Database, for simulating a given PV module
  • SMD_Compare.csv, identical in content to SPMDb.csv, but with maximum power point calculations added (PMP); the table is sorted by this value to allow easier selection of a matching module for your simulations. The original row number is in the left column and is the $mm variable described in "3. Working with Data" under "Module Matching"
  • inv_lib.csv, empirical coefficients for the inverter modelling, selected based on the row number in variable $im (again, see "3. Working with Data" under "Inverter Matching")  
  • inv_types.txt, inverter names by row number in the database
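If you'd like to peek at these files yourself, here's a minimal sketch (assuming pathto holds your repository root, as in the earlier posts):

# browse the module database and the PMP-sorted comparison table
spmdb = read.csv(paste0(pathto, "code/rpkg/support/SPM/SPMDb.csv"))
cmp   = read.csv(paste0(pathto, "code/rpkg/support/SPM/SMD_Compare.csv"))
head(cmp) # sorted by maximum power point (PMP); left column = original row number ($mm)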

There are three things that are required to simulate PV system performance:

  1. Radiation data
  2. Weather observations
  3. Metadata about the PV system

More specifically, SPMQ expects three objects:

  • s, a list object of radiation data, which includes:
    • $Egt, $Ebt, $Edt - the three tilted components of radiation on plane of array surfaces
    • $ama, $aoir - absolute air mass and angle of incidence (radians)
  • swx, weather observations of matching timesteps, including:
    • $tmp, temperature in Celsius
    • $wsp, wind speed in km/h
  • ar, metadata array which contains all of the PV metadata described in 3. Working with Data.

Let's start off by generating a clear sky curve for a solar PV system.

In this example, we'll work with data from Sydney, and a ramp event in the region's PV power output that James Bills identified in his honours thesis.  You can find the example code in /code/examples/syd.pvo.ramp_event.R


# load PV metadata for 500 PV sites in Sydney
syd = ar.in("SYD")

# load in some raw PV data
indf = read.pvo(syd,2013,11,22) # largest negative collective ramp from Bills 2015
kwr = indf[,3:dim(indf)[2]] # trim off the timestamps, keep only power output

# grab the matching weather data, match it with the PV data
wx = read.wx(syd,2013,11,22)
swx = shrink.wx(wx,indf)

# solar geometry (R Manual Post #6)
sza = SZAQ(indf,syd)

# clear sky model (R Manual Post #7)
csm = CSMQ(sza,swx,'rest2',syd)

# simulate clear sky power output for PV systems
spm = SPMQ(csm,swx,syd)
kwc = spm[,3:dim(spm)[2]] # trim off the timestamps, get PV estimates only


The above code will load in the PV metadata for Sydney, bring in some raw PV power output data, as well as some weather observations (provided for station 66006 on required days only).  This is all of the data we'll require to run some PV system simulations for clear sky radiation, and then use the simulated data in a simple analysis.

We'll use the time-stamped PV power output data as the input data frame to SZAQ, to get all of the solar geometry calculations we'll require.  We then pass the output of SZAQ to CSMQ, as we learned to do in the last post.

With the output from CSMQ, our clear sky radiation estimates, we can actually pass this straight into SPMQ, as it contains the appropriate radiation components, transposed to the plane of array surfaces of our PV systems.

We then execute SPMQ via:

spm = SPMQ(csm,swx,syd)

These are the s, swx and ar input data requirements detailed above. 

After this step, we get a data.frame object returned via spm, within which are $gtms and $ltms (the UTC and local time) in the first two columns, followed by the simulated clear sky power output for all of the PV systems available in syd, our metadata object.

One good trick to learn is to chop off the first two timestamp columns, holding onto just the numerical data in kwc as shown above. This means that only the clear sky curves are contained in kwc, which is useful for calculations/analysis with the data.

Let's Check Out Our Results

# visualise clear sky example
n=8 # try some other numbers, but watch out for missing data/bad matches (raw data, no Quality Control!)
plot(kwr[,n],typ='l',ylim=c(0,1.2))
lines(kwc[,n],col='blue',lwd=2)

Here is measured power output (black, from kwr) and clear sky power output (blue, from kwc) for a single site in SYD.


What if we change the value of n? Let's try n = 20...

This looks similar, but upon closer inspection we can see that the clear sky curve is a bit too low throughout the day.  This is the result of the PVOutput.org user not properly reporting some aspect of the system's metadata.  That could be the tilt angle or the total installed capacity of the PV system. This highlights the risks of blindly using PVOutput.org data without extensive quality control. 

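A quick way to screen for this kind of metadata problem (my own sketch, not a package function) is to compare measured and simulated clear sky output around the daily peak; a ratio well away from 1 suggests a bad tilt or capacity entry. The thresholds below are arbitrary:

# flag sites whose measured output sits well off the simulated clear sky curve
mid = which.max(rowMeans(kwc, na.rm = TRUE)) # timestep where clear sky output peaks
ratio = as.numeric(kwr[mid, ]) / as.numeric(kwc[mid, ])
which(ratio < 0.8 | ratio > 1.2) # candidate metadata problems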

Ignoring the possible issues with our raw dataset, we can still have a bit of fun with the data from this particular day.  There was actually a very significant negative ramp event that occurred over the Sydney CBD during this period.  To analyse all of the data we can...

# analyse the major ramp event!
plot(kwr[,1]~indf$ltms,typ='l',ylim=c(0,1.2))
for(i in 2:50){lines(kwr[,i]~indf$ltms,col=i)}
mndf = rowMeans(kwr,na.rm=T)
lines(mndf~indf$ltms,lwd=3,col='black')

Wow! That's a serious drop in collective power output over our 50 selected sites! Here the dark black line is the mean power output across all of the sites.


Here it is useful to go into a bit more detail, connecting back to James' thesis.  He analysed this event further, and what he found was very interesting:

From James' thesis (find it on the "Research > Student Projects" page)


There was a mega HUGE 94% drop in collective power output over an 85 minute period. But WHY? Don't forget, we need to connect these things back to the meteorology of the event.  Look no further than the BoM radar imagery:

SYD storms ramp event

This event was triggered by some severe thunderstorm activity, with the power output reduction being connected to the anvil of these thunderstorms blowing downstream within a strongly sheared wind environment. 
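We can roughly check those numbers against our own mndf series. Here's a sketch that assumes 5-minute data (so a 17-step window spans 85 minutes); adjust w to your reporting interval:

# largest collective drop over an ~85 minute window, as a fraction of the daily peak
w = 17 # 17 x 5-minute steps = 85 minutes (assumed interval)
n = length(mndf)
drops = mndf[1:(n - w)] - mndf[(1 + w):n]
max(drops, na.rm = TRUE) / max(mndf, na.rm = TRUE)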

So here we can connect my two loves! Solar & Severe Storms.

Isn't science awesome?

Coming up next: Using SPMQ to get to KPV calculations.


7. Clear Sky Radiation Estimates with CSMQ

Clear Sky Radiation Models and CSMQ!

Clear sky radiation models provide estimates of the total radiation reaching the surface of the Earth under cloudless conditions. These important models form the basis for radiation data quality control and analysis, and they are an inherent part of satellite-based radiation estimates. Clear sky radiation models coupled with a PV system simulation are also a great tool for PV data quality control and for simulating unmonitored PV system power output via KPV.

The anusolar Rpackage makes computing clear-sky radiation models very easy, and comes pre-loaded with the following modelling options:

Global clear sky models

Beam (Direct) clear sky models

(The individual models in each category are listed under the modl options below.)

The relevant citations for each of these models, and the basic formulas required for their computation, can be found in my Solar Energy paper, which validated these models using Australian radiation data.

Now, onto computing these models!

Much like in the last post, we'll start by following along with the Engerer 2 separation model example for Melbourne (R Manual Entry #4).  You can find this example code in the /code/examples/ directory of the Rpackage.

# Example script for calculating the diffuse fraction
##----####----####----####----####----##
## First Example: using the Engerer2 & Perez models
# using Melbourne data provided by the Bureau of Meteorology
# for 15 January 2013, only data provided by default in repository

pathto = "" # Enter your path to repository here, end with "/"

# load environment & initialise
source(paste(pathto,"code/rpkg/init.R",sep=""))
init(pathto)

# set the local region ('macro')
mlb = rad.meta("MLB")

# load in example radiation data, 15 Jan 2013 in Melbourne
indf = read.rad(mlb,2013,1,15) # macro, year, month, day, resolution (sec)

# generate solar position info for the time series
sza = SZAQ(indf,mlb)

# load in some weather data
wx = read.wx(mlb,2013,1,15)

# load up clear-sky model output, using rest2 model
csm = CSMQ(sza,wx,'rest2',mlb)

Reading through the above, we can see that in order to compute a clear sky model we need to:

  1. Initialise the R Environment
  2. Load in some metadata from our $macro
  3. Create a time-stamped data.frame, indf (here we use the radiation data from read.rad)
  4. Calculate the solar geometry
  5. Load in some weather data
  6. Compute the CSMQ function

Steps 1-4 should be familiar to you by now.  We covered some details about step 5 in the #3 Working with Data post, but we'll need some more information about weather data in a future post (coming soon!).

Step 6 is new, and introduces the CSMQ() function, which stands for Clear Sky Modelling (Quick - Vectorised). This whiz-bang ninja of a function allows us to compute any of the clear sky models from the above list.

This function requires 4 inputs:

sza, the output from the SZAQ function
cwx, timestamped weather observation data which matches the timestamps in sza
modl, a string value selecting the clear sky model to compute
ar, your metadata array (list object) which should be the same as what you used for SZAQ

The options for the modl variable are:

'rest2', Gueymard (2008)
'esra', Rigollier et al. (2000)
'ineichen_perez', Ineichen and Perez (2002)
'bird', Bird and Hulstrom (1981)
'iqbal', Iqbal (1983)
'mac', Davies and McKay (1982)
'solis', Ineichen (2008)

Note that these are character strings, and need to be wrapped in quotation marks within R. Using 'rest2' or 'esra' are my recommendations for any modelling in Australia.

The function returns several useful outputs:

$Egc,  global horizontal clear sky radiation
$Ebnc, beam normal clear sky radiation
$Ebhc, beam horizontal clear sky radiation
$Ebtc, beam tilted clear sky radiation (plane of array surface, PV systems only)
$Edhc, diffuse horizontal clear sky radiation
$Edtc, diffuse tilted clear sky radiation (plane of array surface, PV systems only)
$Egtc, global tilted clear sky radiation (plane of array surface, PV systems only)
$ama, absolute air mass
$aoir, angle of incidence (radians) (plane of array surface, PV systems only)

Let's take a look at the data...

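The plot commands have gone missing from this post, but a minimal reconstruction consistent with the caption below (all in blue: solid global, dashed beam horizontal, dotted diffuse horizontal) would be something like:

plot(csm$Egc ~ csm$ltms, typ = 'l', col = 'blue', lty = 1, ylab = 'W/m^2', xlab = 'Local Time') # global horizontal
lines(csm$Ebhc ~ csm$ltms, col = 'blue', lty = 2) # beam horizontal
lines(csm$Edhc ~ csm$ltms, col = 'blue', lty = 3) # diffuse horizontal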
Here's an example with the rest2 model.  Solid blue = Global, Dashed blue = Beam Horizontal, Dotted blue = diffuse horizontal.


It's also fun to see how different models behave:

csm.bird = CSMQ(sza,wx,'bird',mlb)
csm.esra = CSMQ(sza,wx,'esra',mlb)
csm.rest = CSMQ(sza,wx,'rest2',mlb)

plot(csm.bird$Egc~csm$ltms,typ='l',col='red',ylab='W/m^2',xlab='Local Time')
lines(csm.esra$Egc~csm$ltms,col='dark green')
lines(csm.rest$Egc~csm$ltms,col='blue')

Bird model - red, Esra model - green, rest2 model - blue. An interesting difference between Bird & Esra versus rest2!


With this information, you should be able to use each of the included clear sky radiation models in your analysis.  Be sure to read up on the papers to figure out how these models work; my review paper will be a great starting point! I intentionally included the basic functions of each original model in the manuscript, to make it easier for others to learn.


Be sure to check out all the other Rmanual posts.

6. Calculating Solar Geometry with SZAQ

One of the most fundamental steps in any scientific analysis of solar radiation or PV power output data is knowing the position of the sun in the sky.  This is why solar geometry calculations in anusolar are the starting point for anyone learning how to use this code package.

First, let's review the basics about solar geometry!

If we want to express the position of an object in space, we have to first adopt a coordinate system. For the sun, we use spherical coordinates, but we omit the radial distance of the sun from the centre of our coordinate system.  We only need to express the movement of the sun across the sky (azimuth) and its angle down from directly overhead (zenith angle, the complement of its height above the horizon).

Thanks NOAA for this graphic: http://www.esrl.noaa.gov/gmd/grad/solcalc/glossary.html

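To make the coordinate system concrete, here's a small illustrative function of my own (not part of the package) that converts a zenith/azimuth pair in radians into a unit vector pointing at the sun, assuming the azimuth convention SZAQ reports (0 N, W 90, E -90, S 180 - see below):

# unit vector to the sun from zenith/azimuth in radians
sun_vector = function(zenr, azir) {
  c(east  = -sin(zenr) * sin(azir), # azimuth is positive toward the West, hence the minus
    north =  sin(zenr) * cos(azir),
    up    =  cos(zenr))
}
sun_vector(zenr = pi/4, azir = 0) # sun 45 degrees above the horizon, due North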

OK, now let's calculate the sun's position according to these variables, using the...

The SZAQ Function

Once again, let's use the MLB diffuse fraction example code (using the Engerer2 separation model). You can find the example file in /code/examples/MLB.diffuse_fraction.r.  This loads the required macro metadata (see "working with data"), as well as some radiation data from the Melbourne airport. 


# Example script for calculating the diffuse fraction
##----####----####----####----####----##
## First Example: using the Engerer2 & Perez models
# using Melbourne data provided by the Bureau of Meteorology
# for 15 January 2013, only data provided by default in repository

pathto = "" # Enter your path to repository here, end with "/"

# load environment & initialise
source(paste(pathto,"code/rpkg/init.R",sep=""))
init(pathto)

# set the local region ('macro')
mlb = rad.meta("MLB")

# load in example radiation data, 15 Jan 2013 in Melbourne
indf = read.rad(mlb,2013,1,15) # macro, year, month, day, resolution (sec)

# generate solar position info for the time series
sza = SZAQ(indf,mlb)


Once we have the sza list object, we have all the required solar geometry variables for calculating the sun's position, as well as some other helpful outputs:

$ltms, $gtms timestamps as described elsewhere
$zen, $zenr the solar zenith angle in decimal degrees and radians, respectively
$azi, $azir the solar azimuth angle in degrees/radians from North (0 N, W 90, E -90, S 180)
$nd, integer day of the year (1-365)
$int, the reporting interval of the input data in seconds
$dec, decimal time
$ha, local solar noon in decimal time

So, what can we do with this? Well, we can plot the zenith angle with time:

plot(sza$zen~sza$ltms,typ='l',col='blue')

sza zenith plot

We could use the zenith angle to filter out the overnight periods from some PV data:

tmp.Egh = indf$Egh # get the global radiation values
tmp.Egh[which(sza$zen > 0.9)] = NA # $zen > 0.9 means a zenith angle beyond 90 degrees, i.e. night

Some basic radiation data quality control!


Or we can pass sza directly to the clear sky radiation modelling function (CSMQ - coming up next!).

I hope that was useful! Don't forget to check out all the other Rmanual posts.

 

5. Example: Playing around with some quality controlled PV data

[under construction]

Following on from the previous example using the Engerer2 and DIRINT models in the anusolar Rpackage, I'd like to demonstrate how we can use the package to analyse some power output data as well.

Firstly, you should download my 100 site, quality controlled PV power output dataset from Canberra.  This is a beta version of a research grade dataset that I hope to make available in the near future.

Here's what you need to do...

Download the beta version dataset at this link

Take the above data and place it in:

/data/CBR/

within your cloned/forked version of the anusolar Rpackage.

Once this is done, you can work with the data by first creating a macro metadata array for CBR (Canberra), using the quality control flag (qc = TRUE).  Then you can load data using the year, month and day, for any dates between 1 January 2013 and 1 July 2015. 

Loading in some data!

cbr.qc = ar.in("CBR",qc=T)
qcdf = read.pvo.qc(cbr.qc,2015,1,1) # YYYY,MM,DD e.g. here: 1 jan 2015

Now we can look at the quality controlled data.frame object, qcdf:

Where $gtms, $ltms are UTC and local time in POSIX format.

$kwr is raw power output data from PV system, normalised to installed capacity (kW/kWp)
$qwr is quality controlled power output
$kpd is the KPV calculation for the QC data
$kwu is the re-rated power output (adjusting for soiling, shading - note this is experimental)
$kwc is the clear sky power output curve (modelled)
$kws is the statistical clear sky power output (see QC chapter of PhD)

Important information about the data

Now, regarding the above, here are a few things for consideration. 

Firstly, $qwr and $kwu were created using the quality control algorithm published in my thesis (see part 2).

Next, $kwr is divided by the rated capacity of the PV system. To get back to kW, you'll need to use the metadata array, cbr.qc as follows:

kw_actual = qcdf$kwr * cbr.qc$ar

where cbr.qc$ar is the PV array rating in Watts (see the "Working with Data" post for more information on variable names).
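Since $kwr holds one column per site while cbr.qc$ar holds one rating per site, applying the ratings column by column needs something like sweep(). A sketch, assuming the columns of $kwr line up with the site order in cbr.qc (dividing by 1000 because $ar is in Watts):

# re-rate every site's normalised output (kW/kWp) back to kW
kw_actual = sweep(qcdf$kwr, 2, cbr.qc$ar / 1000, "*")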

$kpd and $kwc values aren't perfect and could use some further refinement.  The KPV approach is sensitive to misalignment of the simulated PV system.  I recommend that you instead use the $kws variable for the clear sky curve (based on the Lonij et al. 2013 approach).

Now, to the good stuff! The power output values from these PV sites are generally very good after the quality control algorithm, so $qwr is a great place to start trying out some science - with the exception of site #19249, which should be omitted (thanks to Aloysius Aruputera of SERIS for spotting this one).  The dataset is also relatively complete (for distributed PV data anyway!), meaning that up to 85-90 sites are available at any given time for analysis.

Thanks to my collaborators at SERIS for completing this data availability analysis.

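Dropping the bad site before analysis is straightforward; a sketch, assuming the site id appears in the column names of $qwr:

# omit the known-bad site before doing science
keep = !grepl("19249", colnames(qcdf$qwr))
qwr.clean = qcdf$qwr[, keep]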

If you want to see the individual site metadata, you can call the variable cbr.qc.  You can also see this data by navigating to:

/data/CBR/meta/qc.sites_info.csv

These sites are located all around Canberra, here's a map for reference (from Engerer and Hansard 2015):

Red dots = PV sites, Black lines = Canberra suburbs/localities


This allows you to do all sorts of fun things. For starters, you can plot some data and get a feel for what type of variability was present:

Data from the first 10 sites in the dataset...


Here we can see a partly cloudy day present with some fairly regular, low amplitude variability.  This type of variability would smooth out quite nicely in the mean:

Here again with the mean from all sites in black - check out that geographic smoothing!


Or what about a different day? Wellby and Engerer 2016, my publication on critical collective ramp events, tells us that 19 February 2014 was a negative ramp event day, so let's check it out:

qcdf = read.pvo.qc(cbr.qc,2014,2,19) # note: use the QC metadata object cbr.qc from above
plot(qcdf$qwr[,1]~qcdf$ltms,typ='l',ylim=c(0,1.2))
for(i in 2:10){lines(qcdf$qwr[,i]~qcdf$ltms,col=i)}
mndf = rowMeans(qcdf$qwr,na.rm=T)
lines(mndf~qcdf$ltms,lwd=3,col='black')

19 February 2014, a gnarly negative critical collective ramp event


Wowzers! Check that out! The mean power output from all of the PV sites around Canberra drops like a rock due to a thunderstorm event.  That's awesome - and it combines my two favourite things: energy & meteorology.

 

 

 

4. Example: Using the Engerer2 Separation Model

Alright, let's do some science! Using the Engerer2 separation model.

One of the primary reasons for getting the anusolar rpackage out onto the internet is that it allows anyone to use the Engerer2 separation model, which was recently so well reviewed by Gueymard & Ruiz-Arias 2015.  As a result, I've had many scientists asking for the code to compute this model, so let's do it!

Thankfully, I am super keen to share my coding environment with you.  After all, this isn't my model, it's our scientific community's model. And I want you to test it, trash it, use it, improve it, and above all, do the science necessary to integrate more solar in more places around the globe. So let's check out this snazzy model for separating global radiation measurements from a pyranometer into their direct and diffuse components.

Firstly, you're going to need the rpackage, so go back and follow through steps 1-3. I'll put them right here in case you need them:


Once you have the package repository cloned/forked and initialised, navigate your way to /code/examples/, where you'll find the file "MLB.diffuse_fraction.r".  Open this file in your editor of choice.

The code asks you to enter the path to the code repository in the pathto variable, so do that first.

# Enter your path to repository here, end with "/"
pathto = "" # e.g. pathto = "~/anusolar/"

# load environment & initialise
source(paste(pathto,"code/rpkg/init.R",sep=""))
init(pathto)

Next up, we load the macro region. You can read more about how the data environment works in the previous post.  The macro tells the code where you are operating; in this example, the macro region is already set up for Melbourne ("MLB").

# set the local region ('macro')
mlb = rad.meta("MLB")

Take a look at the 'mlb' metadata array, and you'll see several variables within this list object.  Included is the $loc three-letter identifier, $lat/$lon in decimal degrees, $alt for altitude - all things you can learn a bit more about in the "Working with Data" R manual page.


To play around with the Engerer2 model, I've included one day of example radiation and weather data for you, from 15 January 2013, so we'll load that in:

# load in example radiation data, 15 Jan 2013 in Melbourne
indf = read.rad(mlb,2013,1,15) # macro, year, month, day, resolution (sec)

# load in some weather data
wx = read.wx(mlb,2013,1,15)

If you want to load in your own data, you'll need to feed in data in the same format. So take a look at the indf and wx data.frame objects and their variable names. I'll post more data and explanations in the future.

This example data is from the Bureau of Meteorology radiation site at the Melbourne airport; here you have 1-minute data with UTC ($gtms) and local daylight time ($ltms) timestamps.  The remaining variables are described in the "Working with Data" R manual page.


Next, we need to create the necessary solar geometry calculations. This is accomplished through a function called "SZAQ", which stands for Solar Zenith Angle Quick (meaning that it is vectorised).

# generate solar position info for the time series
sza = SZAQ(indf,mlb)

Plotting some of the output from SZAQ, $zen is the zenith angle in decimal degrees (multiply by 100 to get degrees) and $ltms the local time.  Look at that sexy sun move across the sky, that's looking like some science.


With the solar geometry at hand, as well as the weather data we loaded in earlier (a blog post on these items in the near future), we can now use the CSMQ function.  That stands for "Clear Sky Models (Quick)", which features all of the clear-sky models in my 2015 Solar Energy validation publication.  For the purposes of the Engerer 2 model, we'll use the REST2 clear sky model, which is the preferred model for the Engerer2 computation.

# load up clear-sky model output, using REST2 model
csm = CSMQ(sza,wx,'rest2',mlb)

Here it is, warts and all (we can ignore the low sun angle behaviour, I'll fix that), a clear sky radiation estimate from the REST2 model, where the $Egc represents the global horizontal clear sky radiation in kW/m^2.


Next up, we can calculate the Engerer2 model to separate our global radiation measurements from the MLB radiation station.

# calculate the diffuse fraction via Engerer 2
Edh = engerer2(indf$Egh,sza,csm) # inputs from global horizontal irradiance (Egh), SZAQ, CSMQ
Edf_e = Edh/indf$Egh # calculated diffuse fraction

# note: here, Egh values should match up, row for row with sza$gtms/ltms

Lucky you, you can also compute the DIRINT model of Richard Perez...

# calculate the diffuse fraction via Perez model
Edh = perez.sep.wrapper(indf$Egh,sza,wx,mlb)
Edf_p = Edh/indf$Egh

And we can also take a look at what we just did. Let's use the classic Kt (clearness index) and DF (diffuse fraction) method of visualising the data:

# visualise:
Eext = spencer(sza) # extraterrestrial irradiance (Spencer formula)
EextH = Eext*cos(sza$zenr) # projected onto the horizontal plane
Kt = indf$Egh/EextH # clearness index

DF = indf$Edh/indf$Egh # observed diffuse fraction

plot(Kt,DF,xlim=c(0,1),ylim=c(0,1),pch=19,cex=.30) # plot obs
points(Kt,Edf_e,col='blue',pch=19,cex=0.3) # Engerer 2
points(Kt,Edf_p,col='red',pch=19,cex=0.3) # Perez*

NOTE
# radiation data should always be processed through a quality control algorithm
# the data used here are not filtered, and are raw data

Black is observed data, blue is the Engerer2 model and Red is the DIRINT model.  Looks like Richard wins this time - argh! :-)


And there you have it, our first functional example!

Now, for this to be truly useful, I need to cover a few more items:

  • More information about setting up your own macro
  • More information about loading in your own data
  • Providing some additional datasets
  • Examples of all the functions included in this entry (and more!)

So I've got some more work to do, but trust that I'll get around to it soon :-)

For now, I'm glad to introduce you to how to use this model!

 

3. Working with Data

Working with Data in anusolar

One of the primary reasons for the formulation of this Rpackage in a non-traditional format is the way in which it allows you to work with your own data (or even data provided by the ANU).  The "/data/" folder is a place for you to add any data relevant to your project.

The underlying structure of the data within this code package relies on macro and micro locations.  The macro level is data organised by a regional level, most commonly a greater metropolitan area or the city limits.  These are assigned a three-letter identifier, comprised of the first three consonants of a location (e.g. Melbourne - MLB, Sydney - SYD, Freiburg - FRB).  Note CBR for Canberra is an exception as its creation predates the creation of this data structure.  Micro locations are sub-regions within a macro, also given a three-letter identifier, and are organised under the macro location in the data directory.

Data are to be placed in the data/ directory under the root folder (see Initialising).  Under the macro ID, data are then organised by primary type, into the directory PVO/ or RAD/.

The data required for this package to “do things”, includes primary data (PV or radiation data), supplementary data (such as meteorological data or atmospheric data), and metadata (data about the location of your observations and properties of the reporting equipment).  

Let's start off discussing the metadata...


1. Metadata

The primary data are assumed to be broken down by individual sites, where each stream of data has an individual location with its own unique meta properties.  There should be meta-data for each macro (see next section, Data Structure).

The required data (minimum for functional code) are $lat, $lon, $alt.  For the radiation or pv data we'll need some extra information.

Macro_Meta
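As a rough illustration of that minimum (the full field set is shown in the table above), a hand-built macro metadata list might look like the following - the values here are made up:

# the minimal shape of a macro metadata list (illustrative values only)
frb = list(loc = "FRB", # three-letter macro identifier
           lat = 48.00, # decimal degrees
           lon = 7.85,  # decimal degrees
           alt = 280)   # altitude in metres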

Radiation Site Metadata

Radiation measurement site data is placed into the /data/locations.csv file.  Several of the Bureau of Meteorology radiation stations are already in this file. If you want to add a location, you'll need to place it in this file.

You can create a metadata list object for a radiation site by using the rad.meta function. You can see this done in the Engerer2 separation modelling example for Melbourne, done as follows:

mlb = rad.meta("MLB")

In the mlb metadata list object, you'll find the variables below.  Of particular interest are the $wxid and $radid variables.  These correspond to the Australian Bureau of Meteorology radiation or weather data site numbers, with leading zeros removed.  If you are working with data outside of Australia, use the World Meteorological Site Number. The $radius value, in kilometres, specifies what zone is included in the macro region, by drawing a circle of that radius around the lat/lon values.

Timezones are also an important thing to list here, including the local time offset from GMT ($lst, $dst).  For more information on $cutz, see this webpage. 

PV Metadata

PV metadata refers to all of the important information required for simulating PV system power output through this Rpackage. The database is a very simple .csv file, which is stored in either:

/data/$macro_id/meta/sites_info.csv [for raw PV data sites]

/data/$macro_id/meta/qc.sites_info.csv [for quality controlled data sites]

I'll explain the $macro_id more below!

These PV metadata objects ("lists" in R) can be loaded using pre-made functions in the rpackage, for example to load raw data PV sites:

cbr = ar.in("CBR") # here $macro_id = "CBR"

Or for the quality controlled data:

cbr.qc = ar.in("CBR",qc=T)

Try the above code out! In these list objects, you'll see the following variables:

pv metadata

In order to simulate a PV system's power output, you'll require at least the following:

$sid a unique numerical identifier for each site
$nm, $ms, $mp, $mr information about the modules and their place in the array layout
$mm, see module matching below
$ni,$ir inverter details
$im, see inverter matching
$int the reporting interval of the PV data in seconds

Module Matching:

You'll need to pick the best match (the value of the line number in the database) for your module from within the Sandia Module Database, located in: /code/rpkg/support/SPM/SPMDb.csv.

Inverter Matching:

You'll need to pick the best match (the value of the line number in the database) for your inverter from within the Sandia Inverter Database, located in: /code/rpkg/support/SPM/inv_lib.csv.
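A quick way to hunt for your inverter's row number ($im) is to search the names file described in the SPMQ post. A sketch, assuming inv_types.txt holds one inverter name per line and pathto is your repository root:

# find candidate row numbers for an inverter by name
inv_names = readLines(paste0(pathto, "code/rpkg/support/SPM/inv_types.txt"))
grep("SMA", inv_names) # e.g. rows matching SMA inverters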

Note: this package does not yet support split array layouts directly; that feature will show up soon.


2. Supplementary Data

There are two types of supplementary data that are required for analyses of the primary data: meteorological data and atmospheric data.  Meteorological data, in the case of Australia, is sourced from the Bureau of Meteorology, but may be provided by you for other locations.  Atmospheric data, which includes values like ozone concentration and atmospheric turbidity, should be downloaded from the SoDa webpage. Let's dig into these, starting with...

Supplementary Data: Atmospheric Data

In order for the clear-sky radiation modelling routines to operate, many of them require input data on the condition of several atmospheric variables:

  1. Ozone
  2. Water Vapour
  3. Aerosol Optical Depth
  4. Turbidity

These are all sourced from the SoDa database using the URL:
http://www.soda-is.com/eng/services/service_invoke/gui.php?

The SoDa database at the link above.  You can extract the data you need for each site using the lat, lon and altitude for your site.


You'll use this portal to extract the atmospheric data by entering the latitude, longitude and altitude of the location you are setting up. Use “Execute SoDa Service” for each of the variables. This only needs to be done once for each macro region (and does not need to be done if it is already set up for you).

These are then saved directly into the macro directory, e.g. data/MLB/lt.csv, data/MLB/wv.csv, etc. The best way to do this is to copy the example files from /data/MLB/ to your new directory, and fill in the values from the portal service.

Supplementary Data: Weather Data

Basic meteorological observations are required for computing several of the included radiation models and for simulating PV system power output. For this purpose, we'll use the /data/WX/$wxid/ directory where $wxid matches up with the locations.csv file (see above), including any leading zeroes required to be six digits long.  In this directory, weather data files should be broken into daily files and named according to the YYYY-MM-DD.rda format. Once they are there, we can load weather data via:

# set the local region ('macro')
mlb = rad.meta("MLB")
# load in some weather data
wx = read.wx(mlb,2013,1,15) #YYYY,MM,DD
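If you are adding your own weather files, remember the directory name is the zero-padded, six-digit station id. A sketch of building the expected daily file path (using station 66006, the Sydney site mentioned elsewhere in this manual):

# build the expected daily weather file path for station 66006
wxid.dir = sprintf("%06d", 66006) # "066006" -- padded to six digits
wx.file  = paste0(pathto, "data/WX/", wxid.dir, "/2013-01-15.rda")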

Weather variables in the above wx data.frame should obey the following naming format (e.g. $stn, $ltms, etc.):

wx vars

Lastly, you can see all of the weather stations available in Australia in the /data/WX/stations.csv file.


3. Primary Data

There are two primary types of input data to this R package: PV power output data (most commonly from PVOutput.org) and solar radiation data.  These will be referred to as PVO and RAD data respectively.  

Radiation Observation Data [RAD]

The radiation data within this environment is stored within the macro region directory, under /data/$macro_id/RAD/ in either the /qc/ or /raw/ directories.  The files are broken into days, according to YYYY-MM-DD.rda format.  These can be read using the function read.rad:

# load in example radiation data, 15 Jan 2015 in Melbourne
indf = read.rad(mlb,2013,1,15) # macro, year, month, day, resolution (sec)

The above example is from the Engerer2 Melbourne example.  If you look inside the data.frame indf, you'd find:

rad vars

If you want to add radiation data to your copy of anusolar, place it in the /RAD/raw/ directory.  The /qc/ directory will be for the QCRad function (more on that in future posts), which will quality control the raw radiation data.

PV power output data [PVO]

PV power output data should be stored under the PVO/ directory, where the organisation of data is broken down into subdirectories; these will be created the first time the setup.new.location() function is run:

all_sites
qc_sites

assembled
meta

The meta/ directory contains the metadata information organised into a csv file named sites_info.csv (see PV metadata above!).  It will also contain the qc.sites_info.csv file for quality controlled data (e.g. the data available in this post).

all_sites/ contains a single .rda file for each day with the file format YYYY-MM-DD.rda.  All of the sites within the macro location are collated into this file. Each file contains a data frame named all_sites with the variables $gtms, $ltms and then one column per site, named by $sid. For example, for site $sid = 1797, $1797 is the variable name for the power output at that site.
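For illustration, here's a sketch of reading one day's collated file by hand (read.pvo wraps this for you; the SYD macro and site 1797 follow the examples above):

# load a day of collated raw PV data directly from the all_sites/ directory
load(paste0(pathto, "data/SYD/PVO/all_sites/2013-11-22.rda")) # creates 'all_sites'
pv.1797 = all_sites$`1797` # power output column for site $sid = 1797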

qc_sites/ is arranged in the same format as all_sites/, but the data have been pre-processed by the quality control algorithm. The data in these files has a bit more complexity, so see the post on working with the quality controlled data.

assembled/ contains quality controlled, collated data, broken down into specified intervals, combined with multiple other data sources or post-processing fields.  Each interval of assembled data is organised by the number of seconds the output is averaged to, e.g. 3600 for hourly data, 600 for ten-minute data.  Files are output at intervals specified within the function assemble(), but most commonly are produced at daily, monthly and yearly intervals, with file names organised as YYYY-MM-DD.rda, YYYY-MM.rda and YYYY.rda, respectively.  Data are stored in the parent data frame "adata".

Within this package, the PV data variable names used are:


This concludes the introduction to data section of this online R manual for the anusolar package.  I hope it was helpful. Please use the comments section below to ask questions!

Check out the rest of the R manual in the other posts.

2. Initialising the Rpackage Environment

This next set of instructions assumes that you've already:

  1. Installed R (as well as RStudio, which I highly recommend)
  2. Cloned the Repository

The root directory ($PATH_TO_ROOT = your path) is the full path to the directory of the repository that you just cloned.  Each time you want to use this package, you’ll need to initialise the environment in R.  This loads up all the necessary functions and informs your local machine where all the required directories for code & data are located.  This is similar to a 'require()' or 'library()' command in R.


Initialise by the following:

source("$PATH_TO_ROOT/anusolar/code/rpkg/init.R")
init("$PATH_TO_ROOT/")

Examples:

OS X:

source("/Users/[user name]/anusolar/code/rpkg/init.R")
init("/Users/[user name]/anusolar/") # don't forget the last "/"

Windows:

source("\\Documents and Settings\\[user name]\\My Documents\\anusolar\\code\\rpkg\\init.R") # backslashes must be doubled in R strings
init("\\Documents and Settings\\[user name]\\My Documents\\anusolar\\")


Having trouble? Comment below!

NEXT up: working with data

1. Getting Access, Getting Started

Step 1: Sign-up for an Assembla Account. It's free!

At this stage, the anusolar Rpackage is not really a “package”, in that it is not freely available on the CRAN repository.  Instead, the package is maintained on Assembla as a git repository.  To access it, you'll need to sign up for an account at assembla.com, and send me your username via the contact form.

Once you’ve been granted access to the repository, you’ll see anusolar appear in your "spaces" tab on your Assembla homepage.  Navigate to this link and select Instructions from the top submenu under Git.  You’ll need to carefully follow these instructions, based on your operating system, in order to clone the repository to your computer (pay particular attention to the SSH keys).

Step 2: Get a git repo management system, I recommend SourceTree!

While there are many options for interacting with the repository, these manual pages will only go through how to use SourceTree to do so (it's made by Atlassian, a great Aussie start-up).  This is for the purpose of simplicity when working with students and collaborators.  Be forewarned: if you use another method, there won't be support for any errors.  And if you make a mistake with the repo, your access might be suspended.  So be sure you really know what you are doing prior to making any commit/push commands.

First, download SourceTree.  Next, launch the software and select +New Repository > Add existing local repository.  Point SourceTree to the copy of your local repository, and it should open another SourceTree window that allows you to see all the files/directories that have been added to your local hard drive. We’ll revisit SourceTree in later sections when we review How to interact with the repository [coming soon].

Next Up: Initialise the Rpackage Environment
