First Steps

This software comes with no data. It is generic software that automates the confrontation of model results with benchmark observational datasets. However, the best way to learn how to use it is with actual data. To this end we provide a relatively small sample which you can download. Extract this file to a location of your choosing with the following commands:

tar -xvf minimal_ILAMB_data.tgz
cd ILAMB_sample
export ILAMB_ROOT=$PWD

The ILAMB package uses the ILAMB_ROOT environment variable to point to the top level directory of the data. Later, when we reference specific data locations, we can specify them relative to this path. This both shortens the paths and makes the configuration portable to other systems or data locations.
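The role of ILAMB_ROOT can be illustrated with a minimal Python sketch. This is not the package's own code: the helper name resolve_data_path is hypothetical, and the /tmp/ILAMB_sample location is only an assumed example path.

```python
import os
from pathlib import Path


def resolve_data_path(relative_path, env=None):
    """Join a path from the configure file onto the ILAMB_ROOT directory.

    Hypothetical helper for illustration; ILAMB's internals may differ.
    """
    env = os.environ if env is None else env
    root = env.get("ILAMB_ROOT")
    if not root:
        raise RuntimeError("ILAMB_ROOT is not set; export it before running")
    return Path(root) / relative_path


# An explicit mapping stands in for the real environment so the sketch
# runs without changing any shell settings. The root path is an assumption.
print(resolve_data_path("DATA/rsus/CERES", {"ILAMB_ROOT": "/tmp/ILAMB_sample"}))
```

Because only the root changes between systems, the same relative paths keep working when the data is moved.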

The following tree represents the organization of the contents of this sample data:

├── DATA
│   ├── albedo
│   │   └── CERES
│   │       └──
│   └── rsus
│       └── CERES
│           └──
└── MODELS
    └── CLM40cn
        ├── rsds
        │   └──
        └── rsus

There are two main branches in this directory. The first is the DATA directory, where we keep the observational datasets, each in a subdirectory bearing the name of the variable. While it is not strictly necessary to follow this form, it is a convenient convention. The second branch is the MODELS directory, in which we see a single model result from CLM.

Configure Files

Now that we have data, we need to set up a file which the ILAMB package will use to initiate a benchmark study. Such a file, called sample.cfg, comes with the software package in the demo directory. Navigate to the demo directory and open this file or view it online. We also reproduce it here for the purpose of this tutorial:

# This configure file specifies the variables

[h1: Radiation and Energy Cycle]
bgcolor  = "#FFECE6"

[h2: Surface Upward SW Radiation]
variable = "rsus"

source   = "DATA/rsus/CERES/"

[h2: Albedo]
variable = "albedo"
derived  = "rsus/rsds"

source   = "DATA/albedo/CERES/"

We note that while the ILAMB package is written in Python, this file contains no Python and is written in a small configure language of our invention. Here we will go over this file line by line and explain how each entry functions.

At the top of the file, you see the following lines:

[h1: Radiation and Energy Cycle]
bgcolor = "#FFECE6"

This is a tag that we use to tell the system that we will have a top level heading h1 which we call Radiation and Energy Cycle. While you can name this section anything you like, we have chosen this name because it describes the benchmarking activities we will perform. Also note that you may specify a background color here in hexadecimal format (we found this site to be helpful to play around with colors). This color will be used in the output which we will show later. It is important to understand that headings are hierarchical: this heading owns everything underneath it until the next h1 tag is found or the file ends. We use h1 level headings to group variables of a given type to better organize the output.

Below this, you will notice a second level heading which appears like this:

[h2: Surface Upward SW Radiation]
variable = "rsus"

We will be looking at radiation here. The variable tag is the name of the variable inside the dataset which represents the variable of interest. Here rsus is a standard name used to represent Surface Upward Shortwave Radiation. We use h2 headings to represent a variable which we wish to compare.

The next entry in the file appears as the following:

source   = "DATA/rsus/CERES/"

First, notice the absence of an h1 or h2 tag. This indicates that this entry is a particular dataset of a given variable (our h2 heading) of a given grouping (our h1 heading). We have named it CERES as that is the name of the data source we have included. We only have to specify the location of the source dataset, relative to the environment variable we set earlier, ILAMB_ROOT.

At this point we feel it is important to mention that this is the minimum required to set up a benchmark study in this system. If you have an observational dataset which maps directly to a variable that models output, as rsus does, you are done.
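The hierarchy described so far can be sketched with a short Python parser. This is an illustration only, not the parser ILAMB uses: the function name parse_config and the dictionary layout are hypothetical, and the sketch simply attaches each key = value line to the most recent heading.

```python
import re

# Patterns for the two kinds of lines in the sample: headings and settings.
HEADING = re.compile(r"^\[(h1|h2):\s*(.+?)\]$")
SETTING = re.compile(r'^(\w+)\s*=\s*"(.*)"$')


def parse_config(text):
    """Return a list of h1 groups, each owning the h2 entries that follow it."""
    groups = []
    target = None  # the dict that the next key = value line belongs to
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        heading = HEADING.match(line)
        if heading:
            level, name = heading.groups()
            target = {"name": name}
            if level == "h1":
                target["children"] = []
                groups.append(target)
            else:
                if not groups:
                    raise ValueError("h2 heading before any h1 heading")
                groups[-1]["children"].append(target)
            continue
        setting = SETTING.match(line)
        if setting and target is not None:
            key, value = setting.groups()
            target[key] = value
    return groups


sample = """
[h1: Radiation and Energy Cycle]
bgcolor  = "#FFECE6"

[h2: Surface Upward SW Radiation]
variable = "rsus"

source   = "DATA/rsus/CERES/"
"""

groups = parse_config(sample)
print(groups[0]["name"], [child["variable"] for child in groups[0]["children"]])
```

The h1 entry owns a list of children, which mirrors the rule that a heading owns everything beneath it until the next h1 tag or the end of the file.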

However, it is possible that your dataset has no direct analog in the list of variables which models output and some manipulation is needed. We have support for when your dataset corresponds to an algebraic function of model variables. Consider the remaining entries in our sample:

[h2: Albedo]
variable = "albedo"
derived  = "rsus/rsds"

source   = "DATA/albedo/CERES/"

We have done two things here. First, we started a new h2 heading because we will now look at albedo. But albedo is not a variable which is included in our list of model outputs (see the tree above). However, we have both upward and downward radiation, so we can compute albedo. This is accomplished by adding the derived tag and specifying the algebraic relationship. When the ILAMB system looks for the albedo variable for a given model and cannot find it, it will try to find the variables which are the arguments of the expression you type in the derived tag. It will then combine them automatically and resolve unit differences.
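The idea behind the derived tag can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not ILAMB's implementation: the function names are hypothetical, the numbers are made up, the variables are plain lists rather than gridded data, and unit handling is omitted entirely.

```python
import ast


def expression_variables(expression):
    """Return the variable names that appear in an algebraic expression."""
    tree = ast.parse(expression, mode="eval")
    return sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})


def evaluate_derived(expression, model_variables):
    """Evaluate the expression element by element over the model's variables."""
    names = expression_variables(expression)
    if not names:
        raise ValueError("expression does not reference any variable")
    missing = [name for name in names if name not in model_variables]
    if missing:
        raise KeyError(f"model is missing variables: {missing}")
    code = compile(ast.parse(expression, mode="eval"), "<derived>", "eval")
    length = len(model_variables[names[0]])
    results = []
    for i in range(length):
        point = {name: model_variables[name][i] for name in names}
        # Builtins are disabled so only the named variables are visible.
        results.append(eval(code, {"__builtins__": {}}, point))
    return results


# Made-up values standing in for model output.
model = {"rsus": [30.0, 50.0], "rsds": [200.0, 250.0]}
print(expression_variables("rsus/rsds"))
print(evaluate_derived("rsus/rsds", model))
```

A real implementation would also need to handle points where the denominator is zero and reconcile the units of the inputs, both of which this sketch ignores.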

The configuration language is small, but allows you to change a lot of the behavior of the system. Non-algebraic manipulations are also possible, but will be covered in a more advanced tutorial.

Running the Study

Now that we have the configuration file set up, you can run the study using the ilamb-run script. Execute the command:

ilamb-run --config sample.cfg --model_root $ILAMB_ROOT/MODELS/ --regions global

If you are on some institutional resource, you may need to launch the above command using a submission script, or request an interactive node. As the script runs, it will yield output which resembles the following:

Searching for model results in /Users/ncf/sandbox/ILAMB_sample/MODELS/


Parsing config file sample.cfg...

                   SurfaceUpwardSWRadiation/CERES Initialized
                                     Albedo/CERES Initialized

Running model-confrontation pairs...

                   SurfaceUpwardSWRadiation/CERES CLM40cn              Completed  37.3 s
                                     Albedo/CERES CLM40cn              Completed  44.7 s

Finishing post-processing which requires collectives...

                   SurfaceUpwardSWRadiation/CERES CLM40cn              Completed   3.3 s
                                     Albedo/CERES CLM40cn              Completed   3.3 s

Completed in  91.8 s

What happened here? First, the script looks for model results in the directory you specified in the --model_root option. It treats each subdirectory of the specified directory as a separate model result. Since we only have one such directory, CLM40cn, it found that and set it up as a model in the system. Next it parsed the configure file we examined earlier. We see that it found the CERES data source for both variables as we specified it. If the source data is not found or some other problem is encountered, the green Initialized is replaced by red text which explains what the problem was (most likely MisplacedData). If you encounter this error, make sure that ILAMB_ROOT is set correctly and that the data really is in the paths you specified in the configure file.
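The model discovery step can be approximated with a few lines of Python. This is a sketch of the idea only; the function name find_models is hypothetical and the script's actual logic may differ. The demonstration builds a throwaway directory that mimics the sample layout.

```python
import tempfile
from pathlib import Path


def find_models(model_root):
    """Treat each subdirectory of model_root as one model result."""
    root = Path(model_root)
    if not root.is_dir():
        raise FileNotFoundError(f"model root not found: {root}")
    return sorted(path.name for path in root.iterdir() if path.is_dir())


# Demonstration on a temporary directory; plain files are ignored.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "MODELS" / "CLM40cn").mkdir(parents=True)
    (Path(tmp) / "MODELS" / "notes.txt").write_text("not a model")
    print(find_models(Path(tmp) / "MODELS"))
```

A check like this is also a quick way to confirm that the directory passed as the model root contains what you expect before launching a longer run.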

Next we ran all model-confrontation pairs. In our parlance, a confrontation is a benchmark observational dataset and its accompanying analysis. We have two confrontations specified in our configure file and one model, so we have two entries here. If the analysis completes without error, you will see a green Completed text appear along with the runtime. Here we see that albedo took a few seconds longer than rsus, presumably because we had the additional burden of reading in two datasets and combining them.

The next stage is the post-processing. This is done as a separate loop to exploit some parallelism. All the work in a model-confrontation pair is purely local to the pair. Yet plotting results on the same scale requires knowing the maximum and minimum values from all models, and thus requires the communication of this information. Here, as we are plotting only over the globe and not extra regions, the plotting occurs quickly.
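Why a shared plot scale needs information from every model can be seen in a small Python sketch. It is only an illustration: the function names, the model names modelA and modelB, and the values are all made up, and everything runs in a single process rather than in parallel.

```python
def local_limits(values):
    """Work that is local to one model-confrontation pair."""
    return min(values), max(values)


def shared_limits(per_model_limits):
    """Combine the local limits into one scale used for every plot."""
    lows, highs = zip(*per_model_limits.values())
    return min(lows), max(highs)


# Made-up values for two hypothetical models.
results = {
    "modelA": [12.0, 48.5, 30.1],
    "modelB": [9.5, 41.0, 52.3],
}
limits = {name: local_limits(values) for name, values in results.items()}
print(shared_limits(limits))
```

The first function needs nothing beyond its own pair, while the second needs the limits from all models, which is why that step cannot be finished until every pair has reported its values.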

Viewing the Output

The whole process generates a directory of results which by default is called _build. To view the results locally on your computer, navigate into this directory and start a local http server:

python -m http.server

You should see a message similar to this:

Serving HTTP on port 8000 ( ...

Open this link in your browser and you will see a webpage with a summary table in the center. As we have so few variables and a single model at this point, the table will not be very helpful. As we add more variables and models, this summary table helps you understand relative differences in scores among models. For now, clicking on a row of the table will expand it to reveal the underlying datasets used. Clicking on CERES will take you to another page which presents detailed scores and plots.