Imagine you are a reinsurance pricing actuary tasked with pricing (or costing) an excess of loss contract. A typical method would be to determine the expected number of claims excess of some threshold, and then to choose a severity distribution representing the probability of different sizes of loss above that threshold. A Pareto curve would be a typical example, or you might use a semi-parametric mixed exponential distribution. Assuming these distributions represent only the ceding company's incurred loss, you can also apply the ceding company's limit profile to get what's called an exposure estimate of the loss to the layer.

But what about the ceding company's allocated loss adjustment expenses, commonly known as ALAE? For many lines of business these expenses are covered in addition to the insured's policy limit, and that is the case we will assume here. Usually an excess of loss reinsurance contract will cover some of the ceding company's ALAE for claims in the layer, or even below the layer. There are two common reinsurance treatments: **ALAE included**, which means you add the indemnity and ALAE together and the reinsurer is responsible for however much of that sum falls in the layer, or **ALAE pro-rata**, which means the reinsurer pays the same percentage of the ALAE as it paid of the loss.

So we need to adjust our exposure loss cost estimate for ALAE. The traditional, and still very common, way this is done is to select an overall ratio of ALAE to loss, e.g. 5% or perhaps 20%, and then multiply each indemnity value by that ratio to determine the amount of ALAE for that claim. For example, with a 20% ALAE load a $1M indemnity loss would have exactly $200k of ALAE, and every $1M claim would have exactly that same amount of ALAE.

While this seems reasonable, it actually makes two very strong implicit assumptions: it forces the distribution of ALAE to be a scaled copy of the distribution of indemnity, and it forces the two to be 100% correlated. We might suspect that the ALAE distribution is not a scaled copy of the indemnity distribution, especially if policy limits cap the indemnity, which they often do. For example, a $1M indemnity limit and a 20% ALAE load imply a maximum possible ALAE of $200k on any claim. We will look at the data further down to evaluate the correlation assumption.
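A quick base-R sketch makes both implicit assumptions visible. The indemnity amounts below are hypothetical, chosen only for illustration, and are capped at a $1M policy limit with a 20% load applied to each claim:

```r
# Hypothetical ground-up indemnity amounts, capped at a $1M policy limit.
indem_example <- pmin(c(150000, 400000, 750000, 1200000, 3000000), 1000000)
alae_ratio <- 0.20                           # Fixed ALAE-to-indemnity ratio (assumed)
alae_example <- alae_ratio * indem_example   # Classical assumption: ALAE is a fixed percent of indemnity

# ALAE is an exact scaled copy of indemnity, so the two are perfectly correlated...
cor(indem_example, alae_example)
# ...and no claim can carry more ALAE than 20% of the policy limit.
max(alae_example)
```

However much the real ALAE varies from claim to claim, this approach can never produce a $1M claim with, say, $400k of ALAE.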

Claim data with indemnity and ALAE information is an example of a bivariate distribution: random points living on the x-y plane with a certain probability distribution. At first, you might recall all the one-dimensional distributions actuaries use (Pareto, gamma, normal, exponential, etc.) and think there must be the square of that many bivariate distributions. Luckily, Sklar’s theorem states that a bivariate distribution is completely defined by the marginal distributions of the variables, i.e. the univariate distribution of x by itself and of y by itself, together with the copula, which relates their cumulative distribution functions:

\begin{equation} F(x,y) = C(u,v), \qquad u = F_X(x), \; v = F_Y(y) \tag{1} \end{equation}

A copula is simply a bivariate (or multivariate) distribution on the unit square (or cube, etc.). This means that we can fit univariate distributions to ALAE and indemnity each in isolation, something more actuaries are comfortable with, and then fit a copula to the bivariate ALAE-indemnity data transformed to $[0,1]\times[0,1]$ without worrying about any loss of information.
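To make equation (1) concrete, here is a minimal base-R sketch that builds a joint CDF from two margins and a Gumbel copula, $C(u,v) = \exp\left(-\left[(-\ln u)^\theta + (-\ln v)^\theta\right]^{1/\theta}\right)$, the family used later in this post. The exponential margins, their means and $\theta = 1.3$ are arbitrary assumptions for illustration, not fitted values:

```r
# Gumbel copula CDF on the unit square; theta >= 1 (theta = 1 gives independence).
gumbel_cdf <- function(u, v, theta) {
  exp(-((-log(u))^theta + (-log(v))^theta)^(1 / theta))
}

# Sklar's theorem: F(x, y) = C(F_X(x), F_Y(y)).
# Illustrative margins only: exponential indemnity (mean 300k) and ALAE (mean 100k).
joint_cdf <- function(x, y, theta) {
  u <- pexp(x, rate = 1 / 300000)
  v <- pexp(y, rate = 1 / 100000)
  gumbel_cdf(u, v, theta)
}

# Probability that indemnity <= 500k and ALAE <= 150k under these assumptions.
joint_cdf(500000, 150000, theta = 1.3)
```

Setting either argument of the copula to 1 returns the other margin unchanged, which is exactly why the margins and the dependence can be fitted separately.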

See the references at the bottom for more information.

Back to the premise: you are a reinsurance actuary trying to price an excess of loss contract. You already have a severity curve for indemnity, as discussed in the first paragraph, and an estimate of the expected number of claims excess of a certain threshold. You have a dataset of indemnity and ALAE amounts for a set of claims. We will not worry about trend or development (assume everything is trended and at ultimate already). What you are trying to do is refine the traditional assumption of ALAE as a fixed percent of indemnity and use a copula to model the bivariate nature of claims.

What we will do in this blog post is present and walk through R code that performs this analysis step by step. If you download the .csv files attached (see link at bottom of page), you should be able to follow along and reproduce the results. We will discuss places where changes could be made as well. The full R code is embedded in the blog post and also attached.

Most of the functions we use have thankfully been programmed by somebody else, and loading these packages gives us access to them. R can download and install them automatically from CRAN with install.packages(), or you might have to download and install them manually.

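```r
library("actuar")  # Actuarial distributions (Pareto, log-gamma)
library("stats")
library("stats4")  # mle()
library("copula")  # Copula fitting and simulation
library("distr")   # Distribution objects for the indemnity exposure curve
# If a package is missing, install it first, e.g. install.packages("copula")
```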

We started by importing a file containing the loss data, with a column for indemnity (I will refer to indemnity as loss in the code; hopefully it will be clear from the context) and a column for ALAE. The first handful of rows are shown below the code.

```r
setwd("c:\\directory") # Change this to where the .csv file with the data is stored
copula_data <- read.csv("copula_data.csv", header = TRUE)
```

loss | alae |
---|---|
4468750 | 5571.34 |
3490000 | 808363.4 |
3450000 | 180757 |
3425000 | 713552.2 |
2895000 | 394624.5 |
1958925 | 36074.6 |
1626762 | 326741.5 |
990000 | 936229.8 |
980885.4 | 13156.44 |

Here is a quick look at the summary statistics of our data:

```r
summary(copula_data)
```

statistic | loss | alae |
---|---|---|
Min. | 100000 | 0 |
1st Qu. | 132812 | 12390 |
Median | 197500 | 48396 |
Mean | 374647 | 115491 |
3rd Qu. | 336250 | 126048 |
Max. | 4468750 | 1767626 |

We removed any ALAE data points at exactly 0. This will allow us to fit curves to the logarithm of the data. An adjustment could be made at the end to add back in the probability of 0 ALAE but I have not done that here.

```r
alae_data <- copula_data$alae
# The ALAE data will have a point mass at 0, which our fitted distributions do not account for.
alae_data <- alae_data[alae_data > 0]
```

Finally we transformed the data to be in $[0,1]\times[0,1]$ by using the rank function and then dividing by the number of data points plus one. Some of the copula fitting procedures break down if there are ties or repeats in the data, so we applied a tie-break procedure which just randomly selects one of the equal entries to be ranked ahead of the other.

```r
set.seed(123) # Makes the random tie-breaking reproducible
rankdata <- sapply(copula_data, rank, ties.method = "random") / (nrow(copula_data) + 1)
```
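On a small made-up example (not the post's data), the same transformation yields distinct values strictly inside (0, 1) even when the input has ties. Dividing by n + 1 rather than n keeps the largest value below 1, where copula densities and their logarithms can blow up:

```r
set.seed(1)
# Hypothetical claims with ties in both columns.
toy <- data.frame(loss = c(100000, 250000, 250000, 900000),
                  alae = c(0, 0, 40000, 300000))
# Same rank transform as above: random tie-breaking, then divide by n + 1.
pseudo <- sapply(toy, rank, ties.method = "random") / (nrow(toy) + 1)
pseudo
```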

A quick look at the data (log scale, with a linear trend line) shows that the 100% correlation assumption is unrealistic:

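```r
plot(copula_data, log = "xy") # Zero ALAE values are omitted from the log-scale plot, with a warning
# Linear trend line fitted on the original scale and drawn on the log-scale axes (untf must be logical).
abline(lm(alae ~ loss, data = copula_data), untf = TRUE)
```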

There are so many resources discussing curve fitting for univariate data that I won’t go into the theory, but the following code takes the ALAE data and uses maximum likelihood to fit a Pareto, log-gamma, Weibull and lognormal distribution:

```r
# Negative log-likelihood of the ALAE data for given parameters; this one is the lognormal.
nLL1 <- function(mu, sigma) -sum(stats::dlnorm(alae_data, mu, sigma, log = TRUE))
# mle() minimises the negative log-likelihood, i.e. finds the maximum likelihood fit.
alae_fit1 <- mle(nLL1, start = list(mu = 10, sigma = 1), method = "L-BFGS-B",
                 nobs = length(alae_data), lower = c(1, 0.01))

# Same as above but for the Weibull distribution.
nLL2 <- function(shape, scale) -sum(stats::dweibull(alae_data, shape, scale, log = TRUE))
alae_fit2 <- mle(nLL2, start = list(shape = 1, scale = 50000), method = "L-BFGS-B",
                 nobs = length(alae_data), lower = c(0.1, 100))

# Log-gamma distribution.
nLL3 <- function(shapelog, ratelog) -sum(actuar::dlgamma(alae_data, shapelog, ratelog, log = TRUE))
alae_fit3 <- mle(nLL3, start = list(shapelog = 60, ratelog = 1), method = "L-BFGS-B",
                 nobs = length(alae_data), lower = c(0.01, 0.01))

# Pareto distribution.
nLL4 <- function(shape, scale) -sum(actuar::dpareto(alae_data, shape, scale, log = TRUE))
alae_fit4 <- mle(nLL4, start = list(shape = 1.1, scale = 1000), method = "L-BFGS-B",
                 nobs = length(alae_data), lower = c(1, 100))

# Graph each fitted curve against the empirical distribution of the data.
# x-axis: ALAE amount; y-axis: cumulative probability P(ALAE <= x).
x <- seq(0, max(alae_data), length.out = 1001) # x-axis range encompassing all ALAE data
plot(x, plnorm(x, coef(alae_fit1)[1], coef(alae_fit1)[2]), type = "l", col = "red",
     ylim = c(0, 1), main = "ECDF vs Curve Fits", xlab = "ALAE", ylab = "Cumulative probability")
lines(x, pweibull(x, coef(alae_fit2)[1], coef(alae_fit2)[2]), col = "blue")
lines(x, plgamma(x, coef(alae_fit3)[1], coef(alae_fit3)[2]), col = "orange")
lines(x, ppareto(x, coef(alae_fit4)[1], coef(alae_fit4)[2]), col = "green")
plot(ecdf(alae_data), add = TRUE)
legend("bottomright", legend = c("lognormal", "Weibull", "log-gamma", "Pareto", "empirical"),
       col = c("red", "blue", "orange", "green", "black"), lty = 1)

# At this point select the fit you think is best. I typically found the Pareto to be the best fit,
# so the rest of the code assumes the Pareto is chosen; modify the code to make a different selection.
summary(alae_fit4)
alae_fit4@coef
```

If everything worked correctly you should see this graph:

The graph shows each of the 4 fitted distributions against the actual cumulative distribution of ALAE amounts. In practice the Pareto often seems to fit best, as it does here (the green curve).
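Beyond eyeballing the plot, a common numerical check (not part of the original analysis) is to compare candidate fits by AIC, which works on `stats4::mle` objects. Because the ALAE file isn't reproduced here, this sketch fits two of the four families to a simulated lognormal sample; the data, its parameters and the choice of families are assumptions:

```r
library(stats4)
set.seed(42)
# Simulated stand-in for alae_data (hypothetical lognormal sample).
sim_alae <- rlnorm(500, meanlog = 10.5, sdlog = 1.5)

# Negative log-likelihoods, in the same style as the post's nLL functions.
nll_lnorm <- function(mu, sigma) -sum(dlnorm(sim_alae, mu, sigma, log = TRUE))
nll_weib  <- function(shape, scale) -sum(dweibull(sim_alae, shape, scale, log = TRUE))

fit_lnorm <- mle(nll_lnorm, start = list(mu = 10, sigma = 1), method = "L-BFGS-B",
                 nobs = length(sim_alae), lower = c(1, 0.01))
fit_weib  <- mle(nll_weib, start = list(shape = 1, scale = 50000), method = "L-BFGS-B",
                 nobs = length(sim_alae), lower = c(0.1, 100))

# Lower AIC means a better trade-off between fit and number of parameters.
AIC(fit_lnorm, fit_weib)
```

With the post's objects, the analogous comparison should be `AIC(alae_fit1, alae_fit2, alae_fit3, alae_fit4)`. AIC only ranks the candidates, so it complements rather than replaces the visual check of the tail.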

There are two aspects to copula fitting. Just like with univariate distributions, there are different families of distributions, and within a family any given dataset will have a best-fitting member (according to some goodness-of-fit measure). From my own experimentation, the best-fitting copula family for various datasets of liability indemnity and ALAE was the Gumbel copula. This was also chosen as the best family in Frees and Valdez [3], Micocci [9] and Venter [12]. The Gumbel has the desirable properties of being a single-parameter copula and an extreme-value copula (meaning it's appropriate for the right-tailed, truncated data we often work with in reinsurance), and of having closed-form expressions for Kendall's tau (a rank correlation) and for the upper tail correlation.
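Both closed forms are easy to evaluate. The sketch below (base R, hypothetical helper names) computes the Gumbel upper tail dependence $\lambda_U = 2 - 2^{1/\theta}$ and Kendall's tau $\tau = 1 - 1/\theta$, and inverts the first to show which $\theta$ corresponds to tail correlations between 0.2 and 0.4 (the range reported later in this post): roughly 1.18 to 1.47, i.e. Kendall's tau of about 0.15 to 0.32.

```r
# Gumbel copula summaries as functions of theta (theta >= 1).
gumbel_tail_dep <- function(theta) 2 - 2^(1 / theta)   # Upper tail dependence
gumbel_tau      <- function(theta) 1 - 1 / theta       # Kendall's tau

# Invert lambda_U = 2 - 2^(1/theta)  =>  theta = 1 / log2(2 - lambda_U).
theta_from_tail_dep <- function(lambda) 1 / log2(2 - lambda)

lambdas <- c(0.2, 0.3, 0.4)
thetas  <- theta_from_tail_dep(lambdas)
data.frame(tail_dependence = lambdas,
           theta = round(thetas, 3),
           kendall_tau = round(gumbel_tau(thetas), 3))

# A rough cross-check on pseudo-observations u, v would invert Kendall's tau
# (a different estimator from the likelihood fit):
# theta_hat <- 1 / (1 - cor(u, v, method = "kendall"))
```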

So for this exercise we assumed the Gumbel to be the correct copula family and then fit the best parameter (as it is a single parameter copula) using the maximum likelihood option in the 'copula' package's fitCopula method:

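```r
# We have assumed use of the Gumbel copula. Because rankdata holds pseudo-observations
# (ranks rather than known marginal CDF values), "mpl" (maximum pseudo-likelihood) is the
# appropriate likelihood option; it gives the same point estimate as "ml" on these data.
fitted_copula <- fitCopula(gumbelCopula(dim = 2), rankdata, method = "mpl")
summary(fitted_copula)
theta <- fitted_copula@copula@parameters
# The upper tail correlation is one way of uniquely describing a member of a one-parameter
# copula family (e.g. Gumbel).
tail_correlation <- 2 - 2^(1 / theta)
tail_correlation
plot(rankdata) # Scatter plot of the empirical copula data
```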

In my trials with different datasets of casualty large loss data, I found fitted tail correlations between 0.2 and 0.4. This is on a scale of 0 to 1 for the upper tail correlation measure, as opposed to -1 to 1 for the traditional Pearson correlation measure. After running the code you should see a plot of the empirical copula data:

You can see the upper tail correlation since there is a cluster of points in the upper right hand corner. That means that when indemnity is large or in the upper quantiles, then ALAE also tends to be relatively large, or in the upper quantiles of the ALAE distribution. You can also see even more distinctly the lack of points in the upper-left and lower-right corners. This means that it is very rare to have a small ALAE amount accompanying a large loss amount and vice versa. If ALAE and indemnity were independent, then the points would be uniformly scattered across the entire square and you would see a similar number of points in each of the four corners.
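The corner pattern can also be checked numerically rather than by eye. The hypothetical helper below counts the share of points in each corner of the unit square; the cut-off q = 0.8 is an arbitrary choice, and under independence each corner would hold about (1 − q)² = 4% of the points. It is demonstrated on perfectly dependent toy data; in the post's workflow it could be applied to `rankdata[, 1]` (indemnity) and `rankdata[, 2]` (ALAE).

```r
# Share of points (u, v) in each corner of the unit square, using cut-off q.
# u is plotted on the x-axis (indemnity) and v on the y-axis (ALAE).
corner_shares <- function(u, v, q = 0.8) {
  c(upper_right = mean(u > q & v > q),
    upper_left  = mean(u < 1 - q & v > q),
    lower_right = mean(u > q & v < 1 - q),
    lower_left  = mean(u < 1 - q & v < 1 - q))
}

# Toy example: perfectly dependent pseudo-observations fill only two corners.
u <- (1:1000) / 1001
corner_shares(u, u)
```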

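```r
# This assumes the user already has an indemnity distribution, i.e. exposure curve, they want to use.
# Column 1 holds indemnity amounts and column 2 cumulative probabilities; the file is assumed to have
# 1001 rows, with row 1 supplying only the starting cumulative probability.
indexpo_data <- read.csv("indemnity_curve.csv", header = FALSE)
indexpo_data[1001, 2] <- 1        # Force the final cumulative probability to exactly 1
supp <- indexpo_data[2:1001, 1]   # Support points of the discrete distribution
probs <- diff(indexpo_data[, 2])  # Probability mass at each support point
# Define the cumulative distribution function of the exposure curve as a distribution object in R.
indemnity <- DiscreteDistribution(supp, probs)
```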

As mentioned above, we are not fitting a curve to the indemnity data; we are assuming that you already have an empirical distribution representing projected indemnity amounts. This is usually based on the types of business written by the ceding company and at what limits.

The minimum value in the indemnity severity distribution is $100,000. This is what we will call the **model threshold**. This just simplifies the analysis by not requiring us to know about the severity distribution far below the reinsurance attachment point.
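To illustrate the idea behind the quantile function of such a tabulated curve (which the code later calls through `distr`), here is a minimal base-R sketch of the standard quantile of a discrete distribution: for each probability p it returns the smallest support point whose cumulative probability is at least p. The helper `discrete_quantile` and the three-point curve are hypothetical:

```r
# Quantile of a discrete distribution with support x and probabilities probs:
# the smallest x whose cumulative probability is >= p.
discrete_quantile <- function(p, x, probs) {
  cum <- cumsum(probs)
  cum[length(cum)] <- 1                              # Guard against rounding in the last value
  idx <- findInterval(p, cum, left.open = TRUE) + 1  # Count of cumulative values below p, plus 1
  x[pmin(idx, length(x))]
}

# Hypothetical exposure curve: three indemnity amounts and their probabilities.
supp_example  <- c(100000, 500000, 1000000)
probs_example <- c(0.6, 0.3, 0.1)
q_example <- discrete_quantile(c(0.10, 0.60, 0.61, 0.95, 1.00), supp_example, probs_example)
q_example
```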

With both marginal distributions and a copula we now have a full model for the ALAE-indemnity bivariate distribution. We can plot some simulated data against the actual data to visually assess the model for any blatant errors:

```r
# Simulate as many random draws from the fitted copula as there are claims in the original dataset.
simcopula <- rCopula(nrow(copula_data), fitted_copula@copula)
# Use the cumulative loss and ALAE distributions to transform the copula draws, which are
# percentiles, into loss and ALAE amounts.
simloss <- cbind(distr::q(indemnity)(simcopula[, 1]),
                 qpareto(simcopula[, 2], alae_fit4@coef["shape"], alae_fit4@coef["scale"]))
# Plot the actual data (black) against the simulated data (red) on common axes.
plot(copula_data, xlim = c(0, 10^7), ylim = c(0, 1.5 * 10^6),
     main = "input data v simulated data", xlab = "indemnity", ylab = "ALAE")
points(simloss, col = 2)
```

If all went according to plan, you should see a plot similar to this:

Of course, the above plot may be of limited value because we’ve only simulated as many points from the fitted distribution as there are in the original dataset, so even two such plots from the same fitted distribution may look very different.

The final output is based on empirical simulation, not theoretical integration, so the first step is to simulate from our fitted bivariate distribution:

```r
n_simulations <- 100000 # Number of simulations used to create the output
simcopula <- rCopula(n_simulations, fitted_copula@copula)
simloss <- cbind(distr::q(indemnity)(simcopula[, 1]),
                 qpareto(simcopula[, 2], alae_fit4@coef["shape"], alae_fit4@coef["scale"]))
```

Note that we need to simulate only from the copula. Its points live on the unit square, and we then use the quantile (“q”) functions of the fitted marginal distributions to convert those cumulative percentiles into indemnity and ALAE amounts.
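This conversion is the inverse-CDF (quantile) transform. For the Pareto used for ALAE, `actuar` parameterises the CDF as $F(x) = 1 - \left(\frac{\text{scale}}{x + \text{scale}}\right)^{\text{shape}}$, so the quantile has a closed form. A base-R sketch with hypothetical parameters (not the fitted ones) shows the round trip from percentiles to amounts and back:

```r
# CDF and quantile of the Pareto in the actuar parameterisation
# F(x) = 1 - (scale / (x + scale))^shape, x > 0.
pareto_cdf      <- function(x, shape, scale) 1 - (scale / (x + scale))^shape
pareto_quantile <- function(p, shape, scale) scale * ((1 - p)^(-1 / shape) - 1)

shape <- 1.5     # Hypothetical parameters for illustration only
scale <- 60000
p <- c(0.25, 0.5, 0.9, 0.99)                      # e.g. copula draws for the ALAE margin
alae_amounts <- pareto_quantile(p, shape, scale)  # Percentiles -> ALAE amounts
round(alae_amounts)
pareto_cdf(alae_amounts, shape, scale)            # Recovers the percentiles
```

Because each margin is transformed by its own increasing function, the ranks of the simulated pairs, and hence the copula, are preserved.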

The first type of output we can create is an “ALAE Included” severity cumulative distribution. We add the simulated indemnity and ALAE amounts together for each simulated point, and rank them to create a univariate distribution. This could come in handy if we are trying to evaluate or simulate multiple layers simultaneously in Excel or some other program, but need a single univariate severity distribution. It should be noted that in the ALAE pro-rata reinsurance case, a univariate distribution cannot replicate the loss to the layer for each claim. We export the resulting empirical distribution to a .csv:

```r
loss_alaeinc <- simloss[, 1] + simloss[, 2]
# Number of points desired for the final empirical loss plus ALAE distribution.
# Should be orders of magnitude less than n_simulations for stability.
n_points <- 1000
cumulative_prob <- (1:n_points) / n_points
loss_dist_alaeinc <- cbind(quantile(loss_alaeinc, cumulative_prob), cumulative_prob)
# Output the empirical distribution based on simulated data to a .csv file
# (assuming this is desired for use in some other program, spreadsheet, etc.).
write.csv(loss_dist_alaeinc, "loss_dist_alaeinc.csv")
```

The other form of output is a summary of loss statistics for the layers of interest. The final goal is to have an estimate for the expected loss, frequency, severity and standard deviation of loss for each layer as these are typically used in the determination of the price or cost of the reinsurance.
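In formulas, writing $\lambda$ for the expected number of claims at the model threshold and $L$ for the layered amount of a single claim (zero if the claim does not reach the layer), the four statistics for a compound Poisson model are:

\begin{equation*}
\begin{aligned}
\text{expected loss} &= \lambda\,\mathbb{E}[L], \\
\text{frequency} &= \lambda \Pr(L > 0), \\
\text{severity} &= \mathbb{E}[L \mid L > 0] = \frac{\mathbb{E}[L]}{\Pr(L > 0)}, \\
\text{standard deviation} &= \sqrt{\lambda\,\mathbb{E}[L^2]} = \sqrt{\lambda \Pr(L > 0)\,\mathbb{E}[L^2 \mid L > 0]}.
\end{aligned}
\end{equation*}

The last line is the compound Poisson variance $\lambda\,\mathbb{E}[L^2]$; its second form shows that using the threshold frequency with zeros included in $\mathbb{E}[L^2]$ is equivalent to using the layer frequency with only the non-zero layered amounts.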

We start with a .csv file with the desired reinsurance limit and attachment for the layers of interest. We also have blank spaces where we will use R to enter the statistics of interest. Here is what the input file looks like:

limit | attachment | expected_loss | frequency | severity | std_dev |
---|---|---|---|---|---|
250000 | 250000 | 0 | 0 | 0 | 0 |
500000 | 500000 | 0 | 0 | 0 | 0 |
1000000 | 1000000 | 0 | 0 | 0 | 0 |
3000000 | 2000000 | 0 | 0 | 0 | 0 |
5000000 | 5000000 | 0 | 0 | 0 | 0 |

The only additional item of information we need is the frequency at the model threshold, which is our minimum modeled indemnity loss amount of $100,000. We assume this number has been estimated already and is given. Here is the code in the ALAE included reinsurance case:

```r
freq_at_threshold <- 16 # Expected number of claims at the model threshold ($100,000), assumed given
loss_cost_exhibit_alaeinc <- read.csv("loss_cost_exhibit_input.csv", header = TRUE)
# Reinsurance loss given the ground-up loss and ALAE, limit and attachment.
layer <- function(x, limit, attachment) pmax(0, pmin(limit, x - attachment))
for (i in seq_len(nrow(loss_cost_exhibit_alaeinc))) {
  # Amount of loss plus ALAE within the layer for each simulated loss.
  layeredloss <- layer(simloss[, 1] + simloss[, 2],
                       loss_cost_exhibit_alaeinc$limit[i], loss_cost_exhibit_alaeinc$attachment[i])
  count <- sum(layeredloss > 0)        # Number of simulated losses over the reinsurance attachment
  layer_mean <- mean(layeredloss)      # Average simulated layered loss; zeros are included
  layer_mean_sq <- mean(layeredloss^2) # Again, zeros included
  loss_cost_exhibit_alaeinc$expected_loss[i] <- freq_at_threshold * layer_mean
  loss_cost_exhibit_alaeinc$frequency[i] <- count * freq_at_threshold / n_simulations
  loss_cost_exhibit_alaeinc$severity[i] <- layer_mean * n_simulations / count # Adjusts for the zeros in the mean
  # Assumes Poisson frequency. We use the threshold instead of the layer frequency because
  # the squared layer mean includes simulated losses of zero.
  loss_cost_exhibit_alaeinc$std_dev[i] <- sqrt(layer_mean_sq * freq_at_threshold)
}
loss_cost_exhibit_alaeinc
write.csv(loss_cost_exhibit_alaeinc, "loss_cost_exhibit_output_ALAE_Incl.csv")
```

If all went well, you should have the following table both in R and in a .csv file:

limit | attachment | expected_loss | frequency | severity | std_dev |
---|---|---|---|---|---|
250000 | 250000 | 1943577 | 10.9496 | 177502 | 661010 |
500000 | 500000 | 1770682 | 5.8824 | 301014 | 872540 |
1000000 | 1000000 | 1178566 | 2.45936 | 479216 | 984557 |
3000000 | 2000000 | 1003379 | 0.76896 | 1304852 | 1520420 |
5000000 | 5000000 | 261281 | 0.16784 | 1556727 | 935496 |

This code is mostly identical to Step 8a, but we do the calculations assuming ALAE pro-rata reinsurance treatment. The only change is the definition of the layeredloss variable: it is the layered indemnity (from the first simloss column) plus the ALAE amount (second column) multiplied by the ratio of the layered indemnity to the total indemnity.

```r
loss_cost_exhibit_alaepr <- read.csv("loss_cost_exhibit_input.csv", header = TRUE)
for (i in seq_len(nrow(loss_cost_exhibit_alaepr))) {
  # Amount of indemnity falling in the layer.
  layeredindem <- layer(simloss[, 1], loss_cost_exhibit_alaepr$limit[i], loss_cost_exhibit_alaepr$attachment[i])
  # Reinsurance loss under ALAE pro-rata treatment: layered indemnity plus the same share of
  # the ALAE as the reinsured indemnity is of total indemnity.
  layeredloss <- layeredindem + simloss[, 2] * layeredindem / simloss[, 1]
  count <- sum(layeredloss > 0)
  layer_mean <- mean(layeredloss)
  layer_mean_sq <- mean(layeredloss^2)
  loss_cost_exhibit_alaepr$expected_loss[i] <- freq_at_threshold * layer_mean
  loss_cost_exhibit_alaepr$frequency[i] <- count * freq_at_threshold / n_simulations
  loss_cost_exhibit_alaepr$severity[i] <- layer_mean * n_simulations / count
  loss_cost_exhibit_alaepr$std_dev[i] <- sqrt(layer_mean_sq * freq_at_threshold)
}
loss_cost_exhibit_alaepr
write.csv(loss_cost_exhibit_alaepr, "loss_cost_exhibit_output_ALAE_Prorata.csv")
```

If all went well, you should have the following table both in R and in a .csv file:

limit | attachment | expected_loss | frequency | severity | std_dev |
---|---|---|---|---|---|
250000 | 250000 | 1856440 | 9.02144 | 205781 | 776859 |
500000 | 500000 | 1594784 | 4.72624 | 337432 | 976762 |
1000000 | 1000000 | 970301 | 1.028 | 943872 | 1077113 |
3000000 | 2000000 | 902565 | 0.66144 | 1364546 | 1722935 |
5000000 | 5000000 | 107998 | 0.12592 | 857669 | 924822 |
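Two hypothetical claims make the difference between the treatments concrete. In a $1M xs $1M layer, a claim with $1.5M indemnity and $300k ALAE contributes $800k with ALAE included, but $600k pro-rata (the layer takes one third of the indemnity, so it also takes one third of the ALAE). A claim with $900k indemnity and $300k ALAE reaches the layer only under ALAE included, which is why the pro-rata frequency depends on indemnity alone. The same arithmetic in R, reusing the post's `layer()` definition:

```r
# Reinsurance loss for a ground-up amount x, as defined in the post.
layer <- function(x, limit, attachment) pmax(0, pmin(limit, x - attachment))

# Two hypothetical claims in a $1M xs $1M layer.
indem <- c(1500000, 900000)
alae  <- c(300000, 300000)
limit <- 1000000
attachment <- 1000000

# ALAE included: layer the sum of indemnity and ALAE.
included <- layer(indem + alae, limit, attachment)
# ALAE pro-rata: layered indemnity plus the same share of the ALAE.
layered_indem <- layer(indem, limit, attachment)
pro_rata <- layered_indem + alae * layered_indem / indem

rbind(included, pro_rata)
```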

We should compare the final results of the copula method for modeling ALAE to the classical assumption we talked about before. The classical assumption is that ALAE is a fixed percent of indemnity for every loss. The first step is to determine what that fixed percentage should be. A very simple way is to take the total ALAE in our loss dataset and divide by the total indemnity. This is a commonly used method and so will give us a fair comparison.

```r
# We assume that the fixed ALAE load to be applied to each claim is the total ALAE in the dataset
# divided by the total loss (including below the threshold), which is a typical practice.
ALAE_load <- 1 + sum(copula_data$alae) / sum(copula_data$loss)
ALAE_load
```
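Because total ALAE and total indemnity are summed over the same claims, their ratio equals the ratio of the means in the summary statistics table, so the load can be sanity-checked by hand: with the rounded means it comes to about 1.308. Under the classical method an indemnity amount x reaches an ALAE-included layer attaching at A only if x times the load exceeds A. A short sketch (approximate, since the means are rounded):

```r
# Approximate ALAE load from the rounded means in the summary statistics table.
mean_loss <- 374647
mean_alae <- 115491
alae_load_approx <- 1 + mean_alae / mean_loss
round(alae_load_approx, 3)

# Classical method: smallest indemnity that reaches an ALAE-included layer attaching at A.
attachments <- c(250000, 500000, 1000000, 2000000, 5000000)
data.frame(attachment = attachments,
           min_indemnity_reaching_layer = round(attachments / alae_load_approx))
```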

The next step is to prepare the loss cost exhibits for ALAE included and pro-rata treatment under the classical assumption:

```r
loss_cost_exhibit_alaeinc_clsc <- read.csv("loss_cost_exhibit_input.csv", header = TRUE)
for (i in seq_len(nrow(loss_cost_exhibit_alaeinc_clsc))) {
  # Load each indemnity amount by the ALAE load and then apply the layering.
  # The rest of the calculations are identical.
  layeredloss <- layer(simloss[, 1] * ALAE_load,
                       loss_cost_exhibit_alaeinc_clsc$limit[i], loss_cost_exhibit_alaeinc_clsc$attachment[i])
  count <- sum(layeredloss > 0)
  layer_mean <- mean(layeredloss)
  layer_mean_sq <- mean(layeredloss^2)
  loss_cost_exhibit_alaeinc_clsc$expected_loss[i] <- freq_at_threshold * layer_mean
  loss_cost_exhibit_alaeinc_clsc$frequency[i] <- count * freq_at_threshold / n_simulations
  loss_cost_exhibit_alaeinc_clsc$severity[i] <- layer_mean * n_simulations / count
  loss_cost_exhibit_alaeinc_clsc$std_dev[i] <- sqrt(layer_mean_sq * freq_at_threshold)
}
loss_cost_exhibit_alaeinc_clsc
write.csv(loss_cost_exhibit_alaeinc_clsc, "loss_cost_exhibit_output_ALAE_Incl_clsc.csv")
```

If all went well, you should have the following table both in R and in a .csv file:

limit | attachment | expected_loss | frequency | severity | std_dev |
---|---|---|---|---|---|
250000 | 250000 | 1902908 | 10.6384 | 178872 | 650561 |
500000 | 500000 | 1832596 | 5.78512 | 316778 | 878039 |
1000000 | 1000000 | 1317938 | 2.51984 | 523024 | 999219 |
3000000 | 2000000 | 1172114 | 0.78608 | 1491087 | 1630258 |
5000000 | 5000000 | 303197 | 0.20064 | 1511148 | 856929 |

```r
loss_cost_exhibit_alaepr_clsc <- read.csv("loss_cost_exhibit_input.csv", header = TRUE)
for (i in seq_len(nrow(loss_cost_exhibit_alaepr_clsc))) {
  # Exercise: work out that this is the correct formula for reinsurance loss in this case.
  layeredloss <- ALAE_load * layer(simloss[, 1],
                                   loss_cost_exhibit_alaepr_clsc$limit[i], loss_cost_exhibit_alaepr_clsc$attachment[i])
  count <- sum(layeredloss > 0)
  layer_mean <- mean(layeredloss)
  layer_mean_sq <- mean(layeredloss^2)
  loss_cost_exhibit_alaepr_clsc$expected_loss[i] <- freq_at_threshold * layer_mean
  loss_cost_exhibit_alaepr_clsc$frequency[i] <- count * freq_at_threshold / n_simulations
  loss_cost_exhibit_alaepr_clsc$severity[i] <- layer_mean * n_simulations / count
  loss_cost_exhibit_alaepr_clsc$std_dev[i] <- sqrt(layer_mean_sq * freq_at_threshold)
}
loss_cost_exhibit_alaepr_clsc
write.csv(loss_cost_exhibit_alaepr_clsc, "loss_cost_exhibit_output_ALAE_Prorata_clsc.csv")
```

If all went well, you should have the following table both in R and in a .csv file:

limit | attachment | expected_loss | frequency | severity | std_dev |
---|---|---|---|---|---|
250000 | 250000 | 1956277 | 9.02144 | 216848 | 766881 |
500000 | 500000 | 1726739 | 4.72624 | 365351 | 1013374 |
1000000 | 1000000 | 1062436 | 1.028 | 1033498 | 1135475 |
3000000 | 2000000 | 963839 | 0.66144 | 1457183 | 1719035 |
5000000 | 5000000 | 87520 | 0.12592 | 695043 | 621557 |
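All four loops above differ only in how `layeredloss` is built; the statistics computed from it are the same. As a sketch under the same Poisson assumption, they can be collected in one helper (the name `layer_stats` is hypothetical, not from any package) and checked on five made-up claims where the answer can be worked by hand: in a 250,000 xs 250,000 layer the layered amounts are 0, 0, 50,000, 200,000 and 250,000, so 3 of 5 claims hit the layer and the mean layered amount is 100,000.

```r
# Reinsurance loss for a ground-up amount x, as defined in the post.
layer <- function(x, limit, attachment) pmax(0, pmin(limit, x - attachment))

# Hypothetical helper: exhibit statistics from simulated layered losses (zeros included),
# assuming Poisson claim counts with mean freq_at_threshold at the model threshold.
layer_stats <- function(layeredloss, freq_at_threshold) {
  p_hit <- mean(layeredloss > 0)    # Share of simulated claims reaching the layer
  layer_mean <- mean(layeredloss)   # Zeros included, as in the post
  c(expected_loss = freq_at_threshold * layer_mean,
    frequency     = freq_at_threshold * p_hit,
    severity      = layer_mean / p_hit,
    std_dev       = sqrt(freq_at_threshold * mean(layeredloss^2)))
}

example_claims <- c(120000, 240000, 300000, 450000, 900000)  # Made-up loss plus ALAE amounts
example_layered <- layer(example_claims, limit = 250000, attachment = 250000)
stats_example <- layer_stats(example_layered, freq_at_threshold = 16)
stats_example
```

In the post's workflow, `layer_stats(layer(simloss[, 1] + simloss[, 2], limit, attachment), freq_at_threshold)` should reproduce one row of the ALAE-included exhibit; for the pro-rata versions, pass the pro-rata `layeredloss` instead.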

Now that we have the loss cost exhibit for each ALAE treatment and method, we can do a quick comparison. The following code generates a table of percentage differences:

```r
100 * (loss_cost_exhibit_alaepr / loss_cost_exhibit_alaepr_clsc - 1)[, 3:6]
```

You should get a table similar to this for the ALAE pro-rata case:

expected_loss | frequency | severity | std_dev |
---|---|---|---|
-5.150366 | 0 | -5.150366 | 0.1551662 |
-7.610009 | 0 | -7.610009 | -2.7739925 |
-8.051739 | 0 | -8.051739 | -3.8005262 |
-5.527266 | 0 | -5.527266 | 2.7570940 |
18.515967 | 0 | 18.515967 | 49.7287914 |

Note that the frequency doesn't change, because under ALAE pro-rata treatment the frequency is determined by the indemnity amount only, which does not depend on the ALAE. Since the frequency is the same, the difference in loss cost is entirely due to differences in severity, which is why those two columns are equal. But why is the loss, or severity, lower under the new method for every layer except the very highest, where it is much higher? It could be simulation error, since we have only a finite number of points. A plausible explanation, though, is that for very high indemnity amounts the tail correlation of the copula tends to draw the very high ALAE amounts, which, because the Pareto distribution is heavy tailed, are much greater than the average ALAE implied by the ALAE ratio. However, in this example more simulations should be done to increase the stability of the top layer.
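One way to gauge how much of such a difference could be simulation error (not part of the original code) is the standard error of the simulated mean layered loss, $\text{sd}(L)/\sqrt{n}$, relative to the mean. In this base-R sketch, `rel_std_error` is a hypothetical helper and the exponential ground-up amounts are made up; in the post's workflow it would be applied to `layeredloss` inside each loop:

```r
# Relative standard error of a Monte Carlo mean: sd(x) / sqrt(n) / mean(x).
rel_std_error <- function(x) sd(x) / sqrt(length(x)) / mean(x)

# Illustration: a layer hit by about 1% of claims has a much noisier mean than
# one hit by about 43% of claims, for the same number of simulations.
set.seed(2024)
n <- 100000
ground_up  <- rexp(n, rate = 1 / 300000)                     # Hypothetical ground-up amounts
low_layer  <- pmax(0, pmin(250000, ground_up - 250000))
high_layer <- pmax(0, pmin(5000000, ground_up - 1400000))
c(low = rel_std_error(low_layer), high = rel_std_error(high_layer))
```

Since the relative error shrinks only with the square root of the number of simulations, halving it for a thinly hit top layer takes four times as many simulations.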

Let's look at the ALAE included case:

```r
100 * (loss_cost_exhibit_alaeinc / loss_cost_exhibit_alaeinc_clsc - 1)[, 3:6]
```

You should get a table like this:

expected_loss | frequency | severity | std_dev |
---|---|---|---|
2.305544 | 2.9593188 | -0.6349835 | 1.69892766 |
-2.483069 | 2.1675698 | -4.5519714 | -0.05632885 |
-8.914428 | -1.0747844 | -7.9248187 | -0.51161208 |
-12.769818 | -0.5792305 | -12.2616102 | -6.07735758 |
-17.075425 | -17.2059984 | 0.1577083 | 4.78690599 |

This is interesting because we see the loss cost in the lowest layer increase and in the highest layer decrease, which is the opposite of the ALAE pro-rata case. For the low layer, this may be because our model allows large ALAE amounts to occur with small indemnity amounts (with probability determined by the copula), so claims whose indemnity is below the attachment divided by the ALAE load can still enter the layer, whereas under the classical assumption they cannot. For the highest layer we may be getting the benefit of the partial correlation given by the copula, as opposed to the 100% correlation in the classical assumption.

As you probably noticed from the comparison table, the classical method is doing a fine job most of the time (otherwise the alarm would have been sounded already!). What I would like you to take away from this, rather than just blindly implementing the method, is to think about how ALAE has its own distribution and is tail correlated with indemnity. This has implications for particular scenarios, for example a $1M layer attachment when all policy limits (which apply only to indemnity) are $1M and the reinsurance covers ALAE included with the loss.

What about layers that attach just above the ALAE load times a common policy limit? Do losses from those policy limits really contribute no expected loss to the layer? Occasionally a reinsurance contract will say the ALAE treatment can be either included or pro-rata, whichever the client prefers. What should this cost, and how do the distribution of ALAE and its tail correlation with indemnity affect that cost? With this blog post as a starting point, hopefully you are in a better position to answer those questions.

*Greg McNulty, FCAS, SCOR Reinsurance*

Bibliography

1. Camphausen, F. et al. “Package ‘distr’”. CRAN. Version 2.4. February 7, 2013.

2. Dutang, C. et al. “actuar: An R Package for Actuarial Science”. Journal of Statistical Software. March 2008, Volume 25, Issue 7.

3. Frees, E.; Valdez, E. “Understanding Relationships Using Copulas”. North American Actuarial Journal, Volume 2, Number 1. 1998.

4. Genest, C.; MacKay, J. “The Joy of Copulas: Bivariate Distributions with Uniform Marginals”. The American Statistician, Volume 40, Issue 4 (Nov. 1986), 280-283.

5. Geyer, C. “Maximum Likelihood in R”. www.stat.umn.edu/geyer. September 30, 2003.

6. Hofert, M. et al. “Package ‘copula’”. CRAN. Version 0.999-5, November 2012.

7. Joe, Harry. Multivariate Models and Dependence Concepts. Monographs on Statistics and Probability 73, Chapman & Hall/CRC, 2001.

8. Kojadinovic, I.; Yan, J. “Modeling Multivariate Distributions with Continuous Margins Using the copula R Package”. Journal of Statistical Software. May 2010, Volume 34, Issue 9.

9. Micocci, M.; Masala, G. “Loss-ALAE modeling through a copula dependence structure”. Investment Management and Financial Innovations. Volume 6, Issue 4, 2009.

10. Ricci, V. “Fitting Distributions with R”. CRAN. Release 0.4-21, February 2005.

11. Ruckdeschel, P. et al. “S4 Classes for Distributions—a manual for packages "distr", "distrEx", "distrEllipse", "distrMod", "distrSim", "distrTEst", "distrTeach", version 2.4”. r-project.org. February 5, 2013.

12. Venter, G. “Tails of Copulas”. Proceedings of the Casualty Actuarial Society. Arlington, Virginia. 2002: LXXXIX, 68-113.

13. Yan, J. “Enjoy the Joy of Copulas: With a Package copula”. Journal of Statistical Software. October 2007, Volume 21, Issue 4.


Technically, **open source** software simply means that the source code for a piece of software is distributed together with, or in lieu of, its executable. It makes no statement about how the end user may take advantage of that source code, other than compiling it, and possibly learning from reading it.

The accepted definition of **free software** is software which is released under a license that guarantees users not only the freedom to run the software for any purpose, but also to study, modify, and distribute both the original software and any changed versions.

Because the word "free" has another connotation (hence Stallman's famous "free as in speech, not free as in beer" line), the industry has accepted the use of the term **libre** to mean "free as in speech" in this context.

So when people say free software, most of the time they are referring to **FLOSS**: **F**ree **L**ibre **O**pen **S**ource **S**oftware. This is very different from *freeware*, which means that the author is giving away the software, but **not** the source code or any license beyond using the software. For example, many iOS and Android apps are freeware, but if you reverse-engineer the source code and sell or give away your own version, you are likely to be sued for copyright infringement.

There is some friction between the "free software" proponents and the "open source" software proponents. If you are interested, Wikipedia, the OSI, and the FSF are good places to look.

Even within the context of F(L)OSS, there are different classes of licenses which can be more or less restrictive. One of the most important differences between licenses is whether derivative works need to be released under the exact same (or sometimes a similar enough) license, or whether they may be distributed however is desired, including on a proprietary basis (and for a fee). The licenses that require release under the same (or similar) licenses are called **copyleft** (yes, that *is* a pun on copyright). Examples include the GNU GPL and LGPL. Other, more permissive, licenses include the BSD(-3) and Boost licenses.

Another important question is whether, even if the released software remains untouched, it can be "folded into" a proprietary piece of software, or whether any software into which it is compiled *must* be released *in toto* on a free basis. This is the main difference between the GPL and LGPL. The former requires all software which includes any library licensed under the GPL to be released under a compatible license. The GNU Lesser General Public License allows a free library to be released in a proprietary package as long as the free library itself is untouched (and released with the proper license and author list, etc.).

A third key question is which licenses we want our works to be compatible with. For example, GPL2 is **not** compatible with the original BSD license (the modified BSD3 license, by contrast, is compatible with both GPL2 and GPL3), which means that, legally, someone could not build a piece of software which relies on two libraries, one licensed under GPL2 and the other under the original BSD license, as that would violate the license.

If we wanted to be as permissive as possible (allow someone to use our work to make money for his or her company, use our work to build proprietary software for sale, etc.) then we would need to focus on licenses like the new BSD, the Boost, the ISC, the MIT, etc., which say something along the lines of "Use this for whatever you want at your own risk." If we wanted to ensure that our work remains open source, we would have to focus on one of the more restrictive copyleft licenses.

by Avraham
