Global Payment Trends: Currently reading: RFM and CLV: Using Iso-Value Curves for Customer Base Analysis

The R package Buy 'Til You Die (BTYD) implements a number of customer lifetime value prediction algorithms. Here, I've collected my notes from reading the papers RFM and CLV: Using Iso-Value Curves for Customer Base Analysis and “Counting Your Customers” the Easy Way: An Alternative to the Pareto/NBD Model.

Iso-value curves allow us to group individual customers with different purchase histories but similar future valuations. Iso-value curves can be visualised to show the interactions and trade-offs among the RFM measures and customer value.

The Pareto/NBD framework captures the flow of transactions over time, and a gamma-gamma submodel is used for spend per transaction.

The Pareto timing model

Hold-out tests are used to check for validity of the model.

Customer centric marketing

In non-contractual settings, forecasting the CLV is particularly challenging.

Researchers have previously developed scoring models to predict customers' behaviour. The RFM model (recency, frequency, monetary value) is also a common way to summarise customers' past behaviour. These approaches fail to recognise that different customer cohorts will lead to different RFM values.

These problems can be overcome with a formal model of buyer behaviour. The authors develop a model based on the premise that "observed behaviour is a realisation of latent traits". With this insight, they can use Bayes' theorem to estimate customers' latent traits.

Statistical inference is the process of deducing properties of an underlying distribution by analysis of data.

Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence becomes available.

Bayes' theorem describes the probability of an event, based on conditions that might be related to the event. With a Bayesian probability interpretation, the theorem expresses how a subjective degree of belief should rationally change to account for evidence, i.e. Bayesian inference.

P(A|B) = ( P(A) P(B|A) ) / P(B)
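As a small worked example of the formula, the sketch below updates the belief that a customer is still active after a long silence. The function name and all probabilities are made-up illustration values, not estimates from the papers.

```python
def posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B) using Bayes' theorem.

    P(B) is expanded with the law of total probability:
    P(B) = P(A) P(B|A) + P(not A) P(B|not A).
    """
    p_b = prior_a * p_b_given_a + (1.0 - prior_a) * p_b_given_not_a
    return prior_a * p_b_given_a / p_b


# A = "the customer is still active", B = "no purchase in the last six months".
# Hypothetical inputs: 70% prior belief that the customer is active, an active
# customer stays silent this long 20% of the time, an inactive one always does.
p_active = posterior(prior_a=0.7, p_b_given_a=0.2, p_b_given_not_a=1.0)
print(round(p_active, 3))
```

The long silence pulls the belief that the customer is active well below the prior, which is the same kind of reasoning the model applies to recency.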

The authors assume that monetary value is independent of the underlying transaction process, which means that value per transaction can be factored out. They therefore recommend focusing first on the "flow of future transactions". To get the estimated customer lifetime value, we can rescale the discounted expected transactions (DET) with a multiplier for the value per transaction. The DET comes from the Pareto/NBD model, and the expected value per transaction comes from a gamma-gamma submodel.

CLV = margin * (revenue / transaction) * DET
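To make the arithmetic concrete, here is a minimal sketch of the rescaling step. The function name and the input numbers are hypothetical; in practice the DET would come from the fitted Pareto/NBD model and the revenue per transaction from the spend model.

```python
def customer_lifetime_value(margin: float, revenue_per_transaction: float, det: float) -> float:
    """Rescale discounted expected transactions (DET) into a monetary value."""
    return margin * revenue_per_transaction * det


# Hypothetical customer: 30% margin, 50 in revenue per transaction,
# and 5 discounted expected future transactions.
clv = customer_lifetime_value(margin=0.3, revenue_per_transaction=50.0, det=5.0)
print(clv)
```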

The Pareto/NBD framework is based on the following assumptions.

Customers go through two stages in their lifetime with a specific firm: They are active for some time period, and then they become permanently inactive.
While customers are active, they can place orders whenever they want. The number of orders a customer places in any given time period appears to vary randomly around his or her underlying average rate.
Customers (while active) vary in their underlying average purchase rate.
The point at which a customer becomes inactive is unobserved by the firm. The only indication of this change in status is an unexpectedly long time since the customer's last transaction, and even this is an imperfect indicator; that is, a long hiatus does not necessarily indicate that the customer has become inactive. There is no way for an outside observer to know for sure (thus the need for the model to make a "best guess" about this process).
Customers become inactive for any number of reasons; thus, the unobserved time at which a customer becomes inactive appears to have a random component.
The inclination for a customer to "drop out" of their relationship with the firm is heterogeneous. In other words, some customers are expected to become inactive much sooner than others, and some may remain active for many years, well beyond the length of any conceivable data set.
Purchase rates (while a customer is active) and drop out rates vary independently across customers.
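The assumptions above can be sketched as a small simulation of one customer. This is an illustration only: it uses the usual Pareto/NBD specification (Poisson purchasing, exponential lifetime, gamma-distributed rates across customers), the parameter names r, alpha, s and beta follow the conventional notation, and the values are made up rather than fitted.

```python
import random


def simulate_customer(rng, r, alpha, s, beta, T):
    """Simulate one customer's purchase times on (0, T].

    Returns (purchase_times, lifetime), where lifetime is the unobserved
    time at which the customer becomes permanently inactive.
    """
    # Heterogeneity: each customer draws their own rates, independently.
    # random.gammavariate takes (shape, scale), so a rate b becomes scale 1/b.
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)
    dropout_rate = rng.gammavariate(s, 1.0 / beta)

    # The customer is active for an exponentially distributed lifetime.
    lifetime = rng.expovariate(dropout_rate)
    end = min(lifetime, T)

    # While active, purchases arrive as a Poisson process, i.e. with
    # exponentially distributed gaps between consecutive purchases.
    times = []
    t = rng.expovariate(purchase_rate)
    while t <= end:
        times.append(t)
        t += rng.expovariate(purchase_rate)
    return times, lifetime


rng = random.Random(2015)
times, lifetime = simulate_customer(rng, r=0.5, alpha=10.0, s=0.6, beta=12.0, T=52.0)
print(len(times), "purchases observed in 52 weeks")
```

Note that the firm only ever sees `times`; `lifetime` is the latent quantity the model has to make a best guess about.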

The model only needs recency and frequency. This is represented by the columns x, t.x, T in the R package BTYD. x is the number of transactions observed in the time interval (0, T] and t.x (0 < t.x <= T) is the time of the last transaction.
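As a rough illustration of that summary, the sketch below reduces one customer's transaction times to the triple (x, t.x, T). The function name is hypothetical and this is not the BTYD API; setting t.x to 0 for a customer with no transactions is a convention assumed here.

```python
def summarise_customer(transaction_times, T):
    """Reduce a list of transaction times to (x, t_x, T).

    x   : number of transactions observed in (0, T]
    t_x : time of the last observed transaction (0.0 if there are none)
    T   : length of the observation period
    """
    observed = sorted(t for t in transaction_times if 0 < t <= T)
    x = len(observed)
    t_x = observed[-1] if observed else 0.0
    return x, t_x, T


# Times measured in weeks; the purchase at week 60 falls outside the window.
summary = summarise_customer([3.0, 10.5, 27.0, 60.0], T=39.0)
print(summary)
```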

The authors develop a formal model for lifetime value because the observed data is sparse and therefore unreliable.

These calculations give us the expected lifetime number of transactions from a customer. To calculate the lifetime value, we also need a model for the expected value of the transactions. The assumptions are:

The dollar value of a customer's transaction varies randomly around his or her average transaction value.
Average transaction values vary across customers but not over time.
The distribution of average transaction values is independent of the transaction process.

The authors' descriptive statistics indicate that the sample transaction values aren't normally distributed. A log-normal distribution across the population for the average transaction value would be a natural alternative, and it would also make sense to use a log-normal distribution for random individual-level purchase behaviour around the person's mean. The authors write that they cannot take this route because "there is no closed-form expression for the convolution of lognormals", so the distribution of a customer's observed average over several transactions would have no closed form.

Closed-form expression: A mathematical expression that can be evaluated in a finite number of operations. Usually, if an expression contains a limit, an infinite sum or an unevaluated integral, it is not closed form.

Convolution: A function derived from two given functions by integration that expresses how the shape of one is modified by the other.

Instead of the log-normal distribution, they choose the gamma distribution, adapting the gamma-gamma model from Colombo and Jiang (1999).

The gamma distribution is a two-parameter family of continuous probability distributions. The exponential distribution and chi-squared distribution are two special cases of gamma distributions. There are three alternative parameterisations of a gamma distribution: shape and scale, shape and rate, or shape and mean.
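The three parameterisations describe the same distribution, which the following sketch tries to make explicit. The class and method names are my own; the relations used are rate = 1 / scale and mean = shape * scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gamma:
    """A gamma distribution stored as shape and scale."""

    shape: float
    scale: float

    @classmethod
    def from_rate(cls, shape: float, rate: float) -> "Gamma":
        return cls(shape, 1.0 / rate)

    @classmethod
    def from_mean(cls, shape: float, mean: float) -> "Gamma":
        return cls(shape, mean / shape)

    @property
    def rate(self) -> float:
        return 1.0 / self.scale

    @property
    def mean(self) -> float:
        return self.shape * self.scale

    @property
    def variance(self) -> float:
        return self.shape * self.scale ** 2


# The same distribution written three ways.
a = Gamma(shape=2.0, scale=0.25)
b = Gamma.from_rate(shape=2.0, rate=4.0)
c = Gamma.from_mean(shape=2.0, mean=0.5)
print(a == b == c)

# Special cases: exponential with rate 4 is shape 1, and
# chi-squared with k degrees of freedom is shape k/2 with scale 2.
exponential = Gamma.from_rate(shape=1.0, rate=4.0)
chi_squared_3 = Gamma(shape=1.5, scale=2.0)
```

Which parameterisation a library expects matters in practice, because passing a rate where a scale is expected silently gives a different distribution.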

Monetary value
Why do we need a model for monetary value at all? Isn't the mean of observed values sufficient? We cannot necessarily trust the observed value m.x because of potential outliers skewing individual results. If a customer has made a payment with a size far away from the mean, we want to debias the forecast. The monetary value of each transaction is denoted by z.1, z.2 ... z.x. As x approaches infinity, the observed mean transaction value m.x approaches the true mean E(M). We expect this to be a slow process and one which the typical sparse transaction data set is far from approximating.

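The z.i are assumed to be i.i.d. gamma variables with shape parameter p and scale parameter v.
The sum of x i.i.d. gamma(p, v) random variables has a gamma distribution with shape parameter px and scale parameter v.
A gamma(px, v) random variable multiplied by the scalar 1/x also has a gamma distribution, with shape parameter px and scale parameter vx, so this is the distribution of m.x. (In the density below, v enters as a rate, that is, an inverse scale.)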

The individual-level distribution of m.x is given by f(m.x | p, v, x) = ( (v x)^(p x) m.x^(p x-1) e^(-v x m.x) ) / Gamma(p x).
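A quick numerical check of this density, as a sketch: the function below evaluates f(m.x | p, v, x) in log space and integrates it on a grid. The parameter values are arbitrary illustration values. If the formula is read correctly, the density should integrate to 1 and have mean p / v, regardless of x.

```python
import math


def m_x_density(m_x: float, p: float, v: float, x: int) -> float:
    """Evaluate f(m_x | p, v, x) = (v x)^(p x) m_x^(p x - 1) e^(-v x m_x) / Gamma(p x)."""
    a = p * x  # shape of the distribution of the observed mean
    b = v * x  # v * x enters the density as a rate
    log_f = a * math.log(b) + (a - 1.0) * math.log(m_x) - b * m_x - math.lgamma(a)
    return math.exp(log_f)


# Midpoint-rule integration on (0, 150) with arbitrary parameters.
p, v, x = 6.0, 0.2, 3
h = 0.01
grid = [(i + 0.5) * h for i in range(15000)]
total = sum(m_x_density(m, p, v, x) * h for m in grid)
mean = sum(m * m_x_density(m, p, v, x) * h for m in grid)
print(round(total, 4), round(mean, 2))
```

Working in log space avoids overflow in (v x)^(p x) and Gamma(p x) when a customer has many transactions.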

The expected monetary value E(M) is a weighted mean of the observed monetary value m.x and the population mean. More transactions (a higher value of x) lead to more weight being placed on the individual observed mean.
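The weighting can be sketched as follows. This is my reading of the gamma-gamma model rather than something spelled out in these notes: it assumes that v itself follows a gamma distribution across customers with shape q and scale parameter gamma, which gives a population mean of p * gamma / (q - 1) for q > 1. The parameter values are made up.

```python
def expected_spend(m_x: float, x: int, p: float, q: float, gamma: float) -> float:
    """Expected mean transaction value given an observed mean m_x over x transactions.

    Assumes the gamma-gamma setup described in the lead-in, with q > 1.
    """
    population_mean = p * gamma / (q - 1.0)
    # Weight on the individual's own observed mean; it grows with x.
    weight = p * x / (p * x + q - 1.0)
    return (1.0 - weight) * population_mean + weight * m_x


# Hypothetical parameters giving a population mean of 30.
p, q, gamma = 6.0, 4.0, 15.0
for x in (0, 1, 5, 50):
    # A customer whose observed mean spend is 100, well above the population mean.
    print(x, round(expected_spend(100.0, x, p, q, gamma), 2))
```

With no transactions the estimate is the population mean, and as x grows it moves towards the customer's own observed mean of 100.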

A marginal distribution is the probability distribution of a subset of a larger set of random variables, obtained by summing or integrating over the remaining variables. The term marginal came about because these distributions used to be found by summing values in tables along rows or columns and writing the totals in the margin of the table.

Sources
“Counting Your Customers” the Easy Way: An Alternative to the Pareto/NBD Model
RFM and CLV: Using Iso-Value Curves for Customer Base Analysis

Monday, December 21, 2015
