Sunday, June 21, 2015

How to calculate the virality coefficient in SQL

One of the most powerful way to grow an online business is through word of mouth. Cohort analysis is a powerful way to understand how effective you are at turning users into advocates. The number of users brought in by an invite program is called the virality coefficient. It measures how many guests (new users) your hosts (existing users) are bringing in. Depending on your business, you will be more concerned with the new user virality coefficient, or the active user virality coefficient. Here, I explain how to calculate a generic one week new user virality coefficient with SQL.

First, some definitions


  • Host: An existing user.
  • Guest: A potential users that has been invited by a host.
  • Conversion: In this case, we define conversion as making a payment.
  • Cohort: A collection of host that share some attribute. If we are calculating the new user virality coefficient, the shared attribute is that they converted in the same time frame.
  • Time limit: To make the data comparable across cohorts, we must look at the same time frame. Here, we use one week. If you are familiar with cohort tables, they show the value across several time periods.
  • One week new user virality coefficient = Number of converted guests by an existing user within one week of that user converting themselves
  • 1wnuvc=(guests,1w | host)/hosts

What the database is assumed to contain

A typical database structure for an invite program could look something like this:

Invite_table
Host_id
Guest_id
Guest_first_payment_date

User table
Id
Sign_up_date
First_payment_date

Step 1

First we define what cohort we want to measure. In this example, we will define a cohort as the week of the first payment.

select 
date_format(user_table. first_payment_date, '%x-%v')
, count(user_table.first_payment_date)
from user_table
group by 1

By using %x instead of %Y to calculate the year, we make sure to calculate the year correctly even when the year changes.

Step 2

We then calculate the virality coefficient per host within given time frame. Let's say we want to measure the one week virality coefficient. The case statement checks if the difference between the conversion of the guest and the host is less than seven days and only counts those cases.

Since the date of the host's first payment is stored in user_table, we need to do a join to be able to calculate the difference.

select
invite_table.host_id
,sum(case when
datediff(invite_table.guest_first_payment_date, user_table.first_payment_date) <=7
then 1
else 0
end) as individual_one_week_vc
from invite_table
join user_table
on user_table.id = invite_table.host_id
group by 1

Step 3

Finally, we combine the two. The virality coefficient for the weekly cohort is calculated as the sum of the individual virality coefficient, divided by the number of hosts in the cohort.

select 
date_format(user_table. first_payment_date, '%x-%v')
,count(user_table. first_payment_date)
,sum(conversions.individual_one_week_vc)/count(user_table. first_payment_date) as one_week_virality_vc
from user_table

left join (
select
invite_table.host_id
,sum(case when
datediff(invite_table.guest_first_payment_date, user_table.first_payment_date) <=7
then 1
else 0
end) as individual_one_week_vc
from invite_table
join user_table
on user_table.id = invite_table.host_id
group by 1
) conversions
on conversions.host_id=user_table.id

group by 1


No comments:

Entertaining Blogs - BlogCatalog Blog Directory
Bloggtoppen.se