Probability Models for Assessing the Effectiveness of Advertising Channels in the Internet Environment

Marketing specialists use several channels simultaneously to attract visitors to websites. It is difficult to assess separately not only the efficiency and conversion of each channel but also the interconnection between channels. Problems arise when users visit a website from several sources and only then perform the key action. Various attribution models are used to assess effectiveness and to select the best channels, and these models are reviewed in the present paper. We propose using multi-channel attribution, which provides an aggregate assessment of multi-channel sequences by taking into account their interdependent nature. The purpose of the paper is to create an attribution model that comprehensively evaluates multi-channel sequences and shows the effect of each channel on conversion. The presented attribution model can be based on graph theory or on Markov chains. The first method of calculation is more visual; the second (based on Markov chains) allows working with a large amount of data. As a result, a multi-channel attribution model based on Markov processes or graph theory is presented that allows a comprehensive assessment of the impact of each channel on conversion. Calculations carried out with both methods confirm the adequacy of the model for the assigned tasks.


Introduction
Recently, successful internet marketers have been using more and more channels. In addition to search engine optimization and contextual advertising (Yandex.Direct and Google AdWords), e-mail, social networks, Instagram, remarketing/retargeting, etc. have come into practical use. Marketers therefore face the task of selecting the advertising channels that will be most effective for a specific project. Besides the difficulty of choosing the optimal advertising channels, it is difficult to choose a model for the integrated evaluation of channel efficiency, which is needed for the subsequent distribution of the advertising budget between channels.
The first difficulty in comprehensive evaluation is that users can reach the website in several ways: by following a direct link, from social networks, from advertising links on Yandex, etc. Moreover, before performing the desired action (conversion) on the site, users can visit the site repeatedly from different "entry points": the first time they may come to the site by clicking on an advertisement (CPC) in Yandex search results, the second visit may be by direct link (Direct), and the third one, leading to the conversion (C), may be from a social network (Social). In this case we observe the chain (multichannel sequence):
CPC → Direct → Social → C (1)
Thus, when evaluating the effectiveness of advertising channels, the marketing specialist first of all needs to answer two questions: how can the contribution of a particular source to conversion on the website be assessed, and what happens to conversion on the site if a given marketing channel is excluded? To answer these questions, there are a number of methodologies, which are called attribution models.

Methods Literature Review
An attribution model is a way of distributing the "weight" of a conversion between channels. Depending on the choice of attribution model, the weight of each channel (source) is calculated, which can be conditionally considered as the contribution of this source to the conversion. Consider the sequence (2). The following basic attribution models are currently distinguished:
1. Last interaction: Last Click Model (LCM). Because of its simplicity and intuitive "correctness", this model has become the most widespread in practice. In the most general case, within the LCM model 100% of the weight of the conversion is given to the last channel in the multichannel sequence, the one that preceded the desired action.
2. First interaction: First Click Model (FCM). In this model, 100% of the weight goes to the first source in the sequence and 0% to all the rest.
3. Linear: Linear Model (LM). Within its framework, all channels get a nonzero weight. In the case of LM, all channels have the same weight (that is, the same contribution to the conversion) and are considered equivalent.
4. Time decay: Time Decay Model (TDM). The TDM attribution model is based on the assumption that the contribution of a channel is greater the closer it is to the conversion, so the channel weight is a monotonically increasing function of its position in the chain.
5. Position-based: Position Type Model (PTM). The PTM attribution model is a combination of three models: LCM, FCM and LM. Within its framework, the maximum share (usually 40% each) goes to the first and last interactions in the chain, and the remainder (typically 20%) is distributed evenly (as in the linear model) between the intermediate channels.
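The five classical models can be compared side by side with a short sketch. It is only an illustration under stated assumptions: the function name `attribution_weights` is hypothetical, the time-decay weights are taken to grow geometrically with position (the text only requires a monotonically increasing function), and the handling of chains with one or two channels in the position-based model is our own choice.

```python
def attribution_weights(n, model, decay=2.0):
    """Return the conversion weights for a chain of n channels.

    Position 0 is the first interaction, position n - 1 the last one.
    """
    if n < 1:
        raise ValueError("the chain must contain at least one channel")
    if model == "last":        # LCM: everything goes to the last channel
        return [0.0] * (n - 1) + [1.0]
    if model == "first":       # FCM: everything goes to the first channel
        return [1.0] + [0.0] * (n - 1)
    if model == "linear":      # LM: equal shares
        return [1.0 / n] * n
    if model == "time_decay":  # TDM: assumed geometric growth towards the conversion
        raw = [decay ** i for i in range(n)]
        total = sum(raw)
        return [r / total for r in raw]
    if model == "position":    # PTM: 40% first, 40% last, 20% spread over the middle
        if n == 1:
            return [1.0]          # assumption: a single channel takes everything
        if n == 2:
            return [0.5, 0.5]     # assumption: no middle channels, split evenly
        middle = 0.2 / (n - 2)
        return [0.4] + [middle] * (n - 2) + [0.4]
    raise ValueError("unknown model: " + model)


chain = ["CPC", "Direct", "Social"]
for name in ("last", "first", "linear", "time_decay", "position"):
    print(name, dict(zip(chain, attribution_weights(len(chain), name))))
```

Running the loop on the same chain makes it visible how strongly the conclusion about a channel depends on the chosen model.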
The choice of the attribution model is the most important step in assessing the effectiveness of internet marketing (Davnis and Tinyakova, 2006). Depending on the model, the marketing manager can reach completely opposite conclusions about the profitability of a particular channel. This is especially noticeable in areas with a long decision-making process, for example in real estate (Lavrynenko and Tinyakova, 2013). The question arises: which attribution model should be considered the reference model?
As a rule, the LCM model is chosen. However, in practice there have been cases when replacing LCM with PTM and subsequently reallocating funds between channels significantly increased the efficiency of marketing activities (Hollensen et al., 2017; Kotler et al., 2016; Roos, 2017). The existing models have the following disadvantages:
- unambiguous results cannot be obtained, because their evaluation depends on the choice of a particular model;
- expert choice of the model increases the subjectivity of further decisions;
- combined models are not free of disadvantages either, because of the initially selected weights.

Data and Methodology
The reviewed model was originally developed for aggregate assessment of multi-channel sequences, assuming that the channels are interdependent.
Let us describe the data format with which our model works. Assume that during the analyzed time interval T, M transitions were made to the website, that is, we have data on M user sessions. Each session $S_i$ has a fixed set of parameters (session attributes) P. For our analysis, the following attributes must be included in the set of all session attributes:
$\{SrcType, URL, clientID, CVtype, TimeS, TimeF\} \subseteq P$ (3)
where SrcType is the link channel; URL is the address of the page that the user visited when going to the website; clientID is the user's unique identifier; CVtype indicates whether a conversion was made as a result of the session (CV: yes, N: no); TimeS and TimeF bound the time interval of the session. The channel is the source of traffic, which can include Yandex CPC, Google CPC, Facebook, Vkontakte, Instagram, Direct, Referral, etc.
Advertising channels are coded as $c_1, c_2, \dots, c_k$, assuming that their number is limited by the value k. Suppose that the M sessions $\Sigma = \{S_1, S_2, \dots, S_M\}$ were initiated by G ≤ M users. Using the unique user identifier clientID, we can divide the set Σ into G disjoint subsets:
$\Sigma = U_1 \cup U_2 \cup \dots \cup U_G$ (4)
where each $U_i$ is the set of sessions (sorted by date in ascending order) with the same clientID, i.e., a set of chronologically ordered sessions initiated by the same user. Given our assumption that [TimeS; TimeF] ⊂ T, on the basis of the data in $U_i$ we can associate with each user i the following chain of channels:
$c_{i_1} \to c_{i_2} \to \dots \to c_{i_{L_i}}$
where $L_i = |U_i|$ is the number of elements (the number of the user's transitions to the website) in the set $U_i$. The above transition chain is the sequence of traffic sources that user i used during the interaction with the website.
Next we introduce two additional "pseudo-channels", CV and N, according to the following rule:
- if a conversion occurred during the user's session with source $c_j$, then we add CV after $c_j$ and obtain ... → $c_j$ → CV → ...;
- if no conversion occurred as a result of the user's last session with source $c_j$, then we add N after $c_j$ and obtain ... → $c_j$ → N.
Let us additionally note the situation when we have chains like: Sequences with such a structure cannot arise according to the rules formulated above, but they can nevertheless occur in a number of cases, for example in call-related topics, where besides the above session parameters we have a unique relation: . A key feature of this method of forming chains of users' interaction with the website is that any chain of interaction (multichannel sequence) always ends with one of two "events": CV or N. Event N can occur only at the end of the sequence, while event CV can occur at an arbitrary place.
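The construction of user chains with the pseudo-channels CV and N can be sketched as follows. This is a minimal illustration, not the authors' implementation: sessions are assumed to be dictionaries keyed by the attribute names used above (clientID, SrcType, CVtype, TimeS), TimeS is assumed to be any sortable value, and the function name `build_chains` is hypothetical.

```python
from collections import defaultdict


def build_chains(sessions):
    """Group sessions by clientID and build one channel chain per user."""
    by_user = defaultdict(list)
    for session in sessions:
        by_user[session["clientID"]].append(session)

    chains = {}
    for client_id, user_sessions in by_user.items():
        # Chronological order of the user's sessions.
        ordered = sorted(user_sessions, key=lambda s: s["TimeS"])
        chain = []
        for session in ordered:
            chain.append(session["SrcType"])
            if session["CVtype"] == "CV":
                chain.append("CV")  # conversion happened in this session
        if chain[-1] != "CV":
            chain.append("N")       # the last session did not convert
        chains[client_id] = chain
    return chains


example = [
    {"clientID": "u1", "SrcType": "CPC", "CVtype": "N", "TimeS": 1},
    {"clientID": "u1", "SrcType": "Direct", "CVtype": "N", "TimeS": 2},
    {"clientID": "u1", "SrcType": "Social", "CVtype": "CV", "TimeS": 3},
    {"clientID": "u2", "SrcType": "CPC", "CVtype": "N", "TimeS": 1},
]
print(build_chains(example))
```

With this construction every chain ends in CV or N, and CV may also appear in the middle of a chain when the user returns after converting.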
Let us give examples of sequences formed according to the above rules. For simplicity, we take 3 different channels and add CV and N to them: ; ; ; ; ; .
The next step necessary to construct a multi-channel attribution model is to transform the sequences so that event CV, like event N, can occur only strictly at the end of the sequence (such sequences will be called elementary sequences). For this purpose, we shall "split" the original chains so that at their end they will always have a CV or N event.
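A possible way to perform this splitting is sketched below. It assumes that a chain is a list of channel names terminated by "CV" or "N", and that everything following a CV event starts a new elementary sequence; the function name `split_elementary` is hypothetical.

```python
def split_elementary(chain):
    """Split a chain into elementary sequences that each end in CV or N."""
    parts = []
    current = []
    for step in chain:
        current.append(step)
        if step in ("CV", "N"):
            parts.append(current)  # close the elementary sequence here
            current = []
    if current:
        # By construction every chain ends in CV or N, so this is an input error.
        raise ValueError("chain does not end with CV or N")
    return parts


print(split_elementary(["c1", "CV", "c2", "c3", "N"]))
```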
Let us demonstrate this method on typical sequences: chains 1-4 are modified to "elementary" form; we "split" chain 5 into: and ; we "split" chain 6 into: and ; we "split" chain 7 into: and ; we "split" chain 8 into: , and . We then consider a set of G sequences, assuming that all of them are already elementary, that is, they end in CV or N. Suppose that of the G sequences, X end with CV and G − X end with N. We denote the effect of channel $c_i$ on website conversion over time T by $I(c_i)$, and the elementary chain j by $s_j$. The impact $I(c_i)$ of the channel on conversion is defined as the number of "lost" conversions when the channel is removed from all conversion chains in which it occurs, divided by the total number of conversions X:

$I(c_i) = \frac{|\{ s_j \mid c_i \in s_j,\ s_j \text{ ends with } CV \}|}{X}$

It is obvious that for any $c_i$ the value $I(c_i)$ satisfies the inequality $0 \le I(c_i) \le 1$. Moreover, $I(c_i) = 0$ if and only if channel $c_i$ is not included in any "conversion" sequence, and $I(c_i) = 1$ if and only if the removal of $c_i$ leads to the loss of all conversions on the website. Thus, it is easy to estimate the new number of conversions $X'$ that will result after removing channel $c_i$:
$X' = X \, (1 - I(c_i))$ (6)
The sum of the effects of the channels is, in general, not equal to one. For convenience, we can introduce a normalization and calculate the normalized influence of channels on conversion:
$\hat{I}(c_i) = \frac{I(c_i)}{\sum_{j=1}^{k} I(c_j)}$
If the task is to find out how channel $c_i$ affects channel $c_j$, we can use the following argument: the user's session initiated by channel $c_i$ leads to a session with channel $c_j$ as many times as there are chains in which $c_i$ precedes $c_j$. We denote the value of this influence by $I(c_i, c_j)$. In general, the function $I(c_i, c_j)$ is not symmetric: $I(c_i, c_j) \ne I(c_j, c_i)$. Sequences where $c_i$ precedes $c_j$ and at the same time $c_j$ precedes $c_i$ (i.e. cycles are formed) can also be taken into account in the denominator of the formula for $I(c_i, c_j)$. The normalization introduced earlier can be generalized in a natural way to this case.
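The channel effect and its normalization can be computed directly from a list of elementary chains. The sketch below assumes that each chain is a list of channel names ending in "CV" or "N", and that the effect of a channel is the share of converting chains that contain it; the function name `removal_effects` is hypothetical.

```python
def removal_effects(chains):
    """Return (effects, normalized_effects) for all channels in the chains."""
    converting = [chain for chain in chains if chain[-1] == "CV"]
    x = len(converting)  # total number of conversions X
    channels = sorted({step for chain in chains for step in chain
                       if step not in ("CV", "N")})

    effects = {}
    for channel in channels:
        # Conversions lost if this channel were removed from every chain.
        lost = sum(1 for chain in converting if channel in chain)
        effects[channel] = lost / x if x else 0.0

    total = sum(effects.values())
    normalized = {ch: (value / total if total else 0.0)
                  for ch, value in effects.items()}
    return effects, normalized


chains = [["c1", "c2", "CV"], ["c2", "CV"], ["c1", "N"], ["c3", "CV"]]
effects, normalized = removal_effects(chains)
print(effects)
print(normalized)
```

The remaining number of conversions after removing a channel then follows as X multiplied by one minus the effect of that channel.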

Channel Cost Assessment
To assess the basic metrics, we also need to add to the user session parameters an indicator called the "cost of transition". It can be interpreted as the cost paid by the advertiser per user click on the given channel. If the channel is free (for example, a direct link), we assume that the cost of the transition equals 0. If only the total cost of the channel can be determined (for example, for SEO), we assume that the cost of the transition in a particular session equals the ratio of the total cost of the channel to the number of uses of this channel over all sessions. We denote the cost of the transition for channel $c_i$ in the j-th elementary chain $s_j$ by $w_{ij}$ (with $w_{ij} = 0$ if $c_i$ does not occur in $s_j$). Thus, we can estimate the cost of one chain as follows:
$W(s_j) = \sum_{i=1}^{k} w_{ij}$
The total costs for channel $c_i$ equal:
$W(c_i) = \sum_{j=1}^{G} w_{ij}$
The total costs of attracting users to the website using channels $c_1, c_2, \dots, c_k$ equal:
$W = \sum_{j=1}^{G} \sum_{i=1}^{k} w_{ij} = \sum_{i=1}^{k} \sum_{j=1}^{G} w_{ij}$ (12)
The duality of the formula results from the two ways of calculating the total costs: in the first case we sum the costs of each of the G chains, and in the second case we sum the channel costs over all k channels.
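The two ways of summing the costs can be checked with a small sketch. It assumes that every chain is stored as a list of (channel, cost of transition) pairs; the function name `cost_summary` and the numbers are purely illustrative.

```python
from collections import defaultdict


def cost_summary(chains):
    """Return (cost of each chain, total cost of each channel)."""
    chain_costs = [sum(cost for _, cost in chain) for chain in chains]
    channel_costs = defaultdict(float)
    for chain in chains:
        for channel, cost in chain:
            channel_costs[channel] += cost
    return chain_costs, dict(channel_costs)


# Hypothetical data: a free direct visit has cost 0.
chains = [
    [("CPC", 1.5), ("Direct", 0.0), ("Social", 2.0)],
    [("CPC", 1.5)],
]
chain_costs, channel_costs = cost_summary(chains)
print(sum(chain_costs), sum(channel_costs.values()))
```

Both totals agree, which is the duality expressed by the double sum.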
After describing all elements included in models of multichannel attribution, we shall review the two most effective ones: graph and matrix models (Wiesner et al., 2001).

Graph Model
To represent the set of chains as a graph, we need to fix two sets: the set V of vertices and the set E of connections between them. The marketing channels and the additional events serve as vertices:
$V = \{c_1, c_2, \dots, c_k, CV, N\}$
As E we take the pairs of connected (consecutive) elements of V taken from the elementary chains reviewed above. Since there may be coincident elements in the set E, the resulting graph can have multiple (repeated) edges; this complicates its perception. Therefore, the original graph is transformed into an oriented weighted graph (see Figure 1):

Figure-1. Oriented weighted graph
It should be noted that the weight $P(c_i \to CV)$ of the edge from source $c_i$ to CV is the probability of conversion of that source in the classical LCM model. It is obvious that the LCM model does not take into account the large amount of statistical data that we can collect by analyzing users' sessions. If we perform the calculations for all the remaining vertices, our graph takes the form below (see Figure 2).

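Figure-2. Graph for calculating the total probability of conversion of a specific channel
Based on this model, we can calculate the total probability of conversion $P_{CV}(c_i)$ for a particular channel. The following recursive formula is used for the calculation:
$P_{CV}(c_i) = \sum_{v \in V} P(c_i \to v)\, P_{CV}(v)$, with $P_{CV}(CV) = 1$ and $P_{CV}(N) = 0$
where $P(c_i \to v)$ is the weight of the edge from $c_i$ to v (zero if there is no such edge). However, if we allow transitions in the graph of the type ... → $c_i$ → $c_i$ → ... (i.e., permit loops), the recursion no longer terminates and the probabilities must be found from a system of simultaneous equations, which considerably complicates determining the required probabilities.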
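For a graph without loops or cycles the recursive calculation can be sketched in a few lines. The sketch assumes that the total probability of a vertex is the weighted sum of the total probabilities of its successors, with CV counted as 1 and N as 0; the graph is stored as a dictionary of edge weights, and both the function name and the numbers are hypothetical.

```python
def total_conversion_probability(graph, source, cache=None):
    """Total probability of reaching CV from `source` in an acyclic graph."""
    if cache is None:
        cache = {}
    if source == "CV":
        return 1.0
    if source == "N":
        return 0.0
    if source not in cache:
        # Weighted sum over all outgoing edges of the vertex.
        cache[source] = sum(
            weight * total_conversion_probability(graph, nxt, cache)
            for nxt, weight in graph[source].items()
        )
    return cache[source]


# Hypothetical edge weights; the weights leaving each vertex sum to 1.
graph = {
    "c1": {"c2": 0.5, "CV": 0.1, "N": 0.4},
    "c2": {"CV": 0.25, "N": 0.75},
}
print(total_conversion_probability(graph, "c1"))
```

The recursion would not terminate on a graph with cycles, which is the case where a system of equations or the matrix model is needed.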

Matrix Model
Let us consider the second model of multi-channel attribution, the matrix model. Consider a set of k channels $c_1, c_2, \dots, c_k$ and the two additional "pseudo-channels" CV and N. In the graph model they were represented as vertices. From them we can form a square matrix H of size $(k+2) \times (k+2)$ with conditional probabilities as elements, $h_{ij} = P(v_j \mid v_i)$, where $v_i, v_j \in \{c_1, \dots, c_k, CV, N\}$:
$h_{ij} \ge 0, \quad \sum_{j} h_{ij} = 1$
A matrix for which this condition is satisfied is called stochastic. It is known that an arbitrary stochastic matrix defines a certain random process, called a Markov process.
Such models make it possible to answer a number of important questions, in particular:
- What is the probability of passing from state $c_i$ to state $c_j$ in t steps?
- What is the probability distribution of finding the user in each channel after t steps?
To solve our problem, we must answer a special case of the first question: what is the total probability $P_{CV}(c_i)$ of passing from state $c_i$ to CV?
$P_{CV}(c_i) = \lim_{t \to \infty} (H^t)_{c_i, CV}$ (16)
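A minimal sketch of the matrix model is given below. It makes several assumptions that are not stated in the text: transition probabilities are estimated as relative frequencies of consecutive pairs in the elementary chains, CV and N are made absorbing (a channel that never occurs is also treated as absorbing), and the limit is approximated by repeated squaring. The function names are hypothetical and plain Python lists are used instead of a matrix library.

```python
def transition_matrix(chains, channels):
    """Estimate the stochastic matrix H from elementary chains."""
    states = list(channels) + ["CV", "N"]
    index = {state: i for i, state in enumerate(states)}
    n = len(states)
    counts = [[0] * n for _ in range(n)]
    for chain in chains:
        for a, b in zip(chain, chain[1:]):
            counts[index[a]][index[b]] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if states[i] in ("CV", "N") or total == 0:
            # Absorbing state: stays where it is with probability 1.
            matrix.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            matrix.append([c / total for c in row])
    return states, matrix


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def conversion_probabilities(matrix, states, squarings=10):
    """Approximate the limit of H^t by squaring H `squarings` times."""
    power = matrix
    for _ in range(squarings):
        power = matmul(power, power)
    cv = states.index("CV")
    return {s: power[i][cv] for i, s in enumerate(states)
            if s not in ("CV", "N")}


chains = [["c1", "CV"], ["c1", "N"], ["c1", "c2", "CV"], ["c2", "N"]]
states, H = transition_matrix(chains, ["c1", "c2"])
print(conversion_probabilities(H, states))
```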

Empirical Results
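As an example, we shall calculate the total probability of conversion $P_{CV}(c_1)$ for source $c_1$, writing $P(u \to v)$ for the weight of the edge from u to v. Since $c_1$ is connected with $c_2$, CV and N, while the probability of passing from N to CV equals zero and the probability of passing from CV to CV equals 1, we have:
$P_{CV}(c_1) = P(c_1 \to c_2)\,P_{CV}(c_2) + P(c_1 \to CV)$
From $c_2$ we can return to $c_1$ or pass to $c_3$, CV or N, which means:
$P_{CV}(c_2) = P(c_2 \to c_1)\,P_{CV}(c_1) + P(c_2 \to c_3)\,P_{CV}(c_3) + P(c_2 \to CV)$
Next we substitute the second expression into the first:
$P_{CV}(c_1) = P(c_1 \to c_2)\,[P(c_2 \to c_1)\,P_{CV}(c_1) + P(c_2 \to c_3)\,P_{CV}(c_3) + P(c_2 \to CV)] + P(c_1 \to CV)$
For convenience, we denote $x = P_{CV}(c_1)$; then we obtain the following linear equation:
$x = P(c_1 \to c_2)\,[P(c_2 \to c_1)\,x + P(c_2 \to c_3)\,P_{CV}(c_3) + P(c_2 \to CV)] + P(c_1 \to CV)$
Now we calculate $P_{CV}(c_3)$. From source $c_3$ we can only go to CV or N. As a result we obtain:
$P_{CV}(c_3) = P(c_3 \to CV)$
Finally we have the following equation:
$x = P(c_1 \to c_2)\,[P(c_2 \to c_1)\,x + P(c_2 \to c_3)\,P(c_3 \to CV) + P(c_2 \to CV)] + P(c_1 \to CV)$
from which we can determine x:
$x = \frac{P(c_1 \to c_2)\,[P(c_2 \to c_3)\,P(c_3 \to CV) + P(c_2 \to CV)] + P(c_1 \to CV)}{1 - P(c_1 \to c_2)\,P(c_2 \to c_1)}$
The main advantage of the above model is its clarity, while its obvious disadvantage (as can be seen even in this simple example) is the high computational complexity when the number of traffic sources is large.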
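Let us make the calculation for the matrix model. For the example above, with the states ordered as $c_1, c_2, c_3, CV, N$ and $P(u \to v)$ denoting the probability of passing from u to v, we obtain:

$H = \begin{pmatrix} 0 & P(c_1 \to c_2) & 0 & P(c_1 \to CV) & P(c_1 \to N) \\ P(c_2 \to c_1) & 0 & P(c_2 \to c_3) & P(c_2 \to CV) & P(c_2 \to N) \\ 0 & 0 & 0 & P(c_3 \to CV) & P(c_3 \to N) \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$

and the required probability is the element in row $c_1$ and column CV of $\lim_{t \to \infty} H^t$.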
It can be proved that this limit exists when no transitions to other states are possible from the states CV and N. Of course, in practice we cannot work with an "infinite" power of the matrix. However, instead of "infinity" it is usually sufficient to take a sufficiently high power of two. The convenience of raising a matrix to the power $2^t$ is that the current result only has to be multiplied by itself t times.
Let us show on our example the rate of "convergence" of the limit to the required probability. After a few squarings the calculated probability differs from the exact value, previously obtained on the basis of the graph model, in the fourth decimal place, and the probability values calculated for the higher powers coincide. Thus, in this case it is sufficient to calculate $H^{8}$, which requires only 3 matrix multiplications. The rate of convergence of the limit to the required probability is therefore high enough to make this model effective in practical applications.
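The convergence can be illustrated numerically. The sketch below does not use the paper's data: the three channels are labeled c1, c2, c3 by assumption, all transition probabilities are invented for illustration, and the "exact" value is obtained by solving the linear equation of the graph model for this hypothetical structure (c1 leads to c2, CV, N; c2 leads to c1, c3, CV, N; c3 leads to CV, N).

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# State order: c1, c2, c3, CV, N. Hypothetical numbers, each row sums to 1.
H = [
    [0.0, 0.5, 0.0, 0.1, 0.4],
    [0.2, 0.0, 0.3, 0.2, 0.3],
    [0.0, 0.0, 0.0, 0.3, 0.7],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
CV = 3

# Graph model: x = p12 * (p21 * x + p23 * p3cv + p2cv) + p1cv, solved for x.
exact = (H[0][1] * (H[1][2] * H[2][CV] + H[1][CV]) + H[0][CV]) / (
    1.0 - H[0][1] * H[1][0]
)


def errors_by_squaring(matrix, steps):
    """Absolute error of the c1 -> CV entry after each squaring."""
    errors = []
    power = matrix
    for _ in range(steps):
        power = matmul(power, power)
        errors.append(abs(power[0][CV] - exact))
    return errors


errors = errors_by_squaring(H, 5)
for t, err in enumerate(errors, start=1):
    print(f"H^{2 ** t}: |error| = {err:.2e}")
```

For these invented numbers the error shrinks rapidly with each squaring, which is the behaviour described above; the concrete decimal places depend on the transition probabilities used.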

Summary and Conclusion
The classical models of conversion attribution currently in use were reviewed. In addition, a multi-channel attribution model based on Markov processes (chains) was described that makes it possible to evaluate comprehensively the probability of conversion for each advertising channel and to calculate the impact of the channel on website conversion. Approaches that make it possible to adapt the model to the optimization of bids in contextual advertising were presented.