Data Mining: The AdWords Problem Review

This post is a continuation of the previous post on Advertising on the Web and Data mining. Here we conclude by reviewing some basic algorithms for placing ads on the web. The adwords problem is to be solved with association graphs algorithm in Data Mining.

The AdWords Problem

With Adwords we consider Google’s example. Google’s policy for placing ads is not on the bid value but in the total amount expected to be received for the display of each ad. The value of an ad was taken to be the product of the bid and the click-through rate (the click-through rate is taken for each ad, based on the statistics of displays of an ad).

Match graphs

After we concluded that the ad placement challenge is to be done with on-line algorithms of greedy or generalized balance types (previous post), we need to consider also the match graphs for bid-query match.

This problem involves two sets of nodes and a set of edges between members of the two sets. The first set is the incoming search queries while the second set is the related ads from various sources. The goal is to ﬁnd a maximal matching — as large a set of edges as possible that includes no node more than once. The On-Line Solution to the Matching Problem would be a greedy algorithm for ﬁnding a match in a bipartite graph. It is done by ordering the edges in certain way. The competitive ratio in this case should be no less than 1/2. That is the on-line greedy matching algorithm finds matches at least half as many nodes as the best off-line algorithm does.

Basic Adwords workflow

Of course, the decision regarding which ads to show must be made on-line. So, there should be only on-line algorithms considered for solving the adwords problem. The input data and algorithm results are as follows.
Input:
1. A set of bids by advertisers for search queries.
2. A click-through rate (CTR) for each advertiser-query pair
3. A budget for each advertiser. We shall assume budgets are for a
month, although any unit of time could be used.
4. A limit on the number of ads to be displayed with each search query.

• Output – Respond to each search query with a set of advertisers of those characteristics:
1. The size of the set is no larger than the limit on the number of ads per query.
2. Each advertiser has a bid on the search query.
3. Each advertiser has enough budget left to pay for the ad if it is
clicked upon.

Implementing the Adwords algorithm

Basically we now have an idea of how ads are selected to go with the answer to a search query. But we have not addressed the problem of ﬁnding the bids that have been made on a given query.

Finding bids for search queries

Which algorithm finds the bids that have been made on a given query? For a simple case we can represent a query by the list of its words, in sorted order. Bids are stored in a hash table or similar structure, with a hash key equal to the sorted list of words. A search query can then be matched against bids by a straightforward lookup in the table.

The simplest version of this implementation serves in situations where the bids are on exactly the set of words in the search query (in reality, search engines all offer to advertisers a ‘broad matching’ feature). It also does not consider the historical frequency of queries or statistics.

The task might not just be limited to the same set of words. For example, Google also matches adwords bids to emails. There, the match criterion is not based on the equality of sets but on the inclusiveness of the bid’s set in the related document (email). How could you compare the millions of words from bids with the millions of words in the stream of regular emails?

Matching Algorithm for Documents and Bids

The Matching Algorithm for Documents and Bids has been developed for this purpose. Briefly, it works by processing the words of the document, rarest-ﬁrst. Word sets whose ﬁrst word is the current word are copied to a temporary hash table, with the second word as the key. Sets already in the temporary hash table are examined to see if the word that is their key matches the current word, and, if so, they are rehashed using their next word. Sets whose last word is matched are copied to the output. For more details on this algorithm you might read more from the same book, paragraph 8.5.3.

Summary

The data mining algorithms for ad placement in response to a search query or larger documents (emails) are on-line greedy or generalized balance algorithms on the match graphs. The complex query/bid matching process is to be done with hashing sets-of-words tables with the consequent matching of rare to unrare word order in order to find the total match of the bid set in the document.