Saturday, March 19, 2011

Semantic Duplicates


Overview

Duplicate keywords in AdWords would strictly be defined as keywords with the same text and the same match-type. However, this definition leaves out a large number of cases where duplicity is created not directly by the keyword itself, but by the way the Google algorithm sees these keywords. We call this semantic duplicity. The purpose of this document is to define semantic duplicity, formally and also through many examples, and then to outline a step-by-step process for removing semantically duplicate keywords from MakeMyTrip accounts.

Theory Development

An Example

Consider the keywords “air fare international” and “international air fares”, both bid on broad match. These would not be identified by the standard Google duplicate keyword finder because their text differs. However, consider the case when a user makes the search query “cheapest air fares for international routes”. Given the nature of broad match in Google’s algorithms, this search query could be matched to either of the two keywords above. Hence, which keyword finally enters the auction is decided not by their text but by their bids. If “air fare international” was bid at 19 and “international air fares” was bid at 20, the latter would enter the auction.

Another Example

Next consider two keywords in phrase match – “cheap international” and “cheap international airfare” and consider the user query “cheap international airfare to Bangkok”. By virtue of Google’s phrase-match definition, both keywords are eligible to enter the auction. Which one finally does would be decided by their bids. If “cheap international” was bid at 20 and “cheap international airfare” was bid at 19, the former would enter the auction.
It is important to note that if the search query was “cheap international tickets”, then the keywords would not be semantically duplicated as only the first one will match to the query irrespective of their bids.

Generalization

From the two examples above, we observe that a keyword’s text, match-type and bid together determine whether it will enter the auction or not. We now define semantic duplicity.
Semantic Duplicity: A set of keywords is semantically duplicated for a given search-query, when the decision as to which one will enter the auction for that search-query is determined purely by their bids[1].
We also need to extend this definition for keyword sets which will have semantic duplicity for all possible search queries.
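The definition can be illustrated with a minimal Python sketch built on the phrase-match example above. The function names (`phrase_matches`, `auction_entrant`) are hypothetical, and the matching rule is a deliberately simplified stand-in for Google's phrase match (the keyword's words must appear in the query contiguously and in order), not the actual algorithm.

```python
def phrase_matches(keyword, query):
    """Simplified phrase match (an assumption, not Google's real rule):
    the keyword's words appear in the query contiguously and in order."""
    kw = keyword.lower().split()
    q = query.lower().split()
    return any(q[i:i + len(kw)] == kw for i in range(len(q) - len(kw) + 1))


def auction_entrant(bids, query):
    """Return (eligible keywords, semantically duplicated?, highest-bid keyword).

    `bids` maps keyword text to its bid. The set is semantically duplicated
    for this query when more than one keyword is eligible, because the
    entrant is then decided purely by the bids.
    """
    eligible = [kw for kw in bids if phrase_matches(kw, query)]
    winner = max(eligible, key=bids.get) if eligible else None
    return eligible, len(eligible) > 1, winner


bids = {"cheap international": 20, "cheap international airfare": 19}

# Both keywords are eligible, so the bids decide: "cheap international" enters.
print(auction_entrant(bids, "cheap international airfare to Bangkok"))

# Only the first keyword is eligible, so there is no duplicity for this query.
print(auction_entrant(bids, "cheap international tickets"))
```

The same keyword set gives different answers for different queries, which is why the definition is stated per search-query.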

Configurations

Broad Match

In a set of keywords that are all bid on broad match, semantic duplicity can arise if any of the following conditions is met:
Rearrangements: This is the configuration when all keywords are the exact same set of words but in different order of arrangement, e.g. “cheap international fare” and “fare cheap international”.
Plurals: This is the configuration when all keywords have the same word, sometimes in singular and other times in plural, e.g. “cheap international fare” and “cheap international fares”.
Relevant Variants: This is the configuration when all keywords have the same word but in close variant forms. Variant forms would be the same word in different parts of speech, for instance once as an adjective and at other times as an adverb. For example, “cheapest international”, “cheaply international” and “cheap international” contain variants of the root “cheap”; however, only relevant variants would qualify. Since the internals of Google’s broad match algorithm are trade secrets, it is better to be extremely conservative while building the set of variants for a word. Only variants where the interpretation is unambiguously the same should be included. In this example, “cheapster” would not be a relevant variation of “cheap” as it has a completely different connotation.
Synonyms: This is the configuration when all keywords have the same word but as synonyms. Once again, since the internals of Google’s broad match algorithm are trade secrets, it is better to be extremely conservative while building the set of synonyms for a word. An example would be “cheap international fare” and “cheap international price”, where fare and price are synonyms.
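One way to test for these four configurations is to reduce each broad-match keyword to a canonical key and compare keys. The sketch below is only an illustration: `ROOTS` is a hypothetical, hand-maintained map of approved variants and synonyms, and `singular` is a naive plural rule chosen for brevity. Neither reflects how Google actually normalizes keywords.

```python
# Hypothetical, hand-maintained map from an approved variant or synonym to
# its root word. Keep it conservative: only unambiguous equivalents belong here.
ROOTS = {"cheapest": "cheap", "cheaply": "cheap", "price": "fare"}


def singular(word):
    """Naive singularization (an assumption): drop one trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def broad_key(keyword):
    """Canonical key that ignores word order, plurals, and approved
    variants/synonyms. Equal keys suggest a semantically duplicate pair."""
    words = (singular(w) for w in keyword.lower().split())
    return tuple(sorted(ROOTS.get(w, w) for w in words))


# Rearrangement, plural, variant and synonym pairs all share a key.
print(broad_key("cheap international fare") == broad_key("fare cheap international"))
print(broad_key("cheap international fare") == broad_key("cheap international fares"))
print(broad_key("cheapest international") == broad_key("cheap international"))
print(broad_key("cheap international fare") == broad_key("cheap international price"))

# "cheapster" is not in ROOTS, so it is not treated as a variant of "cheap".
print(broad_key("cheapster international") == broad_key("cheap international"))
```

Keeping the variant and synonym map explicit and small is one way to honour the advice to be extremely conservative: a pair is flagged only if someone has deliberately approved the equivalence.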

Any Match

In a set of keywords that are all bid on phrase match or broad match, semantic duplicity can arise when one keyword is an exact substring of another, with the same or a different match-type, and the longest keyword is a substring of the search query. For example, “cheap international”, “cheap international ticket” and “cheap international ticket to Bangkok” are all semantic duplicates when the search query is “cheap international ticket to Bangkok from Mumbai”.
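The substring condition can be checked mechanically. The following sketch reads "substring" at the word level and interprets the condition as a chain: ordered by length, each keyword must be contained in the next, and the longest must be contained in the query. Both that reading and the function names are assumptions made for illustration.

```python
def contains_phrase(shorter, longer):
    """True if the words of `shorter` occur contiguously, in order, in `longer`."""
    s, l = shorter.lower().split(), longer.lower().split()
    return any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))


def is_substring_chain(keywords, query):
    """True if the keywords, ordered by word count, nest inside one another
    and the longest one nests inside the query."""
    ordered = sorted(keywords, key=lambda kw: len(kw.split()))
    chain = ordered + [query]
    return len(keywords) > 1 and all(
        contains_phrase(a, b) for a, b in zip(chain, chain[1:])
    )


keywords = [
    "cheap international ticket",
    "cheap international",
    "cheap international ticket to Bangkok",
]
print(is_substring_chain(keywords, "cheap international ticket to Bangkok from Mumbai"))
```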

Disadvantages of Semantic Duplicity

Why should semantic duplicity be removed from accounts?
Control over Costs: With semantic duplicity present in an account, the advertiser will quickly lose control over his costs. This would be especially apparent when the keywords are in different ad-groups or campaigns. If “cheap international fares” and “cheap international” are in two different ad-groups with broad match type, and the advertiser wants to control his cost on generic international terms, his work is made very tough as he will not know for which keyword his ad would be served. He would then have to be cognizant not just of bids at an ad-group level but also at a semantic duplicate group level, which would then cut across many ad-groups making the task doubly complex.
Control over Ad-Copies: Not just the cost, but the general quality and hygiene of the account will suffer if semantic duplicity is present. The advertiser can never control which ad-copy would be shown to the user for a given search query. While he may have modified ad-copies on some keywords in one ad-group, he may not have modified them for semantically duplicate keywords in another ad-group. An egregious manifestation is the following case: “air fare to Bangkok” shows the ad-copy “Delhi-Bangkok for Rs. 2300” and “cheap air prices to Thailand” shows the ad-copy “Delhi-Bangkok for Rs. 4500”. (Aside: Can you identify which types of SD configurations are present in these two keywords?[2])

Resolution

We are under no illusion that Semantic Duplicity can be completely removed from an account. But it can definitely be minimized by some basic operations across the entire account:

SD Resolution Process

1.       Remove duplicates: the same keyword with the same match-type and the same targeting across the accounts should be removed first. This can be done using the Google duplicate finder tool or a simple Excel exercise.
2.       Remove rearrangements, plurals and substrings for broad match: find all broad-match keywords with the same words, in either singular or plural form, that are rearrangements and/or substrings of other broad-match keywords. Keep only the keyword with the highest current bid and delete the rest. If more than one keyword shares the highest bid, keep the simplest-looking one and remove the rest. For example, this step could throw up the following set:
Keyword                     Match-Type   Bid
Cheap international fares   Broad        23
Cheap fares international   Broad        22
fare cheap international    Broad        34
Cheap Fare international    Broad        13
This is a semantically duplicate set; one should only keep “fare cheap international” (since it has the highest bid) and delete all other keywords.
3.       Build a list of SD sets based on variants and synonyms: Once the keyword list has been pruned by rearrangements and plurals, one should find those sets of keywords in which every word of one keyword is the same as, or a variant or synonym of, some word in another keyword, in any order. This list should then be comprehensively and carefully reviewed. If there are real benefits to de-duping them and the problem cannot be resolved by other means[3], the keyword with the highest bid should be kept and the others removed.
4.       Include all match-types: Of the pruned list, if the same keyword or any of its substrings are present as a stricter match type in the account, either the bid of the stricter type should be raised or the bid of the looser match-type should be lowered.
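Step 2 of the process can be sketched in Python using the example set from that step. This is a minimal illustration under stated assumptions: the grouping key handles only rearrangements and a naive plural rule (substrings are left out), and "simplest-looking" is approximated by the shortest text. Both choices, and the function names, are hypothetical.

```python
from collections import defaultdict


def group_key(keyword):
    """Order-insensitive key with a naive plural rule (drop one trailing 's')."""
    words = keyword.lower().split()
    return tuple(sorted(w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words))


def prune(rows):
    """rows: (keyword, match_type, bid) tuples. Keep one keyword per SD group."""
    groups = defaultdict(list)
    for keyword, match_type, bid in rows:
        groups[(group_key(keyword), match_type)].append((keyword, match_type, bid))
    # Highest bid wins; ties go to the shortest text, a stand-in for
    # "simplest-looking" (an assumption).
    return [min(members, key=lambda r: (-r[2], len(r[0]))) for members in groups.values()]


rows = [
    ("Cheap international fares", "Broad", 23),
    ("Cheap fares international", "Broad", 22),
    ("fare cheap international", "Broad", 34),
    ("Cheap Fare international", "Broad", 13),
]
print(prune(rows))  # keeps only ("fare cheap international", "Broad", 34)
```

In practice the output of such a script would be a review list rather than an automatic delete list, since any keyword removed by mistake costs impressions.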

Conclusion

A fine balance needs to be struck between the need for control and the need for coverage. If executed without caution, the hunt for semantic duplicates can lead to an inadvertent yet significant loss of valuable impressions. Take for example the keywords “cheap fare to Bangkok” and “low fare to Bangkok”, both in broad match. Are they semantic duplicates? Definitely not! While to human intelligence these phrases mean exactly the same thing, the Google algorithm treats cheap and low as different words. Hence removing either one of them could lead to some lost impressions. Even in black-and-white cases, if the algorithms to auto-detect and delete SDs are not 100% accurate, every single keyword deleted inadvertently would cost the business valuable impressions. In the end, coverage must take precedence over control in cases where they come into conflict. Pruning based on SD sets must ensure a 0% loss in relevant coverage, and this should be a strict constraint on any keyword pruning process.


[1] Note that semantic duplicity, when defined like this, is contextual and not an inherent property of the keyword set. Whether a keyword set has semantic duplicity is a question that can only be asked in reference to a given set of search-queries.
[2] Fare is a synonym of prices (singular versus plural) and Bangkok is a relevant variation of Thailand; once these are accounted for, the first keyword is a substring of the second, which only adds the word cheap.
[3] Other means would be to bring them into the same ad-group and then manage the bids for both based on the preferred word forms.