Combating Ad-Fraud with Bearskin
Wherever there is money to be made, there will be people looking to take advantage of the system to line their own pockets. For Adform, this most often takes the form of click fraud, which we will highlight today. Online, we seemingly interact with real people, but what we really interact with are the digital tracks those people leave behind. Such tracks are relatively easy for non-humans to generate, which makes online fraud an irresistible temptation for some unscrupulous types.
Adform takes seriously anything that erodes the value we offer clients. We actively combat click fraud and have a range of algorithms in place that detect and exclude fraudulent activity. Adform’s latest weapon in the battle against ad fraud is Bearskin.
How is fraud committed?
As the name suggests, click fraud centres on fraudulent clicks. The effect of the fake clicks is that a particular domain has an inflated click-through rate (CTR), which is a key measure of advert efficiency, value or quality. Increased CTRs lead to higher demand and prices, which in turn mean higher revenues for the website owner. It is worth noting that the clicks themselves neither incur any extra cost for the advertiser nor generate direct revenue for the domain owner, which is somewhat unique to banner display adverts. The immediate aim is increased perceived quality, leading to increased demand.
Technically, click fraud can be implemented in a number of ways. In most substantial click fraud, robots, or automated scripts, load web pages and click on adverts served on the page. The robots will generally try to mimic human behaviour to leave tracks that are not suspicious.
This means a robot will generally have a cookie with a certain lifetime (albeit a short one), during which it will acquire a history of domains visited. The robot will also report a realistic device type, language setting, etc.
What’s the problem? Let’s block it!
As covered in my previous post, in the Adform system, a user exists only as a cookie ID – the label given to the tracks left behind by a given user.
Associated with this ID are various data: device type, tracking history, geolocation estimates, browsing times and other information. A robot can quite easily generate these types of data tracks, making it hard to distinguish from a genuine human user at first glance. This is the core difficulty in identifying and excluding robot traffic, and it is what we have developed Bearskin to target.
So what are we looking for when we search for click bots? What are the secrets of the Bearskin algorithm?
We need to look at how a given cookie interacts with an impression. If there is a click, does the click happen immediately? Where on the banner is the click? What do users do when (or if) they arrive at the landing page?
It turns out this is where most robots lose their cover – and hence this is where Bearskin gets to work. Analysis of billions of clicks has shown how a real human interacts with ads. If we find a cookie that consistently deviates from what is normal, we label this cookie a robot and we make no bids on the domain where that robot is active.
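The decision logic described above can be sketched roughly as follows. This is a minimal illustration, not Bearskin's actual implementation: the `deviates` flag, the 80% threshold and the data layout are all hypothetical.

```python
# Illustrative sketch: label a cookie a robot if it consistently deviates
# from normal human interaction, then stop bidding on the domains where
# that cookie is active. Thresholds and field names are hypothetical.

def is_suspicious(cookie_events, deviation_threshold=0.8):
    """A cookie is suspicious if most of its recorded interactions
    were flagged as deviating from typical human behaviour."""
    if not cookie_events:
        return False
    deviant = sum(1 for e in cookie_events if e["deviates"])
    return deviant / len(cookie_events) >= deviation_threshold

def domains_to_block(cookies):
    """Collect the domains where suspicious cookies are active."""
    blocked = set()
    for events in cookies.values():
        if is_suspicious(events):
            blocked.update(e["domain"] for e in events)
    return blocked

cookies = {
    "cookie-a": [{"domain": "site1.example", "deviates": True},
                 {"domain": "site2.example", "deviates": True}],
    "cookie-b": [{"domain": "site3.example", "deviates": False}],
}
print(sorted(domains_to_block(cookies)))  # ['site1.example', 'site2.example']
```

In practice the "deviates" judgement is itself the hard part; the signals below (timing, location, post-click activity) are the kind of evidence that feeds into it.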
Click timing: It turns out that humans click within a short, low-variance time of seeing an impression (if they click at all). Robots are often programmed to follow a different delay pattern, which we can detect.
Figure: Domains 7 and 12 display suspicious, non-human click behaviour.
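A timing check along these lines can be sketched as below. The human delay range and the regularity floor are invented for illustration; the real baseline would come from the analysis of billions of observed clicks.

```python
import statistics

# Hypothetical sketch: flag click delays that look machine-generated.
# The numeric ranges below are assumptions, not Bearskin's parameters.

HUMAN_MEAN_RANGE = (0.5, 8.0)   # seconds from impression to click (assumed)
MIN_HUMAN_STDEV = 0.2           # humans are never perfectly regular (assumed)

def delay_looks_robotic(delays_seconds):
    if len(delays_seconds) < 30:          # too little data to judge
        return False
    mean = statistics.mean(delays_seconds)
    stdev = statistics.pstdev(delays_seconds)
    out_of_range = not (HUMAN_MEAN_RANGE[0] <= mean <= HUMAN_MEAN_RANGE[1])
    too_regular = stdev < MIN_HUMAN_STDEV
    return out_of_range or too_regular

# A scripted bot clicking exactly 10.0 s after every impression:
print(delay_looks_robotic([10.0] * 50))  # True
```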
Click location: Humans click according to the creative content of a banner and so plotting clicks will reveal hotspots of activity. Robots struggle to assess the creative content in a banner. They are stuck following a predictable click pattern, which we can detect.
Figure: Humans will click according to ad creatives. Robots do not know where to click; their clicks are dispersed strangely across the banner.
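One simple way to quantify this kind of dispersion is to grid the banner and ask how concentrated the clicks are. The grid size, banner dimensions and threshold below are assumptions for illustration, not Bearskin parameters.

```python
from collections import Counter

# Illustrative sketch: humans cluster clicks on the creative's hotspots
# (e.g. a call-to-action button); evenly scattered clicks are a red flag.

def hotspot_share(clicks, banner_w=300, banner_h=250, grid=5):
    """Fraction of clicks landing in the single most popular grid cell."""
    cells = Counter(
        (int(x * grid / banner_w), int(y * grid / banner_h))
        for x, y in clicks
    )
    return max(cells.values()) / len(clicks)

def clicks_look_robotic(clicks, min_hotspot_share=0.2):
    # With a 5x5 grid, perfectly uniform clicks give ~0.04 per cell;
    # human clicks concentrate far more strongly on the creative.
    return len(clicks) >= 30 and hotspot_share(clicks) < min_hotspot_share

human_clicks = [(150, 200)] * 40 + [(10, 10)] * 5   # clustered on a button
print(clicks_look_robotic(human_clicks))            # False
```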
Post-click activity: Robots are totally uninterested in buying a new bike, insurance for their pet or a cheap last-minute holiday. In fact, they often do not even arrive at the landing page. That is another telltale sign of a robot.
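A rough sketch of such a post-click check, with hypothetical field names and an assumed 50% landing-rate cut-off:

```python
# Illustrative sketch: a click that never results in a landing-page visit,
# or a "visit" with essentially zero dwell time, points towards a robot.

def post_click_looks_robotic(click_records, min_landing_rate=0.5):
    """click_records: list of dicts with 'landed' (bool) and
    'dwell_seconds' (float, 0 if the landing page was never reached)."""
    if len(click_records) < 10:   # too few clicks to judge
        return False
    genuine = sum(
        1 for r in click_records if r["landed"] and r["dwell_seconds"] > 1.0
    )
    return genuine / len(click_records) < min_landing_rate

bot = [{"landed": False, "dwell_seconds": 0.0}] * 20
print(post_click_looks_robotic(bot))  # True
```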
Seems easy, but …
There are three main difficulties that we face:
- First, we have no verified training data: no cookies are labelled as robots with absolute certainty. This means our objective is poorly defined: we are simply looking for cookies that are somehow “different”.
- Second, we want to minimise false positives, i.e. blocking a domain that is not actually a hotbed of robot activity. In particular, new but genuine cookies with very short histories can be tough to tell apart from robots. Genuine publishers, understandably, do not appreciate being blacklisted and denied access to the market.
- Third, robots are constantly developing and becoming more advanced. Principally this development is in the probability distributions robots draw from when deciding how long to wait before clicking and where to click. We can defend against this to an extent by not revealing the precise details of what we look for when classifying robots!
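The first two difficulties together suggest a conservative, unsupervised approach: score each cookie against the population and only flag extreme, well-evidenced outliers. Below is a minimal sketch using a robust (median-based) z-score; the high threshold and the minimum-history requirement are deliberately cautious assumptions, chosen here to illustrate how false positives on new genuine cookies can be kept rare.

```python
import statistics

# Sketch of an unsupervised outlier test: with no verified robot labels,
# we can only flag cookies whose behaviour metric sits far from the
# population's, and only once they have enough history to judge.

def robust_outliers(scores, min_history, histories, threshold=5.0):
    """Return indices of cookies whose score is a robust outlier.
    scores[i] is some behaviour metric for cookie i; histories[i] is
    how many events we have observed for it."""
    med = statistics.median(scores)
    mad = statistics.median(abs(s - med) for s in scores) or 1e-9
    flagged = []
    for i, s in enumerate(scores):
        if histories[i] < min_history:
            continue  # too new to judge: err on the side of not blocking
        if abs(s - med) / (1.4826 * mad) > threshold:
            flagged.append(i)
    return flagged

scores = [1.0, 1.1, 0.9, 1.05, 0.95, 9.0]
print(robust_outliers(scores, 10, [100] * 6))  # [5]
```

The median and MAD are used instead of mean and standard deviation so that the outliers being hunted cannot themselves distort the baseline.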
Fraud detection is complex and the summary above is necessarily at a very high level. There are other types of fraud not mentioned here and we are constantly aiming to be at the cutting edge of fraud detection and elimination. Follow our blog for more related posts coming very soon.
Please feel free to contact us for more information or answers to specific questions!