Abstract
Online advertisements are bought through a mechanism called real-time bidding (RTB). In RTB, ads are auctioned in real time on every webpage load. Ad auctions can be of two types: second-price or first-price. In a second-price auction, the bidder with the highest bid wins but pays only the second-highest bid. This paper focuses on first-price auctions, where the winner pays the amount that they bid. This research evaluates how multi-armed bandit strategies optimize the bid size in a commercial demand-side platform (DSP) that buys inventory through ad exchanges. First, we analyze seven multi-armed bandit algorithms on two offline datasets gathered from real second-price auctions. Then, we test and compare the performance of three algorithms in a production environment. Our results show that real data from second-price auctions can be used successfully to model first-price auctions. Moreover, we find that the trained multi-armed bandit algorithms reduce bidding costs considerably compared to the naïve baseline (by 29% on average) and optimize the whole budget while only slightly reducing the win rate (by 7.7% on average). Our findings, tested in a real scenario, show a clear and substantial economic benefit for ad buyers using DSPs.
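To make the idea concrete, the bandit-based bid shading the abstract describes can be sketched roughly as follows. This is a generic ε-greedy illustration, not the paper's actual algorithms or data: the shading factors, the uniform competitor-bid distribution, and the normalized impression value are all illustrative assumptions. Each arm is a shading factor applied to the impression's estimated value, and the reward is the buyer's surplus (value minus price paid) in a simulated first-price auction.

```python
import random

random.seed(0)

# Hypothetical setup (illustrative only): each arm is a bid-shading factor
# multiplied by the impression's estimated value.
ARMS = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
counts = [0] * len(ARMS)      # pulls per arm
totals = [0.0] * len(ARMS)    # cumulative reward per arm
EPSILON = 0.1                 # exploration probability
VALUE = 1.0                   # estimated impression value (normalized)

def choose_arm():
    # Explore with probability EPSILON (and until every arm is tried once),
    # otherwise exploit the arm with the highest mean reward so far.
    if random.random() < EPSILON or 0 in counts:
        return random.randrange(len(ARMS))
    means = [totals[i] / counts[i] for i in range(len(ARMS))]
    return max(range(len(ARMS)), key=means.__getitem__)

def auction_reward(bid):
    # Simulated first-price auction: we win iff our bid beats the highest
    # competing bid; the reward is our surplus (value - price paid), 0 on a loss.
    competitor = random.uniform(0.3, 0.9)
    return VALUE - bid if bid > competitor else 0.0

for _ in range(5000):
    arm = choose_arm()
    reward = auction_reward(ARMS[arm] * VALUE)
    counts[arm] += 1
    totals[arm] += reward

best = max(range(len(ARMS)), key=lambda i: totals[i] / counts[i])
print("learned shading factor:", ARMS[best])
```

Note the trade-off the abstract reports: shading below 1.0 lowers the win rate but leaves positive surplus on each win, whereas bidding the full value (factor 1.0) wins often but yields zero surplus, so the bandit converges to an intermediate shading factor.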
| Original language | English |
|---|---|
| Pages (from-to) | 6111-6125 |
| Number of pages | 15 |
| Journal | Journal of Intelligent and Fuzzy Systems |
| Volume | 41 |
| Issue number | 6 |
| DOIs | |
| Publication status | Published - 26 Aug 2021 |
| MoE publication type | A1 Journal article-refereed |
Keywords
- Bid shading
- bid optimization
- multi-armed bandits
- reinforcement learning