Abstract
Online advertisements are bought through a mechanism called real-time bidding (RTB). In RTB, ads are auctioned in real time on every webpage load. Ad auctions can be of two types: second-price or first-price. In second-price auctions, the bidder with the highest bid wins the auction but pays only the second-highest bid. This paper focuses on first-price auctions, where the buyer pays the amount that they bid. This research evaluates how multi-armed bandit strategies optimize the bid size in a commercial demand-side platform (DSP) that buys inventory through ad exchanges. First, we analyze seven multi-armed bandit algorithms on two offline datasets gathered from real second-price auctions. Then, we test and compare the performance of three algorithms in a production environment. Our results show that real data from second-price auctions can be used successfully to model first-price auctions. Moreover, we found that the trained multi-armed bandit algorithms reduce bidding costs considerably compared to the baseline (naïve approach), by 29% on average, and optimize the whole budget by slightly reducing the win rate (by 7.7% on average). Our findings, tested in a real scenario, show a clear and substantial economic benefit for ad buyers using DSPs.
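To make the two pricing rules and the bandit idea concrete, the following is a minimal Python sketch, not the method evaluated in the paper. It assumes an epsilon-greedy bandit (the abstract does not name the seven algorithms), a hypothetical set of shading factors as arms, a made-up uniform distribution of competing bids, and a reward equal to the surplus `base_bid - price_paid` on a win. All names (`auction_cost`, `EpsilonGreedyShader`, `simulate`) are illustrative.

```python
import random


def auction_cost(bid, highest_competing_bid, first_price=True):
    """Return the price paid by the bidder, or None if the auction is lost."""
    if bid <= highest_competing_bid:
        return None
    # First-price: pay your own bid. Second-price: pay the runner-up bid.
    return bid if first_price else highest_competing_bid


class EpsilonGreedyShader:
    """Epsilon-greedy bandit whose arms are hypothetical bid-shading factors."""

    def __init__(self, factors, epsilon=0.1, seed=0):
        self.factors = list(factors)
        self.epsilon = epsilon
        self.counts = [0] * len(self.factors)
        self.values = [0.0] * len(self.factors)  # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        # Explore a random arm with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.factors))
        return max(range(len(self.factors)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        # Incremental update of the mean reward observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(rounds=5000, base_bid=1.0, seed=0):
    """Run first-price auctions against synthetic (assumed) competing bids."""
    rng = random.Random(seed)
    bandit = EpsilonGreedyShader([0.5, 0.6, 0.7, 0.8, 0.9, 1.0], seed=seed)
    wins, spend = 0, 0.0
    for _ in range(rounds):
        arm = bandit.select()
        bid = base_bid * bandit.factors[arm]
        competing = rng.uniform(0.2, 0.8)  # assumption: synthetic market
        cost = auction_cost(bid, competing, first_price=True)
        # Assumed reward: surplus kept relative to the unshaded bid, 0 on a loss.
        reward = (base_bid - cost) if cost is not None else 0.0
        bandit.update(arm, reward)
        if cost is not None:
            wins += 1
            spend += cost
    return bandit, wins, spend


if __name__ == "__main__":
    bandit, wins, spend = simulate()
    print("wins:", wins, "spend:", round(spend, 2))
    print("mean reward per factor:", dict(zip(bandit.factors, bandit.values)))
```

In this toy setting, an unshaded bid (factor 1.0) always wins but keeps no surplus, while lower factors trade some win rate for a lower price paid, which mirrors the cost versus win-rate trade-off reported above.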
Original language | English |
---|---|
Pages (from-to) | 6111-6125 |
Number of pages | 15 |
Journal | Journal of Intelligent and Fuzzy Systems |
Volume | 41 |
Issue number | 6 |
Publication status | Published - 26 Aug 2021 |
MoE publication type | A1 Journal article-refereed |
Keywords
- bid shading
- bid optimization
- multi-armed bandits
- reinforcement learning