Meta tests LLM retrieval for stabler ad recommendations
Meta Platforms researchers have introduced an LLM-based retrieval framework designed to make ad recommendations more stable and predictable. The system uses fine-tuned large language models to extract hierarchical semantic attributes from ad creatives, which then feed a graph-based expansion step for candidate generation. In online A/B experiments, the framework improved both predictability (an 8.62% reduction in A/A' difference) and a traditional performance metric (a 0.45% lift in the topline online metric).
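To make the idea of graph-based expansion over hierarchical attributes more concrete, here is a minimal Python sketch. It is not the paper's implementation: the attribute paths, the ad IDs, the graph layout, and the `build_graph` and `expand` helpers are all hypothetical, and the attributes are hard-coded rather than produced by an LLM. It only illustrates how a deterministic traversal over shared attribute nodes could surface related ads as candidates.

```python
from collections import defaultdict, deque

# Hypothetical attribute paths (coarse to fine), standing in for LLM output.
AD_ATTRIBUTES = {
    "ad_1": ("apparel", "footwear", "running shoes"),
    "ad_2": ("apparel", "footwear", "trail shoes"),
    "ad_3": ("apparel", "outerwear", "rain jackets"),
    "ad_4": ("electronics", "audio", "headphones"),
}


def build_graph(ad_attributes):
    """Link each ad to its leaf attribute and each attribute to its parent."""
    graph = defaultdict(set)
    for ad, path in ad_attributes.items():
        # One node per prefix of the path, e.g. ("apparel",), ("apparel", "footwear"), ...
        nodes = [("attr", path[: i + 1]) for i in range(len(path))]
        for parent, child in zip(nodes[:-1], nodes[1:]):
            graph[parent].add(child)
            graph[child].add(parent)
        leaf = nodes[-1]
        graph[("ad", ad)].add(leaf)
        graph[leaf].add(("ad", ad))
    return graph


def expand(graph, seed_ad, max_hops):
    """Breadth-first expansion from a seed ad.

    Returns a dict mapping every other ad reachable within max_hops
    to its hop distance from the seed.
    """
    start = ("ad", seed_ad)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop budget
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return {n[1]: d for n, d in dist.items() if n[0] == "ad" and n != start}


graph = build_graph(AD_ATTRIBUTES)
near = expand(graph, "ad_1", max_hops=4)   # ads sharing the "footwear" level
wide = expand(graph, "ad_1", max_hops=6)   # also ads sharing only "apparel"
print(near, wide)
```

In this toy setup, the hop budget controls how far up the hierarchy the expansion climbs, and the same seed always yields the same candidates, which is the kind of repeatability the article describes as the goal.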
Key Takeaways
- The system uses Llama3-8B Instruct in an ads retrieval pipeline to extract hierarchical semantic attributes from ad creatives.
- Online A/B tests showed an 8.62% reduction in topline A/A' difference versus control.
- The test arm posted a 0.45% lift in the topline online metric and a 1.2% increase in final-stage recall.
- Meta reports a 45% improvement in the median absolute deviation of daily impression differences between primary and shadow ad pairs.
- Recall alignment ratio dropped from 0.51X at Top-5 to 0.07X at Top-200, while incremental recall potential rose to 1.89Y at Top-200.
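For readers unfamiliar with the stability metric in the takeaways, the sketch below computes a median absolute deviation (MAD) over daily impression differences between primary and shadow ad pairs. It assumes the standard definition of MAD (the median of absolute deviations from the median); the paper's exact formulation may differ, and the numbers are made up for illustration rather than taken from Meta's experiments.

```python
from statistics import median


def median_absolute_deviation(values):
    """Standard MAD: median of |x - median(x)|."""
    center = median(values)
    return median([abs(v - center) for v in values])


# Hypothetical daily impression differences (primary minus shadow) for
# one ad pair over five days, under a control arm and a test arm.
control_diffs = [120, -80, 40, -200, 10]
test_diffs = [30, -20, 10, -50, 0]

control_mad = median_absolute_deviation(control_diffs)
test_mad = median_absolute_deviation(test_diffs)

# Relative reduction in MAD; a larger value means more stable delivery.
relative_reduction = (control_mad - test_mad) / control_mad
print(control_mad, test_mad, round(relative_reduction, 3))
```

A lower MAD means the primary and shadow copies of an ad receive more similar daily impressions, which is how the reported improvement should be read.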
Why It Matters
For Meta’s ad stack, the immediate payoff is less variance when creatives change slightly: the paper reports that primary and shadow ad pairs showed an 8.62% lower A/A' difference, along with a 45% improvement in the median absolute deviation (MAD) of daily impression differences. That matters because the system aims to improve repeatability and explainability in candidate generation, not just recall. The broader signal is that LLMs are moving from ranking aids into retrieval infrastructure, using semantic attributes and graph traversal to shape delivery. The next concrete marker is whether those gains, especially the 0.45% topline lift and 1.2% recall increase, hold beyond the reported online A/B test.
Read full article at arxiv.org