Boosted Objects and Jet Substructure at the LHC (BOOST 2012 Report)

11/12/2013

219 citations (115 excluding self-citations). The third in a series of annual BOOST workshop reports that established jet substructure as a mature subfield of collider physics.

The Problem

When heavy particles (top quarks, W/Z/Higgs bosons) are produced with enough momentum, their hadronic decay products merge into a single large-radius jet. Extracting the identity of the parent particle from the internal structure of that jet, a program known as jet substructure, became one of the most active areas of LHC physics in the early 2010s. By 2012, dozens of substructure techniques had been proposed (pruning, trimming, filtering, N-subjettiness, shower deconstruction), and ATLAS and CMS were beginning to use them in real analyses. The field needed a systematic comparison of tools, a common set of benchmarks, and a clear assessment of what worked and what didn’t.
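As a rough illustration of when this merging happens, a common rule of thumb for a two-body decay is that the decay products are separated by an angular distance of about ΔR ≈ 2m/pT, where m and pT are the mass and transverse momentum of the parent. The sketch below applies that estimate. The function names, the approximate masses, and the default jet radius of 1.0 are illustrative assumptions, not values taken from the report, and the estimate applies only loosely to the three-body hadronic top decay.

```python
# Rule-of-thumb estimate (an approximation, not an exact kinematic result):
# decay products of a boosted parent are separated by roughly dR ~ 2 m / pT.

# Approximate masses in GeV, used here only for illustration.
APPROX_MASS_GEV = {"W": 80.4, "Z": 91.2, "H": 125.0, "top": 173.0}


def decay_opening_angle(mass_gev, pt_gev):
    """Approximate angular separation dR of the decay products."""
    if pt_gev <= 0:
        raise ValueError("pt_gev must be positive")
    return 2.0 * mass_gev / pt_gev


def merges_into_single_jet(mass_gev, pt_gev, jet_radius=1.0):
    """True if the estimated separation fits inside one jet of the given radius.

    jet_radius=1.0 is a hypothetical choice of large-radius jet size.
    """
    return decay_opening_angle(mass_gev, pt_gev) < jet_radius
```

Under these assumptions, a W boson at pT = 200 GeV has ΔR ≈ 0.8 and fits inside a jet of radius 1.0, while a top quark at the same pT does not.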

The Approach

The BOOST workshop series brought together theorists and experimentalists annually to benchmark jet substructure tools against each other and against data. The 2012 report covers perturbative QCD calculations relevant to substructure, Monte Carlo generator validation, pileup mitigation strategies, and lessons from the first wave of ATLAS and CMS analyses using substructure. The report established best practices that both collaborations adopted for their boosted-object analyses through Run 1 and into Run 2.

Impact

The BOOST reports (this paper is the third, following the 2010 and 2011 reports with 374 and 331 citations respectively) collectively defined the standards for jet substructure at the LHC. Two of the most widely used tools in modern collider physics grew directly from challenges highlighted in the BOOST program: Soft Drop grooming (Larkoski, Marzani, Soyez, Thaler; 1,245 citations) and the PUPPI pileup mitigation algorithm (775 citations).

The BOOST benchmarks also became the proving ground for the machine learning revolution in collider physics. Six of the paper’s top 15 citers are ML-for-jets papers: jet images (247 citations), deep learning jet images (382 citations), ParticleNet (648 citations), Energy Flow Networks (369 citations), DeepTop (257 citations), and the Machine Learning Landscape of Top Taggers benchmark comparison (272 citations). The boosted-object tagging challenges defined in the BOOST workshops created the datasets and performance targets that the ML community adopted. The Reviews of Modern Physics survey of jet substructure (377 citations) and the comprehensive Phys. Reports review (637 citations) both trace the field’s development through the BOOST series.

Recollections

I helped create the BOOST conference series three years before this report, as a way of unifying the techniques around what had been a scattered set of observations: particles once considered heavy, now produced with enough momentum that their decay products become collimated and appear as single jets. Traditional experimental and theoretical analyses often required jet isolation, which meant these events were excluded entirely. The early BOOST conferences were instrumental in establishing jet substructure as a legitimate and valuable technique at both the Tevatron and the LHC rather than an exotic curiosity. I served on the advisory committee and helped with organization.

My main intellectual contribution to the substructure program was a paper called “Jet Dipolarity: Top Tagging with Color Flow,” written with my Stanford graduate students Anson Hook and Martin Jankowiak. The idea was to use jet shapes beyond simple clustering to identify whether a jet’s internal structure came from a color-singlet object, like a W boson, where color flow concentrates energy between the two subjets, or from a colored object, where the color flow radiates out of the jet. Dipolarity provided a handle on the color structure of the decay, not just the kinematics, which was a different kind of information from what most substructure tools were using at the time.
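One way to make the idea concrete is the dipolarity observable as it is usually written, D = (1/R12^2) Σ_i (pT_i / pT_J) R_i^2, where R12 is the separation of the two subjets in the (η, φ) plane and R_i is the distance from constituent i to the line segment joining them. Radiation concentrated between the subjets, as expected for a color singlet, gives a small D. The sketch below is a minimal illustration under stated assumptions and is not the authors' code: the function names and input layout are hypothetical, and the jet pT is approximated by the scalar sum of the constituent pT values.

```python
import math


def _wrap_phi(dphi):
    """Map an azimuthal difference into [-pi, pi)."""
    return (dphi + math.pi) % (2.0 * math.pi) - math.pi


def dipolarity(constituents, subjet1, subjet2):
    """Illustrative dipolarity of a jet with two identified subjets.

    constituents: iterable of (pt, eta, phi) tuples (hypothetical layout).
    subjet1, subjet2: (eta, phi) axes of the two subjets.
    """
    # Use coordinates relative to subjet 1 so phi wrapping is handled once.
    bx = subjet2[0] - subjet1[0]
    by = _wrap_phi(subjet2[1] - subjet1[1])
    r12_sq = bx * bx + by * by
    if r12_sq == 0.0:
        raise ValueError("subjet axes must be distinct")

    constituents = list(constituents)
    # Assumption: jet pT approximated by the scalar sum of constituent pT.
    pt_jet = sum(pt for pt, _, _ in constituents)
    if pt_jet <= 0.0:
        raise ValueError("total constituent pt must be positive")

    weighted = 0.0
    for pt, eta, phi in constituents:
        px = eta - subjet1[0]
        py = _wrap_phi(phi - subjet1[1])
        # Project onto the segment and clamp to its endpoints.
        t = (px * bx + py * by) / r12_sq
        t = min(1.0, max(0.0, t))
        dx = px - t * bx
        dy = py - t * by
        weighted += pt * (dx * dx + dy * dy)

    return weighted / (pt_jet * r12_sq)
```

With this definition, constituents lying exactly on the segment between the two subjet axes contribute nothing, and radiation away from that segment raises the value.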