Machine Learning The LHC ABC’s

Article Title: ABCDisCo: Automating the ABCD Method with Machine Learning

Authors: Gregor Kasieczka, Benjamin Nachman, Matthew D. Schwartz, David Shih

Reference: arXiv:2007.14400

When LHC experiments look for the signatures of new particles in their data, they always apply a series of selection criteria to the recorded collisions. The selections pick out events that look similar to the sought-after signal. Often they then compare the observed number of events passing these criteria to the number they would expect from ‘background’ processes. If they see many more events in real data than the predicted background, that is evidence of the sought-after signal. Crucial to the whole endeavor is being able to accurately estimate the number of events background processes would produce. Underestimate it and you may incorrectly claim evidence of a signal; overestimate it and you may miss the chance to find a highly sought-after signal.

However, it is not always so easy to estimate the expected number of background events. While LHC experiments do have high-quality simulations of the Standard Model processes that produce these backgrounds, they aren’t perfect. In particular, processes involving the strong force (aka Quantum Chromodynamics, QCD) are very difficult to simulate, and refining these simulations is an active area of research. Because of these deficiencies, we don’t always trust background estimates based solely on these simulations, especially when applying very specific selection criteria.

Therefore experiments often employ ‘data-driven’ methods, where they estimate the number of background events by using control regions in the data. One of the most widely used techniques is called the ABCD method.

An illustration of the ABCD method. The signal region, A, is defined as the region in which f and g are greater than some value. The amount of background in region A is estimated using regions B, C, and D, which are dominated by background.

The ABCD method can be applied if the selection of signal-like events involves two independent variables, f and g. If one defines the ‘signal region’, A (the part of the data in which we are looking for a signal), as having f and g each greater than some amount, then one can use the neighboring regions B, C, and D to estimate the amount of background in region A. If the number of signal events outside region A is small, the number of background events in region A can be estimated as N_A = N_B * (N_C/N_D).
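To get a feel for why this works, here is a quick toy check (my own illustration, not from the paper): if the two background variables really are statistically independent, then N_B * (N_C/N_D) reproduces the background count in region A. The exponential distributions and cut values below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy background sample in which f and g are generated independently of each other.
f = rng.exponential(scale=1.0, size=1_000_000)
g = rng.exponential(scale=1.0, size=1_000_000)
f_cut, g_cut = 2.0, 2.0   # arbitrary cut values defining the regions

# A = pass both cuts (signal region); B, C, D = the three background-dominated control regions.
N_A = np.sum((f > f_cut) & (g > g_cut))
N_B = np.sum((f > f_cut) & (g <= g_cut))
N_C = np.sum((f <= f_cut) & (g > g_cut))
N_D = np.sum((f <= f_cut) & (g <= g_cut))

print("true background count in A:  ", N_A)
print("ABCD estimate N_B * N_C / N_D:", N_B * N_C / N_D)
# The two numbers agree (up to statistical fluctuations) precisely because f and g are independent.
```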

In modern analyses, often one of these selection requirements involves the score of a neural network trained to identify the sought-after signal. Because neural networks are powerful learners, one has to be careful that they don’t accidentally learn about the other variable that will be used in the ABCD method, such as the mass of the signal particle. If the two variables become correlated, a background estimate with the ABCD method will not be possible. This often means augmenting the neural network, either during training or after the fact, so that it is intentionally ‘de-correlated’ with respect to the other variable. While there are several known techniques to do this, it is still a tricky process, and a good background estimate often comes with the trade-off of reduced classification performance.

In this latest work the authors devise a way to have the neural networks help with the background estimate rather than hinder it. The idea is that rather than training a single network to classify signal-like events, they simultaneously train two networks, both trying to identify the signal. During this training they use a groovy technique called ‘DisCo’ (short for Distance Correlation) to ensure that the outputs of the two networks are independent of each other. This forces the networks to learn to use independent information to identify the signal, which then allows them to be used in an ABCD background estimate quite easily.
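To make the idea a bit more concrete, here is a rough sketch of what such a training objective could look like (my own guess at a minimal version, not the authors’ code): two classifier scores share a batch, each gets an ordinary classification loss, and a distance-correlation penalty on background events pushes the two scores toward independence. The helper names and the weight `lam` are made up for this example, and the scores are assumed to already be sigmoid outputs between 0 and 1.

```python
import torch

def distance_corr_sq(a, b, eps=1e-9):
    # Sample (squared) distance correlation between two 1-D tensors of scores.
    A = torch.abs(a.unsqueeze(0) - a.unsqueeze(1))   # pairwise distance matrix for a
    B = torch.abs(b.unsqueeze(0) - b.unsqueeze(1))   # pairwise distance matrix for b
    # Double-center each matrix, average the products (distance covariance), then normalize.
    A = A - A.mean(dim=0, keepdim=True) - A.mean(dim=1, keepdim=True) + A.mean()
    B = B - B.mean(dim=0, keepdim=True) - B.mean(dim=1, keepdim=True) + B.mean()
    dcov2_ab, dcov2_aa, dcov2_bb = (A * B).mean(), (A * A).mean(), (B * B).mean()
    return dcov2_ab / (torch.sqrt(dcov2_aa * dcov2_bb) + eps)

def double_disco_loss(scores_f, scores_g, labels, lam=100.0):
    # labels: float tensor of 1s (signal) and 0s (background).
    # Both classifiers are trained to separate signal from background, while the DisCo
    # term pushes their outputs toward statistical independence on background events.
    bce = torch.nn.functional.binary_cross_entropy
    cls_loss = bce(scores_f, labels) + bce(scores_g, labels)
    bkg = labels < 0.5
    return cls_loss + lam * distance_corr_sq(scores_f[bkg], scores_g[bkg])
```

The size of the DisCo weight then sets the balance between raw classification power and how independent, and therefore ABCD-friendly, the two scores end up being.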

The authors try out this new technique, dubbed ‘Double DisCo’, on several examples. They demonstrate that they are able to obtain quality background estimates using the ABCD method while achieving great classification performance. They show that this method improves upon the previous state-of-the-art technique of decorrelating a single network from a fixed variable like mass and using cuts on the mass and classifier score to define the ABCD regions (called ‘Single DisCo’ here).

Using the task of identifying jets containing boosted top quarks, they compare the classification performance (x-axis) and the quality of the ABCD background estimate (y-axis) achievable with the new Double DisCo technique (yellow points) and the previously state-of-the-art Single DisCo (blue points). One can see that the Double DisCo method is able to achieve higher background rejection with a similar or better amount of ABCD closure.

While there have been many papers over the last few years about applying neural networks to classification tasks in high energy physics, not many have thought about how to use them to improve background estimates as well. Because of their importance, background estimates are often the most time-consuming part of a search for new physics. So this technique is both interesting and immediately practical for searches done with LHC data. Hopefully it will be put to use in the near future!

Further Reading:

Quanta Magazine article: “How Artificial Intelligence Can Supercharge the Search for New Particles”

Recent ATLAS summary on new machine learning techniques: “Machine learning qualitatively changes the search for new particles”

CERN tutorial: “Background Estimation with the ABCD Method”

Summary of previous decorrelation techniques used in ATLAS: “Performance of mass-decorrelated jet substructure observables for hadronic two-body decay tagging in ATLAS”

A shortcut to truth

Article title: “Automated detector simulation and reconstruction parametrization using machine learning”

Authors: D. Benjamin, S.V. Chekanov, W. Hopkins, Y. Li, J.R. Love

Reference: https://arxiv.org/abs/2002.11516 (https://iopscience.iop.org/article/10.1088/1748-0221/15/05/P05025)

Demonstration of a probability density function as the output of a neural network. (Source: paper)

The simulation of particle collisions at the LHC is a pharaonic task. The messy chromodynamics of protons must be modeled; the statistics of the collision products must reflect the Standard Model; each particle has to travel through the detectors and interact with all the elements in its path. Its presence will eventually be reduced to electronic measurements, which, after all, are all we know about it.

The work of the simulation ends somewhere around here, and that of the reconstruction starts: namely, going from electronic signals back to particles. Reconstruction is a process common to simulation and to the real world. Starting from the tangle of statistical and detector effects that the actual measurements include, the goal is to divine the properties of the initial collision products.

Now, researchers at Argonne National Laboratory have looked into going from the simulated particles as produced in the collisions (aka “truth objects”) directly to the reconstructed ones (aka “reco objects”): bypassing the steps of the detailed interaction with the detectors and of the reconstruction algorithm could make studies that use simulations much speedier and more efficient.

Display of a collision event involving hadronic jets at ATLAS. Each colored block corresponds to interaction with a detector element. (Source: ATLAS experiment)

The team used a neural network, which they trained on a set of fully simulated and reconstructed events. The goal was to have the network learn to produce the properties of the reco objects when given only the truth objects. The process succeeded in reproducing the transverse momenta of hadronic jets, and looks suitable for any kind of particle and for other kinematic quantities.

More specifically, the researchers began with two million simulated jet events, fully passed through the detector simulation and the reconstruction algorithm. For each of them, the network took the kinematic properties of the truth jet as input and was trained to reproduce the reconstructed transverse momentum.

The network was taught to perform multi-categorization: its output didn’t consist of a single node giving the momentum value, but of 400 nodes, each corresponding to a different range of values. The output of each node was the probability for that particular range. In other words, the result was a probability density function for the reconstructed momentum of a given jet.

The final step was to select the momentum randomly from this distribution. For half a million test jets, all this resulted in good agreement with the actual reconstructed momenta, specifically within 5% for values above 20 GeV. In addition, it seems that the training was sensitive to the effects of quantities other than the target one (e.g. the effects of the position in the detector), as the neural network was able to pick up on the dependencies between the input variables. Also, hadronic jets are complicated animals, so it is expected that the method will work just as well on other objects.
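For readers who like to see the moving parts, here is a minimal sketch of such a setup (my own reconstruction of the ingredients described above, not the authors’ code; the bin edges, input features and architecture are purely illustrative): a network maps truth-jet kinematics to probabilities over 400 reco-pT bins, is trained with a cross-entropy loss on the bin that the true reconstructed value falls into, and is then sampled to produce a “reco” momentum.

```python
import torch
import torch.nn as nn

N_BINS = 400                                            # 400 output nodes, as in the paper
pt_edges = torch.linspace(20.0, 2000.0, N_BINS + 1)     # illustrative bin edges in GeV

# Small network from truth-jet kinematics (here 4 made-up inputs) to bin probabilities.
model = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, N_BINS),                             # logits over reco-pT bins
)

def training_loss(truth_features, reco_pt):
    # Multi-categorization: the target is simply the bin that the true reco pT falls into.
    logits = model(truth_features)
    target_bin = (torch.bucketize(reco_pt, pt_edges) - 1).clamp(0, N_BINS - 1)
    return nn.functional.cross_entropy(logits, target_bin)

def sample_reco_pt(truth_features):
    # The softmax output is the per-jet probability density; draw a bin, then a value inside it.
    probs = torch.softmax(model(truth_features), dim=-1)
    bins = torch.multinomial(probs, num_samples=1).squeeze(-1)
    lo, hi = pt_edges[bins], pt_edges[bins + 1]
    return lo + (hi - lo) * torch.rand_like(lo)
```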

Comparison of the reconstructed transverse momentum between the full simulation and reconstruction (“Delphes”) and the neural net output. (Source: paper)

All in all, this work showed the potential for neural networks to successfully imitate the effects of the detector and the reconstruction. Simulations in large experiments typically take up loads of time and resources due to their size, their intricacy and the frequent need for updates as hardware conditions change. Such a shortcut, needing only a small number of fully processed events, would speed up studies such as the optimization of the reconstruction and of detector upgrades.

More reading:

Argonne Lab press release: https://www.anl.gov/article/learning-more-about-particle-collisions-with-machine-learning

Intro to neural networks: https://physicsworld.com/a/neural-networks-explained/

Jets aren’t just a game of tag anymore

Article: Probing Quarkonium Production Mechanisms with Jet Substructure
Authors: Matthew Baumgart, Adam Leibovich, Thomas Mehen, and Ira Rothstein
Reference: arXiv:1406.2295 [hep-ph]

“Tag…you’re it!” is a popular game to play with jets these days at particle accelerators like the LHC. These collimated sprays of radiation are common in various types of high-energy collisions and can present a nasty challenge to both theorists and experimentalists (for more on the basic ideas and importance of jet physics, see my July bite on the subject). The process of tagging a jet generally means identifying the type of particle that initiated the jet. Since jets provide a significant contribution to backgrounds at high energy colliders, identifying where they come from can make doing things like discovering new particles much easier. While identifying backgrounds to new physics is important, in this bite I want to focus on how theorists are now using jets to study the production of hadrons in a unique way.

Over the years, a host of theoretical tools have been developed for making the study of jets tractable. The key steps of “reconstructing” jets are (a toy code sketch follows the list):

  1. Choose a jet algorithm (i.e. basically pick a metric that decides which particles it thinks are “clustered”),
  2. Identify potential jet axes (i.e. the centers of the jets),
  3. Decide which particles are in/out of the jets based on your jet algorithm.
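To make these three steps concrete, here is a deliberately simplified toy version of one popular jet algorithm, anti-kt (a sketch for illustration only; real analyses use optimized tools such as FastJet). In sequential-recombination algorithms like this one, steps 2 and 3 are handled implicitly: the same distance measure decides both what gets merged and when a cluster is promoted to a final jet.

```python
import numpy as np

def four_vec(pt, eta, phi):
    # Massless four-momentum (E, px, py, pz) from (pt, eta, phi).
    return np.array([pt * np.cosh(eta), pt * np.cos(phi), pt * np.sin(phi), pt * np.sinh(eta)])

def pt_eta_phi(p):
    E, px, py, pz = p
    pt = np.hypot(px, py)
    return pt, np.arcsinh(pz / pt), np.arctan2(py, px)

def antikt_jets(particles, R=0.4):
    """Toy anti-kt clustering; `particles` is a list of (pt, eta, phi) tuples."""
    pseudo = [four_vec(*p) for p in particles]
    jets = []
    while pseudo:
        kin = [pt_eta_phi(p) for p in pseudo]
        # Step 1: the metric. Beam distances d_iB = 1/pt_i^2 ...
        dists = [(1.0 / kin[i][0] ** 2, i, None) for i in range(len(kin))]
        # ... and pairwise distances d_ij = min(1/pt_i^2, 1/pt_j^2) * dR_ij^2 / R^2.
        for i in range(len(kin)):
            for j in range(i + 1, len(kin)):
                dphi = abs(kin[i][2] - kin[j][2])
                dphi = min(dphi, 2 * np.pi - dphi)
                dR2 = (kin[i][1] - kin[j][1]) ** 2 + dphi ** 2
                dists.append((min(1.0 / kin[i][0] ** 2, 1.0 / kin[j][0] ** 2) * dR2 / R ** 2, i, j))
        _, i, j = min(dists, key=lambda d: d[0])
        if j is None:
            # Steps 2/3: nothing is closer than the beam, so this cluster becomes a final jet; its
            # axis is the direction of the summed four-momentum and its constituents are "in".
            jets.append(pseudo.pop(i))
        else:
            # Otherwise merge the two closest pseudojets and keep going.
            pseudo[i] = pseudo[i] + pseudo[j]
            pseudo.pop(j)
    return jets
```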


Figure 1: A basic 3-jet event where one of the reconstructed jets is found to have been initiated by a b quark. The process of finding such jets is called “tagging.”

Deciphering the particle content of a jet can often help to uncover what particle initiated the jet. While this is often enough for many analyses, one can ask the next obvious question: how are the momenta of the particles within the jet distributed? In other words, what does the inner geometry of the jet look like?

There are a number of observables that one can look at to study a jet’s geometry. These are generally referred to as jet substructure observables. Two basic examples (sketched in code after the list) are:

  1. Jet-shape: This takes a jet of radius R and then identifies a sub-jet within it of radius r. By measuring the energy fraction contained within sub-jets of variable radius r, one can study where the majority of the jet’s energy/momentum is concentrated.
  2. Jet mass: By measuring the invariant mass of all of the particles in a jet (while simultaneously considering the jet’s energy and radius) one can gain insight into how focused a jet is.
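As a concrete illustration, here is a small sketch of how these two observables could be computed from a jet’s constituents (the constituents are given as (pt, eta, phi) and treated as massless; the function names are mine, not a standard API):

```python
import numpy as np

def jet_mass(constituents):
    # Invariant mass of the jet built from its (assumed massless) constituents.
    E  = sum(pt * np.cosh(eta) for pt, eta, _ in constituents)
    px = sum(pt * np.cos(phi) for pt, _, phi in constituents)
    py = sum(pt * np.sin(phi) for pt, _, phi in constituents)
    pz = sum(pt * np.sinh(eta) for pt, eta, _ in constituents)
    return np.sqrt(max(E ** 2 - px ** 2 - py ** 2 - pz ** 2, 0.0))

def jet_shape(constituents, axis_eta, axis_phi, r):
    # Fraction of the jet's pT carried by constituents within a sub-cone of radius r around the axis.
    def dR(eta, phi):
        dphi = abs(phi - axis_phi)
        dphi = min(dphi, 2 * np.pi - dphi)
        return np.hypot(eta - axis_eta, dphi)
    pt_total = sum(pt for pt, _, _ in constituents)
    pt_inner = sum(pt for pt, eta, phi in constituents if dR(eta, phi) < r)
    return pt_inner / pt_total
```

Scanning r from zero up to the full jet radius with jet_shape maps out how concentrated the jet’s momentum is, which is exactly the handle described in item 1.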
Figure 2: A basic way to produce quarkonium via the fragmentation of a gluon. The interactions highlighted in blue are calculated using standard perturbative QCD. The green zone is where things get tricky and non-perturbative models that are extracted from data must often be used.

One way in which phenomenologists are utilizing jet substructure technology is in the study of hadron production. In arXiv:1406.2295, Baumgart et al. introduced a way to connect the world of jet physics with the world of quarkonia. These bound states of charm-anti-charm or bottom-anti-bottom quarks are the source of two things: great buzzwords for impressing your friends and several outstanding problems within the Standard Model. While we’ve been studying quarkonia such as the J/\psi (c\bar{c}) and the \Upsilon (b\bar{b}) for half a century, there are still a bunch of very basic questions about how they are produced (more on this topic in future bites).

This paper offers a fresh approach to studying the various ways in which quarkonia are produced at the LHC by focusing on how they are produced within jets. The wealth of available jet physics technology then provides a new family of interesting observables. The authors first describe the various mechanisms by which quarkonia are produced. In the formalism of Non-Relativistic QCD (NRQCD), the J/\psi, for example, is most frequently produced at the LHC (see Fig. 2) when a high energy gluon splits into a c\bar{c} pair in one of several possible angular momentum and color quantum states. This pair then ultimately undergoes non-perturbative effects (i.e. effects we can’t really calculate using standard techniques in quantum field theory) and becomes a color-singlet final state particle (as any reasonably minded particle should do). While this model makes some sense, we have no idea how often quarkonia are produced via each mechanism.

Figure 3: This plot from arXiv:1406.2295 shows how the probability that a gluon or quark fragments into a jet with a specific energy E that contains a J/\psi carrying a fraction z of the original quark/gluon’s momentum varies for the different mechanisms. The spectroscopic notation should be familiar from basic quantum mechanics: it gives the angular momentum and color quantum numbers of the q\bar{q} pair that eventually becomes the quarkonium. Notice that for different values of z and E, the different mechanisms behave differently. Thus this observable (i.e. that mouthful of a probability distribution I just described) is said to have discriminating power between the different channels by which a J/\psi is typically formed.

This paper introduces a theoretical formalism that addresses the following question: what is the probability that a parton (quark/gluon) hadronizes into a jet with a certain substructure, containing a specific hadron that carries some fraction z of the original parton’s energy? The authors show that the answer to this question is correlated with the answer to another: how often are quarkonia produced via the different intermediate angular-momentum/color states of NRQCD? In other words, they show that studying the geometry of jets that contain quarkonia may lead to answers to decades-old questions about how quarkonia are produced!

There are several other efforts to study hadron production through the lens of jet physics that have also made preliminary comparisons with ATLAS/CMS data (one such study will be the subject of my next bite). These studies look at the production of more general classes of hadrons, and at different numbers of jets in events, and see promising results when compared with 7 TeV data from ATLAS and CMS.

The moral of this story is that jets are now being viewed less as a source of troublesome backgrounds to new physics and more as a laboratory for studying long-standing questions about the underlying nature of hadronization. Jet physics offers innovative ways to look at old problems, providing a host of new and exciting observables to study at the LHC and other experiments.

Further Reading

  1. The November Revolution: https://www.slac.stanford.edu/history/pubs/gilmannov.pdf. This transcript of a talk provides some nice background on, amongst other things, the momentous discovery of the J/\psi in 1974, in what is often referred to as the November Revolution.
  2. An Introduction to the NRQCD Factorization Approach to Heavy Quarkonium: https://cds.cern.ch/record/319642/files/9702225.pdf. As good as it gets when it comes to the basics of this tried-and-true effective theory. This article will definitely take some familiarity with QFT, but it provides a great outline of the NRQCD Lagrangian, fields, decays, etc.