Letting the Machines Search for New Physics

Article: “Anomaly Detection for Resonant New Physics with Machine Learning”

Authors: Jack H. Collins, Kiel Howe, Benjamin Nachman

Reference: https://arxiv.org/abs/1805.02664

One of the main goals of the LHC experiments is to look for signals of physics beyond the Standard Model: new particles that may explain some of the mysteries the Standard Model doesn't answer. The typical way this works is that theorists come up with a new particle that would solve some mystery and spell out how it interacts with the particles we already know about. Experimentalists then design a strategy to search for evidence of that particle in the mountains of data that the LHC produces. So far none of the searches performed in this way have seen any definitive evidence of new particles, leading experimentalists to rule out large parts of the parameter space of theorists' favorite models.

A summary of searches the ATLAS collaboration has performed. The left columns show the model being searched for, the experimental signature examined, and how much data has been analyzed so far. The colored bars show the regions that have been ruled out based on the null result of each search. As you can see, a lot of territory has already been covered.

Despite this extensive program of searches, one might wonder if we are still missing something. What if there is a new particle in the data, waiting to be discovered, that theorists haven't thought of yet and so hasn't been looked for? This poses experimentalists a very interesting challenge: how do you look for something new when you don't know what you are looking for? One approach, which Particle Bites has covered before, is to look at as many final states as possible, compare what you see in data to simulation, and look for any large deviations. This is a good approach, but it may have limited sensitivity to small signals. When a normal search for a specific model is performed, one usually makes a series of selection requirements on the data, chosen to remove background events and keep signal events. Nowadays these selection requirements are getting more complex, often using neural networks, a common type of machine learning model, trained to discriminate signal from background. Without some sort of selection like this, you may miss a small signal buried in the large number of background events.

This new approach lets the neural network itself decide what signal to look for. It uses part of the data to train a neural network to find a signal, and then uses the rest of the data to actually look for that signal. This lets you search for many different kinds of models at the same time!

If that sounds like magic, let's try to break it down. You have to assume something about the new particle you are looking for, and the technique here assumes it forms a resonant peak. This is a common assumption in searches: if a new particle were being produced in LHC collisions and then decaying, you would get an excess of events in which the invariant mass of its decay products has a particular value. So if you plot the number of events in bins of invariant mass, a new particle should show up as a nice peak on top of a relatively smooth background distribution. This is a very common search strategy, often colloquially referred to as a 'bump hunt', and it is how the Higgs boson was discovered in 2012.
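To make the bump hunt concrete, here is a minimal toy version in Python (not code from the paper; every number below is invented for illustration): it histograms a smoothly falling background plus a small resonance and flags the bin with the largest excess over a smooth fit.

import numpy as np

rng = np.random.default_rng(0)

# Smoothly falling 'QCD-like' background: exponential in invariant mass (GeV).
background = rng.exponential(scale=500.0, size=100_000) + 1000.0
# A small resonance: a Gaussian peak at 3 TeV with a 40 GeV width.
signal = rng.normal(loc=3000.0, scale=40.0, size=500)

masses = np.concatenate([background, signal])
counts, edges = np.histogram(masses, bins=60, range=(1000.0, 5000.0))
centers = 0.5 * (edges[:-1] + edges[1:])

# Crude smooth-background estimate: a polynomial fit to log(counts).
# (A real analysis would fit sidebands and treat uncertainties properly.)
good = counts > 0
coeffs = np.polyfit(centers[good], np.log(counts[good]), deg=3)
expected = np.exp(np.polyval(coeffs, centers))

# Local significance per bin, in the simple sqrt(N) approximation.
z = (counts - expected) / np.sqrt(np.maximum(expected, 1.0))
i = int(np.argmax(z))
print(f"Largest excess: {z[i]:.1f} sigma near m = {centers[i]:.0f} GeV")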

A histogram showing the invariant mass of photon pairs, in which the Higgs boson shows up as a bump at 125 GeV.

The other secret ingredient we need is the idea of Classification Without Labels (abbreviated CWoLa, pronounced like koala). Neural networks in high energy physics are usually trained on fully labeled simulated examples. The network is shown a set of examples and guesses which are signal and which are background. Using the true label of each event, the network is told which examples it got wrong, its parameters are updated accordingly, and it slowly improves. The crucial challenge when trying to train on real data is that we don't know the true label of any of the data, so it's hard to tell the network how to improve. Rather than trying to use true labels at all, the CWoLa technique uses mixtures of events. Let's say you have two mixed samples of events, sample A and sample B, and you know that sample A has a larger fraction of signal events than sample B. Then, instead of trying to classify signal versus background directly, you can train a classifier to distinguish events from sample A from events from sample B, and what that network will learn to do is distinguish signal from background. One can actually show that the optimal classifier for distinguishing the two mixed samples is the same as the optimal classifier of signal versus background. Even more remarkably, this technique works quite well in practice, achieving good results even when only a few percent of one of the samples is signal.

An illustration of the CWoLa method: a classifier trained to distinguish between two mixed samples of signal and background events can learn to classify signal versus background.
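Here is a minimal runnable sketch of the CWoLa idea, with a made-up two-feature dataset and scikit-learn's MLPClassifier standing in for the neural network: the classifier is trained only on the mixed-sample labels (A vs. B), yet its output also separates signal from background.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_events(n_signal, n_background):
    # Two invented features; signal is shifted relative to background.
    bkg = rng.normal(0.0, 1.0, size=(n_background, 2))
    sig = rng.normal(1.5, 1.0, size=(n_signal, 2))
    x = np.vstack([bkg, sig])
    truth = np.concatenate([np.zeros(n_background), np.ones(n_signal)])
    return x, truth

# Sample A is signal-enriched (10%), sample B is signal-depleted (1%).
xA, truthA = make_events(n_signal=1000, n_background=9000)
xB, truthB = make_events(n_signal=100, n_background=9900)

x = np.vstack([xA, xB])
mixed_label = np.concatenate([np.ones(len(xA)), np.zeros(len(xB))])

clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
clf.fit(x, mixed_label)  # trained only on A-vs-B, never on the truth

# Evaluate the same classifier against the true signal/background labels.
scores = clf.predict_proba(x)[:, 1]
truth = np.concatenate([truthA, truthB])
print(f"signal-vs-background AUC: {roc_auc_score(truth, scores):.2f}")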

The technique described in the paper combines these two ideas in a clever way. Because we expect a new particle to show up in a narrow region of invariant mass, you can use some of your data to train a classifier to distinguish events in a given slice of invariant mass from events outside it. If there is no signal with a mass in that region, the classifier should essentially learn nothing, but if there is, the classifier should learn to separate signal from background. One can then apply that classifier to the rest of the data (which wasn't used in the training) and look for a peak that would indicate a new particle. Because you don't know ahead of time what mass a new particle should have, you scan over the whole range for which you have sufficient data, looking for a new particle in each slice.
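Putting the two ingredients together, here is a heavily simplified, runnable sketch of one step of such a scan (toy data and cuts invented for illustration; the actual analysis also splits events into independent training and testing sets, which is omitted here):

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
n = 50_000
mass = rng.exponential(500.0, size=n) + 1000.0   # toy dijet mass (GeV)
features = rng.normal(0.0, 1.0, size=(n, 2))     # toy substructure inputs

# Hide a small signal: events near 3 TeV with shifted substructure.
is_sig = rng.random(n) < 0.002
mass[is_sig] = rng.normal(3000.0, 40.0, size=int(is_sig.sum()))
features[is_sig] += 1.5

window = (mass > 2900) & (mass < 3100)   # the invariant mass slice under test
sideband = ((mass > 2600) & (mass < 2900)) | ((mass > 3100) & (mass < 3400))

# CWoLa step: window vs. sideband, using substructure only (not the mass).
x = np.vstack([features[window], features[sideband]])
y = np.concatenate([np.ones(window.sum()), np.zeros(sideband.sum())])
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(x, y)

# Keep only the most window-like events, then bump-hunt in what survives.
scores = clf.predict_proba(features)[:, 1]
keep = scores > np.quantile(scores, 0.99)
print(f"{keep.sum()} events survive; {is_sig[keep].sum()} are true signal")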

The specific case they use to demonstrate the power of this technique is new particles decaying to pairs of jets. On the surface, jets, the large sprays of particles produced when a quark or gluon is made in an LHC collision, all look the same. But the insides of jets, their sub-structure, can contain very useful information about what kind of particle produced them. If a produced particle decays into other particles, like top quarks, W bosons or some new BSM particle, before those decay into quarks, then the resulting jets will have a lot of interesting sub-structure, which can be used to distinguish them from regular jets. In this paper the neural network uses information about the sub-structure of both jets in the event to determine whether the event is signal-like or background-like.
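To give a flavor of what a sub-structure input looks like, here is a sketch of the simplest such variable, the jet's invariant mass, computed from made-up constituent momenta (the paper's actual inputs are more sophisticated than this):

import numpy as np

def jet_mass(pt, eta, phi):
    # Invariant mass of a set of constituents, treated as massless,
    # from their transverse momenta (GeV) and angular coordinates.
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)
    e = pt * np.cosh(eta)  # massless approximation: E = |p|
    e, px, py, pz = e.sum(), px.sum(), py.sum(), pz.sum()
    return np.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

# A fake jet with three constituents.
pt = np.array([300.0, 250.0, 50.0])
eta = np.array([0.10, 0.25, 0.05])
phi = np.array([1.00, 1.20, 0.90])
print(f"jet mass ~ {jet_mass(pt, eta, phi):.1f} GeV")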

The authors test their new technique on a simulated dataset containing some events in which a new particle is produced, along with a large number of QCD background events. They train a neural network to distinguish events in a window of dijet invariant mass from other events. With no selection applied, there is no visible bump in the dijet invariant mass spectrum; with their technique, they are able to train a classifier that rejects enough background that a clear mass peak of the new particle shows up. This shows that you can find a new particle without relying on a particular model, making you sensitive to particles overlooked by existing searches.

Demonstration of the bump hunt search. The shaded histogram is the amount of signal in the dataset. The different sets of blue points show the data remaining after applying tighter and tighter selections based on the neural network classifier score. The red line is the predicted number of background events based on fitting the sideband regions. For the tightest selection (bottom set of points), the data form a clear bump over the background estimate, indicating the presence of a new particle.

This paper was one of the first to really demonstrate the power of machine-learning-based searches. There is now a competition being held to inspire researchers to try out other techniques on a mock dataset, so expect to see more new search strategies utilizing machine learning released soon. Of course, the real excitement will come when a search like this is applied to real data and we can see whether machines can find new physics that we humans have overlooked!

Read More:

  1. Quanta Magazine Article “How Artificial Intelligence Can Supercharge the Search for New Particles”
  2. Blog Post on the CWoLa Method “Training Collider Classifiers on Real Data”
  3. Particle Bites Post “Going Rogue: The Search for Anything (and Everything) with ATLAS”
  4. Blog Post on applying ML to top quark decays “What does Bidirectional LSTM Neural Networks has to do with Top Quarks?”
  5. Extended Version of Original Paper “Extending the Bump Hunt with Machine Learning”

LIGO and Gravitational Waves: A Hep-ex perspective

The exciting Twitter rumors have been confirmed! On Thursday, LIGO finally announced the first direct observation of gravitational waves, a prediction 100 years in the making. The media storm has been insane, with physicists referring to the discovery as “more significant than the discovery of the Higgs boson… the biggest scientific breakthrough of the century.” Watching Thursday’s press conference from CERN, it was hard not to draw comparisons between the discovery of the Higgs and LIGO’s announcement.

The gravitational-wave event GW150914 observed by the LIGO Collaboration

Long-standing searches for well-known phenomena

The Higgs boson was billed as the last piece of the Standard Model puzzle. Its existence was predicted in the 1960s to explain the masses of the Standard Model's vector bosons and to avoid non-unitary amplitudes in W boson scattering. Even if the Higgs didn't exist, particle physicists expected new physics to come into play at the TeV scale, and the experiments at the LHC were designed to find it.

Similarly, gravitational waves were the last untested fundamental prediction of General Relativity. At first physicists remained skeptical of their existence, but the search began in earnest with Joseph Weber in the 1950s (Forbes). Indirect evidence came a few decades later, when a binary system consisting of a pulsar and a neutron star was observed to lose energy over time, presumably in the form of gravitational waves. Inspired by Weber's pioneering efforts, LIGO developed two detectors of unprecedented precision in order to finally make a direct observation.

Unlike the Higgs case, General Relativity makes clear predictions about the properties of gravitational waves. The waves should travel at the speed of light, have two polarizations, and interact weakly with matter. Scientists at LIGO were even searching for a very particular signal, described as a characteristic "chirp". With the upgrade to the LIGO detectors, physicists were confident they'd be capable of observing gravitational waves; the only outstanding question was how often those observations would happen.

The search for the Higgs involved more uncertainties. The one parameter essential for describing the Higgs, its mass, is not predicted by the Standard Model. While previous collider experiments at LEP and Fermilab were able to set limits on the Higgs mass, its properties were ultimately unknown before the discovery. No one knew whether the Higgs would be a Standard Model Higgs, or part of a more complicated theory like supersymmetry or technicolor.

Monumental scientific endeavors

Answering the most difficult questions posed by the universe isn't easy, or cheap. In terms of cost, both LIGO and the LHC represent billion-dollar investments. Including the most recent upgrade, LIGO cost a total of $1.1 billion; when it was originally approved in 1992, "it represented the biggest investment the NSF had ever made," according to France Córdova, NSF director. The discovery of the Higgs was estimated by Forbes to have cost a total of $13 billion, a hefty price paid by CERN's member and observer states. The LHC's electricity bill alone comes to more than $200 million per year.

The large investment is necessitated by the sheer monstrosity of the experiments. LIGO consists of two identical detectors with arms roughly 4 km long, built 3000 km apart. Because of its large size, LIGO is capable of measuring ripples in space 10,000 times smaller than a proton, the smallest scale ever measured by scientists (LIGO Fact Page). The size of the LIGO vacuum tubes is surpassed only by those at the LHC. At 27 km in circumference, the LHC is the single largest machine in the world, and the most powerful particle accelerator to date. It took only a handful of people to predict the existence of gravitational waves and the Higgs, but it took thousands of physicists and engineers to find them.
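To get a feel for the numbers: a gravitational wave of strain h changes an arm of length L by ΔL = h × L. For a strain of order 10^-21 (roughly that of GW150914) and a 4 km arm,

ΔL ≈ 10^-21 × 4 km ≈ 4 × 10^-18 m,

far smaller than a proton.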

Life after Discovery

Even the language surrounding the two announcements is strikingly similar. Rumors circulated for months before the official press conferences, and the expectations from each community were very high. Both discoveries were touted as discoveries of the century, with many experts claiming the results would usher in a "new era" of particle physics or observational astronomy.

With a few years of hindsight, it is clear that the "new era" of particle physics has begun. Before Run I of the LHC, particle physicists knew they needed to search for the Higgs. Now that the Higgs has been discovered, there is much more uncertainty surrounding the field, and the list of questions to try to answer is enormous. Physicists want to understand the source of the Dark Matter that makes up roughly 25% of the universe, where neutrinos derive their mass from, and how to quantize gravity. There are several ad hoc features of the Standard Model that merit additional explanation, and physicists are still searching for evidence of supersymmetry and grand unified theories. While the to-do list is long and well understood, how to solve these problems is not. Measuring the properties of the Higgs does allow particle physicists to set limits on physics beyond the Standard Model, but it's unclear at which scale new physics will come into play, and there's no real consensus about which experiments deserve the most support. For some in the field, this uncertainty is a source of anxiety and skepticism about the future; for others, the long to-do list is an absolutely thrilling call to action.

With regard to the LIGO experiment, the future is much clearer. LIGO has so far published only one event from 16 days of data taking; there is much more data already in the pipeline, and more interferometers like VIRGO and (e)LISA are planned to come online in the near future. Now that gravitational waves have been shown to exist, they can be used to observe the universe in a whole new way. The first event already contains an interesting surprise: LIGO observed two inspiraling black holes of 36 and 29 solar masses merging into a final black hole of 62 solar masses. The data thus confirmed the existence of heavy stellar black holes, with masses more than 25 times that of the sun, and that binary black hole systems form in nature (Astrophysical Journal). When VIRGO comes online, it will also become possible to triangulate the sources of these gravitational waves. LIGO's job now is to watch, and see what other secrets the universe has in store.

LHC Run II: What To Look Out For

The Large Hadron Collider is the world's largest proton collider, and in a mere five years of active data acquisition it has already achieved fame for the discovery of the elusive Higgs boson in 2012. Though the LHC is currently off for a series of repairs and upgrades, it is scheduled to begin running again within the month, this time with a proton collision energy of 13 TeV. This is nearly double the previous run energy of 8 TeV, opening the door to a host of new particle production processes. Many physicists are keeping their fingers crossed that another big discovery is right around the corner. Here are a few specific things to look out for in Run II.

1. Luminosity scaling

Though this is a very general category, it is a huge component of the Run II excitement. This is due to the scaling of parton luminosity with collision energy, which gives a remarkable increase in discovery potential for a modest increase in energy.

If you're not familiar: instantaneous luminosity measures the intensity of the colliding beams, in collisions per unit time per unit cross-sectional area. Integrated luminosity sums this instantaneous value over time, giving a metric with units of 1/area.

For a process with cross section σ, the expected event rate is dN/dt = L × σ, where L is the instantaneous luminosity; the integrated luminosity is L_int = ∫ L dt, so the total expected yield is N = σ × L_int.

In the particle physics world, integrated luminosities are measured in inverse femtobarns, where 1 fb⁻¹ = 1/(10⁻⁴³ m²). Each of the two main detectors at CERN, ATLAS and CMS, had collected about 30 fb⁻¹ by the end of 2012. The main point is that more luminosity means more events in which to search for new physics.
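The practical use of these units is simple event counting: the expected yield is the cross section times the integrated luminosity. A minimal sketch, with a made-up cross section rather than any measured value:

# Expected event count: N = cross section x integrated luminosity.
PB_TO_FB = 1.0e3            # 1 pb = 1000 fb

sigma_fb = 1.0 * PB_TO_FB   # hypothetical 1 pb process
int_lumi_ifb = 30.0         # 30 fb^-1 of data
n_events = sigma_fb * int_lumi_ifb
print(f"expected events: {n_events:.0f}")  # prints 30000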

Figure 1 shows ratios of LHC parton luminosities for 7 TeV vs. 8 TeV, and for 13 TeV vs. 8 TeV. Since the y axis is on a log scale, it's easy to see that the 13 to 8 TeV ratio becomes very large at high masses. In fact, for heavy enough final states, 100 fb⁻¹ at 8 TeV is equivalent to just 1 fb⁻¹ at 13 TeV, so increasing the energy by a factor of less than 2 can be worth a factor of 100 in integrated luminosity! This means that even the first few months of running at 13 TeV will provide a huge amount of data for analysis, and many results will likely be released shortly after the start of data acquisition.
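To make the equivalence explicit: since the expected yield of a process is N = σ × L_int, two datasets are equally sensitive when σ(8 TeV) × L_8 = σ(13 TeV) × L_13. So if the cross section for some heavy final state grows by a factor of about 100 between 8 and 13 TeV, then L_8 = 100 × L_13 gives the same expected number of events.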

Figure 1: Parton luminosity ratios, from J. Stirling at Imperial College London (see references).

2. Supersymmetry

Supersymmetry proposes the existence of a superpartner for every particle in the Standard Model, effectively doubling the number of fundamental particles in the universe. This helps to answer many open questions in particle physics, most famously why the Higgs mass is so much smaller than the Planck scale, known as the 'hierarchy problem' (see the further reading list for some good explanations).

Current mass limits on many supersymmetric particles are getting pretty high, leaving some physicists concerned about the feasibility of finding evidence for SUSY. Many of these particles have already been excluded for masses below roughly a TeV, making it very difficult to create them with the LHC as is. While there is talk of another upgrade to achieve energies even higher than 14 TeV, for now the SUSY searches will have to make use of the energy that is available.

Figure 2: Cross sections for the case of equal degenerate squark and gluino masses, as a function of mass at √s = 13 TeV, from arXiv:1407.5066. Here q̃ denotes a squark, g̃ a gluino, and t̃ a stop.

Figure 2 shows the pair production cross sections for various supersymmetric particles, including squarks (the superpartners of the quarks) and gluinos (the superpartners of the gluon). Given the luminosity scaling described previously, these cross sections tell us that with only 1 fb⁻¹, physicists will be able to surpass the existing sensitivity for these supersymmetric processes. As a result, there will be a rush of searches performed in a very short time after the run begins.

3. Dark Matter

Dark matter is one of the greatest mysteries in particle physics to date (see past Particle Bites posts for more information). It is also one of the most difficult mysteries to solve, since dark matter candidate particles are by definition very weakly interacting. At the LHC, potential dark matter production would show up as missing transverse energy (MET) in the detector, since the particles leave no tracks and deposit no energy.

One of the best ways to 'see' dark matter at the LHC is in mono-jet or mono-photon signatures: a jet or photon that does not occur as part of a pair, but rather is radiated singly, recoiling against the invisible particles. Typically these events have a very high transverse momentum (pT) jet, giving a good primary vertex, and a large amount of MET, making them easier to pick out. Figure 3 shows a Feynman diagram of such a process, with the invisible system recoiling off a jet or a photon.

Figure 3: Feynman diagram of mono-X searches for dark matter, from "Hunting for the Invisible."
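For a rough idea of how MET is built, here is a minimal sketch; the event below is a made-up mono-jet-like example, not real data:

import numpy as np

def met(pt, phi):
    # MET is the magnitude of the (negative) vector sum of the visible
    # transverse momenta; the sign flip does not change the magnitude.
    px = np.sum(pt * np.cos(phi))
    py = np.sum(pt * np.sin(phi))
    return np.hypot(px, py)

# One hard jet recoiling against something invisible, plus soft debris.
pt = np.array([450.0, 20.0, 15.0])   # GeV
phi = np.array([0.0, 2.8, -2.5])
print(f"MET ~ {met(pt, phi):.0f} GeV")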

Though the topics in this post will certainly be popular over the next few years at the LHC, they do not even begin to span the huge volume of physics analyses that we can expect to see emerging from Run II data. The next year alone has the potential to be a groundbreaking one, so stay tuned!

References: 

Further Reading:

CMS evidence of a possible SUSY decay chain

Title: “Search for physics beyond the standard model in events with two leptons, jets, and missing transverse energy in pp collisions at √s = 8 TeV”
Author: CMS Collaboration
Published: CMS Public Physics Results, SUS-12-019

The CMS Collaboration, which runs one of the two main multipurpose experiments at the Large Hadron Collider, has recently reported an excess of events with an estimated significance of 2.6σ. As a reminder, discoveries in particle physics are typically declared at 5σ. While this excess is small enough that it may not be related to new physics at all, it is also large enough to generate some discussion.
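For intuition about what such a significance means in a counting experiment, here is a back-of-the-envelope sketch in the simple Z = (N_obs − N_bkg)/√N_bkg approximation; the counts below are invented, not the CMS numbers:

import math

n_obs = 160.0   # hypothetical observed events in the signal region
n_bkg = 130.0   # hypothetical expected background
z = (n_obs - n_bkg) / math.sqrt(n_bkg)
print(f"Z ~ {z:.1f} sigma")  # ~2.6 for these made-up numbers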

The excess occurs at a dilepton invariant mass of 20–70 GeV in events with two leptons and missing transverse energy (MET). Some theorists suggest that this may be a signature of supersymmetry. The analysis searches for kinematic 'edges', an example of which can be seen in Figure 1; such shapes are typical of the decays of new particles predicted by supersymmetry.

Figure 1: Diagram of kinematic 'edge' effects in decay chains, from "Search for an 'edge' with CMS". On the left, A, B, C, and D represent particles in a decay chain. On the right, the invariant mass distribution of final state particles C and D is shown, where the y axis represents the number of events.

The edge shape comes from the reconstructed invariant mass of the two leptons; in the diagram, these correspond to particles C and D. In models that conserve R-parity, the quantum number that distinguishes SUSY particles from Standard Model particles, a SUSY particle decays by emitting a Standard Model particle and a lighter SUSY particle; in this case, two leptons are emitted along the chain. Reconstructing the full invariant mass of the event is impossible because of the invisible massive particle at the end of the chain. However, the invariant mass of the lepton pair can take any value up to a maximum set by the mass differences between the initial, intermediate, and final states, as enforced by energy and momentum conservation. This maximum gives a hard cutoff, or 'edge', in the invariant mass distribution, as shown on the right side of Figure 1. Since the location of the cutoff depends on the masses of the superparticles in the chain, these features can be very useful for extracting information about such decays.
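For the canonical two-step cascade, in which the heavier neutralino decays to a slepton and a lepton, and the slepton then decays to the lighter neutralino and a second lepton, the position of the edge is a standard result from the SUSY literature (not derived in the CMS note):

(m_ll^max)^2 = (m_2^2 − m_sl^2)(m_sl^2 − m_1^2) / m_sl^2,

where m_2 and m_1 are the heavier and lighter neutralino masses and m_sl is the intermediate slepton mass. For a direct three-body decay with no intermediate slepton, the endpoint is simply m_2 − m_1, matching the 'maximum mass difference' statement above.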

Figure 2 shows generated Monte Carlo for a new particle decaying to a two-lepton final state. The red and blue lines show sources of background, while the green is the simulated signal. If the model were a good description of the data, these three colored contributions would sum to the observed distribution. Figure 3 shows the actual data distribution, with the excess around 20–70 GeV and its estimated significance.

Figure 2: Monte Carlo invariant mass distribution of paired electrons or muons; signal shown in green with characteristic edge.

Figure 3: Invariant mass data distribution for paired leptons; excess between 20 and 70 GeV constitutes an estimated 2.6σ significance.

This excess is encouraging for physicists hoping to find stronger evidence for supersymmetry (or, more generally, new physics) in Run II. However, 2.6σ is not especially high, and historically excesses of this size come and go all the time. Both CMS and ATLAS will certainly be watching this region in the 2015 13 TeV data, to see whether it grows into something more significant or simply fades into the background.