1 Preface
To outline the challenges in computing that high-energy physics will face over the next years and strategies to approach them, the HEP Software Foundation has organised a Community White Paper (CWP) [1]. In addition to the main document, several more detailed documents were worked out by different working groups. The present document focusses on the topic of machine learning. The goals are to define the tasks at the energy and intensity frontier that can be addressed during the next decade by research and development of machine learning applications. Machine learning in particle physics is evolving fast, while the contents of this community white paper were mainly compiled during community meetings in spring 2017 that took place at several workshops on machine learning in high-energy physics (S2I2 and [2–5]). The contents of this document thus reflect the state of the art at these events and do not attempt to take later developments into account.

2 Introduction
One of the main objectives of particle physics in the post-Higgs boson discovery era is to exploit the full physics potential of both the Large Hadron Collider (LHC) and its upgrade, the high luminosity LHC (HL-LHC), in addition to present and future neutrino experiments. The HL-LHC will deliver an integrated luminosity that is 20 times larger than the present LHC dataset, bringing quantitatively and qualitatively new challenges due to event size, data volume, and complexity. The physics reach of the experiments will be limited by the physics performance of algorithms and by computational resources. Machine learning (ML) applied to particle physics promises to provide improvements in both of these areas.

Incorporating machine learning in particle physics workflows will require significant research and development over the next five years. Areas where significant improvements are needed include:
• Physics performance of reconstruction and analysis algorithms;
• Execution time of computationally expensive parts of event simulation, pattern recognition, and calibration;
• Real-time implementation of machine learning algorithms;
• Reduction of the data footprint with data compression, placement and access.

2.1 Motivation
The experimental high-energy physics (HEP) program revolves around two main objectives that go hand in hand: probing the Standard Model (SM) with increasing precision and searching for new particles associated with physics beyond the SM. Both tasks require the identification of rare signals in immense backgrounds. Substantially increased levels of pile-up collisions from additional protons in the bunch at the HL-LHC will make this a significant challenge. Machine learning algorithms are already the state-of-the-art in event and particle identification, energy estimation and pile-up suppression applications in HEP. Despite their present advantage, machine-learning algorithms still have significant room for improvement in their exploitation of the full potential of the dataset.

2.2 Brief Overview of Machine Learning Algorithms in HEP
This section provides a brief introduction to the most important machine learning algorithms in HEP, introducing key vocabulary (in italic). Machine learning methods are designed to exploit large datasets in order to reduce complexity and find new features in data. The most frequently used machine learning algorithms in HEP are currently Boosted Decision Trees (BDTs) and Neural Networks (NN).
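As a minimal sketch of the supervised workflow described in the next paragraph, the following Python example trains a BDT to separate signal from background and then applies it to unseen events. It is purely illustrative: scikit-learn is assumed to be available, and the input variables and the dataset are synthetic rather than taken from any experiment.

# Illustrative only: synthetic "signal" and "background" events with a few
# physics-style input variables, classified with a gradient-boosted decision tree.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 10000
# Hypothetical input variables, e.g. an invariant mass and two angular variables.
signal = np.column_stack([rng.normal(125.0, 5.0, n), rng.normal(0.5, 0.2, n), rng.uniform(-1, 1, n)])
background = np.column_stack([rng.exponential(60.0, n) + 70.0, rng.normal(0.0, 0.4, n), rng.uniform(-1, 1, n)])
X = np.vstack([signal, background])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = signal, 0 = background

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Training is the expensive step.
bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3)
bdt.fit(X_train, y_train)

# Inference is cheap per event; the score can be used for a selection cut.
scores = bdt.predict_proba(X_test)[:, 1]
print("ROC AUC on the test sample:", roc_auc_score(y_test, scores))

The same pattern applies when the classifier is replaced by a neural network, with the training step remaining the dominant cost.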
Typically, variables relevant to the physics problem are selected and a machine learning model is trained for classification or regression using signal and background events (or instances). Training the model is the most human- and CPU-time consuming step, while the application, the so-called inference stage, is relatively inexpensive. BDTs and NNs are typically used to classify particles and events. They are also used for regression,
where a continuous function is learned, for example to obtain the best estimate of a particle's energy based on the measurements from multiple detectors. Neural Networks have been used in HEP for some time; however, improvements in training algorithms and computing power have in the last decade led to the so-called deep learning revolution, which has had a significant impact on HEP. Deep learning is particularly promising when there is a large amount of data and features, as well as symmetries and complex non-linear dependencies between inputs and outputs.

There are different types of deep neural networks (DNNs) used in HEP: fully-connected (FCN), convolutional (CNN) and recurrent (RNN). Additionally, neural networks are used in the context of Generative Models, where a neural network is trained to reproduce the multidimensional distribution of the training set. Variational AutoEncoders (VAE) and the more recent Generative Adversarial Networks (GAN) are two examples of such generative models used in HEP.

A large set of machine learning algorithms is devoted to time series analysis and prediction. They are in general not relevant for HEP data analysis, where events are independent from each other. However, there is more and more interest in these algorithms for Data Quality, Computing and Accelerator Infrastructure monitoring, as well as for those physics processes and event reconstruction tasks where time is an important dimension.

2.3 Structure of the Document
Applications of machine learning algorithms motivated by HEP drivers are detailed in Section 3, while Section 4 focuses on outreach and collaboration with the machine learning community. Section 5 focuses on machine learning software in HEP and discusses the interplay between internally and externally developed machine learning tools. Recent progress in machine learning was made possible in part by the emergence of suitable hardware for training complex models; Section 6 therefore discusses the resource requirements of training and applying machine learning algorithms in HEP. Section 7 discusses strategies for training the HEP community in machine learning. Finally, Section 8 presents the roadmap for the near future.

3 Machine Learning Applications and R&D
This chapter describes the science drivers and high-energy physics challenges where machine learning can play a significant role in advancing the current state of the art. These challenges are selected because of their relevance and potential, and also due to their similarity with challenges faced outside the field. Despite these similarities, major R&D work will go into adapting and evolving such methods to match the particular HEP requirements.

3.1 Simulation
Particle discovery relies on the ability to accurately compare the observed detector response data with expectations based on the hypotheses of the Standard Model or models of new physics. While the processes of subatomic particle interactions with matter are known, it is intractable to compute the detector response analytically. As a result, Monte Carlo simulation tools, such as GEANT [6], have been developed to simulate the propagation of particles in detectors to compare with the data. The dedicated CWP on detector simulation [7] discusses the challenges of simulations in great detail. This section focuses on the machine learning related aspects. For the HL-LHC, on the order of trillions of simulated collisions are needed in order to achieve the statistical accuracy of the simulations required to perform precision hypothesis testing.
However, such simulations are highly computationally expensive. For example, simulating the detector response of a single LHC proton-proton collision event takes on the order of several minutes. A particularly time-consuming step is the simulation of particles incident on the dense material of a calorimeter. The high interaction probability and resulting high multiplicity in the so-called showers of particles passing through the detector material make the simulation of such processes very expensive. This problem is further compounded when particle showers overlap, as is frequently the case in the core of a jet of particles produced by high energy quarks and gluons. Fast simulations replace the slowest components of the simulation chain with computationally efficient approximations. Often such approximations have been made by using simplified parametrizations or particle shower look-up tables. These are computationally fast but often suffer from insufficient accuracy for high precision physics measurements and searches.
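Generative neural networks, discussed in the next paragraph, are one candidate replacement for such parametrizations. Purely as an illustration, and not a recipe from any experiment, a minimal GAN for toy calorimeter-like images could be set up along the following lines; Keras/TensorFlow is assumed, and the image size, architecture and "training data" are invented.

# Minimal, illustrative GAN for toy 8x8 "calorimeter images"; not a realistic detector model.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

LATENT, IMG = 16, 8  # latent dimension and toy image size (assumptions)

def make_generator():
    z = tf.keras.Input(shape=(LATENT,))
    h = layers.Dense(64, activation="relu")(z)
    img = layers.Dense(IMG * IMG, activation="relu")(h)  # non-negative "energy deposits"
    return tf.keras.Model(z, layers.Reshape((IMG, IMG))(img))

def make_discriminator():
    x = tf.keras.Input(shape=(IMG, IMG))
    h = layers.Dense(64, activation="relu")(layers.Flatten()(x))
    return tf.keras.Model(x, layers.Dense(1)(h))  # real/fake logit

generator, discriminator = make_generator(), make_discriminator()
g_opt, d_opt = tf.keras.optimizers.Adam(1e-4), tf.keras.optimizers.Adam(1e-4)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def train_step(real_images):
    noise = tf.random.normal([real_images.shape[0], LATENT])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        d_loss = bce(tf.ones_like(real_logits), real_logits) + bce(tf.zeros_like(fake_logits), fake_logits)
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)  # generator tries to fool the discriminator
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))

# Stand-in for fully simulated showers: random toy images.
toy_showers = tf.convert_to_tensor(np.random.rand(256, IMG, IMG).astype("float32"))
for _ in range(100):
    train_step(toy_showers)

In a real application the toy images would be replaced by fully simulated showers from a tool such as GEANT.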
Recent progress in high fidelity fast generative models, such as GANs and VAEs, which learn to sample from high-dimensional feature distributions by minimizing an objective that measures the distance between the generated and actual distributions, offers a promising alternative for simulation. A simplified first attempt at using such techniques achieved an orders-of-magnitude increase in simulation speed over existing fast simulation techniques [8], but such generative models have not yet reached the required accuracy, partly due to inherent shortcomings of the methods and the instability in the training of GANs. Developing these techniques for realistic detector models and understanding how to reach the required accuracy is still needed. The fast advancement of such techniques in the ML community makes this a highly promising avenue to pursue.

Orthogonal to reducing the demand on computing resources with fast simulations, machine learning can also contribute to other aspects of the simulation. Event generators have a large number of parameters that can be used to tune various aspects of the simulated events. Performing such tuning over a many-dimensional parameter space is highly non-trivial and may require generating many data samples in the process to test parameter space points. Modern machine learning optimization techniques, such as Bayesian Optimization, allow for global optimization of the generator without detailed knowledge of its internals [9]. Applying such techniques to simulation tuning may further improve the output of the simulations.
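As a toy illustration of such generator tuning, the sketch below uses Bayesian optimization (via scikit-optimize's gp_minimize, assumed available) to adjust two invented "generator" parameters so that a generated distribution matches a reference one; the parameters, ranges and goodness-of-fit measure are placeholders, not those of any actual event generator.

# Illustrative generator tuning with Bayesian optimization (scikit-optimize).
# The "generator" and its two parameters are toy stand-ins, not a real event generator.
import numpy as np
from skopt import gp_minimize

rng = np.random.default_rng(42)
bins = np.linspace(-3.0, 5.0, 41)
reference_data = rng.normal(loc=1.0, scale=0.8, size=20000)  # pretend this is the target distribution
reference_hist, _ = np.histogram(reference_data, bins=bins, density=True)

def toy_generator(shift, width, n_events=20000):
    """Stand-in for an expensive Monte Carlo generator run with two tunable parameters."""
    return rng.normal(loc=shift, scale=width, size=n_events)

def objective(params):
    shift, width = params
    sample_hist, _ = np.histogram(toy_generator(shift, width), bins=bins, density=True)
    # Simple squared distance between the generated and reference histograms.
    return float(np.sum((sample_hist - reference_hist) ** 2))

# Gaussian-process based global optimization over the two tuning parameters.
result = gp_minimize(objective,
                     dimensions=[(-2.0, 3.0),  # allowed range of the "shift" parameter
                                 (0.1, 2.0)],  # allowed range of the "width" parameter
                     n_calls=30, random_state=0)
print("best parameters:", result.x, "objective:", result.fun)

The same pattern extends to higher-dimensional tunes, where each objective evaluation corresponds to generating and comparing a full set of simulated samples.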
3.2 Real Time Analysis and Triggering
The traditional approach to data analysis in particle physics assumes that the interesting events recorded by a detector can be selected in real-time (a process known as triggering) with a reasonable efficiency, and that once selected, these events can be affordably stored and distributed for further selection and analysis at a later point in time. However, the enormous production cross-section and luminosity of the LHC mean that these assumptions break down.1 In particular, there are whole classes of events, for example beauty and charm hadrons or low-mass dark matter signatures, which are so abundant that it is not affordable to store all of the events for later analysis. To exploit the full information the LHC delivers, it will increasingly be necessary to perform more of the data analysis in real-time [10].
1 They may well also break down in other areas of high-energy physics in due course.

This topic is discussed in some detail in the Reconstruction and Software Triggering chapter [11], but it is also an important driver of machine learning applications in HEP. Machine learning methods offer the possibility to offset some of the cost of applying reconstruction algorithms, and may be the only hope of performing the real-time reconstruction that enables real-time analysis in the first place. For example, the CMS experiment uses boosted decision trees in the Level 1 trigger to approximate muon momenta. One of the challenges is the trade-off between algorithm complexity and performance under strict inference time constraints. In another example, the HEP.TrkX project, deep neural networks are trained on large resource platforms and subsequently used for fast inference in online systems. Real-time analysis poses specific challenges to machine learning algorithm design, in particular how to maintain insensitivity to detector performance, which may vary over time. For example, the LHCb experiment uses neural networks for fast fake-track and clone rejection and already employs a fast boosted decision tree for a large part of the event selection in the trigger [12]. It will be important that these approaches maintain performance at higher detector occupancy for the full range of tracks used in physics analyses. Another related application is speeding up the reconstruction of beauty, charm, and other lower-mass hadrons, where traditional track combinatorics and vertexing techniques may become too computationally expensive.

In addition, the increasing event complexity, particularly in the HL-LHC era, will mean that machine learning techniques may also become more important for maintaining or improving the efficiency of traditional triggers. Examples of where ML approaches can be useful are the triggering of electroweak events with low-energy objects; improving jet calibration at a very early stage of reconstruction, allowing jet trigger thresholds to be lowered; or supernovae and proton decay triggering at neutrino experiments.

3.3 Object Reconstruction, Identification, and Calibration
The physical processes of interest in high energy physics experiments occur on time scales too short to be observed directly by particle detectors. For instance, a Higgs boson produced at the LHC will decay within approximately 10^-22 seconds and thus decays essentially at the point of production. However, the decay products of the initial particle, which are observed in the detector, can be used to infer its properties. Better knowledge of the properties (e.g. type, energy, direction) of the decay products permits more accurate reconstruction of the initial physical process. Event reconstruction at large is discussed in [11], which also covers the following
applications of machine learning. Experiments have trained ML algorithms on the features from combined reconstruction algorithms to perform particle identification for decades. In the past decade BDTs have been one of the most popular techniques in this domain. More recently, experiments have focused on extracting better performance with deep neural networks. An active area of research is the application of DNNs to the output of feature extraction in order to perform particle identification and extract particle properties [13]. This is particularly true for calorimeters or time projection chambers (TPCs), where the data can be represented as a 2D or 3D image and the problems can be cast as computer vision tasks, in which neural networks are used to reconstruct images from pixel intensities. These neural networks are adapted for particle physics applications by optimizing network architectures for complex, 3-dimensional detector geometries and training them on suitable signal and background samples derived from data control regions. Applications include identification and measurement of electrons and photons from electromagnetic showers, jet properties including substructure and b-tagging, taus, and missing energy. Promising deep learning architectures for these tasks include convolutional, recurrent and adversarial neural networks. A particularly important application is to Liquid Argon TPCs (LArTPCs), which are the chosen detection technology for the flagship neutrino program.

For tracking detectors, pattern recognition is the most computationally challenging step. In particular, it becomes computationally intractable for the HL-LHC. The hope is that machine learning will provide a solution that scales linearly with LHC collision density. A current effort called HEP.TrkX investigates deep learning algorithms such as long short-term memory (LSTM) networks for track pattern recognition on many-core processors.

3.4 End-To-End Deep Learning
The vast majority of analyses at the LHC use high-level features constructed from particle four-momenta, even when the analyses make use of machine learning. A high-profile example of such variables is the set of seven so-called MELA variables used in the analysis of H → ZZ → 4ℓ final states. While a few analyses, first at the Tevatron and later at the LHC, have used the four-momenta directly, the latter are still high-level relative to the raw data. Approaches based on the four-momenta are closely related to the Matrix Element Method, which is described in the next section. Given recent spectacular advances in image recognition based on the use of raw information, we are led to consider whether there is something to be gained by moving closer to using raw data in LHC analyses. This so-called end-to-end deep learning approach uses low-level data from a detector together with deep learning algorithms [14, 15]. One obvious challenge is that low-level data, for example detector hits, tend to be both high-dimensional and sparse. Therefore, there is interest in also exploring automatic ways to compress raw data in a controlled way that does not necessarily rely on domain knowledge.
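One generic way to compress such sparse, high-dimensional data is an autoencoder, where a network is trained to reproduce its input through a low-dimensional bottleneck and only the bottleneck representation is kept. The following sketch is purely illustrative: Keras/TensorFlow is assumed, and the 16x16 "hit maps", the architecture and the latent size are invented rather than taken from any detector.

# Illustrative convolutional autoencoder compressing sparse toy "hit maps"; not real detector data.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

IMG, LATENT = 16, 8  # toy image size and size of the compressed representation (assumptions)

inputs = tf.keras.Input(shape=(IMG, IMG, 1))
h = layers.Conv2D(8, 3, activation="relu", padding="same")(inputs)
h = layers.MaxPooling2D()(h)
h = layers.Flatten()(h)
latent = layers.Dense(LATENT, activation="relu", name="compressed")(h)  # the compressed representation

h = layers.Dense((IMG // 2) * (IMG // 2) * 8, activation="relu")(latent)
h = layers.Reshape((IMG // 2, IMG // 2, 8))(h)
h = layers.UpSampling2D()(h)
outputs = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(h)

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# Toy sparse hit maps: mostly zeros with a few random "hits" per image.
hits = (np.random.rand(2000, IMG, IMG, 1) > 0.95).astype("float32")
autoencoder.fit(hits, hits, epochs=5, batch_size=64, verbose=0)

# The encoder alone maps each image to LATENT numbers, i.e. a lossy compressed form.
encoder = tf.keras.Model(inputs, latent)
print(encoder.predict(hits[:10], verbose=0).shape)  # (10, LATENT)

How much physics information survives such lossy compression, and whether domain knowledge is needed to preserve it, is exactly the kind of question raised above.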
3.5 Sustainable Matrix Element Method
The Matrix Element (ME) Method [16–19] is a powerful technique which can be utilized for measurements of physical model parameters and direct searches for new phenomena. It has been used extensively by collider experiments at the Tevatron for standard model (SM) measurements and Higgs boson searches [20–25] and at the LHC for measurements in the Higgs and top quark sectors of the SM [26–32]. A few more details on the ME method are given in Appendix A.1.

The ME method has several unique and desirable features; most notably it (1) does not require training data, being an ab initio calculation of event probabilities, (2) incorporates all available kinematic information of a hypothesized process, including all correlations, and (3) has a clear physical meaning in terms of the transition probabilities within the framework of quantum field theory. One drawback of the ME Method is that it has traditionally relied on leading order (LO) matrix elements, although nothing limits the ME method to LO calculations. Techniques that accommodate initial-state QCD radiation within the LO ME framework, using transverse boosting and dedicated transfer functions to integrate over the transverse momentum of initial-state partons, have been developed [33]. Another challenge is the development of the transfer functions, which rely on tediously hand-crafted fits to fully simulated Monte Carlo events.
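As a toy illustration of what constructing a transfer function involves, the sketch below fits a single Gaussian resolution function to a synthetic parton-level versus reconstructed energy response in one energy bin; scipy is assumed, and both the functional form and the pseudo-data are invented. Real transfer functions are typically more elaborate, for example double Gaussians parametrized as functions of energy and detector region.

# Toy transfer-function fit: parametrize the response E_reco - E_parton with a single Gaussian.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

# Synthetic "fully simulated" events: parton energies and smeared reconstructed energies
# with a calorimeter-like resolution growing as sqrt(E) and a small offset.
e_parton = rng.uniform(30.0, 300.0, size=50000)
e_reco = e_parton + rng.normal(loc=-2.0, scale=1.5 * np.sqrt(e_parton))

# Transfer function in one parton-energy bin: the distribution of dE = E_reco - E_parton.
in_bin = (e_parton > 95.0) & (e_parton < 105.0)
delta_e = e_reco[in_bin] - e_parton[in_bin]
counts, edges = np.histogram(delta_e, bins=60, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

(mu_hat, sigma_hat), _ = curve_fit(gaussian, centres, counts, p0=[0.0, 10.0])
print(f"fitted offset = {mu_hat:.2f} GeV, resolution = {sigma_hat:.2f} GeV")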
The most serious difficulty in the ME method that has limited its applicability to searches for beyond-the-SM physics and precision measurements is that it is very computationally intensive. If this limitation is overcome, it would enable more widespread use of ME methods for analysis of LHC data. This could be particularly important for extending the new physics reach of the HL-LHC, which will be dominated by increases in integrated luminosity rather than center-of-mass collision energy. The application of the ME method is computationally challenging for two reasons: (1) it involves high-dimensional integration over a large number of events, signal and background hypotheses, and systematic variations, and (2) it involves sharply-peaked integrands2 over a large domain in phase space. Therefore, despite the attractive features of the ME method and the promise of further optimization and parallelization, the computational burden of the ME technique will continue to limit its range of applicability for practical data analysis without new and innovative approaches.

The primary idea put forward in this section is to utilize modern machine learning techniques to dramatically speed up the numerical evaluations in the ME method and therefore broaden the applicability of the ME method to the benefit of HL-LHC physics. Applying neural networks to numerical integration problems is plausible but not new (see [34–36], for example). The technical challenge is to design a network which is sufficiently rich to encode the complexity of the ME calculation for a given process over the phase space relevant to the signal process. Deep Neural Networks (DNNs) are strong candidates for networks with sufficient complexity to achieve good approximations, possibly in conjunction with smart phase-space mapping such as described in [37]. Promising demonstrations of the power of Boosted Decision Trees [38, 39] and Generative Adversarial Networks [40] for improved Monte Carlo integration can be found in [41]. Once a set of DNNs representing definite integrals is generated to good approximation, evaluation of the ME method calculations via the DNNs will be very fast. These DNNs can be thought of as preserving the essence of ME calculations in a way that allows for fast forward execution. They can enable the ME method to be both nimble and sustainable, neither of which is true today.

The overall strategy is to do the expensive full ME calculations as infrequently as possible, ideally once for DNN training and once more for a final pass before publication, with the DNNs utilized as a good approximation in between. A future analysis flow using the ME method with DNNs might look something like the following. One performs a large number of ME calculations using a traditional numerical integration technique like VEGAS [42, 43] or FOAM [44] on a large CPU resource, ideally exploiting acceleration on many-core devices. The DNN training data are generated from the phase-space sampling performed during the full integration in this initial pass, and the DNNs are trained either in situ or a posteriori. The accuracy of the DNN-based ME calculation can be assessed through this procedure. As the analysis develops and progresses through selection and/or sample changes, systematic treatment, etc., the DNN-based ME calculations are used in place of the time-consuming full ME calculations to make the analysis nimble and to preserve the ME calculations.
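A minimal sketch of the surrogate idea is given below: a small regression network learns to reproduce the output of the expensive per-event ME calculation from event-level inputs, so that later analysis iterations can call the network instead of redoing the integration. Keras/TensorFlow is assumed, and the input features, network size and the stand-in for the full calculation are all invented for illustration.

# Illustrative DNN surrogate for an expensive matrix-element-method calculation.
# full_me_loglik() is a toy stand-in for a full per-event VEGAS/FOAM integration.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

rng = np.random.default_rng(7)

def full_me_loglik(events):
    """Stand-in for the expensive per-event ME-method log-likelihood."""
    return np.log(1e-3 + np.exp(-0.5 * np.sum(events ** 2, axis=1)))

# "Events": a few kinematic inputs per event (e.g. reconstructed four-momentum components).
events = rng.normal(size=(50000, 8)).astype("float32")
targets = full_me_loglik(events).astype("float32")  # computed once, during the initial full pass

inp = tf.keras.Input(shape=(8,))
h = layers.Dense(128, activation="relu")(inp)
h = layers.Dense(128, activation="relu")(h)
out = layers.Dense(1)(h)  # regressed log-likelihood
surrogate = tf.keras.Model(inp, out)
surrogate.compile(optimizer="adam", loss="mse")
surrogate.fit(events, targets, epochs=10, batch_size=256, validation_split=0.1, verbose=0)

# Fast forward execution: the trained network stands in for the full calculation
# during subsequent analysis iterations.
new_events = rng.normal(size=(5, 8)).astype("float32")
print(surrogate.predict(new_events, verbose=0).ravel())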
Before a result using the ME method is published, a final pass using the full ME calculation would likely be performed, both to maximize the numerical precision or sensitivity of the results and to validate the analysis evolution via the DNN-based approximations.

There are several activities which are proposed to further develop the idea of a Sustainable Matrix Element Method. The first is to establish a cross-experiment group interested in developing the ideas presented in this section, along with a common software project for ME calculations in the spirit of [45]. This area is very well-suited for impactful collaboration with computer scientists and those working in machine learning. Using a few test cases (e.g. tt̄ or tt̄h production), evaluations of DNN choices and configurations, the development of methods for DNN training from full ME calculations, and direct comparisons of the integration accuracy between Monte Carlo and DNN-based calculations should be undertaken. More effort should also be placed on developing compelling applications of the ME method for HL-LHC physics. In the longer term, the possibility of Sustainable-Matrix-Element-Method-as-a-Service (SMEMaaS), where shared software and infrastructure could be used through a common API, is proposed.

3.6 Matrix Element Machine Learning Method
The matrix element method is based on the fact that the physics of particle collisions is encoded in the distribution of the particles' four-momenta together with their flavors. As noted in the previous section, the fundamental task is to approximate the left-hand side of Eq. (5) for all (exclusive) final states of interest. In the matrix element method, one proceeds by approximating the right-hand side of Eq. (5). But, since the goal is to compute P_ξ(x|α), and given that billions of fully simulated events will be available, and that the simulations use exactly the same inputs as in the matrix element method, namely, the matrix elements, parton distribution

2 A consequence of imposing energy/momentum conservation in the processes.