This thesis presents measurements of SM processes and searches for BSM physics in events with high-momentum leptons collected with the CMS detector at the LHC. These studies have been performed in proton-proton collisions at $\sqrt{s} = 13$ TeV collected during Run 2 of the LHC.
Measurements of the production cross-section of a top quark in association with a W boson have been performed. An inclusive cross-section of $\sigma_{{\ensuremath{\PQt{}\PW}\xspace}} = 63.1 \pm 1.8 \mathrm{(stat)} \pm 6.3 \mathrm{(syst)} \pm 2.1 \mathrm{(lumi)}\ \mathrm{pb}$ has been measured, consistent with the SM prediction. Differential measurements of this process have also been performed, and found to be consistent with the SM within the uncertainties.
Searches for BSM physics have been performed in events with two opposite-sign same-flavor leptons and large momentum imbalance. This final state is motivated by several SUSY signatures, for which specific signal regions are built. No significant excess of data above the SM prediction is observed in these signal regions. The results obtained allow upper limits to be set on sparticle production in the context of simplified SUSY models.
Finally, a measurement of the production of top quarks in association with a Higgs boson is performed in the multilepton channel. This channel targets the decays of the Higgs boson into WW, ZZ and $\PGt\PGt$ pairs. Regions enriched in signal and in the various backgrounds are constructed by classifying events according to the kinematic differences between the two. Signal strengths for ttH and tH production are found to be consistent with the SM expectations. The results allow ttH production to be observed with an observed (expected) significance of 5.3 (5.4) standard deviations. The results are also interpreted in the context of Higgs coupling modifiers, constraining $\kappa_{\PQt}$ to lie in either of the intervals $-1.1 < \kappa_{\PQt} < -0.7$ or $0.9 < \kappa_{\PQt} < 1.1$ at 95% confidence level, assuming the rest of the couplings to be those of the SM.
The Standard Model (SM) of particle physics is the current scientific paradigm, which has provided successful predictions for all the phenomena observed in particle physics experiments to date. Formulated during the 1970s, it predicts a variety of fundamental particles and interactions that have been observed in collider experiments, and whose properties have been accurately measured. The construction of the SM culminated in 2012 with the discovery of the Higgs boson by the CMS and ATLAS Collaborations at the CERN LHC.
The LHC is the largest and most powerful particle collider built to date, able to accelerate protons up to almost the speed of light and collide them. These collisions are studied by several experiments that measure the particles emerging from them.
Run 1 of the LHC allowed the experiments to observe the Higgs boson and measure some of its couplings to other fundamental particles; many advances were also made in measurements of the properties of the SM, for instance in precision physics in the top quark sector. It also allowed constraints to be imposed on some of the most straightforward models of new physics.
With an increase in center-of-mass energy and in luminosity, Run 2 of the LHC allows processes with even lower cross-sections to be studied with higher precision. Results obtained during Run 2 are establishing the validity of the SM at a higher energy scale, providing evidence, for example, for the interaction of the Higgs boson with top and bottom quarks and with the $\PGt$ lepton. Severe constraints were also placed on many new physics models, putting natural SUSY under question by excluding top squark masses up to 1 TeV in several models.
This thesis was carried out during this period, and focuses on the study of processes with at least two leptons in the final state, in proton-proton collisions recorded by the CMS detector. This final state provides a signature that points to the presence of interesting processes in collisions at a hadron collider, and is relatively easy to trigger on. This thesis has aimed to exploit the physics potential of this final state, which allows a broad range of topics to be covered. Following the methodology of experimental particle physics, this thesis is supported by publications based on the dataset collected during 2016. The results described in the thesis are the natural continuation of these measurements with the full dataset, and publications associated with them will follow.
The top quark is the heaviest particle in the SM, and the only quark that decays before hadronizing. Its study therefore allows perturbative QCD to be tested at the TeV energy scale. Additionally, because of its high mass, a consequence of its large coupling to the Higgs boson, the top quark plays an important role in many BSM models. The main top quark production channel at the LHC is the production of top quark-antiquark pairs. During my thesis I contributed to measurements in this channel at a center-of-mass energy of 13 TeV. Before the start of the thesis, I had also contributed to measurements at $\sqrt{s} = $ 8 and 13 TeV. The main background to this process is the associated production of a top quark and a W boson, $\PQt{}\PW$. During the thesis, I have performed inclusive and differential cross-section measurements of this process.
Run 2 of the LHC also allows the coupling of the top quark to the Higgs boson to be studied at tree level for the first time. During this thesis, I have performed measurements of $\PQt{}\PAQt{}\PH$ production in the multilepton channel, which is the most sensitive channel for this process with the luminosity collected to date. The measurement performed with data collected during 2016 provided evidence for this process in the multilepton channel alone. The statistical combination of this analysis with other decay modes allowed this process to be observed for the first time. All the results obtained are consistent with the SM expectation, including the analysis performed with data collected in 2017. The analysis reported in this thesis studies the complete Run 2 dataset, and will allow this process to be observed in the multilepton channel alone, for which a publication will follow.
The third part of the thesis is a search for SUSY in events with two opposite-sign same-flavor leptons and large momentum imbalance. This signature allows searches for the production of colored spartners, electroweak spartners and sleptons. I contributed to these searches performed with data collected in 2015 and 2016. The results obtained in the latter measurements were subsequently combined. The analysis described in this thesis is performed with the data collected in 2016, 2017 and 2018, for which a publication will also follow.
Some of the analyses described in this thesis have been performed in collaboration with other institutions in the CMS Collaboration. The personal contributions of the author are highlighted in every chapter.
The thesis is organized as follows. Chapter [chap:theory] provides a brief review of the theoretical state of the art of particle physics, including a description of the SM and possible extensions. In chapter [chap:cms] the LHC and the CMS detector are described, together with the techniques used for the analysis of the collisions. These two chapters do not contain original work by the author, but aim to contextualize the research work and describe the techniques used. Chapter [chap:muon] describes the lepton reconstruction used in this thesis, including studies for a precise characterization of its efficiency. Chapter [chap:topphysics] describes the $\PQt{}\PW$ measurements performed in this thesis. In chapter [chap:susy], the searches for supersymmetry with two opposite-sign same-flavor leptons in the final state are described. Finally, chapter [chap:ttH] covers the $\PQt{}\PAQt{}\PH$ measurement in the multilepton channel. The thesis closes with a summary and conclusions.
The universe as we know it is composed of elementary particles that interact among themselves following a set of natural rules. Our current understanding is that these elementary particles correspond to excited states of quantum fields, described mathematically by quantum field theories (QFTs). The SM of particle physics is a renormalizable, gauge-invariant QFT that encompasses all the particles and interactions observed to date, providing accurate predictions for all the phenomena observed in particle physics, spanning many orders of magnitude. As already mentioned, the particles that constitute matter, and their interactions, are represented by fields in this QFT.
Interactions are mediated by integer-spin particles, called bosons, which arise when requiring the theory to be invariant under certain local gauge transformations. Gluons mediate the strong force, while the $\PW^{\pm}$ and $\PZ$ bosons and the photon mediate the electroweak interaction. Gravity is not described by the SM, but it is negligible at the energy scales experimentally accessible to particle physics experiments.
Matter particles have spin 1/2 and are referred to as fermions. Leptons and neutrinos do not interact with gluons, while quarks do. There are two types of quarks: up-type quarks, with electric charge +2/3, and down-type quarks, with electric charge −1/3, in units of the elementary charge. Three generations of leptons, neutrinos and quarks exist, all generations having identical quantum numbers but different masses.
The propagation and interactions of fields in a QFT are fully determined by a Lagrangian density, $\mathcal{L}(\psi_i, \partial_\mu \psi_i, x^\mu)$, that depends on the fields $\psi_i$, their derivatives $\partial_\mu \psi_i$, and the spacetime coordinates $x^\mu = (t, x, y, z)$.
Since this is a relativistic theory, fields must be representations of the Lorentz group. Fermion fields are represented by spinors, while mediator particles are represented by vectors. The Higgs boson, introduced in a following section, is represented by a scalar field.
The other ingredient of the SM are the local gauge symmetries. By Noether's theorem, these symmetries enforce the conservation of the charges associated with the interactions. Requiring the theory to be invariant under these transformations demands the introduction of additional vector fields, which correspond to the gauge bosons that mediate the interactions.
Two interactions are present in the SM: the electroweak interaction and the strong interaction. Each of the two is represented by a different local gauge symmetry. Quantum chromodynamics (QCD) is built by requiring the theory to be invariant under the 3-dimensional special unitary group $SU(3)_C$, while the theory of electroweak interactions is invariant under $SU(2)_L \times U(1)_Y$.
QCD is the theory in the SM describing the strong interaction. It is a gauge theory symmetric under the 3-dimensional special unitary group, $SU(3)_C$. The conserved charge associated with this symmetry is the color charge. Only quarks carry color charge; quark fields therefore transform under the fundamental representation of this group, while the rest of the fermions transform trivially. The $SU(3)_C$ group is non-abelian and has 8 generators, $T_a$, proportional to the Gell-Mann matrices, which are associated with the 8 gluons. The Lagrangian is written as
$$\mathcal{L}_{QCD} = \bar{\psi} \left( i\gamma^{\mu}\partial_{\mu} - g_s \gamma^{\mu} T_a G^a_{\mu} - m \right) \psi - \frac{1}{4} G^{a}_{\mu\nu}G_{a}^{\mu\nu},$$
with the field strength tensor $G^{a}_{\mu\nu} = \partial_{\mu}G^{a}_{\nu} - \partial_{\nu}G^{a}_{\mu} + g_s f^{abc}G^{b}_{\mu}G^{c}_{\nu}$, where $f^{abc}$ are the structure constants of the SU(3) group. The $a$, $b$ and $c$ indices run over the 8 kinds of gluons. The third term in $G^{a}_{\mu\nu}$ must be introduced because the SU(3) group is non-abelian, and adds self-interaction terms for the gluons, so that they couple to each other. $g_s$ is the coupling constant of the strong interaction, mostly referred to in this thesis through $\alpha_s = \frac{g_s^2}{4\pi}$.
The non-abelian structure of the $SU(3)_C$ group profoundly conditions the phenomenology of QCD. At one loop, the renormalization group equations predict a scaling of $\alpha_s$ of the form
$$\frac{d\alpha_s}{dQ^2} = -\frac{\alpha_s^2}{12\pi Q^2}(33-2n_f),$$
where $n_f$ is the number of quark flavors and $Q$ the scale of the interaction. Since $33 - 2n_f > 0$ for the six flavors of the SM, $\alpha_s$ always decreases as a function of $Q^2$. This means that the strong interaction is stronger at low energy scales, with the consequence that colored particles form bound colorless states, named hadrons. This phenomenon is called color confinement, and implies that quarks and gluons are not observed as free particles. The exception is the top quark which, due to its high mass, decays before hadronizing. At higher energies the opposite phenomenon, asymptotic freedom, occurs: the strong interaction becomes increasingly weaker.
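As an illustration, the one-loop equation above has a closed-form solution that can be evaluated numerically. The following minimal sketch (taking the reference value $\alpha_s(m_{\PZ}) \approx 0.118$ and $n_f = 5$ active flavors as illustrative inputs) shows the coupling weakening as the scale increases:

```python
import math

def alpha_s(Q, alpha_ref=0.118, mu=91.19, nf=5):
    """One-loop running of the strong coupling.

    Closed-form solution of d(alpha_s)/d(ln Q^2) = -b0 * alpha_s^2,
    anchored at alpha_s(mu) = alpha_ref. The inputs (alpha_s(mZ) ~ 0.118,
    nf = 5 active flavors) are illustrative reference values.
    """
    b0 = (33 - 2 * nf) / (12 * math.pi)
    return alpha_ref / (1 + alpha_ref * b0 * math.log(Q**2 / mu**2))

# Asymptotic freedom: the coupling weakens as the scale increases
for Q in (10.0, 91.19, 1000.0):
    print(f"alpha_s({Q:7.2f} GeV) = {alpha_s(Q):.4f}")
```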
The description of the electroweak interaction in the SM was introduced by Glashow, Weinberg and Salam. Since left- and right-handed particles couple differently through the weak interaction, the projections of the matter fields onto their respective chiral components are considered:
$$\psi_L = \frac{1}{2}\left(1-\gamma^5\right)\psi,\ \ \ \psi_R = \frac{1}{2}\left(1+\gamma^5\right)\psi.$$
The electroweak theory is introduced by imposing local gauge invariance under the $SU(2)_L \times U(1)_Y$ group. The first component of the group acts only on the left-handed components of fermions, hence these are represented as doublets, while the right-handed components are singlets:
$$L =\binom{u_L}{d_L}, \ \ u_R, \ \ d_R.$$
In the expression above only the representation for up and down quarks is written for simplicity, but a similar structure is followed for the other quarks and leptons, with the exception of neutrinos, which do not have a right-handed component. This gauge group introduces three bosons $W^i$, associated with the generators of the $SU(2)_L$ group, the Pauli matrices $\sigma_i$. The conserved charge associated with the $SU(2)_L$ group is called weak isospin. For up-type left-handed fermions this quantity is $+\frac{1}{2}$, and for down-type ones it is $-\frac{1}{2}$. Right-handed fermions, being singlets of $SU(2)_L$, have zero weak isospin.
Similarly to the strong interaction, the $SU(2)_L$-invariant Lagrangian can be written as
$$\mathcal{L} = -\frac{1}{4}W^i_{\mu\nu}W_i^{\mu\nu} + i\bar{u_R}\gamma^{\mu}\partial_{\mu} u_R + i\bar{d_R}\gamma^{\mu}\partial_{\mu} d_R + i\bar{L}\gamma^{\mu}D_{\mu} L,$$
where $W^i_{\mu\nu} = \partial_{\mu}W^i_{\nu} - \partial_{\nu}W^i_{\mu} + g_W \varepsilon^{ijk}W^j_{\mu}W^k_{\nu}$ and $D_{\mu} = \partial_{\mu} + i g_{W} \frac{\sigma_i}{2}W^i_{\mu}$.
It should be noted that, by imposing invariance under $SU(2)_L$, this theory does not allow mass terms in the Lagrangian, as $m\bar{\psi}\psi = m (\bar{\psi}_R\psi_L + \bar{\psi}_L\psi_R)$ is not invariant under $SU(2)_L$.
Inspired by the electromagnetic theory, and to explain the different couplings of the $\PW$ and $\PZ$ bosons, an additional U(1) group is introduced, with Lagrangian density
$$\mathcal{L} = \sum_{\psi =u_R, d_R,L} \bar{\psi}\left( i \gamma_{\mu}\partial^{\mu} - Y g_Y B_{\mu} \right) \psi -\frac{1}{4} B^{\mu\nu} B_{\mu\nu}.$$
The conserved charge of this symmetry is the hypercharge, $Y$, which is related to the third component of the weak isospin and the electric charge: $Y = 2(Q - I_3)$. The four degrees of freedom introduced by the $SU(2)_L \times U(1)_Y$ local gauge symmetry can be related to the physical mass eigenstates corresponding to the $\PW^{\pm}$ and $\PZ$ bosons and the photon:
$$\PW^{\pm} = \frac{1}{\sqrt{2}} \left(W^1 \mp iW^2\right)$$
$$\begin{pmatrix} \PZ \\ \PGg \end{pmatrix} = \begin{pmatrix} \cos\theta_{\PW} & -\sin\theta_{\PW}\\ \sin\theta_{\PW} & \cos\theta_{\PW}\end{pmatrix} \begin{pmatrix}
W^0 \\
B
\end{pmatrix},$$
where $\theta_{\PW}$ is the Weinberg angle, which depends only on $g_W$ and $g_Y$.
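Explicitly, the mixing angle and the electric charge are related to the two gauge couplings through the standard relations
$$\cos\theta_{\PW} = \frac{g_W}{\sqrt{g_W^2 + g_Y^2}}, \qquad e = g_W \sin\theta_{\PW} = g_Y \cos\theta_{\PW},$$
so that measurements of the electroweak boson couplings determine both $g_W$ and $g_Y$.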
As mentioned in the previous section, imposing invariance under local gauge transformations of the $SU(3)_C \times SU(2)_L \times U(1)_Y$ group does not allow for a theory with massive fermions. Similarly, massive gauge bosons would also break this symmetry. This contradicts the experimental evidence that the weak gauge bosons and all fermions are massive.
Masses can be introduced in the theory by means of the Brout-Englert-Higgs mechanism, which describes the spontaneous breaking of the electroweak symmetry. An additional $SU(2)_L$ doublet of complex scalar fields, $H$, is introduced in the theory. This field propagates as a usual scalar field under a given potential,
$$\mathcal{L}_{\PH} = (D_{\mu} H)^{\dagger} (D^{\mu} H) - \mu^2 H^{\dagger}H - \lambda (H^{\dagger}H)^2.$$
A Yukawa interaction term between the fermions and the scalar doublet is also added:
$$\mathcal{L}_{\mathrm{Yukawa}} = -y_{d}\left( \bar{L} H d_R + \bar{d}_R H^{\dagger} L \right) -y_{u}\left( \bar{L} \tilde{H} u_R + \bar{u}_R \tilde{H}^{\dagger} L \right), \quad \tilde{H} = i\sigma_2 H^{*}.$$
It should be taken into account that for leptons only the first term is present, since there are no right-handed neutrinos in the SM. Once these terms are added, the theory is still invariant under the local $SU(2)_L \times U(1)_Y$ symmetry. However, assuming $\mu^2 < 0$, the potential has a set of degenerate minima, and $H$ may acquire a vacuum expectation value, $v$, breaking the symmetry. The unitary gauge $H = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ v + \PH(x) \end{pmatrix}$, with $v = \sqrt{-\mu^2/\lambda}$, can be assumed, where $\PH(x)$ is the remaining degree of freedom after the symmetry breaking. This degree of freedom corresponds to a scalar particle, the Higgs boson. With this, terms appear that give rise to the masses of the $\PW^{\pm}$, $\PZ$ and Higgs bosons, which take the following values:
$$\begin{aligned}
m_{\PW} &= \frac{g_W v}{2}, \nonumber \\
m_{\PZ} &= \frac{v\sqrt{g_{\PW}^2 + g_{Y}^2 }}{2}, \\
m_{\PH} &= \sqrt{2\lambda}v. \nonumber
\end{aligned}$$
The Yukawa couplings also introduce mass terms for the fermions, which are equal to $m_{\psi} = \frac{y_{\psi}v}{\sqrt{2}}$.
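As a numerical illustration, with a vacuum expectation value $v \approx 246\,\text{GeV}$ and a top quark mass $m_{\PQt} \approx 173\,\text{GeV}$,
$$y_{\PQt} = \frac{\sqrt{2}\, m_{\PQt}}{v} \approx \frac{\sqrt{2} \times 173\,\text{GeV}}{246\,\text{GeV}} \approx 0.99,$$
so the top quark Yukawa coupling is close to unity, which underlies the special role of the top quark mentioned throughout this thesis.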
While the SM contains three generations of fermions, the construction considered so far takes into account only one. It can be expanded by considering families of quarks $Q_{Li}$, $u_{Ri}$ and $d_{Ri}$, where $i$ is an index running over the three generations. The Yukawa interaction for quarks may then be written as
$$\mathcal{L}_{\mathrm{Yukawa}} = -Y_{ij}^d \bar{Q_{Li}}H d_{Rj} - Y_{ij}^u \bar{Q_{Li}}\epsilon H^* u_{Rj} ,$$
where the Yukawa couplings, $Y^{u,d}$, become $3 \times 3$ complex matrices, and $\epsilon$ is the $2 \times 2$ antisymmetric tensor. The indices $i$ and $j$ run over the three generations. When the Higgs field acquires a vacuum expectation value, the fermions still acquire mass terms. However, since the $Y^{u,d}$ matrices are not diagonal in general, the mass eigenstates no longer correspond to the gauge eigenstates. The former can be recovered by diagonalizing the $Y^{u,d}$ matrices. As a result, the interaction of the $\PW$ bosons does not occur only among fermions of the same generation. Instead, couplings between different generations exist, proportional to the elements of the Cabibbo-Kobayashi-Maskawa (CKM) matrix:
$$V_{CKM} = \begin{pmatrix}
V_{\PQu\PQd} & V_{\PQu\PQs} & V_{\PQu\PQb} \\
V_{\PQc\PQd} & V_{\PQc\PQs} & V_{\PQc\PQb} \\
V_{\PQt\PQd} & V_{\PQt\PQs} & V_{\PQt\PQb} \\
\end{pmatrix}.$$
The elements of the CKM matrix are obtained from the diagonalization of the $Y^{u,d}$ matrices, and are free parameters of the theory, to be measured experimentally. This matrix must be unitary by construction and, therefore, its elements are not independent. Measurements of these elements, together with the six resulting unitarity constraints, are used to experimentally test the consistency of the SM.
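For instance, the orthogonality of the first and third columns yields the relation
$$V_{\PQu\PQd}V_{\PQu\PQb}^{*} + V_{\PQc\PQd}V_{\PQc\PQb}^{*} + V_{\PQt\PQd}V_{\PQt\PQb}^{*} = 0,$$
commonly represented as a triangle in the complex plane, the so-called unitarity triangle.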
To conclude, all the matter particles, with their corresponding quantum numbers and measured masses, are shown in table [tab:smparticles]. The mass eigenstates of the gauge bosons are the following: the photon and the gluons are massless, while the $\PW^{\pm}$ and $\PZ$ bosons have measured masses of 80.385 ± 0.015 $\,\text{Ge\hspace{-.08em}V}$ and 91.1876 ± 0.0021 $\,\text{Ge\hspace{-.08em}V}$, respectively.
The theory described in the previous sections provides a complete description of all the phenomena observed so far in particle physics. The particles it describes have been experimentally observed and measured. The construction of the SM was completed when the Higgs boson was observed by ATLAS and CMS. Despite this striking success, there are a few open questions for which the SM does not yet have a complete explanation. In this section, some of these points are discussed. Additionally, the supersymmetric extensions of the SM are described.
The SM does not include a unified description of gravity as a quantum field theory. However, the contribution of gravity to processes at the accessible energy scales is negligible: effects due to gravity are only assured to be relevant above the Planck scale, some 16 orders of magnitude higher than the TeV scale. Attempts exist, however, to introduce gravity into the theory by means of a spin-2 mediator, the graviton.
The quantum corrections to the Higgs boson mass illustrate the hierarchy problem of the SM. Loop contributions from fermions and bosons to the Higgs propagator, like the ones depicted in figure [fig:smhiggsradiativecorr], can be significant. These radiative corrections are dominated by the top quark, the particle with the highest mass and therefore the largest coupling to the Higgs boson. This implies corrections to the mass of the order of
$$\Delta m_H^2 \propto \frac{y_t^2}{8\pi^2} \Lambda_{UV}^2$$
where $\Lambda_{UV}$ is an ultraviolet cut-off used to regulate the loop integral. Within the SM, the scale of $\Lambda_{UV}$ can be as large as the Planck scale. Therefore, in the absence of contributions other than gravity, these corrections can be very large. Other particles can contribute to the quantum corrections of the Higgs boson mass and compensate those of the top quark. Indeed, corrections from bosons have the opposite sign to those from fermions; however, their masses and couplings differ, so a complete cancellation is highly unexpected.
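As a rough order-of-magnitude illustration, taking $y_{\PQt} \approx 1$ and $\Lambda_{UV}$ at the Planck scale,
$$\Delta m_{\PH}^2 \sim \frac{y_{\PQt}^2}{8\pi^2}\left(10^{19}\,\text{GeV}\right)^2 \sim 10^{36}\,\text{GeV}^2,$$
more than 30 orders of magnitude above the observed $m_{\PH}^2 \approx (125\,\text{GeV})^2 \approx 1.6\times10^{4}\,\text{GeV}^2$, so an extreme fine-tuning of the bare mass parameter would be required.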
In the formulation shown in the previous section, neutrinos do not acquire mass through the Brout-Englert-Higgs mechanism. However, observations of neutrino mixing confirm that they are indeed massive particles. Their masses are currently unknown, as is the mechanism through which they acquire them.
The rotational curves of galaxies cannot be explained by the presence of the observed baryonic matter alone. This hints at the presence of an unknown type of matter not included in the current formulation of the SM, whose nature is unknown. Supporting evidence comes from observations of the cosmic microwave background.
Supersymmetric theories are introduced to solve some of the open issues of the SM, such as the hierarchy problem and the nature of the dark matter constituents. These theories propose the existence of a new symmetry, supersymmetry (SUSY), that relates fermions and bosons:
$$Q\ket{\mathrm{fermion}} = \ket{\mathrm{boson}}, \ \ \ \ Q\ket{\mathrm{boson}} = \ket{\mathrm{fermion}},$$
where $Q$ is the generator of this symmetry. In supersymmetric theories, particles are represented as multiplets of the associated algebra, named supermultiplets. Each supermultiplet contains the same number of fermionic and bosonic degrees of freedom, and therefore each particle has an associated superpartner. If the particle is a fermion, the superpartner must be a boson, and vice versa.
Because of this property, the existence of SUSY predicts a set of new particles that are not included in the SM. These particles have the same quantum numbers as their partners, except for their spin. Gauge bosons are associated with spin-1/2 particles, named gauginos and, conversely, fermions are associated with scalar particles, named sfermions. These new particles are usually referred to as sparticles. The Minimal Supersymmetric Standard Model (MSSM) includes only the particles needed for a consistent theory containing the current particle content observed in the SM. Besides the spartners of the SM particles, an additional Higgs doublet must be added. One of the Higgs doublets gives mass to the up-type quarks, while the other gives mass to the down-type quarks and the charged leptons. In total, three neutral and two charged Higgs bosons are present in the theory, together with the corresponding sparticles.
The nomenclature followed to denote sparticles is the following: sparticles are denoted with the name of their partner with a tilde. In this way, the superpartner of the right-handed electron, $\PeR$, is denoted as $\tilde{\Pe}_{\mathrm{R}}$.
In many supersymmetric theories the spartners of the electroweak gauge bosons and the Higgs bosons do not correspond to the mass eigenstates. Instead, the mass eigenstates correspond to mixtures of these gauginos and higgsinos. The charged mass eigenstates are named charginos, $\PSGcpmDo$ and $\PSGcpmDt$, while the neutral ones are named neutralinos, $\PSGczDo$, $\PSGczDt$, $\PSGczTh$ and $\PSGczFo$.
As anticipated, the introduction of these new particles compensates the quantum corrections to the Higgs boson propagator described in the previous section, as shown in the diagrams of figure [fig:smhiggsradiativecorr]. If SUSY were an unbroken symmetry, since the quantum numbers of the sparticles are the same as those of the SM particles, this cancellation would be exact, as sparticles would have the same masses as their partners.
However, such particles have not been observed. Therefore, if SUSY is realized in nature, it must be broken at a certain scale. The corrections to the Higgs boson mass are then not canceled completely; however, the effective $\Lambda_{UV}$ scale lies at the SUSY-breaking scale rather than at the Planck mass. The size of the corrections to the Higgs propagator would be acceptable if this scale were slightly above the electroweak scale. Because of this, it is very appealing to perform searches for SUSY particles in the current experiments at the LHC.
The phenomenology of a supersymmetric theory is conditioned by the masses of the superpartners, which are set by the mechanism through which the symmetry is broken. Several mechanisms have been proposed for this breaking but, in practice, not many constraints can be derived from them.
Additionally, the presence of SUSY particles introduces new interactions. In particular, the decay of the proton could be mediated by the superpartners of the right-handed strange or bottom quarks, contradicting the observed limits on the proton lifetime. This is typically handled by requiring the conservation of R-parity, a quantum number defined as +1 for SM particles and −1 for sparticles. Besides protecting the proton from promptly decaying, this has phenomenological consequences. First, since this number must be conserved, sparticles can only be produced in pairs at colliders, since the initial-state particles are SM particles. Second, R-parity-conserving models predict the existence of a stable sparticle, the lightest supersymmetric particle (LSP), since its decay into sparticles would be kinematically forbidden and its decay into SM particles would violate R-parity conservation. The LSP would be a suitable candidate to be the constituent of dark matter. In many supersymmetric models, the phenomenology of SUSY processes is set by the characteristics of the LSP and the next-to-lightest supersymmetric particle (NLSP). Not all realistic models assume R-parity conservation, however, and in some of them a certain degree of violation is allowed.
Current searches for SUSY at the LHC make use of simplified models, in which the production of a certain kind of sparticle and a specific decay mode with 100% branching fraction are assumed. These models may not be realistic, but they are nevertheless useful to define a framework to guide searches and set limits on the production of sparticles. To date, sparticles have not been detected. Instead, tight constraints are placed by the experiments on their production. For instance, in the most straightforward models, stop masses below the TeV scale are excluded, questioning the existence of natural SUSY at the electroweak scale.
Many of the most important experimental tests of the SM are performed at collider experiments. Colliders are also used in many cases with the aim of producing BSM particles, not accessible at lower energy scales. Because of this, and because this thesis is centered on one of the LHC experiments, the rudiments used to draw testable predictions from the SM at hadron colliders are described in this section. These techniques can also be used to produce predictions under the assumptions of BSM theories, provided they can be computed in perturbation theory.
Any observable in a hadron collider can be computed from an inclusive or differential cross-section. Calculations of these cross-sections are usually divided into two parts: the short-distance interactions, which can be computed analytically using perturbation theory, and the long-distance interactions. Perturbation theory cannot be used for the latter, since the strong coupling becomes too large, and a dedicated treatment is needed. Cross-sections for the production of a given final state in a proton-proton ($\Pp\Pp$) collision can be computed at fixed order in perturbation theory as
$$\sigma = \sum_{i,j}\int dx_i\, dx_j\, f_i(x_i, \mu_F)\, f_j(x_j, \mu_F)\, \hat{\sigma}_{ij}(x_i x_j s; \mu_R, \mu_F).$$
The sum runs over the possible parton species in the two protons, for which all momentum configurations are considered, parameterized by the Bjorken variables $x_i = p_i/p$, where $p_i$ is the momentum of the parton and $p$ that of the proton.
$\hat{\sigma}_{ij}$ is the partonic cross-section, which can be computed at fixed order using the Feynman rules obtained from the Lagrangian densities described in the previous sections. The renormalization scale, $\mu_R$, must be introduced in this calculation in order to cure the divergences in the loops of the Feynman diagrams. This scale enters through the running of the coupling constants, which become functions of it.
$f_i$ are the parton distribution functions (PDFs), which represent the probability of finding a parton of the proton with a given momentum fraction $x$. Their introduction brings in another scale, the factorization scale $\mu_F$, which regulates the separation between the soft and the hard interaction. The PDFs parameterize the soft interactions occurring in the initial-state protons and therefore cannot be calculated from first principles with the current formulation of QCD. Instead, they have been measured in deep inelastic scattering and other experiments at certain scales. PDFs can then be evolved to the scale under study using the Dokshitzer-Gribov-Lipatov-Altarelli-Parisi (DGLAP) equations.
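The structure of the factorization formula can be made concrete with a schematic numerical sketch. The PDF shape and the partonic cross-section below are hypothetical placeholders (not fits to data), and the integration is a simple Riemann sum:

```python
import numpy as np

def toy_pdf(x):
    """Hypothetical toy PDF shape, f(x) ~ x^-0.5 (1-x)^3 (not a real fit)."""
    return x**-0.5 * (1 - x)**3

def toy_partonic_xsec(shat):
    """Hypothetical partonic cross-section, falling as 1/s-hat (arbitrary units)."""
    return 1.0 / shat

def hadronic_xsec(sqrt_s=13000.0, n=400):
    """Convolve two toy PDFs with the toy partonic cross-section:
    sigma = sum_ij int dx_i dx_j f(x_i) f(x_j) sigma_ij(x_i * x_j * s)."""
    s = sqrt_s**2
    x = np.linspace(1e-4, 1 - 1e-4, n)
    x1, x2 = np.meshgrid(x, x)
    integrand = toy_pdf(x1) * toy_pdf(x2) * toy_partonic_xsec(x1 * x2 * s)
    dx = x[1] - x[0]
    return integrand.sum() * dx * dx

print(f"toy hadronic cross-section: {hadronic_xsec():.3e} (arbitrary units)")
```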
Fixed-order calculations can give accurate predictions for the partonic final state; however, this does not correspond to the final experimental observables. These particles may further decay or, if they are colored, radiate extra partons. Since this radiation may be soft, and since the matrix elements can only be calculated up to a certain order, phenomenological parton shower models are used to describe it.
Besides parton showers, other effects must also be taken into account. In particular, the remaining proton constituents may interact among each other, leading to secondary interactions. These interactions are known as the underlying event, and are modeled using phenomenological models that take into account perturbative and non-perturbative effects.
When modeling production processes involving third-generation quarks, such as the single top cross-section, two different approaches can be followed to treat initial-state $\PQb$ quarks, called the five-flavor scheme (5FS) and the four-flavor scheme (4FS). In the 5FS the $\PQb$ quark is assumed to be a massless particle and is therefore considered part of the proton PDFs. In the 4FS, the mass of the $\PQb$ quark is retained; it can then only be present in the initial state through gluon splitting, since its mass is larger than that of the proton. This nomenclature of "massive" and "massless" particles is only a convention, referring to the treatment of the $\PQb$ quark in the matrix element calculation.
Both approaches are valid to make sensible predictions. In the 4FS the $\PQb$ quark emission is simulated exactly at fixed order in perturbation theory, while in the 5FS it is generated by the parton shower so, in principle, the former should be more accurate. However, the gluon splitting in the initial state suffers from a collinear divergence that may spoil the convergence of the perturbative series. This divergence is, on the other hand, absorbed into the PDFs in the 5FS.
Parton showers still only predict the radiation of final-state partons. These partons are not the final physical observables, as they still correspond to colored states. Colored particles hadronize to form bound color-neutral states, named hadrons. This is a fully non-perturbative process, occurring below a given energy scale. Phenomenological models, such as those implemented in pythia and herwig, which also model the underlying event, exist for the treatment of these processes.
In practice, some of the calculations mentioned above are automated in the form of event generators. These generators have the advantage that they allow phase space integrations to be performed easily and, additionally, they can deliver predictions for arbitrary observables.
In this thesis, several analyses with different modeling choices are described. Typically, the hard scattering is simulated at LO or NLO accuracy using the powheg or MadGraph5_aMC@NLO generators, with a given choice of PDFs. These calculations are then interfaced to parton shower generators; in particular, pythia8 is always used in this thesis. This generator also models the hadronization and the underlying event, which are tuned to reproduce quantities measured in data.
The measurements and searches described in this thesis have been performed with the CMS detector at the CERN LHC. In this chapter, the LHC facility and the CMS experiment are described in sections [sec:lhc] and [sec:cms]. Section [sec:cmseventreco] describes the reconstruction of the events in the data collected by the detector. Section [sec:cmsdatasets] describes the features of the datasets used in this thesis, and section [sec:cmsdatamccorrections] the corrections applied to simulations in order for them to reliably predict the features observed in data. Finally, section [sec:cmsanaltechniques] is dedicated to the analysis techniques employed throughout this thesis.
The LHC is the largest experimental facility ever built in particle physics. It is a collider that accelerates beams of hadrons up to a center-of-mass energy of $\sqrt{s}=$ 13 TeV and collides them. These collisions take place at four interaction points, where the main experiments, ALICE, ATLAS, CMS and LHCb, are located. These experiments record and analyze the collisions, extracting information on many physical observables. Here only a brief description of the LHC and its functioning is provided; a more detailed description can be found elsewhere.
The LHC profits from an accelerator chain that starts when protons, produced from ionized hydrogen gas, are injected into a Radio Frequency Quadrupole, a cavity that accelerates and groups the protons into beams. Protons are then injected into LINAC 2, a linear accelerator that brings them up to 50 MeV. Next they are injected into the Proton Synchrotron Booster, which accelerates them up to 1.4 $\,\text{Ge\hspace{-.08em}V}$ and groups them into bunches. Protons are then accelerated to an energy of 26 $\,\text{Ge\hspace{-.08em}V}$ in the Proton Synchrotron, where the bunch structure is set. Finally, the Super Proton Synchrotron accelerates the protons to 450 $\,\text{Ge\hspace{-.08em}V}$, the energy at which they are injected into the LHC.
The LHC is a circular collider situated in a 27 km underground tunnel near Geneva (Switzerland). Protons injected into the LHC are increasingly accelerated by a set of superconducting radio frequency cavities situated at one point of the circumference. There are 16 cavities operating at a frequency of 400 MHz, each providing a 2 MV accelerating voltage to the protons. The oscillating voltage is synchronized with the arrival time of the protons in such a way that protons arriving earlier are accelerated less than those arriving later, keeping the bunches compact.
Protons are kept on their circular orbit by bending their trajectories with a set of superconducting Nb-Ti magnets that generate a magnetic field of up to 8.3 T, situated along the circumference. These magnets are operated at a temperature of 1.9 K, achieved with a circuit of superfluid helium. Between the bending magnets, a set of quadrupole magnets focuses the beam in the directions perpendicular to the beam axis, in order to prevent beam losses. Additional magnets further focus the beam and direct it to the collision points.
With this combination of radio frequency cavities and superconducting magnets, the LHC is able to accelerate and collide protons at energies of up to $\sqrt{s}=$ 13 TeV, the largest achieved in a human-made accelerator. The other figure of merit relevant for experimental physicists is the instantaneous luminosity, $\mathcal{L}$, which relates the event rate of a process to its cross-section $\sigma$:
$$\frac{dN}{dt} = \mathcal{L} \sigma.$$
This quantity depends only on the characteristics of the collider, and not on the process considered. It can be approximately computed as
$$\mathcal{L} = \frac{n_b N_p^2 f}{4\pi\sigma_x\sigma_y}R ,$$
where $N_p$ is the number of protons per bunch, $n_b$ the number of colliding bunches, $f$ the beam revolution frequency, $\sigma_{x,y}$ the beam sizes along the transverse directions, and $R$ a geometrical reduction factor that takes into account the crossing angle at the interaction point.
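For illustration, plugging round numbers typical of LHC Run 2 operation into this formula (all values approximate, and the reduction factor assumed) reproduces the order of magnitude of the design luminosity:

```python
import math

# Illustrative, approximate Run 2 machine parameters
n_b = 2556                   # colliding bunch pairs
N_p = 1.15e11                # protons per bunch
f = 11245.0                  # revolution frequency [Hz]
sigma_x = sigma_y = 1.7e-3   # transverse beam sizes at the interaction point [cm]
R = 0.85                     # assumed crossing-angle reduction factor

L_inst = n_b * N_p**2 * f / (4 * math.pi * sigma_x * sigma_y) * R
print(f"L ~ {L_inst:.1e} cm^-2 s^-1")  # of order 1e34 cm^-2 s^-1

# Corresponding rate for a 63 pb process such as tW production
sigma_tW = 63e-12 * 1e-24  # 63 pb in cm^2 (1 b = 1e-24 cm^2)
print(f"tW rate ~ {sigma_tW * L_inst:.2f} events/s")
```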
A quantity associated with the instantaneous luminosity is the integrated luminosity, $L$, usually referred to in this thesis simply as luminosity, defined as the integral over time of the instantaneous luminosity,
$$L = \int \mathcal{L}\, dt.$$
The luminosity integrated over the 2015, 2016 and 2017 data-taking periods by the CMS experiment is shown in figure [fig:lumipileup].
The LHC was designed to deliver an instantaneous luminosity of about $10^{34}\,\mathrm{cm}^{-2}\mathrm{s}^{-1}$, but twice this value was achieved during the running periods in 2017 and 2018. To reach such high luminosities, many protons are collimated in each bunch. Because of this, several simultaneous collisions occur at every bunch crossing. This phenomenon is called pile-up, and can be a limiting factor for the experiments. Figure [fig:lumipileup] shows the distribution of the average number of interactions per bunch crossing. The average number of pile-up interactions was 27, 38 and 37 during the 2016, 2017 and 2018 operation, respectively, corresponding to the data used in this thesis.
The CMS experiment is, together with ATLAS, one of the two general-purpose detectors of the LHC. These detectors aim to record and study a wide variety of physical processes that may occur in $\Pp\Pp$ collisions at the TeV energy scale. In contrast, LHCb and ALICE are focused on the study of $\PQb$ quark physics and heavy ion collisions, respectively. In this section a brief description of the CMS detector, the experimental set-up of this thesis, is provided; a more complete description can be found elsewhere.
The CMS detector is a cylindrical device located around the beam pipe at one of the interaction points. The detector has a length of 21.6 m, a diameter of 14.6 m and weighs 14500 tonnes. It is divided into several components: it features a superconducting solenoid magnet that generates a magnetic field of up to 3.8 T, a very granular tracking detector, an excellent muon detector and hermetic calorimeters. A general sketch of the CMS detector is shown in figure [fig:layoutdetector].
In order to describe the detector components, and also to quantify the physics observables in the measurements of this thesis, a common coordinate system is established. The z axis points in the direction of the beam axis, towards the west. The y axis points vertically upwards, towards the earth surface, while the x axis points to the center of the LHC circumference. It is useful to define the x − y plane, perpendicular to the beam axis, as the transverse plane. The azimuthal angle, ϕ, is measured in the transverse plane, starting from the x axis. The projection of a particle's momentum onto the transverse plane is denoted the transverse momentum, pT. The polar angle, θ, is measured from the z axis. This quantity is related to the pseudorapidity, η, defined as
$$\eta = -\log\tan\left(\frac{\theta}{2}\right).$$
The difference in pseudorapidity between two massless particles is invariant under boosts along the z axis. Finally, angular distances are usually measured as $\Delta R = \sqrt{\Delta\phi^2+\Delta\eta^2}$.
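These definitions translate directly into code; a minimal sketch (the wrapping of $\Delta\phi$ into $[-\pi,\pi]$ is the only subtlety):

```python
import math

def pseudorapidity(theta):
    """Pseudorapidity from the polar angle theta (radians)."""
    return -math.log(math.tan(theta / 2))

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with delta-phi wrapped into [-pi, pi]."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

print(f"{pseudorapidity(math.pi / 2):.3f}")   # 0.000: perpendicular to the beam
print(f"{pseudorapidity(0.0172):.2f}")        # 4.76: very forward
print(f"{delta_r(0.0, 3.1, 0.5, -3.1):.3f}")  # 0.507: small dphi after wrapping
```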
Subdetectors in CMS are usually divided into two parts. The central part, surrounding the beam pipe, is called the barrel. Two endcaps, on the negative and positive z sides, are located at the two ends of the barrel to increase the acceptance.
Finally, in this thesis, mathematical symbols in bold face are used to denote vectors.
The solenoid magnet of the CMS detector is one of its key components. It is a superconducting Nb-Ti magnet that generates a 3.8 T magnetic field inside the solenoid and up to 2 T outside. It is 13 m long with a diameter of 6 m, and is located in the central part of the detector, between the calorimeters and the muon system. The return yoke of the magnet is made of iron and interspersed with the chambers of the muon system. The generated magnetic field bends the trajectories of charged particles emerging from the collision, allowing the transverse momentum of these particles to be inferred from the curvature of their trajectories. The bending increases with the magnetic field, so an intense field is crucial for a precise momentum measurement, particularly for high-momentum particles, for which the bending is smaller.
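The relation between curvature and transverse momentum follows from the Lorentz force; for a particle of unit charge, a convenient numerical form is
$$p_{\mathrm{T}}\,[\mathrm{GeV}] \approx 0.3\, B\,[\mathrm{T}]\, r\,[\mathrm{m}],$$
so, as an illustration, a 100 GeV track in the 3.8 T field has a radius of curvature of roughly 90 m, showing why high-momentum tracks are nearly straight across the tracker volume and why a strong field is needed.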
The tracking system measures the trajectories of charged particles produced in the collisions. It is located in the innermost part of the detector, in order to make a precise measurement of the momentum and impact parameter of the tracks. This allows the vertices corresponding to the several interactions that may occur in the same bunch crossing to be efficiently discriminated. A precise determination of the impact parameter is also crucial to tag $\PQb$ jets, an important observable in SM measurements and searches for new physics.
The tracking system is composed of two parts, the pixel detector and the silicon strip detector. Both are based on semiconductor technology, but satisfy different granularity and size criteria.
The pixel detector is the innermost detector, situated a few centimeters away from the interaction point. Since a higher occupancy is expected in that region, and since a high spatial resolution close to the interaction point is needed for a precise measurement of the impact parameter, the pixel detector is very granular, with a spatial resolution of between 10 and 15 μm. The pixel detector is divided into 3 (4) layers in the barrel and 2 (3) disks in the endcap before 2017 (from 2017 onwards).
The silicon strip system is the second tracking subdetector, located 20 to 116 cm away from the interaction point. The silicon strips are oriented along the z direction in the barrel, and along the radial direction in the endcaps.
The performance of the tracking system is shown in figure [fig:trackerperformance], in which the impact parameter resolution in the transverse and z components is measured.
The aim of the calorimeters is to measure the energy of the particles produced in the collisions. Two calorimeters are present in the CMS detector: the electromagnetic calorimeter (ECAL) and the hadron calorimeter (HCAL). Both are located between the tracking system and the solenoid magnet, with the ECAL being the innermost one. High resolution in the energy measurement and hermeticity are necessary to precisely measure the energy of neutral particles, as well as to provide a good resolution on the total momentum imbalance of the event.
The ECAL is designed for a precise measurement of the energy of electrons and photons. It is a scintillation calorimeter made of PbWO4 crystals, a material suitable for this purpose due to its high density, short radiation length and small Molière radius. The scintillation light is recorded by photodetectors, whose output is read out. The energy resolution is shown in figure [fig:ecalhcalperformance], as measured in $\PZ\to\Pe\Pe$ events.
The HCAL surrounds the ECAL and is designed to measure the energy of hadrons. It is the only element that allows the energy of neutral hadrons to be measured, and it provides a measurement complementary to the tracker for charged hadrons. Since the space between the ECAL and the magnet is limited, part of the HCAL is located outside the magnet solenoid. It is a sampling calorimeter in which brass layers are interspersed with plastic scintillators.
The muon system is located in the outermost part of the CMS detector and is dedicated to the measurement of muons, which are the only particles, besides neutrinos, able to traverse the calorimeters and the solenoid. The detection and identification of muons is crucial for the CMS physics program, as their presence is one of the possible signatures of many interesting processes. Besides identifying muons, the muon system is designed to trigger on their presence and to measure their momentum with relatively high precision.
Three kinds of gaseous muon detectors are used in the CMS detector: the drift tubes (DTs), the cathode strip chambers (CSCs) and the resistive plate chambers (RPCs). The three kinds of detectors have distinctive features and are used in a complementary way.
DTs are located in the barrel of the detector (∣η∣ < 1.2), where the neutron-induced background is small and low muon rates are expected. They are organized in 5 wheels arranged along the z axis. Each wheel consists of four stations interspersed among the layers of the magnet return yoke. Each station is a set of chambers located in a ring of the same radius, able to measure both the z coordinate of the muon trajectory and the r − ϕ bending angle. The outermost station does not measure the z coordinate. DTs are divided into several drift cells filled with gas, and the muon position is inferred by measuring the drift time to an anode wire of the ionization charge produced by the muon.
In the endcap regions of CMS, a larger background contribution is expected; moreover, the magnetic field is larger and less uniform. CSCs are situated in 0.9 < ∣η∣ < 2.4 and consist of four stations in each endcap. CSCs operate as multiwire proportional chambers with finely segmented cathode strips. The strips point in the radial direction, which allows the ϕ coordinate of the muon to be precisely measured by interpolating among the charges induced in the strips.
RPCs are located in ${\ensuremath{|\eta|}\xspace}< 1.9$. They are gaseous parallel-plate detectors featuring a very good timing resolution, better than the minimum 25 ns between consecutive bunch crossings. RPCs complement the DTs and CSCs, providing very good trigger capabilities.
The muon system has in general a very good coverage over the ${\ensuremath{|\eta|}\xspace}<2.4$ range. The regions with ∣η∣ around 0.25 and 0.8, which correspond to the transitions between the wheels, and around 1.2, where the transition between the DTs and CSCs is located, show however a slightly lower efficiency. The excellent performance of the muon system is described in chapter [chap:muon], where the muon reconstruction algorithms are presented.
The collision rate can be as high as 40 MHz. This, together with the complexity of the detector, requires a trigger system that promptly evaluates and selects the collisions that are interesting and must be stored, discarding the rest. The trigger system consists of two logical layers: the Level-1 (L1) trigger and the High Level Trigger (HLT).
The L1 trigger takes decisions based on the detector information obtained from a coarse readout of the calorimeters and the muon system. The tracking detectors are not used at this level, since it is not possible to read out all their information at every bunch crossing. The L1 system allows for a latency of up to 4 μs, during which events are buffered until they are either accepted or discarded.
Several algorithms are implemented in the L1 trigger, which is realized in hardware. First, each subdetector has its own local trigger that promptly reconstructs the energy depositions in the calorimeters and the hits in the muon subdetectors. A two-layer calorimeter trigger then reconstructs electrons, jets and hadronic $\PGt$ candidates from the calorimeter information. In parallel, three track finders reconstruct muons, combining the information from the different muon subdetectors to construct muon candidates. Finally, the information from the reconstructed calorimeter and muon objects is received by a global trigger that takes the final decision.
Most of the bandwidth of the L1 trigger is dedicated to simple topologies, like single and double objects, as well as combinations of objects above certain momentum thresholds. As an example, a combined trigger may require the presence of an electron and a muon. However, trigger algorithms can also compute invariant masses and angular distances between objects, allowing for more complex topologies. The readout of the detector imposes an upper limit of 100 kHz on the acceptance rate of the L1 trigger.
Upon a positive decision of the L1 trigger, the full event information can be read out and reconstructed by the HLT. At the HLT, events are reconstructed using a computing farm of commercial computers. The software used to do so is mainly written in C++ and is integrated with the code used in the offline event reconstruction, with which it sometimes coincides.
Event reconstruction at the HLT is seeded by a positive decision of one or more L1 triggers. One or several algorithms (or paths) are run, each targeting a specific topology and seeded by an L1 decision targeting that topology. Each path performs an increasingly complex object reconstruction, aiming to promptly reject spurious events in order to reduce the computing time per event, and performing a more precise reconstruction for likely candidates. The average processing time of an event depends on the running conditions, such as the pile-up, but is usually around 150 ms, as can be seen in figure [fig:hltperformance]. The bulk of the events are processed quickly, since a complete reconstruction is not needed to take a decision on them, while for a few events in the tails more complex algorithms must be run, which can take up to one second. The HLT accepts a rate of around 1000 Hz, integrated over several hours, which can be stored and further processed.
Event reconstruction of the data aims to identify the collision products and measure their kinematic properties, to ultimately construct the relevant physical observables. The particle-flow (PF) algorithm performs this task by combining the information from all the subdetectors described in the previous section.
The inputs to the PF algorithm are the tracks measured in the tracking system and the energy depositions in the calorimeters, as well as the information from the muon system. Information on the electron and muon reconstruction is also introduced as an input, to exploit the peculiarities of these objects, as described in chapter [chap:muon].
The tracking of charged particles in the inner detectors is performed by means of an iterative tracking algorithm. The algorithm runs several tracking iterations, aiming to increase the efficiency while keeping a very high purity. Hits associated with a track in a given iteration are masked in the following iterations, to avoid them being assigned to a different track. At each iteration, different quality criteria are applied to the track seeds, the track-fit χ2 and the compatibility of the track with originating from one of the reconstructed vertices, adapted to the track pT, ∣η∣ and number of hits. From the reconstructed tracks, the primary vertices corresponding to the several interactions can be identified. In this thesis, the primary vertex with the highest quadratic sum of the pT of its tracks is taken as the vertex of the hard scattering.
Clustering of the energy depositions in the calorimeters is done separately for each subdetector and is seeded by local maxima of the energy depositions. Clusters are then built by aggregating neighboring depositions. This algorithm aims for a high efficiency in detecting low-energy particles and a high spatial resolution.
The PF algorithm gathers these inputs using a linking algorithm that connects them. The connection is made geometrically, either by extrapolating the reconstructed tracks to the calorimeter cells or through the relative positions of the calorimeter deposits. Tracker tracks may also be linked to hits in the muon system. Since particles interact differently with each of the subdetectors, this linkage allows for particle identification. Muons and electrons are built by the association of tracks with segments in the muon system and depositions in the ECAL, respectively, as described in chapter [chap:muon]. Charged hadrons produce a tracker track and depositions in the calorimeters. Photons and neutral hadrons deposit their energy only in the ECAL and HCAL, respectively.
The complete reconstruction of all the particles in the collision, as well as their assignment to a specific primary vertex, allows the presence of invisible particles in the final state to be inferred, such as neutrinos but also BSM particles like the LSP. The missing transverse momentum, ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$, is defined as
$${\ensuremath{{\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}}\xspace}= - \sum{\ensuremath{{\vec p}_{\mathrm{T}}}\xspace}(i),$$
where the sum runs over all the observed particles from the selected primary vertex. The magnitude of ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$ is usually denoted as ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$. Due to momentum conservation, if only visible particles were present in the final state and in the absence of resolution effects, this quantity would be zero. Therefore, nonzero values of ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ hint at the presence of an invisible particle.
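In terms of the reconstructed candidates, the computation is a simple vector sum; a minimal sketch, taking each particle as a $(p_{\mathrm{T}}, \phi)$ pair:

```python
import math

def missing_pt(particles):
    """Vector pT-miss as minus the sum of the visible transverse momenta.

    `particles` is a list of (pt, phi) pairs; returns (met, met_phi).
    """
    px = -sum(pt * math.cos(phi) for pt, phi in particles)
    py = -sum(pt * math.sin(phi) for pt, phi in particles)
    return math.hypot(px, py), math.atan2(py, px)

# Toy event: a single 40 GeV muon and nothing else visible
met, met_phi = missing_pt([(40.0, 0.0)])
print(f"pT-miss = {met:.1f} GeV at phi = {met_phi:.2f}")  # back-to-back with the muon
```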
Several algorithms are used to identify and reject events with spurious ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ due to failures of the reconstruction algorithms or the detectors. These algorithms, or filters, identify several of these sources, such as detector noise in the calorimeters, beam-induced background or poorly reconstructed high-pT muons.
In figure [fig:cmsmetperformance], the ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ resolution during the 2016 data-taking period is shown in $\PZ\to{\PGm}\PGm$ events and compared with simulations, showing reasonably good agreement. The effect of the ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ filters is also shown in a sample of multijet events in the same figure.
In addition, on some occasions the HTmiss variable is considered. It is defined in the same way as ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$, but only objects (jets, leptons and hadronic $\PGt$ candidates) passing specific selection criteria of the analysis are used. This variable is less sensitive to the presence of invisible particles, but is more robust against energy mismeasurements and spurious signals. In the $\PQt{}\PAQt{}\PH$ multilepton analysis described in chapter [chap:ttH], a linear combination of the two is considered, ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$LD, which provides a good compromise between discrimination power and robustness.
Due to the color confinement described in chapter [chap:theory], colored final states are not observable. Instead, quarks and gluons radiate partons that eventually hadronize into hadrons, which are bound, colorless states. The experimental signature of a parton emission is then a spray of particles, named a jet, more or less collimated around a given direction. From a phenomenological point of view, jets may be defined by clustering all the particles in the final state. However, clustering algorithms must be collinear and infrared safe in order to be suitable for comparison with theoretical predictions: the additional emission of soft or collinear partons should not affect the result of the clustering.
In this thesis, jets are defined using the anti-kT algorithm, which is run on all the particles identified by the PF algorithm as coming from the selected primary vertex. Usually a distance parameter of 0.4 is taken; however, in the SUSY search, jets with a distance parameter of 0.8 are also used, to reconstruct the large-radius jets produced in the hadronic decays of boosted $\PW$ and $\PZ$ bosons.
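Concretely, the anti-kT algorithm repeatedly merges the pair of objects with the smallest distance $d_{ij} = \min(p_{\mathrm{T}i}^{-2}, p_{\mathrm{T}j}^{-2})\,\Delta R_{ij}^2/R^2$, promoting an object to a final jet when its beam distance $d_{iB} = p_{\mathrm{T}i}^{-2}$ is the smallest. The following $O(N^3)$ sketch illustrates the logic only (real analyses use FastJet, and the pT-weighted recombination below is a simplification of the standard four-momentum E-scheme):

```python
import math

def delta_r2(a, b):
    """Squared angular distance between two (pt, eta, phi) tuples."""
    dphi = (a[2] - b[2] + math.pi) % (2 * math.pi) - math.pi
    return (a[1] - b[1])**2 + dphi**2

def anti_kt(particles, R=0.4):
    """Cluster (pt, eta, phi) tuples into jets with the anti-kT measure."""
    objs = list(particles)
    jets = []
    while objs:
        # Smallest distance wins: dij for pairs, diB = 1/pt^2 to the beam
        best = min((1.0 / o[0]**2, i, -1) for i, o in enumerate(objs))
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                dij = (min(objs[i][0]**-2, objs[j][0]**-2)
                       * delta_r2(objs[i], objs[j]) / R**2)
                if dij < best[0]:
                    best = (dij, i, j)
        _, i, j = best
        if j == -1:
            jets.append(objs.pop(i))  # closest to the beam: a final jet
        else:
            a, b = objs[i], objs[j]
            pt = a[0] + b[0]
            # Simplified pt-weighted recombination (phi wrapping ignored);
            # the standard choice is to add the four-momenta instead
            merged = (pt, (a[0] * a[1] + b[0] * b[1]) / pt,
                      (a[0] * a[2] + b[0] * b[2]) / pt)
            objs = [o for k, o in enumerate(objs) if k not in (i, j)]
            objs.append(merged)
    return jets

# The hard particle absorbs its nearby soft neighbor; the distant soft
# particle is clustered into a separate jet.
print(anti_kt([(100.0, 0.0, 0.0), (5.0, 0.2, 0.1), (3.0, 2.0, 2.0)]))
```

The hard particle dominates the $d_{ij}$ measure, which is what makes anti-kT jets grow as regular cones around hard seeds.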
In order to reject jets originating from detector noise and misidentified particles, very mild selection criteria are applied to the reconstructed jets before they are used in the analyses. These are simple requirements on the minimum number of constituents, as well as on the minimum hadronic and electromagnetic energy fractions.
It is often interesting to identify the flavor of the parton that produced a jet. The identification of jets produced by $\PQb$ quarks, referred to as $\PQb$ jets, is particularly interesting, since they appear in the decays of top quarks and other massive particles. $\PQb$ jets have distinctive features due to the presence of hadrons containing $\PQb$ quarks. These hadrons usually have a longer lifetime than other light-flavor hadrons. Because of this, $\PQb$ jets usually present a secondary vertex, corresponding to the decay of the $\PQb$ hadron after it has flown a certain distance. Additionally, $\PQb$ quarks have a larger mass than light-flavor quarks and gluons, so the particles in the decay have a larger momentum relative to the jet axis than other constituents. Finally, these decays may also give rise to electrons and muons.
Several algorithms are used in CMS to identify $\PQb$ jets . The field of flavor tagging has significantly evolved during the time this thesis was made. Therefore three different algorithms are employed: the CSVv2 algorithm, the DeepCSV algorithm and the DeepJet algorithm. The three of them are neural networks exploiting similar kinematic features, but with increasing technical complexity, resulting in improved performance. Three working points are established to tag $\PQb$ jets: loose, medium and tight, targeting approximately 10%, 1% and 0.1% light-jet misidentification probabilities, respectively.
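The working-point construction can be illustrated as follows: given the discriminator scores of light-flavor jets in simulation, the thresholds are chosen so that approximately the target fraction of light jets is mistagged. The function and the toy score distribution below are hypothetical.

```python
import numpy as np

def mistag_thresholds(light_jet_scores, targets=(0.10, 0.01, 0.001)):
    """Derive loose/medium/tight thresholds from the discriminator scores
    of light-flavor jets, so that each working point yields approximately
    the target mistag rate."""
    scores = np.sort(np.asarray(light_jet_scores))
    n = len(scores)
    wps = {}
    for name, eps in zip(("loose", "medium", "tight"), targets):
        # threshold such that a fraction eps of light jets lies above it
        idx = min(int((1.0 - eps) * n), n - 1)
        wps[name] = scores[idx]
    return wps

# toy light-jet score distribution, peaked at low values
rng = np.random.default_rng(0)
print(mistag_thresholds(rng.beta(1, 8, size=100_000)))
```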
The performance of the DeepCSV and DeepJet algorithms is shown in figure [fig:cmsbtagperformance] in simulation, compared to the performance in simulation calibrated with the efficiency and mistag rate observed in data for the mentioned working points.
Finally, other discriminators exist that also allow discriminating between jets produced by quarks and by gluons , exploiting features such as the higher constituent multiplicity and the softer fragmentation of gluon jets.
Due to the complexity of the detector, the conditions under which it is run change with time. As two examples, it was already shown in the previous sections that the geometry of the pixel detector was changed in 2017, adding a new tracking layer, and that the pile-up increased during 2017 and 2018. These effects are carefully taken into account when analyzing data, thanks to dedicated corrections or specific simulations. In this section, the most important features of the data taking are described.
As shown in figure [fig:lumipileup], the number of simultaneous interactions per bunch crossing was significantly larger during the 2017 and 2018 data taking. This came along with an increase of the instantaneous and integrated luminosity during those years, which allows exploring even lower cross-section processes. However, it also results in a slightly degraded performance during those years in the measurement of ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$, jet energy and lepton isolation. Additionally, the increase of the instantaneous luminosity also meant a rise in the momentum thresholds of the triggers, in order to keep the rate under control.
The pixel detector was upgraded during 2017. The new pixel detector has an improved readout, with four layers and three disks, ensuring a 4-hit coverage across the whole tracker acceptance. Additionally, the innermost layer is closer to the interaction point, allowing for a better impact parameter resolution. However, during the 2017 data taking some of the pixel modules failed due to radiation-induced damage in components of their power supplies. This resulted in a slight degradation of the performance, which is partially recovered thanks to the redundancy of the system. The failing components were fully replaced before the start of the 2018 data taking, and no significant failures have been observed since.
The progressive transparency loss of the ECAL crystals led to a gradual shift in the timing of the ECAL signals. This shift was not propagated to the L1 trigger calibration during the 2016 and 2017 data taking. Because of this, a fraction of the objects at high $|\eta|$ firing the trigger were associated to the previous bunch crossing. When this occurs, only the previous bunch crossing is accepted by the trigger, because the L1 logic does not allow two consecutive bunch crossings to fire. The previous bunch crossing is then also rejected at the HLT, since the event is empty. This effect is studied by selecting events two bunch crossings after a trigger has been fired: such events cannot be affected by the pre-firing because of the trigger rules, so the pre-firing rate can be measured in them. This effect was found to be of the order of a few percent. It was accounted for in the L1 calibrations for the 2018 data taking, so no correction is needed for that year.
During the 2018 data taking a sector of the HCAL endcap failed and could not be recovered. This results in a miscalibration of jets and electrons in a small region of the phase space. In the analysis described in chapter [chap:susy] this is accounted for by vetoing all events with a jet, electron or muon in that region. In the $\PQt{}\PAQt{}\PH$ analysis described in chapter [chap:ttH] an additional uncertainty is applied instead, to account for a possible miscalibration.
Monte Carlo simulations are used to estimate the production rate of events with given kinematic properties. Dedicated simulations are typically produced for signal and background processes, so that physics analyses can be optimized, and also to interpret them in the context of the SM or any BSM theory. The generators described in section [sec:smpheno] model quite reliably most of the physical processes that can happen in $\Pp\Pp$ collisions. However, the interaction of final state particles with the detectors must also be modeled in order to have a complete prediction. In CMS, the geant software is used to model the complete detector geometry, the interaction of the final state particles with each of the subcomponents, as well as the digitization of the detected signals. These signals are then analyzed as if they were actual collision data, running all the reconstruction algorithms on them. These simulations are referred to as full simulations. Since the complete simulation of the detector is computationally intensive, for the modeling of several processes the Fast Simulation package is used instead, which allows for a much faster emulation, with a simplified geometry and a response of the subdetectors tuned to what is observed in full simulations.
Despite the detailed simulation of the detector, slight differences between data and the full and fast simulations are observed in the detector response. This ultimately affects the estimation of the object identification and reconstruction performance. Since the aim is to precisely estimate all the observables, this performance is carefully measured in data and these small discrepancies are corrected for. In this section, some of the methods used for this purpose are described.
The efficiency of the lepton selection and reconstruction is corrected using the tag-and-probe method . Since a focus is put on lepton reconstruction and selection in this thesis, this method is reviewed in greater detail in chapter [chap:muon].
The trigger efficiency can be measured using different methods, depending on the nature of the trigger. In this section the efficiency measurements used in this thesis are reviewed, for which different combinations of single and double lepton triggers are used. Single lepton trigger efficiencies can be easily measured using the tag-and-probe technique. This technique could in principle be extrapolated to dilepton triggers, but it gets increasingly complicated when using more elaborate trigger strategies or when the efficiencies of the two leptons cannot be factorized.
In this thesis, an alternative approach, called the orthogonal trigger method, is used instead. The efficiency is measured in events collected using a trigger whose decision is independent of the presence of the objects we aim to trigger on. In this thesis, triggers based on the momentum imbalance of the event are used. These triggers are not fully independent of single and dilepton triggers, since a lepton mismeasurement could yield instrumental ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$. However, these correlations are found to be small and are usually assigned as a systematic uncertainty in the measurements. The efficiency for a given trigger in events passing an offline selection is measured as
$$\epsilon = \frac{\text{N(trigger $\land$ offline $\land$ orthogonal)}}{\text{N(offline $\land$ orthogonal)}}.$$
These measurements can be performed, in the case of dileptonic events, as a function of the lepton kinematics. Several uncertainty sources may be considered: the comparison with measurements using other orthogonal triggers or with the ones obtained with the tag-and-probe method, the correlations between the orthogonal and measured triggers in simulation, residual kinematic dependencies, as well as the statistical uncertainty of the data samples.
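A minimal sketch of the efficiency estimate of the equation above, with a Clopper-Pearson interval for the statistical uncertainty (the choice of interval is an assumption, made here for illustration):

```python
from scipy.stats import beta

def trigger_efficiency(passes_trigger, passes_offline, passes_orthogonal, cl=0.68):
    """Orthogonal-trigger efficiency with a Clopper-Pearson interval.

    All inputs are boolean numpy arrays, one entry per event.
    """
    base = passes_offline & passes_orthogonal
    n = int(base.sum())
    k = int((passes_trigger & base).sum())
    eff = k / n
    # exact (Clopper-Pearson) binomial interval
    lo = beta.ppf((1 - cl) / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - (1 - cl) / 2, k + 1, n - k) if k < n else 1.0
    return eff, lo, hi
```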
Reconstructed jets are calibrated in order to provide a more accurate estimation of their momenta and in order for simulations to properly describe data. The set of corrections is fully described in . Using simulated events, corrections are derived to subtract the contribution to the jet energy from pile-up particles. Similarly, other corrections are applied so that the measured jet energy corresponds to that of the generator-level jets. Finally, residual corrections are applied to correct for the small discrepancies between data and simulation. These corrections are determined by studying the momentum imbalance in dijet events, between jets and electron and muon pairs in DY+jets events, and between jets and photons in $\PGg$+jets events.
These calibrations, as well as the corresponding uncertainties, are propagated to the calculation of ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$.
Both the efficiency to identify $\PQb$ jets as such and the misidentification probability of light flavor jets as $\PQb$ jets must be taken into account when calibrating the $\PQb$ tagging algorithms. These measurements are fully described in .
The misidentification probability is measured in an inclusive sample of multijet events. So-called negative (positive) taggers are built by constructing the $\PQb$-jet tagger using only tracks and secondary vertices with negative (positive) impact parameter and flight distance. These taggers are symmetric for light flavor jets, so the mistag rate for a given working point is determined as the proportion of jets that pass that working point of the negative tagger. The ratio between the misidentification probability of light jets and the misidentification rate of the negative tagger in all jets is taken from simulation.
The $\PQb$-tagging efficiency is measured in samples enriched in $\PQb$ jets, obtained either by selecting suitable samples of multijet events or by selecting $\PQt{}\PAQt$ events. Several complementary approaches are followed to obtain an efficiency for each working point.
On some occasions, it is necessary to calibrate the efficiency and misidentification rate as a function of the value of the discriminator. In these cases, an iterative approach is followed, selecting $\PQt{}\PAQt$ and DY+jets events and deriving the calibrations independently. In both samples, contamination from light flavor and $\PQb$ jets exists, so the determination of one rate affects the determination of the other. This procedure is repeated iteratively until convergence is achieved.
When performing physics analyses, statistical tools are necessary both to improve the performance of the analysis itself and to interpret the measured observables in terms of physical quantities. Several such techniques are used along this thesis, and are introduced in this section.
In this thesis, multivariate discriminators are used to classify events and objects between two or more categories using a set of input variables. Usually, these two categories are signal and background; however, in the $\PQt{}\PAQt{}\PH$ multilepton analysis described in chapter [chap:ttH], classification among several background and signal species is used. These algorithms, by construction, achieve better discrimination performance than the usual techniques based on sequential cuts.
Two types of supervised learning algorithms are used in this thesis: boosted decision trees (BDTs) and neural networks. These kinds of algorithms are usually trained on sets of simulated samples, in which the species of each event is known a priori. A more detailed description of these methods can be found in the usual literature .
BDTs are built from decision trees. Decision trees are classifiers that consist of the application of several sequential criteria. These criteria are set iteratively, applying first a classification based on the most discriminating variable, dividing the training set into two subsets or nodes, one of them more enriched in signal and the other in background. The same procedure is then applied iteratively to each of the subsets separately, until a reasonable purity is achieved in each of the nodes. The maximum number of such decisions is denoted as the depth of the tree. Decision trees can perform very well when considering a small number of input variables and large training datasets. However, they are very sensitive to fluctuations in the training dataset, easily leading to overtraining. If this occurs, the classifier does not generalize to datasets different from the training one.
In order to avoid overtraining, a technique called boosting is used, in which an ensemble of weak learners is trained. In this case the weak learners are shallow trees. Intuitively, the first weak learner is trained using the approach above, and the subsequent ones focus on correctly classifying the examples for which the previous learners have failed. Several boosting algorithms are available. In this thesis, the implementation of gradient boosting in the TMVA software is used.
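The following sketch illustrates the idea with scikit-learn's gradient boosting on a toy dataset; the thesis itself uses the TMVA implementation, so this is only an illustration of the technique.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# toy "simulated samples": 5 input variables, label 1 = signal
X = rng.normal(size=(20_000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=20_000) > 0.5).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# shallow trees (weak learners) combined by gradient boosting
bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
bdt.fit(X_train, y_train)

# comparing to a statistically independent sample guards against overtraining
print("train accuracy:", bdt.score(X_train, y_train))
print("test accuracy: ", bdt.score(X_test, y_test))
```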
Many types of neural networks have been developed in the past years . In this thesis, only feed-forward neural networks are considered. They are mappings built in the following way. Given an activation function $f$, a neuron is $f(\sum_i a_i x_i + b)$, where $x_i$ are the input variables, and $a_i$ and $b$ are the weights and biases, the parameters of the model. A set of neurons $L$ is defined to be a layer of the network, which is formed by the composition of several layers, $L_1 \circ \dots \circ L_d$. The network, which is said to have a depth $d$, corresponds to a mapping $\mathbb{R}^n \to \mathbb{R}^m$, where $n$ is the number of input variables and $m$ the dimension of the output.
It can be proven that any bounded $\mathbb{R}^n \to \mathbb{R}^m$ function can be approximated on a compact subset of $\mathbb{R}^n$ by a neural network with enough neurons, a suitable choice of weights and biases, and mild assumptions on the activation function . The parameters of the network are trained on a simulated dataset of the different signal and background species, choosing the set of weights and biases that minimizes the cross-entropy loss. In the context of this thesis, the neural networks used make use of rectified linear units as activation functions. The minimization is performed using the batch gradient descent algorithm implemented in the Tensorflow package, with the Keras interface.
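A minimal sketch of such a network in the TensorFlow/Keras setup described above; the layer sizes and dataset are illustrative placeholders, and setting the batch size to the full sample size mimics batch gradient descent.

```python
import numpy as np
import tensorflow as tf

n_inputs, n_classes = 10, 3  # e.g. several background and signal species

# feed-forward network of depth 3 with ReLU activations
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(n_inputs,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
# cross-entropy loss minimized with plain gradient descent
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="sparse_categorical_crossentropy")

# toy "simulated dataset" with integer species labels
x = np.random.normal(size=(5000, n_inputs)).astype("float32")
labels = np.random.randint(0, n_classes, size=5000)
model.fit(x, labels, batch_size=5000, epochs=5, verbose=0)  # full-batch descent
```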
The three physics analyses described in this thesis define categories in which event counting experiments are performed. Inference is then performed on these observables to obtain the relevant physical information. In the case of the $\PQt{}\PW$ inclusive production cross-section and $\PQt{}\PAQt{}\PH$ production measurements, confidence intervals are drawn on signal strength parameters. In the search for SUSY, since no signal is observed, upper limits are drawn on the signal strength for each model. In this section, the statistical framework used is described. This framework coincides with the one described in .
Given a set of counting experiment observations, $x_i$, a likelihood function is defined as
$$\mathcal{L}(x_i|\boldsymbol{\mu}, \boldsymbol{\theta}) = \prod_i \mathcal{P}\left(x_i \,\middle|\, \boldsymbol{\mu}\,\boldsymbol{s_i}(\boldsymbol{\theta}) + b_i(\boldsymbol{\theta})\right) \prod_j p_j(\tilde{\theta}_j | \theta_j).$$
In this equation, $\boldsymbol{\mu}$ are the signal strengths for the various signals, $\boldsymbol{s_i}$ their contributions to a given category $i$, and $b_i$ the contribution from background. The latter two are functions of $\boldsymbol{\theta}$, the nuisance parameters, which parameterize the systematic uncertainties affecting their estimation. The symbol $\mathcal{P}$ denotes the probability mass function of the Poisson distribution.
The functions $p_j(\tilde{\theta}_j | \theta_j)$ encode the a-priori expectations for the nuisance parameters, and constrain them. This function is a log-normal probability density function for nuisances affecting the normalization of processes, and a gamma distribution if the uncertainty is of statistical origin. In the $\PQt{}\PAQt{}\PH$ multilepton analysis, the approach described in is followed for the latter type of uncertainties. Systematic uncertainties also affecting the shape of the distributions are incorporated following the morphing technique described in , and their prior is Gaussian. On some occasions, the normalization of given backgrounds is left unconstrained in the fit.
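As an illustration, a single-category version of this likelihood with one log-normally constrained nuisance parameter acting on the background normalization can be written as below; all names and numbers are illustrative.

```python
import numpy as np
from scipy.stats import poisson, norm

def nll(mu, theta, n_obs, s, b, kappa=1.10):
    """Negative log-likelihood: Poisson term times a log-normal prior
    on the nuisance theta (kappa encodes a ~10% normalization uncertainty)."""
    expected = mu * s + b * kappa ** theta  # log-normal response of b
    return -poisson.logpmf(n_obs, expected) - norm.logpdf(theta)

# profile over theta on a grid for a fixed signal strength mu
thetas = np.linspace(-5, 5, 201)
print(min(nll(1.0, t, n_obs=25, s=10.0, b=15.0) for t in thetas))
```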
The best fit for $\boldsymbol{\mu}$ is obtained by maximizing the likelihood, $\mathcal{L}$. Confidence intervals and upper limits are obtained by considering the profile likelihood ratio test statistic:
$$q(\boldsymbol{\mu}) = -2\log\frac{\mathcal{L}(x_i|\boldsymbol{\mu}, \boldsymbol{\hat{\hat{\theta}}}(\boldsymbol{\mu}))}{\mathcal{L}(x_i|\boldsymbol{\hat{\mu}}, \boldsymbol{\hat{\theta}})},$$
where $\boldsymbol{\hat{\mu}}$ and $\boldsymbol{\hat{\theta}}$ are the best fit values, and $\boldsymbol{\hat{\hat{\theta}}}(\boldsymbol{\mu})$ are the best fit values of the nuisance parameters for a given choice of $\boldsymbol{\mu}$. By Wilks' theorem, in the limit of a large number of observations, $q(\boldsymbol{\mu})$ is distributed as a $\chi^2_n$ distribution under the hypothesis of $\boldsymbol{\mu}$ being the true value, where $n$ is the dimension of $\boldsymbol{\mu}$. Confidence intervals can then be constructed by considering the crossings of $q(\boldsymbol{\mu})$ with the quantiles of the $\chi^2$ distribution.
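A sketch of this construction for a one-dimensional $\boldsymbol{\mu}$ is given below: the interval is read off where the $q(\mu)$ scan crosses the corresponding $\chi^2_1$ quantile (1 for 68% CL, 3.84 for 95% CL). The scan values are toy inputs.

```python
import numpy as np

def interval_from_scan(mu_values, q_values, threshold=1.0):
    """Return the mu values where the q(mu) scan crosses `threshold`,
    found by linear interpolation between adjacent scan points."""
    crossings = []
    for m0, m1, q0, q1 in zip(mu_values[:-1], mu_values[1:],
                              q_values[:-1], q_values[1:]):
        if (q0 - threshold) * (q1 - threshold) < 0:
            crossings.append(m0 + (threshold - q0) * (m1 - m0) / (q1 - q0))
    return crossings

# toy parabolic scan around a best fit of mu = 1.23 with width 0.41
mus = np.linspace(0, 3, 301)
qs = ((mus - 1.23) / 0.41) ** 2
print(interval_from_scan(mus, qs))  # approximately [0.82, 1.64]
```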
Upper limits on the production of a signal can also be obtained by considering the same test statistic, defining
$$\text{CL}_{s+b}(\boldsymbol{\mu}) = P(q(\boldsymbol{\mu}) \geq q_{\mathrm{obs}}(\boldsymbol{\mu}) \,|\, \boldsymbol{\mu}),$$
where $q_{\mathrm{obs}}(\boldsymbol{\mu})$ denotes the value of the test statistic observed in data.
A possible way to define the exclusion limit would be the Neyman construction, taking $\text{CL}_{s+b}(\boldsymbol{\mu}) < \alpha$ as an exclusion at confidence level (CL) $1-\alpha$. However, by construction, $\text{CL}_{s+b}(\boldsymbol{\mu}) < \alpha$ will occur in a proportion $\alpha$ of the experiments even for signals to which the analysis has no sensitivity, i.e., 5% of the times if we aim for upper limits at 95% CL. A modification of these upper limits is therefore considered: the upper limit is instead defined through $\text{CL}_{s}(\boldsymbol{\mu}) < \alpha$, with
$$\text{CL}_s (\boldsymbol{\mu}) = \frac{\text{CL}_{s+b}(\boldsymbol{\mu}) }{\text{CL}_{b} (\boldsymbol{\mu}) },$$
where $\text{CL}_{b}(\boldsymbol{\mu}) = P(q(\boldsymbol{\mu}) \geq q_{\mathrm{obs}}(\boldsymbol{\mu}) \,|\, \boldsymbol{\mu}=0)$ is the analogous probability computed under the background-only hypothesis. By construction, the resulting limits are one-sided and overcover the true value, since they are strictly higher than the $\text{CL}_{s+b}$ ones. These upper limits can be computed using asymptotic formulae valid in the limit of a large number of observations. In the cases in which this approximation is not valid, they can be estimated by drawing sets of pseudoexperiments.
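The pseudoexperiment approach can be illustrated with a toy counting experiment; here the observed count itself plays the role of the test statistic, which is a simplification of $q(\boldsymbol{\mu})$.

```python
import numpy as np

def cls(n_obs, s, b, n_toys=100_000, seed=1):
    """Toy CLs for a single counting experiment."""
    rng = np.random.default_rng(seed)
    toys_sb = rng.poisson(s + b, n_toys)  # signal + background hypothesis
    toys_b = rng.poisson(b, n_toys)       # background-only hypothesis
    cl_sb = np.mean(toys_sb <= n_obs)     # p-value under s+b
    cl_b = np.mean(toys_b <= n_obs)       # p-value under b-only
    return cl_sb / cl_b

# exclude signal strengths mu for which CLs < 0.05 (95% CL upper limit)
for mu in (0.5, 1.0, 2.0):
    print(mu, cls(n_obs=10, s=mu * 8.0, b=10.0))
```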
On some occasions, it is interesting to study the production cross section as a function of a given observable. In the case of inclusive measurements it is trivial to remove the effects of the detector resolution by computing the efficiency. However, when considering differential distributions one should also consider migrations among bins of the distributions. The method to do so is called unfolding. A more complete review of the different unfolding techniques can be found in reference . This section covers the method followed in the differential cross section measurement of $\PQt{}\PW$ production, described in chapter [chap:tW].
The problem can be stated in terms of $x$, a given variable at the particle level before any detector effects, and $y$, the corresponding reconstructed variable. Usually these measurements are done over binned datasets, although the binning can typically be chosen. We denote by $x_j$ the number of signal events in a given bin of $x$, and by $y_i$ the number of signal events in a bin of $y$.
To model the limited resolution of the detectors, the response matrix, ${\ensuremath{\mathcal{R}}\xspace}$, parameterizes the signal event migrations between the particle-level and detector-level variables. The matrix is defined such that $y_i = {\ensuremath{\mathcal{R}}\xspace}_{ij} x_j$, and is constructed using simulated signal events, after they have been corrected by the corresponding calibrations and scale factors. With these simulations, the response matrix can be computed as
$${\ensuremath{\mathcal{R}}\xspace}_{ij} = \frac{N(\text{reconstructed in bin } i \text{ and generated in bin } j)}{N(\text{generated in bin } j)}.$$
This matrix takes into account the signal efficiency as well as the migrations between bins of the distribution. When applying this method to actual data, a contribution from background events may appear in the signal region. In order to account for this, the estimated backgrounds are subtracted from the data:
$$\label{eq:unfolding}
\hat{y}_i = N_i - N_i^{\mathrm{bkg}} = \sum_{j} {\ensuremath{\mathcal{R}}\xspace}_{ij} \hat{x}_j,$$
where $\hat{y}_i$ and $\hat{x}_j$ represent estimators for $y_i$ and $x_j$, respectively.
If the response matrix is inverted, equation [eq:unfolding] can be used to obtain an unbiased estimator for $x_j$. However, as will be shown below, the non-diagonal terms of the response matrix may significantly increase the variance of this estimator. Because of this, it is crucial to choose a binning of the generator- and reconstruction-level variables that ensures that those elements are as small as possible.
To quantify the level of non-diagonality of the response matrix, the condition number of the matrix is used. The condition number of a given matrix ${\ensuremath{\mathcal{R}}\xspace}$ is defined as $\mathrm{cond}({\ensuremath{\mathcal{R}}\xspace})=\sigma_{\mathrm{max}}/\sigma_{\mathrm{min}}$, where $\sigma_{\mathrm{max\,(min)}}$ is the largest (smallest) singular value of ${\ensuremath{\mathcal{R}}\xspace}$. If the condition number is small, of the order of 10, the problem is well-conditioned and can be solved using the maximum likelihood estimator. If it is large, the problem is ill-conditioned and regularization techniques may be needed.
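A short sketch of how the response matrix and its condition number can be computed from simulated events is given below; the toy migration model and binning are illustrative, and for brevity the sketch ignores reconstruction inefficiency.

```python
import numpy as np

def response_matrix(gen_bins, reco_bins, n_bins):
    """Fill R from (generated, reconstructed) bin indices of signal events;
    each generator-level column is normalized to migration probabilities."""
    counts = np.zeros((n_bins, n_bins))
    for g, r in zip(gen_bins, reco_bins):
        counts[r, g] += 1.0
    return counts / counts.sum(axis=0, keepdims=True)

rng = np.random.default_rng(2)
gen = rng.integers(0, 5, size=50_000)
reco = np.clip(gen + rng.integers(-1, 2, size=gen.size), 0, 4)  # +-1 bin migrations

R = response_matrix(gen, reco, 5)
sing = np.linalg.svd(R, compute_uv=False)
print("condition number:", sing.max() / sing.min())  # O(10) or less: well-conditioned
```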
The binning is optimized to achieve the maximum stability and purity. These quantities are defined, respectively, as
$$s_i = \frac{{\ensuremath{\mathcal{R}}\xspace}_{ii}}{\sum_j {\ensuremath{\mathcal{R}}\xspace}_{ij}}, \ p_j = \frac{ {\ensuremath{\mathcal{R}}\xspace}_{jj}}{ \sum_i {\ensuremath{\mathcal{R}}\xspace}_{ij}}.$$
As mentioned above, an unbiased estimator for $x_j$ would be obtained by inverting the response matrix. However, this inversion may amplify the statistical fluctuations of the data, leading to large uncertainties in the estimated unfolded distribution. This estimator has a variance equal to the Fréchet-Cramér-Rao bound , hence it can only be improved by introducing a bias that reduces its variance.
The approach followed in this analysis is to consider Tikhonov regularization , as implemented in . In this approach, a solution to the problem is obtained by minimizing the following cost function:
$$\begin{aligned}
\label{eq:likeli}
{\ensuremath{\mathcal{L}}\xspace}&= {\ensuremath{\mathcal{L}}\xspace}_1 + {\ensuremath{\mathcal{L}}\xspace}_2 + {\ensuremath{\mathcal{L}}\xspace}_3 \\
&= (\mathbf{y} - {\ensuremath{\mathcal{R}}\xspace}\mathbf{x})^{T} V_y^{-1} (\mathbf{y} - {\ensuremath{\mathcal{R}}\xspace}\mathbf{x}) \\
&+ \tau^2 \mathbf{x}^T (L^T L) \mathbf{x} \\
&+ \lambda (Y - e^T \mathbf{x}),\end{aligned}$$
where $V_y$ is the covariance matrix associated to $\mathbf{y}$, $Y = \sum_i y_i$ and $e_j = \sum_i {\ensuremath{\mathcal{R}}\xspace}_{ij}$. The first term, ${\ensuremath{\mathcal{L}}\xspace}_1$, corresponds to the likelihood function that is maximized to obtain the maximum likelihood estimator, which corresponds to the matrix-inversion solution. The ${\ensuremath{\mathcal{L}}\xspace}_2$ and ${\ensuremath{\mathcal{L}}\xspace}_3$ terms introduce a bias in this solution that allows the variance of the estimator to be reduced.
${\ensuremath{\mathcal{L}}\xspace}_2$ is a regularization term that penalizes large curvatures in $\mathbf{x}$, associated to numerical instabilities, and is modulated by a regularization parameter, $\tau$, that regulates the size of the bias introduced. Different $L$ matrices can be used, depending on the instabilities to be penalized. In this case, the bias aims to penalize large curvatures or second derivatives, hence $L_{ij} = 2\delta_{ij} - \delta_{i,j-1} - \delta_{i,j+1}$ is taken, where $\delta$ represents the Kronecker delta.
${\ensuremath{\mathcal{L}}\xspace}_3$ is an area constraint, with a Lagrange multiplier $\lambda$, that forces the estimated total number of events at the particle level to be consistent with that in the reconstructed space.
The optimal choice of the $\tau^2$ parameter is determined using the L-curve method. Low values of $\tau^2$ allow for larger fluctuations in the data, while high values introduce a larger bias. The optimal value is fixed by scanning the L-curve, defined as $(L_x, L_y)$ with $L_x = \log{\ensuremath{\mathcal{L}}\xspace}_1$ and $L_y = \log({\ensuremath{\mathcal{L}}\xspace}_2/\tau^2)$. For large $\tau$ values, high values of ${\ensuremath{\mathcal{L}}\xspace}_2$ are penalized and high values of ${\ensuremath{\mathcal{L}}\xspace}_1$ are allowed; on the contrary, for small $\tau$ values ${\ensuremath{\mathcal{L}}\xspace}_1$ takes small values and ${\ensuremath{\mathcal{L}}\xspace}_2$ large ones.
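Dropping the area constraint, the minimization of ${\ensuremath{\mathcal{L}}\xspace}_1 + {\ensuremath{\mathcal{L}}\xspace}_2$ has a closed-form solution, sketched below; this is a simplified illustration of the regularized unfolding rather than the full implementation used in the analysis.

```python
import numpy as np

def unfold_tikhonov(R, y, V, tau):
    """Minimize (y - Rx)^T V^-1 (y - Rx) + tau^2 x^T L^T L x.

    Setting the gradient to zero gives
    (R^T V^-1 R + tau^2 L^T L) x = R^T V^-1 y.
    V is the covariance of the background-subtracted data y.
    """
    n = R.shape[1]
    # curvature matrix: L_ij = 2 d_ij - d_i,j-1 - d_i,j+1
    L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    Vinv = np.linalg.inv(V)
    A = R.T @ Vinv @ R + tau ** 2 * (L.T @ L)
    return np.linalg.solve(A, R.T @ Vinv @ y)

# tau = 0 recovers the maximum likelihood (matrix inversion) estimator;
# larger tau damps fluctuations at the price of a bias towards smoother spectra
```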
In the analysis presented in chapter [chap:tW], the scan of the L-curve was performed for all the studied variables, and the resulting best $\tau$ value was tested. No regularization was found to be necessary in this analysis, and the maximum likelihood estimator is used.
Leptons are the main objects used in this thesis. This chapter covers lepton reconstruction at CMS, with a focus on muon reconstruction and identification, since it was one of the main tasks I contributed to during this thesis. This is described in section [sec:muon]. Electron reconstruction and identification, and the main isolation variables used in CMS, are described in sections [sec:electronreconstructionid] and [sec:isolation]. Then, an ad-hoc discriminator developed for the $\PQt{}\PAQt{}\PH$ analysis is described in section [sec:muonleptonmva]. A complete description of lepton efficiency measurements and the associated uncertainties is given in section [sec:muonleptonefficiencycharacterization]. Finally, a short summary of the identification of hadronic $\PGt$ decays is given in section [sec:tauhreconstruction].
In this section, the muon reconstruction methods used in the CMS detector and the identification criteria used are described. A focus is put on the identification criteria that are used in the forthcoming chapters of the thesis. A complete description of the algorithms used can be found in . The performance of the muon selection and reconstruction in Run 2 data is also shown.
Muons are reconstructed in CMS by combining the information from the tracking detectors and the muon spectrometer, using three complementary algorithms.
Standalone muon tracks are constructed starting from a track built from groups of DT or CSC segments, which is propagated and updated using RPC, CSC and DT segments with a Kalman filter algorithm . Tracker muon tracks are built starting from all reconstructed tracks in the inner tracker, which are propagated to the muon system. The propagated track is matched geometrically to segments in the muon system, and a tracker muon is reconstructed if at least one segment is matched to the propagated track. Global muons are built starting from standalone tracks, which are propagated to the inner detector and matched to tracker tracks. A global fit is then performed with the Kalman filter using both the standalone track and the associated tracker track. Tracker and global muons are merged into a single candidate if they share the same inner track.
About 99% of the muons produced within the geometrical acceptance of the muon system are reconstructed as a global or a tracker muon. The global algorithm is designed to reconstruct muons that traverse the muon system with a very high purity. Tracker muons are instead used to recover efficiency in the regions where the muon system is less instrumented.
Muons reconstructed by these algorithms are then used as an input to the PF algorithm. All isolated global muons matched to DT and CSC segments are selected as PF muons, while additional quality criteria are imposed on tracker muons and non-isolated global muons.
The default algorithm to measure the muon momentum takes the information from the globally fitted trajectory and from the tracker-only trajectory. The global trajectory is used for muons with ${\ensuremath{p_{\mathrm{T}}}\xspace}> 200$$\,\text{Ge\hspace{-.08em}V}$ if its charge-to-momentum ratio, $q/p$, agrees within two standard deviations with the tracker-only fit; in the rest of the cases, the inner track is used . This approach is followed in the analyses described in this thesis. There are other more refined algorithms to estimate with higher precision the momentum of high ${\ensuremath{p_{\mathrm{T}}}\xspace}$ muons, for which tracker tracks are almost straight and a significant amount of showering is expected in the muon system .
Several quality criteria are imposed to select muons at the analysis level. Working points are defined targeting several levels of purity and efficiency, as well as different sources of muons. In this section, the three main selections used in this thesis are described. The $\PQt{}\PAQt{}\PH$ multilepton analysis described in chapter [chap:ttH] makes use of the prompt-lepton MVA described in section [sec:muonleptonmva]. There are other selections, not described in this thesis, that are tailored for high ${\ensuremath{p_{\mathrm{T}}}\xspace}$ and low ${\ensuremath{p_{\mathrm{T}}}\xspace}$ muons and do not make use of the PF algorithm; they are described in . The loose muon identification criteria aim to select muons produced in prompt decays at the primary vertex, but also those produced in light and heavy flavor decays, with a very high efficiency. This working point is used to efficiently reject charged hadrons that are reconstructed as muons. Loose muons are required to be selected by the PF algorithm and to be either a tracker or a global muon.
The medium muon identification criteria aim to select muons produced in prompt decays and in heavy flavor decays. Medium muons must pass the loose selection and are required to use more than 80% of the hits in the tracker track. Additional quality criteria are based on the following quantities. The segment compatibility evaluates the consistency between the tracker trajectory and the segments in the muon system, as expected for a minimum ionizing particle, returning a value between 0 and 1, with 1 representing the highest degree of compatibility. A kink-finding algorithm is used to evaluate the quality of the combined track: the algorithm divides the track in two at several points along the trajectory and evaluates the compatibility between the resulting tracks, with a large $\chi^2$ value indicating a large difference between the two. For global muons with a global track with $\chi^2/\mathrm{dof} < 3$, a tracker-standalone position match with $\chi^2 < 12$, and a kink-finding score smaller than 20, the segment compatibility is required to be larger than 0.303. This requirement is tightened to 0.451 otherwise.
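The branching of these requirements can be summarized in the following sketch; the field names are hypothetical, while the thresholds are the ones quoted above.

```python
def passes_medium_id(mu):
    """Hypothetical encoding of the medium-ID logic; `mu` is a dict of
    muon properties with invented field names, thresholds as in the text."""
    if not (mu["is_loose"] and mu["valid_tracker_hit_fraction"] > 0.80):
        return False
    good_global = (mu["is_global"]
                   and mu["global_chi2_per_dof"] < 3.0
                   and mu["position_match_chi2"] < 12.0
                   and mu["kink_finder_chi2"] < 20.0)
    # a tighter segment compatibility is required when the combined-track
    # quality criteria are not satisfied
    threshold = 0.303 if good_global else 0.451
    return mu["segment_compatibility"] > threshold
```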
The tight muon selection aims to reject muons from decays in flight and from hadronic punch-through. A tight muon is a loose muon reconstructed as a global muon with a global track $\chi^2/\mathrm{dof} < 10$, and with a tracker track using at least 6 tracker layers and at least one pixel hit. The track must be matched to at least two muon stations to suppress hadronic punch-through. Impact parameter cuts of 2 mm and 5 mm are applied in the transverse plane and along the beam direction, respectively.
In this section, the performance of some of the selections defined above is shown in $\Pp\Pp$ collisions recorded with the CMS detector during 2016, 2017 and 2018. During my thesis, I coordinated the efforts in the muon group of the CMS Collaboration to commission and evaluate the performance of these selections, as well as to tune them. These studies are documented in public documents . The efficiency figures shown in this section were produced in collaboration with the IFCA group, while the resolution performance studies were done by the Rochester group.
The efficiency for muons to pass the loose, medium and tight selection criteria is shown in figure [fig:muonefficiencymeasurement] in $\PZ\to{\ensuremath{\PGm^\pm}\xspace}{\ensuremath{\PGm^\mp}\xspace}$ events collected with single muon triggers in CMS during Run 2 of the LHC. The efficiency is measured as a function of the muon $\eta$ using the tag-and-probe method , with the parameters described in section [sec:muonleptonefficiencycharacterization]. The efficiency is above 95% for the three selections, with the exception of the tight identification criteria in the regions where the muon system is less instrumented. The results also show the robustness of the muon system and reconstruction algorithms across the different periods of the Run 2 data taking, with different detector geometries and pile-up conditions.
The momentum scale and resolution achieved by the CMS detector are shown in figure [fig:muonmomentumresolution], as measured in $\PZ\to{\ensuremath{\PGm^\pm}\xspace}{\ensuremath{\PGm^\mp}\xspace}$ events corrected with the so-called Rochester method . Events are collected with a single muon trigger, and muons are required to pass the tight identification criteria. The figure shows that the scale of the resonance is determined with high precision, and that the achieved resolution has a mild dependence on the $\eta$ of the muon.
The electron reconstruction at CMS, described in greater detail in , combines the information from the calorimeters and the tracker to build electron candidates. Since electrons radiate photons via bremsstrahlung, their trajectory changes as they traverse the tracker. Additionally, what arrives at the ECAL may not be a single object, but a combination of the electron and the photons that have been radiated. The reconstruction algorithm aims to gather all the associated objects into a single electron candidate.
Three types of seeds are used to reconstruct electrons. A dedicated algorithm is used to cluster the ECAL deposits corresponding to a candidate electron and its associated electrons and photons into a supercluster (SC). In addition, doublets of tracker hits matched geometrically to the ECAL SC are used as seeds. The other kind of seeds are tracker tracks that are tested to be compatible with an electron. The two latter kinds of seeds are fed into a dedicated tracking algorithm that takes into account the effect of photon emission on the electron trajectory in the tracker, using a Gaussian sum filter (GSF) . These tracks are referred to as GSF tracks.
The ECAL clusters, the GSF tracks, and the SCs associated to tracker tracks are fed into the PF algorithm, which produces electron and photon candidates. Electron candidates are constructed from the association of a GSF track and an ECAL SC.
Quality criteria are applied to reconstructed electrons in order to reject jets misidentified as leptons, keeping only electrons produced in prompt decays of $\PW$ and $\PZ$ bosons and, to a lesser extent, $\PGt$ leptons. Several variables are considered for this purpose. The $\sigma_{i\eta i\eta}^{5\times 5}$ variable is defined through
$$\left(\sigma_{i\eta i\eta}^{5\times 5}\right)^2 = \frac{\sum_i \left( \eta_i - \bar{\eta}\right)^2 w_i}{\sum_i w_i},$$
where the sum runs over the 5 × 5 matrix of ECAL crystals around the highest energy crystal of the SC, $\bar{\eta}$ is their weighted average pseudorapidity, and $w_i$ is a weight that depends logarithmically on the energy deposited in the crystal. Additionally, the ratio between the energies deposited in the HCAL and in the ECAL, $H/E$, is also considered. These variables allow genuine electrons to be discriminated from jets misidentified as electrons.
The distances between the ECAL SC and the extrapolated track in pseudorapidity, $\Delta\eta$, and in azimuth, $\Delta\phi$, are also used. The $\eta$ and $\phi$ positions of the SC are defined as the energy-weighted positions of its clusters. Finally, the difference between the inverse of the SC energy and the inverse of the track momentum, $1/E_{\mathrm{SC}}-1/p$, is used. Additional criteria may also be applied to further suppress photon conversions.
In this thesis, most of these variables are combined in different manners to identify electrons. In the $\PQt{}\PW$ analysis described in chapter [chap:topphysics], selections on each of these variables are applied sequentially, as outlined in table [tab:electightidcutbased]. The criteria are applied depending on the $\eta$ of the electron supercluster, $\eta_{\mathrm{SC}}$. This is denoted as the cut-based tight identification. The remaining two analyses use these variables combined in a multivariate discriminator, which improves the discrimination performance. The discriminator is trained on DY+jets simulated samples, using prompt electrons as signal and jets misidentified as electrons as background.
Several working points are defined, depending on the efficiency. Two working points are used in the $\PQt{}\PAQt{}\PH$ multilepton analysis (see chapter [chap:ttH]), the loose and the WP-80 working points. The first aims for a very high efficiency, while the second aims for an 80% efficiency. Two working points are also used in the search for SUSY (chapter [chap:susy]), for which the cut on the score is tuned as a function of ${\ensuremath{p_{\mathrm{T}}}\xspace}$ and $\eta_{\mathrm{SC}}$ to achieve the optimal performance.
In many cases, it is necessary to discriminate leptons produced in prompt decays from those from other sources. The latter may be dominated by genuine leptons produced in heavy flavor decays, for which the usual identification variables are not sufficient. Since non-prompt leptons are usually produced inside jets, the detection of other objects surrounding the lepton candidate can be used to discriminate them. Isolation variables are used for that purpose, built by considering a cone in $\Delta R$ around the reconstructed lepton. The PF relative isolation is then computed as
$$I = \sum {\ensuremath{p_{\mathrm{T}}}\xspace}^i / {\ensuremath{p_{\mathrm{T}}}\xspace},$$
where the sum runs over all the PF candidates with ${\ensuremath{p_{\mathrm{T}}}\xspace}^i$ inside the cone and coming from the primary vertex, and ${\ensuremath{p_{\mathrm{T}}}\xspace}$ corresponds to the ${\ensuremath{p_{\mathrm{T}}}\xspace}$ of the lepton. While the charged particle component of the isolation is very well measured thanks to the tracking system, additional corrections are needed for the neutral part, composed of neutral hadrons and photons, since its pile-up contribution cannot be directly subtracted: no tracking information is available to discriminate between the different primary vertices. Two different approaches are followed: the $\Delta\beta$ corrections and the effective area corrections.
The $\Delta\beta$ corrections are applied to the neutral component of the isolation by computing the contribution from charged hadrons coming from pile-up inside the cone. This quantity is scaled by a factor of 0.5, corresponding to the approximate ratio of neutral to charged particle production in inelastic $\Pp\Pp$ collisions, as observed in simulation. The relative isolation is then computed as
$$I = \left(\sum_{\mathrm{ch.\ had.}} {\ensuremath{p_{\mathrm{T}}}\xspace}^i + \max\left(0,\ \sum_{\mathrm{ph.}} {\ensuremath{p_{\mathrm{T}}}\xspace}^i +\sum_{\mathrm{neu.\ had.}} {\ensuremath{p_{\mathrm{T}}}\xspace}^i - 0.5 \sum_{\mathrm{ch.\ had.\ (pile-up)}} {\ensuremath{p_{\mathrm{T}}}\xspace}^i \right) \right) / {\ensuremath{p_{\mathrm{T}}}\xspace}.$$
The effective area correction is similar to the jet areas method used to subtract the pile-up contribution from jets . In this case, the pile-up contribution is estimated as ${\ensuremath{p_{\mathrm{T}}}\xspace}^{\mathrm{PU}} = \rho A_{\mathrm{eff}}$, where $\rho$ is the average energy density of the event, defined as the median of the energy density of particles within the area of any jet reconstructed with a distance parameter of 0.6, and $A_{\mathrm{eff}}$ is an effective area determined from the dependence of the isolation on the number of reconstructed vertices in the collision.
Several isolation cone sizes are used depending on the object and on the analysis. By default, a fixed cone size of 0.4 is used for muons, while a 0.3 cone size is used for electrons. However, for several analyses it is interesting to consider a variable cone size that shrinks as a function of ${\ensuremath{p_{\mathrm{T}}}\xspace}$. This allows the efficiency to be recovered in topologies in which the leptons are produced in the decay of boosted objects and are collimated with the rest of the decay products. In those cases, mini-isolation is considered, for which a cone with a radius between 0.05 and 0.2, depending on the lepton ${\ensuremath{p_{\mathrm{T}}}\xspace}$, is used.
In this thesis, both fixed cone size isolation and mini-isolation are considered in the different analyses. The muon fixed-cone isolation is corrected with the $\Delta\beta$ corrections, while the mini-isolation and the electron fixed-cone isolation are corrected with the effective area approach.
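The two ingredients can be sketched as follows; the formulas follow the definitions above, while the 10 GeV constant in the mini-isolation cone is an assumption quoted here only for illustration.

```python
def rel_iso_delta_beta(pt_lepton, ch_had, neu_had, photons, ch_had_pu):
    """Delta-beta-corrected relative isolation; the inputs are scalar pT
    sums of PF candidates inside the cone, following the equation above."""
    neutral = max(0.0, neu_had + photons - 0.5 * ch_had_pu)
    return (ch_had + neutral) / pt_lepton

def mini_iso_cone(pt_lepton):
    """Cone radius shrinking with the lepton pT, clamped to [0.05, 0.2];
    the 10 GeV scale is an assumed, illustrative constant."""
    return max(0.05, min(0.2, 10.0 / pt_lepton))
```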
The three analyses described in this thesis use the presence of two or more leptons in the final state as a signature to trigger on and to identify the processes of interest. These signal leptons are usually discriminated from those coming from non-prompt sources using isolation variables. However, many analyses still have a significant contribution from non-prompt sources in their signal regions. This is the case of the $\PQt{}\PAQt{}\PH$ multilepton analysis, fully described in chapter [chap:ttH], which looks for a rare signature, with a rate comparable to that of non-prompt leptons passing the usual isolation requirements.
To enhance the separation power between prompt and non-prompt leptons, this analysis makes use of a multivariate discriminator that takes several properties of the leptons into account. This method has been used in several searches for supersymmetry and Higgs boson measurements . A retraining of this method was used for recent $\PQt{}\PAQt{}\PZ$ measurements and the $\PQt{}\PZ{}\PQq$ observation . During my thesis, I maintained and improved this discriminator, and tuned it to achieve the best performance for the $\PQt{}\PAQt{}\PH$ multilepton analysis.
The discriminator is a BDT trained on simulated events, separately for electrons and muons. Two trainings for each lepton flavor are performed, in simulations reflecting the detector conditions of the 2016 and 2017 data taking, to account for the different detector geometry. No significant changes were present between 2017 and 2018, so the training performed in 2017 simulations is used for 2018 data and simulation.
Signal leptons are leptons coming from the prompt decay of a $\PW$ or $\PZ$ boson or a $\PGt$ lepton in a $\PQt{}\PAQt{}\PH$ simulated sample. Even if the impact parameter of leptons from $\PGt$ decays is slightly different from that of prompt decays, it is important to keep them as signal, in order not to lose performance when considering the $\PH\to\PGt\PGt$ decay in the $\PQt{}\PAQt{}\PH$ measurement. Background leptons are leptons in a semileptonic $\PQt{}\PAQt$ sample that are not matched to a prompt boson or $\PGt$ decay or to any other source of prompt leptons.
Five types of variables are considered: kinematic variables, isolation variables, $\PQb$ tagging variables, identification variables, and impact parameter variables. Several of these variables are defined considering the jet associated to the lepton, i.e. the jet in which the PF candidate that forms the lepton is included. Only jets with ${\ensuremath{p_{\mathrm{T}}}\xspace}$ greater than 15$\,\text{Ge\hspace{-.08em}V}$ are considered. If no such jet exists, the variables are assigned a sentinel value, defined in the list of input variables below.
The variables that the prompt-lepton MVA uses as input are the following.
Lepton ${\ensuremath{p_{\mathrm{T}}}\xspace}$ and |η|.
Charged component of the mini-isolation variable, defined as $I_{\Pl}^{\mathrm{charged}} = \sum_{\textrm{charged}} {\ensuremath{p_{\mathrm{T}}}\xspace}$.
Neutral component of the mini-isolation variable, corrected for PU effects with the effective areas, defined as $I_{\Pl}^{\mathrm{neutrals}} = \max \left( 0, \sum_{\mathrm{neutrals}} {\ensuremath{p_{\mathrm{T}}}\xspace}- \rho \, \mathcal{A} \, \left(\frac{R}{0.3}\right)^{2} \right)$.
Lepton-to-jet ${\ensuremath{p_{\mathrm{T}}}\xspace}$ ratio, ${\ensuremath{p_{\mathrm{T}}}\xspace}^{\mathrm{ratio}}$: the ratio of the transverse momentum of the lepton to that of the nearest jet, ${\ensuremath{p_{\mathrm{T}}}\xspace}^{\Pl}/{\ensuremath{p_{\mathrm{T}}}\xspace}^{\mathrm{jet}}$. If no jet associated to the lepton is present, this variable is set to $\frac{1}{1+\mathrm{I}_{\mathrm{rel}}}$. Related to this variable is the jet relative isolation, defined as $\frac{1}{{\ensuremath{p_{\mathrm{T}}}\xspace}^{\mathrm{ratio}}}-1$.
Lepton relative ${\ensuremath{p_{\mathrm{T}}}\xspace}$ variable: the component of the lepton momentum in direction transverse to the jet, ${\ensuremath{p_{\mathrm{T}}}\xspace}^{\mathrm{rel}} = p_{\Pl} \, \sin\theta$, where θ denotes the angle between the lepton and jet momentum vectors. If no jet associated to the lepton is present, this variable is set to zero.
Jet b-tagging score: the discriminant value of the DeepJet $\PQb$-tagging algorithm for the matched jet, described in section [sub:jetbtagging]. When such a jet is not present, the variable is set to zero.
Jet charged constituents: the number $N_{\mathrm{charged}}$ of charged particles within the matched jet. Tracks associated to those particles must be within $\Delta R < 0.4$ of the lepton and are required to come from the primary vertex to enter the counting. Minimal track quality, ${\ensuremath{p_{\mathrm{T}}}\xspace}$ and impact parameter criteria are also applied. If such a jet does not exist this variable is set to zero.
Impact parameters: the impact parameters of the lepton track with respect to the PV in the transverse plane, $d_{xy}$, and along the beam direction, $d_z$. The logarithm of these variables is used as input, to enhance the training of the BDT.
Significance of the impact parameter: the impact parameter, in three dimensions, of the lepton track with respect to the PV, divided by its uncertainty, which corresponds to its significance d/σd.
Electron MVA ID discriminator: the output of the MVA identification discriminator that separates electrons from jets, described in section [sec:electronreconstructionid]. This variable is only used for electrons.
Muon segment compatibility: the compatibility of track segments in the muon system with the pattern expected for a minimum ionizing particle. This variable is only used for muons.
The performance of this discriminator is evaluated in simulated events, using prompt electrons and muons as signal and non-prompt leptons as background. It is compared to the selections used in the $\PW\PW$ cross section analysis and in the search for $\PQt{}\PAQt{}\PQt{}\PAQt$ production , both by the CMS Collaboration. Non-prompt leptons contribute significantly to the two analyses; however, the source and ${\ensuremath{p_{\mathrm{T}}}\xspace}$ spectrum of these non-prompt leptons are not necessarily those of the $\PQt{}\PAQt{}\PH$ multilepton analysis. They are therefore shown here as benchmark points to compare the performance of the prompt-lepton MVA with other approaches based on rectangular cuts.
The $\PW\PW$ measurement uses the tight muon and tight cut-based electron identification criteria, with tighter cuts on the impact parameters of the leptons with respect to the primary vertex, and relative isolation with fixed cone sizes. The $\PQt{}\PAQt{}\PQt{}\PAQt$ analysis uses the medium muon identification criteria and a custom working point of the MVA-based electron identification criteria, with tight impact parameter cuts. Rectangular cuts are also imposed on the mini-isolation, ${\ensuremath{p_{\mathrm{T}}}\xspace}^{\mathrm{rel}}$ and ${\ensuremath{p_{\mathrm{T}}}\xspace}^{\mathrm{ratio}}$.
Efficiencies for signal and background muons passing the loose identification criteria and for signal and background reconstructed electrons, all of them with ${\ensuremath{p_{\mathrm{T}}}\xspace}> 25$$\,\text{Ge\hspace{-.08em}V}$, are shown in figure [fig:muonlepmvarocs]. The $\PQt{}\PAQt{}\PH$ multilepton curve includes the requirements used in the $\PQt{}\PAQt{}\PH$ analysis described in chapter [chap:ttH], for different cuts on the prompt-lepton MVA. The approach followed in the $\PQt{}\PAQt{}\PH$ multilepton analysis reduces the background acceptance by more than a factor of 5 for muons and almost a factor of 2 for electrons. This shows that significant gains can be obtained by using multivariate techniques in lepton identification.
The algorithms described in the previous section allow the selection of muons from specific sources with very high purity and efficiency. However, once the suitable selection is chosen, the muons passing it must be properly characterized. The lepton efficiency is typically calibrated by comparing the efficiency measured in data with that predicted by the simulation. This procedure brings an uncertainty that can be significant. These uncertainties are of the order of 1-2% per lepton; however, they can be higher for selections that depend on the topology of the event, such as isolation variables or the prompt-lepton MVA described in the previous section. For example, the uncertainty associated to this quantity in the $\PQt{}\PW$ analysis described in chapter [chap:topphysics] is among the leading ones. Additionally, this uncertainty was the dominant one in the $\PQt{}\PAQt{}\PH$ measurement performed with 2016 data .
This section provides a comprehensive review of the methods employed to measure these efficiencies, focusing on the different sources of uncertainty that affect the measurement. An emphasis is put on isolation, since all the analyses described in this thesis make use of it to discriminate prompt leptons from other sources, and it is more sensitive to the topology of the event.
Lepton efficiency calibrations are usually performed in two steps. In the first one, DY+jets events are used as a standard candle to measure the lepton efficiency using the tag-and-probe method . The efficiencies measured in data are then compared to the prediction of DY+jets simulations to obtain a set of scale factors. In the second step, the simulated events in the relevant regions of the analysis are corrected by these scale factors.
Each of the steps has associated systematic uncertainties stemming from the several assumptions that are made.
In the tag-and-probe method, the efficiency is measured in DY+jets events. The method is described in more detail in . In the following, we only focus on the systematic uncertainties.
The method profits from the fact that DY+jets events show a resonance at the $\PZ$ boson mass, which can be used to discriminate them from the various backgrounds. Events are collected using single lepton triggers. The lepton firing the trigger, referred to as the tag, is required to pass quality criteria to reject backgrounds. Another lepton, the probe, is required to be reconstructed passing looser selection criteria, which define the denominator of the efficiency measurement. The efficiency for the probe to pass a given selection can then be obtained by counting the number of events in which the probe passes or fails the selection.
However, possible backgrounds in the measurement must be subtracted, since their efficiency may be significantly smaller than that of the signal, biasing the measurement. To account for this, a fit is performed to the $m_{\Pl\Pl}$ distribution to disentangle the signal peak from the non-resonant background. The integral of the signal in the passing and failing categories can then be used. However, a model for both signal and background must be chosen.
This choice introduces a source of systematic uncertainty, as slightly different results could be obtained with a different signal-background model. Different approaches are followed in the electron and muon groups of the CMS Collaboration. The muon group uses the sum of two Voigtian functions as the signal hypothesis, and an error function with a multiplicative exponential term at high ${\ensuremath{m_{\Pl\Pl}}\xspace}$ for the background . The efficiencies for electrons are measured using a template constructed from DY+jets simulated events for the signal, and an analytical function for the background .
Systematic uncertainties are estimated by performing different variations of these models, and also by varying the signal-background composition of the sample. The former is done by considering different analytic functions, or different event generators when the templates are obtained from simulation. The latter is done by considering different selections for the tag lepton or by changing the $m_{\Pl\Pl}$ range that is fitted.
Other subtle effects can affect the measurement. Leptons may radiate photons as they are produced. This effect is more prominent for leptons with low ${\ensuremath{p_{\mathrm{T}}}\xspace}$, for which a secondary peak is observed in the $m_{\Pl\Pl}$ distribution. When measuring isolation efficiencies, this effect can be significant, as these leptons are usually less isolated because of the presence of the radiated photon itself. This radiation can be recovered and reassigned to the lepton 4-momentum, or the peak can be fitted and accounted for as signal or background, depending on the analysis in which the measurement is going to be used.
The next step in the correction of the data and MC discrepancies is to apply the scale factors derived with the tag-and-probe method in the measurement regions. This application is only valid if the distributions of the identification and isolation variables employed in the lepton selection are the same for leptons produced in the $\PZ$ boson decay and for signal leptons present in the measurement regions.
Identification variables are usually robust enough to be independent of the topology of the event. There are exceptions to this, since in cases in which two muons are produced collinearly the reconstruction may fail. The analyses treated in this thesis are not very sensitive to this problem, since leptons are required to be well separated. However, isolation variables can be affected by the presence of relatively close-by jets, yielding a bias due to the application of the scale factors in a region different from the one in which they were measured. We refer to this effect as phase space extrapolation.
The phase space extrapolation uncertainty can affect analyses in different manners. For analyses performed in a Drell-Yan-like topology, the bias introduced by this effect is reduced, while for analyses performed in boosted topologies, in which different objects can be merged with the lepton, this effect can be large. It must therefore be assessed on an analysis-by-analysis or topology-by-topology basis, and also depending on the specific lepton selection.
The lepton selection used in the $\PQt{}\PAQt{}\PH$ analysis is particularly sensitive to the topology of the event for two reasons. First, the prompt-lepton MVA requirements make use of isolation and of variables related to close-by jets. Second, the analysis relies on a very pure lepton selection, so quite strict criteria are applied. This could enhance any phase space extrapolation effect.
As described in chapter [chap:ttH], several working points are used in the analysis. Because of this, the efficiency is measured in two steps: the loose selection efficiency and the tight selection efficiency for leptons passing the loose selection are measured separately. The first step is not expected to have a significant phase space dependence, since only a mild selection is applied at that level. The effect of the phase space extrapolation is accounted for in the tight selection efficiency measured on loose leptons.
The approach followed is the following. Since the signal regions require a higher jet multiplicity than DY+jets events, the efficiencies are also measured in a region enriched in $\PQt{}\PAQt$ events. This allows the effect of additional jets in the event to be checked in a region with a reasonably large event rate. The nominal corrections in the analysis are nevertheless determined with the usual tag-and-probe measurement in DY+jets events, and the efficiency measurement in $\PQt{}\PAQt$ events is used to estimate the systematic uncertainty due to the phase space extrapolation.
Events are collected using single lepton triggers. A reconstructed $\Pe\PGm$ pair must be present in the event. When measuring muon efficiencies, the muon is required to pass the loose $\PQt{}\PAQt{}\PH$selection (probe), and the electron is required to pass the tight $\PQt{}\PAQt{}\PH$selection (tag). Conversely, when measuring electron efficiencies, the electron and the muon are required to pass the loose and tight $\PQt{}\PAQt{}\PH$requirements, respectively.
In order to have a selection enriched in $\PQt{}\PAQt$ events, events are also required to have at least two reconstructed jets, out of which at least one must be $\PQb$-tagged, following the selections described in chapter [chap:ttH]. Similarly to the $\PQt{}\PAQt{}\PH$ multilepton analysis, the contribution from low mass resonances is rejected by requiring ${\ensuremath{m_{\Pl\Pl}}\xspace}> 12$$\,\text{Ge\hspace{-.08em}V}$ for all pairs of loose leptons in the event.
The efficiency can be measured by comparing the number of events in which the probe passes and fails the tight selection criteria. However, the presence of non-prompt or misidentified leptons in the region must be taken into account; it can be very significant in events in which the probe lepton fails the selection. To this end, the efficiency is measured after the contribution from non-prompt leptons has been subtracted from the numerator and the denominator:
$$\epsilon = \frac{N^{\mathrm{data}}_{\mathrm{passing}}-N^{\mathrm{non-prompt}}_{\mathrm{passing}}}{N^{\mathrm{data}}_{\mathrm{total}}-N^{\mathrm{non-prompt}}_{\mathrm{total}}}.$$
The estimation of the non-prompt contribution is then carried out as usually done in several top physics analyses. This approach profits from the fact that the charge of a non-prompt lepton is usually uncorrelated with the charge of the prompt lepton in semileptonic $\PQt\PAQt$ events, which are the main source of non-prompt leptons in this region. The contribution of non-prompt leptons can then be written as
$$N^{\mathrm{non\text{-}prompt}}_{\mathrm{OS}} = \left(N^{\mathrm{data}}_{\mathrm{SS}} - N^{\mathrm{prompt}}_{\mathrm{SS}}\right) \times R^{\mathrm{OS}}_{\mathrm{SS}},$$
where the subscript SS (OS) denotes the number of events with same-sign (opposite-sign) leptons. $R^{\mathrm{OS}}_{\mathrm{SS}}$ is defined as the ratio between the numbers of opposite-sign and same-sign events in processes with non-prompt leptons. This quantity, as well as the subtracted contribution of processes with prompt leptons, is taken from simulated events.
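As an illustration, the two equations above can be combined as in the following minimal Python sketch; the function name and inputs are hypothetical, and the yields are assumed to be computed upstream from data and simulation.

\begin{verbatim}
def tight_efficiency(n_pass_os, n_total_os,
                     n_pass_ss, n_total_ss,
                     n_pass_ss_prompt, n_total_ss_prompt,
                     r_os_ss):
    """Tight-selection efficiency after subtracting the non-prompt
    contribution, estimated from the same-sign (SS) sideband."""
    # Non-prompt yields in the opposite-sign (OS) region (second equation)
    np_pass  = (n_pass_ss  - n_pass_ss_prompt)  * r_os_ss
    np_total = (n_total_ss - n_total_ss_prompt) * r_os_ss
    # Efficiency with the non-prompt contamination removed (first equation)
    return (n_pass_os - np_pass) / (n_total_os - np_total)
\end{verbatim}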
Since the final aim is to assess to what extent the scale factors obtained from the tag-and-probe measurements are suitable to be applied in $\PQt\PAQt$ events, the efficiency measured in data is compared to the efficiency predicted by simulation in that region, after the latter has been corrected by the scale factors from the tag-and-probe measurements.
The measurement is performed separately for electrons and muons as a function of pT. Unfortunately, the event rate in the same-sign sideband is not large enough to allow a measurement binned in η and pT simultaneously. The results of the measurement are shown in figure [fig:muonclosurelepeff], separately for electrons and muons, as a function of pT, and separately for each year of data taking. The difference between the two efficiencies is used as the systematic uncertainty for the tight lepton selection in the $\PQt{}\PAQt{}\PH$ multilepton analysis described in chapter [chap:ttH].
Hadronically decaying $\PGt$ leptons, $\PGt_\mathrm{h}$, are reconstructed using the hadron-plus-strips algorithm . This algorithm reconstructs $\PGt_\mathrm{h}$ in its various decay modes: one-prong decays ($h^{\pm}$, $h^{\pm}+\PGpz$, $h^{\pm}+2\PGpz$), two-prong decays ($h^{\pm}h^{\mp}$, $h^{\pm}h^{\mp}+\PGpz$, $h^{\pm}h^{\mp}+2\PGpz$) and three-prong decays ($h^{\pm}h^{\mp}h^{\pm}$, $h^{\pm}h^{\mp}h^{\pm}+\PGpz$), where $h$ denotes a charged pion or kaon. Neutral pions decay into photons that may convert into $\Pep\Pem$ pairs with a high probability. Therefore, neutral pions are reconstructed by clustering the photon and electron constituents of the jet. $\PGt_\mathrm{h}$ candidates are constructed by combining the reconstructed pions with the charged components of the jet, according to the modes described.
$\PGt_\mathrm{h}$ candidates are then discriminated from the quark and gluon jet background, muons, and electrons using a convolutional DNN , referred to as “DeepTau v2.1”. This algorithm has a significantly better performance than other methods used in CMS, achieved by combining high-level features of the reconstructed $\PGt_\mathrm{h}$ with low-level information from the particle flow candidates within the $\PGt_\mathrm{h}$ isolation cone. It has been trained using $\PQt\PAQt$ and $\PW$+jets simulated samples. The output of the DNN consists of three nodes discriminating against jets, electrons and muons. Working points are defined based on the score of each node: 8 in the discrimination against jets, 8 against electrons and 4 against muons.
[chap:topphysics] In this chapter, the measurements of single top quark production in association with a $\PW$ boson ($\PQt{}\PW$) performed with $\Pp\Pp$ collision data at $\sqrt{s}$ = 13 $\,\text{Te\hspace{-.08em}V}$ collected by the CMS experiment during the year 2016 are presented. This dataset allowed for a high precision measurement of this production mode, and opens the way for measurements of this process in other kinematic regimes.
This chapter is organized as follows. In section [sec:topintroduction], the measurements of $\PQt{}\PW$ production are motivated in the general context of measurements of top quark properties at the LHC. A review of the current status of theory calculations for these processes is also given. The following sections cover the measurements of the $\PQt{}\PW$ production cross section, both inclusively and as a function of several kinematic variables. The event selection for these measurements is described in section [sec:eventselection], and the models used to estimate the signal and background contributions to these regions are described in section [sec:mcsimulations]. Section [sec:systuncertainties] reviews the systematic uncertainties that are taken into account in these measurements. Section [sec:inclusive] presents the methodology used in the measurement of the inclusive $\PQt{}\PW$ production cross section and the results obtained, while section [sec:differential] covers the differential measurements. The chapter closes with the conclusions in section [sec:twconclusions].
The study of the properties of the top quark is, in its own right, one of the main fields of study in particle physics. The top quark was predicted by Kobayashi and Maskawa to explain CP violation , but it was not observed until 1995, by the D0 and CDF experiments . With a mass $m_{\PQt}$ of 172.9 ± 0.4 $\,\text{Ge\hspace{-.08em}V}$, as measured by the CMS and ATLAS experiments at the LHC and the experiments at the Tevatron , it is the heaviest fundamental particle in the SM.
Its large mass has two implications that motivate the study of top quark physics. Firstly, due to the large mass difference between the top quark and its decay products, and despite the fact that this decay is mediated by the weak force, the top quark decays promptly, before any hadronization can take place. Because of this, it is the only quark that can be studied in a free state, and one of the few available probes of perturbative QCD (pQCD).
Additionally, its high mass is a direct consequence of its large coupling to the Higgs boson. This Yukawa coupling is the only one of order unity in the SM. Because of this, the top quark plays a special role in the spontaneous breaking of the EWK symmetry. It also implies that the top quark induces large radiative corrections to the mass of the Higgs boson, and may affect the stability of the vacuum at high energy scales . For these reasons, the top quark plays an important role in many models that aim to enforce the naturalness of the SM. For instance, most scenarios of SUSY predict a scalar top with a mass at the electroweak scale that cancels the large radiative corrections to the Higgs mass.
The top quark is produced in different modes in $\Pp\Pp$ collisions at LHC energies, each of which has a different production rate and allows different physical aspects to be studied.
The dominant production mode is the production of top quark-antiquark pairs ($\PQt\PAQt$), for which some of the leading order diagrams are shown in figure [fig:ttbardiagrams]. The cross section for this process in $\Pp\Pp$ and $\Pp\PAp$ collisions is shown in figure [fig:ttbarcrosssecction]. This high cross section allows the properties of this process to be measured very precisely. $\PQt\PAQt$ production was dominated by quark-antiquark annihilation ($\PQq\PAQq$) in $\Pp\PAp$ collisions at Tevatron energies, while at the LHC gluon-gluon fusion ($\Pg\Pg$) dominates, with a contribution of 85%, followed by $\PQq\PAQq$ interactions.
The top pair production cross section has been calculated at NNLO accuracy in the perturbative series of αs for all the production modes: $\PQq\PQq$ (including $\PQq\PQq$, $\PQq\PAQq$, $\PQq'\PQq$ and $\PQq'\PAQq$), $\PQq\Pg$ and $\Pg\Pg$. To take into account the radiation of additional soft partons at all orders, which leads to large logarithmic terms, these are resummed at NNLL. At $\sqrt{s}=$ 13 TeV, and taking μR = μF = mt, the NNLO+NNLL prediction for the cross section in $\Pp\Pp$ collisions is 832 pb, using the CT14nnlo PDF set to model the structure of the proton. The uncertainty on the calculation due to missing higher orders is assessed by considering independent variations of the μR and μF scales between mt/2 and 2mt, leading to an uncertainty of 2-3%. The uncertainties associated with the modeling of the structure of the proton are also propagated to the cross section, leading to an uncertainty of 4% .
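As an illustration of the scale variation procedure, the envelope over the independent μR/μF variations can be computed as in the sketch below; all numerical values are illustrative placeholders, not the outputs of the actual calculation.

\begin{verbatim}
# Cross sections (pb) from independent muR/muF variations by factors
# of 0.5 and 2 around the nominal; all numbers are illustrative.
sigma_nominal = 832.0
sigma_variations = [815.0, 848.0, 820.0, 845.0, 812.0, 851.0]

up = max(sigma_variations) - sigma_nominal
down = sigma_nominal - min(sigma_variations)
print("scale uncertainty: +{:.1%} / -{:.1%}".format(
    up / sigma_nominal, down / sigma_nominal))
\end{verbatim}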
Predictions of the differential cross section are available in the form of various high precision calculations. These predictions are necessary for many phenomenological studies in proton collider physics, including the study of top quark properties, but also searches for BSM physics and Higgs physics, among others. Most of these predictions rely on the narrow width approximation (NWA), in which the width of the top quark is assumed to be zero, allowing the production and decay of the top quark pair to be factorized. There are automatic MC generators that generate events at NLO accuracy, which allow a broad variety of observables to be predicted . Additionally, calculations at NNLO accuracy for stable top quarks exist for a limited set of observables . For decayed top quarks, the best fixed-order predictions combine production and decay calculations, and are usually quoted as $\hat{\text{N}}$NLO .
Several observables are sensitive to off-shell effects, which are not taken into account in the NWA, such as those aiming to reconstruct the top mass . Fixed-order calculations at NLO exist for this process, including the top quark decays. A full NLO prediction matched to parton showers is not trivial to obtain, and has only been achieved for the dilepton channel . This aspect, of great importance when discussing the interference between $\PQt{}\PW$ and $\PQt\PAQt$ processes, is further discussed in section [sub:twprocess].
The top pair production mode led to the discovery of the top quark at the Tevatron collider by the D0 and CDF Collaborations . Both at the Tevatron and at the LHC, this process is studied in the different decay channels of the top quark. The top quark can decay either hadronically ($\PQq\PQq'\PQb$) or to a triplet of lepton, neutrino and $\PQb$ quark ($\Pl\PGn\PQb$). These decays are always mediated by a $\PW$ boson and, since the $V_{\PQt\PQb}$ element dominates over $V_{\PQt\PQs}$ and $V_{\PQt\PQd}$, a $\PQb$ quark appears in the decay in most cases. Measurements of this process are performed targeting the different decay modes of the top quark.
The dilepton decay mode provides a clean signature with two leptons in the final state, two $\PQb$ jets due to the two $\PQb$ quarks produced in the top decays, and missing transverse momentum due to the two neutrinos. This is a clean topology that can be easily triggered on because of the presence of the two leptons, keeping a reasonably large fiducial region. Backgrounds in this topology include $\PQt{}\PW$ production, which is an irreducible background, and Drell-Yan events, which can be suppressed by requiring jets, $\PQb$-tagged jets, vetoing the $\PZ$ boson peak, or studying events with different flavor leptons.
The single lepton final state corresponds to cases in which one of the top quarks decays leptonically and the other one hadronically. This signature, which has a larger background contamination from multijet production, has a higher branching ratio and allows for a complete, unambiguous reconstruction of the $\PQt\PAQt$ system, due to the presence of only one invisible particle in the final state. This enables precise differential measurements as a function of the top quark kinematics.
Finally, the fully hadronic channel is the one with the highest branching fraction. Although all the objects in the final state are visible, the overwhelming multijet background makes precision measurements in this channel challenging.
When I joined the CMS Collaboration, I contributed to measurements of the inclusive $\PQt\PAQt$ cross section in $\Pp\Pp$ collisions at $\sqrt{s}$ = 7 and 8 TeV in the dilepton channel , performing a cross-check of the main analysis, following the signal extraction used by the ATLAS Collaboration. This measurement had a precision comparable to that of the main CMS analysis, and was more precise than the state-of-the-art calculations. I also contributed to the first measurement of the inclusive $\PQt\PAQt$ production cross section performed in Run 2 of the LHC at $\sqrt{s}=$ 13 TeV . This result was one of the first LHC results establishing the validity of the SM at this energy scale, which was then accessible for the first time.
The inclusive $\PQt\PAQt$ production cross section was later measured in this channel with a precision of 4% by the CMS Collaboration and 2.5% by the ATLAS Collaboration , both compatible with the state-of-the-art prediction, and with a precision exceeding that of the calculation.
Additionally, CMS and ATLAS have performed precision measurements of differential observables in the single lepton and dilepton channels .
These measurements show a generally good agreement with the NLO simulations. However, these models fail to predict the top quark pT spectrum, which could be partially explained by missing higher orders in the calculations, as the discrepancy diminishes when considering higher order corrections .
Additionally, the production of top quark-antiquark pairs allows measurements of the top quark mass and the strong coupling αs, and is sensitive to the Yukawa coupling of the Higgs boson to the top quark .
Top quarks can also be produced at the LHC in processes mediated by the electroweak interaction. In most of these production modes, the top quark is produced singly. At the LHC, single top production is dominated by the so-called t-channel, s-channel and $\PQt{}\PW$-channel. The production cross section of these processes is one order of magnitude lower than that of $\PQt\PAQt$, since they are mediated by the electroweak interaction and partons from the quark sea are present in the initial state. Leading order diagrams for these processes are shown in figure [fig:feynmandiagramsingletop]. These processes involve the $\PW\PQt\PQb$ vertex. Contributions from the $\PW\PQt\PQd$ and $\PW\PQt\PQs$ vertices are present, but their effect is negligible. Contributions from flavor-changing neutral currents (FCNC) would also affect some of these processes, but are highly suppressed in the SM. Because of this, they may be an excellent probe for BSM physics.
t-channel production is the single top process with the highest cross section at the LHC, with a value of $217.0^{+6.6}_{-4.6}\,\mathrm{(scale)} \pm 3.5\,(\mathrm{PDF}+\alpha_s)$ pb at NLO in the 5FS at $\sqrt{s}=$ 13 TeV . Measurements of this process can help constrain the PDFs of the proton through inclusive and differential cross section measurements. Further constraints can be achieved with measurements of the ratio between the production rates of top quark and top antiquark processes. Finally, since top quarks are expected to be almost 100% polarized in the t-channel, measurements of this process are sensitive to any contribution that may alter the Lorentz structure of the coupling between the top quark, the $\PQb$ quark and the $\PW$ boson.
The second production channel in terms of production rate at the LHC is the associated production of a top quark and a $\PW$ boson, with an expected rate of 71.7 ± 1.8 (scale) ± 3.4 (PDF) pb at $\sqrt{s}=13$ TeV, calculated at approximate NNLO . The measurement of this process is also a probe of the $V_{\PQt\PQb}$ coupling. However, one of its most important features is its interference with $\PQt\PAQt$ production at NLO, so the process is only well defined at Born level. This process is the main topic of this chapter and is described in greater detail in section [sub:twprocess].
The s-channel production mode is very rare at the LHC, due to the need for an antiquark from the quark sea in the initial state, and the searches performed by the time this thesis was written had not allowed for an observation of the process in $\Pp\Pp$ collisions. However, this process was observed at the Tevatron, as its production cross section in $\Pp\PAp$ collisions is higher. The process is sensitive to new particles such as a charged Higgs boson or a $\PW'$ boson.
Besides the main top quark production modes described above, there are other modes that have a smaller cross section but whose study remains interesting, as they are probes of the couplings of the top quark to other particles and, potentially, of BSM physics. Thanks to the luminosity delivered by the LHC during Run 1 and, particularly, Run 2, the study of these processes has become possible in the past years.
These processes include the associated production of a top quark-antiquark pair with a boson (${\ensuremath{\PQt{}\PAQt{}\PH}\xspace}$, $\PQt{}\PAQt{}\PZ$, ${\ensuremath{\PQt{}\PAQt{}\PW}\xspace}$ and ${\PQt{}\PAQt{}\PGg}$), and the production of a single top quark associated with a Z boson, ${\ensuremath{\PQt{}\PZ{}\PQq}\xspace}$, with a Higgs boson, ${\ensuremath{\PQt{}\PH{}}\xspace}$, or with a Higgs and a $\PW$ boson, ${\ensuremath{\PQt{}\PH{}\PW}\xspace}$. $\PQt{}\PAQt{}\PH$, ${\ensuremath{\PQt{}\PH{}}\xspace}$ and ${\ensuremath{\PQt{}\PH{}\PW}\xspace}$ production will be covered in greater detail in chapter [chap:ttH]. Similarly, ${\ensuremath{\PQt{}\PAQt{}\PW}\xspace}$ and $\PQt{}\PAQt{}\PZ$ production will also be covered in that chapter, as they represent the largest irreducible backgrounds in measurements of $\PQt{}\PAQt{}\PH$, $\PQt{}\PH{}$ and $\PQt{}\PH{}\PW$ production.
$\PQt{}\PZ{}\PQq$ production has been observed for the first time by the ATLAS and CMS Collaborations in $\Pp\Pp$ collisions at $\sqrt{s}= 13$ TeV. This process includes contributions from the coupling of the top quark to the $\PZ$ boson, but also from the $\PW\PW\PZ$ coupling. Additionally, it is also sensitive to FCNC contributions that may appear in BSM scenarios .
The study of $\PQt{}\PAQt{}\PQt{}\PAQt$ production is very valuable because it allows tests of calculations involving high orders of QCD. Additionally, this process is also affected by the Yukawa coupling of the top quark to the Higgs boson. Both ATLAS and CMS have performed searches for this process in collision data; however, its tiny cross section of 12 fb makes these studies very challenging. Therefore, with the current amount of data collected, the precision of these studies is not enough to claim evidence for this process .
The associated production of a single top quark with a $\PW$ boson is the main topic of this chapter. This process is mediated by the electroweak interaction and is sensitive to the $V_{\PQt\PQb}$ element of the CKM matrix. The process is trivially defined at Born level; however, it interferes with $\PQt\PAQt$ production at NLO, and overlap removal methods must be applied to obtain a consistent definition of this process at NLO. This aspect is discussed in section [sub:twttbarinterference].
Evidence for $\PQt{}\PW$ production at 7 TeV was reported by the CMS and ATLAS experiments, and its observation was achieved with $\sqrt{s}=$ 8 TeV $\Pp\Pp$ collisions by those collaborations . Both the ATLAS and CMS Collaborations performed measurements of this process in the dilepton channel, obtaining results consistent with the SM. At $\sqrt{s}$ = 13 TeV, the ATLAS Collaboration has performed a measurement of the inclusive cross section of the process with a luminosity of 3.2 fb$^{-1}$ , and a differential measurement with 36.1 fb$^{-1}$ , both compatible with the approximate NNLO predictions. The CMS Collaboration has measured this process in collisions at $\sqrt{s}$ = 13 TeV with an integrated luminosity of 35.9 fb$^{-1}$. The production cross section has been measured both inclusively and differentially . These measurements constitute the contribution of this thesis to the study of the $\PQt{}\PW$ process and are described in sections [sec:eventselection]-[sec:differential].
The $\PQt{}\PW$ process can also be used to probe for new interactions at higher energy scales. The measurement done in , in the same topology and using similar analysis techniques to those of the analysis shown in this chapter, allows limits to be placed on the Wilson coefficients of additional terms in the Lagrangian.
One of the most remarkable properties of the $\PQt{}\PW$ process is its interference with $\PQt\PAQt$ events, which is described in this section. A more detailed description can be found in .
The interference occurs when computing observables at NLO accuracy in perturbation theory for the $\PQt{}\PW$ process. At NLO, the $\PQt + \PW + \PQb$ final state becomes accessible, which coincides with the final state of the $\PQt\PAQt$ process. This is shown in figure [fig:twttbarinterference], in which diagrams associated with $\PQt\PAQt$ production and with $\PQt{}\PW$ production in association with an additional $\PQb$ quark are depicted. In the former, the two top quark lines are resonant, while in the latter only one of the two is resonant. The contribution of the first kind of diagrams is larger than that of the second, which introduces a difficulty in the modeling of the $\PQt{}\PW$ process at NLO, since the NLO corrections are typically larger than the Born contribution.
Three approaches exist to remove the overlap between the doubly and singly-resonant contributions and obtain a consistent definition of the $\PQt{}\PW$ process, separated from $\PQt\PAQt$. The complete squared matrix element can be written as
$$|\mathcal{M}_{\PQb\PQb 2\Pl 2\PGn}|^2 = |\mathcal{M}_{\mathrm{singly}}|^2 + |\mathcal{M}_{\mathrm{doubly}}|^2 + 2\Re\left(\mathcal{M}_{\mathrm{singly}}^*\mathcal{M}_{\mathrm{doubly}}\right),$$
where $\mathcal{M}_{\mathrm{singly\,(doubly)}}$ represents the matrix element associated with the singly (doubly)-resonant diagrams.
[fig:twttbarinterference]
In the Diagram Removal (DR) method , only the $|\mathcal{M}_{\mathrm{singly}}|^2$ term is kept, neglecting the contributions from the interference term and the doubly-resonant diagrams. This approach violates gauge invariance; in practice, however, little dependence on the gauge choice is observed. In an alternative approach, called Diagram Removal 2 (DR2) , the terms $|\mathcal{M}_{\mathrm{singly}}|^2 + 2\Re(\mathcal{M}_{\mathrm{singly}}^*\mathcal{M}_{\mathrm{doubly}})$ are used to define the $\PQt{}\PW$ process. A third approach exists, called Diagram Subtraction (DS) , in which a subtraction term $\mathcal{M}_{\mathrm{DS}}$ is added at the level of the squared matrix element, chosen so that $\mathcal{M}_{\mathrm{DS}} - \mathcal{M}_{\mathrm{doubly}}$ vanishes when $m_{\PAQb\PW}^2 \to m_t^2$. This provides a gauge invariant construction, but only allows for a local subtraction of the interference.
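The three prescriptions amount to keeping different pieces of the squared matrix element above. The following schematic sketch makes the bookkeeping explicit; the complex amplitudes are placeholders, not an actual matrix-element computation, and the DS subtraction term is only represented symbolically.

\begin{verbatim}
def squared_me(m_singly, m_doubly, scheme, m_ds=0j):
    """Schematic combination of amplitude pieces for DR, DR2 and DS."""
    s2 = abs(m_singly) ** 2
    d2 = abs(m_doubly) ** 2
    interference = 2.0 * (m_singly.conjugate() * m_doubly).real
    if scheme == "DR":   # keep the singly-resonant piece only
        return s2
    if scheme == "DR2":  # keep the interference as well
        return s2 + interference
    if scheme == "DS":   # full result minus a local subtraction term
        return s2 + d2 + interference - abs(m_ds) ** 2
    raise ValueError("unknown scheme: " + scheme)
\end{verbatim}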
In order to have a complete description of this interference, NLO calculations of the $pp\to\Plp\PGn\Plm\PGn\PQb\PAQb$ process, dubbed bb4l, taking into account off-shell effects, are needed. Such calculations exist in the five-flavor scheme (5FS), which assumes $\PQb$ quarks to be massless. However, this set-up leads to divergences in the $\Pg\to\PQb\PAQb$ splitting, which can only be handled for observables with two hard jets, and is therefore not suitable to explore phase spaces in which one of the jets is vetoed.
The current state of the art for the bb4l process is a calculation that includes off-shell contributions, NLO corrections to production, decay and their interference, and matching to parton showers. This calculation has been implemented in the powheg-box-res framework, and is fully described in reference . It provides an exact treatment of the interference at NLO accuracy. Additionally, since the calculation is performed in the four-flavor scheme (4FS), with massive $\PQb$ quarks, it allows phase space regions with jet vetoes to be modeled.
From the experimental viewpoint, $\PQt\PAQt$ and $\PQt{}\PW$ measurements are typically designed to be robust against these effects, which are evaluated by comparing the different methods to handle the interference. However, searches for scalar partners of the top quark target phase spaces with a significant contribution from off-shell top quarks. In such phase spaces, the interference effects cannot be avoided and may constitute a significant source of systematic uncertainty.
Dedicated measurements of these quantum interference effects have been performed by the ATLAS Collaboration in $\Pp\Pp$ collisions. The measurement is performed in the eμ channel in a topology with $\PQb$ jets, and an observable sensitive to off-shell effects is explored. Figure [fig:atlastwwwbb] shows the distribution of this observable in data compared to the full bb4l prediction and to the $\PQt\PAQt$+$\PQt{}\PW$ predictions with the DS, DR and DR2 methods. All the generators model the data well in the bulk of the distribution, where interference effects are not expected to be dominant. The $\PQt\PAQt$+$\PQt{}\PW$ predictions slightly deviate from the data in the phase space where interference effects are relevant, while this phase space is properly modeled by the bb4l prediction. This measurement can also be exploited to measure the top quark width with high precision .
In this section, the general analysis strategy is described. Two different measurements are considered in this chapter: the inclusive and the differential measurement of the $\PQt{}\PW$ production cross section. Both analyses are performed in the dilepton ${\ensuremath{\Pe^\pm}\xspace}{\ensuremath{\PGm^\mp}\xspace}$ channel, which is the one with the smallest background contribution.
In this final state, the signal is characterized by the presence of an electron, a muon and a $\PQb$-tagged jet. The main challenge of the analysis is the overwhelming presence of $\PQt\PAQt$ events across the whole phase space.
Top quark pairs are produced with a cross section more than 10 times higher than the signal, and present a similar final state, containing only one additional $\PQb$ jet compared to the signal. A significant fraction of $\PQt\PAQt$ events will lose one jet because of the limited acceptance of the detector or because the jet is below the energy threshold.
Different approaches are followed to tackle this background in the inclusive and differential analyses: the inclusive analysis makes use of a likelihood fit to several signal and control regions, defined by the jet and $\PQb$-tagged jet multiplicity of the event and a BDT score, while the differential analysis is designed as a counting experiment performed in a region enriched as much as possible in signal. However, both analyses rely on the different jet multiplicities of the two processes, and on variables derived from them, as the main discriminating handle.
Additional identification criteria, corresponding to the “tight” selection described in section [sec:muoidentification], are applied to muons identified by the PF algorithm, to efficiently select muons produced in prompt $\PW$, $\PZ$, or $\PGt$ decays. Additionally, in order to reject muons not produced in such decays, muons are required to be isolated, with a relative isolation of less than 15% within an isolation cone of size 0.4.
Reconstructed electrons are required to pass the “cut-based tight” identification criteria, described in section [sec:electronreconstructionid]. Electron candidates in the transition region, fulfilling 1.442 < ∣ηSC∣ < 1.566, are rejected, where ηSC denotes the pseudorapidity of the ECAL supercluster.
Both electrons and muons in the event are required to have small impact parameters measured with respect to the primary vertex and to be isolated from the rest of the physics objects, to reject leptons produced in $\PQb$ quark decays.
Jets follow the definition in [sec:jetdefinition] and are required to have a pT of at least 30 GeV. Since leptons are usually clustered as jets by the anti-$k_{\mathrm{T}}$ algorithm, jets within ΔR < 0.4 of a selected lepton are not included in the counting. $\PQb$-tagged jets are identified using the CSVv2 algorithm, described in section [sub:jetbtagging]. Finally, ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ is constructed using all the particles reconstructed by the PF algorithm and coming from the primary vertices. Momentum corrections to the energy of the jets are propagated to ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$.
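A minimal sketch of the lepton-jet cross-cleaning just described is shown below, assuming simple objects with pt, eta and phi attributes; the names are illustrative.

\begin{verbatim}
import math

def delta_r(a, b):
    """Angular distance, with the azimuthal difference wrapped to [-pi, pi]."""
    dphi = math.atan2(math.sin(a.phi - b.phi), math.cos(a.phi - b.phi))
    return math.hypot(a.eta - b.eta, dphi)

def clean_jets(jets, leptons, min_pt=30.0, dr_cut=0.4):
    """Keep jets above threshold that do not overlap with a selected lepton."""
    return [j for j in jets
            if j.pt > min_pt
            and all(delta_r(j, l) >= dr_cut for l in leptons)]
\end{verbatim}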
In order to capture jets that are slightly below the energy acceptance, and achieve a better discrimination between $\PQt\PAQt$ and $\PQt{}\PW$ events, “loose” jets, defined as those fulfilling the criteria above but with a pT threshold of 25 GeV and a pT below 30 $\,\text{Ge\hspace{-.08em}V}$, are used.
Events are collected with a set of double and single lepton triggers to maximize the signal efficiency. The double lepton triggers require the presence of a muon (electron) with pT greater than 23 (12) GeV and an electron (muon) with pT greater than 12 (8) GeV. In addition, other triggers that only require the presence of one muon (electron) with pT greater than 24 (27) GeV are used. This strategy is followed in order to increase the precision of the measurement of the trigger efficiency, which is performed following the orthogonal trigger method, described in section [sub:cmstriggerefficiencymeasurements].
Events collected by these triggers are required to have a pair of oppositely charged leptons with pT greater than 25 (20) GeV for the leading (sub-leading) lepton. The two leading leptons are required to be an electron and a muon. Events with ${\ensuremath{m_{\Pl\Pl}}\xspace}< 20$ $\,\text{Ge\hspace{-.08em}V}$ are rejected to suppress the contribution from low mass resonances. Specific filters are applied to reject events with anomalous ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ due to detector noise, cosmic rays and other sources.
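Putting the requirements of this subsection together, a minimal sketch of the dilepton event selection could look as follows; the object attributes are assumptions about an upstream event model, not part of the actual analysis code.

\begin{verbatim}
def passes_dilepton_selection(leptons, m_ll):
    """leptons: selected leptons sorted by decreasing pt, each with
    pt, charge and flavor ('e' or 'mu') attributes (assumed upstream)."""
    if len(leptons) < 2:
        return False
    lead, sub = leptons[0], leptons[1]
    if lead.pt <= 25.0 or sub.pt <= 20.0:        # leading (sub-leading) pt
        return False
    if lead.charge * sub.charge >= 0:            # opposite charge
        return False
    if {lead.flavor, sub.flavor} != {"e", "mu"}: # e-mu pair
        return False
    return m_ll >= 20.0                          # reject low-mass resonances
\end{verbatim}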
The modeling of several kinematic variables is checked in events fulfilling the selection described above. Figure [fig:twkinvars] shows the data distributions of several kinematic variables, compared to the MC predictions with their corresponding uncertainties, showing a good agreement between the two.
After this selection, categories are defined based on the jet (nj) and $\PQb$-tagged jet ($n_{\PQb}$) multiplicities of the events. The (nj, $n_{\PQb}$) distribution of selected events is shown in figure [fig:njetnb], compared to the predictions. Each bin contains a different amount of signal and background events, and these bins are therefore used to define the regions of interest of the analysis. In particular, the region with exactly one jet, which is $\PQb$-tagged (1j1b), is the most signal enriched. The region with exactly two jets, of which exactly one is $\PQb$-tagged (2j1b), is less pure but retains a significant part of the signal. On the other hand, the region with exactly two jets, both $\PQb$-tagged (2j2b), is completely dominated by $\PQt\PAQt$ events, and is used to constrain this background.
The inclusive measurement is performed simultaneously with events in the 1j1b and 2j1b regions. The 2j2b region is also included in the signal extraction fit to constrain the $\PQt\PAQt$ background and the uncertainties associated with it.
The differential measurement is performed only with events in the 1j1b region. Additionally, in order to obtain a region purer in signal, only events with no additional “loose” jets are considered. The distribution of the number of “loose” jets is shown in figure [fig:njetnb] for events in the 1j1b region, illustrating the discrimination power of this variable.
Simulated events are used to predict the data yields in the measurement regions of these analyses.
The $\PQt{}\PW$ signal is simulated at NLO using powheg v1 , with the NNPDF 3.0 PDF set . The DR approach described in section [sub:twttbarinterference] is used to take into account the interference. Sets of simulated events using the DS approach are also used to determine the uncertainty associated with the modeling of the interference.
$\PQt\PAQt$ events are simulated using powheg v2 , which is also used to infer the dependence on the PDFs and on the factorization and renormalization scales. This sample is normalized to the NNLO+NNLL calculation obtained in .
Drell–Yan and $\PW$+jets background events are simulated at NLO with MadGraph5_amc@nlo v2.2.2 with NNPDF 3.0 PDFs. These processes are simulated with additional partons, and the FxFx scheme is used for merging . Contributions from $\PW\PW$, $\PW\PZ$ and $\PZ\PZ$ (denoted as VV) are simulated with pythia v8.205 at LO. Finally, $\PQt{}\PAQt{}\PW$ and $\PQt{}\PAQt{}\PZ$ production are simulated at NLO precision with MadGraph5_amc@nlo.
All these samples, except $\PQt\PAQt$, are interfaced to pythia v8.205 with the CUETP8M1 underlying event tune to simulate parton shower and hadronization. For $\PQt\PAQt$ events, the CUETP8M2T4 tune is used instead.
The contribution from leptons not coming from prompt $\PW$, $\PZ$, or $\PGt$ decays is expected to be dominated by $\PQt\PAQt$ and $\PW$+jets events. The simulations described above are used to model them.
Measurements of the $\PQt{}\PW$ cross section are affected by several sources of systematic uncertainty. Each source is evaluated by performing consistent variations of the signal and background estimations within the estimated uncertainty. Two types of systematic uncertainties are taken into account: those stemming from the calibration or characterization of the reconstructed physics objects, and those due to the unknowns in the models, either from their intrinsic assumptions or from the limited knowledge of fundamental parameters.
Uncertainties due to the trigger and lepton reconstruction efficiencies are estimated by varying the scale factors within their corresponding uncertainties, which are obtained following the method described in section [sub:cmstriggerefficiencymeasurements]. Three variations are applied independently: trigger, electron and muon efficiency uncertainties.
The uncertainty due to the limited knowledge of the jet energy scale and resolution is assessed by varying the scale and resolution of the jets within their uncertainties, as measured in bins of pT and η . Even though there are several sources of uncertainty affecting the jet energy scale, the quadratic sum of all of them is taken into account as a single variation. This assumption is valid since all jets selected in the analysis, both for signal and $\PQt\PAQt$, belong to the same kinematic regime.
Uncertainties resulting from the $\PQb$ tagging efficiency and misidentification rate are determined by varying the $\PQb$ tagging data-to-simulation scale factors for $\PQb$ and light jets, respectively .
The inelastic $\Pp\Pp$ cross section, which is used to estimate the distribution of the number of pile-up interactions in data and to correct the MC accordingly, is varied within its uncertainty of ± 4.6% .
The uncertainty in the total integrated luminosity is estimated to be 2.5% .
Simulated events used to estimate the signal and background contributions rely on underlying assumptions that may have an impact on the analysis. This is assessed by considering dedicated simulated samples of the $\PQt\PAQt$ and $\PQt{}\PW$ processes, generated by varying the relevant parameters of the standard powheg+pythia simulations.
The uncertainty from missing higher orders in the calculation of the hard-production process is assessed by considering variations of μR and μF by factors of 2 and 0.5 relative to their nominal values. The uncertainty associated with the knowledge of the proton structure is obtained by reweighting the sample of simulated events with the 100 NNPDF3.0 replicas. For each bin of the fitted distributions, the root-mean-square of the 100 replicas is taken as this source of uncertainty. These two variations are used to estimate the acceptance and the distributions of signal and backgrounds. The total production rate of $\PQt\PAQt$ is assigned a 5% uncertainty due to the two effects .
The interference between $\PQt{}\PW$ and $\PQt\PAQt$ production is handled by considering the difference between the DR and DS prescriptions.
To account for the uncertainties in the parton shower and jet fragmentation modeling, several aspects are taken into account:
Underlying event: several pythia parameters are tuned according to measurements of the underlying event . These parameters are varied in simulated events according to their uncertainties.
Matching between matrix element and parton shower: the uncertainty associated with the matching between the matrix element and parton shower calculations in powheg is considered by varying the damping parameter, $h_{\mathrm{damp}} = 1.58^{+0.66}_{-0.59}\,m_{\PQt}$, in simulated events .
Initial and final state radiation. The PS scale used to simulate the initial and final state radiation is varied up and down by a factor of two.
Color reconnection: the effect of multiple parton interactions and the parameterization of color reconnection has been studied in and is varied in simulated events. Two alternative models of color reconnection are also compared with the default pythia model. The first is a model with string formation beyond leading color, and the second a model in which the gluons can be moved to another string . All the models are tuned to measurements sensitive to color reconnection. The uncertainty is taken as the largest difference among the variations in each bin of the distributions.
Semileptonic B hadron decays: the semileptonic B hadron branching fraction is varied according to the differences between the pythia semileptonic branching fractions and the Particle Data Group values .
$\PQb$ hadron fragmentation: the fragmentation of $\PQb$ quarks into $\PQb$ hadrons is varied within the uncertainties of the Bowler-Lund fragmentation function, tuned to data measured by the ALEPH and DELPHI Collaborations . In addition, the difference between the Bowler-Lund and Peterson fragmentation functions is considered. The largest difference between these two variations is taken as the uncertainty.
The mismodeling of the top quark pT spectrum in $\PQt\PAQt$ events, described in section [sec:topintroduction], is also taken into account by considering simulated events generated with the nominal powheg model and reweighted so that the top quark pT spectrum matches that of data. The difference between this prediction and the nominal powheg prediction is considered. This effect was observed to have no impact on the inclusive analysis, and is not taken into account there. In the differential analysis, this correction is relevant when determining the pT distributions of the leptons. In that case, the reweighted distribution is taken as the central value, and the difference with respect to the nominal one as the systematic uncertainty.
Finally, backgrounds other than $\PQt\PAQt$ are assigned a 50% uncertainty. This uncertainty accounts for the limited knowledge of the cross sections of these processes, as well as for extrapolations from the full phase space to the phase space in which this analysis is performed.
In the inclusive analysis, these uncertainties are taken into account as independent nuisance parameters in the fit. For the uncertainties affecting the signal, the effect on the extrapolation from the full to the fiducial phase space is factorized from the effect on the efficiency and the distribution shapes. This effect is taken into account as a separate nuisance parameter that cannot be constrained in the fit. The numbers of expected $\PQt\PAQt$ and $\PQt{}\PW$ events in these regions are shown in table [tab:twyield].
As noted above, the signal regions defined in the analysis have a dominant contribution of $\PQt\PAQt$ events, and only the 1j1b and 2j1b regions have a significant contribution of signal, with purities of the order of 20% and 5%, respectively.
The distinctive feature of $\PQt{}\PW$ with respect to $\PQt\PAQt$ in the 1j1b region is that in $\PQt\PAQt$ events one jet is outside the acceptance or does not pass the pT or identification requirements. Therefore, the “loose” jet multiplicity distribution can be employed as a discriminating variable between signal and background. However, as seen in figure [fig:njetnb], its discriminating power does not allow a region fully enriched in signal to be constructed.
The topology of the 2j1b region is the one expected in $\PQt\PAQt$ events in which one of the $\PQb$ jets has not been tagged as such. This occurs in a non-negligible fraction of cases due to the limited efficiency of the $\PQb$ tagging methods. However, there is a significant contribution from signal, corresponding to cases in which $\PQt{}\PW$ is produced in association with additional partons. In these cases, the additional jet is expected to be softer, since it comes from radiation or from a diagram line corresponding to an off-shell top quark.
Since in the 1j1b and 2j1b regions there is no single variable that easily discriminates between $\PQt\PAQt$ and $\PQt{}\PW$, dedicated multivariate discriminators are employed in each of the two categories to obtain regions purer in signal. Even with these techniques, the presence of $\PQt\PAQt$ events still dominates over the signal, and the uncertainties associated with the estimation of this background are the dominant ones in the analysis. To further constrain them, the 2j2b region is also included in the analysis. The total cross section measurement is then performed through a likelihood fit to the event yields in bins of the discriminator distributions in the 1j1b and 2j1b regions, exploiting the discrimination power of the whole shape, and to the yields of the sub-leading jet pT distribution in the 2j2b region, which allows the uncertainties associated with the jet energy scale to be slightly constrained.
To improve the discrimination, boosted decision trees (BDTs) with gradient boosting are used; a minimal training sketch is shown after the list of inputs below. The input variables used in the 1j1b region are:
pTof the leading “loose” jet, set to 0 for events with no “loose” jets;
magnitude of the vector sum of the pT’s of the leptons, the jet and ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$(${\ensuremath{p_{\mathrm{T}}}\xspace}^{sys}$);
pTof the jet;
ratio of the scalar sum of the pTof the leptons to the scalar sum (HT) of the pT’s of the leptons, the jet and ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$;
number of “loose” jets;
centrality (ratio between the scalar sum of the pTand the total momentum) of the jet and two leptons;
magnitude of the vector sum of the pTof the jet and leptons;
HT;
ratio of ${\ensuremath{p_{\mathrm{T}}}\xspace}^{sys} $ and HT of the event;
invariant mass of the combination of the leptons, jet and ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$;
number of $\PQb$-tagged “loose” jets.
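A minimal training sketch using gradient boosting, here with scikit-learn's GradientBoostingClassifier as a stand-in for the actual CMS tooling, is given below; the input files and hyperparameter values are hypothetical.

\begin{verbatim}
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# X: one column per input variable listed above; y: 1 for tW signal,
# 0 for ttbar background (arrays assumed to be prepared upstream).
X = np.load("features_1j1b.npy")   # hypothetical file
y = np.load("labels_1j1b.npy")     # hypothetical file
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                 learning_rate=0.1)
bdt.fit(X_train, y_train)

# Compare score distributions in training and test samples to look
# for overtraining, as done in the figures referenced below.
train_scores = bdt.decision_function(X_train)
test_scores = bdt.decision_function(X_test)
\end{verbatim}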
These variables aim to discriminate by profiting from the different topologies and kinematics expected in $\PQt\PAQt$ and $\PQt{}\PW$ events. Variables related to the “loose” jet multiplicity or kinematics aim to recover the jet that has not been selected in $\PQt\PAQt$ events. Other variables, such as ${\ensuremath{p_{\mathrm{T}}}\xspace}^{sys}$, aim to be sensitive to this missing jet via the momentum imbalance it would induce in the total system. Finally, others, such as the pT of the jet or HT, are sensitive to the higher energy present in the $\PQt\PAQt$ system in comparison with the $\PQt{}\PW$ system. The distributions of these variables are shown in figures [fig:twinput1j1b1]-[fig:twinput1j1b2].
For the 2j1b region, the following variables are used as input to the BDT:
ΔR between the dilepton and dijet systems, $\Delta R({\ensuremath{\Pe^\pm}\xspace}{\ensuremath{\PGm^\mp}\xspace},j_1j_2)$;
ΔR between the dilepton system and the system formed by the two jets and ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$, $\Delta R({\ensuremath{\Pe^\pm}\xspace}{\ensuremath{\PGm^\mp}\xspace},j_1 j_2{\ensuremath{{\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}}\xspace})$;
pTof the sub-leading jet;
ΔR between the leading lepton and the leading jet, $\Delta R(\Pl_1,j_1)$.
The sub-leading jet pT is included in the BDT because this distribution is expected to be softer in signal than in $\PQt\PAQt$ events. The ΔR distributions of the various systems are expected to differ slightly between signal and $\PQt\PAQt$ events due to the higher total energy of the $\PQt\PAQt$ system.
The distribution of these variables is shown in figure [fig:twinput2j1b], featuring a good agreement between data and prediction.
Some of the hyperparameters of the two BDTs are shown in table [tab:twbdthyp]. The normalized distributions of the BDT scores in training and testing simulated events are presented in figure [fig:twbdtsdist], showing the discrimination power of the variables and their generalization from the training to the test sample. The presented distributions do not show any sign of overtraining.
The distributions of the BDT scores in data are compared to the predictions in figure [fig:twbdtsstacks], showing a good agreement between data and the predictions.
Even if the number of observed events is not expected to be a limitation for this analysis, the number of available simulated events is limited. In particular, the alternative samples used to estimate the systematic uncertainties are significantly smaller than the nominal ones. Fluctuations of these samples could lead to an overestimation of the uncertainty, as well as to instabilities in the signal extraction fit. In order to ensure that each bin of the signal extraction contains enough simulated events for a precise estimation of both the expected yields and the associated uncertainties, a specific binning is chosen for these distributions. In particular, for the BDT distributions, the quantiles of the background distribution are taken as the bin limits. This ensures that all bins contain approximately the same number of simulated background events, which is the optimal way to reduce the statistical uncertainties associated with the limited number of simulated events. For the sub-leading jet pT, the binning is chosen according to the expected detector resolution.
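A minimal sketch of the quantile-based binning, assuming an array of background BDT scores from simulation, could be:

\begin{verbatim}
import numpy as np

def quantile_bins(bkg_scores, n_bins):
    """Bin edges at the quantiles of the background score distribution,
    so each bin holds roughly the same number of simulated background
    events (unweighted version, for simplicity)."""
    edges = np.quantile(bkg_scores, np.linspace(0.0, 1.0, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # absorb under/overflow
    return edges
\end{verbatim}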
The observed data in the distributions used for the signal extraction are shown in figure [fig:twsignalextraction], together with the expectations and their associated uncertainties.
The signal is then extracted with a maximum likelihood fit to the yields in these distributions. Besides the nuisance parameters that parameterize the systematic uncertainties described in section [sec:systuncertainties], the fit model includes a signal-strength parameter, $\mu_{{\ensuremath{\PQt{}\PW}\xspace}} = \sigma_{{\ensuremath{\PQt{}\PW}\xspace}}/\sigma_{{\ensuremath{\PQt{}\PW}\xspace}}^{exp}$, that is unconstrained in the fit. This parameter defines the scaling of the signal with respect to the value predicted by the SM. The best fit for $\mu_{{\ensuremath{\PQt{}\PW}\xspace}}$ is obtained by maximizing the likelihood, and the 68% confidence interval is obtained by considering variations of the test statistic described in [sub:statistics] by one unit from its minimum.
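The structure of such a fit can be illustrated with a toy binned Poisson likelihood; the yields below are placeholders and nuisance parameters are omitted, so this is only a sketch of the μ extraction and of the Δ(-2 ln L) = 1 interval.

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize_scalar

obs = np.array([120.0, 85.0, 40.0])  # observed counts (illustrative)
sig = np.array([20.0, 15.0, 10.0])   # expected signal per bin
bkg = np.array([100.0, 70.0, 28.0])  # expected background per bin

def nll(mu):
    """Poisson negative log-likelihood, dropping mu-independent terms."""
    lam = mu * sig + bkg
    return np.sum(lam - obs * np.log(lam))

fit = minimize_scalar(nll, bounds=(0.0, 5.0), method="bounded")

# 68% interval: points where -2 ln L rises by one unit from the minimum
scan = np.linspace(0.0, 3.0, 601)
delta = 2.0 * (np.array([nll(m) for m in scan]) - fit.fun)
inside = scan[delta <= 1.0]
print("mu = {:.2f} [{:.2f}, {:.2f}]".format(fit.x, inside[0], inside[-1]))
\end{verbatim}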
The best fit for the $\PQt{}\PW$signal-strength parameter is 0.88 ± 0.02 (stat) ± 0.09 (syst) ± 0.03 (lumi), corresponding to a measured cross section of 63.1 ± 1.8 (stat) ± 6.3 (syst) ± 2.1 (lumi) pb, consistent with the expectation.
Figure [fig:twsignalextractionpostfit] shows the data observed in the various signal regions as well as the predictions after the fit was performed, setting the $\mu_{{\ensuremath{\PQt{}\PW}\xspace}}$ parameter to its postfit value, showing a good compatibility between data and the statistical model.
Table [tab:twsystimpacts] shows the impact of each source of systematic uncertainty in the analysis. The impact of a given source is obtained by comparing the uncertainty of the nominal fit with that of a fit performed with the associated nuisance parameters fixed to their postfit values. The alternative fit has, by construction, the same best fit values as the nominal one, but a smaller uncertainty, since the fixed parameters no longer play a role. The quadratic difference between the uncertainties of the nominal and the alternative fits is taken as the impact of the source. The statistical uncertainty is obtained by fixing all the nuisance parameters to their postfit values.
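In terms of a formula, the impact is the quadratic difference between the two fit uncertainties, as in the following snippet; the numbers are illustrative only.

\begin{verbatim}
import math

def impact(sigma_nominal, sigma_fixed):
    """Quadratic difference between nominal and fixed-nuisance fits."""
    return math.sqrt(max(sigma_nominal**2 - sigma_fixed**2, 0.0))

# e.g. a total uncertainty on mu of 0.095, reduced to 0.060 with one
# group of nuisance parameters fixed (illustrative values)
print(impact(0.095, 0.060))
\end{verbatim}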
The uncertainty is dominated by the trigger and lepton efficiencies and the luminosity. The size of these uncertainties is due to the dominant presence of the $\PQt\PAQt$ background in the signal and control regions: a small uncertainty in $\PQt\PAQt$ events is amplified when propagated to the signal.
This section describes the measurement of the differential $\PQt{}\PW$ production cross section. This measurement is performed in the 1j1b region described in section [sub:evtselection], which is the one with the highest signal purity. While other differential cross section measurements in phase spaces dominated by $\PQt\PAQt$ background have made use of multivariate discriminants , this measurement uses purely object multiplicity requirements to define the measurement region.
This approach has three advantages. First, it does not introduce strong assumptions on the distribution of the signal in a multivariate discriminant, reducing the model dependence of the measurement. Second, it allows the definition of a fiducial region consistent with the region employed in the measurement, making the assumptions on the extrapolation to the full phase space explicit in the method. Finally, this method allows the cross section to be measured as a function of any variable, provided that the signal purity in all the bins is large enough for a sensitive measurement.
The following variables are studied in this analysis:
the pT of the highest-pT lepton;
the jet pT;
the difference in the azimuthal angle of the muon and the electron, $\Delta\phi({\ensuremath{\Pe^\pm}\xspace},{\ensuremath{\PGm^\mp}\xspace})$;
the longitudinal momentum component of the system formed by the muon, the electron and the jet of the event, $p_z({\ensuremath{\Pe^\pm}\xspace},{\ensuremath{\PGm^\mp}\xspace},j)$;
the invariant mass of the system formed by the electron, the muon and the jet $m({\ensuremath{\Pe^\pm}\xspace},{\ensuremath{\PGm^\mp}\xspace}, j)$;
the transverse mass of the system formed by the electron, the muon, the jet and the missing transverse momentum $m_T({\ensuremath{\Pe^\pm}\xspace},{\ensuremath{\PGm^\mp}\xspace}, j,{\ensuremath{{\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}}\xspace})$.
The first three variables provide general information on the kinematic properties of the $\PQt{}\PW$ system. In particular, the first two are sensitive to the mismodeling of the top quark pT. The $\Delta\phi({\ensuremath{\Pe^\pm}\xspace},{\ensuremath{\PGm^\mp}\xspace})$ variable allows angular correlations between the two leptons to be explored and spin-related properties to be measured. The $p_z({\ensuremath{\Pe^\pm}\xspace},{\ensuremath{\PGm^\mp}\xspace},j)$ variable is a proxy of the total boost of the system, and provides sensitivity to the production mechanism. The last two, the invariant and transverse masses, are sensitive to the total energy and mass of the $\PQt{}\PW$ system.
In order to achieve the maximum signal purity, the measurement is performed in events in the 1j1b region, described in section [sub:evtselection], without reconstructed “loose” jets.
The observed number of events as a function of the observables under study is shown in figure [fig:twdiffdists1], together with the $\PQt{}\PW$ prediction from the powheg model.
Since this selection still has a significant contribution from $\PQt\PAQt$ events, it is important to check the modeling of this process in a phase space close to the measurement region. To do so, a control region is defined by considering events in the 1j1b region with additional “loose” jets in the final state. The observed distributions in this region are compared with the predictions in figure [fig:twdiffcontrol1], showing a good agreement between data and the predictions.
The fiducial region is constructed using particle level objects. A summary of the main requirements applied to these objects is given here; the complete definition can be found in a dedicated reference . Particle level objects are required to have a lifetime greater than 30 ps.
Particle level charged leptons are defined as those produced in prompt $\PW$, $\PZ$, or $\PGt$ decays. In particular, leptons produced in the decay of heavy hadrons are not taken into account. These leptons are dressed with photons within a cone of ΔR < 0.1.
Particle level jets are defined by clustering all stable particles with the anti-$k_{\mathrm{T}}$ algorithm with a jet cone parameter of R = 0.4. Neutrinos are excluded from this clustering, as well as prompt leptons and photons. $\PQb$ jets are defined as those jets that contain a decayed $\PQb$ hadron.
The fiducial region is defined as events with one particle-level muon with ${\ensuremath{p_{\mathrm{T}}}\xspace}> 20$ GeV and ∣η∣ < 2.4, and one electron with ${\ensuremath{p_{\mathrm{T}}}\xspace}> 20$ GeV, ∣η∣ < 2.4 and either ∣η∣ < 1.442 or ∣η∣ > 1.566. Events are also required to have one particle-level jet with ${\ensuremath{p_{\mathrm{T}}}\xspace}> 30$ GeV and ∣η∣ < 2.4, and no additional jets with ${\ensuremath{p_{\mathrm{T}}}\xspace}>20$ GeV and ∣η∣ < 2.4. The definitions of the particle level objects and of the fiducial region are summarized in tables [tab:twparticlelevel] and [tab:twfiducialregion], respectively.
To unfold the results to particle level, the approach described in section [sub:unfolding] is followed.
The response matrices are obtained using the $\PQt{}\PW$ signal simulations described in section [sec:mcsimulations]. The usage of response matrices based on simulations introduces uncertainties associated with the calibrations derived for them and with mismodeling effects. To account for this, replicas of the response matrix are considered, varying each of the systematic uncertainties affecting the signal, described in section [sec:systuncertainties]. The response matrices of the variables under study are shown in figure [fig:twresponse]. As mentioned in section [sub:unfolding], the binning of these matrices has been optimized to ensure sufficient stability and purity. The condition numbers of the matrices, shown in table [tab:twconditionnumber], are of order unity. The choice of τ, the regularization parameter, is made by scanning the L-curve as described in [sub:unfolding]. No significant difference is seen in the result between applying and not applying the regularization terms, which is consistent with the condition numbers of the response matrices. Because of this, the unbiased maximum likelihood estimator is used.
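For a well-conditioned response matrix, the unregularized maximum likelihood unfolding is straightforward; the following sketch uses a toy 2x2 response matrix and illustrative yields.

\begin{verbatim}
import numpy as np

# R[i, j]: probability for an event generated in particle-level bin j
# to be reconstructed in detector-level bin i (toy values).
R = np.array([[0.80, 0.15],
              [0.20, 0.85]])
reco_sig = np.array([410.0, 290.0])  # background-subtracted data (toy)

print("condition number:", np.linalg.cond(R))  # O(1): no regularization

# For Gaussian bin uncertainties and an invertible response matrix, the
# unregularized maximum likelihood estimate reduces to matrix inversion.
unfolded = np.linalg.solve(R, reco_sig)
\end{verbatim}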
For a given variable X, the absolute differential $\PQt{}\PW$ cross section in a given bin j of X is computed from the number of unfolded signal events $N_j^{sig,unf}$ as
$$\left(\frac{d\sigma}{dX}\right)_j = \frac{1}{\mathcal{L}}\frac{N_j^{sig,unf}}{\Delta_j},$$
where Δj is the width of bin j and $\mathcal{L}$ is the integrated luminosity. In order to profit from the cancellation of systematic uncertainties, the normalized differential cross section is obtained by dividing by the fiducial cross section (the sum over all bins of the absolute differential cross section multiplied by the bin widths).
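A direct implementation of this formula and of the normalization step might look as follows; the argument names are illustrative.

\begin{verbatim}
import numpy as np

def differential_xsec(n_unfolded, bin_edges, lumi):
    """Absolute and normalized differential cross sections from the
    unfolded signal yields, following the formula above."""
    widths = np.diff(bin_edges)
    dsigma = n_unfolded / (lumi * widths)   # absolute d(sigma)/dX
    fiducial = np.sum(dsigma * widths)      # fiducial cross section
    return dsigma, dsigma / fiducial        # normalized result
\end{verbatim}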
Uncertainties associated to the background subtraction are taken into account by considering suitable variations of the systematic uncertainties.
To propagate the systematic uncertainties to the cross section measurement, the measurement is repeated using variations of the systematic uncertainties in both the background subtraction and the response matrix. Response matrices and background subtraction are varied simultaneously for sources that affect both signal and backgrounds in a correlated way.
The normalized differential $\PQt{}\PW$ cross sections as a function of the observables under study are shown in figure [fig:twunfolded]. Fair agreement within the uncertainties is observed with respect to the powheg DR, powheg DS and MadGraph5_amc@nlo predictions. The leading systematic uncertainties affecting each bin of the distributions are shown in figure [fig:twunfoldeduncertainties]. The leading uncertainties in these measurements are the jet energy scale and the trigger and lepton efficiencies, whose impact is large due to their effect on the $\PQt\PAQt$ background estimation.
Measurements of the $\PQt{}\PW$ production cross section have been performed in $\Pp\Pp$ collision events at $\sqrt{s}=$ 13 TeV collected with the CMS detector during 2016, corresponding to an integrated luminosity of 35.9 fb$^{-1}$. The measurements are performed in events with a ${\ensuremath{\Pe^\pm}\xspace}{\ensuremath{\PGm^\mp}\xspace}$ pair and at least one $\PQb$-tagged jet.
These measurements are intimately related to high precision measurements of the $\PQt\PAQt$ process, since the two processes interfere and are therefore irreducible backgrounds to each other. In the case of $\PQt{}\PW$ measurements, a precise modeling of $\PQt\PAQt$ events is crucial, due to its larger cross section.
The inclusive $\PQt{}\PW$ production cross section is measured making use of multivariate techniques that allow the signal to be discriminated from $\PQt\PAQt$ events. The signal is extracted by performing a maximum likelihood fit to the distribution of events in the scores of these discriminants. The inclusive cross section is measured to be 63.1 ± 1.8 (stat) ± 6.3 (syst) ± 2.1 (lumi) pb, consistent with the SM prediction. This measurement has been published in .
Normalized differential $\PQt{}\PW$ production cross sections are also measured in a fiducial region enriched in signal events. The measurements are performed as a function of several properties of the event: the transverse momentum of the leading lepton; the transverse momentum of the jet; the difference in the ϕ angle between the muon and the electron; the longitudinal momentum of the system formed by the muon, the electron and the jet; the invariant mass of the muon, the electron and the jet; and the transverse mass of the electron, the muon, the jet, and the ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$.
The dominant uncertainties in these measurements are experimental, such as the jet energy scale and the trigger and lepton efficiencies, which have a significant impact on the final results due to their effect on the modeling of $\PQt\PAQt$ events.
Both the inclusive and differential measurements are consistent with the various NLO predictions. This result has been released in .
[sec:twconclusions]
This chapter covers searches for new physics in events with two opposite-sign same-flavor (OSSF) leptons and missing transverse momentum in the final state. The search is performed in $\Pp\Pp$ collision events at $\sqrt{s}=$ 13 TeV recorded by the CMS detector in 2016, 2017 and 2018.
The chapter is organized as follows. Section [sec:susysignals] covers the characteristics of the topology in which the search is performed, the potential signals featuring this topology, and the status of these searches at the time this thesis was written. The simulated datasets used in the search are briefly described in section [sec:susydataset]. The event selection and signal region (SR) definitions are described in section [sec:susyregions], and the background estimation methods are covered in section [sec:susybackgrounds]. The results of the search are shown in section [sec:susyresults], and their interpretation in the context of supersymmetric simplified models in section [sec:susyinterpretation].
Due to the complexity of the research performed in high energy physics, the work shown in this chapter has been done in collaboration with groups from UCSD, RWTH Aachen, ETH Zürich, CERN and IFCA, all within the CMS Collaboration. Besides leading the development of the full Run 2 analysis, my personal contributions include the development of the likelihood discriminant and the definition and tuning of various signal regions in both the electroweak and strong production modes. I also redesigned the factorization method used to estimate the flavor-symmetric background in the full Run 2 analysis, and performed the signal extraction for some of the considered models.
This search looks for supersymmetric processes in which an opposite-sign same-flavor pair of leptons and missing transverse momentum are produced. As mentioned in chapter [chap:theory], momentum imbalance is one of the characteristics of R-parity conserving SUSY models, in which an invisible LSP is present in the final state. In such models, SUSY particles are produced in pairs, either via the strong interaction, producing squarks or gluinos, or via the electroweak interaction, producing charginos, neutralinos and sleptons.
Leptons may appear in the decay chains of these SUSY particles. Although the branching ratios to final states with leptons are typically smaller than those to hadronic final states in many SUSY models, leptonic final states are relatively easy to trigger on and are affected by limited backgrounds. In particular, searches with leptons in the final state are the most sensitive ones for electroweak SUSY production, and are complementary to searches for strong production.
An OSSF pair can be produced in the decay of a second neutralino, either through an intermediate slepton or through a $\PZ$ boson:
$$\PSGczDt \to \Plpm \PSlmp \to \Plpm\Plmp\PSGczDo$$
$$\PSGczDt \to \PZ\PSGczDo \to \Plpm\Plmp\PSGczDo.$$
A similar topology may occur in gauge-mediated SUSY breaking (GMSB) scenarios, in which the gravitino, the superpartner of the hypothetical graviton, is the LSP:
$$\PSGczDo \to \PZ\PXXSG \to \Plpm\Plmp\PXXSG.$$
Different kinematic features occur in each of these decay modes, depending on the mass splitting of the neutralinos. In the decay modes mediated by a $\PZ$ boson, if this mass splitting is larger than the $\PZ$ boson mass, the resonant contribution dominates and an excess of events containing an on-shell $\PZ$ boson is observed. Otherwise, the dilepton mass distribution exhibits a kinematic endpoint whose position depends on the mass splitting of the neutralinos. This feature is referred to as an edge.
The $\PSGczDt$ may have been produced directly or in the decays of strongly produced SUSY particles. This determines the topology of the event, as additional objects may be produced in the final state. In this search, several signal regions are defined targeting the different topologies.
Another possible topology with OSSF leptons is the following. Since SUSY particles in R-parity conserving models are produced in pairs, an opposite-sign same-flavor lepton pair can also arise in the direct pair production of sleptons:
$$\Pp\Pp \to \PSlpm\PSlmp \to \Plpm\Plmp\PSGczDo\PSGczDo.$$
In this case, since the two leptons are produced in different decay chains, they are largely uncorrelated and no resonance-like features are observed in the $m_{\Pl\Pl}$ distribution.
Several simplified models involving these topologies are considered to define and optimize the signal regions of the analysis, as well as to interpret the results.
As mentioned above, processes with SUSY particles and a $\PZ$ boson in the final state can occur in the decay of a neutralino, which may have been produced via the strong or the electroweak interaction.
To study strong production, a simplified model inspired by GMSB SUSY is considered. This model will be referred to as the GMSB scenario, and features direct gluino pair production. Each of the gluinos decays into a pair of quarks and a $\PSGczDo$, which further decays into a $\PZ$ boson and the massless gravitino. Considering the leptonic decay of one of the $\PZ$ bosons and the hadronic decay of the remaining one, events in this topology feature an OSSF lepton pair, six jets, and missing transverse momentum due to the gravitinos. The kinematic properties of these objects depend on the gluino and $\PSGczDo$ masses, which are the parameters of the model. The Feynman diagram for this process is shown in figure [fig:feynmansusygsmb].
Two electroweak production models are considered: direct neutralino pair production ($\PSGczDo\PSGczDo$) and the associated production of a neutralino and a chargino ($\PSGczDt\PSGcpmDo$).
In the first case, a model with a massless gravitino is considered . The phenomenology of the model is then fully determined by the $\PSGczDo$ mass. The topology we are interested in appears when the $\PSGczDo$ is higgsino-like, as a gaugino-like $\PSGczDo$ would decay into $\PGg\PXXSG$. We consider a set of approximately mass-degenerate higgsino-like states, while the rest of the sparticles are decoupled. In this model, the heavier higgsino-like states decay promptly into the $\PSGczDo$, emitting additional low energy partons that can be neglected. The effective total cross section for $\PSGczDo\PSGczDo$ is then dominated by ${\ensuremath{\Pp\Pp}\xspace}\to \PSGcz_i\PSGcz_j$ with i, j = 1, 2 and ${\ensuremath{\Pp\Pp}\xspace}\to \PSGcpDo\PSGcmDo$, with all the modes contributing similarly.
The branching fractions of the $\PSGczDo$ depend on its higgsino content, as well as on its mass. Below and slightly above the kinematic threshold for $\PZ$ or Higgs boson production, the decay into photons dominates. Here, we restrict ourselves to two extreme cases. In the first one, the neutralino always decays into a $\PZ$ boson and the massless gravitino. In this case, the final state presents an OSSF lepton pair and two jets, produced in the decays of the two $\PZ$ bosons, and missing transverse momentum due to the gravitinos. In the second scenario, the neutralino decays half of the time into a $\PZ$ boson and a gravitino and half of the time into a Higgs boson and a gravitino. In this scenario, 50% of the events have a $\PH\PZ$ final state, 25% a $\PZ\PZ$ final state, and the remaining 25% a $\PH\PH$ final state, all of them with ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ in the final state. Feynman diagrams for these two scenarios are shown in figure [fig:feynmansusyneuneu]. The free parameter of this model is the mass of the neutralino.
In the $\PSGczDt\PSGcpmDo$ production model, the second neutralino decays into the lightest neutralino and a $\PZ$ boson, while the chargino decays into a $\PW$ boson and the lightest neutralino. Targeting the hadronic decay of the $\PW$ boson, the final state shows an OSSF lepton pair, two jets and ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$. A Feynman diagram for this model is shown in figure [fig:feynmansusycharneu]. The parameters of this model are the mass of the chargino and second neutralino, which are assumed to be degenerate, and the mass of the lightest neutralino.
The direct production of a squark and its subsequent decay into a quark and a neutralino can induce the edge topology:
$$\label{eq:susy_feyn_edge}
{\PSQ}\to \PQq\PSGczDt \to \PQq \Plpm\Plmp\PSGczDo.$$
The decay of the $\PSGczDt$ can occur in the two ways described above. If the on-shell $\PZ$ boson is not accessible because of the small mass splitting between the $\PSGczDt$ and the $\PSGczDo$, two situations can occur.
If $m_{\PSGczDt} - m_{\PSGczDo} < m_{\PZ}$ and $m_{\PSGczDt} - m_{\PSGczDo} < m_{\PSl}$, the only possible decay mode is through an off-shell $\PZ$ boson. Then, the $m_{\Pl\Pl}$ distribution shows a kinematic endpoint at $m_{\Pl\Pl}^{edge}$, given by
$$m_{\Pl\Pl}^{edge} = m_{\PSGczDt} - m_{\PSGczDo}.$$
If $m_{\PSl} < m_{\PSGczDt} - m_{\PSGczDo} < m_{\PZ}$, the decay mediated by a slepton is kinematically allowed and proceeds through two sequential two-body decays. In that case, the kinematic endpoint is given by
$$m_{\Pl\Pl}^{edge} =\frac{\sqrt{\left(m^2_{\PSGczDt} -m_{\PSl}^2 \right)\left(m_{\PSl}^2 - m^2_{\PSGczDo} \right)}}{m_{\PSl}}.$$
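As a purely illustrative numerical example (masses chosen for this example only): for $m_{\PSGczDt} = 300$ $\,\text{Ge\hspace{-.08em}V}$, $m_{\PSGczDo} = 100$ $\,\text{Ge\hspace{-.08em}V}$ and $m_{\PSl} = 200$ $\,\text{Ge\hspace{-.08em}V}$, the slepton-mediated decay gives
$$m_{\Pl\Pl}^{edge} = \frac{\sqrt{(300^2 - 200^2)(200^2 - 100^2)}}{200}\ {\ensuremath{\,\text{Ge\hspace{-.08em}V}}\xspace}\approx 194\ {\ensuremath{\,\text{Ge\hspace{-.08em}V}}\xspace},$$
to be compared with the 200 $\,\text{Ge\hspace{-.08em}V}$ endpoint of the off-shell $\PZ$ case, so the two decay modes predict measurably different edge positions for the same neutralino masses.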
This kind of model is particularly interesting because, if a signal were to be observed, the position of the edge would give direct information on the masses of the sparticles produced in the decay chain. Figure [fig:susyfeynmant6bb] shows the Feynman diagram for the simplified model considered, the so-called slepton-edge model. It features the production of a squark pair, with the subsequent decay described in reaction [eq:susyfeynedge]. In the search, two variations of this model are considered: one in which the squark is a sbottom and another in which it is a light flavor squark. In this model, the mass of the $\PSGczDo$ is assumed to be 100 $\,\text{Ge\hspace{-.08em}V}$ and the mass of the slepton to be $0.5(m_{\PSGczDt} + m_{\PSGczDo})$. The squark and $\PSGczDt$ masses are free parameters of the model.
This analysis also includes a search for the direct production of a pair of sleptons (selectrons and smuons). In the model considered, a pair of left-handed or right-handed sleptons is produced, and each of them subsequently decays into a lepton of the same flavor and a $\PSGczDo$, the LSP in this model. A Feynman diagram for this process is shown in figure [fig:susyfeynmanslept]. Both the slepton and $\PSGczDo$ masses are free parameters of the model.
I have contributed to three publications of SUSY searches performed in this final state with Run 2 data. The first set of $\Pp\Pp$ collisions at 13 TeV, corresponding to 2.3 fb$^{-1}$ collected during 2015 with CMS, allowed a search for strong production models, profiting from the increase in cross section at the higher center-of-mass energy . This search was then repeated with the luminosity collected during 2016, including electroweak production models . The electroweak production limits obtained in the latter publication were combined with those of other decay channels, resulting in an increased sensitivity .
Additionally, CMS released a search for direct production of sleptons with the same dataset , using similar selections and background estimation techniques.
Similar searches were also performed in CMS with Run 1 data: at 8 TeV, searches for strong and electroweak production of SUSY in these topologies were performed. A combination of electroweak production searches was also performed in .
The searches covered in this chapter are the evolution of the three searches mentioned above, together with the direct slepton pair production search. The search is performed in $\Pp\Pp$ collisions collected during 2016, 2017 and 2018, with a total integrated luminosity of 137.2 fb$^{-1}$. This dataset allows for higher sensitivity to lower cross section signals, both because of the reduction of the statistical uncertainties and because of the refinement of the background estimation and rejection techniques.
Although the background estimation techniques for the main backgrounds are based on data, simulated events for specific processes are also used. These are needed to validate the data-driven background prediction techniques, to model smaller backgrounds not covered by these techniques, and to estimate the contribution from potential signals in the analysis.
Most background processes are simulated using the same generators as described in section [sec:mcsimulations]. Dedicated simulated samples are used for each of the years of data taking, in order to account for the different detector geometry and pileup conditions. The pythia tune used for the 2017 and 2018 simulations is CP5 , while in 2016 CUETP8M1 is used for events and CUETP8M2T4 for the rest. The contribution from $\PGg$+jets events in the photon control regions is simulated at LO with the MadGraph5_amc@nlo generator.
For this analysis, since a larger contribution from $\PZ\PZ$ processes is expected, a more careful treatment of this process is performed, modeling $\PQq\PAQq\to\PZ\PZ$ and $\Pg\Pg\to\PZ\PZ$ separately. $\PQq\PAQq\to\PZ\PZ$ is simulated at NLO using powheg 2.0, and generator-level pT-dependent k-factors are applied to take into account NNLO/NLO differences . $\Pg\Pg\to\PZ\PZ$ is simulated using mcfm 7.0 at LO, and is normalized to NLO calculations .
Signals are simulated at LO precision with the MadGraph5_amc@nlo generator, with up to two additional partons in the matrix element calculation. Events are then interfaced with pythia v8.212 for fragmentation and hadronization. The detector simulation is performed using the CMS fast simulation package . Signal simulations are normalized to NLO+NLL cross section calculations .
In this section, the definition of the signal and control regions is described.
Muons and electrons are the most important objects for this analysis. The driving principle of the muon and electron selection is to keep a high efficiency while keeping the muon and electron efficiencies similar to each other. This is done to enhance the performance of the flavor-symmetric background estimation methods, described in section [sec:susybackgrounds].
Muons are required to pass the “medium” selection criteria, described in section [sec:muoidentification]. Electrons are required to pass an MVA-based selection (described in section [sec:electronreconstructionid]), designed to keep a high efficiency and a low acceptance rate for electrons in jets. Muons and electrons in the transition region of the ECAL are rejected, to ensure that their reconstruction efficiencies are similar.
Tracks associated with electrons and muons are required to have an impact parameter of less than 0.5 mm in the transverse plane and less than 1 mm along the beam direction. Leptons are also required to be isolated from the rest of the PF candidates in the event. For this purpose, the mini-isolation variable is used, which is required to be less than 10% of the lepton pT for electrons and 20% for muons.
Jets are selected using the criteria described in section [sec:jetdefinition], and are required to have a pT greater than 35 $\,\text{Ge\hspace{-.08em}V}$. This threshold is relaxed to 25 $\,\text{Ge\hspace{-.08em}V}$ for the jets used in vetoes. Since leptons are usually clustered as jets by the anti-$k_{\mathrm{T}}$ algorithm, jets within ΔR < 0.4 of a selected lepton are not included in the counting. Jets produced in the hadronization of $\PQb$ quarks are tagged using the deepCSV algorithm, described in section [sub:jetbtagging].
Some of the signal regions look for the presence of hadronically decaying $\PW$ or $\PZ$ bosons, whose decay products are collimated into a single jet. For that purpose, jets are also clustered with the anti-$k_{\mathrm{T}}$ algorithm with a radius parameter of 0.8. $\PW/\PZ$ candidates are then required to have a soft-drop mass between 65 and 105 $\,\text{Ge\hspace{-.08em}V}$, and τ2/τ1 < 0.4 (0.45) in 2016 (2017 and 2018) data. This way, only jets are selected that have a mass consistent with the $\PW$ and $\PZ$ bosons and the characteristic two-prong substructure expected in their decays.
Additionally, the presence of additional leptons must be vetoed in some signal regions, in order to build suitable control regions or to ensure orthogonality with other analyses. To do so, events with additional isolated PF candidates are rejected. To be considered for this veto, the track-based relative isolation of the candidate, computed with a cone size of ΔR < 0.3, is required to be less than 20% of the track momentum and less than 5 GeV. Additionally, the track is required to have an impact parameter with respect to the primary vertex of less than 0.1 cm in the transverse plane and less than 0.2 cm along the beam direction.
Photons are required to pass identification criteria based on the shape of the cluster in the ECAL and on the fraction of energy deposited in the HCAL . Photons are required to have pT > 50 $\,\text{Ge\hspace{-.08em}V}$ and to be outside the transition region of the ECAL.
The $\mathrm{M_{T2}}$ variable and a likelihood discriminator are used to select and classify events. These two variables are built to reject $\PQt\PAQt$ events, and are described in this section.
The $\mathrm{M_{T2}}$ variable is used in many searches to reject backgrounds, as it aims to reconstruct the mass of a heavy particle that is produced in pairs and decays into a visible and an invisible object. Given a pair of visible objects v1 and v2, and ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ due to the presence of the two invisible objects, $\mathrm{M_{T2}}$(v1v2) is defined as
$$\label{eq:mt2}
{\ensuremath{\mathrm{M_{T2}}}\xspace}= \min_{{\ensuremath{{\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}}\xspace}{}^{(1)} + {\ensuremath{{\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}}\xspace}{}^{(2)} = {\ensuremath{{\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}}\xspace}}
\left[ \max \left( {\ensuremath{\mathrm{M_T}}\xspace}\left(v_1,{\ensuremath{{\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}}\xspace}{}^{(1)}\right) , {\ensuremath{\mathrm{M_T}}\xspace}\left(v_2,{\ensuremath{{\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}}\xspace}{}^{(2)}\right) \right) \right],$$
where ${\ensuremath{{\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}}\xspace}{}^{(1,2)}$ are hypothesis vectors in the transverse plane, which span all the possible configurations of the invisible pair of particles consistent with the observed ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$. ${\ensuremath{\mathrm{M_T}}\xspace}\left(v_i,{\ensuremath{{\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}}\xspace}{}^{(i)}\right)$ is the transverse mass of the system composed of the visible object and the hypothesis for the invisible particle momentum.
If v1 and v2 are produced in association with two invisible objects in the decay of a pair of particles with mass M, $\mathrm{M_{T2}}$(v1v2) has a kinematic endpoint at M. In particular, in an event with two leptons and ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$, one can consider the $\mathrm{M_{T2}}$ variable evaluated with the two leptons, $\mathrm{M_{T2}}(\Pl\Pl)$. This variable has a kinematic endpoint at the $\PW$ boson mass for dileptonic $\PQt\PAQt$ and $\PW\PW$ events. Due to the limited resolution of the detector, as well as off-shell effects of the produced particle, a significant number of such background events may nevertheless appear above the kinematic endpoint.
Another related variable is $\mathrm{M_{T2}}(\Pl\PQb\Pl\PQb)$, which is constructed in events with a pair of leptons ($\Pl_1$ and $\Pl_2$) and a pair of $\PQb$ jets ($\PQb_1$ and $\PQb_2$). To construct it, the two possible pairings between leptons and jets are considered, and ${\ensuremath{\mathrm{M_{T2}}}\xspace}(\Pl_1+\PQb_1,\Pl_2+\PQb_2)$ and ${\ensuremath{\mathrm{M_{T2}}}\xspace}(\Pl_1+\PQb_2,\Pl_2+\PQb_1)$ are calculated. The minimum of the two is defined as $\mathrm{M_{T2}}(\Pl\PQb\Pl\PQb)$. This variable has, for dileptonic $\PQt\PAQt$ events, a kinematic endpoint at the mass of the top quark.
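To make the minimization in this definition concrete, the following is a minimal numerical sketch of $\mathrm{M_{T2}}$ (not the implementation used in the analysis, which relies on dedicated algorithms); it assumes massless visible and invisible objects and represents all inputs as transverse momentum 2-vectors:

```python
import numpy as np
from scipy.optimize import minimize

def mt(vis, inv):
    """Transverse mass of a visible + invisible pair, both taken massless."""
    return np.sqrt(max(0.0, 2.0 * (np.linalg.norm(vis) * np.linalg.norm(inv)
                                   - np.dot(vis, inv))))

def mt2(v1, v2, ptmiss):
    """MT2: minimize the larger of the two transverse masses over all
    splittings ptmiss = p1 + p2 of the missing transverse momentum."""
    objective = lambda p1: max(mt(v1, p1), mt(v2, ptmiss - p1))
    # start from an even split of ptmiss between the two invisible objects
    return minimize(objective, x0=0.5 * ptmiss, method="Nelder-Mead").fun

def mt2_lblb(l1, l2, b1, b2, ptmiss):
    """MT2(lblb): the smaller MT2 of the two possible lepton-jet pairings."""
    return min(mt2(l1 + b1, l2 + b2, ptmiss),
               mt2(l1 + b2, l2 + b1, ptmiss))

# toy event: transverse momentum 2-vectors in GeV
l1, l2 = np.array([60.0, 10.0]), np.array([-40.0, 25.0])
b1, b2 = np.array([80.0, -30.0]), np.array([-55.0, -20.0])
ptmiss = -(l1 + l2 + b1 + b2)
print(mt2(l1, l2, ptmiss), mt2_lblb(l1, l2, b1, b2, ptmiss))
```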
A likelihood discriminator is used to enhance the sensitivity of the slepton-edge search. This discriminator is a naive Bayes classifier that categorizes events as $\PQt\PAQt$-like or non-$\PQt\PAQt$-like. Since no signal information is used in the design of the discriminator, it can be considered signal-agnostic.
Several variables that are characteristic of the $\PQt\PAQt$ topology are chosen as inputs to the multivariate discriminator:
${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$. Even if invisible particles are produced both in $\PQt\PAQt$ and in signal events, SUSY models with large mass splittings predict a harder ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ distribution than that of $\PQt\PAQt$ events.
The $\sum m_{\Pl\PQb}$ variable. This variable is constructed for events with two leptons and at least two jets, and is defined as the sum of the invariant masses of two pairs, each composed of a lepton and a jet. The first pair is selected by considering all possible pairings between the two leptons and the $\PQb$-tagged jets in the event; if there are no $\PQb$-tagged jets in the event, all jets are considered. The pair with the smallest invariant mass is taken. To select the second pair, the procedure is repeated using the lepton and the jets not included in the first pair (a sketch of this pairing is given after this list).
${\ensuremath{p_{\mathrm{T}}}\xspace}^{\Pl\Pl}$. This variable is typically large for signal events, since the two leptons are produced in the decay chain of the sbottom, which typically has a high mass, so that its decay products are collimated. In $\PQt\PAQt$ events, on the other hand, the two leptons belong to different decay chains and are only mildly correlated.
${\ensuremath{\lvert \Delta \phi^{\Pl\Pl} \rvert}}$. Due to the boost of the dilepton system mentioned above, the leptons in signal events are also expected to be close-by.
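The pairing that defines $\sum m_{\Pl\PQb}$ can be summarized in a short sketch. This is an illustrative implementation of the procedure described above, assuming four-vector objects with an invariant-mass method .M() (as in common HEP libraries); the function name and interface are hypothetical:

```python
import itertools

def sum_mlb(leptons, jets, bjets):
    """Sum of the invariant masses of two lepton-jet pairs, built greedily:
    take the smallest-mass pairing first, then repeat with what remains."""
    candidates = bjets if bjets else jets  # prefer b-tagged jets if present
    # first pair: smallest invariant mass over all lepton-jet combinations
    l1, j1 = min(itertools.product(leptons, candidates),
                 key=lambda pair: (pair[0] + pair[1]).M())
    # second pair: the remaining lepton with the remaining jets
    l2 = next(l for l in leptons if l is not l1)
    rest = [j for j in jets if j is not j1]
    j2 = min(rest, key=lambda j: (l2 + j).M())
    return (l1 + j1).M() + (l2 + j2).M()
```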
The correlation matrix of these variables in simulation is shown in figure [fig:nllcorrelationvariables]. Mild correlations among ${\ensuremath{\lvert \Delta \phi^{\Pl\Pl} \rvert}}$, ${\ensuremath{p_{\mathrm{T}}}\xspace}^{\Pl\Pl}$ and $\sum m_{\Pl\PQb}$ are observed, while ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ is largely uncorrelated with the others.
The probability density functions (PDFs) of these variables are extracted from data events with an $\Pe\PGm$ pair, which are expected to be enriched in $\PQt\PAQt$ events. An ad-hoc model is used, assuming an analytic expression for each of these variables; the validity of these models is corroborated with a Kolmogorov-Smirnov goodness-of-fit test to the data. ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ is modeled using a sum of exponential functions, $\sum m_{\Pl\PQb}$ and ${\ensuremath{p_{\mathrm{T}}}\xspace}^{\Pl\Pl}$ with Crystal Ball functions, and ${\ensuremath{\lvert \Delta \phi^{\Pl\Pl} \rvert}}$ with a third order polynomial. The PDFs of the variables have been determined separately for each of the three years of data taking, and the likelihood discriminator is constructed separately for each year.
The discriminator is defined, for a given event, as
$$\mathrm{nll} = -\sum_i \log f_i(x_i),$$
where the xi are the input variables described above and the fi are the PDFs fitted in data.
Figure [fig:susynllplots] shows the distribution of the discriminator score in simulated $\PQt\PAQt$ events and in signal events for several sparticle masses. As shown in the figure, the distributions of the score in simulations reproducing the conditions of each year of data taking agree well among the three years, which demonstrates the robustness of the method.
Based on the discriminator, events are classified as $\PQt\PAQt$-like or non-$\PQt\PAQt$-like depending on whether the discriminator score is below or above 24. This working point has been chosen to provide the largest signal sensitivity to the slepton-edge model, while keeping a sufficient number of data events in the control regions.
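Schematically, the classification amounts to evaluating the fitted PDFs at the values of the four input variables and summing their negative logarithms. In the following toy sketch, the PDF shapes and parameter values are placeholders rather than the fitted ones:

```python
import numpy as np

# placeholder PDFs: the analysis uses shapes fitted to dilepton data per year
pdfs = {
    "ptmiss":  lambda x: 0.02 * np.exp(-0.02 * x),  # sum of exponentials in reality
    "sum_mlb": lambda x: np.exp(-0.5 * ((x - 180.0) / 70.0) ** 2) / (70.0 * np.sqrt(2 * np.pi)),
    "ptll":    lambda x: np.exp(-0.5 * ((x - 80.0) / 40.0) ** 2) / (40.0 * np.sqrt(2 * np.pi)),
    "dphill":  lambda x: 3.0 * x ** 2 / np.pi ** 3,  # third-order polynomial in reality
}

def nll_score(event):
    """Naive Bayes score: minus the sum of the log PDF values of the inputs."""
    return -sum(np.log(max(pdf(event[name]), 1e-12))  # guard against log(0)
                for name, pdf in pdfs.items())

def is_ttbar_like(event, threshold=24.0):
    """Events with a score below the threshold are classified as ttbar-like."""
    return nll_score(event) < threshold

event = {"ptmiss": 160.0, "sum_mlb": 230.0, "ptll": 95.0, "dphill": 0.8}
print(nll_score(event), is_ttbar_like(event))
```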
Events in the signal and dileptonic control regions are collected with a set of triggers requiring the presence of a pair of leptons passing mild isolation criteria and a given pT threshold, which depends on the lepton flavor and the period of data taking. For the leading lepton, this threshold is between 17 and 23 $\,\text{Ge\hspace{-.08em}V}$, while for the subleading lepton it ranges between 8 and 12 $\,\text{Ge\hspace{-.08em}V}$. To recover efficiency in cases in which the leptons are emitted close to other objects, which is the typical signature in boosted topologies, another set of triggers without isolation requirements is also used. The momentum thresholds of these triggers for the leading and subleading leptons range between 25 and 37 $\,\text{Ge\hspace{-.08em}V}$ and between 8 and 33 $\,\text{Ge\hspace{-.08em}V}$, respectively. No single lepton triggers are used, in order to keep the symmetry between electron and muon efficiencies.
These events are also required to pass a minimal dileptonic event selection. Events must have at least two reconstructed leptons passing the requirements described in this section, with the leading lepton having pT > 25 $\,\text{Ge\hspace{-.08em}V}$ and the subleading one pT > 20 $\,\text{Ge\hspace{-.08em}V}$. The two leading leptons are required to have opposite sign. Events are then classified, based on the flavor of these two leptons, into same-flavor (SF) and different-flavor (DF) events. SF events are used to define the signal regions, while DF events are only used for some of the control regions.
The pT of the system formed by the two leading leptons, ${\ensuremath{p_{\mathrm{T}}}\xspace}^{\Pl\Pl}$, is required to be larger than 50 $\,\text{Ge\hspace{-.08em}V}$, consistent with the pT cut applied to the photons, in order to ensure the consistency of the “${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ templates” method, described in section [sec:susybackgrounds]. Events in which the azimuthal angle between the two leptons, ${\ensuremath{\lvert \Delta \phi^{\Pl\Pl} \rvert}}$, is less than 0.1 are rejected, to keep the electron and muon isolation efficiencies similar. The invariant mass of the two leptons, ${\ensuremath{m_{\Pl\Pl}}\xspace}$, is required to be greater than 20 $\,\text{Ge\hspace{-.08em}V}$, to reject contributions from low mass resonances.
Signal regions are defined with the aim of being sensitive to the simplified models described in the previous section, while keeping sensitivity to other possible models.
In order to enter the on-$\PZ$ regions, events are required to have an $m_{\Pl\Pl}$ consistent with the production of a $\PZ$ boson, between 86 and 96 $\,\text{Ge\hspace{-.08em}V}$. Events containing additional isolated PF candidates passing the isolation criteria described for the veto are rejected. To reject events with contributions from instrumental ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$, the two jets with the highest pT are required to be separated from ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$ in ϕ by at least 0.4.
Six orthogonal SRs are defined in the search for strong production of SUSY with a $\PZ$ candidate. SRA, SRB and SRC are composed of events with 2-3 jets, 4-5 jets, and 6 or more reconstructed jets, respectively. In the GMSB scenario, up to six jets are expected; however, SRA and SRB provide additional sensitivity to events in which some of the jets fall outside the acceptance. These regions are further divided into those containing at least one $\PQb$-tagged jet and those containing none. Events in regions without $\PQb$-tagged jets are required to have ${\ensuremath{H_{\mathrm{T}}}\xspace}> 500$ $\,\text{Ge\hspace{-.08em}V}$ and ${\ensuremath{\mathrm{M_{T2}}(\Pl\Pl)}\xspace}>$ 80 $\,\text{Ge\hspace{-.08em}V}$, while those with $\PQb$-tagged jets are required to have ${\ensuremath{H_{\mathrm{T}}}\xspace}> 200$ $\,\text{Ge\hspace{-.08em}V}$ and ${\ensuremath{\mathrm{M_{T2}}(\Pl\Pl)}\xspace}>$ 100 $\,\text{Ge\hspace{-.08em}V}$, to more efficiently suppress the $\PQt\PAQt$ background. These criteria are summarized in table [tab:strongonzregions].
For electroweak production, three regions are built. The first two target the $\PW\PZ$ and $\PZ\PZ$ topologies, in which the $\PW$ or one of the $\PZ$ bosons decays hadronically, and the other one leptonically. The third one targets the $\PH\PZ$ final state, with the $\PZ$ boson decaying leptonically and the Higgs boson into a $\PQb\PAQb$ pair.
The first region, named “${{\HepParticle{V}{}{}}\Xspace}\PZ$ resolved”, targets the cases in which the two jets produced in the $\PW$ or $\PZ$ decay are reconstructed individually. For this region, events are required to have at least two jets. Events with $\PQb$-tagged jets or with ${\ensuremath{\mathrm{M_{T2}}(\Pl\Pl)}\xspace}<$ 80 $\,\text{Ge\hspace{-.08em}V}$ are rejected, to reduce the contribution from $\PQt\PAQt$ events. The mass of the two jets that are closest in Δϕ, mjj, is then required to be less than 110 $\,\text{Ge\hspace{-.08em}V}$, to be consistent with the hadronic decay of a $\PW$ or $\PZ$ boson.
The second region, referred to as “${{\HepParticle{V}{}{}}\Xspace}\PZ$ boosted”, recovers sensitivity to cases in which the $\PW$ or $\PZ$ boson is produced with a large boost in the transverse plane and the two jets produced in its decay are merged into a single one. For this region, a $\PW/\PZ$ candidate with pT > 200 $\,\text{Ge\hspace{-.08em}V}$ is required in the final state. The $\PW/\PZ$ candidate is required to be separated from ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$ by Δϕ > 0.8. Events with $\PQb$-tagged jets are also rejected. In addition, events falling in the “resolved” region are rejected, in order to avoid overlap between the regions.
The third region, the $\PH\PZ$ region, is designed to be sensitive to the cases in which the $\PZ$ boson decays leptonically and the Higgs boson decays through the $\PQb\PAQb$ mode, the one with the dominant branching ratio. Events in this category are required to have at least two $\PQb$-tagged jets with an invariant mass, $m_{\PQb\PQb}$, of less than 150 $\,\text{Ge\hspace{-.08em}V}$. In order to reduce the presence of $\PQt\PAQt$ events, $\mathrm{M_{T2}}(\Pl\PQb\Pl\PQb)$ is required to be greater than 200 $\,\text{Ge\hspace{-.08em}V}$.
All the electroweak regions are split into ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ bins. The selection criteria for these regions are summarized in table [tab:ewkonzregions].
Two additional signal regions are constructed, targeting the slepton-edge model and direct slepton pair production, respectively.
For the slepton-edge signal region, events are expected to have at least two jets produced in the decays of the squarks, plus two leptons produced in the decay chain. Events are required to have at least two jets and ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ > 150 GeV. To suppress events with instrumental ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$, the two jets with the highest pT are required to have Δϕ with respect to ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$ greater than 0.4. In order to suppress the dominant background, composed of $\PQt\PAQt$ events, $\mathrm{M_{T2}}(\Pl\Pl)$ is required to be larger than 80 $\,\text{Ge\hspace{-.08em}V}$.
Events are then divided into 28 exclusive signal regions. Events are classified according to their $m_{\Pl\Pl}$ value into 7 categories: [20-60], [60-86], [96-150], [150-200], [200-300], [300-400] and greater than 400 $\,\text{Ge\hspace{-.08em}V}$. Events with $m_{\Pl\Pl}$ between 86 and 96 $\,\text{Ge\hspace{-.08em}V}$ are not included in any of these categories, and are therefore not considered in this part of the analysis. For each $m_{\Pl\Pl}$ bin, events are classified according to the $\PQb$-tagged jet multiplicity: one category for events with at least one $\PQb$-tagged jet and another for events with none. Events are also classified as $\PQt\PAQt$-like or non-$\PQt\PAQt$-like based on the likelihood discriminant described in the previous section.
The regions targeting direct slepton production are expected to have two OSSF leptons, missing transverse momentum, and little or no hadronic activity in the final state. In these regions, events with additional PF candidates are vetoed. In order to reject the contribution from Drell–Yan events, the leading lepton is required to have pT > 50 $\,\text{Ge\hspace{-.08em}V}$, and $m_{\Pl\Pl}$ must be lower than 65 $\,\text{Ge\hspace{-.08em}V}$ or greater than 120 $\,\text{Ge\hspace{-.08em}V}$. To further reduce the presence of Drell–Yan, $\PQt\PAQt$ and $\PW\PW$ events, events are required to have both $\mathrm{M_{T2}}(\Pl\Pl)$ and ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ larger than 100 $\,\text{Ge\hspace{-.08em}V}$.
On top of this selection, two types of regions are defined. Most of the sensitivity is expected to come from events without jets in the final state, so a region with no reconstructed jets is defined. In order to keep sensitivity to signal events in which low energy jets are present due to ISR, another category with jets is built. In this category, at least one jet with pT > 25 $\,\text{Ge\hspace{-.08em}V}$ must be present, but the ratio ${\ensuremath{p_{\mathrm{T}}}\xspace}^{lep2}/{\ensuremath{p_{\mathrm{T}}}\xspace}^{jet1}$ is required to be greater than 1.2. The leading jet is also required to have $\Delta\phi(j,{\ensuremath{{\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}}\xspace})>0.4$ in this region. These selection criteria are summarized in table [tab:sleptonregions].
Dileptonic control regions are used to estimate some of the backgrounds. These regions are built on top of the dilepton event selection described in the previous section.
One of the main background contributions comes from the so-called flavor-symmetric processes, in which DF and SF lepton pairs are produced at the same rate. To estimate them, control regions are built by applying the same selection criteria as in the signal regions, but requiring a DF lepton pair instead of an SF one. As will be described in section [sec:susybackgrounds], these events are used to estimate the contribution of flavor-symmetric processes to the signal regions, after correcting them by a transfer factor.
Since DF and SF pairs are produced at the same rate, the transfer factor is expected to be close to unity. It is therefore necessary to increase the statistical power of the control regions in cases in which the rate of flavor-symmetric processes is expected to be low. This is the case in the regions targeting SUSY with a $\PZ$ candidate, where the number of flavor-symmetric events in the control region is increased by removing the requirement that $m_{\Pl\Pl}$ be between 86 and 96 $\,\text{Ge\hspace{-.08em}V}$.
To develop some of the background estimation methods, it is necessary to consider data regions enriched in DY+jets events. Three different control regions are considered. One region is used to derive the transfer factors between SF and DF events; for this region, events are required to have an SF lepton pair, at least two jets, ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ less than 50 $\,\text{Ge\hspace{-.08em}V}$, and $m_{\Pl\Pl}$ between 60 and 120 $\,\text{Ge\hspace{-.08em}V}$. The other two regions are used to extrapolate the DY+jets contribution from the on-$\PZ$ regions to the off-$\PZ$ signal regions. For this purpose, a region is defined for the slepton-edge search, requiring events with at least two jets, ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ less than 50 $\,\text{Ge\hspace{-.08em}V}$, and $\mathrm{M_{T2}}(\Pl\Pl)$ greater than 80 $\,\text{Ge\hspace{-.08em}V}$. For the slepton regions, the control region is defined on top of their definition by removing the ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ cut, the jet veto and the $m_{\Pl\Pl}$ cut.
To validate the estimation of flavor-symmetric backgrounds, a $\PQt\PAQt$-enriched control region in the SF channel is defined. This region requires exactly two jets in the event, ${\ensuremath{m_{\Pl\Pl}}\xspace}< 70$ $\,\text{Ge\hspace{-.08em}V}$ or ${\ensuremath{m_{\Pl\Pl}}\xspace}> 110$ $\,\text{Ge\hspace{-.08em}V}$, and 100 $< {\ensuremath{{\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}}\xspace}< 150{\ensuremath{\,\text{Ge\hspace{-.08em}V}}\xspace}$, to reject contributions from DY+jets events.
To validate the DY+jets estimation in the on-$\PZ$ regions, in which it is the dominant background, DY+jets enriched regions are defined. These regions are built by inverting the cut that rejects events in which one of the two leading jets is closer in ϕ than 0.4 to ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$; in the case of the boosted region, the inverted cut is the requirement that the $\PW/\PZ$ candidate be separated from ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$ by more than 0.8 in ϕ. Inverting these cuts yields regions with a significant contribution from DY+jets events that enter due to mismeasured jets.
In signal regions with jets, the contribution from DY+jets events is estimated using a control region with reconstructed photons. In particular, these $\PGg$+jets events are used to determine the ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ distribution of DY+jets events, as in both cases the spectrum is expected to be driven by the limited resolution of the jet momenta.
These events are collected using a set of single photon triggers that require the presence of a photon with pT of at least 50 $\,\text{Ge\hspace{-.08em}V}$. Only a fraction of these events is accepted by each trigger, with a prescale factor dependent on the pT threshold, in order to keep the rate at an acceptable value. Events collected by these triggers are weighted by this fraction, so that their effective luminosity matches the luminosity collected in the rest of the regions.
Events with additional charged PF candidates are vetoed, to suppress the contribution from electroweak processes, such as $\PW\PGg$, that may significantly populate the tails of the ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$distribution in the control region.
The ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ distribution may depend on the jet multiplicity and kinematics of the event, as well as on the presence of $\PQb$ jets, since these may contain neutrinos produced in hadron decays. Therefore, a control region is built for each of the signal regions, by applying to the jets in the event the same requirements applied in the signal region definition.
In regions with requirements on $\mathrm{M_{T2}}$, it is not possible to directly apply the cuts to the $\PGg$+jets samples, since the variable is constructed from two visible particles. In order to have a consistent definition of these variables in the $\PGg$+jets samples, the decay of the $\PZ$ boson is emulated. The emulation assumes a mother particle with the mass of a $\PZ$ boson and a momentum corresponding to that of the reconstructed photon. The decay of the mother particle to leptons is then simulated in its rest frame, taking into account the angular correlation between the leptons due to spin correlations in the matrix element, and the analysis requirements on the η and pT of the leptons are applied to the leptons obtained in the emulated decay.
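A minimal sketch of such an emulation is given below, assuming for simplicity an isotropic decay in the mother rest frame (the analysis additionally includes the angular correlation induced by spin correlations); all names are illustrative:

```python
import numpy as np

M_Z = 91.19  # GeV

def emulate_z_decay(photon_p3, rng=np.random.default_rng()):
    """Replace a photon by a Z boson of the same momentum and decay it into
    two massless leptons, isotropically in the Z rest frame."""
    p = np.asarray(photon_p3, dtype=float)  # photon momentum (nonzero here,
    E = np.sqrt(np.dot(p, p) + M_Z ** 2)    # since triggered photons have pT > 50 GeV)
    # lepton momenta in the rest frame: back-to-back, each with E* = M_Z / 2
    cos_t = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * np.pi)
    sin_t = np.sqrt(1.0 - cos_t ** 2)
    k = 0.5 * M_Z * np.array([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t])
    # boost both leptons into the lab frame along the Z momentum
    beta, gamma = p / E, E / M_Z
    b2 = np.dot(beta, beta)
    boost = lambda q3, q0: q3 + ((gamma - 1.0) * np.dot(beta, q3) / b2 + gamma * q0) * beta
    return boost(k, 0.5 * M_Z), boost(-k, 0.5 * M_Z)
```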
The contribution of the $\PW{}\PZ$, $\PZ{}\PZ$ and $\PQt{}\PAQt{}\PZ$ processes to the signal regions is estimated using simulations. In order to verify the good modeling of these processes in simulation, dedicated control regions enriched in each of them are built.
Two types of control regions are constructed. The first set aims to check the modeling of the three processes in regions with jets, while the second set measures the contribution of $\PZ{}\PZ$ and $\PW{}\PZ$ to the slepton regions, which allow only little hadronic activity.
In the regions with jets, events are required to pass the minimal dilepton selection criteria, described in section [sub:susysrs], and to contain at least two jets with $|\Delta\phi(j_{1,2},{\ensuremath{{\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}}\xspace})| > 0.4$. For the three regions, an OSSF lepton pair with 86 $< {\ensuremath{m_{\Pl\Pl}}\xspace}< 96$ $\,\text{Ge\hspace{-.08em}V}$ is required.
Events in the $\PW{}\PZ$ region are required to have:
At least 3 leptons. A $\PZ$ candidate is built by picking the OSSF pair closest to the $\PZ$ mass; the remaining lepton, $\Pl^W$, is the $\PW$ candidate.
${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ > 70 $\,\text{Ge\hspace{-.08em}V}$,
${\ensuremath{\mathrm{M_T}}\xspace}(\Pl^W,{\ensuremath{{\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}}\xspace}) > 50$ $\,\text{Ge\hspace{-.08em}V}$,
no $\PQb$-tagged jets with pT > 25 $\,\text{Ge\hspace{-.08em}V}$ at the loose working point.
For the $\PQt{}\PAQt{}\PZ$ control region, events are required to have ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ > 30 $\,\text{Ge\hspace{-.08em}V}$ and at least two $\PQb$-tagged jets at the medium working point.
For the $\PZ{}\PZ$ control region, events are required to have two OSSF pairs of leptons, both with $m_{\Pl\Pl}$ > 40 $\,\text{Ge\hspace{-.08em}V}$, to reject low mass resonances. Events with $\PQb$-tagged jets passing the loose working point are rejected.
In the regions without jets, only the $\PZ{}\PZ$ and $\PW{}\PZ$ processes are relevant. In the $\PW{}\PZ$ region, events are required to have
at least 3 leptons. A $\PZ$ candidate is built by picking the OSSF pair closest to the $\PZ$ mass; the remaining lepton, $\Pl^W$, is the $\PW$ candidate,
the $\PZ$ candidate is required to have $m_{\Pl\Pl}$ between 76 and 106 $\,\text{Ge\hspace{-.08em}V}$,
${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ > 70 $\,\text{Ge\hspace{-.08em}V}$,
${\ensuremath{\mathrm{M_T}}\xspace}(\Pl^W,{\ensuremath{{\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}}\xspace}) > 50$ $\,\text{Ge\hspace{-.08em}V}$,
no jets with pT > 25 $\,\text{Ge\hspace{-.08em}V}$,
the invariant mass of the three leptons greater than 110 $\,\text{Ge\hspace{-.08em}V}$, to reject contributions from photon conversions.
For the $\PZ{}\PZ$ region, events are required to have
at least 4 leptons
two OSSF pairs, where $\PZ_1$ is the pair with invariant mass closest to the $\PZ$ boson mass and $\PZ_2$ is the remaining one,
$m_{\PZ_1}$ between 76 and 106 $\,\text{Ge\hspace{-.08em}V}$,
$m_{\PZ_2}$ between 50 and 130 $\,\text{Ge\hspace{-.08em}V}$,
no jets with a pTgreater than 25 $\,\text{Ge\hspace{-.08em}V}$.
Three types of backgrounds affect this analysis. The first kind are the flavor-symmetric processes, in which events with an SF lepton pair and events with a DF lepton pair are produced at the same rate. Another contribution comes from DY+jets events: even though all the signal regions of this analysis require the presence of ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ in the final state, a significant number of events may leak into the signal regions because of instrumental ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$. In the third type of processes, referred to as $\PZ$+$\PGn$ processes, the two leptons are produced in a $\PZ/\PGg^*$ decay and genuine ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ is present in the final state. The estimation of the first kind of processes is fully based on data, while the estimation of the remaining contributions is performed using simulations. This section covers the methods used in this search to predict each of them.
Flavor-symmetric processes are dominated by $\PQt\PAQt$ events, although contributions from $\PW{}\PW$, $\PZ\to\tau\tau$ and $\PQt{}\PW$ production are also expected. In particular, in some of the slepton signal regions, $\PW{}\PW$ events are expected to dominate the flavor-symmetric component.
As mentioned in the previous section, these backgrounds are estimated in control regions built with the same requirements as the signal regions. By construction, the control regions are populated almost exclusively by flavor-symmetric events, which are expected to contribute equally to signal and control regions.
However, due to the different reconstruction, isolation and trigger efficiencies of electrons and muons, the flavor-symmetric yields in the DF and SF regions may differ. To take this difference into account, a transfer factor between signal and control regions, ${\ensuremath{\mathrm{R}_{\mathrm{SF/OF}}}\xspace}$, is computed. This factor is derived with the so-called factorization method. In the analysis performed in , the factor was derived from a combination of this method and a direct measurement in the control region defined in the previous section. In this iteration of the analysis, the larger dataset allows for a more precise determination of ${\ensuremath{\mathrm{R}_{\mathrm{SF/OF}}}\xspace}$, dominated by the factorization method, so the direct measurement is not used; the control region is used instead to check the validity of the method in data.
In the factorization method, it is assumed that the selection efficiencies for the two leptons are independent. Then, the efficiency of the dilepton selection can be written as the product of the efficiencies of each lepton separately, $\epsilon_{\Pl_1\Pl_2} = \epsilon_{\Pl_1}\epsilon_{\Pl_2}$.
To derive the method, the following nomenclature is adopted: $N_{\Pe\Pe(\PGm\PGm)}$ denotes the number of ${\ensuremath{\Pe^\pm}\xspace}{\ensuremath{\Pe^\mp}\xspace}$ (${\ensuremath{\PGm^\pm}\xspace}{\ensuremath{\PGm^\mp}\xspace}$) events. The “hard” superscript means that the quantity refers to the generator level, before any trigger or reconstruction effects are taken into account. When the superscript “*” is included, only offline reconstruction and identification efficiencies are taken into account, and trigger efficiencies have been factored out. Finally, the trigger efficiency is denoted as εT.
The factor $r_{\PGm/\Pe}$ is defined as the ratio of muon to electron offline and trigger efficiencies, $\epsilon_{\PGm}/\epsilon_{\Pe}$. It is one of the coefficients to be measured in data, and can be computed as
$${\ensuremath{r_{\PGm/\Pe}}\xspace}= \sqrt{\frac{N_{\PGm\PGm}}{N_{\Pe\Pe}}} = \sqrt{\frac{\epsilon^T_{\PGm\PGm}\epsilon^*_{\PGm}(\Pl_1)\epsilon^*_{\PGm}(\Pl_2)}{\epsilon^T_{\Pe\Pe}\epsilon^*_{\Pe}(\Pl_1)\epsilon^*_{\Pe}(\Pl_2)}}.$$
The number of SF events can be written as
$$\begin{aligned}
N_{\Pe\Pe} &= \epsilon_{\Pe\Pe}^T N^*_{\Pe\Pe} = \epsilon_{\Pe\Pe}^T \epsilon_{\Pe}^*(\Pl_1)\epsilon_{\Pe}^*(\Pl_2) N^{\textrm{hard}}_{\Pe\Pe} \\
&= \frac{1}{2} \epsilon_{\Pe\Pe}^T \epsilon_{\Pe}^*(\Pl_1)\epsilon_{\Pe}^*(\Pl_2) N^{\textrm{hard}}_{DF} \\
&= \frac{1}{2} \epsilon_{\Pe\Pe}^T \epsilon_{\Pe}^*(\Pl_1)\epsilon_{\Pe}^*(\Pl_2) \frac{N^{*}_{DF}}{\epsilon_{\Pe}^*(\Pl_1) \epsilon_{\PGm}^*(\Pl_2)} \label{eq:flav_sym_step3}\\
&= \frac{1}{2} \frac{\epsilon^T_{\Pe\Pe}}{\epsilon^T_{DF}} \frac{\epsilon_{\Pe}^*(\Pl_2)}{\epsilon_{\PGm}^*(\Pl_2)} N_{DF} = \frac{1}{2} \frac{1}{{\ensuremath{r_{\PGm/\Pe}}\xspace}(\Pl_2)} \frac{\sqrt{\epsilon_{\Pe\Pe}^T\epsilon_{\PGm\PGm}^T}}{\epsilon_{DF}^T} N_{DF}, \end{aligned}$$
where in step [eq:flavsymstep3] the convention has been adopted that $\Pl_1$ is the electron of the DF pair and $\Pl_2$ is the muon, and $N_{DF} = \epsilon^T_{DF} N^{*}_{DF}$ has been used in the last step. Similarly, for muons,
$$N_{\PGm\PGm} = \frac{1}{2}\,{\ensuremath{r_{\PGm/\Pe}}\xspace}(\Pl_1) \frac{\sqrt{\epsilon_{\Pe\Pe}^T\epsilon_{\PGm\PGm}^T}}{\epsilon_{DF}^T} N_{DF},$$
where again $\Pl_1$ denotes the electron of the DF pair. Defining $R_T = \frac{\sqrt{\epsilon_{\Pe\Pe}^T\epsilon_{\PGm\PGm}^T}}{\epsilon_{DF}^T}$, the total flavor-symmetric yield can then be written as
$$N_{SF} = \frac{1}{2}\left(\frac{1}{{\ensuremath{r_{\PGm/\Pe}}\xspace}(\PGm)} + {\ensuremath{r_{\PGm/\Pe}}\xspace}(\Pe)\right) R_T N_{DF},$$
where ${\ensuremath{r_{\PGm/\Pe}}\xspace}(\PGm)$ and ${\ensuremath{r_{\PGm/\Pe}}\xspace}(\Pe)$ are evaluated at the kinematics of the muon and of the electron of the DF pair, respectively.
In this calculation, we have kept track of the lepton at which each of the ${\ensuremath{r_{\PGm/\Pe}}\xspace}$ factors has to be evaluated. In previous iterations of the analysis , only the dependence of the RSF/OF factor on the pT of one of the leptons was kept. In the current parameterization, thanks to the larger collected luminosity and a deeper understanding of the detector, it is possible to evaluate $r_{\PGm/\Pe}$ as a function of the lepton 3-momentum, so that the RSF/OF transfer factor becomes a function of the pT and η of the two leptons.
The $r_{\PGm/\Pe}$ factor is measured in the DY+jets control region defined in the previous section, using the relation ${\ensuremath{r_{\PGm/\Pe}}\xspace}= \sqrt{\frac{N_{\PGm\PGm}}{N_{\Pe\Pe}}}$. Its dependence on the lepton ${\ensuremath{p_{\mathrm{T}}}\xspace}$ and η is parameterized using an ad-hoc functional form, which has been empirically found to describe the data:
$$\begin{aligned}
{\ensuremath{r_{\PGm/\Pe}}\xspace}(\Pl) &= {\ensuremath{r_{\PGm/\Pe}}\xspace}^0 \cdot f\left({\ensuremath{p_{\mathrm{T}}}\xspace}(\Pl)\right) \cdot g\left(\eta(\Pl)\right) \\
f({\ensuremath{p_{\mathrm{T}}}\xspace}) &= ( a_1 + b_1/{\ensuremath{p_{\mathrm{T}}}\xspace}) \label{eq:rmue_fit_pt}\\
g(\eta) &= a_2 + \begin{cases} 0 & |\eta| < 1.6 \\ c_1 \cdot (\eta - 1.6)^2 & \eta > 1.6 \\ c_2 \cdot (\eta + 1.6)^2 & \eta < -1.6 \end{cases},\label{eq:rmue_fit_eta}\end{aligned}$$
where ${\ensuremath{r_{\PGm/\Pe}}\xspace}^0$, a1, a2, b1, c1 and c2 are constants determined from data, which parameterize the dependence of $r_{\PGm/\Pe}$ on the lepton η and pT. The pT and η dependencies are determined by performing two separate fits, one for each dependency. In order to derive $f({\ensuremath{p_{\mathrm{T}}}\xspace})$ and g(η), the product ${\ensuremath{r_{\PGm/\Pe}}\xspace}^2(\Plp,\Plm) = {\ensuremath{r_{\PGm/\Pe}}\xspace}(\Plp){\ensuremath{r_{\PGm/\Pe}}\xspace}(\Plm)$ is considered. When studying ${\ensuremath{r_{\PGm/\Pe}}\xspace}^2(\Plp,\Plm)$ as a function of a single variable of the positive lepton, ${\ensuremath{r_{\PGm/\Pe}}\xspace}^2(\Plp)$, one must marginalize over the remaining variables, namely the kinematics of the other lepton and the other variables of the lepton under study:
$${\ensuremath{r_{\PGm/\Pe}}\xspace}^2(\Plp) = {\ensuremath{r_{\PGm/\Pe}}\xspace}(\Plp)\int{\ensuremath{r_{\PGm/\Pe}}\xspace}(\Plm)d\Plm = {\ensuremath{r_{\PGm/\Pe}}\xspace}(\Plp)\bar{{\ensuremath{r_{\PGm/\Pe}}\xspace}},$$
having assumed that the two leptons are independent, and that the η and pT of each lepton are independent. It should be noted that the choice of the positive lepton is just a way to randomize the selection: picking the leading or subleading lepton to derive the parameterization would break the independence assumption.
In a first step, the marginalized distribution ${\ensuremath{r_{\PGm/\Pe}}\xspace}^2(\Plp)$ as a function of pT allows the shape of $f({\ensuremath{p_{\mathrm{T}}}\xspace})$ to be obtained. However, since there is a degeneracy between a1, ${\ensuremath{r_{\PGm/\Pe}}\xspace}^0$ and $\bar{{\ensuremath{r_{\PGm/\Pe}}\xspace}}$, the fit only determines the ratio b1/a1; the overall magnitude ${\ensuremath{r_{\PGm/\Pe}}\xspace}^0$ is determined at a later step. The result of this fit is shown in figure [fig:susyrmuept].
In a second step, the η dependence of the ${\ensuremath{r_{\PGm/\Pe}}\xspace}$ factor is determined using the same methodology. In order to avoid correcting the pT dependence twice due to possible correlations between pT and η, each dielectron event is weighted by $f({\ensuremath{p_{\mathrm{T}}}\xspace}^{\Pl_1})f({\ensuremath{p_{\mathrm{T}}}\xspace}^{\Pl_2})$. Again, a degeneracy among the parameters is present, and only c1/a2 and c2/a2 can be obtained from this fit. The result of this fit is shown in figure [fig:susyrmueeta].
Since the fits described above do not fully determine the $r_{\PGm/\Pe}$ function, only its pT and η dependencies having been obtained, ${\ensuremath{r_{\PGm/\Pe}}\xspace}^0$ is determined inclusively, weighting dielectron events by $f({\ensuremath{p_{\mathrm{T}}}\xspace}^{\Pl_1})f({\ensuremath{p_{\mathrm{T}}}\xspace}^{\Pl_2})g(\eta^{\Pl_1})g(\eta^{\Pl_2})$ to fully account for the observed η and pT dependencies. The resulting values of the $r_{\PGm/\Pe}$ parameterization are shown in table [tab:rmuefittedvalues]. Significantly non-zero values are obtained for c1 and c2, which parameterize the η dependence; this dependence arises from the prefiring described in chapter [chap:cms].
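Putting the pieces together, the weight that translates each DF event into the SF prediction can be sketched as follows; the constants below are placeholders standing in for the fitted values of table [tab:rmuefittedvalues], and RT for the measured trigger factor:

```python
# placeholder constants, not the fitted values
R0, A1, B1, A2, C1, C2 = 1.1, 1.0, 2.5, 1.0, 0.05, 0.04
R_T = 1.02  # trigger factor, measured separately

def f_pt(pt):
    return A1 + B1 / pt

def g_eta(eta):
    if abs(eta) < 1.6:
        return A2
    return A2 + (C1 if eta > 1.6 else C2) * (abs(eta) - 1.6) ** 2

def r_mue(pt, eta):
    """Muon-to-electron efficiency ratio, factorized in pT and eta."""
    return R0 * f_pt(pt) * g_eta(eta)

def sf_weight(pt_mu, eta_mu, pt_e, eta_e):
    """Per-event weight translating one DF event into the SF prediction."""
    return 0.5 * (1.0 / r_mue(pt_mu, eta_mu) + r_mue(pt_e, eta_e)) * R_T

print(sf_weight(45.0, 0.3, 30.0, 1.9))
```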
After the $r_{\PGm/\Pe}$ parameterization has been completely determined, the distribution of $\sqrt{N_{\PGm\PGm}/N_{\Pe\Pe}}$ is studied as a function of several observables, after dielectron events have been reweighted by ${\ensuremath{r_{\PGm/\Pe}}\xspace}(\Pl_1){\ensuremath{r_{\PGm/\Pe}}\xspace}(\Pl_2)$. These distributions are shown in figure [fig:rmuermueclosure] for the three years of data taking. No significant trends are observed, and the residual trends, which can be due to correlations between the kinematics of the two leptons or to dependencies not covered by the ad-hoc parameterization, are used to estimate the systematic uncertainty of the method. Three sources of uncertainty are considered for $r_{\PGm/\Pe}$: a 5% flat uncertainty, a 5% uncertainty that modulates its pT dependence, and a 5% uncertainty that modulates its η dependence. As seen in figure [fig:rmuermueclosure], these uncertainties cover the residual dependencies.
The remaining ingredient is the RT factor, which corrects for the residual differences in trigger efficiencies. This factor is determined by measuring the trigger efficiencies using an orthogonal trigger method. In previous iterations of the analysis, these measurements were performed with triggers collecting events with high HT. However, the bandwidth dedicated to these triggers was significantly reduced for the 2017 and 2018 data taking, and the amount of data collected by them was not sufficient for a precise measurement. Because of this, triggers selecting events with large momentum imbalance are used instead.
The efficiencies for each of the channels are measured in dilepton events that do not enter the signal regions or their equivalent DF control regions. Since the trigger reconstruction algorithms and menus changed during Run 2, the measurement is performed separately for each of the years. These efficiencies are shown in table [tab:susyrt], together with the corresponding RT values. The measurement has also been performed as a function of several kinematic variables; no significant trends have been found, and a 4% systematic uncertainty is assigned to this factor to account for small statistical fluctuations in these distributions.
The whole method is validated in the control region described in the previous section, which is enriched in $\PQt\PAQt$ events. Data in this region are compared to the data-driven estimate of the flavor-symmetric background plus the simulation of the remaining backgrounds, as shown in figure [fig:rsfofclosure]. The data agree well with the prediction within the statistical and systematic uncertainties of the method.
Since the expected rate of flavor-symmetric events in the on-$\PZ$ regions is small, the statistical power of the DF control regions could be a limiting factor of the analysis. To tackle this, the requirements on these control regions are relaxed by removing the $86 < {\ensuremath{m_{\Pl\Pl}}\xspace}< 96$ $\,\text{Ge\hspace{-.08em}V}$ cut. To account for this, an additional factor, κ, is needed:
$$\kappa = \frac{N^{\mathrm{DF} (86 < {\ensuremath{m_{\Pl\Pl}}\xspace}< 96 {\ensuremath{\,\text{Ge\hspace{-.08em}V}}\xspace})}}{N^{\mathrm{DF} ({\ensuremath{m_{\Pl\Pl}}\xspace}> 20 {\ensuremath{\,\text{Ge\hspace{-.08em}V}}\xspace})}}.$$
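With this factor, the on-$\PZ$ flavor-symmetric prediction takes the schematic form (a restatement of the ingredients defined above, not an additional measurement):
$$N_{SF}^{86 < {\ensuremath{m_{\Pl\Pl}}\xspace}< 96} = \kappa \, {\ensuremath{\mathrm{R}_{\mathrm{SF/OF}}}\xspace}\, N_{DF}^{{\ensuremath{m_{\Pl\Pl}}\xspace}> 20},$$
so that the full DF sample enters the estimate and only the fraction expected inside the $\PZ$ window is retained.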
κ is estimated from data in a set of regions with DF lepton pairs, separately for different sets of signal regions. It is estimated in the DF control regions associated with the following regions:
the three strong production regions, SRA, SRB and SRC, merging the two $\PQb$-tagged jet multiplicity bins of each of them;
the “resolved” region;
the “boosted” region, removing the “resolved” veto;
the $\PH\PZ$ signal region.
These measurements are shown in figure [fig:kappameasurement]. The κ values for the six defined control regions are found to be consistent with simulations within the statistical uncertainty. However, the difference in SRC between the measured value and the one predicted by the simulations, after subtracting the statistical component, is taken as a systematic uncertainty.
The dependence of κ on ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ and on the jet multiplicity is also studied. For this purpose, events are selected that pass the minimal dilepton selection, have at least two jets and ${\ensuremath{\mathrm{M_{T2}}}\xspace}> 80$ $\,\text{Ge\hspace{-.08em}V}$, and have Δϕ between ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$ and the two leading jets greater than 0.4. κ is measured in events with and without $\PQb$-tagged jets, and as a function of ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$. These results are also shown in figure [fig:kappameasurement].
The following systematic uncertainties are considered for κ: the statistical uncertainty of the measurement, and a 20% uncertainty covering the trends observed as a function of ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$. The dependence on the $\PQb$-tagged jet multiplicity is found to be negligible compared to the others and is not taken into account.
DY+jets events can enter the signal regions due to the limited energy resolution and acceptance of the detector, which can lead to instrumental ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$. This ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ is usually small; however, due to the large cross section of the DY+jets process, it can contribute a significant number of events to the signal regions, especially those with lower ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ requirements.
In signal regions with large or moderate jet multiplicity requirements, i.e. the on-$\PZ$ regions and the slepton-edge search, instrumental ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ is dominated by energy mismeasurements of the jets. Therefore, the shape of the ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ distribution is determined from a $\PGg$+jets data sample, using the so-called “${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ templates” method.
In the slepton regions, in which only events with small hadronic activity are allowed, this assumption is no longer valid, and the “${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ templates” method cannot be used. In these regions, the 65 $\,\text{Ge\hspace{-.08em}V}$ $<{\ensuremath{m_{\Pl\Pl}}\xspace}<$ 105 $\,\text{Ge\hspace{-.08em}V}$ veto is inverted to obtain a region enriched in DY+jets, which can be used to infer its contribution.
The “${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ templates” method relies on the fact that energy mismeasurements of the jets dominate the instrumental ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$, while photons and leptons are expected to be measured with higher precision. Under that assumption, and assuming the same event topology for the DY+jets and $\PGg$+jets processes, the ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ distributions of the two should be equivalent. The distribution of ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ is then inferred from the photon control regions defined in section [sub:susyphotoncr]. This method has been developed within the UCSD group and is only briefly described here for completeness.
Processes with genuine ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ can contribute to the $\PGg$+jets samples. This effect is expected to be more significant in regions with large ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ requirements. In order to account for this, the contribution from these events is subtracted using simulations. The contributions from $\PW+\PGg$, $\PQt\PAQt\PGg$ and $\PZ(\to\PGn\PGn)\PGg$ are considered. The modeling of these processes is checked in a control region with a reconstructed photon and a muon, which is dominated by $\PW\PGg$ events. Simulations are found to overpredict data by 30%, therefore the subtraction is performed by scaling simulations by 0.7. A systematic uncertainty of 30% is associated to this subtraction across all the signal regions.
Additionally, because of the mass difference between the photon and the $\PZ$ boson, the boson pT distribution differs between $\PGg$+jets and dilepton samples. To account for this, $\PGg$+jets data events are reweighted so that their pT spectrum matches the one expected in DY+jets events. This reweighting is derived separately for each region using dedicated DY+jets and $\PGg$+jets simulated samples.
Finally, the validity of the method is checked using DY+jets and $\PGg$+jets simulations in each signal region separately. The prediction of the procedure applied to the $\PGg$+jets simulation is compared directly to the DY+jets simulation. The differences between the two are assigned as systematic uncertainties of the method. These differences range between 20% and 100%, the largest of which come from regions with a low number of simulated events.
This procedure provides an estimate of the ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ distribution of the DY+jets process in the signal region. This distribution is normalized using events with 50 $< {\ensuremath{{\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}}\xspace}< $ 100 $\,\text{Ge\hspace{-.08em}V}$, after the remaining backgrounds have been subtracted. Since a small contribution from signal may be present in this window, the normalization of the templates is parameterized as a freely floating parameter in the signal extraction fit. The region with 50 $ < {\ensuremath{{\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}}\xspace}< $ 100 $\,\text{Ge\hspace{-.08em}V}$ is also included in the fit to constrain this parameter. This way any possible signal hypothesis is taken into account in the background estimation.
In summary, the following uncertainties are considered in the method:
the statistical uncertainty of data in the control sample,
the uncertainty associated to the closure test in simulations,
the uncertainty associated to the electroweak subtraction.
The statistical uncertainty of the 50 $< {\ensuremath{{\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}}\xspace}<$ 100 $\,\text{Ge\hspace{-.08em}V}$ region, which is used to normalize the templates, is accounted for by the freely floating normalization parameter in the fit.
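As a concrete illustration of the normalization step, the sketch below rescales a $\PGg$+jets template so that its 50-100 $\,\text{Ge\hspace{-.08em}V}$ bin matches the background-subtracted data in that window. The binning, yields and names are placeholders; in the actual analysis this scale floats freely in the fit:

```python
import numpy as np

# ptmiss bins: [50-100, 100-150, 150-230, >230] GeV (placeholder binning)
template   = np.array([1000., 120., 25., 4.])  # gamma+jets template yields
data_first = 1100.                             # data in the 50-100 GeV bin
other_bkg  = 300.                              # non-DY backgrounds in that bin

# Scale the template so its first bin matches data minus other backgrounds.
scale = (data_first - other_bkg) / template[0]
dy_prediction = template * scale
print(dy_prediction)  # DY+jets estimate per ptmiss bin: [800. 96. 20. 3.2]
```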
The contribution from DY+jets in the slepton-edge regions can also be estimated using the “${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ templates” method. The DY+jets yield is estimated in a region with 86 $< {\ensuremath{m_{\Pl\Pl}}\xspace}< 96$ $\,\text{Ge\hspace{-.08em}V}$, which is more enriched in this process and is suitable for the “${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ templates” method. Then, the leakage of DY+jets into the off-Z signal regions must be taken into account. In order to do so, a transfer factor, ${\ensuremath{r_{\mathrm{in/out}}}\xspace}$, is constructed. The transfer factor is defined as
$${\ensuremath{r_{\mathrm{in/out}}}\xspace}^{[{\ensuremath{m_{\Pl\Pl}}\xspace}^1,{\ensuremath{m_{\Pl\Pl}}\xspace}^2]} = \frac{N({\ensuremath{m_{\Pl\Pl}}\xspace}^1 < {\ensuremath{m_{\Pl\Pl}}\xspace}< {\ensuremath{m_{\Pl\Pl}}\xspace}^2)}{N(86{\ensuremath{\,\text{Ge\hspace{-.08em}V}}\xspace}< {\ensuremath{m_{\Pl\Pl}}\xspace}< 96{\ensuremath{\,\text{Ge\hspace{-.08em}V}}\xspace})}$$
for each signal region. This factor is measured in the DY+jets control region with jets defined in section [subsub:dyjetsttbar]. The data yield, together with the contribution from backgrounds other than DY+jets, is shown in figure [fig:rinoutedgeslepton]. The contribution from these backgrounds is subtracted in the ${\ensuremath{r_{\mathrm{in/out}}}\xspace}$ calculation. These measurements are also performed as a function of ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ and $\mathrm{M_{T2}}(\Pl\Pl)$, showing some trends. In order to take them into account, a 50% uncertainty is assigned for regions below the $\PZ$ peak and a 100% uncertainty for regions above it. The results are shown in table [tab:rinoutvalues].
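A minimal sketch of the ${\ensuremath{r_{\mathrm{in/out}}}\xspace}$ computation with the background subtraction described above (hypothetical names and placeholder yields):

```python
def r_in_out(n_window, n_onz, bkg_window=0.0, bkg_onz=0.0):
    """Ratio of DY+jets yields in a given m_ll window to the on-Z
    (86-96 GeV) window, with non-DY backgrounds subtracted from both."""
    return (n_window - bkg_window) / (n_onz - bkg_onz)

# Placeholder example for the 20 < m_ll < 60 GeV window:
print(r_in_out(n_window=40.0, n_onz=800.0, bkg_window=10.0, bkg_onz=30.0))
# -> ~0.039; a 50% (100%) uncertainty is assigned below (above) the peak
```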
In the slepton regions, in which only low energy jets are allowed, the “${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ templates” method cannot be used for two reasons. First, a significant contribution of instrumental ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ from lepton energy mismeasurement can occur. Secondly, contributions to the instrumental ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ can arise from objects outside the detector acceptance or not passing the object definitions, which cannot be handled in a controlled way at the analysis level.
Because of this, and since the contribution of DY+jets to the slepton regions is subdominant, the contribution from DY+jets events is estimated directly from a control region defined by inverting the mass veto. This is done separately for each signal region and for each ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ bin. The contribution from the on-Z region is extrapolated to the off-Z region with another transfer factor, dubbed ${\ensuremath{r_{\mathrm{in/out}}}\xspace}^\text{slepton}$, which is determined from data in the DY+jets control region without jets defined in section [subsub:dyjetsttbar]. The data yields entering the measurement are shown in figure [fig:rinoutedgeslepton]. Since the slepton regions are not classified in $m_{\Pl\Pl}$, only one transfer factor (off-Z/on-Z) is computed. A systematic uncertainty is assigned to this procedure from a closure test on DY+jets simulations, resulting in a 50% uncertainty, which is dominated by the statistical uncertainty of the test. The results of the measurement are shown in table [tab:rinoutvalues].
A small contribution from signal is expected in the on-Z regions, which may be particularly significant in the high ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ bins. Because of that, these regions are included in the signal extraction, and the contribution of DY+jets is modeled with freely floating parameters in the fit. In total, 8 parameters are added to account for the DY+jets contribution in the two regions with and without jets, and in the four ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ bins. In that way, the DY+jets contribution is constrained by data in the on-Z region, taking into account any potential signal.
Processes with genuine ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ in which the two leptons come from the same $\PZ/\PGg^*$ boson are not included in either of the two groups above. Their contribution is taken into account using simulations. Given that some of them have a flavor-symmetric component, generator information is used to select simulated events in which the two leptons come from the decay of the same $\PZ/\PGg^*$ boson.
The contribution from these processes is dominated by the ${\ensuremath{\PW{}\PZ}\xspace}$, ${\ensuremath{\PZ{}\PZ}\xspace}$ and $\PQt{}\PAQt{}\PZ$ processes in the case of the regions with jets. Only the diboson processes contribute significantly to the slepton regions. The modeling of these processes is checked in the control regions defined in section [sub:susycrznu].
In the regions without jets, good agreement is observed between simulations and data. For $\PW{}\PZ$ a systematic uncertainty of 6% is taken, as measured by the latest CMS measurement . A mild trend is observed when considering the vector sum of the lepton associated to the $\PW$ decay and the ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$, which is a proxy of the modeling of the ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ distribution in regions with two leptons. A shape uncertainty is assigned to account for this difference. For $\PZ{}\PZ$ a small dependency of the normalization is seen in the $4\Pe$, $4\PGm$ and $2\Pe2\PGm$ channels, which is due to the uncertainties in the lepton efficiency calibration. A systematic uncertainty of 20% is used to account for this. Additionally, the shape difference between applying and not applying the k-factors is taken as a source of systematic uncertainty for this process.
In the regions with jets, mild differences between data and simulations are observed, depending on the data-taking era. These differences are used to correct the simulations to match data. The scale factors applied are shown in table [tab:edgescalefactorszmu]. It should be noted that the uncertainties associated to the scale factors include the statistical uncertainty only. In particular, the $\PQt{}\PAQt{}\PZ$ normalization is derived relative to the predicted cross section and in a particular region of the phase space valid for this analysis. A more thorough and precise study of this process is performed in chapter [chap:ttH]. Systematic uncertainties of 50%, 30% and 50% are applied to the $\PZ{}\PZ$, $\PW{}\PZ$ and $\PQt{}\PAQt{}\PZ$ processes, respectively, to account for the kinematic dependencies observed.
On top of the modeling uncertainties described above, the impact of experimental uncertainties, such as lepton and trigger efficiencies, jet energy scale and resolution, and $\PQb$ tagging, is taken into account.
In this section, the number of observed events in each one of the signal regions is compared to the predictions provided by the estimation methods discussed throughout this chapter.
The results for the SRs of the on-Z strong-production search are shown in figure [fig:susyresultst5zz], and those for the electroweak search are shown in figure [fig:susyresultsewkonz]. The ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ template prediction for each SR is normalized to the events with 50 $< {\ensuremath{{\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}}\xspace}<$ 100 $\,\text{Ge\hspace{-.08em}V}$, therefore data and prediction agree in the first bin of each distribution by construction.
Data in these regions shows a fairly good agreement with the prediction, and no remarkable differences are seen. The largest discrepancy occurs in the highest ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ bin of the resolved region, where 2 events are observed while 6.3 ± 2.2 were predicted. This difference corresponds to a local significance of -1.2 standard deviations (s.d.), compatible with a statistical fluctuation.
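The quoted significance can be approximated with a simple toy-based calculation, treating the prediction as a Poisson mean with a Gaussian constraint. This is a generic sketch of one common convention, not necessarily the exact procedure used in the analysis:

```python
import numpy as np
from scipy import stats

def local_significance(n_obs, b, sigma_b, n_toys=200_000, seed=1):
    """Signed significance of a deficit: Poisson counts with a
    Gaussian-constrained background expectation (truncated at zero)."""
    rng = np.random.default_rng(seed)
    b_toys = np.clip(rng.normal(b, sigma_b, n_toys), 0.0, None)
    counts = rng.poisson(b_toys)
    p_value = np.mean(counts <= n_obs)   # probability of seeing <= n_obs
    return stats.norm.ppf(p_value)       # convert to standard deviations

print(local_significance(2, 6.3, 2.2))   # roughly -1.2 s.d.
```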
The results for the 28 signal regions of the search for an edge are shown in figure [fig:susyresultsedge]. There is an overall good agreement between data and the expectations. The largest discrepancy appears in the non-$\PQt\PAQt$-like region with $\PQb$-tagged jets: in the bin with 20 $< {\ensuremath{m_{\Pl\Pl}}\xspace}<$ 60 $\,\text{Ge\hspace{-.08em}V}$, 2 events are observed while $7.4^{+3.7}_{-2.5}$ were expected, corresponding to a local significance of -2.4 standard deviations.
The results for the slepton search are shown in figure [fig:susyresultssleptons]. The DY+jets estimation has been obtained by performing a background-only fit to the on-Z regions only. An overall good agreement between data and the expectation is observed. The largest discrepancy appears in the region without jets, in the highest ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ bin, in which 17 events were observed while 9.6 ± 2.1 were predicted.
Since no significant differences between data and predictions have been observed in the searches, upper limits are set on the production cross sections of the simplified models considered for this search. These upper limits have been calculated at 95% confidence level (CL) using the CLs criterion in the asymptotic formulation described in section [sub:statistics]. These limits have also been compared, for a few representative points in each simplified model, with the ones obtained using the CLs criterion with toys, described in the same section. The two are found to agree within 1-5%, which validates the approach used.
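For a single counting bin, the toy-based CLs construction mentioned above reduces to the ratio of two tail probabilities. The sketch below (with placeholder yields) illustrates the criterion, a signal being excluded at 95% CL when CLs < 0.05:

```python
import numpy as np

def cls_counting(n_obs, b, s, n_toys=200_000, seed=7):
    """Toy-based CLs for one counting bin, using the observed count
    itself as the test statistic."""
    rng = np.random.default_rng(seed)
    cl_sb = np.mean(rng.poisson(b + s, n_toys) <= n_obs)  # P(n <= n_obs | s+b)
    cl_b  = np.mean(rng.poisson(b,     n_toys) <= n_obs)  # P(n <= n_obs | b)
    return cl_sb / cl_b

print(cls_counting(n_obs=2, b=6.3, s=8.0))  # well below 0.05: excluded
```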
The systematic uncertainties associated to the backgrounds have been described in section [sec:susybackgrounds]. For signal, besides the uncertainties associated to the calibration of lepton and trigger efficiencies, jet energy scale and resolution, and $\PQb$ tagging, additional uncertainties are added to account for the additional calibration needed in fast simulations. Further uncertainties are taken into account for the mismodeling of ISR in LO MadGraph samples. The uncertainties in the signal cross section due to the scale choice and missing higher orders in the calculation, which are estimated by varying the renormalization and factorization scales, are propagated directly to the observed exclusion contours.
Upper limits on the strong-production scenario are obtained from the interpretation of the strong on-Z regions. These upper limits are shown in figure [fig:susyinterpretationt5zz] as a function of the gluino and $\PSGczDo$ masses. The exclusion contours are defined as those points in which the upper limit matches the expected theoretical cross section, and both the observed and expected contours are shown. The results in these regions exclude gluino masses between 1600 and 1850 $\,\text{Ge\hspace{-.08em}V}$, depending on the $\PSGczDo$ mass. Upper limits are weaker in the regions with lower mass difference between the gluino and the $\PSGczDo$, since a lower efficiency for these models is expected because the jets produced in the decay of the gluino are softer. Weaker exclusions are also expected for low $\PSGczDo$ masses, since less ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ will be present.
The results of the electroweak signal regions are interpreted in terms of the electroweakino models. The resolved and boosted regions drive the sensitivity for $\PSGczDt\PSGcpmDo$ production.
For the $\PSGczDo\PSGczDo$ production models, the sensitivity is given both by the $\PZ\PZ$ and $\PH\PZ$ regions. For the cases in which the $\PSGczDo$ decays exclusively to the $\PZ$ boson, most of the sensitivity is expected from the $\PZ\PZ$ regions, although some efficiency is recovered by the $\PH\PZ$ regions in the cases in which one of the bosons decays hadronically. In the scenario in which the $\PSGczDo$ decays to $\PZ$ and $\PH$ bosons with a 50% probability each, most of the sensitivity is given by the $\PH\PZ$ regions, although this scenario also contributes with a 25% probability to the $\PZ{}\PZ$ final state. The exclusion limits for the electroweakino models are shown in figure [fig:susyinterpretationtchi] as a function of the NLSP and LSP masses.
In the $\PSGczDt\PSGcpmDo$ production model, the results exclude $\PSGczDt$ (or $\PSGcpmDo$) masses up to 750 $\,\text{Ge\hspace{-.08em}V}$. The observed limit is stronger than the expected one due to the observed yields being smaller than the prediction in two of the ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ bins of the resolved region.
For the $\PSGczDo\PSGczDo$ models, neutralino masses up to 700 $\,\text{Ge\hspace{-.08em}V}$are excluded in the first scenario, and up to 500 $\,\text{Ge\hspace{-.08em}V}$in the second scenario.
The results of the slepton edge regions are interpreted in terms of upper limits on sbottom and light-flavor squark production in the context of the slepton-edge models. The upper limits are shown in figure [fig:susyinterpretationt6bb] as a function of the squark and $\PSGczDt$ masses. These results probe sbottom masses between 1300 and 1600 $\,\text{Ge\hspace{-.08em}V}$, and light-flavor squark masses between 1600 and 1700 $\,\text{Ge\hspace{-.08em}V}$, depending on the $\PSGczDt$ mass. As described in section [sec:susysignals], the position of the edge depends on the $\PSGczDt$ mass, and therefore the exclusion limits are stronger in regions with low $\PSGczDt$ mass, which correspond to edge positions of 20-60 $\,\text{Ge\hspace{-.08em}V}$, where a slight deficit of data is seen.
Finally, the results of the slepton signal regions are interpreted in the context of slepton pair production. Here, both the on-Z and off-Z regions are included in the signal extraction fit, to account for any potential signal contamination in the DY+jets estimation. Upper limits on slepton pair production are shown in figure [fig:susyinterpretationslepton] as a function of the slepton and $\PSGczDo$ masses, showing that the results probe slepton masses up to 650 $\,\text{Ge\hspace{-.08em}V}$ for low $\PSGczDo$ masses.
A search for new physics in events with two opposite-sign same-flavor leptons has been presented in this chapter. The search has been performed with the data collected during the full Run 2 of the LHC, building on similar searches performed with the data collected during 2015 and 2016.
This search is sensitive to a variety of SUSY models involving both strongly and electroweakly produced sparticles. Dedicated signal regions have been defined targeting topologies inspired by a set of simplified SUSY models. Data-driven background estimation methods have been developed to estimate the contribution from the main backgrounds. Other backgrounds have been estimated using simulations, validated in dedicated control regions.
The results of the search have been found to be compatible with the SM predictions, and are used to impose upper limits on sparticle production in the context of the simplified models considered.
In this chapter, the measurements of the production of the $\PQt{}\PAQt{}\PH$ and $\PQt{}\PH{}$ processes are described. The chapter is outlined as follows. Section [sec:ttHcontext] contextualizes the measurement, describing the importance and features of the signals and the main backgrounds. The general analysis strategy is described in section [sec:ttHanalstrategy]. Object and event selections are described in sections [sec:ttHobjectselection] and [sec:ttHevtselection]. Background estimation methods and signal models are described in section [sec:ttHsigbkgestimation]. Further categorization techniques are used to perform the signal extraction. These techniques, as well as the statistical model employed, are described in section [sec:ttHsigextraction]. Finally, the results and their interpretation in the context of modified couplings of the Higgs boson are shown in sections [sec:ttHresults] and [sec:ttHinterpretation].
After the discovery of the Higgs boson, it is important to measure its properties and check their compatibility with the SM. Together with its spin, its coupling to massive bosons and fermions is one of the first properties to be checked. The main decay modes of the Higgs boson into some of the most massive particles, ${\ensuremath{\PW{}\PW}\xspace}$, $\PGt\PGt$, $\PQb\PAQb$ and ${\ensuremath{\PZ{}\PZ}\xspace}$, allow a direct measurement of these couplings with relatively high precision. The couplings to second generation particles, such as muons or $\PQc$ quarks, are too small to be measured precisely.
The Yukawa coupling of the top quark to the Higgs boson cannot be studied directly in Higgs boson decays, since the decay into two top quarks is not kinematically allowed. It can instead be explored by measuring the Higgs boson production via gluon fusion, which is the dominant production mode at the LHC, or its decay into a $\PGg\PGg$ pair. In these processes, for which some representative Feynman diagrams are shown in figure [fig:higgsfeyndiag], the top quark contribution appears in loops. These loops can however receive contributions from BSM particles. Therefore it is desirable to measure this coupling in processes in which it appears at tree level, and search for potential BSM contributions.
These measurements can be interpreted in the κ framework . In this framework, only modifications of the coupling strengths by BSM effects are considered, assuming the Lorentz structure of the couplings to be the same as in the SM. The coupling modifiers, $\kappa_i$, are defined in such a way that the cross sections and partial widths associated to the interaction of the particle i with the Higgs boson scale with a factor $\kappa_i^2$ at leading order. At higher orders this scaling property is lost; however, QCD corrections usually factorize and approximately preserve it.
In some cases, effective coupling modifiers such as $\kappa_{\PGg}$ or $\kappa_{\Pg}$ are used to scale the effective couplings of the Higgs boson to photons or gluons. These couplings can also be resolved by taking into account the contributions from the loops, assuming the SM matter content.
One of the processes that allows measuring $\kappa_{\PQt}$ with the highest precision is the associated production of a Higgs boson and a $\PQt\PAQt$ pair, ${\ensuremath{\PQt{}\PAQt{}\PH}\xspace}$. One of the leading order Feynman diagrams for this process is shown in figure [fig:higgsfeyndiag]. The study of $\PQt{}\PAQt{}\PH$ is the most precise way of measuring the top Yukawa coupling at tree level at the LHC. At leading order, the cross section of this process is proportional to $|\kappa_{\PQt}|^2$.
The $\PQt{}\PAQt{}\PH$ production cross section has been calculated at NLO in QCD with electroweak corrections according to the calculation in . The LO contribution features terms of order $O(\alpha_{\mathrm{S}}^2\alpha)$, while the NLO QCD contribution adds terms of order $O(\alpha_{\mathrm{S}}^3\alpha)$, involving the $\PQq\PAQq$, $\Pg\Pg$ and $\Pg\PQq$ channels. The additional electroweak corrections comprise terms proportional to $O(\alpha^3)$, $O(\alpha_{\mathrm{S}}\alpha^2)$, and $O(\alpha_{\mathrm{S}}^2\alpha^2)$. Terms of order $O(\alpha_{\mathrm{S}}\alpha^3)$ and $O(\alpha^4)$ are not included in the calculation, as they are expected to be small.
The other main process sensitive to the top Yukawa coupling at leading order is the associated production of a single top quark and a Higgs boson, $\PQt{}\PH{}$. This process also has the interesting feature of being affected by the relative sign of $y_{\PQt}$ and the coupling of the $\PW$ boson to the Higgs boson. Two production modes are relevant for this process: the t-channel and the $\PQt\PW$-channel. The s-channel is usually not considered due to its lower cross section.
These processes receive contributions from diagrams that scale with $\kappa_{\PQt}$ or with $\kappa_{\PW}$, as shown in figure [fig:higgsfeynthq]. Since the initial and final states are the same for the diagrams involving $\kappa_{\PQt}$ and $\kappa_{\PW}$, these diagrams interfere. The t-channel and ${\ensuremath{\PQt{}\PW}\xspace}$-channel production modes are well defined up to NLO in the SM. At higher orders some of the processes interfere among themselves: for instance, the t-channel interferes with the $\PQt{}\PW$ and s-channels through the hadronic decay of the $\PW$ boson. Nevertheless, this interference is small, and NLO calculations for the t-channel exist in the four- and five-flavor schemes. The ${\ensuremath{\PQt{}\PW}\xspace}$ channel, on the other hand, interferes with $\PQt{}\PAQt{}\PH$ production at NLO. Calculations exist that tackle this interference using the DR and DS schemes covered in chapter [chap:topphysics].
At leading order and at $\sqrt{s} = 13$ TeV, the cross sections for these processes can be written as a function of the coupling modifiers as :
$$\sigma_{\PQt\PH\PQq} = (2.63 \kappa_{\PQt}^2 + 3.58 \kappa_{\PW}^2 - 5.21 \kappa_{\PQt} \kappa_{\PW}) \sigma^{\mathrm{SM}}_{\PQt\PH\PQq}$$
$$\sigma_{\PQt\PH\PW} = (2.91 \kappa_{\PQt}^2 + 2.31 \kappa_{\PW}^2 - 4.22 \kappa_{\PQt} \kappa_{\PW}) \sigma^{\mathrm{SM}}_{\PQt\PH\PW}.$$
In the SM case, the interference is destructive and the cross sections for these processes are small. In the opposite case, with $|\kappa_{\PQt}|=|\kappa_{\PW}|$ and $\kappa_{\PQt}/\kappa_{\PW}=-1$, the interference between the diagrams is constructive and the cross sections for the t-channel and $\PQt{}\PW$-channel are 11 and 9 times higher than in the SM, respectively. This scenario is referred to as the inverted top coupling (ITC) scenario, and makes $\PQt{}\PH{}$ measurements particularly interesting, since such a coupling modification does not affect the cross section for $\PQt{}\PAQt{}\PH$ production.
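These parameterizations can be evaluated directly; the short sketch below reproduces the SM and ITC cross section ratios quoted above:

```python
def thq_ratio(kt, kw):
    """sigma(tHq)/sigma_SM at LO, using the parameterization above."""
    return 2.63 * kt**2 + 3.58 * kw**2 - 5.21 * kt * kw

def thw_ratio(kt, kw):
    """sigma(tHW)/sigma_SM at LO, using the parameterization above."""
    return 2.91 * kt**2 + 2.31 * kw**2 - 4.22 * kt * kw

print(thq_ratio(1, 1),  thw_ratio(1, 1))   # SM:  1.00 1.00
print(thq_ratio(-1, 1), thw_ratio(-1, 1))  # ITC: 11.42 9.44
```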
The study of these processes can be used to probe CP violation in the Higgs sector. Even if the pseudoscalar Higgs boson is disfavored by experimental data , CP violation in the top-Higgs interaction is still allowed. In particular, beyond the κ framework, the Lagrangian of the top Yukawa sector can be generalized as follows :
$$\mathcal{L} = \bar{\psi_{\PQt}} \left( \cos(\alpha) \kappa_{\PH\PQt\PQt} + \sin(\alpha) \kappa_{\mathrm{A}\PQt\PQt} \gamma^5 \right)\frac{y_{\PQt}}{\sqrt{2}} \psi_{\PQt} \phi,$$
where α is a CP-mixing phase, and $\kappa_{\PH\PQt\PQt}$ and $\kappa_{\mathrm{A}\PQt\PQt}$ are dimensionless rescaling parameters. $\psi_{\PQt}$ corresponds to the top quark spinor, and ϕ is a scalar representing the Higgs field. The SM case is recovered when α = 0, $\kappa_{\PH\PQt\PQt}=1$ and $\kappa_{\mathrm{A}\PQt\PQt}=0$, while α = π corresponds to the ITC case. These two scenarios correspond to the absence of CP violation, while cases with 0 < α < π allow some degree of CP violation. The pure CP-odd case occurs at α = π/2.
Of course, contributions to this Lagrangian are significantly constrained by measurements of the Higgs boson production cross section via gluon fusion and of its $\PGg\PGg$ decay. However, the SM cross section for these processes is recovered when $\kappa_{\PH\PQt\PQt} =1$ and $\kappa_{\mathrm{A}\PQt\PQt} = 2/3$. For these choices, all possible values of α result in a gluon fusion cross section consistent with the SM. However, the cross sections for $\PQt{}\PAQt{}\PH$ and $\PQt{}\PH{}$ production are affected by this CP phase, as shown in figure [fig:higgscpviolation].
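The α independence of the gluon fusion rate for this particular choice can be sketched in the heavy-top limit, in which the scalar and pseudoscalar loop amplitudes do not interfere and their form factors have a ratio of approximately 3/2 (an approximate argument, not the exact calculation):
$$\frac{\sigma_{\Pg\Pg\PH}}{\sigma^{\mathrm{SM}}_{\Pg\Pg\PH}} \simeq \kappa_{\PH\PQt\PQt}^2\cos^2\alpha + \left(\frac{3}{2}\kappa_{\mathrm{A}\PQt\PQt}\right)^2\sin^2\alpha \;\overset{\kappa_{\PH\PQt\PQt}=1,\;\kappa_{\mathrm{A}\PQt\PQt}=2/3}{=}\; \cos^2\alpha + \sin^2\alpha = 1.$$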
$\PQt{}\PAQt{}\PH$ and $\PQt{}\PH{}$ production can also be sensitive to some of the terms of the Higgs potential . As seen in the diagram in figure [fig:kappalambda], higher order electroweak corrections to this process include diagrams proportional to κλ, the modifier of the trilinear term of the Higgs potential. Similar diagrams appear in $\PQt{}\PH{}$ production, also shown in figure [fig:kappalambda]. This figure also shows the effect of the κλ correction with respect to the LO calculation, confirming the potential sensitivity of a differential $\PQt{}\PAQt{}\PH$ or $\PQt{}\PH{}$ measurement to κλ.
The production rate of $\PQt{}\PAQt{}\PH$ in $\Pp\Pp$ collisions is shown as a function of $\sqrt{s}$ in figure [fig:higgscrosssectionsqrts], compared to the other Higgs production modes. The $\PQt{}\PAQt{}\PH$ cross section is one order of magnitude lower than those of the main Higgs production modes, and the $\PQt{}\PH{}$ cross section is yet another order of magnitude lower. This makes the study of these processes very challenging. Additionally, the different decays of the top quarks and the Higgs boson lead to a large variety of final states with very different topologies.
The $\PQb\PAQb$ decay of the Higgs boson is the dominant mode in terms of rate at the LHC, but it is affected by large backgrounds. It is searched for in the fully hadronic, single lepton and dilepton channels. The fully hadronic channel is largely affected by the multijet background, while the $\PQt\PAQt$+jets background affects all the channels. The latter includes contributions from $\PQt\PAQt$ in association with light jets, with $\PQc$ jets, and the irreducible contribution from $\PQt\PAQt$+$\PQb\PAQb$. The latest results in this decay mode were obtained by the CMS Collaboration with 77.4 fb$^{-1}$ of $\Pp\Pp$ collisions , and with 36.1 fb$^{-1}$ by the ATLAS Collaboration . Both analyses make use of multivariate analysis techniques to discriminate signal from background events. The CMS analysis achieves a sensitivity of 3.9 standard deviations to $\PQt{}\PAQt{}\PH$ production, while the ATLAS analysis achieves a sensitivity of 1.4 standard deviations with the single lepton and dilepton categories only.
The $\PGg\PGg$ decay mode of the Higgs boson provides a cleaner environment with smaller backgrounds. This final state also allows the Higgs boson system to be easily reconstructed, which makes this channel suitable for differential studies. Additionally, the size of the datasets recorded by the ATLAS and CMS experiments makes measurements in this channel competitive with those of the rest of the channels. Both Collaborations have released results with the full Run 2 datasets , reaching an observation of the process with 5.2 and 6.6 standard deviations, compared to 4.4 and 4.7 expected, respectively. The ATLAS analysis also includes a measurement of $\PQt{}\PH{}$ production, but still without enough sensitivity to obtain an observation. Both analyses include a study of CP violation in this sector, excluding the pure CP-odd model at 3.9 and 3.2 σ, respectively.
This chapter of the thesis is dedicated to the measurement of $\PQt{}\PAQt{}\PH$ and $\PQt{}\PH{}$ production in the multilepton channel, so it will be described in greater detail in the following sections. This channel probes the $\PW\PW$ and $\PGt\PGt$ decay modes, although it has a small contribution from the $\PZ\PZ$ decays. These decay modes have a moderately large branching fraction, and the searches are made in final states in which only small backgrounds are expected, such as regions with at least two same-sign leptons and a moderately large jet multiplicity. The search shown in this chapter corresponds to the CMS measurement performed with the full Run 2 dataset. Previous searches to which I have contributed are the ones released with the 2016 and 2017 datasets , obtaining a signal strength of $0.96^{+0.34}_{-0.31}$. The latest result by the ATLAS Collaboration was shown in , measuring a signal strength for $\PQt{}\PAQt{}\PH$ of $0.58^{+0.36}_{-0.33}$.
Measurements of the $\PZ{}\PZ$ decay of the Higgs boson also typically include dedicated categories targeting the $\PQt{}\PAQt{}\PH$ production mode. However, despite the interest of these final states, their branching ratio is still too small for a measurement of the process with the available datasets.
In addition to the recent observations of this process in the $\PGg\PGg$ channel, $\PQt{}\PAQt{}\PH$ production had already been observed in the datasets collected during 2016 by the ATLAS and CMS Collaborations. These observations were made by combining the most sensitive of the channels described above, including the one described in this thesis.
On top of the importance of the $\PQt{}\PAQt{}\PH$ process itself for the reasons mentioned above, some of its backgrounds in the multilepton channels are interesting in themselves, as tests of the consistency of the SM or as probes of BSM physics. This is the case of $\PQt{}\PAQt{}\PZ$ and $\PQt{}\PAQt{}\PW$ production, which are the leading contributions to the irreducible backgrounds of the measurement. Their production cross sections are small, yet comparable to those of the signals.
$\PQt{}\PAQt{}\PZ$ production is interesting because diagrams exist in which the $\PZ$ boson couples to the top quark at leading order. Therefore it is one of the best probes of this coupling in the SM. Additionally, both $\PQt{}\PAQt{}\PZ$ and $\PQt{}\PAQt{}\PW$ are sensitive to BSM physics affecting this and other couplings, which can be parameterized as additional terms in an effective Lagrangian.
The $\PQt{}\PAQt{}\PZ$ and $\PQt{}\PAQt{}\PW$ production cross sections have been calculated to NLO accuracy in QCD with electroweak corrections in , which is taken as a reference. These corrections, similarly to those for $\PQt{}\PAQt{}\PH$, do not include $O(\alpha_{\mathrm{S}}\alpha^3)$ terms. However, the contribution from this correction is sizable for $\PQt{}\PAQt{}\PW$ production, as it contains the $\Pg\PQq\to \PQt\PAQt\PW^{\pm}\PQq'$ real emission channel, which includes $\PQt\PW\to\PQt\PW$ scattering and represents a 12% correction with respect to the LO cross section . The complete cross section including electroweak corrections is reported in . Additionally, the contribution from corrections with one extra parton can be sizable, of the order of an additional 10% . Finally, it was recently suggested that off-shell effects in the production of the top quarks yield negligible corrections to the inclusive cross section, but may have a sizable effect on kinematic distributions .
The ATLAS and CMS Collaborations have performed measurements of these processes at $\sqrt{s}=8$ and 13 TeV. The most recent results on the topic reach a high level of precision for $\PQt{}\PAQt{}\PZ$ , showing a fairly good compatibility with the state-of-the-art predictions. For $\PQt{}\PAQt{}\PW$, both ATLAS and CMS have released results in which data shows a preference for higher values of the cross section of this process .
In the following sections, the measurement of the $\PQt{}\PAQt{}\PH$ and $\PQt{}\PH{}$ production cross sections in the multilepton channel, performed with data collected with the CMS detector, is described. This analysis is performed in events with light leptons and hadronically decaying $\PGt$ leptons in the final state. Event categories are built depending on the lepton multiplicity and flavor. Categories with requirements on hadronically decaying $\PGt$ leptons (${\ensuremath{\PGt_\mathrm{h}}\xspace}$) provide sensitivity to the $\PH\to\PGt\PGt$ decay mode, while the remaining ones target the $\PW\PW$ and $\PZ\PZ$ decays, although they provide some sensitivity to the $\PGt\PGt$ channel thanks to the leptonic decays of the $\PGt$.
The topologies selected in the analysis, requiring relatively large lepton multiplicities or lepton pairs with equal charge, are chosen in such a way that only rare processes, such as the signal, are accepted. Because of this, and since the cross section of the signal is very small, typical backgrounds with cross sections orders of magnitude larger may have a sizable contribution through events with misidentified leptons or with leptons coming from non-prompt decays.
Because of this, one of the key ingredients of the analysis is the object identification. This is crucial in the case of leptons, as a large discrimination power is needed to reject leptons coming from non-prompt decays of partons, while keeping a high signal efficiency. For that purpose, MVA techniques for electron and muon identification are used, as described in chapter [chap:muon]. Additionally, state-of-the-art identification algorithms are used to discriminate $\PGt_\mathrm{h}$ from jets.
On top of the lepton multiplicity cuts, subcategories are defined by applying additional criteria to the jet multiplicity and to kinematic variables describing the global features of the event. Then, events are further classified according to multivariate discriminators taking as inputs kinematic variables of the reconstructed objects, as sketched below. These classifiers are trained to discriminate the signal from the various backgrounds in each category. Additionally, a control analysis is performed to check the consistency of the result in the most sensitive categories without using multivariate discriminators trained on simulations.
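The classification step can be sketched as follows; the input variables, toy sample sizes and the classifier choice are illustrative placeholders, not the exact configuration used in the analysis:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
# Toy kinematic inputs: (jet multiplicity, leading-lepton cone-pT, ptmiss)
x_sig = rng.normal([5.0, 60.0, 80.0], [1.5, 20.0, 40.0], size=(1000, 3))
x_bkg = rng.normal([4.0, 45.0, 60.0], [1.5, 20.0, 40.0], size=(1000, 3))
X = np.vstack([x_sig, x_bkg])
y = np.concatenate([np.ones(1000), np.zeros(1000)])

# Train a classifier to separate signal from background events.
clf = GradientBoostingClassifier(n_estimators=200, max_depth=3)
clf.fit(X, y)

# Events are then binned in the classifier output to build the
# distributions used in the signal extraction.
scores = clf.predict_proba(X)[:, 1]
```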
Despite the use of advanced techniques for background discrimination, the signal regions have a sizable contribution from backgrounds. This contribution needs to be accounted for in an accurate way in order to achieve a high precision in the measurement of the signal. Two types of background processes are considered: irreducible and reducible backgrounds.
Irreducible backgrounds are those in which the reconstructed leptons are genuine leptons coming from prompt decays, and the reconstructed ${\ensuremath{\PGt_\mathrm{h}}\xspace}$ are produced in hadronic $\PGt$ decays. They are dominated by $\PQt{}\PAQt{}\PW$ and $\PQt{}\PAQt{}\PZ$ production in the main categories, but with significant contributions from $\PQt{}\PZ{}\PQq$ and $\PW{}\PZ$. Other processes also appear as irreducible backgrounds in some categories. These processes are estimated using state-of-the-art simulations, normalized to the most accurate cross section predictions. Additionally, control regions are defined whenever possible to check the level of agreement between data and these models.
Reducible backgrounds are those which enter the signal regions even though their true topology does not match the targeted signal region definition. Here a distinction is made between non-prompt leptons, “flips” and “conversions”. “Flips” are leptons in which the sign of the charge has been measured incorrectly, while “conversions” refer to electrons that have been produced in the conversion of a photon. The non-prompt leptons and flips are estimated using dedicated data-driven methods, while conversions are estimated with simulations.
The work described in this chapter has been performed in collaboration with other groups in the CMS Collaboration: CERN, IHEP, LLR/CNRS (Ecole Polytechnique), NICPB (Tallinn) and UCL. My contributions to the analysis focus on the categories without $\PGt_\mathrm{h}$. I have contributed to the development of the electron and muon identification criteria and their optimization, to the estimation of the non-prompt lepton background, to the development of new control regions for the irreducible backgrounds, and to the signal extraction fit.
As mentioned above, the selection of objects is designed to achieve the maximum purity, particularly for leptons, which are used to define the topology of event candidates.
Three types of light lepton ($\Pe$ and $\PGm$) selections are defined. The “loose” selection is the baseline on top of which the other two definitions are built. It is designed to be very inclusive, keeping a large signal efficiency, but also a rather large acceptance for leptons coming from heavy-quark decays. Loose leptons are only used to remove duplicate leptons that are close-by, to reject low mass resonances, and to efficiently reduce events with a $\PZ$ boson candidate.
Loose leptons are required to have ${\ensuremath{p_{\mathrm{T}}}\xspace}$ of at least 5 $\,\text{Ge\hspace{-.08em}V}$ and to be within the acceptance, ∣η∣ < 2.4 for muons and ∣η∣ < 2.5 for electrons. Additionally, they are required to have an impact parameter with respect to the primary vertex of less than 0.05 cm in the transverse plane and 0.1 cm in the direction along the beam line. The significance of this impact parameter, defined as its value divided by its associated uncertainty, is required to be less than 8. A mild isolation cut is applied to these leptons by requiring the mini-isolation to be less than 40% of the pT of the lepton. Muons are required to pass the “loose” working point defined in section [sec:muoidentification]. Electrons are required to have ${\ensuremath{p_{\mathrm{T}}}\xspace}>7$ $\,\text{Ge\hspace{-.08em}V}$, at most one missing hit along the expected trajectory in the tracker, and to pass the “loose” MVA selection defined in section [sec:electronreconstructionid].
Leptons passing the “fakeable” selection are used to estimate the contribution from non-prompt leptons in data samples. This selection is constructed to keep a higher acceptance for non-prompt leptons, but it is tuned so that the probability for a non-prompt lepton passing the fakeable criteria to also enter the signal selection is constant regardless of the flavor of the parton producing the lepton. Since these leptons are used to estimate the contribution from non-prompt leptons, all the regions are constructed applying cuts on the fakeable object multiplicity. Jets that have been clustered together with a fakeable lepton are not taken into account.
Fakeable leptons are required to pass the loose identification criteria. Additionally, the jet associated to the lepton is required to fail the medium working point of the DeepFlavor discriminant. This cut is applied because jets passing this criterion are likely to be $\PQb$ jets, with the lepton being a non-prompt lepton produced in the decay of the hadrons in the parton shower. Additionally, electrons are required to have no missing hits along the expected trajectory, and to pass a set of cuts designed to be tighter than the requirements at the HLT. These cuts are set on the ratio of the energy in the HCAL to the energy in the ECAL, H/E < 0.1; on the difference between the inverse of the electron cluster energy and the inverse of the track momentum, $\frac{1}{E} - \frac{1}{p} > -0.04$; and on the width of the electron cluster in the η direction, σiηiη < 0.011 (0.03) for $|\eta_{SC}| < 1.479$ ($|\eta_{SC}| > 1.479$). Electrons are also rejected if they are associated with a successfully reconstructed conversion vertex. Finally, electrons that are closer than 0.3 in ΔR to a loose muon are discarded.
Then, additional requirements are applied to leptons not passing the tight prompt-lepton MVA identification criteria described below. For electrons, the jet relative isolation variable, defined in section [sec:muonleptonmva], is required to be less than 0.7. For muons, this variable is required to be less than 0.5, and the DeepFlavor score of the associated jet is required to fail a working point that ranges between the loose and medium ones, depending on the muon cone-pT.
The cone-pT variable is designed to be, for non-prompt leptons, a proxy of the pT of the parton that originated the jet in which the lepton is produced and, for prompt leptons, a proxy of the actual lepton pT. It is therefore defined as the pT for leptons passing the tight selection, and as $0.9{\ensuremath{p_{\mathrm{T}}}\xspace}(1+I_{jet})$ for the rest, where $I_{jet}$ is the jet relative isolation. This variable is more suitable to parameterize the fake rate, and therefore the event-level variables are defined in terms of it. In this chapter, cone-pT is referred to as pT for fakeable leptons, unless explicitly mentioned. Fakeable leptons are required to have cone-pT > 10 $\,\text{Ge\hspace{-.08em}V}$.
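Since the definition is fully specified above, it can be written down directly (the function name is illustrative):

```python
def cone_pt(pt, jet_rel_iso, passes_tight):
    """Cone-pT as defined above: plain pT for tight leptons, otherwise
    0.9 * pT * (1 + I_jet), a proxy for the mother-parton pT."""
    return pt if passes_tight else 0.9 * pt * (1.0 + jet_rel_iso)

print(cone_pt(20.0, 0.6, passes_tight=False))  # 28.8 GeV
print(cone_pt(20.0, 0.6, passes_tight=True))   # 20.0 GeV
```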
Finally, the “tight” selection is used to achieve the maximum purity, with stringent cuts on the prompt-lepton MVA score, and is used to define the signal and irreducible background control regions. Leptons in this category are required to have a prompt-lepton MVA score, defined in section [sec:muonleptonmva], greater than 0.85 (0.8) for electrons (muons). Muons are additionally required to pass the medium criteria described in section [sec:muoidentification]. The working point of the cut on the prompt-lepton MVA has been carefully tuned to achieve the maximum sensitivity in the analysis. All the selection criteria for electrons and muons are summarized in tables [tab:eleid] and [tab:muoid], respectively.
$\PGt_\mathrm{h}$ are reconstructed using the method described in section [sec:tauhreconstruction], and the working points defined for the DeepTau v2.1 discriminators are used. Similarly to the light lepton selection and with the same purposes, three levels of $\PGt_\mathrm{h}$ identification criteria are defined: loose, fakeable and tight.
Loose $\PGt_\mathrm{h}$ are required to have pT > 20 $\,\text{Ge\hspace{-.08em}V}$ and ∣η∣ < 2.3. The impact parameter is required to be less than 1000 cm in the transverse plane and less than 0.2 cm along the beam direction. The DeepTau discriminator against jets is required to pass the VVLoose working point.
Fakeable $\PGt_\mathrm{h}$ are additionally required to pass the VLoose DeepTau discriminator against muons and the VVVLoose DeepTau discriminator against electrons. Only $\PGt_\mathrm{h}$ that have not been reconstructed in the 2-prong+π0 decay mode are considered. Tight $\PGt_\mathrm{h}$ have additional cuts on the DeepTau discriminator against jets, which depend on the category. All these selections are summarized in table [tab:tthtauid].
Jets in the analysis are reconstructed following the procedure outlined in section [sec:jetdefinition]. They are required to have ∣η∣ < 5 and pT > 25 $\,\text{Ge\hspace{-.08em}V}$, maximizing the acceptance for the spectator quark in $\PQt{}\PH{}$ events, which is expected to be emitted in the forward direction a sizable fraction of the time. Jets within the tracker acceptance are simply referred to as “jets”, while the rest are dubbed “forward jets”. Jets with 2.7 < ∣η∣ < 3 are required to have pT > 60 $\,\text{Ge\hspace{-.08em}V}$, as this region is problematic due to noise in the calorimeters, which introduces a large bias in the pT estimation of soft jets.
Since all PF candidates are introduced in the jet clustering algorithm, leptons may also be identified as jets. Since only jets produced in the parton shower of a colored particle are of interest, jets that include the PF candidate associated to a fakeable electron or muon, or that are closer than ΔR < 0.3 to a loose $\PGt_\mathrm{h}$, are not considered.
Jets are considered $\PQb$-tagged if they pass the medium or loose working point of the DeepFlavor discriminant. Some of the MVA variables used in the analysis make use of the score of this discriminator. Ad-hoc calibrations are applied to simulations in order for the shape of the discriminator to be well modeled.
Finally, “light jets” are defined as the jets with ∣η∣ < 2.4 that fail the loose $\PQb$-tagging requirement, together with the jets with ∣η∣ > 2.4.
Events are recorded using a set of single-, double- and triple-lepton triggers, lepton+$\PGt_\mathrm{h}$ triggers, and double-$\PGt_\mathrm{h}$ triggers. Triggers are required to be consistent with the minimum lepton requirements of each category. For instance, events in a category with only two leptons are required to have fired a single- or double-lepton trigger, while no requirements are applied on the triple-lepton triggers or the triggers with $\PGt_\mathrm{h}$. These triggers require leptons and $\PGt_\mathrm{h}$ to have a pT above a certain threshold, in order to keep their rates within acceptable values. The thresholds vary by a few $\,\text{Ge\hspace{-.08em}V}$ depending on the data taking period, to account for the conditions of each period. The thresholds of these triggers are shown in table [tab:ttHtriggers].
As mentioned in the previous sections, several signal regions are defined according to the lepton and $\PGt_\mathrm{h}$multiplicity of the event. Each one of these categories aims for a specific decay mode of the top quarks and the Higgs boson, therefore specific cuts are applied for each signal region separately. All events that have a pair of loose leptons with invariant mass smaller than 12 $\,\text{Ge\hspace{-.08em}V}$are rejected, in order to avoid contributions from low mass resonances.
The signal regions are defined in the following. The first three regions, $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$, $3 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ and $2 \Pl ss +1 {\ensuremath{\PGt_\mathrm{h}}\xspace}$, aim to be sensitive to both the $\PQt{}\PAQt{}\PH$ and $\PQt{}\PH{}$ processes, while the others target $\PQt{}\PAQt{}\PH$ production only.
In this region, the $\PH\to\PW\PW$ decay is targeted, in which one of the $\PW$ bosons decays hadronically and the other one leptonically. One of the two top quarks in $\PQt{}\PAQt{}\PH$ events decays hadronically and the other one leptonically. In the case of $\PQt{}\PH{}$, the top quark decays leptonically. The expected topology then contains two leptons and up to six hadronic jets for $\PQt{}\PAQt{}\PH$ events, and four for $\PQt{}\PH{}$. The signs of the leptons coming from the Higgs boson and from the top quark are uncorrelated, so this final state produces same-sign leptons in 50% of the cases. Since this signature is very rare in the SM, the two leptons are required to have the same sign.
The leading (sub-leading) lepton is required to have pT > 25 (15) $\,\text{Ge\hspace{-.08em}V}$, to be above the trigger thresholds. Additional cuts are applied on the consistency of the measured charge of electrons and muons to suppress background events in which this charge has been measured with the incorrect sign. In the case of muons, the estimated uncertainty on the track pT must be smaller than 20% of the pT itself. For electrons, consistency is required between the determination of the charge sign from the relative position of the ECAL cluster and the track, and the determination from the track curvature.
Background from $\PQt{}\PAQt{}\PZ$ events is largely removed by rejecting events that contain an opposite-sign same-flavor pair of loose leptons with $|{\ensuremath{m_{\Pl\Pl}}\xspace}- m_{\PZ}| < 10$ $\,\text{Ge\hspace{-.08em}V}$. In the cases in which the two selected leptons are electrons, events are required to have ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ LD > 30 $\,\text{Ge\hspace{-.08em}V}$, to reject the residual contribution from DY+jets events with mismeasured charge.
Two different selections are applied targeting $\PQt{}\PAQt{}\PH$ and $\PQt{}\PH{}$ production. Events are required to pass either one of these selections:
$\PQt{}\PAQt{}\PH$-like events are required to have at least 3 jets, among which at least two pass the loose $\PQb$-tagging requirement or one passes the medium $\PQb$-tagging requirement. Since this $\PQb$-tag multiplicity requirement is used in several of the signal regions, this criterion will be referred to as the $\PQb$-tag multiplicity selection.
$\PQt{}\PH{}$-like events are required to have at least one jet passing the medium $\PQb$-tagging requirement and at least one light jet.
In order to remove the overlap with respect to the rest of the categories, events with more than two tight leptons or a loose $\PGt_\mathrm{h}$are rejected.
In this region, the $\PW\PW$ decay of the Higgs boson is also targeted. In this case, the final state aimed for is the one in which three of the $\PW$ bosons decay leptonically and the remaining ones hadronically. Therefore four jets are expected in $\PQt{}\PAQt{}\PH$ events and two in $\PQt{}\PH{}$ events. Exactly three tight leptons are required in the event, passing pT > 25, 15 and 10 $\,\text{Ge\hspace{-.08em}V}$ cuts. The sum of the lepton charges is required to be either +1 or -1. $\PQt{}\PAQt{}\PZ$ events are suppressed by rejecting events with an opposite-sign same-flavor pair of leptons with $|{\ensuremath{m_{\Pl\Pl}}\xspace}- m_{\PZ}| < 10$ $\,\text{Ge\hspace{-.08em}V}$. Again, two different selections are applied targeting $\PQt{}\PAQt{}\PH$ and $\PQt{}\PH{}$ production:
$\PQt{}\PAQt{}\PH$-like events are required to have at least two jets and to pass the $\PQb$-tag multiplicity criterion. Events with less than four jets are required to have ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ LD > 30 $\,\text{Ge\hspace{-.08em}V}$. This cut is raised to 45 $\,\text{Ge\hspace{-.08em}V}$ if the event contains a pair of opposite-sign same-flavor leptons.
$\PQt{}\PH{}$-like events are required to have at least one medium $\PQb$-tagged jet and at least one light jet.
Events are required to pass one of the two selections. Additionally, if two pairs of opposite-sign same-flavor leptons are present in the event, $m_{\Pl\Pl\Pl\Pl}$ is required to be greater than 140 $\,\text{Ge\hspace{-.08em}V}$, to remove the overlap between this analysis and the dedicated $\PH\to\PZ\PZ$ analysis .
In this category, the decay of the Higgs boson into a $\PGt\PGt$ pair is targeted. One of the $\PGt$ leptons must decay leptonically and the other into a ${\ensuremath{\PGt_\mathrm{h}}\xspace}$. One of the top quarks must decay leptonically and the other, in the $\PQt{}\PAQt{}\PH$ case, hadronically. Four jets are then expected for $\PQt{}\PAQt{}\PH$ events and two for $\PQt{}\PH{}$. Similarly to the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region, the two leptons are required to have the same sign.
Besides the additional $\PGt_\mathrm{h}$ requirement, where the $\PGt_\mathrm{h}$ is required to pass the very loose working point of the DeepTau discriminator against jets, the selection is identical to that applied in the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region, with the exception that if the sub-leading lepton is a muon, its pT threshold is relaxed to 10 $\,\text{Ge\hspace{-.08em}V}$. Additionally, the charge of the $\PGt_\mathrm{h}$ is required to be opposite to that of the leptons. Finally, the presence of additional loose $\PGt_\mathrm{h}$ is vetoed in order to keep orthogonality with the rest of the analysis regions.
This category is designed for $\PQt{}\PAQt{}\PH$ events in which the Higgs boson decays into a ${\ensuremath{\PGt_\mathrm{h}}\xspace}{\ensuremath{\PGt_\mathrm{h}}\xspace}$ pair, one of the top quarks decays leptonically and the other hadronically. Events are required to have a lepton passing the tight criteria and two $\PGt_\mathrm{h}$ passing the medium working point of the discriminator against jets. The lepton is required to have ∣η∣ < 2.1, as this cut is applied at trigger level, and pT > 30 ( > 25) $\,\text{Ge\hspace{-.08em}V}$ if it is an electron (muon). The two $\PGt_\mathrm{h}$ are required to have opposite charges, and the leading one is required to have pT > 30 $\,\text{Ge\hspace{-.08em}V}$. Events with less than three jets or not passing the $\PQb$-tag multiplicity criteria are rejected, as are events with more than one lepton passing the tight criteria.
This category targets the same decay chain of $\PQt{}\PAQt{}\PH$ events as the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ category. However, in this case, the leptons are required to have opposite signs. The $\PGt_\mathrm{h}$ must pass the very tight working point of the discriminator against jets. Events with an opposite-sign same-flavor pair of loose leptons with $|{\ensuremath{m_{\Pl\Pl}}\xspace}-m_{\PZ}|<10$ $\,\text{Ge\hspace{-.08em}V}$ are vetoed. If the two selected leptons have the same flavor, the event is required to have ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ LD > 30 $\,\text{Ge\hspace{-.08em}V}$, at least three jets, and to pass the $\PQb$-tag multiplicity criteria.
This region aims for $\PQt{}\PAQt{}\PH$ events in which the Higgs boson decays into a $\PGt\PGt$ pair and the two top quarks decay leptonically. One of the two $\PGt$ leptons decays leptonically and the other one into a $\PGt_\mathrm{h}$. At least three tight leptons are required in this category, with pT greater than 20, 15 and 10 $\,\text{Ge\hspace{-.08em}V}$, respectively, and one $\PGt_\mathrm{h}$ passing the very loose working point of the discriminator. The sum of the charges of the three leptons and the $\PGt_\mathrm{h}$ is required to be zero. Events with an opposite-sign same-flavor loose lepton pair with an invariant mass within 10 $\,\text{Ge\hspace{-.08em}V}$ of the $\PZ$ boson mass are rejected. Events are required to have at least two jets and to pass the usual $\PQb$-tag multiplicity cut.
Events in this region aim for the leptonic decay of the two top quarks in $\PQt{}\PAQt{}\PH$ events and the decay of the Higgs boson into two $\PGt_\mathrm{h}$. Events are required to have two tight leptons, the leading one with pT > 25 $\,\text{Ge\hspace{-.08em}V}$ and the subleading one with ${\ensuremath{p_{\mathrm{T}}}\xspace}> 10 (15) {\ensuremath{\,\text{Ge\hspace{-.08em}V}}\xspace}$ if it is a muon (electron). The two $\PGt_\mathrm{h}$ are required to pass the medium working point of the discriminator. The sum of the charges of the four objects is required to be zero. The usual $\PZ$ candidate veto with loose leptons is applied. Events with less than four jets are required to have ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$ LD > 30 $\,\text{Ge\hspace{-.08em}V}$, or 45 $\,\text{Ge\hspace{-.08em}V}$ if the event contains an opposite-sign same-flavor fakeable lepton pair.
This region aims to be sensitive to $\PQt{}\PAQt{}\PH$ events in which the Higgs boson decays into $\PW\PW$, with both $\PW$ bosons decaying leptonically, and the two top quarks decay leptonically as well. Besides the requirement of a fourth lepton, whose pT is required to be greater than 10 $\,\text{Ge\hspace{-.08em}V}$, and the fact that no requirement is made on the charges of the leptons, the selection is identical to that of the $3 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ $\PQt{}\PAQt{}\PH$ region.
This region looks for $\PQt{}\PAQt{}\PH$ events in which the Higgs boson decays into a pair of $\PGt_\mathrm{h}$ and the two top quarks decay hadronically. Events in this region are required to contain two $\PGt_\mathrm{h}$ passing the loose working point of the discriminator and no leptons passing the tight criteria. The two $\PGt_\mathrm{h}$ must have opposite charges and be within the trigger acceptance (∣η∣ < 2.1). The two $\PGt_\mathrm{h}$ must have pT > 40 $\,\text{Ge\hspace{-.08em}V}$, and events are required to contain at least 4 jets, to reduce the irreducible background contribution. The usual $\PQb$-tag multiplicity criteria are applied to events in this region.
This category aims for $\PQt{}\PAQt{}\PH$ events in which the Higgs boson decays into a $\PGt_\mathrm{h}$ and a lepton, the latter produced in the decay of the other $\PGt$. The top quarks decay hadronically in this case. Events in this category must contain an electron or a muon passing the tight identification criteria and a fakeable $\PGt_\mathrm{h}$ passing the medium working point of the discriminator. The lepton is required to have ∣η∣ < 2.1, to be within the acceptance of the trigger, and to have a pT greater than 30 (25) $\,\text{Ge\hspace{-.08em}V}$ for electrons (muons). The $\PGt_\mathrm{h}$ must also have pT > 30 $\,\text{Ge\hspace{-.08em}V}$. Events must contain at least four jets and pass the usual $\PQb$-tag multiplicity cut.
Control regions are defined to validate the modeling in simulation of some of the main irreducible backgrounds. The 3$\Pl$ control region is used to validate $\PW{}\PZ$ and $\PQt{}\PAQt{}\PZ$ events, while the 4$\Pl$ control region is built to check the agreement of $\PZ{}\PZ$ and $\PQt{}\PAQt{}\PZ$ events.
These regions are built by relaxing and inverting cuts of the $3 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ $\PQt{}\PAQt{}\PH$ and the $4 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ regions, respectively.
The three-lepton control region is defined by inverting the veto on an opposite-sign same-flavor pair of loose leptons with an invariant mass close to the $\PZ$ boson mass of the $3 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ $\PQt{}\PAQt{}\PH$ region. Additionally, no explicit cut on the jet or $\PQb$ jet multiplicity is applied. Instead, events are classified in bins of jet and $\PQb$ jet multiplicity. All of the classes require at least one reconstructed jet. Classes with low (high) jet multiplicity are expected to be more enriched in $\PW{}\PZ$ ($\PQt{}\PAQt{}\PZ$) events.
The four-lepton control region is defined by inverting the $\PZ$ veto of the $4 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region. Again, no explicit cuts on the jet multiplicity are applied. In this case, events are classified depending on the presence of a second opposite-sign same-flavor pair and on the jet and $\PQb$ jet multiplicity.
The categories in the 3$\Pl$ and 4$\Pl$ regions are outlined in table [tab:tthcrcats]. The event yield in these categories is used in the signal extraction fit to constrain the irreducible backgrounds.
In this section, the estimation of the various backgrounds and signal is described.
Reducible backgrounds arise from different effects: non-prompt leptons and misidentified $\PGt_\mathrm{h}$, lepton charge mismeasurement, and photon conversions. For the first two, data-driven approaches are followed, while simulations are used for photon conversions.
Thanks to the high purity of the lepton selection in this analysis, the probability for a non-prompt lepton or a jet to be identified as a prompt lepton is very small. However, since the cross section of many backgrounds, such as $\PQt{}\PAQt$ production, is very large compared to that of the signal, a significant contribution from processes containing non-prompt leptons may be present in the selection. In the most sensitive regions this contribution is dominated by $\PQt{}\PAQt$ events. For instance, semileptonic $\PQt{}\PAQt$ events may enter the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ category if a lepton with the same sign as the prompt lepton is produced in one of the $\PQb$ jets. Dileptonic $\PQt{}\PAQt$ events may also contribute to the $3 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region if a non-prompt lepton is produced.
To estimate this background, a data-driven approach is followed, based on a sideband of the signal region. This sideband is referred to as the application region (AR), and its definition is identical to that of the signal regions, except that the lepton identification criteria are loosened to require leptons to pass only the fakeable criteria. The AR is enriched in events with non-prompt leptons, which can be used to estimate the contribution of this background in the signal region with a suitable transfer factor. The contribution of processes with prompt leptons in the AR is subtracted before the transfer factor is applied.
The main advantage of this approach with respect to simulation-based approaches is that no assumptions are needed on the quality of the modeling of the event variables. For instance, signal regions in this analysis require a higher jet multiplicity than what is expected in $\PQt{}\PAQt$ events. The generation of these additional jets is usually difficult to model and sensitive to the tune of the underlying event parameters .
Additionally, since the transfer factor between the SR and the AR is determined on data, the approach does not make strong assumptions on the modeling of the rate of non-prompt fakeable leptons passing the tight identification. This factor is referred to as the fake-rate. However, since the method is validated using simulations, and since similar approaches could be followed to calibrate the fake-rate of simulated events, this does not represent a significant gain with respect to simulation-based approaches.
The method has two steps: the measurement of the fake-rate in a region enriched in non-prompt leptons, the measurement region (MR), and the application of the fake-rate to events in the AR to obtain an estimate of the non-prompt background in the SR. The whole procedure is then validated with a closure test in simulated events, which is used to assess any potential bias due to variations of the fake-rate between the MR and the AR.
The fake-rate is measured as a function of the cone-${\ensuremath{p_{\mathrm{T}}}\xspace}$ and in two $|\eta|$ regions, separately for electrons and muons. For that purpose, a sample of multijet data events enriched in non-prompt leptons is used. This sample is constructed by requiring the presence of exactly one electron or muon passing the fakeable selection criteria. Additionally, at least one recoiling jet, separated from the lepton by $\Delta R > 0.7$, is required in the event.
While this sample is dominated by multijet events, the non-prompt contribution to the signal regions comes mostly from $\PQt{}\PAQt$ events. A different flavor composition and momentum spectrum of the partons yielding non-prompt leptons is therefore expected. To account for this, the definition of the fakeable selection criteria was carefully tuned to obtain a similar flavor composition in the two samples. The difference in the ${\ensuremath{p_{\mathrm{T}}}\xspace}$ distribution of the partons between the samples is handled by parameterizing the fake-rate as a function of the cone-${\ensuremath{p_{\mathrm{T}}}\xspace}$, instead of the reconstructed ${\ensuremath{p_{\mathrm{T}}}\xspace}$.
Events are collected using several single lepton triggers with different ${\ensuremath{p_{\mathrm{T}}}\xspace}$ thresholds and no isolation criteria. These triggers have been designed specifically for the purpose of measuring this fake-rate and are prescaled to keep their acceptance rate under control. Electron triggers require the presence of an additional jet; for muon triggers this requirement is only applied at low ${\ensuremath{p_{\mathrm{T}}}\xspace}$ thresholds.
A cut is applied on the reconstructed ${\ensuremath{p_{\mathrm{T}}}\xspace}$ of the lepton to be above the turn-on of the trigger. Additionally, these events are only used to measure the fake-rate above a certain cone-${\ensuremath{p_{\mathrm{T}}}\xspace}$ value. Otherwise, since the cone-${\ensuremath{p_{\mathrm{T}}}\xspace}$ is a correction based on the isolation of the lepton, selecting leptons with cone-${\ensuremath{p_{\mathrm{T}}}\xspace}$ close to the ${\ensuremath{p_{\mathrm{T}}}\xspace}$ threshold would enrich the selection in isolated leptons, introducing a bias in the measurement. Thanks to the isolation requirement of less than $0.4\,{\ensuremath{p_{\mathrm{T}}}\xspace}$, selecting leptons with cone-${\ensuremath{p_{\mathrm{T}}}\xspace}$ above twice the ${\ensuremath{p_{\mathrm{T}}}\xspace}$ threshold effectively removes this bias. An additional bias is introduced by the trigger selection, since the electron triggers impose quality cuts on the electron. This is avoided by applying the trigger emulation cuts described in section [sub:ttHlightleptonsel].
Passing and failing categories are defined depending on whether the lepton passes the tight selection or not. The fake-rate $f$ can then be computed from the number of non-prompt leptons in each of the categories, $N_{\mathrm{pass}}$ and $N_{\mathrm{fail}}$:
$$f = \frac{N_{\mathrm{pass}}}{N_{\mathrm{pass}}+N_{\mathrm{fail}}}.$$
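Once the prompt contamination has been subtracted, this computation reduces to a simple ratio in each bin of cone-${\ensuremath{p_{\mathrm{T}}}\xspace}$ and $|\eta|$. The following sketch illustrates this; the counts and the binning are hypothetical, not taken from the measurement itself.

\begin{verbatim}
# Illustrative sketch of the binned fake-rate computation; counts and
# binning are hypothetical placeholders.
import numpy as np

# Non-prompt lepton counts after prompt subtraction, binned in
# cone-pT (rows) and |eta| (columns).
n_pass = np.array([[120.0, 80.0], [60.0, 35.0], [25.0, 12.0]])
n_fail = np.array([[900.0, 700.0], [400.0, 300.0], [150.0, 110.0]])

f = n_pass / (n_pass + n_fail)                    # fake-rate per bin
# Binomial uncertainty, a reasonable approximation when the subtracted
# prompt contribution is small.
f_err = np.sqrt(f * (1.0 - f) / (n_pass + n_fail))
\end{verbatim}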
Other processes, such as $\PW$+jets, DY+jets, or $\PQt{}\PAQt$ events, may contribute to this region with prompt leptons. This contribution is higher in the numerator of the measurement. For the measurement of the fake-rate, this contribution must be subtracted. To discriminate between non-prompt and prompt leptons, the ${\ensuremath{\mathrm{M_T}^{\mathrm{fix}}}\xspace}$ variable is defined as
$${\ensuremath{\mathrm{M_T}^{\mathrm{fix}}}\xspace}= \sqrt{2{\ensuremath{p_{\mathrm{T}}}\xspace}^{\mathrm{fix}} {\ensuremath{{\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}}\xspace}(1-\cos\Delta\phi)},$$
where $\Delta\phi$ is the azimuthal angle between the lepton momentum and the ${\vec p}_{\mathrm{T}}^{\kern1pt\text{miss}}$, and ${\ensuremath{p_{\mathrm{T}}}\xspace}^{\mathrm{fix}}$ is a constant value set to 35 $\,\text{Ge\hspace{-.08em}V}$. This is a variation of the $\mathrm{M_T}$ variable, in which the ${\ensuremath{p_{\mathrm{T}}}\xspace}$ of the lepton is replaced by a constant 35 $\,\text{Ge\hspace{-.08em}V}$ to reduce its correlation with the lepton ${\ensuremath{p_{\mathrm{T}}}\xspace}$, since the fake-rate may have a ${\ensuremath{p_{\mathrm{T}}}\xspace}$ dependence.
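A direct transcription of this definition into code is straightforward; the sketch below assumes inputs in $\,\text{Ge\hspace{-.08em}V}$ and radians.

\begin{verbatim}
import math

def mt_fix(phi_lep, pt_miss, phi_miss, pt_fix=35.0):
    """M_T^fix as defined above: the lepton pT is replaced by the
    constant pT^fix = 35 GeV; dphi is the lepton-MET azimuthal angle."""
    dphi = phi_lep - phi_miss
    return math.sqrt(2.0 * pt_fix * pt_miss * (1.0 - math.cos(dphi)))
\end{verbatim}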
The contribution from non-prompt leptons in the two categories is determined by performing a fit to the ${\ensuremath{\mathrm{M_T}^{\mathrm{fix}}}\xspace}$ shape simultaneously in the passing and failing categories. The fitted distributions are templates obtained from multijet simulations and from simulations of the processes yielding prompt leptons. The normalizations of the templates are free parameters of the fit. Additionally, nuisance parameters that modify the shape of these templates, considering a linear deformation and a stretching of the template, are included in the fit. Statistical uncertainties of the templates are also treated as nuisance parameters.
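A minimal sketch of such a simultaneous template fit is given below; the shape nuisance parameters are omitted for brevity, and the template shapes and data counts are purely illustrative.

\begin{verbatim}
# Hedged sketch of the simultaneous pass/fail template fit; only the four
# template normalizations are floated here.
import numpy as np
from scipy.optimize import minimize

data    = {"pass": np.array([40.0, 25.0, 10.0]),
           "fail": np.array([300.0, 120.0, 30.0])}
t_nonpr = {"pass": np.array([0.5, 0.3, 0.2]),   # normalized MT-fix shapes
           "fail": np.array([0.6, 0.3, 0.1])}
t_prmpt = {"pass": np.array([0.2, 0.3, 0.5]),
           "fail": np.array([0.2, 0.3, 0.5])}

def nll(theta):
    # theta = (non-prompt yield pass, prompt yield pass,
    #          non-prompt yield fail, prompt yield fail)
    total = 0.0
    for i, cat in enumerate(("pass", "fail")):
        mu = theta[2 * i] * t_nonpr[cat] + theta[2 * i + 1] * t_prmpt[cat]
        mu = np.clip(mu, 1e-9, None)
        total += np.sum(mu - data[cat] * np.log(mu))  # Poisson NLL, constants dropped
    return total

res = minimize(nll, x0=[60.0, 15.0, 350.0, 100.0], method="Nelder-Mead")
n_pass_np, n_fail_np = res.x[0], res.x[2]
fake_rate = n_pass_np / (n_pass_np + n_fail_np)
\end{verbatim}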
Alternative fits are considered to estimate the systematic uncertainties associated with this subtraction of prompt processes. One alternative is obtained by fitting a single overall normalization of the prompt-lepton contribution and applying it to all bins of the measurement. Another alternative is to divide each bin of the measurement into two regions with high and low values of ${\ensuremath{\mathrm{M_T}^{\mathrm{fix}}}\xspace}$. This gives two independent measurements of the fake-rate with no subtraction of the prompt-lepton component. Taking the ratio of events with prompt leptons at high and low values of ${\ensuremath{\mathrm{M_T}^{\mathrm{fix}}}\xspace}$ from simulations, the fake-rate measurements in the two categories can be unfolded to obtain an unbiased estimate of the fake-rate.
Once the fake-rates, $f$, have been obtained, a transfer factor is built to extrapolate from the AR to the SR. This transfer factor depends on the lepton multiplicity used in the signal region definition. In a region with two leptons, the expected number of non-prompt events with two passing leptons, $N_{\text{pp}}$, can be written as
$$N_{\text{pp}} = \sum_{\text{fp}} \frac{f_1}{1-f_1} + \sum_{\text{pf}} \frac{f_2}{1-f_2} - \sum_{\text{ff}} \frac{f_1}{1-f_1}\frac{f_2}{1-f_2},$$
where $f_i$ is the fake-rate evaluated for the $i$-th lepton, sorted by cone-${\ensuremath{p_{\mathrm{T}}}\xspace}$. The sums $\sum_{\text{fp},\text{pf},\text{ff}}$ run over the events in the application region which have the corresponding combination of leptons passing and failing the tight selection. For instance, $\sum_{\text{fp}}$ runs over all the events in the application region for which the leading lepton has failed the tight criteria and the subleading one has passed them.
For regions with three leptons, the estimation in the signal region can be written as
$$\begin{split}
N_{\text{ppp}} &= \sum_{\text{fpp}} \frac{f_1}{1-f_1} + \sum_{\text{pfp}} \frac{f_2}{1-f_2} + \sum_{\text{ppf}} \frac{f_3}{1-f_3} \\ & -\sum_{\text{ffp}} \frac{f_1}{1-f_1}\frac{f_2}{1-f_2} -\sum_{\text{fpf}} \frac{f_1}{1-f_1}\frac{f_3}{1-f_3} -\sum_{\text{pff}} \frac{f_2}{1-f_2}\frac{f_3}{1-f_3} \\ & + \sum_{\text{fff}} \frac{f_1}{1-f_1}\frac{f_2}{1-f_2}\frac{f_3}{1-f_3}.
\end{split}$$
A similar formula can be written for signal regions with four leptons. The contribution from prompt processes to the AR is estimated using simulations, as is done in the SR, and it is subtracted before applying the transfer factor to data.
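The pattern underlying these formulae is that each lepton failing the tight selection contributes a factor $f/(1-f)$, with an overall sign alternating with the number of failing leptons. A sketch of the resulting per-event weight, valid for any lepton multiplicity, could look as follows; the event loop and the prompt-lepton subtraction are omitted.

\begin{verbatim}
# Sketch of the AR -> SR transfer weight for any lepton multiplicity.
def nonprompt_weight(fail_flags, fake_rates):
    """fail_flags[i]: True if lepton i fails the tight selection;
    fake_rates[i]: fake-rate at lepton i's cone-pT and |eta|."""
    n_fail = sum(fail_flags)
    if n_fail == 0:
        return 0.0          # all-tight events belong to the SR itself
    w = 1.0
    for failed, f in zip(fail_flags, fake_rates):
        if failed:
            w *= f / (1.0 - f)
    return (-1.0) ** (n_fail + 1) * w

# Two-lepton checks against the formula above:
print(nonprompt_weight([True, False], [0.10, 0.08]))  # "fp" term: +0.111
print(nonprompt_weight([True, True],  [0.10, 0.08]))  # "ff" term: -0.0097
\end{verbatim}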
In order to check the validity of the method, a closure test is performed in simulated events passing the signal region requirements. This test ensures that no bias is introduced by the different flavor composition of the multijet data sample and the signal region, which is enriched in $\PQt{}\PAQt$ events. It also accounts for kinematic dependencies of the fake-rate on the signal extraction variables that are not covered by its parameterization.
The closure test is performed separately for electrons and muons in events in the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region, which is the one expected to have the most significant contribution from events producing non-prompt leptons. The event yield predicted by simulations in the signal region is compared to the estimate of the method applied to simulated events in the AR. The latter is done with fake-rates computed in $\PQt{}\PAQt$ and multijet simulated samples. The comparison between these two fake-rates allows the dependence of the fake-rate on the sample to be evaluated, while the comparison between the yield in the signal region and the estimate obtained with the multijet fake-rate measures the total bias of the method.
The closure is shown in figure [fig:ttHclosureelemuon] for muons and electrons separately. A very good closure is observed for muons, while a 30-40% non-closure is observed for electrons. This non-closure is due to the different flavor composition of the multijet and $\PQt{}\PAQt$ samples, and is used to correct the fake-rate. It is also taken into account as a source of systematic uncertainty.
The background from gluon and quark jets misreconstructed as $\PGt_\mathrm{h}$ is estimated using a method similar to that for non-prompt leptons, with the only difference that the fake-rate is measured in a data sample enriched in $\PQt{}\PAQt$ events. The selection for this sample requires an opposite-sign $\Pe\PGm$ pair with ${\ensuremath{p_{\mathrm{T}}}\xspace}> 25$ and 15 $\,\text{Ge\hspace{-.08em}V}$ for the leading and subleading lepton, respectively. At least two jets are required, and the event must pass the usual $\PQb$ tag multiplicity requirement. Events with $m_{\Pe\PGm} < 12$ $\,\text{Ge\hspace{-.08em}V}$ are rejected to suppress contributions from low-mass resonances. Finally, events are required to have a $\PGt_\mathrm{h}$ passing the fakeable selection. The efficiency of this fakeable $\PGt_\mathrm{h}$ to pass the different tight selection criteria is evaluated after subtracting the contribution from genuine $\PGt_\mathrm{h}$, as estimated from simulations.
This fake-rate is applied using the same formulae as for non-prompt leptons, and a closure test is also performed to estimate the systematic uncertainty of the method.
The charge sign of electrons and muons may be measured incorrectly, in which case background events may enter the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ and $2 \Pl ss +1 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ regions. This rate is negligible for muons, once the cut on the uncertainty associated with the ${\ensuremath{p_{\mathrm{T}}}\xspace}$ measurement is applied. However, electrons may radiate significantly, and the sign of their curvature is measured incorrectly with a higher probability.
This mismeasurement rate is determined by measuring the number of reconstructed same-sign and opposite-sign $\PZ\to{\ensuremath{\Pe^\pm}\xspace}{\ensuremath{\Pe^\mp}\xspace}$ events in data. The number of $\PZ\to{\ensuremath{\Pe^\pm}\xspace}{\ensuremath{\Pe^\mp}\xspace}$ events is obtained by performing a fit to the $\PZ$ peak. The shapes for the $m_{\Pl\Pl}$ distribution are templates obtained from simulated events.
The measurement is performed in bins of ${\ensuremath{p_{\mathrm{T}}}\xspace}$ and $\eta$, with a total of 6 lepton categories. The number of same-sign events can be written as $N_{\mathrm{SS}} = (q_1 + q_2)N_{\mathrm{OS}}$, where $q_i$ is the charge flip rate of the $i$-th lepton and $q \ll 1$ has been assumed. 21 such equations can be written, corresponding to the possible unordered pairs of categories the two leptons can be in. These 21 equations for the 6 unknowns, the flip rates in each category, form an overdetermined system, which is solved using regularized least-squares techniques.
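Schematically, the extraction amounts to a linear least-squares problem. The sketch below builds the 21 equations for illustrative input ratios and solves for the 6 flip rates; an explicit regularization term could be added if needed.

\begin{verbatim}
# Hedged sketch of the charge-flip extraction: each unordered pair (i, j)
# of lepton categories gives one equation N_SS/N_OS ~= q_i + q_j,
# i.e. 21 equations for 6 unknowns. Input ratios are illustrative.
import itertools
import numpy as np

n_cat = 6
pairs = list(itertools.combinations_with_replacement(range(n_cat), 2))  # 21
ratios = np.random.default_rng(0).uniform(1e-5, 1e-3, len(pairs))

A = np.zeros((len(pairs), n_cat))
for row, (i, j) in enumerate(pairs):
    A[row, i] += 1.0      # for i == j this correctly gives a factor 2
    A[row, j] += 1.0

q, *_ = np.linalg.lstsq(A, ratios, rcond=None)   # flip rate per category
\end{verbatim}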
A further contribution arises from processes in which an electron is produced in the conversion of a photon. In this analysis, this contribution is dominated by ${{\PQt{}\PAQt}\xspace}\PGg$ events, and it is estimated using simulations.
Signals and irreducible backgrounds are estimated using simulations. The normalization of processes for which it is possible to build a control region enriched in them is left freely floating in the analysis. This way, no a-priori assumption is made on their cross section, and assumptions are made only on their acceptance and on the shapes of the distributions, which are expected to be less affected by higher-order corrections. However, we quote postfit scale factors with respect to the cross section calculations described below, so they can be compared with other predictions. In this analysis, the normalizations of the $\PQt{}\PAQt{}\PZ$, $\PQt{}\PAQt{}\PW$, $\PW{}\PZ$ and $\PZ{}\PZ$ processes are determined in situ in the signal extraction fit.
$\PQt{}\PAQt{}\PH$ signal events are simulated at NLO accuracy with MadGraph5\_aMC@NLO, and are normalized to the NLO calculation with electroweak corrections presented in . For the interpretation, this cross section is scaled proportionally to $\kappa_{\PQt}^2$, as expected at LO precision in perturbation theory.
$\PQt{}\PH{}$ events are simulated at LO accuracy using MadGraph. These samples contain information to reweight events according to different scenarios, with different values of $\kappa_{\PQt}$ and $\kappa_{\PW}$. This allows the effect of variations of these couplings on the event kinematics to be taken into account. The four-flavor scheme is employed for the t-channel production, as it is expected to better describe the additional $\PQb$ quark from the gluon splitting, which would be treated within the parton shower in the five-flavor scheme. For the $\PQt{}\PW$-channel, the five-flavor scheme is used, since it is the only way to technically reweight events for different $\kappa_{\PQt}$ and $\kappa_{\PW}$ values. The cross section used to normalize these samples is obtained separately for each choice of $\kappa_{\PQt}$ and $\kappa_{\PW}$ at NLO accuracy for the t-channel and for the $\PQt{}\PW$-channel, using the DR2 scheme to handle the interference with $\PQt{}\PAQt{}\PH$ events .
$\PQt{}\PAQt{}\PW$ events are simulated at NLO accuracy with MadGraph5\_aMC@NLO, and are normalized to the calculations with electroweak corrections presented in . These predictions do not incorporate the $\mathcal{O}(\alpha_{\mathrm{s}}\alpha^3)$ corrections mentioned in section [sec:ttHirreducible], but are nevertheless used to compare with previous measurements. $\PQt{}\PAQt{}\PZ$ events are simulated in the phase space with a dilepton invariant mass higher than 1 $\,\text{Ge\hspace{-.08em}V}$ at NLO accuracy with MadGraph5\_aMC@NLO, and are normalized according to the cross section with electroweak corrections shown in .
The cross sections used for these processes are summarized in table [tab:ttHirreduciblexsections].
$\PW{}\PZ$ and ${\PQq}{\PQq}\to{\ensuremath{\PZ{}\PZ}\xspace}$ events are simulated at NLO using powheg v2 . They are normalized to higher-order calculations, although their normalization is measured in data. $\Pg\Pg\to\PZ\PZ$ events are simulated using the mcfm generator at LO and normalized to a higher-order calculation . $\PQt{}\PZ{}\PQq$ events are relevant in the categories with three leptons and in the control region for $\PQt{}\PAQt{}\PZ$. They are simulated at NLO accuracy with the MadGraph5\_aMC@NLO package, and normalized to the cross section provided by that generator.
Contributions from rarer processes like $\PQt{}\PAQt{}\PQt{}\PAQt$, $\PQt\PW\PZ$, $\PQt\PQt\PW\PH$ or triboson processes are also taken into account with dedicated simulations.
As mentioned in the previous sections, two complementary approaches are followed to extract the signals in this measurement. The main analysis uses events in all the categories and extracts the signal by fitting the event yields in several subcategories. These subcategories are built using multivariate discriminators constructed from kinematic variables. The control analysis uses only events in the most sensitive categories. In the categories without $\PGt_\mathrm{h}$, simple kinematic distributions are fitted, while in the $2 \Pl ss +1 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ category, the score of a matrix element discriminant is used.
The first approach provides higher sensitivity to the process, while the second establishes a baseline analysis and allows an easier interpretation of the results.
Two different approaches are followed in the main analysis. The $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$, $3 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ and $2 \Pl ss +1 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ categories are the most sensitive ones in the analysis, as they are quite pure in signal and have a reasonably large number of expected signal events. Additionally, their selections are constructed to accept a significant number of $\PQt{}\PH{}$ events. For these regions, the classification is made using multiclass artificial neural networks (ANNs). These discriminators produce several output variables that can be interpreted as the probability of an event to have been produced by a given process. This allows selections to be built that are pure in the $\PQt{}\PAQt{}\PH$ and $\PQt{}\PH{}$ signals separately, but also in the main backgrounds. This is done to obtain a higher signal sensitivity, and also to constrain these backgrounds from data. Additionally, the large number of expected events in the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ and $3 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ categories allows further classification in other variables, such as lepton flavor or $\PQb$ tag jet multiplicity, to enhance the sensitivity.
Some of the remaining signal regions, $0 \Pl +2 {\ensuremath{\PGt_\mathrm{h}}\xspace}$, $1 \Pl +1 {\ensuremath{\PGt_\mathrm{h}}\xspace}$, $1 \Pl +2 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ and $2 \Pl os +1 {\ensuremath{\PGt_\mathrm{h}}\xspace}$, have a significant contribution from non-prompt leptons or misidentified $\PGt_\mathrm{h}$, one order of magnitude higher than that of the signal. The remaining ones, $2 \Pl +2 {\ensuremath{\PGt_\mathrm{h}}\xspace}$, $3 \Pl +1 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ and $4 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$, correspond to final states with four leptons or $\PGt_\mathrm{h}$, receiving a negligible contribution from $\PQt{}\PH{}$ and having an overall small event yield. In these cases, little sensitivity gain is expected from the multiclass approach, so BDTs are used to construct categories enriched in signal.
The discriminators used in this chapter are inputs provided by the IHEP and NICPB groups, which I have used to perform the signal extraction. For this reason, only the input variables and the relevant details of their training are covered.
Some of these discriminators use others as input variables: the hadronic top tagger and the Higgs jet tagger, which improve the sensitivity.
This tagger finds triplets of jets that correspond to the hadronic decay of a top quark. It is a BDT trained on simulated events, using jet triplets coming from the same top quark decay as signal and other combinations, i.e. jet triplets that have not been produced in the same top quark decay, as background. The discriminator uses 16 input variables, including the scores of the $\PQb$-tagging discriminators and the quark-gluon likelihood of the jets, as well as angular distances and invariant masses of several combinations of jets. All the possible jet triplets of an event are considered, and the triplets with the highest and second highest scores are kept.
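The way the tagger is applied can be sketched as follows; score_triplet stands in for the trained discriminator, which is not reproduced here.

\begin{verbatim}
# Hypothetical sketch of the application of the hadronic top tagger:
# every 3-jet combination is scored and the two best triplets are kept.
import itertools

def best_top_triplets(jets, score_triplet, n_keep=2):
    """jets: list of jet objects; score_triplet: trained discriminator
    (a stand-in here); returns the n_keep highest-scoring triplets."""
    scored = [(score_triplet(t), t) for t in itertools.combinations(jets, 3)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n_keep]
\end{verbatim}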
This tagger identifies, in the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region, the jet that has been produced in the Higgs boson decay. It uses 5 input variables, including the kinematic properties of the jet, its $\PQb$ tagging score and quark-gluon likelihood, as well as the angular distances between the jet and the leptons. All jets in the event except the triplet identified by the hadronic top tagger are evaluated, and the one with the highest score is kept.
The ANNs are trained with samples of simulated events, different from those used in the signal extraction. As input variables, a combination of low-level variables, namely the three-momenta of the objects, and high-level variables (invariant masses and angular distances between the various objects) is used. This combined strategy significantly improves the discrimination with respect to BDT-based approaches.
The ANN in the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region features 36 input variables and returns four scores, corresponding to the probability of an event to be a $\PQt{}\PAQt{}\PH$, $\PQt{}\PH{}$, $\PQt{}\PAQt{}\PW$ or other background event. 41 and 37 variables are used in the $2 \Pl ss +1 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ and $3 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ regions, respectively. Here, three output nodes are included, for $\PQt{}\PAQt{}\PH$, $\PQt{}\PH{}$ and background processes. The list of input variables of these ANNs is shown in table [tab:tthinputvariablesdnn].
Events in each category are divided into subcategories based on the output scores of the ANN. One subcategory per output node is built, containing the events for which the score of that particular node is higher than those of the other nodes. This defines four subcategories in the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region and three in the other two. Events in the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region are further categorized depending on the flavor of the selected leptons into the $\Pe\Pe$, $\Pe\PGm$ and $\PGm\PGm$ channels. Events in the $3 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region are also classified according to the presence or not of two medium $\PQb$-tagged jets and according to the lepton flavor into the $\Pe\Pe\Pe$, $\Pe\Pe\PGm$, $\Pe\PGm\PGm$ and $\PGm\PGm\PGm$ channels. Not all of these subcategorizations are applied to every node. Finally, each subcategory is divided into bins of the maximum ANN output score to maximize the sensitivity.
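The node-based subcategorization can be illustrated with a small sketch; the node names, scores, and binning below are hypothetical.

\begin{verbatim}
# Sketch of the multiclass categorization in the 2lss + 0 tauh region:
# an event enters the subcategory of its highest-scoring ANN node, and is
# then histogrammed in bins of that maximum score.
import numpy as np

nodes  = ["ttH", "tH", "ttW", "background"]   # output nodes (2lss region)
scores = np.array([0.45, 0.10, 0.30, 0.15])   # illustrative ANN outputs

category = nodes[int(np.argmax(scores))]      # -> "ttH"
bin_edges = np.linspace(0.25, 1.0, 6)         # hypothetical score binning
score_bin = int(np.digitize(scores.max(), bin_edges))
\end{verbatim}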
Events in the 3$\Pl$ and 4$\Pl$ control regions are also added to the fit, in both the main and the control analyses, to determine the normalizations of the $\PQt{}\PAQt{}\PZ$ and diboson processes in situ.
The remaining signal regions are either dominated by one background source or have too small an event yield to profit from the multiclass approach. For these regions, BDTs are used, and events are classified in several bins of the BDT score.
The input variables used in these BDTs are summarized in table [tab:ttHinputvariablesbdt].
The control analysis is designed to be a robust cross-check of the main analysis, serving to consolidate the measurement. Only the categories most sensitive to the $\PQt{}\PAQt{}\PH$ signal are used: the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ $\PQt{}\PAQt{}\PH$ region, the $3 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ $\PQt{}\PAQt{}\PH$ region, the $2 \Pl ss +1 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ $\PQt{}\PAQt{}\PH$ region and the $4 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region.
Events in the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region are categorized in three classes, depending on the jet multiplicity. Three subcategories are built depending on the flavor of the two leading leptons, and two additional subcategories are built depending on the charge of the leptons. In total, 18 categories are built. Then, events are further classified in 9 bins of $m_{\Pl\Pl}$.
Events in the $3 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region are categorized in two classes, depending on the jet multiplicity. Two subcategories are built, depending on the sign of the sum of the lepton charges. Then, events are further classified in 5 bins of $m_{3\Pl}$.
Events in the $2 \Pl ss +1 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region are classified based on a matrix element method (MEM). Two categories are considered: events with three jets, in which one of the jets expected for the signal has been lost, and events with at least four jets. Several hypotheses are considered, corresponding to the decay chains expected for events coming from the different processes in this region. These hypotheses are shown in table [tab:ttHmemhypotheses]. The MEM weight for an event with reconstructed-level observables, $\textbf{y}$, can then be computed as
$$w_\Omega(\textbf{y})=\frac{1}{\sigma_\Omega} \sum\limits_{p} \int{d\textbf{x} dx_a dx_b \frac{f_i(x_a,Q)f_j(x_b,Q)}{x_a x_b s} \delta(x_a P_a + x_b P_b - \sum p_k) |\mathcal{M}_\Omega(\textbf{x})|^2 W(\textbf{y}|\textbf{x})},
\label{eq:MEM_weight}$$
where $\sigma_\Omega$ is a normalization term fixed by requiring $\int d\textbf{y}\, w_\Omega(\textbf{y}) = 1$. $\sum\limits_{p}$ runs over the possible partons in the initial state, and $x_a$, $x_b$ correspond to the Bjorken fractions. $f_i(x, Q)$ are the parton distribution functions associated with a parton of flavor $i$, Bjorken fraction $x$ and a given scale, $Q$, of the process, and $\delta(x_a P_a + x_b P_b - \sum p_k)$ imposes energy and momentum conservation. $\mathcal{M}_\Omega(\textbf{x})$ is the matrix element for a given process hypothesis, computed at LO using MadGraph. $W(\textbf{y}|\textbf{x})$ is a transfer function from parton-level quantities $\textbf{x}$ to detector-level quantities, determined from simulations.
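The weights of the individual hypotheses are then combined into a single discriminant. A generic likelihood-ratio form, with relative weights $\kappa_k$ for the background hypotheses (the specific combination used in the analysis is not reproduced here), is

$$\mathrm{LR}(\textbf{y}) = \frac{w_{\text{sig}}(\textbf{y})}{w_{\text{sig}}(\textbf{y}) + \sum_k \kappa_k\, w_k(\textbf{y})},$$

where $w_{\text{sig}}$ is the weight of the signal hypothesis and $w_k$ are the weights of the background hypotheses.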
Events in the $4 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region are classified in three categories based on the invariant mass of the four leptons, $m_{4\Pl}$.
As in the other analyses presented in this thesis, systematic uncertainties arise from imprecisely measured or simulated effects. These uncertainties may affect the predicted number of events, and they are parameterized as nuisance parameters in the signal extraction fit.
Uncertainties arising from the auxiliary measurements used to validate and correct the simulations, and from imprecisions in the data-driven estimation methods, are referred to as experimental uncertainties. Modeling uncertainties arise from the limited accuracy of the theory predictions, dominated by the missing higher-order corrections in the cross section calculations and the simulation models, and from the uncertainties on the PDFs.
The main categories of the analysis make use of leptons to characterize the topology of the signal. Because of that, and since ad-hoc lepton identification criteria are used, their precise characterization is very important. The efficiency measurement for electrons and muons and the estimation of the associated uncertainties are outlined in chapter [chap:muon], leading to an uncertainty between 1% and 2% in the bulk of the distribution, depending on the lepton flavor and its ${\ensuremath{p_{\mathrm{T}}}\xspace}$ and $\eta$. Efficiencies for the loose and tight selections are measured separately. The uncertainty in the former is estimated by considering different shape hypotheses for the $m_{\Pl\Pl}$ distribution of the DY+jets signal and the background, as the phase space extrapolation is expected to be negligible in this case. The uncertainty in the tight selection is expected to be dominated by this extrapolation, and is assigned from the measurement described in chapter [chap:muon]. In total, four nuisance parameters are used to take these two effects into account, separately for electrons and muons.
The efficiency of the trigger selection is measured with the orthogonal trigger method, using triggers for events with large momentum imbalance. These efficiencies are measured as a function of the lepton multiplicity for the corresponding set of triggers and as a function of the relevant kinematic variables. Additional dependencies of the trigger efficiency are taken into account as systematic uncertainties.
The $\PGt_\mathrm{h}$ efficiency and energy scale are measured in $\PZ\to{\ensuremath{\PGt_\mathrm{h}}\xspace}{\ensuremath{\PGt_\mathrm{h}}\xspace}$ events with uncertainties of 5% and 1.2%, respectively .
Several uncertainties are considered in the estimation of the non-prompt lepton background, which is performed with the data-driven method described in section [subsub:ttHnonpromptleptons]. The measurement of the fake-rate is performed with three different methods, taking the central value as the nominal one and the envelope as the uncertainty. Three nuisances are used, separately for electrons and muons: an overall variation of the fake-rate across all ${\ensuremath{p_{\mathrm{T}}}\xspace}$ and $\eta$ bins, and two variations that affect the ${\ensuremath{p_{\mathrm{T}}}\xspace}$ and $\eta$ dependency of the fake-rate but keep the total number of estimated non-prompt events in the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region constant. An additional uncertainty is assigned for any non-closure observed in simulations, which is of the order of 5-10% for muons and 25-40% for electrons, depending on the data-taking era.
The uncertainty associated with the $\PGt_\mathrm{h}$ fake-rate is also propagated to the analysis.
Uncertainties associated with the calibration of the jet energy scale and resolution are taken into account. Eleven variations of the jet energy scale are considered, accounting for the several uncertainty sources associated with its measurement. Six separate components are used to account for the different $\eta$ and ${\ensuremath{p_{\mathrm{T}}}\xspace}$ regimes in the detector. An additional uncertainty is used to account for any potential miscalibration of jets due to the HEM issue. Finally, the uncertainty associated with the unclustered energy is propagated to the computation of ${\ensuremath{p_{\mathrm{T}}}\xspace}^\text{miss}$.
The observed transparency loss in the ECAL endcaps leads to a prefiring of the electron and jet triggers, resulting in an overall trigger inefficiency, as shown in chapter [chap:cms]. This inefficiency is taken into account by applying corrections determined on data to the simulated events. The size of these corrections is varied by its corresponding uncertainty as an additional systematic uncertainty.
The $\PQb$ tagging efficiency and the mistagging rate of light-flavor and $\PQc$ jets in simulations are calibrated using scale factors derived from data. These scale factors also correct the shape of the DeepJet discriminant distribution. Several components are taken into account as separate uncertainties, accounting for the systematic and statistical uncertainties of these measurements. Additionally, the effect of the jet energy scale uncertainty is also propagated to the $\PQb$ tagging calibration.
The uncertainty in the luminosity amounts to 2.5% for data recorded in 2016 and 2018 and to 2.3% for data recorded in 2017. Correlations among the different measurements are taken into account in the signal extraction.
The effect of the missing higher orders in perturbation theory in the cross section calculations and the simulation models is taken into account by considering independent and correlated variations of the renormalization and factorization scales of each process. For each process, the effect on the total cross section and the effect on the acceptance and on the shapes of the distributions are taken into account as separate uncertainties. The former is taken into account in the fit even if the normalization of the process is determined from the fit. Additionally, the uncertainty of the PDFs is propagated to the calculation of the cross section of all the simulated processes.
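As an illustration, the acceptance effect of the scale variations is typically evaluated as the envelope of the six standard $(\mu_\mathrm{R}, \mu_\mathrm{F})$ weight variations; the yields below are hypothetical.

\begin{verbatim}
# Hedged sketch of the scale-uncertainty envelope: yields for the six
# (muR, muF) variations (the unphysical opposite variations are excluded).
nominal = 100.0
variations = {"(2,1)": 104.0, "(0.5,1)": 96.5, "(1,2)": 103.0,
              "(1,0.5)": 97.0, "(2,2)": 107.0, "(0.5,0.5)": 94.0}

up   = max(variations.values()) / nominal - 1.0   # +7% in this example
down = min(variations.values()) / nominal - 1.0   # -6% in this example
\end{verbatim}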
A mismodeling of the $\PZ\PZ$ and $\PW\PZ$ processes is observed at high jet multiplicities, so a 30% uncertainty is assigned to the simulated events of these processes.
Other low cross section processes are assigned a 50% uncertainty to account for the limited knowledge of their cross sections and for their extrapolation to the phase space of the analysis. Conversions are assigned a 30-50% uncertainty, based on the agreement observed in control regions by previous analyses .
Uncertainties in the calculations of the Higgs boson branching fractions are also propagated to the analysis .
The statistical uncertainties of the simulations and of the fake-rate application region are also taken into account, using the Barlow-Beeston method .
In this section, the observed number of events in each subcategory of the analysis is compared to the expectations from the signal and background models. The signal extraction fit is also described and its results are shown.
The signal is extracted by performing a maximum likelihood fit to the observed yields in all the categories. Besides the systematic uncertainties, which are parameterized as nuisance parameters of the fit, the signal strengths $\mu = \sigma/\sigma_{\mathrm{SM}}$ of the $\PQt{}\PAQt{}\PH$, $\PQt{}\PH{}$, $\PQt{}\PAQt{}\PW$, $\PQt{}\PAQt{}\PZ$, $\PW{}\PZ$ and $\PZ{}\PZ$ processes are free unconstrained parameters of the fit. The rate of the $\PQt{}\PAQt{}\PW{}\PW$ background is constrained to scale by the same factor as $\PQt{}\PAQt{}\PW$ production.
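Schematically, and only as an illustration of its standard binned form, the likelihood maximized in the fit can be written as

$$\mathcal{L} = \prod_{i\,\in\,\text{bins}} \frac{e^{-\nu_i}\,\nu_i^{n_i}}{n_i!} \prod_{j} p_j(\tilde{\theta}_j|\theta_j), \qquad \nu_i = \sum_{p} \mu_p\, s_{p,i}(\vec{\theta}) + b_i(\vec{\theta}),$$

where $n_i$ are the observed yields, $\mu_p$ the free strength parameters, $s_{p,i}$ and $b_i$ the predicted yields of the floating and remaining processes, and $p_j$ the constraint terms of the nuisance parameters $\theta_j$.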
The numbers of observed events in the signal regions without $\PGt_\mathrm{h}$ are shown in figures [fig:mainanalysisresults0tau1] and [fig:mainanalysisresults0tau2], after the fit to all signal and control regions of the main analysis is performed. A good agreement is seen between the observation and the statistical model across all the signal regions.
Assuming that the distributions in the discriminating observables for the ${\ensuremath{\PQt{}\PH{}}\xspace}$ and ${\ensuremath{\PQt{}\PAQt{}\PH}\xspace}$ signals conform to their SM expectation, the production rate for the ${\ensuremath{\PQt{}\PAQt{}\PH}\xspace}$ signal is measured to be ${\ensuremath{\mu}}_{{\ensuremath{\PQt{}\PAQt{}\PH}\xspace}} = 0.98 \pm 0.19 {\ensuremath{\text{~(stat.)}}}^{+0.17}_{-0.13}{\ensuremath{\text{~(syst.+lumi.)}}}$ times the SM expectation, equivalent to a ${\ensuremath{\PQt{}\PAQt{}\PH}\xspace}$ production cross section of $496.8 \pm 96.3{\ensuremath{\text{~(stat.)}}}\pm 38.2{\ensuremath{\text{~(syst.+lumi.)}}}{\ensuremath{{\ensuremath{\text{\,fb}}\xspace}}}$, and that of the ${\ensuremath{\PQt{}\PH{}}\xspace}$ signal is measured to be ${\ensuremath{\mu}}_{{\ensuremath{\PQt{}\PH{}}\xspace}} = 6.9 \pm 2.7{\ensuremath{\text{~(stat.)}}}\pm 3.0{\ensuremath{\text{~(syst.+lumi.)}}}$ times the SM expectation for this production rate, equivalent to a cross section for ${\ensuremath{\PQt{}\PH{}}\xspace}$ production of $0.6 \pm 0.2{\ensuremath{\text{~(stat.)}}}\pm 0.3{\ensuremath{\text{~(syst.+lumi.)}}}{\ensuremath{\text{\,pb}}\xspace}$. The production rates for the main unconstrained backgrounds are measured to be $\theta_{{\ensuremath{\PQt{}\PAQt{}\PZ}\xspace}} = 1.03 \pm 0.14{\ensuremath{\text{~(stat.+syst.+lumi.)}}}$ and $\theta_{{\ensuremath{\PQt{}\PAQt{}\PW}\xspace}} = 1.49 \pm 0.21{\ensuremath{\text{~(stat.+syst.+lumi.)}}}$ times their SM expectation. The rate of $\PQt{}\PAQt{}\PW$ is above the expectation, due to an excess of events in the $\PQt{}\PAQt{}\PW$ category of the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ region, which is reasonably pure in $\PQt{}\PAQt{}\PW$ events. This behavior follows a similar trend to that seen in other $\PQt{}\PAQt{}\PH$ and $\PQt{}\PAQt{}\PW$ analyses by both the CMS and ATLAS collaborations , and could be at least partially explained by the missing higher-order corrections in the reference $\PQt{}\PAQt{}\PW$ cross section, as described in section [sec:ttHirreducible].
Assuming a $\PQt{}\PH{}$ production rate equivalent to that of the SM, these results correspond to an observed (expected) significance of the ${\ensuremath{\PQt{}\PAQt{}\PH}\xspace}$ signal being different from zero amounting to 5.3 (5.4) standard deviations. Assuming a $\PQt{}\PAQt{}\PH$ production rate equivalent to that of the SM, the observed and expected significances for $\PQt{}\PH{}$ production are 1.8 and 0.3 standard deviations, respectively.
Figure [fig:ttHresultsmainscans] also shows the confidence regions at the 68% and 95% confidence levels for the simultaneous measurements of $\PQt{}\PAQt{}\PW$ and $\PQt{}\PAQt{}\PH$, $\PQt{}\PAQt{}\PZ$ and $\PQt{}\PAQt{}\PH$, and $\PQt{}\PH{}$ and $\PQt{}\PAQt{}\PH$. The rest of the parameters, including the parameters of interest not plotted, are profiled. These regions show the level of agreement with the expectation as well as the level of correlation of the measured parameters of interest.
Signal extraction is performed in the control analysis using a similar likelihood fit, keeping the $\PQt{}\PAQt{}\PH$, $\PQt{}\PAQt{}\PZ$ and $\PQt{}\PAQt{}\PW$ production rates as free parameters of the fit. The $\PQt{}\PH{}$ rate is fixed to the SM prediction within its associated uncertainties. The observed events in each signal region, together with the best fit of the statistical model, are shown in figures [fig:controlanalysisresultssrs1]-[fig:controlanalysistaus]. The $\PQt{}\PAQt{}\PH$ production rate is measured to be ${\ensuremath{\mu}}_{{\ensuremath{\PQt{}\PAQt{}\PH}\xspace}} = 0.7 \pm 0.3{\ensuremath{\text{~(stat.+syst.+lumi.)}}}$, ${\ensuremath{\mu}}_{{\ensuremath{\PQt{}\PAQt{}\PH}\xspace}} = 1.4 \pm 0.5{\ensuremath{\text{~(stat.+syst.+lumi.)}}}$, ${\ensuremath{\mu}}_{{\ensuremath{\PQt{}\PAQt{}\PH}\xspace}} = 0.8 \pm 0.4{\ensuremath{\text{~(stat.+syst.+lumi.)}}}$, and ${\ensuremath{\mu}}_{{\ensuremath{\PQt{}\PAQt{}\PH}\xspace}} = 1.5 \pm 1.5{\ensuremath{\text{~(stat.+syst.+lumi.)}}}$ times the SM expectation in the $2 \Pl ss+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$, $3 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$, $2 \Pl ss +1 {\ensuremath{\PGt_\mathrm{h}}\xspace}$, and $4 \Pl+0 {\ensuremath{\PGt_\mathrm{h}}\xspace}$ channels, respectively. A signal strength of ${\ensuremath{\mu}}_{{\ensuremath{\PQt{}\PAQt{}\PH}\xspace}} = 1.03 \pm 0.22{\ensuremath{\text{~(stat.)}}}\pm 0.19{\ensuremath{\text{~(syst.+lumi.)}}}$ is obtained from the simultaneous ML fit of all four channels. The corresponding observed (expected) significance of the ${\ensuremath{\PQt{}\PAQt{}\PH}\xspace}$ signal in this analysis amounts to 4.2 (4.1) standard deviations.
Figure [fig:ttHcontrolresultslikelihoodscan] shows the confidence regions corresponding to the 68% and 95% confidence levels for the simultaneous extraction of the $\PQt{}\PAQt{}\PH$ and $\PQt{}\PAQt{}\PZ$, $\PQt{}\PAQt{}\PZ$ and $\PQt{}\PAQt{}\PW$, and $\PQt{}\PAQt{}\PH$ and $\PQt{}\PAQt{}\PW$ strengths.
The results of the main analysis are interpreted in terms of the coupling modifiers $\kappa_{\PQt}$ and $\kappa_{\PW}$. The former affects both $\PQt{}\PAQt{}\PH$ and $\PQt{}\PH{}$ production, while $\kappa_{\PW}$ only affects the production of $\PQt{}\PH{}$. The effects of $\kappa_{\PQt}$ and $\kappa_{\PW}$ on the event kinematics of $\PQt{}\PH{}$ are already implemented in the samples, as described in section [sub:ttHsignalbackgroundmcs]. No kinematic dependence on $\kappa_{\PQt}$ is present at leading order in $\PQt{}\PAQt{}\PH$ events, so only the scaling of the cross section as $\kappa_{\PQt}^2$ is considered. The effects of the variations of $\kappa_{\PW}$ and $\kappa_{\PQt}$ on the branching ratios of the Higgs boson are also considered.
In order to perform this interpretation, the statistical model used for the signal extraction of the main analysis is used, fixing the signal strengths to the values predicted for each coupling hypothesis. The likelihood function is evaluated, profiling all the nuisances and the strength modifiers of the irreducible backgrounds, for several hypotheses of $\kappa_{\PQt}$ and $\kappa_{\PW}$. The point with the maximum likelihood is taken as the best fit for $\kappa_{\PQt}$ and $\kappa_{\PW}$.
Figure [fig:ttHresultsktkv] shows the $-2\Delta\log\mathcal{L}$ value, where $\mathcal{L}$ is the profiled likelihood, for several values of $\kappa_{\PQt}$, assuming a $\kappa_{\PW}$ equivalent to that of the standard model. Both the expected and observed scans are shown in the plot. Limits on $\kappa_{\PQt}$ can be set by finding the crossings of $-2\Delta\log\mathcal{L}$ with the quantiles of a $\chi^2_1$ distribution. The current results constrain $\kappa_{\PQt}$ to be within either of the two intervals $-1.1 < \kappa_{\PQt} < -0.7$ and $0.9 < \kappa_{\PQt} < 1.1$ at 95% confidence level. Figure [fig:ttHresultsktkv] also shows confidence regions at 68% and 95% confidence level, obtained by finding the crossings of $-2\Delta\log\mathcal{L}$ with the quantiles of a $\chi^2_2$ distribution.
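The interval extraction from such a scan can be sketched as follows; the scan shape here is purely illustrative.

\begin{verbatim}
# Sketch of extracting 95% CL interval(s) from a -2*Delta(log L) scan:
# find the crossings with the chi^2 (1 dof) quantile, ~3.84.
import numpy as np
from scipy.stats import chi2

kt  = np.linspace(-2.0, 2.0, 401)     # scanned kappa_t values
q   = (kt**2 - 1.0)**2                # illustrative scan shape
thr = chi2.ppf(0.95, df=1)            # 3.84 for 95% CL, 1 dof

inside = q < thr
edges = np.flatnonzero(np.diff(inside.astype(int)))  # boundary indices
intervals = kt[edges].reshape(-1, 2)  # allowed interval(s) in kappa_t
\end{verbatim}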
A measurement of $\PQt{}\PAQt{}\PH$ and $\PQt{}\PH{}$ production in the multilepton channel has been performed. This measurement aims to be sensitive to the $\PW\PW$, $\PZ\PZ$ and $\PGt\PGt$ decay modes of the Higgs boson, and to both hadronic and leptonic decays of the top quarks. Signal regions are built targeting different combinations of these decays, and selections are applied according to the expected topology. Events are categorized among these signal regions, and are further categorized according to the event kinematics to select regions enriched in signal and background events. Two approaches are followed to do so: one using machine-learning techniques, and another using simple kinematic variables and a matrix element discriminant.
The first approach measures a $\PQt{}\PAQt{}\PH$ production rate of ${\ensuremath{\mu}}_{{\ensuremath{\PQt{}\PAQt{}\PH}\xspace}} = 0.98 \pm 0.19 {\ensuremath{\text{~(stat.)}}}^{+0.17}_{-0.13}{\ensuremath{\text{~(syst.+lumi.)}}}$ times the SM expectation. The production rate for the ${\ensuremath{\PQt{}\PH{}}\xspace}$ signal is measured to be ${\ensuremath{\mu}}_{{\ensuremath{\PQt{}\PH{}}\xspace}} = 6.9 \pm 2.7{\ensuremath{\text{~(stat.)}}}\pm 3.0{\ensuremath{\text{~(syst.+lumi.)}}}$ times the SM expectation. These results correspond to an observed (expected) significance of 5.3 (5.4) standard deviations for $\PQt{}\PAQt{}\PH$ and 1.8 (0.3) for $\PQt{}\PH{}$ production, in both cases fixing the other signal to its SM value.
The second approach measures a $\PQt{}\PAQt{}\PH$ signal strength of ${\ensuremath{\mu}}_{{\ensuremath{\PQt{}\PAQt{}\PH}\xspace}} = 1.03 \pm 0.22{\ensuremath{\text{~(stat.)}}}\pm 0.19{\ensuremath{\text{~(syst.+lumi.)}}}$. The observed (expected) significance for the $\PQt{}\PAQt{}\PH$ signal to be different from zero in this approach amounts to 4.2 (4.1) standard deviations.
These results allow the observation of $\PQt{}\PAQt{}\PH$ production in the multilepton channel to be claimed.
Additionally, the results have been interpreted in the context of modifiers of the Higgs boson couplings to the top quark and the $\PW$ boson. Assuming the rest of the couplings to be those of the SM, $\kappa_{\PQt}$ is constrained to be in either of the two intervals $-1.1 < \kappa_{\PQt} < -0.7$ and $0.9 < \kappa_{\PQt} < 1.1$ at 95% confidence level.
In this thesis, searches and measurements have been presented in $\Pp\Pp$ collisions recorded by the CMS detector during the Run 2 of the LHC. These studies have been performed in events with two or more high-${\ensuremath{p_{\mathrm{T}}}\xspace}$ leptons in the final state, a signature that is indicative of the production of heavy particles.
A measurement of the $\PQt{}\PW$ production cross-section in ${\ensuremath{\Pe^\pm}\xspace}{\ensuremath{\PGm^\mp}\xspace}$ events has been presented. These measurements are complementary to measurements of $\PQt{}\PAQt$ production, for which $\PQt{}\PW$ production is the main background. Events are categorized according to their jet and $\PQb$-tagged jet multiplicity, and multivariate discriminators are used to obtain selections pure in signal. The cross-section is measured to be
$$\sigma_{{\ensuremath{\PQt{}\PW}\xspace}} = 63.1 \pm 1.8 \mathrm{(stat)} \pm 6.3 \mathrm{(syst)} \pm 2.1 \mathrm{(lumi)}\ \mathrm{pb},$$
consistent with the prediction of 71.7 ± 1.8 (scale) ± 3.4 (PDF) pb, obtained at approximate NNLO accuracy .
A differential measurement has also been performed in the same channel, in events with exactly one jet that is $\PQb$-tagged. Despite the large contribution of $\PQt{}\PAQt$ events in the signal region, the differential cross-section for several variables of interest is measured to be consistent with the predictions, with a precision between 20% and 100%, depending on the bin of the distribution. These measurements open the way for more precise measurements, as well as measurements of the interference between the $\PQt{}\PAQt$ and $\PQt{}\PW$ processes.
Searches for the production of SUSY particles have also been performed in events with two opposite-sign same-flavor leptons and missing transverse momentum. This signature is sensitive to different models of gluino, squark, electroweak spartner and slepton production. Different signal regions were defined for each of the models, and the main backgrounds are estimated using data-driven techniques. These measurements exclude several ranges of sparticle masses, not excluded by previous searches, under the assumption of simplified models.
Finally, a measurement of $\PQt{}\PAQt{}\PH$ and $\PQt{}\PH{}$ production has been performed. This measurement is performed in the multilepton channel, targeting the $\PW\PW$, $\PZ\PZ$ and $\PGt\PGt$ decay modes of the Higgs boson. Events are categorized based on the lepton and $\PGt_\mathrm{h}$ multiplicity, and further selection criteria on the jet and $\PQb$-tagged jet multiplicity are applied. Signals are then extracted in two complementary ways. In the main analysis, events are categorized based on the output of multivariate classifiers and other kinematic properties of the event. A control analysis is performed by categorizing events based on their kinematic properties and a matrix element discriminant. Special care is taken in the discrimination of the background due to non-prompt leptons, and in the characterization of the selected leptons. Similarly, state-of-the-art models are used to estimate the irreducible backgrounds.
These measurements provide a $\PQt{}\PAQt{}\PH$ signal strength of ${\ensuremath{\mu}}_{{\ensuremath{\PQt{}\PAQt{}\PH}\xspace}} = 0.98 \pm 0.19 {\ensuremath{\text{~(stat.)}}}^{+0.17}_{-0.13}{\ensuremath{\text{~(syst.+lumi.)}}}$ times the SM expectation. The cross-section is measured with an uncertainty better than 15%. The analysis has an observed (expected) sensitivity of 5.3 (5.4) standard deviations in the main analysis, and of 4.2 (4.1) standard deviations in the control analysis. The observed and expected significances of the main analysis for $\PQt{}\PH{}$ production are 1.8 and 0.3 standard deviations, respectively.
The main analysis is also interpreted in the context of anomalous couplings of the Higgs boson to the top quark and the $\PW$ boson, which affect both the $\PQt{}\PAQt{}\PH$ and $\PQt{}\PH{}$ cross-sections. Kinematic effects of these anomalous couplings on $\PQt{}\PH{}$ production are taken into account in the analysis. The analysis constrains the $\kappa_{\PQt}$ coupling modifier to be in either of the two intervals $-1.1 < \kappa_{\PQt} < -0.7$ and $0.9 < \kappa_{\PQt} < 1.1$ at 95% confidence level, under the assumption that the Higgs boson coupling to the $\PW$ boson is that predicted by the SM. Confidence regions are also set in the $(\kappa_{\PQt},\kappa_{\PW})$ plane of the coupling modifiers.
In summary, this thesis has exploited the potential of final states with two or more leptons to perform both precision measurements and searches for BSM physics. All the results obtained are consistent with the SM expectations. Under this hypothesis, the observation of $\PQt{}\PAQt{}\PH$ production in the multilepton channel has also been achieved.
At least until the NLO bb4l calculation was released.