
Publications

Theses defended at CMAP are available via the following link:
Discover the CMAP theses

Listed below, by year, are the publications recorded in the HAL open archive.

2020

  • AMF: Aggregated Mondrian Forests for Online Learning
    • Mourtada Jaouad
    • Gaïffas Stéphane
    • Scornet Erwan
    , 2020. Random Forests (RF) is one of the algorithms of choice in many supervised learning applications, be it classification or regression. The appeal of such tree-ensemble methods comes from a combination of several characteristics: a remarkable accuracy in a variety of tasks, a small number of parameters to tune, robustness with respect to feature scaling, a reasonable computational cost for training and prediction, and their suitability in high-dimensional settings. The most commonly used RF variants, however, are "offline" algorithms, which require the availability of the whole dataset at once. In this paper, we introduce AMF, an online random forest algorithm based on Mondrian Forests. Using a variant of the Context Tree Weighting algorithm, we show that it is possible to efficiently perform an exact aggregation over all prunings of the trees; in particular, this enables us to obtain a truly online parameter-free algorithm which is competitive with the optimal pruning of the Mondrian tree, and thus adaptive to the unknown regularity of the regression function. Numerical experiments show that AMF is competitive with respect to several strong baselines on a large number of datasets for multi-class classification. (10.48550/arXiv.1906.10529) (An illustrative sketch of the online evaluation protocol follows this entry.)
    DOI : 10.48550/arXiv.1906.10529
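
The following minimal sketch shows the prequential ("test-then-train") protocol under which online learners such as AMF are typically evaluated. AMF itself is not part of scikit-learn, so an online linear baseline (SGDClassifier) stands in for it here; the dataset and all parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

X, y = load_iris(return_X_y=True)
order = np.random.default_rng(0).permutation(len(X))

model = SGDClassifier(loss="log_loss", random_state=0)  # stand-in for AMF
classes = np.unique(y)
n_correct = 0
for step, i in enumerate(order):
    x_i, y_i = X[i:i + 1], y[i:i + 1]
    if step > 0:                       # test on the incoming point first...
        n_correct += int(model.predict(x_i)[0] == y_i[0])
    model.partial_fit(x_i, y_i, classes=classes)  # ...then train on it
print(f"prequential accuracy: {n_correct / (len(order) - 1):.3f}")
```

Any learner exposing the same partial_fit/predict interface, such as an online forest, can be dropped into this loop unchanged.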
  • Stochastic approximations for financial risk computations
    • Bourgey Florian
    , 2020. In this thesis, we investigate several stochastic approximation methods for both the computation of financial risk measures and the pricing of derivatives. As closed-form expressions are scarcely available for such quantities, the need for fast, efficient, and reliable analytic approximation formulas is of primal importance to financial institutions. We aim at giving a broad overview of such approximation methods and we focus on three distinct approaches. In the first part, we study some Multilevel Monte Carlo approximation methods and apply them to two practical problems: the estimation of quantities involving nested expectations (such as the initial margin) along with the discretization of integrals arising in rough forward variance models for the pricing of VIX derivatives. For both cases, we analyze the properties of the corresponding asymptotically-optimal multilevel estimators and numerically demonstrate the superiority of multilevel methods compared to a standard Monte Carlo. In the second part, motivated by the numerous examples arising in credit risk modeling, we propose a general framework for meta-modeling large sums of weighted Bernoulli random variables which are conditionally independent given a common factor X. Our generic approach is based on a Polynomial Chaos Expansion on the common factor together with some Gaussian approximation. L2 error estimates are given when the factor X is associated with classical orthogonal polynomials. Finally, in the last part of this dissertation, we deal with small-time asymptotics and provide asymptotic expansions for both American implied volatility and American option prices in local volatility models. We also investigate weak approximations for the VIX index in rough forward variance models expressed in terms of lognormal proxies and derive expansion results for VIX derivatives with explicit coefficients. (An illustrative sketch of a multilevel estimator follows this entry.)
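
As a concrete illustration of the multilevel idea discussed in the first part of this thesis, here is a minimal Multilevel Monte Carlo estimator of E[X_T] for a geometric Brownian motion under Euler discretization. The model, payoff and sample allocation are illustrative assumptions, not the thesis's applications (nested expectations, VIX derivatives).

```python
import numpy as np

rng = np.random.default_rng(0)

def mlmc_level(l, n_samples, T=1.0, x0=1.0, mu=0.05, sigma=0.2, M=2):
    """One MLMC correction term E[P_l - P_{l-1}] for the payoff P = X_T,
    coupling fine and coarse Euler paths through shared Brownian increments."""
    n_fine = M ** l
    dt = T / n_fine
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_samples, n_fine))
    xf = np.full(n_samples, x0)                      # fine path
    for k in range(n_fine):
        xf = xf + mu * xf * dt + sigma * xf * dW[:, k]
    if l == 0:
        return xf.mean()
    xc = np.full(n_samples, x0)                      # coarse path reuses the
    dWc = dW.reshape(n_samples, n_fine // M, M).sum(axis=2)  # same increments
    for k in range(n_fine // M):
        xc = xc + mu * xc * (M * dt) + sigma * xc * dWc[:, k]
    return (xf - xc).mean()

# telescoping sum: E[P_L] = E[P_0] + sum over l of E[P_l - P_{l-1}]
estimate = sum(mlmc_level(l, n_samples=10_000 // (2 ** l) + 100) for l in range(5))
print(estimate)  # close to x0 * exp(mu * T) ≈ 1.051
```

The key point is that fine and coarse paths on each level share the same Brownian increments, so the correction terms have small variance and the coarse levels can carry most of the sampling effort.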
  • Geometry-Aware Hamiltonian Variational Auto-Encoder
    • Chadebec Clément
    • Mantoux Clément
    • Allassonnière Stéphanie
    , 2020. Variational auto-encoders (VAEs) have proven to be a well-suited tool for performing dimensionality reduction by extracting latent variables lying in a potentially much smaller dimensional space than the data. Their ability to capture meaningful information from the data can be easily apprehended when considering their capability to generate new realistic samples or perform potentially meaningful interpolations in a much smaller space. However, such generative models may perform poorly when trained on small data sets, which are abundant in many real-life fields such as medicine. This may, among others, come from the lack of structure of the latent space, the geometry of which is often under-considered. We thus propose in this paper to see the latent space as a Riemannian manifold endowed with a parametrized metric learned at the same time as the encoder and decoder networks. This metric is then used in what we call the Riemannian Hamiltonian VAE, which extends the Hamiltonian VAE introduced by Caterini et al. (2018) to better exploit the underlying geometry of the latent space. We argue that such latent space modelling provides useful information about its underlying structure, leading to far more meaningful interpolations, more realistic data generation and more reliable clustering.
  • An asymptotic preserving well-balanced scheme for the isothermal fluid equations in low-temperature plasma applications
    • Alvarez-Laguna Alejandro
    • Pichard Teddy
    • Magin Thierry
    • Chabert Pascal
    • Bourdon Anne
    • Massot Marc
    Journal of Computational Physics, Elsevier, 2020, 419, pp.109634. We present a novel numerical scheme for the efficient and accurate solution of the isothermal two-fluid (electron + ion) equations coupled to Poisson's equation for low-temperature plasmas. The model considers electrons and ions as separate fluids, comprising the electron inertia and charge separation. The discretization of this system with standard explicit schemes is constrained by very restrictive time steps and cell sizes related to the resolution of the Debye length, electron plasma frequency, and electron sound waves. Both sheath and electron inertia are fundamental to fully explain the physics in low-pressure and low-temperature plasmas. However, most of the phenomena of interest for fluid models occur at speeds much slower than the electron thermal speed and are quasi-neutral, except in small charged regions. A numerical method that is able to simulate efficiently and accurately all these regimes is a challenge due to the multiscale character of the problem. In this work, we present a scheme based on the Lagrange-projection operator splitting that preserves the asymptotic regime where the plasma is quasi-neutral with massless electrons. As a result, the quasi-neutral regime is treated without the need for an implicit solver or the resolution of the Debye length and electron plasma frequency. Additionally, the scheme proves to accurately represent the dynamics of the electrons both at low speeds and when the electron speed is comparable to the thermal speed. In addition, a well-balanced treatment of the ion source terms is proposed in order to tackle problems where the ion temperature is very low compared to the electron temperature. The scheme significantly improves the accuracy both in the quasi-neutral limit and in the presence of plasma sheaths when the Debye length is resolved. In order to assess the performance of the scheme in low-temperature plasma conditions, we propose two specifically designed test-cases: a quasi-neutral two-stream periodic perturbation with analytical solution and a low-temperature discharge that includes sheaths. The numerical strategy, its accuracy, and computational efficiency are assessed on these two discriminating configurations. (10.1016/j.jcp.2020.109634)
    DOI : 10.1016/j.jcp.2020.109634
  • Optimal feedback control in first-passage resetting
    • Lunz Davin
    Journal of Physics A: Mathematical and Theoretical, IOP Publishing, 2020, 53 (44), pp.44LT01. We study a diffusion process on a finite interval under the influence of a controllable drift where the particle resets to the left-hand side upon reaching the right-hand side. Assigning a pay-off for being nearer the right-hand side, but a penalty for reaching it, induces an inherent trade-off. We seek the drift feedback that maximises the long-term reward. By reducing the problem to a constrained variational problem, we deduce that, for a wide class of problems, the optimal feedback law is remarkably straightforward: below a threshold state exert maximum drift; beyond the threshold exert minimum drift. (10.1088/1751-8121/abbc7c) (An illustrative simulation sketch follows this entry.)
    DOI : 10.1088/1751-8121/abbc7c
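
The bang-bang feedback law described in this abstract is easy to explore numerically. The sketch below simulates the controlled diffusion on [0, 1] with first-passage resetting under a threshold policy; the running pay-off, the penalty, the reflection at the left end and all parameter values are assumptions made for illustration, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def long_run_reward(threshold, mu_max=1.0, mu_min=-1.0, sigma=0.5,
                    penalty=1.0, dt=1e-3, T=100.0):
    """Euler-Maruyama simulation of a bang-bang feedback: maximal drift
    below the threshold, minimal drift above it."""
    n = int(T / dt)
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n)
    x, reward = 0.0, 0.0
    for k in range(n):
        x += (mu_max if x < threshold else mu_min) * dt + noise[k]
        x = max(x, 0.0)            # reflect at the left-hand side (assumption)
        reward += x * dt           # pay-off for being near the right-hand side
        if x >= 1.0:               # first passage: reset and pay the penalty
            x = 0.0
            reward -= penalty
    return reward / T

for s in (0.5, 0.7, 0.9):          # crude scan for the best switching point
    print(f"threshold {s}: average reward {long_run_reward(s):+.3f}")
```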
  • Mean–field moral hazard for optimal energy demand response management
    • Élie Romuald
    • Hubert Emma
    • Mastrolia Thibaut
    • Possamaï Dylan
    Mathematical Finance, Wiley, 2020, 31 (1), pp.399-473. We study the problem of demand response contracts in electricity markets by quantifying the impact of considering a continuum of consumers with mean-field interaction, whose consumption is impacted by a common noise. We formulate the problem as a Principal-Agent problem with moral hazard in which the Principal (she) is an electricity producer who observes continuously the consumption of a continuum of risk-averse consumers, and designs contracts in order to reduce her production costs. More precisely, the producer incentivizes each consumer to reduce the average and the volatility of his consumption in different usages, without observing the efforts he makes. We prove that the producer can benefit from considering the continuum of consumers by indexing contracts on the consumption of one Agent and aggregate consumption statistics from the distribution of the entire population of consumers. In the case of linear energy valuation, we provide a closed-form expression for this new type of optimal contract, which maximizes the utility of the producer. In most cases, we show that this new type of contract allows the Principal to choose the risks she wants to bear, and to reduce the problem at hand to an uncorrelated one. (10.1111/mafi.12291)
    DOI : 10.1111/mafi.12291
  • Exploring and Comparing Unsupervised Clustering Algorithms
    • Lavielle Marc
    • Waggoner Philip
    Journal of Open Research Software, Ubiquity Press, 2020, 8. (10.5334/jors.269)
    DOI : 10.5334/jors.269
  • Comparison between multifluid and Particle-In-Cell (PIC) simulations of instabilities and boundary layers in low-temperature low-pressure magnetized plasmas for electric propulsion applications
    • Reboul Louis
    • Alvarez-Laguna Alejandro
    • Magin Thierry E.
    • Chabert Pascal
    • Bourdon Anne
    • Massot Marc
    , 2020.
  • Vlasov limit for a chain of oscillators with Kac potentials
    • Fernandez Montero Alejandro
    , 2020. We consider a chain of anharmonic oscillators with local mean field interaction and long-range stochastic exchanges of velocity. Even if the particles are not exchangeable, we prove the convergence of the empirical measure associated with this chain to a solution of a Vlasov-type equation. We then use this convergence to prove energy diffusion for a restricted class of anharmonic potentials.
  • High order homogenization of the Poisson equation in a perforated periodic domain
    • Feppon Florian
    , 2020. We derive high order homogenized models for the Poisson problem in a cubic domain periodically perforated with holes where Dirichlet boundary conditions are applied. These models unify the three possible kinds of limit problems derived in the literature for various asymptotic regimes (namely the "unchanged" Poisson equation, the Poisson problem with a strange reaction term, and the zeroth order limit problem) of the ratio η ≡ a_ε/ε between the size a_ε of the holes and the size ε of the periodic cell. The derivation relies on algebraic manipulations of formal two-scale power series in terms of ε and, more particularly, on the existence of a "criminal" ansatz, which allows one to reconstruct the oscillating solution u_ε as a linear combination of the derivatives of its formal average u_ε^* weighted by suitable corrector tensors. The formal average is itself the solution of a formal, infinite order homogenized equation. Classically, truncating the infinite order homogenized equation yields in general an ill-posed model. Inspired by a variational method introduced in [52, 23], we derive, for any K ∈ N, well-posed corrected homogenized equations of order 2K + 2 which yield approximations of the original solutions with an error of order O(ε^(2K+4)) in the L^2 norm. Finally, we find asymptotics of all homogenized tensors in the low volume fraction regime η → 0 and in dimension d ≥ 3. This allows us to show that our higher order effective equations converge coefficient-wise to any of the three classical homogenized regimes of the literature, which arise when η is respectively lower than, equivalent to, or greater than the critical scaling η_crit ∼ ε^(2/(d−2)). (A schematic rendering of the ansatz follows this entry.)
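
Schematically, the "criminal" ansatz named in this abstract takes the following form (tensor index and summation conventions suppressed; this is a paraphrase of the abstract, not the paper's exact notation):

```latex
\[
  u_\varepsilon(x) \;=\; \sum_{k \ge 0} \varepsilon^{k}\,
  N_{k}\!\Big(\frac{x}{\varepsilon}\Big) \cdot \nabla^{k} u_\varepsilon^{*}(x),
\]
% u_\varepsilon^* solves a formal, infinite-order homogenized equation;
% truncating at order 2K+2, after the variational correction, yields an
% O(\varepsilon^{2K+4}) error in the L^2 norm.
```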
  • A time domain factorization method for obstacles with impedance boundary conditions
    • Haddar Houssem
    • Liu Xiaoli
    Inverse Problems, IOP Publishing, 2020, 36 (10), pp.105011. We consider the inverse acoustic time domain scattering problem for absorbing scatterers modeled by impedance boundary conditions. We present and analyze a factorization method for reconstructing the obstacle boundary from far field measurements. The analysis is based on using the Laplace transform and proving the coercivity of the solution operator in suitable weighted spaces in time. This leads us to consider a modified far field operator parameterized by the imaginary part of the Laplace variable, which is different from but can be arbitrarily close to the original far field operator. As a proof of concept, we also provide some preliminary numerical examples to test and discuss the effectiveness of the resulting inversion method when applied to the original far field operator. (10.1088/1361-6420/abaf3b)
    DOI : 10.1088/1361-6420/abaf3b
  • ZiMM: a deep learning model for long term adverse events with non-clinical claims data
    • Kabeshova Anastasiia
    • Yu Yiyang
    • Lukacs Bertrand
    • Bacry Emmanuel
    • Gaïffas Stéphane
    Journal of Biomedical Informatics, Elsevier, 2020, 110, pp.103531. This paper considers the problems of modeling and predicting a long-term and “blurry” relapse that occurs after a medical act, such as a surgery. We do not consider a short-term complication related to the act itself, but a long-term relapse that clinicians cannot explain easily, since it depends on unknown sets or sequences of past events that occurred before the act. The relapse is observed only indirectly, in a “blurry” fashion, through longitudinal prescriptions of drugs over a long period of time after the medical act. We introduce a new model, called ZiMM (Zero-inflated Mixture of Multinomial distributions) in order to capture long-term and blurry relapses. On top of it, we build an end-to-end deep-learning architecture called ZiMM Encoder-Decoder (ZiMM ED) that can learn from the complex, irregular, highly heterogeneous and sparse patterns of health events that are observed through a claims-only database. ZiMM ED is applied on a “non-clinical” claims database, that contains only timestamped reimbursement codes for drug purchases, medical procedures and hospital diagnoses, the only available clinical feature being the age of the patient. This setting is more challenging than a setting where bedside clinical signals are available. Our motivation for using such a non-clinical claims database is its exhaustivity population-wise, compared to clinical electronic health records coming from a single or a small set of hospitals. Indeed, we consider a dataset containing the claims of almost all French citizens who had surgery for prostatic problems, with a history between 1.5 and 5 years. We consider a long-term (18 months) relapse (urination problems still occur despite surgery), which is blurry since it is observed only through the reimbursement of a specific set of drugs for urination problems. Our experiments show that ZiMM ED improves several baselines, including non-deep learning and deep-learning approaches, and that it allows working on such a dataset with minimal preprocessing work. (10.1016/j.jbi.2020.103531)
    DOI : 10.1016/j.jbi.2020.103531
  • Practical computation of the diffusion MRI signal of realistic neurons based on Laplace eigenfunctions
    • Li Jing-Rebecca
    • Tran Try Nguyen
    • Nguyen Van‐dang
    NMR in Biomedicine, Wiley, 2020, 33 (10). The complex transverse water proton magnetization subject to diffusion-encoding magnetic field gradient pulses in a heterogeneous medium such as brain tissue can be modeled by the Bloch-Torrey partial differential equation. The spatial integral of the solution of this equation in realistic geometry provides a gold-standard reference model for the diffusion MRI signal arising from different tissue micro-structures of interest. A closed-form representation of this reference diffusion MRI signal, called Matrix Formalism, was derived twenty years ago; it makes explicit the link between the Laplace eigenvalues and eigenfunctions of the biological cell and its diffusion MRI signal. In addition, once the Laplace eigendecomposition has been computed and saved, the diffusion MRI signal can be calculated for arbitrary diffusion-encoding sequences and b-values at negligible additional cost. Up to now, this representation, though mathematically elegant, has not often been used as a practical model of the diffusion MRI signal, due to the difficulties of calculating the Laplace eigendecomposition in complicated geometries. In this paper, we present a simulation framework that we have implemented inside the MATLAB-based diffusion MRI simulator SpinDoctor that efficiently computes the Matrix Formalism representation for realistic neurons using the finite element method. We show that the Matrix Formalism representation requires around a few hundred eigenmodes to match the reference signal computed by solving the Bloch-Torrey equation when the cell geometry comes from realistic neurons. As expected, the number of required eigenmodes to match the reference signal increases with smaller diffusion time and higher b-values. We also converted the eigenvalues to a length scale and illustrated the link between the length scale and the oscillation frequency of the eigenmode in the cell geometry. We gave the transformation that links the Laplace eigenfunctions to the eigenfunctions of the Bloch-Torrey operator and computed the Bloch-Torrey eigenfunctions and eigenvalues. This work is another step in bringing advanced mathematical tools and numerical method development to the simulation and modeling of diffusion MRI. (10.1002/nbm.4353)
    DOI : 10.1002/nbm.4353
  • Rescaling limits of the spatial Lambda-Fleming-Viot process with selection
    • Etheridge Alison M
    • Véber Amandine
    • Yu Feng
    Electronic Journal of Probability, Institute of Mathematical Statistics (IMS), 2020, 25, pp.1 - 89. We consider the spatial Λ-Fleming-Viot process model for frequencies of genetic types in a population living in R^d , with two types of individuals (0 and 1) and natural selection favouring individuals of type 1. We first prove that the model is well-defined and provide a measure-valued dual process encoding the locations of the "potential ancestors" of a sample taken from such a population, in the same spirit as the dual process for the SLFV without natural selection. We then consider two cases, one in which the dynamics of the process are driven by purely "local" events (that is, reproduction events of bounded radii) and one incorporating large-scale extinction-recolonisation events whose radii have a polynomial tail distribution. In both cases, we consider a sequence of spatial Λ-Fleming-Viot processes indexed by n, and we assume that the fraction of individuals replaced during a reproduction event and the relative frequency of events during which natural selection acts tend to 0 as n tends to infinity. We choose the decay of these parameters in such a way that when reproduction is only local, the measure-valued process describing the local frequencies of the less favoured type converges in distribution to a (measure-valued) solution to the stochastic Fisher-KPP equation in one dimension, and to a (measure-valued) solution to the deterministic Fisher-KPP equation in more than one dimension. When large-scale extinction-recolonisation events occur, the sequence of processes converges instead to the solution to the analogous equation in which the Laplacian is replaced by a fractional Laplacian (again, noise can be retained in the limit only in one spatial dimension). We also consider the process of "potential ancestors" of a sample of individuals taken from these populations, which we see as (the empirical distribution of) a system of branching and coalescing symmetric jump processes. We show their convergence in distribution towards a system of Brownian or stable motions which branch at some finite rate. In one dimension, in the limit, pairs of particles also coalesce at a rate proportional to their collision local time. In contrast to previous proofs of scaling limits for the spatial Λ-Fleming-Viot process, here the convergence of the more complex forwards in time processes is used to prove the convergence of the dual process of potential ancestries. (10.1214/20-EJP523)
    DOI : 10.1214/20-EJP523
  • Low-cost methods for constrained multi-objective optimization under uncertainty
    • Rivier Mickael
    , 2020. Optimization Under Uncertainty is a fundamental axis of research in many companies nowadays, due to both the ever-growing computational power available and the need for efficiency, reliability and cost optimality. Among others, some challenges are the formulation of a suitable metric for the optimization problem of interest and the search for an ideal trade-off between computational cost and accuracy in the case of problems involving complex and expensive numerical solvers. The class of problems targeted here is constrained multi-objective optimization where fitness functions are uncertainty-driven metrics, such as statistical moments or quantiles. This thesis relies on two main ideas. First, the accuracy for approximating the objectives and constraints at a given design should be driven by the probability for this design of being non-dominated. This choice makes it possible to reduce the effort for evaluating designs which are unlikely to be optimal. To this end, we introduce the concept of probabilistic dominance for constrained multi-objective optimization under uncertainty through the computation of the so-called Pareto-Optimal Probability (POP). Secondly, these approximated evaluations and their associated errors can be used to construct a predictive representation of the objectives and constraints over the whole design space to accelerate the optimization process. Overall, the approximation of different uncertainty-based metrics with tunable accuracy and the use of a Surrogate-Assisting strategy are the main ingredients of the proposed algorithm, called SAMATA. This approach is flexible in terms of metric formulations and proves very parsimonious. Note that this algorithm is applicable with generic optimization methods. This thesis then explores the influence of the error distribution on the algorithm's performance. We first make a simplifying and conservative assumption by considering a uniform distribution of the error. In this case, the proposed formulation yields a Bounding-Box approach, where the estimation error can be regarded with the abstraction of an interval (in one-dimensional problems) or a product of intervals (in multi-dimensional problems) around the estimated value, naturally allowing for the computation of an approximated Pareto front. This approach is then supplemented by a Surrogate-Assisting strategy that directly estimates the objective and constraint values. Under some hypotheses, we study the convergence properties of the method in terms of the distance between the approximated Pareto front and the true continuous one. Secondly, we propose to compute non-parametric approximations of the error distributions with a sampling-based technique. We propose a first algorithm relying on an approximation scheme with controlled accuracy for drawing large-scale Gaussian random field realizations in the coupled space between design and uncertain parameters. It notably permits a sharper computation of the POP and detects possible correlations between the different objectives and constraints. Joint realizations can be drawn on multiple designs in order to generate Surrogate-Assisting models of the objectives and constraints. Since the construction of a Gaussian random field can be hard in the context of high dimensionality or non-parametric inputs, we also propose a KDE-based Surrogate-Assisting model as an extension of the classical heteroscedastic Gaussian process, with the capability to take as input disjoint objective and constraint realizations. We assess the proposed variants on several analytical uncertainty-based optimization test-cases with respect to an a priori metamodel approach by computing a probabilistic modified Hausdorff distance to the exact Pareto optimal set. The method is then employed on several engineering applications: the two-bar truss, a thermal protection system for atmospheric reentry and the blades of an Organic Rankine Cycle turbine. (An illustrative sketch of the Bounding-Box dominance test follows this entry.)
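
The Bounding-Box idea lends itself to a compact illustration: under a uniform error model, each design's objective estimates become a product of intervals, and a design can be discarded only when some competitor dominates it even in the worst case. The sketch below (two minimized objectives, made-up numbers) is an editorial reconstruction of that dominance test, not code from the thesis.

```python
import numpy as np

def possibly_nondominated(means, err):
    """Keep design i unless some j dominates it even in the worst case,
    i.e. the upper corner of j's box is at least as good as the lower
    corner of i's box in every (minimized) objective, and better in one."""
    lo, hi = means - err, means + err
    keep = []
    for i in range(len(means)):
        dominated = any(np.all(hi[j] <= lo[i]) and np.any(hi[j] < lo[i])
                        for j in range(len(means)) if j != i)
        if not dominated:
            keep.append(i)
    return keep

means = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0], [5.0, 5.0]])
err = np.array([[0.2, 0.2], [0.2, 0.2], [0.2, 0.2], [0.3, 0.3]])
print(possibly_nondominated(means, err))  # [0, 1, 2]: the last box is dominated
```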
  • Endogenous liquidity crises in financial markets
    • Fosset Antoine
    , 2020. Recent empirical analyses have revealed the existence of the Zumbach effect. This discovery led to the development of quadratic Hawkes processes, designed to reproduce this effect. As this model makes no link with the price formation process, we extended it to the order book with a generalized quadratic Hawkes process (GQ-Hawkes). Using market data, we showed that there is a Zumbach-type effect that decreases future liquidity. Providing a microfoundation for the Zumbach effect, it is responsible for a potential destabilization of financial markets. Moreover, the exact calibration of a GQ-Hawkes process tells us that markets are on the verge of criticality. This empirical evidence therefore prompted us to analyze an order-book model built with a Zumbach-type coupling. We thus introduced the quadratic Santa Fe model and proved numerically that there is a phase transition between a stable market and an unstable market subject to liquidity crises. Thanks to a finite-size analysis we were able to determine the critical exponents of this transition, which belongs to a new universality class. As the model is not analytically solvable, this led us to introduce simpler models to describe liquidity crises. Setting aside the microstructure of the order book, we obtain a class of spread models for which we computed the critical parameters of their transitions. Even if these exponents are not those of the quadratic Santa Fe transition, these models open new horizons for exploring spread dynamics. One of them has a nonlinear coupling that gives rise to a metastable state. This elegant alternative scenario does not need critical parameters to produce an unstable market, even though the empirical data are not in its favour. Finally, we looked at the order-book dynamics from another angle: that of reaction-diffusion. We modelled a liquidity that reveals itself in the order book with a certain frequency. Solving this model at equilibrium reveals a stability condition on the parameters beyond which the order book empties completely, corresponding to a liquidity crisis. By calibrating it on market data we were able to qualitatively analyze the distance to this unstable region.
  • Renewal in Hawkes processes with self-excitation and inhibition
    • Costa Manon
    • Graham Carl
    • Marsalle Laurence
    • Tran Viet-Chi
    Advances in Applied Probability, Applied Probability Trust, 2020, 52 (3), pp.879-915. This paper investigates Hawkes processes on the positive real line exhibiting both self-excitation and inhibition. Each point of this point process impacts its future intensity by the addition of a signed reproduction function. The case of a nonnegative reproduction function corresponds to self-excitation and has been widely investigated in the literature. In particular, there exists a cluster representation of the Hawkes process which allows one to apply results known for Galton-Watson trees. In the present paper, we establish limit theorems for Hawkes processes with signed reproduction functions by using renewal techniques. We notably prove exponential concentration inequalities, and thus extend results of Reynaud-Bouret and Roy (2007) which were proved for nonnegative reproduction functions using this cluster representation, which is no longer valid in our case. An important step for this is to establish the existence of exponential moments for renewal times of M/G/∞ queues that appear naturally in our problem. These results have their own interest, independently of the original problem for the Hawkes processes. (10.1017/apr.2020.19) (An illustrative simulation sketch follows this entry.)
    DOI : 10.1017/apr.2020.19
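
A Hawkes process with a signed reproduction function, as studied in this paper, can be simulated by Ogata's thinning algorithm once the intensity is clipped at zero. The sketch below is illustrative (the exponential kernel choice and all parameter values are assumptions); alpha < 0 gives the inhibition case.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_hawkes(mu, alpha, beta, T):
    """Ogata thinning for a Hawkes process with signed exponential
    reproduction function h(t) = alpha * beta * exp(-beta * t), with the
    conditional intensity clipped at zero as in the signed case."""
    def intensity(s, events):
        return max(0.0, mu + sum(alpha * beta * np.exp(-beta * (s - ti))
                                 for ti in events))
    events, t = [], 0.0
    while True:
        # between events an exponential kernel makes the intensity monotone,
        # so max(current value, baseline) dominates it until the next event
        lam_bar = max(intensity(t, events), mu)
        t += rng.exponential(1.0 / lam_bar)
        if t > T:
            return np.array(events)
        if rng.uniform() <= intensity(t, events) / lam_bar:
            events.append(t)

print(len(simulate_hawkes(mu=1.0, alpha=-0.5, beta=2.0, T=100.0)))
```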
  • Towards an End-to-End Analysis and Prediction System for Weather, Climate, and Marine Applications in the Red Sea
    • Hoteit Ibrahim
    • Abualnaja Yasser
    • Afsal Shehzad
    • Ait El Fquih Boujemaa
    • Akylas Triantaphyllos
    • Antony Charls
    • Dawson Clint
    • Asfahani Khaled
    • Brewin Robert J W
    • Cavaleri Luigi
    • Cerovečki Ivana
    • Cornuelle Bruce D
    • Desamsetti Srinivas
    • Attada Raju
    • Dasari Hari
    • Sanchez-Garrido Jose
    • Genevier Lily
    • Gharamti Mohamad
    • Gittings John
    • Gokul Elamurugu
    • Gopalakrishnan Ganesh
    • Guo Daquan
    • Hadri Bilel
    • Hadwiger Markus
    • Abed Mohammed
    • Hendershott Myrl
    • Hittawe Mohamad
    • Ashok Karumuri
    • Knio Omar
    • Köhl Armin
    • Kortas Samuel
    • Krokos George
    • Kunchala Ravi
    • Issa Leila
    • Lakkis Issam
    • Langodan Sabique
    • Lermusiaux Pierre
    • Luong Thong
    • Ma Jingyi
    • Le Maitre Olivier
    • Mazloff Matthew
    • El-Mohtar Samah
    • Papadopoulos Vassilis
    • Platt Trevor
    • Pratt Larry
    • Raboudi Naila
    • Racault Marie-Fanny
    • Raitsos Dionysios
    • Razak Shanas
    • Sivareddy Sanikommu
    • Sathyendranath Shuba
    • Sofianos Sarantis
    • Subramanian Aneesh
    • Sun Rui
    • Titi Edriss S.
    • Toye Habib
    • Triantafyllou Georges
    • Tsiaras Kostas
    • Vasou Panagiotis
    • Viswanadhapalli Yesubabu
    • Wang Yixin
    • Yao Fengchao
    • Zhan Peng
    • Zodiatis George
    Bulletin of the American Meteorological Society, American Meteorological Society, 2020, pp.1-61. The Red Sea, home to the second-longest coral reef system in the world, is a vital resource for the Kingdom of Saudi Arabia. The Red Sea provides 90% of the Kingdom's potable water by desalinization, supporting tourism, shipping, aquaculture and fishing industries, which together contribute about 10-20% of the country's GDP. All these activities, and those elsewhere in the Red Sea region, critically depend on oceanic and atmospheric conditions. At a time of mega-development projects along the Red Sea coast, and global warming, authorities are working on optimizing the harnessing of environmental resources, including renewable energy, rainwater harvesting, etc. All these require high-resolution weather and climate information. Toward this end, we have undertaken a multi-pronged R&D activity in which we are developing an integrated data-driven regional coupled modeling system. The telescopically-nested components include 5km-600m resolution atmospheric models to address weather and climate challenges, 4km-50m resolution ocean models with regional and coastal configurations to simulate and predict the general and mesoscale circulation; 4km-100m ecosystem models to simulate the biogeochemistry; and 1km-50m resolution wave models. In addition, a complementary probabilistic transport modeling system predicts dispersion of contaminant plumes, oil spills, and marine ecosystem connectivity. Advanced ensemble data assimilation capabilities have also been implemented for accurate forecasting. Resulting achievements include significant advancement in our understanding of the regional circulation and its connection to the global climate, development and validation of long-term Red Sea regional atmospheric-oceanic-wave reanalyses, and forecasting capacities. These products are being extensively used by academia/government/industry in various weather and marine studies and operations, environmental policies, renewable energy applications, impact assessment, flood-forecasting, etc. (10.1175/BAMS-D-19-0005.1)
    DOI : 10.1175/BAMS-D-19-0005.1
  • Local mean field and energy transport in non-equilibrium systems
    • Fernandez Montero Alejandro
    , 2020. Chains of oscillators make it possible to model a solid microscopically, in order to study energy transport and prove Fourier's law. In this thesis, we introduce two new models of chains of oscillators with local mean field mechanical interaction and stochastic collisions that preserve the system's total energy. The first model is a model with stochastic velocity exchanges of Kac type. The second one is a model with random flips of velocities, where the sign of the particles' velocities is changed at random times. As we consider local mean field models, particles are not indistinguishable, and the conservative stochastic exchanges in our first model are an additional difficulty for the proof of a Vlasov limit. We first derive a quantitative mean field limit, which we then use to prove that energy evolves diffusively at a given timescale for the model with long-range exchanges and for a restricted class of anharmonic potentials. At the same timescale, we also prove that there is no evolution of energy for the model with flips of velocities. For harmonic interactions, we then compute the thermal conductivity via the Green-Kubo formula for both models, to highlight that the timescale at which energy evolves for the model with velocity flips is longer, and therefore that the mechanisms at play for energy transport are different.
  • Mixtures of Gaussian Graphical Models with Constraints
    • Lartigue Thomas
    , 2020. Describing the co-variations between several observed random variables is a delicate problem. Dependency networks are popular tools that depict the relations between variables through the presence or absence of edges between the nodes of a graph. In particular, conditional correlation graphs are used to represent the "direct" correlations between nodes of the graph. They are often studied under the Gaussian assumption and consequently referred to as "Gaussian Graphical Models" (GGM). A single network can be used to represent the overall tendencies identified within a data sample. However, when the observed data is sampled from a heterogeneous population, then there exist different sub-populations that all need to be described through their own graphs. What is more, if the sub-population (or "class") labels are not available, unsupervised approaches must be implemented in order to correctly identify the classes and describe each of them with its own graph. In this work, we tackle the fairly new problem of Hierarchical GGM estimation for unlabelled heterogeneous populations. We explore several key axes to improve the estimation of the model parameters as well as the unsupervised identification of the sub-populations. Our goal is to ensure that the inferred conditional correlation graphs are as relevant and interpretable as possible. First, in the simple, homogeneous population case, we develop a composite method that combines the strengths of the two main state-of-the-art paradigms to correct their weaknesses. For the unlabelled heterogeneous case, we propose to estimate a Mixture of GGM with an Expectation Maximisation (EM) algorithm. In order to improve the solutions of this EM algorithm, and avoid falling into sub-optimal local extrema in high dimension, we introduce a tempered version of this EM algorithm, which we study theoretically and empirically. Finally, we improve the clustering of the EM by taking into consideration the effect of external co-features on the position in space of the observed data. (An illustrative sketch of a tempered E-step follows this entry.)
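
The tempering idea mentioned in this abstract can be illustrated on a plain Gaussian mixture: the E-step responsibilities are flattened by a temperature T > 1 that is annealed to 1, which helps the EM iterations escape poor local extrema early on. This generic sketch is not the author's scheme for mixtures of GGMs; the annealing schedule and all parameters are assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)

def tempered_em(X, K, temps, n_iter=50):
    """EM for a Gaussian mixture with a tempered E-step: log-responsibilities
    are divided by the temperature T (equivalently, p^(1/T)) and renormalized."""
    n, d = X.shape
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(n, K, replace=False)]
    cov = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    for it in range(n_iter):
        T = temps(it)
        # tempered E-step
        logp = np.column_stack([
            np.log(pi[k]) + multivariate_normal.logpdf(X, mu[k], cov[k])
            for k in range(K)])
        logp /= T
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # standard M-step
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        for k in range(K):
            xc = X - mu[k]
            cov[k] = (r[:, k, None] * xc).T @ xc / nk[k] + 1e-6 * np.eye(d)
    return pi, mu, cov

X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
pi, mu, cov = tempered_em(X, K=2, temps=lambda it: max(1.0, 3.0 * 0.9 ** it))
print(pi, mu)
```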
  • AI-Augmented Multi Function Radar Engineering with Digital Twin: Towards Proactivity
    • Klein Mathieu
    • Carpentier Thomas
    • Jeanclaude Eric
    • Kassab Rami
    • Varelas Konstantinos
    • de Bruijn Nico
    • Barbaresco Frédéric
    • Briheche Yann
    • Semet Yann
    • Aligne Florence
    , 2020. Thales' new-generation digital multi-mission radars, fully digital and software-defined, like the Sea Fire and Ground Fire radars, benefit from a considerable increase in accessible degrees of freedom with which to optimally design their operational modes. To effectively leverage these design choices and turn them into operational capabilities, it is necessary to develop new engineering tools using artificial intelligence. Innovative optimization algorithms in the discrete and continuous domains, coupled with a radar Digital Twin, allowed the construction of a generic tool for "search" mode design (beam synthesis, waveform and volume grid) compliant with the available radar time budget. The high computation speeds of these algorithms suggest applying the tool in a "Proactive Radar" configuration, which would dynamically propose to the operator operational modes better adapted to the environment, threats, and equipment failure conditions.
  • Phased-Array Antenna Pattern Optimization with Evolution Strategies
    • Dufossé Paul
    • Enderli Cyrille
    • Savy Laurent
    • Hansen Nikolaus
    , 2020. (10.1109/RadarConf2043947.2020.9266631)
    DOI : 10.1109/RadarConf2043947.2020.9266631
  • Statistical inference with incomplete and high-dimensional data - modeling polytraumatized patients
    • Jiang Wei
    , 2020. The problem of missing data has existed since the beginning of data analysis, as missing values are related to the process of obtaining and preparing data. In applications of modern statistics and machine learning, where the collection of data is becoming increasingly complex and where multiple sources of information are combined, large databases often have an extraordinarily high number of missing values. These data therefore present important methodological and technical challenges for analysis: from visualization to modeling, including estimation, variable selection, predictive capabilities, and implementation in software. Moreover, although high-dimensional data with missing values are considered a common difficulty in statistical analysis today, only a few solutions are available. The objective of this thesis is to provide new methodologies for performing statistical inference with missing data, in particular for high-dimensional data. The most important contribution is to provide a comprehensive framework for dealing with missing values, from estimation to model selection, based on likelihood approaches. The proposed method does not rely on a specific pattern of missingness and allows a good balance between quality of inference and computational efficiency. The contribution of the thesis consists of three parts. In Chapter 2, we focus on performing logistic regression with missing values in a joint modeling framework, using a stochastic approximation of the EM algorithm. We discuss parameter estimation, variable selection, and prediction for incomplete new observations. Through extensive simulations, we show that the estimators are unbiased and have good confidence interval coverage properties, which outperforms the popular imputation-based approach. The method is then applied to pre-hospital data to predict the risk of hemorrhagic shock, in collaboration with medical partners, the Traumabase group of Paris hospitals. Indeed, the proposed model improves the prediction of bleeding risk compared to the prediction made by physicians. In Chapters 3 and 4, we focus on model selection for high-dimensional incomplete data, aiming in particular at controlling false discoveries. For linear models, the adaptive Bayesian version of SLOPE (ABSLOPE) we propose in Chapter 3 addresses these issues by embedding the sorted l1 regularization within a Bayesian spike-and-slab framework. Alternatively, in Chapter 4, aiming at more general models beyond linear regression, we consider these questions in a model-X framework, where the conditional distribution of the response as a function of the covariates is not specified. To do so, we combine the knockoff methodology with multiple imputation. Through extensive simulations, we demonstrate satisfactory performance in terms of power, FDR and estimation bias for a wide range of scenarios. In an application to the medical data set, we build a model to predict patient platelet levels from pre-hospital and hospital data. Finally, we provide two open-source software packages with tutorials, in order to help decision making in the medical field and users facing missing values.
  • Decomposition of High Dimensional Aggregative Stochastic Control Problems
    • Seguret Adrien
    • Alasseur Clémence
    • Frédéric Bonnans J
    • de Paola Antonio
    • Oudjane Nadia
    • Trovato Vincenzo
    , 2020. We consider the framework of high-dimensional stochastic control problems in which the controls are aggregated in the cost function. As a first contribution, we introduce a modified problem whose optimal control is, under some reasonable assumptions, an ε-optimal solution of the original problem. As a second contribution, we present a decentralized algorithm whose convergence to the solution of the modified problem is established. Finally, we study the application to a problem of coordination of energy consumption and production of domestic appliances.
  • Probabilistic and mean-field model of COVID-19 epidemics with user mobility and contact tracing
    • Akian Marianne
    • Ganassali Luca
    • Gaubert Stéphane
    • Massoulié Laurent
    , 2020. We propose a detailed discrete-time model of COVID-19 epidemics coming in two flavours, mean-field and probabilistic. The main contribution lies in several extensions of the basic model that capture i) user mobility, distinguishing routing, i.e. change of residence, from commuting, i.e. daily mobility, and ii) contact tracing procedures. We confront this model with public data on daily hospitalizations, and discuss its application as well as the underlying estimation procedures. (An illustrative sketch follows this entry.)
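
For orientation, the mean-field flavour of such discrete-time epidemic models reduces, in its most basic form, to a deterministic compartmental recursion like the one below. The paper's actual model is considerably richer (mobility between zones, contact tracing), and the rates used here are arbitrary illustrative values.

```python
import numpy as np

def simulate(beta=0.25, sigma=0.2, gamma=0.1, days=200, N=1e6, I0=100):
    """Minimal discrete-time mean-field SEIR recursion: at each day, a
    mean-field contagion term moves mass S -> E, then E -> I -> R at
    fixed rates. Mobility and contact tracing are deliberately omitted."""
    S, E, I, R = N - I0, 0.0, float(I0), 0.0
    history = []
    for _ in range(days):
        new_exposed = beta * S * I / N       # mean-field contagion term
        S, E, I, R = (S - new_exposed,
                      E + new_exposed - sigma * E,
                      I + sigma * E - gamma * I,
                      R + gamma * I)
        history.append(I)
    return np.array(history)

print(f"peak infectious: {simulate().max():.0f}")
```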