Quentin Petit
I'm looking for opportunities in industry to continue working on the computational distribution of very large deep learning models.
I worked on distributed and parallel multi-level computing for very large Deep Learning models, focusing on the generalization of data embedding in the pre-processing of large models. The goal is to find the best way of representing data so as to retain as much information as possible while reducing its size, saving computing power during processing. The resulting methods will be implemented in MindSpore, an AI framework.
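The sketch below is not the method developed during this work; it only illustrates, with a standard Keras Embedding layer, the general idea of turning high-dimensional categorical inputs into a compact dense representation. The vocabulary size and embedding dimension are arbitrary placeholders.

# Minimal sketch (illustrative only): an embedding layer maps integer-encoded
# inputs to small dense vectors, reducing input size while keeping enough
# information for downstream training. Sizes below are assumptions.
import tensorflow as tf

vocab_size = 10_000   # assumed number of distinct input tokens/items
embed_dim = 64        # assumed size of the compressed representation

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# A batch of integer-encoded samples (values in [0, vocab_size)).
batch = tf.random.uniform((32, 20), maxval=vocab_size, dtype=tf.int32)
out = model(batch)   # each sample becomes a dense vector, then a prediction
print(out.shape)     # (32, 1)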
I experimented with the sparse linear algebra operations provided by TensorFlow for sparse matrix computation, and analyzed how new sparse matrix compression formats could be used with TensorFlow to minimize communication and optimize computation time.
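As a minimal illustration of TensorFlow's built-in sparse linear algebra (and not of the compression-format experiments themselves), the sketch below builds a small COO-style tf.sparse.SparseTensor and multiplies it with a dense matrix; the shapes and values are arbitrary.

# Minimal sketch: TensorFlow's native sparse representation and a
# sparse x dense product, where only the stored non-zeros participate.
import tensorflow as tf

sp = tf.sparse.SparseTensor(
    indices=[[0, 0], [1, 2], [2, 1]],   # coordinates of non-zero entries
    values=[1.0, 3.0, 2.0],
    dense_shape=[3, 3],
)
sp = tf.sparse.reorder(sp)  # indices must be in canonical row-major order

dense = tf.random.normal([3, 4])
result = tf.sparse.sparse_dense_matmul(sp, dense)
print(result.shape)  # (3, 4)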
To support a major relocation and improve employee well-being in the workplace, I developed a new web application in Angular, with a back end built on the CakePHP framework. The application lets all employees listen to music; their usage data is then used to customize the music played in the common rooms.
Led the graphical user interface implementation for the SMG2S project (a mathematical research project). Website: https://smg2s.github.io/
Very large model sizes are now commonplace, extending the range of applications for Deep Learning. However, this exponential growth in model size has led to an equally significant increase in computing power requirements. Innovative solutions need to be found and implemented to optimize current algorithms, reduce their complexity, and make them easy to use and deploy in a massively distributed environment. Developing parallel and distributed computing techniques and methods that fully exploit the available resources is therefore crucial to maximizing efficiency, minimizing computation costs, and meeting the ever-growing requirements of these models.
In this context, we propose several contributions to reduce the costs associated with training neural networks in a massively distributed environment. Our contributions focus on processing the data upstream of the model, in order to improve the quality of the data supplied to the neural network and facilitate its training. We focus on sparse data, such as graphs, which pose particular challenges due to their complex structures and potentially very large sizes. The processing applied to these data is designed to significantly improve the model's performance. Finally, we propose leveraging this processing to effectively reduce the size of the data, thereby decreasing the number of inputs while retaining sufficient information to ensure good model accuracy.
Embedding is a crucial step for deep neural networks. Datasets from different applications, with different structures, can all be processed through an embedding layer and transformed into a dense matrix. The transformation must minimize both the loss of information and data redundancy, and extracting the appropriate data features ensures its efficiency. The co-occurrence matrix is an excellent way of representing the links between the elements of a dataset. However, as the dataset grows, building and using the co-occurrence matrix becomes a problem in terms of computing power and memory footprint.
In this paper, we propose a parallel and distributed approach to constructing the co-occurrence matrix efficiently and in a scalable way. Our solution takes advantage of different features of boolean datasets to minimize the construction time of the co-occurrence matrix. Our experimental results show that our solution outperforms traditional approaches by up to 34x. We also demonstrate the efficacy of our approach with a cost model.
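The sketch below shows the underlying idea only, on a single node, not the parallel and distributed construction proposed in the paper: for a boolean dataset stored as a sparse 0/1 matrix X (samples x items), the co-occurrence matrix can be written as X^T X, so that entry (i, j) counts the samples containing both items i and j. The sizes and density used here are arbitrary.

# Illustrative single-node sketch of co-occurrence matrix construction
# from a boolean (presence/absence) dataset; not the paper's algorithm.
import numpy as np
import scipy.sparse as sp

n_samples, n_items = 1_000, 50                       # assumed sizes
X = sp.random(n_samples, n_items, density=0.05,
              format="csr", random_state=0)
X.data[:] = 1.0                                      # make the matrix boolean

cooc = (X.T @ X).toarray()        # entry (i, j) = samples containing i and j
assert cooc[3, 7] == cooc[7, 3]   # symmetric by construction
print(cooc.diagonal()[:5])        # diagonal = per-item occurrence counts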
Graph Neural Networks (GNNs) play a very important role today. They analyze not only the graph data itself but also the connectivity of the graph, so the quality of a GNN depends on how well graph structure information is extracted. This extraction can be enhanced through GNN model design or directly from the training dataset with a GNN-decoupled method. In this paper, we propose RankedDrop, a new sampling method that improves the extraction of graph structure information. This approach is based on the dropping-out technique and adopts a spatially aware selection of the edges to drop: it takes the structural information of the graph into account to control the dropping-out, and its random selection of edges to drop is governed by a probability generated with respect to the graph's topological importance. Our experiments show that RankedDrop provides high-quality and robust training results compared to the leading solutions. Furthermore, RankedDrop could be used as a plugin for AI frameworks such as MindSpore and combined with GNN model improvements to maximize GNN quality.
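The sketch below is only a hedged illustration of spatially aware, probability-driven edge dropping in the spirit of RankedDrop; the edge score used here (sum of endpoint degrees) is a stand-in assumption, not the actual ranking defined in the paper, and the function name ranked_edge_drop is hypothetical.

# Hypothetical sketch: drop edges with a probability derived from a
# topological score (here, node degree as a placeholder for the paper's
# actual ranking), so structurally redundant edges are dropped more often.
import numpy as np

def ranked_edge_drop(edges: np.ndarray, num_nodes: int,
                     drop_ratio: float = 0.2, seed: int = 0) -> np.ndarray:
    """edges: (E, 2) array of (src, dst) pairs; returns the kept edges."""
    rng = np.random.default_rng(seed)
    degree = np.bincount(edges.ravel(), minlength=num_nodes).astype(float)

    # Score each edge by the degrees of its endpoints, then normalize so the
    # average dropping probability is about drop_ratio (assumed scoring).
    score = degree[edges[:, 0]] + degree[edges[:, 1]]
    prob = drop_ratio * score / score.mean()

    keep = rng.random(len(edges)) >= np.clip(prob, 0.0, 1.0)
    return edges[keep]

edges = np.array([[0, 1], [1, 2], [2, 3], [3, 0], [1, 3]])
print(ranked_edge_drop(edges, num_nodes=4))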
Deep learning (DL) requires high-performance processing of big data. Graph Neural Networks, a challenging topic in DL that relies on linear algebra methods, need algorithmic solutions to efficiently assign and process graph data on modern distributed and parallel machines featuring mixed-precision arithmetic and various types of tensor/matrix accelerators. Determining compression techniques for the graph's sparse data structures is one of the key elements. Our first objective is to design and implement a reusable parallel numerical library to resolve large neural network graphs. Our design strategy draws on a component-based approach and targets maximum code reuse in various parallel contexts while allowing for performance optimization. The solution could later be integrated into a DL framework such as MindSpore.
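As a small illustration of one candidate compression format for a graph's sparse adjacency structure (CSR, shown here via SciPy), the sketch below lists the three arrays that replace a dense matrix and the sparse matrix-vector product at the heart of many GNN propagations; it does not represent the library being designed.

# Illustrative sketch: CSR storage of a tiny graph's adjacency matrix and
# the sparse matrix-vector product used by many graph propagation kernels.
import numpy as np
import scipy.sparse as sp

rows = np.array([0, 0, 1, 2, 3])
cols = np.array([1, 3, 2, 3, 0])
adj = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(4, 4))

# CSR stores three compact arrays instead of a dense 4x4 matrix:
print(adj.indptr)   # row pointers
print(adj.indices)  # column indices of the non-zeros
print(adj.data)     # non-zero values

x = np.ones(4)
print(adj @ x)      # sparse matrix-vector product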
High-performance computing algorithm development
Parallel and collective communication libraries: MPI, OpenMP, HCCL
Build documents and slides with LaTeX
AI frameworks: Keras, TensorFlow, MindSpore
C2: Mother tongue
C1: Fluent
A1: Elementary
A1: Elementary
While studying mathematics and computer science, I became passionate about HPC and computations based on sparse linear algebra in a massively distributed environment.
During my thesis, I worked on the application of these principles to accelerate computations and took part in the generalization of various Deep Learning models.