Quentin Petit
I'm looking for opportunities in industry to continue working on the computational distribution of very large deep learning models.
I worked on distributed and parallel multi-level computing for very large Deep Learning models, focusing on the generalization of data embedding in the pre-processing of large models. The goal is to find the best way of representing data so as to retain as much information as possible while reducing its size, saving computing power during processing. The resulting methods will be implemented in MindSpore, an AI framework.
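The sketch below is not the method developed during this work; it only illustrates, with a standard Keras Embedding layer, the general idea of turning high-dimensional categorical inputs into a compact dense representation. The vocabulary size and embedding dimension are arbitrary placeholders.

# Minimal sketch (illustrative only): an embedding layer maps integer-encoded
# inputs to small dense vectors, reducing input size while keeping enough
# information for downstream training. Sizes below are assumptions.
import tensorflow as tf

vocab_size = 10_000   # assumed number of distinct input tokens/items
embed_dim = 64        # assumed size of the compressed representation

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# A batch of integer-encoded samples (values in [0, vocab_size)).
batch = tf.random.uniform((32, 20), maxval=vocab_size, dtype=tf.int32)
out = model(batch)   # each sample becomes a dense vector, then a prediction
print(out.shape)     # (32, 1)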
I experimented with the sparse linear algebra operations provided by TensorFlow for sparse matrix computation, and analyzed how new sparse matrix compression formats could be used with TensorFlow to minimize communication and optimize computation time.
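As a minimal illustration of TensorFlow's built-in sparse linear algebra (and not of the compression-format experiments themselves), the sketch below builds a small COO-style tf.sparse.SparseTensor and multiplies it with a dense matrix; the shapes and values are arbitrary.

# Minimal sketch: TensorFlow's native sparse representation and a
# sparse x dense product, where only the stored non-zeros participate.
import tensorflow as tf

sp = tf.sparse.SparseTensor(
    indices=[[0, 0], [1, 2], [2, 1]],   # coordinates of non-zero entries
    values=[1.0, 3.0, 2.0],
    dense_shape=[3, 3],
)
sp = tf.sparse.reorder(sp)  # indices must be in canonical row-major order

dense = tf.random.normal([3, 4])
result = tf.sparse.sparse_dense_matmul(sp, dense)
print(result.shape)  # (3, 4)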
To support a major relocation and improve employee well-being in the workplace, I developed a new web application in Angular, with a back end built on the CakePHP framework. The application lets all employees listen to music; their usage data is then used to customize the music played in the common rooms.
Led the graphical user interface implementation for the SMG2S project (a mathematical research project). Website: https://smg2s.github.io/
Very large model sizes are now commonplace, extending the range of applications for Deep Learning. However, this exponential growth in model size has led to an equally significant increase in computing power requirements. Innovative solutions need to be found and implemented to optimize current algorithms, reduce their complexity, and make them easy to use and deploy in a massively distributed environment. Developing parallel and distributed computing techniques and methods that fully exploit the available resources is therefore crucial to maximizing efficiency, minimizing computation costs, and meeting the ever-growing requirements of these models.
In this context, we propose several contributions to reduce the costs associated with training neural networks in a massively distributed environment. Our contributions focus on processing the data upstream of the model, in order to improve the quality of the data supplied to the neural network and facilitate its training. We focus on sparse data, such as graphs, which pose particular challenges due to their complex structures and potentially very large sizes. The processing applied to these data is designed to significantly improve the model's performance. Finally, we propose leveraging this processing to effectively reduce the size of the data, thereby decreasing the number of inputs while retaining sufficient information to ensure good model accuracy.
Embedding is a crucial step for deep neural networks. Datasets from different applications, with different structures, can all be processed through an embedding layer and transformed into a dense matrix. The transformation must minimize both the loss of information and data redundancy, and extracting the appropriate data features ensures its efficiency. The co-occurrence matrix is an excellent way of representing the links between the elements of a dataset. However, as the dataset grows, building and using the co-occurrence matrix becomes a problem in terms of computing power and memory footprint.
In this paper, we propose a parallel and distributed approach to constructing the co-occurrence matrix efficiently and in a scalable way. Our solution takes advantage of different features of boolean datasets to minimize the construction time of the co-occurrence matrix. Our experimental results show that our solution outperforms traditional approaches by up to 34x. We also demonstrate the efficacy of our approach with a cost model.
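The sketch below shows the underlying idea only, on a single node, not the parallel and distributed construction proposed in the paper: for a boolean dataset stored as a sparse 0/1 matrix X (samples x items), the co-occurrence matrix can be written as X^T X, so that entry (i, j) counts the samples containing both items i and j. The sizes and density used here are arbitrary.

# Illustrative single-node sketch of co-occurrence matrix construction
# from a boolean (presence/absence) dataset; not the paper's algorithm.
import numpy as np
import scipy.sparse as sp

n_samples, n_items = 1_000, 50                       # assumed sizes
X = sp.random(n_samples, n_items, density=0.05,
              format="csr", random_state=0)
X.data[:] = 1.0                                      # make the matrix boolean

cooc = (X.T @ X).toarray()        # entry (i, j) = samples containing i and j
assert cooc[3, 7] == cooc[7, 3]   # symmetric by construction
print(cooc.diagonal()[:5])        # diagonal = per-item occurrence counts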
Graph Neural Networks (GNNs) play a very important role today. They analyze not only the graph data itself but also the connectivity of the graph, so the quality of a GNN depends on how well graph structure information is extracted. This extraction can be enhanced through GNN model design or directly from the training dataset with a GNN-decoupled method. In this paper, we propose RankedDrop, a new sampling method that improves the extraction of graph structure information. This approach is based on the dropping-out technique and adopts a spatially aware selection of the edges to drop: it takes the structural information of the graph into account to control the dropping-out, and its random selection of edges to drop is governed by a probability generated with respect to the graph's topological importance. Our experiments show that RankedDrop provides high-quality and robust training results compared to the leading solutions. Furthermore, RankedDrop could be used as a plugin for AI frameworks such as MindSpore and combined with GNN model improvements to maximize GNN quality.
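The sketch below is only a hedged illustration of spatially aware, probability-driven edge dropping in the spirit of RankedDrop; the edge score used here (sum of endpoint degrees) is a stand-in assumption, not the actual ranking defined in the paper, and the function name ranked_edge_drop is hypothetical.

# Hypothetical sketch: drop edges with a probability derived from a
# topological score (here, node degree as a placeholder for the paper's
# actual ranking), so structurally redundant edges are dropped more often.
import numpy as np

def ranked_edge_drop(edges: np.ndarray, num_nodes: int,
                     drop_ratio: float = 0.2, seed: int = 0) -> np.ndarray:
    """edges: (E, 2) array of (src, dst) pairs; returns the kept edges."""
    rng = np.random.default_rng(seed)
    degree = np.bincount(edges.ravel(), minlength=num_nodes).astype(float)

    # Score each edge by the degrees of its endpoints, then normalize so the
    # average dropping probability is about drop_ratio (assumed scoring).
    score = degree[edges[:, 0]] + degree[edges[:, 1]]
    prob = drop_ratio * score / score.mean()

    keep = rng.random(len(edges)) >= np.clip(prob, 0.0, 1.0)
    return edges[keep]

edges = np.array([[0, 1], [1, 2], [2, 3], [3, 0], [1, 3]])
print(ranked_edge_drop(edges, num_nodes=4))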
Deep learning (DL) requires high-performance processing of big data. Graph Neural Networks, a challenging topic in DL that relies on linear algebra methods, need algorithmic solutions to efficiently assign and process graph data on modern distributed and parallel machines featuring mixed-precision arithmetic and various types of tensor/matrix accelerators. Determining compression techniques for the graph's sparse data structures is one of the key elements. Our first objective is to design and implement a reusable parallel numerical library to resolve large neural network graphs. Our design strategy draws on a component-based approach and targets maximum code reuse in various parallel contexts while allowing for performance optimization. The solution could later be integrated into a DL framework such as MindSpore.
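As a small illustration of one candidate compression format for a graph's sparse adjacency structure (CSR, shown here via SciPy), the sketch below lists the three arrays that replace a dense matrix and the sparse matrix-vector product at the heart of many GNN propagations; it does not represent the library being designed.

# Illustrative sketch: CSR storage of a tiny graph's adjacency matrix and
# the sparse matrix-vector product used by many graph propagation kernels.
import numpy as np
import scipy.sparse as sp

rows = np.array([0, 0, 1, 2, 3])
cols = np.array([1, 3, 2, 3, 0])
adj = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(4, 4))

# CSR stores three compact arrays instead of a dense 4x4 matrix:
print(adj.indptr)   # row pointers
print(adj.indices)  # column indices of the non-zeros
print(adj.data)     # non-zero values

x = np.ones(4)
print(adj @ x)      # sparse matrix-vector product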
High-performance computing algorithm development
Parallel and collective communication libraries: MPI, OpenMP, HCCL
Build documents and slides with LaTeX
AI frameworks: Keras, TensorFlow, MindSpore
C2: Mother tongue
C1: Fluent
A1: Elementary
A1: Elementary
While studying mathematics and computer science, I became passionate about HPC and computations based on sparse linear algebra in a massively distributed environment.
During my thesis, I worked on the application of these principles to accelerate computations and took part in the generalization of various Deep Learning models.