10 décembre 2020

Catégorie : Doctorant

Graphs are commonly used to describe the knowledge we have on given phenomena. Social networks can be modeled in the form of graphs whose nodes are users and the edges are the relationships between individuals. Biologists use graphs to describe protein interactions, while communication networks are themselves graphs. The interest of machine learning for graphs has recently become a strategic issue. The objective can be, for example, the prediction of new affinities in social networks, etc. In order to handle and perform operations on graphs, it is important to have an adequate associated mathematical representation ("embedding" in English) [1,2]. The main challenge in this area is to represent the structure of a graph in order to be easily exploited by machine learning methods.

Until now, it is therefore common to associate a graph with a vector or a set of vectors, that is to say, in the form of a matrix. For example, Word2vec is an embedding method which associates a vector with each word. Similar words must have close vector embeddings in the sense of a metric. This embedding step should capture the topology of the graph, the vertex-to-vertex relationships, and any other relevant and available information about the graph, subgraphs, and vertices. Here, we list two of the properties that an embedding method should have:

- Fidelity to topology.The embedding method must describe as closely as possible the topology of the graph (connections and neighborhood).The learning performance will of course be strongly dependent on the choice of the embedding.

- Efficiency on large graphs.Graphs are generally large.For example, we must imagine social networks composed of millions/billions users or the multitude of objects communicating in a home internet network (IoT).Each elementary entity is a node and the edges indicate a knowledge shared between nodes.

The greater the quantity of information captured by the embedding method, the more efficient the learning step will be, but the greater the resources in terms of storage and calculation will be necessary. There is a natural increase in the dimensionality of the embedding method. And in this case, the quantity of information then grows exponentially with the dimension. This trend is known as the "curse of dimensionality". The methods of the state of the art generally learn representations of explicit nodes from the spectral decomposition of the adjacency matrix or the Laplacian matrix [3] constructed as the difference between the matrix of degrees (a diagonal matrix which contains information on the number of edges attached to each node) and the ajdacency matrix (indicating whether or not nodes are adjacent in the graph). In this thesis work, we want to generalize these approaches to a multi-view graph conext. By multi-views, we understand the broadening of the information base on which we build the embedding [4]. More precisely, it is a question of studying the learning of graphs based on tensor embeddings and the decompositions which are attached to it [5]. In the latter case, we will call on the theory of tensor networks [6,7], which consists of factorizing a tensor of high order into a collection of coupled tensors of order at most 3. The number of edges indicates the order/dimension.

By resorting to tensor networks, the initial massive problem is replaced by a collection of distributed problems involving a reduced amount of data. These approaches, in addition to the reduction of computational and storage complexities, make it possible to gain in interpretability of raw data. The challenge here is to develop this new tensor tool in the context of learning large graphs (large number of nodes). The applications of this work are multi-fold, such as for example machine learning for IoT, 5G telecoms and beyond, social networks or even health (see [8] dealing with the COVID-19 pandemic), …

**Contact:**

*References*

*[1] Cai, H., Zheng, V. W., & Chang, K. C. C. (2018). A comprehensive survey of graph embedding: Problems, techniques, and applications. IEEE Transactions on Knowledge and Data Engineering, 30(9), 1616-1637.*

*[2] Yan, S., Xu, D., Zhang, B., Zhang, H. J., Yang, Q., & Lin, S. (2006). Graph embedding and extensions: A general framework for dimensionality reduction. IEEE transactions on pattern analysis and machine intelligence, 29(1), 40-51.*

*[3] Belkin, M., & Niyogi, P. (2002). Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in neural information processing systems (pp. 585-591).*

*[4] Al-Sayouri, S., Gujral, E., Koutra, D. et al. (2020) t-PINE: tensor-based predictable and interpretable node embeddings. Soc. Netw. Anal. Min. 10, 46. https://doi.org/10.1007/s13278-020-00649-4*

*[5] Kolda, T. G., Bader, B. W. (2009). Tensor decompositions and applications. SIAM review, 51(3), 455-500.*

*[6] **Cichocki, A., Lee, N., Oseledets, I., Phan, A. H., Zhao, Q.,Mandic, D. P. (2016). Tensor networks for dimensionality reduction and large-scale optimization: Part 1 low-rank tensor decompositions. Foundations and Trends in Machine Learning, 9(4-5), 249-429.*

*[7] Zniyed, Y., Boyer, R., de Almeida, A. L., & Favier, G. (2020). High-order tensor estimation via trains of coupled third-order CP and Tucker decompositions. Linear Algebra and its Applications, 588, 304-337*

*[8] Kanatsoulis, C. I., Sidiropoulos, N. D. (2020). TeX-Graph: Coupled tensor-matrix knowledge-graph embedding for COVID-19 drug repurposing. arXiv preprint arXiv:2010.11367.*

(c) GdR 720 ISIS - CNRS - 2011-2020.