# Meeting

## Learning in Networks and Beyond

**Date:** 3 June 2016

**Venue:** Télécom ParisTech (Amphi B310)

**Scientific themes:**

- D - Telecommunications: compression, protection, transmission

Please note that, to guarantee access to the meeting rooms for all registered participants, **registration for meetings is free but mandatory**.

### Registration

18 members of the GdR ISIS and 31 non-members of the GdR are registered for this meeting.

Room capacity: 144 people.

### Announcement

In the field of network optimization (in the broad sense) and online services (search engines, recommender systems), it is often necessary to make decisions in an uncertain/stochastic environment whose statistical behavior is not known a priori (node mobility, traffic, or content popularity). In another field, machine learning, bandit optimization problems ("multi-armed bandit", "restless bandit"), reinforcement learning, and community detection attract keen interest, motivated by their applicability to recommender systems, online advertising, and social networks.

These two communities study similar problems and frequently use the same tools (probability, concentration inequalities, information theory, *etc.*). The goal of this day is to bring together researchers working in both fields, around both the problems and the tools.

### Invited speakers:

- Thomas Bonald (Telecom ParisTech)
- Emilie Kaufmann (CRIStAL : CNRS, Université Lille 1)
- Odalric Maillard (INRIA Saclay)
- Laurent Massoulié (INRIA-Microsoft Research Joint Center)

### Call for presentations:

Researchers working on the topics above who wish to present their work are invited to send a title and an abstract of at most one page (in PDF format) to the organizers (richard.combes@supelec.fr, mari.kobayashi@supelec.fr) before 18 May 2016. Participation by PhD students is particularly encouraged.

### Organizers:

- Richard Combes (Supélec), richard.combes@supelec.fr
- Mari Kobayashi (Supélec), mari.kobayashi@supelec.fr

### Program

9h00 - 9h30: Welcome

9h30 - 9h40: Introductory presentation (GdR ISIS)

9h40 - 10h30: Laurent Massoulié (INRIA / Microsoft Joint Center): Non-regular Ramanujan graphs and community detection

10h30 - 11h15: Odalric Maillard (INRIA): Multi-Armed Bandits with latent user variables

11h15 - 11h30: Break

11h30 - 11h50: Hafiz Tiomoko (Centrale-Supelec / L2S): Spectral Community Detection in Realistic Graphs: Beating the Bethe Hessian

11h50 - 12h10: Alexandre Marcastel (ENSEA): Interference Mitigation via Pricing in Time-Varying Multi-User Cognitive Radio Systems

12h10 - 13h30: Lunch

13h30 - 14h15: Emilie Kaufmann (CRIStAL : CNRS, Université Lille 1): Two examples of discrete PAC optimization

14h15 - 14h35: Mikael Touati (Telecom ParisTech): A Cooperative Game Theoretic Analysis of WiFi

14h35 - 14h55: Mohammed Amin Abdullah (Huawei): Robust On-line Matrix Completion on Graphs

14h55 - 15h10: Break

15h10 - 16h00: Thomas Bonald (Telecom ParisTech): Crowdsourcing: Low Complexity, Minimax Optimal Algorithms

16h00 - 16h20: Symeon Chouvardas (Huawei): A Diffusion Kernel LMS Algorithm for Nonlinear Adaptive Networks

16h20 - 16h40: Wenjie Li (Paris Sud / Centrale-Supelec / L2S): Self-Rating in a Community of Peers

16h40 - 17h00: Patricia Conde (Paris 13 / L2TI): An efficient method for mining the Maximal alpha-quasi-clique-community of a given node in Complex Networks

17h00 - 17h30: Discussion and conclusion

### Abstracts of the contributions

### Non-regular Ramanujan graphs and community detection

**Speaker**: Laurent Massoulié (INRIA / Microsoft Joint Center)

**Abstract**: This talk will review the notion of Ramanujan graphs and the Alon-Boppana result, according to which such graphs have maximal spectral separation. It will then present a spectral characterization of the non-backtracking matrices of random graphs sampled according to the so-called stochastic block model. This characterization implies that i) Erdős-Rényi graphs satisfy a property extending the notion of Ramanujan graph to the non-regular case, and ii) spectral clustering based on the non-backtracking matrix can perform non-trivial community detection down to the so-called Kesten-Stigum threshold. This last property establishes the "spectral redemption" conjecture made by Krzakala et al.

Based on joint work with Charles Bordenave and Marc Lelarge
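To make the non-backtracking operator concrete, here is a minimal pure-Python sketch (the helper names `non_backtracking_matrix` and `perron_eigenvalue` are hypothetical, and this is not the speakers' code). It builds the non-backtracking matrix B of a small graph and estimates its largest eigenvalue by power iteration. For a d-regular graph every directed edge has exactly d - 1 non-backtracking continuations, so the all-ones vector is an eigenvector of B with eigenvalue d - 1; the example checks this on the complete graph K4.

```python
from itertools import product


def non_backtracking_matrix(edges):
    """Return (directed_edges, B) for an undirected simple graph given as an edge list.

    Every undirected edge {u, v} yields the directed edges u->v and v->u.
    B[e][f] = 1 when e = (u, v), f = (v, y) and y != u: the walk may continue
    from v to any neighbour except the vertex it just came from.
    """
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: i for i, e in enumerate(directed)}
    B = [[0] * len(directed) for _ in directed]
    for (u, v), (x, y) in product(directed, repeat=2):
        if v == x and y != u:
            B[index[(u, v)]][index[(x, y)]] = 1
    return directed, B


def perron_eigenvalue(M, iters=100):
    """Power iteration with max-norm scaling.

    For a non-negative, irreducible and aperiodic matrix the scaling factor
    converges to the largest (Perron) eigenvalue.
    """
    n = len(M)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = max(y)
        if lam == 0.0:
            return 0.0
        x = [v / lam for v in y]
    return lam


# Complete graph K4 (3-regular): each directed edge has 3 - 1 = 2 continuations.
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
directed, B = non_backtracking_matrix(K4)
print(len(directed), perron_eigenvalue(B))  # prints: 12 2.0
```

Dense lists are only practical for toy graphs: B has 2m rows for m edges, so realistic experiments would rely on sparse matrices and an iterative eigensolver.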

### Multi-Armed Bandits with latent user variables

**Speaker**: Odalric Maillard (INRIA)

**Abstract**: We consider a multi-armed bandit problem where the reward distributions are indexed by two sets (one for arms, one for types) and can be partitioned into a small number of clusters according to the type. First, we consider the setting where all reward distributions are known and all types have the same underlying cluster; the type's identity is, however, unknown. Second, we study the case where types may come from different classes, which is significantly more challenging. Finally, we tackle the case where the reward distributions are completely unknown. In each setting, we introduce specific algorithms and derive non-trivial regret performance guarantees. Numerical experiments show that, in the most challenging agnostic case, the proposed algorithm achieves excellent performance in several difficult scenarios. We then discuss some extensions to the smooth clustering setting.

### Spectral Community Detection in Realistic Graphs: Beating the Bethe Hessian

**Speaker**: Hafiz Tiomoko (Centrale-Supelec / L2S)

**Abstract**: Community detection in large-dimensional networks is currently the subject of intense research, with applications to social, technological, and biological networks. Most real-world networks are (i) sparse, in the sense that the degree of each node is vanishingly small compared to the network size, and (ii) heterogeneous, in the sense that node degrees vary widely. Spectral clustering, which classifies vertices according to the eigenvectors associated with the outlying eigenvalues of some matrix representation of the network, is one of the prominent methods for community detection. A recent breakthrough spectral method is based on a new operator from statistical physics, the Bethe Hessian (BH) matrix. The BH, like most existing spectral methods, is motivated by the underlying assumption of a stochastic block model (SBM) and by Newman's definition of the graph modularity matrix (derived from the adjacency matrix). As the SBM does not allow for degree heterogeneity inside clusters, we work with the degree-corrected stochastic block model (DCSBM), which generalises the SBM to graphs with heterogeneous degree distributions. Besides studying realistic degree-corrected dense structured-graph models, our investigation is motivated by the lack of unification in the literature of spectral methods for community detection, which rely on various normalizations of the adjacency/modularity matrices. In this talk, we study a novel "α-regularization" of the modularity matrix. By leveraging tools from random matrix theory, we study the eigenstructure of this matrix and its consequences for community detection. In particular, (a) we provide a consistent estimator of the choice of α inducing the most favorable community detection in the hard regime where the clusters are not easily separable;
(b) we further prove that spectral clustering ought to be performed on a regularization of the dominant eigenvectors (rather than on the eigenvectors themselves) to compensate for biases due to degree heterogeneity. Our approach largely outperforms state-of-the-art spectral methods on synthetic graphs generated from the DCSBM with highly varying node degrees. In particular, the BH approach creates additional artificial clusters due to biases induced by degree heterogeneity, which completely alters the classification performance in those scenarios. Furthermore, although based on dense graph models, our clustering method largely outperforms the BH method on some real-world benchmarks and has competitive performance on others.
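As a minimal sketch of the setting (not the regularized method of the talk, which is not reproduced here), the code below samples a graph from a degree-corrected SBM and forms Newman's modularity matrix B = A - d dᵀ / (2m). It assumes the common parameterization P(A_ij = 1) = q_i q_j C[g_i][g_j], where q_i is a per-node weight and C a community affinity matrix; all names and parameter values are illustrative. Within a community, nodes with larger q_i get proportionally more edges, which is exactly the heterogeneity the plain SBM cannot express.

```python
import random


def sample_dcsbm(labels, q, C, seed=0):
    """Undirected graph with P(A_ij = 1) = q_i * q_j * C[g_i][g_j] and no self-loops.

    labels[i] is the community g_i of node i, q[i] its degree-correction weight.
    """
    rng = random.Random(seed)
    n = len(labels)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, q[i] * q[j] * C[labels[i]][labels[j]])
            if rng.random() < p:
                A[i][j] = A[j][i] = 1
    return A


def modularity_matrix(A):
    """Newman's modularity matrix B = A - d d^T / (2m), d being the degree vector."""
    d = [sum(row) for row in A]
    two_m = sum(d)
    if two_m == 0:
        raise ValueError("the graph has no edges")
    n = len(A)
    return [[A[i][j] - d[i] * d[j] / two_m for j in range(n)] for i in range(n)]


# Two communities (even / odd nodes), each mixing high-weight and low-weight nodes.
n = 40
labels = [i % 2 for i in range(n)]
q = [0.9 if i % 4 < 2 else 0.3 for i in range(n)]
C = [[1.0, 0.3], [0.3, 1.0]]
A = sample_dcsbm(labels, q, C, seed=3)
B = modularity_matrix(A)
```

Each row of B sums to zero by construction, a quick sanity check; spectral methods such as those discussed in the abstract then cluster nodes using the dominant eigenvectors of (normalized versions of) this matrix.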

### Interference Mitigation via Pricing in Time-Varying Multi-User Cognitive Radio Systems

**Speaker**: Alexandre Marcastel (ETIS/ENSEA)

**Abstract**: Despite the lure of a considerable increase in spectrum usage efficiency, the practical implementation of cognitive radio (CR) systems is being obstructed by the need for efficient and reliable protection mechanisms that can safeguard the quality of service (QoS) requirements of licensed users. This need becomes particularly apparent in dynamic wireless networks where channel conditions may vary unpredictably, making the task of guaranteeing the primary users' (PUs') minimum QoS requirements even more challenging. In this work, we consider a pricing mechanism that penalizes the secondary users (SUs) for the interference they inflict on the network's PUs and then compensates the PUs accordingly. Drawing on tools from online optimization, we propose an exponential learning power allocation policy that is provably capable of adapting quickly and efficiently to the system's variability, relying only on strictly causal channel state information. If the transmission horizon T is known in advance by the SUs, we prove that the proposed algorithm reaches a no-regret state, with average regret vanishing as O(T^{-1/2}); otherwise, if the horizon is not known in advance, the algorithm still reaches a no-regret state, with average regret vanishing as O(T^{-1/2} log T). Moreover, our numerical results show that the interference created by the SUs can be mitigated effectively by properly tuning the parameters of the pricing mechanism.
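The exponential learning idea can be sketched for a single SU with K parallel subchannels. This is an illustrative model only: the logarithmic rate utility, the linear interference price `price * sum_k h_k p_k`, and the function name `exp_learning_power` are assumptions, not the authors' formulation. Scores y_k accumulate the gradient of the priced utility, and powers are obtained through the exponential map p_k = P_max e^{y_k} / (1 + Σ_j e^{y_j}), which keeps the total power below P_max.

```python
import math


def exp_learning_power(channel, K, T, P_max, price, step):
    """Exponential-learning power allocation for one SU over K subchannels (sketch).

    channel(t) -> (g, h): the SU's own gains and its cross gains towards the PU in slot t.
    The power used in slot t depends only on channels observed before t (strictly causal).
    """
    y = [0.0] * K  # accumulated gradient "scores", one per subchannel
    powers = []
    for t in range(T):
        # 1) Map scores to powers: p_k = P_max * e^{y_k} / (1 + sum_j e^{y_j}),
        #    computed with a max-shift for numerical stability; sum_k p_k < P_max.
        m = max(0.0, max(y))
        w = [math.exp(v - m) for v in y]
        z = math.exp(-m) + sum(w)
        p = [P_max * wk / z for wk in w]
        powers.append(p)
        # 2) Observe this slot's channel, then ascend the gradient of the priced utility
        #    u(p) = sum_k log(1 + g_k p_k) - price * sum_k h_k p_k.
        g, h = channel(t)
        grad = [gk / (1.0 + gk * pk) - price * hk for gk, hk, pk in zip(g, h, p)]
        y = [yk + step * vk for yk, vk in zip(y, grad)]
    return powers


# Hypothetical static example: subchannel 1 interferes strongly with the PU (h = 5 vs 0.1).
powers = exp_learning_power(lambda t: ([1.0, 1.0], [0.1, 5.0]),
                            K=2, T=500, P_max=1.0, price=1.0, step=0.1)
print([round(p, 3) for p in powers[-1]])
```

A constant step of order 1/√T is the usual choice when the horizon T is known, and a decreasing step such as 1/√t is a common alternative when it is not; both are assumptions here, since the abstract does not state the schedule.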

### Two examples of discrete PAC optimization

**Speaker**: Emilie Kaufmann (CRIStAL : CNRS, Université Lille 1)

**Abstract**: We consider a generic framework in which, given an unknown function f defined on a finite set, the goal is to find, with high probability, the solution of an optimization problem related to f, using as few noisy evaluations of f as possible. The simplest example of this setting is the Best Arm Identification problem in a bandit model, in which arms (i.e., unknown probability distributions) are sampled sequentially so as to identify as quickly and accurately as possible the arm with the highest mean. I will present an optimal algorithm for this problem. Building on tools for Best Arm Identification, I will then propose algorithms for best maximin action identification in a two-player game, for which sample complexity guarantees are provided. The proposed framework can be viewed as the simplest possible model for (depth-two) Monte-Carlo Tree Search (MCTS), which has been successfully applied to design artificial intelligence for games (see the recent success of AlphaGo), but with few sample complexity guarantees.
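For readers new to Best Arm Identification, the sketch below implements classical successive elimination for rewards in [0, 1]. It is a simple baseline named plainly as such, not the optimal algorithm presented in the talk; the confidence radius comes from Hoeffding's inequality with a union bound over arms and rounds, and the Bernoulli means in the example are hypothetical.

```python
import math
import random


def successive_elimination(sample, K, delta, max_rounds=100_000):
    """Best-arm identification by successive elimination for rewards in [0, 1].

    sample(a) draws one reward from arm a. Every surviving arm is pulled once per
    round; an arm is dropped once its empirical mean falls more than two Hoeffding
    radii below the current leader. With probability >= 1 - delta the best arm survives.
    """
    active = list(range(K))
    sums = [0.0] * K
    t = 0
    while len(active) > 1 and t < max_rounds:
        t += 1
        for a in active:
            sums[a] += sample(a)
        # Union bound: per-arm, per-round failure probability delta / (2 K t^2),
        # which sums to less than delta over all arms and rounds.
        radius = math.sqrt(math.log(4 * K * t * t / delta) / (2 * t))
        leader = max(sums[a] / t for a in active)
        active = [a for a in active if sums[a] / t >= leader - 2 * radius]
    return max(active, key=lambda a: sums[a])


# Hypothetical Bernoulli arms with means 0.2, 0.5 and 0.8.
rng = random.Random(1)
means = [0.2, 0.5, 0.8]
print(successive_elimination(lambda a: float(rng.random() < means[a]), K=3, delta=0.01))
```

Successive elimination samples all surviving arms uniformly; optimal algorithms for this problem instead adapt the sampling proportions to the estimated gaps, which is what makes them faster.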

### A Cooperative Game Theoretic Analysis of WiFi

**Speaker**: Mikael Touati (Telecom ParisTech)

**Abstract**: In multi-rate IEEE 802.11 WLANs, the traditional user association based on the strongest received signal, together with the well-known performance anomaly of the MAC protocol, can lead to overloaded Access Points (APs) and poor or heterogeneous performance. Our goal is to propose an alternative, game-theoretic approach to association. We model the joint resource allocation and user association as a matching game with complementarities and peer effects, consisting of selfish players solely interested in their individual throughputs. Using recent game-theoretic results, we first show that various resource sharing protocols actually fall within the set of stability-inducing resource allocation schemes. The game makes extensive use of Nash bargaining and some of its related properties, which make it possible to control the players' incentives. We show that the proposed mechanism can greatly improve the efficiency of 802.11 with heterogeneous nodes and reduce the negative impact of peer effects such as the MAC anomaly. The mechanism can be implemented as a virtual connectivity management layer to achieve efficient AP-user associations without modifying the MAC layer.
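The MAC anomaly can be illustrated with a back-of-the-envelope model: if every station sends one equal-size frame of L bits per contention round, a round lasts Σ_j L / r_j, so each station obtains the same throughput 1 / Σ_j (1 / r_j), dragged down by the slowest station. The sketch below compares this with an equal-airtime split. It ignores headers, backoff, and collisions, the function names are hypothetical, and it is not the matching-game mechanism of the talk.

```python
def per_station_throughput_packet_fair(rates):
    """Each station sends one frame of the same size per round (overheads ignored).

    A round lasts sum_j L / r_j, so every station gets L / sum_j (L / r_j) = 1 / sum_j (1 / r_j).
    """
    share = 1.0 / sum(1.0 / r for r in rates)
    return [share for _ in rates]


def per_station_throughput_time_fair(rates):
    """Each of the n stations gets an equal 1/n share of airtime instead."""
    n = len(rates)
    return [r / n for r in rates]


rates = [1.0, 11.0]  # Mb/s: one slow and one fast station sharing an AP
pf = per_station_throughput_packet_fair(rates)
tf = per_station_throughput_time_fair(rates)
print(pf, tf)
```

With one station at 1 Mb/s and one at 11 Mb/s, both get about 0.92 Mb/s in the packet-fair case, whereas equal airtime gives 0.5 and 5.5 Mb/s.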

### Robust On-line Matrix Completion on Graphs

**Speaker**: Mohammed Amin Abdullah (Huawei)

**Abstract**: We study online robust matrix completion on graphs. At each iteration, a vector with some entries missing is revealed, and our goal is to reconstruct it by identifying the underlying low-dimensional subspace from which the vectors are drawn. We assume there is an underlying graph structure to the data, that is, the components of each vector correspond to nodes of a certain (known) graph, and their values are related accordingly. We give algorithms that exploit the graph to reconstruct the incomplete data, even in the presence of outlier noise. The theoretical properties of the algorithms are studied, and numerical experiments using both synthetic and real-world datasets verify the improved performance of the proposed technique compared to other state-of-the-art algorithms.

### Crowdsourcing: Low Complexity, Minimax Optimal Algorithms

**Speaker**: Thomas Bonald (Telecom ParisTech)

**Abstract**: We consider the problem of accurately estimating the reliability of workers based on noisy labels they provide, which is a fundamental question in crowdsourcing. We propose a novel lower bound on the minimax estimation error which applies to any estimation procedure. We then propose Triangular Estimation (TE), an algorithm for estimating the reliability of workers. TE has low complexity, may be implemented in a streaming setting when labels are provided by workers in real time, and does not rely on an iterative procedure. We further prove that TE is minimax optimal and matches our lower bound. We conclude by assessing the performance of TE and other state-of-the-art algorithms on both synthetic and real-world data sets.
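The name "triangular" suggests the identity illustrated below, under an assumed one-coin model: each task has a true label G in {-1, +1}, every worker answers every task, and worker i is correct with probability (1 + θ_i)/2, independently of the others. Then E[X_i X_j] = θ_i θ_j for i ≠ j, so for any triangle (i, j, k), θ_i² = C_ij C_ik / C_jk. The sketch picks, for each worker, the pair (j, k) with the largest |C_jk| to keep the denominator away from zero, and estimates only |θ_i| (recovering the sign needs an extra assumption, omitted here). The function names and simulated data are hypothetical; this is not the authors' implementation.

```python
import math
import random


def empirical_cov(labels):
    """labels[i][t] in {-1, +1} is worker i's answer on task t; returns C_ij = mean_t X_i X_j."""
    n, T = len(labels), len(labels[0])
    return [[sum(a * b for a, b in zip(labels[i], labels[j])) / T for j in range(n)]
            for i in range(n)]


def triangular_abs_reliability(C):
    """Estimate |theta_i| from theta_i^2 = C_ij * C_ik / C_jk (needs n >= 3 workers).

    For each i, the pair (j, k) with the largest |C_jk| is used, so that the
    denominator stays as far from zero as possible.
    """
    n = len(C)
    est = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        j, k = max(((j, k) for j in others for k in others if j < k),
                   key=lambda jk: abs(C[jk[0]][jk[1]]))
        if C[j][k] == 0.0:
            est.append(0.0)  # no usable triangle: report the worker as uninformative
            continue
        est.append(math.sqrt(max(0.0, C[i][j] * C[i][k] / C[j][k])))
    return est


# Hypothetical simulation: worker i is correct with probability (1 + theta_i) / 2.
rng = random.Random(0)
theta = [0.9, 0.8, 0.6, 0.3, 0.0]
truth = [rng.choice((-1, 1)) for _ in range(20_000)]
labels = [[g if rng.random() < (1 + th) / 2 else -g for g in truth] for th in theta]
print([round(e, 2) for e in triangular_abs_reliability(empirical_cov(labels))])
```

The estimator is a closed-form function of pairwise statistics, so it can be updated as labels stream in, which is consistent with the low-complexity, non-iterative character claimed for TE.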

### A Diffusion Kernel LMS Algorithm for Nonlinear Adaptive Networks

**Speaker**: Symeon Chouvardas (Huawei)

**Abstract**: We study distributed algorithms for nonlinear adaptive learning. In particular, a set of nodes obtain measurements, sequentially one per time step, which are related via a nonlinear function; their goal is to collectively minimize a cost function by employing a diffusion-based Kernel Least Mean Squares (KLMS) algorithm. The algorithm follows the Adapt-Then-Combine mode of cooperation. Moreover, the theoretical properties of the algorithm are studied, and it is proved that, under certain assumptions, the algorithm enjoys a no-regret bound. Finally, comparative experiments verify that the proposed scheme outperforms other variants of the LMS.
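One common way to obtain a fixed-size model for kernel LMS is to run ordinary LMS on random Fourier features approximating a Gaussian kernel. The sketch below combines that idea with the Adapt-Then-Combine (ATC) pattern mentioned in the abstract: each node first adapts with one LMS step on its own measurement, then combines by averaging its neighbours' intermediate estimates. The feature map, the uniform combination weights, the step size `mu`, and all names are assumptions for illustration; the KLMS scheme of the talk may be built differently.

```python
import math
import random


def random_fourier_features(D, dim, sigma, rng):
    """Return z(x) with z(x).z(x') ~ exp(-||x - x'||^2 / (2 sigma^2)) (Gaussian kernel)."""
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(D)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(D)]
    scale = math.sqrt(2.0 / D)

    def z(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bi)
                for w, bi in zip(W, b)]

    return z


def atc_diffusion_lms(stream, neighbors, z, D, mu):
    """Adapt-Then-Combine diffusion LMS on the features z(x).

    stream yields, at each time step, one (x, y) pair per node;
    neighbors[k] lists the nodes node k averages with (including k itself).
    """
    K = len(neighbors)
    theta = [[0.0] * D for _ in range(K)]
    for batch in stream:
        # Adapt: each node takes one LMS step on its own fresh measurement.
        psi = []
        for k, (x, y) in enumerate(batch):
            f = z(x)
            err = y - sum(a * b for a, b in zip(theta[k], f))
            psi.append([a + mu * err * b for a, b in zip(theta[k], f)])
        # Combine: uniform average of the intermediate estimates over each neighbourhood.
        theta = [[sum(psi[l][d] for l in neighbors[k]) / len(neighbors[k]) for d in range(D)]
                 for k in range(K)]
    return theta


# Hypothetical setup: 3 nodes on a line learn y = sin(x) from noiseless samples.
rng = random.Random(0)
z = random_fourier_features(D=50, dim=1, sigma=1.0, rng=random.Random(1))
neighbors = [[0, 1], [0, 1, 2], [1, 2]]


def sample_step():
    xs = [[rng.uniform(-3.0, 3.0)] for _ in range(3)]
    return [(x, math.sin(x[0])) for x in xs]


theta = atc_diffusion_lms((sample_step() for _ in range(3000)), neighbors, z, D=50, mu=0.5)


def mse(w, points):
    return sum((math.sin(p) - sum(a * b for a, b in zip(w, z([p])))) ** 2
               for p in points) / len(points)


grid = [-2.5 + 0.25 * i for i in range(21)]
print([round(mse(w, grid), 4) for w in theta])
```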

### Self-Rating in a Community of Peers

**Speaker**: Wenjie Li (Paris Sud / Centrale-Supelec / L2S)

**Abstract**: Consider a community of agents, all performing a predefined task, but with different abilities. Each agent may be interested in knowing how well she performs in comparison with her peers. This general scenario is relevant, e.g., in Wireless Sensor Networks, or in the context of crowd sensing applications, where devices with embedded sensing capabilities collaboratively collect data to characterize the surrounding environment, but the performance is very sensitive to the accuracy of the gathered measurements. We present a distributed algorithm allowing each agent to self-rate her level of expertise/performance at the task, as a consequence of pairwise interactions with her peers. The dynamics of the proportions of agents with similar beliefs in their expertise are described using continuous-time state equations. The existence and uniqueness of an equilibrium are discussed. Closed-form expressions for the various proportions of agents with similar beliefs in their expertise are provided at equilibrium. Simulation results match the theoretical results well in the context of agents equipped with sensors aiming at determining the performance of their sensors.

### An efficient method for mining the Maximal alpha-quasi-clique-community of a given node in Complex Networks

**Speaker**: Patricia Conde (Paris 13 / L2TI)

**Abstract**: A network is a complex system composed of a set of entities, usually called vertices or nodes, connected by links, also called edges. In the particular case where the entities are people, the system is called a social network. Examples of such networks include online social networks (Facebook, Twitter, Skyrock, ...), email exchange networks, and scientific collaboration networks. Detecting communities in large complex networks is important to understand their structure and to extract features useful for visualisation or for predicting various phenomena such as the diffusion of information or the dynamics of the network. Intuitively, a community is a set of strongly interconnected nodes. An alpha-quasi-clique community is a group of nodes where each member is connected to more than a proportion alpha of the other nodes. Consequently, an alpha-quasi-clique has a density greater than alpha. The size of an alpha-quasi-clique is limited by the degree of its nodes. In complex networks whose degree distribution follows a power law, alpha-quasi-cliques are usually small sets of nodes for high values of alpha. Therefore, we are interested in alpha-quasi-cliques of maximal size. Mining all the maximal alpha-quasi-cliques of a network is NP-complete. Efficient exact methods or approximations to solve it are available. However, all these methods generally assume that the network is entirely known. In some situations, the network can be so large that we only have local information about some nodes, or we may be interested in the community of a particular node in the network. Detecting the communities of specific nodes may be very important for applications dealing with huge networks, where iterating through all nodes would be impractical. We present an efficient method, called RANK-NUM-NEIGHS (RNN), for finding the maximal alpha-quasi-clique community of a given node in the whole network.
The resulting communities thus have two main characteristics: they are alpha-quasi-cliques (very dense for high alpha) and they are local to the initial node (this problem is a particular case of the local community detection problem). The proposed method is evaluated experimentally on real and computer-generated networks in terms of quality (community size), execution time, and stability. We also provide an upper bound on the optimal solution.
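To pin down the definition, the sketch below checks whether a node set is an alpha-quasi-clique (every member is linked to more than a fraction alpha of the other members) and grows a community around a seed node with a naive greedy rule. The greedy expansion and all function names are hypothetical illustrations, not the RANK-NUM-NEIGHS method; the example only shows what "local" means, since the expansion inspects nothing but the neighbours of the current set. The graph is assumed to be given as a dict mapping each node to the set of its neighbours.

```python
def is_alpha_quasi_clique(adj, S, alpha):
    """True if every node of S is linked to more than alpha * (|S| - 1) other nodes of S."""
    S = set(S)
    if len(S) < 2:
        return True
    return all(len(adj[v] & S) > alpha * (len(S) - 1) for v in S)


def greedy_local_quasi_clique(adj, seed, alpha):
    """Grow an alpha-quasi-clique around `seed`, looking only at neighbours of the current set."""
    community = {seed}
    while True:
        frontier = set().union(*(adj[v] for v in community)) - community
        # Try candidates with the most links into the community first (ties: smallest id).
        for u in sorted(frontier, key=lambda u: (-len(adj[u] & community), u)):
            if is_alpha_quasi_clique(adj, community | {u}, alpha):
                community.add(u)
                break
        else:
            return community  # no neighbour can be added without breaking the property


# Hypothetical graph: a 4-clique {0, 1, 2, 3} with a tail 3 - 4 - 5.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(greedy_local_quasi_clique(adj, seed=0, alpha=0.6))  # prints: {0, 1, 2, 3}
```

Such a greedy rule gives no guarantee of reaching the maximal alpha-quasi-clique containing the seed; it is only meant to make the definition and the local search space tangible.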