Dissertations, Theses, and Capstone Projects

Date of Degree

9-2022

Document Type

Dissertation

Degree Name

Ph.D.

Program

Computer Science

Advisor

Amotz Bar-Noy

Committee Members

Theodore Brown

Saad Mneimneh

Dror Rawitz

Subject Categories

Theory and Algorithms

Keywords

Friendship Paradox, Random Neighbor Sampling

Abstract

The friendship paradox (FP) is the famous sampling-bias phenomenon that leads to the seemingly paradoxical truth that, on average, people’s friends have more friends than they do. Among the many far-reaching research findings the FP inspired is a sampling method that samples neighbors of vertices in a graph in order to acquire random vertices that are of higher expected degree than average.

Our research examines the friendship paradox on a local level. We seek to quantify the impact of the FP on an individual vertex by defining the vertex’s “friendship index”, a measure of the extent to which the phenomenon affects the vertex, either positively or negatively. We extend this measure to create aggregate measures that are indicative of FP-characteristics of the graph as a whole. We then examine these measures experimentally on theoretical canonical graphs, synthetic graphs, and the graphs of real-world networks as a means of demonstrating their usefulness for revealing information about a graph. These analyses place a particular focus on the similarities and differences our metrics have with the well-known degree-homophily measure, assortativity.

Having defined this metric and quantified information about the FP’s impact on graphs and vertices, we turn to one of the famous results of the paradox, the ability to sample neighbors of vertices instead of the vertices themselves in order to find high-degree vertices in a graph. We focus on an overlooked detail of this sampling method, the additional computational cost it incurs for each sampled vertex. We analyze this cost from a few perspectives, breaking it down into multiple costs that might apply in varied situations, and then create a strong model that enables a fair comparison between sampling methods to better quantify the value of this method versus naïve random sampling. As we define costs, certain tweaks that can be applied to the method become apparent. Some of these allow us to amortize the cost of a more computationally expensive step over more vertices and achieve superior results for the investment. This leads to a number of new versions of the sampling method which we present and analyze.

We then perform an extensive study on a particularly novel tweak, ‘inclusive random sampling’. Whereas the original method of sampling neighbors blindly exchanges a vertex for its neighbor, inclusive sampling learns the degree of both the vertex and the neighbor and chooses the one of higher degree. We explore this idea by applying it to two FP-inspired sampling methods, the previously mentioned random neighbor sampling, and a lesser-known method that samples edges instead of vertices. We prove interesting theoretical bounds on these methods and show their applications to different graph types. We also show the strengths and weaknesses of the different inclusive and exclusive methods through experimentation on a variety of synthetic and real-world graphs. We explore the connection between these methods and assortativity which further elucidates the characteristics that make one method superior to another for a given graph.

Share

COinS