The Sensitivity of Page Rank to Connection Errors
Large networks taken from the social sciences often contain "connection errors" in which the edges are mistakenly recorded. For example, an set of extra edges can be wrongly added, a set of edges can be wrongly deleted, or a set of edges may be "transposed", that is, an edge that should be recorded as going from node A to node B is transposed and falsely recorded as going from node A to node C. This Demonstration examines the sensitivity of the page rank measure of node centrality to these "connection errors" in a random graph. You choose the number of nodes and edges in the original random graph. By moving the "variant" slider you then choose an example of a random graph satisfying these criteria for the original random graph. You then choose how many random edges to add, delete, or transpose from the graph using the "operation" control in conjunction with the "error parameter" slider. That slider responds in either a linear or logarithmic fashion depending on the "transposition fraction mode" you have selected. By triggering the "perturb" control, the Demonstration generates sets of errors satisfying the constraints established by the prior controls.[more]
The Demonstration responds with (1) a grid showing the number of nodes in the original graph, the number of edges, and the number of random edges that will be deleted; (2) a histogram showing the distribution of page rank changes resulting from the deletion of the selected random edges; (3) a grid showing various statistics associated with the absolute changes in page rank induced by the deletions; (4) a grid showing various statistics associated with the changes in the ordering of the page ranks induced by the transpositions; and (5) a mapping showing how transposing the edges changes the ordering of the page ranks of the nodes. Some small text at the bottom of the display provides hash codes for both the original and the erroneous graph, which lets you verify whether your use of the controls is altering the original or the erroneous graph.
If one were to generalize the "message" of this Demonstration, it is that the ordering of page rank is very sensitive to transposition errors. If one is compiling a list, for example, of the top 10 nodes based on page rank, even a single transposition may easily result in the wrong nodes being included in such a list. This result suggests the need for extreme care in drawing conclusions about the relative "importance" of a particular node based on page rank. Even highly infrequent errors in the construction of the network can lead to significant errors in such computations.[less]
The Demonstration makes use of an ordering metric for each node, which is the difference between the order of the page rank of that node in the original graph and the order of the page rank of that node in the original graph with edges deleted. Thus, an absolute mean of 2.6 in a graph with 10 nodes means that on average a node changes 2.6 places in the page rank ordering as a result of the deletion of nodes.