Part III in this series (see part I and part II) starts from the same list of persons (mathematicians, physicists, artists, architects, composers, …) and crawls their respective Wikipedia entries. This time, however, all the wiki-internal links are extracted from the pages. From these lists of links, a graph is created: the individual pages are the nodes and the hyperlinks are the edges.
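The link extraction and graph construction could be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not the code actually used for this post: the names `WikiLinkParser`, `extract_links` and `build_graph` are made up for the example, and the choices to skip namespace pages (anything with a colon, such as `File:`) and to drop `#section` fragments are assumptions.

```python
from html.parser import HTMLParser


class WikiLinkParser(HTMLParser):
    """Collect the targets of wiki-internal links (href="/wiki/...")."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if not href.startswith("/wiki/"):
            return  # external link or in-page anchor
        # Assumption: drop "#section" fragments so each page is one node.
        title = href[len("/wiki/"):].split("#", 1)[0]
        # Assumption: skip namespace pages such as File:, Help:, Category:.
        if title and ":" not in title:
            self.links.add(title)


def extract_links(html):
    """Return the set of wiki page titles linked from one HTML page."""
    parser = WikiLinkParser()
    parser.feed(html)
    return parser.links


def build_graph(pages):
    """Map each page title to the set of titles it links to.

    `pages` is a dict {title: html}; the result is a directed
    adjacency mapping where every hyperlink becomes an edge.
    """
    return {title: extract_links(html) for title, html in pages.items()}
```

Calling `build_graph` with a dictionary of already downloaded pages gives a plain adjacency mapping, which is enough for the metrics discussed further down.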
The figure above shows a subgraph containing only the nodes of the 261 persons in the list. The dot size represents the page rank [1] of that node in the graph.
The full graph – containing all the wiki pages any one of the 261 original pages links to – consists of over 40’000 nodes and over 70’000 edges and is therefore somewhat hard to visualise. The figure below shows a subgraph that keeps only the nodes whose degree (number of edges) is greater than four or that belong to the list of names. It contains over 3’000 nodes. You can see the d3 visualisation here: www.mathiasbernhard.ch/wikigraph/ (it’s very slow!)
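The filtering rule can be written down in a few lines. The sketch below assumes the graph is stored as a dictionary mapping each page to the set of pages it links to, and that "degree" counts incoming and outgoing links together; `filter_subgraph` and its parameters are illustrative names, not taken from the original code.

```python
from collections import Counter


def filter_subgraph(graph, names, min_degree=4):
    """Keep nodes whose degree exceeds `min_degree` or that are in `names`.

    `graph` maps each page title to the set of titles it links to.
    Assumption: degree = incoming + outgoing links of a node.
    """
    degree = Counter()
    for source, targets in graph.items():
        for target in targets:
            degree[source] += 1
            degree[target] += 1

    keep = {node for node, d in degree.items() if d > min_degree}
    keep |= set(names)  # the listed persons are always kept

    # Keep only the edges whose two endpoints both survive the filter.
    return {
        node: {t for t in graph.get(node, ()) if t in keep}
        for node in keep
    }
```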
Besides visualisation, many interesting metrics can be computed on this graph. For example: what is the shortest path from Steve Jobs to Alan Turing?
Alan Turing is five clicks away from Steve Jobs. There are six other paths of the same length. They all start via Edison and Röntgen and then continue via Bohr/Hilbert, Heisenberg/Hilbert, Dirac/Hilbert, Einstein/Gödel, Einstein/Russell or Gabor/Shannon. In the opposite direction, it is only one click: Turing’s page links directly to Steve Jobs (in the discussion about whether the bite in Apple’s logo refers to the poisoning of Alan Turing).
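Because every link counts as one click, shortest paths in this graph can be found with a breadth-first search. The sketch below is one possible way to enumerate all shortest paths between two pages; it assumes the same dictionary-of-sets representation of the directed graph, and `all_shortest_paths` is an illustrative name. Since the edges are directed, the result from A to B can differ from the one from B to A, which is exactly the Jobs/Turing asymmetry described above.

```python
from collections import deque


def all_shortest_paths(graph, source, target):
    """Return every shortest directed path from `source` to `target`.

    `graph` maps each node to the set of nodes it links to.
    Returns a list of paths (each a list of nodes), or [] if unreachable.
    """
    dist = {source: 0}
    parents = {source: []}  # predecessors on some shortest path
    queue = deque([source])

    while queue:
        node = queue.popleft()
        if node == target:
            break  # all predecessors of target have been seen by now
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                parents[nxt].append(node)  # another equally short route

    if target not in dist:
        return []

    def build(node):
        # Walk back from `node` to `source` along recorded predecessors.
        if node == source:
            return [[source]]
        return [path + [node] for p in parents[node] for path in build(p)]

    return build(target)
```

The number of clicks is the length of a returned path minus one.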
Another interesting fact is that when the nodes are ordered by their respective page rank, there are four national libraries within the top twenty.
1: Authority_control: 0.00169658320645
2: Library_of_Congress_Control_Number: 0.00168659080125
3: Biblioth%C3%A8que_nationale_de_France: 0.00149557357925
4: Union_List_of_Artist_Names: 0.00118089437863
5: Syst%C3%A8me_universitaire_de_documentation: 0.000994692220822
6: National_Diet_Library: 0.000850136620221
7: Netherlands_Institute_for_Art_History#Online_artist_pages: 0.000775126801984
8: Digital_object_identifier: 0.00066214294946
9: LIBRIS: 0.000638040562868
10: National_Library_of_the_Czech_Republic: 0.000637854056841
11: University_of_St_Andrews: 0.000634510432404
12: MacTutor_History_of_Mathematics_archive: 0.000602870466377
13: Edmund_F._Robertson: 0.000599879087721
14: John_J._O%27Connor_(mathematician): 0.000599879087721
15: Alma_mater: 0.000586810718044
16: Architect: 0.000579653396514
17: National_Library_of_Australia: 0.000555743367234
18: Paris: 0.000548107467855
19: Internet_Archive: 0.000546230717735
20: Mathematics_Genealogy_Project: 0.000534084950152
21: France: 0.000530455354901
22: Italy: 0.000515697915013
23: IEEE: 0.000513181569442
24: Germany: 0.0005042811901
25: Mathematics: 0.000503069062753
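For readers who want to see how such a ranking comes about, here is a minimal power-iteration sketch of PageRank. It is not the code that produced the numbers above: the damping factor of 0.85, the iteration limit, the tolerance and the uniform redistribution of rank from pages without outgoing links are all assumptions, and `pagerank` is an illustrative name. The scores sum to one, which is why the individual values in a graph of this size are so small.

```python
def pagerank(graph, damping=0.85, iterations=100, tol=1e-12):
    """Approximate PageRank by power iteration.

    `graph` maps each node to the set of nodes it links to.
    Assumption: nodes without outgoing links spread their rank evenly.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    if n == 0:
        return {}

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes is shared among all nodes.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}

        # Every page passes an equal share of its rank to each link target.
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share

        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank
```

Pages like Authority_control score highly in such a scheme because almost every biography links to them.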
Ordered by betweenness centrality, however, the ranking is the following:
1: Kurt_G%C3%B6del: 0.00946450875602
2: Isaac_Newton: 0.00775142391286
3: Albert_Einstein: 0.00588718058716
4: Johann_Wolfgang_von_Goethe: 0.00525383757489
5: Bertrand_Russell: 0.00492127996847
6: Gottfried_Wilhelm_Leibniz: 0.00489376125238
7: Carl_Friedrich_Gauss: 0.00378070359586
8: Plato: 0.00367214894243
9: Norbert_Wiener: 0.00328975513751
10: Charles_Sanders_Peirce: 0.00290511094746
11: Immanuel_Kant: 0.00289419373699
12: Leonardo_da_Vinci: 0.00283207785036
13: Le_Corbusier: 0.00273813162512
14: Richard_Wagner: 0.00260243587402
15: Robert_Hooke: 0.00250765486565
16: Ren%C3%A9_Descartes: 0.00224103254035
17: Salvador_Dal%C3%AD: 0.00222329736393
18: Friedrich_Nietzsche: 0.00221557309017
19: Vitruvius: 0.002140042138
20: Wilhelm_R%C3%B6ntgen: 0.00211640224521
21: Aristotle: 0.00211537504511
22: Arthur_Schopenhauer: 0.0020558068558
23: John_von_Neumann: 0.00198851339016
24: Sigmund_Freud: 0.00196861922736
25: Johann_Sebastian_Bach: 0.00194183582603
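Betweenness centrality measures how often a node lies on the shortest paths between other nodes, which explains why well-connected persons replace the library catalogues at the top. A compact way to compute it on an unweighted graph is Brandes' algorithm, sketched below. This is an illustration rather than the code behind the list above: the function name is made up, and the normalisation by (n-1)(n-2) for a directed graph is an assumption.

```python
from collections import deque


def betweenness_centrality(graph, normalized=True):
    """Betweenness centrality of a directed, unweighted graph (Brandes).

    `graph` maps each node to the set of nodes it links to.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    score = dict.fromkeys(nodes, 0.0)

    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0.0)
        sigma[s] = 1.0
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(nodes, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]

    n = len(nodes)
    if normalized and n > 2:
        # Assumption: divide by the number of ordered pairs excluding the node.
        scale = 1.0 / ((n - 1) * (n - 2))
        score = {v: b * scale for v, b in score.items()}
    return score
```

In pure Python this is slow for a graph with tens of thousands of nodes, so in practice a graph library would be the more sensible choice.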