Exploring the Co-Authorship Graph for SIGMOD, PODS, VLDB and ICDE*

Last update: the analysis below uses the information available in DBLP's XML snapshot as of June 27, 2005.


Jeffrey Pound
Mario A. Nascimento
Jörg Sander
Dept. of Computing Science, University of Alberta, Canada

Analyses of co-authorship relationships have been carried out for the Mathematical Sciences (The Erdös Number Project) and for Information Retrieval (cf. the paper and its companion website). A related analysis based on co-starring in a movie, rather than on co-authorship, is known as The Oracle of Bacon.

Here, and in the companion paper (SIGMOD Record 32(3), 2003), we explore the co-authorship graph of four database conferences: SIGMOD, PODS, VLDB and ICDE. The data used in the SIGMOD Record paper was obtained from DBLP and covers all editions of these conferences up to 2002. This site, on the other hand, will be updated periodically (see the note below the title for the date of the last update).

The two tabs in the table below let you explore some of the statistics derived from the co-authorship graph. Further analyses, e.g., showing that SIGMOD's co-authorship graph has the features of a Small World, are presented in the companion paper.


The interface requires Java 1.3 or higher.

Author-based Statistics (Author Stats tab)

You can choose either one of the conferences or all four combined. The table can be searched by author name, and clicking a column header sorts (or reverse-sorts) the table by that column. Please note: if you cannot find a specific name, try unchecking the "Largest Connected Component Only" box and searching again.

Here is a short description of the columns:

For instance, consider the following scenario: A wrote a paper with B and C; B wrote another paper with D and E; D wrote a paper with F; and, finally, X wrote a paper with Y. The years in which those papers were published are not relevant, since we consider all accumulated data. Linking two authors whenever they co-authored at least one paper, this scenario yields a co-authorship graph with two disconnected components: one containing A, B, C, D, E and F, and another containing X and Y.

B would have a centrality score of 1.2, meaning that, on average, B reaches any other author in the same connected component by traversing 1.2 edges: A, C, D and E are one edge away from B, and F is two edges away (via D), giving (1 + 1 + 1 + 1 + 2)/5 = 1.2. B's component size would be 6. (Note that the number of papers authored by someone cannot be inferred from the co-authorship graph itself.) By default, the interface below would not display the information for X and Y, since they do not belong to the largest connected component. To view their information, one would need to uncheck the "Largest Connected Component Only" box.
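The example above can be reproduced with a short Python sketch. It assumes each paper is given simply as a list of author names (a hypothetical input format, not the site's actual DBLP processing) and that the centrality score is the mean shortest-path distance from an author to the other members of its connected component, as the description above suggests. Breadth-first search yields these hop distances in an unweighted graph; the function names, and the 0.0 score for an author with no co-authors, are illustrative choices only.

```python
from collections import deque
from itertools import combinations


def build_coauthor_graph(papers):
    """Adjacency map author -> set of co-authors.

    Every pair of authors on the same paper is joined by an edge."""
    graph = {}
    for authors in papers:
        for author in authors:
            graph.setdefault(author, set())
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bfs_distances(graph, source):
    """Hop distance from source to every author reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def author_stats(graph, author):
    """Return (centrality, component_size) for one author.

    Assumption: centrality = mean hop distance to the other authors in the
    same connected component; 0.0 for an author with no co-authors."""
    dist = bfs_distances(graph, author)
    component_size = len(dist)  # every author reachable, self included
    others = component_size - 1
    centrality = sum(dist.values()) / others if others else 0.0
    return centrality, component_size


def largest_component(graph):
    """Authors in the largest connected component (ties: first one found)."""
    seen, best = set(), set()
    for start in sorted(graph):
        if start not in seen:
            component = set(bfs_distances(graph, start))
            seen |= component
            if len(component) > len(best):
                best = component
    return best


papers = [["A", "B", "C"], ["B", "D", "E"], ["D", "F"], ["X", "Y"]]
graph = build_coauthor_graph(papers)
print(author_stats(graph, "B"))          # (1.2, 6)
print(author_stats(graph, "X"))          # (1.0, 2)
print(sorted(largest_component(graph)))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

The same breadth-first search gives the component size for free (it is the number of authors reached), and running it from each not-yet-visited author in turn is enough to find the largest connected component that the interface shows by default.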

Co-authors Linkage (Author Linkage tab)

As in the previous case, you can use data from any one of the four conferences or from all four combined. You can then type two names; if a path linking those two authors exists in the co-authorship graph, it will be displayed. If more than one path exists, a shortest one is displayed. Each edge on the path is shown along with the conference(s) and year(s) in which the two authors it connects co-authored a paper.

For instance, in the example above, the (shortest) path between authors A and E consists of the co-authorship relationship between A and B and the one between B and E, each shown with the conference(s) and year(s) in which the co-authorship happened.
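The linkage lookup can be sketched as a breadth-first search that records each author's predecessor and then walks back from the target; because the graph is unweighted, BFS guarantees the returned path has the fewest edges. This is a minimal Python sketch, not the site's implementation: the function names are hypothetical, and the venue and year attached to each example paper are invented purely for illustration. When several shortest paths exist, the sketch returns the one found first by visiting neighbours in alphabetical order, which may differ from what the actual interface shows.

```python
from collections import deque
from itertools import combinations


def build_labelled_graph(papers):
    """papers: iterable of (venue, year, authors).

    Returns (adjacency, labels), where labels[frozenset({a, b})] is the set of
    (venue, year) pairs in which a and b co-authored at least one paper."""
    adjacency, labels = {}, {}
    for venue, year, authors in papers:
        for author in authors:
            adjacency.setdefault(author, set())
        for a, b in combinations(sorted(set(authors)), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
            labels.setdefault(frozenset((a, b)), set()).add((venue, year))
    return adjacency, labels


def shortest_linkage(adjacency, labels, source, target):
    """Return one shortest path from source to target as a list of
    (author, co-author, [(venue, year), ...]) edges, or None if unconnected."""
    if source not in adjacency or target not in adjacency:
        return None
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        # Alphabetical order only makes tie-breaking between equally short
        # paths deterministic; any order still yields a shortest path.
        for neighbour in sorted(adjacency[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    if target not in parent:
        return None
    path = [target]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    path.reverse()
    return [(a, b, sorted(labels[frozenset((a, b))]))
            for a, b in zip(path, path[1:])]


# Venues and years are invented for illustration only.
papers = [
    ("SIGMOD", 2001, ["A", "B", "C"]),
    ("VLDB", 2002, ["B", "D", "E"]),
    ("ICDE", 2002, ["D", "F"]),
    ("PODS", 2000, ["X", "Y"]),
]
adjacency, labels = build_labelled_graph(papers)
for a, b, where in shortest_linkage(adjacency, labels, "A", "E"):
    print(a, "-", b, where)
# A - B [('SIGMOD', 2001)]
# B - E [('VLDB', 2002)]
```

Calling shortest_linkage(adjacency, labels, "A", "X") returns None, since X lies in a different connected component from A.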


This work was partially funded by the Natural Sciences and Engineering Research Council of Canada (NSERC). The enthusiastic support of Tamer Özsu and Rick Snodgrass is gratefully acknowledged. As well, without Michael Ley championing DBLP, this work would not have been possible.

