Identifying Critical Projects via PageRank and Truck Factor
Recently, Google’s Open Source team presented the criticality score [1] a metric to assess “influence and importance” of a project in an ecosystem from project specific signals, e.g., number of dependents, commit frequency, etc. The community showed mixed reactions towards the score doubting if it can accurately identify critical projects. We share the community’s doubts and we hypothesize, that a combination of PageRank (PR) and Truck Factor (TF) can more accurately identify critical projects than Google’s current Criticality Score (CS). To verify our hypothesis, we conduct an experiment in which we compute the PR of thousands of projects from various ecosystems, such as, Maven (Java), NPM (JavaScript), PyPI (Python), etc., we compute the TFs of the projects with the highest PR in the respective ecosystems, and we compare these to the scores provided by the Google project. Unlike Google’s CS, our approach identifies projects, such as, six and idna from PyPI, com.typesafe:config from Maven, or tap from NPM, as critical projects with high degree of transitive dependents (highest PR) and low amount of core developers (each of them possessing a TF of one).
Wed 19 MayDisplayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna change
Go directly to this room on Clowdr