The Software Heritage project aims to “collect, organise, preserve and make easily accessible the source code of all software that is publicly available.” Calling itself the “Software Wikipedia,” the “Internet Archive for source code,” and even the “Library of Alexandria of software,” it believes “a single and universal archive making software source code readily available will facilitate access to the knowledge contained therein, support programming education, and create a reference catalogue with all knowledge about this software.”
The project says that, at launch, it had already “ingested in the Software Heritage archive a significant amount of source code, possibly assembling the largest source code archive in the world.” The archive currently holds 2 billion source files, 600 million commits, and 22.7 million projects.
Major holdings include public, non-fork repositories from GitHub, source packages from the Debian distribution (as of August 2015, via the snapshot service), and tarball releases from the GNU project (as of August 2015). As that list shows, the emphasis is on free software, although the project’s website repeatedly refers to “publicly available” code as its target. Ars has asked for clarification on what this means, but has not yet received any response.
The project was initiated by the French research institute INRIA, which also provides most of the people working on it. Roberto Di Cosmo is the founder and CEO of the Software Heritage project. A blog post on the launch explains that the team has been working on the site for “over a year.”
Although INRIA is leading the project, the aim is to “encourage the emergence of an open network of peers and mirrors that will share with us the responsibility of maintaining available several copies of all the software we collect.”
