GitHub preserves open source code in the Arctic
GitHub is the world’s largest software repository, with 37 million users and more than 100 million repositories. As a leader in the technology industry, it wanted a trustworthy solution for its perpetual storage needs.
GitHub has a clearer view than most of the speed at which technology evolves. Software and hardware can be made obsolete by a newer version in a matter of months, and this could jeopardize future access to valuable source code. Source code creates the foundation for the future development of computer science, which, in many ways, is the foundation of the digital world as we know it. With so much of the world now digital and so much of our heritage increasingly digital, software and source code are a core part of the story.
Open source is particularly important, as it is the basis on which most software has been built. GitHub is a major advocate of open source and places a high value on open source repositories.
Information that is born digital can be difficult to keep alive. Modern data storage options are designed for the short term, and data can become inaccessible after just a few years. GitHub realised that its existing archival processes were not sufficient to protect its valuable code assets. To learn about both the technical and the contextual processes needed to safeguard digital information for decades and even centuries, and to guarantee future access to that information, GitHub engaged a panel of experts.
These experts, known as the best of the best in their respective areas of digital archiving, include the Long Now Foundation, the Internet Archive, Software Heritage Preservation, Stanford Library and Microsoft Research. When it is a matter of securing the world’s code heritage, nothing is left to chance, and only state-of-the-art solutions were considered to address the challenge of keeping source code secure and accessible for hundreds of years.
With a key focus on perpetuity, GitHub engaged Piql for its unique and unmatched technology, which is designed to withstand technological obsolescence over a time span of more than 1,000 years. GitHub also wanted a secure, sustainable storage facility for storing information in a secondary location outside the United States.
Piql’s unique approach to archiving data, built on principles of open source and future access, offered many benefits to the technology giant. With authenticity measures, no need for data migration, and vendor independence, piqlFilm can do what no other technology can: offer perpetual storage while being completely self-contained. This ensures that the data can be read back both by machines and by the human eye, guaranteeing future access to the original data regardless of how much time passes.
In addition, storage in the Arctic World Archive (AWA), a safe, resilient and remote repository of digital world memory, aligned perfectly with GitHub’s objectives. Data stored here can last for over 1,000 years, with readback access ensured regardless of future technology.
In the initial deposit, GitHub stored 6,000 of its most significant repositories in AWA for perpetuity, capturing the evolution of technology and software. This collection includes the source code for the Linux and Android operating systems; the programming languages Python, Ruby, and Rust; web platforms Node, V8, React, and Angular; cryptocurrencies Bitcoin and Ethereum; AI tools TensorFlow and FastAI; and many more.
In its second AWA deposit, GitHub stored a snapshot of every active public repository, featuring millions of individual contributions. These two deposits collectively provide an overview of the state of open source software development and use in the world today.
As today’s vital code becomes yesterday’s historical curiosity, it may be abandoned, forgotten, or lost. Worse, albeit much less likely, in the case of global catastrophe, we could lose everything stored on modern media in a few generations. Archiving software across multiple organizations and forms of storage helps to ensure its long-term preservation.
Piql and GitHub are continuing their collaboration through the GitHub Archive Program, co-designing new elements of perpetual storage as part of the Arctic Code Vault project.