The Internet Archive holds terabytes of data from thousands of CDs and floppy disks, and much of it is difficult to sort. The files can contain anything (music, text documents, ancient memes, old Flash animations), and until recently the only way to find out what was on those old discs was to boot them up and hope that some software could turn the data into something readable. A new service has made this process much easier.
DiscMaster is a new website that takes CDs and floppies from the Internet Archive and turns their contents into a searchable database. Files in old formats become viewable directly in the browser.
“For a specific group of people this will revolutionize their relationship to the archive,” said Jason Scott, an archivist at the Archive and spokesperson for DiscMaster. “This will be an endless font of information. This will be the biggest thing I’ll work on this year.”
Scott revealed that DiscMaster is the work of an Archive fan who contacted him via Discord. They had been working on DiscMaster for 18 months when they finally asked for help. Scott said he was impressed.
“The program is pulling apart every archive,” he said. “It is generating easy to use programs that can preview the material easily.”
One of the trickiest parts of looking through old files is the formats. In the early days of the Internet, there were no standardized file formats, no set way to play video, no agreed-upon audio codec, and no single way to display text. Viewing old files means identifying these ancient formats and figuring out how to display them in a modern browser.
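As a rough illustration of the identification step, one common technique is to compare a file's first few bytes against known "magic number" signatures. The sketch below is not DiscMaster's actual method, which has not been described publicly in this article; the `SIGNATURES` table and the `identify_format` function are hypothetical names, and the table covers only a handful of well-known formats.

```python
# Minimal sketch of format identification by leading "magic" bytes.
# Assumption: this is an illustration only, not how DiscMaster works.

# Hypothetical signature table: (leading bytes, human-readable format name).
SIGNATURES = [
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"FWS", "Flash animation (uncompressed SWF)"),
    (b"CWS", "Flash animation (zlib-compressed SWF)"),
    (b"RIFF", "RIFF container (e.g. WAV or AVI)"),
    (b"MThd", "MIDI music"),
]


def identify_format(data: bytes) -> str:
    """Return a format name for the given file contents, or "unknown"."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    # In practice the bytes would come from a file pulled off a disc image.
    print(identify_format(b"GIF89a" + b"\x00" * 10))
    print(identify_format(b"plain text with no signature"))
```

A real tool would need a far larger table and fallbacks for formats without a signature, such as plain text, where the file extension or the content itself has to be inspected.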
DiscMaster does it all for you and works in both modern and legacy browsers. Scott said this means that anyone on an old Commodore 64 with a browser can go to DiscMaster and view old files without any problems. And anyone using the latest version of Chrome can view the same file without much trouble.
“This thing is a beast,” he said. “It’s 11 terabytes of data right now.”
Scott likes DiscMaster for many reasons, but the main one is that it is a rebuke to the skeptics who said the material in the archive was too difficult to access and that no one would ever go through all of it.
When Scott soft-launched DiscMaster on Sunday, the website had 70 views. It crashed on Tuesday at about 40,000 views. At the time of writing, it has been viewed more than 117,000 times. We know this because of the old-school view counter on the main page.
The program slowly works its way through every CD and diskette in the Archive, expanding its database as it goes. Depending on the size and type of files on the CDs, the program may take several hours to sort the data and make it available for viewing on the Internet. There are also plans to use the program to sort through old archived AOL and FTP content.
Scott noted that private information buried on the CDs could be exposed by accident.
“There’s only so much you can precheck with 93 million files and counting,” he said, but he promised that anyone who contacts the Archive can have personal or confidential information taken down. According to him, this was one of the first features he asked to be added to DiscMaster.
DiscMaster is an incredible tool for archivists, historians, the curious, and people looking for half-forgotten media or works they believe have been lost to time. Scott said that thanks to DiscMaster, he found some old songs from the 1990s that he thought were lost in the archives.
“I encourage everyone to do an ego search. If you thought your work was lost, you may be shocked to discover what’s been saved.”