With a utopian vision, he founded the Internet Archive. That hasn’t changed, but the internet has

Brewster Kahle dances in his library. He smiles as he sways in place while an antique Victrola fills the foyer of the building, a former church, with the raspy jazz tunes of yesteryear.

He lifts the needle and the music stops, but only for a moment. Soon, his staff will turn the aging record into a string of ones and zeros that will last forever in cyberspace. This is the Internet Archive, and this is why it and Kahle are here: to make every bit of digital or physical information that exists available online for free.

Walking with Kahle through his pillared temple of knowledge in the Richmond District of San Francisco, you understand the magnitude of what he and his staff, now numbering more than 100, have been hard at work on for nearly 25 years. In a cargo hold, piles of donated books await their turn on a specialized scanning machine where a technician, shrouded in black curtains, painstakingly copies endless pages.

Downstairs, microfiche reels are being converted into computer images that will join the staggering amount of data the archive has accumulated over the years.

The servers contain more than 70 petabytes of unique data (70 million gigabytes), including 65 million texts, movies, audio files, images, books and more.

Kahle’s quest to build what he calls “A Library of Alexandria for the Internet” began in the 1990s, when he started sending out programs called crawlers to take digital snapshots of every page on the web. Hundreds of billions of those snapshots are now available to everyone through the archive’s Wayback Machine.

That vision of free and open access to information is closely intertwined with the early ideals of Silicon Valley and the origins of the Internet itself.

“The reason for the Internet and specifically the World Wide Web was to make it so that everyone is a publisher and everyone can have a voice,” Kahle said. To him, the need for a new type of library for that new publishing system, the Internet, was clear.

But while Kahle’s goals haven’t changed, the Internet has. That early utopian view of the positive powers of digital interconnectedness is increasingly at odds with the amount of copyrighted and paywalled material online, which grows every day.

Left: A 1947 Albany (NY) Times newspaper in the Internet Archive offices. Right: Book scanner Eliza Zhang opens a box of Albany Times newspapers.

Photos by Constanza Hevia H./Special to The Chronicle

When the archive started collecting, most people’s online access was limited to a few important home pages like Yahoo.com, said Margaret O’Mara, a professor at the University of Washington and Silicon Valley historian.

“Now not only is there so much more information, but a lot of that information is proprietary,” O’Mara said. “There are questions about how the Internet works and how the Internet economy works that cannot be answered by capturing web pages or capturing documents or digitizing a magazine.”

Despite this, she said the archive is invaluable to researchers like herself and reflects the idealism underlying Silicon Valley’s dream of a more open, connected and accessible world.

“They preserve the past in a way that’s rare in an industry and a community that’s always so focused on the future and focused on what’s next,” O’Mara said.

That changing online landscape keeps Kahle busy as he makes his way to the beating heart of the archive, its cavernous main space. The room is quiet. Suffused with golden light filtering in through the windows, the former nave still feels somehow sacred. Few people are in the building due to the pandemic, but this room is never truly empty: the pews are populated with miniature statues of workers and volunteers from the past and present, including a bespectacled likeness of Kahle himself.

Here, the server banks hum and flash with every upload and download as Kahle discusses how libraries, even in cyberspace, can burn.

Across the auditorium, next to the main stage where hymn numbers were once posted, three numbers in metal have been chosen for display: 200, 404 and 451. The first two are common HTTP status codes, returned when a page loads successfully and when it cannot be found. The third appears when content has been removed for legal reasons, such as copyright infringement.

It is no coincidence that the third code is also a reference to Ray Bradbury’s anti-censorship novel “Fahrenheit 451.”

Book scanner Eliza Zhang, one of more than 100 employees, works at the Internet Archive offices in the Richmond District.

Photos by Constanza Hevia H./Special to The Chronicle

In the past, Kahle said, if a library and its books burned, copies likely lived on in another physical space. “That’s not the case on the web,” he said. For example: “If a newspaper goes offline in Turkey, all their archives go away. And you can’t run a culture like that.”

The archive has been buying, digitizing and lending books through its site for years, with a waiting list like those at other libraries. But when the coronavirus pandemic struck last year and libraries and schools closed, the archive created what it called the National Emergency Library, a collection of 1.4 million online books available to users without a wait.

A lawsuit filed by four of the nation’s largest publishers soon followed, one of the many challenges the archive faces in its quest for free and open access to information in cyberspace.

Kahle argues that copyright laws don’t prohibit libraries like his from owning, digitizing, and lending books with certain controls.

Perhaps an even bigger barrier in Kahle’s mind is smartphones and the proprietary and protected apps that fill them.

“These things are full of apps that aren’t open,” he said, holding up his phone during a recent Zoom call. That also means that many of them are immune to crawlers and cannot be saved for posterity. That is a vexing problem for the archive’s mission, along with paywalls, which can and do block Kahle’s crawlers.

Brewster Kahle, who founded the Internet Archive 25 years ago, reviews the organization’s servers in San Francisco, which hold more than 70 million gigabytes of data, including 65 million texts, movies, audio files, images, books, and more.

Constanza Hevia H./Special to The Chronicle

The Internet’s original format of hypertext links, which is still in use, allows people to “weave knowledge together,” he said. But “the app world is inherently embedded in business products. That’s not how we’re going to build a culture that’s interoperable, builds on each other, and can develop new ideas.”

Kahle’s career in technology dates back to the early 1980s, when he graduated from the Massachusetts Institute of Technology, where he studied artificial intelligence. He helped found a supercomputing company called Thinking Machines before creating the Internet’s first publishing system, called Wide Area Information Server, which was eventually sold to America Online.

In the past, Kahle also found ways to monetize software without sacrificing the archive’s ideals. When he sold Alexa Internet, a web research and information company he co-founded in the 1990s, to Amazon, he struck a deal with then-CEO Jeff Bezos. He would sell only if Bezos allowed Alexa to continue donating a copy of what it collected from the Internet to the archive every day. Bezos agreed.

The Internet Archive today is funded by many small donations averaging around $20 each, according to Katie Barrett, the archive’s senior development manager. The archive also earns money from scanning books for libraries and receives funding from the Kahle/Austin Foundation, which Kahle founded with his wife, Mary Austin.

Tax forms from 2019 show the archive’s revenue for the year exceeded $36 million, including nearly $30 million in contributions and grants.

In its pursuit of a more open and accessible world, the nonprofit is working with Wikipedia, restoring links and updating pages that link back to sites that would have been lost if the Wayback Machine hadn’t saved them in the first place. In partnership with the archive, Wikipedia has added links to more than 25 million archived web pages, mostly from the Wayback Machine, to 150 language editions of Wikipedia.

“We share a vision of the Internet where nonprofit services can increase humanity’s access to knowledge,” Gwadamirai Majange, a spokesperson for the Wikimedia Foundation, which owns Wikipedia, said in an email.

The Internet Archive Building in the Richmond neighborhood.

Constanza Hevia H./Special to The Chronicle

The archive also collaborates with groups such as the Digital Public Library of America, contributing mainly digitized printed material to its site.

Groups like the Long Now Foundation are also trying to promote that kind of long-term thinking through their 10,000-year clock and a project to create a digital library of human language for generations to come, in part as a counterweight to the short-term, profit-driven models of modern technology companies.

Kahle has expanded his nonprofit efforts beyond the digital world.

One was an ill-fated attempt to start a credit union with $1 million from the archive. In a more successful bid, he founded another nonprofit and bought a nearby San Francisco apartment building, where some of his employees live at below-market rates.

For his part, Kahle said he recognizes the increasing challenges to the mission, but that hasn’t stopped him. “I wake up on different sides of the bed and say, you know, this is going to work, and we’ll make sure it goes,” he said. “And other times, there’s so much lined up against us.”

Despite that, Kahle’s servers continue to blink blue with life in that great quiet room. And as long as millions of people continue to have access to the archive’s seemingly endless collection, the Library of Alexandria for the Internet will live on long after its founder, as he puts it, “goes to the great archive in the sky.”
