Summary
The Internet Archive holds a vast collection of free content, and its Wayback Machine has preserved web pages and documents since about 1996. Together they make it possible to find and retrieve web pages and documents that have vanished from the everyday web and are not indexed by the search engines. An example is provided.
Vanishing content
The “World Wide Web” revolution occurred in 1994: the new HyperText Markup Language (HTML) and the new universal browser “Mosaic” made browsing the Internet a breeze, replacing text-based methods such as anonymous FTP or Gopher. The Internet had finally reached the general public! It was now much easier to share information, including text and images on the same page, even though at that time no commercial activities were allowed. These times also saw the rise of search engines, simply to find things! Since then, many web sites have appeared and disappeared for various reasons, or simply changed their name.
So… how do you find vanished pages?
Thanks to the Wayback Machine, a Time Machine archive, it is possible to find many of these vanished documents again, IF they were preserved.
Wayback Machine & Internet Archive
The Wayback Machine is embedded within the Internet Archive at archive.org but can also be accessed directly at web.archive.org.
- The Internet Archive is a non-profit library of millions of free books, movies, software, music, websites, and more.
- The Wayback Machine is a digital archive of the World Wide Web going back to about 1996.
A simple example
Today a repository web site for academic journal articles (that I won’t name) sent me a reminder that I was the co-author of a 1996 paper titled “Resources for HIV/AIDS on the internet”, which can still be found (see links in References below). The article contained many tables of web sites, many of which have probably vanished. At the very end is a note that all these tables are available through a link on a computer that has been out of service for a long time (*):
All the WWW addresses described in this article can be accessed directly at the following Internet address:
http://www.bocklabs.wisc.edu/~sgro/mol-med/home.html
While a server of that name still exists, it is not the same one and the web site content has changed. But thanks to the Wayback Machine, this page still exists. One only needs to go to archive.org and enter the web address in the Wayback Machine text field. IF the page has been preserved, it will be found and its archived copies listed within a set of calendars, so in some cases it is even possible to follow changes over time. See how the page evolved.
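The same lookup can also be written as a plain web address instead of being typed into the text field. The short Python sketch below builds the two address forms that web.archive.org is generally understood to accept: one with `*` in place of a date, which opens the calendar view, and one with a date stamp, which opens the archived copy nearest to that date. The function names `calendar_url` and `snapshot_url` are made up for this illustration, and the date stamp format (YYYYMMDD or YYYYMMDDhhmmss) is stated here as an assumption rather than taken from the article.

```python
# Base address of the Wayback Machine (see "Wayback Machine & Internet Archive").
WAYBACK = "https://web.archive.org/web"


def calendar_url(page_url: str) -> str:
    """Address of the calendar view listing the archived copies of page_url."""
    return f"{WAYBACK}/*/{page_url}"


def snapshot_url(page_url: str, timestamp: str) -> str:
    """Address of the archived copy nearest to timestamp.

    timestamp is assumed to be YYYYMMDD or YYYYMMDDhhmmss.
    """
    return f"{WAYBACK}/{timestamp}/{page_url}"


# The vanished page from the 1996 article.
page = "http://www.bocklabs.wisc.edu/~sgro/mol-med/home.html"

print(calendar_url(page))
print(snapshot_url(page, "20030216"))
```

Pasting either printed address into a browser should lead to the same pages as the manual steps, provided the page was preserved.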
- Step 1: go to the Wayback Machine and enter the web address.
- Step 2: check the blue dots on the calendars. These are the dates when the page was archived.
Thus it can be seen that the page was first archived in 1998.
- Step 3: click a blue dot and select the time when the page was archived. This will open the archived copy.
- Optional step 4: click on the arrows in the time scale that opens at the top of the page to move through the timeline.
The page was still intact on February 16, 2003. But the subsequent archives, starting August 14, 2003, show the infamous “Not Found.” After 2006 the page is no longer archived and the web site disappeared temporarily.
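Checking whether a page was preserved can also be scripted. The Internet Archive offers a small "availability" service at archive.org/wayback/available that answers in JSON with the archived copy closest to a requested date. The sketch below is only an illustration under stated assumptions: the helper names are invented, the sample reply is made up to mirror the documented reply shape (it is not real archive data), and the `lookup` function, which actually contacts the service, is defined but not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Availability service of the Internet Archive.
API = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp=None):
    """Build the query address; timestamp (e.g. '20030216') is optional."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{API}?{urlencode(params)}"


def closest_snapshot(response_text):
    """Return (archive_url, timestamp) from a JSON reply, or None if absent."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def lookup(page_url, timestamp=None):
    """Ask the live service (needs network access; not called below)."""
    with urlopen(availability_query(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))


# Made-up reply used only to show the parsing step; not real archive data.
sample = (
    '{"archived_snapshots": {"closest": {"available": true, '
    '"url": "http://web.archive.org/web/20030216000000/http://example.com/", '
    '"timestamp": "20030216000000", "status": "200"}}}'
)

print(availability_query("http://example.com/page", "20030216"))
print(closest_snapshot(sample))
print(closest_snapshot('{"archived_snapshots": {}}'))
```

An empty `archived_snapshots` entry is how the service reports that nothing was preserved, which is why `closest_snapshot` returns `None` in that case. Note that a copy reported as available may still be an archived “Not Found” page, as with the later captures described above, so the returned address is worth opening and checking.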
Useful & Valuable
The Wayback Machine is extremely useful and of enormous value. Whenever a web address is not available, I immediately check the archive. Many times I have been able to find a page or even an attached PDF, a small movie, etc. This is particularly useful when the page contains an example of code, such as R or Python, that is no longer on the “normal” web site.
NOTE: The contents of the Internet Archive and the Wayback Machine are not indexed by the search engines, so you will never find these missing pages or documents by “googling” them!
(*) I had created that link on a Netscape web server installed on a teal-colored Silicon Graphics Indigo2 in 1994.