Google Search Engine
This is a demo of the Google Search Engine. Note that it is
research in progress, so expect some downtime and malfunctions.
You can find the older Backrub web page here.
Google is being developed by Larry Page and Sergey Brin, with very
talented implementation help from Scott Hassan and Alan Steremberg.
Current Status of Google:
Web Page Statistics
    Number of Web Pages Fetched     24 million
    Number of Urls Seen             76.5 million
    Number of Email Addresses       1.7 million
    Number of 404's                 1.6 million
Storage Statistics
    Total Size of Fetched Pages                 147.8 GB
    Compressed Repository                        53.5 GB
    Short Inverted Index                          4.1 GB
    Full Inverted Index                          37.2 GB
    Lexicon                                       293 MB
    Temporary Anchor Data (not in total)          6.6 GB
    Document Index Incl. Variable Width Data      9.7 GB
    Links Database                                3.9 GB
    Total Without Repository                     55.2 GB
    Total With Repository                       108.7 GB
Known Problems:
- We have only crawled US-looking domains so as not to congest
international links. This makes the search engine somewhat incomplete.
- There has been some corruption in docids for anchor hits. This
results in some random-looking matches (about 1 in 10). SB: I have
tried to patch the code to account for this, but there are still many problems.
- Also, some docinfo pointers are corrupted. SB: I have patched the
code to account for most of these, but I don't have tight bounds on the
extent of the corruption.
- Performance is somewhat poor right now, partly because data goes
over NFS and the hardware is antiquated. However, we are anticipating
equipment donations from IBM and Intel to help with performance and
increase our disk capacity so we can scale to 100 million pages.
Before emailing, please read the FAQ. Thanks.
Please send any comments to backrub@google.stanford.edu.
Copyright ©1997 Larry Page, Sergey Brin, Scott Hassan, Alan Steremberg
Backrub
Last modified: Thu Dec 4 10:09:44 PST