Search Engines and Google
Tom Arah learns the hard way just how important search engines in general, and Google in particular, are to web publishers.
When Tim Berners-Lee was working on the ideas that were to become the World Wide Web, he called an early prototype hypertext program “Enquire”, short for “Enquire Within upon Everything”, a Victorian compendium of household advice that had caught his imagination as a child. His original vision of the Web as a universal and freely accessible research tool was far-reaching, but the reality quickly outgrew even this to become all-encompassing. The amount and range of information out there is almost unimaginable, which causes its own problem: how does the end user find the information they are looking for? And from the publisher’s perspective: how do you make sure that you’re reaching as wide an audience as possible?
The underlying architecture that makes the Web possible is hypertext. Because pages provide relevant links to other pages, both internally within the site and externally, readers can follow the link that interests them most to find the information they seek. But imagine trying to drill down to information in an encyclopaedia by opening a page at random and then following the “see also” references. Clearly users need a helping hand. The first major navigational guide to the Web was developed by two Stanford University students, David Filo and Jerry Yang, back in 1994. Yahoo began life as little more than a list of personal bookmarks but soon developed into an extensive categorized directory.
Before search engines, directories like Yahoo ruled the Web (1996 screenshot courtesy of waybackmachine.org).
Directories like Yahoo and the Open Directory Project at dmoz.org act like the contents pages of the encyclopaedia, pointing users in the right direction. However, what you really need with so much information is an index that will take you directly to the page you are looking for. Of course, there’s no possibility of indexing the Web’s ever-growing and ever-changing content manually, which is where search engines come in. Early pioneers such as the World Wide Web Wanderer and WebCrawler developed the idea of data-collecting “spiders” that automatically “crawled” all the links they found while adding the text of the pages to their databases.
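The crawling idea can be sketched in a few lines of Python. This is a minimal illustration, not how WebCrawler or any real spider was built: to keep it self-contained, the hypothetical `WEB` dictionary stands in for fetching pages over HTTP and extracting their links, and the `.example` URLs are made up. The spider starts from a seed page, stores each page's text in its database and queues every new link it finds, skipping pages it has already queued and links that lead nowhere.

```python
from collections import deque

# Hypothetical miniature web: URL -> (page text, outgoing links).
# A real spider would fetch each URL over HTTP and parse the links out of the HTML.
WEB = {
    "a.example/": ("welcome to the home page", ["a.example/about", "b.example/"]),
    "a.example/about": ("about this site", ["a.example/"]),
    "b.example/": ("links to more pages", ["c.example/", "missing.example/"]),
    "c.example/": ("a page reached only by following links", []),
}


def crawl(seed: str, web: dict[str, tuple[str, list[str]]]) -> dict[str, str]:
    """Breadth-first crawl from `seed`, returning a URL -> page text database."""
    database: dict[str, str] = {}
    frontier = deque([seed])  # URLs waiting to be visited
    seen = {seed}             # links often form cycles, so remember what is queued
    while frontier:
        url = frontier.popleft()
        if url not in web:    # dead link: nothing to fetch
            continue
        text, links = web[url]
        database[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return database


db = crawl("a.example/", WEB)
print(sorted(db))  # ['a.example/', 'a.example/about', 'b.example/', 'c.example/']
```

Breadth-first order means pages close to the seed are collected first; the `seen` set is what stops the spider looping forever between pages that link to each other.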
End users can then simply enter keywords to define their search, and a Search Engine Results Page (SERP) of matching links is automatically generated for them to choose from. The boost to browsing efficiency was such that a new breed of commercial search engines, such as Lycos, Excite and AltaVista, quickly became the natural “portals” for the vast majority of web traffic. For the publisher, this suddenly meant that a high ranking on a popular keyword search guaranteed serious traffic. On the other hand, if your page didn’t appear on the first SERP returned, the number of visitors you could expect tailed off dramatically.
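Behind the keyword box, crawled text is typically turned into an inverted index that maps each word to the pages containing it, so a query becomes a handful of set lookups rather than a scan of every page. The sketch below assumes a simple AND search over lower-cased, whitespace-separated words; real engines add stemming, phrase matching and, above all, ranking. The page names are hypothetical.

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the URLs containing every query keyword (a simple AND search)."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # keep only pages matching this word too
    return sorted(results)


pages = {
    "x.example/": "cheap flights to rome",
    "y.example/": "rome travel guide and flights",
    "z.example/": "gardening tips",
}
index = build_index(pages)
print(search(index, "Rome flights"))  # ['x.example/', 'y.example/']
```

Note that the result list here is merely alphabetical: deciding which matching page deserves the top slot is the hard part.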
Search engines quickly came to control the majority of web traffic, but just how effective can such automated indexing systems be? Basing relevance simply on the number of times a keyword appears on a given page is clearly inadequate, as longer pages are likely to have more keyword matches. To improve the quality of their searches, each engine developed its own algorithm for judging relevance, for example basing results on keyword density (occurrences relative to page length), proximity (how close the search terms sit to each other) and prominence (how early or conspicuously they appear) rather than on absolute numbers.
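To make the difference concrete, here is one plausible way to compute the three measures. The exact formulas each engine used were never published, so these definitions are illustrative assumptions: density as the keyword's share of all words, prominence as how early the keyword first appears, and proximity as the smallest gap in words between two keywords. The two sample pages contain “search” the same number of times, yet density and prominence separate them clearly.

```python
from typing import Optional


def tokens(text: str) -> list[str]:
    """Lower-case the text and split it into words on whitespace."""
    return text.lower().split()


def density(words: list[str], keyword: str) -> float:
    """Keyword occurrences as a share of all words on the page."""
    return words.count(keyword) / len(words) if words else 0.0


def prominence(words: list[str], keyword: str) -> float:
    """1.0 if the keyword is the first word, falling towards 0 near the end."""
    if keyword not in words:
        return 0.0
    return 1.0 - words.index(keyword) / len(words)


def proximity(words: list[str], kw1: str, kw2: str) -> Optional[int]:
    """Smallest distance in words between kw1 and kw2, or None if either is missing."""
    pos1 = [i for i, w in enumerate(words) if w == kw1]
    pos2 = [i for i, w in enumerate(words) if w == kw2]
    if not pos1 or not pos2:
        return None
    return min(abs(i - j) for i in pos1 for j in pos2)


short_page = tokens("google search tips for better search results")
long_page = tokens("history of the web " * 20 + "search engines search google")

print(short_page.count("search"), long_page.count("search"))  # 2 2
print(round(density(short_page, "search"), 3), round(density(long_page, "search"), 3))  # 0.286 0.024
print(round(prominence(short_page, "search"), 3), round(prominence(long_page, "search"), 3))  # 0.857 0.048
print(proximity(short_page, "google", "search"))  # 1
```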
However, even with high keyword ratings, a page could still be about a different subject entirely. To improve their results further, search engine algorithms needed to look beyond straight textual analysis and take other relevance factors into account. Here the nature of HTML itself comes into play, as Tim Berners-Lee had specifically designed the language to mark up the different kinds of content within a document. In particular, if a keyword appears within a <TITLE>, <H1> to <H4> heading or <STRONG> tag, or in the ALT description of an <IMG> tag, it’s a reasonable assumption that it is significant to the page. In addition, HTML enables descriptive information about each document to be included in the page’s <HEAD> section, and many early search engines gave great importance to the <META> keywords and descriptions that authors were then encouraged to include.
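A rough sketch of markup-aware scoring, using Python's standard `html.parser` module. The weights in the hypothetical `TAG_WEIGHTS`, `ALT_WEIGHT` and `META_WEIGHT` constants are invented for illustration, since search engines never published theirs; the point is only that one occurrence inside <TITLE> or <H1> counts for more than one in plain body text.

```python
from html.parser import HTMLParser

# Hypothetical weights: a keyword inside these tags counts for more than body text.
TAG_WEIGHTS = {"title": 10, "h1": 6, "h2": 5, "h3": 4, "h4": 3, "strong": 2}
ALT_WEIGHT = 2    # keyword in an <IMG> ALT description
META_WEIGHT = 3   # keyword in <META> keywords or description
BODY_WEIGHT = 1   # keyword in ordinary text


class WeightedKeywordScorer(HTMLParser):
    """Score one keyword, weighting each occurrence by the markup around it."""

    def __init__(self, keyword: str):
        super().__init__()
        self.keyword = keyword.lower()
        self.open_tags: list[str] = []  # weighted tags currently open
        self.score = 0

    def _count(self, text: str, weight: int) -> None:
        self.score += text.lower().split().count(self.keyword) * weight

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("alt"):
            self._count(attributes["alt"], ALT_WEIGHT)
        elif tag == "meta" and (attributes.get("name") or "").lower() in ("keywords", "description"):
            # META keywords are comma-separated, so treat commas as spaces
            self._count((attributes.get("content") or "").replace(",", " "), META_WEIGHT)
        elif tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened matching tag, if any
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        # Text takes the weight of the most important enclosing tag
        weight = max((TAG_WEIGHTS[t] for t in self.open_tags), default=BODY_WEIGHT)
        self._count(data, weight)


def score(html: str, keyword: str) -> int:
    scorer = WeightedKeywordScorer(keyword)
    scorer.feed(html)
    scorer.close()
    return scorer.score


page = """<html><head><title>Google guide</title>
<meta name="keywords" content="google,search"></head>
<body><h1>Using Google</h1><p>Some text about <strong>google</strong> and more.</p>
<img src="logo.png" alt="google logo"></body></html>"""

print(score(page, "google"))  # 10 + 3 + 6 + 2 + 2 = 23
```

Five occurrences of “google” score 23 here, whereas the same five words in plain paragraph text would score just 5.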
These days search engines take advantage of HTML-based markup.
To begin with, these more advanced search algorithms undoubtedly produced better results, but in retrospect the whole system seems horribly naïve. In particular, by reverse engineering what the search engines were looking for, publishers could tailor their pages accordingly, and the Search Engine Optimization (SEO) industry was born. Of course, each publisher can argue that they are just making the most of what they have; indeed, that they are helping the search engines by making their content as findable as possible. Such “white hat” optimization can benefit site and search engine alike but, equally, as optimized pages overtake more relevant but unoptimized ones, the overall quality of the SERPs can fall.
A much more serious problem is how easily the search engines can be fooled entirely by so-called “black hat” SEO. Adding totally irrelevant <META> keywords can instantly boost traffic, as can “keyword packing”, either in dedicated “doorway pages” with no real content or in body text that is hidden from the end user. Alternatively, “cloaking” can present the search engine spider with high-ranking copy, sometimes simply cut and pasted from successful sites, while using a script to send actual visitors to another page entirely. Of course, visitors aren’t likely to stay long when they realise they’ve been tricked, but if even a tiny fraction of them end up buying what the black hat site is selling (usually sex or illegal downloads) or click through on its affiliate links, then the trick has paid off. And the fact that the search engine becomes near unusable as a result is of little concern to its parasites. The era of “search engine spam” had arrived.
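A toy example shows why density-based ranking was so easy to game. Assuming pages are ranked purely by keyword density (a deliberate simplification of the algorithms described earlier), a contentless doorway page packed with “camera” comfortably outranks a genuine review. Both pages and their URLs are invented.

```python
def density(text: str, keyword: str) -> float:
    """Keyword occurrences as a share of all words (the naive relevance measure)."""
    words = text.lower().split()
    return words.count(keyword) / len(words) if words else 0.0


pages = {
    "genuine-review.example": (
        "an honest review of the new camera covering image quality "
        "battery life and handling"
    ),
    # A "doorway page": no real content, just the keyword packed in
    "doorway.example": "camera camera cheap camera best camera camera deals camera",
}

ranking = sorted(pages, key=lambda url: density(pages[url], "camera"), reverse=True)
for url in ranking:
    print(url, round(density(pages[url], "camera"), 3))
# doorway.example 0.667
# genuine-review.example 0.071
```

Nothing in the page text itself distinguishes the useful page from the spam, which is exactly the weakness the rest of the article turns to.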
The early search engine portals based on tagged text played a crucial pioneering role but, resting on an inherently limited technology so open to manipulation and flagrant abuse, there had to be a better way. The solution arrived in 1998 in the form of a new startup search engine based on the ideas of another pair of Stanford University students, Larry Page and Sergey Brin. Initially Google sold itself on the sheer size and speed of its database, made possible by a new distributed server environment that was both low-cost and scalable. For the end user, the most obvious attraction was the simplicity and focus of the Google site compared with the existing search portals, which were complicating and compromising their service with paid-for links and other money-making schemes. The message was clear: Google remained a believer in the Berners-Lee-inspired vision of the Web, where content was still king and money a secondary concern, if that.
From the beginning Google focused on search efficiency. (1998 screenshot courtesy of waybackmachine.org)
What really made Google different, though, was its entirely new algorithm for ranking pages: PageRank. PageRank’s strength is that it recognizes that a web page isn’t just text, it’s hypertext; in other words it’s full of links, and looked at correctly these links are full of information. The underlying maths is frightening, but the principle is simple: each “back link” pointing at a given page is effectively a vote in its favour, and votes from pages that themselves rank highly count for more. The reasoning is that if someone goes to the trouble of setting up a link, they must think the target content is worth something. By analyzing these links, Google effectively harnesses every web publisher not just to provide the content for the encyclopaedia, but to rank every site and page on the Web for its index!
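The frightening maths boils down to a simple iterative calculation. The sketch below uses the commonly published form of the formula, PR(p) = (1 - d)/N + d × Σ PR(q)/L(q), where the sum runs over every page q linking to p, L(q) is the number of links on q, N is the number of pages and d is a damping factor (0.85 in Brin and Page's original paper). It is a textbook power iteration, not Google's production system, and the four-page link graph is hypothetical. Pages with no outgoing links share their rank evenly across all pages, which is one common convention among several.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank: each page splits its rank among the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with rank spread evenly
    for _ in range(iterations):
        # Every page receives the same "random jump" share ...
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # ... plus an equal slice of the rank of each page that links to it
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its vote over all pages
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to
web = {
    "authority": ["blog1"],
    "blog1": ["authority"],
    "blog2": ["authority"],
    "blog3": ["authority", "blog1"],
}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # authority
```

The ranks always sum to 1. `authority` wins because three of the four pages vote for it, and one of those voters, `blog1`, is itself well ranked; `blog2` and `blog3`, which nobody links to, are left with only the random-jump share.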
Crucially, used in tandem with existing tagged-text analysis, Google’s link-based PageRank analysis works brilliantly at finding the best authority sites in any field. Search for a review of some computer software, for example, and tagged-text analysis would be virtually useless for distinguishing between different efforts, most of which are little more than recycled press releases. However, if a site in general or a review in particular stands out and people go to the effort of linking to it from their own site or from a forum, that’s a useful indication that others are likely to be interested too. In other words, thanks to PageRank, Google can pick out worthwhile PC Pro-style content that is created by people who know what they are talking about and that is itself worth talking about.
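How Google actually combined its text and link signals has never been published, so the following is purely a hypothetical illustration of the principle. Assume each candidate page already has a text-relevance score from tagged-text analysis and a link score such as PageRank, and that the two are simply multiplied. Three pages with near-identical text scores, two of them recycled press releases, are then separated almost entirely by their back links. All names and numbers are invented.

```python
# Hypothetical scores: "text" from tagged-text analysis, "links" from a PageRank-style measure.
candidates = {
    "press-release-copy-1.example": {"text": 0.80, "links": 0.01},
    "press-release-copy-2.example": {"text": 0.82, "links": 0.01},
    "expert-review.example": {"text": 0.78, "links": 0.20},
}


def combined_score(scores: dict[str, float]) -> float:
    """Assumed blend: text relevance multiplied by link authority."""
    return scores["text"] * scores["links"]


text_only = sorted(candidates, key=lambda url: candidates[url]["text"], reverse=True)
serp = sorted(candidates, key=lambda url: combined_score(candidates[url]), reverse=True)

print(text_only[0])  # press-release-copy-2.example
print(serp[0])       # expert-review.example
```

On text alone the best-optimized press release comes top; once back links are factored in, the review that others have chosen to link to takes first place.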
For the end user, how it’s done is irrelevant; all that matters is that Google does the job. In fact, searching with Google is so simple, efficient and effective that it’s easy to take it for granted and hardly notice it’s there. But it’s hard to imagine life without it. While writing this article, for example, I’ve visited well over 50 authority sites (using the excellent Google Toolbar to make searching an integral part of browsing). In the process I haven’t just checked the odd fact; the sites I’ve visited have helped shape what I think about the subject and so what I’ve ended up writing. Just about every web author is doing the same, so Google isn’t just indexing the content on the Web, it’s helping to improve it.
The Google Toolbar integrates Google searching directly into your browsing workflow.
The quality of the results that the PageRank algorithm provides is its greatest strength, but it has another major benefit. Because PageRank depends on back links, there’s very little publishers can do to their own site to improve its ranking, except to add new pages with noteworthy content that encourage others to link to them. In other words, the best advice to honest web publishers is to put the time and energy saved on SEO into creating useful content that visitors will seek out and link to. Get it right and a virtuous circle kicks in: increasing traffic leads to more back links, higher ranking, more traffic and so on.
Of course, these days the Web has moved on from the original vision of a universal and freely available research encyclopaedia, and many sites aren’t designed to provide useful content per se, but rather content designed to sell products and services. However, you can’t sell something if no-one knows that your site exists, which is why SEO promises of a guaranteed Google Number 1 ranking can seem so attractive. A much better route to a high SERP placing on Google is provided by Google itself. With the Google AdWords system you bid to have your site appear among the sponsored links that run down the right-hand column next to the main content-based search results. You pay only for actual clickthroughs, and the system automatically reduces the actual cost-per-click (CPC) to the lowest amount needed to maintain your ad’s position. If you can’t make a profit from such highly targeted and low-cost advertising (the current minimum CPC is 4p), then the bottom line is that you are going to find it hard to survive as a web business.
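The “lowest cost needed to maintain your ad’s position” rule can be sketched as a second-price style auction. This is a simplified model under stated assumptions, not Google's actual implementation: ads are ranked here purely by maximum bid (AdWords also took clickthrough rates into account), each advertiser is charged one penny more than the bid directly below, never more than its own maximum, and the 4p minimum CPC quoted in the article acts as the floor. Advertiser names and bids are hypothetical; all amounts are in pence.

```python
MIN_CPC = 4    # pence: the minimum cost-per-click quoted in the article
INCREMENT = 1  # pence: assumed step above the next advertiser's bid


def actual_cpcs(max_bids: dict[str, int]) -> list[tuple[str, int]]:
    """Rank ads by maximum bid and charge each just enough to keep its position."""
    eligible = {ad: bid for ad, bid in max_bids.items() if bid >= MIN_CPC}
    ranked = sorted(eligible.items(), key=lambda item: item[1], reverse=True)
    charges = []
    for position, (ad, max_bid) in enumerate(ranked):
        if position + 1 < len(ranked):
            needed = ranked[position + 1][1] + INCREMENT  # just beat the ad below
        else:
            needed = MIN_CPC                              # last ad pays the floor
        charges.append((ad, min(max_bid, needed)))
    return charges


print(actual_cpcs({"alpha": 30, "beta": 12, "gamma": 5, "delta": 3}))
# [('alpha', 13), ('beta', 6), ('gamma', 4)]
```

Although `alpha` is prepared to pay 30p, it is charged only 13p, which is enough to stay ahead of `beta`; `delta` falls below the minimum and is not shown at all.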
By this stage you can probably tell that I’ve long been a fan of Google, and I should admit a personal interest, as Google has been very good to me. Several years ago I started posting the reviews and articles I wrote for PC Pro to my designer-info.com website, and others clearly found them useful or thought-provoking enough to begin linking to them. The Google effect took off, and by the end of last year my traffic was approaching 250,000 visitors a month. That’s significant traffic, and it began generating some reasonable affiliate and advertising revenue. The single biggest revenue stream was from Google itself through its AdSense scheme, which uses the same AdWords technology to display sponsored links relevant to each host page’s content. Not enough to give up the day job, but certainly a welcome subsidy.
Google is now using its search expertise for advertising.
With Google helping to research the content, generate the traffic and pay for the site, I was certainly an admirer, but to be honest I still didn’t really give Google much credit, or even much thought. Google just worked so well that it almost demanded to be taken for granted. This complacency first began to change when the company floated on the stock market in August 2004. The flotation sent a clear signal that even the idealistic Google now accepted that money had become the major driving force on the Web, and its initial value of $23 billion showed just how much money was at stake. Crucially, the flotation means that, despite its informal corporate motto “Don’t Be Evil”, Google now has to turn in profits to satisfy its shareholders. However, with my own affiliate, advertising and sponsored links I could hardly get on my high horse about this, and if you provide a useful service, I reckon you deserve to be rewarded for it.
Much more significant was the noticeable decline in the quality of Google’s results after the flotation. As Davey Winder, Jon Honeyball and other commentators have remarked recently, for many searches Google’s SERPs have become so full of ecommerce, affiliate and aggregator sites trying to sell products that it is virtually impossible to find any useful hard information or personal experience of the products in question. When you do find some facts, say a technical run-down, the same content appears on many of the sites; in fact, sometimes the entire site is the same but listed under different URLs. When questionable sites like these appear above the manufacturer’s own home page, something is clearly going wrong.
However, what finally brought me up short and made me think long and hard about Google was when my own Google-generated traffic fell off a very large cliff. Naturally my reaction was to search Google to find out what was happening, and it quickly became apparent that I was not alone: a whole host of sites had been similarly affected. Moreover, it soon became clear that such ranking upheavals are relatively common. In fact, Google rankings regularly shift with the monthly index and PageRank updates of the so-called “Google Dance”. However, the effects of this update were far more serious, implying a major algorithm change. Such changes are less frequent, though becoming more common. More importantly, for the sites that lose out in these so-called “Google Quakes”, the results can be devastating, destroying livelihoods and years of work.
A very unpleasant sight.
Next month I'll try to work out what Google is up to and, more importantly, whether there's anything the poor web publisher can do to protect themselves. In the meantime I already have two important pieces of advice. For web searchers: remember to bookmark the hidden treasure sites you come across, as they might not be there next time you look for them. For web publishers: whatever you do, don’t take Google for granted. What Google giveth, it can just as easily take away.
All the work on the site (over 250 reviews, over 100 articles and tutorials) has been written by me, Tom Arah. It's also me who maintains the site, answers your emails, etc. The site is very popular and from your feedback I know it's a useful resource, but it takes a lot to keep it up.
You can help keep the site running, independent and free by bookmarking the site (if you don't, you might never find it again), telling others about it and coming back (new content is added every month). Even better, you can make a donation, e.g. $5, the typical cost of just one issue of a print magazine, or buy anything via Amazon.com or Amazon.co.uk (now or next time you feel like shopping) using the site's Amazon links or the designer-info.com shop. It's a great way of quickly finding the best buys, it costs you nothing and I gain a small but much-appreciated commission.
Thanks very much, Tom Arah