Google Corporate Motto: Don't Be Evil
Tom Arah investigates what Google is up to with its Allegra and Bourbon updates/quakes and reminds the company of its founding principles.
Last month I looked at the importance of search engines to web traffic and the spectacular rise of Google. According to Nielsen NetRatings, which measures US searchers (www.searchenginewatch.com/reports/article.php/2156451), in May 2005 Google handled just under half of all US search engine traffic, more than double its nearest competitor, Yahoo. This is impressive enough, but Google is even more important for the content-based publishers, those experts in their field whose primary aim is to provide information rather than make money. Here it is common for Google to generate over three quarters of a site’s traffic. In fact a recent survey of 150 randomly-chosen medium-sized sites (www.searchenginesurveys.com/) put the figure at over 90%!
For content-based publishers, Google can generate virtually all traffic – for good and bad.
Google’s role as the independent publisher’s friend is not a coincidence. In the paper (www-db.stanford.edu/~backrub/google.html) with which they launched Google, Stanford University students Sergey Brin and Larry Page introduced their new search engine as an academic, content-focused alternative to the existing search engines: “…search engines have migrated from the academic domain to the commercial…This causes search engine technology to remain largely a black art and to be advertising oriented. With Google, we have a strong goal to push more development and understanding into the academic realm.” In short, Google was a believer in the original Berners-Lee vision of the web as a universal and transparent academic research tool in which content, and content alone, was king.
Key to Google’s focus on high-quality content was its new PageRank ranking system based on measuring and weighting backlinks (see patent 6,285,999 for the technical details). As we saw last month, the central insight behind PageRank is an understanding that each hyperlink can be treated as the equivalent of an academic citation between papers or, as Google now presents it, as a vote from one site to another. Crucially, PageRank enabled Google to home in on the authority sites with high quality content that other sites, and in particular other authority sites, linked to.
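To make the "links as votes" idea concrete, here is a minimal sketch of a PageRank-style calculation in Python. It is not Google's implementation: the damping factor of 0.85, the fixed number of iterations, the handling of pages with no outgoing links and the tiny five-page `web` are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate a PageRank-style score for every page.

    `links` maps each page to the list of pages it links to.
    The damping value and iteration count are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote equally between the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical miniature web: two blogs and a niche site cite an authority
# site, the authority cites the niche site, and nobody links to the spam page.
web = {
    "authority": ["niche"],
    "niche": ["authority"],
    "blog_a": ["authority"],
    "blog_b": ["authority"],
    "spam": [],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:10s} {score:.3f}")
```

In this toy example the niche site outranks the two blogs even though it has only a single backlink, because that link comes from the authority site. That is the effect described above: a vote from an important page counts for more than a vote from an unimportant one.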
PageRank had another important side-effect. It automatically enabled Google to avoid the escalating problem of spam results - pages based on optimized keyword matching but offering no real content - that were increasingly plaguing the commercial search engines’ results pages (SERPs). As another of Page and Brin’s papers (http://dbpubs.stanford.edu/pub/1999-66) put it: “… PageRanks are virtually immune to manipulation by commercial interests. For a page to get a high PageRank, it must convince an important page, or a lot of non-important pages to link to it. At worst, you can have manipulation in the form of buying advertisements (links) on important sites. But, this seems well under control since it costs money.”
With its high-quality, junk-free results, Google was a fantastically successful system all-round and its traffic began rising exponentially. The financial temptation to introduce advertising was enormous and Google eventually put aside its original scruples and succumbed. Brilliantly though it managed to do so while remaining largely true to its founding principles. In particular, with its AdWords system, Google left its core search algorithms and results untouched, instead introducing a side column of relevant and discreet text-only sponsored links that users could ignore or take advantage of as they saw fit. And, by rolling out the system to third-party sites, via AdSense, Google began supplying not just traffic but direct income to the web’s content-focused publishers. With searcher, content provider and search engine all benefiting, it looked like Google had squared the circle between academic and commercial interests; between content and money.
AdWords enables Google to make money on the side – quite literally.
But there was a problem. As Google itself had recognized, there was now a great deal of money to be made from the web - and especially from a high Google ranking. As Page and Brin had foreseen, the need to spend money to manipulate the PageRank algorithm had kept Google’s SERPs relatively spam-free but now AdSense acted as a direct financial incentive to exploit the system. And with a little investment, whether in buying text-based links from genuine advertisers or link farms or creating your own sites and cross-linking between them, it soon proved very easy to manipulate. In fact unscrupulous webmasters only needed to invest their time, finding other sites willing to swap links or even just adding their site’s URL on all those pages that allow comments to be left – so-called “blog spam”. Worst of all was the arrival of automatically generated “scraper sites” that trawled directories and even Google’s own SERPs, lifting the high quality content they found to gain themselves high rankings and high AdSense revenues as unwitting searchers immediately clicked to get away.
By the turn of 2005 Google spam had run out of control and the fall in Google’s search quality had become a rising complaint amongst regular users. In an online survey of 300 professional search engine optimizers, for example, 60% recognized it as a regular problem while 17% saw Google as “flooded with the stuff” (www.searchenginesurveys.com). From the publisher’s perspective, I for one put the levelling off of my own www.designer-info.com traffic after seven years of sustained growth down to a combination of spam-infected SERPs pushing down legitimate rankings and the inevitable overall drop in Google usage as disillusioned searchers turned elsewhere. At the end of March however something new happened – my Google referrals began falling off a cliff.
My natural response was to turn to Google to find out what was happening. Not the Google site of course, as the company now operates under a cloak of deepest operational secrecy. With a little bit of google-searching though, I quickly found the main centre for Google-related discussion, the Google News forum at www.webmasterworld.com - a forum which partly fills this role thanks to occasional insights into the oracle itself from “GoogleGuy”, an anonymous Google employee. From the despairing response of webmasters losing their livelihoods and seeing years of work disappearing overnight, it was clear that I was not alone and that thousands of sites, all in the same content-oriented independent publisher category, were experiencing the same collapse in traffic. Like a hurricane, the update was given a name: “Allegra”.
The home of Google discussion at WebmasterWorld.com
So what was happening and how could we get the referrals back? Possible causes and remedies abounded and my eyes were opened to a whole host of flaws and failings in a system that I had previously assumed was the definition of algorithmic efficiency. One major topic of concern for example was Google’s “sandbox”, a crude holding area where new sites in competitive keyword areas are effectively put on probation to prove that they are not spam, so depriving start-ups of all traffic!
Google’s sandbox wasn’t a problem for the long-established sites that were now suffering, but there were plenty of other possible contenders. Two in particular seemed relevant to many sites, including mine. First up was the “canonical index” issue. For some reason Google is happy to index exactly the same site under both www and non-www versions of its URL. All it takes is one backlink to omit the “www” and, if you use relative rather than absolute links, a mirror site is created and indexed so draining your PageRank and quite possibly incurring duplicate content penalties.
The solution is to use absolute links to limit the damage and to put in a server redirect to prevent it. However, make sure it is a permanent 301 redirect: if you put in a temporary 302 redirect, Google treats the redirect page as the URL to index! We know this because it’s the bug that enables so-called “Google hijacking”, in which a page’s content is listed under another site’s URL, again draining PageRank. This can be done maliciously but, even more worryingly, it’s also very easy to do inadvertently – the majority of my hijacked pages were redirect links from respected sites. It can even be done as a prank - to highlight the flaw, one site hijacked Google’s own AdSense page!
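As a rough illustration of the redirect logic, the Python sketch below decides whether a requested URL should be answered with a permanent 301 redirect to the www version of the site. The host name `www.example.com` and the function name `canonical_redirect` are hypothetical, and in practice the rule would normally live in the web server's own configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical host; substitute your own site's preferred name.
CANONICAL_HOST = "www.example.com"


def canonical_redirect(url, canonical_host=CANONICAL_HOST):
    """Return (301, new_url) if `url` uses the bare non-www host, else None.

    A permanent (301) status is used deliberately: as described above, a
    temporary 302 is what led to the wrong URL being indexed.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == canonical_host:
        return None  # Already canonical, serve the page as normal.

    # The bare form of the canonical host, e.g. "example.com".
    bare_host = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    if host == bare_host:
        target = urlunsplit(
            (parts.scheme, canonical_host, parts.path or "/", parts.query, parts.fragment)
        )
        return 301, target

    return None  # Some other host entirely, so not our concern.


print(canonical_redirect("http://example.com/reviews/page.htm?x=1"))
print(canonical_redirect("http://www.example.com/reviews/page.htm"))
```

The first call returns a 301 pointing at the www address with the path and query string preserved; the second returns None because the address is already canonical. This sketch ignores ports and other subdomains for brevity.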
These are all important issues that need to be addressed but, in the absence of any hard information from Google, it’s difficult to say just what effect each has. Moreover they aren’t new issues that would suddenly affect such a large number of sites. So what new rules could be in operation? The forums were full of possibilities – a new dislike of too many internal links, affiliate links, run of site links, keyword-based filenames, <iframes>, CSS, non-CSS, non-US hosted sites and so on. The nightmare is that, with no line from Google, it’s impossible to rule any suggestion out or in - though GoogleGuy did dismiss the <iframes> theory - so the temptation is to believe and act on them all. Ultimately though no possible factor seemed applicable to more than a percentage of the sites that were suffering so I felt the best approach was to hold steady.
I also felt that an important piece of the picture was still missing. Eventually I found what I believe to be the most likely candidate in the form of Google’s recent patent 0050071741, granted on March 31 2005. The title for this couldn’t be clearer: “Information retrieval based on historical data”. However the practice is anything but simple, involving no fewer than 63 separate claims and over a hundred examples of how they could be put into practice. Moreover the range of factors that are taken into account is extraordinary – it’s not just a document’s inception date that is monitored, for example, but how it is updated over time, whether changes are made to navigational, boilerplate or content-based text, whether the title changes and so on. And this is only the beginning: the patent also describes monitoring and storing information on how link data, anchor text data, traffic data, behaviour data, query data, ranking data and even domain data change over time – and each one in just as much exhaustive detail as the historical document data.
There’s a vast amount to take in but, if you step back, the suggestions again make sense at the intuitive level. Indeed you need to take time into account to keep the PageRank citation system providing the most relevant results – the General Theory of Relativity is seminal for example but physics is always moving on. On a rather more mundane level if you search for “photoshop review” or “dreamweaver tutorial”, chances are that you’d rather read one written in the last year or so. And the patent isn’t only about time, it is more generally concerned with what GoogleGuy has called “signals of quality”. By monitoring the anchor text for links, for example, it becomes possible to weed out those automatically-generated farm and scraper links from genuine citations.
Google’s recent patent is based on historical data analysis.
So has Google implemented the patent? The simple answer has to be “no”. The task of storing and processing this vast and ever-increasing amount of data for every page on the web is simply mind-boggling. Moreover, if implemented in full and across the board, the new index would bear little relation to the old and the upheaval to the web economy and ensuing outcry would be overwhelming. However there’s nothing to stop Google cherry-picking elements, say penalizing older pages unless they are still gaining backlinks, giving recent links more credit than old and screening out generated links. And then only rolling out the new algorithms to the most appropriate, say technology-based, keyword sectors. This would certainly explain what I was seeing in my stats in terms of reduced numbers all round and the new patterns emerging for keyword phrases and site entry pages.
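To show how one of these cherry-picked ideas, giving recent links more credit than old ones, might work in principle, here is a purely illustrative Python sketch. Nothing in it comes from the patent or from Google: the exponential decay, the one-year half-life and the function name `link_credit` are all assumptions made for the sake of the example.

```python
from datetime import date, timedelta


def link_credit(link_dates, today, half_life_days=365.0):
    """Sum the credit from a page's backlinks, halving each link's
    weight for every `half_life_days` that have passed since the link
    was first seen. The decay rule is an illustrative assumption.
    """
    total = 0.0
    for first_seen in link_dates:
        age_days = max((today - first_seen).days, 0)
        total += 0.5 ** (age_days / half_life_days)
    return total


today = date(2005, 6, 1)

# Hypothetical page A: three backlinks, all acquired five years ago.
old_page = [today - timedelta(days=5 * 365)] * 3
# Hypothetical page B: two backlinks, both acquired in the last month.
fresh_page = [today - timedelta(days=10), today - timedelta(days=25)]

print(f"old page credit:   {link_credit(old_page, today):.3f}")
print(f"fresh page credit: {link_credit(fresh_page, today):.3f}")
```

Under these assumed settings the page with two fresh links scores far higher than the page with three old ones, which is the kind of shift that would hit long-established sites whose backlinks were mostly earned years ago.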
Having said this, like everyone else I’m working in the dark, and it’s quite possible that the real explanation has absolutely nothing to do with Google’s latest patent. Eventually though I’d argue that its existence makes it very likely that this is the direction in which Google intends to move. And, on balance, I welcome it. Google’s PageRank architecture is the secret of its success and is entirely dependent on the citation-style links that were the norm in the early days of the academic web. Look at the web now however, and these freely-given, hand-coded, “see also” links are vanishingly rare. Google needs to do whatever is necessary to boost the signal-to-noise ratio.
After many nights of lost sleep, a lot of research and a determination to look on the bright side (only possible because my site is a sideline rather than my living), I finally managed to make a case on Google’s behalf and to convince myself that my 75% drop in referrals and 50% drop in overall traffic could be justifiable. However on May 21st I faced a new test as my Google traffic simply disappeared – down to around 45 referrals a day from a peak of 4500! Again I wasn’t suffering alone. In fact as new webmasters began flooding in to the Google News forum desperately seeking a cure, or at least an explanation, it became clear that the number of sites wiped out by this new “Bourbon” algorithm/index update was even bigger.
More importantly, there could be no justification for such near-total falls. Crucially, even if you searched for your own unique domain name, this was appearing way down in the rankings, well below sites whose only connection to the domain name was a link to yours! This posed completely different questions. Whether the Allegra changes improved the quality of Google’s SERPs was debatable, but Bourbon’s blanket ban inherently lowered and sometimes destroyed their quality. OK, those sites affected might not be the best resource for all their previous referrals, but they certainly were for some.
So what possible reason would Google have for sabotaging its own results? Not surprisingly the most popular answer from the webmasters who were seeing their own revenues destroyed was a cynical: “follow the money”. Amazingly, it’s much the same argument as Larry Page and Sergey Brin made themselves in an appendix to their Google launch paper entitled “Advertising and Mixed Motives”. In it they describe how “advertising income often provides an incentive to provide poor quality search results” giving the example of a company that needs to advertise because its own home page did not appear for a query on its own name - exactly what we were seeing with Bourbon! They stopped short of saying this was done deliberately but concluded, “we believe the issue of advertising causes enough mixed incentives that it is crucial to have a competitive search engine that is transparent and in the academic realm.”
Google’s idealistic founders explain their dislike of mixing advertising and search engines!
The irony that Google has now not just accepted advertising but become the web advertising company is not lost. And with a share price that has trebled in a year the pressure to return rising profits must be intense. It would certainly be a brave employee who suggested changes that would raise search quality but also lower advertising revenues. So I’ll do it: Google needs to actively check applicants to its AdSense scheme to screen out the scraper junk. Currently the company is making money directly from the spam merchants who are destroying its search quality – you can’t have more of a mixed motive than that.
Somehow though, I still cling on to the idea that Google hasn’t given in to the dark side and isn’t deliberately compromising the quality of its search to boost short-term profits. If only because Google’s long-term business plan is entirely dependent on the quality of its search. As if to confirm that Google isn’t actually evil after all, on June 16th my domain name and those of the majority of sites affected by Bourbon suddenly returned to their rightful place at the top of their domain-query SERPs and traffic generally returned to pre-Bourbon though post-Allegra levels. If I put my mind to it I can even come up with a possible justification for what happened – say an over-zealous anti-spam filter hitting innocent sites that was then corrected by another filter ready for the next index update.
But the point is I shouldn’t have to - Google needs to justify itself. Moving Google away from the mixed motives of advertising is clearly no longer an option, but transparency is. I don’t expect Google to publicize exploitable details about its ranking system and I certainly believe in its right to change its algorithms as it sees fit to return more relevant results – no publisher should think that Google owes them a living. However Google has to recognize that its index is no longer only of academic interest, as any number of websites affected by recent changes will tell you. Google is a business, a very big business, and as such it has corporate responsibilities. Factors such as the sandbox, hijacking and especially algorithm changes and penalties affect people’s livelihoods and need to be addressed much more publicly.
In particular the fact that public dialogue, between what the press is now calling “the world’s largest media company” and the actual creators of the content it supplies, is left to occasional ad-hoc remarks and responses from GoogleGuy is a disgrace. In a company worth over $80 billion GoogleGuy should be promoted and put in charge of a team whose whole job is to explain the company’s mission and actions to the webmaster community and listen to, and act on, their responses. Google needs to recognize that it is these content producers that are the ultimate source of its revenues and start treating them as suppliers rather than as raw materials. Above all that means dealing with them as fairly and openly as possible.
If not, Google’s founding principles could soon turn very sour. As Brin and Page recognized, the alternative to transparency is suspicion. Unless Google tackles this issue head-on it is in danger of turning into, or certainly being seen as, an all-devouring version of the secretive, advertising-driven commercial monster it was originally designed to vanquish.