I got this email (I’ll keep the emailer anonymous unless (s)he notifies me otherwise):

It’s very disturbing how Google is behaving with regard to Climategate/Climaquiddick. I put both of those in my custom news page. For a while, it steadfastly refused to update Climaquiddick, and then it began to update Climategate only with stories attacking climate change skeptics. I could find many more stories on Yahoo, most of which were alarmed at the fraud which seems to be occurring.

Then when I logged in today, Google News had deleted those two categories from my custom section. When I reestablished them, they brought up only a few of the old, outdated original stories plus a few newer attack stories.

Web searches on Climaquiddick yielded only 72,600 hits on Google and 84,300 on Bing, but 565,000 on Yahoo. None of them will autocomplete the word “Climaquiddick.” They won’t autocomplete “Climategate” either, but Yahoo alone will suggest “climate gate.”

Does everyone in Silicon Valley think that pretending information doesn’t exist will make it so? If so, how much can we trust the technology they produce?

I think that there are going to be huge reverberations of distrust throughout many areas of authority resulting from this. As was pointed out early on, it’s not just a scientific scandal, it’s a journalistic one.

  1. I tried both Google and Yahoo. Google did not suggest climategate or permutations until I finished typing it. I received about 24 mil hits. Yahoo did complete climategate and I received about 46 mil hits.

    Seems to me that there is something to it but what do you expect from Google? I have seen similar findings on other issues.

  2. I just typed in “climategate” and got 216,000,000 results. I typed it into the GOOGLE searchfield on the top right of my Firefox 3.0.15 installation on an Ubuntu 9.04 computer. The first page had 11 suggested articles, and without reviewing every single one of them most of the links were to conservative organizations like PJTV, RealClearPolitics, and BigGovernment. There was also a link, legitimate in my opinion, to Huffington Post. I see absolutely nothing to suggest that Google is trying to hide anything.

  3. Some of the algorithms would seem to be “live”, in that they are intended to continuously (or at least periodically and frequently) mine the search terms that are actually in use.

    If you go back to some of the earliest reporting, you can see the earliest protests were from people who had already seen “climategate” suggested… but who now didn’t see it suggested, prompting “Wait, was it stripped manually?”

    The search is frequent enough that it keeps making it onto ‘the list’. And it (presumably) keeps getting bumped. No one quite being willing to change code on the running servers to block it permanently is my bet.

  4. I got similar results to those of newrouter, but if I separate the “climate” from the “gate” I get a list of about ten suggestions all related to the controversy. I know the people at Google are a bunch of leftards, but I’m inclined to give them a break here.

  5. Does everyone understand how autosuggest works? I’m not sure people do.

    As danoso explained, autosuggest is NOT just your browser’s “auto-complete” feature that fills in your name and address into web forms. It’s a Google (and other search engines too) feature that gives you a list of the MOST POPULAR search terms beginning with whatever you’ve typed up to that point, recomputed every time you press another key.

    It’s for this reason that Google Suggest is a useful tool for finding out what’s big in the news lately, or at least what doesn’t present a conflict of interest for Google’s board of directors (apparently). For instance, type “i’m” and the first suggested result is “i’m on a boat”. Which makes sense for a viral meme that everyone’s been searching for. If you look at the list of returned suggestions, they’re clearly sorted in decreasing order of how frequently people would have searched for them. Type something like “shock the” and you’ll see what I mean.

    However, the algorithm has the capacity to have certain search terms– including popular ones– filtered out of it. For example, “2 girls” will not return “2 girls 1 cup”, no matter how hard you try. (Fortunately.)

    If Google Suggest is returning “climate guatemala”, it means people have used those search terms at some point in the past, frequently enough to exceed some threshold of popularity and show up in the list. It doesn’t mean Google is pulling random words out of its dictionary to try to complete your search.

    Knowing this, it seems clear that the only reason “climategate” isn’t showing up is that it’s been consciously filtered out. It’s definitely not that it’s not a popular search term.

    Also– I’ve been keeping track of this ever since this started to be an interesting phenomenon a couple of weeks ago. First “climategate” had disappeared from the suggestion list, but “climate-gate” did show up in the first few entries returned by “climate”. A couple of days later, after steadily moving up the list, “climate-gate” had suddenly disappeared too. And a few days after that, I saw “climate gate” was in the results, but it too quickly vanished. Finally, just a couple of days ago, I saw “climategate emails” and “climategate scandal” popping up at the top of the suggestions. But now those too are gone; only “climate guatemala” is left.

    Not only is this a real phenomenon, it’s keeping someone at Google awfully busy.
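
For readers who want to see the mechanism comment 5 describes, here is a minimal sketch. It is not Google's code, and nothing here is known about how Google actually implements suggestions. The function `suggest`, the `blocked` set, the `min_count` threshold, and every count in `QUERY_LOG` are invented for illustration. It only shows how "most popular logged queries with this prefix, above a threshold, minus a filter list" would behave.

```python
from collections import Counter


def suggest(prefix, query_counts, blocked=frozenset(), min_count=2, limit=10):
    """Return up to `limit` logged queries starting with `prefix`, most popular first.

    Hypothetical model only: a query is eligible when it has been searched at
    least `min_count` times and is not in the `blocked` set.
    """
    prefix = prefix.lower()
    candidates = [
        (count, query)
        for query, count in query_counts.items()
        if query.startswith(prefix) and count >= min_count and query not in blocked
    ]
    # Decreasing popularity; ties broken alphabetically so output is stable.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [query for _, query in candidates[:limit]]


# Invented query log. The numbers are placeholders, not real search volumes.
QUERY_LOG = Counter({
    "climate change": 900,
    "climategate": 700,
    "climate gate": 400,
    "climate guatemala": 5,
    "climate gatlinburg": 3,
    "climate zzz": 1,  # below the popularity threshold, never suggested
})

if __name__ == "__main__":
    print(suggest("climate", QUERY_LOG))
    print(suggest("climate g", QUERY_LOG, blocked={"climate gate"}))
```

In this toy model, blocking "climate gate" leaves only "climate guatemala" and "climate gatlinburg" for the prefix "climate g", which is the kind of result several commenters report. The sketch cannot tell a filter apart from other causes, such as a different ranking method or per-prefix indexing.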

  6. I use firefox. Got “climate guatemala” when I typed in “climategate” in the google search box. When I typed in “climate gat”, I finally got “climate gates” as second choice. “climate gatlinburg” was the first choice.

  7. I have seen this phenomenon off and on for the last couple weeks. I wish I had kept better track – it appears to be an intermittent phenomenon on both Bing and Google. Just now (~10:30 Pacific) I got a list of very pertinent results from Google [climategate, climategate emails, climategate scandal, climategate cnn, climategate fox news], but nothing from Bing. When I first read this post this morning, I didn’t get anything from either search engine. Yahoo shows climate gate. Yet if I put in something innocuous – like, SpaceShipTwo – I get a list of pertinent posts. I have never seen anything like this – it has been on and off the auto-suggest for weeks.

  8. I suspect that part of what’s happening is a junk filter on Google’s search engine, that properly used improves search results. But if enough loony lefties (some of whom may be Google employees) are submitting critical pages as junk, this filter could be subverted into censorship. An ebb and flow of politically charged search results could result from these false junk reports collecting and being cleaned out.

    But I wouldn’t dismiss the AlGore connection in how input to this filter is managed.

    Same problem wikipedia has on any topic where people have an axe to grind, where community review that otherwise results in better quality can backfire.

  9. Puzzling. No judgements, but I just did a Google search of “climategate” and got 12,800,000 hits. The first page are all sites like Fox, realclearpolitics, huffingtonpost, wikipedia, London Telegraph, cbsnews and six of the seven ads were “Pro-Hoax”. Quite contrary to what I expected after reading this post and comments.

  10. I tried the experiment, and as soon as I’d typed “clim” into Google, “climategate” was provided as an option. It produced 25,200,000 results. It seems strange that different people are reporting wildly differing results.

  11. I’m still getting climategate when I type “clima”. Have been all evening long. If you guys are getting something different at the same time, then I would suggest that there may be regional differences. I’m in Los Angeles. I’ve been hearing about this autocomplete thing for over a week now, but every time I test it myself in the browser search using google it always autocompletes as ‘climategate’, and this is on computers here where I live and on campus, so it’s not just one machine that has no trouble getting the proper return; I’ve got a sample of about 20 different machines. After testing so many and seeing none malfunction, I personally was beginning to think that those claiming the bug were agents provocateurs intent on making the skeptic movement look tin hatty. But I think there is a more rational explanation, that there are some regional google servers where this bug is happening.

  12. What seems to be happening is that the algorithm is dynamic, making suggestions based on an evolving assessment of the level of interest in a topic. So when I went on yesterday, Google suggested “climategate” after I got to the “g” (which makes a certain amount of sense). This morning, it suggested it earlier. I’m not sure that the search results are being modified, but given Google’s past behavior, I wouldn’t doubt that they’re fudging their news sections based on their ideological leanings. Of course, that’s no different than what any other newspaper does, by acting as a gatekeeper. The whole point of Google News was to avoid that, but as we see: that gatekeeping power couldn’t be left unused forever. Google certainly is (and to some extent always has been) used as a political tool by the left.

  13. Huh, I get “climategate” as the second choice for “climate” now, but not for “climat” even though all the phrases suggested start with “climate”.

  14. You’re all so civil. I’d expect every moral person to boycott wiki and google and, and, and, until the market provides an objective solution. Let them know why. Otherwise, it’s simply funding the algorithm for one’s demise.

    Let’s be clear. Climategate by whatever name has already exposed the largest THEFT in history. Sunk cost. However, we still have a worldwide cabal of co-conspiring politicians, bureaucrats, gov’t agencies, NGOs, VCs, publishers, grantmakers, CEOs of sham “green” cos (alternative energy like bio, solar, wind), scientists and agitators all working in concert to continue defrauding the public, in order to eliminate individual rights through increased taxation.

    Tens of thousands knowingly conspiring; ie, none can now claim ignorance. This is the same cadre of criminals that works to preclude US from exploiting national resources and building nuclear power plants. They’re stealing the future.

    Anyone supporting AGW is complicit. That includes any editor at a search engine or a scientific journal who alters the results to fit their view. These folks are not merely being socialists/Marxists/communists/fascists/statists, they’re flatout guilty of sedition.

    AGW is not a political issue any more than national healthcare is; both are solely about the theft of your money and your freedom. The person willing to cheat, be it about the science or the search results, is demonstrating the morality of a kapo. Call’em out.

    Climatology is not science; it’s a study.

    UN – withdraw, defund, evict.
    EPA, NASA, NOAA et al delenda est

  15. I get “climate gate” when I type out “climate gat” so it autocompletes the e at the end for me. And I get “climate guatemala” even when I’ve typed out “climategate”

  16. Just tried at google: at “climate” climategate was the second auto-complete suggestion, but at “climateg” it wasn’t in the top 10. it didn’t show up again until “climategat”, when it was top. the rest were all climategate related, mostly spelled “climate gate”.

    It looks like it may be based on actual search strings typed by users. Maybe most people who hear about climategate break it into two words?

  17. I now get it with “clim”. As far as I know, I haven’t made any searches on this word. Weird how this keeps changing.

  18. Now, it’s “clima”. This seems to change quite regularly. I wonder if other lookup terms change so much?

  19. remember google can give different results based on your web history now, even if you’re not logged in.

    But how can it change so much hour by hour? I don’t do that much in between my lookup attempts?

  20. I just tried it. after typing in “cli,” climate change is the top result, with climate gate being the second. i don’t understand the outrage.

  21. All I know Jeremy is that there have been times when “climategate” has disappeared completely.

  22. It might be possible to reverse engineer the google autocomplete algorithm from comments above. It seems obvious they have a dynamic blacklist which can be overruled by enough people searching for a term, which in turn can be reversed. It’s an unending voting system rather than an absolute blacklist. It would also seem that C-L-I and others like C-L-I-M are separate hits in the index (from the comments above.)

  23. But how can it change so much hour by hour?

    Karl, one of my friends who works at google says that when he wants to test an improvement, he can divert x% of the incoming requests to use his modified algorithm. Needless to say, there are safeguards so that he doesn’t hose the site. There are more opportunities for such experimental diversions on secondary functions such as AdWords and autocomplete than there are for the main search algorithm (which my friend does not even have access to). Because people are constantly trying to game the system with respect to AdWords (I’m not sure about autocomplete), there are frequent modifications to the algorithm, and again, only x% of the users see the impact of such modifications at first.
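
The "divert x% of the incoming requests" idea in comment 23 can be sketched as follows. This is an assumption-laden illustration, not a description of Google's infrastructure: `baseline_suggest`, `experimental_suggest`, and `handle_request` are hypothetical names, and the per-request random draw is just one simple way to split traffic.

```python
import random


def baseline_suggest(prefix):
    # Stand-in for the production algorithm (hypothetical term list).
    terms = ["climate change", "climategate"]
    return [t for t in terms if t.startswith(prefix)]


def experimental_suggest(prefix):
    # Stand-in for a modified algorithm under test (hypothetical term list).
    terms = ["climate change", "climate guatemala"]
    return [t for t in terms if t.startswith(prefix)]


def handle_request(prefix, experiment_fraction, rng=random):
    """Send roughly `experiment_fraction` of requests to the experimental path.

    The choice is made per request, so the same user can hit either
    algorithm on consecutive requests.
    """
    if rng.random() < experiment_fraction:
        return "experimental", experimental_suggest(prefix)
    return "baseline", baseline_suggest(prefix)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the simulation is repeatable
    seen = {"baseline": 0, "experimental": 0}
    for _ in range(1000):
        path, _suggestions = handle_request("climate", 0.05, rng)
        seen[path] += 1
    print(seen)
```

Under a split like this, two people typing the same letters at the same moment, or one person typing them twice, could see different suggestion lists without anyone editing a blacklist in between.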

  24. I’m not following you. No, of course they are not random noise. Certainly there are many ways of computing suggested search terms. My only point was that a user can make two requests a few seconds apart, and the first request might be processed using one algorithm, and the second request might be processed using a different algorithm. I’m not saying that this will happen, just that it could happen.

  25. Bob-1, the point is that all of these algorithms supposedly have the same final goal, delivering search terms that are frequently used and desired by the user. Hence, they should all have similar outcomes. An algorithm that delivers “climate guatemala” as a best fit to “climategate”, which I have gotten at times, is not such a one.

  26. “all of these algorithms supposedly have the same final goal, delivering search terms that are frequently used and desired by the user.”

    Karl, stop and think about all the different ways “terms that are frequently used and desired by the user” can be determined.

    The following is just off the top of my head – I’d write a better comment if I had more time, but I hope this gives you a sense of what a truly good answer would look like.

    One way: predict what the person is going to type based on all the other identical strings typed by other users

    Another way: predict what the person is going to type based on all the other identical strings typed by that user in the past.

    3rd way: same as one of the above, but factor in what other people actually clicked on, based on the words in the preview google provided in its search results.

    4th way: same as one of the first two, but factor in what the user in question actually clicked on, based on the words in the preview google provided in its search results.

    5th and 6th ways: same as 3rd and 4th ways, but factor in all the words in the body of webpages which have been selected, rather than just the words in the preview.

    7th way: start using statistical likelihoods that aren’t based on previous behavior by any user, but rather, likelihoods based on the frequency of word combinations found in english text.

    8th to Nth way: In addition to the above methods, use semantic networks which represent how words are related. In other words, start to actually parse what’s going on. There are an enormous number of ways to do this.

    I’m not suggesting that some of the methods listed above are particularly smart — they aren’t! I just wanted to list a few brain dead ways of doing it so that you’d see there are alternatives. Furthermore, some of the smartest ways of doing it might be a bit brittle when it comes to neologisms, which is not always unreasonable, if the methods are superior for standard English, as well as for terms such as “watergate” which have had time to work their way into the lexicon. If people are still talking about “climategate” in a few years, and you still have the same objections, I think you’d have a stronger case.
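
To make the contrast in comment 26 concrete, here is a rough sketch of two of the listed approaches: the "one way" (popularity of past queries) and the "7th way" (word-pair frequencies in ordinary text). Both functions, the tiny query log, and the sample text are invented for illustration, and neither is claimed to be what any search engine runs.

```python
from collections import Counter


def suggest_from_queries(prefix, query_counts, limit=3):
    """Rank previously logged queries that start with `prefix` by popularity."""
    matches = [q for q in query_counts if q.startswith(prefix)]
    return sorted(matches, key=lambda q: (-query_counts[q], q))[:limit]


def bigram_counts(text):
    """Count adjacent word pairs in a body of ordinary text."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def suggest_from_text(prefix, bigrams, limit=3):
    """Complete the last partial word using word-pair counts from text.

    Ignores user behaviour entirely, so a new coinage that is missing from
    the text can never be suggested.
    """
    *earlier, partial = prefix.lower().split(" ")
    if not earlier:
        return []  # this sketch needs at least one complete preceding word
    previous = earlier[-1]
    followers = {
        nxt: n
        for (prev, nxt), n in bigrams.items()
        if prev == previous and nxt.startswith(partial)
    }
    ranked = sorted(followers, key=lambda w: (-followers[w], w))[:limit]
    return [" ".join(earlier + [w]) for w in ranked]


# Invented data: placeholder counts and a made-up snippet of "English text".
QUERY_COUNTS = {"climate change": 900, "climategate": 700, "climate gate": 400}
SAMPLE_TEXT = (
    "climate change policy and climate change science differ "
    "but climate guatemala travel guides describe the weather"
)
BIGRAMS = bigram_counts(SAMPLE_TEXT)

if __name__ == "__main__":
    print(suggest_from_queries("climate g", QUERY_COUNTS))
    print(suggest_from_text("climate g", BIGRAMS))
```

With this made-up data the query-log method offers "climate gate" for the prefix "climate g", while the text-based method offers "climate guatemala", because the neologism simply does not occur in the sample text. That is the brittleness around new words the comment mentions.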

  27. Note that some of the simplistic methods I was talking about above depend on multiple-word inputs. To see what I mean, see what google autocomplete suggests when you type “hacked email cl”….

  28. How ’bout this for a new Google motto: “Don’t be Weevil!”

    Based on the following entry from Wikipedia, the analogy is compelling: the toxic results of small (minded) pinheads who manipulate search results in other people’s information storehouses under the cover of darkness should be brought out into the open and eradicated.

    A weevil is any beetle from the Curculionoidea superfamily. They are usually small, less than 6 mm (¼ inch), and herbivorous. There are over 60,000 species in several families, mostly in the family Curculionidae (the true weevils). Some other beetles, although not closely related, bear the name “weevil”, such as the biscuit weevil (Stegobium paniceum), which belongs to the family Anobiidae.

    Many weevils are damaging to crops. The grain or wheat weevil (Sitophilus granarius) damages stored grain. The boll weevil (Anthonomus grandis) attacks cotton crops. It lays its eggs inside cotton bolls, and the young weevils eat their way out.

    Weevils are often found in dry foods including nuts and seeds, cereal and grain products, such as pancake mix. In the domestic setting, they are most likely to be observed when a bag of flour is opened. Their presence is often indicated by the granules of the infested item sticking together in strings, as if caught in a cobweb. If ingested, E. coli infection and other various diseases can be contracted from weevils, depending on their diet.

  29. When the use of environmentalism as a tool for the creation of global government was first imagined, there was no internet. Free access to up-to-the-minute information, not filtered through the MSM, was not anticipated. When Hussein was elected, I feared even this boon to free information was insufficient. Then I learned the voting margins in most states were less than the automatic electoral bias resulting from the MSM’s mendacity. Now we see information on the AGW fraud spread like wildfire.

