This morning, the WSJ follows up on the GIGO-driven UAL stock drop. The piece, ironically dated tomorrow, describes the Tribune’s side of the story: that a single click on the 2002 story was the butterfly, if you will, that set off the “computer glitch”. From the Tribune’s point of view, that glitch began with Google’s web crawler:
“Tribune has offered details of the incident in pieces since Monday. In its latest explanation, Tribune said a single visit during a low-traffic period early Sunday morning pushed the undated story onto the list of most popular business news of its South Florida Sun-Sentinel newspaper’s Web site.
About 30 minutes after that visit, a user viewing a story about airline-cancellation policies during a storm-ravaged weekend clicked on the link for the old story. Seconds later, Google’s automated search agent, Googlebot, visited the Web site and found the story.
Soon after that, the story became available through Google News, and by Monday the article became more widely distributed to users of Bloomberg LP, the financial-news service widely watched on Wall Street.”
The story continues with Tribune pointing a finger at Google, followed by Google’s public response:
“Tribune said it previously had identified problems with Google’s automated search service and had asked Google to stop trolling Tribune Web sites for inclusion in Google News.
“Despite the company’s earlier request and the confusion caused by Googlebot and Google News earlier this week, we believe that Googlebot continues to misclassify stories,” Tribune said.
Google spokesman Gabriel Stricker said in a statement: “The claim that the Tribune Company asked Google to stop crawling its newspaper Web sites is untrue.””
So why did Google’s crawler pick up the result of this lone click? The Google News blog shares a chronology of events, including screenshots of the crawled pages. Its summary:
“On Saturday, September 6th at 10:36 PM Pacific Daylight Time (or Sunday, September 7th at 1:36 AM Eastern Daylight Time), the Google crawler detected a new link on the Florida Sun-Sentinel’s website in a section of the most viewed stories labeled “Popular Stories: Business.” The link had newly appeared in that section since the last time Google News’ Googlebot webcrawler had visited the page (nineteen minutes earlier), so the crawler followed the link and found an article titled “UAL Files for Bankruptcy.” The article failed to include a standard newspaper article dateline, but the Sun-Sentinel page had a fresh date above the article on the top of the page of “September 7, 2008” (Eastern).
Because the Sun-Sentinel included a link to the story in its “Popular Stories” section, and provided a date on the article page of September 7, 2008, the Google News algorithm indexed it as a new story. We removed this story as soon as we were notified that it was posted in error.”
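To make the failure mode concrete, here is a minimal Python sketch of the kind of date-inference rule the Google News summary describes. It is a hypothetical reconstruction, not Google’s actual algorithm, and `CrawledArticle`, `infer_publication_date` and `looks_like_new_story` are illustrative names. The point is that when an article carries no dateline of its own, falling back to the page-level date makes a six-year-old story look like today’s news.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CrawledArticle:
    url: str
    title: str
    dateline: Optional[date]   # date printed with the article itself, if any
    page_date: Optional[date]  # date shown in the page chrome (e.g. the header)
    newly_linked: bool         # link appeared since the previous crawl


def infer_publication_date(article: CrawledArticle) -> Optional[date]:
    """Hypothetical heuristic: trust the article's dateline, else the page date."""
    if article.dateline is not None:
        return article.dateline
    # The risky fallback: the page header shows *today's* date,
    # regardless of when the article was actually written.
    return article.page_date


def looks_like_new_story(article: CrawledArticle, today: date) -> bool:
    """Treat a newly linked article dated today as fresh news."""
    return article.newly_linked and infer_publication_date(article) == today


# The situation described above: no dateline, fresh page date, new link.
ual = CrawledArticle(
    url="https://example.com/ual-story",  # hypothetical URL
    title="UAL Files for Bankruptcy",
    dateline=None,
    page_date=date(2008, 9, 7),
    newly_linked=True,
)
print(looks_like_new_story(ual, today=date(2008, 9, 7)))
```

With a dateline present, the same rule would have returned the article’s real date and the story would not have qualified as new.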
One lesson from this: make judicious use of metadata tagging in your content storage and publication.
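On the publishing side, one way to apply that lesson today is to emit each article’s original publication date as machine-readable metadata, separate from whatever date the page template displays. The sketch below assumes schema.org `NewsArticle` markup in JSON-LD (`datePublished`, `dateModified`). That is a modern convention that postdates these 2008 events, and `news_article_jsonld` is a hypothetical helper, not part of any CMS.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def news_article_jsonld(
    headline: str,
    published: datetime,
    modified: Optional[datetime] = None,
) -> str:
    """Build a schema.org NewsArticle JSON-LD <script> tag (hypothetical helper)."""
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        # The article's own publication date, independent of the page header
        "datePublished": published.isoformat(),
    }
    if modified is not None:
        data["dateModified"] = modified.isoformat()
    return '<script type="application/ld+json">' + json.dumps(data) + "</script>"


# Placeholder headline and date, for illustration only
print(news_article_jsonld("Example headline", datetime(2002, 1, 1, tzinfo=timezone.utc)))
```

Keeping `datePublished` tied to the stored article rather than to the page template means an archive story that resurfaces on a “most popular” list still carries its real age.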