Saijo George

Curated by Saijo George


Friday, 9 Oct 2020

Googlers Talk About Their Indexing System Caffeine

http://search-off-the-record.googledevelopers.libsynpro.com

This was a really cool episode with Martin, John and Jerry. They talk about Google’s indexing system, Caffeine. Here are some key points:

  • it normalizes the HTML for processing (PDFs, spreadsheets, Word documents, Lotus files, etc. are also converted to HTML for processing)
  • it gets the content of the header tags and looks at their styling to gauge the relative importance of each header tag
  • it looks for some meta tags, such as robots meta tags
  • if it finds body-related HTML tags in the head (div, p, span, iframe, etc.), it closes the head just before that tag and starts processing the body from there
  • collapsor – the system that handles error pages tries to detect whether a page is a 404 page, so if you have content that this system might mistake for an error page, chances are it cannot be indexed by Google. (So if you hate someone, drop some random "404 page not found" messages into their pages and watch them pull out their hair trying to get those pages indexed.) This system can also flag an out-of-stock product page on an e-commerce site as a soft 404 page, depending on the words you use on the page.
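
The head-handling point in the list above can be roughly sketched in Python. This is only an illustration of the described behaviour, not Google's code: the class name `HeadScanner`, the helper `scan_head` and the `BODY_ONLY_TAGS` set are all made up for this example, and the tag set contains just the four tags named in the episode.

```python
from html.parser import HTMLParser

# Assumption: illustrative set only; the episode names div, p, span and iframe.
BODY_ONLY_TAGS = {"div", "p", "span", "iframe"}


class HeadScanner(HTMLParser):
    """Collects tags inside <head> until a body-only tag shows up."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.head_tags = []          # (tag, attrs) pairs still treated as head content
        self.closed_early_by = None  # the body-only tag that ended the head, if any

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head:
            if tag in BODY_ONLY_TAGS:
                # Treat the head as closed just before this tag.
                self.in_head = False
                self.closed_early_by = tag
            else:
                self.head_tags.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def scan_head(html):
    """Return (head_tags, closed_early_by) for an HTML string."""
    scanner = HeadScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.head_tags, scanner.closed_early_by


broken = ('<html><head><title>Shop</title><div>banner</div>'
          '<meta name="robots" content="noindex"></head><body></body></html>')
head_tags, closed_by = scan_head(broken)
print(closed_by)                    # the tag that cut the head short
print([tag for tag, _ in head_tags])
```

In this sketch, the robots meta tag that follows the stray div is no longer counted as head content, which is why it is worth checking that nothing body-related sneaks into the head of your templates.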

