Friday 9 Oct 2020

Googlers Talk About Their Indexing System Caffeine

http://search-off-the-record.googledevelopers.libsynpro.com

This was a really cool episode with Martin, John and Gary. They talk about Google’s indexing system, Caffeine. Here are some key points:

It normalizes the HTML for processing (PDFs, spreadsheets, Word documents, Lotus files, etc. are also converted to HTML for processing).
It gets the content of the header tags and looks at their styling to judge the relative importance of each one.
It looks for certain meta tags, such as the robots meta tag.
If it finds body-related HTML tags in the head (div, p, span, iframe, etc.), it closes the head just before that tag and starts processing the body from there.
Collapsor, the system that handles error pages, tries to detect whether a page is a 404 page. If your content could be mistaken for an error page, chances are it will not be indexed by Google. (So if you hate someone, drop a few "404 page not found" messages into their pages and watch them pull out their hair trying to get those pages indexed.) This system can also flag an out-of-stock product page on an e-commerce site as a soft 404, depending on the words you use on the page.
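
To make the point about body tags in the head more concrete, here is a minimal Python sketch for auditing your own pages. It is not Google's code and does not reproduce what Caffeine does internally. It only uses the standard library's html.parser to report the first tag inside the head that is not head metadata, plus any head-style tags that follow it and so would land after the point where the head is closed. The names HeadBreakFinder, find_head_break and HEAD_TAGS are made up for this example, and the tag list is a simplification.

```python
from html.parser import HTMLParser

# Assumption for this sketch: tags we treat as legitimate head metadata.
HEAD_TAGS = {"title", "meta", "link", "style", "script", "base", "noscript", "template"}


class HeadBreakFinder(HTMLParser):
    """Finds the first non-metadata tag written inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.breaking_tag = None  # first body-related tag seen inside <head>
        self.stranded = []        # head-style tags written after that tag

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and self.breaking_tag is None and tag not in HEAD_TAGS:
            # A parser that closes the head early would do so right before this tag.
            self.breaking_tag = tag
        elif self.in_head and self.breaking_tag is not None and tag in HEAD_TAGS:
            # Written in the head by the author, but located after the break.
            self.stranded.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_head_break(html):
    """Returns (breaking_tag, stranded_tags) for an HTML string."""
    finder = HeadBreakFinder()
    finder.feed(html)
    return finder.breaking_tag, finder.stranded


if __name__ == "__main__":
    page = (
        "<html><head><title>Shop</title>"
        "<div>cookie banner</div>"
        '<link rel="canonical" href="https://example.com/shop">'
        "</head><body><p>Hello</p></body></html>"
    )
    tag, stranded = find_head_break(page)
    print("head would be closed before:", tag)
    print("tags left after the break:", stranded)
```

In the sample page, the div is what breaks the head, and the canonical link that follows it is the kind of tag worth moving above any injected markup.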

Link | h/t: John Mueller

SEO

Curated by Saijo George
