Many Sites Don’t Welcome Any Web Crawler That Isn’t Google or Bing, Which Limits New Search Engines From Crawling the Web
https://www.fastcompany.com
Before a new search engine can hope to make a run against Google, it has to crawl. The web is a lot trickier to crawl than it was a few years ago.
“The high cost of maintaining a fresh index, and the decision by many large webpages to block most crawlers, significantly limits new search engine entrants,” the report stated. “Today, the only English-language search engines that maintain their own comprehensive webpage index are Google and Bing.”
That leaves many Google competitors renting the index Microsoft maintains for its Bing search, which has 6.4% of the U.S. market—compared to Google’s 87.3%—in Statcounter’s measurements.
DuckDuckGo and Neeva pointed to Facebook’s platform as one example. Its robots.txt file takes a guest-list approach, approving Google and Bing as well as such less obvious crawlers as “Applebot,” which gathers data for Apple’s Siri and Spotlight. But it excludes all bots not cited by name.
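To make the guest-list approach concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text below is a made-up illustration, not Facebook's actual file, and the `example.com` URLs, the `/private/` path, the `can_crawl` helper, and the "ExampleNewBot" crawler name are all assumptions chosen for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in the "guest list" style: a few named crawlers
# get their own rules, and the wildcard entry shuts out everyone else.
# This is an illustration only, not any real site's file.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: Bingbot
Disallow: /private/

User-agent: Applebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-page"
    for bot in ("Googlebot", "Bingbot", "Applebot", "ExampleNewBot"):
        print(bot, can_crawl(bot, page))
```

Under these assumptions, the three named crawlers are allowed to fetch the page, while a crawler from a new search engine falls through to the wildcard rule and is turned away from every URL on the site.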