Bing Webmaster Blog - Posts tagged with 'Index and crawling'

Bingbot, the Sequel

Early this month, in the Bingbot is Coming to Town blog post, we provided details about the impending change from msnbot to bingbot. It’s almost October 1st, so we thought we’d send out a quick reminder that this change is still in the works. Taking into account webmaster feedback, we have modified our rollout schedule a bit. Instead of doing a 100% switch on day one, we will do a staged rollout. What does that mean to...
Read More

Bingbot is coming to town

Back in June 2010, we published a blog post titled Bing crawler: bingbot on the horizon that announced our plans to retire our venerable web crawler, MSNBot, and replace it with the new bingbot. Our plans remain on track, and we want to remind you that this change will occur on October 1st, 2010. We also want to take this opportunity to help set some expectations for this process, to discuss the name change details, what the change means in terms of...
Read More

Bing crawler: bingbot on the horizon

Since our last post in November, the Bing team has been busy rolling out improvements to the Bing web crawler. As a result of this work, we want to announce in advance our plans to change the name of our crawler (aka user agent). Out of beta with a new name: on October 1st, 2010, we will drop the beta designation from the Bing crawler and change the name of the crawler to reflect Microsoft’s new brand for search. Instead of the old msnbot 2...
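For webmasters who keep crawler-specific rules, the practical effect of a user-agent rename is which token those rules target in robots.txt. A minimal sketch, with placeholder paths, of an existing msnbot group alongside a matching bingbot group:

    # Existing group keyed to the old crawler name
    User-agent: msnbot
    Disallow: /no-crawl/

    # Matching group keyed to the new crawler name
    User-agent: bingbot
    Disallow: /no-crawl/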
Read More

Crawl delay and the Bing crawler, MSNBot

Search engines, such as Bing, need to regularly crawl websites not only to index new content, but also to check for content changes and removed content. Bing offers webmasters the ability to slow down the crawl rate to accommodate web server load issues. Such a setting is not always needed, nor is it generally recommended, but it is available to webmasters should the need arise. Websites that are small (page-wise) and...
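That crawl-rate control is expressed as a Crawl-delay directive inside the robots.txt group for Bing's crawler (msnbot at the time of this post). A minimal sketch, with an illustrative value:

    # Ask MSNBot to pace its requests to this site more slowly
    User-agent: msnbot
    Crawl-delay: 5

As the excerpt notes, most sites should leave this unset; larger values reduce server load at the cost of how quickly new and changed content is picked up.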
Read More

How to remove URLs from our index (expanded edition)

This has been an excellent week in Webmaster Center. Our new Bing community forums are alive and vibrant with excellent questions and people who are willing to share their knowledge. I would like to take this opportunity to thank everyone who has participated. We look forward to your continued participation! I’ve received a couple of great questions on the forums lately and, although we have posted on a few of these...
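The full post walks through the removal options in detail; as a rough sketch of the two mechanisms most often involved (paths and markup below are illustrative, not taken from the post), a robots.txt disallow rule stops a section from being crawled, while a noindex robots meta tag keeps an individual page out of the index:

    # robots.txt: stop crawlers from fetching a whole section
    User-agent: *
    Disallow: /retired-content/

    <!-- On an individual page: let it be crawled, but keep it out of the index -->
    <meta name="robots" content="noindex">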
Read More

Getting out of the penalty box

Life can be cruel. You work hard to create new and compelling content for your site. You’ve studied legitimate SEO techniques. Everything is going well and your site is getting decent page rank scores across the board. Then it hits. A search engine penalty comes out of nowhere and knocks your site out of the index. Talk about a bad day. To be honest, most webmasters who get penalized know why it happened. They’ve used a myriad of...
Read More

Partnering to help solve duplicate content issues

One of the most common challenges search engines run into when indexing a website is identifying and consolidating duplicate pages. Duplicates can occur when any given webpage has multiple URLs that point to it. For example:
https://hubu5cn4ze.proxynodejs.usequeue.com/ - A webmaster may consider this their authoritative or canonical URL for their homepage.
https://vjbyp0ytgl.proxynodejs.usequeue.com/ - However, you can add ‘www’ to most websites and still get the...
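The excerpt cuts off before naming the mechanism, but the standard way to consolidate duplicates like these is the canonical link element: each duplicate variant declares the URL the webmaster prefers. A minimal sketch, with a placeholder URL:

    <!-- Placed in the <head> of every duplicate variant of the homepage -->
    <link rel="canonical" href="http://www.example.com/" />

Search engines treat this as a strong hint to consolidate indexing and ranking signals onto the preferred URL.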
Read More

Is your robots.txt file on the clock?

Just recently a strange problem came across my desk that I thought was worth sharing with you. A customer notified us that content from a site she was interested in was not showing up in our results. Wanting to understand why we may or may not have indexed the site, I took a look and stumbled upon an interesting but potentially very bad use of the robots.txt file. The first visit I made to the site had a very standard...
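The title hints at what turned up: a robots.txt whose contents change with the time of day. As an illustration of how different two fetches of the same file can look (the paths are purely hypothetical), compare a fairly standard robots.txt with one that shuts every crawler out:

    # What a "very standard" robots.txt might look like
    User-agent: *
    Disallow: /admin/

    # A later fetch of the same URL: every crawler is blocked from the entire site
    User-agent: *
    Disallow: /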
Read More

Robots Exclusion Protocol: joining together to provide better documentation

As a member of the Live Search Webmaster Team, I'm often asked by web publishers how they can control the way search engines access and display their content. The de facto standard for managing this is the Robots Exclusion Protocol (REP), introduced back in the early 1990s. Over the years, the REP has evolved to support more than "exclusion" directives; it now supports directives controlling what content gets included, how the...
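As a rough sketch of that range (the directive names below are commonly supported ones, not a list from this post, and support varies by engine), exclusion lives in robots.txt, while inclusion and display controls are usually expressed per page in a robots meta tag:

    # robots.txt: classic exclusion, plus later additions such as Allow and Sitemap references
    User-agent: *
    Disallow: /private/
    Allow: /private/public-report.html
    Sitemap: http://www.example.com/sitemap.xml

    <!-- Per-page directive controlling display: do not show a cached copy -->
    <meta name="robots" content="noarchive">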
Read More

Microsoft to support cross-domain Sitemaps

Today we’re pleased to announce an update to the Sitemaps Protocol, in collaboration with Google and Yahoo! This update should help many new sites adopt the protocol by increasing our flexibility on where Sitemaps are hosted. Essentially, the change allows a webmaster to store their Sitemap files just about anywhere, using a reference in the robots.txt file to establish a trusted relationship between the Sitemap file and the domain or...
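Mechanically, that trusted relationship comes from the Sitemap directive in robots.txt: the robots.txt served by the domain being crawled points at a Sitemap file hosted elsewhere, which tells the engines that the cross-domain file is authorized to list URLs for that domain. A minimal sketch with placeholder hosts:

    # robots.txt served from http://www.example.com/robots.txt
    User-agent: *
    Disallow:

    # Sitemap hosted on a different domain, now trusted to list www.example.com URLs
    Sitemap: http://sitemap-host.example.net/sitemap-for-example.xml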
Read More