I’ve noticed that on one of our projects, a significant number of pages we submit to Google via sitemap appear in the Google Webmaster crawling statistics as “crawled” but “not indexed”.
What is the indexation rate?
Googlebot crawls every page as soon as it can get hold of it.
Yet Google doesn’t want to pollute its index instantly. Depending on the trust in your website, the time until a page is allowed into the index ranges from seconds (news sites) to weeks (newly built sites). Of course more factors are relevant for getting a site ranked in Google, but in this blog post I’ll only consider indexation.
Now the question is: what is the problem?
The Webmaster Help pages briefly explain the most likely causes: either content is duplicated, i.e. available via two or more similar URLs without canonical tags, or the site lacks relevant content.
- Pages are duplicates or non-canonical
- The number of indexed pages is almost always significantly smaller than the number of crawled pages
Neither applied to us, since we found the following:
- We have pages that are indexed despite having little content
- We have set up all canonical tags correctly
- We have pages with content that are nonetheless not indexed
Still, the number of indexed pages is significantly smaller than the number of crawled pages.
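For reference, a canonical tag tells Google which URL is the preferred one among several duplicates. A minimal sketch of how this looks in practice (the URL is a placeholder, not one of our actual pages):

```html
<!-- Placed in the <head> of every duplicate variant of a page,
     e.g. a filtered or paginated version. The href points to the
     one URL that should be indexed. -->
<link rel="canonical" href="https://www.example.com/articles/indexation-rate" />
```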
After more research, I found that some content is quite well hidden inside the application.
So the behavior was like this:
- Google finds subpages, but it takes quite a while
- Internal linking therefore needs to be strengthened
So we will also add subpages to our chronological page index.
That should make these pages easier to crawl.
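A sketch of what the strengthened page index could look like, assuming a simple HTML archive page (the URLs and titles are hypothetical examples):

```html
<!-- Excerpt of the chronological page index.
     Previously only top-level pages were listed; now their
     subpages are linked directly, so Googlebot reaches them
     in one hop from a well-crawled page. -->
<ul>
  <li><a href="/2014/05/some-article">Some article</a>
    <ul>
      <li><a href="/2014/05/some-article/part-2">Some article, part 2</a></li>
      <li><a href="/2014/05/some-article/part-3">Some article, part 3</a></li>
    </ul>
  </li>
</ul>
```

Keeping the subpage links nested under their parent preserves the chronological structure of the index while still exposing every URL to the crawler.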