Diagrams for Solving Crawl Priority & Indexation Issues
December 28, 2009
Google’s Indexation Cap
December 28, 2009
Google (very likely) has a limit it places on the number of URLs it will keep in its main index and potentially return in the search results for domains.
Let’s examine some of the potential metrics Google looks at to determine indexation:
- Importance on the Web’s Link Graph
We’ve talked previously about metrics like a domain-level calculation of PageRank (Domain mozRank is an example of this). It’s likely that Google would make this a backbone of the indexation cap estimate, as sites that tend to be more important and well-linked-to by other important sites tend to also have content worthy of being in the index.
- Backlink Profile of the Domain
The profile of a site’s links can look at metrics like where those links come from, the diversity of the different domains sending links (more is better) and why those links might exist (methods that violate guidelines are often getting caught and filtered so as not to provide value).
- Trustworthiness of the Domain
Calculations like TrustRank (or Domain mozTrust in Linkscape) may make their way into the determination. You may not have as many links, but if they come from sites and pages that Google trusts heavily, your chances for raising the indexation cap likely go up.
- Rate of Growth in Pages vs. Backlinks
If your site’s content is growing dramatically, but you’re not earning many new links, this can be a signal to the engine that your content isn’t “worthy” of ongoing attention and inclusion.
- Depth & Frequency of Linking to Pages on the Domain
If your home page and a few pieces of link-targeted content are earning external links while the rest of the site flounders in link poverty, that may be a signal to Google that although users like your site, they’re not particularly keen on the deep content – which is why the index may toss it out.
- Content Uniqueness
Uniqueness is a constantly moving target and hard to nail down, but basically, if you don’t have a solid chunk of words and images that are uniquely found on one URL (ignoring scrapers and spam publishers), you’re at risk. Google likely runs a number of sophisticated calculations to help determine uniqueness, and they’re also, in my experience, much tougher on pages and sites that don’t earn high quantities of external links to their deep content with this analysis.
- Visitor, CTR and Usage Data Metrics
If Google sees that clicks to your site frequently result in a click of a back button, a return to the SERPs and the selection of another result (or another query) in a very short time frame, that can be a negative signal. Likewise, metrics they gather from the Google toolbar, from ISP data and other web surfing analyses could enter into this mix. While CTR and usage metrics are noisy signals (one spammer with a Mechanical Turk account can swing the usage graph pretty significantly), they may be useful to decide which sites need higher levels of scrutiny.
- Search Quality Rater Analysis + Manual Spam Reports
If your content is consistently reported as being low value or spam by users and/or quality raters, expect a visit from the low indexation cap fairy. This may even be done on a folder-by-folder basis if certain portions of your site are particularly egregious while other material is index-worthy (and that phenomenon probably holds true for all of the criteria above as well).
Now let’s talk about some leading indicators that can help to show if you’re at risk:
- Deep pages rarely receive external links – if you’re producing hundreds or thousands of pages of new content and fewer than “dozens” earn any external link at all, you’re in a sticky situation. Sites like Wikipedia, the NYTimes, About.com, Facebook, Twitter and Yahoo! have millions of pages, but they also have dozens to hundreds of millions of links, and relatively few pages that have no external links. Compare that against your 10 million page site with 400K pages in the index (which is more pages than what Google reports indexing on Adobe.com, one of the best linked-to domains on the web).
- Deep pages don’t appear in Google Alerts – if Google Alerts is consistently passing you by and not reporting your deep pages, this can be (but isn’t universally) an indication that they’re not perceiving your pages as being unique or worthy enough of the main index in the long run.
- Rate of crawling is slow – if you’re updating content, links and launching new pages multiple times per day, and Google’s coming by every week, you’re likely in trouble. XML Sitemaps might help, but it’s likely you’re going to need to improve some of those factors described above to get in good graces for the long term.
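To put rough numbers on the first indicator, here is a minimal sketch in Python. It assumes you already have a count of indexed pages and a per-URL count of external links from whatever link tool you use; the function names and the sample data are hypothetical and only illustrate the arithmetic, not anything Google actually computes.

```python
def indexation_ratio(indexed_pages, total_pages):
    """Share of a site's pages that appear in the index."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return indexed_pages / total_pages


def linked_share(external_links_per_page):
    """Share of pages that have at least one external link.

    external_links_per_page maps a URL path to its external link count.
    """
    if not external_links_per_page:
        return 0.0
    linked = sum(1 for count in external_links_per_page.values() if count > 0)
    return linked / len(external_links_per_page)


# The 10 million page site with 400K indexed pages from the example above.
ratio = indexation_ratio(400_000, 10_000_000)
print(f"{ratio:.1%}")  # 4.0%

# Hypothetical link counts for four pages: only two have any external links.
sample = {"/": 120, "/guide": 3, "/a": 0, "/b": 0}
print(linked_share(sample))  # 0.5
```

Tracking both numbers over time, rather than reading a single snapshot, is probably the more useful habit: a falling linked share while the page count grows is the pattern described above.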
New Keyword Tools: Grouping and Niche Term Discovery
December 16, 2009
Wordstream has released two new “Free” Keyword Tools for both SEO and SEM uses. These tools are actually a bit more refined and produce, at a glance at least, effective results when you test them out.
1. Keyword Grouping – this tool lets you dump in a list of keywords and have them grouped into the most common combined searches. It also spits out more related keyword streams that supposedly get traffic within each keyword group.
View screen shots below to see an example:
I basically took a list of the top entrance keywords that we lost the most visits for through Analytics, and dropped the list into the box.
Here are the results: You can see suggested groups and tail endings to discover long tail grouped keywords.
2. Keyword Niche Finder: This tool works similarly to the keyword grouping tool, but instead of submitting a list of keywords to group, you submit a single head term to discover its common tail endings, plus more long tail niche keywords within each grouped tail ending of the main head term.
These tools seem to work pretty well for keyword discovery and grouping, but in order to have the Keyword Lists e-mailed back to you, or to filter the results, you need to sign up for a Paid Account.
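If you want to try the grouping idea on your own keyword list without an account, a rough approximation is easy to sketch. The Python below is not how Wordstream's tools work internally (that isn't documented here); it simply groups phrases under every non-trivial word they share. The stop-word list, the function name and the sample keywords are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stop words; a real tool would use a much longer list.
STOP_WORDS = {"a", "an", "the", "for", "in", "of", "to", "and"}


def group_keywords(keywords, min_group_size=2):
    """Group phrases under every non-stop word that they share."""
    groups = defaultdict(list)
    for phrase in keywords:
        # Use a set so a word repeated inside one phrase counts only once.
        for word in set(phrase.lower().split()):
            if word not in STOP_WORDS:
                groups[word].append(phrase)
    # Keep only terms shared by at least min_group_size phrases.
    return {
        word: sorted(phrases)
        for word, phrases in groups.items()
        if len(phrases) >= min_group_size
    }


keywords = [
    "cheap car insurance",
    "car insurance quotes",
    "home insurance quotes",
    "car loan rates",
]
for term, phrases in sorted(group_keywords(keywords).items()):
    print(term, "->", phrases)
```

With this sample list the shared terms come out as "car", "insurance" and "quotes"; each phrase can land in more than one group, which is the point when you are looking for overlapping niches.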
Keyword Discovery for New Content
December 11, 2009
One effective way to discover new content ideas and keywords for your website’s SEO campaign is to analyze the entrance keywords for relatively new content on your site. Take, for example, articles that you loaded within the past 30 days and that are beginning to get some good traction. It seems as if Google tests out your content against different keyword sets, continues sending you traffic for the keywords it deems a good match for your content, and stops sending you traffic for the keywords it does not deem a good fit.
View the screen shots below to see an example of how this works.
Step 1 – View Top landing pages and look at the traffic traction of new content that you have added. Identify content that first picked up traffic and then lost it. Compare the most recent 2-3 weeks to a month against the previous period.



Step 2 – Now view the entrance keywords. You’ll notice the first few pages are newer keywords you are getting traffic for, and the pages towards the end are keywords you lost traffic for. Look at those keywords to identify common phrases/head terms.
Step 3 – Check Traffic for head terms you identify
Through this process you can find several new head terms/phrases that receive quite a bit of traffic and warrant individual topics to build content around.
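Steps 1 and 2 can also be done outside the Analytics interface if you export the entrance keywords for both periods. The Python below is a minimal sketch under that assumption: each period is a plain mapping of keyword to visits, and every name and number in it is made up for illustration.

```python
from collections import Counter


def lost_keywords(previous, current):
    """Return keywords whose visits dropped, with the size of the drop."""
    drops = {}
    for keyword, before in previous.items():
        after = current.get(keyword, 0)
        if after < before:
            drops[keyword] = before - after
    return drops


def common_terms(drops, top_n=3):
    """Weight each word by the visits lost across the phrases containing it."""
    weights = Counter()
    for keyword, lost in drops.items():
        for word in set(keyword.lower().split()):
            weights[word] += lost
    return weights.most_common(top_n)


# Hypothetical exports: keyword -> visits for the previous and current period.
previous = {
    "mortgage rates today": 40,
    "best mortgage rates": 25,
    "refinance calculator": 10,
}
current = {
    "mortgage rates today": 5,
    "refinance calculator": 12,
    "fha loan limits": 8,
}

drops = lost_keywords(previous, current)
print(drops)
print(common_terms(drops))
```

The words that carry the most lost visits are the candidate head terms for Step 3. The sketch does not filter out filler words such as "best" or "today", so expect to prune the list by hand.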
How fake sites trick search engines to hit the top
December 9, 2009
The key to getting ranked well in a search result is to do everything that a search engine bot would deem a sign of “quality content”, which will give the site a higher ranking.
In this example, Stickley created a fake site (creditunionofsc.org) with the consent of Credit Union of Southern California (cusocal.org), which focused on tricking the search engines into believing that the crawlers were scanning a legitimate site. All Stickley did was put link after link inside the site to create the appearance of “depth”, even if the links led to only the same picture of the credit union’s front page.
He ranked #2 on Yahoo search and #1 on Bing search (both results have already been removed). However, he never made it past the 6th page on Google (which handles over 2/3rds of all search traffic in the US).
An extremely well trafficked site such as Bank of America would always outrank a fake site, but hackers have been known to hack into education websites such as university sites, stuff them with links to the scam site, and then via “link building”, make the search engine interpret the scam site as a legit site.
Original Source: http://news.yahoo.com/s/ap/us_tec_search_engine_safety










