
SEO and CMS: Implementing Best Practices

Embedding Search Engine Optimization tactics during a CMS deployment
Release Date: February 2006
Authors: Randy Woods, Julie Batton
Download this paper at www.nonlinear.ca/seo-cms

Table of Contents

Introduction
Section One: Search Engine Optimization Best Practices
Section Two: Embedding SEO Best Practices in CMS Implementations
Section Three: A Case Study
About non-linear creations inc.

Introduction

Two key areas of practice at non-linear creations are Search Engine Marketing and Web Content Management (CMS) solution implementation. We have completed more than 40 CMS implementations – and have helped dozens of companies increase their profile on major search engines. This experience has provided unique insight into the intersection of search engine optimization activities and content management deployment. Whether recognized or not, SEO and CMS are intimately and deeply connected. This document discusses the most important lessons we have gleaned during real-world engagements in both areas of practice. We hope that you and your team will find this document useful. But we are even more hopeful that you will decide to give us a call and ask for our assistance.

non~linear creations inc.

You can reach us by email at [email protected], or by phone at 613.241.2067 ext.234.


For whom did we write this document? Responsibility for SEO lies at the intersection of marketing and IT. This document is designed to allow these two sometimes antagonistic groups to converse. In particular, we hope this document will benefit:
• Marketing teams and e-business executives
• Directors of IT or CIOs
• The senior management of small companies

This document may be particularly valuable in two scenarios:
• You are contemplating the deployment of a content management solution
• You are struggling with the effect of a CMS deployment on your site's search engine rankings

What this document is not... Before asking you to wade through this document, we want to be clear on its contents. This document is not:
• A silver bullet that will improve your site's rankings on search engines for highly competitive terms
• A review of the questionable search engine optimization tactics deployed by less-than-ethical practitioners
• A guide to content management governance questions or implementation methodologies

It is an attempt to sketch established tactics for improving the visibility of your site to search engines, and a discussion of how a content management system can be used to drive adoption of these tactics.


Structure of this Document
This document is divided into three sections:

Section One: An Overview of Search Engine Optimization Best Practices
• No one knows for certain how the major search engines rank results – this quick overview describes those tactics the SEO community generally agrees are effective.

Section Two: Embedding SEO Best Practices in CMS Implementations
• A review of the SEO tactics that you can embed – or enforce – with a content management solution.

Section Three: A Case Study
• We're not just making this stuff up. Honest. Some real-world results.

But before we start... The remainder of this document assumes familiarity with both web content management solutions and the concept of search engine optimization. If you are comfortable with both of these areas, feel free to skip this section. If, on the other hand, you lack an understanding of either SEO or CMS, this brief introduction to central concepts should render the rest of this document at least vaguely intelligible.

A Very Brief Introduction to Search Engine Optimization
Type "define SEO" into any search engine and you'll get a near-ridiculous number of responses. My two favourites are:
• "The altering of a Web site, Web pages and links to Web sites and Web pages to improve visibility, rank and relevance in the organic, crawler-based listings of search engines"; and
• "Search Engine Optimization represents the Ying or Female Principle in that it is more fluid and receptive to the algorithms of the search engines, which of course you do not control... It is like Judo where you use the momentum and power of the search engines to build your business."

I like the first definition (from marketingprofs.com) for its concise description of the field. I love the second (from Ebizbydesign.com) for underscoring the goofiness that all-too-often pervades the field. (There's a reason for the goofiness – no one truly knows how the search engines rank pages, so wild speculation is common.) What does that mean in practice?


Setting the Ying versus Yang controversy aside, search engine optimization can be a remarkably cost-effective marketing technique for driving qualified traffic to your web site. SEO practitioners spend a lot of time scratching their heads and trying to intuit how engines like Google decide which site appears number one and which appears number 345,211. Following this head scratching, they make changes to the web site or solicit external references to the site in order to influence how the major search engines rank the site for specific terms.


What value does it deliver? For companies in competitive fields that depend heavily on their web site to drive sales or entice new prospects, ranking well in organic search results can be the difference between thriving and bankruptcy.

Why does it fail? Successful SEO is hard. The major search engines and search engine optimization practitioners have been locked in an arms race since the advent of the first search engine. As SEO practitioners attempt to reverse engineer the search engine algorithms, the search engines modify their search models to reduce spam listings. With each algorithm tweak, results rearrange and it's back to the drawing board for web site operators. It is particularly difficult to consistently rank well for search terms that are considered valuable by many firms – ranking for these "competitive" terms is a zero-sum game and, in addition to trying to understand the search engine, you're battling other, equally motivated organizations.

A Very Brief Introduction to Web Content Management Solutions
What are they? A minimal definition of a content management solution (CMS) would be software that allows non-technical users to publish content to a web site. Most CMS solutions, however, address a broader range of issues including content approval, rotation and archiving, and consistency of presentation.

What value do they deliver? Content management systems promise reduced IT involvement in content publishing, streamlined workflow and approval processes, simple rollback features and reduced costs of operation. They can also allow the enforcement of corporate governance policies and brand strategies.

How is this related to SEO? In most implementations, the content management solution prescribes the format in which content will be presented. Elements defined by the CMS vary by product but commonly include:
• The mandatory completion of specific fields, such as meta data
• The HTML code in which the content will be presented
• The navigation structure in which the content will be embedded

Commonly pursued SEO tactics involve specifically tuning each of these elements such that they appear more attractive to the algorithms that underlie search engines. Careful forethought allows you to use a CMS to enforce SEO best practices for all content on the site.



Section One


Search Engine Optimization Best Practices

Short Tail vs. Long Tail Chris Anderson’s seminal article on long tail economics was published in Wired in October 2004. In it, he argues that the internet changes the economics of the publishing and entertainment industry. It’s his view that the economic infrastructure of the 20th century conspired to create “blockbusters” – a small group of movies, books and songs that generated the vast majority of revenues. In contrast, he claims that the 21st century will see a shift to revenue generation by many, many small, niche items. What does this have to do with optimizing your search engine rankings? Surprisingly, the dynamic that Anderson captures as the long tail economy also defines the tactics available to you for increasing traffic driven to your site by search engines. Repeated studies, including this 2003 paper (http://www2003.org/cdrom/papers/refereed/p017/p17lempel.html), have determined that the frequency with which a term or phrase is typed into a search engine approximates a power law.


The following graph captures the frequency with which a term was searched upon at a major retailer site over a one month period. To provide a sense of scale, the most popular search was performed 2229 times; 6821 queries were performed only once (we have cheated by truncating the graph along its x axis. When we include all queries performed only once, the red zone becomes almost invisible. Which, of course, is the point.)


The internet’s tendency to create patterns that follow power laws underlies both Anderson’s economic proposals and the reality of search engine optimization. This leaves you with a choice of two tactics for increasing the traffic driven to your site by search engines.

1. Tactic Number One – The Short Tail
Target the red zone in the graph above. Invest time, energy and acumen to drive a handful of pages to the top of the search engine rankings for search terms important to your business. This is expensive, difficult and unpredictable, but it can produce remarkable results. This tactic is necessarily limited to a handful of pages that target high-value search terms – we'll call this "short tail SEO."

2. Tactic Number Two – The Long Tail
Target the blue zone. Drive traffic by implementing commonly-recognized best practices for search engine optimization on all pages of your site. Ensure that search engines index each page and mandate the use of low-cost, replicable tactics that raise search engine prominence. This approach won't give you a first place ranking on a two-word competitive search phrase (like "content management") but it will ensure rankings for a large number of rarely sought but highly valuable search phrases (like ".net content management hosted less than 200 monthly" and "maintaining search rankings when implementing a cms"). Call this "long tail SEO."

You might think of these strategies as analogous to investing in the stock market. An investor pursuing a short tail strategy would conduct deep research, competitive and technical analysis and elicit the thoughts of advisors. Based on this knowledge, she would invest her entire portfolio in a handful of stocks with the potential for exponential growth. If she is right, the return is enormous and the efforts more than justified. But if she is wrong.... In contrast, an investor pursuing a long-tail strategy would buy a mutual fund of stable stocks with growth potential. Little time, energy or cash is invested in any one stock. On average, the investor expects a reasonable return. Content management solutions cannot help you with Short Tail Tactics. They will not write copy finely tuned to search engine algorithms, nor will they solicit links from high profile, reputable sites.


But they can absolutely let you exploit Long Tail Tactics. By using your CMS to enforce the use of SEO best practices you can ensure that search engines find and index all of your site and increase the likelihood of your site being found for a very broad range of searches.


The contentious world of what matters
Or, more specifically, what matters to search engines. None of the major search engines makes public the algorithms they use to sift through billions of documents and return 10 listings. As a result, just about every SEO practitioner holds a different perspective on the factors that matter. In the absence of a definitive understanding of search engine algorithms, SEO practitioners have experimented, observed, messed with and republished web site content to try to gain insight into their inner workings. Most of these efforts are driven by short-tail considerations – the desire for one page to rank highly for a very popular search term. And there is much disagreement over the tactics to be pursued to achieve superior rankings for search terms that are deeply competitive.

Fortunately for us, a tentative consensus has been reached on which factors are important for long-tail search engine tactics. In general, these factors can be divided into four categories:
• Site-wide factors – Issues that can be addressed for the entire site, such as the creation of a site map. Frequently, site or content management administrators can control or influence these factors.
• On-page factors – Factors that vary with each page on the site, such as the title of the page. Individual authors frequently control or influence these factors. Appropriate CMS configuration can help ensure authors follow best practices.
• Off-site factors – Search engines are increasingly relying on factors such as the number of external sites linking to a page and the behaviour of visitors on a given site to determine rankings. CMS solutions can only address these factors tangentially.
• Negative factors – A wide variety of issues can lead to search engines reducing the ranking of a page – these range from unreliable site uptime to the inappropriate use of redirects. Many of these factors can be addressed with an effective content management solution.

This document addresses only those factors thought to influence search engine rankings that are relevant to content management deployment – in other words, only those that support your long tail marketing strategy. For a comprehensive discussion of all search factors, see Search Engine Ranking Factors, available at http://www.seomoz.org/articles/search-engine-rankingfactors.doc.


Site-Wide Factors


1. W3C Compliant Code
A page that is coded to match W3C XHTML standards is simpler for the search engines to parse and ensures all content is readable.

2. Site Maps
A comprehensive site map ensures that search engine spiders can find and index each page on the site by following a single link. There is some speculation that labeling the link to the site map with the words "site map" improves the likelihood of search engines indexing the content. For multilingual sites, site maps should be maintained in each language. Most major search engines categorize content by language, and language-specific site maps may increase the likelihood of appearance in non-English listings.

3. Google and Yahoo Site Maps
Google and Yahoo site maps significantly increase the likelihood of all of the content on your site being indexed by these engines.

Google announced Google Sitemaps in June 2005, describing it as an "easy way for you to submit all your URLs to the Google index and get detailed reports about the visibility of your pages on Google." For each page on your site, the Google site mapping protocol allows you to specify:
• the URL of each page;
• the date on which the page was last modified;
• the frequency with which the page is updated; and
• the importance of the page relative to the rest of the pages on your site.

Google sitemaps follow a straightforward XML protocol, sketched below. In late 2005, Yahoo quietly made available a similar feature. Yahoo search feeds allow you to submit a comprehensive list of every page on your site to Yahoo as an XML "feed." Unlike Google, this list must be manually submitted to Yahoo.
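For illustration, a minimal one-page sitemap in Google's protocol might look like the following. The URL and values are hypothetical; the namespace shown is the 0.84 schema Google published with the feature:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
      <url>
        <loc>http://www.example.com/cms-seo.html</loc>
        <lastmod>2006-02-01</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
      </url>
    </urlset>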

4. Site Navigation as Text
Virtually every site has a main system of navigation – links to the topics and subtopics contained on the site. Often, this navigation persists throughout the site. On many sites, web designers have used linked graphic images to define site navigation. This allows them to control exactly how the link will appear. Search engines appear to consider the text of a link to a page when indexing that page. For this reason, it is important that all links to a page on a site – particularly those from the index page – are descriptive and readable by the search engines. Search engines do not read images. Using cascading style sheets, designers can maintain control over the look and feel of navigation elements while using text instead of graphics.


5. Use of Search Engine Friendly URLs
Most major search engines can follow dynamically generated URLs (if they couldn't, Amazon.com listings would never appear in search results). But there is considerable evidence that spiders only persist in spidering URLs with several dynamic parameters when a site is considered of high importance. For most sites (that lack Amazon's standing) a short static URL – such as http://www.nonlinear.ca/cms-seo.html – is more likely to be indexed than a long dynamically-generated URL – such as http://www.nonlinear.ca/default.asp?2345AS-0,2097,1-1-301458-81-116,00.html. If you are unable to make use of static URLs that contain search phrases, try to use no more than two dynamic parameters in the URL.

6. Publish to a Flat Directory
There is some evidence that the deeper in your site a page resides, the lower the importance a search engine will place on the page. The depth of the page appears to be determined in part by the number of directories that are apparent in the URL. For example, http://www.nonlinear.ca/cms/seo/guide.html would appear to be two directories deeper in the site than http://www.nonlinear.ca/cms-seo.html. By using a relatively flat file structure – no more than two directories deep – you may be able to increase the relevance assigned to pages by some search engines.

7. Eliminate Broken Links
There is considerable disagreement in the SEO community over whether search engines penalize sites that contain broken links. Eliminating them certainly will not hurt your rankings, and it will help your visitors. Best practices include:
• Establishing a process for finding and eliminating broken links
• Creating a custom "404" page, referencing your site index, that appears if a link to your site is broken

8. Appropriate Use of robots.txt
The robots.txt file allows you to exclude pages or sections of your site from the search engine index. Search engine spiders request this file on arrival at a site. Two problems are associated with robots.txt:
• If robots.txt does not exist, the search engine will receive a 404 error (there is some dispute as to whether this matters, but it is probably wise to avoid it).
• An improperly constructed robots.txt file can prohibit a search engine from indexing the site at all.
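To make that second risk concrete, here is a minimal, hypothetical robots.txt that excludes one directory while leaving the rest of the site open to all spiders (the directory name is illustrative):

    User-agent: *
    Disallow: /staging/

    # Caution: "Disallow: /" (a single slash) blocks the ENTIRE site --
    # a one-character mistake that keeps every major engine out.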


On-Page Factors


1. Effective Title Tags
There is general agreement that the title tag of a page is one of the most important factors influencing its ranking on major search engines. To be effective, a title tag should include key search terms relevant to the page, placed as close to the front of the title as possible. It should also read well to searchers, as most search engines present the title tag as part of the returned result. There is little point in obtaining a first place ranking if searchers do not click through to your site.

2. Effective Meta Description Tags
Search engines do not appear to use the description meta tag in determining the relevance of a page; however, when a search engine returns a link to your site, the accompanying text often includes all or part of the text contained in the description meta tag. For this reason, every page should include a compelling description of the content of the page as a meta description.

3. Reduce Code Clutter
By reducing the amount of code presented to a visiting search engine spider, you can increase the prominence of the content you care about – the text you have written. Two approaches can be used to reduce code clutter:
• Use cascading style sheets to format the page. This allows you to position key text above much of the HTML defining the page, while still having the page rendered in the browser as you wish.
• Reference javascript through the use of "includes" rather than placing it in the body of the document. In this approach, the javascript code resides outside of the HTML file rather than littering the page with itself.
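As a sketch of the second approach (the file name is hypothetical; the pattern is standard HTML):

    <!-- Cluttered: dozens of lines of script push your text down the page -->
    <script type="text/javascript">
      /* ...menu logic inline... */
    </script>

    <!-- Cleaner: one line, with the logic moved to an external file -->
    <script type="text/javascript" src="/js/menu.js"></script>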


4. Image Alt Attributes
Search engine spiders cannot read images, but they can parse and process the "alt attributes" associated with an image. Alt attributes or "alt tags" are intended to contain a short text description of an image. The alt attribute text is critical if the image is a link; it is also important for allowing the image to be located by searchers performing an image search on sites such as http://images.google.com and http://www.yahoo.com/images.

5. Ensure Links can be Processed
Search engines have become increasingly sophisticated at analyzing non-HTML formats such as Flash and javascript. In some cases, they may be able to find and follow URLs embedded in these formats. But it is foolish to count on it. Avoid using Macromedia Flash for key navigational elements. Be sure to include alternative text links for any links embedded in either javascript or Flash.

6. Avoid Spelling Errors
SEO practitioners speculate that leading search engines may include the quality of grammar and spelling on a page when evaluating its quality. This is subject to some dispute – the evidence is largely observational: most pages ranking highly in Google tend to be well written, without spelling mistakes. Regardless of its effect on search rankings, a well-written page is likely to more effectively influence your target audience. At a bare minimum, it makes sense to spell-check a page before publication.

7. Use of Key Words in URLs
The presence of search terms in the URLs of pages returned by search engines is readily apparent. For example, while writing this document we performed a search on "CMS SEO" at Google.com; the results included URLs containing both terms. The use of the search terms in the URL appears to contribute to search engine ranking. It certainly contributes to the likelihood of a searcher clicking through to your site. One note of warning – several SEO practitioners believe that using more than two hyphens in a URL may trigger anti-spam penalties at key search engines.

8. Effective Content Structuring
Visitors to a web site tend to scan a page rather than reading it paragraph by paragraph. For that reason, headings and highlights are important for letting visitors assess the page at a glance. A hierarchical structure is also important to search engine rankings. Structuring content using heading tags (h1, h2, etc.), bold and italics can indicate the general theme of the page to the search engine spider. Avoid overuse of heading tags – there is some evidence that multiple H1 tags, for example, can trigger anti-spam penalties and reduce your ranking.
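A short sketch of such a hierarchy (the headings and copy are hypothetical); note the single h1 at the top:

    <h1>Content Management Best Practices</h1>
    <p>Introductory copy containing the page's key search phrases...</p>
    <h2>Embedding SEO in Templates</h2>
    <p>Supporting detail, with <b>bold</b> used sparingly for emphasis...</p>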

9. Use of Descriptive Text in Internal Links
It is widely believed that one element search engines use to determine the content of a page is the words that appear in links to that page. For example, a link that reads "Content Management Best Practices" provides a strong indication that the page to which it links discusses both content management and best practices. You may have little influence on the text content of external links to your site, but authors have complete control of the text in internal links. Consider these examples:
• Click here to view our whitepaper on multilingual workflow
• View our whitepaper on multilingual workflow
The first example would not contribute to the ranking of the target page for the search phrase "multilingual workflow"; the second might.
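In HTML terms (the path is hypothetical), the difference is simply which words sit inside the anchor:

    <!-- Weak: the anchor text carries no information about the target -->
    <a href="/whitepapers/multilingual.html">Click here</a> to view our whitepaper on multilingual workflow.

    <!-- Better: the anchor text describes the target page -->
    <a href="/whitepapers/multilingual.html">View our whitepaper on multilingual workflow</a>.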

Negative Factors

1. Duplicate Content
Search engines appear to demote pages that contain content that has already been indexed. In the case of Google, one page is selected for presentation and the rest are suppressed unless the searcher actively requests to view these pages. Ensure that most of the content on any page is unique – both within your web site and on the web.

2. Duplicate Titles
Effective page titles remain important for being perceived as relevant by search engines. The same title should never appear on more than one page on a site – search engines penalize duplicate content, and this is particularly true if the content resides in the title tag.

3. Avoid Session Variables
Many dynamically generated sites assign a session variable to each visitor that is "carried" within the URL as a visitor moves from page to page. Session variables allow a system to "hold state" and are frequently used as a server-side alternative to cookies. Session variables present a challenge to search engine spiders because each time they visit they are assigned a different variable. Since this variable appears in each URL, every page on the site appears to be a new page. This wouldn't be a problem except that these "new" pages are identical in content to pages already indexed by the spider. And search engines do not like duplicated content.


4. Canonical Issues
Even if you are careful and ensure there is no duplicate content on your site, canonical issues can make it appear to search engines that you are duplicating content. For example, it is common to have www.url.com, url.com, www.url.com/index.html and www.url.com/default.asp all hosting the same content. This can lead to the search engines viewing the content as four separate, duplicate sites. Relevance penalties may be applied and search rankings plummet. This common issue can be overcome by applying a 301 redirect to all but one of the variations. This is discussed in more detail below.

5. Associating with "Bad Neighbours"
A common search engine optimization technique has been to solicit large numbers of external links to a site by participating in organized link trading schemes. These techniques had considerable success. For a while. Today, most search engines penalize sites linking to "bad neighbourhoods."


Section Two


Embedding SEO Best Practices in CMS Implementations

Because content management solutions provide centralized control over web site layout and presentation, they can be used to enforce many search engine optimization best practices. A CMS implementation that is search engine optimization aware can:
• Improve the ease of indexing and likelihood of ranking for content on the existing site
• Ensure that new content being added to the site appears in a search engine friendly format

The following outlines ways in which a CMS can be configured to embed each of the SEO tactics described in the previous section. We have re-categorized these tactics into groups that correspond with the capabilities of most content management solutions.

System-Wide Considerations

Enforce W3C Compliant Code
Most mature CMS solutions allow you to define and "lock down" the code used in templates, prohibiting end users from making deep changes to HTML. By validating these templates prior to deployment, you can ensure most site content will be W3C compliant. Some CMS solutions – such as RedDot XCMS – incorporate a code validation utility. Third-party validation is also available, most notably from Watchfire.com.

Key advice:
• Validate the HTML code used in your templates prior to their release into production. Prohibit end users from modifying this code.
• Enable code-validation utilities in your CMS, if available, to ensure ongoing compliance.


Create Generic Site Maps


Virtually all CMS solutions allow automated generation and updating of site maps. Best practices suggest restricting the number of links on any one page to fewer than 100. This may require the creation of a series of hierarchical site maps to provide spiders with quick access to all site content.

Key advice:
• Enable the site mapping capabilities of your CMS.
• Reference the site map with a link from your index page that includes the visible text "site map" or "sitemap".
• Limit the number of links on the site map to fewer than 100.

Deploy Google and Yahoo Site Maps
To the best of our knowledge, no mature content management solution is available that creates Google and Yahoo site maps "out of the box." A reasonably skilled developer should be able to extend any CMS with an open API to produce these maps in the required format. Alternatively, the non-linear creations team has developed plug-ins to generate Google and Yahoo site maps for a number of leading mid-tier content management solutions. Please contact us for additional information.

Key advice:
• Task a developer with extending the capabilities of your CMS to automatically create Google and Yahoo site maps; or
• Call non-linear creations and take advantage of our pre-built plug-ins.
• Schedule the periodic resubmission of Yahoo site maps after significant content updates.
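As a rough sketch of what such an extension does, the following Python fragment writes a Google-style sitemap from a list of pages. The page list, URLs and function names are assumptions for illustration, not any particular CMS's API:

    import xml.etree.ElementTree as ET

    def write_sitemap(pages, path="sitemap.xml"):
        """Write a minimal Google-protocol sitemap from (url, lastmod) pairs."""
        urlset = ET.Element("urlset",
                            xmlns="http://www.google.com/schemas/sitemap/0.84")
        for url, lastmod in pages:
            entry = ET.SubElement(urlset, "url")
            ET.SubElement(entry, "loc").text = url
            ET.SubElement(entry, "lastmod").text = lastmod  # e.g. "2006-02-01"
        ET.ElementTree(urlset).write(path, encoding="utf-8", xml_declaration=True)

    # Hypothetical usage: pages exported from the CMS repository
    write_sitemap([("http://www.example.com/cms-seo.html", "2006-02-01")])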

Mandate Search Engine Friendly URLs
There are two major models for content management solution architectures:
• Static Publication. In this model, activities associated with content management are separated from content delivery. CMS activities – authoring, editing, workflow – take place on one server (usually inside the firewall). These files are transferred as static HTML to a separate web server. This server does nothing but serve these static HTML pages to visitors. These systems almost always generate URLs with no dynamic variables.
• Dynamic Publication. Dynamic CMS systems assemble content "on the fly" to create a page as it is requested by a visitor. Content management and delivery activities take place within the same system. These systems usually generate long URLs littered with dynamic variables. Some of these systems work around this challenge by allowing URL aliases to be created. These aliases can be standard URLs.

Key advice:
• Take into account the type of URLs generated by a CMS when selecting a product.
• If you already have in place a system that generates pages dynamically, investigate options for creating static URL aliases, as sketched below.
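One common approach on an Apache server is a rewrite rule that maps a static-looking alias onto the underlying dynamic URL. This sketch assumes mod_rewrite is enabled; the path and parameter name are hypothetical:

    RewriteEngine On
    # Visitors and spiders see /products/blue-widget.html;
    # the CMS still receives its dynamic query string.
    RewriteRule ^products/([a-z0-9-]+)\.html$ /default.asp?page=$1 [L]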

Publish to a Flat Directory
Many mature web content management solutions – particularly those that publish content statically – allow you to organize content hierarchically within the CMS, independent of its physical location on the web server. With these systems you can maintain a much more complex system of organization within the CMS than is apparent in the directory structure of the live web site. For example, within the CMS you can categorize content in a deep folder and sub-folder tree, or taxonomically by meta data. The CMS can publish this content to a single directory, or to a much less complex directory structure, on the web server.

Key advice:
1. Use the CMS, not the server directory structure, to maintain a logical categorization of your content.
2. For small sites, publish all content to a single directory on the web server.
3. For large sites, deploy a thin hierarchical directory structure on the web server. Its primary role will be to aid visitors in the recall of URLs.
4. One exception – if you have a multilingual site, publish each language variant to a separate directory on the server.

Eliminate Broken Links
Most content management solutions will not publish content that contains invalid links to other content maintained by the system. That is, they prohibit the publication of broken internal links on the site. With the exception of RedDot CMS, few are able to validate links to external sites. If you are not making use of RedDot, or if you did not purchase the RedDot compliance module, a number of utilities are available for monitoring the validity of links. These include:
• Watchfire WebXM (www.watchfire.com) – suitable for enterprise-class sites
• LinkcheckerPro (www.linkcheckerpro.com)
• Xenu Link Sleuth


Key advice:
• Enable the link validation system in your content management solution.
• If it cannot validate external links, purchase a third-party system to monitor the site. Schedule this system to run regularly and report on broken links.
• Create a custom 404 page that will appear to visitors arriving at the site from a broken link. Reference both your site's site map and key content areas on this page.
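Where a packaged monitor is not an option, even a small script covers the basics. A minimal sketch in Python (the URL list is hypothetical; a production monitor would add politeness delays, retries and reporting):

    import urllib.request

    def find_broken_links(urls):
        """Return (url, error) pairs for links that fail to resolve."""
        broken = []
        for url in urls:
            try:
                request = urllib.request.Request(url, method="HEAD")
                urllib.request.urlopen(request, timeout=10)
            except Exception as err:  # HTTPError (4xx/5xx), URLError, timeout
                broken.append((url, str(err)))
        return broken

    print(find_broken_links(["http://www.example.com/missing-page.html"]))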

Use robots.txt Appropriately
Several CMS solutions allow end users to control robots.txt on a page-by-page basis (HotBanana), but most leave its definition to the site developer. Having a robots.txt file is not absolutely mandatory – its absence is most notable for generating 404 errors in server logs and messing up web log analysis tools. If you choose to implement the file, however, it is critical that it is valid and accurately defines access to site content. A mistake in implementation can prevent the major search engines from indexing any of your site content.

Key advice:
• Carefully create the robots.txt file. Validate it using one of the many freely available online tools, such as the one from Search Engine World (http://www.searchengineworld.com/cgi-bin/robotcheck.cgi).
• Prohibit end users from making alterations to this file without approval. One approach is to define a workflow for the robots.txt file that requires review by the system administrator.

Address Canonical Issues
You want your visitors to find your site whether they type http://www.url.com, http://url.com or http://www.url.com/index.html. But you certainly don't want the search engines to see these as separate web sites with duplicate content. Fortunately, you can take a few simple steps to overcome this challenge. Before you go live with your newly content-managed site, select one of these URLs as the site URL. Then set up permanent (301) redirects for the other URLs. For example, if you selected www.url.com as your primary URL, you would set up a 301 redirect for url.com and www.url.com/index.html. The major search engines do not penalize permanent redirects. This simple method overcomes most canonical issues.

Key advice:
• Use permanent 301 redirects to alias synonymous URLs to one central page.
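On an Apache server, the redirects might look like this .htaccess sketch (assumes mod_rewrite; substitute your own domain for the www.url.com placeholder):

    RewriteEngine On
    # Permanently redirect url.com/... to www.url.com/...
    RewriteCond %{HTTP_HOST} ^url\.com$ [NC]
    RewriteRule ^(.*)$ http://www.url.com/$1 [R=301,L]
    # Collapse index.html onto the root URL
    RewriteRule ^index\.html$ http://www.url.com/ [R=301,L]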

Avoid Session Variables
If your site is already managed within a CMS, you first need to determine whether it is making use of session variables. This is straightforward. Navigate to a sub-section of your site using a web browser. Take note of the URL. Now use another computer in your office to visit the same page. Compare the URL that you see on the second machine with the one you found on the first machine.

Are they identical? Then you probably are not assigning session variables to each visitor. Do they differ? Then there is a good chance that a session variable is being assigned. The variable is often visible as the text that differs between the two computers. Session variables are most frequently employed as an alternative to cookies. They "hold state," allowing the system to track one visitor throughout a visit. If your CMS is assigning session variables, you have two safe options and one potentially risky approach:

1. Investigate replacing session variables with cookies as a means of holding state. Many systems provide this as a configurable option. Be aware that a considerable and growing percentage of visitors refuse cookies. As a result, this approach may not be viable for systems that rely heavily on holding state (such as some shopping cart systems).

2. Determine whether the use of session variables can be restricted to those parts of the site that absolutely require holding state. For example, an e-commerce element of your site may require the use of session variables, but you may not require their use throughout the site. If this is the case, you may be able to improve the likelihood of pages ranking in search engines by eliminating the variable.

3. In the risky approach, you custom-configure the system to identify inbound search spiders (by their IP address or User Agent) and consistently assign the same session variable. This overcomes the session variable challenge by ensuring the search engines recognize the persistence of a page over time. It is risky because any time you treat a search engine spider differently than human visitors you open yourself to accusations of "cloaking," a decidedly black-hat SEO tactic. The penalties applied by search engines when they detect cloaking are harsh and include the possibility of a permanent ban from listings. It's a risk that needs to be weighed against the potential gain. If your site is already ranking well, then don't risk it. If your pages are not being indexed at all, then there is little downside to a ban. Your site is probably in between, and you'll need to make a judgment call.
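If you do take the risky route, its core is a check like the following Python sketch. The User-Agent fragments are illustrative assumptions, and the cloaking caveat above applies in full:

    import uuid

    KNOWN_SPIDER_AGENTS = ("googlebot", "slurp", "msnbot")  # User-Agent fragments

    def session_id_for(user_agent):
        """Give spiders one fixed session so crawled URLs stay stable."""
        if any(bot in user_agent.lower() for bot in KNOWN_SPIDER_AGENTS):
            return "SPIDER-SESSION"      # identical on every crawl
        return uuid.uuid4().hex          # fresh per-visitor session

    print(session_id_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # SPIDER-SESSION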

Key advice:
• Eliminate session variables by replacing them with cookies; or
• Restrict session variables to those parts of the site where they are truly required; or
• Assign search spiders a specific, permanent session variable.

Template-Level Considerations

Reduce Code Clutter
Increasing the clarity and prominence of the text on a page is one of the simplest, most effective SEO tactics. Most mature content management solutions provide the choice of using cascading style sheets to control the format of a page. Making use of a CSS-based design – and then prohibiting the modification of this design – will eliminate much of the HTML code that would otherwise be required. If you choose to use javascript, ensure that the code is incorporated into the site as an include file rather than in the body of the page. By defining this at the template level, you can ensure the tactic is deployed throughout the site.

Key advice:
• Use cascading style sheets to control design.
• Incorporate javascript as an include file.

Create Site Navigation as Descriptive Text
When creating the design templates for your site, ensure that the navigation elements – those links that structure access to the site – are not images or Macromedia Flash objects, but simple text links. Enforce this navigation through the content management solution's template functions. Many content management solutions allow you to automatically generate "bread crumbs" that indicate to visitors where they are in the site. Use this feature to generate a consistent, keyword-rich text element on each page.

Key advice:
• Don't use Flash objects or images as part of your site navigation.
• Do use descriptive text links.
• If your CMS supports navigational breadcrumbs, use them.
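A breadcrumb trail is simply a line of descriptive text links, as in this hypothetical sketch:

    <p class="breadcrumb">
      <a href="/">Home</a> &gt;
      <a href="/content-management/">Content Management</a> &gt;
      Multilingual Workflow
    </p>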

Ensure Links can be Processed
When creating the templates that power your content management solution, do not embed URLs in javascript or Macromedia Flash. Cascading style sheets can be used to recreate many of the effects developers have traditionally depended upon javascript to perform. With careful development, you can maintain standard text links while benefiting from roll-overs, fly-out menus and sub-menus.

Key advice:
• Don't embed URLs in javascript or Macromedia Flash.
• Use CSS to emulate functions usually delivered using javascript.
• If you must embed URLs, make sure you have a visible link to the same content on the page.

Structure Content Effectively
Most mature content management solutions allow you to specify the controls available to end users within the "what you see is what you get" (WYSIWYG) editor. You can use this capability to encourage end users to add effective structure to their content.

Key advice:
• Enable h1, h2 and h3 headings in the WYSIWYG editor.
• Disable the font size control. This will force users to make use of heading tags to control text size.
• Use cascading style sheets to control how the heading tags appear to visitors.
• If your CMS includes a comprehensive compliance manager, create rules that allow only one h1 tag per page.

Page-Level Factors

Create Effective Title Tags / Use Keywords in URLs
Ensuring consistent use of title tags, meta descriptions and keywords in URLs requires modifications to the editor screen of your CMS. In an ideal world:
1. Authors would be forced to enter three key search phrases relevant to the document being created before the document is created. They would be instructed to enter these terms in the order of their importance as search terms.
2. The CMS would automatically include these three phrases as the first terms in the title tag of the page.
3. The CMS would automatically save the page with a filename that includes the first search term.

While several web content management solutions are sufficiently flexible to allow these changes to be made (notably RedDot CMS), the majority are not. As an alternative, you can provide detailed instructions to content authors within the editing framework. Authors usually edit or add content through a browser-based interface. The HTML that drives this interface can often be modified – either through documented processes or through a simple hack. (One simple hack example: substitute a graphic containing instructions for an existing image in the editor. With the same name and size, it should appear to the user without additional changes.) Authors should be instructed to select search terms, use these in naming the file, and include them in the title tag.
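In systems flexible enough to support it, the automation amounts to something like this Python sketch (the function and site name are hypothetical):

    def title_and_filename(phrases, site_name="Example Inc."):
        """Build a title tag and file name from author-ranked search phrases."""
        title = " | ".join(phrases + [site_name])    # most important phrase first
        # Use only the top phrase in the file name; more than two hyphens
        # in a URL may trigger anti-spam penalties (see Section One).
        filename = phrases[0].lower().replace(" ", "-") + ".html"
        return title, filename

    # e.g. ("content management | cms seo | Example Inc.", "content-management.html")
    print(title_and_filename(["content management", "cms seo"]))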


Key advice:
• Modify the CMS authoring environment to require the definition of search terms, and the use of these search terms in the title tag and as part of the file name.
• If this is not possible, incorporate detailed instructions into the editing environment.

Prohibit Duplicate Titles
Adopting the approach recommended above for automatically generating title tags from author-specified search phrases will dramatically reduce the likelihood of pages with duplicate titles being published. Several content management solutions incorporate compliance managers that confirm content complies with corporate standards prior to submission to workflow. You may be able to configure these compliance managers to confirm the uniqueness of a proposed title tag. If duplicate titles are a significant concern – and your CMS does not include a sufficiently sophisticated compliance manager – consider this workaround: automatically include the author name and the publication date and time (including seconds) in the title. This will ensure that no two titles are exactly the same. It is far from an ideal solution, but it is one that most content management solutions will support.

Key advice:
• Configure the compliance manager to enforce uniqueness of each page title.
• If this is not possible, configure the CMS to automatically include author name, date and time in the title of each page.

Encourage Effective Meta Tags
Mature content management solutions allow you to require authors to define meta tags, including the meta name="description" tag. Use the CMS to require meta description entry. If possible, modify the authoring environment so that authors are instructed to include a compelling call to action as part of the description tag.

Key advice:
• Require authors to include a meta description when creating or submitting a page to workflow.
• On-screen text should instruct the author to focus on making the meta description compelling.

Enforce Image Alt Attributes
Most mature content management solutions can enforce the definition of alt attributes for images, either by the author during page creation or in the digital asset library during image upload. If possible, incorporate instructions into the CMS interface that instruct authors to provide detailed descriptions of images suitable for indexing by search engines. Some images are exclusively aesthetic (that is, they convey no information). In most cases, enforcing alt attribute completion will require authors to enter something for these images; the most common approach is to enter a space or a period. Instructions can convey this to authors.

Key advice:
• Require authors to define alt attributes for all images.
• Incorporate instructions explaining the role of alt attributes and the exception for aesthetic images.

Avoid Spelling Errors
Most content management solutions incorporate a spell-checking feature, either as part of the core product or as an optional module.

Key advice:
• If your CMS incorporates a spell checker, make its use mandatory before authors submit documents to workflow.
• If your system does not include a spell-checking feature, consider incorporating one of the open source spell-checking products now available (perform a search at www.sourceforge.net to identify an appropriate solution).

Prohibit Duplicate Content
Avoiding duplicate content penalties is important. It can also be difficult. Authors commonly use existing content as a template, make only minor modifications, and publish the result as a new page. Even more difficult to detect, competitors or well-meaning partners may "borrow" content from your site. To the best of our knowledge, no widely-available CMS offers duplicate content detection as an out-of-the-box feature. You can, however, take steps to reduce the likelihood of content being duplicated.

Key advice:
• When creating reusable content objects, ensure that these include relatively few characters – do not make it easy for authors to create near-duplicate pages using content reuse rules.
• If your CMS is PHP-based, consider integrating Duplicate Check (http://www.duplicatecheck.com/) as part of the workflow process.
• If off-site use of your content is a problem, consider subscribing to the CopySentry service (www.copyscape.com). This service detects web sites with content similar to what you have published and reports on these potential copyright violations.

Use Descriptive Text in Links / Avoid Bad Neighbourhoods
Content management solutions usually give authors the ability to link to external sites with little or no restriction. You probably will not be able to programmatically prevent authors from linking to bad neighbourhoods, or enforce the use of descriptive text links. You can, however, provide explicit instructions to authors encouraging them to think carefully about how they create links. As noted above, you can usually find a way to modify the authoring environment, either through documented processes or through a simple hack. Add instructions to the link creation screen of the CMS.

Key advice:
• Add instructions to the linking utility within the WYSIWYG editor that explain why authors should include descriptive phrases within the text of the link they are creating. Include examples, if space allows.
• Add a warning to the link utility page – "Do not link to external sites whose primary purpose is exchanging links. Linking to these sites may damage our site's prominence within search engine results."


Section Three

A Case Study


Long-tail search engine marketing tactics work. They hold particular promise for sites that are not fully indexed by the search engines despite having been online for several years. This case study captures non-linear creations’ work with a mid-size public company in the semiconductor manufacturing space. We worked closely with this firm to redeploy their site in a content managed environment using many of the tactics described above.


The outcome was impressive. The number of visitors delivered to the site by the search engines each month increased over 600 percent following implementation, comparing search engine traffic in September of the year prior to long tail optimization with September of the year following optimization.


Not only did the number of visitors to the site increase, but the quality of these visitors improved. Average page views per visit increased 14 percent, from 3.6 to 4.1.

Evidence of the Long Tail Effect


Deeper analysis provides solid evidence that the increase in visitors to the site is the result of the long tail search optimization. The pattern of site entry changed radically after the optimization was complete. Prior to optimization, the vast majority of visitors entered the site via the index page of the site.


Following optimization, slightly more than half of all visitors entered the site through pages deeper in the site – pages that were being found more frequently in the major search engines.


This change in entry patterns is directly related to a shift in the nature of the searches leading visitors to the site. Prior to long tail optimization, the top 50 most common search terms accounted for eighty-five percent of visitors arriving at the site from search engines. After optimization, two-thirds of visitors found the site after seeking less common "long tail" search terms.


The Bottom Line Embedding search engine optimization best practices during the deployment of a content management solution can pay large dividends. In the case discussed, this “long tail” tactic increased both overall traffic to the site and the quality of the traffic search engines delivered to the site.

About non-linear creations inc.
non-linear creations (NLC) provides e-business consulting services to an international clientele. Since 1995, we have helped our clients leverage the power of internet technology to achieve tangible business benefits. NLC has established six areas of practice:
• Enterprise Content Management
• Digital Direct Marketing, including Search Engine Marketing and Email Marketing
• Web site and multimedia development
• Onsite search optimization
• Custom application development and integration
• Web analytics consultation

Each area of practice is founded on proven methodologies, extensive technology partnerships, and an extensive list of delighted clients. We have completed more than 750 projects for more than 500 clients, including:
• Heinz;
• Mazda;
• Nortel Networks;
• Warner Brothers;
• E*trade;
• Telus Mobility;
• The World Health Organization (Geneva);
• Alcatel; and
• Drake International.


You can reach non-linear creations:
• by email at [email protected]
• by phone at 613.241.2067
• by fax at 613.241.3086
• by mail at: non-linear creations, 987 Wellington, Ottawa, ON, Canada K1Y 2Y1


non~linear creations inc.