What is Considered Duplicate Content by Google’s Panda Algorithm?
What exactly is duplicate content?
Panda, one of Google’s ranking algorithms, is designed to keep sites with low-quality content from ranking well in search results. One of the factors Panda evaluates is duplicate content.
According to Google, duplicate content is made up of “substantive blocks of content within or across domains that either completely match other content or are appreciably similar.” Google will penalize content that is “deliberately duplicated across domains in order to manipulate search engine ranks.”
On the other hand, there are different types of legitimate duplicate content that will not be penalized by Google. The purpose of this article is to illustrate the main types of duplicate content and offer suggestions on avoiding penalties.
Newsfeeds – Published on Your Site
If most of your website content is original and you are using some content from quality newsfeeds, your ranks won’t suffer. The newsfeed will be considered neutral content by Google.
For instance, if you are using a newsfeed from a syndicated health publisher to keep your visitors informed on the latest medical news, this will not have a detrimental effect on your ranks in Google. A newsfeed can add great value to your site by keeping visitors up-to-date on the latest information, particularly if you do not publish original content on a daily basis. A newsfeed can help keep your site fresh for your visitors.
The general rule of thumb is that if the content is of value to your visitors, it shouldn’t have a negative impact on your search engine ranks. However, when displaying content from other sources, it is a best practice to identify the source of the content properly.
Here are two suggestions to help search engines and your visitors understand where the content originates:
- Include a citation and a link back to the original source, visible on the page
- Use the “noindex” robots meta tag on newsfeed article pages so that search engines won’t index them
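As a rough sketch of the two suggestions above, a newsfeed article page might look something like the following. The page title, publisher name, and example.com URL are placeholders for illustration, not taken from any real feed.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Syndicated article title</title>
  <!-- Asks search engines not to index this newsfeed article page -->
  <meta name="robots" content="noindex">
</head>
<body>
  <article>
    <h1>Syndicated article title</h1>
    <p>Article text delivered by the newsfeed...</p>
    <!-- Visible citation with a link back to the original source (placeholder URL) -->
    <p>Source: <a href="https://example.com/original-article">Example Health Publisher</a></p>
  </article>
</body>
</html>
```

The citation sits in the page body so that visitors can see it, while the robots meta tag sits in the head, where only crawlers read it.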
Articles from Other Web Sources – Published on Your Site
Perhaps you want to display articles that were originally published on other websites. As with newsfeeds, if the content is of value to your visitors, it shouldn’t be penalized by Google’s algorithms.
Best practice calls for following these suggestions:
- Cite the article and link back to the original source
- Use <link rel="canonical" href="https://theirsite.com/articlelink"> in the <head> section of the page
- Use the noindex robots meta tag
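To show where these pieces live on a page, here is a minimal sketch of a republished article. It covers the citation and the canonical link; the noindex suggestion would be one more line in the same head section, <meta name="robots" content="noindex">. The article title and site name are placeholders, and the URL reuses the theirsite.com example from the list above.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Republished article title</title>
  <!-- Points search engines at the original publisher's copy of the article -->
  <link rel="canonical" href="https://theirsite.com/articlelink">
</head>
<body>
  <article>
    <h1>Republished article title</h1>
    <p>Article text...</p>
    <!-- Visible citation and link back to the original source -->
    <p>Originally published on <a href="https://theirsite.com/articlelink">Their Site</a>.</p>
  </article>
</body>
</html>
```

Note that the canonical link uses straight quotation marks around its attribute values. Curly quotes pasted from a word processor can cause the tag to be misread.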
Syndicated Content – Your Content on Other Sites
Conversely, you may want to get your content or blog posts published on other sites. This is a good way to get your site recognized for high quality content in your industry.
For example, eHealthcare Solutions publishes a newsfeed onto our social media properties (Twitter, LinkedIn, and Facebook). We use Hootsuite and RSS to automatically feed our blog articles into our social media properties. This does not hurt our search ranks.
Here are some suggestions:
- Regularly publish your content on your social media properties, even if it is only a snippet of your original article. Using a feed to post links on social media properties gets your content and website recognized in a systematic way.
- If you are syndicating your full articles to other websites (not social media properties), wait a week or two before you push your content to other sites so that your original pages get indexed first.
- Be selective about the sites that you syndicate your full articles on. If you can get the other site to implement the rel=canonical tag in the <head> section of its copy, it will signal to search engines that your site is the original source: <link rel="canonical" href="https://yoursite.com/articlelink">
Content Management Systems
Sometimes a content management system, discussion forum, or digital storefront can publish the same content at multiple URLs within your site. To combat this, you will need to tell Google which URL it should index. You can do this via canonicalization or 301 redirects.
Many content management systems, like WordPress, will automatically redirect pages so that you don’t have to worry about duplicate pages.
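For the canonicalization option, one common pattern is for every URL variant of a page to carry the same canonical link. Below is a minimal sketch, assuming a hypothetical storefront on yoursite.com where one product page is reachable at several addresses. The product name, paths, and query parameters are invented for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Blue Widget</title>
  <!--
    Hypothetical example: this same page is served at all of these URLs:
      https://yoursite.com/products/blue-widget
      https://yoursite.com/products/blue-widget?sort=price
      https://yoursite.com/category/widgets/blue-widget
    Each variant includes the identical canonical link below, so search
    engines are told which single URL is the preferred one to index.
  -->
  <link rel="canonical" href="https://yoursite.com/products/blue-widget">
</head>
<body>
  <h1>Blue Widget</h1>
  <p>Product description...</p>
</body>
</html>
```

A 301 redirect, by contrast, is configured on the web server or in the content management system rather than in the page markup, so it is not shown here.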
Resources:
- 3 Myths about Duplicate Content (also covers scrapers)
- Google’s help page on duplicate content
- Duplicate Content Is A Myth. Here Are 9 Cases That Prove It! (good article explaining how to syndicate your content on other sites)
- If you’re worried about duplicate content on your website, eHealthcare Solutions provides free site audits for our exclusive publishers.