Preventing Duplicate Content with Query Parameters - Canonical URL and Robots.txt

When you use URL query parameters (e.g. "?page=2") to control dynamic content on Siteglide, you can help search engines understand them.

Introduction

Query parameters are a really useful way to pass information to the server to help it deliver dynamic, relevant content.

The problem with query parameters is that sometimes they make the same page look like multiple pages with "duplicate content". This is not ideal from an SEO point of view, as search engines like to offer users unique pages relevant to their search terms.

Let's say we have the "About" page, and this contains a WebApp which can be searched. We have the main URL:

/about

But search engines have incorrectly identified other URLs as separate pages:

/about?keyword=hello%20world
/about?keyword=hello%20mars

Using Canonical URL

In your Page Template, in the <head>, you can set a recommendation that, where query parameters are used, the canonical URL should be treated as the most important version of the page to index, whereas other variations are subsidiary and should not be ranked unfavourably for being similar to the canonical "main" version.

You can do this by setting the main part of the URL as the canonical URL. Set in the Page Template, this applies to every page which uses that template, but you may wish to use an if statement to apply it only on certain pages.

To read the URL of the page, use the "context.headers" object, specifically "PATH_INFO", which excludes query parameters but includes slugs. This means you don't need to change the code for each page: the Liquid dynamically works out the current URL without query parameters. You can (and should) adjust this based on your site's structure and SEO needs.

Using Robots.txt

Adding the following to your robots.txt file (see System Pages) would stop search engines from crawling any pages using URL parameters:

Disallow: /*?

You could be more specific and just make sure the variant pages created by pagination are not included:

Disallow: /*?page=

When following these examples, make sure to double-check you are not excluding query parameters which genuinely should be crawled as pages in their own right. Each site will have different SEO needs.

See the same question on StackOverflow for more details: https://stackoverflow.com/questions/9149782/ignore-urls-in-robot-txt-with-specific-parameters
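
As an illustration of the canonical URL approach, here is a minimal sketch of what the tag in the <head> of a Page Template might look like. It assumes the path is available as context.headers.PATH_INFO, as described in this article. The site_domain value and the "/about" condition are hypothetical placeholders, not part of Siteglide; swap in your own domain and whichever condition suits your site.

```liquid
{% comment %} Hypothetical placeholder: replace with your site's real domain {% endcomment %}
{% assign site_domain = "https://www.example.com" %}

{% comment %} Assumed to hold the current path without query parameters, e.g. "/about" {% endcomment %}
{% assign current_path = context.headers.PATH_INFO %}

{% comment %} Optional if statement: only output the tag on chosen pages {% endcomment %}
{% if current_path == "/about" %}
  <link rel="canonical" href="{{ site_domain }}{{ current_path }}">
{% endif %}
```

Under these assumptions, a request for /about?keyword=hello%20world should output a canonical link pointing at https://www.example.com/about. Removing the if statement would apply the tag to every page using the template.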