Preventing Duplicate Content with Query Parameters- Canonical URL and Robots.txt

When you use URL query parameters, e.g. "?page=2", to control dynamic content on Siteglide, you can help search engines understand them.

Introduction

Query parameters are a really useful way to pass information to the Server to help it deliver dynamic, relevant content. The problem with query parameters is that they can sometimes make the same Page look like multiple Pages with "duplicate content". This is not ideal from an SEO point of view, as Search Engines like to offer Users unique Pages relevant to their search terms.

Let's say we have the "About" page and this contains a WebApp which can be searched.

We have the main URL: /about, but search engines have incorrectly identified other URLs as separate Pages:

  • /about?keyword=hello%20world
  • /about?keyword=hello%20mars

Using Canonical URL

In your Page Template's <head>, you can set a recommendation that, where query parameters are used, the Canonical URL should be treated as the most important version of the Page to index; other variations are subsidiary and should not be ranked unfavourably for being similar to the Canonical "main" version. You can do this by setting the main part of the URL as the canonical URL. In this example, the tag will be applied to every Page which uses this Page Template, but you may wish to use an if statement to apply it only on certain Pages.

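A minimal sketch of such a canonical tag, using the "context.headers" object described below (the domain https://www.example.com is a placeholder; replace it with your own):

```liquid
<!-- Inside the <head> of your Page Template -->
<link rel="canonical" href="https://www.example.com{{ context.headers.PATH_INFO }}">
```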

In the example, we use the "context.headers" object to read the URL of the Page, specifically "PATH_INFO", which excludes query parameters but includes slugs. This means you don't need to change the tag for each Page; the Liquid dynamically works out the current URL without query parameters. You can (and should) adjust this based on your Site's structure and SEO needs.

Using Robots.txt

Adding the following to your robots.txt file (see System Pages) would stop Search Engines from crawling any Pages using URL parameters:

Disallow: /*?*

You could be more specific and only exclude the variant Pages created by Pagination:

Disallow: /*?*page=*

When following these examples, make sure to double-check you are not excluding query parameters which genuinely should be crawled as Pages in their own right. Each Site will have different SEO needs. See the same question on StackOverflow for more details: https://stackoverflow.com/questions/9149782/ignore-urls-in-robot-txt-with-specific-parameters
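Putting this together, a robots.txt along these lines might look like the following sketch, assuming you only want to exclude Pagination variants (the Sitemap line and domain are illustrative; adjust the rules to your own Site's needs):

```
User-agent: *
Disallow: /*?*page=*

Sitemap: https://www.example.com/sitemap.xml
```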

Updated 19 Oct 2021