Query parameters can transform an ordinary URL into a seemingly never-ending string of characters. Parameters are usually added to URLs when you land on a specific page of a website, navigate more deeply through the site, or come to the site through a specific campaign.
Query String Example:
If you have ever worked in Google Analytics’ Content reports, you have very likely noticed what a pain these parameters can be. The number of pages in these reports can multiply with the addition of only a few query parameters, leading to issues that range from annoying for an analyst to problematic for your analytics as a whole.
Here are three of the most important reasons you should exclude query parameters from your data in Google Analytics.
1. Query Parameters Splinter Data
Query parameters break up URLs in Google Analytics. The same webpage on your site may be broken up into tens, hundreds, or (for highly trafficked sites) thousands of different pages in GA when combinations of parameters are tacked onto this URL. Hence, all associated metrics are splintered as well. This can make cohesive analysis extremely difficult and in some cases impossible.
For instance, imagine you are trying to pull the most viewed pages of the past month. With parameters breaking up the same page into unique URLs, its pageviews are fragmented across all of those URLs. At a high level, this understates the pageview-based performance of any one page. If you wanted to dig further into these pages, common questions like where the traffic is coming from, what the bounce rate is, and how people are interacting with the page become nearly impossible to answer.
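To make the splintering concrete, here is a minimal Python sketch. The page paths and pageview counts are made up for illustration, and `merge_pageviews` is a hypothetical helper, not anything Google Analytics provides. It simply shows what the totals look like once the query string is ignored.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical (page, pageviews) rows, as they might appear in a pages report.
rows = [
    ("/pricing?utm_source=newsletter", 120),
    ("/pricing?sessionid=abc123", 80),
    ("/pricing", 300),
    ("/blog?ref=home", 50),
]

def merge_pageviews(rows):
    """Sum pageviews per page after dropping the query string."""
    totals = Counter()
    for page, views in rows:
        totals[urlsplit(page).path] += views
    return totals

merged = merge_pageviews(rows)
print(merged.most_common())  # [('/pricing', 500), ('/blog', 50)]
```

In the raw rows, no single /pricing entry shows more than 300 pageviews, even though the page was actually viewed 500 times.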
2. Query Parameters Lead to High Cardinality
We already know that query parameters can multiply the quantity of pages in your content reports. If this causes the number of rows in your pages reports to exceed 50K every day, you’ll run into high cardinality issues. Essentially, Google sets limits on the number of values a dimension can have. For the page dimension, after 50K values (or rows) of data have been filled, pages will begin to fall into the (other) value (here’s Google’s exact explanation of this problem).
Thousands of pages can be lumped into (other), and this data is essentially inaccessible to analyze. Making any judgments on this incomplete data will lead to highly inaccurate assumptions and misguided decision-making.
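The effect can be imitated with a toy model. The sketch below is a deliberate simplification, assuming a tiny row limit of 3 instead of 50K and a naive "first come, first kept" rule. Google's actual processing is more involved, and `bucket_pages` is a hypothetical name.

```python
from collections import Counter
from urllib.parse import urlsplit

def bucket_pages(hits, row_limit):
    """Count hits per page, lumping pages past `row_limit` distinct values into '(other)'."""
    counts = Counter()
    kept = set()
    for page in hits:
        if page in kept:
            counts[page] += 1
        elif len(kept) < row_limit:
            kept.add(page)
            counts[page] += 1
        else:
            counts["(other)"] += 1  # no room left for a new row
    return counts

# Made-up hits: two real pages, splintered by a session id parameter.
hits = ["/a?sid=1", "/a?sid=2", "/a?sid=3", "/b?sid=1", "/b?sid=2"]

raw = bucket_pages(hits, row_limit=3)
clean = bucket_pages([urlsplit(h).path for h in hits], row_limit=3)

print(raw["(other)"])   # 2
print(dict(clean))      # {'/a': 3, '/b': 2}
```

With the parameters left in, page /b never gets a row of its own. With them stripped, both pages fit comfortably under the limit.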
3. Personally Identifiable Information Can Often Linger in These Parameters
In the depths of query parameter combinations (especially for Ecommerce companies) may lurk an unnecessary evil: personally identifiable information (PII).
PII is any information that could be traced back to one specific individual. It can come in the form of:
- Phone numbers
- Credit card numbers
- Social security numbers
If users are ever prompted to enter any of this information on your website, make sure you are not sending it to Google’s servers, where it would end up in the URL or page title of your Google Analytics data.
*Email addresses, telephone numbers, and personal street addresses in data
Unfortunately, when URLs carry large combinations of parameters, this PII can be difficult to spot. Failing to exclude this information from your Google Analytics data and to stop sending it to Google’s servers is a direct violation of Google’s policy and may result in termination of your account (see ya later, data). To prevent termination, you will need to not only remove these parameters at the view level, but also strip them at the data collection level through Google Tag Manager.
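If you have an export of your page URLs, a rough first pass for spotting PII might look like the Python sketch below. The three patterns are intentionally simple assumptions (they will miss some PII and flag some false positives), and `find_pii` is a hypothetical helper, so treat the output as a list of parameters to review by hand rather than a verdict.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Illustrative patterns only; real PII detection needs far more care.
PII_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def find_pii(url):
    """Return (parameter, pii_type) pairs for query values that look like PII."""
    hits = []
    # parse_qsl decodes percent-encoding, so %40 is seen as "@".
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        for pii_type, pattern in PII_PATTERNS.items():
            if pattern.search(value):
                hits.append((name, pii_type))
    return hits

print(find_pii("/checkout?email=jane%40example.com&step=2"))  # [('email', 'email')]
print(find_pii("/signup?tel=555-123-4567"))                   # [('tel', 'phone')]
```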
Hopefully this, and the reasons above, have at least made you consider identifying and omitting extraneous query parameters from your data. Now you might be thinking, how do I exclude the “extraneous”?
First, you’ll have to identify all the query parameters that are appearing in your data. While you can do this manually, the more pages you have in your Content reports, the more tedious this process becomes and the more likely you are to miss parameters.
Luckily, Google created a spreadsheet to automate the process. Once you make a copy of this sheet and follow the instructions (clearly laid out on the first tab of the spreadsheet), you’ll be left with a list of all query parameters found in your data from the past N days. When leveraging the spreadsheet, it’s optimal to use a view without any filters on it.
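If you would rather work from an export of your page URLs, the same kind of inventory can be approximated in a few lines of Python. This is only a sketch and not how the spreadsheet itself works. The sample pages are invented, and `parameter_inventory` is a hypothetical helper.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

def parameter_inventory(pages):
    """Count how many page rows each query parameter name appears in."""
    counts = Counter()
    for page in pages:
        query = urlsplit(page).query
        # A set, so a parameter repeated within one URL is counted once per row.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts

pages = [
    "/pricing?utm_source=a&sid=1",
    "/pricing?sid=2",
    "/blog?Q=shoes",
    "/blog?q=hats",
    "/about",
]

inventory = parameter_inventory(pages)
print(sorted(inventory.items()))
# [('Q', 1), ('q', 1), ('sid', 2), ('utm_source', 1)]
```

Parameter names are kept exactly as they appear, so "q" and "Q" show up as separate entries.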
Once you have the list, the questions you should ask yourself are:
- What parameters would be useful in an analysis?
- Do any of these parameters determine the content of a website page?
For the first question, you want to determine what parameters inform critical or deeper insights. One example of this would be campaign parameters so that your analyses could dive into the success of your recent digital campaigns.
To address the second question, let’s first determine what this means by using an example URL:
This URL has three parameters. While the first two likely have a technical meaning, stripping them from the URL would not change the content of this page at all. However, the third parameter likely determines that this is the “Next Steps” page, and is thus necessary to keep in Google Analytics as a parameter. Be sure to check with your technical team to determine if
If you are attempting to address high cardinality, the next question to ask yourself is: what impact will excluding these query parameters have? You will need to identify how many rows of data each parameter accounts for. The higher the number of rows, the more impactful excluding that parameter will be.
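One way to estimate that impact, sketched in Python under the assumption that you have a list of page values to work from: count the distinct rows before and after dropping a single parameter. The sample pages are made up, and `row_key` and `rows_saved` are hypothetical helpers.

```python
from urllib.parse import urlsplit, parse_qsl

def row_key(page, dropped=None):
    """Identify a report row by its path plus the parameters that remain."""
    parts = urlsplit(page)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    return parts.path, tuple(pair for pair in pairs if pair[0] != dropped)

def rows_saved(pages, param):
    """Number of distinct rows that disappear if `param` is excluded."""
    before = {row_key(p) for p in pages}
    after = {row_key(p, dropped=param) for p in pages}
    return len(before) - len(after)

pages = [
    "/list?sid=1&page=1",
    "/list?sid=2&page=1",
    "/list?sid=3&page=1",
    "/list?sid=1&page=2",
    "/list?sid=2&page=2",
]

print(rows_saved(pages, "sid"))   # 3
print(rows_saved(pages, "page"))  # 2
```

Here, excluding "sid" collapses five rows into two, so it is the more impactful parameter to remove.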
Once you have decided what query parameters to exclude, return to the spreadsheet where you originally identified them, and select “YES” from the drop down menu.
The query parameters will appear in the yellow box (automatically separated by commas) for you to copy. Then, go to “View Settings” and paste this into the “Exclude URL Query Parameters” field. This field is case sensitive, so if both “q” and “Q” appeared in your list, you’ll have to exclude them both.
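The case sensitivity is easy to overlook, so here is a small Python sketch that roughly imitates the field's behavior as described above: a comma-separated list, matched exactly. It is an illustration only, not Google's implementation, and `apply_exclusions` is a hypothetical name.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

def apply_exclusions(page, field_value):
    """Strip the parameters named in a comma-separated list (case-sensitive match)."""
    excluded = {name.strip() for name in field_value.split(",") if name.strip()}
    parts = urlsplit(page)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in excluded
    ]
    return parts.path + ("?" + urlencode(kept) if kept else "")

print(apply_exclusions("/search?q=shoes&sid=9", "q,sid"))    # /search
print(apply_exclusions("/search?Q=shoes&sid=9", "q,sid"))    # /search?Q=shoes
print(apply_exclusions("/search?Q=shoes&sid=9", "q,Q,sid"))  # /search
```

The second call shows the problem: with only "q" in the list, the uppercase "Q" variant survives and keeps splintering your pages.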
One exception to excluding query parameters in the view settings is the site search query parameter.
If your website has a site search function, it’s likely that the query parameters in your Content report include the parameter used to distinguish specific searches. While it is valuable to include the data from site searches in GA, leaving these parameters tacked onto URLs is unnecessary because the Site Search report already captures this information.
If you want to include site search data but still remove query parameters from your Content reports, do not add the site search parameters to the “Exclude URL Query Parameters” field. Doing so will eliminate all site search data.
Instead, remove these query parameters from URLs in the Site Search Settings section of the view settings. After switching site search tracking on, enter the site search-specific parameters, and check the box below this field to strip them from your URLs.
Copy your official view, follow all the steps I have given, and test this out for yourself. The path to cleaner URLs and more cohesive data can be as simple as that. I promise you won’t regret it.