Excluding internal traffic while anonymizing IP addresses in Google Analytics

Introduction

Since the recent rollout of the GDPR, many people have realized that IP addresses can in some cases be considered personally identifiable information (PII), and they therefore wish to anonymize them in Google Analytics.

Two weeks ago I was at Measure Camp in Copenhagen, giving a talk about IP anonymization and the complications it creates for excluding internal traffic. This blog post sums up the points presented and discussed at the event.

I am not a lawyer, or an expert on how IP addresses work, so this blog post will revolve more around how to anonymize IP addresses and exclude internal traffic than around GDPR or technical walkthroughs of IP addresses.

The issue

Imagine a smaller website and a visitor whose ISP is easy to identify. If you arrive at the website from a specific newsletter campaign, it will be quite easy to find you specifically in Google Analytics. All it would take is to:

  • Select network
  • Find your ISP address
  • Create a segment saying: Only show users who came from a specific ISP address and have visited my website through a specific campaign

In other cases, your IP address might be linked to your home address through your telecom provider, and in some cases it will be possible to identify you as a person.

If you know that this might be an issue for some of your users, you can easily anonymize their IP addresses through Google Tag Manager by setting the custom field “anonymizeIp” to “true”.

What happens is that Google removes the last part of the IP address. For an IPv4 address, the last octet is set to zero.
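As a rough illustration of the effect on an IPv4 address, here is a small sketch. The real truncation happens on Google's side, so the function name and the example address are only assumptions for demonstration:

```javascript
// Illustration only: mimics the effect of IP anonymization on an IPv4 address.
// The example address comes from a documentation range, not a real visitor.
function anonymizeIPv4(ip) {
  var parts = ip.split('.');
  if (parts.length !== 4) {
    throw new Error('Expected an IPv4 address with four octets');
  }
  parts[3] = '0'; // the last octet is set to zero
  return parts.join('.');
}

console.log(anonymizeIPv4('203.0.113.42')); // prints "203.0.113.0"
```

Every address in the same /24 block ends up looking identical, which is why a filter on one exact internal IP can no longer match.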

While this is all great (besides screwing up your geography data on a city level), it does create a major issue: all your IP exclusion filters will stop working, because the filters only ever see the truncated address.

To make sure that we can still exclude our internal traffic, we need to figure out whether the user counts as internal or external traffic before sending data to Google Analytics.

The main inspiration for this is a real client case, where I was asked to develop a way to exclude internal traffic while anonymizing IP addresses, without having any resources besides Google Tag Manager to help me complete the task. A huge thanks goes to Simo Ahava for writing a similar post, which was the inspiration for this setup.

The solution(s)

In this section, multiple solutions are provided. Use the one that fits your organization / client the best.

Solution A: Only anonymize external traffic

Step 1: Create a regular expression with your IP addresses

First of all, you need to find all your internal IP addresses, and add them into a regular expression in this format:
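The original format example is not available here, so the following is only a sketch of what such a pattern could look like. The addresses are taken from documentation ranges and are purely hypothetical:

```javascript
// Hypothetical internal addresses: one single IP and one whole office range.
// Dots are escaped so they match a literal ".", and the anchors (^ and $)
// stop partial matches such as 203.0.113.100 matching 203.0.113.10.
var internalIpPattern = /^(203\.0\.113\.10|198\.51\.100\.\d{1,3})$/;

console.log(internalIpPattern.test('203.0.113.10'));  // true
console.log(internalIpPattern.test('203.0.113.100')); // false
```

Each additional IP address goes in as another alternative separated by a pipe.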

Step 2: Add them to a script that sets a cookie marking the user as internal or external

After that, you need to include it in this script:

It calls the service IPFY and looks up the IP of the user visiting the site. Then it matches that IP against the regular expression you set up. If it matches, it will add a cookie in the user’s browser saying it is internal traffic. If not, it will add a cookie saying that it is external traffic.

The duration of the cookie is up to you. We have decided to let the cookie for external traffic last 7 days, to not store that information for longer than needed, and the cookie for internal traffic last 30 days.

Step 3: Add the cookies to Google Tag Manager and use them as variables

If you create a new variable in Google Tag Manager and select the “1st Party Cookie” variable type, it is possible to grab cookies in a user’s browser and use that information to control which tags you are sending. This is especially handy when tags should only fire based on user consent.
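The script itself is not reproduced here, so the following is a minimal sketch of the behavior described above, not the original code. It assumes that “IPFY” refers to the public ipify API, and the cookie names and IP pattern are hypothetical. Inside a Custom HTML tag it would be wrapped in script tags:

```javascript
// Sketch only. Assumptions: cookie names, example IP pattern, ipify endpoint.
var INTERNAL_IP_PATTERN = /^(203\.0\.113\.10|198\.51\.100\.\d{1,3})$/;

// Returns 'internal' if the IP matches the pattern, otherwise 'external'.
function classifyIp(ip, pattern) {
  return pattern.test(ip) ? 'internal' : 'external';
}

// Builds a cookie string that expires the given number of days after `now`
// (a millisecond timestamp; defaults to the current time).
function buildCookie(name, value, days, now) {
  var start = typeof now === 'number' ? now : Date.now();
  var expires = new Date(start + days * 24 * 60 * 60 * 1000);
  return name + '=' + value + '; expires=' + expires.toUTCString() + '; path=/';
}

// Browser-only part: look up the visitor's IP and store the result.
function detectTrafficType() {
  var request = new XMLHttpRequest();
  request.open('GET', 'https://api.ipify.org?format=json');
  request.onload = function () {
    if (request.status !== 200) { return; } // leave cookies unset on failure
    var ip = JSON.parse(request.responseText).ip;
    if (classifyIp(ip, INTERNAL_IP_PATTERN) === 'internal') {
      document.cookie = buildCookie('internal_traffic', 'true', 30);
    } else {
      document.cookie = buildCookie('external_traffic', 'true', 7);
    }
  };
  request.send();
}

// Only run where a browser environment exists (for example inside a GTM tag).
if (typeof document !== 'undefined' && typeof XMLHttpRequest !== 'undefined') {
  detectTrafficType();
}
```

Keeping the classification and cookie building in separate functions makes it easier to check the pattern without calling the external service.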

Step 4: Make sure that the script does not fire all the time

Finally, add the script as a Custom HTML tag that fires on all pages, as long as the cookies are not set.

Step 5: Write a script that turns the cookies set into one variable

The next thing we need is a script that combines the cookies into one variable. The advantage is that it makes it possible to create lookup tables, which will be used to see if a user should be anonymized or not.
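A sketch of what such a variable could look like is shown below. In GTM this logic would live in a Custom JavaScript variable (an anonymous function), with the two arguments replaced by 1st Party Cookie variables; the names and cookie values used here are hypothetical:

```javascript
// Combines the two cookie values into a single traffic type.
// In GTM the arguments would be references such as
// {{Cookie - internal_traffic}} and {{Cookie - external_traffic}} (hypothetical names).
function getTrafficType(internalCookie, externalCookie) {
  if (internalCookie === 'true') { return 'internal'; }
  if (externalCookie === 'true') { return 'external'; }
  return 'unknown'; // neither cookie has been set yet
}

console.log(getTrafficType('true', undefined)); // "internal"
console.log(getTrafficType(undefined, 'true')); // "external"
```

The internal cookie is checked first, so a visitor who happens to carry both cookies is treated as internal.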

Then we create a lookup table that we will use to identify whether the anonymize IP feature should be set on or off.

For some reason, boolean values need to be wrapped in quotes (‘true’ and ‘false’) to function within your Global Analytics settings.
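The lookup table is configured in the GTM interface, but its logic can be sketched in code. The input values and the default are assumptions; the outputs are strings, in line with the quoting note above:

```javascript
// Mirrors a GTM Lookup Table: input is the traffic type, output is the
// value passed to the anonymize IP field.
var anonymizeLookup = { internal: 'false', external: 'true' };

function shouldAnonymize(trafficType) {
  // Assumption: anonymize by default when the traffic type is unknown.
  return Object.prototype.hasOwnProperty.call(anonymizeLookup, trafficType)
    ? anonymizeLookup[trafficType]
    : 'true';
}

console.log(shouldAnonymize('internal')); // "false"
console.log(shouldAnonymize('unknown'));  // "true"
```

Defaulting to ‘true’ means a visitor without cookies is still anonymized on the first hit.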

Step 6: Add the lookup table to your Global Analytics settings

Finally, the anonymize IP feature in Google Analytics is set to only be active when it is external traffic on your site. Furthermore, it also sends whether it is internal or external traffic through a custom dimension.

The reason I have chosen to do this, instead of just anonymizing everything and filtering the traffic based on the custom dimension, is to know which IP addresses are internal. With different agencies and stakeholders, I believe it is important to know what we are excluding on the site, and to be able to check which IP a filter belongs to.

Cons

  • Sending your users’ IP addresses to an external service is not always something that your legal department will be okay with
  • It exposes all your internal IP addresses on the webpage, since the regular expression is readable in the page source

Pros

  • It is quite effective if you don’t have other means to exclude internal traffic

Excluding your internal traffic based on the custom dimension set

If you want to exclude your internal traffic with a custom dimension, you first need to set it up under Custom Definitions → Custom Dimensions. From here you need to use the dimension whose index corresponds to the one selected in Google Tag Manager and set its scope to “User”.

Once that is done, you can add an exclusion filter to remove your internal traffic. Remember that this will not work retroactively, meaning that internal traffic will only be excluded from the day you set up the filter, unless it was already set up before you started on this post.

And that should be it. Now any IP you add to your list in Google Tag Manager will be filtered out in Google Analytics.

Solution B: The same, but get your developers to do it!

Solution A is made for people who don’t have the resources to have IT detect the type of traffic going to the site. If you do have those resources, I recommend having IT create a variable in the dataLayer based on a server-side check of the user’s IP address, and from there you can set your values.
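How this looks depends entirely on your backend, so the following is only a sketch assuming a JavaScript (Node.js) server. The key name “trafficType”, the function name and the IP pattern are hypothetical:

```javascript
// Server-side sketch: the IP check happens on the server, so neither the
// visitor's IP nor the list of internal IPs is exposed in the page.
var INTERNAL_IP_PATTERN = /^(203\.0\.113\.10|198\.51\.100\.\d{1,3})$/;

// Returns the HTML snippet to render above the GTM container snippet.
function buildDataLayerSnippet(visitorIp) {
  var entry = {
    trafficType: INTERNAL_IP_PATTERN.test(visitorIp) ? 'internal' : 'external'
  };
  return '<script>window.dataLayer = window.dataLayer || [];' +
    'window.dataLayer.push(' + JSON.stringify(entry) + ');</script>';
}

console.log(buildDataLayerSnippet('203.0.113.10'));
```

In Google Tag Manager, a Data Layer Variable reading “trafficType” could then feed the same lookup table as in Solution A.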

Cons

  • Involving IT takes maintenance, time and usually a bigger budget

Pros

  • Not sending your users’ IP addresses to third parties

Solution C: Make people in your organization visit a specific site each time they log in on an internal network

The last solution is something I have heard people say that they have done, and it is also one of the methods that Simo has written about in his previous post. It is, however, not something I have seen done successfully. It simply involves adding a unique identifier for all the people who need to be excluded from the site. This can be done by:

  • Having them open up a specific email with a UTM code
  • Making the internal visitors visit a specific page and then send a cookie
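For the second option, a sketch of the cookie logic could look like this. The opt-in path, the cookie name and the 30-day lifetime are all assumptions:

```javascript
// Hypothetical opt-in page that internal visitors are asked to open.
var OPT_IN_PATH = '/internal-opt-in';

// Returns the cookie string to set when the visitor is on the opt-in page,
// or null on every other page.
function internalOptInCookie(pagePath, days) {
  if (pagePath !== OPT_IN_PATH) { return null; }
  var maxAgeSeconds = days * 24 * 60 * 60;
  return 'internal_traffic=true; max-age=' + maxAgeSeconds + '; path=/';
}

// Browser usage: only touch document.cookie when a cookie should be set.
if (typeof document !== 'undefined' && typeof location !== 'undefined') {
  var optInCookie = internalOptInCookie(location.pathname, 30);
  if (optInCookie) { document.cookie = optInCookie; }
}
```

Once the cookie expires, the visitor has to open the page again, which is where the maintenance burden comes from.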

Cons

  • Getting people to do things is difficult and requires frequent maintenance

Pros

  • If it works, it works well and without having multiple stakeholders involved

Final thoughts

I believe it is important to know why you need or want to exclude your IP addresses, and even more important to have a general understanding of how your organization uses data and interacts with your users’ behavior in Google Analytics. In terms of anonymizing IP addresses, the first solution presented is a good starting point if you need to take action and don’t have any resources yourself. I do think that the best approach is to let the developers look at the IP address server side and push that information to you; however, it can be a hassle to get it updated.

I have been discussing my first solution and the issue with different clients and nerdy peers. My conclusion so far is that it really depends on what organization you are in, and where you and your legal team stand.

If you feel that any of the solutions I have provided can be used, feel free to implement them. If not, please leave a comment and let me know if you have a better solution to the issue!

Finally, a big thanks to Thomas Rhode for helping me write the code.

Setting up smart triggers with lookup tables in Google Tag Manager

I often need to manage a series of tags where I have to handle multiple business units in multiple languages, and where multiple events need to occur to fire specific variables.

When you are using variables, you are often limited by lookup tables, as each table only takes one variable as input and returns one output. However, since the release of the Regex Lookup table, a lot of things have become easier to do.

Everything is an event

Whenever something happens within Google Tag Manager, an event is fired. A DOM load is a gtm.dom event, a page load is a gtm.load event, and so on. In this post I will write about how to use this to make your tracking a bit smarter and your triggers more dynamic.

In my last post I showed how to strip down Floodlight tag parameters. As I hate making a ton of tags, I thought: “What if I could combine all my Floodlight tags into 2 tags, a counter and a sales tag, and only use the 3 variables necessary to build them (Category, Source and Type)?”

To do this I decided to make a very small piece of JavaScript to handle the task:


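function() {
  // Join the built-in Event and Page Path variables into one string
  // that a single lookup table or trigger can match on.
  var combinedVariables = {{Event}} + {{Page Path}};
  return combinedVariables;
}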

This is just an example, but it has endless possibilities. Imagine that you want a tag to trigger once some specific dataLayer variables are present on certain pages. Now you can! Just go and add that in your custom variable and select the things you need to be able to fire your tags. See how I set it up here:

Above I have combined the business unit with a country, with an event, with a pageview. This means that I can switch between any organisation built into the dataLayer and do any type of combination I need. This is quite neat, as it gives me the flexibility to use 3 variables to control 2 Floodlight tags instead of 20, saving me time and giving others a better overview when having to use multiple marketing tags.
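To make the idea concrete, here is a sketch of a combined key and a regex-based lookup in plain JavaScript. It only roughly mimics what a regex table does in GTM (first matching row wins), and every name, separator and value in it is hypothetical:

```javascript
// Builds one key out of several values, separated by a pipe.
function buildTriggerKey(businessUnit, country, eventName, pagePath) {
  return [businessUnit, country, eventName, pagePath].join('|');
}

// Rows are checked from top to bottom; the first matching pattern wins.
var floodlightRows = [
  { pattern: /^retail\|dk\|gtm\.load\|\/checkout\/thank-you/, output: 'sales_dk' },
  { pattern: /^retail\|[a-z]{2}\|gtm\.js\|\//, output: 'counter' }
];

function lookupFloodlightCategory(key) {
  for (var i = 0; i < floodlightRows.length; i++) {
    if (floodlightRows[i].pattern.test(key)) {
      return floodlightRows[i].output;
    }
  }
  return undefined; // no row matched
}

var key = buildTriggerKey('retail', 'dk', 'gtm.load', '/checkout/thank-you');
console.log(key);                           // "retail|dk|gtm.load|/checkout/thank-you"
console.log(lookupFloodlightCategory(key)); // "sales_dk"
```

Using a separator between the parts avoids accidental matches where the end of one value runs into the start of the next.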