Excluding internal traffic while anonymizing IP addresses in Google Analytics

Introduction

Since the recent rollout of the GDPR, many people have become aware that IP addresses can in some cases be considered personally identifiable information (PII), and they therefore wish to anonymize this information in Google Analytics.

Two weeks ago I was at Measure Camp in Copenhagen, giving a talk about IP anonymization and the complications it creates for excluding internal traffic. This blog post sums up the points presented and discussed at the event.

I am not a lawyer or an expert on how IP addresses work, so this blog post focuses on how to anonymize IP addresses and exclude internal traffic rather than on GDPR or technical walkthroughs of IP addresses.

The issue

Imagine that you visit a smaller website from an identifiable ISP. If you arrive at the website from a specific newsletter campaign, it will be quite easy to find you specifically in Google Analytics. All it would take is to:

  • Select network
  • Find your ISP address
  • Create a segment saying: Only show users who came from a specific ISP address and have visited my website through a specific campaign

In other cases, your IP address might be linked to your home address through your telecom provider, and in some cases it will be possible to identify you as a person.

If you know that this might be an issue for some of your users, you can easily anonymize IP addresses through Google Tag Manager by setting the custom field “anonymizeIp” to “true”.

What happens is that Google removes the last part of the IP address: for IPv4 addresses, the last octet is set to zero before the data is stored or processed.

While this is all great (besides screwing up your geography data at city level), it does create a major issue: all your IP exclusion filters will stop working, because they can no longer see the full address.
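To see why the filters break, here is a minimal sketch of what anonymization does to an IPv4 address, assuming the last octet is simply replaced with 0. The function name `anonymizeIpv4` and the address (taken from the documentation range 203.0.113.0/24) are illustrative and not part of any Google API.

```javascript
// Illustrative only: mimic IPv4 anonymization by zeroing the last octet.
function anonymizeIpv4(ip) {
  var parts = ip.split('.');
  parts[3] = '0';
  return parts.join('.');
}

var officeIp = '203.0.113.57';               // hypothetical internal address
var seenByFilter = anonymizeIpv4(officeIp);  // what a view filter gets to see

// An exclusion filter written for the full address no longer matches.
var filterStillMatches = (seenByFilter === officeIp);
```

Every address in the same /24 block collapses to the same value, so a filter on the anonymized value would also exclude any external visitors who share that block.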

To make sure that we can still exclude our internal traffic, we need to figure out whether the user counts as internal or external traffic before sending data to Google Analytics.

The main inspiration for this is a real client case, where I was asked to develop a way to exclude internal traffic while anonymizing IP addresses, without having any resources besides Tag Manager to help me complete the task. A huge thanks goes to Simo Ahava for writing a similar post, which was used as an inspiration for making this setup.

The solution(s)

In this section, multiple solutions are provided. Use the one that fits your organization / client the best.

Solution A: Only anonymize external traffic

Step 1: Create a regular expression with your IP addresses

First of all, you need to find all your internal IP addresses and combine them into a single regular expression.
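One possible format is sketched below. The addresses are placeholders from the reserved documentation ranges, and the variable name `internalIpPattern` is an assumption for illustration, so replace both with your own values.

```javascript
// Hypothetical internal addresses; replace them with your own.
// Dots are escaped so they match a literal "." and the anchors ^ and $
// prevent partial matches such as "1198.51.100.7".
var internalIpPattern = /^(192\.0\.2\.10|192\.0\.2\.11|198\.51\.100\.\d{1,3})$/;

internalIpPattern.test('192.0.2.10');    // true
internalIpPattern.test('198.51.100.42'); // true
internalIpPattern.test('203.0.113.5');   // false
```

The last alternative shows how a whole range can be covered with a single entry instead of listing each address.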

Step 2: Add them to a script that sets a cookie if the user is internal or external

After that, you need to include the regular expression in a script.

The script calls the service ipify and checks the IP address of the user visiting the site. It then matches that address against the regular expression you set up. If it matches, the script sets a cookie in the user’s browser saying that it is internal traffic. If not, it sets a cookie saying that it is external traffic.

The duration of the cookie is up to you to set. We have decided to let the cookie for external traffic be 7 days to not store that information for longer than needed, and 30 days for internal traffic.
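The following is a minimal sketch of such a script, not the exact code used in this setup. The cookie names `internalTraffic` and `externalTraffic`, the cookie value `1` and the example pattern are assumptions. The only external call is to ipify's public endpoint `https://api.ipify.org?format=json`, which returns the caller's IP as JSON.

```javascript
// Hypothetical values: adjust the pattern, cookie names and lifetimes.
var INTERNAL_IP_PATTERN = /^(192\.0\.2\.10|198\.51\.100\.\d{1,3})$/;

// Decide which cookie to set for a given IP address.
function cookieForIp(ip) {
  return INTERNAL_IP_PATTERN.test(ip)
    ? { name: 'internalTraffic', days: 30 }
    : { name: 'externalTraffic', days: 7 };
}

// Build the string that is assigned to document.cookie.
function buildCookie(name, value, days, now) {
  var expires = new Date(now.getTime() + days * 24 * 60 * 60 * 1000);
  return name + '=' + value + '; expires=' + expires.toUTCString() + '; path=/';
}

// Browser-only part: ask ipify for the visitor's IP, then set the cookie.
if (typeof document !== 'undefined' && typeof XMLHttpRequest !== 'undefined') {
  var request = new XMLHttpRequest();
  request.open('GET', 'https://api.ipify.org?format=json');
  request.onload = function () {
    var ip = JSON.parse(request.responseText).ip;
    var cookie = cookieForIp(ip);
    document.cookie = buildCookie(cookie.name, '1', cookie.days, new Date());
  };
  request.send();
}
```

In Google Tag Manager this would be placed inside script tags in a custom HTML tag. Keeping the decision logic in small functions makes it easy to check the 7-day and 30-day lifetimes without calling the external service.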

Step 3: Add the cookies set in Google Tag Manager and use them as variables

If you create a new variable in Google Tag Manager and select the “1st party cookie” variable type, it is possible to read cookies in a user’s browser and use that information to control which tags you fire. This is especially handy when tags should only fire based on user consent.
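Conceptually, a cookie variable looks up one name in the browser's cookie string. The sketch below illustrates that idea in plain JavaScript; the function `readCookie` is a hypothetical helper and not how Google Tag Manager implements the variable internally.

```javascript
// Return the value of a named cookie from a cookie string, or undefined.
function readCookie(cookieString, name) {
  var pairs = cookieString ? cookieString.split('; ') : [];
  for (var i = 0; i < pairs.length; i++) {
    var separator = pairs[i].indexOf('=');
    if (pairs[i].slice(0, separator) === name) {
      return decodeURIComponent(pairs[i].slice(separator + 1));
    }
  }
  return undefined;
}

// In a browser you would pass document.cookie as the first argument.
readCookie('_ga=GA1.2.1.2; internalTraffic=1', 'internalTraffic'); // '1'
```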

Step 4: Make sure that the script does not fire all the time

Then add the script as a custom HTML tag that fires on all pages, but only as long as neither of the cookies is set.

Step 5: Write a script that turns the cookies set into one variable

The next thing we need is a script that combines the cookies into one variable. The advantage is that it makes it possible to create lookup tables, which will be used to decide whether a user should be anonymized or not.
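A sketch of such a script could look like the following. The return values `internal`, `external` and `unknown` are assumptions made for illustration, as are the two parameter names, which stand for the values of the two cookie variables.

```javascript
// Inputs are the values of the two cookie variables (undefined when absent).
function trafficType(internalCookie, externalCookie) {
  if (String(internalCookie) === '1') { return 'internal'; }
  if (String(externalCookie) === '1') { return 'external'; }
  return 'unknown'; // no cookie yet: the IP lookup has not finished
}

trafficType('1', undefined);       // 'internal'
trafficType(undefined, '1');       // 'external'
trafficType(undefined, undefined); // 'unknown'
```

In a Custom JavaScript variable the same logic would sit inside an anonymous function, with the two parameters replaced by references to your own cookie variables.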

Then we create a lookup table that we will use to decide whether the anonymize IP feature should be switched on or off.

For some reason, boolean values must be wrapped in quotes (‘true’ rather than true) to work within your global analytics settings.
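Expressed as code, the lookup table is a simple mapping with a default. This is a sketch under the assumption that the input is a traffic type such as `internal` or `external`; the names `anonymizeIpLookup` and `anonymizeIpFor` are hypothetical.

```javascript
// Keys are traffic types; values are strings, mirroring the quoted
// 'true' / 'false' values entered in the lookup table.
var anonymizeIpLookup = { internal: 'false', external: 'true' };

function anonymizeIpFor(type) {
  // Default to 'true' so visitors without a cookie are still anonymized.
  return Object.prototype.hasOwnProperty.call(anonymizeIpLookup, type)
    ? anonymizeIpLookup[type]
    : 'true';
}

anonymizeIpFor('internal'); // 'false'
anonymizeIpFor('unknown');  // 'true'
```

Defaulting to 'true' is the cautious choice: a full IP address is only sent once the visitor has been positively identified as internal.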

Step 6: Add the lookup table to your Global Analytics settings

Finally, the anonymize IP feature in Google Analytics is set to only be active for external traffic on your site. Furthermore, the setup also sends whether the traffic is internal or external through a custom dimension.
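For readers who think in terms of analytics.js fields rather than the Tag Manager interface, the settings amount to something like the sketch below. The helper `buildFieldsToSet` and the dimension index 1 are assumptions; `anonymizeIp` and `dimensionN` are the standard analytics.js field names.

```javascript
// Build the fields for a hit. The dimension index is a hypothetical choice.
function buildFieldsToSet(type, dimensionIndex) {
  var fields = {};
  fields['dimension' + dimensionIndex] = type;
  // Anonymize everything except traffic positively identified as internal.
  if (type !== 'internal') { fields.anonymizeIp = true; }
  return fields;
}

// analytics.js usage in a browser: ga('set', buildFieldsToSet('external', 1));
```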

The reason I have chosen to do this, instead of just anonymizing everything and filtering the traffic based on the custom dimension, is to know which IP addresses are internal. With different agencies and stakeholders involved, I believe it is important to know what we are excluding on the site and to be able to check which IP a filter belongs to.

Cons

  • Sending your users’ IP addresses to an external service is not always something that your legal department will be okay with
  • It exposes all your internal IP addresses on the webpage

Pros

  • It is quite effective if you don’t have other means to exclude internal traffic

Excluding your internal traffic based on the custom dimension set

If you want to exclude your internal traffic with a custom dimension, you first need to set it up under Custom definitions > Custom dimensions. From here you need to select the index number corresponding to the dimension selected in Google Tag Manager and set its scope to “User”.

Once that is done, you can add an exclusion filter to remove your internal traffic. Remember that this will not work retroactively: internal traffic will only be excluded from the day you set up the filter, unless a filter was already in place before you started on this post.

And that should be it. From now on, any IP you add to your list in Google Tag Manager will be filtered out in Google Analytics.

Solution B: The same, but get your developers to do it!

Solution A is made for people who don’t have the resources to have IT detect the type of traffic going to the site. If you do have those resources, I recommend having IT create a variable in the dataLayer that checks the user’s IP address server side, and from there you can set your values.
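As a rough sketch of what the developers could deliver, assuming a JavaScript (Node.js) backend: the server compares the client IP with its own list and renders a small inline script that pushes only the result to the dataLayer. The key `trafficType`, the function name and the address list are all hypothetical.

```javascript
// Hypothetical server-side list of internal addresses.
var INTERNAL_IPS = ['192.0.2.10', '192.0.2.11'];

// Render the inline script to place above the Tag Manager container snippet.
function renderDataLayerSnippet(clientIp) {
  var type = INTERNAL_IPS.indexOf(clientIp) !== -1 ? 'internal' : 'external';
  return '<script>window.dataLayer = window.dataLayer || [];' +
    'window.dataLayer.push(' + JSON.stringify({ trafficType: type }) + ');</script>';
}
```

Because the comparison happens on the server, neither the visitor's IP address nor the list of internal addresses ends up in the page source.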

Cons

  • Involving IT takes maintenance, time and usually a bigger budget

Pros

  • Not sending your users’ IP addresses to third parties

Solution C: Make people in your organization visit a specific site each time they log in on an internal network

The last solution is something I have heard people say that they have done, and it is also one of the methods that Simo has written about in his previous post. It is, however, not something I have seen done successfully. The idea is simply to add a unique identifier for all the people who need to be excluded from the site. This can be done by:

  • Having them open up a specific email with a UTM code
  • Making the internal visitors visit a specific page and then send a cookie
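The second option can be sketched as follows, assuming a page path that only colleagues are told about. The path `/internal-opt-in`, the cookie name and the 30-day lifetime are illustrative choices and not part of the original setup.

```javascript
// Hypothetical path that only colleagues are told about.
var OPT_IN_PATH = '/internal-opt-in';
var THIRTY_DAYS_IN_SECONDS = 30 * 24 * 60 * 60;

function isOptInVisit(pathname) {
  return pathname === OPT_IN_PATH;
}

// Browser-only part: mark this browser as internal when the page is visited.
if (typeof document !== 'undefined' && isOptInVisit(window.location.pathname)) {
  document.cookie = 'internalTraffic=1; max-age=' + THIRTY_DAYS_IN_SECONDS + '; path=/';
}
```

The cookie only marks one browser, so every colleague has to repeat the visit on each device and again whenever the cookie expires or is cleared.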

Cons

  • Getting people to do things is difficult and requires frequent maintenance

Pros

  • If it works, it works well and without having multiple stakeholders involved

Final thoughts

I believe it is important to know why you need or want to exclude your IP addresses, and even more important to have a general understanding of how your organization uses data and works with user behavior in Google Analytics. In terms of anonymizing IP addresses, the first solution presented is a good starting point if you need to take action and don’t have any resources yourself. I do think that the best approach is to let the developers look at the IP address server side and push that information to you; however, it can be a hassle to get it updated.

I have been discussing my first solution and the issue with different clients and nerdy peers. My conclusion so far is that it really depends on what organization you are in, and where you and your legal team stand.

If you feel that any of the solutions I have provided can be used, feel free to implement them. If not, please leave a comment and let me know if you have a better solution to the issue!

Finally, a big thanks to Thomas Rhode for helping me write the code.


Join the discussion 10 Comments

  • We have followed the same approach for a couple of clients. One way to avoid looking up IP addresses in an external service is to have the developers expose the user’s IP address in a dataLayer. The process is the same: a lookup table checks the dataLayer variable for a match with the IP range and changes the anonymizeIp value accordingly.

    Like the idea of setting a cookie with a shorter lifetime, to minimize the need for continuous lookups 🙂

    • danny says:

      Yep, that is also how we are doing it for most clients. This particular instance of using an external service was for a client that didn’t have the resources to expose it, so I had to be creative 🙂

  • Dragos says:

    Hi Danny,
    I wanted to let you know that it worked for me with the GA filter by custom dimension set to “false” (not “true” like in the screenshot).

  • danny says:

    Hi Dragos, thanks for your comment!
    We want to remove / exclude the traffic from the view if internal traffic is true, so I can’t see how the other way can work. Can you give me some more details on this?

  • Dragos says:

    This is what I see:
    The script used for the custom dimension variable relates to Step 5 (JS return anonymize IP value) and returns the value “false” when internal traffic is present (== 1), from this statement: “else if (internal == 1) { returnValue = "false";”
    Is there another script reference to show true for internal traffic custom dimension?

    Dragos

  • danny says:

    So, the idea is to have IP anonymization set to “true” until the cookie is set. When I know whether the person is internal or external, I use the JavaScript to check which cookie is set and feed my lookup table. By default this says true unless the internal traffic cookie is present.
    The reason for this is that for Google Tag Manager to understand whether to anonymize IPs or not, you need to give it either the value “true” or nothing.

    Does it make sense :)?

  • Dragos says:

    Yes, I understand this. (“By default this says true unless the internal traffic cookie is present.”)

    Just wanted to mention that the screenshot with custom dimension value for the exclusion filter (Exclude internal traffic) worked with “false” instead of “true” because of the script setup.

    Related to this screenshot: https://analytics.mawani.dk/wp-content/uploads/2018/06/Sk%C3%A6rmbillede-2018-06-19-kl.-23.15.34.png

  • danny says:

    Ahh, yeah ! Thanks =)

    • Dragica says:

      Hey,
      This is not working. Not sure what went wrong. Going back to Dragos’s comment above, what should we do? Is that why it is not working? In GA, should we set the filter to True or False?
      Thank you!

      • danny says:

        Hi Dragica,
        Try and use the code here from Github mentioned in the blog: https://github.com/dannymawani/randomscripts/blob/master/checkforinternaltraffic

        If you look in the console you should see the IP address. Once you have added your regex to the script, you should be able to see the cookie. You can then do a lookup on anonymizeIp in your global analytics settings to do it automatically for you, and then use the exclude internal traffic filter you normally use in GA. Hope that helps 🙂
