Google Analytics is a great tool, and 62% of the top 10K websites have Google Analytics tracking code installed (according to similartech.com). However, there is one major problem a lot of people don’t realize:
Google Analytics tells you what happened, not why.
Figuring out the WHY is the hard part, and to do it we need to understand, at least on a basic level, how GA works.
I often hear that a high bounce rate is bad. Really? Let’s imagine a situation:
I’m going to Kelowna, BC next weekend and I want to have some fun on the water. I’m thinking about kayaking, but I don’t own a kayak, so I’ll have to rent one. I’m an MEC member and I know they have a rental program, so I open Google and type “Does MEC rent kayaks in Kelowna?”
Google doesn’t provide a direct answer in the search results, so I click the first link, which leads me to mec.ca. The store page shows a phone number, address, and map above the fold. I scroll down to find rental information: kayaks rent for $40 per day. Now I know where the store is, what the price is, and what number to call to book one, so I close the page.
I had a great experience. I promptly found the info I needed, but Google Analytics would report this session as a bounce with no time on the site.
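Why does GA report zero time for that session? It measures time on site as the gap between successive hits, and a single-pageview session has no second hit to measure against. A minimal sketch of that logic (not GA’s actual implementation):

```python
from datetime import datetime

def session_duration(hit_timestamps):
    """Session duration the way GA derives it: time between the first and
    last hit. A single-hit session (a bounce) has no second timestamp,
    so its duration is zero regardless of how long the page was read."""
    if len(hit_timestamps) < 2:
        return 0.0
    return (max(hit_timestamps) - min(hit_timestamps)).total_seconds()

# My MEC visit: one pageview, then I left satisfied.
bounce = [datetime(2017, 6, 1, 10, 0, 0)]
print(session_duration(bounce))  # 0.0, even though I read the page for a while
```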
On the flip side, Google Analytics may report that another user spent 5 minutes browsing your website. The problem is that we have no idea what really happened and what that user’s experience was. The user might have:
- been excited about your content and read several articles, or
- been desperately looking for information on a website so confusing that, even after 5 minutes and more than 20 pages, they couldn’t find it. Great engagement, right?
Both of these scenarios are perfectly possible but Google Analytics (in the default setup) won’t tell you which scenario is relevant.
Last Non-Direct Click Attribution
Solving attribution is a complex problem with which many large websites struggle. To get started, let’s stick with this simple scenario:
- I’m waiting for a meeting and killing time by checking my Twitter feed. I notice a link to a cool post from STAT.
- I haven’t heard of STAT before, so later that day I search for them on Google (a brand query).
- A couple of months later, I reach the point where I need a keyword tracking tool, so I search for “enterprise keyword tracking tool” and see a STAT ad in the results. Because I’ve already interacted with them, I click the ad.
- I read about their features and case studies, and decide to book a demo.
If all three sessions happened in the same desktop browser, Google Analytics would report one user, three sessions, and one conversion.
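The default model, last non-direct click, gives 100% of the conversion credit to the most recent channel in the path that isn’t direct. A hypothetical sketch of that rule, not GA’s actual implementation:

```python
def last_non_direct_click(touchpoints):
    """Return the channel credited under a last non-direct click model:
    the most recent channel in the path that isn't 'direct'."""
    for channel in reversed(touchpoints):
        if channel != "direct":
            return channel
    return "direct"  # the entire path was direct

# The scenario above: Twitter, then a brand search, then an AdWords click.
path = ["social", "organic", "paid_search"]
print(last_non_direct_click(path))  # paid_search gets all of the credit
```

Note that the social touchpoint that started the whole journey receives nothing under this model.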
Which channel gets the credit?
The conversion is fully attributed to Google AdWords, so the PPC guy should be celebrated, right?
Before firing the social media guy, look at the Assisted Conversions report. It reveals the entire story and can be found under Conversions > Multi-Channel Funnels > Assisted Conversions.
We can also compare different attribution models in the Model Comparison tool to get a better picture of what’s going on.
Note: Google announced a new tool called Google Attribution (beta) during its Google Marketing Next event in May 2017 that aims to solve the attribution problems for us.
Cross-device tracking is another huge pain point. Google Analytics is unable to stitch together sessions that users have made on multiple devices.
As soon as a website loads, Google Analytics looks for its cookie, which contains a clientID. If the cookie is not found, a new cookie and clientID are generated.
user = browser
Incognito mode doesn’t persist cookies, so every time a user opens a website in an incognito window, a new clientID is generated and Google Analytics counts that person as a new user.
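The cookie lookup boils down to a few lines. This is a simplified sketch of the decision, assuming a plain key-value cookie jar; the real `_ga` cookie also encodes a version and domain depth:

```python
import uuid

def get_or_create_client_id(cookies):
    """Look for the '_ga' cookie; if it's missing (first visit, cleared
    cookies, or an incognito window), mint a new clientID, and the
    visitor is counted as a brand-new user."""
    if "_ga" in cookies:
        return cookies["_ga"], False          # returning user
    client_id = str(uuid.uuid4())
    cookies["_ga"] = client_id
    return client_id, True                    # new user

jar = {}
cid1, is_new1 = get_or_create_client_id(jar)   # first visit: new user
cid2, is_new2 = get_or_create_client_id(jar)   # same browser: recognized
incognito_jar = {}                             # incognito: empty cookie jar
cid3, is_new3 = get_or_create_client_id(incognito_jar)  # "new" user again
```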
If I made every session from a different device, Google Analytics would record three users, three sessions and one conversion.
We would be unable to see that I had visited the website multiple times prior to booking a demo.
Google stated during the Google Marketing Next event that 30% of people use 5 or more devices. Imagine how different the data would be if Google Analytics solved multi-device attribution.
The definition of direct traffic is:
- Source: (direct)
- Medium: (none) / (not set)
What does that mean?
Direct traffic is junk.
If Google Analytics doesn’t know the source, it’s direct, and you’d be surprised how often it happens.
Direct traffic is recorded when a user:
- Types a URL by hand
- Clicks on a bookmark
- Clicks on a link in a PDF file
- Clicks on a link in a Word document
- Clicks on a link in an email
- Taps on a link in a mobile app
- Clicks on a link shared via Skype, Messenger, WhatsApp, etc.
- Clicks through from a secure site (HTTPS) to an unsecured site (HTTP), which strips the referrer
We can prevent some of this by tagging links with UTM parameters, since Google Analytics has no other way to recognize the true source of these clicks. If you don’t use UTM parameters, start today.
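Tagging is just appending three query parameters. A small helper (the URL and campaign names here are made up for illustration):

```python
from urllib.parse import urlencode

def tag_url(base_url, source, medium, campaign):
    """Append the three core UTM parameters so Google Analytics can
    attribute the click instead of lumping it into direct traffic."""
    params = {"utm_source": source, "utm_medium": medium, "utm_campaign": campaign}
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)

# A link destined for a PDF brochure; untagged, GA would report it as direct.
print(tag_url("https://example.com/pricing", "brochure", "pdf", "summer-2017"))
# https://example.com/pricing?utm_source=brochure&utm_medium=pdf&utm_campaign=summer-2017
```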
A few years ago, an experiment at Groupon showed that up to 60% of direct traffic may actually be traffic from search engines.
Referral Spam Traffic
The screenshot below shows almost 5,000 sessions since the beginning of 2015. Not bad for a tracking ID that’s never been deployed to any website. Spam is usually sent via Measurement Protocol so there is no request to your server.
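That’s why a never-deployed tracking ID can accumulate sessions: the Measurement Protocol accepts hits from anywhere. The sketch below only builds such a payload (the tracking ID is made up); POSTing it to `https://www.google-analytics.com/collect` would record a hit without any request ever touching the target website:

```python
from urllib.parse import urlencode

# A spammer needs nothing but a tracking ID, often simply enumerated.
payload = urlencode({
    "v": "1",                          # Measurement Protocol version
    "tid": "UA-12345678-1",            # target's tracking ID (made up here)
    "cid": "555",                      # arbitrary client ID
    "t": "pageview",                   # hit type
    "dp": "/",                         # page path
    "dr": "http://spam-site.example",  # fake referrer that shows up in reports
})
print(payload)
```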
Referral spam isn’t a problem for big websites because those fake sessions get lost in the ocean of real referral traffic; however, it may significantly skew data for smaller websites.
A new session starts:
- After 30 minutes of inactivity
- At midnight (a good reason to pay attention to the time zone set in your View settings)
- When a new incognito tab is opened
- When the campaign source changes
- When the tracking code is missing on some pages
- When cross-domain tracking is set up incorrectly
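The first two rules above can be sketched as a small sessionization function. This is a simplified model using naive timestamps (GA applies the midnight break in the view’s time zone); campaign changes and tagging issues are left out:

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

def count_sessions(hits):
    """Count sessions in a sorted list of hit timestamps, applying a
    30-minute inactivity timeout and a hard break at midnight."""
    sessions = 0
    previous = None
    for hit in hits:
        if previous is None or hit - previous > SESSION_TIMEOUT or hit.date() != previous.date():
            sessions += 1
        previous = hit
    return sessions

hits = [
    datetime(2017, 6, 1, 23, 50),  # session 1
    datetime(2017, 6, 1, 23, 58),
    datetime(2017, 6, 2, 0, 5),    # new day: session 2, after only 7 minutes
    datetime(2017, 6, 2, 1, 0),    # 55 minutes idle: session 3
]
print(count_sessions(hits))  # 3
```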
Data Sampling

Google’s description: In data analysis, sampling is the practice of analyzing a subset of all data in order to uncover the meaningful information in the larger dataset.

What does that mean? Google takes only a sample (e.g. 10%) of your data, analyzes it, and assumes the remaining 90% looks the same.
Sampling is usually accurate, with differences of around 2%. However, let’s look at the revenue attributed to search engine traffic in the example below.
Google Analytics allows you to get the same information from multiple reports. So, in theory, we should get identical numbers if we switch the report.
The second report is also supposed to return information about organic traffic, but as the top-right corner shows, it is based on only 12.57% of sessions. The result?
A 6.9% decrease (over $110,000) in reported revenue.
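A toy simulation (made-up numbers, not GA’s algorithm) shows how reading ~12.5% of sessions and scaling up can drift from the truth, especially for a sparse metric like revenue where only a few sessions convert:

```python
import random

random.seed(42)

# 500,000 sessions; roughly 2% convert at $200 each (toy data).
revenues = [200.0 if random.random() < 0.02 else 0.0 for _ in range(500_000)]
true_total = sum(revenues)

# A sampled report reads ~12.5% of sessions and scales the result up.
sample = random.sample(revenues, k=62_500)
estimated_total = sum(sample) * (500_000 / 62_500)

print(round(true_total), round(estimated_total))
```

Run it a few times with different seeds and the estimate wobbles around the true total; on real, skewed e-commerce data the error can be far larger.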
Default reports are not subject to sampling, but other (ad-hoc) reports are sampled once they exceed these thresholds:
- Analytics Standard: 500k sessions at the property level for the date range you are using
- Analytics 360: 100M sessions at the view level for the date range you are using
Google provides more information about data sampling on their support site.
Use default reports as much as you can to avoid sampling. Alternatively, we can download data day by day via the API and merge it in Google Sheets or Excel.
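The day-by-day trick works because each API request then covers far fewer sessions (assuming no single day exceeds the threshold on its own). A sketch of the date-splitting part; the actual reporting API call is omitted:

```python
from datetime import date, timedelta

def daily_ranges(start, end):
    """Split an inclusive [start, end] range into single-day ranges,
    one per API request, so each stays under the sampling threshold."""
    day = start
    while day <= end:
        yield (day.isoformat(), day.isoformat())
        day += timedelta(days=1)

for start_date, end_date in daily_ranges(date(2017, 6, 1), date(2017, 6, 3)):
    print(start_date, end_date)
# 2017-06-01 2017-06-01
# 2017-06-02 2017-06-02
# 2017-06-03 2017-06-03
```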
Two Real Examples
1.93% Bounce Rate?
It was too good to be true. As we found out, an interaction event fired automatically after every page load, so almost no session could count as a bounce.
Question everything below 20%.
Awesome traffic for a local business without rankings?
A client of mine was using a prehistoric one-click website builder. There was no way to add the tracking code through its backend, so we asked the provider to add it.
As soon as the snippet was implemented, we started seeing huge traffic numbers. The only problem was that the website didn’t rank for almost any keyword. The reason? The provider had implemented the GA snippet with our tracking ID on all of their clients’ websites.
This is a sneak peek of the potential problems you can run into, so I want to leave you with two main takeaways:
Always question your data!
Understand your metrics!
You don’t want to be making decisions based on false assumptions or incorrect metrics. It may harm your business.
I’d love to hear about your experience or problems with Google Analytics, so please share them in the comments.
Slides from my talk at STAT’s offices: