Week 8 of my minidegree in conversion rate optimization delivered on high-quality content.
This week involved…
- Conducting an audit of a Google Analytics account
- Performing A/B tests
- Statistics for A/B testing
Of these, I found the information on how to conduct a GA audit most interesting.
It seems that week after week, I’m learning more and more how important Google Analytics is to the conversion process.
The information covered in this course was in-depth and delivered in an engaging, entertaining way.
At the end of the course, I had a long list, and a driving desire to go through the analytics accounts for each of my websites. I’ll begin, though, with Medicos Expertos, as that’s the one getting the most traffic, and it’s the one I plan on monetizing this month.
The section on A/B testing also proved to be packed with valuable information.
For years, I’ve seen different services promote their A/B testing capabilities. Companies like Thrive Themes, Unbounce, Cartflows – all have A/B testing built in. The thing I’ve learned after this week’s course on A/B testing with CXL founder Peep Laja is just how unlikely it is that you’ll get valid data from these tests.
Based on what I learned, most small businesses will struggle to have sufficient amounts of traffic to perform valid A/B tests.
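To see why, here’s a rough back-of-the-napkin calculator (my own sketch, not course code) using the standard normal-approximation sample-size formula for a two-proportion test, with a two-sided alpha of 0.05 and 80% power baked in:

```python
import math

# Fixed at alpha = 0.05 (two-sided) and 80% power; z-values hardcoded.
Z_ALPHA = 1.96
Z_BETA = 0.84

def visitors_needed_per_variation(baseline_cr, relative_mde):
    """Per-variation sample size for a two-proportion test, via the
    normal approximation: n ~= 2 * (z_a + z_b)^2 * p(1-p) / delta^2."""
    delta = baseline_cr * relative_mde          # absolute lift to detect
    variance = baseline_cr * (1 - baseline_cr)
    return math.ceil(2 * (Z_ALPHA + Z_BETA) ** 2 * variance / delta ** 2)

def weeks_to_run(weekly_visitors, n_variations, baseline_cr, relative_mde):
    """Full weeks until each variation reaches the required sample size."""
    needed = visitors_needed_per_variation(baseline_cr, relative_mde)
    return math.ceil(needed / (weekly_visitors / n_variations))
```

A site converting at 2% that wants to detect a 20% relative lift needs roughly 19,000 visitors per variation. At 5,000 weekly visitors split between two variations, that’s about two months of testing – which is exactly why most small sites can’t run valid A/B tests.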
The next course – Statistics For A/B Testing – is testing my ability to pay attention. It triggered flashbacks to first year stats in university – when I had to spend long nights cramming for an important exam to bring my Stats grade up to an acceptable level.
I don’t like the subject, and this course is showing me that maybe A/B testing isn’t where I want to focus my efforts. I’d rather work on being a world-class conversion copywriter like Momoko Price.
But as much as I dislike the subject matter, I’m glad it’s part of the minidegree. It speaks to how in-depth the curriculum is. It’s crucial information if you DO want to become an expert in A/B testing.
As always, my notes and thoughts on each course are below…
Google Analytics Audit
GA Property and View
Ranking scale – Eisenhower Matrix
- Important and urgent
- Important but not urgent
- Not important but urgent
- Not important and not urgent
Property Settings > All Products:
- Check to make sure that the property is connected to Google Search Console
Tracking Code > Referral Exclusion List:
- Make sure that the site URL is added as an exclusion.
- If using a development version, make sure to add the “live” domain to the exclusion list.
- See Simo Ahava article on 13 useful custom dimensions (add version ID to GTM container)
- Check that a RAW view exists
- Check that a TEST view exists
- Make sure that website URL matches the one used in property settings
- Make sure bot filtering is checked (should be by default)
- Turn Site Search Settings “ON”
- Strip query parameters
- Check whether site search query parameters are being used (they should be)
- Set up “no traffic” alert. Alerts you immediately to any problem with your site.
- Raw data view should have NO filters
- Do not exclude URL query parameters
- Uncheck “Bot Filtering”
- Do Site Search Tracking
- Do link it to Google Ads (AdWords) if you’re using it, so you collect all relevant data.
- Enable Enhanced Ecommerce if needed
- Set up Ecommerce settings same as the main view
- Do not test stuff in Raw Data View
Views To Add
- Add a view for just organic traffic
- Add a view for paid traffic
Sending Page Views Correctly
Verify that page views are being tracked correctly.
- Undercounting occurs for several reasons, but can be quickly spotted using a Chrome debugging extension and the Developer Tools console.
- Overcounting happens when you have a version of GA tracking a site, and then GTM is added without removing the GA code.
The Hostname Filter
Using the GTM/GA Debug extension, look for the GA info in the Chrome console.
Document location (dl) is where we see the hostname.
Creating a Hostname filter makes sure your data includes ONLY traffic from your domains.
Things get tricky when doing cross-domain tracking.
Need to add this filter to MedicosExpertos.co
Check out the article included in lesson notes.
Set up IP filter to exclude traffic from my location
You can exclude regions, cities, countries.
Make sure to test that exclusions are working.
Default Channel Group
Acquisition > Channels
How did people find your website?
Traffic that is categorized as (Other) needs to be defined.
Rules of good traffic bucketization
- Read and know the Google default definitions
- Adjust tracking to match the Google default definitions
- Adjust the default channel group to match your own values.
To change default channel groups…
Property > View > Channel Settings
- Create a new channel or add on to the default grouping
Most traffic from email is going to show up as “direct”.
Make sure to add source and medium UTM to links in email campaigns.
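Tagging every email link by hand is error-prone, so I’d script it. A minimal sketch (the function name and example URLs are my own, not from the course) using Python’s standard library:

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def add_utm(url, source, medium, campaign):
    """Append utm_source / utm_medium / utm_campaign to a link,
    preserving any query parameters the URL already has."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query.update({"utm_source": source,
                  "utm_medium": medium,
                  "utm_campaign": campaign})
    return urlunparse(parts._replace(query=urlencode(query)))

tagged = add_utm("https://example.com/offer", "newsletter", "email", "july-promo")
```

Run every link in the campaign through something like this and the email traffic stops hiding inside “direct”.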
You want to do a site crawl when you’re verifying page views and the accuracy of data.
Using ScreamingFrog.com, enter your GTM container ID in a custom search to verify the tag is on every page.
Never use UTM links within the site, or each click will start a new session and end the one that brought the visitor to the site.
Content Grouping and Query Parameters
The purpose is to make sure all variants of the same page are grouped together.
Type a “?” into the search field when in All Pages view of GA.
This will bring up all pages that contain a query parameter in them.
Do they exist?
Do they make sense?
Are there too many/too few?
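If you export the All Pages report, the same check can be scripted. A quick stdlib sketch of my own that collapses paths differing only in their query string:

```python
from collections import Counter
from urllib.parse import urlparse

def pageview_counts_ignoring_params(paths):
    """Collapse pageview paths that differ only in their query string,
    so /pricing?ref=nav and /pricing count as the same page."""
    counts = Counter()
    for p in paths:
        counts[urlparse(p).path] += 1
    return counts
```

Comparing the collapsed counts against the raw report quickly shows which query parameters are fragmenting your pageview data.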
Events are the heart and soul of GA.
There should be a small number of event categories (10-20)
- Always track outbound links and download links
Event actions and labels need to make sense.
Do they exist?
Do they make sense?
Are there too many/too few?
Events are cheap – set up lots of them.
Goals – be stingy. You are only allowed a limited number of goals (20 per view) in GA.
PII (Personally Identifiable Information)
If any personal info gets transferred to GA, your account can be terminated and your data destroyed.
PII can show up in…
- page content
- Event category, action, or label
- Search terms
- Custom dimensions
- Go to All Pages, and perform a search for “@” to find email info in page parameters.
- Look at search terms for any PII
Create segments and use regex to find any PII.
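For example, a simple regex scan over an export of page paths, search terms, or event labels (a rough sketch of my own – the pattern catches obvious email addresses, not every PII format):

```python
import re

# Deliberately simple pattern: catches obvious email addresses,
# not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def find_pii_pages(rows):
    """Return report rows (page paths, search terms, event labels)
    that contain an email address and therefore leak PII."""
    return [row for row in rows if EMAIL_RE.search(row)]
```

Anything this flags needs to be stripped at the tracking layer before it ever reaches GA.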
EEC (Enhanced Ecommerce)
Check to see if Event labels are being consistently used.
Next, check the shopping behaviour funnel to make sure the numbers look “legit”.
Checkout behaviour numbers should make sense.
The funnel steps are set in the admin section and should correspond with what’s actually happening on the site.
Make sure all four product performance categories are being tracked
Confirm that revenue numbers are accurate – compare to actual sales.
Split your traffic and test at the same time. Don’t just implement one change and run it for a week to see if it beats the week before.
The reasons for doing this are many. Your results can be affected by many factors:
- Days of the week
- Fluctuations in traffic from day to day
- TV events (Super Bowl)
Two types of testing: A/B tests and multivariate tests.
The more pages you test at once, the longer it takes to get a winner. For low-volume sites, stick to two pages at a time.
Multivariate testing requires a LOT of traffic.
Categorizing and Scoring Issues
Allocate all of your findings from your conversion research report into one of five buckets:
- Test. Stuff that presents an obvious opportunity to shift behaviour, expose insight, or increase conversion.
- Instrument. Things to improve analytics reporting.
- Hypothesize. We know something needs to be done, but aren’t sure what.
- Just do it. Easy fixes that require little effort.
- Investigate. Requires further investigation, and some testing to triangulate a problem.
Score each item (1-5) on ease of implementation and on opportunity impact.
An item with a score of 5 is critical. Implementing fixes or testing is likely to drive significant change in conversion and revenue.
Follow the money. Start with things that make the biggest impact on the bottom line.
How To Run Tests
A wireframe is first and foremost a communication tool. It tells people immediately what you want, what the treatment should be like.
It is not designing!
Balsamiq is a quick, easy tool for creating mock-ups.
Getting Testing Right
To do a proper test, you need 3 things:
- Big enough sample size
- Long enough test duration (min. 2 business cycles)
- Statistical significance
Ignore test results until you have at least 350 conversions per variation. THIS IS JUST A BALLPARK FIGURE.
It’s a good idea to test for different segments – desktop, mobile for example.
P-value is the probability of seeing a result as extreme or more extreme than the one observed, given that A & B are identical.
Bayesian stats can actually tell you the probability of one being better than the other. P-value can not.
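To make the Bayesian claim concrete, here’s a little Monte Carlo sketch (my own illustration, not course code) that estimates the probability that variation B truly converts better than A, using uninformative Beta(1, 1) priors:

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=42):
    """Monte Carlo estimate of P(true CR of B > true CR of A),
    with Beta(1, 1) priors on each conversion rate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # One plausible conversion rate per variation, per draw
        a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += b > a
    return wins / draws
```

With 100/5,000 conversions on A and 130/5,000 on B, this reports B ahead with a high probability (well above 90%) – a direct statement about which variant is better that a p-value can’t give you.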
Don’t stop the test when/if you reach 95%+ confidence.
Don’t make conclusions based on a very small sample size. A/B testing tools always “call it too early” because they assume the sample size was fixed in advance, and that it was large enough.
Do not increase the sample size by sending atypical traffic to the experiment – e.g. blasting your email list with a link. Your email list may not be representative of your website search traffic.
Regression to the mean
If you experience “regression to the mean”, where your uplift returns to the baseline, try running the experiment a second time to ensure it is not a false-positive.
Results may be skewed due to the novelty effect. Returning visitors are curious about the new design or element and perform a conversion (sign up) again just to check it out.
Margin of error
If results overlap when considering the margin of error, then the test needs to keep running.
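A quick way to check this (a sketch of my own, using the normal approximation for the interval):

```python
import math

def conversion_ci(conversions, visitors, z=1.96):
    """95% normal-approximation confidence interval for a conversion rate."""
    p = conversions / visitors
    moe = z * math.sqrt(p * (1 - p) / visitors)   # margin of error
    return (p - moe, p + moe)

def intervals_overlap(ci_a, ci_b):
    """Overlapping intervals mean the variants haven't separated yet."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
```

For example, 100 vs. 120 conversions on 5,000 visitors each still overlap – keep the test running – while 100 vs. 200 clearly separate.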
Measure what matters.
Measure to the final goal. Measure the money.
Number of transactions can go down while revenue goes up – increasing absolute conversions are not the main goal.
Avoid overlapping tests when you can
When possible, avoid running tests on multiple pages (home, cart and checkout)
When running tests with overlapping traffic, make sure the traffic is always evenly split.
The Multivariate Test
MV tests test more than one thing at a time, but are tricky to pull off. They need a lot of traffic, and the variants being tested need to be run for the same amount of time.
Mutually Exclusive Tests
Instead of forcing a multivariate test, conduct a mutually exclusive test where users are assigned to just one of the tests.
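Assignment has to be sticky, or a returning visitor could drift between tests. A hash-based sketch (function and test names are my own, hypothetical examples):

```python
import hashlib

def assign_test(user_id, tests):
    """Deterministically place a user into exactly one concurrent test.

    Hashing the id (instead of random.choice) keeps the assignment
    sticky: the same visitor lands in the same test every session."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return tests[int(digest, 16) % len(tests)]
```

Because each user sees only one test, the experiments can run concurrently without contaminating each other’s results.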
Run tests for full weeks
You must run a test for a full week to rule out differences in conversion based on day of the week.
If your test runs during a major holiday like Christmas, you want to re-run the test in January to confirm the results.
Testing more than 2 variations
When you have more than one variation and one is losing badly, don’t just remove it from the test. It changes the test dynamics and throws off results.
Instead, stop the test, remove the losing variant and test just the control vs the remaining variants.
Send traffic to Google Analytics
Send test info into GA as custom variables. Then run advanced segments and custom reports on it.
Customer Theory & Buyer Personas
Create a document called Customer Theory which contains what you know about different types of users – personas.
- Small business owners want this. They’re concerned about these things.
With every test, update the document with what you’ve learned.
The purpose of testing is to learn, not to get a lift.
Keeping a Customer Theory document allows you to quickly get new people up and running with what has been tried, what’s worked and what hasn’t.
For building personas, we’re most interested in:
- Intent (motivation, goals). Why are they on the site? What do they want to accomplish? Where did they come from? What did they see before arriving on the site?
- Concerns and fears (friction). What are their fears, hesitations? What info do they need to feel comfortable and confident to take action?
- Mode of persuasion. Will they take action quickly or slowly? Are they emotional or logical decision-makers?
Forming a test hypothesis
A hypothesis is formed from research.
- Because we saw (data/feedback)…
- We expect that (change) will cause (impact)
- We’ll measure this using (metric)
Always pick one key metric to determine success!
A better hypothesis is formed from both qual and quant data.
The point of forming a hypothesis is that we understand…
- Which problem we’re trying to solve;
- What’s the solution;
- And which metric we are trying to improve.
Never enter a test without a hypothesis.
By having a hypothesis, you will…
- Be more focused;
- Improve communication between you and the stakeholders, and team;
- Learn more about your users.
Normally you want to test just one hypothesis at a time – EXCEPT – when you have low volume of traffic.
But only test more than one hypothesis if your data or analysis shows strong support for these hypotheses.
How to prioritize tests
PIE funnel – Potential, Importance and Ease
Score each idea on a scale of 1-10 in each of the three areas. The downside is it’s hard to put a number on the potential success of a hypothesis.
ICE Model v2
Impact, Cost, and Effort
Score a hypothesis in each area on a scale of 0-4
Cost is financial and effort is time involved.
A series of questions is asked, and each answer is scored: Yes = 1, No = 0.
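The arithmetic behind these models is trivial to automate. A PIE-style sketch (the test ideas and ratings are made-up examples of my own):

```python
def pie_score(potential, importance, ease):
    """PIE: average of three 1-10 ratings; higher means test sooner."""
    return round((potential + importance + ease) / 3, 1)

# Hypothetical test ideas with made-up ratings
ideas = {
    "rewrite checkout copy": pie_score(8, 9, 7),
    "new hero image": pie_score(5, 6, 9),
    "redesign navigation": pie_score(7, 8, 2),
}
priority = sorted(ideas, key=ideas.get, reverse=True)
```

The ranked `priority` list becomes your testing queue – highest score first.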
Your test results are impacted by what’s happened or what is happening in the external world.
Another risk: developers didn’t do a good job testing your variation in different environments – Safari, Chrome, mobile, desktop, etc.
Protect against it by…
- implementing strict QC measures.
- Match results to transactional data – e.g. If stats show revenue up 40%, check with accounting to confirm.
- Do a double control experiment – A/A/B (25%/25%/50%) Both A’s should see identical user behaviour. If not, there’s an instrumentation error.
Low sample size
Sample size required should be calculated up front – before the test to make sure you can get enough people to validate the test.
The Flicker Effect
The flicker effect refers to what happens when a user sees the original page for a fraction of a second before seeing the variation. In some cases, the flicker lasts 3-4 seconds and totally skews the results.
The A/B script often loads in parallel to the site content. The faster the site loads, the more pronounced the flicker effect.
Another cause of the flicker effect is the testing tool’s tags loading after your website’s code.
If you see the flicker effect, do this:
- Take the code out of GTM, and place it directly in the page html.
- Load the testing tool tags in the header – before the closing </head> tag
- Optimize for speed.
- Check performance across different browser segments
- Check performance across key segments. Lift may differ sharply between segments. Overall you may see a good lift, but mobile, for example, may show almost nothing while desktop is huge.
- Check microconversion values – e.g. average cart value
What to test
- Obvious problems with obvious solutions
- Creative solutions when there are no obvious problems.
- E.g. Try logins with social accounts instead of email addresses
- What kind of biases can we exploit?
- Re-imagine your entire approach, or design.
How many changes per test?
- No clear answer
- Making 1 change allows for better learning, but is only good for high-traffic sites.
- Many changes are more likely to alter user behaviour but you can’t be sure which specific change resulted in the improvement.
- Make sure that all changes address a specific problem. Don’t just change for the sake of change.
- Every change supports one hypothesis – e.g. improve clarity across the funnel.
A/B testing vs. MVT
MVT requires about 100,000 visitors per month to run and validate.
Bandit tests send more traffic to the variation that seems to be converting better. It maximizes the money earned during the test.
Good for short-term campaigns – around holidays.
Requires minimal involvement.
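The core of a bandit is simple. An epsilon-greedy sketch (my own minimal illustration, not a production allocator):

```python
import random

def epsilon_greedy(counts, rewards, epsilon=0.1, rng=random):
    """Pick the next arm (variation) to show: exploit the current best
    converter most of the time, explore a random arm epsilon of the time."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    rates = [r / c if c else 0.0 for r, c in zip(rewards, counts)]
    return rates.index(max(rates))

def record(counts, rewards, arm, converted):
    """Update an arm's stats after showing it to a visitor."""
    counts[arm] += 1
    rewards[arm] += int(converted)
```

Because traffic keeps shifting toward the better-performing arm, less revenue is lost during the campaign than with a fixed 50/50 split.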
Existence testing (ET) is when you remove elements from a page and see what the result is. If the elements removed do not affect conversions, leave them out and maybe replace them with something else that could have a positive impact.
Iterative testing: you test small changes. E.g. Amazon.
Test any random idea and prove the value of testing.
Innovating a portion of your site is good when iterative testing isn’t working.
Also good to use when you’ve reached local maximum for a particular page.
Takes users down a different path.
- E.g. 1-step checkout vs. multi-step checkout.
Statistics for A/B Testing
Basics of Causal Inference
Correlation does not equal causation.
Run an A/A test to observe the “noise” in your data. The two data sets may vary, but given enough time, the difference between them should converge to zero. If it doesn’t, there is a mistake in your experiment setup.
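You can see this convergence in a toy simulation (my own sketch): split simulated visitors with the same true conversion rate into two buckets and look at the gap between the observed rates.

```python
import random

def aa_gap(n, true_cr=0.05, seed=1):
    """Simulate an A/A split with one shared true conversion rate and
    return the absolute gap between the two observed rates."""
    rng = random.Random(seed)
    a = sum(rng.random() < true_cr for _ in range(n)) / n
    b = sum(rng.random() < true_cr for _ in range(n)) / n
    return abs(a - b)
```

With 10,000 visitors per side the gap is typically a small fraction of a percentage point; a persistently large gap in a real A/A test signals an instrumentation problem.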
Statistical Significance and Other Estimates
The more data, the bigger the Z-score.
The Z-score accounts for the estimated variance, the observed distance from the model, and the sample size.
P-value is the probability, under the specified statistical model for the null hypothesis, of observing a statistic as extreme or more extreme than the one observed.
It’s a way of describing how surprising an outcome is.
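Putting the Z-score and p-value definitions together, here’s a minimal two-proportion z-test (my own stdlib sketch, not course code):

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Z-score and two-sided p-value for the gap between two observed
    conversion rates, under the null that the true rates are equal."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value
```

Note how the Z-score grows with sample size: the same observed gap becomes more “surprising” (lower p-value) as more data accumulates.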
Observing a low (statistically significant) p-value means one of the following is a possible logical conclusion:
- The null hypothesis is NOT true.
- The null hypothesis is true, but we have observed a very rare outcome.
- The statistical model is inadequate, hence the calculated nominal p-value is not an actual p-value.