Not Provided: How to Use Behavioural Analytics to Infer Keywords

I spent a lot of time in the past doing SEO whilst running my own websites and I’ve always been fascinated by Visitor Intent. I used to spend hours sifting through keywords and working out what people really wanted from my website.

As an analytics consultant, I now employ more sophisticated classification models designed to group large numbers of keywords into intent “buckets” and align these with the customer journey (the topic of another blog post). But Not Provided has made that virtually impossible for Google organic traffic.

Nothing can make up for the loss of actual data, but there are workarounds. One of them is behavioural analytics: using actual visitor behaviour to infer what the keyword used might be.

Untapped behavioural predictor for keywords: What people do first on a website

Search behaviour is a highly active, task-oriented process. The visitor’s mindset is rarely on browsing; it’s on finding and doing something.

First Action as an extension of search behaviour

The first thing that people do on a landing page is closely aligned with what they are there to do. But so are keyphrases. So, can we use the known behaviour we do have data on (first action on site) to infer behaviour we don’t know, but which is closely related (keyword used)?

Yes, we can. And the beauty of it is that much of this behaviour is already being tracked in Google Analytics; we just never thought of looking at it that way.

In some cases, the first behaviour is even more accurate at predicting visitor intent than keyphrases ever were. Whenever I classified keywords by intent I inevitably ended up with an important group of keywords which were so broad and vague that it was impossible to work out what people’s intent was. But when you take their first action into account as well, the meaning of someone searching for a broad term like your brand becomes much clearer.

That’s because what someone does first on a site represents a choice which is:

  • Made actively and consciously. In the process, other possible choices are rejected.
  • Aligned with the task someone is in the process of completing.

Inferring keywords from known visitor behaviour: Examples

Let’s take a few examples to see how this might work in practice. There’s no rocket science involved. The trick is to learn to look for the underlying meaning behind interactions such as pages seen, buttons clicked, etc.

Example 1. Travel site, destination landing page

Sometimes the "first behaviour" is actually a set of micro-interactions which have meaning only when taken together. Many sites have an availability or search form with multiple filters and options. The sum of those choices, made as soon as someone lands on the page, represents a strong behavioural clue as to what their search query was.

First action

Visitor fills availability form

What to track

A concatenated string of all values in chosen options:
"santorini oia september 2013 2 adults self catering"

Inferred keyword
(landing page + first action)

"greece holidays santorini oia september 2013 2 adults self catering"
In other words, “self catering holidays for couples”

One thing we often forget is that when people land on a page, the first thing that they do is scan the page for the thing that most closely matches their mental context. If someone chooses "Regions in Greece" from all of the available options, it means that this is their initial intent and interest.

First action

Visitor clicks on Regions

What to track

Anchor text of link clicked:
"regions in greece"

Inferred keyword
(landing page + first action)

"greece holidays regions"
In other words, "where to go on holiday in greece"
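
The two Greece examples above follow the same recipe: take the topic of the landing page, append whatever the first action adds, and drop the repeats. Here is a minimal Python sketch of that recipe for working with exported data. The function names and the short stop word list are illustrative assumptions, not anything Google Analytics provides, and the output will usually still need a manual tidy-up.

```python
# Illustrative stop words to drop from the first action (an assumed, incomplete list).
STOP_WORDS = {"in", "on", "the", "a", "of", "to"}


def concatenate_form_values(values):
    """Join the options chosen in an availability form into one tracked string."""
    return " ".join(str(v).strip().lower() for v in values if str(v).strip())


def infer_keyword(landing_topic, first_action):
    """Append first-action words to the landing page topic, skipping repeats."""
    words = landing_topic.lower().split()
    seen = set(words)
    for word in first_action.lower().split():
        if word in STOP_WORDS or word in seen:
            continue
        words.append(word)
        seen.add(word)
    return " ".join(words)


form_string = concatenate_form_values(
    ["Santorini", "Oia", "September 2013", "2 adults", "Self catering"]
)
print(infer_keyword("greece holidays", form_string))
print(infer_keyword("greece holidays", "regions in greece"))
```

The second call prints "greece holidays regions", matching the inferred keyword in the Regions example.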

Behavioural scenarios are not always so clear cut. Banners and calls to action can sometimes distract someone from their original track. But more often than not, their first choice will still be closely aligned with the behaviour that started in search.

It’s unlikely that someone who searches for "luxury holidays in Greece" will choose to explore the budget holidays section as their first activity on site. You may not be able to infer the exact keyword used, but you may well be able to infer the market segment that someone belongs to. And that’s hugely useful.

First action

Visitor clicks on "Luxury holidays" banner

What to track

Custom attribute of banner image (or even the image file name, if relevant, which you can clean up later):
"greece luxury deals"

Inferred keyword
(landing page + first action)

"greece holidays luxury"

Example 2. B2B service site, homepage

Pages like the Homepage fulfil multiple functions and cater to different audiences. Not provided makes these sorts of pages difficult to analyse. But here’s the paradox. Often, these pages are the easiest ones to infer "not provided" keywords for because they have so many self-selection mechanisms baked right into them. By choosing one link over another, people raise their hand and tell you what topic, interest, and therefore likely keyword they used to get there.

First action

Visitor clicks "More" link leading to Reseller hosting

What to track

Page path of link visitor clicks on:
"reseller hosting"

Inferred keyword
(landing page + first action)

"reseller hosting"

In some cases, the first thing someone does on a landing page can be highly specific and therefore less open to interpretation. In those cases you can infer a "long tail" term with a great degree of confidence.

First action

Visitor clicks "How do I transfer a domain into Clook"

What to track

Anchor text of link clicked on:
"How do I transfer a domain into Clook"

Inferred keyword
(landing page + first action)

"clook support how do I transfer a domain"

Example 3. Ecommerce site, product category landing page

Even if people switch brand of products during their visit, the one they choose first is the one they probably searched for. Even if you knew that they searched for "washing machines", their first action (e.g. "Beko") tells you that what they really meant was "Beko washing machines".

First action

Visitor clicks on Beko logo

What to track

Page path of link clicked:
"Washing machines beko"

Inferred keyword
(landing page + first action)

"beko washing machine"

A price-related filter applied at the beginning of the visit can tell you a great deal about the likely commercial nuances in the keywords used by someone.

First action

Visitor clicks "under £350" link

What to track

Anchor text of link clicked:
"under £350"

Inferred keyword
(landing page + first action)

"washing machine under £350"
In other words, "cheap washing machine"

Example 4. Ticketing site, event landing page

The landing page will always give you the general topic (i.e. "head term") someone has searched for. But take into account the active choices people make on that landing page and you can refine that head term into more specific sub-topics.

First action

Visitor clicks "238 fan reviews"

What to track

Anchor text of link clicked:
"238 fan reviews"

Inferred keyword
(landing page + first action)

"charlie and the chocolate factory review"

First action

Visitor clicks link in calendar

What to track

Page path of link clicked:
"theatre royal drury lane london"

Inferred keyword
(landing page + first action)

"charlie and the chocolate factory theatre royal drury lane london"
In other words, "charlie and the chocolate factory tickets" (and not reviews)

Caveats of First Action Use for Secure Search Analysis

No long tail

First behaviour gives more clarity around what the search query might have been but it will never be as precise, varied or rich in meaning as long tail keyphrases. That richness is inevitably lost.

Dependent on available choices

The first action people take on a landing page is limited to the actions available. If there are no options, or very few, for visitors to actively choose from, then you might be clutching at straws. Sometimes it might make sense to bake in some self-selection mechanisms (this can also improve scent and general user experience).

Links and button names impact active choices

People choose the most appropriate link for their task based on what they see. If links or buttons are poorly labelled (e.g. "click here"), that can interfere with people’s decision process ("which link looks more likely to do what I am here to do").

Dominant calls to action can skew intent

Sometimes a single call to action that dominates a landing page is so effective that it masks the visitor’s original intent (which is what we’re concerned with).

Many people may click on that dominant call to action even if they have no intention of following through. That "detour" from their original intent muddles the data.

So, in some cases, the first few actions (rather than the first alone) give more insight into what people came looking for originally, and therefore the keywords they are likely to have searched for.

Brand searches difficult to estimate

In some cases it’s reasonably straightforward to use the first action to work out if the visitor searched for the brand name (for example, when the first action is clicking through to help, contact page, branch locator, etc). But these cases are in the minority.

There are options though. You can add other behavioural analytics into the mix, such as the speed with which people move through the site (brand awareness implies familiarity with the layout and features of the site, unless it is their first time on the site). I’ll explore these in a future post as they are more complex (and I’m still learning about them myself).

Based on assumptions and inferences

There’s no doubt that this technique is prone to error. You have to start with sensible assumptions (but assumptions nonetheless) about what the first action tells you about someone. You then need to validate those assumptions with data.

But the process has clear benefits. It puts you in your visitor’s shoes, forcing you to work out what their mental context might be. And by their very nature, these assumptions require that you always refer back to the business model. And that’s always a good thing.

If you have access to historical data pre-dating 100% not provided, you can cross reference many of these “first actions” with actual keyword data. You can work out what the “first action” was for important head terms or brand keywords and use that data to validate or clarify assumptions when the intent behind “first action” is too ambiguous.
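
To make that cross-referencing concrete, here is a rough Python sketch that counts which known keywords historically preceded each landing page and first action pair. The session records are made up purely for illustration, and the field names ("landing_page", "first_action", "keyword") are an assumed export format rather than anything GA produces directly.

```python
from collections import Counter, defaultdict


def keywords_by_first_action(sessions):
    """Count which known keywords preceded each (landing page, first action) pair."""
    counts = defaultdict(Counter)
    for session in sessions:
        keyword = session["keyword"]
        if keyword == "(not provided)":
            continue  # nothing to validate against
        counts[(session["landing_page"], session["first_action"])][keyword] += 1
    return counts


def top_keyword_share(counter):
    """Return the most common keyword and the share of sessions it accounts for."""
    keyword, hits = counter.most_common(1)[0]
    return keyword, hits / sum(counter.values())


# Made-up historical sessions, purely for illustration.
history = [
    {"landing_page": "/washing-machines", "first_action": "beko", "keyword": "beko washing machines"},
    {"landing_page": "/washing-machines", "first_action": "beko", "keyword": "beko washing machines"},
    {"landing_page": "/washing-machines", "first_action": "beko", "keyword": "washing machines"},
    {"landing_page": "/washing-machines", "first_action": "under £350", "keyword": "cheap washing machine"},
    {"landing_page": "/washing-machines", "first_action": "beko", "keyword": "(not provided)"},
]

lookup = keywords_by_first_action(history)
keyword, share = top_keyword_share(lookup[("/washing-machines", "beko")])
print(keyword, round(share, 2))
```

A high share suggests the first action is a dependable stand-in for that keyword; a low one flags an assumption that needs a closer look.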

How to track first actions in Google Analytics

Second Page Visited

As I mentioned earlier, some first behaviour is tracked in GA by default, provided the first action taken is a pageview (the visitor lands on page A, the first thing they do is click through to page B, and page B counts as the first action).

Google Analytics tracks this under the ga:secondPagePath dimension (which, sadly, is only available in the API; see below).
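
As a rough idea of what such an API request might look like, the Python sketch below only builds the query URL. It sends nothing and leaves out authentication entirely. It assumes the Core Reporting API (v3) parameter names, with ga:visits as the metric; the profile ID and dates are placeholders.

```python
from urllib.parse import urlencode

# Core Reporting API v3 endpoint; the profile ID and dates used below are placeholders.
API_ENDPOINT = "https://www.googleapis.com/analytics/v3/data/ga"


def second_page_query(profile_id, start_date, end_date):
    """Build a query URL for landing page + second page on Google organic visits."""
    params = {
        "ids": f"ga:{profile_id}",
        "start-date": start_date,
        "end-date": end_date,
        "metrics": "ga:visits",
        "dimensions": "ga:landingPagePath,ga:secondPagePath",
        # Semicolons mean AND in Core Reporting API filter expressions.
        "filters": "ga:source==google;ga:medium==organic;ga:keyword==(not provided)",
        "sort": "-ga:visits",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


url = second_page_query("12345678", "2013-09-01", "2013-09-30")
print(url)
```

Each row returned would pair a landing page with the page visited next, which is the raw material for the inferred keyword.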

If the first action is a client-side interaction (e.g. a click on a button) which is tracked as an event, then you need some custom GA tracking. While you can see all the events associated with a page, there is no way of telling whether they occurred on a landing page or, indeed, whether they were the first action taken in that session.

One workaround is to fire ALL interactions as pageviews in order to leverage the functionality of the Second Page Visited dimension. You would have to use a separate profile or property for this. I’d also use a profile filter to strip out any non-letters or digits and make the "inferred keyword" easier to read.
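
The clean-up such a filter would perform can be sketched in Python, for instance to tidy exported data offline. The function name is hypothetical, and the pattern only keeps unaccented letters and digits, so accented characters and currency symbols are dropped too.

```python
import re


def clean_inferred_keyword(raw):
    """Lower-case a tracked value and turn anything but letters and digits into spaces."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", raw.lower())
    return cleaned.strip()


print(clean_inferred_keyword("/washing-machines/beko/"))
print(clean_inferred_keyword("/Reseller-Hosting/"))
```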

Sequential Segmentation

You could try the new sequential segmentation feature although it doesn’t scale for this sort of stuff. You need the first behaviours recorded as a dimension so that you can manipulate the data and cluster it for analysis and the sequential segmentation is simply not suited for that.

Event Tracking

Event tracking would be the first and obvious tracking mechanism for these "substitute keywords". You could differentiate events fired on landing pages vs other pages like this (untested):

  1. Determine whether the page is a landing page based on document.referrer being Google organic
  2. Add a custom parameter to the event category to identify events occurring on a landing page (e.g. "Regions in greece – np").
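
The logic of those two steps might look something like the sketch below. It is written in Python so it can be tested offline against logged referrers; on the page itself the same check would read document.referrer in JavaScript. The host pattern is a heuristic, the function names are hypothetical, and treating a gclid parameter as the sign of a paid click is an assumption.

```python
import re
from urllib.parse import parse_qs, urlparse

# Heuristic: matches google.com, www.google.co.uk, google.com.au and similar hosts.
GOOGLE_HOST = re.compile(r"^(www\.)?google\.[a-z.]{2,6}$")


def is_google_organic_landing(referrer, landing_url):
    """Guess whether a pageview is a Google organic landing.

    Assumes paid clicks carry a gclid parameter on the landing URL.
    """
    host = (urlparse(referrer).hostname or "").lower()
    if not GOOGLE_HOST.match(host):
        return False
    query = parse_qs(urlparse(landing_url).query)
    return "gclid" not in query


def event_category(label, referrer, landing_url):
    """Suffix the category with ' – np' for events on Google organic landing pages."""
    if is_google_organic_landing(referrer, landing_url):
        return f"{label} – np"
    return label


print(event_category(
    "Regions in greece",
    "https://www.google.co.uk/",
    "http://example.com/greece-holidays",
))
```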

However, this gives you ALL of the interactions occurring on that landing page, regardless of order. Remember, we are only concerned with the first action, not just any action taken on the landing page (which could be the 1st or 7th). Only the first one. This is important.
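
If you can export the raw hits, isolating the first action is simple enough to do offline. The Python sketch below treats the first hit of each session as the landing pageview and the second hit, whether an event or another pageview, as the first action. The hit format ("session", "time", "value") is an assumed export layout, not something GA gives you out of the box.

```python
def first_actions(hits):
    """Map each session to (landing page, first action).

    The first hit of a session is treated as the landing pageview and the
    second hit, whether an event or another pageview, as the first action.
    Sessions with a single hit get None as their first action.
    """
    by_session = {}
    for hit in sorted(hits, key=lambda h: (h["session"], h["time"])):
        by_session.setdefault(hit["session"], []).append(hit["value"])
    return {
        session: (values[0], values[1] if len(values) > 1 else None)
        for session, values in by_session.items()
    }


# Made-up hits, deliberately out of order, purely for illustration.
hits = [
    {"session": "s1", "time": 3, "value": "under £350"},
    {"session": "s1", "time": 1, "value": "/washing-machines"},
    {"session": "s1", "time": 2, "value": "beko"},
    {"session": "s2", "time": 1, "value": "/washing-machines"},
]

print(first_actions(hits))
```

Here session s1 yields "beko" as its first action and the later "under £350" click is ignored, while s2 (a bounce) has no first action at all.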

At the moment, the secondPagePath dimension in Google Analytics seems the most readily accessible solution for inferring keywords based on actual visitor behaviour.

Thoughts? Comments? Let me know below

  • Peter O’Neill

    Hi Carmen, I really like the idea of defining visitor intent based on their first action. Not sure why you are trying to tie it back to keywords though, I would allocate immediately to those visitor intent buckets (or personas).

    With the event tracking approach, couldn’t you identify the first event in some way, a different name or associate with a custom variable. Then easy to look at the first actions being taken.

    Now, while this is all good, how do you propose people use this information?


    • Carmen Mardiros

      I agree with you both. The purpose of the post is to make the connection between behavioural analytics and search terms for those who have the concept of keyphrases ingrained into their psyche. It makes sense to do that because I believe the first behaviour (or first few actions) are an extension of the task people started in the search engines.

      Keywords have always been one of many behavioural clues, first behaviour is no different. But of all the behavioural signals, it’s the closest one to keyphrases, hence the association.

      I agree that event tracking would be preferable. It’s identifying it “in some way” that’s the problem. One way could be using a cookie.

      Fire the first behaviour off that landing page as event (make sure to distinguish it as originating from a landing page)
      Store a flag in a cookie “first_behaviour_stored”
      On subsequent interactions, double check that the flag exists. If it does, then don’t fire any other interactions off that landing page.

      Would work, but it’s cookie-based and as such the cookie logic must match Google’s own sessionalisation logic. Tricky.

      Thanks for the comments

  • Ernests Štāls

    I agree with Peter, you actually might have something more than just a new way to find keywords.

  • Pritesh Patel

    Brilliant post. I’ve been rattling my brains as how to measure intent – especially where you have one single page which is aimed at researchers and buyers. Will crack on with this methodology immediately.

    • Carmen Mardiros

      Thanks for the comment Pritesh. I reckon “first behaviour” will work best for websites where people actively engage with page elements and have to consciously decide which one to choose.

      Passive behaviour, such as scrolling and reading is much more difficult to use to infer intent. Hence, sometimes baking in some engagement elements would help on the analytics front (provided they help, or at least, don’t interfere with user experience).

  • SEO Doctor

    Great post Carmen. You just need to get hold of the historical data pre-dating 100% not provided to map them onto the first actions, hard if it’s a new niche. Saying that, I’m sure you could swop or buy that data from others in the industry. It’s also surprising how many old clients forget to cancel your GA access ;)