We live in a world where more data exists about the average person’s ordinary life than filled all the libraries of the world 50 years ago. Data exists about our interests, our friends, our whereabouts, and our beliefs.
For communicators who seek to reach the right person with the right message at the right time, this is an amazing point in time. But data is not insight. And even accurate, relevant data can lead to profoundly wrong conclusions.
So when I think about social insight, and when I work with clients to build meaningful social strategies that are genuinely founded on understanding people’s social media conversations and behaviours, I find that I take on a split personality.
On the one hand, like a boy-band loving teenager, I am so excited about the genuinely amazing things that we can do in this new world of data. I love the opportunity that right now, today, we can use social to target our message just to the people who will most care about it. Or understand what is causing new parents stress during a 3AM baby feeding. Or figure out which Twitter feed is most read by people currently travelling to a specific tech conference. In Leeds.
On the other hand, there’s a grumpy old person inside me who has seen all of this go too far, too wrong, and too often. Who has seen entire comms teams paralysed because the CEO saw a critical tweet and no one was able to put it into context for him. Who has seen annual planning decisions made on the basis of great feedback from the brand’s Facebook followers, even though no one knows if non-followers would feel the same. Who has seen campaigns run on the basis of trying to achieve a target about brand sentiment that just can’t be measured in the way they are measuring it.
It doesn’t have to be that way. We can do better, but the first step is accepting that we have a problem:
We have to admit that there are major flaws in some of the most common measures that agencies and clients use to gain social insight.
Here are a few specific types of problems that my team often come across. I want to show you that there are ways of solving or adapting to a lot of these limitations if you face them head-on and give them the careful thought and attention they deserve.
Demographics data on Twitter and Facebook
Basic demographics of age, gender and location are some of the most used social media data points. And for some platforms, this data is fairly reliable and well sourced.
But for others it is not. For instance, you may have seen that Twitter analytics include a breakdown of your followers and following by age, gender, income levels and interests. But have you ever stopped to consider how they get this information? At no point in the registration process does Twitter ever ask you for your age or gender.
What happens is that the platform uses algorithms to make a series of guesses about you based on what it can see. So how accurate is the algorithm? Unfortunately, according to my research, not very.
When I tried to validate Twitter’s gender data against my own list of followers, this is what I found: Twitter told me my followers were 61 percent male. I found this surprising, so I manually validated the gender of over 1000 of my followers and found that only 33 percent of them were male. Furthermore, 39 percent (that’s eight percentage points more than the 31 percent Twitter was estimating) were demonstrably female, and easily determined as such through a quick check of their names and pictures. The rest were split between non-gendered organisations and media entities (17 percent) and people whose gender couldn’t be determined (11 percent).
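To put those numbers side by side, here is a rough worked check in Python. It is only a sketch: the sample size of exactly 1,000 is an assumption (I checked "over 1000" followers), and the margin of error uses a simple normal approximation that treats the checked followers as a random sample.

```python
import math

# Manual validation results from the text, as shares of all checked followers.
manual_shares = {"male": 0.33, "female": 0.39, "organisation": 0.17, "unknown": 0.11}

PLATFORM_MALE_ESTIMATE = 0.61  # Twitter's reported male share
SAMPLE_SIZE = 1000             # assumption: the text only says "over 1000"


def margin_of_error(share: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(share * (1 - share) / n)


def share_among_gendered(shares: dict) -> float:
    """Male share once organisations and undetermined accounts are excluded."""
    return shares["male"] / (shares["male"] + shares["female"])


gap = PLATFORM_MALE_ESTIMATE - manual_shares["male"]
moe = margin_of_error(manual_shares["male"], SAMPLE_SIZE)
male_of_gendered = share_among_gendered(manual_shares)

print(f"Gap between platform and manual male share: {gap * 100:.0f} points")
print(f"Margin of error on manual male share: +/- {moe * 100:.1f} points")
print(f"Male share among followers with a determinable gender: {male_of_gendered * 100:.0f}%")
```

The gap is many times larger than the sampling error, and the male share stays below the platform's figure even when organisations and undetermined accounts are excluded, so the mismatch is unlikely to be an accident of which followers were checked.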
Similar problems exist with other demographic data such as age and income, which the platform simply can’t determine.
Even demographic data that is manually recorded by users, such as location, where people specify the city or country they are based in, can be surprisingly unreliable. Back in 2009, for about six months my Twitter feed was set to be based in Tehran, a city that I had never visited. Like thousands of other people, I was trying to assist participants in the Iranian Green Revolution by making it harder for the government to identify local tweeters.
I’m aware of a prominent media personality who has set his location to the West Indies. Apparently, this has something to do with cricket.
Age is another area where people’s self-reported data is notoriously unreliable. We don’t know how many pre-teens have fibbed on their Facebook profiles to get around the platform’s ban on users under 13.
Many social media tools, including those from OgilvyOne’s partners at BrandWatch, include a function to report ‘sentiment’ as either positive or negative. The tools determine sentiment based on language analysis that looks at the emotional context of common words and phrases.
And there’s a place for that. If you’re looking at a very large volume of conversation around a big product launch, for instance, and want to know if it’s been warmly received on the whole, this will give you a rough-and-ready measure.
But when you see social media reports that proudly and uncritically announce “80 percent of discussion around the brand is positive” based on these automated scores, you must call it out as the nonsense that it is. Because even the best sentiment analysis is unable to tell you what the positive terms actually relate to, or to reliably detect slang or sarcasm.
Consider, for instance, this comment from a new mum debating the merits of one of our clients, SMA Baby Milk:
“Before having children I only knew of Aptamil and SMA from TV advertising and hated Aptamil’s adverts and their claim of being closest to breast milk.”
Pretty clearly a negative comment mentioning the brand – but it’s actually negative about the competitor’s product marketing!
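To see how a comment like this gets miscounted, here is a deliberately naive word-counting scorer in Python. It is a toy, not how BrandWatch or any real tool works: the word lists and the function names `naive_sentiment` and `naive_brand_sentiment` are invented for illustration.

```python
import re
from typing import Optional

# Tiny illustrative lexicon; real tools use far larger, weighted word lists.
NEGATIVE_WORDS = {"hated", "hate", "awful", "terrible"}
POSITIVE_WORDS = {"love", "loved", "great", "brilliant"}


def naive_sentiment(text: str) -> str:
    """Label text by counting positive and negative words, ignoring their target."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def naive_brand_sentiment(text: str, brand: str) -> Optional[str]:
    """Attach the whole-comment label to any brand the comment mentions."""
    if brand.lower() in text.lower():
        return naive_sentiment(text)
    return None


comment = (
    "Before having children I only knew of Aptamil and SMA from TV advertising "
    "and hated Aptamil’s adverts and their claim of being closest to breast milk."
)

# Both brands inherit the same label, although the dislike is aimed at one of them.
print(naive_brand_sentiment(comment, "SMA"))
print(naive_brand_sentiment(comment, "Aptamil"))
```

Because the scorer only knows that "hated" and "SMA" appear in the same comment, it cannot tell which brand the feeling is directed at.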
We found that most often conversation that skews strongly positive or negative in the world of baby feeding is related to the specific health or behaviour problem parents are having with their babies, not just the product itself.
So now, when we report on conversations, we categorise everything as positive or negative in relation to specific topics (colic, wind, sleeplessness: it’s a catalogue of misery!) and then in relation to the brand. This requires manual coding of the data, but in the end it gives us an extremely powerful glimpse into the lived experience of our customers.
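One way to picture this kind of coding is as a small table where every post carries a topic, a target for the sentiment, and a polarity. The Python below is a minimal sketch of that idea; the rows are made-up examples and the field names are my own, not our actual coding frame.

```python
from collections import Counter

# Hypothetical manually coded posts: each row records the topic, what the
# sentiment is directed at, and the polarity a human coder assigned.
coded_posts = [
    {"topic": "colic", "target": "baby's health", "polarity": "negative"},
    {"topic": "colic", "target": "brand", "polarity": "positive"},
    {"topic": "sleeplessness", "target": "baby's health", "polarity": "negative"},
    {"topic": "advertising", "target": "competitor", "polarity": "negative"},
    {"topic": "wind", "target": "brand", "polarity": "negative"},
]


def tally(posts, target=None):
    """Count polarity per topic, optionally restricted to one sentiment target."""
    counts = Counter()
    for post in posts:
        if target is None or post["target"] == target:
            counts[(post["topic"], post["polarity"])] += 1
    return counts


def net_sentiment(posts, target):
    """Positive minus negative posts directed at a given target."""
    selected = [p for p in posts if p["target"] == target]
    positive = sum(p["polarity"] == "positive" for p in selected)
    negative = sum(p["polarity"] == "negative" for p in selected)
    return positive - negative


print(tally(coded_posts))                   # every post, by topic and polarity
print(tally(coded_posts, target="brand"))   # only sentiment aimed at the brand
print(net_sentiment(coded_posts, "brand"))  # brand-directed net score
```

In this toy data most of the negativity is about the baby's health or a competitor, so a single overall score would make the brand look far worse than the brand-directed posts suggest.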
The iceberg effect
Brands also need to remember that 90 percent of everything that happens in social media happens below the waterline, out of reach of the naked eye and equally invisible to social media listening tools.
We love Twitter because not only is it a lively and exciting source of conversation, but its open API and culture of public posting means it’s almost 100 percent searchable, trackable and USEABLE.
Unfortunately, however, this has led to Twitter and other open platforms being massively over-represented in social planning. Remember that not only private Facebook data but also person-to-person sharing (via email, WhatsApp etc.) is typically both greater in volume, and richer in context. After all, wouldn’t you pay more attention to an article forwarded to you privately by your best mate than something posted to the 2,000 followers of someone you sat next to in class 10 years ago?
One way of trying to understand what content sharing is taking place in private is to look for unusual patterns in visits to your web site. Are people going direct to pages deep inside your site, to a page URL that no one would realistically have typed? They may have been sent there.
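As a rough illustration of that check, the Python sketch below flags visits that arrive with no referrer on a deep page. Everything specific in it is an assumption: the visit records, the `example.com` URLs, and the rule that three or more path segments counts as "deep". Visits without a referrer can also come from bookmarks or apps, so treat the result as a hint rather than a measurement.

```python
from urllib.parse import urlparse

# Hypothetical visit records: landing URL plus the referrer the browser sent
# (an empty string means no referrer was recorded).
visits = [
    {"url": "https://example.com/", "referrer": ""},
    {"url": "https://example.com/guides/2015/feeding/night-feeds-faq", "referrer": ""},
    {"url": "https://example.com/guides/2015/feeding/night-feeds-faq", "referrer": "https://twitter.com/"},
    {"url": "https://example.com/about", "referrer": ""},
]

MIN_PATH_DEPTH = 3  # assumption: three or more path segments counts as "deep"


def path_depth(url: str) -> int:
    """Number of non-empty path segments in a URL."""
    return len([segment for segment in urlparse(url).path.split("/") if segment])


def likely_private_share(visit: dict) -> bool:
    """True for a referrer-less visit that lands on a deep page."""
    return not visit["referrer"] and path_depth(visit["url"]) >= MIN_PATH_DEPTH


candidates = [v for v in visits if likely_private_share(v)]
share = len(candidates) / len(visits)

print(f"{len(candidates)} of {len(visits)} visits look like private shares ({share:.0%})")
```

The homepage and the short "about" URL are excluded because people plausibly type or bookmark them, and the visit from Twitter is excluded because its source is already known.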
We recently took a close look at the social media feeds for a major software player in the digital space who had an engaged audience. We found that they had a lively Twitter presence, with copious conversation, retweets and comments. But their Facebook page was relatively quiet with few people engaging with the content posted there.
This impression turned out to be incredibly misleading, however – over 40 percent of all web traffic arriving at the site was directed there from the Facebook page. Their audience was silently consuming.
To get a better look at what’s really happening on Facebook outside of your own page, you may want to try making use of the ability to search by topic. Once you know what types of terms are common in referring to your brand or issue, you can now get some limited data on the prevalence of these terms. This takes a little bit of work, and proprietary tools, but may be worth it for insight into this vital but previously hidden world.
So what should I do now?
I’m not saying that social data is inherently unreliable. Far from it: I think this is information that tells a much richer story about our world than we could previously even imagine. But you have to treat it with care to make it work for you.
I want to urge anyone using social data to do three simple things:
- Be appropriately skeptical
If a result looks too good to be true, it probably is. Make sure you or your agency has looked at the original content, and read the posts in context.
- Do not rely only on demographic data for media targeting
The days when it made sense to run paid media spend directed at ‘British men aged 19 to 24’ are long gone. Nowadays, you should routinely be layering in behavioural and interest-based targets that can have the added benefit of sweeping up customers with a proven purchase intent who would have been missed by a straightforward demographic sweep. After all, unless you’re marketing jock straps or tampons, you shouldn’t market your product only to one gender.
- Use at least three dimensions of data for your planning
Whereas any one measure in isolation can be extremely misleading, using at least three distinct TYPES of data (for instance: demographics, qualitative content analysis, and web traffic sources) will tell more of the story.
So go forth, let the excited teenager in you run free in the new world of social data.
Monday 14th September 2015, 9am
Sainsbury Wing Theatre, The National Gallery
An event for Social Media Week London 2015. [Register for your free ticket] For the latest from Social Media Week London 2015, follow @OgilvyUK and @ogilvydo on Twitter and #OgilvySMW.