TalkTalk faced a raft of connectivity issues that hit its standing with customers — before arming itself with a new data-monitoring tool

TalkTalk users experienced connectivity issues before the firm discovered the root cause (Credit: PxHere)

TalkTalk isn’t the biggest or most highly regarded internet service provider (ISP), but since it stumbled across a tool that allows it to monitor the data behind certain processes in real time, it’s managed to tackle some of the issues holding it back from becoming a major player. The two IT enthusiasts behind its new data-led approach to business tell Peter Littlejohns what they’ve achieved so far.

The internet is so central to our daily lives that were he still alive, famed psychologist Abraham Maslow would have to include it in his hierarchy of needs, right beside essentials such as food, water and sleep.

It’s no surprise, then, that British ISP TalkTalk has suffered from customer churn through its years of well-documented connectivity issues.

Little more than a minnow in the country’s internet service market, it was always going to have an uphill struggle against the likes of BT — which had a market share of 34% in the first quarter of this year — as well as secondary giants Virgin Media and Sky.

But Matt Wood, head of internal innovation wing TalkTalk Labs, and Paul Emmett, head of the company’s network operations, believe its fortunes may be about to change with the application of a new real-time data insight strategy aimed at making it the best ISP in the UK.


Getting to the bottom of connectivity issues

When an internet user enters a URL, their device first has to find out where the website lives: the site’s domain name must be resolved into an IP address before the browser can connect.

This lookup is handled by the Domain Name System (DNS). The device sends the name to a DNS server, run by the ISP or a third party, which returns the matching IP address.

An error during this process is one of many events that can prevent users from connecting to websites. And because many customers are not tech savvy enough to know they can switch their DNS settings to a reliable alternative such as Google’s, it could mean disaster for an ISP’s customer retention rates.
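The lookup step can be illustrated with a minimal Python sketch. This is not TalkTalk’s or Splunk’s tooling; it simply times one name lookup through the machine’s configured resolver using the standard library’s `socket.gethostbyname`. The helper name `timed_lookup` and its `lookup` parameter are hypothetical and exist only for illustration.

```python
import socket
import time


def timed_lookup(hostname, lookup=socket.gethostbyname):
    """Resolve a hostname and report how long the lookup took.

    Returns (ip_address, elapsed_ms). ip_address is None if the
    lookup failed. `lookup` is a hypothetical hook so the resolver
    can be swapped out, for example when testing.
    """
    start = time.perf_counter()
    try:
        ip_address = lookup(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        ip_address = None
    elapsed_ms = (time.perf_counter() - start) * 1000
    return ip_address, elapsed_ms
```

On a connected machine, `timed_lookup("example.com")` returns the resolved address and the time taken in milliseconds. A slow or failing DNS server shows up as a long delay or a `None` address, which is the kind of symptom customers experience as “the internet not working”.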

“When customers switch their broadband on they expect it to just work, like turning on the hot tap at home and expecting hot water to come out,” says Emmett.

Paul Emmett, head of network and systems operations at TalkTalk

“When they did leave TalkTalk, one of the reasons they cited was poor connectivity.”

Due to the wide scope of causes that could have resulted in these problems, TalkTalk needed a way to dig into its operation and find the root issue – which is where Splunk came in.

A San Francisco-based software company, Splunk offers a platform for investigating technical problems on a network, so that a fix happens sooner rather than later.

Emmett and Wood came across the software by accident, but it turned their connectivity woes around.

“One of the principal engineers at TalkTalk downloaded a free licence from Splunk, and we took lots of data and threw it into the software,” says Emmett.

“We found quite quickly, and I mean within hours, that we had an issue with DNS.”

Armed with this knowledge, TalkTalk cut ties with its third-party DNS provider and invested in a new one.

“We’re now ranked number one for DNS response in Europe, which doesn’t mean we’ve got the best connectivity on the continent, but that was our first real use case with Splunk.”


Detecting network errors and proactively fixing them

ISP networks include a wide collection of interconnected computing hardware and software, all of which needs to be running correctly for end users to connect to the web without issues.

With so many possibilities for things to go wrong, network monitoring is vital to keep the shop open – but it’s not always easy to find the culprit behind a fault.

That’s why monitoring was another area where TalkTalk applied Splunk on its network. Before the company started crunching the numbers on its data, Emmett says, monitoring was as simple as knowing “is the light red or green? Is it on or is it off?”.

But technological problems rarely come out of nowhere. Instead, they’re the culmination of a series of errors that leave behind a trail of suspicious log files.
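One simple way to spot such a trail is to count errors per minute and flag the minutes that stand out. The Python sketch below illustrates the idea only; it is not how Splunk works internally. It assumes a made-up log format in which each line starts with an ISO timestamp followed by a level such as `ERROR`, and the function names and the threshold of two standard deviations are arbitrary choices for illustration. Minutes with no errors at all are not counted.

```python
from collections import Counter
from statistics import mean, pstdev


def error_counts_per_minute(log_lines):
    """Count ERROR lines per minute.

    Assumes a hypothetical format: 'YYYY-MM-DDTHH:MM:SS LEVEL message'.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2 and parts[1] == "ERROR":
            counts[parts[0][:16]] += 1  # truncate the timestamp to the minute
    return counts


def flag_spikes(counts, threshold=2.0):
    """Return minutes whose error count sits more than `threshold`
    standard deviations above the mean of all counted minutes."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every minute looks the same, so nothing stands out
    return sorted(m for m, c in counts.items() if (c - mu) / sigma > threshold)
```

Feeding in a log where most minutes contain a single error and one minute contains twenty would flag only that one minute, giving an engineer a place to start looking.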

Error detection is an area in which Splunk excels: its first foray into the software market was a tool that did just that.

Splunk’s toolset allowed TalkTalk to put its data under a magnifying glass to isolate issues

Splunk is now a much more advanced product embedded with smarter AI, and Emmett and Wood were able to set up the software to constantly investigate the network’s pipeline of processes, flag up anomalies and let engineers be proactive in fixing them.

This can mean engineers are called out of bed to investigate an issue, especially as network traffic peaks late at night, when many users are enjoying different recreational activities.

Emmett explains: “We ingest six million network tests per day into the platform, and that creates a picture of the UK and where the red flags are for problems.

“There’s more network activity happening at midnight than there is at three o’clock in the afternoon – I’m not going to get into what that activity might be – but Splunk sitting there all day creating alerts and sharing the data allows our engineers to respond accordingly.”
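The “picture of the UK” Emmett describes can be thought of as a failure rate per region. The Python sketch below shows one way to build it from raw test results. It assumes each result is a simple `(region, passed)` pair and uses an arbitrary 5% failure threshold; the function name, data shape and threshold are hypothetical and not taken from TalkTalk’s setup.

```python
from collections import defaultdict


def red_flag_regions(test_results, max_failure_rate=0.05):
    """Return {region: failure_rate} for regions above the threshold.

    `test_results` is an iterable of (region, passed) pairs, a
    hypothetical shape chosen for this example.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for region, passed in test_results:
        totals[region] += 1
        if not passed:
            failures[region] += 1

    flagged = {}
    for region, total in totals.items():
        rate = failures[region] / total
        if rate > max_failure_rate:
            flagged[region] = rate
    return flagged
```

A region with two failures in ten tests would be flagged at a rate of 0.2, while one with five failures in a hundred would sit exactly on the threshold and pass.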

The software can’t anticipate every surprise, however. The recent release of the second season of popular free-to-play video game Fortnite shocked Emmett into thinking the network was experiencing a distributed denial-of-service attack, or DDoS to the more tech savvy.


Detecting customer issues before they complain

TalkTalk’s customer service problems are no secret in the UK, with the ISP sweeping the accolade for worst customer service provider among all broadband ISPs two years running.

The country’s regulatory and competition authority Ofcom stamped it with the label in 2018, due to a mix of customer complaints and satisfaction with issue handling.

The report revealed 18% of TalkTalk customers complained about their broadband, slightly more than Virgin Media’s 16%, while 40% of complainants were happy with the handling process, compared with 46% at Virgin Media.

But Wood expects the complaint rate to fall and the satisfaction rate to rise in future reports, as TalkTalk now uses Splunk’s data monitoring to find individual customers’ problems before they have to call and complain.

“By putting customer data into Splunk in real time, we can spot when things go wrong and fix the issues straight away – often before the customer has even realised.

Matt Wood, head of TalkTalk Labs

“One of the ways we’ve managed to scale that process is by linking it with Robotic Process Automation (RPA), so we don’t have to have people manually fixing these issues.”

TalkTalk automates these fixes using a tool made by RPA company Blue Prism, but Wood says it’s just a temporary measure while engineers in the IT department take a deeper dive into the log-file data to find out what caused a customer to go offline.

“In parallel to the RPA trigger, we investigate what’s going wrong in the process and fix the root cause within the software — inevitably it will be a defect or unusual edge case in which a customer was set up wrong,” Wood adds.


TalkTalk’s next step

Wood and Emmett began to monitor TalkTalk’s data using Splunk 24 months ago, and in that time the company has seen an increase in its Net Promoter Score (NPS), a measure tracked by Ofcom of whether customers would recommend an ISP to others or steer them away from it.

According to Wood, the average score from customers in their first 30 days using the broadband service was -22 in 2018, but when he last checked it had grown to 6.

Ofcom’s stats show the overall NPS for customers, regardless of usage time, was -13 in 2018, increasing to -7 this year.
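For context on how a score can be negative: NPS is conventionally calculated from answers, on a scale of 0 to 10, to the question of how likely a customer is to recommend a company. Those scoring 9 or 10 count as promoters, those scoring 0 to 6 as detractors, and the score is the percentage of promoters minus the percentage of detractors. The Python sketch below shows that standard calculation; the function name is hypothetical, and the exact survey method behind the figures quoted here is not described in this article.

```python
def net_promoter_score(ratings):
    """Compute NPS from 0-10 'how likely are you to recommend us?' ratings.

    Promoters score 9-10, detractors 0-6; 7-8 are passive and only
    affect the total. The result ranges from -100 to 100.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return round(100 * (promoters - detractors) / len(ratings))
```

With ten invented ratings made up of four 10s, three 8s and three 2s, the function returns 10: 40% promoters minus 30% detractors. A negative score means detractors outnumber promoters.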

But as the only ISP measured by the regulator to have a negative net promoter score for its broadband service, Wood admits TalkTalk has a long way to go.

“We’re nowhere near good enough yet, we’ve got to go a lot further, and we think there’s a lot more opportunity in using Splunk to identify when things go wrong in our systems and network estate.

“That’s definitely what we’re going to focus on in the future.”