Techno-sociologist Zeynep Tufekci on the pitfalls of machine learning

Zeynep Tufekci, who initially trained as a computer programmer before moving into sociology, was speaking at the Hitachi NEXT conference in Las Vegas

Zeynep Tufekci believes caution needs to be exercised when using and implementing machine learning software (Credit: Wikimedia)

AI and machine learning, a rapidly growing market, are estimated to have the potential to create an additional $2.6tn in marketing and sales by 2020, and up to $2tn in manufacturing and supply-chain planning, according to Swiss analyst EconSight.

Yet, with this largely unbridled growth, some experts have raised concerns over the potential ethical and societal impacts of the ever-deepening penetration of this technology within our daily existence.

Speaking at Japanese tech firm Hitachi’s NEXT conference in Las Vegas, techno-sociologist Zeynep Tufekci is one such voice, arguing that caution must be exercised when using and implementing machine learning software.

Originally trained as a computer programmer before moving into sociology, Tufekci says: “My technical side is super-excited, my society, sociology side, worried. I’m worried because such transitions come with a lot of challenges.

“They come with a lot of pitfalls, a lot of difficulties, it doesn’t mean a lot of good things are impossible, it just means you have to be very careful about how you go about it.”

‘Machine intelligence is less like something instructed, more like something that is grown’

In the past, coders and software engineers would typically give a system direct instructions, spelling out step by step how each function should be carried out.

With machine learning, however, developers feed large amounts of collected data into programmes that train neural networks, and those networks then produce the classifications and answers.
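The contrast between the two approaches can be sketched in a few lines of Python. This is a toy illustration only, not anything Tufekci presented: the data, the function names and the threshold are all made up, and a single artificial neuron (a perceptron) stands in for the far larger neural networks used in practice.

```python
def instructed_rule(x):
    """Instructed: a developer writes the decision threshold by hand."""
    return 1 if x > 5 else -1


def train_perceptron(examples, max_epochs=1000):
    """Grown: one artificial neuron learns a weight and bias from labelled data.

    `examples` is a list of (value, label) pairs with labels +1 or -1.
    """
    weight, bias = 0.0, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in examples:
            # Update only when the current parameters get this example wrong.
            if label * (weight * x + bias) <= 0:
                weight += label * x
                bias += label
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weight, bias


def learned_rule(x, weight, bias):
    """Apply the learned parameters; nobody typed these numbers in."""
    return 1 if weight * x + bias > 0 else -1


# Made-up training data: low values labelled -1, high values labelled +1.
EXAMPLES = [(1, -1), (2, -1), (3, -1), (7, 1), (8, 1), (9, 1)]
WEIGHT, BIAS = train_perceptron(EXAMPLES)

if __name__ == "__main__":
    print("learned weight and bias:", WEIGHT, BIAS)
    print("instructed:", [instructed_rule(x) for x, _ in EXAMPLES])
    print("learned:   ", [learned_rule(x, WEIGHT, BIAS) for x, _ in EXAMPLES])
```

In the first function the threshold is readable in the source code. In the second, the behaviour lives in two learned numbers; a real network has millions of them, which is part of why its reasoning is hard to inspect.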

“Here’s the tricky, amazing, fascinating, but also really challenging part, they’re black boxes to us,” explained Tufekci.

“They’re black boxes, in that we don’t really understand how they come at their insights, we don’t really know what they’re doing, we can just see how they work, and they work sometimes surprisingly well.

“We also don’t have a lot of tools we have with other statistical methods, other things when we have some understanding of what’s going on, false positives, false negatives.

“What happens when you do that is you’re setting loose a kind of intelligence, machine intelligence, it’s less like something you’ve instructed, more like something you’ve grown.

“It’s kind of like having this… child that you can try to influence, but in the end, it does what it does, this machine intelligence, and that can have really dangerous consequences.”

To explain this process in more detail Tufekci gave the example of the YouTube recommendation algorithm.

She discovered when using the video streaming website to watch a couple of rallies with then-presidential candidate Donald Trump, that YouTube would go on to recommend increasingly far-right content.

Tufekci said: “YouTube started recommending scary things to me, I started getting autoplay and recommendations, maybe white supremacist kind of things, but it was unclear.

“Pretty soon, it was recommending the Holocaust never happened kind of stuff to me, and I thought whoa, what’s happening here, this is kind of scary.”

She began to experiment with the algorithm by watching an array of videos, from Hillary Clinton and Bernie Sanders rallies to guides on becoming a vegetarian and taking up jogging, and found that YouTube would push her to one extreme or the other.

Tufekci said: “So what’s going on here? Is YouTube trying to recruit everyone to a more extreme version of whatever they start? Not on purpose, not at all. What they had done was use machine learning to optimise for increasing engagement, increasing watch time.

“What the machine learning system had done was to discover a human vulnerability, and that human vulnerability is things that are slightly edgier are more attractive, more interesting.

“In fact, we kind of learnt that when Facebook CEO Mark Zuckerberg was talking about extreme stuff and misinformation on the platform, and he said that, algorithmically speaking, what they found was that content that pushes boundaries gets more engagement.

“It kind of makes sense when you think about people, things that are novel, exciting conspiracies, they’re more interesting than the kind of boring ways of here’s the basic truths.

“YouTube’s algorithm had discovered this human vulnerability and was using this at scale to increase YouTube’s engagement time without a single engineer thinking, this is what we should do.

“What we learn from this example, we learn that when we use these programmes, machine learning, especially for insights, we have to do extra things and say, there are shadows here, this is a technology we do not fully understand.”
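The dynamic Tufekci describes, a system drifting towards edgier content purely because it is optimising for watch time, can be illustrated with a deliberately simple Python sketch. It is not YouTube's system. The viewer model `toy_watch_minutes`, the 0 to 10 "edginess" scale and every number here are hypothetical, chosen only to show how a narrow objective can produce an outcome nobody wrote down.

```python
def toy_watch_minutes(edginess):
    """Hypothetical viewer model: slightly edgier content holds attention longer."""
    return 5.0 + 1.5 * edginess


def recommend_next(current, max_level=10):
    """Choose, from the current level and its neighbours, whichever maximises watch time.

    Nothing here says "make content more extreme"; the only goal is minutes watched.
    """
    candidates = [
        level
        for level in (current - 1, current, current + 1)
        if 0 <= level <= max_level
    ]
    return max(candidates, key=toy_watch_minutes)


def simulate(start, steps):
    """Follow the recommender for a number of steps and record the path taken."""
    path = [start]
    for _ in range(steps):
        path.append(recommend_next(path[-1]))
    return path


# A viewer who starts on fairly mild content (level 2 on the made-up scale).
PATH = simulate(start=2, steps=5)

if __name__ == "__main__":
    print("edginess level after each recommendation:", PATH)
```

Under these assumptions the recommended level rises by one at every step, even though no line of the code mentions extremity. The escalation falls out of the objective combined with the assumed viewer behaviour.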

Teams needed to ethically manage and use data, says Zeynep Tufekci

Since the rise of more personalised technology, businesses have seen the amount of data rise, with an estimated 2.5 quintillion bytes of data created daily.

With this, firms could potentially find out more about someone than they want or need to know. Fitness-tracking software, for example, could probably detect the onset of depression from reduced movement and changed activity patterns.

Tufekci said: “What are you going to do with this data? How are you going to analyse it? How are you not going to analyse it?

“So every kind of thing we do in this requires what I would call a red team, you need people in the room who are gonna say, there’s light and there are shadows in this technology.”

Tufekci believes that unless this data is properly managed, analytics in this field could fall out of favour with people, despite the good it can do.

She said: “Just like any other technology, if we pay attention to potential dangers and pitfalls, we can do a lot better because if we don’t pay attention to that side, people are not going to want to do the good things we could do.

“If the data can be used to fire you, or to figure out protesters, or to use for social control or not hire people prone to depression, people are not going to want this.

“What would be much better is to say, what are the guidelines? What are some technologies we can develop, ways of, for example, keeping data encrypted but getting insights from it anyway, so that you have the insights you want without having that sort of granular plain text that is on everyone?

“These are the kind of mathematical and technical solutions that will allow us to have our cake and eat it too, and deal with some of the ethical challenges.”
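Tufekci did not name a specific technique, but one member of the family she alludes to is additive secret sharing, which lets several servers compute a total without any one of them seeing an individual's value. The Python sketch below is a minimal illustration under that assumption: the step counts are invented, the three-server setup is hypothetical, and a real deployment would need much more (authenticated channels, protection against colluding servers, and so on).

```python
import secrets

MODULUS = 2**61 - 1  # a large prime; all arithmetic is done modulo this value


def split_into_shares(value, n_servers):
    """Split one private value into random-looking shares that sum to it (mod MODULUS)."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_servers - 1)]
    last_share = (value - sum(shares)) % MODULUS
    return shares + [last_share]


def aggregate(all_shares):
    """Each server adds up only the shares it holds; the server totals are then combined."""
    n_servers = len(all_shares[0])
    server_totals = [
        sum(user_shares[i] for user_shares in all_shares) % MODULUS
        for i in range(n_servers)
    ]
    return sum(server_totals) % MODULUS


# Made-up daily step counts for three users of a fitness tracker.
DAILY_STEPS = [4200, 8100, 12050]
SHARES = [split_into_shares(steps, 3) for steps in DAILY_STEPS]
TOTAL = aggregate(SHARES)

if __name__ == "__main__":
    print("total steps recovered from shares:", TOTAL)
```

Each server sees only random-looking numbers, yet the combined result equals the true total, so the aggregate insight is available without a plain-text record of each person's activity being held in one place.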