If you’re following the developments of generative AI (and its attendant AGI debates), by now you’ve probably seen the March 13 Wall Street Journal interview on YouTube (and the accompanying article) with Mira Murati, OpenAI’s CTO (and, once upon two days, its CEO), about Sora, the text-to-video generation tool OpenAI has been teasing the world with.1
A few minutes into the video, Joanna asks Mira about the source of the video data used to train Sora:
Joanna Stern: What data was used to train Sora?
Mira Murati: We… used publicly available data and licensed data.
Joanna: So… videos on YouTube?
[Mira looks away, grimaces for three full seconds, then looks back at Joanna]
Mira: I’m actually not sure about that.
Joanna [sees she’ll have to pull teeth]: Okay. Videos from… Facebook, Instagram?
Mira: You know, if they were publicly available, umm… available… yeah, publicly available to use, there might be that data, but I’m not sure. I’m not confident about it.
Joanna: What about Shutterstock? I know you guys have a deal with them.
Mira: I’m just not gonna go into the details of the data that was… that was used. But it was publicly available or licensed data.
You’ve probably also seen, if not directly participated in, the questions and comments pouring forth from all corners of the Internet, to the tune of:
How does a CTO not know where the source data for their company’s products is coming from?
What’s Mira hiding?
Whoever media-trained her did a pretty bad job.
OpenAI’s lawyers told her what to say. And what not to say.
Of course they used (stole) copyrighted videos. Otherwise she wouldn’t dodge the question like that.
OpenAI’s strategy is to get to pole position no matter what, because by then the lawsuits will be too late.
More on all that here from the patiently prolific Gary Marcus.
In the meantime, let me tell you a story.
When I was in elementary school in what was then called Czechoslovakia, one particular third-grade morning still burns in my memory. The teacher was explaining to us the difference between capitalism and communism.
“Capitalism is bad because people don’t share things,” she said. “But communism is good, because people share things. It’s communal.”
I kept my pretty little blonde head still and my mouth shut. I knew better than to swallow the propaganda—yes, even at 8 years old. Many of us knew. We knew there was a world outside the Iron Curtain. But we also knew better than to speak out or argue with teachers. We kids had to sit in our seats with our hands behind our backs, nice and straight, and when the teacher entered the room, we’d all stand up and say, in well-mannered unison, “Good morning, Comrade [teacher’s last name]!”
Hell, we couldn’t even complain about dirt-encrusted potatoes in the grocery store. You never knew who within earshot was a clandestine informant for the Party. People got disappeared. Jailed. Tortured. And worse. You think cancellation is bad? Try risking arrest for buying Kellogg’s Honey Smacks on the black market.
But this isn’t an essay about capitalism vs. communism, or even about freedom. (I have a hella lot to say on that topic, coming from a country that had none, and also plenty to say about the two socioeconomic systems, having lived in both. Hint: neither one is all good, and neither one is all bad.)
This is an essay about trust.
Confido sed cognoscere
It is natural to trust. Trust brings safety, security, comfort, stability. In its absence, uncertainty, insecurity, stress, and anxiety come flooding in. Your sense of balance and stability starts to liquefy.
Functionally speaking, it is easier to trust than to mistrust (or distrust).2 When you trust, you don’t have to research, fact-check, or second-guess. It’s face-value currency. Mistrusting means you get invited to dinner but you’re the one who has to do the work: not only do you have to run out for groceries, find the recipe, and cook, but you’re also doing all the dishes afterwards while the person who invited you over sits on the couch with a margarita. Distrust means you know the cook is trying to poison you—and you still have to make your own dinner.
You might be shaking your head. I don’t just blindly trust random people. Or the mainstream media. Or politicians. What do you mean it’s easier??
Whether you arrive at the place of trust intuitively or as a result of research, or whether you get there instantly or slowly over time is irrelevant to this point. The act of trusting feels good, comforting, easy. The act of mis- or distrusting carries a lot of weight—and work—with it.
And yet, we need both trust and mistrust for our society to function. We need to be able to trust that every driver will stop at a red light, but still take that extra second or two after our light turns green. We should be able to trust that planes are still the safest mode of travel, but maybe follow the news about Boeing (and its entire corporate history) a little more closely—and not park in long-term parking beneath flight paths. We want to trust that the chocolate we buy isn’t linked to slave labor, even if we know it depends on what brand we’re buying. Happily, we can place 100% of our trust, completely blind, in the laws of the universe—time, gravity, and light move according to immutable standards. Until, of course, we get to quantum physics, but those are devils dressed up in details.
A certain level of distrust is healthy. You don’t automatically trust a phone number you don’t recognize. You don’t eat random berries in the forest. And you certainly don’t trust a stranger on the street asking you to drive them somewhere.
So why do we trust politicians who say what they know we want to hear, and do what they want once they’re in office?
Why do we trust CEOs? Why do we trust the media? The clergy? Our teachers and doctors and police officers? Why do we trust any authority figure?
Because we want a world we can count on. Strangers asking for a ride and strangely colored berries in a forest present potential danger; the people occupying important positions in society are supposed to provide for and protect us.
The unfortunate truth is that numerous public figures, particularly in business and government, provide for and protect themselves and/or their families and friends, too often at the public’s expense. As a result, we don’t trust them as much as we used to. A recent study by Gallup and the Knight Foundation found that half the respondents (close to 5,600 people) believe news organizations “deliberately mislead” the public. Fewer than two in ten Americans now trust the federal government, according to the latest figures from the Pew Research Center. (In 1958, when the study began, 75% of us trusted the government. We’ve come a long way, baby—a long way down.) In a curious twist of irony, while business in general is seen as a more trustworthy institution than government (and that’s worldwide), individual executives (read: CEOs) are among the least trusted organizational leaders—this according to the 2023 Trust Barometer by Edelman. Want a double serving of irony? Take a peek behind the curtain of Edelman’s own client roster—sparkling greenwashed clean.
Speaking of irony: being born in a country where political propaganda gets mixed into baby food either helps you build a strong and resolute psyche or turns you into a cookie-cutter pawn. You learn from a very young age that certain institutions and figures are not to be trusted; you’re mistrustful by default. If you’re born in the land of the (supposedly) democratic and free, your default settings are set to ‘trust,’ and this gives you some catching up to do as far as trust in public figures and institutions is concerned—but it doesn’t take much these days.
Back to Mira and Joanna
In a recent conversation about the WSJ/OpenAI interview with a longtime friend of mine, a seasoned and successful engineer and entrepreneur, we explored possible reasons why Mira didn’t simply answer the question about the videos that were fed into Sora. It’s clearer than a sunlit day that there’s a deeper reason for her insistent refusal to answer Joanna’s questions.
Watch the full interview here (the data source questions start at 04:24).
OpenAI is in a tough spot, largely of its own making; Mira happens to be the person in the hot seat in this interview. She’s got a few choices:
A. Admit where the training material for Sora comes from. This would reveal OpenAI’s sources and potentially open the company up to legal action.3
B. Feign unfamiliarity with the source of the training data. This would drive public perception of her as an incompetent CTO—or a dishonest one.
C. Implicitly acknowledge familiarity with the data sources but refuse to disclose them. This would avoid negative public perception of her personally, but cast OpenAI, the company, in a questionable light.
In the interview with Joanna, she exercises option B, effectively sacrificing the competence and integrity of her public persona for the sake of the company.
A Chief Technology Officer absolutely SHOULD know where the source data or material for their company’s products is coming from. In fact, Mira is a highly seasoned technologist and an experienced product lead—she could not hold the position of CTO at an AI company otherwise. Satya Nadella of Microsoft has sung her praises; Joanna Stern herself previously outlined Mira’s experience and responsibilities in a November 18, 2023 article published by the Wall Street Journal. This is particularly telling [italics mine]:
“Murati’s teams are responsible for training OpenAI’s current and future large language models, making her an essential part of the decisions of which training data is—and isn’t—worth including and paying for. ... Murati has also spoken about the importance of improving data attribution to let people know where AI answers come from.”
Mira inadvertently contradicts herself in the video interview above. At first she tells Joanna she’s “not sure” where the data comes from, but finally takes a firmer stance and says “it was publicly available or licensed data.”
So, case pretty much closed. The CTO of OpenAI is directly involved in making the decisions regarding training data and its sources—as she should be.
Given all that OpenAI has done and been through, from infuriating creators whose content has been used for training LLMs to whipping up frothing global fear about AGI, it should be no surprise that it has backed itself into a corner—and its leadership with it. If Mira had said, simply and plainly, “I’m not authorized to disclose that information,” without hedging or hesitating as she does in the interview, she would have avoided accusations of incompetence or disingenuousness, but she would have shifted the implicit blame onto OpenAI—the underlying message would be not that she doesn’t know but that the company is barring her from disclosing the information publicly. We don’t know whether it was her choice on the spur of the moment, or whether she was directed to answer as she did, perhaps by OpenAI’s legal counsel. Either way, lately both the company and its executives have been shedding the public’s trust by the bushel.
It takes just minutes to wreck trust that took years to build.
The primary problem here, my friend the entrepreneur observed, is that OpenAI has been deified. They’re the first company most of us think of when we think or talk about generative AI (largely due to this next sentence »). They’re the most immediately visible company in media coverage of generative AI; the media report on—and we eagerly gobble up—their every move, their every word and every tweet. The company is led by a virulently ambitious CEO who’s trying to have it all ways. As in politics, so in business: the more influence and money you amass, the less able you are to gauge your own fallibility and the deficiencies of your power.
OpenAI has the choice and the freedom to come clean, with all of us. No one is pinning Sam against the wall, threatening to do him in if he does right by all the people and organizations whose work has been used to train his models. The risk, which leadership is all too aware of, is the potential demise of the firm through lawsuits and flight of capital. No doubt they would take down a few of their competitors as well, by association. It’s certainly a tough call, and it shouldn’t surprise anyone that OpenAI is doubling down. But there is also nothing stopping its leadership from starting over and rebuilding the trust they should have focused on from the start—with creators, with the media, with the public. Perhaps this would start an unprecedented sea change in the tech sector, one that their own power and fame are blinding them to.
No matter how much magical thinking Silicon Valley spins up, the ancient myths that underlie so many of the fall-from-grace stories we have seen over the years, from Theranos to FTX, are as steadfast as the laws of physics… we all know what happens to Icarus when he flies a little too close to the sun.
What few leaders seem to want to see, and what the myths are patiently waiting for us to realize, is that there is nothing inherently wrong with power, influence, or money. What matters is how you use them.
1. The Joanna/Mira dialogue starts at 04:24 in the video.
2. To mistrust implies a sense of doubt or skepticism about someone or something in general terms, while distrust points that doubt, or outright lack of trust, firmly at a specific target, be it a person, organization, or entity.
3. Mira did confirm to Joanna that Shutterstock is one of the sources, but no doubt only because Joanna already had that information—so nothing shocking there.