There are a lot of bad legal takes when it comes to copyright and AI. You can now add watching free streams to that list.
Look, we get it, copyright law is not exactly the easiest thing in the world to understand. It does take a certain level of training and education to understand its many ins and outs. Even then, though, some of the legal takes we’ve seen in recent years have been particularly facepalm-worthy. The worst part is that these takes are written as though the authors are authorities on the subject even though it is plain as day that they have absolutely no idea what they are talking about.
One of those bad legal takes on copyright is this idea that reading is copyright infringement. According to some, just the act of you reading this article right now is somehow supposed to allow me to file a copyright infringement lawsuit against you for damages, even though I put this article out there online for free for all to read. It doesn’t take a genius to realize how monumentally stupid such an idea actually is. Like, seriously, what is the point of writing if I think it’s bad that someone might (gasp!) read what I wrote in the first place? Heck, other writers might learn how to write better by reading what I wrote, and that is somehow supposed to make such a case even stronger.
Yet, there are those who not only seriously believe reading is copyright infringement, but some are actually filing lawsuits making these arguments on top of it all. How could things get so monumentally stupid? Apparently, the magic words for coming up with such ridiculously stupid bad legal takes are “artificial intelligence” (AI). The second someone brings in AI, a dumb bomb goes off and the dumbest legal takes imaginable somehow become gospel even though they make absolutely no sense whatsoever.
Of course, it’s worth pointing out that what fuels this stupidity is a set of myths surrounding AI: that it is supposed to cause the extinction of the human race, that AI is sentient in some way, that it is a foolproof, perfect technology that will flawlessly replace humans at every task, and that it will render all creative and/or writing jobs obsolete any day now. Simply put, none of the above is true, and it astonishes me that I still have to point that out. Yet, some people believe the mainstream media hype and jump to all sorts of conclusions, such as the idea that AI is a monolithic evil that must be destroyed because it is somehow either the AI or them.
Others believe that it is an arms race to be the first to have that perfect technology and frantically develop AI to automate tasks that AI has no business taking on. In my own field of journalism, for instance, I’ve repeatedly seen efforts to replace journalists with AI result in fail after fail after fail after fail after fail. Yes, there are threats to journalism today, but AI is not one of them.
Yet, this was never a deterrent for the lawsuits that cropped up from mainstream media. For instance, Canadian mainstream outlets filed a lawsuit against OpenAI seemingly based on this moral panic that AI is somehow taking over the world. The lawsuit was an absolute clusterfuck. It was largely the mainstream media arguing that its work is important and, therefore, OpenAI owes it money because “GIMME!!!”. They argued that reading and comprehending publicly available material is copyright infringement, and that robots.txt was somehow circumvented (this without a shred of evidence to back it up), which supposedly constitutes breaking a technical protection measure (TPM). This despite the obvious fact that a robots.txt file is neither legally binding nor a TPM by any stretch of the imagination. Law professor Michael Geist would later agree with me, concluding that the lawsuit is absolute garbage and a very weak attempt to score some kind of settlement.
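To see why a robots.txt file isn’t a TPM, it helps to look at what one actually is: a plain text file of advisory requests that crawlers can honour or ignore. Here’s a minimal sketch using Python’s standard library (the site URL is hypothetical). Notice that the parser merely answers a question; it blocks nothing:

```python
# Minimal sketch: robots.txt is purely advisory, not an access control.
# Python's standard library can read one and report what it asks, but
# nothing in the protocol stops a client from fetching the page anyway.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")  # hypothetical site
rp.read()

# The parser only answers "does the file ask this agent not to fetch?"
# There is no encryption, no lock, and nothing to "circumvent".
print(rp.can_fetch("GPTBot", "https://example.com/article.html"))
```

Compare that to an actual TPM, like the encryption on a DVD or a DRM scheme, which actively prevents access. A text file that politely asks crawlers to stay away does nothing of the sort.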
Still, that isn’t stopping the extremely bad legal takes out there from people seemingly pretending to be legal experts in the field. Another bad legal take comes from TechCrunch and was reposted onto MSN. In it, the author seems to argue that watching videos and streams that are freely available somehow constitutes copyright infringement. No, really:
OpenAI has never revealed exactly which data it used to train Sora, its video-generating AI. But from the looks of it, at least some of the data might’ve come from Twitch streams and walkthroughs of games.
OK, first of all, Twitch streams and game walkthroughs are generally put online so that they can be watched. What the heck is the point of a Twitch stream if you don’t want anyone to watch it?
Somehow, this is supposed to be the basis for accusing OpenAI of infringing on copyright. I mean, the act of watching alone clearly isn’t copyright infringement.
What’s more, if it’s the streamers and people producing these videos whose copyright is somehow being violated, I don’t see there being a case. By no means is the output going to be a one-to-one replica of the streams and videos themselves, so who is going to be filing the infringement lawsuit in the first place?
The second paragraph, ironically, wound up being another argument against the idea that watching streams is copyright infringement (even though it is presented as an argument that infringement occurred):
Sora launched on Monday, and I’ve been playing around with it for a bit (to the extent the capacity issues will allow). From a text prompt or image, Sora can generate up to 20-second-long videos in a range of aspect ratios and resolutions.
OK, so, let’s look at the other possible argument here: that it’s the publishers of the video games themselves who would be the supposed victims of copyright infringement. A major question is whether or not the product being produced is a replacement for the original work. The publishers themselves produce video games. What this AI is producing is a 20 second video clip. If that were the argument, then all walkthroughs, streams, stills, screenshots, logos, etc. would be copyright infringement, full stop. That is absolutely not the case legally because, in both Canada and the US, there are exceptions known as fair dealing and fair use respectively. Reproductions of copyrighted work are made all the time, and some are made with (gasp!) profit making in mind. Examples include educational purposes and journalistic purposes.
It takes nothing to conclude that a 20 second clip is not replacing a whole freaking video game from the get-go, and the legal argument fails right then and there.
The author then posts a number of these short clips, seemingly arguing that the clips themselves infringe on copyrighted works. Shortly after, the author unknowingly face-plants again with this paragraph:
Granted, I had to get creative with some of the prompts (e.g. “italian plumber game”). OpenAI has implemented filtering to try to prevent Sora from generating clips depicting trademarked characters. Typing something like “Mortal Kombat 1 gameplay,” for example, won’t yield anything resembling the title.
Yeah, so measures were already implemented to try and avoid even trademark issues. This is something the DMCA generally requires these days: that measures be put in place to prevent your online tool from being used to infringe on copyrighted works in the first place. As a result, it takes a lot of effort to push something out the other end that even resembles a familiar work.
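For what it’s worth, OpenAI has never published how its filter actually works, so the following is purely a hypothetical sketch of the crudest form such a guardrail could take: a denylist check on the prompt before any generation happens. It also shows why a vague rephrasing like “italian plumber game” slips right past this kind of filter:

```python
# Purely hypothetical sketch of a prompt denylist; OpenAI has never
# published how Sora's actual filtering works.
BLOCKED_TERMS = {"mortal kombat", "mario", "pokemon"}  # illustrative only

def is_prompt_allowed(prompt: str) -> bool:
    """Reject prompts that directly name a known trademarked property."""
    lowered = prompt.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

print(is_prompt_allowed("Mortal Kombat 1 gameplay"))  # False: blocked
print(is_prompt_allowed("italian plumber game"))      # True: sails through
```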
Yet, astonishingly, the author still treated these points (which really are arguments against copyright infringement) as arguments for it, because this is what was written next:
But my tests suggest that game content may have found its way into Sora’s training data.
OpenAI has been cagey about where it gets training data from. In an interview with The Wall Street Journal in March, OpenAI’s then-CTO, Mira Murati, wouldn’t outright deny that Sora was trained on YouTube, Instagram, and Facebook content. And in the tech specs for Sora, OpenAI acknowledged it used “publicly available” data, along with licensed data from stock media libraries like Shutterstock, to develop Sora.
OpenAI also didn’t respond to a request for comment.
If game content is indeed in Sora’s training set, it could have legal implications — particularly if OpenAI builds more interactive experiences on top of Sora.
Um… no, no it doesn’t. For that, you have to actually present a case for copyright infringement, and the author completely failed to do so. Yet, the author managed to find a useful idiot to conclude that infringement did occur:
“Companies that are training on unlicensed footage from video game playthroughs are running many risks,” Joshua Weigensberg, an IP attorney at Pryor Cashman, told TechCrunch. “Training a generative AI model generally involves copying the training data. If that data is video playthroughs of games, it’s overwhelmingly likely that copyrighted materials are being included in the training set.”
Again, where is the copyright infringement? Watching and comprehending streams and videos isn’t something that constitutes copyright infringement. The law doesn’t differentiate who (or what) is doing the watching.
The article goes to great pains to mention that this has been a source of litigation:
That has understandably displeased creators whose works have been swept up in training without their permission. An increasing number are seeking remedies through the court system.
Microsoft and OpenAI are currently being sued over allegedly allowing their AI tools to regurgitate licensed code. Three companies behind popular AI art apps, Midjourney, Runway, and Stability AI, are in the crosshairs of a case that accuses them of infringing on artists’ rights. And major music labels have accused two startups developing AI-powered song generators, Udio and Suno, of infringement.
Many AI companies have long claimed fair use protections, asserting that their models create transformative — not plagiaristic — works. Suno makes the case, for example, that indiscriminate training is no different from a “kid writing their own rock songs after listening to the genre.”
Yet, somehow conveniently left out of the article are some of the results of the lawsuits being filed against generative AI large language models (LLMs). For example, there’s this lawsuit from last year, as posted by TechDirt:
It appears that the judge overseeing that lawsuit has noticed just how weak the claims are. Though we don’t have a written opinion yet, Reuters reports that Judge William Orrick was pretty clear at last week’s hearing that the case, as currently argued, has no chance.
U.S. District Judge William Orrick said during a hearing in San Francisco on Wednesday that he was inclined to dismiss most of a lawsuit brought by a group of artists against generative artificial intelligence companies, though he would allow them to file a new complaint.
Orrick said that the artists should more clearly state and differentiate their claims against Stability AI, Midjourney and DeviantArt, and that they should be able to “provide more facts” about the alleged copyright infringement because they have access to Stability’s relevant source code.
“Otherwise, it seems implausible that their works are involved,” Orrick said, noting that the systems have been trained on “five billion compressed images.”
Again, the theory of the lawsuit seemed to be that AI companies cut up little pieces of the content they train on and create a “collage” in response. Except, that’s not at all how it works. And since the complaint can’t show any specific work that has been infringed on by the output, the case seems like a loser. And it’s good the judge sees that.
He also recognizes that merely being inspired by someone else’s art doesn’t make the new art infringing:
“I don’t think the claim regarding output images is plausible at the moment, because there’s no substantial similarity” between images created by the artists and the AI systems, Orrick said.
It’s amazing how frequently these anti-AI articles have no problem mentioning that lawsuits were filed, yet conveniently forget to mention that these lawsuits already have a history of losing in court.
Also, for those who think the above case is a one off, here’s another result from this year as noted by TechDirt:
So far, we’ve seen that these cases aren’t doing all that well, though many are still ongoing.
Last week, a judge tossed out one of the early ones against OpenAI, brought by Raw Story and Alternet.
Part of the problem is that these lawsuits assume, incorrectly, that these AI services really are, as some people falsely call them, “plagiarism machines.” The assumption is that they’re just copying everything and then handing out snippets of it.
But that’s not how it works. It is much more akin to reading all these works and then being able to make suggestions based on an understanding of how similar things kinda look, though from memory, not from having access to the originals.
Finally, the judge basically says, “Look, I get it, you’re upset that ChatGPT read your stuff, but you don’t have an actual legal claim here.”
Let us be clear about what is really at stake here. The alleged injury for which Plaintiffs truly seek redress is not the exclusion of CMI from Defendants’ training sets, but rather Defendants’ use of Plaintiffs’ articles to develop ChatGPT without compensation to Plaintiffs. See Compl. ¶ 57 (“The OpenAI Defendants have acknowledged that use of copyright-protected works to train ChatGPT requires a license to that content, and in some instances, have entered licensing agreements with large copyright owners … They are also in licensing talks with other copyright owners in the news industry, but have offered no compensation to Plaintiffs.”). Whether or not that type of injury satisfies the injury-in-fact requirement, it is not the type of harm that has been “elevated” by Section 1202(b)(i) of the DMCA. See Spokeo, 578 U.S. at 341 (Congress may “elevate to the status of legally cognizable injuries, de facto injuries that were previously inadequate in law.”). Whether there is another statute or legal theory that does elevate this type of harm remains to be seen. But that question is not before the Court today.
While the judge dismisses the case and says they can try again, it would appear that she is skeptical they could do so with any reasonable chance of success:
In the event of dismissal Plaintiffs seek leave to file an amended complaint. I cannot ascertain whether amendment would be futile without seeing a proposed amended pleading. I am skeptical about Plaintiffs’ ability to allege a cognizable injury but, at least as to injunctive relief, I am prepared to consider an amended pleading.
Simply put, claiming copyright infringement to attack LLMs is a legal non-starter. The courts are increasingly tossing these cases for the junk lawsuits that they are. Heck, in some instances, there might be a stronger case in moral rights where people’s likenesses are used in something outputted by AI (especially in the case of AI that is used to create NSFW material), but the angle being used to attack AI is just not working. Yes, there are cases that are ongoing, but the legal trend here is not exactly looking promising for those who think they have a case against LLMs in the first place. By all means, though, feel free to try anyway and waste your time and money losing in court. Nothing’s stopping you there.
At any rate, the TechCrunch author has completely failed to present a case for copyright infringement here. The author screamed about how OpenAI used streams and video walkthroughs. The logical response is, “so what?” You have to actually show copyright infringement taking place. Watching a stream or a publicly available YouTube video simply isn’t that.