Why BookTok is freaking out over Google Docs

BookTok went into a frenzy last month after users noticed that the Google Labs End User License Agreement (EULA) had added a new, and somewhat alarming, clause: that it could ingest all your prompts and outputs on Google Docs to train its AI.

A number of TikTok creators—mostly authors, and some readers who make content about their favorite books—sounded the alarm with a series of viral videos, speculating that this would include their unpublished drafts sitting in Google Docs.

Creators strategized, sharing ways to download their material from Google Drive and move their writing to open-source or anti-surveillance platforms.

Rebecca Thorne, a fantasy author with a sizeable following on TikTok, shared a video that highlighted the potential privacy problem and offered alternatives to Docs. The video racked up more than a hundred thousand views.

@rebecca.thorne #stitch with @WoppyDoesThings Next Cloud Hub—their word processor is OnlyOffice, and has all the same features as google docs! You can sign up for free and just pay for server space—which can be as cheap as 5 euro (25gb space, up to 20 euro / month (1000gb space). Google is NOT your friend. Please keep your intellectual property safe, folks!! #writertok #authortok #booktok #book #fyp #googledocs #nextcloud

“I don’t think any of us expected [the AI era] to come so quickly,” Thorne told the Daily Dot. “And then none of us were thinking about how the AI would be trained. That I think is why we’re seeing this massive surge in people who are panicking.”

The panic came on the heels of a recent scandal about Prosecraft, an AI tool that analyzed various statistics about published books, such as how many adjectives or sentences a given text had, ostensibly as a way of analyzing or improving one’s writing. Much like the Google Docs scare, the worry is what happens when AI ingests texts without the author’s knowledge and can then, in theory, reproduce plagiarism-lite versions of the original.

Google Docs has been a popular free writing tool for the last ten years because it carries a number of key advantages over similar word processors. All of your documents are accessible from any computer as long as you log into your Google account. You can also share permissions and collaborate on a document extremely easily. All of the changes and edits are saved into the cloud, meaning you can easily revert to an older version.

Thorne explained that she and her reader (her girlfriend, who “makes comments on draft documents so they can be fit for human eyes”) use the live collaboration feature all the time. “Even though I write my basics in my Word document and I save locally to my computer, I copy and paste everything into Google Docs. It’s easier than sending a document back and forth over Discord,” she said.

Early this July, Google’s privacy policy changed, allowing the company to scrape everything that you’ve ever written publicly on a Google platform, such as reviews on Google Maps. This was a separate change from the terms of service for Labs AI products, which refer to prompt inputs and outputs in your docs.

While Google has now claimed that any and all writing in public is fair game, it has explicitly said it will not feed your personal documents into its AI products without your consent.

Google’s privacy policy explains that it collects your content and reserves the right to use that data to improve or maintain its services. Google employees or contractors do sometimes look over personal information, such as for labeling or advertising purposes, but the policy makes it extremely clear that the company does not read your email except in very specific edge cases, such as when users ask it to adjudicate cases of abuse or look into bugs, or when it gets subpoenaed.

When approached by the Daily Dot, a Google spokesperson strenuously denied that Google uses private document content to train AI.

“To be very clear: your interactions with intelligent features (spell check, Smart Compose, spam filtering) within Google Workspace are only ever used in an aggregated and/or anonymized fashion to improve these features within Workspace. That’s it. Your content is not and has not been used to train Bard, Search, etc.”

Google did not deny, however, that its terms of service do grant it the ability to do that. And time has shown that the Silicon Valley ethos of move fast, break things, and ask permission later often means that regular people lose out on privacy rights that are even slightly ceded to big companies.

Justin Hughes, a law professor who lectures in intellectual property at Loyola and Oxford, explained that good lawyering often involves “intentional ambiguity.”

“It’s a little too clever to say a tech company would never [ingest user data] without your consent when you’ve given your consent to a whole bunch of very complex stuff in the terms of service,” he told the Daily Dot. “If a tech company says, we’ll never use your data or your materials for AI training without your express and specific consent, that would be a little different than saying without your consent.”

The terms of service are a “private legal framework” between consumer and company, he explained, and a tech company “has the incentive to be clear up to a point, but also the incentive to keep its options open.”

And Google has been proactive in pushing its right to use data for its AI. In a draft to the Australian government about its AI legislation, Google lobbied to amend copyright law to allow companies to scrape published text for AI.

“When it comes to AI training on huge data sets like Zoom might have, or a company that records university lectures or an email service provider, we just haven’t had clarity. It’s reasonable for people to be ringing the alarm bells,” Hughes said.

Hughes said he couldn’t really blame tech corporations, as they’re all providing free tools and services and it’s reasonable for them to think about how to recoup the costs of those services. He added that privacy implications were inherent in the arrangement.

“I just can’t imagine why anyone would’ve ever thought at the beginning that putting stuff on the cloud, which just means putting it on some server you don’t control, would be a good way to ensure the privacy and confidentiality of their materials,” he said.

This echoes the recent panic over Zoom, which about-faced and agreed not to train its machines on user data. Much like Google, it did not explicitly deny that its terms of service would allow it to do this; it simply affirmed that it does not.

For Thorne, the panic about AI scraping writers’ drafts reflects greater fears about the publishing industry. Indie authors already struggle with profit margins, and have to compete with larger publishing houses and Amazon.

“Even indie authors are still being held under the thumb of the corporations that are trying to implement this type of thing,” she said, pointing to the fact that some 80% of indie book sales go through Amazon.

The sales juggernaut has no AI writing software yet, but AI-written novels represent an existential threat to authors and their intellectual property. Junk AI-written books already flood Amazon’s marketplace, and authors are having to fight to prove that books “written” under their own name are actually AI jumbles pirating off their brand.

And if Google Docs were to start parsing drafts, the situation could only get worse.

Source: https://www.dailydot.com/debug/google-docs-ai-author-concerns/