On August 7, author Zach Rosenberg took to X (formerly Twitter) to vent his anger at Benji Smith, accusing Smith of using his book, Hungers As Old As This Land, without his or his publisher's permission.
Rosenberg was referring specifically to a site named Prosecraft. The site, at that time, offered automated analysis of thousands of books, looking at everything from total word count to passive voice usage and more. Rosenberg was upset that his book had been included in that database without his permission.
The post caught the attention of other authors, who soon rallied against Prosecraft. According to an article in Wired that interviewed Smith, within a day hundreds of authors had sent cease-and-desist letters, the Authors Guild was repeatedly contacted, and many more authors took to social media to voice their displeasure.
Smith, for his part, took quick and decisive action. That same day, he took down Prosecraft. In a post on the now-defunct site, he said that, while he felt his site stayed within the boundaries of fair use, "Today the community of authors has spoken out, and I'm listening."
The next day, he updated his main application, Shaxpir, to remove the features that relied on the Prosecraft database.
Though he put out a call encouraging authors to voluntarily submit their work to the database, the matter seemed settled.
However, there is an interesting wrinkle in the story. Prosecraft is not a new service. It was launched in 2017 to very little fanfare and had flown under the radar for over half a decade.
That raises a simple question: What made last week special?
The answer, as pointed out by Smith, Wired and others, is AI. It's a story that illustrates why authors are upset with AI and what companies need to do to avoid a similar wrath.
Shadow Libraries and Questionable Tactics
Before this recent controversy, Benji Smith was best known as the creator of Shaxpir (pronounced like "Shakespeare"), a desktop application aimed at authors writing novels and other long-form fiction.
Part of that application allowed users to compare their works with others, looking at metrics such as "Vividness," "Sentiment Analysis" and more. These judgments were made based upon an automated analysis of other literary works in the service's database.
Prosecraft was the standalone version of that feature, allowing users to look up statistics on thousands of books. While the analysis was deep, only small snippets of the original books were shown, usually to highlight the section that was the best (or worst) representative of a particular metric.
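To give a sense of what this kind of analysis involves, here is a minimal sketch of book-level text metrics in the same spirit. This is not Smith's actual code; the passive voice heuristic (a form of "to be" followed by a word ending in "-ed") is a deliberately crude, illustrative approximation.

```python
import re

# Forms of "to be" used by the naive passive voice heuristic below.
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}

def analyze(text: str) -> dict:
    """Compute simple, Prosecraft-style statistics for a passage of text."""
    # Tokenize into lowercase words (letters and apostrophes only).
    words = re.findall(r"[A-Za-z']+", text.lower())

    # Crude passive voice detector: a "to be" form immediately
    # followed by a word ending in "-ed" (e.g. "was rejected").
    passive_hits = sum(
        1
        for prev, cur in zip(words, words[1:])
        if prev in BE_FORMS and cur.endswith("ed")
    )

    return {
        "word_count": len(words),
        "passive_voice_hits": passive_hits,
    }

sample = "The manuscript was rejected twice. She rewrote it and sold it."
print(analyze(sample))  # {'word_count': 11, 'passive_voice_hits': 1}
```

A production tool would use real part-of-speech tagging rather than suffix matching, but even this toy version shows how such statistics can be computed without reproducing more than fragments of the underlying text.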
The issue was mostly with how Smith compiled this collection. Though the project started by using works submitted by the community, he quickly turned to using material scraped from the internet. That meant, inevitably, his library was built using pirated content.
That, in turn, was what authors zeroed in on. Authors were upset that their books were being ingested without permission and subjected to this analysis. Many compared Prosecraft to the way AI systems work, and some even accused Prosecraft of being a tool to feed and train AI systems.
However, as Smith noted in his apology post, that simply isn't the case. His system was something much simpler, built solely to help human authors write better, not to train AI systems at all.
As such, AI may not be the best analogy for what Prosecraft did. Instead, it may be a much older debate.
Looking Back to Google Book Search
In September 2005, the Authors Guild filed a lawsuit against Google and some of its partners, alleging that the search giant had infringed the copyright of many of its members by scanning print copies of books and creating a new search engine based on them.
The essential argument was that Google was making unauthorized copies of the books and sharing snippets in search results. Though the case dragged on for a decade, in October 2015 the Second Circuit Court of Appeals upheld a lower court decision that found Google's use of the books to create a search engine was protected by fair use.
That is surprisingly similar to what Smith did with Prosecraft. The main difference is that, where Google made unauthorized scans of legally acquired copies, Smith simply copied already-digitized pirated copies.
However, as was made clear in the Wired article, it’s unclear if and how that distinction would impact a fair use ruling. Though the sourcing is dubious, a search tool that provides analytics about a book would seem to be at least an equally transformative use to a book search engine and would also have little, if any, impact on the market for that book.
Though it’s impossible to know exactly how a court case would go, Smith has a clear fair use argument that, at the very least, needs to be considered.
But the outrage wasn't focused on the legal, at least not directly. It was focused on the ethical. Though authors were upset enough with Google Book Search to file a long-running lawsuit over it through the Authors Guild, that service didn't represent an existential threat to human creators the way generative AI does.
One of the reasons authors are so angry about that threat is that it was created, at least in part, by grabbing and training AI tools on their writing. In short, the bricks that the AI threat is built with were made by unwitting human authors, creating serious questions both legally and ethically.
But that's not what Prosecraft was. Prosecraft was a relatively simple tool for getting analytics on different books. While the way it obtained its content was deeply flawed and problematic, it had nothing to do with generative AI.
But, in an environment where authors understandably feel the need to be incredibly protective of their work, there may just not be a way for something like Prosecraft to exist.
In spite of all that, I don’t think it’s quite fair to call Prosecraft “collateral damage” in the fight over AI. The Google Book Search case proves that authors have a long history of being protective of their writing, even when a practice has been normalized in other areas.
With that in mind, Smith is not without fault. His decision to turn to pirated books was a terrible one and was always going to get some degree of backlash. The Google Book Search case, if nothing else, should have served as a warning that authors were not going to be happy about this.
I would have personally advised Smith to follow the model of Crossref, an organization that helps scholarly publishers cite, track and detect plagiarism in published research. Crossref is a collaboration among publishers that pool their databases, enabling both the collection of data and the detection of plagiarized content.
Smith could have initially seeded Prosecraft with public domain books and then worked with publishers to build a service that benefited publishers and authors alike. It would have been a very difficult approach, especially considering that Prosecraft was never his primary focus, but it would have avoided the backlash.
Still, there's not much doubt that AI increased the focus on Prosecraft and intensified the backlash against it. While some backlash was likely inevitable, the intensity and swiftness of it can be attributed to the current climate.
In a more relaxed time, the conversation around Prosecraft could have been more nuanced. But AI doesn’t leave much room for nuance in public discourse, especially for authors who already, quite rightly, feel very betrayed.