“We were surprised to see, for example, that while the LLM chatbots were able to produce fast and reliable short summaries of meeting transcripts, longer summaries of those same transcripts included only about 50 percent of the relevant facts.”

– Hilke Schellmann

Systematic tests of AI tools for three journalistic tasks found that the only reliable outputs were short summaries. The author led a research team that tested AI for generating short summaries, producing digests of longer documents, and analyzing scientific research.

They raised a warning flag about AI tools designed to assist work with scientific literature, calling them “more hype than help.”

Results with everyday longer documents varied across the models. They say LLMs can be worthwhile for getting the gist of a document, but human work is still needed to ensure the essential points are covered and reflected accurately.

The researchers ran the same prompts repeatedly across leading LLMs and compared the results with human-generated work on the same material. They also noted the inherent efficiency appeal: summarizing that took humans three to four hours was done in about a minute by the LLM. Even so, they conclude that “AI-generated long summaries should not be used for publication.”

Tested How Well AI Tools Work for Journalism | COLUMBIA JOURNALISM REVIEW | August 19, 2025 | by Hilke Schellmann

