‘AI has barely touched investigative journalism, let alone transformed it.’
The benefits of AI in investigative journalism should be weighed against its limitations, says Jonathan Stray in the academic journal Digital Journalism. Stray explores how AI systems have been used in various investigative projects.
He lays out six factors that can frustrate using AI for investigative journalism:
- Data availability – even so-called public data may be difficult to obtain, or hard to use in the form it is provided
- Unique stories – investigative stories often turn on one-of-a-kind fact patterns, so training data can be hard to find, and tools built for one story are hard or impossible to reuse on the next
- Challenging problems – complicated source documents are difficult to categorize by algorithm, and a document's meaning may depend on its relationship to other documents
- The need for accuracy – the effort required to verify algorithmic findings may cancel out the speed and depth advantages of machine processing
- Cost-effectiveness – reporting time is cheaper than data-scientist time, and talking to people is often just as revealing as interrogating data
- What is news? – what makes a set of facts newsworthy often defies even human definition, so it is hard to encode for machine analysis
Stray suggests, however, that AI systems can be very productive for preparing data. He proposes putting more emphasis on using AI for ‘data wrangling’, for example in two areas:
- Extracting data from different documents
- Linking records from different databases
Some observations:
- Stray frames a useful question: ‘Will using an AI system provide more benefit than difficulty in getting meaningful results?’
- His six factors could be adapted project by project to weigh the pluses and minuses of trying AI tools for data analysis.
- His suggestions for data wrangling ring true. They offer a constructive way to gain an ‘AI advantage’ from present-day systems.
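The first data-wrangling task, extracting the same facts from documents laid out in different ways, can be illustrated with a minimal sketch. It assumes two hypothetical payment-record layouts; the patterns, field names and function names are invented for illustration and are not taken from Stray's paper. It also uses hand-written regular expressions rather than machine learning. A trained extraction model would replace the patterns, but the goal is the same: one normalized record per document.

```python
import re
from datetime import datetime

# Hypothetical layouts: the same facts (payee, date, amount) written two ways.
PATTERNS = [
    # Layout A: "Paid to: ACME Corp | Date: 2019-03-04 | Amount: $1,250.00"
    re.compile(
        r"Paid to:\s*(?P<payee>[^|]+?)\s*\|\s*Date:\s*(?P<date>\d{4}-\d{2}-\d{2})"
        r"\s*\|\s*Amount:\s*\$(?P<amount>[\d,]+\.\d{2})"
    ),
    # Layout B: "03/04/2019 payment of 1250.00 USD to Acme Corp."
    re.compile(
        r"(?P<date>\d{2}/\d{2}/\d{4})\s+payment of\s+(?P<amount>[\d,]+\.\d{2})"
        r"\s+USD\s+to\s+(?P<payee>.+?)\.?$"
    ),
]

# Date formats used by the two hypothetical layouts above.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Try each known date format in turn; fail loudly on anything else."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def extract_payment(line):
    """Return a normalized record dict, or None if no known layout matches."""
    for pattern in PATTERNS:
        m = pattern.search(line.strip())
        if m:
            return {
                "payee": m.group("payee").strip(),
                "date": parse_date(m.group("date")),
                # Drop thousands separators before converting to a number.
                "amount": float(m.group("amount").replace(",", "")),
            }
    return None
```

Both sample lines yield the same date and amount but name the payee as `ACME Corp` and `Acme Corp`. Reconciling variants like these across sources is the job of record linkage.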
‘Many have envisioned the use of AI methods to find hidden patterns of public interest in large volumes of data, greatly reducing the cost of investigative journalism. But so far only a few investigative stories have utilized AI methods, in relatively narrow ways. This paper surveys what has been accomplished in investigative reporting using AI techniques, why it has been difficult to apply more advanced methods, and what sorts of investigative journalism problems might be solved by AI in the near term. Journalism problems are often unique to a particular story, which means that training data is not readily available and the cost of complex models cannot be amortized over multiple projects. Much of the data relevant to a story is not publicly accessible but in the hands of governments and private entities, often requiring collection, negotiation, or purchase. Journalistic inference requires very high accuracy, or extensive manual checking, to avoid the risk of libel. The factors that make some set of facts “newsworthy” are deeply sociopolitical and therefore difficult to encode computationally. The biggest near-term potential for AI in investigative journalism lies in data preparation tasks, such as data extraction from diverse documents and probabilistic cross-database record linkage.’
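The second task the abstract names, probabilistic cross-database record linkage, is commonly done with Fellegi-Sunter-style match weights. Each compared field adds log2(m/u) when it agrees and log2((1 − m)/(1 − u)) when it does not. Here m is the probability of agreement for true matches and u is the probability for non-matches. The following is a simplified sketch under stated assumptions: the m and u values, the 0.85 name-similarity cutoff, the default threshold of 5.0 and the field names are all invented for the example, and the paper does not prescribe this method. Real projects usually estimate m and u from the data and use blocking so that not every pair of records has to be compared.

```python
import math
from difflib import SequenceMatcher

# Assumed (not estimated) parameters per field:
#   m = P(field agrees | same entity), u = P(field agrees | different entities)
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
    "zip": (0.90, 0.02),
}


def normalize(value):
    """Lowercase, drop periods and commas, and collapse whitespace."""
    text = str(value).lower().replace(".", "").replace(",", "")
    return " ".join(text.split())


def field_agrees(field, a, b):
    """Fuzzy comparison for names, exact comparison for everything else."""
    if field == "name":
        return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= 0.85
    return normalize(a) == normalize(b)


def match_weight(rec_a, rec_b):
    """Sum of per-field log2 likelihood ratios (higher = more likely a match)."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if field_agrees(field, rec_a[field], rec_b[field]):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def link(records_a, records_b, threshold=5.0):
    """Return (index_a, index_b, weight) for every pair at or above threshold."""
    links = []
    for i, a in enumerate(records_a):
        for j, b in enumerate(records_b):
            w = match_weight(a, b)
            if w >= threshold:
                links.append((i, j, round(w, 2)))
    return links
```

Pairs that land near the threshold are the natural candidates for the manual checking that Stray's accuracy concern calls for.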
See the full paper from the publisher (free).