Making Artificial Intelligence Work for Investigative Journalism | Stray

‘AI has barely touched investigative journalism, let alone transformed it.’

The benefits of AI in investigative journalism should be weighed against its limitations, says Jonathan Stray in the academic journal Digital Journalism. Stray explores how AI systems have been used in various investigative projects.

He lays out six factors that can frustrate using AI for investigative journalism:

  1. Data availability – even so-called public data may be difficult to obtain, or hard to use in the form in which it is provided.
  2. Unique stories – investigative stories often have one-of-a-kind fact patterns, so training data can be hard to find. Uniqueness also limits or rules out reusing the same model for other stories.
  3. Challenging problems – complicated source documents can make it difficult for algorithms to categorize the data. Data may also take on a different meaning when read alongside other documents.
  4. The need for accuracy – the effort required for verification may offset the speed and depth advantages of machine processing.
  5. Cost-effectiveness – reporting time is less expensive than data scientist time. Sometimes, talking to people can be just as revealing as interrogating data.
  6. What is news? – a question that often defies human definition is difficult to quantify for use in machine analysis.

However, AI systems can be very productive for prepping data, Stray suggests. He proposes more emphasis on using AI for ‘data wrangling,’ for example:

  • Extracting data from different documents
  • Linking records from different databases
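
To make the record-linking idea concrete, here is a minimal sketch in Python. It is not Stray's method or any newsroom's actual pipeline: it uses plain fuzzy string matching from the standard library's difflib as a stand-in for full probabilistic record linkage. The record sets, the function names and the 0.85 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0 and 1 for two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each left record with its best-scoring right record.

    Only pairs scoring at or above `threshold` (an assumed cutoff) are kept.
    Returns (left_id, right_id, score) tuples meant for human review.
    """
    links = []
    for left_id, left_name in left.items():
        best_id, best_score = None, 0.0
        for right_id, right_name in right.items():
            score = similarity(left_name, right_name)
            if score > best_score:
                best_id, best_score = right_id, score
        if best_id is not None and best_score >= threshold:
            links.append((left_id, best_id, round(best_score, 2)))
    return links


# Hypothetical records: a contracts database and a campaign-donor database.
contracts = {"c1": "Acme Construction, Inc.", "c2": "Northside Paving LLC"}
donors = {"d1": "ACME Construction Inc", "d2": "Riverbend Holdings"}

candidate_links = link_records(contracts, donors)
print(candidate_links)
```

The output is a list of candidate matches, not confirmed ones. In keeping with the accuracy point above, each pair would still need to be checked by a reporter before it supports a published claim.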


  • Stray frames a useful question: ‘Will using an AI system provide more benefit than difficulty in getting meaningful results?’
  • His six issues could be adapted project-by-project to weigh pluses and minuses.
  • His suggestions for data-wrangling ring true. They are a constructive approach to gaining an ‘AI advantage’ from present-day systems.


‘Many have envisioned the use of AI methods to find hidden patterns of public interest in large volumes of data, greatly reducing the cost of investigative journalism. But so far only a few investigative stories have utilized AI methods, in relatively narrow ways. This paper surveys what has been accomplished in investigative reporting using AI techniques, why it has been difficult to apply more advanced methods, and what sorts of investigative journalism problems might be solved by AI in the near term. Journalism problems are often unique to a particular story, which means that training data is not readily available and the cost of complex models cannot be amortized over multiple projects. Much of the data relevant to a story is not publicly accessible but in the hands of governments and private entities, often requiring collection, negotiation, or purchase. Journalistic inference requires very high accuracy, or extensive manual checking, to avoid the risk of libel. The factors that make some set of facts “newsworthy” are deeply sociopolitical and therefore difficult to encode computationally. The biggest near-term potential for AI in investigative journalism lies in data preparation tasks, such as data extraction from diverse documents and probabilistic cross-database record linkage.’

The full paper is available free from the publisher.

Stray, J. (2019) ‘Making Artificial Intelligence Work for Investigative Journalism,’ Digital Journalism, DOI: 10.1080/21670811.2019.1630289
