Foundations of data journalism – Scraping

Scrape data from web pages with Google Sheets and from PDFs with Tabula and PDFtoExcel, then build graphics with Flourish Studio. We’ll also explore how to scrape data from a photograph using Excel.



Business tools and investigating companies



Reporting with Pinpoint

Learn how to probe through millions of documents, including audio, video, PDFs and more, to find patterns or that proverbial needle in a haystack. You’ll see case studies of how Gannett journalists have used this software to break big investigative stories.

Download the collection of links discussed during the training below.



Using specialized search engines to find sources, data and fact-checks

Run deeper, more precise, more sophisticated web searches by harnessing specialized search engines. Diversify your stories, angles, sources and readers.



Backgrounding a business

We cover:

  • How to search for proxies and tools for covering business and the economy from Journalist’s Toolbox.
  • How to use Google Finance to research a company and track stock prices.
  • Nonprofits — from ProPublica to Charity Watch — what’s out there as resources, what to look for, how to evaluate it.


Secure communications with sources and tipsters

Tips and traps: Tools such as the Signal app, Freedome and VPNs to keep you, your data and your sources safe. We’ll look at a few Google tools (including how to unplug from Maps tracking) as well as resources from the Digital Security section of Journalist’s Toolbox.

We also look at data scraping: how to scrape data from web pages with Google Sheets and browser-based plug-ins, and how to scrape PDFs with Tabula. Students should download the Tabula software at http://tabula.technology
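
For readers who want to see what table scraping looks like under the hood: in Google Sheets, the built-in IMPORTHTML function pulls a table or list from a web page into a sheet. The sketch below does roughly the same thing in Python using only the standard library. It is an illustration, not the method taught in the session. The TableExtractor class and the sample HTML are hypothetical, and a real page would first need to be downloaded.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical sample page standing in for a downloaded web page.
sample_html = """
<table>
  <tr><th>County</th><th>Cases</th></tr>
  <tr><td>Adams</td><td>1,204</td></tr>
  <tr><td>Baker</td><td>987</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(sample_html)
for row in parser.rows:
    print(row)
```

Each printed row is a list of cell values that can be written to a CSV file or pasted into a spreadsheet.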



CSV Match and other fuzzy-matching tools

CSV Match is an open-source fuzzy-matching library that uses some of the same algorithms as OpenRefine (formerly Google Refine), but it finds matches between two files rather than within one.



Analyzing batches of PDFs

Learn tools and techniques for working with data buried in PDF files. Python experience is recommended.



Basic databases in Google Flourish and other tools

Do you have a dataset you want to feature in your article as a table or a searchable database? There are a few easy tools to help you show your work: Google Flourish, Airtable and Tableizer. 

Links to tools: 

Google Flourish (set up a free account prior): https://flourish.studio/

Airtable: https://airtable.com/

Tableizer: https://tableizer.journalistopia.com/

Data to Build the Tables and Databases

COVID-19 Cases/Deaths by County: https://docs.google.com/spreadsheets/d/114DjZZqJFxoOV_4lxgyzDH9X-kveXDOH/edit?usp=drive_web&ouid=101717595278789621083&rtpof=true

Football Coach Salaries: https://drive.google.com/drive/folders/10UnuNnB0McI_g2Qhxl91GRp_ZQ3i-0AU

Link to PowerPoint: https://docs.google.com/presentation/d/1kP6atqFTMi9kwq-PuWBnNky_pXMAaTN5/edit?usp=drive_web&ouid=101717595278789621083&rtpof=true



What do you mean, it’s “statistically significant?”

As a reporter, if you remember nothing else from college stats, it should be how to tell if a number is both newsworthy and trustworthy. This session also might be called: advanced math for journalists!
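
As one worked illustration of the question in the title, the sketch below runs a two-proportion z-test, a common way to check whether a gap between two rates is larger than chance alone would usually produce. This is an assumption about the kind of test that fits, not a summary of the session's content. The counts are made up, and the function name is hypothetical.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled rate under the assumption that both groups share one true rate.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical example: 60 of 400 inspections failed in one district,
# 40 of 400 in another.
z, p_value = two_proportion_z_test(60, 400, 40, 400)
print(f"z = {z:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("The gap is unlikely to be chance alone at the usual 5% threshold.")
else:
    print("The gap could plausibly be chance; be careful how you frame it.")
```

A small p-value says the difference is probably real, not that it is large or newsworthy. That judgment still belongs to the reporter.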



Investigative/data skills series: Data visualization using Infogram

Data visualization for everyone using Infogram, an easy but powerful tool available to journalists across Gannett.

Here’s the data for a line chart example we used: https://docs.google.com/spreadsheets/d/1HGXDys6Rxd9oqYIfOseTjzp2XcBQo13E0VJhkfVn3GQ/edit?usp=sharing

Here’s the MVP data from Sports Reference as an example we used, with pivot table and final sheet tabs: https://docs.google.com/spreadsheets/d/1NMANIMwjiXFm4Yb4VRD7d8EoWYY-6Pn4LSE1lJBT-VY/edit?usp=sharing

Iframe wrapper for embedding: https://www.gannett-cdn.com/experiments/usatoday/tools/static-embed-generator/index.html

More detail on the table feature in Infogram: https://infogram.com/covid-19-vaccines-at-ford-field-and-detroit-mobile-sites-1hdw2jplgwn8j2l

Example of an image, text and chart combo in one Infogram: https://infogram.com/candidate-josh-kaul-1h8n6mkq58d92xo?live



How to pry public records from a reluctant agency

Government databases are public records, yet most agencies treat records requests for data as something exotic and fraught with challenges. Anticipate the objections and defeat them for the win. Conversation led by Steve Suo and Nick Penzenstadler of the USA TODAY investigations team.



Unlocking the power of Excel pivot tables

Slice and dice data to organize your reporting, find patterns and reveal better stories. Led by Erin Mansfield and Nick Penzenstadler of the USA TODAY data/investigations team.
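
For anyone curious about what a pivot table actually computes, the sketch below reproduces the core operation in plain Python: group rows by one field, spread a second field across columns, and sum a value in each cell. It is an analogy to the Excel feature, not a replacement for it. The sample rows and the pivot_sum helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows, as they might come from an exported spreadsheet.
rows = [
    {"county": "Adams", "year": "2022", "amount": 1200},
    {"county": "Adams", "year": "2023", "amount": 800},
    {"county": "Baker", "year": "2022", "amount": 500},
    {"county": "Adams", "year": "2022", "amount": 300},
    {"county": "Baker", "year": "2023", "amount": 950},
]


def pivot_sum(records, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for record in records:
        table[record[row_key]][record[col_key]] += record[value_key]
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {row: dict(cols) for row, cols in table.items()}


pivot = pivot_sum(rows, "county", "year", "amount")

# Print the result as a small text table: one row per county, one column per year.
years = sorted({record["year"] for record in rows})
print("county".ljust(10) + "".join(year.rjust(8) for year in years))
for county in sorted(pivot):
    cells = "".join(str(pivot[county].get(year, 0)).rjust(8) for year in years)
    print(county.ljust(10) + cells)
```

In Excel terms, county is the Rows field, year is the Columns field and amount is the Values field set to Sum.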



Backgrounding like a boss: Reporting on people and companies with free tools

How can you be sure that great source with the perfect quote isn’t too good to be true? Even great reporters can get tricked by fake names or sketchy backgrounds. We’ll walk through some websites and strategies you can use to create a routine and spot potential red flags before you get burned.



Advanced Search: Power tools for search, research and analysis

Sign up for Google’s Backlight tool: goo.gle/getbacklight

Find information faster by learning how to power search on special sites for datasets, images and court documents. See what users in your area are searching and discover story ideas on Google Trends. Explore Backlight, an AI tool for parsing massive amounts of documents.



Building a public records mindset

A discussion on developing a documents state of mind — the key to doing solid watchdog work on a beat. We’ll explore key records on a variety of beats and give practical tips on using open records laws. We’ll give you a checklist of what to know before you make a request and advice on wording your requests for documents and data. 

