Newspaper Navigator

Benjamin Charles Germain Lee

This page contains a number of resources for Newspaper Navigator, my Innovator in Residence project at the Library of Congress. The goal of Newspaper Navigator is to re-imagine searching over the visual content in Chronicling America using machine learning. The project consists of two steps, each described below: constructing the Newspaper Navigator dataset and building the Newspaper Navigator search application. More broadly, Newspaper Navigator aims to engage the American public with the Library of Congress's collections, as well as to advance research in computer science, digital libraries, and the digital humanities. All deliverables resulting from the Newspaper Navigator project (including all code, the dataset, etc.) are placed into the public domain for unrestricted re-use. An enormous thank you to LC Labs, The National Digital Newspaper Program, and IT Design & Development at the Library of Congress, as well as Dan Weld!

The Newspaper Navigator Dataset

Website: https://news-navigator.labs.loc.gov/
This is the landing page for the Newspaper Navigator dataset. Here, you will find details about how to query the data over HTTPS and S3. There are also hundreds of pre-packaged datasets for immediate use - no coding necessary!

Paper: CIKM 2020
DOI: https://dl.acm.org/doi/10.1145/3340531.3412767
ArXiv: https://arxiv.org/abs/2005.01583
This paper details the construction of the Newspaper Navigator dataset. This includes details on training the visual content recognition model, statistics on running the pipeline, and visualizations of the dataset itself.
*This paper was named Best Resource Paper Runner-up at CIKM 2020.*
*The dataset has been named Best Digital Humanities Dataset at the 2020 Digital Humanities Awards.*

Code: https://github.com/LibraryOfCongress/newspaper-navigator
This GitHub repo contains all of the code for the Newspaper Navigator project, as well as the finetuned visual content recognition model weights, the dataset used for finetuning (annotations from the Beyond Words project augmented with additional annotations), and demos. The entire contents of the repo are placed into the public domain.

The Newspaper Navigator Search Application

Website: https://news-navigator.labs.loc.gov/search
The Newspaper Navigator search application enables visitors to search over 1.5 million Newspaper Navigator photos. In addition to providing faceted and keyword search affordances, the application empowers visitors to train their own AI navigators to search these photos by visual similarity. Visitors can train AI navigators by labeling positive and negative training examples and can tune the system on the fly, as training and predicting on all 1.5 million photos takes just a couple of seconds. The AI navigators are powered by ResNet-18 image embeddings.
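To make the idea of an AI navigator concrete, here is a minimal sketch of training on a handful of positive and negative examples and then ranking every photo by the learned score. It assumes precomputed 512-dimensional embeddings (the output size of ResNet-18) and a simple logistic-regression classifier; the classifier choice, the helper names `train_navigator` and `rank_photos`, and the synthetic data are illustrative assumptions, not the search application's actual code.

```python
import numpy as np

def train_navigator(embeddings, positive_ids, negative_ids, steps=200, lr=0.5, l2=1e-3):
    """Fit a logistic-regression scorer on labeled embeddings (hypothetical helper)."""
    X = np.vstack([embeddings[positive_ids], embeddings[negative_ids]])
    y = np.concatenate([np.ones(len(positive_ids)), np.zeros(len(negative_ids))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # predicted probabilities
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)  # gradient step with L2 penalty
        b -= lr * np.mean(p - y)
    return w, b

def rank_photos(embeddings, w, b):
    """Return photo indices sorted from most to least similar to the positives."""
    scores = embeddings @ w + b
    return np.argsort(-scores)

# Synthetic stand-in for real embeddings: two well-separated clusters of 50 "photos".
rng = np.random.default_rng(0)
dim = 512  # ResNet-18 embedding size
center_a, center_b = rng.normal(size=dim), rng.normal(size=dim)
embeddings = np.vstack([
    center_a + 0.1 * rng.normal(size=(50, dim)),  # photos 0-49
    center_b + 0.1 * rng.normal(size=(50, dim)),  # photos 50-99
])
# Unit-normalize so dot products behave like cosine similarity.
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# A visitor labels three positives and three negatives.
w, b = train_navigator(embeddings, positive_ids=[0, 1, 2], negative_ids=[50, 51, 52])
ranking = rank_photos(embeddings, w, b)
top_50 = set(ranking[:50].tolist())
```

Because the embeddings are computed once ahead of time, each round of labeling only refits a small linear model and takes one matrix-vector product over the collection, which is why this style of retraining can feel interactive.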

Demo: UIST 2020
Paper DOI: https://dl.acm.org/doi/10.1145/3379350.3416143
Preview Video: https://www.youtube.com/watch?v=1WfTFVXx1fg
Short Talk Video: https://www.youtube.com/watch?v=9w7ippuo3Gk
This demo presents open faceted search, the new mode of search launched in the Newspaper Navigator Search Application. From the computer science perspective, open faceted search empowers users to define their own facets in an open-domain fashion.

Newspaper Navigator Data Archaeology

In this Digital Humanities Quarterly paper, which I call a "data archaeology," I consider the digitization journeys of four different pages in Black newspapers in Chronicling America that reproduce the same photograph of W.E.B. Du Bois. In doing so, I unpack how each step in the pipelines, such as the imaging process and the construction of training data, not only imprints bias on the resulting Newspaper Navigator dataset but also propagates the bias via the machine learning algorithms employed. I investigate the limitations of machine learning as it relates to cultural heritage, from marginalization and erasure via algorithmic bias to unfair labor practices in the construction of commonly-used datasets. I argue that any use of machine learning with cultural heritage collections must be done with an understanding of the broader socio-technical ecosystems in which the algorithms have been utilized.

Data Archaeology:
http://www.digitalhumanities.org/dhq/vol/15/4/000578/000578.html

Newspaper Navigator Organizational Overview

To document an overview of Newspaper Navigator from the organizational perspective, we contributed an article to the EuropeanaTech Insight special issue on newspapers. You can find more details below:

Paper: https://pro.europeana.eu/page/issue-16-newspapers

Newspaper Navigator and the Ladino Press

I have also applied Newspaper Navigator to study visual content embedded within the Ladino press, as digitized by the University of Washington's Sephardic Studies Digital Library. I have presented this work at "#DHJewish - Jewish Studies in the Digital Age". The DOI for the abstract, as well as the video recording of our presentation, can be found here. I have also written a chapter for the edited volume Jewish Studies in the Digital Age, which was published with De Gruyter Press as part of the Studies in Digital History and Hermeneutics Series. This work is supported by the Richard Willner Memorial Fellowship as part of the Stroum Center for Jewish Studies' Graduate Fellowship program. More information on this project can be found in a blog post that I wrote for the Stroum Center for Jewish Studies.

Jewish Studies in the Digital Age Book Chapter: https://doi.org/10.1515/9783110744828-010

Newspaper Navigator and the Visual Layouts of Ethnic Periodicals

With periodicals scholars Jim Casey, Sarah Salter, and Joshua Ortiz Baco, I am studying the evolution of visual layouts of ethnic periodicals within Chronicling America. Using the Newspaper Navigator dataset, it is possible to directly quantify the similarity of layouts across millions of newspaper pages, enabling us not only to trace the technological developments of printing presses but also to uncover the hidden editorial practices embedded within layouts themselves. For example, we have identified clusters of newspaper titles with similar visual layouts, such as networks of African-American titles that feature illustrations and photographs of members of their communities in portrait poses in the center of their front pages. The editors’ choice of a shared visual grammar speaks to the ways in which visual culture figured prominently in editorial practices. Our first paper detailing this collaboration was accepted at the Computational Humanities Research (CHR) 2021 conference. You can find more details below:

First Paper: http://ceur-ws.org/Vol-2989/short_paper3.pdf

Our second paper has appeared in the journal Criticism in the special issue: "New Approaches to Critical Bibliography and the Material Text."

Second Paper: http://ceur-ws.org/Vol-2989/short_paper3.pdf
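
As an illustration of what quantifying layout similarity can mean in practice, here is a minimal sketch that rasterizes each page's bounding boxes onto a coarse grid and compares two pages by cosine similarity. It assumes boxes are given as (x1, y1, x2, y2) normalized to the range 0 to 1 relative to the page; the grid representation and the helper names `layout_vector` and `layout_similarity` are assumptions made for this example, not the method used in the papers above.

```python
import numpy as np

def layout_vector(boxes, grid=8):
    """Mark every grid cell touched by a box; return the flattened occupancy map."""
    canvas = np.zeros((grid, grid))
    for x1, y1, x2, y2 in boxes:
        c1, c2 = int(np.floor(x1 * grid)), int(np.ceil(x2 * grid))
        r1, r2 = int(np.floor(y1 * grid)), int(np.ceil(y2 * grid))
        canvas[r1:r2, c1:c2] = 1.0
    return canvas.ravel()

def layout_similarity(boxes_a, boxes_b, grid=8):
    """Cosine similarity between two pages' occupancy maps (0.0 if either is empty)."""
    a, b = layout_vector(boxes_a, grid), layout_vector(boxes_b, grid)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Hypothetical front pages: two with a centered portrait, one with a corner item.
centered_portrait = [(0.35, 0.25, 0.65, 0.60)]
similar_portrait = [(0.33, 0.27, 0.66, 0.62)]
corner_item = [(0.0, 0.8, 0.25, 1.0)]

sim_close = layout_similarity(centered_portrait, similar_portrait)
sim_far = layout_similarity(centered_portrait, corner_item)
```

Pages whose visual content occupies the same regions score near 1, while pages with no overlapping regions score 0; clustering titles on such scores is one simple way to surface shared layout conventions.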

Newspaper Navigator in the Classroom

To document the ways in which Newspaper Navigator and machine learning writ large can play a role in social studies education, Ilene Berson, Michael Berson, and I wrote an article for Social Education. You can find more details below:

Paper: https://www.socialstudies.org/social-education/85/2/machine-learning-and-social-studies

Newspaper Navigator and Layout Parser

The Newspaper Navigator visual content recognition model is now part of Layout Parser's Model Zoo! Layout Parser is a unified toolkit for Deep Learning Based Document Image Analysis. You can pip install the library and process newspaper pages in a few lines of code. Our paper on Layout Parser, led by Zejiang Shen and Melissa Dell, was presented at ICDAR 2021. You can find more details below.

Paper: https://arxiv.org/abs/2103.15348
Preview Video: https://www.youtube.com/watch?v=zmr9NOYPKHo
Presentation Video: https://www.youtube.com/watch?v=ASe4X7fSRa4
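
As a rough illustration of processing a newspaper page in a few lines of code, here is a minimal sketch using Layout Parser's Detectron2 backend. It assumes that `layoutparser`, Detectron2, Pillow, and NumPy are installed (Detectron2 typically requires a separate install step), and that the model-zoo config path and label map below match the current Layout Parser catalog; please check the model zoo documentation before relying on them. The function name `detect_visual_content` and the 0.5 score threshold are hypothetical choices for this example.

```python
# Assumed model-zoo entry and label map for the Newspaper Navigator model;
# verify both against the Layout Parser model zoo.
MODEL_CONFIG = "lp://NewspaperNavigator/faster_rcnn_R_50_FPN_3x/config"
NEWSPAPER_NAVIGATOR_LABELS = {
    0: "Photograph",
    1: "Illustration",
    2: "Map",
    3: "Comics/Cartoon",
    4: "Editorial Cartoon",
    5: "Headline",
    6: "Advertisement",
}

def detect_visual_content(image_path, score_threshold=0.5):
    """Return (label, score, (x1, y1, x2, y2)) for each detected region."""
    # Imported lazily so this module can be loaded without the heavy dependencies.
    import numpy as np
    import layoutparser as lp
    from PIL import Image

    image = np.asarray(Image.open(image_path).convert("RGB"))
    model = lp.Detectron2LayoutModel(
        MODEL_CONFIG,
        extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", score_threshold],
        label_map=NEWSPAPER_NAVIGATOR_LABELS,
    )
    layout = model.detect(image)  # downloads the weights on first use
    return [(block.type, block.score, block.coordinates) for block in layout]
```

Calling `detect_visual_content("page.jpg")` on a scanned page (the file name is a placeholder) would return one tuple per detected photograph, headline, advertisement, and so on.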

Press

The Librarians of the Future Will Be AI Archivists
Popular Mechanics (Courtney Linder, 05/13/2020)

An Archive Unearthed
The Batch (Nick Stockton, 05/13/2020)

Millions of historic newspaper images get the machine learning treatment at the Library of Congress
TechCrunch (Devin Coldewey, 05/07/2020)

Machine Learning: The Library of Congress “Newspaper Navigator” Dataset is Now Available; Over 16 Million Pages From “Chronicling America” Processed
InfoDocket (Gary Price, 05/07/2020)

U.S. Library of Congress Processes over 16 Million Historic Newspaper Pages Using AI
NVIDIA Developer News Center (05/06/2020)

Library of Congress Innovator in Residence Ben Lee Discusses His Newspaper Navigator Project That Uses Machine Learning to Extract Visual Content From Chronicling America & Announces Upcoming “Data Jam” to Preview Dataset
InfoDocket (Gary Price, 04/21/2020)

Ph.D. student Benjamin Lee named Library of Congress Innovator in Residence
Allen School News (11/25/2019)

Hip Hop and Human-Computer Interaction Focus of 2020 Innovators in Residence
The Library of Congress (11/18/19)

Interviews

Preserving Sephardic History through Interdisciplinary Collaboration: An Interview with Makena Mezistrano and Ben Lee
EuropeNow (Taylor Soja, 04/17/22)

Navigating Collections of Digitised Historical Newspapers: A Conversation with Ben Lee
NewsEye Blog (Amanda Maunoury, 10/05/21)

Ladino Newspapers Are the New Wave in “Uncharted Waters” of Digital History
The Stroum Center for Jewish Studies at the University of Washington (Hannah Pressman, 12/01/20)

Chronicling America and Navigating Newspapers
From Our Corner: Washington Secretary of State Blog (06/17/2020)

Blog Posts from The Signal and the NEH Blog

Newspaper Navigator Search Application Now Live!
The Signal (Eileen Jakeway, 09/21/20)

Reimagining Searching in Chronicling America
The NEH Blog (Joshua Ortiz Baco, 07/17/20)

Innovator Ben Lee and LC Labs Host “Data Jam” with 100 Million Historic Newspaper Images
The Signal (Leah Weinryb-Grohsgal, 04/21/20)

Newspaper Navigator Surfaces Treasure Trove of Historic Images – Get a Sneak Peek at Upcoming Data Jam!
The Signal (Eileen Jakeway, 04/21/20)

Introducing Ben and Brian, the Library’s new Innovators in Residence!
The Signal (Eileen Jakeway, 11/18/19)

Reviews

06/23/2022
Reviews in DH: A Review of Newspaper Navigator
By Lorella Viola

Selected Events & Talks

*for a more comprehensive list of talks that I've given on Newspaper Navigator, please see my talks page*

10/31/2023
Deutsche Nationalbibliothek
Newspaper Portals Meet DH Conference
Panel with Torsten Roeder
"Visual Content: Reimagining Digitized Newspapers with Machine Learning" (virtual)

09/08/2023
Virginia Tech
Computer Science Seminar Series
"Reimagining Search and Discovery for Digital Collections with Machine Learning"

06/08/2023
University College London
Sloane Lab Symposium Series: (Re)connecting Heritage Collections as Data, Infrastructure, and Participatory Engagement: Big Dreams, Big Challenges
"Reimagining Search and Discovery for Digital Collections with Machine Learning" (virtual)

04/21/2023
Mellon-Rare Book School Society of Fellows in Critical Bibliography
Lecture Series
"Preserving and Analyzing Digital Texts" (virtual)
With James Hodges, Ryan Cordell, and Emily Maemura
Event description available here, recording available here

03/21/2023
United States Naval Academy
"Data Science for Jewish Studies" Guest Lecture (virtual)

02/21/2023
University of Washington
Information School Seminar
"Reimagining Search and Discovery for Digital Collections with Machine Learning"

01/20/2023
European Society for Periodical Research
Seminar
"New Computational Approaches to Periodical Studies" (virtual)
With Thomas Smits and Kaspar Beelen
Event description available here, recording available here

01/18/2023
Humboldt University of Berlin
Digital History Research Colloquium
"Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning" (virtual)

01/05/2023
American Historical Association 2023
Panel
"Computing on Cultural Heritage: Reports from an LC Labs Experiment"
With Benjamin Schmidt, Meghan Ferriter, Jessica Mack, Lauren Tilton, and Taylor Arnold

10/12/2022
Digital Library Federation Forum 2022
"Newspaper Navigator: Hosting the Dataset and Deploying the Search Application" (virtual)
With Chris Adams

07/27/2022
Digital Humanities 2022
Long Presentation
"Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning" (virtual)

06/04/2022
German Historical Institute Washington
Fifth Annual GHI Conference on Digital Humanities and Digital History
Datafication in the Historical Humanities
"Compounded Mediation: Excavating the Newspaper Navigator Dataset"

05/19/2022
Research Society for American Periodicals 2022
Symposium
"New Directions for Interdisciplinary Collaborations in Periodical Studies" (virtual)
Event description available here

05/18/2022
DH Unbound 2022
"A Computational Periodicals Unconference: Exploring New Opportunities for Critical and Collaborative Inquiry" (virtual)
(with Sarah Salter, Jim Casey, and Joshua Ortiz Baco)

04/11/2022
2022 UCLadino Conference
"The Digital Humanities and the Ladino Press: Unlocking Historic Ladino Newspapers with & Machine Learning" (virtual)
Recording available here

03/15/2022
Fantastic Futures 2021
"Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning" (virtual)

03/14/2022
Designing Storage Architectures for Digital Collections 2022
Library of Congress
"Newspaper Navigator: Hosting the Dataset and Deploying the Search Application" (with Chris Adams, virtual)
Event description available here; slides available here

02/23/2022
University of Wrocław
Workshop: Studying Advertisements in pre-1939 Jewish Press: Methods and Challenges
"Using Machine Learning to Extract and Analyze Advertisements in Historic Ladino Newspapers, 1890-1948"

02/22/2022
Princeton University
Center for Digital Humanities
"Novel Machine Learning Methods for Computing Cultural Heritage: An Interdisciplinary Approach" (virtual)
Event description available here

02/02/2022
University of Illinois Urbana-Champaign
Digital Humanities + iSchool (DHIS) Collective Meeting
"Compounded Mediation: A Data Archaeology of the Newspaper Navigator Dataset" (virtual)
Event description available here

01/28/2022
The University of Washington
Computer Science & Engineering HCI Seminar
"Newspaper Navigator: Open Faceted Search for 1.5 Million Images" (virtual)

11/10/2021
Texas A&M University-Corpus Christi
Honors Program Speaker Series
"Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning" (virtual)

11/09/2021
University of London School of Advanced Study
Institute of Historical Research
Digital History Seminar Series
"Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning" (virtual)
Recording available here

10/06/2021
Consortium for the History of Science, Technology, and Medicine
Digital History of Science Working Group
"Excavating the Newspaper Navigator Dataset from a Critical Data Studies Perspective" (virtual)

06/25/2021
Association for Documentary Editing
Annual Meeting
"Cowboys, Computers, and Cartoons: Excavating and Explicating America’s Political Cartoons"
A Panel Session with Clay Jenkinson and Sharon Kilzer (virtual)

05/29/2021
The University of Washington
The Stroum Center for Jewish Studies
Colloquium
"Sephardic Experiences of Modernity: Newspapers, Migrants and Midwives" (virtual)
A Panel Session with Busra Demirkol, Oya Aktas, and Oscar Aguirre-Mandujano (respondent)
Recording available here

03/24/2021
Harvard University
Discovery Series
"Newspaper Navigator: Re-Imagining Digitized Newspapers with Machine Learning" (virtual)
Sponsored by the Cabot Science Library and the Harvard University Digital Scholarship Group

03/13/2021
The International NewsEye Conference
"From Chronicling America to Newspaper Navigator: Improving Access to Historic Newspaper Photos at the Library of Congress through Machine Learning"
(with Nathan Yarasavage)
Recording available here; slides available here

01/13/2021
#DHJewish - Jewish Studies in the Digital Age
"The Digital Humanities and the Ladino Press: Using Machine Learning to Extract and Analyze Visual Content in Historic Ladino Newspapers" (virtual)
(with Devin Naar)
Abstract DOI & presentation recording available here

12/09/2020
Association of College & Research Libraries
ULS Technology in University Libraries Committee Tech Forum
"Newspaper Navigator: Re-imagining Library Search and Discovery with Machine Learning" (virtual)
Recording available here

11/12/2020
The University of Washington
Stroum Center for Jewish Studies
"Teaching Computers to Read Ladino" (virtual)
Recording available here

10/30/2020
The Johns Hopkins University
Digital History Workshop
"Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning" (virtual)

10/09/2020
Drexel University
History Department
"Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning" (virtual)

09/18/2020
Duke University
Data Dialogue Series
"Newspaper Navigator: Reimagining Historic Newspapers with Machine Learning" (virtual)

09/15/2020
National Endowment for the Humanities and the Library of Congress
NEH Division of Preservation and Access and LOC Serial and Government Publications Division
"Seeing Editors: Metadata, Machine Learning, and the Shapes of Social Justice" (virtual)
A panel with Jim Casey, Sarah Salter, and Joshua Ortiz Baco

09/11/2020
The Alan Turing Institute
Computer Vision for Digital Heritage
"Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning" (virtual)

07/12/2020
The Alan Turing Institute & British Library
Living with Machines Group
"Newspaper Navigator: An Introduction and Demo" (virtual)

05/15/2020
Princeton University
Princeton Center for Digital Humanities & Princeton University Library
2019-20 Collections as Data Discussion Series
"Newspaper Navigator: Reimagining Digitized Newspapers with Machine Learning" (virtual)
Recording available here

05/07/2020
Library of Congress
Newspaper Navigator Data Jam
Public Event
View a recording of the data jam here
Read more about the data jam here

Newspaper Navigator around the Web

Innovations with Digitized Newspapers
2022-2023 Library of Congress Albert Einstein Distinguished Educator Fellow Jacqueline Katz's talk describing uses of Newspaper Navigator in the classroom!

nnanno: A Collection of Tools for working with the Newspaper Navigator Dataset
Daniel van Strien's nnanno toolkit is an invaluable resource for sampling Newspaper Navigator data, adding additional labels, and training models to make new predictions using IIIF!

The Collective Wisdom Handbook: Perspectives on Crowdsourcing in Cultural Heritage
Newspaper Navigator has been featured as a case study in Chapter 10: Working with Crowdsourcing Data of The Collective Wisdom Handbook! The authors of the volume are Mia Ridge, Samantha Blickhan, Meghan Ferriter, Austin Mast, Ben Brumfield, Brendon Wilkins, Daria Cybulska, Denise Burgher, Jim Casey, Kurt Luther, Michael Haley Goldman, Nick White, Pip Willcox, Sara Carlstead Brumfield, Sonya J. Coleman, and Ylva Berglund Prytz.

Working with Maps at Scale Using Computer Vision and Jupyter Notebooks
Delivered by Daniel van Strien as part of the Digital Humanities and Digital Archives workshop at the National Library of Estonia
Daniel's dataset derived from Newspaper Navigator can be found here

The Impact of Artificial Intelligence on Genealogy
Elevenses with Lisa: A Genealogy Show (Episode 32)
by Lisa Louise Cooke
Show notes here

How to Use Chronicling America's Newspaper Navigator to Find Photos and Images
Elevenses with Lisa: A Genealogy Show (Episode 26)
by Lisa Louise Cooke
Show notes here

The Newspaper Navigator Search App, an Educator's View
The Primary Source Podcast (Season 1, Episode 3)
by Tom Bober

Genealogy Quick Start (September 24, 2020 episode)
by Shamele Jordon

Machine Learning + Libraries: A Report on the State of the Field
by Ryan Cordell

Sanborn Maps Navigator (with Newspaper Navigator data as well!)
by Selena Qian (learn more about the project in Selena's 2020 Junior Fellow presentation!)

Mega Mideast Map Collage via Nick Danforth at The Afternoon Map

Twitter

Check out tweets about Newspaper Navigator and artifacts from the Newspaper Navigator Data Jam by searching Twitter for #NewspaperNavigator!