Innovation, the availability and usage of novel products and business practices, is central to improving living standards. Policymakers rely in part on survey-based measures of innovation to design, develop, and implement policies that promote it. In the U.S., the National Center for Science and Engineering Statistics (NCSES) measures innovation through nationally representative surveys of businesses, such as the Annual Business Survey (ABS). To reduce respondent fatigue and provide more timely information, statistical organizations are exploring non-traditional methods of measuring innovation to supplement existing data.
This technical report documents research demonstrating how a large corpus of opportunity data, in particular news articles, can be combined with advanced natural language processing methods to identify and measure innovation in three sectors (food and beverage, pharmaceutical, and computer software). We present a novel approach using the Bidirectional Encoder Representations from Transformers (BERT) language model developed by Google. Our methods include (i) text classification to identify news articles that mention innovation, (ii) named-entity recognition (NER), (iii) question answering (QA) to extract company names, and (iv) construction of yearly innovation indicators for companies in these sectors.
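As a rough illustration of step (iv) only, the sketch below counts, per company and year, the articles that a classifier has flagged as mentioning innovation. The record layout, the company names, and the function `yearly_innovation_counts` are hypothetical and chosen for illustration; the report's actual indicator definition may differ, and the classification and extraction steps (i) to (iii) are assumed to have already run.

```python
from collections import Counter

# Hypothetical records, assumed to be the output of steps (i)-(iii):
# (extracted company name, publication year, classifier flag).
articles = [
    ("Acme Foods", 2015, True),
    ("Acme Foods", 2015, False),   # article not flagged as innovation
    ("Acme Foods", 2016, True),
    ("Beta Pharma", 2015, True),
    ("Beta Pharma", 2015, True),
]


def yearly_innovation_counts(records):
    """Count innovation-flagged articles per (company, year) pair."""
    counts = Counter()
    for company, year, mentions_innovation in records:
        if mentions_innovation:
            counts[(company, year)] += 1
    return dict(counts)


indicators = yearly_innovation_counts(articles)
for (company, year), n in sorted(indicators.items()):
    print(company, year, n)
```

A raw article count is the simplest possible indicator; normalizing by the total number of articles per company and year would be one way to reduce the effect of uneven news coverage.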