Explorations in Data Innovations: Can Machine Learning Support Data Catalog Development? Final Report

Release Date: July 05, 2024

About the Report

The Chief Evaluation Office of the U.S. Department of Labor (DOL CEO) is committed to using innovative tools to meet the Department’s research, evaluation, and data analytics needs. In December 2021, DOL CEO commissioned the Westat Insight and American Institutes for Research® (AIR®) study team to explore opportunities to use machine learning methods to automate the collection of labor-relevant data. Between May 2022 and December 2023, the study team worked with experts in machine learning, web scraping, and labor-related data to understand how DOL CEO could apply machine learning approaches to automated data collection. Specifically, the team explored options to use machine learning to create and maintain a public-facing, labor-related data catalog, which would serve as a use case for automated data collection in general.

This report describes lessons learned from this exploration. In general, the study team found that, at this time, machine learning methods may not be the best tools to create, populate, and update a labor-relevant data catalog.

Key Takeaways

  • Insights from the pilot study and feedback from technical working group members suggest that DOL may not be able to automate the data catalog process at this time.
  • Data sources vary widely in structure and in the metadata they make available, which makes an automation program difficult to develop. Automating even portions of the data catalog development process would require a large investment in staff and computing resources.
  • Artificial intelligence is a rapidly evolving field. Literature suggests there may be many opportunities to support automated data collection in the future, including the use of generative artificial intelligence (Cherradi et al., 2023; Yarlagadda, 2017).
  • When using machine learning to produce public-facing products, federal agencies may need to use a mix of staff with different skill sets, including data scientists, website developers, experts in cloud computing, and subject matter experts.

Citation

Mills De La Rosa, S., Patterson, L., Miller, M., & Liu, A. (2024). Westat Insight and American Institutes for Research. Explorations in data innovations–Can machine learning support data catalog development? Chief Evaluation Office, U.S. Department of Labor.

The Department of Labor’s (DOL) Chief Evaluation Office (CEO) sponsors independent evaluations and research, primarily conducted by external, third-party contractors in accordance with the Department of Labor Evaluation Policy and CEO’s research development process.