Explorations in Data Innovations: Can Machine Learning Support Data Catalog Development? Final Report
About the Report
The Chief Evaluation Office of the U.S. Department of Labor (DOL CEO) is committed to using innovative tools to meet the Department’s research, evaluation, and data analytics needs. In December 2021, DOL CEO commissioned the Westat Insight and American Institutes for Research® (AIR®) study team to explore opportunities to use machine learning methods to automate the collection of labor-relevant data. Between May 2022 and December 2023, the study team worked with experts in machine learning, web scraping, and labor-related data to understand how DOL CEO could use machine learning approaches to automate data collection efforts. Specifically, the team explored options to use machine learning to create and maintain a public-facing, labor-related data catalog, which would serve as a use case for automated data collection in general.
This report describes lessons learned from this exploration. In general, the study team found that, at this time, machine learning methods may not be the best tools to create, populate, and update a labor-relevant data catalog.
Key Takeaways
- Insights from the pilot study and feedback from technical working group members suggest that DOL may not be able to automate the data catalog process at this time.
- Data sources vary widely in structure and in the metadata they make available, which makes an automation program difficult to develop. Applying automation to even portions of the data catalog development process would require a large investment in staff and computing resources.
- Artificial intelligence is a rapidly evolving field. Literature suggests there may be many opportunities to support automated data collection in the future, including the use of generative artificial intelligence (Cherradi et al., 2023; Yarlagadda, 2017).
- When using machine learning to produce public-facing products, federal agencies may need to use a mix of staff with different skill sets, including data scientists, website developers, experts in cloud computing, and subject matter experts.
Citation
Mills De La Rosa, S., Patterson, L., Miller, M., & Liu, A. (2024). Westat Insight and American Institutes for Research. Explorations in data innovations: Can machine learning support data catalog development? Chief Evaluation Office, U.S. Department of Labor.
The Department of Labor’s (DOL) Chief Evaluation Office (CEO) sponsors independent evaluations and research, primarily conducted by external, third-party contractors in accordance with the Department of Labor Evaluation Policy and CEO’s research development process.