Metadata Accelerator: Improving scientific data descriptions with Natural Language Processing methods (NLP) and Instant Feedback

Authors

  • Maria Juliana Rodriguez Cubillos, University of Edinburgh
  • Andrew J. Millar, University of Edinburgh
  • Ian Simpson, University of Edinburgh
  • Jason Swedlow, University of Edinburgh
  • Tomasz Zieliński, University of Edinburgh

DOI:

https://doi.org/10.2218/eor.2024.9660

Abstract

Promoting data availability and accessibility is a foundational principle of FAIR data guidance. However, better metadata is needed to ensure that knowledge is disseminated, which highlights the vital role of documenting research studies.

Aim: To develop an AI metadata enrichment tool that focuses on named entities within unstructured textual data. My strategic goal is to use text mining, machine learning, and NLP models such as GPT and BERT to offer feedback on free-text descriptions, improving metadata quality and dataset reusability.

Published

02-Jul-2024