Open Concept Lab (OCL) supports healthcare interoperability by providing tools for managing terminologies and standardized health concepts across diverse healthcare environments. Laboratory data represent one of the most complex and important domains for healthcare interoperability. A key factor in this work is mapping local laboratory data to LOINC (Logical Observation Identifiers Names and Codes), the global standard for laboratory tests and clinical observations. While LOINC enables consistent exchange of laboratory data across systems, accurate mapping remains a complex and labor-intensive process due to variations in naming conventions, incomplete metadata, abbreviations, and the scale and specificity of the terminology itself.
To help address these challenges, OCL collaborated with Parvati Naliyatthaliyazchayil and colleagues at Indiana University to implement and evaluate ScispaCy-LOINC, a novel biomedical NLP-based algorithm she developed to improve mapping performance for heterogeneous laboratory data. The algorithm leverages biomedical language models, identifies entities corresponding to the six LOINC axes, assembles candidate LOINC codes from the resulting LOINC parts, and applies a weighted ranking framework to prioritize the most likely matches. Through this collaboration, the algorithm was incorporated into OCL’s open-source mapping solution, the OCL Mapper, where it was evaluated alongside existing keyword and semantic matching approaches.
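The final step of that pipeline, weighted ranking over the six LOINC axes (Component, Property, Time, System, Scale, Method), can be illustrated with a minimal sketch. This is not the published implementation. The weights, the `TOY-*` candidate codes, the part values, and the function names are all hypothetical, and the entity extraction step (handled by biomedical language models in the real algorithm) is assumed to have already produced the `extracted` dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The six LOINC axes. The weights below are hypothetical placeholders,
# not the values used by ScispaCy-LOINC.
AXES = ("component", "property", "time", "system", "scale", "method")
WEIGHTS: Dict[str, float] = {
    "component": 0.40,
    "system": 0.20,
    "property": 0.15,
    "scale": 0.10,
    "method": 0.10,
    "time": 0.05,
}


@dataclass
class Candidate:
    """A candidate code and the part value it carries on each axis."""
    code: str                 # made-up identifier, not a real LOINC code
    parts: Dict[str, str]


def score(extracted: Dict[str, Optional[str]], candidate: Candidate) -> float:
    """Sum the weights of every axis where the extracted entity matches."""
    total = 0.0
    for axis in AXES:
        value = extracted.get(axis)
        if value is not None and candidate.parts.get(axis) == value:
            total += WEIGHTS[axis]
    return total


def rank(extracted: Dict[str, Optional[str]],
         candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates from highest to lowest weighted score."""
    return sorted(candidates, key=lambda c: score(extracted, c), reverse=True)


# Entities assumed to have been extracted from a terse local lab string.
extracted = {"component": "basophils", "system": "blood", "property": "ncnc"}

candidates = [
    Candidate("TOY-C", {"component": "eosinophils", "system": "blood",
                        "property": "ncnc"}),
    Candidate("TOY-B", {"component": "basophils", "system": "blood",
                        "property": "nfr"}),
    Candidate("TOY-A", {"component": "basophils", "system": "blood",
                        "property": "ncnc", "method": "automated count"}),
]

print([c.code for c in rank(extracted, candidates)])
# ['TOY-A', 'TOY-B', 'TOY-C']
```

Weighting the Component axis most heavily is a design assumption made here for illustration: a candidate with the wrong analyte is rarely useful, whereas a mismatch on Method or Time can still leave a plausible near match for a reviewer to inspect.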
In benchmarking on the MIMIC-IV laboratory dataset, ScispaCy-LOINC identified the correct LOINC code for 42.3% of concepts, compared with 21.4% for semantic matching and 19.5% for conventional keyword matching. On a second dataset with full-text descriptions, however, the picture reversed: semantic matching led at 54.4%, followed by keyword matching at 46.9% and ScispaCy-LOINC at 28.4%. The results suggest that biomedical NLP techniques may be particularly effective on sparse, abbreviated, or structurally noisy text, which is the norm in most point-of-care data sources, while semantic search approaches remain valuable for broader clinical concept matching.
This points to an important observation: different algorithms win on different inputs. For a terse lab string like “Absolute Basophil Count,” only ScispaCy-LOINC returned the exact LOINC code. Keyword search latched onto an unrelated term containing “absolute,” and semantic search found the right analyte but the wrong method. For higher-frequency, fully described terms, by contrast, the keyword and semantic algorithms often ranked the correct answer first. Terminology matching therefore works best when multiple algorithmic approaches are combined, with each contributing distinct candidates to a shared pool that is then ranked for the best match.
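One way to picture a shared candidate pool is reciprocal rank fusion (RRF), a generic technique for merging ranked lists. The sketch below is an illustration only: the source does not state how the OCL Mapper ranks its pooled candidates, so RRF is a stand-in chosen here, and the algorithm labels, the `TOY-*` codes, and the constant `k=60` (a conventional RRF default) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: Dict[str, List[str]], k: int = 60) -> List[str]:
    """Merge several ranked candidate lists into one pooled ranking.

    Each algorithm contributes 1 / (k + position) for every code it returns,
    so codes that appear near the top of several lists rise in the pool.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranked_codes in rankings.values():
        for position, code in enumerate(ranked_codes, start=1):
            scores[code] += 1.0 / (k + position)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda code: (-scores[code], code))


# Hypothetical output of three matchers for one local lab term.
rankings = {
    "keyword": ["TOY-X", "TOY-A"],
    "semantic": ["TOY-B", "TOY-A"],
    "nlp": ["TOY-A", "TOY-B"],
}

print(reciprocal_rank_fusion(rankings))
# ['TOY-A', 'TOY-B', 'TOY-X']
```

In this toy case, `TOY-A` is ranked first by only one matcher but is returned by all three, so it ends up at the top of the pool, while the candidate that only keyword search proposed falls to the bottom.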
The incorporation of ScispaCy-LOINC into OCL Mapper broadened the platform’s automated terminology mapping capabilities and provided users with an additional approach for matching complex laboratory concepts to LOINC.
This work ultimately resulted in a peer-reviewed publication in the Journal of the American Medical Informatics Association (https://doi.org/10.1093/jamia/ocag010), describing the methodology, implementation, and evaluation of the approach within the OCL ecosystem. ScispaCy-LOINC is available in the OCL Mapper in OCL Online as an Early Access Program feature.
Read the full article in JAMIA and sign up for the OCL Early Access Program to get started with automated terminology mapping today.
—
Read the paper: Parvati Naliyatthaliyazchayil, Venkat Ramana Sangam, Joseph Amlung, Andrew S. Kanter, Saptarshi Purkayastha, Jonathan Payne. “Automated Logical Observation Identifiers Names and Codes mapping with biomedical natural language processing models: enabling scalable health information exchange via the Open Concept Lab.” *Journal of the American Medical Informatics Association*, 2026. https://doi.org/10.1093/jamia/ocag010