This blog post was written by Sunny Aggarwal, a Python Developer on the Open Concept Lab (OCL) team. He has been a core contributor to OCL for many years and currently leads the OCL software tools enhancement work.
In this blog post, we will explore the recent improvements made to the Open Concept Lab (OCL) search, an open-source terminology service developed using React, Python, Django, Postgres, and ElasticSearch (ES). Our aim is to provide users with a clearer understanding of the enhancements and their impact on the search functionality. We will discuss the motivation behind these improvements, the new features introduced, the approach taken, and the key benefits that users can expect from the upgraded OCL search.
The key improvements in the OCL search have addressed various issues to create a more intuitive and efficient search experience. These include:
- Optimized ES matching: OCL now provides much better search results by reconfiguring how ES takes advantage of the different query types available. For example, exact matches to the beginning of names are now scored higher, offering more relevant results to users. ES query types were implemented as follows:
  - “term” returns results that contain the exact value in a provided field. These matches are scored highest.
  - “prefix” returns results that contain a specific prefix in a provided field.
  - “match_phrase” returns results that contain the words of the provided text, in the same order, anywhere in the field.
  - “match” returns results that match individual words of the provided text (word matching).
- Unified search: OCL automatically performs exact, fuzzy and wildcard searches with each query and intelligently combines them into a single ranked list of results.
- Refined search results ranking: The ranking of search results was tuned so that more relevant results appear at the top.
- Match highlighting: The TermBrowser now highlights the matching text in each search result.
- Map Codes Search: OCL now supports searching by concepts’ mapped codes.
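To make the combination of query types more concrete, here is a minimal sketch of how the four ES query types could be placed in a single “bool” query so that stricter matches score higher. It is not OCL's actual query: the function name, the “name” field, the “name.keyword” sub-field, and the boost values are all assumptions chosen for illustration. Only the fact that “term” scores highest comes from the list above.

```python
def build_ranked_query(text, field="name"):
    """Combine four Elasticsearch query types, boosting stricter matches more.

    The field names and boost values are illustrative assumptions,
    not OCL's real configuration.
    """
    # Hypothetical un-analyzed sub-field used for exact and prefix matching.
    keyword_field = f"{field}.keyword"
    return {
        "query": {
            "bool": {
                # Any one clause is enough to match; each clause that
                # matches contributes to the score according to its boost.
                "should": [
                    {"term": {keyword_field: {"value": text, "boost": 4.0}}},
                    {"prefix": {keyword_field: {"value": text, "boost": 3.0}}},
                    {"match_phrase": {field: {"query": text, "boost": 2.0}}},
                    {"match": {field: {"query": text, "boost": 1.0}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(build_ranked_query("blood test"), indent=2))
```

The resulting dictionary could be sent as the body of an ES search request. Because every clause sits under “should”, a document that matches several clauses at once (for example both “term” and “match”) accumulates a higher score than one that only matches on individual words.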
New Features of OCL Search
The recent updates to the OCL search have introduced several new features to enhance its effectiveness and efficiency. Notable improvements include:
- Removal of “exact_match” control: OCL search now automatically includes exact matches and scores them appropriately. Users no longer need to specify this parameter separately.
- Introduction of query types: OCL search now supports “term”, “prefix”, “match_phrase” and “match” queries in addition to exact word and wildcard matches. This expands the range of search options for users. For example, given the search text “blood test”:
  - “term” – would match only the exact value “blood test”
  - “prefix” – would match anything that starts with “blood test”
    - “blood test result”
    - “blood test report”
  - “match_phrase” – would match anything that has “blood test” anywhere in the text
    - “specific blood test report”
    - “Malaria blood test”
  - “match” – would match the individual words “blood” and “test” anywhere in the text
- Inclusion of “fuzzy” matches: Fuzzy matches have been added to the search algorithm, providing users with relevant results even when there are minor spelling variations or typographical errors in the search query.
- Improved “wildcard” search: Wildcard search now applies the wildcard (*) only at the end of the search string. Wildcard matches are complemented by fuzzy matches and are scored higher than them. For example, the search text “blood” would match “blood test”, “bloodresult” or “bloodwork”, but not “Lifeblood” or “Spillblood”.
- New Search Metadata: OCL now returns additional search metadata with each result in the API. This includes “search_score” (ES _score), “search_confidence” (a percentile based on search_score), and “search_highlight” (highlighted fields and texts that matched the search criteria). This information assists API users in assessing the relevance and strength of their search results.
- Concept search matching with mapped codes: OCL search now matches with mapped codes (target concept IDs) as well. It scores “SAME-AS” mapped codes higher than other mapped codes to improve result relevance.
- Customizable search behavior: While “exact_match” has been removed, new controls allow users to change the default behavior of OCL search. Query parameters such as “searchMapCode,” “excludeWildcard,” and “excludeFuzzy” offer more flexibility in search options to API users.
  - searchMapCode applies only to “concepts” and enables search through mapped codes. It is true by default in the TermBrowser (TB), and API users can override it.
  - excludeWildcard applies to all resources and excludes only wildcard matches. It is false by default in TB.
  - excludeFuzzy applies to all resources and excludes only fuzzy matches. It is false by default in TB.
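As a rough illustration of how an API user might set these controls, the sketch below builds a search URL with the three query parameters. The parameter names “searchMapCode”, “excludeWildcard” and “excludeFuzzy” come from the list above. The base URL, the “q” parameter for the search text, the “true”/“false” value format, and the function name are assumptions made for this example, so please check them against the OCL API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed base URL of the OCL API, used here for illustration only.
BASE_URL = "https://api.openconceptlab.org"


def build_search_url(resource, text, search_map_code=None,
                     exclude_wildcard=None, exclude_fuzzy=None):
    """Build a search URL; options left as None fall back to server defaults."""
    params = {"q": text}  # "q" as the search-text parameter is an assumption
    # searchMapCode is only meaningful for concepts, so skip it elsewhere.
    if resource == "concepts" and search_map_code is not None:
        params["searchMapCode"] = str(search_map_code).lower()
    if exclude_wildcard is not None:
        params["excludeWildcard"] = str(exclude_wildcard).lower()
    if exclude_fuzzy is not None:
        params["excludeFuzzy"] = str(exclude_fuzzy).lower()
    return f"{BASE_URL}/{resource}/?{urlencode(params)}"


if __name__ == "__main__":
    # Search concepts without mapped codes and without fuzzy matches.
    print(build_search_url("concepts", "blood test",
                           search_map_code=False, exclude_fuzzy=True))
```

Leaving an option as None omits it from the URL, so the default behavior described above applies. Passing a value makes the override explicit in the request.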
This post marks the exciting Round 1 of our OCL Search Improvement journey. But we’re far from done! In Round 2, we’ll be exploring more advanced search capabilities, digging deeper into complex queries, and unveiling enhanced ways to refine and customize your searches. We’re eager to hear your thoughts and ideas to shape OCL’s search. Join us in the OCL Chat to contribute your ideas and be part of the next wave of improvements that will redefine the way you search in OCL.
Stay tuned for Round 2 and keep exploring the world of OCL!