Recently, someone on Quora asked: How would you start implementing a vertical semantic search engine? We are often asked how we implemented our biomedical discovery engine, and not everyone reads Quora, so I thought I would re-post my response here. If you have further questions, please contact us.
Since we have implemented a vertical semantic search engine (in our case, for the biomedical and life sciences), I can tell you what we did and how we ended up with an Artificial Intelligence (AI)-powered discovery solution.
To begin, we took some time to really think through what we wanted to accomplish; that is, what limitations of traditional searching needed to be overcome to support actual users. This was critical at multiple levels. First, although we are domain experts in the biomedical and life sciences, we knew that building a solution “we liked” was not necessarily the same as building one that “others needed”. So, talk to your user population, run focus groups, and be sure your goals are aligned with theirs. Second, we did a major evaluation of existing approaches, and came to realize that adding more bells and whistles onto those methods did not accomplish what we wanted. That evaluation included testing methods on a live site with real users, so that we could maintain focus on what they needed. Third, we considered what content was critical and, for our purposes, authoritative. This set us on a path where flexibility (with regard to format) and responsiveness (to evolving user needs) were built in.
With that background, let me get into the “semantics” of our vertical search engine. The reason I put semantics in quotes is that the word means different things to different people. And, unfortunately, in some cases semantics is just a marketing term where the search has no semantics at all. (And, yes, the same thing is already happening with AI.) To guide the choice of technology, we considered a range of goals for the backbone of our system:
- Running without human curation
- Being informed by, but not limited by, an ontology
- Disambiguating terms (a particular problem in the biomedical field where gene names are often English words)
- Providing concept-based searching
- Having the ability to answer questions, not just find documents
- Moving from list-based results to visuals that are easier to navigate
- Incorporating visual analytics that were predictive and descriptive
The key issue, though, is that accomplishing all of this required more capabilities than traditional semantic analysis or natural language processing alone could provide. So, we combined these technologies with several layers of artificial intelligence. It was also important to have a platform that, while taking advantage of machine learning, was more practical than pure deep learning, which requires extensive training set development. We currently use quite a mix of methods, but they combine nicely to solve the problems we identified and to provide a platform for future needs.
Not all of the above capabilities were in place at the outset, of course; they were folded over time into a flexible foundation that we established about 5 years ago. We now have a biomedical/life science discovery solution (see), used around the globe, with unique capabilities including automatic discovery of concepts specifically related to the user's query, an inference engine to distinguish positive and negative associations, concept trend prediction, and much more.
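To make the term disambiguation goal from the list above a little more concrete, here is a toy sketch in Python. It is not how our engine works; it only illustrates the problem of gene symbols that are also English words. The cue words, the symbol list, the window size, and the function names are all assumptions made up for this example, and a real system would draw on an ontology and far richer context.

```python
import re

# Hypothetical cue words that hint at a biomedical context. A real system
# would take these from an ontology or learn them from data.
GENE_CONTEXT_CUES = {
    "gene", "protein", "expression", "mutation", "enzyme", "encodes", "knockout",
}

# Illustrative gene symbols that are also ordinary English words.
AMBIGUOUS_SYMBOLS = {"cat", "was", "rest"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())


def gene_mentions(text, window=3, min_cues=1):
    """Return (token_index, token) pairs for ambiguous symbols that appear
    near at least `min_cues` biomedical cue words."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok not in AMBIGUOUS_SYMBOLS:
            continue
        # Look at up to `window` tokens on each side of the candidate.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        cues = sum(1 for word in context if word in GENE_CONTEXT_CUES)
        if cues >= min_cues:
            hits.append((i, tok))
    return hits


print(gene_mentions("The CAT gene encodes catalase"))
print(gene_mentions("The cat sat on the mat"))
```

The first sentence is flagged as a gene mention because "gene" and "encodes" sit next to "CAT"; the second is not, because nothing nearby suggests a biomedical context. Counting nearby cue words is about the simplest approach that could work, which is exactly why it falls short in practice and why we ended up layering several methods.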