We also provide an advanced search API for any indexed dataset. Like other search offerings, our API supports exact-match rules for text and SQL-based filtering for metadata. However, our search algorithm also benefits from the structure and thematic content inferred by the Sturdy Statistics text analysis algorithm. Our search provides excellent results out of the box, without any added work required to fine-tune embeddings or to train a second ranking model.
Under the hood, our semantic ranking works differently than the more common semantic search based on neural embeddings. Returning to our dataset of tech company earnings calls, let’s say we submit a search query for the war in Ukraine. Our model maps this unstructured search query into our semantic topic space, identifying the following topics connected to the query:
- Global Economic Challenges,
- Foreign Exchange Impact,
- Navigating Economic Uncertainty,
- Inflation and Energy Costs, and
- Supply Chain Disruptions
Even on its own, this result gives insight into how American tech executives see the Ukraine war affecting their business. Our search algorithm then compares this thematic content to our already structured set of documents, paragraphs, sentences, and words in order to return the optimal semantic ranking. This structure also enables us to extract the paragraphs, sentences, or even words which are most pertinent to the search query. In the case of our example dataset, the highest-ranking sentences are:
AMZN 2023Q1: I would say I think for any leadership team, each era is different, and it’s often meaningfully impacted by what’s happening around you. And I think that if you look at the last couple of years with things like the pandemic and the labor shortage in 2021 and the war in Ukraine and inflation and uncertain economy, good leadership teams look around and try to figure out what that means and how they should adjust their businesses.
META 2023Q3: In terms of the Q2 revenue acceleration, I’d highlight there are few factors driving that. The first is, frankly we’re lapping a weaker demand period, including the first full quarter of the war in Ukraine and the suspension of our services in Russia.
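As a rough illustration of the idea of ranking in topic space (not the actual Sturdy Statistics implementation), the sketch below represents a query and a few sentences as weight vectors over a small set of topics and ranks the sentences by cosine similarity to the query. The topic list, all weights, the sentence texts, and the helper names `cosine`, `top_topics`, and `rank_by_topic` are assumptions made up for this example; in a real system the weights would be inferred by a trained topic model.

```python
import math

# Hypothetical topic labels; the last one is added only to give a contrast.
TOPICS = [
    "Global Economic Challenges",
    "Foreign Exchange Impact",
    "Supply Chain Disruptions",
    "Cloud Growth",
]


def cosine(a, b):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_topics(weights, k=2):
    """Return the names of the k most heavily weighted topics."""
    ranked = sorted(zip(TOPICS, weights), key=lambda tw: tw[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def rank_by_topic(query_weights, sentences):
    """Order (text, topic_weights) pairs by similarity to the query."""
    scored = [(cosine(query_weights, w), text) for text, w in sentences]
    scored.sort(key=lambda st: st[0], reverse=True)
    return [text for _, text in scored]


# Made-up weights standing in for a topic model's output.
query_weights = [0.5, 0.3, 0.2, 0.0]
sentences = [
    ("cloud revenue grew", [0.0, 0.1, 0.0, 0.9]),
    ("war and inflation weighed on demand", [0.6, 0.2, 0.2, 0.0]),
    ("currency headwinds hurt results", [0.2, 0.7, 0.1, 0.0]),
]

print(top_topics(query_weights))
print(rank_by_topic(query_weights, sentences))
```

Because the comparison happens in topic space, a sentence can rank highly without sharing any words with the query, as long as its thematic content is similar.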
When the search term is common, our combination of exact-match scoring with semantic ranking automatically produces a custom two-phase ranking algorithm. With common keywords, the exact-match scoring produces a large set of candidate documents; our semantic scoring then re-ranks the results according to the semantic content in these documents. Typically, two-phase ranking requires either:
- An out-of-the-box solution that is not optimized for any specific application, or
- Extremely labor-intensive fine-tuning and maintenance on a custom dataset.
Because each Sturdy Statistics index is trained exclusively on your data, this second-phase ranking mechanism is automatically optimized for your particular application: there are no global, hard-coded rules expressing someone else’s assumptions about your data, and no manual tuning or intervention is required on your part.
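The general shape of two-phase ranking can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Sturdy Statistics algorithm: the document records, their two-element topic vectors, and the functions `exact_match`, `overlap`, and `two_phase_search` are all hypothetical, and the similarity measure (sum of element-wise minima) is just one simple choice.

```python
def exact_match(query, docs):
    """Phase 1: keep documents that contain every query term as a whole word."""
    terms = query.lower().split()
    return [
        d for d in docs
        if all(t in d["text"].lower().split() for t in terms)
    ]


def overlap(a, b):
    """Similarity of two topic distributions: sum of element-wise minima."""
    return sum(min(x, y) for x, y in zip(a, b))


def two_phase_search(query, query_topics, docs, limit=10):
    """Phase 2: re-rank the exact-match candidates by topic similarity."""
    candidates = exact_match(query, docs)
    reranked = sorted(
        candidates,
        key=lambda d: overlap(query_topics, d["topics"]),
        reverse=True,
    )
    return [d["id"] for d in reranked[:limit]]


# Made-up documents; "topics" stands in for a per-document topic distribution.
docs = [
    {"id": "a", "text": "Inflation raised our energy costs", "topics": [0.8, 0.2]},
    {"id": "b", "text": "Inflation was mentioned briefly before the product demo", "topics": [0.1, 0.9]},
    {"id": "c", "text": "Supply chains recovered", "topics": [0.7, 0.3]},
]

print(two_phase_search("inflation", [0.9, 0.1], docs))
```

Both documents "a" and "b" pass the exact-match phase, but the second phase places "a" first because its thematic content is closer to the query's.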
Our search algorithm also provides a tremendous advantage when the search term is rare. With rare keywords, our topic search operates like a fuzzy match engine: even if no documents match the query exactly, our engine surfaces examples based on the query’s semantic mapping. This fuzzy matching enables users to enjoy advanced semantic ranking, even when they have drilled down deep into their datasets with any number of filters. With other search algorithms, restrictive filters limit the dataset too much and prevent the ranking from working properly.
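One way to picture this fallback behavior is the toy sketch below. It is an assumption-laden illustration rather than the actual engine: the `search` function, the `ticker` field, the document texts, and the topic weights are all invented for the example. When no document in the filtered pool contains the keyword, the whole pool is ranked by topic similarity instead of returning nothing.

```python
def search(query, query_topics, docs, metadata_filter=None):
    """Rank filtered documents by topic similarity, preferring exact matches."""
    # Apply the (optional) metadata filter first.
    pool = [d for d in docs if metadata_filter is None or metadata_filter(d)]

    # Exact matches, if any, become the candidate set ...
    term = query.lower()
    exact = [d for d in pool if term in d["text"].lower()]

    # ... otherwise fall back to the whole filtered pool (the "fuzzy" case).
    candidates = exact or pool

    def score(doc):
        # Dot product between query and document topic weights.
        return sum(q * t for q, t in zip(query_topics, doc["topics"]))

    return [d["id"] for d in sorted(candidates, key=score, reverse=True)]


# Made-up documents and topic weights, for illustration only.
docs = [
    {"id": 1, "ticker": "META", "text": "the war in Ukraine reduced ad demand", "topics": [0.9, 0.1]},
    {"id": 2, "ticker": "AAPL", "text": "foreign exchange was a headwind", "topics": [0.7, 0.3]},
    {"id": 3, "ticker": "AAPL", "text": "services revenue set a record", "topics": [0.1, 0.9]},
]

# No AAPL document mentions the keyword, yet the filtered search still ranks them.
print(search("Ukraine", [1.0, 0.0], docs, lambda d: d["ticker"] == "AAPL"))
```

In this toy data, the filter removes the only document that mentions the keyword, and the remaining documents are still ordered by how closely their themes match the query.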