Summary: The premise of the new Key-Object architecture is that search is broken, at least as it applies to complex merchandise like computers, printers, and cameras. The book describes an innovative and workable solution. The question remains: is the pain sufficient to justify a switch?
As we are all fond of saying, innovation follows pain points. If you’re reviewing the hundredth social media/instant-messaging/photo-sharing app, you might conclude that the so-called pain points identified by some tech innovators hardly rise to the level of owwies. But what about search? Are we missing something in our uber-critical search capabilities that needs to be resolved?
A colleague recently pointed me to a slim volume, “Structured Search for Big Data” by Mikhail Gilula (published by Elsevier and available on Amazon), which argues not only that our search tools are deficient but that a complete revamp of the underlying key-word NoSQL DB structure is required.
Use Google, Amazon, or any of the other life-critical search tools we’ve become so reliant upon and you are using key-word search on NoSQL. The pain that Gilula identifies is the time it takes a consumer to research complex merchandise and find the best deal, time lost to the imprecision of the search results.
The problem with key-word search, in Gilula’s estimation, is that the key words are typically limited to generic descriptions (e.g. ‘laser printer’) or brand or model names. Although he gives examples from healthcare, as well as the increased functionality structured search would give EDWs on NoSQL, we’ll stick with the primary example from his book: researching the purchase of a laser printer at the best price. The argument is germane to all complex consumer goods such as TVs, computers, digital cameras, and the like.
Mary is shopping for a printer. She has in mind a list of specifications that include:
- Printing method: laser
- No duplex mode
- Horizontal resolution of at least 600 dpi
- Vertical resolution of at least 600 dpi
- Speed of at least 14 ppm
There is little argument that traditional key-word search tools make it difficult to get to a narrow list of results with exactly these specifications, much less have them listed by best price. Also, if you want to exclude features that you don’t want in your product, in this example ‘no duplex mode’, key-word search handles logical negation poorly.
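The negation problem is easy to reproduce with a toy example. What follows is only a sketch of naive bag-of-words matching, not how any real search engine works; the page ids, the page text, and the function name keyword_search are all made up for illustration.

```python
# Hypothetical product blurbs; ids and wording are invented for this sketch.
PAGES = {
    "printer-a": "Laser printer, 600 x 600 dpi, 14 ppm, simplex only",
    "printer-b": "Laser printer, 1200 x 1200 dpi, 20 ppm, automatic duplex, no assembly required",
    "printer-c": "Inkjet printer, 600 x 600 dpi, 10 ppm",
}


def keyword_search(query, pages):
    """Return ids of pages that contain every word of the query.

    This is a deliberately naive bag-of-words match: word order and
    meaning are ignored, so "no duplex" is just two unrelated words.
    """
    words = query.lower().split()
    hits = []
    for page_id, text in pages.items():
        tokens = set(text.lower().replace(",", " ").split())
        if all(word in tokens for word in words):
            hits.append(page_id)
    return hits


# Mary asks for a laser printer with no duplex mode.
print(keyword_search("laser printer no duplex", PAGES))
```

In this toy data the only hit is printer-b, the one model that does have duplex, because its blurb happens to contain both ‘no’ and ‘duplex’. The simplex printer-a, which is what Mary actually wants, is missed because its page never uses either word.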
The penalty to Mary is that she spends hours poring through search results trying to find models with the specifications she wants and then, having identified specific models, returns to key-word search to find each model at the best price.
The hidden penalty to the advertisers who paid a couple of pennies to be displayed to Mary is that the great majority of them had no opportunity to win.
Gilula’s solution is a market-ready DB providing structured internet search based on his patented architecture, called Key-Object, searchable with a SQL-like language called KeySQL.
The term ‘structured data’ typically brings to mind RDBMS, and it’s historically true that rich data structures have been at odds with the low cost, massively parallel processing (MPP), high availability, and high speed of access that are critical to search.
Without diving too deeply into the details, an atomic-level key-object is a standard name:object pair (a tuple), which in this case sits at the specification level. For example, speed:14ppm would be an atomic-level key-object.
Key-objects roll up, JSON-like, into groups held in an index that is independent of the underlying document or page to be retrieved. They can be searched at that level of detail, and KeySQL can also specify the order of the search components (most important first) and include negations (no duplex mode).
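To make the idea less abstract, here is a minimal sketch in Python of what searching such an index could look like. This is my own illustration, not KeySQL and not Gilula’s implementation: the index layout, the field names, the prices, and the names INDEX, MARY, and structured_search are all assumptions. The criteria are listed most important first, but in this sketch that order only decides which test is evaluated first; how KeySQL actually uses the ordering is not shown here.

```python
import operator

# Hypothetical index: each entry is a JSON-like group of name:object pairs,
# stored apart from the page ("doc") it describes. All values are invented.
INDEX = [
    {"doc": "page-1", "price": 129.0,
     "specs": {"printing_method": "laser", "duplex": False,
               "h_res_dpi": 600, "v_res_dpi": 600, "speed_ppm": 14}},
    {"doc": "page-2", "price": 99.0,
     "specs": {"printing_method": "laser", "duplex": True,
               "h_res_dpi": 1200, "v_res_dpi": 1200, "speed_ppm": 20}},
    {"doc": "page-3", "price": 115.0,
     "specs": {"printing_method": "laser", "duplex": False,
               "h_res_dpi": 1200, "v_res_dpi": 600, "speed_ppm": 18}},
    {"doc": "page-4", "price": 59.0,
     "specs": {"printing_method": "inkjet", "duplex": False,
               "h_res_dpi": 600, "v_res_dpi": 600, "speed_ppm": 15}},
]

# Mary's list as (key, comparison, target), most important first.
MARY = [
    ("printing_method", operator.eq, "laser"),
    ("duplex", operator.eq, False),   # the negation: no duplex mode
    ("h_res_dpi", operator.ge, 600),
    ("v_res_dpi", operator.ge, 600),
    ("speed_ppm", operator.ge, 14),
]


def structured_search(index, criteria):
    """Return entries whose specs satisfy every criterion, cheapest first."""
    matches = []
    for entry in index:
        specs = entry["specs"]
        # An entry missing a key cannot be confirmed, so it is not a match.
        if all(key in specs and compare(specs[key], target)
               for key, compare, target in criteria):
            matches.append(entry)
    return sorted(matches, key=lambda entry: entry["price"])


for hit in structured_search(INDEX, MARY):
    print(hit["doc"], hit["price"])
```

With this made-up data, the duplex model and the inkjet model drop out, and the two remaining pages come back ordered by price, which is the narrow, price-sorted list Mary could not get from key words.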
The advantage to Mary is clearly a time savings. The advantage to the tech ecommerce site employing this architecture is the ability to draw more customers based on its ease and accuracy of use. And interestingly, Gilula argues, there is an advantage to the advertiser as well, who should be willing to pay more for a much more targeted search result display. That is a potential revenue increase for the seller of the advertising.
The system is MPP, responds very quickly, runs on commodity hardware, and is able to join heterogeneous data sources.
The book, while less than 100 pages, is laden with specifics that appear to have been worked out in great detail. Gilula’s professional and academic credentials are good: he lays claim to 20 years of industry experience in database and data warehousing technologies, working as a Sr. Data Architect for Teradata, Alcatel-Lucent, and PayPal, among others. The product is market-ready, though I’ve been unable to find a list of current users.
The primary implementation issue seems to be the creation of the initial structured search index of key-objects. While that can be automated to some extent for large manufacturers or distributors of complex products, for the small web site operator much of that effort will need to be manual. Clearly the conversion from key-word to key-object is not without cost or effort.
- Key-object is indeed, for me at least, a new idea.
- The key assumption is that, if adopted, consumers would receive fewer, more relevant results from their searches and would spend less time on them, and that this increased accuracy could be valued in advertising: each of the more accurate results would have a higher hit rate and could therefore carry a higher charge to the advertiser.
- Since there is a non-trivial switching and maintenance cost to the catalog provider (information source), a larger percentage of the consumers using the system would have to perceive a benefit. That would be true, apparently, only for categories of merchandise or services which are selected based on fairly large numbers of criteria. Consumer goods in electronics, photography, and televisions may fit this assumption, but the vast majority of goods purchased through ecommerce (clothing, food, toys, paper products, general consumables, books, movies, and the like) do not.
- On a technical level, using key-object across multiple databases, some NoSQL and some RDBMS, would require a non-trivial effort to create the initial catalog. Blending data sources that do not have comparable levels of detail could produce high-accuracy results from those DBs that have the detail, but would not add that detail to DBs that lack it. In his ecommerce example he speaks of passing the labor on to the catalog creator (or the catalog’s content provider), which would be a difficult hurdle.
- Similarly, let us say that an Amazon wants only to provide this level of access on high-specification items and not incur the cost on low-specification items. It’s not clear how these might co-exist.
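The uneven-detail concern in the last two points can be sketched in a few lines of Python. This is only my illustration under stated assumptions, not anything from the book: the records, the CRITERIA table, and the function classify are hypothetical, and treating a missing specification as ‘unknown’ is one possible policy among several.

```python
# Hypothetical records from sources with different levels of detail.
DETAILED = {"doc": "maker-page",
            "specs": {"printing_method": "laser", "duplex": False, "speed_ppm": 16}}
SPARSE = {"doc": "small-shop-page",
          "specs": {"printing_method": "laser"}}
OTHER = {"doc": "other-page",
         "specs": {"printing_method": "inkjet"}}

# Mary's tests, keyed by specification name (a subset, for brevity).
CRITERIA = {
    "printing_method": lambda value: value == "laser",
    "duplex": lambda value: value is False,
    "speed_ppm": lambda value: value >= 14,
}


def classify(entry, criteria):
    """Return 'match', 'no match', or 'unknown' if a needed key is absent."""
    unknown = False
    for key, test in criteria.items():
        if key not in entry["specs"]:
            unknown = True          # the source never recorded this spec
        elif not test(entry["specs"][key]):
            return "no match"       # a recorded spec fails outright
    return "unknown" if unknown else "match"


for record in (DETAILED, SPARSE, OTHER):
    print(record["doc"], classify(record, CRITERIA))
```

The detailed record can be confirmed as a match, while the sparse one can only be reported as unknown. The structured query cannot supply detail the source never recorded, so someone still has to do the cataloguing.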
So is Structured Search the only answer to this pain point? I’m not so sure.
NewSQL: NewSQL, which is an MPP, super-fast, distributed RDBMS, may serve this purpose. Whether you would want to use NewSQL to search for low-specification items would likely hinge on some of the same cost and complexity issues as for structured search.
Industrial Products: The place where high-specification searches might pay off could be the complex world of industrial parts and components. In the industrial world a relatively small group of business parts and equipment buyers are continuously buying very esoteric, highly specified components, albeit in small quantities relative to consumer purchases. It is my understanding, however, that these catalogs have already been made searchable with high specificity, presumably using RDBMS. There may actually be no pain point here.
Small Set of Consumer Products with This Problem: The set of complex consumer goods is relatively small and low volume compared to the great majority of things we buy with ecommerce, ranging from soap to books to clothing. In this much larger low-specification market it seems unlikely that consumers would perceive a benefit.
Perhaps more importantly, the key-word search in use today seems in many respects to have taken the place of physically browsing the aisles of the store, where we actually take some pleasure in seeing offerings that are perhaps similar to our stated desires but that we may not have known to consider. Might structured search rob us of this pleasure of surprising finds?
About the author: Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist and commercial predictive modeler since 2001.