Have all great data structures and algorithms been identified?

No more low-hanging fruit? No way to further reduce the computational complexity of standard operations such as sorting, retrieving the value associated with a key in a hash table, or finding connected components in a graph? Have we reached the limit of what can be achieved with data structures and fast algorithms for these basic computations? Will any data structure invented in the future be just a slight variation of an existing one?

Any general book about data structures or computer science has the same chapters and content today as (say) 15 years ago. There's little new material being added.

Was it ever proved that sorting requires at least O(n log n) time, unless you make some assumptions about the data (if the keys are discrete and drawn from a compact range, O(n) can be achieved)? Is it impossible to create an O(n log log n) or O(n SQRT(log n)) algorithm for sorting? Or does nobody know whether it is possible?
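To make the parenthetical concrete: the Ω(n log n) lower bound applies only to comparison-based sorting; if the keys are integers from a small known range, counting sort runs in O(n + k) time. A minimal sketch (the range bound `k` is assumed known in advance):

```python
def counting_sort(a, k):
    """Sort integers in the range [0, k) in O(n + k) time by tallying
    occurrences instead of comparing elements, sidestepping the
    comparison-based Omega(n log n) lower bound."""
    counts = [0] * k
    for x in a:
        counts[x] += 1
    out = []
    for value, count in enumerate(counts):
        out.extend([value] * count)  # emit each value as many times as it occurred
    return out

print(counting_sort([3, 1, 4, 1, 5, 9, 2, 6], 10))
# [1, 1, 2, 3, 4, 5, 6, 9]
```

This is why the "unless you make some assumptions" qualifier matters: the linear bound depends entirely on the compact, discrete key domain.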


Replies to This Discussion

Good questions! Computational complexity remains a great challenge today, but as business problems grow more difficult, computing technology is also developing rapidly. Take esProc, for example: it breaks the conventional SQL computing model into step-by-step computation, as in R, making the computing process more visible.

Regarding sorting, it provides a model that can sort within groups after grouping, along with many other useful functions that make sorting more efficient and convenient.

As for sorting faster than O(n log n), I think it is possible whenever it is beneficial for users.

How far beyond Tukey's O(n log n) have we made it?

Sorting is still O(n log n) on average (even for the parallel versions, which have load-balancing issues). The worst case is O(n^2).

In writing/developing generalized systems that can be used on many types of data, the fastest known algorithms may not be sufficiently fast.  To alleviate the situation, it may be possible to attack a smaller class of problems that is known to cover all practical data situations that need to be dealt with.

For instance, William Pulleyblank (then at Waterloo, later head of IBM's parallel computation lab) reviewed integer programming methods including a modified Chernikova Algorithm developed at Statistics Canada (STC) for use in editing under the Fellegi-Holt model (J. Amer. Stat. Assn. 1976).  Based on empirical tests, Pulleyblank concluded that the STC Chernikova was superior to an algorithm of his and very superior to standard algorithms such as branch-and-bound.  The STC implementation was 10 times as fast (due to exceptional programming by Ivan Davignon) as a later implementation by Statistics Netherlands that had access to detailed documentation of the STC Chernikova algorithm but not the STC proprietary code.  Under a special agreement, I received the STC code but realized it was far too slow for the U.S. Economic Censuses.  My new SPEER algorithms attacked a subclass of the general linear inequality edits used by the other algorithms, because I knew that the smaller subclass was all that was needed.  The SPEER system processed each of the Economic Censuses in 12 hours or less, was 60 times as fast as the STC system, and produced exceptionally high quality results that were equivalent to the gold standard of the STC system (confirmed by John Kovar of STC and me).

Other generalized systems suitable for record linkage and modeling/edit/imputation on large national files are described in


The systems also deal with somewhat narrower classes of problems, but they cover all the classes likely to be needed in practice.  With the above software, a team of 10-20 suitably skilled individuals can process a set of national files in 1-3 months.  With software that is 10-100 times slower, it is not clear how long it would take to process a set of large national files.





© 2021   TechTarget, Inc.