
Is there any software that converts an SQL script into NoSQL? What are the drawbacks and limitations of these translators? What about SQL to Python?

By the way, I'm interested in many different programming-language translators, including C++ or Perl to Python or Java. If you know any great tool, feel free to share. Writing a translator would be an exciting project, if none currently exists.




Replies to This Discussion

Isn't HAWQ (the Pivotal platform) supposed to "digest" SQL code entered by the analyst or coder, and transform it (transparently to the user) into something much more efficient to run under Map Reduce / Hadoop?

ETL?

Talend

Language-specific ORMs (i.e., roll your own)?

E.g., wire up Mongo and MySQL in Python and model the E and the T, and then the L.
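To make the roll-your-own idea concrete, here is a minimal sketch of the E, T and L steps in Python. It rests on loud assumptions: `sqlite3` stands in for MySQL and a plain list stands in for a MongoDB collection, so that the example runs without any server, and the `customers`/`orders` tables are made up for illustration.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; the schema and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.5), (12, 2, 8.0);
""")

def extract(conn):
    """E: pull joined relational rows out of the SQL source."""
    return conn.execute(
        "SELECT c.id, c.name, o.id, o.amount FROM customers c "
        "JOIN orders o ON o.customer_id = c.id ORDER BY c.id, o.id"
    ).fetchall()

def transform(rows):
    """T: fold flat rows into one nested document per customer."""
    docs = {}
    for cust_id, name, order_id, amount in rows:
        doc = docs.setdefault(cust_id, {"_id": cust_id, "name": name, "orders": []})
        doc["orders"].append({"order_id": order_id, "amount": amount})
    return list(docs.values())

def load(collection, docs):
    """L: append documents to the target (a list standing in for a collection)."""
    collection.extend(docs)

collection = []
load(collection, transform(extract(conn)))
```

With a real MongoDB target, the `load` step would presumably hand the same documents to the driver (for example pymongo's `insert_many`) instead of extending a list.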

I wouldn't expect full functionality in the translated SQL. However, I would expect the code to (1) maybe use a different structure, like multi-dimensional hash tables, to efficiently process the query, and (2) be able to split the query into sub-queries to run across multiple servers (Map Reduce).
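As a rough illustration of both points, the sketch below evaluates something like SELECT region, SUM(amount) ... GROUP BY region in a map/reduce style. Everything in it is an assumption made for the example: the data, the three in-memory "servers", and the function names. Each partition builds a partial hash table (the sub-query), and the partial tables are then merged.

```python
from functools import reduce

# Assumption: rows of a hypothetical sales table, already split across three servers.
partitions = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 1), ("north", 4)],
]

def map_partition(rows):
    """Sub-query on one server: partial SUM(amount) per region, kept in a hash table."""
    partial = {}
    for region, amount in rows:
        partial[region] = partial.get(region, 0) + amount
    return partial

def merge(left, right):
    """Reduce step: combine two partial hash tables into one."""
    merged = dict(left)
    for region, subtotal in right.items():
        merged[region] = merged.get(region, 0) + subtotal
    return merged

def group_by_sum(parts):
    """Run the sub-query on every partition, then merge the partial results."""
    return reduce(merge, map(map_partition, parts), {})

result = group_by_sum(partitions)
```

SUM splits cleanly because partial sums can simply be added together; an aggregate such as AVG would need each sub-query to return a sum and a count instead.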

Ok, I better understand. And of course, being you, it's interesting.

I've hit Hive or Impala via JDBC to interface with Hadoop. If your rationales are speed and parallelized computing, and you have control over the stack, I'd highly recommend Impala (or whatever newness is out there).

If, however, you'd like greater control over computation, then this is a problem probably worth attacking.

The current thrust in the market seems to be how to get data and reductions into memory to speed a query. The obvious maturation involves redefining commodity hardware as CPU/RAM/HDD & GPU.

I have developed a piece of software called DataCurrent that accepts SQL and translates it to a common model. It doesn't do translation to other sources, however. Instead, it allows you to develop plugins that call on your specific data source with the filters and fields inferred from the query. The tool was designed to sit underneath SQL-centric business intelligence tools, or to act as a lightweight extension to data processing/ETL platforms like Lavastorm's AE. For example, we have a MongoDB plugin that allows you to send SQL statements to DataCurrent and then retrieve the data efficiently from MongoDB.

I found the replies to the post interesting, and any input from the group on interesting directions to take this would be appreciated. This is a real technical challenge for a typical SQL statement, because you have to analyze the constraints of the query so that you can push those filters down to the underlying source in a general way, limiting the yielded data as much as possible. If the tool supports SQL statements that join across heterogeneous sources, then you run into a number of additional efficiency challenges.
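To give a feel for what pushing filters down can look like, here is a small sketch that turns a very restricted WHERE clause into a MongoDB-style filter document. It is not how DataCurrent works; the function names are hypothetical, and it assumes only `field op value` conditions joined by AND, with numeric or single-quoted string values that do not themselves contain " AND ".

```python
import re

# Comparison operators mapped to MongoDB query operators.
_OPS = {"=": "$eq", "!=": "$ne", ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte"}
_COND = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*(.+?)\s*$")

def parse_value(text):
    """Interpret a literal as a quoted string, an int, or a float."""
    if len(text) >= 2 and text.startswith("'") and text.endswith("'"):
        return text[1:-1]
    try:
        return int(text)
    except ValueError:
        return float(text)  # raises ValueError for anything else

def where_to_filter(where):
    """Translate 'a >= 1 AND b = 'x'' into a filter the source can apply itself."""
    filt = {}
    for cond in re.split(r"\s+AND\s+", where, flags=re.IGNORECASE):
        match = _COND.match(cond)
        if match is None:
            raise ValueError(f"unsupported condition: {cond!r}")
        field, op, raw = match.groups()
        # Conditions on the same field are merged into one operator document.
        filt.setdefault(field, {})[_OPS[op]] = parse_value(raw)
    return filt

pushed = where_to_filter("age >= 21 AND age < 65 AND country = 'US'")
```

The resulting dictionary could be handed to the source so that only matching documents come back. OR conditions, functions, and joins would need a real SQL parser rather than regular expressions.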

Comment from Louis Giokas, one of our readers:

One thing I will say, you are moving toward a situation where what you want is the SQL front end and different back ends. IBM and Oracle are doing that now. I can look into it for you if you would like. I worked for both IBM and Oracle in the past.

HP Vertica is a fantastic column store with standard SQL access (vsql). You can create R UDFs to run in your SQL statements. It works best with structured data. I had a complex analytics query running on Oracle Exadata that was taking up to 25 minutes. Vertica dropped that to less than 1 minute without tuning, then to about 30s with Vertica's query optimizer. A little off topic, but this comment made me think of Vertica and how it gave me the ability to analyze extremely large data sets before I was exposed to NoSQL/Hive/Pig/etc.



The HPCC Systems (http://hpccsystems.com) JDBC driver does exactly this. It accepts SQL and converts it to ECL, the HPCC Systems data programming language. The open source code base is available here - https://github.com/hpcc-systems/hpcc-jdbc

Why do you want to do that? Are you trying to create a wrapper to use SQL to query NoSQL? Or are you just trying to convert some legacy SQL to NoSQL?

@Baljeet: I'm trying to get SQL to work much faster, and integrate legacy SQL code into NoSQL environments. Also, getting a SQL to NoSQL wrapper could help business analysts leverage NoSQL power transparently, which in turn would optimize their work (without having to train them on new technologies) and save dollars.

Thanks for the explanation. You may want to check JSONiq (they call it SQL for NoSQL, although the syntax is not the same as SQL) and www.querymongo.com, which does the conversion that you are looking for.

The other option would be to implement a database like VoltDB or NuoDB, which are SQL-compliant and come with the benefits you are looking for.

Or, you can look at the NewSQL class of databases, which intend to give the same performance as NoSQL.

With the various options in the market, building a wrapper does not sound like an efficient approach to me (I am assuming your requirement is specific to your business needs). Not to forget that the wrapper will tend to get biased towards the technology that it supports. For example, a wrapper for MongoDB cannot guarantee the same benefits with CouchDB, as the underlying mechanisms may work differently, especially the optimization mechanism and the use of indexes in NoSQL databases. This in turn gives IT less flexibility in using the technology of their choice.

I do agree that making it transparent to the business and lowering the training cost are valid business cases. However, I also believe that the cost of developing a wrapper, the complexity in development, the testing cost and the lifecycle cost for the product may not justify the above benefits.

Migrating to SQL-compliant databases may be easier and more cost-efficient.

I also strongly feel that the database startups and the giants are already working on filling this gap. 

For anyone using MongoDB, I have written a SQL UI wrapper in Java that allows SQL to be executed directly and MongoDB documents returned. It's meant to handle 80% to 90% of the common use cases. Enhancements are ongoing. It will also generate the JSON and JavaScript necessary to programmatically pass to the MongoDB API in the language of your choice. It can use either the aggregation framework (default) or map reduce for aggregation.
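For readers unfamiliar with what such generated JSON can look like, here is a minimal Python sketch, unrelated to the wrapper's actual code, that builds an aggregation-framework pipeline for a query shaped like SELECT region, SUM(amount) AS total FROM sales WHERE year = 2013 GROUP BY region. The helper name, field names and values are all hypothetical.

```python
import json

def group_sum_pipeline(group_field, sum_field, alias, match=None):
    """Build a $match/$group pipeline for a single-column GROUP BY with SUM."""
    pipeline = []
    if match:
        # WHERE conditions become a $match stage placed before grouping.
        pipeline.append({"$match": match})
    pipeline.append(
        {"$group": {"_id": f"${group_field}", alias: {"$sum": f"${sum_field}"}}}
    )
    return pipeline

pipeline = group_sum_pipeline("region", "amount", "total", match={"year": 2013})
pipeline_json = json.dumps(pipeline, indent=2)  # JSON text to pass to a driver or shell
```

Placing the `$match` stage first mirrors SQL's WHERE-before-GROUP BY order and lets the database discard documents before grouping them.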

It's on github (binaries only currently):



