
This article was written by Yhat. 

Introduction

One of my favorite things about Python is that users get the benefit of observing the R community and then emulating the best parts of it. I'm a big believer that a language is only as helpful as its libraries and tools.

This post is about pandasql, a Python package we (Yhat) wrote that emulates the R package sqldf. It's a small but mighty library consisting of just 358 lines of code. The idea of pandasql is to make Python speak SQL. For those of you who come from a SQL-first background or still "think in SQL", pandasql is a nice way to take advantage of the strengths of both languages.
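To give a feel for what "thinking in SQL" over in-memory data looks like, here is a minimal sketch that uses only Python's standard-library `sqlite3` module rather than pandasql itself. The table name, column names, and rows are made up for illustration. With pandasql installed, the rough equivalent is a call to its `sqldf` function with a query string and the environment that holds your DataFrames.

```python
import sqlite3

# Hypothetical in-memory data: (name, species, weight_kg) rows.
rows = [
    ("Rex", "dog", 30.0),
    ("Tom", "cat", 4.5),
    ("Fido", "dog", 12.0),
    ("Kit", "cat", 3.5),
]


def average_weight_by_species(records):
    """Load rows into a throwaway in-memory SQLite table and aggregate with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE pets (name TEXT, species TEXT, weight_kg REAL)")
        conn.executemany("INSERT INTO pets VALUES (?, ?, ?)", records)
        query = """
            SELECT species, AVG(weight_kg) AS avg_weight
            FROM pets
            GROUP BY species
            ORDER BY species
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


result = average_weight_by_species(rows)
print(result)
```

The point of the sketch is the shape of the workflow: the data already lives in Python, and the question is asked in plain SQL instead of through a method chain.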

In this introduction, we'll show you how to get up and running with pandasql inside of Rodeo, the integrated development environment (IDE) we built for data exploration and analysis. Rodeo is an open source and completely free tool. If you're an R user, it's a comparable tool with a similar feel to RStudio. As of today, Rodeo can only run Python code, but last week we added syntax highlighting for a bunch of other languages to the editor (markdown, JSON, julia, SQL). As you may have read or guessed, we've got big plans for Rodeo, including adding SQL support so that you can run your SQL queries right inside of Rodeo, even without our handy little pandasql. More on that in the next week or two!

What you will find in this article: 

  • Downloading Rodeo
  • A bit of background, if you're curious
  • Install pandasql
  • Check out the datasets
  • An odd graph
  • It's just SQL
  • Final thoughts

To check out all this information, click here


Comment by Hector Alvaro Rojas on February 12, 2017 at 6:33am

Hi Emmanuelle!

What’s up?

Thanks for sharing this article. I have two comments about it:

(1) R-project sqldf

Good: It is great if you want to use SQL to query data resident in R's memory.

Bad: It is limited by your computer's RAM, so it cannot be used with big data or data files larger than your RAM.

Possible solution: Move your big data into a database management system (MySQL, MSSQL, SQLite, …) and then do your SQL work using drivers like RODBC or DBI, or R packages like RMySQL, RDBL, …. This solution is easy to set up and manage. I know because I have done all of this and it works great.

(2) Python pandasql

Good: Is it great if you want to use SQL to query data resident in Python's memory? I am not sure about this, I am guessing. Am I right or not? In fact, I have the same doubt about pandas.

Does pandas work in the computer's RAM? If so, does a pandas object work the way a data file loaded into R does?

Bad: Is it limited by your computer's RAM, so that it cannot be used with big data or data files larger than your RAM? Again, I am not sure, I am guessing. Am I right or not?

Possible solution: Move your big data into a database management system (MySQL, MSSQL, SQLite, …) and then do your SQL work using drivers like PyMySQL, sqlalchemy, …. Most of them can be driven from pandas in some way (at least these two can), which makes querying easier. This solution is great, but if you want fast results it is sometimes a “pain in …….”. It is a lot easier to get equivalent results using R-project.

I do not yet know which platform is faster at running the same query against the same database. I am currently in the process of finding out.

Anyway, R-project has been a lot easier and friendlier for this job so far, at least for me.

I hope this comment helps when somebody has to manage this kind of thing. Please let me know if I am wrong about any of it and, more importantly, in that case point me to the solutions and the right way of doing it.

Regards,

HA
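
On the memory question raised in the comment above: pandas DataFrames are held in RAM, so the same limit applies to anything that queries them in place. The "move it to a database" route can be sketched with the standard-library `sqlite3` module, as below. This is only an illustration under assumptions: the `sales` table, its `amount` column, the file name, and the `total_in_chunks` helper are all hypothetical, and a tiny generated database stands in for a genuinely large one.

```python
import os
import sqlite3
import tempfile


def total_in_chunks(db_path, chunk_size=1000):
    """Sum a column by streaming rows from an on-disk database in batches."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute("SELECT amount FROM sales")
        total = 0
        while True:
            # Only chunk_size rows are materialised in Python at a time.
            batch = cursor.fetchmany(chunk_size)
            if not batch:
                break
            total += sum(amount for (amount,) in batch)
        return total
    finally:
        conn.close()


# Build a small hypothetical on-disk database to stand in for a large one.
db_dir = tempfile.mkdtemp()
db_path = os.path.join(db_dir, "sales.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales (amount) VALUES (?)",
    [(n,) for n in range(1, 10001)],
)
conn.commit()
conn.close()

chunked_total = total_in_chunks(db_path, chunk_size=500)
print(chunked_total)
```

For a simple aggregate like this one, letting the database compute `SUM(amount)` directly would be shorter and usually faster. The batch loop is shown because it generalises to per-row processing that cannot be expressed in SQL.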
