Home » Uncategorized

Data analysis and visualization in Perl

Guest blog by Stephan Loyd.

Hello everybody, this is my first post here, so forgive me if I screw it up.

Let me firstly introduce background of my work. Several years ago I landed onto a Perl job. It also involves some other languages like Python and R, but it was mainly Perl, until last year focus of my role switched and I still do some Perl but much less since then. I was a little bit sad. Perl is indeed a good language, but usually underated outside its community. I am quite good at several programming languages, but with Perl I feel most comfortable and productive. So I thought I might write something at my after-work time, to use my Perl knowledge to create something to give back to the Perl community.

1991108651

I remembered I read this post Putting Perl Back on Top in the Fields of Scientific and Financial … a couple of years ago. I know Perl has PDL since long time ago, but why does it not have a big market share? Well, IMHO there are a few reasons, and one of them is that, it needs a whole ecosystem, by ecosystem I mean several strong libraries that target different aspects of scientific or financial computing, to fullfill user’s needs. Python has numpy/scipy and matplotlib since early 2000s. And based on numpy they created pandas which is used a lot today in dataframe-based data analysis. R itself has out-of-the-box dataframe and plotting features, and Hadley Wickham’s ggplot2 is so powerful that I heard people not switching from R to Python only because of ggplot2. For Perl IMHO it lacks a great plotting library, and it lack a library that can do dataframe well.

Then I decided to do something in this area: to improve ZMUGHAL’s Data::Frame, and to port R’s ggplot2 to Perl. It took me several months but it finally has got to a rough usable state. In recent couple of weeks I released two new libraries onto CPAN: Chart::GGPlot and Alt::Data::Frame::ButMore. The latter one was because I’ve not been able to reach ZMUGHAL.

For those who have not heard of R’s ggplot2, it’s an implementaion of Leland Wilkinson’s The Grammar of Graphics. Basically it allows one to define a plot by specifying various components like geometry layers, scales, etc., on a high level, and map columnar data to the plot. For example, below piece of code would get you that plot on top of this post.

1991166334

More of my Chart::GGPlot examples are here.

So that’s it. Thanks for reading. And comments or ideas are welcome 🙂

Originally posted here