After 25 Years of Coding in C and Perl
As an independent author/researcher, there is of course nothing in my “job description” that says I should code in Python (or any other language). Yet for a long time, I thought coding in Python would help me a lot. It would mean more readers, and thus eventually, more revenue. At one point I advertised a job position, looking for people to translate my Perl scripts into Python. I thought doing it myself would take a lot of time with a long learning curve.
I was wrong. I am glad that I jumped on the Python bandwagon. It was much easier than I thought. Of course, I still have plenty to learn. Here I relate my experience learning Python on my own, without attending any class or reading any book on the topic. I hope my experiment will help people do things they hesitate to do, be it learning a new language or anything else. Some readers mentioned that what I did inspired them to move forward with some projects, rather than following inertia.
First, I do not recommend learning it entirely on your own. You may save some time and money, but a reputable instructor or book will keep you on the right track. In my case, I am a self-learner. Every class I attended in the past was a waste of time and progressed too slowly. But I am the exception rather than the rule. When I published my Python code, written in a unique style, I asked for criticism. I received a lot of feedback, which I share here. First, people said Python is not designed to reflect a coder’s personality, but should be written in a “standard way” to make it easy for other coders to reuse. And while my code is based on what I’ve found online, it is not obvious, when you start from scratch, how to tell good code snippets on the Internet from bad ones or obsolete practices.
That said, I did not really start from scratch. I have long experience coding in various languages, and scripting languages in particular. It definitely helps, though it also hurts, as some of my old habits conflict with the way Python was designed. On the plus side, I started with the latest version of Python.
How it Started
It started about two weeks ago. I was working on a new fuzzy regression technique, math-free, model-free, yet providing prediction intervals (without bootstrap or resampling). At 2am on a Sunday, I could not sleep, haunted by math problems. I went to my desk and decided to try to install Python. Eighteen hours later, my first Python script was live and working properly. It deals with this fuzzy regression method. You can find the details, including the source code, here. It is the translation of an earlier version, written in Perl.
I work on a Windows machine, with the Cygwin environment (a Unix-like environment for Windows). Installing Python and running it under Cygwin was easy, and very similar to Perl. I quickly realized that I would benefit from using libraries like NumPy or Pandas. Installing Pandas with the command pip install pandas did not work on Cygwin. I tried it in a Windows console, the Windows equivalent of a Unix shell, which lets you enter command-line statements. It worked!
A Few Surprises
If Python is the first language that you learn, you won’t experience the problems below. But if you come from Perl or other programming languages, be prepared for some surprises. While perplexing, they are easy to overcome.
First, there is no explicit “end” when you write a loop, say a for loop. The end of the loop is dictated by the indentation. It makes for shorter code that is easier to read, but you need to be very strict with indentation. I discovered this feature on my own. Then, using commas to separate variables does not work the way I expected. I had a hard time figuring out why, until I realized there is something called a “Tuple” in Python. Adding commas creates and defines a Tuple, not a list of variables. Once you are aware of this, it is not an issue. Also, there is nothing to identify a variable. As a result, you can’t name a variable “and”, since “and” also represents the Boolean operator. In Perl, variable names start with $. However, I see it as an advantage: it makes the code more compact.
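The three points above can be sketched in a few lines. The variable names are made up for illustration and are not taken from the fuzzy regression script.

```python
import keyword

# The loop body is whatever is indented under the "for" line;
# the first dedented line ends the loop (no "end" or closing brace).
total = 0
for k in range(1, 4):
    total += k
after_loop = total  # runs once, after the loop has finished

# A comma on the right-hand side builds a tuple, not a list of variables.
pair = 1, 2

# Tuple unpacking assigns several variables in one statement.
a, b = pair

# Names carry no sigil such as Perl's "$", so reserved words
# like "and" cannot be used as variable names.
and_is_reserved = keyword.iskeyword("and")

print(after_loop, pair, a, b, and_is_reserved)
```

Writing and = 1 is rejected as a syntax error before the script even starts, which is why the sketch queries the keyword module instead of trying the assignment.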
Perhaps the most surprising feature is that an array assignment such as a=3 (on an empty array) won’t work. You have to write a.insert(1,2) instead. It is a reminder that Python must do some memory allocation in the background, kind of like coding in C. In Perl this is transparent. I wish the interpreter would automatically recognize a=3 and turn it into a.insert(1,2). What I did not realize initially is that arrays don’t really exist in Python: they are treated as lists. This explains why a=3 does not work.
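Here is a minimal sketch of the list behavior described above. It assumes the failing statement is an indexed assignment on an empty list, written here as a[0] = 3. The index and the values are illustrative choices, not the ones from the original script.

```python
a = []  # an empty list, which is what Python offers in place of a Perl array

# Assigning to an index that does not exist yet raises IndexError;
# Python does not grow the list automatically as Perl would.
try:
    a[0] = 3
    grew_automatically = True
except IndexError:
    grew_automatically = False

a.append(3)        # add at the end
a.insert(0, 7)     # insert 7 at position 0, shifting the rest right
a[1] = 5           # index 1 now exists, so plain assignment works

# When the size is known in advance, pre-filling makes indexed assignment legal.
b = [0] * 4
b[2] = 3

print(grew_automatically, a, b)
```

For growing a list one element at a time, append is usually the closest match to the Perl habit of assigning past the end.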
Finally, you cannot put all your functions (subroutines in Perl) at the bottom of the code. A function must be defined before the first call to it is executed. Debugging is very easy though: the error messages you get are more helpful than the ones I get from Perl. Using libraries like numpy is also surprisingly easy and intuitive.
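The ordering rule can be illustrated as follows. The names helper and main are hypothetical. The sketch also shows one common way to keep helper functions lower in the file: put the top-level logic in a function and call it last.

```python
# A name must be bound before the line that calls it runs.
try:
    result = helper(2)   # helper is not defined yet at this point
except NameError:
    result = None

def main():
    # helper is looked up only when main() runs, by which time it exists.
    return helper(2)

def helper(x):
    return x * 10

value = main()   # the call comes after both definitions
print(result, value)
```

What matters is the order in which statements execute, not where the function sits in the file relative to the functions that use it.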
Now a few positive surprises. I was expecting Python to be very strict about variable types. I thought you would have to perform type casting all the time. Of course it is more strict than Perl, but less than C. All in all, it was not an issue. And type casting makes for more robust code.
The scope of a variable is more flexible than I thought it would be. You don’t need to pre-declare all the variables, and certainly not at the top of your code. A local variable (within a function) with the same name as a global variable won’t overwrite the global variable. In some sense, it is a bit like Perl if you use the warning directives, which is good programming practice.
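A small sketch of the scoping behavior just described, with hypothetical names (threshold, rescale):

```python
threshold = 10  # module-level (global) variable

def rescale(values):
    # Assigning inside the function creates a separate local name;
    # the module-level threshold is left alone.
    threshold = 99
    return [v / threshold for v in values]

scaled = rescale([99, 198])
print(scaled, threshold)
```

To rebind the module-level name from inside a function, Python requires an explicit global statement in that function. Without it, an assignment always targets a local variable.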
I used three “array” initializations in my code, on three separate lines: x=[], y=[], and z=[]. I was told I could write x=y=z=[] instead, but it does not produce the same results. I still have to figure out why, but it does not bother me. Finally, I thought I would have a hard time writing a function that returns multiple values. It actually worked on my first attempt, without problems. The function just returns a “Tuple”, though at that time I did not know there was something called a Tuple.
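A likely explanation for the different results, assuming the three initializations were empty lists, is that chained assignment binds all three names to one list object. The sketch below shows the difference, followed by a function that returns two values as a tuple. The name min_max and the sample data are made up for illustration.

```python
# Three separate statements create three independent lists.
x = []
y = []
z = []
x.append(1)
separate = (x, y, z)

# Chained assignment binds all three names to the SAME list object.
p = q = r = []
p.append(1)
shared = (p, q, r)
same_object = p is q is r

# One way to get independent lists in a single statement.
u, v, w = [], [], []
u.append(1)

def min_max(values):
    # "return a, b" packs the two results into a tuple.
    return min(values), max(values)

lo, hi = min_max([4, 1, 9])   # tuple unpacking at the call site
print(separate, shared, same_object, (u, v, w), lo, hi)
```

With the chained form, appending through any of the three names changes what all of them see, which would explain results that differ from the three-line version.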
Some Advice I Received
A reader pointed to re.sub(…) for processing regular expressions. Another suggested conforming to the Python PEP 8 style standards. More specific comments:
- Using the with statement before opening a file means you don’t need to remember to close the file yourself (and it will be closed automatically should you hit a runtime error).
- You have overridden a built-in (list), which could be confusing and really should not be done. Generally your code is a little too verbose. List comprehensions are very helpful ways to create for loops on a single line (they are faster too) and tuple unpacking avoids unnecessary indexing. Unless you are writing a huge file, I’d suggest packing the output of your calculation into a list (of dicts) and then using that to create a file at the end, rather than keeping a file open and writing to it.
- Try not to declare variables until you use them; forcing a user to look more than a few lines up or down to understand what a block of code is doing decreases the interpretability of your code. In those situations, there is probably a better way.
- You could replace the scalar exponential with np.exp and save that import 🙂 Numpy’s version has the extra advantage that it can also work with arrays.
- There’s nothing wrong in Python with nested loops, but list comprehensions are often a helpful alternative; performance often increases, but not always. For example: [sum([x*y for y in range(0,3)]) for x in range(1,5)].
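Several of these suggestions fit in one short sketch. The input string, the (x, y) pairs, and the output.csv file name are made up for illustration. This is not the code from fuzzy.py, only one way the advice could be applied.

```python
import csv
import os
import re
import tempfile

# re.sub replaces every match of a pattern; here, runs of whitespace.
cleaned = re.sub(r"\s+", " ", "fuzzy   regression \t demo")

# Nested loops, then the equivalent list comprehension from the feedback.
nested = []
for x in range(1, 5):
    s = 0
    for y in range(0, 3):
        s += x * y
    nested.append(s)
comprehension = [sum([x * y for y in range(0, 3)]) for x in range(1, 5)]

# Collect results as a list of dicts first, then write them in one go.
points = [(1, 2.0), (2, 3.5)]                      # made-up (x, y) pairs
rows = [{"x": px, "y": py} for px, py in points]   # tuple unpacking in the loop

path = os.path.join(tempfile.mkdtemp(), "output.csv")
# "with" closes the file on exit from the block, even after an error.
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["x", "y"])
    writer.writeheader()
    writer.writerows(rows)

with open(path, newline="") as f:
    read_back = list(csv.DictReader(f))

print(cleaned, nested, comprehension, read_back)
```

Building the rows in memory and writing them once keeps the file open only for the few lines that need it, which is the point of the list-of-dicts suggestion.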
Should you hire a programmer with no Python experience for a Python job? This is a question worth pondering if you can’t find candidates. In 18 hours, I made a lot of progress. My next step is playing with hash tables, called dictionaries in Python. I used them a lot for NLP, in Perl. After that, I will try the AV video library and some graphics libraries like GD. I used them in Perl and R: they are written in C++ and, I have no doubt, available in Python as well.
My Python script fuzzy.py and the input/output data sets are available on my GitHub repository. I wrote a technical document explaining the method and the code. It is entitled “Fuzzy Regression: A Generic, Model-free, Math-free Machine Learning Technique”, and available from here. Finally, I’ve found the book “Think Python” by Allen Downey, to be a good introductory tutorial for me, as it is rather compact.
About the Author
Vincent Granville is a machine learning scientist, author, and publisher. He was the co-founder of Data Science Central (acquired by TechTarget) and most recently, the founder of MLtechniques.com.