
Python is an increasingly popular object-oriented, interpreted and interactive programming language used for heavy-duty data analysis. It is designed for ease of use, speed of development and readability, and is well suited to data-intensive applications. Python supports multiple programming paradigms, including object-oriented, imperative and functional styles. It features a fully dynamic type system and automatic memory management, similar to those of Scheme, Ruby, Perl and Tcl.
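As a small illustration of the paradigms mentioned above, the sketch below computes the same sum of squares in imperative, functional and object-oriented style. The names (values, SquareSummer) are made up for this example.

```python
from functools import reduce

values = [3, 1, 4, 1, 5]

# Imperative style: an explicit loop that updates a running total.
total_imperative = 0
for v in values:
    total_imperative += v * v

# Functional style: fold the list into a single value without mutation.
total_functional = reduce(lambda acc, v: acc + v * v, values, 0)


# Object-oriented style: state and behaviour live together in a class.
class SquareSummer:
    """Hypothetical helper that accumulates squared values."""

    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value * value
        return self


summer = SquareSummer()
for v in values:
    summer.add(v)

print(total_imperative, total_functional, summer.total)
```

All three variants produce the same number; which style reads best usually depends on the surrounding code.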

. 

With Python you can build customized data tools that handle large data sets efficiently, so you work more quickly and integrate your systems more effectively. You get more done in less time when manipulating, processing, cleaning, and crunching data.
Python allows an organization to build a framework that makes it easy to collect data from a myriad of data sources and model it. Instead of spending time writing database connector code, you can use a simple configuration and quickly get off the ground. Python also allows an organization to move code from development to production more quickly, because the same code written as a prototype can easily be moved into production.
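A minimal sketch of what such a configuration-driven setup might look like, assuming Pandas is installed. The SOURCES dictionary, the READERS table and the load_all helper are hypothetical names invented for this example, and in-memory strings stand in for real files or databases.

```python
import io

import pandas as pd

# Hypothetical configuration: each entry names a source and its format.
# Real entries would point at files, URLs or databases; in-memory strings
# keep this sketch self-contained.
SOURCES = {
    "sales_csv": {
        "format": "csv",
        "data": "region,amount\nEast,100\nWest,\nEast,250\n",
    },
    "sales_json": {
        "format": "json",
        "data": '[{"region": "West", "amount": 75}]',
    },
}

# Map each format name to the Pandas reader that handles it.
READERS = {"csv": pd.read_csv, "json": pd.read_json}


def load_all(sources):
    """Read every configured source and stack the rows into one frame."""
    frames = []
    for name, spec in sources.items():
        reader = READERS[spec["format"]]
        frame = reader(io.StringIO(spec["data"]))
        frame["source"] = name  # remember where each row came from
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


raw = load_all(SOURCES)
clean = raw.dropna(subset=["amount"])             # cleaning: drop rows with no amount
totals = clean.groupby("region")["amount"].sum()  # crunching: total per region
print(totals)
```

Adding another source then means adding one entry to the configuration rather than writing new connector code.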
.
If you like the R language, Python libraries such as SciPy, IPython and Pandas provide much of the mathematical functionality typically found in R. While R offers more packages and visualization capabilities at this time, Python is catching up.
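For readers coming from R, here is a rough sketch of how a few everyday operations look in Pandas. The data is invented for illustration, and the R calls in the comments are approximate counterparts rather than exact equivalents.

```python
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.1, 5.9, 8.2, 9.8],
})

# Roughly summary(df) in R: count, mean, std, quartiles per column.
summary = df[["x", "y"]].describe()

# Roughly aggregate(x ~ group, data = df, FUN = mean) in R.
group_means = df.groupby("group")["x"].mean()

# Roughly cor(df$x, df$y) in R: Pearson correlation.
correlation = df["x"].corr(df["y"])

print(summary)
print(group_means)
print(round(correlation, 3))
```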
Simply put, Python is easy to learn, platform neutral and cheap. Python is a tool to build other tools with, including data analysis tools. It draws on a wide mix of programming paradigms, styles and languages. Python runs on Windows, Linux/Unix and Mac OS X, and has been ported to the Java and .NET virtual machines.
.
Python is free to use, even for commercial products, because of its OSI-approved open source license.  See: http://www.python.org/psf/license/
.  
Pandas is a Python package for doing data transformation and statistical analysis. Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools. See: http://pandas.pydata.org/
.
While R is the most widely-used open source environment for statistical modeling and graphics, Pandas adopts some of the best concepts of R, like the foundational data.frame. Pandas has been described as "R data.frame on steroids". Pandas seeks to remedy some frustrations common to R users:
.
1. R has simple data alignment and indexing functionality, leaving much work to the user. Pandas makes it easy and intuitive to work with messy, irregularly indexed data - like time series data. Pandas also provides rich tools, like hierarchical indexing, not found in R;
.
2. R is not well-suited to general purpose programming and system development. Pandas enables you to do large-scale data processing seamlessly when developing your production applications;
.
3. Hybrid systems connecting R to a low-productivity systems language like Java, C++, or C# suffer from significantly reduced agility and maintainability, and you’re still stuck developing the system components in a low-productivity language;
.
4. The "copyleft" GPL license of R can create concerns for commercial software vendors who want to distribute R with their software under another license. Python and Pandas use more permissive licenses.
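The alignment and hierarchical indexing mentioned in point 1 can be sketched as follows, assuming Pandas is installed. The dates, regions and values are invented for illustration.

```python
import pandas as pd

# Two time series observed on different, partly overlapping dates.
a = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-04"]),
)
b = pd.Series(
    [10.0, 20.0],
    index=pd.to_datetime(["2016-01-02", "2016-01-04"]),
)

# Arithmetic aligns on the index labels automatically; dates missing
# from either series become NaN instead of being silently mismatched.
aligned_sum = a + b

# Treat missing observations as 0 instead of propagating NaN.
filled_sum = a.add(b, fill_value=0.0)

# Hierarchical (multi-level) index: region first, then quarter.
index = pd.MultiIndex.from_tuples(
    [("East", "Q1"), ("East", "Q2"), ("West", "Q1")],
    names=["region", "quarter"],
)
sales = pd.Series([100, 150, 80], index=index)

east = sales.loc["East"]                          # all quarters for one region
by_region = sales.groupby(level="region").sum()   # collapse the quarter level

print(aligned_sum)
print(filled_sum)
print(by_region)
```

No manual bookkeeping of row positions is needed here: the labels drive both the alignment and the selection.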
.
Top Python Advantages
.
- Instant feedback from the interactive interpreter.

- Non-intrusive:  You think about the problem, not the tool you are working with.  After you learn Python, it gets out of the way.

- Libraries:  Whatever you want to do, somebody has written code to help you get there.

- Community:  The community is a great source of examples and ideas.

- The philosophy of one-best-way means that Python programmers tend to do things in much the same way. This is a big advantage because it makes it easy to read other people's code, which is a great way to learn.
.
Top Python Disadvantages
.
- No single source of truth / best-practices:  It can be hard to learn what is the best library for a particular job. The large number of packages relevant to a particular task can make it difficult to find the one best suited to your exact needs.

- Documentation is substandard:  The Python official documentation is seldom the best way to learn a new library. The informal Python community provides the most useful examples. Yet sorting out the wheat from the chaff can be hit-or-miss.

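- Concurrency:  Python was designed without concurrency in mind and it shows. In the standard CPython interpreter, a global interpreter lock prevents threads from executing Python bytecode in parallel.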
.


Tags: Data, Python, Tools


Comment by Kentaro Tanaka on February 21, 2016 at 3:21pm

This book was amazing. Before I took the Data Science Bootcamp using Python in Irvine http://www.thedevmasters.com , this book gave me really good preparation to better understand. Thanks to that, I totally understand how data analysis using Python works. 


© 2019   Data Science Central ®   Powered by
