Python Pandas: First thoughts from an R fanboy...

Posted on September 27, 2016

I just tried out Python Pandas for analysis work as an alternative to R. So far, so good! I’m really impressed with the interface, the speed, and the resources available online. For me, trying Pandas is analogous to using R’s data.table package: some slight differences, but the same functionality when it comes to data transformation. Pandas is built on top of NumPy arrays, and between those two packages essentially all of R’s data frame capability is available in the Python environment.
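To make the data.table comparison concrete, here is a minimal sketch of a filter, group-by, and derived-column workflow in Pandas. The toy data and the column names (group, value, total, n, share) are made up for illustration, the data.table lines in the comments are only rough equivalents, and the named-aggregation syntax assumes Pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical toy data, invented for this example
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Filter rows, then aggregate by group.
# Rough data.table equivalent:
#   dt[value > 1, .(total = sum(value), n = .N), by = group]
summary = (
    df[df["value"] > 1]
    .groupby("group", as_index=False)
    .agg(total=("value", "sum"), n=("value", "count"))
)

# Add a per-group derived column without collapsing rows.
# Rough data.table equivalent:
#   dt[, share := value / sum(value), by = group]
df["share"] = df["value"] / df.groupby("group")["value"].transform("sum")

print(summary)
print(df)
```

The chained style reads a lot like data.table’s bracket chaining, just with method names spelled out instead of packed into i, j, and by.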

My experience using R and its wealth of well-developed statistics, machine learning, and visualization packages gives me quick access to tools not found in Python, and the community of statisticians using R ensures that cutting-edge packages are published regularly. But let’s be honest: R has some major weirdness that makes it hard to learn, difficult to run concurrently, and downright slow for some data structure access patterns. There are ways around a lot of these issues, but it’s hard to overcome the fact that few people know R, and fewer know it well, compared to Python. Writing critical code for a start-up in R is a risky proposition when it comes to maintainability. Most likely, a start-up is using more than just R (prove me wrong!), and if Python or another language already in use can handle the analysis task, it should be used. This is where Pandas can really shine: data transformations.

Getting used to Pandas has gotten my foot in the door of doing data science in the Python ecosystem, although transferring all my R skills will require learning about a dozen packages. With all the folks out there using Python for projects and companies, the advantage of using Python for analysis can only grow as its data ecosystem matures. If Python can woo academia’s statisticians, R will eventually lose its superiority in package support, along with the user community where a lot of folks like me learned it. Until then, I’ll most likely use both languages for different tasks while I eagerly anticipate the day Julia becomes better than both!

Check out this cheat sheet for Pandas basics

A nice translation of R’s data frame functions to Pandas

Comparison of R and Python for data science (post image is from them): link