pdLSR: Pandas-aware least squares regression

2016/07/17

I have a new Python project I would like to share with the community. Actually, this project isn't so new. I developed an initial version about two years before completing my postdoctoral research, and it has undergone various revisions over the past three years. Having finally made time to give it the clean-up it needed,¹ I am excited to share it on GitHub.

Overview

pdLSR is a library for performing least squares minimization. It attempts to seamlessly incorporate this task into a Pandas-focused workflow. Input data are expected in dataframes, and multiple regressions can be performed using functionality similar to Pandas groupby. Results are returned as grouped dataframes and include best-fit parameters, statistics, residuals, and more. The results can be easily visualized using seaborn.

pdLSR currently uses lmfit, a flexible and powerful library for least squares minimization, which, in turn, makes use of scipy.optimize.leastsq. I began using lmfit because it is one of the few libraries that support non-linear least squares regression, which is commonly used in the natural sciences. I also like the flexibility it offers for testing different modeling scenarios and the variety of assessment statistics it provides. However, I found myself writing many for loops to perform regressions on groups of data and aggregate the resulting output. Simplifying this task was my inspiration for writing pdLSR.
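To make that pattern concrete, here is a minimal sketch of the kind of loop described above. It is only an illustration, not pdLSR's API: it assumes a hypothetical exponential-decay model and synthetic, noise-free data, and it calls scipy.optimize.leastsq directly rather than going through lmfit. The names exp_decay, residuals, data, and fit_table are all made up for this example.

```python
import numpy as np
import pandas as pd
from scipy.optimize import leastsq


def exp_decay(params, t):
    """Hypothetical model for illustration: amplitude * exp(-rate * t)."""
    amplitude, rate = params
    return amplitude * np.exp(-rate * t)


def residuals(params, t, y):
    """Difference between observed and modeled values, as leastsq expects."""
    return y - exp_decay(params, t)


# Synthetic, noise-free data for two groups (illustrative values only)
t = np.linspace(0.0, 5.0, 20)
true_params = {"a": (10.0, 0.5), "b": (4.0, 1.5)}
data = pd.concat(
    [
        pd.DataFrame({"group": name, "t": t, "y": exp_decay(p, t)})
        for name, p in true_params.items()
    ],
    ignore_index=True,
)

# The manual pattern: loop over groups, fit each one, collect the output
rows = []
for name, grp in data.groupby("group"):
    t_obs = grp["t"].to_numpy()
    y_obs = grp["y"].to_numpy()
    # Initial guess: largest observed value for the amplitude, 1.0 for the rate
    best, ier = leastsq(residuals, x0=(y_obs.max(), 1.0), args=(t_obs, y_obs))
    rows.append({"group": name, "amplitude": best[0], "rate": best[1]})

# Aggregate the per-group results into a single dataframe
fit_table = pd.DataFrame(rows).set_index("group")
print(fit_table)
```

Even in this small case, the bookkeeping (splitting the groups, extracting arrays, collecting parameters back into a dataframe) takes up more room than the fit itself, and it grows once statistics and residuals need to be gathered as well.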

pdLSR is related to libraries such as statsmodels and scikit-learn that provide linear regression functions that operate on dataframes. However, these libraries don't directly support grouping operations on dataframes.

The aggregation of minimization output parameters performed by pdLSR has many similarities to the R library broom, written by David Robinson, with whom I had an excellent conversation about our two libraries. broom is more general in its ability to accept input from many minimizers, and I think expanding pdLSR in this fashion, for example to be compatible with statsmodels and scikit-learn, could be useful in the future.

Throwing the Book at Your Data

2015/07/22

You may have noticed all has been quiet on the blog front from me lately. There are several reasons for this,¹ but I can assure you it isn't for lack of things to write about. Today, I'm happy to share with you one of the projects that has been …

Automated Notifications of Experiment Progress: Prowl Extra Credit

2013/10/19

Previously, I posted about using Growl in combination with Prowl to get remote notifications of experiment progress on both a Mac and iPhone. Later that day, I started thinking about some improvements to the script after a brief Twitter conversation with Seth Brown.¹

The script depends on the remote …

Automated Notifications of Experiment Progress: Combining Shell, SSH, Growl, and Prowl

2013/10/12

In addition to covering my use of Python in research, one of my goals for this blog is to share ways I use various other computational tools to automate basic research tasks. Such an opportunity arose this week with the onset of hardware issues causing one of our laboratory's NMR …

Visualization of NMR Shaped Pulses: Fun with Javascript Animation

2013/06/14

This IPython notebook builds on the previous blog post which described how to simulate and plot the result of a shaped pulse on magnetization in an NMR experiment. For the purpose of teaching¹, it can also be useful to visualize the effect of this pulse on the magnetization at …


themodernscientist

data scientist, biophysicist, mac-unix zealot, pythonista