Monday, 16 January 2017

Add Equation to Seaborn Plot (and separate thousands with commas)

Producing a scatter plot with a line of best fit using Seaborn is extremely simple. But showing the equation of that line requires some extra work.

The following Python code produces a scatter plot with a fitted regression line:

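import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.RandomState(1)  # fixed seed so the data are reproducible
x = 10000 * rng.rand(50)
y = x - 500 + 500*rng.randn(50)  # true line y = x - 500 plus Gaussian noise
df = pd.DataFrame({'x':x,'y':y})
g = sns.lmplot(x='x',y='y',data=df,fit_reg=True,aspect=1.5,ci=None,scatter_kws={"s": 100})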


Finding the Equation of the Line

Adding the equation of the line requires us to first find its parameters (the slope and the intercept). We can use scikit-learn to do this:

from sklearn import linear_model
regr = linear_model.LinearRegression()
X = df.x.values.reshape(-1,1)
y = df.y.values.reshape(-1,1)
regr.fit(X, y)
print(regr.coef_[0])
print(regr.intercept_)

This gives (your values will differ if you use a different random seed):

[ 1.01360441]
[-499.28854278]
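
For a single predictor, an ordinary least-squares fit like this one has a closed form, so the two numbers can be sanity-checked by hand. The sketch below is only an illustration in plain Python; the function name `least_squares_line` is made up for this example and is not part of scikit-learn.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line.

    Hypothetical helper for illustration only. It uses the closed form
    slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)**2).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points that lie exactly on y = 2x + 1 recover that line.
slope, intercept = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

Calling it with `df.x` and `df.y` from the plot above should agree with the scikit-learn values up to floating-point rounding.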

To include that in the plot, we overlay a text box on it:

g = sns.lmplot(x='x',y='y',data=df,fit_reg=True,aspect=1.5,ci=None, scatter_kws={"s": 100})
props = dict(boxstyle='round', alpha=0.5,color=sns.color_palette()[0])
textstr = '$y=-499.3 + 1.0x$'
g.ax.text(0.0, 0.0, textstr, transform=g.ax.transAxes, fontsize=14, bbox=props)
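
The numbers in the label are typed in by hand, so they go stale if the data change. One possible way to build the string from the fitted values is sketched below; `format_equation` and its `decimals` parameter are hypothetical names invented for this example.

```python
def format_equation(intercept, slope, decimals=1):
    """Build a label such as '$y=-499.3 + 1.0x$' from fitted parameters.

    Hypothetical helper for illustration only. The sign of the slope is
    written out separately so a negative slope does not read as '+ -0.5x'.
    """
    sign = '+' if slope >= 0 else '-'
    return f'$y={intercept:.{decimals}f} {sign} {abs(slope):.{decimals}f}x$'


textstr = format_equation(-499.28854278, 1.01360441)
print(textstr)
```

With the model fitted above, the arguments would be `regr.intercept_[0]` and `regr.coef_[0][0]`, because `y` was reshaped into a column and both attributes are therefore arrays.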


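You can, of course, place the box wherever you like. Because of transform=g.ax.transAxes, the coordinates are fractions of the axes area, with (0, 0) at the bottom-left corner and (1, 1) at the top-right: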

g.ax.text(0.7, 0.9, textstr, transform=g.ax.transAxes, fontsize=14, bbox=props)


Separating Thousands with Commas

Finally, to make the numbers on the axes more readable, we want to separate the thousands with commas. To do this we use the code:

import matplotlib.ticker
g.ax.get_xaxis().set_major_formatter(matplotlib.ticker.FuncFormatter(lambda x, p: format(int(x), ',')))
g.ax.get_yaxis().set_major_formatter(matplotlib.ticker.FuncFormatter(lambda x, p: format(int(x), ',')))
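
To see what the lambda does on its own, here is the same formatting logic as a named function, tried on a few tick values without drawing a plot. The name `comma_format` is made up for this sketch; the two-argument signature (value, position) mirrors what FuncFormatter passes to its callback.

```python
def comma_format(x, pos=None):
    """Format a tick value with commas as thousands separators.

    Hypothetical named version of the lambda above. int() truncates any
    fractional part, so this suits axes whose ticks are whole numbers.
    """
    return format(int(x), ',')


for value in (10000.0, -500.0, 1234567.8):
    print(comma_format(value))
```

If an axis has fractional ticks, a format such as `format(x, ',.1f')` keeps one decimal place while still adding the commas.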
