Google Correlate: A quick statistics lesson, or “How are behavioral finance and matrix algebra related?”

Google have just opened up Google Correlate, which lets you enter a search term and see which other search terms are correlated with it (at a weekly level). This could be interesting, but I expected lots of uselessly intuitive results, such as “disposition bias”, “richard thaler”, etc. To my surprise, that was not what I found…

As the tool is US-centric, I entered the American spelling of Behavioral Finance to see what came up:

Rufino Tamayo was a Mexican painter from Oaxaca. Spartina alterniflora is a type of saltmarsh grass. Hmm…

How are those two so highly associated with behavioral finance? A look at the scatterplot shows that one datapoint exerts a lot of leverage on the correlation.

Outlier in Google Correlate data
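To illustrate how much leverage a single week can have, here is a minimal Python sketch. The two series are made up for illustration (they are not the Google Correlate data), and `pearson` is a hand-written helper, not anything the tool exposes. The two series are close to uncorrelated until one shared spike of 10 is appended.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical weekly series: small fluctuations with no real relationship.
a = [0.1, -0.2, 0.3, -0.1, 0.2, -0.3, 0.1, -0.2, 0.0, 0.1]
b = [-0.2, 0.1, 0.1, -0.3, -0.1, 0.2, 0.3, 0.0, -0.1, 0.0]

r_without = pearson(a, b)

# Append one week in which both series spike to 10 (the assumed outlier).
r_with = pearson(a + [10.0], b + [10.0])

print(f"without spike: {r_without:.2f}")
print(f"with spike:    {r_with:.2f}")
```

In this toy example the correlation goes from roughly zero to nearly one on the strength of a single shared datapoint, which is the kind of effect the scatterplot suggests.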

Impressively, Google let you download the data so that you can check things yourself. So I downloaded the behavioral finance data series, and changed that 10 value to a 2 (still pretty high). This is sometimes called Winsorising the data, which reduces the leverage of outliers.
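I made the change by hand, but the same idea can be sketched in a few lines of Python. The function name `winsorise_upper`, the example series and the cap of 2 are all assumptions for illustration. A more conventional Winsorisation would pick the cap from a percentile of the data rather than choosing it by eye.

```python
def winsorise_upper(values, cap):
    """Return a copy of values with anything above cap replaced by cap."""
    return [min(v, cap) for v in values]

# Hypothetical weekly series with one extreme week (not the real data).
series = [0.1, -0.2, 0.3, 10.0, 0.2, -0.3]

capped = winsorise_upper(series, cap=2.0)
print(capped)  # [0.1, -0.2, 0.3, 2.0, 0.2, -0.3]
```

The spike is still the largest value in the series, so the week remains unusual, but it can no longer dominate the correlation on its own.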

I uploaded the resulting dataset, and let Google re-run the analysis. This time the results look more reasonable:

So what do we now correlate with?

First, note that the two highest items are gone. Apparently their association was mostly driven by one datapoint, and damping it drops them from the top 10. Checking the data is important.

However, it does not remove the association with Eigenvalue. While that makes me more comfortable that the association isn’t a data error, none of the resulting correlates make sense to me. Why are the weeks with more “behavioral finance” queries also the weeks with more “mitochondrial dna” queries? Why don’t related behavioral finance terms show up?

Results such as this usually suggest one of two things:

1. Your data or calculations are wrong somehow.

2. You’re on to something really interesting.


To be discovered….






