# Testing the Correlation between Time Series Variables

In this article, we examine whether gasoline prices are related to the two variables that people in Turkey most often believe drive them. The first is the price of Brent crude oil, averaged monthly in US dollars; the second is the US dollar exchange rate in Turkish lira (TL), also averaged monthly.

These variables appear in the dataset as brent and dollar, respectively. The dataset covers the period from 2013 to 2020. The first few rows of the data frame are shown below. (All code in this article is written in R.)

    head(df)
    #        date gasoline  brent dollar
    #1 2013-01-01     4.67 115.55 1.7589
    #2 2013-02-01     4.85 111.38 1.7985
    #3 2013-03-01     4.75 110.02 1.8090
    #4 2013-04-01     4.61 102.37 1.7930
    #5 2013-05-01     4.64 100.39 1.8756
    #6 2013-06-01     4.72 102.16 1.9288

A t-test is used to examine whether the population correlation coefficient is zero. It assumes that the data are normally distributed. When this assumption is violated, a non-parametric alternative is needed. Spearman’s rank correlation test fills that role here, because profit and price data are generally not normally distributed. Therefore, the Pearson correlation coefficient test is not appropriate for our dataset.

Spearman’s rank correlation test measures the correlation between two variables using the ranks of the observations rather than their raw values. Like the Pearson correlation coefficient, the Spearman coefficient $\rho_s$ takes values between $-1$ and $+1$. The two-tailed hypothesis test is stated as:
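One common way to check the normality assumption before choosing a test is the Shapiro-Wilk test, available in base R as shapiro.test. The sketch below uses simulated right-skewed values as a stand-in for a price series, so the variable names and numbers are illustrative assumptions and not the article's data.

```r
# Illustrative only: simulated right-skewed values stand in for a price series.
set.seed(42)
prices <- rlnorm(84, meanlog = 1.5, sdlog = 0.6)  # hypothetical data, n = 84

# Shapiro-Wilk test: H0 is that the sample comes from a normal distribution.
sw <- shapiro.test(prices)
sw$p.value

# A small p-value (for example, below 0.05) is evidence against normality,
# which argues for a rank-based test such as Spearman's.
normal_looking <- sw$p.value >= 0.05
normal_looking
```

With the real data, the same call could be applied to each column in turn, for example shapiro.test(df$gasoline).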

$H_0: \rho_s=0$

$H_A: \rho_s \ne 0$

First, the sample Spearman rank correlation coefficient $r_s$ is calculated; this takes a couple of steps.

• Gasoline prices are ranked from smallest to largest. Tied observations each receive the average of the ranks they would occupy, and the ranking then continues from the next position. The same process is applied to Brent prices.

    library(dplyr)

    df_spearman <- df %>%
      mutate(
        rank_gasoline = rank(gasoline),
        rank_brent = rank(brent),
        d = rank_gasoline - rank_brent,
        d_square = d^2
      ) %>%
      select(-dollar)

    head(df_spearman)
    #        date gasoline  brent rank_gasoline rank_brent   d d_square
    #1 2013-01-01     4.67 115.55            23         84 -61     3721
    #2 2013-02-01     4.85 111.38            34         81 -47     2209
    #3 2013-03-01     4.75 110.02            27         79 -52     2704
    #4 2013-04-01     4.61 102.37            20         67 -47     2209
    #5 2013-05-01     4.64 100.39            21         65 -44     1936
    #6 2013-06-01     4.72 102.16            26         66 -40     1600
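The tie handling described in the ranking step can be seen on a tiny made-up vector. This is only a sketch with hypothetical values; it relies on the fact that base R's rank function averages tied ranks by default.

```r
# Hypothetical prices containing one tie (4.67 appears twice).
x <- c(4.67, 4.85, 4.67, 4.61)

# rank() defaults to ties.method = "average": the tied values would occupy
# positions 2 and 3, so each receives (2 + 3) / 2 = 2.5.
r <- rank(x)
r
# [1] 2.5 4.0 2.5 1.0
```

After the tie, the next value (4.85) receives rank 4, which is the "continues from the next position" behaviour.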

• The difference between the two ranks of each paired observation, $d_i$, is calculated. Because both rank columns add up to $n(n+1)/2$, these differences always sum to zero:

$\Sigma d_i=0$

    sum(df_spearman$d)
    #[1] 0

• Then the squared differences are summed:

$\Sigma d_i^2=69107$

    d_square_sum <- sum(df_spearman$d_square)
    d_square_sum
    #[1] 69107

The sample Spearman rank correlation coefficient $r_s$ is given by:

$r_s=1- \frac {6\Sigma d_i^2} {n(n^2-1)}$

    n <- nrow(df)
    rho_s <- (1 - (6 * d_square_sum) / (n * (n^2 - 1))) %>% round(2)
    rho_s
    #[1] 0.3
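As a cross-check, the formula can be compared with base R's cor function. The sketch below uses only the six rows printed by head(df) earlier, stored in a data frame named first_rows that is made up for this example, so its value differs from the full-sample 0.3. The shortcut formula is exact when there are no ties, which holds for these six rows.

```r
# The six rows shown in the head(df) output; not the full dataset.
first_rows <- data.frame(
  gasoline = c(4.67, 4.85, 4.75, 4.61, 4.64, 4.72),
  brent    = c(115.55, 111.38, 110.02, 102.37, 100.39, 102.16)
)

# Manual calculation with the shortcut formula.
d <- rank(first_rows$gasoline) - rank(first_rows$brent)
n <- nrow(first_rows)
r_s_formula <- 1 - (6 * sum(d^2)) / (n * (n^2 - 1))

# Built-in calculation: Pearson correlation computed on the ranks.
r_s_builtin <- cor(first_rows$gasoline, first_rows$brent, method = "spearman")

# Both equal 3/7, about 0.4286, for these six rows.
c(formula = r_s_formula, builtin = r_s_builtin)
```

When the data contain ties, the two values can differ slightly, and the built-in result is generally the one to trust.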

This result shows a weak positive relationship between gasoline and Brent prices. Let’s test whether it is significant at the 5% level, that is, whether the null hypothesis can be rejected in favor of the alternative.

In the table of critical values for the Spearman rank correlation coefficient, the entry for $\alpha=0.05$ and $n=84$ is 0.215. Because $r_s=0.3 \geq 0.215$, the null hypothesis $H_0: \rho_s=0$ is rejected: at the 5% significance level, we can say that there is a positive, although weak, relationship between gasoline and Brent prices.

Let’s check the result another way by calling the ggscatter function from the ggpubr package.
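For readers without a critical-value table at hand, a large-sample approximation gives nearly the same cutoff. This sketch assumes that, under the null hypothesis, $r_s\sqrt{n-1}$ is approximately standard normal when $n$ is large; that assumption is introduced here for illustration and is not stated in the article, although it reproduces the 0.215 cutoff.

```r
n     <- 84    # number of monthly observations
r_s   <- 0.3   # sample Spearman coefficient computed earlier
alpha <- 0.05  # two-tailed significance level

# Approximate critical value: z_(1 - alpha/2) / sqrt(n - 1)
critical_value <- qnorm(1 - alpha / 2) / sqrt(n - 1)
round(critical_value, 3)
# [1] 0.215

# Decision rule: reject H0 when |r_s| is at least the critical value.
reject_h0 <- abs(r_s) >= critical_value
reject_h0
# [1] TRUE
```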

    library("ggpubr")

    # Brent is quoted in US dollars and gasoline in Turkish lira
    ggscatter(df, x = "brent", y = "gasoline",
              color = "blue", cor.coef = TRUE,
              cor.method = "spearman",
              xlab = "Brent (USD)", ylab = "Gasoline (TL)")

As the chart above shows, Spearman’s rank correlation coefficient (R=0.3) matches the value we calculated earlier, and the p-value (0.0055) is below the 0.05 significance level, so we reject the null hypothesis in favor of the alternative $H_A: \rho_s \ne 0$.

Finally, we examine the relationship between gasoline prices and the dollar exchange rate (USD/TRY).
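The coefficient and p-value printed on the plot can also be obtained without drawing anything, using cor.test from base R's stats package. The sketch below uses only the six rows from the head(df) output, stored in a made-up first_rows data frame, so its numbers will not match the full-sample values quoted above. With the full data frame, one would pass df$brent and df$gasoline instead.

```r
# The six rows shown in the head(df) output; not the full dataset.
first_rows <- data.frame(
  gasoline = c(4.67, 4.85, 4.75, 4.61, 4.64, 4.72),
  brent    = c(115.55, 111.38, 110.02, 102.37, 100.39, 102.16)
)

# Two-sided Spearman rank correlation test (H0: rho_s = 0).
result <- cor.test(first_rows$brent, first_rows$gasoline,
                   method = "spearman", alternative = "two.sided")

result$estimate  # sample coefficient, labelled "rho"
result$p.value   # p-value of the test
```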

    ggscatter(df, x = "dollar", y = "gasoline",
              color = "red", cor.coef = TRUE,
              cor.method = "spearman",
              xlab = "USD/TRY", ylab = "Gasoline (TL)")

The plot above suggests a strong positive relationship between gasoline prices and the dollar exchange rate. The p-value is below 0.05, which indicates that the result is significant, so the null hypothesis is rejected once again.

