This is a short article about 3 useful functions I made to get stocks data and return a normalized graph of their returns.

In this case, we are especially interested in regional ETF. ETF’s that are specifically about one part of the world as oppose to those that focus on a particular economic sector.

We’ll use the tidyverse library that encompass all the other library we need for this task.


Getting the data

First I created a list of ETF with their ticker and their region. As I like to keep things neat and I live in a place where internet is not a cheap commodity, I’ll save everything as a .csv file into a separate folder.

The file with our ETF list is called etf_country.csv and is accessible here. We download the data using the getSymbols function from the quantmod package and save everything in a separate folder called ETF_Data.

tickers <- read_csv("etf_country.csv", col_names = TRUE)
startDate = "2000-01-01"

for (i in 1:nrow(tickers)){
  data = quantmod::getSymbols(Symbols = tickers$ticker[i],
                     src = "yahoo",
                     from = startDate,
                     auto.assign = FALSE,
  colnames(data) = c("Open", "High", "Low", "Close", "Volume", "Adjusted")
  write.zoo(data, paste0("ETF_Data/", tickers$ticker[i], ".csv"), sep = ",", row.names = FALSE)

Create a Rate of Change graph

What we want to create is a graph that compare the returns of several ETF within a given time frame.

First we create a function to read the .csv file of the regional ETF we want to compare and filter the results for our given dates defined by a start_date and an end_date. Our function takes 4 arguments: * a vector containing all the ETF to compare * the path where to read these ETF * a start date * an end date

As an output the function returns a data frame (tibble) with the dates and the returns.

read_fin_data <- function(tickers, 
                          thePath = "ETF_Data/", 
                          start_date, end_date) {
  df <- read_csv(paste0(thePath, tickers, ".csv"))
  df <- df %>% as_tibble() %>% select(Index, Adjusted)
  df$Index <- ymd(df$Index)
  df <- df %>% filter(Index <= end_date) %>% 
          filter(Index >= start_date) %>% 
          mutate(adj_return = Adjusted / first(Adjusted) - 1)

The sole purpose of that function is to read the ETF price data and filter them within our 2 dates. So we will map that function to each of our ETF ticker we want to compare using the map function from the purrr package.

graph_return_region <- function(tickers,
                                start_date, end_date) {
  df <- tibble(ticker = tickers)
  etf_list <- read_csv("etf_country.csv") %>% select(ticker, country)
  df <- df %>% left_join(etf_list)
  df <- df %>% mutate(price_data = map(ticker, function(.x) read_fin_data(.x, 
                                                                          start_date = start_date, 
                                                                          end_date = end_date)))
  df <- df %>% unnest()
  ggplot(df, aes(x = Index, y = adj_return, colour = country)) + 
    geom_line() + 
    labs(x = "Dates", y = "Adjusted Return", 
         title = "ETF Comparative Returns") + 
    theme(legend.position = "top")

Let’s check out how it works out for the Asian ETF for the last 3 months.

asia <- c("EWH", "EWM", "EWS", "EWY")
start_date <- today() - 90
end_date <- today()
graph_return_region(asia, start_date, end_date)