Introduction

In this assignment, we explore the World Development Indicators data and visualize it with graphs.

The first chunk calls the packages that we need in this assignment.

The second code chunk automatically retrieves the latest data from the World Development Indicators database, for use in the assignment.

Step 1: library calls to load packages

library(tidyverse)
library(leaflet)
library(WDI)

Step 2: Call the WDI package to retrieve the most recent figures available.

In this assignment, we will fetch ten data series from the WDI:

Tableau Name                                    WDI Series
Birth Rate                                      SP.DYN.CBRT.IN
Infant Mortality Rate                           SP.DYN.IMRT.IN
Internet Usage                                  IT.NET.USER.ZS
Life Expectancy (Total)                         SP.DYN.LE00.IN
Forest Area (% of land)                         AG.LND.FRST.ZS
Mobile Phone Usage                              IT.CEL.SETS.P2
Population Total                                SP.POP.TOTL
International Tourism receipts (current US$)    ST.INT.RCPT.CD
Import value index (2000=100)                   TM.VAL.MRCH.XD.WD
Export value index (2000=100)                   TX.VAL.MRCH.XD.WD

The next code chunk will call the WDI API and fetch the years 1998 through 2018, as available. You will find that only a few variables have data for 2018. The dataframe will also contain the longitude and latitude of the capital city in each country.

Note: This notebook will take approximately 2 minutes to run. The WDI call is time-consuming, as is the process of knitting the file. Be patient.

The World Bank uses a complex, non-intuitive scheme for naming variables. For example, the Birth Rate series is called SP.DYN.CBRT.IN. The code assigns variable names that are more intuitive than the codes assigned by the World Bank, and converts the geocodes from factors to numbers.

In your code, you will use the data frame called countries.

birth <- "SP.DYN.CBRT.IN"
infmort <- "SP.DYN.IMRT.IN"
net <- "IT.NET.USER.ZS"
lifeexp <- "SP.DYN.LE00.IN"
forest <- "AG.LND.FRST.ZS"
mobile <- "IT.CEL.SETS.P2"
pop <- "SP.POP.TOTL"
tour <- "ST.INT.RCPT.CD"
import <- "TM.VAL.MRCH.XD.WD"
export <- "TX.VAL.MRCH.XD.WD"

# create a vector of the desired indicator series
indicators <- c(birth, infmort, net, lifeexp, forest,
                mobile, pop, tour, import, export)

countries <- WDI(country="all", indicator = indicators, 
     start = 1998, end = 2018, extra = TRUE)

# rename columns for ease of reference
countries <- rename(countries, birth = SP.DYN.CBRT.IN, 
       infmort = SP.DYN.IMRT.IN, net  = IT.NET.USER.ZS,
       lifeexp = SP.DYN.LE00.IN, forest = AG.LND.FRST.ZS,
       mobile = IT.CEL.SETS.P2, pop = SP.POP.TOTL, 
       tour = ST.INT.RCPT.CD, import = TM.VAL.MRCH.XD.WD,
       export = TX.VAL.MRCH.XD.WD)

# convert geocodes from factors into numerics

countries$lng <- as.numeric(as.character(countries$longitude))
countries$lat <- as.numeric(as.character(countries$latitude))

# Remove groupings, which have no geocodes
countries <- countries %>%
   filter(!is.na(lng))
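
The geocode conversion above goes through as.character() before as.numeric(). The sketch below suggests why, using a made-up toy factor (the values are illustrative and not taken from the WDI result): calling as.numeric() directly on a factor returns the internal level codes rather than the coordinates.

```r
# Toy factor standing in for the longitude column (illustrative values only)
lon_factor <- factor(c("1.5218", "-77.0369", "1.5218"))

# Direct conversion returns the integer level codes, not the coordinates
wrong <- as.numeric(lon_factor)

# Converting to character first recovers the actual numbers
right <- as.numeric(as.character(lon_factor))

print(wrong)
print(right)
```

If the longitude and latitude columns already arrive as character or numeric, the same two-step conversion should still be harmless, since as.character() leaves the printed values unchanged.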

A glimpse of the new dataframe

glimpse(countries)
## Observations: 4,410
## Variables: 22
## $ iso2c     <chr> "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD", "AD", …
## $ country   <chr> "Andorra", "Andorra", "Andorra", "Andorra", "Andorra",…
## $ year      <int> 2018, 2007, 2004, 2005, 2017, 1998, 1999, 2000, 2006, …
## $ birth     <dbl> 7.200, 10.100, 10.900, 10.700, NA, 11.900, 12.600, 11.…
## $ infmort   <dbl> 2.7, 4.5, 5.1, 4.9, 2.8, 6.4, 6.2, 5.9, 4.7, 5.5, 5.3,…
## $ net       <dbl> NA, 70.870000, 26.837954, 37.605766, 91.567467, 6.8862…
## $ lifeexp   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ forest    <dbl> NA, 34.042553, 34.042553, 34.042553, NA, 34.042553, 34…
## $ mobile    <dbl> 107.28255, 76.80204, 76.55160, 81.85933, 104.33241, 22…
## $ pop       <dbl> 77006, 82684, 76244, 78867, 77001, 64142, 64370, 65390…
## $ tour      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ import    <dbl> 136.50668, 190.30053, 174.09246, 178.06349, 146.27331,…
## $ export    <dbl> 268.35043, 332.78037, 271.81148, 314.89205, 264.92993,…
## $ iso3c     <fct> AND, AND, AND, AND, AND, AND, AND, AND, AND, AND, AND,…
## $ region    <fct> Europe & Central Asia, Europe & Central Asia, Europe &…
## $ capital   <fct> Andorra la Vella, Andorra la Vella, Andorra la Vella, …
## $ longitude <fct> 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218…
## $ latitude  <fct> 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, …
## $ income    <fct> High income, High income, High income, High income, Hi…
## $ lending   <fct> Not classified, Not classified, Not classified, Not cl…
## $ lng       <dbl> 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218, 1.5218…
## $ lat       <dbl> 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, 42.5075, …

Graphing and Comments

Beyond this line, you will insert your original code, following the instructions in the assignment.

Plot from Phase 1

In this part, we extracted data from the World Indicators file. We compared five major countries (the United States, China, Brazil, Russia, and India) on Internet usage, CO2 emissions, and health expenditure as a percentage of GDP from 2000 to 2012. We plotted each country's value for each of the three indicators in each year and connected the points with a line to show the trend. We arranged the 15 panels in a grid with a shared x-axis (time) and three different y-axes ('Internet.Usage', 'CO2.Emissions', 'Health.Exp...GDP') to highlight the differences between the graphs.

library('ggplot2')
library('grid')
World.Indicators <- read.csv("World Indicators.csv")
# Keep only the five countries of interest
country_names <- c('United States', 'Brazil', 'Russian Federation', 'India', 'China')
fivecountries <- World.Indicators[World.Indicators$Country %in% country_names, ]
# Keep the identifying columns and the three indicators
cols <- c('Country', 'Year', 'Internet.Usage', 'CO2.Emissions', 'Health.Exp...GDP')
newdata <- fivecountries[, cols]
# Strip percent signs and thousands separators so the values parse as numbers
newdata$Internet.Usage <- as.numeric(gsub("[\\%,]", "", newdata$Internet.Usage))
newdata$Health.Exp...GDP <- as.numeric(gsub("[\\%,]", "", newdata$Health.Exp...GDP))
# Reshape to long format: one row per country, year, and indicator
newdata2 <- gather(newdata, key = "Indicator", value = "Value", Internet.Usage, CO2.Emissions, Health.Exp...GDP)
ggplot(newdata2, aes(x = Year, y = Value)) + 
  geom_line(aes(group = Country)) + 
  geom_point()+
  theme(axis.text.x = element_text(angle = 90))+
  ggtitle('Trends in Internet Usage, CO2 Emissions, Health Exp % GDP for United States, Brazil, Russian Federation, India, and China')+
  facet_grid(Indicator~Country,scales='free')
## Warning: Removed 10 rows containing missing values (geom_point).
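
The cleaning and reshaping steps in the chunk above can be tried on a small scale. The following is a minimal sketch on a made-up two-row data frame (the values are invented and do not come from the World Indicators file). It uses pivot_longer(), which tidyr documents as the newer replacement for gather(); this is a substitution for illustration, not the function used in the original chunk.

```r
library(tidyr)

# Made-up values for illustration only
toy_wide <- data.frame(
  Country          = c("India", "China"),
  Year             = c(2000, 2000),
  Internet.Usage   = c("1%", "2%"),
  CO2.Emissions    = c(10, 20),
  stringsAsFactors = FALSE
)

# Strip percent signs and thousands separators, then convert to numeric
toy_wide$Internet.Usage <- as.numeric(gsub("[%,]", "", toy_wide$Internet.Usage))

# Wide to long: one row per country, year, and indicator
toy_long <- pivot_longer(
  toy_wide,
  cols      = c(Internet.Usage, CO2.Emissions),
  names_to  = "Indicator",
  values_to = "Value"
)

print(toy_long)
```

The long format is what lets facet_grid(Indicator ~ Country) draw one panel per indicator and country from a single Value column.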

World map showing a variable in 1998

Steps to create the chart:
1. We selected population and filtered the data on year = 1998. We divided population into four groups based on the distribution of the dataset: (0, 2 mil), (2 mil, 10 mil), (10 mil, 50 mil), and (50 mil, 20 bil).
2. We picked a color for each of the four groups. We chose equally prominent colors so that no group appears more important than another because of its color.
3. We selected addProviderTiles(provider = "CartoDB") as our presentation style because it provides a cleaner look than addTiles.
4. We plotted a circle at the location of each country. We included both the country's name and its population in each circle using the "label" argument.
5. Finally, we included the population ranges, colors, and a title in a legend for readers' reference.

What we notice:
1. Island countries had much smaller populations than countries on continents, except for Japan.
2. Over 70 countries had a population of less than 2 million.
3. Many European countries had a population between 2 million and 10 million.
4. Many countries with a coastline placed their capital near it.
5. Australia and Canada have large territories but had fewer than 50 million people each.

countries1998 <- countries %>%
        filter(year == 1998)

poprange1998 <- cut(countries1998$pop,
                breaks = c(0,2000000,10000000,50000000,20000000000),right=FALSE,
                labels = c("<2m","2m~10m","10m~50m",">50m"))
plot(poprange1998)

pal = colorFactor(palette = c("yellow","blue","red","black"), domain = poprange1998)

leaflet(countries1998) %>%
  addProviderTiles(provider = 'CartoDB') %>%
  addCircleMarkers(data = countries1998, lng = countries1998$lng, lat = countries1998$lat,
                     radius = ~5, label = paste(countries1998$country,countries1998$pop), color = ~pal(poprange1998)) %>%
  addLegend(pal = pal, values = ~poprange1998, position = "bottomright", opacity = 1, title = "Population in 1998")
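
To see how the binning step treats values that fall exactly on a boundary, here is a small sketch on hand-picked toy populations (only the first value echoes the glimpse output; the rest are invented). With right = FALSE each interval is closed on the left, so a country with exactly 2,000,000 people lands in "2m~10m", and one with exactly 50,000,000 lands in ">50m".

```r
# Hand-picked toy populations, including two exact boundary values
toy_pop <- c(64142, 2000000, 9999999, 50000000, 1300000000)

# Same breaks and labels as the chunk above
toy_range <- cut(toy_pop,
                 breaks = c(0, 2000000, 10000000, 50000000, 20000000000),
                 right  = FALSE,
                 labels = c("<2m", "2m~10m", "10m~50m", ">50m"))

print(data.frame(pop = toy_pop, range = toy_range))
```

Strictly speaking, the last label therefore covers populations of 50 million or more, not only those above 50 million.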

World map showing the same variable recently

We repeated the same steps for the year 2018.

The main difference between 1998 and 2018 is that populations have grown. The number of countries with fewer than 2 million people decreased, while the number of countries with between 10 million and 50 million people, and with more than 50 million people, increased significantly. Countries whose populations increased are generally in the Middle East and Africa.

countries2018 <- countries %>%
        filter(year == 2018)

poprange2018 <- cut(countries2018$pop,
                breaks = c(0,2000000,10000000,50000000,20000000000),right=FALSE,
                labels = c("<2m","2m~10m","10m~50m",">50m"))
plot(poprange2018)

pal = colorFactor(palette = c("yellow","blue","red","black"), domain = poprange2018)

leaflet(countries2018) %>%
  addProviderTiles(provider = 'CartoDB') %>%
  addCircleMarkers(data = countries2018, lng = countries2018$lng, lat = countries2018$lat,
                     radius = ~5, label = paste(countries2018$country,countries2018$pop), color = ~pal(poprange2018)) %>%
  addLegend(pal = pal, values = ~poprange2018, position = "bottomright", opacity = 1, title = "Population in 2018")
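
One way to back up the comparison between the two maps with numbers is to count countries per population range in each year. The sketch below does this on a hypothetical toy data frame (countries "A", "B", "C" and their populations are invented); in the notebook, the same cut() and table() calls could presumably be applied to the countries data frame filtered to 1998 and 2018.

```r
# Hypothetical toy data; values are invented for illustration
toy <- data.frame(
  country = rep(c("A", "B", "C"), times = 2),
  year    = rep(c(1998, 2018), each = 3),
  pop     = c(1.5e6, 8e6, 4e7, 2.5e6, 1.2e7, 6e7)
)

# Same breaks and labels as the map chunks
pop_breaks <- c(0, 2000000, 10000000, 50000000, 20000000000)
pop_labels <- c("<2m", "2m~10m", "10m~50m", ">50m")
toy$range  <- cut(toy$pop, breaks = pop_breaks, labels = pop_labels, right = FALSE)

# Rows are population ranges, columns are years
bin_counts <- table(toy$range, toy$year)
print(bin_counts)
```

Reading across a row of the resulting table shows whether a range gained or lost countries between the two years.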