Kpop Data Analysis
02/2021 - 04/2021
Part 1: Kpop Explained by Data
Analyzed data on all K-pop idols from the industry's beginnings to 2021, covering the K-pop industry, its artists and its companies
Part 2: Kpop Companies Explained by Data
Visualized the business performance of public K-pop companies and analyzed their artist management and international marketing strategies
Here is an interactive visualization of the revenue and net income of Kpop agencies from 2016 to 2020. Hovering over the line for a given year shows a hover box with the revenue or net income of every company in that year.
Part 3: International Kpop Artists
In my previous Kpop data analysis project, I realized that the dataset contained some errors in the nationalities of Kpop artists. I corrected the data and made a clearer visualization of international Kpop artists using Python and Plotly. This is an interactive choropleth map of Kpop stars' nationalities other than South Korea. Hovering over a country shows an information box with the number of Kpop stars from that country.
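Before drawing a choropleth, the per-country counts have to be aggregated with the nationality fixes applied. The sketch below shows one way to do that with the standard library. It assumes the data arrives as hypothetical `(artist, country)` pairs and that fixes are supplied as a hypothetical `{artist: corrected_country}` mapping; neither reflects the project's actual data layout. The resulting counts could then be passed to Plotly's `plotly.express.choropleth` with `locationmode="country names"`.

```python
from collections import Counter

def count_international_artists(records, corrections=None, home="South Korea"):
    """Return a Counter of artists per country, excluding `home`.

    records: iterable of (artist_name, country) pairs (hypothetical layout).
    corrections: optional {artist_name: corrected_country} overrides for
    rows whose nationality is known to be wrong (hypothetical).
    """
    corrections = corrections or {}
    counts = Counter()
    for artist, country in records:
        country = corrections.get(artist, country)  # apply manual fix, if any
        if country and country != home:
            counts[country] += 1
    return counts

# Made-up sample rows, not the project's dataset
sample = [("Artist A", "Japan"), ("Artist B", "South Korea"),
          ("Artist C", "South Korea"), ("Artist D", "China")]
counts = count_international_artists(sample, corrections={"Artist C": "Australia"})
```

Applying corrections before filtering matters: an artist wrongly listed as South Korean would otherwise be dropped from the map entirely.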
Part 4: Kpop on YouTube Explained by Data
As Kpop becomes increasingly international, YouTube plays a pivotal role as the digital platform through which Kpop idols share their music videos with audiences all over the world. View count is a key metric of a music video's international popularity. I extracted data on all Kpop music videos from the Kpop Database and scraped the view counts of all 4,262 music videos from YouTube as of 04/05/2021.
Part 5: Why Do Kpop Groups Have So Many Members?
On average, Kpop groups have 5.5 members, and five-member groups are the most common. So why do some Kpop groups grow so large? The largest, NCT, has 23 members. I performed an exploratory data analysis of Kpop group sizes over time.
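The two headline numbers are a mean (average size) and a mode (most common size), and "over time" suggests grouping by debut year. Here is a minimal standard-library sketch. It assumes hypothetical `(debut_year, member_count)` input pairs; the sample values are invented and are not the project's data.

```python
from collections import defaultdict
from statistics import mean, mode

def summarize_group_sizes(groups):
    """Summarize group sizes.

    groups: iterable of (debut_year, member_count) pairs (hypothetical layout).
    Returns (overall mean, most common size, {year: mean size}).
    """
    groups = list(groups)  # allow generators, since we iterate twice
    sizes = [n for _, n in groups]
    by_year = defaultdict(list)
    for year, n in groups:
        by_year[year].append(n)
    per_year = {year: mean(ns) for year, ns in sorted(by_year.items())}
    return mean(sizes), mode(sizes), per_year

avg, most_common, per_year = summarize_group_sizes(
    [(2010, 5), (2010, 4), (2012, 5), (2012, 9)]  # made-up sample
)
```

A single very large group, such as NCT, pulls the mean above the mode, which is one reason to report both.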
Machine Learning in Python
Sentiment Analysis of Movie Reviews
01/2021
When you have a large number of movie reviews, how can you tell whether they are compliments or criticisms? In this project, I used natural language processing tools to classify the sentiment of text with both shallow learning and deep learning, and performed sentiment analysis on a dataset of IMDb reviews.
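To show what a "shallow learning" sentiment classifier looks like, here is a small multinomial naive Bayes model written from scratch. This is an illustrative choice, not necessarily one of the models used in the project. The tokenizer and the toy training sentences are assumptions made for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)

# Toy training data, invented for illustration
texts = ["a wonderful, moving film", "great acting and a great story",
         "boring and predictable plot", "terrible acting, a waste of time"]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayesSentiment().fit(texts, labels)
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow on long reviews.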
Dimension Reduction with PCA
01/2021
In this article, I use Principal Component Analysis (PCA) to demonstrate dimension reduction on the 'banknote authentication' dataset.
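The core of PCA is to center the data, find the orthogonal directions of greatest variance, and project onto the top few. A compact NumPy sketch does this with an SVD. The data here is synthetic: four strongly correlated columns stand in for a 4-feature dataset like banknote authentication. It is not the article's code.

```python
import numpy as np

def pca(X, n_components=2):
    """Project X (n_samples x n_features) onto its top principal components.

    Returns the projected data and the explained-variance ratio of the
    kept components.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centered features
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_var = S**2 / (X.shape[0] - 1)
    ratio = explained_var / explained_var.sum()
    components = Vt[:n_components]
    return X_centered @ components.T, ratio[:n_components]

# Synthetic stand-in for a 4-feature dataset (not the banknote data)
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X_demo = np.column_stack([t, 2 * t, -t, 0.5 * t]) + 0.01 * rng.normal(size=(200, 4))
Z, ratio = pca(X_demo, n_components=2)
```

When features are on different scales, standardizing each column before calling `pca` keeps one large-variance feature from dominating the components.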
Social Media Analytics
11/2020
Scraped Trump's and Biden's recent tweets, then analyzed people's responses to them and inspected the tweets' content
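One simple way to inspect tweet content is to count the most frequent words after dropping links and common stop words. The sketch below is an assumed approach, not necessarily what the project did. The stop-word list is a small hypothetical one, and the sample tweets are invented.

```python
import re
from collections import Counter

# Small, assumed stop-word list; a real analysis would use a fuller one
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is",
              "for", "on", "we", "our", "will"}

def top_words(tweets, n=10):
    """Most frequent non-stop-word tokens across a list of tweet texts."""
    counts = Counter()
    for tweet in tweets:
        text = re.sub(r"https?://\S+", " ", tweet.lower())  # drop links
        tokens = re.findall(r"[#@]?[a-z']+", text)         # keep #hashtags and @mentions
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

sample_tweets = ["Vote today! https://example.com", "We will vote and win", "Vote for jobs"]
```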
Calculating π by Monte-Carlo Simulation
07/2021
As we learn more math, we discover more ways to calculate π. In computational statistics, π can be estimated by brute force using Monte Carlo simulation. In this article, I run a simple Monte Carlo simulation to estimate π as the area of the unit circle. The same method can estimate the area of any geometric shape.
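The idea is to sample points uniformly in a bounding box and count how many fall inside the shape. The hit fraction times the box's area estimates the shape's area. Because the unit circle has area π, applying this to the square [-1, 1] × [-1, 1] estimates π. This is a minimal sketch; the function names are my own.

```python
import random

def estimate_area(is_inside, x_range, y_range, n_samples, seed=None):
    """Monte Carlo estimate of the area of the region where is_inside(x, y) is True.

    Samples points uniformly in the bounding box x_range x y_range and scales
    the box's area by the fraction of points that land inside the region.
    """
    rng = random.Random(seed)
    (x0, x1), (y0, y1) = x_range, y_range
    hits = sum(
        1 for _ in range(n_samples)
        if is_inside(rng.uniform(x0, x1), rng.uniform(y0, y1))
    )
    return (x1 - x0) * (y1 - y0) * hits / n_samples

def estimate_pi(n_samples, seed=None):
    """The unit circle has area pi, so its estimated area estimates pi."""
    return estimate_area(lambda x, y: x * x + y * y <= 1.0,
                         (-1.0, 1.0), (-1.0, 1.0), n_samples, seed)
```

The error shrinks roughly in proportion to 1/√n, so each extra correct digit costs about 100 times more samples. Passing a different `is_inside` predicate, for example `lambda x, y: y <= x` on the unit square, estimates other shapes.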
Data Analytics in R
Analysis on Tropical Atmosphere Ocean Data
01/2020 - 04/2020
- Analyzed and manipulated a database of 96k+ measurements of the El Niño effect in the equatorial Pacific, using R and Trifecta
- Clustered the data into groups, applied logistic regression and hypothesis testing to identify the measures relevant to the El Niño effect, and classified the measures by buoy for further study of the El Niño effect
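Logistic regression models the probability of a binary outcome as the sigmoid of a linear combination of features. The project was done in R; the sketch below illustrates the same technique in plain Python with batch gradient descent on the mean log-loss. The single-feature inputs and 0/1 labels are hypothetical stand-ins for buoy measurements and an El Niño indicator, not the project's data.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on mean log-loss. X: list of feature lists, y: 0/1 labels."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical 1-feature example
X_demo = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y_demo = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X_demo, y_demo)
```

In R, the equivalent fit is usually a single `glm(..., family = binomial)` call. The hand-rolled loop here only makes the gradient explicit.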
Analysis of Secondary Education and Teen Fertility
04/2020
- Analyzed the World Bank's dataset of countries' secondary school enrollment rates and teen fertility rates using OLS regression and a difference-in-differences estimate
- Found that improving a country’s secondary school enrollment rate lowers its teen fertility rate.
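A difference-in-differences estimate compares the before/after change in a treated group with the same change in a control group. In the simple two-group, two-period case, this equals the OLS coefficient on the `treated × post` interaction term. The sketch below computes it from cell means. The row layout `(treated, post, outcome)` and the sample values are hypothetical, for example countries with an enrollment increase (treated) versus without, and teen fertility as the outcome.

```python
from statistics import mean

def did_estimate(rows):
    """Difference-in-differences from raw observations.

    rows: iterable of (treated: bool, post: bool, outcome: float).
    Returns (treated_post - treated_pre) - (control_post - control_pre),
    using the mean outcome in each of the four cells. Raises
    statistics.StatisticsError if any cell is empty.
    """
    cells = {(t, p): [] for t in (True, False) for p in (True, False)}
    for treated, post, outcome in rows:
        cells[(bool(treated), bool(post))].append(outcome)
    m = {k: mean(v) for k, v in cells.items()}
    return (m[(True, True)] - m[(True, False)]) - (m[(False, True)] - m[(False, False)])

# Invented numbers: treated fall from 50 to 40, controls from 48 to 46
demo_rows = [(True, False, 50.0), (True, True, 40.0),
             (False, False, 48.0), (False, True, 46.0)]
```

Subtracting the control group's change removes trends common to both groups, which a simple before/after comparison would attribute to the treatment.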
Visualization of World Indicator Data
11/2019
- Explored the World Indicator data and visualized it with graphs in R
- Compared and visualized the development of Internet usage, CO2 emissions and health expenditure as a percentage of GDP in the US, China, Brazil, Russia and India from 2000 to 2012
- Compared and visualized the distribution of world population in 1998 and 2018
Visualization in Tableau
Visualization of Delayed Domestic Flights
11/2019
- Analyzed and manipulated a database of 13k+ delayed US domestic flights on a single day using SQL
- Visualized the results in an interactive Tableau map that draws delayed flights as lines between airports
- Showcased the distribution and scale of delayed flights in the US, details for each flight, and insights from mapping delayed flights, for travelers’ reference
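A typical SQL step for this kind of analysis is aggregating delays per origin airport. The sketch below runs such a query through Python's built-in `sqlite3` against an in-memory table. The table name `flights` and the columns `origin`, `destination` and `arrival_delay` (in minutes) are hypothetical, as are the sample rows. The project's actual schema and queries are not shown here.

```python
import sqlite3

def delays_by_origin(conn, limit=5):
    """Count delayed flights (arrival_delay > 0) per origin airport, most delays first."""
    query = """
        SELECT origin, COUNT(*) AS n_delayed, AVG(arrival_delay) AS avg_delay
        FROM flights
        WHERE arrival_delay > 0
        GROUP BY origin
        ORDER BY n_delayed DESC, origin
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()

# Hypothetical schema and rows, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (origin TEXT, destination TEXT, arrival_delay INTEGER)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", [
    ("DFW", "LAX", 25), ("DFW", "JFK", 10), ("JFK", "LAX", 40), ("LAX", "DFW", -5),
])
rows = delays_by_origin(conn)
```

Binding `limit` as a `?` parameter rather than formatting it into the string keeps the query safe from SQL injection.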
The application shows domestic flights in the United States on 1 January 2015. The data includes each airport's longitude and latitude and the distance between origin and destination airports. Each line changes color from its origin airport to its destination airport. The visualization shows that flights within the contiguous 48 states are more frequent than flights to and from the outlying states and territories. Within the 48 states, major hubs such as DFW, JFK and LAX handle a huge number of arriving and departing flights. Among the outlying states and territories, flights to and from Hawaii and Puerto Rico are more frequent than those to and from Alaska and Guam, probably because of their popularity as tourist destinations.
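When only airport coordinates are available, a common way to approximate the distance between two airports is the haversine (great-circle) formula. The dataset's own distance field may be computed differently, so treat this as a cross-check, not a reproduction. The sketch assumes a mean Earth radius of 3958.8 miles (about 6371 km).

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (~6371 km)

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

For example, two points on the equator 90° of longitude apart are a quarter of the Earth's circumference apart, about 6,218 miles.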