Kpop refers to the music genre and idol industry that originated in South Korea. Unlike other music genres, which have blurry origins and developed gradually into their present form, Kpop has an actual birthday: 1992-03-23, the day Seo Taiji and Boys debuted. After less than 30 years of development, Kpop has gained worldwide popularity among young people. I gathered Kpop data covering 1992-03-23 (the birthday of Kpop) to 2021-04-21, including information on all major Kpop idols, from K-Pop Database. In this article, I will investigate Kpop by analyzing the 4 tables in this dataset: idols, boy groups, girl groups, and music videos.
By 2021-04-21, when the data was collected, 4146 music videos had been produced and there were 1435 Kpop idols in total. It is striking that an industry less than 30 years old has already attracted so many artists. Kpop has always been famous for group activities: among all idols, 1318 (92%) are or have been a member of a Kpop group, and only 117 (8%) have always been soloists. 48% of the idols are female and 52% are male. There have been 176 male groups and 176 female groups. 134 boy groups and 94 girl groups are currently active. Here is a small peek at the table of all idols:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
# creating dataframes and data cleaning
idols = pd.read_csv('kpop_idols.csv')
idols['Date of Birth'] = pd.to_datetime(idols['Date of Birth'])
boys = pd.read_csv('kpop_idols_boy_groups.csv')
girls = pd.read_csv('kpop_idols_girl_groups.csv')
all_groups = pd.concat([boys, girls])
all_groups.Debut = pd.to_datetime(all_groups.Debut)
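# assign the result back: an inplace replace on an attribute-accessed column may not update the DataFrame
all_groups['Company'] = all_groups['Company'].replace('', np.nan)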
all_groups.Company = all_groups.Company.str.split(", ")
# calculating ages of idols
# age is 2021 minus the birth year, so it ignores whether the birthday has already passed
idols['Age'] = 2021 - idols['Date of Birth'].dt.year
print(f'{idols.count().Group} artist are or have been in Kpop group')
idols.head(5)
1315 artist are or have been in Kpop group
| | Stage Name | Full Name | Date of Birth | Group | Country | Birthplace | Other Group | Gender | Age |
---|---|---|---|---|---|---|---|---|---|
0 | A.M | Seong Hyunwoo | 1996-12-31 | Limitless | South Korea | NaN | NaN | M | 25 |
1 | Ace | Jang Wooyoung | 1992-08-28 | VAV | South Korea | NaN | NaN | M | 29 |
2 | Aeji | Kwon Aeji | 1999-10-25 | Hashtag | South Korea | Daegu | NaN | F | 22 |
3 | AhIn | Lee Ahin | 1999-09-27 | MOMOLAND | South Korea | Wonju | NaN | F | 22 |
4 | Ahra | Go Ahra | 2001-02-21 | Favorite | South Korea | Yeosu | NaN | F | 20 |
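The group and gender shares quoted above can be computed directly from the idols table. The following is a minimal sketch that uses only the five sample rows shown above, so its percentages are illustrative and not the dataset's real figures. It assumes that a missing `Group` value marks a soloist.

```python
import pandas as pd

# The five sample rows printed above; the real table has 1435 rows.
sample = pd.DataFrame({
    'Stage Name': ['A.M', 'Ace', 'Aeji', 'AhIn', 'Ahra'],
    'Group': ['Limitless', 'VAV', 'Hashtag', 'MOMOLAND', 'Favorite'],
    'Gender': ['M', 'M', 'F', 'F', 'F'],
})

# Assumption: an idol with no Group value is counted as a soloist.
in_group_share = sample['Group'].notna().mean()

# Share of each gender, as a fraction of all idols in the sample.
gender_share = sample['Gender'].value_counts(normalize=True)

print(f'{in_group_share:.0%} are or have been in a group')
print(gender_share.round(2).to_dict())
```

Running the same two expressions on the full `idols` DataFrame should give shares comparable to the ones stated in the text, provided soloists are really encoded as missing `Group` values.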
Kpop is well known for its internationality: 124 idols (9%) are not from South Korea but from 12 other countries and states. Notably, 42 artists are from mainland China, 38 are from Japan, and 14 are from the USA (Korean-Americans are not included in this dataset).
# count idols per country, excluding South Korea by name rather than by position
foreign = (idols[idols['Country'] != 'South Korea']
           .groupby('Country')['Stage Name'].count()
           .reset_index(name='count')
           .sort_values('count', ascending=False))
foreign.plot.bar(x='Country', title='Foreign Kpop Artists', xlabel='Countries and States', rot=45, figsize=(10, 5))
plt.show()
foreign.reset_index(drop=True, inplace=True)
foreign
| | Country | count |
---|---|---|
0 | China | 42 |
1 | Japan | 38 |
2 | USA | 14 |
3 | Taiwan | 7 |
4 | Thailand | 7 |
5 | Canada | 6 |
6 | Hong Kong | 4 |
7 | Indonesia | 2 |
8 | Australia | 1 |
9 | Germany | 1 |
10 | Malaysia | 1 |
11 | Philippines | 1 |
Kpop artists are famous for their youthful charisma, and Kpop idols are generally young: the average age of all Kpop idols is 25 as of 2021.
age = idols.groupby('Age')['Stage Name'].count()
print(f'The average age of Kpop idols is {int(np.mean(idols.Age))}')
age.plot.bar(title='Age of Kpop Artists',rot=0, figsize=(10,5))
plt.show()
The average age of Kpop idols is 25
On the other hand, Kpop idols usually debut at an even younger age. The average age of the members of a newly debuted Kpop group is 19. The youngest idols, such as IZ\*ONE's Wonyoung and SHINee's Taemin, were only 14 years old when they debuted.
from collections import Counter, defaultdict

# 'Birth Year' and 'Debut Year' are not created in the code shown so far;
# assumption: they are the year of 'Date of Birth' and the year of 'Debut'
idols['Birth Year'] = idols['Date of Birth'].dt.year
all_groups['Debut Year'] = all_groups.Debut.dt.year

def debut_age(df):
    age = []
    checked = set()  # indices of idols already matched to a group
    for _, group in df.iterrows():
        for index, idol in idols.iterrows():
            if index not in checked and idol.Group == group.Name:
                checked.add(index)
                # Debut age = group debut year minus the idol's birth year.
                # This underestimates the age of members who joined after the
                # group had already debuted, so anything below 14 (the youngest
                # known debut age) is discarded.
                debut = group['Debut Year'] - idol['Birth Year']
                if debut > 13:
                    age.append(debut)
    return age
debut_age_list = debut_age(all_groups)
debut_age_list.sort()
print(f'The average age of a newly debuted Kpop group is {int(np.mean(debut_age_list))}')
age_count = pd.DataFrame({'age': debut_age_list}).groupby('age', as_index=False).size()
age_count.plot.bar(x='age',y='size',title='The debut Age of Kpop idols',rot=0, figsize=(10,5))
plt.show()
The average age of a newly debuted Kpop group is 19
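The nested loops above compare every idol with every group, which gets slow as the tables grow. The same calculation can be sketched as a single merge. This is an alternative formulation, not the code that produced the figures above, and the toy rows below (names and years) are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; column names follow the article's tables.
toy_idols = pd.DataFrame({
    'Stage Name': ['A', 'B', 'C', 'D'],
    'Group': ['G1', 'G1', 'G2', None],   # D is a soloist
    'Birth Year': [1995, 1997, 2003, 1990],
})
toy_groups = pd.DataFrame({
    'Name': ['G1', 'G2'],
    'Debut Year': [2014, 2012],
})

# Attach each idol's group debut year; soloists drop out of the inner join.
merged = toy_idols.merge(toy_groups, left_on='Group', right_on='Name', how='inner')
merged['Debut Age'] = merged['Debut Year'] - merged['Birth Year']

# Same filter as the loop version: differences below 14 are treated as
# members who joined after the group's debut and are discarded.
debut_ages = merged.loc[merged['Debut Age'] > 13, 'Debut Age']

print(debut_ages.tolist())
print(debut_ages.mean())
```

One caveat: if a group name appears in both the boy-group and girl-group tables, the merge would count the matching idols twice, which the `checked` bookkeeping in the loop version prevents.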
Kpop's fast development is epitomized by two timelines: music videos by year and group debuts by year. Kpop groups in particular are often divided into generations. There are a lot of opinions about where the boundaries lie; here I show only one of them, with a zero generation from 1992 to 1995, a first from 1996 to 2003, a second from 2004 to 2011, and a third from 2012 to 2020:
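The split used in this article, which is hard-coded in the `timeline` function that follows, can also be written as a small lookup that labels any year. This is a minimal sketch using only those boundaries. The function name `generation_of` and the label strings are illustrative and not part of the original analysis.

```python
from bisect import bisect_right
from collections import Counter

# First year of each generation, taken from the ranges used in timeline().
STARTS = [1992, 1996, 2004, 2012]
LABELS = ['zero', 'first', 'second', 'third']
LAST_YEAR = 2020  # timeline() uses range(2012, 2021), so it stops at 2020

def generation_of(year):
    """Return the generation label for a year, or None if it is out of range."""
    if year < STARTS[0] or year > LAST_YEAR:
        return None
    return LABELS[bisect_right(STARTS, year) - 1]

# Toy list of debut years (made up), counted per generation.
debut_years = [1996, 2003, 2004, 2011, 2012, 2020, 2021]
per_generation = Counter(generation_of(y) for y in debut_years)
print(per_generation)
```

Years outside 1992 to 2020 map to `None`, which mirrors the fact that the chart code does not draw a bar for 2021.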
%config InlineBackend.figure_format = 'svg'
def timeline(df, title_name, column, include_zero):
    # one common way to split Kpop history into generations
    zero_gen = list(range(1992, 1996))
    first_gen = list(range(1996, 2004))
    second_gen = list(range(2004, 2012))
    third_gen = list(range(2012, 2021))
    generations = [zero_gen, first_gen, second_gen, third_gen]
    colors = ['gray', 'blue', 'orange', 'red']
    if not include_zero:
        generations = generations[1:]
        colors = colors[1:]
    # number of rows of df for each year
    value_by_year = Counter(df[column])
    fig = plt.figure(figsize=(10, 5))
    ax = fig.add_axes([0, 0, 1, 1])
    all_year = []
    # one bar series per generation, each in its own color
    for gen_color, years in zip(colors, generations):
        ax.bar(years, [value_by_year[year] for year in years], color=gen_color)
        all_year += years
    plt.xticks(all_year, rotation=45)
    plt.title(title_name)
    plt.show()
mv = year_clean(pd.read_csv('kpop_music_videos.csv'),'Date','Year')
timeline(mv, "Music Videos by Year", 'Year', True)
timeline(all_groups, "group debuts by Year", 'Debut Year', False)
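The code above calls a helper, `year_clean`, whose definition does not appear in this article. Judging only from how it is called, it seems to parse a date column and store the year in a new column. The sketch below is an assumed reconstruction on that basis, not the original function.

```python
import pandas as pd

def year_clean(df, date_column, year_column):
    """Assumed behaviour: parse `date_column` and add its year as `year_column`."""
    df = df.copy()  # leave the caller's DataFrame untouched
    df[date_column] = pd.to_datetime(df[date_column], errors='coerce')
    # Rows whose date could not be parsed end up with NaN instead of a year.
    df[year_column] = df[date_column].dt.year
    return df

# Toy rows (made up) to show the effect.
demo = year_clean(pd.DataFrame({'Date': ['1992-03-23', '2021-04-21', 'unknown']}),
                  'Date', 'Year')
print(demo)
```

Keeping unparseable dates as NaN means the year comparisons inside `timeline` simply skip those rows.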
Kpop entertainment companies play a significant role in this industry. These agencies recruit artists as trainees, select the best trainees to form groups, and manage the groups' activities after they debut. Every Kpop agency manages 1.6 groups on average, but big companies like SM Entertainment can have up to 15 groups. Here are the 10 agencies with the most groups:
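agency = defaultdict(list)
for _, row in all_groups.iterrows():
    # Company may be a raw comma-separated string or the list produced by the
    # split during data cleaning; groups without a company (NaN) are skipped
    names = row.Company.split(',') if isinstance(row.Company, str) else row.Company
    if isinstance(names, list):
        for company in names:
            agency[company.strip()].append(row.Name)
companies = list(agency.keys())
group_counts = [len(groups) for groups in agency.values()]
agency_df = pd.DataFrame({'Company': companies, 'Groups': list(agency.values()), 'Group count': group_counts})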
avg_group = round(agency_df['Group count'].mean(),1)
print(f'Every Kpop agency manages {avg_group} groups on average')
big10 = agency_df.sort_values(by='Group count',ascending=False).head(10)
big10.plot.bar(x='Company',y='Group count',title='The 10 Companies Having Most Groups',rot=0, figsize=(10,5))
plt.show()
print(f'The 10 companies having most groups are:')
big10.reset_index(drop=True, inplace=True)
big10
Every Kpop agency manages 1.6 groups on average
The 10 companies having most groups are:
| | Company | Groups | Group count |
---|---|---|---|
0 | SM | [EXO, Fly to the Sky, H.O.T, NCT, SHINee, Shin... | 15 |
1 | JYP | [2AM, 2PM, DAY6, g.o.d., GOT7, Stray Kids, 15&... | 11 |
2 | FNC | [CNBLUE, FTISLAND, Honeyst, N.Flying, P1Harmon... | 8 |
3 | YG | [BIGBANG, iKON, SECHKIES, Treasure, WINNER, 2N... | 7 |
4 | MBK | [1the9, Speed, Turbo, DIA, F-ve Dolls, T-ara] | 6 |
5 | Big Hit | [2AM, 8Eight, BTS, Homme, TXT, GLAM] | 6 |
6 | DSP | [A-JAX, SECHKIES, SS501, April, KARA, Rainbow] | 6 |
7 | Starship | [Boyfriend, CRAVITY, MONSTA X, SISTAR, WJSN] | 5 |
8 | Cube | [BtoB, PENTAGON, (G)I-DLE, 4Minute, CLC] | 5 |
9 | Woollim | [DRIPPIN, Golden Child, Infinite, Lovelyz, Roc... | 5 |
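The per-agency table above can also be built without an explicit loop, by splitting the company string and giving each (group, company) pair its own row. This is a minimal sketch on made-up rows. It assumes `Company` holds comma-separated agency names and may be missing, as in the raw group tables.

```python
import pandas as pd

# Toy rows (made up); G2 is co-managed by two agencies, G4 has no agency.
toy_groups = pd.DataFrame({
    'Name': ['G1', 'G2', 'G3', 'G4', 'G5'],
    'Company': ['Alpha', 'Alpha, Beta', 'Beta', None, 'Alpha'],
})

exploded = (
    toy_groups.dropna(subset=['Company'])
    .assign(Company=lambda d: d['Company'].str.split(','))
    .explode('Company')  # one row per (group, company) pair
    .assign(Company=lambda d: d['Company'].str.strip())
)

# Collect the group names per company, then count them.
per_company = exploded.groupby('Company')['Name'].agg(list).to_frame('Groups')
per_company['Group count'] = per_company['Groups'].apply(len)
per_company = per_company.sort_values('Group count', ascending=False)

print(per_company)
print(round(per_company['Group count'].mean(), 1))
```

As in the table above, a group managed by several agencies is counted once for each of them.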
On average, Kpop groups have 5.5 members, and the 5-member group is the most common form. However, keeping a team together is not easy: 90 groups, about 25%, have lost members.
# make sure the member counts are integers (astype returns a new DataFrame, so assign it back)
all_groups = all_groups.astype({'Orig. Memb.': 'int32', 'Members': 'int32'})
print(f'Kpop groups have {round(all_groups.Members.mean(),1)} members on average')
# a group has lost members if it now has fewer members than it debuted with
loss = int((all_groups['Orig. Memb.'] > all_groups['Members']).sum())
print(f'{loss} groups, about {int(100*loss/len(all_groups))}%, have lost group members')
member_count = all_groups.groupby('Members', as_index=False).size().iloc[1:]
member_count.plot.bar(x='Members',y='size',title='Number of Members in Kpop Groups',rot=0, figsize=(10,5))
plt.show()
Kpop groups have 5.5 members on average
90 groups, about 25%, have lost group members
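The claim that five-member groups are the most common can be checked numerically instead of being read off the bar chart. This is a minimal sketch on made-up member counts; the values below are illustrative and not taken from the dataset.

```python
import pandas as pd

# Toy member counts (made up); the real values live in all_groups['Members'].
members = pd.Series([5, 7, 5, 4, 9, 5, 7], name='Members')

# How many groups have each size, ordered by size.
size_counts = members.value_counts().sort_index()

# The most common size is the one with the highest count.
most_common_size = size_counts.idxmax()

print(size_counts)
print(f'Most common group size: {most_common_size}')
```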
Kpop's success largely depends on the devotion of its fans. Kpop groups are known for creating official fanclubs for their devoted fans and organizing events within them; BTS' fanclub ARMY is a well-known example. 41% of the boy groups have their own fanclubs, compared with 30% of the girl groups. By this measure, boy groups have gained more fan devotion than girl groups.
print(f'{int(100*boys["Fanclub Name"].count()/boys.shape[0])}% of the boy groups have names for their fanclubs. {int(100*girls["Fanclub Name"].count()/girls.shape[0])}% of the girl groups have names for their fanclubs.')
36% of the boy groups have names for their fanclubs. 27% of the girl groups have names for their fanclubs.