Decoding Web of Science stats using Python

Using Python to find key trends and visualize patterns while conducting a literature review

Written by Sai Gattupalli

In this blog post, you will journey through a dataset exported from Web of Science (WoS) that captures a decade of scholarly work on multicultural education. Using sample Python code, I count publications by year, by country, and by discipline, then plot each count as a bar chart to bring out the key trends.

import pandas as pd
import matplotlib.pyplot as plt

# Load the Web of Science export (pandas needs the xlrd package to read .xls files)
df_multicultural_tech = pd.read_excel('dataset.xls')

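# Extracting relevant data for viz
# Publications per year, sorted chronologically
publications_per_year = df_multicultural_tech['Publication Year'].value_counts().sort_index()
# Country code: the two capital letters at the very end of the 'Addresses' string.
# Rows whose address ends any other way get NaN and drop out of the counts.
df_multicultural_tech['Country'] = df_multicultural_tech['Addresses'].str.extract(r'([A-Z][A-Z]$)')
publications_per_country = df_multicultural_tech['Country'].value_counts()
# A paper can list several research areas separated by ';', so give each area its own row
all_research_areas = df_multicultural_tech['Research Areas'].str.split(';').explode().str.strip()
publications_per_discipline = all_research_areas.value_counts()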

# Plotting the visuals
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(14, 18))

# Plotting pubs per year
publications_per_year.plot(kind='bar', ax=axes[0], color='skyblue')
axes[0].set_title('Number of Publications Per Year')
axes[0].set_xlabel('Year')
axes[0].set_ylabel('Number of Publications')

# Plotting distribution of pubs by country
publications_per_country.head(10).plot(kind='bar', ax=axes[1], color='lightgreen')
axes[1].set_title('Top 10 Countries with Most Publications')
axes[1].set_xlabel('Country')
axes[1].set_ylabel('Number of Publications')

# Plotting distribution by discipline
publications_per_discipline.head(10).plot(kind='bar', ax=axes[2], color='salmon')
axes[2].set_title('Top 10 Disciplines by Number of Publications')
axes[2].set_xlabel('Discipline')
axes[2].set_ylabel('Number of Publications')

plt.tight_layout()
plt.show()
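
A note on the country step: the pattern `[A-Z][A-Z]$` only matches when the address string ends in two capital letters, so a string ending in "USA" yields "SA" and one ending in "England" yields nothing. Below is a minimal sketch of a more forgiving alternative. It rests on assumptions about the export that you should check against your own file: addresses are separated by semicolons, author names may appear in square brackets in front of an address, the country is whatever follows the last comma, and US addresses end with a state, a ZIP code, and "USA". The function name `countries_in` and the sample rows are made up for illustration.

```python
import re
from collections import Counter

def countries_in(addresses):
    """Return a sorted list of the distinct countries in one address string."""
    if not isinstance(addresses, str):
        return []  # missing value (None or NaN) -> no countries
    # Remove "[Author, A; Author, B]" groups so their semicolons
    # are not mistaken for address separators
    cleaned = re.sub(r"\[[^\]]*\]", "", addresses)
    found = set()
    for address in cleaned.split(";"):
        address = address.strip().rstrip(".")
        if not address:
            continue
        # Assumption: the country is the text after the last comma
        last_part = address.rsplit(",", 1)[-1].strip()
        # Assumption: US addresses end like "MA 01003 USA"
        if last_part.endswith("USA"):
            last_part = "USA"
        found.add(last_part)
    return sorted(found)

# Made-up rows in the assumed format, not real WoS records
sample = [
    "[Doe, J; Roe, R] Example Univ, Dept Educ, Amherst, MA 01003 USA; "
    "[Poe, E] Sample Univ, London, England",
    "Demo Univ, Sch Educ, Beijing, Peoples R China",
    None,
]

# Each country is counted once per paper, however many authors share it
counts = Counter(c for row in sample for c in countries_in(row))
print(counts.most_common())
```

To try it on the real data, something like `df_multicultural_tech['Addresses'].apply(countries_in).explode().value_counts()` should give publications per country. Note that this counts every country on a paper, whereas the original line keeps at most one value per paper.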

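After supplying data from Web of Science (WoS), the output is a single figure with three stacked bar charts: publications per year, the top 10 countries, and the top 10 disciplines.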
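
If you do not have a WoS export at hand, the counting steps can still be checked on a few made-up rows. The sketch below reuses the same column names as the script ('Publication Year' and 'Research Areas'); the `toy` DataFrame and its values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up rows standing in for the WoS export, not real data
toy = pd.DataFrame({
    "Publication Year": [2015, 2015, 2018],
    "Research Areas": [
        "Education & Educational Research; Psychology",
        "Education & Educational Research",
        None,  # a record with no research area listed
    ],
})

# Same steps as in the script: count per year, sorted chronologically
per_year = toy["Publication Year"].value_counts().sort_index()

# Split multi-area records on ';', give each area its own row, trim spaces
areas = toy["Research Areas"].str.split(";").explode().str.strip()

# value_counts() skips missing values, so the empty record is not counted
per_area = areas.value_counts()

print(per_year)
print(per_area)
```

Because a paper with two research areas contributes to both, the discipline counts can add up to more than the number of papers.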

Until next time.