
Creating Word Clouds using Python

Tincy Thomas

Word clouds (also known as text clouds or tag clouds) work in a simple way: the more a specific word appears in a source of textual data (such as a speech, blog post, or database), the bigger and bolder it appears in the word cloud.
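To make the idea concrete, here is a minimal sketch of the "more frequent means bigger" rule using only the standard library. The function name font_sizes and the size limits are made up for illustration, and real word cloud packages use more elaborate scaling and layout than this simple linear rule.

```python
import re
from collections import Counter

def font_sizes(text, min_size=10, max_size=80):
    """Map each word to a font size that grows with its frequency (illustrative only)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; rarer words get a
    # proportionally smaller size that never drops below min_size.
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }

sizes = font_sizes("the cat sat on the mat the end")
print(sizes["the"], sizes["cat"])
```

Here "the" appears three times and receives the largest size, while "cat" and "mat" appear once each and share the same smaller size.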

Luckily, a Python package for generating word clouds already exists. The package, called word_cloud, was developed by Andreas Mueller. You can learn more about it in the package's documentation.

Let’s use this package to learn how to generate a word cloud for a given text document.

First, let’s install the package.

# install wordcloud
!pip install wordcloud

# import package and its set of stopwords
from wordcloud import WordCloud, STOPWORDS

print('Wordcloud is installed and imported!')

Word clouds are commonly used to perform high-level analysis and visualization of text data. Accordingly, let's work with an example that involves analyzing text data: a short novel written by Lewis Carroll titled Alice's Adventures in Wonderland. Let's go ahead and download a .txt file of the novel.

import urllib.request

# open the file and read it into a variable alice_novel
alice_novel = urllib.request.urlopen('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/alice_novel.txt').read().decode("utf-8")

Next, let’s use the stopwords that we imported from word_cloud. Wrapping them in set() guarantees there are no duplicates and gives us our own copy that we can modify later.

stopwords = set(STOPWORDS)
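As a rough illustration of why a copy is handy, the sketch below uses a tiny made-up stopword set (DEFAULT_STOPWORDS is a hypothetical stand-in, not the package's real list) and shows that adding a word to the copy leaves the original untouched.

```python
# Hypothetical stand-in for a package's default stopword list.
DEFAULT_STOPWORDS = {"the", "and", "a", "to", "of"}

# set(...) builds an independent copy, so later edits stay local.
my_stopwords = set(DEFAULT_STOPWORDS)
my_stopwords.add("will")

tokens = ["alice", "will", "go", "to", "the", "queen"]
kept = [t for t in tokens if t not in my_stopwords]
print(kept)  # ['alice', 'go', 'queen']
```

The default set still has no "will" in it, so anything else that relies on the defaults is unaffected.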

Create a word cloud object and generate a word cloud. For simplicity, let’s cap the cloud at the 2000 most frequent words in the novel (max_words limits how many words are drawn, not how much of the text is read).

# instantiate a word cloud object
alice_wc = WordCloud(background_color='white', max_words=2000,
                     stopwords=stopwords)

# generate the word cloud
alice_wc.generate(alice_novel)
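The sketch below shows, in plain Python, roughly what a cap like max_words amounts to: count the words that are not stopwords and keep only the most frequent ones. The helper top_words is hypothetical and is not how the package is implemented internally; it only illustrates the idea under that assumption.

```python
from collections import Counter

def top_words(tokens, stopwords, max_words):
    """Count non-stopword tokens and keep the max_words most frequent ones."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return dict(counts.most_common(max_words))

tokens = ["the"] * 4 + ["alice"] * 3 + ["queen"] * 2 + ["rabbit"]
top = top_words(tokens, stopwords={"the"}, max_words=2)
print(top)  # {'alice': 3, 'queen': 2}
```

Note that the whole token list is counted; the cap only decides how many distinct words survive into the picture.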

Awesome! Now that the word cloud is created, let's visualize it.

# matplotlib is needed for plotting
import matplotlib.pyplot as plt

# display the word cloud
plt.imshow(alice_wc, interpolation='bilinear')
plt.axis('off')
plt.show()

Output.

Interesting! So the most common words in the novel are Alice, said, little, Queen, and so on. Let’s resize the cloud so that we can see the less frequent words a little better.

fig = plt.figure(figsize=(14, 18))

# display the cloud
plt.imshow(alice_wc, interpolation='bilinear')
plt.axis('off')
plt.show()

Output.

Much better! However, said isn’t really an informative word. So let’s add it to our stopwords and re-generate the cloud.

stopwords.add('said')  # add the word "said" to stopwords

# re-generate the word cloud
alice_wc.generate(alice_novel)

# display the cloud
fig = plt.figure(figsize=(14, 18))

plt.imshow(alice_wc, interpolation='bilinear')
plt.axis('off')
plt.show()

Output.

Excellent! This looks really interesting! Another cool thing you can do with the word_cloud package is superimpose the words onto a mask of any shape. Let's use a mask of Alice and her rabbit. The mask has already been created, so let's go ahead and download it and load it into an array called alice_mask.

import numpy as np
from PIL import Image
# download the mask image and store it as an array in alice_mask
alice_mask = np.array(Image.open(urllib.request.urlopen('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/labs/Module%204/images/alice_mask.png')))

Let’s take a look at what the mask looks like.

fig = plt.figure(figsize=(14, 18))

plt.imshow(alice_mask, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis('off')
plt.show()
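If you do not have a ready-made image, a mask can also be built by hand. As I understand the package's documentation, pure white (255) entries in the mask are treated as off-limits and every other entry is free for words. The sketch below builds a circular mask on that assumption; the helper name circular_mask and the default size are made up for illustration.

```python
import numpy as np

def circular_mask(size=300):
    """Return a size x size uint8 array: 0 inside the circle, 255 (white) outside."""
    y, x = np.ogrid[:size, :size]
    center = (size - 1) / 2
    radius = size / 2
    # True for pixels that fall outside the circle
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    return (255 * outside).astype(np.uint8)

mask = circular_mask(300)
print(mask.shape, mask.dtype)
```

An array like this could then be passed as the mask argument when creating the WordCloud object, in place of alice_mask.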

Output.

Shaping the word cloud according to the mask is straightforward with the word_cloud package. For simplicity, we will continue to cap the cloud at 2000 words.

# instantiate a word cloud object
alice_wc = WordCloud(background_color='white', max_words=2000, mask=alice_mask, stopwords=stopwords)

# generate the word cloud
alice_wc.generate(alice_novel)

# display the word cloud
fig = plt.figure(figsize=(14, 18))

plt.imshow(alice_wc, interpolation='bilinear')
plt.axis('off')
plt.show()

Output.

Really impressive, isn’t it?

Epilogue:

Although this is not my own original work, I couldn’t help but share this wonderful example that I came across while taking the IBM Data Analyst Professional Certificate course.

I hope it meets your requirements! :)

Please leave a comment once you have tried it.