Generating Word Clouds with Python

[Date: 2019-12-08] Source: Linux公社  Author: 醉落红尘

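A word cloud is a data visualization technique for text data in which the size of each word reflects its frequency or importance. It is a quick way to highlight the most prominent terms in a body of text, and it is widely used to analyze data from social networking sites.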
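To make the idea of "size reflects frequency" concrete, here is a minimal standard-library sketch. The sample text is made up for illustration, and the linear scaling is an assumption chosen for simplicity; it is not meant to reproduce how the wordcloud package computes font sizes.

```python
from collections import Counter

# Hypothetical sample text, not taken from the article's CSV file.
text = "python linux python cloud linux python"

# Count how often each whitespace-separated word occurs.
counts = Counter(text.split())

# Scale each count relative to the most frequent word (1.0 = largest).
top = max(counts.values())
relative_size = {word: count / top for word, count in counts.items()}

print(counts.most_common())
print(relative_size)
```

A word that appears three times as often as another would be drawn roughly three times as large under this simple scheme.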

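To generate a word cloud in Python, the article lists three required modules: matplotlib, pandas, and wordcloud. (The listings below read the CSV file with the standard-library csv module, so pandas is not strictly needed to run them.) To install the packages, run the following commands: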

pip install matplotlib
pip install pandas
pip install wordcloud

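Code 1: Number of words

You can set the maximum number of words displayed in the word cloud. To do so, use the max_words keyword argument of the WordCloud() constructor.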


# Import the necessary modules
import csv

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Read every row of the CSV file; the with-block closes the file afterwards
with open("linuxidc.com.csv", newline="") as file_ob:
    # The contents are stored as a list of lists (one inner list per row)
    reader_contents = list(csv.reader(file_ob))

# Join every word of every row into one space-separated string
text = " ".join(word for row in reader_contents for word in row)

# Show only 10 words in the word cloud
wordcloud = WordCloud(width=480, height=480, max_words=10).generate(text)

# Plot the word cloud image
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()

The output is shown below:

[Image: word cloud limited to 10 words]
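Conceptually, max_words keeps only the most frequent words. The standard-library sketch below illustrates that idea; the helper name top_words and the sample string are hypothetical, and the real library applies additional processing (for example stopword removal) that this sketch leaves out.

```python
from collections import Counter


def top_words(text, max_words):
    """Return the `max_words` most frequent words with their counts."""
    return Counter(text.lower().split()).most_common(max_words)


# Hypothetical sample text for illustration.
sample = "linux python linux shell python linux vim"
print(top_words(sample, 2))
```

With max_words=2, only the two most frequent words in the sample survive; everything else would be left out of the picture.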

Code 2: Removing some words

You can exclude words you do not want displayed. To do so, pass them to the stopwords parameter of the WordCloud() constructor.


# Import the necessary modules
import csv

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Read every row of the CSV file; the with-block closes the file afterwards
with open("linuxidc.com.csv", newline="") as file_ob:
    # The contents are stored as a list of lists (one inner list per row)
    reader_contents = list(csv.reader(file_ob))

# Join every word of every row into one space-separated string
text = " ".join(word for row in reader_contents for word in row)

# Exclude the words Python, Matplotlib, and Geeks from the word cloud
wordcloud = WordCloud(width=480, height=480,
                      stopwords={"Python", "Matplotlib", "Geeks"}).generate(text)

# Plot the word cloud image
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()

The output looks like this:

[Image: word cloud with the stopwords removed]
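The effect of a stopword list can be sketched without the library. The helper remove_stopwords below is hypothetical, and the case-insensitive comparison is an assumption about how the filtering behaves; check it against the wordcloud version you have installed. Also note that, per the wordcloud documentation, the built-in STOPWORDS set is used only when the stopwords argument is left as None, so a custom collection takes its place rather than extending it.

```python
def remove_stopwords(words, stopwords):
    """Drop every word whose lowercase form appears in `stopwords`."""
    blocked = {s.lower() for s in stopwords}
    return [w for w in words if w.lower() not in blocked]


# Hypothetical sample sentence for illustration.
words = "Python makes Matplotlib plots for Geeks".split()
print(remove_stopwords(words, ["Python", "Matplotlib", "Geeks"]))
```

Only the words that are not in the stopword collection remain available for the cloud.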

Code 3: Changing the background

You can change the background color of the word cloud. To do so, use the background_color keyword argument of the WordCloud() constructor.


# Import the necessary modules
import csv

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Read every row of the CSV file; the with-block closes the file afterwards
with open("linuxidc.com.csv", newline="") as file_ob:
    # The contents are stored as a list of lists (one inner list per row)
    reader_contents = list(csv.reader(file_ob))

# Join every word of every row into one space-separated string
text = " ".join(word for row in reader_contents for word in row)

# Draw the word cloud on a pink background
wordcloud = WordCloud(width=480, height=480, background_color="pink").generate(text)

# Plot the word cloud image
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()

The output looks like this:

[Image: word cloud on a pink background]

Code 4: Changing the word colors

You can change the colors of the words with the colormap keyword argument of the WordCloud() constructor.


# Import the necessary modules
import csv

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Read every row of the CSV file; the with-block closes the file afterwards
with open("linuxidc.com.csv", newline="") as file_ob:
    # The contents are stored as a list of lists (one inner list per row)
    reader_contents = list(csv.reader(file_ob))

# Join every word of every row into one space-separated string
text = " ".join(word for row in reader_contents for word in row)

# Color the words with the reversed "Oranges" matplotlib colormap
wordcloud = WordCloud(width=480, height=480, colormap="Oranges_r").generate(text)

# Plot the word cloud image
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()

The output looks like this:

[Image: word cloud drawn with the Oranges_r colormap]

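Code 5: Setting the maximum and minimum font size

You can control the minimum and maximum font size used in the word cloud. To do so, use the max_font_size and min_font_size keyword arguments of the WordCloud() constructor.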


# Import the necessary modules
import csv

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Read every row of the CSV file; the with-block closes the file afterwards
with open("linuxidc.com.csv", newline="") as file_ob:
    # The contents are stored as a list of lists (one inner list per row)
    reader_contents = list(csv.reader(file_ob))

# Join every word of every row into one space-separated string
text = " ".join(word for row in reader_contents for word in row)

# Keep every font size between 10 and 20
wordcloud = WordCloud(width=480, height=480, max_font_size=20,
                      min_font_size=10).generate(text)

# Plot the word cloud image
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()

[Image: word cloud with font sizes limited to the range 10 to 20]
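All five listings above repeat the same CSV-loading steps. One possible way to factor them out is a small helper like the sketch below. The name csv_to_text and the UTF-8 default are assumptions for illustration, not part of the original article; adjust the encoding to match your file.

```python
import csv


def csv_to_text(path, encoding="utf-8"):
    """Join every cell of the CSV file at `path` into one space-separated string."""
    # The with-block closes the file even if reading fails part-way.
    with open(path, newline="", encoding=encoding) as file_ob:
        return " ".join(cell for row in csv.reader(file_ob) for cell in row)
```

With this helper, each example would reduce to text = csv_to_text("linuxidc.com.csv") followed by the WordCloud(...) call and the plotting lines.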

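OK, that is all for now. For further topics, such as fixing garbled Chinese text, please keep following the Python column of Linux公社. Thanks for reading.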

For more Python-related information, see the Python topic page: https://www.linuxidc.com/topicnews.aspx?tid=17

Permanent link to this article: https://www.linuxidc.com/Linux/2019-12/161681.htm
