This study uses a Word2Vec model to compute the similarity between texts and counts the number of related documents in a large corpus, in order to analyze the content and distribution characteristics of the texts.
First, build your own corpus:
```python
import jieba

def ylk(x):
    # Segment the input text with jieba (precise mode)
    seg = jieba.cut(x, cut_all=False)
    # Append the space-separated tokens to the corpus file, one document per line
    with open("D://listTwo.txt", "a", encoding="utf-8") as f:
        for word in seg:
            f.write(word + " ")
        f.write("\n")
```
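For illustration, here is a minimal sketch of how the function might be called over a collection of raw documents; `raw_docs` and its contents are hypothetical placeholders, not from the original:

```python
# Hypothetical sample documents; in practice these come from your own data.
raw_docs = ["自然语言处理很有趣", "词向量可以衡量文本之间的相似度"]

for doc in raw_docs:
    ylk(doc)  # each call appends one segmented line to D://listTwo.txt
```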
Next, train the model:
```python
from gensim.models.word2vec import LineSentence, Word2Vec

# Load the corpus built above (one segmented document per line)
sentences = LineSentence("D://listTwo.txt")
# Train the Word2Vec model; these hyperparameters are example values
# (vector_size is the gensim 4.x name; older versions use size)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)
```
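With the model trained, the text similarity and document counting described at the start can be carried out; gensim's `model.wv.n_similarity` compares two token lists by cosine similarity. Below is a minimal sketch, assuming the `model` trained above; `count_similar`, `doc_tokens`, and the 0.7 threshold are illustrative choices, not from the original:

```python
import jieba

def doc_tokens(text, model):
    # Segment the text and keep only tokens in the model's vocabulary,
    # since n_similarity raises an error on out-of-vocabulary words
    return [w for w in jieba.cut(text, cut_all=False) if w in model.wv]

def count_similar(query, documents, model, threshold=0.7):
    # Count the documents whose similarity to the query exceeds the threshold
    q = doc_tokens(query, model)
    hits = 0
    for doc in documents:
        d = doc_tokens(doc, model)
        if q and d and model.wv.n_similarity(q, d) >= threshold:
            hits += 1
    return hits
```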