strayly

Using Word2Vec in Spark (Python)

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.ml.feature import Word2Vec

conf = SparkConf().setAppName("yjs_rec")
sc = SparkContext(conf=conf) 
sqlContext=SQLContext(sc)

# Sample corpus: each row holds one document as a list of tokens.
# The column is named "words" so that it matches inputCol below.
spark_df = sqlContext.createDataFrame([
    ("Hi I heard about Spark".split(" "),),
    ("I wish Java could use case classes".split(" "),),
    ("Logistic regression models are neat".split(" "),)
], ["words"])

# Train Word2Vec: 100-dimensional vectors, keeping every token (minCount=0).
word2Vec = Word2Vec(vectorSize=100, minCount=0, inputCol="words", outputCol="result")
model = word2Vec.fit(spark_df)
# transform() adds a "result" column holding one vector per document.
result = model.transform(spark_df)
result.select("result").show()
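Each row of the "result" column is a single vector for the whole document. Spark ML's Word2VecModel builds it by averaging the vectors of the words in that document. The sketch below mimics that averaging in plain Python so the idea can be checked without a cluster. The names `toy_vectors` and `average_vector` and the 3-dimensional values are made up for illustration, and the sketch assumes every word is in the vocabulary (which holds above because minCount=0).

```python
# Hypothetical 3-dimensional word vectors, invented for this illustration.
# In Spark the learned vectors can be inspected with model.getVectors().
toy_vectors = {
    "Hi": [1.0, 0.0, 2.0],
    "Spark": [3.0, 4.0, 0.0],
}


def average_vector(words, vectors, size):
    """Return the element-wise mean of the vectors of `words`.

    Assumes every word is present in `vectors`; an empty document
    yields a zero vector of length `size`.
    """
    total = [0.0] * size
    if not words:
        return total
    for word in words:
        for i, value in enumerate(vectors[word]):
            total[i] += value
    return [component / len(words) for component in total]


doc_vector = average_vector(["Hi", "Spark"], toy_vectors, 3)
print(doc_vector)  # [2.0, 2.0, 1.0]
```

Because the document vector is an average, documents of different lengths all map to vectors of the same size (vectorSize), which is what makes the "result" column usable as a feature column for downstream models.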