如何进行Spark中MLlib的本质分析
发表于:2024-12-12 作者:千家信息网编辑
千家信息网最后更新 2024年12月12日,如何进行Spark中MLlib的本质分析,相信很多没有经验的人对此束手无策,为此本文总结了问题出现的原因和解决方法,通过这篇文章希望你能解决这个问题。org.apache.spark.ml(http:
千家信息网最后更新 2024年12月12日如何进行Spark中MLlib的本质分析
如何进行Spark中MLlib的本质分析,相信很多没有经验的人对此束手无策,为此本文总结了问题出现的原因和解决方法,通过这篇文章希望你能解决这个问题。
org.apache.spark.ml(http://spark.apache.org/docs/latest/ml-guide.html )
org.apache.spark.ml.attributeorg.apache.spark.ml.classificationorg.apache.spark.ml.clusteringorg.apache.spark.ml.evaluationorg.apache.spark.ml.featureorg.apache.spark.ml.paramorg.apache.spark.ml.recommendationorg.apache.spark.ml.regressionorg.apache.spark.ml.source.libsvmorg.apache.spark.ml.treeorg.apache.spark.ml.tuningorg.apache.spark.ml.util
org.apache.spark.mllib (http://spark.apache.org/docs/latest/mllib-guide.html )
org.apache.spark.mllib.classificationorg.apache.spark.mllib.clusteringorg.apache.spark.mllib.evaluationorg.apache.spark.mllib.featureorg.apache.spark.mllib.fpmorg.apache.spark.mllib.linalgorg.apache.spark.mllib.linalg.distributedorg.apache.spark.mllib.pmmlorg.apache.spark.mllib.randomorg.apache.spark.mllib.rddorg.apache.spark.mllib.recommendationorg.apache.spark.mllib.regressionorg.apache.spark.mllib.statorg.apache.spark.mllib.stat.distributedorg.apache.spark.mllib.stat.testorg.apache.spark.mllib.treeorg.apache.spark.mllib.tree.configurationorg.apache.spark.mllib.tree.impurityorg.apache.spark.mllib.tree.lossorg.apache.spark.mllib.tree.modelorg.apache.spark.mllib.util
ML概念
DataFrame: Spark ML uses DataFrame from Spark SQL as an ML dataset, which can hold a variety of data types. E.g., a DataFrame could have different columns storing text, feature vectors, true labels, and predictions.Transformer: A Transformer is an algorithm which can transform one DataFrame into another DataFrame. E.g., an ML model is a Transformer which transforms DataFrame with features into a DataFrame with predictions.Estimator: An Estimator is an algorithm which can be fit on a DataFrame to produce a Transformer. E.g., a learning algorithm is an Estimator which trains on a DataFrame and produces a model.Pipeline: A Pipeline chains multiple Transformers and Estimators together to specify an ML workflow.Parameter: All Transformers and Estimators now share a common API for specifying parameters.
ML分类和回归
Classification Logistic regression Decision tree classifier Random forest classifier Gradient-boosted tree classifier Multilayer perceptron classifier One-vs-Rest classifier (a.k.a. One-vs-All)Regression Linear regression Decision tree regression Random forest regression Gradient-boosted tree regression Survival regressionDecision treesTree Ensembles Random Forests Gradient-Boosted Trees (GBTs)
ML聚类
K-meansLatent Dirichlet allocation (LDA)
MLlib 数据类型
Local vectorLabeled pointLocal matrixDistributed matrix RowMatrix IndexedRowMatrix CoordinateMatrix BlockMatrix
MLlib 分类和回归
Binary Classification: linear SVMs, logistic regression, decision trees, random forests, gradient-boosted trees, naive BayesMulticlass Classification:logistic regression, decision trees, random forests, naive BayesRegression:linear least squares, Lasso, ridge regression, decision trees, random forests, gradient-boosted trees, isotonic regression
MLlib 聚类
K-meansGaussian mixturePower iteration clustering (PIC,多用于图像识别)Latent Dirichlet allocation (LDA,多用于主题分类)Bisecting k-meansStreaming k-means
MLlib Models
DecisionTreeModelDistributedLDAModelGaussianMixtureModelGradientBoostedTreesModelIsotonicRegressionModelKMeansModelLassoModelLDAModelLinearRegressionModelLocalLDAModelLogisticRegressionModelMatrixFactorizationModelNaiveBayesModelPowerIterationClusteringModelRandomForestModelRidgeRegressionModelStreamingKMeansModelSVMModelWord2VecModel
Example
import org.apache.spark.ml.classification.LogisticRegression import org.apache.spark.ml.param.ParamMap import org.apache.spark.mllib.linalg.{Vector, Vectors} import org.apache.spark.sql.Row val training = sqlContext.createDataFrame(Seq( (1.0, Vectors.dense(0.0, 1.1, 0.1)), (0.0, Vectors.dense(2.0, 1.0, -1.0)), (0.0, Vectors.dense(2.0, 1.3, 1.0)), (1.0, Vectors.dense(0.0, 1.2, -0.5)) )) .toDF("label", "features") val lr = new LogisticRegression()println("LogisticRegression parameters:\n" + lr.explainParams() + "\n") lr.setMaxIter(10).setRegParam(0.01) val model1 = lr.fit(training) println("Model 1 was fit using parameters: " + model1.parent.extractParamMap) val paramMap = ParamMap(lr.maxIter -> 20) .put(lr.maxIter, 30) .put(lr.regParam -> 0.1, lr.threshold -> 0.55)val paramMap2 = ParamMap(lr.probabilityCol -> "myProbability") val paramMapCombined = paramMap ++ paramMap2val model2 = lr.fit(training, paramMapCombined)println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)test = sqlContext.createDataFrame(Seq( (1.0, Vectors.dense(-1.0, 1.5, 1.3)), (0.0, Vectors.dense(3.0, 2.0, -0.1)), (1.0, Vectors.dense(0.0, 2.2, -1.5)) )) .toDF("label", "features")model2.transform(test) .select("features", "label", "myProbability", "prediction") .collect() .foreach { case Row(features: Vector, label: Double, prob: Vector, prediction: Double) => println(s"($features, $label) -> prob=$prob, prediction=$prediction") }
看完上述内容,你们掌握如何进行Spark中MLlib的本质分析的方法了吗?如果还想学到更多技能或想了解更多相关内容,欢迎关注行业资讯频道,感谢各位的阅读!
分类
本质
分析
内容
方法
更多
问题
束手无策
为此
主题
原因
图像
对此
技能
数据
概念
篇文章
类型
经验
行业
数据库的安全要保护哪些东西
数据库安全各自的含义是什么
生产安全数据库录入
数据库的安全性及管理
数据库安全策略包含哪些
海淀数据库安全审计系统
建立农村房屋安全信息数据库
易用的数据库客户端支持安全管理
连接数据库失败ssl安全错误
数据库的锁怎样保障安全
聊天室的数据库
群晖服务器超级管理员
软件开发技术总监简历
参加网络安全大赛需要哪些知识
对于服务器管理的工作总结
中国邮储软件开发中心
服务器管理教程视频
人脸识别闸机需要配服务器吗
足力健老人服务器
数据库列的取值长度
特岗服务器满有证书吗
网络安全分析研究员
山东滕纵软件开发有限公司
网络安全插花
江苏超频服务器大概费用
广州畅海网络技术有限公司
收费站网络安全应急演练总结
电影app软件开发
网络安全研究方向怎么选
FLASH插件软件开发
简笔画信息网络安全
oracle没有创建数据库
上市公司数据库免费使用
服务器和光纤
青岛中诚互联网络技术有限公司
一号互联网科技有限公司地址
企业级服务器硬件基础设备
速保网络技术有限公司
沃金服务器人口
免费知识数据库