How to Analyze the Essentials of MLlib in Spark
Published: 2024-12-12  Author: 千家信息网 editor
千家信息网, last updated 2024-12-12
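Many people without prior experience are unsure how to get to the essentials of MLlib in Spark. This article summarizes the two package families that make up the library, its core concepts, the algorithms and models each API provides, and a worked example.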
org.apache.spark.ml (http://spark.apache.org/docs/latest/ml-guide.html)
  org.apache.spark.ml.attribute
  org.apache.spark.ml.classification
  org.apache.spark.ml.clustering
  org.apache.spark.ml.evaluation
  org.apache.spark.ml.feature
  org.apache.spark.ml.param
  org.apache.spark.ml.recommendation
  org.apache.spark.ml.regression
  org.apache.spark.ml.source.libsvm
  org.apache.spark.ml.tree
  org.apache.spark.ml.tuning
  org.apache.spark.ml.util
org.apache.spark.mllib (http://spark.apache.org/docs/latest/mllib-guide.html)
  org.apache.spark.mllib.classification
  org.apache.spark.mllib.clustering
  org.apache.spark.mllib.evaluation
  org.apache.spark.mllib.feature
  org.apache.spark.mllib.fpm
  org.apache.spark.mllib.linalg
  org.apache.spark.mllib.linalg.distributed
  org.apache.spark.mllib.pmml
  org.apache.spark.mllib.random
  org.apache.spark.mllib.rdd
  org.apache.spark.mllib.recommendation
  org.apache.spark.mllib.regression
  org.apache.spark.mllib.stat
  org.apache.spark.mllib.stat.distributed
  org.apache.spark.mllib.stat.test
  org.apache.spark.mllib.tree
  org.apache.spark.mllib.tree.configuration
  org.apache.spark.mllib.tree.impurity
  org.apache.spark.mllib.tree.loss
  org.apache.spark.mllib.tree.model
  org.apache.spark.mllib.util
ML Concepts
DataFrame: Spark ML uses DataFrame from Spark SQL as an ML dataset, which can hold a variety of data types. E.g., a DataFrame could have different columns storing text, feature vectors, true labels, and predictions.
Transformer: A Transformer is an algorithm which can transform one DataFrame into another DataFrame. E.g., an ML model is a Transformer which transforms a DataFrame with features into a DataFrame with predictions.
Estimator: An Estimator is an algorithm which can be fit on a DataFrame to produce a Transformer. E.g., a learning algorithm is an Estimator which trains on a DataFrame and produces a model.
Pipeline: A Pipeline chains multiple Transformers and Estimators together to specify an ML workflow.
Parameter: All Transformers and Estimators now share a common API for specifying parameters.
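The concepts above can be sketched with a small text-classification pipeline. This is a minimal sketch following the pattern shown in the Spark ML guide, assuming a Spark 1.x spark-shell session where `sqlContext` is already defined (as in the Example section). The sample sentences, column names, variable names, and parameter values are illustrative assumptions, not part of the original article.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// DataFrame: one dataset holding ids, raw text, and labels (made-up sample rows).
val docs = sqlContext.createDataFrame(Seq(
  (0L, "a b c d e spark", 1.0),
  (1L, "b d", 0.0),
  (2L, "spark f g h", 1.0),
  (3L, "hadoop mapreduce", 0.0)
)).toDF("id", "text", "label")

// Transformers: each one maps a DataFrame to a new DataFrame with an extra column.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF()
  .setNumFeatures(1000)
  .setInputCol("words")
  .setOutputCol("features")

// Estimator: fit() on a DataFrame produces a model, which is itself a Transformer.
// Parameters are set through the shared setter API.
val textLr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)

// Pipeline: chains the stages; fitting it runs them in order and returns a PipelineModel.
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, textLr))
val pipelineModel = pipeline.fit(docs)

// The fitted PipelineModel adds prediction columns to any DataFrame with a "text" column.
val scored = pipelineModel.transform(docs)
scored.select("id", "text", "prediction").show()
```

Scoring the training rows here only demonstrates the data flow; a real workflow would call `transform` on held-out data.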
ML Classification and Regression
Classification
  Logistic regression
  Decision tree classifier
  Random forest classifier
  Gradient-boosted tree classifier
  Multilayer perceptron classifier
  One-vs-Rest classifier (a.k.a. One-vs-All)
Regression
  Linear regression
  Decision tree regression
  Random forest regression
  Gradient-boosted tree regression
  Survival regression
Decision trees
Tree Ensembles
  Random Forests
  Gradient-Boosted Trees (GBTs)
ML Clustering
K-means
Latent Dirichlet allocation (LDA)
MLlib Data Types
Local vector
Labeled point
Local matrix
Distributed matrix
  RowMatrix
  IndexedRowMatrix
  CoordinateMatrix
  BlockMatrix
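The data types listed above can be constructed as follows. This is a minimal sketch, assuming a spark-shell session where the SparkContext `sc` is already defined; the numeric values and variable names are illustrative assumptions.

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.regression.LabeledPoint

// Local vector: dense and sparse encodings of the same values (1.0, 0.0, 3.0).
val denseVec: Vector = Vectors.dense(1.0, 0.0, 3.0)
val sparseVec: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// Labeled point: a local vector paired with a label, used by supervised algorithms.
val positive = LabeledPoint(1.0, denseVec)

// Local matrix: a 3x2 dense matrix; values are given in column-major order.
val localMatrix: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))

// Distributed matrix: a RowMatrix is backed by an RDD of local vectors, one per row.
val rowMatrix = new RowMatrix(sc.parallelize(Seq(denseVec, sparseVec)))
println(s"rows=${rowMatrix.numRows()}, cols=${rowMatrix.numCols()}")
```

IndexedRowMatrix, CoordinateMatrix, and BlockMatrix are built in a similar way from RDDs of indexed rows, matrix entries, and sub-matrix blocks respectively.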
MLlib Classification and Regression
Binary Classification: linear SVMs, logistic regression, decision trees, random forests, gradient-boosted trees, naive Bayes
Multiclass Classification: logistic regression, decision trees, random forests, naive Bayes
Regression: linear least squares, Lasso, ridge regression, decision trees, random forests, gradient-boosted trees, isotonic regression
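For comparison with the DataFrame-based API, binary logistic regression in the RDD-based API might look like the sketch below. It assumes a spark-shell session where `sc` is defined and reuses the data points from the Example section; the variable names are illustrative assumptions.

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// RDD-based algorithms train on RDD[LabeledPoint] rather than on a DataFrame.
val labeled = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)),
  LabeledPoint(0.0, Vectors.dense(2.0, 1.0, -1.0)),
  LabeledPoint(0.0, Vectors.dense(2.0, 1.3, 1.0)),
  LabeledPoint(1.0, Vectors.dense(0.0, 1.2, -0.5))
)).cache()

// Train a two-class model; run() returns a LogisticRegressionModel.
val lbfgsModel = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(labeled)

// predict() on a single vector returns the predicted class label as a Double.
val predicted = lbfgsModel.predict(Vectors.dense(-1.0, 1.5, 1.3))
println(s"predicted label: $predicted")
```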
MLlib Clustering
K-means
Gaussian mixture
Power iteration clustering (PIC, mostly used for image recognition)
Latent Dirichlet allocation (LDA, mostly used for topic classification)
Bisecting k-means
Streaming k-means
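As one instance of the clustering algorithms above, k-means in the RDD-based API can be sketched as follows. It assumes a spark-shell session where `sc` is defined; the sample points, the choice of k = 2, and the iteration count are illustrative assumptions.

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Two loose groups of 2-D points (made-up data), cached because k-means is iterative.
val points = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1), Vectors.dense(0.2, 0.0),
  Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.2), Vectors.dense(8.9, 9.0)
)).cache()

// Train with k = 2 clusters and at most 20 iterations; the result is a KMeansModel.
val kmeansModel = KMeans.train(points, 2, 20)
kmeansModel.clusterCenters.foreach(println)

// Within-set sum of squared errors: lower means tighter clusters.
val wssse = kmeansModel.computeCost(points)
println(s"WSSSE = $wssse")

// Assign a new point to its nearest cluster (returns the cluster index).
val clusterOfNewPoint = kmeansModel.predict(Vectors.dense(0.05, 0.05))
```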
MLlib Models
DecisionTreeModel
DistributedLDAModel
GaussianMixtureModel
GradientBoostedTreesModel
IsotonicRegressionModel
KMeansModel
LassoModel
LDAModel
LinearRegressionModel
LocalLDAModel
LogisticRegressionModel
MatrixFactorizationModel
NaiveBayesModel
PowerIterationClusteringModel
RandomForestModel
RidgeRegressionModel
StreamingKMeansModel
SVMModel
Word2VecModel
Example
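    // Written against the Spark 1.x API: run in spark-shell, where sqlContext is predefined.
    // In Spark 2.0 and later, spark.ml expects org.apache.spark.ml.linalg vectors instead.
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.param.ParamMap
    import org.apache.spark.mllib.linalg.{Vector, Vectors}
    import org.apache.spark.sql.Row

    // Training data: (label, features) tuples converted to a DataFrame.
    val training = sqlContext.createDataFrame(Seq(
      (1.0, Vectors.dense(0.0, 1.1, 0.1)),
      (0.0, Vectors.dense(2.0, 1.0, -1.0)),
      (0.0, Vectors.dense(2.0, 1.3, 1.0)),
      (1.0, Vectors.dense(0.0, 1.2, -0.5))
    )).toDF("label", "features")

    // LogisticRegression is an Estimator; print its parameters and their documentation.
    val lr = new LogisticRegression()
    println("LogisticRegression parameters:\n" + lr.explainParams() + "\n")

    // Set parameters with setter methods, then fit a first model.
    lr.setMaxIter(10).setRegParam(0.01)
    val model1 = lr.fit(training)
    // model1 is a Transformer; its parent is the Estimator that produced it.
    println("Model 1 was fit using parameters: " + model1.parent.extractParamMap)

    // Alternatively, specify parameters with a ParamMap.
    val paramMap = ParamMap(lr.maxIter -> 20)
      .put(lr.maxIter, 30) // overwrites the earlier maxIter value
      .put(lr.regParam -> 0.1, lr.threshold -> 0.55)

    // ParamMaps can be combined; this one renames the probability output column.
    val paramMap2 = ParamMap(lr.probabilityCol -> "myProbability")
    val paramMapCombined = paramMap ++ paramMap2

    // Parameters passed to fit() override those set earlier through setters.
    val model2 = lr.fit(training, paramMapCombined)
    println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)

    // Test data (the original omitted the val keyword here).
    val test = sqlContext.createDataFrame(Seq(
      (1.0, Vectors.dense(-1.0, 1.5, 1.3)),
      (0.0, Vectors.dense(3.0, 2.0, -0.1)),
      (1.0, Vectors.dense(0.0, 2.2, -1.5))
    )).toDF("label", "features")

    // transform() uses only the "features" column and adds the prediction columns.
    model2.transform(test)
      .select("features", "label", "myProbability", "prediction")
      .collect()
      .foreach { case Row(features: Vector, label: Double, prob: Vector, prediction: Double) =>
        println(s"($features, $label) -> prob=$prob, prediction=$prediction")
      }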