How to Implement a Random Forest in Python

Published: 2024-09-21 Author: 千家信息网 editor

Last updated by 千家信息网 on 2024-09-21.

This article explains how to implement a random forest in Python. It gives a brief background on the algorithm and then walks through a short, easy-to-follow example. Let's take a look.

Background

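Random Forest is a trademarked term for an ensemble of decision trees. A random forest is a collection of decision trees (hence the "forest"). To classify a new object based on its attributes, each tree produces a classification, and we say that the tree "votes" for that class. The forest chooses the class with the most votes across all of its trees.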
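The voting step can be illustrated with a small sketch. The function name majority_vote and the three "trees" below are hypothetical, made up for illustration; class labels are assumed to be non-negative integers. Note that scikit-learn's RandomForestClassifier combines its trees by averaging predicted class probabilities rather than by counting hard votes, so this is a sketch of the idea described above, not of that library's internals.

```python
import numpy as np

def majority_vote(tree_predictions):
    """Return the most-voted class for each sample.

    tree_predictions has shape (n_trees, n_samples) and holds
    non-negative integer class labels (an assumption of this sketch).
    """
    votes = np.asarray(tree_predictions)
    n_classes = votes.max() + 1
    # counts[c, j] = number of trees that voted for class c on sample j
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)])
    # argmax picks the class with the most votes; ties go to the lower label
    return counts.argmax(axis=0)

# three hypothetical trees voting on four samples
votes = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [1, 1, 0, 1]]
result = majority_vote(votes)
print(result)  # [0 1 0 0]
```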

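Each tree is planted and grown as follows:

If the training set contains N cases, a sample of N cases is drawn at random with replacement. This sample becomes the training set for growing the tree.
If there are M input variables, a number m << M is specified such that, at each node, m variables are selected at random out of the M, and the best split on these m variables is used to split the node. The value of m is held constant while the forest is grown.
Each tree is grown to the largest extent possible. There is no pruning.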
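The three growth rules can be sketched by hand using scikit-learn's DecisionTreeClassifier as the base tree. This is a minimal illustration under stated assumptions, not how RandomForestClassifier is implemented: the helper names grow_forest and forest_predict, the synthetic dataset from make_classification, and the choices M = 16, m = 4 and 25 trees are all made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def grow_forest(X, y, n_trees=25, m=4, seed=0):
    """Grow n_trees unpruned trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    n_cases = X.shape[0]
    forest = []
    for _ in range(n_trees):
        # draw N cases at random, with replacement (bootstrap sample)
        idx = rng.integers(0, n_cases, size=n_cases)
        # max_features=m: each node considers only m randomly chosen variables
        # max_depth=None: the tree grows as far as it can, with no pruning
        tree = DecisionTreeClassifier(
            max_features=m,
            max_depth=None,
            random_state=int(rng.integers(0, 2**31 - 1)),
        )
        tree.fit(X[idx], y[idx])
        forest.append(tree)
    return forest

def forest_predict(forest, X):
    """Predict by majority vote over all trees in the forest."""
    votes = np.stack([tree.predict(X) for tree in forest]).astype(int)
    # one column of votes per sample; take the most frequent label
    return np.array([np.bincount(col).argmax() for col in votes.T])

# synthetic data with M = 16 input variables (illustrative only)
X, y = make_classification(n_samples=300, n_features=16,
                           n_informative=6, random_state=0)
forest = grow_forest(X, y)
pred = forest_predict(forest, X)
print("training accuracy:", (pred == y).mean())
```

Because m stays fixed for every tree and every node, the only sources of variation between trees are the bootstrap sample and the random choice of candidate variables at each split.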

Getting Started Example

Python implementation:

'''The following code is for the Random Forest
Created by - ANALYTICS VIDHYA'''

# import the required libraries
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# read the train and test datasets
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')

# view the top 3 rows of the dataset
print(train_data.head(3))

# shape of the datasets
print('\nShape of training data :', train_data.shape)
print('\nShape of testing data :', test_data.shape)

# Now, we need to predict the target variable in the test data
# target variable - Survived

# separate the independent and target variables on the training data
train_x = train_data.drop(columns=['Survived'])
train_y = train_data['Survived']

# separate the independent and target variables on the testing data
test_x = test_data.drop(columns=['Survived'])
test_y = test_data['Survived']

# Create the object of the Random Forest model.
# You can also add other parameters and test your code here.
# Some parameters are: n_estimators and max_depth
# Documentation of sklearn RandomForestClassifier:
# https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
model = RandomForestClassifier()

# fit the model with the training data
model.fit(train_x, train_y)

# number of trees used
print('Number of Trees used : ', model.n_estimators)

# predict the target on the train dataset
predict_train = model.predict(train_x)
print('\nTarget on train data', predict_train)

# accuracy score on the train dataset
accuracy_train = accuracy_score(train_y, predict_train)
print('\naccuracy_score on train dataset : ', accuracy_train)

# predict the target on the test dataset
predict_test = model.predict(test_x)
print('\nTarget on test data', predict_test)

# accuracy score on the test dataset
accuracy_test = accuracy_score(test_y, predict_test)
print('\naccuracy_score on test dataset : ', accuracy_test)

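Output (the tree count of 10 reflects an older scikit-learn release; since version 0.22 the default n_estimators is 100):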

Shape of training data : (712, 25)
Shape of testing data : (179, 25)
Number of Trees used :  10
Target on train data [0 1 1 0 0 0 0 0 0 0 0 1 1 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 0 1 0 1 0 1 1 0 1 0 1 0 0 0 0 0 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 1 1 1 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 1 1 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 1 0 1 1 0 0 0 1 0 0 1 0 0 0 1 0 1 0 1 0 0 0 0 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 1 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 1 1 1 1 0 0 1 0 1 0 0 1 1 1 1 0 0 0 1 1 1 1 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 0 1 0 0 0 1 0 0 1 1 0 1 1 1 0 1 1 0 0 1 0 0 1 1 1 1 0 0 1 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 1 0 1 0 1 1 1 0 0 0 1 0 1 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 0 0 1 0 0 1 1 1 1 0 1 1 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 1 0 1 1 0 0 1 0 1 0 1 1 1 1 0 0 1 0 0 0 1 0 0 0 0 0 1 1 0 0 1 1 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 1 1 0 0 1 0 1 1 0 1 0 0 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0 1 0 1 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 1 1 0 1 0 0 0 0 1 1 0 1 1 1 0 1 0 1 0 1 1 0 1 0 1 0 0 1 0 0 1 0 1 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 1 1 0 1 1 0 0 1 0 1 0 0 1 0 0 1 1 1 1 0 1 0 0 0 1 0 1 1 1 0 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 1 0 1 1 1 0 0 1 0]
accuracy_score on train dataset :  0.973314606741573
Target on test data [0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 1 1 0 0 1 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 0 0 1 1 1 1 1 0 1 1 0 1 1 0 0 1 1 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 1 1 0 0 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0]
accuracy_score on test dataset :  0.8156424581005587
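
The example above leaves RandomForestClassifier at its defaults and only mentions n_estimators and max_depth in a comment. The sketch below shows how those two parameters might be set and compared. Since train-data.csv and test-data.csv are not included here, it assumes a synthetic stand-in generated with make_classification and split with train_test_split; the sizes, the random_state values and the two max_depth settings are illustrative choices, and the scores it prints are not comparable to the results above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# synthetic stand-in for the CSV files (assumption: 24 features, binary target)
X, y = make_classification(n_samples=891, n_features=24,
                           n_informative=8, random_state=42)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.2, random_state=42)

results = {}
for max_depth in (None, 5):
    # n_estimators sets the number of trees; max_depth caps how deep each grows
    model = RandomForestClassifier(n_estimators=100, max_depth=max_depth,
                                   random_state=42)
    model.fit(train_x, train_y)
    train_acc = accuracy_score(train_y, model.predict(train_x))
    test_acc = accuracy_score(test_y, model.predict(test_x))
    results[max_depth] = (train_acc, test_acc)
    print(f"max_depth={max_depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

Comparing the train and test scores for each setting is one way to see how much a fully grown forest fits the training data more closely than unseen data, as the gap between 0.97 and 0.82 in the output above suggests.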