千家信息网

Greenplum Distributed Cluster (Data Warehouse) in Practice

Published: 2024-09-22 Author: 千家信息网 editorial staff

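1. Preparing the Environment

1.1 Cluster Overview

Operating system: CentOS 6.5

Database version: greenplum-db-4.3.3.1-build-1-RHEL5-x86_64.zip

The Greenplum cluster consists of four machines, with the following IP addresses: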

[root@dw-greenplum-1 ~]# cat /etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.96.101 dw-greenplum-1 mdw

192.168.96.102 dw-greenplum-2 sdw1

192.168.96.103 dw-greenplum-3 sdw2

192.168.96.104 dw-greenplum-4 sdw3

Add the entries above to /etc/hosts on every machine.

[Figure: cluster architecture diagram]
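
Because the same four entries have to be added on every machine, it can help to script the step so that it is safe to run more than once. The following is a minimal sketch, not part of the original procedure: HOSTS_FILE and add_host_entry are hypothetical names, and the file defaults to a scratch file so the script can be tried without touching the real /etc/hosts.

```shell
#!/usr/bin/env bash
# Sketch: append the cluster entries to a hosts file only if they are missing.
# HOSTS_FILE is a hypothetical variable; it defaults to a scratch file.
# On a real node, run as root with HOSTS_FILE=/etc/hosts.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

add_host_entry() {
    # $1 = IP address, $2 = hostname, $3 = short alias
    local line="$1 $2 $3"
    # -x matches whole lines only, -F treats the entry as a literal string
    grep -qxF "$line" "$HOSTS_FILE" 2>/dev/null || echo "$line" >> "$HOSTS_FILE"
}

add_host_entry 192.168.96.101 dw-greenplum-1 mdw
add_host_entry 192.168.96.102 dw-greenplum-2 sdw1
add_host_entry 192.168.96.103 dw-greenplum-3 sdw2
add_host_entry 192.168.96.104 dw-greenplum-4 sdw3

cat "$HOSTS_FILE"
```

Running the script a second time against the same file leaves it unchanged, because each entry is written only when no identical line exists.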

1.2 Create the User and Group (on every machine)

[root@dw-greenplum-1 ~]# groupadd -g 530 gpadmin

[root@dw-greenplum-1 ~]# useradd -g 530 -u530 -m -d /home/gpadmin -s /bin/bash gpadmin

[root@dw-greenplum-1 ~]# passwd gpadmin

Changing password for user gpadmin.

New password:

BAD PASSWORD: it is too simplistic/systematic

BAD PASSWORD: is too simple

Retype new password:

passwd: all authentication tokens updated successfully.

1.3 Tune Kernel Parameters (on every machine)

Note: this step is required; skipping it causes errors.

[root@dw-greenplum-1 ~]# vi /etc/sysctl.conf

kernel.shmmax = 500000000

kernel.shmmni = 4096

kernel.shmall = 4000000000

kernel.sem = 250 512000 100 2048

kernel.sysrq = 1

kernel.core_uses_pid = 1

kernel.msgmnb = 65536

kernel.msgmax = 65536

kernel.msgmni = 2048

net.ipv4.tcp_syncookies = 1

net.ipv4.ip_forward = 0

net.ipv4.conf.default.accept_source_route = 0

net.ipv4.tcp_tw_recycle = 1

net.ipv4.tcp_max_syn_backlog = 4096

net.ipv4.conf.all.arp_filter = 1

net.ipv4.ip_local_port_range = 1025 65535

net.core.netdev_max_backlog = 10000

net.core.rmem_max = 2097152

net.core.wmem_max = 2097152

vm.overcommit_memory = 2

Adjust these values to suit your own servers.

Apply the settings:

[root@dw-greenplum-1 ~]# sysctl -p
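
Since a missing kernel setting only shows up later as an error, it may be worth checking the file on each machine before moving on. The sketch below is an illustration under stated assumptions: check_sysctl_file is a hypothetical helper, and it reads a small scratch file rather than the real /etc/sysctl.conf so that it can be run anywhere.

```shell
#!/usr/bin/env bash
# Sketch: report which expected settings differ in a sysctl-style file.
# check_sysctl_file and the scratch file are hypothetical, for illustration only.

check_sysctl_file() {
    # $1 = file to inspect; remaining arguments = "key=value" pairs to expect
    local file="$1"; shift
    local mismatches=0 pair key want got
    for pair in "$@"; do
        key="${pair%%=*}"
        want="${pair#*=}"
        # Use the last assignment of the key, trimming spaces around "="
        got="$(awk -F= -v k="$key" '
            { gsub(/^[ \t]+|[ \t]+$/, "", $1) }
            $1 == k { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); val = v }
            END { print val }' "$file")"
        if [ "$got" != "$want" ]; then
            echo "MISMATCH $key: expected '$want', found '$got'"
            mismatches=$((mismatches + 1))
        fi
    done
    return "$mismatches"
}

# Scratch file holding a few of the settings listed above
sample="$(mktemp)"
cat > "$sample" <<'EOF'
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.sem = 250 512000 100 2048
vm.overcommit_memory = 2
EOF

check_sysctl_file "$sample" kernel.shmmax=500000000 kernel.shmmni=4096 vm.overcommit_memory=2 \
    && echo "all expected settings present"
```

On a real node the first argument would be /etc/sysctl.conf, followed by whichever key=value pairs you chose for your hardware.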

1.4 Raise the Open-File and Process Limits (on every machine)

Note: this step is required; skipping it causes errors.

[root@dw-greenplum-1 ~]# vi /etc/security/limits.conf

# End of file

* soft nofile 65536

* hard nofile 65536

* soft nproc 131072

* hard nproc 131072
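
Values in limits.conf apply to new login sessions, so a quick check after logging in again can confirm that they took effect. This is a minimal sketch; limit_at_least is a hypothetical helper, and the thresholds are simply the values from the file above.

```shell
#!/usr/bin/env bash
# Sketch: compare the current shell's limits with the values set in limits.conf.
# limit_at_least is a hypothetical helper for illustration.

limit_at_least() {
    # $1 = value reported by ulimit, $2 = required minimum
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

nofile="$(ulimit -n)"   # maximum open file descriptors (nofile)
nproc="$(ulimit -u)"    # maximum user processes (nproc)

limit_at_least "$nofile" 65536  && echo "nofile ok: $nofile" || echo "nofile too low: $nofile"
limit_at_least "$nproc"  131072 && echo "nproc ok: $nproc"   || echo "nproc too low: $nproc"
```

If nproc still reports a lower value after a fresh login, it may be worth looking for an overriding file under /etc/security/limits.d on that machine.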

2. Installing Greenplum

Create the installation directory for the Greenplum software and give ownership to the gpadmin user (on every machine):

[root@dw-greenplum-1 ~]# mkdir /opt/greenplum

[root@dw-greenplum-1 ~]# chown -R gpadmin:gpadmin /opt/greenplum

First, place the installation file on the master (192.168.96.101):

greenplum-db-4.3.3.1-build-1-RHEL5-x86_64.zip

Unzip the installation file:

[root@dw-greenplum-1 ~]# unzip greenplum-db-4.3.3.1-build-1-RHEL5-x86_64.zip

Run the installer:

[root@dw-greenplum-1 ~]# chmod +x greenplum-db-4.3.3.1-build-1-RHEL5-x86_64.bin

[root@dw-greenplum-1 ~]# ./greenplum-db-4.3.3.1-build-1-RHEL5-x86_64.bin

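The installer displays the Greenplum license information.

It asks whether you accept the license.

It then asks for the installation directory.

After these steps the installation runs, and the installer finally reports that the software was installed successfully.

Configure the environment variables (on both the master and the standby master)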

[root@dw-greenplum-1 ~]# su - gpadmin

[gpadmin@dw-greenplum-1 ~]$ vi .bash_profile

source /opt/greenplum/greenplum-db/greenplum_path.sh

export MASTER_DATA_DIRECTORY=/home/gpadmin/gpdata/gpmaster/gpseg-1

export PGPORT=5432

export PGDATABASE=trjDB

Apply the configuration:

[gpadmin@dw-greenplum-1 ~]$ source .bash_profile
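
Later commands depend on these variables, so a small check after sourcing the profile can catch a typo early. The following sketch assumes nothing beyond the three exports shown above; check_gp_env is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the variables exported in .bash_profile are set.
# check_gp_env is a hypothetical helper for illustration.

check_gp_env() {
    local name missing=0
    for name in MASTER_DATA_DIRECTORY PGPORT PGDATABASE; do
        # ${!name} is bash indirect expansion: the value of the variable named by $name
        if [ -z "${!name:-}" ]; then
            echo "not set: $name"
            missing=1
        else
            echo "$name=${!name}"
        fi
    done
    return "$missing"
}

check_gp_env || echo "one or more variables are missing; source ~/.bash_profile and retry"
```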

Configure hostlist

Create a configuration file that lists every server name:

[gpadmin@dw-greenplum-1 ~]$ mkdir conf

[gpadmin@dw-greenplum-1 ~]$ cd conf/

[gpadmin@dw-greenplum-1 conf]$ vi hostlist

mdw

sdw1

sdw2

sdw3

[gpadmin@dw-greenplum-1 conf]$ vi seg_hosts

sdw1

sdw2

sdw3
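
The two files differ only in that seg_hosts leaves out the master. As a sketch of how they could be generated from one list instead of being typed twice, the script below writes both and confirms they agree. CONF_DIR is a hypothetical variable that defaults to a scratch directory; on the master it would be /home/gpadmin/conf.

```shell
#!/usr/bin/env bash
# Sketch: generate hostlist and seg_hosts from a single host list.
# CONF_DIR is a hypothetical variable; it defaults to a scratch directory.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
MASTER=mdw
SEGMENTS=(sdw1 sdw2 sdw3)

# hostlist: every machine, master first
printf '%s\n' "$MASTER" "${SEGMENTS[@]}" > "$CONF_DIR/hostlist"
# seg_hosts: segment hosts only
printf '%s\n' "${SEGMENTS[@]}" > "$CONF_DIR/seg_hosts"

# seg_hosts should equal hostlist with the master line removed
grep -vx "$MASTER" "$CONF_DIR/hostlist" | diff - "$CONF_DIR/seg_hosts" \
    && echo "host files are consistent"
```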

[gpadmin@dw-greenplum-1 conf]$ gpssh-exkeys -f hostlist

[STEP 1 of 5] create local ID and authorize on local host

[STEP 2 of 5] keyscan all hosts and update known_hosts file

[STEP 3 of 5] authorize current user on remote hosts

... send to sdw1

***

*** Enter password for sdw1:

... send to sdw2

... send to sdw3

[STEP 4 of 5] determine common authentication file content

[STEP 5 of 5] copy authentication files to all remote hosts

... finished key exchange with sdw1

... finished key exchange with sdw2

... finished key exchange with sdw3

[INFO] completed successfully

Once passwordless SSH is in place between all machines, gpssh can run commands on all of them in one batch:

[gpadmin@dw-greenplum-1 conf]$ gpssh -f hostlist

Note: command history unsupported on this machine ...

=> pwd

[sdw3] /home/gpadmin

[sdw1] /home/gpadmin

[sdw2] /home/gpadmin

[ mdw] /home/gpadmin

=>

Distribute the software to every machine

Next, package the installed files:

[gpadmin@dw-greenplum-1 conf]$ cd /opt/greenplum/

[gpadmin@dw-greenplum-1 greenplum]$ tar -cf gp.4.3.3.1.tar greenplum-db-4.3.3.1/

Then use gpscp to copy the archive to every machine:

[gpadmin@dw-greenplum-1 greenplum]$ gpscp -f /home/gpadmin/conf/hostlist gp.4.3.3.1.tar =:/opt/greenplum/

Use gpssh to extract the archive on all machines at once:

[gpadmin@dw-greenplum-1 greenplum]$ cd /home/gpadmin/conf/

[gpadmin@dw-greenplum-1 conf]$ gpssh -f hostlist

=> cd /opt/greenplum

[sdw3]

[sdw1]

[sdw2]

[ mdw]

=> tar -xf gp.4.3.3.1.tar

[sdw3]

[sdw1]

[sdw2]

[ mdw]

Create the symbolic link:

=> ln -s greenplum-db-4.3.3.1 greenplum-db

[sdw3]

[sdw1]

[sdw2]

[ mdw]

=> ll

[sdw3] total 397060

[sdw3] -rw-rw-r-- 1 gpadmin gpadmin 406579200 Apr 22 23:32 gp.4.3.3.1.tar

[sdw3] lrwxrwxrwx 1 gpadmin gpadmin 20 Apr 22 23:53 greenplum-db -> greenplum-db-4.3.3.1

[sdw3] drwxr-xr-x 11 gpadmin gpadmin 4096 Apr 22 23:00 greenplum-db-4.3.3.1

[sdw1] total 397056

[sdw1] -rw-rw-r-- 1 gpadmin gpadmin 406579200 Apr 22 23:32 gp.4.3.3.1.tar

[sdw1] lrwxrwxrwx 1 gpadmin gpadmin 20 Apr 22 23:53 greenplum-db -> greenplum-db-4.3.3.1

[sdw1] drwxr-xr-x 11 gpadmin gpadmin 4096 Apr 22 23:00 greenplum-db-4.3.3.1

[sdw2] total 397060

[sdw2] -rw-rw-r-- 1 gpadmin gpadmin 406579200 Apr 22 23:32 gp.4.3.3.1.tar

[sdw2] lrwxrwxrwx 1 gpadmin gpadmin 20 Apr 22 23:53 greenplum-db -> greenplum-db-4.3.3.1

[sdw2] drwxr-xr-x 11 gpadmin gpadmin 4096 Apr 22 23:00 greenplum-db-4.3.3.1

[ mdw] total 397056

[ mdw] -rw-rw-r-- 1 gpadmin gpadmin 406579200 Apr 22 23:31 gp.4.3.3.1.tar

[ mdw] lrwxrwxrwx 1 gpadmin gpadmin 22 Apr 22 23:00 greenplum-db
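
The package, extract, and link sequence can be rehearsed locally before running it across the cluster. The sketch below repeats the same tar and ln steps inside a scratch directory that stands in for /opt/greenplum; the placeholder file and the src/dest layout are assumptions made only so the example is self-contained.

```shell
#!/usr/bin/env bash
# Sketch: rehearse the tar / extract / symlink steps in a scratch directory.
# WORK stands in for /opt/greenplum; src and dest mimic the master and a segment host.
WORK="$(mktemp -d)"
mkdir -p "$WORK/src/greenplum-db-4.3.3.1/bin" "$WORK/dest"
echo "placeholder" > "$WORK/src/greenplum-db-4.3.3.1/bin/README"

# Package the versioned directory (same tar flags as in the guide)
tar -cf "$WORK/gp.4.3.3.1.tar" -C "$WORK/src" greenplum-db-4.3.3.1

# Extract on the receiving side
tar -xf "$WORK/gp.4.3.3.1.tar" -C "$WORK/dest"

# Point a version-independent name at the versioned directory.
# -f replaces an existing link and -n avoids descending into it, so reruns are safe.
ln -sfn greenplum-db-4.3.3.1 "$WORK/dest/greenplum-db"

ls -l "$WORK/dest"
```

The version-independent link is what allows .bash_profile to source /opt/greenplum/greenplum-db/greenplum_path.sh without naming the release.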

Next, create the database data directories

Master directory:

=> mkdir -p /home/gpadmin/gpdata/gpmaster

Primary segment directories:

=> mkdir -p /home/gpadmin/gpdata/gpdatap1

=> mkdir -p /home/gpadmin/gpdata/gpdatap2

Mirror segment directories:

=> mkdir -p /home/gpadmin/gpdata/gpdatam1

=> mkdir -p /home/gpadmin/gpdata/gpdatam2
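
The five mkdir commands follow one pattern, so they can also be written as a loop. This is a sketch only: GPDATA is a hypothetical variable that defaults to a scratch location, and on the cluster it would be /home/gpadmin/gpdata.

```shell
#!/usr/bin/env bash
# Sketch: create the master, primary, and mirror directories in one loop.
# GPDATA is a hypothetical variable; it defaults to a scratch location.
GPDATA="${GPDATA:-$(mktemp -d)/gpdata}"

for dir in gpmaster gpdatap1 gpdatap2 gpdatam1 gpdatam2; do
    mkdir -p "$GPDATA/$dir"
done

ls -1 "$GPDATA"

# On the cluster, gpssh can also run one command non-interactively, for example:
#   gpssh -f hostlist -e 'mkdir -p /home/gpadmin/gpdata/gpdatap1'
```

As in the interactive session above, this creates every directory on every host, even though only the master directory is used on the master and only the segment directories are used on the segment hosts.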

Load the environment on the other nodes:

[root@dw-greenplum-2 greenplum]# su - gpadmin

[gpadmin@dw-greenplum-2 ~]$ source .bash_profile

Prepare the Greenplum initialization configuration file

[gpadmin@dw-greenplum-1 conf]$ cd $GPHOME/docs/cli_help/gpconfigs

[gpadmin@dw-greenplum-1 gpconfigs]$ cp gpinitsystem_config /home/gpadmin/conf/

[gpadmin@dw-greenplum-1 gpconfigs]$ cd /home/gpadmin/conf/

[gpadmin@dw-greenplum-1 conf]$ chmod u+w gpinitsystem_config

Then edit gpinitsystem_config and set the following parameters:

ARRAY_NAME="Greenplum"

SEG_PREFIX=gpseg

PORT_BASE=33000

declare -a DATA_DIRECTORY=(/home/gpadmin/gpdata/gpdatap1 /home/gpadmin/gpdata/gpdatap2)

DATABASE_NAME=trjDB

MASTER_HOSTNAME=mdw

MASTER_DIRECTORY=/home/gpadmin/gpdata/gpmaster

MASTER_PORT=5432

TRUSTED_SHELL=/usr/bin/ssh

MIRROR_PORT_BASE=43000

REPLICATION_PORT_BASE=34000

MIRROR_REPLICATION_PORT_BASE=44000

declare -a MIRROR_DATA_DIRECTORY=(/home/gpadmin/gpdata/gpdatam1 /home/gpadmin/gpdata/gpdatam2)

MACHINE_LIST_FILE=/home/gpadmin/conf/seg_hosts
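
gpinitsystem creates one primary segment per DATA_DIRECTORY entry on each host in the machine list, and the mirror directory list has to contain the same number of entries. Because the configuration file uses bash syntax, a short script can source it and check the counts before initialization. The sketch below works on a scratch copy holding only the relevant lines; SEG_HOST_COUNT is a hypothetical variable set by hand to match seg_hosts.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check directory counts in a gpinitsystem-style config.
# CFG is a scratch copy with only the relevant lines from the config above.
CFG="$(mktemp)"
cat > "$CFG" <<'EOF'
PORT_BASE=33000
declare -a DATA_DIRECTORY=(/home/gpadmin/gpdata/gpdatap1 /home/gpadmin/gpdata/gpdatap2)
MIRROR_PORT_BASE=43000
declare -a MIRROR_DATA_DIRECTORY=(/home/gpadmin/gpdata/gpdatam1 /home/gpadmin/gpdata/gpdatam2)
EOF

# The file is plain bash, so it can be sourced directly
. "$CFG"

SEG_HOST_COUNT=3   # hypothetical variable: sdw1, sdw2, sdw3 in seg_hosts
primaries_per_host=${#DATA_DIRECTORY[@]}
mirrors_per_host=${#MIRROR_DATA_DIRECTORY[@]}

if [ "$primaries_per_host" -ne "$mirrors_per_host" ]; then
    echo "primary and mirror directory counts differ" >&2
fi

total_primaries=$((SEG_HOST_COUNT * primaries_per_host))
echo "primaries per host: $primaries_per_host, total primaries: $total_primaries"
```

With the values from this guide the script prints "primaries per host: 2, total primaries: 6", with an equal number of mirrors.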

Initialize the database

Use the gpinitsystem script to initialize the database:

[gpadmin@dw-greenplum-1 conf]$ gpinitsystem -c gpinitsystem_config -h seg_hosts -s sdw3

When gpinitsystem reports success, the initialization is complete. Try logging in to postgres, Greenplum's default database:

[gpadmin@dw-greenplum-1 conf]$ psql -d postgres

psql (8.2.15)

Type "help" for help.

postgres=# \l

List of databases

Name | Owner | Encoding | Access privileges

-----------+---------+----------+---------------------

postgres | gpadmin | UTF8 |

template0 | gpadmin | UTF8 | =c/gpadmin

: gpadmin=CTc/gpadmin

template1 | gpadmin | UTF8 | =c/gpadmin

: gpadmin=CTc/gpadmin

trjDB | gpadmin | UTF8 |

(4 rows)

postgres=#

3. Failure Handling

3.1 Activate the standby

[gpadmin@dw-greenplum-4 conf]$ gpactivatestandby

3.2 Recover all failed segments

[gpadmin@dw-greenplum-4 gpseg-1]$ gprecoverseg

3.3 Return all segments to their original roles

[gpadmin@dw-greenplum-4 gpseg-1]$ gprecoverseg -r

To create a new standby when one already exists, remove the existing standby first:

gpinitstandby -r

3.4 Turn the former master into the standby

[gpadmin@dw-greenplum-1 gpmaster]$ mv gpseg-1 gpseg-1.bak

On the new master, run the following command:

[gpadmin@dw-greenplum-4 ~]$ gpinitstandby -F pg_system:/home/gpadmin/gpdata/gpmaster/gpseg-1 -s mdw

Start the standby

[gpadmin@dw-greenplum-4 ~]$ gpinitstandby -n

20160424:06:17:19:003594 gpinitstandby:dw-greenplum-4:gpadmin-[INFO]:-Standy master is already up and running.

Check the cluster status

select a.dbid,a.content,a.role,a.port,a.hostname,b.fsname,c.fselocation from gp_segment_configuration a,pg_filespace b,pg_filespace_entry c where a.dbid=c.fsedbid and b.oid=c.fsefsoid order by content;

select * from gp_segment_configuration where content='-1';

To check standby replication lag, query the pg_stat_replication view:

select pg_switch_xlog();

select * from pg_stat_replication ;
