Greenplum Distributed Cluster (Data Warehouse) in Practice
1. Preparing the Environment
1.1 Cluster Overview
OS: CentOS 6.5
Database version: greenplum-db-4.3.3.1-build-1-RHEL5-x86_64.zip
The four machines in the Greenplum cluster have the following IPs:
[root@dw-greenplum-1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.96.101 dw-greenplum-1 mdw
192.168.96.102 dw-greenplum-2 sdw1
192.168.96.103 dw-greenplum-3 sdw2
192.168.96.104 dw-greenplum-4 sdw3
Add the entries above to /etc/hosts on every machine.
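As a quick, optional sanity check (not part of the original steps), you can confirm that the host names resolve from the master, for example:
[root@dw-greenplum-1 ~]# ping -c 1 sdw1
[root@dw-greenplum-1 ~]# ping -c 1 sdw2
[root@dw-greenplum-1 ~]# ping -c 1 sdw3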
Architecture (diagram not reproduced here): mdw serves as the master, sdw3 as the standby master, and sdw1-sdw3 as the segment hosts.
1.2 Create the User and Group (on every machine)
[root@dw-greenplum-1 ~]# groupadd -g 530 gpadmin
[root@dw-greenplum-1 ~]# useradd -g 530 -u530 -m -d /home/gpadmin -s /bin/bash gpadmin
[root@dw-greenplum-1 ~]# passwd gpadmin
Changing password for user gpadmin.
New password:
BAD PASSWORD: it is too simplistic/systematic
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.
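To confirm the account was created with the intended UID and GID, you can check it on each machine (an optional verification):
[root@dw-greenplum-1 ~]# id gpadmin
uid=530(gpadmin) gid=530(gpadmin) groups=530(gpadmin)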
1.3 Tune Kernel Parameters (on every machine)
Note: these parameters must be set, otherwise the installation will fail.
[root@dw-greenplum-1 ~]# vi /etc/sysctl.conf
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1025 65535
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.overcommit_memory = 2
Adjust these values according to your own server's resources.
Apply the settings:
[root@dw-greenplum-1 ~]# sysctl -p
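To double-check that the new values are active, you can query a few of them directly (optional; the keys listed here are just examples):
[root@dw-greenplum-1 ~]# sysctl kernel.shmmax kernel.sem vm.overcommit_memory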
1.4 Raise the Open File and Process Limits (on every machine)
Note: these limits must be raised, otherwise errors will occur later.
[root@dw-greenplum-1 ~]# vi /etc/security/limits.conf
# End of file
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
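limits.conf only applies to new login sessions, so a simple way to verify it (an optional check) is to open a fresh session as gpadmin and print the open-file limit; it should report 65536 if the setting took effect:
[root@dw-greenplum-1 ~]# su - gpadmin -c 'ulimit -n'
65536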
2. Installing Greenplum
Create the installation directory for the Greenplum software and give the gpadmin user ownership of it (on every machine):
[root@dw-greenplum-1 ~]# mkdir /opt/greenplum
[root@dw-greenplum-1 ~]# chown -R gpadmin:gpadmin /opt/greenplum
First, prepare the installation file (on the master, 192.168.96.101):
greenplum-db-4.3.3.1-build-1-RHEL5-x86_64.zip
Unzip the installation file:
[root@dw-greenplum-1 ~]# unzip greenplum-db-4.3.3.1-build-1-RHEL5-x86_64.zip
Run the installer:
[root@dw-greenplum-1 ~]# chmod +x greenplum-db-4.3.3.1-build-1-RHEL5-x86_64.bin
[root@dw-greenplum-1 ~]# ./greenplum-db-4.3.3.1-build-1-RHEL5-x86_64.bin
The installer shows the Greenplum license information, asks whether you accept the license, and prompts for the installation directory. After these steps it copies the files and finally reports that the installation succeeded.
Configure the environment variables (on both the master and the standby master):
[root@dw-greenplum-1 ~]# su - gpadmin
[gpadmin@dw-greenplum-1 ~]$ vi .bash_profile
source /opt/greenplum/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/home/gpadmin/gpdata/gpmaster/gpseg-1
export PGPORT=5432
export PGDATABASE=trjDB
Apply the configuration:
[gpadmin@dw-greenplum-1 ~]$ source .bash_profile
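To make sure the Greenplum environment is actually loaded, you can check a couple of values set by greenplum_path.sh (an optional check; the exact paths depend on your installation directory). Both should point under /opt/greenplum:
[gpadmin@dw-greenplum-1 ~]$ echo $GPHOME
[gpadmin@dw-greenplum-1 ~]$ which gpssh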
Configure the hostlist
This file records every host name in the cluster (including the master); the seg_hosts file created right after it lists only the segment hosts:
[gpadmin@dw-greenplum-1 ~]$ mkdir conf
[gpadmin@dw-greenplum-1 ~]$ cd conf/
[gpadmin@dw-greenplum-1 conf]$ vi hostlist
mdw
sdw1
sdw2
sdw3
[gpadmin@dw-greenplum-1 conf]$ vi seg_hosts
sdw1
sdw2
sdw3
[gpadmin@dw-greenplum-1 conf]$ gpssh-exkeys -f hostlist
[STEP 1 of 5] create local ID and authorize on local host
[STEP 2 of 5] keyscan all hosts and update known_hosts file
[STEP 3 of 5] authorize current user on remote hosts
... send to sdw1
***
*** Enter password for sdw1:
... send to sdw2
... send to sdw3
[STEP 4 of 5] determine common authentication file content
[STEP 5 of 5] copy authentication files to all remote hosts
... finished key exchange with sdw1
... finished key exchange with sdw2
... finished key exchange with sdw3
[INFO] completed successfully
Once passwordless SSH has been set up between all machines, we can use gpssh to run commands on all hosts in one go:
[gpadmin@dw-greenplum-1 conf]$ gpssh -f hostlist
Note: command history unsupported on this machine ...
=> pwd
[sdw3] /home/gpadmin
[sdw1] /home/gpadmin
[sdw2] /home/gpadmin
[ mdw] /home/gpadmin
=>
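After leaving the interactive session (exit or Ctrl-D), gpssh can also run a one-off command across all hosts, which is handy for quick checks; the -e flag simply echoes each command as it runs:
[gpadmin@dw-greenplum-1 conf]$ gpssh -f hostlist -e 'uname -r'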
Distribute the software to every machine
Next, package the installed files:
[gpadmin@dw-greenplum-1 conf]$ cd /opt/greenplum/
[gpadmin@dw-greenplum-1 greenplum]$ tar -cf gp.4.3.3.1.tar greenplum-db-4.3.3.1/
Then use gpscp to copy the archive to every machine:
[gpadmin@dw-greenplum-1 greenplum]$ gpscp -f /home/gpadmin/conf/hostlist gp.4.3.3.1.tar =:/opt/greenplum/
Use gpssh to extract the archive on all hosts:
[gpadmin@dw-greenplum-1 greenplum]$ cd /home/gpadmin/conf/
[gpadmin@dw-greenplum-1 conf]$ gpssh -f hostlist
=> cd /opt/greenplum
[sdw3]
[sdw1]
[sdw2]
[ mdw]
=> tar -xf gp.4.3.3.1.tar
[sdw3]
[sdw1]
[sdw2]
[ mdw]
Create the symbolic link:
=> ln -s greenplum-db-4.3.3.1 greenplum-db
[sdw3]
[sdw1]
[sdw2]
[ mdw]
=> ll
[sdw3] total 397060
[sdw3] -rw-rw-r-- 1 gpadmin gpadmin 406579200 Apr 22 23:32 gp.4.3.3.1.tar
[sdw3] lrwxrwxrwx 1 gpadmin gpadmin 20 Apr 22 23:53 greenplum-db -> greenplum-db-4.3.3.1
[sdw3] drwxr-xr-x 11 gpadmin gpadmin 4096 Apr 22 23:00 greenplum-db-4.3.3.1
[sdw1] total 397056
[sdw1] -rw-rw-r-- 1 gpadmin gpadmin 406579200 Apr 22 23:32 gp.4.3.3.1.tar
[sdw1] lrwxrwxrwx 1 gpadmin gpadmin 20 Apr 22 23:53 greenplum-db -> greenplum-db-4.3.3.1
[sdw1] drwxr-xr-x 11 gpadmin gpadmin 4096 Apr 22 23:00 greenplum-db-4.3.3.1
[sdw2] total 397060
[sdw2] -rw-rw-r-- 1 gpadmin gpadmin 406579200 Apr 22 23:32 gp.4.3.3.1.tar
[sdw2] lrwxrwxrwx 1 gpadmin gpadmin 20 Apr 22 23:53 greenplum-db -> greenplum-db-4.3.3.1
[sdw2] drwxr-xr-x 11 gpadmin gpadmin 4096 Apr 22 23:00 greenplum-db-4.3.3.1
[ mdw] total 397056
[ mdw] -rw-rw-r-- 1 gpadmin gpadmin 406579200 Apr 22 23:31 gp.4.3.3.1.tar
[ mdw] lrwxrwxrwx 1 gpadmin gpadmin 22 Apr 22 23:00 greenplum-db
Next, create the database data directories.
Master directory:
=> mkdir -p /home/gpadmin/gpdata/gpmaster
Primary segment directories:
=> mkdir -p /home/gpadmin/gpdata/gpdatap1
=> mkdir -p /home/gpadmin/gpdata/gpdatap2
Mirror segment directories:
=> mkdir -p /home/gpadmin/gpdata/gpdatam1
=> mkdir -p /home/gpadmin/gpdata/gpdatam2
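Because these mkdir commands are issued inside the gpssh session, the directories are created on every host at once. You can confirm this from the same session before moving on (an optional check):
=> ls -d /home/gpadmin/gpdata/*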
Make the environment variables take effect on the other nodes as well:
[root@dw-greenplum-2 greenplum]# su - gpadmin
[gpadmin@dw-greenplum-2 ~]$ source .bash_profile
Initialize the Greenplum configuration file
[gpadmin@dw-greenplum-1 conf]$ cd $GPHOME/docs/cli_help/gpconfigs
[gpadmin@dw-greenplum-1 gpconfigs]$ cp gpinitsystem_config /home/gpadmin/conf/
[gpadmin@dw-greenplum-1 gpconfigs]$ cd /home/gpadmin/conf/
[gpadmin@dw-greenplum-1 conf]$ chmod u+w gpinitsystem_config
Edit gpinitsystem_config so that it contains the following settings:
ARRAY_NAME="Greenplum"
SEG_PREFIX=gpseg
PORT_BASE=33000
declare -a DATA_DIRECTORY=(/home/gpadmin/gpdata/gpdatap1 /home/gpadmin/gpdata/gpdatap2)
DATABASE_NAME=trjDB
MASTER_HOSTNAME=mdw
MASTER_DIRECTORY=/home/gpadmin/gpdata/gpmaster
MASTER_PORT=5432
TRUSTED_SHELL=/usr/bin/ssh
MIRROR_PORT_BASE=43000
REPLICATION_PORT_BASE=34000
MIRROR_REPLICATION_PORT_BASE=44000
declare -a MIRROR_DATA_DIRECTORY=(/home/gpadmin/gpdata/gpdatam1 /home/gpadmin/gpdata/gpdatam2)
MACHINE_LIST_FILE=/home/gpadmin/conf/seg_hosts
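With two entries in DATA_DIRECTORY, each segment host will run two primary segments (and, via MIRROR_DATA_DIRECTORY, two mirrors); segment ports are assigned sequentially starting at PORT_BASE, and mirror ports at MIRROR_PORT_BASE. Before initializing, it is worth confirming that the data directories exist on every segment host (an optional check):
[gpadmin@dw-greenplum-1 conf]$ gpssh -f seg_hosts -e 'ls -d /home/gpadmin/gpdata/gpdatap1 /home/gpadmin/gpdata/gpdatap2 /home/gpadmin/gpdata/gpdatam1 /home/gpadmin/gpdata/gpdatam2'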
Initialize the database
Use the gpinitsystem script to initialize the database:
[gpadmin@dw-greenplum-1 conf]$ gpinitsystem -c gpinitsystem_config -h seg_hosts -s sdw3
If gpinitsystem finishes without errors, the initialization succeeded. Now try logging into Greenplum's default postgres database:
[gpadmin@dw-greenplum-1 conf]$ psql -d postgres
psql (8.2.15)
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Access privileges
-----------+---------+----------+---------------------
postgres | gpadmin | UTF8 |
template0 | gpadmin | UTF8 | =c/gpadmin
: gpadmin=CTc/gpadmin
template1 | gpadmin | UTF8 | =c/gpadmin
: gpadmin=CTc/gpadmin
trjDB | gpadmin | UTF8 |
(4 rows)
postgres=#
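Besides psql, the gpstate utility gives an overview of the whole cluster (gpstate alone prints a short summary, -s a detailed per-segment status; the exact output depends on your layout):
[gpadmin@dw-greenplum-1 conf]$ gpstate -s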
3. Failure Handling
3.1 Activate the standby master
[gpadmin@dw-greenplum-4 conf]$ gpactivatestandby
3.2 Recover all failed segments
[gpadmin@dw-greenplum-4 gpseg-1]$ gprecoverseg
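While a recovery is running, its progress and the mirror status can be monitored with gpstate (standard options of the utility):
[gpadmin@dw-greenplum-4 gpseg-1]$ gpstate -e
[gpadmin@dw-greenplum-4 gpseg-1]$ gpstate -m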
3.3 Restore all segments to their preferred roles
[gpadmin@dw-greenplum-4 gpseg-1]$ gprecoverseg -r
If you want to create a new standby but one already exists, remove the old one first:
gpinitstandby -r
3.4 Turn the original master into a standby
[gpadmin@dw-greenplum-1 gpmaster]$ mv gpseg-1 gpseg-1.bak
On the new master, run the following:
[gpadmin@dw-greenplum-4 ~]$ gpinitstandby -F pg_system:/home/gpadmin/gpdata/gpmaster/gpseg-1 -s mdw
Start the standby:
[gpadmin@dw-greenplum-4 ~]$ gpinitstandby -n
20160424:06:17:19:003594 gpinitstandby:dw-greenplum-4:gpadmin-[INFO]:-Standy master is already up and running.
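The standby configuration can also be inspected with gpstate; the -f option displays the standby master details:
[gpadmin@dw-greenplum-4 ~]$ gpstate -f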
Check the cluster status:
select a.dbid,a.content,a.role,a.port,a.hostname,b.fsname,c.fselocation from gp_segment_configuration a,pg_filespace b,pg_filespace_entry c where a.dbid=c.fsedbid and b.oid=c.fsefsoid order by content;
select * from gp_segment_configuration where content='-1';
To check the standby's replication lag, query the pg_stat_replication view:
select pg_switch_xlog();
select * from pg_stat_replication ;