The listen queue length problem in PG databases
Published: 2024-11-24. Author: 千家信息网 editor
Last updated by 千家信息网: 2024-11-24
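Both MySQL and PostgreSQL (PG) communicate with clients by listening on an IP address and port, or on a Unix domain socket.
This raises a question: how long is the listen queue (the backlog)?
MySQL exposes this directly: the back_log option in my.cnf sets the length of the listen queue.
PG, by contrast, does not seem to offer a setting for the listen queue length.
I appear to have run into this limit while running a high-concurrency pgbench stress test.
Command: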
pgbench -n -r -c 250 -j 250 -T 2 -f update_smallrange.sql
Error message:
Connection to database "" failed:
could not connect to server: Resource temporarily unavailable
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
However, "Resource temporarily unavailable" does not say which resource is the problem.
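The message text is only the C library's description of an errno value, which is why it is so generic. As a small sketch (the helper name `errno_names_for_message` is made up for illustration), the following looks up which errno code produces a given message on the current platform. On Linux the text above typically maps to EAGAIN, which shares its numeric value with EWOULDBLOCK.

```python
import errno
import os


def errno_names_for_message(message):
    """Return the errno names whose strerror() text equals `message`."""
    names = []
    for code, name in sorted(errno.errorcode.items()):
        # os.strerror() returns the platform's description of an errno code.
        if os.strerror(code) == message:
            names.append(name)
    return names


if __name__ == "__main__":
    print(errno_names_for_message("Resource temporarily unavailable"))
```

Because `errno.errorcode` keeps a single name per numeric value, only one of the two aliases is listed when they share a number.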
After some investigation I found the following post:
http://www.postgresql.org/message-id/20130617141622.GH5875@alap2.anarazel.de
[code]
From: Andres Freund
To: pgsql-hackers(at)postgresql(dot)org
Subject: PQConnectPoll, connect(2), EWOULDBLOCK and somaxconn
Date: 2013-06-17 14:16:22
Message-ID: 20130617141622.GH5875@alap2.anarazel.de
Lists: pgsql-hackers

Hi,
When postgres on linux receives connection on a high rate client
connections sometimes error out with:
could not send data to server: Transport endpoint is not connected
could not send startup packet: Transport endpoint is not connected
To reproduce start something like on a server with sufficiently high
max_connections:
pgbench -h /tmp -p 5440 -T 10 -c 400 -j 400 -n -f /tmp/simplequery.sql
Now that's strange since that error should happen at connect(2) time,
not when sending the startup packet. Some investigation led me to
fe-secure.c's PQConnectPoll:
    if (connect(conn->sock, addr_cur->ai_addr,
                addr_cur->ai_addrlen) < 0)
    {
        if (SOCK_ERRNO == EINPROGRESS ||
            SOCK_ERRNO == EWOULDBLOCK ||
            SOCK_ERRNO == EINTR ||
            SOCK_ERRNO == 0)
        {
            /*
             * This is fine - we're in non-blocking mode, and
             * the connection is in progress. Tell caller to
             * wait for write-ready on socket.
             */
            conn->status = CONNECTION_STARTED;
            return PGRES_POLLING_WRITING;
        }
        /* otherwise, trouble */
    }
So, we're accepting EWOULDBLOCK as a valid return value for
connect(2). Which it isn't. EAGAIN in contrast is on some BSDs and on
linux. Unfortunately POSIX allows those two to share the same value...
My manpage tells me:
EAGAIN No more free local ports or insufficient entries in the routing cache. For
AF_INET see the description of
/proc/sys/net/ipv4/ip_local_port_range ip(7)
for information on how to increase the number of local
ports.
So, the problem is that we took a failed connection as having been
initially successfull but in progress.
Not accepting EWOULDBLOCK in the above if() results in:
could not connect to server: Resource temporarily unavailable
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5440"?
which makes more sense.
Trivial patch attached.
Now, the question is why we cannot complete connections on unix sockets?
Some code reading reading shows net/unix/af_unix.c:unix_stream_connect()
shows:
    if (unix_recvq_full(other)) {
        err = -EAGAIN;
        if (!timeo)
            goto out_unlock;
So, if we're in nonblocking mode - which we are - and the receive queue
is full we return EAGAIN. The receive queue for unix sockets is defined
as
    static inline int unix_recvq_full(struct sock const *sk)
    {
        return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog;
    }
Where sk_max_ack_backlog is whatever has been passed to the
listen(backlog) on the listening side.
Question: But postgres does listen(fd, MaxBackends * 2), how can that be
a problem?
Answer:
If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn,
then it is silently truncated to that value; the default value in this file is
128. In kernels before 2.4.25, this limit was a hard coded value, SOMAXCONN, with
the value 128.
Setting somaxconn to something higher indeed makes the problem go away.
I'd guess that pretty much the same holds true for tcp connections,
although I didn't verify that which would explain some previous reports
on the lists.
TLDR: Increase /proc/sys/net/core/somaxconn
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[/code]
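The client-side half of the email can be summarized in a small sketch. This is not libpq's code: the function name `classify_connect_errno` and its return strings are invented here. It only mirrors the idea described in the email, namely that for a non-blocking connect(2), EINPROGRESS and EINTR mean the attempt is still under way, while EAGAIN (the same number as EWOULDBLOCK on Linux) means the attempt has failed and should be reported right away.

```python
import errno

# errno values that mean "connection attempt still in progress" for a
# non-blocking connect(2). EWOULDBLOCK/EAGAIN is deliberately NOT in this set.
IN_PROGRESS = {errno.EINPROGRESS, errno.EINTR}


def classify_connect_errno(err):
    """Classify the errno of a non-blocking connect(2) that returned -1."""
    if err in IN_PROGRESS:
        return "in progress"
    return "failed"


if __name__ == "__main__":
    for code in (errno.EINPROGRESS, errno.EINTR, errno.EAGAIN):
        print(errno.errorcode[code], "->", classify_connect_errno(code))
```

With this classification a full listen queue surfaces as a connect error instead of a later "Transport endpoint is not connected" when the startup packet is sent.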
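So the cause was that the listen backlog of the PG server, which is capped by the kernel parameter somaxconn, was too small. The default value of somaxconn was 128 here (Linux 5.4 and later default to 4096). After raising it and restarting PG, the test ran fine.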
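The effect can be reproduced without PostgreSQL. The following is a minimal sketch, assuming Linux and Python's standard `socket` module. The function name `fill_accept_queue`, the socket file name, and the tiny backlog of 1 are choices made for this illustration. A Unix domain socket listens but never calls accept(), and non-blocking clients connect until the kernel refuses one.

```python
import os
import socket
import tempfile


def fill_accept_queue(backlog=1, max_attempts=50):
    """Connect repeatedly to a Unix socket whose owner never calls accept().

    Returns (successful_connects, errno_of_first_failure); the errno is None
    if every attempt succeeded.
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.sock")
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    clients = []
    try:
        server.bind(path)
        server.listen(backlog)  # small backlog; accept() is never called
        for _ in range(max_attempts):
            client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            clients.append(client)
            client.setblocking(False)  # non-blocking, as in the email
            try:
                client.connect(path)
            except OSError as exc:
                # The failed client is already in the list, so subtract it.
                return len(clients) - 1, exc.errno
        return len(clients), None
    finally:
        for client in clients:
            client.close()
        server.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    ok, err = fill_accept_queue()
    print("successful connects:", ok)
    print("first failure:", None if err is None else os.strerror(err))
```

On Linux the first failure is expected to be EAGAIN, the same "Resource temporarily unavailable" that pgbench reported. Other platforms may report a different error once the queue is full.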
/proc/sys/net/core/somaxconn
This file defines a ceiling value for the backlog argument of listen(2); see the listen(2) manual page
for details.
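The ceiling described above amounts to a simple minimum. As a rough sketch, assuming a Linux system that exposes `/proc/sys/net/core/somaxconn` (the helper names and the `max_backends` value of 300 are hypothetical), the following reads the current limit and shows what a request of `MaxBackends * 2` would be cut down to.

```python
from pathlib import Path

SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")


def read_somaxconn(path=SOMAXCONN_PATH):
    """Return the kernel's somaxconn ceiling, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def effective_backlog(requested, somaxconn):
    """Backlog the kernel actually uses: listen() silently caps it at somaxconn."""
    return min(requested, somaxconn)


if __name__ == "__main__":
    limit = read_somaxconn()
    max_backends = 300  # hypothetical value, for illustration only
    if limit is None:
        print("somaxconn is not readable on this system")
    else:
        print("somaxconn:", limit)
        print("effective backlog:", effective_backlog(max_backends * 2, limit))
```

With a limit of 128, a request for 600 ends up as 128, which is why a burst of 250 simultaneous connection attempts can overflow the queue.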
At this point the solution is clear. As root, raise the limit:
echo 256 > /proc/sys/net/core/somaxconn
Then restart PG and rerun the test, and it works.