千家信息网

Session重叠问题学习(七)--小花狸合并算法和最后一次优化

发表于:2025-02-07 作者:千家信息网编辑
千家信息网最后更新 2025年02月07日,接前文Session重叠问题学习(二),这是问题和需求的描述,执行时间90秒http://blog.itpub.net/29254281/viewspace-2150229/Session重叠问题学习
千家信息网最后更新 2025年02月07日Session重叠问题学习(七)--小花狸合并算法和最后一次优化接前文
Session重叠问题学习(二),这是问题和需求的描述,执行时间90秒
http://blog.itpub.net/29254281/viewspace-2150229/

Session重叠问题学习(三)--优化,一次优化后,执行时间25秒
http://blog.itpub.net/29254281/viewspace-2150259/

Session重叠问题学习(四)--再优化,二次优化后,执行时间10秒
http://blog.itpub.net/29254281/viewspace-2150297/

Session重叠问题学习(五)--最优化,三次优化后,执行时间1.6秒
http://blog.itpub.net/29254281/viewspace-2150339/

Session重叠问题学习(六)--极致优化,四次优化后,执行时间1250-1300毫秒
http://blog.itpub.net/29254281/viewspace-2150364/


这是对这个问题的算法总结和最后一次优化.
经过这次优化,在我的电脑上(SSD硬盘,机械硬盘还是没有这么快),运行时间是980毫秒左右.真正意义上的秒出.并且我确实觉得是优无可优了。

之所以能从10秒的版本,跳跃优化到1.6s,1.3s的版本.是因为采用了小花狸Session合并算法。

假如用户间首尾时间段没有重复的情况下,满足如下的规律


但是该规律仅仅存在于用户首尾时间段不重合的情况.
比如A用户上线时间是 10点至11点整,而用户B上线时间是11点整到12点.
因为11点整这个时刻,用户A和B重合了,所以这个算法就不能生效了.

所以当时取巧,如果重合了,就增加或者减去一个很小的时间
s+interval startnum/1000000 second s,
e-interval endnum/1000000 second e

  1. insert into t2 (roomid,s,e)
  2. select roomid,
  3. s+interval startnum/1000000 second s,
  4. e-interval endnum/1000000 second e
  5. from (
  6. select
  7. roomid,
  8. s,e,
  9. startnum,
  10. when @eflag=eflag then @rn:=@rn+1 when @eflag:=eflag then @rn else @rn end endnum
  11. from (
  12. select * from (
  13. select when @sflag=sflag then @rn:=@rn+1 when @sflag:=sflag then @rn else @rn end startnum,roomid,s,e,sflag,eflag from
  14. (
  15. select * from
  16. (
  17. select t1.*,concat('[',roomid,'],',s) sflag,concat('[',roomid,'],',e) eflag from t1 order by roomid ,sflag
  18. )a,(select @sflag:='',@rn:=0,@eflag:='') vars
  19. ) b
  20. ) bb order by roomid,eflag
  21. ) c
  22. ) d ;

但是这样引入一个问题,就是有误差.误差只能缩小却不能消除.
这也是为什么1.3s和1.6s版本使用timestamp(6)的原因,就是为了缩小误差.

但是经过反复的思考,之前发现的规律其实是一个特例.

小花狸Session合并的通用情况其实是下图所示


先给每个点标号.
每个点,如果是开始点则+1,结束点则-1.
以图中左侧第二点为例,该点是四个点的重合,其中有一个结束点(-1)三个开始点(+3),所以该点标号是2.

然后从左到右计算,
最左点是0加上该点标号,作为右侧点的最大在线人数.
其他点的最大在线人数 都是左侧点的最大在线人数加上标号.

最后查询最大在线人数大于等于2的时间段,汇总即可。

改进的过程如下:

  1. DELIMITER $$
  2. CREATE DEFINER=`root`@`localhost` PROCEDURE `p`()
  3. BEGIN
  4. drop table if exists t1;
  5. drop table if exists t2;
  6. drop table if exists tmp_time_point;
  7. drop table if exists tmp_min_range;
  8. drop table if exists tmp_s;
  9. CREATE temporary TABLE `t1` (
  10. `roomid` int(11) NOT NULL DEFAULT '0',
  11. `userid` bigint(20) NOT NULL DEFAULT '0',
  12. `s` timestamp,
  13. `e` timestamp,
  14. primary key(roomid,userid,s,e)
  15. ) ENGINE=memory;
  16. CREATE temporary TABLE `t2` (
  17. `roomid` int(11) NOT NULL DEFAULT '0',
  18. `timepoint` timestamp,
  19. c int,
  20. key(roomid,timepoint)
  21. ) ENGINE=memory;
  22. CREATE temporary TABLE `tmp_min_range` (
  23. `roomid` int(11) NOT NULL DEFAULT '0',
  24. `s` timestamp,
  25. `e` timestamp,
  26. primary key(roomid,s,e),
  27. key(roomid,e)
  28. ) ENGINE=memory;
  29. create temporary table tmp_time_point(
  30. roomid bigint,
  31. timepoint timestamp,
  32. type smallint,
  33. key(roomid,timepoint)
  34. ) engine=memory;
  35. create temporary table tmp_s(
  36. roomid bigint,
  37. userid bigint,
  38. s timestamp,
  39. e timestamp,
  40. i int
  41. ) engine=memory;
  42. SET @A=0;
  43. SET @B=0;
  44. insert into tmp_s
  45. SELECT x.roomid,x.userid,s,e,datediff(e,s)+1 i
  46. FROM
  47. (
  48. (
  49. SELECT @B:=@B+1 AS id,roomid,userid,s
  50. FROM (
  51. SELECT DISTINCT roomid, userid, roomstart AS s
  52. FROM u_room_log a
  53. WHERE NOT EXISTS (SELECT *
  54. FROM u_room_log b
  55. WHERE a.roomid = b.roomid
  56. AND a.userid = b.userid
  57. AND a.roomstart > b.roomstart
  58. AND a.roomstart <= b.roomend)
  59. ) AS p
  60. ) AS x,
  61. (
  62. SELECT @A:=@A+1 AS id,roomid,userid,e
  63. FROM
  64. (
  65. SELECT DISTINCT roomid, userid, roomend AS e
  66. FROM u_room_log a
  67. WHERE NOT EXISTS (SELECT *
  68. FROM u_room_log b
  69. WHERE a.roomid = b.roomid
  70. AND a.userid = b.userid
  71. AND a.roomend >= b.roomstart
  72. AND a.roomend < b.roomend)
  73. ) AS o
  74. ) AS y
  75. )
  76. WHERE x.id = y.id AND x.roomid = y.roomid AND x.userid = y.userid ;
  77. select max(i) into @c from tmp_s;
  78. insert ignore into t1(roomid,userid,s,e)
  79. select
  80. roomid, userid,
  81. if(date(s)!=date(e) and id>1,date(s+interval id-1 date(s+interval id-1 date(e) ,e,date_format(s+interval id-1 '%Y-%m-%d 23:59:59')) e
  82. from tmp_s t1 STRAIGHT_JOIN
  83. nums on(nums.id<=t1.i)
  84. where nums.id<=@c
  85. ;
  86. -- 开始点+1,结束点-1
  87. insert into tmp_time_point(roomid,timepoint,type) select roomid,s,1 from t1;
  88. insert into tmp_time_point(roomid,timepoint,type) select roomid,e,-1 from t1;
  89. -- 计算每个点的标号
  90. insert into t2(roomid,timepoint,c)
  91. select roomid,timepoint,from tmp_time_point group by roomid,timepoint;
  92. -- 计算最小范围
  93. insert ignore into tmp_min_range(roomid,s,e)
  94. select roomid,starttime starttime, endtime endtime from (
  95. select
  96. if(@roomid=roomid,@d,'') as starttime,@d:=str_to_date(timepoint,'%Y-%m-%d %H:%i:%s.%f'),@roomid:=roomid,p.roomid,str_to_date(timepoint,'%Y-%m-%d %H:%i:%s.%f') endtime
  97. from tmp_time_point p,(select @d:='',@roomid:=-1) vars
  98. order by roomid,timepoint
  99. ) v4 where starttime!='' and date(starttime)=date(endtime);
  100. select roomid,date(s) dt,round(second,date_format(s,'%Y-%m-%d %H:%i:%s'),date_format(e,'%Y-%m-%d %H:%i:%s')))/60) ts,max(num) c from
  101. (
  102. select a.roomid,num,a.s,a.e from (
  103. select when @roomid=roomid and date(@timepoint)=date(timepoint) then @num:=@num+prevC when @roomid:=roomid then @num:=0 end num,@timepoint:=timepoint ,a.* from (
  104. select when @roomid=roomid then @prevC when @roomid:=roomid then @prevC:=0 end prevC,@prevC:=c,b.* from (
  105. select * from t2 ,(select @roomid:=-1,@timepoint:='',@num:=0,@prevC:=-1) vars
  106. ) b order by roomid,timepoint
  107. ) a order by roomid,timepoint
  108. ) c
  109. inner join
  110. tmp_min_range a on( c.timepoint=a.e and c.roomid=a.roomid)
  111. where num>=2
  112. ) d group by roomid,date(s);
  113. END

耗时分析:
填充tmp_s,合并同一房间同一用户的重叠部分,耗时639毫秒
填充t1,拆分跨天的用户数据,耗时62毫秒
填充tmp_time_point,写入开始节点和结束节点的信息和权重,耗时忽略不计
填充t2,计算每个点的标号.耗时78毫秒
填充tmp_min_range,计算最小间隔范围,耗时125毫秒
小花狸合并算法并展示,耗时266毫秒.

总计时长983毫秒.

好了,这个问题不能再玩了..over,over
0