千家信息网

Session重叠问题学习(六)--极致优化

发表于:2025-02-01 作者:千家信息网编辑
千家信息网最后更新 2025年02月01日,接前文Session重叠问题学习(二),这是问题和需求的描述,执行时间90秒http://blog.itpub.net/29254281/viewspace-2150229/Session重叠问题学习
千家信息网最后更新 2025年02月01日Session重叠问题学习(六)--极致优化接前文
Session重叠问题学习(二),这是问题和需求的描述,执行时间90秒
http://blog.itpub.net/29254281/viewspace-2150229/

Session重叠问题学习(三)--优化,一次优化后,执行时间25秒
http://blog.itpub.net/29254281/viewspace-2150259/

Session重叠问题学习(四)--再优化,二次优化后,执行时间10秒
http://blog.itpub.net/29254281/viewspace-2150297/

Session重叠问题学习(五)--最优化,三次优化后,执行时间1.6秒
http://blog.itpub.net/29254281/viewspace-2150339/

周五晚上终于把这个算法初步实现了.
连续加班忙碌了一个星期,终于有点曙光了.
从这个问题的缘起,到目前应该已经优化了快100倍了
但是周末的时候,想想还是不对.
小花狸Session合并算法(对,以后这个算法就叫这个名称了)实现的合并速度应该是非常快的.代价仅仅是扫描一遍记录.
这1.6秒到底用在哪里了?

后来经过反复调试.发现还有两块可以优化改进的地方.
改进后的过程如下:

  1. drop procedure p;
  2. DELIMITER $$
  3. CREATE DEFINER=`root`@`localhost` PROCEDURE `p`()
  4. BEGIN
  5. declare done int default 0;
  6. declare v_roomid bigint;
  7. declare v_time timestamp(6);
  8. declare v_cur_type smallint;
  9. declare v_before_roomid bigint default -1;
  10. declare v_before_type smallint default -1;
  11. declare v_before_time timestamp(6) ;
  12. declare v_num bigint default 0;
  13. declare cur_test CURSOR for select roomid,type,timepoint from tmp_time_point order by roomid,timepoint,type ;
  14. DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
  15. drop table if exists t1;
  16. drop table if exists t2;
  17. drop table if exists tmp_time_point;
  18. drop table if exists tmp_result;
  19. drop table if exists tmp_min_range;
  20. drop table if exists tmp_s;
  21. CREATE temporary TABLE `t1` (
  22. `roomid` int(11) NOT NULL DEFAULT '0',
  23. `userid` bigint(20) NOT NULL DEFAULT '0',
  24. `s` timestamp(6),
  25. `e` timestamp(6),
  26. primary key(roomid,userid,s,e)
  27. ) ENGINE=memory;
  28. CREATE temporary TABLE `t2` (
  29. `roomid` int(11) NOT NULL DEFAULT '0',
  30. `s` timestamp(6),
  31. `e` timestamp(6)
  32. ) ENGINE=memory;
  33. CREATE temporary TABLE `tmp_min_range` (
  34. `roomid` int(11) NOT NULL DEFAULT '0',
  35. `s` timestamp(6),
  36. `e` timestamp(6),
  37. primary key(roomid,s,e),
  38. key(roomid,e)
  39. ) ENGINE=memory;
  40. create temporary table tmp_time_point(
  41. roomid bigint,
  42. timepoint timestamp(6),
  43. type smallint,
  44. key(roomid,timepoint)
  45. ) engine=memory;
  46. create temporary table tmp_result(
  47. roomid bigint,
  48. timepoint timestamp(6),
  49. c int
  50. ) engine=memory;
  51. create temporary table tmp_s(
  52. roomid bigint,
  53. userid bigint,
  54. s timestamp,
  55. e timestamp,
  56. i int
  57. ) engine=memory;
  58. SET @A=0;
  59. SET @B=0;
  60. insert into tmp_s
  61. SELECT x.roomid,x.userid,s,e,datediff(e,s)+1 i
  62. FROM
  63. (
  64. (
  65. SELECT @B:=@B+1 AS id,roomid,userid,s
  66. FROM (
  67. SELECT DISTINCT roomid, userid, roomstart AS s
  68. FROM u_room_log a
  69. WHERE NOT EXISTS (SELECT *
  70. FROM u_room_log b
  71. WHERE a.roomid = b.roomid
  72. AND a.userid = b.userid
  73. AND a.roomstart > b.roomstart
  74. AND a.roomstart <= b.roomend)
  75. ) AS p
  76. ) AS x,
  77. (
  78. SELECT @A:=@A+1 AS id,roomid,userid,e
  79. FROM
  80. (
  81. SELECT DISTINCT roomid, userid, roomend AS e
  82. FROM u_room_log a
  83. WHERE NOT EXISTS (SELECT *
  84. FROM u_room_log b
  85. WHERE a.roomid = b.roomid
  86. AND a.userid = b.userid
  87. AND a.roomend >= b.roomstart
  88. AND a.roomend < b.roomend)
  89. ) AS o
  90. ) AS y
  91. )
  92. WHERE x.id = y.id AND x.roomid = y.roomid AND x.userid = y.userid ;
  93. select max(i) into @c from tmp_s;
  94. insert ignore into t1(roomid,userid,s,e)
  95. select
  96. roomid, userid,
  97. if(date(s)!=date(e) and id>1,date(s+interval id-1 date(s+interval id-1 date(e) ,e,date_format(s+interval id-1 '%Y-%m-%d 23:59:59')) e
  98. from tmp_s t1 STRAIGHT_JOIN
  99. nums on(nums.id<=t1.i)
  100. where nums.id<=@c
  101. ;
  102. insert into t2 (roomid,s,e)
  103. select roomid,
  104. s+interval startnum/1000000 second s,
  105. e-interval endnum/1000000 second e
  106. from (
  107. select
  108. roomid,
  109. s,e,
  110. startnum,
  111. when @eflag=eflag then @rn:=@rn+1 when @eflag:=eflag then @rn else @rn end endnum
  112. from (
  113. select * from (
  114. select when @sflag=sflag then @rn:=@rn+1 when @sflag:=sflag then @rn else @rn end startnum,roomid,s,e,sflag,eflag from
  115. (
  116. select * from
  117. (
  118. select t1.*,concat('[',roomid,'],',s) sflag,concat('[',roomid,'],',e) eflag from t1 order by roomid ,sflag
  119. )a,(select @sflag:='',@rn:=0,@eflag:='') vars
  120. ) b
  121. ) bb order by roomid,eflag
  122. ) c
  123. ) d ;
  124. insert into tmp_time_point(roomid,timepoint,type) select roomid,s,1 from t2;
  125. insert into tmp_time_point(roomid,timepoint,type) select roomid,e,0 from t2;
  126. insert ignore into tmp_min_range(roomid,s,e)
  127. select roomid,starttime starttime, endtime endtime from (
  128. select
  129. if(@roomid=roomid,@d,'') as starttime,@d:=str_to_date(timepoint,'%Y-%m-%d %H:%i:%s.%f'),@roomid:=roomid,p.roomid,str_to_date(timepoint,'%Y-%m-%d %H:%i:%s.%f') endtime
  130. from tmp_time_point p,(select @d:='',@roomid:=-1) vars
  131. order by roomid,timepoint
  132. ) v4 where starttime!='' and date(starttime)=date(endtime);
  133. open cur_test;
  134. repeat
  135. fetch cur_test into v_roomid,v_cur_type,v_time;
  136. if done !=1 then
  137. -- 第一行或者每个房间的第一行
  138. if v_before_roomid=-1 or v_roomid!=v_before_roomid then
  139. set v_before_roomid:=v_roomid;
  140. set v_before_type:=1;
  141. set v_before_time:='0000-00-00 00:00:00';
  142. set v_num:=0;
  143. end if;
  144. if v_before_type=1 then
  145. set v_num:=v_num+1;
  146. insert into tmp_result(roomid,timepoint,c) values(v_roomid,v_time,v_num);
  147. end if;
  148. if v_before_type=0 then
  149. set v_num:=v_num-1;
  150. insert into tmp_result(roomid,timepoint,c) values(v_roomid,v_time,v_num);
  151. end if;
  152. set v_before_roomid:=v_roomid;
  153. set v_before_type:=v_cur_type;
  154. set v_before_time:=v_time;
  155. end if;
  156. until done end repeat;
  157. close cur_test;
  158. select roomid,date(s) dt,round(second,date_format(s,'%Y-%m-%d %H:%i:%s'),date_format(e,'%Y-%m-%d %H:%i:%s')))/60) ts,max(c)-1 c from (
  159. select a.roomid,a.s,a.e,r.c,r.timepoint from tmp_result r
  160. inner join
  161. tmp_min_range a on( r.timepoint=a.e and r.roomid=a.roomid)
  162. where c>2
  163. ) a group by roomid,date(s);
  164. END

第一处改进
原来同一房间同一用户重叠时间合并,然后再拆分跨天数据,用的是一条SQL
现在改进如下
  1. create temporary table tmp_s(
  2. roomid bigint,
  3. userid bigint,
  4. s timestamp,
  5. e timestamp,
  6. i int
  7. ) engine=memory;
  8. SET @A=0;
  9. SET @B=0;
  10. insert into tmp_s
  11. SELECT x.roomid,x.userid,s,e,datediff(e,s)+1 i
  12. FROM
  13. (
  14. (
  15. SELECT @B:=@B+1 AS id,roomid,userid,s
  16. FROM (
  17. SELECT DISTINCT roomid, userid, roomstart AS s
  18. FROM u_room_log a
  19. WHERE NOT EXISTS (SELECT *
  20. FROM u_room_log b
  21. WHERE a.roomid = b.roomid
  22. AND a.userid = b.userid
  23. AND a.roomstart > b.roomstart
  24. AND a.roomstart <= b.roomend)
  25. ) AS p
  26. ) AS x,
  27. (
  28. SELECT @A:=@A+1 AS id,roomid,userid,e
  29. FROM
  30. (
  31. SELECT DISTINCT roomid, userid, roomend AS e
  32. FROM u_room_log a
  33. WHERE NOT EXISTS (SELECT *
  34. FROM u_room_log b
  35. WHERE a.roomid = b.roomid
  36. AND a.userid = b.userid
  37. AND a.roomend >= b.roomstart
  38. AND a.roomend < b.roomend)
  39. ) AS o
  40. ) AS y
  41. )
  42. WHERE x.id = y.id AND x.roomid = y.roomid AND x.userid = y.userid ;
  43. select max(i) into @c from tmp_s;
  44. insert ignore into t1(roomid,userid,s,e)
  45. select
  46. roomid, userid,
  47. if(date(s)!=date(e) and id>1,date(s+interval id-1 date(s+interval id-1 date(e) ,e,date_format(s+interval id-1 '%Y-%m-%d 23:59:59')) e
  48. from tmp_s t1 STRAIGHT_JOIN
  49. nums on(nums.id<=t1.i)
  50. where nums.id<=@c
  51. ;

先把同一房间同一用户的重叠部分合并,然后暂存临时表
记录最大的间隔时间,然后再拆分数据

拆分数据的时候 使用STRAIGHT_JOIN 强制连接顺序.
这样避免因为数字辅助表过大,而导致性能陡然变差.


第二处改进
原来使用distinct的查询, 都改为在临时表上增加主键.
然后使用insert ignore into 代替 insert into
这样大概优化了300毫秒

经过反复优化之后,执行时间大致稳定在1250毫秒 至 1300 毫秒

各个部分耗时分析如下
填充tmp_s,合并同一房间同一用户的重叠部分,耗时655毫秒
填充t1,拆分跨天的用户数据,耗时62毫秒
填充t2,用户时间段首尾相交或者首尾全部重合的数据拆分,耗时140毫秒
填充tmp_min_range,计算最小间隔范围,耗时156毫秒
小花狸Session合并算法,耗时219毫秒
结果统计展示,耗时47毫秒

0