千家信息网

Session重叠问题学习(三)--优化

发表于:2024-09-21 作者:千家信息网编辑
千家信息网最后更新 2024年09月21日,接前文http://blog.itpub.net/29254281/viewspace-2150229/前文中的算法想了一天半,终于在昨天晚上得出了正确的结果.在我的环境中,耗时90s ,还有进一步优
千家信息网最后更新 2024年09月21日Session重叠问题学习(三)--优化接前文
http://blog.itpub.net/29254281/viewspace-2150229/

前文中的算法想了一天半,终于在昨天晚上得出了正确的结果.
在我的环境中,耗时90s ,还有进一步优化的空间.

首选是生成 t1 和 t2的方式.
之前使用create table 方式 导致类型不对,
因为是临时作用的表,所以可以预先创建表结构
CREATE TABLE `t1` (
`roomid` int(11) NOT NULL DEFAULT '0',
`userid` bigint(20) NOT NULL DEFAULT '0',
`s` timestamp ,
`e` timestamp,
primary KEY (`roomid`,`userid`,`s`,`e`),
KEY (`roomid`,`s`,`e`)
) ;

CREATE TABLE `t2` (
`roomid` int(11) NOT NULL DEFAULT '0',
`userid` bigint(20) NOT NULL DEFAULT '0',
`s` timestamp ,
`e` timestamp,
primary KEY (`roomid`,`userid`,`s`,`e`),
KEY (`roomid`,`s`,`e`)
) ;

前文中的第一步可以封装为一个过程
  1. DELIMITER $$
  2. CREATE DEFINER=`root`@`localhost` PROCEDURE `p`()
  3. BEGIN
  4. insert into t1
  5. select distinct
  6. roomid,
  7. userid,
  8. if(date(s)!=date(e) and id>1,date(s+interval id-1 date(s+interval id-1 date(e) ,e,date_format(s+interval id-1 '%Y-%m-%d 23:59:59')) e
  9. from (
  10. SELECT DISTINCT s.roomid, s.userid, s.s, (
  11. SELECT MIN(e)
  12. FROM (SELECT DISTINCT roomid, userid, roomend AS e
  13. FROM u_room_log a
  14. WHERE NOT EXISTS (SELECT *
  15. FROM u_room_log b
  16. WHERE a.roomid = b.roomid
  17. AND a.userid = b.userid
  18. AND a.roomend >= b.roomstart
  19. AND a.roomend < b.roomend)
  20. ) s2
  21. WHERE s2.e > s.s
  22. AND s.roomid = s2.roomid
  23. AND s.userid = s2.userid
  24. ) AS e
  25. FROM (SELECT DISTINCT roomid, userid, roomstart AS s
  26. FROM u_room_log a
  27. WHERE NOT EXISTS (SELECT *
  28. FROM u_room_log b
  29. WHERE a.roomid = b.roomid
  30. AND a.userid = b.userid
  31. AND a.roomstart > b.roomstart
  32. AND a.roomstart <= b.roomend)
  33. ) s, (SELECT DISTINCT roomid, userid, roomend AS e
  34. FROM u_room_log a
  35. WHERE NOT EXISTS (SELECT *
  36. FROM u_room_log b
  37. WHERE a.roomid = b.roomid
  38. AND a.userid = b.userid
  39. AND a.roomend >= b.roomstart
  40. AND a.roomend < b.roomend)
  41. ) e
  42. WHERE s.roomid = e.roomid
  43. AND s.userid = e.userid
  44. ) t1 ,
  45. nums
  46. where nums.id<=datediff(e,s)+1
  47. ;
  48. END

函数修改如下
  1. DELIMITER $$
  2. CREATE DEFINER=`root`@`localhost` FUNCTION `f`(pTime timestamp) RETURNS int(11)
  3. BEGIN
  4. declare pResult bigint;
  5. insert into t2
  6. select distinct v6.roomid,v6.userid,greatest(s,starttime) s,least(e,endtime) e
  7. from (
  8. select roomid,as DATETIME) starttime,as DATETIME) endtime from (
  9. select @d as starttime,@d:=d,v3.roomid,v3.d endtime from (
  10. select distinct roomid,
  11. when nums.id=1 then v1s
  12. when nums.id=2 then v1e
  13. when nums.id=3 then v2s
  14. when nums.id=4 then v2e
  15. end d from (
  16. select v1.roomid, v1.s v1s,v1.e v1e,v2.s v2s,v2.e v2e
  17. from t1 v1
  18. inner join t1 v2 on ((v1.s between v2.s and v2.e or v1.e between v2.s and v2.e ) and v1.roomid=v2.roomid)
  19. where v2.roomid in(select distinct roomid from t1 where date(s)=pTime)
  20. and v2.s>=pTime and v2.s<(pTime+interval '1' and (v2.roomid,v2.userid,v2.s,v2.e)!= (v1.roomid,v1.userid,v1.s,v1.e)
  21. ) a,nums where nums.id<=4
  22. order by roomid,d
  23. ) v3,(select @d:='') vars
  24. ) v4 where starttime!=''
  25. ) v5 inner join t1 v6 on(v5.starttime between v6.s and v6.e and v5.endtime between v6.s and v6.e and v5.roomid=v6.roomid)
  26. ;
  27. select row_count() into pResult;
  28. RETURN pResult;
  29. END

原来是针对每天每个房间处理,经过优化对某天的所有房间进行处理,批量的形式更快

另外在中间过程增加了类型转换,可以更好的利用索引
select roomid,CAST(starttime as DATETIME) starttime,CAST(endtime as DATETIME) endtime


另外第7行 原来没有 distinct 可能导致bug
select distinct v6.roomid,v6.userid,greatest(s,starttime) s,least(e,endtime) e

调用时执行:
truncate table t1;
truncate table t2;
call p;
select f(s) from (
select distinct date(s) s from t1
) t

两步的执行时间:


今天优化了一天,从90s优化到25s以内,已经达到了预期。
我觉得在单线程环境,基本上已经达到最优.
如果还想优化到极致,第二步的函数执行,可以通过JAVA程序多线程一起跑,只要服务器CPU核数多,优化效果应该还是很明显的。

0