千家信息网

Java 搞定 SQL 集合运算的最简方法

发表于:2025-01-24 作者:千家信息网编辑
千家信息网最后更新 2025年01月24日,问题介绍  作为 java 程序员,用代码直接实现类似 SQL 中的交并补差的集合运算,总是要编写大量的代码,如果能有一个专门的外部数据工具,通过写简单类似 SQL 的脚本来实现,在 java 中直接
千家信息网最后更新 2025年01月24日Java 搞定 SQL 集合运算的最简方法

问题介绍

  作为 java 程序员,用代码直接实现类似 SQL 中的交并补差的集合运算,总是要编写大量的代码,如果能有一个专门的外部数据工具,通过写简单类似 SQL 的脚本来实现,在 java 中直接调用并可以返回结果集,就再好不过了。Java 版集算器正是解决这一难题的神器,通过 SPL 脚本可以直观自然得写出运算,再使用 java 调用 SPL 脚本,使用起来简单,快捷,高效。另外,虽然 SQL 有集合概念,但对于有序集合运算提供的支持却很有限,经常要采用很费解的思路才可以完成, SPL 基于离散数据集模型,能轻松处理有序集合运算。下面我们就由浅入深,举例说明如何使用。

SPL 实现

和集

  示例 1: 求重叠时间段的总天数

MySQL8:

with recursive t(start,end) as (select date'2010-01-07',date'2010-01-9'    union all select date'2010-01-15',date'2010-01-16'    union all select date'2010-01-07',date'2010-01-12'    union all select date'2010-01-08',date'2010-01-11'),t1(d,end) as (select start,end from t    union all select d+1,end from t1 where dselect count(distinct d) from t1;with recursive t(start,end) as (select date'2010-01-07',date'2010-01-9'    union all select date'2010-01-15',date'2010-01-16'    union all select date'2010-01-07',date'2010-01-12'    union all select date'2010-01-08',date'2010-01-11'),t1(d,end) as (select start,end from t    union all select d+1,end from t1 where dselect count(distinct d) from t1;

  说明:此例先将各时间段转成时间段内所有日子对应的日期,然后再求不同日期的个数

集算器 SPL:


A
1=connect("mysql")
2=A1.query@x("select date'2010-01-07'start,date'2010-01-9'end union all select date'2010-01-15',date'2010-01-16'union all select date'2010-01-07',date'2010-01-12'union all select date'2010-01-08',date'2010-01-11'")
3=A2.(periods(start,end))
4=A3.conj()
5=A4.icount()

  A3: 对 A2 中的每一个时间段构造从 start 到 end 的日期序列
  A4: 求 A3 中所有日期序列的和
  A5: 求 A4 中不重复日期的个数

  保存脚本文件SumSet.dfx (嵌入 Java 会用到)

差集

  示例 1: 列出英语人口和法语人口均超过 5% 的国家

MySQL8:

with t1(lang) as (select 'English' union all select 'French')    select name from world.country c    where not exists(select * from t1 where lang not in (select language from world.countrylanguage         where percentage>=5 and countrycode=c.code    ));with t1(lang) as (select 'English' union all select 'French')    select name from world.country c    where not exists(select * from t1 where lang not in (select language from world.countrylanguage         where percentage>=5 and countrycode=c.code    ));

  说明:此 SQL 只是演示通过双重否定实现差集为空

集算器 SPL:


A
1=connect("mysql")
2=A1.query("select CountryCode,Name,Language,Percentage from world.countrylanguage cl join world.country c on cl.countrycode=c.code where percentage>5")
3=A2.group(CountryCode)
4=A3.select(["English","French"]\~.(Language)==[])
5=A4.new(~.Name:name)

  A4: 选出 ["English","French"] 与本组语言集合的差为空的组,意思就是选出语言集合包含 English 和 French 的组

  保存脚本文件DifferenceSet.dfx (嵌入 Java 会用到)

交集

  示例 1: 列出英语人口、法语人口、西班牙语人口分别超过 0.3%、0.2%、0.1% 的国家代码

MySQL8:

with t1 as (select countrycode from world.countrylanguage where language='English' and percentage>0.3),     t2 as (select countrycode from world.countrylanguage where language='French' and percentage>0.2),     t3 as (select countrycode from world.countrylanguage where language='Spanish' and percentage>0.1)select countrycodefrom t1 join t2 using(countrycode) join t3 using(countrycode);with t1 as (select countrycode from world.countrylanguage where language='English' and percentage>0.3),     t2 as (select countrycode from world.countrylanguage where language='French' and percentage>0.2),     t3 as (select countrycode from world.countrylanguage where language='Spanish' and percentage>0.1)select countrycodefrom t1 join t2 using(countrycode) join t3 using(countrycode);

  说明:此例只是演示如何求解多个集合的交集

集算器 SPL:


A
1=connect("mysql")
2[English,French,Spanish]
3[0.3,0.2,0.1]
4=A2.(A1.query@i("select countrycode from world.countrylanguage where language=? and percentage>?",~,A3(#)))
5>A1.close()
6=A4.isect()

  A3: 按次序依次查询英语人口超 0.3%、法语人口超 0.2%、西班牙语超 0.1% 的国家代码,并转成序列
  A5: A3 中所有序列交集

  保存脚本文件IntersectionSet.dfx (嵌入 Java 会用到)

Java 调用

  SPL 嵌入到 Java 应用程序十分方便,通过 JDBC 调用存储过程方式加载,用和集保存的文件SumSet.dfx,示例调用如下:

...    Connection con = null;    Class.forName("com.esproc.jdbc.InternalDriver");    con= DriverManager.getConnection("jdbc:esproc:local://");    //调用存储过程,其中SumSet是dfx的文件名    st =(com. esproc.jdbc.InternalCStatement)con.prepareCall("call SumSet()");    //执行存储过程    st.execute();    //获取结果集    ResultSet rs = st.getResultSet();    ...    ...    Connection con = null;    Class.forName("com.esproc.jdbc.InternalDriver");    con= DriverManager.getConnection("jdbc:esproc:local://");    //调用存储过程,其中SumSet是dfx的文件名    st =(com. esproc.jdbc.InternalCStatement)con.prepareCall("call SumSet()");    //执行存储过程    st.execute();    //获取结果集    ResultSet rs = st.getResultSet();    ...

  替换成DifferenceSet.dfxIntersectionSet.dfx是同样的道理,只需 call DifferenceSet()或者 call IntersectionSet() 即可。这里只用 Java 片段粗略解释了如何嵌入 SPL,详细步骤请参阅Java 如何调用 SPL 脚本,也非常简单,不再赘述。同时,SPL 也支持 ODBC 驱动,集成到支持 ODBC 的语言,嵌入过程类似。

扩展节选

  关于集合运算除了上面讲的和差交运算,还可以获取与行号有关的计算,以及有序集合的对位运算。

根据行号取数据

  示例 1: 计算招商银行 (600036) 2017 年第 3 个交易日和倒数第 3 个交易日的交易信息

MySQL8:

with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31')    select tdate,open,close,volume from t where rn=3union all    select tdate,open,close,volume from t where rn=(select max(rn)-2 from t);with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31')    select tdate,open,close,volume from t where rn=3union all    select tdate,open,close,volume from t where rn=(select max(rn)-2 from t);

集算器 SPL:


A
1=connect("mysql")
2=A1.query@x("select * from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31' order by tdate")
3=A2(3)|A2.m(-3)

  A3: 第 3 条记录和倒数第 3 条记录的和集


  示例 2: 计算招商银行 (600036) 最近 20 个交易日的平均收盘价

MySQL8:

with t as (select *, row_number() over(order by tdate desc) rn from stktrade where sid='600036')select avg(close) avg20 from t where rn<=20;with t as (select *, row_number() over(order by tdate desc) rn from stktrade where sid='600036')select avg(close) avg20 from t where rn<=20;

集算器 SPL:


A
1=connect("mysql")
2=A1.query@x("select * from stktrade where sid='600036' order by tdate")
3=A2.m(-20:)
4=A3.avg(close)

  A2: 将 600036 的交易记录按日期排序
  A3: 取从倒数 20 条到末尾的所有记录
  A4: 求 A3 中所有记录收盘价的平均值

求满足条件的记录的行号

  示例 1: 计算招商银行 (600036)2017 年经过多少交易日收盘价达到 25 元

MySQL8:

with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31')select min(rn) from t where close>=25;with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31')select min(rn) from t where close>=25;

集算器 SPL:


A
1=connect("mysql")
2=A1.query@x("select * from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31' order by tdate")
3=A2.pselect(close>=25)

  A3: 从前往后查找第 1 个收盘价达到 25 元的记录位置


  示例 2: 计算格力电器 (000651) 2017 年涨幅 (考虑停牌)

MySQL8:

with t as (select * from stktrade where sid='000651'),     t1(d) as (select max(tdate) from t where tdate<'2017-01-01'),     t2(d) as (select max(tdate) from t where tdate<'2018-01-01')select s2.close/s1.close-1 risefrom (select * from t,t1 where tdate=d) s1,     (select * from t,t2 where tdate=d) s2;with t as (select * from stktrade where sid='000651'),     t1(d) as (select max(tdate) from t where tdate<'2017-01-01'),     t2(d) as (select max(tdate) from t where tdate<'2018-01-01')select s2.close/s1.close-1 risefrom (select * from t,t1 where tdate=d) s1,     (select * from t,t2 where tdate=d) s2;

集算器 SPL:


A
1=connect("mysql")
2=A1.query@x("select * from stktrade where sid='000651' and tdate<'2018-01-01' order by tdate")
3=A2.pselect@z(tdate < date("2017-01-01"))
4=A2(A3).close
5=A2.m(-1).close
6=A5/A4-1

  A2: 数据按交易日从小到大排序
  A3: 从后往前查找交易日在 2017-01-01 之前的最后一条记录在序列中的行号
  A4: 求 2016 年收盘价
  A5: 求 2017 年收盘价,其中 A2.m(-1) 取倒数第 1 条记录,即 2017 年最后一个交易日对应的记录


  示例 3: 列出 2017 年信息发展 (300469) 交易量超过 250 万股时的交易信息及各日涨幅(考虑停牌)

MySQL8:

with t as (select *, row_number() over(order by tdate) rn    from stktrade where sid='300469' and tdate<=date '2017-12-31'),t1 as (select * from t where tdate>=date'2017-01-01' and volume>=2500000)    select t1.tdate, t1.close, t.volume, t1.close/t.close-1 rise    from t1 join t on t1.rn=t.rn+1;with t as (select *, row_number() over(order by tdate) rn    from stktrade where sid='300469' and tdate<=date '2017-12-31'),t1 as (select * from t where tdate>=date'2017-01-01' and volume>=2500000)    select t1.tdate, t1.close, t.volume, t1.close/t.close-1 rise    from t1 join t on t1.rn=t.rn+1;

集算器 SPL:


A
1=connect("mysql")
2=A1.query@x("select * from stktrade where sid='300469' and tdate<= date '2017-12-31' order by tdate")
3=A2.pselect@a(tdate>=date("2017-01-01") && volume>2500000)
4=A3.new(A2(~).tdate:tdate, A2(~).close:close, A2(~).volume:volume, A2(~).close/A2(~-1).close-1:rise)

  A3: 求出 2017 年交易量超 250 万股所有记录的行号
  A4: 根据行号计算相应的日期、收盘价、交易量、涨幅

求最大值或最小值所在记录的行号

  示例 1: 计算招商银行 (600036) 2017 年最早的最低价与最早的最高价间隔多少交易日

MySQL8:

with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31'),     t1 as (select * from t where close=(select min(close) from t)),     t2 as (select * from t where close=(select max(close) from t))select abs(cast(min(t1.rn) as signed)-cast(min(t2.rn) as signed)) intevalfrom t1,t2;with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31'),     t1 as (select * from t where close=(select min(close) from t)),     t2 as (select * from t where close=(select max(close) from t))select abs(cast(min(t1.rn) as signed)-cast(min(t2.rn) as signed)) intevalfrom t1,t2;

集算器 SPL:


A
1=connect("mysql")
2=A1.query@x("select * from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31' order by tdate")
3=A2.pmax(close)
4=A2.pmin(close)
5=abs(A3-A4)

  A3: 从前往后找最大收盘价在序列中的行号
  A4: 从前往后找最小收盘价在序列中的行号


  示例 2: 计算招商银行 (600036) 2017 年最后的最低价与最后的最高价间隔多少交易日

MySQL8:

with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31'),     t1 as (select * from t where close=(select min(close) from t)),     t2 as (select * from t where close=(select max(close) from t))select abs(cast(max(t1.rn) as signed)-cast(max(t2.rn) as signed)) intevalfrom t1,t2;with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31'),     t1 as (select * from t where close=(select min(close) from t)),     t2 as (select * from t where close=(select max(close) from t))select abs(cast(max(t1.rn) as signed)-cast(max(t2.rn) as signed)) intevalfrom t1,t2;

集算器 SPL:


A
1=connect("mysql")
2=A1.query@x("select * from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31' order by tdate")
3=A2.pmax@z(close)
4=A2.pmin@z(close)
5=abs(A3-A4)

  A3: 从后往前找最大收盘价在序列中的行号
  A4: 从后往前找最小收盘价在序列中的行号

有序集合间的对位计算

  示例 1: 求 2018 年 3 月 6 日到 8 日创业板指 (399006) 对深证成指 (399001) 的每日相对收益率

MySQL8:

with t1 as (select *,close/lag(close) over(order by tdate) rise from stktrade where sid='399006' and tdate between '2018-03-05' and '2018-03-08'),     t2 as (select *, close/lag(close) over(order by tdate) rise from stktrade where sid='399001' and tdate between '2018-03-05' and '2018-03-08')select t1.rise-t2.risefrom t1 join t2 using(tdate)where t1.rise is not null;with t1 as (select *,close/lag(close) over(order by tdate) rise from stktrade where sid='399006' and tdate between '2018-03-05' and '2018-03-08'),     t2 as (select *, close/lag(close) over(order by tdate) rise from stktrade where sid='399001' and tdate between '2018-03-05' and '2018-03-08')select t1.rise-t2.risefrom t1 join t2 using(tdate)where t1.rise is not null;

集算器 SPL:


A
1=connect("mysql")
2=["399006","399001"].(A1.query("select * from stktrade where sid=? and tdate between '2018-03-05' and '2018-03-08' ",~))
3>A1.close()
4=A2.(~.calc(to(2,4),close/close[-1]))
5=A4(1)--A4(2)

  A2: 依次查询 399006 和 399001 从 2018 年 3 月 5 日到 8 日的交易数据
  A4: 依次计算 A2 中 2 个序表从第 2 条记录到第 4 条记录的涨幅,也就是 399006 和 399001 从 2018 年 3 月 6 日到 8 日的每天涨幅
  A5: 对位相减,即可算出每日相对收益率

SPL 优势

  • 有库写 SQL,没库写 SPL

  用 Java 程序直接汇总计算数据,还是比较累的,代码很长,并且不可复用,很多情况数据也不在数据库里,有了 SPL,就能像在 Java 中用 SQL 一样了,十分方便。

  • 常用无忧,不花钱就能取得终身使用权的入门版

  如果要分析的数据是一次性或临时性的,润乾集算器每个月都提供免费试用授权,可以循环免费使用。但要和 Java 应用程序集成起来部署到服务器上长期使用,定期更换试用授权还是比较麻烦,润乾提供了有终身使用权的入门版,解决了这个后顾之忧,获得方式参考 如何免费使用润乾集算器?

  • 技术文档和社区支持

  官方提供的集算器技术文档本身就有很多现成的例子,常规问题从文档里都能找到解决方法。如果获得了入门版,不仅能够使用 SPL 的常规功能,碰到任何问题都可以去乾学院上去咨询,官方通过该社区对入门版用户提供免费的技术支持。


0