Mapping怎么使用
本篇内容介绍了"Mapping怎么使用"的有关知识,在实际案例的操作过程中,不少人都会遇到这样的困境,接下来就让小编带领大家学习一下如何处理这些情况吧!希望大家仔细阅读,能够学有所成!
1
背景背景
登录状态下,日志中会采集到用户的登录id(account),可以做到用户身份的精确标识;而在匿名状态下,日志中没有采集到用户的登录id,准确标识用户,成为一件极其棘手的事情;
在事件日志中,对用户能产生标识作用的字段有:
app日志中,有deviceid,account
web日志中,有cookieid,ip,account
wxapp日志中,有openid,account
在现实中,一个用户,可能处于如下极其复杂的状态:
登录状态访问app
匿名状态访问app
登录状态访问web
匿名状态访问web
登录状态访问wx小程序
匿名状态访问wx小程序
一个用户可能拥有不止一台终端设备
一台终端设备上可能有多个用户使用
一个用户可能一段时间后更换手机
……
解决方案
适用场景
适合没有用户注册体系,或者极少数用户会进行多设备登录的产品,如工具类产品、搜索引擎、部分小型电商等。
这也是绝大多数数据分析产品唯一提供的方案。
局限性
同一用户在不同设备使用会被认为不同的用户,对后续的分析统计有影响。
不同用户在相同设备使用会被认为是一个用户,也对后续的分析统计有影响。
但如果用户跨设备使用或者多用户共用设备不是产品的常见场景的话,可以忽略上述问题。
适用场景
成功关联设备 ID 和登录 ID 之后,用户在该设备 ID 上或该登录 ID 下的行为就会贯通,被认为是一个 全局 ID 发生的。在进行事件、漏斗、留存等用户相关分析时也会算作一个用户。
关联设备 ID 和登录 ID 的方法虽然实现了更准确的用户追踪,但是也会增加复杂度。
所以一般来说,我们建议只有当同时满足以下条件时,才考虑进行 ID 关联:
需要贯通一个用户在一个设备上注册前后的行为。
需要贯通一个注册用户在不同设备上登录之后的行为
局限性
一个设备 ID 只能和一个登录 ID 关联,而事实上一台设备可能有多个用户使用。
一个登录 ID 只能和一个设备 ID 关联,而事实上一个用户可能用一个登录 ID 在多台设备上登录。
适用场景
一个用户在多个设备上进行登录是一种比较常见的场景,比如 Web 端和 App 端可能都需要进行登录。支持一个登录 ID 下关联多设备 ID 之后,用户在多设备下的行为就会贯通,被认为是一个ID 发生的。
局限性
一个设备 ID 只能和一个登录 ID 关联,而事实上一台设备可能有多个用户使用。
一个设备 ID 一旦跟某个登录 ID 关联或者一个登录 ID 和一个设备 ID 关联,就不能解除(自动解除)。
而事实上,设备 ID 和登录 ID 的动态关联才应该是更合理的。
基本原则,与方案3相同
修正之处,一个设备ID被绑定到某个登陆ID(A)之后,如果该设备在后续一段时间(比如一个月内)被一个新的登陆ID(B)更频繁使用,则该设备ID会被调整至绑定登陆ID(B)
实现
create table wedw_dw.test_id_mapping( device_id string comment '设备ID',user_id string comment '用户ID',page_url string comment '浏览页面',login_time timestamp comment '登录时间') partitioned by (date_id string)row format delimited fields terminated by ','stored as textfile;
+------------+----------+------------+------------------------+-------------+--+| device_id | user_id | page_url | login_time | date_id |+------------+----------+------------+------------------------+-------------+--+| device01 | user01 | pageview | 2020-09-01 09:01:00.0 | 2020-09-01 || device01 | user01 | pageview | 2020-09-01 09:02:00.0 | 2020-09-01 || device01 | user01 | pageview | 2020-09-01 09:03:00.0 | 2020-09-01 || device01 | user01 | pageview | 2020-09-01 09:04:00.0 | 2020-09-01 || device02 | user02 | pageview | 2020-09-01 09:03:00.0 | 2020-09-01 || device02 | user02 | pageview | 2020-09-01 09:04:00.0 | 2020-09-01 || device02 | user02 | pageview | 2020-09-01 09:05:00.0 | 2020-09-01 || device02 | user02 | pageview | 2020-09-01 09:06:00.0 | 2020-09-01 || device02 | null | pageview | 2020-09-01 09:07:00.0 | 2020-09-01 || device02 | user03 | pageview | 2020-09-01 09:08:00.0 | 2020-09-01 || device03 | null | pageview | 2020-09-01 09:01:00.0 | 2020-09-01 || device03 | null | pageview | 2020-09-01 09:02:00.0 | 2020-09-01 || device04 | null | pageview | 2020-09-01 09:08:00.0 | 2020-09-01 || device04 | null | pageview | 2020-09-01 09:09:00.0 | 2020-09-01 |+------------+----------+------------+------------------------+-------------+--+
一个账号在一个设备上的访问记录条数无论是多少,只计算一次分数 对每个设备上,登录过的 账号,按时间先后顺序,赋予不同的分值(依次递减10分) 评分结果示例如下:
#计算设备ID对应的全局IDdrop table if exists wedw_tmp.login_info_1;create table wedw_tmp.login_info_1asselect t1.device_id,t1.user_idfrom( select device_id ,user_id ,page_url ,login_time ,row_number() over(partition by device_id order by login_time asc) as rn from wedw_dw.test_id_mapping where date_id = '2020-09-01' and user_id<> 'null') t1where rn = 1;+------------+----------+--+| device_id | user_id |+------------+----------+--+| device01 | user01 || device02 | user02 |+------------+----------+--+
#计算设备ID对应的每个用户的登录信息及评分drop table if exists wedw_tmp.login_info_2;create table wedw_tmp.login_info_2asselectt4.device_id,collect_list(t4.login_info) as login_infofrom( select t3.device_id ,concat(t3.user_id,'->',110-10*t3.rn,'->',t3.login_time) as login_info from ( select t2.device_id ,t2.user_id ,t2.login_time ,row_number() over(partition by device_id order by login_time asc) as rn from ( select device_id ,user_id ,min(login_time) as login_time from wedw_dw.test_id_mapping where date_id = '2020-09-01' and user_id <> 'null' group by device_id,user_id ) t2 ) t3) t4group by t4.device_id;+------------+-------------------------------------------------------------------------+--+| device_id | login_info |+------------+-------------------------------------------------------------------------+--+| device01 | ["user01->100->2020-09-01 09:01:00"] || device02 | ["user02->100->2020-09-01 09:03:00","user03->90->2020-09-01 09:08:00"] |+------------+-------------------------------------------------------------------------+--+
# 创建guid映射表drop table if exists wedw_dw.device_guid;create table wedw_dw.device_guid( device_id string comment '设备ID',login_info arraycomment '用户登录信息',guid string comment '全局ID')row format delimited fields terminated by ','stored as textfile;
# 将初始化好的数据插入guid映射表insert into table wedw_dw.device_guidselect t1.device_id,t2.login_info,t1.user_id as guidfromwedw_tmp.login_info_1 t1inner join wedw_tmp.login_info_2 t2on t1.device_id = t2.device_id;+------------+-------------------------------------------------------------------------+---------+--+| device_id | login_info | guid |+------------+-------------------------------------------------------------------------+---------+--+| device01 | ["user01->100->2020-09-01 09:01:00"] | user01 || device02 | ["user02->100->2020-09-01 09:03:00","user03->90->2020-09-01 09:08:00"] | user02 |+------------+-------------------------------------------------------------------------+---------+--+
+------------+----------+------------+------------------------+-------------+--+| device_id | user_id | page_url | login_time | date_id |+------------+----------+------------+------------------------+-------------+--+| device01 | user01 | pageview | 2020-09-02 09:11:00.0 | 2020-09-02 || device01 | user01 | pageview | 2020-09-02 09:12:00.0 | 2020-09-02 || device01 | user01 | pageview | 2020-09-02 09:13:00.0 | 2020-09-02 || device01 | user01 | pageview | 2020-09-02 09:14:00.0 | 2020-09-02 || device02 | user03 | pageview | 2020-09-02 09:13:00.0 | 2020-09-02 || device02 | user03 | pageview | 2020-09-02 09:14:00.0 | 2020-09-02 || device02 | user03 | pageview | 2020-09-02 09:15:00.0 | 2020-09-02 || device02 | user03 | pageview | 2020-09-02 09:16:00.0 | 2020-09-02 || device02 | null | pageview | 2020-09-02 09:17:00.0 | 2020-09-02 || device02 | user02 | pageview | 2020-09-02 09:18:00.0 | 2020-09-02 || device03 | null | pageview | 2020-09-02 09:12:00.0 | 2020-09-02 || device03 | user04 | pageview | 2020-09-02 09:12:00.0 | 2020-09-02 |+------------+----------+------------+------------------------+-------------+--+
# 计算本次设备ID对应的全局IDdrop table if exists wedw_tmp.login_info_3;create table wedw_tmp.login_info_3asselect t1.device_id,t1.user_idfrom( select device_id ,user_id ,page_url ,login_time ,row_number() over(partition by device_id order by login_time asc) as rn from wedw_dw.test_id_mapping where date_id = '2020-09-02' and user_id<> 'null') t1where rn = 1;+------------+----------+--+| device_id | user_id |+------------+----------+--+| device01 | user01 || device02 | user03 || device03 | user04 |+------------+----------+--+
#计算本次设备ID对应的每个用户的登录信息及评分drop table if exists wedw_tmp.login_info_4;create table wedw_tmp.login_info_4asselectt4.device_id,collect_list(t4.login_info) as login_infofrom( select t3.device_id ,concat(t3.user_id,'->',110-10*t3.rn,'->',t3.login_time) as login_info from ( select t2.device_id ,t2.user_id ,t2.login_time ,row_number() over(partition by device_id order by login_time asc) as rn from ( select device_id ,user_id ,min(login_time) as login_time from wedw_dw.test_id_mapping where date_id = '2020-09-02' and user_id <> 'null' group by device_id,user_id ) t2 ) t3) t4group by t4.device_id;+------------+-------------------------------------------------------------------------+--+| device_id | login_info |+------------+-------------------------------------------------------------------------+--+| device01 | ["user01->100->2020-09-02 09:11:00"] || device02 | ["user03->100->2020-09-02 09:13:00","user02->90->2020-09-02 09:18:00"] || device03 | ["user04->100->2020-09-02 09:12:00"] |+------------+-------------------------------------------------------------------------+--+
计算逻辑:1. 针对当日的日志数据,为每个设备上登录过的每个账号打分 评分规则:
一个账号在一个设备上的访问记录条数无论是多少,只计算一次分数 对每个设备上,
登录过的账号,按时间先后顺序,赋予不同的分值(依次递减10分) 示例结果如下:
# 将本次计算好的guid映射结果插入到临时表drop table if exists wedw_tmp.login_info_5;create table wedw_tmp.login_info_5select t1.device_id,t2.login_info,t1.user_id as guidfromwedw_tmp.login_info_3 t1inner join wedw_tmp.login_info_4 t2on t1.device_id = t2.device_id;+------------+-------------------------------------------------------------------------+---------+--+| device_id | login_info | guid |+------------+-------------------------------------------------------------------------+---------+--+| device01 | ["user01->100->2020-09-02 09:11:00"] | user01 || device02 | ["user03->100->2020-09-02 09:13:00","user02->90->2020-09-02 09:18:00"] | user03 || device03 | ["user04->100->2020-09-02 09:12:00"] | user04 |+------------+-------------------------------------------------------------------------+---------+--+
# 两次结果数据行转列插到一个临时表中drop table if exists wedw_tmp.login_info_6;create table wedw_tmp.login_info_6selectdevice_id,split(tmp.sub,'->')[0] as user_id,split(tmp.sub,'->')[1] as score,split(tmp.sub,'->')[2] as login_timefromwedw_dw.device_guidlateral view explode(login_info) tmp as subunion all selectdevice_id,split(tmp.sub,'->')[0] as user_id,split(tmp.sub,'->')[1] as score,split(tmp.sub,'->')[2] as login_timefromwedw_tmp.login_info_5lateral view explode(login_info) tmp as sub;+------------+----------+--------+----------------------+--+| device_id | user_id | score | login_time |+------------+----------+--------+----------------------+--+| device01 | user01 | 100 | 2020-09-01 09:01:00 || device02 | user02 | 100 | 2020-09-01 09:03:00 || device02 | user03 | 90 | 2020-09-01 09:08:00 || device01 | user01 | 100 | 2020-09-02 09:11:00 || device02 | user03 | 100 | 2020-09-02 09:13:00 || device02 | user02 | 90 | 2020-09-02 09:18:00 || device03 | user04 | 100 | 2020-09-02 09:12:00 |+------------+----------+--------+----------------------+--+
# 计算前面结果数据中每个设备下每个用户的评分和及最早登录时间drop table if exists wedw_tmp.login_info_7;create table wedw_tmp.login_info_7select device_id,user_id,sum(score) as score,min(login_time) as login_timefromwedw_tmp.login_info_6group by device_id,user_id;+------------+----------+--------+----------------------+--+| device_id | user_id | score | login_time |+------------+----------+--------+----------------------+--+| device01 | user01 | 200.0 | 2020-09-01 09:01:00 || device02 | user02 | 190.0 | 2020-09-01 09:03:00 || device02 | user03 | 190.0 | 2020-09-01 09:08:00 || device03 | user04 | 100.0 | 2020-09-02 09:12:00 |+------------+----------+--------+----------------------+--+
# 合并两次结果数据 计算设备ID对应的guiddrop table if exists wedw_tmp.login_info_8;create table wedw_tmp.login_info_8select t2.device_id as device_id,t2.user_id as guidfrom( select t1.device_id ,t1.user_id ,t1.score ,t1.login_time ,row_number() over(partition by device_id order by t1.score desc) as rn from wedw_tmp.login_info_7 t1)t2where t2.rn = 1;+------------+---------+--+| device_id | guid |+------------+---------+--+| device01 | user01 || device02 | user02 || device03 | user04 |+------------+---------+--+
# 计算两次结果数据合并后每个设备ID对应的用户登录信息drop table if exists wedw_tmp.login_info_9;create table wedw_tmp.login_info_9selectt1.device_id,collect_list(t1.login_info) login_infofrom( select device_id ,concat(user_id,'->',score,'->',login_time) as login_info from wedw_tmp.login_info_7) t1group by device_id;+------------+------------------------------------------------------------------------------+--+| device_id | collect_list(login_info) |+------------+------------------------------------------------------------------------------+--+| device03 | ["user04->100.0->2020-09-02 09:12:00"] || device02 | ["user02->190.0->2020-09-01 09:03:00","user03->190.0->2020-09-01 09:08:00"] || device01 | ["user01->200.0->2020-09-01 09:01:00"] |+------------+------------------------------------------------------------------------------+--+
# 将合并好的数据插入guid映射表insert overwrite table wedw_dw.device_guidselect t1.device_id as device_id,t2.login_info as login_info,t1.guid as guidfrom wedw_tmp.login_info_8 t1inner join wedw_tmp.login_info_9 t2on t1.device_id = t2.device_id;+------------+------------------------------------------------------------------------------+---------+--+| device_id | login_info | guid |+------------+------------------------------------------------------------------------------+---------+--+| device03 | ["user04->100.0->2020-09-02 09:12:00"] | user04 || device02 | ["user02->190.0->2020-09-01 09:03:00","user03->190.0->2020-09-01 09:08:00"] | user02 || device01 | ["user01->200.0->2020-09-01 09:01:00"] | user01 |+------------+------------------------------------------------------------------------------+---------+--+
"Mapping怎么使用"的内容就介绍到这里了,感谢大家的阅读。如果想了解更多行业相关的知识可以关注网站,小编将为大家输出更多高质量的实用文章!