An Example Analysis of Using Kylin

Newcomers are often at a loss when problems come up while using Kylin, so this article collects the problems I ran into, their causes, and the fixes that worked; hopefully it helps you resolve the same issues.

My kylin.properties configuration:

### SERVICE ###
# Kylin server mode, valid value [all, query, job]
kylin.server.mode=all
# Optional information for the owner of kylin platform, it can be your team's email
# Currently it will be attached to each kylin's htable attribute
kylin.owner=whoami@kylin.apache.org
# List of web servers in use, this enables one web server instance to sync up with other servers.
kylin.rest.servers=192.168.64.16:7070
# Display timezone on UI, format like [GMT+N or GMT-N]
kylin.rest.timezone=GMT+8

### SOURCE ###
# Hive client, valid value [cli, beeline]
kylin.hive.client=cli
# Parameters for beeline client, only necessary if hive client is beeline
#kylin.hive.beeline.params=-n root --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u 'jdbc:hive2://localhost:10000'
kylin.hive.keep.flat.table=false

### STORAGE ###
# The metadata store in hbase
kylin.metadata.url=kylin_metadata@hbase
# The storage for final cube file in hbase
kylin.storage.url=hbase
# In milliseconds (2 days)
kylin.storage.cleanup.time.threshold=172800000
# Working folder in HDFS, make sure user has the right access to the hdfs directory
kylin.hdfs.working.dir=/kylin
# Compression codec for htable, valid value [none, snappy, lzo, gzip, lz4]
kylin.hbase.default.compression.codec=none
# HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020
# Leave empty if hbase running on same cluster with hive and mapreduce
kylin.hbase.cluster.fs=hdfs://master1:8020
# The cut size for hbase region, in GB.
kylin.hbase.region.cut=5
# The hfile size in GB; a smaller hfile gives the hfile-converting MR job more reducers and makes it faster.
# Set 0 to disable this optimization.
kylin.hbase.hfile.size.gb=2
kylin.hbase.region.count.min=1
kylin.hbase.region.count.max=500

### JOB ###
# max job retry on error, default 0: no retry
kylin.job.retry=0
kylin.job.jar=$KYLIN_HOME/lib/kylin-job-1.5.4.jar
kylin.coprocessor.local.jar=$KYLIN_HOME/lib/kylin-coprocessor-1.5.4.jar
# If true, job engine will not assume that hadoop CLI resides on the same server as itself;
# you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password
# It should not be set to "true" unless you're NOT running kylin.sh on a hadoop client machine
# (thus the kylin instance has to ssh to another real hadoop client machine to execute hbase, hive, hadoop commands)
kylin.job.run.as.remote.cmd=false
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.hostname=
kylin.job.remote.cli.port=22
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.username=
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.password=
# Used by test cases to prepare synthetic data for sample cube
kylin.job.remote.cli.working.dir=/tmp/kylin
# Max count of concurrent jobs running
kylin.job.concurrent.max.limit=10
# Time interval to check hadoop job status
kylin.job.yarn.app.rest.check.interval.seconds=10
# Hive database name for putting the intermediate flat tables
kylin.job.hive.database.for.intermediatetable=default
# The percentage of the sampling, default 100%
kylin.job.cubing.inmem.sampling.percent=100
# Whether to get job status from resource manager with kerberos authentication
kylin.job.status.with.kerberos=false
kylin.job.mapreduce.default.reduce.input.mb=500
kylin.job.mapreduce.max.reducer.number=500
kylin.job.mapreduce.mapper.input.rows=1000000
kylin.job.step.timeout=7200

### CUBE ###
# 'auto', 'inmem', 'layer' or 'random' for testing
kylin.cube.algorithm=auto
kylin.cube.algorithm.auto.threshold=8
kylin.cube.aggrgroup.max.combination=4096
kylin.dictionary.max.cardinality=5000000
kylin.table.snapshot.max_mb=300

### QUERY ###
kylin.query.scan.threshold=10000000
# 3G
kylin.query.mem.budget=3221225472
kylin.query.coprocessor.mem.gb=3
# Enable/disable ACL check for cube query
kylin.query.security.enabled=true
kylin.query.cache.enabled=true

### SECURITY ###
# Spring security profile, options: testing, ldap, saml
# with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login
kylin.security.profile=testing
# Default roles and admin roles in LDAP, for ldap and saml
acl.defaultRole=ROLE_ANALYST,ROLE_MODELER
acl.adminRole=ROLE_ADMIN
# LDAP authentication configuration
ldap.server=ldap://ldap_server:389
ldap.username=
ldap.password=
# LDAP user account directory
ldap.user.searchBase=
ldap.user.searchPattern=
ldap.user.groupSearchBase=
# LDAP service account directory
ldap.service.searchBase=
ldap.service.searchPattern=
ldap.service.groupSearchBase=
## SAML configurations for SSO
# SAML IDP metadata file location
saml.metadata.file=classpath:sso_metadata.xml
saml.metadata.entityBaseURL=https://hostname/kylin
saml.context.scheme=https
saml.context.serverName=hostname
saml.context.serverPort=443
saml.context.contextPath=/kylin

### MAIL ###
# If true, will send email notification
mail.enabled=false
mail.host=
mail.username=
mail.password=
mail.sender=

### WEB ###
# Help info, format {name|displayName|link}, optional
kylin.web.help.length=4
kylin.web.help.0=start|Getting Started|
kylin.web.help.1=odbc|ODBC Driver|
kylin.web.help.2=tableau|Tableau Guide|
kylin.web.help.3=onboard|Cube Design Tutorial|
# Guide user how to build streaming cube
kylin.web.streaming.guide=http://kylin.apache.org/
# Hadoop url link, optional
kylin.web.hadoop=
# job diagnostic url link, optional
kylin.web.diagnostic=
# contact mail on web page, optional
kylin.web.contact_mail=
crossdomain.enable=true
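
Before starting the server, a quick sanity pass over the settings above can catch typos early; a minimal sketch (which properties to check is just a suggestion):

# print the key settings actually in effect and eyeball them
grep -E '^(kylin.server.mode|kylin.metadata.url|kylin.hbase.cluster.fs|kylin.hdfs.working.dir|kylin.coprocessor.local.jar)' $KYLIN_HOME/conf/kylin.properties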

1. Running ./bin/find-hive-dependency.sh to check whether the Hive environment is configured correctly reports that the HCAT_HOME path cannot be found.

Fix: export HCAT_HOME=$HIVE_HOME/hcatalog

Then rerun the script.
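
For reference, a minimal sketch of the whole fix, assuming the standard Hive layout where hcatalog sits under $HIVE_HOME (persisting it in ~/.bashrc is my own habit, not something the script requires):

export HCAT_HOME=$HIVE_HOME/hcatalog
echo 'export HCAT_HOME=$HIVE_HOME/hcatalog' >> ~/.bashrc   # survive new shells
./bin/find-hive-dependency.sh                              # should now resolve the hcatalog jars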

2. Loading Hive tables from the Kylin web UI fails with "failed to take action".

Fix:

vi ./bin/kylin.sh
Two changes are needed in this script:
1. export KYLIN_HOME=/home/grid/kylin # change this to an absolute path
2. export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:$hive_dependency:$HBASE_CLASSPATH_PREFIX # add $hive_dependency to the path
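
After restarting Kylin, one rough way to confirm both edits took effect is to look for the hive jars on the running server's classpath (the grep pipeline below is illustrative):

./bin/kylin.sh stop && ./bin/kylin.sh start
# split the java process's classpath on ':' and look for hive jars
ps -ef | grep -v grep | grep kylin | tr ':' '\n' | grep -i hive | head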

3. How to add login users to Kylin

The official docs point the way: Kylin uses the Spring Security framework for authentication, so users are added by editing the sandbox/testing profile section of ${KYLIN_HOME}/tomcat/webapps/kylin/WEB-INF/classes/kylinSecurity.xml.


The password needs to be hashed with Spring Security:

Maven dependency:

<dependency>
    <groupId>org.springframework.security</groupId>
    <artifactId>spring-security-core</artifactId>
    <version>4.0.0.RELEASE</version>
</dependency>
// EncodePassword.java -- prints a bcrypt hash to paste into kylinSecurity.xml
public class EncodePassword {
    public static void main(String[] args) {
        String password = "123456";
        org.springframework.security.crypto.password.PasswordEncoder encoder =
                new org.springframework.security.crypto.bcrypt.BCryptPasswordEncoder();
        String encodedPassword = encoder.encode(password);
        System.out.print(encodedPassword);
    }
}
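
With the snippet saved as EncodePassword.java (the class name is my choice, not Kylin's), it can be compiled and run against the spring-security-core jar; the jar paths below are illustrative, and spring-security-core needs commons-logging on the classpath at runtime:

javac -cp spring-security-core-4.0.0.RELEASE.jar EncodePassword.java
java -cp .:spring-security-core-4.0.0.RELEASE.jar:commons-logging-1.2.jar EncodePassword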

4. Building a cube fails with: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

A baffling error: the root cause does not show up in kylin.log, so you have to check the Hive logs (configured via log4j; the default directory is /tmp/$user/). There the cause appears as: Error: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z

It turns out to be a compression-format problem: by default Kylin does not use Hadoop's LZO compression but Snappy.

There are three possible solutions:

1. Redeploy with apache-kylin-1.5.2.1-HBase1.x-bin.tar.gz instead of apache-kylin-1.5.2.1-bin.tar.gz. Since I was running HBase 0.98, this option was ruled out for me.

2. Switch to LZO compression, which is somewhat more involved; see http://kylin.apache.org/docs15/install/advance_settings.html

3. Use no compression in Hive and HBase (cube build times may grow; evaluate this yourself): in conf/kylin.properties and conf/*.xml, find every snappy and compression setting (grep snappy) and remove them all, as sketched below.
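
A minimal sketch of option 3, assuming the stock conf/ layout; the grep only lists candidates, and each match should be reviewed by hand before removal:

# locate every compression setting shipped with kylin
grep -rn -i -E 'snappy|compress' $KYLIN_HOME/conf/kylin.properties $KYLIN_HOME/conf/*.xml
# e.g. comment out the htable codec in kylin.properties (a .bak backup is kept);
# the *.xml matches (mapreduce output compression etc.) must be edited by hand
sed -i.bak 's/^kylin.hbase.default.compression.codec/#&/' $KYLIN_HOME/conf/kylin.properties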

5. Cube build step 3, "Extract Fact Table Distinct Columns", fails with: java.net.ConnectException: Call From master1/192.168.64.11 to localhost:18032 failed on connection exception: java.net.ConnectException: Connection refused

Solution:

This issue ate far too much time. Everything I found online pointed to a YARN port misconfiguration, but the error persisted after I changed yarn-site.xml. I then suspected the Hive metastore server, but changing that made no difference either.

In the end I had no choice but to switch to HBase 1.1.6, together with the matching HBase 1.x build of Kylin. That solved the problem.
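
If you would rather not switch HBase versions, it is still worth pinning down where localhost:18032 comes from first; a couple of quick checks (paths assume a standard Hadoop client config):

# which resourcemanager address do the hadoop clients actually see?
grep -A1 'yarn.resourcemanager' $HADOOP_CONF_DIR/yarn-site.xml
# is anything listening on the port the job tried?
netstat -tlnp 2>/dev/null | grep 18032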

6. After the cube builds successfully, SQL queries fail with "error in coprocessor".

Fix:

This one haunted me for days. I already had kylin.coprocessor.local.jar=/../kylin/lib/kylin-coprocessor-1.5.4.jar configured correctly. The fix circulating online is to set hbase_dependency=<absolute path>/hbase-1.1.6/lib in the find-hbase-dependency.sh script, but that did not seem to help either.

What finally worked was deleting HBase's data on HDFS entirely and restarting HBase. Most likely something went wrong during cube creation; I will investigate another time.
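
Before resorting to wiping HBase data, it may be worth redeploying the coprocessor jar onto the cube HTables; Kylin ships a CLI for this (a sketch; verify the class name and arguments against your Kylin version's docs):

# redeploy the coprocessor to all kylin htables, then retry the query
$KYLIN_HOME/bin/kylin.sh org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI \
    $KYLIN_HOME/lib/kylin-coprocessor-1.5.4.jar all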

7. Notes on Kylin SQL

Having worked through the COUNT DISTINCT problem, we noticed the following differences in Kylin's SQL dialect (a REST query sketch follows the list):

  • LIMIT beg, end is not supported; only LIMIT length

  • UNION and UNION ALL are not supported

  • WHERE EXISTS subqueries are not supported
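
As an illustration of the first point, paging can instead be expressed through Kylin's REST query API, which takes offset and limit as separate JSON fields; the project and table names below are the sample-cube defaults and may differ in your setup:

# equivalent of "LIMIT 10,20": offset + limit as API parameters
curl -s -X POST -u ADMIN:KYLIN -H 'Content-Type: application/json' \
    -d '{"sql":"select part_dt, sum(price) from kylin_sales group by part_dt","project":"learn_kylin","offset":10,"limit":20}' \
    http://192.168.64.16:7070/kylin/api/query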

8. Cleaning up Kylin's intermediate storage

Kylin generates a lot of intermediate data on HDFS while building cubes. In addition, when cubes are built, dropped, or merged, some HBase tables may be left behind that are never queried again, so it is worth running an offline storage cleanup every so often. The steps:

1. Check which resources can be cleaned up; this operation deletes nothing:

${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete false

Time taken: 1.339 seconds
OK
kylin_intermediate_kylin_sales_cube_desc_2b8ea1a6_99f4_4045_b0f5_22372b9ffc60
kylin_intermediate_weibo_cube_0d26f9e5_0935_409a_9a6d_6c1d03773fbd
kylin_intermediate_weibo_cube_1d21fe49_990c_4a34_9267_e693421689f2
Time taken: 0.33 seconds, Fetched: 3 row(s)
------ Intermediate Hive Tables To Be Dropped ------
----------------------------------------------------
2016-10-12 15:24:25,881 INFO  [main CubeManager:132]: Initializing CubeManager with config kylin_metadata@hbase
2016-10-12 15:24:25,897 INFO  [main CubeManager:828]: Loading Cube from folder kylin_metadata(key='/cube')@kylin_metadata@hbase
2016-10-12 15:24:25,952 INFO  [main CubeDescManager:91]: Initializing CubeDescManager with config kylin_metadata@hbase
2016-10-12 15:24:25,952 INFO  [main CubeDescManager:197]: Reloading Cube Metadata from folder kylin_metadata(key='/cube_desc')@kylin_metadata@hbase
2016-10-12 15:24:26,035 DEBUG [main CubeDescManager:222]: Loaded 2 Cube(s)
2016-10-12 15:24:26,038 DEBUG [main CubeManager:870]: Reloaded new cube: userlog_cube with reference being CUBE[name=userlog_cube] having 1 segments:KYLIN_WEK77BKP6M
2016-10-12 15:24:26,040 DEBUG [main CubeManager:870]: Reloaded new cube: weibo_cube with reference being CUBE[name=weibo_cube] having 1 segments:KYLIN_5N8ZRC7Z1F
2016-10-12 15:24:26,040 INFO  [main CubeManager:841]: Loaded 2 cubes, fail on 0 cubes
2016-10-12 15:24:26,218 INFO  [main StorageCleanupJob:218]: Skip /kylin/kylin_metadata/kylin-779df736-75b0-4263-b045-6a49401b4516 from deletion list, as the path belongs to segment userlog_cube[19700101000000_20160930000000] of cube userlog_cube
2016-10-12 15:24:26,218 INFO  [main StorageCleanupJob:218]: Skip /kylin/kylin_metadata/kylin-e9805d06-559a-4c15-ab1e-d6e947460093 from deletion list, as the path belongs to segment weibo_cube[19700101000000_20140430000000] of cube weibo_cube
--------------- HDFS Path To Be Deleted ---------------
/kylin/kylin_metadata/kylin-07e8f9b1-8dfc-4c57-8e5b-e9800392af0d
/kylin/kylin_metadata/kylin-0855f8ed-89a5-4676-a9bb-f8c301ead327
/kylin/kylin_metadata/kylin-0cdef491-d0b7-438d-ba54-091678cb463d
/kylin/kylin_metadata/kylin-121752c8-ab9d-434b-812f-73f766796436
/kylin/kylin_metadata/kylin-12b442a0-0c6d-43e7-830f-2f6e5826f23a
/kylin/kylin_metadata/kylin-5ba7affe-d584-4f6e-85b2-2588e31a985c
/kylin/kylin_metadata/kylin-5e1818bd-4644-4e8e-b332-b5bb59ff9677
/kylin/kylin_metadata/kylin-680f7549-48be-496a-82c5-084434bfee74
/kylin/kylin_metadata/kylin-707d1a65-392e-456f-97ea-d7d553b52950
/kylin/kylin_metadata/kylin-7520fc6e-8b76-43cc-9fb8-bfba969040da
/kylin/kylin_metadata/kylin-75e5b484-4594-4d31-83ce-729a6b3de1c2
/kylin/kylin_metadata/kylin-79535d79-cd36-4711-858c-d8fa28266f7f
/kylin/kylin_metadata/kylin-81eb9119-c806-4003-a6d6-fc43281a8c01
/kylin/kylin_metadata/kylin-839e80d8-d116-4061-80d6-379c85db7114
/kylin/kylin_metadata/kylin-843b185d-ed09-48c7-958c-1ee1e0e2cde5
/kylin/kylin_metadata/kylin-97c0cdc6-c53e-4115-995e-b90f4381d307
/kylin/kylin_metadata/kylin-998aa0aa-279c-44f0-8367-807b9110ae74
/kylin/kylin_metadata/kylin-ad2ad0c7-bee5-46f2-9fc3-e60b10941ffa
/kylin/kylin_metadata/kylin-b5939b9b-2a6e-4acb-aaf7-888a83113ad7
/kylin/kylin_metadata/kylin-b65b555d-90e5-4455-95ce-10b215b00482
/kylin/kylin_metadata/kylin-d5ac36b3-b021-4ac6-87ae-f3a38f90eb06
/kylin/kylin_metadata/kylin-e7a9b0d1-a788-4ddf-88f5-37671eaa7dc3
/kylin/kylin_metadata/kylin-f7094827-00f8-474b-9542-ea001797a148
-------------------------------------------------------
2016-10-12 15:24:26,475 INFO  [main StorageCleanupJob:91]: Exclude table KYLIN_WEK77BKP6M from drop list, as it is newly created
2016-10-12 15:24:26,475 INFO  [main StorageCleanupJob:102]: Exclude table KYLIN_5N8ZRC7Z1F from drop list, as the table belongs to cube weibo_cube with status READY
--------------- Tables To Be Dropped ---------------
-----------------------------------------------------

2. As shown above, the job lists the Hive tables, HDFS paths, and HBase tables that can be deleted (tables that were recently created or queried are filtered out automatically). Check against the output whether those resources really are no longer needed. Once you are sure, rerun the command from step 1 with "--delete false" changed to "--delete true" to perform the actual cleanup:

${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete true
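
Since the check-then-delete cycle is mechanical, it can be scheduled; a possible cron entry (the schedule, the KYLIN_HOME path from section 2, and the log file are all assumptions):

# m h dom mon dow  command -- run the storage cleanup every Sunday at 03:00
0 3 * * 0 /home/grid/kylin/bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete true >> /tmp/kylin-cleanup.log 2>&1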

