千家信息网

Writing ORC-format files with Spark

Published: 2024-11-11  Author: 千家信息网 editor
千家信息网 last updated: November 11, 2024
  1. Create a table in Hive with ORC as its storage format

    create table `user`(id int, name string) stored as orc;
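If you would rather issue the same DDL from Spark, here is a minimal sketch. It assumes Spark 2.x or later with Hive support (the spark-hive module on the classpath). The object name `CreateOrcTable` and the app name are hypothetical. The table name is backquoted because USER is a reserved keyword in Hive 1.2 and later.

```scala
import org.apache.spark.sql.SparkSession

object CreateOrcTable {
  // Same table definition as the Hive DDL; `user` is backquoted because USER is reserved in Hive 1.2+
  val ddl = "CREATE TABLE IF NOT EXISTS `user` (id INT, name STRING) STORED AS ORC"

  def main(args: Array[String]): Unit = {
    // enableHiveSupport() needs the spark-hive module; with no hive-site.xml configured,
    // Spark falls back to a local Derby metastore in the working directory
    val spark = SparkSession.builder()
      .appName("create-orc-table") // hypothetical app name
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql(ddl)

    // DESCRIBE FORMATTED lists the table's InputFormat, OutputFormat and SerDe,
    // which should be the ORC classes for a table created STORED AS ORC
    spark.sql("DESCRIBE FORMATTED `user`").show(100, truncate = false)

    spark.stop()
  }
}
```

Running the DDL through `spark.sql` keeps table creation and the later write in one job, at the cost of needing Hive support in the Spark build.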

  2. Write the data as ORC files from Spark

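    // Requires json-lib (net.sf.json) on the classpath; sc is an existing SparkContext, e.g. from spark-shell
    import net.sf.json.JSONObject
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    // Input file: one JSON object per line with "id" and "name" fields
    val jsons = "hdfs://localhost:9000/test/artist_orc.json"
    val people = sc.textFile(jsons)

    // Build the schema: "name" is a string, every other field is an int
    val schemaString = "id name"
    val schema = StructType(schemaString.split(" ").map { fieldName =>
      if (fieldName == "name") StructField(fieldName, StringType, true)
      else StructField(fieldName, IntegerType, true)
    })

    // Parse each line with json-lib and turn it into a Row matching the schema
    val rowRDD = people.map { line =>
      val p = JSONObject.fromObject(line)
      Row(p.get("id").toString.toInt, p.get("name").toString)
    }

    // In Spark 1.x the ORC data source ships in the spark-hive module, hence HiveContext
    val hiveContext = new HiveContext(sc)
    val peopleSchemaRDD = hiveContext.createDataFrame(rowRDD, schema)
    peopleSchemaRDD.write.format("orc").save("hdfs://localhost:9000/user/xb/warehouse/artist_orc/adf")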
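On Spark 2.4 or later, the same write can be done without HiveContext or json-lib: `SparkSession` reads JSON lines directly, and `DataFrameWriter.orc` uses Spark's built-in (native) ORC support. The sketch below swaps json-lib for Spark's own JSON reader, runs in local mode, and reads the files back to check the round trip. The object name `OrcRoundTrip`, the method `writeAndReadBack`, and the sample records are hypothetical.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object OrcRoundTrip {
  // Same schema as the Hive table: id is an int, name is a string
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = true),
    StructField("name", StringType, nullable = true)
  ))

  /** Writes the JSON lines to outDir as ORC, then returns the rows read back, sorted by id. */
  def writeAndReadBack(spark: SparkSession, jsonLines: Seq[String], outDir: String): Array[Row] = {
    import spark.implicits._
    // Spark's JSON reader with an explicit schema replaces the manual json-lib parsing
    val df = spark.read.schema(schema).json(jsonLines.toDS())
    // Overwrite so the sketch can be re-run against the same directory
    df.write.mode("overwrite").orc(outDir)
    spark.read.orc(outDir).orderBy("id").collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("orc-round-trip").master("local[*]").getOrCreate()
    // A fresh local temp directory stands in for the HDFS warehouse path
    val out = java.nio.file.Files.createTempDirectory("artist_orc").resolve("out").toString
    val sample = Seq("""{"id": 1, "name": "a"}""", """{"id": 2, "name": "b"}""")
    writeAndReadBack(spark, sample, out).foreach(println)
    spark.stop()
  }
}
```

Passing the schema explicitly keeps `id` as an int; with schema inference, Spark's JSON reader would type it as a long, which would not match the Hive column.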

