Slowly Changing Dimensions: Implementing SCD-2 with Apache Hudi (Part 2)

After launching the spark-shell, we can import the required libraries and create the Hudi table, as shown below.
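The launch command and the imports themselves are not shown in the transcript below. The following is a minimal sketch, assuming the Hudi Spark bundle and the Hudi SQL session extension are used; the exact bundle coordinates and version are assumptions and should match whatever was used in Part 1.

spark-shell \
  --packages org.apache.hudi:hudi-spark-bundle_2.12:0.11.1 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.extensions=org.apache.hudi.spark.sql.HoodieSparkSessionExtension'

scala> import org.apache.spark.sql.SaveMode._
scala> import org.apache.hudi.DataSourceWriteOptions._
scala> import org.apache.hudi.config.HoodieWriteConfig._

The HoodieSparkSessionExtension is what enables the Hudi SQL DDL (create table ... using hudi) used in the session below.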
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.8
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sql("""create table hudi_product_catalog (
     | seller_id int,
     | prod_category string,
     | product_name string,
     | product_package string,
     | discount_percentage string,
     | eff_start_ts timestamp,
     | eff_end_ts timestamp,
     | actv_ind int
     | ) using hudi
     | tblproperties (
     |   type = 'cow',
     |   primaryKey = 'seller_id,prod_category,eff_end_ts',
     |   preCombineField = 'eff_start_ts'
     | )
     | partitioned by (actv_ind)
     | location 'gs://target_bucket/hudi_product_catalog/'""")

Once the data has been written to the bucket, the Hudi target table looks as follows.
+-------------------+---------------------+-------------------------------------------------------------------------+----------------------+--------------------------------------------------------------------------+---------+--------------+---------------+---------------+-------------------+-------------------+-------------------+--------+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key                                                       |_hoodie_partition_path|_hoodie_file_name                                                         |seller_id|prod_category |product_name   |product_package|discount_percentage|eff_start_ts       |eff_end_ts         |actv_ind|
+-------------------+---------------------+-------------------------------------------------------------------------+----------------------+--------------------------------------------------------------------------+---------+--------------+---------------+---------------+-------------------+-------------------+-------------------+--------+
|20220722113258101  |20220722113258101_0_0|seller_id:3412,prod_category:Healthcare,eff_end_ts:253402300799000000    |actv_ind=1            |a94c9c58-ac6b-4841-a734-8ef1580e2547-0_0-29-1219_20220722113258101.parquet|3412     |Healthcare    |Dolo 650       |10             |10                 |2022-04-01 16:30:45|9999-12-31 23:59:59|1       |
|20220722113258101  |20220722113258101_0_1|seller_id:1234,prod_category:Home Essential,eff_end_ts:253402300799000000|actv_ind=1            |a94c9c58-ac6b-4841-a734-8ef1580e2547-0_0-29-1219_20220722113258101.parquet|1234     |Home Essential|Hand Towel     |12             |20                 |2021-10-20 06:55:22|9999-12-31 23:59:59|1       |
|20220722113258101  |20220722113258101_0_2|seller_id:4565,prod_category:Gourmet,eff_end_ts:253402300799000000       |actv_ind=1            |a94c9c58-ac6b-4841-a734-8ef1580e2547-0_0-29-1219_20220722113258101.parquet|4565     |Gourmet       |Dairy Milk Silk|6              |30                 |2021-06-12 20:30:40|9999-12-31 23:59:59|1       |
|20220722113258101  |20220722113258101_0_3|seller_id:1234,prod_category:Detergent,eff_end_ts:253402300799000000     |actv_ind=1            |a94c9c58-ac6b-4841-a734-8ef1580e2547-0_0-29-1219_20220722113258101.parquet|1234     |Detergent     |Tide 2L        |6              |15                 |2021-12-15 15:20:30|9999-12-31 23:59:59|1       |
+-------------------+---------------------+-------------------------------------------------------------------------+----------------------+--------------------------------------------------------------------------+---------+--------------+---------------+---------------+-------------------+-------------------+-------------------+--------+
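The output above is simply the result of a select * against the new table. How the seed rows were loaded is not shown here; the following is a minimal sketch using plain Spark SQL INSERT statements (the load in Part 1 may have used the DataFrame writer instead), with the literal values copied from the first row of the output above.

// Hypothetical seed load; the partition column actv_ind comes last,
// matching the column order of the create table statement.
spark.sql("""
  insert into hudi_product_catalog
  select 3412, 'Healthcare', 'Dolo 650', '10', '10',
         cast('2022-04-01 16:30:45' as timestamp),  -- eff_start_ts
         cast('9999-12-31 23:59:59' as timestamp),  -- eff_end_ts: open-ended "current" row
         1                                          -- actv_ind: active record
""")

// Render the table contents, producing output like the block above.
spark.sql("select * from hudi_product_catalog").show(false)

Note that the _hoodie_record_key column reflects the composite primary key declared in the DDL (seller_id, prod_category, eff_end_ts), and _hoodie_partition_path reflects the actv_ind partition; this is what the SCD-2 merge logic in the rest of this series relies on.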
