渐变维度 使用 Apache Hudi 实现 SCD-2( 四 )


//Prepare an update DF from incremental temp DF, select columns from both the tablesval updDf = stgDf.as("stg")      .join(tgtHudiDf.as("tgt"),        col("stg.seller_id") === col("tgt.seller_id") &&          col("stg.prod_category") === col("tgt.prod_category"),"inner")          .where(col("stg.eff_start_ts") > col("tgt.eff_start_ts")).select((stgDf.columns.map(c => stgDf(c).as(s"stg_$c"))++ tgtHudiDf.columns.map(c => tgtHudiDf(c).as(s"tgt_$c"))):_*)updDf.show(false)+-------------+-----------------+-----------------+-------------------+-----------------------+-------------------+-----------------------+------------------------+----------------------+--------------------------+---------------------+-------------+-----------------+----------------+-------------------+-----------------------+-------------------+-------------------+------------+|stg_seller_id|stg_prod_category| stg_product_name|stg_product_package|stg_discount_percentage|   stg_eff_start_ts|tgt__hoodie_commit_time|tgt__hoodie_commit_seqno|tgt__hoodie_record_key|tgt__hoodie_partition_path|tgt__hoodie_file_name|tgt_seller_id|tgt_prod_category|tgt_product_name|tgt_product_package|tgt_discount_percentage|   tgt_eff_start_ts|     tgt_eff_end_ts|tgt_actv_ind|+-------------+-----------------+-----------------+-------------------+-----------------------+-------------------+-----------------------+------------------------+----------------------+--------------------------+---------------------+-------------+-----------------+----------------+-------------------+-----------------------+-------------------+-------------------+------------+|         1234|        Detergent|          Tide 5L|                  6|                     25|2022-01-31 10:00:30|      20220710113622931|    20220710113622931...|  seller_id:1234,pr...|                actv_ind=1| 2dd6109f-2173-429...|         1234|        Detergent|         Tide 2L|                  6|                     15|2021-12-15 15:20:30|9999-12-31 23:59:59|           1||         4565|          Gourmet|Dairy Milk Almond|                 12|                     45|2022-06-12 20:30:40|      20220710113622931|    20220710113622931...|  seller_id:4565,pr...|                actv_ind=1| 2dd6109f-2173-429...|         4565|          Gourmet| Dairy Milk Silk|                  6|                     30|2021-06-12 20:30:40|9999-12-31 23:59:59|           1|+-------------+-----------------+-----------------+-------------------+-----------------------+-------------------+-----------------------+------------------------+----------------------+--------------------------+---------------------+-------------+-----------------+----------------+-------------------+-----------------------+-------------------+-------------------+------------+

经验总结扩展阅读