
Troubleshooting with the Impala query profile

Created: 2021-01-05


The problem I ran into: a join between two tables of about 400,000 rows each kept failing in Impala with an out-of-memory error. Setting mem_limit=180G did not help; even with a very large mem_limit, this small query still failed. The fix in the end was adding --enable_partitioned_aggregation=false and --enable_partitioned_hash_join=false to the impalad startup options (see the official documentation for these flags), as shown below.

vim /etc/default/impala
#other contents omitted
IMPALA_SERVER_ARGS=" \
    -log_dir=${IMPALA_LOG_DIR} \
    -default_pool_max_requests=-1 \
    --enable_partitioned_aggregation=false \
    --enable_partitioned_hash_join=false \
#other contents omitted
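After editing the startup options, impalad has to be restarted before the new flags take effect. A minimal sketch for verifying they were picked up, assuming the stock CDH package service name and the default impalad debug web UI on port 25000:

# restart impalad so the new startup flags are read
sudo service impala-server restart

# the impalad debug web UI lists the effective flags on its /varz page
curl -s http://localhost:25000/varz | grep -i partitioned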

Three commands are useful here. EXPLAIN <sql statement>; analyzes the execution plan without actually running the SQL. SUMMARY; run immediately after a statement finishes, shows the time and resources that statement actually consumed together with Impala's estimates (the estimates can differ quite a bit from what really happened). PROFILE; prints even more detailed execution information.
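A minimal sketch of that workflow from the command line (host and table names are illustrative, and it assumes this impala-shell version accepts several semicolon-separated commands with -q; SUMMARY and PROFILE always describe the most recent query in the same session):

# EXPLAIN only plans the statement; nothing is executed
impala-shell -i hd:21000 -q "EXPLAIN SELECT COUNT(*) FROM SAPSR3.AFPO;"

# run the statement, then ask for its SUMMARY and PROFILE in the same session
impala-shell -i hd:21000 -q "SELECT COUNT(*) FROM SAPSR3.AFPO; SUMMARY; PROFILE;"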

[hd:21000] >  SELECT  * FROM SAPSR3.AFPO AP LEFT JOIN  SAPSR3.AFKO AK ON AK.AUFNR=AP.AUFNR AND AK.MANDT = '800';
Query: select * FROM SAPSR3.AFPO AP LEFT JOIN  SAPSR3.AFKO AK ON AK.AUFNR=AP.AUFNR AND AK.MANDT = '800'
WARNINGS: 
Memory limit exceeded
Query did not have enough memory to get the minimum required buffers in the block manager.



Memory Limit Exceeded
Query(6b423c1ab18095e8:d63218157018ca9b) Limit: Limit=180.00 GB Consumption=235.46 MB
  Fragment 6b423c1ab18095e8:d63218157018ca9d: Consumption=25.24 MB
    HASH_JOIN_NODE (id=2): Consumption=10.01 MB
    EXCHANGE_NODE (id=3): Consumption=0
    DataStreamRecvr: Consumption=15.21 MB
    EXCHANGE_NODE (id=4): Consumption=0
    DataStreamSender: Consumption=64.00 B
  Block Manager: Limit=144.00 GB Consumption=0
  Fragment 6b423c1ab18095e8:d63218157018caa7: Consumption=68.30 MB
    HDFS_SCAN_NODE (id=0): Consumption=66.79 MB
    DataStreamSender: Consumption=104.52 KB
  Fragment 6b423c1ab18095e8:d63218157018caa9: Consumption=141.88 MB
    HDFS_SCAN_NODE (id=1): Consumption=139.23 MB
    DataStreamSender: Consumption=96.28 KB
Memory Limit Exceeded
Query(6b423c1ab18095e8:d63218157018ca9b) Limit: Limit=180.00 GB Consumption=167.97 MB
  Fragment 6b423c1ab18095e8:d63218157018caa0: Consumption=24.82 MB
    HASH_JOIN_NODE (id=2): Consumption=10.01 MB
    EXCHANGE_NODE (id=3): Consumption=0
    DataStreamRecvr: Consumption=14.79 MB
    EXCHANGE_NODE (id=4): Consumption=0
    DataStreamSender: Consumption=64.00 B
  Block Manager: Limit=144.00 GB Consumption=0
  Fragment 6b423c1ab18095e8:d63218157018caa3: Consumption=58.79 MB
    HDFS_SCAN_NODE (id=0): Consumption=57.25 MB
    DataStreamSender: Consumption=120.52 KB
  Fragment 6b423c1ab18095e8:d63218157018caac: Consumption=84.39 MB
    HDFS_SCAN_NODE (id=1): Consumption=81.73 MB
    DataStreamSender: Consumption=120.28 KB

Running PROFILE

[hdpprdd02.example:21000] > PROFILE;
Query Runtime Profile:
Query (id=6b423c1ab18095e8:d63218157018ca9b):
  Summary:
    Session ID: 464ad797bf15d6ab:dd136f8c4ff2fab4
    Session Type: BEESWAX
    Start Time: 2021-01-05 09:57:49.569631000
    End Time: 2021-01-05 09:57:53.640374000
    Query Type: QUERY
    Query State: EXCEPTION
    Query Status: 
Memory limit exceeded
Query did not have enough memory to get the minimum required buffers in the block manager.

    Impala Version: impalad version 2.4.0-cdh5.6.0 RELEASE (build 386c7fffa54a2b846dfb00a2ff7bf023dc2dd684)
    User: hive@example
    Connected User: hive@example
    Delegated User: 
    Network Address: ::ffff:172.16.106.104:50118
    Default Db: default
    Sql Statement: select * FROM SAPSR3.AFPO AP LEFT JOIN  SAPSR3.AFKO AK ON AK.AUFNR=AP.AUFNR AND AK.MANDT = '800'
    Coordinator: hdpprdd02.example:22000
    Plan: 
----------------
Estimated Per-Host Requirements: Memory=2.37GB VCores=2

F03:PLAN FRAGMENT [UNPARTITIONED]
  05:EXCHANGE [UNPARTITIONED]
     hosts=6 per-host-mem=unavailable
     tuple-ids=1N,0 row-size=4.74KB cardinality=414944

F02:PLAN FRAGMENT [HASH(AK.AUFNR)]
  DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=05, UNPARTITIONED]
  02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
  |  hash predicates: AK.AUFNR = AP.AUFNR
  |  hosts=6 per-host-mem=120.09MB
  |  tuple-ids=1N,0 row-size=4.74KB cardinality=414944
  |
  |--04:EXCHANGE [HASH(AP.AUFNR)]
  |     hosts=6 per-host-mem=0B
  |     tuple-ids=0 row-size=1.64KB cardinality=408787
  |
  03:EXCHANGE [HASH(AK.AUFNR)]
     hosts=6 per-host-mem=0B
     tuple-ids=1 row-size=3.09KB cardinality=408788

F01:PLAN FRAGMENT [RANDOM]
  DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, HASH(AP.AUFNR)]
  00:SCAN HDFS [sapsr3.afpo ap, RANDOM]
     partitions=1/1 files=10 size=94.21MB
     table stats: 408787 rows total
     column stats: all
     hosts=6 per-host-mem=2.11GB
     tuple-ids=0 row-size=1.64KB cardinality=408787

F00:PLAN FRAGMENT [RANDOM]
  DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=03, HASH(AK.AUFNR)]
  01:SCAN HDFS [sapsr3.afko ak, RANDOM]
     partitions=1/1 files=20 size=139.05MB
     predicates: AK.MANDT = '800'
     table stats: 408788 rows total
     column stats: all
     hosts=6 per-host-mem=2.25GB
     tuple-ids=1 row-size=3.09KB cardinality=408788
----------------
    Estimated Per-Host Mem: 2541843314
    Estimated Per-Host VCores: 2
    Request Pool: default-pool
    ExecSummary: 
Operator          #Hosts   Avg Time   Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail                        
--------------------------------------------------------------------------------------------------------------------------
05:EXCHANGE            1  205.257ms  205.257ms      0     414.94K         0        -1.00 B  UNPARTITIONED                 
02:HASH JOIN           0        0ns        0ns      0     414.94K         0      120.09 MB  RIGHT OUTER JOIN, PARTITIONED 
|--04:EXCHANGE         0        0ns        0ns      0     408.79K         0              0  HASH(AP.AUFNR)                
|  00:SCAN HDFS        0        0ns        0ns      0     408.79K         0        2.11 GB  sapsr3.afpo ap                
03:EXCHANGE            0        0ns        0ns      0     408.79K         0              0  HASH(AK.AUFNR)                
01:SCAN HDFS           0        0ns        0ns      0     408.79K         0        2.25 GB  sapsr3.afko ak                
    Planner Timeline: 42.419ms
       - Analysis finished: 6.995ms (6.995ms)
       - Equivalence classes computed: 10.66ms (3.70ms)
       - Single node plan created: 25.233ms (15.167ms)
       - Distributed plan created: 35.728ms (10.495ms)
       - Planning finished: 42.419ms (6.690ms)
    Query Timeline: 4s071ms
       - Start execution: 59.641us (59.641us)
       - Planning finished: 3s553ms (3s553ms)
       - Ready to start remote fragments: 3s556ms (2.454ms)
       - Remote fragments started: 3s767ms (211.182ms)
       - Rows available: 3s976ms (209.99ms)
       - Unregister query: 4s070ms (94.268ms)
  ImpalaServer:
     - ClientFetchWaitTimer: 0ns
     - RowMaterializationTimer: 0ns
  Execution Profile 6b423c1ab18095e8:d63218157018ca9b:(Total: 421.63ms, non-child: 0ns, % non-child: 0.00%)
    Fragment start latencies: count: 18, last: 0.016319ns, min: 0.014551ns, max: 0.173539ns, mean: 0.065316ns, stddev: 0.070056ns
    Per Node Peak Memory Usage: hdpprdd06.example:22000(0) hdpprdd03.example:22000(0) hdpprdd01.example:22000(0) hdpprdd05.example:22000(0) hdpprdd04.example:22000(0) hdpprdd02.example:22000(0) 
     - FinalizationTimer: 0ns
    Coordinator Fragment F03:(Total: 205.808ms, non-child: 551.47us, % non-child: 0.27%)
      MemoryUsage(500.0ms): 16.00 KB
      ThreadUsage(500.0ms): 1
       - AverageThreadTokens: 1.00 
       - PeakMemoryUsage: 16.00 KB (16384)
       - PerHostPeakMemUsage: 0
       - PrepareTime: 42.569us
       - RowsProduced: 0 (0)
       - TotalCpuTime: 0ns
       - TotalNetworkReceiveTime: 205.256ms
       - TotalNetworkSendTime: 0ns
       - TotalStorageWaitTime: 0ns
      BlockMgr:
         - BlockWritesOutstanding: 0 (0)
         - BlocksCreated: 32 (32)
         - BlocksRecycled: 0 (0)
         - BufferedPins: 0 (0)
         - BytesWritten: 0
         - MaxBlockSize: 8.00 MB (8388608)
         - MemoryLimit: 144.00 GB (154618822656)
         - PeakMemoryUsage: 0
         - TotalBufferWaitTime: 0ns
         - TotalEncryptionTime: 0ns
         - TotalIntegrityCheckTime: 0ns
         - TotalReadBlockTime: 0ns
      EXCHANGE_NODE (id=5):(Total: 205.257ms, non-child: 205.257ms, % non-child: 100.00%)
        BytesReceived(500.0ms): 0
         - BytesReceived: 0
         - ConvertRowBatchTime: 0ns
         - DeserializeRowBatchTimer: 0ns
         - FirstBatchArrivalWaitTime: 0ns
         - PeakMemoryUsage: 0
         - RowsReturned: 0 (0)
         - RowsReturnedRate: 0
         - SendersBlockedTimer: 0ns
         - SendersBlockedTotalTimer(*): 0ns
    Averaged Fragment F02:
      split sizes:  min: 0, max: 0, avg: 0, stddev: 0
      completion times: min:245.166ms  max:262.887ms  mean: 254.675ms  stddev:5.733ms
      execution rates: min:0.00 /sec  max:0.00 /sec  mean:0.00 /sec  stddev:0.00 /sec
      num instances: 6
    Averaged Fragment F01:
      split sizes:  min: 11.02 MB, max: 21.61 MB, avg: 15.70 MB, stddev: 4.40 MB
      completion times: min:226.494ms  max:229.616ms  mean: 228.095ms  stddev:1.110ms
      execution rates: min:48.15 MB/sec  max:94.40 MB/sec  mean:68.83 MB/sec  stddev:19.26 MB/sec
      num instances: 6
    Averaged Fragment F00:
      split sizes:  min: 21.55 MB, max: 29.46 MB, avg: 23.17 MB, stddev: 2.83 MB
      completion times: min:209.39ms  max:210.423ms  mean: 209.729ms  stddev:533.480us
      execution rates: min:102.42 MB/sec  max:140.05 MB/sec  mean:110.48 MB/sec  stddev:13.32 MB/sec
      num instances: 6
    Fragment F02:
      Instance 6b423c1ab18095e8:d63218157018ca9d (host=hdpprdd04.example:22000):
      Instance 6b423c1ab18095e8:d63218157018ca9e (host=hdpprdd05.example:22000):
      Instance 6b423c1ab18095e8:d63218157018ca9f (host=hdpprdd01.example:22000):
      Instance 6b423c1ab18095e8:d63218157018caa0 (host=hdpprdd03.example:22000):
      Instance 6b423c1ab18095e8:d63218157018caa1 (host=hdpprdd06.example:22000):
      Instance 6b423c1ab18095e8:d63218157018caa2 (host=hdpprdd02.example:22000):
    Fragment F01:
      Instance 6b423c1ab18095e8:d63218157018caa3 (host=hdpprdd03.example:22000):
      Instance 6b423c1ab18095e8:d63218157018caa4 (host=hdpprdd01.example:22000):
      Instance 6b423c1ab18095e8:d63218157018caa5 (host=hdpprdd05.example:22000):
      Instance 6b423c1ab18095e8:d63218157018caa6 (host=hdpprdd02.example:22000):
      Instance 6b423c1ab18095e8:d63218157018caa7 (host=hdpprdd04.example:22000):
      Instance 6b423c1ab18095e8:d63218157018caa8 (host=hdpprdd06.example:22000):
    Fragment F00:
      Instance 6b423c1ab18095e8:d63218157018caa9 (host=hdpprdd04.example:22000):
      Instance 6b423c1ab18095e8:d63218157018caaa (host=hdpprdd05.example:22000):
      Instance 6b423c1ab18095e8:d63218157018caab (host=hdpprdd01.example:22000):
      Instance 6b423c1ab18095e8:d63218157018caac (host=hdpprdd03.example:22000):
      Instance 6b423c1ab18095e8:d63218157018caad (host=hdpprdd06.example:22000):
      Instance 6b423c1ab18095e8:d63218157018caae (host=hdpprdd02.example:22000):
  1. Query ID: Query (id=6b423c1ab18095e8:d63218157018ca9b)
  2. Session Type: BEESWAX means the query came from impala-shell; queries submitted from Hue show up as HIVESERVER2.
  3. Query start and end times: if a simple query run from Hue seems to take several seconds, keep in mind that Hue holds the session open until it is closed or the user runs another query, so the time shown here can be longer than the real runtime. When the query is run through impala-shell, the start and end times should match the actual runtime, because impala-shell closes the query handle as soon as the query completes.
    Start Time: 2021-01-05 09:57:49.569631000
    End Time: 2021-01-05 09:57:53.640374000
  4. Query State: EXCEPTION. OK means the query completed successfully; otherwise the error is usually shown here, e.g. cancelled by user, session timeout, or an exception.
  5. Impala Version: impalad version 2.4.0-cdh5.6.0, the Impala version.
  6. User information:
    User: hive@example
    Connected User: hive@example
    Delegated User: 
  7. Default Db: default, the database in use.
  8. Sql Statement: the SQL text of the query.
  9. Coordinator: hdpprdd02.example:22000. This shows which Impala daemon ran the query, usually called the coordinator; it tells us on which host to look for the INFO, WARNING and ERROR level logs.
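When the profile has been saved to a text file, these header fields can be pulled out in one go before reading anything else (a sketch; profile.txt is a placeholder file name):

grep -E 'Start Time|End Time|Query State|Query Status|Impala Version|Coordinator|Sql Statement' profile.txt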

Plan: the query plan information

  1. Estimated Per-Host Requirements: Memory=2.37GB VCores=2, the resources the query is estimated to need on each host.

  2. The query plan details. The query plan is one of the most important parts of an Impala profile, and we need to know how to read it, because it tells us how tables are scanned, how data is exchanged between nodes, and how tables are joined to produce the final result. For a complex query the plan can become very complex as well, so let's start with this simple query to cover the basics. One thing to remember is that the plan has to be read from the bottom up to follow Impala's execution order.

SCAN HDFS: execution usually starts with the HDFS scans; the operator numbers (00 through 05 here) give the execution order.

#F00: an execution fragment
F00:PLAN FRAGMENT [RANDOM]
    #DATASTREAM SINK: where this fragment's output is sent next, exchange 03 in fragment F02
  DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=03, HASH(AK.AUFNR)]
  #the table being scanned: sapsr3.afko
  01:SCAN HDFS [sapsr3.afko ak, RANDOM]
    #partitions=1/1: the table has one partition and Impala reads that one partition; this does not necessarily mean the table is partitioned, an unpartitioned table also shows 1/1
    #the table/partition contains 20 files (files=20) and the total data read is 139.05MB
     partitions=1/1 files=20 size=139.05MB
     predicates: AK.MANDT = '800'
     #table stats: 408788 rows in total
     table stats: 408788 rows total
     column stats: all
     #the estimated per-host memory needed to run this scan is 2.25GB
     hosts=6 per-host-mem=2.25GB
     tuple-ids=1 row-size=3.09KB cardinality=408788
#F01 scans table sapsr3.afpo; its output goes to FRAGMENT=F02, EXCHANGE=04
F01:PLAN FRAGMENT [RANDOM]
  DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, HASH(AP.AUFNR)]
  00:SCAN HDFS [sapsr3.afpo ap, RANDOM]
     partitions=1/1 files=10 size=94.21MB
     table stats: 408787 rows total
     column stats: all
     hosts=6 per-host-mem=2.11GB
     tuple-ids=0 row-size=1.64KB cardinality=408787
#F02 contains 02:HASH JOIN, which consumes the exchange operations 03 and 04; once the join is done on each worker node, the results are sent through exchange 05 to the coordinator fragment F03
F02:PLAN FRAGMENT [HASH(AK.AUFNR)]
  DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=05, UNPARTITIONED]
  02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
  |  hash predicates: AK.AUFNR = AP.AUFNR
  |  hosts=6 per-host-mem=120.09MB
  |  tuple-ids=1N,0 row-size=4.74KB cardinality=414944
  |
  |--04:EXCHANGE [HASH(AP.AUFNR)]
  |     hosts=6 per-host-mem=0B
  |     tuple-ids=0 row-size=1.64KB cardinality=408787
  |
  03:EXCHANGE [HASH(AK.AUFNR)]
     hosts=6 per-host-mem=0B
     tuple-ids=1 row-size=3.09KB cardinality=408788
#F03: exchange operation 05, which collects the final result on the coordinator
F03:PLAN FRAGMENT [UNPARTITIONED]
  05:EXCHANGE [UNPARTITIONED]
     hosts=6 per-host-mem=unavailable
     tuple-ids=1N,0 row-size=4.74KB cardinality=414944
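The table stats and column stats lines above come from previously computed statistics. If a plan reports missing stats, the cardinality and per-host memory estimates become unreliable, and recomputing them is usually the first thing to try; a sketch using the two tables from this query:

# refresh the row counts and column statistics the planner relies on
impala-shell -i hd:21000 -q "COMPUTE STATS sapsr3.afko; COMPUTE STATS sapsr3.afpo;"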

The ExecSummary section

----------------
    Estimated Per-Host Mem: 2541843314
    Estimated Per-Host VCores: 2
    Request Pool: default-pool
    ExecSummary: 
Operator          #Hosts   Avg Time   Max Time  #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail                        
--------------------------------------------------------------------------------------------------------------------------
05:EXCHANGE            1  205.257ms  205.257ms      0     414.94K         0        -1.00 B  UNPARTITIONED                 
02:HASH JOIN           0        0ns        0ns      0     414.94K         0      120.09 MB  RIGHT OUTER JOIN, PARTITIONED 
|--04:EXCHANGE         0        0ns        0ns      0     408.79K         0              0  HASH(AP.AUFNR)                
|  00:SCAN HDFS        0        0ns        0ns      0     408.79K         0        2.11 GB  sapsr3.afpo ap                
03:EXCHANGE            0        0ns        0ns      0     408.79K         0              0  HASH(AK.AUFNR)                
01:SCAN HDFS           0        0ns        0ns      0     408.79K         0        2.25 GB  sapsr3.afko ak    

Planner Timeline and Query Timeline

    Planner Timeline: 42.419ms
       - Analysis finished: 6.995ms (6.995ms)
       - Equivalence classes computed: 10.66ms (3.70ms)
       - Single node plan created: 25.233ms (15.167ms)
       - Distributed plan created: 35.728ms (10.495ms)
       - Planning finished: 42.419ms (6.690ms)
    Query Timeline: 4s071ms
       - Start execution: 59.641us (59.641us)
       #about 3.5 seconds of the roughly 4 second total elapsed before planning finished
       - Planning finished: 3s553ms (3s553ms)
       - Ready to start remote fragments: 3s556ms (2.454ms)
       - Remote fragments started: 3s767ms (211.182ms)
       - Rows available: 3s976ms (209.99ms)
       - Unregister query: 4s070ms (94.268ms)

To summarize the error above: with SELECT *, the query hit Memory Limit Exceeded no matter how much memory it was given, while selecting a single column ran fine. Certain columns triggered the failure whenever they appeared in the SELECT list, and the scans reported 0 rows, so the problem came from the data read by the HDFS scan.

Because my query finished quickly, its timeline is not very interesting; let's look at another Impala query profile, this time from a real production system:

    Query Compilation: 16.268ms
       - Metadata of all 1 tables cached: 1.786ms (1.786ms)
       - Analysis finished: 6.162ms (4.376ms)
       - Value transfer graph computed: 6.537ms (374.918us)
       - Single node plan created: 7.955ms (1.417ms)
       - Runtime filters computed: 8.274ms (318.815us)
       - Distributed plan created: 8.430ms (156.307us)
       - Lineage info computed: 9.664ms (1.234ms)
       - Planning finished: 16.268ms (6.603ms)
    Query Timeline: 35m46s
       - Query submitted: 0.000ns (0.000ns)
       - Planning finished: 22.001ms (22.001ms)
       - Submit for admission: 23.001ms (1.000ms)
       - Completed admission: 23.001ms (0.000ns)
       - Ready to start on 2 backends: 24.001ms (1.000ms)
       - All 2 execution backends (2 fragment instances) started: 36.001ms (12.000ms)
       - Rows available: 5m51s (5m51s)
       - First row fetched: 5m52s (950.045ms)
       - Last row fetched: 35m46s (29m53s)
       - Released admission control resources: 35m46s (1.000ms)
       - Unregister query: 35m46s (30.001ms)
     - ComputeScanRangeAssignmentTimer: 0.000ns

This comes from a real case where an Impala query ran for a very long time and the customer wanted to know why. From the Query Timeline we can clearly see that it took almost 6 minutes (5m51s) from the start of execution (All 2 execution backends started) until data became available (Rows available). Those 6 minutes of execution may well be normal; with joins over large data sets it is common for a query to run for several minutes.

However, we can also see that Impala then spent 30 minutes delivering the data back to the client: the first row was fetched around the 6 minute mark, while the last row was not fetched until minute 36. From this we can suspect a network problem between the Impala coordinator and the client (when the client, e.g. impala-shell or Hue, fetches data from the coordinator host). Another possibility is that the client was doing other work while fetching, such as printing to the screen; since the returned data can be large, that alone can be time consuming. Either way, this part of the profile points us in the right direction to look for the bottleneck.

The rest of this analysis follows part 3 of the Impala profile series, which explains how to relate the operator numbers shown in the query plan to the last section of the profile, where detailed metrics are given for each operator, both averaged and per individual host.
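On the client-side possibility above: a cheap way to separate "Impala is slow" from "the client is slow at fetching and rendering rows" is to rerun the statement with delimited output written straight to a file, which bypasses terminal pretty-printing (a sketch; the host and query are placeholders):

# -B prints plain delimited rows; -o writes them to a file instead of the screen,
# so the elapsed time is dominated by Impala and the network rather than rendering
impala-shell -i <coordinator-host>:21000 -B -o /tmp/result.tsv -q "<the slow query>"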

A troubleshooting case

1. The user reported that the query was taking longer than usual, so the first thing I wanted to know was exactly how long it took. I started by checking the query start and end times at the top of the profile:

Start Time: 2020-01-03 07:33:42.928171000
End Time: 2020-01-03 08:47:55.745537000

The query now took 1 hour 14 minutes to complete, which matched the user's report.
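A quick sanity check of that elapsed time from the two timestamps (a sketch using GNU date):

# 2020-01-03 07:33:42 -> 2020-01-03 08:47:55
echo $(( ( $(date -d '2020-01-03 08:47:55' +%s) - $(date -d '2020-01-03 07:33:42' +%s) ) / 60 )) minutes
# prints 74 minutes, i.e. about 1h14m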

2. The query ended with an exception because the user cancelled it:

Query State: EXCEPTION
Query Status: Cancelled

The user simply ran out of patience and cancelled the query because it had been running for too long, so there is nothing alarming here by itself.

3. Next, I checked how complex the query was:

    Sql Statement: 
SELECT COUNT(*),NVL(SUM(NVL(NUMBER_OF_CALLS,0)),0) 
FROM cef_hpmn 
WHERE settlement_month IN (
    SELECT DISTINCT dd.month_id 
    FROM  ext_call_event_fact cef, date_dim dd 
    WHERE CAST(settlement_date_id AS bigint) = dd.date_id
) 
AND process_date = TO_TIMESTAMP ('01/02/2020', 'MM/dd/yyyy')

I reformatted it for readability. The query is quite simple: a straightforward SELECT with a subquery in the WHERE clause.

4. The next thing I looked at was the hostname of the coordinator node:

Coordinator: xxxxx-xxx-cdh-cdn003.xxx.XXXXXX.com:22000

If I could not reach any conclusion from the profile alone, the next step would be to collect the impalad logs from this host. For now I carried on, since I did not have the logs yet.
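If it does come to the logs, the coordinator's impalad log can be narrowed down by the query ID taken from the profile (a sketch; the log path assumes a default CDH package install, and the ID is the prefix of the fragment instance IDs shown later in this profile):

# run on the coordinator host shown above
grep '94481a81355e51e4' /var/log/impalad/impalad.INFO | less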

5. As mentioned in the previous part of this series, the next section of the profile is the query plan. I usually skip it at first and jump straight to the Summary section to see how long each operation took and whether anything obvious stands out:

Operator             #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail                             
-----------------------------------------------------------------------------------------------------------
12:AGGREGATE              1    0.000ns    0.000ns        0           1   16.00 KB       10.00 MB  FINALIZE
11:EXCHANGE               1    0.000ns    0.000ns       28           1  232.00 KB              0  UNPA...
06:AGGREGATE             29   61.358ms  110.536ms       28           1   13.60 MB       10.00 MB  
05:HASH JOIN             29  223.055ms  345.295ms  277.41M     116.66M    2.00 MB        1.94 MB  LEFT... 
|--10:EXCHANGE           29   29.977us   67.789us        2         261   24.00 KB              0  BROADCAST
|  09:AGGREGATE          29    1.569ms    2.903ms        2         261    1.96 MB       10.00 MB  FINALIZE
|  08:EXCHANGE           29   33.880us  419.580us       58         261  240.00 KB              0  HASH(..
|  04:AGGREGATE          29    9s751ms      3m53s       58         261    4.58 MB       10.00 MB  STREAMING
|  03:HASH JOIN          29    1s099ms    2s635ms  411.27M     411.27M    3.35 MB        1.94 MB  INNER ..
|  |--07:EXCHANGE        29  142.532us  334.794us    8.07K       8.07K  304.00 KB              0  BROADCAST 
|  |  02:SCAN HDFS        1    1.700ms    1.700ms    8.07K       8.07K  332.94 KB       32.00 MB  xx..             
|  01:SCAN HDFS          29   44s391ms     13m18s  411.27M     411.27M  525.95 MB        4.38 GB  xx.. 
00:SCAN HDFS             29       3m7s     56m13s  277.41M     116.66M    7.83 MB      160.00 MB  xx..

Here is how I read this summary, and what jumped out:

Read it from the bottom up, since that is the order in which Impala executes the operators. Compare the "Avg Time" and "Max Time" columns, compare the "#Rows" and "Est. #Rows" columns, and check the "Detail" column for the join type of each operation.

Very quickly I noticed that for the HDFS scan operation (00:SCAN HDFS), "Avg Time" and "Max Time" differ enormously: the average is 3 minutes 7 seconds, but the slowest of the 29 hosts took 56 minutes 13 seconds. Reading further, the second HDFS scan (01:SCAN HDFS) has the same problem: 44 seconds on average versus 13 minutes 18 seconds at worst.

So my next thought was to work out which hosts were slower than the others, and whether it was always the same host. To do that I searched the profile file for the string "id=0", where "0" is the operator number at the start of the "00:SCAN HDFS" line in the Summary section.
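On a saved copy of the profile, that search is a one-liner (a sketch; plain grep treats the parentheses literally here):

grep -n 'HDFS_SCAN_NODE (id=0)' profile-example.txt | head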

Searching from the top of the file, the first match for "id=0" was:

    HDFS_SCAN_NODE (id=0)
 ....
            - ScannerThreadsTotalWallClockTime: 20.0m (1200982115995)
              - MaterializeTupleTime(*): 226ms (226568242)
              - ScannerThreadsSysTime: 322ms (322168172)
              - ScannerThreadsUserTime: 6.76s (6758158482)
            - ScannerThreadsVoluntaryContextSwitches: 10,907 (10907)
            - TotalRawHdfsOpenFileTime(*): 8.6m (517759170560)
            - TotalRawHdfsReadTime(*): 3.4m (201957505069)
            - TotalReadThroughput: 749.9 KiB/s (767874)
            - TotalTime: 3.1m (187289950304)

The TotalTime here is 3.1 minutes, which matches the 3m7s average seen in the Summary section, so this must be the averaged fragment. To confirm, I scrolled back up and saw the following:

Averaged Fragment F00

Continuing to search through the file, I came to the following section (the second occurrence of "id=0"):

            HDFS_SCAN_NODE (id=0)
....
              - ScannerThreadsTotalWallClockTime: 10.4m (626435081910)
                - MaterializeTupleTime(*): 278ms (278689886)
                - ScannerThreadsSysTime: 266ms (266094000)
                - ScannerThreadsUserTime: 5.75s (5748833000)
              - ScannerThreadsVoluntaryContextSwitches: 11,285 (11285)
              - TotalRawHdfsOpenFileTime(*): 7.8m (468388283839)
              - TotalRawHdfsReadTime(*): 1.9m (114598713147)
              - TotalReadThroughput: 731.0 KiB/s (748535)
              - TotalTime: 2.1m (125005670562)

This one took 2.1 minutes, faster than the 3.1-minute average, and I scrolled back up to confirm which host it was:

    Fragment F00
      Instance 94481a81355e51e4:51fd9f9500000053 (host=xxxxx-xxx-cdh-cdn002.xxx.XXXXXX.com:22000)
Now I had the three pieces of information I was after for each instance:

      Instance 94481a81355e51e4:51fd9f9500000053 (host=xxxxx-xxx-cdh-cdn002.xxx.XXXXXX.com:22000)
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 2.1m (125005670562)

Next I used grep to filter out just those lines. Since the profile file is indented, I ran egrep as follows:

egrep ' Instance .*)|^ HDFS_SCAN_NODE (id=0)|^ - TotalTime: ' profile-example.txt

The result was as follows:

...
      Instance 94481a81355e51e4:51fd9f9500000053 (host=xxxxx-xxx-cdh-cdn002.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 2.1m (125005670562)
      Instance 94481a81355e51e4:51fd9f9500000057 (host=xxxxx-xxx-cdh-cdn003.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.9m (114395426955)
      Instance 94481a81355e51e4:51fd9f9500000058 (host=xxxxx-xxx-cdh-cdn020.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.5m (92671503850)
      Instance 94481a81355e51e4:51fd9f950000003d (host=xxxxx-xxx-cdh-cdn012.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.4m (86459970122)
      Instance 94481a81355e51e4:51fd9f950000004b (host=xxxxx-xxx-cdh-cdn014.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.4m (82187347776)
      Instance 94481a81355e51e4:51fd9f9500000050 (host=xxxxx-xxx-cdh-cdn006.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.4m (82146306944)
      Instance 94481a81355e51e4:51fd9f950000004f (host=xxxxx-xxx-cdh-cdn024.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.3m (80468400288)
      Instance 94481a81355e51e4:51fd9f950000004d (host=xxxxx-xxx-cdh-cdn022.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.3m (79714897965)
      Instance 94481a81355e51e4:51fd9f9500000043 (host=xxxxx-xxx-cdh-cdn017.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.3m (78877950983)
      Instance 94481a81355e51e4:51fd9f9500000052 (host=xxxxx-xxx-cdh-cdn001.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.3m (77593734314)
      Instance 94481a81355e51e4:51fd9f950000003c (host=xxxxx-xxx-cdh-cdn019.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.3m (76164245478)
      Instance 94481a81355e51e4:51fd9f9500000045 (host=xxxxx-xxx-cdh-cdn007.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.3m (75588331159)
      Instance 94481a81355e51e4:51fd9f9500000044 (host=xxxxx-xxx-cdh-cdn010.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.2m (73596530464)
      Instance 94481a81355e51e4:51fd9f9500000042 (host=xxxxx-xxx-cdh-cdn018.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.2m (72946574082)
      Instance 94481a81355e51e4:51fd9f9500000055 (host=xxxxx-xxx-cdh-cdn026.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.2m (69918383242)
      Instance 94481a81355e51e4:51fd9f9500000054 (host=xxxxx-xxx-cdh-cdn011.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.2m (69355611992)
      Instance 94481a81355e51e4:51fd9f9500000051 (host=xxxxx-xxx-cdh-cdn009.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.1m (68527129814)
      Instance 94481a81355e51e4:51fd9f9500000048 (host=xxxxx-xxx-cdh-cdn016.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.1m (67249633571)
      Instance 94481a81355e51e4:51fd9f9500000047 (host=xxxxx-xxx-cdh-cdn013.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.1m (63989781076)
      Instance 94481a81355e51e4:51fd9f9500000041 (host=xxxxx-xxx-cdh-cdn028.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.0m (62739870946)
      Instance 94481a81355e51e4:51fd9f950000003f (host=xxxxx-xxx-cdh-cdn025.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.0m (62136511127)
      Instance 94481a81355e51e4:51fd9f950000004c (host=xxxxx-xxx-cdh-cdn005.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.0m (61943905274)
      Instance 94481a81355e51e4:51fd9f9500000046 (host=xxxxx-xxx-cdh-cdn027.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.0m (61955797776)
      Instance 94481a81355e51e4:51fd9f950000004e (host=xxxxx-xxx-cdh-cdn021.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 1.0m (60045780252)
      Instance 94481a81355e51e4:51fd9f9500000040 (host=xxxxx-xxx-cdh-cdn029.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 58.05s (58048904552)
      Instance 94481a81355e51e4:51fd9f950000004a (host=xxxxx-xxx-cdh-cdn023.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 57.34s (57338024825)
      Instance 94481a81355e51e4:51fd9f9500000049 (host=xxxxx-xxx-cdh-cdn008.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 53.13s (53130104765)
      Instance 94481a81355e51e4:51fd9f9500000056 (host=xxxxx-xxx-cdh-cdn004.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 43.24s (43238668974)
      Instance 94481a81355e51e4:51fd9f950000003e (host=xxxxx-xxx-cdh-cdn015.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=0)
              - TotalTime: 56.2m (3373973559713)

I have omitted the irrelevant output and kept only the lines of interest. Now it is clear which node is the bottleneck: host xxxxx-xxx-cdh-cdn015.xxx.XXXXXX.com took 56.2 minutes, while every other host took roughly 40 seconds to 2 minutes.

I also remembered that the other HDFS scan, operation 01 (01:SCAN HDFS), had the same symptom, so I ran the same egrep command for it (keep in mind that different operators can be indented differently, so I had to find that operator in the profile first and copy its leading spaces exactly into the pattern to get the matches I wanted):

egrep ' Instance .*)|^ HDFS_SCAN_NODE (id=1)|^ - TotalTime: ' profile-example.txt

The result confirmed the same conclusion once again:

....
      Instance 94481a81355e51e4:51fd9f950000000c (host=xxxxx-xxx-cdh-cdn015.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=1)
              - TotalTime: 13.3m (798747290751)
...
      Instance 94481a81355e51e4:51fd9f9500000007 (host=xxxxx-xxx-cdh-cdn001.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=1)
              - TotalTime: 28.16s (28163113611)
      Instance 94481a81355e51e4:51fd9f9500000018 (host=xxxxx-xxx-cdh-cdn009.xxx.XXXXXX.com:22000)
...
            HDFS_SCAN_NODE (id=1)
              - TotalTime: 23.29s (23285966387)
...

Clearly the same host, xxxxx-xxx-cdh-cdn015.xxx.XXXXXX.com, had the same problem: it was far slower than the rest, 13.3 minutes versus 28.16 seconds.

My conclusion was that something is wrong with that server and it needs to be investigated and fixed.
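The profile itself cannot say what exactly is wrong with that host. When one node's HDFS scans are consistently an order of magnitude slower than its peers, the usual suspects are local disk, hardware or OS-level problems on that node, so the first checks are ordinary OS-level ones; a sketch (iostat comes from the sysstat package, nothing here is Impala-specific):

# per-device utilization and latency, sampled while a scan-heavy query runs
iostat -dxm 5 3

# recent kernel messages about disk or hardware errors
dmesg | grep -iE 'error|fail' | tail -50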
