The problem I ran into: joining two tables of roughly 400,000 rows each made the Impala query fail with an out-of-memory error. Raising mem_limit to 180G did not help; even with the Impala mem_limit set far larger than needed, this small query still failed. The eventual fix was to add --enable_partitioned_aggregation=false and --enable_partitioned_hash_join=false to the impalad startup flags in the configuration file (see the official documentation link).
vim /etc/default/impala
# other lines omitted
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-default_pool_max_requests=-1 \
--enable_partitioned_aggregation=false \
--enable_partitioned_hash_join=false \
# other lines omitted
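After editing the flags, the Impala daemons have to be restarted for them to take effect. A minimal sketch, assuming a CDH 5.x package install where Impala runs as system services (service names may differ in your environment); the new flags can then be verified on each daemon's debug web UI under /varz:

# restart the daemons so the new startup flags are picked up (service names assumed for a package install)
sudo service impala-server restart
sudo service impala-state-store restart
sudo service impala-catalog restart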
EXPLAIN <sql statement>;  analyzes the execution plan of a SQL statement without actually running it.
SUMMARY;  run immediately after a statement completes to see the time and resources it actually consumed, plus Impala's estimates (the estimates can differ quite a bit from what really happens).
PROFILE;  shows even more detailed execution information.
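A quick sketch of that workflow in impala-shell (using the host and tables from the example below; EXPLAIN only plans the statement, while SUMMARY and PROFILE report on the most recently executed statement):

impala-shell -i hd:21000
[hd:21000] > EXPLAIN SELECT * FROM SAPSR3.AFPO AP LEFT JOIN SAPSR3.AFKO AK ON AK.AUFNR = AP.AUFNR AND AK.MANDT = '800';
[hd:21000] > SELECT COUNT(*) FROM SAPSR3.AFPO;
[hd:21000] > SUMMARY;
[hd:21000] > PROFILE;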
[hd:21000] > SELECT * FROM SAPSR3.AFPO AP LEFT JOIN SAPSR3.AFKO AK ON AK.AUFNR=AP.AUFNR AND AK.MANDT = '800';
Query: select * FROM SAPSR3.AFPO AP LEFT JOIN SAPSR3.AFKO AK ON AK.AUFNR=AP.AUFNR AND AK.MANDT = '800'
WARNINGS:
Memory limit exceeded
Query did not have enough memory to get the minimum required buffers in the block manager.
Memory Limit Exceeded
Query(6b423c1ab18095e8:d63218157018ca9b) Limit: Limit=180.00 GB Consumption=235.46 MB
Fragment 6b423c1ab18095e8:d63218157018ca9d: Consumption=25.24 MB
HASH_JOIN_NODE (id=2): Consumption=10.01 MB
EXCHANGE_NODE (id=3): Consumption=0
DataStreamRecvr: Consumption=15.21 MB
EXCHANGE_NODE (id=4): Consumption=0
DataStreamSender: Consumption=64.00 B
Block Manager: Limit=144.00 GB Consumption=0
Fragment 6b423c1ab18095e8:d63218157018caa7: Consumption=68.30 MB
HDFS_SCAN_NODE (id=0): Consumption=66.79 MB
DataStreamSender: Consumption=104.52 KB
Fragment 6b423c1ab18095e8:d63218157018caa9: Consumption=141.88 MB
HDFS_SCAN_NODE (id=1): Consumption=139.23 MB
DataStreamSender: Consumption=96.28 KB
Memory Limit Exceeded
Query(6b423c1ab18095e8:d63218157018ca9b) Limit: Limit=180.00 GB Consumption=167.97 MB
Fragment 6b423c1ab18095e8:d63218157018caa0: Consumption=24.82 MB
HASH_JOIN_NODE (id=2): Consumption=10.01 MB
EXCHANGE_NODE (id=3): Consumption=0
DataStreamRecvr: Consumption=14.79 MB
EXCHANGE_NODE (id=4): Consumption=0
DataStreamSender: Consumption=64.00 B
Block Manager: Limit=144.00 GB Consumption=0
Fragment 6b423c1ab18095e8:d63218157018caa3: Consumption=58.79 MB
HDFS_SCAN_NODE (id=0): Consumption=57.25 MB
DataStreamSender: Consumption=120.52 KB
Fragment 6b423c1ab18095e8:d63218157018caac: Consumption=84.39 MB
HDFS_SCAN_NODE (id=1): Consumption=81.73 MB
DataStreamSender: Consumption=120.28 KB
Run the PROFILE command:
[hdpprdd02.example:21000] > PROFILE;
Query Runtime Profile:
Query (id=6b423c1ab18095e8:d63218157018ca9b):
Summary:
Session ID: 464ad797bf15d6ab:dd136f8c4ff2fab4
Session Type: BEESWAX
Start Time: 2021-01-05 09:57:49.569631000
End Time: 2021-01-05 09:57:53.640374000
Query Type: QUERY
Query State: EXCEPTION
Query Status:
Memory limit exceeded
Query did not have enough memory to get the minimum required buffers in the block manager.
Impala Version: impalad version 2.4.0-cdh5.6.0 RELEASE (build 386c7fffa54a2b846dfb00a2ff7bf023dc2dd684)
User: hive@example
Connected User: hive@example
Delegated User:
Network Address: ::ffff:172.16.106.104:50118
Default Db: default
Sql Statement: select * FROM SAPSR3.AFPO AP LEFT JOIN SAPSR3.AFKO AK ON AK.AUFNR=AP.AUFNR AND AK.MANDT = '800'
Coordinator: hdpprdd02.example:22000
Plan:
----------------
Estimated Per-Host Requirements: Memory=2.37GB VCores=2
F03:PLAN FRAGMENT [UNPARTITIONED]
05:EXCHANGE [UNPARTITIONED]
hosts=6 per-host-mem=unavailable
tuple-ids=1N,0 row-size=4.74KB cardinality=414944
F02:PLAN FRAGMENT [HASH(AK.AUFNR)]
DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=05, UNPARTITIONED]
02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
| hash predicates: AK.AUFNR = AP.AUFNR
| hosts=6 per-host-mem=120.09MB
| tuple-ids=1N,0 row-size=4.74KB cardinality=414944
|
|--04:EXCHANGE [HASH(AP.AUFNR)]
| hosts=6 per-host-mem=0B
| tuple-ids=0 row-size=1.64KB cardinality=408787
|
03:EXCHANGE [HASH(AK.AUFNR)]
hosts=6 per-host-mem=0B
tuple-ids=1 row-size=3.09KB cardinality=408788
F01:PLAN FRAGMENT [RANDOM]
DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, HASH(AP.AUFNR)]
00:SCAN HDFS [sapsr3.afpo ap, RANDOM]
partitions=1/1 files=10 size=94.21MB
table stats: 408787 rows total
column stats: all
hosts=6 per-host-mem=2.11GB
tuple-ids=0 row-size=1.64KB cardinality=408787
F00:PLAN FRAGMENT [RANDOM]
DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=03, HASH(AK.AUFNR)]
01:SCAN HDFS [sapsr3.afko ak, RANDOM]
partitions=1/1 files=20 size=139.05MB
predicates: AK.MANDT = '800'
table stats: 408788 rows total
column stats: all
hosts=6 per-host-mem=2.25GB
tuple-ids=1 row-size=3.09KB cardinality=408788
----------------
Estimated Per-Host Mem: 2541843314
Estimated Per-Host VCores: 2
Request Pool: default-pool
ExecSummary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
--------------------------------------------------------------------------------------------------------------------------
05:EXCHANGE 1 205.257ms 205.257ms 0 414.94K 0 -1.00 B UNPARTITIONED
02:HASH JOIN 0 0ns 0ns 0 414.94K 0 120.09 MB RIGHT OUTER JOIN, PARTITIONED
|--04:EXCHANGE 0 0ns 0ns 0 408.79K 0 0 HASH(AP.AUFNR)
| 00:SCAN HDFS 0 0ns 0ns 0 408.79K 0 2.11 GB sapsr3.afpo ap
03:EXCHANGE 0 0ns 0ns 0 408.79K 0 0 HASH(AK.AUFNR)
01:SCAN HDFS 0 0ns 0ns 0 408.79K 0 2.25 GB sapsr3.afko ak
Planner Timeline: 42.419ms
- Analysis finished: 6.995ms (6.995ms)
- Equivalence classes computed: 10.66ms (3.70ms)
- Single node plan created: 25.233ms (15.167ms)
- Distributed plan created: 35.728ms (10.495ms)
- Planning finished: 42.419ms (6.690ms)
Query Timeline: 4s071ms
- Start execution: 59.641us (59.641us)
- Planning finished: 3s553ms (3s553ms)
- Ready to start remote fragments: 3s556ms (2.454ms)
- Remote fragments started: 3s767ms (211.182ms)
- Rows available: 3s976ms (209.99ms)
- Unregister query: 4s070ms (94.268ms)
ImpalaServer:
- ClientFetchWaitTimer: 0ns
- RowMaterializationTimer: 0ns
Execution Profile 6b423c1ab18095e8:d63218157018ca9b:(Total: 421.63ms, non-child: 0ns, % non-child: 0.00%)
Fragment start latencies: count: 18, last: 0.016319ns, min: 0.014551ns, max: 0.173539ns, mean: 0.065316ns, stddev: 0.070056ns
Per Node Peak Memory Usage: hdpprdd06.example:22000(0) hdpprdd03.example:22000(0) hdpprdd01.example:22000(0) hdpprdd05.example:22000(0) hdpprdd04.example:22000(0) hdpprdd02.example:22000(0)
- FinalizationTimer: 0ns
Coordinator Fragment F03:(Total: 205.808ms, non-child: 551.47us, % non-child: 0.27%)
MemoryUsage(500.0ms): 16.00 KB
ThreadUsage(500.0ms): 1
- AverageThreadTokens: 1.00
- PeakMemoryUsage: 16.00 KB (16384)
- PerHostPeakMemUsage: 0
- PrepareTime: 42.569us
- RowsProduced: 0 (0)
- TotalCpuTime: 0ns
- TotalNetworkReceiveTime: 205.256ms
- TotalNetworkSendTime: 0ns
- TotalStorageWaitTime: 0ns
BlockMgr:
- BlockWritesOutstanding: 0 (0)
- BlocksCreated: 32 (32)
- BlocksRecycled: 0 (0)
- BufferedPins: 0 (0)
- BytesWritten: 0
- MaxBlockSize: 8.00 MB (8388608)
- MemoryLimit: 144.00 GB (154618822656)
- PeakMemoryUsage: 0
- TotalBufferWaitTime: 0ns
- TotalEncryptionTime: 0ns
- TotalIntegrityCheckTime: 0ns
- TotalReadBlockTime: 0ns
EXCHANGE_NODE (id=5):(Total: 205.257ms, non-child: 205.257ms, % non-child: 100.00%)
BytesReceived(500.0ms): 0
- BytesReceived: 0
- ConvertRowBatchTime: 0ns
- DeserializeRowBatchTimer: 0ns
- FirstBatchArrivalWaitTime: 0ns
- PeakMemoryUsage: 0
- RowsReturned: 0 (0)
- RowsReturnedRate: 0
- SendersBlockedTimer: 0ns
- SendersBlockedTotalTimer(*): 0ns
Averaged Fragment F02:
split sizes: min: 0, max: 0, avg: 0, stddev: 0
completion times: min:245.166ms max:262.887ms mean: 254.675ms stddev:5.733ms
execution rates: min:0.00 /sec max:0.00 /sec mean:0.00 /sec stddev:0.00 /sec
num instances: 6
Averaged Fragment F01:
split sizes: min: 11.02 MB, max: 21.61 MB, avg: 15.70 MB, stddev: 4.40 MB
completion times: min:226.494ms max:229.616ms mean: 228.095ms stddev:1.110ms
execution rates: min:48.15 MB/sec max:94.40 MB/sec mean:68.83 MB/sec stddev:19.26 MB/sec
num instances: 6
Averaged Fragment F00:
split sizes: min: 21.55 MB, max: 29.46 MB, avg: 23.17 MB, stddev: 2.83 MB
completion times: min:209.39ms max:210.423ms mean: 209.729ms stddev:533.480us
execution rates: min:102.42 MB/sec max:140.05 MB/sec mean:110.48 MB/sec stddev:13.32 MB/sec
num instances: 6
Fragment F02:
Instance 6b423c1ab18095e8:d63218157018ca9d (host=hdpprdd04.example:22000):
Instance 6b423c1ab18095e8:d63218157018ca9e (host=hdpprdd05.example:22000):
Instance 6b423c1ab18095e8:d63218157018ca9f (host=hdpprdd01.example:22000):
Instance 6b423c1ab18095e8:d63218157018caa0 (host=hdpprdd03.example:22000):
Instance 6b423c1ab18095e8:d63218157018caa1 (host=hdpprdd06.example:22000):
Instance 6b423c1ab18095e8:d63218157018caa2 (host=hdpprdd02.example:22000):
Fragment F01:
Instance 6b423c1ab18095e8:d63218157018caa3 (host=hdpprdd03.example:22000):
Instance 6b423c1ab18095e8:d63218157018caa4 (host=hdpprdd01.example:22000):
Instance 6b423c1ab18095e8:d63218157018caa5 (host=hdpprdd05.example:22000):
Instance 6b423c1ab18095e8:d63218157018caa6 (host=hdpprdd02.example:22000):
Instance 6b423c1ab18095e8:d63218157018caa7 (host=hdpprdd04.example:22000):
Instance 6b423c1ab18095e8:d63218157018caa8 (host=hdpprdd06.example:22000):
Fragment F00:
Instance 6b423c1ab18095e8:d63218157018caa9 (host=hdpprdd04.example:22000):
Instance 6b423c1ab18095e8:d63218157018caaa (host=hdpprdd05.example:22000):
Instance 6b423c1ab18095e8:d63218157018caab (host=hdpprdd01.example:22000):
Instance 6b423c1ab18095e8:d63218157018caac (host=hdpprdd03.example:22000):
Instance 6b423c1ab18095e8:d63218157018caad (host=hdpprdd06.example:22000):
Instance 6b423c1ab18095e8:d63218157018caae (host=hdpprdd02.example:22000):
Start Time: 2021-01-05 09:57:49.569631000
End Time: 2021-01-05 09:57:53.640374000
User: hive@example
Connected User: hive@example
Delegated User:
Plan: the query plan section.
Estimated Per-Host Requirements: Memory=2.37GB VCores=2 -- the resources the query is estimated to need on each host.
Query plan details. The query plan is one of the most important parts of an Impala profile, and we need to know how to read it, because it tells us how Impala scans the tables, exchanges data, and joins them to produce the final result. For a complex query the plan itself can become very complex, so let's start with this simple query to understand the basics. One thing to keep in mind is that we need to read the plan from the bottom up to follow Impala's execution order.
SCAN HDFS: the first step usually starts with an HDFS scan; the operator numbers 01, 02, 03, 04, 05 indicate the execution order.
# F00: an execution fragment
F00:PLAN FRAGMENT [RANDOM]
# Data stream sink: tells us where this fragment's output goes next -- exchange 03 in fragment F02
DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=03, HASH(AK.AUFNR)]
# The table being scanned: sapsr3.afko
01:SCAN HDFS [sapsr3.afko ak, RANDOM]
# The table has one partition and Impala reads one partition (partitions=1/1). This does not necessarily mean the table is partitioned; an unpartitioned table also shows 1/1.
# The table/partition holds 20 files (files=20), and the total data read is 139.05MB.
partitions=1/1 files=20 size=139.05MB
predicates: AK.MANDT = '800'
# Table stats: 408788 rows in total
table stats: 408788 rows total
column stats: all
# The estimated memory needed to run this scan is 2.25GB per host
hosts=6 per-host-mem=2.25GB
tuple-ids=1 row-size=3.09KB cardinality=408788
# F01 scans table sapsr3.afpo; its output goes to FRAGMENT=F02, EXCHANGE=04
F01:PLAN FRAGMENT [RANDOM]
DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, HASH(AP.AUFNR)]
00:SCAN HDFS [sapsr3.afpo ap, RANDOM]
partitions=1/1 files=10 size=94.21MB
table stats: 408787 rows total
column stats: all
hosts=6 per-host-mem=2.11GB
tuple-ids=0 row-size=1.64KB cardinality=408787
# F02 contains 02:HASH JOIN, which consumes the exchange operations 03 and 04; after the join work is done on each worker node, each node's results are exchanged to the coordinator
F02:PLAN FRAGMENT [HASH(AK.AUFNR)]
DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=05, UNPARTITIONED]
02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
| hash predicates: AK.AUFNR = AP.AUFNR
| hosts=6 per-host-mem=120.09MB
| tuple-ids=1N,0 row-size=4.74KB cardinality=414944
|
|--04:EXCHANGE [HASH(AP.AUFNR)]
| hosts=6 per-host-mem=0B
| tuple-ids=0 row-size=1.64KB cardinality=408787
|
03:EXCHANGE [HASH(AK.AUFNR)]
hosts=6 per-host-mem=0B
tuple-ids=1 row-size=3.09KB cardinality=408788
# F03 runs the final 05:EXCHANGE operation
F03:PLAN FRAGMENT [UNPARTITIONED]
05:EXCHANGE [UNPARTITIONED]
hosts=6 per-host-mem=unavailable
tuple-ids=1N,0 row-size=4.74KB cardinality=414944
The execution summary (ExecSummary) section:
----------------
Estimated Per-Host Mem: 2541843314
Estimated Per-Host VCores: 2
Request Pool: default-pool
ExecSummary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
--------------------------------------------------------------------------------------------------------------------------
05:EXCHANGE 1 205.257ms 205.257ms 0 414.94K 0 -1.00 B UNPARTITIONED
02:HASH JOIN 0 0ns 0ns 0 414.94K 0 120.09 MB RIGHT OUTER JOIN, PARTITIONED
|--04:EXCHANGE 0 0ns 0ns 0 408.79K 0 0 HASH(AP.AUFNR)
| 00:SCAN HDFS 0 0ns 0ns 0 408.79K 0 2.11 GB sapsr3.afpo ap
03:EXCHANGE 0 0ns 0ns 0 408.79K 0 0 HASH(AK.AUFNR)
01:SCAN HDFS 0 0ns 0ns 0 408.79K 0 2.25 GB sapsr3.afko ak
Planner Timeline and Query Timeline
Planner Timeline: 42.419ms
- Analysis finished: 6.995ms (6.995ms)
- Equivalence classes computed: 10.66ms (3.70ms)
- Single node plan created: 25.233ms (15.167ms)
- Distributed plan created: 35.728ms (10.495ms)
- Planning finished: 42.419ms (6.690ms)
Query Timeline: 4s071ms
- Start execution: 59.641us (59.641us)
# Planning did not finish until 3s553ms into the query, i.e. roughly 3.5 of the 4 seconds were spent before execution started
- Planning finished: 3s553ms (3s553ms)
- Ready to start remote fragments: 3s556ms (2.454ms)
- Remote fragments started: 3s767ms (211.182ms)
- Rows available: 3s976ms (209.99ms)
- Unregister query: 4s070ms (94.268ms)
To summarize the error above: with SELECT *, the query reported Memory Limit Exceeded no matter how much memory was allowed. Selecting a single column ran fine, while including certain columns in the SELECT list caused the failure, and the scans returned 0 rows, so the problem lay with the data read during the HDFS scan.
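A rough way to narrow down which columns trigger the failure is to select them in small batches instead of SELECT * (a sketch; the columns below are just the join keys from the query above, substitute the real column list obtained from DESCRIBE):

impala-shell -i hd:21000 -q "DESCRIBE SAPSR3.AFKO"
impala-shell -i hd:21000 -q "SELECT AUFNR, MANDT FROM SAPSR3.AFKO LIMIT 10"
impala-shell -i hd:21000 -q "SELECT AUFNR FROM SAPSR3.AFPO LIMIT 10"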
Because my query finished quickly, there is not much of interest to see here, so let's look at another Impala query profile from a real production case.
Query Compilation: 16.268ms
- Metadata of all 1 tables cached: 1.786ms (1.786ms)
- Analysis finished: 6.162ms (4.376ms)
- Value transfer graph computed: 6.537ms (374.918us)
- Single node plan created: 7.955ms (1.417ms)
- Runtime filters computed: 8.274ms (318.815us)
- Distributed plan created: 8.430ms (156.307us)
- Lineage info computed: 9.664ms (1.234ms)
- Planning finished: 16.268ms (6.603ms)
Query Timeline: 35m46s
- Query submitted: 0.000ns (0.000ns)
- Planning finished: 22.001ms (22.001ms)
- Submit for admission: 23.001ms (1.000ms)
- Completed admission: 23.001ms (0.000ns)
- Ready to start on 2 backends: 24.001ms (1.000ms)
- All 2 execution backends (2 fragment instances) started: 36.001ms (12.000ms)
- Rows available: 5m51s (5m51s)
- First row fetched: 5m52s (950.045ms)
- Last row fetched: 35m46s (29m53s)
- Released admission control resources: 35m46s (1.000ms)
- Unregister query: 35m46s (30.001ms)
- ComputeScanRangeAssignmentTimer: 0.000ns
This comes from a real case where an Impala query had been running for a very long time and the customer wanted to find out why. From the Query Timeline we can clearly see that it took almost 6 minutes (5m51s) from the start of execution ("All 2 execution backends") until data became available ("Rows available"). Those 6 minutes of execution may well be normal: with joins over large data sets, it is common for a query to run for several minutes.

However, we can also see that Impala spent 30 minutes delivering the data back to the client, because the first row was fetched at around the 6-minute mark while the last row was fetched at 36 minutes. From this we can suspect a network problem between the Impala coordinator and the client (when fetching data from a client such as impala-shell or Hue to the coordinator host). Another possibility is that the client was doing other work while fetching the results, such as printing them to the screen; since the returned data set can be large, that operation can be time consuming. Either way, this part of the profile points us in the right direction to look for the bottleneck.

This is part 3 of the Impala profile series, covering in detail how to relate the operator numbers shown in the query plan section to the last part of the profile, which shows detailed metrics for each operator, both averaged across hosts and broken down per host.
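When working from a saved profile text file, the Query Timeline block can be pulled out on its own to compare "Rows available" against the first/last row fetch times (a sketch; profile-example.txt is the same file name used further down in this article):

sed -n '/Query Timeline:/,/Unregister query/p' profile-example.txt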
1. Since the user reported that the query was taking longer than normal, the first thing I wanted to know was exactly how long it took. So I started by checking the query start and end times at the top of the profile:
Start Time: 2020-01-03 07:33:42.928171000
End Time: 2020-01-03 08:47:55.745537000
This showed that the query now took 1 hour 14 minutes to complete, which matched the user's report.
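A quick sanity check of that elapsed time from the two lines above (a sketch using GNU date; fractional seconds dropped):

start="2020-01-03 07:33:42"; end="2020-01-03 08:47:55"
echo "$(( ( $(date -d "$end" +%s) - $(date -d "$start" +%s) ) / 60 )) minutes"   # 74 minutes = 1h14m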
2. The query ended in an exception because the user cancelled it:
Query State: EXCEPTION
Query Status: Cancelled
The user had lost patience and cancelled the query because it had been running for too long; nothing to worry about here.
3. Next, I checked how complex the query was:
Sql Statement:
SELECT COUNT(*),NVL(SUM(NVL(NUMBER_OF_CALLS,0)),0)
FROM cef_hpmn
WHERE settlement_month IN (
SELECT DISTINCT dd.month_id
FROM ext_call_event_fact cef, date_dim dd
WHERE CAST(settlement_date_id AS bigint) = dd.date_id
)
AND process_date = TO_TIMESTAMP ('01/02/2020', 'MM/dd/yyyy')
I reformatted it for readability. As you can see, the query is very simple: a straightforward SELECT with a subquery condition.
4. The next thing I looked at was the hostname of the coordinator node:
Coordinator: xxxxx-xxx-cdh-cdn003.xxx.XXXXXX.com:22000
If I could not draw any conclusion from this profile alone, the next step would be to collect the impalad logs from this host. For now I had to keep going, since I did not have the logs yet.
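If it does come to pulling logs, a minimal sketch for grabbing the relevant impalad log lines from the coordinator (the log path is assumed to be the CDH package default, and <query_id> stands for the query ID from the profile header):

ssh xxxxx-xxx-cdh-cdn003.xxx.XXXXXX.com "grep '<query_id>' /var/log/impala/impalad.INFO"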
5. As I mentioned in the previous article, the next part of the profile is the query plan. I normally skip over it at first and jump to the Summary section to see how long each operation took, and whether anything obvious stands out:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
-----------------------------------------------------------------------------------------------------------
12:AGGREGATE 1 0.000ns 0.000ns 0 1 16.00 KB 10.00 MB FINALIZE
11:EXCHANGE 1 0.000ns 0.000ns 28 1 232.00 KB 0 UNPA...
06:AGGREGATE 29 61.358ms 110.536ms 28 1 13.60 MB 10.00 MB
05:HASH JOIN 29 223.055ms 345.295ms 277.41M 116.66M 2.00 MB 1.94 MB LEFT...
|--10:EXCHANGE 29 29.977us 67.789us 2 261 24.00 KB 0 BROADCAST
| 09:AGGREGATE 29 1.569ms 2.903ms 2 261 1.96 MB 10.00 MB FINALIZE
| 08:EXCHANGE 29 33.880us 419.580us 58 261 240.00 KB 0 HASH(..
| 04:AGGREGATE 29 9s751ms 3m53s 58 261 4.58 MB 10.00 MB STREAMING
| 03:HASH JOIN 29 1s099ms 2s635ms 411.27M 411.27M 3.35 MB 1.94 MB INNER ..
| |--07:EXCHANGE 29 142.532us 334.794us 8.07K 8.07K 304.00 KB 0 BROADCAST
| | 02:SCAN HDFS 1 1.700ms 1.700ms 8.07K 8.07K 332.94 KB 32.00 MB xx..
| 01:SCAN HDFS 29 44s391ms 13m18s 411.27M 411.27M 525.95 MB 4.38 GB xx..
00:SCAN HDFS 29 3m7s 56m13s 277.41M 116.66M 7.83 MB 160.00 MB xx..
Here I found several key pieces of information:
Read it from the bottom up, since that is the order in which Impala executes the operations.
Compare the "Avg Time" and "Max Time" columns.
Compare the "#Rows" and "Est. #Rows" columns.
Check the "Detail" column for the join type of each operation.
Very quickly I noticed a large gap between "Avg Time" and "Max Time" for the SCAN HDFS operation: the average time was 3 minutes 7 seconds, but the maximum on one of the 29 hosts was 56 minutes 13 seconds. Reading further, I noticed that the second SCAN HDFS operation had the same problem: 44 seconds versus 13 minutes 18 seconds.
So my next thought was to work out which hosts ran slower than the rest, and whether the slowness always came from the same host. To do that, I searched the profile for the string "id=0", where "0" is the operator number at the start of the "00:SCAN HDFS" line in the Summary section.
Searching from the top of the file, the first section matching "id=0" was:
HDFS_SCAN_NODE (id=0)
....
- ScannerThreadsTotalWallClockTime: 20.0m (1200982115995)
- MaterializeTupleTime(*): 226ms (226568242)
- ScannerThreadsSysTime: 322ms (322168172)
- ScannerThreadsUserTime: 6.76s (6758158482)
- ScannerThreadsVoluntaryContextSwitches: 10,907 (10907)
- TotalRawHdfsOpenFileTime(*): 8.6m (517759170560)
- TotalRawHdfsReadTime(*): 3.4m (201957505069)
- TotalReadThroughput: 749.9 KiB/s (767874)
- TotalTime: 3.1m (187289950304)
The TotalTime here is 3.1 minutes, which matches the 3m7s average seen in the Summary section, so this is the averaged fragment. To confirm, I scrolled back and saw the following:
Averaged Fragment F00
Continuing to search through the file, I reached the section below (the second occurrence of "id=0"):
HDFS_SCAN_NODE (id=0)
....
- ScannerThreadsTotalWallClockTime: 10.4m (626435081910)
- MaterializeTupleTime(*): 278ms (278689886)
- ScannerThreadsSysTime: 266ms (266094000)
- ScannerThreadsUserTime: 5.75s (5748833000)
- ScannerThreadsVoluntaryContextSwitches: 11,285 (11285)
- TotalRawHdfsOpenFileTime(*): 7.8m (468388283839)
- TotalRawHdfsReadTime(*): 1.9m (114598713147)
- TotalReadThroughput: 731.0 KiB/s (748535)
- TotalTime: 2.1m (125005670562)
This one took 2.1 minutes, faster than the 3.1-minute average, and I scrolled back up to confirm which host it was:
Fragment F00
Instance 94481a81355e51e4:51fd9f9500000053 (host=xxxxx-xxx-cdh-cdn002.xxx.XXXXXX.com:22000)
At this point, these are the three key pieces of information I had found:
Instance 94481a81355e51e4:51fd9f9500000053 (host=xxxxx-xxx-cdh-cdn002.xxx.XXXXXX.com:22000)
HDFS_SCAN_NODE (id=0)
- TotalTime: 2.1m (125005670562)
Next I used grep to filter out just the parts I wanted. Since the profile is indented, I ran the following egrep:
egrep ' Instance .*\)|^ HDFS_SCAN_NODE \(id=0\)|^ - TotalTime: ' profile-example.txt
The output was as follows:
...
Instance 94481a81355e51e4:51fd9f9500000053 (host=xxxxx-xxx-cdh-cdn002.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 2.1m (125005670562)
Instance 94481a81355e51e4:51fd9f9500000057 (host=xxxxx-xxx-cdh-cdn003.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.9m (114395426955)
Instance 94481a81355e51e4:51fd9f9500000058 (host=xxxxx-xxx-cdh-cdn020.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.5m (92671503850)
Instance 94481a81355e51e4:51fd9f950000003d (host=xxxxx-xxx-cdh-cdn012.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.4m (86459970122)
Instance 94481a81355e51e4:51fd9f950000004b (host=xxxxx-xxx-cdh-cdn014.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.4m (82187347776)
Instance 94481a81355e51e4:51fd9f9500000050 (host=xxxxx-xxx-cdh-cdn006.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.4m (82146306944)
Instance 94481a81355e51e4:51fd9f950000004f (host=xxxxx-xxx-cdh-cdn024.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.3m (80468400288)
Instance 94481a81355e51e4:51fd9f950000004d (host=xxxxx-xxx-cdh-cdn022.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.3m (79714897965)
Instance 94481a81355e51e4:51fd9f9500000043 (host=xxxxx-xxx-cdh-cdn017.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.3m (78877950983)
Instance 94481a81355e51e4:51fd9f9500000052 (host=xxxxx-xxx-cdh-cdn001.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.3m (77593734314)
Instance 94481a81355e51e4:51fd9f950000003c (host=xxxxx-xxx-cdh-cdn019.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.3m (76164245478)
Instance 94481a81355e51e4:51fd9f9500000045 (host=xxxxx-xxx-cdh-cdn007.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.3m (75588331159)
Instance 94481a81355e51e4:51fd9f9500000044 (host=xxxxx-xxx-cdh-cdn010.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.2m (73596530464)
Instance 94481a81355e51e4:51fd9f9500000042 (host=xxxxx-xxx-cdh-cdn018.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.2m (72946574082)
Instance 94481a81355e51e4:51fd9f9500000055 (host=xxxxx-xxx-cdh-cdn026.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.2m (69918383242)
Instance 94481a81355e51e4:51fd9f9500000054 (host=xxxxx-xxx-cdh-cdn011.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.2m (69355611992)
Instance 94481a81355e51e4:51fd9f9500000051 (host=xxxxx-xxx-cdh-cdn009.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.1m (68527129814)
Instance 94481a81355e51e4:51fd9f9500000048 (host=xxxxx-xxx-cdh-cdn016.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.1m (67249633571)
Instance 94481a81355e51e4:51fd9f9500000047 (host=xxxxx-xxx-cdh-cdn013.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.1m (63989781076)
Instance 94481a81355e51e4:51fd9f9500000041 (host=xxxxx-xxx-cdh-cdn028.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.0m (62739870946)
Instance 94481a81355e51e4:51fd9f950000003f (host=xxxxx-xxx-cdh-cdn025.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.0m (62136511127)
Instance 94481a81355e51e4:51fd9f950000004c (host=xxxxx-xxx-cdh-cdn005.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.0m (61943905274)
Instance 94481a81355e51e4:51fd9f9500000046 (host=xxxxx-xxx-cdh-cdn027.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.0m (61955797776)
Instance 94481a81355e51e4:51fd9f950000004e (host=xxxxx-xxx-cdh-cdn021.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 1.0m (60045780252)
Instance 94481a81355e51e4:51fd9f9500000040 (host=xxxxx-xxx-cdh-cdn029.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 58.05s (58048904552)
Instance 94481a81355e51e4:51fd9f950000004a (host=xxxxx-xxx-cdh-cdn023.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 57.34s (57338024825)
Instance 94481a81355e51e4:51fd9f9500000049 (host=xxxxx-xxx-cdh-cdn008.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 53.13s (53130104765)
Instance 94481a81355e51e4:51fd9f9500000056 (host=xxxxx-xxx-cdh-cdn004.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 43.24s (43238668974)
Instance 94481a81355e51e4:51fd9f950000003e (host=xxxxx-xxx-cdh-cdn015.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=0)
- TotalTime: 56.2m (3373973559713)
I have omitted the irrelevant parts and kept only what I was interested in. Now I can clearly see which server node is the bottleneck: host xxxxx-xxx-cdh-cdn015.xxx.XXXXXX.com took 56.2 minutes, while every other host took roughly 40 seconds to 2 minutes.
I also remembered that the other HDFS scan, operation 01 (01:SCAN HDFS), had the same problem, so I ran the same egrep for it (keep in mind that different operations may be indented differently, so I had to find that operation in the profile and copy its leading whitespace exactly into the pattern to get the results I wanted):
egrep ' Instance .*\)|^ HDFS_SCAN_NODE \(id=1\)|^ - TotalTime: ' profile-example.txt
The results once again confirmed the same conclusion:
....
Instance 94481a81355e51e4:51fd9f950000000c (host=xxxxx-xxx-cdh-cdn015.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=1)
- TotalTime: 13.3m (798747290751)
...
Instance 94481a81355e51e4:51fd9f9500000007 (host=xxxxx-xxx-cdh-cdn001.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=1)
- TotalTime: 28.16s (28163113611)
Instance 94481a81355e51e4:51fd9f9500000018 (host=xxxxx-xxx-cdh-cdn009.xxx.XXXXXX.com:22000)
...
HDFS_SCAN_NODE (id=1)
- TotalTime: 23.29s (23285966387)
...
Clearly the same host, xxxxx-xxx-cdh-cdn015.xxx.XXXXXX.com, had the same problem: it ran far slower than the other hosts, 13.3 minutes versus 28.16 seconds.
My conclusion was that something was wrong with that server and it needed to be fixed.