
Why is HDFS throwing LeaseExpiredException in a Hadoop cluster (AWS EMR)?

Question:

I am getting a LeaseExpiredException in the Hadoop cluster -

<strong>tail -f /var/log/hadoop-hdfs/hadoop-hdfs-namenode-ip-172-30-2-148.log</strong>

<blockquote>

2016-09-21 11:54:14,533 INFO BlockStateChange (IPC Server handler 10 on 8020): BLOCK* InvalidateBlocks: add blk_1073747501_6677 to 172.30.2.189:50010
2016-09-21 11:54:14,534 INFO org.apache.hadoop.ipc.Server (IPC Server handler 31 on 8020): IPC Server handler 31 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.complete from 172.30.2.189:37674 Call#34 Retry#0: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /tmp/hive/hadoop/_tez_session_dir/1e4f71f0-9f29-468d-980e-9f19690bf849/.tez/application_1474442135017_0114/recovery/1/summary (inode 26350): File does not exist. Holder DFSClient_NONMAPREDUCE_-143782605_1 does not have any open files.
2016-09-21 11:54:15,557 INFO org.apache.hadoop.hdfs.StateChange (IPC Server handler 0 on 8020): BLOCK* allocate blk_1073747503_6679{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-86592ba7-c51a-431d-8019-9e362d721b28:NORMAL:172.30.2.189:50010|RBW]]} for /var/log/hadoop-yarn/apps/hadoop/logs/application_1474442135017_0114/ip-172-30-2-122.us-west-2.compute.internal_8041.tmp

</blockquote>

Also, some of the Hive queries are failing. I am guessing it is because of the above issue.

<strong>tail -f /var/log/hive/hive-server2.log</strong>

2016-09-21T11:59:35,126 INFO [HiveServer2-Background-Pool: Thread-3883([])]: ql.Driver (Driver.java:execute(1477)) - Executing command(queryId=hive_20160921115934_c56d9c91-640b-4f5d-b490-34549a4258c7): INSERT INTO TABLE validation_logs SELECT "18364", "TABLE_VALIDATION", error.code, error.validator, get_json_object(key, '$.table_name'), NULL, NULL, error.failure_msg, FROM_UNIXTIME(UNIX_TIMESTAMP('20160921','yyyyMMdd')), from_unixtime(unix_timestamp()) FROM (SELECT MAP(concat("{\"table_name\" : \"", table_name , "\"}"), error) AS err_map FROM table_level_validation_result ) AS res LATERAL VIEW EXPLODE(res.err_map) tmp AS key, error WHERE error IS NOT NULL AND (error.code="error" OR error.code="warn") 2016-09-21T11:59:35,126 INFO [HiveServer2-Background-Pool: Thread-3883([])]: ql.Driver (SessionState.java:printInfo(1054)) - Query ID = hive_20160921115934_c56d9c91-640b-4f5d-b490-34549a4258c7 2016-09-21T11:59:35,126 INFO [HiveServer2-Background-Pool: Thread-3883([])]: ql.Driver (SessionState.java:printInfo(1054)) - Total jobs = 1 2016-09-21T11:59:35,127 INFO [HiveServer2-Background-Pool: Thread-3883([])]: ql.Driver (SessionState.java:printInfo(1054)) - Launching Job 1 out of 1 2016-09-21T11:59:35,127 INFO [HiveServer2-Background-Pool: Thread-3883([])]: ql.Driver (Driver.java:launchTask(1856)) - Starting task [Stage-1:MAPRED] in serial mode 2016-09-21T11:59:35,127 INFO [HiveServer2-Background-Pool: Thread-3883([])]: tez.TezSessionPoolManager (TezSessionPoolManager.java:canWorkWithSameSession(404)) - The current user: hadoop, session user: hadoop 2016-09-21T11:59:35,127 INFO [HiveServer2-Background-Pool: Thread-3883([])]: tez.TezSessionPoolManager (TezSessionPoolManager.java:canWorkWithSameSession(421)) - Current queue name is null incoming queue name is null 2016-09-21T11:59:35,173 INFO [HiveServer2-Background-Pool: Thread-3883([])]: ql.Context (Context.java:getMRScratchDir(340)) - New scratch dir is hdfs://ip-172-30-2-148.us-west-2.compute.internal:8020/tmp/hive/hadoop/65cf7f02-a7d3-40ba-a93f-ff5214afbdfc/hive_2016-09-21_11-59-34_474_5003281239065359634-127 2016-09-21T11:59:35,174 INFO [HiveServer2-Background-Pool: Thread-3883([])]: exec.Task (TezTask.java:updateSession(279)) - Session is already open 2016-09-21T11:59:35,175 INFO [HiveServer2-Background-Pool: Thread-3883([])]: tez.DagUtils (DagUtils.java:createLocalResource(758)) - Resource modification time: 1474459142291 for hdfs://ip-172-30-2-148.us-west-2.compute.internal:8020/tmp/hive/hadoop/_tez_session_dir/85d36c12-c629-44a8-b23c-c628898a79b7/commons-vfs2-2.0.jar 2016-09-21T11:59:35,176 INFO [HiveServer2-Background-Pool: Thread-3883([])]: tez.DagUtils (DagUtils.java:createLocalResource(758)) - Resource modification time: 1474459142320 for hdfs://ip-172-30-2-148.us-west-2.compute.internal:8020/tmp/hive/hadoop/_tez_session_dir/85d36c12-c629-44a8-b23c-c628898a79b7/emr-ddb-hive.jar 2016-09-21T11:59:35,177 INFO [HiveServer2-Background-Pool: Thread-3883([])]: tez.DagUtils (DagUtils.java:createLocalResource(758)) - Resource modification time: 1474459142353 for hdfs://ip-172-30-2-148.us-west-2.compute.internal:8020/tmp/hive/hadoop/_tez_session_dir/85d36c12-c629-44a8-b23c-c628898a79b7/emr-hive-goodies.jar 2016-09-21T11:59:35,178 INFO [HiveServer2-Background-Pool: Thread-3883([])]: tez.DagUtils (DagUtils.java:createLocalResource(758)) - Resource modification time: 1474459142389 for hdfs://ip-172-30-2-148.us-west-2.compute.internal:8020/tmp/hive/hadoop/_tez_session_dir/85d36c12-c629-44a8-b23c-c628898a79b7/emr-kinesis-hive.jar 
2016-09-21T11:59:35,178 INFO [HiveServer2-Background-Pool: Thread-3883([])]: tez.DagUtils (DagUtils.java:createLocalResource(758)) - Resource modification time: 1474459142423 for hdfs://ip-172-30-2-148.us-west-2.compute.internal:8020/tmp/hive/hadoop/_tez_session_dir/85d36c12-c629-44a8-b23c-c628898a79b7/hive-contrib-2.1.0-amzn-0.jar 2016-09-21T11:59:35,179 INFO [HiveServer2-Background-Pool: Thread-3883([])]: tez.DagUtils (DagUtils.java:createLocalResource(758)) - Resource modification time: 1474459142496 for hdfs://ip-172-30-2-148.us-west-2.compute.internal:8020/tmp/hive/hadoop/_tez_session_dir/85d36c12-c629-44a8-b23c-c628898a79b7/hive-plugins-0.0.1-emr-upgrade-20160919.070538-1.jar 2016-09-21T11:59:35,179 INFO [HiveServer2-Background-Pool: Thread-3883([])]: exec.Task (TezTask.java:build(321)) - Dag name: INSERT INTO TABLE valid...error.code="warn")(Stage-1) 2016-09-21T11:59:35,180 INFO [HiveServer2-Background-Pool: Thread-3883([])]: ql.Context (Context.java:getMRScratchDir(340)) - New scratch dir is hdfs://ip-172-30-2-148.us-west-2.compute.internal:8020/tmp/hive/hadoop/65cf7f02-a7d3-40ba-a93f-ff5214afbdfc/hive_2016-09-21_11-59-34_474_5003281239065359634-127 2016-09-21T11:59:35,223 INFO [HiveServer2-Background-Pool: Thread-3881([])]: impl.YarnClientImpl (YarnClientImpl.java:submitApplication(273)) - Submitted application application_1474442135017_0147 2016-09-21T11:59:35,224 INFO [HiveServer2-Background-Pool: Thread-3881([])]: client.TezClient (TezClient.java:start(477)) - The url to track the Tez Session: http://ip-172-30-2-148.us-west-2.compute.internal:20888/proxy/application_1474442135017_0147/ 2016-09-21T11:59:35,391 INFO [HiveServer2-Background-Pool: Thread-3429([])]: SessionState (SessionState.java:printInfo(1054)) - Map 1: 0(+0,-4)/1 2016-09-21T11:59:35,446 ERROR [HiveServer2-Background-Pool: Thread-3429([])]: SessionState (SessionState.java:printError(1063)) - Status: Failed 2016-09-21T11:59:35,447 ERROR [HiveServer2-Background-Pool: Thread-3429([])]: SessionState (SessionState.java:printError(1063)) - Vertex failed, vertexName=Map 1, vertexId=vertex_1474442135017_0134_2_00, diagnostics=[Task failed, taskId=task_1474442135017_0134_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1474442135017_0134_2_00_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.io.FileNotFoundException: No such file or directory 's3://data-platform-insights/data-platform/internal_test_automation/2016/09/21/18364/logs/validations/table_col_aggregate_validation_result/.hive-staging_hive_2016-09-21_11-57-58_703_5106478639780932144-1/_tmp.-ext-10000/000000_0.gz' at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:198) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:160) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at 
org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: java.io.IOException: java.io.FileNotFoundException: No such file or directory 's3://data-platform-insights/data-platform/internal_test_automation/2016/09/21/18364/logs/validations/table_col_aggregate_validation_result/.hive-staging_hive_2016-09-21_11-57-58_703_5106478639780932144-1/_tmp.-ext-10000/000000_0.gz' at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:360) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:172) ... 14 more Caused by: java.io.IOException: java.io.FileNotFoundException: No such file or directory 's3://data-platform-insights/data-platform/internal_test_automation/2016/09/21/18364/logs/validations/table_col_aggregate_validation_result/.hive-staging_hive_2016-09-21_11-57-58_703_5106478639780932144-1/_tmp.-ext-10000/000000_0.gz' at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:299) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) ... 19 more Caused by: java.io.FileNotFoundException: No such file or directory 's3://data-platform-insights/data-platform/internal_test_automation/2016/09/21/18364/logs/validations/table_col_aggregate_validation_result/.hive-staging_hive_2016-09-21_11-57-58_703_5106478639780932144-1/_tmp.-ext-10000/000000_0.gz' at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:818) at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.open(S3NativeFileSystem.java:1193) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:771) at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.open(EmrFileSystem.java:168) at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109) at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:297) ... 
20 more ], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : attempt_1474442135017_0134_2_00_000000_1:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.io.FileNotFoundException: No such file or directory 's3://data-platform-insights/data-platform/internal_test_automation/2016/09/21/18364/logs/validations/table_col_aggregate_validation_result/.hive-staging_hive_2016-09-21_11-57-58_703_5106478639780932144-1/_tmp.-ext-10000/000000_0.gz' at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:198) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:160) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: java.io.IOException: java.io.FileNotFoundException: No such file or directory 's3://data-platform-insights/data-platform/internal_test_automation/2016/09/21/18364/logs/validations/table_col_aggregate_validation_result/.hive-staging_hive_2016-09-21_11-57-58_703_5106478639780932144-1/_tmp.-ext-10000/000000_0.gz' at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:360) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:172) ... 14 more Caused by: java.io.IOException: java.io.FileNotFoundException: No such file or directory 's3://data-platform-insights/data-platform/internal_test_automation/2016/09/21/18364/logs/validations/table_col_aggregate_validation_result/.hive-staging_hive_2016-09-21_11-57-58_703_5106478639780932144-1/_tmp.-ext-10000/000000_0.gz' at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:299) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) ... 
19 more Caused by: java.io.FileNotFoundException: No such file or directory 's3://data-platform-insights/data-platform/internal_test_automation/2016/09/21/18364/logs/validations/table_col_aggregate_validation_result/.hive-staging_hive_2016-09-21_11-57-58_703_5106478639780932144-1/_tmp.-ext-10000/000000_0.gz' at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:818) at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.open(S3NativeFileSystem.java:1193) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:771) at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.open(EmrFileSystem.java:168) at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109) at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:297) ... 20 more ], TaskAttempt 2 failed, info=[Error: Error while running task ( failure ) : attempt_1474442135017_0134_2_00_000000_2:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.io.FileNotFoundException: No such file or directory 's3://data-platform-insights/data-platform/internal_test_automation/2016/09/21/18364/logs/validations/table_col_aggregate_validation_result/.hive-staging_hive_2016-09-21_11-57-58_703_5106478639780932144-1/_tmp.-ext-10000/000000_0.gz' at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:198) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:160) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: java.io.IOException: java.io.FileNotFoundException: No such file or directory 's3://data-platform-insights/data-platform/internal_test_automation/2016/09/21/18364/logs/validations/table_col_aggregate_validation_result/.hive-staging_hive_2016-09-21_11-57-58_703_5106478639780932144-1/_tmp.-ext-10000/000000_0.gz' at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:360) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:172) 
... 14 more Caused by: java.io.IOException: java.io.FileNotFoundException: No such file or directory 's3://data-platform-insights/data-platform/internal_test_automation/2016/09/21/18364/logs/validations/table_col_aggregate_validation_result/.hive-staging_hive_2016-09-21_11-57-58_703_5106478639780932144-1/_tmp.-ext-10000/000000_0.gz' at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:299) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:203) ... 19 more Caused by: java.io.FileNotFoundException: No such file or directory 's3://data-platform-insights/data-platform/internal_test_automation/2016/09/21/18364/logs/validations/table_col_aggregate_validation_result/.hive-staging_hive_2016-09-21_11-57-58_703_5106478639780932144-1/_tmp.-ext-10000/000000_0.gz' at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:818) at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.open(S3NativeFileSystem.java:1193) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:771) at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.open(EmrFileSystem.java:168) at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109) at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:297) ... 20 more ], TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : attempt_1474442135017_0134_2_00_000000_3:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.io.FileNotFoundException: No such file or directory 's3://data-platform-insights/data-platform/internal_test_automation/2016/09/21/18364/logs/validations/table_col_aggregate_validation_result/.hive-staging_hive_2016-09-21_11-57-58_703_5106478639780932144-1/_tmp.-ext-10000/000000_0.gz' at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:198) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:160) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: java.io.IOException: java.io.FileNotFoundException: No such file or directory 
's3://data-platform-insights/data-platform/internal_test_automation/2016/09/21/18364/logs/validations/table_col_aggregate_validation_result/.hive-staging_hive_2016-09-21_11-57-58_703_5106478639780932144-1/_tmp.-ext-10000/000000_0.gz' at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:360) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:172) ... 14 more

<strong>Hive logs with DEBUG mode enabled -</strong>

The exceptions are highlighted in green in the document linked below.

As per my understanding, just before the exception the file was renamed to some other name, and all of this happens in S3. Since S3 is eventually consistent, that is why it sometimes shows this exception and sometimes works fine.

<a href="https://docs.google.com/document/d/1cwXVqQ3p-xPFcBqU9AuD7C8z8rHjhUIHwPjY-nVpFK0/edit?usp=sharing" rel="nofollow">https://docs.google.com/document/d/1cwXVqQ3p-xPFcBqU9AuD7C8z8rHjhUIHwPjY-nVpFK0/edit?usp=sharing</a>

I also set the following Hive configuration properties before executing the query -

set hive.mapjoin.smalltable.filesize = 2000000000
set mapreduce.map.speculative = false
set mapreduce.output.fileoutputformat.compress = true
set hive.exec.compress.output = true
set mapreduce.task.timeout = 6000000
set hive.optimize.bucketmapjoin.sortedmerge = true
set io.compression.codecs = org.apache.hadoop.io.compress.GzipCode
set hive.auto.convert.sortmerge.join.noconditionaltask = false
set hive.optimize.bucketmapjoin = true
set hive.exec.compress.intermediate = true
set hive.enforce.bucketmapjoin = true
set mapred.output.compress = true
set mapreduce.map.output.compress = true
set hive.auto.convert.sortmerge.join = false
set hive.auto.convert.join = false
set mapreduce.reduce.speculative = false
set mapred.output.compression.codec = org.apache.hadoop.io.compress.GzipCodec
set hive.cache.expr.evaluation=false
set mapred.output.compress=true
set hive.exec.compress.output=true
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec
set hive.exec.compress.intermediate=true
set mapreduce.map.output.compress=true
set hive.auto.convert.join=false
set mapreduce.map.speculative=false
set mapreduce.reduce.speculative=false

<strong>Cluster details -</strong>

<ol><li>One data node with 32 GB of disk space.</li> <li>Hive - 2.1.0, execution engine - Tez 0.8.3</li> <li>Hadoop - 2.7.2</li> </ol>

<strong>Questions-</strong>

<ol><li>Why is it throwing LeaseExpiredException?</li> <li>Is the Hive query failure related to the LeaseExpiredException?</li> <li>Is it because of wrong Hive configuration properties?</li> </ol>

<strong>Update-1</strong>

As per this answer - <a href="https://stackoverflow.com/questions/26842933/leaseexpiredexception-no-lease-error-on-hdfs-failed-to-close-file" rel="nofollow">LeaseExpiredException: No lease error on HDFS (Failed to close file)</a>,

I added

SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;

But it is still showing the same exception.

Answer1:

I resolved the issue. Let me explain in detail.

The exceptions that are coming up -

<ol><li>LeaseExpiredException - from the HDFS side.</li> <li>FileNotFoundException - from the Hive side (when the Tez execution engine executes the DAG)</li> </ol>

<strong>Problem scenario-</strong>

<ol><li>We had just upgraded the Hive version from 0.13.0 to 2.1.0, and everything was working fine with the previous version. Zero runtime exceptions.</li> </ol>

<strong>Different thoughts to resolve the issue -</strong>

<ol><li>

The first thought was that two task attempts were working on the same file because of speculative execution. But, as per the settings below,

set mapreduce.map.speculative=false
set mapreduce.reduce.speculative=false

</li> </ol>

that was not possible.
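(As a quick sanity check, SET with just the property name prints the current value in the Hive CLI/Beeline, so it is easy to confirm speculation was really off for the session; since the engine is Tez, the Tez-side flag is worth checking too - property name as I recall it, so verify against your Tez version.)

set mapreduce.map.speculative;
set mapreduce.reduce.speculative;
set tez.am.speculation.enabled;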

<ol start="2"><li>

Then I increased the limit from 1000 to 100000 for the settings below -

SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;

</li> </ol>

that also didn't work.

<ol start="3"><li>

Then the third thought was that, within the same process, the file that mapper-1 had created was deleted by another mapper/reducer. But we did not find any such logs in the HiveServer2 or Tez logs.

</li> <li>

Finally, the root cause lies in the application layer code itself. In the hive-exec 2.1.0 version, a new configuration property was introduced -

"hive.exec.stagingdir":".hive-staging"

</li> </ol>

Description of the above property -

<blockquote>

Directory name that will be created inside table locations in order to support HDFS encryption. This replaces ${hive.exec.scratchdir} for query results with the exception of read-only tables. In all cases ${hive.exec.scratchdir} is still used for other temporary files, such as job plans.

</blockquote>
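Given that description, the staging directory can be pointed somewhere outside the table location. A minimal sketch, assuming the query runs from the Hive CLI/Beeline (the target path is illustrative; also note that, depending on the Hive version, the value is resolved against the filesystem of the target table, so for an S3-backed table the staging directory may still end up in the bucket, just outside the table prefix - test against your build, or set it in hive-site.xml to make it permanent):

SET hive.exec.stagingdir=/tmp/hive/.hive-staging;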

So if there are concurrent jobs in the application layer code (ETL) doing operations (rename/delete/move) on the same table, it may lead to this problem.

And in our case, 2 concurrent jobs were doing an "INSERT OVERWRITE" on the same table, which led to the staging file of one mapper being deleted, and that is what caused this issue.
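To make the race concrete, this is the shape of the pattern that triggered it for us; the statements below are illustrative, not our real ETL code. Each INSERT OVERWRITE creates and later cleans up a .hive-staging directory under the same table location, so whichever job finishes first can delete staging files that the other job's mappers are still reading:

-- ETL job A (session 1)
INSERT OVERWRITE TABLE validation_logs
SELECT * FROM table_level_validation_result;

-- ETL job B (session 2), started while job A is still running
INSERT OVERWRITE TABLE validation_logs
SELECT * FROM table_level_validation_result;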

Resolution -

<ol><li>Move the staging directory location outside the table location (the table lies in S3), as shown in the example above.</li> <li>Disable HDFS encryption (as mentioned in the description of the stagingdir property).</li> <li>Change your application layer code to avoid the concurrency issue (see the sketch below).</li> </ol>
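For point 3, the simplest change on our side was to stop launching the two jobs against the same table in parallel. A hedged sketch at the orchestration level (script names are hypothetical):

# before: both jobs were launched in parallel against the same table
# hive -f load_validation_part_a.hql &
# hive -f load_validation_part_b.hql &
# wait

# after: run them sequentially, so only one job owns the table's .hive-staging directory at a time
hive -f load_validation_part_a.hql
hive -f load_validation_part_b.hql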

Related question - <a href="https://stackoverflow.com/questions/39547001/why-hive-staging-file-is-missing-in-aws-emr/39715837#39715837" rel="nofollow">Why hive_staging file is missing in AWS EMR</a>
