
Debugging Hadoop in Eclipse

Is it possible to debug Hadoop's own source code in Eclipse? I'm not asking about debugging my MapReduce tasks. I want to see which part of the Hadoop source code is responsible for scheduling the MapReduce tasks and how it works. Is there any mechanism for doing this?

Answer1:

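You can download the Hadoop source, import it into Eclipse as a project, and step through it with the debugger. Eclipse provides these stepping commands:

<ol> <li>F5 (Step Into): executes the current line and, if it calls a method, enters that method.</li> <li>F6 (Step Over): executes the current line, including any method calls on it, and stops at the next line.</li> <li>F7 (Step Return): runs to the end of the current method and stops back in the caller.</li> <li>F8 (Resume): continues execution until the next breakpoint or the end of the program.</li> </ol>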
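To get a feel for these keys before opening the Hadoop source, a tiny program like the one below can be stepped through first. The class and method names are made up for illustration; the comments mark where each key behaves differently.

```java
class StepDemo {

    static int square(int x) {
        // Pressing F7 (Step Return) here finishes square() and stops
        // back in sumOfSquares(), on the line that called it.
        int result = x * x;
        return result;
    }

    static int sumOfSquares(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            // Set a breakpoint on the next line.
            // F5 (Step Into) enters square(); F6 (Step Over) runs square()
            // and stops on the next line executed in this method.
            total += square(i);
        }
        return total;
    }

    public static void main(String[] args) {
        // F8 (Resume) at any point runs on to the next breakpoint or to the end.
        System.out.println(sumOfSquares(3));
    }
}
```

Run it with Debug As > Java Application so the breakpoint is honored.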

Alternatively, you can work out the workflow yourself by stepping through the code line by line, starting from the run() method that your main class calls.

To answer your question about what schedules the map tasks: <img src="https://i.stack.imgur.com/cXBny.png" alt="">

As the diagram shows, the InputFormat class divides the input files into fixed-size pieces called InputSplits. Each split is then handed to a mapper, that is, to a map task running on the node it was assigned to.

The same InputFormat class also provides a RecordReader, which parses the split and extracts records from it. Each record is passed to the map function as a (key, value) pair, so the Mapper class is the one that calls the map method.

Here is the workflow of the wordcount example:

<img src="https://i.stack.imgur.com/IssNF.png" alt="enter image description here">

Here, FileInputFormat is an abstract class that extends the abstract class InputFormat, and TextInputFormat extends FileInputFormat.
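The split, record reader, and mapper steps described above can be imitated in plain Java without any Hadoop dependency. The following is only a simplified model, not Hadoop's actual code: the class MapFlowSketch and all of its methods are hypothetical stand-ins, the input is an in-memory string rather than a file, and the counts are summed inside the map step instead of in a separate reducer. It is loosely modeled on how a line-oriented record reader copes with lines that straddle a split boundary.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapFlowSketch {

    // Stand-in for InputFormat.getSplits(): cut the input into [start, end)
    // character ranges of at most splitSize characters each.
    static List<int[]> getSplits(String data, int splitSize) {
        List<int[]> splits = new ArrayList<>();
        for (int start = 0; start < data.length(); start += splitSize) {
            splits.add(new int[] {start, Math.min(start + splitSize, data.length())});
        }
        return splits;
    }

    // Stand-in for a RecordReader: turns one split into (offset, line) records.
    // Every split except the first discards everything up to and including the
    // first newline at or after its start. To compensate, each split keeps
    // reading whole lines as long as they begin at or before its end offset.
    static List<Map.Entry<Integer, String>> readRecords(String data, int start, int end) {
        List<Map.Entry<Integer, String>> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            int nl = data.indexOf('\n', start);
            pos = (nl < 0) ? data.length() : nl + 1;
        }
        while (pos <= end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl < 0) ? data.length() : nl;
            records.add(new SimpleEntry<>(pos, data.substring(pos, lineEnd)));
            pos = lineEnd + 1;
        }
        return records;
    }

    // Stand-in for the user's map function in word count. A real job would emit
    // (word, 1) pairs and leave the summing to a reducer; here it is done inline.
    static void map(int offset, String line, Map<String, Integer> output) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                output.merge(word, 1, Integer::sum);
            }
        }
    }

    // Stand-in for the mapper's run loop: call map() once per record.
    static void runMapper(List<Map.Entry<Integer, String>> records, Map<String, Integer> output) {
        for (Map.Entry<Integer, String> record : records) {
            map(record.getKey(), record.getValue(), output);
        }
    }

    // Drive the whole flow: splits -> records -> map calls.
    static Map<String, Integer> countWords(String data, int splitSize) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int[] split : getSplits(data, splitSize)) {
            runMapper(readRecords(data, split[0], split[1]), counts);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("hello world\nhello hadoop\nbye\n", 10));
    }
}
```

Setting a breakpoint in readRecords and stepping through it with a small split size shows each line being read exactly once, even when a split boundary falls in the middle of a line.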

Answer2:

Here are instructions from the Apache Hadoop documentation. I haven't tried them out, but they should be good enough to get started.
