Apache Flume - send only new file contents

Question:

I am a very new user to Flume, please treat me as an absolute noob. I am having a minor issue configuring Flume for a particular use case and was hoping you could assist. Note that I am not using HDFS, which is why this question is different from others you may have seen on forums.

I have two virtual machines (VMs) connected to each other through an internal network in Oracle VirtualBox. My goal is to have one VM watch a particular directory that will only ever have one file in it. When the file is changed, I want Flume to send only the new lines/data. I want the other VM to receive this data and append it to a single file in a particular directory on it.

So far, I have this process very close to working. Whenever changes are made in VM1, they are updated on VM2. However, the entire file on VM1 is sent to VM2 every time, not the new lines. For example, if I wrote “Test1” and then a while later underneath wrote “Test2” to the file on VM1, on VM2 the output would be:

Test1

Test1

Test2

What I want to see is:

Test1

Test2

I am not sure how to implement this, and am sending this email after thoroughly examining the Flume user guide documentation and the most relevant articles on stackoverflow/stackexchange. For your reference, below are the current configurations (they are working in the manner I mentioned above).

<a href="https://i.stack.imgur.com/a0FQ1.png" rel="nofollow">VM1 configuration</a>

<a href="https://i.stack.imgur.com/S3rDq.png" rel="nofollow">VM2 configuration</a>

I realize another solution would be to keep the configuration on VM1 and overwrite the file on VM2 every time new contents are detected. However, I am also unsure how to implement this.

Any assistance you could provide is greatly appreciated!

Answer1:

Use the Taildir source provided in Flume. It periodically writes the last position it has read to a position file, which makes it more reliable than the exec source: if the agent crashes or stops for some reason, it resumes reading from the last position saved in the position file, so only the new lines are sent.

agent1.sources.src1.type = TAILDIR
agent1.sources.src1.channels = ch1
agent1.sources.src1.filegroups = f1
agent1.sources.src1.filegroups.f1 = //path to log file
agent1.sources.src1.maxBackoffSleep = 10000

Set maxBackoffSleep according to your needs. It is the maximum time (in milliseconds) the agent waits before polling the log file for changes again when the previous attempt did not find any.
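For context, here is a rough sketch of how the Taildir source might fit into the two-VM setup from the question. The original configurations were only posted as images, so everything beyond the five source lines above is an assumption: the agent, channel and sink names, the file paths, the IP address and port are all hypothetical, and the Avro sink/source pair and the file_roll sink are just one plausible way to move events from VM1 to VM2. The two agents would live in separate configuration files, one per VM.

```properties
# ---- VM1 (hypothetical agent "agent1"): tail the file, forward new lines ----
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

agent1.sources.src1.type = TAILDIR
agent1.sources.src1.channels = ch1
agent1.sources.src1.filegroups = f1
# Absolute path of the single watched file (hypothetical path).
agent1.sources.src1.filegroups.f1 = /home/user/watched/data.log
# File where the last-read offset is saved, so a restart resumes from there
# (hypothetical path).
agent1.sources.src1.positionFile = /home/user/flume/taildir_position.json
agent1.sources.src1.maxBackoffSleep = 10000

agent1.channels.ch1.type = memory

# Assumed transport: Avro sink pointing at VM2 (hypothetical address and port).
agent1.sinks.sink1.type = avro
agent1.sinks.sink1.channel = ch1
agent1.sinks.sink1.hostname = 192.168.56.102
agent1.sinks.sink1.port = 4545

# ---- VM2 (hypothetical agent "agent2"): receive events, append to one file ----
agent2.sources = src2
agent2.channels = ch2
agent2.sinks = sink2

agent2.sources.src2.type = avro
agent2.sources.src2.channels = ch2
agent2.sources.src2.bind = 0.0.0.0
agent2.sources.src2.port = 4545

agent2.channels.ch2.type = memory

agent2.sinks.sink2.type = file_roll
agent2.sinks.sink2.channel = ch2
# Output directory on VM2 (hypothetical path).
agent2.sinks.sink2.sink.directory = /home/user/received
# 0 disables rolling, so all events are written to a single file.
agent2.sinks.sink2.sink.rollInterval = 0
```

With a layout like this, VM1 sends only lines appended since the saved position, and VM2 keeps writing them to one output file instead of starting a new file every 30 seconds (the file_roll default). Note that the Taildir source is available from Flume 1.7.0 onwards.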
