Consider the following Python code:
<pre><code>import io
import time
import subprocess
import sys
from thread import start_new_thread

def ping_function(ip):
    filename = 'file.log'
    command = ["ping", ip]
    with io.open(filename, 'wb') as writer, io.open(filename, 'rb', 1) as reader:
        process = subprocess.Popen(command, stdout=writer)
        while process.poll() is None:
            line = reader.read()
            # Do something with line
            sys.stdout.write(line)
            time.sleep(0.5)
        # Read the remaining
        sys.stdout.write(reader.read())

ping_function("google.com")
</code></pre>
The goal is to run a shell command (in this case <strong>ping</strong>, but the specific command is not relevant here) and to process its output in real time, while also saving that output to a log file.
In other words, <strong>ping</strong> is running in the background and produces output on the terminal every second. My code reads this output (every 0.5 seconds), parses it and takes some action in (<em>almost</em>) real time.
Real time here means that I don't want to wait for the process to end before reading the output. In this case <strong>ping</strong> never completes, so an approach like the one I have just described is mandatory.
I have tested the code above and it actually works OK :)
Now I'd like to run this in a separate thread, so I have replaced the last line with the following:
<pre><code>from thread import start_new_thread
start_new_thread(ping_function, ("google.com", ))
</code></pre>
For some reason this no longer works: the string returned by <strong>reader.read()</strong> is always empty.
Using a Queue or another global variable is not going to help, because I am having trouble retrieving the data in the first place (i.e. obtaining the output of the shell command).
My questions are:<ul><li>
How can I explain this behavior?</li> <li>
Is it a good idea to run a process inside a separate thread, or should I use a different approach? <a href="http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them" rel="nofollow">This article</a> suggests that it is not...</li> <li>
How can I fix the code?</li> </ul>
You should never fork after starting threads. You can start threads after forking, so you can have a thread handle the I/O piping, but...
Let me repeat this: <strong>You should never fork after starting threads</strong>
That article explains it pretty well. Once you start threads, you no longer have control over the state of your program, especially in Python, where things are going on in the background.
To fix your code, just start the subprocess from the main thread, then start threading. It's perfectly OK to process the I/O from the pipes in a thread.
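As a rough illustration of that ordering, here is a minimal sketch, assuming Python 3 and the standard `threading` module in place of the old `thread` module. The helper names `run_logged` and `pump_output` and the `handle_line` callback are made up for this example; it also reads from a pipe and writes the log itself, rather than polling the log file as in the question.

```python
import subprocess
import sys
import threading


def pump_output(process, log_path, handle_line):
    """Runs in the worker thread: log each output line and hand it to handle_line."""
    with open(log_path, "wb") as log:
        # readline() returns b"" only at end of file, i.e. when the child closes stdout.
        for raw in iter(process.stdout.readline, b""):
            log.write(raw)
            log.flush()
            handle_line(raw.decode(errors="replace").rstrip("\r\n"))
    process.stdout.close()
    process.wait()


def run_logged(command, log_path, handle_line):
    """Hypothetical helper: start the child first, then the thread that reads its pipe."""
    # The subprocess is created here, in the calling thread, before any worker exists.
    process = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    # Only now is a thread started; it does nothing but I/O on the pipe.
    worker = threading.Thread(
        target=pump_output, args=(process, log_path, handle_line)
    )
    worker.start()
    return process, worker
```

Called from the main thread as `process, worker = run_logged(["ping", "google.com"], "file.log", print)`, this should print each line as it arrives while the main thread stays free. Since `ping` may never exit on its own, call `process.terminate()` followed by `worker.join()` when you want to stop.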