I'm relatively new to Python, coming from a .Net background.
The short version: How can I create an application-wide singleton or some other mechanism to allow multiple threads/processes to communicate with each other?
Maybe I'm spoiled, but in .Net I'd just create something in App_Start or somewhere else at the "application level". How can I do the same in Python/uWSGI?
The long version:
We've got a restful API written using Django.
Some of the calls require some pre-processing and then passing to a back-end system which performs long-running operations.
The flow at the moment looks something like...
- Receive request to process all documents matching given criteria
- API determines which documents match those criteria (on the order of 100,000 - takes 15-20s)
- API generates a uuid for this batch request
- API publishes a message to a back-end queue for each of those documents, referencing the batch id
- API listens on a different queue for "completed" messages and counts successes/failures for each batch id (~1-15 minutes)
- While processing is happening, the UI can request an update for a specific batch id
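The publish steps above might be sketched as follows (a minimal sketch; the document lookup and the queue client are not shown, so the matching documents and a `publish` callable are passed in as parameters):

```python
import json
import uuid

def start_processing(documents, publish):
    """Tag a batch with a fresh uuid and publish one message per document.

    `documents` is the result of the (15-20s) criteria query; `publish` is
    a hypothetical callable that sends one message to the back-end queue.
    Returns the batch id so the caller can track completion counts.
    """
    batch_id = str(uuid.uuid4())
    for doc_id in documents:
        publish(json.dumps({"batch_id": batch_id, "doc_id": doc_id}))
    return batch_id
```

The back-end workers echo the batch id in their "completed" messages, which is what lets a single listener attribute each result to the right batch.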
We need to listen to the response queue on a different thread from the one used to serve pages, since the consumer sits in a blocking wait loop:

```python
while True:
    self.channel.wait()
```
I was handling this by getting a reference to a QueueManager singleton. The manager fires off the initial request, records the batch id and then, on a second thread, monitors the response queue and updates local state.
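A sketch of that pattern, assuming nothing about the real broker (an internal `queue.Queue` stands in for the "completed" response queue, and the per-document publishing is elided, so the sketch is self-contained):

```python
import queue
import threading

class QueueManager:
    """Process-wide singleton: starts batches and tracks completion counts."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        with cls._lock:
            if cls._instance is None:
                inst = super().__new__(cls)
                inst._init()
                cls._instance = inst
        return cls._instance

    def _init(self):
        self.batches = {}               # batch_id -> {"success": n, "failed": n}
        self.responses = queue.Queue()  # stand-in for the response queue
        threading.Thread(target=self._listen, daemon=True).start()

    def start_batch(self, batch_id):
        # The real code would also publish one message per document here.
        self.batches[batch_id] = {"success": 0, "failed": 0}

    def status(self, batch_id):
        return self.batches.get(batch_id)

    def _listen(self):
        # Single consumer thread: the only writer to the per-batch counters,
        # so no locking is needed around the increments.
        while True:
            batch_id, outcome = self.responses.get()  # blocks; no spin loop
            self.batches[batch_id][outcome] += 1
            self.responses.task_done()
```

Because consumption happens on one dedicated thread, request threads only ever read `batches`, which is what makes the single-consumer design safe within one process.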
We don't actually care about preserving the state long-term: if the messages are on the queue, the back-end will do the processing, and the state monitoring is only a cue to the user that things are underway. If they browse away, they lose access to the status (the batch id is stored in-memory in JS).
This had a couple of benefits: we avoided using a database for syncing information (and the associated cleanup), and with a single thread as the message consumer we didn't have to worry about concurrency issues, since only one thread ever collects messages and updates the local state.
So... now it's time to run it under uWSGI, and I've found a major issue. If I set the number of processes to 1, the singleton works as expected, but all requests are blocked during the 15-20s the API spends processing data. Clearly that's unacceptable. Conversely, if I spin up multiple workers, each gets its own singleton and its own message listener, so it's pretty much random whether the publisher and consumer end up in the same process. And even if they do, the request for a status update probably won't land on that same process.
How can I swap state information between multiple workers? Is there a way to use multiple threads instead of multiple workers?
It seems like I really need:
- n threads, each serving requests
- 1 thread listening on the queue
- some in-memory way of communicating between them
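Under uWSGI, one way to get exactly that shape is a single process with multiple request threads (an illustrative ini fragment; the module name is an assumption):

```ini
[uwsgi]
module = myapp.wsgi:application  ; assumed WSGI entry point
processes = 1                    ; one process, so the in-memory singleton is shared
threads = 8                      ; n threads, each serving requests
enable-threads = true            ; allow the app to spawn its own listener thread
```

With one process the singleton and its listener thread are visible to every request thread, at the cost of no process-level parallelism.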
Note I've already got --enable-threads, but that only seems to apply to new threads I spawn myself (no idea why that isn't enabled by default).
To spawn multiple threads just add --threads N where N is the number of threads to spawn.
Note that this only works as you intend if you have a single worker/process. Another common approach is the uWSGI caching framework (the name is misleading; in fact it is a shared dictionary). It allows you to share data between workers and threads.
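A sketch of sharing batch status through the uWSGI cache (this assumes a cache configured as, e.g., --cache2 "name=batches,items=1000"; the `uwsgi` module is only importable inside a uWSGI process, so a plain dict stands in when running elsewhere, purely for illustration):

```python
import json

try:
    import uwsgi  # only available when running under uWSGI

    def set_status(batch_id, status):
        # expires=0 means no expiry; "batches" is the assumed cache2 name
        uwsgi.cache_update(batch_id, json.dumps(status).encode(), 0, "batches")

    def get_status(batch_id):
        raw = uwsgi.cache_get(batch_id, "batches")
        return json.loads(raw) if raw else None

except ImportError:
    _store = {}  # fallback for running outside uWSGI

    def set_status(batch_id, status):
        _store[batch_id] = json.dumps(status)

    def get_status(batch_id):
        raw = _store.get(batch_id)
        return json.loads(raw) if raw else None
```

Because the cache lives in shared memory, the worker whose listener thread consumes a "completed" message can write the counts, and any other worker can serve the UI's status request for the same batch id.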