Context Managers And Multiprocessing Pools
Solution 1:
First, this is a really great question! After digging around a bit in the multiprocessing
code, I think I've found a way to do this:
When you start a multiprocessing.Pool, the Pool object internally creates a multiprocessing.Process object for each member of the pool. When those sub-processes are starting up, they call a _bootstrap function, which looks like this:
def _bootstrap(self):
    from . import util
    global _current_process
    try:
        # ... (stuff we don't care about)
        util._finalizer_registry.clear()
        util._run_after_forkers()
        util.info('child process calling self.run()')
        try:
            self.run()
            exitcode = 0
        finally:
            util._exit_function()
        # ... (more stuff we don't care about)
The run method is what actually runs the target you gave the Process object. For a Pool process, that's a method with a long-running while loop that waits for work items to come in over an internal queue. What's really interesting for us is what happens after self.run: util._exit_function() is called.
As it turns out, that function does some clean up that sounds a lot like what you're looking for:
def _exit_function(info=info, debug=debug, _run_finalizers=_run_finalizers,
                   active_children=active_children,
                   current_process=current_process):
    # NB: we hold on to references to functions in the arglist due to the
    # situation described below, where this function is called after this
    # module's globals are destroyed.
    global _exiting

    info('process shutting down')
    debug('running all "atexit" finalizers with priority >= 0')  # Very interesting!
    _run_finalizers(0)
Here's the docstring of _run_finalizers:
def _run_finalizers(minpriority=None):
    '''
    Run all finalizers whose exit priority is not None and at least minpriority

    Finalizers with highest priority are called first; finalizers with
    the same priority will be called in reverse order of creation.
    '''
The method actually runs through a list of finalizer callbacks and executes them:
items = [x for x in _finalizer_registry.items() if f(x)]
items.sort(reverse=True)

for key, finalizer in items:
    sub_debug('calling %s', finalizer)
    try:
        finalizer()
    except Exception:
        import traceback
        traceback.print_exc()
Perfect. So how do we get into the _finalizer_registry? There's an undocumented class called Finalize in multiprocessing.util that is responsible for adding a callback to the registry:
class Finalize(object):
    '''
    Class which supports object finalization using weakrefs
    '''
    def __init__(self, obj, callback, args=(), kwargs=None, exitpriority=None):
        assert exitpriority is None or type(exitpriority) is int

        if obj is not None:
            self._weakref = weakref.ref(obj, self)
        else:
            assert exitpriority is not None

        self._callback = callback
        self._args = args
        self._kwargs = kwargs or {}
        self._key = (exitpriority, _finalizer_counter.next())
        self._pid = os.getpid()

        _finalizer_registry[self._key] = self  # That's what we're looking for!
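To see the ordering described in the _run_finalizers docstring in action, here's a minimal sketch of my own (it leans on these private multiprocessing.util internals, so treat it as illustrative only):

from multiprocessing.util import Finalize, _run_finalizers

# Register three finalizers with different exit priorities
Finalize(None, print, args=("priority 20",), exitpriority=20)
Finalize(None, print, args=("priority 10, registered first",), exitpriority=10)
Finalize(None, print, args=("priority 10, registered second",), exitpriority=10)

# Run everything with priority >= 0, just like util._exit_function() does
_run_finalizers(0)
# Expected output:
#   priority 20
#   priority 10, registered second
#   priority 10, registered first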
Ok, so putting it all together into an example:
import multiprocessing
from multiprocessing.util import Finalize

resource_cm = None
resource = None


class Resource(object):
    def __init__(self, args):
        self.args = args

    def __enter__(self):
        print("in __enter__ of %s" % multiprocessing.current_process())
        return self

    def __exit__(self, *args, **kwargs):
        print("in __exit__ of %s" % multiprocessing.current_process())


def open_resource(args):
    return Resource(args)


def _worker_init(args):
    # Declare both globals so the context manager itself stays referenced too
    global resource_cm, resource
    print("calling init")
    resource_cm = open_resource(args)
    resource = resource_cm.__enter__()
    # Register a finalizer
    Finalize(resource, resource.__exit__, exitpriority=16)


def hi(*args):
    print("we're in the worker")


if __name__ == "__main__":
    pool = multiprocessing.Pool(initializer=_worker_init, initargs=("abc",))
    pool.map(hi, range(pool._processes))
    pool.close()
    pool.join()
Output:
calling init
in __enter__ of <Process(PoolWorker-1, started daemon)>
calling init
calling init
in __enter__ of <Process(PoolWorker-2, started daemon)>
in __enter__ of <Process(PoolWorker-3, started daemon)>
calling init
in __enter__ of <Process(PoolWorker-4, started daemon)>
we're in the worker
we're in the worker
we're in the worker
we're in the worker
in __exit__ of <Process(PoolWorker-1, started daemon)>
in __exit__ of <Process(PoolWorker-2, started daemon)>
in __exit__ of <Process(PoolWorker-3, started daemon)>
in __exit__ of <Process(PoolWorker-4, started daemon)>
As you can see, __exit__ gets called in all our workers when we join() the pool.
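If a worker needs several context managers, a variant of the same trick (my own sketch, not part of the original answer) is to enter them all through a contextlib.ExitStack and register the stack's close method as the single finalizer:

import contextlib
import multiprocessing
from multiprocessing.util import Finalize

stack = None


class Resource(object):
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        print("entering %s in %s" % (self.name, multiprocessing.current_process()))
        return self

    def __exit__(self, *exc):
        print("exiting %s in %s" % (self.name, multiprocessing.current_process()))


def _worker_init():
    global stack
    stack = contextlib.ExitStack()
    # Enter as many context managers as needed; ExitStack tracks them all
    stack.enter_context(Resource("db"))
    stack.enter_context(Resource("cache"))
    # One finalizer unwinds every context manager, in reverse order, on shutdown
    Finalize(stack, stack.close, exitpriority=16)


if __name__ == "__main__":
    pool = multiprocessing.Pool(2, initializer=_worker_init)
    pool.map(print, range(2))
    pool.close()
    pool.join()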
Solution 2:
You can subclass Process and override its run() method so that it performs cleanup before exit. Then you should subclass Pool so that it uses your subclassed process:
from multiprocessing import Process
from multiprocessing.pool import Pool


class SafeProcess(Process):
    """ Process that will cleanup before exit """

    def run(self, *args, **kw):
        result = super().run(*args, **kw)
        # cleanup however you want here
        return result


class SafePool(Pool):
    Process = SafeProcess


pool = SafePool(4)  # use it as a standard Pool
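One caveat of my own: on newer Python versions (3.8+, if I remember the change correctly), Pool.Process is a staticmethod that receives the multiprocessing context as its first argument, so the plain class-attribute override above may fail there. A hedged sketch of the adjustment:

from multiprocessing import Process
from multiprocessing.pool import Pool


class SafeProcess(Process):
    """ Process that will cleanup before exit """

    def run(self, *args, **kw):
        result = super().run(*args, **kw)
        # cleanup however you want here
        return result


class SafePool(Pool):
    @staticmethod
    def Process(ctx, *args, **kwds):
        # Python 3.8+ passes the context in; this sketch ignores it and
        # builds our subclass of the default-context Process instead.
        return SafeProcess(*args, **kwds)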
Solution 3:
Here is the solution I came up with. It uses billiard, a fork of Python's multiprocessing package. This solution requires the private API Worker._ensure_messages_consumed, so I DO NOT recommend using it in production. I just need this for a side project, so it's good enough for me. Use it at your own risk.
from billiard.pool import Pool, Worker


class SafeWorker(Worker):
    # this function is called just before a worker process exits
    def _ensure_messages_consumed(self, *args, **kwargs):
        # Not necessary, but you can move `Pool.initializer` logic here if you want.
        out = super()._ensure_messages_consumed(*args, **kwargs)
        # Do clean up work here
        return out


class SafePool(Pool):
    Worker = SafeWorker
Another solution I tried was to implement my clean up logic as a signal handler, but that does not work, since both multiprocessing and billiard use exit() to kill their worker processes. I'm not sure how atexit works, but this is probably the reason that approach does not work either.
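For what it's worth, my understanding is that the workers ultimately die via os._exit(), which skips atexit handlers entirely. A small sketch of my own illustrating that:

import atexit
import multiprocessing
import os


def handler():
    print("atexit handler ran")


def child():
    atexit.register(handler)
    os._exit(0)  # hard exit: registered atexit handlers are skipped


if __name__ == "__main__":
    p = multiprocessing.Process(target=child)
    p.start()
    p.join()
    # The child never prints "atexit handler ran".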