Jonathan Stewmon
2014-07-14 23:50:23 UTC
I suspect this is a supervisord bug, but wanted to ask the mailing list
before filing an issue in case there is some problem with my configuration
or use of supervisord.
I am using supervisord to manage RabbitMQ consumers written in Python. I
have a total of 8 programs, all using multiple processes, for a total of 50
processes. Everything is working as expected with one exception: if a
large number of child processes end simultaneously, supervisord itself
crashes with the error "IOError: [Errno 4] Interrupted system call".
I can reproduce the error by sending TERM to the child processes or by
restarting the RabbitMQ server to which they are connected (a real-world
scenario).
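For anyone trying to reproduce this, the first trigger (sending TERM to many children at once) could be sketched roughly as below. This is only an illustration: `terminate_all` is a hypothetical helper, not part of supervisor, and the list of child PIDs is assumed to have been collected separately.

```python
import os
import signal


def terminate_all(pids, sig=signal.SIGTERM):
    """Send `sig` to every PID in `pids` in a tight loop.

    Hypothetical helper for illustration only. Returns the list of PIDs
    that could not be signalled (for example, already exited).
    """
    failed = []
    for pid in pids:
        try:
            os.kill(pid, sig)
        except OSError:
            # The process is gone or we lack permission; note it and move on.
            failed.append(pid)
    return failed
```

Signalling all children back to back makes them exit at nearly the same moment, which is the condition described above.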
The top of the stack varies, but the last 5 calls are always the same.
Below is an example:
    load_entry_point('supervisor==3.0', 'console_scripts', 'supervisord')()
  File "/Users/jstewmon/.virtualenvs/pyjobs/lib/python2.7/site-packages/supervisor/supervisord.py", line 360, in main
    go(options)
  File "/Users/jstewmon/.virtualenvs/pyjobs/lib/python2.7/site-packages/supervisor/supervisord.py", line 370, in go
    d.main()
  File "/Users/jstewmon/.virtualenvs/pyjobs/lib/python2.7/site-packages/supervisor/supervisord.py", line 83, in main
    self.run()
  File "/Users/jstewmon/.virtualenvs/pyjobs/lib/python2.7/site-packages/supervisor/supervisord.py", line 100, in run
    self.runforever()
  File "/Users/jstewmon/.virtualenvs/pyjobs/lib/python2.7/site-packages/supervisor/supervisord.py", line 252, in runforever
    [ group.transition() for group in pgroups ]
  File "/Users/jstewmon/.virtualenvs/pyjobs/lib/python2.7/site-packages/supervisor/process.py", line 697, in transition
    proc.transition()
  File "/Users/jstewmon/.virtualenvs/pyjobs/lib/python2.7/site-packages/supervisor/process.py", line 560, in transition
    logger.info('success: %s %s' % (self.config.name, msg))
  File "/Users/jstewmon/.virtualenvs/pyjobs/lib/python2.7/site-packages/supervisor/loggers.py", line 273, in info
    self.log(LevelsByName.INFO, msg, **kw)
  File "/Users/jstewmon/.virtualenvs/pyjobs/lib/python2.7/site-packages/supervisor/loggers.py", line 291, in log
    handler.emit(record)
  File "/Users/jstewmon/.virtualenvs/pyjobs/lib/python2.7/site-packages/supervisor/loggers.py", line 86, in emit
    self.handleError(record)
  File "/Users/jstewmon/.virtualenvs/pyjobs/lib/python2.7/site-packages/supervisor/loggers.py", line 90, in handleError
    traceback.print_exception(ei[0], ei[1], ei[2], None, sys.stderr)
  File "/usr/local/Cellar/python/2.7.7_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/traceback.py", line 124, in print_exception
    _print(file, 'Traceback (most recent call last):')
  File "/usr/local/Cellar/python/2.7.7_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/traceback.py", line 13, in _print
    file.write(str+terminator)
IOError: [Errno 4] Interrupted system call
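One plausible reading of the trace, stated as an assumption rather than a confirmed diagnosis: a signal such as SIGCHLD arrives while the write to stderr is in progress, and the resulting EINTR is not retried. A minimal sketch of retrying such a write is below. `write_retrying_eintr` and `max_retries` are hypothetical names for illustration; this is not supervisor's code and not necessarily how a fix would look.

```python
import errno


def write_retrying_eintr(stream, text, max_retries=5):
    """Write `text` to `stream`, retrying when a signal interrupts the call.

    Hypothetical helper for illustration; not part of supervisor.
    Returns the number of retries that were needed.
    """
    retries = 0
    while True:
        try:
            stream.write(text)
            return retries
        except (IOError, OSError) as exc:
            # Errno 4 (EINTR) means a signal arrived during the system call.
            # Any other error, or too many retries, is re-raised unchanged.
            if exc.errno != errno.EINTR or retries >= max_retries:
                raise
            retries += 1
```

Note that blindly retrying could duplicate output if part of the text was already written before the interruption, so this only shows the general shape of the idea. On Python 3.5 and later the interpreter retries most interrupted system calls itself (PEP 475); the traceback above is from Python 2.7, where that is not the case.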
Below is the supervisord.conf I'm using (I did change the program names to
something generic):
[supervisord]
logfile = /var/log/wsm/pyjobs/supervisord.log
logfile_maxbytes = 50MB
logfile_backups = 5
loglevel = info
pidfile = /tmp/supervisord.pid
nodaemon = false
minfds = 1024
minprocs = 200
umask = 022
#directory = /home/pyjobs
#user = pyjobs
identifier = supervisor
nocleanup = true
childlogdir = /var/log/wsm/pyjobs
strip_ansi = false
[unix_http_server]
file=/tmp/supervisor.sock
[inet_http_server]
port=:9001
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[program:program1]
command=/var/lib/wsm/pyjobs/pyjobs-current/bin/python -m rmn.program1
process_name=%(program_name)s-%(process_num)s
numprocs=25
#directory=/home/pyjobs
#user=pyjobs
umask=022
autostart=true
autorestart=true
exitcodes=0
stopsignal=TERM
stopwaitsecs=30
stdout_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=5
stderr_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.error.log
stderr_logfile_maxbytes=50MB
stderr_logfile_backups=5
serverurl=AUTO
[program:program2]
command=/var/lib/wsm/pyjobs/pyjobs-current/bin/python -m rmn.program2
process_name=%(program_name)s-%(process_num)s
numprocs=5
#directory=/home/pyjobs
#user=pyjobs
umask=022
autostart=true
autorestart=true
exitcodes=0
stopsignal=TERM
stopwaitsecs=30
stdout_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=5
stderr_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.error.log
stderr_logfile_maxbytes=50MB
stderr_logfile_backups=5
serverurl=AUTO
[program:program3]
command=/var/lib/wsm/pyjobs/pyjobs-current/bin/python -m rmn.program3
process_name=%(program_name)s-%(process_num)s
numprocs=2
#directory=/home/pyjobs
#user=pyjobs
umask=022
autostart=true
autorestart=true
exitcodes=0
stopsignal=TERM
stopwaitsecs=30
stdout_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=5
stderr_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.error.log
stderr_logfile_maxbytes=50MB
stderr_logfile_backups=5
serverurl=AUTO
[program:program4]
command=/var/lib/wsm/pyjobs/pyjobs-current/bin/python -m rmn.program4
process_name=%(program_name)s-%(process_num)s
numprocs=2
#directory=/home/pyjobs
#user=pyjobs
umask=022
autostart=true
autorestart=true
exitcodes=0
stopsignal=TERM
stopwaitsecs=30
stdout_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=5
stderr_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.error.log
stderr_logfile_maxbytes=50MB
stderr_logfile_backups=5
serverurl=AUTO
[program:program5]
command=/var/lib/wsm/pyjobs/pyjobs-current/bin/python -m rmn.program5
process_name=%(program_name)s-%(process_num)s
numprocs=3
#directory=/home/pyjobs
#user=pyjobs
umask=022
autostart=true
autorestart=true
exitcodes=0
stopsignal=TERM
stopwaitsecs=30
stdout_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=5
stderr_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.error.log
stderr_logfile_maxbytes=50MB
stderr_logfile_backups=5
serverurl=AUTO
[program:program6]
command=/var/lib/wsm/pyjobs/pyjobs-current/bin/python -m rmn.program6
process_name=%(program_name)s-%(process_num)s
numprocs=3
#directory=/home/pyjobs
#user=pyjobs
umask=022
autostart=true
autorestart=true
exitcodes=0
stopsignal=TERM
stopwaitsecs=30
stdout_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=5
stderr_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.error.log
stderr_logfile_maxbytes=50MB
stderr_logfile_backups=5
serverurl=AUTO
[program:program7]
command=/var/lib/wsm/pyjobs/pyjobs-current/bin/python -m rmn.program7
process_name=%(program_name)s-%(process_num)s
numprocs=5
#directory=/home/pyjobs
#user=pyjobs
umask=022
autostart=true
autorestart=true
exitcodes=0
stopsignal=TERM
stopwaitsecs=30
stdout_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=5
stderr_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.error.log
stderr_logfile_maxbytes=50MB
stderr_logfile_backups=5
serverurl=AUTO
[program:program8]
command=/var/lib/wsm/pyjobs/pyjobs-current/bin/python -m rmn.program8
process_name=%(program_name)s-%(process_num)s
numprocs=5
#directory=/home/pyjobs
#user=pyjobs
umask=022
autostart=true
autorestart=true
exitcodes=0
stopsignal=TERM
stopwaitsecs=30
stdout_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=5
stderr_logfile=/var/log/wsm/pyjobs/%(program_name)s-%(process_num)s.error.log
stderr_logfile_maxbytes=50MB
stderr_logfile_backups=5
serverurl=AUTO
I'm using Python 2.7.7 (installed from Homebrew) in a virtualenv on OS X
Mavericks:
Python 2.7.7 (default, Jun 14 2014, 23:12:13)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Thanks,
Jonathan
--
This e-mail, including attachments, contains confidential and/or
proprietary information, and may be used only by the person or entity to
which it is addressed. The reader is hereby notified that any
dissemination, distribution or copying of this e-mail is prohibited. If you
have received this e-mail in error, please notify the sender by replying to
this message and delete this e-mail immediately.