Last week I blogged about how often certain web servers are used on the public internet. Here is the script I used to collect that data. I also used it to try out async network programming, coroutines, closures and multi-threading in Python, and to test the scalability of my operating systems (Darwin and Linux) and of tornado. It wasn't a well-defined test, but Darwin died at 10'000 concurrent connections, while Linux easily managed 80'000 connections on the same hardware.
The most important rule of async programming: Never ever block!
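To make the rule concrete, here is a minimal sketch of what blocking does to an event loop. It is not taken from the script: it assumes Python 3.7 or newer and uses the standard-library asyncio instead of tornado so that it runs without extra packages, and the names `heartbeat`, `blocking_task`, `cooperative_task` and `demo` are made up for illustration. The heartbeat stands in for all the other connections the loop is supposed to serve.

```python
import asyncio
import time


async def heartbeat(log, ticks=3, interval=0.05):
    """Record a tick every `interval` seconds (stand-in for other connections)."""
    for i in range(ticks):
        log.append("tick %d" % i)
        await asyncio.sleep(interval)


async def blocking_task(log, delay=0.2):
    """Wrong: time.sleep() stalls the whole event loop, nothing else can run."""
    log.append("task start")
    time.sleep(delay)
    log.append("task end")


async def cooperative_task(log, delay=0.2):
    """Right: awaiting hands control back to the loop while we wait."""
    log.append("task start")
    await asyncio.sleep(delay)
    log.append("task end")


async def run(task):
    """Run `task` and the heartbeat on the same loop and return the event log."""
    log = []
    await asyncio.gather(task(log), heartbeat(log))
    return log


def demo(task):
    """Hypothetical helper: run one variant on a fresh event loop."""
    return asyncio.run(run(task))


if __name__ == "__main__":
    print(demo(blocking_task))     # heartbeat only starts after the task is done
    print(demo(cooperative_task))  # heartbeat ticks while the task is waiting
```

With the blocking variant the heartbeat cannot even start until the task has finished. With the cooperative variant all ticks land between the start and the end of the task, and that interleaving is what lets a single thread keep thousands of connections alive.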
# WARNING
#
# If you use this script, your ISP might think you've got a trojan
# and sandbox you, ban you or whatever they think is appropriate.
#
# This script collects the Monte Carlo web-server statistics by
# connecting to random web servers and asking each one for its name.
# The results are stored in a dictionary with each identification string
# as key and the count of web servers found as value.
#
# If you want to test the maximum speed / concurrent connections
# remove these lines
#     if hcount > 10000:
#         time.sleep(1)
# and run a process per core on your machine. Processes have to have
# different working directories!
#
# Features:
#
# * Defining maximum number of concurrent connections. This is important
#   for OS X and maybe other BSD based systems. They tend to lock up beyond
#   9000 connections. I even had random reboots on OS X.
# * Linux on the other hand just scales and scales and scales. ;-)
# * I was able to maintain 80'000 connections on Linux with four processes
#   -> Then I hit the limit of the upstream bandwidth at home.
# * It only tries to access valid IPs (i.e. ignores private IPs)
# * It dumps snapshots of the collected data every 5000 successful connections
# * It uses tornado's supercool read_until_regex function
# * IPs are fed to the ioloop by a separate thread
# * It properly cleans up used connections after 6 seconds
#   -> To make the script faster you can reduce this timeout, although then
#   you might miss some slow servers/connections.
# * It locks shared data structures.
# * I used tornado.gen to write async code as a single function using
#   coroutines. Coroutines are one reason I love lua and python!
#   Async code gets so much more readable!
# * It's not tested on python2; use python3.2 or higher
# * Use 3to2-3.x to convert the iptools module
#     3to2-3.2 -w
#     python setup.py install
# !! CONFIGURE YOUR OS to the maximum concurrent connections you want
#    to test. If the hard limit is already high enough the script will
#    set a limit of 10240.
# * Remove the resource.setrlimit code if your OS doesn't support it.
# * It uses closures to pass settings to callbacks
# * I hope the tornado.iostream methods are threadsafe. In a production system
#   you should definitely move these calls to the main thread.
