I collected the Monte Carlo web-server statistics by connecting to random web servers and asking each one for its name. I'll blog about this next time; it was quite exciting. I was maintaining 80'000 concurrent connections on Linux using Tornado's IOLoop when I hit the limit of my upstream bandwidth at home.
Download webservers.json.gz here. The file is a dictionary with each identification string as key and the count of web servers found as value.
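As a rough illustration of the file format described above, here is a minimal sketch that reads such a gzip-compressed JSON dictionary using only the standard library. The helper names `load_counts` and `top_servers` are made up for this example, and the UTF-8 encoding is an assumption.

```python
import gzip
import json


def load_counts(path):
    """Read a gzip-compressed JSON file and return its content as a dict."""
    # "rt" opens the decompressed stream in text mode so json can parse it.
    # The encoding is an assumption about how the file was written.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def top_servers(counts, n=10):
    """Return the n most frequent (identification string, count) pairs."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]
```

With the download in the working directory, `top_servers(load_counts("webservers.json.gz"))` should give the ten most frequent identification strings.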
Forgive me for using terms like "Monte Carlo": it sounds great as a blog title and I hope it is not completely and utterly wrong.
Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline].
For more information, type 'help(pylab)'.
Out[1]:
('Python',
'3.2.3 (default, Oct 19 2012, 19:53:16) \n[GCC 4.7.2]',
'Pandas',
'0.10.0')
The file is read into a pandas DataFrame. Version 0.10 of pandas seems to support literal indexes, which I believe is new. I add a column 'wtype' for the most common web-server types.
I always use adsy-python's display_html because it suppresses the creation of vertical scrollbars, which pandas adds by default.
Out[4]:
| | wcount | wtype |
| --- | --- | --- |
| RomPager/4.07 UPnP/1.0 | 37162 | other |
| Apache | 27085 | other |
| AkamaiGHost | 25167 | other |
| Microsoft-IIS/6.0 | 14432 | other |
| micro_httpd | 10862 | other |
| Microsoft-IIS/7.5 | 8838 | other |
| Apache/2.2.3 (CentOS) | 8336 | other |
| GoAhead-Webs | 7807 | other |
| nginx/1.0.11 | 5128 | other |
| Microsoft-IIS/7.0 | 3343 | other |
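A table like the one above could be built roughly as follows. This is a sketch, not the original notebook cell: it assumes a recent pandas release (where sorting is spelled `sort_values`), and `build_frame` is a hypothetical helper name. The sample dictionary holds three rows copied from the table.

```python
import pandas as pd


def build_frame(counts):
    """Turn a {identification string: count} dict into a sorted DataFrame."""
    # The dict keys become the index, the counts become the wcount column.
    frame = pd.DataFrame({"wcount": pd.Series(counts, dtype="int64")})
    # Every row starts out as "other" until a known server type is detected.
    frame["wtype"] = "other"
    # sort_values assumes a recent pandas release.
    return frame.sort_values("wcount", ascending=False)


# Three rows taken from the table above, used as sample data.
sample = {"Apache": 27085, "RomPager/4.07 UPnP/1.0": 37162, "AkamaiGHost": 25167}
frame = build_frame(sample)
print(frame.head(10))
```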
I detect the most common web servers and write the result to the wtype column; in the next step I'll group by this column.
Out[7]:
| | wcount | wtype |
| --- | --- | --- |
| RomPager/4.07 UPnP/1.0 | 37162 | rompager |
| Apache | 27085 | apache |
| AkamaiGHost | 25167 | akamai |
| Microsoft-IIS/6.0 | 14432 | iis |
| micro_httpd | 10862 | micro_httpd |
| Microsoft-IIS/7.5 | 8838 | iis |
| Apache/2.2.3 (CentOS) | 8336 | apache |
| GoAhead-Webs | 7807 | other |
| nginx/1.0.11 | 5128 | nginx |
| Microsoft-IIS/7.0 | 3343 | iis |
Now the beautiful pandas statement: first group by wtype and then sum wcount.
I can pass the DataFrame's wcount column directly to matplotlib. Note that the metallic piechart from my previous post is now part of adsy-python.
This is how the summed DataFrame looks:
Out[10]:
| wtype | wcount |
| --- | --- |
| apache | 79055 |
| other | 60610 |
| rompager | 41158 |
| iis | 28556 |
| akamai | 25167 |
| nginx | 13845 |
| micro_httpd | 10862 |
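The group-and-sum step can be sketched as below, assuming a recent pandas release. The sample frame only holds five rows from the earlier table, so its sums are smaller than the totals shown above, which come from the full data set.

```python
import pandas as pd

# Five rows copied from the classified table, used as sample data.
frame = pd.DataFrame(
    {
        "wcount": [27085, 14432, 8838, 8336, 7807],
        "wtype": ["apache", "iis", "iis", "apache", "other"],
    },
    index=[
        "Apache",
        "Microsoft-IIS/6.0",
        "Microsoft-IIS/7.5",
        "Apache/2.2.3 (CentOS)",
        "GoAhead-Webs",
    ],
)

# Group by wtype, sum wcount within each group, largest group first.
summed = frame.groupby("wtype")[["wcount"]].sum()
summed = summed.sort_values("wcount", ascending=False)
print(summed)
```

The resulting `summed["wcount"]` column, together with `summed.index` as labels, is the kind of data that can be handed to a pie-chart function such as `matplotlib.pyplot.pie`.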
Let's find out what is in the 'other' group. Group by wtype again:
Get the 'other' group, sort it by wcount, and use the original DataFrame to display these entries.
Out[13]:
| | wcount | wtype |
| --- | --- | --- |
| GoAhead-Webs | 7807 | other |
| Microsoft-HTTPAPI/2.0 | 2703 | other |
| cisco-IOS | 2688 | other |
| NET-DK/1.0 | 2629 | other |
| mini_httpd/1.19 19dec2003 | 2093 | other |
| httpd | 2083 | other |
| lighttpd/1.4.28 | 2020 | other |
| SonicWALL | 1503 | other |
| Mini web server 1.0 ZTE corp 2005. | 1465 | other |
| Boa/0.94.14rc21 | 978 | other |
At the end of the table are some of the more exotic web-servers.
Out[14]:
| | wcount | wtype |
| --- | --- | --- |
| Crucial Web Hosting | 1 | other |
| LANCOM 1611+ 7.58.0045 / 14.11.2008 | 1 | other |
| iptoX GmbH | 1 | other |
| pvparena | 1 | other |
| WEBrick/1.3.1 (Ruby/1.8.5/2006-08-25) | 1 | other |
| EWS-NIC5/98.41 | 1 | other |
| VPOP3 Mail Http Server | 1 | other |
| BSTNMA-VFTTP-113 (12.1.1 patch-0.3 [BuildId 14015]) | 1 | other |
| ArtBlast/3.5.5 | 1 | other |
| QTSS/5.5.4 (Build/489.0.5; Platform/MacOSX; Release/Update; ) | 1 | other |
| LSANCA-VFTTP-155 (12.1.1 patch-0.3 [BuildId 14015]) | 1 | other |
| EWS-NIC4/10.26 | 1 | other |
| SR-S716C2 | 1 | other |
| Werkzeug/0.8.3 Python/2.6.5 | 1 | other |
| NWRKNJ-VFTTP-132 (12.1.1 patch-0.3 [BuildId 14015]) | 1 | other |
| BT Web Server | 1 | other |
| PHLAPA-VFTTP-83 (12.1.1 patch-0.3 [BuildId 14015]) | 1 | other |
| kangle/2.9.9 | 1 | other |
| NVFWS | 1 | other |
| kangle/2.9.6 | 1 | other |
| s2.33.2 | 1 | other |
| Helix Universal Media Server/15.0.0.289 (win-x86_64-vc10) | 1 | other |
| eIDC32 WebServer | 1 | other |
| HP HTTP Server; HP Photosmart eStn C510 series - CQ140A; Serial Number: CN08N1N0AU05KN; Zeus Built:Mon Jul 25, 2011 04:08:52PM {ZEP1CN1130AR, ASIC id 0x00320104} | 1 | other |
| ECAcc (fcn/40AA) | 1 | other |
| Jetty/4.2.27 (Linux/2.4.22-1.2174.nptlsmp i386 java/1.4.1_04) | 1 | other |
| ECAcc (tko/1222) | 1 | other |
| Cougar/9.01.01.3844 | 1 | other |
| CISCO IOS 12a Copyright (c) 1995-2002 by Cisco Systems mod_perl/2.0.4 Perl/v5.10.1 | 1 | other |
| LANCOM 1781A 8.62.0029 / 20.06.2012 | 1 | other |
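The two views of the 'other' group (its head and its tail) can be reproduced along these lines. Again this is a sketch under assumptions: it targets a recent pandas release, and the four-row sample frame stands in for the full data set.

```python
import pandas as pd

# Four rows copied from the tables above, used as sample data.
frame = pd.DataFrame(
    {
        "wcount": [27085, 7807, 2703, 1],
        "wtype": ["apache", "other", "other", "other"],
    },
    index=["Apache", "GoAhead-Webs", "Microsoft-HTTPAPI/2.0", "kangle/2.9.9"],
)

# Group by wtype again and pick out the "other" group, largest count first.
grouped = frame.groupby("wtype")
others = grouped.get_group("other").sort_values("wcount", ascending=False)

# Use the group's index labels to pull the rows from the original frame.
rows = frame.loc[others.index]
print(rows.head(10))  # the most common entries in the group
print(rows.tail(30))  # the exotic entries at the end
```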