Apologies if this is a bit exhaustive. This morning
Simon and I stress-tested an Orion test server. The test lasted
only about 10 minutes, since it became clear we could not kill
it. Herein is my attempt at offering some sysadmin insight.
Test method: a shell script that logs in and then loops
indefinitely, performing one of the six actions below on each
iteration. Unlike in the real world, each action has an equal,
random chance of being executed. (A rough sketch of such a
script follows the list.)
- Get a random file, including a "zip-on-the-fly" export link
- Open a random directory
- Get the 'export' version of a directory or project (zip)
- Create a file in a random directory
- Edit an existing file (download a file, add 6KB of random
content, then save it)
- Create a new folder somewhere in the tree
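For reference, here is roughly what such a script could look like.
This is a sketch only: the host, endpoint paths, field names and
credentials below are placeholders for illustration, not Orion's
actual API.

  #!/bin/bash
  # Sketch only -- host, paths and credentials are placeholders.
  SERVER="http://orion-test:8080"
  JAR=$(mktemp)   # cookie jar holding the session

  # Log in once and keep the session cookie
  curl -s -c "$JAR" -d "login=test&password=secret" "$SERVER/login" >/dev/null

  while true; do
    case $((RANDOM % 6)) in
      0) # get a random file
         curl -s -b "$JAR" "$SERVER/file/ws/p/f$((RANDOM % 100)).txt" >/dev/null ;;
      1) # open a random directory
         curl -s -b "$JAR" "$SERVER/file/ws/dir$((RANDOM % 20))?depth=1" >/dev/null ;;
      2) # zip export of a directory or project
         curl -s -b "$JAR" "$SERVER/xfer/export/ws/p.zip" >/dev/null ;;
      3) # create a file in a random directory
         curl -s -b "$JAR" -X POST -d '{"Name":"f'$RANDOM'.txt"}' \
              "$SERVER/file/ws/dir$((RANDOM % 20))" >/dev/null ;;
      4) # edit: download a file, append 6KB of random content, save
         f="$SERVER/file/ws/p/f$((RANDOM % 100)).txt"
         body=$(curl -s -b "$JAR" "$f")
         extra=$(LC_ALL=C tr -dc 'a-zA-Z0-9' </dev/urandom | head -c 6144)
         curl -s -b "$JAR" -X PUT --data "$body$extra" "$f" >/dev/null ;;
      5) # create a new folder somewhere in the tree
         curl -s -b "$JAR" -X POST \
              -d '{"Name":"dir'$RANDOM'","Directory":true}' \
              "$SERVER/file/ws" >/dev/null ;;
    esac
  done

Note that three of the six actions are writes, which is far more
write-heavy than normal usage and explains the disk-write numbers
further down.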
A single instance of the script was launched, then another, until
we eventually had 12 instances running.
Server Details:
Intel SR1600UR with 16 CPU cores (Intel Xeon E5540 @ 2.53 GHz)
SuSE Linux Enterprise Server 11 running under Xen virtualization
8 GB RAM @ 1066 MHz
Disk I/O: two regular SATA drives connected to an LSI RAID
controller (RAID 1, mirroring)
By the numbers:
Total user accounts on server: over 20,000
Runtime: a few minutes
Total requests: 21261
GETs: 10650
POSTs: 9001
PUTs: 1609
For 2 minutes the system handled 25+ req/sec, peaking at 50+ req/sec
for over 10 seconds.
Peak CPU load was 4 cores @ 100% with 12 idle cores
Peak Disk Read load was absolute zero. The entire workspace was
kept in RAM.
Peak Disk Write load was 21 MB/sec, with an average of about 12
MB/sec during the peak period. Disk writes were quite high because
the odds of a new login (which creates a workspace) or of adding
6KB to a file were artificially high compared to real usage.
Memory usage at start: Free: 6.9G, Write Buffers: 181M, File Cache: 660M
Memory usage at end: Free: 6.3G, Write Buffers: 195M, File Cache: 936M
(the server workspace grew about 330M and was entirely contained in RAM)
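For anyone wanting to watch the same counters while reproducing
this, the stock sysstat/procps tools report exactly these figures;
a minimal sketch, run in a second terminal:

  # per-core CPU usage, sampled every 5 seconds (mpstat is part of sysstat)
  mpstat -P ALL 5 > cpu.log &
  # per-device disk throughput in MB/s (iostat -m, also sysstat)
  iostat -xm 5 > disk.log &
  # free memory, write buffers and file cache, as quoted above
  free -m -s 5 > mem.log &
  wait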
Conclusion:
A single server has the potential to serve hundreds of
concurrent, active users. Scalability could be achieved by placing
the server workspace on a shared filesystem and load-balancing
across several Orion servers, perhaps with a caching HTTP server
between the clients and the Orion servers. The first two
bottlenecks would be disk I/O and CPU. I suspect the CPU usage is
partly due to the zip-on-the-fly nature of the export links.
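That suspicion would be easy to check in isolation: hammer just an
export link while watching mpstat. A minimal sketch with
ApacheBench, where the URL and session cookie are placeholders to
be replaced with real values from a logged-in session:

  # 200 zip exports, 8 concurrent -- substitute a real export link
  ab -n 200 -c 8 -C "JSESSIONID=<your-session-id>" \
     "http://orion-test:8080/xfer/export/ws/p.zip"

If CPU climbs sharply with concurrency here, the zip-on-the-fly
theory holds, and a caching HTTP server in front would help for
repeated exports of the same tree.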
I hope some of this is helpful. Let me know if you have any
questions.
Denis