Subject: RE: More Linux - the geek stuff
Sat Jan 30 13:11:54 1999

> Actually for a program, it would be
> or
> from the Apache docs: http://www.apache.org/docs/mod/mod_include.html under the description of :
> The include virtual element should be used in preference to
> exec cgi.

the virtual include uses another server thread to read the file, or execute the CGI, and pass the information back to the main thread. that allows includes to be handled with all the same option checking and server features as a regular page request.

to tell the truth, i'm not sure if an exec call would even be able to use the proxy system. i believe an exec just spawns a subshell, which would of course be independent of the httpd. that would bypass the proxy module, thus cutting out the system for connecting to another machine (barring additional software). i haven't actually tested it, but that would be my guess.

> Not entirely true. This will "pseudo balance" the load. Once
> a client has resolved the IP address for the host, it will
> continue to use it. And if that machine dies, until the entry
> expires from DNS the client will get a host not found error (or
> host not responding).

that's a good point regarding machines which are visible to the internet in general. if you want load balancing across multiple servers which can be queried directly from the internet, you need a more robust solution than straight DNS can provide.

within a clustered network, though, only the machines within the cluster talk to the back-end servers, so all the DNS traffic stays local. you can drop the TTL on the name record way down, which eliminates the caching issues, and everyone talks directly to the local primary, so there are no secondary nameservers to introduce latency problems. the additional name lookups increase the load on the network, but in an environment of 100Mb/s ethernet, that cost is affordable.

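as a rough illustration of the TTL point, here is a toy model in python of a round-robin name record and a caching client. everything in it is an assumption made for the example: the class names, the 10.0.0.x addresses, and the simplification that each lookup hands back the next address in rotation (a real nameserver rotates the order of the whole record set). it's a sketch of the idea, not a description of how any particular resolver behaves.

```python
import collections
import itertools


class RoundRobinRecord:
    """Hypothetical name record: each lookup returns the next address in rotation."""

    def __init__(self, addresses, ttl):
        self.ttl = ttl
        self._cycle = itertools.cycle(list(addresses))

    def lookup(self):
        # Return one address plus the TTL the client may cache it for.
        return next(self._cycle), self.ttl


class CachingClient:
    """Hypothetical client that caches an answer until its TTL expires."""

    def __init__(self, record):
        self.record = record
        self.lookups = 0
        self._cached = None
        self._expires = 0.0

    def resolve(self, now):
        # Ask the nameserver again only when nothing is cached or the TTL has run out.
        if self._cached is None or now >= self._expires:
            self._cached, ttl = self.record.lookup()
            self._expires = now + ttl
            self.lookups += 1
        return self._cached


def simulate(ttl, requests=12, interval=1.0):
    """Send `requests` requests, `interval` seconds apart, and count hits per back end."""
    record = RoundRobinRecord(["10.0.0.1", "10.0.0.2", "10.0.0.3"], ttl)
    client = CachingClient(record)
    hits = collections.Counter(client.resolve(i * interval) for i in range(requests))
    return hits, client.lookups


if __name__ == "__main__":
    for ttl in (3600, 0):
        hits, lookups = simulate(ttl)
        print("ttl", ttl, "lookups", lookups, "hits", dict(hits))
```

with a long TTL the client pins itself to whichever address it got first, which is the "pseudo balance" problem from the quoted message. with the TTL at zero every request triggers a fresh lookup and the requests spread evenly across the back ends, at the price of one name lookup per request.
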
the reverse proxy system in the article you mentioned is based on the same principle as what i described. the author of the article just built a custom version of Apache to put the load balancing features directly into the httpd itself. i prefer to avoid custom hacking in mission critical systems, so i use local round-robin DNS to cover the load balancing.

the built-in approach also carries a maintenance load in terms of failover. it's certainly possible.. in fact simple.. to change the proxy mappings to cut out a crashed server, but the daemon doesn't do it automatically. either the network administrator has to swap the config files manually and SIGHUP the daemon, or you have to run an additional monitoring process to do the same thing. both options require synchronization between a dynamic system and a set of prewritten config files, and i personally consider that a pain in the tuckus. it's too easy for either a typo or a thinko to screw up your proxy server configuration ("what do you mean you forgot to swap the config files after we put the new server online? that was a *month* ago!"), and i've been bitten by situations like that enough to be leery of them. it's easier, IMO, to let the machines themselves decide whether they can handle requests.

of course, you can find people who will argue in the opposite direction.. it's mostly a matter of taste. i prefer to build complex systems using smaller, distributed, interacting subsystems. other people prefer a single, more complex system which is centralized. each approach has its strengths, and each has its weaknesses. used well, both techniques are equally powerful. in the end, the choice is usually a question of temperament rather than technical merits.

as a side note.. if you do want to build a monitoring system that checks a collection of servers to see which ones are active, it's better to use noisy remotes than polling. the approach people usually think of.. polling..
is to have a single machine that periodically makes a connection to each of the machines being monitored and checks to see if they're okay. it's a straightforward way of doing things, and keeps all the code in a single place, but it also has subtle weaknesses. the basic problem is that the central machine interprets silence as a problem in the remote machine, which may or may not be a valid conclusion.

if the central monitor in a polling system dies, or goes corrupt, you're out of luck. the classic worst case scenario is when the network card fails in a polling monitor that automatically reboots any machines it considers dead. for all practical purposes, the network becomes a very expensive set of blinking christmas lights.

adding redundant monitors makes things worse, because those monitors can fall into race conditions like 'blinking machine syndrome': monitor #1 decides a machine is down, and reboots it. during that reboot process, monitor #2 fails to make contact, and cycles the power again. during the second reboot, monitor #1 decides the machine has died again, etc, etc. even if you install safeguards and delays, a combination of high load and bad timing can produce a blinking server. what happens when the two monitors decide to reboot each other can only be described in terms of the Keystone Kops.

the alternative, which is more robust, is to have separate processes on all the remote machines, each of which periodically reports itself to the central monitor. the advantage is that you have the capacity to make decisions at both ends of the connection, which makes it easier to build failover systems. basically, you're designing around a dead-man's switch rather than a sentry. if the central monitor of a noisy system dies, the rest of the network will know it, and can be programmed to fail over to another central monitor.
the conventional design practice for noisy remotes is to make any machine in the system capable of taking over as monitor for the others, thereby decentralizing the responsibility for monitoring across the network itself.
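
to make the dead-man's switch idea a little more concrete, here is a minimal sketch in python. all the names in it (HeartbeatMonitor, NoisyRemote, the `reachable` callback, the 5 second timeout in the usage below) are made up for illustration, and the network is replaced by plain function calls with an explicit clock so the logic is easy to follow. treat it as one possible shape for the idea, not a finished design.

```python
class HeartbeatMonitor:
    """Hypothetical monitor: remembers when each remote last reported in."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds of silence tolerated before a host counts as down
        self.last_seen = {}     # host name -> time of its last report

    def report(self, host, now):
        # Called by a remote machine to announce that it is still alive.
        self.last_seen[host] = now

    def live_hosts(self, now):
        # A host is live if it has reported within the timeout window.
        return sorted(h for h, t in self.last_seen.items() if now - t <= self.timeout)


class NoisyRemote:
    """Hypothetical remote: reports itself, and fails over if its monitor goes away."""

    def __init__(self, name, monitors):
        self.name = name
        self.monitors = list(monitors)  # candidate monitors, in order of preference
        self.current = 0                # index of the monitor currently in use

    def heartbeat(self, now, reachable):
        # `reachable(monitor)` stands in for "did the network connection succeed?".
        # Try the current monitor first, then walk the list until one answers.
        for _ in range(len(self.monitors)):
            monitor = self.monitors[self.current]
            if reachable(monitor):
                monitor.report(self.name, now)
                return monitor
            self.current = (self.current + 1) % len(self.monitors)
        return None  # nobody answered; the remote can decide locally what to do next


if __name__ == "__main__":
    primary, backup = HeartbeatMonitor(timeout=5), HeartbeatMonitor(timeout=5)
    web1 = NoisyRemote("web1", [primary, backup])
    web1.heartbeat(0, lambda m: True)
    print(primary.live_hosts(3))
    # The primary monitor dies; web1 notices on its next report and switches.
    web1.heartbeat(9, lambda m: m is not primary)
    print(backup.live_hosts(10))
```

the point of the sketch is that both ends get to make a decision: the monitor drops a remote that has gone quiet, and the remote moves to another monitor when its reports stop getting through.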