FilterProxy home page:
    http://filterproxy.sourceforge.net/
Sourceforge project page: (mailing lists, cvs, etc)
    http://sourceforge.net/projects/filterproxy/

FilterProxy is a personal filtering proxy. It is unique in that it allows
"Modules" to be installed that can perform arbitrary transformations on HTML
(or any other mime-type). Currently it filters ads by rewriting HTML,
compresses HTML content (for a 5-to-1 speedup on modems!), and de-animates
animated GIFs. Configuration is done with web forms.

Modules currently supplied and tested are:
    * Rewrite: allows removal and modification of arbitrary parts of an
      HTML file using a configurable set of 'rules'.
    * XSLT: Extensible Stylesheet Language Transformations. XSLT is a W3C
      recommendation: a language for transforming XML documents into other
      XML documents.
    * Header: can strip or add headers by regex.
    * Compress: uses gzip compression to compress HTML. (4-5 times speed
      improvement for HTML; your browser uncompresses it)
    * DeAnim: de-animates animated GIFs, and removes other "extension
      blocks", which often reduces the size of GIFs.
    * Skeleton: a barebones, heavily commented module for people wanting
      to write new modules. See the TODO file for a list of work that
      needs to be done to extend this program.
    * ImageComp: a module which uses ImageMagick to recompress various
      image formats to reduce their size. (INCOMPLETE - Volunteers needed)

Where to run FilterProxy:
-------------------------

There are two basic ways to run FilterProxy. One is where FilterProxy is
running on the same machine you are browsing from, and that machine is
connected to the net via a slow interface (e.g. a modem). In this case it
makes sense to use the following modules:
    Rewrite
    XSLT
    Mirror (when/if written)
I also suggest enabling "localhost only" in this mode (for security).

The second is where FilterProxy is running on a computer with a relatively
fast connection, and you are using it from a different computer, over a modem.
In this case it makes sense to use the following modules:
    Compress (this will give you a ~5x speed improvement for HTML!)
    Rewrite
    XSLT

Another way is where FilterProxy is running on a computer with a fast
connection, and you are browsing from the same computer. This is basically
the same as #1 above. Again, using "localhost only" is recommended.

Make sure to install FilterProxy on a relatively fast computer. Don't put it
on your OpenBSD firewall that's got a Pentium 90 in it. Parsing and filtering
HTML is a computationally intensive task, and requires a reasonable amount of
CPU. On my 533 MHz Alpha, most pages get filtered in under 0.5 seconds. On an
800 MHz Athlon I have access to, most pages get filtered in under 0.2 seconds.
But on an older computer it could take many seconds, introducing a noticeable
delay. (This applies only to HTML; images are usually very fast.)

If you're installing from the rpm, FilterProxy will install itself in
/home/filterproxy, create a user for itself, and create an init script
/etc/rc.d/init.d/filterproxy. If you wish to start FilterProxy on bootup,
you should create a link to this script from /etc/rc5.d/ (or whatever your
default runlevel directory is).

FilterProxy also supports the following command line options:

    # FilterProxy.pl -h
    Options recognized by FilterProxy:
        -h  Print this help message
        -k  Kill an already running copy of FilterProxy
        -f  Specify an alternate config file
            (default is `pwd`/FilterProxy.conf)
        -p  Specify the port to which FilterProxy will bind
            (default is 8888)
        -n  Do not daemonize: stay connected to the terminal from which
            it was started and print debugging messages.

If you wish to use *another* proxy in addition to FilterProxy, you may set
the environment variable http_proxy to point to the other proxy.
It is also possible to set this from the CGI config page, FilterProxy.html.
For instance, if your ISP runs a caching proxy, set something like:

    # setenv http_proxy http://your.isp.here:1234                 (csh syntax)
    # http_proxy=http://your.isp.here:1234; export http_proxy     (sh syntax)

Where 1234 is the port your other proxy runs on, and your.isp.here is the IP
address of the proxy. (I have not tested this very well, but I have reports
that it works as of ~0.15) If the upstream proxy requires authentication,
this information can be entered on the main FilterProxy config page. (Only
BASIC authentication works right now.)

The reason I wrote FilterProxy is to fix some problems with the web (in
general) and brain dead web-site designers (specifically).

Modules that I would like to see in the future:
    Cookie      Filter cookies by server (i.e. do not send any cookies to
                ad servers, while still allowing cookies for other sites).
                Allow for sophisticated cookie management (check out
                HTTP::Cookies).
    Anchorizer  Add anchor tags to identifiable URLs in a web page, when
                those URLs are not already wrapped in them.
    Clean       Clean up HTML (specifically, remove MS's attempts at
                redefining ASCII by adding forward and back quotes, which
                appear on many browsers as '?') (use HTML::Clean package)
    Mirror      Keep a local copy of all images on often visited sites.

There are other things this program could do, if extended properly:
    1) automatically download ads on sites you like to visit (but not
       displaying them to you) -- this gets money for the visited site.
    2) ...I'll think of more...

Thanks to the following folks for their help and ideas:
    Abigail, author of abiprox:
        http://language.perl.com/misc/abiprox/
    Randal L. Schwartz, author of the WebTechniques column. Of particular
        interest are numbers 11 and 34, upon which FilterProxy is
        partially based.
        http://www.stonehenge.com/merlyn/WebTechniques/
    Robert W. Cunningham who has been very forthcoming with ideas and
        helping me test FilterProxy.
    Steve Sekula, who has also been helpful in testing.
    John Conneely for rpm spec files, Header module updates.
    David MacKenzie for rpm spec file additions.
    Richard Tibbetts for "localhost only" option.
    Seth Golub for some patches.
    Vineet Kumar for patches related to transparent proxying.
    Vladimir N Goncharov for upstream proxy patch.
    Danek Duvall for some very useful discussions.
    Members of debian-devel and debian-legal for helpful discussions.
    Kenneth Vestergaard Schmidt for packaging FilterProxy for Debian, and
        including it in the main Debian distribution. (apt-get it!)
    Siggi Langauf for many interesting discussions, and some very good
        ideas. Siggi also came up with the javascript bookmarks.
    Baxter Rogers for pointing out some bugs, and not being afraid to
        pester me ;).
    Mario Lang for the XSLT module.

Bob McElrath

WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING

FilterProxy is not a completed work, and should be considered "alpha"
software. It has bugs. These bugs may set your cat on fire. They may crash
your computer, erase your hard drive, mail porn to your grandmother, or other
nasty things. You have been warned.

YOU are responsible for all content that passes through FilterProxy. Do not
filter content for other people without their knowledge and consent. This
software is not intended to be a "netnanny", filter dirty words, or prevent
your thirteen-year-old from seeing pornography. There are lots of other
filters out there for that -- go find one if you want one.

A note about filtered content:
------------------------------

It is clear to me that rewriting web pages for yourself should not pose any
kind of legal quandary. That is, removing banner ads from HTML is perfectly
legal, just as scribbling in a book you have bought, or ripping pages out of
a book you have bought is legal. What is *not* legal is redistributing
modified content.
In most cases, HTML is copyrighted by the site you downloaded it from, and I
doubt they would be very happy if you started redistributing modified copies
of their site.

DO NOT install FilterProxy to filter other people's content (without their
knowledge and consent). You may be liable for copyright infringement on the
pages filtered. FilterProxy is meant to be a PERSONAL ad-filtering proxy.

WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING