


Viewing One Website Is Promiscuous

Published: 09/14/2009

There was a time when using a computer meant that you sat down in front of a keyboard, did your work, saved the documents to disk or floppy, and then walked away. None of that attaching things to e-mails, clicking on a hyperlink, or sending data through established connections to another computer somewhere else on a "network." In other words, you worked on a computer that was completely stand-alone and offline.

Those days are over.

Using a computer that's disconnected from the Internet these days is almost unimaginable, just as a cell phone without a network connection seems completely unreasonable. We now expect to turn the computer on and find an established Internet connection waiting, so we can look at a website, log into web-based e-mail, or do some other work that relies on an underlying network hook-up.

However, Internet communications happen quickly and many subtle things transpire without the user knowing or being able to approve of them. As an analogy, when you go to the auto dealership to buy a new car, the salesmen are also slipping in the extended warranty, over-priced paint sealant, rust-proofing upgrade, and fabric protection add-ons without you knowing it. On the web, you're paying the price with slower webpage load time, possible malware downloads, and potential identity theft issues.


Opening a simple website - what happens under the hood

Making these applications work seamlessly across a network that is physically distributed over the planet requires an enormous amount of automation. As a generalization, for someone to view a website the following must happen in this order:

1) The user opens a web browser such as Mozilla Firefox or Microsoft Internet Explorer.

2) The user types in the address of a website she wishes to see, such as www.google.com.

3) The web browser application makes a request to the OS using API calls to establish a network connection to the requested website.

4) The OS sends a name query request to its assigned DNS server to determine the IP address of www.google.com.

5) Assuming the DNS server responds with the address of www.google.com, the OS then sends a connection establishment request to the server representing www.google.com. This exchange is known as the TCP handshake, and web traffic relies on it.

6) Once the TCP connection is established, the web browser builds an "HTTP GET" request for the root file of the website and the OS sends it over the connection (the root file usually defaults to something like index.html, which is like the "intro page" of the site). HTTP is the protocol ("language") used for client-to-website communication.

7) The web server sends the page over to the user's computer. The user's OS downloads this file and places it in its designated browser cache store somewhere on the disk. The web browser then reads through the file and starts rendering the page contents in the prescribed layout.

8) If the index file references additional objects (such as images) which make up the webpage, the web browser and OS make additional "HTTP GET" requests for them. Pictures, cookie files, advertisement images, style sheets, etc., are the individual pieces that together make up the complete web page. Remember that images and page text are separate items. They are not put into "one file" like office documents, so the web browser (and server) manages all of these as individual pieces that have to be put together like a puzzle.

9) Once all the objects are downloaded into the browser cache area on the disk, the browser is able to fully render the website as laid out by the invisible specifications embedded within the web page document.

10) If there are any scripts (usually written in JavaScript, or JScript in Microsoft's implementation) defined within the web page document, these are automatically run. Scripts are automated tasks that perform many functions, such as dynamically loading image objects that are specific to the web browser version, loading advertising content from third-party sites, writing session-specific cookie information, redirecting the connection to another site, downloading executable objects to your disk, etc. Generally speaking, scripts run transparently, without the user's permission.
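
Steps 4 through 7 above can be sketched in a few lines of Python. This is only an illustration of the sequence, not what any particular browser or OS does internally: the function names build_get_request and fetch_root_page are made up for this example, the request is plain HTTP on port 80, and redirects, encryption, and caching are all ignored.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request for `path` on `host` (step 6)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close when it is done
        "",                   # blank line ends the request headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_root_page(host: str, port: int = 80, timeout: float = 5.0) -> bytes:
    """Resolve `host`, connect over TCP, send a GET, return the raw reply."""
    # Step 4: ask the OS resolver (and through it, DNS) for the IP address.
    ip_address = socket.gethostbyname(host)

    # Step 5: the TCP handshake happens inside create_connection().
    with socket.create_connection((ip_address, port), timeout=timeout) as conn:
        # Step 6: send the HTTP GET request for the root file.
        conn.sendall(build_get_request(host))

        # Step 7: read the server's response until it closes the connection.
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch_root_page("www.google.com") on a connected machine returns the raw response bytes, headers first and then the page body. Everything in steps 8 through 10 would be further requests of the same kind, one per referenced object.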


As hinted above, simply going to a single website address does not mean that all the content comes from that one source. Instead, it's very possible that a lot of the images and animated Flash content are downloaded from third-party websites. This is especially common when advertisements are involved, because ads are pulled from commercial ad networks which aggregate feeds from multiple marketers and stream the material directly to the end-user's computer. In virtually all cases, the original website that the user typed in the address for does not control the specific ad content coming from the third-party locations.
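
To get a rough feel for how many outside sources a page pulls from, the page's HTML can be scanned for references to other hosts. The sketch below is a crude approximation under several assumptions: the names ReferenceCollector and third_party_hosts are invented here, only a handful of tags are checked, anything a script loads at run time is missed entirely, and a host counts as "third party" simply because its name differs from the page's own (so espn.go.com would count as foreign to espn.com).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ReferenceCollector(HTMLParser):
    """Collect URLs from src/href attributes of tags that usually trigger a fetch."""

    FETCHING_TAGS = {"img", "script", "iframe", "embed", "link"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.FETCHING_TAGS:
            return
        for name, value in attrs:
            if name in ("src", "href") and value:
                # Resolve relative references against the page's own address.
                self.urls.append(urljoin(self.base_url, value))


def third_party_hosts(page_url, html):
    """Return the sorted host names referenced by `html` other than the page's own."""
    page_host = urlparse(page_url).hostname
    collector = ReferenceCollector(page_url)
    collector.feed(html)
    hosts = {urlparse(url).hostname for url in collector.urls}
    return sorted(h for h in hosts if h and h != page_host)
```

Feeding it the saved HTML of a front page, together with the address the page was loaded from, gives a lower bound on the number of other servers the browser will be told to contact.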

This is where the web gets dangerous because all this happens within a few seconds for a single web page. One click does it all.


A real-life example

Through various technical trickeries, malicious code writers potentially serve their virus-laden wares by embedding them in advertisements or by hijacking sites that serve ad content. Let's take a look at a popular website such as espn.com. The following represents a detailed session capture and the transactions which happen underneath the hood when a user opens the front page of the website (this particular examination reflects the specific characteristics of the site and content as of this writing; it's very important to understand that the content may dynamically change from one moment to the next).

Here's the capture file for those of you who know how to read a packet trace and want to follow along with Wireshark.

   View details of loading espn.com


In summary, simply going to espn.com (which redirects you to espn.go.com as its official home) automatically forces the client OS to contact twenty other servers to download content behind the scenes. Many of these are for general web page images, but some are specifically for advertising content and possibly other material of unknown nature which could potentially be hazardous.


Some safety nets

Web servers are not yet hardened with much better security configurations, malware writers still use ad networks as electronic disease carriers, and websites in general still riddle each webpage request with dozens of dependent object requests for images, cookies, JavaScript functions, Java applets, distracting Flash animations, etc. Until all of that changes, many security-conscious users choose to block these objects by default and approve them case by case, when they feel that a particular website and its related third-party domain are trustworthy from both a security and an aesthetics perspective.

A popular solution is to use Mozilla Firefox with the NoScript and Adblock Plus add-ons (additional plug-in software components that add functionality to the base browser). However, by default this makes some sites slightly to completely inoperable until scripting is enabled for the particular third-party domains that the original site depends on. Ultimately the user has to choose whether to permit scripting or not (either permanently or only for the current session).
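
The block-by-default idea, with permanent and session-only approvals, can be modeled in a few lines. This is a toy model of the policy only and makes no claim about how NoScript or Adblock Plus are actually implemented; the class name ScriptPolicy and its methods are hypothetical.

```python
from urllib.parse import urlparse


class ScriptPolicy:
    """Deny scripts from every host unless the user has approved that host."""

    def __init__(self):
        self.permanent = set()  # hosts approved for good
        self.session = set()    # hosts approved until the session ends

    def allow(self, host, permanent=False):
        """Record the user's approval of `host`."""
        target = self.permanent if permanent else self.session
        target.add(host.lower())

    def end_session(self):
        """Forget every session-only approval."""
        self.session.clear()

    def is_allowed(self, script_url):
        """Return True only if the script's host was explicitly approved."""
        host = urlparse(script_url).hostname
        return host is not None and (host in self.permanent or host in self.session)
```

The point of the design is that the safe answer is the default one: a host nobody has thought about yet is treated the same as a host that was explicitly rejected.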

If Internet Explorer must be used as a browser, then using Privoxy as a local, client-side proxy at least filters out some common ads in the default configuration. IE will have to be configured to route through a proxy at the loopback address (127.0.0.1) on port 8118.
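
The same proxy address works for any client that can be pointed at an HTTP proxy, not just IE. As a small sketch, here is how a Python script could route its requests through a local Privoxy, assuming Privoxy is running on the same machine with its default listening address; the helper name build_privoxy_opener is made up for this example.

```python
import urllib.request

# Loopback address and port from the paragraph above.
PRIVOXY_ADDRESS = "http://127.0.0.1:8118"


def build_privoxy_opener(proxy_url=PRIVOXY_ADDRESS):
    """Return a urllib opener that sends HTTP and HTTPS requests via the proxy."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(proxy_handler)
```

With Privoxy running, build_privoxy_opener().open("http://espn.go.com/") fetches the page through the filter. If nothing is listening on port 8118 the call fails with a connection error rather than quietly going direct.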

The Internet is in a constant state of change, and part of the growing pain lies in defending networks to protect the client, the server, and the communication between them. Seemingly trustworthy websites such as banking or news media sites can be compromised by increasingly sophisticated crackers who embed their secret sauce into sites you may visit daily. There may be nothing on the surface to indicate a hijacked site, and the malicious code may lurk there for some time until the site owners find out, if they ever do.


