Many companies employ some form of Internet firewall, but schools are uniquely required to provide more extensive Internet content filtering on student-use workstations. Content filtering can be implemented through a variety of methodologies, and most content filtering technologies combine several of them. Content filtering can be used to block access to pornography, gaming, shopping, advertising, email/chat, or file transfers, or to websites that provide information about hate/bigotry, guns, drugs, gambling, etc.

The simplest method of providing content filtering is to specify a blacklist. A blacklist is nothing more than a list of domains, URLs, file names, or extensions that the content filter should block. If the Playboy.com domain were blacklisted, for example, access to that entire domain would be blocked, including any subdomains or subfolders. In the case of a blacklisted URL, such as en.wikipedia.org/wiki/Recreational_drug_use, other pages in the domain might be available, but that specific page would be blocked. Wildcards can often be used to block large sets of domains and URLs with single entries like *sex*. Blacklisting can also be used to prevent software installations by blocking access to files, such as */setup.exe, or to prevent changes to the computer by blocking potentially harmful file types, such as *.dll or *.reg. Since content filters still cannot differentiate between art and pornography, many content filters are also configured to block graphic file types, such as *.gif, *.jpg, *.png, etc.
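As a minimal sketch of how wildcard blacklist matching might work, here's a short Python example using the standard library's `fnmatch` module. The patterns and URLs are illustrative, not from any real filtering product.

```python
# Minimal sketch of wildcard blacklist matching, assuming a filter that sees
# each requested URL as a plain string. Patterns are hypothetical examples.
from fnmatch import fnmatch

BLACKLIST = [
    "playboy.com*",       # block an entire domain, including subfolders
    "*sex*",              # wildcard entry matching any URL containing "sex"
    "*/setup.exe",        # block software installers
    "*.dll", "*.reg",     # block potentially harmful file types
]

def is_blacklisted(url: str) -> bool:
    """Return True if the URL matches any blacklist pattern."""
    url = url.lower()
    return any(fnmatch(url, pattern) for pattern in BLACKLIST)

print(is_blacklisted("playboy.com/gallery"))              # True
print(is_blacklisted("en.wikipedia.org/wiki/Chess"))      # False
print(is_blacklisted("example.com/downloads/setup.exe"))  # True
```

A real filter would match patterns against the URL at the proxy or gateway before fetching the page; the principle is the same cheap string comparison shown here.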

A whitelist is the opposite of a blacklist: it is a list of resources that the content filter should allow through. Like a bouncer at a velvet rope, the content filter blocks any resource not specified in the whitelist. Blacklists and whitelists can be used together to provide more granular filtering; the blacklist could block all graphic file types, for example, while the whitelist could override the blacklist for images from specific moderated or sponsored, age-appropriate image hosting services. Blacklists and whitelists are quick and easy ways to determine whether a particular website should be displayed. Comparing a website against a list isn't processor intensive, so it can be done quickly, but it's also not robust: new websites are constantly popping up, and there's no way anyone can keep up with adding all the bad guys to a blacklist.
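The whitelist-overrides-blacklist arrangement described above could be sketched like this; the domains and patterns are made up for illustration.

```python
# Sketch of a whitelist override: block all image file types except those
# hosted on an approved, moderated image service. All names are hypothetical.
from fnmatch import fnmatch

BLACKLIST = ["*.gif", "*.jpg", "*.png"]
WHITELIST = ["images.approved-host.example/*"]

def allow(url: str) -> bool:
    url = url.lower()
    if any(fnmatch(url, p) for p in WHITELIST):
        return True  # whitelist entries override the blacklist
    return not any(fnmatch(url, p) for p in BLACKLIST)

print(allow("randomsite.com/pic.jpg"))                # False
print(allow("images.approved-host.example/pic.jpg"))  # True
print(allow("news.example.com/story.html"))           # True
```

Note the ordering: the whitelist is checked first, so an allowed host passes even when the file extension is on the blacklist.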

So what do we do with that continuous stream of new websites coming online? That's where more advanced filtering methodologies come into play. Word and phrase analysis can be used to search for particular words or phrases on a web page. Instead of relying solely on filtering by address, the content filter downloads the requested website (unless it's immediately blocked by a blacklist) and reads each line, looking for bad words or phrases. A list of bad words or phrases is specified, conceptually like a blacklist, but this list is checked for matching patterns within the web page itself, which requires more processor time and slows down the serving of web pages. (In fact, I'm sure at this very moment there are already some content filters that are reluctant to show this very article simply because it includes the word sex in the previous paragraph, and if that's not enough, check out what's below…) A typical list of bad words and phrases might include "boobies", but since web authors are just as interested in getting their content past filters as administrators are in keeping it out, it may be necessary to include obfuscated variants as well, such as b00bies, boob!es, or boobie$. Filtering can be set to block any page that includes any of the bad phrases, or point values can be assigned to the phrases and the filter set to block any page that exceeds a certain point threshold.
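The point-value approach might look something like the following sketch, which assumes the page body has already been downloaded as text. The phrase list, regex patterns, point values, and threshold are all invented for illustration, not taken from any real product.

```python
# Sketch of score-based phrase filtering. Character classes in the regexes
# catch simple obfuscations such as b00bies or boob!es.
import re

SCORED_PATTERNS = {
    r"b[o0][o0]b[il!]es?": 5,  # "boobies" plus common leet-speak variants
    r"\bsex\b": 2,
}
THRESHOLD = 6  # block any page scoring at or above this value

def page_score(text: str) -> int:
    """Sum point values for every match of every scored pattern."""
    total = 0
    for pattern, points in SCORED_PATTERNS.items():
        total += points * len(re.findall(pattern, text, re.IGNORECASE))
    return total

def should_block(text: str) -> bool:
    return page_score(text) >= THRESHOLD

print(should_block("An article mentioning sex once."))  # False (score 2)
print(should_block("b00bies boob!es everywhere"))       # True  (score 10)
```

This also illustrates the trade-off mentioned above: every pattern must be run over the whole page body, which costs far more processor time than a simple address lookup.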

The next content filtering methodology, context filtering, picks up where word and phrase analysis leaves off. The problem with word and phrase analysis is that it isn't very smart: it acts on anything that matches a predefined pattern, regardless of context. It might block pages that include the terms "naked truth" or "chicken breasts," while an administrator might not care about "naked" or "breasts" in those contexts but might still want to block pages where "naked" and "breasts" appear together. Even with point values and thresholds assigned, it is possible to block legitimate web pages.

For example, a breast cancer web page could easily mention breasts enough times to exceed a point threshold. Context filtering is done through a variety of proprietary algorithms designed by the various manufacturers of Internet content filters. The trick is that they need to balance speed and accuracy: they must download and carefully analyze all the wording on requested web pages to determine whether they are acceptable or taboo, and they must do so quickly enough to remain as transparent as possible to users. If they're too quick to judge, they can let unacceptable content through (known as "misses") or block acceptable content (known as "false hits"), but if they're too thoughtful, users will complain about latency. Building a better algorithm takes more time and money, so faster and more accurate filters often cost more.
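The real algorithms are proprietary, but a toy stand-in for the idea is to flag word pairs only when they appear close together, which lets "naked truth about chicken breasts" through while still catching "naked breasts." The pairs and window size here are invented for illustration.

```python
# Toy context check: flag only when both words of a suspect pair occur
# within a few words of each other, rather than anywhere on the page.
import re

SUSPECT_PAIRS = [("naked", "breasts")]
WINDOW = 3  # maximum distance (in words) between the pair

def blocked_in_context(text: str) -> bool:
    words = re.findall(r"[a-z']+", text.lower())
    for first, second in SUSPECT_PAIRS:
        positions_a = [i for i, w in enumerate(words) if w == first]
        positions_b = [i for i, w in enumerate(words) if w == second]
        if any(abs(a - b) <= WINDOW for a in positions_a for b in positions_b):
            return True
    return False

print(blocked_in_context("the naked truth about chicken breasts"))  # False
print(blocked_in_context("naked breasts"))                          # True
```

A breast cancer page that mentions "breasts" many times but never near "naked" would pass this check, even though it might fail a naive point-threshold filter.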

Just to complete this treatise on Internet content filtering, I should also mention that there may be other methodologies employed or configurable in various Internet content filtering solutions. Virtually all Internet content filters work on port 80 (HTTP); most ignore other protocols, but some can apply filtering to other ports, or can block other protocols entirely, such as FTP or Telnet. (I wonder what port "World of Warcraft" uses…)

As with firewalls, I should also point out that Internet content filters come as hardware or software solutions. Hardware solutions are commonly known as "appliances," and software solutions are commonly known as "applications" or "services." Hardware solutions provide centralized management. They may cost more, but they handle all the filter-related processing, relieving your servers and workstations of that responsibility. They often come with subscription services for blacklist, whitelist, phrase list, and context data updates, just as antivirus subscriptions provide updates to virus signature lists. They can be multihomed pass-through gateways, or they can work by redirecting traffic to a specific port or destination IP address.

High-end models may also include caching to speed up delivery of frequently accessed resources. Software-based solutions can be server-based or installed on each individual workstation. Most server installations offer the same centralized management as hardware solutions, but of course use your server's processor and RAM to do the filtering rather than a dedicated device; they may therefore be less expensive. In the case of a workstation installation, in addition to installing the software on each individual workstation, you may also need to configure and periodically update each workstation individually.

Even Microsoft Internet Explorer has a free, simple, built-in Internet content filter: it's called Content Advisor, and you can configure it under Internet Options in the Windows Control Panel. It's fine for your child's standalone computer or a small peer-to-peer network, but probably unsuitable as a business solution. Whether hardware- or software-based, best-in-class enterprise solutions often integrate with Active Directory, which simplifies administration and configuration and allows filtering settings to follow users anywhere on the network. Teachers, for example, could have less restrictive settings regardless of where they log in, while students would still be locked down, even if they sneak into the staff room during recess.
