Spammers use a variety of techniques to harvest email addresses. However, the two main techniques are (a) the use of automated spiders and (b) directory harvesting.
Automated spiders are software agents known under a variety of names… spiders, crawlers, robots and bots. They are the seekers of content on the internet, and they form the basis of how search engines, such as Google and Yahoo!, work.
Search engine spiders trawl the internet unceasingly looking for content. Their searches are based on important words known as key words. The engines keep an index of the words they find and the website where they find them. Users of the search engines can then find these sites by keying in the search words. A major search engine will index hundreds of millions of pages, and respond to tens of millions of queries every day.
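As a rough illustration of the index described above, here is a minimal sketch in Python. The page addresses and text are made up for the example, and a real search engine is far more elaborate; the sketch only shows the idea of recording which pages contain which words and then looking pages up by search words.

```python
from collections import defaultdict
import re

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query, sorted."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)

# Hypothetical pages, standing in for content a spider has already fetched.
PAGES = {
    "http://example.com/a": "Cheap flights to Rome",
    "http://example.com/b": "Rome travel guide and cheap hotels",
    "http://example.com/c": "Gardening tips",
}
INDEX = build_index(PAGES)
```

Calling search(INDEX, "cheap rome") returns the two pages that contain both words, while a query whose words never appear together returns an empty list.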
A spammer collects email addresses in a similar way… by sending an automated spider throughout the internet looking for addresses that are found on web pages or in links used to send emails. The spider sends them back to the person who is compiling the spam list.
The spammer’s spider will trawl a variety of websites looking for addresses. These include dating sites, chat rooms, message boards and Usenet newsgroups; in fact, any type of webpage that might conceivably contain an address.
If you have ever sent your address to anyone on the internet, entered it in a form or published it on your own webpage, you can be virtually certain that your email address has been harvested by numerous spiders working for compilers of spam lists.
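To see why an address printed on a webpage is so exposed, consider how little code it takes to pick addresses out of a page's text. The sketch below is an assumption about how such matching might work, not a description of any particular spider. It uses a deliberately simple pattern and a made-up sample page, and it does no fetching of its own.

```python
import re
from html import unescape

# A simplified pattern; real address syntax is broader than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def find_addresses(html):
    """Return the distinct addresses in a page, in order of first appearance."""
    text = unescape(html)  # decodes entities such as &#64; back to "@"
    seen = []
    for match in EMAIL_RE.findall(text):
        address = match.lower()
        if address not in seen:
            seen.append(address)
    return seen

# Hypothetical page content used only for this example.
SAMPLE_PAGE = '''
<p>Contact: <a href="mailto:Jane@example.com">Jane</a></p>
<p>Or write to bob.smith@example.org. Thanks!</p>
<p>Encoded: carol&#64;example.net</p>
'''
```

Running find_addresses(SAMPLE_PAGE) picks up all three addresses, including the one in the mailto link and the one written with an HTML entity in place of the @ sign.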
A directory harvesting attack, aka a dictionary attack, is another common technique for creating lists of addresses. It is used to collect addresses from internet service providers (ISPs), mail services such as Yahoo!, Hotmail and AOL, and large companies with their own mail servers.
The attacking software sends millions of emails to addresses on a particular server. It makes these addresses up using sequences of minor variations on a basic address. For example, the software could send the same email to a series of addresses such as akennedy@yahoo.com, bkennedy@yahoo.com, ckennedy@yahoo.com and so on.
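The "minor variations" are easy to produce mechanically, which is why the attack scales so well. The sketch below only builds the strings for the initial-plus-surname pattern in the example above and sends nothing. The function name and the example.com domain are placeholders chosen for illustration.

```python
from itertools import islice
from string import ascii_lowercase

def candidate_addresses(surname, domain):
    """Yield one address per initial: a<surname>@domain, b<surname>@domain, ..."""
    for initial in ascii_lowercase:
        yield f"{initial}{surname}@{domain}"

# First three candidates for a hypothetical surname and a reserved example domain.
FIRST_THREE = list(islice(candidate_addresses("kennedy", "example.com"), 3))
```

One surname yields 26 candidates under this pattern; multiplied across a long list of common surnames, the total quickly reaches the millions mentioned above.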
Nearly all these addresses will be invalid, in which case the server will respond with an SMTP 550 error message. The harvesting software will ignore these addresses. But every now and then the software will get lucky and the server will respond with a message that an email address is valid. The software will compile all the valid addresses into a list for spamming.
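The sorting step can be sketched as follows, using simulated server replies rather than real network traffic. Code 250 is the standard SMTP reply for an accepted recipient and 550 the reply for an unavailable mailbox. The function name and the sample data are assumptions made for the example.

```python
SMTP_OK = 250          # recipient accepted
SMTP_NO_MAILBOX = 550  # mailbox unavailable, e.g. no such user

def split_by_reply(replies):
    """Split (address, reply_code) pairs into accepted and rejected address lists."""
    accepted, rejected = [], []
    for address, code in replies:
        if code == SMTP_OK:
            accepted.append(address)
        elif code == SMTP_NO_MAILBOX:
            rejected.append(address)
        # Other codes (such as a temporary 4xx failure) are left out of both lists.
    return accepted, rejected

# Simulated replies; no mail server is contacted.
SIMULATED_REPLIES = [
    ("akennedy@example.com", 550),
    ("bkennedy@example.com", 550),
    ("ckennedy@example.com", 250),
    ("dkennedy@example.com", 451),
]
ACCEPTED, REJECTED = split_by_reply(SIMULATED_REPLIES)
```

Here only ckennedy@example.com ends up in the accepted list. The same pattern, seen from the mail server's side, is a long run of 550 replies to a single sender.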
The software will probably send out millions of email messages just to find a few hundred valid addresses, so this seems a very inefficient way to harvest email addresses. But the whole process is automated, so it costs the spammer very little.
Other email collection techniques
There are several other ways email addresses can be harvested.
One of these is to set up a webpage offering to send a product or service free of charge as long as the user provides an email address. Examples of these kinds of sites are those that promise to send a joke-of-the-day, daily quotes from the Bible, news or stock alerts, and so on. I recently came across a site that stated that there could be a registered sex offender in my area and that I could get further information by email!
In sum… there is little you can do to avoid having your address harvested by spammers. The best you can do is to make sure you are running good anti-spam software and that you keep it up to date.