Cluster Validity as a Feature in Spam Classification

Many of the spam messages that land in my inbox are thematically similar, with only slight variations between them (perhaps because they are sent by the same spammer). Ordinary messages do not cluster nearly as well. Clusters formed from these spam messages should therefore be “tighter” than the clusters that ordinary messages fall into. Cluster membership and cluster validity could thus serve as features in subsequent spam classification.
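
As a rough sketch of the idea, the snippet below groups messages with a simple greedy clustering pass and then attaches each message's cluster size and cluster "tightness" to it as features. Everything here is an assumption made for illustration: the bag-of-words vectors, the greedy seed-based clustering, the 0.5 similarity threshold, the function names, and the use of mean pairwise cosine similarity as a stand-in for a proper cluster validity index. The sample messages are made up.

```python
import math
from collections import Counter


def vectorize(text):
    """Lowercased bag-of-words term counts (a deliberately crude representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_clusters(vectors, threshold=0.5):
    """Put each vector in the first cluster whose seed (first member) is
    similar enough; otherwise start a new cluster. Hypothetical scheme."""
    clusters = []  # each cluster is a list of message indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def cohesion(members, vectors):
    """Mean pairwise cosine similarity within a cluster.
    Singletons get 0.0 by convention, since they show no evidence of tightness."""
    pairs = [(i, j) for k, i in enumerate(members) for j in members[k + 1:]]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


def cluster_features(messages, threshold=0.5):
    """Return one (cluster_size, cluster_cohesion) pair per message."""
    vectors = [vectorize(m) for m in messages]
    features = [None] * len(messages)
    for members in greedy_clusters(vectors, threshold):
        tightness = cohesion(members, vectors)
        for i in members:
            features[i] = (len(members), tightness)
    return features


messages = [
    "win a free prize now click here",
    "win a free prize today click here",
    "win a free gift now click here",
    "lunch at noon tomorrow?",
    "the quarterly report is attached",
]
for message, (size, tightness) in zip(messages, cluster_features(messages)):
    print(f"{size}  {tightness:.2f}  {message}")
```

In this toy run the three near-duplicate messages end up in one large, cohesive cluster, while the two ordinary messages stay as singletons. The resulting pairs could be appended to whatever other features the downstream classifier already uses; a real setup would likely want a better text representation and an established validity measure.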
