
SYNOPSIS OF PROJECT ON Automatic Phishing Email Website Detection System Using Fuzzy Techniques

ABSTRACT

Phishing websites are forged web pages created by malicious people to mimic the pages of real websites. Most of them bear a high visual similarity to the originals, and some look exactly like the real ones, so unwary Internet users are easily deceived. Victims may expose their bank account details, passwords, credit card numbers, or other important information to the owners of the phishing page.

CHAPTER 1 INTRODUCTION
1.1 ABOUT THE PROJECT

Detecting and identifying phishing websites is a complex and dynamic problem involving many factors and criteria. Because of the subjective considerations and ambiguities involved in detection, a fuzzy logic model can be a more effective tool for assessing and identifying phishing websites than traditional tools, since it offers a more natural way of dealing with quality factors rather than exact values. In this paper, we present a novel approach to overcome the fuzziness in traditional website phishing risk assessment and propose an intelligent, resilient and effective model for detecting phishing websites. The proposed model is based on fuzzy logic (FL) operators, which are used to characterize the website phishing factors and indicators as fuzzy variables, and it produces six measures and criteria of website phishing attack dimensions arranged in a layered structure. Our experimental results showed the significance and importance of the phishing website criterion URL & Domain Identity, represented by layer one, and the varying influence of the phishing characteristic layers on the final phishing website rate.

The word 'phishing' emerged in the 1990s. Early hackers often used 'ph' to replace 'f' to produce new words in the hacker community, since they usually hacked by phone. Phishing is a word derived from 'fishing'. It refers to the act in which an attacker lures users to visit a faked Web site by sending them faked e-mails (or instant messages) and stealthily obtains the victim's personal information, such as user name, password, and national security ID. This information can then be used for targeted advertisements or even identity theft attacks (e.g., transferring money from victims' bank accounts).

The most frequently used attack method is to send e-mails to potential victims that appear to come from banks, online organizations, or ISPs. In these e-mails the attackers make up some reason, e.g. the password of your credit card has been mis-entered many times, or they are providing upgrade services, to lure you to visit their Web site and confirm or modify your account number and password through the hyperlink provided in the e-mail. After clicking the link you are taken to a counterfeit Web site. The style, the functions performed, and sometimes even the URL of the faked Web site are similar to those of the real one, so it is very difficult to know that you are actually visiting a malicious site. If you enter your account number and password, the attackers collect the information at the server side and are able to perform their next actions with it (e.g., withdrawing money from your account).

Phishing itself is not a new concept, but it has been increasingly used by phishers to steal user information and commit business crime in recent years. Within one to two years, the number of phishing attacks increased dramatically. According to Gartner Inc., for the 12 months ending April 2004, "there were 1.8 million phishing attack victims, and the fraud incurred by phishing victims totaled $1.2 billion".

According to the statistics provided by the Anti-Phishing Working Group (APWG), in March 2006 the total number of unique phishing reports submitted to the APWG was 18,480, and the top three phishing site hosting countries were the United States (35.13%), China (11.93%), and the Republic of Korea (8.85%). Infamous phishing attacks in China in recent years include the counterfeiting of the Bank of China (real Web site www.bank-ofchina.com, counterfeit Web site www.bank-off-china.com), the Industrial and Commercial Bank of China (real Web site www.icbc.com.cn, counterfeit Web site www.lcbc.com.cn), and the Agricultural Bank of China (real Web site www.95599.com, counterfeit Web site www.965555.com).

In this project, we study the common procedure of phishing attacks and review possible anti-phishing approaches. We then focus on the end-host based anti-phishing approach. We first analyze the common characteristics of the hyperlinks in phishing e-mails. Our analysis finds that phishing hyperlinks share one or more of the characteristics listed below:
1) The visual link and the actual link are not the same;
2) The attackers often use a dotted decimal IP address instead of a DNS name;
3) Special tricks are used to encode the hyperlinks maliciously;
4) The attackers often use fake DNS names that are similar (but not identical) to the target Web site.
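The four hyperlink characteristics listed above can be checked mechanically. The following is a minimal sketch in Java, not the project's actual implementation: the class name LinkHeuristics, the flag names, the edit-distance threshold of 2, and the list of known legitimate hosts are all assumptions introduced for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: flags the four hyperlink characteristics for one link.
class LinkHeuristics {
    private static final Pattern DOTTED_IP = Pattern.compile("^\\d{1,3}(\\.\\d{1,3}){3}$");
    private static final Pattern PERCENT_ESCAPE = Pattern.compile("%[0-9a-fA-F]{2}");

    // Extracts the lower-cased host from a URL-like string, or "" if it cannot be parsed.
    static String hostOf(String text) {
        try {
            String t = text.trim();
            String url = t.contains("://") ? t : "http://" + t;
            String host = URI.create(url).getHost();
            return host == null ? "" : host.toLowerCase();
        } catch (IllegalArgumentException e) {
            return "";
        }
    }

    // Classic Levenshtein edit distance between two strings.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // visibleText: the text shown to the user; href: the real link target;
    // knownHosts: assumed list of legitimate hosts to compare against.
    static List<String> analyze(String visibleText, String href, List<String> knownHosts) {
        List<String> flags = new ArrayList<>();
        // Only treat the visible text as a URL if it looks like one (contains a dot).
        String visibleHost = visibleText.contains(".") ? hostOf(visibleText) : "";
        String actualHost = hostOf(href);
        // 1) visual link and actual link differ
        if (!visibleHost.isEmpty() && !visibleHost.equals(actualHost)) flags.add("VISIBLE_ACTUAL_MISMATCH");
        // 2) dotted decimal IP address instead of a DNS name
        if (DOTTED_IP.matcher(actualHost).matches()) flags.add("DOTTED_IP_HOST");
        // 3) encoding tricks: user-info '@' or percent-escapes in the link
        if (href.contains("@") || PERCENT_ESCAPE.matcher(href).find()) flags.add("ENCODED_LINK");
        // 4) host is similar, but not identical, to a known legitimate host
        for (String known : knownHosts) {
            int d = editDistance(actualHost, known.toLowerCase());
            if (d > 0 && d <= 2) { flags.add("LOOKALIKE_DOMAIN"); break; }
        }
        return flags;
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList("www.icbc.com.cn");
        System.out.println(analyze("www.icbc.com.cn", "http://www.lcbc.com.cn/login", known));
        System.out.println(analyze("Click here", "http://192.0.2.7/update", known));
    }
}
```

A link that raises none of the flags is not thereby proven safe; the flags are only indicators that a later scoring stage could weigh.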

CHAPTER 2
SYSTEM ANALYSIS

2.1 EXISTING SYSTEM

We briefly review the existing approaches to anti-phishing.

1) Detect and block the phishing Web sites in time: If we can detect the phishing Web sites in time, we can block them and prevent phishing attacks. It is relatively easy to determine (manually) whether a site is a phishing site, but it is difficult to find phishing sites in time. Here we list two methods for phishing site detection. A) The Web master of a legitimate Web site periodically scans the root DNS for suspicious sites (e.g. www.1cbc.com.cn vs. www.icbc.com.cn). B) Since the phisher must duplicate the content of the target site, he must use tools to download the Web pages from the target site automatically. It is therefore possible to detect this kind of download at the Web server and trace it back to the phisher. Both approaches have shortcomings. DNS scanning increases the overhead of the DNS systems and may cause problems for normal DNS queries; furthermore, many phishing attacks simply do not require a DNS name. As for download detection, clever phishers may easily write tools that mimic human behavior to defeat it.

2) Enhance the security of the Web sites: Business Web sites, such as those of banks, can adopt new methods to guarantee the security of users' personal information. One method is to use hardware devices. For example, Barclays bank provides a hand-held card reader to its users. Before shopping on the net, users insert their credit card into the card reader and enter their personal identification number (PIN); the card reader then produces a one-time security password, and users can perform transactions only after the right password is entered. Another method is to use biometric characteristics (e.g. voice, fingerprint, iris) for user authentication. For example, PayPal tried to replace single-password verification with voice recognition to enhance the security of its Web site. With these methods, phishers cannot accomplish their tasks even after they have obtained part of the victims' information. However, all of these techniques need additional hardware to realize the authentication between the users and the Web sites, and hence increase cost and bring a certain inconvenience. It will therefore take time for them to be widely adopted.

3) Block the phishing e-mails with various spam filters: Phishers generally use e-mails as 'bait' to lure potential victims. SMTP (Simple Mail Transfer Protocol) is the protocol used to deliver e-mails on the Internet. It is a very simple protocol that lacks the necessary authentication mechanisms. Information related to the sender, such as the sender's name and e-mail address and the route of the message, can be counterfeited in SMTP. Thus, attackers can send out large numbers of spoofed e-mails that appear to come from legitimate organizations. Because phishers hide their identities when sending spoofed e-mails, phishing attacks would decrease dramatically if anti-spam systems could determine whether an e-mail was really sent by the announced sender (Am I Whom I Say I Am?).

From this point of view, techniques that prevent senders from counterfeiting their sender ID (e.g. Microsoft's SIDF) can defeat phishing attacks efficiently. SIDF is a combination of Microsoft's Caller ID for E-mail and the SPF (Sender Policy Framework) developed by Meng Weng Wong. Both Caller ID and SPF check the domain name of the e-mail sender to verify that the e-mail was sent from a server authorized to send e-mail for that domain, and from that determine whether the e-mail uses a spoofed address. If the address is faked, the Internet service provider can conclude that the e-mail is spam.

The spoofed e-mails used by phishers are one type of spam. Spam filters can therefore also be used to filter phishing e-mails. For example, blacklists, whitelists, keyword filters, Bayesian filters with self-learning abilities, and E-Mail Stamp can all be used at the e-mail server or on client systems. Most of these anti-spam techniques filter at the receiving side by scanning the contents and the addresses of received e-mails, and they all have pros and cons. Blacklists and whitelists cannot work if the names of the spammers are not known in advance. Keyword filters and Bayesian filters detect spam based on content and hence can detect unknown spam, but they can also produce false positives and false negatives. Furthermore, spam filters are designed for general spam and may not be very suitable for filtering phishing e-mails, since they generally do not consider the specific characteristics of phishing attacks.
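The sender-domain check that Caller ID and SPF perform can be illustrated with a deliberately simplified sketch. Real SPF publishes its policy in DNS records and supports several mechanisms; here an in-memory map stands in for that published policy. The class SenderCheck, its methods, and the example domain and addresses are hypothetical and introduced only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for a sender-authorization check; not an SPF implementation.
class SenderCheck {
    enum Result { PASS, FAIL, NONE }

    // Assumed in-memory policy: domain -> IP addresses allowed to send mail for it.
    private final Map<String, Set<String>> policy = new HashMap<>();

    // Registers the servers a domain authorizes (in real SPF this lives in DNS).
    void publish(String domain, String... ips) {
        policy.put(domain.toLowerCase(), new HashSet<>(Arrays.asList(ips)));
    }

    // Returns the domain part of an e-mail address, or "" if there is none.
    static String domainOf(String address) {
        int at = address.lastIndexOf('@');
        return at < 0 ? "" : address.substring(at + 1).toLowerCase();
    }

    // PASS if the connecting server is authorized for the claimed sender domain,
    // FAIL if the domain has a policy that excludes it, NONE if no policy is known.
    Result check(String senderAddress, String connectingIp) {
        Set<String> allowed = policy.get(domainOf(senderAddress));
        if (allowed == null) return Result.NONE;
        return allowed.contains(connectingIp) ? Result.PASS : Result.FAIL;
    }

    public static void main(String[] args) {
        SenderCheck check = new SenderCheck();
        check.publish("example.com", "192.0.2.10");
        System.out.println(check.check("billing@example.com", "192.0.2.10"));
        System.out.println(check.check("billing@example.com", "198.51.100.5"));
    }
}
```

A FAIL result corresponds to the case described above in which the announced sender address is spoofed; a NONE result leaves the decision to the content-based filters.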

CHAPTER 3
REQUIREMENT SPECIFICATIONS

HARDWARE AND SOFTWARE SPECIFICATION

3.2.1 HARDWARE REQUIREMENTS
Hard disk : 250 GB and above
RAM : 2 GB
Processor speed : Dual core

3.2.2 SOFTWARE REQUIREMENTS
Operating System : Windows 7
Documentation Tool : MS Word 2000

3.2.3 TECHNOLOGIES USED
JSP
Servlets
Apache Tomcat 5.5

3.2.4 DATABASE
MySQL

3.3 TECHNOLOGIES USED

3.3.1 JAVA: Java is platform independent. It is an object-oriented programming language developed initially by James Gosling and colleagues at Sun Microsystems. The language, initially called Oak (named after the oak trees outside Gosling's office), was intended to replace C++, although its feature set more closely resembles that of Objective-C.

CHAPTER 4
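BLOCK DIAGRAM

Fig. 4.1 Block diagram (figure not reproduced)

4.1 Data Flow Diagram

[Figure not reproduced. Recoverable labels: Website Homepage, Register, Login (Correct / Incorrect), Database, User Homepage, Mail Send, Mail Server, Mail Receive, Logout, Layer 1, Layer 2, Layer 3, Final Phishing.]

4.2 Sequence Diagram

[Figure not reproduced.]

Activity Diagram

[Figure not reproduced.]

Use Case Diagram

[Figure not reproduced.]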

SYSTEM DESIGN

The website phishing detection rate is computed from six criteria: URL & Domain Identity, Security & Encryption, Source Code & JavaScript, Page Style & Contents, Web Address Bar, and Social Human Factor, as shown in Table I. The table also shows that each criterion has a different number of components: five components each for URL & Domain Identity, Source Code & JavaScript, Page Style & Contents, and Web Address Bar; four components for Security & Encryption; and three components for Social Human Factor. There are therefore twenty-seven components in total.

The website phishing fuzzy model has three layers, as shown in figure 2. The first layer contains only the URL & Domain Identity criterion, with a weight of 0.3 reflecting its importance. The second layer contains the Security & Encryption and Source Code & JavaScript criteria, with a weight of 0.2 each. The third layer contains the Page Style & Contents, Web Address Bar, and Social Human Factor criteria, with a weight of 0.1 each. The six criteria have been prioritized according to their importance, using weights derived from website phishing experiments, case studies, analysis of anti-phishing tools, web surveys, phishing quizzes, a detailed questionnaire, and feedback from phishing experts.
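The layer structure above can be written down as data, which makes the arithmetic easy to verify: the component counts add up to 27 and the weights add up to 1.0. The sketch below is only an illustration. The class and field names are assumptions, and the weightedScore method shows one possible use of the weights (a plain weighted sum of per-criterion scores in [0, 1]), which the source does not state to be the method actually used.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical tabulation of the six criteria with their layer, weight and component count.
class PhishingCriteria {
    static final class Criterion {
        final String name;
        final int layer;
        final double weight;
        final int components;

        Criterion(String name, int layer, double weight, int components) {
            this.name = name;
            this.layer = layer;
            this.weight = weight;
            this.components = components;
        }
    }

    // Values taken from the text: layers 1-3, weights 0.3 / 0.2 / 0.1, component counts 5 / 4 / 3.
    static final List<Criterion> CRITERIA = Arrays.asList(
        new Criterion("URL & Domain Identity", 1, 0.3, 5),
        new Criterion("Security & Encryption", 2, 0.2, 4),
        new Criterion("Source Code & JavaScript", 2, 0.2, 5),
        new Criterion("Page Style & Contents", 3, 0.1, 5),
        new Criterion("Web Address Bar", 3, 0.1, 5),
        new Criterion("Social Human Factor", 3, 0.1, 3));

    static int totalComponents() {
        int total = 0;
        for (Criterion c : CRITERIA) total += c.components;
        return total;
    }

    static double totalWeight() {
        double total = 0.0;
        for (Criterion c : CRITERIA) total += c.weight;
        return total;
    }

    // Assumption: scores[i] in [0, 1] is the phishing score of CRITERIA.get(i).
    static double weightedScore(double[] scores) {
        if (scores.length != CRITERIA.size()) {
            throw new IllegalArgumentException("expected one score per criterion");
        }
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) sum += CRITERIA.get(i).weight * scores[i];
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("components = " + totalComponents());
        System.out.println("weights    = " + totalWeight());
    }
}
```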

5.1 MODULES
1) Webpage Creation
2) E-mail Process
3) Implementing Fuzzy Logic Model
4) Final Website Phishing Rate

Webpage Creation:

This module builds the web pages, each of which includes a header and a footer. The index page has a login form (username and password) and contains the homepage details about this project. The functions provided are:
New user: Register
Old user: Login
Remember the password

E-mail Process: This module covers sending and receiving mail using the JES server. The mail composing page has fields for the recipient's e-mail address, the subject, and the content. All mail received is placed in the corresponding mail inbox.

Input: User sends the e-mail

Output: User receives the e-mail

Implementing Fuzzy Logic Model: The essential advantage offered by fuzzy logic techniques is the use of linguistic variables to represent Key Phishing Characteristic Indicators and to relate them to website phishing probability. The website phishing detection rate is computed from six criteria: URL & Domain Identity, Security & Encryption, Source Code & JavaScript, Page Style & Contents, Web Address Bar, and Social Human Factor. The website phishing fuzzy model has three layers. The first layer contains only the URL & Domain Identity criterion, with a weight of 0.3 reflecting its importance; the second layer contains the Security & Encryption and Source Code & JavaScript criteria, with a weight of 0.2 each; the third layer contains the Page Style & Contents, Web Address Bar, and Social Human Factor criteria, with a weight of 0.1 each.

Input: The e-mail message, evaluated in the 3 layers

Output: The status of the message in each of the 3 layers (Genuine, Fake, Uncertain)
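To make the idea of linguistic variables concrete, the sketch below maps a numeric layer score to the three statuses named above using simple fuzzy membership functions. It is only an illustration under stated assumptions: the score range [0, 1], the breakpoints 0.2, 0.5 and 0.8, the shoulder and triangle shapes, and the rule of picking the status with the highest membership are all hypothetical and are not taken from the source.

```java
// Hypothetical fuzzy sets over a layer score in [0, 1] (0 = no phishing indicators).
class LayerStatus {
    enum Status { GENUINE, UNCERTAIN, FAKE }

    // Left shoulder: fully Genuine up to 0.2, falling linearly to 0 at 0.5.
    static double genuine(double x) {
        if (x <= 0.2) return 1.0;
        if (x >= 0.5) return 0.0;
        return (0.5 - x) / 0.3;
    }

    // Triangle: rises from 0.2, peaks at 0.5, falls back to 0 at 0.8.
    static double uncertain(double x) {
        if (x <= 0.2 || x >= 0.8) return 0.0;
        return x <= 0.5 ? (x - 0.2) / 0.3 : (0.8 - x) / 0.3;
    }

    // Right shoulder: rises from 0.5, fully Fake from 0.8 upward.
    static double fake(double x) {
        if (x <= 0.5) return 0.0;
        if (x >= 0.8) return 1.0;
        return (x - 0.5) / 0.3;
    }

    // Picks the status with the highest membership degree (ties favor the earlier status).
    static Status classify(double x) {
        double g = genuine(x), u = uncertain(x), f = fake(x);
        if (g >= u && g >= f) return Status.GENUINE;
        return u >= f ? Status.UNCERTAIN : Status.FAKE;
    }

    public static void main(String[] args) {
        for (double x : new double[] {0.1, 0.3, 0.5, 0.7, 0.9}) {
            System.out.println(x + " -> " + classify(x));
        }
    }
}
```

Because neighbouring sets overlap, a score such as 0.3 belongs partly to Genuine and partly to Uncertain, which is the behaviour that distinguishes a fuzzy variable from a hard threshold.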

Final Website Phishing Rate: In the last phase of the website phishing rule base there are three inputs (layer one, layer two and layer three) and one output, which is the rate of the phishing website. Since each of the three inputs takes one of three values, the rule base contains 3^3 = 27 entries. The output of the final website phishing rule base is one of the final output fuzzy sets (Very Legitimate, Legitimate, Suspicious, Phishy or Very Phishy), representing the final phishing website rate.

Input: The status of the 3 layers

Output: The final phishing rate (Very Legitimate, Legitimate, Suspicious, Phishy, Very Phishy)
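A rule base with three inputs of three values each can be enumerated exhaustively. The sketch below builds such a 27-entry table. The source does not list the individual rules, so the entries here are generated from an assumed scoring scheme: statuses count as 0, 1 and 2, layers are ranked 3 : 2 : 1 in line with the per-criterion weights 0.3, 0.2 and 0.1, and the score bands for the five outputs are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical 27-entry rule base: (layer 1, layer 2, layer 3) status -> final rate.
class FinalRateRules {
    enum Status { GENUINE, UNCERTAIN, FAKE }
    enum Rate { VERY_LEGITIMATE, LEGITIMATE, SUSPICIOUS, PHISHY, VERY_PHISHY }

    static final Map<List<Status>, Rate> RULES = buildRules();

    // Enumerates every combination of the three layer statuses (3 * 3 * 3 = 27).
    static Map<List<Status>, Rate> buildRules() {
        Map<List<Status>, Rate> rules = new HashMap<>();
        for (Status l1 : Status.values()) {
            for (Status l2 : Status.values()) {
                for (Status l3 : Status.values()) {
                    // Assumed scoring: ordinal 0..2 per status, layer ranking 3 : 2 : 1.
                    int score = 3 * l1.ordinal() + 2 * l2.ordinal() + l3.ordinal();
                    rules.put(Arrays.asList(l1, l2, l3), toRate(score));
                }
            }
        }
        return rules;
    }

    // Assumed bands over the integer score range 0..12.
    static Rate toRate(int score) {
        if (score <= 2) return Rate.VERY_LEGITIMATE;
        if (score <= 4) return Rate.LEGITIMATE;
        if (score <= 7) return Rate.SUSPICIOUS;
        if (score <= 9) return Rate.PHISHY;
        return Rate.VERY_PHISHY;
    }

    // Looks up the final rate for the three layer statuses.
    static Rate rate(Status layer1, Status layer2, Status layer3) {
        return RULES.get(Arrays.asList(layer1, layer2, layer3));
    }

    public static void main(String[] args) {
        System.out.println("rules: " + RULES.size());
        System.out.println(rate(Status.FAKE, Status.GENUINE, Status.GENUINE));
    }
}
```

With this assumed ranking, a Fake status in layer one alone already moves the result to Suspicious, while a Fake status in layer three alone does not, which mirrors the greater weight the text gives to layer one.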

CONCLUSION AND FUTURE ENHANCEMENT

Phishing has become a serious network security problem, causing financial losses of billions of dollars to both consumers and e-commerce companies. Perhaps more fundamentally, phishing has made e-commerce distrusted and less attractive to ordinary consumers. In this paper, we have studied the characteristics of the hyperlinks embedded in phishing e-mails. The fuzzy website phishing model showed the significance and importance of the phishing website criterion URL & Domain Identity, represented by layer one. It also showed that even if some of the website phishing characteristics or layers are unclear or indefinite, the website can still be phishy, especially when other phishing characteristics or layers are obvious and clear. Conversely, even if some phishing characteristics or layers are observed, that does not necessarily mean that the website is phishy; it can still be safe and secure, especially when the other phishing characteristics or layers are not noticeable, visible or detectable.

REFERENCES

WholeSecurity Web Caller-ID, www.wholesecurity.com.

Anti-Phishing Working Group, Phishing Activity Trends Report, http://antiphishing.org/reports/apwg_report_DEC2005_FINAL.pdf, December 2005.

B. Adida, S. Hohenberger and R. Rivest, Lightweight Encryption for Email, USENIX Steps to Reducing Unwanted Traffic on the Internet Workshop (SRUTI), 2005.

S. M. Bridges and R. B. Vaughn, Fuzzy data mining and genetic algorithms applied to intrusion detection, Department of Computer Science, Mississippi State University, White Paper, 2001.

R. Dhamija and J. D. Tygar, The Battle against Phishing: Dynamic Security Skins, Proc. Symp. Usable Privacy and Security, 2005.

FDIC, Putting an End to Account-Hijacking Identity Theft, http://www.fdic.gov/consumers/consumer/idtheftstudy/identity_theft.pdf, 2004.

A. Y. Fu, L. Wenyin and X. Deng, Detecting Phishing Web Pages with Visual Similarity Assessment Based on Earth Mover's Distance (EMD), IEEE Transactions on Dependable and Secure Computing, vol. 3, no. 4, 2006.

A. Herzberg and A. Gbara, Protecting Naive Web Users, Draft of July 18, 2004.

C. Y. Ho, B. W. Ling and J. D. Reiss, "Fuzzy Impulsive Control of High-Order Interpolative Low-Pass Sigma-Delta Modulators," IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 53, No. 10, October 2006.

L. James, Phishing Exposed, Tech Target Article sponsored by Sunbelt Software, searchexchange.com, 2006.

M. Liu, D. Chen and C. Wu, "The continuity of Mamdani method," International Conference on Machine Learning and Cybernetics, vol. 3, pp. 1680-1682, 2002.

W. Liu, G. Huang, X. Liu, M. Zhang, and X. Deng, Phishing Web Page Detection, Proc. Eighth Int'l Conf. Documents Analysis and Recognition, pp. 560-564, 2005.

W. Liu, X. Deng, G. Huang and A. Y. Fu, An Antiphishing Strategy Based on Visual Similarity Assessment, IEEE Internet Computing, IEEE Computer Society, 1089-7801/06, 2006.

Microsoft Corp, Microsoft Phishing Filter: A New Approach to Building Trust in E-Commerce Content, White Paper, 2005.

S. Olsen, AOL tests caller ID for e-mail, CNET News.com, January 22, 2004.

Y. Pan and X. Ding, Anomaly Based Web Phishing Page Detection, Proceedings of the 22nd Annual Computer Security Applications Conference (ACSAC'06), Computer Society, 2006.

J. C. Perez, Yahoo airs antispam initiative, ComputerWeekly.com, December 8, 2003.

S. Shah, Measuring Operational Risks using Fuzzy Logic Modeling, Article, Towers Perrin, July 2003.

T. Sharif, Phishing Filter in IE7, http://blogs.msdn.com/ie/archive/2005/09/09/463204.aspx, September 9, 2006.

L. Wood, Document Object Model Level 1 Specification, http://www.w3.org, 2005.
