Detecting Phishing URLs Using Machine Learning & Lexical Feature-based Analysis
Phishing URLs are one of the greatest threats facing cybersecurity
professionals and practitioners. Countering them requires joint
effort and the use of current technology to help identify
Phishing URLs and control the spread of this threat. Many
researchers have investigated various machine learning
techniques to tackle this threat; however, applying machine
learning to this problem still faces many difficulties and obstacles.
The proposed approach detects Phishing URLs by analyzing each
URL to extract lexical features and then applying machine
learning models to the extracted features. The dataset was
collected from different sources and covers four attack
scenarios: Defacement, Spam, Phishing, and Malware; this
research, however, focuses on Phishing URLs. The dataset was
used as input to various machine learning and statistical
detection models (RF: Random Forest, DT: Decision Tree
Classifier, GNB: Gaussian Naive Bayes, KNN: k-Nearest
Neighbours, Logistic Regression, SVC: Support Vector
Classifier, QDA: Quadratic Discriminant Analysis, and
Perceptron), together with SMOTE (Synthetic Minority
Oversampling Technique), an oversampling method for
imbalanced classes. These models were employed to predict
Phishing URLs based on lexical features. The results show a
relatively good accuracy rate: the Random Forest (RF) model
produced the best accuracy (98%) of all the detection models,
as well as the best precision and recall (both 98%).
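As an illustration of the two ends of such a pipeline, the following is a minimal sketch, assuming Python 3 and only the standard library. It turns a raw URL into a handful of lexical features and scores binary predictions with accuracy, precision and recall. The feature names, the SUSPICIOUS_TOKENS list and the helpers lexical_features and evaluate are hypothetical and do not reproduce the feature set used in this study. The classifier that would sit between the two steps (e.g. RF) is omitted.

```python
import re
from urllib.parse import urlsplit

# Hypothetical keyword list; the study's actual lexical features are not reproduced here.
SUSPICIOUS_TOKENS = ("login", "verify", "secure", "account", "update", "bank")
# Matches a dotted-quad shape only; it does not check that each octet is <= 255.
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def lexical_features(url: str) -> dict:
    """Extract simple, illustrative lexical features from a raw URL string."""
    # urlsplit only fills in the host when a scheme is present, so add one if missing.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""  # hostname drops any user-info before '@' and the port
    lower = url.lower()
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parts.path),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "host_num_dots": host.count("."),
        "has_at_symbol": int("@" in url),
        "has_ip_host": int(bool(IPV4_RE.match(host))),
        "uses_https": int(parts.scheme == "https"),
        "num_suspicious_tokens": sum(tok in lower for tok in SUSPICIOUS_TOKENS),
    }


def evaluate(y_true, y_pred, positive=1) -> dict:
    """Compute accuracy, precision and recall for binary labels (positive = Phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    feats = lexical_features("http://paypal.com@192.168.0.1/secure-login")
    print(feats["has_ip_host"], feats["num_suspicious_tokens"])  # prints: 1 2
    print(evaluate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In practice, each feature dictionary would become one row of the model's input matrix. Lexical features like these can be computed from the URL string alone, without fetching the page, which keeps extraction cheap.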