Identify & Classify Malicious URLs and Domains with Prediction - Not Blacklists

Using machine learning technology, Swordphish is trained to understand key features that make up URLs or domain names and analyze their likelihood of being risky.


Swordphish Results

Results are provided by pre-trained Swordphish machine-learning classifiers. Swordphish does not render or evaluate content, but provides predictions based on approximately 50 features of the URL itself.

Predict vs. React

Hackers use VirusTotal too.  Swordphish predicts the maliciousness of any URL or domain. Phishing, Malware C&C and Ransomware DGA. No blacklists.

Add Prediction to Existing Workflows

Abuse management, DNS lookups, outbound proxy, firewall logs. Identify and measure risk posed by zero-day attack URLs and domains.

REST-based, super-fast API

Integrating machine learning-based prediction doesn’t need to be hard. Scale from hundreds of queries to hundreds of thousands.


FAQ

What is Swordphish?

Swordphish is a predictive tool that helps companies determine whether a URL or domain is likely to be malicious. Swordphish exposes a simple REST-based API designed to let users inject intelligence into existing anti-fraud and enterprise security use-cases. More specifically, Swordphish consists of three discrete machine-learning classifiers that have been trained to differentiate between “good” URLs/domains and “bad” URLs/domains.


Why do I need Swordphish?

There is nothing really new about web-based services, lookup engines or APIs that provide reputation about IP addresses, domains, URLs, etc. These have been around for decades and are already leveraged by nearly every type of infosec solution (firewalls, proxies, content filters, etc.). These rely on either proprietary or community-driven blacklists, or a combination of the two. These systems are very effective, have low(er) false positives and are easy to implement. However, they are also completely backward looking. Swordphish is designed to be entirely predictive. The classifiers contained within do not have any blacklist-based context. They are designed with prediction in mind. In short, blacklists look backward, Swordphish looks forward.


What can Swordphish tell me?

Swordphish currently supports three discrete classifiers, each trained to understand the unique features related to phishing, traditional malware C&C, and malware DGA (domain generation algorithms). The user supplies a single URL or domain, or a set of URLs or domains, and receives from each classifier the probability that each URL/domain is malicious. This is useful for any use-case that requires fast insight into URL/domain maliciousness at scale. Additionally, since we don’t rely on blacklists, Swordphish is quite adept at uncovering malicious phishing and C&C infrastructure that is brand-new, or zero-day.


What use-cases can Swordphish enhance or enable?

There are several that we have already identified and we expect to identify many more. For example:

  • Predictive Malware Detection (network egress for C&C)
  • Predictive Malware DGA/Ransomware detection (network egress for C&C)
  • Predictive Spear-phishing detection through URL analysis
  • ISP/ESP Abuse Triage (automatic sorting of massive abuse queues and prioritization)
  • Real-time malware/phish/domain orchestration work-flow enhancement (augment blacklist lookups in orchestration workflows with predictive lookups)
  • DNS/Proxy/Webfilter Malware/Phish Correlation

Why did you build Swordphish?

We were seeking a way to triage massive numbers of URLs quickly as part of our malicious site detection and takedown service, Detect Monitoring Service (DMS). Easy Solutions uses Swordphish internally to automatically score and identify phishing in near real-time. We believe that our customers can leverage Swordphish for their own use-cases as well.


How fast is Swordphish?

The classifiers themselves are extremely fast (~10ms per lookup). The majority of the request time is from round-trip latency, which is about 100ms on average.


How many requests per second can I submit?

Each API key will be restricted to 200 requests per second. If you need more, let us know.


How do the classifiers work?

Developers can refer to the technical document on the Developers page of our site. In short, we extracted around 60 features from the millions of URLs and domains in our training sets. From that process, Swordphish learned to classify URLs and domains with a high degree of accuracy by analyzing only the URL itself. This is an important point: Swordphish does not render the page, look up the domain in the DNS or rely on any external context. It uses just the structure of the URL itself. Scores are returned as a value between 0 and 1: 0 means the highest probability that the input is good, and 1 means the highest probability that it is bad.


How accurate is Swordphish?

Most machine-learning or predictive classifiers measure accuracy along a curve. In our testing of Swordphish, we found that the phishing classifier had an F1-score of 0.94 and an accuracy of over 95% with results above a 0.6 threshold. Your mileage may vary.


Developer Information

Calling the Swordphish API

The Application Programming Interface (API) allows customers to interact with our cloud-based service using industry standards. Swordphish supports the JSON output format. Please refer to the Easy Solutions Swordphish GitHub repository for some Swordphish API testing tools: https://github.com/easysolutionsinc/swordphish. The API is accessed through a simple URL via HTTP or HTTPS.

Swordphish supports traditional API key authentication. API keys are delivered to each user via email upon approval and key provisioning. A well-formed API key must be supplied within the HTTP header of each query. Queries without a key return a 401 error, and queries with an invalid key return a 403 error. To call the API, simply make a request with the key in a header. For more detail, see the query format.


URL FORMAT:
https://api.easysol.io/swordphish/

HTTP VERB:
POST

HEADER:
Apikey: [api key]
Content-type: application/json

PAYLOAD:
{
"urlArray": [ "URL 1","URL 2","...","URL n"],
"force_clf": true
}

urlArray (JSON array): (mandatory) JSON array with all URLs to analyze. The maximum number of URLs allowed per call is 1000. A 400 error response is returned if the maximum is exceeded.
force_clf (Boolean): (mandatory) true or false. Set force_clf to true to ignore Alexa ranking and force the return of classifier values.
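
The query format above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `chunk_urls` and `build_request` are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://api.easysol.io/swordphish/"  # endpoint from URL FORMAT
MAX_URLS_PER_CALL = 1000  # documented per-call limit


def chunk_urls(urls, size=MAX_URLS_PER_CALL):
    """Split a list of URLs into batches no larger than the per-call limit."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_request(urls, api_key, force_clf=True):
    """Build (but do not send) a POST request for one batch of URLs."""
    if len(urls) > MAX_URLS_PER_CALL:
        # The API would answer with a 400 error, so fail early instead.
        raise ValueError("at most 1000 URLs are allowed per call")
    payload = {"urlArray": list(urls), "force_clf": force_clf}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Apikey": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# To send a request (requires a valid key and network access):
#   with urllib.request.urlopen(build_request(batch, "YOUR_KEY")) as resp:
#       results = json.load(resp)
```

Batching before building requests keeps each call under the 1000-URL limit, so a long list never triggers the corresponding 400 response.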

Swordphish API Query Format

RESPONSE:
{
"dga": [DGA classifier probability],
"malware": [Malware classifier probability],
"phishing": [Phishing classifier probability],
"rank": [Alexa Rank],
"url": "URL_1"
}

API RATE:
50,000 API calls per day.
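
As a rough capacity check, the per-call URL limit and the daily call rate can be combined. The sketch below assumes that the daily rate counts calls rather than URLs and that every batch is fully packed; the function names are illustrative.

```python
import math

MAX_URLS_PER_CALL = 1000  # documented per-call URL limit
CALLS_PER_DAY = 50_000    # documented daily API rate


def calls_needed(url_count):
    """Number of API calls required when every batch is fully packed."""
    return math.ceil(url_count / MAX_URLS_PER_CALL)


def fits_daily_quota(url_count):
    """True if the URLs can be scored within one day's call allowance."""
    return calls_needed(url_count) <= CALLS_PER_DAY


print(calls_needed(2_500))                # 3
print(MAX_URLS_PER_CALL * CALLS_PER_DAY)  # 50000000
```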

Success Response Codes:


Code: 200: OK
content:
[
{
"dga": [DGA classifier probability],
"malware": [Malware classifier probability],
"phishing": [Phishing classifier probability],
"rank": [Alexa Rank],
"url": "URL_1"
},
{
"dga": [DGA classifier probability],
"malware": [Malware classifier probability],
"phishing": [Phishing classifier probability],
"rank": [Alexa Rank],
"url": "URL_2"
},
{
"dga": [DGA classifier probability],
"malware": [Malware classifier probability],
"phishing": [Phishing classifier probability],
"rank": [Alexa Rank],
"url": "URL_N"
}
]
Note: The DGA, malware and phishing classifiers return -1 when an unidentified case occurs. The rank result is -1 if the URL is not contained within the Alexa top 1 million list.
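
Because -1 is a sentinel rather than a probability, client code may want to separate it from real scores before doing any comparison. A minimal sketch, assuming each response entry has already been parsed into a Python dict (the `interpret` helper and its output keys are hypothetical):

```python
CLASSIFIERS = ("dga", "malware", "phishing")


def interpret(result):
    """Replace the documented -1 sentinels with None so they are never
    mistaken for a probability or a real Alexa rank."""
    scores = {
        name: (None if result[name] == -1 else result[name])
        for name in CLASSIFIERS
    }
    rank = None if result["rank"] == -1 else result["rank"]
    return {"url": result["url"], "scores": scores, "alexa_rank": rank}


entry = {
    "dga": -1,
    "malware": 0.2351330099992591,
    "phishing": 0.8084942834942835,
    "rank": -1,
    "url": "http://10.25.154.2/encode/",
}
print(interpret(entry)["scores"]["dga"])  # None
```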

Error Response Codes:

Error Response 400
Code: 400: Bad Request
Cause: Swordphish API Server cannot understand the body
Content:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><title>400 Bad Request</title><h1>Bad Request</h1><p>The browser (or proxy) sent a request that this server could not understand.</p>

Error Response 400
Code: 400: Bad Request
Cause: The limit of 1000 URLs per request was exceeded
Content:
bad request, max URL allowed was exceeded

Error Response 400
Code: 400: Bad Request
Cause: The payload is missing urlArray, force_clf, or both
Content:
Bad request

Error Response 401
Code: 401: Unauthorized
Cause: No API key was sent in the headers. Send an Apikey header with a valid value
Content:
{"message":"No API Key found in headers, body or querystring"}

Error Response 403
Code: 403: Forbidden
Cause: API Key sent in header is invalid. Ask the administrator for a new API key
Content:
{"message": "Invalid authentication credentials"}

Error Response 405
Code: 405: Method Not Allowed
Cause: The Swordphish API was called with the wrong HTTP verb. The only HTTP verb currently supported is POST
Content:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><title>405 Method Not Allowed</title><h1>Method Not Allowed</h1><p>The method is not allowed for the requested URL.</p>

Error Response 429
Code: 429: Too Many Requests
Cause: The API rate limit was exceeded
Content:
{ "message": "API rate limit exceeded" }

Error Response 500
Code: 500: Internal Server Error
Cause: Something went wrong on the server. Please try again. If the problem continues, contact the administrator
Content:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><title>500 Internal Server Error</title><h1>Internal Server Error</h1><p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>

Error Response 502
Code: 502: Bad Gateway
Cause: Swordphish Backend is down. Please contact swordphish@easysol.net.
Content:
An invalid response was received from the upstream server
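
One way a client might act on the response codes listed above is to map each one to a follow-up action. The action labels in this sketch are hypothetical and simply restate the documented causes; they are not part of the API.

```python
def action_for_status(code):
    """Suggest a follow-up action for a documented Swordphish status code."""
    if code == 200:
        return "ok"
    if code in (400, 405):
        # Malformed body, too many URLs, missing parameters, or wrong verb.
        return "fix_request"
    if code in (401, 403):
        # 401: no key was sent; 403: the key sent is invalid.
        return "check_api_key"
    if code in (429, 500):
        # Rate limit exceeded or a server error: try again later.
        return "retry_later"
    if code == 502:
        # Backend is down: the documentation says to contact support.
        return "contact_support"
    return "unexpected"


print(action_for_status(429))  # retry_later
```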

Example JSON Query and Response

REQUEST:
curl -H "Content-Type: application/json" -H "apikey: [api key]" -X POST -d '{"urlArray":[ "http://www.cnn.com","http://gueymx.com/encode/","http://10.25.154.2/encode/" ],"force_clf":false }' https://api.easysol.io/swordphish/

RESPONSE:
[
{
"dga": 0,
"malware": 0,
"phishing": 0,
"rank": 91,
"url": "http://www.cnn.com"
},
{
"dga": 0.8450319728983464,
"malware": 0.2351330099992591,
"phishing": 0.8084942834942835,
"rank": -1,
"url": "http://gueymx.com/encode/"
},
{
"dga": -1,
"malware": 0.2351330099992591,
"phishing": 0.8084942834942835,
"rank": -1,
"url": "http://10.25.154.2/encode/"
}
]

Usability Considerations

Swordphish is designed to score URLs and domains based solely on the structure and contents of the URL itself. Since Swordphish only requires a URL as an input, it can be used for many different use-cases and is very fast. However, Swordphish is not a blacklist. A user should not expect Swordphish to return results that are identical to blacklists. While blacklists usually contain some sense of temporal scoring (i.e., blacklists become more accurate over time) Swordphish performs best where blacklists are weak—closer to zero hour.

Swordphish Scoring Methodology

Swordphish combines statistical analysis of a URL with a machine learning classifier to classify phishing and malware related URLs based only on the URL itself. Because every phishing and malware attack involves a URL, classification based on the URL offers a defense that applies to all of them. The algorithm extracts over fifty features by analyzing the structure of the URL, for example by estimating the Kullback-Leibler divergence between the normalized character frequency of the English language and that of the URL. Other features include the character frequencies, the number of “@” and “-” symbols, the number of top-level domains in the URL, whether the URL is an IP address, the length of the URL and the number of suspicious words in it.
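
To make a few of these features concrete, here is an illustrative Python sketch. It is not the Swordphish implementation: the letter-frequency table holds approximate values for English text, the direction of the divergence (URL letters relative to English) is an assumption, and all function names are hypothetical.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlsplit

# Approximate relative letter frequencies for English (illustrative values).
ENGLISH_FREQ = {
    "e": 0.127, "t": 0.091, "a": 0.082, "o": 0.075, "i": 0.070, "n": 0.067,
    "s": 0.063, "h": 0.061, "r": 0.060, "d": 0.043, "l": 0.040, "c": 0.028,
    "u": 0.028, "m": 0.024, "w": 0.024, "f": 0.022, "g": 0.020, "y": 0.020,
    "p": 0.019, "b": 0.015, "v": 0.010, "k": 0.008, "j": 0.002, "x": 0.002,
    "q": 0.001, "z": 0.001,
}
_NORM = sum(ENGLISH_FREQ.values())  # normalize so the table sums to 1


def kl_from_english(text):
    """KL divergence D(P || Q): P = letter distribution of `text`,
    Q = English letter distribution. Non-letters are ignored."""
    letters = [c for c in text.lower() if c in ENGLISH_FREQ]
    if not letters:
        return 0.0
    total = len(letters)
    return sum(
        (n / total) * math.log((n / total) / (ENGLISH_FREQ[ch] / _NORM))
        for ch, n in Counter(letters).items()
    )


def host_is_ip(url):
    """True if the host part of the URL is a literal IP address."""
    if "//" not in url:
        url = "//" + url  # let urlsplit treat a bare host as the netloc
    try:
        ipaddress.ip_address(urlsplit(url).hostname or "")
        return True
    except ValueError:
        return False


def extract_features(url):
    """A handful of the structural features named in the text."""
    return {
        "length": len(url),
        "at_count": url.count("@"),
        "hyphen_count": url.count("-"),
        "host_is_ip": host_is_ip(url),
        "kl_from_english": kl_from_english(url),
    }
```

In this sketch, a string whose letters are distributed very differently from English produces a larger divergence, which is the kind of signal such a feature is meant to capture.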

The results show that a defense this simple performs well by standard statistical measures. The resulting phishing model had an F1-score of 0.94 and an accuracy of over 95%, and showed great stability in the holdout set.

Swordphish returns a set of decimal values between 0 and 1. For most practical applications, return values greater than or equal to 0.6 should be considered highly suspicious and highly likely to indicate phishing URLs. As a rule of thumb, a higher threshold yields fewer false positives and more false negatives, and vice versa.
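
Applying the 0.6 rule of thumb could look like the sketch below. Using the same threshold for all three classifiers is an assumption (the text states it for phishing), and the function name is hypothetical.

```python
DEFAULT_THRESHOLD = 0.6  # rule-of-thumb value from the text


def flagged_classifiers(result, threshold=DEFAULT_THRESHOLD):
    """Names of the classifiers whose score meets the threshold.
    A -1 sentinel can never meet a threshold between 0 and 1."""
    return [
        name
        for name in ("dga", "malware", "phishing")
        if result.get(name, -1) >= threshold
    ]


entry = {
    "dga": 0.8450319728983464,
    "malware": 0.2351330099992591,
    "phishing": 0.8084942834942835,
    "rank": -1,
    "url": "http://gueymx.com/encode/",
}
print(flagged_classifiers(entry))                 # ['dga', 'phishing']
print(flagged_classifiers(entry, threshold=0.9))  # []
```

Raising the threshold in the second call flags nothing, which illustrates the trade-off: fewer false positives at the cost of more missed detections.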

Technical Description

Easy Solutions provides a suite of anti-fraud products and services that attack fraud at every stage of the process, from recon and targeting, to setup and launch to cash-out. In our efforts to minimize the impact of fraud in the earliest stages, Easy Solutions has built a suite of technologies to rapidly identify threats in a highly automated fashion. Swordphish was developed in this spirit to advance the state-of-the-art in high-volume and low-latency phishing detection. Existing phishing classification programs rely on inefficient mechanisms that blend rules, filters, signatures and manual classification to predict and accurately classify phishing attacks. These techniques are slow and create the conditions required for phishing attacks to remain profitable for attackers.

Swordphish is a web-based, RESTful API interface connected to a high-performance and scalable web infrastructure designed with a single purpose—to classify any URL as phishing or benign in milliseconds. Swordphish is built upon an advanced supervised machine learning implementation leveraging random forests. This approach allows Swordphish to classify any well-formed URL with 95% accuracy.

Tool Description

Swordphish delivers the fastest, most comprehensive and most accurate phishing URL services on the market. Swordphish is based upon a proprietary supervised machine-learning system that is trained to automatically predict the likelihood that any URL is associated with a phishing attack. This predictive capability is based upon historical and on-going training of the Swordphish classifier with millions of URLs.

Easy Solutions provides phishing detection, threat detection and takedown services for hundreds of global banks. Swordphish allows any organization to bring some of the same internal Easy Solutions threat detection capabilities into its existing workflow via a lightweight, RESTful API. Swordphish was designed expressly to add accuracy, efficiency and automation to existing threat detection and classification workflows.

Swordphish – Benefits

Phishing attacks continue despite huge advancements in spam filtering, education, content detection and hosting countermeasures. Even phishing sites that are hosted for only minutes or hours provide enough incentive for attackers to keep staging and launching attacks. Swordphish provides the following benefits:

Cloud-based REST API interface

No hardware to install, no software to deploy


Improve abuse processing efficiency by 10x or more

Swordphish is trained and updated constantly and provides consistent 95% detection with low false positives.

  • Stream URLs to Swordphish to score in near real-time
  • Redeploy human analysts for higher order analysis
  • Improve ability to handle surges in abuse case volume