Graph theory applied to a portion of the Dark Web shows it a set of largely isolated dark silos
1.5.2017 securityaffairs Security
A group of researchers conducted a study on the Dark Web leveraging the Graph theory. This hidden space appears as composed by sparse and isolated silos.
A group of experts from the Massachusetts Institute of Technology’s SMART lab in Singapore has recently published an interesting research paper on the Dark Web.
The researchers collected and analyzed the dark web (a.k.a. the “onionweb”) hyperlink graph, they discovered highly dissimilar to the well-studied world wide web hyperlink graph.
The team led by Carlo Ratti, director of MIT’s Senseable City Lab, used the Graph theory as a tool for analyzing social relationships for the dark web.
The experts analyzed the Tor network, one of the most popular darknet, they used crawler leveraging the tor2web proxy onion.link.
It is important to highlight that the team focused its analysis on the Tor Network, that anyway represents just a portion of the dark web.
The team crawled onion.link using the commercial service scrapinghub.com, they used two popular lists of dark web sites trying to visit them and accessing all linked pages using a breadth-first search.
The team just included in their analysis websites which responded to avoid including in their results services that no longer exist.
“I.e., if we discover a link to a page on domain v, but domain v could not be reached after >10 attempts across November 2016–February 2017, we delete node v and all edges to node v.
In our analysis, before pruning nonresponding domains we found a graph of 13,117 nodes and 39,283 edges. After pruning, we have a graph of 7, 178 nodes and 25, 104 edges (55% and 64% respectively)” states the researchers.
The first discrepancy emerged from the research is related to the number of the active .onion domain. The maintainers at the Tor Project Inc. states that the Tor network currently hosts ∼60, 000 distinct, active .onion addresses, meanwhile the team of experts has found only 7, 178 active .onion domains.
The researchers attribute this high-discrepancy to various messaging services— particularly TorChat, Tor Messenger, and Ricochet in which each user is identified by a unique .onion domain.
The Graph-theoretic results show that ∼30% of domains have exactly one incoming link—of which 62% come from one of the five largest out-degree hubs. 78% of all nodes received a connection from at least one of them.
The most intriguing aspect of the research is that 87% of sites do not link to any other site, this discovery has a significant impact on all graph-theoretic measures (see darkweb out-degree in the following image).
“We conclude that in the term “darkweb”, the word “web” is a connectivity misnomer. Instead, it is more accurate to view the darkweb as a set of largely isolated dark silos” wrote the experts. reads the paper. “In our darkweb graph, each vertex is a domain and every directed edge from u → v means there exists a page within domain u linking to a page within domain v. The weight of the edge from u → v is the number of pages on domain u linking to pages on domain v.”
I believe this research could be a starting point for further works, Ratti and his team, along with other researchers could conduct further investigations on the Dark Web, not limiting their analysis to the Tor Network.
Ratti announced that his team is working on the definition of new models to use in further researches.
“As next step,” Ratti said, “we are planning to develop a model to explain how a network develops when nodes do not trust each other.”