Originally published 5 October 2010

A wide range of tools is available with which users can analyze their data stored in data warehouses and production databases. These tools range from straightforward reporting tools via interactive online analytical processing tools to advanced statistical tools. All these tools help users in some way to improve their business operations and business decisions. They help by presenting data in a textual or graphical way by summarizing data, by grouping data, or by making predictions.

But there are things most of these tools can’t do, and that is analyze data when it’s structured as a graph or network and when that data must be analyzed by traversing the graph. For example, imagine a manager of a social networking website wants to know who the central members of the social network are. And let's define the term "central member" as a member who has the shortest paths to most of the other members. This problem can’t be solved by simply summarizing data, nor does it have anything to do with predicting. Instead, the data must be organized as a graph and a tool must be able to traverse that graph; it has to be able "walk" from node to node. And today, this is not a feature found in most reporting and analytical tools.

Analyzing network structures or graph-based structures is the domain of graph analytics. This is a special form of analytics that has been around for a very long time. In fact, the history of graph analytics and the underlying graph theory goes back to the early 18th century. Today, powerful tools and database servers are available that support graph analytics. Examples of graph database servers are InfiniteGraph, AllegroGraph RDFStore, Neo4j, and vertexdb. What’s special about them is that they allow fast online graph traversal even if the graphs consist of hundreds of millions of nodes.

Also imagine that users want to know what the cheapest flights from Amsterdam to Phoenix are leaving on March 1, 2007, with a maximum of two stops, and each stop should be less than 4 hours. Using SQL, the query would look as follows:

Graph database servers have been designed for this form of analytics. First of all, they would store this data more like a graph:

Secondly, their database languages and APIs are designed for traversing graphs which leads to straightforward and simple queries.

- All the flights from and to airports can be organized as a graph. In this case, the airports are the objects and the flights the relationships between the objects. Such a graph can be created for all the flights of one airline, or for a set of airlines (such as Expedia’s website).

- The network of all the members of a social networking site, such as LinkedIn and Facebook, plus all their relationships, can be arranged as a graph.

- The accounts of a bank with all the inter-account money transfers form a graph.

- In the context of a parcel service, all the parcel shipments between addresses worldwide can be organized as a graph.

- A visitor’s journey on an organization’s website can also be seen as a graph, where webpages form the objects and the visitor’s clicks become the relationships.

- In the context of a telecommunication company, all the call detail records between callers can be viewed as relationships between objects, and together they form an incredibly large graph.

With single path analysis the goal is to find a path through a graph starting with a specific vertex. The path is determined in steps. First, all the edges plus corresponding vertices that can be reached by one hop are evaluated. From the vertices found, one is selected and the first hop is made. Next, the edges and vertices of this vertex found is determined, and this process continues. The result of such an exercise is a path consisting of a number of vertices and edges.

When shortest path analysis is used, the shortest path is found between two vertices. Shortest means the smallest number of hops. Evidently, the shortest path between two vertices consists of one hop.

Optimal path analysis can be used to find the "best" path between two vertices. The best path could be the fastest, the safest, or the cheapest. The best is based on the properties of the vertices and the edges.

With path existence analysis, you determine whether paths exist between two vertices. In other words, if we start with two vertices and their edges are followed, will the paths meet somewhere? An example of path existence analysis is the challenge called the Six Degrees of Kevin Bacon. If a graph is created in which all the movie stars are the vertices and the edges represent the movies in which they played together, it is claimed that anyone can be linked to Kevin Bacon within six hops.

The last one we mention here is vertex centrality analysis. Various measures of the centrality of a vertex within a graph have been defined in graph theory. The higher such a measure is, the more "important" the vertex is in the graph. The following measures have been defined:

- Degree centrality: This measure indicates how many edges a vertex has. The more edges, the higher the degree centrality. A vertex with high degree centrality is generally an active vertex or a hub.
- Closeness centrality: This measure is the inverse of the sum of all shortest paths to other vertices. In other words, it indicates for a vertex the smallest number of hops to make to reach all other vertices individually. A vertex with high closeness centrality has short paths to many vertices.
- Betweenness centrality: This measure indicates the number of shortest paths a vertex is on. It shows a vertex’s position within a graph in terms of its ability to make connections to other groups in the graph.
- Eigenvector centrality: This measure indicates the importance of a vertex in a graph. Scores are assigned to vertices based on the principle that connections to high-scoring vertices contribute more to the score than equal connections to low-scoring vertices.

SOURCE: Extending Business Intelligence with Graph Analytics

**Recent articles by Rick van der Lans**

## Comments

Want to post a comment? Login or become a member today!

Be the first to comment!