Sunday, December 6, 2015

Visualizing Big Data Networks

“We now live in an economy that is the economy of information, of interconnectedness. In the same way as we had material science, we now have technology.”    -- Albert-László Barabási, physicist and network scientist


We live in the era of big data.  It’s so big that all digital data can now be measured in zettabytes (one zettabyte is one trillion gigabytes).  What’s equally amazing is the rate at which this data travels around the globe.  Cisco states that Internet traffic has increased fivefold over the past five years, and it anticipates that annual Internet traffic will pass the zettabyte threshold in 2016.[1]

Our connectivity to the world is becoming more and more datafied.  Information about our daily interactions is captured online.  These interactions represent our network, which is part of a much larger global network.  Networks are collections of entities with relations among them.[2]  In the last decade, great strides have been made in measuring and understanding these interactions.

Consider twenty years ago:  My interactions with friends and family were almost entirely analog.  I would call them on a landline phone, write letters, and conduct face-to-face conversations.  Now the majority of my interactions occur digitally.  I send emails and texts and post Facebook updates.  This digital data can easily be stored, analyzed and graphed.  For example, in the graphic below, Paul Butler visualizes the friendship relationships of 500 million people.[3]


This graph is just one sample of the emerging discipline of network science.  Networks can be seen not only within social media, but in many other areas.  Google uses network science to determine search rankings.  Amazon uses it for product recommendations.  Health organizations use it to study disease propagation, and the list goes on and on.

The key is to make sense of complex networks so that meaningful observations can be drawn from them.  The noted physicist Albert-László Barabási asserts that virtually all networks are quantifiable.  By correctly quantifying a network, it becomes possible to predict and eventually control its behavior:

“The number of highly connected or less-connected nodes is never random in the network. The way they break down, the way they evolve is never random in these networks. The way that hubs link to their neighborhood, the way the community is formed, the way the communities look, their number, their size, they all follow very precise laws and very quantifiable patterns..... Eventually, if you quantify properly, then you can mathematically formulate. If you mathematically formulate it, then you gain predictive power. If you gain predictive power, eventually you get to the point to be able to control it.”[4]

A powerful tool in understanding networks is visualization.  Visualization lays out the nodes and relationships in an understandable pattern.  In the graphic to the right, medical researchers created a network map of diseases and their associated genes.  The hope is to redefine how diseases are classified and potentially treat them at the genetic level.[5]

Visualizations such as these help analyze the properties of a network.  Many tools, such as Gephi, are available to analyze and visualize network data.  Properties such as centrality, density and clustering coefficient can be measured.  Graphs can be laid out in different ways to aid in understanding the network.  For example, the graph at the top has a geographic layout.  The disease graph uses a force-directed layout.
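
To make these three properties concrete, here is a minimal sketch in plain Python.  It is not how Gephi computes them; it only applies the standard textbook definitions to a tiny undirected network.  The four names and their friendships are hypothetical data invented for illustration, and the function names are my own.

```python
from itertools import combinations

# Hypothetical toy network: undirected friendships, invented for illustration.
edges = [("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Cat"), ("Cat", "Dan")]


def build_adjacency(edge_list):
    """Return a dict mapping each node to the set of its neighbors."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def density(adjacency):
    """Fraction of all possible undirected edges that are actually present."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    # Each undirected edge appears in two neighbor sets, so halve the total.
    m = sum(len(nbrs) for nbrs in adjacency.values()) / 2
    return 2 * m / (n * (n - 1))


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def clustering_coefficient(adjacency, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = adjacency[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adjacency[u])
    return 2 * links / (k * (k - 1))


graph = build_adjacency(edges)
print("density:", density(graph))
print("degree centrality:", degree_centrality(graph))
for person in sorted(graph):
    print(person, "clustering:", clustering_coefficient(graph, person))
```

In this toy network Cat is connected to everyone else, so Cat has the highest degree centrality, while Ann's two friends also know each other, which gives Ann a clustering coefficient of 1.0.  Degree centrality is only one of several centrality measures; others, such as betweenness, are not shown here.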

We live in the zettabyte era of big data.  All our actions and interactions are rapidly being datafied.  As the data collected grows by orders of magnitude, so does the complexity of its networks.  Network science will play a key role in analyzing the interconnectedness of data entities.  When networks are properly quantified, predictive models can be created.  Graph visualizations can help us understand and communicate these complex networks.  Ultimately these efforts can lead not only to predicting network behavior, but to controlling it.




References

1. Cisco.  2015 May.  “The Zettabyte Era—Trends and Analysis.”  White Paper.  http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/VNI_Hyperconnectivity_WP.html

2.  Ram, Sudha.  2015.  “Business Intelligence – Introduction to Networks.”  The University of Arizona.

3.  Butler, Paul. 2010 December 10.  “Visualizing Facebook Friends: Eye Candy in R.”  http://paulbutler.org/archives/visualizing-facebook-friends/

4.  Barabási, Albert-László.  2012 September 24.  “Thinking in Network Terms.”  Edge.org.  http://edge.org/conversation/thinking-in-network-terms

5.  Vidal, Marc; Barabási, Albert-László; Cusick, Michael.  2008 May 5.  “Mapping the Human ‘Diseasome’.”  The New York Times.  http://www.nytimes.com/interactive/2008/05/05/science/20080506_DISEASE.html