Network science


Network science is an academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks, considering distinct elements or actors represented by nodes and the connections between the elements or actors as links. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."

Background and history

The study of networks has emerged in diverse disciplines as a means of analyzing complex relational data. The earliest known paper in this field is the famous Seven Bridges of Königsberg written by Leonhard Euler in 1736. Euler's mathematical description of vertices and edges was the foundation of graph theory, a branch of mathematics that studies the properties of pairwise relations in a network structure. The field of graph theory continued to develop and found applications in chemistry.
Dénes Kőnig, a Hungarian mathematician and professor, wrote the first book in Graph Theory, entitled "Theory of finite and infinite graphs", in 1936.
In the 1930s Jacob Moreno, a psychologist in the Gestalt tradition, arrived in the United States. He developed the sociogram and presented it to the public in April 1933 at a convention of medical scholars. Moreno claimed that "before the advent of sociometry no one knew what the interpersonal structure of a group 'precisely' looked like". The sociogram was a representation of the social structure of a group of elementary school students. The boys were friends of boys and the girls were friends of girls with the exception of one boy who said he liked a single girl. The feeling was not reciprocated. This network representation of social structure was found so intriguing that it was printed in The New York Times. The sociogram has found many applications and has grown into the field of social network analysis.
Probabilistic theory in network science developed as an offshoot of graph theory with Paul Erdős and Alfréd Rényi's eight famous papers on random graphs. For social networks the exponential random graph model or p* is a notational framework used to represent the probability space of a tie occurring in a social network. An alternate approach to network probability structures is the network probability matrix, which models the probability of edges occurring in a network, based on the historic presence or absence of the edge in a sample of networks.
Interest in networks exploded around 2000, following new discoveries that offered novel mathematical framework to describe different network topologies, leading to the term 'network science'. Albert-László Barabási and Reka Albert discovered the scale-free networks nature of many real networks, from the WWW to the cell. The scale-free property captures the fact that in real network hubs coexist with many small degree vertices, and the authors offered a dynamical model to explain the origin of this scale-free state. Duncan Watts and Steven Strogatz reconciled empirical data on networks with mathematical representation, describing the small-world network.

Network classification

Deterministic network

The definition of deterministic network is defined compared with the definition of probabilistic network. In un-weighted deterministic networks, edges either exist or not, usually we use 0 to represent non-existence of an edge while 1 to represent existence of an edge. In weighted deterministic networks, the edge value represents the weight of each edge, for example, the strength level.

Probabilistic network

In probabilistic networks, values behind each edge represent the likelihood of the existence of each edge. For example, if one edge has a value equals to 0.9, we say the existence probability of this edge is 0.9.

Network properties

Often, networks have certain attributes that can be calculated to analyze the properties & characteristics of the network. The behavior of these network properties often define network models and can be used to analyze how certain models contrast to each other. Many of the definitions for other terms used in network science can be found in Glossary of graph theory.

Size

The size of a network can refer to the number of nodes or, less commonly, the number of edges which can range from to . In the case of a simple graph, we have ; for directed graphs, ; for directed graphs with self-connections allowed,. In the circumstance of a graph within which multiple edges may exist between a pair of vertices,.

Density

The density of a network is defined as a normalized ratio between 0 and 1 of the number of edges to the number of possible edges in a network with nodes. Network density is a measure of the percentage of "optional" edges that exist in the network and can be computed as
where and are the minimum and maximum number of edges in a connected network with nodes, respectively. In the case of simple graphs, is given by the binomial coefficient and, giving density
Another possible equation is whereas the ties are unidirectional. This gives a better overview over the network density, because unidirectional relationships can be measured.

Planar network density

The density of a network, where there is no intersection between edges, is defined as a ratio of the number of edges to the number of possible edges in a network with nodes, given by a graph with no intersecting edges, giving

Average degree

The degree of a node is the number of edges connected to it. Closely related to the density of a network is the average degree, . In the ER random graph model we can compute the expected value of : a random vertex has other vertices in the network available, and with probability, connects to each. Thus,.
Degree distribution
The degree distribution is a fundamental property of both real networks, such as the Internet and social networks, and of theoretical models. The degree distribution P of a network is defined to be the fraction of nodes in the network with degree k. The simplest network model, for example, the random graph, in which each of n nodes is independently connected with probability p, has a binomial distribution of degrees k. Most real networks, from the WWW to protein interaction networks, however, have a degree distribution that are highly right-skewed, meaning that a large majority of nodes have low degree but a small number, known as "hubs", have high degree. For such scale-free networks the degree distribution approximately follows a power law:, where γ is the degree exponent, and is a constant. Such scale-free networks have unexpected structural and dynamical properties, rooted in the diverging second moment of the degree distribution.

Average shortest path length (or characteristic path length)

The average shortest path length is calculated by finding the shortest path between all pairs of nodes, and taking the average over all paths of the length thereof. This shows us, on average, the number of steps it takes to get from one member of the network to another. The behavior of the expected average shortest path length as a function of the number of vertices of a random network model defines whether that model exhibits the small-world effect; if it scales as, the model generates small-world nets. For faster-than-logarithmic growth, the model does not produce small worlds. The special case of is known as ultra-small world effect.

Diameter of a network

As another means of measuring network graphs, we can define the diameter of a network as the longest of all the calculated shortest paths in a network. It is the shortest distance between the two most distant nodes in the network. In other words, once the shortest path length from every node to all other nodes is calculated, the diameter is the longest of all the calculated path lengths. The diameter is representative of the linear size of a network. If node A-B-C-D are connected, going from A->D this would be the diameter of 3.

Clustering coefficient

The clustering coefficient is a measure of an "all-my-friends-know-each-other" property. This is sometimes described as the friends of my friends are my friends. More precisely, the clustering coefficient of a node is the ratio of existing links connecting a node's neighbors to each other to the maximum possible number of such links. The clustering coefficient for the entire network is the average of the clustering coefficients of all the nodes. A high clustering coefficient for a network is another indication of a small world.
The clustering coefficient of the 'th node is
where is the number of neighbours of the 'th node, and is the number of connections between these neighbours. The maximum possible number of connections between neighbors is, then,
From a probabilistic standpoint, the expected local clustering coefficient is the likelihood of a link existing between two arbitrary neighbors of the same node.

Connectedness

The way in which a network is connected plays a large part into how networks are analyzed and interpreted. Networks are classified in four different categories:
  • Clique/''Complete Graph: a completely connected network, where all nodes are connected to every other node. These networks are symmetric in that all nodes have in-links and out-links from all others.
  • Giant Component: A single connected component which contains most of the nodes in the network.
  • Weakly Connected Component: A collection of nodes in which there exists a path from any node to any other, ignoring directionality of the edges.
  • Strongly Connected Component: A collection of nodes in which there exists a directed'' path from any node to any other.