
[...]ng up, rage feverishly for a few days and then largely disappear,³ and that is not what we wanted to find. Yet, as described in ?, we had only the last 200 tweets per user, so we needed to limit ourselves to a period where the data was most complete. We extracted a mentions network from 22 September 2014 (inclusive) until the end of our snowball-sampled data, 6 November 2014, a period of 46 days. The process for creating the network was the same as for the 7-day network, described in appendix B.

³ We suspect, for example, that the much-reported discussion of 'What colour is the dress?' fits into this category; see: http://www.bbc.co.uk/news/blogs-trending-31659395.

[Figure: axis and legend labels are broadcast score rank, positive sentiment fraction and negative sentiment fraction.]

The resulting network consisted of 491 417 users with 31 299 836 edges between them, coming from 22 594 048 tweets. For the first 40 days, the daily average was 776 k edges; for the last 6 days, when data collection was coming to an end, the daily average was only 40 k edges. The network has an average of 63.7 outgoing edges per user, corresponding to 46.0 tweets per user, and each user mentioned an average of 30.9 distinct recipients.

With the dataset chosen, we turn to the question of algorithms. Discovering communities algorithmically first requires a precise definition of how 'good' a given division of a social network into communities is. The most widely used measure of the 'goodness' of a division is called modularity [16]; it compares the fraction of edges that lie within a community in the network with the fraction that would be expected to lie within the community if the edges were placed at random. Many different versions of modularity have been proposed in the last decade.
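To make the comparison behind modularity concrete, the following is a minimal sketch. It assumes the standard form for an undirected, unweighted graph, which is only one of the many versions mentioned above and not necessarily the one used in [16]. The function name `modularity`, the toy edge list and the two example divisions are hypothetical.

```python
def modularity(edges, communities):
    """Modularity of a division of an undirected, unweighted graph.

    For each community, take the fraction of edges that lie inside it and
    subtract the fraction expected if edges were placed at random while
    keeping every node's degree fixed.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    score = 0.0
    for community in communities:
        members = set(community)
        internal = sum(1 for u, v in edges if u in members and v in members)
        degree_sum = sum(degree.get(node, 0) for node in members)
        score += internal / m - (degree_sum / (2 * m)) ** 2
    return score


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

good_split = [{0, 1, 2}, {3, 4, 5}]   # one community per triangle
bad_split = [{0, 3}, {1, 4}, {2, 5}]  # pairs with no internal edges

print(modularity(edges, good_split))
print(modularity(edges, bad_split))
```

On this toy graph the division into the two triangles scores higher than the division into unconnected pairs, which is the behaviour the definition is designed to capture.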
As we look at relatively unbalanced divisions (trying to identify small portions of a large network), we considered instead a different measure called conductance [17], which takes values from 0 to 1. Groups of users that are well connected internally but well separated from the rest of the network have values close to 0, and groups with few internal connections but lots of connections to the rest of the network have values close to 1. There is also a variant of conductance, called weighted conductance, that takes into account the weights on edges, rather than just their presence or absence. We use the number of messages exchanged between two users (in either direction) as the weight of the edge between them. Thus, weighted conductance depends not only on which users have corresponded with which others but also on how often. If $W_{ij}$ is the weight of the edge from user $i$ to user $j$, $S$ is a community and $\bar{S}$ denotes the remaining users, the weighted conductance of $S$ is

\[ \frac{\sum_{i \in S,\, j \in \bar{S}} W_{ij}}{\min\bigl(a(S),\, a(\bar{S})\bigr)}, \]

where $a(S) = \sum_{i \in S} \sum_{j \in V} W_{ij}$ (with $V$ being the set of all vertices, i.e. all users).

We used the following three algorithms to identify communities:

- The Louvain method on unweighted graphs, described in [18], as implemented in Python in the library [19] and in C++ by Lefebvre and Guillaume.⁴
- The Louvain method on weighted graphs, using the C++ implementation.
- The k-clique-communities method⁵ presented in [20], as implemented in the NetworkX Python library.

Using these three methods with different parameters, we produced a list of 98 078 candidate communities. For each community we calculated:

- the size of the community (number of nodes),
- the number [...]
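The weighted conductance defined above can be computed directly from message counts. The following is a minimal sketch, not the code used in the study. The helper names `symmetric_weights` and `weighted_conductance`, the dictionary layout and the toy message counts are all assumptions made for illustration. The weight of an edge is taken to be the number of messages exchanged in either direction, so that $W_{ij} = W_{ji}$.

```python
from collections import defaultdict


def symmetric_weights(message_counts):
    """Turn directed message counts into symmetric edge weights.

    message_counts maps (sender, recipient) -> number of messages.
    The returned dict maps (i, j) -> W_ij, the number of messages exchanged
    between i and j in either direction, so W_ij == W_ji.
    """
    weights = defaultdict(float)
    for (sender, recipient), count in message_counts.items():
        weights[(sender, recipient)] += count
        if sender != recipient:
            weights[(recipient, sender)] += count
    return dict(weights)


def weighted_conductance(weights, community):
    """Weight leaving S divided by min(a(S), a(S-bar))."""
    members = set(community)
    cut = 0.0        # sum of W_ij with i in S and j outside S
    a_inside = 0.0   # a(S): total weight on edges starting in S
    a_outside = 0.0  # a(S-bar): total weight on edges starting outside S
    for (i, j), w in weights.items():
        if i in members:
            a_inside += w
            if j not in members:
                cut += w
        else:
            a_outside += w
    denominator = min(a_inside, a_outside)
    if denominator == 0:
        raise ValueError("conductance is undefined when a(S) or a(S-bar) is zero")
    return cut / denominator


# Hypothetical toy data: two chatty pairs joined by a single message b -> c.
message_counts = {
    ("a", "b"): 3, ("b", "a"): 1,
    ("c", "d"): 5, ("d", "c"): 1,
    ("b", "c"): 1,
}
W = symmetric_weights(message_counts)

print(weighted_conductance(W, {"a", "b"}))  # well separated group
print(weighted_conductance(W, {"b", "c"}))  # group that cuts across both pairs
```

In this toy example the pair {a, b} gets a value close to 0, while the group {b, c}, which has little internal traffic and a lot of traffic to the rest of the network, gets a value at the top of the range.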
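To illustrate the idea behind the k-clique-communities method listed above, the following is a brute-force sketch for small graphs. It is not the NetworkX implementation the text refers to, and the function name `k_clique_communities_bruteforce` and the toy edge list are hypothetical. It assumes the usual clique-percolation definition: two k-cliques are adjacent if they share k - 1 nodes, and a community is the union of a chain of adjacent k-cliques.

```python
from itertools import combinations


def k_clique_communities_bruteforce(edges, k):
    """Return k-clique communities of a small undirected graph as node sets."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Enumerate every k-node subset in which all pairs are connected.
    cliques = [
        frozenset(nodes)
        for nodes in combinations(sorted(adjacency), k)
        if all(b in adjacency[a] for a, b in combinations(nodes, 2))
    ]

    # Union-find over cliques: merge any two cliques sharing k - 1 nodes.
    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)

    # Each merged group of cliques contributes the union of its nodes.
    groups = {}
    for index, clique in enumerate(cliques):
        groups.setdefault(find(index), set()).update(clique)
    return list(groups.values())


# Hypothetical toy graph: two triangles sharing the edge (1, 2), plus a
# separate triangle {4, 5, 6} attached through the single edge (3, 4).
edges = [
    (0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
    (3, 4),
    (4, 5), (4, 6), (5, 6),
]

communities = k_clique_communities_bruteforce(edges, 3)
print(sorted(sorted(c) for c in communities))
```

Enumerating all k-node subsets is only feasible for toy inputs; a network of the size described here would need a library implementation. Unlike the Louvain method, this approach can place one node in several communities and can leave some nodes in none.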
