The following problems appeared in the programming assignments in the coursera course Applied Social Network Analysis in Python. The descriptions of the problems are taken from the assignments. The analysis is done using NetworkX.
The following theory is going to be used to solve the assignment problems.
1. Creating and Manipulating Graphs
Employee_Movie_Choices
, the following figure shows the content of the file.Index | Employee | Movie |
---|---|---|
0 | Andy | Anaconda |
1 | Andy | Mean Girls |
2 | Andy | The Matrix |
3 | Claude | Anaconda |
4 | Claude | Monty Python and the Holy Grail |
5 | Claude | Snakes on a Plane |
6 | Frida | The Matrix |
7 | Frida | The Shawshank Redemption |
8 | Frida | The Social Network |
9 | Georgia | Anaconda |
10 | Georgia | Monty Python and the Holy Grail |
11 | Georgia | Snakes on a Plane |
12 | Joan | Forrest Gump |
13 | Joan | Kung Fu Panda |
14 | Joan | Mean Girls |
15 | Lee | Forrest Gump |
16 | Lee | Kung Fu Panda |
17 | Lee | Mean Girls |
18 | Pablo | The Dark Knight |
19 | Pablo | The Matrix |
20 | Pablo | The Shawshank Redemption |
21 | Vincent | The Godfather |
22 | Vincent | The Shawshank Redemption |
23 | Vincent | The Social Network |
Employee_Relationships
, has data on the relationships between different coworkers. The relationship score has value of -100
(Enemies) to +100
(Best Friends). A value of zero means the two employees haven’t interacted or are indifferent. The next figure shows the content of this file.
Index | Employee1 | Employee2 | Score |
---|---|---|---|
0 | Andy | Claude | 0 |
1 | Andy | Frida | 20 |
2 | Andy | Georgia | -10 |
3 | Andy | Joan | 30 |
4 | Andy | Lee | -10 |
5 | Andy | Pablo | -10 |
6 | Andy | Vincent | 20 |
7 | Claude | Frida | 0 |
8 | Claude | Georgia | 90 |
9 | Claude | Joan | 0 |
10 | Claude | Lee | 0 |
11 | Claude | Pablo | 10 |
12 | Claude | Vincent | 0 |
13 | Frida | Georgia | 0 |
14 | Frida | Joan | 0 |
15 | Frida | Lee | 0 |
16 | Frida | Pablo | 50 |
17 | Frida | Vincent | 60 |
18 | Georgia | Joan | 0 |
19 | Georgia | Lee | 10 |
20 | Georgia | Pablo | 0 |
21 | Georgia | Vincent | 0 |
22 | Joan | Lee | 70 |
23 | Joan | Pablo | 0 |
24 | Joan | Vincent | 10 |
25 | Lee | Pablo | 0 |
26 | Lee | Vincent | 0 |
27 | Pablo | Vincent | -20 |
0 | Claude | Andy | 0 |
1 | Frida | Andy | 20 |
2 | Georgia | Andy | -10 |
3 | Joan | Andy | 30 |
4 | Lee | Andy | -10 |
5 | Pablo | Andy | -10 |
6 | Vincent | Andy | 20 |
7 | Frida | Claude | 0 |
8 | Georgia | Claude | 90 |
9 | Joan | Claude | 0 |
10 | Lee | Claude | 0 |
11 | Pablo | Claude | 10 |
12 | Vincent | Claude | 0 |
13 | Georgia | Frida | 0 |
14 | Joan | Frida | 0 |
15 | Lee | Frida | 0 |
16 | Pablo | Frida | 50 |
17 | Vincent | Frida | 60 |
18 | Joan | Georgia | 0 |
19 | Lee | Georgia | 10 |
20 | Pablo | Georgia | 0 |
21 | Vincent | Georgia | 0 |
22 | Lee | Joan | 70 |
23 | Pablo | Joan | 0 |
24 | Vincent | Joan | 10 |
25 | Pablo | Lee | 0 |
26 | Vincent | Lee | 0 |
27 | Vincent | Pablo | -20 |
Employee_Movie_Choices
file, the following figure visualizes the graph. The blue nodes represent the employees and the red nodes represent the movies.
Employee_Relationships
file, the following figure visualizes the graph. The nodes represent the employees and the edge colors and widths (weights) represent the relationships. The green edges denote friendship, the red edges enmity and blue edges neutral relations. Also, the thicker an edge is, the more powerful is a +ve or a -ve relation.Index | Employee1 | Employee2 | Relationship Score | Weight in Projected Graph |
---|---|---|---|---|
0 | Andy | Claude | 0 | 1.0 |
1 | Andy | Frida | 20 | 1.0 |
2 | Andy | Georgia | -10 | 1.0 |
3 | Andy | Joan | 30 | 1.0 |
4 | Andy | Lee | -10 | 1.0 |
5 | Andy | Pablo | -10 | 1.0 |
6 | Andy | Vincent | 20 | 0.0 |
7 | Claude | Frida | 0 | 0.0 |
8 | Claude | Georgia | 90 | 3.0 |
9 | Claude | Joan | 0 | 0.0 |
10 | Claude | Lee | 0 | 0.0 |
11 | Claude | Pablo | 10 | 0.0 |
12 | Claude | Vincent | 0 | 0.0 |
13 | Frida | Georgia | 0 | 0.0 |
14 | Frida | Joan | 0 | 0.0 |
15 | Frida | Lee | 0 | 0.0 |
16 | Frida | Pablo | 50 | 2.0 |
17 | Frida | Vincent | 60 | 2.0 |
18 | Georgia | Joan | 0 | 0.0 |
19 | Georgia | Lee | 10 | 0.0 |
20 | Georgia | Pablo | 0 | 0.0 |
21 | Georgia | Vincent | 0 | 0.0 |
22 | Joan | Lee | 70 | 3.0 |
23 | Joan | Pablo | 0 | 0.0 |
24 | Joan | Vincent | 10 | 0.0 |
25 | Lee | Pablo | 0 | 0.0 |
26 | Lee | Vincent | 0 | 0.0 |
27 | Pablo | Vincent | -20 | 1.0 |
28 | Claude | Andy | 0 | 1.0 |
29 | Frida | Andy | 20 | 1.0 |
30 | Georgia | Andy | -10 | 1.0 |
31 | Joan | Andy | 30 | 1.0 |
32 | Lee | Andy | -10 | 1.0 |
33 | Pablo | Andy | -10 | 1.0 |
34 | Vincent | Andy | 20 | 0.0 |
35 | Frida | Claude | 0 | 0.0 |
36 | Georgia | Claude | 90 | 3.0 |
37 | Joan | Claude | 0 | 0.0 |
38 | Lee | Claude | 0 | 0.0 |
39 | Pablo | Claude | 10 | 0.0 |
40 | Vincent | Claude | 0 | 0.0 |
41 | Georgia | Frida | 0 | 0.0 |
42 | Joan | Frida | 0 | 0.0 |
43 | Lee | Frida | 0 | 0.0 |
44 | Pablo | Frida | 50 | 2.0 |
45 | Vincent | Frida | 60 | 2.0 |
46 | Joan | Georgia | 0 | 0.0 |
47 | Lee | Georgia | 10 | 0.0 |
48 | Pablo | Georgia | 0 | 0.0 |
49 | Vincent | Georgia | 0 | 0.0 |
50 | Lee | Joan | 70 | 3.0 |
51 | Pablo | Joan | 0 | 0.0 |
52 | Vincent | Joan | 10 | 0.0 |
53 | Pablo | Lee | 0 | 0.0 |
54 | Vincent | Lee | 0 | 0.0 |
55 | Vincent | Pablo | -20 | 1.0 |
In this assignment we shall go through the process of importing and analyzing an internal email communication network between employees of a mid-sized manufacturing company.
Each node represents an employee and each directed edge between two nodes represents an individual email. The left node represents the sender and the right node represents the recipient, as shown in the next figure.
https://sandipanweb.files.wordpress.com/2017/09/f7.png?w=150&h=117 150w" sizes="(max-width: 263px) 100vw, 263px" />
This will only be possible if the graph is strongly connected, but it’s not. The following figure shows 42 strongly-connected components (SCC) of the graph. Each SCC is shown using a distinct color.
As can be seen from the following histogram, only one SCC contains 126 nodes, each of the remaining 41 SCCs contains exactly one node.
As can be seen from above, inside the largest SCC, all the nodes are reachable from one another with at most 3 hops, the average distance between any node pairs belonging to the SCC being 1.6461587301587302.
has length equal to the diameter of this SCC.
As obvious, the minimum number of nodes required to be removed exactly equals to the size of the min-cut with the source node (center node 38) to the target node (node 97), shown in red in the next figure. The size of the min-cut is 5 and hence 5 nodes (shown in pale-golden-red color) need to be removed, the corresponding node numbers are: 14, 32, 37, 45 and 46. As done in the earlier figures, all the nodes and edges in the email-net graph are not shown to avoid over-crowding.
The min-cut is separately shown in the following figure from source node 38 to target node 97.
The next figure shows the undirected graph constructed. As before, the node sizeis proportional to the degree of the node.
The following figure shows the distribution of the local clustering coefficients.
The following figure shows the undirected graph, this time the node size being proportional to the local clustering coefficient of the node.
The next figure shows the degree distribution of the undirected graph.
Since there are more nodes with lower degrees than with higher degrees and the transitivity weights the nodes with higher degree more, the undirected graph has lower transitivity and higher average clustering coefficient.
Comment
Sure @Thomas Triplet, it's all about checking whether networkX scale with #nodes and #edges, here are some more experiments with bigger networks (friendship and email) that can be found here: https://sandipanweb.wordpress.com/2017/09/22/some-more-social-netwo....
This is certainly useful to learn the basic of graph analysis from a data science perspective, and as such, it is great to see it on Coursera, but, real-life social graph are more like (from FB) than
From an engineering perspective, I'm very curious to see how NetworkX in Python is applicable to real-word graphs/networks (vs other dedicated frameworks like Spark GraphX, Neo4J, etc.)...
@Dmitry Zinoviev Thank you very much for the book-link.
The Pragmatic Programmers just published my book Complex Network Analysis in Python. Recognize → Construct → Visualize → Analyze → Interpret.
The book covers both elements of complex network analysis (CNA), including social network analysis, and the use of networkx for CNA. It covers not only social networks, but also product, semantic, event, interaction, and other types of networks. It has five complete case studies based on real-world data (including the "Panama papers") and numerous code examples.
The book is available as an electronic beta from the publisher website (or you can pre-order the paper version on Amazon).
@Andrew Ford actually this is part of a few assignments from the coursera course which is still running, so it will be a breach of honor code to share the code here, however, the course is free and anyone can enroll to get a few starter codes.
can you share the code you used for this?
© 2017 Data Science Central Powered by
Badges | Report an Issue | Privacy Policy | Terms of Service
You need to be a member of Data Science Central to add comments!
Join Data Science Central