public void m(){
n(this);
}
public static n(Object obj){
if(obj==null){
doSomething();
}
}
public void m(){
n(this);
}
public static n(Object obj){
if(obj==null){
doSomething();
}
}
public void m(){
if(this==null){
doSomething();
}
}
public void m(){
if(false){
doSomething();
}
}
public void m(){
}
-XX:+UseFastEmptyMethods
address entry_point = __ pc();
// If we need a safepoint check, generate full interpreter entry.
Label slow_path;
__ cmp32(ExternalAddress(SafepointSynchronize::address_of_state()),
SafepointSynchronize::_not_synchronized);
__ jcc(Assembler::notEqual, slow_path);
// do nothing for empty methods
// (do not even increment invocation counter)
// Code: _return
// _return
// return w/o popping parameters
__ pop(rax);
__ mov(rsp, r13);
__ jmp(rax);
twitter:@j_palka
code mangler at Allegro.tech, analytics, user profile and behavior
JDD/4Developers conferences, dictator for life and trouble maker
anything about JVM, bytecode, parsers, compilers and graphs of course
former chief architect, manager, database admin and performance freak
a graph is an ordered pair G = (V, E) comprising a set V of vertices, nodes or points together with a set E of edges, arcs or lines, which are 2-element subsets of V (i.e., an edge is related with two vertices,and the relation is represented as an unordered pair of the vertices with respect to the particular edge). To avoid ambiguity, this type of graph may be described precisely as undirected and simple
A Graph — records data in › Nodes — which have › Properties
Nodes — are organized by › Relationships — which also have › Properties
Nodes — are grouped by › Labels — into › Sets
Neo4j is a graph database,
it is not another layer of abstraction over existing database engine
it has its own storage mechanism for persisting nodes and edges
node
id
labels
properties
relationship
relationship type
direction
properties
schema
index
constraints
Java primitives (long, int, float, etc.)
String
arrays of the above
looks like SQL, feels like pattern-matching
MATCH (n:Label) RETURN n;
MATCH (n)-[:LIKES]->m return m;
MATCH (n)-[r:LIKES]->m return r;
(a)-->(b)
(a)-->(b)<--(c)
(a:User)-->(b)
(a:User:Admin)-->(b)
(a)-[r:REL_TYPE]->(b)
(a)-[*2]->(b)
(a)-[*3..5]->(b)
patterns return collection of paths
MATCH a-[:REL]-b, (c:Label) WHERE NOT b-[:REL]-c RETURN b;
MATCH a-[:REL]-b, (c:Label) RETURN a,b-[:REL]-c;
MATCH a-[:REL]-b, (c:Label) WITH a, b-[:REL]-c RETURN a, count(distinct c);
RETURN [0,1,2,3,4,5,6,7,8,9] AS collection
RETURN range(0,10)[3]
RETURN length(range(0,10)[0..3])
RETURN [x IN range(0,10) WHERE x % 2 = 0 | x^3] AS result
RETURN [x IN range(0,10) WHERE x % 2 = 0 | x^3] AS result;
WHERE a.name='Alice' AND b.name='Daniel' AND
ALL (x IN nodes(p) WHERE x.age > 30);
WHERE a.name='Eskil' AND ANY (x IN a.array WHERE x = "one");
WHERE n.name='Alice' AND NONE (x IN nodes(p) WHERE x.age = 25);
WHERE n.name='Alice' AND
SINGLE (var IN nodes(p) WHERE var.eyes = "blue");
MATCH (n)
WHERE EXISTS(n.name)
RETURN n.name AS name, EXISTS((n)-[:MARRIED]->()) AS is_married
MATCH p = ((a)--(b)) RETURN nodes(p)
MATCH p = ((a)--(b)) RETURN relationships(p)
MATCH p = ((a)-[:RAILS]-(b)) UNWIND nodes(p) AS n return DISTINCT n;
WITH
clause works like | (pipe)
in *NIX,
it allows you to pipeline subsequent queries
MATCH (david { name: "David" })-[]->(otherPerson)
WITH otherPerson, count(*) AS foaf
WHERE foaf > 1
RETURN otherPerson
because Cypher is syntactically, the younger brother of SQL,
you fall into the trap of writing relational queries,
which will use the Cartesian product.
This is not a blessed way of graph Gods!!!
sentences are patterns
nouns are vertexes
verbs are edges
CREATE
(t0:Training {title : "Code archeology and architecture" }),
(n0:Trainer { name : "Jarosław Pałka"}),
(s0:Skill { name : "Git" }),
(s1:Skill { name : "SonarQube"}),
t0-[:AUTHORED_BY]->n0,
t0-[:REQUIRES]->s0,
t0-[:INTRODUCES]->s1;
MATCH p=shortestPath(
(s0:Skill {name : "Java"})
-[:REQUIRES|INTRODUCES*]-
(s1:Skill {name : "REST"})
WHERE NOT ()<-[:REQUIRES]-s1
RETURN p;
MATCH (a:Skill {name:"Java"})<-[:REQUIRES]-(b:Training)
MATCH (c:Training)-[:INTRODUCES]->(d:Skill {name:"REST"})
MATCH p=allShortestPaths(b-[rels:REQUIRES|INTRODUCES*]-c)
WHERE length(p)%2=0 AND
ALL(idx in range(0, length(p)-2,2) WHERE
type(rels[idx])="INTRODUCES" AND type(rels[idx+1]) = "REQUIRES")
RETURN p
first class citizens of my inbox
mails
people
there are hidden relationships in our graph
first question
where are email threads?
small hint
emails have Message-Id
and In-Reply-To
headers
MATCH (n:Message)
MATCH (m:Message)
WHERE n.`Message-Id`=m.`In-Reply-To`
CREATE n<-[:IN_REPLY_TO]-m;
CREATE INDEX ON :Message(`In-Reply-To`);
CREATE INDEX ON :Message(`Message-Id`);
so, what’s the longest email thread in our inbox?
small hint
stop, and think for a moment, what is email thread?
MATCH p=((n:Message)<-[:IN_REPLY_TO*]-(m:Message))
WHERE NOT ()<-[:IN_REPLY_TO]-n AND NOT (m)<-[:IN_REPLY_TO]-()
WITH n, length(nodes(p)) AS c
RETURN n.Subject, c
ORDER BY c DESC
LIMIT 5;
Cypher has EXPLAIN
and PROFILE
clauses
EXPLAIN
MATCH (p:Person { name:"Tom Hanks" })
RETURN p
PROFILE
MATCH (p:Person { name:"Tom Hanks" })
RETURN p
be specific, use labels and relationship types,
it will save hours of debugging and CPU cycles
or let’s go for a drink :)
small hint
if I know your email address, we are friends, aren’t we?
OPTIONAL MATCH (m:Message)<-[:IN_REPLY_TO]-(n:Message)-[:TO|CC|BCC]->r
WHERE NOT m-[:TO|CC|BCC]->r
WITH n,r
MATCH n-[:FROM]->f
MERGE r-[:KNOWS]-f;
clusters - tightly knit groups characterized by a relatively high density of ties; this likelihood tends to be greater than the average probability of a tie randomly established between two nodes
our graph is biased,
because owner of the inbox has relationships to every message
a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterized by a relatively high density of ties; this likelihood tends to be greater than the average probability of a tie randomly established between two nodes
local clustering coefficient of a vertex (node) in a graph quantifies how close its neighbors are to being a clique (complete graph)
it is then given by the proportion of links between the vertices within its neighborhood divided by the number of links that could possibly exist between them.
\$C_i=(|{e_(jk) : v_j,v_k in N_i,e_(jk) in E}|)/(k_i(k_i-1))\$
where neighborhood \$N_i\$ is:
\$N_i={v_j : e_(ij) in E vv e_(ji) in E}\$
and \$k_i\$ is number of neighboors of vertex \$i\$
MATCH (a:InternetAddress)-[:KNOWS]-(b:InternetAddress)
WITH a, count(distinct b) as neighbours
MATCH (a)-[:KNOWS]-()-[r:KNOWS]-()-[:KNOWS]-(a)
WHERE neighbours>1
WITH a, neighbours, count(distinct r) AS connected_neighbours
RETURN
a.address,
(2*toFloat(connected_neighbours))/(neighbours*(neighbours-1)) as coefficient
ORDER BY coefficient DESC;
we can even go deeper and try to play with cliques
is a subset of vertices of an undirected graph such that its induced subgraph is complete; that is, every two distinct vertices in the clique are adjacent.
or strongly connected components
a graph is said to be strongly connected if every vertex is reachable from every other vertex
we are now going to work with bigger model,
publicly available movies recommendation data set from MovieLens, created by GroupLens Research.
Data set contains movies recommendations, gathered through their website.
LOAD CSV FROM 'file:data.csv' AS line FIELDTERMINATOR '|'
WITH line[0] AS firstname, line[0] AS lastname
CREATE (:Person {firstname : firstname, lastname : lastname});
LOAD CSV WITH HEADERS FROM 'file:data.csv' AS line
CREATE (:Person {firstname : line.firstname, lastname : line.lastname});
MATCH (a:User)-[:RATED]->(m:Movie)<-[:RATED]-(b:User)-[:RATED]->(n:Movie)
WHERE a.userId="333" AND NOT (a-[:RATED]->n)
RETURN n.title LIMIT 50;
k-nearest neighbors and cosine similarity
this technique uses vector of ratings
\$cos(theta)=(A*B)/(|\ |A|\ |*|\ |B|\ |)=(sum_(i=1)^n A_i xx B_i)/(sqrt(sum_(i=1)^n (A_i)^2) * sqrt(sum_(i=1)^n (B_i)^2))\$
MATCH (p1:User)-[x:RATED]->(m:Movie)<-[y:RATED]-(p2:User)
WITH SUM(x.rating * y.rating) AS xyDotProduct,
SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
p1, p2
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength);
MATCH (p1:User {userId:333})-[s:SIMILARITY]-(p2:User)
WITH p2, s.similarity AS sim
ORDER BY sim DESC
LIMIT 5
RETURN p2.userId AS Neighbor, sim AS Similarity
Grapaware extensions, like geo, recommendations czy time modeling, at GraphAware extensions
graph theory, best academic stuff I have seen in years, Graph Theory by Keijo Ruohonen
using graphs in recommendations, at Studying Recommendation Algorithms by Graph Analysis
read, test and explor what other do with graphs at GraphGist Project