GML Newsletter - Issue #5: Was 2020 a good year for graph research?
“Knowledge is like money: To be of value it must circulate, and in circulating it can increase in quantity and, hopefully, in value.”
NeurIPS 2020 is over 👏, ICLR 2021 has finished the rebuttal phase 👓, and Christmas ⛄ is around the corner: what a great time to relax and look back at the achievements of this year. Today we will review what the field achieved in 2020 and compare it against the predictions I made in January. We will also look at the graph works at NeurIPS 2020 and ICLR 2021. And, as always, I'll share a few links that may be of interest to you, aficionado of graphs 💣.
Trends of 2020
One of the first posts I made this year was on the trends of GML. I spent a couple of months reading graph submissions to ICLR 2020, and it paid off: more than 7.5K people read it. Here are the four trends I outlined at the beginning of this year.
More solid theoretical understanding of GNN;
New cool applications of GNN;
Knowledge graphs become more popular;
New frameworks for graph embeddings.
Let's take a deep dive into each of these. Theoretical understanding of GNNs was indeed a huge topic this year 📈. There are works on universal approximation, equivariant networks, expressive power, generalization, convergence, oversmoothing, and many other aspects. Are we done? I'm pretty sure there will be many more works on the theory of GNNs. The topic is growing, and many open questions remain (for example, studying the power of non-message-passing networks on graphs).
I have mixed feelings about the trend of new cool applications of GNNs 👩🔬. People indeed published a lot of papers on how graph networks can be applied to various problems. Molecular design drew a lot of attention, with several recent surveys, competitions, and datasets. There are numerous works on writing better software, traffic forecasting, and computer vision. I personally liked the application of GNNs to structural design by Autodesk. However, there is little adoption of GNNs in industry, at least to my knowledge. The companies I know usually have graphs as a small subsystem of their production pipeline (although it can have a serious impact). What's lacking, in my opinion, is the availability of industry-level datasets (in scale and nature), as well as frameworks to put GNNs into production. I hope that's something we will see in the near future.
Research on knowledge graphs also pleased us with significant milestones. Query2box, a technique for answering logical queries, inspired a big wave 🌊 of research on logical reasoning without ontologies, such as BetaE and EmQL. Several challenging KG datasets have been released on which traditional models such as transformers and MPNNs achieve very low accuracy. Google's researchers presented CFQ, a large and realistic natural-language question-answering dataset in which a question should be correctly transformed into a complex logical query. GraphLog is a multi-purpose, multi-relational graph dataset built using rules grounded in first-order logic. Moreover, the Open Graph Benchmark includes two KGs, which should inspire more tasks and submissions in this area. Finally, KGs have become a must-have component of language models, now enriched with entities from Wikidata and showing superior results on several NLP problems. Clearly, KGs constitute a significant part of acquiring knowledge about the world, so they won't go away any time soon.
Finally, 2020 brought some new approaches for embedding a graph. LouvainNE and InstantEmbedding are two competing approaches for getting node embeddings extremely fast 🏎, evaluated on graphs with billions of edges. GraphZoom is a general framework that can incorporate any unsupervised embedding model and improve its accuracy through a multi-scale approach. Personally, I think the field of unsupervised embeddings is slowly moving from research to production settings, which is a great indication that graph methods are useful. So in the future, I would expect more research on supervised embedding methods.
How did you feel about NeurIPS? I think Gather.Town 🏙 really saved communication at this conference: poster sessions felt real (except for occasional bugs), with no more Zoom connections for every single poster. As expected, with about 2K papers it was a bit overwhelming to search for the right material: just scrolling through the papers of a poster session already felt laborious, not to mention the 3️⃣-hour tutorial sessions. Luckily, if you are interested in graphs, there is just the right amount of research for everybody to consume.
If you want to learn how graphs are used at Google, look no further: there is a great tutorial by Bryan Perozzi and the gang. For the complete schedule with links to the videos, check out this page.
If you want to learn novel ideas, that's what workshops are for, and there are several related to graphs. Topological Data Analysis and Beyond, Differential Geometry meets Deep Learning, Learning Meets Combinatorial Algorithms (LMCA), and Machine Learning for Molecules are all you need to see where the field is going.
And of course, there are more than 120 full papers related to graphs, around 6.5% of all accepted papers. That's huge! If you want a sneak peek at some of them, check out this curated list.
ICLR is one of the top conferences in ML 🧠, gaining more popularity each year thanks to its open review process, in which anyone can see the reviews for any submitted paper. This makes it possible to analyze all sorts of statistics about the quality of the papers and the reviews. So I open-sourced the list of all submissions together with their final scores and calculated a few stats for this year. So what do we have?
There are more than 2600 submissions this year, approximately the same as last year. With an acceptance rate of 25%, that would mean 650+ accepted papers. About 210 of the submissions are graph-related, which again makes it a very hot topic.
Here is a list of graph-related papers, 59 of which have an average score of at least 6️⃣. What's cool is that the top-scoring submission across all submissions is one on the generalization abilities of GNNs.
You may wonder how the rebuttal affects your scores. Do reviewers tend to change their scores, and if so, by how much? About 50% of the papers get at least one score change, and there are some rare cases in which all 4 scores changed after the rebuttal. Across all papers, the average score rose by 0.25 after the rebuttal. The majority of score changes are +1 or +2, but there are again some exceptions where scores changed by -5 or +5. So overall it seems worth engaging in the discussion with the reviewers, as it is quite likely to change your scores.
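If you want to reproduce this kind of analysis on the open-sourced scores, a minimal sketch might look like the following. The data format here is hypothetical (tuples of per-reviewer scores before and after the rebuttal); the actual scraped data would need to be mapped into this shape first.

```python
# Hypothetical toy data: (scores_before_rebuttal, scores_after_rebuttal)
# per submission. Real data would come from the scraped OpenReview dump.
submissions = [
    ([5, 6, 6, 7], [6, 6, 6, 7]),  # one reviewer raised a score
    ([4, 4, 5, 5], [4, 4, 5, 5]),  # no changes
    ([3, 5, 6, 6], [5, 6, 6, 6]),  # two reviewers raised scores
]

def mean(xs):
    return sum(xs) / len(xs)

# Fraction of papers with at least one score change.
changed = sum(1 for before, after in submissions if before != after)
frac_changed = changed / len(submissions)

# Average jump in a paper's mean score after the rebuttal.
avg_jump = mean([mean(after) - mean(before) for before, after in submissions])

# Distribution of individual reviewer score changes (nonzero only).
deltas = [a - b
          for before, after in submissions
          for b, a in zip(before, after)
          if a != b]

print(f"fraction with >=1 score change: {frac_changed:.2f}")
print(f"average score jump: {avg_jump:+.3f}")
print(f"individual changes: {deltas}")
```

The same three quantities (share of papers with changed scores, average jump, and the distribution of individual changes) are exactly the stats reported above, so swapping in the real data is all that's needed.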
Geometric Deep Learning: Successes, Challenges, Next Steps by Michael Bronstein. Michael is talking about deriving convolution from the first principles, first GNN models, the expressivity of GNNs, and the future applications of the field. Very inspiring.
Recent Developments of Graph Network Architectures by Xavier Bresson. A must-watch presentation that nicely summarizes exciting topics such as graph isomorphism, WL tests, equivariance, universal approximations, positional encodings, and more.
Graph Neural Networks Channel by Zak Jost, covering various aspects of GNNs, including an interview with DeepMind authors on using GNNs for physics.
Erdős goes neural: unsupervised learning of combinatorial algorithms with neural networks by Andreas Loukas. He talks about their NeurIPS work that proposes a differentiable way to solve CO problems with unsupervised GNNs.
Undergraduate Math Student Pushes Frontier of Graph Theory by QuantaMagazine about a 21-year-old who improved results of Erdős and Szekeres on the upper bound for two-color Ramsey numbers.
That's all folks 👦. As always, share a word 🗣 with your friends and colleagues if you liked this issue. Also, if you have something to share 📤 with the community, such as blog posts or videos, please reply to this issue! Merry Christmas 🎄 and Happy New Year 🤶! Peace!