Tutorials


TUTORIAL 1: openCypher: New Directions in Property Graph Querying

Organizers: Petra Selmer (Neo4j), Martin Junghanns (Neo4j & University of Leipzig)
Duration: 1.5 hours
Abstract: Cypher is a property graph query language that provides expressive and efficient querying of graph data. Originally designed and implemented within the Neo4j graph database, it is now used by several industrial database products as well as by open-source and research projects. Since 2015, Cypher has been an open, evolving language, with the aim of becoming a fully specified standard with many independent implementations.
We introduce Cypher and the property graph model, and then describe extensions, either under active development or under discussion, that will be incorporated into Cypher in the near future. These include (i) making Cypher a fully compositional language by supporting multiple graphs and allowing queries to return graphs; (ii) supporting more complex patterns based on regular path queries; and (iii) allowing the pattern matching semantics (homomorphism, relationship isomorphism, which is the current default, or node isomorphism) to be configured on a per-query basis.
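To make the three semantics in (iii) concrete, the following minimal Python sketch counts the matches of the undirected two-hop pattern (a)-[r1]-(b)-[r2]-(c) on a small hypothetical graph. The graph, the function names (traversals, match_two_hop), and the string names of the semantics are illustrative assumptions, not part of openCypher or of any implementation's API; the sketch also assumes the graph has no self-loops.

```python
# Hypothetical toy graph: relationship id -> (start node, end node).
# "rc" runs parallel to "ra" (between nodes 1 and 2) in the opposite direction.
RELATIONSHIPS = {
    "ra": (1, 2),
    "rb": (2, 3),
    "rc": (2, 1),
}

SEMANTICS = ("homomorphism", "relationship_isomorphism", "node_isomorphism")


def traversals(rels):
    """Yield (rel_id, from_node, to_node) in both directions, because the
    pattern below ignores relationship direction. Assumes no self-loops."""
    for rel_id, (start, end) in rels.items():
        yield rel_id, start, end
        yield rel_id, end, start


def match_two_hop(rels, semantics):
    """Return all bindings (a, r1, b, r2, c) of (a)-[r1]-(b)-[r2]-(c)."""
    if semantics not in SEMANTICS:
        raise ValueError(f"unknown semantics: {semantics}")
    steps = list(traversals(rels))
    matches = []
    for r1, a, b in steps:
        for r2, b2, c in steps:
            if b2 != b:  # the second hop must start where the first one ended
                continue
            if semantics == "relationship_isomorphism" and r1 == r2:
                continue  # no relationship may be bound twice
            if semantics == "node_isomorphism" and len({a, b, c}) < 3:
                continue  # no node may be bound twice
            matches.append((a, r1, b, r2, c))
    return matches


for name in SEMANTICS:
    print(name, len(match_two_hop(RELATIONSHIPS, name)))
# homomorphism 14
# relationship_isomorphism 8
# node_isomorphism 4
```

Homomorphism admits bindings that walk back along the same relationship (such as 1-2-1 via ra twice); relationship isomorphism forbids reusing a relationship but still allows revisiting node 1 through the parallel relationship rc; node isomorphism forbids any repeated node.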
A subset of the proposed Cypher language extensions has already been implemented on top of Apache Spark. In the tutorial, we will present our approach, including an in-depth analysis of the challenges we faced. These include mapping the property graph model to the Spark DataFrame abstraction and translating Cypher query operators into relational transformations. The tutorial will conclude with a demonstration based on a real-world graph analytics use case.
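The system described here maps graphs to Spark DataFrames; purely to illustrate the general idea, the following Python sketch uses plain lists of rows as a stand-in for DataFrames and translates the pattern MATCH (p:Person)-[r:KNOWS]->(f:Person) into selections, equi-joins, and a projection. The table layout (one node table, one relationship table, a single label per node), the column names, and the helper functions are hypothetical and are not taken from the tutorial's implementation.

```python
# Hypothetical tabular layout: one row per node and one row per relationship,
# with a single label per node to keep the sketch short.
NODES = [
    {"id": 1, "label": "Person", "name": "Alice"},
    {"id": 2, "label": "Person", "name": "Bob"},
    {"id": 3, "label": "City", "name": "Leipzig"},
]
RELS = [
    {"id": 10, "type": "KNOWS", "source": 1, "target": 2},
    {"id": 11, "type": "LIVES_IN", "source": 1, "target": 3},
]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def alias(rows, var):
    """Prefix every column with the query variable, e.g. 'id' -> 'p.id'."""
    return [{f"{var}.{col}": val for col, val in row.items()} for row in rows]


def hash_join(left, right, left_key, right_key):
    """Equi-join: build a hash index on the right input, probe with the left."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[right_key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[left_key], [])]


def person_knows_person(nodes, rels):
    """MATCH (p:Person)-[r:KNOWS]->(f:Person) RETURN p.name, f.name"""
    p = alias(select(nodes, lambda n: n["label"] == "Person"), "p")
    r = alias(select(rels, lambda e: e["type"] == "KNOWS"), "r")
    f = alias(select(nodes, lambda n: n["label"] == "Person"), "f")
    # (p)-[r]->: join on p.id = r.source; -[r]->(f): join on r.target = f.id
    joined = hash_join(hash_join(p, r, "p.id", "r.source"), f, "r.target", "f.id")
    return [(row["p.name"], row["f.name"]) for row in joined]


print(person_knows_person(NODES, RELS))  # [('Alice', 'Bob')]
```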


TUTORIAL 2: Recent Advances in Recommender Systems: Matrices, Bandits, and Blenders

Organizers: Georgia Koutrika (Athena Research Center, Greece)
Duration: 1.5 hours
Abstract: Recent years have witnessed an explosion of methods for solving the recommendation problem. Modern recommender systems have become increasingly complex compared to their early content-based and collaborative filtering versions. In this tutorial, we will cover recent advances in recommendation methods, focusing on matrix factorization, multi-armed bandits, and methods for blending recommendations. We will also describe evaluation techniques and outline open issues and challenges. The ultimate goal of this tutorial is to present a toolkit of new recommendation methods in the context of data-related problems, and to highlight opportunities and new research paths for researchers and practitioners who work at the intersection of recommender systems and databases.


TUTORIAL 3: Interactive Exploration of Composite Items

Organizers: Sihem Amer-Yahia (CNRS, Univ. Grenoble Alpes), Senjuti Basu Roy (New Jersey Institute of Technology)
Duration: 1.5 hours
Abstract: Data exploration is seeing renewed interest in the database community. With the rise of big data analytics, this area is growing to encompass not only approaches and algorithms for finding the next best data items to explore but also interactivity, i.e., accounting for feedback from the data scientist during exploration. Interactivity is essential for accommodating evolving needs during exploration and for customizing the discovery process. In this tutorial, we focus on the interactive exploration of Composite Items (CIs). We will first review applications and algorithms for CI formation (20 min). We will then discuss two major research questions (50 min): (i) modes of exploration for CIs, and (ii) human-in-the-loop CIs. We will conclude with research directions (20 min).


TUTORIAL 4: Real-Time Data Management for Big Data

Organizers: Wolfram Wingerath (University of Hamburg, Germany), Felix Gessert (Baqend), and Norbert Ritter (University of Hamburg, Germany)
Duration: 1.5 hours
Abstract: Users have come to expect reactivity from mobile and web applications, i.e., they assume that changes made by other users become visible immediately. However, developers face challenges when building reactive applications on top of traditional pull-oriented databases, because such databases are ill-equipped to push new information to the client. Systems for data stream management and processing, on the other hand, are natively push-oriented and thus facilitate reactive behavior, but they do not follow the same collection-based semantics as traditional databases: instead of database collections, stream-oriented systems are based on a notion of potentially unbounded sequences of data items.
In this tutorial, we survey and categorize the system space between pull-oriented databases and push-oriented stream management systems, using the means of data retrieval that each supports as a reference point. Particular emphasis is placed on the novel system class of real-time databases, which combine the push-based access paradigm of stream-oriented systems with the collection-based query semantics of traditional databases. We explore why real-time databases deserve to be distinguished as a separate system class and dissect their different architectures to highlight issues, derive open challenges, and discuss avenues for addressing them.
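As a rough illustration of the real-time database idea, the following Python sketch keeps the result of a collection-based filter query up to date and pushes add/change/remove events to subscribers whenever a write affects that result. The classes Collection and RealTimeQuery, their methods, and the event names are hypothetical and do not reflect the API or architecture of any particular system.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]
Event = Tuple[str, str, Record]  # (event type, record key, record)


class RealTimeQuery:
    """Hypothetical filter query whose result is maintained incrementally;
    every change to the result is pushed to the subscribers."""

    def __init__(self, predicate: Callable[[Record], bool]) -> None:
        self.predicate = predicate
        self.result: Dict[str, Record] = {}
        self.subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self.subscribers.append(callback)

    def on_write(self, key: str, record: Optional[Record]) -> None:
        """Handle an insert/update (record given) or a delete (record is None)."""
        was_match = key in self.result
        if record is not None and self.predicate(record):
            self.result[key] = record
            self._push(("change" if was_match else "add", key, record))
        elif was_match:
            # The record left the result (updated to a non-match, or deleted).
            self._push(("remove", key, self.result.pop(key)))

    def _push(self, event: Event) -> None:
        for callback in self.subscribers:
            callback(event)


class Collection:
    """Hypothetical collection that forwards every write to registered queries."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}
        self.queries: List[RealTimeQuery] = []

    def register(self, query: RealTimeQuery) -> None:
        # Compute the initial result from the current contents, then keep it fresh.
        for key, record in self.records.items():
            query.on_write(key, record)
        self.queries.append(query)

    def write(self, key: str, record: Optional[Record]) -> None:
        if record is None:
            self.records.pop(key, None)
        else:
            self.records[key] = record
        for query in self.queries:
            query.on_write(key, record)


todos = Collection()
todos.write("t1", {"title": "Slides", "done": False})
open_items = RealTimeQuery(lambda r: not r["done"])
events: List[Event] = []
open_items.subscribe(events.append)
todos.register(open_items)                            # initial result: t1
todos.write("t2", {"title": "Demo", "done": False})   # t2 enters the result
todos.write("t1", {"title": "Slides", "done": True})  # t1 leaves the result
todos.write("t3", {"title": "Review", "done": True})  # never matches: no event
todos.write("t2", None)                               # t2 is deleted
print([(kind, key) for kind, key, _ in events])
# [('add', 't1'), ('add', 't2'), ('remove', 't1'), ('remove', 't2')]
```

This naive design re-evaluates every registered query on every write; how to make that matching step scale is one example of the architectural questions a real-time database has to answer.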