\documentclass{article} \usepackage{cours} \title{Keys in Graphs} \date{January, $13^{\text{th}}$ 2022} \begin{document} \maketitle \section{Introduction} This project aims to find Graph keys, as defined in \footnote{\url{https://www.researchgate.net/publication/283189709_Keys_for_graphs}}. A Graph Key describes the relations that an object can have with their keys, and what relations these involved objects can have. \medskip For example, a Graph Key for a book can be: \begin{center} \begin{tikzpicture}[y=3cm] \node[draw] (0) at (0, 0) {Book}; \node[] (00) at (-3, -1) {x}; \node[draw] (01) at (-1, -1) {Person}; \node[] (02) at (1, -1) {y}; \node[draw] (03) at (3, -1) {Company}; \node[draw] (010) at (-2, -2) {Country}; \node[] (011) at (0, -2) {z}; \node[] (030) at (3, -2) {t}; \draw[->] (0) -- (00) node[midway,above,sloped] {title}; \draw[->] (0) -- (01) node[midway,above,sloped] {author}; \draw[->] (0) -- (02) node[midway,above,sloped] {subtitle}; \draw[->] (0) -- (03) node[midway,above,sloped] {publisher}; \draw[->] (01) -- (010) node[midway,above,sloped] {nationality}; \draw[->] (01) -- (011) node[midway,above,sloped] {last name}; \draw[->] (03) -- (030) node[midway,above,sloped] {identifier}; \end{tikzpicture} \end{center} That's to say, a book can be described with its title and its subtitle, the last name and the nationality of its author, and the public identifier of the publisher. \medskip To generate these keys, one suggests to find $n$-almost keys using SAKey, then to explore involved relations that define a domain and a range, and to explore recursively the related fields. \section{Proposed solution} The proposed tool is available here: \url{https://gitlab.crans.org/ynerant/graph-keys} \subsection{Requirements} The project is made with Python 3.9 and Python 3.10, and uses \texttt{BeautifulSoup4} and \texttt{SPARQLWrapper} as libraries. \subsection{Principle} \medskip The program takes as an input the class name that we want to explore, and a threshold $n$ that is the number of allowed exceptions for SAKey. The class name has to be existing in DBPedia since we make only DBPedia queries. \medskip First, we load the ontology of DBPedia and load a lot of relations. The goal is to define the range and the domain of most relations. For example, we learn that the relation \texttt{inCemetery} has the domain \texttt{GraveMonument} and the range \texttt{Cemetery}. \medskip The next step is to query DBPedia to get all elements of the type of the input class, then to get all triples \texttt{?x ?r ?y} that are involving these elements. Since datasets can be very big, we limit the output by default to 1000 triples, but this value can be changed using the option \texttt{-{}-limit}. \medskip We now store all these triples, and give them to the \emph{SAKey} tool, in order to extract the $n$-almost keys of the dataset, where $n$ is given in input. These keys are relations between our input, and we can now explore further. To get relevant data, we made the choice to consider only the relations that has a defined range, which is not a primitive (like integers, strings, dates, $\ldots$). In a general case, we get only a few but relevant results. \medskip In the following, we consider the exploration of Graph Keys for the class \texttt{Library}. We run the following command: \begin{center} \texttt{./main.py Library 5 -{}-limit 3000 -{}-recursion 3} \end{center} The last option will be explained further. \medskip For this example, we get the relation \texttt{location}, which has for range \texttt{Place}. \begin{figure}[H] \centering \begin{tikzpicture}[y=3cm] \node[draw] (0) at (0.00, -0) {Library}; \node[draw] (0-0) at (0.00, -1) {Place}; \draw[->] (0) -- (0-0) node[midway,above,sloped] {location}; \end{tikzpicture} \caption{Single discovered key} \end{figure} Since we discovered a relation between \texttt{Library} and \texttt{Place}, we can now explore the keys of the class \texttt{Place} and extend the key for \texttt{Library}. \medskip The program takes as optional input the parameter \texttt{-{}recursion}, which limits the height of the output graphs. A value of $1$ gives normal keys from \emph{SAKey}. \medskip When we take the example of \texttt{Library}, we get multiple outputs, like the tree bellow: \begin{figure}[H] \centering \begin{tikzpicture}[y=3cm] \node[draw] (0) at (0.00, -0) {Library}; \node[draw] (0-0) at (0.00, -1) {Place}; \node[draw] (0-0-0) at (0.00, -2) {City}; \node[draw] (0-0-0-0) at (0.00, -3) {Image}; \draw[->] (0-0-0) -- (0-0-0-0) node[midway,above,sloped] {thumbnail}; \draw[->] (0-0) -- (0-0-0) node[midway,above,sloped] {capital}; \draw[->] (0) -- (0-0) node[midway,above,sloped] {location}; \end{tikzpicture} \caption{Sample output of the program} \end{figure} \subsection{Limitations and further works} While SAKey gives a huge amount of keys, only a few of them are well-typed, in the sense that they define a comprehensive range. That gives us a serious limitation to our algorithm. \medskip Moreover, the current algorithm does not take into account the rules that don't have any defined range, which are the most. This isn't really a problem and can easily be patched, since this generates more graphs, but does not provide any additional input to extend data, as said before. \medskip We can notice that graph keys are for the most only paths (degrees of the nodes are 2 except for the extremal nodes). This is related to the fact that generated keys are minimal, and most of them contain only one parameter. Our algorithm should be able to generate any type of tree graph. \medskip However, our algorithm may not guess all existing graph keys. Current output graphs have the property that if we truncate each graph to a given depth, then it stays a valid graph key (at least a $n$-almost graph key), which has no reason to be true in a general context. To cover this issue, we may extend the SAKey algorithm to extend directly the properties with their ranges. \medskip To go further, we should take into account the missing properties, and find a way to generate more complex graphs, and to find minimal graphs that are not flat. \end{document}