diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..ce6ab41
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,4 @@
+*.aux
+*.log
+*.nt
+*.pdf
diff --git a/Report.tex b/Report.tex
new file mode 100644
index 0000000..c859e56
--- /dev/null
+++ b/Report.tex
@@ -0,0 +1,181 @@
+\documentclass{article}
+
+\usepackage{cours}
+
+\title{Keys in Graphs}
+\date{January $13^{\text{th}}$, 2022}
+
+\begin{document}
+
+\maketitle
+
+\section{Introduction}
+
+This project aims to find Graph Keys, as defined in \emph{Keys for Graphs}%
+\footnote{\url{https://www.researchgate.net/publication/283189709_Keys_for_graphs}}.
+A Graph Key describes the relations that identify an object, together with the
+relations that identify the objects it is related to.
+
+\medskip
+
+For example, a Graph Key for a book can be:
+
+\begin{center}
+\begin{tikzpicture}[y=3cm]
+\node[draw] (0) at (0, 0) {Book};
+\node[] (00) at (-3, -1) {x};
+\node[draw] (01) at (-1, -1) {Person};
+\node[] (02) at (1, -1) {y};
+\node[draw] (03) at (3, -1) {Company};
+\node[draw] (010) at (-2, -2) {Country};
+\node[] (011) at (0, -2) {z};
+\node[] (030) at (3, -2) {t};
+\draw[->] (0) -- (00) node[midway,above,sloped] {title};
+\draw[->] (0) -- (01) node[midway,above,sloped] {author};
+\draw[->] (0) -- (02) node[midway,above,sloped] {subtitle};
+\draw[->] (0) -- (03) node[midway,above,sloped] {publisher};
+\draw[->] (01) -- (010) node[midway,above,sloped] {nationality};
+\draw[->] (01) -- (011) node[midway,above,sloped] {last name};
+\draw[->] (03) -- (030) node[midway,above,sloped] {identifier};
+\end{tikzpicture}
+\end{center}
+
+That is to say, a book can be identified by its title and subtitle, the last name
+and nationality of its author, and the public identifier of its publisher.
+
+\medskip
+
+To generate these keys, we propose to find $n$-almost keys using SAKey, then to
+select the relations involved that have a defined domain and range, and to
+recursively explore the related classes.
+
+\section{Proposed solution}
+
+The proposed tool is available here:
+\url{https://gitlab.crans.org/ynerant/graph-keys}
+
+\subsection{Requirements}
+
+The project is written for Python 3.9 and Python 3.10, and uses the
+\texttt{BeautifulSoup4} and \texttt{SPARQLWrapper} libraries.
+
+\subsection{Principle}
+
+\medskip
+
+The program takes as input the name of the class that we want to explore, and a
+threshold $n$, the number of exceptions allowed by SAKey. The class must exist
+in DBpedia, since the program only queries DBpedia.
+
+\medskip
+
+First, we load the DBpedia ontology and its relations. The goal is to determine
+the domain and the range of most relations. For example, we learn that the
+relation \texttt{inCemetery} has the domain \texttt{GraveMonument} and the
+range \texttt{Cemetery}.
+
+\medskip
+
+The next step is to query DBpedia to get all elements whose type is the input
+class, then to get all triples \texttt{?x ?r ?y} that involve these elements.
+Since datasets can be very big, we limit the output by default to 1000 triples,
+but this value can be changed using the option \texttt{-{}-limit}.
+
+\medskip
+
+We now store all these triples and give them to the \emph{SAKey} tool, in order to
+extract the $n$-almost keys of the dataset, where $n$ is given as input. These keys
+are sets of relations of the input class, which we can now explore further. To get
+relevant data, we chose to consider only the relations that have a defined range
+that is not a primitive type (such as integers, strings, dates, $\ldots$). In
+general, this yields only a few, but relevant, results.
+
+\medskip
+
+In the following, we consider the exploration of Graph Keys for the class
+\texttt{Library}. We run the following command:
+
+\begin{center}
+ \texttt{./main.py Library 5 -{}-limit 3000 -{}-recursion 3}
+\end{center}
+
+The last option is explained below.
+
+\medskip
+
+For this example, we get the relation \texttt{location}, whose range is
+\texttt{Place}.
+
+\begin{figure}[H]
+\centering
+\begin{tikzpicture}[y=3cm]
+\node[draw] (0) at (0.00, -0) {Library};
+\node[draw] (0-0) at (0.00, -1) {Place};
+\draw[->] (0) -- (0-0) node[midway,above,sloped] {location};
+\end{tikzpicture}
+\caption{Single discovered key}
+\end{figure}
+
+Since we discovered a relation between \texttt{Library} and \texttt{Place}, we
+can now explore the keys of the class \texttt{Place} and extend the key for
+\texttt{Library}.
+
+\medskip
+
+The program takes as optional input the parameter \texttt{-{}-recursion}, which
+limits the height of the output graphs. A value of $1$ gives the plain keys returned
+by \emph{SAKey}.
+
+\medskip
+
+When we take the example of \texttt{Library}, we get multiple outputs, like the
+tree below:
+
+\begin{figure}[H]
+\centering
+\begin{tikzpicture}[y=3cm]
+\node[draw] (0) at (0.00, -0) {Library};
+\node[draw] (0-0) at (0.00, -1) {Place};
+\node[draw] (0-0-0) at (0.00, -2) {City};
+\node[draw] (0-0-0-0) at (0.00, -3) {Image};
+\draw[->] (0-0-0) -- (0-0-0-0) node[midway,above,sloped] {thumbnail};
+\draw[->] (0-0) -- (0-0-0) node[midway,above,sloped] {capital};
+\draw[->] (0) -- (0-0) node[midway,above,sloped] {location};
+\end{tikzpicture}
+\caption{Sample output of the program}
+\end{figure}
+
+\subsection{Limitations and further works}
+
+While SAKey returns a huge number of keys, only a few of them are well-typed, in the
+sense that their relations have a defined, non-primitive range. This is a serious
+limitation of our algorithm.
+
+\medskip
+
+Moreover, the current algorithm ignores the relations that do not have any defined
+range, which are the majority. This is not really a problem and can easily be
+patched: taking them into account generates more graphs, but does not provide any
+additional information to extend the keys, as said before.
+
+\medskip
+
+We can notice that most generated graph keys are simple paths (every node has
+degree 2, except for the two end nodes). This is related to the fact that generated
+keys are minimal, and most of them contain only one property. Our algorithm should
+be able to generate any type of tree graph.
+
+\medskip
+
+However, our algorithm may not find all existing graph keys. Current output graphs
+have the property that if we truncate each graph to a given depth, then it stays a
+valid graph key (at least an $n$-almost graph key), which has no reason to hold
+in general. To address this issue, we could modify the SAKey algorithm so that it
+directly extends the properties with their ranges.
+
+\medskip
+
+To go further, we should take into account the missing properties, find a way to
+generate more complex graphs, and find minimal graphs that are not flat.
+
+\end{document}
diff --git a/cours.sty b/cours.sty
new file mode 100644
index 0000000..8fd32ab
--- /dev/null
+++ b/cours.sty
@@ -0,0 +1,132 @@
+\usepackage[utf8]{inputenc}
+\usepackage[french]{babel}
+\usepackage[T1]{fontenc}
+\usepackage[top=3cm,bottom=3cm,left=2cm,right=2cm]{geometry}
+\usepackage{amsmath}
+\usepackage{amsfonts}
+\usepackage{amssymb}
+\usepackage{stmaryrd}
+\usepackage{graphicx}
+\usepackage{amsthm}
+\usepackage{fancyhdr}
+\usepackage{faktor}
+\usepackage{dsfont}
+\usepackage{pgf,tikz,pgfplots}
+\usepackage{mathrsfs}
+\usepackage{enumitem}
+\usepackage{centernot}
+\usepackage{float}
+\usepackage{xurl}
+
+\pgfplotsset{compat=1.17}
+\usetikzlibrary{arrows}
+\usetikzlibrary{shapes}
+
+\author{Yohann D'ANELLO}
+
+\pagestyle{fancy}
+\lfoot{Yohann D'ANELLO}
+\rfoot{M2 DS, Université Paris-Saclay}
+
+\setlength{\headheight}{14pt}
+
+\renewcommand{\headrulewidth}{1pt}
+\renewcommand{\footrulewidth}{1pt}
+
+\newtheoremstyle{mystyle}% name
+ {\topsep}%Space above
+ {\topsep}%Space below
+ {}%Body font
+ {0pt}%Indent amount
+ {\bfseries}% Theorem head font
+ {}%Punctuation after theorem head
+ {2pt}%Space after theorem head 2
+ {\underline{\thmname{#1}~\thmnumber{#2}\thmnote{~(#3)}.}}%Theorem head spec (can be left empty, meaning ‘normal’)
+
+\theoremstyle{mystyle}
+
+\newenvironment{preuve}{\noindent\textit{\underline{\proofname.}}~\newline\text{}}{\hfill $\square$\bigskip}
+
+\newtheorem{definition}{Définition}
+\newenvironment{defi}[1][]{\begin{definition}[#1]~\newline\text{}}{\end{definition}\bigskip}
+
+\newtheorem{theorem}[definition]{Théorème}
+\newenvironment{thm}[1][]{\begin{theorem}[#1]~\newline\text{}}{\end{theorem}\bigskip}
+
+\newtheorem{proposition}[definition]{Proposition}
+\newenvironment{prop}[1][]{\begin{proposition}[#1]~\newline\text{}}{\end{proposition}\bigskip}
+
+\newtheorem{corrolary}[definition]{Corollaire}
+\newenvironment{cor}[1][]{\begin{corrolary}[#1]~\newline\text{}}{\end{corrolary}\bigskip}
+
+\newtheorem{lemma}[definition]{Lemme}
+\newenvironment{lemme}[1][]{\begin{lemma}[#1]~\newline\text{}}{\end{lemma}\bigskip}
+
+\newtheorem{example}[definition]{Exemple}
+\newenvironment{ex}[1][]{\begin{example}[#1]~\newline\text{}}{\end{example}\bigskip}
+
+\newtheorem{exercise}{Exercice}
+\newenvironment{exo}[1][]{\begin{exercise}[#1]~\newline\text{}}{\end{exercise}\bigskip}
+
+\newtheorem{remark}[definition]{Remarque}
+\newenvironment{rem}[1][]{\begin{remark}[#1]~\newline\text{}}{\end{remark}\bigskip}
+
+\newtheorem{question}{Question}
+\newenvironment{q}{\begin{question}~\newline\text{}}{\end{question}\bigskip}
+
+\let\rawl\{
+\let\rawr\}
+\renewcommand{\{}{\left\rawl}
+\renewcommand{\}}{\right\rawr}
+
+\renewcommand{\le}{\leqslant}
+\renewcommand{\ge}{\geqslant}
+\newcommand{\dx}{\, \mathrm{d}x}
+\newcommand{\dy}{\, \mathrm{d}y}
+\newcommand{\dz}{\, \mathrm{d}z}
+\newcommand{\dt}{\, \mathrm{d}t}
+\newcommand{\dmu}{\, \mathrm{d}\mu}
+\newcommand{\dP}{\, \mathrm{d}\bbP}
+\newcommand{\bbP}{\mathbb{P}}
+\newcommand{\calP}{\mathcal{P}}
+\newcommand{\calF}{\mathcal{F}}
+\newcommand{\frakF}{\mathfrak{F}}
+\newcommand{\bbE}{\mathbb{E}}
+\newcommand{\calE}{\mathcal{E}}
+\newcommand{\calL}{\mathcal{L}}
+\newcommand{\calO}{\mathcal{O}}
+\newcommand{\N}{\mathbb{N}}
+\newcommand{\Z}{\mathbb{Z}}
+\newcommand{\Q}{\mathbb{Q}}
+\newcommand{\R}{\mathbb{R}}
+\newcommand{\Rbar}{\overline{\R}}
+\newcommand{\calR}{\mathcal{R}}
+\newcommand{\C}{\mathbb{C}}
+\newcommand{\calC}{\mathcal{C}}
+\newcommand{\Cbar}{\overline{\C}}
+\newcommand{\K}{\mathbb{K}}
+\newcommand{\intset}[2][1]{\left\llbracket {#1} , \, {#2} \right\rrbracket}
+\newcommand{\Card}{\mathrm{Card}}
+\newcommand{\A}{\mathcal{A}}
+\newcommand{\B}{\mathcal{B}}
+\renewcommand{\Re}{\mathfrak{Re}}
+\renewcommand{\Im}{\mathfrak{Im}}
+\newcommand{\dsum}{\displaystyle\sum}
+\newcommand{\dprod}{\displaystyle\prod}
+\newcommand{\dint}{\displaystyle\int}
+%\newcommand{\dbinom}{\displaystyle\binom}
+\newcommand{\doublebinom}[2]{\displaystyle\left(\!\!\binom{#1}{#2}\!\!\right)}
+\newcommand{\tend}[2]{\underset{#1 \to #2}{\longrightarrow}}
+\newcommand{\brkt}[1]{\left\langle {#1} \right\rangle}
+\newcommand{\scal}[2]{\left\langle {#1} \middle, {#2} \right\rangle}
+\newcommand{\bigslant}[2]{{\raisebox{.2em}{$#1$}\left/\raisebox{.2em}{$#2$}\right.}}
+\newcommand{\op}{_{\mathrm{op}}}
+\newcommand{\md}{\mathrm{d}}
+\newcommand{\fL}{\mathfrak{L}}
+\newcommand{\id}{\mathrm{id}}
+
+%\renewcommand{\thesection}{\Roman{section} --}
+%\renewcommand{\thesubsection}{\arabic{subsection} --}
+%\renewcommand{\thesubsubsection}{\arabic{subsection}.\arabic{subsubsection} --}
+
+\setlength\parindent{0pt}
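
The pipeline described in the "Principle" subsection of Report.tex (query the triples of a class with a limit, keep only the key relations whose range is a non-primitive class, then recurse into that class up to the `--recursion` height) might be sketched as below. This is a minimal illustration under stated assumptions, not the code from the repository: `build_triples_query`, `expand`, `RANGES` and `KEYS` are hypothetical names, the two tables are toy data that mirror the `Library` example from the report instead of real DBpedia or SAKey output, and the query only matches triples whose subject is an instance of the class.

```python
from typing import Dict, List, Optional

# DBpedia ontology namespace, used for the class name in the query.
DBO = "http://dbpedia.org/ontology/"


def build_triples_query(class_name: str, limit: int = 1000) -> str:
    """Return a SPARQL query for triples whose subject has type class_name.

    Assumption: only triples with the instance as subject are fetched.
    """
    return (
        f"PREFIX dbo: <{DBO}>\n"
        "SELECT ?x ?r ?y WHERE {\n"
        f"  ?x a dbo:{class_name} .\n"
        "  ?x ?r ?y .\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Toy stand-in for the ontology: range of each relation.
# None means "no defined range, or a primitive range".
RANGES: Dict[str, Optional[str]] = {
    "location": "Place",
    "capital": "City",
    "thumbnail": "Image",
    "name": None,
}

# Toy stand-in for SAKey output: single-relation keys found for each class.
KEYS: Dict[str, List[str]] = {
    "Library": ["location", "name"],
    "Place": ["capital"],
    "City": ["thumbnail"],
    "Image": [],
}


def expand(class_name: str, recursion: int) -> dict:
    """Build a key tree of height at most `recursion`, rooted at class_name."""
    node = {"class": class_name, "edges": {}}
    if recursion <= 0:
        return node  # height budget exhausted: this class stays a leaf
    for relation in KEYS.get(class_name, []):
        target = RANGES.get(relation)
        if target is None:
            continue  # relations without a usable range are ignored
        node["edges"][relation] = expand(target, recursion - 1)
    return node
```

With this toy data, `expand("Library", 1)` keeps only the `location` edge to `Place`, and `expand("Library", 3)` reproduces the `Library`, `Place`, `City`, `Image` path shown in the report's sample output.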