Add report

Signed-off-by: Yohann D'ANELLO <ynerant@crans.org>
Yohann D'ANELLO 2022-01-31 13:15:06 +01:00
parent 04adf8edcd
commit b1832fc527
Signed by: ynerant
GPG Key ID: 3A75C55819C8CF85
3 changed files with 317 additions and 0 deletions

4
.gitignore vendored Normal file

@@ -0,0 +1,4 @@
*.aux
*.log
*.nt
*.pdf

181
Report.tex Normal file

@@ -0,0 +1,181 @@
\documentclass{article}
\usepackage{cours}
\title{Keys in Graphs}
\date{January 13\textsuperscript{th}, 2022}
\begin{document}
\maketitle
\section{Introduction}
This project aims to find graph keys, as defined in the paper \emph{Keys for graphs}%
\footnote{\url{https://www.researchgate.net/publication/283189709_Keys_for_graphs}}.
A graph key describes which relations of an object identify it, and which
relations of the objects involved are needed in turn to identify them.
\medskip
For example, a Graph Key for a book can be:
\begin{center}
\begin{tikzpicture}[y=3cm]
\node[draw] (0) at (0, 0) {Book};
\node[] (00) at (-3, -1) {x};
\node[draw] (01) at (-1, -1) {Person};
\node[] (02) at (1, -1) {y};
\node[draw] (03) at (3, -1) {Company};
\node[draw] (010) at (-2, -2) {Country};
\node[] (011) at (0, -2) {z};
\node[] (030) at (3, -2) {t};
\draw[->] (0) -- (00) node[midway,above,sloped] {title};
\draw[->] (0) -- (01) node[midway,above,sloped] {author};
\draw[->] (0) -- (02) node[midway,above,sloped] {subtitle};
\draw[->] (0) -- (03) node[midway,above,sloped] {publisher};
\draw[->] (01) -- (010) node[midway,above,sloped] {nationality};
\draw[->] (01) -- (011) node[midway,above,sloped] {last name};
\draw[->] (03) -- (030) node[midway,above,sloped] {identifier};
\end{tikzpicture}
\end{center}
That is to say, a book can be identified by its title and its subtitle, the last
name and the nationality of its author, and the public identifier of its publisher.
\medskip
To generate these keys, we propose to find $n$-almost keys using SAKey,
then to select the relations involved that define a domain and a range, and to
explore the related classes recursively.
\section{Proposed solution}
The proposed tool is available here:
\url{https://gitlab.crans.org/ynerant/graph-keys}
\subsection{Requirements}
The project is written for Python 3.9 and Python 3.10, and uses the libraries
\texttt{BeautifulSoup4} and \texttt{SPARQLWrapper}.
\subsection{Principle}
\medskip
The program takes as input the name of the class that we want to explore, and a
threshold $n$, which is the number of exceptions allowed by SAKey. The class
must exist in DBpedia, since we only query DBpedia.
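\medskip
To make the role of the threshold $n$ concrete, the following brute-force sketch
counts the exceptions of a set of properties on a toy dataset. It assumes the usual
SAKey notion of exception (an instance that shares a value with another instance
for every property of the set). The data and the names \texttt{exceptions} and
\texttt{is\_n\_almost\_key} are hypothetical, and this is not how SAKey itself
computes keys.
\begin{verbatim}
```python
from itertools import combinations

# Toy dataset (hypothetical): instance -> property -> set of values.
DATA = {
    "lib1": {"name": {"Central"}, "location": {"Paris"}},
    "lib2": {"name": {"Central"}, "location": {"Lyon"}},
    "lib3": {"name": {"North"}, "location": {"Paris"}},
    "lib4": {"name": {"South"}, "location": {"Nice"}},
}


def exceptions(data, properties):
    """Instances sharing a value with another instance for every property."""
    found = set()
    for (a, values_a), (b, values_b) in combinations(data.items(), 2):
        # Two instances collide when they share a value for each property.
        if all(values_a.get(p, set()) & values_b.get(p, set())
               for p in properties):
            found.update((a, b))
    return found


def is_n_almost_key(data, properties, n):
    """A set of properties is an n-almost key if it has at most n exceptions."""
    return len(exceptions(data, properties)) <= n
```
\end{verbatim}
On this toy data, \texttt{name} alone has two exceptions, so it is a $2$-almost key
but not a $1$-almost key, while the pair \texttt{name}, \texttt{location} has no
exception at all.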
\medskip
First, we load the DBpedia ontology and extract its relations. The goal is
to learn the domain and the range of as many relations as possible. For example,
we learn that the relation \texttt{inCemetery} has the domain \texttt{GraveMonument}
and the range \texttt{Cemetery}.
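\medskip
As an illustration of this step, the sketch below reads the domain and the range of
each object property from an OWL fragment. It is only a sketch under assumptions:
the fragment is hand-written in the style of the DBpedia ontology, the standard
library module \texttt{xml.etree.ElementTree} is used here instead of
\texttt{BeautifulSoup4} to keep the example dependency-free, and the name
\texttt{load\_relations} is hypothetical.
\begin{verbatim}
```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hand-written sample, NOT the real ontology file.
SAMPLE = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:ObjectProperty rdf:about="http://dbpedia.org/ontology/inCemetery">
    <rdfs:domain rdf:resource="http://dbpedia.org/ontology/GraveMonument"/>
    <rdfs:range rdf:resource="http://dbpedia.org/ontology/Cemetery"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:about="http://dbpedia.org/ontology/location">
    <rdfs:range rdf:resource="http://dbpedia.org/ontology/Place"/>
  </owl:ObjectProperty>
</rdf:RDF>
"""


def short(uri):
    """Keep only the local name of a URI."""
    return uri.rsplit("/", 1)[-1]


def load_relations(owl_xml):
    """Map each relation name to a (domain, range) pair; None if undefined."""
    relations = {}
    root = ET.fromstring(owl_xml)
    for prop in root.iter(f"{{{OWL}}}ObjectProperty"):
        name = short(prop.get(f"{{{RDF}}}about"))
        pair = []
        for tag in ("domain", "range"):
            node = prop.find(f"{{{RDFS}}}{tag}")
            if node is None:
                pair.append(None)
            else:
                pair.append(short(node.get(f"{{{RDF}}}resource")))
        relations[name] = tuple(pair)
    return relations


relations = load_relations(SAMPLE)
```
\end{verbatim}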
\medskip
The next step is to query DBpedia to get all elements whose type is the input
class, then to get all triples \texttt{?x ?r ?y} that involve these elements.
Since datasets can be very large, we limit the output to 1000 triples by default,
but this value can be changed using the option \texttt{-{}-limit}.
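\medskip
The query described above might look like the following sketch. The exact query
sent by the tool is not given in this report, so the shape of the query (elements
taken as subjects only, the \texttt{dbo:} prefix) and the names
\texttt{build\_query} and \texttt{run\_query} are assumptions. The public endpoint
\url{https://dbpedia.org/sparql} is assumed as well, and \texttt{run\_query} is not
called here since it needs network access.
\begin{verbatim}
```python
def build_query(class_name, limit=1000):
    """Build a SPARQL query returning triples about instances of a class."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?x ?r ?y WHERE {\n"
        f"  ?x a dbo:{class_name} .\n"
        "  ?x ?r ?y .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def run_query(query, endpoint="https://dbpedia.org/sparql"):
    """Send the query with SPARQLWrapper and return (x, r, y) tuples."""
    # Imported lazily so that building a query needs no third-party package.
    from SPARQLWrapper import SPARQLWrapper, JSON

    client = SPARQLWrapper(endpoint)
    client.setQuery(query)
    client.setReturnFormat(JSON)
    rows = client.query().convert()["results"]["bindings"]
    return [(row["x"]["value"], row["r"]["value"], row["y"]["value"])
            for row in rows]


query = build_query("Library", limit=3000)
```
\end{verbatim}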
\medskip
We then store all these triples and give them to the \emph{SAKey} tool, in order to
extract the $n$-almost keys of the dataset, where $n$ is given as input. These keys
are sets of relations of the input class, which we can now explore further. To get
relevant data, we chose to consider only the relations that have a defined range
which is not a primitive type (integers, strings, dates, $\ldots$). In general,
this yields only a few results, but relevant ones.
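\medskip
The selection of relations described above can be sketched as follows. The sketch
assumes that keys are given as sets of relation names and that primitive ranges are
exactly the XSD datatypes, which is a simplification. The name
\texttt{explorable\_relations} and the sample keys are hypothetical.
\begin{verbatim}
```python
XSD = "http://www.w3.org/2001/XMLSchema#"
DBO = "http://dbpedia.org/ontology/"


def explorable_relations(keys, ranges):
    """Keep the relations of the keys whose range is a class.

    keys:   iterable of sets of relation names (e.g. the SAKey output)
    ranges: relation name -> range URI, or None when no range is defined
    """
    selected = {}
    for key in keys:
        for relation in key:
            range_uri = ranges.get(relation)
            # Skip undefined ranges and primitive (XSD) datatypes.
            if range_uri is not None and not range_uri.startswith(XSD):
                selected[relation] = range_uri
    return selected


# Hypothetical keys and ranges, for illustration only.
keys = [{"location"}, {"name"}, {"established", "location"}]
ranges = {"location": DBO + "Place", "established": XSD + "date"}
selected = explorable_relations(keys, ranges)
```
\end{verbatim}
Here only \texttt{location} is kept: \texttt{name} has no defined range in the
sample, and \texttt{established} has a primitive one.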
\medskip
In the following, we consider the exploration of Graph Keys for the class
\texttt{Library}. We run the following command:
\begin{center}
\texttt{./main.py Library 5 -{}-limit 3000 -{}-recursion 3}
\end{center}
The last option is explained below.
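\medskip
For reference, a command line of this shape could be parsed with \texttt{argparse}
as sketched below. The argument names \texttt{classname} and \texttt{threshold},
and the default value of \texttt{-{}-recursion}, are assumptions. Only the default
limit of 1000 triples comes from the description above.
\begin{verbatim}
```python
import argparse


def build_parser():
    """Parser for: main.py CLASSNAME THRESHOLD [--limit N] [--recursion N]."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="Find graph keys for a DBpedia class.")
    parser.add_argument("classname", help="class to explore, e.g. Library")
    parser.add_argument("threshold", type=int,
                        help="number of exceptions allowed by SAKey")
    parser.add_argument("--limit", type=int, default=1000,
                        help="maximum number of triples to fetch")
    # The default of 1 is an assumption, not taken from the tool.
    parser.add_argument("--recursion", type=int, default=1,
                        help="maximum height of the output graphs")
    return parser


# Same arguments as the command shown above.
args = build_parser().parse_args(
    ["Library", "5", "--limit", "3000", "--recursion", "3"])
```
\end{verbatim}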
\medskip
For this example, we get the relation \texttt{location}, whose range is
\texttt{Place}.
\begin{figure}[H]
\centering
\begin{tikzpicture}[y=3cm]
\node[draw] (0) at (0.00, -0) {Library};
\node[draw] (0-0) at (0.00, -1) {Place};
\draw[->] (0) -- (0-0) node[midway,above,sloped] {location};
\end{tikzpicture}
\caption{Single discovered key}
\end{figure}
Since we discovered a relation between \texttt{Library} and \texttt{Place}, we
can now explore the keys of the class \texttt{Place} and extend the key for
\texttt{Library}.
\medskip
The program takes the optional parameter \texttt{-{}-recursion}, which
limits the height of the output graphs. A value of $1$ gives the plain keys from
\emph{SAKey}.
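\medskip
The recursive exploration can be sketched as below, restricted to path-shaped keys
for simplicity. The function \texttt{relations\_of} stands for the whole pipeline
(query, SAKey, range filtering) and is replaced here by a hypothetical lookup
table. The name \texttt{paths} and the table are assumptions.
\begin{verbatim}
```python
# Hypothetical result of the key discovery for each class:
# class -> list of (relation, range class).
TOY = {
    "Library": [("location", "Place")],
    "Place": [("capital", "City")],
    "City": [("thumbnail", "Image")],
}


def relations_of(class_name):
    """Stand-in for the real discovery step (query + SAKey + filtering)."""
    return TOY.get(class_name, [])


def paths(class_name, relations_of, depth):
    """Yield every path of at most `depth` relations from `class_name`.

    A path alternates classes and relations:
    [class, relation, class, relation, class, ...].
    """
    found = relations_of(class_name) if depth > 0 else []
    if not found:
        # Recursion limit reached, or nothing left to explore.
        yield [class_name]
        return
    for relation, range_class in found:
        for tail in paths(range_class, relations_of, depth - 1):
            yield [class_name, relation] + tail


flat = list(paths("Library", relations_of, 1))
deep = list(paths("Library", relations_of, 3))
```
\end{verbatim}
With a depth of $1$, only the edge from \texttt{Library} to \texttt{Place} is
produced. The depth bound also guarantees termination when classes refer to each
other cyclically.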
\medskip
For the example of \texttt{Library}, we get multiple outputs, such as the
tree below:
\begin{figure}[H]
\centering
\begin{tikzpicture}[y=3cm]
\node[draw] (0) at (0.00, -0) {Library};
\node[draw] (0-0) at (0.00, -1) {Place};
\node[draw] (0-0-0) at (0.00, -2) {City};
\node[draw] (0-0-0-0) at (0.00, -3) {Image};
\draw[->] (0-0-0) -- (0-0-0-0) node[midway,above,sloped] {thumbnail};
\draw[->] (0-0) -- (0-0-0) node[midway,above,sloped] {capital};
\draw[->] (0) -- (0-0) node[midway,above,sloped] {location};
\end{tikzpicture}
\caption{Sample output of the program}
\end{figure}
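\medskip
A figure such as the one above can be generated from a path of classes and
relations. The sketch below imitates the node names and coordinates visible in the
figure source. The function \texttt{path\_to\_tikz} is hypothetical and handles
paths only, not general trees.
\begin{verbatim}
```python
def path_to_tikz(path):
    """Render [class, relation, class, ...] as a vertical TikZ path."""
    classes = path[0::2]
    relations = path[1::2]
    # Node names follow the pattern 0, 0-0, 0-0-0, ...
    names = ["-".join(["0"] * (level + 1)) for level in range(len(classes))]
    lines = [r"\begin{tikzpicture}[y=3cm]"]
    for level, (name, cls) in enumerate(zip(names, classes)):
        lines.append(rf"\node[draw] ({name}) at (0.00, -{level}) {{{cls}}};")
    for level, relation in enumerate(relations):
        lines.append(
            rf"\draw[->] ({names[level]}) -- ({names[level + 1]}) "
            rf"node[midway,above,sloped] {{{relation}}};")
    lines.append(r"\end{tikzpicture}")
    return "\n".join(lines)


tikz = path_to_tikz(["Library", "location", "Place", "capital", "City"])
```
\end{verbatim}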
\subsection{Limitations and future work}
While SAKey returns a huge number of keys, only a few of them are well-typed, in the
sense that their relations have a well-defined, non-primitive range. This seriously
limits our algorithm.
\medskip
Moreover, the current algorithm ignores the relations that have no defined
range, which are the majority. This is not really a problem and could easily be
patched: taking them into account would generate more graphs, but would not
provide any additional input to extend the keys, as said before.
\medskip
We can notice that most of the graph keys are simple paths (every node has degree
2, except for the two end nodes). This is related to the fact that the generated
keys are minimal, and most of them contain only one relation. Our algorithm should
nevertheless be able to generate any kind of tree.
\medskip
However, our algorithm may not find all existing graph keys. The current output
graphs have the property that if we truncate a graph at any given depth, it remains
a valid graph key (at least an $n$-almost graph key), which has no reason to be true
in general. To address this issue, we could modify the SAKey algorithm so that it
directly extends the properties with their ranges.
\medskip
To go further, we should take the missing properties into account, find a way
to generate more complex graphs, and find minimal graphs that are not flat.
\end{document}

132
cours.sty Normal file

@@ -0,0 +1,132 @@
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage[T1]{fontenc}
\usepackage[top=3cm,bottom=3cm,left=2cm,right=2cm]{geometry}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{stmaryrd}
\usepackage{graphicx}
\usepackage{amsthm}
\usepackage{fancyhdr}
\usepackage{faktor}
\usepackage{dsfont}
\usepackage{pgf,tikz,pgfplots}
\usepackage{mathrsfs}
\usepackage{enumitem}
\usepackage{centernot}
\usepackage{float}
\usepackage{xurl}
\pgfplotsset{compat=1.17}
\usetikzlibrary{arrows}
\usetikzlibrary{shapes}
\author{Yohann D'ANELLO}
\pagestyle{fancy}
\lfoot{Yohann D'ANELLO}
\rfoot{M2 DS, Université Paris-Saclay}
\setlength{\headheight}{14pt}
\renewcommand{\headrulewidth}{1pt}
\renewcommand{\footrulewidth}{1pt}
\newtheoremstyle{mystyle}% name
{\topsep}%Space above
{\topsep}%Space below
{}%Body font
{0pt}%Indent amount
{\bfseries}% Theorem head font
{}%Punctuation after theorem head
{2pt}%Space after theorem head 2
{\underline{\thmname{#1}~\thmnumber{#2}\thmnote{~(#3)}.}}%Theorem head spec (can be left empty, meaning normal)
\theoremstyle{mystyle}
\newenvironment{preuve}{\noindent\textit{\underline{\proofname.}}~\newline\text{}}{\hfill $\square$\bigskip}
\newtheorem{definition}{Definition}
\newenvironment{defi}[1][]{\begin{definition}[#1]~\newline\text{}}{\end{definition}\bigskip}
\newtheorem{theorem}[definition]{Theorem}
\newenvironment{thm}[1][]{\begin{theorem}[#1]~\newline\text{}}{\end{theorem}\bigskip}
\newtheorem{proposition}[definition]{Proposition}
\newenvironment{prop}[1][]{\begin{proposition}[#1]~\newline\text{}}{\end{proposition}\bigskip}
\newtheorem{corrolary}[definition]{Corollary}
\newenvironment{cor}[1][]{\begin{corrolary}[#1]~\newline\text{}}{\end{corrolary}\bigskip}
\newtheorem{lemma}[definition]{Lemma}
\newenvironment{lemme}[1][]{\begin{lemma}[#1]~\newline\text{}}{\end{lemma}\bigskip}
\newtheorem{example}[definition]{Example}
\newenvironment{ex}[1][]{\begin{example}[#1]~\newline\text{}}{\end{example}\bigskip}
\newtheorem{exercise}{Exercise}
\newenvironment{exo}[1][]{\begin{exercise}[#1]~\newline\text{}}{\end{exercise}\bigskip}
\newtheorem{remark}[definition]{Remark}
\newenvironment{rem}[1][]{\begin{remark}[#1]~\newline\text{}}{\end{remark}\bigskip}
\newtheorem{question}{Question}
\newenvironment{q}{\begin{question}~\newline\text{}}{\end{question}\bigskip}
\let\rawl\{
\let\rawr\}
\renewcommand{\{}{\left\rawl}
\renewcommand{\}}{\right\rawr}
\renewcommand{\le}{\leqslant}
\renewcommand{\ge}{\geqslant}
\newcommand{\dx}{\, \mathrm{d}x}
\newcommand{\dy}{\, \mathrm{d}y}
\newcommand{\dz}{\, \mathrm{d}z}
\newcommand{\dt}{\, \mathrm{d}t}
\newcommand{\dmu}{\, \mathrm{d}\mu}
\newcommand{\dP}{\, \mathrm{d}\bbP}
\newcommand{\bbP}{\mathbb{P}}
\newcommand{\calP}{\mathcal{P}}
\newcommand{\calF}{\mathcal{F}}
\newcommand{\frakF}{\mathfrak{F}}
\newcommand{\bbE}{\mathbb{E}}
\newcommand{\calE}{\mathcal{E}}
\newcommand{\calL}{\mathcal{L}}
\newcommand{\calO}{\mathcal{O}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\Rbar}{\overline{\R}}
\newcommand{\calR}{\mathcal{R}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\calC}{\mathcal{C}}
\newcommand{\Cbar}{\overline{\C}}
\newcommand{\K}{\mathbb{K}}
\newcommand{\intset}[2][1]{\left\llbracket {#1} , \, {#2} \right\rrbracket}
\newcommand{\Card}{\mathrm{Card}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\B}{\mathcal{B}}
\renewcommand{\Re}{\mathfrak{Re}}
\renewcommand{\Im}{\mathfrak{Im}}
\newcommand{\dsum}{\displaystyle\sum}
\newcommand{\dprod}{\displaystyle\prod}
\newcommand{\dint}{\displaystyle\int}
%\newcommand{\dbinom}{\displaystyle\binom}
\newcommand{\doublebinom}[2]{\displaystyle\left(\!\!\binom{#1}{#2}\!\!\right)}
\newcommand{\tend}[2]{\underset{#1 \to #2}{\longrightarrow}}
\newcommand{\brkt}[1]{\left\langle {#1} \right\rangle}
\newcommand{\scal}[2]{\left\langle {#1} \middle, {#2} \right\rangle}
\newcommand{\bigslant}[2]{{\raisebox{.2em}{$#1$}\left/\raisebox{.2em}{$#2$}\right.}}
\newcommand{\op}{_{\mathrm{op}}}
\newcommand{\md}{\mathrm{d}}
\newcommand{\fL}{\mathfrak{L}}
\newcommand{\id}{\mathrm{id}}
%\renewcommand{\thesection}{\Roman{section} --}
%\renewcommand{\thesubsection}{\arabic{subsection} --}
%\renewcommand{\thesubsubsection}{\arabic{subsection}.\arabic{subsubsection} --}
\setlength\parindent{0pt}