Add report
Signed-off-by: Yohann D'ANELLO <ynerant@crans.org>
parent
04adf8edcd
commit
b1832fc527
|
@ -0,0 +1,4 @@
|
||||||
|
*.aux
|
||||||
|
*.log
|
||||||
|
*.nt
|
||||||
|
*.pdf
|
|
@ -0,0 +1,181 @@
|
||||||
|
\documentclass{article}
|
||||||
|
|
||||||
|
\usepackage{cours}
|
||||||
|
|
||||||
|
\title{Keys in Graphs}
|
||||||
|
\date{January 13\textsuperscript{th}, 2022}
|
||||||
|
|
||||||
|
\begin{document}
|
||||||
|
|
||||||
|
\maketitle
|
||||||
|
|
||||||
|
\section{Introduction}
|
||||||
|
|
||||||
|
This project aims to find Graph Keys, as defined in the paper \emph{Keys for
Graphs}\footnote{\url{https://www.researchgate.net/publication/283189709_Keys_for_graphs}}.
A Graph Key describes the relations that an object can have with its keys, and
the relations that these related objects can have in turn.
|
||||||
|
|
||||||
|
\medskip
|
||||||
|
|
||||||
|
For example, a Graph Key for a book can be:
|
||||||
|
|
||||||
|
\begin{center}
\begin{tikzpicture}[y=3cm]
\node[draw] (0) at (0, 0) {Book};
\node[] (00) at (-3, -1) {x};
\node[draw] (01) at (-1, -1) {Person};
\node[] (02) at (1, -1) {y};
\node[draw] (03) at (3, -1) {Company};
\node[draw] (010) at (-2, -2) {Country};
\node[] (011) at (0, -2) {z};
\node[] (030) at (3, -2) {t};
\draw[->] (0) -- (00) node[midway,above,sloped] {title};
\draw[->] (0) -- (01) node[midway,above,sloped] {author};
\draw[->] (0) -- (02) node[midway,above,sloped] {subtitle};
\draw[->] (0) -- (03) node[midway,above,sloped] {publisher};
\draw[->] (01) -- (010) node[midway,above,sloped] {nationality};
\draw[->] (01) -- (011) node[midway,above,sloped] {last name};
\draw[->] (03) -- (030) node[midway,above,sloped] {identifier};
\end{tikzpicture}
\end{center}
|
||||||
|
|
||||||
|
That is to say, a book can be described by its title and its subtitle, the last
name and the nationality of its author, and the public identifier of its publisher.
|
||||||
|
|
||||||
|
\medskip
|
||||||
|
|
||||||
|
To generate these keys, we propose to find $n$-almost keys using SAKey,
then to explore the relations involved that define a domain and a range, and to
recursively explore the related fields.
|
||||||
|
|
||||||
|
\section{Proposed solution}
|
||||||
|
|
||||||
|
The proposed tool is available here:
|
||||||
|
\url{https://gitlab.crans.org/ynerant/graph-keys}
|
||||||
|
|
||||||
|
\subsection{Requirements}
|
||||||
|
|
||||||
|
The project is written for Python 3.9 and Python 3.10, and uses the
\texttt{BeautifulSoup4} and \texttt{SPARQLWrapper} libraries.
|
||||||
|
|
||||||
|
\subsection{Principle}
|
||||||
|
|
||||||
|
\medskip
|
||||||
|
|
||||||
|
The program takes as input the name of the class that we want to explore, and a
threshold $n$, which is the number of exceptions allowed by SAKey. The class
must exist in DBPedia, since we only query DBPedia.
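To make the role of the threshold $n$ concrete, here is a minimal brute-force sketch of the $n$-almost key test. It assumes the usual SAKey definition (an instance is an exception for a set of properties if it shares at least one value with another instance on every property of the set, and the set is an $n$-almost key if it has at most $n$ exceptions). The function names and the sample data are hypothetical, and this is not the optimised algorithm used by SAKey itself.

```python
from itertools import combinations


def exceptions(instances, properties):
    """Return the instances that share a value with another instance
    on every property of `properties` (assumed SAKey-style definition)."""
    found = set()
    for (a, desc_a), (b, desc_b) in combinations(instances.items(), 2):
        if all(desc_a.get(p, set()) & desc_b.get(p, set()) for p in properties):
            found.update((a, b))
    return found


def is_n_almost_key(instances, properties, n):
    """A set of properties is an n-almost key if it has at most n exceptions."""
    return len(exceptions(instances, properties)) <= n


# Hypothetical toy dataset: each instance maps a property to a set of values.
libraries = {
    "lib1": {"location": {"Paris"}, "name": {"Central"}},
    "lib2": {"location": {"Paris"}, "name": {"North"}},
    "lib3": {"location": {"Lyon"}, "name": {"Central"}},
}
```

On this toy dataset, \texttt{location} alone has two exceptions, so it is a $2$-almost key but not a $1$-almost key, while the pair \texttt{location}, \texttt{name} has no exception at all.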
|
||||||
|
|
||||||
|
\medskip
|
||||||
|
|
||||||
|
First, we load the ontology of DBPedia, and with it a large number of relations.
The goal is to determine the domain and the range of most relations. For example,
we learn that the relation \texttt{inCemetery} has the domain
\texttt{GraveMonument} and the range \texttt{Cemetery}.
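The extraction of domains and ranges can be sketched as follows. This is only an illustration: it parses a small hand-written OWL fragment with the standard library module \texttt{xml.etree.ElementTree}, whereas the actual tool relies on \texttt{BeautifulSoup4}. The fragment and the names \texttt{SAMPLE}, \texttt{local\_name} and \texttt{load\_relations} are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hand-written fragment in the style of an OWL ontology (not the real file).
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:ObjectProperty rdf:about="http://dbpedia.org/ontology/inCemetery">
    <rdfs:domain rdf:resource="http://dbpedia.org/ontology/GraveMonument"/>
    <rdfs:range rdf:resource="http://dbpedia.org/ontology/Cemetery"/>
  </owl:ObjectProperty>
</rdf:RDF>"""


def local_name(uri):
    """Keep only the last path segment of a URI."""
    return uri.rsplit("/", 1)[-1]


def load_relations(xml_text):
    """Map each object property to its (domain, range); None when undeclared."""
    relations = {}
    root = ET.fromstring(xml_text)
    for prop in root.iter(f"{{{OWL}}}ObjectProperty"):
        ends = []
        for tag in ("domain", "range"):
            node = prop.find(f"{{{RDFS}}}{tag}")
            ends.append(None if node is None
                        else local_name(node.get(f"{{{RDF}}}resource")))
        relations[local_name(prop.get(f"{{{RDF}}}about"))] = tuple(ends)
    return relations


print(load_relations(SAMPLE))
```

Storing \texttt{None} for an undeclared domain or range makes it easy to tell later which relations can be explored further.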
|
||||||
|
|
||||||
|
\medskip
|
||||||
|
|
||||||
|
The next step is to query DBPedia to get all elements whose type is the input
class, then to get all triples \texttt{?x ?r ?y} that involve these elements.
Since datasets can be very large, we limit the output to 1000 triples by default,
but this value can be changed using the option \texttt{-{}-limit}.
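A query of this kind could look like the sketch below. The exact query sent by the tool is not shown in this report, so the query text, the \texttt{dbo:} prefix, the endpoint URL and the function names are assumptions. Only the \texttt{SPARQLWrapper} calls \texttt{setQuery}, \texttt{setReturnFormat} and \texttt{query().convert()} are taken from that library.

```python
def build_triples_query(class_name, limit=1000):
    """Build a SPARQL query returning triples about instances of a class."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?x ?r ?y WHERE {\n"
        f"  ?x a dbo:{class_name} .\n"
        "  ?x ?r ?y .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def fetch_triples(class_name, limit=1000, endpoint="https://dbpedia.org/sparql"):
    """Run the query against a SPARQL endpoint (assumed to be the public one)."""
    # Imported here so that building the query does not require the library.
    from SPARQLWrapper import SPARQLWrapper, JSON

    client = SPARQLWrapper(endpoint)
    client.setQuery(build_triples_query(class_name, limit))
    client.setReturnFormat(JSON)
    rows = client.query().convert()["results"]["bindings"]
    return [(r["x"]["value"], r["r"]["value"], r["y"]["value"]) for r in rows]


print(build_triples_query("Library", limit=3000))
```

Keeping the query construction separate from the network call makes the limit easy to check without contacting DBPedia.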
|
||||||
|
|
||||||
|
\medskip
|
||||||
|
|
||||||
|
We now store all these triples and give them to the \emph{SAKey} tool, in order to
extract the $n$-almost keys of the dataset, where $n$ is given as input. These keys
are sets of relations of our input class, which we can now explore further. To get
relevant data, we chose to consider only the relations that have a defined range
which is not a primitive type (integers, strings, dates, $\ldots$). In general,
we get only a few results, but they are relevant.
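The filtering step described above can be illustrated with a short sketch. The set of primitive ranges, the function name and the sample relations other than \texttt{location} are hypothetical; the real tool may represent ranges differently.

```python
# Ranges treated as primitives in this sketch (an assumption, not the tool's list).
PRIMITIVE_RANGES = {"xsd:integer", "xsd:string", "xsd:date"}


def explorable_relations(key_relations, ranges):
    """Keep the relations whose range is defined and is not a primitive."""
    kept = {}
    for relation in key_relations:
        range_class = ranges.get(relation)
        if range_class is not None and range_class not in PRIMITIVE_RANGES:
            kept[relation] = range_class
    return kept


# Hypothetical ranges: `nickname` has no declared range at all.
ranges = {"location": "Place", "openingYear": "xsd:integer"}
print(explorable_relations(["location", "openingYear", "nickname"], ranges))
```

Only \texttt{location} survives here, since the two other relations have either a primitive range or no declared range.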
|
||||||
|
|
||||||
|
\medskip
|
||||||
|
|
||||||
|
In the following, we consider the exploration of Graph Keys for the class
|
||||||
|
\texttt{Library}. We run the following command:
|
||||||
|
|
||||||
|
\begin{center}
|
||||||
|
\texttt{./main.py Library 5 -{}-limit 3000 -{}-recursion 3}
|
||||||
|
\end{center}
|
||||||
|
|
||||||
|
The last option is explained below.
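For reference, a command-line interface accepting this command can be sketched with \texttt{argparse}. The argument names \texttt{classname} and \texttt{threshold}, as well as the default value of \texttt{-{}-recursion}, are assumptions; only the default limit of 1000 comes from the text above.

```python
import argparse


def build_parser():
    """Parser mirroring the command shown above (names are assumptions)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("classname", help="DBPedia class to explore")
    parser.add_argument("threshold", type=int,
                        help="number n of exceptions allowed by SAKey")
    parser.add_argument("--limit", type=int, default=1000,
                        help="maximum number of triples to fetch")
    # The default below is assumed for the sketch, not taken from the tool.
    parser.add_argument("--recursion", type=int, default=1,
                        help="maximum height of the output graphs")
    return parser


args = build_parser().parse_args(
    ["Library", "5", "--limit", "3000", "--recursion", "3"]
)
print(args)
```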
|
||||||
|
|
||||||
|
\medskip
|
||||||
|
|
||||||
|
For this example, we get the relation \texttt{location}, whose range is
\texttt{Place}.
|
||||||
|
|
||||||
|
\begin{figure}[H]
\centering
\begin{tikzpicture}[y=3cm]
\node[draw] (0) at (0.00, -0) {Library};
\node[draw] (0-0) at (0.00, -1) {Place};
\draw[->] (0) -- (0-0) node[midway,above,sloped] {location};
\end{tikzpicture}
\caption{Single discovered key}
\end{figure}
|
||||||
|
|
||||||
|
Since we discovered a relation between \texttt{Library} and \texttt{Place}, we
|
||||||
|
can now explore the keys of the class \texttt{Place} and extend the key for
|
||||||
|
\texttt{Library}.
|
||||||
|
|
||||||
|
\medskip
|
||||||
|
|
||||||
|
The program takes as optional input the parameter \texttt{-{}-recursion}, which
limits the height of the output graphs. A value of $1$ gives the plain keys
returned by \emph{SAKey}.
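The effect of the recursion bound can be sketched as follows. The key discovery step is replaced by a hypothetical lookup table, and each key is simplified to a single relation with its range, so this is an illustration of the depth limit and not the tool's actual code.

```python
def explore(class_name, find_keys, depth):
    """Build a tree of keys of height at most `depth`.

    `find_keys(class_name)` must return (relation, range) pairs; here it
    stands in for the SAKey call followed by the range filtering.
    """
    if depth <= 0:
        return {}
    tree = {}
    for relation, range_class in find_keys(class_name):
        # Recurse into the range of the relation with a smaller budget.
        tree[(relation, range_class)] = explore(range_class, find_keys, depth - 1)
    return tree


# Hypothetical discovered keys, reusing the class names of the Library example.
KEYS = {
    "Library": [("location", "Place")],
    "Place": [("capital", "City")],
    "City": [("thumbnail", "Image")],
}


def lookup(class_name):
    return KEYS.get(class_name, [])


print(explore("Library", lookup, 1))
print(explore("Library", lookup, 3))
```

With a depth of $1$, only the relation \texttt{location} is returned, whereas a depth of $3$ yields the whole chain. The depth bound also guarantees termination when the ranges form a cycle.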
|
||||||
|
|
||||||
|
\medskip
|
||||||
|
|
||||||
|
For the \texttt{Library} example, we get multiple outputs, such as the
tree below:
|
||||||
|
|
||||||
|
\begin{figure}[H]
\centering
\begin{tikzpicture}[y=3cm]
\node[draw] (0) at (0.00, -0) {Library};
\node[draw] (0-0) at (0.00, -1) {Place};
\node[draw] (0-0-0) at (0.00, -2) {City};
\node[draw] (0-0-0-0) at (0.00, -3) {Image};
\draw[->] (0-0-0) -- (0-0-0-0) node[midway,above,sloped] {thumbnail};
\draw[->] (0-0) -- (0-0-0) node[midway,above,sloped] {capital};
\draw[->] (0) -- (0-0) node[midway,above,sloped] {location};
\end{tikzpicture}
\caption{Sample output of the program}
\end{figure}
|
||||||
|
|
||||||
|
\subsection{Limitations and further work}
|
||||||
|
|
||||||
|
While SAKey returns a large number of keys, only a few of them are well-typed, in
the sense that their relations have a defined, non-primitive range. This is a
serious limitation of our algorithm.
|
||||||
|
|
||||||
|
\medskip
|
||||||
|
|
||||||
|
Moreover, the current algorithm does not take into account the relations that have
no defined range, which are the majority. This is not really a problem and can
easily be patched: including them generates more graphs, but does not provide any
additional input with which to extend the keys, as said before.
|
||||||
|
|
||||||
|
\medskip
|
||||||
|
|
||||||
|
We can notice that most graph keys are simple paths (every node has degree 2,
except the two end nodes). This is related to the fact that the generated keys
are minimal, and most of them contain only one property. Our algorithm should
be able to generate any kind of tree.
|
||||||
|
|
||||||
|
\medskip
|
||||||
|
|
||||||
|
However, our algorithm may not find all existing graph keys. The current output
graphs have the property that if we truncate a graph to a given depth, it remains
a valid graph key (or at least an $n$-almost graph key), which has no reason to
hold in general. To address this issue, we could modify the SAKey algorithm so
that it directly extends the properties with their ranges.
|
||||||
|
|
||||||
|
\medskip
|
||||||
|
|
||||||
|
To go further, we should take the missing properties into account, find a way
to generate more complex graphs, and find minimal graphs that are not flat.
|
||||||
|
|
||||||
|
\end{document}
|
|
@ -0,0 +1,132 @@
|
||||||
|
\usepackage[utf8]{inputenc}
|
||||||
|
\usepackage[french]{babel}
|
||||||
|
\usepackage[T1]{fontenc}
|
||||||
|
\usepackage[top=3cm,bottom=3cm,left=2cm,right=2cm]{geometry}
|
||||||
|
\usepackage{amsmath}
|
||||||
|
\usepackage{amsfonts}
|
||||||
|
\usepackage{amssymb}
|
||||||
|
\usepackage{stmaryrd}
|
||||||
|
\usepackage{graphicx}
|
||||||
|
\usepackage{amsthm}
|
||||||
|
\usepackage{fancyhdr}
|
||||||
|
\usepackage{faktor}
|
||||||
|
\usepackage{dsfont}
|
||||||
|
\usepackage{pgf,tikz,pgfplots}
|
||||||
|
\usepackage{mathrsfs}
|
||||||
|
\usepackage{enumitem}
|
||||||
|
\usepackage{centernot}
|
||||||
|
\usepackage{float}
|
||||||
|
\usepackage{xurl}
|
||||||
|
|
||||||
|
\pgfplotsset{compat=1.17}
|
||||||
|
\usetikzlibrary{arrows}
|
||||||
|
\usetikzlibrary{shapes}
|
||||||
|
|
||||||
|
\author{Yohann D'ANELLO}
|
||||||
|
|
||||||
|
\pagestyle{fancy}
|
||||||
|
\lfoot{Yohann D'ANELLO}
|
||||||
|
\rfoot{M2 DS, Université Paris-Saclay}
|
||||||
|
|
||||||
|
\setlength{\headheight}{14pt}
|
||||||
|
|
||||||
|
\renewcommand{\headrulewidth}{1pt}
|
||||||
|
\renewcommand{\footrulewidth}{1pt}
|
||||||
|
|
||||||
|
\newtheoremstyle{mystyle}% name
|
||||||
|
{\topsep}%Space above
|
||||||
|
{\topsep}%Space below
|
||||||
|
{}%Body font
|
||||||
|
{0pt}%Indent amount
|
||||||
|
{\bfseries}% Theorem head font
|
||||||
|
{}%Punctuation after theorem head
|
||||||
|
{2pt}%Space after theorem head 2
|
||||||
|
{\underline{\thmname{#1}~\thmnumber{#2}\thmnote{~(#3)}.}}%Theorem head spec (can be left empty, meaning ‘normal’)
|
||||||
|
|
||||||
|
\theoremstyle{mystyle}
|
||||||
|
|
||||||
|
\newenvironment{preuve}{\noindent\textit{\underline{\proofname.}}~\newline\text{}}{\hfill $\square$\bigskip}
|
||||||
|
|
||||||
|
\newtheorem{definition}{Définition}
|
||||||
|
\newenvironment{defi}[1][]{\begin{definition}[#1]~\newline\text{}}{\end{definition}\bigskip}
|
||||||
|
|
||||||
|
\newtheorem{theorem}[definition]{Théorème}
|
||||||
|
\newenvironment{thm}[1][]{\begin{theorem}[#1]~\newline\text{}}{\end{theorem}\bigskip}
|
||||||
|
|
||||||
|
\newtheorem{proposition}[definition]{Proposition}
|
||||||
|
\newenvironment{prop}[1][]{\begin{proposition}[#1]~\newline\text{}}{\end{proposition}\bigskip}
|
||||||
|
|
||||||
|
\newtheorem{corrolary}[definition]{Corollaire}
|
||||||
|
\newenvironment{cor}[1][]{\begin{corrolary}[#1]~\newline\text{}}{\end{corrolary}\bigskip}
|
||||||
|
|
||||||
|
\newtheorem{lemma}[definition]{Lemme}
|
||||||
|
\newenvironment{lemme}[1][]{\begin{lemma}[#1]~\newline\text{}}{\end{lemma}\bigskip}
|
||||||
|
|
||||||
|
\newtheorem{example}[definition]{Exemple}
|
||||||
|
\newenvironment{ex}[1][]{\begin{example}[#1]~\newline\text{}}{\end{example}\bigskip}
|
||||||
|
|
||||||
|
\newtheorem{exercise}{Exercice}
|
||||||
|
\newenvironment{exo}[1][]{\begin{exercise}[#1]~\newline\text{}}{\end{exercise}\bigskip}
|
||||||
|
|
||||||
|
\newtheorem{remark}[definition]{Remarque}
|
||||||
|
\newenvironment{rem}[1][]{\begin{remark}[#1]~\newline\text{}}{\end{remark}\bigskip}
|
||||||
|
|
||||||
|
\newtheorem{question}{Question}
|
||||||
|
\newenvironment{q}{\begin{question}~\newline\text{}}{\end{question}\bigskip}
|
||||||
|
|
||||||
|
\let\rawl\{
|
||||||
|
\let\rawr\}
|
||||||
|
\renewcommand{\{}{\left\rawl}
|
||||||
|
\renewcommand{\}}{\right\rawr}
|
||||||
|
|
||||||
|
\renewcommand{\le}{\leqslant}
|
||||||
|
\renewcommand{\ge}{\geqslant}
|
||||||
|
\newcommand{\dx}{\, \mathrm{d}x}
|
||||||
|
\newcommand{\dy}{\, \mathrm{d}y}
|
||||||
|
\newcommand{\dz}{\, \mathrm{d}z}
|
||||||
|
\newcommand{\dt}{\, \mathrm{d}t}
|
||||||
|
\newcommand{\dmu}{\, \mathrm{d}\mu}
|
||||||
|
\newcommand{\dP}{\, \mathrm{d}\bbP}
|
||||||
|
\newcommand{\bbP}{\mathbb{P}}
|
||||||
|
\newcommand{\calP}{\mathcal{P}}
|
||||||
|
\newcommand{\calF}{\mathcal{F}}
|
||||||
|
\newcommand{\frakF}{\mathfrak{F}}
|
||||||
|
\newcommand{\bbE}{\mathbb{E}}
|
||||||
|
\newcommand{\calE}{\mathcal{E}}
|
||||||
|
\newcommand{\calL}{\mathcal{L}}
|
||||||
|
\newcommand{\calO}{\mathcal{O}}
|
||||||
|
\newcommand{\N}{\mathbb{N}}
|
||||||
|
\newcommand{\Z}{\mathbb{Z}}
|
||||||
|
\newcommand{\Q}{\mathbb{Q}}
|
||||||
|
\newcommand{\R}{\mathbb{R}}
|
||||||
|
\newcommand{\Rbar}{\overline{\R}}
|
||||||
|
\newcommand{\calR}{\mathcal{R}}
|
||||||
|
\newcommand{\C}{\mathbb{C}}
|
||||||
|
\newcommand{\calC}{\mathcal{C}}
|
||||||
|
\newcommand{\Cbar}{\overline{\C}}
|
||||||
|
\newcommand{\K}{\mathbb{K}}
|
||||||
|
\newcommand{\intset}[2][1]{\left\llbracket {#1} , \, {#2} \right\rrbracket}
|
||||||
|
\newcommand{\Card}{\mathrm{Card}}
|
||||||
|
\newcommand{\A}{\mathcal{A}}
|
||||||
|
\newcommand{\B}{\mathcal{B}}
|
||||||
|
\renewcommand{\Re}{\mathfrak{Re}}
|
||||||
|
\renewcommand{\Im}{\mathfrak{Im}}
|
||||||
|
\newcommand{\dsum}{\displaystyle\sum}
|
||||||
|
\newcommand{\dprod}{\displaystyle\prod}
|
||||||
|
\newcommand{\dint}{\displaystyle\int}
|
||||||
|
%\newcommand{\dbinom}{\displaystyle\binom}
|
||||||
|
\newcommand{\doublebinom}[2]{\displaystyle\left(\!\!\binom{#1}{#2}\!\!\right)}
|
||||||
|
\newcommand{\tend}[2]{\underset{#1 \to #2}{\longrightarrow}}
|
||||||
|
\newcommand{\brkt}[1]{\left\langle {#1} \right\rangle}
|
||||||
|
\newcommand{\scal}[2]{\left\langle {#1} \middle, {#2} \right\rangle}
|
||||||
|
\newcommand{\bigslant}[2]{{\raisebox{.2em}{$#1$}\left/\raisebox{.2em}{$#2$}\right.}}
|
||||||
|
\newcommand{\op}{_{\mathrm{op}}}
|
||||||
|
\newcommand{\md}{\mathrm{d}}
|
||||||
|
\newcommand{\fL}{\mathfrak{L}}
|
||||||
|
\newcommand{\id}{\mathrm{id}}
|
||||||
|
|
||||||
|
%\renewcommand{\thesection}{\Roman{section} --}
|
||||||
|
%\renewcommand{\thesubsection}{\arabic{subsection} --}
|
||||||
|
%\renewcommand{\thesubsubsection}{\arabic{subsection}.\arabic{subsubsection} --}
|
||||||
|
|
||||||
|
\setlength\parindent{0pt}
|