Coreferent Clusters (dataset and a pre-trained model)
Description
A MySQL database (a single table `word`) containing an annotated set of Ukrainian texts with coreferent groups and additional per-token information (lemma, grammatical case, etc.). The reference below points to a Docker repository that provides an HTTP web service for detecting coreferent pairs in Ukrainian text.
Steps to reproduce
To use the dataset:
1. Download the MySQL dump file to your local machine.
2. Import the database using either a graphical tool (e.g. phpMyAdmin) or the command-line client (mysql ...).
3. Table 'word' contains all tokens. Group the tokens by the field 'DocumentID' to obtain the set of documents.
4. For sentence-level analysis, split each document into sentences using the field 'RawTagString': the value './SENT_END' marks the end of a sentence.
5. Group the words into mentions using the 'EntityID' attribute.
6. Group the mentions into coreferent clusters for further training.

To use the pre-trained model:
1. Install Docker on your local machine.
2. Pull the image (with the same tag that is started in the next step): docker pull artemkramov/ukrainian-pack-coreference-coherence:2.0
3. Start the web service: sudo docker run -p 5000:5000 artemkramov/ukrainian-pack-coreference-coherence:2.0
4. Send HTTP requests with the JSON body {"text": "<text>"} to http://<local-address>:5000/api/get_coreferent_clusters
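The dataset steps (documents, sentences, mentions, clusters) can be sketched in Python as below. This is a minimal sketch, not the authors' loader, and it rests on several assumptions: rows of table 'word' have already been fetched as dictionaries keyed by column name (e.g. with `SELECT * FROM word` through any MySQL client) and arrive in document order, since the ordering column is not described; an empty 'EntityID' (NULL, empty string or 0) means the token belongs to no mention; and the column used to group mentions into clusters is not named in the description, so the sample uses a HYPOTHETICAL column `GroupID` passed in as `cluster_key`. The sample rows are invented.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]  # one row of table 'word', keyed by column name

SENT_END = "./SENT_END"  # RawTagString value marking the end of a sentence
NO_ENTITY = (None, "", 0)  # assumption: EntityID values meaning "not part of a mention"


def group_documents(rows: List[Row]) -> Dict[Any, List[Row]]:
    """Step 3: group tokens by DocumentID, preserving the incoming token order."""
    docs: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        docs[row["DocumentID"]].append(row)
    return dict(docs)


def split_sentences(tokens: List[Row]) -> List[List[Row]]:
    """Step 4: close a sentence after each token whose RawTagString is './SENT_END'."""
    sentences: List[List[Row]] = []
    current: List[Row] = []
    for tok in tokens:
        current.append(tok)
        if tok["RawTagString"] == SENT_END:
            sentences.append(current)
            current = []
    if current:  # keep trailing tokens that have no explicit sentence end
        sentences.append(current)
    return sentences


def group_mentions(tokens: List[Row]) -> Dict[Any, List[Row]]:
    """Step 5: collect the tokens of each mention under its EntityID."""
    mentions: Dict[Any, List[Row]] = {}
    for tok in tokens:
        entity_id = tok.get("EntityID")
        if entity_id in NO_ENTITY:
            continue
        mentions.setdefault(entity_id, []).append(tok)
    return mentions


def group_clusters(mentions: Dict[Any, List[Row]], cluster_key: str) -> Dict[Any, List[List[Row]]]:
    """Step 6: group mentions by a cluster column (the column name is an assumption)."""
    clusters: Dict[Any, List[List[Row]]] = defaultdict(list)
    for tokens in mentions.values():
        clusters[tokens[0][cluster_key]].append(tokens)
    return dict(clusters)


# Invented sample rows; "GroupID" is a HYPOTHETICAL cluster column.
sample = [
    {"DocumentID": 1, "RawTagString": "Іван/NOUN", "EntityID": 10, "GroupID": 100},
    {"DocumentID": 1, "RawTagString": "прийшов/VERB", "EntityID": None, "GroupID": None},
    {"DocumentID": 1, "RawTagString": "./SENT_END", "EntityID": None, "GroupID": None},
    {"DocumentID": 1, "RawTagString": "Він/PRON", "EntityID": 11, "GroupID": 100},
    {"DocumentID": 1, "RawTagString": "./SENT_END", "EntityID": None, "GroupID": None},
    {"DocumentID": 2, "RawTagString": "Київ/NOUN", "EntityID": 20, "GroupID": 200},
]

for doc_id, doc_tokens in group_documents(sample).items():
    doc_mentions = group_mentions(doc_tokens)
    print(doc_id, len(split_sentences(doc_tokens)), "sentence(s),",
          len(doc_mentions), "mention(s),",
          len(group_clusters(doc_mentions, "GroupID")), "cluster(s)")
```

Grouping mentions per document, as in the loop, avoids merging tokens from different documents if an 'EntityID' value is reused across documents; if the IDs turn out to be globally unique, the grouping can also be applied to the whole table.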
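The last step of the pre-trained model procedure, querying the running container, can be sketched with Python's standard library. This is a sketch under assumptions: the container runs on the local machine with the port mapping `-p 5000:5000`, so `localhost` stands in for `<local-address>`; the endpoint accepts a POST request; and the structure of the returned JSON is not documented here, so the client simply returns the decoded response. The helper names `build_request` and `get_coreferent_clusters` are hypothetical.

```python
import json
import urllib.request
from typing import Any

# Assumption: the container is reachable locally through the -p 5000:5000 mapping.
DEFAULT_URL = "http://localhost:5000/api/get_coreferent_clusters"


def build_request(text: str, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build a request whose JSON body is {"text": <text>}."""
    # ensure_ascii=False keeps Ukrainian characters readable; the body is sent as UTF-8.
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",  # assumption: the endpoint expects POST
    )


def get_coreferent_clusters(text: str, url: str = DEFAULT_URL, timeout: float = 60.0) -> Any:
    """Send the text to the web service and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container started, `get_coreferent_clusters("Іван прийшов додому. Він втомився.")` sends one request and returns whatever JSON the service produces; separating `build_request` from the network call makes the request easy to inspect without a running server.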