SM01: Linguistic Resources - the Semantic Lexicon of Serbian Sheet Metal Manufacturing Web
Description
Research project SM01: "Parallel Semantic Crawlers for Knowledge Extraction from manufacturing business multilingual web"

Semantic resources utilized by the WELES, WEM and proposed SM crawlers (SM-LTSD):
* The Semantic Lexicon built from the Sheet Metal Manufacturing Industry Corpus (RDF/XML, OWL/RDF and N-Triples)
* Resources used for the lexicon construction: the Corpus word list and conceptual hierarchy spreadsheet
* Reports on the lexicon
* Encoding Twins definitions
* Lexicon negatives (terms not supported but found on the web sites of the domain)
* Sample collections of different development and debug reports

The Semantic Lexicon:
Total Concept entities: 4772
* Without concept relationship: 44 (0.94 %)
* Without any relationship: 44 (0.94 %)
Total Lemma entities: 6698
* Without concept relationship: 5126 (76.55 %)
* Without any relationship: 4077 (60.88 %)
Steps to reproduce
To use our Semantic Lexicon in your projects, you may import one of the exported versions:
* OWL/XML file
* RDF/XML file
* N-Triples file

Keep in mind that the lexicon was built for a domain-specific application: check the list of lexicon negatives and the other reports describing it. To get a feel for the semantic network of the lexicon, see the semantic term expansion graphs, illustrated in plain text (ExpansionSamples.zip):

proizvod--> proizvodima
        |-> proizvoda
        |-> proizvodi
        |-> proizvode
        |-> dobar
        |-> dobro
        |-> 300779261
        |-> 301591332
        |-> 105091238
        |-> 301133600
        |-> 114924528
        |-> 301127033
        |-> 301813766
        |-> 111350662
        |-> 105792344
        |-> 300525544

(The numeric entries are WordNet synsets, represented by Concept entities in the lexicon.)

teoretski--> teoretskome
         |-> teoretskih
         |-> teoretskomu
         |-> teoretskima
         |-> teoretsku
         |-> teoretska
         |-> teoretskim
         |-> teoretske
         |-> teoretskog
         |-> teoretsko
         |-> teoretskoj
         |-> teoretskoga
         |-> teoretskom
         |-> 300865605
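The plain-text expansion graphs can also be read programmatically. The following is a minimal sketch in Python (standard library only), not part of the published resources. It assumes that each graph starts with `term-->` and continues with `|->` edges, as in the samples shown here, and that every bare 9-digit target is a WordNet synset ID. Reading the leading digit as part of speech (1 noun, 2 verb, 3 adjective, 4 adverb) follows the convention of the WordNet Prolog files and is likewise an assumption about this lexicon. The names `parse_expansions` and `synset_pos` are hypothetical, and the sample is an abbreviated excerpt of the `teoretski` graph.

```python
import re

# One match per edge: either "head--> first_target" or "|-> next_target".
EDGE = re.compile(r"(?:(?P<head>\S+?)-->|\|->)\s*(?P<target>\S+)")
# Assumption: a bare 9-digit token is a WordNet synset ID.
SYNSET_ID = re.compile(r"\d{9}")
# Assumption: the leading digit encodes part of speech (WordNet Prolog convention).
POS_BY_DIGIT = {"1": "noun", "2": "verb", "3": "adjective", "4": "adverb"}


def parse_expansions(text):
    """Return {head: {"forms": [...], "synsets": [...]}} from expansion-graph text."""
    graph = {}
    current = None
    for match in EDGE.finditer(text):
        head = match.group("head")
        if head is not None:
            # A new "head-->" starts (or resumes) the entry for that term.
            current = graph.setdefault(head, {"forms": [], "synsets": []})
        if current is None:
            continue  # stray "|->" before any head term
        target = match.group("target")
        kind = "synsets" if SYNSET_ID.fullmatch(target) else "forms"
        current[kind].append(target)
    return graph


def synset_pos(synset_id):
    """Guess the part of speech from the first digit of a synset ID."""
    return POS_BY_DIGIT.get(synset_id[0], "unknown")


# Abbreviated excerpt of the "teoretski" graph from ExpansionSamples.zip.
sample = """
teoretski--> teoretskome
         |-> teoretskih
         |-> teoretsku
         |-> 300865605
"""
graph = parse_expansions(sample)
entry = graph["teoretski"]
print(entry["forms"])
print([(s, synset_pos(s)) for s in entry["synsets"]])
```

Because the parser matches edges rather than lines, it accepts the graphs whether they are laid out one edge per line or run together on a single line, and it skips any free text between graphs.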