Sri Lankan English Newspaper Corpus – 2015
Description
The Sri Lankan English Newspaper corpus – 2015 is a web-derived newspaper corpus of 2 million words compiled by Hediwaththege Chathuri Keshala in 2015. I compiled it for personal use in a corpus-based research project that I conducted as part of my thesis for the Master of Arts in English Studies, Department of English, University of Colombo, Sri Lanka. The study was conducted after obtaining ethical clearance from the Ethics Review Committee for Social Sciences and Humanities (ERCSSH) in the Faculty of Arts, University of Colombo.
The study aimed to identify the occurrence of grammatical variations that are labelled as characteristic of Sri Lankan English (SLE) in two corpora belonging to two periods of time. The earlier period was represented by the Sri Lankan component of the SAVE corpus, which includes texts extracted from two Sri Lankan newspapers: the Daily News, published between 2001 and 2005, and the Daily Mirror, published between 2002 and 2007. The self-compiled Sri Lankan English Newspaper corpus – 2015 was used as a monitor corpus.
Since comparisons had to be made between the two corpora, the Sri Lankan English Newspaper corpus – 2015 was designed to be as similar as possible in composition to the Sri Lankan sub-corpus of the SAVE. It therefore includes data collected from the online versions of the Daily News and the Daily Mirror published in 2015. All genres of writing in the online versions of the newspapers have been included, except advertisements and the “Opinion” column of the Daily Mirror. Further, foreign news reports, such as those of the Associated Press and Reuters, have been excluded as far as possible because they are not considered representative of SLE. The corpus includes 1 million words from each newspaper, totalling 2 million words.
Composition of the Corpus
Newspaper | Source | Time span | Number of words (approximate)
Daily News | http://www.dailynews.lk | April - June 2015 | 1,000,000
Daily Mirror | http://www.dailymirror.lk | May - July 2015 | 1,000,000
Files
Steps to reproduce
The Sri Lankan English Newspaper corpus – 2015 was designed to be as similar as possible in composition to the Sri Lankan sub-corpus of the SAVE. It includes texts extracted from the online versions of two Sri Lankan newspapers, the Daily News and the Daily Mirror, published between April and July 2015. A total of 87 newspaper issues published within this period were randomly selected for the sample. 1 million words from each newspaper are included, totalling 2 million words.
The files are plain text files that can be imported into any spreadsheet program and analysed with corpus analysis tools such as AntConc. The corpus is divided into two sub-corpora based on the newspaper from which the texts were extracted: Daily News (DN) and Daily Mirror (DM). The text files within each sub-corpus are categorized according to the date on which they were collected.
All genres of writing in the online versions of the newspapers have been included, except advertisements and the “Opinion” column of the Daily Mirror. Further, foreign news reports, such as those of AP and Reuters, have been excluded as far as possible because they cannot be considered representative of Sri Lankan English (SLE). The corpus has not yet been annotated.
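As a rough illustration of how the two sub-corpora could be loaded and compared outside a tool such as AntConc, the following Python sketch counts word frequencies per sub-corpus. It rests on assumptions that are not stated in this description: that the text files sit in two folders named DN and DM under a common root folder, that they carry a .txt extension, and that they are UTF-8 encoded. The function names and the simple regular-expression tokenizer are hypothetical, so the totals it reports may differ from the approximate word counts given above.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical tokenizer: runs of letters, optionally joined by an
# apostrophe or hyphen (e.g. "cat's", "well-known"). Not the author's method.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:['’-][A-Za-z]+)*")


def tokenize(text):
    """Return the lower-cased word tokens found in a string."""
    return TOKEN_RE.findall(text.lower())


def subcorpus_counts(root, subcorpora=("DN", "DM")):
    """Build one word-frequency Counter per sub-corpus folder.

    Assumes a layout of <root>/DN/*.txt and <root>/DM/*.txt; the folder
    names and the .txt extension are assumptions for illustration.
    """
    counts = {}
    for name in subcorpora:
        freq = Counter()
        for path in sorted(Path(root, name).glob("*.txt")):
            text = path.read_text(encoding="utf-8", errors="replace")
            freq.update(tokenize(text))
        counts[name] = freq
    return counts


if __name__ == "__main__":
    # "corpus" is a placeholder for wherever the files were unpacked.
    for name, freq in subcorpus_counts("corpus").items():
        print(name, sum(freq.values()), freq.most_common(5))
```

Keeping one Counter per sub-corpus makes it straightforward to compare the frequency of a given form in the Daily News and Daily Mirror data, or to sum the two for corpus-wide figures.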