Securities law enforcement in India
This project comprises two data-sets on the enforcement of securities laws in India. The first data-set contains metadata for 8032 enforcement orders passed by the Securities and Exchange Board of India from 2011 to 2020, along with web links to the text of each order. The second data-set comprises certain fields derived from the text of 818 of these orders.
Steps to reproduce
The enforcement orders in our data are published as PDF files on the website of the Securities and Exchange Board of India. We converted the PDF files to text using two open-source tools, PDF-Miner and Textract, along with Google’s optical character recognition engine, Tesseract 350. To analyse the orders efficiently, we developed a rule-based pattern-matching algorithm using the regular expression implementation in Python 3. For the micro-analysis, we manually read each order to arrive at the derived fields.
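As an illustration of what rule-based pattern matching over the converted order text might look like, here is a minimal sketch using Python 3's re module. The two patterns, the function name extract_fields and the sample sentence are all hypothetical and chosen only for demonstration; they are not the rules used to build these data-sets, and real orders would likely need many more patterns and manual checks.

```python
import re

# Hypothetical patterns for illustration only; not the rules used for the data-sets.
# Matches phrases such as "penalty of Rs. 5,00,000" and captures the amount.
PENALTY_RE = re.compile(r"penalty\s+of\s+(?:Rs\.?|INR)\s*(\d[\d,]*)", re.IGNORECASE)
# Matches phrases such as "Section 15HA" and captures the section number.
SECTION_RE = re.compile(r"\bsection\s+(\d+[A-Z]*)", re.IGNORECASE)


def extract_fields(order_text):
    """Return penalty amounts (as integers) and cited section numbers found in the text."""
    penalties = [int(m.replace(",", "")) for m in PENALTY_RE.findall(order_text)]
    sections = sorted({s.upper() for s in SECTION_RE.findall(order_text)})
    return {"penalties": penalties, "sections": sections}


# Made-up sample sentence, not taken from an actual order.
sample = (
    "In exercise of powers under Section 15HA of the SEBI Act, "
    "I hereby impose a penalty of Rs. 5,00,000 on the Noticee."
)
result = extract_fields(sample)
print(result)
```

Compiling the patterns once and keeping each rule as a named constant makes it easier to review and extend the rule set as new phrasings turn up in the orders.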