NGS Patstat data

Published: 07-02-2020 | Version 1 | DOI: 10.17632/f45st2xmkj.1
Riccardo Priore


The datasets in this repository contain the results of patent searches performed with Patstat online. The analysis, carried out in December 2019 on the Autumn 2019 edition of Patstat, is fairly complex and has therefore been organised around the following five objectives:
1) The files named ‘C1 – C4’, included in the folder ‘Coverage NGS’, are aimed at determining the number of patent families corresponding to patent applications that concern Next Generation Sequencing (NGS) techniques or tools, or the more generic topic of genomic sequencing. A further aim is to rank the applicants specifically active in NGS. A PDF file lists the patent documents in order to show that the retrieved results do in fact deal with NGS.
2) The files named ‘N1 – N2’ (‘Normalization NGS’) make it possible to determine the number of patent applications specifically concerning NGS that were filed with the different national patent authorities. A geographical area that is much larger and has many more residents than another can reasonably be expected to produce more patent applications; normalizing the data can instead reveal whether a territory shows a preferential specialization in, or commitment to, a specific technical field.
3) The files named ‘Q1 – Q5’ are the results of searches aimed at determining the average quality of the technical content of pooled patent documents. As detailed in the ‘Quality NGS’ folder, qualitative considerations can benefit from differentiating competitors on the basis of parameters such as the residual validity of the patents, the number of fee payments, the size of the patent family, or the number of forward citations of a patent document.
4) The files named ‘S1 – S4’ are aimed at elucidating the collaborations that occurred within a specific time frame between an applicant and a patent attorney. This search phase can bring out cases in which two applicants that worked with the very same patent attorney may be regarded as competitors or as collaborators (co-assignees, given the possibility of co-filing a patent application).
5) The files named ‘E1 – E8’, included in the ‘NGS Evolution’ folder, contain data on four nucleic acid sequencing techniques currently acknowledged as state of the art. These files are aimed at evaluating and comparing the trends of each technique, estimated from the number of patent families, and at identifying the players according to their level of commitment to each of the four technologies, as detailed in the ‘NGS Evolution’ folder.
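
The normalization idea behind objective 2 can be illustrated with a minimal Python sketch. It assumes that normalization means dividing the number of filed applications by the number of residents; the repository does not state the exact normalization base used, so this choice, the function name `applications_per_million` and the figures for the placeholder authorities "AA" and "BB" are all hypothetical and are not taken from the N1 or N2 files.

```python
def applications_per_million(applications, population):
    """Return patent applications per million residents for each authority.

    Both arguments are dicts keyed by an authority code. Dividing by the
    resident population is an assumed normalization base, chosen here
    only for illustration.
    """
    rates = {}
    for authority, count in applications.items():
        residents = population[authority]
        rates[authority] = count / residents * 1_000_000
    return rates


# Hypothetical figures, for illustration only (not from the N1/N2 files).
applications = {"AA": 500, "BB": 120}
population = {"AA": 300_000_000, "BB": 10_000_000}

rates = applications_per_million(applications, population)

# Rank authorities by normalized filing intensity, highest first.
ranked = sorted(rates, key=rates.get, reverse=True)
print(ranked)
```

In this made-up case "AA" receives more applications in absolute terms, but "BB" ranks first once the counts are related to the number of residents, which is the kind of specialization effect the normalization is meant to expose.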


Steps to reproduce

The list of SQL queries needed to retrieve the records included in the Excel files is also provided in a Data in Brief manuscript. The queries can be run in Patstat online as they are, or with slight modifications depending on the attributes of particular interest. Basic information on the syntax of the SQL language and on the features of the patent data downloadable through Patstat is provided at the following URL: The SQL scripts may not cover some of the Excel files provided in this repository, which were generated by consulting other patent databases, in particular Derwent Innovation, available from Clarivate Analytics, or Orbit Intelligence, available from Questel. The queries corresponding to those data have been included in a file (NUCLEIC ACID SEQUENCING TECHNOLOGY SCENARIO BASED ON A COMPARATIVE ANALYSIS OF PATENT DATA) submitted to World Patent Information for publication of the analysis of those data together with the data retrieved through Patstat online.
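
As a rough illustration of the kind of query involved, the following Python sketch counts distinct patent families per filing authority on a small in-memory SQLite table. It is not one of the author's queries, which are given in the manuscripts mentioned above. The table name `tls201_appln` and the columns `appln_auth`, `appln_filing_year` and `docdb_family_id` are assumed to follow the PATSTAT naming, the rows are invented, and Patstat online may use a different SQL dialect from SQLite, so the query may need adapting before it is run there.

```python
import sqlite3

# Small mock of a PATSTAT-like application table (assumed schema, invented rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls201_appln ("
    "appln_id INTEGER PRIMARY KEY, appln_auth TEXT, "
    "appln_filing_year INTEGER, docdb_family_id INTEGER)"
)
rows = [
    (1, "US", 2015, 100),
    (2, "EP", 2016, 100),  # same family as application 1, filed elsewhere
    (3, "US", 2016, 101),
    (4, "US", 2017, 101),  # same family and authority as application 3
    (5, "CN", 2017, 102),
]
conn.executemany("INSERT INTO tls201_appln VALUES (?, ?, ?, ?)", rows)

# Count distinct families per filing authority within a filing-year window.
QUERY = """
SELECT appln_auth, COUNT(DISTINCT docdb_family_id) AS families
FROM tls201_appln
WHERE appln_filing_year BETWEEN 2015 AND 2017
GROUP BY appln_auth
ORDER BY families DESC, appln_auth
"""
families_by_authority = conn.execute(QUERY).fetchall()
conn.close()

print(families_by_authority)
```

Counting `DISTINCT docdb_family_id` rather than rows avoids counting the same invention twice when several applications from one family are filed with the same authority.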