Integrating data from multiple sources with the aim of identifying records that correspond to the same entity is required in many real-world applications, including healthcare, national security, and business. However, privacy and confidentiality concerns impede the sharing of personal identifying values to conduct linkage across different organizations. Privacy-preserving record linkage (PPRL) techniques have been developed to tackle this problem by performing clustering based on the similarity between encoded record values, such that each cluster contains (similar) records corresponding to a single entity. When employing PPRL on databases from multiple parties, one major challenge is the prohibitively large number of similarity comparisons required for clustering, especially when the number and size of databases are large. While several private blocking methods have been proposed to reduce the number of comparisons, they fall short of providing an efficient and effective solution for linking multiple large databases. Further, the performance of all of these methods depends heavily on the characteristics of the data. In this paper, we propose a novel private blocking method for efficiently linking multiple databases by exploiting the data characteristics in the form of probabilistic signatures, and introduce a local blocking evaluation step for validating blocking methods without knowing the ground truth. Experimental results show the efficacy of our method in comparison to several state-of-the-art methods.
Contributors: Marc Schulder, Yury Bakanouski
ATC-Anno is an annotation tool for the transcription and semantic annotation of air traffic control utterances.
It was developed at the Spoken Language Systems (LSV) group at Saarland University.
The latest version of the tool can always be found on the LSV GitHub account.
If you use the tool in your research, please cite the associated paper:
Marc Schulder, Johannah O'Mahony, Yury Bakanouski, Dietrich Klakow (2020). ATC-Anno: Semantic Annotation for Air Traffic Control with Assistive Auto-Annotation. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), Marseille, France.
Contributors: Bullen, Jay C
MATLAB codes used to model arsenic(III) remediation using a composite TiO2-Fe2O3 sorbent in batch and continuous-flow systems, using a modified form of the pseudo-second order (PSO) adsorption kinetic model.
This data supports the manuscript provisionally titled 'A kinetic adsorption model to inform the design of arsenic(III) treatment plants using photocatalyst-sorbent materials'.
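The MATLAB codes themselves are not reproduced here. As a stand-alone illustration of the underlying kinetics, the following Python sketch implements the classic (unmodified) pseudo-second-order model, dq/dt = k2·(qe − q)², via its well-known closed-form solution. The function name and parameter values are hypothetical; the modified PSO model used in the actual MATLAB codes is not shown.

```python
def pso_uptake(t, qe, k2):
    """Adsorbed amount q(t) under classic pseudo-second-order kinetics.

    Closed-form solution of dq/dt = k2 * (qe - q)**2 with q(0) = 0:
        q(t) = qe**2 * k2 * t / (1 + qe * k2 * t)

    t  -- time (e.g. minutes)
    qe -- equilibrium adsorption capacity (e.g. mg/g)
    k2 -- PSO rate constant (e.g. g/(mg*min))
    """
    return (qe ** 2 * k2 * t) / (1.0 + qe * k2 * t)


if __name__ == "__main__":
    qe, k2 = 10.0, 0.05  # hypothetical parameters, for illustration only
    for t in (0, 10, 100, 1000):
        # uptake rises monotonically and approaches qe at long times
        print(t, round(pso_uptake(t, qe, k2), 3))
```

Note that the linearised form t/q = 1/(k2·qe²) + t/qe is often used to fit qe and k2 from batch experiments.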
Contributors: Juniper L. Simonis
Tools for interacting with the publicly available California Delta Fish Salvage Database, including continuous deployment of data access, analysis, and presentation.
Contributors: Daniel Huppmann, Matthew Gidden, Zeb Nicholls, Nikolay Kushin, OFR-IIASA, Rlamboll, rossursino, lumbric, Philipp S. Sommer, Michael Pimmer, Jarmo Kikstra, Arfon Smith
Improved feature for unit conversion using the pint package and the IAMconsortium/units repository, providing out-of-the-box conversion of unit definitions commonly used in integrated assessment research and energy systems modelling; see this tutorial for more information
Increased support for operations on timeseries data with continuous-time resolution
New tutorial for working with various input data formats; take a look here
Rewrite and extension of the documentation pages for the API; read the new docs!
PR #341 changed the API of IamDataFrame.convert_unit() from a dictionary argument to explicit kwargs current, to, and factor (now optional; pint is used if no factor is specified).
PR #334 renamed the arguments of IamDataFrame.interpolate() and pyam.fill_series() to time; the value can still be an integer (i.e., a year).
With PR #337, initializing an IamDataFrame with n/a entries in columns other than value raises an error.
#354 Fixes formatting of API parameter docstrings
#352 Bugfix when using interpolate() on data with extra columns
#349 Fixes an issue with checking that time columns are equal when appending IamDataFrames
#348 Extend pages for API docs, clean up docstrings, and harmonize formatting
#347 Enable contexts and custom UnitRegistry with unit conversion
#341 Use pint and IIASA-ene-units repo for unit conversion
#339 Add tutorial for dataframe format io
#337 IamDataFrame to throw an error when initialized with n/a entries in columns other than value
#334 Enable interpolate to work on datetimes
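The signature change from PR #341 can be sketched without pyam itself. The toy convert_unit below is not pyam's implementation; it only illustrates the move from a dictionary argument to explicit current/to/factor kwargs, with a hard-coded factor table standing in for the pint-based lookup that pyam performs when factor is omitted. All names here are hypothetical.

```python
# Hypothetical factor registry standing in for pint / the IAMconsortium
# units repository; real pyam resolves units via pint when `factor` is
# not given.
_FACTORS = {("EJ/yr", "PJ/yr"): 1000.0}


def convert_unit(values, unit, current, to, factor=None):
    """Toy sketch of the kwargs-based signature introduced in PR #341.

    values  -- list of numeric data
    unit    -- unit the data is currently labelled with
    current -- unit to convert from (must match `unit`)
    to      -- unit to convert to
    factor  -- optional explicit conversion factor; looked up if omitted
    """
    if unit != current:
        raise ValueError(f"data is in {unit}, not {current}")
    if factor is None:
        factor = _FACTORS[(current, to)]  # pint lookup in real pyam
    return [v * factor for v in values], to


# old style (pre-#341, sketched): convert_unit(values, {"EJ/yr": ("PJ/yr", 1000)})
# new style: explicit kwargs, with `factor` optional
converted, unit = convert_unit([1.5, 2.0], "EJ/yr", current="EJ/yr", to="PJ/yr")
print(converted, unit)  # [1500.0, 2000.0] PJ/yr
```

Passing factor explicitly bypasses the registry lookup entirely, which mirrors how the optional kwarg is described in the release note above.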
This is the first release of code and data for 'Extinction rate of discovered and undiscovered plants in Singapore' by Kristensen et al. (2020, Conservation Biology).
Contributors: DavyCats, Ruben Vorderman, Peter van 't Hof, António Paulo
Move commonly used inputs to the top-level workflow inputs section in order to work better with Cromwell 48 and higher.
Add proper copyright headers to WDL files, so that the free software license is clear to end users who wish to adapt and modify them.
Added "rna" and "exome" inputs to strelka.
Added input overviews to docs.
Added miniwdl to linting.
Contributors: Boris Sekachev, Nikita Manovich, Andrey Zhavoronkov, Ben Hoff, Artyom Zankevich, DmitriySidnev, idriss, zliang7, Sebastian Yonekura, Aleksandr Melnikov, mfurkancoskun, kshramt, jrjbertram, aschernov, Toni Kunic, TOsmanov, Satoshi Oikawa, Santosh Thoduka, Rafael Kazuo Sato Simião, Naval Chand, Julian Guarin, JADG14, Happyzippy, EvgenyShashkin, Eric Jiang, Eduardo, Dustin Dorroh, DanVev, Codacy Badger, Ajay Ramesh
[1.0.0-alpha] - 2020-03-31
Data streaming using chunks (https://github.com/opencv/cvat/pull/1007)
New UI: showing file names in UI (https://github.com/opencv/cvat/pull/1311)
New UI: delete a point from context menu (https://github.com/opencv/cvat/pull/1292)
Git app cannot clone a repository (https://github.com/opencv/cvat/pull/1330)
New UI: preview position in task details (https://github.com/opencv/cvat/pull/1312)
AWS deployment (https://github.com/opencv/cvat/pull/1316)
The CMIP6 next generation (CMIP6ng) archive is an update to the raw CMIP6 archive as provided by the Earth System Grid Federation (ESGF). It introduces a range of additional checks for the processed variables and their main dimensions (time, longitude, latitude), as well as incremental optimizations in the file structure and in the consistency of the files from different institutions. It provides models in their native horizontal resolution and on a common 2.5°×2.5° longitude-latitude grid. Files are provided in monthly time resolution and as annual means calculated from the monthly means. In addition, selected variables are available in daily resolution. Here, the differences between the CMIP6ng and the raw CMIP6 archives are presented, the processing structure is detailed, and the list of checks is given.
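Computing annual means from monthly means is sensitive to whether months are weighted by their length; the text above does not specify which convention CMIP6ng uses, but day-weighting is the common choice for monthly climate data. The sketch below illustrates that approach using only the Python standard library; the function name is hypothetical.

```python
import calendar


def annual_mean(monthly, year):
    """Annual mean of 12 monthly values, weighted by month length in days.

    Whether the CMIP6ng pipeline uses day-weighted or unweighted means is
    an assumption here; day-weighting accounts for the unequal lengths of
    months (28/29 to 31 days), including leap years.
    """
    if len(monthly) != 12:
        raise ValueError("expected 12 monthly values")
    # days in each month of the given year (handles leap years)
    days = [calendar.monthrange(year, m)[1] for m in range(1, 13)]
    return sum(v * d for v, d in zip(monthly, days)) / sum(days)


# a constant monthly series must yield the same constant as annual mean
print(annual_mean([15.0] * 12, 2000))  # 15.0
```

For a constant series the weighting cancels, so the result is identical to the unweighted mean; the two conventions differ only when the monthly values vary over the year.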