Contributors:JQ, Rahul, King Chung (Johnny) Ho, Dawn Childress, YellowNoise, Peter Broadwell
Thanks for using our package! This is a command-line interface for detecting annotations in IIIF-hosted printed books. Give the script a list of IIIF manifest URLs or paths (hosted elsewhere or saved locally) and it will generate resultsManifest.json, a IIIF-compliant manifest file that can be displayed in IIIF viewers such as Mirador. An HTML gallery and a plain text file are also available as outputs.
Support for smaller-than-max inference images
Sometimes IIIF image servers limit the dimensions of the largest "full" image they make available (so people can't just download the full-res version); the code to highlight the detected handwriting regions via IIIF annotations now handles these situations.
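As a rough illustration of the scaling this involves, the sketch below maps a bounding box detected on a downsized image back to the coordinates of the full-size canvas. The function name, the (x, y, w, h) box layout, and the sample dimensions are assumptions made for this example; this is not Omniscribe's actual code.

```python
def scale_box_to_canvas(box, inference_size, canvas_size):
    """Map an (x, y, w, h) box from inference-image pixels to canvas pixels."""
    x, y, w, h = box
    inf_w, inf_h = inference_size
    canvas_w, canvas_h = canvas_size
    # Horizontal and vertical scale factors between the two coordinate systems.
    sx = canvas_w / inf_w
    sy = canvas_h / inf_h
    return (round(x * sx), round(y * sy), round(w * sx), round(h * sy))


# Hypothetical case: the server caps "full" images at 1000x1500 pixels,
# while the canvas is declared as 4000x6000.
box = scale_box_to_canvas((100, 200, 50, 80), (1000, 1500), (4000, 6000))

# IIIF annotations commonly target a region with an xywh media fragment.
fragment = "#xywh={},{},{},{}".format(*box)
```

Scaling each axis separately keeps the overlay aligned even if the server does not preserve the exact aspect ratio when it rounds the reduced dimensions.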
There's a new command-line option, --max_pages=N, which can be used to avoid processing all of the images/pages on a given manifest, if for example the manifest is really long and you only want to detect handwriting on the first N pages.
The generated IIIF annotations have been streamlined considerably, thanks to suggestions from @glenrobson.
Add option to generate IIIF annotation lists
If the user specifies --annotate when running inferencer.py with the --manifest flag, IIIF annotation list files are created in the annotations/ folder and referenced in the resulting manifest file. IIIF-compatible viewers like Mirador can visualize the detected handwriting/notes on the input images when they load the generated manifest.
There's also an optional --iiif_root argument to specify the web address where the manifest of the detected annotations will be posted, with the annotations/ folder for the optional IIIF annotation lists within it.
The detected handwriting/notes are highlighted in the IIIF annotation overlays when viewed in Mirador via a dashed rectangle bounding box with a dashed mask path within it. Mousing over the detected annotations displays a tag and the confidence level.
Requires Python 3.6.x
Download the Source Code package, unzip it, and save the Omniscribe-1.0.2 folder to your local machine or server.
Download the model.h5 and save to the Omniscribe-1.0.2 folder.
Using the command line, navigate to the Omniscribe-1.0.2 folder.
Install dependencies by running the command pip install -r requirements.txt.
NOTE: We recommend setting up a virtual environment to install Omniscribe. For more information on setting up a virtual environment, please refer to https://packaging.python.org/guides/installing-using-pip-and-virtualenv/ up to the Leaving the virtualenv section of this documentation.
Run inferencer.py with the manifest URLs where you wish to detect annotations:
python3 inferencer.py [ export ] [ confidence ] [ manifest-url/path ]
--manifest - exports resultsManifest.json, a IIIF manifest listing the images with detected annotations.
--text - exports resultsURIs.txt, a text file that contains URLs of images with detected annotations.
--html - exports resultsImages.html, a simple HTML gallery of images with detected annotations.
--max_pages=N - limits the number of images/pages processed in a given manifest, if for example the manifest is really long and you only want to detect handwriting on the first N pages.
--annotate (when running inferencer.py with the --manifest flag) - IIIF annotation list files are created in the annotations/ folder and referenced in the resulting manifest file.
--iiif_root - specify the web address where the manifest of the detected annotations will be posted, with the annotations/ folder for the optional IIIF annotation lists within it.
The default export format is resultsManifest.json if no export options are specified.
--confidence=VALUE - sets the detection threshold; VALUE can be any number between 0 and 1 (inclusive).
E.g. --confidence=0.91 sets the threshold to 0.91. This means that any region that receives a score of 0.91 or higher from our model will be inferred as an annotation.
The default confidence level is 0.95 if no confidence value is specified.
Gauging a "Good" Confidence Value
We found that marginalia are often detected with a confidence value of 0.90 or higher, but detecting interlinear annotations requires lower confidence values, somewhere between 0.70 and 0.85. This means that setting --confidence=0.90 will detect marginalia, but will be less effective at detecting interlinear annotations, since these often receive scores below the threshold of 0.90. Setting --confidence=0.70 will detect both interlinear annotations and marginalia (as both types of annotations will receive scores equal to or higher than the threshold); however, the lower threshold will likely result in more false positives.
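The effect of the threshold can be sketched with a few lines of Python. The detection records, their labels, and their scores below are invented for illustration, and filter_detections is a hypothetical helper rather than a function from inferencer.py.

```python
def filter_detections(detections, confidence=0.95):
    """Keep only detections whose score is at or above the threshold."""
    return [d for d in detections if d["score"] >= confidence]


# Invented scores, chosen to mirror the ranges described above.
detections = [
    {"label": "marginalia", "score": 0.97},
    {"label": "marginalia", "score": 0.91},
    {"label": "interlinear", "score": 0.78},
    {"label": "false positive", "score": 0.72},
]

strict = filter_detections(detections, confidence=0.90)
loose = filter_detections(detections, confidence=0.70)

print(len(strict), "regions kept at 0.90")
print(len(loose), "regions kept at 0.70")
```

With these made-up scores, the 0.90 threshold keeps only the two marginalia, while 0.70 also keeps the interlinear note along with the spurious region.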
Operating on Multiple Manifests
The manifests can be hosted or local IIIF manifest files. You can input multiple manifest URLs or paths, and the application will crawl through all the images from each manifest such that the resulting export is a single conglomerate of all the sub-results from every manifest.
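A minimal sketch of how several manifests could be combined into one list of images is shown below. It assumes manifests that follow the IIIF Presentation API 2.x layout (sequences, canvases, images); the helper names are hypothetical and the code is not taken from inferencer.py.

```python
import json
from urllib.request import urlopen


def load_manifest(source):
    """Load a IIIF manifest from a URL or from a local file path."""
    if source.startswith(("http://", "https://")):
        with urlopen(source) as response:
            return json.loads(response.read().decode("utf-8"))
    with open(source, encoding="utf-8") as handle:
        return json.load(handle)


def image_urls(manifest, max_pages=None):
    """List image resource URLs from a Presentation API 2.x manifest."""
    urls = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                urls.append(image["resource"]["@id"])
    # For simplicity, each image is treated as one page when limiting.
    return urls if max_pages is None else urls[:max_pages]


def collect_images(sources, max_pages=None):
    """Concatenate the image URLs of every manifest, in input order."""
    combined = []
    for source in sources:
        combined.extend(image_urls(load_manifest(source), max_pages))
    return combined
```

Calling collect_images(["manifest1.json", "manifest2.json"]) would return one flat list, which is the sense in which the export is a single combined result.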
Example Command Lines
python3 inferencer.py --manifest --confidence=0.93 manifest1.json
python3 inferencer.py --html --confidence=0.90 manifest1.json
python3 inferencer.py --text --confidence=0.94 manifest1.json
python3 inferencer.py --manifest --html --confidence=0.92 manifest1.json
python3 inferencer.py --text --manifest --confidence=0.97 manifest1.json
python3 inferencer.py --html --text --confidence=0.93 manifest1.json
python3 inferencer.py --html --manifest --text --confidence=0.91 manifest1.json
python3 inferencer.py --confidence=0.95 manifest1.json
python3 inferencer.py --text manifest1.json
python3 inferencer.py manifest1.json
python3 inferencer.py --manifest --text --html --confidence=0.96 manifest1.json manifest2.json
python3 inferencer.py --manifest --text manifest1.json manifest2.json manifest3.json manifest4.json
Note that omitting the confidence option will be interpreted as setting the confidence score to 0.95. Additionally, omitting all export options will be interpreted as setting the export to a manifest file.
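The defaults described above can be mimicked with a small argparse sketch. It reuses the documented flag names, but the parser itself is an assumption made for illustration and may differ from how inferencer.py actually reads its arguments.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="inferencer.py")
    parser.add_argument("--manifest", action="store_true")
    parser.add_argument("--text", action="store_true")
    parser.add_argument("--html", action="store_true")
    parser.add_argument("--confidence", type=float, default=0.95)
    parser.add_argument("--max_pages", type=int, default=None)
    parser.add_argument("manifests", nargs="+")
    return parser


def parse(argv):
    """Parse arguments and apply the documented defaults."""
    args = build_parser().parse_args(argv)
    if not 0.0 <= args.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1 (inclusive)")
    # With no export flag given, fall back to the manifest export.
    if not (args.manifest or args.text or args.html):
        args.manifest = True
    return args


args = parse(["--text", "manifest1.json"])
print(args.confidence, args.text, args.manifest)
```

Here --text alone leaves the confidence at 0.95 and does not switch on the manifest export, whereas parse(["manifest1.json"]) would enable it.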
Collecting the Results
After inferencer.py is done processing all the images, you will see the message Finished detecting annotations.
All the export files will be saved in the Omniscribe-1.0.2 folder.
Command Line will typically display this as it processes through all the images.
TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.
Displaying resultsManifest.json through Mirador, an image viewing client that supports IIIF.
Contributors:Danny van Bruggen, Federico Tomassetti, MysterAitch, Malte Langkabel, Nicholas Smith, Artur Bosch, Malte Skoruppa, Cruz Maximilien, ThLeu, Panayiotis, Sebastian Kirsch (@skirsch79), Simon, Johann Beleites, Wim Tibackx, André Rouél, jean pierre L, Daan Schipper, Mathiponds, Why you want to know, Ryan Beckett, ptitjes, kotari4u, Marvin Wyrich, Ricardo Morais, bresai, Maarten Coene, Ty, Romain Lebouc, Implex1v, Bernhard Haumacher
TODO: Describe any bug fixes
TODO: Describe any new features or enhancements
Contributors:Florescu Dorian, England Matthew
This toolbox supports the results in the following publication:
D. Florescu and M. England. A machine learning based software pipeline to pick the variable ordering for algorithms with polynomial inputs.
The authors are supported by EPSRC Project EP/R019622/1: Embedding Machine Learning within Quantifier Elimination Procedures.
The main script is ML_test_rand/pipeline1.py. More details can be found as comments in the script.
The sotd heuristic is implemented in the file data_gen_sotd_rand_test.mw. The data is already generated in the repository.
The dataset of polynomials can be found in folders entitled poly_rand_dataset (for training) and poly_rand_dataset_test (for testing).
The CAD data is generated by running generate_CAD_data.py. The data is already generated in the repository.
The CAD routine was run in Maple 2018, with an updated version of the RegularChains Library downloaded in February 2019 from http://www.regularchains.org. The library file is also available in this repository (RegularChains_Updated.mla).
This updated library contains bug fixes and additional functionality. The training and evaluation of the machine learning models was done using the scikit-learn package v0.20.2 for Python 2.7.
Some data files generated by the pipeline are included in this repository for consistency and to save time. However, users can regenerate them if they wish:
- the predictions with the sotd heuristic (II(d) in the supported paper)
- the ML hyperparameters, resulting from 5-fold cross-validation (I(d)i in the supported paper)
- the files containing CAD runtimes (in the folders comp_times_rand_dataset and comp_times_rand_dataset_test, corresponding to I(a) and II(e) in the supported paper)
Contributors:Dietrich, Johannes W.
CyberUnits is a cross-platform class library that supports modelling for biomedical cybernetics and systems biology with Object Pascal.
CyberUnits' Bricks collection is a set of Pascal units for rapid programming of high-performance computer simulations in life sciences. It also delivers a class library that facilitates the generation of visual block diagrams in software.
Contributors:Robert Haase, Deborah Schmidt
This is part of clij release 1.5.6
Contributors:Bruno Nicenboim, stonekate
Bug fixes and very minor issues:
More unit testing.
read_edf() wasn't reading events from the status channel
fixed some inconsistencies with .reference argument
Contributors:Lisa Cerrato, Bridget Almas, TDBuck, srdee, ahanhardt, Thibault Clérice, Alison Babeu, Scott Fleischman, gregorycrane, Matthew Munson, Aurélien Berra, KATEBHN, Chiara Palladino, Adiel Mittmann, Joel Kalvesmaki, Eric Sowell
XML Canonical resources for Greek Literature
Contributors:Jose Dias Neto, Guilherme Castelão
McRadar is an open-source Python package for simulating multi-frequency radar variables from the output of McSnow. The package is built on top of PyTmatrix.
McRadar was initially conceived during the Ice Microphysics workshop (at Mount Zugspitze) to allow easy verification of the radar variables through the evolution of the ice microphysics. This package inherits several ideas that I developed when I was trying to use PyTmatrix to reproduce observed bimodal spectra.
Contributors:Sean M. Law, Will Li, Bradley Dice, Brett Fattori, ronaldhorner, mexxexx, Bharat Raghunathan, 0xflotus, Dave, Uwe L. Korn
Version 1.3.0 Release
Contributors:Paul Jennings, Martin Hangaard Hansen, Jose A. Garrido Torres, Jacob Boes, Philomena Schlexer Lamoureux, Raul Flores, Andrew, Ziyun Wang, graph-theory-NatCatal, Osman Mamun, Max Hoffmann, Igor Kowalec, Jiang Li
Added module for site featurization and GA feature selection.
Fixed ML-NEB compatibility issue with FHI-AIMS ase calculator.
Compatibility updated for ASE 3.19.0.
Compatibility updated for Pandas 0.24.0.
Compatibility updated for Scikit-learn 0.22.0.
Dropped support for Python 2.
Dropped support for Python 3.5.
Added testing for Python 3.7 and 3.8.