Quality control of the extracted RNA was performed using the Agilent Bioanalyzer; samples were processed using the Illumina™ TotalPrep™-96 RNA Amplification Kit (ThermoFisher 4393543), hybridized to Illumina HT12v4 microarrays (catalog number: 4393543), and scanned on an Illumina HiScan scanner [22] [23]. For each of the 22 individuals, three biological replicates were profiled, with each sample assessed under both standard glucose (11 mM) and high glucose (30 mM) conditions. Each biological replicate was generated from a separate frozen aliquot of the cell line; after being split from the same mother flask, cells were grown in separate flasks and run on different microarray plates on different days. The gene expression data comprised a total of 144 samples from 22 individuals (3 replicates per individual and treatment, except for 3 individuals with 5 replicates). Gene expression was assessed in two conditions, standard glucose and high glucose, and generated in four groups of arrays, each run at a given time and carefully designed to minimize potential batch effects. BeadChip data were extracted using GenomeStudio (version GSGX 1.9.0), and the raw expression and control probe data from the four batches were preprocessed using the lumiExpresso function in the lumi R package [8, 9] in three steps: (i) background correction (lumiB function with the bgAdjust method); (ii) variance-stabilizing transformation (lumiT function with the log2 option); (iii) normalization (lumiN function with the robust spline normalization (rsn) algorithm, a mixture of quantile and loess normalization). To remove unexpressed probes, we applied a detection filter retaining probes with a strong true signal (Illumina BeadArray detection p-values < 0.01) and then removed probes without annotated genes, resulting in a total of 15,591 probes.
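A minimal R sketch of the preprocessing described above, assuming a GenomeStudio probe-level export; the file name and the "detected in at least one sample" rule are assumptions for illustration, not details taken from the dataset:

    library(lumi)

    # Read the GenomeStudio export (file name is a hypothetical placeholder)
    raw <- lumiR("GenomeStudio_export_batch1.txt")

    # (i) background correction, (ii) log2 variance-stabilizing transformation,
    # (iii) robust spline normalization (rsn), as described above
    norm <- lumiExpresso(raw,
                         bg.correct = TRUE,
                         bgcorrect.param = list(method = "bgAdjust"),
                         variance.stabilize = TRUE,
                         varianceStabilize.param = list(method = "log2"),
                         normalize = TRUE,
                         normalize.param = list(method = "rsn"))

    # Detection filter: keep probes detected (p < 0.01) in at least one sample
    detected <- detectionCall(raw, Th = 0.01, type = "probe") > 0
    expr <- exprs(norm)[detected, ]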
Data Types:
  • Dataset
  • Text
HealthAidKB is a knowledge base produced by an automatic extraction and clustering pipeline for common procedural knowledge in the health domain. Our goal is to construct a domain-targeted, high-precision procedural knowledge base containing task frames. We developed a pipeline of methods leveraging Open IE to extract procedural knowledge by tapping into online communities. In addition, we devised a mechanism to canonicalize the task frames into clusters based on the similarity of the problems they intend to solve. The resulting knowledge base shows high precision based on an evaluation by human experts in the domain. We extracted the procedural knowledge from the Health category of wikiHow (https://www.wikihow.com/Category:Health) and How to Cure (https://howtocure.com/).
Data Types:
  • Software/Code
  • Tabular Data
  • Dataset
  • Text
We present a new Bangla dataset together with a hybrid recurrent neural network model that generates Bangla natural-language descriptions of images. The dataset consists of a large number of classified images paired with natural-language descriptions. We conducted experiments on our self-made Bangla Natural Language Image to Text (BNLIT) dataset, which contains 8,743 images taken from a Bangladesh perspective, with one annotation per image. The repository includes two pre-processed versions of the images, 224 × 224 and 500 × 375, alongside annotations for the full dataset, as well as a CNN features file for the whole dataset, features.pkl.
Data Types:
  • Other
  • Software/Code
  • Dataset
  • Text
  • File Set
The functions used to carry out this work are found in the files provided, "k-Prototypes Clustering" and "clustMixType modified functions". These algorithms carry out the operations of obtaining and manipulating the data matrix, computing descriptive statistics of the data, determining the best number of clusters, clustering with the k-prototypes method, and statistically validating the generated clusters with MANOVA. An example is also presented using the Iris database, contained in the R software library and widely used to exemplify and validate algorithms developed in the R language. The modified functions are found in the files "clustMixType modified functions" and are called on line 41 of the file "k-Prototypes Clustering.R". The kproto.modif(), clprofiles.modif() and summary.kproto.modif() functions were modified from the kproto(), clprofiles() and summary.kproto() functions, respectively, of the clustMixType package, developed by SZEPANNEK (2018). The dist.binary() function of the ade4 package, developed by DRAY & DUFOUR (2017), was also used in the development of the kproto.modif() function, which can now use a variety of similarity functions. The relationship between the variables is expressed by the squared Euclidean distance for numerical variables, while for nominal variables the distance can be obtained from a variety of similarity coefficients. The fviz_cluster.modif() function was modified from the fviz_cluster() function of the factoextra package, developed by KASSAMBARA & MUNDT (2017).
REFERENCES:
- DRAY, S.; DUFOUR, A.-B. The ade4 Package: Implementing the Duality Diagram for Ecologists. Journal of Statistical Software, v.22, n.4, p.1-20, Sep. 2017. R package version 1.7-13. Available at: https://CRAN.R-project.org/package=ade4. https://www.doi.org/10.18637/jss.v022.i04.
- KASSAMBARA, A.; MUNDT, F. factoextra: Extract and Visualize the Results of Multivariate Data Analyses. 2017. R package version 1.0.5. Available at: https://CRAN.R-project.org/package=factoextra.
- SZEPANNEK, G. clustMixType: User-Friendly Clustering of Mixed-Type Data in R. The R Journal, v.10, n.2, p.200-208, 2018. R package version 0.2-1. Available at: https://CRAN.R-project.org/package=clustMixType. https://www.doi.org/10.32614/RJ-2018-048.
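A minimal R sketch of this kind of workflow using the unmodified clustMixType package and the Iris data (Species as the nominal variable); the modified kproto.modif()/fviz_cluster.modif() functions of the provided files are not reproduced here, and the choice of three clusters is illustrative only:

    library(clustMixType)   # kproto(), clprofiles(), summary()

    # Iris as a mixed-type example: four numeric variables plus the Species factor
    data(iris)
    x <- iris

    # Explore the number of clusters via total within-cluster distance (elbow)
    wss <- sapply(2:6, function(k) kproto(x, k, verbose = FALSE)$tot.withinss)

    # Fit k-prototypes with the chosen number of clusters (3 here, for illustration)
    kp <- kproto(x, 3, verbose = FALSE)
    summary(kp)
    clprofiles(kp, x)

    # MANOVA on the numeric variables as a statistical validation of the clusters
    fit <- manova(as.matrix(iris[, 1:4]) ~ factor(kp$cluster))
    summary(fit, test = "Wilks")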
Data Types:
  • Software/Code
  • Dataset
  • Text
Preprocessed EEG data, behavioral measures, participant information and experiment code for the study 'An ecological measure of rapid and automatic face-sex categorization'. Please find additional information in the 'READ_ME.txt' file.
Data Types:
  • Other
  • Image
  • Tabular Data
  • Dataset
  • Text
  • File Set
In active learning, the Optimally Balanced Entropy-Based Sampling (OBEBS) method is a strategy for selecting items from unlabelled data. In active zero-shot learning there is not enough information for a supervised machine learning method, so our sampling strategy was based on unsupervised learning (clustering). The cluster membership likelihoods of the items were essential for the algorithm to connect the clusters and the classes, i.e. to find an assignment between them. For the best assignment, the Hungarian algorithm was used. We developed and implemented adaptive assignment variants of the OBEBS method in the software.
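A minimal R sketch of the cluster-to-class assignment step, using the Hungarian algorithm as implemented by solve_LSAP() from the clue package; the agreement matrix below is illustrative, and the adaptive OBEBS variants of the software are not reproduced:

    library(clue)   # solve_LSAP(): Hungarian algorithm for linear sum assignment

    # Illustrative matrix of cluster-vs-class agreement scores (e.g. summed
    # cluster membership likelihoods of items whose class is known)
    agreement <- matrix(c(9, 1, 0,
                          2, 7, 1,
                          0, 2, 8), nrow = 3, byrow = TRUE,
                        dimnames = list(paste0("cluster", 1:3),
                                        paste0("class", 1:3)))

    # One-to-one cluster-to-class assignment maximizing total agreement
    assignment <- solve_LSAP(agreement, maximum = TRUE)
    cbind(cluster = rownames(agreement),
          class   = colnames(agreement)[as.integer(assignment)])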
Data Types:
  • Software/Code
  • Dataset
  • Text
Primary analysis files for bioRxiv manuscript 2019/859603 (https://www.biorxiv.org/content/10.1101/859603v1), evaluating how common variant effect prediction methods capture the effects determined by deep mutational scanning experiments. 'data' contains the deep mutational scanning data in a parsed format. See the manuscript for the original data sources, which were processed with parseRawDatasets.py, followed by manual sequence mapping (resulting in the mapped_seqs.txt files), and then with parseScores.py to produce the .npz files. 'predictionData' contains predictions from SIFT, PolyPhen-2, SNAP2 and Envision, parsed into .npz files. Additional folders are for dummy methods and for intermediate output while executing the scripts below. 'analysis' will contain most of the output files. See below for sample calls to reproduce, e.g., Figure 1 from the paper. The scripts are written in Python 3 and require, among others, numpy, pandas, scipy, sklearn, rpy2, svgutils and matplotlib. For all scripts, the --normalization-scheme flag describes how the experimental scores are processed to fit on the same scale of values; the scheme used for the final manuscript is 'wt0_del_scaled' for deleterious effect variants and 'wt0_ben_scaled' for beneficial effect variants. For compareBinaryDMSToPredictions.py, the --binarization-scheme flag describes how scores are binarized to neutral/effect; possible values are the schemes outlined in the manuscript: 'syn90', 'syn95' and 'syn99'.
Data Types:
  • Other
  • Software/Code
  • Sequencing Data
  • Tabular Data
  • Dataset
  • Text
The dataset contains information on the currents in the upper 15-meter layer of the Caspian Sea in 2003-2005. The velocity fields were reconstructed with the eddy-resolving ocean general circulation model SZ-COMPAS using realistic forcing. The data arrays have a very high resolution: ~2 km in space and 4 hours in time. This is sufficient to resolve most of the mesoscale features of sea dynamics as well as a wide range of their temporal spectrum, including inertial oscillations and synoptic and seasonal variability. The NetCDF files are: 3 files with instantaneous currents and 3 files with monthly mean currents (suffix “mm” in the file names). Each file corresponds to one of three horizons (depths): 1 m, 7 m, and 15 m. In the horizontal plane the data are defined on a uniform geographical grid (46.7625–54.2125°E, 36.5092–47.2892°N); the dimensions of all of the arrays are 299 by 589. The period covered by the instantaneous data is 2003 with a time step of 4 hours; the first record corresponds to 2003-01-01 04:00:00 GMT. The period covered by the monthly mean data is 2003-2005, with one vector field for every month, defined on the last day of the month. All data are given in cm/s. It should be noted that the model describes the upper 30-meter layer of the sea in sigma-coordinates, so the data values are actually defined on the 1st, 4th, and 8th sigma-horizons rather than at 1 m, 7 m, and 15 m depths. This means that, in sea cells with bottom depth less than 30 m, the actual depth of the data nodes is ~3%, ~23% and ~50% of the actual water column height, while in the rest of the cells (with bottom depth greater than 30 m), the data nodes are located at ~1 m, ~7 m, and ~15 m depth (± a few centimeters depending on the local sea level).
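A minimal R sketch for reading one of the NetCDF files with the ncdf4 package; the file name, variable names and array layout below are assumptions for illustration, not the actual names in the dataset (print(nc) lists the real ones):

    library(ncdf4)

    # File and variable names are hypothetical placeholders
    nc <- nc_open("caspian_currents_1m.nc")
    print(nc)                      # inspect actual variable and dimension names

    lon <- ncvar_get(nc, "lon")    # expected ~299 values, 46.7625-54.2125 E
    lat <- ncvar_get(nc, "lat")    # expected ~589 values, 36.5092-47.2892 N
    u   <- ncvar_get(nc, "u")      # zonal current component, cm/s
    v   <- ncvar_get(nc, "v")      # meridional current component, cm/s

    # Current speed for the first time record (4-hourly instantaneous data),
    # assuming a (lon, lat, time) array layout
    speed <- sqrt(u[, , 1]^2 + v[, , 1]^2)
    nc_close(nc)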
Data Types:
  • Dataset
  • Text
  • File Set
These two files contain the heat capacity data for pure iron and magnesium. Each has three columns: temperature (Temp), heat capacity (Cp) and reference (Ref). The data are collected from the following two sources: (1) Y. S. Touloukian, E. H. Buyco, Thermophysical properties of matter - the TPRC data series, volume 4: specific heat - metallic elements and alloys, Tech. rep., Thermophysical and Electronic Properties Information Analysis Center, Lafayette, IN (1971). (2) NIST Thermodynamic Research Center, the NIST alloy data web application, https://trc.nist.gov/metals_data/ (accessed: 13 February 2020).
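A minimal R sketch for loading and plotting one of the files, assuming comma-separated text with the three columns named above; the file name is a placeholder:

    # File name and separator are assumptions; adjust to the actual files
    cp_fe <- read.csv("heat_capacity_iron.csv")

    head(cp_fe)   # columns: Temp, Cp, Ref
    plot(cp_fe$Temp, cp_fe$Cp,
         xlab = "Temperature", ylab = "Heat capacity (Cp)")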
Data Types:
  • Dataset
  • Text
Main authors of potential importance in the chemical epistemic field of biopolymers (from 2006 to 2019): registered patents and papers (conferences are not included). Database in .bib bibliographic format.
Data Types:
  • Dataset
  • Text