This dataset comprises the raw and processed data related to the classification of abstract texts associated with Alzheimer's disease. The raw data were processed and two-fold cross-validated to prepare a gold-standard dataset of Alzheimer's association classes for genes. These data can be further used for disease–gene classification as well as for system-level meta-analysis of Alzheimer's disease.
Contributors: Amanda Kahn, Sally Leys, Clark Pennelly
These are observations of the contraction behaviors of several species of sponges and cnidarians on the deep seafloor at Station M (4,100 m depth, northeast Pacific Ocean). Data are organized with each worksheet housing data for a different taxon. Each worksheet records contraction kinetics for different individuals. Units are hours unless indicated otherwise.
Raw, uncropped original images that were used to make the figures in Basu et al., PNAS 2020.
Plot data in CSV format.
Cycling levels under different weather conditions in March 2018 at Puerto Madryn.
Commuters on urban or recreational cycling rides across weather conditions.
Contributors: Víctor Labayen, Eduardo Magana, Daniel Morato Oses, Mikel Izal
The dataset is a set of network traffic traces in pcap/csv format captured from a single user. The traffic is classified into five different activities (Video, Bulk, Idle, Web, and Interactive), and the label is shown in the filename. There is also a file (mapping.csv) with the mapping between the host's IP address, the csv/pcap filename, and the activity label.
Interactive: applications that perform real-time interactions to provide a suitable user experience, such as editing a file in Google Docs or remote CLI sessions over SSH.
Bulk data transfer: applications that transfer large files over the network. Examples are SCP/FTP applications and direct downloads of large files from web servers such as Mediafire, Dropbox, or the university repository, among others.
Web browsing: all the traffic generated while searching and consuming different web pages. Examples of those pages are several blogs, news sites, and the university's Moodle.
Video playback: traffic from applications that consume video via streaming or pseudo-streaming. The best-known services used are Twitch and YouTube, but the university's online classroom was also used.
Idle behaviour: the background traffic generated by the user's computer while the user is idle. This traffic was captured with every application closed and with some pages open, such as Google Docs, YouTube, and several other web pages, but always without user interaction.
The capture is performed on a network probe attached, via a SPAN port, to the router that forwards the user's network traffic. The traffic is stored in pcap format with the full packet payload. In the csv files, every non-TCP/UDP packet is filtered out, as well as every packet with no payload. The fields in the csv files are the following (one line per packet): timestamp, protocol, payload size, source and destination IP addresses, and source and destination UDP/TCP ports. The fields are also included as a header in every csv file.
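As an illustration, a per-packet csv with the fields above can be aggregated in a few lines of Python. The column names used here are hypothetical; check the header of an actual csv file and adjust them accordingly.

```python
import csv
import io
from collections import defaultdict

# Column names are assumed from the field list above; the real csv
# headers may differ, so inspect one file before relying on them.
SAMPLE = """timestamp,protocol,payload_size,ip_src,ip_dst,port_src,port_dst
1520000000.000,TCP,1448,10.0.0.2,93.184.216.34,51522,443
1520000000.015,TCP,1448,10.0.0.2,93.184.216.34,51522,443
1520000000.200,UDP,512,10.0.0.2,8.8.8.8,40000,53
"""

def payload_by_protocol(csv_text):
    """Sum payload bytes per transport protocol (one csv line per packet)."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["protocol"]] += int(row["payload_size"])
    return dict(totals)

print(payload_by_protocol(SAMPLE))  # {'TCP': 2896, 'UDP': 512}
```

The same pattern extends to per-activity statistics by iterating over the files listed in mapping.csv.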
The amount of data per activity is as follows:
Bulk : 19 traces, 3599 s of total duration, 8704 MBytes of pcap files
Video : 23 traces, 4496 s, 1405 MBytes
Web : 23 traces, 4203 s, 148 MBytes
Interactive : 42 traces, 8934 s, 30.5 MBytes
Idle : 52 traces, 6341 s, 0.69 MBytes
The code of our machine learning approach is also included, together with a README.txt file documenting how to use it.
Contributors: Roslyn Rivkah Isseroff, David Siegel, Jayne Joo
Supplemental Materials for the JAAD Research Letter "Access to Mohs Surgery through the Choice program of the United States Department of Veterans Affairs"
Contributors: Vinh Truong Hoang
This paper introduces a new dataset for the ground-based cloud image classification task. We name it 'Cloud-ImVN 1.0'; it is an extension of the SWIMCAT database. The dataset contains six categories of cloud images, comprising 2,100 color images (150 × 150 pixels). Several tasks can be applied to this dataset, including classification, clustering, and segmentation, in both supervised and unsupervised learning contexts.
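If the images are organized one folder per cloud category (a common layout for classification datasets, though not confirmed here for Cloud-ImVN 1.0), the archive can be indexed for supervised learning with a short sketch like this:

```python
from pathlib import Path

def index_dataset(root, pattern="*.png"):
    """Map each class-folder name to the sorted image files it contains.

    The class-per-folder layout and the .png extension are assumptions;
    adjust them to the actual structure of the archive.
    """
    return {d.name: sorted(p.name for p in d.glob(pattern))
            for d in sorted(Path(root).iterdir()) if d.is_dir()}
```

The resulting mapping of category name to file list is the usual starting point for building labeled train/test splits.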
Contributors: Raja Ram Gurung, Ganga Gharty Chhetri, Prashanna Maharjan
1. Distribution of S. aureus in different age groups and types of patients.
2. Distribution of MRSA in outpatients and inpatients.
3. Gender-wise distribution of S. aureus in various infection groups.
4. Antibiogram of MRSA and MSSA.
5. Susceptibility pattern towards erythromycin and clindamycin.
6. Multidrug resistance pattern of MRSA.
Contributors: Jorge Segarra-Tamarit, Emilio Perez, Hector Beltran, Javier Perez
There are four main folders in the project: code, data, models, and logdir.
The data folder contains all the data used from the two studied locations: Loc.1 (latitude = 40.4º, longitude = 6.0º) and Loc.2 (latitude = 39.99º, longitude = -0.06º).
Sorted by year, month, and day, each location has three kinds of data:
• The files whose names are just a number contain 151x151 matrices of irradiance estimates centered on the location, obtained from http://msgcpp.knmi.nl. The spatial resolution is 0.03º in both latitude and longitude.
• The files named Real_ contain the irradiance measurements at the location.
• The files named CopernicusClear_ contain the clear-sky estimates from the CAMS McClear model.
Each file is stored in Matlab format and contains the 96 15-minute samples for one day, with timestamps in UTC.
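As a sketch, one day file could be loaded in Python with scipy. The variable name "data" inside the .mat file is an assumption; list loadmat(path).keys() on a real file from the data folder to find the actual key.

```python
import numpy as np
from scipy.io import loadmat

def read_day(path, varname="data"):
    """Load the 96 15-minute samples stored for one day in a .mat file.

    varname is hypothetical; inspect loadmat(path).keys() on a real
    file to find the correct variable name.
    """
    samples = np.asarray(loadmat(path)[varname]).ravel()
    assert samples.size == 96, "expected one sample per 15-minute slot"
    return samples
```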
The code folder contains all the Python scripts used to train the neural networks and perform the forecasts. The main files are:
• tf1.yml: List of the modules and versions used. A clean Anaconda environment created from this file can run all the code in the project.
• learnRadiation.py: The script to train a new model, configured by the variables "paper_model_name" and "location". The first variable selects the kind of model to fit and the second one the training location.
• predictOnly.py: Loads a trained model and performs the forecast. Note that the model and location must match the ones used to train the model stored in the "training_path" folder.
The models folder contains all the trained models and their forecasting results. There is also a training folder that holds the last trained model.
The logdir folder stores TensorBoard files during training.
How to train and test a model
A new model can be trained using "learnRadiation.py". This script has three parameters:
• location: Selects the location where the model will be trained (LOC1 or LOC2).
• paper_model_name: Sets the inputs to match the ones used in the models from the article.
• training_path: The folder where the trained model is saved.
Then the "predictOnly.py" script performs the forecasts. It is important to set the same parameters as in the "learnRadiation.py" script. This program generates the predictions and saves them in the model folder. It also plots some days, which can be changed at the bottom of the script.
For instance, for LOC2 and the "TOA & all real" model, we would run:
"python learnRadiation.py TOAallreal LOC2 training"
This will train the neural network and save the results in the folder models/training.
After this, we would generate the results and plot some days using:
"python predictOnly.py TOAallreal LOC2 training"
This will save the forecasts and the real values in the training folder and show figures with 1- to 6-hour forecasts.
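Once the forecasts and real values have been saved, a skill metric can be computed offline. Below is a minimal RMSE sketch in Python; the arrays would be loaded from the files that predictOnly.py writes, whose exact filenames are not specified here.

```python
import numpy as np

def rmse(forecast, real):
    """Root-mean-square error between forecast and measured irradiance."""
    forecast = np.asarray(forecast, dtype=float)
    real = np.asarray(real, dtype=float)
    return float(np.sqrt(np.mean((forecast - real) ** 2)))

# Errors of 3 and 4 W/m² give sqrt((9 + 16) / 2) ≈ 3.54 W/m².
print(rmse([100.0, 200.0], [103.0, 204.0]))
```

Computing the metric separately per forecast horizon (1 to 6 hours) matches the way the figures are presented.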
The models used in the article can also be evaluated with predictOnly.py by targeting their folders. For instance, to evaluate the "TOA & all real" model used in the article, run:
"python predictOnly.py TOAallreal LOC2 RtoaAllReal"