Deep Green Diagnostics

Published: 22-03-2019 | Version 4 | DOI: 10.17632/dn8rj26kzm.4
Israel Agustin Vargas Monroy,
Anayantzin Paola Lopez Juarez,
Carlos Duchanoy,
Marco Antonio Moreno-Armendáriz,
Miguel Santiago Suárez Castañon


The following dataset was created to meet the need for images of urban areas from which the health of vegetation can be estimated, captured by a drone flying at a height of 30 meters. The images were taken with a DJI Phantom 4 drone and cut into 200x200 px images. The pictures contain different types of vegetation, such as grass, trees of different sizes, and shrubs, as well as streets, buildings, cars, people, and other objects commonly present in these areas. Contamination is found in almost every city, so we consider as contamination all the garbage that can be found in an urban environment, such as food waste, empty candy or chip wrappers, plastic bottles, and similar items.


Steps to reproduce

The directory contains two CSV files called tag_traning and tag_test. Each file contains the name and the tags of each image, in the following structure:

    Name_Img.jpg  Health_tag  Contaminated

Where:

The first column is the name of the .jpg file.
The second column is the health tag: SA -> Healthy, SE -> Dry, NS -> Unhealthy and NP -> No-vegetation.
The third column is the contamination flag: 1 for contamination and 0 for no contamination.

The directory also contains two .zip files. "Training_dataset" has more than 9000 images and "Testing_dataset" has approximately 900 images.

There are also two serialized pickle files. Each file holds a Python list with the following structure:

    [[[numpy.arr], [list_tag]], ..., [[numpy.arr], [list_tag]]]

The pickle files were generated with Python code, so the NumPy array holds the pixel values of the image and the list holds a one-hot encoding of its tags, as follows:

No-contaminated
    [1,0,0,0,0,0,0,0] -> SA 0 - Healthy
    [0,1,0,0,0,0,0,0] -> SE 0 - Dry
    [0,0,1,0,0,0,0,0] -> NS 0 - Unhealthy
    [0,0,0,1,0,0,0,0] -> NP 0 - No-vegetation

Contaminated
    [0,0,0,0,1,0,0,0] -> SA 1 - Healthy contaminated
    [0,0,0,0,0,1,0,0] -> SE 1 - Dry contaminated
    [0,0,0,0,0,0,1,0] -> NS 1 - Unhealthy contaminated
    [0,0,0,0,0,0,0,1] -> NP 1 - No-vegetation contaminated

To load a pickle file, run the following code:

    import pickle

    # Open in binary mode; the file is closed once the list is loaded.
    with open("file_pickle.pickle", "rb") as f:
        list_info = pickle.load(f)

Note: In this version of the dataset, the file pickle_training.pickle was split into parts, so you need to join the parts before loading it. Please use the software HJ-Split: download the version for your system (it is compatible with Linux, Mac and Windows). You do not need to install anything; just run the executable file. Once it is running, select the “join” option in the user interface, press the “Input File” button and select the file “... 001”, then press the “Output” button and press “start”.

If you have trouble assembling the file or with performance, please let us know by e-mail, with the subject "Pickle_Training".
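The joining, loading and label-decoding steps described above can also be sketched in Python. This is a minimal sketch, not the authors' code. It assumes that the split parts are plain consecutive byte chunks named with numeric extensions (.001, .002, ...), so that concatenating them in order restores the original file, and that each list entry holds the pixel array first and the 8-element one-hot list second. The function names join_parts, load_entries and decode_label are hypothetical.

```python
import pickle
import shutil
from pathlib import Path

# Order follows the one-hot table above: positions 0-3 are the
# non-contaminated classes, positions 4-7 the contaminated ones.
HEALTH_TAGS = ["SA", "SE", "NS", "NP"]


def join_parts(first_part, output_path):
    """Concatenate name.001, name.002, ... into output_path.

    Assumption: the parts are raw byte chunks of the original file.
    Returns the part names in the order they were written.
    """
    first = Path(first_part)
    # Collect siblings that share the base name and end in a numeric extension.
    parts = sorted(
        (p for p in first.parent.glob(first.stem + ".*") if p.suffix[1:].isdigit()),
        key=lambda p: int(p.suffix[1:]),
    )
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream, parts may be large
    return [p.name for p in parts]


def load_entries(pickle_path):
    """Load the list stored in a pickle file.

    Only unpickle files from a source you trust. Loading the real dataset
    files also requires NumPy, because they contain NumPy arrays.
    """
    with open(pickle_path, "rb") as handle:
        return pickle.load(handle)


def decode_label(one_hot):
    """Map an 8-element one-hot list to (health_tag, contaminated)."""
    values = list(one_hot)
    if len(values) != 8 or sorted(values) != [0] * 7 + [1]:
        raise ValueError(f"not a valid 8-class one-hot label: {values}")
    index = values.index(1)
    return HEALTH_TAGS[index % 4], index >= 4
```

For example, after join_parts has rebuilt pickle_training.pickle from its parts, each entry returned by load_entries could be unpacked as pixels, one_hot = entry, and decode_label(one_hot) would return a pair such as ("SE", True) for a dry, contaminated image.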