NASA SLS-1 and SLS-2 Rat Datasets
The decision to reanalyze research from SLS-1 and SLS-2 was motivated by broader methodological questions about data engineering principles and by research gaps in spaceflight biology. Our methods are intended to organize the data from older spaceflight studies so that further studies can be performed without the cost and time of new spaceflight experiments. We developed a pipeline that could be applied to other studies of a similar nature, so that its results can help bridge differences between space life science experiments. In this way, the approach systematically updates previous reviews and gives more power and purpose to older legacy data. After testing our methods with datasets from the SLS missions, we wrote code that produces a programmable data frame from which accurate, logically organized plots can be generated for spaceflight data reanalysis. The strength of this long-form data structure lies in its ability to compare multiple datasets from different experiments and tissues to generate new conclusions. Because all the measurements are stacked in a single column, we can select in code the rows that make up a dataset of interest, assign them to a variable, and plot them. In practice, we can choose a physiological relationship that is available in the data frame and plot it beside another dataset from a different tissue that shares the same experimental parameters. Different types of graphs, including box plots, pie charts, x/y line plots, and bar graphs, can be generated automatically with matplotlib, given only the rows to be used. It is exciting to be able to explore organized, understandable datasets visually and discover interesting directions the data may take.
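A minimal sketch of the select-rows-then-plot idea, assuming a long-form master data frame that uses some of the column names listed later in this section (Tissue Type, Assay, Flight Phase, Measurement, Units). The helper names select_rows and plot_by_phase, the assay labels, and all numbers are hypothetical placeholders, not SLS-1 or SLS-2 results.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows in the long-form layout; values are dummies, NOT mission data.
master = pd.DataFrame({
    "Tissue Type": ["Bone"] * 4 + ["Muscle"] * 4,
    "Assay": ["Assay A"] * 4 + ["Assay B"] * 4,          # hypothetical assay names
    "Flight Phase": ["Flight", "Flight", "Ground", "Ground"] * 2,
    "Measurement": [1.0, 1.2, 2.0, 2.1, 5.0, 5.5, 6.0, 6.4],
    "Units": ["unit"] * 8,
})

def select_rows(df, **criteria):
    """Hypothetical helper: keep rows whose columns equal every criterion.

    Keyword names use underscores for spaces, e.g. Tissue_Type="Bone".
    """
    mask = pd.Series(True, index=df.index)
    for key, value in criteria.items():
        mask &= df[key.replace("_", " ")] == value
    return df[mask]

def plot_by_phase(ax, subset, title):
    """Hypothetical helper: draw one box per flight phase for the selected rows."""
    phases = sorted(subset["Flight Phase"].unique())
    data = [subset.loc[subset["Flight Phase"] == p, "Measurement"].to_numpy()
            for p in phases]
    ax.boxplot(data)
    ax.set_xticks(range(1, len(phases) + 1))
    ax.set_xticklabels(phases)
    ax.set_title(title)
    ax.set_ylabel(subset["Units"].iloc[0])

# Two datasets from different tissues that share the same experimental parameters.
bone = select_rows(master, Tissue_Type="Bone", Assay="Assay A")
muscle = select_rows(master, Tissue_Type="Muscle", Assay="Assay B")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
plot_by_phase(ax1, bone, "Bone")
plot_by_phase(ax2, muscle, "Muscle")
fig.tight_layout()
fig.savefig("comparison.png")
```

Swapping ax.boxplot for ax.bar, ax.plot, or ax.pie changes the graph type while the row selection stays the same, which is what makes the long-form layout convenient to automate.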
Steps to reproduce
This project’s objective was to extract, filter, organize, and analyze all Rattus norvegicus data and metadata obtained from the Space Shuttle Columbia’s Spacelab Life Sciences 1 (SLS-1, STS-40) and Spacelab Life Sciences 2 (SLS-2, STS-58) missions, and to explore how information compiled from rats can be used to build a reliable model of the biological mechanisms that respond to spaceflight. By reusing rare space legacy data with new data analysis techniques, we can combine individual preexisting datasets with current ones to gain new, comprehensive insights into the effects of spaceflight on the body. Our methods can also lead to a standardized pipeline that could be applied to other space life science datasets.
The bulk of our pipeline lies in data extraction and formatting. To address the challenges of compiling and processing space data that has not been ‘touched’ for some time, we relied on tools such as WebPlotDigitizer and Python libraries to extract the data and organize the data frame. For example, WebPlotDigitizer extracts the data points from images of bar graphs, or any other type of graph, in documents that cannot be parsed with code, and exports them as a .csv file. From there, we use Python to rename the file, add a description and the necessary labels, and move it to a designated folder. This protocol was used for data that was available only as graphs in publications or in other sources that could not be accessed with code. Not all data had to go through this step: data already stored as Excel (.xlsx) or comma-separated (.csv) files were extracted and compiled directly into our master data frame with Python. Each measurement has its own specifications, which had to be recorded to characterize that single data point in the data frame and to avoid duplicates or overlaps.
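The rename, label, and move step after WebPlotDigitizer can be sketched as follows. This is an illustration, not the authors' code: it assumes the export is a header-less, two-column CSV (a bar label or x value, then a y value), and the function name file_digitized_csv, the file names, and the dummy values are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

def file_digitized_csv(src, dest_dir, new_name, labels):
    """Hypothetical helper: label a WebPlotDigitizer export and move it to dest_dir.

    Assumes a header-less, two-column export (label or x value, y value).
    `labels` maps metadata column names to the values describing this graph.
    """
    src, dest_dir = Path(src), Path(dest_dir)
    df = pd.read_csv(src, header=None, names=["Label", "Value"])
    for column, value in labels.items():
        df[column] = value  # attach the same description to every extracted point
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / new_name
    df.to_csv(dest, index=False)  # write the labelled copy under its new name
    src.unlink()  # remove the raw export so only the labelled copy remains
    return dest

# Usage on a throwaway directory with dummy digitized values (not mission data).
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "wpd_export.csv"
    raw.write_text("Bar0,1.5\nBar1,2.5\n")
    out = file_digitized_csv(raw, Path(tmp) / "digitized", "hypothetical_figure.csv",
                             {"Tissue Type": "Bone", "Units": "unit"})
    raw_still_exists = raw.exists()
    labelled = pd.read_csv(out)

print(labelled)
```

Writing a labelled copy and then deleting the original keeps the raw export from lingering next to its processed version; shutil.move would work equally well if the labels were stored separately.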
For this reason, we stored the data in a pandas DataFrame, which can hold large amounts of data and be saved as a .csv file. Thirteen columns were established to specify each data point, ranging from Sample # to the File Name where it is stored. In short, the raw data from the 1991 SLS-1 mission were converted into a uniform long-form dataset, stacked into a master data frame, and separated by experiment. The columns are Experiment Title, PI, Experiment ID, Filename, Tissue Type, Assay, Rat ID, Flight Phase, Factor Value, Statistics, Housing Module, Measurement, and Units. Because experimenters recorded their data in different formats across the missions, we wrote code that automatically converts each horizontally arranged measurement into its own row, together with its metadata.
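The horizontal-to-vertical conversion can be sketched with pandas' DataFrame.melt, using the thirteen column names listed in the paragraph above. The input layout (one row per rat, with Rat ID, Flight Phase, and one column per measurement), the helper names wide_to_long and add_to_master, and all values are assumptions for illustration only.

```python
import pandas as pd

# The thirteen columns of the master data frame, in the order listed above.
COLUMNS = [
    "Experiment Title", "PI", "Experiment ID", "Filename", "Tissue Type",
    "Assay", "Rat ID", "Flight Phase", "Factor Value", "Statistics",
    "Housing Module", "Measurement", "Units",
]

def wide_to_long(wide, metadata, units):
    """Hypothetical helper: stack each measurement column into its own row.

    wide: one row per rat, with 'Rat ID', 'Flight Phase', and measurement columns.
    metadata: values shared by every row of this file (e.g. Filename, PI).
    units: maps each measurement column name to its unit.
    """
    long = wide.melt(id_vars=["Rat ID", "Flight Phase"],
                     var_name="Assay", value_name="Measurement")
    long["Units"] = long["Assay"].map(units)
    for column, value in metadata.items():
        long[column] = value
    return long.reindex(columns=COLUMNS)  # unspecified metadata columns become NaN

def add_to_master(master, new_rows):
    """Hypothetical helper: append rows, dropping exact duplicates on all columns."""
    combined = pd.concat([master, new_rows], ignore_index=True)
    return combined.drop_duplicates(subset=COLUMNS, ignore_index=True)

# Hypothetical wide table with dummy values (not mission data).
wide = pd.DataFrame({
    "Rat ID": ["R1", "R2"],
    "Flight Phase": ["Flight", "Ground"],
    "Measurement X": [1.0, 2.0],
    "Measurement Y": [10.0, 20.0],
})
meta = {"Experiment ID": "EXP-HYPOTHETICAL", "Filename": "example.xlsx",
        "Tissue Type": "Bone"}
rows = wide_to_long(wide, meta, {"Measurement X": "unit x", "Measurement Y": "unit y"})

master = rows.copy()
master = add_to_master(master, rows)  # re-adding the same file adds no duplicates
master.to_csv("master_data_frame.csv", index=False)
```

Deduplicating on the full set of specification columns reflects the idea that a data point is identified by its metadata; pandas treats missing values as equal when finding duplicates, so rows with unfilled metadata still deduplicate correctly.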