Contributors: Lu, Jonathan, Engelhardt, Barbara
... Gene regulatory network inference holds great potential to uncover the biological mechanisms of disease and inform downstream experiments. However, effective inference of such causal relations is not straightforward. Methods for network inference must demonstrate accuracy in handling correlated molecular interactions, statistical significance for deciding on thresholds, and scalability to high-throughput sequencing assays. We introduce BETS (Bootstrap Elastic net regression from Time Series) to address these issues. BETS makes three major contributions: 1) it uses the elastic net regression penalty to handle correlated genes, 2) it ranks edges by a new measure of their stability, the "bootstrap frequency", and 3) it is highly parallelized, allowing analysis of datasets of thousands of genes in only a few days. Through these three innovations, our method ranked 3rd in AUROC (out of 17) and 6th in AUPR (out of 22) on the DREAM4 100-gene community benchmark for gene regulatory network inference. Importantly, our method is among the fastest of those with similar performance. We next run BETS on the GR project data, consisting of 2,768 differentially expressed genes across 12 timepoints, and infer two causal networks, which we analyze for their biological relevance. Finally, we evaluate the inferred networks against multiple sources of held-out data, including overexpression data from the same experimental system and literature-curated regulatory relationships.
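The abstract's central idea, ranking candidate regulatory edges by how often they survive elastic net selection across bootstrap resamples, can be sketched as follows. This is a minimal illustration, not the BETS implementation: the function name, the single-lag regression, and all parameter values are assumptions for the sketch.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def bootstrap_edge_frequencies(expr, n_boot=50, alpha=0.1, l1_ratio=0.5, seed=0):
    """expr: array of shape (timepoints, genes).

    For each target gene, regress its expression at time t on all genes at
    time t-1 with an elastic net.  An edge's "bootstrap frequency" is the
    fraction of resamples in which its coefficient is selected (nonzero).
    """
    rng = np.random.default_rng(seed)
    X, Y = expr[:-1], expr[1:]            # lagged predictors and targets
    n, g = X.shape
    counts = np.zeros((g, g))             # counts[j, i]: edge gene j -> gene i
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample time transitions
        Xb, Yb = X[idx], Y[idx]
        for i in range(g):
            model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=5000)
            model.fit(Xb, Yb[:, i])
            counts[:, i] += (model.coef_ != 0)
    return counts / n_boot                # frequencies in [0, 1], rank edges by these
```

Because each target gene's regression is independent, the inner loop parallelizes trivially across genes, which is the kind of structure that makes the method's high degree of parallelism possible.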
Contributors: Mehta, Divya, Dobkin, David
... The growing polarization between Liberals and Conservatives in the United States today is largely caused and perpetuated by biased news sources. This thesis aims to begin restoring healthy political discourse and unifying young Americans by exposing this bias and enabling transparency on controversial issues. Taking inspiration from the work of Iyyer et al., we first leverage Long Short-Term Memory Networks, a type of Recurrent Neural Network, to build a bias detection model that can, with high accuracy, predict the liberal or conservative skew of a given sentence. By then applying that model to a three-step analysis system, we can extract news article biases at the sentence level, article level, and topic level. Finally, we create simple, informative visualizations from that analysis to empower users to better understand the comparative ideological biases present in the news articles they consume.
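The recurrence at the core of such a sentence-level classifier can be written out directly. This is a minimal NumPy sketch of an LSTM step followed by a logistic output layer, under the assumption that the weights (here random) would be learned from labeled sentences; the function names are illustrative, not the thesis's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: input, forget, and output gates plus a candidate cell."""
    H = h.size
    z = W @ x + U @ h + b          # stacked pre-activations, shape (4H,)
    i = sigmoid(z[:H])             # input gate
    f = sigmoid(z[H:2*H])          # forget gate
    o = sigmoid(z[2*H:3*H])        # output gate
    g = np.tanh(z[3*H:])           # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def sentence_bias_score(word_vectors, W, U, b, w_out):
    """Run the LSTM over a sentence's word vectors and map the final hidden
    state to a probability of one bias class (e.g., liberal vs. conservative)."""
    H = w_out.size
    h, c = np.zeros(H), np.zeros(H)
    for x in word_vectors:
        h, c = lstm_step(x, h, c, W, U, b)
    return sigmoid(w_out @ h)
```

The final hidden state acts as a fixed-length summary of the whole sentence, which is what lets a single linear layer turn variable-length text into one bias score.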
Contributors: Di Caprio, Francesco, Chiaramonte, Maurizio, Gandelsonas, Mario
... In this thesis, I explore the issue of urban flooding for slum communities in Lagos, Nigeria. While the urban poor of Lagos are the segment of the population most endangered by the regular flood events that hit the region, the century-old slum community of Makoko-Iwaya Waterfront has developed a unique urban form that allows its residents to live harmoniously with the turbulent waters of Lagos Lagoon. When people willing to learn from and support this community have come, exciting new architectural forms have emerged, as when architect Kunlé Adeyemi and his firm, NLÉ, created the Makoko Floating School in 2012. Unfortunately, the Lagos State Government has a history of violent slum clearance efforts that threaten Makoko-Iwaya Waterfront’s existence. I argue that the state government’s development process is stuck in a traditional conception of what a city ought to be, one ill-suited to the flood-prone environment in which it is situated. The state also ignores the innovative urban forms developing in its own slum communities, as exemplified by Makoko-Iwaya Waterfront. Finally, I dissect what I call the Makoko model of architectural and urban development, explain how the Makoko Floating School built upon that model, and then offer my analysis and suggestions for how this line of development can be continued.
Contributors: Goytia, Luisa, Kaplan, Alan
... Personal security is a global concern that ignores race, gender, age, and socioeconomic status. It can be gravely compromised without warning in a dangerous location as well as in a perceived well-protected neighborhood. Theoretically, someone in need would call 911, or the equivalent in their country, and help would arrive. However, depending on the nature of the emergency, it is not always feasible to make a call or send a message. For instance, in case of assault or kidnapping, where the victim needs to physically defend themselves, placing a call would only further endanger the victim. In the scenario where contact can be established, emergency responders have very limited information with which to formulate an effective rescue protocol. These are concerns that can be addressed through smartphones, which have become globally ubiquitous in the past few years. This thesis introduces Amazona, a context-aware mobile framework that discreetly and rapidly distributes important system and user information to emergency contacts upon activation in an emergency. The context-aware nature of the system allows the mobile device to gather information about its environment and to adapt its emergency protocol accordingly. Based on the selected protocol, Amazona sends an updated information package, containing location, video, and images, among other data, to preselected emergency contacts upon consecutive presses of the device's power button. The package does not just reflect a snapshot of the user's condition at the time of the emergency but also includes location data from the past, creating a detailed timeline of the user's activity before the alarm is activated. This approach facilitates access to different sources of rescue, diminishes the risk involved in reaching out for help using traditional methods such as calling, and improves the probability of rescue. Keywords: Context-aware; Android; Personal Security; Smartphones; GPS
Contributors: Nair, Prem, Russakovsky, Olga
... Convolutional neural networks (CNNs) are state of the art in image multiclass classification tasks. Often, these tasks involve more complex scenarios with multiple attributes we want either to classify or to reduce bias against. We call the non-primary attributes in these tasks "domains." A problem arises where the performance and fairness of our classifiers deteriorate as a result of these additional requirements, often to an unexpectedly large degree. First, we examine the intersection of this problem and the area of multi-task learning. We construct two multi-task image classification tasks: (1) predicting coarse and fine-grained labels on the CIFAR-100 dataset, and (2) predicting dataset domain origin and digit class on the combination of the SVHN and MNIST datasets. We demonstrate the failure of an automatic method to balance performance across the classification tasks and consider explanations for this behavior. Next, we construct a novel dataset from CIFAR-10 that provides a framework for studying, benchmarking, and mitigating bias in a multidomain setting. While the classification task appears deceptively simple, we show experimental results of novel CNN training and inference procedures that demonstrate some success toward the challenge of bias mitigation. Finally, we apply the mitigation strategies we have developed to an activity classification task on the imSitu dataset, and reveal real-world improvements in mitigating gender bias.
Contributors: Toy, Nico, Fellbaum, Christiane
... This paper describes the development of a framework for generating classical music in a given style by training a Recurrent Neural Network on compositions in that style. We detail a novel technique for encoding sequences of musical notes into sequences of vectors suitable for training, in a way that preserves certain high-level musical properties such as key and time signature. We then present the architecture of the neural network used and how we train on notes encoded in this way.
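One plausible way to encode a note so that key and meter are preserved is to represent pitch relative to the key's root and to one-hot the note's beat position within the measure. This is a hypothetical scheme for illustration only, not the encoding the paper actually uses.

```python
import numpy as np

PITCH_CLASSES = 12  # semitones per octave

def encode_note(midi_pitch, key_root, beat_in_measure, beats_per_measure):
    """Encode one note as a key-invariant, meter-aware vector:
    - 12 dims: one-hot pitch class relative to the key root, so the same
      melody in any key maps to the same pitch component;
    - beats_per_measure dims: one-hot beat position, so the time signature
      is reflected in the encoding.
    """
    vec = np.zeros(PITCH_CLASSES + beats_per_measure)
    vec[(midi_pitch - key_root) % PITCH_CLASSES] = 1.0
    vec[PITCH_CLASSES + (beat_in_measure % beats_per_measure)] = 1.0
    return vec
```

Under this scheme, middle C on the downbeat of a 4/4 piece in C major and D on the downbeat of the same piece transposed to D major produce identical vectors, which is the kind of invariance the abstract describes.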
Contributors: Chou, Jesse, Singh, Jaswinder Pal
... Multi-instrument classification is important for music recommendation, remixing tracks, and several other applications in the music industry. Our goal is to develop an algorithm that takes a song with one or more instruments and outputs the names of the instruments in it. Previous research has succeeded in single-instrument classification, but multi-instrument classification remains unsuccessful due to a lack of prediction specificity and of generalizability to all instruments. Our new idea is that, instead of training on multi-instrument examples and testing directly on multi-instrument samples as previous approaches have, we train on single-instrument examples and test on multi-instrument samples that have first been reduced into multiple single-instrument samples. We perform the reduction step by decomposing the multi-instrument sample with a Fourier transform, partitioning the signals by harmonics, and then sending those partitions through our classifier, which was previously trained on single instruments. We train support vector machine (SVM), k-nearest neighbors (KNN), multilayer perceptron (MLP), and random forest classifiers, using the amplitudes of the harmonics of each instrument as features. We achieved the best results with SVM using the first 10 harmonics, which yielded an 88% cross-validation accuracy, 89% single-instrument test accuracy, and 75% multi-instrument test accuracy.
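The feature pipeline described above, FFT, read off the amplitudes of the first 10 harmonics, classify with an SVM, can be sketched on synthetic tones. The two "instruments" here (a bright 1/k-decay timbre and a hollow odd-harmonics-only timbre) and all function names are stand-ins for illustration, not the thesis's data or code.

```python
import numpy as np
from sklearn.svm import SVC

def harmonic_features(signal, sr, f0, n_harmonics=10):
    """Amplitudes of the first n_harmonics multiples of fundamental f0,
    read from an FFT and normalized to be insensitive to overall loudness."""
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    amps = np.array([spec[np.argmin(np.abs(freqs - k * f0))]
                     for k in range(1, n_harmonics + 1)])
    return amps / (amps.max() + 1e-12)

def tone(sr, f0, weights, dur=0.5):
    """Synthesize a harmonic tone whose k-th partial has amplitude weights[k-1]."""
    t = np.arange(int(sr * dur)) / sr
    return sum(w * np.sin(2 * np.pi * k * f0 * t)
               for k, w in enumerate(weights, start=1))

sr = 8000
bright = [1.0 / k for k in range(1, 11)]                   # all harmonics, 1/k decay
hollow = [1.0 / k if k % 2 else 0.0 for k in range(1, 11)]  # odd harmonics only

X, y = [], []
for f0 in (220, 262, 330, 392):       # train across several fundamentals
    X.append(harmonic_features(tone(sr, f0, bright), sr, f0)); y.append(0)
    X.append(harmonic_features(tone(sr, f0, hollow), sr, f0)); y.append(1)
clf = SVC(kernel="linear").fit(X, y)
```

Because the features are normalized harmonic amplitudes rather than raw spectra, the classifier learns a pitch-independent timbre signature, which is what makes training on single instruments and testing on decomposed mixtures plausible.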
Contributors: Hinthorn, William, Russakovsky, Olga
... Over the past five years, Convolutional Neural Networks (CNNs) and massive benchmark datasets have pushed the field of computer vision (CV) to new heights. Current models can segment images according to semantic class with great accuracy. As visual artificial intelligence (AI) becomes integrated into our daily lives, the need arises for models to better understand how humans refer to objects. They must see beyond the explicit class or classes that could plausibly be used to label an entity and understand the intent implicit in the specific localization of a visual query, selecting the label that most likely matches the human intent given the visual context. Insufficient research has been devoted to building CV systems that model the joint attention between humans and machines. In this thesis, I propose an object-part inference task to improve CV’s ability to reason about the nuanced task of human pointing. In the process of developing this task, I make three specific contributions to the goal of building human-like AI. First, I have annotated a dataset of points distributed over 15 of the object classes of the Pascal VOC Parts Challenge dataset. Each point is annotated as most likely referring to the entire object, a part of the object, neither of the above, or as located such that one cannot clearly infer the intent of the pointer. My second contribution is a statistical analysis of this dataset to examine existing biases and other insights into the complexities of the object-part inference task. My third contribution is the design of computer vision models that infer human intent given a point on an image. I report 81.5% accuracy on the object-part inference task when conditioned on the semantic object class, with 67.3% accuracy achieved without any additional semantic information.
Finally, I extend the task to predict the spatial extent of the object or part indicated by a point and obtain an mIoU of 48.80% over the validation set. Note that the semantic segmentation mIoU of the simple model used for these scores is a mere 68.28%, well below the state-of-the-art on Pascal VOC. Using deeper, more powerful base networks would greatly improve overall accuracy on the object-part task.
Impact of Remote Sensing Domain Knowledge on Satellite Imagery Classification of the Amazon Rainforest
Contributors: Zeng, Lindy, Dobkin, David
... Satellite imagery with finer spatial resolution provides an opportunity to detect small-scale changes in forest cover. With recent improvements in the field of computer vision, convolutional neural networks can be trained on satellite imagery through a combination of deep learning and remote sensing principles. The aim of this thesis is to identify land cover types, land use types, and atmospheric conditions present in satellite imagery of the Amazon rainforest using deep learning models. Implementing, training, and testing these models provides methods for detecting and identifying mechanisms of deforestation and shapes the understanding needed to protect the world’s forests. Our trained models demonstrate the ability to identify features in satellite imagery of the Amazon rainforest to a high degree of accuracy. Additionally, we establish their capability to detect small-scale forest cover changes over time.
A Computational Pathway for Identifying Metabolites Relevant to Cancer Development: New Methods Incorporating Protein Structure and DiffMut
Contributors: Berman, Adam, Singh, Mona G
... Last year, I developed a computational pipeline capable of leveraging TCGA (The Cancer Genome Atlas) genomic cancer data to assign scores to a list of all biologically active endogenous metabolites according to their relevance to breast cancer, ultimately resulting in a ranked list of metabolites. At the time, the pipeline utilized two different types of data to assign these scores: mutational data and RNA-Seq expression data. After further consideration, it has become apparent that these two types of data alone are insufficient to derive nuanced, meaningful scores. For example, not all mutations are equally likely to be positively correlated with cancer development. For this reason, I have modified my scoring algorithm to incorporate two new subscores: one based on whether mutations occur in the binding regions of the protein partners of the metabolites of interest, and another based on DiffMut, a pre-existing differential cancer mutation analysis program. With four different subscores for each metabolite, I perform pairwise combinations of these scores to determine the pair that optimally reflects known cancer-linked metabolites and metabolic pathways. I also drastically improved the method of determining synonymous metabolites from HMDB to properly account for isomers. Therefore, this work can be seen as a drastic expansion, refactoring, and improvement of last year’s pipeline, particularly through its incorporation of positional information to consider the impact each mutation would have on the structural binding region of proteins that interact with metabolites. Indeed, the average area-under-the-curve (AUC) value of this year’s overall per-metabolite scores was a full 30 percent better than last year’s, increasing from 0.499 to 0.651.