2014 CISE Student Lecture Series
by Wilson Rivera Gallego - Saturday, 20 September 2014, 11:41 AM

October 16, 2014

CID 123; 2:30 - 4:00 pm

Design and Implementation of a De-duplicated and Distributed File System

Paul Bartus

Abstract: File systems often contain redundant copies of information, such as identical files or sub-file regions, possibly stored on a single host, on a shared storage cluster, or backed up to secondary storage. De-duplicating storage systems take advantage of this redundancy to reduce the underlying space needed to contain the file system. The purpose of this work is to design and implement a distributed file system with deduplication and use it on an Internet Small Computer System Interface (iSCSI) Storage Area Network. Scalable, highly reliable distributed systems supporting data deduplication have recently become popular for storing backup and archival data. The concept of a file recipe is central to this approach. The file recipe for a file is a synopsis that contains a list of data block identifiers; each block identifier is a cryptographic hash over the contents of the block. Once the data blocks identified in a recipe have been obtained, they can be combined as prescribed in the recipe to reconstruct the file, so files can be replaced by their corresponding recipes. One important requirement for backup storage is the ability to delete data selectively. Data deduplication systems discover redundancies between data blocks. Our approach divides data into 8 KB chunks and identifies redundancies via fingerprints. This improves storage capacity by increasing the storage efficiency ratio (bytes of actual file / bytes used to store). We will reexamine traditional choices, explore new design points, and discuss aspects of the design and implementation.
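The recipe-and-fingerprint idea can be sketched in a few lines of Python. The 8 KB fixed-size chunking follows the abstract; the choice of SHA-256 as the fingerprint and the in-memory block store are assumptions for illustration (the talk does not name the hash function or the store layout):

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # 8 KB fixed-size chunks, as in the abstract

def make_recipe(data: bytes, store: dict) -> list:
    """Split data into chunks, fingerprint each, and return the file recipe."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()  # cryptographic block identifier
        store.setdefault(fp, chunk)             # keep the block only if it is new
        recipe.append(fp)
    return recipe

def rebuild(recipe: list, store: dict) -> bytes:
    """Reconstruct a file by concatenating the blocks named in its recipe."""
    return b"".join(store[fp] for fp in recipe)
```

Because identical chunks hash to the same fingerprint, a file of repeated content occupies the store only once while its recipe still lists every chunk in order.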

October 23, 2014

CID 123; 2:30 - 4:00 pm

Towards Probabilistic Inference in Bayesian Networks using Map-Reduce

Walter Quispe Vargas

Abstract: Bayesian networks (BNs) are probabilistic graphical models used for studying probabilistic dependencies among variables of interest. They are used frequently in Artificial Intelligence, Machine Learning, Statistics, and Expert Systems. Probabilistic inference in BNs is computationally complex: in the worst case, algorithms for estimating these probabilities are NP-hard, since the size of the conditional probability table grows exponentially in the number of parents in the network as the input data size increases. Thus sequential learning for large and complex BNs becomes challenging even in the case of complete data. To speed up estimation, it is necessary to run parts of the network in parallel. In this talk we discuss approaches to Bayesian parameter learning for complete data via Bayesian update and for incomplete data using the classical Expectation Maximization algorithm. We also explore the use of Hadoop for implementing BNs. Both analytical and experimental results show gains in speedup and parameter quality.
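The complete-data case parallelizes naturally because the sufficient statistics of a conditional probability table entry are just counts, which are additive across data shards. A minimal sketch, assuming a symmetric Dirichlet prior; the mapper/reducer names are illustrative stand-ins, not the talk's actual Hadoop implementation:

```python
from collections import Counter
from functools import reduce

def mapper(shard):
    """Count the values observed in one shard of the data."""
    return Counter(shard)

def reducer(c1, c2):
    """Sufficient statistics are additive, so partial counts merge by summation."""
    return c1 + c2

def posterior_mean(counts, states, alpha=1.0):
    """Posterior mean of each state's probability under a symmetric Dirichlet(alpha) prior."""
    total = sum(counts[s] for s in states) + alpha * len(states)
    return {s: (counts[s] + alpha) / total for s in states}
```

Each mapper scans only its own shard; the reducer merges counts, so the Bayesian update never needs all the data on one machine.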

October 30, 2014

CID 123; 2:30 - 4:00 pm

A parallel implementation of digital watermarking

Einstein Morales

Abstract: Digital image watermarking is an important component of digital communications and is widely used for authentication of reconnaissance images. A robust method for performing this process consists of subdividing both the host image and the watermark into smaller blocks and independently applying the Discrete Cosine Transform (DCT) to each block. A block-by-block watermarked image is then obtained by applying the inverse DCT (IDCT) to a linear combination of the transformed blocks. Since for any pair of blocks the DCT and IDCT can be applied independently, this process is highly parallelizable. We present an implementation of digital watermarking in which we apply the DCT and IDCT to 8x8 blocks using OpenMP and MPI. We reduce the number of multiplications and additions to a minimum by making use of the symmetries in the matrix forms of the 8x8 DCT and IDCT. Experimental results show that our algorithm is a viable option compared to parallel versions of the DCT using CUDA, which rely on resources such as graphics cards, consuming as little time as possible in problems such as copyright protection for large numbers of digital files while avoiding any loss of information.
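Using the matrix form of the 8x8 DCT mentioned in the abstract, the per-block embedding step can be sketched with NumPy. The uniform weight `alpha` applied to every coefficient is a simplification (practical schemes usually weight only selected mid-frequency coefficients), and the sequential loops stand in for the OpenMP/MPI distribution of independent blocks:

```python
import numpy as np

N = 8
n = np.arange(N)
# Orthonormal 8x8 DCT-II matrix: row index k, column index n
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
C[0, :] = np.sqrt(1.0 / N)

def dct2(block):
    return C @ block @ C.T          # 2-D DCT as two matrix products

def idct2(block):
    return C.T @ block @ C          # inverse uses the transpose (orthonormality)

def embed(host, mark, alpha=0.1):
    """Block-wise watermarking: IDCT(DCT(host block) + alpha * DCT(mark block))."""
    out = np.empty_like(host, dtype=float)
    for i in range(0, host.shape[0], N):       # each 8x8 block is independent,
        for j in range(0, host.shape[1], N):   # hence trivially parallelizable
            hb = dct2(host[i:i + N, j:j + N])
            mb = dct2(mark[i:i + N, j:j + N])
            out[i:i + N, j:j + N] = idct2(hb + alpha * mb)
    return out
```

Precomputing `C` once is where the symmetry-based reduction in multiplications would apply; here the plain matrix products keep the sketch short.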

November 6, 2014

CID 123; 2:30 - 4:00 pm

Weather Prediction Models and Big Data

Roberto C. Trespalacios Alies

Abstract: Advances in computer systems and better conditioning allow predictions of atmospheric phenomena. Even so, such prediction remains a challenge for most scientists in the Atmospheric Sciences. This work explores some optimal models and shows the impact of big data on building better prediction algorithms. There are two different approaches to the problem: numerical weather prediction models and statistical models. Numerical weather prediction models are very efficient at simulating rainfall intensity over relatively long periods (< 72 hours), while statistical models have been used very successfully for nowcasting rainfall intensity at short times (on the order of 1 hour). This is because, over short periods of time, the nonlinear dynamics of atmospheric processes can be approximated by linear processes, and at short times the spatial distribution of rainfall is dominated by advection and persistence, features that can be represented by statistical computer models. Among the most important factors in short-term atmospheric prediction are the large volume of data and the dispersed nature of sudden rain over short periods; this makes it difficult to apply time series models to data streams. Improvements in these short-term prediction models can avert many environmental disasters, structural damage, and loss of life.

November 13, 2014

CID 123; 2:30 - 4:00 pm

Dynamic Programming approach in Conflict Resolution Algorithm of Access Control module in Medical Information Systems

Hiva Samadian

Abstract: The high sensitivity of assets (e.g., patients' health records and sensitive medical devices) in medical centers requires managers to pay special attention to deploying reliable authorization models. A reliable authorization model must be able to resolve the contingent conflicts that can occur due to different authorization assignments to subjects (e.g., technicians). Resolving conflicts is quite a challenge due to the sophisticated inheritance hierarchies that can cause an exponential number of conflicts (in terms of the number of subjects in the organization hierarchy) and the diversity of ways to combine resolution policies. Hence there is a need for an approach that can handle as many contingent conflicts and resolution policies as possible in an acceptable time. This paper develops a dynamic programming (DP) algorithm for resolving all conflicts in accordance with all existing policies. The solution has been implemented and tested on a variety of simulated data. Experiments on worst cases show the running time decreasing to 1/10 of that of the best existing algorithm. The improvement on real-world instances is more significant (3/1000); the average time to determine the authorization of a subject over 500 objects is just 52.56 ms.

November 20, 2014

CID 123; 2:30 - 4:00 pm

Combined Computational Fluid Dynamics and Magnetic Resonance Flow Mapping

Javan Cooper

Abstract: Medical staff rely on high-quality images of the body to diagnose illnesses. Magnetic Resonance Imaging (MRI) machines generate such images; however, movement and blood flow commonly produce image artefacts. In this work, we simulate the flow of ‘particle spins’ using Magnetic Resonance techniques. The MRI simulator calls a Computational Fluid Dynamics solver that imposes a velocity on the ‘particle spins’ at each time step. The MRI simulator numerically solves the Bloch equations, a coupled system that describes the evolution of magnetization over time, using a hybrid Eulerian-Lagrangian particle-in-cell approach. In this seminar, I will show how the magnetization distribution changes with particle velocity. I will also present a parallel algorithm to reconstruct an MR image. The software may serve as an inexpensive tool for technician training. Additionally, researchers may use the software to develop techniques to reduce the occurrence of artefacts in MR images.
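For a single spin, the Bloch equations couple precession about the magnetic field B with T1/T2 relaxation. A minimal explicit-Euler sketch of one time step is below; the simulator described in the talk uses a hybrid Eulerian-Lagrangian particle-in-cell scheme rather than this scalar integrator, and the default parameter values are illustrative:

```python
import numpy as np

def bloch_step(M, B, dt, T1, T2, M0=1.0, gamma=2 * np.pi * 42.58e6):
    """One explicit Euler step of the Bloch equations for a single spin.

    dM/dt = gamma * (M x B) + relaxation toward equilibrium (0, 0, M0).
    gamma defaults to the proton gyromagnetic ratio in rad/s/T.
    """
    relax = np.array([-M[0] / T2, -M[1] / T2, (M0 - M[2]) / T1])
    return M + dt * (gamma * np.cross(M, B) + relax)
```

In a flow simulation, the CFD solver would move each spin between steps and B would vary with position, which is why the magnetization distribution ends up depending on the velocity field.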

December 4, 2014

CID 123; 2:30 - 4:00 pm

Tackling the Challenges of Big Data

Carlos J Gomez

Abstract: Terabytes of data are generated every day from emails, videos, audio, images, search queries, health records, social networking interactions, science data, sensors, mobile phone applications, and more. Traditional collection, storage, and analytics methods were not designed to handle this vast amount of data. New technologies and algorithms have been developed in recent years to address this problem and gain insights from big data. An introduction to big data and an overview of state-of-the-art big data technologies will be presented. The content is based on the MIT Professional Education course on the same topic.