
Jake Drew Ph.D.'s Blog (9)

Polymorphic Malware Detection Using Sequence Classification Methods

A PDF version of this document, created using LaTeX, can be downloaded by clicking here.

Abstract



Polymorphic malware detection is challenging due to the continual mutations miscreants introduce to successive instances of a particular virus. Such changes are akin to mutations in biological sequences. Recently, high-throughput methods for gene sequence…
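The biological-sequence analogy can be illustrated with a minimal sketch. It is an assumption for illustration, not the method from the article. The sketch treats a binary as a byte string and breaks it into overlapping k-mers (length-k substrings, a representation common in gene sequence analysis). It then assigns an unknown sample to the family whose known samples it overlaps most. The function names, k = 4, and the average-Jaccard scoring rule are all hypothetical choices.

```python
def kmers(seq: bytes, k: int = 4) -> set[bytes]:
    """Return the set of overlapping length-k substrings of a byte sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(sample: bytes, labeled: dict[str, list[bytes]], k: int = 4) -> str:
    """Hypothetical rule: pick the family whose reference samples share the
    most k-mers with the sample, averaged over that family's references."""
    sample_kmers = kmers(sample, k)
    best_label, best_score = None, -1.0
    for label, refs in labeled.items():
        score = sum(jaccard(sample_kmers, kmers(r, k)) for r in refs) / len(refs)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy byte strings standing in for two malware families (fabricated for illustration).
families = {
    "family_a": [b"\x90\x90\xeb\x05PAYLOAD_A_v1", b"\x90\x90\xeb\x05PAYLOAD_A_v2"],
    "family_b": [b"\x55\x89\xe5HELLO_B_first", b"\x55\x89\xe5HELLO_B_second"],
}
mutated = b"\x90\x90\xeb\x05PAYLOAD_A_v3"  # a "mutated" instance of family_a
print(classify(mutated, families))
```

Because a small mutation changes only the few k-mers that span the edited bytes, most of a variant's k-mers still match its family. Exact-match signatures have no such tolerance.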


Added by Jake Drew Ph.D. on May 22, 2016 at 6:32pm — No Comments

Finding Whales in Ocean Water: Edge Detection, Blob Processing, and RGB Channels in C#

A Visual Studio 2013 demo project including all of the code in this article can be downloaded using the links in the resources section below.

I was recently tasked with trying to isolate whales within an image of ocean water using C#.  While building machine learning models, the question was raised: “Does ocean water add noise in a model to detect a specific whale while it is swimming in the ocean?”  This was the primary…


Added by Jake Drew Ph.D. on January 11, 2016 at 9:00pm — No Comments

Mining Web Pages in Parallel

A Visual Studio 2013 demo project including the WebpageDownloader and LinkCrawler can be downloaded here.

Introduction

The US digital universe currently doubles in size approximately every three years [1]. In fact, Hewlett-Packard estimates that by the end of this decade, the digital universe will be measured in ‘Brontobytes’, which…


Added by Jake Drew Ph.D. on March 18, 2015 at 7:00pm — 1 Comment

Mass Compromise of IIS Shared Web Hosting for Blackhat SEO: A Case Study

Jake Drew, Marie Vasek, and Tyler Moore

Computer Science and Engineering Department, Southern Methodist University, Dallas, TX, USA

{jdrew, mvasek, tylerm}@smu.edu

The LaTeX publication PDF for this article can be downloaded …


Added by Jake Drew Ph.D. on March 9, 2015 at 9:30pm — No Comments

Clustering Similar Images Using MapReduce Style Feature Extraction with C# and R

Abstract



This article provides a full demo application that uses the C# and R programming languages interchangeably to rapidly identify and cluster similar images. The demo application includes a directory of 687 webpage screenshots. Many of these images are very similar, with different domain names but nearly identical content. Others are only slightly similar: the sites use the same general layouts but different colors and different images on certain…


Added by Jake Drew Ph.D. on June 25, 2014 at 4:00pm — No Comments

Automatic Identification of Replicated Criminal Websites Using Combined Clustering Methods

The following publication was presented at the 2014 IEEE International Workshop on Cyber Crime and received the Best Paper Award on May 18, 2014. The original IEEE LaTeX-formatted PDF publication can also be downloaded here: IWCC Combined Clustering.…


Added by Jake Drew Ph.D. on May 19, 2014 at 6:30pm — No Comments

Practical Applications of Locality Sensitive Hashing for Unstructured Data

Introduction

The purpose of this article is to demonstrate how a practical data scientist can implement a Locality Sensitive Hashing system from start to finish in order to drastically reduce the search time typically required to find similar items in high-dimensional spaces. Locality Sensitive Hashing accomplishes this efficiency by greatly reducing the amount of data that must be stored when collecting features for comparison between similar…
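One common way to get both the storage reduction and the faster search described above is to combine MinHash signatures with banding. The sketch below is an assumption about how such a system can be built, not the article's code. Character 3-gram shingles, 20 seeded BLAKE2b hash functions, and 5 bands of 4 rows are illustrative parameters, and every function name is hypothetical. Each item's full feature set shrinks to a fixed-length signature. Only items that agree on every row of at least one band become candidate pairs for a full comparison.

```python
import hashlib
from collections import defaultdict


def shingles(text: str, k: int = 3) -> set[str]:
    """Character k-grams used as the (hypothetical) feature set of one item."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minhash_signature(features: set[str], num_hashes: int = 20) -> list[int]:
    """Keep only the minimum hash value under each of num_hashes seeded hashes.

    The signature has a fixed length regardless of how many features the item
    has, so it is stored instead of the full feature set. Needs >= 1 feature.
    """
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{f}".encode(), digest_size=8).digest(), "big"
            )
            for f in features
        )
        for seed in range(num_hashes)
    ]


def estimated_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(signatures: dict[str, list[int]], bands: int = 5) -> set[tuple[str, str]]:
    """Split signatures into bands; items identical on any whole band share a bucket."""
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs.add((first, second))
    return pairs


pages = {
    "site1": "cheap watches free shipping best prices",
    "site2": "cheap watches free shipping best prices!",
    "site3": "annual report of the city water department",
}
sigs = {name: minhash_signature(shingles(text)) for name, text in pages.items()}
print(candidate_pairs(sigs))
```

The band/row split is the main tuning knob. More bands with fewer rows each surface more candidate pairs (higher recall). Fewer bands with more rows each filter out more dissimilar pairs before any full comparison is made.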


Added by Jake Drew Ph.D. on May 8, 2014 at 9:00am — No Comments

MapReduce / Map Reduction Strategies Using C#

A Brief History of Map Reduction



Map and reduce functions can be traced all the way back to functional programming languages such as Haskell and its polymorphic map function, fmap. Even before fmap there was the Haskell …
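The map/reduce pattern itself fits in a few lines. The following is a minimal Python sketch, not the article's C# code. It uses the classic word-count example. The shuffle step groups intermediate pairs by key, standing in for the work a MapReduce framework performs between the two phases. All function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: fold all values for one key into a single result."""
    return key, reduce(lambda acc, v: acc + v, values, 0)


def word_count(documents: list[str]) -> dict[str, int]:
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


print(word_count(["the cat sat", "The dog sat"]))  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

Each call to `map_phase` depends only on its own document, and each call to `reduce_phase` depends only on its own key. That independence is what lets both phases be spread across threads or machines.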


Added by Jake Drew Ph.D. on March 31, 2014 at 6:48am — No Comments

Machine Learning in Parallel with Support Vector Machines, Generalized Linear Models, and Adaptive Boosting

Introduction

This article describes machine learning methods that use bootstrap samples and parallel processing to model very large volumes of data in short periods of time. The R programming language includes many packages for applying machine learning to different types of data. Three of these packages provide Support Vector Machines (SVM) [1], Generalized Linear Models (GLM) [2], and Adaptive Boosting (AdaBoost) [3]. While all three packages can be highly accurate for…
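The combination of bootstrap samples and parallel training can be sketched without any R packages. This is a hypothetical Python illustration, not the article's R code. A toy one-feature nearest-centroid learner stands in for SVM, GLM, or AdaBoost. Each model trains on its own bootstrap sample (rows drawn with replacement), and the ensemble predicts by majority vote. A thread pool keeps the example self-contained. For CPU-heavy learners, a process pool would be the usual substitute in Python.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def bootstrap_sample(data: list, rng: random.Random) -> list:
    """Draw len(data) rows uniformly with replacement."""
    return [data[rng.randrange(len(data))] for _ in range(len(data))]


def train_centroids(sample: list[tuple[float, str]]) -> dict[str, float]:
    """Toy base learner (hypothetical stand-in): mean feature value per class."""
    sums: dict[str, float] = {}
    counts: Counter = Counter()
    for x, label in sample:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model: dict[str, float], x: float) -> str:
    """Predict the class whose centroid is nearest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


def fit_bagged(data: list, n_models: int = 8, seed: int = 0) -> list[dict[str, float]]:
    """Train n_models learners on independent bootstrap samples in parallel.

    Each task gets its own seeded Random, so results are reproducible and no
    random-number state is shared between threads.
    """
    rngs = [random.Random(seed + i) for i in range(n_models)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda rng: train_centroids(bootstrap_sample(data, rng)), rngs))


def predict_bagged(models: list[dict[str, float]], x: float) -> str:
    """Combine the ensemble by majority vote."""
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]


data = [(i * 0.1, "low") for i in range(10)] + [(10 + i * 0.1, "high") for i in range(10)]
models = fit_bagged(data)
print(predict_bagged(models, 0.5), predict_bagged(models, 10.5))
```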


Added by Jake Drew Ph.D. on March 19, 2014 at 9:10am — 4 Comments


© 2019 Data Science Central ®
