Enhanced JS/DOM Crawling

General

Supervisor: Dominik Noß

Start: now

Further details:

Description

Are you interested in JavaScript, HTML and the DOM? Do you enjoy coding and digging into data, or working with graphs and graph databases? Then this bachelor thesis is a good fit for you. The thesis consists of implementing enhancements to the NDS DOM Crawler software project. We can agree on a selection from the following list of tasks:

  • Centralized logging to a server, which simplifies debugging. (difficulty: easy)
  • Detailed inspection of getters and setters. Any property in JavaScript can be defined with these accessor functions, which determine its value when it is read or written. Collect and store as much information about them as possible. (medium)
  • Adding export as native GraphML (an XML format) and directly into Neo4j (via AJAX). This is needed to make the crawler more usable. (easy)
  • Performance and stability enhancements. (easy to medium)
  • Finding hidden properties, functions and features in JavaScript. How can you find something JavaScript doesn’t want you to find? (challenging)
  • Calling simple functions. Which DOM functions can be called safely? (medium)
  • Crawling different contexts, e.g. chrome://, extensions, workers. (medium)
  • Modifying the source code of Chromium or Firefox in order to expose internal data to the crawler. (challenging)
  • Using the browser's heap export feature to enrich the graph data with memory addresses. (medium)
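
To give a rough idea of what the getter/setter task involves, the following is a minimal sketch of how accessor properties could be discovered by walking an object's prototype chain. It is not code from the NDS DOM Crawler: the helper name collectAccessors, the shape of the returned records and the Sample class are assumptions made for illustration. Only the standard built-ins Reflect.ownKeys, Object.getOwnPropertyDescriptor and Object.getPrototypeOf are used.

```javascript
// Hypothetical helper, not part of the NDS DOM Crawler.
// Walks the prototype chain of `obj` and records every accessor property.
function collectAccessors(obj) {
  const found = [];
  let depth = 0; // 0 = the object itself, 1 = its prototype, and so on
  for (let current = obj; current !== null; current = Object.getPrototypeOf(current)) {
    // Reflect.ownKeys also returns non-enumerable and symbol keys.
    for (const key of Reflect.ownKeys(current)) {
      const desc = Object.getOwnPropertyDescriptor(current, key);
      // Accessor descriptors carry get/set instead of value/writable.
      if (desc && (typeof desc.get === 'function' || typeof desc.set === 'function')) {
        found.push({
          name: String(key),
          depth,
          hasGetter: typeof desc.get === 'function',
          hasSetter: typeof desc.set === 'function',
          enumerable: desc.enumerable,
          configurable: desc.configurable,
        });
      }
    }
    depth += 1;
  }
  return found;
}

// Stand-in object for illustration; in a browser this could be a DOM node.
class Sample {
  get readOnly() { return 1; }
  get both() { return this._v; }
  set both(v) { this._v = v; }
}

const report = collectAccessors(new Sample());
console.log(report);
```

The descriptor is read without invoking the getter itself, so the inspection has no side effects on the inspected object. A real crawler would likely also want to store the function source or native-code marker of each accessor, which this sketch leaves out.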

If you are interested in a more detailed discussion, please send an e-mail to dominik.noss@rub.de.

Requirements

  • You are quite good at JavaScript
  • Given a={};b={};c=a; you know the results of a===a; a===b; a===c;
  • You know the XML file format
  • Familiarity with the DOM, HTTP, HTML and graphs, or the desire to learn them
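
For reference, the comparison from the second requirement can be checked with a few lines of JavaScript. This is only a small sketch; the variable name results is an assumption, and const declarations are used in place of the implicit globals in the snippet above.

```javascript
// Two distinct object literals and one alias of the first.
const a = {};
const b = {};
const c = a;

// === on objects compares references, not contents.
const results = {
  aa: a === a, // same reference
  ab: a === b, // two separate objects, even though both are empty
  ac: a === c, // c points to the same object as a
};

console.log(results); // { aa: true, ab: false, ac: true }
```

The same reference semantics matter for the crawler, since two graph nodes should be merged only when they refer to the same underlying object.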