Implementation of a streaming, fault-tolerant, controllable, and secure data ingestion mechanism on top of the Hadoop Distributed File System.

Customer description:

The customer is a digital media advertising technology company that specializes in Real Time Bidding and Programmatic Media. Its recently launched Data Management Platform collects more than 75 GB of ad-serving events per day, such as clicks, impressions, and tracking data, from more than 50 systems across the web, and all of these events need to be ingested into the internal Data Lake in a controllable and secure manner.

The task was to implement a RESTful interface on top of the Hadoop Distributed File System (HDFS) that would allow ingesting data into the Data Lake in a streaming, retriable, controllable, and secure way.
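
To illustrate the idea, below is a minimal sketch of such an endpoint: a JAX-RS resource that streams the HTTP request body directly into an HDFS file. The class, URL paths, and parameter names here are illustrative assumptions, not the actual API of the delivered service.

```java
import java.io.InputStream;
import java.io.OutputStream;

import javax.ws.rs.Consumes;
import javax.ws.rs.PUT;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

@Path("/ingest")
public class IngestResource {

    @PUT
    @Path("/{dataset}/{file}")
    @Consumes(MediaType.APPLICATION_OCTET_STREAM)
    public Response ingest(@PathParam("dataset") String dataset,
                           @PathParam("file") String file,
                           InputStream body) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        // newInstance() avoids closing the JVM-wide cached FileSystem.
        try (FileSystem fs = FileSystem.newInstance(conf);
             OutputStream out = fs.create(
                 new org.apache.hadoop.fs.Path("/datalake/" + dataset + "/" + file))) {
            // Stream the request body straight into HDFS in small chunks,
            // without buffering the whole payload in memory or on local disk.
            byte[] buffer = new byte[8192];
            int read;
            while ((read = body.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return Response.status(Response.Status.CREATED).build();
    }
}
```

With such an endpoint, a producing system could upload a file with a single HTTP request, for example `curl -X PUT --data-binary @events.log https://<host>/ingest/clicks/events.log` (host and paths again hypothetical). Because a failed upload is just a failed HTTP request, the client can simply retry it, which is what makes the interface retriable.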

The solution was implemented from scratch using the Apache Hadoop Distributed File System and JAX-RS APIs in just two weeks. Data was transferred directly into HDFS according to the permissions of the authenticated user, without landing it in any intermediate storage. Additionally, the service shipped with a user interface and documentation, which made it quick and easy to start using.
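
One standard way to enforce per-user HDFS permissions from a shared service like this is Hadoop's proxy-user (impersonation) mechanism, where the service writes on behalf of the authenticated caller. The sketch below assumes that approach; the helper name and the way the caller's identity is obtained are assumptions for illustration.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.security.UserGroupInformation;

public class ImpersonatedWriter {

    /**
     * Writes the stream to HDFS as {@code remoteUser}, so the file is owned by
     * that user and normal HDFS permission checks apply to every ingested file.
     * The service principal must be allowed to impersonate callers via the
     * hadoop.proxyuser.<service>.hosts / .groups settings in core-site.xml.
     */
    public static void writeAs(String remoteUser, String hdfsPath, InputStream body)
            throws Exception {
        Configuration conf = new Configuration();
        UserGroupInformation proxy = UserGroupInformation.createProxyUser(
                remoteUser, UserGroupInformation.getLoginUser());
        proxy.doAs((PrivilegedExceptionAction<Void>) () -> {
            try (FileSystem fs = FileSystem.newInstance(conf);
                 OutputStream out = fs.create(new Path(hdfsPath))) {
                // Copy the caller's stream straight into HDFS; the 'false'
                // flag leaves closing the streams to try-with-resources.
                IOUtils.copyBytes(body, out, 4096, false);
            }
            return null;
        });
    }
}
```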