Data Pipeline for Aviation Mapping


Documentation - Charts Rebuild

Overview

Loz Analytics rebuilt a primary data pipeline to support an aviation mapping project. Our partner company needed the data in their mapping software to be always up to date, and wanted to save time and cost while avoiding human error in preparing it. We devised a unique method of extracting the data, built a multi-step data cleansing, transformation and feature engineering module in AWS Lambda, and orchestrated it all with an AWS Step Functions state machine. We also rebuilt a complex geospatial query with Uber's H3 package, using hex-bins to calculate approximate distances and relationships between millions of points, which cut processing time fifteen-fold. The end result was a robust data pipeline that turned complex raw data into operationally ready data for the map creators.

Technical Overview

There were six main steps in our process:

  1. Extract the data from OneDrive / SharePoint and add to AWS S3
  2. Convert the proprietary data into usable relational data
  3. Transformation and cleaning
  4. Feature engineering
  5. Geospatial indexing with H3
  6. Orchestration with Lambda and Step Functions

The extraction from OneDrive and SharePoint to AWS was the first step (covered in another article, as firewall constraints made the process unique). After that, we built nine separate, modularized Python Lambda functions with custom triggers in EventBridge. Rather than pass data directly between the Lambdas, we used several S3 directories to hold the data at each step of the transformation, mainly because of the significant limitations (size, data type) on passing data between Lambdas, which we learned in our first AWS Lambda project. We used individual Lambda layers rather than containers for simplicity, since we only need a handful of custom packages, such as Uber's H3.
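The read-from-one-prefix, write-to-the-next pattern between Lambdas can be sketched as below. This is a minimal illustration, not the project's actual code: the bucket and prefix names are hypothetical, and the transformation is a stand-in for one real cleaning step.

```python
import json

# Hypothetical bucket and prefix names; the real pipeline's names are not public.
BUCKET = "charts-pipeline"
IN_PREFIX = "02-relational/"
OUT_PREFIX = "03-cleaned/"

def transform(records):
    """One step's pure transformation: drop empty rows, normalise keys."""
    cleaned = []
    for rec in records:
        if not any(rec.values()):
            continue  # skip rows with no values at all
        cleaned.append({k.strip().lower(): v for k, v in rec.items()})
    return cleaned

def handler(event, context, s3=None):
    """Lambda entry point: read the new object, transform it, and write
    the result to the next S3 prefix for the following Lambda to pick up."""
    if s3 is None:  # create the real client only inside Lambda
        import boto3
        s3 = boto3.client("s3")
    key = event["detail"]["object"]["key"]  # EventBridge S3 event shape
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    records = json.loads(body)
    out_key = OUT_PREFIX + key.removeprefix(IN_PREFIX)
    s3.put_object(Bucket=BUCKET, Key=out_key,
                  Body=json.dumps(transform(records)).encode())
    return {"written": out_key}
```

Keeping the transformation pure (no S3 calls) makes each step easy to unit test without AWS access.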
One crucial element of the process was that the Lambdas had to execute in the correct order: four sequentially, four in parallel, and then a final Lambda that builds the geospatial index. A Step Functions state machine made this easy to orchestrate, controlling the state of the run so that everything fires at the proper time, with custom EventBridge triggers determining when certain operations should run. The result is a traceable, scalable system that surfaces faults at any given point and produces thorough logs.
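The sequential-then-parallel ordering can be expressed in Amazon States Language roughly as follows. Every state name and ARN here is a placeholder for illustration, not the project's actual definition:

```json
{
  "Comment": "Illustrative sketch: 4 sequential steps, 4 parallel steps, 1 final step",
  "StartAt": "Step1",
  "States": {
    "Step1": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:Step1", "Next": "Step2" },
    "Step2": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:Step2", "Next": "Step3" },
    "Step3": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:Step3", "Next": "Step4" },
    "Step4": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:Step4", "Next": "FeatureEngineering" },
    "FeatureEngineering": {
      "Type": "Parallel",
      "Branches": [
        { "StartAt": "FeatureA", "States": { "FeatureA": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:FeatureA", "End": true } } },
        { "StartAt": "FeatureB", "States": { "FeatureB": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:FeatureB", "End": true } } },
        { "StartAt": "FeatureC", "States": { "FeatureC": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:FeatureC", "End": true } } },
        { "StartAt": "FeatureD", "States": { "FeatureD": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:FeatureD", "End": true } } }
      ],
      "Next": "BuildGeoIndex"
    },
    "BuildGeoIndex": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:BuildGeoIndex", "End": true }
  }
}
```

The `Parallel` state only advances to `BuildGeoIndex` once all four branches succeed, which is exactly the ordering guarantee the pipeline needs.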

Features

  • Building each AWS Lambda function to perform only one major task made the architecture more fault-tolerant and robust.
  • Granting the Lambdas and the state machine only the exact IAM roles and permissions they need follows the 'least privilege' security best practice.
  • Rebuilding the geospatial index with H3 reduced processing time fifteen-fold.
  • Unit tests ensure that the data is pristine and operational at all times.
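The idea behind the H3 speedup is spatial binning: index every point into a cell, then compare a point only against points in its own and neighbouring cells instead of against all points. The stdlib sketch below illustrates that principle with coarse square cells; H3 generalises the same idea to a hierarchical hexagonal grid (where `grid_disk` returns a cell's neighbours), so this is the technique in miniature, not the project's actual implementation.

```python
import math
from collections import defaultdict

def bin_key(lat, lng, cell_deg=0.5):
    """Map a point to a coarse square cell (H3 uses hexagons instead)."""
    return (int(lat // cell_deg), int(lng // cell_deg))

def neighbour_cells(key):
    """The cell plus its 8 surrounding cells (analogous to H3's grid_disk)."""
    r, c = key
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def pairs_within(points, radius_deg=0.5):
    """Find index pairs within radius by comparing only adjacent cells,
    avoiding the O(n^2) all-pairs distance scan."""
    cells = defaultdict(list)
    for i, (lat, lng) in enumerate(points):
        cells[bin_key(lat, lng, radius_deg)].append(i)
    found = set()
    for key, members in cells.items():
        for nb in neighbour_cells(key):
            for i in members:
                for j in cells.get(nb, []):
                    if i < j and math.dist(points[i], points[j]) <= radius_deg:
                        found.add((i, j))
    return found
```

With millions of points, each point is checked against a handful of cell neighbours rather than every other point, which is where the order-of-magnitude speedup comes from.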

Cost Efficiency

By transitioning to AWS Lambda, we significantly reduced cost. AWS Lambda pricing is very low (currently $0.20 per million requests, for example), and as a serverless offering it is low maintenance. Because the pipeline is fully automated, all manual labor is eliminated, saving about four hours per week. It also removes the previous burdensome system of notifying every operator to download the new data, a step that was easy to miss.
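A back-of-the-envelope calculation makes the scale of the saving concrete. The once-daily run frequency below is an assumption for illustration only, and request pricing excludes Lambda's separate duration (GB-second) charges:

```python
# Back-of-the-envelope Lambda request cost vs. manual labor saved.
# ASSUMPTION: the nine Lambdas each run once per day (illustrative only).
# Note: excludes duration (GB-second) charges, which depend on memory/runtime.
LAMBDAS = 9
RUNS_PER_DAY = 1
PRICE_PER_MILLION_REQUESTS = 0.20  # USD

invocations_per_month = LAMBDAS * RUNS_PER_DAY * 30
request_cost_per_month = invocations_per_month / 1_000_000 * PRICE_PER_MILLION_REQUESTS

hours_saved_per_year = 4 * 52  # "about four hours per week" from the write-up

print(f"requests/month: {invocations_per_month}")          # 270
print(f"request cost/month: ${request_cost_per_month:.6f}")
print(f"manual hours saved/year: {hours_saved_per_year}")  # 208
```

Even at far higher run frequencies, request charges stay in the cents, so the labor saving dominates the economics.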

Future Planning & Considerations

This data rebuild project was successful, but there were numerous issues along the way, primarily with syncing the data between OneDrive and AWS S3 (addressed in another article). Access to this data will speed up tool performance and give leadership more control over when the data is gathered and synced, freeing developers to focus on designing new tools. Our team at Loz also continues to grow more proficient at orchestrating complex workflows with AWS Lambda. Some final considerations:

  • Research and carefully weigh the pros and cons of using open source Lambda templates and pre-built layers. If the code or libraries are out of date, untangling the web of issues can be a difficult task. Sometimes the issues aren’t immediately evident.
  • Moving functionality developed locally into Lambda is not always a perfect 'lift and shift'.
  • Creating distinct roles and permissions for each Lambda may be a better method than lumping all Lambdas into the same category.
  • Take a holistic view of pricing: account for the actual cost of human error, long-term maintenance, and ease of use.