Big Data Community Challenge: Data Modeling & Analysis

How our tech communities are engaging members while solving complex problems in a fun technical way

Our Big Data Community kicked off this year with a cool challenge. We wanted to engage all community members, create some healthy competition, and learn by using any tool of our choice in a fun, technical way. With those goals in mind, the community set out to design a challenge that would engage every member.

Beyond engagement, the challenge needed to: 

  • Develop technical skills
  • Put existing technical capabilities to the test
  • Encourage learning outside members' usual areas of expertise
  • Help members discover new technologies and tap into the wide range of technologies already in use

The first challenge was a resounding success and inspired the community members to turn it into a series. Here’s an outline of that first challenge:

Car Sales Driven By Data

A car sales company wanted to extract insights that would help steer it toward more data-driven business decisions. The source dataset contained over 400,000 Craigslist car listings with features such as price, model, odometer, and transmission. The challenge was to:

  • Create a database structure that supports loading the initial dataset as well as subsequent data loads.
  • Create a data pipeline to load the dataset into that database structure (a minimal sketch of this pattern follows the list).
  • Create a dashboard to answer pertinent business questions such as:
    • What are the regional preferences or trends?
    • Which regions have classic cars in the best condition?
    • Which regions have newer cars in the best condition?
    • Which region would be the best place to open a new branch?
    • Is it worth moving cars between regions for profit?
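
As a rough illustration of the first two points, here is a minimal sketch of a repeatable load on Databricks using PySpark, a Delta table, and a MERGE upsert. The table name, column list, and landing path are assumptions based on the dataset description above, not the actual challenge deliverable or the winning team's schema.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Target table: created once, reused by the initial and every subsequent load.
spark.sql("""
    CREATE TABLE IF NOT EXISTS car_listings (
        id            BIGINT,
        region        STRING,
        price         INT,
        year          INT,
        manufacturer  STRING,
        model         STRING,
        condition     STRING,
        odometer      BIGINT,
        transmission  STRING,
        posting_date  TIMESTAMP
    ) USING DELTA
""")

# Each new batch lands in a temporary staging view with matching column types.
batch = (
    spark.read.option("header", True)
    .csv("/mnt/raw/craigslist_cars/")          # hypothetical landing path
    .select(
        F.col("id").cast("bigint"),
        F.col("region"),
        F.col("price").cast("int"),
        F.col("year").cast("int"),
        F.col("manufacturer"),
        F.col("model"),
        F.col("condition"),
        F.col("odometer").cast("bigint"),
        F.col("transmission"),
        F.col("posting_date").cast("timestamp"),
    )
)
batch.createOrReplaceTempView("staged_listings")

# Upsert so re-posted listings update in place instead of creating duplicates.
spark.sql("""
    MERGE INTO car_listings AS t
    USING staged_listings AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```

The same MERGE step handles both the initial load and later data drops, which is what makes the structure reusable for subsequent loads.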

Winning Solution

The winning team, made up of Softvisioners Luiza Maria Sava, Daniel Vasilan, Loredana Ilie, Roxana Pop, and Adrian Steau, built a data processing solution on Databricks and used Tableau to create a dashboard that answers the business questions and surfaces a number of meaningful insights.

The data ingestion flow performs sanity checks, cleanup, deduplication, enrichment, and standardization, ensuring that reliable data is fed to the reporting layer. All of these steps happen as the data passes through the Staging and EDW layers, before it reaches the ADL layer, where it is ready to be consumed by the reports.
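
For a sense of what such a step can look like, here is a hedged PySpark sketch of cleanup and deduplication between the Staging and EDW layers. The table names, thresholds, and rules are illustrative assumptions, not the winning team's actual logic.

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()

staged = spark.table("staging.car_listings")        # hypothetical staging table

cleaned = (
    staged
    # Sanity checks: drop rows missing the key and rows with implausible values.
    .where(F.col("id").isNotNull())
    .where(F.col("price").between(100, 500000))
    .where(F.col("year").between(1950, 2025))
    # Standardization: normalize free-text fields before reporting on them.
    .withColumn("manufacturer", F.lower(F.trim(F.col("manufacturer"))))
    .withColumn("condition", F.lower(F.trim(F.col("condition"))))
)

# Deduplication: keep only the most recent posting per listing id.
latest_first = Window.partitionBy("id").orderBy(F.col("posting_date").desc())
deduped = (
    cleaned
    .withColumn("rn", F.row_number().over(latest_first))
    .where(F.col("rn") == 1)
    .drop("rn")
)

# Curated output for the EDW layer, which then feeds the ADL/reporting layer.
deduped.write.format("delta").mode("overwrite").saveAsTable("edw.car_listings_clean")
```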

Do you want to be part of our monthly challenges? Join our Big Data community and participate in the next ones.