Task 2: Scrape & filter accounts

This document explains the workflow and components of our web scraping system, which is designed to gather and process data efficiently from a single source website. Let's walk through each stage.

Developer Documentation

Overview

The Moti Scrape & Filter Task is designed to collect data from a specific source website using user credentials, process this data through various modules, and then manage the data submission, auditing, and distribution.

Components

  1. User Input

    • Data: User Name and Password

    • Function: Users provide their login credentials, which are necessary for accessing the source website.

  2. Source Website

    • Function: The target website from which data will be scraped. The website requires a user profile to access the data.

  3. Moti Scraper Task

    • The core part of the system where the data scraping and processing happens.
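As a minimal sketch, the user-input component could be modeled as a small credentials container. The class and field names here are illustrative assumptions; the real task may read these values from a login form or environment variables instead.

```python
from dataclasses import dataclass

# Hypothetical container for the user input described above.
@dataclass(frozen=True)
class Credentials:
    username: str
    password: str

# Example values only; real credentials should never be hard-coded.
creds = Credentials(username="demo-user", password="s3cret")
```

Freezing the dataclass keeps credentials immutable once collected, so later pipeline stages cannot accidentally modify them.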

Inside the Moti Scraper Task

  1. Crawler

    • Function: The crawler navigates the source website using the user profile provided. It collects raw data from the website.

  2. Adapter

    • Function: The adapter transforms the raw data into a standardized format suitable for further processing. The adapter acts as an intermediary between the crawler and the subsequent components.

  3. Submission

    • Function: After the data is standardized by the adapter, it is sent to the submission module. This module prepares the data for storage or further processing.

  4. Audit

    • Function: The audit module reviews the submitted data for accuracy and completeness. It ensures that the data meets the required quality standards before distribution.

  5. Distribution

    • Function: The final stage where the processed and audited data is distributed to the relevant endpoints or systems that will use this data.
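The stages above could be given a common shape so they compose cleanly. This is a hedged sketch, not the Moti API: the `Stage` protocol, method names, and record fields are all assumptions made for illustration.

```python
from typing import Protocol

# Assumed common interface for pipeline stages; the real module
# boundaries inside the Moti Scraper Task may differ.
class Stage(Protocol):
    def run(self, data: list[dict]) -> list[dict]: ...

class Adapter:
    """Normalizes raw crawler records into one standard shape."""
    def run(self, data: list[dict]) -> list[dict]:
        return [
            {"id": d.get("id"), "payload": d.get("body", ""), "status": "pending"}
            for d in data
        ]

class Audit:
    """Drops records that fail basic completeness checks."""
    def run(self, data: list[dict]) -> list[dict]:
        return [d for d in data if d["id"] is not None and d["payload"]]

# A malformed record (missing id) is normalized, then filtered out by audit.
raw = [{"id": 1, "body": "ok"}, {"body": "missing id"}]
clean = Audit().run(Adapter().run(raw))
```

Because every stage shares the same `run` signature, new checks can be inserted between adapter and audit without touching either side.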

Data Flow

  1. User Input: Users provide their login credentials.

  2. Crawler: Uses the credentials to access the source website and scrape raw data.

  3. Adapter: Transforms the raw data into a unified format.

  4. Submission: Prepares the standardized data for quality checks.

  5. Audit: Ensures data integrity and quality.

  6. Distribution: Delivers the final data to the designated endpoints.
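The six steps above can be sketched end to end as a chain of stub functions. Everything here is a stand-in: the function names, record shapes, and the fake scraped data are assumptions, not the real implementation.

```python
# Illustrative end-to-end flow; each function is a stub for the
# corresponding module in the Moti Scraper Task.

def crawl(profile: dict) -> list[str]:
    # Crawler stand-in: pretend to scrape three raw records.
    return [f"raw:{profile['username']}:item{i}" for i in range(3)]

def adapt(raw: list[str]) -> list[dict]:
    # Adapter: normalize each raw record into a standard dict.
    return [{"source": r, "status": "pending"} for r in raw]

def submit(records: list[dict]) -> list[dict]:
    # Submission: mark records as ready for quality checks.
    return [{**r, "submitted": True} for r in records]

def audit(records: list[dict]) -> list[dict]:
    # Audit: keep only complete, submitted records.
    return [r for r in records if r["source"] and r["submitted"]]

def distribute(records: list[dict]) -> int:
    # Distribution stand-in: deliver records; here we just count them.
    return len(records)

delivered = distribute(audit(submit(adapt(crawl({"username": "demo"})))))
# → 3 records delivered
```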

API Endpoints

  • POST Endpoint: Used to submit new (pending) endorsements.

  • GET Request: Used to fetch and display endorsements, optionally segmented by audit status.
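As a sketch of how a client might call these endpoints, the example below builds (without sending) a POST and a GET request. The base URL, paths, and field names are assumptions; substitute the real Moti endpoints.

```python
import requests

# Hypothetical base URL; replace with the real Moti API host.
BASE = "https://moti.example.com/api"

# POST: submit a new pending endorsement.
post_req = requests.Request(
    "POST",
    f"{BASE}/endorsements",
    json={"source": "scraper", "status": "pending"},
).prepare()

# GET: fetch endorsements segmented by audit status.
get_req = requests.Request(
    "GET",
    f"{BASE}/endorsements",
    params={"audit_status": "approved"},
).prepare()
```

Preparing the requests without calling `send()` makes it easy to inspect the final URL and body before wiring the client into the pipeline.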

Conclusion

The Moti Scraper Task is a robust system designed to streamline the process of collecting, processing, and distributing endorsements from a specific website. By breaking down the task into specialized components, the system ensures that data is handled efficiently and accurately. This documentation should help you understand the workflow and effectively utilize the system for the hackathon.

If you have any questions or need further assistance, please don't hesitate to reach out to our support team. Happy hacking!
