
Udyam Registration Form Clone — Scrape, Render, Validate & Persist (Full-Stack)

Source: Derived from Resources/Openbiz__Assignment.pdf — a real take-home assignment from Openbiz for a full-stack / backend developer role. The candidate must recreate the first two steps of India's government Udyam (MSME) registration portal (https://udyamregistration.gov.in/UdyamRegistration.aspx).

Skills Required

Background a Student Needs

You should be comfortable with the full request/response lifecycle of a web app: an HTML form in the browser, an HTTP POST to a backend, server-side validation, and a row written to a database. You need basic familiarity with regular expressions (for PAN/Aadhaar formats), with reading the DOM in browser dev-tools, and with at least one frontend framework (React/Next.js) and one backend stack (Express or FastAPI). Knowing what the Indian Udyam/MSME registration is helps for context, but the real lesson is treating a form as data — scraping its definition, storing that definition as JSON, and driving both the UI and the validation from that single source of truth.
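
As a rough illustration of the regular-expression background mentioned above, the following sketch checks the two identifier formats. The patterns are assumptions based on the commonly documented formats (Aadhaar as 12 digits, PAN as five uppercase letters, four digits, and one uppercase letter); they are not quoted from the assignment PDF, and the function names are hypothetical.

```typescript
// Assumed formats, not taken from the assignment itself:
//   Aadhaar: exactly 12 digits
//   PAN:     5 uppercase letters, 4 digits, 1 uppercase letter
const AADHAAR_PATTERN = /^\d{12}$/;
const PAN_PATTERN = /^[A-Z]{5}\d{4}[A-Z]$/;

// Hypothetical helper: removes whitespace first so "1234 5678 9012" passes.
function isValidAadhaar(input: string): boolean {
  return AADHAAR_PATTERN.test(input.replace(/\s+/g, ""));
}

// Hypothetical helper: trims surrounding whitespace but stays case-sensitive.
function isValidPan(input: string): boolean {
  return PAN_PATTERN.test(input.trim());
}
```

These are format checks only: they do not verify a checksum or confirm that the number was ever issued, so a real submission would still need the OTP step or a server-side lookup.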

Task Summary

Recreate the first two steps of the official Udyam registration form — Aadhaar + OTP validation, then PAN validation — as a polished, fully responsive web app. First scrape the real portal to capture every field, label, and validation rule into a JSON schema; then build a React/Next.js UI that renders dynamically from that schema with real-time validation; finally back it with a REST API that re-validates and persists submissions in PostgreSQL. Bonus credit for PIN-code auto-fill, a step progress tracker, tests, Docker, and live deployment.
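
To make the "render and validate from one schema" idea concrete, here is a minimal sketch of what a scraped step definition and a schema-driven validator might look like. The interface shapes, field names, labels, and error messages are all hypothetical; the real ones would come from whatever the scraper extracts from the portal.

```typescript
// Hypothetical schema shape for one form step.
interface FieldSchema {
  name: string;
  label: string;
  type: "text" | "select" | "checkbox";
  required: boolean;
  pattern?: string; // regex source kept as a string so it survives JSON
  errorMessage?: string;
}

interface StepSchema {
  step: number;
  title: string;
  fields: FieldSchema[];
}

// Illustrative data only; not copied from the live Udyam page.
const aadhaarStep: StepSchema = {
  step: 1,
  title: "Aadhaar Verification",
  fields: [
    {
      name: "aadhaarNumber",
      label: "Aadhaar Number",
      type: "text",
      required: true,
      pattern: "^\\d{12}$",
      errorMessage: "Enter a 12-digit Aadhaar number.",
    },
    { name: "ownerName", label: "Name of Entrepreneur", type: "text", required: true },
  ],
};

type FormValues = Record<string, string | undefined>;

// Returns a map of field name -> error message; an empty map means valid.
function validateStep(schema: StepSchema, values: FormValues): Record<string, string> {
  const errors: Record<string, string> = {};
  for (const field of schema.fields) {
    const value = (values[field.name] ?? "").trim();
    if (value === "") {
      if (field.required) errors[field.name] = `${field.label} is required.`;
      continue; // empty optional fields skip the pattern check
    }
    if (field.pattern && !new RegExp(field.pattern).test(value)) {
      errors[field.name] = field.errorMessage ?? `${field.label} is invalid.`;
    }
  }
  return errors;
}
```

Because `validateStep` depends only on the schema object, the same function could in principle be imported by the UI for real-time feedback and by the API for re-validation, which is one way to keep the two from drifting apart.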

The Task

1. Web Scraping (Steps 1 & 2)

2. Responsive UI Development

3. Backend Implementation

4. Testing

5. Deployment (Bonus)

Evaluation Criteria

Area | Key Metrics
Scraping | Accuracy of extracted fields; handling dynamic content if any.
UI/UX | Pixel-perfect responsiveness, intuitive error messages, smooth transitions.
Backend | REST API correctness, validation logic, database schema design.
Code Quality | Clean architecture, modular code, proper comments, Git practices.
Testing | Coverage of edge cases (invalid Aadhaar, empty fields, etc.).

Alternate Tasks (Mini-Project Variations)

  1. (Beginner) Static two-step Udyam form, hand-coded. Build just the frontend: a hard-coded two-step form (Aadhaar + OTP, then PAN) in plain React or HTML/CSS/JS, with the PAN and Aadhaar regex validations and inline error messages, plus a "Step 1 of 2" indicator. Skip scraping, the backend, and the database entirely — fake the OTP step with a hard-coded code. This is the ideal warm-up because it isolates the single most important full-stack skill a junior is judged on: building a clean, responsive, accessible form with correct client-side validation. It teaches controlled inputs, regex validation, multi-step UI state, and mobile-first CSS without the cognitive load of scraping or a database.
  2. (Beginner–Intermediate) Scrape the form into a JSON schema. Focus only on the scraping half. Using BeautifulSoup or Puppeteer, extract every field, label, dropdown option, and validation pattern from the first two Udyam steps and emit a clean, well-typed schema.json. Write a short README documenting which fields are present in the static HTML and which are rendered or changed by JavaScript. This is a key gotcha on .aspx pages: ASP.NET Web Forms pages commonly update fields through JavaScript-driven postbacks, so a plain HTTP fetch can miss content that a headless browser would see. This exercise teaches DOM inspection, the difference between static and dynamic pages, and, most valuably, the mindset of treating a UI as structured data you can serialize, which is the conceptual backbone of the whole assignment.
  3. (Intermediate) Schema-driven renderer + persistence (the core assignment, trimmed). Take the schema.json from variation 2 and build a generic FormRenderer component that reads the schema and outputs the right widget for each field type, applying the regex validations from the schema rather than from hard-coded logic. Wire it to a minimal Express + Prisma (or FastAPI + SQLAlchemy) backend that re-validates against the same rules and writes to PostgreSQL, returning 201 on success and 400 on bad input. This is the heart of the real task and teaches the single-source-of-truth principle: one schema drives the UI, the client validation, and the server validation, so the form can change without touching component code.
  4. (Intermediate–Advanced, MERN twist) Multi-form builder SaaS on the MERN stack. Generalize the project away from Udyam into a small "Google Forms"-style app on MongoDB + Express + React + Node. An admin UI lets a user define a form's fields and validation rules; those definitions are stored as documents in MongoDB; an end-user route renders any saved form dynamically and submits responses; and a dashboard lists collected submissions. This twists the assignment toward a real MERN product and teaches CRUD across two resource types (form definitions and responses), schema-versioning concerns, authentication for the admin area, and the same schema-driven rendering pattern at a larger scale.
  5. (Advanced, Agentic AI twist) Agent that auto-generates the form from a URL. Build an agentic pipeline that, given any government/registration form URL, uses a headless browser tool plus an LLM to (a) fetch and read the page, (b) reason about each field's purpose and infer the correct validation regex, and (c) emit a validated JSON schema ready for the renderer from variation 3 — with a verification step where the agent re-checks its own output against the live DOM and flags low-confidence fields for human review. This replaces brittle hand-written scrapers with an LLM-driven extractor and teaches tool-use orchestration, structured output with schema validation, prompt design for extraction, and the critical agentic skill of self-verification and confidence reporting rather than blind trust in model output.
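
The server-side half of variation 3 (re-validate, then answer 201 or 400) can be sketched without committing to Express or FastAPI. The version below is a framework-agnostic illustration: the rule shape, the field names `panNumber` and `panHolderName`, the error strings, and the injected `save` callback are all assumptions made for the example, and the in-memory array merely stands in for PostgreSQL.

```typescript
// Hypothetical validation rules, as they might be derived from schema.json.
interface Rule {
  required: boolean;
  pattern?: string; // regex source as a string
}
type Rules = Record<string, Rule>;

interface ApiResult {
  status: 201 | 400;
  body: { id?: number; errors?: Record<string, string> };
}

// Illustrative rules for the PAN step; names and pattern are assumptions.
const panStepRules: Rules = {
  panNumber: { required: true, pattern: "^[A-Z]{5}\\d{4}[A-Z]$" },
  panHolderName: { required: true },
};

// Re-validates the payload against the rules and persists only on success.
// `save` is injected so the sketch does not depend on a particular ORM.
function handleSubmission(
  rules: Rules,
  payload: Record<string, unknown>,
  save: (row: Record<string, string>) => number,
): ApiResult {
  const errors: Record<string, string> = {};
  const row: Record<string, string> = {};
  for (const [name, rule] of Object.entries(rules)) {
    const raw = payload[name];
    const value = typeof raw === "string" ? raw.trim() : "";
    if (value === "") {
      if (rule.required) errors[name] = "required";
      continue;
    }
    if (rule.pattern && !new RegExp(rule.pattern).test(value)) {
      errors[name] = "invalid format";
      continue;
    }
    row[name] = value; // only fields named in the rules are persisted
  }
  if (Object.keys(errors).length > 0) return { status: 400, body: { errors } };
  return { status: 201, body: { id: save(row) } };
}

// In-memory stand-in for the database; the returned id is the new row count.
const store: Record<string, string>[] = [];
const saveInMemory = (row: Record<string, string>): number => store.push(row);
```

In a real Express route the returned `status` and `body` would be passed to the response object, and `saveInMemory` would be replaced by an ORM insert. Keeping the decision logic in a pure function like this also makes the edge cases listed under Testing straightforward to cover with unit tests.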

Reference Links