Udyam Registration Form Clone — Scrape, Render, Validate & Persist (Full-Stack)
Source: Derived from Resources/Openbiz__Assignment.pdf — a real take-home assignment from Openbiz for a full-stack / backend developer role. The candidate must recreate the first two steps of India's government Udyam (MSME) registration portal (https://udyamregistration.gov.in/UdyamRegistration.aspx).
Skills Required
- Web scraping & DOM inspection — extracting input fields, labels, dropdown options, button states, and client-side validation rules from a live ASP.NET (`.aspx`) page.
- Scraping tooling — Python (`BeautifulSoup`, Scrapy) or JavaScript (Puppeteer, Cheerio); understanding when a page is static HTML vs. dynamically rendered (and needing a headless browser).
- Schema design — converting a scraped form into a structured, machine-readable JSON schema (field name, type, label, regex, required, options).
- Modern frontend — React or Next.js, TypeScript preferred; component-driven UI; controlled inputs.
- Dynamic / schema-driven form rendering — building a renderer that reads JSON and outputs the correct widgets instead of hard-coding each field.
- Responsive, mobile-first CSS — flexbox/grid, media queries, accessible form layout.
- Client-side validation — regex matching (PAN `[A-Za-z]{5}[0-9]{4}[A-Za-z]{1}`, 12-digit Aadhaar), real-time inline error messaging, OTP-step UX.
- REST API design — Node.js (Express + Prisma) or Python (FastAPI + SQLAlchemy); request validation; meaningful HTTP status codes (400 vs 201).
- Relational data modeling — PostgreSQL schema mirroring the Udyam fields; migrations via Prisma/SQLAlchemy.
- Third-party API integration — PIN-code lookup to auto-fill city/state (bonus).
- Testing — unit tests for validation logic and endpoints (Jest or pytest), with edge-case coverage.
- DevOps (bonus) — Docker, plus deploying frontend (Vercel/Netlify) and backend (Railway/Render/Heroku).
- Engineering hygiene — clean modular architecture, comments, sensible Git commit history.
Background a Student Needs
You should be comfortable with the full request/response lifecycle of a web app: an HTML form in the browser, an HTTP POST to a backend, server-side validation, and a row written to a database. You need basic familiarity with regular expressions (for PAN/Aadhaar formats), with reading the DOM in browser dev-tools, and with at least one frontend framework (React/Next.js) and one backend stack (Express or FastAPI). Knowing what the Indian Udyam/MSME registration is helps for context, but the real lesson is treating a form as data — scraping its definition, storing that definition as JSON, and driving both the UI and the validation from that single source of truth.
Task Summary
Recreate the first two steps of the official Udyam registration form — Aadhaar + OTP validation, then PAN validation — as a polished, fully responsive web app. First scrape the real portal to capture every field, label, and validation rule into a JSON schema; then build a React/Next.js UI that renders dynamically from that schema with real-time validation; finally back it with a REST API that re-validates and persists submissions in PostgreSQL. Bonus credit for PIN-code auto-fill, a step progress tracker, tests, Docker, and live deployment.
The Task
1. Web Scraping (Steps 1 & 2)
- Goal: Identify all input fields, labels, validation rules (e.g., PAN/Aadhaar formats), and UI components (dropdowns, buttons) from the first two steps of the Udyam portal.
- Scope: Scrape only the first two steps — (a) Aadhaar number + OTP validation and (b) PAN validation — at https://udyamregistration.gov.in/UdyamRegistration.aspx.
- Tools: Python (`BeautifulSoup`/Scrapy) or JavaScript (Puppeteer/Cheerio).
- Output: A structured JSON schema describing each field (name, type, label, placeholder, regex/validation, required flag, and any dropdown option lists).
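As an illustration of the output described above, here is a minimal sketch of what the schema could look like when typed in TypeScript. The property names (`name`, `pattern`, `options`, and so on), the step titles, and the 6-digit OTP pattern are assumptions chosen for the example; the real labels, placeholders, and rules must come from the scrape itself.

```typescript
// Hypothetical shape for one scraped field. The property names follow the
// attributes the brief lists; they are not the portal's own identifiers.
type FieldType = "text" | "tel" | "select" | "checkbox";

interface FieldSchema {
  name: string;         // stable key shared by the UI, API, and database
  type: FieldType;
  label: string;
  placeholder?: string;
  pattern?: string;     // regex source kept as a string so it survives JSON
  required: boolean;
  options?: string[];   // only meaningful for dropdowns
}

interface StepSchema {
  step: number;
  title: string;
  fields: FieldSchema[];
}

// Illustrative values only. The OTP length in particular is an assumption.
const udyamSchema: StepSchema[] = [
  {
    step: 1,
    title: "Aadhaar Verification",
    fields: [
      { name: "aadhaarNumber", type: "tel", label: "Aadhaar Number", pattern: "^[0-9]{12}$", required: true },
      { name: "otp", type: "tel", label: "OTP", pattern: "^[0-9]{6}$", required: true },
    ],
  },
  {
    step: 2,
    title: "PAN Verification",
    fields: [
      { name: "panNumber", type: "text", label: "PAN", pattern: "^[A-Za-z]{5}[0-9]{4}[A-Za-z]{1}$", required: true },
    ],
  },
];

// Serializing confirms the schema is plain data that can be written to schema.json.
const schemaJson: string = JSON.stringify(udyamSchema, null, 2);
```

Storing each regex as a string rather than a `RegExp` object keeps the schema serializable, so the same file can be loaded by the frontend and the backend.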
2. Responsive UI Development
- Replicate the Udyam form layout with a mobile-first approach and 100% responsiveness.
- Use React/Next.js (TypeScript preferred) — or vanilla HTML/CSS/JS if needed.
- Implement dynamic form rendering driven by the scraped JSON schema (do not hard-code each field).
- Add real-time validation, e.g. PAN format `[A-Za-z]{5}[0-9]{4}[A-Za-z]{1}`, 12-digit Aadhaar, and an OTP step.
- Bonus UI enhancements:
- Auto-fill suggestions for city/state from PIN code (use a PIN-lookup API).
- A progress tracker showing Steps 1 & 2.
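The real-time validation requirement can be sketched as a pair of pure functions that read their rules from schema entries instead of hard-coding them per field. The names `FieldRule`, `validateField`, and `validateStep` are hypothetical, and the error wording is a placeholder. Note that the PAN pattern from the brief is wrapped in `^...$` here, on the assumption that a partial match should not count as valid.

```typescript
// Hypothetical subset of a schema entry: only what validation needs.
interface FieldRule {
  name: string;
  label: string;
  pattern?: string; // regex source from the schema
  required: boolean;
}

// Returns an error message for one field, or null when the value is acceptable.
function validateField(rule: FieldRule, rawValue: string): string | null {
  const value = rawValue.trim();
  if (value === "") {
    return rule.required ? `${rule.label} is required.` : null;
  }
  if (rule.pattern !== undefined && !new RegExp(rule.pattern).test(value)) {
    return `${rule.label} is not in the expected format.`;
  }
  return null;
}

// Validates a whole step. The result maps field name to message and
// contains entries only for the fields that failed.
function validateStep(
  rules: FieldRule[],
  values: Record<string, string>
): Record<string, string> {
  const errors: Record<string, string> = {};
  for (const rule of rules) {
    const message = validateField(rule, values[rule.name] ?? "");
    if (message !== null) {
      errors[rule.name] = message;
    }
  }
  return errors;
}

// Anchors are added so that extra leading or trailing characters fail.
const stepTwoRules: FieldRule[] = [
  { name: "panNumber", label: "PAN", pattern: "^[A-Za-z]{5}[0-9]{4}[A-Za-z]{1}$", required: true },
];
```

In a React component, `validateField` could be called from an input's change handler and its result stored in state to drive the inline error message; because it has no framework dependency, the same function is also easy to unit test.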
3. Backend Implementation
- Build a REST API (Node.js or Python) that:
- Validates incoming form data against the scraped rules.
- Stores submissions in PostgreSQL.
- Design a database schema matching the Udyam form fields.
- Tools: Node.js → Express + Prisma ORM; or Python → FastAPI + SQLAlchemy.
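A framework-agnostic sketch of the submit path described above: re-validate on the server, return 400 with per-field errors, or persist and return 201. Everything named here (`Submission`, `SubmissionStore`, `handleSubmit`, `MemoryStore`) is hypothetical. In a real build the handler would sit behind an Express or FastAPI route and the store would be backed by Prisma or SQLAlchemy on PostgreSQL; the in-memory store exists only so the example runs on its own.

```typescript
interface Submission {
  aadhaarNumber: string;
  panNumber: string;
}

type ValidationResult =
  | { ok: true; data: Submission }
  | { ok: false; errors: Record<string, string> };

interface ApiResponse {
  status: number;
  body: { id?: number; errors?: Record<string, string> };
}

// Hypothetical persistence boundary; a database-backed class would implement it.
interface SubmissionStore {
  save(data: Submission): Promise<number>;
}

// Server-side copies of the rules. Ideally these are loaded from the scraped schema.
const RULES: Record<keyof Submission, RegExp> = {
  aadhaarNumber: /^[0-9]{12}$/,
  panNumber: /^[A-Za-z]{5}[0-9]{4}[A-Za-z]{1}$/,
};

// Never trust the client: treat the payload as unknown and check every field.
function validateSubmission(payload: unknown): ValidationResult {
  const input: Record<string, unknown> =
    typeof payload === "object" && payload !== null
      ? (payload as Record<string, unknown>)
      : {};
  const errors: Record<string, string> = {};
  const clean: Partial<Submission> = {};
  for (const key of Object.keys(RULES) as (keyof Submission)[]) {
    const value = input[key];
    if (typeof value === "string" && RULES[key].test(value)) {
      clean[key] = value;
    } else {
      errors[key] = `${key} is missing or malformed`;
    }
  }
  if (Object.keys(errors).length > 0) {
    return { ok: false, errors };
  }
  return { ok: true, data: clean as Submission };
}

// 400 for invalid input, 201 once the row has been stored.
async function handleSubmit(payload: unknown, store: SubmissionStore): Promise<ApiResponse> {
  const result = validateSubmission(payload);
  if (!result.ok) {
    return { status: 400, body: { errors: result.errors } };
  }
  const id = await store.save(result.data);
  return { status: 201, body: { id } };
}

// In-memory stand-in for the database, for local experiments only.
class MemoryStore implements SubmissionStore {
  readonly rows: Submission[] = [];
  async save(data: Submission): Promise<number> {
    this.rows.push(data);
    return this.rows.length;
  }
}
```

Keeping validation in a synchronous function separate from the async handler means the rules can be tested without a database, and the handler can be tested with a fake store.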
4. Testing
- Write unit tests for:
- Form validation logic (e.g., an invalid PAN triggers an error).
- API endpoints (e.g., `POST /submit` returns 400 for invalid data).
- Tools: Jest (JavaScript) or pytest (Python).
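One way to organize the validation tests is as a table of edge cases, sketched below without any test framework so it runs on its own. The `Case` type, the `runCases` helper, and the sample inputs are all hypothetical; in a Jest suite the same table could feed a parameterized test instead.

```typescript
// Anchored versions of the formats named in the brief.
const PAN = /^[A-Za-z]{5}[0-9]{4}[A-Za-z]{1}$/;
const AADHAAR = /^[0-9]{12}$/;

interface Case {
  label: string;
  rule: RegExp;
  input: string;
  expected: boolean; // should the input be accepted?
}

// Each row is one edge case; adding coverage means adding a row.
const cases: Case[] = [
  { label: "well-formed PAN", rule: PAN, input: "ABCDE1234F", expected: true },
  { label: "PAN ending in a digit", rule: PAN, input: "ABCDE12345", expected: false },
  { label: "PAN with four leading letters", rule: PAN, input: "ABCD1234F", expected: false },
  { label: "empty PAN", rule: PAN, input: "", expected: false },
  { label: "12-digit Aadhaar", rule: AADHAAR, input: "123412341234", expected: true },
  { label: "11-digit Aadhaar", rule: AADHAAR, input: "12341234123", expected: false },
  { label: "Aadhaar with spaces", rule: AADHAAR, input: "1234 1234 1234", expected: false },
  { label: "Aadhaar with a letter", rule: AADHAAR, input: "12341234123A", expected: false },
];

// Returns a description of every case whose outcome differs from the expectation.
function runCases(table: Case[]): string[] {
  return table
    .filter((c) => c.rule.test(c.input) !== c.expected)
    .map((c) => `FAILED: ${c.label}`);
}

const failures: string[] = runCases(cases);
```

An empty `failures` array means every case behaved as expected. The spaced Aadhaar row is worth keeping: users often type the number in groups of four, so the UI has to decide whether to strip spaces before validating or reject them.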
5. Deployment (Bonus)
- Dockerize the application.
- Deploy frontend (Vercel/Netlify) and backend (Heroku/Railway).
Evaluation Criteria
| Area | Key Metrics |
|---|---|
| Scraping | Accuracy of extracted fields; handling dynamic content if any. |
| UI/UX | Pixel-perfect responsiveness, intuitive error messages, smooth transitions. |
| Backend | REST API correctness, validation logic, database schema design. |
| Code Quality | Clean architecture, modular code, proper comments, Git practices. |
| Testing | Coverage of edge cases (invalid Aadhaar, empty fields, etc.). |
Alternate Tasks (Mini-Project Variations)
- (Beginner) Static two-step Udyam form, hand-coded. Build just the frontend: a hard-coded two-step form (Aadhaar + OTP, then PAN) in plain React or HTML/CSS/JS, with the PAN and Aadhaar regex validations and inline error messages, plus a "Step 1 of 2" indicator. Skip scraping, the backend, and the database entirely — fake the OTP step with a hard-coded code. This is the ideal warm-up because it isolates the single most important full-stack skill a junior is judged on: building a clean, responsive, accessible form with correct client-side validation. It teaches controlled inputs, regex validation, multi-step UI state, and mobile-first CSS without the cognitive load of scraping or a database.
- (Beginner–Intermediate) Scrape the form into a JSON schema. Focus only on the scraping half. Using BeautifulSoup or Puppeteer, extract every field, label, dropdown option, and validation pattern from the first two Udyam steps and emit a clean, well-typed `schema.json`. Write a short README documenting which fields are static HTML versus rendered by JavaScript (a key gotcha on .aspx pages, which often need a headless browser rather than a simple HTTP fetch). This exercise teaches DOM inspection, the difference between static and dynamic pages, and, most valuably, the mindset of treating a UI as structured data you can serialize, which is the conceptual backbone of the whole assignment.
- (Intermediate) Schema-driven renderer + persistence (the core assignment, trimmed). Take the `schema.json` from variation 2 and build a generic FormRenderer component that reads the schema and outputs the right widget for each field type, applying the regex validations from the schema rather than from hard-coded logic. Wire it to a minimal Express + Prisma (or FastAPI + SQLAlchemy) backend that re-validates against the same rules and writes to PostgreSQL, returning 201 on success and 400 on bad input. This is the heart of the real task and teaches the single-source-of-truth principle: one schema drives the UI, the client validation, and the server validation, so the form can change without touching component code.
- (Intermediate–Advanced, MERN twist) Multi-form builder SaaS on the MERN stack. Generalize the project away from Udyam into a small "Google Forms"-style app on MongoDB + Express + React + Node. An admin UI lets a user define a form's fields and validation rules; those definitions are stored as documents in MongoDB; an end-user route renders any saved form dynamically and submits responses; and a dashboard lists collected submissions. This twists the assignment toward a real MERN product and teaches CRUD across two resource types (form definitions and responses), schema-versioning concerns, authentication for the admin area, and the same schema-driven rendering pattern at a larger scale.
- (Advanced, Agentic AI twist) Agent that auto-generates the form from a URL. Build an agentic pipeline that, given any government/registration form URL, uses a headless browser tool plus an LLM to (a) fetch and read the page, (b) reason about each field's purpose and infer the correct validation regex, and (c) emit a validated JSON schema ready for the renderer from variation 3 — with a verification step where the agent re-checks its own output against the live DOM and flags low-confidence fields for human review. This replaces brittle hand-written scrapers with an LLM-driven extractor and teaches tool-use orchestration, structured output with schema validation, prompt design for extraction, and the critical agentic skill of self-verification and confidence reporting rather than blind trust in model output.
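The schema-driven renderer that several of these variations rely on can be sketched without React by emitting HTML strings, which keeps the example self-contained. The `FieldSchema` shape, the `renderField` and `renderForm` helpers, and the `organisationType` dropdown are hypothetical; a real FormRenderer component would return JSX elements from the same kind of type-based branching.

```typescript
// Hypothetical schema entry, reduced to what rendering needs.
interface FieldSchema {
  name: string;
  type: "text" | "tel" | "select";
  label: string;
  pattern?: string;   // HTML's pattern attribute must match the whole value
  required: boolean;
  options?: string[]; // used only when type is "select"
}

// Escape schema text before placing it in markup, since it came from a scrape.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Picks the widget from the field type instead of hard-coding each field.
function renderField(field: FieldSchema): string {
  const id = escapeHtml(field.name);
  const label = `<label for="${id}">${escapeHtml(field.label)}</label>`;
  const required = field.required ? " required" : "";
  if (field.type === "select") {
    const options = (field.options ?? [])
      .map((o) => `<option value="${escapeHtml(o)}">${escapeHtml(o)}</option>`)
      .join("");
    return `${label}<select id="${id}" name="${id}"${required}>${options}</select>`;
  }
  const pattern = field.pattern ? ` pattern="${escapeHtml(field.pattern)}"` : "";
  return `${label}<input id="${id}" name="${id}" type="${field.type}"${pattern}${required}>`;
}

function renderForm(fields: FieldSchema[]): string {
  return `<form>${fields.map(renderField).join("")}</form>`;
}

// Illustrative fields only; the dropdown is invented to exercise the select branch.
const sampleFields: FieldSchema[] = [
  { name: "panNumber", type: "text", label: "PAN", pattern: "[A-Za-z]{5}[0-9]{4}[A-Za-z]{1}", required: true },
  { name: "organisationType", type: "select", label: "Type of Organisation", required: false, options: ["Proprietary", "Partnership"] },
];

const formHtml: string = renderForm(sampleFields);
```

Adding a new field type then means adding one branch to `renderField`, while adding a new field means editing only the schema.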
Reference Links