Text Extraction from PDF Using Node.js or Any Other Solution

Upwork • US • Remote

Posted on: 12th March, 2025
Employment Type: CONTRACTOR

Job Description

We are looking for an experienced Node.js developer to build a solution for extracting text from PDFs. For each piece of extracted text, the solution should identify:

1. Font family

2. Font size

3. Font color

4. Font style: bold, superscript, or subscript

The X and Y coordinates of the bounding box are needed for each sentence.
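One possible way to derive the bold, superscript, and subscript flags is sketched below. It is a heuristic under stated assumptions, not a prescribed method: the span fields (`fontName`, `fontSize`, `baseline`) are hypothetical names, coordinates are assumed to use a top-left origin with y increasing downward, and the 0.85 and 0.1 thresholds are illustrative guesses that would need tuning against the sample PDF.

```javascript
// Heuristic sketch. Assumed (hypothetical) input shapes:
//   span: { fontName, fontSize, baseline }  one run of text
//   line: { fontSize, baseline }            dominant values for the line it sits on
// Coordinates are assumed top-left based, so a smaller baseline is higher on the page.
function classifySpan(span, line) {
  // Many embedded fonts carry the weight in their name, e.g. "Helvetica-Bold".
  const bold = /bold|black|heavy/i.test(span.fontName);

  // Super/subscripts are usually set noticeably smaller than the body text.
  const smaller = span.fontSize < 0.85 * line.fontSize;
  const tolerance = 0.1 * line.fontSize;

  let script = "normal";
  if (smaller && span.baseline < line.baseline - tolerance) {
    script = "super"; // raised above the line's baseline
  } else if (smaller && span.baseline > line.baseline + tolerance) {
    script = "sub"; // lowered below the line's baseline
  }
  return { bold, script };
}

const bodyLine = { fontSize: 12, baseline: 100 };
console.log(classifySpan({ fontName: "Helvetica-Bold", fontSize: 12, baseline: 100 }, bodyLine));
console.log(classifySpan({ fontName: "Helvetica", fontSize: 7, baseline: 95 }, bodyLine));
```

Font names are not a reliable weight indicator for every PDF, so a production solution would likely combine this with font descriptor data where available.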

Extraction should identify sentences that wrap across two lines and merge them into one text box. The goal is to extract text accurately.
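The two-line merge could be approached roughly as follows. This is a minimal sketch: the line objects (`text`, `x`, `y`, `width`, `height`) and the `maxGapFactor` parameter are hypothetical, a top-left origin is assumed, and "the previous line does not end in sentence punctuation" is only one possible continuation test.

```javascript
// Smallest rectangle that covers both boxes.
function unionBox(a, b) {
  const x = Math.min(a.x, b.x);
  const y = Math.min(a.y, b.y);
  const right = Math.max(a.x + a.width, b.x + b.width);
  const bottom = Math.max(a.y + a.height, b.y + b.height);
  return { x, y, width: right - x, height: bottom - y };
}

// Merge a line into the previous box when the previous text has not ended a
// sentence and the vertical gap is small relative to the line height.
function mergeWrappedLines(lines, maxGapFactor = 0.8) {
  const sorted = [...lines].sort((a, b) => a.y - b.y || a.x - b.x);
  const boxes = [];
  for (const line of sorted) {
    const prev = boxes[boxes.length - 1];
    const gap = prev ? line.y - (prev.y + prev.height) : Infinity;
    const continues =
      prev !== undefined &&
      !/[.!?]$/.test(prev.text.trim()) &&
      Math.abs(gap) <= maxGapFactor * line.height;
    if (continues) {
      boxes[boxes.length - 1] = {
        ...unionBox(prev, line),
        text: `${prev.text} ${line.text}`,
      };
    } else {
      boxes.push({ ...line });
    }
  }
  return boxes;
}

const merged = mergeWrappedLines([
  { text: "The goal is to accurately", x: 50, y: 100, width: 200, height: 12 },
  { text: "extract text.", x: 50, y: 114, width: 90, height: 12 },
  { text: "Next sentence.", x: 50, y: 140, width: 100, height: 12 },
]);
console.log(merged);
```

In this sample the first two lines collapse into one box and the third stays separate. Multi-column pages and abbreviations such as "e.g." would need extra handling.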

Project Requirements:

PDF Layout Analysis:

Implement a layout analysis algorithm to detect text boxes or groups in the PDF.

Extract text in a grouped format based on spatial proximity, emulating the structure of text blocks visually represented in the PDF.
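Grouping by spatial proximity might look something like the sketch below. It links any two boxes whose rectangles come within `gap` units of each other and returns the connected groups. The item shape and the default `gap` of 10 are assumptions for illustration, and the pairwise comparison is quadratic, which is acceptable for a single page but not tuned for large documents.

```javascript
// True when box b overlaps box a after a is expanded by `gap` on every side.
function near(a, b, gap) {
  return (
    a.x - gap < b.x + b.width &&
    a.x + a.width + gap > b.x &&
    a.y - gap < b.y + b.height &&
    a.y + a.height + gap > b.y
  );
}

// Connected-components grouping using a small union-find structure.
function groupByProximity(items, gap = 10) {
  const parent = items.map((_, i) => i);
  const find = (i) => (parent[i] === i ? i : (parent[i] = find(parent[i])));

  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      if (near(items[i], items[j], gap)) parent[find(i)] = find(j);
    }
  }

  const groups = new Map();
  items.forEach((item, i) => {
    const root = find(i);
    if (!groups.has(root)) groups.set(root, []);
    groups.get(root).push(item);
  });
  return [...groups.values()];
}

const groups = groupByProximity([
  { text: "Left column, line 1", x: 50, y: 100, width: 200, height: 12 },
  { text: "Left column, line 2", x: 50, y: 116, width: 180, height: 12 },
  { text: "Right column", x: 400, y: 100, width: 100, height: 12 },
]);
console.log(groups.map((g) => g.map((item) => item.text)));
```

A fixed gap is the simplest choice. Scaling it by the local font size would likely behave better on pages that mix headings and body text.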

Accurate Text Grouping:

The solution should recognize logical groupings of text (e.g., paragraphs, tables, or other content blocks) rather than extracting it line by line.

Deliverables:

A Node.js-based script or application that processes a PDF and outputs text grouped by layout structure. Output format should be JSON.
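The posting does not define a JSON schema, so the shape below is purely an illustrative assumption: every field name (`page`, `blocks`, `bbox`, `font`, `script`) is hypothetical and would need to be agreed with the client.

```javascript
// Builds one page of output from already-grouped blocks.
// All field names are illustrative assumptions, not a schema from the posting.
function buildOutput(page, blocks) {
  return {
    page,
    blocks: blocks.map((block, index) => ({
      id: index,
      text: block.text,
      bbox: { x: block.x, y: block.y, width: block.width, height: block.height },
      font: {
        family: block.fontFamily,
        size: block.fontSize,
        color: block.color,
        bold: block.bold,
        script: block.script, // "normal", "super", or "sub"
      },
    })),
  };
}

const sample = buildOutput(1, [
  {
    text: "The goal is to accurately extract text.",
    x: 50, y: 100, width: 200, height: 26,
    fontFamily: "Helvetica", fontSize: 12, color: "#000000",
    bold: false, script: "normal",
  },
]);
console.log(JSON.stringify(sample, null, 2));
```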

Documentation explaining the solution's functionality and instructions for integrating it into a front-end user interface.

Additional Requirements:

Demo: A working demo on a sample PDF page must be provided to demonstrate the solution's effectiveness before the project is awarded.

Server-Ready Code: No front-end or design elements are needed. The code should function as an API, to be deployed on our server with documentation.
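A server-side entry point could be as small as the sketch below, which uses only Node's built-in `http` module. The `/extract` path, the `route` helper, and the `extractBlocks` placeholder are hypothetical names introduced for illustration. The placeholder does not parse PDFs. It only marks where the real extraction would plug in.

```javascript
const http = require("node:http");

// Hypothetical placeholder: a real implementation would parse the PDF bytes
// and return layout blocks. Here it only reports the payload size.
async function extractBlocks(pdfBuffer) {
  return { bytes: pdfBuffer.length, blocks: [] };
}

// Routing kept as a plain function so it can be exercised without a socket.
async function route(method, url, body) {
  if (method !== "POST" || url !== "/extract") {
    return { status: 404, json: { error: "Not found" } };
  }
  if (body.length === 0) {
    return { status: 400, json: { error: "Empty request body" } };
  }
  return { status: 200, json: await extractBlocks(body) };
}

const server = http.createServer(async (req, res) => {
  // Collect the raw request body (the uploaded PDF bytes).
  const chunks = [];
  for await (const chunk of req) chunks.push(chunk);
  const { status, json } = await route(req.method, req.url, Buffer.concat(chunks));
  res.writeHead(status, { "Content-Type": "application/json" });
  res.end(JSON.stringify(json));
});

// Listen only when a port is configured, so loading the file has no side effects.
if (process.env.PORT) server.listen(Number(process.env.PORT));
```

Started with `PORT=3000`, the server would accept a PDF posted as the raw request body to `/extract`. Upload size limits, authentication, and error handling are left out of this sketch.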

Attached: Sample reference PDF
