Available starting with FlowX.AI 5.6.0

Overview

The Web Page Extractor node is a workflow node that collects readable content from web page URLs. It supports static URL lists and dynamic URL generation, configurable crawling depth with link following, and adjustable scrape speed presets.
[Image: Web Page Extractor node configuration with URLs, Crawl Depth, and Scrape Speed settings]

Static or dynamic URLs

Provide a fixed list of URLs or generate them dynamically from workflow data

Link following

Optionally follow links on pages up to a configurable depth

PDF processing

Extract content from PDF files linked on the page

Scrape speed control

Choose from speed presets or define custom rate limits and concurrency

Configuration

1. Open your workflow: open your workflow in Integration Designer.
2. Add the node: add a Web Page Extractor node from the Tools category in the left panel.
3. Configure URL source and extraction settings: configure the settings described below.

URL source

URL Mode
enum
required
How URLs are provided to the node.
Mode      Description
Static    Provide a fixed list of up to 20 URLs
Dynamic   Generate URLs from a workflow data key using ${expression} syntax

Default: Static
URLs
string[]
List of URLs to extract content from. Only available when URL Mode is Static.
Maximum: 20 URLs
URLs must use http:// or https:// protocol. Supports ${variable} placeholders for dynamic values.

A workflow data key or expression that resolves to a URL at runtime. Only available when URL Mode is Dynamic.
Example: ${inputData.targetUrl}
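To make the ${expression} semantics concrete, here is a minimal Python sketch of how a placeholder like ${inputData.targetUrl} could resolve against workflow data. The actual FlowX.AI resolution engine is internal and may behave differently; this only illustrates the dotted-path lookup described above.

```python
import re

def resolve_placeholders(template: str, workflow_data: dict) -> str:
    """Replace ${dotted.path} placeholders with values from workflow data.

    Illustrative only -- a sketch of the ${expression} syntax, not the
    FlowX.AI implementation.
    """
    def lookup(match: re.Match) -> str:
        value = workflow_data
        for part in match.group(1).split("."):
            value = value[part]  # raises KeyError if the path is missing
        return str(value)

    return re.sub(r"\$\{([^}]+)\}", lookup, template)

# Example from the docs: ${inputData.targetUrl} resolved at runtime.
url = resolve_placeholders(
    "${inputData.targetUrl}",
    {"inputData": {"targetUrl": "https://example.com/docs"}},
)
```

Placeholders can also be embedded in a longer string (as in the Static mode's ${variable} support), since the substitution is per-match.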

Crawl depth

Follow Links
boolean
When turned on, the extractor follows links found on the page up to the configured depth.
Default: OFF
Max Depth
number
How many levels of links to follow from the starting page. Only applies when Follow Links is turned on.
Range: 0–10
Default: 0
Process Linked PDFs
boolean
When turned on, extracts content from PDF files linked on the page.
Default: OFF
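Depth-limited link following behaves like a breadth-first traversal: depth 0 visits only the starting pages, depth 1 also visits pages they link to, and so on. The sketch below illustrates that semantics with a stubbed fetch_links function; real extraction, PDF handling, and URL normalization are left out.

```python
from collections import deque

def crawl(start_urls, fetch_links, max_depth=0):
    """Depth-limited breadth-first link following.

    `fetch_links(url)` stands in for fetching a page and returning the
    links found on it. max_depth=0 visits only the starting pages,
    matching the Max Depth default described above.
    """
    visited = set()
    queue = deque((url, 0) for url in start_urls)
    order = []
    while queue:
        url, depth = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        if depth < max_depth:
            for link in fetch_links(url):
                if link not in visited:
                    queue.append((link, depth + 1))
    return order

# Stub link graph standing in for real pages (hypothetical URLs).
links = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
visited_pages = crawl(["a"], lambda u: links.get(u, []), max_depth=1)
```

Note how quickly the visited set grows with depth, which is why the best practices below recommend keeping Max Depth low.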

Scrape speed

Scrape Speed Preset
enum
required
Controls how aggressively the node requests pages from the target server.
Preset     Description
Slow       Conservative rate limiting; best for fragile or rate-limited servers
Moderate   Balanced speed and reliability
Fast       Aggressive crawling; assumes the target server can handle high traffic
Custom     Define your own rate limit and concurrency

Default: Moderate
Rate Limit
number
Maximum requests per second. Only available when Scrape Speed Preset is Custom.
Default: 2
Concurrency
number
Number of concurrent requests. Only available when Scrape Speed Preset is Custom.
Default: 3
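Rate Limit and Concurrency control two different things: how many requests may start per second, and how many may be in flight at once. A minimal asyncio sketch of that interaction, with a hypothetical fake_fetch standing in for real HTTP requests (this is not the FlowX.AI scheduler, just an illustration of the Custom preset's two knobs):

```python
import asyncio

async def fetch_all(urls, fetch, rate_limit=2, concurrency=3):
    """Fetch URLs with at most `concurrency` requests in flight and at
    most `rate_limit` new requests started per second.

    Defaults mirror the Custom preset defaults documented above.
    """
    sem = asyncio.Semaphore(concurrency)   # caps in-flight requests
    interval = 1.0 / rate_limit            # spacing between request starts
    results = {}

    async def worker(url, start_delay):
        await asyncio.sleep(start_delay)   # pace request starts
        async with sem:
            results[url] = await fetch(url)

    await asyncio.gather(*(worker(u, i * interval) for i, u in enumerate(urls)))
    return results

async def fake_fetch(url):
    """Stand-in for a real HTTP fetch (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"content of {url}"

results = asyncio.run(
    fetch_all(["https://example.com/a", "https://example.com/b"],
              fake_fetch, rate_limit=10, concurrency=3)
)
```

With the Slow preset the effective interval between request starts is longer; with Fast, shorter. The semaphore is what prevents a burst of slow responses from piling up open connections.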

Response key

responseKey
string
required
The key where extracted content is stored in the workflow data.
Example: extractedContent

Timeout and retry

Timeout
number
Request timeout in milliseconds. If the extraction exceeds this duration, the node fails.
Retry Config
object
Optional retry strategy for failed requests.
Field                Description                                   Default
Retry Type           Fixed or Exponential backoff                  –
Max Attempts         Maximum retry attempts                        2
Backoff Period       Delay between retries (ms)                    1000
Max Backoff Period   Maximum delay for exponential backoff (ms)    120000
Backoff Multiplier   Multiplier for exponential backoff            2
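The retry fields above combine as follows: each retry waits the backoff period, which under exponential backoff is multiplied after every attempt and capped at the maximum. A small sketch of that arithmetic, using the documented defaults (this models the delay schedule only, not the node's actual retry machinery):

```python
def backoff_delays(max_attempts=2, backoff_ms=1000, multiplier=2,
                   max_backoff_ms=120000, exponential=True):
    """Delay in ms before each retry attempt.

    Defaults match the Retry Config table above; `exponential=False`
    models the Fixed retry type (a constant delay every attempt).
    """
    delays = []
    delay = backoff_ms
    for _ in range(max_attempts):
        delays.append(min(delay, max_backoff_ms))  # cap at Max Backoff Period
        if exponential:
            delay *= multiplier                    # apply Backoff Multiplier
    return delays
```

For example, four exponential attempts with the defaults wait 1000, 2000, 4000, then 8000 ms, while long schedules flatten out once they hit the 120000 ms cap.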

Best practices

Start with Moderate speed

Use the Moderate preset unless you know the target server’s capacity. Switch to Fast only for internal or robust servers.

Limit crawl depth

Keep Max Depth low (1–3) to avoid excessive page requests. Deep crawls can be slow and may trigger rate limiting.

Use dynamic URLs for runtime flexibility

When the target URL comes from user input or a previous workflow step, use Dynamic mode with ${expression} placeholders.

Set timeouts for external sites

Always configure a timeout when crawling external websites to avoid blocking the workflow on slow or unresponsive servers.

Extract Data from File

Extract text and data from documents and images

AI node types

Overview of all AI workflow node types

Integration Designer

Build and manage integration workflows
Last modified on March 25, 2026