How to generate XPath and CSS selectors for any website.
There are a few ways to generate selectors for a new scraping or automation project:
- Define the selectors yourself.
- Copy the selectors from your browser's DevTools.
- Run finic generate-selectors to use an LLM to predict selectors for elements you pick out.
Each option has different tradeoffs. Defining selectors yourself is the most reliable method, but also the most time-consuming. Using DevTools makes it easy to get started, but the selectors it generates depend on IDs or specific tree structures that are often brittle. Using an LLM is more reliable than DevTools and faster than writing selectors by hand, but has its own LLM-associated drawbacks (hallucinations, cost, etc.).
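To make the brittleness point concrete, here is a minimal sketch that compares a position-based XPath with an attribute-based one across two versions of a page. The pages, the data-testid attribute, and the select_text helper are all hypothetical and only illustrate the idea; the sketch uses Python's standard xml.etree.ElementTree, which supports a limited XPath subset and requires well-formed markup.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Two versions of the same hypothetical page: v2 adds a banner above the product.
PAGE_V1 = """
<html><body>
  <div class="product"><span data-testid="price">$10</span></div>
</body></html>
"""
PAGE_V2 = """
<html><body>
  <div class="banner"><span>Sale!</span></div>
  <div class="product"><span data-testid="price">$10</span></div>
</body></html>
"""

POSITIONAL = "./body/div[1]/span"            # depends on the exact tree structure
ATTRIBUTE = ".//span[@data-testid='price']"  # depends on an attribute (assumed stable)


def select_text(page: str, xpath: str) -> Optional[str]:
    """Return the text of the first element matching xpath, or None if no match."""
    root = ET.fromstring(page.strip())
    match = root.find(xpath)
    return None if match is None else match.text


for name, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
    print(name, "positional:", select_text(page, POSITIONAL),
          "| attribute:", select_text(page, ATTRIBUTE))
```

After the banner is added, the positional selector silently returns the banner text, while the attribute-based selector still finds the price.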
In practice, we found that it costs about $0.03 per selector with GPT-4o or $0.0009 per selector with GPT-4o-mini.
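As a rough budgeting aid, the per-selector figures above can be multiplied out for a whole project. This is a back-of-the-envelope sketch: the estimate_cost helper and the 200-selector project size are assumptions for illustration, and actual costs will vary with page size and current API pricing.

```python
# Approximate per-selector costs in USD, taken from the figures quoted above.
COST_PER_SELECTOR = {"gpt-4o": 0.03, "gpt-4o-mini": 0.0009}


def estimate_cost(model: str, num_selectors: int) -> float:
    """Estimate the total cost in USD of generating num_selectors selectors."""
    return COST_PER_SELECTOR[model] * num_selectors


# Hypothetical project that needs 200 selectors.
for model in COST_PER_SELECTOR:
    print(f"{model}: ${estimate_cost(model, 200):.2f} for 200 selectors")
```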
Finic’s selector generator runs entirely locally, except for a call to OpenAI or Anthropic using your API key. You can view the code here.
Running finic generate-selectors launches an interactive flow in your terminal for picking elements. At any point, the help (or h) command will list all available commands.
Your terminal will update whenever a new element is selected. Enter add (or a) to queue it up for generation. Enter list (or l) to list all queued elements.
The browser will open in inspect mode, which prevents interacting with the website. Enter mode (or m) to toggle inspect mode on and off so you can navigate the website.
The generate (or g) command will start generating selectors for the elements you’ve queued. To minimize token usage, we send HTML for only the 2 closest siblings of the target element, the parent element, and the grandparent element.
In our testing, this ends up being fewer than 5,000 tokens in most cases.
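The context trimming described above can be sketched roughly as follows. This is not Finic's actual implementation: the build_context function is hypothetical, "2 closest siblings" is interpreted here as the nearest sibling on each side, and the parent and grandparent are reduced to their tag and attributes. It uses Python's standard xml.etree.ElementTree, so it only handles well-formed markup; real pages would need a proper HTML parser.

```python
import copy
import xml.etree.ElementTree as ET


def build_context(root: ET.Element, target: ET.Element) -> str:
    """Serialize target with its nearest siblings inside slimmed-down ancestors."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    parent = parent_of.get(target)
    if parent is None:  # the target is the root element
        return ET.tostring(target, encoding="unicode")

    siblings = list(parent)
    i = siblings.index(target)
    kept = siblings[max(i - 1, 0): i + 2]  # previous sibling, target, next sibling

    # Copy of the parent with the same tag and attributes, but only the kept children.
    slim_parent = ET.Element(parent.tag, dict(parent.attrib))
    slim_parent.extend(copy.deepcopy(kept))

    grandparent = parent_of.get(parent)
    if grandparent is None:
        return ET.tostring(slim_parent, encoding="unicode")

    # The grandparent keeps only its tag and attributes, wrapping the slim parent.
    slim_grandparent = ET.Element(grandparent.tag, dict(grandparent.attrib))
    slim_grandparent.append(slim_parent)
    return ET.tostring(slim_grandparent, encoding="unicode")


# Hypothetical page used only to exercise the sketch.
PAGE = """
<main id="catalog">
  <header>Shop</header>
  <ul class="items">
    <li>A</li><li>B</li><li id="t">C</li><li>D</li><li>E</li>
  </ul>
  <footer>Contact</footer>
</main>
"""

root = ET.fromstring(PAGE.strip())
target = root.find(".//li[@id='t']")
context = build_context(root, target)
print(context)
```

Everything outside the immediate neighborhood of the target (the header, footer, and more distant list items) is dropped, which is what keeps the prompt small.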
Selectors will be saved to selectors.json in the current directory.
Keep in mind that calling this endpoint with the same selector_id will overwrite existing registered selectors.
selectors.json
with FSS in a future update.