Sourcing & data ethics

A small set of house rules for how we collect.

The First Hire is built on other people's work - the careers pages of hundreds of companies. That privilege is conditional. Below is the full, plain-language version of what we do, what we refuse to do, and how to get a listing pulled.

01
Principle

Public career pages only

Front door, never the back.

We crawl pages a company has chosen to expose to the open web - /careers, /jobs, public ATS boards (Greenhouse, Lever, Ashby, Workable). Nothing behind a login, paywall, or robots.txt disallow.

  • Respect robots.txt and meta noindex on every fetch
  • Identify our crawler with a contactable User-Agent
  • Rate-limit per host, back off on 429/5xx, never flood a host with parallel requests
  • Skip pages requiring authentication, cookies, or captcha solving
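
To make the first and third rules concrete, here is a minimal Python sketch of a robots.txt check and a retry delay, using only the standard library. It is an illustration under stated assumptions, not a description of our crawler: the User-Agent string, the exact set of retryable status codes, and the delay values are all hypothetical. Meta noindex handling and per-host rate limiting are not shown.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler identity; the real User-Agent string is not given here.
USER_AGENT = "ExampleJobsBot/1.0 (+mailto:crawler@example.com)"

# Assumed reading of "429/5xx": the common retryable statuses.
RETRYABLE = {429, 500, 502, 503, 504}


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that was already fetched for a host."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def may_fetch(rules: RobotFileParser, url: str) -> bool:
    """True only if robots.txt allows our User-Agent to fetch this URL."""
    return rules.can_fetch(USER_AGENT, url)


def backoff_delay(status: int, attempt: int, base: float = 2.0, cap: float = 300.0):
    """Seconds to wait before retrying, or None if the status is not retryable.

    The delay doubles with each attempt (exponential backoff) up to `cap`.
    """
    if status not in RETRYABLE:
        return None
    return min(cap, base * (2 ** attempt))
```

A caller would check `may_fetch` before every request and sleep for `backoff_delay(status, attempt)` seconds whenever a host answers with 429 or a 5xx status.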
02
Principle

Minimal facts, not full copies

An index entry, not a mirror.

We store the structured facts a job-seeker needs to decide whether to click through: title, company, location, remote flag, salary band, posted date. We do not warehouse the prose of the job description.

  • Stored: title, company, role category, location, remote, salary range, posted date, source URL
  • Not stored: full description HTML, benefits copy, hiring-manager bios, internal team blurbs
  • A single one-line summary may be retained - under 200 characters, derived rather than copied
  • Raw HTML is discarded after parsing; only normalised fields are persisted
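
The stored/not-stored split above can be pictured as an allow-list. The Python sketch below is an assumption about shape only: the field names, the types, and the `to_record` helper are hypothetical, chosen to mirror the list of stored fields. Anything the parser emits that is not on the list is dropped before a record exists.

```python
from dataclasses import dataclass, fields
from typing import Optional

MAX_SUMMARY_CHARS = 200  # the summary must stay under this length


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical index row holding only the fields listed as stored."""
    title: str
    company: str
    role_category: str
    location: str
    remote: bool
    salary_range: Optional[str]
    posted_date: str  # assumed to be an ISO date string
    source_url: str
    summary: Optional[str] = None  # optional one-liner, derived not copied

    def __post_init__(self):
        if self.summary is not None and len(self.summary) >= MAX_SUMMARY_CHARS:
            raise ValueError("summary must stay under 200 characters")


def to_record(parsed: dict) -> JobRecord:
    """Build a record from parser output, silently dropping non-listed keys."""
    allowed = {f.name for f in fields(JobRecord)}
    return JobRecord(**{k: v for k, v in parsed.items() if k in allowed})
```

Because the allow-list is the dataclass itself, a key such as `description_html` never reaches storage unless someone deliberately adds a field for it.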
03
Principle

Always link back to the original

The company owns the canonical post.

Every row in the index is a pointer. The Apply button sends the visitor to the company's own posting - not a clone, not a re-host, not a reader-mode mirror. The traffic, the application, and the credit belong to the source.

  • Source URL stored alongside every job and surfaced as the primary CTA
  • Canonical link preserved verbatim, including UTM parameters set by the employer
  • No reader-mode, no AMP-style rewrap, no scraping of the application form itself
  • Dead links are flagged within 48 hours and removed from the live index
04
Principle

No recruiter personal data

People aren't line items.

Job posts often expose the hiring manager - name, photo, LinkedIn, direct email. We strip all of it. The First Hire lists roles, not the humans assigned to fill them.

  • Personal names, photos, phone numbers, and direct emails removed at parse time
  • Social handles (LinkedIn, X, personal sites) stripped before storage
  • Generic role mailboxes (careers@, jobs@) kept only when they are the official apply path
  • No enrichment from third-party people databases - ever
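
As an illustration of parse-time stripping, the Python sketch below removes email addresses and social profile links from a piece of text. It is a rough sketch under assumptions: the regular expressions, the list of social hosts, and the `apply_mailbox` parameter are hypothetical, and names, photos, and phone numbers need more than a pattern match, so they are left out here.

```python
import re
from typing import Optional

ROLE_MAILBOXES = {"careers", "jobs"}  # the generic mailboxes named above

EMAIL_RE = re.compile(r"\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b")
# Assumed set of social hosts; a real list would be longer.
SOCIAL_RE = re.compile(
    r"https?://(?:www\.)?(?:linkedin\.com|x\.com|twitter\.com)/\S+",
    re.IGNORECASE,
)


def strip_contacts(text: str, apply_mailbox: Optional[str] = None) -> str:
    """Remove emails and social links from text.

    A role mailbox (careers@, jobs@) survives only when it equals
    `apply_mailbox`, i.e. when it is the official apply path.
    """
    keep = apply_mailbox.lower() if apply_mailbox else None

    def _email(match: re.Match) -> str:
        address = match.group(0)
        local = match.group(1).lower()
        if keep and address.lower() == keep and local in ROLE_MAILBOXES:
            return address
        return ""

    text = SOCIAL_RE.sub("", text)
    text = EMAIL_RE.sub(_email, text)
    # Collapse the gaps left behind by removed tokens.
    return re.sub(r"\s{2,}", " ", text).strip()
```

The default is to remove every address; keeping one requires the caller to state which mailbox is the apply path.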
05
Principle

A real opt-out, honoured fast

One email, gone in a week.

If a company doesn't want to appear here, we don't argue. Send a note from a verifiable company domain and the listing - plus the crawl rule that produced it - comes down.

  • Email opt-out@thefirsthire.com with the company domain in the request
  • Acknowledged within 2 business days, removed within 7 days
  • Domain added to a permanent skip-list so it won't reappear on the next crawl
  • Sub-paths can be excluded instead of the whole domain if preferred
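
A skip-list that supports both whole domains and sub-paths could be checked as in the Python sketch below. The data layout and the `is_skipped` helper are hypothetical, and treating subdomains as covered by a domain's entry is an assumption of this sketch rather than a stated rule.

```python
from urllib.parse import urlsplit

# Hypothetical skip-list: domain -> excluded path prefixes.
# A prefix of "/" excludes the whole domain.
SKIP_LIST = {
    "example.com": ["/"],
    "example.org": ["/careers/internal/"],
}


def is_skipped(url: str, skip_list=SKIP_LIST) -> bool:
    """True if the URL falls under any opted-out domain or sub-path."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    for domain, prefixes in skip_list.items():
        # Assumption: an entry also covers subdomains such as jobs.example.com.
        if host == domain or host.endswith("." + domain):
            if any(path.startswith(prefix) for prefix in prefixes):
                return True
    return False
```

Running this check before a URL is queued, rather than after a page is fetched, is what keeps an opted-out domain from reappearing on the next crawl.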
Opt-out · Colophon

Want your roles removed?

Email us from any address on your company domain. Tell us whether you'd like the entire domain skipped or just specific sub-paths. We confirm within two business days, and the rows are gone within seven days.

  • The address: opt-out@thefirsthire.com
  • Ack window: ≤ 2 business days
  • Removal SLA: ≤ 7 days
  • Re-crawl: never
  • Cost to you: zero