Compare commits

...

3 Commits

6 changed files with 816 additions and 17 deletions

@@ -138,7 +138,7 @@ def fetch_vdaal() -> list[RawListing]:
- `_text(soup, selector)` — Get inner text from element
- `_src(soup, selector)` — Get src or data-src attribute
- `_extract_postcode(text)` — Regex postcode from any text
- `_infer_stad(postcode)` — Simple lookup: 2600–2629 → Delft, 3100–3135 → Schiedam (Den Haag is not covered by this helper; use the city value supplied by the broker directly)
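Below is a minimal sketch of the range lookup described for `_infer_stad`. It assumes the helper compares the 4-digit numeric part of the postcode against inclusive ranges. `infer_stad_sketch` and `_STAD_RANGES` are hypothetical names, and the real helper is not part of this diff.

```python
import re
from typing import Optional

# Hypothetical table: inclusive 4-digit postcode ranges, copied from the list above.
_STAD_RANGES: list[tuple[int, int, str]] = [
    (2600, 2629, "Delft"),
    (3100, 3135, "Schiedam"),
]


def infer_stad_sketch(postcode: Optional[str]) -> Optional[str]:
    """Map a Dutch postcode to a city via its numeric part; None if unknown."""
    if not postcode:
        return None
    m = re.match(r"\s*(\d{4})", postcode)
    if not m:
        return None
    number = int(m.group(1))
    for low, high, stad in _STAD_RANGES:
        if low <= number <= high:
            return stad
    # Not covered (e.g. Den Haag): fall back to the broker's own city value.
    return None


print(infer_stad_sketch("2611 CA"))  # Delft
```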
---
@@ -203,19 +203,40 @@ Secrets (API keys, webhook URLs) are **environment variables**, not in config.
---
## Platform / CMS Quick Identification
Before investigating a broker's HTML manually, check for known platforms in this order:
### 1. OG Online / realtime-listings (API — fastest)
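Check whether `https://<base>/nl/realtime-listings/consumer` returns JSON when requested with the header `X-Requested-With: XMLHttpRequest`. If it does, the adapter is a near copy of an existing OG Online function in `api.py` (e.g. `fetch_elzenaar`): only the base URL, source name and city set change. Known brokers: bjornd, moerman, vandaal, elzenaar, doen.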
Fields: `isSales`, `statusOrig`, `salesPrice`, `address`, `zipcode`, `city`, `rooms`, `bedrooms`, `livingSurface`, `plotSurface`, `dateOfConstruction`, `energyLabel`, `type`, `photo`, `url`.
Add a `_CITIES` set to filter by city if the broker covers a wide area. Skip statuses `"rented"` and `"rented_ur"`.
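The OG Online adapters differ only in base URL, source name and city set, so the filtering can be written once. The sketch below assumes the JSON has already been fetched into a list of dicts. `filter_og_online` is a hypothetical helper that returns plain dicts instead of the project's `RawListing`, and `https://www.example.nl` is a placeholder.

```python
from typing import Any, Optional

# Status handling as used by the OG Online adapters in api.py.
OG_SKIP_STATUSES = {"rented", "rented_ur"}
OG_STATUS_MAP = {
    "available": "beschikbaar",
    "under_bid": "onder_bod",
    "under_option": "onder_bod",
    "sold": "verkocht",
    "sold_ur": "verkocht",
}


def filter_og_online(
    items: list[dict[str, Any]],
    base: str,
    cities: Optional[set[str]],
    max_price: int,
) -> list[dict[str, Any]]:
    """Keep sale listings in the wanted cities at or below max_price (hypothetical helper)."""
    out: list[dict[str, Any]] = []
    for item in items:
        if not item.get("isSales"):
            continue
        if item.get("statusOrig") in OG_SKIP_STATUSES:
            continue
        # cities=None means a single-city broker: keep every city
        if cities is not None and item.get("city") not in cities:
            continue
        # A null salesPrice counts as 0 so the comparison cannot raise TypeError
        price = item.get("salesPrice") or 0
        if price > max_price:
            continue
        out.append({
            "url": base + item["url"],
            "status": OG_STATUS_MAP.get(item.get("statusOrig") or "", "beschikbaar"),
            "stad": item.get("city") or None,
            "prijs": price or None,
        })
    return out


sample = [
    {"isSales": True, "statusOrig": "available", "city": "Den Haag",
     "salesPrice": 400000, "url": "/nl/aanbod/1"},
    {"isSales": True, "statusOrig": "rented", "city": "Den Haag",
     "salesPrice": 300000, "url": "/nl/aanbod/2"},
    {"isSales": True, "statusOrig": "sold", "city": "Leiden",
     "salesPrice": 350000, "url": "/nl/aanbod/3"},
]
print(filter_og_online(sample, "https://www.example.nl", {"Den Haag"}, 500000))
```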
### 2. Realworks CMS (SSR — one-liner)
Run `autoscraper.py` or check HTML for `li.aanbodEntry`. If detected:
```python
def fetch_mybroker() -> list[RawListing]:
    return fetch_realworks("https://www.mybroker.nl", "mybroker")
```
### 3. SURE WordPress Plugin (SSR — ~50 lines)
Check HTML for `sure-` CSS classes or `?sure_koop_huur=koop` filter. Two card variants:
- `a.card-house` (single dash) — e.g. Olsthoorn
- `a.card--house` (double dash) — e.g. Borgdorff
Both use `?sure_koop_huur=koop` to filter buy listings and `/page/{N}/` pagination. Detail page always has `#kenmerken li span span` pairs with labels like `status`, `soort woonhuis`/`soort woning`/`soort bouw`, `bouwjaar`, `gebruiksoppervlakte wonen`, `perceeloppervlakte`, `aantal slaapkamers`, `energielabel`. Postcode is often **not** available on the detail page.
Terminate pagination when `len(cards) < expected_per_page` (typically 15 for SURE).
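The SURE pagination rule can be sketched without any HTTP client. Page 1 lives at `/wonen?sure_koop_huur=koop`, later pages at `/wonen/page/{N}/?sure_koop_huur=koop`, and the loop stops on a short page. `sure_page_url` and `iter_sure_pages` are hypothetical names. The `fetch_cards` callable stands in for the project's `fetch_soup(...).select(...)`. The `/wonen` path is the one the Olsthoorn and Borgdorff adapters use and may differ for other brokers.

```python
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def sure_page_url(base: str, page: int) -> str:
    """Listing URL for a SURE site; page 1 has no /page/ segment."""
    if page == 1:
        return f"{base}/wonen?sure_koop_huur=koop"
    return f"{base}/wonen/page/{page}/?sure_koop_huur=koop"


def iter_sure_pages(
    base: str,
    fetch_cards: Callable[[str], list[T]],
    per_page: int = 15,
) -> Iterator[list[T]]:
    """Yield each page's cards; stop on an empty or short page without an extra request."""
    page = 1
    while True:
        cards = fetch_cards(sure_page_url(base, page))
        if not cards:
            return
        yield cards
        if len(cards) < per_page:
            return
        page += 1


# Fake fetcher with 15 + 15 + 4 cards, so exactly three pages are requested.
fake = {
    sure_page_url("https://b.nl", 1): list(range(15)),
    sure_page_url("https://b.nl", 2): list(range(15)),
    sure_page_url("https://b.nl", 3): list(range(4)),
}
pages = list(iter_sure_pages("https://b.nl", lambda url: fake.get(url, [])))
print([len(p) for p in pages])  # [15, 15, 4]
```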
### 4. Unknown CMS
Run the autoscraper tool from the project root:
```bash
python autoscraper.py listings <listings-url>
python autoscraper.py details <detail-page-url>
```
If the broker uses a known CMS, the tool prints the exact code to add — no further investigation needed. Currently detected CMSes:
- **Realworks** → prints a ready-to-paste `fetch_realworks(...)` one-liner for `ssr.py`
If the CMS is unknown, the tool prints structural diagnostics (card selectors, field patterns, pagination) to guide manual adapter development.
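Below is a rough approximation of the HTML marker checks above, written as plain regex tests on a listings page's HTML. `guess_platform` is hypothetical and does not reflect how `autoscraper.py` works internally, since its code is not shown here. OG Online is left out because it is detected by probing the JSON endpoint, not from the HTML.

```python
import re

# Hypothetical heuristics built only from the markers listed above.
_REALWORKS_RE = re.compile(r"<li[^>]*class=[\"'][^\"']*\baanbodEntry\b", re.IGNORECASE)
_SURE_CLASS_RE = re.compile(r"class=[\"'][^\"']*\b(sure-|card-house\b|card--house\b)")


def guess_platform(html: str) -> str:
    """Return 'realworks', 'sure' or 'unknown' for a listings page's HTML."""
    if _REALWORKS_RE.search(html):
        return "realworks"
    # SURE: the koop/huur filter parameter or a sure-/card-house class
    if "sure_koop_huur" in html or _SURE_CLASS_RE.search(html):
        return "sure"
    return "unknown"


print(guess_platform('<ul><li class="aanbodEntry">...</li></ul>'))  # realworks
```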
## Important Notes
@@ -240,6 +261,13 @@ status = _STATUS_MAP.get(item.get("status"), "beschikbaar")
### Postcode Extraction
Always aim for the **Dutch postcode format** (4 digits + 2 letters, e.g., `"2611CA"`). The travel time calculation depends on it. If a broker only provides the address string, use `_extract_postcode(address)`.
If a postcode field contains extra text (e.g., `"2522GW Den Haag"`), extract cleanly with:
```python
m = re.search(r"\d{4}\s*[A-Z]{2}", raw.upper())
postcode = m.group(0).replace(" ", "") if m else None
```
Never just `.replace(" ", "")` — that produces garbage like `"2522GWDenHaag"`.
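Sometimes the postcode and city arrive in one string, as on Post Makelaardij's cards (e.g. `"2613 BD Delft"`). In that case the same regex idea can also yield the city. The sketch below uses a hypothetical `split_postcode_city` helper. Unlike a whitespace `split()`, it also handles the `"2613BD Delft"` form and does not mistake `"2611 Delft"` for a postcode.

```python
import re
from typing import Optional

# 4 digits (not preceded by another digit), optional space, 2 letters, rest = city
_PC_RE = re.compile(r"(?<!\d)(\d{4})\s*([A-Za-z]{2})\b\s*(.*)")


def split_postcode_city(raw: Optional[str]) -> tuple[Optional[str], Optional[str]]:
    """Split "2613 BD Delft" into ("2613BD", "Delft"); (None, None) if no postcode."""
    if not raw:
        return None, None
    m = _PC_RE.search(raw)
    if not m:
        return None, None
    postcode = (m.group(1) + m.group(2)).upper()
    city = m.group(3).strip() or None
    return postcode, city


print(split_postcode_city("2522GW Den Haag"))  # ('2522GW', 'Den Haag')
```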
### Price Handling
Prices are **integers** (euros), never floats. Use `parse_prijs()` for HTML.
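`parse_prijs()` itself is not shown in this diff. To illustrate the integer-euro rule, here is a hypothetical `parse_prijs_sketch` that handles common Dutch price formatting (`.` as thousands separator, suffixes such as `k.k.` or `v.o.n.`). It might look like this:

```python
import re
from typing import Optional


def parse_prijs_sketch(text: Optional[str]) -> Optional[int]:
    """Turn "€ 450.000 k.k." into 450000; None when no price is present."""
    if not text:
        return None
    # First number, either with '.' thousands separators or as plain digits
    m = re.search(r"\d{1,3}(?:\.\d{3})+|\d+", text)
    if not m:
        return None
    return int(m.group(0).replace(".", ""))


print(parse_prijs_sketch("€ 1.250.000 v.o.n."))  # 1250000
```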
@@ -272,7 +300,8 @@ The database stores this as JSON in the `extra` column.
- Nominatim (geocoding) has a 1 req/s limiter built into `huizenbot.py`
- Never spawn parallel requests without the human's approval
- Always use the `USER_AGENT` header (includes contact info for respectful scraping)
- Don't keep curling the same endpoint: save the response once to `<broker-name>.dump` and `rg` through that file for what you need. You can also pipe it through `bsprettify.py` first and `rg` the prettified output.
- Don't over-investigate pagination — confirm card count on page 1, assume it's consistent across pages, move on. Never fetch multiple pages just to verify the per-page count.
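The 1 req/s Nominatim limiter lives in `huizenbot.py` and is not shown here. As a sketch of the idea, here is a minimal interval limiter based on `time.monotonic()`. It assumes a single-threaded caller, which matches the no-parallel-requests rule, and the class name is hypothetical.

```python
import time
from typing import Optional


class MinIntervalLimiter:
    """Block so that consecutive wait() calls return at least `interval` seconds apart."""

    def __init__(self, interval: float = 1.0) -> None:
        self.interval = interval
        self._last: Optional[float] = None  # monotonic time of the previous call

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Call `wait()` right before each geocoding request to keep requests at least one second apart. The limiter stores only a single timestamp, so it is not safe for concurrent callers.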
---

@@ -1,4 +1,4 @@
# Selling agents: Delft, Leiden, Den Haag & Schiedam
## Delft
@@ -13,13 +13,17 @@
| [x] | ZO makelaars | zomakelaars.nl | Van Foreestweg 4 |
| [ ] | Marloes Makelaars | — | Maerten Trompstraat 28 |
| [ ] | Makelaarskantoor J.E. Mouthaan | — | Julianalaan 43 |
| [x] | Olsthoorn Makelaars Delft | olsthoornmakelaars.nl | Noordeinde 51 |
| [x] | Post Makelaardij (v/h Bayense) | postmakelaardij.nl | Spoorsingel 1a |
| [x] | Morris NVM Makelaars | morrismakelaardij.nl | — |
| [ ] | Prinsenstad Makelaardij | — | — |
| [ ] | Oude Delft Makelaardij | — | — |
| [ ] | Dijksman Woningmakelaars | — | — |
| [ ] | CORPOwonen | — | — |
| [ ] | Bergklis Makelaars | bergklis.nl | — |
| [ ] | Van Gulden Makelaardij | vanguldenmakelaardij.nl | Zaïrestraat 1 |
| [ ] | Van der Togt Makelaardij | vdtmakelaardij.nl | — (Voorburg, active in Delft) |
## Schiedam
@@ -38,6 +42,19 @@
| [x] | Schieland Borsboom NVM Makelaars | schielandborsboom.nl | (Rotterdam, active in Schiedam) |
## Den Haag
| Done | Name | Website | Address |
|------|------|---------|---------|
| [skip] | Yuvam Makelaardij | yuvammakelaardij.nl | — (connection refused) |
| [x] | 88 Makelaars | 88makelaars.nl | — |
| [skip] | DIVA Makelaars | divamakelaars.nl | — (only Maartensdijk, not Den Haag) |
| [x] | Elzenaar NVM Makelaars | elzenaar.com | — |
| [skip] | Frisia Makelaars | frisiamakelaars.nl | — (SPA/Vue, no API) |
| [x] | Borgdorff Makelaars | borgdorff.nl | — (Den Haag branch) |
| [skip] | SMASH Makelaars | smashmakelaars.nl | — (too small, no API) |
| [x] | DOEN NVM Makelaars | doenmakelaars.com | Doezastraat 30 (Leiden, also active in Den Haag) |
## Leiden
| Done | Name | Website | Address |

@@ -1,4 +1,39 @@
# OG Online / realtime-listings (fastest — API)
Check out the add_scraper_context.md, let's add a new scraper.
**Broker:** [name]
**Base URL:** [e.g. https://www.mybroker.nl]
**Cities to include:** [e.g. {"Den Haag", "Voorburg"} — omit if broker is single-city]
_(No further investigation needed — OG Online platform is fully understood.)_
# Realworks CMS (one-liner — SSR)
Check out the add_scraper_context.md, let's add a new scraper.
**Broker:** [name]
**Base URL:** [e.g. https://www.mybroker.nl]
_(No further investigation needed — Realworks platform is fully understood.)_
# SURE WordPress Plugin (SSR)
Check out the add_scraper_context.md, let's add a new scraper.
**Broker:** [name]
**Base URL:** [e.g. https://www.mybroker.nl]
**Card selector:** [a.card-house or a.card--house]
**City filter:** [city name(s) to include, or "single city — no filter needed"]
**Cards per page:** [e.g. 15]
_(Detail page always uses #kenmerken li span span — no further investigation needed.)_
# SSR (custom)
Check out the add_scraper_context.md, let's add a new scraper.
**Broker:** [name]
@@ -16,7 +51,7 @@ Check out the add_scraper_context.md, let's add a new scraper.
**Notes:** [auth, JS rendering, price filter in URL, etc.]
# API (custom)
Check out the add_scraper_context.md, let's add a new scraper.

@@ -307,6 +307,135 @@ def fetch_vandaal() -> list[RawListing]:
    return listings


# ---------------------------------------------------------------------------
# Elzenaar NVM Makelaars (Den Haag) — OG Online platform
# ---------------------------------------------------------------------------
# Same platform as bjornd/moerman/vandaal.
_ELZENAAR_BASE = "https://www.elzenaar.com"
_ELZENAAR_SKIP = {"rented", "rented_ur"}
_ELZENAAR_CITIES = {"Den Haag", "Voorburg", "Rijswijk"}
_ELZENAAR_STATUS_MAP = {
    "available": "beschikbaar",
    "under_bid": "onder_bod",
    "under_option": "onder_bod",
    "sold": "verkocht",
    "sold_ur": "verkocht",
}


def fetch_elzenaar() -> list[RawListing]:
    data = fetch_json(
        f"{_ELZENAAR_BASE}/nl/realtime-listings/consumer",
        headers={"X-Requested-With": "XMLHttpRequest"},
    )
    listings = []
    for item in data:
        if not item.get("isSales"):
            continue
        if item.get("statusOrig") in _ELZENAAR_SKIP:
            continue
        if item.get("city") not in _ELZENAAR_CITIES:
            continue
        # salesPrice can be present but null; `or 0` keeps the comparison from raising
        if (item.get("salesPrice") or 0) > config.MAX_PRICE:
            continue
        postcode = (item.get("zipcode") or "").replace(" ", "") or None
        # `or None` also maps a plotSurface of 0 to None
        perceel = item.get("plotSurface") or None
        # str() so .isdigit() also works if the year arrives as an int
        raw_year = str(item.get("dateOfConstruction") or "")
        bouwjaar = int(raw_year) if raw_year.isdigit() else None
        listings.append(RawListing(
            url=_ELZENAAR_BASE + item["url"],
            source_makelaar="elzenaar",
            status=_ELZENAAR_STATUS_MAP.get(item.get("statusOrig", ""), "beschikbaar"),
            adres=item.get("address") or None,
            postcode=postcode,
            stad=item.get("city") or None,
            prijs=item.get("salesPrice") or None,
            woningtype=item.get("type") or None,
            woonoppervlak=item.get("livingSurface") or None,
            perceeloppervlak=perceel,
            kamers=item.get("rooms") or None,
            slaapkamers=item.get("bedrooms") or None,
            bouwjaar=bouwjaar,
            energielabel=item.get("energyLabel") or None,
            hero_image_url=item.get("photo") or None,
        ))
    log.info("elzenaar: %d koopwoningen opgehaald", len(listings))
    return listings
# ---------------------------------------------------------------------------
# DOEN NVM Makelaars (Den Haag / Leiden / Voorburg) — OG Online platform
# ---------------------------------------------------------------------------
_DOEN_BASE = "https://www.doenmakelaars.com"
_DOEN_SKIP = {"rented", "rented_ur"}
_DOEN_CITIES = {"Den Haag", "Leiden", "Voorburg", "Leidschendam", "Rijswijk", "Wassenaar", "Zoetermeer"}
_DOEN_STATUS_MAP = {
    "available": "beschikbaar",
    "under_bid": "onder_bod",
    "under_option": "onder_bod",
    "sold": "verkocht",
    "sold_ur": "verkocht",
}


def fetch_doen() -> list[RawListing]:
    data = fetch_json(
        f"{_DOEN_BASE}/nl/realtime-listings/consumer",
        headers={"X-Requested-With": "XMLHttpRequest"},
    )
    listings = []
    for item in data:
        if not item.get("isSales"):
            continue
        if item.get("statusOrig") in _DOEN_SKIP:
            continue
        if item.get("city") not in _DOEN_CITIES:
            continue
        # salesPrice can be present but null; `or 0` keeps the comparison from raising
        if (item.get("salesPrice") or 0) > config.MAX_PRICE:
            continue
        postcode = (item.get("zipcode") or "").replace(" ", "") or None
        # `or None` also maps a plotSurface of 0 to None
        perceel = item.get("plotSurface") or None
        # str() so .isdigit() also works if the year arrives as an int
        raw_year = str(item.get("dateOfConstruction") or "")
        bouwjaar = int(raw_year) if raw_year.isdigit() else None
        listings.append(RawListing(
            url=_DOEN_BASE + item["url"],
            source_makelaar="doen",
            status=_DOEN_STATUS_MAP.get(item.get("statusOrig", ""), "beschikbaar"),
            adres=item.get("address") or None,
            postcode=postcode,
            stad=item.get("city") or None,
            prijs=item.get("salesPrice") or None,
            woningtype=item.get("type") or None,
            woonoppervlak=item.get("livingSurface") or None,
            perceeloppervlak=perceel,
            kamers=item.get("rooms") or None,
            slaapkamers=item.get("bedrooms") or None,
            bouwjaar=bouwjaar,
            energielabel=item.get("energyLabel") or None,
            hero_image_url=item.get("photo") or None,
        ))
    log.info("doen: %d koopwoningen opgehaald", len(listings))
    return listings


# ---------------------------------------------------------------------------
# SCRAPERS — export all active API adapters here
# ---------------------------------------------------------------------------
@@ -316,4 +445,6 @@ SCRAPERS = {
    'ooms': fetch_ooms,
    'moerman': fetch_moerman,
    'vandaal': fetch_vandaal,
    'elzenaar': fetch_elzenaar,
    'doen': fetch_doen,
}

@@ -1292,6 +1292,588 @@ def fetch_roepman() -> list[RawListing]:
    return listings


# ---------------------------------------------------------------------------
# Post Makelaardij (v/h Bayense) — Delft & surroundings
# ---------------------------------------------------------------------------
# Custom Tailwind CSS site; covers Delft, Pijnacker, Rijswijk etc.
# Filter for Delft only.
_POST_BASE = "https://www.postmakelaardij.nl"
_POST_STATUS_MAP = {
    "te koop": "beschikbaar",
    "onder bod": "onder_bod",
    "verkocht": "verkocht",
}


def _post_detail(detail_url: str) -> dict:
    """Fetch Post Makelaardij detail page and extract kenmerken."""
    try:
        soup = fetch_soup(detail_url)
        # Energielabel from CSS class: energielabel-{letter}
        energielabel = None
        for el in soup.select('[class]'):
            for cls in el.get('class', []):
                if cls.startswith('energielabel-') and cls != 'energielabel':
                    energielabel = cls.replace('energielabel-', '').upper()
                    break
            if energielabel:
                break
        # Woonoppervlak, perceeloppervlak, slaapkamers from icon spans
        woonoppervlak = None
        perceeloppervlak = None
        slaapkamers = None
        for span in soup.select('span.object-info-icon-text'):
            txt = span.get_text(strip=True)
            if 'slaapkamer' in txt:
                m = re.search(r'(\d+)', txt)
                slaapkamers = int(m.group(1)) if m else None
            elif 'perceel' in txt:
                perceeloppervlak = parse_m2(txt)
            # Any other area value (m² or m2) is the living area
            elif 'm²' in txt or 'm2' in txt:
                woonoppervlak = parse_m2(txt)
        return {
            "woonoppervlak": woonoppervlak,
            "perceeloppervlak": perceeloppervlak,
            "slaapkamers": slaapkamers,
            "energielabel": energielabel,
        }
    except Exception as e:
        log.warning("post: detail fetch fout %s: %s", detail_url, e)
        return {}


def fetch_post() -> list[RawListing]:
    """Fetch Post Makelaardij listings; only Delft, only koop."""
    listings = []
    page = 1
    while True:
        url = f"{_POST_BASE}/woningaanbod/koop?page={page}"
        soup = fetch_soup(url)
        cards = soup.select("article")
        if not cards:
            break
        for card in cards:
            try:
                # URL — first link in image slider
                a_tag = card.select_one("a[href]")
                if not a_tag:
                    continue
                href = a_tag["href"]
                detail_url = href if href.startswith("http") else _POST_BASE + href
                # Postcode + city from span.custom-postcode-text
                pc_el = card.select_one("span.custom-postcode-text")
                if not pc_el:
                    continue
                pc_parts = pc_el.get_text(strip=True).split()
                if len(pc_parts) < 3:
                    continue
                postcode = pc_parts[0] + pc_parts[1]  # "2613BD"
                stad = " ".join(pc_parts[2:])  # "Delft"
                # Filter: only Delft
                if stad.lower() != "delft":
                    continue
                # Price — filter early
                prijs = parse_prijs(_text(card, "span.price-block"))
                if prijs and prijs > config.MAX_PRICE:
                    continue
                # Status from span.status text
                status_text = (_text(card, "span.status") or "").lower()
                status = _POST_STATUS_MAP.get(status_text, "beschikbaar")
                # Address
                adres = _text(card, "h4.custom-address-text")
                # Hero: first img in article (None if it has no src attribute)
                img = card.select_one("img")
                hero = img.get("src") if img else None
                kk = _post_detail(detail_url)
                listings.append(RawListing(
                    url=detail_url,
                    source_makelaar="post",
                    status=status,
                    adres=adres,
                    postcode=postcode,
                    stad=stad,
                    prijs=prijs,
                    hero_image_url=hero,
                    woonoppervlak=kk.get("woonoppervlak"),
                    perceeloppervlak=kk.get("perceeloppervlak"),
                    slaapkamers=kk.get("slaapkamers"),
                    energielabel=kk.get("energielabel"),
                ))
                if config.APP_ENV == "dev":
                    break
            except Exception as e:
                log.warning("post: parse fout: %s", e)
        if len(cards) < 12:
            break
        page += 1
    log.info("post: %d listings opgehaald", len(listings))
    return listings
# ---------------------------------------------------------------------------
# Morris NVM Makelaars (Delft) — Realworks CMS
# ---------------------------------------------------------------------------
def fetch_morris() -> list[RawListing]:
    return fetch_realworks("https://www.morrismakelaardij.nl", "morris")
# ---------------------------------------------------------------------------
# Olsthoorn Makelaars Delft (SURE WordPress plugin)
# ---------------------------------------------------------------------------
# Covers Delft, Den Haag, Naaldwijk etc — we filter for Delft only.
# Detail page has no postcode; leave as None.
_OLSTHOORN_BASE = "https://www.olsthoornmakelaars.nl"
_OLSTHOORN_STATUS_MAP = {
    "badge-available": "beschikbaar",
    "badge-bid": "onder_bod",
    "badge-option": "onder_bod",
    "badge-sold": "verkocht",
}
_OLSTHOORN_DETAIL_STATUS_MAP = {
    "beschikbaar": "beschikbaar",
    "onder bod": "onder_bod",
    "onder optie": "onder_bod",
    "verkocht": "verkocht",
}


def _olsthoorn_detail(detail_url: str) -> dict:
    """Fetch Olsthoorn detail page; extract kenmerken from #kenmerken li pairs."""
    try:
        soup = fetch_soup(detail_url)
        kv: dict[str, str] = {}
        for li in soup.select("#kenmerken li"):
            spans = li.select("span")
            if len(spans) >= 2:
                label = spans[0].get_text(strip=True).lower()
                value = spans[1].get_text(strip=True)
                kv[label] = value
        return {
            "status": kv.get("status", "").lower(),
            "woningtype": kv.get("soort object") or kv.get("soort woning") or kv.get("soort bouw"),
            "bouwjaar": kv.get("bouwjaar"),
            "woonoppervlak": kv.get("gebruiksoppervlakte"),
            "perceeloppervlak": kv.get("perceeloppervlakte"),
            "kamers": kv.get("aantal kamers"),
            "slaapkamers": kv.get("aantal slaapkamers"),
            "energielabel": kv.get("energielabel"),
        }
    except Exception as e:
        log.warning("olsthoorn: detail fetch fout %s: %s", detail_url, e)
        return {}


def fetch_olsthoorn() -> list[RawListing]:
    """Fetch Olsthoorn Makelaars listings; only Delft, only koop."""
    listings = []
    page = 1
    while True:
        if page == 1:
            url = f"{_OLSTHOORN_BASE}/wonen?sure_koop_huur=koop"
        else:
            url = f"{_OLSTHOORN_BASE}/wonen/page/{page}/?sure_koop_huur=koop"
        soup = fetch_soup(url)
        cards = soup.select("a.card-house")
        if not cards:
            break
        for card in cards:
            try:
                href = card.get("href", "")
                if not href:
                    continue
                detail_url = href if href.startswith("http") else _OLSTHOORN_BASE + href
                # Filter: only Delft
                stad_el = card.select_one("h2.card__title")
                stad = stad_el.get_text(strip=True) if stad_el else None
                if not stad or stad.lower() != "delft":
                    continue
                # Price from bold tag — filter early before detail fetch
                prijs_b = card.select_one("b")
                prijs = parse_prijs(prijs_b.get_text() if prijs_b else None)
                if prijs and prijs > config.MAX_PRICE:
                    continue
                # Status from badge class on label span
                label_span = card.select_one("span.card-house__label")
                status = "beschikbaar"
                if label_span:
                    for cls in label_span.get("class", []):
                        if cls in _OLSTHOORN_STATUS_MAP:
                            status = _OLSTHOORN_STATUS_MAP[cls]
                            break
                # Address: first <p> directly under div.short--info (collapse internal whitespace)
                adres_p = card.select("div.short--info > p")
                if adres_p:
                    adres = " ".join(adres_p[0].get_text().split())
                else:
                    adres = None
                # Hero image: largest source srcset
                src_tag = card.select_one('picture source[media="(min-width:1024px)"]')
                hero = src_tag.get("data-srcset") if src_tag else None
                if hero and not hero.startswith("http"):
                    hero = _OLSTHOORN_BASE + hero
                # Woonoppervlak + kamers + energielabel from card data icons
                woonoppervlak_card = None
                kamers_card = None
                energielabel_card = None
                for data_div in card.select("div.data"):
                    inner = data_div.select_one("span.date__inner")
                    if not inner:
                        continue
                    txt = inner.get_text(strip=True)
                    if data_div.select_one("i.icon-sizes"):
                        woonoppervlak_card = parse_m2(txt)
                    elif data_div.select_one("i.icon-door"):
                        m = re.search(r"(\d+)", txt)
                        kamers_card = int(m.group(1)) if m else None
                    elif data_div.select_one("i.icon-energylabel"):
                        energielabel_card = txt or None
                kk = _olsthoorn_detail(detail_url)
                # Refine status from detail page
                detail_status = _OLSTHOORN_DETAIL_STATUS_MAP.get(kk.get("status", ""), "")
                if detail_status:
                    status = detail_status
                listings.append(RawListing(
                    url=detail_url,
                    source_makelaar="olsthoorn",
                    status=status,
                    adres=adres,
                    postcode=None,  # not exposed by broker
                    stad=stad,
                    prijs=prijs,
                    hero_image_url=hero,
                    woningtype=kk.get("woningtype"),
                    bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
                    woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
                    perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
                    kamers=int(kk["kamers"]) if kk.get("kamers") else kamers_card,
                    slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
                    energielabel=kk.get("energielabel") or energielabel_card,
                ))
                if config.APP_ENV == "dev":
                    break
            except Exception as e:
                log.warning("olsthoorn: parse fout: %s", e)
        if len(cards) < 15:
            break
        page += 1
    log.info("olsthoorn: %d listings opgehaald", len(listings))
    return listings
# ---------------------------------------------------------------------------
# 88 Makelaars (Den Haag) — Custom WordPress theme
# ---------------------------------------------------------------------------
# Cards on /ons-aanbod/page/{N}/; details in div.listing_detail kv pairs.
_88_BASE = "https://88makelaars.nl"
_88_STATUS_MAP = {
    "te koop": "beschikbaar",
    "beschikbaar": "beschikbaar",
    "onder bod": "onder_bod",
    "onder optie": "onder_bod",
    "verkocht onder voorbehoud": "verkocht",
    "verkocht": "verkocht",
}


def _88makelaars_detail(detail_url: str) -> dict:
    """Fetch 88makelaars detail page; extract kenmerken from div.listing_detail kv pairs."""
    try:
        soup = fetch_soup(detail_url)
        kv: dict[str, str] = {}
        for div in soup.select("div.listing_detail"):
            txt = div.get_text(strip=True)
            if ":" in txt:
                label, _, value = txt.partition(":")
                kv[label.strip().lower()] = value.strip()
        raw_pc = kv.get("postcode") or ""
        pc_match = re.search(r"\d{4}\s*[A-Z]{2}", raw_pc.upper())
        postcode = pc_match.group(0).replace(" ", "") if pc_match else None
        return {
            "postcode": postcode,
            "slaapkamers": kv.get("slaapkamers"),
            "woonoppervlak": kv.get("woning grootte"),
            "energielabel": kv.get("energieklasse"),
            "woningtype": kv.get("soort woning"),
        }
    except Exception as e:
        log.warning("88makelaars: detail fetch fout %s: %s", detail_url, e)
        return {}


def fetch_88makelaars() -> list[RawListing]:
    """Fetch 88 Makelaars listings (Den Haag only)."""
    listings = []
    page = 1
    while True:
        if page == 1:
            url = f"{_88_BASE}/ons-aanbod/"
        else:
            url = f"{_88_BASE}/ons-aanbod/page/{page}/"
        soup = fetch_soup(url)
        cards = soup.select("div.property_listing")
        if not cards:
            break
        for card in cards:
            try:
                # URL from carousel
                a_tag = card.select_one(".property_unit_carousel a[href]")
                if not a_tag:
                    continue
                detail_url = a_tag["href"]
                if not detail_url.startswith("http"):
                    detail_url = _88_BASE + detail_url
                # City — last link in property_location_image
                loc_links = card.select(".property_location_image a")
                stad = loc_links[-1].get_text(strip=True) if loc_links else None
                if not stad or stad.lower() != "den haag":
                    continue
                # Price
                prijs = parse_prijs(_text(card, ".listing_unit_price_wrapper"))
                if prijs and prijs > config.MAX_PRICE:
                    continue
                # Status
                status_text = (_text(card, ".ribbon-inside") or "").lower()
                status = _88_STATUS_MAP.get(status_text, "beschikbaar")
                # Address
                adres = _text(card, "h4 a") or _text(card, "h4")
                # Surface + rooms
                woonoppervlak_card = parse_m2(_text(card, "span.infosize"))
                kamers_card = None
                rooms_txt = _text(card, "span.inforoom")
                if rooms_txt:
                    m = re.search(r"(\d+)", rooms_txt)
                    kamers_card = int(m.group(1)) if m else None
                # Hero: first active carousel image
                img = card.select_one(".item.active img")
                hero = (img.get("src") or img.get("data-original")) if img else None
                kk = _88makelaars_detail(detail_url)
                listings.append(RawListing(
                    url=detail_url,
                    source_makelaar="88makelaars",
                    status=status,
                    adres=adres,
                    postcode=kk.get("postcode"),
                    stad="Den Haag",
                    prijs=prijs,
                    hero_image_url=hero,
                    woningtype=kk.get("woningtype"),
                    woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
                    kamers=kamers_card,
                    slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
                    energielabel=kk.get("energielabel"),
                ))
                if config.APP_ENV == "dev":
                    break
            except Exception as e:
                log.warning("88makelaars: parse fout: %s", e)
        if len(cards) < 10:
            break
        page += 1
    log.info("88makelaars: %d listings opgehaald", len(listings))
    return listings
# ---------------------------------------------------------------------------
# Borgdorff Makelaars (Den Haag / Westland) — SURE WordPress plugin
# ---------------------------------------------------------------------------
# Covers Den Haag ('s-gravenhage), Monster, Naaldwijk etc. Filter for Den Haag.
# Same SURE plugin as Schieland Borsboom but uses a.card--house (double dash).
# No postcode on detail page.
_BORGDORFF_BASE = "https://www.borgdorff.nl"
_BORGDORFF_DEN_HAAG = {"'s-gravenhage", "den haag"}
_BORGDORFF_BADGE_MAP = {
    "badge--info": "beschikbaar",
    "badge--warning": "onder_bod",
    "badge--danger": "verkocht",
}


def _borgdorff_detail(detail_url: str) -> dict:
    """Fetch Borgdorff detail page; extract #kenmerken li span pairs."""
    try:
        soup = fetch_soup(detail_url)
        kv: dict[str, str] = {}
        for li in soup.select("#kenmerken li"):
            spans = li.select("span")
            if len(spans) >= 2:
                label = spans[0].get_text(strip=True).lower()
                value = spans[1].get_text(strip=True)
                kv[label] = value
        return {
            "status": kv.get("status", "").lower(),
            "woningtype": kv.get("soort woonhuis") or kv.get("soort woning") or kv.get("soort bouw"),
            "bouwjaar": kv.get("bouwjaar"),
            "woonoppervlak": kv.get("gebruiksoppervlakte wonen") or kv.get("gebruiksoppervlakte"),
            "perceeloppervlak": kv.get("perceeloppervlakte"),
            "slaapkamers": kv.get("aantal slaapkamers"),
            "energielabel": kv.get("energielabel"),
        }
    except Exception as e:
        log.warning("borgdorff: detail fetch fout %s: %s", detail_url, e)
        return {}


def fetch_borgdorff() -> list[RawListing]:
    """Fetch Borgdorff listings; only Den Haag / 's-gravenhage, only koop."""
    listings = []
    page = 1
    while True:
        if page == 1:
            url = f"{_BORGDORFF_BASE}/wonen?sure_koop_huur=koop"
        else:
            url = f"{_BORGDORFF_BASE}/wonen/page/{page}/?sure_koop_huur=koop"
        soup = fetch_soup(url)
        cards = soup.select("a.card--house")
        if not cards:
            break
        for card in cards:
            try:
                href = card.get("href", "")
                if not href:
                    continue
                detail_url = href if href.startswith("http") else _BORGDORFF_BASE + href
                # Filter: only Den Haag
                stad_el = card.select_one("p.lead-two")
                stad = stad_el.get_text(strip=True) if stad_el else None
                if not stad or stad.lower() not in _BORGDORFF_DEN_HAAG:
                    continue
                # Price — filter early
                prijs = parse_prijs(_text(card, "p.strong"))
                if prijs and prijs > config.MAX_PRICE:
                    continue
                # Status from badge class
                label_span = card.select_one("span.card-house__label")
                status = "beschikbaar"
                if label_span:
                    for cls in label_span.get("class", []):
                        if cls in _BORGDORFF_BADGE_MAP:
                            status = _BORGDORFF_BADGE_MAP[cls]
                            break
                # Address
                adres = _text(card, "h4")
                # Hero: largest source srcset
                src_tag = card.select_one('picture source[media="(min-width:1280px)"]')
                hero = src_tag.get("srcset") if src_tag else None
                if not hero:
                    img = card.select_one("img[data-src]")
                    hero = img.get("data-src") if img else None
                if hero and not hero.startswith("http"):
                    hero = _BORGDORFF_BASE + hero
                # Surface + bedrooms from data icons
                woonoppervlak_card = None
                slaapkamers_card = None
                for data_div in card.select("div.data"):
                    inner = data_div.select_one("p.small")
                    if not inner:
                        continue
                    txt = inner.get_text(strip=True)
                    if data_div.select_one("i.icon-surface"):
                        woonoppervlak_card = parse_m2(txt)
                    elif data_div.select_one("i.icon-bed"):
                        m = re.search(r"(\d+)", txt)
                        slaapkamers_card = int(m.group(1)) if m else None
                kk = _borgdorff_detail(detail_url)
                # Refine status from detail page
                detail_status_map = {
                    "beschikbaar": "beschikbaar",
                    "onder bod": "onder_bod",
                    "onder optie": "onder_bod",
                    "verkocht": "verkocht",
                }
                if kk.get("status"):
                    status = detail_status_map.get(kk["status"], status)
                listings.append(RawListing(
                    url=detail_url,
                    source_makelaar="borgdorff",
                    status=status,
                    adres=adres,
                    postcode=None,  # not exposed by broker
                    stad=stad,
                    prijs=prijs,
                    hero_image_url=hero,
                    woningtype=kk.get("woningtype"),
                    bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
                    woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
                    perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
                    slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else slaapkamers_card,
                    energielabel=kk.get("energielabel"),
                ))
                if config.APP_ENV == "dev":
                    break
            except Exception as e:
                log.warning("borgdorff: parse fout: %s", e)
        if len(cards) < 15:
            break
        page += 1
    log.info("borgdorff: %d listings opgehaald", len(listings))
    return listings
# ---------------------------------------------------------------------------
# SCRAPERS — export all active SSR adapters here
# ---------------------------------------------------------------------------
@@ -1309,4 +1891,9 @@ SCRAPERS = {
    'vwmakelaars': fetch_vwmakelaars,
    'roepman': fetch_roepman,
    'zomakelaars': fetch_zomakelaars,
    'post': fetch_post,
    'morris': fetch_morris,
    'olsthoorn': fetch_olsthoorn,
    '88makelaars': fetch_88makelaars,
    'borgdorff': fetch_borgdorff,
}
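`_olsthoorn_detail` and `_borgdorff_detail` share the same `#kenmerken li` label/value loop. Below is a dependency-free sketch of that step using the standard-library `html.parser` (the project itself uses BeautifulSoup via `fetch_soup`). The class name is hypothetical, and the sketch makes three assumptions. First, the markup closes its `<li>` and `<span>` tags. Second, the first two innermost `<span>` texts of each `<li>` are the label and the value, which also covers the nested `li span span` layout. Third, the label is lower-cased as in the adapters.

```python
from html.parser import HTMLParser
from typing import Optional

# Elements without end tags; they must not affect the nesting depth.
_VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
         "link", "meta", "source", "track", "wbr"}


class KenmerkenParser(HTMLParser):
    """Collect {label: value} from <li> items inside the element with id="kenmerken"."""

    def __init__(self) -> None:
        super().__init__()
        self.kv: dict[str, str] = {}
        self._depth = 0                                # open elements inside #kenmerken (0 = outside)
        self._leaf_texts: Optional[list[str]] = None   # innermost <span> texts of the current <li>
        self._spans: list[list] = []                   # per open <span>: [text parts, has child span]

    def handle_starttag(self, tag, attrs):
        if tag in _VOID:
            return
        if self._depth == 0:
            if dict(attrs).get("id") == "kenmerken":
                self._depth = 1
            return
        self._depth += 1
        if tag == "li":
            self._leaf_texts = []
        elif tag == "span" and self._leaf_texts is not None:
            if self._spans:
                self._spans[-1][1] = True  # the parent span is not innermost
            self._spans.append([[], False])

    def handle_endtag(self, tag):
        if tag in _VOID or self._depth == 0:
            return
        self._depth -= 1
        if tag == "span" and self._spans:
            parts, has_child = self._spans.pop()
            if not has_child and self._leaf_texts is not None:
                self._leaf_texts.append("".join(parts).strip())
        elif tag == "li" and self._leaf_texts is not None:
            if len(self._leaf_texts) >= 2:
                self.kv[self._leaf_texts[0].lower()] = self._leaf_texts[1]
            self._leaf_texts = None

    def handle_data(self, data):
        if self._spans:
            self._spans[-1][0].append(data)


html = """
<ul id="kenmerken">
  <li><span>Bouwjaar</span><span>1932</span></li>
  <li><span><span>Status</span><span>Onder bod</span></span></li>
  <li><span>Energielabel</span><span>C</span><br></li>
</ul>
"""
p = KenmerkenParser()
p.feed(html)
print(p.kv)  # {'bouwjaar': '1932', 'status': 'Onder bod', 'energielabel': 'C'}
```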

@@ -16,7 +16,7 @@ logging.basicConfig(
)
# --- change this to test a different adapter ---
ADAPTER = SCRAPERS['post']
if __name__ == "__main__":
    print(f"Testing adapter: {ADAPTER.__name__}")
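The script currently requires editing the `ADAPTER` constant before each test run. Alternatively, the adapter could be chosen from the command line. Below is a minimal sketch using `argparse`. The stand-in `SCRAPERS` dict and the dummy adapters are hypothetical placeholders for the real `SCRAPERS` imports, which are not shown in this diff.

```python
import argparse
from typing import Callable, Optional, Sequence


# Placeholder adapters; the real script would use the imported SCRAPERS instead.
def fetch_post() -> list:
    return ["post-listing"]


def fetch_morris() -> list:
    return []


SCRAPERS: dict[str, Callable[[], list]] = {
    "post": fetch_post,
    "morris": fetch_morris,
}


def main(argv: Optional[Sequence[str]] = None) -> list:
    """Run the adapter named on the command line and return its listings."""
    parser = argparse.ArgumentParser(description="Run a single scraper adapter.")
    parser.add_argument("adapter", choices=sorted(SCRAPERS), help="key in SCRAPERS")
    args = parser.parse_args(argv)
    adapter = SCRAPERS[args.adapter]
    print(f"Testing adapter: {adapter.__name__}")
    listings = adapter()
    print(f"{len(listings)} listings")
    return listings
```

This would replace the hard-coded `ADAPTER` line. `main()` would then be called from the existing `if __name__ == "__main__":` block, with the adapter key passed as the first command-line argument. An unknown key makes `argparse` exit with a usage error listing the valid choices.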