Compare commits
21 Commits
8450c33887
...
master
| Author | SHA1 | Date |
|---|---|---|
| | fc6f3ff809 | |
| | 1841412c93 | |
| | c6328cee46 | |
| | f74e9bcfb0 | |
| | 1011d9cf87 | |
| | 9149d11a06 | |
| | 6bb2538143 | |
| | 77f8e91f07 | |
| | 7096220203 | |
| | 75c5b6f26d | |
| | 6beae1133b | |
| | bfd69e3542 | |
| | d310a7a560 | |
| | c92ddb5812 | |
| | edd2580919 | |
| | 942170ef7f | |
| | 84e5656ca0 | |
| | e1745841b1 | |
| | fbe50790da | |
| | 423a429f56 | |
| | f1748214ce | |
@@ -1,11 +1,5 @@
HA_WEBHOOK_URL=

SMTP_HOST=
SMTP_PORT=587
SMTP_FROM=
SMTP_TO=
SMTP_USER=
SMTP_PASSWORD=

DB_PATH=/data/huizenbot.db

APP_ENV=dev

1
.gitignore
vendored
@@ -5,3 +5,4 @@
|
||||
**/__pycache__/
|
||||
|
||||
tests/cache/
|
||||
data/
|
||||
|
||||
45
CLAUDE.md
Normal file
@@ -0,0 +1,45 @@
# Huizenbot

## Goal

Periodically scrape real-estate broker websites in Delft and Schiedam, store new listings in SQLite, and send push notifications via Home Assistant. Runs as a single Docker container on a homelab, triggered by cron.

This is already running, so for now we are only working on extensions and improvements.


# AIDE - An IDE for your Agent

This project uses AIDE to support agents; it makes edits more robust and reduces token costs.

Run `aide help` for further information; it also works on each subcommand. When you learn something that way, record it in agents.md so we don't have to spend tokens on it repeatedly.


## Using aide effectively

**Always start with aide for codebase exploration, not Read or Grep:**
- Use `aide outline <file>` first to get the function map of any file before reading it
- Use `aide source <file> <sym>` to read individual functions; never Read a whole large file just to find one function
- This is especially important for large files like `ssr.py` (84KB+) where Read truncates

**For edits:** `aide insert` is fragile with large inputs (see the `aide insert` row in the quick reference below), so fall back to the `Edit` tool for anything non-trivial. `aide replace` is fine for small targeted changes.

## What aide can do (quick reference)

| Command | What it replaces |
|---------|-----------------|
| `aide outline <file\|dir>` | `Read` whole file for structure; `ls` + loop |
| `aide source <file> <sym>` | `Read` whole file for one function |
| `aide callers <sym>` | `Grep` for call sites |
| `aide search <term>` | `Grep` across the project |
| `aide replace <file> <sym> <msg>` | `Edit` / `sed` for symbol-level changes |
| `aide replace … --lines N-M <msg>` | `Edit` for intra-function line edits |
| `aide remove <file> <sym>` | Manual splice to delete a symbol |
| `aide insert <file> <msg> --after <sym>` | Manual splice to add a new symbol. **Insert one function at a time**; large messages cause bash to be killed |
| `aide rename <file> <old> <new>` | Manual find-and-replace of a name |
| `aide log` | Log related to the undo command; see which files changed in which order |
| `aide annotate <file> <sym>` | Persist a non-obvious invariant or gotcha for a symbol |
| `aide context <file> <sym>` | Read the stored annotation before editing |
| `aide review [path]` | Check for annotations invalidated by recent edits |

Line numbers in `--lines N-M` are **1-based and relative to the symbol's first line** (line 1 is the signature / opening line of the symbol). This means they are stable across edits elsewhere in the file.
@@ -96,8 +96,13 @@ def fetch_bjornd() -> list[RawListing]:
|
||||
- `fetch_json(url, *, params=None, headers=None)` — GET with User-Agent, timeout, Retry-After handling
|
||||
- Built-in logging via `log = logging.getLogger("huizenbot.api")`
|
||||
|
||||
#### 2. **SSR/HTML-based** (`src/adapters/ssr.py`)
|
||||
For brokers with server-side rendered HTML.
|
||||
#### 2. **SSR/HTML-based** (`src/adapters/ssr/` package)
|
||||
For brokers with server-side rendered HTML. The package is split by CMS platform:
|
||||
- `realworks.py` — Realworks CMS (li/div.aanbodEntry cards + span.kenmerk detail)
|
||||
- `sure.py` — SURE WordPress plugin (/wonen?sure_koop_huur=koop + #kenmerken detail)
|
||||
- `schiedam.py` — Custom Schiedam scrapers (diverse platforms)
|
||||
- `denhaag.py` — Den Haag scrapers (diverse platforms)
|
||||
- `overige.py` — Other / multi-city scrapers (OG Online WP, Elementor)
|
||||
|
||||
**Pattern:**
|
||||
```python
|
||||
@@ -138,24 +143,28 @@ def fetch_vdaal() -> list[RawListing]:
|
||||
- `_text(soup, selector)` — Get inner text from element
|
||||
- `_src(soup, selector)` — Get src or data-src attribute
|
||||
- `_extract_postcode(text)` — Regex postcode from any text
|
||||
- `_infer_stad(postcode)` — Simple lookup: 2600–2629 → Delft, 3100–3135 → Schiedam
|
||||
- `_infer_stad(postcode)` — Simple lookup: 2600–2629 → Delft, 3100–3135 → Schiedam (Den Haag not in this helper; use the city value from the broker directly)
|
||||
|
||||
---
|
||||
|
||||
## Registration
|
||||
|
||||
Both `api.py` and `ssr.py` have a `SCRAPERS` dict at the bottom:
|
||||
**API scrapers** (`src/adapters/api.py`): Add your function and register in the `SCRAPERS` dict at the bottom of the file.
|
||||
|
||||
**SSR scrapers**: Add your function to the appropriate submodule (`realworks.py`, `sure.py`, `schiedam.py`, `denhaag.py`, or `overige.py`), then import it in `src/adapters/ssr/__init__.py` and add it to the `SCRAPERS` dict there.
|
||||
|
||||
```python
|
||||
# api.py
|
||||
# api.py — SCRAPERS dict
|
||||
SCRAPERS = {
|
||||
'bjornd': fetch_bjornd,
|
||||
'your_broker': fetch_your_broker, # ← Add here
|
||||
}
|
||||
|
||||
# ssr.py
|
||||
# ssr/__init__.py — import + register
|
||||
from .realworks import fetch_your_broker # ← import from the right submodule
|
||||
|
||||
SCRAPERS = {
|
||||
'bjornd_demo': fetch_bjornd_demo,
|
||||
...
|
||||
'your_broker': fetch_your_broker, # ← Add here
|
||||
}
|
||||
```
|
||||
@@ -173,7 +182,7 @@ The human will help you:
|
||||
- Write exploratory curl requests (for APIs) or BeautifulSoup inspections
|
||||
|
||||
### 2. Develop & Test Locally
|
||||
- Add your scraper function to the appropriate file (`api.py` or `ssr.py`)
|
||||
- Add your scraper function to the appropriate file (`api.py` or the right `ssr/` submodule)
|
||||
- Register it in the `SCRAPERS` dict
|
||||
- The human updates `tests/test_adapters.py` to point to your adapter:
|
||||
```python
|
||||
@@ -203,19 +212,48 @@ Secrets (API keys, webhook URLs) are **environment variables**, not in config.
|
||||
|
||||
---
|
||||
|
||||
## CMS Detection Tool
|
||||
## Platform / CMS Quick Identification
|
||||
|
||||
Before investigating a broker's HTML manually, prod the human in the loop to run `autoscraper.py` from the project root:
|
||||
Before investigating a broker's HTML manually, check for known platforms in this order:
|
||||
|
||||
### 1. OG Online / realtime-listings (API — fastest)
|
||||
**File:** `src/adapters/api.py`
|
||||
|
||||
Check if `https://<base>/nl/realtime-listings/consumer` returns JSON (with header `X-Requested-With: XMLHttpRequest`). If yes, this is a 10-line addition to `api.py`. Known brokers: bjornd, moerman, vandaal, elzenaar, doen.
|
||||
|
||||
Fields: `isSales`, `statusOrig`, `salesPrice`, `address`, `zipcode`, `city`, `rooms`, `bedrooms`, `livingSurface`, `plotSurface`, `dateOfConstruction`, `energyLabel`, `type`, `photo`, `url`.
|
||||
|
||||
Add a `_CITIES` set to filter by city if the broker covers a wide area. Skip statuses `"rented"` and `"rented_ur"`.
|
||||
|
||||
### 2. Realworks CMS (SSR — one liner)
|
||||
**File:** `src/adapters/ssr/realworks.py`
|
||||
|
||||
Run `autoscraper.py` or check HTML for `li.aanbodEntry`. If detected:
|
||||
```python
|
||||
def fetch_mybroker() -> list[RawListing]:
|
||||
return fetch_realworks("https://www.mybroker.nl", "mybroker")
|
||||
```
|
||||
|
||||
### 3. SURE WordPress Plugin (SSR — ~50 lines)
|
||||
**File:** `src/adapters/ssr/sure.py`
|
||||
|
||||
Check HTML for `sure-` CSS classes or `?sure_koop_huur=koop` filter. Two card variants:
|
||||
- `a.card-house` (single dash) — e.g. Olsthoorn
|
||||
- `a.card--house` (double dash) — e.g. Borgdorff
|
||||
|
||||
Both use `?sure_koop_huur=koop` to filter buy listings and `/page/{N}/` pagination. Detail page always has `#kenmerken li span span` pairs with labels like `status`, `soort woonhuis`/`soort woning`/`soort bouw`, `bouwjaar`, `gebruiksoppervlakte wonen`, `perceeloppervlakte`, `aantal slaapkamers`, `energielabel`. Postcode is often **not** available on the detail page.
|
||||
|
||||
Terminate pagination when `len(cards) < expected_per_page` (typically 15 for SURE).
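
The termination rule above can be sketched as a loop that stops at the first short page. This is a sketch under assumptions: `fetch_cards` is a hypothetical callable standing in for whatever fetches and parses one `/page/{N}/` URL, and the `max_pages` guard is an added safety net that the text above does not mention.

```python
from typing import Callable, List


def collect_all_cards(
    fetch_cards: Callable[[int], List[str]],
    expected_per_page: int = 15,
    max_pages: int = 50,
) -> List[str]:
    """Collect cards page by page until a page comes back short."""
    cards: List[str] = []
    for page in range(1, max_pages + 1):
        page_cards = fetch_cards(page)
        cards.extend(page_cards)
        # A short (or empty) page means this was the last one.
        if len(page_cards) < expected_per_page:
            break
    return cards
```

If the final page happens to hold exactly `expected_per_page` cards, this loop requests one more page and relies on that page coming back short or empty.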
|
||||
|
||||
### 4. Unknown CMS
|
||||
**File:** `src/adapters/ssr/schiedam.py`, `denhaag.py`, or `overige.py` depending on city — or add a new file if needed.
|
||||
|
||||
Run the autoscraper tool:
|
||||
```bash
|
||||
python autoscraper.py listings <listings-url>
|
||||
python autoscraper.py details <detail-page-url>
|
||||
```
|
||||
|
||||
If the broker uses a known CMS, the tool prints the exact code to add — no further investigation needed. Currently detected CMSes:
|
||||
|
||||
- **Realworks** → prints a ready-to-paste `fetch_realworks(...)` one-liner for `ssr.py`
|
||||
|
||||
If the CMS is unknown, the tool prints structural diagnostics (card selectors, field patterns, pagination) to guide manual adapter development.
|
||||
It prints structural diagnostics (card selectors, field patterns, pagination) to guide manual adapter development.
|
||||
|
||||
## Important Notes
|
||||
|
||||
@@ -240,6 +278,13 @@ status = _STATUS_MAP.get(item.get("status"), "beschikbaar")
|
||||
### Postcode Extraction
|
||||
Always aim for the **Dutch postcode format** (4 digits + 2 letters, e.g., `"2611CA"`). The travel time calculation depends on it. If a broker only provides the address string, use `_extract_postcode(address)`.
|
||||
|
||||
If a postcode field contains extra text (e.g., `"2522GW Den Haag"`), extract cleanly with:
|
||||
```python
|
||||
m = re.search(r"\d{4}\s*[A-Z]{2}", raw.upper())
|
||||
postcode = m.group(0).replace(" ", "") if m else None
|
||||
```
|
||||
Never just `.replace(" ", "")` — that produces garbage like `"2522GWDenHaag"`.
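
Wrapped in a helper, the extraction above might look like the sketch below. The name `clean_postcode` is hypothetical, and the `\b` word boundaries are an addition to the pattern shown above: without a trailing boundary, a value such as `"2522 Den Haag"` would match as `2522DE`.

```python
import re
from typing import Optional

# Four digits, optional whitespace, two letters, bounded on both sides.
_POSTCODE_RE = re.compile(r"\b\d{4}\s*[A-Z]{2}\b")


def clean_postcode(raw: Optional[str]) -> Optional[str]:
    """Return a postcode like '2611CA', or None if the text does not contain one."""
    if not raw:
        return None
    m = _POSTCODE_RE.search(raw.upper())
    if not m:
        return None
    # Drop any whitespace between the digits and the letters.
    return re.sub(r"\s+", "", m.group(0))
```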
|
||||
|
||||
### Price Handling
|
||||
Prices are **integers** (euros), never floats. Use `parse_prijs()` for HTML.
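
As an illustration of the integer rule, the sketch below reduces a Dutch price string to whole euros. It is not the project's `parse_prijs()`, whose implementation is not shown here; the name `parse_price_sketch` and the accepted input formats are assumptions.

```python
import re
from typing import Optional


def parse_price_sketch(text: Optional[str]) -> Optional[int]:
    """Parse strings like '€ 425.000 k.k.' into an integer number of euros."""
    if not text:
        return None
    # Take the first run of digits and dots (dots are thousands separators).
    m = re.search(r"\d[\d.]*", text)
    if not m:
        return None
    return int(m.group(0).replace(".", ""))
```

Anything after a decimal comma is ignored, so the result is always a whole number of euros.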
|
||||
|
||||
@@ -272,7 +317,8 @@ The database stores this as JSON in the `extra` column.
|
||||
- Nominatim (geocoding) has a 1 req/s limiter built into `huizenbot.py`
|
||||
- Never spawn parallel requests without the human's approval
|
||||
- Always use the `USER_AGENT` header (includes contact info for respectful scraping)
|
||||
- Don't keep curling the same endpoint, pipe it to a <name makelaar>.dump and then rg through it to find what you need. Can also pipe it through the bsprettify.py and then rg that.
|
||||
- Don't keep curling the same endpoint, pipe it to a <name makelaar>.dump and then rg through it to find what you need. Can also pipe it through the bsprettify.py and then rg that.
|
||||
- Don't over-investigate pagination — confirm card count on page 1, assume it's consistent across pages, move on. Never fetch multiple pages just to verify the per-page count.
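
A 1 request per second limit like the one mentioned for Nominatim can be enforced with a small minimum-interval limiter. The sketch below is not the limiter in `huizenbot.py`; the class name `MinIntervalLimiter` and its injectable `clock` and `sleep` parameters are assumptions, chosen so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Block so that successive wait() calls are at least `interval` seconds apart."""

    def __init__(
        self,
        interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Sleep just long enough to respect the interval, then record the call time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` immediately before each request keeps a sequential scraper under the limit; it does nothing for parallel requests, which the notes above rule out anyway.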
|
||||
|
||||
---
|
||||
|
||||
|
||||
46
makelaars.md
@@ -1,25 +1,34 @@
|
||||
# Verkoopmakelaars Delft & Schiedam
|
||||
# Verkoopmakelaars Delft, Leiden, Den Haag & Schiedam
|
||||
|
||||
## TODO
|
||||
|
||||
- ~~**API scrapers need detail page enrichment**: OG Online API (bjornd, moerman, vandaal, elzenaar, doen, vandriel) sometimes omits fields like `energyLabel`. We should fetch the detail page for each listing and merge in missing fields (especially energielabel, bouwjaar). This is already done for SSR scrapers; needs to be added to API-based ones.~~ ✅ Done — `_og_detail()` added to `api.py`
|
||||
|
||||
## Delft
|
||||
|
||||
| Done | Naam | Website | Adres |
|
||||
|------|------|---------|-------|
|
||||
| [ ] | Van Silfhout & Hogetoorn Wereldmakelaars | vansilfhout.nl | Ireneboulevard 2 |
|
||||
| [ ] | Van Daal Makelaardij | vandaalmakelaardij.nl | Voldersgracht 33 |
|
||||
| [x] | Van Silfhout & Hogetoorn Wereldmakelaars | vansilfhout.nl | Ireneboulevard 2 |
|
||||
| [x] | Van Daal Makelaardij | vandaalmakelaardij.nl | Voldersgracht 33 |
|
||||
| [x] | Björnd Makelaardij | bjornd.nl | Oude Delft 103 |
|
||||
| [ ] | Hof van Delft Makelaardij | hofvandelftmakelaardij.nl | Wateringsevest 26 |
|
||||
| [ ] | V&W Makelaars Delft | vwmakelaars.nl | Coenderstraat 31 |
|
||||
| [ ] | Roepman Makelaardij NVM | roepman.nl | Molslaan 43 |
|
||||
| [ ] | ZO makelaars | zomakelaars.nl | Van Foreestweg 4 |
|
||||
| [x] | V&W Makelaars Delft | vwmakelaars.nl | Coenderstraat 31 |
|
||||
| [x] | Roepman Makelaardij NVM | roepman.nl | Molslaan 43 |
|
||||
| [x] | ZO makelaars | zomakelaars.nl | Van Foreestweg 4 |
|
||||
| [ ] | Marloes Makelaars | — | Maerten Trompstraat 28 |
|
||||
| [ ] | Makelaarskantoor J.E. Mouthaan | — | Julianalaan 43 |
|
||||
| [ ] | Olsthoorn Makelaars Delft | olsthoornmakelaars.nl | Noordeinde 51 |
|
||||
| [ ] | Post Makelaardij (v/h Bayense) | postmakelaardij.nl | Spoorsingel 1a |
|
||||
| [ ] | Morris NVM Makelaars | morrismakelaardij.nl | — |
|
||||
| [x] | Olsthoorn Makelaars Delft | olsthoornmakelaars.nl | Noordeinde 51 |
|
||||
| [x] | Post Makelaardij (v/h Bayense) | postmakelaardij.nl | Spoorsingel 1a |
|
||||
| [x] | Morris NVM Makelaars | morrismakelaardij.nl | — |
|
||||
| [ ] | Prinsenstad Makelaardij | — | — |
|
||||
| [ ] | Oude Delft Makelaardij | — | — |
|
||||
| [ ] | Dijksman Woningmakelaars | — | — |
|
||||
| [ ] | CORPOwonen | — | — |
|
||||
| [ ] | Bergklis Makelaars | bergklis.nl | — |
|
||||
| [ ] | Van Gulden Makelaardij | vanguldenmakelaardij.nl | Zaïrestraat 1 |
|
||||
| [ ] | Van der Togt Makelaardij | vdtmakelaardij.nl | — (Voorburg, actief in Delft) |
|
||||
| [x] | Van Oord Makelaardij | vanoordmakelaardij.nl | — (Delft + Schiedam) |
|
||||
|
||||
|
||||
## Schiedam
|
||||
|
||||
@@ -33,11 +42,26 @@
|
||||
| [x] | 3D Makelaars | 3dmakelaars.nl | Gerrit Verboonstraat 17 |
|
||||
| [x] | Dupont Makelaars | dupont.nl | Rotterdamsedijk 437 |
|
||||
| [x] | D&S Makelaardij | densmakelaars.nl | Land van Belofte 50 |
|
||||
| [ ] | Moerman & De Jong Makelaars | moerman-dejong.nl | Lange Kerkstraat 80B |
|
||||
| [x] | Moerman & De Jong Makelaars | moerman-dejong.nl | Lange Kerkstraat 80B |
|
||||
| [ ] | Hagestein Makelaardij | — | Degerfors 54 |
|
||||
| [ ] | Schieland Borsboom NVM Makelaars | schielandborsboom.nl | (Rotterdam, actief in Schiedam) |
|
||||
| [x] | Schieland Borsboom NVM Makelaars | schielandborsboom.nl | (Rotterdam, actief in Schiedam) |
|
||||
| [x] | Vandriel Makelaardij | vandrielmakelaardij.nl | — |
|
||||
| [x] | Van Herk Makelaars | vanherk.nl | — |
|
||||
|
||||
|
||||
## Den Haag
|
||||
|
||||
| Done | Naam | Website | Adres |
|
||||
|------|------|---------|-------|
|
||||
| [skip] | Yuvam Makelaardij | yuvammakelaardij.nl | — (connection refused) |
|
||||
| [x] | 88 Makelaars | 88makelaars.nl | — |
|
||||
| [skip] | DIVA Makelaars | divamakelaars.nl | — (alleen Maartensdijk, niet Den Haag) |
|
||||
| [x] | Elzenaar NVM Makelaars | elzenaar.com | — |
|
||||
| [skip] | Frisia Makelaars | frisiamakelaars.nl | — (SPA/Vue, geen API) |
|
||||
| [x] | Borgdorff Makelaars | borgdorff.nl | — (vestiging Den Haag) |
|
||||
| [skip] | SMASH Makelaars | smashmakelaars.nl | — (te klein, geen API) |
|
||||
| [x] | DOEN NVM Makelaars | doenmakelaars.com | Doezastraat 30 (Leiden, ook actief in Den Haag) |
|
||||
|
||||
## Leiden
|
||||
|
||||
| Done | Naam | Website | Adres |
|
||||
|
||||
@@ -1,4 +1,39 @@
|
||||
# SSR
|
||||
# OG Online / realtime-listings (fastest — API)
|
||||
|
||||
Check out the add_scraper_context.md, let's add a new scraper.
|
||||
|
||||
**Broker:** [name]
|
||||
**Base URL:** [e.g. https://www.mybroker.nl]
|
||||
**Cities to include:** [e.g. {"Den Haag", "Voorburg"} — omit if broker is single-city]
|
||||
|
||||
_(No further investigation needed — OG Online platform is fully understood.)_
|
||||
|
||||
|
||||
# Realworks CMS (one-liner — SSR)
|
||||
|
||||
Check out the add_scraper_context.md, let's add a new scraper.
|
||||
|
||||
**Broker:** [name]
|
||||
**Base URL:** [e.g. https://www.mybroker.nl]
|
||||
|
||||
_(No further investigation needed — Realworks platform is fully understood.)_
|
||||
|
||||
|
||||
# SURE WordPress Plugin (SSR)
|
||||
|
||||
Check out the add_scraper_context.md, let's add a new scraper.
|
||||
|
||||
**Broker:** [name]
|
||||
**Base URL:** [e.g. https://www.mybroker.nl]
|
||||
**Card selector:** [a.card-house or a.card--house]
|
||||
**City filter:** [city name(s) to include, or "single city — no filter needed"]
|
||||
**Cards per page:** [e.g. 15]
|
||||
|
||||
_(Detail page always uses #kenmerken li span span — no further investigation needed.)_
|
||||
|
||||
|
||||
# SSR (custom)
|
||||
|
||||
Check out the add_scraper_context.md, let's add a new scraper.
|
||||
|
||||
**Broker:** [name]
|
||||
@@ -16,7 +51,7 @@ Check out the add_scraper_context.md, let's add a new scraper.
|
||||
**Notes:** [auth, JS rendering, price filter in URL, etc.]
|
||||
|
||||
|
||||
# API
|
||||
# API (custom)
|
||||
|
||||
Check out the add_scraper_context.md, let's add a new scraper.
|
||||
|
||||
|
||||
12
shell.nix
@@ -1,21 +1,23 @@
|
||||
{ pkgs ? import <nixpkgs> {} }:
|
||||
|
||||
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:
|
||||
let
|
||||
unstable = import <nixos-unstable> { config.allowUnfree = true; };
|
||||
in
|
||||
pkgs.mkShell {
|
||||
packages = [
|
||||
(pkgs.python3.withPackages (ps: with ps; [
|
||||
httpx
|
||||
beautifulsoup4
|
||||
flask
|
||||
lxml
|
||||
waitress
|
||||
]))
|
||||
pkgs.claude-code
|
||||
unstable.claude-code
|
||||
];
|
||||
|
||||
shellHook = ''
|
||||
if [ -f .env ]; then
|
||||
set -a
|
||||
source .env
|
||||
set +a
|
||||
echo ".env geladen"
|
||||
fi
|
||||
'';
|
||||
}
|
||||
|
||||
@@ -7,9 +7,11 @@ Voeg nieuwe toe onderaan en registreer in SCRAPERS.
|
||||
|
||||
import json
|
||||
import logging
|
||||
import re
|
||||
import time
|
||||
|
||||
import httpx
|
||||
from bs4 import BeautifulSoup
|
||||
|
||||
import config
|
||||
from huizenbot import RawListing
|
||||
@@ -40,8 +42,71 @@ def fetch_json(url: str, *, params: dict = None, headers: dict = None) -> dict |
|
||||
return r.json()
|
||||
|
||||
raise RuntimeError(f"Blijvend 429 op {url}")
|
||||
|
||||
|
||||
|
||||
|
||||
def _og_detail(url: str, makelaar: str) -> dict:
    """
    Fetch an OG Online detail page and extract missing fields.

    OG Online sites typically expose kenmerken in one of these patterns:
    1. An energielabel CSS class (energielabel-A, energielabel-B, etc.)
    2. A table/list with dt/dd pairs
    3. List items holding label/value span pairs

    Returns a dict with any fields found; empty dict on failure.
    """
    try:
        r = httpx.get(
            url,
            headers={"User-Agent": config.USER_AGENT},
            timeout=15,
            follow_redirects=True,
        )
        r.raise_for_status()
        soup = BeautifulSoup(r.text, "html.parser")

        # Pattern 1: energielabel CSS class on any element
        energielabel = None
        for el in soup.select("[class]"):
            for cls in el.get("class", []):
                if cls.startswith("energielabel-") and cls != "energielabel":
                    energielabel = cls.replace("energielabel-", "").upper()
                    break
            if energielabel:
                break

        # Pattern 2: kenmerken table; try dt/dd pairs first
        kv: dict[str, str] = {}
        dts = soup.select("dt")
        dds = soup.select("dd")
        for dt, dd in zip(dts, dds):
            kv[dt.get_text(strip=True).lower()] = dd.get_text(strip=True)

        # Pattern 3: ul.objectkenmerken / div.kenmerken span pairs
        if not kv:
            for li in soup.select("li"):
                spans = li.select("span")
                if len(spans) >= 2:
                    kv[spans[0].get_text(strip=True).lower()] = spans[1].get_text(strip=True)

        # Fall back to the key/value pairs when no CSS class carried the label
        if not energielabel:
            energielabel = (
                kv.get("energielabel")
                or kv.get("energieklasse")
                or kv.get("energie")
            ) or None

        raw_year = kv.get("bouwjaar") or ""
        bouwjaar = int(raw_year) if raw_year.isdigit() else None

        return {
            "energielabel": energielabel,
            "bouwjaar": bouwjaar,
        }
    except Exception as e:
        log.warning("%s: detail fetch fout %s: %s", makelaar, url, e)
        return {}
|
||||
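
The class-name pattern used by `_og_detail` can be exercised on its own, without fetching a page. The helper below is a stand-alone sketch: the name `label_from_classes` is hypothetical, and it works on a plain list of class names rather than on a parsed document.

```python
from typing import Iterable, Optional

_PREFIX = "energielabel-"


def label_from_classes(classes: Iterable[str]) -> Optional[str]:
    """Return the upper-cased label from the first 'energielabel-X' class, else None."""
    for cls in classes:
        # Require something after the prefix so a bare 'energielabel-' is ignored.
        if cls.startswith(_PREFIX) and len(cls) > len(_PREFIX):
            return cls[len(_PREFIX):].upper()
    return None
```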
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Bjornd
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -56,26 +121,36 @@ _STATUS_MAP = {
|
||||
"sold": "verkocht",
|
||||
"sold_ur": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
def fetch_bjornd() -> list[RawListing]:
|
||||
data = fetch_json(
|
||||
f"{_BJORND_BASE}/nl/realtime-listings/consumer",
|
||||
headers={"X-Requested-With": "XMLHttpRequest"},
|
||||
)
|
||||
|
||||
|
||||
listings = []
|
||||
for item in data:
|
||||
if not item.get("isSales"):
|
||||
continue
|
||||
if item.get("statusOrig") in _BJORND_SKIP:
|
||||
continue
|
||||
if item.get('salesPrice')>config.MAX_PRICE:
|
||||
if (item.get("salesPrice") or 0) > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
|
||||
|
||||
detail_url = _BJORND_BASE + item["url"]
|
||||
raw_year = item.get("dateOfConstruction") or ""
|
||||
bouwjaar = int(raw_year) if raw_year.isdigit() else None
|
||||
energielabel = item.get("energyLabel") or None
|
||||
|
||||
# Fetch detail page when API omits key fields
|
||||
if not energielabel or not bouwjaar:
|
||||
extra_kk = _og_detail(detail_url, "bjornd")
|
||||
energielabel = energielabel or extra_kk.get("energielabel")
|
||||
bouwjaar = bouwjaar or extra_kk.get("bouwjaar")
|
||||
|
||||
listings.append(RawListing(
|
||||
url=_BJORND_BASE + item["url"],
|
||||
url=detail_url,
|
||||
source_makelaar="bjornd",
|
||||
status=_STATUS_MAP.get(item.get("statusOrig", ""), "beschikbaar"),
|
||||
adres=item.get("address") or None,
|
||||
@@ -87,6 +162,8 @@ def fetch_bjornd() -> list[RawListing]:
|
||||
perceeloppervlak=item.get("plotSurface") or None,
|
||||
kamers=item.get("rooms") or None,
|
||||
slaapkamers=item.get("bedrooms") or None,
|
||||
bouwjaar=bouwjaar,
|
||||
energielabel=energielabel,
|
||||
hero_image_url=item.get("photo") or None,
|
||||
extra=json.dumps({
|
||||
"balcony": item.get("balcony"),
|
||||
@@ -102,10 +179,13 @@ def fetch_bjornd() -> list[RawListing]:
|
||||
"photos": item.get("photos"),
|
||||
}, ensure_ascii=False),
|
||||
))
|
||||
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
|
||||
log.info("bjornd: %d koopwoningen opgehaald", len(listings))
|
||||
return listings
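
The checks that each OG Online fetcher performs before building a `RawListing` (sales only, skip rented statuses, optional city filter, price cap) can be expressed as one predicate. This is a sketch, not code from the repository: `keep_item` and its parameters are hypothetical names, and treating a missing or null `salesPrice` as 0 is an assumption.

```python
from typing import Any, Mapping, Optional, Set

SKIP_STATUSES = {"rented", "rented_ur"}


def keep_item(
    item: Mapping[str, Any],
    max_price: int,
    cities: Optional[Set[str]] = None,
) -> bool:
    """Return True when a realtime-listings item should become a listing."""
    if not item.get("isSales"):
        return False
    if item.get("statusOrig") in SKIP_STATUSES:
        return False
    # The city filter only applies when a set of cities is given.
    if cities is not None and item.get("city") not in cities:
        return False
    # `or 0` also covers an explicit null price in the JSON.
    return (item.get("salesPrice") or 0) <= max_price
```

A fetcher for a multi-city broker would pass its city set, while a single-city broker would leave `cities` as `None`.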
|
||||
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Ooms
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -182,11 +262,373 @@ def fetch_ooms() -> list[RawListing]:
|
||||
log.info("ooms: %d listings opgehaald", len(listings))
|
||||
return listings
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Moerman & De Jong Makelaars (Schiedam)
|
||||
# ---------------------------------------------------------------------------
|
||||
# Same OG Online / realtime-listings platform as Bjornd.
|
||||
|
||||
_MOERMAN_BASE = "https://www.moerman-dejong.nl"
|
||||
_MOERMAN_SKIP = {"rented", "rented_ur"}
|
||||
|
||||
_MOERMAN_STATUS_MAP = {
|
||||
"available": "beschikbaar",
|
||||
"under_bid": "onder_bod",
|
||||
"under_option": "onder_bod",
|
||||
"sold": "verkocht",
|
||||
"sold_ur": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def fetch_moerman() -> list[RawListing]:
|
||||
data = fetch_json(
|
||||
f"{_MOERMAN_BASE}/nl/realtime-listings/consumer",
|
||||
headers={"X-Requested-With": "XMLHttpRequest"},
|
||||
)
|
||||
|
||||
listings = []
|
||||
for item in data:
|
||||
if not item.get("isSales"):
|
||||
continue
|
||||
if item.get("statusOrig") in _MOERMAN_SKIP:
|
||||
continue
|
||||
if (item.get("salesPrice") or 0) > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
postcode = (item.get("zipcode") or "").replace(" ", "") or None
|
||||
perceel = item.get("plotSurface") or None
|
||||
if perceel == 0:
|
||||
perceel = None
|
||||
|
||||
raw_year = item.get("dateOfConstruction") or ""
|
||||
bouwjaar = int(raw_year) if raw_year.isdigit() else None
|
||||
energielabel = item.get("energyLabel") or None
|
||||
|
||||
detail_url = _MOERMAN_BASE + item["url"]
|
||||
if not energielabel:
|
||||
extra_kk = _og_detail(detail_url, "moerman")
|
||||
energielabel = extra_kk.get("energielabel")
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="moerman",
|
||||
status=_MOERMAN_STATUS_MAP.get(item.get("statusOrig", ""), "beschikbaar"),
|
||||
adres=item.get("address") or None,
|
||||
postcode=postcode,
|
||||
stad=item.get("city") or None,
|
||||
prijs=item.get("salesPrice") or None,
|
||||
woningtype=item.get("type") or None,
|
||||
woonoppervlak=item.get("livingSurface") or None,
|
||||
perceeloppervlak=perceel,
|
||||
kamers=item.get("rooms") or None,
|
||||
slaapkamers=item.get("bedrooms") or None,
|
||||
bouwjaar=bouwjaar,
|
||||
energielabel=energielabel,
|
||||
hero_image_url=item.get("photo") or None,
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
|
||||
log.info("moerman: %d koopwoningen opgehaald", len(listings))
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Van Daal Makelaardij (Delft)
|
||||
# ---------------------------------------------------------------------------
|
||||
# OG Online / realtime-listings platform.
|
||||
|
||||
_VANDAAL_BASE = "https://www.vandaalmakelaardij.nl"
|
||||
_VANDAAL_SKIP = {"rented", "rented_ur"}
|
||||
|
||||
_VANDAAL_STATUS_MAP = {
|
||||
"available": "beschikbaar",
|
||||
"under_bid": "onder_bod",
|
||||
"under_option": "onder_bod",
|
||||
"is_bought": "verkocht",
|
||||
"sold": "verkocht",
|
||||
"sold_ur": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def fetch_vandaal() -> list[RawListing]:
|
||||
data = fetch_json(
|
||||
f"{_VANDAAL_BASE}/nl/realtime-listings/consumer",
|
||||
headers={"X-Requested-With": "XMLHttpRequest"},
|
||||
)
|
||||
|
||||
listings = []
|
||||
for item in data:
|
||||
if not item.get("isSales"):
|
||||
continue
|
||||
if item.get("statusOrig") in _VANDAAL_SKIP:
|
||||
continue
|
||||
if (item.get("salesPrice") or 0) > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
postcode = (item.get("zipcode") or "").replace(" ", "") or None
|
||||
perceel = item.get("plotSurface") or None
|
||||
if perceel == 0:
|
||||
perceel = None
|
||||
|
||||
raw_year = item.get("dateOfConstruction") or ""
|
||||
bouwjaar = int(raw_year) if raw_year.isdigit() else None
|
||||
energielabel = item.get("energyLabel") or None
|
||||
|
||||
detail_url = _VANDAAL_BASE + item["url"]
|
||||
if not energielabel:
|
||||
extra_kk = _og_detail(detail_url, "vandaal")
|
||||
energielabel = extra_kk.get("energielabel")
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="vandaal",
|
||||
status=_VANDAAL_STATUS_MAP.get(item.get("statusOrig", ""), "beschikbaar"),
|
||||
adres=item.get("address") or None,
|
||||
postcode=postcode,
|
||||
stad=item.get("city") or None,
|
||||
prijs=item.get("salesPrice") or None,
|
||||
woningtype=item.get("type") or None,
|
||||
woonoppervlak=item.get("livingSurface") or None,
|
||||
perceeloppervlak=perceel,
|
||||
kamers=item.get("rooms") or None,
|
||||
slaapkamers=item.get("bedrooms") or None,
|
||||
bouwjaar=bouwjaar,
|
||||
energielabel=energielabel,
|
||||
hero_image_url=item.get("photo") or None,
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
|
||||
log.info("vandaal: %d koopwoningen opgehaald", len(listings))
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Elzenaar NVM Makelaars (Den Haag) — OG Online platform
|
||||
# ---------------------------------------------------------------------------
|
||||
# Same platform as bjornd/moerman/vandaal.
|
||||
|
||||
_ELZENAAR_BASE = "https://www.elzenaar.com"
|
||||
_ELZENAAR_SKIP = {"rented", "rented_ur"}
|
||||
_ELZENAAR_CITIES = {"Den Haag", "Voorburg", "Rijswijk"}
|
||||
|
||||
_ELZENAAR_STATUS_MAP = {
|
||||
"available": "beschikbaar",
|
||||
"under_bid": "onder_bod",
|
||||
"under_option": "onder_bod",
|
||||
"sold": "verkocht",
|
||||
"sold_ur": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def fetch_elzenaar() -> list[RawListing]:
|
||||
data = fetch_json(
|
||||
f"{_ELZENAAR_BASE}/nl/realtime-listings/consumer",
|
||||
headers={"X-Requested-With": "XMLHttpRequest"},
|
||||
)
|
||||
|
||||
listings = []
|
||||
for item in data:
|
||||
if not item.get("isSales"):
|
||||
continue
|
||||
if item.get("statusOrig") in _ELZENAAR_SKIP:
|
||||
continue
|
||||
if item.get("city") not in _ELZENAAR_CITIES:
|
||||
continue
|
||||
if (item.get("salesPrice") or 0) > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
postcode = (item.get("zipcode") or "").replace(" ", "") or None
|
||||
perceel = item.get("plotSurface") or None
|
||||
if perceel == 0:
|
||||
perceel = None
|
||||
|
||||
raw_year = item.get("dateOfConstruction") or ""
|
||||
bouwjaar = int(raw_year) if raw_year.isdigit() else None
|
||||
energielabel = item.get("energyLabel") or None
|
||||
|
||||
detail_url = _ELZENAAR_BASE + item["url"]
|
||||
if not energielabel:
|
||||
extra_kk = _og_detail(detail_url, "elzenaar")
|
||||
energielabel = extra_kk.get("energielabel")
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="elzenaar",
|
||||
status=_ELZENAAR_STATUS_MAP.get(item.get("statusOrig", ""), "beschikbaar"),
|
||||
adres=item.get("address") or None,
|
||||
postcode=postcode,
|
||||
stad=item.get("city") or None,
|
||||
prijs=item.get("salesPrice") or None,
|
||||
woningtype=item.get("type") or None,
|
||||
woonoppervlak=item.get("livingSurface") or None,
|
||||
perceeloppervlak=perceel,
|
||||
kamers=item.get("rooms") or None,
|
||||
slaapkamers=item.get("bedrooms") or None,
|
||||
bouwjaar=bouwjaar,
|
||||
energielabel=energielabel,
|
||||
hero_image_url=item.get("photo") or None,
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
|
||||
log.info("elzenaar: %d koopwoningen opgehaald", len(listings))
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# DOEN NVM Makelaars (Den Haag / Leiden / Voorburg) — OG Online platform
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_DOEN_BASE = "https://www.doenmakelaars.com"
|
||||
_DOEN_SKIP = {"rented", "rented_ur"}
|
||||
_DOEN_CITIES = {"Den Haag", "Leiden", "Voorburg", "Leidschendam", "Rijswijk", "Wassenaar", "Zoetermeer"}
|
||||
|
||||
_DOEN_STATUS_MAP = {
|
||||
"available": "beschikbaar",
|
||||
"under_bid": "onder_bod",
|
||||
"under_option": "onder_bod",
|
||||
"sold": "verkocht",
|
||||
"sold_ur": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def fetch_doen() -> list[RawListing]:
|
||||
data = fetch_json(
|
||||
f"{_DOEN_BASE}/nl/realtime-listings/consumer",
|
||||
headers={"X-Requested-With": "XMLHttpRequest"},
|
||||
)
|
||||
|
||||
listings = []
|
||||
for item in data:
|
||||
if not item.get("isSales"):
|
||||
continue
|
||||
if item.get("statusOrig") in _DOEN_SKIP:
|
||||
continue
|
||||
if item.get("city") not in _DOEN_CITIES:
|
||||
continue
|
||||
if (item.get("salesPrice") or 0) > config.MAX_PRICE:  # also safe when salesPrice is null
|
||||
continue
|
||||
|
||||
postcode = (item.get("zipcode") or "").replace(" ", "") or None
|
||||
perceel = item.get("plotSurface") or None
|
||||
if perceel == 0:
|
||||
perceel = None
|
||||
|
||||
raw_year = str(item.get("dateOfConstruction") or "")  # tolerate numeric years
|
||||
bouwjaar = int(raw_year) if raw_year.isdigit() else None
|
||||
energielabel = item.get("energyLabel") or None
|
||||
|
||||
detail_url = _DOEN_BASE + item["url"]
|
||||
if not energielabel:
|
||||
extra_kk = _og_detail(detail_url, "doen")
|
||||
energielabel = extra_kk.get("energielabel")
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="doen",
|
||||
status=_DOEN_STATUS_MAP.get(item.get("statusOrig", ""), "beschikbaar"),
|
||||
adres=item.get("address") or None,
|
||||
postcode=postcode,
|
||||
stad=item.get("city") or None,
|
||||
prijs=item.get("salesPrice") or None,
|
||||
woningtype=item.get("type") or None,
|
||||
woonoppervlak=item.get("livingSurface") or None,
|
||||
perceeloppervlak=perceel,
|
||||
kamers=item.get("rooms") or None,
|
||||
slaapkamers=item.get("bedrooms") or None,
|
||||
bouwjaar=bouwjaar,
|
||||
energielabel=energielabel,
|
||||
hero_image_url=item.get("photo") or None,
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
|
||||
log.info("doen: %d koopwoningen opgehaald", len(listings))
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Vandriel Makelaardij (Schiedam) — OG Online / realtime-listings
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_VANDRIEL_BASE = "https://www.vandrielmakelaardij.nl"
|
||||
_VANDRIEL_SKIP = {"rented", "rented_ur"}
|
||||
|
||||
_VANDRIEL_STATUS_MAP = {
|
||||
"available": "beschikbaar",
|
||||
"under_bid": "onder_bod",
|
||||
"under_option": "onder_bod",
|
||||
"sold": "verkocht",
|
||||
"sold_ur": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def fetch_vandriel() -> list[RawListing]:
|
||||
data = fetch_json(
|
||||
f"{_VANDRIEL_BASE}/nl/realtime-listings/consumer",
|
||||
headers={"X-Requested-With": "XMLHttpRequest"},
|
||||
)
|
||||
|
||||
listings = []
|
||||
for item in data:
|
||||
if not item.get("isSales"):
|
||||
continue
|
||||
if item.get("statusOrig") in _VANDRIEL_SKIP:
|
||||
continue
|
||||
if (item.get("city") or "").lower() != "schiedam":
|
||||
continue
|
||||
if (item.get("salesPrice") or 0) > config.MAX_PRICE:  # also safe when salesPrice is null
|
||||
continue
|
||||
|
||||
postcode = (item.get("zipcode") or "").replace(" ", "") or None
|
||||
perceel = item.get("plotSurface") or None
|
||||
if perceel == 0:
|
||||
perceel = None
|
||||
|
||||
raw_year = str(item.get("dateOfConstruction") or "")  # tolerate numeric years
|
||||
bouwjaar = int(raw_year) if raw_year.isdigit() else None
|
||||
energielabel = item.get("energyLabel") or None
|
||||
|
||||
detail_url = _VANDRIEL_BASE + item["url"]
|
||||
if not energielabel:
|
||||
extra_kk = _og_detail(detail_url, "vandriel")
|
||||
energielabel = extra_kk.get("energielabel")
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="vandriel",
|
||||
status=_VANDRIEL_STATUS_MAP.get(item.get("statusOrig", ""), "beschikbaar"),
|
||||
adres=item.get("address") or None,
|
||||
postcode=postcode,
|
||||
stad=item.get("city") or None,
|
||||
prijs=item.get("salesPrice") or None,
|
||||
woningtype=item.get("type") or None,
|
||||
woonoppervlak=item.get("livingSurface") or None,
|
||||
perceeloppervlak=perceel,
|
||||
kamers=item.get("rooms") or None,
|
||||
slaapkamers=item.get("bedrooms") or None,
|
||||
bouwjaar=bouwjaar,
|
||||
energielabel=energielabel,
|
||||
hero_image_url=item.get("photo") or None,
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
|
||||
log.info("vandriel: %d koopwoningen opgehaald", len(listings))
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# SCRAPERS: export all active API adapters here
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
SCRAPERS = {
|
||||
'bjornd': fetch_bjornd,
|
||||
'ooms': fetch_ooms,
|
||||
'moerman': fetch_moerman,
|
||||
'vandaal': fetch_vandaal,
|
||||
'elzenaar': fetch_elzenaar,
|
||||
'doen': fetch_doen,
|
||||
'vandriel': fetch_vandriel,
|
||||
}
|
||||
|
||||
65
src/adapters/ssr/__init__.py
Normal file
@@ -0,0 +1,65 @@
|
||||
"""
|
||||
adapters/ssr — HTML/SSR-based makelaars
|
||||
|
||||
Each scraper is a function () -> list[RawListing].
To add a new makelaar (real-estate agent):
1. Add a fetch_* function to the appropriate submodule
(realworks.py, sure.py, schiedam.py, denhaag.py, overige.py)
2. Import the function here and register it in SCRAPERS.
|
||||
|
||||
CMS types per module:
|
||||
realworks.py — Realworks CMS (li/div.aanbodEntry + span.kenmerk detail)
|
||||
sure.py — SURE WordPress plugin (/wonen?sure_koop_huur=koop + #kenmerken)
|
||||
schiedam.py — Custom Schiedam scrapers (diverse platforms)
|
||||
denhaag.py — Den Haag scrapers (diverse platforms)
|
||||
overige.py — Overige / multi-stad (OG Online WP, Elementor)
|
||||
"""
|
||||
|
||||
from .realworks import (
|
||||
fetch_ankebodewes,
|
||||
fetch_woongoed,
|
||||
fetch_vwmakelaars,
|
||||
fetch_zomakelaars,
|
||||
fetch_morris,
|
||||
fetch_wassenaar,
|
||||
fetch_roepman,
|
||||
fetch_post,
|
||||
fetch_vankleef,
|
||||
)
|
||||
from .sure import (
|
||||
fetch_schielandborsboom,
|
||||
fetch_olsthoorn,
|
||||
fetch_vanherk,
|
||||
fetch_borgdorff,
|
||||
)
|
||||
from .schiedam import (
|
||||
fetch_dewittegarantiemakelaars,
|
||||
fetch_dens,
|
||||
fetch_3dmakelaars,
|
||||
fetch_dupont,
|
||||
)
|
||||
from .denhaag import fetch_88makelaars
|
||||
from .overige import fetch_vansilfhout, fetch_vanoord
|
||||
|
||||
SCRAPERS = {
|
||||
'ankebodewes': fetch_ankebodewes,
|
||||
'woongoed': fetch_woongoed,
|
||||
'dewittegarantiemakelaars': fetch_dewittegarantiemakelaars,
|
||||
'wassenaar': fetch_wassenaar,
|
||||
'dens': fetch_dens,
|
||||
'3dmakelaars': fetch_3dmakelaars,
|
||||
'dupont': fetch_dupont,
|
||||
'schielandborsboom': fetch_schielandborsboom,
|
||||
'vansilfhout': fetch_vansilfhout,
|
||||
'vwmakelaars': fetch_vwmakelaars,
|
||||
'roepman': fetch_roepman,
|
||||
'zomakelaars': fetch_zomakelaars,
|
||||
'post': fetch_post,
|
||||
'morris': fetch_morris,
|
||||
'olsthoorn': fetch_olsthoorn,
|
||||
'88makelaars': fetch_88makelaars,
|
||||
'borgdorff': fetch_borgdorff,
|
||||
'vanherk': fetch_vanherk,
|
||||
'vanoord': fetch_vanoord,
|
||||
'vankleef': fetch_vankleef,
|
||||
}
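Reviewer sketch: the SCRAPERS registry maps a name to a zero-argument fetch function. The code that consumes the registry is not shown in this part of the diff, so the runner below is only an assumption about how such a mapping might be iterated. `run_scrapers` and the logger name are hypothetical. The point it illustrates is that one failing makelaar site should not abort the remaining scrapers.

```python
import logging
from typing import Callable

log = logging.getLogger("huizenbot.example")  # hypothetical logger name


def run_scrapers(scrapers: dict[str, Callable[[], list]]) -> dict[str, list]:
    """Call every registered scraper and isolate failures per scraper."""
    results: dict[str, list] = {}
    for name, fetch in scrapers.items():
        try:
            results[name] = fetch()
        except Exception as exc:  # deliberately broad: any site can break at any time
            log.warning("%s: scraper failed: %s", name, exc)
            results[name] = []
    return results
```

Returning an empty list for a failed scraper keeps the result shape uniform, at the cost of hiding the difference between "no listings" and "site down"; a real runner may want to record that separately.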
|
||||
79
src/adapters/ssr/_shared.py
Normal file
@@ -0,0 +1,79 @@
|
||||
"""Shared utilities for all SSR scrapers."""
|
||||
import logging
|
||||
import re
|
||||
import time
|
||||
|
||||
import httpx
|
||||
from bs4 import BeautifulSoup
|
||||
|
||||
import config
|
||||
|
||||
log = logging.getLogger("huizenbot.ssr")
|
||||
|
||||
|
||||
def fetch_soup(url: str, *, params: dict | None = None) -> BeautifulSoup:
"""GET request → BeautifulSoup. Handles 429 responses using Retry-After."""
|
||||
for attempt in range(3):
|
||||
r = httpx.get(
|
||||
url,
|
||||
params=params,
|
||||
headers={"User-Agent": config.USER_AGENT},
|
||||
timeout=15,
|
||||
follow_redirects=True,
|
||||
)
|
||||
if r.status_code == 429:
|
||||
retry_after = r.headers.get("Retry-After", "")
wait = int(retry_after) if retry_after.isdigit() else 60  # header may be an HTTP-date
|
||||
log.warning("429 op %s, wacht %ds", url, wait)
|
||||
time.sleep(wait)
|
||||
continue
|
||||
r.raise_for_status()
|
||||
return BeautifulSoup(r.text, "html.parser")
|
||||
|
||||
raise RuntimeError(f"Blijvend 429 op {url}")
|
||||
|
||||
|
||||
def parse_prijs(text: str | None) -> int | None:
|
||||
"""'€ 325.000 k.k.' → 325000"""
|
||||
if not text:
|
||||
return None
|
||||
digits = re.sub(r"[^\d]", "", text)
|
||||
return int(digits) if digits else None
|
||||
|
||||
|
||||
def parse_m2(text: str | None) -> int | None:
|
||||
"""'87 m²' → 87"""
|
||||
if not text:
|
||||
return None
|
||||
m = re.search(r"(\d+)", text.replace(".", ""))
|
||||
return int(m.group(1)) if m else None
|
||||
|
||||
|
||||
def _text(soup, selector: str) -> str | None:
|
||||
el = soup.select_one(selector)
|
||||
return el.get_text(strip=True) if el else None
|
||||
|
||||
|
||||
def _src(soup, selector: str) -> str | None:
|
||||
el = soup.select_one(selector)
|
||||
if el is None:
|
||||
return None
|
||||
return el.get("src") or el.get("data-src")
|
||||
|
||||
|
||||
def _extract_postcode(text: str | None) -> str | None:
|
||||
if not text:
|
||||
return None
|
||||
m = re.search(r"\b(\d{4}\s?[A-Z]{2})\b", text)
|
||||
return m.group(1).replace(" ", "") if m else None
|
||||
|
||||
|
||||
def _infer_stad(postcode: str | None) -> str | None:
|
||||
"""Simple mapping based on postcode range; extend as needed."""
|
||||
if not postcode:
|
||||
return None
|
||||
code = int(postcode[:4])
|
||||
if 2600 <= code <= 2629:
|
||||
return "Delft"
|
||||
if 3100 <= code <= 3135:
|
||||
return "Schiedam"
|
||||
return None
|
||||
138
src/adapters/ssr/denhaag.py
Normal file
@@ -0,0 +1,138 @@
|
||||
"""
|
||||
Den Haag scrapers (custom platforms).
|
||||
|
||||
Scrapers: 88makelaars
|
||||
Note: borgdorff also covers Den Haag but uses the SURE CMS → see sure.py.
|
||||
"""
|
||||
import re
|
||||
|
||||
import config
|
||||
from huizenbot import RawListing
|
||||
|
||||
from ._shared import fetch_soup, parse_prijs, parse_m2, _text, log
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# 88 Makelaars (Den Haag)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_88_BASE = "https://88makelaars.nl"
|
||||
|
||||
_88_STATUS_MAP = {
|
||||
"te koop": "beschikbaar",
|
||||
"beschikbaar": "beschikbaar",
|
||||
"onder bod": "onder_bod",
|
||||
"onder optie": "onder_bod",
|
||||
"verkocht onder voorbehoud": "verkocht",
|
||||
"verkocht": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def _88makelaars_detail(detail_url: str) -> dict:
|
||||
"""Fetch 88makelaars detail page; extract kenmerken from div.listing_detail kv pairs."""
|
||||
try:
|
||||
soup = fetch_soup(detail_url)
|
||||
kv: dict[str, str] = {}
|
||||
for div in soup.select("div.listing_detail"):
|
||||
txt = div.get_text(strip=True)
|
||||
if ":" in txt:
|
||||
label, _, value = txt.partition(":")
|
||||
kv[label.strip().lower()] = value.strip()
|
||||
raw_pc = kv.get("postcode") or ""
|
||||
pc_match = re.search(r"\d{4}\s*[A-Z]{2}", raw_pc.upper())
|
||||
postcode = pc_match.group(0).replace(" ", "") if pc_match else None
|
||||
return {
|
||||
"postcode": postcode,
|
||||
"slaapkamers": kv.get("slaapkamers"),
|
||||
"woonoppervlak": kv.get("woning grootte"),
|
||||
"energielabel": kv.get("energieklasse"),
|
||||
"woningtype": kv.get("soort woning"),
|
||||
}
|
||||
except Exception as e:
|
||||
log.warning("88makelaars: detail fetch fout %s: %s", detail_url, e)
|
||||
return {}
|
||||
|
||||
|
||||
def fetch_88makelaars() -> list[RawListing]:
|
||||
"""Fetch 88 Makelaars listings (Den Haag only)."""
|
||||
listings = []
|
||||
page = 1
|
||||
|
||||
while True:
|
||||
if page == 1:
|
||||
url = f"{_88_BASE}/ons-aanbod/"
|
||||
else:
|
||||
url = f"{_88_BASE}/ons-aanbod/page/{page}/"
|
||||
soup = fetch_soup(url)
|
||||
cards = soup.select("div.property_listing")
|
||||
if not cards:
|
||||
break
|
||||
|
||||
for card in cards:
|
||||
try:
|
||||
# URL from carousel
|
||||
a_tag = card.select_one(".property_unit_carousel a[href]")
|
||||
if not a_tag:
|
||||
continue
|
||||
detail_url = a_tag["href"]
|
||||
if not detail_url.startswith("http"):
|
||||
detail_url = _88_BASE + detail_url
|
||||
|
||||
# City — last link in property_location_image
|
||||
loc_links = card.select(".property_location_image a")
|
||||
stad = loc_links[-1].get_text(strip=True) if loc_links else None
|
||||
if not stad or stad.lower() != "den haag":
|
||||
continue
|
||||
|
||||
# Price
|
||||
prijs = parse_prijs(_text(card, ".listing_unit_price_wrapper"))
|
||||
if prijs and prijs > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
# Status
|
||||
status_text = (_text(card, ".ribbon-inside") or "").lower()
|
||||
status = _88_STATUS_MAP.get(status_text, "beschikbaar")
|
||||
|
||||
# Address
|
||||
adres = _text(card, "h4 a") or _text(card, "h4")
|
||||
|
||||
# Surface + rooms
|
||||
woonoppervlak_card = parse_m2(_text(card, "span.infosize"))
|
||||
kamers_card = None
|
||||
rooms_txt = _text(card, "span.inforoom")
|
||||
if rooms_txt:
|
||||
m = re.search(r"(\d+)", rooms_txt)
|
||||
kamers_card = int(m.group(1)) if m else None
|
||||
|
||||
# Hero: first active carousel image
|
||||
img = card.select_one(".item.active img")
|
||||
hero = img.get("src") or img.get("data-original") if img else None
|
||||
|
||||
kk = _88makelaars_detail(detail_url)
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="88makelaars",
|
||||
status=status,
|
||||
adres=adres,
|
||||
postcode=kk.get("postcode"),
|
||||
stad="Den Haag",
|
||||
prijs=prijs,
|
||||
hero_image_url=hero,
|
||||
woningtype=kk.get("woningtype"),
|
||||
woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
|
||||
kamers=kamers_card,
|
||||
slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
|
||||
energielabel=kk.get("energielabel"),
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
except Exception as e:
|
||||
log.warning("88makelaars: parse fout: %s", e)
|
||||
|
||||
if len(cards) < 10:
|
||||
break
|
||||
page += 1
|
||||
|
||||
log.info("88makelaars: %d listings opgehaald", len(listings))
|
||||
return listings
|
||||
288
src/adapters/ssr/overige.py
Normal file
@@ -0,0 +1,288 @@
|
||||
"""
|
||||
Overige SSR scrapers (no shared CMS platform, multi-city).
|
||||
|
||||
Scrapers: vansilfhout (OG Online WordPress), vanoord (Elementor/custom)
|
||||
"""
|
||||
import re
|
||||
|
||||
import config
|
||||
from huizenbot import RawListing
|
||||
|
||||
from ._shared import fetch_soup, parse_prijs, parse_m2, _text, log
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Van Silfhout & Hogetoorn Wereldmakelaars (Delft) — OG Online WordPress
|
||||
# ---------------------------------------------------------------------------
|
||||
# All listings on one page. Postcode embedded in JS; detail has shortSpecs.
|
||||
# Note: fetch_vwmakelaars and fetch_zomakelaars do not use this platform; they run
# the standard Realworks CMS (see realworks.py).
|
||||
|
||||
_VANSILFHOUT_BASE = "https://www.vansilfhout.nl"
|
||||
|
||||
_VANSILFHOUT_STATUS_MAP = {
|
||||
"te koop": "beschikbaar",
|
||||
"onder bod": "onder_bod",
|
||||
"verkocht": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def _vansilfhout_detail(detail_url: str) -> dict:
|
||||
"""Fetch Van Silfhout detail page; extract postcode from JS and specs from shortSpecs."""
|
||||
try:
|
||||
import httpx
|
||||
r = httpx.get(
|
||||
detail_url,
|
||||
headers={"User-Agent": config.USER_AGENT},
|
||||
timeout=15,
|
||||
follow_redirects=True,
|
||||
)
|
||||
r.raise_for_status()
|
||||
html = r.text
|
||||
from bs4 import BeautifulSoup
|
||||
soup = BeautifulSoup(html, "html.parser")
|
||||
|
||||
# Postcode embedded in JS: objectZipcode': '2624NP'
|
||||
m = re.search(r"objectZipcode':\s*'([^']+)'", html)
|
||||
postcode = m.group(1) if m else None
|
||||
|
||||
# shortSpecs: <li><span>Label:</span><span>Value</span></li>
|
||||
kv: dict[str, str] = {}
|
||||
for li in soup.select(".shortSpecs li"):
|
||||
spans = li.select("span")
|
||||
if len(spans) >= 2:
|
||||
label = spans[0].get_text(strip=True).rstrip(":").lower()
|
||||
value = spans[-1].get_text(strip=True)
|
||||
kv[label] = value
|
||||
|
||||
return {
|
||||
"postcode": postcode,
|
||||
"bouwjaar": kv.get("bouwjaar"),
|
||||
"woonoppervlak": kv.get("oppervlakte"),
|
||||
"kamers": kv.get("kamers"),
|
||||
"slaapkamers": kv.get("slaapkamers"),
|
||||
}
|
||||
except Exception as e:
|
||||
log.warning("vansilfhout: detail fetch fout %s: %s", detail_url, e)
|
||||
return {}
|
||||
|
||||
|
||||
def fetch_vansilfhout() -> list[RawListing]:
|
||||
"""Fetch Van Silfhout listings (all listings on a single page)."""
|
||||
soup = fetch_soup(f"{_VANSILFHOUT_BASE}/woningaanbod/")
|
||||
listings = []
|
||||
|
||||
for card in soup.select("article.row"):
|
||||
try:
|
||||
a_tag = card.select_one("a.objectcontainerimg")
|
||||
if not a_tag or "href" not in a_tag.attrs:
|
||||
continue
|
||||
detail_url = a_tag["href"]
|
||||
if not detail_url.startswith("http"):
|
||||
detail_url = _VANSILFHOUT_BASE + detail_url
|
||||
|
||||
# Status
|
||||
status_text = (_text(card, "span.objectstatus") or "").lower()
|
||||
status = _VANSILFHOUT_STATUS_MAP.get(status_text, "beschikbaar")
|
||||
|
||||
# Address and city
|
||||
adres = _text(card, "h2.objecttitle")
|
||||
city_el = card.select("a.straatnaamwoonplaats span")
|
||||
stad = city_el[-1].get_text(strip=True) if city_el else None
|
||||
|
||||
# Price from shortSpecs strong
|
||||
prijs = parse_prijs(_text(card, "ul.shortSpecs li strong"))
|
||||
if prijs and prijs > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
# Area and rooms from shortSpecs
|
||||
woonoppervlak_card = None
|
||||
kamers_card = None
|
||||
for li in card.select("ul.shortSpecs li"):
|
||||
spans = li.select("span")
|
||||
if len(spans) >= 2:
|
||||
label = spans[0].get_text(strip=True).lower()
|
||||
val = spans[-1].get_text(strip=True)
|
||||
if "oppervlakt" in label:
|
||||
woonoppervlak_card = parse_m2(val)
|
||||
elif "kamer" in label:
|
||||
m = re.search(r"(\d+)", val)
|
||||
kamers_card = int(m.group(1)) if m else None
|
||||
|
||||
# Hero image: prefer data-lazy-src, fall back to noscript img src
|
||||
img_tag = card.select_one("a.objectcontainerimg img")
|
||||
hero = None
|
||||
if img_tag:
|
||||
hero = (img_tag.get("data-lazy-src")
|
||||
or img_tag.get("src") or None)
|
||||
if hero and hero.startswith("data:"):
|
||||
noscript = card.select_one("noscript img")
|
||||
hero = noscript["src"] if noscript else None
|
||||
|
||||
kk = _vansilfhout_detail(detail_url)
|
||||
|
||||
# Parse kamers/slaapkamers from detail
|
||||
kamers = kamers_card
|
||||
if kk.get("kamers"):
|
||||
m = re.search(r"(\d+)", kk["kamers"])
|
||||
kamers = int(m.group(1)) if m else kamers_card
|
||||
|
||||
slaapkamers = None
|
||||
if kk.get("slaapkamers"):
|
||||
m = re.search(r"(\d+)", kk["slaapkamers"])
|
||||
slaapkamers = int(m.group(1)) if m else None
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="vansilfhout",
|
||||
status=status,
|
||||
adres=adres,
|
||||
postcode=kk.get("postcode"),
|
||||
stad=stad,
|
||||
prijs=prijs,
|
||||
hero_image_url=hero,
|
||||
bouwjaar=int(kk["bouwjaar"]) if (kk.get("bouwjaar") or "").isdigit() else None,
|
||||
woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
|
||||
kamers=kamers,
|
||||
slaapkamers=slaapkamers,
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
except Exception as e:
|
||||
log.warning("vansilfhout: parse fout: %s", e)
|
||||
|
||||
log.info("vansilfhout: %d listings opgehaald", len(listings))
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Van Oord Makelaardij (Delft + Schiedam) — Elementor/custom WordPress
|
||||
# ---------------------------------------------------------------------------
|
||||
# Separate listing pages per city; detail page has rw-object-features-list.
|
||||
|
||||
_VANOORD_BASE = "https://www.vanoordmakelaardij.nl"
|
||||
_VANOORD_LISTINGS = [
|
||||
f"https://www.vanoordmakelaardij.nl/aanbod/?_price=0%2C{config.MAX_PRICE}&_city=Delft&_availability=Te+koop",
|
||||
f"https://www.vanoordmakelaardij.nl/aanbod/?_price=0%2C{config.MAX_PRICE}&_city=Schiedam&_availability=Te+koop",
|
||||
]
|
||||
|
||||
_VANOORD_STATUS_MAP = {
|
||||
"te koop": "beschikbaar",
|
||||
"onder bod": "onder_bod",
|
||||
"verkocht": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def _vanoord_detail(detail_url: str) -> dict:
|
||||
"""Fetch Van Oord detail page; extract kenmerken from rw-object-features-list and postcode."""
|
||||
try:
|
||||
soup = fetch_soup(detail_url)
|
||||
kv: dict[str, str] = {}
|
||||
for li in soup.select("ul.rw-object-features-list li"):
|
||||
label_el = li.select_one("span.rw-object-list-label")
|
||||
value_el = li.select_one("span.rw-object-list-value")
|
||||
if label_el and value_el:
|
||||
label = label_el.get_text(strip=True).lower()
|
||||
value = value_el.get_text(strip=True)
|
||||
kv[label] = value
|
||||
# Postcode is in first .elementor-heading-title (e.g. "3562 TN,")
|
||||
headings = soup.select(".elementor-heading-title")
|
||||
postcode = None
|
||||
if headings:
|
||||
postcode = headings[0].get_text(strip=True).rstrip(",").replace(" ", "") or None
|
||||
return {
|
||||
"status": kv.get("status", "").lower(),
|
||||
"postcode": postcode,
|
||||
"bouwjaar": kv.get("bouwjaar"),
|
||||
"woonoppervlak": kv.get("woonoppervlakte"),
|
||||
"kamers": kv.get("aantal kamers"),
|
||||
"slaapkamers": kv.get("slaapkamers"),
|
||||
"energielabel": kv.get("energieklasse"),
|
||||
}
|
||||
except Exception as e:
|
||||
log.warning("vanoord: detail fetch fout %s: %s", detail_url, e)
|
||||
return {}
|
||||
|
||||
|
||||
|
||||
def fetch_vanoord() -> list[RawListing]:
|
||||
"""Fetch Van Oord listings; Delft and Schiedam, only koop."""
|
||||
seen: set[str] = set()
|
||||
listings = []
|
||||
|
||||
for listing_url in _VANOORD_LISTINGS:
|
||||
soup = fetch_soup(listing_url)
|
||||
cards = soup.select("div.e-loop-item")
|
||||
|
||||
for card in cards:
|
||||
try:
|
||||
# Detail URL from h3 > a
|
||||
a_tag = card.select_one("h3.elementor-heading-title a[href]")
|
||||
if not a_tag:
|
||||
continue
|
||||
detail_url = a_tag["href"]
|
||||
if not detail_url.startswith("http"):
|
||||
detail_url = _VANOORD_BASE + detail_url
|
||||
if detail_url in seen:
|
||||
continue
|
||||
seen.add(detail_url)
|
||||
|
||||
# Status from rw-status-label widget class
|
||||
status_el = card.select_one("[class*='rw-status-label--']")
|
||||
status = "beschikbaar"
|
||||
if status_el:
|
||||
status_text = status_el.get_text(strip=True).lower()
|
||||
status = _VANOORD_STATUS_MAP.get(status_text, "beschikbaar")
|
||||
|
||||
# City from h4
|
||||
h4 = card.select_one("h4.elementor-heading-title")
|
||||
stad = h4.get_text(strip=True) if h4 else None
|
||||
|
||||
# Address from h3 > a text
|
||||
adres = " ".join(a_tag.get_text().split())
|
||||
|
||||
# Price from h3 without <a> child
|
||||
prijs = None
|
||||
for h3 in card.select("h3.elementor-heading-title"):
|
||||
if not h3.select_one("a"):
|
||||
prijs = parse_prijs(h3.get_text())
|
||||
break
|
||||
if prijs and prijs > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
# Card icon list: [0]=surface [1]=rooms [2]=energy
|
||||
icon_items = card.select("ul.elementor-icon-list-items li span.elementor-icon-list-text")
|
||||
woonoppervlak_card = parse_m2(icon_items[0].get_text()) if len(icon_items) > 0 else None
|
||||
kamers_card = None
|
||||
if len(icon_items) > 1:
|
||||
m = re.search(r"(\d+)", icon_items[1].get_text())
|
||||
kamers_card = int(m.group(1)) if m else None
|
||||
energielabel_card = icon_items[2].get_text(strip=True) if len(icon_items) > 2 else None
|
||||
|
||||
kk = _vanoord_detail(detail_url)
|
||||
|
||||
detail_status = _VANOORD_STATUS_MAP.get(kk.get("status", ""), "")
|
||||
if detail_status:
|
||||
status = detail_status
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="vanoord",
|
||||
status=status,
|
||||
adres=adres,
|
||||
postcode=kk.get("postcode"),
|
||||
stad=stad,
|
||||
prijs=prijs,
|
||||
bouwjaar=int(kk["bouwjaar"]) if (kk.get("bouwjaar") or "").isdigit() else None,
|
||||
woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
|
||||
kamers=(int(kk["kamers"]) if (kk.get("kamers") or "").isdigit() else None) or kamers_card,
|
||||
slaapkamers=int(kk["slaapkamers"]) if (kk.get("slaapkamers") or "").isdigit() else None,
|
||||
energielabel=kk.get("energielabel") or energielabel_card,
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
except Exception as e:
|
||||
log.warning("vanoord: parse fout: %s", e)
|
||||
|
||||
log.info("vanoord: %d listings opgehaald", len(listings))
|
||||
return listings
|
||||
568
src/adapters/ssr/realworks.py
Normal file
@@ -0,0 +1,568 @@
|
||||
"""
|
||||
Realworks CMS scrapers.
|
||||
|
||||
All makelaars here run the Realworks CMS. Listings come from paginated
|
||||
/aanbod/woningaanbod/-{price}/koop/ pages; detail pages have span.kenmerk
|
||||
label/value pairs. Some variants (Wassenaar, Roepman) expose listing-level
|
||||
data via JSON-LD instead of card HTML.
|
||||
|
||||
Scrapers: ankebodewes, woongoed, vwmakelaars, zomakelaars, morris,
|
||||
wassenaar, roepman, post, vankleef
|
||||
"""
|
||||
import json as _json
|
||||
import re
|
||||
|
||||
import config
|
||||
from huizenbot import RawListing
|
||||
|
||||
from ._shared import fetch_soup, parse_prijs, parse_m2, _text, log
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Shared Realworks helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_REALWORKS_STATUS_MAP = {
|
||||
"te koop": "beschikbaar",
|
||||
"nieuw": "beschikbaar",
|
||||
"onder bod": "onder_bod",
|
||||
"onder optie": "onder_bod",
|
||||
"verkocht o.v.": "verkocht",
|
||||
"verkocht": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def _realworks_detail(detail_url: str, makelaar: str) -> dict:
|
||||
"""Fetch a Realworks detail page and extract kenmerken. Returns empty dict on failure."""
|
||||
try:
|
||||
soup = fetch_soup(detail_url)
|
||||
|
||||
# Build a label→value map from all .kenmerk spans
|
||||
kv: dict[str, str] = {}
|
||||
for kenmerk in soup.select("span.kenmerk"):
|
||||
label_el = kenmerk.select_one("span.kenmerkName")
|
||||
value_el = kenmerk.select_one("span.kenmerkValue")
|
||||
if label_el and value_el:
|
||||
label = label_el.get_text(strip=True).lower()
|
||||
value = value_el.get_text(strip=True)
|
||||
kv[label] = value
|
||||
|
||||
return {
|
||||
"woningtype": kv.get("type woning"),
|
||||
"bouwjaar": kv.get("bouwjaar"),
|
||||
"woonoppervlak": kv.get("woonoppervlakte"),
|
||||
"perceeloppervlak": kv.get("perceeloppervlakte"),
|
||||
"kamers": kv.get("aantal kamers"),
|
||||
"slaapkamers": kv.get("aantal slaapkamers"),
|
||||
"energielabel": kv.get("energieklasse"),
|
||||
}
|
||||
except Exception as e:
|
||||
log.warning("%s: detail fetch fout %s: %s", makelaar, detail_url, e)
|
||||
return {}
|
||||
|
||||
|
||||
def fetch_realworks(base_url: str, makelaar: str) -> list[RawListing]:
|
||||
"""
|
||||
Generic fetcher for Realworks CMS brokers.
|
||||
Paginates via /pagina-{n}/, fetches detail page per listing.
|
||||
"""
|
||||
listings_path = f"/aanbod/woningaanbod/-{config.MAX_PRICE}/koop"
|
||||
listings = []
|
||||
page = 1
|
||||
|
||||
while True:
|
||||
url = f"{base_url}{listings_path}/pagina-{page}/"
|
||||
soup = fetch_soup(url)
|
||||
cards = soup.select("li.aanbodEntry")
|
||||
if not cards:
|
||||
break
|
||||
|
||||
for card in cards:
|
||||
try:
|
||||
a_tag = card.select_one("a.aanbodEntryLink")
|
||||
if not a_tag:
|
||||
continue
|
||||
listing_url = base_url + a_tag["href"]
|
||||
|
||||
adres = _text(card, ".street-address")
|
||||
postcode = (_text(card, ".postal-code") or "").replace(" ", "") or None
|
||||
stad = _text(card, ".locality")
|
||||
prijs = parse_prijs(_text(card, ".koopprijs .kenmerkValue"))
|
||||
|
||||
status_text = (_text(card, ".objectstatusbanner") or "").lower()
|
||||
status = _REALWORKS_STATUS_MAP.get(status_text, "beschikbaar")
|
||||
|
||||
img_tag = card.select_one(".hoofdfoto img")
|
||||
hero = img_tag["src"] if img_tag else None
|
||||
|
||||
kk = _realworks_detail(listing_url, makelaar)
|
||||
|
||||
listings.append(RawListing(
|
||||
url=listing_url,
|
||||
source_makelaar=makelaar,
|
||||
adres=adres,
|
||||
postcode=postcode,
|
||||
stad=stad,
|
||||
prijs=prijs,
|
||||
status=status,
|
||||
hero_image_url=hero,
|
||||
woningtype=kk.get("woningtype"),
|
||||
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
|
||||
woonoppervlak=parse_m2(kk.get("woonoppervlak")),
|
||||
perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
|
||||
kamers=int(kk["kamers"]) if kk.get("kamers") else None,
|
||||
slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
|
||||
energielabel=kk.get("energielabel"),
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
except Exception as e:
|
||||
log.warning("%s: parse fout: %s", makelaar, e)
|
||||
|
||||
if len(cards) < 10:
|
||||
break
|
||||
page += 1
|
||||
|
||||
log.info("%s: %d listings opgehaald", makelaar, len(listings))
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Simple Realworks wrappers (one-liners)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def fetch_ankebodewes() -> list[RawListing]:
|
||||
return fetch_realworks("https://www.ankebodewes.nl", "ankebodewes")
|
||||
|
||||
|
||||
def fetch_woongoed() -> list[RawListing]:
|
||||
return fetch_realworks("https://www.woongoedmakelaars.nl", "woongoed")
|
||||
|
||||
|
||||
def fetch_vwmakelaars() -> list[RawListing]:
|
||||
return fetch_realworks("https://www.vwmakelaars.nl", "vwmakelaars")
|
||||
|
||||
|
||||
def fetch_zomakelaars() -> list[RawListing]:
|
||||
return fetch_realworks("https://www.zomakelaars.nl", "zomakelaars")
|
||||
|
||||
|
||||
def fetch_morris() -> list[RawListing]:
|
||||
return fetch_realworks("https://www.morrismakelaardij.nl", "morris")
|
||||
|
||||
|
||||
def fetch_vankleef() -> list[RawListing]:
|
||||
"""Fetch Van Kleef makelaars — only Schiedam, as specified."""
|
||||
listings_path = "/aanbod/woningaanbod/schiedam/koop"
|
||||
listings = []
|
||||
page = 1
|
||||
|
||||
while True:
|
||||
url = f"https://www.vankleefmakelaars.nl{listings_path}/pagina-{page}/"
|
||||
soup = fetch_soup(url)
|
||||
cards = soup.select("li.aanbodEntry")
|
||||
if not cards:
|
||||
break
|
||||
|
||||
for card in cards:
|
||||
try:
|
||||
a_tag = card.select_one("a.aanbodEntryLink")
|
||||
if not a_tag:
|
||||
continue
|
||||
listing_url = "https://www.vankleefmakelaars.nl" + a_tag["href"]
|
||||
|
||||
adres = _text(card, ".street-address")
|
||||
postcode = (_text(card, ".postal-code") or "").replace(" ", "") or None
|
||||
stad = _text(card, ".locality")
|
||||
prijs = parse_prijs(_text(card, ".koopprijs .kenmerkValue"))
|
||||
|
||||
if prijs and prijs > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
status_text = (_text(card, ".objectstatusbanner") or "").lower()
|
||||
status = _REALWORKS_STATUS_MAP.get(status_text, "beschikbaar")
|
||||
|
||||
img_tag = card.select_one(".hoofdfoto img")
|
||||
hero = img_tag["src"] if img_tag else None
|
||||
|
||||
kk = _realworks_detail(listing_url, "vankleef")
|
||||
|
||||
listings.append(RawListing(
|
||||
url=listing_url,
|
||||
source_makelaar="vankleef",
|
||||
adres=adres,
|
||||
postcode=postcode,
|
||||
stad=stad,
|
||||
prijs=prijs,
|
||||
status=status,
|
||||
hero_image_url=hero,
|
||||
woningtype=kk.get("woningtype"),
|
||||
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
|
||||
woonoppervlak=parse_m2(kk.get("woonoppervlak")),
|
||||
perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
|
||||
kamers=int(kk["kamers"]) if kk.get("kamers") else None,
|
||||
slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
|
||||
energielabel=kk.get("energielabel"),
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
except Exception as e:
|
||||
log.warning("vankleef: parse fout: %s", e)
|
||||
|
||||
if len(cards) < 10:
|
||||
break
|
||||
page += 1
|
||||
|
||||
log.info("vankleef: %d listings opgehaald", len(listings))
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Makelaardij Wassenaar (Schiedam) — Realworks CMS, JSON-LD listing page
|
||||
# ---------------------------------------------------------------------------
|
||||
# Listings page has JSON-LD (Residence) with url/address/price/photo.
|
||||
# Detail pages have span.kenmerk with Wassenaar-specific label names.
|
||||
|
||||
_WASSENAAR_BASE = "https://www.makelaardijwassenaar.nl"
|
||||
|
||||
_WASSENAAR_STATUS_MAP = {
|
||||
"te koop": "beschikbaar",
|
||||
"nieuw": "beschikbaar",
|
||||
"onder bod": "onder_bod",
|
||||
"onder optie": "onder_bod",
|
||||
"verkocht o.v.": "onder_bod",
|
||||
"verkocht onder voorbehoud": "onder_bod",
|
||||
"verkocht": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def _wassenaar_detail(detail_url: str) -> dict:
    """Fetch Realworks detail page; extract kenmerken with Wassenaar-specific labels."""
    try:
        soup = fetch_soup(detail_url)
        kv: dict[str, str] = {}
        for kenmerk in soup.select("span.kenmerk"):
            label_el = kenmerk.select_one("span.kenmerkName")
            value_el = kenmerk.select_one("span.kenmerkValue")
            if label_el and value_el:
                kv[label_el.get_text(strip=True).lower()] = value_el.get_text(strip=True)
        return {
            "woningtype": kv.get("soort object"),
            "bouwjaar": kv.get("bouwjaar"),
            "woonoppervlak": kv.get("woonoppervlakte"),
            "perceeloppervlak": kv.get("perceeloppervlakte"),
            "kamers": kv.get("aantal kamers"),
            "slaapkamers": kv.get("aantal slaapkamers"),
            "energielabel": kv.get("energieklasse"),
        }
    except Exception as e:
        log.warning("wassenaar: detail fetch fout %s: %s", detail_url, e)
        return {}
|
||||
|
||||
|
||||
def fetch_wassenaar() -> list[RawListing]:
|
||||
soup = fetch_soup(f"{_WASSENAAR_BASE}/aanbod/woningaanbod/-{config.MAX_PRICE}/koop/")
|
||||
|
||||
# First pass: collect status + thumbnail per relative url
|
||||
# Each listing has two a.aanbodEntryLink with the same href;
|
||||
# the first has the status banner + photo, the second has address + price.
|
||||
status_by_url: dict[str, str] = {}
|
||||
photo_by_url: dict[str, str] = {}
|
||||
for a in soup.select("a.aanbodEntryLink[href]"):
|
||||
href = a["href"]
|
||||
if href in status_by_url:
|
||||
continue
|
||||
banner = a.select_one(".objectstatusbanner")
|
||||
status_text = banner.get_text(strip=True).lower() if banner else ""
|
||||
status_by_url[href] = _WASSENAAR_STATUS_MAP.get(status_text, "beschikbaar")
|
||||
img = a.select_one("span.hoofdfoto img")
|
||||
if img:
|
||||
src = img.get("src", "")
|
||||
if "geenfotobeschikbaar" not in src:
|
||||
photo_by_url[href] = src
|
||||
|
||||
# Second pass: parse JSON-LD blocks (one per listing)
|
||||
seen: set[str] = set()
|
||||
listings = []
|
||||
for tag in soup.select('script[type="application/ld+json"]'):
|
||||
try:
|
||||
ld = _json.loads(tag.string)
|
||||
if ld.get("@type") != "Residence":
|
||||
continue
|
||||
rel_url = ld.get("url", "")
|
||||
if not rel_url or rel_url in seen:
|
||||
continue
|
||||
seen.add(rel_url)
|
||||
|
||||
detail_url = _WASSENAAR_BASE + rel_url
|
||||
address = ld.get("address", {})
|
||||
postcode = address.get("postalCode", "").replace(" ", "") or None
|
||||
|
||||
price_spec = next(
|
||||
(a.get("priceSpecification", {}) for a in ld.get("potentialAction", [])
|
||||
if a.get("priceSpecification")),
|
||||
{}
|
||||
)
|
||||
prijs = int(price_spec["price"]) if price_spec.get("price") else None
|
||||
if prijs and prijs > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
hero = ld.get("photo") or photo_by_url.get(rel_url)
|
||||
status = status_by_url.get(rel_url, "beschikbaar")
|
||||
kk = _wassenaar_detail(detail_url)
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="wassenaar",
|
||||
status=status,
|
||||
adres=address.get("streetAddress") or None,
|
||||
postcode=postcode,
|
||||
stad=address.get("addressLocality") or None,
|
||||
prijs=prijs,
|
||||
hero_image_url=hero,
|
||||
woningtype=kk.get("woningtype"),
|
||||
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
|
||||
woonoppervlak=parse_m2(kk.get("woonoppervlak")),
|
||||
perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
|
||||
kamers=int(kk["kamers"]) if kk.get("kamers") else None,
|
||||
slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
|
||||
energielabel=kk.get("energielabel"),
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
except Exception as e:
|
||||
log.warning("wassenaar: parse fout: %s", e)
|
||||
|
||||
log.info("wassenaar: %d listings opgehaald", len(listings))
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Roepman Makelaardij NVM (Delft) — Realworks CMS, JSON-LD listing page
|
||||
# ---------------------------------------------------------------------------
|
||||
# Uses div.aanbodEntry instead of li.aanbodEntry; price from JSON-LD.
|
||||
|
||||
_ROEPMAN_BASE = "https://www.roepman.nl"
|
||||
|
||||
|
||||
def fetch_roepman() -> list[RawListing]:
|
||||
listings_path = f"/aanbod/woningaanbod/-{config.MAX_PRICE}/koop"
|
||||
listings = []
|
||||
page = 1
|
||||
|
||||
while True:
|
||||
url = f"{_ROEPMAN_BASE}{listings_path}/pagina-{page}/"
|
||||
soup = fetch_soup(url)
|
||||
cards = soup.select("div.aanbodEntry")
|
||||
if not cards:
|
||||
break
|
||||
|
||||
# Collect status + photo per relative url
|
||||
status_by_url: dict[str, str] = {}
|
||||
photo_by_url: dict[str, str] = {}
|
||||
for card in cards:
|
||||
a_tag = card.select_one("a.aanbodEntryLink[href]")
|
||||
if not a_tag:
|
||||
continue
|
||||
href = a_tag["href"]
|
||||
if href in status_by_url:
|
||||
continue
|
||||
banner = card.select_one(".objectstatusbanner")
|
||||
status_text = banner.get_text(strip=True).lower() if banner else ""
|
||||
status_by_url[href] = _REALWORKS_STATUS_MAP.get(status_text, "beschikbaar")
|
||||
img = card.select_one("img")
|
||||
if img:
|
||||
src = img.get("src", "")
|
||||
if "geenfotobeschikbaar" not in src:
|
||||
photo_by_url[href] = src
|
||||
|
||||
# Parse JSON-LD Residence blocks (one per listing)
|
||||
seen: set[str] = set()
|
||||
for tag in soup.select('script[type="application/ld+json"]'):
|
||||
try:
|
||||
ld = _json.loads(tag.string)
|
||||
if ld.get("@type") != "Residence":
|
||||
continue
|
||||
rel_url = ld.get("url", "")
|
||||
if not rel_url or rel_url in seen:
|
||||
continue
|
||||
seen.add(rel_url)
|
||||
|
||||
detail_url = _ROEPMAN_BASE + rel_url
|
||||
address = ld.get("address", {})
|
||||
postcode = address.get("postalCode", "").replace(" ", "") or None
|
||||
|
||||
price_spec = next(
|
||||
(a.get("priceSpecification", {}) for a in ld.get("potentialAction", [])
|
||||
if a.get("priceSpecification")),
|
||||
{}
|
||||
)
|
||||
prijs = int(price_spec["price"]) if price_spec.get("price") else None
|
||||
if prijs and prijs > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
hero = ld.get("photo") or photo_by_url.get(rel_url)
|
||||
status = status_by_url.get(rel_url, "beschikbaar")
|
||||
kk = _realworks_detail(detail_url, "roepman")
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="roepman",
|
||||
status=status,
|
||||
adres=address.get("streetAddress") or None,
|
||||
postcode=postcode,
|
||||
stad=address.get("addressLocality") or None,
|
||||
prijs=prijs,
|
||||
hero_image_url=hero,
|
||||
woningtype=kk.get("woningtype"),
|
||||
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
|
||||
woonoppervlak=parse_m2(kk.get("woonoppervlak")),
|
||||
perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
|
||||
kamers=int(kk["kamers"]) if kk.get("kamers") else None,
|
||||
slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
|
||||
energielabel=kk.get("energielabel"),
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
except Exception as e:
|
||||
log.warning("roepman: parse fout: %s", e)
|
||||
|
||||
if len(cards) < 10:
|
||||
break
|
||||
page += 1
|
||||
|
||||
log.info("roepman: %d listings opgehaald", len(listings))
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Post Makelaardij (Delft) — Realworks CMS, custom detail parser
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_POST_BASE = "https://www.postmakelaardij.nl"
|
||||
|
||||
_POST_STATUS_MAP = {
|
||||
"te koop": "beschikbaar",
|
||||
"onder bod": "onder_bod",
|
||||
"verkocht": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def _post_detail(detail_url: str) -> dict:
    """Fetch Post Makelaardij detail page and extract kenmerken."""
    try:
        soup = fetch_soup(detail_url)

        # Energielabel from CSS class: energielabel-{letter}
        energielabel = None
        for el in soup.select('[class]'):
            for cls in el.get('class', []):
                if cls.startswith('energielabel-') and cls != 'energielabel':
                    energielabel = cls.replace('energielabel-', '').upper()
                    break
            if energielabel:
                break

        # Woonoppervlak, perceeloppervlak, slaapkamers from icon spans
        woonoppervlak = None
        perceeloppervlak = None
        slaapkamers = None
        for span in soup.select('span.object-info-icon-text'):
            txt = span.get_text(strip=True)
            if 'slaapkamer' in txt:
                m = re.search(r'(\d+)', txt)
                slaapkamers = int(m.group(1)) if m else None
            elif 'perceel' in txt:
                perceeloppervlak = parse_m2(txt)
            elif 'm²' in txt or 'm2' in txt:
                woonoppervlak = parse_m2(txt)

        return {
            "woonoppervlak": woonoppervlak,
            "perceeloppervlak": perceeloppervlak,
            "slaapkamers": slaapkamers,
            "energielabel": energielabel,
        }
    except Exception as e:
        log.warning("post: detail fetch fout %s: %s", detail_url, e)
        return {}
|
||||
|
||||
|
||||
def fetch_post() -> list[RawListing]:
|
||||
"""Fetch Post Makelaardij listings; only Delft, only koop."""
|
||||
listings = []
|
||||
page = 1
|
||||
|
||||
while True:
|
||||
url = f"{_POST_BASE}/woningaanbod/koop?page={page}"
|
||||
soup = fetch_soup(url)
|
||||
cards = soup.select("article")
|
||||
if not cards:
|
||||
break
|
||||
|
||||
for card in cards:
|
||||
try:
|
||||
# URL — first link in image slider
|
||||
a_tag = card.select_one("a[href]")
|
||||
if not a_tag:
|
||||
continue
|
||||
href = a_tag["href"]
|
||||
detail_url = href if href.startswith("http") else _POST_BASE + href
|
||||
|
||||
# Postcode + city from span.custom-postcode-text
|
||||
pc_el = card.select_one("span.custom-postcode-text")
|
||||
if not pc_el:
|
||||
continue
|
||||
pc_parts = pc_el.get_text(strip=True).split()
|
||||
if len(pc_parts) < 3:
|
||||
continue
|
||||
postcode = pc_parts[0] + pc_parts[1] # "2613BD"
|
||||
stad = " ".join(pc_parts[2:]) # "Delft"
|
||||
|
||||
# Filter: only Delft
|
||||
if stad.lower() != "delft":
|
||||
continue
|
||||
|
||||
# Price — filter early
|
||||
prijs = parse_prijs(_text(card, "span.price-block"))
|
||||
if prijs and prijs > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
# Status from span.status text
|
||||
status_text = (_text(card, "span.status") or "").lower()
|
||||
status = _POST_STATUS_MAP.get(status_text, "beschikbaar")
|
||||
|
||||
# Address
|
||||
adres = _text(card, "h4.custom-address-text")
|
||||
|
||||
# Hero: first img in article
|
||||
img = card.select_one("img")
|
||||
hero = img["src"] if img else None
|
||||
|
||||
kk = _post_detail(detail_url)
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="post",
|
||||
status=status,
|
||||
adres=adres,
|
||||
postcode=postcode,
|
||||
stad=stad,
|
||||
prijs=prijs,
|
||||
hero_image_url=hero,
|
||||
woonoppervlak=kk.get("woonoppervlak"),
|
||||
perceeloppervlak=kk.get("perceeloppervlak"),
|
||||
slaapkamers=kk.get("slaapkamers"),
|
||||
energielabel=kk.get("energielabel"),
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
except Exception as e:
|
||||
log.warning("post: parse fout: %s", e)
|
||||
|
||||
if len(cards) < 12:
|
||||
break
|
||||
page += 1
|
||||
|
||||
log.info("post: %d listings opgehaald", len(listings))
|
||||
return listings
|
||||
@@ -1,195 +1,26 @@
|
||||
"""
|
||||
adapters/ssr.py — HTML/SSR-based makelaars
|
||||
Custom Schiedam scrapers (no shared CMS platform).
|
||||
|
||||
Each scraper is a function () -> list[RawListing].
|
||||
Add new ones at the bottom and register them in SCRAPERS.
|
||||
Each makelaar here uses a bespoke site structure that required its own parser.
|
||||
|
||||
Scrapers: dewittegarantiemakelaars (JSON-LD), dens, 3dmakelaars, dupont
|
||||
"""
|
||||
|
||||
import logging
|
||||
import re
|
||||
import time
|
||||
|
||||
import httpx
|
||||
from bs4 import BeautifulSoup
|
||||
|
||||
import config
|
||||
from huizenbot import RawListing
|
||||
|
||||
log = logging.getLogger("huizenbot.ssr")
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Shared HTTP helper
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def fetch_soup(url: str, *, params: dict | None = None) -> BeautifulSoup:
    """
    GET request → BeautifulSoup. Handles 429 responses by waiting for Retry-After.
    """
    for attempt in range(3):
        r = httpx.get(
            url,
            params=params,
            headers={"User-Agent": config.USER_AGENT},
            timeout=15,
            follow_redirects=True,
        )
        if r.status_code == 429:
            # Retry-After may also be an HTTP-date; fall back to 60s
            # when it is not a plain integer.
            retry_after = r.headers.get("Retry-After", "")
            wait = int(retry_after) if retry_after.isdigit() else 60
            log.warning("429 op %s, wacht %ds", url, wait)
            time.sleep(wait)
            continue
        r.raise_for_status()
        return BeautifulSoup(r.text, "html.parser")

    raise RuntimeError(f"Blijvend 429 op {url}")
|
||||
from ._shared import (
|
||||
fetch_soup, parse_prijs, parse_m2, _text,
|
||||
_extract_postcode, _infer_stad, log,
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Parse helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def parse_prijs(text: str | None) -> int | None:
    """'€ 325.000 k.k.' → 325000"""
    if not text:
        return None
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


def parse_m2(text: str | None) -> int | None:
    """'87 m²' → 87"""
    if not text:
        return None
    m = re.search(r"(\d+)", text.replace(".", ""))
    return int(m.group(1)) if m else None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Realworks CMS (shared)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_REALWORKS_STATUS_MAP = {
|
||||
"te koop": "beschikbaar",
|
||||
"nieuw": "beschikbaar",
|
||||
"onder bod": "onder_bod",
|
||||
"onder optie": "onder_bod",
|
||||
"verkocht o.v.": "verkocht",
|
||||
"verkocht": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def _realworks_detail(detail_url: str, makelaar: str) -> dict:
|
||||
"""Fetch a Realworks detail page and extract kenmerken. Returns empty dict on failure."""
|
||||
try:
|
||||
soup = fetch_soup(detail_url)
|
||||
|
||||
# Build a label→value map from all .kenmerk spans
|
||||
kv: dict[str, str] = {}
|
||||
for kenmerk in soup.select("span.kenmerk"):
|
||||
label_el = kenmerk.select_one("span.kenmerkName")
|
||||
value_el = kenmerk.select_one("span.kenmerkValue")
|
||||
if label_el and value_el:
|
||||
label = label_el.get_text(strip=True).lower()
|
||||
value = value_el.get_text(strip=True)
|
||||
kv[label] = value
|
||||
|
||||
return {
|
||||
"woningtype": kv.get("type woning"),
|
||||
"bouwjaar": kv.get("bouwjaar"),
|
||||
"woonoppervlak": kv.get("woonoppervlakte"),
|
||||
"perceeloppervlak": kv.get("perceeloppervlakte"),
|
||||
"kamers": kv.get("aantal kamers"),
|
||||
"slaapkamers": kv.get("aantal slaapkamers"),
|
||||
"energielabel": kv.get("energieklasse"),
|
||||
}
|
||||
except Exception as e:
|
||||
log.warning("%s: detail fetch fout %s: %s", makelaar, detail_url, e)
|
||||
return {}
|
||||
|
||||
|
||||
def fetch_realworks(base_url: str, makelaar: str) -> list[RawListing]:
|
||||
"""
|
||||
Generic fetcher for Realworks CMS brokers.
|
||||
Paginates via /pagina-{n}/, fetches detail page per listing.
|
||||
"""
|
||||
listings_path = f"/aanbod/woningaanbod/-{config.MAX_PRICE}/koop"
|
||||
listings = []
|
||||
page = 1
|
||||
|
||||
while True:
|
||||
url = f"{base_url}{listings_path}/pagina-{page}/"
|
||||
soup = fetch_soup(url)
|
||||
cards = soup.select("li.aanbodEntry")
|
||||
if not cards:
|
||||
break
|
||||
|
||||
for card in cards:
|
||||
try:
|
||||
a_tag = card.select_one("a.aanbodEntryLink")
|
||||
if not a_tag:
|
||||
continue
|
||||
listing_url = base_url + a_tag["href"]
|
||||
|
||||
adres = _text(card, ".street-address")
|
||||
postcode = (_text(card, ".postal-code") or "").replace(" ", "") or None
|
||||
stad = _text(card, ".locality")
|
||||
prijs = parse_prijs(_text(card, ".koopprijs .kenmerkValue"))
|
||||
|
||||
status_text = (_text(card, ".objectstatusbanner") or "").lower()
|
||||
status = _REALWORKS_STATUS_MAP.get(status_text, "beschikbaar")
|
||||
|
||||
img_tag = card.select_one(".hoofdfoto img")
|
||||
hero = img_tag["src"] if img_tag else None
|
||||
|
||||
kk = _realworks_detail(listing_url, makelaar)
|
||||
|
||||
listings.append(RawListing(
|
||||
url=listing_url,
|
||||
source_makelaar=makelaar,
|
||||
adres=adres,
|
||||
postcode=postcode,
|
||||
stad=stad,
|
||||
prijs=prijs,
|
||||
status=status,
|
||||
hero_image_url=hero,
|
||||
woningtype=kk.get("woningtype"),
|
||||
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
|
||||
woonoppervlak=parse_m2(kk.get("woonoppervlak")),
|
||||
perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
|
||||
kamers=int(kk["kamers"]) if kk.get("kamers") else None,
|
||||
slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
|
||||
energielabel=kk.get("energielabel"),
|
||||
))
|
||||
except Exception as e:
|
||||
log.warning("%s: parse fout: %s", makelaar, e)
|
||||
|
||||
if len(cards) < 10:
|
||||
break
|
||||
page += 1
|
||||
|
||||
log.info("%s: %d listings opgehaald", makelaar, len(listings))
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Anke Bodewes Makelaardij
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def fetch_ankebodewes() -> list[RawListing]:
|
||||
return fetch_realworks("https://www.ankebodewes.nl", "ankebodewes")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Woongoed Makelaars Schiedam
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def fetch_woongoed() -> list[RawListing]:
|
||||
return fetch_realworks("https://www.woongoedmakelaars.nl", "woongoed")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# De Witte Garantiemakelaars
|
||||
# De Witte Garantiemakelaars (Schiedam)
|
||||
# ---------------------------------------------------------------------------
|
||||
# Listing cards have a pill badge for status. All detail data comes from
|
||||
# JSON-LD (schema.org BuyAction/Offer) on the detail page.
|
||||
|
||||
_DEWITTE_BASE = "https://dewittegarantiemakelaars.nl"
|
||||
|
||||
@@ -292,6 +123,9 @@ def fetch_dewittegarantiemakelaars() -> list[RawListing]:
|
||||
bouwjaar=int(bouwjaar) if bouwjaar else None,
|
||||
hero_image_url=hero,
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
|
||||
except Exception as e:
|
||||
log.warning("dewitte: parse fout: %s", e)
|
||||
|
||||
@@ -303,159 +137,6 @@ def fetch_dewittegarantiemakelaars() -> list[RawListing]:
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Makelaardij Wassenaar (Schiedam)
|
||||
# ---------------------------------------------------------------------------
|
||||
# Realworks CMS. Listings page has JSON-LD (Residence) with url/address/price/photo.
|
||||
# Detail pages have span.kenmerk with Wassenaar-specific label names.
|
||||
|
||||
_WASSENAAR_BASE = "https://www.makelaardijwassenaar.nl"
|
||||
|
||||
_WASSENAAR_STATUS_MAP = {
|
||||
"te koop": "beschikbaar",
|
||||
"nieuw": "beschikbaar",
|
||||
"onder bod": "onder_bod",
|
||||
"onder optie": "onder_bod",
|
||||
"verkocht o.v.": "onder_bod",
|
||||
"verkocht onder voorbehoud": "onder_bod",
|
||||
"verkocht": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def _wassenaar_detail(detail_url: str) -> dict:
|
||||
"""Fetch Realworks detail page; extract kenmerken with Wassenaar-specific labels."""
|
||||
try:
|
||||
soup = fetch_soup(detail_url)
|
||||
kv: dict[str, str] = {}
|
||||
for kenmerk in soup.select("span.kenmerk"):
|
||||
label_el = kenmerk.select_one("span.kenmerkName")
|
||||
value_el = kenmerk.select_one("span.kenmerkValue")
|
||||
if label_el and value_el:
|
||||
kv[label_el.get_text(strip=True).lower()] = value_el.get_text(strip=True)
|
||||
return {
|
||||
"woningtype": kv.get("soort object"),
|
||||
"bouwjaar": kv.get("bouwjaar"),
|
||||
"woonoppervlak": kv.get("woonoppervlakte"),
|
||||
"perceeloppervlak": kv.get("perceeloppervlakte"),
|
||||
"kamers": kv.get("aantal kamers"),
|
||||
"slaapkamers": kv.get("aantal slaapkamers"),
|
||||
"energielabel": kv.get("energieklasse"),
|
||||
}
|
||||
except Exception as e:
|
||||
log.warning("wassenaar: detail fetch fout %s: %s", detail_url, e)
|
||||
return {}
|
||||
|
||||
|
||||
def fetch_wassenaar() -> list[RawListing]:
|
||||
import json as _json
|
||||
soup = fetch_soup(f"{_WASSENAAR_BASE}/aanbod/woningaanbod/-{config.MAX_PRICE}/koop/")
|
||||
|
||||
# First pass: collect status + thumbnail per relative url
|
||||
# Each listing has two a.aanbodEntryLink with the same href;
|
||||
# the first has the status banner + photo, the second has address + price.
|
||||
status_by_url: dict[str, str] = {}
|
||||
photo_by_url: dict[str, str] = {}
|
||||
for a in soup.select("a.aanbodEntryLink[href]"):
|
||||
href = a["href"]
|
||||
if href in status_by_url:
|
||||
continue
|
||||
banner = a.select_one(".objectstatusbanner")
|
||||
status_text = banner.get_text(strip=True).lower() if banner else ""
|
||||
status_by_url[href] = _WASSENAAR_STATUS_MAP.get(status_text, "beschikbaar")
|
||||
img = a.select_one("span.hoofdfoto img")
|
||||
if img:
|
||||
src = img.get("src", "")
|
||||
if "geenfotobeschikbaar" not in src:
|
||||
photo_by_url[href] = src
|
||||
|
||||
# Second pass: parse JSON-LD blocks (one per listing)
|
||||
seen: set[str] = set()
|
||||
listings = []
|
||||
for tag in soup.select('script[type="application/ld+json"]'):
|
||||
try:
|
||||
ld = _json.loads(tag.string)
|
||||
if ld.get("@type") != "Residence":
|
||||
continue
|
||||
rel_url = ld.get("url", "")
|
||||
if not rel_url or rel_url in seen:
|
||||
continue
|
||||
seen.add(rel_url)
|
||||
|
||||
detail_url = _WASSENAAR_BASE + rel_url
|
||||
address = ld.get("address", {})
|
||||
postcode = address.get("postalCode", "").replace(" ", "") or None
|
||||
|
||||
price_spec = next(
|
||||
(a.get("priceSpecification", {}) for a in ld.get("potentialAction", [])
|
||||
if a.get("priceSpecification")),
|
||||
{}
|
||||
)
|
||||
prijs = int(price_spec["price"]) if price_spec.get("price") else None
|
||||
if prijs and prijs > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
hero = ld.get("photo") or photo_by_url.get(rel_url)
|
||||
status = status_by_url.get(rel_url, "beschikbaar")
|
||||
kk = _wassenaar_detail(detail_url)
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="wassenaar",
|
||||
status=status,
|
||||
adres=address.get("streetAddress") or None,
|
||||
postcode=postcode,
|
||||
stad=address.get("addressLocality") or None,
|
||||
prijs=prijs,
|
||||
hero_image_url=hero,
|
||||
woningtype=kk.get("woningtype"),
|
||||
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
|
||||
woonoppervlak=parse_m2(kk.get("woonoppervlak")),
|
||||
perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
|
||||
kamers=int(kk["kamers"]) if kk.get("kamers") else None,
|
||||
slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
|
||||
energielabel=kk.get("energielabel"),
|
||||
))
|
||||
except Exception as e:
|
||||
log.warning("wassenaar: parse fout: %s", e)
|
||||
|
||||
log.info("wassenaar: %d listings opgehaald", len(listings))
|
||||
return listings
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# SSR helper utils
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _text(soup, selector: str) -> str | None:
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else None


def _src(soup, selector: str) -> str | None:
    el = soup.select_one(selector)
    if el is None:
        return None
    return el.get("src") or el.get("data-src")


def _extract_postcode(text: str | None) -> str | None:
    if not text:
        return None
    m = re.search(r"\b(\d{4}\s?[A-Z]{2})\b", text)
    return m.group(1).replace(" ", "") if m else None


def _infer_stad(postcode: str | None) -> str | None:
    """Simple mapping based on postcode range; extend as needed."""
    if not postcode:
        return None
    code = int(postcode[:4])
    if 2600 <= code <= 2629:
        return "Delft"
    if 3100 <= code <= 3135:
        return "Schiedam"
    return None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# D&S Makelaars (Schiedam)
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -485,6 +166,7 @@ def _ds_detail(detail_url: str, html_text: str = None) -> dict:
|
||||
)
|
||||
html_text = r.text
|
||||
|
||||
from bs4 import BeautifulSoup
|
||||
soup = BeautifulSoup(html_text, "html.parser")
|
||||
|
||||
# Parse <dt>/<dd> pairs into a label → value map
|
||||
@@ -504,18 +186,16 @@ def _ds_detail(detail_url: str, html_text: str = None) -> dict:
|
||||
if m:
|
||||
postcode = f"{m.group(1)}{m.group(2)}"
|
||||
|
||||
# Extract specific fields
|
||||
result = {
|
||||
"status": kv.get("status", "beschikbaar").lower(),
|
||||
"woningtype": kv.get("soort woning"),
|
||||
"bouwjaar": kv.get("bouwjaar"),
|
||||
return {
|
||||
"status": kv.get("status", "beschikbaar").lower(),
|
||||
"woningtype": kv.get("soort woning"),
|
||||
"bouwjaar": kv.get("bouwjaar"),
|
||||
"woonoppervlak": kv.get("woonoppervlakte"),
|
||||
"kamers": kv.get("aantal kamers"),
|
||||
"slaapkamers": kv.get("aantal slaapkamers"),
|
||||
"kamers": kv.get("aantal kamers"),
|
||||
"slaapkamers": kv.get("aantal slaapkamers"),
|
||||
"energielabel": kv.get("energielabel"),
|
||||
"postcode": postcode,
|
||||
"postcode": postcode,
|
||||
}
|
||||
return result
|
||||
except Exception as e:
|
||||
log.warning("dens: detail fetch fout %s: %s", detail_url, e)
|
||||
return {}
|
||||
@@ -580,7 +260,6 @@ def fetch_dens() -> list[RawListing]:
|
||||
if detail_data.get("status"):
|
||||
status = _DS_STATUS_MAP.get(detail_data["status"], status)
|
||||
|
||||
# Build listing
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="dens",
|
||||
@@ -597,6 +276,8 @@ def fetch_dens() -> list[RawListing]:
|
||||
slaapkamers=int(detail_data["slaapkamers"]) if detail_data.get("slaapkamers") else None,
|
||||
energielabel=detail_data.get("energielabel"),
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
except Exception as e:
|
||||
log.warning("dens: parse fout: %s", e)
|
||||
|
||||
@@ -638,12 +319,12 @@ def _3dmakelaars_detail(detail_url: str) -> dict:
|
||||
postcode = _extract_postcode(text)
|
||||
|
||||
return {
|
||||
"kamers": int(kv["aantal kamers"].split()[0]) if "aantal kamers" in kv else None,
|
||||
"slaapkamers": int(kv["aantal slaapkamers"].split()[0]) if "aantal slaapkamers" in kv else None,
|
||||
"bouwjaar": int(kv["bouwjaar"]) if "bouwjaar" in kv else None,
|
||||
"woningtype": kv.get("bouwvorm"),
|
||||
"kamers": int(kv["aantal kamers"].split()[0]) if "aantal kamers" in kv else None,
|
||||
"slaapkamers": int(kv["aantal slaapkamers"].split()[0]) if "aantal slaapkamers" in kv else None,
|
||||
"bouwjaar": int(kv["bouwjaar"]) if "bouwjaar" in kv else None,
|
||||
"woningtype": kv.get("bouwvorm"),
|
||||
"woonoppervlak": parse_m2(kv.get("oppervlakte")),
|
||||
"postcode": postcode,
|
||||
"postcode": postcode,
|
||||
}
|
||||
except Exception as e:
|
||||
log.warning("3dmakelaars: detail fetch fout %s: %s", detail_url, e)
|
||||
@@ -723,6 +404,8 @@ def fetch_3dmakelaars() -> list[RawListing]:
|
||||
slaapkamers=detail_data.get("slaapkamers"),
|
||||
hero_image_url=hero,
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
except Exception as e:
|
||||
log.warning("3dmakelaars: parse fout: %s", e)
|
||||
|
||||
@@ -771,13 +454,13 @@ def _dupont_detail(detail_url: str) -> dict:
|
||||
postcode = _extract_postcode(small_tag.get_text())
|
||||
|
||||
return {
|
||||
"postcode": postcode,
|
||||
"woningtype": kv.get("soort woning"),
|
||||
"bouwjaar": kv.get("bouwjaar"),
|
||||
"postcode": postcode,
|
||||
"woningtype": kv.get("soort woning"),
|
||||
"bouwjaar": kv.get("bouwjaar"),
|
||||
"woonoppervlak": kv.get("woonoppervlakte"),
|
||||
"kamers": kv.get("aantal kamers"),
|
||||
"slaapkamers": kv.get("aantal slaapkamers"),
|
||||
"energielabel": kv.get("energielabel"),
|
||||
"kamers": kv.get("aantal kamers"),
|
||||
"slaapkamers": kv.get("aantal slaapkamers"),
|
||||
"energielabel": kv.get("energielabel"),
|
||||
}
|
||||
except Exception as e:
|
||||
log.warning("dupont: detail fetch fout %s: %s", detail_url, e)
|
||||
@@ -845,6 +528,9 @@ def fetch_dupont() -> list[RawListing]:
|
||||
slaapkamers=int(detail_data["slaapkamers"]) if detail_data.get("slaapkamers") else None,
|
||||
energielabel=detail_data.get("energielabel"),
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
|
||||
except Exception as e:
|
||||
log.warning("dupont: parse fout: %s", e)
|
||||
|
||||
@@ -854,18 +540,3 @@ def fetch_dupont() -> list[RawListing]:
|
||||
|
||||
log.info("dupont: %d listings opgehaald", len(listings))
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# SCRAPERS: export all active SSR adapters here
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
SCRAPERS = {
|
||||
'ankebodewes': fetch_ankebodewes,
|
||||
'woongoed': fetch_woongoed,
|
||||
'dewittegarantiemakelaars': fetch_dewittegarantiemakelaars,
|
||||
'wassenaar': fetch_wassenaar,
|
||||
'dens': fetch_dens,
|
||||
'3dmakelaars': fetch_3dmakelaars,
|
||||
'dupont': fetch_dupont,
|
||||
}
|
||||
656
src/adapters/ssr/sure.py
Normal file
@@ -0,0 +1,656 @@
"""
SURE WordPress plugin scrapers.

All makelaars here use the SURE real estate plugin for WordPress. Listings
are at /wonen?sure_koop_huur=koop with pagination via /wonen/page/{N}/.
Cards use class a.card-house or div.card.card--house.
Detail pages have a #kenmerken section with label/value pairs.

Scrapers: schielandborsboom, olsthoorn, vanherk, borgdorff
"""
import re

import config
from huizenbot import RawListing

from ._shared import fetch_soup, parse_prijs, parse_m2, _text, _extract_postcode, log


# ---------------------------------------------------------------------------
|
||||
# Schieland Borsboom NVM Makelaars (Rotterdam, active in Schiedam)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_SCHIELAND_BASE = "https://www.schielandborsboom.nl"
|
||||
|
||||
_SCHIELAND_STATUS_MAP = {
|
||||
"sure-status-available": "beschikbaar",
|
||||
"sure-status-under_bid": "onder_bod",
|
||||
"sure-status-sold": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def _schieland_detail(detail_url: str) -> dict:
|
||||
"""Fetch Schieland Borsboom detail page and extract kenmerken."""
|
||||
try:
|
||||
soup = fetch_soup(detail_url)
|
||||
|
||||
# Postcode from house__status p (e.g. "3117 DP Schiedam")
|
||||
postcode_el = soup.select_one("div.house__status p")
|
||||
postcode = _extract_postcode(postcode_el.get_text()) if postcode_el else None
|
||||
|
||||
# Parse house-features__block sections: div.house-features__block > ul > li
|
||||
kv: dict[str, str] = {}
|
||||
for block in soup.select("div.house-features__block"):
|
||||
h4 = block.select_one("h4")
|
||||
if not h4:
|
||||
continue
|
||||
section_title = h4.get_text(strip=True).lower()
|
||||
|
||||
for li in block.select("ul > li"):
|
||||
strong = li.select_one("strong")
|
||||
span = li.select_one("span")
|
||||
if not strong or not span:
|
||||
continue
|
||||
|
||||
label = strong.get_text(strip=True).lower()
|
||||
value = span.get_text(strip=True)
|
||||
|
||||
# Remove links from value
|
||||
for a in span.select("a"):
|
||||
value = value.replace(a.get_text(strip=True), "").strip()
|
||||
|
||||
kv[f"{section_title}.{label}"] = value
|
||||
|
||||
return {
|
||||
"postcode": postcode,
|
||||
"status": kv.get("overdracht.status", "").lower(),
|
||||
"woningtype": kv.get("bouwvorm.soort bouw"),
|
||||
"bouwjaar": kv.get("bouwvorm.bouwjaar"),
|
||||
"woonoppervlak": kv.get("indeling.woonoppervlakte"),
|
||||
"perceeloppervlak": kv.get("indeling.perceeloppervlakte"),
|
||||
"kamers": kv.get("indeling.aantal kamers"),
|
||||
"slaapkamers": kv.get("indeling.aantal slaapkamers"),
|
||||
"energielabel": kv.get("energie & installatie.energielabel"),
|
||||
}
|
||||
except Exception as e:
|
||||
log.warning("schielandborsboom: detail fetch fout %s: %s", detail_url, e)
|
||||
return {}
|
||||
|
||||
|
||||
def fetch_schielandborsboom() -> list[RawListing]:
|
||||
"""Fetch Schieland Borsboom NVM listings (koop only, Schiedam)."""
|
||||
listings = []
|
||||
page = 1
|
||||
|
||||
while True:
|
||||
if page == 1:
|
||||
url = f"{_SCHIELAND_BASE}/wonen/zoeken/heel-nederland/prijs=200000-300000/schiedam/"
|
||||
else:
|
||||
url = f"{_SCHIELAND_BASE}/wonen/zoeken/heel-nederland/prijs=200000-300000/schiedam/?pagina={page}"
|
||||
|
||||
soup = fetch_soup(url)
|
||||
cards = soup.select("div.card.card--house")
|
||||
if not cards:
|
||||
break
|
||||
|
||||
for card in cards:
|
||||
try:
|
||||
a_tag = card.select_one("a.card__anchor")
|
||||
if not a_tag or "href" not in a_tag.attrs:
|
||||
continue
|
||||
detail_url = a_tag["href"]
|
||||
if not detail_url.startswith("http"):
|
||||
detail_url = _SCHIELAND_BASE + detail_url
|
||||
|
||||
# Filter: only Schiedam
|
||||
stad_el = card.select_one("p.house-place")
|
||||
stad = stad_el.get_text(strip=True) if stad_el else None
|
||||
if not stad or stad.lower() != "schiedam":
|
||||
continue
|
||||
|
||||
# Status from card-house__status badge
|
||||
status_el = card.select_one("div.card-house__status")
|
||||
status_text = status_el.get_text(strip=True).lower() if status_el else ""
|
||||
# Check for known status keywords in badge text
|
||||
if "beschikbaar" in status_text:
|
||||
status = "beschikbaar"
|
||||
elif "onder bod" in status_text:
|
||||
status = "onder_bod"
|
||||
elif "verkocht" in status_text:
|
||||
status = "verkocht"
|
||||
else:
|
||||
status = "beschikbaar"
|
||||
|
||||
# Price
|
||||
prijs = parse_prijs(_text(card, "p.price"))
|
||||
if prijs and prijs > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
adres = _text(card, "h4.house-street")
|
||||
|
||||
# Hero image from picture source (medium size)
|
||||
src_tag = card.select_one('picture source[media="(min-width:100px)"]')
|
||||
hero = src_tag["srcset"] if src_tag else None
|
||||
if hero is None:
|
||||
img = card.select_one("img")
|
||||
hero = img.get("src") if img else None
|
||||
if hero and not hero.startswith("http"):
|
||||
hero = _SCHIELAND_BASE + hero
|
||||
|
||||
# Data icons on card: surface, bedrooms, energy label
|
||||
woonoppervlak_card = None
|
||||
slaapkamers_card = None
|
||||
energielabel_card = None
|
||||
for data_div in card.select("div.data"):
|
||||
txt = data_div.get_text(strip=True)
|
||||
if data_div.select_one("i.icon-surface"):
|
||||
woonoppervlak_card = parse_m2(txt)
|
||||
elif data_div.select_one("i.icon-bedrooms"):
|
||||
m = re.search(r"(\d+)", txt)
|
||||
slaapkamers_card = int(m.group(1)) if m else None
|
||||
elif data_div.select_one("i.icon-label"):
|
||||
energielabel_card = txt.strip() or None
|
||||
|
||||
# Fetch detail page for full kenmerken
|
||||
kk = _schieland_detail(detail_url)
|
||||
|
||||
# Refine status from detail page
|
||||
if kk.get("status"):
|
||||
status = _SCHIELAND_STATUS_MAP.get(kk["status"], status)
|
||||
|
||||
# Parse kamers: "5 kamers" → 5
|
||||
kamers = None
|
||||
if kk.get("kamers"):
|
||||
m = re.search(r"(\d+)", kk["kamers"])
|
||||
kamers = int(m.group(1)) if m else None
|
||||
|
||||
# Parse slaapkamers: "3" or "3 slaapkamers" → 3
|
||||
slaapkamers = slaapkamers_card
|
||||
if kk.get("slaapkamers"):
|
||||
m = re.search(r"(\d+)", kk["slaapkamers"])
|
||||
slaapkamers = int(m.group(1)) if m else slaapkamers_card
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="schielandborsboom",
|
||||
status=status,
|
||||
adres=adres,
|
||||
postcode=kk.get("postcode"),
|
||||
stad=stad,
|
||||
prijs=prijs,
|
||||
hero_image_url=hero,
|
||||
woningtype=kk.get("woningtype"),
|
||||
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
|
||||
woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
|
||||
perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
|
||||
kamers=kamers,
|
||||
slaapkamers=slaapkamers,
|
||||
energielabel=kk.get("energielabel") or energielabel_card,
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
except Exception as e:
|
||||
log.warning("schielandborsboom: parse fout: %s", e)
|
||||
|
||||
if len(cards) < 18:
|
||||
break
|
||||
page += 1
|
||||
|
||||
log.info("schielandborsboom: %d listings opgehaald", len(listings))
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Olsthoorn Makelaars Delft (SURE WordPress plugin)
|
||||
# ---------------------------------------------------------------------------
|
||||
# Covers Delft, Den Haag, Naaldwijk etc — we filter for Delft only.
|
||||
# Detail page has no postcode; leave as None.
|
||||
|
||||
_OLSTHOORN_BASE = "https://www.olsthoornmakelaars.nl"
|
||||
|
||||
_OLSTHOORN_STATUS_MAP = {
|
||||
"badge-available": "beschikbaar",
|
||||
"badge-bid": "onder_bod",
|
||||
"badge-option": "onder_bod",
|
||||
"badge-sold": "verkocht",
|
||||
}
|
||||
|
||||
_OLSTHOORN_DETAIL_STATUS_MAP = {
|
||||
"beschikbaar": "beschikbaar",
|
||||
"onder bod": "onder_bod",
|
||||
"onder optie": "onder_bod",
|
||||
"verkocht": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def _olsthoorn_detail(detail_url: str) -> dict:
|
||||
"""Fetch Olsthoorn detail page; extract kenmerken from #kenmerken li pairs."""
|
||||
try:
|
||||
soup = fetch_soup(detail_url)
|
||||
kv: dict[str, str] = {}
|
||||
for li in soup.select("#kenmerken li"):
|
||||
spans = li.select("span")
|
||||
if len(spans) >= 2:
|
||||
label = spans[0].get_text(strip=True).lower()
|
||||
value = spans[1].get_text(strip=True)
|
||||
kv[label] = value
|
||||
return {
|
||||
"status": kv.get("status", "").lower(),
|
||||
"woningtype": kv.get("soort object") or kv.get("soort woning") or kv.get("soort bouw"),
|
||||
"bouwjaar": kv.get("bouwjaar"),
|
||||
"woonoppervlak": kv.get("gebruiksoppervlakte"),
|
||||
"perceeloppervlak": kv.get("perceeloppervlakte"),
|
||||
"kamers": kv.get("aantal kamers"),
|
||||
"slaapkamers": kv.get("aantal slaapkamers"),
|
||||
"energielabel": kv.get("energielabel"),
|
||||
}
|
||||
except Exception as e:
|
||||
log.warning("olsthoorn: detail fetch fout %s: %s", detail_url, e)
|
||||
return {}
|
||||
|
||||
|
||||
def fetch_olsthoorn() -> list[RawListing]:
|
||||
"""Fetch Olsthoorn Makelaars listings; only Delft, only koop."""
|
||||
listings = []
|
||||
page = 1
|
||||
|
||||
while True:
|
||||
if page == 1:
|
||||
url = f"{_OLSTHOORN_BASE}/wonen?sure_koop_huur=koop"
|
||||
else:
|
||||
url = f"{_OLSTHOORN_BASE}/wonen/page/{page}/?sure_koop_huur=koop"
|
||||
|
||||
soup = fetch_soup(url)
|
||||
cards = soup.select("a.card-house")
|
||||
if not cards:
|
||||
break
|
||||
|
||||
for card in cards:
|
||||
try:
|
||||
href = card.get("href", "")
|
||||
if not href:
|
||||
continue
|
||||
detail_url = href if href.startswith("http") else _OLSTHOORN_BASE + href
|
||||
|
||||
# Filter: only Delft
|
||||
stad_el = card.select_one("h2.card__title")
|
||||
stad = stad_el.get_text(strip=True) if stad_el else None
|
||||
if not stad or stad.lower() != "delft":
|
||||
continue
|
||||
|
||||
# Price from bold tag — filter early before detail fetch
|
||||
prijs_b = card.select_one("b")
|
||||
prijs = parse_prijs(prijs_b.get_text() if prijs_b else None)
|
||||
if prijs and prijs > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
# Status from badge class on label span
|
||||
label_span = card.select_one("span.card-house__label")
|
||||
status = "beschikbaar"
|
||||
if label_span:
|
||||
for cls in label_span.get("class", []):
|
||||
if cls in _OLSTHOORN_STATUS_MAP:
|
||||
status = _OLSTHOORN_STATUS_MAP[cls]
|
||||
break
|
||||
|
||||
# Address: first <p> under .short--info (collapse internal whitespace)
|
||||
adres_p = card.select("div.short--info > p")
|
||||
if adres_p:
|
||||
adres = " ".join(adres_p[0].get_text().split())
|
||||
else:
|
||||
adres = None
|
||||
|
||||
# Hero image: largest source srcset
|
||||
src_tag = card.select_one('picture source[media="(min-width:1024px)"]')
|
||||
hero = src_tag.get("data-srcset") if src_tag else None
|
||||
if hero and not hero.startswith("http"):
|
||||
hero = _OLSTHOORN_BASE + hero
|
||||
|
||||
# Woonoppervlak + kamers + energielabel from card data icons
|
||||
woonoppervlak_card = None
|
||||
kamers_card = None
|
||||
energielabel_card = None
|
||||
for data_div in card.select("div.data"):
|
||||
inner = data_div.select_one("span.date__inner")
|
||||
if not inner:
|
||||
continue
|
||||
txt = inner.get_text(strip=True)
|
||||
if data_div.select_one("i.icon-sizes"):
|
||||
woonoppervlak_card = parse_m2(txt)
|
||||
elif data_div.select_one("i.icon-door"):
|
||||
m = re.search(r"(\d+)", txt)
|
||||
kamers_card = int(m.group(1)) if m else None
|
||||
elif data_div.select_one("i.icon-energylabel"):
|
||||
energielabel_card = txt or None
|
||||
|
||||
kk = _olsthoorn_detail(detail_url)
|
||||
|
||||
# Refine status from detail page
|
||||
detail_status = _OLSTHOORN_DETAIL_STATUS_MAP.get(kk.get("status", ""), "")
|
||||
if detail_status:
|
||||
status = detail_status
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="olsthoorn",
|
||||
status=status,
|
||||
adres=adres,
|
||||
postcode=None, # not exposed by broker
|
||||
stad=stad,
|
||||
prijs=prijs,
|
||||
hero_image_url=hero,
|
||||
woningtype=kk.get("woningtype"),
|
||||
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
|
||||
woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
|
||||
perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
|
||||
kamers=int(kk["kamers"]) if kk.get("kamers") else kamers_card,
|
||||
slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
|
||||
energielabel=kk.get("energielabel") or energielabel_card,
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
except Exception as e:
|
||||
log.warning("olsthoorn: parse fout: %s", e)
|
||||
|
||||
if len(cards) < 15:
|
||||
break
|
||||
page += 1
|
||||
|
||||
log.info("olsthoorn: %d listings opgehaald", len(listings))
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Van Herk Makelaars (Schiedam) — SURE WordPress plugin (card-house)
|
||||
# ---------------------------------------------------------------------------
|
||||
# Listings filtered by city + price in URL; pagination via /page/{N}/.
|
||||
# Detail page: div.features ul.unstyled li with two <span> (label + value).
|
||||
|
||||
_VANHERK_BASE = "https://www.vanherk.nl"
|
||||
_VANHERK_LISTINGS = "https://www.vanherk.nl/wonen/aanbod/zoeken/schiedam/200000-300000/"
|
||||
|
||||
_VANHERK_STATUS_MAP = {
|
||||
"beschikbaar": "beschikbaar",
|
||||
"onder bod": "onder_bod",
|
||||
"onder optie": "onder_bod",
|
||||
"verkocht": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def _vanherk_detail(detail_url: str) -> dict:
|
||||
"""Fetch Van Herk detail page; extract kenmerken from div.features."""
|
||||
try:
|
||||
soup = fetch_soup(detail_url)
|
||||
kv: dict[str, str] = {}
|
||||
for li in soup.select("div.features ul.unstyled li"):
|
||||
spans = li.select("span")
|
||||
if len(spans) >= 2:
|
||||
label = spans[0].get_text(strip=True).lower()
|
||||
value = spans[1].get_text(strip=True)
|
||||
kv[label] = value
|
||||
# Postcode is in <title>: "Lorentzlaan 19 B, 3112 KE SCHIEDAM - Van Herk Makelaars"
|
||||
postcode = None
|
||||
if soup.title:
|
||||
m = re.search(r"\b(\d{4}\s*[A-Z]{2})\b", soup.title.get_text())
|
||||
if m:
|
||||
postcode = m.group(1).replace("\xa0", " ").strip()
|
||||
return {
|
||||
"status": kv.get("status", "").lower(),
|
||||
"postcode": postcode,
|
||||
"bouwjaar": kv.get("bouwjaar"),
|
||||
"woonoppervlak": kv.get("woonoppervlakte"),
|
||||
"kamers": kv.get("aantal kamers"),
|
||||
"slaapkamers": kv.get("aantal slaapkamers"),
|
||||
"energielabel": kv.get("energielabel"),
|
||||
}
|
||||
except Exception as e:
|
||||
log.warning("vanherk: detail fetch fout %s: %s", detail_url, e)
|
||||
return {}
|
||||
|
||||
|
||||
def fetch_vanherk() -> list[RawListing]:
|
||||
"""Fetch Van Herk listings; only Schiedam, only koop."""
|
||||
listings = []
|
||||
page = 1
|
||||
|
||||
while True:
|
||||
if page == 1:
|
||||
url = _VANHERK_LISTINGS
|
||||
else:
|
||||
url = _VANHERK_LISTINGS + f"page/{page}/"
|
||||
|
||||
soup = fetch_soup(url)
|
||||
cards = soup.select("a.card-house")
|
||||
if not cards:
|
||||
break
|
||||
|
||||
for card in cards:
|
||||
try:
|
||||
href = card.get("href", "")
|
||||
if not href:
|
||||
continue
|
||||
detail_url = href if href.startswith("http") else _VANHERK_BASE + href
|
||||
|
||||
# City from lead paragraph
|
||||
lead = card.select_one("p.lead")
|
||||
stad = lead.get_text(strip=True) if lead else None
|
||||
|
||||
# Address from h4 (normalize whitespace incl. &nbsp;)
|
||||
h4 = card.select_one("h4")
|
||||
adres = " ".join(h4.get_text().split()) if h4 else None
|
||||
|
||||
# Price from .subtitle
|
||||
subtitle = card.select_one("p.subtitle")
|
||||
prijs = parse_prijs(subtitle.get_text() if subtitle else None)
|
||||
if prijs and prijs > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
# Hero image: largest srcset source
|
||||
src_tag = card.select_one('picture source[media="(min-width:1280px)"]')
|
||||
hero = src_tag.get("srcset") if src_tag else None
|
||||
if hero and not hero.startswith("http"):
|
||||
hero = _VANHERK_BASE + hero
|
||||
|
||||
# Card data icons: surface, bedrooms, energy label
|
||||
woonoppervlak_card = None
|
||||
slaapkamers_card = None
|
||||
energielabel_card = None
|
||||
for data_div in card.select("div.data"):
|
||||
classes = data_div.get("class") or []
|
||||
if "d-none" in classes:
|
||||
continue
|
||||
if "data-energie" in classes:
|
||||
inner = data_div.select_one(".date__inner")
|
||||
energielabel_card = inner.get_text(strip=True) if inner else None
|
||||
elif data_div.select_one("i.icon-surface"):
|
||||
inner = data_div.select_one("span.date__inner")
|
||||
woonoppervlak_card = parse_m2(inner.get_text(strip=True) if inner else None)
|
||||
elif data_div.select_one("i.icon-bed"):
|
||||
inner = data_div.select_one("span.date__inner")
|
||||
txt = inner.get_text(strip=True) if inner else None
|
||||
m = re.search(r"(\d+)", txt) if txt else None
|
||||
slaapkamers_card = int(m.group(1)) if m else None
|
||||
|
||||
kk = _vanherk_detail(detail_url)
|
||||
|
||||
status = _VANHERK_STATUS_MAP.get(kk.get("status", ""), "beschikbaar")
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="vanherk",
|
||||
status=status,
|
||||
adres=adres,
|
||||
postcode=kk.get("postcode"),
|
||||
stad=stad,
|
||||
prijs=prijs,
|
||||
hero_image_url=hero,
|
||||
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar", "").isdigit() else None,
|
||||
woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
|
||||
kamers=int(kk["kamers"]) if kk.get("kamers", "").isdigit() else None,
|
||||
slaapkamers=(int(kk["slaapkamers"]) if kk.get("slaapkamers", "").isdigit() else None) or slaapkamers_card,
|
||||
energielabel=kk.get("energielabel") or energielabel_card,
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
except Exception as e:
|
||||
log.warning("vanherk: parse fout: %s", e)
|
||||
|
||||
if len(cards) < 15:
|
||||
break
|
||||
page += 1
|
||||
|
||||
log.info("vanherk: %d listings opgehaald", len(listings))
|
||||
return listings
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Borgdorff Makelaars (Den Haag / Westland) — SURE WordPress plugin
|
||||
# ---------------------------------------------------------------------------
|
||||
# Covers Den Haag ('s-gravenhage), Monster, Naaldwijk etc. Filter for Den Haag.
|
||||
# Same SURE plugin as Schieland Borsboom but uses a.card--house (double dash).
|
||||
# No postcode on detail page.
|
||||
|
||||
_BORGDORFF_BASE = "https://www.borgdorff.nl"
|
||||
_BORGDORFF_DEN_HAAG = {"'s-gravenhage", "den haag"}
|
||||
|
||||
_BORGDORFF_BADGE_MAP = {
|
||||
"badge--info": "beschikbaar",
|
||||
"badge--warning": "onder_bod",
|
||||
"badge--danger": "verkocht",
|
||||
}
|
||||
|
||||
_BORGDORFF_DETAIL_STATUS_MAP = {
|
||||
"beschikbaar": "beschikbaar",
|
||||
"onder bod": "onder_bod",
|
||||
"onder optie": "onder_bod",
|
||||
"verkocht": "verkocht",
|
||||
}
|
||||
|
||||
|
||||
def _borgdorff_detail(detail_url: str) -> dict:
|
||||
"""Fetch Borgdorff detail page; extract #kenmerken li span pairs."""
|
||||
try:
|
||||
soup = fetch_soup(detail_url)
|
||||
kv: dict[str, str] = {}
|
||||
for li in soup.select("#kenmerken li"):
|
||||
spans = li.select("span")
|
||||
if len(spans) >= 2:
|
||||
label = spans[0].get_text(strip=True).lower()
|
||||
value = spans[1].get_text(strip=True)
|
||||
kv[label] = value
|
||||
return {
|
||||
"status": kv.get("status", "").lower(),
|
||||
"woningtype": kv.get("soort woonhuis") or kv.get("soort woning") or kv.get("soort bouw"),
|
||||
"bouwjaar": kv.get("bouwjaar"),
|
||||
"woonoppervlak": kv.get("gebruiksoppervlakte wonen") or kv.get("gebruiksoppervlakte"),
|
||||
"perceeloppervlak": kv.get("perceeloppervlakte"),
|
||||
"slaapkamers": kv.get("aantal slaapkamers"),
|
||||
"energielabel": kv.get("energielabel"),
|
||||
}
|
||||
except Exception as e:
|
||||
log.warning("borgdorff: detail fetch fout %s: %s", detail_url, e)
|
||||
return {}
|
||||
|
||||
|
||||
def fetch_borgdorff() -> list[RawListing]:
|
||||
"""Fetch Borgdorff listings; only Den Haag / 's-gravenhage, only koop."""
|
||||
listings = []
|
||||
page = 1
|
||||
|
||||
while True:
|
||||
if page == 1:
|
||||
url = f"{_BORGDORFF_BASE}/wonen?sure_koop_huur=koop"
|
||||
else:
|
||||
url = f"{_BORGDORFF_BASE}/wonen/page/{page}/?sure_koop_huur=koop"
|
||||
|
||||
soup = fetch_soup(url)
|
||||
cards = soup.select("a.card--house")
|
||||
if not cards:
|
||||
break
|
||||
|
||||
for card in cards:
|
||||
try:
|
||||
href = card.get("href", "")
|
||||
if not href:
|
||||
continue
|
||||
detail_url = href if href.startswith("http") else _BORGDORFF_BASE + href
|
||||
|
||||
# Filter: only Den Haag
|
||||
stad_el = card.select_one("p.lead-two")
|
||||
stad = stad_el.get_text(strip=True) if stad_el else None
|
||||
if not stad or stad.lower() not in _BORGDORFF_DEN_HAAG:
|
||||
continue
|
||||
|
||||
# Price — filter early
|
||||
prijs = parse_prijs(_text(card, "p.strong"))
|
||||
if prijs and prijs > config.MAX_PRICE:
|
||||
continue
|
||||
|
||||
# Status from badge class
|
||||
label_span = card.select_one("span.card-house__label")
|
||||
status = "beschikbaar"
|
||||
if label_span:
|
||||
for cls in label_span.get("class", []):
|
||||
if cls in _BORGDORFF_BADGE_MAP:
|
||||
status = _BORGDORFF_BADGE_MAP[cls]
|
||||
break
|
||||
|
||||
# Address
|
||||
adres = _text(card, "h4")
|
||||
|
||||
# Hero: largest source srcset
|
||||
src_tag = card.select_one('picture source[media="(min-width:1280px)"]')
|
||||
hero = src_tag.get("srcset") if src_tag else None
|
||||
if not hero:
|
||||
img = card.select_one("img[data-src]")
|
||||
hero = img.get("data-src") if img else None
|
||||
if hero and not hero.startswith("http"):
|
||||
hero = _BORGDORFF_BASE + hero
|
||||
|
||||
# Surface + bedrooms from data icons
|
||||
woonoppervlak_card = None
|
||||
slaapkamers_card = None
|
||||
for data_div in card.select("div.data"):
|
||||
inner = data_div.select_one("p.small")
|
||||
if not inner:
|
||||
continue
|
||||
txt = inner.get_text(strip=True)
|
||||
if data_div.select_one("i.icon-surface"):
|
||||
woonoppervlak_card = parse_m2(txt)
|
||||
elif data_div.select_one("i.icon-bed"):
|
||||
m = re.search(r"(\d+)", txt)
|
||||
slaapkamers_card = int(m.group(1)) if m else None
|
||||
|
||||
kk = _borgdorff_detail(detail_url)
|
||||
|
||||
# Refine status from detail page
|
||||
if kk.get("status"):
|
||||
status = _BORGDORFF_DETAIL_STATUS_MAP.get(kk["status"], status)
|
||||
|
||||
listings.append(RawListing(
|
||||
url=detail_url,
|
||||
source_makelaar="borgdorff",
|
||||
status=status,
|
||||
adres=adres,
|
||||
postcode=None, # not exposed by broker
|
||||
stad=stad,
|
||||
prijs=prijs,
|
||||
hero_image_url=hero,
|
||||
woningtype=kk.get("woningtype"),
|
||||
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
|
||||
woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
|
||||
perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
|
||||
slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else slaapkamers_card,
|
||||
energielabel=kk.get("energielabel"),
|
||||
))
|
||||
if config.APP_ENV == "dev":
|
||||
break
|
||||
except Exception as e:
|
||||
log.warning("borgdorff: parse fout: %s", e)
|
||||
|
||||
if len(cards) < 15:
|
||||
break
|
||||
page += 1
|
||||
|
||||
log.info("borgdorff: %d listings opgehaald", len(listings))
|
||||
return listings
|
||||
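The four SURE scrapers above share one pagination pattern: fetch page 1, keep requesting `/wonen/page/{N}/` until a page comes back empty or shorter than a full page. A minimal sketch of that pattern follows. The names `paginate`, `page_url`, and the `fetch_page` callable are hypothetical stand-ins for illustration and are not part of the diff; the page size is an assumption the caller supplies (the diff hard-codes 15 or 18 per site).

```python
from typing import Callable, Iterator


def paginate(fetch_page: Callable[[int], list[str]], page_size: int) -> Iterator[str]:
    """Yield items page by page until an empty or short page is seen."""
    page = 1
    while True:
        items = fetch_page(page)
        if not items:
            break
        yield from items
        # A short page implies there is no next page, so skip the extra request.
        if len(items) < page_size:
            break
        page += 1


def page_url(base: str, page: int) -> str:
    """Build a listing URL in the /wonen/page/{N}/ style described above."""
    if page == 1:
        return f"{base}/wonen?sure_koop_huur=koop"
    return f"{base}/wonen/page/{page}/?sure_koop_huur=koop"
```

The short-page check saves one request per run, but it depends on the page size staying constant; if a site changes its page size upward, the loop would stop early, so the empty-page check remains the safer fallback.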
@@ -1,5 +1,5 @@
|
||||
"""
|
||||
config.py — vul aan met je eigen waarden. Secrets via environment variables.
|
||||
config.py — Secrets via environment variables.
|
||||
"""
|
||||
import os
|
||||
|
||||
@@ -10,16 +10,46 @@ MICHELLE_WERK_9292 = "vlaardingen/"+MICHELLE_WERK_POSTCODE
|
||||
|
||||
HA_WEBHOOK_URL = os.environ.get("HA_WEBHOOK_URL", "")
|
||||
|
||||
SMTP_HOST = os.environ.get("SMTP_HOST", "")
|
||||
SMTP_PORT = int(os.environ.get("SMTP_PORT", "587"))
|
||||
SMTP_FROM = os.environ.get("SMTP_FROM", "")
|
||||
SMTP_TO = os.environ.get("SMTP_TO", "")
|
||||
SMTP_USER = os.environ.get("SMTP_USER", "")
|
||||
|
||||
USER_AGENT = "Huizenbot/1.0 (+mark@kalsbeek.dev) persoonlijk gebruik"
|
||||
|
||||
DB_PATH = os.environ.get("DB_PATH", "/data/huizenbot.db")
|
||||
|
||||
FIETS_SNELHEID_FACTOR = 1.27
|
||||
|
||||
MAX_PRICE = 300_000
|
||||
MAX_PRICE = 300_000 # coarse pre-filter in adapters only

MIN_AREA = 65  # Sq meters

# Fine price filter: max mortgage per energy label group * 0.9
# Labels not in this map fall back to the most conservative tier.
_LABEL_DISCOUNT = 0.9
MAX_PRIJS_PER_LABEL: dict[str, int] = {
    "EFG": int(286_942 * _LABEL_DISCOUNT),
    "CD": int(291_942 * _LABEL_DISCOUNT),
    "AB": int(296_942 * _LABEL_DISCOUNT),
    "A+": int(306_942 * _LABEL_DISCOUNT),
}
_MAX_PRIJS_ONBEKEND = MAX_PRIJS_PER_LABEL["EFG"]  # conservative fallback

def max_prijs_voor_label(label: str | None) -> int:
    """Return the max allowed price for a given energy label (or None/unknown)."""
    if not label:
        return _MAX_PRIJS_ONBEKEND
    l = label.strip().upper()
    if l in ("A+++", "A++", "A+"):
        return MAX_PRIJS_PER_LABEL["A+"]
    if l in ("A", "B"):
        return MAX_PRIJS_PER_LABEL["AB"]
    if l in ("C", "D"):
        return MAX_PRIJS_PER_LABEL["CD"]
    if l in ("E", "F", "G"):
        return MAX_PRIJS_PER_LABEL["EFG"]
    return _MAX_PRIJS_ONBEKEND

# Travel time limits (None travel time → pass, with warning)
MAX_OV_MINUTEN_MARK = 50
MAX_OV_MINUTEN_MICHELLE = 50
MAX_FIETS_MINUTEN_MARK = 35
# No fiets limit for michelle

APP_ENV = os.environ.get("APP_ENV", "dev")
|
||||
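To make the per-label price ceiling in this config hunk concrete, here is a small standalone sketch of the same tiering idea with the arithmetic worked out. The base amounts and the 0.9 discount are copied from the hunk; the names `BASE_MAX`, `LABEL_TIERS`, and `max_price_for_label` are hypothetical and only serve the illustration.

```python
from typing import Optional

LABEL_DISCOUNT = 0.9
BASE_MAX = {"EFG": 286_942, "CD": 291_942, "AB": 296_942, "A+": 306_942}

# Map each individual energy label to its tier key.
LABEL_TIERS = {
    "A+++": "A+", "A++": "A+", "A+": "A+",
    "A": "AB", "B": "AB",
    "C": "CD", "D": "CD",
    "E": "EFG", "F": "EFG", "G": "EFG",
}


def max_price_for_label(label: Optional[str]) -> int:
    """Return the discounted ceiling; unknown labels use the lowest tier."""
    tier = LABEL_TIERS.get((label or "").strip().upper(), "EFG")
    return int(BASE_MAX[tier] * LABEL_DISCOUNT)
```

With these numbers a label C listing is capped at 262747, while a listing without a label falls back to the EFG ceiling of 258247, so a missing label can only make the filter stricter.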
218
src/huizenbot.py
@@ -6,13 +6,10 @@ import hashlib
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import smtplib
|
||||
import sqlite3
|
||||
import time
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime, date
|
||||
from email.mime.multipart import MIMEMultipart
|
||||
from email.mime.text import MIMEText
|
||||
from typing import Callable, Any
|
||||
|
||||
import httpx
|
||||
@@ -97,6 +94,7 @@ CREATE TABLE IF NOT EXISTS woningen (
|
||||
|
||||
|
||||
def get_db(path: str) -> sqlite3.Connection:
|
||||
log.info(f"Opening db at path {path}")
|
||||
conn = sqlite3.connect(path)
|
||||
conn.row_factory = sqlite3.Row
|
||||
conn.execute("PRAGMA journal_mode=WAL")
|
||||
@@ -161,9 +159,22 @@ def upsert(conn: sqlite3.Connection, listing: RawListing, travel: dict[str,int])
|
||||
"extra": json.dumps(listing.extra) if listing.extra else None,
|
||||
})
|
||||
else:
|
||||
_cursor = conn.execute("""
|
||||
UPDATE woningen SET last_seen = ?, status = ? WHERE id = ?
|
||||
""", (now, listing.status, lid))
|
||||
if travel:
|
||||
conn.execute("""
|
||||
UPDATE woningen
|
||||
SET last_seen = ?, status = ?,
|
||||
fiets_mark = ?, fiets_michelle = ?, ov_mark = ?, ov_michelle = ?
|
||||
WHERE id = ?
|
||||
""", (
|
||||
now, listing.status,
|
||||
travel.get("fiets_mark"), travel.get("fiets_michelle"),
|
||||
travel.get("ov_mark"), travel.get("ov_michelle"),
|
||||
lid,
|
||||
))
|
||||
else:
|
||||
conn.execute("""
|
||||
UPDATE woningen SET last_seen = ?, status = ? WHERE id = ?
|
||||
""", (now, listing.status, lid))
|
||||
|
||||
conn.commit()
|
||||
return is_new
|
||||
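The `upsert` change above only overwrites the travel-time columns when a travel dict is available, so a failed lookup does not wipe previously stored values. A minimal sketch of that behaviour against an in-memory SQLite database follows. The reduced column set and the helper names `make_db` and `touch` are assumptions for illustration; the real `woningen` table has more columns.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with a reduced, hypothetical column set."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE woningen (id TEXT PRIMARY KEY, last_seen TEXT, "
        "status TEXT, fiets_mark INTEGER, ov_mark INTEGER)"
    )
    return conn


def touch(conn: sqlite3.Connection, lid: str, now: str, status: str, travel: dict) -> None:
    """Refresh an existing row; only write travel columns if travel data exists."""
    if travel:
        conn.execute(
            "UPDATE woningen SET last_seen = ?, status = ?, "
            "fiets_mark = ?, ov_mark = ? WHERE id = ?",
            (now, status, travel.get("fiets_mark"), travel.get("ov_mark"), lid),
        )
    else:
        # Empty dict: keep whatever travel times were stored earlier.
        conn.execute(
            "UPDATE woningen SET last_seen = ?, status = ? WHERE id = ?",
            (now, status, lid),
        )
    conn.commit()
```

One consequence worth noting: a partially filled travel dict still takes the first branch, so any key that is missing is written as NULL. If partial results are possible, wrapping each column in `COALESCE(?, column)` would be one way to preserve the old value.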
@@ -234,7 +245,7 @@ def _next_weekday_morning() -> str:
|
||||
return d.strftime("%Y%m%dT083000")
|
||||
|
||||
|
||||
def bereken_reistijden(postcode: str | None) -> dict[str, int]:
|
||||
def bereken_reistijden(postcode: str | None, stad: str | None) -> dict[str, int]:
|
||||
"""Compute all travel times for a listing's postcode. Returns an empty dict on failure."""
|
||||
if not postcode:
|
||||
return {}
|
||||
@@ -243,16 +254,20 @@ def bereken_reistijden(postcode: str | None) -> dict[str, int]:
|
||||
if not woning_coords:
|
||||
return {}
|
||||
|
||||
werk1 = geocode(config.MARK_WERK_POSTCODE)
|
||||
werk2 = geocode(config.MICHELLE_WERK_POSTCODE)
|
||||
werk1_coords = geocode(config.MARK_WERK_POSTCODE)
|
||||
werk2_coords = geocode(config.MICHELLE_WERK_POSTCODE)
|
||||
|
||||
# 9292 expects "cityname/postcode" strings (lowercase city)
|
||||
stad_lower = (stad or "").strip().lower()
|
||||
woning_9292 = f"{stad_lower}/{postcode}" if stad_lower else postcode
|
||||
|
||||
result = {}
|
||||
if werk1:
|
||||
result["fiets_mark"] = fiets_minuten(woning_coords, werk1)
|
||||
result["ov_mark"] = ov_minuten(woning_coords, werk1)
|
||||
if werk2:
|
||||
result["fiets_michelle"] = fiets_minuten(woning_coords, werk2)
|
||||
result["ov_michelle"] = ov_minuten(woning_coords, werk2)
|
||||
if werk1_coords:
|
||||
result["fiets_mark"] = fiets_minuten(woning_coords, werk1_coords)
|
||||
result["ov_mark"] = ov_minuten(woning_9292, config.MARK_WERK_9292)
|
||||
if werk2_coords:
|
||||
result["fiets_michelle"] = fiets_minuten(woning_coords, werk2_coords)
|
||||
result["ov_michelle"] = ov_minuten(woning_9292, config.MICHELLE_WERK_9292)
|
||||
|
||||
return result
|
||||
|
||||
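The hunk above switches the OV lookup from coordinates to a "city/postcode" string. As a small illustration of how that string is assembled, including the fallback when no city is known, here is a standalone sketch. The function name `location_9292` is hypothetical, and the exact format the 9292 endpoint accepts is taken from the comment in the diff rather than verified independently.

```python
from typing import Optional


def location_9292(postcode: str, stad: Optional[str]) -> str:
    """Build a 'city/postcode' location string, or the bare postcode without a city."""
    stad_lower = (stad or "").strip().lower()
    return f"{stad_lower}/{postcode}" if stad_lower else postcode
```

For example, `location_9292("2611AB", " Delft ")` gives `delft/2611AB`, and passing `None` as the city returns the postcode unchanged.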
@@ -285,45 +300,66 @@ def notify_ha(listing: RawListing, travel: dict[str,int]) -> None:
|
||||
log.info("HA notificatie verstuurd voor %s", listing.adres)
|
||||
except Exception as e:
|
||||
log.error("HA webhook fout: %s", e)
|
||||
notify_email(listing, travel) # fallback
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Filtering
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def notify_email(listing: RawListing, travel: dict[str,int]) -> None:
|
||||
"""Stuur HTML email als fallback."""
|
||||
if not config.SMTP_HOST:
|
||||
return
|
||||
|
||||
subject = f"Nieuwe woning: {listing.adres}, {listing.stad} — €{listing.prijs:,}"
|
||||
|
||||
html = f"""
|
||||
<html><body>
|
||||
<h2>{listing.adres}, {listing.stad}</h2>
|
||||
<p><strong>Prijs:</strong> €{listing.prijs:,}</p>
|
||||
<p><strong>Status:</strong> {listing.status}</p>
|
||||
<p><strong>Fiets P1:</strong> {travel.get('fiets_mark')} min
|
||||
<strong>OV P1:</strong> {travel.get('ov_mark')} min</p>
|
||||
<p><strong>Fiets P2:</strong> {travel.get('fiets_michelle')} min
|
||||
<strong>OV P2:</strong> {travel.get('ov_michelle')} min</p>
|
||||
{"<img src='" + listing.hero_image_url + "' width='600'>" if listing.hero_image_url else ""}
|
||||
<p><a href="{listing.url}">Bekijk listing</a></p>
|
||||
</body></html>
|
||||
def _check_filters(listing: RawListing, travel: dict[str, int]) -> bool:
|
||||
"""
|
||||
Returns True if the listing passes all filters and should trigger a notification.
|
||||
Always errs on the side of notifying when data is missing (logs a warning).
|
||||
"""
|
||||
passed = True
|
||||
|
||||
msg = MIMEMultipart("alternative")
|
||||
msg["Subject"] = subject
|
||||
msg["From"] = config.SMTP_FROM
|
||||
msg["To"] = config.SMTP_TO
|
||||
msg.attach(MIMEText(html, "html"))
|
||||
# --- Price filter ---
|
||||
if listing.prijs is not None:
|
||||
max_p = config.max_prijs_voor_label(listing.energielabel)
|
||||
if listing.prijs > max_p:
|
||||
log.info(
|
||||
"Gefilterd op prijs: %s €%d > €%d (label: %s)",
|
||||
listing.adres, listing.prijs, max_p, listing.energielabel or "onbekend",
|
||||
)
|
||||
passed = False
|
||||
# --- Area filter ---
|
||||
if listing.woonoppervlak is not None and listing.woonoppervlak < config.MIN_AREA:
|
||||
log.info(f"Gefilterd op oppervlakte: {listing.woonoppervlak} < {config.MIN_AREA}")
|
||||
passed = False
|
||||
|
||||
try:
|
||||
with smtplib.SMTP(config.SMTP_HOST, config.SMTP_PORT) as s:
|
||||
if config.SMTP_USER:
|
||||
s.starttls()
|
||||
s.login(config.SMTP_USER, os.environ.get("SMTP_PASSWORD", ""))
|
||||
s.send_message(msg)
|
||||
log.info("Email verstuurd voor %s", listing.adres)
|
||||
except Exception as e:
|
||||
log.error("Email fout: %s", e)
|
||||
# --- OV filter ---
|
||||
ov_mark = travel.get("ov_mark")
|
||||
ov_michelle = travel.get("ov_michelle")
|
||||
|
||||
if ov_mark is None:
|
||||
log.warning(
|
||||
"OV reistijd mark ONBEKEND voor %s — notificatie wordt toch verstuurd",
|
||||
listing.adres,
|
||||
)
|
||||
elif ov_mark > config.MAX_OV_MINUTEN_MARK:
|
||||
log.info("Gefilterd op OV mark: %s %dmin > %dmin", listing.adres, ov_mark, config.MAX_OV_MINUTEN_MARK)
|
||||
passed = False
|
||||
|
||||
if ov_michelle is None:
|
||||
log.warning(
|
||||
"OV reistijd michelle ONBEKEND voor %s — notificatie wordt toch verstuurd",
|
||||
listing.adres,
|
||||
)
|
||||
elif ov_michelle > config.MAX_OV_MINUTEN_MICHELLE:
|
||||
log.info("Gefilterd op OV michelle: %s %dmin > %dmin", listing.adres, ov_michelle, config.MAX_OV_MINUTEN_MICHELLE)
|
||||
passed = False
|
||||
|
||||
# --- Fiets filter (mark only) ---
|
||||
fiets_mark = travel.get("fiets_mark")
|
||||
if fiets_mark is None:
|
||||
log.warning(
|
||||
"Fiets reistijd mark ONBEKEND voor %s — notificatie wordt toch verstuurd",
|
||||
listing.adres,
|
||||
)
|
||||
elif fiets_mark > config.MAX_FIETS_MINUTEN_MARK:
|
||||
log.info("Gefilterd op fiets mark: %s %dmin > %dmin", listing.adres, fiets_mark, config.MAX_FIETS_MINUTEN_MARK)
|
||||
passed = False
|
||||
|
||||
return passed
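
The docstring of `_check_filters` describes a fail-open rule: a missing value is logged but never blocks a notification, and every filter is evaluated so that each rejection reason ends up in the log. A minimal sketch of that rule in isolation, assuming hypothetical helper names (`passes_limit`, `check_filters`) and caller-supplied limits that are not taken from the repository's `config`:

```python
import logging
from typing import Optional

log = logging.getLogger("filter_sketch")


def passes_limit(name: str, value: Optional[int], limit: int) -> bool:
    """Fail-open check: a missing value never blocks a notification."""
    if value is None:
        log.warning("%s unknown, notifying anyway", name)
        return True
    if value > limit:
        log.info("Filtered on %s: %d > %d", name, value, limit)
        return False
    return True


def check_filters(travel: dict[str, int], limits: dict[str, int]) -> bool:
    # Build the full list first (no short-circuit) so every failing
    # limit is logged, like the `passed = False` accumulation in the diff.
    results = [passes_limit(key, travel.get(key), limit) for key, limit in limits.items()]
    return all(results)
```

With limits of 45 minutes for `ov_mark` and 40 for `fiets_mark`, an empty `travel` dict passes, while a listing with `ov_mark` at 60 is rejected.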
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -333,42 +369,66 @@ def notify_email(listing: RawListing, travel: dict[str,int]) -> None:
|
||||
Scraper = Callable[[], list[RawListing]]
|
||||
|
||||
|
||||
def run(scrapers: list[Scraper], db_path: str) -> None:
|
||||
conn = get_db(db_path)
|
||||
total_new = 0
|
||||
|
||||
for scraper in scrapers:
|
||||
name = scraper.__name__
|
||||
log.info("Scraper starten: %s", name)
|
||||
try:
|
||||
listings = scraper()
|
||||
except Exception as e:
|
||||
log.error("Scraper %s gefaald: %s", name, e)
|
||||
continue
|
||||
|
||||
def _run_scraper(scraper: Scraper) -> tuple[str, list[RawListing]]:
    name = scraper.__name__
    log.info("Scraper starten: %s", name)
    try:
        listings = scraper()
        log.info("Scraper %s: %d listings opgehaald", name, len(listings))
        return name, listings
    except Exception as e:
        log.error("Scraper %s gefaald: %s", name, e)
        return name, []
|
||||
|
||||
for listing in listings:
|
||||
travel = {}
|
||||
try:
|
||||
# Check whether this is a new listing before the upsert
|
||||
lid = listing_id(listing.url)
|
||||
is_existing = conn.execute(
|
||||
"SELECT id FROM woningen WHERE id = ?", (lid,)
|
||||
).fetchone() is not None
|
||||
|
||||
if not is_existing:
|
||||
travel = bereken_reistijden(listing.postcode)
|
||||
def run(scrapers: dict[str,Scraper], db_path: str) -> None:
|
||||
import concurrent.futures
|
||||
|
||||
is_new = upsert(conn, listing, travel)
|
||||
conn = get_db(db_path)
|
||||
|
||||
if is_new:
|
||||
total_new += 1
|
||||
log.info("Nieuwe woning: %s (%s)", listing.adres, listing.url)
|
||||
total_new = 0
|
||||
total_notified = 0
|
||||
|
||||
# Phase 1: run all scrapers concurrently (each hits a different domain)
|
||||
all_listings: list[RawListing] = []
|
||||
with concurrent.futures.ThreadPoolExecutor(max_workers=len(scrapers)) as pool:
|
||||
futures = {pool.submit(_run_scraper, s): s for s in scrapers.values()}
|
||||
for future in concurrent.futures.as_completed(futures):
|
||||
_name, listings = future.result()
|
||||
all_listings.extend(listings)
|
||||
|
||||
log.info("Alle scrapers klaar. %d listings totaal opgehaald.", len(all_listings))
|
||||
|
||||
# Phase 2: sequential travel calculation + upsert + filtered notify
|
||||
for listing in all_listings:
|
||||
travel = {}
|
||||
try:
|
||||
lid = listing_id(listing.url)
|
||||
row = conn.execute(
|
||||
"SELECT fiets_mark FROM woningen WHERE id = ?", (lid,)
|
||||
).fetchone()
|
||||
is_existing = row is not None
|
||||
needs_travel = not is_existing or row[0] is None
|
||||
|
||||
if needs_travel:
|
||||
travel = bereken_reistijden(listing.postcode, listing.stad)
|
||||
|
||||
is_new = upsert(conn, listing, travel)
|
||||
|
||||
if is_new:
|
||||
total_new += 1
|
||||
log.info("Nieuwe woning: %s (%s)", listing.adres, listing.url)
|
||||
if _check_filters(listing, travel):
|
||||
total_notified += 1
|
||||
notify_ha(listing, travel)
|
||||
else:
|
||||
log.info("Geen notificatie voor %s (gefilterd)", listing.adres)
|
||||
|
||||
except Exception as e:
|
||||
log.error("Fout bij verwerken %s: %s", listing.url, e)
|
||||
except Exception as e:
|
||||
log.error("Fout bij verwerken %s: %s", listing.url, e)
|
||||
|
||||
log.info("Run klaar. %d nieuwe woningen gevonden.", total_new)
|
||||
log.info(
|
||||
"Run klaar. %d nieuwe woningen, %d notificaties verstuurd.",
|
||||
total_new, total_notified,
|
||||
)
|
||||
conn.close()
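
The new `run` splits the work into two phases: scrapers run concurrently in a thread pool, and the collected listings are then processed sequentially. The sketch below isolates the first phase. It is an illustration under assumptions, not the repository's code: listings are plain strings, and `run_scraper` and `collect` are hypothetical names. It also guards against an empty scraper mapping, because `ThreadPoolExecutor` raises `ValueError` when `max_workers` is 0.

```python
import concurrent.futures
from typing import Callable

Scraper = Callable[[], list[str]]


def run_scraper(scraper: Scraper) -> tuple[str, list[str]]:
    """Run one scraper; a failure yields an empty list instead of raising."""
    name = scraper.__name__
    try:
        return name, scraper()
    except Exception:
        return name, []


def collect(scrapers: dict[str, Scraper]) -> list[str]:
    """Phase 1: run all scrapers concurrently and merge their results."""
    results: list[str] = []
    if not scrapers:
        return results  # ThreadPoolExecutor rejects max_workers=0
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(scrapers)) as pool:
        futures = [pool.submit(run_scraper, s) for s in scrapers.values()]
        # as_completed yields futures in completion order, not submission order
        for future in concurrent.futures.as_completed(futures):
            _name, listings = future.result()
            results.extend(listings)
    return results
```

Because each worker catches its own exception, `future.result()` does not raise for a failed scraper, and one broken site cannot abort the whole run. The merged list has no guaranteed order.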
|
||||
|
||||
741
src/templates/index.html
Normal file
@@ -0,0 +1,741 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="nl">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>Huizenbot</title>
|
||||
<link rel="preconnect" href="https://fonts.googleapis.com">
|
||||
<link href="https://fonts.googleapis.com/css2?family=Syne:wght@400;600;700;800&family=DM+Mono:wght@400;500&display=swap" rel="stylesheet">
|
||||
<style>
|
||||
:root {
|
||||
--bg: #f5f0eb;
|
||||
--surface: #fdf9f5;
|
||||
--surface2: #ede8e2;
|
||||
--border: #ddd6cc;
|
||||
--accent: #6a9e78;
|
||||
--accent-dim: #4f7a5c;
|
||||
--text: #2e2a25;
|
||||
--text-dim: #7a7068;
|
||||
--text-dimmer: #aaa098;
|
||||
--red: #c0524a;
|
||||
--orange: #c07c3a;
|
||||
--radius: 10px;
|
||||
--font-ui: 'Syne', sans-serif;
|
||||
--font-mono: 'DM Mono', monospace;
|
||||
}
|
||||
|
||||
*, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
|
||||
|
||||
body {
|
||||
background: var(--bg);
|
||||
color: var(--text);
|
||||
font-family: var(--font-ui);
|
||||
min-height: 100vh;
|
||||
}
|
||||
|
||||
/* ── Header ── */
|
||||
header {
|
||||
padding: 1.25rem 1rem 0;
|
||||
display: flex;
|
||||
align-items: baseline;
|
||||
gap: 0.75rem;
|
||||
}
|
||||
header h1 {
|
||||
font-size: 1.5rem;
|
||||
font-weight: 800;
|
||||
letter-spacing: -0.03em;
|
||||
color: var(--accent);
|
||||
}
|
||||
#count {
|
||||
font-family: var(--font-mono);
|
||||
font-size: 0.75rem;
|
||||
color: var(--text-dim);
|
||||
}
|
||||
|
||||
/* ── Filters ── */
|
||||
#filters {
|
||||
position: sticky;
|
||||
top: 0;
|
||||
z-index: 100;
|
||||
background: var(--bg);
|
||||
border-bottom: 1px solid var(--border);
|
||||
padding: 0.75rem 1rem;
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 0.5rem;
|
||||
align-items: center;
|
||||
}
|
||||
|
||||
.filter-group {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.35rem;
|
||||
background: var(--surface);
|
||||
border: 1px solid var(--border);
|
||||
border-radius: 6px;
|
||||
padding: 0.3rem 0.6rem;
|
||||
}
|
||||
.filter-group label {
|
||||
font-size: 0.7rem;
|
||||
font-weight: 600;
|
||||
color: var(--text-dim);
|
||||
letter-spacing: 0.04em;
|
||||
white-space: nowrap;
|
||||
text-transform: uppercase;
|
||||
}
|
||||
.filter-group input[type=number] {
|
||||
background: transparent;
|
||||
border: none;
|
||||
color: var(--accent);
|
||||
font-family: var(--font-mono);
|
||||
font-size: 0.8rem;
|
||||
width: 3.2rem;
|
||||
outline: none;
|
||||
text-align: right;
|
||||
}
|
||||
.filter-group input[type=number]::-webkit-inner-spin-button { opacity: 0.3; }
|
||||
|
||||
.filter-group select {
|
||||
background: transparent;
|
||||
border: none;
|
||||
color: var(--text);
|
||||
font-family: var(--font-ui);
|
||||
font-size: 0.75rem;
|
||||
font-weight: 600;
|
||||
outline: none;
|
||||
cursor: pointer;
|
||||
}
|
||||
.filter-group select option { background: var(--surface2); }
|
||||
|
||||
#filter-reset {
|
||||
background: none;
|
||||
border: 1px solid var(--border);
|
||||
border-radius: 6px;
|
||||
color: var(--text-dimmer);
|
||||
font-family: var(--font-ui);
|
||||
font-size: 0.7rem;
|
||||
font-weight: 600;
|
||||
letter-spacing: 0.04em;
|
||||
text-transform: uppercase;
|
||||
padding: 0.3rem 0.7rem;
|
||||
cursor: pointer;
|
||||
transition: color 0.15s, border-color 0.15s;
|
||||
}
|
||||
#filter-reset:hover { color: var(--text); border-color: var(--text-dim); }
|
||||
|
||||
/* ── Card list ── */
|
||||
#listings {
|
||||
padding: 0.75rem 1rem 3rem;
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 0.5rem;
|
||||
max-width: 900px;
|
||||
margin: 0 auto;
|
||||
}
|
||||
|
||||
#empty {
|
||||
text-align: center;
|
||||
color: var(--text-dimmer);
|
||||
font-family: var(--font-mono);
|
||||
font-size: 0.85rem;
|
||||
padding: 4rem 1rem;
|
||||
display: none;
|
||||
}
|
||||
|
||||
/* ── Card ── */
|
||||
.card {
|
||||
background: var(--surface);
|
||||
border: 1px solid var(--border);
|
||||
border-radius: var(--radius);
|
||||
overflow: hidden;
|
||||
transition: border-color 0.15s;
|
||||
}
|
||||
.card:hover { border-color: #c5bdb4; }
|
||||
|
||||
.card-compact {
|
||||
display: grid;
|
||||
grid-template-columns: 1fr 2fr;
|
||||
min-height: 110px;
|
||||
cursor: pointer;
|
||||
user-select: none;
|
||||
}
|
||||
|
||||
/* Image */
|
||||
.card-img {
|
||||
position: relative;
|
||||
background: var(--surface2);
|
||||
overflow: hidden;
|
||||
}
|
||||
.card-img img {
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
object-fit: cover;
|
||||
display: block;
|
||||
transition: transform 0.3s ease;
|
||||
}
|
||||
.card:hover .card-img img { transform: scale(1.03); }
|
||||
.card-img-placeholder {
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
color: var(--text-dimmer);
|
||||
font-size: 1.5rem;
|
||||
}
|
||||
.card-source {
|
||||
position: absolute;
|
||||
bottom: 0.4rem;
|
||||
left: 0.4rem;
|
||||
background: rgba(255,255,255,0.75);
|
||||
backdrop-filter: blur(4px);
|
||||
color: var(--text-dim);
|
||||
font-family: var(--font-mono);
|
||||
font-size: 0.6rem;
|
||||
padding: 0.15rem 0.4rem;
|
||||
border-radius: 4px;
|
||||
letter-spacing: 0.03em;
|
||||
}
|
||||
|
||||
/* Data section */
|
||||
.card-data {
|
||||
padding: 0.7rem 0.75rem;
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 0.35rem;
|
||||
min-width: 0;
|
||||
}
|
||||
|
||||
.card-header {
|
||||
display: flex;
|
||||
align-items: flex-start;
|
||||
justify-content: space-between;
|
||||
gap: 0.5rem;
|
||||
}
|
||||
|
||||
.card-adres {
|
||||
font-size: 0.85rem;
|
||||
font-weight: 700;
|
||||
line-height: 1.3;
|
||||
color: var(--text);
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
white-space: nowrap;
|
||||
flex: 1;
|
||||
min-width: 0;
|
||||
}
|
||||
.card-stad {
|
||||
font-size: 0.7rem;
|
||||
color: var(--text-dim);
|
||||
font-weight: 400;
|
||||
}
|
||||
|
||||
/* Link chip — always clickable, does NOT expand card */
|
||||
.card-link {
|
||||
flex-shrink: 0;
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
gap: 0.25rem;
|
||||
background: var(--surface2);
|
||||
border: 1px solid var(--border);
|
||||
border-radius: 5px;
|
||||
color: var(--text-dim);
|
||||
font-family: var(--font-mono);
|
||||
font-size: 0.6rem;
|
||||
padding: 0.2rem 0.45rem;
|
||||
text-decoration: none;
|
||||
transition: color 0.15s, border-color 0.15s, background 0.15s;
|
||||
white-space: nowrap;
|
||||
}
|
||||
.card-link:hover {
|
||||
color: var(--accent);
|
||||
border-color: var(--accent-dim);
|
||||
background: rgba(106,158,120,0.08);
|
||||
}
|
||||
.card-link svg { flex-shrink: 0; }
|
||||
|
||||
.card-prijs {
|
||||
font-size: 1rem;
|
||||
font-weight: 800;
|
||||
color: var(--accent);
|
||||
letter-spacing: -0.02em;
|
||||
font-family: var(--font-mono);
|
||||
}
|
||||
|
||||
.card-meta {
|
||||
display: grid;
|
||||
grid-template-columns: 1fr 1fr;
|
||||
gap: 0.2rem 0.5rem;
|
||||
margin-top: 0.1rem;
|
||||
}
|
||||
.card-meta-item {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.3rem;
|
||||
font-size: 0.68rem;
|
||||
color: var(--text-dim);
|
||||
font-family: var(--font-mono);
|
||||
white-space: nowrap;
|
||||
}
|
||||
.card-meta-item .icon { font-size: 0.75rem; }
|
||||
.card-meta-item .val { color: var(--text); font-weight: 500; }
|
||||
.card-meta-item.warn .val { color: var(--orange); }
|
||||
.card-meta-item.ok .val { color: var(--accent); }
|
||||
|
||||
/* Expand toggle indicator */
|
||||
.card-toggle {
|
||||
align-self: flex-end;
|
||||
color: var(--text-dimmer);
|
||||
font-size: 0.65rem;
|
||||
font-weight: 600;
|
||||
letter-spacing: 0.04em;
|
||||
text-transform: uppercase;
|
||||
margin-top: auto;
|
||||
}
|
||||
|
||||
/* ── Expanded panel ── */
|
||||
.card-expanded {
|
||||
display: none;
|
||||
border-top: 1px solid var(--border);
|
||||
padding: 0.9rem 1rem;
|
||||
background: var(--surface2);
|
||||
}
|
||||
.card.open .card-expanded { display: block; }
|
||||
.card.open .card-toggle { color: var(--accent-dim); }
|
||||
|
||||
.expanded-grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fill, minmax(160px, 1fr));
|
||||
gap: 0.5rem 1rem;
|
||||
margin-bottom: 0.75rem;
|
||||
}
|
||||
.expanded-field {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 0.1rem;
|
||||
}
|
||||
.expanded-field .ef-label {
|
||||
font-size: 0.62rem;
|
||||
font-weight: 600;
|
||||
color: var(--text-dimmer);
|
||||
letter-spacing: 0.06em;
|
||||
text-transform: uppercase;
|
||||
}
|
||||
.expanded-field .ef-val {
|
||||
font-size: 0.8rem;
|
||||
color: var(--text);
|
||||
font-family: var(--font-mono);
|
||||
}
|
||||
|
||||
.extra-section {
|
||||
border-top: 1px solid var(--border);
|
||||
padding-top: 0.6rem;
|
||||
margin-top: 0.25rem;
|
||||
}
|
||||
.extra-section h4 {
|
||||
font-size: 0.62rem;
|
||||
font-weight: 600;
|
||||
color: var(--text-dimmer);
|
||||
letter-spacing: 0.06em;
|
||||
text-transform: uppercase;
|
||||
margin-bottom: 0.4rem;
|
||||
}
|
||||
.extra-kv {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 0.35rem;
|
||||
}
|
||||
.extra-kv-item {
|
||||
background: var(--surface);
|
||||
border: 1px solid var(--border);
|
||||
border-radius: 5px;
|
||||
padding: 0.2rem 0.5rem;
|
||||
font-family: var(--font-mono);
|
||||
font-size: 0.68rem;
|
||||
color: var(--text-dim);
|
||||
}
|
||||
.extra-kv-item .ek { color: var(--text-dimmer); }
|
||||
.extra-kv-item .ev { color: var(--text); margin-left: 0.3rem; }
|
||||
|
||||
/* ── Energielabel badge ── */
|
||||
.el-badge {
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
font-size: 0.62rem;
|
||||
font-weight: 700;
|
||||
font-family: var(--font-mono);
|
||||
padding: 0.1rem 0.35rem;
|
||||
border-radius: 3px;
|
||||
letter-spacing: 0.03em;
|
||||
line-height: 1.5;
|
||||
color: #fff;
|
||||
min-width: 1.8rem;
|
||||
text-align: center;
|
||||
}
|
||||
.el-Appp { background: #004f2d; }
|
||||
.el-App { background: #006837; }
|
||||
.el-Ap { background: #1a9641; }
|
||||
.el-A { background: #3cb54a; }
|
||||
.el-B { background: #69b444; }
|
||||
.el-C { background: #a6d854; color: #2e2a25; }
|
||||
.el-D { background: #f9c819; color: #2e2a25; }
|
||||
.el-E { background: #f4a432; color: #2e2a25; }
|
||||
.el-F { background: #e8612d; }
|
||||
.el-G { background: #c0392b; }
|
||||
.el-unknown { background: var(--surface2); color: var(--text-dim); border: 1px solid var(--border); }
|
||||
|
||||
/* ── Search bar ── */
|
||||
#f-search {
|
||||
background: var(--surface);
|
||||
border: 1px solid var(--border);
|
||||
border-radius: 6px;
|
||||
color: var(--text);
|
||||
font-family: var(--font-ui);
|
||||
font-size: 0.75rem;
|
||||
font-weight: 500;
|
||||
padding: 0.3rem 0.6rem;
|
||||
outline: none;
|
||||
width: 11rem;
|
||||
transition: border-color 0.15s;
|
||||
}
|
||||
#f-search::placeholder { color: var(--text-dimmer); }
|
||||
#f-search:focus { border-color: var(--accent); }
|
||||
|
||||
/* ── Disable filters toggle ── */
|
||||
#filter-disable {
|
||||
background: none;
|
||||
border: 1px solid var(--border);
|
||||
border-radius: 6px;
|
||||
color: var(--text-dimmer);
|
||||
font-family: var(--font-ui);
|
||||
font-size: 0.7rem;
|
||||
font-weight: 600;
|
||||
letter-spacing: 0.04em;
|
||||
text-transform: uppercase;
|
||||
padding: 0.3rem 0.7rem;
|
||||
cursor: pointer;
|
||||
transition: color 0.15s, border-color 0.15s, background 0.15s;
|
||||
}
|
||||
#filter-disable:hover { color: var(--text); border-color: var(--text-dim); }
|
||||
#filter-disable.active {
|
||||
background: var(--orange);
|
||||
border-color: var(--orange);
|
||||
color: #fff;
|
||||
}
|
||||
|
||||
/* ── No results ── */
|
||||
#empty { display: none; }
|
||||
#empty.visible { display: block; }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<header>
|
||||
<h1>Huizenbot</h1>
|
||||
<span id="count"></span>
|
||||
</header>
|
||||
|
||||
<div id="filters">
|
||||
<div class="filter-group">
|
||||
<label>OV Mark ≤</label>
|
||||
<input type="number" id="f-ov-mark" value="45" min="0" max="120">
|
||||
<label>min</label>
|
||||
</div>
|
||||
<div class="filter-group">
|
||||
<label>OV Michelle ≤</label>
|
||||
<input type="number" id="f-ov-michelle" value="45" min="0" max="120">
|
||||
<label>min</label>
|
||||
</div>
|
||||
<div class="filter-group">
|
||||
<label>Fiets Mark ≤</label>
|
||||
<input type="number" id="f-fiets-mark" value="40" min="0" max="90">
|
||||
<label>min</label>
|
||||
</div>
|
||||
<div class="filter-group">
|
||||
<label>Max prijs</label>
|
||||
<input type="number" id="f-prijs" value="300000" min="0" step="5000">
|
||||
</div>
|
||||
<div class="filter-group">
|
||||
<label>Min opp.</label>
|
||||
<input type="number" id="f-opp" value="65" min="0" max="300">
|
||||
<label>m²</label>
|
||||
</div>
|
||||
<div class="filter-group">
|
||||
<label>Sorteer</label>
|
||||
<select id="f-sort">
|
||||
<option value="first_seen_desc">Nieuwste eerst</option>
|
||||
<option value="first_seen_asc">Oudste eerst</option>
|
||||
<option value="prijs_asc">Prijs ↑</option>
|
||||
<option value="prijs_desc">Prijs ↓</option>
|
||||
<option value="ov_mark_asc">OV Mark ↑</option>
|
||||
<option value="fiets_mark_asc">Fiets Mark ↑</option>
|
||||
<option value="opp_asc">Opp. ↑</option>
|
||||
<option value="opp_desc">Opp. ↓</option>
|
||||
</select>
|
||||
</div>
|
||||
<input type="search" id="f-search" placeholder="Zoek adres, stad…">
|
||||
<button id="filter-disable">Filters uit</button>
|
||||
<button id="filter-reset">Reset</button>
|
||||
</div>
|
||||
|
||||
<div id="listings"></div>
|
||||
<div id="empty">Geen woningen gevonden met deze filters.</div>
|
||||
|
||||
<script>
|
||||
const LISTINGS = {{ listings_json | safe }};
|
||||
|
||||
const DEFAULTS = {
|
||||
'f-ov-mark': 45,
|
||||
'f-ov-michelle': 45,
|
||||
'f-fiets-mark': 40,
|
||||
'f-prijs': 300000,
|
||||
'f-opp': 65,
|
||||
'f-sort': 'first_seen_desc',
|
||||
};
|
||||
|
||||
// ── Helpers ──
|
||||
|
||||
function fmt_prijs(p) {
|
||||
if (!p) return '—';
|
||||
return '€\u202f' + p.toLocaleString('nl-NL');
|
||||
}
|
||||
|
||||
function fmt_min(m) {
|
||||
if (m == null) return '—';
|
||||
return m + ' min';
|
||||
}
|
||||
|
||||
function travel_class(val, warn, good) {
|
||||
if (val == null) return '';
|
||||
if (val <= good) return 'ok';
|
||||
if (val <= warn) return '';
|
||||
return 'warn';
|
||||
}
|
||||
|
||||
function fmt_date(iso) {
|
||||
if (!iso) return '—';
|
||||
return iso.slice(0, 10);
|
||||
}
|
||||
|
||||
function fmt_extra_val(v) {
|
||||
if (v === null || v === undefined) return null;
|
||||
if (typeof v === 'boolean') return v ? 'ja' : 'nee';
|
||||
if (Array.isArray(v)) {
|
||||
if (v.length === 0) return null;
|
||||
// photos array: just show count
|
||||
return v.length + ' foto\'s';
|
||||
}
|
||||
if (typeof v === 'object') return JSON.stringify(v).slice(0, 60);
|
||||
const s = String(v);
|
||||
if (s === '' || s === 'null') return null;
|
||||
// truncate long description
|
||||
return s.length > 120 ? s.slice(0, 120) + '…' : s;
|
||||
}
|
||||
|
||||
function el_class(label) {
|
||||
if (!label) return 'el-unknown';
|
||||
const s = label.replace(/\+/g, 'p').replace(/-/g, '');
|
||||
const map = { 'Appp': 'el-Appp', 'App': 'el-App', 'Ap': 'el-Ap', 'A': 'el-A',
|
||||
'B': 'el-B', 'C': 'el-C', 'D': 'el-D', 'E': 'el-E', 'F': 'el-F', 'G': 'el-G' };
|
||||
return map[s] || 'el-unknown';
|
||||
}
|
||||
|
||||
function ef(label, val) {
|
||||
if (val == null || val === '' || val === 'null') return '';
|
||||
return `<div class="expanded-field">
|
||||
<span class="ef-label">${label}</span>
|
||||
<span class="ef-val">${val}</span>
|
||||
</div>`;
|
||||
}
|
||||
|
||||
// ── Card renderer ──
|
||||
|
||||
function render_card(l) {
|
||||
const img = l.hero_image_url
|
||||
? `<img src="${l.hero_image_url}" alt="${l.adres || ''}" loading="lazy">`
|
||||
: `<div class="card-img-placeholder">🏠</div>`;
|
||||
|
||||
const ovM = travel_class(l.ov_mark, 45, 30);
|
||||
const ovMi = travel_class(l.ov_michelle, 45, 30);
|
||||
const fM = travel_class(l.fiets_mark, 40, 25);
|
||||
const fMi = travel_class(l.fiets_michelle, 50, 35);
|
||||
|
||||
const extra_items = Object.entries(l.extra || {})
|
||||
.map(([k, v]) => {
|
||||
const fv = fmt_extra_val(v);
|
||||
if (fv === null) return '';
|
||||
return `<span class="extra-kv-item"><span class="ek">${k}</span><span class="ev">${fv}</span></span>`;
|
||||
}).join('');
|
||||
|
||||
const extra_section = extra_items
|
||||
? `<div class="extra-section"><h4>Extra</h4><div class="extra-kv">${extra_items}</div></div>`
|
||||
: '';
|
||||
|
||||
return `
|
||||
<div class="card" data-id="${l.id}">
|
||||
<div class="card-compact">
|
||||
<div class="card-img">
|
||||
${img}
|
||||
<span class="card-source">${l.source_makelaar}</span>
|
||||
</div>
|
||||
<div class="card-data">
|
||||
<div class="card-header">
|
||||
<div>
|
||||
<div class="card-adres">${l.adres || '—'}</div>
|
||||
<div class="card-stad">${l.stad || ''} ${l.postcode || ''}</div>
|
||||
</div>
|
||||
<a class="card-link" href="${l.url}" target="_blank" rel="noopener" onclick="event.stopPropagation()">
|
||||
<svg width="9" height="9" viewBox="0 0 12 12" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round"><path d="M5 3H2a1 1 0 00-1 1v6a1 1 0 001 1h6a1 1 0 001-1V7M8 1h3m0 0v3m0-3L5 7"/></svg>
|
||||
link
|
||||
</a>
|
||||
</div>
|
||||
<div class="card-prijs">${fmt_prijs(l.prijs)}</div>
|
||||
<div class="card-meta">
|
||||
<div class="card-meta-item ${ovM}">
|
||||
<span class="icon">🚌</span><span>Mark</span><span class="val">${fmt_min(l.ov_mark)}</span>
|
||||
</div>
|
||||
<div class="card-meta-item ${ovMi}">
|
||||
<span class="icon">🚌</span><span>Michelle</span><span class="val">${fmt_min(l.ov_michelle)}</span>
|
||||
</div>
|
||||
<div class="card-meta-item ${fM}">
|
||||
<span class="icon">🚲</span><span>Mark</span><span class="val">${fmt_min(l.fiets_mark)}</span>
|
||||
</div>
|
||||
<div class="card-meta-item ${fMi}">
|
||||
<span class="icon">🚲</span><span>Michelle</span><span class="val">${fmt_min(l.fiets_michelle)}</span>
|
||||
</div>
|
||||
${l.woonoppervlak ? `<div class="card-meta-item"><span class="icon">📐</span><span class="val">${l.woonoppervlak} m²</span></div>` : ''}
|
||||
${l.kamers ? `<div class="card-meta-item"><span class="icon">🚪</span><span class="val">${l.kamers} kamers</span></div>` : ''}
|
||||
${l.energielabel ? `<div class="card-meta-item"><span class="el-badge ${el_class(l.energielabel)}">${l.energielabel}</span></div>` : ''}
|
||||
</div>
|
||||
<div class="card-toggle">meer ↓</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="card-expanded">
|
||||
<div class="expanded-grid">
|
||||
${ef('Eerste gezien', fmt_date(l.first_seen))}
|
||||
${ef('Datum aanmelding', l.datum_aanmelding ? fmt_date(l.datum_aanmelding) : null)}
|
||||
${ef('Woningtype', l.woningtype)}
|
||||
${ef('Bouwjaar', l.bouwjaar)}
|
||||
${ef('Woonoppervlak', l.woonoppervlak ? l.woonoppervlak + ' m²' : null)}
|
||||
${ef('Perceeloppervlak', l.perceeloppervlak ? l.perceeloppervlak + ' m²' : null)}
|
||||
${ef('Kamers', l.kamers)}
|
||||
${ef('Slaapkamers', l.slaapkamers)}
|
||||
${ef('Energielabel', l.energielabel)}
|
||||
${ef('Postcode', l.postcode)}
|
||||
${ef('OV Mark', fmt_min(l.ov_mark))}
|
||||
${ef('OV Michelle', fmt_min(l.ov_michelle))}
|
||||
${ef('Fiets Mark', fmt_min(l.fiets_mark))}
|
||||
${ef('Fiets Michelle', fmt_min(l.fiets_michelle))}
|
||||
</div>
|
||||
${extra_section}
|
||||
</div>
|
||||
</div>`;
|
||||
}
|
||||
|
||||
// ── Filter + sort + render ──
|
||||
|
||||
let filters_disabled = false;
|
||||
|
||||
function get_filters() {
|
||||
return {
|
||||
ov_mark: parseInt(document.getElementById('f-ov-mark').value) || Infinity,
|
||||
ov_michelle: parseInt(document.getElementById('f-ov-michelle').value) || Infinity,
|
||||
fiets_mark: parseInt(document.getElementById('f-fiets-mark').value) || Infinity,
|
||||
prijs: parseInt(document.getElementById('f-prijs').value) || Infinity,
|
||||
opp: parseInt(document.getElementById('f-opp').value) || 0,
|
||||
sort: document.getElementById('f-sort').value,
|
||||
search: document.getElementById('f-search').value.trim().toLowerCase(),
|
||||
};
|
||||
}
|
||||
|
||||
const SORT_FNS = {
|
||||
first_seen_desc: (a, b) => (b.first_seen || '').localeCompare(a.first_seen || ''),
|
||||
first_seen_asc: (a, b) => (a.first_seen || '').localeCompare(b.first_seen || ''),
|
||||
prijs_asc: (a, b) => (a.prijs || 0) - (b.prijs || 0),
|
||||
prijs_desc: (a, b) => (b.prijs || 0) - (a.prijs || 0),
|
||||
ov_mark_asc: (a, b) => (a.ov_mark ?? 999) - (b.ov_mark ?? 999),
|
||||
fiets_mark_asc: (a, b) => (a.fiets_mark ?? 999) - (b.fiets_mark ?? 999),
|
||||
opp_asc: (a, b) => (a.woonoppervlak ?? 0) - (b.woonoppervlak ?? 0),
|
||||
opp_desc: (a, b) => (b.woonoppervlak ?? 0) - (a.woonoppervlak ?? 0),
|
||||
};
|
||||
|
||||
function apply() {
|
||||
const f = get_filters();
|
||||
let filtered = LISTINGS.filter(l => {
|
||||
if (!filters_disabled) {
|
||||
if (f.ov_mark < Infinity && (l.ov_mark == null || l.ov_mark > f.ov_mark)) return false;
|
||||
if (f.ov_michelle < Infinity && (l.ov_michelle == null || l.ov_michelle > f.ov_michelle)) return false;
|
||||
if (f.fiets_mark < Infinity && (l.fiets_mark == null || l.fiets_mark > f.fiets_mark)) return false;
|
||||
if (l.prijs != null && l.prijs > f.prijs) return false;
|
||||
if (f.opp > 0 && (l.woonoppervlak == null || l.woonoppervlak < f.opp)) return false;
|
||||
}
|
||||
if (f.search) {
|
||||
const haystack = [l.adres, l.stad, l.postcode, l.source_makelaar, l.woningtype]
|
||||
.filter(Boolean).join(' ').toLowerCase();
|
||||
if (!haystack.includes(f.search)) return false;
|
||||
}
|
||||
return true;
|
||||
});
|
||||
|
||||
filtered.sort(SORT_FNS[f.sort] || SORT_FNS.first_seen_desc);
|
||||
|
||||
const container = document.getElementById('listings');
|
||||
const empty = document.getElementById('empty');
|
||||
const count = document.getElementById('count');
|
||||
|
||||
// Preserve open state
|
||||
const open_ids = new Set(
|
||||
[...container.querySelectorAll('.card.open')].map(el => el.dataset.id)
|
||||
);
|
||||
|
||||
container.innerHTML = filtered.map(render_card).join('');
|
||||
count.textContent = filtered.length + ' / ' + LISTINGS.length + ' woningen';
|
||||
|
||||
// Restore open state
|
||||
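open_ids.forEach(id => {
  const el = container.querySelector(`.card[data-id="${id}"]`);
  if (el) {
    el.classList.add('open');
    // Keep the toggle label in sync with the restored open state
    el.querySelector('.card-toggle').textContent = 'minder ↑';
  }
});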
|
||||
|
||||
// Toggle on compact click
|
||||
container.querySelectorAll('.card-compact').forEach(compact => {
|
||||
compact.addEventListener('click', () => {
|
||||
compact.closest('.card').classList.toggle('open');
|
||||
const toggle = compact.querySelector('.card-toggle');
|
||||
const isOpen = compact.closest('.card').classList.contains('open');
|
||||
toggle.textContent = isOpen ? 'minder ↑' : 'meer ↓';
|
||||
});
|
||||
});
|
||||
|
||||
empty.classList.toggle('visible', filtered.length === 0);
|
||||
}
|
||||
|
||||
// ── Init ──
|
||||
|
||||
document.querySelectorAll('#filters input, #filters select').forEach(el => {
|
||||
el.addEventListener('input', apply);
|
||||
});
|
||||
|
||||
document.getElementById('filter-disable').addEventListener('click', () => {
|
||||
filters_disabled = !filters_disabled;
|
||||
document.getElementById('filter-disable').classList.toggle('active', filters_disabled);
|
||||
document.getElementById('filter-disable').textContent = filters_disabled ? 'Filters aan' : 'Filters uit';
|
||||
apply();
|
||||
});
|
||||
|
||||
document.getElementById('filter-reset').addEventListener('click', () => {
|
||||
Object.entries(DEFAULTS).forEach(([id, val]) => {
|
||||
document.getElementById(id).value = val;
|
||||
});
|
||||
document.getElementById('f-search').value = '';
|
||||
filters_disabled = false;
|
||||
document.getElementById('filter-disable').classList.remove('active');
|
||||
document.getElementById('filter-disable').textContent = 'Filters uit';
|
||||
apply();
|
||||
});
|
||||
|
||||
apply();
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
65
src/web.py
Normal file
@@ -0,0 +1,65 @@
"""
web.py - huizenbot web interface
Single route: query SQLite, SSR listings into index.html template.
"""
import json
import sqlite3
import os
from flask import Flask, render_template, g

DB_PATH = os.environ.get("DB_PATH", "/data/huizenbot.db")
APP_ENV = os.environ.get("APP_ENV", "dev")

app = Flask(__name__)


def get_db():
    if "db" not in g:
        conn = sqlite3.connect(DB_PATH)
        conn.row_factory = sqlite3.Row
        g.db = conn
    return g.db


@app.teardown_appcontext
def close_db(e=None):
    db = g.pop("db", None)
    if db is not None:
        db.close()


@app.route("/")
def index():
    conn = get_db()
    rows = conn.execute("""
        SELECT
            id, url, source_makelaar, first_seen, last_seen, datum_aanmelding,
            status, adres, postcode, stad,
            prijs, woningtype, woonoppervlak, perceeloppervlak,
            kamers, slaapkamers, bouwjaar, energielabel,
            hero_image_url,
            fiets_mark, fiets_michelle, ov_mark, ov_michelle,
            extra
        FROM woningen
        WHERE status = 'beschikbaar'
        ORDER BY first_seen DESC
    """).fetchall()

    listings = []
    for row in rows:
        d = dict(row)
        try:
            d["extra"] = json.loads(d["extra"]) if d["extra"] else {}
        except Exception:
            d["extra"] = {}
        listings.append(d)

    return render_template("index.html", listings_json=json.dumps(listings, ensure_ascii=False))


if __name__ == "__main__":
    if APP_ENV == "dev":
        app.run(debug=True, host="0.0.0.0", port=5005)
    else:
        from waitress import serve
        serve(app, host="0.0.0.0", port=5005)
|
||||
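
`index()` passes a `json.dumps` string to the template, where it is emitted with `| safe` inside an inline `<script>` block. Plain JSON is not automatically safe in that position: scraped text containing `</script>` would end the script element early. The sketch below shows one way to escape the payload using only the standard library. The helper name `script_safe_json` is hypothetical, and Flask's built-in `tojson` template filter is understood to apply comparable escaping, which may be the simpler option here.

```python
import json


def script_safe_json(data) -> str:
    """Serialize data so it can sit inside an inline <script> block.

    The characters <, > and & can only occur inside JSON string values,
    so replacing them with \\uXXXX escapes keeps the JSON equivalent.
    """
    return (
        json.dumps(data, ensure_ascii=False)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )
```

The escaped string parses back to the same data, both in Python and as a JavaScript literal, so the `const LISTINGS = ...` assignment in the template would not need to change.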
@@ -22,10 +22,10 @@ def _key(url: str, params: dict[str,str] | None) -> str:
 
 def _patch():
     import adapters.api as api_mod
-    import adapters.ssr as ssr_mod
+    import adapters.ssr._shared as ssr_shared
 
     _orig_fetch_json = api_mod.fetch_json
-    _orig_fetch_soup = ssr_mod.fetch_soup
+    _orig_fetch_soup = ssr_shared.fetch_soup
 
     def cached_fetch_json(url, *, params: dict[str,str]|None=None, headers=None):
         path = CACHE_DIR / (_key(url, params) + ".json")
@@ -46,7 +46,15 @@ def _patch():
         return result
 
     api_mod.fetch_json = cached_fetch_json
-    ssr_mod.fetch_soup = cached_fetch_soup
+    # fetch_soup is imported directly in each submodule via `from ._shared import fetch_soup`,
+    # so we must patch the name in every submodule that uses it.
+    import adapters.ssr.realworks as _rw
+    import adapters.ssr.sure as _sure
+    import adapters.ssr.schiedam as _sch
+    import adapters.ssr.denhaag as _dh
+    import adapters.ssr.overige as _ov
+    for _mod in [ssr_shared, _rw, _sure, _sch, _dh, _ov]:
+        _mod.fetch_soup = cached_fetch_soup
     print("[cache] fetch_json and fetch_soup patched")
 
 
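Note on the hunk above: the added comment relies on how `from module import name` binds names in Python. The sketch below illustrates that behaviour in isolation. The module names `demo_shared` and `demo_consumer` and the function `scrape` are hypothetical stand-ins built in memory, not part of this repository; only the `fetch_soup` / `cached_fetch_soup` names mirror the diff.

```python
import sys
import types

# Hypothetical stand-in for the shared helper module (plays the role of `_shared`).
shared = types.ModuleType("demo_shared")
exec("def fetch_soup(url):\n    return 'live:' + url", shared.__dict__)
sys.modules["demo_shared"] = shared

# Hypothetical consumer that copies the name into its own namespace,
# the same way `from ._shared import fetch_soup` does.
consumer = types.ModuleType("demo_consumer")
exec(
    "from demo_shared import fetch_soup\n"
    "def scrape(url):\n"
    "    return fetch_soup(url)\n",
    consumer.__dict__,
)
sys.modules["demo_consumer"] = consumer


def cached_fetch_soup(url):
    return "cached:" + url


# Patching only the defining module leaves the consumer's own binding untouched.
shared.fetch_soup = cached_fetch_soup
after_shared_patch = consumer.scrape("x")

# Rebinding the name inside the consumer module is what actually takes effect.
consumer.fetch_soup = cached_fetch_soup
after_consumer_patch = consumer.scrape("x")

print(after_shared_patch, after_consumer_patch)
```

This is why the loop in the hunk assigns `cached_fetch_soup` on every submodule rather than on `_shared` alone. An alternative, if the adapters were changed to call `_shared.fetch_soup(...)` through the module object, would be to patch a single location.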
@@ -16,11 +16,11 @@ logging.basicConfig(
 )
 
 # --- change this to test a different adapter ---
-ADAPTER = SCRAPERS['dupont']
+ADAPTER = SCRAPERS['post']
 
 if __name__ == "__main__":
     print(f"Testing adapter: {ADAPTER.__name__}")
     listings = ADAPTER()
     print(f"Got {len(listings)} listings\n")
     for l in listings:
-        print(f" {l.adres}, {l.postcode}, {l.stad} — €{l.prijs} — {l.kamers} rooms — {l.url}")
+        print(f" {l.adres}, {l.postcode}, {l.stad} — €{l.prijs} — {l.kamers} rooms — {l.woonoppervlak}m2 — {l.energielabel} — {l.url}")
@@ -1,26 +0,0 @@
-import sys
-sys.path.insert(0, "../src")
-
-from huizenbot import notify_email, RawListing
-
-TEST_LISTING = RawListing(
-    url="https://example.com/test-woning",
-    source_makelaar="test",
-    adres="Teststraat 1",
-    stad="Delft",
-    postcode="2613AA",
-    prijs=350000,
-    hero_image_url=None,
-)
-
-TEST_TRAVEL = {
-    "fiets_mark": 20,
-    "fiets_michelle": 35,
-    "ov_mark": 30,
-    "ov_michelle": 45,
-}
-
-if __name__ == "__main__":
-    print("=== Email ===")
-    notify_email(TEST_LISTING, TEST_TRAVEL)
-    print(" verstuurd (check je inbox)")