Compare commits
28 Commits
26d9d936f4
...
master
| Author | SHA1 | Date | |
|---|---|---|---|
| fc6f3ff809 | |||
| 1841412c93 | |||
| c6328cee46 | |||
| f74e9bcfb0 | |||
| 1011d9cf87 | |||
| 9149d11a06 | |||
| 6bb2538143 | |||
| 77f8e91f07 | |||
| 7096220203 | |||
| 75c5b6f26d | |||
| 6beae1133b | |||
| bfd69e3542 | |||
| d310a7a560 | |||
| c92ddb5812 | |||
| edd2580919 | |||
| 942170ef7f | |||
| 84e5656ca0 | |||
| e1745841b1 | |||
| fbe50790da | |||
| 423a429f56 | |||
| f1748214ce | |||
| 8450c33887 | |||
| b35025b9cb | |||
| 918042d27e | |||
| 18c01139c2 | |||
| 4f37a1dd37 | |||
| efd31686be | |||
| 17b35d1997 |
10
.env.example
@@ -1,13 +1,5 @@
|
|||||||
NAVITIA_API_KEY=
|
|
||||||
|
|
||||||
HA_WEBHOOK_URL=
|
HA_WEBHOOK_URL=
|
||||||
|
|
||||||
SMTP_HOST=
|
|
||||||
SMTP_PORT=587
|
|
||||||
SMTP_FROM=
|
|
||||||
SMTP_TO=
|
|
||||||
SMTP_USER=
|
|
||||||
SMTP_PASSWORD=
|
|
||||||
|
|
||||||
DB_PATH=/data/huizenbot.db
|
DB_PATH=/data/huizenbot.db
|
||||||
|
|
||||||
|
APP_ENV=dev
|
||||||
|
|||||||
1
.gitignore
vendored
@@ -5,3 +5,4 @@
|
|||||||
**/__pycache__/
|
**/__pycache__/
|
||||||
|
|
||||||
tests/cache/
|
tests/cache/
|
||||||
|
data/
|
||||||
|
|||||||
45
CLAUDE.md
Normal file
@@ -0,0 +1,45 @@
|
|||||||
|
# Huizenbot
|
||||||
|
|
||||||
|
## Goal
|
||||||
|
|
||||||
|
Periodically scrape real estate broker websites in Delft and Schiedam, store new listings in SQLite, and send push notifications via Home Assistant. Runs as a single Docker container on the homelab, scheduled with cron.
|
||||||
|
|
||||||
|
This is already running, so for now we are only working on extensions and improvements.
|
||||||
|
|
||||||
|
|
||||||
|
# AIDE - An IDE for your Agent
|
||||||
|
This project uses AIDE to support agents; it increases the robustness of edits and reduces token costs.
|
||||||
|
|
||||||
|
Run `aide help` for more information; it also works on each subcommand. If you do, record what you learned in agents.md so we don't have to spend tokens on it repeatedly.
|
||||||
|
|
||||||
|
|
||||||
|
## Using aide effectively
|
||||||
|
|
||||||
|
**Always start with aide for codebase exploration — not Read or Grep:**
|
||||||
|
- Use `aide outline <file>` first to get the function map of any file before reading it
|
||||||
|
- Use `aide source <file> <sym>` to read individual functions — never Read a whole large file just to find one function
|
||||||
|
- This is especially important for large files like `ssr.py` (84KB+) where Read truncates
|
||||||
|
|
||||||
|
**For edits:** `aide insert` is fragile with large inputs (see the `aide insert` row in the quick reference below), so fall back to the `Edit` tool for anything non-trivial. `aide replace` is fine for small targeted changes.
|
||||||
|
|
||||||
|
## What aide can do (quick reference)
|
||||||
|
|
||||||
|
| Command | What it replaces |
|
||||||
|
|---------|-----------------|
|
||||||
|
| `aide outline <file\|dir>` | `Read` whole file for structure; `ls` + loop |
|
||||||
|
| `aide source <file> <sym>` | `Read` whole file for one function |
|
||||||
|
| `aide callers <sym>` | `Grep` for call sites |
|
||||||
|
| `aide search <term>` | `Grep` across the project |
|
||||||
|
| `aide replace <file> <sym> <msg>` | `Edit` / `sed` for symbol-level changes |
|
||||||
|
| `aide replace … --lines N-M <msg>` | `Edit` for intra-function line edits |
|
||||||
|
| `aide remove <file> <sym>` | Manual splice to delete a symbol |
|
||||||
|
| `aide insert <file> <msg> --after <sym>` | Manual splice to add a new symbol — **insert one function at a time**; large messages cause bash to be killed |
|
||||||
|
| `aide rename <file> <old> <new>` | Manual find-and-replace of a name |
|
||||||
|
| `aide log` | Log related to the undo command; see which files changed in which order |
|
||||||
|
| `aide annotate <file> <sym>` | Persist a non-obvious invariant or gotcha for a symbol |
|
||||||
|
| `aide context <file> <sym>` | Read the stored annotation before editing |
|
||||||
|
| `aide review [path]` | Check for annotations invalidated by recent edits |
|
||||||
|
|
||||||
|
Line numbers in `--lines N-M` are **1-based and relative to the symbol's first
|
||||||
|
line** (line 1 is the signature / opening line of the symbol). This means they
|
||||||
|
are stable across edits elsewhere in the file.
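
The relative numbering comes down to simple arithmetic. The sketch below is only an illustration: `to_absolute` is a hypothetical helper, not part of aide, and it assumes you already know the file line on which the symbol starts.

```python
def to_absolute(symbol_start: int, rel_range: str) -> tuple[int, int]:
    """Convert a symbol-relative 'N-M' range to absolute file line numbers.

    symbol_start is the 1-based file line of the symbol's opening line.
    """
    first, last = (int(part) for part in rel_range.split("-"))
    if first < 1 or last < first:
        raise ValueError(f"invalid range: {rel_range!r}")
    # Relative line 1 is the symbol's own first line, hence the -1.
    return symbol_start + first - 1, symbol_start + last - 1


# A function starting at file line 120: relative lines 3-5 are file lines 122-124.
print(to_absolute(120, "3-5"))
# After 10 lines are inserted above the function, the same relative range still
# addresses the same statements; only the absolute numbers move.
print(to_absolute(130, "3-5"))
```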
|
||||||
407
add_scraper_context.md
Normal file
@@ -0,0 +1,407 @@
|
|||||||
|
# Huizenbot — Agent Context for Adding Routes
|
||||||
|
|
||||||
|
## Project Overview
|
||||||
|
|
||||||
|
**Huizenbot** is a periodic scraper of real estate broker websites in Delft and Schiedam (Netherlands). It:
|
||||||
|
- Fetches property listings from broker websites
|
||||||
|
- Saves new ones to SQLite with `RawListing` schema
|
||||||
|
- Calculates travel times (bike + public transit) to two work locations
|
||||||
|
- Sends push notifications via Home Assistant webhook (with email fallback)
|
||||||
|
|
||||||
|
**Your role:** You will add new broker routes (scrapers) to the `adapters/` directory. A human will:
|
||||||
|
1. Select a broker from the list
|
||||||
|
2. Help you investigate the broker's website
|
||||||
|
3. For API-based brokers: develop curl requests to test
|
||||||
|
4. For HTML scrapers: develop parsing logic using BeautifulSoup
|
||||||
|
5. Run `tests/test_adapters.py` to validate
|
||||||
|
6. Merge your code snippets into the codebase
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Schema: RawListing
|
||||||
|
|
||||||
|
**Location:** `src/huizenbot.py` (lines 29–52)
|
||||||
|
|
||||||
|
This is the data model you must populate. All fields except `url` are optional:
|
||||||
|
|
||||||
|
```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RawListing:
    url: str                                # REQUIRED: the listing URL

    source_makelaar: str = ""               # Name of the broker (e.g., "bjornd", "vdaal")
    datum_aanmelding: str | None = None     # ISO 8601 date if available
    status: str = "beschikbaar"             # enum: beschikbaar | onder_bod | verkocht

    # Location
    adres: str | None = None                # Street address (e.g., "Binnenwatersloot 3")
    postcode: str | None = None             # Dutch postcode (e.g., "2611CA")
    stad: str | None = None                 # City (e.g., "Delft")

    # Property details
    prijs: int | None = None                # Price in euros (integer, no float)
    woningtype: str | None = None           # Type (e.g., "appartement", "tussenwoning")
    woonoppervlak: int | None = None        # Living space in m²
    perceeloppervlak: int | None = None     # Plot size in m² (NULL for apartments)
    kamers: int | None = None               # Number of rooms
    slaapkamers: int | None = None          # Number of bedrooms
    bouwjaar: int | None = None             # Build year
    energielabel: str | None = None         # Energy label (e.g., "A", "B")

    # Media
    hero_image_url: str | None = None       # Main photo URL

    # Extra data (broker-specific fields)
    extra: dict[str, Any] = field(default_factory=dict)  # Arbitrary JSON data
```
|
||||||
|
|
||||||
|
**DB Upsert:** The listing is inserted on first run (with `id = sha256(url)`) and updated only on `last_seen` / `status` on subsequent runs. Travel times are calculated only on first insert.
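
The upsert behaviour described above can be sketched as follows. This is a minimal illustration, not the code in `src/huizenbot.py`: the `listings` table, its `first_seen` column, and the `listing_id` / `upsert` helpers are assumptions made for the example, and the `ON CONFLICT` clause needs SQLite 3.24 or newer.

```python
import hashlib
import sqlite3


def listing_id(url: str) -> str:
    """Stable primary key: hex SHA-256 digest of the listing URL."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def upsert(conn: sqlite3.Connection, url: str, status: str, now: str) -> bool:
    """Insert a new listing, or refresh only last_seen and status.

    Returns True on first insert, which is when travel times would be computed.
    """
    lid = listing_id(url)
    is_new = conn.execute(
        "SELECT 1 FROM listings WHERE id = ?", (lid,)
    ).fetchone() is None
    conn.execute(
        """
        INSERT INTO listings (id, url, status, first_seen, last_seen)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            last_seen = excluded.last_seen,
            status = excluded.status
        """,
        (lid, url, status, now, now),
    )
    return is_new


# Hypothetical schema, only for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (id TEXT PRIMARY KEY, url TEXT, status TEXT,"
    " first_seen TEXT, last_seen TEXT)"
)
first = upsert(conn, "https://example.com/woning/1", "beschikbaar", "2024-01-01")
second = upsert(conn, "https://example.com/woning/1", "onder_bod", "2024-01-02")
row = conn.execute("SELECT status, first_seen, last_seen FROM listings").fetchone()
print(first, second, row)
```

Because the key is derived from the URL alone, a broker that changes a listing's URL would produce a new row; that is a property of this sketch and may or may not match the real implementation.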
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Adapter Structure
|
||||||
|
|
||||||
|
Adapters live in `src/adapters/` and are organized by type:
|
||||||
|
|
||||||
|
### Two Adapter Types
|
||||||
|
|
||||||
|
#### 1. **API-based** (`src/adapters/api.py`)
|
||||||
|
For brokers with REST/JSON endpoints.
|
||||||
|
|
||||||
|
**Pattern:**
|
||||||
|
```python
def fetch_bjornd() -> list[RawListing]:
    data = fetch_json("https://...", params={...}, headers={...})
    listings = []
    for item in data:
        # Filter / validate
        if item.get("status") in _SKIP:
            continue
        # Treat a missing price as 0 so the comparison never sees None
        if (item.get("price") or 0) > config.MAX_PRICE:
            continue

        listings.append(RawListing(
            url=item["url"],
            source_makelaar="bjornd",
            adres=item.get("address"),
            postcode=item.get("zipcode"),
            # ... etc
        ))

    log.info("bjornd: %d listings", len(listings))
    return listings
```
|
||||||
|
|
||||||
|
**Helpers available:**
|
||||||
|
- `fetch_json(url, *, params=None, headers=None)` — GET with User-Agent, timeout, Retry-After handling
|
||||||
|
- Built-in logging via `log = logging.getLogger("huizenbot.api")`
|
||||||
|
|
||||||
|
#### 2. **SSR/HTML-based** (`src/adapters/ssr/` package)
|
||||||
|
For brokers with server-side rendered HTML. The package is split by CMS platform:
|
||||||
|
- `realworks.py` — Realworks CMS (li/div.aanbodEntry cards + span.kenmerk detail)
|
||||||
|
- `sure.py` — SURE WordPress plugin (/wonen?sure_koop_huur=koop + #kenmerken detail)
|
||||||
|
- `schiedam.py` — Custom Schiedam scrapers (diverse platforms)
|
||||||
|
- `denhaag.py` — Den Haag scrapers (diverse platforms)
|
||||||
|
- `overige.py` — Other / multi-city scrapers (OG Online WP, Elementor)
|
||||||
|
|
||||||
|
**Pattern:**
|
||||||
|
```python
def fetch_vdaal() -> list[RawListing]:
    soup = fetch_soup("https://vdaalmakelaardij.nl/aanbod")
    listings = []

    for card in soup.select(".property-card"):
        try:
            url = card.select_one("a[href]")["href"]
            if not url.startswith("http"):
                url = VDAAL_BASE + url

            adres = _text(card, ".address-selector")
            postcode = _extract_postcode(adres)
            prijs = parse_prijs(_text(card, ".price"))

            listings.append(RawListing(
                url=url,
                source_makelaar="vdaal",
                adres=adres,
                postcode=postcode,
                stad=_infer_stad(postcode),
                prijs=prijs,
                # ... etc
            ))
        except Exception as e:
            log.warning("Parse error: %s", e)

    log.info("vdaal: %d listings", len(listings))
    return listings
```
|
||||||
|
|
||||||
|
**Helpers available:**
|
||||||
|
- `fetch_soup(url, *, params=None)` — GET with BeautifulSoup, Retry-After handling
|
||||||
|
- `parse_prijs(text)` — Extract price from strings like "€ 325.000 k.k." → 325000
|
||||||
|
- `parse_m2(text)` — Extract area from "87 m²" → 87
|
||||||
|
- `_text(soup, selector)` — Get inner text from element
|
||||||
|
- `_src(soup, selector)` — Get src or data-src attribute
|
||||||
|
- `_extract_postcode(text)` — Regex postcode from any text
|
||||||
|
- `_infer_stad(postcode)` — Simple lookup: 2600–2629 → Delft, 3100–3135 → Schiedam (Den Haag not in this helper; use the city value from the broker directly)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Registration
|
||||||
|
|
||||||
|
**API scrapers** (`src/adapters/api.py`): Add your function and register in the `SCRAPERS` dict at the bottom of the file.
|
||||||
|
|
||||||
|
**SSR scrapers**: Add your function to the appropriate submodule (`realworks.py`, `sure.py`, `schiedam.py`, `denhaag.py`, or `overige.py`), then import it in `src/adapters/ssr/__init__.py` and add it to the `SCRAPERS` dict there.
|
||||||
|
|
||||||
|
```python
# api.py: SCRAPERS dict
SCRAPERS = {
    'bjornd': fetch_bjornd,
    'your_broker': fetch_your_broker,  # ← Add here
}

# ssr/__init__.py: import + register
from .realworks import fetch_your_broker  # ← import from the right submodule

SCRAPERS = {
    # ... existing entries
    'your_broker': fetch_your_broker,  # ← Add here
}
```
|
||||||
|
|
||||||
|
The `src/adapters/__init__.py` merges both dicts, so the runner picks up all registered adapters automatically.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Testing Workflow
|
||||||
|
|
||||||
|
### 1. Understand the Website
|
||||||
|
The human will help you:
|
||||||
|
- Identify the broker's API endpoint (or the HTML structure)
|
||||||
|
- Check for a `robots.txt` or rate limit headers
|
||||||
|
- Write exploratory curl requests (for APIs) or BeautifulSoup inspections
|
||||||
|
|
||||||
|
### 2. Develop & Test Locally
|
||||||
|
- Add your scraper function to the appropriate file (`api.py` or the right `ssr/` submodule)
|
||||||
|
- Register it in the `SCRAPERS` dict
|
||||||
|
- The human updates `tests/test_adapters.py` to point to your adapter:
|
||||||
|
```python
|
||||||
|
ADAPTER = SCRAPERS['your_broker_name']
|
||||||
|
```
|
||||||
|
- Run the test:
|
||||||
|
```bash
|
||||||
|
cd tests && python test_adapters.py
|
||||||
|
```
|
||||||
|
- The test prints listings in a simple format so you can validate output
|
||||||
|
|
||||||
|
### 3. Merge Code
|
||||||
|
Once validated, the human will **copy your inline code snippets** into the main codebase. You produce **easily pasteable functions**, not entire files.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Config & Constants
|
||||||
|
|
||||||
|
**Location:** `src/config.py`
|
||||||
|
|
||||||
|
Key values you may reference:
|
||||||
|
- `MAX_PRICE = 300_000` — Price filter (your scraper can skip listings above this)
|
||||||
|
- `USER_AGENT = "Huizenbot/1.0 (+mark@kalsbeek.dev) persoonlijk gebruik"` — Used in all HTTP headers
|
||||||
|
- `MARK_WERK_POSTCODE`, `MICHELLE_WERK_POSTCODE` — Work postcodes for travel time calculation
|
||||||
|
|
||||||
|
Secrets (API keys, webhook URLs) are **environment variables**, not in config.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Platform / CMS Quick Identification
|
||||||
|
|
||||||
|
Before investigating a broker's HTML manually, check for known platforms in this order:
|
||||||
|
|
||||||
|
### 1. OG Online / realtime-listings (API — fastest)
|
||||||
|
**File:** `src/adapters/api.py`
|
||||||
|
|
||||||
|
Check if `https://<base>/nl/realtime-listings/consumer` returns JSON (with header `X-Requested-With: XMLHttpRequest`). If yes, this is a 10-line addition to `api.py`. Known brokers: bjornd, moerman, vandaal, elzenaar, doen.
|
||||||
|
|
||||||
|
Fields: `isSales`, `statusOrig`, `salesPrice`, `address`, `zipcode`, `city`, `rooms`, `bedrooms`, `livingSurface`, `plotSurface`, `dateOfConstruction`, `energyLabel`, `type`, `photo`, `url`.
|
||||||
|
|
||||||
|
Add a `_CITIES` set to filter by city if the broker covers a wide area. Skip statuses `"rented"` and `"rented_ur"`.
|
||||||
|
|
||||||
|
### 2. Realworks CMS (SSR — one liner)
|
||||||
|
**File:** `src/adapters/ssr/realworks.py`
|
||||||
|
|
||||||
|
Run `autoscraper.py` or check HTML for `li.aanbodEntry`. If detected:
|
||||||
|
```python
def fetch_mybroker() -> list[RawListing]:
    return fetch_realworks("https://www.mybroker.nl", "mybroker")
```
|
||||||
|
|
||||||
|
### 3. SURE WordPress Plugin (SSR — ~50 lines)
|
||||||
|
**File:** `src/adapters/ssr/sure.py`
|
||||||
|
|
||||||
|
Check HTML for `sure-` CSS classes or `?sure_koop_huur=koop` filter. Two card variants:
|
||||||
|
- `a.card-house` (single dash) — e.g. Olsthoorn
|
||||||
|
- `a.card--house` (double dash) — e.g. Borgdorff
|
||||||
|
|
||||||
|
Both use `?sure_koop_huur=koop` to filter buy listings and `/page/{N}/` pagination. Detail page always has `#kenmerken li span span` pairs with labels like `status`, `soort woonhuis`/`soort woning`/`soort bouw`, `bouwjaar`, `gebruiksoppervlakte wonen`, `perceeloppervlakte`, `aantal slaapkamers`, `energielabel`. Postcode is often **not** available on the detail page.
|
||||||
|
|
||||||
|
Terminate pagination when `len(cards) < expected_per_page` (typically 15 for SURE).
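
The termination rule can be sketched independently of any particular site. In the example below, `collect_cards` and the `fetch_page` callback are hypothetical names; a real adapter would fetch `/page/{N}/` with `fetch_soup` and select the cards, whereas here a fake fetcher stands in so the loop can run offline. The `max_pages` cap is an added safety net, not something the text above prescribes.

```python
from typing import Callable


def collect_cards(
    fetch_page: Callable[[int], list[str]],
    expected_per_page: int = 15,
    max_pages: int = 50,
) -> list[str]:
    """Fetch pages 1..N and stop after the first page that comes back short."""
    cards: list[str] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        cards.extend(batch)
        if len(batch) < expected_per_page:
            break  # a short (or empty) page means this was the last one
    return cards


# Fake site: two full pages of 15 cards and a final page with 4.
pages = {1: ["a"] * 15, 2: ["b"] * 15, 3: ["c"] * 4}
calls: list[int] = []


def fake_fetch(page: int) -> list[str]:
    calls.append(page)
    return pages.get(page, [])


result = collect_cards(fake_fetch)
print(len(result), calls)
```

When the total happens to be an exact multiple of the page size, this approach costs one extra request that returns an empty page before the loop stops.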
|
||||||
|
|
||||||
|
### 4. Unknown CMS
|
||||||
|
**File:** `src/adapters/ssr/schiedam.py`, `denhaag.py`, or `overige.py` depending on city — or add a new file if needed.
|
||||||
|
|
||||||
|
Run the autoscraper tool:
|
||||||
|
```bash
|
||||||
|
python autoscraper.py listings <listings-url>
|
||||||
|
python autoscraper.py details <detail-page-url>
|
||||||
|
```
|
||||||
|
It prints structural diagnostics (card selectors, field patterns, pagination) to guide manual adapter development.
|
||||||
|
|
||||||
|
## Important Notes
|
||||||
|
|
||||||
|
Don't treat detail pages as optional; we always want all the available information.
|
||||||
|
|
||||||
|
### Status Mapping
|
||||||
|
Brokers use different status strings. Always map to one of:
|
||||||
|
- `"beschikbaar"` — Available for sale
|
||||||
|
- `"onder_bod"` — Under offer
|
||||||
|
- `"verkocht"` — Sold
|
||||||
|
|
||||||
|
Example from api.py:
|
||||||
|
```python
_STATUS_MAP = {
    "available": "beschikbaar",
    "under_bid": "onder_bod",
    "sold": "verkocht",
}
status = _STATUS_MAP.get(item.get("status"), "beschikbaar")
```
|
||||||
|
|
||||||
|
### Postcode Extraction
|
||||||
|
Always aim for the **Dutch postcode format** (4 digits + 2 letters, e.g., `"2611CA"`). The travel time calculation depends on it. If a broker only provides the address string, use `_extract_postcode(address)`.
|
||||||
|
|
||||||
|
If a postcode field contains extra text (e.g., `"2522GW Den Haag"`), extract cleanly with:
|
||||||
|
```python
|
||||||
|
m = re.search(r"\d{4}\s*[A-Z]{2}", raw.upper())
|
||||||
|
postcode = m.group(0).replace(" ", "") if m else None
|
||||||
|
```
|
||||||
|
Never just `.replace(" ", "")` — that produces garbage like `"2522GWDenHaag"`.
|
||||||
|
|
||||||
|
### Price Handling
|
||||||
|
Prices are **integers** (euros), never floats. Use `parse_prijs()` for HTML.
|
||||||
|
|
||||||
|
### Image URLs
|
||||||
|
Store the hero/main image URL in `hero_image_url`. This appears in Home Assistant notifications.
|
||||||
|
|
||||||
|
### Extra Data
|
||||||
|
If a broker provides extra fields that don't fit the schema (e.g., balcony, garden, orientation), store them in the `extra` dict:
|
||||||
|
```python
listings.append(RawListing(
    url=...,
    # ... other fields
    extra={
        "balcony": item.get("has_balcony"),
        "garden": item.get("has_garden"),
        "custom_field": item.get("something_else"),
    },
))
```
|
||||||
|
|
||||||
|
The database stores this as JSON in the `extra` column.
|
||||||
|
|
||||||
|
### Error Handling
|
||||||
|
- Wrap individual listing parsing in try/except to continue on one bad listing
|
||||||
|
- Log parse warnings, not errors (brokers' HTML changes)
|
||||||
|
- Let HTTP errors bubble up (the runner catches them at the adapter level)
|
||||||
|
|
||||||
|
### Rate Limiting & Ethics
|
||||||
|
- Both `fetch_json()` and `fetch_soup()` handle 429 Retry-After automatically
|
||||||
|
- Nominatim (geocoding) has a 1 req/s limiter built into `huizenbot.py`
|
||||||
|
- Never spawn parallel requests without the human's approval
|
||||||
|
- Always use the `USER_AGENT` header (includes contact info for respectful scraping)
|
||||||
|
- Don't keep curling the same endpoint; save the response to `<broker name>.dump` and then `rg` through it to find what you need. You can also pipe it through `bsprettify.py` first and `rg` the prettified output.
|
||||||
|
- Don't over-investigate pagination — confirm card count on page 1, assume it's consistent across pages, move on. Never fetch multiple pages just to verify the per-page count.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Example: Adding "Van Daal" (API-based)
|
||||||
|
|
||||||
|
### Scenario
|
||||||
|
The human finds that Van Daal (vandaalmakelaardij.nl) has a JSON API at:
|
||||||
|
```
|
||||||
|
https://api.vandaal.nl/listings?city=delft&status=available
|
||||||
|
```
|
||||||
|
|
||||||
|
### Your Code (add to api.py)
|
||||||
|
|
||||||
|
```python
# Van Daal
# --------
_VANDAAL_BASE = "https://www.vandaalmakelaardij.nl"
_VANDAAL_API = "https://api.vandaal.nl/listings"

_VANDAAL_STATUS_MAP = {
    "available": "beschikbaar",
    "under_offer": "onder_bod",
    "sold": "verkocht",
}

def fetch_vandaal() -> list[RawListing]:
    listings = []
    for city in ["delft", "schiedam"]:
        data = fetch_json(
            _VANDAAL_API,
            params={"city": city, "status": "available"},
        )

        for item in data.get("listings", []):
            # `or 0` also covers an explicit null price in the JSON
            if (item.get("price") or 0) > config.MAX_PRICE:
                continue

            listings.append(RawListing(
                url=item["url"],
                source_makelaar="vandaal",
                # Assumes each item carries a "status" field; defaults to available
                status=_VANDAAL_STATUS_MAP.get(item.get("status"), "beschikbaar"),
                adres=item.get("address"),
                postcode=item.get("postcode"),
                stad=item.get("city"),
                prijs=item.get("price"),
                woningtype=item.get("type"),
                woonoppervlak=item.get("living_area"),
                slaapkamers=item.get("bedrooms"),
                hero_image_url=item.get("image_url"),
            ))

    log.info("vandaal: %d listings", len(listings))
    return listings
```
|
||||||
|
|
||||||
|
### Register in SCRAPERS (in api.py)
|
||||||
|
```python
|
||||||
|
SCRAPERS = {
|
||||||
|
'bjornd': fetch_bjornd,
|
||||||
|
'vandaal': fetch_vandaal, # ← Add this
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Test
|
||||||
|
Human updates `test_adapters.py`:
|
||||||
|
```python
|
||||||
|
ADAPTER = SCRAPERS['vandaal']
|
||||||
|
```
|
||||||
|
|
||||||
|
Then runs:
|
||||||
|
```bash
|
||||||
|
cd tests && python test_adapters.py
|
||||||
|
```
|
||||||
|
|
||||||
|
If all looks good, the human copies the `fetch_vandaal()` function into the real `api.py` and adds it to `SCRAPERS`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
1. **You receive** an adapter request + investigation results (API endpoint or HTML structure)
|
||||||
|
2. **You write** a clean, self-contained scraper function that returns `list[RawListing]`
|
||||||
|
3. **You register** it in the appropriate `SCRAPERS` dict
|
||||||
|
4. **The human tests** it with `test_adapters.py` and validates output
|
||||||
|
5. **The human merges** your code into the production files
|
||||||
|
|
||||||
|
Keep code simple, use the provided helpers, populate `RawListing` fields as best you can, and always set `source_makelaar` and `url` correctly.
|
||||||
316
autoscraper.py
Normal file
@@ -0,0 +1,316 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
autoscraper.py — detect CMS and extract patterns from broker pages
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python autoscraper.py listings <url> — detect CMS + card structure
|
||||||
|
python autoscraper.py details <url> — detect CMS + kenmerk patterns
|
||||||
|
"""
|
||||||
|
|
||||||
|
import re
|
||||||
|
import sys
|
||||||
|
import json
|
||||||
|
|
||||||
|
import httpx
|
||||||
|
from bs4 import BeautifulSoup, Tag
|
||||||
|
|
||||||
|
UA = "Huizenbot/1.0 (+mark@kalsbeek.dev) persoonlijk gebruik"
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# CMS fingerprints
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
# Each entry: (name, listings_signal, details_signal, adapter_hint)
|
||||||
|
# signals are (selector, min_count) tuples — all must match
|
||||||
|
CMS_FINGERPRINTS = [
|
||||||
|
{
|
||||||
|
"name": "Realworks",
|
||||||
|
"listings": [("li.aanbodEntry", 1), ("span.kenmerkValue", 1)],
|
||||||
|
"details": [("span.kenmerkName", 3), ("span.kenmerkValue", 3)],
|
||||||
|
"hint": "fetch_realworks('{base_url}', '{makelaar}')",
|
||||||
|
},
|
||||||
|
]
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Candidate card selectors (tried in order for unknown CMS)
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
CARD_CANDIDATES = [
|
||||||
|
"li.aanbodEntry",
|
||||||
|
"article",
|
||||||
|
"li[class*=object]",
|
||||||
|
"li[class*=woning]",
|
||||||
|
"li[class*=listing]",
|
||||||
|
"div[class*=object-item]",
|
||||||
|
"div[class*=property-item]",
|
||||||
|
"div[class*=aanbod]",
|
||||||
|
".listing-item",
|
||||||
|
]
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Regex patterns for field detection
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
RE_POSTCODE = re.compile(r"\b\d{4}\s?[A-Z]{2}\b")
|
||||||
|
RE_PRICE = re.compile(r"€\s*[\d.,]+")
|
||||||
|
RE_M2 = re.compile(r"\d+\s*m[²2]")
|
||||||
|
RE_PAGE_URL = re.compile(r"pagina[-/]?\d+|[?&]p(?:age)?=\d+|/\d+/?$")
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Helpers
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def fetch(url: str) -> BeautifulSoup:
|
||||||
|
r = httpx.get(url, headers={"User-Agent": UA}, timeout=15, follow_redirects=True)
|
||||||
|
r.raise_for_status()
|
||||||
|
return BeautifulSoup(r.text, "html.parser")
|
||||||
|
|
||||||
|
|
||||||
|
def _selector_path(el: Tag) -> str:
|
||||||
|
"""Short CSS-like path for an element: tag.class1.class2"""
|
||||||
|
parts = []
|
||||||
|
for ancestor in reversed(list(el.parents)):
|
||||||
|
if ancestor.name in (None, "[document]", "html", "body"):
|
||||||
|
continue
|
||||||
|
cls = ".".join(ancestor.get("class", []))
|
||||||
|
parts.append(f"{ancestor.name}.{cls}" if cls else ancestor.name)
|
||||||
|
if len(parts) >= 3:
|
||||||
|
break
|
||||||
|
cls = ".".join(el.get("class", []))
|
||||||
|
parts.append(f"{el.name}.{cls}" if cls else el.name)
|
||||||
|
return " > ".join(parts[-3:])
|
||||||
|
|
||||||
|
|
||||||
|
def _detect_cms(soup: BeautifulSoup, mode: str) -> dict | None:
|
||||||
|
key = "listings" if mode == "listings" else "details"
|
||||||
|
for cms in CMS_FINGERPRINTS:
|
||||||
|
if all(len(soup.select(sel)) >= n for sel, n in cms[key]):
|
||||||
|
return cms
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _find_cards(soup: BeautifulSoup) -> tuple[list, str | None]:
|
||||||
|
for sel in CARD_CANDIDATES:
|
||||||
|
found = soup.select(sel)
|
||||||
|
if len(found) >= 2:
|
||||||
|
return found, sel
|
||||||
|
# fallback: find the most repeated element class
|
||||||
|
from collections import Counter
|
||||||
|
class_counts: Counter = Counter()
|
||||||
|
for el in soup.find_all(True):
|
||||||
|
cls = tuple(el.get("class", []))
|
||||||
|
if cls:
|
||||||
|
class_counts[cls] += 1
|
||||||
|
if class_counts:
|
||||||
|
top_cls, count = class_counts.most_common(1)[0]
|
||||||
|
if count >= 2:
|
||||||
|
sel = "." + ".".join(top_cls)
|
||||||
|
return soup.select(sel), f"{sel} (auto-detected, count={count})"
|
||||||
|
return [], None
|
||||||
|
|
||||||
|
|
||||||
|
def _pattern_hits(soup: BeautifulSoup, pattern: re.Pattern, label: str):
|
||||||
|
hits = []
|
||||||
|
for el in soup.find_all(string=pattern):
|
||||||
|
parent = el.parent
|
||||||
|
if parent:
|
||||||
|
hits.append((parent.get_text(strip=True)[:80], _selector_path(parent)))
|
||||||
|
if hits:
|
||||||
|
print(f"\n [{label}] — {len(hits)} hit(s)")
|
||||||
|
for text, path in hits[:4]:
|
||||||
|
print(f" {path}")
|
||||||
|
print(f" → {text!r}")
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Commands
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def cmd_listings(url: str):
|
||||||
|
print(f"Fetching: {url}\n")
|
||||||
|
soup = fetch(url)
|
||||||
|
base_url = "/".join(url.split("/")[:3])
|
||||||
|
|
||||||
|
cms = _detect_cms(soup, "listings")
|
||||||
|
|
||||||
|
if cms:
|
||||||
|
print(f"✓ CMS DETECTED: {cms['name']}")
|
||||||
|
hint = cms["hint"].format(base_url=base_url, makelaar="<name>")
|
||||||
|
print(f"\n Add to src/adapters/ssr/realworks.py:\n")
|
||||||
|
print(f" def fetch_<name>() -> list[RawListing]:")
|
||||||
|
print(f" return {hint}\n")
|
||||||
|
print(f" Register in SCRAPERS dict:")
|
||||||
|
print(f" '<name>': fetch_<name>,")
|
||||||
|
return
|
||||||
|
|
||||||
|
print("✗ CMS unknown — structural diagnostics:\n")
|
||||||
|
|
||||||
|
# Cards
|
||||||
|
cards, matched_sel = _find_cards(soup)
|
||||||
|
print(f"=== CARDS ({matched_sel or 'none found'}: {len(cards)}) ===")
|
||||||
|
if cards:
|
||||||
|
print("\n--- FIRST CARD ---")
|
||||||
|
print(cards[0].prettify()[:2500])
|
||||||
|
print("\n--- CHILD ELEMENTS & CLASSES ---")
|
||||||
|
for el in cards[0].find_all(True):
|
||||||
|
cls = el.get("class")
|
||||||
|
text = el.get_text(strip=True)[:50]
|
||||||
|
if cls:
|
||||||
|
print(f" <{el.name}> .{' .'.join(cls)} {text!r}")
|
||||||
|
|
||||||
|
# Pattern hits in cards area (or full page if no cards)
|
||||||
|
search_area = cards[0] if cards else soup
|
||||||
|
print("\n=== FIELD PATTERNS ===")
|
||||||
|
_pattern_hits(search_area, RE_POSTCODE, "postcode")
|
||||||
|
_pattern_hits(search_area, RE_PRICE, "prijs")
|
||||||
|
_pattern_hits(search_area, RE_M2, "m²")
|
||||||
|
|
||||||
|
# Pagination
|
||||||
|
print("\n=== PAGINATION ===")
|
||||||
|
page_links = soup.find_all("a", href=RE_PAGE_URL)
|
||||||
|
if page_links:
|
||||||
|
seen = set()
|
||||||
|
for a in page_links:
|
||||||
|
href = a.get("href", "")
|
||||||
|
if href not in seen:
|
||||||
|
seen.add(href)
|
||||||
|
print(f" {href!r} — {a.get_text(strip=True)!r}")
|
||||||
|
else:
|
||||||
|
print(" No pagination links found")
|
||||||
|
|
||||||
|
|
||||||
|
def cmd_details(url: str):
|
||||||
|
print(f"Fetching: {url}\n")
|
||||||
|
soup = fetch(url)
|
||||||
|
|
||||||
|
cms = _detect_cms(soup, "details")
|
||||||
|
|
||||||
|
if cms:
|
||||||
|
print(f"✓ CMS DETECTED: {cms['name']}")
|
||||||
|
print("\n _realworks_detail() will extract:")
|
||||||
|
kv: dict[str, str] = {}
|
||||||
|
for kenmerk in soup.select("span.kenmerk"):
|
||||||
|
label_el = kenmerk.select_one("span.kenmerkName")
|
||||||
|
value_el = kenmerk.select_one("span.kenmerkValue")
|
||||||
|
if label_el and value_el:
|
||||||
|
label = label_el.get_text(strip=True).lower()
|
||||||
|
value = value_el.get_text(strip=True)
|
||||||
|
kv[label] = value
|
||||||
|
|
||||||
|
target_fields = {
|
||||||
|
"type woning": "woningtype",
|
||||||
|
"bouwjaar": "bouwjaar",
|
||||||
|
"woonoppervlakte": "woonoppervlak",
|
||||||
|
"perceeloppervlakte": "perceeloppervlak",
|
||||||
|
"aantal kamers": "kamers",
|
||||||
|
"aantal slaapkamers": "slaapkamers",
|
||||||
|
"energieklasse": "energielabel",
|
||||||
|
}
|
||||||
|
for key, field in target_fields.items():
|
||||||
|
val = kv.get(key, "NOT FOUND")
|
||||||
|
status = "✓" if key in kv else "✗"
|
||||||
|
print(f" {status} {field:<20} ← {key!r}: {val!r}")
|
||||||
|
return
|
||||||
|
|
||||||
|
print("✗ CMS unknown — structural diagnostics:\n")
|
||||||
|
|
||||||
|
# Address
|
||||||
|
print("=== ADDRESS ===")
|
||||||
|
for tag in ["h1", "h2"]:
|
||||||
|
for el in soup.select(tag):
|
||||||
|
t = el.get_text(strip=True)
|
||||||
|
if t:
|
||||||
|
print(f" <{tag}> {t!r}")
|
||||||
|
|
||||||
|
# Key-value patterns
|
||||||
|
print("\n=== KEY-VALUE STRUCTURES ===")
|
||||||
|
kv_selectors = [
|
||||||
|
("dl", "dt", "dd"),
|
||||||
|
("table", "th", "td"),
|
||||||
|
(".kenmerk", ".kenmerkName", ".kenmerkValue"),
|
||||||
|
(".spec", ".spec-label", ".spec-value"),
|
||||||
|
(".feature", ".feature-label", ".feature-value"),
|
||||||
|
]
|
||||||
|
found_any = False
|
||||||
|
for container_sel, label_sel, value_sel in kv_selectors:
|
||||||
|
pairs = []
|
||||||
|
for container in soup.select(container_sel)[:50]:
|
||||||
|
label_el = container.select_one(label_sel)
|
||||||
|
value_el = container.select_one(value_sel)
|
||||||
|
if label_el and value_el:
|
||||||
|
l = label_el.get_text(strip=True)
|
||||||
|
v = value_el.get_text(strip=True)
|
||||||
|
if l and v:
|
||||||
|
pairs.append((l, v))
|
||||||
|
if pairs:
|
||||||
|
found_any = True
|
||||||
|
print(f"\n [{container_sel} > {label_sel} / {value_sel}] — {len(pairs)} pairs")
|
||||||
|
for l, v in pairs[:10]:
|
||||||
|
print(f" {l:<30} {v}")
|
||||||
|
|
||||||
|
if not found_any:
|
||||||
|
print(" No key-value structures detected")
|
||||||
|
|
||||||
|
# Field pattern hits
|
||||||
|
print("\n=== FIELD PATTERNS ===")
|
||||||
|
_pattern_hits(soup, RE_POSTCODE, "postcode")
|
||||||
|
_pattern_hits(soup, RE_PRICE, "prijs")
|
||||||
|
_pattern_hits(soup, RE_M2, "m²")
|
||||||
|
|
||||||
|
# Images
|
||||||
|
print("\n=== IMAGES (first 5) ===")
|
||||||
|
for img in soup.select("img")[:5]:
|
||||||
|
src = img.get("src") or img.get("data-src")
|
||||||
|
alt = img.get("alt", "")
|
||||||
|
print(f" {src} [{alt}]")
|
||||||
|
|
||||||
|
# JSON-LD
|
||||||
|
print("\n=== JSON-LD (schema.org) ===")
|
||||||
|
for tag in soup.select('script[type="application/ld+json"]'):
|
||||||
|
try:
|
||||||
|
ld = json.loads(tag.string)
|
||||||
|
offered = ld.get("itemOffered", {})
|
||||||
|
address = offered.get("address", {})
|
||||||
|
floor_size = offered.get("floorSize", {})
|
||||||
|
fields = {
|
||||||
|
"woningtype": offered.get("@type"),
|
||||||
|
"adres": address.get("streetAddress"),
|
||||||
|
"postcode": address.get("postalCode"),
|
||||||
|
"stad": address.get("addressLocality"),
|
||||||
|
"prijs": ld.get("price"),
|
||||||
|
"woonoppervlak": floor_size.get("value"),
|
||||||
|
"kamers": offered.get("numberOfRooms"),
|
||||||
|
"bouwjaar": offered.get("yearBuilt"),
|
||||||
|
"availability": ld.get("availability"),
|
||||||
|
"image": ld.get("image"),
|
||||||
|
}
|
||||||
|
for k, v in fields.items():
|
||||||
|
mark = "✓" if v is not None else "✗"
|
||||||
|
print(f" {mark} {k:<16} {v!r}")
|
||||||
|
except Exception as e:
|
||||||
|
print(f" parse fout: {e}")
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Entry point
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
def main():
|
||||||
|
if len(sys.argv) < 3:
|
||||||
|
print(__doc__)
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
cmd = sys.argv[1]
|
||||||
|
url = sys.argv[2]
|
||||||
|
|
||||||
|
if cmd == "listings":
|
||||||
|
cmd_listings(url)
|
||||||
|
elif cmd == "details":
|
||||||
|
cmd_details(url)
|
||||||
|
else:
|
||||||
|
print(f"Unknown command: {cmd}")
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
3
bsprettify.py
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
import sys
|
||||||
|
from bs4 import BeautifulSoup
|
||||||
|
print(BeautifulSoup(sys.stdin.read(), 'html.parser').prettify())
|
||||||
108
makelaars.md
@@ -1,38 +1,82 @@
|
|||||||
# Selling estate agents Delft & Schiedam
|
# Selling estate agents Delft, Leiden, Den Haag & Schiedam
|
||||||
|
|
||||||
|
## TODO
|
||||||
|
|
||||||
|
- ~~**API scrapers need detail page enrichment**: OG Online API (bjornd, moerman, vandaal, elzenaar, doen, vandriel) sometimes omits fields like `energyLabel`. We should fetch the detail page for each listing and merge in missing fields (especially energielabel, bouwjaar). This is already done for SSR scrapers; needs to be added to API-based ones.~~ ✅ Done — `_og_detail()` added to `api.py`
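
The merge described in this TODO can be sketched as follows. This is a minimal illustration, not the code in `api.py`: the `Listing` dataclass and `merge_missing()` are hypothetical names, and the only assumption taken from the TODO is that API values win and the detail page only fills gaps.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class Listing:
    """Hypothetical stand-in for the real listing type; only three fields shown."""
    url: str
    energielabel: Optional[str] = None
    bouwjaar: Optional[int] = None


def merge_missing(listing: Listing, detail: dict) -> Listing:
    """Fill only the fields the API left empty; values from the API always win."""
    updates = {
        f.name: detail[f.name]
        for f in fields(listing)
        if getattr(listing, f.name) is None and detail.get(f.name) is not None
    }
    return replace(listing, **updates)


# The API supplied bouwjaar but no energielabel; the detail page has both.
api_item = Listing(url="https://example.invalid/woning/1", bouwjaar=1930)
detail = {"energielabel": "C", "bouwjaar": 1931}
merged = merge_missing(api_item, detail)
```

Here `merged` keeps the API's `bouwjaar` and takes only `energielabel` from the detail page, so a sloppy detail-page parse cannot overwrite data the API already delivered.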
|
||||||
|
|
||||||
## Delft
|
## Delft
|
||||||
|
|
||||||
| Name | Website | Address |
|
| Done | Name | Website | Address |
|
||||||
|------|---------|-------|
|
|------|------|---------|-------|
|
||||||
| Van Silfhout & Hogetoorn Wereldmakelaars | vansilfhout.nl | Ireneboulevard 2 |
|
| [x] | Van Silfhout & Hogetoorn Wereldmakelaars | vansilfhout.nl | Ireneboulevard 2 |
|
||||||
| Van Daal Makelaardij | vandaalmakelaardij.nl | Voldersgracht 33 |
|
| [x] | Van Daal Makelaardij | vandaalmakelaardij.nl | Voldersgracht 33 |
|
||||||
| Björnd Makelaardij | bjornd.nl | Oude Delft 103 |
|
| [x] | Björnd Makelaardij | bjornd.nl | Oude Delft 103 |
|
||||||
| Hof van Delft Makelaardij | hofvandelftmakelaardij.nl | Wateringsevest 26 |
|
| [ ] | Hof van Delft Makelaardij | hofvandelftmakelaardij.nl | Wateringsevest 26 |
|
||||||
| V&W Makelaars Delft | vwmakelaars.nl | Coenderstraat 31 |
|
| [x] | V&W Makelaars Delft | vwmakelaars.nl | Coenderstraat 31 |
|
||||||
| Roepman Makelaardij NVM | roepman.nl | Molslaan 43 |
|
| [x] | Roepman Makelaardij NVM | roepman.nl | Molslaan 43 |
|
||||||
| ZO makelaars | zomakelaars.nl | Van Foreestweg 4 |
|
| [x] | ZO makelaars | zomakelaars.nl | Van Foreestweg 4 |
|
||||||
| Marloes Makelaars | — | Maerten Trompstraat 28 |
|
| [ ] | Marloes Makelaars | — | Maerten Trompstraat 28 |
|
||||||
| Makelaarskantoor J.E. Mouthaan | — | Julianalaan 43 |
|
| [ ] | Makelaarskantoor J.E. Mouthaan | — | Julianalaan 43 |
|
||||||
| Olsthoorn Makelaars Delft | olsthoornmakelaars.nl | Noordeinde 51 |
|
| [x] | Olsthoorn Makelaars Delft | olsthoornmakelaars.nl | Noordeinde 51 |
|
||||||
| Post Makelaardij (formerly Bayense) | postmakelaardij.nl | Spoorsingel 1a |
|
| [x] | Post Makelaardij (formerly Bayense) | postmakelaardij.nl | Spoorsingel 1a |
|
||||||
| Morris NVM Makelaars | morrismakelaardij.nl | — |
|
| [x] | Morris NVM Makelaars | morrismakelaardij.nl | — |
|
||||||
| Prinsenstad Makelaardij | — | — |
|
| [ ] | Prinsenstad Makelaardij | — | — |
|
||||||
| Oude Delft Makelaardij | — | — |
|
| [ ] | Oude Delft Makelaardij | — | — |
|
||||||
| Dijksman Woningmakelaars | — | — |
|
| [ ] | Dijksman Woningmakelaars | — | — |
|
||||||
| CORPOwonen | — | — |
|
| [ ] | CORPOwonen | — | — |
|
||||||
|
| [ ] | Bergklis Makelaars | bergklis.nl | — |
|
||||||
|
| [ ] | Van Gulden Makelaardij | vanguldenmakelaardij.nl | Zaïrestraat 1 |
|
||||||
|
| [ ] | Van der Togt Makelaardij | vdtmakelaardij.nl | — (Voorburg, active in Delft) |
|
||||||
|
| [x] | Van Oord Makelaardij | vanoordmakelaardij.nl | — (Delft + Schiedam) |
|
||||||
|
|
||||||
|
|
||||||
## Schiedam
|
## Schiedam
|
||||||
|
|
||||||
| Name | Website | Address |
|
| Done | Name | Website | Address |
|
||||||
|------|---------|-------|
|
|------|------|---------|-------|
|
||||||
| Anke Bodewes Makelaardij | ankebodewes.nl | Hargplein 118 |
|
| [x] | Anke Bodewes Makelaardij | ankebodewes.nl | Hargplein 118 |
|
||||||
| Woongoed Makelaars Schiedam | woongoedmakelaars.nl | Oranjestraat 93 |
|
| [x] | Woongoed Makelaars Schiedam | woongoedmakelaars.nl | Oranjestraat 93 |
|
||||||
| Ooms Makelaars Schiedam | ooms.com | Gerrit Verboonstraat 2 |
|
| [x] | Ooms Makelaars Schiedam | ooms.com | Gerrit Verboonstraat 2 |
|
||||||
| De Witte Garantiemakelaars | dewittegarantiemakelaars.nl | Philippusweg 2 |
|
| [x] | De Witte Garantiemakelaars | dewittegarantiemakelaars.nl | Philippusweg 2 |
|
||||||
| Makelaardij Wassenaar | makelaardijwassenaar.nl | Gerrit Verboonstraat 12 |
|
| [x] | Makelaardij Wassenaar | makelaardijwassenaar.nl | Gerrit Verboonstraat 12 |
|
||||||
| 3D Makelaars | 3dmakelaars.nl | Gerrit Verboonstraat 17 |
|
| [x] | 3D Makelaars | 3dmakelaars.nl | Gerrit Verboonstraat 17 |
|
||||||
| Dupont Makelaars | dupont.nl | Rotterdamsedijk 437 |
|
| [x] | Dupont Makelaars | dupont.nl | Rotterdamsedijk 437 |
|
||||||
| D&S Makelaardij | densmakelaars.nl | Land van Belofte 50 |
|
| [x] | D&S Makelaardij | densmakelaars.nl | Land van Belofte 50 |
|
||||||
| Moerman & De Jong Makelaars | moerman-dejong.nl | Lange Kerkstraat 80B |
|
| [x] | Moerman & De Jong Makelaars | moerman-dejong.nl | Lange Kerkstraat 80B |
|
||||||
| Hagestein Makelaardij | — | Degerfors 54 |
|
| [ ] | Hagestein Makelaardij | — | Degerfors 54 |
|
||||||
| Schieland Borsboom NVM Makelaars | schielandborsboom.nl | (Rotterdam, active in Schiedam) |
|
| [x] | Schieland Borsboom NVM Makelaars | schielandborsboom.nl | (Rotterdam, active in Schiedam) |
|
||||||
|
| [x] | Vandriel Makelaardij | vandrielmakelaardij.nl | — |
|
||||||
|
| [x] | Van Herk Makelaars | vanherk.nl | — |
|
||||||
|
|
||||||
|
|
||||||
|
## Den Haag
|
||||||
|
|
||||||
|
| Done | Name | Website | Address |
|
||||||
|
|------|------|---------|-------|
|
||||||
|
| [skip] | Yuvam Makelaardij | yuvammakelaardij.nl | — (connection refused) |
|
||||||
|
| [x] | 88 Makelaars | 88makelaars.nl | — |
|
||||||
|
| [skip] | DIVA Makelaars | divamakelaars.nl | — (Maartensdijk only, not Den Haag) |
|
||||||
|
| [x] | Elzenaar NVM Makelaars | elzenaar.com | — |
|
||||||
|
| [skip] | Frisia Makelaars | frisiamakelaars.nl | — (SPA/Vue, no API) |
|
||||||
|
| [x] | Borgdorff Makelaars | borgdorff.nl | — (Den Haag branch) |
|
||||||
|
| [skip] | SMASH Makelaars | smashmakelaars.nl | — (too small, no API) |
|
||||||
|
| [x] | DOEN NVM Makelaars | doenmakelaars.com | Doezastraat 30 (Leiden, also active in Den Haag) |
|
||||||
|
|
||||||
|
## Leiden
|
||||||
|
|
||||||
|
| Done | Name | Website | Address |
|
||||||
|
|------|------|---------|-------|
|
||||||
|
| [ ] | RE/MAX Makelaarsgilde | makelaars-in-leiden.nl | Levendaal 73-75 |
|
||||||
|
| [ ] | Hypodomus Leiden | hypodomusleiden.nl | Haarlemmerstraat 268 |
|
||||||
|
| [ ] | Alpina Leiden (formerly De Leeuw) | advies.alpina.nl | Molenwerf 4 |
|
||||||
|
| [ ] | Fides makelaars (ERA/NVM) | fidesmakelaarsleiden.nl | Lammenschansweg 76 |
|
||||||
|
| [ ] | Werk Makelaardij | werkmakelaardij.nl | Stevenshof (Leiden) |
|
||||||
|
| [ ] | Kerkvliet Makelaars | kerkvlietmakelaars.nl | Hoge Rijndijk 271A |
|
||||||
|
| [ ] | Kompas Makelaars & Taxateurs | kompasmakelaardij.nl | Maresingel 75-76 |
|
||||||
|
| [ ] | Hoekstra en Van Eck Leiden | hoekstraenvaneck.nl | Schipholweg 55-75 |
|
||||||
|
| [ ] | DOEN NVM Makelaars | doenmakelaars.com | Doezastraat 30 |
|
||||||
|
| [ ] | Oudshoorn Makelaardij | oudshoornmakelaardij.nl | — |
|
||||||
|
| [ ] | April Makelaars Leiden | aprilmakelaars.nl | Haagweg 55 |
|
||||||
|
| [ ] | Emil NVM Makelaars | emilmakelaars.nl | — |
|
||||||
|
| [ ] | Goedhart Makelaars | — | Oude Singel 14 |
|
||||||
|
| [ ] | Graal Makelaardij & Taxaties | — | Rapenburg 5 |
|
||||||
|
|||||||
71
new_scraper_prompt.md
Normal file
@@ -0,0 +1,71 @@
|
|||||||
|
# OG Online / realtime-listings (fastest — API)
|
||||||
|
|
||||||
|
Check out the add_scraper_context.md, let's add a new scraper.
|
||||||
|
|
||||||
|
**Broker:** [name]
|
||||||
|
**Base URL:** [e.g. https://www.mybroker.nl]
|
||||||
|
**Cities to include:** [e.g. {"Den Haag", "Voorburg"} — omit if broker is single-city]
|
||||||
|
|
||||||
|
_(No further investigation needed — OG Online platform is fully understood.)_
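
As a rough sketch of why only the base URL and city set are needed here: the OG Online scrapers in this repo all apply the same filters to each item of the `realtime-listings/consumer` response. The field names (`isSales`, `statusOrig`, `city`, `salesPrice`) and status values are taken from those scrapers; the helper names `keep_item()` and `normalise_status()` are hypothetical and only illustrate the shared logic.

```python
from typing import Optional

# Status values as they appear in the existing OG Online scrapers.
SKIP_STATUSES = {"rented", "rented_ur"}
STATUS_MAP = {
    "available": "beschikbaar",
    "under_bid": "onder_bod",
    "under_option": "onder_bod",
    "sold": "verkocht",
    "sold_ur": "verkocht",
}


def keep_item(item: dict, *, max_price: int, cities: Optional[set] = None) -> bool:
    """Hypothetical helper: True when an item passes the shared filters."""
    if not item.get("isSales"):
        return False  # rentals are not of interest
    if item.get("statusOrig") in SKIP_STATUSES:
        return False
    if cities is not None and item.get("city") not in cities:
        return False  # only applied for multi-city brokers
    # Treat a missing or null price as 0 so the comparison cannot raise.
    return (item.get("salesPrice") or 0) <= max_price


def normalise_status(item: dict) -> str:
    """Map the platform status to the bot's own status, defaulting to available."""
    return STATUS_MAP.get(item.get("statusOrig") or "", "beschikbaar")


sample = [
    {"isSales": True, "statusOrig": "available", "city": "Den Haag", "salesPrice": 450000},
    {"isSales": True, "statusOrig": "rented", "city": "Den Haag", "salesPrice": 1500},
    {"isSales": True, "statusOrig": "under_bid", "city": "Delft", "salesPrice": 300000},
    {"isSales": False, "statusOrig": "available", "city": "Voorburg", "salesPrice": 200000},
    {"isSales": True, "statusOrig": "sold", "city": "Voorburg", "salesPrice": 900000},
]
kept = [i for i in sample if keep_item(i, max_price=500_000, cities={"Den Haag", "Voorburg"})]
```

Leaving `cities` as `None` corresponds to the "omit if broker is single-city" case in the template above.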
|
||||||
|
|
||||||
|
|
||||||
|
# Realworks CMS (one-liner — SSR)
|
||||||
|
|
||||||
|
Check out the add_scraper_context.md, let's add a new scraper.
|
||||||
|
|
||||||
|
**Broker:** [name]
|
||||||
|
**Base URL:** [e.g. https://www.mybroker.nl]
|
||||||
|
|
||||||
|
_(No further investigation needed — Realworks platform is fully understood.)_
|
||||||
|
|
||||||
|
|
||||||
|
# SURE WordPress Plugin (SSR)
|
||||||
|
|
||||||
|
Check out the add_scraper_context.md, let's add a new scraper.
|
||||||
|
|
||||||
|
**Broker:** [name]
|
||||||
|
**Base URL:** [e.g. https://www.mybroker.nl]
|
||||||
|
**Card selector:** [a.card-house or a.card--house]
|
||||||
|
**City filter:** [city name(s) to include, or "single city — no filter needed"]
|
||||||
|
**Cards per page:** [e.g. 15]
|
||||||
|
|
||||||
|
_(Detail page always uses #kenmerken li span span — no further investigation needed.)_
|
||||||
|
|
||||||
|
|
||||||
|
# SSR (custom)
|
||||||
|
|
||||||
|
Check out the add_scraper_context.md, let's add a new scraper.
|
||||||
|
|
||||||
|
**Broker:** [name]
|
||||||
|
**Website:** [base url]
|
||||||
|
**Listing page URL:** [url with any price/city filters applied]
|
||||||
|
**Detail page kenmerken:** yes / no
|
||||||
|
|
||||||
|
**Listing page HTML** (one card):
|
||||||
|
[paste]
|
||||||
|
|
||||||
|
**Detail page dump:** [attached / n.a.]
|
||||||
|
|
||||||
|
**Pagination:** [e.g. 10 per page, pagina-N in URL / no pagination]
|
||||||
|
|
||||||
|
**Notes:** [auth, JS rendering, price filter in URL, etc.]
|
||||||
|
|
||||||
|
|
||||||
|
# API (custom)
|
||||||
|
|
||||||
|
Check out the add_scraper_context.md, let's add a new scraper.
|
||||||
|
|
||||||
|
**Broker:** [name]
|
||||||
|
**Website:** [base url]
|
||||||
|
**API endpoint:** [full url]
|
||||||
|
**Auth:** [none / header: X-Foo: bar / query param]
|
||||||
|
|
||||||
|
**Example curl:**
|
||||||
|
[paste]
|
||||||
|
|
||||||
|
**Example response (one item):**
|
||||||
|
[paste]
|
||||||
|
|
||||||
|
**Pagination:** [e.g. page param / offset / single response]
|
||||||
|
|
||||||
|
**Notes:** [price filter, city filter, status field values, etc.]
|
||||||
11
shell.nix
@@ -1,20 +1,23 @@
|
|||||||
{ pkgs ? import <nixpkgs> {} }:
|
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:
|
||||||
|
let
|
||||||
|
unstable = import <nixos-unstable> { config.allowUnfree = true; };
|
||||||
|
in
|
||||||
pkgs.mkShell {
|
pkgs.mkShell {
|
||||||
packages = [
|
packages = [
|
||||||
(pkgs.python3.withPackages (ps: with ps; [
|
(pkgs.python3.withPackages (ps: with ps; [
|
||||||
httpx
|
httpx
|
||||||
beautifulsoup4
|
beautifulsoup4
|
||||||
|
flask
|
||||||
lxml
|
lxml
|
||||||
|
waitress
|
||||||
]))
|
]))
|
||||||
|
unstable.claude-code
|
||||||
];
|
];
|
||||||
|
|
||||||
shellHook = ''
|
shellHook = ''
|
||||||
if [ -f .env ]; then
|
if [ -f .env ]; then
|
||||||
set -a
|
set -a
|
||||||
source .env
|
source .env
|
||||||
set +a
|
set +a
|
||||||
echo ".env geladen"
|
|
||||||
fi
|
fi
|
||||||
'';
|
'';
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -7,9 +7,11 @@ Voeg nieuwe toe onderaan en registreer in SCRAPERS.
|
|||||||
|
|
||||||
import json
|
import json
|
||||||
import logging
|
import logging
|
||||||
|
import re
|
||||||
import time
|
import time
|
||||||
|
|
||||||
import httpx
|
import httpx
|
||||||
|
from bs4 import BeautifulSoup
|
||||||
|
|
||||||
import config
|
import config
|
||||||
from huizenbot import RawListing
|
from huizenbot import RawListing
|
||||||
@@ -40,8 +42,71 @@ def fetch_json(url: str, *, params: dict = None, headers: dict = None) -> dict |
|
|||||||
return r.json()
|
return r.json()
|
||||||
|
|
||||||
raise RuntimeError(f"Blijvend 429 op {url}")
|
raise RuntimeError(f"Blijvend 429 op {url}")
|
||||||
|
|
||||||
|
|
||||||
|
def _og_detail(url: str, makelaar: str) -> dict:
|
||||||
|
"""
|
||||||
|
Fetch an OG Online detail page and extract missing fields.
|
||||||
|
|
||||||
|
The page is checked for three patterns, in this order:
|
||||||
|
1. An energielabel CSS class (energielabel-A, energielabel-B, etc.)
|
||||||
|
2. dt/dd pairs in a kenmerken table or list
|
||||||
|
||||||
|
3. li elements holding label/value span pairs (only when no dt/dd pairs exist)
|
||||||
|
|
||||||
|
Returns a dict with the keys energielabel and bouwjaar (either may be None); empty dict on failure.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
r = httpx.get(
|
||||||
|
url,
|
||||||
|
headers={"User-Agent": config.USER_AGENT},
|
||||||
|
timeout=15,
|
||||||
|
follow_redirects=True,
|
||||||
|
)
|
||||||
|
r.raise_for_status()
|
||||||
|
soup = BeautifulSoup(r.text, "html.parser")
|
||||||
|
|
||||||
|
# Pattern 1: energielabel CSS class on any element
|
||||||
|
energielabel = None
|
||||||
|
for el in soup.select("[class]"):
|
||||||
|
for cls in el.get("class", []):
|
||||||
|
if cls.startswith("energielabel-") and cls != "energielabel":
|
||||||
|
energielabel = cls.replace("energielabel-", "").upper()
|
||||||
|
break
|
||||||
|
if energielabel:
|
||||||
|
break
|
||||||
|
|
||||||
|
# Pattern 2: kenmerken table — try dt/dd pairs first
|
||||||
|
kv: dict[str, str] = {}
|
||||||
|
dts = soup.select("dt")
|
||||||
|
dds = soup.select("dd")
|
||||||
|
for dt, dd in zip(dts, dds):
|
||||||
|
kv[dt.get_text(strip=True).lower()] = dd.get_text(strip=True)
|
||||||
|
|
||||||
|
# Pattern 3: ul.objectkenmerken / div.kenmerken span pairs
|
||||||
|
if not kv:
|
||||||
|
for li in soup.select("li"):
|
||||||
|
spans = li.select("span")
|
||||||
|
if len(spans) >= 2:
|
||||||
|
kv[spans[0].get_text(strip=True).lower()] = spans[1].get_text(strip=True)
|
||||||
|
|
||||||
|
if not energielabel:
|
||||||
|
energielabel = (
|
||||||
|
kv.get("energielabel")
|
||||||
|
or kv.get("energieklasse")
|
||||||
|
or kv.get("energie")
|
||||||
|
) or None
|
||||||
|
|
||||||
|
raw_year = kv.get("bouwjaar") or ""
|
||||||
|
bouwjaar = int(raw_year) if raw_year.isdigit() else None
|
||||||
|
|
||||||
|
return {
|
||||||
|
"energielabel": energielabel,
|
||||||
|
"bouwjaar": bouwjaar,
|
||||||
|
}
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("%s: detail fetch fout %s: %s", makelaar, url, e)
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
# Bjornd
|
# Bjornd
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
@@ -56,26 +121,36 @@ _STATUS_MAP = {
|
|||||||
"sold": "verkocht",
|
"sold": "verkocht",
|
||||||
"sold_ur": "verkocht",
|
"sold_ur": "verkocht",
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
def fetch_bjornd() -> list[RawListing]:
|
def fetch_bjornd() -> list[RawListing]:
|
||||||
data = fetch_json(
|
data = fetch_json(
|
||||||
f"{_BJORND_BASE}/nl/realtime-listings/consumer",
|
f"{_BJORND_BASE}/nl/realtime-listings/consumer",
|
||||||
headers={"X-Requested-With": "XMLHttpRequest"},
|
headers={"X-Requested-With": "XMLHttpRequest"},
|
||||||
)
|
)
|
||||||
|
|
||||||
listings = []
|
listings = []
|
||||||
for item in data:
|
for item in data:
|
||||||
if not item.get("isSales"):
|
if not item.get("isSales"):
|
||||||
continue
|
continue
|
||||||
if item.get("statusOrig") in _BJORND_SKIP:
|
if item.get("statusOrig") in _BJORND_SKIP:
|
||||||
continue
|
continue
|
||||||
if item.get('salesPrice')>config.MAX_PRICE:
|
if (item.get("salesPrice") or 0) > config.MAX_PRICE:  # a null price must not raise
|
||||||
continue
|
continue
|
||||||
|
|
||||||
|
detail_url = _BJORND_BASE + item["url"]
|
||||||
|
raw_year = item.get("dateOfConstruction") or ""
|
||||||
|
bouwjaar = int(raw_year) if raw_year.isdigit() else None
|
||||||
|
energielabel = item.get("energyLabel") or None
|
||||||
|
|
||||||
|
# Fetch detail page when API omits key fields
|
||||||
|
if not energielabel or not bouwjaar:
|
||||||
|
extra_kk = _og_detail(detail_url, "bjornd")
|
||||||
|
energielabel = energielabel or extra_kk.get("energielabel")
|
||||||
|
bouwjaar = bouwjaar or extra_kk.get("bouwjaar")
|
||||||
|
|
||||||
listings.append(RawListing(
|
listings.append(RawListing(
|
||||||
url=_BJORND_BASE + item["url"],
|
url=detail_url,
|
||||||
source_makelaar="bjornd",
|
source_makelaar="bjornd",
|
||||||
status=_STATUS_MAP.get(item.get("statusOrig", ""), "beschikbaar"),
|
status=_STATUS_MAP.get(item.get("statusOrig", ""), "beschikbaar"),
|
||||||
adres=item.get("address") or None,
|
adres=item.get("address") or None,
|
||||||
@@ -87,6 +162,8 @@ def fetch_bjornd() -> list[RawListing]:
|
|||||||
perceeloppervlak=item.get("plotSurface") or None,
|
perceeloppervlak=item.get("plotSurface") or None,
|
||||||
kamers=item.get("rooms") or None,
|
kamers=item.get("rooms") or None,
|
||||||
slaapkamers=item.get("bedrooms") or None,
|
slaapkamers=item.get("bedrooms") or None,
|
||||||
|
bouwjaar=bouwjaar,
|
||||||
|
energielabel=energielabel,
|
||||||
hero_image_url=item.get("photo") or None,
|
hero_image_url=item.get("photo") or None,
|
||||||
extra=json.dumps({
|
extra=json.dumps({
|
||||||
"balcony": item.get("balcony"),
|
"balcony": item.get("balcony"),
|
||||||
@@ -102,15 +179,456 @@ def fetch_bjornd() -> list[RawListing]:
|
|||||||
"photos": item.get("photos"),
|
"photos": item.get("photos"),
|
||||||
}, ensure_ascii=False),
|
}, ensure_ascii=False),
|
||||||
))
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
|
||||||
log.info("bjornd: %d koopwoningen opgehaald", len(listings))
|
log.info("bjornd: %d koopwoningen opgehaald", len(listings))
|
||||||
return listings
|
return listings
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Ooms
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
_OOMS_BASE = "https://ooms.com"
|
||||||
|
_OOMS_CITIES = {"Delft", "Schiedam", "Rotterdam", "Leiden", "Voorburg", "Pijnacker"}
|
||||||
|
_OOMS_SKIP_STATUS = {"verhuurd", "verhuurd onder voorbehoud"}
|
||||||
|
_OOMS_STATUS_MAP = {
|
||||||
|
"beschikbaar": "beschikbaar",
|
||||||
|
"onder bod": "onder_bod",
|
||||||
|
"onder optie": "onder_bod",
|
||||||
|
"verkocht": "verkocht",
|
||||||
|
"verkocht onder voorbehoud":"verkocht",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_ooms() -> list[RawListing]:
|
||||||
|
data = fetch_json(f"{_OOMS_BASE}/api/properties/available.json")
|
||||||
|
listings = []
|
||||||
|
|
||||||
|
for item in data.get("objects", []):
|
||||||
|
if item.get("buy_or_rent") != "buy":
|
||||||
|
continue
|
||||||
|
if item.get("place") not in _OOMS_CITIES:
|
||||||
|
continue
|
||||||
|
if (item.get("buy_price") or 0) > config.MAX_PRICE:  # a null price must not raise
|
||||||
|
continue
|
||||||
|
|
||||||
|
status_raw = item.get("availability_status", "")
|
||||||
|
if status_raw in _OOMS_SKIP_STATUS:
|
||||||
|
continue
|
||||||
|
|
||||||
|
hnr = item.get("house_number", "")
|
||||||
|
add = item.get("house_number_addition") or ""
|
||||||
|
adres = f"{item.get('street_name', '')} {hnr}{(' ' + add) if add else ''}".strip()
|
||||||
|
|
||||||
|
main_images = item.get("realworks_main_images") or item.get("realworks_images") or []
|
||||||
|
hero = None
|
||||||
|
if main_images:
|
||||||
|
sizes = main_images[0].get("sizes") or []
|
||||||
|
best = max(sizes, key=lambda s: s.get("width", 0), default=None)
|
||||||
|
if best:
|
||||||
|
hero = _OOMS_BASE + best["imageUrl"]
|
||||||
|
|
||||||
|
perceel = item.get("parcel_surface") or None
|
||||||
|
if perceel == 0:
|
||||||
|
perceel = None
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=item["url"],
|
||||||
|
source_makelaar="ooms",
|
||||||
|
datum_aanmelding=(item.get("publish_date") or "")[:10] or None,
|
||||||
|
status=_OOMS_STATUS_MAP.get(status_raw, "beschikbaar"),
|
||||||
|
adres=adres or None,
|
||||||
|
postcode=(item.get("zip_code") or "").replace(" ", "") or None,
|
||||||
|
stad=item.get("place") or None,
|
||||||
|
prijs=item.get("buy_price") or None,
|
||||||
|
woningtype=item.get("appartment_characteristic") or item.get("residential_building_type") or None,
|
||||||
|
woonoppervlak=item.get("usable_area_living_function") or None,
|
||||||
|
perceeloppervlak=perceel,
|
||||||
|
kamers=item.get("amount_of_rooms") or None,
|
||||||
|
slaapkamers=item.get("amount_of_bedrooms") or None,
|
||||||
|
hero_image_url=hero,
|
||||||
|
extra=json.dumps({
|
||||||
|
"office": (item.get("office") or {}).get("name"),
|
||||||
|
"locations": item.get("locations"),
|
||||||
|
"garden_types": item.get("garden_types"),
|
||||||
|
"lat": item.get("lat"),
|
||||||
|
"lng": item.get("lng"),
|
||||||
|
"object_code": item.get("object_code"),
|
||||||
|
}, ensure_ascii=False),
|
||||||
|
))
|
||||||
|
|
||||||
|
log.info("ooms: %d listings opgehaald", len(listings))
|
||||||
|
return listings
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Moerman & De Jong Makelaars (Schiedam)
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Same OG Online / realtime-listings platform as Bjornd.
|
||||||
|
|
||||||
|
_MOERMAN_BASE = "https://www.moerman-dejong.nl"
|
||||||
|
_MOERMAN_SKIP = {"rented", "rented_ur"}
|
||||||
|
|
||||||
|
_MOERMAN_STATUS_MAP = {
|
||||||
|
"available": "beschikbaar",
|
||||||
|
"under_bid": "onder_bod",
|
||||||
|
"under_option": "onder_bod",
|
||||||
|
"sold": "verkocht",
|
||||||
|
"sold_ur": "verkocht",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_moerman() -> list[RawListing]:
|
||||||
|
data = fetch_json(
|
||||||
|
f"{_MOERMAN_BASE}/nl/realtime-listings/consumer",
|
||||||
|
headers={"X-Requested-With": "XMLHttpRequest"},
|
||||||
|
)
|
||||||
|
|
||||||
|
listings = []
|
||||||
|
for item in data:
|
||||||
|
if not item.get("isSales"):
|
||||||
|
continue
|
||||||
|
if item.get("statusOrig") in _MOERMAN_SKIP:
|
||||||
|
continue
|
||||||
|
if (item.get("salesPrice") or 0) > config.MAX_PRICE:  # a null price must not raise
|
||||||
|
continue
|
||||||
|
|
||||||
|
postcode = (item.get("zipcode") or "").replace(" ", "") or None
|
||||||
|
perceel = item.get("plotSurface") or None
|
||||||
|
if perceel == 0:
|
||||||
|
perceel = None
|
||||||
|
|
||||||
|
raw_year = item.get("dateOfConstruction") or ""
|
||||||
|
bouwjaar = int(raw_year) if raw_year.isdigit() else None
|
||||||
|
energielabel = item.get("energyLabel") or None
|
||||||
|
|
||||||
|
detail_url = _MOERMAN_BASE + item["url"]
|
||||||
|
if not energielabel:
|
||||||
|
extra_kk = _og_detail(detail_url, "moerman")
|
||||||
|
energielabel = extra_kk.get("energielabel")
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=detail_url,
|
||||||
|
source_makelaar="moerman",
|
||||||
|
status=_MOERMAN_STATUS_MAP.get(item.get("statusOrig", ""), "beschikbaar"),
|
||||||
|
adres=item.get("address") or None,
|
||||||
|
postcode=postcode,
|
||||||
|
stad=item.get("city") or None,
|
||||||
|
prijs=item.get("salesPrice") or None,
|
||||||
|
woningtype=item.get("type") or None,
|
||||||
|
woonoppervlak=item.get("livingSurface") or None,
|
||||||
|
perceeloppervlak=perceel,
|
||||||
|
kamers=item.get("rooms") or None,
|
||||||
|
slaapkamers=item.get("bedrooms") or None,
|
||||||
|
bouwjaar=bouwjaar,
|
||||||
|
energielabel=energielabel,
|
||||||
|
hero_image_url=item.get("photo") or None,
|
||||||
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
|
||||||
|
log.info("moerman: %d koopwoningen opgehaald", len(listings))
|
||||||
|
return listings
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Van Daal Makelaardij (Delft)
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# OG Online / realtime-listings platform.
|
||||||
|
|
||||||
|
_VANDAAL_BASE = "https://www.vandaalmakelaardij.nl"
|
||||||
|
_VANDAAL_SKIP = {"rented", "rented_ur"}
|
||||||
|
|
||||||
|
_VANDAAL_STATUS_MAP = {
|
||||||
|
"available": "beschikbaar",
|
||||||
|
"under_bid": "onder_bod",
|
||||||
|
"under_option": "onder_bod",
|
||||||
|
"is_bought": "verkocht",
|
||||||
|
"sold": "verkocht",
|
||||||
|
"sold_ur": "verkocht",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_vandaal() -> list[RawListing]:
|
||||||
|
data = fetch_json(
|
||||||
|
f"{_VANDAAL_BASE}/nl/realtime-listings/consumer",
|
||||||
|
headers={"X-Requested-With": "XMLHttpRequest"},
|
||||||
|
)
|
||||||
|
|
||||||
|
listings = []
|
||||||
|
for item in data:
|
||||||
|
if not item.get("isSales"):
|
||||||
|
continue
|
||||||
|
if item.get("statusOrig") in _VANDAAL_SKIP:
|
||||||
|
continue
|
||||||
|
if (item.get("salesPrice") or 0) > config.MAX_PRICE:  # a null price must not raise
|
||||||
|
continue
|
||||||
|
|
||||||
|
postcode = (item.get("zipcode") or "").replace(" ", "") or None
|
||||||
|
perceel = item.get("plotSurface") or None
|
||||||
|
if perceel == 0:
|
||||||
|
perceel = None
|
||||||
|
|
||||||
|
raw_year = item.get("dateOfConstruction") or ""
|
||||||
|
bouwjaar = int(raw_year) if raw_year.isdigit() else None
|
||||||
|
energielabel = item.get("energyLabel") or None
|
||||||
|
|
||||||
|
detail_url = _VANDAAL_BASE + item["url"]
|
||||||
|
if not energielabel:
|
||||||
|
extra_kk = _og_detail(detail_url, "vandaal")
|
||||||
|
energielabel = extra_kk.get("energielabel")
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=detail_url,
|
||||||
|
source_makelaar="vandaal",
|
||||||
|
status=_VANDAAL_STATUS_MAP.get(item.get("statusOrig", ""), "beschikbaar"),
|
||||||
|
adres=item.get("address") or None,
|
||||||
|
postcode=postcode,
|
||||||
|
stad=item.get("city") or None,
|
||||||
|
prijs=item.get("salesPrice") or None,
|
||||||
|
woningtype=item.get("type") or None,
|
||||||
|
woonoppervlak=item.get("livingSurface") or None,
|
||||||
|
perceeloppervlak=perceel,
|
||||||
|
kamers=item.get("rooms") or None,
|
||||||
|
slaapkamers=item.get("bedrooms") or None,
|
||||||
|
bouwjaar=bouwjaar,
|
||||||
|
energielabel=energielabel,
|
||||||
|
hero_image_url=item.get("photo") or None,
|
||||||
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
|
||||||
|
log.info("vandaal: %d koopwoningen opgehaald", len(listings))
|
||||||
|
return listings
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Elzenaar NVM Makelaars (Den Haag) — OG Online platform
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Same platform as bjornd/moerman/vandaal.
|
||||||
|
|
||||||
|
_ELZENAAR_BASE = "https://www.elzenaar.com"
|
||||||
|
_ELZENAAR_SKIP = {"rented", "rented_ur"}
|
||||||
|
_ELZENAAR_CITIES = {"Den Haag", "Voorburg", "Rijswijk"}
|
||||||
|
|
||||||
|
_ELZENAAR_STATUS_MAP = {
|
||||||
|
"available": "beschikbaar",
|
||||||
|
"under_bid": "onder_bod",
|
||||||
|
"under_option": "onder_bod",
|
||||||
|
"sold": "verkocht",
|
||||||
|
"sold_ur": "verkocht",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_elzenaar() -> list[RawListing]:
|
||||||
|
data = fetch_json(
|
||||||
|
f"{_ELZENAAR_BASE}/nl/realtime-listings/consumer",
|
||||||
|
headers={"X-Requested-With": "XMLHttpRequest"},
|
||||||
|
)
|
||||||
|
|
||||||
|
listings = []
|
||||||
|
for item in data:
|
||||||
|
if not item.get("isSales"):
|
||||||
|
continue
|
||||||
|
if item.get("statusOrig") in _ELZENAAR_SKIP:
|
||||||
|
continue
|
||||||
|
if item.get("city") not in _ELZENAAR_CITIES:
|
||||||
|
continue
|
||||||
|
if (item.get("salesPrice") or 0) > config.MAX_PRICE:  # a null price must not raise
|
||||||
|
continue
|
||||||
|
|
||||||
|
postcode = (item.get("zipcode") or "").replace(" ", "") or None
|
||||||
|
perceel = item.get("plotSurface") or None
|
||||||
|
if perceel == 0:
|
||||||
|
perceel = None
|
||||||
|
|
||||||
|
raw_year = item.get("dateOfConstruction") or ""
|
||||||
|
bouwjaar = int(raw_year) if raw_year.isdigit() else None
|
||||||
|
energielabel = item.get("energyLabel") or None
|
||||||
|
|
||||||
|
detail_url = _ELZENAAR_BASE + item["url"]
|
||||||
|
if not energielabel:
|
||||||
|
extra_kk = _og_detail(detail_url, "elzenaar")
|
||||||
|
energielabel = extra_kk.get("energielabel")
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=detail_url,
|
||||||
|
source_makelaar="elzenaar",
|
||||||
|
status=_ELZENAAR_STATUS_MAP.get(item.get("statusOrig", ""), "beschikbaar"),
|
||||||
|
adres=item.get("address") or None,
|
||||||
|
postcode=postcode,
|
||||||
|
stad=item.get("city") or None,
|
||||||
|
prijs=item.get("salesPrice") or None,
|
||||||
|
woningtype=item.get("type") or None,
|
||||||
|
woonoppervlak=item.get("livingSurface") or None,
|
||||||
|
perceeloppervlak=perceel,
|
||||||
|
kamers=item.get("rooms") or None,
|
||||||
|
slaapkamers=item.get("bedrooms") or None,
|
||||||
|
bouwjaar=bouwjaar,
|
||||||
|
energielabel=energielabel,
|
||||||
|
hero_image_url=item.get("photo") or None,
|
||||||
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
|
||||||
|
log.info("elzenaar: %d koopwoningen opgehaald", len(listings))
|
||||||
|
return listings
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
# DOEN NVM Makelaars (Den Haag / Leiden / Voorburg), OG Online platform
# ---------------------------------------------------------------------------

_DOEN_BASE = "https://www.doenmakelaars.com"
_DOEN_SKIP = {"rented", "rented_ur"}
_DOEN_CITIES = {"Den Haag", "Leiden", "Voorburg", "Leidschendam", "Rijswijk", "Wassenaar", "Zoetermeer"}

_DOEN_STATUS_MAP = {
    "available": "beschikbaar",
    "under_bid": "onder_bod",
    "under_option": "onder_bod",
    "sold": "verkocht",
    "sold_ur": "verkocht",
}


def fetch_doen() -> list[RawListing]:
    data = fetch_json(
        f"{_DOEN_BASE}/nl/realtime-listings/consumer",
        headers={"X-Requested-With": "XMLHttpRequest"},
    )

    listings = []
    for item in data:
        if not item.get("isSales"):
            continue
        if item.get("statusOrig") in _DOEN_SKIP:
            continue
        if item.get("city") not in _DOEN_CITIES:
            continue
        # `or 0` also covers an explicit null; a .get() default only covers a missing key
        if (item.get("salesPrice") or 0) > config.MAX_PRICE:
            continue

        postcode = (item.get("zipcode") or "").replace(" ", "") or None
        perceel = item.get("plotSurface") or None  # 0 and missing both become None

        # str() so a numeric year does not break .isdigit()
        raw_year = str(item.get("dateOfConstruction") or "")
        bouwjaar = int(raw_year) if raw_year.isdigit() else None
        energielabel = item.get("energyLabel") or None

        detail_url = _DOEN_BASE + item["url"]
        if not energielabel:
            extra_kk = _og_detail(detail_url, "doen")
            energielabel = extra_kk.get("energielabel")

        listings.append(RawListing(
            url=detail_url,
            source_makelaar="doen",
            status=_DOEN_STATUS_MAP.get(item.get("statusOrig", ""), "beschikbaar"),
            adres=item.get("address") or None,
            postcode=postcode,
            stad=item.get("city") or None,
            prijs=item.get("salesPrice") or None,
            woningtype=item.get("type") or None,
            woonoppervlak=item.get("livingSurface") or None,
            perceeloppervlak=perceel,
            kamers=item.get("rooms") or None,
            slaapkamers=item.get("bedrooms") or None,
            bouwjaar=bouwjaar,
            energielabel=energielabel,
            hero_image_url=item.get("photo") or None,
        ))
        if config.APP_ENV == "dev":
            break

    log.info("doen: fetched %d sale listings", len(listings))
    return listings


# ---------------------------------------------------------------------------
# Vandriel Makelaardij (Schiedam), OG Online / realtime-listings
# ---------------------------------------------------------------------------

_VANDRIEL_BASE = "https://www.vandrielmakelaardij.nl"
_VANDRIEL_SKIP = {"rented", "rented_ur"}

_VANDRIEL_STATUS_MAP = {
    "available": "beschikbaar",
    "under_bid": "onder_bod",
    "under_option": "onder_bod",
    "sold": "verkocht",
    "sold_ur": "verkocht",
}


def fetch_vandriel() -> list[RawListing]:
    data = fetch_json(
        f"{_VANDRIEL_BASE}/nl/realtime-listings/consumer",
        headers={"X-Requested-With": "XMLHttpRequest"},
    )

    listings = []
    for item in data:
        if not item.get("isSales"):
            continue
        if item.get("statusOrig") in _VANDRIEL_SKIP:
            continue
        if (item.get("city") or "").lower() != "schiedam":
            continue
        # `or 0` also covers an explicit null; a .get() default only covers a missing key
        if (item.get("salesPrice") or 0) > config.MAX_PRICE:
            continue

        postcode = (item.get("zipcode") or "").replace(" ", "") or None
        perceel = item.get("plotSurface") or None  # 0 and missing both become None

        # str() so a numeric year does not break .isdigit()
        raw_year = str(item.get("dateOfConstruction") or "")
        bouwjaar = int(raw_year) if raw_year.isdigit() else None
        energielabel = item.get("energyLabel") or None

        detail_url = _VANDRIEL_BASE + item["url"]
        if not energielabel:
            extra_kk = _og_detail(detail_url, "vandriel")
            energielabel = extra_kk.get("energielabel")

        listings.append(RawListing(
            url=detail_url,
            source_makelaar="vandriel",
            status=_VANDRIEL_STATUS_MAP.get(item.get("statusOrig", ""), "beschikbaar"),
            adres=item.get("address") or None,
            postcode=postcode,
            stad=item.get("city") or None,
            prijs=item.get("salesPrice") or None,
            woningtype=item.get("type") or None,
            woonoppervlak=item.get("livingSurface") or None,
            perceeloppervlak=perceel,
            kamers=item.get("rooms") or None,
            slaapkamers=item.get("bedrooms") or None,
            bouwjaar=bouwjaar,
            energielabel=energielabel,
            hero_image_url=item.get("photo") or None,
        ))
        if config.APP_ENV == "dev":
            break

    log.info("vandriel: fetched %d sale listings", len(listings))
    return listings


# ---------------------------------------------------------------------------
# SCRAPERS: export all active API adapters here
# ---------------------------------------------------------------------------

SCRAPERS = {
    'bjornd': fetch_bjornd,
    'ooms': fetch_ooms,
    'moerman': fetch_moerman,
    'vandaal': fetch_vandaal,
    'elzenaar': fetch_elzenaar,
    'doen': fetch_doen,
    'vandriel': fetch_vandriel,
}
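The OG Online adapters (elzenaar, doen, vandriel) clean up each feed item in the same way before building a `RawListing`. A minimal sketch of that shared step, assuming a hypothetical helper name `normalize_og_item` and a hand-written sample item; neither is part of this diff:

```python
def normalize_og_item(item: dict) -> dict:
    """Hypothetical helper: the per-item clean-up the OG Online adapters repeat."""
    # str() keeps .isdigit() safe if the feed ever sends a numeric year
    raw_year = str(item.get("dateOfConstruction") or "")
    return {
        # "3111 AB" -> "3111AB"; empty or missing -> None
        "postcode": (item.get("zipcode") or "").replace(" ", "") or None,
        # 0 and missing both collapse to None
        "perceeloppervlak": item.get("plotSurface") or None,
        "bouwjaar": int(raw_year) if raw_year.isdigit() else None,
        "energielabel": item.get("energyLabel") or None,
        "prijs": item.get("salesPrice") or None,
    }


# Made-up sample item; the field names mirror the ones used by the adapters.
sample = {
    "zipcode": "3111 AB",
    "plotSurface": 0,
    "dateOfConstruction": "1935",
    "salesPrice": 325000,
}
fields = normalize_og_item(sample)
```

Whether factoring this out is worth it depends on how often the three feeds diverge; the sketch only shows that the field handling itself is identical.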
@@ -1,154 +0,0 @@
"""
adapters/ssr.py: HTML/SSR-based estate agents (makelaars)

Every scraper is a function () -> list[RawListing].
Add new ones at the bottom and register them in SCRAPERS.
"""

import logging
import re
import time

import httpx
from bs4 import BeautifulSoup

import config
from huizenbot import RawListing

log = logging.getLogger("huizenbot.ssr")

# ---------------------------------------------------------------------------
# Shared HTTP helper
# ---------------------------------------------------------------------------

def fetch_soup(url: str, *, params: dict = None) -> BeautifulSoup:
    """
    GET request → BeautifulSoup. Handles 429 using Retry-After.
    """
    for attempt in range(3):
        r = httpx.get(
            url,
            params=params,
            headers={"User-Agent": config.USER_AGENT},
            timeout=15,
            follow_redirects=True,
        )
        if r.status_code == 429:
            wait = int(r.headers.get("Retry-After", 60))
            log.warning("429 on %s, waiting %ds", url, wait)
            time.sleep(wait)
            continue
        r.raise_for_status()
        return BeautifulSoup(r.text, "html.parser")

    raise RuntimeError(f"Persistent 429 on {url}")


# ---------------------------------------------------------------------------
# Parse helpers
# ---------------------------------------------------------------------------

def parse_prijs(text: str | None) -> int | None:
    """'€ 325.000 k.k.' → 325000"""
    if not text:
        return None
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


def parse_m2(text: str | None) -> int | None:
    """'87 m²' → 87"""
    if not text:
        return None
    m = re.search(r"(\d+)", text.replace(".", ""))
    return int(m.group(1)) if m else None


# ---------------------------------------------------------------------------
# Björn & Dries adapter (bjornd.nl)
# ---------------------------------------------------------------------------
# TODO: fill in the real CSS selectors after inspecting the page.
# This is a structural template; the selectors are placeholders.

BJORND_BASE = "https://www.bjornd.nl"
BJORND_AANBOD = f"{BJORND_BASE}/aanbod"


def fetch_bjornd_demo() -> list[RawListing]:
    soup = fetch_soup(BJORND_AANBOD)
    listings = []

    # Adjust the selector to the real HTML structure
    for card in soup.select(".property-card"):  # ← adjust
        try:
            a_tag = card.select_one("a[href]")
            if not a_tag:
                continue
            url = a_tag["href"]
            if not url.startswith("http"):
                url = BJORND_BASE + url

            adres = _text(card, ".property-address")  # ← adjust
            postcode = _extract_postcode(_text(card, ".property-location"))
            prijs = parse_prijs(_text(card, ".property-price"))
            opp = parse_m2(_text(card, ".property-area"))
            img = _src(card, "img")

            listings.append(RawListing(
                url=url,
                source_makelaar="bjornd",
                adres=adres,
                postcode=postcode,
                stad=_infer_stad(postcode),
                prijs=prijs,
                woonoppervlak=opp,
                hero_image_url=img,
            ))
        except Exception as e:
            log.warning("Error parsing bjornd card: %s", e)

    return listings


# ---------------------------------------------------------------------------
# SSR helper utils
# ---------------------------------------------------------------------------

def _text(soup, selector: str) -> str | None:
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else None


def _src(soup, selector: str) -> str | None:
    el = soup.select_one(selector)
    if el is None:
        return None
    return el.get("src") or el.get("data-src")


def _extract_postcode(text: str | None) -> str | None:
    if not text:
        return None
    m = re.search(r"\b(\d{4}\s?[A-Z]{2})\b", text)
    return m.group(1).replace(" ", "") if m else None


def _infer_stad(postcode: str | None) -> str | None:
    """Simple mapping based on postcode range; extend as needed."""
    if not postcode:
        return None
    code = int(postcode[:4])
    if 2600 <= code <= 2629:
        return "Delft"
    if 3100 <= code <= 3135:
        return "Schiedam"
    return None


# ---------------------------------------------------------------------------
# SCRAPERS: export all active SSR adapters here
# ---------------------------------------------------------------------------

SCRAPERS = {
    'bjornd_demo': fetch_bjornd_demo,
}
65
src/adapters/ssr/__init__.py
Normal file
@@ -0,0 +1,65 @@
"""
adapters/ssr: HTML/SSR-based estate agents (makelaars)

Every scraper is a function () -> list[RawListing].
To add a new makelaar:
  1. Add a fetch_* function to the appropriate submodule
     (realworks.py, sure.py, schiedam.py, denhaag.py, overige.py)
  2. Import the function here and register it in SCRAPERS.

CMS types per module:
  realworks.py - Realworks CMS (li/div.aanbodEntry + span.kenmerk detail)
  sure.py      - SURE WordPress plugin (/wonen?sure_koop_huur=koop + #kenmerken)
  schiedam.py  - Custom Schiedam scrapers (various platforms)
  denhaag.py   - Den Haag scrapers (various platforms)
  overige.py   - Other / multi-city (OG Online WP, Elementor)
"""

from .realworks import (
    fetch_ankebodewes,
    fetch_woongoed,
    fetch_vwmakelaars,
    fetch_zomakelaars,
    fetch_morris,
    fetch_wassenaar,
    fetch_roepman,
    fetch_post,
    fetch_vankleef,
)
from .sure import (
    fetch_schielandborsboom,
    fetch_olsthoorn,
    fetch_vanherk,
    fetch_borgdorff,
)
from .schiedam import (
    fetch_dewittegarantiemakelaars,
    fetch_dens,
    fetch_3dmakelaars,
    fetch_dupont,
)
from .denhaag import fetch_88makelaars
from .overige import fetch_vansilfhout, fetch_vanoord

SCRAPERS = {
    'ankebodewes': fetch_ankebodewes,
    'woongoed': fetch_woongoed,
    'dewittegarantiemakelaars': fetch_dewittegarantiemakelaars,
    'wassenaar': fetch_wassenaar,
    'dens': fetch_dens,
    '3dmakelaars': fetch_3dmakelaars,
    'dupont': fetch_dupont,
    'schielandborsboom': fetch_schielandborsboom,
    'vansilfhout': fetch_vansilfhout,
    'vwmakelaars': fetch_vwmakelaars,
    'roepman': fetch_roepman,
    'zomakelaars': fetch_zomakelaars,
    'post': fetch_post,
    'morris': fetch_morris,
    'olsthoorn': fetch_olsthoorn,
    '88makelaars': fetch_88makelaars,
    'borgdorff': fetch_borgdorff,
    'vanherk': fetch_vanherk,
    'vanoord': fetch_vanoord,
    'vankleef': fetch_vankleef,
}
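The two registration steps from the module docstring can be sketched as follows. `Listing`, `fetch_example`, `fetch_broken`, and `run_all` are hypothetical names for illustration only; the real `RawListing` and the code that consumes `SCRAPERS` are not part of this diff.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Listing:
    """Stand-in for huizenbot.RawListing (only two assumed fields)."""
    url: str
    source_makelaar: str


def fetch_example() -> list[Listing]:
    """Step 1: a scraper is a zero-argument function returning listings."""
    return [Listing(url="https://example.invalid/huis-1", source_makelaar="example")]


def fetch_broken() -> list[Listing]:
    """Simulates a site that is down."""
    raise RuntimeError("site down")


# Step 2: register each scraper under a short name.
SCRAPERS: dict[str, Callable[[], list[Listing]]] = {
    "example": fetch_example,
    "broken": fetch_broken,
}


def run_all(scrapers: dict[str, Callable[[], list[Listing]]]):
    """Run every registered scraper; one failure must not stop the others."""
    results: list[Listing] = []
    failed: list[str] = []
    for name, fetch in scrapers.items():
        try:
            results.extend(fetch())
        except Exception:
            failed.append(name)
    return results, failed


results, failed = run_all(SCRAPERS)
```

Keeping the registry a plain dict of callables means a new makelaar needs no changes outside its submodule and this file.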
79
src/adapters/ssr/_shared.py
Normal file
@@ -0,0 +1,79 @@
"""Shared utilities for all SSR scrapers."""
import logging
import re
import time

import httpx
from bs4 import BeautifulSoup

import config

log = logging.getLogger("huizenbot.ssr")


def fetch_soup(url: str, *, params: dict | None = None) -> BeautifulSoup:
    """GET request → BeautifulSoup. Handles 429 using Retry-After."""
    for attempt in range(3):
        r = httpx.get(
            url,
            params=params,
            headers={"User-Agent": config.USER_AGENT},
            timeout=15,
            follow_redirects=True,
        )
        if r.status_code == 429:
            wait = int(r.headers.get("Retry-After", 60))
            log.warning("429 on %s, waiting %ds", url, wait)
            time.sleep(wait)
            continue
        r.raise_for_status()
        return BeautifulSoup(r.text, "html.parser")

    raise RuntimeError(f"Persistent 429 on {url}")


def parse_prijs(text: str | None) -> int | None:
    """'€ 325.000 k.k.' → 325000"""
    if not text:
        return None
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


def parse_m2(text: str | None) -> int | None:
    """'87 m²' → 87"""
    if not text:
        return None
    m = re.search(r"(\d+)", text.replace(".", ""))
    return int(m.group(1)) if m else None


def _text(soup, selector: str) -> str | None:
    el = soup.select_one(selector)
    return el.get_text(strip=True) if el else None


def _src(soup, selector: str) -> str | None:
    el = soup.select_one(selector)
    if el is None:
        return None
    return el.get("src") or el.get("data-src")


def _extract_postcode(text: str | None) -> str | None:
    if not text:
        return None
    m = re.search(r"\b(\d{4}\s?[A-Z]{2})\b", text)
    return m.group(1).replace(" ", "") if m else None


def _infer_stad(postcode: str | None) -> str | None:
    """Simple mapping based on postcode range; extend as needed."""
    if not postcode:
        return None
    code = int(postcode[:4])
    if 2600 <= code <= 2629:
        return "Delft"
    if 3100 <= code <= 3135:
        return "Schiedam"
    return None
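`fetch_soup` reads `Retry-After` with `int(...)`. HTTP also allows that header to carry an HTTP-date instead of a number of seconds, and `int()` raises `ValueError` on that form. A sketch of a more tolerant parser, assuming a hypothetical helper name `retry_after_seconds` and reusing the 60-second fallback from `fetch_soup`:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, *, default=60, now=None):
    """Hypothetical helper: seconds to wait for a Retry-After header value."""
    if not value:
        return default
    value = value.strip()
    if value.isdigit():
        # delay-seconds form, e.g. "120"
        return int(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable header: fall back rather than crash the scraper
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative sleep
    return max(0, int((when - now).total_seconds()))
```

The `now` parameter exists only so the date branch can be checked with a fixed clock.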
138
src/adapters/ssr/denhaag.py
Normal file
@@ -0,0 +1,138 @@
"""
Den Haag scrapers (custom platforms).

Scrapers: 88makelaars
Note: borgdorff also covers Den Haag but uses the SURE CMS → see sure.py.
"""
import re

import config
from huizenbot import RawListing

from ._shared import fetch_soup, parse_prijs, parse_m2, _text, log


# ---------------------------------------------------------------------------
# 88 Makelaars (Den Haag)
# ---------------------------------------------------------------------------

_88_BASE = "https://88makelaars.nl"

_88_STATUS_MAP = {
    "te koop": "beschikbaar",
    "beschikbaar": "beschikbaar",
    "onder bod": "onder_bod",
    "onder optie": "onder_bod",
    "verkocht onder voorbehoud": "verkocht",
    "verkocht": "verkocht",
}


def _88makelaars_detail(detail_url: str) -> dict:
    """Fetch 88makelaars detail page; extract features (kenmerken) from div.listing_detail kv pairs."""
    try:
        soup = fetch_soup(detail_url)
        kv: dict[str, str] = {}
        for div in soup.select("div.listing_detail"):
            txt = div.get_text(strip=True)
            if ":" in txt:
                label, _, value = txt.partition(":")
                kv[label.strip().lower()] = value.strip()
        raw_pc = kv.get("postcode") or ""
        pc_match = re.search(r"\d{4}\s*[A-Z]{2}", raw_pc.upper())
        postcode = pc_match.group(0).replace(" ", "") if pc_match else None
        return {
            "postcode": postcode,
            "slaapkamers": kv.get("slaapkamers"),
            "woonoppervlak": kv.get("woning grootte"),
            "energielabel": kv.get("energieklasse"),
            "woningtype": kv.get("soort woning"),
        }
    except Exception as e:
        log.warning("88makelaars: detail fetch error %s: %s", detail_url, e)
        return {}


def fetch_88makelaars() -> list[RawListing]:
    """Fetch 88 Makelaars listings (Den Haag only)."""
    listings = []
    page = 1

    while True:
        if page == 1:
            url = f"{_88_BASE}/ons-aanbod/"
        else:
            url = f"{_88_BASE}/ons-aanbod/page/{page}/"
        soup = fetch_soup(url)
        cards = soup.select("div.property_listing")
        if not cards:
            break

        for card in cards:
            try:
                # URL from carousel
                a_tag = card.select_one(".property_unit_carousel a[href]")
                if not a_tag:
                    continue
                detail_url = a_tag["href"]
                if not detail_url.startswith("http"):
                    detail_url = _88_BASE + detail_url

                # City: last link in property_location_image
                loc_links = card.select(".property_location_image a")
                stad = loc_links[-1].get_text(strip=True) if loc_links else None
                if not stad or stad.lower() != "den haag":
                    continue

                # Price
                prijs = parse_prijs(_text(card, ".listing_unit_price_wrapper"))
                if prijs and prijs > config.MAX_PRICE:
                    continue

                # Status
                status_text = (_text(card, ".ribbon-inside") or "").lower()
                status = _88_STATUS_MAP.get(status_text, "beschikbaar")

                # Address
                adres = _text(card, "h4 a") or _text(card, "h4")

                # Surface + rooms
                woonoppervlak_card = parse_m2(_text(card, "span.infosize"))
                kamers_card = None
                rooms_txt = _text(card, "span.inforoom")
                if rooms_txt:
                    m = re.search(r"(\d+)", rooms_txt)
                    kamers_card = int(m.group(1)) if m else None

                # Hero: first active carousel image
                img = card.select_one(".item.active img")
                hero = (img.get("src") or img.get("data-original")) if img else None

                kk = _88makelaars_detail(detail_url)

                listings.append(RawListing(
                    url=detail_url,
                    source_makelaar="88makelaars",
                    status=status,
                    adres=adres,
                    postcode=kk.get("postcode"),
                    stad="Den Haag",
                    prijs=prijs,
                    hero_image_url=hero,
                    woningtype=kk.get("woningtype"),
                    woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
                    kamers=kamers_card,
                    slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
                    energielabel=kk.get("energielabel"),
                ))
                if config.APP_ENV == "dev":
                    break
            except Exception as e:
                log.warning("88makelaars: parse error: %s", e)

        # The dev break above only leaves the card loop; without this check
        # dev mode would keep paginating and collect one listing per page.
        if config.APP_ENV == "dev" and listings:
            break
        if len(cards) < 10:
            break
        page += 1

    log.info("88makelaars: fetched %d listings", len(listings))
    return listings
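`_88makelaars_detail` turns the text of each `div.listing_detail` into a label/value pair with `str.partition`, then pulls the postcode out with a regex. A sketch of the same steps on plain strings, assuming made-up sample texts and a hypothetical helper name `parse_kv`:

```python
import re


def parse_kv(texts):
    """Hypothetical helper: 'Label: value' strings -> {label_lowercase: value}."""
    kv = {}
    for txt in texts:
        if ":" in txt:
            # partition splits on the first colon only, so values may contain colons
            label, _, value = txt.partition(":")
            kv[label.strip().lower()] = value.strip()
    return kv


# Made-up texts in the shape the detail page is assumed to produce.
kv = parse_kv([
    "Postcode: 2512 ab",
    "Slaapkamers:3",
    "Energieklasse: A",
    "Bezichtiging: vanaf 10:30",
    "Geen label",
])

# Same normalisation as in _88makelaars_detail: upper-case, match, drop the space.
pc_match = re.search(r"\d{4}\s*[A-Z]{2}", (kv.get("postcode") or "").upper())
postcode = pc_match.group(0).replace(" ", "") if pc_match else None
```

Lines without a colon are skipped, which is why a missing feature shows up as `None` from `kv.get(...)` rather than as an error.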
288
src/adapters/ssr/overige.py
Normal file
@@ -0,0 +1,288 @@
"""
Other SSR scrapers (no shared CMS platform, multi-city).

Scrapers: vansilfhout (OG Online WordPress), vanoord (Elementor/custom)
"""
import re

import httpx
from bs4 import BeautifulSoup

import config
from huizenbot import RawListing

from ._shared import fetch_soup, parse_prijs, parse_m2, _text, log


# ---------------------------------------------------------------------------
# Van Silfhout & Hogetoorn Wereldmakelaars (Delft), OG Online WordPress
# ---------------------------------------------------------------------------
# All listings on one page. Postcode embedded in JS; detail has shortSpecs.
# Also serves as base for fetch_vwmakelaars and fetch_zomakelaars, which
# happen to use the standard Realworks CMS instead; see realworks.py.

_VANSILFHOUT_BASE = "https://www.vansilfhout.nl"

_VANSILFHOUT_STATUS_MAP = {
    "te koop": "beschikbaar",
    "onder bod": "onder_bod",
    "verkocht": "verkocht",
}


def _vansilfhout_detail(detail_url: str) -> dict:
    """Fetch Van Silfhout detail page; extract postcode from JS and specs from shortSpecs."""
    try:
        # Direct httpx call instead of fetch_soup: the raw HTML is needed for the regex below.
        r = httpx.get(
            detail_url,
            headers={"User-Agent": config.USER_AGENT},
            timeout=15,
            follow_redirects=True,
        )
        r.raise_for_status()
        html = r.text
        soup = BeautifulSoup(html, "html.parser")

        # Postcode embedded in JS: objectZipcode': '2624NP'
        m = re.search(r"objectZipcode':\s*'([^']+)'", html)
        postcode = m.group(1) if m else None

        # shortSpecs: <li><span>Label:</span><span>Value</span></li>
        kv: dict[str, str] = {}
        for li in soup.select(".shortSpecs li"):
            spans = li.select("span")
            if len(spans) >= 2:
                label = spans[0].get_text(strip=True).rstrip(":").lower()
                value = spans[-1].get_text(strip=True)
                kv[label] = value

        return {
            "postcode": postcode,
            "bouwjaar": kv.get("bouwjaar"),
            "woonoppervlak": kv.get("oppervlakte"),
            "kamers": kv.get("kamers"),
            "slaapkamers": kv.get("slaapkamers"),
        }
    except Exception as e:
        log.warning("vansilfhout: detail fetch error %s: %s", detail_url, e)
        return {}


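The postcode lookup in `_vansilfhout_detail` is a plain regex over the raw HTML rather than a DOM query. A sketch against a made-up script fragment; the surrounding JavaScript is an assumption, only the `objectZipcode': '2624NP'` shape comes from the comment in the function:

```python
import re

# Made-up page fragment; only the objectZipcode key/value shape is taken from the diff.
html = "<script>dataLayer.push({'objectCity': 'Delft', 'objectZipcode': '2624NP'});</script>"

pattern = r"objectZipcode':\s*'([^']+)'"

m = re.search(pattern, html)
postcode = m.group(1) if m else None

# A page without the key yields no match, so the caller ends up with None.
missing = re.search(pattern, "<html><body>no script here</body></html>")
```

Because the pattern anchors on the key name and quote style, a change in how the site serialises that object would silently produce `None` postcodes instead of an error.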
def fetch_vansilfhout() -> list[RawListing]:
    """Fetch Van Silfhout listings (all listings are on one page)."""
    soup = fetch_soup(f"{_VANSILFHOUT_BASE}/woningaanbod/")
    listings = []

    for card in soup.select("article.row"):
        try:
            a_tag = card.select_one("a.objectcontainerimg")
            if not a_tag or "href" not in a_tag.attrs:
                continue
            detail_url = a_tag["href"]
            if not detail_url.startswith("http"):
                detail_url = _VANSILFHOUT_BASE + detail_url

            # Status
            status_text = (_text(card, "span.objectstatus") or "").lower()
            status = _VANSILFHOUT_STATUS_MAP.get(status_text, "beschikbaar")

            # Address and city
            adres = _text(card, "h2.objecttitle")
            city_el = card.select("a.straatnaamwoonplaats span")
            stad = city_el[-1].get_text(strip=True) if city_el else None

            # Price from shortSpecs strong
            prijs = parse_prijs(_text(card, "ul.shortSpecs li strong"))
            if prijs and prijs > config.MAX_PRICE:
                continue

            # Area and rooms from shortSpecs
            woonoppervlak_card = None
            kamers_card = None
            for li in card.select("ul.shortSpecs li"):
                spans = li.select("span")
                if len(spans) >= 2:
                    label = spans[0].get_text(strip=True).lower()
                    val = spans[-1].get_text(strip=True)
                    if "oppervlakt" in label:
                        woonoppervlak_card = parse_m2(val)
                    elif "kamer" in label:
                        m = re.search(r"(\d+)", val)
                        kamers_card = int(m.group(1)) if m else None

            # Hero image: prefer data-lazy-src, fall back to noscript img src
            img_tag = card.select_one("a.objectcontainerimg img")
            hero = None
            if img_tag:
                hero = (img_tag.get("data-lazy-src")
                        or img_tag.get("src") or None)
                if hero and hero.startswith("data:"):
                    noscript = card.select_one("noscript img")
                    hero = noscript["src"] if noscript else None

            kk = _vansilfhout_detail(detail_url)

            # Parse kamers/slaapkamers from detail
            kamers = kamers_card
            if kk.get("kamers"):
                m = re.search(r"(\d+)", kk["kamers"])
                kamers = int(m.group(1)) if m else kamers_card

            slaapkamers = None
            if kk.get("slaapkamers"):
                m = re.search(r"(\d+)", kk["slaapkamers"])
                slaapkamers = int(m.group(1)) if m else None

            listings.append(RawListing(
                url=detail_url,
                source_makelaar="vansilfhout",
                status=status,
                adres=adres,
                postcode=kk.get("postcode"),
                stad=stad,
                prijs=prijs,
                hero_image_url=hero,
                # isdigit() guard: a non-numeric year must not drop the whole listing
                bouwjaar=int(kk["bouwjaar"]) if (kk.get("bouwjaar") or "").isdigit() else None,
                woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
                kamers=kamers,
                slaapkamers=slaapkamers,
            ))
            if config.APP_ENV == "dev":
                break
        except Exception as e:
            log.warning("vansilfhout: parse error: %s", e)

    log.info("vansilfhout: fetched %d listings", len(listings))
    return listings


|
# ---------------------------------------------------------------------------
|
||||||
|
# Van Oord Makelaardij (Delft + Schiedam) — Elementor/custom WordPress
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Separate listing pages per city; detail page has rw-object-features-list.
|
||||||
|
|
||||||
|
_VANOORD_BASE = "https://www.vanoordmakelaardij.nl"
|
||||||
|
_VANOORD_LISTINGS = [
|
||||||
|
f"https://www.vanoordmakelaardij.nl/aanbod/?_price=0%2C{config.MAX_PRICE}&_city=Delft&_availability=Te+koop",
|
||||||
|
f"https://www.vanoordmakelaardij.nl/aanbod/?_price=0%2C{config.MAX_PRICE}&_city=Schiedam&_availability=Te+koop",
|
||||||
|
]
|
||||||
|
|
||||||
|
_VANOORD_STATUS_MAP = {
|
||||||
|
"te koop": "beschikbaar",
|
||||||
|
"onder bod": "onder_bod",
|
||||||
|
"verkocht": "verkocht",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _vanoord_detail(detail_url: str) -> dict:
|
||||||
|
"""Fetch Van Oord detail page; extract kenmerken from rw-object-features-list and postcode."""
|
||||||
|
try:
|
||||||
|
soup = fetch_soup(detail_url)
|
||||||
|
kv: dict[str, str] = {}
|
||||||
|
for li in soup.select("ul.rw-object-features-list li"):
|
||||||
|
label_el = li.select_one("span.rw-object-list-label")
|
||||||
|
value_el = li.select_one("span.rw-object-list-value")
|
||||||
|
if label_el and value_el:
|
||||||
|
label = label_el.get_text(strip=True).lower()
|
||||||
|
value = value_el.get_text(strip=True)
|
||||||
|
kv[label] = value
|
||||||
|
# Postcode is in first .elementor-heading-title (e.g. "3562 TN,")
|
||||||
|
headings = soup.select(".elementor-heading-title")
|
||||||
|
postcode = None
|
||||||
|
if headings:
|
||||||
|
postcode = headings[0].get_text(strip=True).rstrip(",").strip()
|
||||||
|
return {
|
||||||
|
"status": kv.get("status", "").lower(),
|
||||||
|
"postcode": postcode,
|
||||||
|
"bouwjaar": kv.get("bouwjaar"),
|
||||||
|
"woonoppervlak": kv.get("woonoppervlakte"),
|
||||||
|
"kamers": kv.get("aantal kamers"),
|
||||||
|
"slaapkamers": kv.get("slaapkamers"),
|
||||||
|
"energielabel": kv.get("energieklasse"),
|
||||||
|
}
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("vanoord: detail fetch fout %s: %s", detail_url, e)
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
def fetch_vanoord() -> list[RawListing]:
    """Fetch Van Oord listings; Delft and Schiedam, only koop."""
    seen: set[str] = set()
    listings = []

    for listing_url in _VANOORD_LISTINGS:
        soup = fetch_soup(listing_url)
        cards = soup.select("div.e-loop-item")

        for card in cards:
            try:
                # Detail URL from h3 > a
                a_tag = card.select_one("h3.elementor-heading-title a[href]")
                if not a_tag:
                    continue
                detail_url = a_tag["href"]
                if not detail_url.startswith("http"):
                    detail_url = _VANOORD_BASE + detail_url
                if detail_url in seen:
                    continue
                seen.add(detail_url)

                # Status from rw-status-label widget class
                status_el = card.select_one("[class*='rw-status-label--']")
                status = "beschikbaar"
                if status_el:
                    status_text = status_el.get_text(strip=True).lower()
                    status = _VANOORD_STATUS_MAP.get(status_text, "beschikbaar")

                # City from h4
                h4 = card.select_one("h4.elementor-heading-title")
                stad = h4.get_text(strip=True) if h4 else None

                # Address from h3 > a text
                adres = " ".join(a_tag.get_text().split())

                # Price from h3 without <a> child
                prijs = None
                for h3 in card.select("h3.elementor-heading-title"):
                    if not h3.select_one("a"):
                        prijs = parse_prijs(h3.get_text())
                        break
                if prijs and prijs > config.MAX_PRICE:
                    continue

                # Card icon list: [0]=surface [1]=rooms [2]=energy
                icon_items = card.select("ul.elementor-icon-list-items li span.elementor-icon-list-text")
                woonoppervlak_card = parse_m2(icon_items[0].get_text()) if len(icon_items) > 0 else None
                kamers_card = None
                if len(icon_items) > 1:
                    m = re.search(r"(\d+)", icon_items[1].get_text())
                    kamers_card = int(m.group(1)) if m else None
                energielabel_card = icon_items[2].get_text(strip=True) if len(icon_items) > 2 else None

                kk = _vanoord_detail(detail_url)

                detail_status = _VANOORD_STATUS_MAP.get(kk.get("status", ""), "")
                if detail_status:
                    status = detail_status

                # _vanoord_detail returns None for labels missing on the page,
                # so coerce to "" before calling isdigit().
                listings.append(RawListing(
                    url=detail_url,
                    source_makelaar="vanoord",
                    status=status,
                    adres=adres,
                    postcode=kk.get("postcode"),
                    stad=stad,
                    prijs=prijs,
                    bouwjaar=int(kk["bouwjaar"]) if (kk.get("bouwjaar") or "").isdigit() else None,
                    woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
                    kamers=(int(kk["kamers"]) if (kk.get("kamers") or "").isdigit() else None) or kamers_card,
                    slaapkamers=int(kk["slaapkamers"]) if (kk.get("slaapkamers") or "").isdigit() else None,
                    energielabel=kk.get("energielabel") or energielabel_card,
                ))
                if config.APP_ENV == "dev":
                    break
            except Exception as e:
                log.warning("vanoord: parse fout: %s", e)

    log.info("vanoord: %d listings opgehaald", len(listings))
    return listings
||||||
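The scraper above turns scraped strings such as `bouwjaar` and `kamers` into integers with inline `isdigit()` checks. A minimal sketch of how that coercion could be factored out follows. `to_int` is a hypothetical helper, not something in this repository, and unlike `isdigit()` it takes the first run of digits, so "1930-1944" would yield 1930 rather than None.

```python
import re
from typing import Optional


def to_int(value: Optional[str]) -> Optional[int]:
    """Return the first run of digits in `value` as an int, or None.

    Hypothetical helper: tolerates None, empty strings, and values with
    surrounding text such as "4 kamers".
    """
    if not value:
        return None
    match = re.search(r"\d+", value)
    return int(match.group()) if match else None
```

With a helper like this, a field such as `bouwjaar=to_int(kk.get("bouwjaar"))` would not raise when the detail page lacks the label.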
568
src/adapters/ssr/realworks.py
Normal file
@@ -0,0 +1,568 @@
"""
Realworks CMS scrapers.

All makelaars here run the Realworks CMS. Listings come from paginated
/aanbod/woningaanbod/-{price}/koop/ pages; detail pages have span.kenmerk
label/value pairs. Some variants (Wassenaar, Roepman) expose listing-level
data via JSON-LD instead of card HTML.

Scrapers: ankebodewes, woongoed, vwmakelaars, zomakelaars, morris,
          vankleef, wassenaar, roepman, post
"""
import json as _json
import re

import config
from huizenbot import RawListing

from ._shared import fetch_soup, parse_prijs, parse_m2, _text, log
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Shared Realworks helpers
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
_REALWORKS_STATUS_MAP = {
    "te koop": "beschikbaar",
    "nieuw": "beschikbaar",
    "onder bod": "onder_bod",
    "onder optie": "onder_bod",
    "verkocht o.v.": "verkocht",
    "verkocht": "verkocht",
}


def _realworks_detail(detail_url: str, makelaar: str) -> dict:
    """Fetch a Realworks detail page and extract kenmerken. Returns empty dict on failure."""
    try:
        soup = fetch_soup(detail_url)

        # Build a label→value map from all .kenmerk spans
        kv: dict[str, str] = {}
        for kenmerk in soup.select("span.kenmerk"):
            label_el = kenmerk.select_one("span.kenmerkName")
            value_el = kenmerk.select_one("span.kenmerkValue")
            if label_el and value_el:
                label = label_el.get_text(strip=True).lower()
                value = value_el.get_text(strip=True)
                kv[label] = value

        return {
            "woningtype": kv.get("type woning"),
            "bouwjaar": kv.get("bouwjaar"),
            "woonoppervlak": kv.get("woonoppervlakte"),
            "perceeloppervlak": kv.get("perceeloppervlakte"),
            "kamers": kv.get("aantal kamers"),
            "slaapkamers": kv.get("aantal slaapkamers"),
            "energielabel": kv.get("energieklasse"),
        }
    except Exception as e:
        log.warning("%s: detail fetch fout %s: %s", makelaar, detail_url, e)
        return {}
|
|
||||||
|
|
||||||
def fetch_realworks(base_url: str, makelaar: str) -> list[RawListing]:
    """
    Generic fetcher for Realworks CMS brokers.
    Paginates via /pagina-{n}/, fetches detail page per listing.
    """
    listings_path = f"/aanbod/woningaanbod/-{config.MAX_PRICE}/koop"
    listings = []
    page = 1

    while True:
        url = f"{base_url}{listings_path}/pagina-{page}/"
        soup = fetch_soup(url)
        cards = soup.select("li.aanbodEntry")
        if not cards:
            break

        for card in cards:
            try:
                a_tag = card.select_one("a.aanbodEntryLink")
                if not a_tag:
                    continue
                listing_url = base_url + a_tag["href"]

                adres = _text(card, ".street-address")
                postcode = (_text(card, ".postal-code") or "").replace(" ", "") or None
                stad = _text(card, ".locality")
                prijs = parse_prijs(_text(card, ".koopprijs .kenmerkValue"))

                status_text = (_text(card, ".objectstatusbanner") or "").lower()
                status = _REALWORKS_STATUS_MAP.get(status_text, "beschikbaar")

                img_tag = card.select_one(".hoofdfoto img")
                hero = img_tag["src"] if img_tag else None

                kk = _realworks_detail(listing_url, makelaar)

                listings.append(RawListing(
                    url=listing_url,
                    source_makelaar=makelaar,
                    adres=adres,
                    postcode=postcode,
                    stad=stad,
                    prijs=prijs,
                    status=status,
                    hero_image_url=hero,
                    woningtype=kk.get("woningtype"),
                    bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
                    woonoppervlak=parse_m2(kk.get("woonoppervlak")),
                    perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
                    kamers=int(kk["kamers"]) if kk.get("kamers") else None,
                    slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
                    energielabel=kk.get("energielabel"),
                ))
                if config.APP_ENV == "dev":
                    break
            except Exception as e:
                log.warning("%s: parse fout: %s", makelaar, e)

        if len(cards) < 10:
            break
        page += 1

    log.info("%s: %d listings opgehaald", makelaar, len(listings))
    return listings
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Simple Realworks wrappers (one-liners)
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
def fetch_ankebodewes() -> list[RawListing]:
    return fetch_realworks("https://www.ankebodewes.nl", "ankebodewes")


def fetch_woongoed() -> list[RawListing]:
    return fetch_realworks("https://www.woongoedmakelaars.nl", "woongoed")


def fetch_vwmakelaars() -> list[RawListing]:
    return fetch_realworks("https://www.vwmakelaars.nl", "vwmakelaars")


def fetch_zomakelaars() -> list[RawListing]:
    return fetch_realworks("https://www.zomakelaars.nl", "zomakelaars")


def fetch_morris() -> list[RawListing]:
    return fetch_realworks("https://www.morrismakelaardij.nl", "morris")
|
|
||||||
|
|
||||||
|
def fetch_vankleef() -> list[RawListing]:
|
||||||
|
"""Fetch Van Kleef makelaars — only Schiedam, as specified."""
|
||||||
|
    listings_path = "/aanbod/woningaanbod/schiedam/koop"
|
||||||
|
listings = []
|
||||||
|
page = 1
|
||||||
|
|
||||||
|
while True:
|
||||||
|
url = f"https://www.vankleefmakelaars.nl{listings_path}/pagina-{page}/"
|
||||||
|
soup = fetch_soup(url)
|
||||||
|
cards = soup.select("li.aanbodEntry")
|
||||||
|
if not cards:
|
||||||
|
break
|
||||||
|
|
||||||
|
for card in cards:
|
||||||
|
try:
|
||||||
|
a_tag = card.select_one("a.aanbodEntryLink")
|
||||||
|
if not a_tag:
|
||||||
|
continue
|
||||||
|
listing_url = "https://www.vankleefmakelaars.nl" + a_tag["href"]
|
||||||
|
|
||||||
|
adres = _text(card, ".street-address")
|
||||||
|
postcode = (_text(card, ".postal-code") or "").replace(" ", "") or None
|
||||||
|
stad = _text(card, ".locality")
|
||||||
|
prijs = parse_prijs(_text(card, ".koopprijs .kenmerkValue"))
|
||||||
|
|
||||||
|
if prijs and prijs > config.MAX_PRICE:
|
||||||
|
continue
|
||||||
|
|
||||||
|
status_text = (_text(card, ".objectstatusbanner") or "").lower()
|
||||||
|
status = _REALWORKS_STATUS_MAP.get(status_text, "beschikbaar")
|
||||||
|
|
||||||
|
img_tag = card.select_one(".hoofdfoto img")
|
||||||
|
hero = img_tag["src"] if img_tag else None
|
||||||
|
|
||||||
|
kk = _realworks_detail(listing_url, "vankleef")
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=listing_url,
|
||||||
|
source_makelaar="vankleef",
|
||||||
|
adres=adres,
|
||||||
|
postcode=postcode,
|
||||||
|
stad=stad,
|
||||||
|
prijs=prijs,
|
||||||
|
status=status,
|
||||||
|
hero_image_url=hero,
|
||||||
|
woningtype=kk.get("woningtype"),
|
||||||
|
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
|
||||||
|
woonoppervlak=parse_m2(kk.get("woonoppervlak")),
|
||||||
|
perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
|
||||||
|
kamers=int(kk["kamers"]) if kk.get("kamers") else None,
|
||||||
|
slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
|
||||||
|
energielabel=kk.get("energielabel"),
|
||||||
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("vankleef: parse fout: %s", e)
|
||||||
|
|
||||||
|
if len(cards) < 10:
|
||||||
|
break
|
||||||
|
page += 1
|
||||||
|
|
||||||
|
log.info("vankleef: %d listings opgehaald", len(listings))
|
||||||
|
return listings
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Makelaardij Wassenaar (Schiedam) — Realworks CMS, JSON-LD listing page
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Listings page has JSON-LD (Residence) with url/address/price/photo.
|
||||||
|
# Detail pages have span.kenmerk with Wassenaar-specific label names.
|
||||||
|
|
||||||
_WASSENAAR_BASE = "https://www.makelaardijwassenaar.nl"

_WASSENAAR_STATUS_MAP = {
    "te koop": "beschikbaar",
    "nieuw": "beschikbaar",
    "onder bod": "onder_bod",
    "onder optie": "onder_bod",
    "verkocht o.v.": "onder_bod",
    "verkocht onder voorbehoud": "onder_bod",
    "verkocht": "verkocht",
}


def _wassenaar_detail(detail_url: str) -> dict:
    """Fetch Realworks detail page; extract kenmerken with Wassenaar-specific labels."""
    try:
        soup = fetch_soup(detail_url)
        kv: dict[str, str] = {}
        for kenmerk in soup.select("span.kenmerk"):
            label_el = kenmerk.select_one("span.kenmerkName")
            value_el = kenmerk.select_one("span.kenmerkValue")
            if label_el and value_el:
                kv[label_el.get_text(strip=True).lower()] = value_el.get_text(strip=True)
        return {
            "woningtype": kv.get("soort object"),
            "bouwjaar": kv.get("bouwjaar"),
            "woonoppervlak": kv.get("woonoppervlakte"),
            "perceeloppervlak": kv.get("perceeloppervlakte"),
            "kamers": kv.get("aantal kamers"),
            "slaapkamers": kv.get("aantal slaapkamers"),
            "energielabel": kv.get("energieklasse"),
        }
    except Exception as e:
        log.warning("wassenaar: detail fetch fout %s: %s", detail_url, e)
        return {}
|
|
||||||
|
|
||||||
|
def fetch_wassenaar() -> list[RawListing]:
|
||||||
|
soup = fetch_soup(f"{_WASSENAAR_BASE}/aanbod/woningaanbod/-{config.MAX_PRICE}/koop/")
|
||||||
|
|
||||||
|
# First pass: collect status + thumbnail per relative url
|
||||||
|
# Each listing has two a.aanbodEntryLink with the same href;
|
||||||
|
# the first has the status banner + photo, the second has address + price.
|
||||||
|
status_by_url: dict[str, str] = {}
|
||||||
|
photo_by_url: dict[str, str] = {}
|
||||||
|
for a in soup.select("a.aanbodEntryLink[href]"):
|
||||||
|
href = a["href"]
|
||||||
|
if href in status_by_url:
|
||||||
|
continue
|
||||||
|
banner = a.select_one(".objectstatusbanner")
|
||||||
|
status_text = banner.get_text(strip=True).lower() if banner else ""
|
||||||
|
status_by_url[href] = _WASSENAAR_STATUS_MAP.get(status_text, "beschikbaar")
|
||||||
|
img = a.select_one("span.hoofdfoto img")
|
||||||
|
if img:
|
||||||
|
src = img.get("src", "")
|
||||||
|
if "geenfotobeschikbaar" not in src:
|
||||||
|
photo_by_url[href] = src
|
||||||
|
|
||||||
|
# Second pass: parse JSON-LD blocks (one per listing)
|
||||||
|
seen: set[str] = set()
|
||||||
|
listings = []
|
||||||
|
for tag in soup.select('script[type="application/ld+json"]'):
|
||||||
|
try:
|
||||||
|
ld = _json.loads(tag.string)
|
||||||
|
if ld.get("@type") != "Residence":
|
||||||
|
continue
|
||||||
|
rel_url = ld.get("url", "")
|
||||||
|
if not rel_url or rel_url in seen:
|
||||||
|
continue
|
||||||
|
seen.add(rel_url)
|
||||||
|
|
||||||
|
detail_url = _WASSENAAR_BASE + rel_url
|
||||||
|
address = ld.get("address", {})
|
||||||
|
postcode = address.get("postalCode", "").replace(" ", "") or None
|
||||||
|
|
||||||
|
price_spec = next(
|
||||||
|
(a.get("priceSpecification", {}) for a in ld.get("potentialAction", [])
|
||||||
|
if a.get("priceSpecification")),
|
||||||
|
{}
|
||||||
|
)
|
||||||
|
prijs = int(price_spec["price"]) if price_spec.get("price") else None
|
||||||
|
if prijs and prijs > config.MAX_PRICE:
|
||||||
|
continue
|
||||||
|
|
||||||
|
hero = ld.get("photo") or photo_by_url.get(rel_url)
|
||||||
|
status = status_by_url.get(rel_url, "beschikbaar")
|
||||||
|
kk = _wassenaar_detail(detail_url)
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=detail_url,
|
||||||
|
source_makelaar="wassenaar",
|
||||||
|
status=status,
|
||||||
|
adres=address.get("streetAddress") or None,
|
||||||
|
postcode=postcode,
|
||||||
|
stad=address.get("addressLocality") or None,
|
||||||
|
prijs=prijs,
|
||||||
|
hero_image_url=hero,
|
||||||
|
woningtype=kk.get("woningtype"),
|
||||||
|
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
|
||||||
|
woonoppervlak=parse_m2(kk.get("woonoppervlak")),
|
||||||
|
perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
|
||||||
|
kamers=int(kk["kamers"]) if kk.get("kamers") else None,
|
||||||
|
slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
|
||||||
|
energielabel=kk.get("energielabel"),
|
||||||
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("wassenaar: parse fout: %s", e)
|
||||||
|
|
||||||
|
log.info("wassenaar: %d listings opgehaald", len(listings))
|
||||||
|
return listings
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Roepman Makelaardij NVM (Delft) — Realworks CMS, JSON-LD listing page
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Uses div.aanbodEntry instead of li.aanbodEntry; price from JSON-LD.
|
||||||
|
|
||||||
|
_ROEPMAN_BASE = "https://www.roepman.nl"
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_roepman() -> list[RawListing]:
|
||||||
|
listings_path = f"/aanbod/woningaanbod/-{config.MAX_PRICE}/koop"
|
||||||
|
listings = []
|
||||||
|
page = 1
|
||||||
|
|
||||||
|
while True:
|
||||||
|
url = f"{_ROEPMAN_BASE}{listings_path}/pagina-{page}/"
|
||||||
|
soup = fetch_soup(url)
|
||||||
|
cards = soup.select("div.aanbodEntry")
|
||||||
|
if not cards:
|
||||||
|
break
|
||||||
|
|
||||||
|
# Collect status + photo per relative url
|
||||||
|
status_by_url: dict[str, str] = {}
|
||||||
|
photo_by_url: dict[str, str] = {}
|
||||||
|
for card in cards:
|
||||||
|
a_tag = card.select_one("a.aanbodEntryLink[href]")
|
||||||
|
if not a_tag:
|
||||||
|
continue
|
||||||
|
href = a_tag["href"]
|
||||||
|
if href in status_by_url:
|
||||||
|
continue
|
||||||
|
banner = card.select_one(".objectstatusbanner")
|
||||||
|
status_text = banner.get_text(strip=True).lower() if banner else ""
|
||||||
|
status_by_url[href] = _REALWORKS_STATUS_MAP.get(status_text, "beschikbaar")
|
||||||
|
img = card.select_one("img")
|
||||||
|
if img:
|
||||||
|
src = img.get("src", "")
|
||||||
|
if "geenfotobeschikbaar" not in src:
|
||||||
|
photo_by_url[href] = src
|
||||||
|
|
||||||
|
# Parse JSON-LD Residence blocks (one per listing)
|
||||||
|
seen: set[str] = set()
|
||||||
|
for tag in soup.select('script[type="application/ld+json"]'):
|
||||||
|
try:
|
||||||
|
ld = _json.loads(tag.string)
|
||||||
|
if ld.get("@type") != "Residence":
|
||||||
|
continue
|
||||||
|
rel_url = ld.get("url", "")
|
||||||
|
if not rel_url or rel_url in seen:
|
||||||
|
continue
|
||||||
|
seen.add(rel_url)
|
||||||
|
|
||||||
|
detail_url = _ROEPMAN_BASE + rel_url
|
||||||
|
address = ld.get("address", {})
|
||||||
|
postcode = address.get("postalCode", "").replace(" ", "") or None
|
||||||
|
|
||||||
|
price_spec = next(
|
||||||
|
(a.get("priceSpecification", {}) for a in ld.get("potentialAction", [])
|
||||||
|
if a.get("priceSpecification")),
|
||||||
|
{}
|
||||||
|
)
|
||||||
|
prijs = int(price_spec["price"]) if price_spec.get("price") else None
|
||||||
|
if prijs and prijs > config.MAX_PRICE:
|
||||||
|
continue
|
||||||
|
|
||||||
|
hero = ld.get("photo") or photo_by_url.get(rel_url)
|
||||||
|
status = status_by_url.get(rel_url, "beschikbaar")
|
||||||
|
kk = _realworks_detail(detail_url, "roepman")
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=detail_url,
|
||||||
|
source_makelaar="roepman",
|
||||||
|
status=status,
|
||||||
|
adres=address.get("streetAddress") or None,
|
||||||
|
postcode=postcode,
|
||||||
|
stad=address.get("addressLocality") or None,
|
||||||
|
prijs=prijs,
|
||||||
|
hero_image_url=hero,
|
||||||
|
woningtype=kk.get("woningtype"),
|
||||||
|
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
|
||||||
|
woonoppervlak=parse_m2(kk.get("woonoppervlak")),
|
||||||
|
perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
|
||||||
|
kamers=int(kk["kamers"]) if kk.get("kamers") else None,
|
||||||
|
slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
|
||||||
|
energielabel=kk.get("energielabel"),
|
||||||
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("roepman: parse fout: %s", e)
|
||||||
|
|
||||||
|
if len(cards) < 10:
|
||||||
|
break
|
||||||
|
page += 1
|
||||||
|
|
||||||
|
log.info("roepman: %d listings opgehaald", len(listings))
|
||||||
|
return listings
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Post Makelaardij (Delft) — Realworks CMS, custom detail parser
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
_POST_BASE = "https://www.postmakelaardij.nl"

_POST_STATUS_MAP = {
    "te koop": "beschikbaar",
    "onder bod": "onder_bod",
    "verkocht": "verkocht",
}


def _post_detail(detail_url: str) -> dict:
    """Fetch Post Makelaardij detail page and extract kenmerken."""
    try:
        soup = fetch_soup(detail_url)

        # Energielabel from CSS class: energielabel-{letter}
        energielabel = None
        for el in soup.select('[class]'):
            for cls in el.get('class', []):
                if cls.startswith('energielabel-') and cls != 'energielabel':
                    energielabel = cls.replace('energielabel-', '').upper()
                    break
            if energielabel:
                break

        # Woonoppervlak, perceeloppervlak, slaapkamers from icon spans
        woonoppervlak = None
        perceeloppervlak = None
        slaapkamers = None
        for span in soup.select('span.object-info-icon-text'):
            txt = span.get_text(strip=True)
            if 'slaapkamer' in txt:
                m = re.search(r'(\d+)', txt)
                slaapkamers = int(m.group(1)) if m else None
            elif 'perceel' in txt:
                perceeloppervlak = parse_m2(txt)
            elif 'm²' in txt or 'm2' in txt:
                woonoppervlak = parse_m2(txt)

        return {
            "woonoppervlak": woonoppervlak,
            "perceeloppervlak": perceeloppervlak,
            "slaapkamers": slaapkamers,
            "energielabel": energielabel,
        }
    except Exception as e:
        log.warning("post: detail fetch fout %s: %s", detail_url, e)
        return {}
|
|
||||||
|
|
||||||
|
def fetch_post() -> list[RawListing]:
|
||||||
|
"""Fetch Post Makelaardij listings; only Delft, only koop."""
|
||||||
|
listings = []
|
||||||
|
page = 1
|
||||||
|
|
||||||
|
while True:
|
||||||
|
url = f"{_POST_BASE}/woningaanbod/koop?page={page}"
|
||||||
|
soup = fetch_soup(url)
|
||||||
|
cards = soup.select("article")
|
||||||
|
if not cards:
|
||||||
|
break
|
||||||
|
|
||||||
|
for card in cards:
|
||||||
|
try:
|
||||||
|
# URL — first link in image slider
|
||||||
|
a_tag = card.select_one("a[href]")
|
||||||
|
if not a_tag:
|
||||||
|
continue
|
||||||
|
href = a_tag["href"]
|
||||||
|
detail_url = href if href.startswith("http") else _POST_BASE + href
|
||||||
|
|
||||||
|
# Postcode + city from span.custom-postcode-text
|
||||||
|
pc_el = card.select_one("span.custom-postcode-text")
|
||||||
|
if not pc_el:
|
||||||
|
continue
|
||||||
|
pc_parts = pc_el.get_text(strip=True).split()
|
||||||
|
if len(pc_parts) < 3:
|
||||||
|
continue
|
||||||
|
postcode = pc_parts[0] + pc_parts[1] # "2613BD"
|
||||||
|
stad = " ".join(pc_parts[2:]) # "Delft"
|
||||||
|
|
||||||
|
# Filter: only Delft
|
||||||
|
if stad.lower() != "delft":
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Price — filter early
|
||||||
|
prijs = parse_prijs(_text(card, "span.price-block"))
|
||||||
|
if prijs and prijs > config.MAX_PRICE:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Status from span.status text
|
||||||
|
status_text = (_text(card, "span.status") or "").lower()
|
||||||
|
status = _POST_STATUS_MAP.get(status_text, "beschikbaar")
|
||||||
|
|
||||||
|
# Address
|
||||||
|
adres = _text(card, "h4.custom-address-text")
|
||||||
|
|
||||||
|
# Hero: first img in article
|
||||||
|
img = card.select_one("img")
|
||||||
|
hero = img["src"] if img else None
|
||||||
|
|
||||||
|
kk = _post_detail(detail_url)
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=detail_url,
|
||||||
|
source_makelaar="post",
|
||||||
|
status=status,
|
||||||
|
adres=adres,
|
||||||
|
postcode=postcode,
|
||||||
|
stad=stad,
|
||||||
|
prijs=prijs,
|
||||||
|
hero_image_url=hero,
|
||||||
|
woonoppervlak=kk.get("woonoppervlak"),
|
||||||
|
perceeloppervlak=kk.get("perceeloppervlak"),
|
||||||
|
slaapkamers=kk.get("slaapkamers"),
|
||||||
|
energielabel=kk.get("energielabel"),
|
||||||
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("post: parse fout: %s", e)
|
||||||
|
|
||||||
|
if len(cards) < 12:
|
||||||
|
break
|
||||||
|
page += 1
|
||||||
|
|
||||||
|
log.info("post: %d listings opgehaald", len(listings))
|
||||||
|
return listings
|
||||||
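The Wassenaar and Roepman scrapers above read the price from a JSON-LD `Residence` block by walking `potentialAction` and taking the first `priceSpecification`. A minimal, standard-library sketch of that lookup follows. `residence_price` and the `SAMPLE` payload are hypothetical, and the payload shape is only inferred from how the scrapers access it, not taken from a real page.

```python
import json
from typing import Optional


def residence_price(ld_text: Optional[str]) -> Optional[int]:
    """Return the asking price from a JSON-LD Residence block, or None."""
    try:
        ld = json.loads(ld_text)
    except (TypeError, ValueError):
        # TypeError: ld_text was None; ValueError: invalid JSON.
        return None
    if not isinstance(ld, dict) or ld.get("@type") != "Residence":
        return None
    for action in ld.get("potentialAction", []):
        spec = action.get("priceSpecification") or {}
        if spec.get("price"):
            # Go through float() so both "325000" and "325000.00" parse.
            return int(float(spec["price"]))
    return None


# Illustrative payload only; the field layout is an assumption.
SAMPLE = json.dumps({
    "@type": "Residence",
    "url": "/woning/voorbeeld",
    "potentialAction": [
        {"@type": "BuyAction", "priceSpecification": {"price": "325000"}},
    ],
})
```

Returning None for anything that is not a well-formed `Residence` dict lets the caller skip a block without relying on a broad `except Exception`.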
542
src/adapters/ssr/schiedam.py
Normal file
@@ -0,0 +1,542 @@
"""
Custom Schiedam scrapers (no shared CMS platform).

Each makelaar here uses a bespoke site structure that required its own parser.

Scrapers: dewittegarantiemakelaars (JSON-LD), dens, 3dmakelaars, dupont
"""
import re

import config
from huizenbot import RawListing

from ._shared import (
    fetch_soup, parse_prijs, parse_m2, _text,
    _extract_postcode, _infer_stad, log,
)
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# De Witte Garantiemakelaars (Schiedam)
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Listing cards have a pill badge for status. All detail data comes from
|
||||||
|
# JSON-LD (schema.org BuyAction/Offer) on the detail page.
|
||||||
|
|
||||||
_DEWITTE_BASE = "https://dewittegarantiemakelaars.nl"

_DEWITTE_PILL_MAP = {
    "bg-fun-green": "beschikbaar",
    "bg-sold": "verkocht",
}

_DEWITTE_TYPE_MAP = {
    "Apartment": "appartement",
    "House": "woning",
    "SingleFamilyResidence": "woning",
    "Residence": "woning",
}


def _dewitte_jsonld(detail_url: str) -> dict:
    """Fetch detail page and return parsed JSON-LD dict, or {} on failure."""
    import json
    try:
        soup = fetch_soup(detail_url)
        tag = soup.select_one('script[type="application/ld+json"]')
        if not tag:
            log.warning("dewitte: geen JSON-LD op %s", detail_url)
            return {}
        return json.loads(tag.string)
    except Exception as e:
        log.warning("dewitte: JSON-LD fout %s: %s", detail_url, e)
        return {}
|
|
||||||
|
|
||||||
|
def fetch_dewittegarantiemakelaars() -> list[RawListing]:
|
||||||
|
listings = []
|
||||||
|
page = 1
|
||||||
|
|
||||||
|
while True:
|
||||||
|
url = (
|
||||||
|
f"{_DEWITTE_BASE}/woningaanbod"
|
||||||
|
f"?buy_rent=buy&buy_price=1-{config.MAX_PRICE}&page={page}"
|
||||||
|
)
|
||||||
|
soup = fetch_soup(url)
|
||||||
|
cards = soup.select("div.card.card--property")
|
||||||
|
if not cards:
|
||||||
|
break
|
||||||
|
|
||||||
|
for card in cards:
|
||||||
|
try:
|
||||||
|
a_tag = card.select_one("a.card__anchor")
|
||||||
|
if not a_tag:
|
||||||
|
continue
|
||||||
|
detail_url = a_tag["href"]
|
||||||
|
if not detail_url.startswith("http"):
|
||||||
|
detail_url = _DEWITTE_BASE + detail_url
|
||||||
|
|
||||||
|
pill = card.select_one("span.pill")
|
||||||
|
pill_classes = pill.get("class", []) if pill else []
|
||||||
|
status_key = next(
|
||||||
|
(c for c in pill_classes if c.startswith("bg-")), None
|
||||||
|
)
|
||||||
|
status = _DEWITTE_PILL_MAP.get(status_key, "onder_bod")
|
||||||
|
|
||||||
|
ld = _dewitte_jsonld(detail_url)
|
||||||
|
if not ld:
|
||||||
|
continue
|
||||||
|
|
||||||
|
offered = ld.get("itemOffered", {})
|
||||||
|
address = offered.get("address", {})
|
||||||
|
floor_size = offered.get("floorSize", {})
|
||||||
|
|
||||||
|
postcode = address.get("postalCode", "").replace(" ", "") or None
|
||||||
|
stad = address.get("addressLocality") or None
|
||||||
|
adres = address.get("streetAddress") or None
|
||||||
|
|
||||||
|
prijs = ld.get("price")
|
||||||
|
if prijs and int(prijs) > config.MAX_PRICE:
|
||||||
|
continue
|
||||||
|
|
||||||
|
woningtype = _DEWITTE_TYPE_MAP.get(offered.get("@type", ""))
|
||||||
|
woonoppervlak = int(floor_size["value"]) if floor_size.get("value") else None
|
||||||
|
kamers = offered.get("numberOfRooms")
|
||||||
|
bouwjaar = offered.get("yearBuilt")
|
||||||
|
|
||||||
|
# Full-res image from JSON-LD, fall back to card thumbnail
|
||||||
|
hero = ld.get("image")
|
||||||
|
if not hero:
|
||||||
|
img = card.select_one("picture img")
|
||||||
|
hero = img["src"] if img else None
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=detail_url,
|
||||||
|
source_makelaar="dewittegarantiemakelaars",
|
||||||
|
status=status,
|
||||||
|
adres=adres,
|
||||||
|
postcode=postcode,
|
||||||
|
stad=stad,
|
||||||
|
prijs=int(prijs) if prijs else None,
|
||||||
|
woningtype=woningtype,
|
||||||
|
woonoppervlak=woonoppervlak,
|
||||||
|
kamers=int(kamers) if kamers else None,
|
||||||
|
bouwjaar=int(bouwjaar) if bouwjaar else None,
|
||||||
|
hero_image_url=hero,
|
||||||
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("dewitte: parse fout: %s", e)
|
||||||
|
|
||||||
|
if len(cards) < 10:
|
||||||
|
break
|
||||||
|
page += 1
|
||||||
|
|
||||||
|
log.info("dewittegarantiemakelaars: %d listings opgehaald", len(listings))
|
||||||
|
return listings
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# D&S Makelaars (Schiedam)
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
_DS_BASE = "https://www.densmakelaars.nl"
|
||||||
|
|
||||||
|
_DS_STATUS_MAP = {
|
||||||
|
"onder bod": "onder_bod",
|
||||||
|
"te koop": "beschikbaar",
|
||||||
|
"nieuw": "beschikbaar",
|
||||||
|
"beschikbaar": "beschikbaar",
|
||||||
|
"verkocht": "verkocht",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _ds_detail(detail_url: str, html_text: str = None) -> dict:
|
||||||
|
"""Fetch D&S detail page and extract all kenmerken from <dt>/<dd> pairs and postcode from maps URL."""
|
||||||
|
try:
|
||||||
|
# If html_text not provided, fetch it
|
||||||
|
if html_text is None:
|
||||||
|
import httpx
|
||||||
|
r = httpx.get(
|
||||||
|
detail_url,
|
||||||
|
headers={"User-Agent": config.USER_AGENT},
|
||||||
|
timeout=15,
|
||||||
|
follow_redirects=True,
|
||||||
|
)
|
||||||
|
html_text = r.text
|
||||||
|
|
||||||
|
from bs4 import BeautifulSoup
|
||||||
|
soup = BeautifulSoup(html_text, "html.parser")
|
||||||
|
|
||||||
|
# Parse <dt>/<dd> pairs into a label → value map
|
||||||
|
kv: dict[str, str] = {}
|
||||||
|
dts = soup.select("dt")
|
||||||
|
dds = soup.select("dd")
|
||||||
|
|
||||||
|
for dt, dd in zip(dts, dds):
|
||||||
|
label = dt.get_text(strip=True).lower()
|
||||||
|
value = dd.get_text(strip=True)
|
||||||
|
kv[label] = value
|
||||||
|
|
||||||
|
# Extract postcode from Google Maps URL in iframe src
|
||||||
|
# Pattern: q=...POSTCODE...,CITY where POSTCODE is 4 digits + 2 letters
|
||||||
|
postcode = None
|
||||||
|
m = re.search(r'q=.+?,(\d{4})\s+([A-Z]{2}),', html_text)
|
||||||
|
if m:
|
||||||
|
postcode = f"{m.group(1)}{m.group(2)}"
|
||||||
|
|
||||||
|
return {
|
||||||
|
"status": kv.get("status", "beschikbaar").lower(),
|
||||||
|
"woningtype": kv.get("soort woning"),
|
||||||
|
"bouwjaar": kv.get("bouwjaar"),
|
||||||
|
"woonoppervlak": kv.get("woonoppervlakte"),
|
||||||
|
"kamers": kv.get("aantal kamers"),
|
||||||
|
"slaapkamers": kv.get("aantal slaapkamers"),
|
||||||
|
"energielabel": kv.get("energielabel"),
|
||||||
|
"postcode": postcode,
|
||||||
|
}
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("dens: detail fetch fout %s: %s", detail_url, e)
|
||||||
|
return {}
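
The postcode lookup in `_ds_detail` only matches when the map URL contains a literal space between the digits and the letters. A minimal sketch of a more tolerant variant, assuming the iframe `src` may also arrive URL-encoded; the helper name `postcode_from_maps_src`, the relaxed pattern, and the example URLs are hypothetical and not part of this diff:

```python
import re
from typing import Optional
from urllib.parse import unquote_plus

# Relaxed variant of the pattern used in _ds_detail (assumption, not the
# author's pattern): whitespace around the postcode is optional.
_MAPS_POSTCODE_RE = re.compile(r"q=.+?,\s*(\d{4})\s*([A-Z]{2}),")


def postcode_from_maps_src(src: str) -> Optional[str]:
    """Return the postcode as 'NNNNAA' from a maps URL, or None if absent."""
    # Decode "%20" and "+" first so encoded spaces behave like real ones
    m = _MAPS_POSTCODE_RE.search(unquote_plus(src))
    return f"{m.group(1)}{m.group(2)}" if m else None
```

Whether the broker's pages actually encode the query this way would need to be checked against a live page.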
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_dens() -> list[RawListing]:
|
||||||
|
"""Fetch D&S Makelaars listings with full detail pages."""
|
||||||
|
listings = []
|
||||||
|
page = 1
|
||||||
|
|
||||||
|
while True:
|
||||||
|
url = f"{_DS_BASE}/aanbod/koopwoningen?page={page}"
|
||||||
|
soup = fetch_soup(url)
|
||||||
|
cards = soup.select(".col-12.col-md-4.object-wrapper")
|
||||||
|
if not cards:
|
||||||
|
break
|
||||||
|
|
||||||
|
for card in cards:
|
||||||
|
try:
|
||||||
|
# Extract URL
|
||||||
|
a_tag = card.select_one("a.property")
|
||||||
|
if not a_tag or "href" not in a_tag.attrs:
|
||||||
|
continue
|
||||||
|
detail_url = a_tag["href"]
|
||||||
|
if not detail_url.startswith("http"):
|
||||||
|
detail_url = _DS_BASE + detail_url
|
||||||
|
|
||||||
|
# Extract listing page data
|
||||||
|
status_label = _text(card, "span.label") or "beschikbaar"
|
||||||
|
status_label = status_label.strip().lower()
|
||||||
|
status = _DS_STATUS_MAP.get(status_label, "beschikbaar")
|
||||||
|
|
||||||
|
adres = _text(card, "h3")
|
||||||
|
stad = _text(card, "h4")
|
||||||
|
prijs_text = _text(card, "div.price")
|
||||||
|
prijs = parse_prijs(prijs_text)
|
||||||
|
|
||||||
|
# Extract area and rooms from footer
|
||||||
|
footer_spans = card.select("div.footer span")
|
||||||
|
woonoppervlak = None
|
||||||
|
kamers = None
|
||||||
|
for span in footer_spans:
|
||||||
|
text = span.get_text(strip=True)
|
||||||
|
if "m²" in text:
|
||||||
|
woonoppervlak = parse_m2(text)
|
||||||
|
elif "kamers" in text.lower():
|
||||||
|
m = re.search(r"(\d+)", text)
|
||||||
|
if m:
|
||||||
|
kamers = int(m.group(1))
|
||||||
|
|
||||||
|
# Extract hero image
|
||||||
|
img_tag = card.select_one("img")
|
||||||
|
hero = img_tag["src"] if img_tag else None
|
||||||
|
|
||||||
|
# Fetch and parse detail page
|
||||||
|
detail_data = _ds_detail(detail_url)
|
||||||
|
|
||||||
|
# Use postcode from detail data (extracted from Google Maps URL)
|
||||||
|
postcode = detail_data.get("postcode")
|
||||||
|
|
||||||
|
# Determine status from detail page if available
|
||||||
|
if detail_data.get("status"):
|
||||||
|
status = _DS_STATUS_MAP.get(detail_data["status"], status)
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=detail_url,
|
||||||
|
source_makelaar="dens",
|
||||||
|
adres=adres,
|
||||||
|
postcode=postcode,
|
||||||
|
stad=stad or _infer_stad(postcode),
|
||||||
|
prijs=prijs,
|
||||||
|
status=status,
|
||||||
|
hero_image_url=hero,
|
||||||
|
woningtype=detail_data.get("woningtype"),
|
||||||
|
bouwjaar=int(detail_data["bouwjaar"]) if detail_data.get("bouwjaar") else None,
|
||||||
|
woonoppervlak=parse_m2(detail_data.get("woonoppervlak")) or woonoppervlak,
|
||||||
|
kamers=int(detail_data["kamers"]) if detail_data.get("kamers") else kamers,
|
||||||
|
slaapkamers=int(detail_data["slaapkamers"]) if detail_data.get("slaapkamers") else None,
|
||||||
|
energielabel=detail_data.get("energielabel"),
|
||||||
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("dens: parse fout: %s", e)
|
||||||
|
|
||||||
|
if len(cards) < 10:
|
||||||
|
break
|
||||||
|
page += 1
|
||||||
|
|
||||||
|
log.info("dens: %d listings opgehaald", len(listings))
|
||||||
|
return listings
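
Every `fetch_*` function in this diff repeats the same loop: request a page, stop on an empty page, stop on a short page. A minimal sketch of that loop as a reusable generator; `iter_pages`, `fetch_cards` and the `max_pages` safety cap are hypothetical names and additions, not part of this diff:

```python
from typing import Callable, Iterator, Sequence


def iter_pages(
    fetch_cards: Callable[[int], Sequence],
    page_size: int,
    max_pages: int = 50,
) -> Iterator[Sequence]:
    """Yield the cards of each page; stop on an empty or short page."""
    for page in range(1, max_pages + 1):
        cards = fetch_cards(page)
        if not cards:
            return
        yield cards
        # A page with fewer cards than a full page is taken to be the last one
        if len(cards) < page_size:
            return
```

The cap guards against a site that keeps returning full pages for any page number; the scrapers above have no such guard.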
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
# 3D Makelaars (Schiedam/Vlaardingen)
# ---------------------------------------------------------------------------

_3D_BASE = "https://3dmakelaars.nl"
|
||||||
|
|
||||||
|
|
||||||
|
def _3dmakelaars_detail(detail_url: str) -> dict:
|
||||||
|
"""Fetch 3dmakelaars detail page and extract structured info block."""
|
||||||
|
try:
|
||||||
|
soup = fetch_soup(detail_url)
|
||||||
|
|
||||||
|
# Parse structured info block: span (label) + p (value) pairs
|
||||||
|
kv: dict[str, str] = {}
|
||||||
|
for li in soup.select("div.tl-adiltional-inforamtion ul.tl-adiltional-listed li"):
|
||||||
|
label_el = li.select_one("span")
|
||||||
|
value_el = li.select_one("p")
|
||||||
|
if label_el and value_el:
|
||||||
|
label = label_el.get_text(strip=True).lower()
|
||||||
|
value = value_el.get_text(strip=True)
|
||||||
|
kv[label] = value
|
||||||
|
|
||||||
|
# Extract postcode from first description paragraph
|
||||||
|
postcode = None
|
||||||
|
p_tag = soup.select_one(".omschrijving > p:nth-child(1)")
|
||||||
|
if p_tag:
|
||||||
|
text = p_tag.get_text()
|
||||||
|
postcode = _extract_postcode(text)
|
||||||
|
|
||||||
|
return {
|
||||||
|
"kamers": int(kv["aantal kamers"].split()[0]) if "aantal kamers" in kv else None,
|
||||||
|
"slaapkamers": int(kv["aantal slaapkamers"].split()[0]) if "aantal slaapkamers" in kv else None,
|
||||||
|
"bouwjaar": int(kv["bouwjaar"]) if "bouwjaar" in kv else None,
|
||||||
|
"woningtype": kv.get("bouwvorm"),
|
||||||
|
"woonoppervlak": parse_m2(kv.get("oppervlakte")),
|
||||||
|
"postcode": postcode,
|
||||||
|
}
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("3dmakelaars: detail fetch fout %s: %s", detail_url, e)
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_3dmakelaars() -> list[RawListing]:
|
||||||
|
"""Fetch 3D Makelaars listings with pagination."""
|
||||||
|
listings = []
|
||||||
|
page = 1
|
||||||
|
|
||||||
|
while True:
|
||||||
|
url = (
|
||||||
|
f"{_3D_BASE}/woningen-te-koop-in-schiedam-en-vlaardingen"
|
||||||
|
f"?kamers=&oppervlakte=&woonplaats=&video=&prijs=3&page={page}"
|
||||||
|
)
|
||||||
|
soup = fetch_soup(url)
|
||||||
|
cards = soup.select("div.tl-properties-item")
|
||||||
|
if not cards:
|
||||||
|
break
|
||||||
|
|
||||||
|
for card in cards:
|
||||||
|
try:
|
||||||
|
# Extract detail URL from onclick attribute
|
||||||
|
onclick = card.get("onclick", "")
|
||||||
|
detail_url = None
|
||||||
|
if "window.location" in onclick:
|
||||||
|
m = re.search(r"window\.location\s*=\s*['\"]([^'\"]+)['\"]", onclick)
|
||||||
|
if m:
|
||||||
|
detail_url = _3D_BASE + m.group(1)
|
||||||
|
|
||||||
|
if not detail_url:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Extract listing-level info
|
||||||
|
adres = _text(card, "h3.price")
|
||||||
|
prijs_text = _text(card, "span.address")
|
||||||
|
prijs = parse_prijs(prijs_text)
|
||||||
|
|
||||||
|
# Extract rooms and area from meta list
|
||||||
|
kamers = None
|
||||||
|
woonoppervlak = None
|
||||||
|
for li in card.select("ul.tl-meta-listed > li"):
|
||||||
|
text = li.get_text(strip=True)
|
||||||
|
if "kamers" in text.lower():
|
||||||
|
m = re.search(r"(\d+)", text)
|
||||||
|
if m:
|
||||||
|
kamers = int(m.group(1))
|
||||||
|
elif "m²" in text or "m2" in text:
|
||||||
|
woonoppervlak = parse_m2(text)
|
||||||
|
|
||||||
|
# Extract image
|
||||||
|
img_tag = card.select_one("img")
|
||||||
|
hero = img_tag["src"] if img_tag else None
|
||||||
|
if hero and not hero.startswith("http"):
|
||||||
|
hero = _3D_BASE + hero
|
||||||
|
|
||||||
|
# Fetch detail page for full info
|
||||||
|
detail_data = _3dmakelaars_detail(detail_url)
|
||||||
|
|
||||||
|
# Postcode from detail page, fallback to extraction from address
|
||||||
|
postcode = detail_data.get("postcode")
|
||||||
|
if not postcode and adres:
|
||||||
|
postcode = _extract_postcode(adres)
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=detail_url,
|
||||||
|
source_makelaar="3dmakelaars",
|
||||||
|
adres=adres,
|
||||||
|
postcode=postcode,
|
||||||
|
stad=_infer_stad(postcode),
|
||||||
|
prijs=prijs,
|
||||||
|
woningtype=detail_data.get("woningtype"),
|
||||||
|
bouwjaar=detail_data.get("bouwjaar"),
|
||||||
|
woonoppervlak=woonoppervlak or detail_data.get("woonoppervlak"),
|
||||||
|
kamers=kamers or detail_data.get("kamers"),
|
||||||
|
slaapkamers=detail_data.get("slaapkamers"),
|
||||||
|
hero_image_url=hero,
|
||||||
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("3dmakelaars: parse fout: %s", e)
|
||||||
|
|
||||||
|
if len(cards) < 7:
|
||||||
|
break
|
||||||
|
page += 1
|
||||||
|
|
||||||
|
log.info("3dmakelaars: %d listings opgehaald", len(listings))
|
||||||
|
return listings
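
`fetch_3dmakelaars` prefixes `_3D_BASE` to whatever path the `onclick` handler contains, which would produce a broken URL if the site ever emitted an absolute link. A minimal sketch using `urllib.parse.urljoin` instead; the helper name `url_from_onclick` and the example paths are hypothetical:

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Same pattern as in fetch_3dmakelaars
_ONCLICK_RE = re.compile(r"window\.location\s*=\s*['\"]([^'\"]+)['\"]")


def url_from_onclick(onclick: str, base: str) -> Optional[str]:
    """Extract the target of a `window.location = '...'` handler."""
    m = _ONCLICK_RE.search(onclick)
    # urljoin leaves absolute URLs untouched and resolves relative paths
    return urljoin(base, m.group(1)) if m else None
```

Like the original pattern, this does not match `window.location.href = ...`.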
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
# Dupont ERA Makelaars (Schiedam/Rotterdam)
# ---------------------------------------------------------------------------

_DUPONT_BASE = "https://www.dupont.nl"

_DUPONT_STATUS_MAP = {
    "te koop": "beschikbaar",
    "nieuw": "beschikbaar",
    "onder bod": "onder_bod",
    "verkocht onder voorbehoud": "onder_bod",
    "verkocht": "verkocht",
}
|
||||||
|
|
||||||
|
|
||||||
|
def _dupont_detail(detail_url: str) -> dict:
|
||||||
|
"""Fetch Dupont detail page and extract kenmerken from dt/dd pairs."""
|
||||||
|
try:
|
||||||
|
soup = fetch_soup(detail_url)
|
||||||
|
|
||||||
|
# Parse dt/dd pairs into label → value map
|
||||||
|
kv: dict[str, str] = {}
|
||||||
|
dts = soup.select("dt")
|
||||||
|
dds = soup.select("dd")
|
||||||
|
|
||||||
|
for dt, dd in zip(dts, dds):
|
||||||
|
label = dt.get_text(strip=True).lower()
|
||||||
|
value = dd.get_text(strip=True)
|
||||||
|
kv[label] = value
|
||||||
|
|
||||||
|
# Extract postcode from small tag (format: "NNNN AA CITY")
|
||||||
|
postcode = None
|
||||||
|
small_tag = soup.select_one("section div.container-fluid small")
|
||||||
|
if small_tag:
|
||||||
|
postcode = _extract_postcode(small_tag.get_text())
|
||||||
|
|
||||||
|
return {
|
||||||
|
"postcode": postcode,
|
||||||
|
"woningtype": kv.get("soort woning"),
|
||||||
|
"bouwjaar": kv.get("bouwjaar"),
|
||||||
|
"woonoppervlak": kv.get("woonoppervlakte"),
|
||||||
|
"kamers": kv.get("aantal kamers"),
|
||||||
|
"slaapkamers": kv.get("aantal slaapkamers"),
|
||||||
|
"energielabel": kv.get("energielabel"),
|
||||||
|
}
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("dupont: detail fetch fout %s: %s", detail_url, e)
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_dupont() -> list[RawListing]:
|
||||||
|
"""Fetch Dupont ERA Makelaars listings with pagination and detail pages."""
|
||||||
|
listings = []
|
||||||
|
page = 1
|
||||||
|
|
||||||
|
while True:
|
||||||
|
url = f"{_DUPONT_BASE}/aanbod/koopwoningen?page={page}"
|
||||||
|
soup = fetch_soup(url)
|
||||||
|
cards = soup.select("article.object")
|
||||||
|
if not cards:
|
||||||
|
break
|
||||||
|
|
||||||
|
for card in cards:
|
||||||
|
try:
|
||||||
|
# Extract URL
|
||||||
|
a_tag = card.select_one("a[href]")
|
||||||
|
if not a_tag or "href" not in a_tag.attrs:
|
||||||
|
continue
|
||||||
|
detail_url = a_tag["href"]
|
||||||
|
if not detail_url.startswith("http"):
|
||||||
|
detail_url = _DUPONT_BASE + detail_url
|
||||||
|
|
||||||
|
# Extract listing-level data
|
||||||
|
adres = _text(card, "h3")
|
||||||
|
stad = _text(card, "h4")
|
||||||
|
prijs_text = _text(card, "div.price")
|
||||||
|
prijs = parse_prijs(prijs_text)
|
||||||
|
|
||||||
|
# Extract status from label
|
||||||
|
status_label = _text(card, "div.label") or "beschikbaar"
|
||||||
|
status_label = status_label.strip().lower()
|
||||||
|
status = _DUPONT_STATUS_MAP.get(status_label, "beschikbaar")
|
||||||
|
|
||||||
|
# Extract image
|
||||||
|
img_tag = card.select_one("img.img-responsive")
|
||||||
|
hero = img_tag["src"] if img_tag else None
|
||||||
|
if hero and not hero.startswith("http"):
|
||||||
|
hero = _DUPONT_BASE + hero
|
||||||
|
|
||||||
|
# Fetch detail page for full data
|
||||||
|
detail_data = _dupont_detail(detail_url)
|
||||||
|
|
||||||
|
# Use postcode from detail if available
|
||||||
|
postcode = detail_data.get("postcode")
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=detail_url,
|
||||||
|
source_makelaar="dupont",
|
||||||
|
adres=adres,
|
||||||
|
postcode=postcode,
|
||||||
|
stad=stad or _infer_stad(postcode),
|
||||||
|
prijs=prijs,
|
||||||
|
status=status,
|
||||||
|
hero_image_url=hero,
|
||||||
|
woningtype=detail_data.get("woningtype"),
|
||||||
|
bouwjaar=int(detail_data["bouwjaar"]) if detail_data.get("bouwjaar") else None,
|
||||||
|
woonoppervlak=parse_m2(detail_data.get("woonoppervlak")),
|
||||||
|
kamers=int(detail_data["kamers"]) if detail_data.get("kamers") else None,
|
||||||
|
slaapkamers=int(detail_data["slaapkamers"]) if detail_data.get("slaapkamers") else None,
|
||||||
|
energielabel=detail_data.get("energielabel"),
|
||||||
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("dupont: parse fout: %s", e)
|
||||||
|
|
||||||
|
if len(cards) < 10:
|
||||||
|
break
|
||||||
|
page += 1
|
||||||
|
|
||||||
|
log.info("dupont: %d listings opgehaald", len(listings))
|
||||||
|
return listings
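
Several scrapers above call `int(...)` directly on a kenmerken value, so a value such as "5 kamers" raises and the whole listing is dropped by the surrounding `except`. A minimal sketch of a tolerant parser; `first_int` is a hypothetical helper, not part of this diff:

```python
import re
from typing import Optional


def first_int(text: Optional[str]) -> Optional[int]:
    """Return the first run of digits in text as an int, or None."""
    m = re.search(r"\d+", text or "")
    return int(m.group()) if m else None
```

This suits room counts and build years only; a value with a thousands separator ("1.250 m²") would yield 1, so areas should keep going through `parse_m2`.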
|
||||||
656
src/adapters/ssr/sure.py
Normal file
@@ -0,0 +1,656 @@
"""
SURE WordPress plugin scrapers.

All makelaars here use the SURE real estate plugin for WordPress. Listings
are at /wonen?sure_koop_huur=koop with pagination via /wonen/page/{N}/.
Cards use class a.card-house or div.card.card--house.
Detail pages have a #kenmerken section with label/value pairs.

Scrapers: schielandborsboom, olsthoorn, vanherk, borgdorff
"""
import re

import config
from huizenbot import RawListing

from ._shared import fetch_soup, parse_prijs, parse_m2, _text, _extract_postcode, log
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
# Schieland Borsboom NVM Makelaars (Rotterdam, active in Schiedam)
# ---------------------------------------------------------------------------

_SCHIELAND_BASE = "https://www.schielandborsboom.nl"

_SCHIELAND_STATUS_MAP = {
    "sure-status-available": "beschikbaar",
    "sure-status-under_bid": "onder_bod",
    "sure-status-sold": "verkocht",
}
|
||||||
|
|
||||||
|
|
||||||
|
def _schieland_detail(detail_url: str) -> dict:
|
||||||
|
"""Fetch Schieland Borsboom detail page and extract kenmerken."""
|
||||||
|
try:
|
||||||
|
soup = fetch_soup(detail_url)
|
||||||
|
|
||||||
|
# Postcode from house__status p (e.g. "3117 DP Schiedam")
|
||||||
|
postcode_el = soup.select_one("div.house__status p")
|
||||||
|
postcode = _extract_postcode(postcode_el.get_text()) if postcode_el else None
|
||||||
|
|
||||||
|
# Parse house-features__block sections: div.house-features__block > ul > li
|
||||||
|
kv: dict[str, str] = {}
|
||||||
|
for block in soup.select("div.house-features__block"):
|
||||||
|
h4 = block.select_one("h4")
|
||||||
|
if not h4:
|
||||||
|
continue
|
||||||
|
section_title = h4.get_text(strip=True).lower()
|
||||||
|
|
||||||
|
for li in block.select("ul > li"):
|
||||||
|
strong = li.select_one("strong")
|
||||||
|
span = li.select_one("span")
|
||||||
|
if not strong or not span:
|
||||||
|
continue
|
||||||
|
|
||||||
|
label = strong.get_text(strip=True).lower()
|
||||||
|
value = span.get_text(strip=True)
|
||||||
|
|
||||||
|
# Remove links from value
|
||||||
|
for a in span.select("a"):
|
||||||
|
value = value.replace(a.get_text(strip=True), "").strip()
|
||||||
|
|
||||||
|
kv[f"{section_title}.{label}"] = value
|
||||||
|
|
||||||
|
return {
|
||||||
|
"postcode": postcode,
|
||||||
|
"status": kv.get("overdracht.status", "").lower(),
|
||||||
|
"woningtype": kv.get("bouwvorm.soort bouw"),
|
||||||
|
"bouwjaar": kv.get("bouwvorm.bouwjaar"),
|
||||||
|
"woonoppervlak": kv.get("indeling.woonoppervlakte"),
|
||||||
|
"perceeloppervlak": kv.get("indeling.perceeloppervlakte"),
|
||||||
|
"kamers": kv.get("indeling.aantal kamers"),
|
||||||
|
"slaapkamers": kv.get("indeling.aantal slaapkamers"),
|
||||||
|
"energielabel": kv.get("energie & installatie.energielabel"),
|
||||||
|
}
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("schielandborsboom: detail fetch fout %s: %s", detail_url, e)
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_schielandborsboom() -> list[RawListing]:
|
||||||
|
"""Fetch Schieland Borsboom NVM listings (koop only, Schiedam)."""
|
||||||
|
listings = []
|
||||||
|
page = 1
|
||||||
|
|
||||||
|
while True:
|
||||||
|
if page == 1:
|
||||||
|
url = f"{_SCHIELAND_BASE}/wonen/zoeken/heel-nederland/prijs=200000-300000/schiedam/"
|
||||||
|
else:
|
||||||
|
url = f"{_SCHIELAND_BASE}/wonen/zoeken/heel-nederland/prijs=200000-300000/schiedam/?pagina={page}"
|
||||||
|
|
||||||
|
soup = fetch_soup(url)
|
||||||
|
cards = soup.select("div.card.card--house")
|
||||||
|
if not cards:
|
||||||
|
break
|
||||||
|
|
||||||
|
for card in cards:
|
||||||
|
try:
|
||||||
|
a_tag = card.select_one("a.card__anchor")
|
||||||
|
if not a_tag or "href" not in a_tag.attrs:
|
||||||
|
continue
|
||||||
|
detail_url = a_tag["href"]
|
||||||
|
if not detail_url.startswith("http"):
|
||||||
|
detail_url = _SCHIELAND_BASE + detail_url
|
||||||
|
|
||||||
|
# Filter: only Schiedam
|
||||||
|
stad_el = card.select_one("p.house-place")
|
||||||
|
stad = stad_el.get_text(strip=True) if stad_el else None
|
||||||
|
if not stad or stad.lower() != "schiedam":
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Status from card-house__status badge
|
||||||
|
status_el = card.select_one("div.card-house__status")
|
||||||
|
status_text = status_el.get_text(strip=True).lower() if status_el else ""
|
||||||
|
# Check for known status keywords in badge text
|
||||||
|
if "beschikbaar" in status_text:
|
||||||
|
status = "beschikbaar"
|
||||||
|
elif "onder bod" in status_text:
|
||||||
|
status = "onder_bod"
|
||||||
|
elif "verkocht" in status_text:
|
||||||
|
status = "verkocht"
|
||||||
|
else:
|
||||||
|
status = "beschikbaar"
|
||||||
|
|
||||||
|
# Price
|
||||||
|
prijs = parse_prijs(_text(card, "p.price"))
|
||||||
|
if prijs and prijs > config.MAX_PRICE:
|
||||||
|
continue
|
||||||
|
|
||||||
|
adres = _text(card, "h4.house-street")
|
||||||
|
|
||||||
|
# Hero image from picture source (medium size)
|
||||||
|
src_tag = card.select_one('picture source[media="(min-width:100px)"]')
|
||||||
|
hero = src_tag["srcset"] if src_tag else None
|
||||||
|
if hero is None:
|
||||||
|
img = card.select_one("img")
|
||||||
|
hero = img.get("src") if img else None
|
||||||
|
if hero and not hero.startswith("http"):
|
||||||
|
hero = _SCHIELAND_BASE + hero
|
||||||
|
|
||||||
|
# Data icons on card: surface, bedrooms, energy label
|
||||||
|
woonoppervlak_card = None
|
||||||
|
slaapkamers_card = None
|
||||||
|
energielabel_card = None
|
||||||
|
for data_div in card.select("div.data"):
|
||||||
|
txt = data_div.get_text(strip=True)
|
||||||
|
if data_div.select_one("i.icon-surface"):
|
||||||
|
woonoppervlak_card = parse_m2(txt)
|
||||||
|
elif data_div.select_one("i.icon-bedrooms"):
|
||||||
|
m = re.search(r"(\d+)", txt)
|
||||||
|
slaapkamers_card = int(m.group(1)) if m else None
|
||||||
|
elif data_div.select_one("i.icon-label"):
|
||||||
|
energielabel_card = txt.strip() or None
|
||||||
|
|
||||||
|
# Fetch detail page for full kenmerken
|
||||||
|
kk = _schieland_detail(detail_url)
|
||||||
|
|
||||||
|
# Refine status from detail page
|
||||||
|
if kk.get("status"):
|
||||||
|
status = _SCHIELAND_STATUS_MAP.get(kk["status"], status)
|
||||||
|
|
||||||
|
# Parse kamers: "5 kamers" → 5
|
||||||
|
kamers = None
|
||||||
|
if kk.get("kamers"):
|
||||||
|
m = re.search(r"(\d+)", kk["kamers"])
|
||||||
|
kamers = int(m.group(1)) if m else None
|
||||||
|
|
||||||
|
# Parse slaapkamers: "3" or "3 slaapkamers" → 3
|
||||||
|
slaapkamers = slaapkamers_card
|
||||||
|
if kk.get("slaapkamers"):
|
||||||
|
m = re.search(r"(\d+)", kk["slaapkamers"])
|
||||||
|
slaapkamers = int(m.group(1)) if m else slaapkamers_card
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=detail_url,
|
||||||
|
source_makelaar="schielandborsboom",
|
||||||
|
status=status,
|
||||||
|
adres=adres,
|
||||||
|
postcode=kk.get("postcode"),
|
||||||
|
stad=stad,
|
||||||
|
prijs=prijs,
|
||||||
|
hero_image_url=hero,
|
||||||
|
woningtype=kk.get("woningtype"),
|
||||||
|
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
|
||||||
|
woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
|
||||||
|
perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
|
||||||
|
kamers=kamers,
|
||||||
|
slaapkamers=slaapkamers,
|
||||||
|
energielabel=kk.get("energielabel") or energielabel_card,
|
||||||
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("schielandborsboom: parse fout: %s", e)
|
||||||
|
|
||||||
|
if len(cards) < 18:
|
||||||
|
break
|
||||||
|
page += 1
|
||||||
|
|
||||||
|
log.info("schielandborsboom: %d listings opgehaald", len(listings))
|
||||||
|
return listings
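
The scrapers in this diff map status text through per-broker dictionaries, and `fetch_schielandborsboom` additionally matches keywords in the badge text. A minimal sketch of one shared keyword-based normaliser, assuming the mappings already present in the maps above ("verkocht onder voorbehoud" and "onder optie" both count as `onder_bod`); `normalize_status` and `_STATUS_KEYWORDS` are hypothetical names:

```python
from typing import Optional

# Checked in order; the first keyword found in the label wins, so the
# "onder ..." variants must come before the plain "verkocht".
_STATUS_KEYWORDS = (
    ("onder voorbehoud", "onder_bod"),
    ("onder bod", "onder_bod"),
    ("onder optie", "onder_bod"),
    ("verkocht", "verkocht"),
    ("beschikbaar", "beschikbaar"),
    ("te koop", "beschikbaar"),
)


def normalize_status(label: Optional[str], default: str = "beschikbaar") -> str:
    """Map a free-text status label to beschikbaar / onder_bod / verkocht."""
    # Lower-case and collapse whitespace before matching
    text = " ".join((label or "").lower().split())
    for keyword, status in _STATUS_KEYWORDS:
        if keyword in text:
            return status
    return default
```

Unknown labels such as "nieuw" fall through to the default, which matches how the existing maps treat them.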
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
# Olsthoorn Makelaars Delft (SURE WordPress plugin)
# ---------------------------------------------------------------------------
# Covers Delft, Den Haag, Naaldwijk etc.; we filter for Delft only.
# Detail page has no postcode; leave as None.

_OLSTHOORN_BASE = "https://www.olsthoornmakelaars.nl"

_OLSTHOORN_STATUS_MAP = {
    "badge-available": "beschikbaar",
    "badge-bid": "onder_bod",
    "badge-option": "onder_bod",
    "badge-sold": "verkocht",
}

_OLSTHOORN_DETAIL_STATUS_MAP = {
    "beschikbaar": "beschikbaar",
    "onder bod": "onder_bod",
    "onder optie": "onder_bod",
    "verkocht": "verkocht",
}
|
||||||
|
|
||||||
|
|
||||||
|
def _olsthoorn_detail(detail_url: str) -> dict:
|
||||||
|
"""Fetch Olsthoorn detail page; extract kenmerken from #kenmerken li pairs."""
|
||||||
|
try:
|
||||||
|
soup = fetch_soup(detail_url)
|
||||||
|
kv: dict[str, str] = {}
|
||||||
|
for li in soup.select("#kenmerken li"):
|
||||||
|
spans = li.select("span")
|
||||||
|
if len(spans) >= 2:
|
||||||
|
label = spans[0].get_text(strip=True).lower()
|
||||||
|
value = spans[1].get_text(strip=True)
|
||||||
|
kv[label] = value
|
||||||
|
return {
|
||||||
|
"status": kv.get("status", "").lower(),
|
||||||
|
"woningtype": kv.get("soort object") or kv.get("soort woning") or kv.get("soort bouw"),
|
||||||
|
"bouwjaar": kv.get("bouwjaar"),
|
||||||
|
"woonoppervlak": kv.get("gebruiksoppervlakte"),
|
||||||
|
"perceeloppervlak": kv.get("perceeloppervlakte"),
|
||||||
|
"kamers": kv.get("aantal kamers"),
|
||||||
|
"slaapkamers": kv.get("aantal slaapkamers"),
|
||||||
|
"energielabel": kv.get("energielabel"),
|
||||||
|
}
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("olsthoorn: detail fetch fout %s: %s", detail_url, e)
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_olsthoorn() -> list[RawListing]:
|
||||||
|
"""Fetch Olsthoorn Makelaars listings; only Delft, only koop."""
|
||||||
|
listings = []
|
||||||
|
page = 1
|
||||||
|
|
||||||
|
while True:
|
||||||
|
if page == 1:
|
||||||
|
url = f"{_OLSTHOORN_BASE}/wonen?sure_koop_huur=koop"
|
||||||
|
else:
|
||||||
|
url = f"{_OLSTHOORN_BASE}/wonen/page/{page}/?sure_koop_huur=koop"
|
||||||
|
|
||||||
|
soup = fetch_soup(url)
|
||||||
|
cards = soup.select("a.card-house")
|
||||||
|
if not cards:
|
||||||
|
break
|
||||||
|
|
||||||
|
for card in cards:
|
||||||
|
try:
|
||||||
|
href = card.get("href", "")
|
||||||
|
if not href:
|
||||||
|
continue
|
||||||
|
detail_url = href if href.startswith("http") else _OLSTHOORN_BASE + href
|
||||||
|
|
||||||
|
# Filter: only Delft
|
||||||
|
stad_el = card.select_one("h2.card__title")
|
||||||
|
stad = stad_el.get_text(strip=True) if stad_el else None
|
||||||
|
if not stad or stad.lower() != "delft":
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Price from bold tag — filter early before detail fetch
|
||||||
|
prijs_b = card.select_one("b")
|
||||||
|
prijs = parse_prijs(prijs_b.get_text() if prijs_b else None)
|
||||||
|
if prijs and prijs > config.MAX_PRICE:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Status from badge class on label span
|
||||||
|
label_span = card.select_one("span.card-house__label")
|
||||||
|
status = "beschikbaar"
|
||||||
|
if label_span:
|
||||||
|
for cls in label_span.get("class", []):
|
||||||
|
if cls in _OLSTHOORN_STATUS_MAP:
|
||||||
|
status = _OLSTHOORN_STATUS_MAP[cls]
|
||||||
|
break
|
||||||
|
|
||||||
|
                # Address: first <p> under .short--info (collapse internal whitespace)
|
||||||
|
adres_p = card.select("div.short--info > p")
|
||||||
|
if adres_p:
|
||||||
|
adres = " ".join(adres_p[0].get_text().split())
|
||||||
|
else:
|
||||||
|
adres = None
|
||||||
|
|
||||||
|
# Hero image: largest source srcset
|
||||||
|
src_tag = card.select_one('picture source[media="(min-width:1024px)"]')
|
||||||
|
hero = src_tag.get("data-srcset") if src_tag else None
|
||||||
|
if hero and not hero.startswith("http"):
|
||||||
|
hero = _OLSTHOORN_BASE + hero
|
||||||
|
|
||||||
|
# Woonoppervlak + kamers + energielabel from card data icons
|
||||||
|
woonoppervlak_card = None
|
||||||
|
kamers_card = None
|
||||||
|
energielabel_card = None
|
||||||
|
for data_div in card.select("div.data"):
|
||||||
|
inner = data_div.select_one("span.date__inner")
|
||||||
|
if not inner:
|
||||||
|
continue
|
||||||
|
txt = inner.get_text(strip=True)
|
||||||
|
if data_div.select_one("i.icon-sizes"):
|
||||||
|
woonoppervlak_card = parse_m2(txt)
|
||||||
|
elif data_div.select_one("i.icon-door"):
|
||||||
|
m = re.search(r"(\d+)", txt)
|
||||||
|
kamers_card = int(m.group(1)) if m else None
|
||||||
|
elif data_div.select_one("i.icon-energylabel"):
|
||||||
|
energielabel_card = txt or None
|
||||||
|
|
||||||
|
kk = _olsthoorn_detail(detail_url)
|
||||||
|
|
||||||
|
# Refine status from detail page
|
||||||
|
detail_status = _OLSTHOORN_DETAIL_STATUS_MAP.get(kk.get("status", ""), "")
|
||||||
|
if detail_status:
|
||||||
|
status = detail_status
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=detail_url,
|
||||||
|
source_makelaar="olsthoorn",
|
||||||
|
status=status,
|
||||||
|
adres=adres,
|
||||||
|
postcode=None, # not exposed by broker
|
||||||
|
stad=stad,
|
||||||
|
prijs=prijs,
|
||||||
|
hero_image_url=hero,
|
||||||
|
woningtype=kk.get("woningtype"),
|
||||||
|
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar") else None,
|
||||||
|
woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
|
||||||
|
perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
|
||||||
|
kamers=int(kk["kamers"]) if kk.get("kamers") else kamers_card,
|
||||||
|
slaapkamers=int(kk["slaapkamers"]) if kk.get("slaapkamers") else None,
|
||||||
|
energielabel=kk.get("energielabel") or energielabel_card,
|
||||||
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("olsthoorn: parse fout: %s", e)
|
||||||
|
|
||||||
|
if len(cards) < 15:
|
||||||
|
break
|
||||||
|
page += 1
|
||||||
|
|
||||||
|
log.info("olsthoorn: %d listings opgehaald", len(listings))
|
||||||
|
return listings
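
The SURE scrapers use the `srcset` (or `data-srcset`) attribute of a `<source>` tag directly as the hero image URL. A `srcset` may hold several comma-separated candidates, each followed by a width or density descriptor. A minimal sketch that picks the first candidate, assuming image URLs without embedded commas; `first_srcset_url` and the example paths are hypothetical:

```python
from typing import Optional


def first_srcset_url(srcset: Optional[str]) -> Optional[str]:
    """Return the URL of the first candidate in a srcset value, or None."""
    # Candidates are comma-separated; each is "URL [descriptor]"
    first = (srcset or "").split(",")[0].strip()
    return first.split()[0] if first else None
```

Whether these brokers emit multi-candidate values at all is not shown in the diff, so this is a defensive step rather than a confirmed fix.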
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
# Van Herk Makelaars (Schiedam), SURE WordPress plugin (card-house)
# ---------------------------------------------------------------------------
# Listings filtered by city + price in URL; pagination via /page/{N}/.
# Detail page: div.features ul.unstyled li with two <span> (label + value).

_VANHERK_BASE = "https://www.vanherk.nl"
_VANHERK_LISTINGS = "https://www.vanherk.nl/wonen/aanbod/zoeken/schiedam/200000-300000/"

_VANHERK_STATUS_MAP = {
    "beschikbaar": "beschikbaar",
    "onder bod": "onder_bod",
    "onder optie": "onder_bod",
    "verkocht": "verkocht",
}
|
||||||
|
|
||||||
|
|
||||||
|
def _vanherk_detail(detail_url: str) -> dict:
|
||||||
|
"""Fetch Van Herk detail page; extract kenmerken from div.features."""
|
||||||
|
try:
|
||||||
|
soup = fetch_soup(detail_url)
|
||||||
|
kv: dict[str, str] = {}
|
||||||
|
for li in soup.select("div.features ul.unstyled li"):
|
||||||
|
spans = li.select("span")
|
||||||
|
if len(spans) >= 2:
|
||||||
|
label = spans[0].get_text(strip=True).lower()
|
||||||
|
value = spans[1].get_text(strip=True)
|
||||||
|
kv[label] = value
|
||||||
|
# Postcode is in <title>: "Lorentzlaan 19 B, 3112 KE SCHIEDAM - Van Herk Makelaars"
|
||||||
|
postcode = None
|
||||||
|
if soup.title:
|
||||||
|
m = re.search(r"\b(\d{4}\s*[A-Z]{2})\b", soup.title.get_text())
|
||||||
|
if m:
|
||||||
|
                postcode = m.group(1).replace("\xa0", " ").strip()
|
||||||
|
return {
|
||||||
|
"status": kv.get("status", "").lower(),
|
||||||
|
"postcode": postcode,
|
||||||
|
"bouwjaar": kv.get("bouwjaar"),
|
||||||
|
"woonoppervlak": kv.get("woonoppervlakte"),
|
||||||
|
"kamers": kv.get("aantal kamers"),
|
||||||
|
"slaapkamers": kv.get("aantal slaapkamers"),
|
||||||
|
"energielabel": kv.get("energielabel"),
|
||||||
|
}
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("vanherk: detail fetch fout %s: %s", detail_url, e)
|
||||||
|
return {}
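
`_vanherk_detail` returns the postcode with its space ("3112 KE"), while `_ds_detail` returns it without ("3111CA"). A minimal sketch of a single normaliser that yields the compact form; `normalize_postcode` is a hypothetical helper, and which form the database expects is an assumption that would need checking against `_extract_postcode`:

```python
import re
from typing import Optional

# Four digits, optional whitespace (\s also matches a non-breaking space),
# two capital letters
_POSTCODE_RE = re.compile(r"\b(\d{4})\s*([A-Z]{2})\b")


def normalize_postcode(text: Optional[str]) -> Optional[str]:
    """Return the first Dutch postcode in text as 'NNNNAA', or None."""
    m = _POSTCODE_RE.search(text or "")
    return f"{m.group(1)}{m.group(2)}" if m else None
```

The pattern stays upper-case only, as in the original, so that text like "1932 in Delft" is not mistaken for a postcode.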
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_vanherk() -> list[RawListing]:
|
||||||
|
"""Fetch Van Herk listings; only Schiedam, only koop."""
|
||||||
|
listings = []
|
||||||
|
page = 1
|
||||||
|
|
||||||
|
while True:
|
||||||
|
if page == 1:
|
||||||
|
url = _VANHERK_LISTINGS
|
||||||
|
else:
|
||||||
|
url = _VANHERK_LISTINGS + f"page/{page}/"
|
||||||
|
|
||||||
|
soup = fetch_soup(url)
|
||||||
|
cards = soup.select("a.card-house")
|
||||||
|
if not cards:
|
||||||
|
break
|
||||||
|
|
||||||
|
for card in cards:
|
||||||
|
try:
|
||||||
|
href = card.get("href", "")
|
||||||
|
if not href:
|
||||||
|
continue
|
||||||
|
detail_url = href if href.startswith("http") else _VANHERK_BASE + href
|
||||||
|
|
||||||
|
# City from lead paragraph
|
||||||
|
lead = card.select_one("p.lead")
|
||||||
|
stad = lead.get_text(strip=True) if lead else None
|
||||||
|
|
||||||
|
                # Address from h4 (normalize whitespace incl. non-breaking spaces)
|
||||||
|
h4 = card.select_one("h4")
|
||||||
|
adres = " ".join(h4.get_text().split()) if h4 else None
|
||||||
|
|
||||||
|
# Price from .subtitle
|
||||||
|
subtitle = card.select_one("p.subtitle")
|
||||||
|
prijs = parse_prijs(subtitle.get_text() if subtitle else None)
|
||||||
|
if prijs and prijs > config.MAX_PRICE:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Hero image: largest srcset source
|
||||||
|
src_tag = card.select_one('picture source[media="(min-width:1280px)"]')
|
||||||
|
hero = src_tag.get("srcset") if src_tag else None
|
||||||
|
if hero and not hero.startswith("http"):
|
||||||
|
hero = _VANHERK_BASE + hero
|
||||||
|
|
||||||
|
# Card data icons: surface, bedrooms, energy label
|
||||||
|
woonoppervlak_card = None
|
||||||
|
slaapkamers_card = None
|
||||||
|
energielabel_card = None
|
||||||
|
for data_div in card.select("div.data"):
|
||||||
|
classes = data_div.get("class") or []
|
||||||
|
if "d-none" in classes:
|
||||||
|
continue
|
||||||
|
if "data-energie" in classes:
|
||||||
|
inner = data_div.select_one(".date__inner")
|
||||||
|
energielabel_card = inner.get_text(strip=True) if inner else None
|
||||||
|
elif data_div.select_one("i.icon-surface"):
|
||||||
|
inner = data_div.select_one("span.date__inner")
|
||||||
|
woonoppervlak_card = parse_m2(inner.get_text(strip=True) if inner else None)
|
||||||
|
elif data_div.select_one("i.icon-bed"):
|
||||||
|
inner = data_div.select_one("span.date__inner")
|
||||||
|
txt = inner.get_text(strip=True) if inner else None
|
||||||
|
m = re.search(r"(\d+)", txt) if txt else None
|
||||||
|
slaapkamers_card = int(m.group(1)) if m else None
|
||||||
|
|
||||||
|
kk = _vanherk_detail(detail_url)
|
||||||
|
|
||||||
|
status = _VANHERK_STATUS_MAP.get(kk.get("status", ""), "beschikbaar")
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=detail_url,
|
||||||
|
source_makelaar="vanherk",
|
||||||
|
status=status,
|
||||||
|
adres=adres,
|
||||||
|
postcode=kk.get("postcode"),
|
||||||
|
stad=stad,
|
||||||
|
prijs=prijs,
|
||||||
|
hero_image_url=hero,
|
||||||
|
bouwjaar=int(kk["bouwjaar"]) if kk.get("bouwjaar", "").isdigit() else None,
|
||||||
|
woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
|
||||||
|
kamers=int(kk["kamers"]) if kk.get("kamers", "").isdigit() else None,
|
||||||
|
slaapkamers=(int(kk["slaapkamers"]) if kk.get("slaapkamers", "").isdigit() else None) or slaapkamers_card,
|
||||||
|
energielabel=kk.get("energielabel") or energielabel_card,
|
||||||
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("vanherk: parse fout: %s", e)
|
||||||
|
|
||||||
|
if len(cards) < 15:
|
||||||
|
break
|
||||||
|
page += 1
|
||||||
|
|
||||||
|
log.info("vanherk: %d listings opgehaald", len(listings))
|
||||||
|
return listings
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Borgdorff Makelaars (Den Haag / Westland) — SURE WordPress plugin
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Covers Den Haag ('s-gravenhage), Monster, Naaldwijk etc. Filter for Den Haag.
|
||||||
|
# Same SURE plugin as Schieland Borsboom but uses a.card--house (double dash).
|
||||||
|
# No postcode on detail page.
|
||||||
|
|
||||||
|
_BORGDORFF_BASE = "https://www.borgdorff.nl"
|
||||||
|
_BORGDORFF_DEN_HAAG = {"'s-gravenhage", "den haag"}
|
||||||
|
|
||||||
|
_BORGDORFF_BADGE_MAP = {
|
||||||
|
"badge--info": "beschikbaar",
|
||||||
|
"badge--warning": "onder_bod",
|
||||||
|
"badge--danger": "verkocht",
|
||||||
|
}
|
||||||
|
|
||||||
|
_BORGDORFF_DETAIL_STATUS_MAP = {
|
||||||
|
"beschikbaar": "beschikbaar",
|
||||||
|
"onder bod": "onder_bod",
|
||||||
|
"onder optie": "onder_bod",
|
||||||
|
"verkocht": "verkocht",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _borgdorff_detail(detail_url: str) -> dict:
|
||||||
|
"""Fetch Borgdorff detail page; extract #kenmerken li span pairs."""
|
||||||
|
try:
|
||||||
|
soup = fetch_soup(detail_url)
|
||||||
|
kv: dict[str, str] = {}
|
||||||
|
for li in soup.select("#kenmerken li"):
|
||||||
|
spans = li.select("span")
|
||||||
|
if len(spans) >= 2:
|
||||||
|
label = spans[0].get_text(strip=True).lower()
|
||||||
|
value = spans[1].get_text(strip=True)
|
||||||
|
kv[label] = value
|
||||||
|
return {
|
||||||
|
"status": kv.get("status", "").lower(),
|
||||||
|
"woningtype": kv.get("soort woonhuis") or kv.get("soort woning") or kv.get("soort bouw"),
|
||||||
|
"bouwjaar": kv.get("bouwjaar"),
|
||||||
|
"woonoppervlak": kv.get("gebruiksoppervlakte wonen") or kv.get("gebruiksoppervlakte"),
|
||||||
|
"perceeloppervlak": kv.get("perceeloppervlakte"),
|
||||||
|
"slaapkamers": kv.get("aantal slaapkamers"),
|
||||||
|
"energielabel": kv.get("energielabel"),
|
||||||
|
}
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("borgdorff: detail fetch fout %s: %s", detail_url, e)
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_borgdorff() -> list[RawListing]:
|
||||||
|
"""Fetch Borgdorff listings; only Den Haag / 's-gravenhage, only koop."""
|
||||||
|
listings = []
|
||||||
|
page = 1
|
||||||
|
|
||||||
|
while True:
|
||||||
|
if page == 1:
|
||||||
|
url = f"{_BORGDORFF_BASE}/wonen?sure_koop_huur=koop"
|
||||||
|
else:
|
||||||
|
url = f"{_BORGDORFF_BASE}/wonen/page/{page}/?sure_koop_huur=koop"
|
||||||
|
|
||||||
|
soup = fetch_soup(url)
|
||||||
|
cards = soup.select("a.card--house")
|
||||||
|
if not cards:
|
||||||
|
break
|
||||||
|
|
||||||
|
for card in cards:
|
||||||
|
try:
|
||||||
|
href = card.get("href", "")
|
||||||
|
if not href:
|
||||||
|
continue
|
||||||
|
detail_url = href if href.startswith("http") else _BORGDORFF_BASE + href
|
||||||
|
|
||||||
|
# Filter: only Den Haag
|
||||||
|
stad_el = card.select_one("p.lead-two")
|
||||||
|
stad = stad_el.get_text(strip=True) if stad_el else None
|
||||||
|
if not stad or stad.lower() not in _BORGDORFF_DEN_HAAG:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Price — filter early
|
||||||
|
prijs = parse_prijs(_text(card, "p.strong"))
|
||||||
|
if prijs and prijs > config.MAX_PRICE:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Status from badge class
|
||||||
|
label_span = card.select_one("span.card-house__label")
|
||||||
|
status = "beschikbaar"
|
||||||
|
if label_span:
|
||||||
|
for cls in label_span.get("class", []):
|
||||||
|
if cls in _BORGDORFF_BADGE_MAP:
|
||||||
|
status = _BORGDORFF_BADGE_MAP[cls]
|
||||||
|
break
|
||||||
|
|
||||||
|
# Address
|
||||||
|
adres = _text(card, "h4")
|
||||||
|
|
||||||
|
# Hero: largest source srcset
|
||||||
|
src_tag = card.select_one('picture source[media="(min-width:1280px)"]')
|
||||||
|
hero = src_tag.get("srcset") if src_tag else None
|
||||||
|
if not hero:
|
||||||
|
img = card.select_one("img[data-src]")
|
||||||
|
hero = img.get("data-src") if img else None
|
||||||
|
if hero and not hero.startswith("http"):
|
||||||
|
hero = _BORGDORFF_BASE + hero
|
||||||
|
|
||||||
|
# Surface + bedrooms from data icons
|
||||||
|
woonoppervlak_card = None
|
||||||
|
slaapkamers_card = None
|
||||||
|
for data_div in card.select("div.data"):
|
||||||
|
inner = data_div.select_one("p.small")
|
||||||
|
if not inner:
|
||||||
|
continue
|
||||||
|
txt = inner.get_text(strip=True)
|
||||||
|
if data_div.select_one("i.icon-surface"):
|
||||||
|
woonoppervlak_card = parse_m2(txt)
|
||||||
|
elif data_div.select_one("i.icon-bed"):
|
||||||
|
m = re.search(r"(\d+)", txt)
|
||||||
|
slaapkamers_card = int(m.group(1)) if m else None
|
||||||
|
|
||||||
|
kk = _borgdorff_detail(detail_url)
|
||||||
|
|
||||||
|
# Refine status from detail page
|
||||||
|
if kk.get("status"):
|
||||||
|
status = _BORGDORFF_DETAIL_STATUS_MAP.get(kk["status"], status)
|
||||||
|
|
||||||
|
listings.append(RawListing(
|
||||||
|
url=detail_url,
|
||||||
|
source_makelaar="borgdorff",
|
||||||
|
status=status,
|
||||||
|
adres=adres,
|
||||||
|
postcode=None, # not exposed by broker
|
||||||
|
stad=stad,
|
||||||
|
prijs=prijs,
|
||||||
|
hero_image_url=hero,
|
||||||
|
woningtype=kk.get("woningtype"),
|
||||||
|
bouwjaar=int(kk["bouwjaar"]) if (kk.get("bouwjaar") or "").isdigit() else None,
|
||||||
|
woonoppervlak=parse_m2(kk.get("woonoppervlak")) or woonoppervlak_card,
|
||||||
|
perceeloppervlak=parse_m2(kk.get("perceeloppervlak")),
|
||||||
|
slaapkamers=int(kk["slaapkamers"]) if (kk.get("slaapkamers") or "").isdigit() else slaapkamers_card,
|
||||||
|
energielabel=kk.get("energielabel"),
|
||||||
|
))
|
||||||
|
if config.APP_ENV == "dev":
|
||||||
|
break
|
||||||
|
except Exception as e:
|
||||||
|
log.warning("borgdorff: parse fout: %s", e)
|
||||||
|
|
||||||
|
if len(cards) < 15:
|
||||||
|
break
|
||||||
|
page += 1
|
||||||
|
|
||||||
|
log.info("borgdorff: %d listings opgehaald", len(listings))
|
||||||
|
return listings
|
||||||
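Both adapters above turn scraped key/value strings such as `bouwjaar` and `slaapkamers` into integers and fall back to the card value when the detail page has none. The following is a minimal sketch of how that conversion could be centralized. The helper name `to_int` and the sample `kk` dict are hypothetical and not part of this diff. Unlike the `.isdigit()` guards in the adapters, this version also accepts values with surrounding text (for example `"4 kamers"`), which may or may not be desirable.

```python
import re
from typing import Optional


def to_int(value: Optional[str]) -> Optional[int]:
    """Return the first run of digits in `value` as an int, or None."""
    if not value:
        return None
    match = re.search(r"\d+", value)
    return int(match.group()) if match else None


# Hypothetical detail-page dict, shaped like the ones the adapters build.
kk = {"bouwjaar": "1932", "slaapkamers": None, "kamers": "4 kamers"}
slaapkamers_card = 2  # value read from the listing card

bouwjaar = to_int(kk.get("bouwjaar"))
# Fall back to the card value when the detail page has no usable number.
slaapkamers = to_int(kk.get("slaapkamers")) or slaapkamers_card
kamers = to_int(kk.get("kamers"))
```

A helper like this never raises on `None` or non-numeric text, so a single odd value would not cause the whole listing to be dropped by the surrounding `except`.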
@@ -1,5 +1,5 @@
|
|||||||
"""
|
"""
|
||||||
config.py — vul aan met je eigen waarden. Secrets via environment variables.
|
config.py — Secrets via environment variables.
|
||||||
"""
|
"""
|
||||||
import os
|
import os
|
||||||
|
|
||||||
@@ -10,16 +10,46 @@ MICHELLE_WERK_9292 = "vlaardingen/"+MICHELLE_WERK_POSTCODE
|
|||||||
|
|
||||||
HA_WEBHOOK_URL = os.environ.get("HA_WEBHOOK_URL", "")
|
HA_WEBHOOK_URL = os.environ.get("HA_WEBHOOK_URL", "")
|
||||||
|
|
||||||
SMTP_HOST = os.environ.get("SMTP_HOST", "")
|
|
||||||
SMTP_PORT = int(os.environ.get("SMTP_PORT", "587"))
|
|
||||||
SMTP_FROM = os.environ.get("SMTP_FROM", "")
|
|
||||||
SMTP_TO = os.environ.get("SMTP_TO", "")
|
|
||||||
SMTP_USER = os.environ.get("SMTP_USER", "")
|
|
||||||
|
|
||||||
USER_AGENT = "Huizenbot/1.0 (+mark@kalsbeek.dev) persoonlijk gebruik"
|
USER_AGENT = "Huizenbot/1.0 (+mark@kalsbeek.dev) persoonlijk gebruik"
|
||||||
|
|
||||||
DB_PATH = os.environ.get("DB_PATH", "/data/huizenbot.db")
|
DB_PATH = os.environ.get("DB_PATH", "/data/huizenbot.db")
|
||||||
|
|
||||||
FIETS_SNELHEID_FACTOR = 1.27
|
FIETS_SNELHEID_FACTOR = 1.27
|
||||||
|
|
||||||
MAX_PRICE = 300_000
|
MAX_PRICE = 300_000 # coarse pre-filter in adapters only
|
||||||
|
|
||||||
|
MIN_AREA = 65  # minimum living area in m²
|
||||||
|
|
||||||
|
# Fine price filter: max mortgage per energy label group * 0.9
|
||||||
|
# Labels not in this map fall back to the most conservative tier.
|
||||||
|
_LABEL_DISCOUNT = 0.9
|
||||||
|
MAX_PRIJS_PER_LABEL: dict[str, int] = {
|
||||||
|
"EFG": int(286_942 * _LABEL_DISCOUNT),
|
||||||
|
"CD": int(291_942 * _LABEL_DISCOUNT),
|
||||||
|
"AB": int(296_942 * _LABEL_DISCOUNT),
|
||||||
|
"A+": int(306_942 * _LABEL_DISCOUNT),
|
||||||
|
}
|
||||||
|
_MAX_PRIJS_ONBEKEND = MAX_PRIJS_PER_LABEL["EFG"] # conservative fallback
|
||||||
|
|
||||||
|
def max_prijs_voor_label(label: str | None) -> int:
|
||||||
|
"""Return the max allowed price for a given energy label (or None/unknown)."""
|
||||||
|
if not label:
|
||||||
|
return _MAX_PRIJS_ONBEKEND
|
||||||
|
l = label.strip().upper()
|
||||||
|
if l in ("A+++", "A++", "A+"):
|
||||||
|
return MAX_PRIJS_PER_LABEL["A+"]
|
||||||
|
if l in ("A", "B"):
|
||||||
|
return MAX_PRIJS_PER_LABEL["AB"]
|
||||||
|
if l in ("C", "D"):
|
||||||
|
return MAX_PRIJS_PER_LABEL["CD"]
|
||||||
|
if l in ("E", "F", "G"):
|
||||||
|
return MAX_PRIJS_PER_LABEL["EFG"]
|
||||||
|
return _MAX_PRIJS_ONBEKEND
|
||||||
|
|
||||||
|
# Travel time limits (None travel time → pass, with warning)
|
||||||
|
MAX_OV_MINUTEN_MARK = 50
|
||||||
|
MAX_OV_MINUTEN_MICHELLE = 50
|
||||||
|
MAX_FIETS_MINUTEN_MARK = 35
|
||||||
|
# No fiets limit for michelle
|
||||||
|
|
||||||
|
APP_ENV = os.environ.get("APP_ENV", "dev")
|
||||||
|
|||||||
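The config hunk above caps the price per energy-label group at 90% of a maximum mortgage and falls back to the most conservative tier for unknown labels. As a rough illustration of the resulting thresholds, here is a table-driven sketch of the same lookup. The names `LABEL_TIER`, `MAX_MORTGAGE` and `max_price_for_label` are hypothetical. The amounts and the 0.9 factor are copied from the diff.

```python
from typing import Optional

LABEL_DISCOUNT = 0.9  # same factor as _LABEL_DISCOUNT in the diff

# Max mortgage per label group, copied from the config hunk.
MAX_MORTGAGE = {"EFG": 286_942, "CD": 291_942, "AB": 296_942, "A+": 306_942}

# Hypothetical flat lookup: individual energy label -> tier key.
LABEL_TIER = {
    "A+++": "A+", "A++": "A+", "A+": "A+",
    "A": "AB", "B": "AB",
    "C": "CD", "D": "CD",
    "E": "EFG", "F": "EFG", "G": "EFG",
}


def max_price_for_label(label: Optional[str]) -> int:
    """Return the price cap for a label; unknown or missing labels use EFG."""
    tier = LABEL_TIER.get((label or "").strip().upper(), "EFG")
    return int(MAX_MORTGAGE[tier] * LABEL_DISCOUNT)


caps = {label: max_price_for_label(label) for label in ("A++", "b", "D", None)}
```

With these numbers a listing without a label is held to the same cap as an E, F or G label, which is the conservative behavior the comment in the diff describes.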
218
src/huizenbot.py
@@ -6,13 +6,10 @@ import hashlib
|
|||||||
import json
|
import json
|
||||||
import logging
|
import logging
|
||||||
import os
|
import os
|
||||||
import smtplib
|
|
||||||
import sqlite3
|
import sqlite3
|
||||||
import time
|
import time
|
||||||
from dataclasses import dataclass, field
|
from dataclasses import dataclass, field
|
||||||
from datetime import datetime, date
|
from datetime import datetime, date
|
||||||
from email.mime.multipart import MIMEMultipart
|
|
||||||
from email.mime.text import MIMEText
|
|
||||||
from typing import Callable, Any
|
from typing import Callable, Any
|
||||||
|
|
||||||
import httpx
|
import httpx
|
||||||
@@ -97,6 +94,7 @@ CREATE TABLE IF NOT EXISTS woningen (
|
|||||||
|
|
||||||
|
|
||||||
def get_db(path: str) -> sqlite3.Connection:
|
def get_db(path: str) -> sqlite3.Connection:
|
||||||
|
log.info("Opening db at path %s", path)
|
||||||
conn = sqlite3.connect(path)
|
conn = sqlite3.connect(path)
|
||||||
conn.row_factory = sqlite3.Row
|
conn.row_factory = sqlite3.Row
|
||||||
conn.execute("PRAGMA journal_mode=WAL")
|
conn.execute("PRAGMA journal_mode=WAL")
|
||||||
@@ -161,9 +159,22 @@ def upsert(conn: sqlite3.Connection, listing: RawListing, travel: dict[str,int])
|
|||||||
"extra": json.dumps(listing.extra) if listing.extra else None,
|
"extra": json.dumps(listing.extra) if listing.extra else None,
|
||||||
})
|
})
|
||||||
else:
|
else:
|
||||||
_cursor = conn.execute("""
|
if travel:
|
||||||
UPDATE woningen SET last_seen = ?, status = ? WHERE id = ?
|
conn.execute("""
|
||||||
""", (now, listing.status, lid))
|
UPDATE woningen
|
||||||
|
SET last_seen = ?, status = ?,
|
||||||
|
fiets_mark = ?, fiets_michelle = ?, ov_mark = ?, ov_michelle = ?
|
||||||
|
WHERE id = ?
|
||||||
|
""", (
|
||||||
|
now, listing.status,
|
||||||
|
travel.get("fiets_mark"), travel.get("fiets_michelle"),
|
||||||
|
travel.get("ov_mark"), travel.get("ov_michelle"),
|
||||||
|
lid,
|
||||||
|
))
|
||||||
|
else:
|
||||||
|
conn.execute("""
|
||||||
|
UPDATE woningen SET last_seen = ?, status = ? WHERE id = ?
|
||||||
|
""", (now, listing.status, lid))
|
||||||
|
|
||||||
conn.commit()
|
conn.commit()
|
||||||
return is_new
|
return is_new
|
||||||
@@ -234,7 +245,7 @@ def _next_weekday_morning() -> str:
|
|||||||
return d.strftime("%Y%m%dT083000")
|
return d.strftime("%Y%m%dT083000")
|
||||||
|
|
||||||
|
|
||||||
def bereken_reistijden(postcode: str | None) -> dict[str, int]:
|
def bereken_reistijden(postcode: str | None, stad: str | None) -> dict[str, int]:
|
||||||
"""Compute all travel times for a listing's postcode. Returns an empty dict on failure."""
|
"""Compute all travel times for a listing's postcode. Returns an empty dict on failure."""
|
||||||
if not postcode:
|
if not postcode:
|
||||||
return {}
|
return {}
|
||||||
@@ -243,16 +254,20 @@ def bereken_reistijden(postcode: str | None) -> dict[str, int]:
|
|||||||
if not woning_coords:
|
if not woning_coords:
|
||||||
return {}
|
return {}
|
||||||
|
|
||||||
werk1 = geocode(config.MARK_WERK_POSTCODE)
|
werk1_coords = geocode(config.MARK_WERK_POSTCODE)
|
||||||
werk2 = geocode(config.MICHELLE_WERK_POSTCODE)
|
werk2_coords = geocode(config.MICHELLE_WERK_POSTCODE)
|
||||||
|
|
||||||
|
# 9292 expects "cityname/postcode" strings (lowercase city)
|
||||||
|
stad_lower = (stad or "").strip().lower()
|
||||||
|
woning_9292 = f"{stad_lower}/{postcode}" if stad_lower else postcode
|
||||||
|
|
||||||
result = {}
|
result = {}
|
||||||
if werk1:
|
if werk1_coords:
|
||||||
result["fiets_mark"] = fiets_minuten(woning_coords, werk1)
|
result["fiets_mark"] = fiets_minuten(woning_coords, werk1_coords)
|
||||||
result["ov_mark"] = ov_minuten(woning_coords, werk1)
|
result["ov_mark"] = ov_minuten(woning_9292, config.MARK_WERK_9292)
|
||||||
if werk2:
|
if werk2_coords:
|
||||||
result["fiets_michelle"] = fiets_minuten(woning_coords, werk2)
|
result["fiets_michelle"] = fiets_minuten(woning_coords, werk2_coords)
|
||||||
result["ov_michelle"] = ov_minuten(woning_coords, werk2)
|
result["ov_michelle"] = ov_minuten(woning_9292, config.MICHELLE_WERK_9292)
|
||||||
|
|
||||||
return result
|
return result
|
||||||
|
|
||||||
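The hunk above switches the OV lookups from coordinates to 9292-style location strings of the form `city/postcode`, the same shape as `MICHELLE_WERK_9292` in the config. A minimal sketch of that string construction in isolation follows. The function name `to_9292_location` and the sample postcode are hypothetical. How 9292 expects multi-word city names such as "Den Haag" to be written is not confirmed by this diff, so that case is left untouched here.

```python
from typing import Optional


def to_9292_location(postcode: str, stad: Optional[str]) -> str:
    """Build a 'city/postcode' string; fall back to the bare postcode."""
    stad_lower = (stad or "").strip().lower()
    return f"{stad_lower}/{postcode}" if stad_lower else postcode


with_city = to_9292_location("3111AB", " Schiedam ")
without_city = to_9292_location("3111AB", None)
```

Adapters that cannot determine a city (`stad=None`) still produce a usable argument, although whether 9292 resolves a bare postcode is an assumption to verify.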
@@ -285,45 +300,66 @@ def notify_ha(listing: RawListing, travel: dict[str,int]) -> None:
|
|||||||
log.info("HA notificatie verstuurd voor %s", listing.adres)
|
log.info("HA notificatie verstuurd voor %s", listing.adres)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
log.error("HA webhook fout: %s", e)
|
log.error("HA webhook fout: %s", e)
|
||||||
notify_email(listing, travel) # fallback
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Filtering
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
def notify_email(listing: RawListing, travel: dict[str,int]) -> None:
|
def _check_filters(listing: RawListing, travel: dict[str, int]) -> bool:
|
||||||
"""Stuur HTML email als fallback."""
|
|
||||||
if not config.SMTP_HOST:
|
|
||||||
return
|
|
||||||
|
|
||||||
subject = f"Nieuwe woning: {listing.adres}, {listing.stad} — €{listing.prijs:,}"
|
|
||||||
|
|
||||||
html = f"""
|
|
||||||
<html><body>
|
|
||||||
<h2>{listing.adres}, {listing.stad}</h2>
|
|
||||||
<p><strong>Prijs:</strong> €{listing.prijs:,}</p>
|
|
||||||
<p><strong>Status:</strong> {listing.status}</p>
|
|
||||||
<p><strong>Fiets P1:</strong> {travel.get('fiets_mark')} min
|
|
||||||
<strong>OV P1:</strong> {travel.get('ov_mark')} min</p>
|
|
||||||
<p><strong>Fiets P2:</strong> {travel.get('fiets_michelle')} min
|
|
||||||
<strong>OV P2:</strong> {travel.get('ov_michelle')} min</p>
|
|
||||||
{"<img src='" + listing.hero_image_url + "' width='600'>" if listing.hero_image_url else ""}
|
|
||||||
<p><a href="{listing.url}">Bekijk listing</a></p>
|
|
||||||
</body></html>
|
|
||||||
"""
|
"""
|
||||||
|
Returns True if the listing passes all filters and should trigger a notification.
|
||||||
|
Always errs on the side of notifying when data is missing (logs a warning).
|
||||||
|
"""
|
||||||
|
passed = True
|
||||||
|
|
||||||
msg = MIMEMultipart("alternative")
|
# --- Price filter ---
|
||||||
msg["Subject"] = subject
|
if listing.prijs is not None:
|
||||||
msg["From"] = config.SMTP_FROM
|
max_p = config.max_prijs_voor_label(listing.energielabel)
|
||||||
msg["To"] = config.SMTP_TO
|
if listing.prijs > max_p:
|
||||||
msg.attach(MIMEText(html, "html"))
|
log.info(
|
||||||
|
"Gefilterd op prijs: %s €%d > €%d (label: %s)",
|
||||||
|
listing.adres, listing.prijs, max_p, listing.energielabel or "onbekend",
|
||||||
|
)
|
||||||
|
passed = False
|
||||||
|
# --- Area filter ---
|
||||||
|
if listing.woonoppervlak is not None and listing.woonoppervlak < config.MIN_AREA:
|
||||||
|
log.info("Gefilterd op oppervlakte: %s %s m² < %s m²", listing.adres, listing.woonoppervlak, config.MIN_AREA)
|
||||||
|
passed = False
|
||||||
|
|
||||||
try:
|
# --- OV filter ---
|
||||||
with smtplib.SMTP(config.SMTP_HOST, config.SMTP_PORT) as s:
|
ov_mark = travel.get("ov_mark")
|
||||||
if config.SMTP_USER:
|
ov_michelle = travel.get("ov_michelle")
|
||||||
s.starttls()
|
|
||||||
s.login(config.SMTP_USER, os.environ.get("SMTP_PASSWORD", ""))
|
if ov_mark is None:
|
||||||
s.send_message(msg)
|
log.warning(
|
||||||
log.info("Email verstuurd voor %s", listing.adres)
|
"OV reistijd mark ONBEKEND voor %s — notificatie wordt toch verstuurd",
|
||||||
except Exception as e:
|
listing.adres,
|
||||||
log.error("Email fout: %s", e)
|
)
|
||||||
|
elif ov_mark > config.MAX_OV_MINUTEN_MARK:
|
||||||
|
log.info("Gefilterd op OV mark: %s %dmin > %dmin", listing.adres, ov_mark, config.MAX_OV_MINUTEN_MARK)
|
||||||
|
passed = False
|
||||||
|
|
||||||
|
if ov_michelle is None:
|
||||||
|
log.warning(
|
||||||
|
"OV reistijd michelle ONBEKEND voor %s — notificatie wordt toch verstuurd",
|
||||||
|
listing.adres,
|
||||||
|
)
|
||||||
|
elif ov_michelle > config.MAX_OV_MINUTEN_MICHELLE:
|
||||||
|
log.info("Gefilterd op OV michelle: %s %dmin > %dmin", listing.adres, ov_michelle, config.MAX_OV_MINUTEN_MICHELLE)
|
||||||
|
passed = False
|
||||||
|
|
||||||
|
# --- Fiets filter (mark only) ---
|
||||||
|
fiets_mark = travel.get("fiets_mark")
|
||||||
|
if fiets_mark is None:
|
||||||
|
log.warning(
|
||||||
|
"Fiets reistijd mark ONBEKEND voor %s — notificatie wordt toch verstuurd",
|
||||||
|
listing.adres,
|
||||||
|
)
|
||||||
|
elif fiets_mark > config.MAX_FIETS_MINUTEN_MARK:
|
||||||
|
log.info("Gefilterd op fiets mark: %s %dmin > %dmin", listing.adres, fiets_mark, config.MAX_FIETS_MINUTEN_MARK)
|
||||||
|
passed = False
|
||||||
|
|
||||||
|
return passed
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
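`_check_filters` repeats one pattern for each travel-time limit: an unknown value passes with a warning, a known value is compared against its limit, and every check runs so that all reasons get logged. A hedged sketch of that pattern as a reusable helper is shown below. `within_limit`, the logger name and the sample `travel` dict are hypothetical. The limits 50 and 35 mirror `MAX_OV_MINUTEN_*` and `MAX_FIETS_MINUTEN_MARK` from the config hunk.

```python
import logging
from typing import Optional

log = logging.getLogger("filters_sketch")


def within_limit(name: str, minutes: Optional[int], limit: int) -> bool:
    """Unknown travel times pass (with a warning); known ones must fit."""
    if minutes is None:
        log.warning("%s unknown, not filtering on it", name)
        return True
    if minutes > limit:
        log.info("Filtered on %s: %d min > %d min", name, minutes, limit)
        return False
    return True


# Hypothetical travel dict: ov_michelle is missing on purpose.
travel = {"ov_mark": 42, "fiets_mark": 40}

# Building the list first evaluates every check, so each reason is logged.
checks = [
    within_limit("ov_mark", travel.get("ov_mark"), 50),
    within_limit("ov_michelle", travel.get("ov_michelle"), 50),
    within_limit("fiets_mark", travel.get("fiets_mark"), 35),
]
passed = all(checks)
```

Here the listing is rejected only because of the cycling time. The missing OV time for the second commuter does not block the notification.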
@@ -333,42 +369,66 @@ def notify_email(listing: RawListing, travel: dict[str,int]) -> None:
|
|||||||
Scraper = Callable[[], list[RawListing]]
|
Scraper = Callable[[], list[RawListing]]
|
||||||
|
|
||||||
|
|
||||||
def run(scrapers: list[Scraper], db_path: str) -> None:
|
def _run_scraper(scraper: Scraper) -> tuple[str, list[RawListing]]:
|
||||||
conn = get_db(db_path)
|
name = scraper.__name__
|
||||||
total_new = 0
|
log.info("Scraper starten: %s", name)
|
||||||
|
try:
|
||||||
for scraper in scrapers:
|
listings = scraper()
|
||||||
name = scraper.__name__
|
|
||||||
log.info("Scraper starten: %s", name)
|
|
||||||
try:
|
|
||||||
listings = scraper()
|
|
||||||
except Exception as e:
|
|
||||||
log.error("Scraper %s gefaald: %s", name, e)
|
|
||||||
continue
|
|
||||||
|
|
||||||
log.info("Scraper %s: %d listings opgehaald", name, len(listings))
|
log.info("Scraper %s: %d listings opgehaald", name, len(listings))
|
||||||
|
return name, listings
|
||||||
|
except Exception as e:
|
||||||
|
log.error("Scraper %s gefaald: %s", name, e)
|
||||||
|
return name, []
|
||||||
|
|
||||||
for listing in listings:
|
|
||||||
travel = {}
|
|
||||||
try:
|
|
||||||
# Check whether this is a new listing before the upsert
|
|
||||||
lid = listing_id(listing.url)
|
|
||||||
is_existing = conn.execute(
|
|
||||||
"SELECT id FROM woningen WHERE id = ?", (lid,)
|
|
||||||
).fetchone() is not None
|
|
||||||
|
|
||||||
if not is_existing:
|
def run(scrapers: dict[str,Scraper], db_path: str) -> None:
|
||||||
travel = bereken_reistijden(listing.postcode)
|
import concurrent.futures
|
||||||
|
|
||||||
is_new = upsert(conn, listing, travel)
|
conn = get_db(db_path)
|
||||||
|
|
||||||
if is_new:
|
total_new = 0
|
||||||
total_new += 1
|
total_notified = 0
|
||||||
log.info("Nieuwe woning: %s (%s)", listing.adres, listing.url)
|
|
||||||
|
# Phase 1: run all scrapers concurrently (each hits a different domain)
|
||||||
|
all_listings: list[RawListing] = []
|
||||||
|
with concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(scrapers))) as pool:
|
||||||
|
futures = {pool.submit(_run_scraper, s): s for s in scrapers.values()}
|
||||||
|
for future in concurrent.futures.as_completed(futures):
|
||||||
|
_name, listings = future.result()
|
||||||
|
all_listings.extend(listings)
|
||||||
|
|
||||||
|
log.info("Alle scrapers klaar. %d listings totaal opgehaald.", len(all_listings))
|
||||||
|
|
||||||
|
# Phase 2: sequential travel calculation + upsert + filtered notify
|
||||||
|
for listing in all_listings:
|
||||||
|
travel = {}
|
||||||
|
try:
|
||||||
|
lid = listing_id(listing.url)
|
||||||
|
row = conn.execute(
|
||||||
|
"SELECT fiets_mark FROM woningen WHERE id = ?", (lid,)
|
||||||
|
).fetchone()
|
||||||
|
is_existing = row is not None
|
||||||
|
needs_travel = not is_existing or row[0] is None
|
||||||
|
|
||||||
|
if needs_travel:
|
||||||
|
travel = bereken_reistijden(listing.postcode, listing.stad)
|
||||||
|
|
||||||
|
is_new = upsert(conn, listing, travel)
|
||||||
|
|
||||||
|
if is_new:
|
||||||
|
total_new += 1
|
||||||
|
log.info("Nieuwe woning: %s (%s)", listing.adres, listing.url)
|
||||||
|
if _check_filters(listing, travel):
|
||||||
|
total_notified += 1
|
||||||
notify_ha(listing, travel)
|
notify_ha(listing, travel)
|
||||||
|
else:
|
||||||
|
log.info("Geen notificatie voor %s (gefilterd)", listing.adres)
|
||||||
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
log.error("Fout bij verwerken %s: %s", listing.url, e)
|
log.error("Fout bij verwerken %s: %s", listing.url, e)
|
||||||
|
|
||||||
log.info("Run klaar. %d nieuwe woningen gevonden.", total_new)
|
log.info(
|
||||||
|
"Run klaar. %d nieuwe woningen, %d notificaties verstuurd.",
|
||||||
|
total_new, total_notified,
|
||||||
|
)
|
||||||
conn.close()
|
conn.close()
|
||||||
|
|||||||
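The rewritten `run` works in two phases: scrapers run concurrently in a thread pool, and the collected listings are then processed one by one on the main thread. The sketch below shows only the first phase with stand-in scrapers. `scrape_a`, `scrape_b`, `run_scraper` and `collect` are hypothetical names, and plain strings stand in for `RawListing` objects.

```python
import concurrent.futures
from typing import Callable

Scraper = Callable[[], list[str]]


def scrape_a() -> list[str]:
    return ["a1", "a2"]


def scrape_b() -> list[str]:
    raise RuntimeError("site down")  # simulate a failing adapter


def run_scraper(scraper: Scraper) -> tuple[str, list[str]]:
    """Run one scraper; a failure yields an empty result instead of raising."""
    try:
        return scraper.__name__, scraper()
    except Exception:
        return scraper.__name__, []


def collect(scrapers: dict[str, Scraper]) -> list[str]:
    if not scrapers:  # ThreadPoolExecutor rejects max_workers=0
        return []
    results: list[str] = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(scrapers)) as pool:
        futures = [pool.submit(run_scraper, s) for s in scrapers.values()]
        for future in concurrent.futures.as_completed(futures):
            _name, items = future.result()
            results.extend(items)
    return results


all_items = collect({"a": scrape_a, "b": scrape_b})
```

Keeping the database work in the second, sequential phase fits how `sqlite3` behaves by default: a connection may only be used from the thread that created it unless `check_same_thread=False` is passed.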
741
src/templates/index.html
Normal file
@@ -0,0 +1,741 @@
|
|||||||
|
<!DOCTYPE html>
|
||||||
|
<html lang="nl">
|
||||||
|
<head>
|
||||||
|
<meta charset="UTF-8">
|
||||||
|
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||||
|
<title>Huizenbot</title>
|
||||||
|
<link rel="preconnect" href="https://fonts.googleapis.com">
|
||||||
|
<link href="https://fonts.googleapis.com/css2?family=Syne:wght@400;600;700;800&family=DM+Mono:wght@400;500&display=swap" rel="stylesheet">
|
||||||
|
<style>
|
||||||
|
:root {
|
||||||
|
--bg: #f5f0eb;
|
||||||
|
--surface: #fdf9f5;
|
||||||
|
--surface2: #ede8e2;
|
||||||
|
--border: #ddd6cc;
|
||||||
|
--accent: #6a9e78;
|
||||||
|
--accent-dim: #4f7a5c;
|
||||||
|
--text: #2e2a25;
|
||||||
|
--text-dim: #7a7068;
|
||||||
|
--text-dimmer: #aaa098;
|
||||||
|
--red: #c0524a;
|
||||||
|
--orange: #c07c3a;
|
||||||
|
--radius: 10px;
|
||||||
|
--font-ui: 'Syne', sans-serif;
|
||||||
|
--font-mono: 'DM Mono', monospace;
|
||||||
|
}
|
||||||
|
|
||||||
|
*, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
|
||||||
|
|
||||||
|
body {
|
||||||
|
background: var(--bg);
|
||||||
|
color: var(--text);
|
||||||
|
font-family: var(--font-ui);
|
||||||
|
min-height: 100vh;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* ── Header ── */
|
||||||
|
header {
|
||||||
|
padding: 1.25rem 1rem 0;
|
||||||
|
display: flex;
|
||||||
|
align-items: baseline;
|
||||||
|
gap: 0.75rem;
|
||||||
|
}
|
||||||
|
header h1 {
|
||||||
|
font-size: 1.5rem;
|
||||||
|
font-weight: 800;
|
||||||
|
letter-spacing: -0.03em;
|
||||||
|
color: var(--accent);
|
||||||
|
}
|
||||||
|
#count {
|
||||||
|
font-family: var(--font-mono);
|
||||||
|
font-size: 0.75rem;
|
||||||
|
color: var(--text-dim);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* ── Filters ── */
|
||||||
|
#filters {
|
||||||
|
position: sticky;
|
||||||
|
top: 0;
|
||||||
|
z-index: 100;
|
||||||
|
background: var(--bg);
|
||||||
|
border-bottom: 1px solid var(--border);
|
||||||
|
padding: 0.75rem 1rem;
|
||||||
|
display: flex;
|
||||||
|
flex-wrap: wrap;
|
||||||
|
gap: 0.5rem;
|
||||||
|
align-items: center;
|
||||||
|
}
|
||||||
|
|
||||||
|
.filter-group {
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
gap: 0.35rem;
|
||||||
|
background: var(--surface);
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
border-radius: 6px;
|
||||||
|
padding: 0.3rem 0.6rem;
|
||||||
|
}
|
||||||
|
.filter-group label {
|
||||||
|
font-size: 0.7rem;
|
||||||
|
font-weight: 600;
|
||||||
|
color: var(--text-dim);
|
||||||
|
letter-spacing: 0.04em;
|
||||||
|
white-space: nowrap;
|
||||||
|
text-transform: uppercase;
|
||||||
|
}
|
||||||
|
.filter-group input[type=number] {
|
||||||
|
background: transparent;
|
||||||
|
border: none;
|
||||||
|
color: var(--accent);
|
||||||
|
font-family: var(--font-mono);
|
||||||
|
font-size: 0.8rem;
|
||||||
|
width: 3.2rem;
|
||||||
|
outline: none;
|
||||||
|
text-align: right;
|
||||||
|
}
|
||||||
|
.filter-group input[type=number]::-webkit-inner-spin-button { opacity: 0.3; }
|
||||||
|
|
||||||
|
.filter-group select {
|
||||||
|
background: transparent;
|
||||||
|
border: none;
|
||||||
|
color: var(--text);
|
||||||
|
font-family: var(--font-ui);
|
||||||
|
font-size: 0.75rem;
|
||||||
|
font-weight: 600;
|
||||||
|
outline: none;
|
||||||
|
cursor: pointer;
|
||||||
|
}
|
||||||
|
.filter-group select option { background: var(--surface2); }
|
||||||
|
|
||||||
|
#filter-reset {
|
||||||
|
background: none;
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
border-radius: 6px;
|
||||||
|
color: var(--text-dimmer);
|
||||||
|
font-family: var(--font-ui);
|
||||||
|
font-size: 0.7rem;
|
||||||
|
font-weight: 600;
|
||||||
|
letter-spacing: 0.04em;
|
||||||
|
text-transform: uppercase;
|
||||||
|
padding: 0.3rem 0.7rem;
|
||||||
|
cursor: pointer;
|
||||||
|
transition: color 0.15s, border-color 0.15s;
|
||||||
|
}
|
||||||
|
#filter-reset:hover { color: var(--text); border-color: var(--text-dim); }
|
||||||
|
|
||||||
|
/* ── Card list ── */
|
||||||
|
#listings {
|
||||||
|
padding: 0.75rem 1rem 3rem;
|
||||||
|
display: flex;
|
||||||
|
flex-direction: column;
|
||||||
|
gap: 0.5rem;
|
||||||
|
max-width: 900px;
|
||||||
|
margin: 0 auto;
|
||||||
|
}
|
||||||
|
|
||||||
|
#empty {
|
||||||
|
text-align: center;
|
||||||
|
color: var(--text-dimmer);
|
||||||
|
font-family: var(--font-mono);
|
||||||
|
font-size: 0.85rem;
|
||||||
|
padding: 4rem 1rem;
|
||||||
|
display: none;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* ── Card ── */
|
||||||
|
.card {
|
||||||
|
background: var(--surface);
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
border-radius: var(--radius);
|
||||||
|
overflow: hidden;
|
||||||
|
transition: border-color 0.15s;
|
||||||
|
}
|
||||||
|
.card:hover { border-color: #c5bdb4; }
|
||||||
|
|
||||||
|
.card-compact {
|
||||||
|
display: grid;
|
||||||
|
grid-template-columns: 1fr 2fr;
|
||||||
|
min-height: 110px;
|
||||||
|
cursor: pointer;
|
||||||
|
user-select: none;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Image */
|
||||||
|
.card-img {
|
||||||
|
position: relative;
|
||||||
|
background: var(--surface2);
|
||||||
|
overflow: hidden;
|
||||||
|
}
|
||||||
|
.card-img img {
|
||||||
|
width: 100%;
|
||||||
|
height: 100%;
|
||||||
|
object-fit: cover;
|
||||||
|
display: block;
|
||||||
|
transition: transform 0.3s ease;
|
||||||
|
}
|
||||||
|
.card:hover .card-img img { transform: scale(1.03); }
|
||||||
|
.card-img-placeholder {
|
||||||
|
width: 100%;
|
||||||
|
height: 100%;
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
justify-content: center;
|
||||||
|
color: var(--text-dimmer);
|
||||||
|
font-size: 1.5rem;
|
||||||
|
}
|
||||||
|
.card-source {
|
||||||
|
position: absolute;
|
||||||
|
bottom: 0.4rem;
|
||||||
|
left: 0.4rem;
|
||||||
|
background: rgba(255,255,255,0.75);
|
||||||
|
backdrop-filter: blur(4px);
|
||||||
|
color: var(--text-dim);
|
||||||
|
font-family: var(--font-mono);
|
||||||
|
font-size: 0.6rem;
|
||||||
|
padding: 0.15rem 0.4rem;
|
||||||
|
border-radius: 4px;
|
||||||
|
letter-spacing: 0.03em;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Data section */
|
||||||
|
.card-data {
|
||||||
|
padding: 0.7rem 0.75rem;
|
||||||
|
display: flex;
|
||||||
|
flex-direction: column;
|
||||||
|
gap: 0.35rem;
|
||||||
|
min-width: 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
.card-header {
|
||||||
|
display: flex;
|
||||||
|
align-items: flex-start;
|
||||||
|
justify-content: space-between;
|
||||||
|
gap: 0.5rem;
|
||||||
|
}
|
+
+.card-adres {
+  font-size: 0.85rem;
+  font-weight: 700;
+  line-height: 1.3;
+  color: var(--text);
+  overflow: hidden;
+  text-overflow: ellipsis;
+  white-space: nowrap;
+  flex: 1;
+  min-width: 0;
+}
+.card-stad {
+  font-size: 0.7rem;
+  color: var(--text-dim);
+  font-weight: 400;
+}
+
+/* Link chip — always clickable, does NOT expand card */
+.card-link {
+  flex-shrink: 0;
+  display: inline-flex;
+  align-items: center;
+  gap: 0.25rem;
+  background: var(--surface2);
+  border: 1px solid var(--border);
+  border-radius: 5px;
+  color: var(--text-dim);
+  font-family: var(--font-mono);
+  font-size: 0.6rem;
+  padding: 0.2rem 0.45rem;
+  text-decoration: none;
+  transition: color 0.15s, border-color 0.15s, background 0.15s;
+  white-space: nowrap;
+}
+.card-link:hover {
+  color: var(--accent);
+  border-color: var(--accent-dim);
+  background: rgba(106,158,120,0.08);
+}
+.card-link svg { flex-shrink: 0; }
+
+.card-prijs {
+  font-size: 1rem;
+  font-weight: 800;
+  color: var(--accent);
+  letter-spacing: -0.02em;
+  font-family: var(--font-mono);
+}
+
+.card-meta {
+  display: grid;
+  grid-template-columns: 1fr 1fr;
+  gap: 0.2rem 0.5rem;
+  margin-top: 0.1rem;
+}
+.card-meta-item {
+  display: flex;
+  align-items: center;
+  gap: 0.3rem;
+  font-size: 0.68rem;
+  color: var(--text-dim);
+  font-family: var(--font-mono);
+  white-space: nowrap;
+}
+.card-meta-item .icon { font-size: 0.75rem; }
+.card-meta-item .val { color: var(--text); font-weight: 500; }
+.card-meta-item.warn .val { color: var(--orange); }
+.card-meta-item.ok .val { color: var(--accent); }
+
+/* Expand toggle indicator */
+.card-toggle {
+  align-self: flex-end;
+  color: var(--text-dimmer);
+  font-size: 0.65rem;
+  font-weight: 600;
+  letter-spacing: 0.04em;
+  text-transform: uppercase;
+  margin-top: auto;
+}
+
+/* ── Expanded panel ── */
+.card-expanded {
+  display: none;
+  border-top: 1px solid var(--border);
+  padding: 0.9rem 1rem;
+  background: var(--surface2);
+}
+.card.open .card-expanded { display: block; }
+.card.open .card-toggle { color: var(--accent-dim); }
+
+.expanded-grid {
+  display: grid;
+  grid-template-columns: repeat(auto-fill, minmax(160px, 1fr));
+  gap: 0.5rem 1rem;
+  margin-bottom: 0.75rem;
+}
+.expanded-field {
+  display: flex;
+  flex-direction: column;
+  gap: 0.1rem;
+}
+.expanded-field .ef-label {
+  font-size: 0.62rem;
+  font-weight: 600;
+  color: var(--text-dimmer);
+  letter-spacing: 0.06em;
+  text-transform: uppercase;
+}
+.expanded-field .ef-val {
+  font-size: 0.8rem;
+  color: var(--text);
+  font-family: var(--font-mono);
+}
+
+.extra-section {
+  border-top: 1px solid var(--border);
+  padding-top: 0.6rem;
+  margin-top: 0.25rem;
+}
+.extra-section h4 {
+  font-size: 0.62rem;
+  font-weight: 600;
+  color: var(--text-dimmer);
+  letter-spacing: 0.06em;
+  text-transform: uppercase;
+  margin-bottom: 0.4rem;
+}
+.extra-kv {
+  display: flex;
+  flex-wrap: wrap;
+  gap: 0.35rem;
+}
+.extra-kv-item {
+  background: var(--surface);
+  border: 1px solid var(--border);
+  border-radius: 5px;
+  padding: 0.2rem 0.5rem;
+  font-family: var(--font-mono);
+  font-size: 0.68rem;
+  color: var(--text-dim);
+}
+.extra-kv-item .ek { color: var(--text-dimmer); }
+.extra-kv-item .ev { color: var(--text); margin-left: 0.3rem; }
+
+/* ── Energielabel badge ── */
+.el-badge {
+  display: inline-flex;
+  align-items: center;
+  justify-content: center;
+  font-size: 0.62rem;
+  font-weight: 700;
+  font-family: var(--font-mono);
+  padding: 0.1rem 0.35rem;
+  border-radius: 3px;
+  letter-spacing: 0.03em;
+  line-height: 1.5;
+  color: #fff;
+  min-width: 1.8rem;
+  text-align: center;
+}
+.el-Appp { background: #004f2d; }
+.el-App { background: #006837; }
+.el-Ap { background: #1a9641; }
+.el-A { background: #3cb54a; }
+.el-B { background: #69b444; }
+.el-C { background: #a6d854; color: #2e2a25; }
+.el-D { background: #f9c819; color: #2e2a25; }
+.el-E { background: #f4a432; color: #2e2a25; }
+.el-F { background: #e8612d; }
+.el-G { background: #c0392b; }
+.el-unknown { background: var(--surface2); color: var(--text-dim); border: 1px solid var(--border); }
+
+/* ── Search bar ── */
+#f-search {
+  background: var(--surface);
+  border: 1px solid var(--border);
+  border-radius: 6px;
+  color: var(--text);
+  font-family: var(--font-ui);
+  font-size: 0.75rem;
+  font-weight: 500;
+  padding: 0.3rem 0.6rem;
+  outline: none;
+  width: 11rem;
+  transition: border-color 0.15s;
+}
+#f-search::placeholder { color: var(--text-dimmer); }
+#f-search:focus { border-color: var(--accent); }
+
+/* ── Disable filters toggle ── */
+#filter-disable {
+  background: none;
+  border: 1px solid var(--border);
+  border-radius: 6px;
+  color: var(--text-dimmer);
+  font-family: var(--font-ui);
+  font-size: 0.7rem;
+  font-weight: 600;
+  letter-spacing: 0.04em;
+  text-transform: uppercase;
+  padding: 0.3rem 0.7rem;
+  cursor: pointer;
+  transition: color 0.15s, border-color 0.15s, background 0.15s;
+}
+#filter-disable:hover { color: var(--text); border-color: var(--text-dim); }
+#filter-disable.active {
+  background: var(--orange);
+  border-color: var(--orange);
+  color: #fff;
+}
+
+/* ── No results ── */
+#empty { display: none; }
+#empty.visible { display: block; }
+</style>
+</head>
+<body>
+
+<header>
+  <h1>Huizenbot</h1>
+  <span id="count"></span>
+</header>
+
+<div id="filters">
+  <div class="filter-group">
+    <label>OV Mark ≤</label>
+    <input type="number" id="f-ov-mark" value="45" min="0" max="120">
+    <label>min</label>
+  </div>
+  <div class="filter-group">
+    <label>OV Michelle ≤</label>
+    <input type="number" id="f-ov-michelle" value="45" min="0" max="120">
+    <label>min</label>
+  </div>
+  <div class="filter-group">
+    <label>Fiets Mark ≤</label>
+    <input type="number" id="f-fiets-mark" value="40" min="0" max="90">
+    <label>min</label>
+  </div>
+  <div class="filter-group">
+    <label>Max prijs</label>
+    <input type="number" id="f-prijs" value="300000" min="0" step="5000">
+  </div>
+  <div class="filter-group">
+    <label>Min opp.</label>
+    <input type="number" id="f-opp" value="65" min="0" max="300">
+    <label>m²</label>
+  </div>
+  <div class="filter-group">
+    <label>Sorteer</label>
+    <select id="f-sort">
+      <option value="first_seen_desc">Nieuwste eerst</option>
+      <option value="first_seen_asc">Oudste eerst</option>
+      <option value="prijs_asc">Prijs ↑</option>
+      <option value="prijs_desc">Prijs ↓</option>
+      <option value="ov_mark_asc">OV Mark ↑</option>
+      <option value="fiets_mark_asc">Fiets Mark ↑</option>
+      <option value="opp_asc">Opp. ↑</option>
+      <option value="opp_desc">Opp. ↓</option>
+    </select>
+  </div>
+  <input type="search" id="f-search" placeholder="Zoek adres, stad…">
+  <button id="filter-disable">Filters uit</button>
+  <button id="filter-reset">Reset</button>
+</div>
+
+<div id="listings"></div>
+<div id="empty">Geen woningen gevonden met deze filters.</div>
+
+<script>
+const LISTINGS = {{ listings_json | safe }};
+
+const DEFAULTS = {
+  'f-ov-mark': 45,
+  'f-ov-michelle': 45,
+  'f-fiets-mark': 40,
+  'f-prijs': 300000,
+  'f-opp': 65,
+  'f-sort': 'first_seen_desc',
+};
+
+// ── Helpers ──
+
+function fmt_prijs(p) {
+  if (!p) return '—';
+  return '€\u202f' + p.toLocaleString('nl-NL');
+}
+
+function fmt_min(m) {
+  if (m == null) return '—';
+  return m + ' min';
+}
+
+function travel_class(val, warn, good) {
+  if (val == null) return '';
+  if (val <= good) return 'ok';
+  if (val <= warn) return '';
+  return 'warn';
+}
+
+function fmt_date(iso) {
+  if (!iso) return '—';
+  return iso.slice(0, 10);
+}
+
+function fmt_extra_val(v) {
+  if (v === null || v === undefined) return null;
+  if (typeof v === 'boolean') return v ? 'ja' : 'nee';
+  if (Array.isArray(v)) {
+    if (v.length === 0) return null;
+    // photos array: just show count
+    return v.length + ' foto\'s';
+  }
+  if (typeof v === 'object') return JSON.stringify(v).slice(0, 60);
+  const s = String(v);
+  if (s === '' || s === 'null') return null;
+  // truncate long description
+  return s.length > 120 ? s.slice(0, 120) + '…' : s;
+}
+
+function el_class(label) {
+  if (!label) return 'el-unknown';
+  const s = label.replace(/\+/g, 'p').replace(/-/g, '');
+  const map = { 'Appp': 'el-Appp', 'App': 'el-App', 'Ap': 'el-Ap', 'A': 'el-A',
+                'B': 'el-B', 'C': 'el-C', 'D': 'el-D', 'E': 'el-E', 'F': 'el-F', 'G': 'el-G' };
+  return map[s] || 'el-unknown';
+}
+
+function ef(label, val) {
+  if (val == null || val === '' || val === 'null') return '';
+  return `<div class="expanded-field">
+    <span class="ef-label">${label}</span>
+    <span class="ef-val">${val}</span>
+  </div>`;
+}
+
+// ── Card renderer ──
+
+function render_card(l) {
+  const img = l.hero_image_url
+    ? `<img src="${l.hero_image_url}" alt="${l.adres || ''}" loading="lazy">`
+    : `<div class="card-img-placeholder">🏠</div>`;
+
+  const ovM = travel_class(l.ov_mark, 45, 30);
+  const ovMi = travel_class(l.ov_michelle, 45, 30);
+  const fM = travel_class(l.fiets_mark, 40, 25);
+  const fMi = travel_class(l.fiets_michelle, 50, 35);
+
+  const extra_items = Object.entries(l.extra || {})
+    .map(([k, v]) => {
+      const fv = fmt_extra_val(v);
+      if (fv === null) return '';
+      return `<span class="extra-kv-item"><span class="ek">${k}</span><span class="ev">${fv}</span></span>`;
+    }).join('');
+
+  const extra_section = extra_items
+    ? `<div class="extra-section"><h4>Extra</h4><div class="extra-kv">${extra_items}</div></div>`
+    : '';
+
+  return `
+    <div class="card" data-id="${l.id}">
+      <div class="card-compact">
+        <div class="card-img">
+          ${img}
+          <span class="card-source">${l.source_makelaar}</span>
+        </div>
+        <div class="card-data">
+          <div class="card-header">
+            <div>
+              <div class="card-adres">${l.adres || '—'}</div>
+              <div class="card-stad">${l.stad || ''} ${l.postcode || ''}</div>
+            </div>
+            <a class="card-link" href="${l.url}" target="_blank" rel="noopener" onclick="event.stopPropagation()">
+              <svg width="9" height="9" viewBox="0 0 12 12" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round"><path d="M5 3H2a1 1 0 00-1 1v6a1 1 0 001 1h6a1 1 0 001-1V7M8 1h3m0 0v3m0-3L5 7"/></svg>
+              link
+            </a>
+          </div>
+          <div class="card-prijs">${fmt_prijs(l.prijs)}</div>
+          <div class="card-meta">
+            <div class="card-meta-item ${ovM}">
+              <span class="icon">🚌</span><span>Mark</span><span class="val">${fmt_min(l.ov_mark)}</span>
+            </div>
+            <div class="card-meta-item ${ovMi}">
+              <span class="icon">🚌</span><span>Michelle</span><span class="val">${fmt_min(l.ov_michelle)}</span>
+            </div>
+            <div class="card-meta-item ${fM}">
+              <span class="icon">🚲</span><span>Mark</span><span class="val">${fmt_min(l.fiets_mark)}</span>
+            </div>
+            <div class="card-meta-item ${fMi}">
+              <span class="icon">🚲</span><span>Michelle</span><span class="val">${fmt_min(l.fiets_michelle)}</span>
+            </div>
+            ${l.woonoppervlak ? `<div class="card-meta-item"><span class="icon">📐</span><span class="val">${l.woonoppervlak} m²</span></div>` : ''}
+            ${l.kamers ? `<div class="card-meta-item"><span class="icon">🚪</span><span class="val">${l.kamers} kamers</span></div>` : ''}
+            ${l.energielabel ? `<div class="card-meta-item"><span class="el-badge ${el_class(l.energielabel)}">${l.energielabel}</span></div>` : ''}
+          </div>
+          <div class="card-toggle">meer ↓</div>
+        </div>
+      </div>
+      <div class="card-expanded">
+        <div class="expanded-grid">
+          ${ef('Eerste gezien', fmt_date(l.first_seen))}
+          ${ef('Datum aanmelding', l.datum_aanmelding ? fmt_date(l.datum_aanmelding) : null)}
+          ${ef('Woningtype', l.woningtype)}
+          ${ef('Bouwjaar', l.bouwjaar)}
+          ${ef('Woonoppervlak', l.woonoppervlak ? l.woonoppervlak + ' m²' : null)}
+          ${ef('Perceeloppervlak', l.perceeloppervlak ? l.perceeloppervlak + ' m²' : null)}
+          ${ef('Kamers', l.kamers)}
+          ${ef('Slaapkamers', l.slaapkamers)}
+          ${ef('Energielabel', l.energielabel)}
+          ${ef('Postcode', l.postcode)}
+          ${ef('OV Mark', fmt_min(l.ov_mark))}
+          ${ef('OV Michelle', fmt_min(l.ov_michelle))}
+          ${ef('Fiets Mark', fmt_min(l.fiets_mark))}
+          ${ef('Fiets Michelle', fmt_min(l.fiets_michelle))}
+        </div>
+        ${extra_section}
+      </div>
+    </div>`;
+}
+
+// ── Filter + sort + render ──
+
+let filters_disabled = false;
+
+function get_filters() {
+  return {
+    ov_mark: parseInt(document.getElementById('f-ov-mark').value) || Infinity,
+    ov_michelle: parseInt(document.getElementById('f-ov-michelle').value) || Infinity,
+    fiets_mark: parseInt(document.getElementById('f-fiets-mark').value) || Infinity,
+    prijs: parseInt(document.getElementById('f-prijs').value) || Infinity,
+    opp: parseInt(document.getElementById('f-opp').value) || 0,
+    sort: document.getElementById('f-sort').value,
+    search: document.getElementById('f-search').value.trim().toLowerCase(),
+  };
+}
+
+const SORT_FNS = {
+  first_seen_desc: (a, b) => (b.first_seen || '').localeCompare(a.first_seen || ''),
+  first_seen_asc: (a, b) => (a.first_seen || '').localeCompare(b.first_seen || ''),
+  prijs_asc: (a, b) => (a.prijs || 0) - (b.prijs || 0),
+  prijs_desc: (a, b) => (b.prijs || 0) - (a.prijs || 0),
+  ov_mark_asc: (a, b) => (a.ov_mark ?? 999) - (b.ov_mark ?? 999),
+  fiets_mark_asc: (a, b) => (a.fiets_mark ?? 999) - (b.fiets_mark ?? 999),
+  opp_asc: (a, b) => (a.woonoppervlak ?? 0) - (b.woonoppervlak ?? 0),
+  opp_desc: (a, b) => (b.woonoppervlak ?? 0) - (a.woonoppervlak ?? 0),
+};
+
+function apply() {
+  const f = get_filters();
+  let filtered = LISTINGS.filter(l => {
+    if (!filters_disabled) {
+      if (f.ov_mark < Infinity && (l.ov_mark == null || l.ov_mark > f.ov_mark)) return false;
+      if (f.ov_michelle < Infinity && (l.ov_michelle == null || l.ov_michelle > f.ov_michelle)) return false;
+      if (f.fiets_mark < Infinity && (l.fiets_mark == null || l.fiets_mark > f.fiets_mark)) return false;
+      if (l.prijs != null && l.prijs > f.prijs) return false;
+      if (f.opp > 0 && (l.woonoppervlak == null || l.woonoppervlak < f.opp)) return false;
+    }
+    if (f.search) {
+      const haystack = [l.adres, l.stad, l.postcode, l.source_makelaar, l.woningtype]
+        .filter(Boolean).join(' ').toLowerCase();
+      if (!haystack.includes(f.search)) return false;
+    }
+    return true;
+  });
+
+  filtered.sort(SORT_FNS[f.sort] || SORT_FNS.first_seen_desc);
+
+  const container = document.getElementById('listings');
+  const empty = document.getElementById('empty');
+  const count = document.getElementById('count');
+
+  // Preserve open state
+  const open_ids = new Set(
+    [...container.querySelectorAll('.card.open')].map(el => el.dataset.id)
+  );
+
+  container.innerHTML = filtered.map(render_card).join('');
+  count.textContent = filtered.length + ' / ' + LISTINGS.length + ' woningen';
+
+  // Restore open state
+  open_ids.forEach(id => {
+    const el = container.querySelector(`.card[data-id="${id}"]`);
+    if (el) el.classList.add('open');
+  });
+
+  // Toggle on compact click
+  container.querySelectorAll('.card-compact').forEach(compact => {
+    compact.addEventListener('click', () => {
+      compact.closest('.card').classList.toggle('open');
+      const toggle = compact.querySelector('.card-toggle');
+      const isOpen = compact.closest('.card').classList.contains('open');
+      toggle.textContent = isOpen ? 'minder ↑' : 'meer ↓';
+    });
+  });
+
+  empty.classList.toggle('visible', filtered.length === 0);
+}
+
+// ── Init ──
+
+document.querySelectorAll('#filters input, #filters select').forEach(el => {
+  el.addEventListener('input', apply);
+});
+
+document.getElementById('filter-disable').addEventListener('click', () => {
+  filters_disabled = !filters_disabled;
+  document.getElementById('filter-disable').classList.toggle('active', filters_disabled);
+  document.getElementById('filter-disable').textContent = filters_disabled ? 'Filters aan' : 'Filters uit';
+  apply();
+});
+
+document.getElementById('filter-reset').addEventListener('click', () => {
+  Object.entries(DEFAULTS).forEach(([id, val]) => {
+    document.getElementById(id).value = val;
+  });
+  document.getElementById('f-search').value = '';
+  filters_disabled = false;
+  document.getElementById('filter-disable').classList.remove('active');
+  document.getElementById('filter-disable').textContent = 'Filters uit';
+  apply();
+});
+
+apply();
+</script>
+</body>
+</html>
65
src/web.py
Normal file
@@ -0,0 +1,65 @@
+"""
+web.py — huizenbot web interface
+Single route: query SQLite, SSR listings into index.html template.
+"""
+import json
+import sqlite3
+import os
+from flask import Flask, render_template, g
+
+DB_PATH = os.environ.get("DB_PATH", "/data/huizenbot.db")
+APP_ENV = os.environ.get("APP_ENV", "dev")
+
+app = Flask(__name__)
+
+
+def get_db():
+    if "db" not in g:
+        conn = sqlite3.connect(DB_PATH)
+        conn.row_factory = sqlite3.Row
+        g.db = conn
+    return g.db
+
+
+@app.teardown_appcontext
+def close_db(e=None):
+    db = g.pop("db", None)
+    if db is not None:
+        db.close()
+
+
+@app.route("/")
+def index():
+    conn = get_db()
+    rows = conn.execute("""
+        SELECT
+            id, url, source_makelaar, first_seen, last_seen, datum_aanmelding,
+            status, adres, postcode, stad,
+            prijs, woningtype, woonoppervlak, perceeloppervlak,
+            kamers, slaapkamers, bouwjaar, energielabel,
+            hero_image_url,
+            fiets_mark, fiets_michelle, ov_mark, ov_michelle,
+            extra
+        FROM woningen
+        WHERE status = 'beschikbaar'
+        ORDER BY first_seen DESC
+    """).fetchall()
+
+    listings = []
+    for row in rows:
+        d = dict(row)
+        try:
+            d["extra"] = json.loads(d["extra"]) if d["extra"] else {}
+        except Exception:
+            d["extra"] = {}
+        listings.append(d)
+
+    return render_template("index.html", listings_json=json.dumps(listings, ensure_ascii=False))
+
+
+if __name__ == "__main__":
+    if APP_ENV == "dev":
+        app.run(debug=True, host="0.0.0.0", port=5005)
+    else:
+        from waitress import serve
+        serve(app, host="0.0.0.0", port=5005)
||||||
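Review note on `src/web.py`: the route hands the raw `json.dumps(...)` output to the template, which inlines it into a `<script>` element with `| safe`. A scraped field containing `</script>` would then end the script block early. Below is a minimal sketch of one way to harden this. The helper name `json_for_script` and the sample data are assumptions for illustration and are not part of this commit. To my knowledge Flask's built-in `tojson` template filter applies comparable escaping and may be the simpler fix.

```python
import json


def json_for_script(obj) -> str:
    """Serialize obj to JSON that can be inlined inside a <script> element.

    Hypothetical helper for illustration only; not part of this commit.
    """
    text = json.dumps(obj, ensure_ascii=False)
    # In JSON output, <, > and & can only occur inside string values, so a
    # global replace is safe. The \uXXXX forms are valid JSON and JavaScript
    # string escapes, so the decoded value is unchanged.
    return (
        text.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


# Sample data (made up) with a value that would otherwise close the script tag.
listings = [{"adres": "Teststraat 1", "extra": {"note": "</script><b>x</b>"}}]
payload = json_for_script(listings)
```

With a helper like this, `index()` could pass `json_for_script(listings)` as `listings_json` and keep the existing `| safe` in the template.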
@@ -22,10 +22,10 @@ def _key(url: str, params: dict[str,str] | None) -> str:
 
 def _patch():
     import adapters.api as api_mod
-    import adapters.ssr as ssr_mod
+    import adapters.ssr._shared as ssr_shared
 
     _orig_fetch_json = api_mod.fetch_json
-    _orig_fetch_soup = ssr_mod.fetch_soup
+    _orig_fetch_soup = ssr_shared.fetch_soup
 
     def cached_fetch_json(url, *, params: dict[str,str]|None=None, headers=None):
         path = CACHE_DIR / (_key(url, params) + ".json")
@@ -46,7 +46,15 @@ def _patch():
         return result
 
     api_mod.fetch_json = cached_fetch_json
-    ssr_mod.fetch_soup = cached_fetch_soup
+    # fetch_soup is imported directly in each submodule via `from ._shared import fetch_soup`,
+    # so we must patch the name in every submodule that uses it.
+    import adapters.ssr.realworks as _rw
+    import adapters.ssr.sure as _sure
+    import adapters.ssr.schiedam as _sch
+    import adapters.ssr.denhaag as _dh
+    import adapters.ssr.overige as _ov
+    for _mod in [ssr_shared, _rw, _sure, _sch, _dh, _ov]:
+        _mod.fetch_soup = cached_fetch_soup
     print("[cache] fetch_json and fetch_soup patched")
 
 
|||||||
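Review note on the patching change above: the new comment says `fetch_soup` has to be patched in every submodule because each one binds the name with `from ._shared import fetch_soup`. The sketch below reproduces that behaviour with two in-memory stand-in modules. The names `shared`, `consumer`, and `scrape` are assumptions made up for the demonstration and do not exist in the repository.

```python
import types

# Stand-in for adapters.ssr._shared (hypothetical, built in memory).
shared = types.ModuleType("shared")
shared.fetch_soup = lambda url: f"live:{url}"

# Stand-in for a submodule that did `from ._shared import fetch_soup`:
# the import copies the reference into the submodule's own namespace.
consumer = types.ModuleType("consumer")
consumer.fetch_soup = shared.fetch_soup
# Define scrape() so that it resolves fetch_soup in consumer's globals.
exec("def scrape(url):\n    return fetch_soup(url)\n", consumer.__dict__)


def cached_fetch_soup(url):
    return f"cached:{url}"


# Patching only the defining module leaves the consumer's binding untouched.
shared.fetch_soup = cached_fetch_soup
only_shared_patched = consumer.scrape("u")

# Rebinding the name inside the consumer module is what takes effect.
consumer.fetch_soup = cached_fetch_soup
both_patched = consumer.scrape("u")
```

This is why the loop over `[ssr_shared, _rw, _sure, _sch, _dh, _ov]` rebinds the name in each module. A design alternative would be for the submodules to call `_shared.fetch_soup(...)` through the module object, in which case a single patch would suffice.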
@@ -1,17 +1,26 @@
 import sys
 
 sys.path.insert(0, "../src")
 
+import logging
+
 from cache import * # noqa: F401 — must be before adapter imports
 
 from adapters import SCRAPERS
 
+logging.basicConfig(
+    stream=sys.stdout,
+    level=logging.INFO, # debug costs too many tokens
+    format="%(asctime)s %(levelname)s %(name)s — %(message)s",
+    datefmt="%Y-%m-%dT%H:%M:%S",
+)
+
 # --- change this to test a different adapter ---
-ADAPTER = SCRAPERS['bjornd']
+ADAPTER = SCRAPERS['post']
 
 if __name__ == "__main__":
     print(f"Testing adapter: {ADAPTER.__name__}")
     listings = ADAPTER()
     print(f"Got {len(listings)} listings\n")
     for l in listings:
-        print(f" {l.adres}, {l.stad} — €{l.prijs} — {l.url}")
+        print(f" {l.adres}, {l.postcode}, {l.stad} — €{l.prijs} — {l.kamers} rooms — {l.woonoppervlak}m2 — {l.energielabel} — {l.url}")
@@ -1,26 +0,0 @@
-import sys
-sys.path.insert(0, "../src")
-
-from huizenbot import notify_email, RawListing
-
-TEST_LISTING = RawListing(
-    url="https://example.com/test-woning",
-    source_makelaar="test",
-    adres="Teststraat 1",
-    stad="Delft",
-    postcode="2613AA",
-    prijs=350000,
-    hero_image_url=None,
-)
-
-TEST_TRAVEL = {
-    "fiets_mark": 20,
-    "fiets_michelle": 35,
-    "ov_mark": 30,
-    "ov_michelle": 45,
-}
-
-if __name__ == "__main__":
-    print("=== Email ===")
-    notify_email(TEST_LISTING, TEST_TRAVEL)
-    print(" verstuurd (check je inbox)")
@@ -0,0 +1,37 @@
+import sys
+sys.path.insert(0, "../src")
+
+import logging
+
+from huizenbot import notify_ha, RawListing
+
+
+logging.basicConfig(
+    stream=sys.stdout,
+    level=logging.INFO, # debug costs too many tokens
+    format="%(asctime)s %(levelname)s %(name)s — %(message)s",
+    datefmt="%Y-%m-%dT%H:%M:%S",
+)
+
+
+TEST_LISTING = RawListing(
+    url="https://home.kalsbeek.dev/api/webhook/new_house",
+    source_makelaar="test",
+    adres="Teststraat 1",
+    stad="Delft",
+    postcode="2613AA",
+    prijs=350000,
+    hero_image_url=None,
+)
+
+TEST_TRAVEL = {
+    "fiets_persoon1": 20,
+    "fiets_persoon2": 35,
+    "ov_persoon1": 30,
+    "ov_persoon2": 45,
+}
+
+if __name__ == "__main__":
+    print("=== Home Assistant webhook ===")
+    notify_ha(TEST_LISTING, TEST_TRAVEL)
+    print(" verstuurd (check HA voor bevestiging)")