Optimize TWebCopy Settings for Faster, Accurate Site Copies
1. Choose the right download mode
- Full site — best for complete offline browsing; slower and larger.
- Mirror (structured) — preserves site structure; balanced speed and accuracy.
- Single page / partial — fastest; use when you only need specific sections.
2. Set URL filters
- Include: add only necessary paths (e.g., /blog/, /docs/) to reduce size.
- Exclude: block large media folders, tracking scripts, CDN paths, or query-heavy URLs (e.g., ?session=).
- Use wildcard patterns (e.g., /images/*) to trim whole groups of URLs with a single rule.
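The include/exclude logic above can be sketched in a few lines of Python. This is an illustration of the idea only, not TWebCopy's filter syntax: the rule lists, the should_download name, and the choice to let exclude rules win over include rules are all assumptions.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical rule lists; TWebCopy's own filter syntax may differ.
INCLUDE = ["/blog/*", "/docs/*"]
EXCLUDE = ["/images/*", "*/video/*"]
EXCLUDE_QUERY_KEYS = {"session"}


def should_download(url: str) -> bool:
    """Return True if the URL passes the exclude rules and matches an include rule."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Reject URLs that carry an excluded query parameter such as ?session=
    keys = {pair.split("=", 1)[0] for pair in parts.query.split("&") if pair}
    if keys & EXCLUDE_QUERY_KEYS:
        return False
    # Exclude rules take priority over include rules (an assumption of this sketch).
    if any(fnmatchcase(path, pattern) for pattern in EXCLUDE):
        return False
    return any(fnmatchcase(path, pattern) for pattern in INCLUDE)
```

Checking a handful of real URLs from the target site against rules like these before a long run is a cheap way to catch a pattern that is too broad.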
3. Limit depth and bandwidth
- Maximum link depth: 2–4 for most sites; higher depths increase time and size.
- Max file size: skip very large files (e.g., anything over 5–10 MB) to shorten the copy.
- Download threads/concurrency: increase for faster downloads but monitor CPU/network; start with 4–8 threads.
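To see why link depth has such a large effect on time and size, here is a minimal sketch of a depth-limited breadth-first crawl over an in-memory link graph. The SITE dictionary and the crawl_order function are hypothetical stand-ins for a real site; no network access is involved.

```python
from collections import deque


def crawl_order(links: dict[str, list[str]], start: str, max_depth: int) -> list[str]:
    """Return pages in breadth-first order, following links at most max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if depth == max_depth:
            continue  # do not follow links from pages at the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical link graph: page -> pages it links to.
SITE = {"/": ["/a", "/b"], "/a": ["/a/1"], "/a/1": ["/a/1/x"], "/b": ["/"]}
```

Each extra level of depth adds every page linked from the previous level, so the number of pages can grow quickly on sites with dense navigation.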
4. Adjust request behavior
- Respect robots.txt: enable if you want to follow site rules; disable only if you have permission.
- Delay between requests: 200–800 ms to avoid server throttling; reduce for trusted/local sites.
- User-Agent: set a common browser UA so the server returns the same content a visitor would see; do not impersonate search-engine crawlers.
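The robots.txt and delay settings can be illustrated with Python's standard urllib.robotparser module. This is a sketch of the behavior, not of how TWebCopy implements it: the ROBOTS_TXT content, the helper names, and the random jitter inside the 200–800 ms range are assumptions.

```python
import random
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one is fetched from the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set without touching the network."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def next_delay(min_ms: int = 200, max_ms: int = 800) -> float:
    """Pick a pause in seconds between requests, spread over the given range."""
    return random.uniform(min_ms, max_ms) / 1000.0


rules = build_rules(ROBOTS_TXT)
```

A downloader would call rules.can_fetch(user_agent, url) before each request and sleep for next_delay() seconds between requests.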
5. Handle JavaScript-heavy sites
- TWebCopy is not a full browser, so content rendered by JavaScript may be missing from the copy. Either download the API endpoints the pages load their data from, or use a tool that drives a headless browser (e.g., Puppeteer or Playwright).
- If pages load data from JSON endpoints, add those URLs to the include list.
6. File types and MIME handling
- Include needed extensions (.html, .css, .js, .png, .jpg, .svg, .json).
- Exclude unnecessary formats like .mp4/.zip if not required.
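An extension filter of this kind might look like the following sketch. The ALLOWED and BLOCKED sets mirror the lists above; the wanted_by_extension name and the decision to keep extensionless URLs are assumptions made for illustration.

```python
import posixpath
from urllib.parse import urlsplit

ALLOWED = {".html", ".css", ".js", ".png", ".jpg", ".svg", ".json"}
BLOCKED = {".mp4", ".zip"}


def wanted_by_extension(url: str) -> bool:
    """Decide from the URL path's extension whether a file should be downloaded."""
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    if ext in BLOCKED:
        return False
    # Extensionless URLs (e.g. /docs/) usually serve HTML, so this sketch keeps them.
    return ext == "" or ext in ALLOWED
```

Extensions are only a hint: a server can return any content type for any URL, which is why the section title also mentions MIME handling.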
7. Rewrite and link handling
- Enable link rewriting to make local links functional.
- Use absolute-to-relative conversion for portability.
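Absolute-to-relative conversion means expressing each same-site link relative to the page that contains it, so the copy still works after the folder is moved. A minimal sketch, assuming a hypothetical to_relative helper that ignores query strings and fragments and leaves links to other hosts unchanged:

```python
import posixpath
from urllib.parse import urlsplit


def to_relative(page_url: str, target_url: str) -> str:
    """Rewrite target_url relative to the directory of page_url (same host only)."""
    page, target = urlsplit(page_url), urlsplit(target_url)
    if page.netloc != target.netloc:
        return target_url  # external link: leave it absolute
    page_dir = posixpath.dirname(page.path) or "/"
    return posixpath.relpath(target.path or "/", start=page_dir)
```

For a page at https://example.com/docs/guide/index.html, a stylesheet at https://example.com/css/site.css becomes ../../css/site.css.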
8. Logging and retries
- Enable detailed logs to catch missed resources.
- Set the retry count to 2–3 for transient errors, and increase the wait between successive retries (backoff).
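Retrying with backoff can be sketched as follows. The fetch_with_retries name, the doubling delay, and the choice to treat OSError (which includes connection errors and timeouts) as transient are assumptions of this example, not a description of TWebCopy's internals.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call fetch(); on OSError wait base_delay * 2**attempt seconds and try again."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except OSError:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
```

With the defaults, the waits are 0.5 s, 1 s, and 2 s. The sleep parameter exists so the waiting can be replaced in tests.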
9. Post-download cleanup
- Remove orphaned files and empty folders.
- Run a quick local link-check to find broken links and re-download missing assets.
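A quick local link-check does not need a dedicated tool. The sketch below, using only the Python standard library, scans every .html file under a folder and reports href and src values that point to files missing on disk. The broken_links name is hypothetical, and the check skips external URLs, anchors, and mailto: and data: links.

```python
from html.parser import HTMLParser
from pathlib import Path


class _LinkCollector(HTMLParser):
    """Collect the values of href and src attributes from one HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def broken_links(root: Path) -> list[tuple[str, str]]:
    """Return (page, link) pairs whose local target file does not exist."""
    missing = []
    for page in sorted(root.rglob("*.html")):
        collector = _LinkCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        for link in collector.links:
            if "://" in link or link.startswith(("#", "//", "mailto:", "data:")):
                continue  # external or non-file link
            target = link.split("#", 1)[0].split("?", 1)[0]
            if not target:
                continue
            # Root-relative links resolve against the copy's root folder.
            base = root / target.lstrip("/") if target.startswith("/") else page.parent / target
            if not base.exists():
                missing.append((page.relative_to(root).as_posix(), link))
    return missing
```

The returned pairs show which page references each missing asset, which makes it easier to decide what to re-download.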
10. Test and iterate
- Start with a small section using conservative settings, verify results, then expand.
- Keep a checklist: page rendering, images, styles, scripts, forms, and downloads.