I'm looking to gather all the property listings out there (not that many in reality). What's the best way to learn how to pull this data in through a combination of scraping and API calls, and dump it into a database?
I then want to query that data to identify the best properties to buy: for instance, searching for the string "in need of" within the description to identify potential fixer-uppers.
Any help is most appreciated.
First, take a look at the various web scrapers (for instance, this one gave me good results and was easy to get started with: http://webscraper.io).
Then you may want to check out some scripts in R and/or Python for scraping data from the web.
After that, google 'text parsing'.
I know this is not much, but it may give you a general direction for researching your options.
BTW, I am currently researching almost in the same direction using R :)
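To make the "scrape, then parse the text" idea a bit more concrete, here is a minimal sketch in Python using only the standard library. The HTML snippet, the `description` class name, and the helper names `DescriptionParser` and `find_fixer_uppers` are all made up for illustration; a real listings site will have different markup, and you would fetch the page first instead of using a hard-coded string.

```python
from html.parser import HTMLParser

# Hypothetical markup: each listing's description sits in <p class="description">.
SAMPLE_HTML = """
<div class="listing"><h2>12 Oak Lane</h2>
<p class="description">Victorian terrace in need of modernisation.</p></div>
<div class="listing"><h2>4 Mill Road</h2>
<p class="description">Recently renovated two-bed flat.</p></div>
"""


class DescriptionParser(HTMLParser):
    """Collects the text of every <p class="description"> element."""

    def __init__(self):
        super().__init__()
        self.descriptions = []
        self._in_description = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "p" and ("class", "description") in attrs:
            self._in_description = True
            self.descriptions.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_description = False

    def handle_data(self, data):
        if self._in_description:
            self.descriptions[-1] += data


def find_fixer_uppers(html, phrase="in need of"):
    """Return the descriptions that contain the phrase (case-insensitive)."""
    parser = DescriptionParser()
    parser.feed(html)
    return [d.strip() for d in parser.descriptions if phrase in d.lower()]


fixer_uppers = find_fixer_uppers(SAMPLE_HTML)
print(fixer_uppers)
```

A plain substring test is enough for a phrase like "in need of"; a dedicated scraping library would mostly save you the parser boilerplate.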
I think Octoparse may interest you. You don't need to know how to code to use it, although knowing a bit of regex and XPath helps.
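In case regex and XPath are new to you, here is a rough sketch of what each one does, written in Python rather than in Octoparse itself. The XML snippet and field names are invented for the example, and Python's built-in `xml.etree.ElementTree` only supports a limited subset of XPath, so treat this as an illustration of the idea and not as what any particular tool does internally.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample data standing in for a scraped page.
SAMPLE = """<listings>
  <listing><address>12 Oak Lane</address><price>£185,000</price></listing>
  <listing><address>4 Mill Road</address><price>£240,500</price></listing>
</listings>"""

root = ET.fromstring(SAMPLE)

prices = {}
# XPath-style expression: select every <listing> element anywhere in the tree.
for listing in root.findall(".//listing"):
    address = listing.findtext("address")
    raw_price = listing.findtext("price")
    # Regex: strip everything that is not a digit, e.g. "£185,000" -> "185000".
    prices[address] = int(re.sub(r"[^\d]", "", raw_price))

print(prices)
```

XPath picks out where the data lives in the page structure, and regex cleans up the text once you have it.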
Obviously there are a couple of parts to this: data acquisition and data parsing.
If you are on Windows, try using PowerShell. It is a free and powerful tool with which you can easily scrape web data and insert it into a database. You can apply regex to the data either in the database or in the script while downloading. Another easy-to-use tool for the data acquisition part is the iMacros add-on for Chrome. I've used it to scrape public data when no API exists. It is easy and fairly powerful. Good luck.
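As a rough illustration of the "filter in the database or in the script" choice, here is a small sketch in Python with SQLite (not PowerShell, just to keep it self-contained). The table name, columns, and sample rows are invented; in practice the rows would come from your scraper or an API.

```python
import re
import sqlite3

# Hypothetical rows as they might come out of a scraper: (address, description).
rows = [
    ("12 Oak Lane", "Victorian terrace in need of modernisation."),
    ("4 Mill Road", "Recently renovated two-bed flat."),
    ("9 Elm Court", "Cottage In Need Of some TLC, large garden."),
]

# In-memory database for the example; use a file path to keep the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (address TEXT, description TEXT)")
conn.executemany("INSERT INTO listings VALUES (?, ?)", rows)
conn.commit()

# Option 1: filter in the database.
# SQLite's LIKE is case-insensitive for ASCII characters by default.
in_db = [
    r[0]
    for r in conn.execute(
        "SELECT address FROM listings WHERE description LIKE ? ORDER BY address",
        ("%in need of%",),
    )
]

# Option 2: filter in the script with a regex, e.g. while downloading.
pattern = re.compile(r"\bin need of\b", re.IGNORECASE)
in_script = [address for address, description in rows if pattern.search(description)]

print(in_db)
print(in_script)
conn.close()
```

Filtering in the database keeps all the raw listings around for later queries, while filtering in the script means you only store what you already know you want.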