Useful stats that could be gotten from Daft

Currently the stats available for Daft are: volumes advertised on daftwatch, and discounts/changes on IrishPropertyWatch and TreesDontGrowToTheSky. There are some other stats that could be retrieved as well. Back in 2007 I posted some stats I calculated from Daft data:

It gives a picture of the asking-price distribution in the market for a particular subtype of property at a point in time.
This could be quite useful info. The interesting part of the graph is the threshold on the low end where there is a likelihood of sales occurring. If the low side of the curve moves lower/higher, the price has dropped/increased by that amount (with some % discount to be taken into account).

These frequency distribution stats could be done across categories of properties, number of rooms, and location to give a good indication of the currently actively traded part of the market. Would it be good to add this info to daftwatch, IPW, or TreesDGTTS?

We could definitely look at adding something like this to Daftwatch.

Do you have some historical data collected?

I only collected some at the time I tried out that analysis, then got distracted by other things. At most I have two or three weeks’ worth. Something I was thinking of at the time was to take the geo-location info used to feed the Location Map popup. This isn’t very accurate in some cases, but reasonably so in most, I think. With that you could probably produce some nice stats on areas or distances from the city centre.

We publish the median and average sales price on any IPW advanced search.

If you spec it out I could try and add some extra results to the search page.

On IPW under statistics I see the average price and the number of properties that dropped. I didn’t see a median there. Not sure where you have avg and median.

To spec it out: I’m thinking of a snapshot of the entire set of prices at a point in time, not just the ones that have dropped. You would collect all prices for each available category of property and bin them into price ranges, e.g. from 150,000 up in steps of 50,000. You’d probably need to apply some curve analysis to determine the characteristics of the frequency distribution: e.g. find the price point where it’s 0.2 of the modal value on the low side, or the point where 5% of the set is lower. Maybe over time the shape of the distribution will change as we go from denial into serious selling - I’d imagine you’d start to see the distribution moving lower and narrowing as that happens.

I’m not a proper stats person so I don’t know what analysis to put on it.
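A minimal sketch of the binning idea described above, in Python: bin asking prices into 50,000-wide ranges from 150,000 up, then find the low-side threshold both ways mentioned - the 5th-percentile price, and the lowest bin that reaches 0.2 of the modal bin’s count. The prices here are invented sample data, not real Daft listings.

```python
def bin_prices(prices, start=150_000, width=50_000):
    """Return {bin_lower_bound: count} for the given asking prices."""
    bins = {}
    for p in prices:
        # anything under `start` is lumped into one catch-all bin below it
        lower = start + ((p - start) // width) * width if p >= start else start - width
        bins[lower] = bins.get(lower, 0) + 1
    return bins

def low_side_threshold(prices, fraction=0.05):
    """Price point below which `fraction` of the set lies (e.g. 5th percentile)."""
    s = sorted(prices)
    return s[int(len(s) * fraction)]

def modal_fraction_threshold(bins, fraction=0.2):
    """Lowest bin whose count reaches `fraction` of the modal bin's count."""
    peak = max(bins.values())
    for lower in sorted(bins):
        if bins[lower] >= fraction * peak:
            return lower

# invented sample asking prices
prices = [180_000, 195_000, 210_000, 225_000, 240_000, 240_000,
          255_000, 260_000, 275_000, 310_000, 350_000, 420_000]
bins = bin_prices(prices)
print("bins:", bins)
print("5th percentile:", low_side_threshold(prices))
print("0.2-of-modal threshold:", modal_fraction_threshold(bins))
```

Watching how these two threshold numbers drift from snapshot to snapshot would give exactly the "low side of the curve moving" signal described above.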

You can see the median price for 3 beds in Dublin in this


The most interesting thing about that chart is that the sale agreed price is always below the asking price.

It should be pretty easy to modify the existing scripts to grab the count of sale agreeds across a certain price range for a county and plot them against total sales. I can look at doing this over the weekend.
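The counting step above might look something like this sketch: for each price bin, tally sale-agreed listings against total listings. The listing tuples are invented sample data - the real scripts and data layout aren’t shown in this thread.

```python
def agreed_vs_total(listings, width=50_000):
    """listings: iterable of (price, is_sale_agreed) tuples.
    Returns {bin_lower_bound: (agreed_count, total_count)}."""
    out = {}
    for price, agreed in listings:
        lower = (price // width) * width
        a, t = out.get(lower, (0, 0))
        out[lower] = (a + (1 if agreed else 0), t + 1)
    return out

# invented sample listings: (asking price, sale agreed?)
sample = [(240_000, True), (255_000, False), (260_000, True),
          (310_000, False), (230_000, False)]
for lower, (a, t) in sorted(agreed_vs_total(sample).items()):
    print(f"{lower}-{lower + 49_999}: {a}/{t} sale agreed")
```

The per-bin ratios are then ready to plot against the totals for a county.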

I’ve never really bothered with the maps feed at all - as you said, it’s often not very accurate - but it might give some nice indications of the location of the housing stock. Just having a quick look now, it seems easy enough to grab the data required: … 4803161621

I’d briefly looked at something similar for Ghost Estates but never followed through with it, but it’s easy enough to calculate the distance between two coordinates:

sqrt((69.1 * ($lat2 - $lat1)) * (69.1 * ($lat2 - $lat1)) + (69.1 * ($lon2 - $lon1) * cos($lat1/57.3)) * (69.1 * ($lon2 - $lon1) * cos($lat1/57.3)))

Oh yes, I see the advanced search on IPW now. Rather obvious it is too.

Does that distance formula look a bit complicated? I thought it’d be sqrt( (f1)^2 + (f2)^2 ) where f1 and f2 are some factors for converting degrees of lat/lon into kilometres. What’s the cos() stuff in there?

The cosine function improves accuracy to around 95%; you can also use the following, but it’s only around 90% accurate. Both give results in miles:

sqrt((69.1 * ($lat2 - $lat1)) * (69.1 * ($lat2 - $lat1)) + (53.0 * ($lon2 - $lon1)) * (53.0 * ($lon2 - $lon1) ))

You can calculate the great circle distance if you require higher accuracy but it’s not really worth it.

I generated something similar a while ago; my data goes back to Aug 07 if anyone is interested. The graphs below are histograms of apartments listed on Daft in Sept 07 and May 08. There is a slight shift downwards in the median price, as can be seen. It is possible to generate these graphs for different regions etc.

I think you can see the shape of the curve change between 2007 and 2008. There is a heaping up towards the lower end while the tail on the high side remains. The high-end tail probably represents something about denial. Of course these stats are for all apartments, which I presume includes 1, 2 and 3 beds. It would be good to have them broken down by type.
I suppose the IPW data represents the difference between these two pictures - those that have shifted their price.

I’ve added some quick price distribution graphs to Daftwatch. They are available for each county on the respective breakdown page.

Thanks for that nemonoid! Very quickly put together. It’s good to have a picture of how much “stuff” is for sale in particular price ranges, and maybe be able to watch how it evolves over time.

I think that a breakdown by property type, number of rooms, and possibly area or distance from the city would make the price distribution picture more interpretable. Maybe this isn’t easy to do with the way daftwatch stats are handled.

I had planned to add a breakdown of totals by number of rooms and property type, but I just haven’t had the time. I hope to have a look at this over the coming week, and also to have a play around with the XML feed for the maps, as it contains the lat/long, address, number of rooms (although not as a separate element - it’s contained within the address field) and price data.
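Pulling those fields out of a maps-style XML feed could be sketched as below. Note that the element and attribute names here (`marker`, `lat`, `lng`, `address`, `price`) are purely hypothetical - the real Daft feed structure isn’t shown in this thread, so the paths would need adapting to whatever the feed actually contains.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real maps feed; structure is an assumption.
SAMPLE = """<markers>
  <marker lat="53.35" lng="-6.26" address="2 bed apartment, Smithfield, Dublin 7" price="250000"/>
  <marker lat="53.29" lng="-6.13" address="3 bed semi-detached, Dun Laoghaire" price="420000"/>
</markers>"""

def parse_markers(xml_text):
    """Extract lat/long, address and price from each marker element."""
    root = ET.fromstring(xml_text)
    out = []
    for m in root.iter("marker"):
        out.append({
            "lat": float(m.get("lat")),
            "lng": float(m.get("lng")),
            # number of rooms is buried in the address text, as noted above,
            # so it would need separate parsing out of this field
            "address": m.get("address"),
            "price": int(m.get("price")),
        })
    return out

for row in parse_markers(SAMPLE):
    print(row["lat"], row["lng"], row["price"])
```

With lat/long extracted per listing, the distance-from-city-centre stats discussed earlier become a simple per-record calculation.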

How did you grab your data in the first place to generate your stats? I’m wary of hitting Daft with too many requests to capture the data.

I had a fairly simple Perl script where the URL query was tailored to specify the category of property, number of rooms, and county. This only read the pages of results, not the details for each property. I didn’t get to do the location info part but had looked at how to construct the URL for it.

When running a query, I also thought about the danger of overloading the site, so I added periodic pauses - something like get 4 pages of results and pause for a second. I also ran the queries late in the evening, e.g. 11:30.
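The original was a Perl script; the "fetch a few pages, then pause" approach could be sketched in Python like this. The BASE_URL is a placeholder - the real query URL construction isn’t given in the thread - and the `fetch` parameter lets the network call be swapped out.

```python
import time
import urllib.request

# Placeholder only - not the real site or query format.
BASE_URL = "https://example.com/search"

def fetch_pages(n_pages, pages_per_burst=4, pause_seconds=1.0, fetch=None):
    """Fetch result pages 1..n_pages, pausing after every
    `pages_per_burst` requests to avoid hammering the server."""
    if fetch is None:
        fetch = lambda page: urllib.request.urlopen(
            f"{BASE_URL}?page={page}").read()
    results = []
    for page in range(1, n_pages + 1):
        results.append(fetch(page))
        if page % pages_per_burst == 0 and page < n_pages:
            time.sleep(pause_seconds)  # be gentle on the server
    return results
```

For example, `fetch_pages(12)` would pull twelve result pages with a one-second pause after every fourth request, matching the pacing described above.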