PPR vs myhome matches

Don’t know if I’m able to post something this size. This is all 5,600 matches of PPR against my historical myhome data (which is quarterly snapshots going back to start of 2011). First line of each match shows PPR sale date, sale price, address. Second line shows latest asking price I saw on myhome (sorry about lack of formatting), address, and myhome brief description. Matches only include properties with house numbers. Suggestions welcome as to where to take this ( … polite ones, anyway).

Ok … that answered my question – “413 Request Entity Too Large”. I’ll see if I can post by county/postcode or something.

Data sorted by PPR date within county

I think googledocs or something like that would be the way to go

Ok. Sounds more sensible. Here’s a flavour of the data in case anyone has suggestions.

Dublin 4 (61 matches)

01/03/2012      €610,000.00   66 Belmont Avenue, Donnybrook, Dublin 4
                     655000   66 Belmont Avenue, Donnybrook, Dublin 4  4 Bed Terraced House 158.9 m² For Sale

01/07/2011    €1,030,000.00   126 Merrion Road, Ballsbridge, Dublin 4
                    1200000   126 Merrion Road, Ballsbridge, Dublin 4  4 Bed Detached House 208 m² For Sale

01/08/2012      €155,000.00   89 The Waterside, Ringsend, Dublin 4
                     175000   89 The Waterside, Ringsend, Dublin 4  2 Bed Apartment For Sale

02/02/2012      €620,000.00   103 Marlborough Road, Donnybrook
                     775000   103 Marlborough Road, Donnybrook, Dublin 4  3 Bed Semi-Detached House 223 m² For Sale

02/05/2012      €905,000.00   3 Raglan Road, Ballsbridge
                    1300000   3 Raglan Road, Ballsbridge, Dublin 4  5 Bed Terraced House For Sale

02/05/2012      €475,000.00   28 Upper Grand Canal Street, Dublin 4
                     495000   28 Upper Grand Canal Street, Dublin 4  3 Bed Terraced House 148.6 m² For Sale

02/08/2011      €670,000.00   23 Shrewsbury Park, Ballsbridge
                     695000   23 Shrewsbury Park, Ballsbridge, Dublin 4  3 Bed Semi-Detached House 134.7 m² For Sale

02/09/2011      €270,000.00   86 Lansdowne Village, Ballsbridge
                     295000   86 Lansdowne Village, Ballsbridge,  Dublin 4.  2 Bed Terraced House 69 m² For Sale

03/04/2012      €165,000.00   32 Gordon Street, Ringsend
                     220000   32 Gordon Street, Ringsend, Dublin 4  2 Bed End of Terrace House For Sale

03/07/2012      €220,000.00   Apartment 179, The Sweepstakes, Ballsbridge
                     220000   Apt 179 The Sweepstakes, Ballsbridge, Dublin 4  1 Bed Apartment For Sale

03/08/2012      €310,000.00   Apartment 50, Bloomfield Park, Donnybrook
                     325000   Apartment 50 Bloomfield Park, Donnybrook, Dublin 4  2 Bed Apartment 70 m² / 753 ft² For Sale

03/09/2012      €607,999.00   12A Ailesbury Grove, Donnybrook
                    1050000   12a Ailesbury Grove, Donnybrook, Dublin 4  3 Bed Detached House 164 m² For Sale

04/04/2011      €310,000.00   4 chapel ave, irishtown
                     349950   4 Chapel Avenue, Irishtown, Dublin 4  2 Bed Terraced House 73 m² For Sale

04/05/2011      €547,000.00   36 Havelock Square, Sandymount
                     595000   36 Havelock Square, Sandymount, Dublin 4  3 Bed Terraced House For Sale

05/01/2011      €190,000.00   2 Pembroke Cottages, Ringsend, Dublin 4
                     195000   2 Pembroke Cottages, Ringsend, Dublin 4  1 Bed Cottage For Sale - 45m² 

06/07/2011      €685,000.00   154 Lansdowne Park, Ballsbridge, Dublin
                     795000   154 Lansdowne Park, Ballsbridge, Dublin 4  3 Bed Detached House 155 m² For Sale

08/02/2012      €170,000.00   41 Beech Hill Drive, Donnybrook
                     179950   41 Beech Hill Drive, Donnybrook, Dublin  3 Bed Terraced House For Sale

08/07/2011      €820,000.00   8 Merlyn Road, Ballsbridge
                     950000   8 Merlyn Road, Ballsbridge, Dublin 4  4 Bed Semi-Detached House 146 m² For Sale

09/02/2012      €245,000.00   2 Donnybrook Green, Donnybrook
                     275000   2 Donnybrook Green, Donnybrook, Dublin 4  2 Bed Apartment 90.6 m² For Sale

09/12/2011      €185,000.00   79 Beech Hill DRive, Donnybrook
                     184950   79 Beech Hill Drive, Donnybrook, Dublin 4  3 Bed Terraced House For Sale

10/10/2011      €160,000.00   33 Ringsend Road, Ringsend
                     245000   33 Ringsend Road, Ringsend, Dublin  3 Bed Terraced House 90 m² For Sale

11/01/2012      €985,000.00   144 Tritonville Road, Sandymount
                    1150000   144 Tritonville Road, Sandymount, Dublin 4  4 Bed End of Terrace House 185.8 m² For Sale

12/08/2011      €611,027.00   13 Simmons Court, Ballsbridge
                     750000   13 Simmons Court, Ballsbridge, Dublin 4  3 Bed Terraced House For Sale - 175m² 

13/02/2012      €365,000.00   15 Radcliff Hall, St John's Road, Sandymount
                     375000   15 Radcliff Hall, St Johns Road, Sandymount, Dublin 4  2 Bed Apartment 96.9 m² For Sale

13/02/2012      €875,000.00   13 Greenfield Park, Donnybrook, Dublin
                    1450000   13 Greenfield Park, Donnybrook, Dublin 4  2 Bed Detached House 227 m² For Sale

13/07/2012      €440,000.00   7 Lea Crescent, Sandymount
                              7 Lea Crescent, Sandymount, Dublin 4  3 Bed Terraced House 73 m² / 786 ft² For Sale By Auction

13/09/2010      €184,000.00   43 Ropewalk Place, Ringsend, Dublin 4
                     200000   43 Ropewalk Place, Ringsend, Dublin 4  1 Bed Apartment For Sale - 42m² 

13/10/2011    €1,500,000.00   37 Northumberland Road, Ballsbridge
                    2200000   37 Northumberland Road , Ballsbridge,   Dublin 4  4 Bed Period House from 325 m² For Sale

14/04/2011      €210,000.00   6 Anglers Rest, Donnybrook, Dublin 4
                     275000   6 Anglers Rest, Donnybrook, Dublin 4  1 Bed Terraced House For Sale - 64m² 

14/10/2011      €880,000.00   44 Eglinton Road, Donnybrook
                     995000   44 Eglinton Road, Donnybrook, DUBLIN 4  5 Bed Semi-Detached House 204 m² For Sale

15/08/2012      €377,500.00   Apartment 12, Wavendon, 69 Northumberland Road  Ballsbridge
                     395000   Apartment 12, Wavendon, 69 Northumberland Road, Ballsbridge, Dublin 4  3 Bed Apartment 100 m² / 1076 ft² For Sale

15/12/2011      €395,000.00   15 Sydenham Court, Sydenham Road, Ballsbridge
                     375000   15 Sydenham Court, Sydenham Road, Ballsbridge, DUBLIN 4  2 Bed Apartment 90 m² For Sale

16/05/2011      €625,000.00   8 Churchill Terrace, Sandymount
                     720000   8 Churchill Terrace, Sandymount, Dublin 4  3 Bed Terraced House For Sale - 135m² 

16/12/2011      €500,000.00   36 Shelbourne Road, Ballsbridge
                     525000   36 Shelbourne Road, Ballsbridge, Dublin 4  3 Bed Terraced House 134 m² For Sale

17/08/2012      €429,000.00   13 Tritonville Road, Sandymount, Dublin 4
                     475000   13 Tritonville Road, Sandymount, Dublin 4  3 Bed Period House 125 m² / 1345 ft² For Sale

18/01/2012      €175,000.00   15 Somerset Street, Ringsend, Dublin 4.
                     200000   15 Somerset Street, Ringsend, Dublin 4  2 Bed Terraced House For Sale

18/08/2011      €550,000.00   48  Beach Road, Sandymount, Dublin 4
                     625000   48 Beach Road, Sandymount, Dublin 4  4 Bed Semi-Detached House 148 m² For Sale

19/07/2011      €700,000.00   27 Seafort Gardens, Sandymount, Dublin 4
                     795000   27 Seafort Gardens, Sandymount, Dublin 4  4 Bed Semi-Detached House 185 m² For Sale

19/08/2011      €971,000.00   10 MARINE DRIVE, SANDYMOUNT
                     990000   10 Marine Drive, Sandymount, Dublin 4  4 Bed Semi-Detached House For Sale

20/01/2012    €2,600,000.00   25 Greenfield Park, Donnybrook
                    3000000   25 Greenfield Park, Donnybrook, Dublin 4  5 Bed Bungalow 479 m² For Sale

20/04/2012      €328,000.00   38 Bath Street, Irishtown, Dublin 4
                     370000   38 Bath Street, Irishtown,   Dublin 4  3 Bed Terraced House 1487 ft² For Sale

20/09/2011      €280,000.00   29 Morehampton Square, Donnybrook
                     345000   29 Morehampton Square, Donnybrook, Dublin 4  2 Bed Semi-Detached House For Sale

21/12/2011      €260,000.00   45 Haddington Square, Haddington Road, Ballsbridge
                     295000   45 Haddington Square, Haddington Road, Ballsbridge, Dublin 4  2 Bed Apartment 72 m² For Sale

21/12/2011    €3,650,000.00   14 Ailesbury Road, Ballsbridge
                    4250000   14 Ailesbury Road, Ballsbridge, Dublin 4  6 Bed Semi-Detached House 419.8 m² For Sale

22/05/2012      €620,000.00   21 Bushfield Terrace, Donnybrook
                     650000   21 Bushfield Terrace, Donnybrook, Dublin 4  4 Bed Terraced House For Sale

22/06/2012      €475,000.00   99 the sweepstakes, ballsbridge
                     535000   99 The Sweepstakes, Ballsbridge, Dublin 4  4 Bed Terraced House 173 m² For Sale

22/06/2012      €425,000.00   34 shrewsbury park, ballsbridge
                     495000   34 Shrewsbury Park, Ballsbridge, Dublin 4  4 Bed Terraced House 134 m² For Sale

22/09/2011      €427,000.00   6 Gilford Avenue, Sandymount
                     495000   6 Gilford Avenue, Sandymount, Dublin 4  3 Bed End of Terrace House 78 m² For Sale

22/12/2011      €420,000.00   17 Morehampton Terrace, Donnybrook
                     500000   17 Morehampton Terrace, Donnybrook, Dublin 4  3 Bed Terraced House 120.8 m² For Sale

23/06/2011      €315,000.00   6 Pembroke Cottages, Donnybrook, Dublin 4
                     350000   6 Pembroke Cottages, Donnybrook,  Dublin 4  2 Bed Cottage from 65 m² For Sale

24/08/2012      €280,000.00   16 ST. CATHRYN'S COURT, SANDYMOUNT
                     299000   16 St Cathryn's Court, Sandymount,   Dublin 4  2 Bed Apartment 90 m² / 969 ft² For Sale

25/05/2012    €2,200,000.00   16 Clyde Road, Ballsbridge, Dublin 4
                    2200000   16 Clyde Road, Ballsbridge, Dublin 4  5 Bed Detached House 377 m² For Sale

25/08/2011       €91,500.00   13 Dermot O'Hurley Avenue, Stella Gardens, Irishtown
                      99000   13 Dermot O'Hurley Avenue, Stella Gardens, Irishtown, Dublin 4  2 Bed Cottage For Sale

26/10/2011      €295,000.00   32 BALLSBRIDGE GARDENS, BALLSBRIDGE
                     325000   32 Ballsbridge Gardens, Ballsbridge  2 Bed Apartment 86 m² For Sale

27/04/2012      €705,000.00   6 Bushfield Terrace, Donnybrook
                     825000   6 Bushfield Terrace, Donnybrook, Dublin 4  4 Bed Terraced House 170 m² For Sale

30/04/2012      €125,000.00   8 rowan house, mespil estate
                     135000   8 Rowan House , Mespil Estate  1 Bed Apartment 34 m² / 366 ft² For Sale

30/08/2011      €250,000.00   3 Nutley Square, Donnybrook
                     350000   3 Nutley Square, Donnybrook, Dublin 4  2 Bed Terraced House 60 m² For Sale

30/09/2011      €455,500.00   26B Newgrove Avenue, Sandymount, Dublin 4
                     475000   26B Newgrove Avenue, Sandymount, Dublin 4  3 Bed Semi-Detached House 104 m² For Sale

31/05/2011      €370,000.00   22 St. John's, Park Avenue, Sandymount
                     390000   22 St John's, Park Avenue, Sandymount,  Dublin 4  3 Bed Townhouse For Sale

31/05/2012      €625,750.00   33 Belmont Avenue, Donnybrook, Dublin 4
                    1100000   33 Belmont Avenue, Donnybrook, Dublin 4  9 Bed For Sale

31/05/2012      €165,000.00   9 Hope Street, Ringsend
                     145000   9 Hope Street, Ringsend,   Dublin 4  2 Bed Cottage 62 m² / 667 ft² For Sale
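
For anyone who wants to put a number on the gap between the two price lines in each match above, here is a rough sketch. It is not part of the original data processing; the class name PriceGap and both method names are made up for illustration. It strips the euro sign and commas from the PPR price, parses both figures, and reports the sale price relative to the asking price as a signed percentage.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper, not from the original post
final class PriceGap {

  /** Parse a PPR-style price such as "€610,000.00" or a bare asking price such as "655000". */
  static BigDecimal parsePrice(String s) {
    // Keep only digits and the decimal point; this drops the euro sign, commas and spaces
    String cleaned = s.replaceAll("[^0-9.]", "");
    return new BigDecimal(cleaned);
  }

  /** Sale price relative to asking, in percent to one decimal place (negative = sold below asking). */
  static BigDecimal percentVsAsking(String sale, String asking) {
    BigDecimal s = parsePrice(sale);
    BigDecimal a = parsePrice(asking);
    return s.subtract(a)
        .multiply(BigDecimal.valueOf(100))
        .divide(a, 1, RoundingMode.HALF_UP);
  }
}
```

On the first Dublin 4 match (66 Belmont Avenue, sold for €610,000.00 against an asking price of 655000) this gives -6.9. A match with no asking price recorded, such as the auction entry for 7 Lea Crescent, would make parsePrice throw a NumberFormatException, so those rows need to be skipped or handled separately.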

Up on Google Docs … I think it converted to Google spreadsheet; I can put it back in raw csv if anyone prefers.

docs.google.com/spreadsheet/ccc … VpqRkZIWGc

EDIT: Updated link

Excellent work

If possible could we also see
(A) the c. 50K register entries
(B) whatever you’ve scraped from myhome (have you anything in addition to price, address, and myhome brief description?)

Perfectly understand though if you want to keep it private

Any particular reason for wanting to see the 50k register entries without mh data? I am guessing to support a PPR search with optional mh data. However, not sure I want to be the sole source of that info – what happens when it needs to be updated? I would prefer to provide the canonicalisation code (in Java) so that people can do their own matching. Nevertheless, I’ll look into doing it as a once off, just to see if it’s any use to anyone. I would guess it is of more use with the PPR duplicates put back in, and the mh data matched up to each matching record.

Unfortunately there is not much more to the mh data I have scraped. Apart from the address and price, it is basically all gleaned by pattern matching against the myhome brief description (mhDesc in the file). From this I provide the number of beds (mhBeds) and square footage (mhSqFt). I use the price/sq.ft. in my quarterly analysis, but this is easily computable from the aforementioned fields. I have a field for the dwelling type, and a summary field indicating “House”, “Apartment” or “Other”. I also have a summary field indicating “Sale” or “Auction”. I used to have a tax type (Section 23, Pre-63 etc.) but myhome stopped providing it. There is, of course, the computed canonicalised address which is what is used for matching between the two databases, but I doubt it is of particular use to anyone. That’s about it. I’ll consider including those other fields in another version.
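
To give a flavour of what that pattern matching might look like, here is a minimal sketch. It is not the code actually used for the scrape: the class name DescParser, the regular expressions and the conversion factor are assumptions based only on the sample descriptions posted in this thread (for example "2 Bed Apartment 70 m² / 753 ft² For Sale"). The squared sign is written as \u00B2 so the source compiles regardless of file encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for myhome brief descriptions; patterns are guesses from the posted samples
final class DescParser {
  private static final Pattern BEDS = Pattern.compile("(\\d+) Bed");
  private static final Pattern SQ_M = Pattern.compile("([0-9]+(?:\\.[0-9]+)?) ?m\u00B2");
  private static final Pattern SQ_FT = Pattern.compile("([0-9]+(?:\\.[0-9]+)?) ?ft\u00B2");
  private static final double SQ_FT_PER_SQ_M = 10.7639;

  /** @return number of beds, or -1 if the description does not state one */
  static int beds(String desc) {
    Matcher m = BEDS.matcher(desc);
    return m.find() ? Integer.parseInt(m.group(1)) : -1;
  }

  /** @return floor area in square feet (converted from m² if necessary), or -1 if none given */
  static double sqFt(String desc) {
    Matcher ft = SQ_FT.matcher(desc);
    if (ft.find()) {
      return Double.parseDouble(ft.group(1));
    }
    Matcher m2 = SQ_M.matcher(desc);
    if (m2.find()) {
      return Double.parseDouble(m2.group(1)) * SQ_FT_PER_SQ_M;
    }
    return -1;
  }

  /** @return "Auction" if the description mentions an auction, otherwise "Sale" */
  static String saleType(String desc) {
    return desc.contains("Auction") ? "Auction" : "Sale";
  }
}
```

Price per square foot is then just the asking price divided by the sqFt result, for descriptions where an area is present.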

Any other ideas welcomed.

Best of all reasons; I was too lazy to do it myself
But it’s here now
google.com/fusiontables/Dat … 3PclWssvWI

I’d love to see the canonicalisation code and especially, as 2pack asked on the other thread, the entire original scrape from myhome (i.e. including those that didn’t match). Maybe a few people trying to match addresses in different ways might produce a good hybrid.

They went out of their way yesterday to say this wasn’t a property price index. With your data it is.

EDIT: Updated link

Address canonicalisation code for rendering myhome and ppr addresses comparable:

package ps.ppr;

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/*
 * Written by ps200306@yahoo.ie. Permission to use for any purpose is hereby granted.
 * No warranty is expressed or implied that the code is fit for any purpose whatsoever.
 * The user accepts all liability for damage or loss incurred as a result of using this
 * code.
 */

/*
 * Change history:
 * 01-Oct-2012 PS  Initial version.
 * 02-Oct-2012 PS  Support optionally ignoring Dublin post codes.
 * 
 */

/**
 * Performs canonicalisation specifically for Irish addresses, allowing differently
 * formatted addresses to be compared and matched. An instance of CanonicalAddress is
 * created using a raw input address and optional postcode and county hints. The address
 * is canonicalised and the original and canonical addresses can be queried.
 * 
 * Canonicalisation works as follows: all inputs are converted to "normalised text":
 * everything is lowercased, accented characters are unaccented, single and double
 * quotes are removed, and characters other than alphabetic and numeric ones
 * are converted to spaces. Leading and trailing spaces are removed, and runs of
 * embedded spaces are converted to a single space. The remaining words of the address
 * are checked against a list of synonyms which converts certain words to standard
 * abbreviations.
 * 
 * Next, a county is determined using the postcode hint (if supplied), the county hint
 * (if supplied) and the trailing words of the input address, in that order. Some 
 * sources of address data will have these fields separately, so these optional hints
 * allow them to be supplied separately. Examples of 
 * the county formats accepted are: "dublin", "dublin 24", "co dublin 24", "24". Note
 * that words like "COUNTY" will already have been normalised to "co". When a county
 * is determined, if the trailing part of the address contains a county it is removed
 * and replaced with the normalised form: "co dublin 24" (or more generally "co
 * cname optionalcode"). The constructor optionally allows for ignoring dublin post
 * codes, but addresses to be compared must make sure to both use the same option.
 * 
 * Although the front part of the normalised address text is not modified, it *is*
 * analysed to check if the address begins with a house number. If so, this is available
 * via an accessor method. A house number is of the form "nnA" or "apt nnA", where nnA
 * represents any string of digits followed by an optional single letter from a to g. The
 * string "apt" will already have been substituted as a synonym for various other common
 * spellings or abbreviations of "APARTMENT". 
 * 
 * The canonical form address is a string suitable for use as a key or for generating 
 * hash values, and can be used for comparing similar addresses that vary in formatting,
 * punctuation, and abbreviation.
 * 
 * @author ps200306
 *
 */
public class CanonicalAddress {

  // Inputs
  String iAddress;
  String iCounty;
  String iPostCode;
  boolean iTrimPostCode;
  // Outputs
  String oAddress;
  String oCounty;
  String oHouse;

  /**
   * Create a canonical address
   * 
   * @param address
   *          input unformatted address
   * @param countyHint
   *          optional county; takes precedence over the address field for
   *          determining county
   * @param postCodeHint
   *          optional county/postCode. Takes precedence over the countyHint and
   *          address fields for determining county.
   * @param trimPostCode
   *          if true, causes Dublin post code number to be dropped. Ensure that when
   *          performing address comparisons, all addresses use the same option.
   */
  public CanonicalAddress(String address, String countyHint, String postCodeHint, boolean trimPostCode) {
    iAddress = address;
    iCounty = countyHint;
    iPostCode = postCodeHint;
    iTrimPostCode = trimPostCode;
    normaliseAddress();
  }
  
  /**
   * Create a canonical address, equivalent to CanonicalAddress(address, county, postcode, false).
   */
  public CanonicalAddress(String address, String countyHint, String postCodeHint) {
    this(address, countyHint, postCodeHint, false);
  }
  
  /**
   * @return the canonical form of the address.
   * This can be useful for comparing two addresses
   * in their canonical form or providing a hash key
   * for searching.
   */
  public String getCanonicalAddress() {
    return oAddress;
  }
  
  /**
   * @return canonical form of county, or null if none.
   * If this method returns non-null the address has a valid
   * county, which can be an important criterion in the ranking
   * of matches.
   */
  public String getCanonicalCounty() {
    return oCounty;
  }
  
  /**
   * @return canonical form of house number, or null if none. If
   * this method returns non-null the address has a valid house 
   * number, which can be an important criterion in the ranking
   * of matches.
   */
  public String getCanonicalHouseNumber() {
    return oHouse;
  }
  
  /**
   * @return original non-canonical address
   */
  public String getInputAddress() {
    return iAddress;
  }
  
  /**
   * @return original non-canonical county hint
   */
  public String getInputCounty() {
    return iCounty;
  }
  
  /**
   * @return original non-canonical post code
   */
  public String getInputPostCode() {
    return iPostCode;
  }
  
  /**
   * @return post code trim option
   */
  public boolean isTrimPostCode() {
    return iTrimPostCode;
  }
  
  /**
   * Normalise the address provided.
   */
  private void normaliseAddress() {
    // Use postcode and county hints first
    String county1 = normaliseText(iPostCode);
    county1 = normaliseCounty(county1);
    String county2 = normaliseText(iCounty);
    county2 = normaliseCounty(county2);
    // Also get county from address, if present
    String address = normaliseText(iAddress);
    String[] addressParts = splitAddressCounty(address);
    address = addressParts[0];
    String county3 = addressParts[1];
    oCounty = pickCounty(county1, county2);
    oCounty = pickCounty(oCounty, county3); 
    if (oCounty != null) {
      // Add the county back to address in canonical form
      address = address + " co " + oCounty;
    }
    oAddress = address;
    // Use regex to check for house number
    Matcher m = kPattHouse.matcher(address);
    if (m.matches()) {
      oHouse = address.substring(0, m.group(1).length());
    }
  }
  
  /**
   * Given two counties, each guaranteed to be either null or a canonical
   * county, choose the non-null one. Also, if dublin post codes are to be 
   * ignored, do so here. If both counties are non-null the first is chosen.
   * @param c1 county 1
   * @param c2 county 2
   * @return the chosen county, or null if both are null
   */
  private String pickCounty(String c1, String c2) {
    String[] cs = new String[] {c1, c2};
    for (String s : cs) {
      if (s != null) {
        return (iTrimPostCode && s.startsWith("dublin"))? "dublin" : s;
      }
    }
    return null;
  }
  
  /**
   * Split county from trailing part of address.
   * @param address input address with text already normalised.
   * @return array of two strings containing address and county separately.
   */
  private String[] splitAddressCounty(String address) {
    String[] result = new String[2];
    // Get last three words of address - that is the max that can
    // form a county name, e.g. co dublin 24. Check last 3 words
    // together, then last two, then last one, for valid county.
    int[] pos = new int[3];
    pos[2] = address.lastIndexOf(' ');
    pos[1] = address.lastIndexOf(' ', pos[2] - 1);
    pos[0] = address.lastIndexOf(' ', pos[1] - 1);
    for (int i = 0; i < pos.length; i++) {
      String addressTail = address.substring(pos[i] + 1);
      String county = normaliseCounty(addressTail);
      if (county != null) {
        // We have a county -- remove the trailing words from address
        address = address.substring(0, (pos[i] < 0)? 0 : pos[i]);
        result[0] = address;
        result[1] = county;
        return result;
      }
    }
    // we didn't find a county - just return whole address
    result[0] = address;
    return result;
  }
  
  /**
   * Determine canonical county.
   * @param county input county as normalised text
   * @return canonical county, or null if none found
   */
  private String normaliseCounty(String county) {
    // Strip optional leading "co"
    final  String kCo = "co ";
    if (county.startsWith(kCo)) {
      county = county.substring(kCo.length());
    }
    if (county.length() == 0) {
      return null;
    }
    // Is the county name an integer, or the special
    // case of 6w -- if so, assume Dublin postcode.
    if (isInteger(county) || "6w".equals(county)) {
      county = "dublin " + county;
    }
    // Now check against the known set of counties
    if (kCounties.contains(county)) {
      return county;
    }
    return null;    
  }
  
  /**
   * Normalise text. Remove accents and punctuation, convert to 
   * lowercase, trim and compress spaces, and apply synonyms.
   * @param s input string
   * @return normalised string
   */
  private String normaliseText(String s) {
    if (s == null) {
      // Convert null to empty string
      return "";
    }
    StringBuilder result = new StringBuilder(s.length());
    StringBuilder currWord = new StringBuilder();
    int max = s.length() + 1;
    boolean firstWord = true;
    for (int i = 0; i < max; i++) {
      // Add an extra space at the end to make sure last word processed
      char c = (i == s.length())? ' ' : s.charAt(i);
      // convert upper alpha to lower alpha
      if ((c >= 'A') && (c <= 'Z')) {
        c += 32;
      } else {
        // Convert accented characters
        for (int j = 0; j < kAccentedChar.length; j++) {
          if (c == kAccentedChar[j]) {
            c = kUnaccentedChar[j];
            break;
          }
        }
      }

      // Save alphabetic and numeric to output
      if (((c >= 'a') && (c <= 'z')) || ((c >= '0') && (c <= '9'))) {
        currWord.append(c);
      } else if ((c == '\'') || (c == '"')) {
        // ignore these chars
      } else {
        // discard everything else, and treat as end of word
        if (currWord.length() > 0) {
          // get ready to write current word
          String word = currWord.toString();
          // first apply synonyms
          String syn = kSynonyms.get(word);
          if (syn != null) {
            word = syn;
          }
          // Append space before each word other than first
          if (!firstWord) {
            result.append(' ');
          }
          // Append current word and start next
          result.append(word);
          currWord = new StringBuilder();
          firstWord = false;
        }
      }
    }
    return result.toString();
  }
  
  /**
   * Check if a string is a run of digits
   * @param s input string
   * @return true if string contains only digits
   */
  private boolean isInteger(String s) {
    for (int i = 0; i < s.length(); i++) {
      if ((s.charAt(i) < '0') || (s.charAt(i) > '9')) {
        return false;
      }
    }
    return true;
  }
  
  /**
   * Set of canonical county names
   */
  private static Set<String> kCounties
    = new HashSet<String>(Arrays.asList(
        "carlow",
        "cavan",
        "clare",
        "cork",
        "donegal",
        "dublin",
        "dublin 1",
        "dublin 2",
        "dublin 3",
        "dublin 4",
        "dublin 5",
        "dublin 6",
        "dublin 6w",
        "dublin 7",
        "dublin 8",
        "dublin 9",
        "dublin 10",
        "dublin 11",
        "dublin 12",
        "dublin 13",
        "dublin 14",
        "dublin 15",
        "dublin 16",
        "dublin 17",
        "dublin 18",
        "dublin 20",
        "dublin 22",
        "dublin 24",
        "galway",
        "kerry",
        "kildare",
        "kilkenny",
        "laois",
        "leitrim",
        "limerick",
        "longford",
        "louth",
        "mayo",
        "meath",
        "monaghan",
        "offaly",
        "roscommon",
        "sligo",
        "tipperary",
        "waterford",
        "westmeath",
        "wexford",
        "wicklow"
    ));
  
  /**
   * List of synonyms used to replace words in addresses.
   * This list should be augmented as necessary.
   * Keep in alphabetical order.
   */
  private static Map<String, String> kSynonyms = new HashMap<String, String>();
  static {
    String[] syn = new String[] {
        "apartment", "apt",
        "apartments", "apt",
        "apmnt", "apt",
        "apmnts", "apt",
        "avenue", "ave",
        "block", "blk",
        "close", "cl",
        "county", "co",
        "court", "ct",
        "crescent", "cres",
        "drive", "dr",
        "garden", "gdn",
        "gardens", "gdn",
        "grove","gr",
        "lawn", "ln",
        "lawns", "ln",
        "lower", "lr",
        "lwr", "lr",
        "mount", "mt",
        "north", "nth",
        "park","pk",
        "road","rd",
        "saint", "st",
        "south", "sth",
        "square", "sq",
        "street", "st",
        "terrace", "tce",
        "upper", "upr",
        "uppr", "upr",
        "villas", "vls",
        "wood", "wd",
        "woods", "wd",
    };
    for (int i = 1; i < syn.length; i += 2) {
      kSynonyms.put(syn[i - 1], syn[i]);
    }
  }
  
  /**
   * Pattern for matching optional house number at start of address. Allows for
   * optional "apt" and number with a single letter from a-g suffixed.
   */
  private static Pattern kPattHouse = Pattern.compile("((?:apt )?[0-9]*[a-g]?) .*");
  
  /**
   * Arrays for converting upper and lowercase acute and grave accented characters.
   * Irish "síneadh fada" will normally be written as an acute accent but keyboard
   * accidents sometimes happen.
   */
  private static char[] kAccentedChar   = "ÁáÀàÉéÈèÍíÌìÓóÒòÚúÙù".toCharArray();
  private static char[] kUnaccentedChar = "aaaaeeeeiiiioooouuuu".toCharArray();
  
}

EDIT: Backward-compatible version - optionally allows Dublin post codes to be ignored.
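
For anyone wanting to try their own matching with this, here is one way the canonical form could be used as a hash key to join two address lists. This is only a sketch: the thread does not show how the actual PPR/myhome join was done, and the class name AddressJoin and its match method are made up for illustration. The key function is passed in, so any canonicaliser can be plugged in.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical join helper, not part of the posted CanonicalAddress code
final class AddressJoin {

  /**
   * Pair each PPR address with the myhome address that has the same key.
   * @param keyOf function producing a canonical key for a raw address
   * @return list of two-element arrays {pprAddress, myhomeAddress}
   */
  static List<String[]> match(List<String> ppr, List<String> myhome,
                              Function<String, String> keyOf) {
    // Index the myhome addresses by key; a later address with the
    // same key overwrites an earlier one
    Map<String, String> byKey = new HashMap<>();
    for (String mh : myhome) {
      byKey.put(keyOf.apply(mh), mh);
    }
    // Look up each PPR address by its key
    List<String[]> matches = new ArrayList<>();
    for (String p : ppr) {
      String mh = byKey.get(keyOf.apply(p));
      if (mh != null) {
        matches.add(new String[] {p, mh});
      }
    }
    return matches;
  }
}
```

With the class above, keyOf could be a lambda wrapping new CanonicalAddress(address, countyHint, null, true).getCanonicalAddress(), with the county hint supplied from wherever the caller holds it. Both sides must use the same trimPostCode option, as the class comment says. To restrict matches to properties with house numbers, addresses for which getCanonicalHouseNumber() returns null could be filtered out before the join.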

To you sir, I tip my hat.

Just noticed the PP DB Cleanup thread …will join that one for anything further …

viewtopic.php?f=4&t=46950
