Addresses
rigour.addresses
This module provides a set of tools for handling postal/geographic addresses. It includes functions for normalising addresses for comparison purposes, and for formatting addresses given in parts for display as a single string.
Postal address formatting
This set of helpers is designed to help with the processing of real-world addresses, including composing an address from individual parts, and cleaning it up.
from rigour.addresses import format_address_line

address = {
    "road": "Bahnhofstr.",
    "house_number": "10",
    "postcode": "86150",
    "city": "Augsburg",
    "state": "Bayern",
    "country": "Germany",
}
address_text = format_address_line(address, country="DE")
Acknowledgements
The address formatting database contained in rigour/data/addresses/formats.yml is derived from worldwide.yml in the OpenCageData address-formatting repository. It is used to format addresses according to the customs of the country being encoded.
clean_address(full)
format_address(address, country=None)
Format the given address parts into a multi-line string that matches the conventions of the country of the given address.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| address | Dict[str, Optional[str]] | The address parts to be combined. Common parts include summary (a short description of the address), po_box (the PO box/mailbox number), street (the street or road name), house (the descriptive name of the house), house_number (the number of the house on the street), postal_code (the postal code or ZIP code), city (the city or town name), county (the county or district name), state (the state or province name), state_district (the state or province district name), state_code (the state or province code), country (the name of the country in words, not an ISO code) and country_code (a pre-normalized country code). | required |
| country | Optional[str] | ISO code for the country of the address. | None |
Returns:
| Type | Description |
|---|---|
| str | A multi-line string with the formatted address. |
Source code in rigour/addresses/format.py
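To make the idea of country-specific formatting concrete, the following is a minimal, self-contained sketch and not the library's implementation. The `TEMPLATES` table, the `DEFAULT_TEMPLATE` fallback and the `sketch_format` helper are all hypothetical; the real formats.yml database is far richer than two hand-written templates. The part names follow the usage example at the top of this page.

```python
from typing import Dict, Optional

# Hypothetical per-country templates, for illustration only.
TEMPLATES: Dict[str, str] = {
    "DE": "{road} {house_number}\n{postcode} {city}\n{country}",
    "US": "{house_number} {road}\n{city}, {state} {postcode}\n{country}",
}
DEFAULT_TEMPLATE = "{road} {house_number}\n{city} {postcode}\n{country}"


class _Blank(dict):
    """Dictionary that renders missing address parts as empty strings."""

    def __missing__(self, key: str) -> str:
        return ""


def sketch_format(address: Dict[str, Optional[str]], country: Optional[str] = None) -> str:
    """Fill a country template and drop lines that end up empty."""
    template = TEMPLATES.get((country or "").upper(), DEFAULT_TEMPLATE)
    parts = _Blank({key: value.strip() for key, value in address.items() if value})
    rendered = template.format_map(parts)
    # Collapse whitespace runs and trim stray separators left by missing parts.
    lines = [" ".join(line.split()).strip(" ,") for line in rendered.splitlines()]
    return "\n".join(line for line in lines if line)


address = {
    "road": "Bahnhofstr.",
    "house_number": "10",
    "postcode": "86150",
    "city": "Augsburg",
    "state": "Bayern",
    "country": "Germany",
}
text = sketch_format(address, country="DE")
print(text)
```

With the German template the street comes before the house number and the postcode before the city, while the hypothetical US template reverses both orders.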
format_address_line(address, country=None)
Format the given address parts into a single-line string that matches the conventions of the country of the given address.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| address | Dict[str, Optional[str]] | The address parts to be combined. Common parts include summary (a short description of the address), po_box (the PO box/mailbox number), street (the street or road name), house (the descriptive name of the house), house_number (the number of the house on the street), postal_code (the postal code or ZIP code), city (the city or town name), county (the county or district name), state (the state or province name), state_district (the state or province district name), state_code (the state or province code), country (the name of the country in words, not an ISO code) and country_code (a pre-normalized country code). | required |
| country | Optional[str] | ISO code for the country of the address. | None |
Returns:
| Type | Description |
|---|---|
| str | A single-line string with the formatted address. |
Source code in rigour/addresses/format.py
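The relationship between a multi-line and a single-line rendering can be illustrated with a short sketch. It assumes that the single-line form is the multi-line form with its lines joined by a comma and a space; this is an assumption made for illustration, and the `to_single_line` helper is hypothetical rather than part of rigour.

```python
def to_single_line(multi_line: str, separator: str = ", ") -> str:
    """Join the non-empty lines of a formatted address into one line."""
    lines = (line.strip() for line in multi_line.splitlines())
    return separator.join(line for line in lines if line)


single = to_single_line("Bahnhofstr. 10\n86150 Augsburg\n\nGermany")
print(single)
```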
normalize_address(address, latinize=False, min_length=4)
Build a comparison key from an address.
Casefolds, replaces punctuation/symbols with whitespace,
tokenises on Unicode general-category, and rejoins with
single-space separators. The output is a flat lowercase token
sequence suitable for substring matching, equality keys, or
feeding :func:shorten_address_keywords /
:func:remove_address_keywords — not a display form.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| address | str | The address to normalise. | required |
| latinize | bool | When … | False |
| min_length | int | Reject the result as … | 4 |
Returns:
| Type | Description |
|---|---|
| Optional[str] | Normalised address, or … than … |
Source code in rigour/addresses/normalize.py
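The normalisation steps described above (casefolding, replacing punctuation and symbols with whitespace, tokenising on Unicode general category, and rejoining with single spaces) can be sketched with the standard library alone. This is an approximation written for illustration, not the code in rigour/addresses/normalize.py. The `sketch_normalize` name is hypothetical, latinization is left out entirely, and returning `None` for results shorter than `min_length` is an assumption based on the `Optional[str]` return type.

```python
import unicodedata
from typing import Optional


def sketch_normalize(address: str, min_length: int = 4) -> Optional[str]:
    """Build a flat, lowercase comparison key from an address string."""
    chars = []
    for ch in address.casefold():
        category = unicodedata.category(ch)
        # Keep letters (L*), marks (M*) and numbers (N*); anything else,
        # including punctuation and symbols, becomes a token separator.
        chars.append(ch if category[0] in ("L", "M", "N") else " ")
    # split() with no arguments collapses whitespace runs into single gaps.
    normalized = " ".join("".join(chars).split())
    if len(normalized) < min_length:
        return None  # assumption: too-short keys are rejected
    return normalized


key = sketch_normalize("Bahnhofstr. 10, 86150 Augsburg")
print(key)
```

The result is a comparison key only: the punctuation and casing needed for display are gone.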
remove_address_keywords(address, latinize=False, replacement=WS)
Strip common address keywords from a normalised address.
Removes recognised forms ("street", "road", "south",
territory names, ordinals, …) by substituting each match with
replacement. Consecutive matches produce consecutive
replacement runs — whitespace is not collapsed, so the
output may contain multi-space gaps. Use
normality.squash_spaces afterwards if a single-space
output is wanted.
Input must already be normalised with :func:normalize_address
using the same latinize flag — the alias table is built
against that normalised form.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| address | str | A pre-normalised address string. | required |
| latinize | bool | Must match the flag passed to :func: … | False |
| replacement | str | String substituted in place of each match. Defaults to a single ASCII space. | WS |
Returns:
| Type | Description |
|---|---|
| Optional[str] | The address with recognised keywords removed. |
Source code in rigour/addresses/normalize.py
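The note about uncollapsed whitespace is easiest to see in a small example. The sketch below uses a hypothetical three-word keyword list and a plain regular expression, not the library's alias table. The `squash` helper is a standard-library stand-in for normality.squash_spaces.

```python
import re

# Hypothetical keyword list, for illustration only.
KEYWORDS = ["street", "road", "south"]
PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(word) for word in KEYWORDS) + r")\b")


def sketch_remove(address: str, replacement: str = " ") -> str:
    """Replace every keyword match with `replacement`, without collapsing."""
    return PATTERN.sub(replacement, address)


def squash(text: str) -> str:
    """Collapse whitespace runs into single spaces and trim the ends."""
    return " ".join(text.split())


raw = sketch_remove("10 south main street")
clean = squash(raw)
print(repr(raw))
print(repr(clean))
```

Each removed keyword leaves its replacement sitting between the separators that were already there, which is why `raw` contains multi-space gaps and a trailing gap until it is squashed.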
shorten_address_keywords(address, latinize=False)
Shorten common address keywords in a normalised address.
Replaces recognised forms with their canonical short form
("street" → "st", "avenue" → "av", "united arab
emirates" → "ae", …). Multi-token forms beat single-token
components via longest-form-first ordering in the alias
pattern, so country names win over their constituent words.
Input must already be normalised with :func:normalize_address
using the same latinize flag — the alias table is built
against that normalised form.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| address | str | A pre-normalised address string. | required |
| latinize | bool | Must match the flag passed to :func: … | False |
|
Returns:
| Type | Description |
|---|---|
| Optional[str] | The address with recognised keywords shortened. Tokens that don't match any alias pass through unchanged. |
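Longest-form-first ordering can be sketched with an ordinary regular expression alternation. Python's `re` module tries alternatives from left to right, so sorting the forms by descending length lets a multi-token form match before any of its constituent words. The alias table below is hypothetical: the first three entries come from the examples above, while the "united" entry is invented purely to create a collision.

```python
import re

# Hypothetical alias table, for illustration only.
ALIASES = {
    "street": "st",
    "avenue": "av",
    "united arab emirates": "ae",
    "united": "utd",  # invented entry that collides with the country name
}

# Longest forms first, so the alternation prefers multi-token matches.
FORMS = sorted(ALIASES, key=len, reverse=True)
PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(form) for form in FORMS) + r")\b")


def sketch_shorten(address: str) -> str:
    """Replace each recognised form with its short form."""
    return PATTERN.sub(lambda match: ALIASES[match.group(0)], address)


short = sketch_shorten("12 sheikh zayed street dubai united arab emirates")
print(short)
```

If the forms were left unsorted, the single-token "united" alternative could match first and the country name would never be recognised as a whole.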