News, views and what I choose to dos




Whois study shows 77 percent of domains have dodgy registration info

Category : Domain names, ICANN, Internet, Internet governance · by Feb 17th, 2010

I’ve written a story for The Register about a new report [pdf] regarding the Whois system for domain registration data. We all knew that Whois was a mess, but it’s good to have some facts and figures that show what a mess it is.

Here’s hoping this gives the endless, intractable discussions within ICANN about Whois a bit of a kick. I’m hoping Maria Farrell will write something about this report – she saw several years of her life chewed up trying to make some progress on this issue. Oh, I should also thank Jenny Kelly from NORC who was incredibly responsive and helpful.

Full story below:


77% of domain registrations stuffed with rubbish

Whois in charge? ICANN’t tell

An incredible 77 per cent of internet domains – nearly 90 million internet addresses – are registered with false, incomplete, or unverifiable information.

An extensive review of 1,419 representative domain names conducted by overseeing body ICANN, including direct contact with over 500 individual domain owners, produced some startling results [pdf]. Example: only 23 per cent of domain registrations display the owner’s correct name and physical address.

Worse, an extraordinary 29 per cent of domains are registered with patently false or suspicious information – a shady sign of online criminality. The remaining 48 per cent of faulty registrations are in a grey area where people are either unaware or unwilling to hand over their identifying details.

But Jenny Kelly of the National Opinion Research Center (NORC), who headed the investigation, warned against making broad assumptions about the integrity of the Whois system for registration data. “What we found was that 23 per cent of domains had good information that we could verify but there are many others where we were not able to confirm the content and so were not able to say they were good,” she explained, citing as an example post-office boxes.

The survey found that Whois records for domains contain a lot of incomplete information – something that can be put down to the practices of different registrars. “The approach taken varies widely by registrar,” Kelly explained.

“As part of my background preparation, I tried to register domains with different registrars. It was clear that some companies have good checks: checking your zip code is right for the city and state you entered, and likely checking credit card details against the registered address. But others did not apply such checks during the registration process, although whether or not they apply them subsequently I don’t know.”

The findings will have far-reaching implications for the domain-name system, which has spent the past five years working under the broad assumption that 95 per cent of domain information was at least partially accurate.

The report itself expressed surprise that no instances of identity theft were found – but it concluded, incredibly, that this was because such theft wasn’t needed. “It would seem that given the latitude that people have in choosing what information to provide when registering a domain name, identity theft may not be necessary. It is all too easy to enter any or no name, along with an unreliable or undeliverable address,” the report reads.

The reasons behind the widespread failure of the Whois system, which is supposed to ensure accurate domain registration information, are complex – but they stretch beyond complaints about the privacy implications of people posting their information online for anyone to view.

There are no mandated standards for registrars to check whether the information provided is accurate. A large number of domain registrants are unaware that the service even exists, and until a few months ago, the system only accepted ASCII English, causing millions of registrants to register with best-guess information.

The report notes that the majority of the issues behind the inaccuracy are within the ICANN community’s ability to fix. All registrars of generic top-level domains (such as .com or .info) have to sign a Registrar Accreditation Agreement (RAA) with ICANN and changes to that agreement can oblige registrars to improve the level of checking for domains.

ICANN is also in a position to provide ratings of different registrars so that consumers can be better informed about whom they register their domain with. However, both alterations in the RAA and the creation of a rating mechanism require a change in policy at ICANN – something to which registrars would need to agree.

Following the collapse a few years ago of one registrar, RegisterFly, which resulted in tens of thousands of individuals losing control of their domains, ICANN announced it would revise the RAA to make sure it didn’t happen again. However, the amendments that came out of its review were watered down over time by the registrars themselves before they were agreed to. A new set of amendments is currently under discussion.

Improving accuracy of the Whois system would also come at a cost, the report warns. “The cost of ensuring accuracy will escalate with the level of accuracy sought, and ultimately the cost of increased accuracy would be passed through to the registrants in the fees they pay to register a domain.”

Kelly wouldn’t be drawn into discussing the cost of such a system, but she did note that heavy competition in the registrar market has driven down costs to less than $10 per domain per year. She hypothesised: “If registration cost $20 rather than $10, would it stop people from registering domains? And would we find that it enables more security?”

The report may help break a decade-long impasse over the Whois service, during which conflicting interests have ensured that no progress has been made on much-needed changes. The last time a full study [pdf] of Whois accuracy was commissioned – by the US Government Accountability Office (GAO) in 2005 – it was reported that only 5 per cent of domain names contained “missing or patently false information.”

That report was dismissed by those in the know as being wildly inaccurate because it didn’t look into the accuracy of information, but only whether it appeared normal. Today’s report discovered roughly the same 5 per cent of nonsense information but found a far greater percentage of wrong or false information by looking at the actual data itself.

ICANN is obliged under its Affirmation of Commitments with the US government to maintain the Whois service as well as “assess the extent to which WHOIS policy is effective and its implementation meets the legitimate needs of law enforcement and promotes consumer trust.”

ICANN’s management has also clearly signalled that it intends to use the report to break the Whois impasse, with its Chief Operating Officer Doug Brent telling us: “Ultimately, any solution reached for Whois accuracy must be closely tied to ICANN’s contractual enforcement mechanisms which today go no further than requiring investigation of inaccuracy complaints.

“Sometimes, the thorniest problems are the most important to address, and we hope that putting some facts out on the table leads to a more informed debate, and an actual path to solutions.”

You can view public comment on the report – or submit your own thoughts – here.


(4) comments

Michele
5 years ago ·

Kieren

I’ve already posted my own comments on this report, but I’ll more or less repeat them here.

I have my doubts about this study’s methodology and I’m not sure if its conclusions should bear that much weight. It is, in many respects, as if someone decided that WHOIS was inaccurate, but wanted a bit of paper to validate that claim.

In any case until such time as gTLDs adopt a saner approach to WHOIS that respects individuals’ right to privacy this “problem” won’t go away.

A criminal is never going to provide valid WHOIS data and in many of the cases I’ve seen will quite happily produce perfectly valid WHOIS data – just not their own.

Regards

Michele

kierenmccarthy
5 years ago ·

@Michele: Actually, I asked Jenny Kelly from NORC specifically about this issue.

She says that statistically, they calculated that 1,400 domain names would give the best results (within a few percentage points accurate) taking into account the cost of a larger survey.

So they took a random sample of 2,400 domains and then boiled it down to 1,419 by pulling out domains that wouldn’t be that useful and/or would be very expensive to verify, i.e. where there is a single domain from a country with poor infrastructure that doesn’t speak English.

Polls and statistics are funny animals and I know from my limited exposure, and despite the fact that I have degree-level maths and statistics, that they require real expertise.

So I’m tempted to trust NORC on this one. Most big polls – election exit polls, etc. – use fewer than 2,000 inputs.
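
As a rough sanity check on the "within a few percentage points" figure, here is a minimal Python sketch of the textbook margin-of-error calculation for a proportion. It assumes a simple random sample and the normal approximation, which may not match NORC's actual sampling design; the function name `margin_of_error` and the 95 per cent confidence level (z = 1.96) are illustrative choices, not something taken from the report.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of a confidence interval for a proportion.

    Uses the normal approximation and assumes a simple random sample
    (an assumption for illustration, not NORC's documented method).
    """
    return z * math.sqrt(p * (1 - p) / n)


sample_size = 1419  # domains in the final sample, as stated in the article
p_verified = 0.23   # share of domains with fully verifiable details

moe = margin_of_error(p_verified, sample_size)
worst_case = margin_of_error(0.5, sample_size)  # p = 0.5 gives the widest interval

print(f"23% figure: +/- {moe * 100:.1f} percentage points")
print(f"Worst case: +/- {worst_case * 100:.1f} percentage points")
```

Under those assumptions the 23 per cent figure comes out at roughly plus or minus 2.2 percentage points, and no proportion measured on a sample of this size would be off by more than about 2.6 points at that confidence level.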

But with regard to the wider issue of Whois – everyone has known the answer for a decade: you have to allow people to not have their personal information published on the Internet. But there should be an expectation that the registrant provides accurate information. And that means that you need a system where certain parties under certain circumstances are allowed access to that data.

So, law enforcement. Lawyers if they open some kind of official legal complaint. And technical bodies like ICANN so long as they don’t disclose the information publicly.

The problem, as ever, is people being unreasonable. The privacy advocates are often over the top and unhelpful; and the IP lawyers want to believe they have the same authority as law enforcement or the courts – which of course they do not.

Hopefully the IP lawyers will see that the information is so crap that they start being a little more realistic. Hopefully privacy advocates will recognise that the system has to be fixed and they need to find a useful compromise. And hopefully law enforcement and ICANN will have enough courage to deal with the issue openly and publicly and say what they are going to allow and to do, and that way alleviate well-founded suspicions that this data could be abused.

Oh, and of course the registrars need to recognise that times have changed and they need to accept more rules on their business – in their own interests.

Will any of this actually happen? Who knows? We may have reached the point of exhaustion after 10 years of Whois talks, or people may continue their stubbornness, cynically assessing the status quo as preferable.

Only time will tell.

Jeffrey Eckhaus
5 years ago ·

I think it is unfair to run with a headline that 77% of domains have dodgy registration info unless dodgy means “not able to verify”.
As NORC has stated, a PO Box is one example where the whois information could not be verified. PO Boxes are legitimate and used by millions of people worldwide, and as a previous user of a PO Box I resent being labeled as dodgy.

I also do not think it is right to label domains registered with patently false or suspicious information “a shady sign of online criminality”. Let’s not forget that the whois data for ICANN CEO Rod Beckstrom’s domain beckstrom.com was all listed with the term “private” and the email address private@private.com. Is this a shady sign of online criminality? While he may be the exception rather than the norm, he is included when people make these types of assumptions.

While I agree there are some issues with the current WHOIS system, I think you may want to rethink the headlines and conclusions from this report.

Jeff

kierenmccarthy
4 years ago ·

I find this an odd response and I’ll tell you why.

1. The story itself includes the information that you are using to criticise the story.

2. Headlines are designed – and will always be designed – to grab attention. You cannot be entirely accurate in a headline: a headline is not the first paragraph. It is a matter of law that headlines must be taken alongside the story itself: the legal system recognises the reality of headlines – you should too. Also, headlines are almost always written by someone other than the reporter.

3. The criticism that completely wrong information may not be suspicious is an illogical tack – people know that when they type in clearly wrong information that they are acting unusually and suspiciously. The discovery of this level of purposeful deception requires explanation from those typing in the wrong information, not an accusatory response from those knowingly providing the wrong information.
