ybabts
ybabts2y ago

Parsing Hostname for TLD, domain name, and SLD

Does anyone know of any packages that parse a URL's hostname into the top-level domain, domain name, and sub-level domains? I found one, but it doesn't work: it errors on everything I give it, saying the label is too short. https://github.com/lupomontero/psl
GitHub - lupomontero/psl: JavaScript domain name parser based on the Public Suffix List
6 Replies
ybabts
ybabtsOP2y ago
I'm surprised this isn't already in the URL object for JavaScript.
tristinDLC
tristinDLC2y ago
Would these not work for you?
// Store the URL in a variable
const url = "https://geeksforgeeks.org:3000/pathname/?search=query";

// Create a URL object with the URL() constructor
const parser = new URL(url);

// Protocol used in the URL: "https:"
console.log(parser.protocol);

// Host of the URL (hostname plus port): "geeksforgeeks.org:3000"
console.log(parser.host);

// Port in the URL: "3000"
console.log(parser.port);

// Hostname of the URL: "geeksforgeeks.org"
console.log(parser.hostname);

// Query string of the URL: "?search=query"
console.log(parser.search);

// Parsed query parameters, as a URLSearchParams object
console.log(parser.searchParams);
ybabts
ybabtsOP2y ago
No, I need the sub-level domain of the URL, which comes from URL.hostname.
// Sub level domain is "api"
// Domain Name is "geeksforgeeks"
// Top level domain is "org"
const url = new URL("https://api.geeksforgeeks.org:3000/pathname/?search=query");
// api.geeksforgeeks.org
console.log(url.hostname)
NeTT
NeTT2y ago
Why not use some regex for that?
const parseHostname = /(?:([a-z0-9]+)\.)?([a-z0-9]+)\.(?:([a-z]+)\.)?([a-z]+)/

const url = new URL("https://api.geeksforgeeks.org:3000/pathname/?search=query");

parseHostname.exec(url.hostname)
/*
[
"api.geeksforgeeks.org", // hostname
"api", // subdomain
"geeksforgeeks", // domain name
undefined, // sld
"org", // tld
index: 0,
input: "api.geeksforgeeks.org",
groups: undefined
]
*/
I just woke up, so there may be better ways to do it.
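For what it's worth, here is a quick check of how that pattern behaves on a few other hostnames. It's only a sketch: the named groups, the `splitHostname` helper, and the sample hostnames `example.co.uk` and `a.b.c.example.com` are my own additions for illustration, not part of the snippet above.

```javascript
// Same pattern as above, with named groups added for readability (my addition).
const parseHostname =
  /(?:(?<subdomain>[a-z0-9]+)\.)?(?<domain>[a-z0-9]+)\.(?:(?<sld>[a-z]+)\.)?(?<tld>[a-z]+)/;

// Hypothetical helper: return just the named groups, or null when nothing matches.
function splitHostname(hostname) {
  const match = parseHostname.exec(hostname);
  return match ? { ...match.groups } : null;
}

// Three plain labels: the case the pattern was written for.
const simple = splitHostname("api.geeksforgeeks.org");

// Two-label suffix with no subdomain: the first optional group grabs "example".
const ccTld = splitHostname("example.co.uk");

// More than four labels: the pattern is unanchored, so it stops after "example".
const deep = splitHostname("a.b.c.example.com");

console.log(simple); // subdomain "api", domain "geeksforgeeks", tld "org"
console.log(ccTld);  // subdomain "example", domain "co", tld "uk"
console.log(deep);   // subdomain "a", domain "b", sld "c", tld "example"
```

So the pattern seems fine for simple three-label hostnames, but for suffixes like `co.uk` or deeper subdomains it can't tell which labels belong to the suffix without a list of known suffixes.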
tristinDLC
tristinDLC2y ago
I'd also recommend regex if you need to parse other components. Here is what I would have done:
const regexStr = /(?:(www\.))|(?:(?:^https:\/\/www\.)|(?:^https:\/\/)?)([\w]+){1}(?:\.[\w]+){2,}/;
const urlLocation = 'https://docs.google.com/dir/1/2/search.html?arg=0-a&arg1=1-b&arg3-c#hash';
const matches = urlLocation.match(regexStr);
const subDomain = {
...matches
};
console.log(subDomain[2]);
My regex is a little safer, as it looks for specific pieces of a URL in their expected positions instead of matching any run of letters and digits.
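One thing that may be worth checking with this approach: because the pattern runs against the whole URL string, a dotted file name in the path can match when the host itself has no subdomain. Below is a rough sketch of that case, plus one possible workaround (parse with `URL` first and apply the same pattern to `hostname` only). The helper names and the `file.tar.gz` URL are made up for illustration.

```javascript
// Same pattern as above.
const regexStr = /(?:(www\.))|(?:(?:^https:\/\/www\.)|(?:^https:\/\/)?)([\w]+){1}(?:\.[\w]+){2,}/;

// Hypothetical helper: apply the pattern to the raw URL string.
function subdomainFromString(urlString) {
  const matches = urlString.match(regexStr);
  return matches ? matches[2] : undefined;
}

// Hypothetical helper: parse first, then apply the pattern to the hostname only.
function subdomainFromHostname(urlString) {
  const { hostname } = new URL(urlString);
  const matches = hostname.match(regexStr);
  return matches ? matches[2] : undefined;
}

const withSubdomain = "https://docs.google.com/dir/1/2/search.html?arg=0-a";
const noSubdomain = "https://google.com/file.tar.gz";

console.log(subdomainFromString(withSubdomain));   // "docs"
console.log(subdomainFromString(noSubdomain));     // "file" (matched in the path, not the host)
console.log(subdomainFromHostname(withSubdomain)); // "docs"
console.log(subdomainFromHostname(noSubdomain));   // undefined
```

Running the pattern on `hostname` keeps the path and query string out of the match, though it still doesn't know about multi-part suffixes.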
tristinDLC
tristinDLC2y ago
@ybabts Here is a URL parsing library that looks infinitely easier to use than juggling regex: https://github.com/remusao/tldts
GitHub - remusao/tldts: JavaScript Library to work against complex domain names, subdomains and URIs.
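As far as I can tell from its README, tldts exposes a `parse` function that returns fields such as `subdomain`, `domain`, and `publicSuffix`, so check there for the exact usage. To show the idea behind this kind of library without pulling in a dependency, here's a minimal sketch that splits a hostname against a set of known suffixes. `KNOWN_SUFFIXES` is a tiny hand-picked stand-in for the real Public Suffix List, and `splitHost` is a hypothetical helper, not part of tldts or psl.

```javascript
// Tiny stand-in for the Public Suffix List (assumption: real libraries ship the full list).
const KNOWN_SUFFIXES = new Set(["com", "org", "uk", "co.uk"]);

// Hypothetical helper: split a hostname into sub-level domain, domain name, and suffix.
function splitHost(hostname) {
  const labels = hostname.toLowerCase().split(".");

  // Try the longest candidate suffix first, always leaving one label for the domain name.
  for (let i = 1; i < labels.length; i++) {
    const suffix = labels.slice(i).join(".");
    if (KNOWN_SUFFIXES.has(suffix)) {
      return {
        subdomain: labels.slice(0, i - 1).join("."), // "" when there is none
        domainName: labels[i - 1],
        suffix,
      };
    }
  }
  return null; // no known suffix, or no label left over for the domain name
}

const { hostname } = new URL("https://api.geeksforgeeks.org:3000/pathname/?search=query");
const parsed = splitHost(hostname);
const ukParsed = splitHost("www.example.co.uk");

console.log(parsed);   // subdomain "api", domainName "geeksforgeeks", suffix "org"
console.log(ukParsed); // subdomain "www", domainName "example", suffix "co.uk"
```

The real list has thousands of entries plus wildcard and exception rules, which is the main reason to reach for a maintained library rather than a hand-rolled set like this one.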
