What happens when you type a web address in your browser and press Enter.

Danilo Romero
8 min read · Aug 24, 2020

I came into this world in 1982, before the Internet became the regular way to get information. Back then, if you really wanted or needed some information, a trip to the library was the best way to satisfy your curiosity. Once you got there you could ask the librarian for help and some tips to find what you wanted, and then manually look for the book, newspaper or magazine by subject, title or author in the library catalog. See the image of a library catalog below.

The card catalog in Manchester Central Library
https://en.wikipedia.org/wiki/File:2010_Manchester_UK_4467481691.jpg

The catalog is a collection of drawers in alphabetical order, full of cards. Every publication had a card with some general info, including the numeric index that you gave to the librarian so they could get the book from the shelf in the deposit.

http://www.libraryhistorybuff.com/images/catalog-card-loc-90.jpg

The librarian would return with the publication you asked for. It took some time and effort. Right now all you need is a desktop or portable device and a connection to the internet, and you can get the information in seconds just by surfing the web. As simple as it looks, it is an even more complicated process; thanks to the speed of electricity, it happens in the blink of an eye.

Now my mission is to break down that process of typing a web address and getting information on the screen. Let’s say we typed: https://www.holbertonschool.com

  • DNS request => Domain Name System request

Just as with books in the library, where we know the names of authors or titles but the books are organized by a code in the deposit, we remember the names of websites but they are organized by a number called an IP address. IP stands for Internet Protocol. Every device on the internet has an address. So when we “visit” a website, what really happens is that some files travel from a server (the device where the files are stored) to our device and appear on our screen. Those files find our device among all the traffic on the internet by following our device’s IP address, but first our computer needs to find out the IP address of the server where those files are.

When we type the name of the website, the first step is to translate that name into an IP address. There is a place that plays a role similar to the library catalog: the DNS server. To connect to the internet we have a contract with an ISP => Internet Service Provider, which has a default DNS server to translate the name of the website we want into the corresponding IP address.

The web address has a more technical name: URL => Uniform Resource Locator.

We probably typed holberton school in a search engine and clicked on one of the results. For a fuller explanation, let’s use the URL https://www.holbertonschool.com/about

Now let’s examine each part of the URL before we understand how the DNS request is resolved:

  • https is the scheme, mostly known as the protocol (more on this later)
  • www.holbertonschool.com is the hostname of the web server
  • /about is the path, where the file that has the content is located
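The three parts listed above can be pulled out of the URL with Python's standard library. This is only a small sketch to make the split visible; the variable name `parts` is just an illustration.

```python
from urllib.parse import urlsplit

# Split the example URL from the article into its components.
parts = urlsplit("https://www.holbertonschool.com/about")

print(parts.scheme)    # the scheme (protocol)
print(parts.hostname)  # the hostname of the web server
print(parts.path)      # the path to the content
```

Running it prints https, www.holbertonschool.com and /about, one per line.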

This is how an IP address is usually located using the hostname:

Root Server: Here the suffix of the hostname is checked against the known extensions most users are familiar with, such as .com, .org, or .gov. The root server then refers the request to a specific TLD server, depending on the suffix, for the next step.

TLD => Top Level Domain Server: Here there is an index of the domains that share the same suffix. Each domain belongs to an authoritative server that has its own IP address, and the request is referred there.

Authoritative Server: Here there is another index that contains the final answer: the IP address that belongs to the hostname. This is the last step of the lookup, and the answer goes back to the DNS resolver.

DNS Resolver Server: This is the server (usually the one from our ISP) that asks the servers above on our behalf. Once it has the IP address that belongs to the hostname, it returns it to our computer so the process can continue.
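To make the chain of referrals easier to follow, here is a toy model of it in Python. It is not a real DNS client: the server names, the tables and the IP address (taken from a range reserved for documentation) are all made up for illustration, and `resolve` is a hypothetical helper that plays the role of the resolver.

```python
# Toy tables standing in for the three kinds of servers.
# Every name and address here is invented for the example.
ROOT = {"com": "tld-com", "org": "tld-org"}
TLD = {"tld-com": {"holbertonschool.com": "ns-holberton"}}
AUTHORITATIVE = {"ns-holberton": {"www.holbertonschool.com": "192.0.2.10"}}


def resolve(hostname, cache):
    """Play the role of the resolver: ask root, then TLD, then authoritative."""
    if hostname in cache:
        # A resolver remembers earlier answers to skip the whole walk.
        return cache[hostname]

    labels = hostname.split(".")
    suffix = labels[-1]             # e.g. "com"
    domain = ".".join(labels[-2:])  # e.g. "holbertonschool.com"

    tld_server = ROOT[suffix]                  # root refers us to a TLD server
    auth_server = TLD[tld_server][domain]      # TLD refers us to the authoritative server
    ip = AUTHORITATIVE[auth_server][hostname]  # authoritative server knows the IP

    cache[hostname] = ip
    return ip


cache = {}
print(resolve("www.holbertonschool.com", cache))
```

A second call with the same hostname would be answered from `cache` without touching the other tables, which is roughly why repeat visits resolve faster.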

http://sekolahlinux.com/wp-content/uploads/2015/09/DNS.png
  • TCP/IP => Transmission Control Protocol + Internet Protocol

Our computer now knows the IP address to send the request for the content of the website we typed in the browser. The browser we use on the computer is known as the client. The website is composed of files of many types, such as HTML, CSS, JavaScript, images and a few other possibilities. The reliable arrival of those files at our computer is due to the TCP/IP protocols. The files are split into small packets. Our browser sends a request and the server sends a response.
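The idea of splitting a file into packets and putting it back together can be sketched in a few lines of Python. This is a simplification under stated assumptions: real TCP numbers bytes rather than whole packets, and the function names and the packet size below are only illustrative.

```python
import random


def split_into_packets(data, size):
    """Cut the data into chunks and label each one with a sequence number."""
    return [
        (sequence, data[start:start + size])
        for sequence, start in enumerate(range(0, len(data), size))
    ]


def reassemble(packets):
    """Sort the packets by sequence number and join the chunks again."""
    return b"".join(chunk for _, chunk in sorted(packets))


page = b"<html>Hello</html>"
packets = split_into_packets(page, 4)
random.shuffle(packets)  # packets may arrive in any order
print(reassemble(packets) == page)
```

The sequence numbers are what let the receiver rebuild the file even when the packets arrive out of order.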

To make sure the delivery of the packets is completed, their arrival is constantly confirmed. This technique is called PAR => Positive Acknowledgement with Re-transmission. Before the first packet is transferred, the client and the server open the connection in the following 3 steps:

  • Step 1 (SYN): our web browser => the client sends a Synchronize Sequence Number (SYN) to inform the server that it wants to start communicating.
  • Step 2 (SYN + ACK): Then the server responds to the client with a pair of numbers called SYN-ACK => Synchronize Sequence Number-Acknowledgement. It confirms the client’s request, meaning that the server is ready for a request.
  • Step 3 (ACK): Finally, the client acknowledges that the server is ready to listen for a request and the data transfer can take place.

These 3 steps are known as the TCP handshake, illustrated in the following image.

http://blog.tofte-it.dk/wp-content/uploads/2018/04/tcp-behavior-handshake-1024x668.png
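In everyday code we never send SYN or ACK ourselves; the operating system performs the handshake when a client connects. The sketch below, which assumes a loopback address and a free port picked by the system, opens a connection between two sockets on the same machine. The function name `handshake_demo` is only for illustration.

```python
import socket


def handshake_demo():
    # The "server": listen on the loopback address, on any free port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    address = server.getsockname()

    # The "client": connecting triggers SYN, SYN-ACK and ACK
    # inside the operating system before this call returns.
    client = socket.create_connection(address, timeout=5)

    # The server picks up the connection that was just established.
    connection, _ = server.accept()

    # Only now can data travel between the two sides.
    client.sendall(b"hello")
    received = connection.recv(1024)

    for sock in (client, connection, server):
        sock.close()
    return received


print(handshake_demo())
```

The data is sent only after the connection exists, which matches step 3 above.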
  • Firewall

All this transfer and traffic of data is exposed to vulnerabilities of many kinds. One of several tools to reduce those vulnerabilities is a firewall. It is a system that monitors incoming and outgoing traffic based on a defined set of security rules.

There are hardware and software firewalls on both sides, client and server. There are several kinds of implementations that may deserve a dedicated article. To keep it simple, one common security configuration is for a firewall to inspect all data packets in the transmission and check their attributes, such as addresses, ports and protocol, against the configured set of rules. If the verification passes, the transmission can continue.
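As a rough illustration of checking packets against rules, here is a minimal sketch in Python. It is not how any particular firewall product works; the `Packet` and `Rule` classes, the rule list and the addresses are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    source_ip: str
    destination_port: int
    protocol: str


@dataclass(frozen=True)
class Rule:
    action: str                             # "allow" or "deny"
    protocol: str
    destination_port: Optional[int] = None  # None matches any port

    def matches(self, packet):
        same_protocol = packet.protocol == self.protocol
        same_port = self.destination_port in (None, packet.destination_port)
        return same_protocol and same_port


# Hypothetical rule set: only web traffic is let through.
RULES = [Rule("allow", "tcp", 80), Rule("allow", "tcp", 443)]


def decide(packet, rules=RULES, default="deny"):
    """Return the action of the first matching rule, or the default."""
    for rule in rules:
        if rule.matches(packet):
            return rule.action
    return default


print(decide(Packet("203.0.113.7", 443, "tcp")))
print(decide(Packet("203.0.113.7", 23, "tcp")))
```

The first packet targets the HTTPS port and is allowed; the second targets a port with no rule and falls back to the default, deny.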

  • HTTP=>Hyper Text Transfer Protocol

This is a standard for request-response transactions over the internet. It uses port 80 by default. Even though all standard connections via HTTP use port 80, it is omitted in the URL because it is the default, but remember that it is there.
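A request-response transaction can be tried out without leaving our own machine. The sketch below starts a tiny HTTP server from Python's standard library and sends it one GET request. The names `HelloHandler` and `fetch` are made up for this example, and it uses a free port on the loopback address instead of port 80, since binding to port 80 usually needs administrator rights.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Build the response: status line, headers, then the body.
        body = f"You asked for {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def fetch(path):
    """Start a local server, send one GET request, return status and body."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        client = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        client.request("GET", path)
        response = client.getresponse()
        result = (response.status, response.read().decode("utf-8"))
        client.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


print(fetch("/about"))
```

The client sends the request, the handler answers with status 200 and a short text body, and that pair is exactly one transaction.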

Data traveling over the internet can be intercepted. To avoid this vulnerability when sending sensitive information such as credit card data, SSL => Secure Sockets Layer (nowadays replaced by its successor, TLS => Transport Layer Security) and the HTTPS protocol are used.

  • HTTPS=>Hyper Text Transfer Protocol Secure

While visiting a website we may notice a lock to the left of the address in the web browser, indicating that the connection is secure with additional encryption. It uses port 443 on the server by default, and because it is the default it is also omitted from the URL, so it seems invisible on the screen.

https://www.instantssl.com/http-vs-https
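The "invisible" default ports can be made visible with a short sketch. It assumes only the two schemes discussed here; `DEFAULT_PORTS` and `effective_port` are illustrative names, not part of any library.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes covered in this article.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the port written in the URL, or the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("https://www.holbertonschool.com/about"))
print(effective_port("http://www.holbertonschool.com/about"))
print(effective_port("http://localhost:8080/"))
```

The first two URLs carry no port, so the defaults 443 and 80 apply; the third states its port explicitly.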

Server side:

We are halfway to getting the content we want on our screen. The server we want may receive many requests at the same time, so it is important to have some infrastructure to handle all the traffic. The first thing our request needs to do is pass through a load balancer.

Full connection diagram
  • Load-balancer

To serve many requests at the same time, or to handle some maintenance and still have the content available, there are many servers listening to the requests users send all the time. The diagram shows an infrastructure of only two servers for simplicity.

The job of the load balancer is to distribute the many requests sent by the users between the servers available. As usual there are several implementations, hardware or software, and different algorithms: round robin, random, fastest and a few others.
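Round robin, the first algorithm mentioned, simply hands each new request to the next server in the list and starts over at the end. Here is a minimal sketch; the class name and the server names `web-01` and `web-02` are assumptions for the example.

```python
from itertools import cycle


class RoundRobinBalancer:
    def __init__(self, servers):
        # cycle() repeats the list of servers endlessly, in order.
        self._servers = cycle(servers)

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._servers)


balancer = RoundRobinBalancer(["web-01", "web-02"])
print([balancer.pick() for _ in range(4)])
```

With two servers, four requests alternate between them: web-01, web-02, web-01, web-02.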

  • Web server

When the load balancer assigns us a web server, the request is read and the process of finding the files for the response begins. The web server holds static content: the HTML, the CSS styles and the images that are stored in the file system of the web server. It is usually known as the front-end. As usual there are different technologies; Nginx appears in the diagram, but Apache is another popular one. Those are software applications that manage the content.

Many websites for small companies that only show the brand and services need only the web server to be fully functional. Others support more complex functionality and require an application server and a database. This additional part is generally known as the back-end.

  • Application server

If the website has a sign in button, there must be additional infrastructure behind the static content to handle that data and then store it. This is the dynamic content: for example, creating a username or sending a random password to your personal email for verification purposes, and from there much more.

The application server uses different languages to handle the information, such as PHP or Python as seen in the diagram, but as usual there are several others. This is also called the business logic: it is how the data is managed. All this data also needs to be stored while it is not being modified, so there must be a warehouse to keep it, and that is the database server.
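To give a feel for the kind of business logic an application server runs, here is a small sign-in sketch in Python. It uses an in-memory SQLite database as a stand-in for a real database server, and every function name, the table layout and the sample user are assumptions for illustration only.

```python
import hashlib
import hmac
import os
import sqlite3


def open_database():
    """Create an in-memory database with an empty users table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE users (username TEXT PRIMARY KEY, salt BLOB, password_hash BLOB)"
    )
    return db


def hash_password(password, salt):
    # Store a salted hash, never the password itself.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(db, username, password):
    salt = os.urandom(16)
    db.execute(
        "INSERT INTO users (username, salt, password_hash) VALUES (?, ?, ?)",
        (username, salt, hash_password(password, salt)),
    )
    db.commit()


def verify(db, username, password):
    """Check a login attempt against what the database holds."""
    row = db.execute(
        "SELECT salt, password_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored_hash = row
    return hmac.compare_digest(stored_hash, hash_password(password, salt))


db = open_database()
register(db, "reader", "correct horse")
print(verify(db, "reader", "correct horse"))
print(verify(db, "reader", "wrong guess"))
```

The application code decides what to do with the data, and the database only keeps it, which is the split between business logic and storage described above.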

  • Database server

On the first click to visit a website we don’t get into the database, but as said before, once you register and have a user and password, this data is verified inside the database every time you log in. There are different DBMSs => Database Management Systems that provide different functionality. It is very similar to the librarian going to the deposit to retrieve the publication we asked for: the librarian who moves things here and there acts as the application server, and the database is the shelves where the books are. As seen in the diagram, each server has a copy of the database. To keep track of the changes, one copy must be assigned the primary role and the other copies will be replicas. This setup allows changes to be written to the primary database and then synchronized to the replicas, so the information stays consistent.
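The primary and replica roles can be sketched with two small classes. This is a deliberately simplified model: it copies every write to the replicas immediately, while real database systems have their own replication mechanisms. The class names and the sample data are made up for the example.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Primary:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        # Changes are written to the primary first...
        self.data[key] = value
        # ...and then synchronized to every replica.
        for replica in self.replicas:
            replica.data[key] = value


replica = Replica()
primary = Primary([replica])
primary.write("user:1", "reader")
print(replica.read("user:1"))
```

All writes go through one place, so the copies cannot drift apart, and reads can be served by any replica.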

Finally

Once the requested data is found, an HTTPS response is sent to our web browser, which reads the files and shows them on our screen in the blink of an eye. Even though we say we visit a website, we never go there; we actually have a kind of conversation, sending requests and receiving answers. It is a real pleasure to be at this point in time, where curiosity can be satisfied almost instantly. With the appropriate selection we can learn anything that can make the world a more comfortable place.
