Sessions and cookies: the basics and the differences between them
Introduction to cookies and sessions
Sessions and cookies are two very common web concepts, and are also very easy to misunderstand. However, they are extremely important for the authorization of pages, as well as for gathering page statistics. Let’s take a look at these two use cases.
Suppose you want to crawl a page that restricts public access, like a Twitter user’s homepage for instance. Of course you can open your browser and type in your username and password to log in and access that information, but so-called “web crawling” means that we use a program to automate this process without any human intervention. Therefore, we have to find out what is really going on behind the scenes when we use a browser to log in.
When we first receive a login page and type in a username and password, after we press the “login” button, the browser sends a POST request to the remote server. The browser redirects to the user homepage after the server verifies the login information and returns an HTTP response. The question here is, how does the server know that we have access privileges for the desired webpage? Because HTTP is stateless, the server has no way of knowing whether or not we passed the verification in the last step. The easiest and perhaps the most naive solution is to append the username and password to the URL. This works, but puts too much pressure on the server (the server must validate every request against the database), and can be detrimental to the user experience. An alternative way of achieving this goal is to save the user’s identity either on the server side or client side using cookies and sessions.
Cookie principle
Cookies, in short, store historical information (including user login information) on the client’s computer. The client’s browser sends these cookies every time the user visits the same website, automatically completing the login step for the user.
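The round trip described above can be sketched with Python's standard `http.cookies` module. This is only an illustration, not a full web server: the helper names `build_set_cookie` and `parse_cookie_header`, and the cookie names `user` and `theme`, are assumptions made up for the example.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name: str, value: str) -> str:
    """Build the value of a Set-Cookie header the server would send."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"        # send the cookie for every path on the site
    cookie[name]["httponly"] = True   # hide the cookie from page scripts
    return cookie[name].OutputString()


def parse_cookie_header(header: str) -> dict:
    """Parse the Cookie header the browser sends back on later requests."""
    cookie = SimpleCookie()
    cookie.load(header)
    return {key: morsel.value for key, morsel in cookie.items()}


# Server -> browser: the response carries a Set-Cookie header.
set_cookie = build_set_cookie("user", "alice")

# Browser -> server: every later request to the same site carries a Cookie header.
received = parse_cookie_header("user=alice; theme=dark")
print(set_cookie)
print(received)
```

The server only ever sees what the browser chooses to send back, which is why the cookie value alone should not be trusted as proof of identity.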
Cookies are maintained by browsers. They can be modified during communication between webservers and browsers. Web applications can access cookie information when users visit the corresponding websites. Most browsers have a setting pertaining to cookie privacy, where the stored cookies can be inspected.
Cookies have an expiry time, and there are two types of cookies distinguished by their life cycles: session cookies and persistent cookies.
If your application doesn’t set a cookie expiry time, the browser will not save the cookie to the local file system, and the cookie is discarded when the browser is closed. These cookies are called session cookies, and this type of cookie is usually kept in memory instead of on the local file system.
If your application does set an expiry time (for example, setMaxAge(60*60*24), which is one day in seconds), the browser will save this cookie to the local file system, and it will not be deleted until it reaches the allotted expiry time. Cookies that are saved to the local file system can be shared by different browser processes, for example by two IE windows; different browsers use different processes for dealing with cookies that are saved in memory.
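The difference between the two kinds of cookie comes down to whether an expiry attribute is present. A minimal sketch, again using Python's standard `http.cookies` module; the helper `make_cookie` and the cookie name `sid` are hypothetical names chosen for the example.

```python
from http.cookies import SimpleCookie

ONE_DAY = 60 * 60 * 24  # expiry time in seconds


def make_cookie(name, value, max_age=None):
    """Return a Set-Cookie header value, with Max-Age only if one is given."""
    cookie = SimpleCookie()
    cookie[name] = value
    if max_age is not None:
        cookie[name]["max-age"] = max_age
    return cookie[name].OutputString()


# No expiry attribute: a session cookie, dropped when the browser closes.
session_cookie = make_cookie("sid", "abc123")

# Max-Age present: a persistent cookie, kept until the expiry time passes.
persistent_cookie = make_cookie("sid", "abc123", max_age=ONE_DAY)

print(session_cookie)
print(persistent_cookie)
```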
Session principle
Sessions, on the other hand, store historical information on the server side. The server uses a session id to identify different sessions, and the session id that is generated by the server should always be random and unique. You can use cookies or URL arguments to get the client’s identity.
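As a rough sketch of these two points, the snippet below generates a random session id and then looks for an id in an incoming request, first in the cookie and then in the URL arguments. The function names and the parameter name `sid` are assumptions for illustration, and the lookup order is a design choice rather than a rule.

```python
import secrets
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlparse


def new_session_id() -> str:
    """Generate a random, hard-to-guess session id."""
    return secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text


def session_id_from_request(url, cookie_header="", name="sid"):
    """Look for the session id in the cookie first, then in the URL arguments."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if name in cookie:
        return cookie[name].value
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else None


print(new_session_id())
print(session_id_from_request("/home", cookie_header="sid=abc"))
print(session_id_from_request("/home?sid=xyz"))
```

Using a cryptographically secure generator such as `secrets` matters here, because anyone who can guess a valid session id can act as that user.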
A session is a series of actions or messages. For example, you can think of the actions you take between picking up your telephone and hanging up as a type of session. When it comes to network protocols, sessions have more to do with connections between browsers and servers.
Sessions help to store the connection status between server and client, and this can sometimes be in the form of a data storage structure.
Sessions are a server-side mechanism, and usually employ hash tables (or something similar) to save incoming information.
When an application needs to assign a session to a client, the server should first check whether a session already exists for the session id that the client presents. If it does, the server just returns the same session to the client. On the other hand, if no session exists for that id, the server creates a brand new session (this usually happens when the server has deleted the corresponding session, but the user has manually appended the old session id to the request).
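The lookup-or-create logic can be sketched with a plain dictionary acting as the hash table. This is a minimal in-memory illustration under stated assumptions, not a production design: the class `SessionStore` and its method names are hypothetical, and it has no expiry handling or locking.

```python
import secrets


class SessionStore:
    """In-memory session table: session id -> per-session data."""

    def __init__(self):
        self._sessions = {}

    def get_or_create(self, session_id=None):
        """Return (id, data) for a known id, or create a brand new session."""
        if session_id is not None and session_id in self._sessions:
            return session_id, self._sessions[session_id]
        # Unknown or missing id: never reuse what the client sent.
        new_id = secrets.token_urlsafe(32)
        self._sessions[new_id] = {}
        return new_id, self._sessions[new_id]

    def destroy(self, session_id):
        """Delete a session, for example on logout."""
        self._sessions.pop(session_id, None)


store = SessionStore()
sid, data = store.get_or_create()            # first visit: new session
data["user"] = "alice"

same_sid, same_data = store.get_or_create(sid)   # later visit: same session

store.destroy(sid)
stale_sid, fresh_data = store.get_or_create(sid)  # old id after deletion: new session
```

Issuing a fresh id when the client presents an unknown one, rather than adopting the client's value, keeps the server in control of which ids are valid.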
The session itself is not complex, but its implementation and deployment are, so there is no “one way to rule them all”.