Monday, March 16, 2015

Solution for file_get_contents returning gibberish content

If you only want the solution and don't care about the explanation, skip to the gzinflate() snippet below.

Recently, I wanted to pull some data from the StackExchange API, and I decided to use the file_get_contents() function for it. Surprisingly, all I got back was gibberish full of weird characters.

The code I was using was this:

echo $html =  file_get_contents("");

Really simple and straight to the point, just one line of code. Surprisingly, what I got back looked like this:

‹ •’ÁŽ›0 †_ÅòaO H€ ¢½TªÔK«öÐCU!ÇLÀ ØÔ B¢h¥¾F_¯OÒ1$Û½ö6öÌ|3ÿo߸Bè /¿Ý8ŠÆ ¼ ]ƒåß n&MAyã † *£y¹Î¶ ØJÕ¼ÜdI‘ßÏx €—TÜ(‡`¡æ ¬9ª *Õ‹Æ'[ÄÁ•Q4MÓª±â,PØ•4}´„Q²‹óÝ6_ïdždñq—¦Ûc½Ž‹x› ùñ ŸÝ~½ÉŸê½ªA£’F?Ùý§÷4©VnèĵҢ÷ƒ>à ¤ ìÊ}A£ÁQI§ôé¾ ­àN0 Á­ y‚‹l…n`^ÆËqѬ-² (ü º…ó på*¡Ý4k,ÑŽ 𳂩’fÔè-*’€/ ¯w wÒXZmMa' VB¢:+¼Vµ@Ÿn¶Y’ ù+-Ìv¿M­7YJ©Ÿ#¸95Û¿‹Óâ?„=šIœïŒj . !ZZvy÷p2½Ð¡Ò¡é®ý D8 ×*ݐøЂ u†ðhH Cb’éJ#Q f$QaçýGd&Ø 2›ÉLiöq! ìë+šÝÑìfÿÐÌ£‘ kª¡ \ ãF …-x.è?¿~;æZC–Sƒ®YgH»eÖ˜þ™¿Ðon…«úù Ž¢sà4(ª^\x™Äñãl¡§©´•w7ù qUb_"

I looked everywhere on my server and on the internet and couldn't find a solution. It turns out the output above isn't gibberish; it is gzip-encoded data. The proof is simple: dump the response headers, for example with print_r() on the array that get_headers() returns for the same URL.
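As a hedged sketch of that header check: PHP's get_headers() returns the response headers as an array of raw lines, and a small helper can look for a gzip Content-Encoding among them. The function name response_is_gzipped() and the $url placeholder are hypothetical and not part of the original post. The live call is left commented out because it needs network access.

```php
<?php
// Hypothetical helper: reports whether a list of raw header lines
// (as returned by get_headers()) declares a gzip-compressed body.
function response_is_gzipped(array $headers): bool
{
    foreach ($headers as $line) {
        // Header names are case-insensitive, so match case-insensitively.
        if (preg_match('/^Content-Encoding:\s*gzip\b/i', $line)) {
            return true;
        }
    }
    return false;
}

// Usage against a live endpoint ($url is a placeholder; get_headers()
// returns false on failure, so check that before passing it on):
// $headers = get_headers($url);
// print_r($headers);
// var_dump(response_is_gzipped($headers));
```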


You will see this:

Array ( 
[0] => HTTP/1.1 200 OK 
[1] => Cache-Control: private 
[2] => Content-Type: application/json; charset=utf-8 
[3] => Content-Encoding: gzip 
[4] => Access-Control-Allow-Origin: * 
[5] => Access-Control-Allow-Methods: GET, POST 
[6] => Access-Control-Allow-Credentials: false 
[7] => X-Content-Type-Options: nosniff 
[8] => Date: Mon, 16 Mar 2015 23:41:57 GMT 
[9] => Content-Length: 511 

Aha: the fourth element of the array, index [3], is Content-Encoding: gzip. The solution is to do the following:

$html = file_get_contents("");
// Skip the 10-byte gzip header and the 8-byte trailer, then inflate the raw deflate data.
echo gzinflate(substr($html, 10, -8));
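The substr() offsets above assume the gzip header is exactly 10 bytes, which holds only when the stream carries no optional header fields such as an embedded file name. As a hedged alternative, PHP 5.4 and later with the zlib extension provide gzdecode(), which parses the header and trailer itself. The sketch below uses gzencode() as a stand-in for the remote response so that it runs without network access; the sample JSON string is made up for illustration.

```php
<?php
// Stand-in for the compressed body that file_get_contents() returned.
// gzencode() produces a complete gzip stream: header, deflate data, trailer.
$json = '{"items":[],"has_more":false}';
$compressed = gzencode($json);

// gzdecode() handles the header and trailer; it returns false on invalid input.
$decoded = gzdecode($compressed);
if ($decoded === false) {
    // Not valid gzip, so treat the bytes as an uncompressed body.
    $decoded = $compressed;
}

echo $decoded, PHP_EOL;

// The manual approach from the post gives the same result for this stream,
// because gzencode() writes a plain 10-byte header with no optional fields.
$manual = gzinflate(substr($compressed, 10, -8));
```

If the server ever adds optional header fields, the manual offsets would cut into the wrong place, while gzdecode() would still work.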

This worked for me. Here is what I got:

{"items":[{"tags":["gender"],"owner":{"reputation":161,"user_id":25398,"user_type":"registered","profile_image":"","display_name":"Rebecca J. Stones","link":""},"is_answered":true,"view_count":1644,"answer_count":1,"score":12,"last_activity_date":1426534996,"creation_date":1426512546,"question_id":27049,"link":"","title":"Does a transgender woman in Olympia, Washington receive frequent complaints about indecent exposure in the women’s showers and locker room?"}],"has_more":false,"quota_max":300,"quota_remaining":272}

What does this mean?

When you use a famous, overused PHP function like file_get_contents() to fetch a remote web page, it sends only a minimal request by default and, in particular, no Accept-Encoding header. The remote server's job is to check that header (exposed to server-side scripts as HTTP_ACCEPT_ENCODING) to see whether you, the requester of the data, support compression over HTTP.

Since the request most likely carried no Accept-Encoding header, the server shouldn't have replied with gzip-encoded data, yet it did. So this is partly the StackExchange API's fault.
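To make the negotiation explicit, the request can carry its own Accept-Encoding header through a stream context, and the body can be decoded only when the response says it was compressed. This is a minimal sketch, not a definitive implementation: fetch_body() and $url are hypothetical names, and it relies on $http_response_header, which the HTTP wrapper fills in the calling scope (newer PHP versions also offer http_get_last_response_headers()).

```php
<?php
// Stream context that announces gzip support to the remote server.
$context = stream_context_create([
    'http' => [
        'method' => 'GET',
        'header' => "Accept-Encoding: gzip\r\n",
    ],
]);

// Hypothetical wrapper: fetch a URL, then decode the body only if the
// response headers declare gzip. Returns false if the request fails.
function fetch_body(string $url, $context)
{
    $body = file_get_contents($url, false, $context);
    if ($body === false) {
        return false;
    }
    // $http_response_header is set in this scope by the HTTP wrapper.
    foreach ($http_response_header ?? [] as $line) {
        if (preg_match('/^Content-Encoding:\s*gzip\b/i', $line)) {
            return gzdecode($body);
        }
    }
    return $body;
}

// Usage (needs network access; $url is a placeholder):
// echo fetch_body($url, $context);
```

Checking the header instead of always inflating keeps the code working if the server ever answers with an uncompressed body.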

Anyhow, you got your solution now, let me know in the comments if you still have a problem!


Sunday, March 8, 2015

Do not set timeout in httpd.conf to 0 (ZERO)

Apache, the most popular web server on the planet, is mainly configured by placing directives in the plain-text file httpd.conf.

In httpd.conf, located at /etc/httpd/conf/httpd.conf by default, you will find this:

# Timeout: The number of seconds before receives and sends time out.
Timeout 60

Do not ever think that replacing 60 with 0 will give you an unlimited timeout for receives and sends!

Setting it to 0 will prevent your server from sending any file to clients (mp4, css, js, or anything else). In other words, no resources will be accessible from your server.

If you have set it to 0 at some point, like I did, you should revert it and restart your server.

You can set it to a large value such as 9999 (seconds) if you desire, but never 0.
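A quick way to catch this mistake is to scan the configuration text for the Timeout directive before restarting. The sketch below is an assumption-laden illustration: find_apache_timeout() is a hypothetical helper, it only looks at the text it is given, and it does not follow Include directives or account for per-virtual-host overrides.

```php
<?php
// Hypothetical helper: returns the last Timeout value found in Apache
// configuration text, or null if no Timeout directive is present.
function find_apache_timeout(string $config): ?int
{
    $timeout = null;
    foreach (preg_split('/\R/', $config) as $line) {
        // Directive names are case-insensitive. Comment lines start with '#'
        // and other directives such as ProxyTimeout do not match, because
        // the pattern is anchored to the directive name.
        if (preg_match('/^\s*Timeout\s+(\d+)\s*$/i', $line, $m)) {
            $timeout = (int) $m[1];
        }
    }
    return $timeout;
}

$sample = "# Timeout: The number of seconds before receives and sends time out.\nTimeout 60\n";
$value = find_apache_timeout($sample);

if ($value === 0) {
    echo "Timeout is 0: change it before restarting Apache", PHP_EOL;
} else {
    var_dump($value); // int(60)
}
```

To check a real file, the text could be read with file_get_contents('/etc/httpd/conf/httpd.conf'), assuming the script has permission to read it.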

For useful up-to-date information you may see this.