Bay Cao và Bay Xa – Fly High and Fly Far

September 10, 2009

Web Performance And Scalability – Best practices

Filed under: MySQL, Programming — Tags: , — doqkhanh @ 8:53 AM

Some of these tips may conflict with each other, and not all of them apply to everyone.

1) think horizontal for everything, not just the web servers. Micro-optimizations and other small details are boring.

2) benchmarking techniques: not “how fast” but “how many”. Test capacity, not speed.

3) bigger and faster (vertical scaling) is the enemy.

4) horizontal scaling = add another box

5) implementation: you scale your system a few times, but you scale your ARCHITECTURE dozens or hundreds of times.

6) start with the architecture from the very beginning of the implementation.

7) don’t have “The server” for anything

8) stateless good, stateful bad

9) “shared nothing” good

10) don’t keep state within app server

11) caching good.

12) generate static pages periodically; this works well as long as you don’t have millions of pages or changes.

13) cache full output in the application

14) include cookies in the “cache key” so different browsers can get different content too

15) use cache when this, not when that

16) use regexp to insert customized content into the cached page

17) set the Expires header to control cache times, or use a rewrite rule to generate the page if the cached file doesn’t exist (Rails does this)

18) if the content is truly dynamic this doesn’t work, but it is great for caching “dynamic” images

19) partial pages: pre-generate static page snippets and have the handler just assemble the pieces.

20) cache little snippets, e.g. the sidebar

21) don’t spend more time managing the cache than you save

22) cache data that’s too slow to query, fetch or calculate.

23) generate page from cached data

24) use same data to generate API responses

25) this moves load to the web servers

26) start with things you hit all the time

27) if you don’t use it, don’t cache it; check the db logs to see what is actually used

28) don’t depend on the MySQL query cache unless it actually helps

29) a local file system cache is not so good because every server keeps its own copy of each page

30) use process memory, not shared

31) MySQL cache table, with these fields: id is the “cache key”; type is the “namespace”; metadata holds things like headers for cached HTTP responses; purge_key makes it easier to delete data from the cache (index it, too). Put the primary key on (id, type) and an index on the expire field.

32) why 31 fails: how do you load balance it? And if the MySQL server dies, you have no cache at all.

33) but you can use MySQL scaling techniques to deal with that, such as dual-master replication

34) use memcached, like LiveJournal, Slashdot and Wikipedia: memory based, runs on Linux 2.6 (epoll) or FreeBSD (kqueue), low overhead for lots of connections, no master, simple!

35) how to scale the db horizontally: use MySQL, and use replication to share the load. Write to one master and read from many slaves; this is good for read-heavy apps (or use INSERT DELAYED if you don’t need the write to happen right away). Check out “High Performance MySQL”.

36) relay slave replication: if replication takes too much bandwidth on the master, use one replication slave to replicate to the other slaves.

37) writes don’t scale with replication: all servers need to do the same writes. 5.1’s row-level replication might help.

38) so partition the data: divide and conquer. Use a separate cluster for each data set.

39) if you can’t divide cleanly, use flexible partitioning: a global server keeps track of which “cluster” has what info. Use auto_increment columns only in the “global master”. Aggressively cache “global master” data.

40) if you use a master-master setup like 39, then you don’t have replication slaves, so there is no latency from commit to the data being available. If you are careful you can write to both masters: make each user always use the same master, so primary keys won’t be messed up. If one master fails, use the other one.

41) don’t be afraid of the data duplication monster. Use summary tables to avoid things like COUNT(*) and GROUP BY: do the work once and put the result into a table, either periodically or when the data is inserted. Or put data affecting both a “user” and a “group” into both the “user” and “group” partitions (clusters); that is duplicating data, too.

42) but you can go further, and use summary dbs! Copy data into special dbs optimized for special queries, e.g. FULLTEXT searches, or anything spanning more than one (or all) clusters. Use different dbs for different latency requirements, e.g. serve RSS feeds from a replicated slave db (RSS feeds can be late).

43) save data to multiple “partitions”, with the application doing manual replication: either the app writes to 2 places, OR you use last_updated and deleted columns, or triggers that add to a “replication_queue” table, plus a background program that copies data based on the queue table or the last_updated column

44) if you’re running Oracle, move read operations to MySQL with this manual replication idea. Good way to sneak MySQL into an Oracle shop.

45) make everything repeatable: build summary and load scripts so they can restart or run again. Also have one trusted data place, so summaries and copies can be (re)created from there.
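
Tips 22, 23 and 31 describe caching slow-to-fetch data under a key, inside a namespace, with an expiry time. The Python sketch below illustrates that pattern under stated assumptions: the TTLCache class, its method names and load_profile are hypothetical, and a plain in-process dict stands in for memcached or the MySQL cache table.

```python
import time


class TTLCache:
    """Tiny in-process cache keyed by (namespace, key) with an expiry time."""

    def __init__(self, clock=time.monotonic):
        self._store = {}      # (namespace, key) -> (expires_at, value)
        self._clock = clock   # injectable so expiry can be exercised in tests

    def get_or_compute(self, namespace, key, ttl, compute):
        """Return the cached value, or call compute() and cache the result."""
        now = self._clock()
        entry = self._store.get((namespace, key))
        if entry is not None and entry[0] > now:
            return entry[1]                       # fresh hit
        value = compute()                         # miss or expired: do the slow work
        self._store[(namespace, key)] = (now + ttl, value)
        return value

    def purge(self, namespace):
        """Drop every entry in a namespace (similar in spirit to a purge key)."""
        for k in [k for k in self._store if k[0] == namespace]:
            del self._store[k]


calls = []


def load_profile():
    """Hypothetical stand-in for a slow database query."""
    calls.append(1)
    return {"name": "alice"}


cache = TTLCache()
first = cache.get_or_compute("profile", 42, ttl=60, compute=load_profile)
second = cache.get_or_compute("profile", 42, ttl=60, compute=load_profile)
print(first == second, len(calls))  # the slow function ran only once
```

Keeping the clock injectable is a design choice for testability only; a shared cache such as memcached handles expiry on the server side instead.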

BREATHE! HALFWAY THERE!!

46) use InnoDB because it’s more robust. Exceptions: big read-only tables, high-volume streaming tables (logging), locked tables or INSERT DELAYED, and specialized engines for special needs. There will be more engines in the future, but for now, InnoDB.

47) multiple MySQL instances: run different instances for different workloads, even if they share the same server. Moving to separate hardware is easier, of course. Optimize each server instance for its workload. Easy to set up with the instance manager or mysqld_multi, and there are init scripts that support the instance manager.

48) asynchronous data loading when you can: if you’re updating counts or loading logs, send updates through Spread (or some other messaging system) to a daemon that loads the data. Don’t update for each request (e.g. counts); do it every 1000 updates, or every few minutes. This helps if the db loses its network connection (the frontend keeps running!) or if you want to lock tables, etc.

49) preload, dump and process: let the servers pre-process as much as possible. Dump never-changing data structures to JS files for the client to cache (postal data maybe), or dump to memory, or use SQLite or BerkeleyDB and rsync to each webserver, or run a MySQL replica on each webserver

50) stored procedures are dangerous because they’re not horizontal: scaling them is more work than just adding a webserver. Only use them if it saves the db work (e.g. sending 5 rows to the app instead of sending 5,000 and parsing them in the app)

51) reconsider persistent db connections: each connection requires a thread (= memory), all httpd processes talk to all dbs, lots of caching might mean you rarely need the main db, and MySQL connections are fast, so why not just reopen?

52) innodb_file_per_table, so OPTIMIZE TABLE clears unused space. innodb_buffer_pool_size set to 80% of total mem (dedicated MySQL server). Also look at innodb_flush_log_at_trx_commit and innodb_log_file_size.

53) keep metadata in the db and store images in the filesystem, but then how do you replicate? Or store images in MyISAM tables, split up so tables don’t get bigger than 4G, so a corrupt table causes fewer problems. The metadata table might specify which table an image is in. Include the last modified date in the metadata, and use it in URLs to optimize caching, e.g. with squid: /images/$timestamp/$id.jpg

54) do everything in unicode

55) UTC for everything

56) STRICT_TRANS_TABLES so MySQL is picky about bad input and doesn’t just turn it into NULL or zero.

57) don’t overwork the DB: dbs don’t scale as easily as web servers

58) STATELESS. Don’t make cookie ids easy to guess, or sequential, etc. Don’t save state on one server only, save it on every one. Put the data in the db, don’t put it in the cookie, that duplicates efforts. Important data goes into the db so it gets saved, unimportant transient data goes in memcache, SMALL data in the cookie. A shopping cart would go in the db, the background color goes in a cookie, and last viewed items go in memcache

59) to make cookies safer, use checksums and timestamps to validate cookies. Encryption is usually a waste of cycles.

60) use resources wisely. balance how you use hardware — use memory to save I/O or CPU, don’t swap memory to disk EVER.

61) do the work in parallel — split work into smaller pieces and run on different boxes. send sub-requests off as soon as possible and do other stuff in the meantime.

62) light processes for light tasks: thin proxy servers act as “network buffers” that sit between the user and your heavier backend application. Use httpd with mod_proxy or mod_backhand. The proxy does the network work, so fewer httpd processes are needed to do the real work, which saves memory and db connections. Proxies can also serve static files and cache responses, let you avoid starting the main app as root, and do load balancing. This is very important if your background processes are “heavy”. It is very EASY to set up a light process. Use ProxyPreserveHost On in Apache 2.

63) job queues: use queues; AJAX can make this easy. The webserver submits a job to a database “queue”, the first available worker picks up the first job and sends the result to the queue. Or use Gearman, Spread, MQ/Java Messaging Service(?)

64) log http requests to a database! Log all 4xx and 5xx requests. It is great for seeing which requests are slow or fast, but only log 1-2% of all requests. Use Time::HiRes in Perl, which gets microseconds from the gettimeofday system call.

65) get good deals on servers http://www.siliconmechanics.com, server vendor of lj and others.
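
Tip 59 suggests validating cookies with a checksum and a timestamp rather than encrypting them. A minimal Python sketch of that idea follows. It assumes an HMAC-SHA256 keyed checksum, a hypothetical SECRET value and a made-up "value|timestamp|signature" layout; none of these come from the original talk.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical server-side secret, never sent to the client


def _signature(payload):
    """Keyed checksum of the payload string."""
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def sign_cookie(value, now=None):
    """Return 'value|timestamp|signature' for a small cookie value."""
    ts = str(int(time.time() if now is None else now))
    payload = f"{value}|{ts}"
    return f"{payload}|{_signature(payload)}"


def verify_cookie(cookie, max_age=3600, now=None):
    """Return the value if the checksum matches and the cookie is fresh, else None."""
    try:
        value, ts, sig = cookie.rsplit("|", 2)
    except ValueError:
        return None                               # not in the expected layout
    expected = _signature(f"{value}|{ts}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None                               # tampered with
    now = time.time() if now is None else now
    if not ts.isdigit() or now - int(ts) > max_age:
        return None                               # too old
    return value


cookie = sign_cookie("bg=blue", now=1000)
print(verify_cookie(cookie, now=1500))            # within max_age, so the value comes back
```

The cookie content stays readable by the client, which fits the advice to keep only small, unimportant data in cookies.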

IN SUMMARY: HORIZONTAL GOOD, VERTICAL BAD
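
The flexible partitioning in tips 38 to 40 can be pictured as a small directory that maps each user to a cluster. The Python sketch below is an illustration only: the ClusterDirectory class, the cluster names and the "least-loaded cluster" placement rule are all assumptions, and an in-memory dict stands in for the heavily cached “global master” table.

```python
class ClusterDirectory:
    """Hypothetical 'global master': remembers which cluster holds each user."""

    def __init__(self, clusters):
        self._clusters = list(clusters)
        self._user_to_cluster = {}   # would live in the global db, aggressively cached
        self._next_id = 1            # stands in for the auto_increment column

    def create_user(self):
        """Allocate a global user id and pin it to the least-loaded cluster."""
        user_id = self._next_id
        self._next_id += 1
        counts = {c: 0 for c in self._clusters}
        for c in self._user_to_cluster.values():
            counts[c] += 1
        # min() keeps the first cluster on ties, so placement is deterministic
        self._user_to_cluster[user_id] = min(self._clusters, key=lambda c: counts[c])
        return user_id

    def cluster_for(self, user_id):
        """Every query for this user's data goes to the returned cluster."""
        return self._user_to_cluster[user_id]

    def move_user(self, user_id, cluster):
        """Re-point a user after their rows were copied to another cluster."""
        if cluster not in self._clusters:
            raise ValueError(f"unknown cluster: {cluster}")
        self._user_to_cluster[user_id] = cluster


directory = ClusterDirectory(["cluster_a", "cluster_b"])
u1, u2, u3 = (directory.create_user() for _ in range(3))
print(directory.cluster_for(u1), directory.cluster_for(u2), directory.cluster_for(u3))
```

Because the mapping is explicit rather than computed from the id, a user can be moved between clusters without renumbering anything.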

Quoted from http://forge.mysql.com/wiki/Web_Performance_And_Scalability

Hope this is helpful!

July 24, 2009

How to install the WWW::Contact module to import contacts from Yahoo, Gmail and AOL in a Perl CGI website

Filed under: Perl, Programming — doqkhanh @ 1:29 PM

In this small tutorial, I will show you how to install the WWW::Contact module so you can implement a contact import feature for Yahoo, Gmail and AOL in your Perl CGI website.

I am using CentOS 5.3 and its bash shell in Poderosa, with a root account:
# Find the path of CPAN/Config.pm (here: /usr/lib/perl5/5.8.8/CPAN/Config.pm)
locate CPAN/Config.pm

# Open it for editing and make sure CPAN is allowed to access the Internet
vim /usr/lib/perl5/5.8.8/CPAN/Config.pm
# 'connect_to_internet_ok' => q[1],
# 'urllist' => [],

# Download the WWW::Contact package
wget http://search.cpan.org/CPAN/authors/id/F/FA/FAYLAND/WWW-Contact-0.27.tar.gz

# Uncompress
tar -xvzf WWW-Contact-0.27.tar.gz

#check config
cd WWW-Contact-0.27
perl Build.PL

# You should see something like this
Checking whether your kit is complete...
Looks good

Checking prerequisites...
Looks good

Deleting Build
Removed previous script 'Build'

Creating new 'Build' script for 'WWW-Contact' version '0.27'

# If any modules are missing, you will see errors like these instead
Checking prerequisites...
- ERROR: Net::Google::AuthSub is not installed
- ERROR: HTML::TokeParser::Simple is not installed
- ERROR: WWW::Mechanize is not installed
- ERROR: Crypt::SSLeay is not installed
- ERROR: JSON::XS is not installed
- ERROR: Text::vCard::Addressbook is not installed
- ERROR: WWW::Mechanize::GZip is not installed
- ERROR: Moose is not installed

# Install the missing modules from the CPAN shell
perl -MCPAN -e shell
install Net::Google::AuthSub
install HTML::TokeParser::Simple
install WWW::Mechanize
install Crypt::SSLeay
install JSON::XS
install Text::vCard::Addressbook
install WWW::Mechanize::GZip
install Moose

# You may need to force the install of the Crypt::SSLeay module
force install Crypt::SSLeay

# Try to install the WWW::Contact module itself
perl -MCPAN -e shell
install WWW::Contact

# If it still cannot be installed automatically, install it manually from the WWW-Contact-0.27 directory
perl Build.PL
./Build
./Build test
./Build install

Congrats! Your sample should work now.

May 25, 2009

perl -MCPAN -e shell tips and tricks

Filed under: Other, Perl, Programming — doqkhanh @ 10:06 AM

1. Command history
Use        perl -MCPAN -e shell -xdg
instead of perl -MCPAN -e shell

You can then use the up/down arrow keys to browse the local history.

2. When you get this message
CPAN.pm panic
simply delete your lock file and reload cpan to continue without any error. Thanks to Mr Bang for this handy tip.

3. If wget cannot fetch files from within the cpan shell, set the FTP site list to empty
Locate Config.pm in the CPAN directory:
# locate CPAN/Config.pm
Edit it with vim, vi or any similar editor and make sure your urllist is empty. Even with an empty list it works like a champ.
'urllist' => [],

4. Update cpan
# Backup your current module list
perl -MCPAN -e autobundle
# That will create a file with a name like
## /root/.cpan/Bundle/Snapshot_yyyy_mm_dd_00.pm
# Enter CPAN
perl -MCPAN -e shell -xdg
# Update
install Bundle::CPAN
# Reload cpan
reload cpan

April 20, 2009

Perl does not do any automatic dereferencing for you

Filed under: Other, Perl, Programming — Tags: , — doqkhanh @ 12:49 PM

No Automatic Dereferencing

Perl does not do any automatic dereferencing for you. You must explicitly dereference using the constructs just described. This is similar to C, in which you have to say *p to indicate the object pointed to by p. Consider

$rarray = \@array;

push ($rarray,  1, 2, 3);   # Error: $rarray is a scalar, not an array

push (@$rarray, 1, 2, 3);   # OK

push expects an array as the first argument, not a reference to an array (which is a scalar). Similarly, when printing an array, Perl does not automatically dereference any references. Consider

print "$rarray, $rhash";

This prints

ARRAY(0xc70858), HASH(0xb75ce8)

This issue may seem benign but has ugly consequences in two cases. The first is when a reference is used in an arithmetic or conditional expression by mistake; for example, if you said $a += $r when you really meant to say $a += $$r, you’ll get only a hard-to-track bug. The second common mistake is assigning an array to a scalar ($a = @array) instead of the array reference ($a = \@array). Perl does not warn you in either case, and Murphy’s law being what it is, you will discover this problem only when you are giving a demo to a customer.

Copyright: http://oreilly.com/catalog/advperl/excerpt/ch01.html

April 17, 2009

Converting accented Vietnamese text to unaccented text with PHP and MySQL

Filed under: PHP — doqkhanh @ 8:10 AM

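There are many situations where you need to convert accented Vietnamese text to unaccented text. Here is one way to do it in PHP, using a remove_accents function:
<?php
// Convert accented letters to HTML entities, then keep only the base letter.
// Only characters that have a named HTML entity (à, ô, é, ...) are handled.
function remove_accents( $str )
{
    $str = htmlentities($str, ENT_NOQUOTES, 'UTF-8');
    return preg_replace('/&([a-z])[a-z]+;/i', '$1', $str);
}
$str = "Xin chào, tôi là chữ Việt có dấu.";
echo remove_accents( $str );
?>

If you run this script, it prints the following. Letters such as ữ, ệ and ấ have no named HTML entity, so they pass through unchanged:
Xin chao, toi la chữ Việt co dấu.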
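
For comparison, here is a sketch of a different technique that also covers the Vietnamese letters without a named HTML entity: Unicode decomposition. It is written in Python rather than PHP and is not the method used above. The function name is reused only for readability, and the hand-made mapping for đ/Đ is an assumption about what "unaccented" should mean for that letter.

```python
import unicodedata


def remove_accents(text):
    """Strip diacritics by decomposing characters and dropping combining marks."""
    # đ/Đ have no Unicode decomposition, so map them by hand first
    text = text.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" covers the combining marks (tone marks, horn, circumflex, ...)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(remove_accents("Xin chào, tôi là chữ Việt có dấu."))
# prints: Xin chao, toi la chu Viet co dau.
```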

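-- MySQL version: a trigger that fills no_sign_title on every insert.
-- Assumption: the accented text comes from NEW.title; adjust to your own column name.
-- Each cursor row is a two-character pair: the accented letter, then its replacement.
CREATE TRIGGER video_before_insert BEFORE INSERT ON video FOR EACH ROW
BEGIN

DECLARE done INT DEFAULT 0;
DECLARE strInput VARCHAR(120);
DECLARE str_no_sign_title VARCHAR(120);
DECLARE cur_1 CURSOR FOR SELECT 'áa' UNION ALL SELECT 'àa';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

SET str_no_sign_title = NEW.title;
OPEN cur_1;
REPEAT
        FETCH cur_1 INTO strInput;
        IF NOT done THEN
                -- SUBSTRING positions start at 1 in MySQL
                SET str_no_sign_title = REPLACE(str_no_sign_title, SUBSTRING(strInput, 1, 1), SUBSTRING(strInput, 2, 1));
        END IF;
UNTIL done END REPEAT;
CLOSE cur_1;

SET NEW.no_sign_title = str_no_sign_title;

END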

Happy coding!

March 27, 2009

Perl: @INC array

Filed under: Perl, Programming — Tags: — doqkhanh @ 1:23 PM

The @INC array is a list of directories Perl searches when attempting to load modules. To display the current contents of the @INC array:

# perl -e 'print join("\n", @INC), "\n";'


The following two methods may be used to add directories to Perl’s @INC array (both put them at the front of the search path):

1. Add the directory to the PERL5LIB environment variable.
Centos/Redhat:
# export PERL5LIB=/usr/local/src/api/soap

2. Add a use lib statement for the directory in your Perl script.
# Load modules directly from the given directory with use lib
use lib qw(/usr/local/src/api/soap);

For more information, read the perlrun manpage or type perldoc lib

February 25, 2009

Several ways to execute a shell script in Perl

Filed under: Perl — Tags: — doqkhanh @ 3:23 AM

There are several ways to execute a shell script in Perl :

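#!/usr/bin/perl -w

# The Shell module turns the listed commands into Perl functions
use Shell qw(cat ps cp df ssh su);
print "Content-type: text/html; charset=Shift_JIS\n\n";

# 1. Call a command as a function imported by the Shell module
su ("neededuser");

# 2. qx{} (same as backticks) runs the command and captures its output
@result = qx{ sudo su username -c '/home/perl/cgi-bin/CheckMailStatus.sh' };

# 3. qx// with the command line built in a variable first
$cmd = "sudo su neededuser -c '/home/perl/cgi-bin/CheckStatus.sh' ";
$data = qx/$cmd/;

# 4. system() runs the command and returns its exit status, not its output
$result = system ("sudo su username -c 'echo hello '");
print $data;

exit;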
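
The difference between capturing a command's output (qx) and only getting its exit status (system) exists in most languages. As a hedged illustration, here is the same distinction in Python with the standard subprocess module. The command is a stand-in that runs the Python interpreter itself, not one of the scripts above.

```python
import subprocess
import sys

# A portable stand-in for the shell script: run the Python interpreter itself
cmd = [sys.executable, "-c", "print('hello')"]

# Capture the output (comparable to qx// or backticks in Perl)
captured = subprocess.run(cmd, capture_output=True, text=True, check=True)
output = captured.stdout

# Only collect the exit status (comparable to system() in Perl)
status = subprocess.run(cmd, stdout=subprocess.DEVNULL).returncode

print(repr(output.strip()), status)
```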

December 22, 2008

Get and set URL parameter by Javascript

Filed under: Javascript — doqkhanh @ 1:20 AM

Here is the full source code:

/**
* \summary: Get url parameter by javascript
* \return: Parameter value, or an empty string if the parameter is not found
* \param: parameter name
* \author: Netlobo (http://www.netlobo.com/url_query_string_javascript.html)
*/
function getUrlParameter( name )
{
    var errorMessage = "Javascript error at getUrlParameter() function in kingpot.js. Please contact GNT's site administrator.";

    try
    {
        // Escape square brackets so they are matched literally in the regex
        name = name.replace(/[\[]/,"\\\[").replace(/[\]]/,"\\\]");
        var regexS = "[\\?&]"+name+"=([^&#]*)";
        var regex = new RegExp( regexS );
        var results = regex.exec( window.location.href );
        if( results == null )
            return "";
        else
            return results[1];
    }
    catch(err)
    {
        alert(errorMessage + "\nError detail:" + err);
    }
}

/*
function getUrlParam(url, param)
{
    var re = new RegExp("(\\?|&)" + param + "=([^&]+)(&|$)", "i");
    var m = url.match(re);
    if (m)
        return m[2];
    else
        return '';
}
*/

/**
* \summary: Set url parameter value by javascript
* \return: Replaced URL
* \param:
url : input url, example: url or document.location.href
param : parameter name, example: pageIndex
v : parameter value, example: 12
* \author: csdn.net (http://topic.csdn.net/t/20060712/09/4874772.html)
* \sample: var url = setUrlParameter(document.location.href,'year',curr_year);
*/
function setUrlParameter(url, param, v)
{
    var re = new RegExp("(\\?|&)" + param + "=([^&]+)(&|$)", "i");
    var m = url.match(re);
    if (m)
    {
        // Keep the separator and name as matched and swap only the value,
        // so a value that also appears inside the name is not touched by mistake
        return url.replace(re, function(match, sep, oldValue, tail) {
            return match.slice(0, match.length - oldValue.length - tail.length) + v + tail;
        });
    }
    else
    {
        if (url.indexOf('?') == -1)
            return (url + '?' + param + '=' + v);
        else
            return (url + '&' + param + '=' + v);
    }
}
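
For server-side code the same get/set operations do not need regular expressions. The sketch below uses Python's standard urllib.parse module. The function names mirror the JavaScript ones but are otherwise hypothetical, and example.com is a placeholder URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def get_url_parameter(url, name):
    """Return the first value of a query parameter, or '' if it is absent."""
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if key == name:
            return value
    return ""


def set_url_parameter(url, name, value):
    """Return url with the parameter replaced, or appended if it was missing."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if any(key == name for key, _ in pairs):
        pairs = [(k, str(value) if k == name else v) for k, v in pairs]
    else:
        pairs.append((name, str(value)))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


print(get_url_parameter("http://example.com/list?pageIndex=3&year=2008", "year"))
print(set_url_parameter("http://example.com/list?pageIndex=3", "pageIndex", 12))
```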

August 21, 2008

Get ranking with only 1 MySQL Query

Filed under: MySQL, Perl, Programming — Tags: , — doqkhanh @ 7:21 AM

Sometimes you need to order members and get a member’s ranking based on one or more conditions. You could fetch all the rows and use an array to work out the ranking, but today I will show you a better and certainly faster way: using just 1 query to get the member ranking.

This is the query, as run from the MySQL command line:

SET @rownum := 0; #Create counter variable
SELECT *
FROM
(
SELECT DISTINCT(FieldID), @rownum := @rownum + 1 as ranking
FROM TableName.FieldName
WHERE
FieldID = '27'
) AS NewTable
WHERE NewTable.FieldID = '00022'

This is perl code for the above query:

my $sql0 =" SET \@rownum := 0;";
my $sql1=" SELECT ranking FROM ..... LIMIT 1;";

$db->do($sql0);
$sth=$db->prepare($sql1);

$row=$sth->execute;

if(defined($row) && $row == 1)
{
$rec = $sth->fetchrow_arrayref;
$result = $rec->[0];
}
$sth->finish;
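
A related way to get one member's rank in a single query, without a user variable, is to count how many rows score higher. The Python sketch below shows that counting technique with the standard sqlite3 module. The members table, its columns and the sample scores are made up for the demo, and this is a different technique from the @rownum query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, score INTEGER)")
conn.executemany(
    "INSERT INTO members (id, score) VALUES (?, ?)",
    [(1, 50), (2, 90), (3, 70), (4, 90)],
)


def member_rank(conn, member_id):
    """Rank = 1 + number of members with a strictly higher score.

    Assumes the member exists; members with equal scores share a rank.
    """
    row = conn.execute(
        """
        SELECT 1 + COUNT(*)
        FROM members
        WHERE score > (SELECT score FROM members WHERE id = ?)
        """,
        (member_id,),
    ).fetchone()
    return row[0]


print([member_rank(conn, m) for m in (1, 2, 3, 4)])
```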

Comparison Operators – How to compare values in Perl

Filed under: Perl, Programming — Tags: — doqkhanh @ 2:26 AM

Perl actually has two sets of comparison operators – one for comparing numeric values and one for comparing string (ascii) values.

Numeric   String   Meaning
==        eq       equal
>         gt       greater than
>=        ge       greater than or equal to
<         lt       less than
<=        le       less than or equal to
!=        ne       not equal
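
Why the two sets matter: the same pair of values can compare differently as numbers and as strings. The short Python sketch below only illustrates that effect. Python has a single set of operators and picks the comparison from the operand types, so it is an analogy, not Perl behaviour.

```python
# The same two values compare differently as numbers and as strings
as_numbers = 10 < 9        # numeric comparison, like Perl's <
as_strings = "10" < "9"    # character-by-character comparison, like Perl's lt

print(as_numbers, as_strings)
# prints: False True
```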

August 7, 2008

Converting sjis to utf-8

Filed under: Perl, Programming — Tags: , — doqkhanh @ 6:38 AM

This code converts a Japanese Shift JIS string to a utf-8 string. Thanks to 中島さん (Mr Nakajima) for this useful code snippet.

use Unicode::Japanese;

my $keyword="";
$keyword = "something in Japanese like これわ日本語です。";

# Tell the converter the input is Shift JIS; get returns the string as utf-8
my $converter = Unicode::Japanese->new($keyword, 'sjis');
$keyword = $converter->get;

# From here on, use $keyword as usual
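
The same conversion can be sketched with Python's built-in codecs, which makes the two steps (decode from Shift JIS, encode to UTF-8) explicit. The function name and the sample text are assumptions made for this demo.

```python
def sjis_to_utf8(data: bytes) -> bytes:
    """Decode Shift JIS bytes and re-encode them as UTF-8."""
    return data.decode("shift_jis").encode("utf-8")


sjis_bytes = "日本語".encode("shift_jis")   # sample input built for the demo
utf8_bytes = sjis_to_utf8(sjis_bytes)

print(utf8_bytes.decode("utf-8"))
```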

August 1, 2008

Javascript UrlEncode that works well with PHP and Perl

Filed under: Perl, Programming — Tags: — doqkhanh @ 9:47 AM
    Javascript UrlEncode that works well with PHP
    /**
    * \summary: http://kevin.vanzonneveld.net
    * \author:   Original by: Philip Peterson
    *              improved by: Kevin van Zonneveld (http://kevin.vanzonneveld.net), doqkhanh (http://quockhanh.info)
    * \example: urlencode('Kevin van Zonneveld!');
    * \returns: 'Kevin+van+Zonneveld%21'
    */
    function urlencode( str ) {
        var errorMessage = "Javascript error at urlencode() function in kingpot.js. Please contact GNT's site administrator.";
        var ret = str;
        try
        {
            ret = ret.toString();
            ret = encodeURIComponent(ret);
            // encodeURIComponent leaves ! ' ( ) * unescaped, but PHP's urlencode() escapes them
            ret = ret.replace(/!/g, '%21').replace(/'/g, '%27').replace(/\(/g, '%28').replace(/\)/g, '%29').replace(/\*/g, '%2A');
            ret = ret.replace(/%20/g, '+');
        }
        catch(err)
        {
            alert(errorMessage);
        }

        return ret;
    }

    PERL UrlEncode and UrlDecode
    sub urlencode
    {
        my $str = shift;
        # Percent-encode every character that is not a letter or digit
        $str =~ s/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg;
        return $str;
    }

    sub urldecode
    {
        my $str = shift;
        # Turn each %XX sequence back into the byte it stands for
        $str =~ s/%([A-Fa-f0-9]{2})/pack('C', hex($1))/seg;
        return $str;
    }

    Faster solution (inline, without the function call):
    $NeedEncodingString =~ s/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg;
    $NeedDecodingString =~ s/\%([A-Fa-f0-9]{2})/pack('C', hex($1))/seg;
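
To check what a PHP-style encoder should produce, Python's standard urllib.parse module offers quote_plus and unquote_plus, which use the same "space becomes +" convention. This is only a cross-check in another language, not part of the JavaScript or Perl code above.

```python
from urllib.parse import quote_plus, unquote_plus

encoded = quote_plus("Kevin van Zonneveld!")
decoded = unquote_plus(encoded)

print(encoded)
# prints: Kevin+van+Zonneveld%21
print(decoded)
# prints: Kevin van Zonneveld!
```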

Invoke Amazon Web Services with Perl

Filed under: Perl, Programming — Tags: , — doqkhanh @ 9:16 AM

use LWP; # This library provides the API to call the Amazon web service

# These XML libraries provide the API to parse XML from the Amazon web service
use XML::Parser;
use XML::Simple;
use Data::Dumper;

# Create a user agent object
my $ua = LWP::UserAgent->new;
$ua->agent("ApplicationName(c)YourCompany/Version 1.0 ");

# Create a request ($keyword must already hold the URL-encoded search keyword)
my $SubscriptionId = "0MB1VZ********NYEQR2";
my $request =  "http://webservices.amazon.co.jp/onca/xml?Service=AWSECommerceService&SubscriptionId=$SubscriptionId&Operation=ItemSearch&SearchIndex=Music&Keywords=$keyword&ResponseGroup=Medium,Tracks&Binding=CD&ReleaseDate=Latest";

# All parameters are in the query string, so a plain GET request is enough
my $req = HTTP::Request->new(GET => $request);

# Pass request to the user agent and get a response back
my $res = $ua->request($req);
my $ItemCollection;

my $dump_result; # Used to inspect the returned data structure

# Check the outcome of the response
if ($res->is_success)
{
    # Get the XML page
    $tmp = $res->content;

    # Parse it with the simple XML library
    $ItemCollection = XMLin($tmp);

    # Dump the parsed data structure
    $dump_result = Dumper($ItemCollection);

    # Pick out the needed data (numeric comparison, so use != rather than ne)
    if($ItemCollection->{Items}->{TotalResults} != 0)
    {
        if($ItemCollection->{Items}->{TotalResults} != 1)    # several items: Item is an array
        {
            $url = $ItemCollection->{Items}->{Item}->[0]->{ImageSets}->{ImageSet}->{MediumImage}->{URL};
        }
        else # a single item: Item is not an array
        {
            $url = $ItemCollection->{Items}->{Item}->{ImageSets}->{ImageSet}->{MediumImage}->{URL};
        }

    }

}

Testing a new search engine’s results

Filed under: Perl, Programming — Tags: — doqkhanh @ 2:28 AM

Search results for the “doqkhanh” keyword:
No 1: cuil.com 174
No 2: google.com 113
No 3: search.yahoo.com 65
No 4: live.com 57

Whew…

Wow. That was intense. Looking back at the first 48 hours since launch, it was quite an experience. After a lot of hard work, we were thrilled to begin offering our new approach to search. We were even more thrilled with the interest, and traffic, we received.

In fact, it was overwhelming—literally. While we had planned for a large number of searches on our first day, we hadn’t planned on more than 50 million. After all, that’s in the same ballpark as Microsoft’s Live Search and approaching Yahoo!. And they have a bit more infrastructure than our small start-up.

So for a good part of the first day, the traffic volume simply outstripped our ability to respond. Some machines failed. Some bugs were found. Some of our redundancies…weren’t so redundant. This meant some searches didn’t get the best results. Some didn’t get any.

And yet, for a lot of searches, Cuil did provide users with new results, different from the ones folks have gotten in the past, according to the reports we’ve received. This is one of our goals—to give people an alternative to existing approaches.

Thank you very much for the feedback. The emails we’ve gotten at feedback@cuil.com have been very helpful, telling us areas you enjoy—such as the layout and the search by category feature—and areas where we need to improve—image matching, for example. We read them, so please keep them coming.

At Cuil, we are tackling some of the big challenges in search, from finding ways to search more of the Web, to finding results by analyzing the content on a page, to providing images with our results to help you pick the page you want. These are difficult problems, and we know we have more work to do.

We are incredibly proud of what this small team of 30 employees has been able to build from scratch already, and we are committed to improving Cuil every day. Thank you for trying us out.

Tom, Anna and Russell
