[root@fc17 ~]# uname -ri
3.4.6-2.fc17.x86_64 x86_64
[root@fc17 ~]# cat /etc/fedora-release
Fedora release 17 (Beefy Miracle)
Install httrack via yum:
[root@fc17 ~]# yum install httrack -y
Create a directory to store the web contents:
$ mkdir web-copy
I’ll copy my blog site.
Please take care of copyright, bandwidth, and server load when you copy web contents.
Run httrack and answer the wizard prompts:
$ httrack
Welcome to HTTrack Website Copier (Offline Browser) 3.43-9+libhtsjava.so.2
Copyright (C) Xavier Roche and other contributors
To see the option list, enter a blank line or try httrack --help
Enter project name :my-blog-copy
Base path (return=/home/hattori/websites/) :/home/hattori/web-copy
Enter URLs (separated by commas or blank spaces) :http://lost-and-found-narihiro.blogspot.jp
Action:
(enter) 1 Mirror Web Site(s)
        2 Mirror Web Site(s) with Wizard
        3 Just Get Files Indicated
        4 Mirror ALL links in URLs (Multiple Mirror)
        5 Test Links In URLs (Bookmark Test)
        0 Quit
: 1
Proxy (return=none) :
You can define wildcards, like: -*.gif +www.*.com/*.zip -*img_*.zip
Wildcards (return=none) :
You can define additional options, such as recurse level (-r<number>), separed by blank spaces
To see the option list, type help
Additional options (return=none) :
---> Wizard command line: httrack http://lost-and-found-narihiro.blogspot.jp -O "/home/hattori/web-copy/my-blog-copy" -%v
Ready to launch the mirror? (Y/n) :y
Mirror launched on Tue, 24 Jul 2012 23:56:52 by HTTrack Website Copier/3.43-9+libhtsjava.so.2 [XR&CO'2010]
mirroring http://lost-and-found-narihiro.blogspot.jp with the wizard help..
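The wizard prints the command line it is about to run, so the same mirror can probably be repeated without answering the prompts. Here is a minimal sketch of that idea. The httrack arguments are the ones the wizard printed above; the function name mirror_site and the DRY_RUN variable are made up for this example and are not part of httrack.

```shell
#!/bin/sh
# Sketch: rerun the mirror non-interactively with the command line the wizard printed.
# mirror_site and DRY_RUN are hypothetical names used only in this example.

mirror_site() {
    url=$1
    dest=$2
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command, so nothing is downloaded by accident.
        printf 'httrack %s -O "%s" -%%v\n' "$url" "$dest"
    else
        # Same arguments as the "Wizard command line" shown by httrack.
        httrack "$url" -O "$dest" -%v
    fi
}

mirror_site "http://lost-and-found-narihiro.blogspot.jp" "$HOME/web-copy/my-blog-copy"
```

By default the script only prints the command. Set DRY_RUN=0 in the environment to actually start the mirror.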
Here is a packet capture of one request sent by httrack.
The User-Agent header presents itself as Mozilla, with HTTrack named inside the parentheses.
Hypertext Transfer Protocol
    GET /2012/01/linux-mint-12-configure-ip-aliases.html HTTP/1.1\r\n
        [Expert Info (Chat/Sequence): GET /2012/01/linux-mint-12-configure-ip-aliases.html HTTP/1.1\r\n]
            [Message: GET /2012/01/linux-mint-12-configure-ip-aliases.html HTTP/1.1\r\n]
            [Severity level: Chat]
            [Group: Sequence]
        Request Method: GET
        Request URI: /2012/01/linux-mint-12-configure-ip-aliases.html
        Request Version: HTTP/1.1
    Referer: http://lost-and-found-narihiro.blogspot.jp/\r\n
    Cookie: $Version=1; blogger_TID=xxx; $Path=/\r\n
    Connection: Keep-Alive\r\n
    Host: lost-and-found-narihiro.blogspot.jp\r\n
    User-Agent: Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)\r\n
    Accept: image/png, image/jpeg, image/pjpeg, image/x-xbitmap, image/svg+xml, image/gif;q=0.9, */*;q=0.1\r\n
    Accept-Language: en, *\r\n
    Accept-Charset: iso-8859-1, iso-8859-*;q=0.9, utf-8;q=0.66, *;q=0.33\r\n
    Accept-Encoding: gzip, identity;q=0.9\r\n
    \r\n
    [Full request URI: http://lost-and-found-narihiro.blogspot.jp/2012/01/linux-mint-12-configure-ip-aliases.html]
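If you save request headers like the ones above as text, the User-Agent value can be pulled out with a small filter. This is only a sketch: get_user_agent is a hypothetical helper name, and the sample input is three header lines copied from the capture.

```shell
#!/bin/sh
# Sketch: extract the User-Agent value from saved request headers.
# get_user_agent is a hypothetical helper; it reads the headers on stdin.

get_user_agent() {
    # Match the header name case-insensitively, then strip the name
    # and any trailing carriage return before printing the value.
    awk 'tolower($0) ~ /^user-agent:/ {
        sub(/^[^:]*:[ \t]*/, "")
        sub(/\r$/, "")
        print
        exit
    }'
}

# Sample input: header lines taken from the capture, each ending in CRLF.
ua=$(printf '%s\r\n' \
    'GET /2012/01/linux-mint-12-configure-ip-aliases.html HTTP/1.1' \
    'Host: lost-and-found-narihiro.blogspot.jp' \
    'User-Agent: Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)' \
    | get_user_agent)

echo "$ua"
case $ua in
    *HTTrack*) echo "request identifies itself as HTTrack" ;;
    *)         echo "request does not mention HTTrack" ;;
esac
```

A server operator could use the same kind of check on access logs to spot mirroring tools, which is one more reason to keep the request rate polite.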
After the copy finishes, the web contents are stored under the “~/web-copy/my-blog-copy” directory.
[root@fc17 my-blog-copy]# pwd
/home/hattori/web-copy/my-blog-copy
[root@fc17 my-blog-copy]# ls
backblue.gif  hts-in_progress.lock  lost-and-found-narihiro.blogspot.jp
fade.gif      hts-log.txt
hts-cache     index.html
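The listing above includes both index.html and hts-in_progress.lock. A small check like the following could tell you whether a project directory is ready to browse. It is a sketch under one assumption: that hts-in_progress.lock means the mirror is still running, which I am inferring from the file name. mirror_status is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report the state of an HTTrack project directory.
# mirror_status is a hypothetical helper. Treating hts-in_progress.lock as
# "still running" is an assumption based on the file name.

mirror_status() {
    dir=$1
    if [ -e "$dir/hts-in_progress.lock" ]; then
        echo "in progress"
    elif [ -f "$dir/index.html" ]; then
        echo "finished"
    else
        echo "no mirror found"
    fi
}

mirror_status "$HOME/web-copy/my-blog-copy"
```

If the status is not what you expect, hts-log.txt in the same directory is the first place to look.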
Open index.html with a web browser.
Alternatively, you can copy web contents with the wget command like this:
$ wget --user-agent=Mozilla --mirror --wait=1 http://zzzzz
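For offline viewing, the wget command above may need a few more options. The sketch below prints a longer command rather than running it. The extra flags (--convert-links, --page-requisites, --no-parent) are standard wget options that I am adding as a suggestion, not something the original command used, and wget_mirror_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: the wget mirror command from the post plus options for offline viewing.
# wget_mirror_cmd is a hypothetical helper; it only prints the command.

wget_mirror_cmd() {
    url=$1
    # --mirror           recursive download with timestamping
    # --wait=1           pause one second between requests
    # --convert-links    rewrite links so the local copy works offline
    # --page-requisites  also fetch the images and CSS needed to render pages
    # --no-parent        never climb above the starting directory
    echo "wget --user-agent=Mozilla --mirror --wait=1" \
         "--convert-links --page-requisites --no-parent $url"
}

wget_mirror_cmd "http://zzzzz"
```

Copy the printed line into a terminal once you have replaced the placeholder URL with the site you want to mirror.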