Protect Yourself: The Relationship Between Google Hacking and robots.txt, and How to Configure It

I think most people know what the robots.txt file is for (if you don't, please refer to my earlier article). robots.txt is written for web crawlers: before a search engine fetches a site's pages for indexing, a well-behaved crawler will first download and parse the robots.txt file "out of courtesy," and only if a given URL is allowed will it go on to download that page. Humans reading this file, however, are not necessarily so polite. A hacker who learns which directories you "don't want crawled" becomes interested in exactly those pages, and may then try to attack them. So should you simply leave those paths out of the robots.txt definition file? Is that idea correct? No!!! Please read on...
Some people choose not to put sensitive directories in the robots.txt file at all, hoping to keep "every person and machine" from learning that those directories exist. Unfortunately, search engines are far better at discovering URLs than you might expect: even pages you never list in robots.txt can still end up indexed. So, in any case, pages that you don't want search engines to find should be listed in the robots.txt definition file!
Since the robots.txt file must be publicly readable, anyone can view its contents. Suppose you have a website whose front end and back end live on the same host, and the back-end URL sits under the /backend/ path. To keep search engines out of that directory, you would list the path in the robots.txt file, for example:
    User-agent: *
    Disallow: /backend/
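And since the file is public, anyone can fetch and read it, attacker included. A minimal sketch of doing so in Python (example.com is a placeholder host):

    from urllib.request import urlopen

    # Any visitor, human or crawler, can read robots.txt directly;
    # example.com is a placeholder for the target site.
    with urlopen("https://example.com/robots.txt") as resp:
        print(resp.read().decode("utf-8"))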

Defining robots.txt this way has one advantage: search engines will not retrieve anything under that path. Even if your back end has no authentication at all, or you forgot to set permissions, at least it will never show up in any search engine. This setting helps reduce the chances of being compromised through Google hacking. So, again: pages that you don't want search engines to find should be listed in the robots.txt definition file!
However, " Does The Hacker Know That You Don't Want To Be Caught In The Catalog, But Instead Let The Hacker Generate Interest, And Then Attempt To Attack These Web Pages That You Don't Want To Know ?" Doesn't This Happen? How To Avoid This Problem? Regarding This Issue, I Suggest Using The Following Method To Solve:
• Use a standalone host to run the "back-end management functions".
• Split the back-end management out of the existing website project.
• Allow only specific source IPs to access that site.
• Set robots.txt to reject all user-agents from retrieving the site:
    User-agent: *
    Disallow: /

I realize that many sites, for budget reasons, cannot split the existing "back-end management functions" out of the current website project, let alone rent separate hosting to run a back-end site. Even then, there is still a way: lock down the back-end folder itself by setting a source-IP access limit on that specific directory, as sketched below.
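How you enforce the source-IP limit depends on your web server, but to make the idea concrete, here is a minimal sketch as a Flask hook; the /backend prefix and the allowed addresses are assumptions for the example:

    from flask import Flask, abort, request

    app = Flask(__name__)

    # Hypothetical allow-list: only these source IPs may reach the back end.
    ALLOWED_BACKEND_IPS = {"203.0.113.10", "203.0.113.11"}

    @app.before_request
    def restrict_backend_by_source_ip():
        # The limit applies only to URLs under the back-end directory;
        # the rest of the site stays open to everyone.
        if request.path.startswith("/backend"):
            if request.remote_addr not in ALLOWED_BACKEND_IPS:
                abort(403)  # reject requests from any other source IP

The same rule can of course be expressed directly in your web server's own configuration; the point is that the whole directory, not just the login page, is gated by source IP.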
In addition, I still recommend defining robots.txt and rejecting all user-agents for that directory. I have recently worked out a configuration trick that effectively prevents the back-end directory address from being guessed, as follows:
1. Suppose your back-end address lives under the /backend/ path.
2. Rename the directory to something like /backend87916246412145/, i.e., append a random number or a gibberish string to the original directory name, so that outsiders cannot guess the site's actual back-end path (unless an insider leaks it).
3. Now you can safely configure robots.txt, with the settings as follows:

    User-agent: *
    Disallow: /backend

Please note: the path here is /backend, not /backend/.
This configuration gives you two advantages (the sketch after this list demonstrates the prefix matching):
• Any URL on the site that begins with /backend is excluded from search engine retrieval; therefore /backend123, /backendadmin, and the like will never be indexed.
• No one can find your hidden directory or files through a search engine, which also reduces the risk of automated attacks launched via Google hacking, such as the mass SQL injection attacks that automatically compromised thousands of Taiwanese websites.
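To see the prefix matching for yourself, here is a minimal sketch using Python's standard urllib.robotparser module; the directory names are assumptions, and the random suffix is generated the way step 2 above suggests:

    import secrets
    from urllib import robotparser

    # Step 2: derive an unguessable directory name from the original one.
    backend_dir = "/backend" + secrets.token_hex(8)  # e.g. /backend3f9c1a2b4d5e6f70

    # Step 3: disallow the /backend prefix (no trailing slash).
    rp = robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /backend",
    ])

    # Everything beginning with /backend is blocked, including the random name...
    print(rp.can_fetch("*", backend_dir + "/login"))  # False
    print(rp.can_fetch("*", "/backendadmin/"))        # False
    # ...while the rest of the site remains crawlable.
    print(rp.can_fetch("*", "/products/list"))        # True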

With this in place, your back end need not fear Google hacking attacks, and you have also effectively reduced the chance of manual attacks!

In Conclusion
• In any case, the back end, and any pages you don't want found, should be listed in the robots.txt definition file!
• When setting up robots.txt, consider using the technique taught at the end of this article.
• If you are afraid that whoever maintains the site might one day accidentally delete robots.txt, it is also wise not to give the back-end directory an easy-to-remember, easy-to-guess name such as admin, administrator, backend, or backoffice; such names make it easy for well-known Google hacking queries (along the lines of inurl:admin intitle:login) to turn up your back-end login page!
• Take advantage of Google Webmaster Tools, for example to request removal of URLs that have already been indexed.

Experience Sharing
Some people may still think: "My website has a login check, the hacker doesn't know the password, and my site has no SQL injection problem. How could I possibly be hacked?"
Authentication and authorization are completely different things. Implementing a verification mechanism that restricts login does not mean that every back-end function and file is properly authorized. While researching Google hacking attack techniques, I found that many websites implement authentication but never fully implement authorization!
For example, back ends often have a file-upload function, and uploaded files are usually never deleted. That means when an administrator uploads the wrong file (say, one containing users' personal data), the file may sit on a public page for a while, long enough to be indexed and cached by the search engine. Even after the file is replaced, the old copy is still on the site, so it can still be found by anyone searching for it. Anyone familiar with these Google hacking techniques can easily dig up such files, personal data included. This is why configuring robots.txt properly is necessary to keep incidents like this from happening again!
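To make the difference concrete, here is a hypothetical Flask sketch of what authorization on file downloads looks like; the route, directory, and session key are all assumptions for the example:

    from flask import Flask, abort, send_from_directory, session

    app = Flask(__name__)
    app.secret_key = "change-me"  # placeholder; sessions need a real secret

    UPLOAD_DIR = "/var/www/private_uploads"  # hypothetical storage location

    @app.route("/backend87916246412145/files/<path:filename>")
    def download_upload(filename):
        # Authentication established who the user is at login time;
        # authorization is this check, repeated on every file request.
        if not session.get("logged_in"):
            abort(403)
        return send_from_directory(UPLOAD_DIR, filename)

Whatever the framework, every request for a protected file must pass such a check; robots.txt is only the safety net for files that end up public by mistake.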