Scanning HTML and PHP files for text

How to search for text strings in files using the WinOne® Command Language Interpreter (CLI).

A lot of questions are asked in WordPress forums about how to find which file to edit to change the appearance or behaviour of WordPress themes or plugins. To identify the file or files you're looking for you need to scan all the files for the text you're interested in. I use WinOne for this purpose.

Looking at the WinOne website, it’s not really clear how useful the software is or how to use it. This is therefore an example of how to use one feature of WinOne, the “FIND” command, to scan folders of files for text strings.

As a preparatory step you need to download a copy of the folder/directory containing the theme or plugin files from your website to your PC. For the purposes of this demonstration I’ve downloaded a copy of the NextGEN gallery plugin to my F: drive.

(1) Download and install WinOne onto your PC. It’s currently free to use for up to 30 days and costs less than $5 to purchase a licence if you wish to continue using it after that.

(2) Start WinOne by clicking on the WinOne Command Shell icon on your desktop.

(3) Select the disk drive your files are on by clicking on the appropriate drive icon at the bottom of the WinOne window.

Disk Drive Selection

(4) Navigate to the directory your files are in, i.e. the one you downloaded the theme or plugin code to, using the Change Directory (cd) command.

Change Directory

(5) Use the FIND command to scan the files for the text you're searching for.

Command FIND
Search for a text string in the specified files.

FIND [drive:][path]filename "textstring" [/S] [/M]

[drive:][path]filename Specifies the text file(s) to search.
textstring Specifies the text string to find.
/M Match case.
/S Process sub-directories.

Command FIND uses the fast Boyer-Moore algorithm and will only search text files. Non-text files are simply skipped.
Command FIND is not compatible with the DOS command FIND.
Unlike the DOS command FIND, wildcards are allowed for the filename parameter.
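
To give a feel for why a Boyer-Moore style search is fast, here is a minimal Python sketch of the simplified Boyer-Moore-Horspool variant. It is not WinOne's own code, and the function name horspool_find is a hypothetical one chosen for illustration. The idea is that the pattern is compared right to left and, on a mismatch, the search window jumps ahead by a precomputed distance rather than moving one character at a time.

```python
def horspool_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Simplified Boyer-Moore-Horspool: only the bad-character shift is used.
    This is a case-sensitive sketch, not WinOne's implementation.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1

    # For every character except the pattern's last one, record how far the
    # window may shift when that character is aligned with the window's end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    pos = 0
    while pos <= n - m:
        # Compare the window against the pattern from right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos  # every character matched
        # Characters that never occur in the pattern allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

Calling horspool_find("the alttext attribute", "alttext") returns the position where the match starts, and -1 is returned when the string is not present.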

Example 1 – Direct the output from the FIND command to the screen.
To find all files containing the string “alttext” enter
find * /S "alttext"

Find Screen Output

Example 2 – Direct the output from the FIND command to a file.
To find all files containing the string “alttext” and redirect the output to a file enter
find * /S "alttext" > /temp/alttext.txt
Then open the file /temp/alttext.txt in your favourite file editor (Notepad, Notepad++ etc.) to view the search results.

Find File Output

Having found the file or files you're interested in, you can open them to review the content or make changes as necessary to suit your requirements.
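
If you don't have WinOne to hand, the same kind of scan can be sketched in a few lines of Python. This is only a rough equivalent under some assumptions: the names looks_like_text and find_in_files are hypothetical, a file is treated as non-text if its first bytes contain a NUL byte, and the search ignores case unless match_case is set (which is how I read the /M switch above, not something confirmed by the WinOne documentation).

```python
import os


def looks_like_text(path, sample_size=1024):
    """Heuristic (an assumption): a NUL byte near the start means binary."""
    try:
        with open(path, "rb") as handle:
            return b"\0" not in handle.read(sample_size)
    except OSError:
        return False


def find_in_files(root, needle, match_case=False):
    """Yield (path, line_number, line) for each line that contains needle.

    Walks root and all of its sub-directories, like FIND with /S.
    """
    target = needle if match_case else needle.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not looks_like_text(path):
                continue  # skip non-text files
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        haystack = line if match_case else line.lower()
                        if target in haystack:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: move on to the next one
```

Looping over find_in_files("F:/nextgen-gallery", "alttext") and printing each result gives a listing similar to the screen output above; the folder name here is just an example, so substitute wherever you downloaded your files.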

WinOne operates with a mix of DOS and UNIX-like commands and syntax, so in the above example you could, for instance, append the output to a file by using >> rather than >, which overwrites the file.

I have no financial interest in WinOne; the information provided here is supplied because I find the software very useful and it may be of interest to others.

NextGEN and Canonical URLs

When using the NextGEN gallery plugin you should check that none of your gallery posts or pages are incorrectly being flagged as ‘do not index’ by WordPress automatically setting canonical URLs.

It appears that on some NextGEN gallery sites WordPress is using the canonical setting on pages that are in fact unique and therefore should not have a canonical URL set. For example, on my client's Japanese Woodblock Prints site, pages that contain unique descriptions about the prints had canonical URLs pointing back to the NextGEN gallery's thumbnail page. As a result of this ‘misuse’ of canonical URLs a significant part of the website was flagged non-indexable and therefore Google would potentially not index those pages.

Looking at the screen capture below on the left it can be seen that SEO tools are reporting that a canonical url is being set for the page – even though the page is unique. The setting of the canonical url on that page is WordPress’s default handling but in this case it is wrong. The screen capture on the right shows the same page with no canonical url present which for this site is the correct setup.

Canonical urls were turned off on the site simply by editing the default-filters.php file to comment out the add_action( 'wp_head', 'rel_canonical' ); line.

Bad Use of Canonical URLs

Bad Use of Canonical URLs - FIXED!

Some SEO plugins allow you to set or unset canonical URLs, so if you're using one then you’ll need to check its settings. I used All in One SEO on the gallery site with Canonical URLs unselected in its options. Another very popular SEO plugin is WordPress SEO; it, however, has no setting options for canonical URLs. It just forces them onto the user even if you edit the default-filters.php file as suggested above. We were hoping to use WordPress SEO but because of its canonical handling (or more correctly, lack of) we couldn’t use it on the NextGEN gallery site. See WordPress SEO Plugin for a possible solution to this issue. We’ve not tested it, so I’d recommend you hunt down the WordPress SEO support pages to see if there’s been any feedback on it.

If you are a NextGEN user I recommend that you check to see that there is no inappropriate setting of canonical URLs on your content pages. Whether canonical should be turned off on a site depends on its impact on that individual site. For some it may be an issue and for some it may not be.
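
If you'd rather not rely on an SEO tool for that check, a small Python sketch like the one below can report whether a page's HTML contains a canonical link. It uses only the standard library's html.parser; the class name CanonicalFinder and the function find_canonical are hypothetical names for illustration, and fetching the page source (from a browser's "view source" or a saved file) is left to you.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonical_urls = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are routed here as well.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical_urls.append(attributes["href"])


def find_canonical(html):
    """Return a list of canonical URLs declared in the given HTML text."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical_urls
```

An empty list means no canonical URL is set on the page. If the list contains a URL that differs from the page's own address, as happened with the print description pages above, that page is worth a closer look.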