CheckCheck - my new project for automatic site validation
Recently, I needed a new website monitoring tool. In my day-to-day work I usually use tools like Screaming Frog, Lighthouse in Chrome, and several browser plugins. I rely on the plugins mainly for quickly analyzing the markup of meta and OpenGraph tags without opening the developer tools.
The tool I had in mind would combine the functions of all the programs described above in one place, or even better, run as a service that performs this analysis regularly and automatically.
One of the problems such a tool can solve is daily verification that all internal pages of a site work correctly. Regular monitoring becomes much harder when the site is very large and has tens of thousands of pages (a news site or an online store, for example). On such sites it can be difficult to notice when part of the content has "dropped out" for technical reasons. Daily crawls can detect a range of issues, from developer errors missed by tests to mistakes by content managers or third-party data providers.
In early 2021, I started implementing such a tool as an online service at cc.floor12.net. The very first versions, written in the first weeks, immediately paid off: on one of the online stores I maintained, they revealed that 50,000 products were missing from the catalog due to errors in the mechanism for importing data from partners. The fact that the first implementations immediately proved useful in practice encouraged me to keep developing. The project was named [CheckCheck](https://cc.floor12.net/ru).
At the moment, after four months of work, the first version is ready: it crawls a given site in several threads and generates reports as HTML, PDF and Excel documents. The site interface and the reports are available in Russian, English and Spanish. The list of checks performed at this stage is small, about 25-30 basic checks, which can be grouped as follows:
- presence and validity of the canonical tag
- images: presence of the alt attribute and a valid src attribute
- title and meta tags and their length
- all basic OpenGraph tags and their validity
- presence and length of the h1 tag
- server response codes and response time
- presence of breadcrumbs with microdata markup on internal pages
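To illustrate what a few of the checks above might look like, here is a minimal sketch in Python using only the standard library's `html.parser`. The class and function names are illustrative, not the actual CheckCheck code, and it covers just three of the listed checks: title length, presence of a single h1, and the canonical link.

```python
from html.parser import HTMLParser

class BasicSEOChecker(HTMLParser):
    """Collects the tags needed for a few basic checks:
    the title text, the number of h1 tags, and the canonical link."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1_count = 0
        self.canonical = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "link" and attrs.get("rel") == "canonical":
            self.canonical = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def check_page(html: str) -> list[str]:
    """Return a list of human-readable problems found on one page."""
    parser = BasicSEOChecker()
    parser.feed(html)
    problems = []
    if not parser.title.strip():
        problems.append("missing <title>")
    elif len(parser.title) > 60:  # 60 chars is an assumed threshold
        problems.append("title longer than 60 characters")
    if parser.h1_count == 0:
        problems.append("missing <h1>")
    elif parser.h1_count > 1:
        problems.append("more than one <h1>")
    if parser.canonical is None:
        problems.append("missing canonical link")
    return problems
```

For example, `check_page("<title>Hi</title>")` reports a missing h1 and a missing canonical link, while a page containing all three tags produces an empty list.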
Practice has shown that for an initial analysis of a resource, limiting the scan to the first thousand pages is enough to produce a first list of errors: most of the errors found in those thousand pages repeat across the entire site. So for a large project, it makes sense to scan it in full only after fixing the typical errors identified by the first scan.
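The page-limit idea above boils down to a breadth-first crawl that stops after a fixed number of pages. The sketch below shows only that limiting logic; the `get_links` callback and the toy link graph are stand-ins for the real HTTP fetching and parsing, and `max_pages` plays the role of the thousand-page cap.

```python
from collections import deque

def crawl(start: str, get_links, max_pages: int = 1000) -> list[str]:
    """Breadth-first crawl that stops after visiting max_pages pages.
    get_links(url) returns the internal links found on that page;
    in a real service it would fetch and parse the page over HTTP."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)  # the per-page checks would run here
        for link in get_links(url):
            if link not in seen:  # never enqueue a page twice
                seen.add(link)
                queue.append(link)
    return visited

# Toy in-memory link graph standing in for a real site:
site = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c"],
    "/c": [],
}
pages = crawl("/", lambda u: site.get(u, []), max_pages=3)
# visits "/", "/a", "/b" and stops before reaching "/c"
```

Breadth-first order matters here: with a page cap in place, it visits the pages closest to the start page first, which tend to be the templates whose errors repeat everywhere else.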
At the moment, I devote almost all my free time to the project so that I can add new features quickly.