fix: prevent asset conflicts between React and Grid.js versions

Add coexistence checks to all enqueue methods to prevent loading
both React and Grid.js assets simultaneously.

Changes:
- ReactAdmin.php: Only enqueue React assets when ?react=1
- Init.php: Skip Grid.js when React active on admin pages
- Form.php, Coupon.php, Access.php: Restore classic assets when ?react=0
- Customer.php, Product.php, License.php: Add coexistence checks

Now the toggle between Classic and React versions works correctly.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
dwindown
2026-04-18 17:02:14 +07:00
parent bd9cdac02e
commit e8fbfb14c1
74973 changed files with 6658406 additions and 71 deletions

4
node_modules/robots-parser/.npmignore generated vendored Normal file

@@ -0,0 +1,4 @@
node_modules
coverage
.nyc_output
.vscode

86
node_modules/robots-parser/CHANGELOG.md generated vendored Normal file

@@ -0,0 +1,86 @@
# Changelog
## Version 3.0.1
- Fixed bug with `https:` URLs defaulting to port `80` instead of `443` if no port is specified.
Thanks to @dskvr for reporting
This affects comparing URLs with the default HTTPS port to URLs without it.
For example, comparing `https://example.com/` to `https://example.com:443/` or vice versa.
They should be treated as equivalent but weren't due to the incorrect port
being used for `https:`.
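A minimal sketch of the equivalence (hypothetical robots.txt); with this fix both forms below resolve to the same origin and are matched against the rules:
```js
var robotsParser = require('robots-parser');

var robots = robotsParser('https://example.com/robots.txt', [
    'User-agent: *',
    'Disallow: /private/'
].join('\n'));

// Both URLs refer to the same origin, since 443 is the default HTTPS port:
robots.isAllowed('https://example.com/private/page.html');     // false
robots.isAllowed('https://example.com:443/private/page.html'); // false
```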
## Version 3.0.0
- Changed to using global URL object instead of importing. – Thanks to @brendankenny
## Version 2.4.0:
- Added TypeScript definitions. – Thanks to @danhab99 for creating them
- Added SECURITY.md policy and CodeQL scanning
## Version 2.3.0:
- Fixed bug where if the user-agent passed to `isAllowed()` / `isDisallowed()` is called "constructor" it would throw an error.
- Added support for relative URLs. This does not affect the default behavior so can safely be upgraded.
Relative matching is only allowed if both the robots.txt URL and the URLs being checked are relative.
For example:
```js
var robots = robotsParser('/robots.txt', [
'User-agent: *',
'Disallow: /dir/',
'Disallow: /test.html',
'Allow: /dir/test.html',
'Allow: /test.html'
].join('\n'));
robots.isAllowed('/test.html', 'Sams-Bot/1.0'); // true (the same-length allow rule takes precedence)
robots.isAllowed('/dir/test.html', 'Sams-Bot/1.0'); // true
robots.isDisallowed('/dir/test2.html', 'Sams-Bot/1.0'); // true
```
## Version 2.2.0:
- Fixed a bug with matching wildcard patterns against some URLs
– Thanks to @ckylape for reporting and fixing
- Changed matching algorithm to match Google's implementation in google/robotstxt
- Changed order of precedence to match current spec
## Version 2.1.1:
- Fixed a bug that could be used to cause rule checking to take a long time
– Thanks to @andeanfog
## Version 2.1.0:
- Removed use of punycode module APIs as the new URL API handles it
- Improved test coverage
- Added tests for percent encoded paths and improved support
- Added `getMatchingLineNumber()` method
- Fixed bug with comments on same line as directive
## Version 2.0.0:
This release is not 100% backwards compatible as it now uses the new URL APIs which are not supported in Node < 7.
- Updated code to not use deprecated URL module APIs.
– Thanks to @kdzwinel
## Version 1.0.2:
- Fixed error caused by invalid URLs missing the protocol.
## Version 1.0.1:
- Fixed bug with the "user-agent" rule being treated as case sensitive.
– Thanks to @brendonboshell
- Improved test coverage.
– Thanks to @schornio
## Version 1.0.0:
- Initial release.

21
node_modules/robots-parser/LICENSE.md generated vendored Normal file

@@ -0,0 +1,21 @@
The MIT License (MIT)
Copyright (c) 2014 Sam Clarke
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

205
node_modules/robots-parser/README.md generated vendored Normal file

@@ -0,0 +1,205 @@
# Robots Parser [![NPM downloads](https://img.shields.io/npm/dm/robots-parser)](https://www.npmjs.com/package/robots-parser) [![DeepScan grade](https://deepscan.io/api/teams/457/projects/16277/branches/344939/badge/grade.svg)](https://deepscan.io/dashboard#view=project&tid=457&pid=16277&bid=344939) [![GitHub license](https://img.shields.io/github/license/samclarke/robots-parser.svg)](https://github.com/samclarke/robots-parser/blob/master/license.md) [![Coverage Status](https://coveralls.io/repos/github/samclarke/robots-parser/badge.svg?branch=master)](https://coveralls.io/github/samclarke/robots-parser?branch=master)
A robots.txt parser which aims to be compliant with the [draft specification](https://datatracker.ietf.org/doc/html/draft-koster-rep).
The parser currently supports:
- User-agent:
- Allow:
- Disallow:
- Sitemap:
- Crawl-delay:
- Host:
- Paths with wildcards (\*) and EOL matching ($)
## Installation
Via NPM:
npm install robots-parser
or via Yarn:
yarn add robots-parser
## Usage
```js
var robotsParser = require('robots-parser');
var robots = robotsParser('http://www.example.com/robots.txt', [
'User-agent: *',
'Disallow: /dir/',
'Disallow: /test.html',
'Allow: /dir/test.html',
'Allow: /test.html',
'Crawl-delay: 1',
'Sitemap: http://example.com/sitemap.xml',
'Host: example.com'
].join('\n'));
robots.isAllowed('http://www.example.com/test.html', 'Sams-Bot/1.0'); // true
robots.isAllowed('http://www.example.com/dir/test.html', 'Sams-Bot/1.0'); // true
robots.isDisallowed('http://www.example.com/dir/test2.html', 'Sams-Bot/1.0'); // true
robots.getCrawlDelay('Sams-Bot/1.0'); // 1
robots.getSitemaps(); // ['http://example.com/sitemap.xml']
robots.getPreferredHost(); // example.com
```
### isAllowed(url, [ua])
**boolean or undefined**
Returns true if crawling the specified URL is allowed for the specified user-agent.
This will return `undefined` if the URL isn't valid for this robots.txt.
### isDisallowed(url, [ua])
**boolean or undefined**
Returns true if crawling the specified URL is not allowed for the specified user-agent.
This will return `undefined` if the URL isn't valid for this robots.txt.
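For example (a minimal sketch with a made-up robots.txt), the `undefined` case only occurs for URLs this robots.txt does not apply to, such as a different host:
```js
var robotsParser = require('robots-parser');

var robots = robotsParser('http://www.example.com/robots.txt', [
    'User-agent: *',
    'Disallow: /secret/'
].join('\n'));

robots.isAllowed('http://www.example.com/index.html', 'Sams-Bot/1.0');         // true
robots.isDisallowed('http://www.example.com/secret/page.html', 'Sams-Bot/1.0'); // true

// A different host isn't covered by this robots.txt:
robots.isAllowed('http://www.example.net/index.html', 'Sams-Bot/1.0');          // undefined
```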
### getMatchingLineNumber(url, [ua])
**number or undefined**
Returns the line number of the matching directive for the specified URL and user-agent if any.
Line numbers start at 1 and go up (1-based indexing).
Returns -1 if there is no matching directive. If a rule is manually added without a lineNumber then this will return undefined for that rule.
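A short sketch of how the returned numbers line up with the robots.txt contents (hypothetical rules; `User-agent: *` is line 1):
```js
var robotsParser = require('robots-parser');

var robots = robotsParser('http://www.example.com/robots.txt', [
    'User-agent: *',          // line 1
    'Disallow: /fish/',       // line 2
    'Allow: /fish/test.html'  // line 3
].join('\n'));

robots.getMatchingLineNumber('http://www.example.com/fish/test.html');  // 3
robots.getMatchingLineNumber('http://www.example.com/fish/other.html'); // 2
robots.getMatchingLineNumber('http://www.example.com/index.html');      // -1 (no matching directive)
```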
### getCrawlDelay([ua])
**number or undefined**
Returns the number of seconds the specified user-agent should wait between requests.
Returns undefined if no crawl delay has been specified for this user-agent.
### getSitemaps()
**array**
Returns an array of sitemap URLs specified by the `sitemap:` directive.
### getPreferredHost()
**string or null**
Returns the preferred host name specified by the `host:` directive or null if there isn't one.
# Changes
### Version 3.0.1
- Fixed bug with `https:` URLs defaulting to port `80` instead of `443` if no port is specified.
Thanks to @dskvr for reporting
This affects comparing URLs with the default HTTPS port to URLs without it.
For example, comparing `https://example.com/` to `https://example.com:443/` or vice versa.
They should be treated as equivalent but weren't due to the incorrect port
being used for `https:`.
### Version 3.0.0
- Changed to using global URL object instead of importing. – Thanks to @brendankenny
### Version 2.4.0:
- Added TypeScript definitions
– Thanks to @danhab99 for creating them
- Added SECURITY.md policy and CodeQL scanning
### Version 2.3.0:
- Fixed bug where if the user-agent passed to `isAllowed()` / `isDisallowed()` is called "constructor" it would throw an error.
- Added support for relative URLs. This does not affect the default behavior so can safely be upgraded.
Relative matching is only allowed if both the robots.txt URL and the URLs being checked are relative.
For example:
```js
var robots = robotsParser('/robots.txt', [
'User-agent: *',
'Disallow: /dir/',
'Disallow: /test.html',
'Allow: /dir/test.html',
'Allow: /test.html'
].join('\n'));
robots.isAllowed('/test.html', 'Sams-Bot/1.0'); // true (the same-length allow rule takes precedence)
robots.isAllowed('/dir/test.html', 'Sams-Bot/1.0'); // true
robots.isDisallowed('/dir/test2.html', 'Sams-Bot/1.0'); // true
```
### Version 2.2.0:
- Fixed a bug with matching wildcard patterns against some URLs
– Thanks to @ckylape for reporting and fixing
- Changed matching algorithm to match Google's implementation in google/robotstxt
- Changed order of precedence to match current spec
### Version 2.1.1:
- Fixed a bug that could be used to cause rule checking to take a long time
– Thanks to @andeanfog
### Version 2.1.0:
- Removed use of punycode module APIs as the new URL API handles it
- Improved test coverage
- Added tests for percent encoded paths and improved support
- Added `getMatchingLineNumber()` method
- Fixed bug with comments on same line as directive
### Version 2.0.0:
This release is not 100% backwards compatible as it now uses the new URL APIs which are not supported in Node < 7.
- Updated code to not use deprecated URL module APIs.
– Thanks to @kdzwinel
### Version 1.0.2:
- Fixed error caused by invalid URLs missing the protocol.
### Version 1.0.1:
- Fixed bug with the "user-agent" rule being treated as case sensitive.
– Thanks to @brendonboshell
- Improved test coverage.
– Thanks to @schornio
### Version 1.0.0:
- Initial release.
# License
The MIT License (MIT)
Copyright (c) 2014 Sam Clarke
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

467
node_modules/robots-parser/Robots.js generated vendored Normal file

@@ -0,0 +1,467 @@
/**
* Trims the white space from the start and end of the line.
*
* If the line is an array it will strip the white space from
* the start and end of each element of the array.
*
* @param {string|Array} line
* @return {string|Array}
* @private
*/
function trimLine(line) {
if (!line) {
return null;
}
if (Array.isArray(line)) {
return line.map(trimLine);
}
return String(line).trim();
}
/**
* Remove comments from lines
*
* @param {string} line
* @return {string}
* @private
*/
function removeComments(line) {
var commentStartIndex = line.indexOf('#');
if (commentStartIndex > -1) {
return line.substr(0, commentStartIndex);
}
return line;
}
/**
* Splits a line at the first occurrence of :
*
* @param {string} line
* @return {Array.<string>}
* @private
*/
function splitLine(line) {
var idx = String(line).indexOf(':');
if (!line || idx < 0) {
return null;
}
return [line.slice(0, idx), line.slice(idx + 1)];
}
/**
* Normalises the user-agent string by converting it to
* lower case and removing any version numbers.
*
* @param {string} userAgent
* @return {string}
* @private
*/
function formatUserAgent(userAgent) {
var formattedUserAgent = userAgent.toLowerCase();
// Strip the version number from robot/1.0 user agents
var idx = formattedUserAgent.indexOf('/');
if (idx > -1) {
formattedUserAgent = formattedUserAgent.substr(0, idx);
}
return formattedUserAgent.trim();
}
/**
* Normalises the URL encoding of a path by encoding
* unicode characters.
*
* @param {string} path
* @return {string}
* @private
*/
function normaliseEncoding(path) {
try {
return urlEncodeToUpper(encodeURI(path).replace(/%25/g, '%'));
} catch (e) {
return path;
}
}
/**
* Convert URL encodings to support case.
*
* e.g.: %2a%ef becomes %2A%EF
*
* @param {string} path
* @return {string}
* @private
*/
function urlEncodeToUpper(path) {
return path.replace(/%[0-9a-fA-F]{2}/g, function (match) {
return match.toUpperCase();
});
}
/**
* Matches a pattern with the specified path
*
* Uses same algorithm to match patterns as the Google implementation in
* google/robotstxt so it should be consistent with the spec.
*
* @see https://github.com/google/robotstxt/blob/f465f0ede81099dd8bc4aeb2966b3a892bd488b3/robots.cc#L74
* @param {string} pattern
* @param {string} path
* @return {boolean}
* @private
*/
function matches(pattern, path) {
// I've added extra comments to try to make this easier to understand
// Stores the lengths of all the current matching substrings.
// Maximum number of possible matching lengths is every length in path plus
// 1 to handle 0 length too (if pattern starts with * which is zero or more)
var matchingLengths = new Array(path.length + 1);
var numMatchingLengths = 1;
// Initially longest match is 0
matchingLengths[0] = 0;
for (var p = 0; p < pattern.length; p++) {
// If $ is at the end of pattern then we must match the whole path.
// Which is true if the longest matching length matches path length
if (pattern[p] === '$' && p + 1 === pattern.length) {
return matchingLengths[numMatchingLengths - 1] === path.length;
}
// Handle wildcards
if (pattern[p] == '*') {
// Wildcard so all substrings minus the current smallest matching
// length are matches
numMatchingLengths = path.length - matchingLengths[0] + 1;
// Update matching lengths to include the smallest all the way up
// to numMatchingLengths
// Don't update smallest possible match as * matches zero or more
// so the smallest current match is also valid
for (var i = 1; i < numMatchingLengths; i++) {
matchingLengths[i] = matchingLengths[i - 1] + 1;
}
} else {
// Check the char at the matching length matches the pattern, if it
// does increment it and add it as a valid length, ignore if not.
var numMatches = 0;
for (var i = 0; i < numMatchingLengths; i++) {
if (
matchingLengths[i] < path.length &&
path[matchingLengths[i]] === pattern[p]
) {
matchingLengths[numMatches++] = matchingLengths[i] + 1;
}
}
// No paths matched the current pattern char so not a match
if (numMatches == 0) {
return false;
}
numMatchingLengths = numMatches;
}
}
return true;
}
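// For example: matches('/fish*.php', '/fishheads/catfish.php') is true because '*'
// can expand to 'heads/catfish', while matches('/*.dext$', '/file.dext1') is false
// because the trailing '$' requires the pattern to consume the entire path.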
function parseRobots(contents, robots) {
var newlineRegex = /\r\n|\r|\n/;
var lines = contents
.split(newlineRegex)
.map(removeComments)
.map(splitLine)
.map(trimLine);
var currentUserAgents = [];
var isNoneUserAgentState = true;
for (var i = 0; i < lines.length; i++) {
var line = lines[i];
if (!line || !line[0]) {
continue;
}
switch (line[0].toLowerCase()) {
case 'user-agent':
if (isNoneUserAgentState) {
currentUserAgents.length = 0;
}
if (line[1]) {
currentUserAgents.push(formatUserAgent(line[1]));
}
break;
case 'disallow':
robots.addRule(currentUserAgents, line[1], false, i + 1);
break;
case 'allow':
robots.addRule(currentUserAgents, line[1], true, i + 1);
break;
case 'crawl-delay':
robots.setCrawlDelay(currentUserAgents, line[1]);
break;
case 'sitemap':
if (line[1]) {
robots.addSitemap(line[1]);
}
break;
case 'host':
if (line[1]) {
robots.setPreferredHost(line[1].toLowerCase());
}
break;
}
isNoneUserAgentState = line[0].toLowerCase() !== 'user-agent';
}
}
/**
* Returns the rule, if any, that applies to the specified path from the given rules.
*
* @param {string} path
* @param {Array.<Object>} rules
* @return {Object?}
* @private
*/
function findRule(path, rules) {
var matchedRule = null;
for (var i = 0; i < rules.length; i++) {
var rule = rules[i];
if (!matches(rule.pattern, path)) {
continue;
}
// The longest matching rule takes precedence
// If rules are the same length then allow takes precedence
if (!matchedRule || rule.pattern.length > matchedRule.pattern.length) {
matchedRule = rule;
} else if (
rule.pattern.length == matchedRule.pattern.length &&
rule.allow &&
!matchedRule.allow
) {
matchedRule = rule;
}
}
return matchedRule;
}
/**
* Converts provided string into an URL object.
*
* Will return null if provided string is not a valid URL.
*
* @param {string} url
* @return {?URL}
* @private
*/
function parseUrl(url) {
try {
// Specify a URL to be used with relative paths
// A non-existent subdomain is used so it can never conflict with a real
// site; even if someone did try to crawl it, the worst that can happen
// is that relative URLs are allowed on it.
var url = new URL(url, 'http://robots-relative.samclarke.com/');
if (!url.port) {
url.port = url.protocol === 'https:' ? 443 : 80;
}
return url;
} catch (e) {
return null;
}
}
function Robots(url, contents) {
this._url = parseUrl(url) || {};
this._rules = Object.create(null);
this._sitemaps = [];
this._preferredHost = null;
parseRobots(contents || '', this);
}
/**
* Adds the specified allow/deny rule to the rules
* for the specified user-agents.
*
* @param {Array.<string>} userAgents
* @param {string} pattern
* @param {boolean} allow
* @param {number} [lineNumber] Should use 1-based indexing
*/
Robots.prototype.addRule = function (userAgents, pattern, allow, lineNumber) {
var rules = this._rules;
userAgents.forEach(function (userAgent) {
rules[userAgent] = rules[userAgent] || [];
if (!pattern) {
return;
}
rules[userAgent].push({
pattern: normaliseEncoding(pattern),
allow: allow,
lineNumber: lineNumber
});
});
};
/**
* Adds the specified delay to the specified user agents.
*
* @param {Array.<string>} userAgents
* @param {string} delayStr
*/
Robots.prototype.setCrawlDelay = function (userAgents, delayStr) {
var rules = this._rules;
var delay = Number(delayStr);
userAgents.forEach(function (userAgent) {
rules[userAgent] = rules[userAgent] || [];
if (isNaN(delay)) {
return;
}
rules[userAgent].crawlDelay = delay;
});
};
/**
* Add a sitemap
*
* @param {string} url
*/
Robots.prototype.addSitemap = function (url) {
this._sitemaps.push(url);
};
/**
* Sets the preferred host name
*
* @param {string} url
*/
Robots.prototype.setPreferredHost = function (url) {
this._preferredHost = url;
};
Robots.prototype._getRule = function (url, ua) {
var parsedUrl = parseUrl(url) || {};
var userAgent = formatUserAgent(ua || '*');
// The base URL must match otherwise this robots.txt is not valid for it.
if (
parsedUrl.protocol !== this._url.protocol ||
parsedUrl.hostname !== this._url.hostname ||
parsedUrl.port !== this._url.port
) {
return;
}
var rules = this._rules[userAgent] || this._rules['*'] || [];
var path = urlEncodeToUpper(parsedUrl.pathname + parsedUrl.search);
var rule = findRule(path, rules);
return rule;
};
/**
* Returns true if allowed, false if not allowed.
*
* Will return undefined if the URL is not valid for
* this robots.txt file.
*
* @param {string} url
* @param {string?} ua
* @return {boolean?}
*/
Robots.prototype.isAllowed = function (url, ua) {
var rule = this._getRule(url, ua);
if (typeof rule === 'undefined') {
return;
}
return !rule || rule.allow;
};
/**
* Returns the line number of the matching directive for the specified
* URL and user-agent if any.
*
* The line numbers start at 1 and go up (1-based indexing).
*
* Return -1 if there is no matching directive. If a rule is manually
* added without a lineNumber then this will return undefined for that
* rule.
*
* @param {string} url
* @param {string?} ua
* @return {number?}
*/
Robots.prototype.getMatchingLineNumber = function (url, ua) {
var rule = this._getRule(url, ua);
return rule ? rule.lineNumber : -1;
};
/**
* Returns the opposite of isAllowed()
*
* @param {string} url
* @param {string} ua
* @return {boolean}
*/
Robots.prototype.isDisallowed = function (url, ua) {
return !this.isAllowed(url, ua);
};
/**
* Gets the crawl delay if there is one.
*
* Will return undefined if there is no crawl delay set.
*
* @param {string} ua
* @return {number?}
*/
Robots.prototype.getCrawlDelay = function (ua) {
var userAgent = formatUserAgent(ua || '*');
return (this._rules[userAgent] || this._rules['*'] || {}).crawlDelay;
};
/**
* Returns the preferred host if there is one.
*
* @return {string?}
*/
Robots.prototype.getPreferredHost = function () {
return this._preferredHost;
};
/**
* Returns an array of sitemap URLs if there are any.
*
* @return {Array.<string>}
*/
Robots.prototype.getSitemaps = function () {
return this._sitemaps.slice(0);
};
module.exports = Robots;

11
node_modules/robots-parser/SECURITY.md generated vendored Normal file

@@ -0,0 +1,11 @@
# Security Policy
## Supported Versions
The latest version is supported.
## Reporting a Vulnerability
If you find or suspect a security vulnerability, please report it to sam@samclarke.com, and I will try to respond as quickly as possible.
If possible, please include "Security vulnerability report:" in the subject line.

12
node_modules/robots-parser/index.d.ts generated vendored Normal file

@@ -0,0 +1,12 @@
declare module 'robots-parser';
interface Robot {
isAllowed(url: string, ua?: string): boolean | undefined;
isDisallowed(url: string, ua?: string): boolean | undefined;
getMatchingLineNumber(url: string, ua?: string): number;
getCrawlDelay(ua?: string): number | undefined;
getSitemaps(): string[];
getPreferredHost(): string | null;
}
export default function robotsParser(url: string, robotstxt: string): Robot;

5
node_modules/robots-parser/index.js generated vendored Normal file

@@ -0,0 +1,5 @@
var Robots = require('./Robots');
module.exports = function (url, contents) {
return new Robots(url, contents);
};

57
node_modules/robots-parser/package.json generated vendored Normal file

@@ -0,0 +1,57 @@
{
"name": "robots-parser",
"version": "3.0.1",
"description": "A specification compliant robots.txt parser with wildcard (*) matching support.",
"keywords": [
"robots.txt",
"parser",
"user-agent",
"scraper",
"spider",
"bot",
"robots-exclusion-standard"
],
"main": "index.js",
"directories": {
"test": "tests"
},
"homepage": "https://github.com/samclarke/robots-parser",
"bugs": {
"url": "https://github.com/samclarke/robots-parser/issues"
},
"scripts": {
"test": "nyc --reporter=text-summary --reporter=html --reporter=lcovonly mocha"
},
"repository": {
"type": "git",
"url": "https://github.com/samclarke/robots-parser.git"
},
"author": "Sam Clarke <sam@samclarke.com>",
"license": "MIT",
"files": [
"/Robots.js",
"/index.js",
"/index.d.ts",
"/README.md",
"/LICENSE.md",
"/CHANGELOG.md",
"/SECURITY.md",
"test/*.js",
".gitignore"
],
"engines": {
"node": ">=10.0.0"
},
"prettier": {
"tabWidth": 4,
"useTabs": true,
"singleQuote": true,
"trailingComma": "none"
},
"devDependencies": {
"chai": "^4.3.6",
"mocha": "^10.0.0",
"nyc": "^15.1.0"
},
"types": "./index.d.ts"
}

864
node_modules/robots-parser/test/Robots.js generated vendored Normal file

@@ -0,0 +1,864 @@
var robotsParser = require('../index');
var expect = require('chai').expect;
function testRobots(url, contents, allowed, disallowed) {
var robots = robotsParser(url, contents);
allowed.forEach(function (url) {
expect(robots.isAllowed(url)).to.equal(true);
});
disallowed.forEach(function (url) {
expect(robots.isDisallowed(url)).to.equal(true);
});
}
describe('Robots', function () {
it('should parse the disallow directive', function () {
var contents = [
'User-agent: *',
'Disallow: /fish/',
'Disallow: /test.html'
].join('\n');
var allowed = [
'http://www.example.com/fish',
'http://www.example.com/Test.html'
];
var disallowed = [
'http://www.example.com/fish/index.php',
'http://www.example.com/fish/',
'http://www.example.com/test.html'
];
testRobots('http://www.example.com/robots.txt', contents, allowed, disallowed);
});
it('should parse the allow directive', function () {
var contents = [
'User-agent: *',
'Disallow: /fish/',
'Disallow: /test.html',
'Allow: /fish/test.html',
'Allow: /test.html'
].join('\n');
var allowed = [
'http://www.example.com/fish',
'http://www.example.com/fish/test.html',
'http://www.example.com/Test.html',
'http://www.example.com/test.html'
];
var disallowed = [
'http://www.example.com/fish/index.php',
'http://www.example.com/fish/',
];
testRobots('http://www.example.com/robots.txt', contents, allowed, disallowed);
});
it('should parse patterns', function () {
var contents = [
'User-agent: *',
'Disallow: /fish*.php',
'Disallow: /*.dext$',
'Disallow: /dir*'
].join('\n');
var allowed = [
'http://www.example.com/Fish.PHP',
'http://www.example.com/Fish.dext1',
'http://www.example.com/folder/dir.html',
'http://www.example.com/folder/dir/test.html'
];
var disallowed = [
'http://www.example.com/fish.php',
'http://www.example.com/fishheads/catfish.php?parameters',
'http://www.example.com/AnYthInG.dext',
'http://www.example.com/Fish.dext.dext',
'http://www.example.com/dir/test.html',
'http://www.example.com/directory.html'
];
testRobots('http://www.example.com/robots.txt', contents, allowed, disallowed);
});
it('should have the correct order precedence for allow and disallow', function () {
var contents = [
'User-agent: *',
'Disallow: /fish*.php',
'Allow: /fish/index.php',
'Disallow: /test',
'Allow: /test/',
'Disallow: /aa/',
'Allow: /aa/',
'Allow: /bb/',
'Disallow: /bb/',
].join('\n');
var allowed = [
'http://www.example.com/test/index.html',
'http://www.example.com/fish/index.php',
'http://www.example.com/test/',
'http://www.example.com/aa/',
'http://www.example.com/bb/',
'http://www.example.com/x/'
];
var disallowed = [
'http://www.example.com/fish.php',
'http://www.example.com/fishheads/catfish.php?parameters',
'http://www.example.com/test'
];
testRobots('http://www.example.com/robots.txt', contents, allowed, disallowed);
});
it('should have the correct order precedence for wildcards', function () {
var contents = [
'User-agent: *',
'Disallow: /*/',
'Allow: /x/',
].join('\n');
var allowed = [
'http://www.example.com/x/',
'http://www.example.com/fish.php',
'http://www.example.com/test'
];
var disallowed = [
'http://www.example.com/a/',
'http://www.example.com/xx/',
'http://www.example.com/test/index.html'
];
testRobots('http://www.example.com/robots.txt', contents, allowed, disallowed);
});
it('should parse lines delimited by \\r', function () {
var contents = [
'User-agent: *',
'Disallow: /fish/',
'Disallow: /test.html'
].join('\r');
var allowed = [
'http://www.example.com/fish',
'http://www.example.com/Test.html'
];
var disallowed = [
'http://www.example.com/fish/index.php',
'http://www.example.com/fish/',
'http://www.example.com/test.html'
];
testRobots('http://www.example.com/robots.txt', contents, allowed, disallowed);
});
it('should parse lines delimited by \\r\\n', function () {
var contents = [
'User-agent: *',
'Disallow: /fish/',
'Disallow: /test.html'
].join('\r\n');
var allowed = [
'http://www.example.com/fish',
'http://www.example.com/Test.html'
];
var disallowed = [
'http://www.example.com/fish/index.php',
'http://www.example.com/fish/',
'http://www.example.com/test.html'
];
testRobots('http://www.example.com/robots.txt', contents, allowed, disallowed);
});
it('should parse lines delimited by mixed line endings', function () {
var contents = [
'User-agent: *\r',
'Disallow: /fish/\r\n',
'Disallow: /test.html\n\n'
].join('');
var allowed = [
'http://www.example.com/fish',
'http://www.example.com/Test.html'
];
var disallowed = [
'http://www.example.com/fish/index.php',
'http://www.example.com/fish/',
'http://www.example.com/test.html'
];
testRobots('http://www.example.com/robots.txt', contents, allowed, disallowed);
});
it('should ignore rules that are not in a group', function () {
var contents = [
'Disallow: /secret.html',
'Disallow: /test',
].join('\n');
var allowed = [
'http://www.example.com/secret.html',
'http://www.example.com/test/index.html',
'http://www.example.com/test/'
];
testRobots('http://www.example.com/robots.txt', contents, allowed, []);
});
it('should ignore comments', function () {
var contents = [
'#',
'# This is a comment',
'#',
'User-agent: *',
'# This is a comment',
'Disallow: /fish/ # ignore',
'# Disallow: fish',
'Disallow: /test.html'
].join('\n');
var allowed = [
'http://www.example.com/fish',
'http://www.example.com/Test.html'
];
var disallowed = [
'http://www.example.com/fish/index.php',
'http://www.example.com/fish/',
'http://www.example.com/test.html'
];
testRobots('http://www.example.com/robots.txt', contents, allowed, disallowed);
});
it('should ignore invalid lines', function () {
var contents = [
'invalid line',
'User-agent: *',
'Disallow: /fish/',
':::::another invalid line:::::',
'Disallow: /test.html',
'Unknown: tule'
].join('\n');
var allowed = [
'http://www.example.com/fish',
'http://www.example.com/Test.html'
];
var disallowed = [
'http://www.example.com/fish/index.php',
'http://www.example.com/fish/',
'http://www.example.com/test.html'
];
testRobots('http://www.example.com/robots.txt', contents, allowed, disallowed);
});
it('should ignore empty user-agent lines', function () {
var contents = [
'User-agent:',
'Disallow: /fish/',
'Disallow: /test.html'
].join('\n');
var allowed = [
'http://www.example.com/fish',
'http://www.example.com/Test.html',
'http://www.example.com/fish/index.php',
'http://www.example.com/fish/',
'http://www.example.com/test.html'
];
var disallowed = [];
testRobots('http://www.example.com/robots.txt', contents, allowed, disallowed);
});
it('should support groups with multiple user agents (case insensitive)', function () {
var contents = [
'User-agent: agenta',
'User-agent: agentb',
'Disallow: /fish',
].join('\n');
var robots = robotsParser('http://www.example.com/robots.txt', contents);
expect(robots.isAllowed("http://www.example.com/fish", "agenta")).to.equal(false);
});
it('should return undefined for invalid urls', function () {
var contents = [
'User-agent: *',
'Disallow: /secret.html',
'Disallow: /test',
].join('\n');
var invalidUrls = [
'http://example.com/secret.html',
'http://ex ample.com/secret.html',
'http://www.example.net/test/index.html',
'http://www.examsple.com/test/',
'example.com/test/',
':::::;;`\\|/.example.com/test/'
];
var robots = robotsParser('http://www.example.com/robots.txt', contents);
invalidUrls.forEach(function (url) {
expect(robots.isAllowed(url)).to.equal(undefined);
});
});
it('should handle Unicode, urlencoded and punycode URLs', function () {
var contents = [
'User-agent: *',
'Disallow: /secret.html',
'Disallow: /test',
].join('\n');
var allowed = [
'http://www.münich.com/index.html',
'http://www.xn--mnich-kva.com/index.html',
'http://www.m%C3%BCnich.com/index.html'
];
var disallowed = [
'http://www.münich.com/secret.html',
'http://www.xn--mnich-kva.com/secret.html',
'http://www.m%C3%BCnich.com/secret.html'
];
testRobots('http://www.münich.com/robots.txt', contents, allowed, disallowed);
testRobots('http://www.xn--mnich-kva.com/robots.txt', contents, allowed, disallowed);
testRobots('http://www.m%C3%BCnich.com/robots.txt', contents, allowed, disallowed);
});
it('should handle Unicode and urlencoded paths', function () {
var contents = [
'User-agent: *',
'Disallow: /%CF%80',
'Disallow: /%e2%9d%83',
'Disallow: /%a%a',
'Disallow: /💩',
'Disallow: /✼*t$',
'Disallow: /%E2%9C%A4*t$',
'Disallow: /✿%a',
'Disallow: /http%3A%2F%2Fexample.org'
].join('\n');
var allowed = [
'http://www.example.com/✼testing',
'http://www.example.com/%E2%9C%BCtesting',
'http://www.example.com/✤testing',
'http://www.example.com/%E2%9C%A4testing',
'http://www.example.com/http://example.org',
'http://www.example.com/http:%2F%2Fexample.org'
];
var disallowed = [
'http://www.example.com/%CF%80',
'http://www.example.com/%CF%80/index.html',
'http://www.example.com/π',
'http://www.example.com/π/index.html',
'http://www.example.com/%e2%9d%83',
'http://www.example.com/%E2%9D%83/index.html',
'http://www.example.com/❃',
'http://www.example.com/❃/index.html',
'http://www.example.com/%F0%9F%92%A9',
'http://www.example.com/%F0%9F%92%A9/index.html',
'http://www.example.com/💩',
'http://www.example.com/💩/index.html',
'http://www.example.com/%a%a',
'http://www.example.com/%a%a/index.html',
'http://www.example.com/✼test',
'http://www.example.com/%E2%9C%BCtest',
'http://www.example.com/✤test',
'http://www.example.com/%E2%9C%A4testt',
'http://www.example.com/✿%a',
'http://www.example.com/%E2%9C%BF%atest',
'http://www.example.com/http%3A%2F%2Fexample.org'
];
testRobots('http://www.example.com/robots.txt', contents, allowed, disallowed);
});
it('should handle lone high / low surrogates', function () {
var contents = [
'User-agent: *',
'Disallow: /\uD800',
'Disallow: /\uDC00'
].join('\n');
// These are invalid so can't be disallowed
var allowed = [
'http://www.example.com/\uDC00',
'http://www.example.com/\uD800'
];
var disallowed = [];
testRobots('http://www.example.com/robots.txt', contents, allowed, disallowed);
});
it('should ignore host case', function () {
var contents = [
'User-agent: *',
'Disallow: /secret.html',
'Disallow: /test',
].join('\n');
var allowed = [
'http://www.example.com/index.html',
'http://www.ExAmPlE.com/index.html',
'http://www.EXAMPLE.com/index.html'
];
var disallowed = [
'http://www.example.com/secret.html',
'http://www.ExAmPlE.com/secret.html',
'http://www.EXAMPLE.com/secret.html'
];
testRobots('http://www.eXample.com/robots.txt', contents, allowed, disallowed);
});
it('should handle relative paths', function () {
var contents = [
'User-agent: *',
'Disallow: /fish',
'Allow: /fish/test',
].join('\n');
var robots = robotsParser('/robots.txt', contents);
expect(robots.isAllowed('/fish/test')).to.equal(true);
expect(robots.isAllowed('/fish')).to.equal(false);
});
it('should not allow relative paths if domain specified', function () {
var contents = [
'User-agent: *',
'Disallow: /fish',
'Allow: /fish/test',
].join('\n');
var robots = robotsParser('http://www.example.com/robots.txt', contents);
expect(robots.isAllowed('/fish/test')).to.equal(undefined);
expect(robots.isAllowed('/fish')).to.equal(undefined);
});
it('should not treat invalid robots.txt URLs as relative', function () {
var contents = [
'User-agent: *',
'Disallow: /fish',
'Allow: /fish/test',
].join('\n');
var robots = robotsParser('https://ex ample.com/robots.txt', contents);
expect(robots.isAllowed('/fish/test')).to.equal(undefined);
expect(robots.isAllowed('/fish')).to.equal(undefined);
});
it('should not allow URLs if domain specified and robots.txt is relative', function () {
var contents = [
'User-agent: *',
'Disallow: /fish',
'Allow: /fish/test',
].join('\n');
var robots = robotsParser('/robots.txt', contents);
expect(robots.isAllowed('http://www.example.com/fish/test')).to.equal(undefined);
expect(robots.isAllowed('http://www.example.com/fish')).to.equal(undefined);
});
it('should allow all if empty robots.txt', function () {
var allowed = [
'http://www.example.com/secret.html',
'http://www.example.com/test/index.html',
'http://www.example.com/test/'
];
var robots = robotsParser('http://www.example.com/robots.txt', '');
allowed.forEach(function (url) {
expect(robots.isAllowed(url)).to.equal(true);
});
});
it('should treat null as allowing all', function () {
var robots = robotsParser('http://www.example.com/robots.txt', null);
expect(robots.isAllowed("http://www.example.com/", "userAgent")).to.equal(true);
expect(robots.isAllowed("http://www.example.com/")).to.equal(true);
});
it('should handle invalid robots.txt urls', function () {
var contents = [
'user-agent: *',
'disallow: /',
'host: www.example.com',
'sitemap: /sitemap.xml'
].join('\n');
var sitemapUrls = [
undefined,
null,
'null',
':/wom/test/'
];
sitemapUrls.forEach(function (url) {
var robots = robotsParser(url, contents);
expect(robots.isAllowed('http://www.example.com/index.html')).to.equal(undefined);
expect(robots.getPreferredHost()).to.equal('www.example.com');
expect(robots.getSitemaps()).to.eql(['/sitemap.xml']);
});
});
it('should parse the crawl-delay directive', function () {
var contents = [
'user-agent: a',
'crawl-delay: 1',
'user-agent: b',
'disallow: /d',
'user-agent: c',
'user-agent: d',
'crawl-delay: 10'
].join('\n');
var robots = robotsParser('http://www.example.com/robots.txt', contents);
expect(robots.getCrawlDelay('a')).to.equal(1);
expect(robots.getCrawlDelay('b')).to.equal(undefined);
expect(robots.getCrawlDelay('c')).to.equal(10);
expect(robots.getCrawlDelay('d')).to.equal(10);
expect(robots.getCrawlDelay()).to.equal(undefined);
});
it('should ignore invalid crawl-delay directives', function () {
var contents = [
'user-agent: a',
'crawl-delay: 1.2.1',
'user-agent: b',
'crawl-delay: 1.a0',
'user-agent: c',
'user-agent: d',
'crawl-delay: 10a'
].join('\n');
var robots = robotsParser('http://www.example.com/robots.txt', contents);
expect(robots.getCrawlDelay('a')).to.equal(undefined);
expect(robots.getCrawlDelay('b')).to.equal(undefined);
expect(robots.getCrawlDelay('c')).to.equal(undefined);
expect(robots.getCrawlDelay('d')).to.equal(undefined);
});
it('should parse the sitemap directive', function () {
var contents = [
'user-agent: a',
'crawl-delay: 1',
'sitemap: http://example.com/test.xml',
'user-agent: b',
'disallow: /d',
'sitemap: /sitemap.xml',
'sitemap: http://example.com/test/sitemap.xml '
].join('\n');
var robots = robotsParser('http://www.example.com/robots.txt', contents);
expect(robots.getSitemaps()).to.eql([
'http://example.com/test.xml',
'/sitemap.xml',
'http://example.com/test/sitemap.xml'
]);
});
it('should parse the host directive', function () {
var contents = [
'user-agent: a',
'crawl-delay: 1',
'host: www.example.net',
'user-agent: b',
'disallow: /d',
'host: example.com'
].join('\n');
var robots = robotsParser('http://www.example.com/robots.txt', contents);
expect(robots.getPreferredHost()).to.equal('example.com');
});
it('should parse empty and invalid directives', function () {
var contents = [
'user-agent:',
'user-agent:::: a::',
'crawl-delay:',
'crawl-delay:::: 0:',
'host:',
'host:: example.com',
'sitemap:',
'sitemap:: site:map.xml',
'disallow:',
'disallow::: /:',
'allow:',
'allow::: /:',
].join('\n');
robotsParser('http://www.example.com/robots.txt', contents);
});
it('should treat only the last host directive as valid', function () {
var contents = [
'user-agent: a',
'crawl-delay: 1',
'host: www.example.net',
'user-agent: b',
'disallow: /d',
'host: example.net',
'host: example.com'
].join('\n');
var robots = robotsParser('http://www.example.com/robots.txt', contents);
expect(robots.getPreferredHost()).to.equal('example.com');
});
it('should return null when there is no host directive', function () {
var contents = [
'user-agent: a',
'crawl-delay: 1',
'user-agent: b',
'disallow: /d',
].join('\n');
var robots = robotsParser('http://www.example.com/robots.txt', contents);
expect(robots.getPreferredHost()).to.equal(null);
});
it('should fallback to * when a UA has no rules of its own', function () {
var contents = [
'user-agent: *',
'crawl-delay: 1',
'user-agent: b',
'crawl-delay: 12',
'user-agent: c',
'user-agent: d',
'crawl-delay: 10'
].join('\n');
var robots = robotsParser('http://www.example.com/robots.txt', contents);
expect(robots.getCrawlDelay('should-fall-back')).to.equal(1);
expect(robots.getCrawlDelay('d')).to.equal(10);
expect(robots.getCrawlDelay('dd')).to.equal(1);
});
it('should not fallback to * when a UA has rules', function () {
var contents = [
'user-agent: *',
'crawl-delay: 1',
'user-agent: b',
'disallow:'
].join('\n');
var robots = robotsParser('http://www.example.com/robots.txt', contents);
expect(robots.getCrawlDelay('b')).to.equal(undefined);
});
it('should handle UAs with object property names', function () {
var contents = [
'User-agent: *',
'Disallow: /fish',
].join('\n');
var robots = robotsParser('http://www.example.com/robots.txt', contents);
expect(robots.isAllowed('http://www.example.com/fish', 'constructor')).to.equal(false);
expect(robots.isAllowed('http://www.example.com/fish', '__proto__')).to.equal(false);
});
it('should ignore version numbers in the UA string', function () {
var contents = [
'user-agent: *',
'crawl-delay: 1',
'user-agent: b',
'crawl-delay: 12',
'user-agent: c',
'user-agent: d',
'crawl-delay: 10'
].join('\n');
var robots = robotsParser('http://www.example.com/robots.txt', contents);
expect(robots.getCrawlDelay('should-fall-back/1.0.0')).to.equal(1);
expect(robots.getCrawlDelay('d/12')).to.equal(10);
expect(robots.getCrawlDelay('dd / 0-32-3')).to.equal(1);
expect(robots.getCrawlDelay('b / 1.0')).to.equal(12);
});
it('should return the line number of the matching directive', function () {
var contents = [
'',
'User-agent: *',
'',
'Disallow: /fish/',
'Disallow: /test.html',
'Allow: /fish/test.html',
'Allow: /test.html',
'',
'User-agent: a',
'allow: /',
'',
'User-agent: b',
'disallow: /test',
'disallow: /t*t',
'',
'User-agent: c',
'Disallow: /fish*.php',
'Allow: /fish/index.php'
].join('\n');
var robots = robotsParser('http://www.example.com/robots.txt', contents);
expect(robots.getMatchingLineNumber('http://www.example.com/fish')).to.equal(-1);
expect(robots.getMatchingLineNumber('http://www.example.com/fish/test.html')).to.equal(6);
expect(robots.getMatchingLineNumber('http://www.example.com/Test.html')).to.equal(-1);
expect(robots.getMatchingLineNumber('http://www.example.com/fish/index.php')).to.equal(4);
expect(robots.getMatchingLineNumber('http://www.example.com/fish/')).to.equal(4);
expect(robots.getMatchingLineNumber('http://www.example.com/test.html')).to.equal(7);
expect(robots.getMatchingLineNumber('http://www.example.com/test.html', 'a')).to.equal(10);
expect(robots.getMatchingLineNumber('http://www.example.com/fish.php', 'c')).to.equal(17);
expect(robots.getMatchingLineNumber('http://www.example.com/fish/index.php', 'c')).to.equal(18);
});
it('should handle large wildcards efficiently', function () {
var contents = [
'User-agent: *',
'Disallow: /' + '*'.repeat(2048) + '.html',
].join('\n');
var allowed = [
'http://www.example.com/' + 'sub'.repeat(2048) + 'folder/index.php',
];
var disallowed = [
'http://www.example.com/secret.html'
];
const start = Date.now();
testRobots('http://www.eXample.com/robots.txt', contents, allowed, disallowed);
const end = Date.now();
// Should take less than 500 ms (a generous limit to allow for variability
// between machines running the test; it should normally be much less)
expect(end - start).to.be.lessThan(500);
});
it('should honor given port number', function () {
var contents = [
'User-agent: *',
'Disallow: /fish/',
'Disallow: /test.html'
].join('\n');
var allowed = [
'http://www.example.com:8080/fish',
'http://www.example.com:8080/Test.html'
];
var disallowed = [
'http://www.example.com/fish',
'http://www.example.com/Test.html',
'http://www.example.com:80/fish',
'http://www.example.com:80/Test.html'
];
testRobots('http://www.example.com:8080/robots.txt', contents, allowed, disallowed);
});
it('should default to port 80 for http: if no port given', function () {
var contents = [
'User-agent: *',
'Disallow: /fish/',
'Disallow: /test.html'
].join('\n');
var allowed = [
'http://www.example.com:80/fish',
'http://www.example.com:80/Test.html'
];
var disallowed = [
'http://www.example.com:443/fish',
'http://www.example.com:443/Test.html',
'http://www.example.com:80/fish/index.php',
'http://www.example.com:80/fish/',
'http://www.example.com:80/test.html'
];
testRobots('http://www.example.com/robots.txt', contents, allowed, disallowed);
});
it('should default to port 443 for https: if no port given', function () {
var contents = [
'User-agent: *',
'Disallow: /fish/',
'Disallow: /test.html'
].join('\n');
var allowed = [
'https://www.example.com:443/fish',
'https://www.example.com:443/Test.html',
'https://www.example.com/fish',
'https://www.example.com/Test.html'
];
var disallowed = [
'http://www.example.com:80/fish',
'http://www.example.com:80/Test.html',
'http://www.example.com:443/fish/index.php',
'http://www.example.com:443/fish/',
'http://www.example.com:443/test.html'
];
testRobots('https://www.example.com/robots.txt', contents, allowed, disallowed);
});
});