Regex to remove external links except those to a given domain (PHP)

Question:

I want a regex that removes all external links from my content and keeps only the links to a given domain.

For example:

$inputContent = 'Lorem Ipsum <a href="http://www.example1.com" target="_blank">http://www.example1.com</a> lorem ipsum dummy text <a href="http://www.mywebsite.com" target="_blank">http://www.mywebsite.com</a>';

Expected output:

$outputContent = 'Lorem Ipsum lorem ipsum dummy text <a href="http://www.mywebsite.com" target="_blank">http://www.mywebsite.com</a>';

I tried this solution, but it doesn't work:

$pattern = '#<a [^>]*\bhref=([\'"])http.?://((?<!mywebsite)[^\'"])+\1 *>.*?</a>#i'; 
$filteredString = preg_replace($pattern, '', $content);

Check your regex first: you have 3 unescaped delimiters. You can use this site to check that your regex is consistent: https://regex101.com/ – Marcs

Your regex doesn't account for target="_blank" or any other attribute. – mario
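You can check mario's point directly. The sketch below runs the asker's pattern, unchanged, against the example1.com anchor twice: once with its `target` attribute and once without it. The two test strings are made up for this check. The pattern's ` *>` requires the tag to close right after the quoted href, so only the anchor without `target` matches.

```php
<?php
// The asker's original pattern, unchanged.
$pattern = '#<a [^>]*\bhref=([\'"])http.?://((?<!mywebsite)[^\'"])+\1 *>.*?</a>#i';

// Anchor as in the question: another attribute follows href.
$withTarget = '<a href="http://www.example1.com" target="_blank">http://www.example1.com</a>';
// The same anchor with the trailing attribute removed.
$withoutTarget = '<a href="http://www.example1.com">http://www.example1.com</a>';

// " *>" needs ">" right after the closing quote, so the first call finds nothing.
var_dump(preg_match($pattern, $withTarget));    // int(0)
var_dump(preg_match($pattern, $withoutTarget)); // int(1)
```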

Answer

"I tried this solution, but it doesn't work:"
$pattern = '#<a [^>]*\bhref=([\'"])http.?://((?<!mywebsite)[^\'"])+\1 *>.*?</a>#i'; 

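You're close. To make your solution work, just remove the ">" that follows "\1 *". The ".*?" then consumes any remaining attributes (such as target="_blank") together with the link text, i.e.: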

$pattern = '#<a [^>]*\bhref=([\'"])http.?://((?<!mywebsite)[^\'"])+\1 *.*?</a>#i';
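Running the corrected pattern on the question's input is a quick way to confirm it. This sketch reuses the question's variable names; the `notmywebsite.com` URL is an invented lookalike. The run shows two side effects. First, the removed anchor leaves both of its surrounding spaces behind. Second, `(?<!mywebsite)` only tests the text just before each character, so any URL in which "mywebsite" is followed by more characters is kept, even when it is on another host.

```php
<?php
$pattern = '#<a [^>]*\bhref=([\'"])http.?://((?<!mywebsite)[^\'"])+\1 *.*?</a>#i';

$inputContent = 'Lorem Ipsum <a href="http://www.example1.com" target="_blank">http://www.example1.com</a> lorem ipsum dummy text <a href="http://www.mywebsite.com" target="_blank">http://www.mywebsite.com</a>';

$filteredString = preg_replace($pattern, '', $inputContent);
echo $filteredString, "\n";
// Lorem Ipsum  lorem ipsum dummy text <a href="http://www.mywebsite.com" target="_blank">http://www.mywebsite.com</a>

// Caveat: the lookbehind checks for the text "mywebsite", not for the domain,
// so this external lookalike link survives as well.
$lookalike = '<a href="http://notmywebsite.com">x</a>';
var_dump(preg_replace($pattern, '', $lookalike) === $lookalike); // bool(true)
```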

Answer

A regex is not what you need here. You are parsing an HTML document, so you should pick the right tool for the job: DOMDocument.

<?php

$html = <<<HTML
Lorem Ipsum <a href="http://www.example1.com" target="_blank">http://www.example1.com</a>
lorem ipsum dummy text
<a href="http://mywebsite.com" target="_blank">http://www.mywebsite.com</a>
HTML;

$dom = new \DOMDocument();
// Parse the fragment without adding <html>/<body> or a doctype
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new \DOMXPath($dom);

$site = 'mywebsite.com';
// Query all `a` tags that don't start with your website domain name
$anchors = $xpath->query("//a[not(starts-with(@href,'http://{$site}')) and not(starts-with(@href,'http://www.{$site}'))]");

// Remove each external anchor (tag and text) from the tree
foreach ($anchors as $anchor) {
    $anchor->parentNode->removeChild($anchor);
}

echo $dom->saveHTML();

Output:

<p>Lorem Ipsum 
lorem ipsum dummy text 
<a href="http://mywebsite.com" target="_blank">http://www.mywebsite.com</a></p>

Wouldn't this fail for relative links? –
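One way to handle relative links is to compare the parsed host instead of a string prefix. A relative link has no host, so it can simply be kept. The sketch below is a variation on the DOMDocument answer, not part of it, and `removeExternalLinks()` is a hypothetical helper name. It accepts both `http://` and `https://`, and it treats any subdomain of the site (such as `www.`) as internal. It also wraps the fragment in a `<div>` so the leading text is not placed inside a `<p>`.

```php
<?php
// Hypothetical helper: removes every <a> whose href has a host other than
// $site or a subdomain of it. Links without a host (relative paths,
// "#fragment", "mailto:") are kept. $site is expected in lowercase.
function removeExternalLinks(string $html, string $site): string
{
    $dom = new DOMDocument();
    // Wrap the fragment in a <div> so libxml does not wrap the leading text in <p>.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $xpath = new DOMXPath($dom);

    $suffix = '.' . $site;
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $host = parse_url($anchor->getAttribute('href'), PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // no host (relative link) or unparsable URL: keep it
        }
        $host = strtolower($host);
        if ($host === $site || substr($host, -strlen($suffix)) === $suffix) {
            continue; // the site itself or a subdomain such as www.
        }
        $anchor->parentNode->removeChild($anchor);
    }

    // Serialize the children of the wrapper <div>, not the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$html = 'Lorem Ipsum <a href="http://www.example1.com" target="_blank">http://www.example1.com</a> lorem ipsum dummy text '
    . '<a href="https://www.mywebsite.com" target="_blank">https://www.mywebsite.com</a> see <a href="/about">about</a>';

echo removeExternalLinks($html, 'mywebsite.com'), "\n";
```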

Answer

A solution using regular expressions:

$inputContent = 'Lorem Ipsum <a href=\'http://www.example1.com\' target="_blank"><strong>http://www.example1.com</strong></a> lorem ipsum dummy text <a href="http://www.mywebsite.com" target="_blank">http://www.mywebsite.com</a>';

function callback($matches) {
    // $matches[1] is the href value, $matches[2] the link's inner HTML
    if (preg_match('#^https?://(www\.)?mywebsite\.com(/.+)?$#i', $matches[1])) {
        return $matches[0]; // link on mywebsite.com: keep the original tag unchanged
    }

    // return ''; // use this instead to drop the link and its text entirely
    return $matches[2]; // external link: remove only the anchor and keep its text
}

$pattern = '#<a[^>]*href=[\'"]([^\'"]*)[\'"][^>]*>(((?!<a\s).)*)</a>#i';
$filteredString = preg_replace_callback($pattern, 'callback', $inputContent);

echo $filteredString;
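The same callback approach can take the allowed domain as an argument instead of hard-coding mywebsite.com. The sketch below is a variation, not the answer's code, and `filterLinks()` is a hypothetical helper name. It builds the host check with `preg_quote()`, accepts subdomains, and adds the `s` modifier so link text may span lines. External links are dropped entirely, as in the question's expected output. Because the check is anchored to the start of the URL, a lookalike such as `notmywebsite.com` is removed.

```php
<?php
// Hypothetical helper: drops every <a>...</a> whose href is not on $site
// (or a subdomain of it); anchors on $site are returned untouched.
function filterLinks(string $html, string $site): string
{
    // The answer's anchor pattern with the `s` modifier added.
    $anchor = '#<a[^>]*href=[\'"]([^\'"]*)[\'"][^>]*>(((?!<a\s).)*)</a>#is';
    // Allowed: http(s)://, optional subdomains, the quoted site, then end, "/", "?" or "#".
    $allowed = '~^https?://([a-z0-9-]+\.)*' . preg_quote($site, '~') . '(?:[/?#]|$)~i';

    return preg_replace_callback($anchor, function (array $m) use ($allowed) {
        // $m[1] is the href value; keep the whole original tag only if it is allowed.
        return preg_match($allowed, $m[1]) ? $m[0] : '';
    }, $html);
}

$inputContent = 'Lorem Ipsum <a href="http://www.example1.com" target="_blank">http://www.example1.com</a> lorem ipsum dummy text <a href="http://www.mywebsite.com" target="_blank">http://www.mywebsite.com</a>';

echo filterLinks($inputContent, 'mywebsite.com'), "\n";
// Lorem Ipsum  lorem ipsum dummy text <a href="http://www.mywebsite.com" target="_blank">http://www.mywebsite.com</a>
```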
