Java: How do I read content from a redirected URL?
I use the following Java code in a bean to read the content of a URL:
String url;
String inputLine;
StringBuilder srcCode = new StringBuilder();

public void setUrl(String value) {
    url = value;
}

private void scanWebPage() throws IOException {
    try {
        URL dest = new URL(url);
        URLConnection yc = dest.openConnection();
        yc.setUseCaches(false);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(yc.getInputStream()));
        while ((inputLine = in.readLine()) != null)
            srcCode = srcCode.append(inputLine);
        in.close();
    } catch (FileNotFoundException fne) {
        srcCode.append("File Not Found");
    }
}
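The code works for most URLs, but not for redirected URLs. How can I update the code above to read content from a redirected URL? For redirected URLs I get "File Not Found".
Give the following a try: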
HttpURLConnection yc = (HttpURLConnection) dest.openConnection();
yc.setInstanceFollowRedirects(true);
In the context of the code above:
String url = "http://java.sun.com";
String inputLine;
StringBuilder srcCode = new StringBuilder();
URL dest = new URL(url);
HttpURLConnection yc = (HttpURLConnection) dest.openConnection();
yc.setInstanceFollowRedirects(true);
yc.setUseCaches(false);
BufferedReader in = new BufferedReader(
        new InputStreamReader(
                yc.getInputStream()));
while ((inputLine = in.readLine()) != null) {
    srcCode = srcCode.append(inputLine);
}
in.close();
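Here it is embellished further to help you diagnose what is going on. This code turns off automatic redirects and then follows the Location headers manually, printing each URL as it goes.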
@Test
public void f() throws IOException {
    String url = "http://java.sun.com";
    fetchURL(url);
}

private HttpURLConnection fetchURL(String url) throws IOException {
    URL dest = new URL(url);
    HttpURLConnection yc = (HttpURLConnection) dest.openConnection();
    yc.setInstanceFollowRedirects(false);
    yc.setUseCaches(false);
    System.out.println("url = " + url);
    int responseCode = yc.getResponseCode();
    if (responseCode >= 300 && responseCode < 400) { // brute force check, far too wide
        return fetchURL(yc.getHeaderField("Location"));
    }
    System.out.println("yc.getResponseCode() = " + yc.getResponseCode());
    return yc;
}
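The diagnostic code above marks its own status check as "far too wide", and it passes the Location header straight to new URL(...), which fails when the server sends a relative location. A minimal sketch of a tighter version follows. The class name RedirectFollower, the MAX_HOPS limit, the set of status codes treated as redirects, and the UTF-8 decoding are all assumptions made for illustration, not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class RedirectFollower {

    // Hypothetical cap on redirect hops, so a redirect loop cannot run forever.
    private static final int MAX_HOPS = 10;

    /** True only for status codes that normally carry a Location header to follow. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves a Location value, which may be relative, against the current URL. */
    static String resolveLocation(String currentUrl, String location) {
        // URI.create throws IllegalArgumentException on malformed input.
        return URI.create(currentUrl).resolve(location).toString();
    }

    /** Follows redirects by hand (HTTP -> HTTPS included) and returns the final body. */
    static String fetch(String startUrl) throws IOException {
        String current = startUrl;
        for (int hop = 0; hop <= MAX_HOPS; hop++) {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(current).openConnection();
            conn.setInstanceFollowRedirects(false);
            conn.setUseCaches(false);
            int status = conn.getResponseCode();
            System.out.println(status + " " + current);
            if (isRedirect(status)) {
                String location = conn.getHeaderField("Location");
                conn.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without Location from " + current);
                }
                current = resolveLocation(current, location);
                continue;
            }
            // Not a redirect: read the body, assuming it is UTF-8 text.
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        }
        throw new IOException("Too many redirects starting from " + startUrl);
    }
}
```

Calling RedirectFollower.fetch("http://java.sun.com") prints one line per hop, which should make it easier to see where a short URL stops resolving. Because each hop opens a fresh connection, a hop from http to https is followed like any other.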
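Chris - thanks, but this didn't work. The redirected URLs are like "mini URLs": when typed into a web browser they change to the real URL, but through Java code they do not change and are reported as invalid URLs. – user1492667 2013-02-27 16:06:48
What URL are you going to? I tested the code above and found that it follows the redirects for the URL above. Is your case that your URL redirects to a different protocol? If so, that may be your problem, because HttpURLConnection will not follow those. If that is the case, I would personally use a library such as the one included in Play2 or Apache HttpCommons. Alternatively, you can always set auto-follow to false, read the Location header yourself, and then explicitly fetch that URL yourself. – 2013-02-28 09:01:08
This isn't debugging your program, but you can take a look at this example: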
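import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class GetURLData
{
    public static void main(String args[])
    {
        String url = "the url you want the response from";
        HttpClient httpClient = new DefaultHttpClient();
        // GET rather than POST: we only want to read the page, and the
        // client does not automatically follow redirects for POST requests.
        HttpGet httpGet = new HttpGet(url);
        HttpResponse response;
        StringBuilder builder = new StringBuilder();
        try
        {
            response = httpClient.execute(httpGet);
            BufferedReader in = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
            char[] buf = new char[8000];
            int l = 0;
            // read() returns -1 at end of stream, which ends the loop
            while (l >= 0)
            {
                builder.append(buf, 0, l);
                l = in.read(buf);
            }
            in.close();
            System.out.println(builder.toString());
        } catch (Exception e)
        {
            System.out.println("Exception is :" + e);
            e.printStackTrace();
        }
    }
}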
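'java.net.URL' should follow redirects by default (unless you have previously called 'HttpURLConnection.setFollowRedirects(false)'), so you should only see the content of the final target URL. Assuming the redirect itself doesn't lead to a 404 page... – 2013-02-27 11:27:59
If the protocol changes (i.e. from HTTP to HTTPS), URLConnection will not follow the redirect. Is this your scenario? Also, are you allowed to use [Apache HttpComponents](http://hc.apache.org/)? – Perception 2013-02-27 11:28:01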
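If the protocol change described in these comments is the cause and an external library is not an option, one alternative on newer JDKs is the built-in java.net.http.HttpClient. This is only a sketch and assumes Java 11 or later, which did not exist when this thread was written. The class name HttpClientFetch is hypothetical. The Redirect.NORMAL policy follows redirects, including HTTP to HTTPS, but not HTTPS back to HTTP.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class HttpClientFetch {

    /** Builds a client that follows redirects, except from HTTPS to HTTP. */
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    /** Returns the body of the final page once redirects have been followed. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                newClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // "http://java.sun.com" is the example URL used earlier in this thread.
        System.out.println(fetch("http://java.sun.com"));
    }
}
```

On an older JDK, the remaining choices are the ones already mentioned here: follow the Location header manually or use Apache HttpComponents.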