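I recently built a site and submitted its sitemap to Google Search Console, but it kept reporting "Sitemap could not be read". I checked the sitemap several times and found nothing wrong with it, Baidu, Bing, and others indexed the site normally, and a sitemap generator I wrote myself gave the same result. I then found that Google also supports submitting URLs through an API, so I gave that a try.
- Reference: https://developers.google.com/search/apis/indexing-api/v3/quickstart?hl=zh-cn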
-
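There is not much to say about registration: just follow the tutorial above, and make sure the service account is granted access to the site.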
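Before granting access, it can help to confirm that the downloaded key file is really a service account key and to see which account email it belongs to. The following is a minimal sketch, not part of the original script; the helper names `check_key_file` and `load_key_file` are hypothetical, and the field names (`type`, `client_email`, `private_key`) are the ones a Google service account JSON key normally contains.

```python
import json

# Fields that a Google service account key file normally contains.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def check_key_file(info):
    """Return the service account email if the parsed key looks usable."""
    missing = [name for name in REQUIRED_FIELDS if not info.get(name)]
    if missing:
        raise ValueError("key file is missing fields: " + ", ".join(missing))
    if info["type"] != "service_account":
        raise ValueError("expected type 'service_account', got %r" % info["type"])
    return info["client_email"]


def load_key_file(path):
    """Read a JSON key file from disk and validate it (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return check_key_file(json.load(f))
```

Calling `load_key_file("your-index-api.json")` returns the service account email, which is the account that needs access to the site.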
-
requirements.txt
beautifulsoup4==4.11.1
bs4==0.0.1
cachetools==5.2.0
certifi==2022.6.15
charset-normalizer==2.1.0
google-api-core==2.8.2
google-api-python-client==2.55.0
google-auth-httplib2==0.1.0
google-auth==2.9.1
googleapis-common-protos==1.56.4
httplib2==0.20.4
idna==3.3
oauth2client==4.1.3
pip==21.3.1
protobuf==4.21.4
pyasn1-modules==0.2.8
pyasn1==0.4.8
pyparsing==3.0.9
requests==2.28.1
rsa==4.9
setuptools==60.2.0
six==1.16.0
soupsieve==2.3.2.post1
uritemplate==4.1.1
urllib3==1.26.11
wheel==0.37.1
- Install the dependencies from requirements.txt
pip install -r requirements.txt
-
Remember to change the paths in the code to point to your own JSON key file and your own sitemap.
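One way to avoid editing the script every time is to read these paths from environment variables and fall back to the placeholder values. This is only a sketch of that idea: the function `load_settings` and the variable names `INDEXING_KEY_FILE`, `SITEMAP_URL`, and `SENT_FILE` are made up for illustration and are not used by the script in this post.

```python
import os


def load_settings(env=None):
    """Collect the three paths the script needs, with placeholder defaults."""
    env = os.environ if env is None else env
    return {
        # Path to the service account JSON key file.
        "key_file": env.get("INDEXING_KEY_FILE", "your-index-api.json"),
        # URL of the sitemap to read links from.
        "sitemap_url": env.get("SITEMAP_URL", "https://your-sitemap-url.xml"),
        # Local file that records URLs already submitted.
        "sent_file": env.get("SENT_FILE", "sent.txt"),
    }
```

For example, `load_settings({"SITEMAP_URL": "https://example.com/sitemap.xml"})` overrides only the sitemap URL and keeps the other two defaults.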
-
The code is as follows:
from oauth2client.service_account import ServiceAccountCredentials
import httplib2
import json
import requests as req
from bs4 import BeautifulSoup


def index(url):
    SCOPES = ["https://www.googleapis.com/auth/indexing"]
    ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"
    # your-index-api.json is the private key that you created for your service account.
    JSON_KEY_FILE = "your-index-api.json"
    credentials = ServiceAccountCredentials.from_json_keyfile_name(JSON_KEY_FILE, scopes=SCOPES)
    http = credentials.authorize(httplib2.Http())
    # Build the request body as a JSON string; json.dumps escapes the URL safely.
    # This is a simple update request (URL_UPDATED).
    body = json.dumps({"url": url, "type": "URL_UPDATED"})
    response, content = http.request(ENDPOINT, method="POST", body=body)
    return response


all_link = []
origin_url = 'https://your-sitemap-url.xml'
r = req.get(origin_url)
bs = BeautifulSoup(r.content, 'html.parser')  # parse the sitemap
# Collect every <loc> tag. To filter tags by attributes, see the BeautifulSoup
# documentation; so far I have only tested attrs={'alt': ''}.
hyperlink = bs.find_all(name='loc')
for h in hyperlink:
    all_link.append(h.get_text().strip())
all_link.reverse()

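sent = []
# Read the URLs that were already submitted; on the first run the file does not exist yet.
try:
    with open("sent.txt", "r") as fo:
        print("File name: ", fo.name)
        for line in fo:  # read the file line by line
            line = line.strip()  # strip leading and trailing whitespace
            sent.append(line)  # add each line to the list
            print("Read: %s" % line)
except FileNotFoundError:
    pass
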
for link in all_link:
    if link not in sent:
        print(link)
        res = index(link)
        if res.get("status") == "200":
            with open("sent.txt", 'a+') as f:
                f.write(str(link) + '\n')  # one URL per line
        else:
            # Stop at the first non-200 response and print it for inspection.
            print(res)
            break
    else:
        print(str(link) + ' has already been sent')
        continue
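
The script above reads the sitemap with BeautifulSoup's `html.parser`, which works for simple `<loc>` tags. As an alternative, the same two steps (collect the `<loc>` URLs, then skip the ones already sent) can be sketched with the standard library's `xml.etree.ElementTree`. This is a swapped-in technique rather than what the script does; the function names `extract_locs` and `pending_links` are hypothetical, and the sketch assumes the sitemap uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_locs(xml_text):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    locs = []
    for element in root.iter(SITEMAP_NS + "loc"):
        if element.text and element.text.strip():
            locs.append(element.text.strip())
    return locs


def pending_links(all_links, sent_links):
    """Keep the original order but drop links that were already sent."""
    sent = set(sent_links)
    return [link for link in all_links if link not in sent]


# A tiny sitemap used only to illustrate the two helpers.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>"""

if __name__ == "__main__":
    links = extract_locs(SAMPLE)
    print(pending_links(links, ["https://example.com/a"]))
```

Using a set for the already-sent URLs keeps the membership check fast when sent.txt grows large, while the list comprehension preserves the sitemap order.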