I came up with a quick and dirty solution: use BeautifulSoup to extract the data from the HTML DOM and then use the Google API. This became a bit tricky, as you need to create a service account and get an authentication JSON file for it. In the end, you also need to add the email address of this account to the calendar's “share with” list. You can find the email in the client_email field of the [name-of-the-cred-file].json file.
It may be easier to set this up first: go to the Google service accounts page, create a project, and create a service account in it. Once it is created, open the actions button in the list of service accounts and click “Manage keys”. Generate a key and the credentials key will download automatically as a JSON file. Rename it and place it in the project folder. I’ve used creds.json
Once that is done, you also need to add this service account to your Google Calendar “share with” list. Go to Google Calendar; next to each calendar in the list you can click action > Settings and sharing. In the “Share with specific people” section, enter the email address of the service account you created earlier. That's it, now your service can access your calendar.
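If you are not sure which address to paste into the sharing dialog, a few lines of Python can read it from the key file. This is only a minimal sketch: it assumes the key was saved as creds.json, and the helper name service_account_email is made up for illustration.

```python
import json


def service_account_email(path="creds.json"):
    """Return the service account's email address from its JSON key file."""
    with open(path, encoding="utf-8") as fh:
        key_data = json.load(fh)
    # Service account key files store the address under "client_email".
    return key_data["client_email"]
```

Calling service_account_email() from the project folder should return the address to enter under “Share with specific people”.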
I will share the code below, but before I do that I want to briefly explain how to run it. As I’ve mentioned, I am lazy, and to be honest I would hate to add this as a cron job on my personal computer. So I figured the easiest way to get this running daily is to put the code in an AWS Lambda function and attach an EventBridge (CloudWatch Events) rule to trigger it every day.
Go to AWS > Lambda > Functions and create a function with Python as the runtime. Once the function is created, AWS shows the Function overview, where you can click the “Add trigger” button. Select EventBridge and enter a 1 day schedule expression, rate(1 day), so that it runs daily.
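The same trigger could also be created from code rather than through the console. What follows is a rough sketch under assumptions, not the setup described above: it expects boto3 "events" and "lambda" clients to be passed in, and the rule name, statement ID and function ARN are placeholders.

```python
def schedule_daily(events_client, lambda_client, function_arn,
                   rule_name="daily-calendar-sync"):
    """Create an EventBridge rule that invokes the Lambda once a day."""
    # rate(1 day) is the EventBridge schedule expression for a daily run.
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(1 day)",
        State="ENABLED",
    )
    # Allow EventBridge to invoke the function, for this rule only.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=rule_name + "-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    # Point the rule at the function.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```

With boto3 installed and AWS credentials configured, this would be called as schedule_daily(boto3.client("events"), boto3.client("lambda"), "<your-function-arn>").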
Now you just need to add the code to run this. Again, since I am lazy, I did not set up a CI/CD pipeline but simply zipped the Lambda function and its contents. Go ahead and create a folder lambda-func and store the script and credentials in it. We also need to fetch the dependencies and add them to the package. Create a folder named package:
mkdir package
and run in the terminal:
pip install --target ./package BeautifulSoup4 requests google-api-python-client google-auth-httplib2 google-auth-oauthlib bs4 html5lib lxml
Go into the package folder and zip its contents:
cd package
zip -r ../my-deployment-package.zip .
This creates my-deployment-package.zip in the parent folder. Now go back and add the two remaining files (the script and the credentials file) to the archive:
cd ..
zip my-deployment-package.zip name_of_script.py creds.json
Now all you have to do is go to the page of your Lambda function, click the “Upload from” button, select “.zip file” and then choose the zip you just created. Once it is selected, save it in order to upload it. You should have everything set now. You can use the “Test” button in the Test section to test the function.
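If you would rather not type the zip commands by hand each time, the packaging step can be scripted. The sketch below is one possible way to do it with Python's standard zipfile module; the function name build_package is hypothetical, and the file names in the usage note are simply the ones used in this post.

```python
import os
import zipfile


def build_package(package_dir, extra_files, zip_path):
    """Zip the contents of package_dir plus extra_files at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Dependencies are stored relative to package_dir,
        # like running `zip -r` from inside that folder.
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                full_path = os.path.join(root, name)
                zf.write(full_path, os.path.relpath(full_path, package_dir))
        # The script and the credentials file sit at the top level.
        for path in extra_files:
            zf.write(path, os.path.basename(path))
    return zip_path
```

Run from inside lambda-func, it would be called as build_package("package", ["name_of_script.py", "creds.json"], "my-deployment-package.zip").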
# In order for this to work you need to open a service account and add its email to the Calendar share feature
#
# @author Milos Kovacki
#
import datetime
import re

import requests
import googleapiclient.discovery
from bs4 import BeautifulSoup
from google.oauth2 import service_account

api_key = 'xxXXxXXXxXxxXxx'
calendar_id = '00000000000000000000000000000000000000096@group.calendar.google.com'
SCOPES = [
    'https://www.googleapis.com/auth/calendar',
    'https://www.googleapis.com/auth/admin.directory.resource.calendar',
    'https://www.googleapis.com/auth/calendar.events.readonly',
    'https://www.googleapis.com/auth/calendar.readonly',
]
# Must match the name of the credentials file added to the deployment package
SERVICE_ACCOUNT_FILE = 'creds.json'
credentials = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)
service = googleapiclient.discovery.build('calendar', 'v3', credentials=credentials)

# Fetch events from event site
def fetchEvents():
    events = []
    today = datetime.date.today().strftime("%Y%m%d")
    URL = "http://izlazak.com/component/jem/day/" + today
    resp = requests.get(URL)
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.text, "lxml")
        elements = soup.select('div[id="jem"] form a span')
        jd1 = soup.select('div[id="jem"] form span[class="jem_date-1"]')
        loc = soup.select('div[id="jem"] form div[class="calendar day location"]')
        # jd1 holds two date spans per event (start, end), so dc advances by 2
        dc = 0
        for i, element in enumerate(elements):
            dc = dc + 2 if i > 0 else 0
            if len(jd1) > (dc + 1):
                event_name = element.text
                event_start_date = jd1[dc].text
                event_end_date = jd1[dc + 1].text
                # remove spaces
                event_location = re.sub(r'(^[ \t]+|[ \t]+(?=:))', '', loc[i].text.strip(), flags=re.M)
                event_location = event_location.replace("\xa0\xa0\t\t\t\t\t\t\t\n\n\n", "")
                events.append({'name': event_name, 'start': event_start_date,
                               'end': event_end_date, 'loc': event_location})
    return events

# Fetch already existing calendar events so that we can check which scraped
# events have not been added to the calendar yet
def fetchCalendarEvents():
    # Start of today in UTC; 'Z' indicates UTC time
    today_start = datetime.datetime.utcnow().replace(
        hour=0, minute=0, second=0, microsecond=0).isoformat() + 'Z'
    # timeMin limits the list to events ending after the start of today
    events_result = service.events().list(calendarId=calendar_id,
                                          timeMin=today_start,
                                          maxResults=150, singleEvents=True,
                                          orderBy='startTime').execute()
    events = events_result.get('items', [])
    if not events:
        print('No upcoming events found.')
    return events

# Adds event to the calendar
def addEventToCalendar(event):
    # The site only gives dates (dd.mm.yy), so fixed hours are used: 19:00 to 20:00
    start_date = datetime.datetime.strptime(event['start'],
                                            '%d.%m.%y').replace(minute=0,
                                                                hour=19)
    end_date = datetime.datetime.strptime(event['end'],
                                          '%d.%m.%y').replace(minute=0,
                                                              hour=20)
    new_event = {
        'summary': event['name'],
        'location': event['loc'],
        'description': '',
        'start': {
            # The trailing 'Z' marks the time as UTC
            'dateTime': start_date.isoformat() + 'Z',
            'timeZone': 'Europe/Berlin',
        },
        'end': {
            'dateTime': end_date.isoformat() + 'Z',
            'timeZone': 'Europe/Berlin',
        },
        'recurrence': [
        ],
        'attendees': [
        ],
        'reminders': {
            'useDefault': True,
        },
    }
    print('Adding event ' + event['name'])
    event_result = service.events().insert(calendarId=calendar_id, body=new_event).execute()

def lambda_handler(event, context):
    # Firstly fetch events
    current_events = fetchEvents()
    calendar_events = fetchCalendarEvents()
    added_events = set()
    # compile a set of event names already in the calendar
    for ce in calendar_events:
        # not every calendar event has a summary, so use .get()
        added_events.add(ce.get('summary'))
    # add events not in calendar
    for ce in current_events:
        if ce['name'] not in added_events:
            addEventToCalendar(ce)
    print('Done')
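
To try the date conversion and the duplicate check without touching the website or the calendar, the two steps can be pulled out into small standalone functions. This is only an illustrative sketch: the helper names to_calendar_time and missing_events and the sample events are invented, and the logic mirrors the script above (dd.mm.yy dates, matching by event name only).

```python
import datetime


def to_calendar_time(date_text, hour):
    """Convert a scraped 'dd.mm.yy' date into the string sent as dateTime."""
    moment = datetime.datetime.strptime(date_text, "%d.%m.%y").replace(hour=hour, minute=0)
    return moment.isoformat() + "Z"


def missing_events(scraped, calendar_items):
    """Return scraped events whose name is not yet a calendar summary."""
    known = {item.get("summary") for item in calendar_items}
    return [event for event in scraped if event["name"] not in known]


# Invented sample data, shaped like the output of the two fetch functions.
scraped = [
    {"name": "Jazz Night", "start": "24.12.23", "end": "24.12.23", "loc": "Club A"},
    {"name": "Rock Show", "start": "25.12.23", "end": "26.12.23", "loc": "Club B"},
]
calendar_items = [{"summary": "Jazz Night"}]

print(to_calendar_time("24.12.23", 19))  # 2023-12-24T19:00:00Z
print([e["name"] for e in missing_events(scraped, calendar_items)])  # ['Rock Show']
```

Because matching is by name only, two different events that share a title would be treated as the same one.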
I hope you find this at least a bit useful :)