<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Techsalad]]></title><description><![CDATA[Senior DevSecOps Engineer]]></description><link>https://blog.olaibeun.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1768929454075/f048462c-f85d-4280-a744-7d133dd5b4a8.png</url><title>Techsalad</title><link>https://blog.olaibeun.com</link></image><generator>RSS for Node</generator><lastBuildDate>Thu, 30 Apr 2026 13:21:48 GMT</lastBuildDate><atom:link href="https://blog.olaibeun.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Blogging 101]]></title><description><![CDATA[In recent years, weblogging popularly known as blogging has been a means for writers to create a relationship with their readers by self-publishing articles, photography, and other media contents online. 
The need to get feedback from these self-publ...]]></description><link>https://blog.olaibeun.com/blogging-101</link><guid isPermaLink="true">https://blog.olaibeun.com/blogging-101</guid><category><![CDATA[what successful blogging means to me]]></category><category><![CDATA[Developer Blogging]]></category><category><![CDATA[Programming Blogs]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[#growth]]></category><dc:creator><![CDATA[Oladipupo Joseph Ibeun]]></dc:creator><pubDate>Sat, 14 Nov 2020 15:09:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1605366416059/NTto-I0tD.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In recent years, weblogging, popularly known as blogging, has been a means for writers to build a relationship with their readers by self-publishing articles, photography, and other media content online.</p>
<p>The need to get feedback on these self-published articles has created an evaluation bottleneck for beginners in blogging. A common misconception is that your blog will blow up as soon as you publish your first article. Although this is a possibility, it’s rarely the case.</p>
<p>I started blogging this year, a few months after I got into the world of data science. My primary goal is to share the knowledge I have acquired and make it easier for my readers to get things done faster than I was able to do them.</p>
<p>In this article, I’ll share “what successful blogging means to me” as a beginner in blogging. I’ll break successful blogging down into two categories:</p>
<ul>
<li><strong>Self-satisfaction</strong>: If I study a concept, solve a problem or create a project, and I write an article that demystifies the work done to the level of a 5-year-old’s intuition, then I consider that a success. Before I start thinking about the numbers my blog returns, I want to be sure that every concept and tool in that blog post is structured in the most basic and understandable way. I believe that if I can achieve this level of demystification in every blog post, my audience will grow organically. </li>
</ul>
<blockquote>
<p>While you are here, check out my article on <a target="_blank" href="https://oladipupoibeun.hashnode.dev/automate-log-file-clearing">automation using the Python programming language.</a> It's a fun and work salad 😊</p>
</blockquote>
<p>The second and last category is:</p>
<ul>
<li><strong>One-feedback rule</strong>: A blog post is meant to communicate a concept to its readers. Communication in itself is not complete without feedback. I believe a blog post that’s well written will connect with at least one reader. If that one reader gives feedback in any form to signify that he/she has gotten value from the time invested in my article, then that’s a success!
Having said these few things, it’s important to note that developing your writing skills will have a positive impact on the traffic you generate for your blog.
I hope you’ve learned a thing or two from this post. Do leave a comment if you feel obliged.</li>
</ul>
<blockquote>
<p>Knowledge nugget: while writing this blog post, I discovered a little feature in Microsoft Word 2016 that inserts a border line at the current cursor position when you type three or more asterisks and press Enter. If you did not know that, you’re welcome 😊</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Automate log file clearing]]></title><description><![CDATA[Repetitive daily activities can get cumbersome and tiring. Recently, I found myself trying to navigate through all the folders in my personal computer searching for log files and deleting them one after the other. This process is tiring. For this rea...]]></description><link>https://blog.olaibeun.com/automate-log-file-clearing</link><guid isPermaLink="true">https://blog.olaibeun.com/automate-log-file-clearing</guid><category><![CDATA[automation]]></category><category><![CDATA[Python]]></category><category><![CDATA[Script]]></category><category><![CDATA[#⛺the-technical-writing-bootcamp]]></category><dc:creator><![CDATA[Oladipupo Joseph Ibeun]]></dc:creator><pubDate>Fri, 13 Nov 2020 23:26:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1605367251779/ZX6JSbmY3.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Repetitive daily activities can get cumbersome and tiring. Recently, I found myself navigating through all the folders on my personal computer, searching for log files and deleting them one after the other. This process is tiring. For this reason, I wrote a Python script that does all my log clearing.</p>
<blockquote>
<p>Before we go on, if you are reading this and have always wondered what web scraping is and want to learn how to scrape a website, my article on <a target="_blank" href="https://oladipupoibeun.hashnode.dev/5-steps-to-easy-web-scraping">5 steps to easy web scraping</a> is for you. Check it out!</p>
</blockquote>
<p>The solution implemented in this article can be found here: <a target="_blank" href="https://github.com/josephdickson11/python_utilities">GitHub repo</a></p>
<h3 id="tools-used">Tools used</h3>
<ol>
<li>Python programming language</li>
</ol>
<h3 id="packages-used">Packages used</h3>
<ol>
<li>os.path: to access directories and files</li>
<li>datetime: to get the date and time the files were last modified</li>
<li>beepy: to play user notification alerts</li>
</ol>
<h3 id="solution-process">Solution process</h3>
<p>To solve the problem effectively, every step to be executed is wrapped in a function. I will explain each function in this section. If you would rather skip the explanation and review the implemented solution, you can find the <a target="_blank" href="https://github.com/josephdickson11/python_utilities">GitHub link here</a>.</p>
<ol>
<li>success_alert: to alert the user if the task was completed successfully<pre><code><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">success_alert</span><span class="hljs-params">()</span></span>:
 beepy.beep(<span class="hljs-number">6</span>)
</code></pre></li>
<li>error_alert: to alert the user if there was an error in execution<pre><code><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">error_alert</span><span class="hljs-params">()</span></span>:
 beepy.beep(<span class="hljs-number">2</span>)
</code></pre></li>
<li>check_path: to check if the path specified by the user exists on the personal computer<pre><code><span class="hljs-comment"># create a function to check if log_directory path exists</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">check_path</span>(<span class="hljs-params">path</span>):</span>
 <span class="hljs-keyword">if</span> os.path.exists(path):
     print(<span class="hljs-string">"file path exists"</span>)
 <span class="hljs-keyword">else</span>:
     error_alert()
     print(<span class="hljs-string">f'<span class="hljs-subst">{path}</span> does not exist'</span>)
</code></pre></li>
<li>confirm_directory: to check if the existing path is a directory or not. This is because the scripts require a directory in which it loops the directory to select specific files<pre><code><span class="hljs-comment"># create function to check if path is a directory</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">confirm_directory</span>(<span class="hljs-params">path</span>):</span>
 <span class="hljs-keyword">if</span> os.path.isdir(path):
     print(<span class="hljs-string">f'<span class="hljs-subst">{path}</span> is a directory, we are good to go'</span>)
 <span class="hljs-keyword">else</span>:
     error_alert()
     print(<span class="hljs-string">f'<span class="hljs-subst">{path}</span> is not a directory, change path and try again'</span>)
</code></pre></li>
<li><p>create_file_path: check for sub folders and iterate through them to create file_path and delete all log files created today.</p>
<pre><code><span class="hljs-comment"># check for sub folders and iterate through them to create file_path and delete all log files created today</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_file_path</span>(<span class="hljs-params">path</span>):</span>
 <span class="hljs-keyword">for</span> base_folder, folder, files <span class="hljs-keyword">in</span> os.walk(path):
     <span class="hljs-comment"># check for files</span>
     <span class="hljs-keyword">for</span> file <span class="hljs-keyword">in</span> files:
         <span class="hljs-comment"># create file_path</span>
         file_path = os.path.join(base_folder, file)

         <span class="hljs-comment"># getting file extension</span>
         file_extension = os.path.splitext(file_path)[<span class="hljs-number">1</span>]

         <span class="hljs-comment"># compare file extension with log_extension</span>
         <span class="hljs-keyword">if</span> log_extension == file_extension:
             <span class="hljs-comment"># check file properties for set condition</span>
             <span class="hljs-comment"># get date file was last modified</span>
              timestamp = date.fromtimestamp(os.stat(file_path).st_mtime)

             <span class="hljs-keyword">if</span> date.today() == timestamp:
                  <span class="hljs-keyword">try</span>:
                      os.remove(file_path)
                      success_alert()
                      print(<span class="hljs-string">f'<span class="hljs-subst">{file_path}</span> removed successfully'</span>)
                  <span class="hljs-keyword">except</span> OSError:
                      print(<span class="hljs-string">f'Unable to delete <span class="hljs-subst">{file_path}</span>'</span>)
         <span class="hljs-keyword">else</span>:
             print(<span class="hljs-string">f'<span class="hljs-subst">{file_path}</span> is not a log file'</span>)
</code></pre></li>
</ol>
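<p>Putting it all together, the functions above can be driven from a single entry point. The sketch below is a minimal, self-contained variant: the <code>beepy</code> alerts are replaced with plain prints so it runs without third-party packages, and the <code>.log</code> extension is an assumed value.</p>

```python
import os
from datetime import date

log_extension = '.log'  # assumed target extension


def success_alert():
    print('task completed successfully')  # stand-in for beepy.beep(6)


def error_alert():
    print('an error occurred')  # stand-in for beepy.beep(2)


def clear_logs(path):
    # validate the input path before walking the tree
    if not os.path.exists(path):
        error_alert()
        print(f'{path} does not exist')
        return
    if not os.path.isdir(path):
        error_alert()
        print(f'{path} is not a directory, change path and try again')
        return
    for base_folder, _, files in os.walk(path):
        for file in files:
            file_path = os.path.join(base_folder, file)
            # compare the file extension with log_extension
            if os.path.splitext(file_path)[1] == log_extension:
                # only remove log files that were last modified today
                modified = date.fromtimestamp(os.stat(file_path).st_mtime)
                if modified == date.today():
                    try:
                        os.remove(file_path)
                        success_alert()
                        print(f'{file_path} removed successfully')
                    except OSError:
                        print(f'Unable to delete {file_path}')
```

<p>A call such as <code>clear_logs('/path/to/logs')</code> then performs the whole check-validate-delete cycle in one go.</p>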
<p>Usage: to make use of the solution, simply download and install Python from this <a target="_blank" href="https://www.python.org/downloads/">link</a>, run <code>pip install -r requirements.txt</code>, and run the Python script available in the GitHub repo provided <a target="_blank" href="https://github.com/josephdickson11/python_utilities">here</a>.</p>
<blockquote>
<p>Furthermore: feel free to contribute your own Python utility scripts to this <a target="_blank" href="https://github.com/josephdickson11/python_utilities">repo</a>; let's build a community together. I look forward to your pull requests!</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[5 Steps to easy web Scraping]]></title><description><![CDATA[What is web scraping? How do I scrape a webpage? What are the tools used for scraping a webpage? These are questions that have been asked by aspiring data analysts, data scientists, data engineers and data practitioners. This article will provide eas...]]></description><link>https://blog.olaibeun.com/5-steps-to-easy-web-scraping</link><guid isPermaLink="true">https://blog.olaibeun.com/5-steps-to-easy-web-scraping</guid><dc:creator><![CDATA[Oladipupo Joseph Ibeun]]></dc:creator><pubDate>Tue, 08 Sep 2020 15:11:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1605367382373/teVxH2wuQ.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>What is web scraping? How do I scrape a webpage? What are the tools used for scraping a webpage? These are questions that have been asked by aspiring data analysts, data scientists, data engineers and data practitioners. This article will provide easy, functional and actionable explanations of what web scraping is and how to perform web scraping in 5 easy steps in Python.</p>
<p>Let’s get started:</p>
<h2 id="what-is-web-scraping"><strong>What is web scraping?</strong></h2>
<p>Web scraping refers to the process of retrieving data that is stored on websites. If you have ever gone to a website to collect data about a particular topic, you have scraped the web. However, manually going to a search engine to download all the results you got from your “images of a cat” query can be extremely tedious and time consuming. The process you used to collect this data can be automated for when you need to collect large amounts of data, like the results of your “images of a cat” query.</p>
<p>Web scraping is the name given to the process of automating the manual task of retrieving information from a website. Web scraping is perceived to be a daunting task that can only be performed by senior programmers through countless lines of code; this assumption is not accurate. Although scraping a website requires you to have a coding background, the task is relatively easy with tools like Python, Requests, BeautifulSoup, Selenium and many others. Textual data, image files and video files can all be retrieved through web scraping.</p>
<p>Now that we know what web scraping is, how exactly do we scrape a webpage?</p>
<p>Note: the code used in this article is provided <a target="_blank" href="https://github.com/josephdickson11/web_scrapers.git">here</a>.</p>
<h2 id="how-to-scrape-a-webpage"><strong>How to scrape a webpage?</strong></h2>
<p>These are 5 easy steps to scrape a webpage using python:</p>
<p><strong>1. Identify desired webpage:</strong> Identifying a web page that contains the data one wants to retrieve is the first step in web scraping. For example, if you want to retrieve ratings and genre of movies released from the year 2018–2020, you can get this data from the IMDB website. Head-on to the site, set the filters on the search query to the desired output and the web engine will display all desired data. Where do you go from here?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1599576632053/u_MhBCY4k.png" alt="Screenshot (48).png" /></p>
<p><strong>2. Study the structure of the webpage:</strong> In this step, you will need to view the source code of the page and study the HTML elements. Modern web browsers have built-in tools that enable users to inspect the internals of a webpage. In Chrome, you can simply right-click anywhere on the webpage and click “view page source” or “inspect”; this will launch the developer console, giving you access to the webpage’s underlying structure.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1599577183049/-CTkefFeH.png" alt="Screenshot (49).png" /></p>
<p><strong>3. Locate data and HTML element tree:</strong> After carefully studying the structure of the webpage, you should be able to identify the location of the actual data you wish to retrieve. Another way to do this is to simply move the cursor to the position of your desired data, right-click and click “inspect”; this will take you directly to the containing HTML element tag. Now that we have located the desired data, we can proceed to write the code that retrieves it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1599577185390/FvdTWOT6x.png" alt="Screenshot (51).png" /></p>
<p><strong>4. Write the code to retrieve the data:</strong> You are almost done. Now automate all the steps from one through three. In order to achieve this, one has to be familiar with the tools used for web scraping;</p>
<h3 id="tools-for-web-scraping">Tools for web scraping</h3>
<p>There are several Python modules that can be used for scraping websites. These modules include Requests, BeautifulSoup, Selenium, Scrapy and many others. For the purpose of this article, the focus will be on Requests and BeautifulSoup as they are both sufficient to achieve our aim.</p>
<ul>
<li><p>Requests: Requests is a Python module that enables users to send HTTP requests such as GET and POST using Python code. The module can be installed with the command pip install requests and offers features such as sessions with cookies, multipart file uploads, streaming downloads, connection timeouts and many more.</p>
</li>
<li><p>BeautifulSoup: BeautifulSoup is a Python module that parses the HTML and XML documents retrieved through the Requests get method. The module needs to be installed with the command pip install beautifulsoup4 before it can be used.  <a target="_blank" href="https://github.com/josephdickson11/web_scrapers.git">Some of the offered features can be seen here</a> </p>
</li>
</ul>
<p>Now that we have the tools, we can put them to use and write the code!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1599577601357/wH7fHhKeB.png" alt="Screenshot (52).png" /></p>
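<p>The extraction step can be sketched as follows. To keep the example self-contained, a hard-coded HTML snippet stands in for the page source that <code>requests.get(url).text</code> would return, and the element classes used here are hypothetical, not IMDB's actual markup:</p>

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Stand-in for the HTML a requests.get(url).text call would return.
# The element structure and class names are hypothetical.
html = """
<div class="lister-item">
  <h3 class="lister-item-header">Movie A</h3>
  <span class="genre">Drama</span>
  <span class="rating">7.4</span>
</div>
<div class="lister-item">
  <h3 class="lister-item-header">Movie B</h3>
  <span class="genre">Comedy</span>
  <span class="rating">6.9</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# locate every movie container and pull out the fields we care about
movies = []
for item in soup.find_all("div", class_="lister-item"):
    movies.append({
        "title": item.find("h3").get_text(strip=True),
        "genre": item.find("span", class_="genre").get_text(strip=True),
        "rating": float(item.find("span", class_="rating").get_text(strip=True)),
    })
```

<p>The same find_all/find pattern applies once you swap in the real tags and classes you identified in steps 2 and 3.</p>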
<p><strong>5. Store retrieved data:</strong> Now that you have the data, it can be stored in any desired format. For the purpose of this article, the data will be stored in a pandas dataframe.</p>
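<p>The storage step can be sketched like this, assuming the records extracted in step 4 come out as a list of dictionaries (the sample values below are hypothetical):</p>

```python
import os
import tempfile

import pandas as pd  # pip install pandas

# hypothetical records as produced by the extraction step
movies = [
    {"title": "Movie A", "genre": "Drama", "rating": 7.4},
    {"title": "Movie B", "genre": "Comedy", "rating": 6.9},
]

# load the records into a DataFrame for analysis or export
df = pd.DataFrame(movies)

# persist to disk in CSV format for later use
csv_path = os.path.join(tempfile.gettempdir(), "movies.csv")
df.to_csv(csv_path, index=False)
```

<p>From the DataFrame you can just as easily export to JSON, Excel or a database, depending on where the data goes next.</p>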
<p>Feel free to give feedback about the article. Questions about ambiguous concepts and suggestions will be highly appreciated.</p>
]]></content:encoded></item></channel></rss>