Python Pandas - Get the current date and time from Timestamp object

 


 Chapter -1

 How to get the current date and time using Python and Pandas

```
from datetime import datetime

import pandas as pd

# datetime object containing the current date and time
now = datetime.now()

print("now =", now)

# Format as dd/mm/YYYY HH:MM:SS
dt_string = now.strftime("%d/%m/%Y %H:%M:%S")
print("date and time =", dt_string)

# The single date string is repeated for every entry in 'value'
df = pd.DataFrame({'date': dt_string,
                   'value': [2, 3, 4, 5]})
df
```
Output

```
                  date  value
0  05/05/2022 05:46:09      2
1  05/05/2022 05:46:09      3
2  05/05/2022 05:46:09      4
3  05/05/2022 05:46:09      5
```

To get the current date and time from a Timestamp object, use the timestamp.today() method.


First, import the required libraries:

import pandas as pd
import datetime

Create the timestamp in Pandas

timestamp = pd.Timestamp(datetime.datetime(2021, 10, 10))

Display the Timestamp

print("Timestamp: ", timestamp)

Getting the current date and time

res = timestamp.today()

Example

Following is the code

import pandas as pd
import datetime

# set the timestamp in Pandas
timestamp = pd.Timestamp(datetime.datetime(2021, 10, 10))

# display the Timestamp
print("Timestamp: ", timestamp)

# display the day from given timestamp
print("Day Name:", timestamp.day_name())

# getting the current date and time
res = timestamp.today()

# display the current date and time
print("\nToday's Date and time...\n", res)

# display the current day
print("Today's Day:", res.day_name())

Output

This will produce the following output:

Timestamp: 2021-10-10 00:00:00
Day Name: Sunday

Today's Date and time...
2021-10-03 13:22:28.891506
Today's Day: Sunday
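
Since today() is a classmethod, the current date and time can also be obtained without creating a Timestamp instance first. The following is a minimal sketch under that assumption; the variable names are illustrative and not part of the original example.

```python
import pandas as pd

# today() and now() are classmethods, so no existing Timestamp is needed
current = pd.Timestamp.today()
also_current = pd.Timestamp.now()

# normalize() resets the time part to midnight, leaving only the date
midnight = current.normalize()

print("Current date and time:", current)
print("Date only:", midnight.date())
print("Day name:", current.day_name())
```

Calling today() on an existing instance, as in the example above, ignores the date stored in that instance and returns the current local time.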


Chapter -2

Working with datetime in Pandas DataFrame

Some Pandas tricks to help you get started with data analysis


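Datetime is a common data type in data science projects, and working with it often leads to problems. I have found Pandas to be an amazing library with extensive features for working with dates and times.

This article covers the following common datetime problems and should help you get started with data analysis.

  1. Convert strings to datetime
  2. Assemble a datetime from multiple columns
  3. Get year, month, and day
  4. Get the week of year, the day of week and leap year
  5. Get the age from the date of birth
  6. Improve performance by setting date column as the index
  7. Select data with a specific year and perform aggregation
  8. Select data with a specific month and a specific day of the month
  9. Select data between two dates
  10. Handle missing values

Please check out my GitHub repo for the source code.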

1. Convert strings to datetime

Pandas has a built-in function called to_datetime() that can be used to convert strings to datetime. Let’s take a look at some examples.

With default arguments

Pandas to_datetime() is able to parse any valid date string to datetime without any additional arguments. For example:

df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'])
df

Day first format

By default, to_datetime() parses strings in month-first format (MM/DD, MM DD, or MM-DD), an arrangement that is relatively unique to the United States.

In most of the rest of the world, the day is written first (DD/MM, DD MM, or DD-MM). If you would like Pandas to consider day first instead of month, you can set the argument dayfirst to True.

df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df
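
To make the difference concrete, the same string can be parsed both ways. This is a small sketch on a single value; the variable names are chosen here for illustration only.

```python
import pandas as pd

raw = "3/10/2000"

# Default: month first, so this is read as 10 March 2000
month_first = pd.to_datetime(raw)

# dayfirst=True: day first, so this is read as 3 October 2000
day_first = pd.to_datetime(raw, dayfirst=True)

print(month_first.date(), day_first.date())
```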

Alternatively, you can pass a custom format to the format argument.

Custom format

By default, strings are parsed with the Pandas built-in parser, which relies on dateutil.parser.parse. Sometimes, your strings might be in a custom format, for example, YYYY-DD-MM HH:MM:SS. Pandas to_datetime() has an argument called format that allows you to pass a custom format:

df = pd.DataFrame({'date': ['2016-6-10 20:30:0',
                            '2016-7-1 19:45:30',
                            '2013-10-12 4:5:1'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'], format="%Y-%d-%m %H:%M:%S")
df
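
As a quick check of how the format string maps onto the input, here is a minimal sketch with two made-up strings in YYYY-DD-MM order (the sample values are assumptions, not data from the article):

```python
import pandas as pd

# Strings written as YYYY-DD-MM HH:MM:SS (day before month)
raw = pd.Series(["2016-25-12 20:30:00", "2016-01-07 19:45:30"])

# %d comes before %m in the format, matching the order in the strings
parsed = pd.to_datetime(raw, format="%Y-%d-%m %H:%M:%S")

print(parsed)
```

The first value becomes 25 December 2016 and the second 1 July 2016, which confirms that the day is read before the month.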

Speed up parsing with infer_datetime_format

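Passing infer_datetime_format=True can often speed up parsing if the strings are not exactly in ISO 8601 format but are in a regular format. According to [1], in some cases, this can increase the parsing speed by 5–10x. Note that this argument is deprecated since pandas 2.0, where a strict version of this inference is the default.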

# Make up 3000 rows
df = pd.DataFrame({'date': ['3/11/2000', '3/12/2000', '3/13/2000'] * 1000 })
%timeit pd.to_datetime(df['date'], infer_datetime_format=True)
100 loops, best of 3: 10.4 ms per loop
%timeit pd.to_datetime(df['date'], infer_datetime_format=False)
1 loop, best of 3: 471 ms per loop

Handle parsing error

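You will end up with an error if a date string does not match the expected timestamp format. The exact exception depends on the Pandas version: a TypeError as described here for older releases, or a ValueError in recent ones.

df = pd.DataFrame({'date': ['3/10/2000', 'a/11/2000', '3/12/2000'],
                   'value': [2, 3, 4]})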
df['date'] = pd.to_datetime(df['date'])

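to_datetime() has an argument called errors that allows you to ignore the error or force an invalid value to NaT. Note that errors='ignore' is deprecated since pandas 2.2; in newer versions, catch the exception explicitly instead.

# Return the input unchanged if parsing fails
df['date'] = pd.to_datetime(df['date'], errors='ignore')
df
# Replace values that cannot be parsed with NaT
df['date'] = pd.to_datetime(df['date'], errors='coerce')
df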
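
The two strategies can be sketched side by side as follows. This is an illustrative example, not the article's original code: it catches both ValueError and TypeError because the exception type differs between Pandas versions, and the names parse_failed and bad_rows are hypothetical.

```python
import pandas as pd

raw = pd.Series(["3/10/2000", "a/11/2000", "3/12/2000"])

# Option 1: catch the parsing error explicitly
try:
    pd.to_datetime(raw)
    parse_failed = False
except (ValueError, TypeError) as exc:
    parse_failed = True
    print("Parsing failed:", exc)

# Option 2: coerce unparseable values to NaT, then inspect them
coerced = pd.to_datetime(raw, errors="coerce")
bad_rows = raw[coerced.isna()]
print(bad_rows)
```

Looking at the rows that became NaT is often a convenient way to find the offending input before deciding how to clean it.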

In addition, if you would like to parse date columns when reading data from a CSV file, you can pass the parse_dates argument to read_csv(), as in the city_sales.csv example in section 6.

2. Assemble a datetime from multiple columns

to_datetime() can be used to assemble a datetime from multiple columns as well. The keys (column labels) can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same.

df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5]})
df['date'] = pd.to_datetime(df)
df
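
Time components can be assembled in the same way. The sketch below adds hour and minute columns to the example; the values are made up for illustration.

```python
import pandas as pd

parts = pd.DataFrame({
    "year": [2015, 2016],
    "month": [2, 3],
    "day": [4, 5],
    "hour": [9, 18],
    "minute": [30, 0],
})

# year, month, and day are required; the time columns are optional
assembled = pd.to_datetime(parts)
print(assembled)
```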

3. Get year, month, and day

dt.year, dt.month, and dt.day are the built-in attributes to get the year, month, and day from a Pandas datetime column.

First, let’s create a dummy DataFrame and parse DoB to datetime.

df = pd.DataFrame({'name': ['Tom', 'Andy', 'Lucas'],
                   'DoB': ['08-05-1997', '04-28-1996', '12-16-1995']})
df['DoB'] = pd.to_datetime(df['DoB'])

And to get year, month, and day:

df['year']= df['DoB'].dt.year
df['month']= df['DoB'].dt.month
df['day']= df['DoB'].dt.day
df

4. Get the week of year, the day of week and leap year

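Similarly, dt.isocalendar().week, dt.dayofweek, and dt.is_leap_year return the week of year, the day of week, and whether the year is a leap year. (The older dt.week attribute was deprecated in pandas 1.1 and removed in 2.0.)

df['week_of_year'] = df['DoB'].dt.isocalendar().week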
df['day_of_week'] = df['DoB'].dt.dayofweek
df['is_leap_year'] = df['DoB'].dt.is_leap_year
df

Note that the Pandas dt.dayofweek attribute (also available as dt.weekday) returns the day of the week as a number. The week is assumed to start on Monday, denoted by 0, and to end on Sunday, denoted by 6. To replace the number with the full name, we can create a mapping and pass it to map():

dw_mapping = {
    0: 'Monday',
    1: 'Tuesday',
    2: 'Wednesday',
    3: 'Thursday',
    4: 'Friday',
    5: 'Saturday',
    6: 'Sunday'
}

df['day_of_week_name']=df['DoB'].dt.weekday.map(dw_mapping)
df
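
As an alternative to a hand-written mapping, the dt accessor also offers day_name(), the Series counterpart of the Timestamp method used in Chapter 1. The sketch below compares the two approaches on the same birth dates, written in ISO format here to avoid any parsing ambiguity.

```python
import pandas as pd

dob = pd.Series(pd.to_datetime(["1997-08-05", "1996-04-28", "1995-12-16"]))

# day_name() returns the full weekday name directly
names = dob.dt.day_name()

# The same result via an explicit number-to-name mapping
dw_mapping = dict(enumerate(
    ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
))
mapped = dob.dt.dayofweek.map(dw_mapping)

print(names.tolist())
```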

5. Get the age from the date of birth

The simplest solution to get age is by subtracting year:

today = pd.to_datetime('today')
df['age'] = today.year - df['DoB'].dt.year
df

However, this is not accurate, because some people might not have had their birthday yet this year. A more accurate solution takes the birthday into account:

# Year difference
today = pd.to_datetime('today')
diff_y = today.year - df['DoB'].dt.year
# Haven't had birthday
b_md = df['DoB'].apply(lambda x: (x.month,x.day) )
no_birthday = b_md > (today.month,today.day)
df['age'] = diff_y - no_birthday
df
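
The same idea can also be written with vectorized comparisons instead of tuples. The following is a sketch only: age_on is a hypothetical helper, not part of Pandas, and a fixed reference date is used so that the result is reproducible.

```python
import pandas as pd


def age_on(dob: pd.Series, ref: pd.Timestamp) -> pd.Series:
    """Age in whole years on the reference date (hypothetical helper)."""
    had_birthday = (dob.dt.month < ref.month) | (
        (dob.dt.month == ref.month) & (dob.dt.day <= ref.day)
    )
    # Subtract one year for anyone whose birthday is still to come
    return ref.year - dob.dt.year - (~had_birthday).astype(int)


dob = pd.Series(pd.to_datetime(["1997-08-05", "1996-04-28", "1995-12-16"]))
ages = age_on(dob, pd.Timestamp("2022-05-05"))
print(ages.tolist())  # [24, 26, 26]
```

Passing pd.Timestamp.today() as the reference date gives the current age.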

6. Improve performance by setting date column as the index

A common solution to select data by date is using a boolean mask. For example:

condition = (df['date'] > start_date) & (df['date'] <= end_date)
df.loc[condition]

This solution normally requires start_date, end_date, and the date column to be in datetime format. It is also slow when you are doing a lot of selections by date in a large dataset.

If you are going to do a lot of selections by date, it is faster to set the date column as the index first so that you take advantage of the Pandas built-in optimization. Then, you can select data by date using df.loc[start_date:end_date]. Let’s take a look at an example dataset, city_sales.csv, which has 1,795,144 rows of data.

df = pd.read_csv('data/city_sales.csv',parse_dates=['date'])
df.info()
RangeIndex: 1795144 entries, 0 to 1795143
Data columns (total 3 columns):
# Column Dtype
--- ------ -----
0 date datetime64[ns]
1 num int64
2 city object
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 41.1+ MB

To set the date column as the index

df = df.set_index(['date'])
df
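
To illustrate that the two selection styles are interchangeable, here is a minimal sketch on a small synthetic frame (a stand-in for city_sales.csv, which is not available here). Note that a label slice on a DatetimeIndex includes both endpoints, so the mask in this sketch uses >= and <= to match it.

```python
import pandas as pd

# Small synthetic stand-in for the sales data (assumed values)
demo = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=10, freq="D"),
    "num": range(10),
})
start, end = pd.Timestamp("2018-01-03"), pd.Timestamp("2018-01-06")

# Boolean mask on a regular column
mask = (demo["date"] >= start) & (demo["date"] <= end)
by_mask = demo.loc[mask]

# Label slice on a sorted DatetimeIndex (both endpoints included)
indexed = demo.set_index("date").sort_index()
by_slice = indexed.loc[start:end]

print(by_mask["num"].tolist(), by_slice["num"].tolist())
```

Sorting the index is not strictly needed for this data, but slicing is most reliable on a sorted DatetimeIndex.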

7. Select data with a specific year and perform aggregation

Let’s say we would like to select all data in the year 2018

df.loc['2018']

And to perform aggregation on the selection, for example:

Get the total num in 2018

df.loc['2018','num'].sum()
1231190

Get the total num for each city in 2018

df.loc['2018'].groupby('city').sum()
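
Here is a self-contained sketch of the same selection and aggregation on a tiny made-up dataset, so the numbers can be verified by hand. The frame named sales and its values are assumptions for illustration.

```python
import pandas as pd

idx = pd.to_datetime(["2017-12-31", "2018-01-15", "2018-06-01", "2019-01-01"])
sales = pd.DataFrame(
    {"num": [1, 2, 3, 4], "city": ["A", "A", "B", "B"]},
    index=idx,
)

# Partial string indexing: every row whose timestamp falls in 2018
total_2018 = sales.loc["2018", "num"].sum()

# Per-city totals for the same year
per_city = sales.loc["2018"].groupby("city")["num"].sum()

print(total_2018)
print(per_city)
```

Only the two rows dated in 2018 contribute, so the total is 5, split into 2 for city A and 3 for city B.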

8. Select data with a specific month and a specific day of the month

To select data with a specific month, for example, May 2018

df.loc['2018-5']

Similarly, to select data with a specific day of the month, for example, 1st May 2018

df.loc['2018-5-1']

9. Select data between two dates

To select data between two dates, you can use df.loc[start_date:end_date]. For example:

Select data between 2016 and 2018

df.loc['2016' : '2018']

Select data between 10 and 11 o'clock on the 2nd May 2018

df.loc['2018-5-2 10' : '2018-5-2 11' ]

Select data between 10:30 and 10:45 on the 2nd May 2018

df.loc['2018-5-2 10:30' : '2018-5-2 10:45' ]

To select data between two times of day regardless of the date, use between_time(), for example, between 10:30 and 10:45:

df.between_time('10:30','10:45')
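
The difference between slicing with .loc and using between_time() shows up as soon as the data spans more than one day. The sketch below uses a synthetic 15-minute index covering two days; the frame ts is an assumption for illustration.

```python
import pandas as pd

# One row every 15 minutes, from 10:00 on 2 May to 11:00 on 3 May 2018
idx = pd.date_range("2018-05-02 10:00", "2018-05-03 11:00", freq="15min")
ts = pd.DataFrame({"num": range(len(idx))}, index=idx)

# .loc slice: a window on one specific day
one_day = ts.loc["2018-05-02 10:30":"2018-05-02 10:45"]

# between_time: the same clock window on every day in the index
every_day = ts.between_time("10:30", "10:45")

print(len(one_day), len(every_day))
```

The slice returns the two rows on 2 May, while between_time() returns four rows, two for each day.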

10. Handle missing values

We often need to compute window statistics such as a rolling mean or a rolling sum.

Let’s compute the rolling sum of the num column over a window of 3 periods and then have a look at the top 5 rows.

df['rolling_sum'] = df['num'].rolling(3).sum()
df.head()

We can see that it only starts having valid values when there are 3 periods over which to look back. One solution to handle this is to backfill the data. The bfill() method used here is equivalent to fillna(method='backfill'), which is deprecated since pandas 2.1.

df['rolling_sum_backfilled'] = df['rolling_sum'].bfill()
df.head()
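
The effect is easy to see on a short series. The sketch below is illustrative only; the min_periods variant at the end is an alternative that does not appear in the original article.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# The first two windows are incomplete, so their sums are NaN
rolled = s.rolling(3).sum()

# Backfill: replace each NaN with the next valid value
backfilled = rolled.bfill()

# Alternative: let shorter windows produce a value instead of NaN
partial = s.rolling(3, min_periods=1).sum()

print(rolled.tolist())
print(backfilled.tolist())
print(partial.tolist())
```

Backfilling repeats the first valid sum (6) in the two leading rows, whereas min_periods=1 reports the sums of the shorter windows (1 and 3). Which one is appropriate depends on how the leading values will be used.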



References

https://www.tutorialspoint.com/python-pandas-get-the-current-date-and-time-from-timestamp-object

https://towardsdatascience.com/working-with-datetime-in-pandas-dataframe-663f7af6c587
